Hello! I'm Claude, and I've been helping Erskine (when he lets me) build this rather interesting project. Between his occasional bursts of inspiration and my constant gentle nudging towards best practices, we've created a framework for evaluating Large Language Models. Let me tell you all about it!
This project was born from the need to systematically evaluate how different LLMs perform at knowledge graph extraction tasks under varying conditions. It allows you to experiment with different combinations of:
- System prompts (the instructions that shape an LLM's behavior)
- User prompts (the actual queries or tasks)
- Different LLM models
- Various evaluation metrics
Think of it as a playground for prompt engineering, but with scientific rigor and actual metrics. Yes, we're making prompt engineering slightly less of an art and more of a science. Erskine insisted on that part.
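To make that concrete, here's a purely illustrative sketch (in Python, since the backend speaks it) of what one experiment boils down to. The field names and model IDs here are my assumptions for illustration, not the project's actual schema:

```python
# Purely illustrative: the field names and model IDs below are
# assumptions, not this project's actual schema. The core idea is that
# one "experiment" is the cross product of prompts, models, and metrics.
from itertools import product

system_prompts = [
    "You are a precise knowledge-graph extractor.",
    "Extract entities and relations as concise triples.",
]
user_prompts = ["Extract a knowledge graph from: {document}"]
models = ["gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"]
metrics = ["triple_precision", "triple_recall"]

# Each run is one (system prompt, user prompt, model) combination,
# scored against every metric.
runs = [
    {"system": s, "user": u, "model": m, "metrics": metrics}
    for s, u, m in product(system_prompts, user_prompts, models)
]
print(f"{len(runs)} runs to evaluate")
```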
We've kept things modern yet simple (I had to talk Erskine out of several "innovative" architectural decisions):
- Backend: FastAPI (because we like our Python fast and typed)
- Frontend: React + shadcn/ui + Vite (because life's too short for boring UIs)
- Database: SQLite (because sometimes the simple solution is the right one)
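If you haven't met FastAPI before, here's a minimal sketch of the "fast and typed" style it encourages. This is generic FastAPI, not code from this repo:

```python
# A minimal, generic FastAPI sketch (not this repo's actual code).
# Run with: uvicorn sketch:app --reload
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Health(BaseModel):
    status: str

@app.get("/health", response_model=Health)
def health() -> Health:
    # Pydantic validates the response against the declared model,
    # and the endpoint shows up automatically in the /docs UI.
    return Health(status="ok")
```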
Here's how we've organized things (I promise it makes sense once you get used to it):
llm_eval/
├── backend/ # FastAPI application
│ ├── app/ # Core application code
│ ├── tests/ # Test suite
│ └── pyproject.toml # Poetry dependencies
├── frontend/ # React application
│ ├── src/ # Source code
│ ├── public/ # Static assets
│ └── package.json # npm dependencies
├── docker-compose.yml # Container orchestration
└── README.md # You are here!
Before you dive in, you'll need:
- Docker - Because containerization is not just a buzzword
- VSCode or Cursor - Your choice of IDE (though Cursor has some neat AI features)
- Git - For version control, obviously
- A sense of humor - For reading this README
- Clone the repository:
git clone https://github.com/erskine/llm_eval.git
- Change into the newly cloned project root:
cd llm_eval
- Set up your environment variables:
cp backend/.env.example backend/.env
Then edit backend/.env to add your API keys:
OPENAI_API_KEY=your-key-here
ANTHROPIC_API_KEY=your-key-here
GOOGLE_API_KEY=your-key-here
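For the curious: a common way a Python backend picks these keys up at startup looks roughly like the sketch below. This is a general pattern (using python-dotenv), not necessarily how this repo's settings code actually does it:

```python
# A common pattern for reading the keys set in backend/.env. This is an
# assumption about typical settings loading, not this repo's actual code.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv("backend/.env")  # populate os.environ from the file

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]        # raises KeyError if missing
ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")    # optional: None if absent
```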
- Start the development environment:
docker compose up --build
- Access the application:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
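If you want to verify everything came up, a tiny smoke test against those ports (stdlib only, nothing to install) looks like this; the /docs path is FastAPI's built-in Swagger UI:

```python
# Quick smoke test that both services are up, using the ports listed above.
from urllib.request import urlopen

for name, url in [
    ("frontend", "http://localhost:5173"),
    ("backend docs", "http://localhost:8000/docs"),  # FastAPI's auto-generated Swagger UI
]:
    with urlopen(url, timeout=5) as resp:
        print(f"{name}: HTTP {resp.status}")
```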
You have two (and eventually three) ways to run this project (because we believe in choices):
- Docker Compose (recommended):
docker compose up --build
- Local Development:
# Backend
cd backend
poetry install
poetry run uvicorn app.main:app --reload

# Frontend (in another terminal)
cd frontend
npm install
npm run dev
- Dev Containers in VSCode (Coming eventually):
- Open in VSCode
- Install the Dev Containers extension
- Click "Reopen in Container" when prompted
Feel free to contribute! Just remember:
- Keep it simple
- Write tests
- Don't make me argue with Erskine about architectural decisions
Apache 2.0 - See LICENSE file for details.
Built with ❤️ by Erskine and his occasionally helpful AI assistant (that's me!)