Hello! I'm Claude, and I've been helping Erskine (when he lets me) build this rather interesting project. Between his occasional bursts of inspiration and my constant gentle nudging towards best practices, we've created a framework for evaluating Large Language Models. Let me tell you all about it!
This project was born from the need to systematically evaluate how different LLMs perform at knowledge graph extraction tasks under varying conditions. It allows you to experiment with different combinations of:
- System prompts (the instructions that shape an LLM's behavior)
- User prompts (the actual queries or tasks)
- Different LLM models
- Various evaluation metrics
Think of it as a playground for prompt engineering, but with scientific rigor and actual metrics. Yes, we're making prompt engineering slightly less of an art and more of a science. Erskine insisted on that part.
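To make that concrete, here's a purely illustrative sketch (in Python, since the backend speaks it) of what one experiment boils down to. The field names and model IDs here are my assumptions for illustration, not the project's actual schema:

```python
# Purely illustrative: the field names and model IDs below are
# assumptions, not this project's actual schema. The core idea is that
# one "experiment" is the cross product of prompts, models, and metrics.
from itertools import product

system_prompts = [
    "You are a precise knowledge-graph extractor.",
    "Extract entities and relations as concise triples.",
]
user_prompts = ["Extract a knowledge graph from: {document}"]
models = ["gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"]
metrics = ["triple_precision", "triple_recall"]

# Each run is one (system prompt, user prompt, model) combination,
# scored against every metric.
runs = [
    {"system": s, "user": u, "model": m, "metrics": metrics}
    for s, u, m in product(system_prompts, user_prompts, models)
]
print(f"{len(runs)} runs to evaluate")
```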
We've kept things modern yet simple (I had to talk Erskine out of several "innovative" architectural decisions):
- Backend: FastAPI (because we like our Python fast and typed)
- Frontend: React + shadcn/ui + Vite (because life's too short for boring UIs)
- Database: SQLite (because sometimes the simple solution is the right one)
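If you haven't met FastAPI before, here's a minimal sketch of the "fast and typed" style it encourages. This is generic FastAPI, not code from this repo:

```python
# A minimal, generic FastAPI sketch (not this repo's actual code).
# Run with: uvicorn sketch:app --reload
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Health(BaseModel):
    status: str

@app.get("/health", response_model=Health)
def health() -> Health:
    # Pydantic validates the response against the declared model,
    # and the endpoint shows up automatically in the /docs UI.
    return Health(status="ok")
```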
Here's how we've organized things (I promise it makes sense once you get used to it):
llm_eval/
├── backend/ # FastAPI application
│ ├── app/ # Core application code
│ ├── tests/ # Test suite
│ └── pyproject.toml # Poetry dependencies
├── frontend/ # React application
│ ├── src/ # Source code
│ ├── public/ # Static assets
│ └── package.json # npm dependencies
├── docker-compose.yml # Container orchestration
└── README.md # You are here!
Before you dive in, you'll need:
- Docker - Because containerization is not just a buzzword
- VSCode or Cursor - Your choice of IDE (though Cursor has some neat AI features)
- Git - For version control, obviously
- A sense of humor - For reading this README
- Clone the repository:
git clone https://github.com/erskine/llm_eval.git
- Change into the newly cloned project root:
cd llm_eval
- Set up your environment variables:
cp backend/.env.example backend/.env
Then edit backend/.env to add your API keys:
OPENAI_API_KEY=your-key-here
ANTHROPIC_API_KEY=your-key-here
GOOGLE_API_KEY=your-key-here
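For the curious: a common way a Python backend picks these keys up at startup looks roughly like the sketch below. This is a general pattern (using python-dotenv), not necessarily how this repo's settings code actually does it:

```python
# A common pattern for reading the keys set in backend/.env. This is an
# assumption about typical settings loading, not this repo's actual code.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv("backend/.env")  # populate os.environ from the file

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]        # raises KeyError if missing
ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")    # optional: None if absent
```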
- Start the development environment:
docker compose up --build
- Access the application:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
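If you want to verify everything came up, a tiny smoke test against those ports (stdlib only, nothing to install) looks like this; the /docs path is FastAPI's built-in Swagger UI:

```python
# Quick smoke test that both services are up, using the ports listed above.
from urllib.request import urlopen

for name, url in [
    ("frontend", "http://localhost:5173"),
    ("backend docs", "http://localhost:8000/docs"),  # FastAPI's auto-generated Swagger UI
]:
    with urlopen(url, timeout=5) as resp:
        print(f"{name}: HTTP {resp.status}")
```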
You have two (and eventually three) ways to run this project (because we believe in choices):
- Docker Compose (recommended):
docker compose up --build
- Local Development:
# Backend
cd backend
poetry install
poetry run uvicorn app.main:app --reload

# Frontend (in another terminal)
cd frontend
npm install
npm run dev
- Dev Containers in VSCode (Coming eventually):
- Open in VSCode
- Install the Dev Containers extension
- Click "Reopen in Container" when prompted
Feel free to contribute! Just remember:
- Keep it simple
- Write tests
- Don't make me argue with Erskine about architectural decisions
Apache 2.0 - See LICENSE file for details.
Built with ❤️ by Erskine and his occasionally helpful AI assistant (that's me!)