STORY - Automate scraping + add indexing + scheduling + evaluation #137

Open
AndreasThinks opened this issue Jul 26, 2024 · 1 comment

AndreasThinks commented Jul 26, 2024

We need to review how we do this broadly.

AndreasThinks added this to the 1.1 milestone Jul 26, 2024

mhgov commented Jul 26, 2024

User stories:

  • be able to version control and monitor LLM answers
  • be able to see the quality of answers, and refine and improve prompts over time
  • as an expert user, try different prompts and see whether I can improve the answers
  • as a developer, run a unit/regression test to check whether the LLM can be deployed
  • track scores over time to check for regression/performance
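
The last two stories could be sketched roughly as follows: keep a history of evaluation runs and flag a regression when the newest mean score drops more than a tolerance below the previous run. `EvalRun`, `has_regressed`, and the 0.05 tolerance are hypothetical names and values for illustration, not anything that exists in Caddy today.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EvalRun:
    """One evaluation run over the question set (hypothetical structure)."""
    prompt_version: str   # e.g. a git SHA or tag for the prompt under test
    scores: list[float]   # one score per question, assumed to lie in [0, 1]


def has_regressed(history: list[EvalRun], tolerance: float = 0.05) -> bool:
    """Return True if the latest run's mean score fell more than
    `tolerance` below the mean score of the run before it."""
    if len(history) < 2:
        return False  # nothing to compare against yet
    baseline = mean(history[-2].scores)
    latest = mean(history[-1].scores)
    return latest < baseline - tolerance
```

A CI job could call `has_regressed` on the stored history and fail the build when it returns `True`; storing `prompt_version` with each run is what would make the answers traceable to a prompt change.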

Short term maintenance approach:

  • Sort out the caddy messages vs. caddy responses DynamoDB tables to include the additional desired tags (e.g. routing, eval scores, received timestamps)
  • Add evaluation metrics (as in the KM portal) to the caddy responses table
  • run evaluation metrics on the current set of 20 caddy questions and generated answers
  • bring eval into CI/CD as a basic unit test
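
A rough sketch of how these steps might fit together, assuming a responses record carries the extra tags and some scoring function is available. The field names (`routing`, `eval_scores`, `received_at`), the `token_f1` metric, and `score_records` are placeholders for illustration; they are not the actual table schema or the KM portal metrics.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


def token_f1(answer: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer.
    A stand-in metric only; the real evaluation metrics would replace it."""
    a, r = answer.lower().split(), reference.lower().split()
    overlap = sum((Counter(a) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(a), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


@dataclass
class ResponseRecord:
    """Hypothetical shape of one row in the responses table."""
    message_id: str
    answer: str
    routing: str
    eval_scores: dict[str, float] = field(default_factory=dict)
    received_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def score_records(records: list[ResponseRecord], references: dict[str, str]) -> float:
    """Attach a token_f1 score to every record and return the mean score."""
    if not records:
        raise ValueError("no records to evaluate")
    for rec in records:
        rec.eval_scores["token_f1"] = token_f1(rec.answer, references[rec.message_id])
    return sum(rec.eval_scores["token_f1"] for rec in records) / len(records)
```

As a basic unit test, CI could run `score_records` over the 20 questions and assert the mean stays above an agreed threshold. If the records are written with boto3, note that it rejects Python floats for DynamoDB numbers, so the scores would need converting to `Decimal` first.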

Long term maintenance approach:

  • Take all topics/a sample of queries
  • Use caddy to generate an answer for each question
  • Crowdsource to allow advisors/supervisors across LCAs to refine and create 'model' answers
  • Over time, measure incoming queries against model queries and look at drift
  • separate platform for caddy?
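
The drift measurement could be approximated as the share of incoming queries that have no close match among the model queries. Jaccard token similarity is used below as a deliberately simple stand-in (an embedding-based similarity would be a more likely choice in practice); `jaccard`, `drift_rate`, and the 0.3 cutoff are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase token sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty strings are treated as identical
    return len(sa & sb) / len(sa | sb)


def drift_rate(incoming: list[str], model_queries: list[str], cutoff: float = 0.3) -> float:
    """Fraction of incoming queries whose best match among the model
    queries scores below `cutoff`."""
    if not incoming:
        return 0.0
    unmatched = sum(
        1
        for query in incoming
        if max((jaccard(query, m) for m in model_queries), default=0.0) < cutoff
    )
    return unmatched / len(incoming)
```

Computing this per week or per month and plotting it would show whether advisors are starting to ask about topics the model answers do not cover.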

Separate project on expert/crowdsourced management of LLM answers in the public sector
