STORY - Automate scraping + add indexing + scheduling + evaluation #137

AndreasThinks · 2024-07-26T09:29:11Z

We need to review how we do this broadly.

mhgov · 2024-07-26T09:54:21Z

User stories:

Short term maintenance approach:

Sort out the caddy messages vs. caddy responses DynamoDB table to include the additional desired tags (e.g. routing, eval scores, received timestamps)
Add evaluation metrics (as in the KM portal) to the caddy responses table
Run evaluation metrics on the current set of 20 caddy questions and generated answers
Bring eval into CI/CD as a basic unit test
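
A rough sketch of what the "eval as a basic unit test" step could look like. Everything here is an assumption for illustration: `token_f1` is a simple token-overlap score standing in for whichever metrics the KM portal uses, `MIN_SCORE` is an arbitrary threshold, and the case in the test is a placeholder rather than one of the 20 real caddy questions.

```python
import re
from collections import Counter

MIN_SCORE = 0.5  # assumed pass threshold, to be tuned against real data


def token_f1(generated: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer.

    Stand-in metric only; swap in the real evaluation metrics when available.
    """
    gen = re.findall(r"\w+", generated.lower())
    ref = re.findall(r"\w+", reference.lower())
    if not gen or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def failing_cases(cases, min_score=MIN_SCORE):
    """Return (question, score) for every case scoring below min_score."""
    failures = []
    for case in cases:
        score = token_f1(case["generated"], case["reference"])
        if score < min_score:
            failures.append((case["question"], score))
    return failures


def test_eval_set_meets_threshold():
    # Placeholder case; in CI this would load the stored question set
    # together with caddy's generated answers.
    cases = [
        {
            "question": "placeholder question",
            "generated": "placeholder answer text",
            "reference": "placeholder answer text",
        },
    ]
    assert failing_cases(cases) == []
```

A test runner such as pytest would pick up `test_eval_set_meets_threshold` by name, so a drop in scores fails the pipeline in the same way as any other unit test.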

Long term maintenance approach:

Take all topics/sample of queries
Use caddy to generate an answer for each question
Crowdsource to allow advisors/supervisors across LCAs to refine and create 'model' answers
Over time, measure incoming queries against model queries and look at drift
Separate platform for caddy?

Separate project on expert/crowdsourced management of LLM answers in the public sector

AndreasThinks added this to the 1.1 milestone Jul 26, 2024
