Implement caching for query embeddings #113

Open · wants to merge 1 commit into main
Conversation


@thoj commented on Jan 24, 2025

Added caching to store query embeddings, preventing redundant API calls for the same query across multiple files in the store.

When using file search with many files, rag_api sends an embeddings request for each file in the store, even though the query text is identical for every file. As far as I can tell this is completely unnecessary. This PR implements a very simple change to speed up file search queries when the store holds many files; a sketch of the idea follows the logs below.

On my test agent in LibreChat with over 40 files, without the cache:

rag_api           | 2025-01-24 00:08:07,012 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
rag_api           | 2025-01-24 00:08:07,476 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
.... x40
rag_api           | 2025-01-24 00:08:24,632 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
rag_api           | 2025-01-24 00:08:24,837 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
rag_api           | 2025-01-24 00:08:24,841 - root - INFO - Request POST http://rag_api:8000/query - 200

With the cache:

rag_api           | 2025-01-24 00:16:28,813 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
rag_api           | 2025-01-24 00:16:28,929 - root - INFO - Request POST http://rag_api:8000/query - 200
rag_api           | 2025-01-24 00:16:28,938 - root - INFO - Request POST http://rag_api:8000/query - 200
rag_api           | 2025-01-24 00:16:28,939 - root - INFO - Request POST http://rag_api:8000/query - 200

Notice there is only one embeddings request to OpenAI; this saves almost 20 seconds on the query.
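
For reference, here is a minimal sketch of the idea (not the exact diff in this PR): memoize `embed_query` on the query text so the per-file search loop triggers at most one OpenAI call per unique query. `embed_query` and `similarity_search_by_vector` are standard LangChain APIs; the `embeddings`/`vector_store` objects, the cache size, and the filter shape are illustrative assumptions.

```python
from functools import lru_cache

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # assumed embeddings client

@lru_cache(maxsize=1024)
def cached_embed_query(query: str) -> tuple[float, ...]:
    # The first call for a given query text hits the OpenAI embeddings
    # endpoint; repeat calls are served from the in-process LRU cache.
    # Return a tuple so the cached value is immutable.
    return tuple(embeddings.embed_query(query))

def search_file(vector_store, file_id: str, query: str, k: int = 4):
    # Reuse the cached vector for every file in the store instead of
    # re-embedding the same query once per file.
    vector = list(cached_embed_query(query))
    return vector_store.similarity_search_by_vector(
        vector, k=k, filter={"file_id": file_id}
    )
```

Since the redundant calls all come from one process looping over files, a simple in-process LRU is enough here; `maxsize` just bounds memory for long-running workers.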
