Implement caching for query embeddings #113

Open · wants to merge 1 commit into main
Conversation


@thoj commented on Jan 24, 2025

Added caching to store query embeddings, preventing redundant API calls for the same query across multiple files in the store.

When using file search with many files, rag_api sends an embeddings request for each file in the store, even though the query text is identical for every file. As far as I can tell this is completely unnecessary. This PR implements a very simple change to speed up file search queries when the store holds many files; a sketch of the idea follows the logs below.

On my test agent in LibreChat with over 40 files, without the cache:

rag_api           | 2025-01-24 00:08:07,012 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
rag_api           | 2025-01-24 00:08:07,476 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
.... x40
rag_api           | 2025-01-24 00:08:24,632 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
rag_api           | 2025-01-24 00:08:24,837 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
rag_api           | 2025-01-24 00:08:24,841 - root - INFO - Request POST http://rag_api:8000/query - 200

With the cache:

rag_api           | 2025-01-24 00:16:28,813 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
rag_api           | 2025-01-24 00:16:28,929 - root - INFO - Request POST http://rag_api:8000/query - 200
rag_api           | 2025-01-24 00:16:28,938 - root - INFO - Request POST http://rag_api:8000/query - 200
rag_api           | 2025-01-24 00:16:28,939 - root - INFO - Request POST http://rag_api:8000/query - 200

Notice there is only one embeddings request to OpenAI; this saves almost 20 seconds on the query.
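
For reference, here is a minimal sketch of the idea (not the exact diff in this PR): memoize `embed_query` on the query text so the per-file search loop triggers at most one OpenAI call per unique query. `embed_query` and `similarity_search_by_vector` are standard LangChain APIs; the `embeddings`/`vector_store` objects, the cache size, and the filter shape are illustrative assumptions.

```python
from functools import lru_cache

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # assumed embeddings client

@lru_cache(maxsize=1024)
def cached_embed_query(query: str) -> tuple[float, ...]:
    # The first call for a given query text hits the OpenAI embeddings
    # endpoint; repeat calls are served from the in-process LRU cache.
    # Return a tuple so the cached value is immutable.
    return tuple(embeddings.embed_query(query))

def search_file(vector_store, file_id: str, query: str, k: int = 4):
    # Reuse the cached vector for every file in the store instead of
    # re-embedding the same query once per file.
    vector = list(cached_embed_query(query))
    return vector_store.similarity_search_by_vector(
        vector, k=k, filter={"file_id": file_id}
    )
```

Since the redundant calls all come from one process looping over files, a simple in-process LRU is enough here; `maxsize` just bounds memory for long-running workers.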
