Ingest Folder #15
Comments
@pdchristian, thanks for the question. I would like to know how you're chunking all the documents you're loading. You would need to chunk them and iteratively pass them to the embedding model to create vector embeddings and load to vector storage.
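The chunk-and-embed flow described in this comment can be sketched in plain Python. Everything here is an illustrative stand-in, not the project's actual API: `chunk_text` mimics a text splitter, `fake_embed` stands in for a call to an embedding model such as `nomic-embed-text`, and the list stands in for a vector store.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap (illustrative splitter)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def fake_embed(chunk):
    """Stand-in for an embedding model call; returns a toy 2-dim vector."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 97)]

# Two dummy "documents"; in the real flow these come from the loader.
documents = ["first document " * 30, "second document " * 30]

vector_store = []  # stand-in for the vector database
for doc in documents:                 # iterate over EVERY loaded document
    for chunk in chunk_text(doc):     # chunk each one
        vector_store.append((chunk, fake_embed(chunk)))  # embed and store
```

The key point is the outer loop: every document in the list is chunked and embedded, not just the first or last one.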
@tonykipkemboi, here is a screen recording: Aufzeichnung.2024-08-13.165752.mp4. Is there a problem with the DirectoryLoader, or with how I am trying to use it?
@pdchristian, thanks for the video. I'll recreate the issue and report back to you.
Hello,
I would very much like to ingest all my local text files (PDF, DOCX, and TXT). Therefore I replaced the loader with the DirectoryLoader, as shown below. This basically works, but only the last document is ingested (I have 4 PDFs for testing).
```python
from langchain_community.document_loaders import DirectoryLoader

local_path = "../data"

# Local PDF file uploads
if local_path:
    loader = DirectoryLoader(local_path, glob='**/[!.]*', use_multithreading=True, show_progress=True)
    data = loader.load()

data[0]
```
Output:
100%|██████████| 4/4 [00:31<00:00, 7.93s/it]
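For what it's worth, the glob pattern `'**/[!.]*'` should pick up every non-hidden file recursively, so the loader itself appears to find all four files (the progress bar above shows 4/4). A small stdlib-only sketch (file names are hypothetical) shows how the pattern behaves:

```python
# Demonstrate what the glob pattern '**/[!.]*' matches, using a temporary
# directory with dummy files. File names here are illustrative only.
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    (root / "a.pdf").write_text("pdf one")
    (root / "b.docx").write_text("docx")
    (root / "sub").mkdir()
    (root / "sub" / "c.txt").write_text("txt")
    (root / ".hidden").write_text("skipped")  # dot-files are excluded by [!.]*

    # '**' recurses into subdirectories; filter out directories themselves.
    matches = sorted(p.name for p in root.glob("**/[!.]*") if p.is_file())
```

So the files are all being matched; the loss must happen after loading.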
Add to vector database
```python
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag",
)
```
Output OllamaEmbeddings:
OllamaEmbeddings: 100%|██████████| 143/143 [00:11<00:00, 12.73it/s]
143 chunks seems far too low for four PDFs; the number should be much higher.
It would be great if my local office documents could be ingested.
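A likely cause of the low chunk count is that `chunks` was built from a single document (e.g. from `data[0]`) rather than from every entry in `data`. If the splitter is LangChain's `RecursiveCharacterTextSplitter`, passing the whole list to `split_documents(data)` processes all documents. A simplified, self-contained sketch of the difference, with a toy splitter standing in for the real one:

```python
def split(text, size=50):
    """Toy stand-in for a text splitter: fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Four dummy "documents", standing in for the four loaded PDFs.
docs = ["x" * 500, "y" * 500, "z" * 500, "w" * 500]

# Buggy pattern: splitting only one element of the loaded list.
chunks_one_doc = split(docs[0])

# Correct pattern: splitting every loaded document.
chunks_all = [c for d in docs for c in split(d)]
```

With four equally sized documents, splitting only one yields a quarter of the chunks, which would explain a count like 143 when a much larger number is expected.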