Ingest Folder #15

Open
pdchristian opened this issue Aug 4, 2024 · 3 comments


@pdchristian

Hello,

I would like to ingest all my local text files (PDF, DOCX, and TXT), so I replaced the loader with DirectoryLoader, as shown below. This basically works, but only the last document is ingested (I have 4 PDFs for testing).

local_path = "../data"

# Local PDF file uploads
if local_path:
    loader = DirectoryLoader(local_path, glob='**/[!.]*', use_multithreading=True, show_progress=True)
    data = loader.load()
data[0]

Output:
100%|██████████| 4/4 [00:31<00:00, 7.93s/it]
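One way to check whether all four files actually made it into `data` (rather than inspecting only `data[0]`) is to count the loaded items per source file. The sketch below is a minimal, hedged illustration: `Doc` is a hypothetical stand-in for a loaded document with `page_content` and `metadata` attributes, the sample file names are invented, and `count_by_source` is an illustrative helper, not part of any library. It assumes each loaded item records its file path under `metadata["source"]`.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def count_by_source(docs):
    """Return how many loaded items came from each source file."""
    return Counter(d.metadata.get("source", "<unknown>") for d in docs)


# Invented sample standing in for `data = loader.load()`.
data = [
    Doc("first file text", {"source": "../data/a.pdf"}),
    Doc("second file text", {"source": "../data/b.pdf"}),
    Doc("third file text", {"source": "../data/c.pdf"}),
    Doc("fourth file text", {"source": "../data/d.pdf"}),
]

per_source = count_by_source(data)
print(len(data), "items loaded from", len(per_source), "files")
for source, n in sorted(per_source.items()):
    print(source, n)
```

If the real `data` lists fewer sources than there are files in the folder, the problem is in the loading step; if all sources are present, the loss is more likely in the chunking step that produces `chunks`.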

# Add to vector database

vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    #embedding=OllamaEmbeddings(model="nomic-embed-text",show_progress=True),
    collection_name="local-rag"
)

Output OllamaEmbeddings:
OllamaEmbeddings: 100%|██████████| 143/143 [00:11<00:00, 12.73it/s]
The number of chunks should be much higher.

It would be great if my local office documents could be ingested.

@tonykipkemboi
Owner

@pdchristian, thanks for the question. I would like to know how you're chunking all the documents you're loading. You would need to chunk every document and iteratively pass the chunks to the embedding model to create vector embeddings, then load them into vector storage.
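The chunk-then-embed flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `Doc` is a hypothetical stand-in for a loaded document, `split_all` is a simple character-window splitter standing in for a real text splitter, `fake_embed` is a placeholder rather than a real embedding model, and `store` is a plain list standing in for a vector database. The point it shows is that the splitter loops over every loaded document, not just one.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_all(docs, chunk_size=1000, overlap=100):
    """Split EVERY document into overlapping character windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for doc in docs:
        text = doc.page_content
        for start in range(0, len(text), step):
            piece = text[start:start + chunk_size]
            # Keep the source metadata so each chunk can be traced to its file.
            chunks.append(Doc(piece, dict(doc.metadata, start=start)))
            if start + chunk_size >= len(text):
                break  # this window already reached the end of the text
    return chunks


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fake_embed(texts):
    """Placeholder only: returns one tiny vector per text, not a real embedding."""
    return [[float(len(t))] for t in texts]


# Invented sample documents standing in for the loader output.
docs = [Doc("a" * 2500, {"source": "a.pdf"}), Doc("b" * 900, {"source": "b.pdf"})]
chunks = split_all(docs, chunk_size=1000, overlap=100)

store = []  # stand-in for a vector store
for batch in batches(chunks, size=2):
    vectors = fake_embed([c.page_content for c in batch])
    store.extend(zip(batch, vectors))

print(len(chunks), "chunks from", len({c.metadata["source"] for c in chunks}), "files")
```

Comparing the set of sources in `chunks` with the set of sources in the loaded documents is a quick way to see whether any file was dropped before embedding.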

@pdchristian
Author

@tonykipkemboi,
thanks for your response.
I think the code I updated to load the documents is buggy. Only the first document takes time to load. For the other three, the progress bar jumps quickly from 1 to 4.

Aufzeichnung.2024-08-13.165752.mp4

Is there a problem with the DirectoryLoader, or with how I am trying to use it?

@tonykipkemboi reopened this Aug 15, 2024
@tonykipkemboi
Owner

@pdchristian, thanks for the video. I'll recreate the issue and report back to you.
