[install-help]: Responses Not Based on the Specified Directory or File Content #94

Open
aef5748 opened this issue Oct 30, 2024 · 10 comments
Labels
help wanted Extra attention is needed

@aef5748

aef5748 commented Oct 30, 2024

When asking questions and specifying a directory or file for reference, the responses may sometimes include content that is not found within the specified directory or file. This is particularly likely to occur when a directory is chosen as the basis for the inquiry.

Is there any specific setting or way of asking questions that would allow for accurately finding content based on the inquiry and the specified document?

@aef5748 aef5748 added the help wanted Extra attention is needed label Oct 30, 2024
@kyteinsky
Contributor

Hi, how did you confirm this?
Do you see files from other directories when using "Selective context" and specifying a directory in the assistant?

@aef5748
Author

aef5748 commented Nov 1, 2024

I selected specific folders, such as the one in the green box (about 42 files), and asked questions about the collection.
Image

However, the reply is meaningless, and the files in the red box are not part of the data I specified.
Image
Image
Image

@kyteinsky
Contributor

There seem to be two issues:

  1. Weird answer (what does it say at the top of the answer though?)
    This could be due to how well the model handles certain types of data. Try reformatting the "Meeting minutes.ods" file (in a new file, to preserve the original) to get a better answer.
  2. Files from outside the specified scope
    Did you reinstall Nextcloud during testing by any chance? File IDs from an index built against a previous Nextcloud install might be getting mixed up. Try completely removing context chat (including its data, which lives in a docker volume; use this command: docker volume rm nc_app_context_chat_backend_data) and reinstalling. For faster indexing of a single user, you can use this command: occ context_chat:scan <user_id>
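
A minimal sketch of the reset-and-rescan sequence suggested in item 2, written as a dry run so that nothing is deleted by accident. The two commands are taken from the comment above; the `run` helper, the `DRY_RUN` and `OCC_CMD` variables, and the function name are hypothetical. The reinstall step is left as a comment because it depends on how the backend was deployed.

```shell
#!/bin/sh
# Sketch only: wipe the Context Chat backend data, then re-index one user.
# DRY_RUN=1 (the default here) prints each command instead of running it.
# OCC_CMD is an assumption about how occ is invoked on your system,
# e.g. "php occ" or "sudo -u www-data php occ".

DRY_RUN="${DRY_RUN:-1}"
OCC_CMD="${OCC_CMD:-occ}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reset_and_rescan() {
    user_id="$1"

    # Remove the docker volume that holds the backend's indexed data.
    run docker volume rm nc_app_context_chat_backend_data

    # Reinstall the context chat backend here (not shown; setup-specific).

    # Re-index the files of a single user.
    # OCC_CMD is intentionally unquoted so "php occ" splits into two words.
    run $OCC_CMD context_chat:scan "$user_id"
}
```

Calling `reset_and_rescan <user_id>` with the defaults only prints the two commands; setting `DRY_RUN=0` would actually run them, so review the output first.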

@aef5748
Author

aef5748 commented Nov 7, 2024

> Weird answer (what does it say at the top of the answer though?)

The top of the answer reads as follows:
Image

If I switch to a different model, might that solve it?

> Files from outside the specified scope
> Did you reinstall Nextcloud during testing by any chance?

Nextcloud was not reinstalled during the test.
I upgraded from Nextcloud 26 to Nextcloud 30, and then installed context_chat.

I am currently trying to delete the files in persistent_storage/vector_db_data and reinstall context_chat, but it seems that nothing triggers the transfer of files to context_chat afterwards (I have executed php occ background-job:worker -v -t 60 "OCA\ContextChat\BackgroundJobs\IndexerJob ")

Executing occ context_chat:scan <user_id> will pop up the following error (it goes away after repeating the command a few times)
Image

From the log, I can see recurring messages about receiving data, but I don't see any tasks for uploading files to Nextcloud.
context_chat_backend_error_logs_20241107.txt

github-actions bot commented Dec 7, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Dec 7, 2024
@kyteinsky kyteinsky removed the stale label Dec 11, 2024
@kyteinsky
Contributor

Hi, sorry for the late reply.

> Executing occ context_chat:scan <user_id> will pop up the following error

This is the way to do that.

From the logs it looks like a big file. What are the sizes of your test files? It seems your proxy server timed out the request because the file transfer took too long.
Using files that large is not recommended; in the future we'll either limit the maximum file size (configurable) or make the indexing asynchronous.
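
One way to check the file sizes being asked about is to list everything above a chosen size before indexing. This is a rough sketch using standard `find`; the function name and the threshold are hypothetical, and the thread does not state any official size limit.

```shell
#!/bin/sh
# Sketch only: list regular files under DIR that are larger than LIMIT_KIB kibibytes.
# The threshold is an arbitrary value you pick; no official limit is documented here.

list_large_files() {
    dir="$1"
    limit_kib="$2"
    # -size +Nk matches files whose size, rounded up to whole KiB, exceeds N.
    find "$dir" -type f -size +"${limit_kib}"k | sort
}
```

For example, `list_large_files /path/to/user/files 10240` would print files larger than 10 MiB, where the path stands in for the user's Nextcloud data directory.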

@kyteinsky
Contributor

> If I switch to a different model, might that solve it?

Maybe. You could also change the sampler settings (temperature, top_p, etc.), or try different formats for the same data, rearranging it so it's easier for the LLM to understand. This can't be done directly, since you don't know exactly how the parser will parse the .ods file, so a text-like document containing the data would probably work better.
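
As an illustration of the "text-like document" idea, the sketch below turns a CSV export of the spreadsheet into labelled text lines, one block per row. It assumes the .ods file was first saved as CSV by hand, that the first row holds column names, and that no field contains a comma or quotes; the function name is hypothetical, and this is not how the backend's own parser works.

```shell
#!/bin/sh
# Sketch only: convert a simple CSV (header row first, no quoted fields)
# into "Column: value" lines, with a blank line after each row.

csv_to_text() {
    awk -F',' '
        # Remember the column names from the first row.
        NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; next }
        # Emit one labelled line per field, then a blank separator line.
        {
            for (i = 1; i <= NF; i++) printf "%s: %s\n", header[i], $i
            print ""
        }
    ' "$1"
}
```

Running `csv_to_text minutes.csv > minutes.txt` (file names are placeholders) produces a plain-text file that can be uploaded alongside, or instead of, the spreadsheet.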

@aef5748
Author

aef5748 commented Dec 16, 2024

> From the logs it looks like a big file. What are the sizes of your test files?

They are all the default files created with a new account; the largest is about 14.3 MB.
I have rerun it several times, and the error does not necessarily occur on large files. Sometimes it also appears for files of only a few hundred KB.

@kyteinsky
Contributor

I'm not entirely sure what happened, but my guess is that the Collectives folder was a bit too large to be processed synchronously when used in selective context (it hadn't been indexed before). So I would suggest first indexing the Collectives folder with occ context_chat:scan -d 'Collectives' <user_id> and then trying the same thing.
Also, please use the latest versions of context_chat and context_chat_backend for this.

@DaphneMuller

> Using files that large is not recommended

Maybe document that in the AI docs too!
