
[Bug]: Memory leaks with load_dataset_multi_txn #330

Closed
Ramlaoui opened this issue Dec 24, 2024 · 3 comments
Labels
bug Something isn't working community pgai

Comments

@Ramlaoui

Ramlaoui commented Dec 24, 2024

What happened?

Hi, I've been using the new feature to load Hugging Face datasets into a table. However, for large datasets the call to load_dataset_multi_txn crashes after a while with out-of-memory (OOM) errors.
I've tried adjusting the batch size and commit_every_n_batches, but I still get the same issue.

Is there any way to mitigate this, or at least a parameter to set where in the dataset loading should start (e.g. skip the first 1000 batches)?
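To make the requested "start after N batches" behavior concrete, here is a minimal sketch in plain Python. It assumes a loader that groups rows into batches and commits every `commit_every_n_batches` batches. `load_batches` and `skip_batches` are hypothetical names for illustration only; they are not pgai parameters or pgai's implementation. Skipping this way still reads the skipped rows, and a resume offset would not fix the leak itself. It would only let a crashed load be restarted where it stopped.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an iterable of rows into lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load_batches(
    rows: Iterable[T],
    batch_size: int,
    commit_every_n_batches: int,
    skip_batches: int,  # hypothetical resume offset, not a real pgai parameter
    insert: Callable[[List[T]], None],
    commit: Callable[[], None],
) -> int:
    """Skip the first skip_batches batches, insert the rest, and commit after
    every commit_every_n_batches inserted batches. Returns batches inserted."""
    inserted = 0
    # islice consumes (reads) the skipped batches without inserting them.
    for batch in islice(batched(rows, batch_size), skip_batches, None):
        insert(batch)
        inserted += 1
        if inserted % commit_every_n_batches == 0:
            commit()
    if inserted % commit_every_n_batches != 0:
        commit()  # flush the final partial group of batches
    return inserted
```

For example, with 10 rows, `batch_size=3`, and `skip_batches=1`, the first batch `[0, 1, 2]` is skipped and the remaining three batches are inserted.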


pgai extension affected

0.6.0

pgai library affected

No response

PostgreSQL version used

17.1

What operating system did you use?

Ubuntu 24.04, 32 GB RAM

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

call ai.load_dataset_multi_txn('LeMaterial/LeMat-Bulk', 'compatible_pbe', table_name => 'lemat', if_table_exists => 'append', commit_every_n_batches => 100);

Are you going to work on the bugfix?

🆘 No, could someone else please work on the bugfix?

Ramlaoui added the bug, community, and pgai labels on Dec 24, 2024
@cevian
Collaborator

cevian commented Jan 10, 2025

It turns out this is a memory leak in Postgres itself. I've submitted a patch upstream; let's see if they accept it. If they don't, there is a workaround we can implement on our side, but it is ugly.

@cevian
Collaborator

cevian commented Jan 13, 2025

The modified patch made it in!

@Ramlaoui
Author

Thanks a lot, that's very helpful!


2 participants