Hi, I've been using the new feature to load Hugging Face datasets into a table. However, for large datasets the call to `load_dataset_multi_txn` crashes after a while because it runs out of memory (OOM).
I've tried changing the batch size and `commit_every_n_batches`, but I still get the same issue.
Is there any way to mitigate this, or at least a parameter to set where in the dataset the upload should start (e.g. after 1000 batches)?
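For anyone hitting the same thing, here is a rough client-side sketch of the "start after N batches" idea. It is not a pgai feature and does not use any pgai API: `batches_from` and its `skip_batches` parameter are hypothetical names made up for illustration. It assumes you iterate the dataset rows yourself (for example from a streaming dataset) and insert each yielded batch with your own database client.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches_from(
    rows: Iterable[T], batch_size: int, skip_batches: int = 0
) -> Iterator[List[T]]:
    """Yield lists of up to `batch_size` rows, skipping the first `skip_batches` batches.

    Hypothetical helper for illustration only; not part of pgai.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    # Consume the rows of the already-loaded batches without keeping them in memory.
    for _ in islice(it, skip_batches * batch_size):
        pass
    while True:
        # Only one batch is materialized at a time.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

After a crash, restarting with `skip_batches` set to the number of batches already committed would resume from that point, provided the rows arrive in the same order on every run. Skipped rows are still read from the source, so this saves database work but not download time.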
Turns out this is a Postgres memory leak. I've submitted a patch upstream. Let's see if they accept it. If they don't, there is a workaround we can implement on our side, but it is ugly.
pgai extension affected
0.6.0
pgai library affected
No response
PostgreSQL version used
17.1
What operating system did you use?
Ubuntu 24.04 32GB RAM
What installation method did you use?
Docker
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
Are you going to work on the bugfix?
🆘 No, could someone else please work on the bugfix?