-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce readers' memory consumption #229
Comments
Thanks @marcovarrone for reporting this and for implementing a solution, which from the profiling is giving excellent results! I am still doing some triaging some issues/feature requests so I haven't tried your code yet, but from a quick look I have a few comments. I'll also answer your questions above.
I would change the implementation a bit by removing the calls to internal APIs and instead use just
I think it is a good approach. I would keep the implementation simple and not add additional parameter as the case in which the user wants to modify the writing option is very rare, and in such cases I think it would be acceptable for the user to use extra RAM and call the write function manually.
Good point. Some users experienced performance problems because it was not clear to them that they had to write and re-read the |
In the current implementation, a reader loads all the object from the raw files in memory and creates a
spatialdata
object. Is then the user's responsibility to save the object to disk.For big samples, this leads to large memory requirements as it needs to load the whole object in memory.
I would propose to add a new
output_path
parameter in a reader's function to allow saving every element of the spatialdata object as soon as it's created. This allows to free up part of the memory during the function execution.I created a draft pull request (#228) only for Xenium but if I receive the ok I can implement it for the rest of the readers.
Here is a comparison using
memory_profiler
Current version
New version
As you can see, it reduces the maximum memory occupied by around 44% (from 64 GB to 35GB).
I need your opinion on a few things:
spatialdata.write()
. Instead, could we usespatialdata.write()
repeatedly every time a new element is generated? In theory, it should write only the new element, right? In this way, if something changes in thespatialdata
code for writing, we will not have to update code of the readers.spatialdata
to not add too many parameters to the reader function. However, should we add a parameterwrite_kwargs
to allow the user to change the parameters of the writing?output_path
is set, it returnsNone
instead of the spatialdata object. Should we return an on-disk spatialdata object by rereading the file?Looking forward to feedback!
The text was updated successfully, but these errors were encountered: