I am a software engineer at Algolia and we love your library. However, we've run into a problem with certain documents we are trying to process: they fail to load because of the memory consumption of cheerio.load.
After some analysis, it seems that cheerio.load turns a 5 MB file into a 150 MB to 500 MB in-memory representation. That's a 30x to 100x increase in size.
It would be awesome for us to have a more memory-efficient parser. I have looked into how the htmlparser2 library is used, and it seems to me that a more efficient representation of the elements could be possible, but I am not 100% sure how.
Could this type of constraint be something you would consider for a future release?
Thank you!
Code snippet used for measurements:
cheerio version: 1.0.0-rc.3