Version 0.3.3 crashes kernel with large dataset #75
Comments
Do you have this error?
The problem arises because of the However... I'm not sure how well this is going to work in a model. You'll have 30,542 parameters for this term, and that seems like a lot. Have you ever worked with such sizes in other GLM frameworks?
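A back-of-the-envelope estimate helps show why the kernel dies. The sketch below assumes the group-specific term is materialised as a dense float64 matrix with one column per group, and uses the approximate row and group counts reported in the issue; the real matrix may differ in dtype or column count.

```python
# Rough size of a dense group-specific design matrix.
# Assumptions: one float64 column per group, and the row/group counts
# reported in the issue (the row count is approximate).
n_rows = 183_000
n_groups = 30_542
bytes_per_float64 = 8

dense_bytes = n_rows * n_groups * bytes_per_float64
dense_gib = dense_bytes / 2**30

print(f"{dense_bytes:,} bytes, about {dense_gib:.1f} GiB")
# prints "44,713,488,000 bytes, about 41.6 GiB"
```

Under these assumptions the estimate lines up with the "> 41 GB of RAM" figure mentioned in the thread.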
Hey @tomicapretto, I did not get that error, just a kernel crash! But I can definitely see why that's happening; I hadn't realised the explosion behind the scenes for the group-specific matrix 🤯 I've worked with similarly sized datasets a lot with GLMs, but never with group-specific terms. Each
Alex, I guess you're getting a similar error, but somehow Jupyter does not show it to you (unless you have > 41 GB of RAM). Maybe you could try with a simpler model, or
Thanks @tomicapretto! With PyMC3 I used patsy to build the design matrix for fixed effects, and then just indexed the non-centred group effect using codes as is usually done in PyMC for hierarchical models. It sampled fine with ADVI, and the predictions looked sensible. What's strange about my issue is that using the conda install of bambi (so 0.2.0 for formulae), the model will build, sample with ADVI, and predict just fine, albeit with the modification of the design to the below:
Unfortunately, I need that interaction term, but I can make do with PyMC3 here. I am still noticing that, even when fitting the model with ADVI via bambi, the predictions for the
Hmmm, to be honest I don't know why it works with formulae==0.2.0 and does not work with formulae>0.3.0. I do think there are a couple of things we could do to improve it, namely adding support for sparse matrices and using them appropriately in Bambi. In the meantime, I would recommend using the PyMC approach if that works for you 👍
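To illustrate why sparse storage would help, here is a minimal sketch on toy data. It assumes a group indicator matrix with a single 1 per row and uses SciPy's CSR format purely as an example; it is not how formulae or Bambi store the matrix, and the group codes and effect values are made up.

```python
import numpy as np
from scipy import sparse

# Toy data: 6 observations belonging to 3 groups (hypothetical values).
codes = np.array([0, 2, 1, 0, 2, 2])
n_obs, n_groups = codes.size, 3

# Indicator matrix Z with a single 1 per row, stored in CSR format.
Z = sparse.csr_matrix(
    (np.ones(n_obs), (np.arange(n_obs), codes)),
    shape=(n_obs, n_groups),
)

u = np.array([0.5, -1.0, 2.0])  # one effect per group

# The sparse product matches plain integer indexing.
assert np.allclose(Z @ u, u[codes])
print(Z.nnz, "stored values instead of", n_obs * n_groups)
```

Only the non-zero entries are stored, so the footprint grows with the number of rows rather than with rows times groups.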
I will stick with the PyMC/patsy hybrid I have up and running for now! I'm relieved to see bambi gives me very similar results when it does build, which is a good sanity check. I'd be happy to try and contribute something with sparse matrices for formulae but will need to study the docs first! Thanks for all your help @tomicapretto
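The indexing trick behind the hybrid approach described in this thread can be sketched with NumPy alone. This is not the poster's actual model: the labels, `sigma`, and the random draws are hypothetical stand-ins, and in PyMC the offsets and scale would be random variables rather than fixed numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical group labels, one per observation.
labels = np.array(["a", "c", "b", "a", "c", "c"])

# Integer codes: position of each label among the sorted unique labels.
groups, codes = np.unique(labels, return_inverse=True)

# Non-centred parameterisation: effect = sigma * z, with z ~ Normal(0, 1).
# In a real model z and sigma would be random variables, not fixed draws.
sigma = 0.7
z = rng.standard_normal(groups.size)
group_effect = sigma * z

# One group effect per observation via indexing, no indicator matrix needed.
per_obs_effect = group_effect[codes]
```

Because the group effect is looked up by integer code, memory scales with the number of observations plus the number of groups, not their product.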
Hi all,
Thank you for the work on this awesome package!
I am using bambi to fit a fairly complex model on a large dataset with a single group effect, which has many individuals. There are around 183,000 rows and 30,542 unique groups. Version 0.3.3 crashes Jupyter reliably when instantiating this design matrix. Interestingly, version 0.2.0 will instantiate it, with the caveat that I can't include an interaction term that I can include in 0.3.3 within bambi (see bambi issue 495).
I've tried a few different approaches, and have found it will instantiate with about 25-50% of the data (using DataFrame.sample(frac=.25)), so it seems more an issue of sheer scale than anything else. I've also tried with Spyder, getting the same issue.
The code below will grab a modified dataset of the same size and structure I am using and set up the model design, which kills my kernel after a minute or two.
Version 0.2.0 will return this after a little while, however. Any help is greatly appreciated; hopefully this issue isn't localised to my machine!