CUDA Out of Memory with GCMC #445
-
Hey, what is your model size, and what is your cutoff? Both greatly influence the number of atoms that can fit on a GPU.
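To make the cutoff dependence concrete, here is a rough back-of-the-envelope sketch. It assumes a uniform number density and assumes that memory use grows roughly with the number of neighbor pairs (edges) inside the cutoff. The function name and the example numbers are hypothetical and are not taken from the original post.

```python
import math


def estimate_edges(n_atoms, cutoff, number_density):
    """Rough count of directed neighbor pairs within the cutoff.

    Assumes a uniform number density (atoms per cubic Angstrom), so each
    atom sees about (4/3) * pi * cutoff**3 * number_density neighbors.
    """
    neighbors_per_atom = (4.0 / 3.0) * math.pi * cutoff**3 * number_density
    return n_atoms * neighbors_per_atom


# Hypothetical numbers: doubling the cutoff multiplies the edge count by 8.
edges_small = estimate_edges(n_atoms=1000, cutoff=3.0, number_density=0.1)
edges_large = estimate_edges(n_atoms=1000, cutoff=6.0, number_density=0.1)
ratio = edges_large / edges_small
```

Because the estimate scales with the cube of the cutoff, a modest increase in cutoff can cost far more memory than the same relative increase in atom count.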
-
Could you please repeat your run without changing the number of atoms in the system (i.e., always reject the GCMC steps that change the number of atoms)? That would show whether the problem comes from the changing atom count.
-
I've encountered a similar problem with an atom-swapping MD/MC code in ASE. When the order of elements is modified between MD steps, the GPU memory slowly but steadily fills up until it runs out. If I just run sequential short MD segments without swapping atoms, this doesn't happen.
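One way to tell a slow, steady leak from a one-off spike is to log the allocated memory after every step and fit a line through the samples. The sketch below uses only the standard library. The sample values are made up, and in a PyTorch-based run the samples could come from something like `torch.cuda.memory_allocated()`; the helper name is hypothetical.

```python
def memory_growth_per_step(samples):
    """Least-squares slope of memory usage versus step index.

    A slope that stays clearly positive over many steps suggests that
    memory is accumulating rather than fluctuating around a fixed level.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a slope")
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Made-up samples in MB: the first series grows steadily, the second is flat.
leaking = [100.0, 110.0, 120.0, 130.0]
stable = [100.0, 100.0, 100.0, 100.0]
leak_slope = memory_growth_per_step(leaking)
stable_slope = memory_growth_per_step(stable)
```

Comparing the slope for a run with swaps against a run without them would make the difference described above measurable.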
-
If @jmargraf is having the same problem, it's (hopefully) not a LAMMPS issue.
-
Some more insight: the problem seems to come from making copies of the MACE calculator (or deep copies of Atoms objects, which amounts to the same thing). I have now modified the code to swap atom positions instead of modifying the element order, and changed everything to operate in place on a single Atoms object. This seems to solve the issue. I'm not sure how to deal with this more generally, though, for cases where I do want to make copies of Atoms objects.
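The in-place swap described above can be sketched as follows. This is a minimal illustration with plain lists as stand-in data, not the code from the thread. With ASE, the assumption is that `atoms.positions` would be passed instead, since that attribute exposes the underlying array rather than a copy. The function name is hypothetical.

```python
def swap_positions_in_place(positions, i, j):
    """Exchange the coordinates of atoms i and j inside the same container.

    No new container is created, so the element order and any object
    holding a reference to `positions` stay untouched.
    """
    for k in range(len(positions[i])):
        positions[i][k], positions[j][k] = positions[j][k], positions[i][k]


# Stand-in coordinates for three atoms (hypothetical values).
positions = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
same_container = positions
swap_positions_in_place(positions, 0, 2)
```

Swapping coordinates instead of chemical symbols keeps the element order fixed, which is the change that reportedly stopped the memory growth here.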
-
Hi, I am trying to run Grand Canonical Monte Carlo (GCMC) for water adsorption in silica using MACE. The issue is that I get a CUDA out of memory error after a few thousand steps (~4000 with one GPU). I also tried using more than one GPU, but I still get the CUDA out of memory error. I am using A100 GPUs with 64 GB of memory.
Input
This is the input file that I am using. Note that I specify the bond and angle styles as harmonic but set their coefficients to 0, so that I can run the MC with the water molecule. As a result, the energies that I get are only the MACE-predicted energies.
Running environment
I used the following specifications to build lammps-mace:
When I run, I load these modules:
Error message
This is the error message that I get
I am not sure why I get a CUDA out of memory error even though my system is small. I would appreciate any insights, thanks!