Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Parallelization of Model Creation #76

Open
adityakavalur opened this issue Sep 27, 2022 · 4 comments
Open

Feature request: Parallelization of Model Creation #76

adityakavalur opened this issue Sep 27, 2022 · 4 comments

Comments

@adityakavalur
Copy link
Contributor

I am trying to develop a physics-based model: LJ potential using OpenKIM LJ_ElliottAkerson_2015 and DFT data that runs into 1000s of configuration snapshots with 100s of atoms in each and it seems to take a long time.

I am playing around with a smaller problem i.e. smaller dataset and severely restricted number of minimization iterations and based on the profiling results it looks like _update_neigh in kliff/models/kim.py is the main bottleneck. However, the calc.create command that calls this is not parallelized.

So my questions are as follows:

  1. Have you all faced this before?
  2. Is there a reason why this is not parallelized; If no - then is this on your TODO list?
  3. If you are not currently working on this, would you accept an MR?

For the above point 3, I poked around a little bit and it seems like using the parmap1 function on KIMComputeArguments here would probably be the easiest way to proceed.

Thanks for developing this utility! It is really nifty and will hopefully standardize this aspect of atomistic modeling.
Please let me know if anything above is unclear!

@adityakavalur adityakavalur changed the title Parallelization of Model Creation Feature request: Parallelization of Model Creation Sep 27, 2022
@mjwen
Copy link
Collaborator

mjwen commented Sep 27, 2022

Right, it is not parallelized. The reason is that this only needs to be done once at the beginning, before any optimization happens. For LJ, it is probably not ideal, because LJ is cheap to evaluate.

A MR is very welcome and your suggestion seems the right way to go!

@adityakavalur
Copy link
Contributor Author

adityakavalur commented Oct 4, 2022

Hi @mjwen thanks for the reply
Unfortunately, I have run into a blocker that I need some help with. I provide details below if you want a working example but the crux of the situation is that I run into the error
TypeError: can't pickle kimpy.compute_arguments.PyComputeArguments objects when I try to parallelize it, i.e. the object class that is being returned here is not serializable as is. I tried to keep modifications to a minimum so I used parmap2 to create parmap21 so that multiple non-sequential arguments can be iterated. However, if we are not able to send the entire object class across a pipe, there seems to be 2 options

  1. Make the function parmap21 very non-generic, i.e. intialize all the classes on the primary process and try to get back only the neighbor list, which is the expensive computation
  2. Use get_state /save_state to severely reduce the object being sent across the pipe something leveraging this

Both of these are of course not ideal. If you can think of a better way please let me know. I am not a Python expert by any stretch of the imagination.

How-to Reproduce the problem:

  1. Use my fork https://github.com/adityakavalur/kliff
  2. Use the below modified KLIFF tutorial example
"""
.. _tut_lj:

Train a Lennard-Jones potential
===============================

In this tutorial, we train a Lennard-Jones potential that is build in KLIFF (i.e. not
models archived on OpenKIM_). From a user's perspective, a KLIFF built-in model is not
different from a KIM model.

Compare this with :ref:`tut_kim_sw`.

.. _OpenKIM: https://openkim.org
"""
from kliff.calculators import Calculator
from kliff.dataset import Dataset
from kliff.loss import Loss
from kliff.models import LennardJones
from kliff.utils import download_dataset

from kliff.models import KIMModel

# training set
dataset_path = download_dataset(dataset_name="Si_training_set_4_configs")
tset = Dataset(dataset_path)
configs = tset.get_configs()

# calculator
model = KIMModel(model_name="LJ_ElliottAkerson_2015")


# fitting parameters
epslist = [[0.0,"fix"]]*9730
siglist = [[0.0,"fix"]]*9730
cutlist = [[0.0,"fix"]]*9730

epslist[1729] = ["default", 0.001, None]
siglist[1729] = ["default", 1.977, None]
cutlist[1729] = ["default", "fix"]
model.set_opt_params(cutoffs=cutlist,sigmas=siglist, epsilons=epslist)



calc = Calculator(model)
#calc.create(configs, use_energy=True, use_forces=False)
calc.create(configs, use_energy=True, use_forces=False, nprocs=2)

# loss
loss = Loss(calc, nprocs=1)
result = loss.minimize(method="L-BFGS-B", options={"disp": True, "maxiter": 10})


# print optimized parameters
model.save("kliff_model.yaml")

@mjwen
Copy link
Collaborator

mjwen commented Oct 7, 2022

@adityakavalur thanks for the detailed info!

The PyComputeArguments is essentially a C++ object, and we wrapped it using pybind11 to create a python interface. I just checked pybind11 quickly, and it seems it can be made picklable. @yafshar do you have experience with this?

@adityakavalur
Copy link
Contributor Author

@mjwen thanks for the helpful comment! Looking at pickling example in pybind11 here. It looks like they are also using savestate and getstate as I mentioned above but they are adding all the members of the object to a tuple and sending that across, so there is no expected loss of information. I hadnt thought to do that. Let me see if I can get this to work, there will still be a level of non-generality to the new function parmap21 but its a lot lesser than what I initially thought

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants