how to run meep remotely? #2071

joamatab · 2022-05-12T17:21:32Z

joamatab
May 12, 2022

What is the recommended way to run meep on a remote server?

Some ideas for plugins or demos to add to Meep:

GCP announced Cloud Run jobs: https://cloud.google.com/run/docs/overview/what-is-cloud-run#jobs

We could add a silicon photonics RAD Lab demo that runs Meep on the cloud:
https://github.com/GoogleCloudPlatform/rad-lab/tree/main/modules/silicon_design

@proppy
@flaport
@simbilod
@HelgeGehring

joamatab · 2022-05-12T17:44:27Z

joamatab
May 12, 2022
Author

Also, it would be nice to include some examples of running Meep in Jupyter notebooks, as Professor Alex Tait (@atait) did:
https://github.com/atait/jupyter-meep

0 replies

stevengj · 2022-05-12T23:08:50Z

stevengj
May 12, 2022
Maintainer

What is the recommended way to run meep on a remote server?

Just log onto the remote server (with ssh or similar) and then run python my-meep-script.py?

Do you mean interactively with Jupyter notebooks? You can do this in a variety of ways, e.g. by tunneling through ssh.
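
As a rough illustration of the ssh tunneling option, here is a minimal sketch that builds the port-forwarding command. It assumes a Jupyter server is already running on the remote machine (for example, started with `jupyter notebook --no-browser --port=8888`). The host name, user name, and the helper `ssh_tunnel_command` are hypothetical and not part of Meep.

```python
import shlex


def ssh_tunnel_command(host, user, remote_port=8888, local_port=8888):
    """Build an ssh command that forwards a local port to a remote Jupyter server.

    Hypothetical helper for illustration; it only assembles the argument list.
    """
    return [
        "ssh",
        "-N",  # forward ports only, do not run a remote command
        "-L", f"{local_port}:localhost:{remote_port}",  # local -> remote forwarding
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Placeholder host and user; replace with your own server details.
    cmd = ssh_tunnel_command("cluster.example.org", "alice")
    print(shlex.join(cmd))
```

Running the printed command on your local machine and then opening `http://localhost:8888` in a browser should reach the remote notebook, provided the ports match.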

0 replies

stevengj · 2022-05-12T23:09:49Z

stevengj
May 12, 2022
Maintainer

@oskooi has a lot of experience running Meep on AWS and similar cloud servers, if that's what you mean.

0 replies

smartalecH · 2022-05-12T23:53:54Z

smartalecH
May 12, 2022
Collaborator

I think @joamatab is referring to running meep on systems (like Google Cloud) where you may not have access to the same hardware for an extended period of time (similar to our discussion with @ianwilliamson some time ago). Specifically, platforms like this often require a slightly more "dynamic" job manager (running mpirun python3 my_script.py won't cut it here).

Although even with Google Cloud, you should be able to run your script on multiple processes using MPI. I've had success with this on a variety of (commercial) clusters.

2 replies

ianwilliamson May 13, 2022

I think @joamatab is referring to running meep on systems (like Google Cloud) where you may not have access to the same hardware for an extended period of time (similar to our discussion with @ianwilliamson some time ago). Specifically, platforms like this often require a slightly more "dynamic" job manager (running mpirun python3 my_script.py won't cut it here).

HPC-oriented VMs are available on GCP: https://cloud.google.com/compute/docs/instances/create-hpc-vm. These would probably work better for Meep than Spot (preemptible) VMs (https://cloud.google.com/compute/docs/instances/spot), since MPI is quite brittle to worker failure.

joamatab May 16, 2022
Author

Yes. I have also been using tidy3d, which has a powerful client-server architecture for running simulations remotely, and I was wondering what you all think about doing something similar with Meep.

oskooi · 2022-05-16T06:55:38Z

oskooi
May 16, 2022
Collaborator

In practice, the statement that "MPI is quite brittle to worker failure" is typically an issue only when the cluster is configured with a large number of virtual machines (i.e., ~10 or more) running for a long period of time (i.e., several hours or days). The larger the number of VMs and the longer they run, the higher the probability that one of them will be preempted. Note that, in any circumstance in which the job may need to be restarted, Meep supports checkpointing the simulation state. Thus, for MPI jobs in the public cloud, it is good practice to set up your job to periodically dump its state to persistent storage.
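
A minimal sketch of such a periodic dump loop is shown below. It assumes a Meep version whose `Simulation` object provides `dump(dirname)`, and that a numeric `until` passed to `run` means additional simulation time; please check both against the documentation for your installed version. The helper `run_with_checkpoints` and the `FakeSim` stand-in are hypothetical and exist only so the sketch runs without Meep installed.

```python
def run_with_checkpoints(sim, total_time, interval, dirname):
    """Advance `sim` by `total_time`, dumping its state after every segment.

    `sim` is any object with `run(until=...)` and `dump(dirname)` methods.
    Returns the number of checkpoints written.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    elapsed = 0.0
    checkpoints = 0
    while elapsed < total_time:
        step = min(interval, total_time - elapsed)
        sim.run(until=step)  # assumed: numeric `until` is additional time
        sim.dump(dirname)    # assumed: Simulation.dump(dirname) is available
        elapsed += step
        checkpoints += 1
    return checkpoints


class FakeSim:
    """Stand-in for a Meep simulation so the sketch is self-contained."""

    def __init__(self):
        self.time = 0.0
        self.dumps = []

    def run(self, until):
        self.time += until

    def dump(self, dirname):
        self.dumps.append((dirname, self.time))


if __name__ == "__main__":
    sim = FakeSim()
    count = run_with_checkpoints(sim, total_time=250, interval=100, dirname="checkpoints")
    print(count, sim.dumps)
```

In a real job, `dirname` would point at persistent storage (a mounted disk or bucket, for instance), so that a restarted job can load the most recent state instead of starting over.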

Combined with this checkpointing, the absence of fault tolerance in the MPI standard can be mitigated in most public clouds in another simple way: for jobs with a fixed number of Meep chunks (or CPU cores), reconfigure the cluster to use a smaller number of nodes with a larger number of vCPUs per node. A large degree of runtime customization is a key feature of all major public clouds. As an example using GCP, a parallel simulation requiring 224 cores or MPI processes can be executed using either (1) two nodes of c2d-highmem-112 or (2) seven nodes of c2d-highmem-32. Either configuration provides ~1.8 TB of memory in total, which is probably sufficient to support a large, high-resolution 3D simulation. (Note that even though #1628 added support for thread-level parallelism in the field-update loops during the timestepping, we have not yet been able to demonstrate a noticeable speedup in benchmarking studies. This means that, for the time being, it may be better to disable hyperthreading for improved runtime performance.) In general, using fewer nodes is also advantageous because it reduces the impact of any bottlenecks due to network communication. For a discussion including results from benchmarking studies on the impact of communication vs. computation on the runtime of parallel Meep simulations on AWS, see https://arxiv.org/abs/2003.04287. Also related to improving runtime performance, I should point out the recent feature to enable dynamic chunk balancing.
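
The node-count arithmetic in the GCP example can be checked with a short sketch. The vCPU and memory figures in the table are assumptions based on the machine-type names (8 GB per vCPU for c2d-highmem) and should be verified against the current GCP documentation. The helper `cluster_plan` is hypothetical.

```python
import math

# Assumed machine shapes: (vCPUs, memory in GB). Verify against the GCP docs.
MACHINE_TYPES = {
    "c2d-highmem-112": (112, 896),
    "c2d-highmem-32": (32, 256),
}


def cluster_plan(machine_type, total_cores):
    """Return the node count, total vCPUs, and total memory for a core target."""
    vcpus, mem_gb = MACHINE_TYPES[machine_type]
    nodes = math.ceil(total_cores / vcpus)  # round up to whole nodes
    return {
        "nodes": nodes,
        "vcpus": nodes * vcpus,
        "memory_gb": nodes * mem_gb,
    }


if __name__ == "__main__":
    for name in MACHINE_TYPES:
        print(name, cluster_plan(name, total_cores=224))
```

Under these assumed shapes, both options give the same total vCPU count and memory, so the choice comes down to the number of nodes and therefore how much inter-node communication there is.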

Based on this, I think practically all the pieces are there now to be able to scale up Meep jobs in the public cloud to arbitrary size problems.

0 replies
