In the PyTorch DDP tutorial (https://pytorch.org/tutorials/intermediate/ddp_tutorial.html), I noticed that they use the "Save and Load Checkpoints" pattern to synchronize the models across the different processes.
So I would like to know whether there are any implicit synchronization mechanisms in your distributed_tutorial code.
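For reference, the pattern from the tutorial that I mean looks roughly like this. This is only a minimal sketch, not the tutorial's exact code: the `nn.Linear` stand-in model, the checkpoint path, and the function/port names are my own placeholders, and it assumes one GPU per rank.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def checkpoint_sync_demo(rank, world_size):
    # Rough reconstruction of the tutorial's checkpoint-based synchronization.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Stand-in for the tutorial's model.
    model = nn.Linear(10, 10).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    ckpt_path = os.path.join(tempfile.gettempdir(), "ddp_checkpoint.pt")

    if rank == 0:
        # Only rank 0 writes the checkpoint, to avoid concurrent writes.
        torch.save(ddp_model.state_dict(), ckpt_path)

    # Explicit synchronization: make sure the file exists before the other
    # ranks try to read it.
    dist.barrier()

    # map_location remaps tensors saved from rank 0's GPU onto this rank's GPU.
    map_location = {"cuda:0": f"cuda:{rank}"}
    ddp_model.load_state_dict(torch.load(ckpt_path, map_location=map_location))

    # Second barrier so rank 0 does not delete the file while others are still reading it.
    dist.barrier()
    if rank == 0:
        os.remove(ckpt_path)

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(checkpoint_sync_demo, args=(world_size,), nprocs=world_size, join=True)
```

As I understand it, DDP also broadcasts the rank-0 parameters to all ranks when the wrapper is constructed, so I am not sure whether the checkpoint save/load is strictly required for the initial synchronization, or whether your code relies on that implicit broadcast instead.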
Hello @tbwxmu, did you figure this out?
Does anyone have a solution for this?