Fix minor grammatical errors in docs #1181

Open · wants to merge 2 commits into main
docs/source/deep_dive/oss_sdp_fsdp.rst (2 changes: 1 addition & 1 deletion)
@@ -7,7 +7,7 @@ that aim to tackle the tradeoff between using Data Parallel training and Model P
 When using Data Parallel training, you tradeoff memory for computation/communication efficiency.
 On the other hand, when using Model Parallel training, you tradeoff computation/communication
 efficiency for memory. ZeRO attempts to solve this problem. Model training generally involves memory
-footprints that falls into two categories:
+footprints that fall into two categories:
 
 1. Model states - optimizer states, gradients, parameters
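The context above describes the ZeRO tradeoff these docs cover. As a minimal sketch of how fairscale handles the "model states" category, its OSS wrapper shards optimizer state across data-parallel ranks; the process-group backend, model, and hyperparameters below are hypothetical, and the script is assumed to run under a distributed launcher.

import torch
import torch.distributed as dist
import torch.nn as nn
from fairscale.optim.oss import OSS

# Assumes launch via a distributed launcher (e.g. torchrun); the backend
# choice here is illustrative.
dist.init_process_group(backend="nccl")

# Hypothetical model; in practice this is the model being trained.
model = nn.Linear(1024, 1024).cuda()

# OSS wraps a regular optimizer and shards its state (one of the "model
# states" listed above) across the data-parallel ranks, so each rank stores
# only its own slice instead of a full replica.
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=0.01)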
docs/source/deep_dive/pipeline_parallelism.rst (2 changes: 1 addition & 1 deletion)
@@ -12,7 +12,7 @@ Gpipe first shards the model across different devices where each device hosts a
 A shard can be a single layer or a series of layers. However Gpipe splits a mini-batch of data into
 micro-batches and feeds it to the device hosting the first shard. The layers on each device process
 the micro-batches and send the output to the following shard/device. In the meantime it is ready to
-process the micro batch from the previous shard/device. By pipepling the input in this way, Gpipe is
+process the micro batch from the previous shard/device. By pipelining the input in this way, Gpipe is
 able to reduce the idle time of devices.
 
 Best practices for using `fairscale.nn.Pipe`
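As an illustrative sketch of the micro-batching scheme described in the context above, assuming at least two available CUDA devices; the layer sizes, balance, and chunk count are hypothetical, not a prescribed configuration.

import torch
import torch.nn as nn
from fairscale.nn import Pipe

# Hypothetical four-layer model; Pipe expects an nn.Sequential so it can be
# split into contiguous shards.
model = nn.Sequential(
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)

# balance=[2, 2] places two layers on each of two devices, and chunks=4
# splits each mini-batch into four micro-batches that flow through the
# shards in a pipeline, reducing device idle time as described above.
model = Pipe(model, balance=[2, 2], chunks=4)

# The input must live on the device hosting the first shard.
output = model(torch.rand(32, 256).cuda(0))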