Rollback ddp_tutorial fix #1621


Closed · wants to merge 4 commits
intermediate_source/ddp_tutorial.rst (14 changes: 8 additions & 6 deletions)
@@ -265,8 +265,8 @@ either the application or the model ``forward()`` method.
     setup(rank, world_size)

     # setup mp_model and devices for this process
-    dev0 = (rank * 2) % world_size
-    dev1 = (rank * 2 + 1) % world_size
+    dev0 = rank * 2
@bowangbj (Contributor, Author) commented on Jul 28, 2021:

@mrshenli

Copied the conversation from #1618:

Shen:
Unfortunately, we cannot do this. If there are just two GPUs, this makes rank 0 and rank 1 share them. DDP and collective communication require each process to work on its GPUs exclusively; otherwise, the communication might hang.
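
To make the sharing concrete, here is a small standalone sketch (illustration only, not part of the tutorial) of the device pairs the modulo formula produces on a 2-GPU machine when `world_size` equals the number of GPUs:

```python
# Hypothetical illustration: device assignment produced by the modulo formula
# on a machine with 2 GPUs, assuming world_size = n_gpus = 2.
world_size = 2

for rank in range(world_size):
    dev0 = (rank * 2) % world_size
    dev1 = (rank * 2 + 1) % world_size
    print(f"rank {rank}: cuda:{dev0}, cuda:{dev1}")

# Output:
# rank 0: cuda:0, cuda:1
# rank 1: cuda:0, cuda:1   <- both ranks share the same two GPUs
```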

Bo:
The issue I hit is the following. I tried to run the code on a 2-GPU machine:

For rank 0: dev0=0, dev1=1
For rank 1: dev0=2, dev1=3

With that I got an exception about invalid device ordinals, since only cuda:0 and cuda:1 are valid, but rank 1 ends up with cuda:2 and cuda:3.

Any idea how to avoid such an exception?

Contributor replied:

Looks like only demo_model_parallel needs >= 4 GPUs. Maybe we can do the following (see the sketch after this list)?

  1. Skip demo_model_parallel when there are fewer than 4 GPUs.
  2. Pass ngpus / 2 as world_size.
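
A minimal sketch of what those two points could look like in the tutorial's `__main__` block; the exact code merged in this PR may differ, and `run_demo`, `demo_basic`, `demo_checkpoint`, and `demo_model_parallel` are assumed to be the functions defined earlier in the tutorial:

```python
import torch

if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    assert n_gpus >= 2, f"Requires at least 2 GPUs to run, but got {n_gpus}"

    # demo_basic and demo_checkpoint need only one GPU per process.
    run_demo(demo_basic, n_gpus)
    run_demo(demo_checkpoint, n_gpus)

    # demo_model_parallel uses two GPUs per process, so it needs >= 4 GPUs
    # (point 1) and a world size of n_gpus // 2 (point 2).
    if n_gpus < 4:
        print("Skipped demo_model_parallel since it requires >= 4 GPUs.")
    else:
        run_demo(demo_model_parallel, n_gpus // 2)
```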

@bowangbj (Contributor, Author) replied:

SG, changed as suggested

+    dev1 = rank * 2 + 1
     mp_model = ToyMpModel(dev0, dev1)
     ddp_mp_model = DDP(mp_model)

@@ -286,7 +286,9 @@ either the application or the model ``forward()`` method.
 if __name__ == "__main__":
     n_gpus = torch.cuda.device_count()
     assert n_gpus >= 2, f"Requires at least 2 GPUs to run, but got {n_gpus}"
-    world_size = n_gpus
-    run_demo(demo_basic, world_size)
-    run_demo(demo_checkpoint, world_size)
-    run_demo(demo_model_parallel, world_size)
+    run_demo(demo_basic, n_gpus)
+    run_demo(demo_checkpoint, n_gpus)
+    if n_gpus < 4:
+        print("Skipped demo_model_parallel since it requires >= 4 GPUs.")
+    else:
+        run_demo(demo_model_parallel, world_size)
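
For context, a condensed, self-contained sketch of how `dev0` and `dev1` are consumed by the model-parallel demo. This is an approximation, not the tutorial's exact code: `setup()`/`cleanup()` stand in for the tutorial's helpers, the `ToyMpModel` layer sizes are illustrative, and `world_size` is assumed to be `n_gpus // 2` (as suggested in the review thread), since the diff above removes the `world_size = n_gpus` assignment:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP


def setup(rank, world_size):
    # Stand-in for the tutorial's helper: join one process group across ranks.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "12355")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)


def cleanup():
    dist.destroy_process_group()


class ToyMpModel(nn.Module):
    # Approximation of the tutorial's two-device toy model.
    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0 = dev0
        self.dev1 = dev1
        self.net1 = nn.Linear(10, 10).to(dev0)  # first half lives on dev0
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5).to(dev1)   # second half lives on dev1

    def forward(self, x):
        x = self.relu(self.net1(x.to(self.dev0)))
        return self.net2(x.to(self.dev1))       # activations hop from dev0 to dev1


def demo_model_parallel(rank, world_size):
    setup(rank, world_size)

    # Each rank owns a disjoint pair of GPUs, hence world_size = n_gpus // 2.
    dev0 = rank * 2
    dev1 = rank * 2 + 1
    mp_model = ToyMpModel(dev0, dev1)
    ddp_mp_model = DDP(mp_model)  # no device_ids: the module already spans two devices

    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(ddp_mp_model.parameters(), lr=0.001)

    outputs = ddp_mp_model(torch.randn(20, 10))
    labels = torch.randn(20, 5).to(dev1)        # outputs land on dev1
    loss_fn(outputs, labels).backward()
    optimizer.step()

    cleanup()
```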