Rollback ddp_tutorial fix #1621
Conversation
Summary: #1618 was merged unintentionally before all the comments were resolved. Sending out the CL to roll it back. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
```diff
@@ -265,8 +265,8 @@ either the application or the model ``forward()`` method.
     setup(rank, world_size)

     # setup mp_model and devices for this process
-    dev0 = (rank * 2) % world_size
-    dev1 = (rank * 2 + 1) % world_size
+    dev0 = rank * 2
+    dev1 = rank * 2 + 1
```
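For context, a minimal sketch of how the surrounding `demo_model_parallel` function reads after the rollback; `setup`, `cleanup`, and the two-GPU `ToyMpModel` are assumed from the rest of the tutorial, and exact tensor shapes may differ:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

def demo_model_parallel(rank, world_size):
    print(f"Running DDP with model parallel example on rank {rank}.")
    setup(rank, world_size)  # assumed tutorial helper: init_process_group, etc.

    # After the rollback each rank claims its own pair of GPUs,
    # so this demo needs at least 2 * world_size GPUs.
    dev0 = rank * 2
    dev1 = rank * 2 + 1
    mp_model = ToyMpModel(dev0, dev1)  # assumed two-GPU toy model from the tutorial
    ddp_mp_model = DDP(mp_model)       # no device_ids for a multi-device module

    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(ddp_mp_model.parameters(), lr=0.001)

    optimizer.zero_grad()
    # The toy model's forward moves activations from dev0 to dev1,
    # so outputs (and labels) live on dev1.
    outputs = ddp_mp_model(torch.randn(20, 10))
    labels = torch.randn(20, 5).to(dev1)
    loss_fn(outputs, labels).backward()
    optimizer.step()

    cleanup()  # assumed tutorial helper: destroy_process_group
```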
Copied the conversation from #1618:

Shen:
Unfortunately, we cannot do this. If there are just two GPUs, this will make rank 0 and rank 1 share them. DDP and collective communication require each process to work exclusively on its own GPUs; otherwise the communication might hang.
Bo:
The issue I hit is this: I tried to run the code on a 2-GPU machine.
For rank 0: dev0=0, dev1=1
For rank 1: dev0=2, dev1=3
With that I got an exception about invalid device ordinals, since only cuda:0 and cuda:1 are valid, but rank 1 ends up with cuda:2 and cuda:3.
Any idea how to avoid that exception?
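An editorial illustration of both problems (not tutorial code): on a 2-GPU machine with two ranks, the modulo mapping from #1618 puts both ranks on cuda:0/cuda:1 (Shen's concern about shared GPUs), while the mapping restored by this PR sends rank 1 to cuda:2/cuda:3, which do not exist on that machine (Bo's exception):

```python
world_size = 2  # two processes on a 2-GPU machine

for rank in range(world_size):
    # mapping from #1618 (rolled back here): wraps around with % world_size
    dev0_mod = (rank * 2) % world_size
    dev1_mod = (rank * 2 + 1) % world_size
    # mapping restored by this PR: each rank gets its own GPU pair
    dev0 = rank * 2
    dev1 = rank * 2 + 1
    print(f"rank {rank}: modulo -> cuda:{dev0_mod}, cuda:{dev1_mod}; "
          f"restored -> cuda:{dev0}, cuda:{dev1}")

# Output:
# rank 0: modulo -> cuda:0, cuda:1; restored -> cuda:0, cuda:1
# rank 1: modulo -> cuda:0, cuda:1; restored -> cuda:2, cuda:3
```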
Looks like only demo_model_parallel needs >= 4 GPUs. Maybe we can do the following (sketched below)?
- skip demo_model_parallel when there are less than 4 GPUs
- pass ngpus / 2 as world_size
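A minimal sketch of that suggestion for the tutorial's launcher, assuming the tutorial's `run_demo` helper and demo functions; the final fix may differ in details:

```python
import torch

if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    if n_gpus < 2:
        print(f"Requires at least 2 GPUs to run, but got {n_gpus}.")
    else:
        run_demo(demo_basic, n_gpus)       # assumed tutorial helper and demos
        run_demo(demo_checkpoint, n_gpus)
        # demo_model_parallel uses two GPUs per process, so it needs >= 4 GPUs;
        # skip it otherwise, and spawn only n_gpus // 2 processes when it runs.
        if n_gpus >= 4:
            run_demo(demo_model_parallel, n_gpus // 2)
        else:
            print("Skipping demo_model_parallel: it requires at least 4 GPUs.")
```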
SG, changed as suggested
Continued discussion of #1618 here.
Thanks for fixing!
Summary: #1618 was merged unintentionally before all the comments were resolved. Also did a minor fix to skip demo_model_parallel when ngpus < 4. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Thanks Shen for the quick review, fixed as commented, PTAL
Summary: See pytorch#1621 Test Plan: Reviewers: Subscribers: Tasks: Tags:
Closing this PR, which is subsumed by #1641.
Stack from ghstack:
Summary:
#1618 was merged unintentionally before all the comments were resolved.
Also did a minor fix to skip demo_model_parallel when ngpus < 4.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: