
💡 [REQUEST] - <GETTING STARTED WITH FULLY SHARDED DATA PARALLEL(FSDP)> #2613

Closed
@Wesleystormrage

Description


🚀 Describe the improvement or the new tutorial

After reading the "How FSDP works" section of https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html?highlight=fsdp, I still couldn't figure out how FSDP works, because the tutorial doesn't explain all-gather and reduce-scatter, which I believe are the key concepts in FSDP.
This article, on the other hand, helped me: https://engineering.fb.com/2021/07/15/open-source/fsdp/

The key passage, I believe, is: "The key insight to unlock full parameter sharding is that we can decompose the all-reduce operations in DDP into separate reduce-scatter and all-gather operations:"
[Image from the article: all-reduce decomposed into reduce-scatter and all-gather]
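To illustrate the decomposition quoted above, here is a minimal pure-Python sketch that simulates the three collectives on in-memory lists. The function names and the two-rank setup are hypothetical and for illustration only; this is not the torch.distributed API, and no real communication happens. It shows that a reduce-scatter followed by an all-gather gives every rank the same result as a single all-reduce. Between the two steps, each rank holds only its own shard of the summed gradients, which is roughly what the article describes FSDP exploiting so that each rank only has to update and store its own shard.

```python
# Pure-Python simulation of collectives across N "ranks".
# Each rank holds a flat list of numbers. Assumption (hypothetical setup):
# the list length is divisible by the number of ranks, as with evenly
# sharded tensors.

def all_reduce(per_rank):
    """Every rank ends up with the element-wise sum of all ranks' data."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]

def reduce_scatter(per_rank):
    """Sum element-wise across ranks, then give rank i only the i-th shard."""
    world = len(per_rank)
    total = [sum(vals) for vals in zip(*per_rank)]
    shard = len(total) // world
    return [total[i * shard:(i + 1) * shard] for i in range(world)]

def all_gather(shards):
    """Concatenate every rank's shard; every rank receives the full result."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]

# Gradients computed locally on 2 ranks (e.g. from different data batches).
grads = [
    [1.0, 2.0, 3.0, 4.0],      # rank 0
    [10.0, 20.0, 30.0, 40.0],  # rank 1
]

shards = reduce_scatter(grads)
print(shards)              # [[11.0, 22.0], [33.0, 44.0]]
print(all_gather(shards))  # [[11.0, 22.0, 33.0, 44.0], [11.0, 22.0, 33.0, 44.0]]

# reduce-scatter followed by all-gather == all-reduce
assert all_gather(shards) == all_reduce(grads)
```

In real PyTorch code, the corresponding collectives are `torch.distributed.all_reduce`, `torch.distributed.reduce_scatter_tensor`, and `torch.distributed.all_gather_into_tensor`; the simulation above only mirrors their semantics, not their signatures.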

I think adding an explanation of this decomposition to the tutorial would greatly help readers understand FSDP.

Existing tutorials on this topic

GETTING STARTED WITH FULLY SHARDED DATA PARALLEL(FSDP)
https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html?highlight=fsdp

Additional context

No response

cc @wconstab @osalpekar @H-Huang @kwen2501 @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen
