
[FX] Added fuser tutorial #1356


Merged · 9 commits into pytorch:1.8-RC5-TEST · Mar 4, 2021

Conversation

@Chillee Chillee (Contributor) commented Feb 11, 2021

Not sure how to test it in notebook format.

Also, perhaps I'd like to bake in the output somehow? It would be somewhat embarrassing if the fused version was slower due to noise :)
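One way to reduce that timing noise is to warm up and average over many iterations instead of timing a single call. A minimal sketch (the `bench` helper is illustrative, not part of the tutorial; `model`, `fused_model`, and `inp` are assumed to come from the tutorial code):

```python
import time

def bench(fn, inp, warmup=10, iters=100):
    # Warm-up runs amortize one-time costs (allocator, caches, lazy init).
    for _ in range(warmup):
        fn(inp)
    start = time.time()
    for _ in range(iters):
        fn(inp)
    return (time.time() - start) / iters  # average seconds per call

# e.g.: print(bench(model, inp), bench(fused_model, inp))
```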

@netlify

netlify bot commented Feb 11, 2021

Deploy preview for pytorch-tutorials-preview ready!

Built with commit 14a7913

https://deploy-preview-1356--pytorch-tutorials-preview.netlify.app

index.rst Outdated
.. Code Transformations with FX
.. customcarditem::
:header: Building a Convolution/Batch Norm fuser in FX
:card_description: Build a simple FX interpreter to record the runtime of op, module, and function calls and report statistics
Contributor

Are the card description and images correct? They look like they belong to other tutorials.

Contributor Author

Description is wrong, but not sure what to put for the image. @jamesr66a is there any reason you chose this image for the performance profiling for FX? https://github.com/pytorch/tutorials/pull/1319/files#diff-54a294a5d016e1a8e98bc95668ed84a99a9edd5c10394d9a2b1ee848006e98a7R223


I put that because it's just a generic PyTorch logo:
[image: PyTorch logo]

And I wasn't sure if I should copy that into a new filename or if we should come up with some logo to use.

@@ -215,6 +215,15 @@ Welcome to PyTorch Tutorials
:link: advanced/super_resolution_with_onnxruntime.html
:tags: Production

.. Code Transformations with FX
.. customcarditem::
:header: Building a Convolution/Batch Norm fuser in FX
Contributor

I've seen this technique more commonly referred to as "folding", but both make sense (https://towardsdatascience.com/speed-up-inference-with-batch-normalization-folding-8a45a83a89d8; https://arxiv.org/abs/1611.09842 calls it "absorbing").

Might be nice to use different terminology in case we want to add a "first class" fusion tutorial later that e.g. directly calls into NNC.

Contributor Author

I agree the terminology is confusing, but I think fusion is an acceptable (and more widely understood) term. If we add a fusion tutorial later, I'd be glad to rename this one to avoid the conflict.
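For reference, whatever the name (fusion, folding, absorbing), the transformation rewrites the conv's weights so that the BatchNorm becomes a no-op at inference time. A minimal sketch of the arithmetic, not the tutorial's actual helper (the function name and signature here are illustrative):

```python
import torch

def fold_bn_into_conv(conv_w, conv_b, bn_rm, bn_rv, bn_w, bn_b, eps=1e-5):
    # In eval mode: BN(conv(x)) = bn_w * (W*x + b - bn_rm) / sqrt(bn_rv + eps) + bn_b,
    # which is exactly a conv with per-output-channel rescaled weights and a shifted bias.
    scale = bn_w * torch.rsqrt(bn_rv + eps)
    fused_w = conv_w * scale.reshape(-1, 1, 1, 1)
    fused_b = (conv_b - bn_rm) * scale + bn_b
    return fused_w, fused_b
```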

@Chillee Chillee requested a review from jamesr66a February 16, 2021 10:38
Base automatically changed from master to main February 16, 2021 19:33
Base automatically changed from main to master February 16, 2021 19:37
@jamesr66a jamesr66a left a comment

Looks good to me!


# accessing the computational graph. FX resolves this problem by symbolically
# tracing the actual operations called, so that we can track the computations
# through the `forward` call, nested within Sequential modules, or wrapped in
# an user-defined module.


nit: a user-defined module

Contributor Author

🤔 Shouldn't it be an before user? Since user starts with a vowel?

Contributor

The rule is based on the sound, not the letter: "user" starts with a consonant sound (/j/, as in "you"), so it takes "a".
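Back to the quoted passage: it describes `torch.fx.symbolic_trace`, which records the ops a `forward` call executes into a graph, even through `Sequential` containers. A small illustration with a toy module of my own, not the tutorial's model:

```python
import torch
import torch.fx

class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = torch.nn.Sequential(  # ops nested in a Sequential are still captured
            torch.nn.Conv2d(1, 1, 1),
            torch.nn.BatchNorm2d(1),
        )

    def forward(self, x):
        return self.seq(x)

traced = torch.fx.symbolic_trace(Toy())
print(traced.graph)  # the recorded call_module nodes
print(traced.code)   # the Python that FX regenerates from the graph
```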


fused_model = fuse(model)
print(fused_model.code)
inp = torch.randn(5, 1, 1, 1)


Should we run this on a more realistic input shape?

Contributor Author

I just wrote all the conv/batch norm modules to operate on a [1,1,1] shape. We're not measuring the performance of this module, so I don't think it matters.
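Input shape aside, a cheap sanity check is that the fused module matches the original numerically in eval mode. A sketch, assuming `model` and `fuse` from the tutorial:

```python
import torch

model.eval()  # folding is only valid against frozen (eval-mode) BN statistics
fused_model = fuse(model)
inp = torch.randn(5, 1, 1, 1)
assert torch.allclose(model(inp), fused_model(inp), atol=1e-6)
```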

@brianjo brianjo added the 1.8 PRs for upcoming release label Feb 17, 2021
@brianjo brianjo changed the base branch from master to 1.8-RC5-TEST March 4, 2021 13:05
@brianjo brianjo merged commit 5bda6b0 into pytorch:1.8-RC5-TEST Mar 4, 2021
brianjo added a commit that referenced this pull request Mar 4, 2021
* Update build.sh

* Update audio tutorial for release pytorch 1.8 / torchaudio 0.8 (#1379)

* [wip] replace audio tutorial

* Update

* Update

* Update

* fixup

* Update requirements.txt

* update

* Update

Co-authored-by: Brian Johnson <brianjo@fb.com>

* [1.8 release] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial (#1352)

* switch to the new dataset API

* checkpoint

* checkpoint

* checkpoint

* update docs

* checkpoint

* switch to legacy vocab

* update to follow the master API

* checkpoint

* checkpoint

* address reviewer's comments

Co-authored-by: Guanheng Zhang <zhangguanheng@devfair0197.h2.fair>
Co-authored-by: Brian Johnson <brianjo@fb.com>

* [1.8 release] Switch to LM dataset in torchtext 0.9.0 release (#1349)

* switch to raw text dataset in torchtext 0.9.0 release

* follow the new API in torchtext master

Co-authored-by: Guanheng Zhang <zhangguanheng@devfair0197.h2.fair>
Co-authored-by: Brian Johnson <brianjo@fb.com>

* [WIP][FX] CPU Performance Profiling with FX (#1319)

Co-authored-by: Brian Johnson <brianjo@fb.com>

* [FX] Added fuser tutorial (#1356)

* Added fuser tutorial

* updated index.rst

* fixed conclusion

* responded to some comments

* responded to comments

* respond

Co-authored-by: Brian Johnson <brianjo@fb.com>

* Update numeric_suite_tutorial.py

* Tutorial combining DDP with Pipeline Parallelism to Train Transformer models (#1347)

* Tutorial combining DDP with Pipeline Parallelism to Train Transformer models.

Summary: Tutorial which places a pipe on GPUs 0 and 1 and another Pipe
on GPUs 2 and 3. Both pipe replicas are replicated via DDP. One process
drives GPUs 0 and 1 and another drives GPUs 2 and 3.

* Polish out some of the docs.

* Add thumbnail and address some comments.

Co-authored-by: pritam <pritam.damania@fb.com>

* More updates to numeric_suite

* Even more updates

* Update numeric_suite_tutorial.py

Hopefully that's the last one

* Update numeric_suite_tutorial.py

Last one

* Update build.sh

Co-authored-by: moto <855818+mthrok@users.noreply.github.com>
Co-authored-by: Guanheng George Zhang <6156351+zhangguanheng66@users.noreply.github.com>
Co-authored-by: Guanheng Zhang <zhangguanheng@devfair0197.h2.fair>
Co-authored-by: James Reed <jamesreed@fb.com>
Co-authored-by: Horace He <horacehe2007@yahoo.com>
Co-authored-by: Pritam Damania <9958665+pritamdamania87@users.noreply.github.com>
Co-authored-by: pritam <pritam.damania@fb.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
rodrigo-techera pushed a commit to Experience-Monks/tutorials that referenced this pull request Nov 29, 2021
Labels: 1.8 PRs for upcoming release · cla signed