[FX] Added fuser tutorial #1356
Conversation
Deploy preview for pytorch-tutorials-preview ready! Built with commit 14a7913 https://deploy-preview-1356--pytorch-tutorials-preview.netlify.app
index.rst (outdated)

.. Code Transformations with FX
.. customcarditem::
   :header: Building a Convolution/Batch Norm fuser in FX
   :card_description: Build a simple FX interpreter to record the runtime of op, module, and function calls and report statistics
are the card description and images correct? They look like they belong to other tutorials?
Description is wrong, but not sure what to put for the image. @jamesr66a is there any reason you chose this image for the performance profiling for FX? https://github.com/pytorch/tutorials/pull/1319/files#diff-54a294a5d016e1a8e98bc95668ed84a99a9edd5c10394d9a2b1ee848006e98a7R223
@@ -215,6 +215,15 @@ Welcome to PyTorch Tutorials
   :link: advanced/super_resolution_with_onnxruntime.html
   :tags: Production

.. Code Transformations with FX
.. customcarditem::
   :header: Building a Convolution/Batch Norm fuser in FX
I've seen this technique more commonly referred to as "folding" but both make sense (https://towardsdatascience.com/speed-up-inference-with-batch-normalization-folding-8a45a83a89d8, https://arxiv.org/abs/1611.09842 calls it "absorbing").
Might be nice to use different terminology in case we want to add a "first class" fusion tutorial later that e.g. directly calls into NNC.
I agree the terminology is confusing, but I think fusion is an acceptable (and more widely understood) term. If we add a dedicated fusion tutorial later, I'd be glad to rename this one to avoid the conflict.
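Whatever the name, folding/fusing/absorbing all refer to the same algebraic rewrite: the batch norm's affine transform gets baked into the conv's weights and bias. A minimal sketch of the math (`fold_bn_into_conv` is a hypothetical helper, assuming an affine BatchNorm2d in eval mode, not the tutorial's actual implementation):

```python
import torch
import torch.nn as nn

def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # BN(conv(x)) = gamma * (conv(x) - mean) / sqrt(var + eps) + beta,
    # which is itself a conv with rescaled weights and a shifted bias.
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding,
        dilation=conv.dilation, groups=conv.groups, bias=True,
    )
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused
```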
Looks good to me!
# accessing the computational graph. FX resolves this problem by symbolically
# tracing the actual operations called, so that we can track the computations
# through the `forward` call, nested within Sequential modules, or wrapped in
# an user-defined module.
nit: a user-defined module
🤔 Shouldn't it be `an` before `user`? Since `user` starts with a vowel?
The rule is based on the sound, not the letter: "user" begins with a consonant sound (/j/, as in "you"), so it takes "a".
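Article nits aside, the quoted passage is about FX's symbolic tracing. A minimal sketch of what it captures (the module and names here are illustrative, not from the tutorial):

```python
import torch.nn as nn
from torch.fx import symbolic_trace

class UserDefined(nn.Module):
    def __init__(self):
        super().__init__()
        # a conv/BN pair nested inside a Sequential inside a user-defined module
        self.block = nn.Sequential(nn.Conv2d(1, 1, 1), nn.BatchNorm2d(1))

    def forward(self, x):
        return self.block(x)

traced = symbolic_trace(UserDefined())
# The graph exposes call_module nodes for block.0 and block.1 individually,
# even though they are nested, which is what lets a fuser find the pairs.
print(traced.graph)
```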
fused_model = fuse(model)
print(fused_model.code)
inp = torch.randn(5, 1, 1, 1)
Should we run this on a more realistic input shape?
I just wrote all the conv/batch norm modules to operate on a `[1, 1, 1]` shape. We're not measuring the performance of this module, so I don't think it matters.
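If one did want a sanity check at a realistic shape, it could look like the sketch below. This is an assumption-laden illustration: it uses `torch.fx.experimental.optimization.fuse` as a stand-in for the tutorial's `fuse`, and resnet18 only as an example of a realistic NCHW input.

```python
import torch
import torchvision.models as models
from torch.fx.experimental.optimization import fuse  # stand-in for the tutorial's fuse()

model = models.resnet18().eval()  # BN folding is only valid in eval mode
fused = fuse(model)
inp = torch.randn(1, 3, 224, 224)
# Fusion should be numerically equivalent up to floating-point error.
assert torch.allclose(model(inp), fused(inp), atol=1e-5)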
Commits included in the branch update:

* Update build.sh
* Update audio tutorial for release pytorch 1.8 / torchaudio 0.8 (#1379)
* [1.8 release] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial (#1352)
* [1.8 release] Switch to LM dataset in torchtext 0.9.0 release (#1349)
* [WIP][FX] CPU Performance Profiling with FX (#1319)
* [FX] Added fuser tutorial (#1356): added fuser tutorial, updated index.rst, fixed conclusion, responded to review comments
* Update numeric_suite_tutorial.py
* Tutorial combining DDP with Pipeline Parallelism to Train Transformer models (#1347): places a pipe on GPUs 0 and 1 and another pipe on GPUs 2 and 3; both pipe replicas are replicated via DDP, with one process driving GPUs 0 and 1 and another driving GPUs 2 and 3
* More updates to numeric_suite
* Even more updates
* Update numeric_suite_tutorial.py (hopefully the last one)
* Update build.sh

Co-authored-by: moto <855818+mthrok@users.noreply.github.com>, Guanheng George Zhang <6156351+zhangguanheng66@users.noreply.github.com>, Guanheng Zhang <zhangguanheng@devfair0197.h2.fair>, James Reed <jamesreed@fb.com>, Horace He <horacehe2007@yahoo.com>, Pritam Damania <9958665+pritamdamania87@users.noreply.github.com>, pritam <pritam.damania@fb.com>, Nikita Shulga <nshulga@fb.com>, Brian Johnson <brianjo@fb.com>
Not sure how to test it in notebook format.
Also, perhaps I'd like to bake in the output somehow? It would be somewhat embarrassing if the fused version was slower due to noise :)