Extending minifier for detecting accuracy issues #1242
Conversation
When I patch this into the symbolic shapes branch and then attempt to run the minifier with
I get
more informative stack trace
@ezyang Fixed in the latest commit.
I ran it, it chundered on for a bit, and then it seems to have failed in a way where I don't have a minified copy. It is failing with:
Is the minifier not catching enough exceptions?
Force-pushed from e1cbac7 to 9e42f0f.
I tried to use your branch for #1039, but the generated repro.py doesn't really give an accuracy error.
I pushed a merge to master, as it has the zero_grad fix that I still need for pytorch_BERT-based minimizations.
The produced minified scripts don't actually test the outputs for equality.
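For reference, a minimal sketch of the kind of output-equality check such a script could append; eager_mod, compiled_mod, and example_inputs are hypothetical names, not the identifiers the minifier actually emits.

import torch
from torch.utils._pytree import tree_flatten

def outputs_match(ref, res, rtol=1e-4, atol=1e-4):
    # Flatten nested tuples/lists/dicts of tensors into flat leaf lists.
    ref_leaves, _ = tree_flatten(ref)
    res_leaves, _ = tree_flatten(res)
    if len(ref_leaves) != len(res_leaves):
        return False
    return all(
        torch.allclose(r, c, rtol=rtol, atol=atol)
        for r, c in zip(ref_leaves, res_leaves)
    )

# Hypothetical tail of a generated repro script:
# assert outputs_match(eager_mod(*example_inputs), compiled_mod(*example_inputs))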
Force-pushed from 19f8b9d to 56f5587.
torchdynamo/debug_utils.py (Outdated)
        tensor_str = f"torch.randn({list(buffer.shape)}, dtype={buffer.dtype})"
    else:
        tensor_str = (
            f"torch.randint(2, size={list(buffer.shape)}, dtype={buffer.dtype})"
        )
I would probably just do 1 here.
torchdynamo/debug_utils.py (Outdated)
from torchinductor import config
from torchinductor.compile_fx import compile_fx_inner

config.triton.autotune = False
What is this for?
    inductor or nvfuser. Intercepting after AOT Autograd presents a neat
    abstraction, where all the params are lifted as graph inputs, making it easy
    to save the graph as a string.
    """

    @functools.wraps(compiler_fn)
    def debug_wrapper(gm, example_inputs, **kwargs):
        orig_graph = copy.deepcopy(gm.graph)
        assert config.repro_after in ("dynamo", "aot", None)

        def deferred_for_real_inputs(*real_inputs):
No need to fix, but this seems kinda ... strange to me. We delay compilation from the ... first time we see inputs, to slightly later after the first time we see inputs.
We could do compilation with fake tensors first and then run the compiled model with real tensors later. But the code gets real ugly.
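For readers following the thread, a stripped-down sketch of the deferral pattern being discussed, with hypothetical names rather than the branch's actual code: the backend handed to dynamo does not compile immediately; it compiles inside the first call, when real tensors are in hand, so a failure can be dumped together with the inputs that triggered it.

import functools

def wrap_compiler_debug(compiler_fn):
    @functools.wraps(compiler_fn)
    def debug_wrapper(gm, example_inputs, **kwargs):
        compiled = None

        def deferred_for_real_inputs(*real_inputs):
            nonlocal compiled
            if compiled is None:
                # Compile lazily, on the first call that carries real inputs.
                compiled = compiler_fn(gm, real_inputs, **kwargs)
            # Any exception raised here surfaces with the triggering inputs
            # in scope, which is where a repro/minifier dump would hook in.
            return compiled(*real_inputs)

        return deferred_for_real_inputs

    return debug_wrapper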
@@ -646,24 +762,42 @@ def debug_wrapper(gm, example_inputs, **kwargs):
        config.raise_on_backend_error = True
        if config.repro_level == 3:
            dump_to_minify_after_dynamo(gm, example_inputs, compiler_name)
        try:
            # Check for either accuracy (level 4) or other types of failures.
Would be nice if we could share more code between the two, but not a big deal if it's awkward.
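A hedged sketch of the two paths this comment contrasts, reusing the outputs_match helper sketched earlier in the thread; dump_repro is a hypothetical stand-in for the patch's actual dump helpers.

import copy

def run_and_check(gm, example_inputs, compiler_fn, compiler_name, repro_level):
    if repro_level == 4:
        # Accuracy path: run the eager and compiled graphs and compare outputs.
        compiled_gm = compiler_fn(copy.deepcopy(gm), example_inputs)
        if not outputs_match(gm(*example_inputs), compiled_gm(*example_inputs)):
            dump_repro(gm, example_inputs, compiler_name)  # hypothetical helper
    else:
        # Failure path: any exception while compiling or running is the signal.
        try:
            compiled_gm = compiler_fn(copy.deepcopy(gm), example_inputs)
            compiled_gm(*example_inputs)
        except Exception:
            dump_repro(gm, example_inputs, compiler_name)  # hypothetical helper
            raise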
Force-pushed from 56f5587 to bc6ba39.
Commits: zero_grad, Requires grad, Accuracy minifier, Extending to Inductor.
Force-pushed from bc6ba39 to c74e5de.
Accuracy minifier for TORCHDYNAMO_REPRO_AFTER = "dynamo". This likely requires more work but unblocks current efforts to move forward with Inductor accuracy debugging.
Example usage -
TORCHDYNAMO_REPRO_AFTER="dynamo" TORCHDYNAMO_REPRO_LEVEL=4 python benchmarks/timm_models.py --accuracy --ci -d cuda --inductor --float32 --training --only=crossvit_9_240
Remaining work tracked here - pytorch/pytorch#93673
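For completeness, the same run can presumably be configured from Python instead of environment variables, assuming the repro_after and repro_level config fields shown in the diff above are the ones backing those variables:

import torchdynamo.config as config

# Roughly equivalent to TORCHDYNAMO_REPRO_AFTER="dynamo" TORCHDYNAMO_REPRO_LEVEL=4
config.repro_after = "dynamo"  # intercept graphs right after dynamo tracing
config.repro_level = 4         # level 4 checks accuracy rather than only crashes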