[quant][graphmode][fx] Produce torch.cat instead of torch.ops.quantized.cat #54924
Conversation
[quant][graphmode][fx] Produce torch.cat instead of torch.ops.quantized.cat

Summary: Previously we produced torch.ops.quantized.cat, which takes the inputs, dequantizes them, and requantizes them with new qparams. This PR changes that to produce torch.cat directly. torch.cat assumes all inputs share the same qparams and produces a quantized Tensor with those same qparams (the previous PR ensures that all inputs and the output of cat share the same observer/fake-quant instance). Using torch.cat is expected to be more efficient since it does not introduce an extra quant/dequant step.

Test Plan: python test/test_quantization.py TestQuantizeFx.test_cat

[ghstack-poisoned]
Makes sense. Does this negatively impact accuracy? What happens if there is something like this, and conv1 and conv2 have qparams which are not close to each other?
Good question; I think probably not. Even in the current implementation we add an observer for the output of cat, which is the concatenation of the outputs of both convs, so they are observed with the same observer anyway.
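A pure-Python sketch (hypothetical helper names, not PyTorch APIs) of why the shared observer makes requantization unnecessary: once all cat inputs use one scale/zero_point, concatenating their integer representations gives exactly the same result as the dequantize → cat → requantize path that torch.ops.quantized.cat would take with those same output qparams.

```python
def quantize(xs, scale, zero_point):
    # Affine quantization: q = round(x / scale) + zero_point, clamped to uint8.
    return [max(0, min(255, round(x / scale) + zero_point)) for x in xs]

def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]

# Shared qparams for both cat inputs (as guaranteed by the shared observer).
scale, zp = 0.05, 128
qa = quantize([0.1, -0.2, 0.3], scale, zp)   # simulated conv1 output
qb = quantize([0.4, -0.5], scale, zp)        # simulated conv2 output

# torch.cat on quantized tensors with shared qparams: concatenate the ints.
cat_direct = qa + qb

# quantized.cat with the same output qparams: dequant -> cat -> requant.
cat_requant = quantize(dequantize(qa + qb, scale, zp), scale, zp)

assert cat_direct == cat_requant  # identical values, no extra quant/dequant
```

If the inputs did not share qparams, the two paths could disagree, which is why the observer-sharing PR is a prerequisite for this change.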
@@ -67,6 +66,7 @@
    torch.avg_pool1d,
    torch._C._nn.avg_pool2d,
    torch._C._nn.avg_pool3d,
+   torch.cat,
Looks like we also need to update the TestFXNumericSuiteCoreAPIs.test_op_io_dtype_coverage test for cat to point to this list of functions, instead of the old one.
This pull request has been merged in 096089a.
[quant][graphmode][fx] Produce torch.cat instead of torch.ops.quantized.cat (pytorch#54924)

Summary: Pull Request resolved: pytorch#54924

Previously we produced torch.ops.quantized.cat, which takes the inputs, dequantizes them, and requantizes them with new qparams. This PR changes that to produce torch.cat directly. torch.cat assumes all inputs share the same qparams and produces a quantized Tensor with those same qparams (the previous PR ensures that all inputs and the output of cat share the same observer/fake-quant instance). Using torch.cat is expected to be more efficient since it does not introduce an extra quant/dequant step.

Test Plan: python test/test_quantization.py TestQuantizeFx.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D27416528

fbshipit-source-id: 896c280abec2903c29d597c655729666583ff0dd
Stack from ghstack:
Summary:
Previously we produced torch.ops.quantized.cat, which takes the inputs, dequantizes them, and requantizes them with new qparams. This PR changes that to produce torch.cat directly. torch.cat assumes all inputs share the same qparams and produces a quantized Tensor with those same qparams, because the previous PR ensures that all inputs and the output of cat share the same observer/fake-quant instance.

Using torch.cat is expected to be more efficient since it does not introduce an extra quant/dequant step.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_cat
Differential Revision: D27416528
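To illustrate the observer-sharing prerequisite, here is a minimal pure-Python sketch (hypothetical class and method names; PyTorch's real observers and FX passes are more involved). Because one observer instance records the ranges of every cat input and the cat output, all of them necessarily derive the same (scale, zero_point), which is what lets convert emit a plain torch.cat.

```python
class MinMaxObserver:
    """Toy observer: tracks the running min/max of everything it sees."""
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, xs):
        self.min_val = min(self.min_val, *xs)
        self.max_val = max(self.max_val, *xs)

    def qparams(self, qmin=0, qmax=255):
        # Standard affine qparams from the observed range (zero included).
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin)
        zero_point = qmin - round(lo / scale)
        return scale, zero_point

# One shared instance observes both cat inputs and the cat output,
# so all three end up with identical qparams by construction.
shared = MinMaxObserver()
conv1_out = [0.5, -1.0, 2.0]
conv2_out = [3.0, -0.25]
shared.observe(conv1_out)
shared.observe(conv2_out)
shared.observe(conv1_out + conv2_out)  # cat output: same values, same range

scale, zero_point = shared.qparams()
```

With separate observers per input, conv1 and conv2 would generally get different scales, and a plain integer concatenation would be incorrect; the shared instance is what makes the torch.cat lowering sound.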