
quantize_per_tensor gets different results with and without setting OMP_NUM_THREADS=1 #80501


Closed
zhuhaozhe opened this issue Jun 29, 2022 · 7 comments
Labels
oncall: quantization Quantization support in PyTorch triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@zhuhaozhe
Collaborator

zhuhaozhe commented Jun 29, 2022

🐛 Describe the bug

For the Python file below (call it test_quant.py):

import torch
torch.manual_seed(20)

w = torch.randn(39979771, 128)
scales =  0.00187
zp = 0

i8_arg = torch.quantize_per_tensor(w, scales, zp, torch.qint8)
arg = i8_arg.dequantize()
print(w[10622505])
print(i8_arg[10622505])
Running

python test_quant.py
OMP_NUM_THREADS=1 python test_quant.py

produces different i8_arg values.
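
Note that w holds 39979771 * 128 = 5,117,410,688 elements, which exceeds the 32-bit signed int maximum of 2,147,483,647 (this is confirmed as the root cause in the FBGEMM fix referenced further down). A minimal sketch of the arithmetic, purely for illustration; the wrapped-length value is an assumption about how a 32-bit length would overflow, not output taken from FBGEMM:

numel = 39979771 * 128     # 5,117,410,688 elements in w
int32_max = 2**31 - 1      # 2,147,483,647
print(numel > int32_max)   # True: the element count does not fit in a 32-bit int
wrapped = numel % 2**32    # 822,443,392: what a length stored in 32 bits would wrap to
print(10622505 * 128 > wrapped)  # True: the printed row lies beyond the wrapped length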

Versions

Collecting environment information...
PyTorch version: 1.13.0a0+git0922cc0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: CentOS Stream 8 (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
Clang version: 13.0.0 (Red Hat 13.0.0-3.module_el8.6.0+1074+380cef3f)
CMake version: version 3.19.6
Libc version: glibc-2.10

Python version: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.18.0-365.el8.x86_64-x86_64-with-centos-8
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==1.12.0+cpu
[pip3] numpy==1.21.2
[pip3] torch==1.13.0a0+git0922cc0
[conda] blas 1.0 mkl
[conda] intel-extension-for-pytorch 1.12.0+cpu pypi_0 pypi
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-include 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.2 py37h20f2e39_0
[conda] numpy-base 1.21.2 py37h79a1101_0
[conda] torch 1.13.0a0+git0922cc0 dev_0

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo

@zhuhaozhe
Collaborator Author

zhuhaozhe commented Jun 29, 2022

With OMP_NUM_THREADS=1 set, the result is an all-zero tensor:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.], size=(128,), dtype=torch.qint8,
quantization_scheme=torch.per_tensor_affine, scale=0.00187, zero_point=0)

@zhuhaozhe
Collaborator Author

Without setting OMP_NUM_THREADS, the result is:
tensor([-0.2394, -0.2394, -0.2394, 0.2375, 0.2375, -0.2394, -0.2394, -0.2394,
-0.2394, 0.1066, -0.2394, -0.2394, -0.2394, -0.2394, 0.2375, -0.2394,
-0.2394, -0.2076, -0.2394, -0.2394, -0.0430, -0.2394, -0.2394, -0.2394,
0.1010, -0.0374, 0.2375, -0.0411, 0.2375, 0.2375, 0.2375, -0.2394,
0.2375, -0.2394, -0.2394, 0.2375, -0.2394, -0.0037, -0.2394, 0.2375,
0.1477, -0.1945, -0.0112, 0.2375, 0.2375, 0.2375, -0.0206, -0.0280,
-0.2394, -0.2394, 0.0879, 0.2375, 0.0430, -0.2394, -0.2394, 0.2375,
-0.2394, 0.2375, 0.0785, 0.2375, -0.1889, 0.2375, -0.2394, 0.2375,
-0.2394, 0.0785, 0.2375, -0.2394, 0.2375, -0.2394, 0.2375, -0.2394,
0.2375, -0.2394, -0.2394, -0.2394, -0.0411, 0.2375, 0.2375, -0.2394,
0.2375, 0.2375, -0.2394, -0.2394, 0.1964, 0.2375, -0.2394, -0.2394,
0.2375, 0.2375, 0.2375, 0.0561, 0.2375, -0.2394, 0.2375, 0.0954,
-0.2394, 0.2375, -0.0729, -0.2394, 0.2375, 0.2375, -0.2394, 0.2375,
0.2375, 0.2375, -0.2394, -0.2394, 0.2375, 0.2375, -0.2394, -0.0262,
0.2375, -0.1384, -0.0729, -0.2394, -0.2394, -0.2394, 0.2375, -0.1982,
0.2375, 0.0916, 0.2375, -0.2394, 0.2375, -0.2394, 0.1253, 0.2375],
size=(128,), dtype=torch.qint8,
quantization_scheme=torch.per_tensor_affine, scale=0.00187, zero_point=0)

@soulitzer soulitzer added the oncall: quantization Quantization support in PyTorch label Jun 29, 2022
@jerryzh168
Contributor

Can you confirm w is the same in both cases?

@zhuhaozhe
Collaborator Author


Thanks for the reply, @jerryzh168.
I confirmed w is the same: I printed w[10622505] and w.sum() and got the same results in both runs (see attached screenshot).

@z-a-f z-a-f added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Aug 2, 2022
@XiaobingSuper
Collaborator

@jerryzh168, there is a fix for this in pytorch/FBGEMM#1261.

facebook-github-bot pushed a commit to pytorch/FBGEMM that referenced this issue Aug 30, 2022
Summary:
There is a data overflow for the int type when the input length is > 2,147,483,647. The PyTorch side reported an issue that uses FBGEMM Quantize (see pytorch/pytorch#80501); the user example's input length is 5,117,410,688, but the FBGEMM side uses **int** to represent the input length, which yields a wrong number in the single-thread case.

This PR only fixes the two **Quantize** and **FindMinMax** APIs that are used on the PyTorch side; there may be other functions that need to be updated to use a higher-precision dtype.

Pull Request resolved: #1261

Reviewed By: jianyuh

Differential Revision: D39089686

Pulled By: jspark1105

fbshipit-source-id: 9623bbb20bdba0f98040a1c8143e4bc552d2a6cb
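
Given the int overflow described above, one cross-check that stays clear of the long-input code path is to quantize only the affected row, whose 128 elements fit easily in a 32-bit length. A minimal sketch, reusing the setup from test_quant.py above; the expectation that it matches the multi-threaded output is an assumption, not a verified result:

import torch
torch.manual_seed(20)

w = torch.randn(39979771, 128)
scales = 0.00187
zp = 0

row = w[10622505]  # a single row: 128 elements, far below 2**31 - 1
q_row = torch.quantize_per_tensor(row, scales, zp, torch.qint8)
print(q_row)       # expected to match the non-zero output printed earlier in this thread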
@jerryzh168
Contributor

Is this fixed? @zhuhaozhe @XiaobingSuper

@XiaobingSuper
Collaborator

@jerryzh168, I checked the FBGEMM used by PyTorch; pytorch/FBGEMM#1261 has been applied, so I think this issue is fixed.
