
[CPU] Enable DA8W4 on CPU #2128


Draft · wants to merge 16 commits into base: main

Conversation

Collaborator

@Xia-Weiwen Xia-Weiwen commented Apr 25, 2025

Summary
This PR enables DA8W4 on CPU.

  • It adds a new layout, Int8DynamicActInt4WeightCPULayout, and its implementation.
  • It adds two custom ops: da8w4_linear_prepack_cpu for weight packing and da8w4_linear_cpu for the DA8W4 GEMM.
  • It adds C++ kernels implementing the two new custom ops.

The ops and kernels are available only when torchao is built from source with USE_CPP_KERNELS=1, and only on Linux.
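For orientation, here is a minimal sketch of how the two custom ops are intended to compose. This is an illustration only: the op names come from this PR, but the argument lists below are assumptions, not the signatures registered by the C++ kernels.

import torch

# Sketch only: argument lists are assumed for illustration.
def da8w4_linear(int8_act, act_scales, act_qzeros,
                 int4_weight, w_scales, w_qzeros):
    # Pack the int4 weight (normally done once, at quantization time).
    packed = torch.ops.torchao.da8w4_linear_prepack_cpu(
        int4_weight, w_scales, w_qzeros
    )
    # DA8W4 GEMM against the packed weight at inference time.
    return torch.ops.torchao.da8w4_linear_cpu(
        int8_act, act_scales, act_qzeros, packed, w_scales, w_qzeros
    )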

Test plan

pytest test/quantization/test_quant_api.py -k test_8da4w_cpu


pytorch-bot bot commented Apr 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2128

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 369000f with merge base 60d63a6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Apr 25, 2025
@Xia-Weiwen added the cpu, quantize, and topic: new feature labels Apr 25, 2025
# Diff context from test/quantization/test_quant_api.py; the surrounding
# call is reconstructed here (likely torch._inductor.utils.run_and_get_code):
_, code = torch._inductor.utils.run_and_get_code(
    torch.compile(m, fullgraph=True, dynamic=True),
    *example_inputs,
)
assert "_weight_int4pack_mm_for_cpu" in code[0]
Contributor

I remember this op is for weight-only quant?

Collaborator Author

Yes. This op is used here because we don't have an op to compute DA8W4 on CPU yet, so the implementation falls back to explicit dequantization followed by a call to this op.
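For context, a rough sketch of the fallback described above, assuming per-row dynamic int8 quantization of the activation; _weight_int4pack_mm_for_cpu is the existing weight-only int4 op in PyTorch, and the packed weight and scales/zeros are assumed to be prepared with the matching packing routine.

import torch

# Rough sketch of the interim fallback (not this PR's final code): the
# activation is dynamically quantized to int8, then explicitly dequantized
# back to float so the existing weight-only int4 op can do the GEMM.
def da8w4_via_fallback(x, packed_int4_weight, scales_and_zeros, group_size=32):
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 127.0
    x_q = torch.clamp(torch.round(x / scale), -128, 127)  # dynamic int8 quant
    x_dq = (x_q * scale).to(x.dtype)                      # explicit dequant
    return torch.ops.aten._weight_int4pack_mm_for_cpu(
        x_dq, packed_int4_weight, group_size, scales_and_zeros
    )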

Collaborator Author

I have re-implemented the path. We now use a new layout for DA8W4. Thanks.
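For readers following the thread, a sketch of how the new layout is selected through torchao's quantize_ API; the import path of the layout class is an assumption for illustration.

import torch
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight
# Import path assumed; the layout class itself is added by this PR.
from torchao.dtypes import Int8DynamicActInt4WeightCPULayout

model = torch.nn.Sequential(torch.nn.Linear(256, 256)).eval()
quantize_(
    model,
    int8_dynamic_activation_int4_weight(
        group_size=32,  # assumed; use a group size valid for your weight shape
        layout=Int8DynamicActInt4WeightCPULayout(),
    ),
)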

@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review April 29, 2025 02:01
@Xia-Weiwen Xia-Weiwen requested a review from jerryzh168 April 29, 2025 03:16
@Xia-Weiwen Xia-Weiwen marked this pull request as draft May 7, 2025 01:17
@Xia-Weiwen
Collaborator Author

@leslie-fang-intel This PR has been updated to use a new layout. Please review again. Thanks.

# Diff context: constructor arguments inside the aten.t.default handler of
# DA8W4CPUAQTTensorImpl; see the discussion below.
args[0].scales,
args[0].qzeros,
not args[0].transposed,
args[0]._layout,
Collaborator

Could you explain more about the implementation of aten.t.default for DA8W4CPUAQTTensorImpl? It seems we only change the transposed flag here. Do we have a test case to cover it?

Collaborator Author
@Xia-Weiwen Xia-Weiwen commented May 16, 2025

Thanks for the comment. There is a comment explaining this just above, and it is copied from Int4CPULayout; we have it here because Int4CPULayout has it. I'm afraid there is no UT for it, and it seems unused when running models. Do you want me to remove it?
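For reference, a rough sketch of the pattern under discussion, following the field names in the snippet above; the first constructor argument (packed_weight) and the exact wiring are assumptions based on the Int4CPULayout-style impls.

from torch.utils._python_dispatch import return_and_correct_aliasing

# Sketch of the aten.t.default handler: it does not repack the weight; it
# only records the transpose by flipping the `transposed` flag, relying on
# the outer tensor's shape change.
def _t_default(func, args, kwargs):
    self = args[0]
    flipped = DA8W4CPUAQTTensorImpl(  # tensor impl class added by this PR
        self.packed_weight,           # assumed field name
        self.scales,
        self.qzeros,
        not self.transposed,          # the only thing that changes
        self._layout,
    )
    return return_and_correct_aliasing(func, args, kwargs, flipped)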

Collaborator

Yeah, how about we remove it for now and add it back with a UT when we meet the use case.

Collaborator Author

I have removed it. Thanks.

@Xia-Weiwen Xia-Weiwen changed the title [CPU] enable int8_dynamic_activation_int4_weight with Int4CPULayout [CPU] enable int8_dynamic_activation_int4_weight on CPU May 16, 2025
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review May 16, 2025 05:59
@Xia-Weiwen Xia-Weiwen changed the title [CPU] enable int8_dynamic_activation_int4_weight on CPU [CPU] Add a new layout for int8_dynamic_activation_int4_weight on CPU May 16, 2025
@Xia-Weiwen
Collaborator Author

Hi @jerryzh168 Could you please review this PR? Thanks.

2 similar comments

@Xia-Weiwen Xia-Weiwen marked this pull request as draft May 21, 2025 02:57
@Xia-Weiwen Xia-Weiwen removed the request for review from jerryzh168 May 21, 2025 02:57
@Xia-Weiwen Xia-Weiwen changed the title [CPU] Add a new layout for int8_dynamic_activation_int4_weight on CPU [CPU] Enable DA8W4 on CPU May 26, 2025
Labels
CLA Signed · cpu · quantize · topic: new feature