Description
🚀 Describe the improvement or the new tutorial
The 4th Generation Intel® Xeon® Scalable Processor platform is a unique, scalable platform optimized for accelerating a broad range of AI workloads. Its new built-in AI acceleration engine, Intel® Advanced Matrix Extensions (AMX), accelerates a variety of AI inference and training workloads (NLP, recommendation systems, image recognition, etc.) with the BF16 and INT8 data types.
PyTorch has enabled AMX support for compute-intensive operators, e.g. Conv2d, ConvTranspose2d, Linear, MatMul, and bmm, with the torch.bfloat16 data type, and for int8 on the quantization backend. We should write a tutorial that shows users how to leverage AMX in PyTorch.
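As a minimal sketch of what the tutorial could demonstrate: running the listed operators under CPU autocast with bfloat16 lets PyTorch dispatch to oneDNN kernels, which use AMX when the hardware supports it. Whether AMX is actually used depends on the CPU and the PyTorch/oneDNN build, so the snippet below only shows the API usage, not a guarantee of AMX dispatch.

```python
import torch

# A small model built from AMX-eligible operators (Conv2d, Linear).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.Flatten(),
    torch.nn.Linear(16 * 30 * 30, 10),
).eval()

x = torch.randn(1, 3, 32, 32)

# CPU autocast casts eligible ops to bfloat16; on 4th Gen Xeon these
# computations can be accelerated by AMX via the oneDNN backend.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)  # bfloat16 output from the autocast region
```

The tutorial could additionally cover how to verify AMX kernel usage (e.g. via oneDNN verbose logging) and the int8 quantization path.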
Existing tutorials on this topic
No response
Additional context
We aim to complete the document as part of PyTorch Docathon 2023. cc @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ZailiWang @ZhaoqiongZ @leslie-fang-intel @Xia-Weiwen @sekahler2 @CaoE @zhuhaozhe @Valentine233 @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen