[Inductor] [CPU] Vectorization not supporting python pass-in scalar double in speech_transformer #93446
Labels: triaged
Description
Comparing the performance of `speech_transformer` under the inductor and IPEX backends, inductor reaches only 0.68x of IPEX's performance. The main reason is that vectorization does not support a double scalar passed in from Python.

Profiling and Code snippet
According to the profiling analysis, the bottlenecks are `kernel_cpp_8`, `kernel_cpp_14`, `kernel_cpp_20`, `kernel_cpp_2`, `kernel_cpp_32` and `kernel_cpp_26`, all of which implement the same Python code snippet.

Because `self.temperature` in Python, i.e. `__restrict__ in_ptr2` in the generated C++, is a double scalar, vectorization is not applied to these kernels.

Minified repro
```bash
python benchmarks/dynamo/torchbench.py --performance --float32 -dcpu -n50 --inductor --no-skip --dashboard -k "speech_transformer" --cold_start_latency --channels-last
```
cc @EikanWang @jgong5