Insights: pytorch/ao

Overview
1 Release published by 1 person
- v0.11.0 (published May 9, 2025)
82 Pull requests merged by 29 people
- Remove Constraint for sm89 hardware (#2281, merged Jun 2, 2025)
- Fix benchmark_low_bit_adam.py reference (#2287, merged Jun 1, 2025)
- Fix Bug in MX Builds (#2284, merged May 31, 2025)
- Add back AOPerModuleConfig for BC (#2282, merged May 31, 2025)
- Patch the _is_conv_node function (#2257, merged May 31, 2025)
- Fixes MX formats build for blackwell (#2278, merged May 30, 2025)
- Update CMake to enable building ops on iOS (#2274, merged May 30, 2025)
- Resolve logger warnings (#2250, merged May 30, 2025)
- Add Integration Tests to H100 CI (#2268, merged May 30, 2025)
- Make optim lazily initialize global state (#2277, merged May 30, 2025)
- Fix generate.py for fbgemm int4 integration (#2273, merged May 29, 2025)
- Mark QAT range learning as prototype for now (#2272, merged May 29, 2025)
- Enable range learning for QAT (#2033, merged May 29, 2025)
- Fix torchao generate script for cpu device (#2267, merged May 29, 2025)
- Enable fp16+int4 mixed precision path for int4 xpu path with int zero point (#2240, merged May 29, 2025)
- integration-vllm-test (#2258, merged May 28, 2025)
- Add support for fbgemm int4 mm kernel (#2255, merged May 28, 2025)
- [reland2][ROCm] preshuffled weight mm (#2207, merged May 28, 2025)
- Support INT8 SDPA template for CPU (#2148, merged May 28, 2025)
- Fix Per Row scaling for inference (#2253, merged May 27, 2025)
- Revert "Try fixing CI by pinning pytest (#2238)" (#2263, merged May 27, 2025)
- Rename AOPerModuleConfig to ModuleFqnToConfig (#2243, merged May 24, 2025; see the usage sketch after this list)
- Add backward compatible types to pt2e prepare (#2244, merged May 23, 2025)
- Relax int4wo device mismatch error (#2254, merged May 23, 2025)
- Revert "Patch the _is_conv_node function" (#2247, merged May 23, 2025)
- Patch the _is_conv_node function (#2223, merged May 22, 2025)
- Update Readme (#1526, merged May 22, 2025)
- [sparse] Add fp8 sparse gemm with rowwise scaling for activation sparsity (#2242, merged May 22, 2025)
- Try fixing CI by pinning pytest (#2238, merged May 22, 2025)
- Relax MOE constraints and add test for torch.mm computation (#2227, merged May 22, 2025)
- clean up prototype folder (#2232, merged May 21, 2025)
- remove benchmarks from top level repo (#2233, merged May 21, 2025)
- Update GemLite to support vLLM V1 (#2199, merged May 21, 2025)
- Remove preserve_zero and zero_point_domain from choose_qparams_affine (#2149, merged May 21, 2025)
- use correct fp8 quantization dtype for AMD GPU (#2225, merged May 21, 2025)
- Re-land the PR of "Add INT8 SDPA path for CPU" (#2215, merged May 21, 2025)
- Update config.py (#2224, merged May 20, 2025)
- Make torchao pt2e prepare/convert functions compatible with quantizers in torch.ao (#2221, merged May 19, 2025)
- Enable {conv3d, conv_transpose3d} + bn fusion in pt2e (#2212, merged May 15, 2025)
- Add CI for Arm Linux (#2211, merged May 15, 2025)
- ROCm mxfp4 Skips (#2209, merged May 14, 2025)
- Add support for KleidiAI int4 kernels on aarch64 Linux (#2169, merged May 14, 2025)
- unbreak CI by fixing MX tests (#2208, merged May 14, 2025)
- Update __init__.py (#2206, merged May 14, 2025)
- Add mx_fp4 path (#2201, merged May 13, 2025)
- Arm_inductor_quantizer for Pt2e quantization (#2139, merged May 13, 2025)
- [float] document e2e training -> inference flow (#2190, merged May 13, 2025)
- Remove sparsity/prototype/blocksparse (#2205, merged May 13, 2025)
- Skips for ROCm (X86 Inductor Tests) (#2202, merged May 13, 2025)
- Add blockwise fp8 gemm benchmarks to README (#2203, merged May 12, 2025)
- Feat: Implementation of the DeepSeek blockwise quantization for fp8 tensors (#1763, merged May 12, 2025)
- Add noindex to 0.10 and 0.9 docs (#2194, merged May 12, 2025)
- Add subclass based method for inference w/ MXFP8 (#2132, merged May 12, 2025)
- unpin torch to unbreak mac tests (#2198, merged May 12, 2025)
- 2:4 activation sparsity packing kernels (#2012, merged May 12, 2025)
- Forward fix lint (#2197, merged May 12, 2025)
- Skip ROCm MoE Quantization (#2191, merged May 12, 2025)
- [optim] Fix low-bit optim when used with FSDP2+CPUOffload (#2195, merged May 10, 2025)
- [PT2E][X86] Migrate fusion passes in Inductor to torchao (#2140, merged May 10, 2025)
- Uses torch.version.cuda to compile CUDA extensions (#2193, merged May 9, 2025)
- Move moe quant to better prototype dir (#2192, merged May 9, 2025)
- Set eps in end-to-end QAT flow (#2180, merged May 9, 2025)
- metal lowbit kernels: qmv_fast optimization (#2167, merged May 9, 2025)
- [testing] Triaging ROCm wheel build (#2161, merged May 9, 2025)
- Add a triton kernel for swizzling (#2168, merged May 9, 2025)
- Enabling MOE Quantization using linear decomposition (#2043, merged May 8, 2025)
- Remove broken test (#2188, merged May 8, 2025)
- Add serialization support for AOPerModuleConfig (#2186, merged May 8, 2025)
- Generate speedup for inference (#2151, merged May 7, 2025)
- Fix cuda compile error with bf16 (#2122, merged May 7, 2025)
- [BE] Fix MPS experimental workflow (#2181, merged May 7, 2025)
- Bump version to 0.12.0 (#2178, merged May 6, 2025)
- Fix linux cpu builds; resolves "nightly build for mac stops on 0422" (#2170, merged May 6, 2025)
- [reland] Fixing aliasing behavior for slice in AQT int4wo layout (#2176, merged May 6, 2025)
- Revert "Fixing aliasing behavior for slice in AQT TensorCoreTiledLayout" (#2175, merged May 6, 2025)
- Fixing aliasing behavior for slice in AQT TensorCoreTiledLayout (#2174, merged May 6, 2025)
- Update ruff version in dev-requirements to match CI (#2172, merged May 5, 2025)
- Remove fix not needed anymore after moving CUTLASS pin to v3.9.0 (#2160, merged May 3, 2025)
- Update QAT README.md (#2162, merged May 2, 2025)
- Removes pinned version for pytest (#2158, merged May 2, 2025)
- [MX] Remove mxfp8 kernel and rely on cublas (#2130, merged May 2, 2025)
- Uses torch.version.cuda to compile CUDA extensions (#2163, merged May 2, 2025)
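Several merged PRs above touch the per-module quantization config: #2243 renamed AOPerModuleConfig to ModuleFqnToConfig, #2282 added the old name back for backward compatibility, and #2186 added serialization support. A minimal usage sketch follows, assuming ModuleFqnToConfig wraps a dict from module fully-qualified names to quantization configs and the import path shown; the exact signatures are not confirmed by this digest.

```python
import torch
from torch import nn
from torchao.quantization import quantize_, Int4WeightOnlyConfig, Int8WeightOnlyConfig
from torchao.quantization import ModuleFqnToConfig  # assumed import path

model = nn.Sequential(
    nn.Linear(1024, 1024),  # FQN "0"
    nn.Linear(1024, 1024),  # FQN "1"
)

# Map module FQNs to per-module quantization configs (assumed constructor;
# the dict-of-configs shape is inferred from the PR titles above).
config = ModuleFqnToConfig({
    "0": Int4WeightOnlyConfig(group_size=128),
    "1": Int8WeightOnlyConfig(),
})

quantize_(model, config)
```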
30 Pull requests opened by 19 people
- Update utils_parallel_dequant.cuh (#2164, opened May 2, 2025)
- tensor scaling added (#2171, opened May 5, 2025)
- [PT2E] Fix per-tensor observer issue with varying shape & rank (#2177, opened May 6, 2025)
- Eval hf models using lm_eval (#2179, opened May 6, 2025)
- [Do not Land] Re-land "Add INT8 SDPA path for CPU" (#2093) (#2183, opened May 7, 2025)
- [Not for land] remove workaround for slow rowwise cutlass gemm (#2185, opened May 8, 2025)
- Enable Int4WeightOnlyGPTQQuantizer on Intel GPU (#2200, opened May 12, 2025)
- primitive scale fix (#2210, opened May 14, 2025)
- Add activation sparsity (24 + fp8 dynamic quant) subclass (#2213, opened May 15, 2025)
- Fixes MX formats build for blackwell (#2214, opened May 15, 2025)
- Convert Pytest to Unittest for tests under test/dtypes/ (#2216, opened May 16, 2025)
- Update temp_build.py (#2218, opened May 17, 2025)
- Manually specify flags if no arch set (#2219, opened May 19, 2025)
- Fix failing tests on h100 (#2231, opened May 21, 2025)
- GPTQ updates (#2235, opened May 21, 2025)
- Test older almalinux image (#2236, opened May 21, 2025)
- [draft] Update regression_test.yml (#2237, opened May 22, 2025)
- fix _replace_with_custom_fn_if_matches_filter in quant_api.py (#2252, opened May 23, 2025)
- Add a way to do power of 2 scaling (#2256, opened May 23, 2025; see the sketch after this list)
- Add benchmark numbers to dashboard (#2260, opened May 24, 2025)
- Convert test_affine_quantized_float.py from pytest to unittest (#2261, opened May 25, 2025)
- Test d script (#2264, opened May 27, 2025)
- Update QAT docs, highlight axolotl integration (#2266, opened May 28, 2025)
- [float8 training] remove duplicate override for view (#2269, opened May 29, 2025)
- float8 moe training conversion API prototype (#2275, opened May 30, 2025)
- [WIP] Add support for fbgemm fp8 kernels (#2276, opened May 30, 2025)
- Fix QAT range learning, ensure scales get gradients (#2280, opened May 30, 2025)
- [do not land] testing if moving this breaks my PRs (#2283, opened May 30, 2025)
- Build mxfp4 kernel for sm120a (#2285, opened May 31, 2025)
- [optim] Fix bug when default dtype is BF16 (#2286, opened May 31, 2025)
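Two of the items in this digest (#2256 above and issue #2182 further down) concern power-of-2 scaling for float8. The digest does not show the implementation; the core idea can be sketched as rounding each scale down to the nearest power of two, so that multiplying and dividing by the scale only touches exponent bits and is exact in binary floating point. The helper name below is illustrative, not the code from those PRs.

```python
import torch

def round_scale_down_to_power_of_2(scale: torch.Tensor) -> torch.Tensor:
    # Round positive scales down to the nearest power of two.
    # Illustrative sketch of the idea behind #2256/#2182, not their code.
    return torch.exp2(torch.floor(torch.log2(scale)))

scales = torch.tensor([0.75, 1.0, 3.2, 100.0])
print(round_scale_down_to_power_of_2(scales))
# tensor([ 0.5000,  1.0000,  2.0000, 64.0000])
```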
9 Issues closed by 6 people
- cannot save fp8-wo model (#2230, closed May 21, 2025)
- KleidiAI int4 kernels not loading properly on aarch64 Linux (#2143, closed May 16, 2025)
- New test files will likely fail on ROCM (#2204, closed May 13, 2025)
- FSDP2 + CPU Offload + AdamW8bit issue (#1931, closed May 10, 2025; see the optimizer sketch after this list)
- nightly build for mac stops on 0422 (#2157, closed May 6, 2025)
- Torchao's CPU overhead counteracts the performance benefit of using quantization kernel (#1930, closed May 6, 2025)
- QAT docs (#2155, closed May 2, 2025)
- [Doc] gemlite version (#1653, closed May 2, 2025)
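Issue #1931 above (FSDP2 + CPU Offload + AdamW8bit) was closed by merged PR #2195. For context, basic usage of the low-bit optimizer involved looks roughly like this, assuming it is exposed as torchao.optim.AdamW8bit (earlier releases kept it under torchao.prototype.low_bit_optim):

```python
import torch
from torch import nn
from torchao.optim import AdamW8bit  # assumed path; formerly torchao.prototype.low_bit_optim

model = nn.Linear(4096, 4096, device="cuda", dtype=torch.bfloat16)
# Drop-in replacement for torch.optim.AdamW that keeps optimizer state
# in 8 bits; #2195 fixed its interaction with FSDP2 + CPU offload.
optim = AdamW8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)).sum()
loss.backward()
optim.step()
optim.zero_grad()
```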
18 Issues opened by 12 people
- QAT range learning tracker (#2271, opened May 29, 2025)
- [pt2e] QAT training and FSDP support (#2265, opened May 27, 2025)
- convert_to_float8_training and torch.compile make model slow (#2262, opened May 26, 2025; see the sketch after this list)
- torch.ao.quantization deprecation tracker (#2259, opened May 24, 2025)
- We should deprecate Float8LinearConfig.force_recompute_fp8_weight_in_bwd (#2251, opened May 23, 2025)
- int4_weight_only get plain weight are padded (#2249, opened May 23, 2025)
- `quantize_(nn.Linear)` doesn't work with module swaps (#2241, opened May 22, 2025)
- BatchNorm + Convolution fusion in `prepare_pt2e` removal (#2245, opened May 22, 2025)
- Tensor Subclass + VLLM Compile (#2239, opened May 22, 2025)
- MXFP Inference Tracking Doc (#2229, opened May 21, 2025)
- [Quant] Can quant not be decomposed on inductor? (#2228, opened May 20, 2025)
- newer torchao breaks sglang? (#2226, opened May 19, 2025)
- TorchAO needs to update its build system (#2222, opened May 19, 2025)
- Ship all CUDA kernels in a single .so (#2220, opened May 19, 2025)
- Add MXFP casting kernels from triton Repro (#2217, opened May 16, 2025)
- [QAT] Linear layer's weight quantization granularity can only be per_group (#2189, opened May 9, 2025)
- [float8] Support power of 2 scales with PerRow scales for inference (#2182, opened May 7, 2025)
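Issue #2262 above reports a slowdown when combining convert_to_float8_training with torch.compile. For reference, the conversion flow under discussion looks roughly like the sketch below; convert_to_float8_training is the documented torchao.float8 entry point, while the filter policy shown is illustrative.

```python
import torch
from torch import nn
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 8192),
    nn.GELU(),
    nn.Linear(8192, 4096),
).to("cuda", torch.bfloat16)

def module_filter_fn(mod: nn.Module, fqn: str) -> bool:
    # Illustrative policy: only convert linears with fp8-friendly shapes.
    if isinstance(mod, nn.Linear):
        return mod.in_features % 16 == 0 and mod.out_features % 16 == 0
    return True

# Swap eligible nn.Linear modules for their float8 training equivalents.
convert_to_float8_training(model, module_filter_fn=module_filter_fn)

# Issue #2262 concerns this combination:
model = torch.compile(model)
```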
19 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [CPU] Enable DA8W4 on CPU (#2128, commented on May 27, 2025; 5 new comments)
- Support microbenchmarking for low precision training (#2101, commented on May 8, 2025; 5 new comments)
- Enhance test_autoquant_compile to support ROCm (#2100, commented on May 14, 2025; 2 new comments)
- Implement dtensor.shard_dim_alltoall, aten.contiguous, aten.chunk (#2154, commented on May 20, 2025; 0 new comments)
- [WIP] all-gather fp8 for rowwise (#2145, commented on May 23, 2025; 0 new comments)
- ROCm mx-fp8 Gemm (#2066, commented on May 6, 2025; 0 new comments)
- [sparsity] Add PartialLinear module for structured sparsity (#1982, commented on May 15, 2025; 0 new comments)
- Fix wrong scale eps applied (#1770, commented on May 19, 2025; 0 new comments)
- [draft] add all_gather_into_tensor (#1737, commented on May 16, 2025; 0 new comments)
- Sam2 video (#1564, commented on Jun 1, 2025; 0 new comments)
- [roadmap/tracker] Low precision training for MoEs (#2147, commented on May 27, 2025; 0 new comments)
- MX single node performance tracker (#1768, commented on May 22, 2025; 0 new comments)
- [feature request] np.packbits / np.unpackbits, general BitTensors (maybe can be just tensors with dtype torch.bits8 or have a new dtype torch.bits introduced) and bit packed tensors utilities for saving memory / accesses, support for BitTensors wherever BoolTensors are used (#292, commented on May 15, 2025; 0 new comments; see the packing sketch after this list)
- How does this work with ONNX export and quantization? (#777, commented on May 14, 2025; 0 new comments)
- [float8] Add support for blockwise fp8 quantization scheme used in DeepSeek v3 (#1594, commented on May 13, 2025; 0 new comments)
- Dynamo error with large mesh + AdamWFp8 + bf16 stochastic rounding (#2074, commented on May 12, 2025; 0 new comments)
- Can FP8 GEMM be enabled via module hooks instead of module swapping? (#1887, commented on May 12, 2025; 0 new comments)
- [PT2E] observers do not handle inputs with different shapes correctly (#2112, commented on May 8, 2025; 0 new comments)
- QAT model drops accuracy after converting with torch.ao.quantization.convert (#2138, commented on May 5, 2025; 0 new comments)
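The long-standing feature request #292 above asks for np.packbits / np.unpackbits equivalents and bit-packed bool tensors. Pending a dedicated dtype, the idea can be sketched in plain PyTorch; packbits and unpackbits below are illustrative helpers, not a torchao API.

```python
import torch
import torch.nn.functional as F

def packbits(bits: torch.Tensor) -> torch.Tensor:
    # Pack a 1-D bool tensor into uint8, 8 bits per byte (MSB first),
    # mirroring np.packbits. Illustrative sketch, not a torchao API.
    n = bits.numel()
    padded = F.pad(bits.to(torch.uint8), (0, -n % 8))  # pad to a multiple of 8
    shifts = torch.tensor([7, 6, 5, 4, 3, 2, 1, 0], dtype=torch.uint8)
    return (padded.view(-1, 8) << shifts).sum(dim=1, dtype=torch.uint8)

def unpackbits(packed: torch.Tensor, n: int) -> torch.Tensor:
    # Inverse of packbits; n is the original number of bits.
    shifts = torch.tensor([7, 6, 5, 4, 3, 2, 1, 0], dtype=torch.uint8)
    bits = (packed.unsqueeze(-1) >> shifts) & 1
    return bits.flatten()[:n].bool()

b = torch.rand(20) > 0.5
assert torch.equal(unpackbits(packbits(b), b.numel()), b)
```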