Commit e79f441

Merge branch 'site' into 4-15
2 parents: a362015 + f740983


52 files changed (+1956 / -544 lines)

.github/workflows/update-quick-start-module.yml

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ jobs:
   update-quick-start:
     needs: [linux-nightly-matrix, windows-nightly-matrix, macos-arm64-nightly-matrix,
             linux-release-matrix, windows-release-matrix, macos-arm64-release-matrix]
-    runs-on: "ubuntu-20.04"
+    runs-on: "ubuntu-latest"
     environment: pytorchbot-env
     steps:
       - name: Checkout pytorch.github.io

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+---
+category: event
+title: "Towards Autonomous Language Model Systems"
+date: May 21, 2025
+poster: assets/images/pt-day-cfp.png
+---
+
+<a href="/autonomous-language-model-systems">
+<img style="width:100%" src="/assets/images/autonomous-language-model-systems.png" alt="Towards Autonomous Language Model Systems">
+</a>
+
+**Date**: May 21, 2025, 11AM PT / 2PM ET
+**Location**: Online
+
+Language models (LMs) are increasingly used to assist users in day-to-day tasks such as programming (GitHub Copilot) or search (Google's AI Overviews). But can we build language model systems that are able to autonomously complete entire tasks end-to-end?
+
+In this talk, Ofir Press will discuss efforts to build autonomous LM systems, focusing on the software engineering domain. Ofir will present SWE-bench, a novel method for measuring AI systems on their ability to fix real issues in popular software libraries. Ofir will then discuss SWE-agent, a system for solving SWE-bench tasks.
+
+SWE-bench and SWE-agent are used by many leading AI organizations in academia and industry, including OpenAI, Anthropic, Meta, and Google, and SWE-bench has been downloaded over 2 million times. These projects show that academics on tight budgets can have a substantial impact in steering the research community toward building autonomous systems that can complete challenging tasks.
+
+Ofir is a postdoc at Princeton University, where they mainly work with Karthik Narasimhan's lab. Ofir previously completed their PhD at the University of Washington in Seattle, advised by Noah Smith, and during the PhD spent two years at Facebook AI Research Labs on Luke Zettlemoyer's team.
+
+[Register Now](/autonomous-language-model-systems)

_events/pt-27-release-qa.md

Lines changed: 6 additions & 4 deletions
@@ -10,14 +10,16 @@ poster: assets/images/pt27qa.png
 </a>
 
 **Date**: April 28, 12 pm PT
-**Speaker**: Piotr Bialecki
+**Speakers**: Piotr Bialecki (NVIDIA) and Nikita Shulga (Meta)
 **Location**: Online
 
-Wondering what's new in the PyTorch 2.7 release? Do you have questions? Join us for a live Q&A on PyTorch 2.7 with Piotr Bialecki, PyTorch Core Maintainer and Director of Engineering at NVIDIA.
+Have questions about PyTorch 2.7? Join PyTorch Core Maintainers Piotr Bialecki (NVIDIA) and Nikita Shulga (Meta) for a live Q&A session on Monday, April 28 at 12 PM PST.
 
 Piotr joined the PyTorch team at NVIDIA in 2019 and currently manages the team. He drives NVIDIA’s effort in maintaining and advancing PyTorch’s CUDA backend and received the PyTorch SUPERHERO award in 2023 for his community contributions, especially in the PyTorch discussion board. As a Core Maintainer, he is also focused on PyTorch’s long-term vision and development.
 
-Bring your PyTorch 2.7 questions for Piotr during this live Q&A session.
+Nikita is a Software Engineer at Meta where, among other things, he is responsible for PyTorch releases and continuous integration. Nikita is committed to uplifting the developer community and continuously improving PyTorch. He earned a Master’s degree in Applied Mathematics from the Moscow Institute of Physics and Technology (MIPT).
 
-[Register now](/pt-27-release-qa)
+Bring your PyTorch 2.7 questions for Piotr & Nikita during this live Q&A session.
+
+[Learn more about this event](/pt-27-release-qa)
_get_started/mobile.md

Lines changed: 24 additions & 6 deletions
@@ -1,6 +1,6 @@
 ---
 layout: get_started
-title: ExecuTorch
+title: PyTorch for Edge
 permalink: /get-started/executorch/
 background-class: get-started-background
 body-class: get-started
@@ -10,11 +10,29 @@ published: true
 
 ## Get Started with PyTorch ExecuTorch
 
-<p>
-  <a href="https://pytorch.org/executorch/stable/index.html" class="btn btn-lg with-right-arrow">
-    ExecuTorch Documentation
-  </a>
-</p>
+[ExecuTorch](https://github.com/pytorch/executorch/) is PyTorch’s edge-specific library, designed to be lightweight and highly performant even on devices with constrained hardware such as mobile phones, embedded systems, and microcontrollers.
+
+ExecuTorch relies heavily on PyTorch core technologies such as [torch.compile](https://pytorch.org/docs/stable/torch.compiler.html) and [torch.export](https://pytorch.org/docs/stable/export.html), and should be very familiar to anyone who has used PyTorch in the past.
+
+### Getting Started
+You can get started by following the [general getting started guide](https://pytorch.org/executorch/stable/getting-started.html#) or jump to the specific steps for your target device.
+
+* [Using ExecuTorch on Android](https://pytorch.org/executorch/stable/using-executorch-android.html)
+* [Using ExecuTorch on iOS](https://pytorch.org/executorch/stable/using-executorch-ios.html)
+* [Using ExecuTorch with C++](https://pytorch.org/executorch/stable/using-executorch-cpp.html)
+
+### Hardware Acceleration
+ExecuTorch provides out-of-the-box hardware acceleration for a growing number of chip manufacturers. See the following resources to learn more about how to leverage them:
+
+* [Backend Overview](https://pytorch.org/executorch/stable/backends-overview.html)
+* [XNNPACK](https://pytorch.org/executorch/stable/backends-xnnpack.html)
+* [Core ML](https://pytorch.org/executorch/stable/backends-coreml.html)
+* [MPS](https://pytorch.org/executorch/stable/backends-mps.html)
+* [Vulkan](https://pytorch.org/executorch/stable/backends-vulkan.html)
+* [ARM Ethos-U](https://pytorch.org/executorch/stable/backends-arm-ethos-u.html)
+* [Qualcomm AI Engine](https://pytorch.org/executorch/stable/backends-qualcomm.html)
+* [MediaTek](https://pytorch.org/executorch/stable/backends-mediatek.html)
+* [Cadence Xtensa](https://pytorch.org/executorch/stable/backends-cadence.html)
 
 
 <script page-id="mobile" src="{{ site.baseurl }}/assets/menu-tab-selection.js"></script>
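
To make the torch.export-based flow described above concrete, here is a hedged sketch of exporting a tiny model to an ExecuTorch `.pte` program, loosely following the getting-started guide linked in this diff; the `executorch.exir` APIs named below are assumptions that may differ between ExecuTorch releases.

```python
import torch
from executorch.exir import to_edge  # assumed import per the getting-started guide

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

model = TinyModel().eval()
example_inputs = (torch.randn(1, 4),)

# 1. Capture the model with torch.export (core PyTorch).
exported_program = torch.export.export(model, example_inputs)

# 2. Lower to the ExecuTorch Edge dialect and serialize a .pte program.
executorch_program = to_edge(exported_program).to_executorch()

with open("tiny_model.pte", "wb") as f:
    f.write(executorch_program.buffer)
```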

_get_started/previous-versions.md

Lines changed: 27 additions & 0 deletions
@@ -17,6 +17,33 @@ your convenience.
 
 ## Commands for Versions >= 1.0.0
 
+### v2.6.0
+
+#### Wheel
+
+##### OSX
+
+```
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
+```
+
+##### Linux and Windows
+
+```
+# ROCM 6.1 (Linux only)
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/rocm6.1
+# ROCM 6.2.4 (Linux only)
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/rocm6.2.4
+# CUDA 11.8
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
+# CUDA 12.4
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
+# CUDA 12.6
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
+# CPU only
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
+```
+
 ### v2.5.1
 
 #### Conda

_includes/main_menu.html

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@
   <span class="dropdown-title">Tools</span>
   <p>Learn about the tools and frameworks in the PyTorch Ecosystem</p>
 </a>
-<a class="nav-dropdown-item" href="https://github.com/pytorch-fdn/ecosystem" target="_blank">
+<a class="nav-dropdown-item" href="{{ site.baseurl}}/join-ecosystem">
   <span class="dropdown-title">Join the Ecosystem</span>
 </a>
 <a class="nav-dropdown-item" href="{{ site.baseurl }}/#community-module">

_includes/mobile_menu.html

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@
   <a href="https://landscape.pytorch.org/">Tools</a>
 </li>
 <li>
-  <a href="https://github.com/pytorch-fdn/ecosystem">Join the Ecosystem</a>
+  <a href="{{ site.baseurl}}/join-ecosystem">Join the Ecosystem</a>
 </li>
 <li>
   <a href="{{ site.baseurl}}/#community-module">Community</a>

_includes/quick_start_local.html

Lines changed: 4 additions & 7 deletions
@@ -58,16 +58,13 @@
   <div class="col-md-12 title-block mobile-heading">
     <div class="option-text">Package</div>
   </div>
-  <div class="col-md-3 option block" id="conda">
-    <div class="option-text">Conda</div>
-  </div>
-  <div class="col-md-3 option block selected" id="pip">
+  <div class="col-md-4 option block selected" id="pip">
     <div class="option-text">Pip</div>
   </div>
-  <div class="col-md-3 option block" id="libtorch">
+  <div class="col-md-4 option block" id="libtorch">
     <div class="option-text">LibTorch</div>
   </div>
-  <div class="col-md-3 option block" id="source">
+  <div class="col-md-4 option block" id="source">
     <div class="option-text">Source</div>
   </div>
 </div>
@@ -107,7 +104,7 @@
   <div class="option-text">Run this Command:</div>
 </div>
 <div class="command-container">
-  <div class="col-md-12" id="command">conda install pytorch torchvision -c pytorch</div>
+  <div class="col-md-12" id="command">pip install torch torchvision</div>
 </div>
 </div>
 </div>

_posts/2025-04-23-pytorch-2-7.md

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
+---
+layout: blog_detail
+title: "PyTorch 2.7 Release"
+---
+
+We are excited to announce the release of PyTorch® 2.7 ([release notes](https://github.com/pytorch/pytorch/releases/tag/v2.7.0))! This release features:
+
+* support for the [NVIDIA Blackwell GPU architecture](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/) and pre-built wheels for [CUDA 12.8](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html) across Linux x86 and arm64 architectures.
+* *torch.compile* support for Torch Function Modes, which enables users to override any *torch.** operation to implement custom user-defined behavior.
+* Mega Cache, which allows users to have end-to-end portable caching for torch.
+* new features for FlexAttention: LLM first token processing, LLM throughput mode optimization, and Flex Attention for Inference.
+
+This release is composed of 3262 commits from 457 contributors since PyTorch 2.6. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.7. More information about how to get started with the PyTorch 2-series can be found at our [Getting Started](https://pytorch.org/get-started/pytorch-2.0/) page.
+
+| Beta | Prototype |
+| --- | --- |
+| Torch.Compile support for Torch Function Modes | NVIDIA Blackwell Architecture Support |
+| Mega Cache | PyTorch Native Context Parallel |
+| | Enhancing Intel GPU Acceleration |
+| | FlexAttention LLM first token processing on x86 CPUs |
+| | FlexAttention LLM throughput mode optimization on x86 CPUs |
+| | Foreach Map |
+| | Flex Attention for Inference |
+| | Prologue Fusion Support in Inductor |
+
+*To see a full list of public feature submissions, click [here](https://docs.google.com/spreadsheets/d/1TzGkWuUMF1yTe88adz1dt2mzbIsZLd3PBasy588VWgk/edit?usp=sharing).
+
+## BETA FEATURES
+
+### [Beta] Torch.Compile support for Torch Function Modes
+
+This feature enables users to override any *torch.** operation to implement custom user-defined behavior. For example, ops can be rewritten to accommodate a specific backend. This is used in FlexAttention to re-write indexing ops.
+
+See the [tutorial](https://pytorch.org/tutorials/recipes/torch_compile_torch_function_modes.html) for more information.
+
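As a rough illustration of what such an override can look like, here is a minimal sketch (not the FlexAttention rewrite itself) of a Torch Function Mode that redirects *torch.add* to *torch.sub* and is traced through *torch.compile*; the mode class and the toy function are made up for this example.

```python
import torch
from torch.overrides import TorchFunctionMode

# Illustrative mode: intercept torch.add and run torch.sub instead.
class AddToSub(TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.add:
            return torch.sub(*args, **kwargs)
        return func(*args, **kwargs)

@torch.compile
def f(x, y):
    return torch.add(x, y)

x, y = torch.randn(4), torch.randn(4)
with AddToSub():
    out = f(x, y)  # the compiled graph picks up the overridden behavior (x - y)
```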
+### [Beta] Mega Cache
+
+Mega Cache allows users to have end-to-end portable caching for torch. The intended use case is that, after compiling and executing a model, the user calls *torch.compiler.save_cache_artifacts()*, which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call *torch.compiler.load_cache_artifacts()* with these artifacts to pre-populate the torch.compile caches in order to jump-start their cache.
+
+See the [tutorial](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html#torch-compile-end-to-end-caching-mega-cache) for more information.
+
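A minimal sketch of that workflow, assuming the *torch.compiler.save_cache_artifacts()* / *load_cache_artifacts()* API described above (see the linked tutorial for the exact return types):

```python
import torch

@torch.compile
def fn(x):
    return x.sin() + x.cos()

fn(torch.randn(8))  # compile once so the torch.compile caches are populated

# Collect the compiler artifacts in a portable form (sketch; the call may
# return the serialized bytes together with cache metadata).
artifacts = torch.compiler.save_cache_artifacts()

# Later, potentially on a different machine, pre-populate the caches before compiling:
# torch.compiler.load_cache_artifacts(serialized_bytes)
```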
+## PROTOTYPE FEATURES
+
+### [Prototype] NVIDIA Blackwell Architecture Support
+
+PyTorch 2.7 introduces support for NVIDIA's new Blackwell GPU architecture and ships pre-built wheels for CUDA 12.8. For more details on CUDA 12.8 see [CUDA Toolkit Release](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html).
+
+* Core components and libraries including cuDNN, NCCL, and CUTLASS have been upgraded to ensure compatibility with Blackwell platforms.
+* PyTorch 2.7 includes Triton 3.3, which adds support for the Blackwell architecture with torch.compile compatibility.
+* To utilize these new features, install PyTorch with CUDA 12.8 using: *pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu128*
+
+More context can also be found [here](https://github.com/pytorch/pytorch/issues/145949).
+
+### [Prototype] PyTorch Native Context Parallel
+
+The PyTorch Context Parallel API allows users to create a Python context so that every *torch.nn.functional.scaled_dot_product_attention()* call within it runs with context parallelism. Currently, PyTorch Context Parallel supports 3 attention backends: 1. Flash attention; 2. Efficient attention; and 3. cuDNN attention.
+
+As an example, this is [used within TorchTitan as the Context Parallel solution for LLM training](https://discuss.pytorch.org/t/distributed-w-torchtitan-breaking-barriers-training-long-context-llms-with-1m-sequence-length-in-pytorch-using-context-parallel/215082).
+
+See the [tutorial](https://pytorch.org/tutorials/prototype/context_parallel.html) here.
+
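A hedged sketch of the Context Parallel API based on the linked tutorial; the module path, argument names, and the two-GPU mesh below are assumptions that may differ from your PyTorch build, and the script is meant to be launched with torchrun across the participating ranks.

```python
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.experimental import context_parallel  # assumed import path

mesh = init_device_mesh("cuda", (2,))  # assumes 2 GPUs under torchrun

# Query/key/value with layout (batch, heads, seq, head_dim); seq is dim 2.
q = torch.randn(1, 8, 4096, 64, device="cuda", requires_grad=True)
k = torch.randn(1, 8, 4096, 64, device="cuda", requires_grad=True)
v = torch.randn(1, 8, 4096, 64, device="cuda", requires_grad=True)

# Every SDPA call inside the context runs with context parallelism, with the
# sequence dimension of the listed buffers sharded across the mesh.
with context_parallel(mesh, buffers=(q, k, v), buffer_seq_dims=(2, 2, 2)):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```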
+### [Prototype] Enhancing Intel GPU Acceleration
+
+This latest release introduces enhanced performance optimizations for Intel GPU architectures. These improvements accelerate workloads across various Intel GPUs through the following key enhancements:
+
+* Enable torch.compile on Windows 11 for Intel GPUs, delivering the same performance advantages over eager mode as on Linux.
+* Optimize the performance of PyTorch 2 Export Post Training Quantization (PT2E) on Intel GPU to provide a full graph-mode quantization pipeline with enhanced computational efficiency.
+* Improve Scaled Dot-Product Attention (SDPA) inference performance with bfloat16 and float16 to accelerate attention-based models on Intel GPUs.
+* Enable AOTInductor and torch.export on Linux to simplify deployment workflows.
+* Implement more ATen operators to enhance the continuity of operator execution and increase eager-mode performance on Intel GPU.
+* Enable the profiler on both Windows and Linux to facilitate model performance analysis.
+* Expand Intel GPU support to [Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html) and [Intel® Arc™ B-Series graphics](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/desktop/b-series/overview.html) on both Windows and Linux.
+
+For more information regarding Intel GPU support, please refer to the [Getting Started Guide](https://pytorch.org/docs/main/notes/get_start_xpu.html).
+
+See also the tutorials [here](https://pytorch.org/tutorials/prototype/inductor_windows.html) and [here](https://pytorch.org/tutorials/prototype/pt2e_quant_xpu_inductor.html).
+
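For example, on a machine with a supported Intel GPU and an XPU-enabled PyTorch build, the torch.compile path referenced in the first bullet above should apply on both Windows 11 and Linux; a minimal sketch with an illustrative model and shapes:

```python
import torch

# Falls back to CPU if no XPU-enabled build / Intel GPU is present.
device = "xpu" if torch.xpu.is_available() else "cpu"

model = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.GELU()).to(device)
compiled = torch.compile(model)

x = torch.randn(32, 128, device=device)
with torch.no_grad():
    y = compiled(x)  # first call triggers compilation for the target device
```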
+### [Prototype] FlexAttention LLM first token processing on x86 CPUs
+
+FlexAttention x86 CPU support was first introduced in PyTorch 2.6, offering optimized implementations such as PageAttention (critical for LLM inference) via the TorchInductor C++ backend. In PyTorch 2.7, more attention variants for first token processing of LLMs are supported. With this feature, users can have a smoother experience running FlexAttention on x86 CPUs, replacing specific *scaled_dot_product_attention* operators with a unified FlexAttention API, and benefiting from general support and good performance when using torch.compile.
+
+### [Prototype] FlexAttention LLM throughput mode optimization
+
+The performance of FlexAttention on x86 CPUs for LLM inference throughput scenarios has been further improved by adopting the new C++ micro-GEMM template ability. This addresses the performance bottlenecks for large batch size scenarios present in PyTorch 2.6. With this enhancement, users can transparently benefit from better performance and a smoother experience when using FlexAttention APIs and torch.compile for LLM throughput serving on x86 CPUs.
+
+### [Prototype] Foreach Map
+
+This feature uses torch.compile to allow users to apply any pointwise or user-defined function (e.g. torch.add) to lists of tensors, akin to the existing *torch._foreach_** ops. The main advantage over the existing *torch._foreach_** ops is that any mix of scalars or lists of tensors can be supplied as arguments, and even user-defined Python functions can be lifted to apply to lists of tensors. torch.compile will automatically generate a horizontally fused kernel for optimal performance.
+
+See the [tutorial](https://pytorch.org/tutorials/recipes/foreach_map.html) here.
+
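For context, the existing *torch._foreach_** ops that Foreach Map generalizes operate on whole lists of tensors in one call, as in the sketch below; the new *foreach_map* API itself, which lifts arbitrary user-defined functions in the same way under torch.compile, is shown in the linked recipe.

```python
import torch

params = [torch.randn(64) for _ in range(4)]
grads = [torch.randn(64) for _ in range(4)]

# One call over the whole list instead of a Python loop of per-tensor torch.add calls.
updated = torch._foreach_add(params, grads)   # list of 4 tensors
scaled = torch._foreach_mul(updated, 0.5)     # scalars can be mixed in as well

assert len(scaled) == 4 and scaled[0].shape == torch.Size([64])
```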
+### [Prototype] Flex Attention for Inference
+
+In release 2.5.0, [FlexAttention](https://pytorch.org/blog/flexattention/) *torch.nn.attention.flex_attention* was introduced for ML researchers who’d like to customize their attention kernels without writing kernel code. This update introduces a decoding backend optimized for inference, supporting GQA and PagedAttention, along with feature updates including nested jagged tensor support, performance tuning guides, and trainable biases support.
+
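A minimal sketch of the *flex_attention* API with a causal *score_mod*; the shapes are illustrative, and in practice the call is usually wrapped in torch.compile, with the new inference decoding backend selected as described in the linked materials.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# score_mod receives the raw attention score plus (batch, head, q_idx, kv_idx) indices.
def causal(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

out = flex_attention(q, k, v, score_mod=causal)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```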
+### [Prototype] Prologue Fusion Support in Inductor
+
+Prologue fusion optimizes matrix multiplication (matmul) operations by fusing operations that come before the matmul into the matmul kernel itself, improving performance by reducing global memory traffic.
