Wafer-scale integration

Wafer-scale integration (WSI) is a system of building very large integrated circuit (commonly called a "chip") networks from an entire silicon wafer to produce a single "super-chip". By combining large size with reduced packaging, WSI was expected to dramatically reduce costs for some systems, notably massively parallel supercomputers, and it is now being employed for deep learning. The name is taken from the term very-large-scale integration, the state of the art when WSI was being developed.

Overview

In the normal integrated circuit manufacturing process, a single large cylindrical crystal (boule) of silicon is produced and then cut into disks known as wafers. The wafers are then cleaned and polished in preparation for the fabrication process. A photographic process is used to pattern the surface, marking where material should be deposited on the wafer and where it should not. The desired material is deposited and the photographic mask is removed for the next layer. From then on the wafer is repeatedly processed in this fashion, building up layer after layer of circuitry on the surface.

Multiple copies of these patterns are deposited in a grid across the surface of the wafer. After all the possible locations are patterned, the wafer surface looks like a sheet of graph paper, with grid lines delineating the individual chips. Each grid location is tested for manufacturing defects by automated equipment. Locations found to be defective are recorded and marked with a dot of paint, a process referred to as "inking a die"; more modern wafer fabrication techniques no longer require physical markings to identify defective dies. The wafer is then sawed apart to cut out the individual chips. The defective chips are thrown away or recycled, while the working chips are placed into packaging and re-tested for any damage that might have occurred during the packaging process.

Flaws on the surface of the wafers and problems during the layering and depositing process are impossible to avoid, and cause some of the individual chips to be defective. The revenue from the remaining working chips has to pay for the entire cost of the wafer and its processing, including the discarded defective chips. Thus, the higher the number of working chips, or yield, the lower the cost of each individual chip. To maximize yield, chips are made as small as possible: each defect then spoils a smaller share of the wafer, so a larger fraction of the wafer's area ends up in working chips.
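The relationship between chip size, yield and cost can be illustrated with a minimal sketch. It assumes a simple Poisson defect model, in which the chance that a chip is defect-free falls exponentially with its area; this model, the function names and every number below (wafer cost, defect density, die sizes) are illustrative assumptions and do not come from the text above.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of defect-free dies, assuming a Poisson defect model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def cost_per_good_die(wafer_cost: float, wafer_area_mm2: float,
                      die_area_mm2: float, defects_per_mm2: float) -> float:
    """Wafer cost divided by the expected number of working dies.

    Simplification: dies are assumed to tile the wafer area exactly,
    so edge loss and scribe lines are ignored.
    """
    dies_per_wafer = wafer_area_mm2 // die_area_mm2
    good_dies = dies_per_wafer * poisson_yield(die_area_mm2, defects_per_mm2)
    return wafer_cost / good_dies


# Hypothetical inputs: a 300 mm wafer, 0.001 defects per mm^2, $10,000 per wafer.
WAFER_AREA = math.pi * 150 ** 2
for area in (100, 400):
    y = poisson_yield(area, 0.001)
    c = cost_per_good_die(10_000, WAFER_AREA, area, 0.001)
    print(f"die {area} mm^2: yield {y:.1%}, cost per good mm^2 {c / area:.3f}")
```

Under these assumptions the smaller die has both the higher yield and the lower cost per square millimetre of working silicon, which is the effect the paragraph describes.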

Lowering cost

A significant fraction of the cost of fabrication (typically 30%-50%)^{[citation needed]} is related to testing and packaging the individual chips. Further cost is associated with connecting the chips into an integrated system (usually via a printed circuit board). Wafer-scale integration seeks to reduce this cost, as well as improve performance, by building larger chips in a single package – in principle, chips as large as a full wafer.^{[citation needed]}

This is difficult: given the flaws on a wafer, a single large design printed across it would almost never work. It has been an ongoing goal to develop methods that handle faulty areas of the wafer through logic, as opposed to sawing them out of the wafer. Generally, this approach uses a grid pattern of sub-circuits and "rewires" around the damaged areas using appropriate logic. If the resulting wafer has enough working sub-circuits, it can be used despite faults.
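The idea of "rewiring" around damaged sub-circuits can be sketched as follows. This is a hypothetical scheme chosen for simplicity, not the method of any particular WSI design: each physical row is assumed to carry a few spare sub-circuits, and each logical column is assigned to the next working physical column in that row. The function name `remap_rows` and the defect map are made up for illustration.

```python
from typing import List, Optional


def remap_rows(defect_map: List[List[bool]],
               logical_cols: int) -> Optional[List[List[int]]]:
    """Map logical columns onto working physical columns, row by row.

    defect_map[r][c] is True when the sub-circuit at row r, column c is faulty.
    Returns, for each row, the physical column used by each logical column,
    or None if any row has fewer working sub-circuits than logical_cols.
    """
    mapping = []
    for row in defect_map:
        working = [col for col, defective in enumerate(row) if not defective]
        if len(working) < logical_cols:
            return None  # not enough working sub-circuits: wafer unusable
        mapping.append(working[:logical_cols])
    return mapping


# Hypothetical wafer: 3 rows of 5 physical sub-circuits, 4 logical columns needed.
defects = [
    [False, True, False, False, False],   # column 1 is faulty
    [False, False, False, False, False],  # no faults, spare left unused
    [False, False, False, True, False],   # column 3 is faulty
]
print(remap_rows(defects, logical_cols=4))
```

In this sketch a row survives as long as its faults do not outnumber its spares; a real design would also have to account for the wiring needed to reach the substituted sub-circuits.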

Challenges

Most yield loss in chipmaking comes from defects in the transistor layers or in the high-density lower metal layers. Another approach – silicon-interconnect fabric (Si-IF) – places neither on the wafer. Si-IF puts only relatively low-density metal layers on the wafer, roughly the same density as the upper layers of a system on a chip, and uses the wafer only for interconnects between tightly packed small bare chiplets.^[1] Si-IF-based processors^[2] and network switches^[3] have been studied.

Production attempts

Many companies attempted to develop WSI production systems in the 1970s and 1980s, but all failed. Texas Instruments and ITT Corporation both saw it as a way to develop complex pipelined microprocessors and re-enter a market where they were losing ground, but neither released any products.

Gene Amdahl also attempted to develop WSI as a method of making a supercomputer, starting Trilogy Systems in 1980^[4]^[5]^[6] and garnering investments from Groupe Bull, Sperry Rand and Digital Equipment Corporation, which (along with others) provided an estimated $230 million in financing. The design called for a 2.5-inch square chip with 1200 pins on the bottom.

The effort was plagued by a series of disasters, including floods which delayed the construction of the plant and later ruined the clean-room interior. After burning through about one third of the capital with nothing to show for it, Amdahl eventually declared the idea would only work with a 99.99% yield, which would not happen for 100 years. He used Trilogy's remaining seed capital to buy Elxsi, a maker of superminicomputers, in 1985. The Trilogy efforts were eventually ended and "became" Elxsi.^[7]

In 1989 Anamartic developed a wafer stack memory based on the technology of Ivor Catt,^[8] but the company was unable to ensure a large enough supply of silicon wafers and folded in 1992.

Wafer-scale devices in production

Cerebras Systems processor

On August 19, 2019, American computer systems company Cerebras Systems presented its progress on WSI for deep learning acceleration. Cerebras' Wafer-Scale Engine (WSE-1) chip measures 46,225 mm² (215 mm × 215 mm), around 56× larger than the largest GPU die. It is manufactured by TSMC using its 16 nm process. The WSE-1 features 1.2 trillion transistors, 400,000 AI cores, 18 GB of on-chip SRAM, 100 Pbit/s of on-wafer fabric bandwidth, and 1.2 Pbit/s of off-wafer I/O bandwidth. The price and clock rate have not been disclosed.^[9] In 2020, the company's product, the CS-1, was tested in computational fluid dynamics simulations. Compared to the Joule Supercomputer at NETL, the CS-1 was 200 times faster while using much less power.^[10]

In April 2021, Cerebras announced the WSE-2, with twice the number of transistors and a claimed 100% yield,^[11] achieved by designing a system in which any manufacturing defect can be bypassed.^[11] The Cerebras CS-2 system, which incorporates the WSE-2, is in serial production.

In March 2024, Cerebras announced the WSE-3, with twice the performance of the previous record-holder, the Cerebras WSE-2, at the same power draw and for the same price. It is aimed at AI training and built on TSMC's 5 nm process.^[12]

References

[1] Puneet Gupta and Subramanian S. Iyer. "Goodbye, Motherboard. Hello, Silicon-Interconnect Fabric". 2019.
[2] Saptadeep Pal, Daniel Petrisko, Matthew Tomei, Puneet Gupta, Subbu Iyer, and Rakesh Kumar. "Architecting a Waferscale Processor - A GPU Case Study". 2019.
[3] Shuangliang Chen, Saptadeep Pal, and Rakesh Kumar. "Waferscale Network Switches". 2024.
[4] Fortune Magazine article on Trilogy's history, 1986-09-01.
[5] Eric N. Berg. "Can Troubled Trilogy Fulfill Its Dream?". NYTimes, July 8, 1984.
[6] Trilogy definition in PCMag Encyclopedia.
[7] Ivor Catt. "Dinosaur Computers". Electronics World, June 2003.
[8] "Anamartic Wafer Stack". Computing History. Retrieved 27 September 2020.
[9] Cutress, Dr Ian. "Hot Chips 31 Live Blogs: Cerebras' 1.2 Trillion Transistor Deep Learning Processor". www.anandtech.com. Retrieved 2019-08-29.
[10] "Cerebras' wafer-size chip is 10,000 times faster than a GPU". VentureBeat. 2020-11-17. Retrieved 2020-11-26.
[11] Cutress, Dr Ian. "Cerebras Unveils Wafer Scale Engine Two (WSE2): 2.6 Trillion Transistors, 100% Yield". www.anandtech.com. Retrieved 2021-07-26.
[12] "Cerebras Systems Unveils World's Fastest AI Chip with Whopping 4 Trillion Transistors". Cerebras Systems. 2024-03-11. Retrieved 2024-03-19.

External links

"Giant microcircuits for superfast computers", Jim Schefter, Popular Science, January 1984, pp 66–67, 155
