Nvidia’s ‘flawed’ AI superchips that CEO Jensen Huang said had been fixed are still overheating

TOI Tech Desk / TIMESOFINDIA.COM / Updated: Nov 18, 2024, 15:47 IST

Nvidia's upcoming Blackwell AI chips are facing overheating issues in high-density server setups, potentially causing delays for major customers such as Meta and Google. Nvidia has described the engineering iterations as a normal part of the development process, and reports suggest the company is working with suppliers to resolve the problem. This setback follows an earlier design flaw that hurt production.

Nvidia's highly anticipated Blackwell AI chips have run into another hurdle: according to a report by The Information, the chips are overheating in the servers built to house them. The problems have raised concerns among some customers who fear delays in getting their new data centres operational.
The report by The Information, as cited by news agency Reuters, says that the Blackwell graphics processing units (GPUs), dubbed a 'superchip' and expected to be available by the end of 2024, overheat when deployed in high-density server racks designed to hold up to 72 chips.
Citing sources, the report also notes that Nvidia has repeatedly asked its suppliers to modify the rack design to address the overheating, but a definitive solution has not yet been found.

What Nvidia has to say

Nvidia acknowledged the challenges but downplayed the issue.
“Nvidia is working with leading cloud service providers as an integral part of our engineering team and process. The engineering iterations are normal and expected,” a company spokesperson was quoted as saying.
The ongoing problems may add to delays in the Blackwell rollout, potentially affecting major customers such as Meta, Google and Microsoft, which are eager to use the chip's capabilities for AI applications, the report said.

“...100% Nvidia's fault”: CEO Jensen Huang

Last month, Nvidia CEO Jensen Huang said that a design flaw that hurt the Blackwell chips’ production and caused delays had been fixed.
“It was functional, but the design flaw caused the yield to be low. It was 100% Nvidia's fault. In order to make a Blackwell computer work, seven different types of chips were designed from scratch and had to be ramped into production at the same time,” he said.
"What TSMC did was to help us recover from that yield difficulty and resume the manufacturing of Blackwell at an incredible pace," the CEO added.
Despite these setbacks, the Blackwell chip represents a significant leap forward in AI processing power. By combining two silicon dies into a single component, it is 30 times faster than its predecessor at tasks such as generating chatbot responses, according to Nvidia.
