AI Accelerator
Definition
An AI accelerator is a high-performance parallel compute machine designed
specifically for the efficient processing of AI workloads such as neural
networks.
Traditionally, in software design, computer scientists focused on developing
algorithmic approaches that matched specific problems and implemented them in
a high-level procedural language. To take advantage of available hardware,
some algorithms could be threaded; however, massive parallelism was difficult to
achieve because of the implications of Amdahl’s Law.
Thanks to big data and pervasive connectivity, a new paradigm has emerged: design by
optimization. Under this methodology, data scientists use inherently parallel computing
systems, such as neural networks, that ingest massive amounts of data and train
themselves through iterative optimization.
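To make the idea concrete, the sketch below shows a minimal "design by optimization" loop in PyTorch: rather than hand-coding an algorithm, a small network is fitted to data through iterative, gradient-based optimization. The model, data, and hyperparameters are illustrative placeholders, not taken from this article.

```python
# Minimal sketch of design by optimization: a small network learns a mapping
# from synthetic data through iterative optimization (gradient descent).
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1024, 16)                      # synthetic input data
y = (X.sum(dim=1, keepdim=True) > 0).float()   # synthetic labels

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):          # the iterative optimization loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # forward pass: highly parallel tensor math
    loss.backward()              # backward pass computes gradients
    optimizer.step()             # update parameters to reduce the loss
```

Every iteration of this loop is dominated by dense tensor operations, which is exactly the kind of work AI accelerators are built to execute in parallel.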
The industry's predominant workhorses for executing software, general-purpose
processors built around standardized instruction set architectures (ISAs), are not well
suited to this approach. Instead, AI accelerators have emerged to deliver the processing
power and energy efficiency needed to enable our world of abundant-data computing.
How does an AI Accelerator work?
There are currently two distinct AI accelerator spaces: the data center and the
edge.
Data centers, particularly hyperscale data centers, require massively scalable
compute architectures. For this space, the chip industry is going big. Cerebras,
for example, has pioneered the Wafer-Scale Engine (WSE), the biggest chip ever
built, for deep-learning systems. By delivering more compute, memory, and
communication bandwidth, the WSE can support AI research at dramatically
faster speeds and with greater scalability than traditional architectures.
The edge represents the other end of the spectrum. Here, energy efficiency is
key and real estate is limited, since intelligence is distributed at the edge of
the network rather than in a more centralized location. AI accelerator IP is
integrated into edge SoC devices that, no matter how small, deliver the near-
instantaneous results needed for, say, interactive programs running on
smartphones or for industrial robotics.
The different types of hardware AI Accelerators
While the WSE is one approach for accelerating AI applications, there are a
variety of other types of hardware AI accelerators for applications that don’t
require one large chip. Examples include:
Each of these is a separate chip, and tens to hundreds of them can be combined into
larger systems capable of processing large neural networks. Coarse-grained
reconfigurable architectures (CGRAs) are gaining significant momentum in this space
because they offer an attractive tradeoff between performance and energy efficiency on
one hand and the flexibility to program different networks on the other.
For example, consider Megatron, one of the world’s largest transformer-based language
neural network models for natural language processing (NLP). Created by the Applied
Deep Learning Research team at NVIDIA, Megatron provides an 8.3 billion parameter
transformer language model with 8-way model parallelism and 64-way data parallelism,
according to NVIDIA. To execute this model, which is generally pre-trained on a dataset
of 3.3 billion words, the company developed the NVIDIA A100 GPU, which delivers 312
teraFLOPs of FP16 compute power. Google’s TPU provides another example; it can be
combined in pod configurations that deliver more than 100 petaFLOPS of processing
power for training neural network models.
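As a back-of-the-envelope illustration (not a figure from this article), a commonly used rule of thumb estimates transformer training compute at roughly 6 × parameters × tokens. Combined with the 312 teraFLOPs FP16 peak quoted above, it gives a rough lower bound on single-GPU training time at perfect utilization, which is why models of this size are trained across many accelerators with model and data parallelism.

```python
# Rough estimate only; the 6 * params * tokens rule and treating the 3.3
# billion words as tokens are simplifying assumptions, not figures from NVIDIA.
params = 8.3e9                 # Megatron parameters
tokens = 3.3e9                 # pre-training corpus size, treated as tokens
total_flops = 6 * params * tokens
a100_fp16_peak = 312e12        # FLOP/s, FP16 peak quoted for the A100

seconds_at_peak = total_flops / a100_fp16_peak
print(f"~{seconds_at_peak / 3600:.0f} hours on a single A100 at theoretical peak")
```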
Different AI accelerator architectures may offer different performance tradeoffs, but they
all require an associated software stack to enable system-level performance; otherwise,
the hardware could be underutilized. To facilitate connectivity between high-level
software frameworks, such as TensorFlow™ or PyTorch™, and different AI accelerators,
machine learning compilers are emerging to enable interoperability. A representative
example is the Facebook Glow compiler.
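As an illustrative sketch of this interoperability path (the model and file name are placeholders, and exporting to a portable graph format is one common route rather than the Glow-specific flow), a framework-level model can be serialized to ONNX, which accelerator compilers can then ingest and lower to their target hardware:

```python
# Hypothetical example: export a PyTorch model to ONNX so that an ML compiler
# targeting an AI accelerator can consume the graph.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
).eval()

example_input = torch.randn(1, 3, 32, 32)   # dummy input defines the traced graph
torch.onnx.export(model, example_input, "model.onnx", opset_version=13)
```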
Measuring the performance of AI accelerators has been a contentious topic. For an
independent assessment of the training and inference performance of machine
learning hardware, software, and services, teams can consult MLPerf, an
organization formed by a group of engineers and researchers from industry and
academia.
As intelligence moves to the edge in many applications, it is creating greater
differentiation in AI accelerators. The edge offers a tremendous variety of
applications that require AI accelerators specifically optimized for different
characteristics, such as latency, energy efficiency, and memory, based on the needs of
the end application. For example, while autonomous navigation demands a
computational response latency limit of 20μs, voice and video assistants must
understand spoken keywords in less than 10μs and hand gestures in a few
hundred milliseconds.
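A simple way to reason about such requirements is to measure an inference against a latency budget. The sketch below is illustrative only; the model, input shape, and budget are placeholders, and no real accelerator is involved.

```python
# Illustrative latency-budget check for an edge inference workload.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8)).eval()
x = torch.randn(1, 64)
budget_us = 20.0                 # e.g., a 20 microsecond response budget

with torch.no_grad():
    model(x)                     # warm-up run to exclude one-time setup costs
    start = time.perf_counter()
    model(x)
    elapsed_us = (time.perf_counter() - start) * 1e6

print(f"latency {elapsed_us:.1f} us, budget {budget_us:.0f} us, "
      f"meets budget: {elapsed_us <= budget_us}")
```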
In the future, cognitive systems, which aim to simulate human thought processes,
will emerge with greater prominence. Compared with today's neural networks,
cognitive systems will have a deeper understanding of how to interpret data at
different levels of abstraction.
Benefits of an AI Accelerator
Given that processing speed and scalability are two key demands from AI
applications, AI accelerators play a critical role in delivering the near-
instantaneous results that make these applications valuable. Let’s dive into the
top benefits of AI accelerators in some more detail: