2022-01-01 How Much Longer Can Computing Power Drive Artificial Intelligence Progress? (CSET)
AUTHORS
Andrew J. Lohn
Micah Musser
Executive Summary
In the field of AI, not checking the news for a few months is
enough to become “out of touch.” Occasionally, this breakneck
speed of development is driven by revolutionary theories or
original ideas. More often, the newest state-of-the-art model
relies on no new conceptual advances at all, just a larger neural
network trained on more powerful computing systems than were
used in previous attempts.
Central Processing Unit (CPU)
  Role in AI: Small models can be directly trained or fine-tuned on CPUs; necessary in larger models as a means to coordinate training across GPUs or ASICs. Sometimes needed to generate or manipulate training data.
  General use: Central unit of every computing device; at least one CPU is necessary for every computer, phone, smart appliance, etc.
Field Programmable Gate Array (FPGA)
  Role in AI: Primarily used for model inference using AI models that have already been trained.
  General use: Used in a wide variety of applications, particularly in embedded systems.
Source: CSET.
*
OpenAI did not release the actual costs, but estimates are typically higher than
ours because we have made conservatively low pricing assumptions and
assumed 100 percent accelerator utilization. An estimate of about $4.6 million
is probably more accurate. We use conservatively low cost estimates to ensure
that we do not overstate the rising cost of training models and the impending
slowdown in compute growth. Chuan Li, “OpenAI’s GPT-3 Language Model: A
Technical Overview,” Lambda Labs, June 3, 2020,
https://lambdalabs.com/blog/demystifying-gpt-3/.
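To illustrate how estimates of this kind are assembled, the minimal sketch below works out a training cost from total training compute, per-accelerator throughput, and an hourly cloud price. The inputs are illustrative assumptions rather than figures from this report: roughly 3.14e23 FLOPs is the commonly cited total training compute for GPT-3, and the throughputs and prices are advertised peak specs and on-demand cloud rates.

```python
# Back-of-the-envelope method for estimating the cloud cost of training a
# GPT-3-scale model. All numbers are illustrative assumptions, not figures
# from this report. Varying the assumed utilization, precision, and price is
# exactly why published estimates differ so widely.

TOTAL_TRAINING_FLOPS = 3.14e23  # assumed total compute to train GPT-3

scenarios = {
    # name: (sustained FLOP/s per accelerator, USD per accelerator-hour)
    "A100, FP16 tensor-core peak": (312e12, 2.939),
    "V100, single-precision peak": (15e12, 2.48),
}

for name, (flops_per_sec, price_per_hour) in scenarios.items():
    gpu_hours = TOTAL_TRAINING_FLOPS / flops_per_sec / 3600
    cost = gpu_hours * price_per_hour
    print(f"{name}: {gpu_hours:,.0f} GPU-hours, ~${cost:,.0f}")
```

Under full-utilization tensor-core assumptions the figure lands well below the $4.6 million estimate, while single-precision assumptions push it well above it, which is why the conservatively low assumptions used in this report yield a lower number than most published estimates.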
*
Our calculations for reaching this conclusion, along with our calculations for
other figures in this and the next two sections, can be found in our GitHub
repository: https://github.com/georgetown-cset/AI_and_compute_2022.
*
For single precision, the A100 advertises 19.5 teraFLOPS and costs $2.939 per
hour on Google Cloud. The V100 costs $2.48 per hour and advertises single
precision at between 14 and 16.4 teraFLOPS.
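As a quick check on how little compute per dollar has improved across GPU generations, the snippet below divides each accelerator's advertised single-precision throughput by its hourly price, using only the figures quoted in this footnote.

```python
# Advertised single-precision throughput per cloud dollar, using the
# Google Cloud on-demand figures quoted in this footnote.
gpus = {
    # name: (advertised single-precision teraFLOPS, USD per hour)
    "A100": (19.5, 2.939),
    "V100 (low estimate)": (14.0, 2.48),
    "V100 (high estimate)": (16.4, 2.48),
}

for name, (tflops, price) in gpus.items():
    print(f"{name}: {tflops / price:.2f} teraFLOPS per dollar-hour")
```

On these advertised figures the A100 delivers roughly 6.6 teraFLOPS per dollar-hour versus roughly 5.6 to 6.6 for the V100, i.e., little to no improvement in single-precision compute per dollar.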
Source: CSET. Note: The blue line represents growing costs assuming compute
per dollar doubles every four years, with error shading representing no change
in compute costs or a doubling time as fast as every two years. The red line
represents expected GDP at a growth of 3 percent per year from 2019 levels
with error shading representing growth between 2 and 5 percent.
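To make the logic behind this projection concrete, the sketch below extrapolates a hypothetical training-cost curve in which demanded compute doubles every 3.4 months (the OpenAI trend cited in endnote 1) while compute per dollar doubles every four years, and compares it to U.S. GDP growing at 3 percent from a 2019 baseline. The starting cost, the approximate 2019 GDP figure, and the doubling times are assumptions for illustration, not the report's exact parameters.

```python
# Hypothetical projection of frontier training cost vs. GDP.
# Assumptions for illustration only (not the report's exact parameters):
#   - compute demanded by frontier models doubles every 3.4 months
#   - compute per dollar doubles every 4 years
#   - a frontier training run costs $5 million in year 0 (hypothetical)
#   - U.S. GDP starts near $21 trillion (2019) and grows 3 percent per year

START_COST = 5e6                     # hypothetical year-0 training cost (USD)
START_GDP = 21e12                    # approximate 2019 U.S. GDP (USD)
COMPUTE_DOUBLING_YEARS = 3.4 / 12    # demand-side doubling time
PRICE_DOUBLING_YEARS = 4.0           # compute-per-dollar doubling time
GDP_GROWTH = 0.03

for year in range(0, 13, 2):
    compute_growth = 2 ** (year / COMPUTE_DOUBLING_YEARS)
    price_decline = 2 ** (year / PRICE_DOUBLING_YEARS)
    cost = START_COST * compute_growth / price_decline
    gdp = START_GDP * (1 + GDP_GROWTH) ** year
    print(f"year {year:2d}: training cost ~${cost:.3g}, GDP ~${gdp:.3g}")
```

Under these illustrative assumptions the cost curve overtakes GDP within roughly a decade, which is the qualitative point of the figure: the historical compute trend cannot be sustained by spending growth alone.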
*
For this calculation, we assumed that because 37 percent of Nvidia's revenue
comes from the datacenter market, 37 percent of its units are shipped to
datacenters. But high-end AI processors are more expensive than most
consumer GPUs, so fewer Nvidia accelerators likely end up in cloud
datacenters each year than we have calculated.
†
Specifically, December 2025. Even if our estimate of the number of
accelerators available in the cloud for training is off by an order of magnitude,
this breaking point would still be reached by December 2026. The reality may
be even more pessimistic than we claim here, because our calculations
assume that every accelerator in the cloud operates continuously at a
throughput of 163 teraFLOPS, a figure that has been obtained
experimentally on Nvidia A100 GPUs but that likely overestimates the average
performance of all accelerators available in the cloud. See Deepak Narayanan et
al., “Efficient Large-Scale Language Model Training on GPU Clusters Using
Megatron-LM,” arXiv [cs.CL] (April 2021): arXiv:2104.04473.
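The claim that a tenfold error in the accelerator count shifts the breaking point by only about a year follows directly from the speed of the demand trend: if demanded compute doubles every 3.4 months, a supply that is ten times larger is absorbed after log2(10) doubling periods. The snippet below simply evaluates that expression; the 3.4-month doubling time is an assumption carried over from the OpenAI trend cited in endnote 1, not a number stated in this footnote.

```python
import math

# If demanded training compute doubles every DOUBLING_MONTHS, then a supply
# that is FACTOR times larger only delays the point at which demand exceeds
# supply by log2(FACTOR) doubling periods.
DOUBLING_MONTHS = 3.4   # assumed demand doubling time (OpenAI trend, endnote 1)
FACTOR = 10             # "off by an order of magnitude"

delay_months = math.log2(FACTOR) * DOUBLING_MONTHS
print(f"A {FACTOR}x supply error delays the breaking point by ~{delay_months:.1f} months")
```

That works out to roughly eleven months, consistent with the shift from December 2025 to December 2026 described above.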
Source: CSET.
Although these results may seem bleak, AI progress will not grind
to a halt. The trend of growing compute consumption that drove
many of the headlines of the past decade cannot last much
longer, but it will probably slow rather than end abruptly. We
should also not discount ingenuity and innovations that could lead
to continued progress even without ever-larger compute budgets.
For nearly a decade, buying and using more compute each year
has been a primary factor driving AI research beyond what was
previously thought possible. This trend is likely to break soon.
Although experts may disagree about which limitation is most
critical, continued progress in AI will soon require addressing
major structural challenges such as exploding costs, chip
shortages, and parallelization bottlenecks. Future progress will
likely rest far more on a shift toward efficiency in both algorithms
and hardware than on massive increases in compute usage. In
addition, we anticipate that the future of AI research will
increasingly rely on tailoring algorithms, hardware, and
approaches to sub-disciplines and applications.
Acknowledgments
1
Dario Amodei et al., “AI and Compute,” OpenAI, May 16, 2018,
https://openai.com/blog/ai-and-compute/.
2
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet
Classification with Deep Convolutional Neural Networks,” Communications of
the ACM 60, no. 6 (June 2017): 84-90.
3
Volodymyr Mnih et al., “Playing Atari with Deep Reinforcement Learning,”
arXiv preprint arXiv:1312.5602 (2013); David Silver et al., “Mastering Chess and
Shogi by Self-Play with a General Reinforcement Learning Algorithm,” arXiv
preprint arXiv:1712.01815 (2017).
4
Jacob Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding,” arXiv preprint arXiv:1810.04805 (2018); Yinhan Liu
et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv
preprint arXiv:1907.11692 (2019); Colin Raffel et al., “Exploring the Limits of
Transfer Learning with a Unified Text-to-Text Transformer,” arXiv preprint
arXiv:1910.10683 (2019); Alec Radford et al., “Language Models are
Unsupervised Multitask Learners,” Papers With Code, 2019,
https://paperswithcode.com/paper/language-models-are-unsupervised-
multitask; Tom Brown et al., “Language Models are Few-Shot Learners,” arXiv
preprint arXiv:2005.14165 (2020).
5
Brown, “Language Models.”
6
Rishi Bommasani et al., “On the Opportunities and Risks of Foundation
Models,” arXiv preprint arXiv:2108.07258 (2021).
7
For this paper, we mean to say that this trend is unsustainable in the sense
that the trend itself cannot continue. But it is worth mentioning that spiraling
compute demands are also unsustainable in an environmental sense. Training
GPT-3 released an estimated 552 tons of CO2 equivalent into the atmosphere—
the equivalent of 460 round-trip flights between San Francisco and New York.
David Patterson et al., “Carbon Emissions and Large Neural Network Training,”
arXiv [cs.LG] (April 2021): arXiv:2104.10350. It is easy to overstate the
importance of this value: even in 2020, roughly six times this many people flew
between San Francisco and New York every day. “SFO Fact Sheet,” FlySFO,
accessed December 4, 2021, https://www.flysfo.com/sfo-fact-sheet. This energy
consumption is also minuscule compared to other emerging technologies that
require enormous amounts of computing, most notably cryptocurrencies. In
2018, Bitcoin alone was estimated to have generated 100,000 times that
volume of CO2 emissions, and by 2021 the energy requirements of Bitcoin had
nearly doubled relative to 2018. That is more electricity than the entire nation of
8
Brown, “Language Models”; Wei Zeng et al., “PanGu-α: Large-scale
Autoregressive Pretrained Chinese Language Models with Auto-parallel
Computation,” arXiv preprint arXiv:2104.12369 (2021); Paresh Kharya and Ali
Alvi, “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the
World’s Largest and Most Powerful Generative Language Model,” Nvidia
Developer Blog, October 11, 2021, https://developer.nvidia.com/blog/using-
deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-
largest-and-most-powerful-generative-language-model/; Jack W. Rae et al.,
“Scaling Language Models: Methods, Analysis & Insights from Training Gopher,”
arXiv preprint arXiv:2112.11446 (2021).
9
Jennifer Langston, “Microsoft announces new supercomputer, lays out vision
for future AI work,” Microsoft AI Blog, May 19, 2020,
https://blogs.microsoft.com/ai/openai-azure-supercomputer/.
10
“Apple unleashes M1,” Apple, November 10, 2020,
https://www.apple.com/newsroom/2020/11/apple-unleashes-m1/; Jon
Martindale, “What is a teraflop?,” Digital Trends, June 14, 2021,
https://www.digitaltrends.com/computing/what-is-a-teraflop/.
11
Saif M. Khan and Alexander Mann, “AI Chips: What They Are and Why They
Matter” (Center for Security and Emerging Technology, April 2020),
https://doi.org/10.51593/20190014; Albert Reuther et al., “Survey of Machine
Learning Accelerators,” arXiv preprint arXiv:2009.00993 (2020).
12
Colby Banbury et al., “Benchmarking TinyML Systems: Challenges and
Direction,” arXiv preprint arXiv:2003.04821 (2020).
13
No accelerator is ever as efficient as its advertised full-utilization speeds
would suggest. Although our calculations assume that models are trained
continuously at the theoretical peak performance of the accelerators, AI
training workloads in practice typically sustain only a fraction of that peak.
14
“FAQs,” National Ignition Facility & Photon Science, accessed December 4,
2021, https://lasers.llnl.gov/about/faqs; Alex Knapp, “How Much Does It Cost to
Find a Higgs Boson?” Forbes, July 5, 2012,
https://www.forbes.com/sites/alexknapp/2012/07/05/how-much-does-it-cost-
to-find-a-higgs-boson/?sh=38b3b9e13948; Deborah D. Stine, “The Manhattan
Project, the Apollo Program, and Federal Energy Technology R&D Programs: A
Comparative Analysis” (Congressional Research Service, June 2009),
https://sgp.fas.org/crs/misc/RL34645.pdf.
15
Wikipedia maintains a historical record of the lowest hardware cost per unit of
performance over many years, showing decades of rapidly falling prices that
have stalled since 2017. See “FLOPS,” Wikipedia, last modified December 4,
2021, https://en.wikipedia.org/wiki/FLOPS#Hardware_costs.
16
Compare Web Archive captures of “Amazon EC2 P2 Instances,” Amazon Web
Services for February 10, 2017 (representing the earliest available capture) and
November 23, 2021 (representing the latest available capture at time of writing)
at
https://web.archive.org/web/20170210084643/https://aws.amazon.com/ec2/ins
tance-types/p2/ and
https://web.archive.org/web/20211123064633/https://aws.amazon.com/ec2/ins
tance-types/p2/, respectively. For Google Cloud pricing, compare Web Archive
captures of “GPUs pricing,” Google Cloud for August 26, 2019 (representing the
earliest available capture) and November 17, 2021 (representing the latest
available capture at time of writing) at
https://web.archive.org/web/20190826211015/https://cloud.google.com/compu
te/gpus-pricing and
https://web.archive.org/web/20211117225616/https://cloud.google.com/compu
te/gpus-pricing, respectively. Of all GPUs available at both times, only the
Nvidia T4 has declined in price since 2019. It is not among the most powerful
GPUs offered by Nvidia and is not commonly used for training large models.
17
See “Trends in the cost of computing,” AI Impacts, accessed December 4,
2021, https://aiimpacts.org/trends-in-the-cost-of-computing/. Much of the data
discussed in this research is either outdated or of dubious quality. A historical
doubling rate of roughly two years, with a more recent slowdown, is nonetheless
apparent across these sources.
18
Sean Hollister, “The street prices of Nvidia and AMD GPUs are utterly out of
control,” The Verge, March 23, 2021,
https://www.theverge.com/2021/3/23/22345891/nvidia-amd-rtx-gpus-price-
scalpers-ebay-graphics-cards.
19
Stephen Wilmot, “The Great Car-Chip Shortage Will Have Lasting
Consequences,” The Wall Street Journal, September 27, 2021,
https://www.wsj.com/articles/the-great-car-chip-shortage-will-have-lasting-
consequences-11632737422; Stephen Nellis, “Apple says chip shortage
reaches iPhone, growth forecast slows,” Reuters, July 27, 2021,
https://www.reuters.com/world/china/apple-beats-sales-expectations-iphone-
services-china-strength-2021-07-27/; Paul Tassi, “PS5 and Xbox Series X
Shortages Will Continue Through 2023, Most Likely,” Forbes, September 4,
2021, https://www.forbes.com/sites/paultassi/2021/09/04/ps5-and-xbox-
series-x-shortages-will-continue-through-2023-most-likely/; Abram Brown,
“The War Between Gamers and Cryptominers—and the Scarce Global Resource
that Sparked It,” Forbes, May 24, 2021,
https://www.forbes.com/sites/abrambrown/2021/05/24/gamers-cryptocurrency-
cryptominers-gpu-microchip/?sh=33f44052dbf8.
20
Although a single inference usually requires markedly less computation than
training a model, cumulative inference over a model's lifetime can demand far
more compute than the initial cost of training it. Nvidia has
claimed that as much as 80–90 percent of compute used in AI applications is
used for inference, not training. George Leopold, “AWS to Offer Nvidia’s T4
GPUs for AI Inferencing,” HPC Wire, March 29, 2019,
https://www.hpcwire.com/2019/03/19/aws-upgrades-its-gpu-backed-ai-
inference-platform/. This is another reason to think our predictions overestimate
how much time is left in the compute demand trendline: we focus only on the
growing cost of training, but if the demands of inference are factored in, the
overall cost of these AI models would be even higher and the supply of AI
accelerators available specifically for training would be even more strained.
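To see how large the inference share makes the lifetime compute bill, the snippet below converts Nvidia's claimed 80 to 90 percent inference share into a multiplier on training compute; the share range quoted above is the only input.

```python
# If inference accounts for a share s of all compute used over a model's
# lifetime, then lifetime compute = training compute / (1 - s).
for inference_share in (0.80, 0.90):
    lifetime_multiple = 1 / (1 - inference_share)
    print(f"inference share {inference_share:.0%}: lifetime compute ≈ "
          f"{lifetime_multiple:.0f}x the training compute")
```

In other words, a model's lifetime compute would be roughly five to ten times the compute spent on training alone.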
21
Aleksandar Kostovic, “GPU Shipments Soar in Q2 with 123 Million Units
Delivered,” Tom’s Hardware, August 27, 2021,
https://www.tomshardware.com/news/jpr-gpu-q2-vendor-share.
23
“NVIDIA maintains dominant position in 2020 market for AI processors for
cloud and data center,” Omdia, August 4, 2021,
https://omdia.tech.informa.com/pr/2021-aug/nvidia-maintains-dominant-
position-in-2020-market-for-ai-processors-for-cloud-and-data-center.
24
Paresh Kharya and Ali Alvi, “Using DeepSpeed and Megatron to Train
Megatron-Turing NLG 530B, the World’s Largest and Most Powerful
Generative Language Model,” NVIDIA Developer Blog, October 11, 2021,
https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-
megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-
language-model/.
26
Jared Kaplan et al., “Scaling Laws for Neural Language Models,” arXiv preprint
arXiv:2001.08361 (2020); Tom Henighan et al., “Scaling Laws for
Autoregressive Generative Modeling,” arXiv preprint arXiv:2010.14701 (2020).
27
It is important to note that this scaling law applies specifically to the single-
transformer architecture. There are other approaches, such as mixture-of-experts
models (discussed later), in which more parameters can be trained using less
compute at the cost of some aspects of performance. We analyze the
transformer architecture, as it is currently the favored approach for language
models and is versatile enough to perform well across a number of other
domains. Eventually, the transformer will likely be supplanted by other models,
but this discussion helps to ground the amount of compute that would be
required under the current paradigm to reach models of various sizes.
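For readers who want the shape of the scaling law this note refers to, the sketch below evaluates the power-law form from Kaplan et al. relating loss to non-embedding parameter count. The exponent and scale constant are the approximate values reported in that paper and should be treated as illustrative; they apply only to the dense single-transformer setting discussed here.

```python
# Approximate parameter-count scaling law from Kaplan et al. (2020):
# loss falls as a power law in the number of non-embedding parameters N.
# The constants are approximate values reported in that paper and are
# illustrative only; they apply to dense single-transformer models.
ALPHA_N = 0.076     # reported exponent (approximate)
N_C = 8.8e13        # reported scale constant, non-embedding parameters (approximate)

def loss_from_params(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params non-embedding
    parameters, assuming data and compute are not the bottleneck."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1.75e11):
    print(f"N = {n:.1e}: predicted loss ≈ {loss_from_params(n):.2f}")
```

The slowly shrinking returns in this curve are what drive the compute growth discussed in the body of the report: each further reduction in loss requires a multiplicative increase in parameters and, through the companion compute scaling law, in training compute.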
28
Lower prices per GPU-hour are available for long-term commitments, but
stretching training over a longer period is less useful than splitting the model
across more accelerators or increasing the batch size, so it is not clear that
long-term commitments are beneficial. See Jared Kaplan et al., “Scaling Laws
for Neural Language Models,” arXiv preprint arXiv:2001.08361 (2020).
29
Kharya and Alvi, “Using DeepSpeed.”
30
Bommasani et al., “Opportunities and Risks.”
31
Danny Hernandez and Tom B. Brown, “Measuring the Algorithmic Efficiency
of Neural Networks,” arXiv preprint arXiv:2005.04305 (2020).
32
Note that efficiency improvements can also be due to improvements other
than those in the algorithms used to train models, such as improvements in
33
There have also been some analyses of algorithmic efficiency improvements
in fields beyond AI. See Yash Sherry and Neil C. Thompson, “How Fast do
Algorithms Improve?,” Proceedings of the IEEE 109, no. 11 (November 2021):
1768-1777.
34
Hernandez and Brown, “Measuring Algorithmic Efficiency.”
35
Noam Shazeer et al., “Outrageously Large Neural Networks: The Sparsely-
Gated Mixture-of-Experts Layer,” arXiv preprint arXiv:1701.06538 (2017);
William Fedus, Barret Zoph, and Noam Shazeer, “Switch Transformers: Scaling
to Trillion Parameter Models with Simple and Efficient Sparsity,” arXiv preprint
arXiv:2101.03961 (2021); Alberto Romero, “GPT-3 Scared You? Meet Wu Dao
2.0: A Monster of 1.75 Trillion Parameters,” Towards Data Science, June 5,
2021, https://towardsdatascience.com/gpt-3-scared-you-meet-wu-dao-2-0-a-
monster-of-1-75-trillion-parameters-832cd83db484.
36
John Jumper et al., “Highly accurate protein structure prediction with
AlphaFold,” Nature 596 (August 2021): 583-589.
37
Hieu Pham et al., “Meta Pseudo Labels,” arXiv preprint arXiv:2003.10580
(2020).
38
Hernandez and Brown, “Measuring Algorithmic Efficiency.”
39
“The cost of training machines is becoming a problem,” The Economist, June
13, 2020, https://www.economist.com/technology-quarterly/2020/06/11/the-
cost-of-training-machines-is-becoming-a-problem.
40
Danny Hernandez et al., “Scaling Laws for Transfer,” arXiv preprint
arXiv:2102.01293 (2021).
41
Bommasani et al., “Opportunities and Risks.”
42
Andrew J. Lohn, “Poison in the Well” (Center for Security and Emerging
Technology, June 2021), https://doi.org/10.51593/2020CA013; Benjamin
Buchanan et al., “Truth, Lies, and Automation: How Language Models Could
Change Disinformation” (Center for Security and Emerging Technology, May
2021), https://doi.org/10.51593/2021CA003.
43
Diana Gehlhaus et al., “U.S. AI Workforce: Policy Recommendations” (Center
for Security and Emerging Technology, October 2021),
https://doi.org/10.51593/20200087; Dahlia Peterson, Kayla Goode, and Diana
Gehlhaus, “AI Education in China and the United States” (Center for Security
44
“The Biden Administration Launches the National Artificial Intelligence
Research Resource Task Force,” The White House, June 10, 2021,
https://www.whitehouse.gov/ostp/news-updates/2021/06/10/the-biden-
administration-launches-the-national-artificial-intelligence-research-resource-
task-force/.
45
See, e.g., Buchanan et al., “Truth, Lies, and Automation.”