Artificial Intelligence Index Report 2024
Chapter 1: Research and Development
Overview
This chapter studies trends in AI research and development. It begins by examining trends in AI publications and patents, then turns to notable AI systems and foundation models, and concludes by analyzing AI conference attendance and open-source AI software projects.
Chapter Highlights
1. Industry continues to dominate frontier AI research. In 2023, industry produced 51 notable
machine learning models, while academia contributed only 15. There were also 21 notable models resulting from
industry-academia collaborations in 2023, a new high.
2. More foundation models and more open foundation models. In 2023, a total of 149 foundation
models were released, more than double the number released in 2022. Of these newly released models, 65.7%
were open-source, compared to only 44.4% in 2022 and 33.3% in 2021.
3. Frontier models get way more expensive. According to AI Index estimates, the training costs of
state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated
$78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
4. The United States leads China, the EU, and the U.K. as the primary source of top AI
models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European
Union’s 21 and China’s 15.
5. The number of AI patents skyrockets. From 2021 to 2022, AI patent grants worldwide increased
sharply by 62.7%. Since 2010, the number of granted AI patents has increased more than 31 times.
6. China dominates AI patents. In 2022, China led global AI patent origins with 61.1%, significantly
outpacing the United States, which accounted for 20.9% of AI patent origins. Since 2010, the U.S. share of AI
patents has decreased from 54.1%.
7. Open-source AI research explodes. Since 2011, the number of AI-related projects on GitHub has
seen a consistent increase, growing from 845 in 2011 to approximately 1.8 million in 2023. Notably, there was a
sharp 59.3% rise in the total number of GitHub AI projects in 2023 alone. The total number of stars for AI-related
projects on GitHub also significantly increased in 2023, more than tripling from 4.0 million in 2022 to 12.2 million.
8. The number of AI publications continues to rise. Between 2010 and 2022, the total number of AI
publications nearly tripled, rising from approximately 88,000 in 2010 to more than 240,000 in 2022. The increase
over the last year was a modest 1.1%.
1.1 Publications
Overview
The figures below present the global count of English-language AI publications from 2010 to 2022, categorized by type of affiliation and cross-sector collaborations. Additionally, this section details publication data for AI journal articles and conference papers.

Total Number of AI Publications1
Figure 1.1.1 displays the global count of AI publications. Between 2010 and 2022, the total number of AI publications nearly tripled, rising from approximately 88,000 in 2010 to more than 240,000 in 2022. The increase over the last year was a modest 1.1%.
[Figure 1.1.1: Number of AI publications (in thousands), 2010–2022]
1 The data on publications presented this year is sourced from CSET. Both the methodology and data sources used by CSET to classify AI publications have changed since their data was last
featured in the AI Index (2023). As a result, the numbers reported in this year’s section differ slightly from those reported in last year’s edition. Moreover, the AI-related publication data is fully
available only up to 2022 due to a significant lag in updating publication data. Readers are advised to approach publication figures with appropriate caution.
By Type of Publication
Figure 1.1.2 illustrates the distribution of AI publication types globally over time. In 2022, there were roughly 230,000 AI journal articles compared to roughly 42,000 conference submissions. Since 2015, AI journal and conference publications have increased at comparable rates. In 2022, there were 2.6 times as many conference publications and 2.4 times as many journal publications as there were in 2015.
[Figure 1.1.2: Number of AI publications (in thousands) by type of publication, 2010–2022. 2022: Journal 232.67; Conference 41.17; Book chapter 12.88; Preprint 5.07; Article 1.49; Unknown 0.79; Dissertation 0.70; Book 0.57; Other 0.12; Clinical trial 0.05.]
2 It is possible for an AI publication to be mapped to more than one publication type, so the totals in Figure 1.1.2 do not completely align with those in Figure 1.1.1.
By Field of Study
Figure 1.1.3 examines the total number of AI publications by field of study since 2010. Machine learning publications have seen the most rapid growth over the past decade, increasing nearly sevenfold since 2015. Following machine learning, the most published AI fields in 2022 were computer vision (21,309 publications), pattern recognition (19,841), and process management (12,052).
[Figure 1.1.3: Number of AI publications (in thousands) by field of study, 2010–2022. 2022: Computer vision 21.31; Pattern recognition 19.84; Process management 12.05; Computer network 10.39; Control theory 9.17; Algorithm 8.31; Linguistics 7.18; Mathematical optimization 6.83.]
By Sector
This section presents the distribution of AI publications by sector—education, government, industry, nonprofit, and other—globally and then specifically within the United States, China, and the European Union plus the United Kingdom. In 2022, the academic sector contributed the majority of AI publications (81.1%), maintaining its position as the leading global source of AI research over the past decade across all regions (Figure 1.1.4 and Figure 1.1.5). Industry participation is most significant in the United States, followed by the European Union plus the United Kingdom, and China (Figure 1.1.5).
[Figure 1.1.4: AI publications (% of total) by sector, 2010–2022]
[Figure 1.1.5: AI publications (% of total) by sector and geographic area, 2022]
Sector      | United States | European Union and United Kingdom | China
Education   | 75.48%        | 75.63%                            | 81.75%
Industry    | 14.06%        | 9.47%                             | 7.39%
Government  | 5.60%         | 9.28%                             | 10.05%
Nonprofit   | 4.87%         | 5.62%                             | 0.80%
AI Journal Publications
Figure 1.1.6 illustrates the total number of AI journal publications from 2010 to 2022. The number of AI journal
publications experienced modest growth from 2010 to 2015 but grew approximately 2.4 times since 2015.
Between 2021 and 2022, AI journal publications saw a 4.5% increase.
[Figure 1.1.6: Number of AI journal publications (in thousands), 2010–2022]
[Figure 1.1.7: Number of AI conference publications (in thousands), 2010–2022]
1.2 Patents
This section examines trends over time in global AI patents, which can reveal important insights into the evolution of innovation, research, and development within AI. Additionally, analyzing AI patents can reveal how these advancements are distributed globally. Similar to the publications data, there is a noticeable delay in AI patent data availability, with 2022 being the most recent year for which data is accessible. The data in this section comes from CSET.

AI Patents
Overview
Figure 1.2.1 examines the global growth in granted AI patents from 2010 to 2022. Over the last decade, there has been a significant rise in the number of AI patents, with a particularly sharp increase in recent years. For instance, between 2010 and 2014, the total growth in granted AI patents was 56.1%. However, from 2021 to 2022 alone, the number of AI patents increased by 62.7%.
[Figure 1.2.1: Number of granted AI patents (in thousands), 2010–2022]
[Figure 1.2.2: Number of AI patents (in thousands) worldwide by grant status, 2010–2022. 2022: 62.26 granted.]
The gap between granted and not granted AI patents is evident across all major patent-originating geographic areas, including China, the European Union and United Kingdom, and the United States (Figure 1.2.3). In recent years, all three geographic areas have experienced an increase in both the total number of AI patent filings and the number of patents granted.
[Figure 1.2.3: Number of AI patent filings (in thousands) by grant status in China, the European Union and United Kingdom, and the United States, 2010–2022]
Figure 1.2.4 showcases the regional breakdown of granted AI patents. As of 2022, the bulk of the world’s granted AI patents (75.2%) originated from East Asia and the Pacific, with North America being the next largest contributor at 21.2%. Up until 2011, North America led in the number of global AI patents. However, since then, there has been a significant shift toward an increasing proportion of AI patents originating from East Asia and the Pacific.
[Figure 1.2.4: Granted AI patents (% of world total) by region, 2010–2022. 2022: East Asia and Pacific 75.20%.]
Disaggregated by geographic area, the majority of the world’s granted AI patents are from China (61.1%) and the
United States (20.9%) (Figure 1.2.5). The share of AI patents originating from the United States has declined from
54.1% in 2010.
[Figure 1.2.5: Granted AI patents (% of world total) by geographic area, 2010–2022. 2022: China 61.13%.]
Figure 1.2.6 and Figure 1.2.7 document which countries lead in AI patents per capita. In 2022, the country with the most granted AI patents per 100,000 inhabitants was South Korea (10.3), followed by Luxembourg (8.8) and the United States (4.2) (Figure 1.2.6). Figure 1.2.7 highlights the change in granted AI patents per capita from 2012 to 2022. Singapore, South Korea, and China experienced the greatest increase in AI patenting per capita during that time period.
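For reference, the per capita figures in Figure 1.2.6 and the percentage changes in Figure 1.2.7 follow from two simple formulas; the sketch below uses entirely hypothetical numbers rather than CSET data.

```python
def patents_per_100k(granted_patents: int, population: int) -> float:
    """Granted AI patents per 100,000 inhabitants."""
    return granted_patents / population * 100_000

def pct_change(old: float, new: float) -> float:
    """Percentage change from an old value to a new value."""
    return (new - old) / old * 100

# Hypothetical example: a country of 5.9 million inhabitants that went
# from 8 granted AI patents in 2012 to 450 in 2022.
rate_2012 = patents_per_100k(8, 5_900_000)    # ~0.14 per 100,000
rate_2022 = patents_per_100k(450, 5_900_000)  # ~7.63 per 100,000
print(f"{pct_change(rate_2012, rate_2022):.0f}% increase")  # 5525% increase
```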
[Figure 1.2.6: Granted AI patents per 100,000 inhabitants by country, 2022. Labeled values: Luxembourg 8.73; Japan 2.53; China 2.51; Singapore 2.06; Australia 1.91; Canada 1.25; Germany 0.66; Denmark 0.56; Finland 0.56; France 0.33; Lithuania 0.28.]
Percentage change of granted AI patents per 100,000 inhabitants by country, 2012 vs. 2022
Source: Center for Security and Emerging Technology, 2023 | Chart: 2024 AI Index report
[Figure 1.2.7: % change of granted AI patents per 100,000 inhabitants, 2012 vs. 2022: Singapore 5,366%; China 3,569%; Denmark 1,463%; Japan 1,137%; France 1,086%; Germany 961%; Australia 908%; Finland 907%; Canada 803%.]
1.3 Frontier AI Research
This section explores the frontier of AI research. While many new AI models are introduced annually, only a small sample represents the most advanced research. Admittedly, what constitutes advanced or frontier research is somewhat subjective. Frontier research could reflect a model posting a new state-of-the-art result on a benchmark, introducing a meaningful new architecture, or exercising some impressive new capabilities.
The AI Index studies trends in two types of frontier AI models: “notable models” and foundation models.3 Epoch, an
AI Index data provider, uses the term “notable machine learning models” to designate noteworthy models handpicked
as being particularly influential within the AI/machine learning ecosystem. In contrast, foundation models are
exceptionally large AI models trained on massive datasets, capable of performing a multitude of downstream tasks.
Examples of foundation models include GPT-4, Claude 3, and Gemini. While many foundation models may qualify as
notable models, not all notable models are foundation models.
Within this section, the AI Index explores trends in notable models and foundation models from various perspectives,
including originating organization, country of origin, parameter count, and compute usage. The analysis concludes
with an examination of machine learning training costs.
3 “AI system” refers to a computer program or product based on AI, such as ChatGPT. “AI model” refers to a collection of parameters whose values are learned during training, such as GPT-4.
4 New and historic models are continually added to the Epoch database, so the total year-by-year counts of models included in this year’s AI Index might not exactly match those published in
last year’s report.
Sector Analysis
Until 2014, academia led in the release of machine learning models. Since then, industry has taken the lead. In 2023, there were 51 notable machine learning models produced by industry compared to just 15 from academia (Figure 1.3.1). Significantly, 21 notable models resulted from industry/academic collaborations in 2023, a new high. Creating cutting-edge AI models now demands a substantial amount of data, computing power, and financial resources that are not available in academia. This shift toward increased industrial dominance in leading AI models was first highlighted in last year’s AI Index report. Although this year the gap has slightly narrowed, the trend largely persists.
[Figure 1.3.1: Number of notable machine learning models by sector, 2003–2023. 2023: Industry 51; Academia 15; Government 2; Industry–research collective collaboration 0; Research collective 0; Academia–government collaboration 0.]
The AI Index research team analyzed the country of origin of notable models. In 2023, the United States led with 61 notable machine learning models, followed by China with 15, and France with 8 (Figure 1.3.2). For the first time since 2019, the European Union and the United Kingdom together have surpassed China in the number of notable AI models produced (Figure 1.3.3). Since 2003, the United States has consistently produced more notable models than either the European Union and United Kingdom or China (Figure 1.3.3).

[Figure 1.3.2: Number of notable machine learning models by country, 2023. Labeled values: United States 61; China 15; Singapore 3; United Arab Emirates 3; Egypt 2.]
[Figure 1.3.3: Number of notable machine learning models by geographic area, 2003–2023. 2023: European Union and United Kingdom 25; China 15.]
5 A machine learning model is considered associated with a specific country if at least one author of the paper introducing it has an affiliation with an institution based in that country. In cases
where a model’s authors come from several countries, double counting can occur.
[Figure 1.3.4: Number of notable machine learning models by country (map). Legend bins: 1–10, 11–20, 21–60, 61–100, 101–430.]
Parameter Trends
Parameters in machine learning models are numerical values learned during training that determine how a model interprets input data and makes predictions. Models trained on more data will usually have more parameters than those trained on less data. Likewise, models with more parameters typically outperform those with fewer parameters. Figure 1.3.5 demonstrates the parameter count of machine learning models in the Epoch dataset, categorized by the sector from which the models originate. Parameter counts have risen sharply since the early 2010s, reflecting the growing complexity of tasks AI models are designed for, the greater availability of data, improvements in hardware, and the proven efficacy of larger models. High-parameter models are particularly notable in the industry sector, underscoring the capacity of companies like OpenAI, Anthropic, and Google to bear the computational costs of training on vast volumes of data.
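As a concrete illustration of what a parameter count measures (a minimal sketch, not part of the Epoch methodology), the count for any PyTorch model is simply the total number of elements in its learned tensors; the tiny network below is hypothetical.

```python
import torch.nn as nn

# A small, hypothetical feed-forward network used only to illustrate how
# parameter counts are computed; notable models range from millions to
# hundreds of billions of parameters.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# Each weight matrix and bias vector contributes its element count.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")  # 512*2048 + 2048 + 2048*512 + 512 = 2,099,712
```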
[Figure 1.3.5: Number of parameters (log scale) of notable machine learning models by sector and publication date, 2003–2023]
[Figure 1.3.6: Training compute (log scale) of notable machine learning models by publication date, 2003–2023]
6 FLOP stands for “floating-point operation.” A floating-point operation is a single arithmetic operation involving floating-point numbers, such as addition, subtraction, multiplication, or division. The number of FLOPs a processor or computer can perform per second is an indicator of its computational power. The higher the FLOP rate, the more powerful the computer is. A model whose training required more FLOPs demanded more computational resources to train.
Figure 1.3.7 highlights the training compute of notable machine learning models since 2012. For example, AlexNet, whose 2012 paper popularized the now standard practice of using GPUs to improve AI models, required an estimated 470 petaFLOPs for training. The original Transformer, released in 2017, required around 7,400 petaFLOPs. Google’s Gemini Ultra, one of the current state-of-the-art foundation models, required 50 billion petaFLOPs.
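To put these figures on a common scale, the short calculation below (not from the report) converts the reported petaFLOP values into ratios; the 6 × parameters × tokens rule of thumb at the end is a common community approximation for dense transformers, not an AI Index or Epoch figure.

```python
# Reported training compute, in petaFLOP (1 petaFLOP = 1e15 floating-point operations).
alexnet = 470
transformer = 7_400
gemini_ultra = 50_000_000_000  # 50 billion petaFLOPs

print(f"Gemini Ultra vs. AlexNet:     {gemini_ultra / alexnet:.2e}x")      # ~1.06e+08x
print(f"Gemini Ultra vs. Transformer: {gemini_ultra / transformer:.2e}x")  # ~6.76e+06x

# Rough rule of thumb (an assumption, not a report figure):
# training FLOPs ~= 6 * parameters * training tokens.
params, tokens = 70e9, 2e12          # a hypothetical 70B-parameter model, 2T tokens
print(f"~{6 * params * tokens / 1e15:.3g} petaFLOPs")  # ~8.4e+08 petaFLOPs
```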
[Figure 1.3.7: Training compute (petaFLOP, log scale) of notable machine learning models by publication date, 2012–2023. Labeled models include AlexNet, Transformer, BERT-Large, PaLM (540B), Llama 2-70B, Claude 2, and GPT-4.]
Highlight: Will Models Run Out of Data?

Much of the recent capability gain behind powerful LLMs has been achieved by training models on increasingly larger amounts of data. As noted recently by Anthropic cofounder and AI Index Steering Committee member Jack Clark, foundation models have been trained on a meaningful share of all the data that has ever existed on the internet.

The growing data dependency of AI models has led to concerns that future generations of computer scientists will run out of data to further scale and improve their systems. Research from Epoch suggests that these concerns are somewhat warranted. Epoch researchers have generated historical and compute-based projections for when AI researchers might expect to run out of data. The historical projections are based on observed growth rates in the sizes of data used to train foundation models. The compute projections adjust the historical growth rate based on projections of compute availability.

For instance, the researchers estimate that computer scientists could deplete the stock of high-quality language data by 2024, exhaust low-quality language data within two decades, and use up image data by the late 2030s to mid-2040s (Figure 1.3.8).

[Figure 1.3.8: Projected dates of data stock exhaustion]
Stock type                  | Historical projection   | Compute projection
Low-quality language stock  | 2032.4 [2028.4; 2039.2] | 2040.5 [2034.6; 2048.9]
High-quality language stock | 2024.5 [2023.5; 2025.7] | 2024.1 [2023.2; 2025.3]
Image stock                 | 2046 [2037; 2062.8]     | 2038.8 [2032; 2049.8]

Theoretically, the challenge of limited data availability can be addressed by using synthetic data, which is data generated by AI models themselves. For example, it is possible to use text produced by one LLM to train another LLM. The use of synthetic data for training AI systems is particularly attractive, not only as a solution for potential data depletion but also because generative AI systems could, in principle, generate data in instances where naturally occurring data is sparse—for example, data for rare diseases or underrepresented populations. Until recently, the feasibility and effectiveness of using synthetic data for training generative AI systems were not well understood. However, research this year has suggested that there are limitations associated with training models on synthetic data.

For instance, a team of British and Canadian researchers discovered that models predominantly trained on synthetic data experience model collapse, a phenomenon where, over time, they lose the ability to remember true underlying data distributions and start producing a narrow range of outputs.
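The dynamic the researchers describe can be illustrated with a deliberately simple toy experiment (a sketch of the general idea, not the researchers' actual setup, which used language and image models): repeatedly fit a Gaussian to samples drawn from the previous generation's fit and watch the estimated distribution drift and narrow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0 is trained on "real" data from the true distribution N(0, 1).
data = rng.normal(loc=0.0, scale=1.0, size=20)

for generation in range(201):
    # "Training" here is just fitting a Gaussian by maximum likelihood.
    mu, sigma = data.mean(), data.std()
    if generation % 40 == 0:
        print(f"generation {generation:3d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
    # Each new generation sees only samples from the previous model, so
    # estimation errors compound and the fitted spread typically shrinks,
    # a toy analogue of forgetting the true underlying distribution.
    data = rng.normal(loc=mu, scale=sigma, size=20)
```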
[Figure 1.3.9]
7 In the context of generative models, density refers to the level of complexity and variation in the outputs produced by an AI model. Models that have a higher generation density
produce a wider range of higher-quality outputs. Models with low generation density produce a narrower range of more simplistic outputs.
[Figure 1.3.10: Output density across successive model generations (Generation 0 through Generation 9)]
In a similar study published in 2023 on the use of synthetic data in generative imaging models, researchers found that generative image models trained solely on synthetic data cycles—or with insufficient real human data—experience a significant drop in output quality. The authors label this phenomenon Model Autophagy Disorder (MAD), in reference to mad cow disease.

The study examines two types of training processes: fully synthetic, where models are trained exclusively on synthetic data, and synthetic augmentation, where models are trained on a mix of synthetic and real data. In both scenarios, as the number of training generations increases, the quality of the generated images declines. Figure 1.3.11 highlights the degraded image generations of models that are augmented with synthetic data; for example, the faces generated in steps 7 and 9 increasingly display strange-looking hash marks. From a statistical perspective, images generated with both synthetic data and synthetic augmentation loops have higher FID scores (indicating less similarity to real images), lower precision scores (signifying reduced realism or quality), and lower recall scores (suggesting decreased diversity) (Figure 1.3.12). While synthetic augmentation loops, which incorporate some real data, show less degradation than fully synthetic loops, both methods exhibit diminishing returns with further training.
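For readers unfamiliar with the metric, FID compares the mean and covariance of feature embeddings (typically Inception features) of real and generated images; a minimal sketch of the distance itself, assuming the features have already been extracted into NumPy arrays, is shown below. Rising FID across generations in Figure 1.3.12 corresponds to this quantity growing as the synthetic training data drifts away from the real distribution.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Fréchet distance between two feature sets (rows are samples).

    FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 * sqrt(C_r @ C_g)).
    Lower values mean the generated features sit closer to the real ones.
    """
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)

    # Matrix square root of the covariance product; numerical error can
    # introduce a tiny imaginary component, which is discarded.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```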
[Figure 1.3.11: Degraded face generations from models augmented with synthetic data]
Assessing FFHQ syntheses: FID, precision, and recall in synthetic and mixed-data training loops
Source: Alemohammad et al., 2023 | Chart: 2024 AI Index report
[Figure 1.3.12: FID, precision, and recall across training generations for fully synthetic and synthetic augmentation loops]
Foundation Models
Foundation models represent a rapidly evolving and popular category of AI models. Trained on vast datasets, they are versatile and suitable for numerous downstream applications. Foundation models such as GPT-4, Claude 3, and Llama 2 showcase remarkable abilities and are increasingly being deployed in real-world scenarios.

Introduced in 2023, the Ecosystem Graphs is a new community resource from Stanford that tracks the foundation model ecosystem, including datasets, models, and applications. This section uses data from the Ecosystem Graphs to study trends in foundation models over time.8

Model Release
Foundation models can be accessed in different ways. No access models, like Google’s PaLM-E, are only accessible to their developers. Limited access models, like OpenAI’s GPT-4, offer limited access to the models, often through a public API. Open models, like Meta’s Llama 2, fully release model weights, which means the models can be modified and freely used.

Figure 1.3.13 visualizes the total number of foundation models by access type since 2019. In recent years, the number of foundation models has risen sharply, more than doubling since 2022 and growing by a factor of nearly 38 since 2019. Of the 149 foundation models released in 2023, 98 were open, 23 limited, and 28 no access.
[Figure 1.3.13: Number of foundation models by access type (open, limited, no access), 2019–2023. Totals rose from 4 in 2019 to 72 in 2022 and 149 in 2023; in 2023, 98 were open, 23 limited, and 28 no access.]
8 The Ecosystem Graphs make efforts to survey the global AI ecosystem, but it is possible that they underreport models from certain nations like South Korea and China.
In 2023, the majority of foundation models were released as open access (65.8%), with 18.8% having no access
and 15.4% limited access (Figure 1.3.14). Since 2021, there has been a significant increase in the proportion of
models released with open access.
[Figure 1.3.14: Foundation models (% of total) by access type over time. 2023: Open 65.77%; No access 18.79%; Limited 15.44%.]
[Figure 1.3.15: Number of foundation models by sector, 2023: Industry 108; Academia 28; Industry–academia collaboration 9; Government 4; Industry–government collaboration 0.]
Figure 1.3.16 highlights the source of various foundation models that were released in 2023. Google introduced
the most models (18), followed by Meta (11), and Microsoft (9). The academic institution that released the most
foundation models in 2023 was UC Berkeley (3).
[Figure 1.3.16: Number of foundation models released in 2023 by organization: Google 18; Meta 11; Microsoft 9; OpenAI 7; Together 5; Hugging Face 4; Anthropic 4; AI2 4; Stability AI 3; Cerebras 3; Shanghai AI Laboratory 3; Adobe 3; UC Berkeley 3; DeepMind 2; Stanford University 2.]
Since 2019, Google has led in releasing the most foundation models, with a total of 40, followed by OpenAI with
20 (Figure 1.3.17). Tsinghua University stands out as the top non-Western institution, with seven foundation model
releases, while Stanford University is the leading American academic institution, with five releases.
[Figure 1.3.17: Total number of foundation models released by organization, 2019–2023: Google 40; OpenAI 20; Meta 19; Microsoft 18; DeepMind 15; Tsinghua University 7; EleutherAI 6; Together 6; Cohere 5; Stanford University 5; Hugging Face 5; Anthropic 5; AI2 4; BigScience 4; Shanghai AI Laboratory 4.]
In 2023, most foundation models originated from the United States, followed by China (20) and the United Kingdom (Figure 1.3.18). Since 2019, the United States has consistently led in originating the majority of foundation models (Figure 1.3.19).

[Figure 1.3.18: Number of foundation models by country, 2023]

[Figure 1.3.19: Number of foundation models by geographic area, 2019–2023. 2023: China 20; European Union and United Kingdom 15.]
Figure 1.3.20 depicts the cumulative count of foundation models released and attributed to respective countries
since 2019. The country with the greatest number of foundation models released since 2019 is the United States
(182), followed by China (30), and the United Kingdom (21).
[Figure 1.3.20: Cumulative number of foundation models by country since 2019 (map). Legend bins: 1–10, 11–30, 31–182.]
Training Cost
9 Ben Cottier and Robi Rahman led research at Epoch AI into model training cost.
10 A detailed description of the estimation methodology is provided in the Appendix.
11 The cost figures reported in this section are inflation-adjusted.
[Figure 1.3.21: Estimated training cost (in U.S. dollars) of select AI models, 2017–2023. Labeled models include Transformer, BERT-Large, RoBERTa Large, LaMDA, PaLM (540B), GPT-4, Llama 2 70B, and Gemini Ultra; estimates range from $930 (Transformer, 2017) to $78,352,034 (GPT-4) and $191,400,000 (Gemini Ultra).]
Figure 1.3.22 visualizes the training cost of all AI models for which the AI Index has estimates. As the figure shows,
model training costs have sharply increased over time.
[Figure 1.3.22: Estimated training cost (in U.S. dollars, log scale) of AI models by publication date. Labeled models include Transformer, BERT-Large, GNMT, AlphaStar, RoBERTa Large, GPT-3 175B (davinci), Megatron-BERT, HyperCLOVA, BLOOM-176B, LLaMA-65B, Llama 2 70B, PaLM (540B), Falcon 180B, GPT-4, and Gemini Ultra, among others.]
As established in previous AI Index reports, there is a direct correlation between the training costs of AI models
and their computational requirements. As illustrated in Figure 1.3.23, models with greater computational training
needs cost substantially more to train.
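The AI Index and Epoch estimation methodology is described in the Appendix; purely to illustrate why cost tracks compute, the sketch below converts a compute budget into a rough dollar figure using assumed hardware throughput, utilization, and hourly pricing (all hypothetical values, not the report's).

```python
def rough_training_cost(total_flops: float,
                        flops_per_chip_per_sec: float,
                        utilization: float,
                        num_chips: int,
                        price_per_chip_hour: float) -> float:
    """Very rough cost estimate: chip-hours needed multiplied by an hourly price.

    All inputs are hypothetical; the report's estimates account for many more
    factors (hardware generation, cloud pricing, training duration, etc.).
    """
    cluster_flops_per_sec = flops_per_chip_per_sec * utilization * num_chips
    seconds = total_flops / cluster_flops_per_sec
    chip_hours = seconds / 3600 * num_chips
    return chip_hours * price_per_chip_hour

# Example: a training run of ~2e25 FLOPs (roughly frontier scale) on 10,000 chips
# delivering 3e14 FLOP/s each at 40% utilization, priced at $2 per chip-hour.
cost = rough_training_cost(2e25, 3e14, 0.4, 10_000, 2.0)
print(f"~${cost:,.0f}")  # about $93 million under these illustrative assumptions
```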
[Figure 1.3.23: Estimated training cost (in U.S. dollars, log scale) versus training compute of select AI models. Labeled models include Transformer, BERT-Large, RoBERTa Large, LaMDA, GPT-3 175B (davinci), Megatron-Turing NLG 530B, Llama 2 70B, PaLM (540B), GPT-4, and Gemini Ultra.]
1.4 AI Conferences
AI conferences serve as essential platforms for researchers to present their findings and network with peers and collaborators. Over the past two decades, these conferences have expanded in scale, quantity, and prestige. This section explores trends in attendance at major AI conferences.

Conference Attendance
Figure 1.4.1 graphs attendance at a selection of AI conferences since 2010. Following a decline in attendance, likely due to the shift back to exclusively in-person formats, the AI Index reports an increase in conference attendance from 2022 to 2023.12 Specifically, there was a 6.7% rise in total attendance over the last year. Since 2015, the annual number of attendees has risen by around 50,000, reflecting not just a growing interest in AI research but also the emergence of new AI conferences.
[Figure 1.4.1: Number of attendees (in thousands) at select AI conferences, 2010–2023. 2023: 63.29.]
12 This data should be interpreted with caution given that many conferences in the last few years have had virtual or hybrid formats. Conference organizers report that measuring the exact
attendance numbers at virtual conferences is difficult, as virtual conferences allow for higher attendance of researchers from around the world. The conferences for which the AI Index tracked
data include NeurIPS, CVPR, ICML, ICCV, ICRA, AAAI, ICLR, IROS, IJCAI, AAMAS, FAccT, UAI, ICAPS, and KR.
Neural Information Processing Systems (NeurIPS) remains one of the most attended AI conferences, attracting approximately 16,380 participants in 2023 (Figure 1.4.2 and Figure 1.4.3). Among the major AI conferences, NeurIPS, ICML, ICCV, and AAAI experienced year-over-year increases in attendance. However, in the past year, CVPR, ICRA, ICLR, and IROS observed slight declines in their attendance figures.
[Figure 1.4.2: Number of attendees (in thousands) at large AI conferences, 2010–2023. 2023: NeurIPS 16.38; CVPR 8.34; ICML 7.92; ICCV 7.33; ICRA 6.60; AAAI 4.47; ICLR 3.76; IROS 3.65.]
[Figure 1.4.3: Number of attendees (in thousands) at additional AI conferences, 2010–2023]
1.5 Open-Source AI Software
GitHub is a web-based platform that enables individuals and teams to host, review, and collaborate on code repositories. Widely used by software developers, GitHub facilitates code management, project collaboration, and open-source software support. This section draws on data from GitHub, providing insights into broader trends in open-source AI software development that are not reflected in academic publication data.

Projects
[Figure 1.5.1: Number of GitHub AI projects (in millions), 2011–2023. 2023: 1.81.]
13 GitHub’s methodology for identifying AI-related projects has evolved over the past year. For classifying AI projects, GitHub has started incorporating generative AI keywords from a
recently published research paper, a shift from the previously detailed methodology in an earlier paper. This edition of the AI Index is the first to adopt this updated approach. Moreover, the
previous edition of the AI Index utilized country-level mapping of GitHub AI projects conducted by the OECD, which depended on self-reported data—a method experiencing a decline in
coverage over time. This year, the AI Index has adopted geographic mapping from GitHub, leveraging server-side data for broader coverage. Consequently, the data presented here may not
align perfectly with data in earlier versions of the report.
Figure 1.5.2 reports GitHub AI projects by geographic area since 2011. As of 2023, a significant share of GitHub AI projects were located in the United States, accounting for 22.9% of contributions. India was the second-largest contributor with 19.0%, followed closely by the European Union and the United Kingdom at 17.9%. Notably, the proportion of AI projects from developers located in the United States on GitHub has been on a steady decline since 2016.
[Figure 1.5.2: GitHub AI projects (% of total) by geographic area, 2011–2023. 2023: Rest of the world 37.09%; China 3.04%.]
Stars
GitHub users can show their interest in a repository by “starring” it, a feature similar to liking a post on social media, which signifies support for an open-source project. Among the most starred repositories are libraries such as TensorFlow, OpenCV, Keras, and PyTorch, which enjoy widespread popularity among software developers in the AI coding community. For example, TensorFlow is a popular library for building and deploying machine learning models. OpenCV is a platform that offers a variety of tools for computer vision, such as object detection and feature extraction.

The total number of stars for AI-related projects on GitHub saw a significant increase in the last year, more than tripling from 4.0 million in 2022 to 12.2 million in 2023 (Figure 1.5.3). This sharp increase in GitHub stars, along with the previously reported rise in projects, underscores the accelerating growth of open-source AI software development.
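As an aside (the AI Index relies on GitHub's own server-side data, not this approach), the current star count of any public repository can be read from GitHub's REST API; a minimal sketch using the standard /repos endpoint:

```python
import requests

def star_count(owner: str, repo: str) -> int:
    """Return the current stargazer count for a public GitHub repository."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["stargazers_count"]

# Popular AI libraries mentioned above. Unauthenticated requests are
# rate-limited by GitHub; pass an access token for heavier use.
for owner, repo in [("tensorflow", "tensorflow"), ("opencv", "opencv"),
                    ("keras-team", "keras"), ("pytorch", "pytorch")]:
    print(f"{owner}/{repo}: {star_count(owner, repo):,} stars")
```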
[Figure 1.5.3: Number of GitHub stars on AI projects (in millions), 2011–2023. 2023: 12.21.]
In 2023, the United States received the highest number of GitHub stars, totaling 10.5 million (Figure 1.5.4). All major geographic regions sampled, including the European Union and United Kingdom, China, and India, saw a year-over-year increase in the total number of GitHub stars awarded to projects located in their countries.
[Figure 1.5.4: Number of GitHub stars (in millions) by geographic area, 2011–2023. 2023: China 2.12; India 1.92.]