OpenAlex Technical Documentation

Uploaded by

Changjing Zhang
Copyright © All Rights Reserved
OpenAlex technical documentation
Overview

OpenAlex is a fully open catalog of the global research system. It's named after the
ancient Library of Alexandria and made by the nonprofit OurResearch.

This is the technical documentation for OpenAlex, including the OpenAlex API
and the data snapshot. Here, you can learn how to set up your code to access
OpenAlex's data. If you want to explore the data as a human, you may be more
interested in OpenAlex Web.

Data
The OpenAlex dataset describes scholarly entities and how those entities are
connected to each other. Types of entities include works, authors, sources,
institutions, topics, publishers, and funders.

Together, these make a huge web (or more technically, heterogeneous directed
graph) of hundreds of millions of entities and billions of connections between them
all.

Learn more at our general help center article: About the data

Access
We offer a fast, modern REST API to get OpenAlex data programmatically. It's free
and requires no authentication. The API allows up to 100,000 requests per user per
day. For best performance, add your email to all API requests, like
mailto=example@domain.com.
There is also a complete database snapshot available to download. Learn more
about the data snapshot here.
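As an illustration, here is a minimal Python sketch that builds a request URL with your email attached. It assumes the email goes in a mailto query parameter; build_url and the address you@example.com are placeholders, and the code only builds the URL without sending a request.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def build_url(endpoint: str, params: dict, email: Optional[str] = None) -> str:
    """Return an OpenAlex API URL, adding a mailto parameter when an email is given."""
    query = dict(params)
    if email:
        # Assumption: the email is passed as the "mailto" query parameter.
        query["mailto"] = email
    # Keep ":", "," and "/" readable so filter values match the examples in these docs.
    return f"{API_BASE}/{endpoint}?{urlencode(query, safe=':,/')}"


print(build_url("institutions", {"search": "stanford"}, email="you@example.com"))
# https://api.openalex.org/institutions?search=stanford&mailto=you%40example.com
```

The @ is percent-encoded as %40, which the server decodes back to the original address.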

The snapshot is updated monthly. If you need a higher API limit or more frequent
updates, please look into OpenAlex Premium.

The web interface for OpenAlex, built directly on top of the API, is the quickest and
easiest way to get started with OpenAlex.

Why OpenAlex?
OpenAlex offers an open replacement for industry-standard scientific knowledge
bases like Elsevier's Scopus and Clarivate's Web of Science. Compared to these
paywalled services, OpenAlex offers significant advantages in terms of inclusivity,
affordability, and availability.

OpenAlex is:

Big — We have about twice the coverage of the other services, and have
significantly better coverage of non-English works and works from the Global
South.
Easy — Our service is fast, modern, and well-documented.
Open — Our complete dataset is free under the CC0 license, which allows for
transparency and reuse.

Many people and organizations have already found great value using OpenAlex.
Have a look at the Testimonials to hear what they've said!

Contact
For tech support and bug reports, please visit our help page. You can also join the
OpenAlex user group, and follow us on Twitter (@OpenAlex_org) and Mastodon.
Citation
If you use OpenAlex in research, please cite this paper:

Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of
scholarly works, authors, venues, institutions, and concepts. ArXiv.
https://arxiv.org/abs/2205.01833
Quickstart tutorial
Query the OpenAlex dataset using the magic of The Internet

Let's use the OpenAlex API to get journal articles and books published by authors at
Stanford University. We'll limit our search to articles published between 2010 and
2020. Since OpenAlex is free and openly available, these examples work without
any login or account creation. 👍
If you open these examples in a web browser, they will look much better if you have a
browser plug-in such as JSONVue installed.

1. Find the institution


You can use the institutions endpoint to learn about universities and research
centers. OpenAlex has a powerful search feature that searches across 108,000
institutions.

Let's use it to search for Stanford University:

Find Stanford University

https://api.openalex.org/institutions?search=stanford

Our first result looks correct (yeah!):

{
"id": "https://openalex.org/I97018004",
"ror": "https://ror.org/00f54p054",
"display_name": "Stanford University",
"country_code": "US",
"type": "education",
"homepage_url": "http://www.stanford.edu/"
// other fields removed
}

We can use the ID https://openalex.org/I97018004 in that result to find out more.


2. Find articles (works) associated with Stanford University
The works endpoint contains over 240 million articles, books, and theses 😲. We
can filter to show works associated with Stanford.

Show works where at least one author is associated with Stanford University
https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004

This is just one of the 50+ ways that you can filter works!

3. Filter works by publication year


Right now the list shows records for all years. Let's narrow it down to works that
were published from 2010 to 2020, and sort from newest to oldest.

Show works with publication years 2010 to 2020, associated with Stanford University
https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&sort=publication_date:desc
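The filter parameter above is a comma-separated list of key:value pairs, optionally followed by a sort. Here is a small Python sketch that composes the same URL from a dictionary; works_url is a hypothetical helper, not part of any OpenAlex client library.

```python
from urllib.parse import urlencode


def works_url(filters: dict, sort: str = "", group_by: str = "") -> str:
    """Compose a /works URL from filter key/value pairs (hypothetical helper)."""
    params = {}
    if filters:
        # Each filter is written as key:value, and filters are joined with commas.
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    if sort:
        params["sort"] = sort
    if group_by:
        params["group_by"] = group_by
    # Leave ":", "," and "/" unescaped so the URL reads like the examples above.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,/")


url = works_url(
    {"institutions.id": "https://openalex.org/I97018004", "publication_year": "2010-2020"},
    sort="publication_date:desc",
)
print(url)
```

Adding another filter is just another dictionary entry, which keeps the comma-joining in one place.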

4. Group works by publication year to show counts by year
Finally, you can group the results by publication year to get the final answer: the
number of works produced by Stanford each year from 2010 to 2020. There are
more than 30 ways to group records in OpenAlex, including by publisher, journal,
and open access status.

Group records by publication year

https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&group-by=publication_year
That gives a result like this:

[
{
"key": "2020",
"key_display_name": "2020",
"count": 18627
},
{
"key": "2019",
"key_display_name": "2019",
"count": 15933
},
{
"key": "2017",
"key_display_name": "2017",
"count": 14789
},
...
]
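To work with these counts in code, you can turn the list into a year-to-count mapping. This sketch assumes you've already pulled the list of groups out of the API response; counts_by_key is a hypothetical helper, and the sample data is the excerpt shown above.

```python
def counts_by_key(groups: list) -> dict:
    """Map each group key (a year string here) to its count, ordered by year."""
    counts = {int(group["key"]): group["count"] for group in groups}
    return dict(sorted(counts.items()))


groups = [
    {"key": "2020", "key_display_name": "2020", "count": 18627},
    {"key": "2019", "key_display_name": "2019", "count": 15933},
    {"key": "2017", "key_display_name": "2017", "count": 14789},
]
for year, count in counts_by_key(groups).items():
    print(year, count)
```

Sorting by year matters because group-by results, as in the example, are not necessarily in chronological order.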

There you have it! This same technique can be applied to hundreds of questions
around scholarly data. The data you received is under a CC0 license, so not only
did you access it easily, you can share it freely! 🎉

What's next?
Jump into an area of OpenAlex that interests you:

Works
Authors
Sources
Institutions
Topics
Publishers
Funders

And check out our tutorials page for some hands-on examples!
API Entities
Entities overview
The OpenAlex dataset describes scholarly entities and how those entities are
connected to each other. Together, these make a huge web (or more technically,
heterogeneous directed graph) of hundreds of millions of entities and billions of
connections between them all.

Learn more about the OpenAlex entities:

Works: Scholarly documents like journal articles, books, datasets, and theses
Authors: People who create works
Sources: Where works are hosted (such as journals, conferences, and
repositories)
Institutions: Universities and other organizations to which authors claim
affiliations
Topics: Topics assigned to works
Publishers: Companies and organizations that distribute works
Funders: Organizations that fund research
Geo: Where things are in the world
Works
Journal articles, books, datasets, and theses

Works are scholarly documents like journal articles, books, datasets, and theses.
OpenAlex indexes over 240M works, with about 50,000 added daily. You can
access a work in the OpenAlex API like this:

Get a list of OpenAlex works:

https://api.openalex.org/works

That will return a list of Work objects, describing everything OpenAlex knows about
each work. We collect new works from many sources, including Crossref, PubMed,
institutional and discipline-specific repositories (e.g., arXiv). Many older works come
from the now-defunct Microsoft Academic Graph (MAG).

Works are linked to other works via the referenced_works (outgoing citations),
cited_by_api_url (incoming citations), and related_works properties.

What's next
Learn more about what you can do with works:

The Work object
Get a single work
Get lists of works
Filter works
Search for works
Group works
Get N-grams
Work object
There's a lot of useful data inside a work. When you use the API to get a single
work or lists of works, this is what's returned.

abstract_inverted_index

Object: The abstract of the work, as an inverted index, which encodes information
about the abstract's words and their positions within the text. Like Microsoft
Academic Graph, OpenAlex doesn't include plaintext abstracts due to legal
constraints.

abstract_inverted_index: {
Despite: [
0
],
growing: [
1
],
interest: [
2
],
in: [
3,
57,
73,
110,
122
],
Open: [
4,
201
],
Access: [
5
],
...
}
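Because the abstract is stored as word positions, you can rebuild readable text by placing each word at its positions and reading them in order. A minimal sketch, assuming positions are zero-based integers as in the example above; the sample input is a trimmed excerpt of that example (only the first position of "in" is kept).

```python
def abstract_from_inverted_index(inverted_index: dict) -> str:
    """Rebuild plain text by placing each word at every position it occupies."""
    positions = {}
    for word, indexes in inverted_index.items():
        for index in indexes:
            positions[index] = word
    # Read words in position order; any missing positions are simply skipped.
    return " ".join(positions[i] for i in sorted(positions))


example = {"Despite": [0], "growing": [1], "interest": [2], "in": [3], "Open": [4], "Access": [5]}
print(abstract_from_inverted_index(example))  # Despite growing interest in Open Access
```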

Abstract inverted index coverage


Newer works are more likely to have an abstract inverted index. For example, over
60% of works in 2022 have abstract data, compared to 45% for works older than
2000. Full chart is below:

alternate_host_venues (deprecated)

The host_venue and alternate_host_venues properties have been deprecated in
favor of primary_location and locations. The attributes host_venue and
alternate_host_venues are no longer available in the Work object, and trying to
access them in filters or group-bys will return an error.

authorships

List: List of Authorship objects, each representing an author and their institution.
Limited to the first 100 authors to maintain API performance.
For more information, see the Authorship object page.

authorships: [
// first authorship object:
{
author_position: "middle",
author: {
id: "https://openalex.org/A5023888391",
display_name: "Jason Priem",
orcid: "https://orcid.org/0000-0001-6187-6610"
},
institutions: [
{
id: "https://openalex.org/I4200000001",
display_name: "OurResearch",
ror: "https://ror.org/02nr0ka47",
country_code: "US",
type: "nonprofit"
}
],
// other fields removed for brevity. See the Authorship object do
},

// more authorship objects go here


]

apc_list

Object: Information about this work's APC (article processing charge). The object
contains:

value : Integer
currency : String
provenance : String — the source of this data. Currently the only value is “doaj”
(DOAJ)
value_usd : Integer — the APC converted into USD

This value is the APC list price–the price as listed by the journal’s publisher. That’s
not always the price actually paid, because publishers may offer various discounts
to authors. Unfortunately we don’t always know this discounted price, but when we
do, you can find it in apc_paid.

Currently our only source for this data is DOAJ, and so doaj is the only value for
apc_list.provenance, but we’ll add other sources over time.

We currently don’t have information on the list price for hybrid journals (toll-access
journals that also provide an open-access option), but we will add this at some
point. We do have apc_paid information for hybrid OA works occasionally.

You can use this attribute to find works published in Diamond open access journals
by looking at works where apc_list.value is zero. See open_access.oa_status
for more info.

apc_list: {
value: 3200,
currency: "USD",
value_usd: 3200,
provenance: "doaj"
}
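The zero-APC check described above can be sketched as a client-side filter over Work objects you've already fetched. This is an illustration under assumptions: the work IDs are made up, and a missing apc_list (no known list price) is deliberately treated differently from a price of zero.

```python
def is_diamond_candidate(work: dict) -> bool:
    """True only when the work's listed APC is known and equals zero."""
    apc_list = work.get("apc_list") or {}
    return apc_list.get("value") == 0


works = [
    {"id": "W1", "apc_list": {"value": 0, "currency": "USD", "value_usd": 0, "provenance": "doaj"}},
    {"id": "W2", "apc_list": {"value": 3200, "currency": "USD", "value_usd": 3200, "provenance": "doaj"}},
    {"id": "W3", "apc_list": None},  # no known list price, e.g. a hybrid journal
]
print([w["id"] for w in works if is_diamond_candidate(w)])  # ['W1']
```

Treating "unknown" as "not zero" avoids counting hybrid journals, which have no list price in this data, as Diamond.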

apc_paid

Object: Information about the paid APC (article processing charge) for this work.
The object contains:

value : Integer
currency : String
provenance : String — currently either openapc or doaj, but more will be
added; see below for details.
value_usd : Integer — the APC converted into USD

You can find the listed APC price (when we know it) for a given work using
apc_list. However, authors don’t always pay the listed price; often they get a
discounted price from publishers. So it’s useful to know the APC actually paid by
authors, as distinct from the list price. This is our effort to provide this.
Our best source for the actually paid price is the OpenAPC project. Where available,
we use that data, and so apc_paid.provenance is openapc. Where OpenAPC data
is unavailable (and unfortunately this is common) we make our best guess by
assuming the author paid the APC list price, and apc_paid.provenance will be set to
wherever we got the list price from.

apc_paid: {
value: 2250,
currency: "EUR",
value_usd: 2426,
provenance: "openapc"
}

best_oa_location

Object: A Location object with the best available open access location for this
work.

We score open locations to determine which is best using these factors:

1. Must have is_oa: true
2. type: "publisher" is better than "repository".
3. version: "publishedVersion" is better than "acceptedVersion", which is better
than "submittedVersion".
4. pdf_url: A location with a direct PDF link is better than one without.
5. repository rankings: Some major repositories like PubMed Central and arXiv are
ranked above others.
best_oa_location: {
is_oa: true,
landing_page_url: "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1398957
pdf_url: null,
source: {
id: "https://openalex.org/S2764455111",
display_name: "PubMed Central",
issn_l: null,
issn: null,
host_organization: "https://openalex.org/I1299303238",
type: "repository"
},
license: null,
version: "publishedVersion"
}
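The ranking rules above can be expressed as a sort key where smaller tuples win. This sketch is an approximation, not OpenAlex's implementation: it assumes the factors apply in the listed order (each one only breaking ties left by the previous), treats any source whose type is not "repository" as publisher-hosted, and uses a hypothetical PREFERRED_REPOSITORIES list in place of the real repository rankings.

```python
from typing import Optional

VERSION_RANK = {"publishedVersion": 0, "acceptedVersion": 1, "submittedVersion": 2}
# Hypothetical stand-in for OpenAlex's internal repository rankings.
PREFERRED_REPOSITORIES = ["PubMed Central", "arXiv"]


def location_sort_key(location: dict) -> tuple:
    source = location.get("source") or {}
    # Assumption: any non-repository source counts as publisher-hosted.
    is_repository = source.get("type") == "repository"
    name = source.get("display_name")
    repo_rank = (
        PREFERRED_REPOSITORIES.index(name)
        if name in PREFERRED_REPOSITORIES
        else len(PREFERRED_REPOSITORIES)
    )
    return (
        is_repository,                                                # publisher first
        VERSION_RANK.get(location.get("version"), len(VERSION_RANK)),  # published first
        location.get("pdf_url") is None,                               # direct PDF first
        repo_rank,                                                     # preferred repos first
    )


def best_oa_location(locations: list) -> Optional[dict]:
    open_locations = [loc for loc in locations if loc.get("is_oa")]  # rule 1: must be OA
    return min(open_locations, key=location_sort_key, default=None)


locations = [
    {"is_oa": True, "pdf_url": None, "version": "publishedVersion",
     "source": {"display_name": "PubMed Central", "type": "repository"}},
    {"is_oa": True, "pdf_url": "https://example.org/a.pdf", "version": "acceptedVersion",
     "source": {"display_name": "arXiv", "type": "repository"}},
    {"is_oa": False, "pdf_url": None, "version": "publishedVersion",
     "source": {"display_name": "Some Journal", "type": "journal"}},
]
print(best_oa_location(locations)["source"]["display_name"])  # PubMed Central
```

Here the published version in PubMed Central beats the accepted version on arXiv even though only arXiv has a PDF link, because version is checked before PDF availability in this ordering.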

biblio

Object: Old-timey bibliographic info for this work. This is mostly useful only in
citation/reference contexts. These are all strings because sometimes you'll get fun
values like "Spring" and "Inside cover."

volume (String)
issue (String)
first_page (String)
last_page (String)

biblio: {
volume: "495",
issue: "7442",
first_page: "437",
last_page: "440"
}

cited_by_api_url

String: A URL that uses the cites filter to display a list of works that cite this work.
This is a way to expand cited_by_count into an actual list of works.
cited_by_count

Integer: The number of citations to this work. These are the times that other works
have cited this work: Other works ➞ This work.

cited_by_count: 382
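Since cited_by_api_url is built on the cites filter, you can construct an equivalent URL yourself from a work's ID. A sketch assuming the filter takes the short form of the OpenAlex ID (for example W2741809807); cited_by_url is a hypothetical helper.

```python
def cited_by_url(work_id: str) -> str:
    """Build a cites-filter URL from a full or short OpenAlex work ID."""
    # "https://openalex.org/W2741809807" -> "W2741809807"; a short ID passes through.
    short_id = work_id.rstrip("/").rsplit("/", 1)[-1]
    return f"https://api.openalex.org/works?filter=cites:{short_id}"


print(cited_by_url("https://openalex.org/W2741809807"))
# https://api.openalex.org/works?filter=cites:W2741809807
```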

concepts

List: List of dehydrated Concept objects.

Each Concept object in the list also has one additional property:

score (Float): The strength of the connection between the work and this
concept (higher is stronger). This number is produced by AWS Sagemaker, in
the last layer of the machine learning model that assigns concepts.

Concepts with a score of at least 0.3 are assigned to the work. However, ancestors
of an assigned concept are also added to the work, even if the ancestor scores are
below 0.3.

Because ancestor concepts are assigned to works, you may see concepts in works
with very low scores, even some zero scores.
concepts: [
{
id: "https://openalex.org/C71924100",
wikidata: "https://www.wikidata.org/wiki/Q11190",
display_name: "Medicine",
level: 0,
score: 0.9187037
},
{
id: "https://openalex.org/C3007834351",
wikidata: "https://www.wikidata.org/wiki/Q82069695",
display_name: "Severe acute respiratory syndrome coronavirus 2 (SA
level: 5,
score: 0.8070164
},
...
{
id: "https://openalex.org/C191935318",
wikidata: "https://www.wikidata.org/wiki/Q148",
display_name: "China",
level: 2,
score: 0.5948172
},
...
{
id: "https://openalex.org/C121608353",
wikidata: "https://www.wikidata.org/wiki/Q12078",
display_name: "Cancer",
level: 2,
score: 0.46887803
},
...
{
id: "https://openalex.org/C17744445",
wikidata: "https://www.wikidata.org/wiki/Q36442",
display_name: "Political science",
level: 0,
score: 0
}
]
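Given the 0.3 rule above, any concept in the list scoring below 0.3 must have been added as an ancestor of an assigned concept. This sketch splits a work's concepts on that basis; the input is a trimmed version of the example, and split_concepts is a hypothetical helper.

```python
ASSIGNMENT_THRESHOLD = 0.3  # concepts scoring at least this are assigned directly


def split_concepts(concepts: list) -> tuple:
    """Return (directly assigned names, ancestor-only names) for a work's concepts."""
    direct = [c["display_name"] for c in concepts if c["score"] >= ASSIGNMENT_THRESHOLD]
    ancestors = [c["display_name"] for c in concepts if c["score"] < ASSIGNMENT_THRESHOLD]
    return direct, ancestors


concepts = [
    {"display_name": "Medicine", "level": 0, "score": 0.9187037},
    {"display_name": "China", "level": 2, "score": 0.5948172},
    {"display_name": "Political science", "level": 0, "score": 0},
]
print(split_concepts(concepts))  # (['Medicine', 'China'], ['Political science'])
```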

corresponding_author_ids

List: OpenAlex IDs of any authors for which authorships.is_corresponding is true.


corresponding_author_ids: ["https://openalex.org/A5004365451"]

corresponding_institution_ids

List: OpenAlex IDs of any institutions found within an authorship for which
authorships.is_corresponding is true.

corresponding_institution_ids: ["https://openalex.org/I4210123613"]

countries_distinct_count

Integer: Number of distinct country_codes among the authorships for this work.

countries_distinct_count: 4
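A sketch of how a count like this could be reproduced from the authorships list, computing institutions_distinct_count in the same pass. It assumes country codes come from each institution's country_code field (as in the Authorship example earlier) and uses made-up institution IDs; OpenAlex may derive the value differently.

```python
def distinct_counts(authorships: list) -> tuple:
    """Return (distinct country codes, distinct institution IDs) across authorships."""
    countries = set()
    institutions = set()
    for authorship in authorships:
        for institution in authorship.get("institutions", []):
            institutions.add(institution["id"])
            # Assumption: country is read from the institution's country_code.
            if institution.get("country_code"):
                countries.add(institution["country_code"])
    return len(countries), len(institutions)


authorships = [
    {"institutions": [{"id": "https://openalex.org/I1", "country_code": "US"}]},
    {"institutions": [{"id": "https://openalex.org/I1", "country_code": "US"},
                      {"id": "https://openalex.org/I2", "country_code": "GB"}]},
    {"institutions": [{"id": "https://openalex.org/I3", "country_code": "US"}]},
]
print(distinct_counts(authorships))  # (2, 3)
```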

counts_by_year

List: Work.cited_by_count for each of the last ten years, binned by year. To put it
another way: each year, you can see how many times this work was cited.

Citations from more than ten years ago aren't included. Years with zero citations have
been removed, so you will need to add those in if you need them.
counts_by_year: [
{
year: 2022,
cited_by_count: 8
},
{
year: 2021,
cited_by_count: 252
},
...
{
year: 2012,
cited_by_count: 79
}
]
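Because zero-citation years are dropped, a year-by-year series needs the gaps filled back in. A minimal sketch; you choose the year range (for example, the last ten years), and the input counts below are hypothetical.

```python
def fill_missing_years(counts_by_year: list, first_year: int, last_year: int) -> dict:
    """Return {year: cited_by_count} for every year in the range, using 0 for gaps."""
    observed = {entry["year"]: entry["cited_by_count"] for entry in counts_by_year}
    return {year: observed.get(year, 0) for year in range(first_year, last_year + 1)}


counts = [{"year": 2022, "cited_by_count": 8}, {"year": 2020, "cited_by_count": 5}]
print(fill_missing_years(counts, 2019, 2022))  # {2019: 0, 2020: 5, 2021: 0, 2022: 8}
```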

created_date

String: The date this Work object was created in the OpenAlex dataset, expressed
as an ISO 8601 date string.

created_date: "2017-08-08"

display_name

String: Exactly the same as Work.title. It's useful for Works to include a
display_name property, since all the other entities have one.

display_name: "The state of OA: a large-scale analysis of the prevalence

doi

String: The DOI for the work. This is the Canonical External ID for works.

Occasionally, a work has more than one DOI--for example, there might be one DOI
for a preprint version hosted on bioRxiv, and another DOI for the published version.
However, this field always has just one DOI, the DOI for the published work.

doi: "https://doi.org/10.7717/peerj.4375"

fulltext_origin

String: If a work's full text is searchable in OpenAlex (has_fulltext is true), this
tells you how we got the text. This will be one of:

pdf: We used Grobid to get the text from an open-access PDF.
ngrams: Full text search is enabled using N-grams obtained from the Internet
Archive.

This attribute is only available for works with has_fulltext:true.

fulltext_origin: "pdf"

fwci

Float: The Field-weighted Citation Impact (FWCI), calculated for a work as the ratio
of citations received to citations expected in the year of publication and the three
following years. Learn more in the reference article: Field Weighted Citation Impact
(FWCI).

fwci: 76.992

grants

List: List of grant objects, which include the Funder and the award ID, if available.
Our grants data comes from Crossref, and is currently fairly limited.
grants: [
// grant for which we have the grant details:
{
funder: "https://openalex.org/F4320306076",
funder_display_name: "National Science Foundation",
award_id: "ABI 1661218",
},
// grant for which we do not have the details:
{
funder: "https://openalex.org/F4320306084",
funder_display_name: "U.S. Department of Energy",
award_id: null,
},
]

has_fulltext

Boolean: Set to true if the work's full text is searchable in OpenAlex. This does
not necessarily mean that the full text is available to you, dear reader; rather, it
means that we have indexed the full text and can use it to help power searches. If
you are trying to find the full text for yourself, try looking in open_access.oa_url.

We get access to the full text in one of two ways: either using an open-access PDF,
or using N-grams obtained from the Internet Archive. You can learn where a work's
full text came from at fulltext_origin.

has_fulltext: true

host_venue (deprecated)

The host_venue and alternate_host_venues properties have been deprecated in
favor of primary_location and locations. The attributes host_venue and
alternate_host_venues are no longer available in the Work object, and trying to
access them in filters or group-bys will return an error.

id
String: The OpenAlex ID for this work.

id: "https://openalex.org/W2741809807"

ids

Object: All the external identifiers that we know about for this work. IDs are
expressed as URIs whenever possible. Possible ID types:

doi (String: The DOI. Same as Work.doi)
mag (Integer: the Microsoft Academic Graph ID)
openalex (String: The OpenAlex ID. Same as Work.id)
pmid (String: The PubMed Identifier)
pmcid (String: the PubMed Central identifier)

Most works are missing one or more ID types (either because we don't know the ID, or
because it was never assigned). Keys for null IDs are not displayed.

ids: {
openalex: "https://openalex.org/W2741809807",
doi: "https://doi.org/10.7717/peerj.4375",
mag: 2741809807,
pmid: "https://pubmed.ncbi.nlm.nih.gov/29456894"
}
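The IDs above are URIs, but many tools expect the bare identifier. This sketch strips the URI prefixes that appear in the example and passes every other key through unchanged; the ID_PREFIXES table is an assumption drawn only from that example.

```python
ID_PREFIXES = {
    "openalex": "https://openalex.org/",
    "doi": "https://doi.org/",
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/",
}


def short_ids(ids: dict) -> dict:
    """Strip known URI prefixes so IDs can be compared or joined as bare strings."""
    result = {}
    for key, value in ids.items():
        prefix = ID_PREFIXES.get(key)
        if isinstance(value, str) and prefix and value.startswith(prefix):
            value = value[len(prefix):]
        result[key] = value  # e.g. mag is already a bare integer
    return result


ids = {
    "openalex": "https://openalex.org/W2741809807",
    "doi": "https://doi.org/10.7717/peerj.4375",
    "mag": 2741809807,
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894",
}
print(short_ids(ids))
```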

indexed_in

List: The sources this work is indexed in. Possible values: arxiv, crossref, doaj,
pubmed.

indexed_in: [
"arxiv", "crossref", "pubmed"
]
institutions_distinct_count

Integer: Number of distinct institutions among the authorships for this work.

institutions_distinct_count: 4

is_paratext

Boolean: True if we think this work is paratext.

In our context, paratext is stuff that's in a scholarly venue (like a journal) but is
about the venue rather than a scholarly work properly speaking. Some examples
and nonexamples:

yep it's paratext: front cover, back cover, table of contents, editorial board
listing, issue information, masthead.
no, not paratext: research paper, dataset, letters to the editor, figures

Turns out there is a lot of paratext in registries like Crossref. That's not a bad
thing... but we've found that it's good to have a way to filter it out.

We determine is_paratext algorithmically using title heuristics.

is_paratext: false

is_retracted

Boolean: True if we know this work has been retracted.

We identify retracted works using the Retraction Watch database, a public resource
made possible by a partnership between Crossref and The Center for Scientific
Integrity.
is_retracted: false

keywords

List of objects: Short phrases identified based on works' Topics. For background on
how Keywords are identified, see the Keywords page at OpenAlex help pages.

The score for each keyword represents the similarity score of that keyword to the
title and abstract text of the work.

We provide up to 5 keywords per work, for all keywords with scores above a certain
threshold.

[
{
id: "https://openalex.org/keywords/global-seaweed-distribution",
display_name: "Global Seaweed Distribution",
score: 0.559386
},
{
id: "https://openalex.org/keywords/climate-change-impacts",
display_name: "Climate Change Impacts",
score: 0.535795
},
{
id: "https://openalex.org/keywords/ecosystem-resilience",
display_name: "Ecosystem Resilience",
score: 0.502789
}
]

language

String: The language of the work in ISO 639-1 format. The language is automatically
detected using the information we have about the work. We use the langdetect
software library on the words in the work's abstract, or the title if we do not have
the abstract. The source code for this procedure is here. Keep in mind that this
method is not perfect, and that in some cases the language of the title or abstract
could be different from the body of the work.

A few things to keep in mind about this:

We don't always assign a language if we do not have enough words available to
accurately guess.
We report the language of the metadata, not the full text. For example, if a work
is in French, but the title and abstract are in English, we report the language as
English.
In some cases, abstracts are in two different languages. Unfortunately, when this
happens, what we report will not be accurate.

language: "en"

license

String: The license applied to this work at this host. Most toll-access works don't
have an explicit license (they're under "all rights reserved" copyright), so this field
generally has content only if is_oa is true.

license: "cc-by"

locations

List: A list of Location objects describing all unique places where this work lives.
locations: [
{
is_oa: true,
landing_page_url: "https://doi.org/10.1073/pnas.17.6.401",
pdf_url: "http://www.pnas.org/content/17/6/401.full.pdf",
source: {
id: "https://openalex.org/S125754415",
display_name: "Proceedings of the National Academy of Sciences of t
issn_l: "0027-8424",
issn: ["1091-6490", "0027-8424"],
host_organization: "https://openalex.org/P4310320052",
type: "journal"
},
license: null,
version: "publishedVersion"
},
{
is_oa: true,
landing_page_url: "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10760
pdf_url: null,
source: {
id: "https://openalex.org/S2764455111",
display_name: "PubMed Central",
issn_l: null,
issn: null,
host_organization: "https://openalex.org/I1299303238",
type: "repository"
},
license: null,
version: "publishedVersion"
}
]

locations_count

Integer: Number of locations for this work.

locations_count: 3

mesh
List: List of MeSH tag objects. Only works found in PubMed have MeSH tags; for all
other works, this is an empty list.

mesh: [
{
descriptor_ui: "D017712",
descriptor_name: "Peer Review, Research",
qualifier_ui: "Q000379",
qualifier_name: "methods",
is_major_topic: false
},
{
descriptor_ui: "D017712",
descriptor_name: "Peer Review, Research",
qualifier_ui: "Q000592",
qualifier_name: "standards",
is_major_topic: true
}
]

open_access

Object: Information about the access status of this work, as an OpenAccess object.

open_access: {
is_oa: true,
oa_status: "gold",
oa_url: "https://peerj.com/articles/4375.pdf",
any_repository_has_fulltext: true
},

primary_location

Object: A Location object with the primary location of this work.

The primary_location is where you can find the best (closest to the version of
record) copy of this work. For a peer-reviewed journal article, this would be a full
text published version, hosted by the publisher at the article's DOI URL.
primary_location: {
is_oa: true,
landing_page_url: "https://doi.org/10.1073/pnas.17.6.401",
pdf_url: "http://www.pnas.org/content/17/6/401.full.pdf",
source: {
id: "https://openalex.org/S125754415",
display_name: "Proceedings of the National Academy of Sciences of the
issn_l: "0027-8424",
issn: ["1091-6490", "0027-8424"],
host_organization: "https://openalex.org/P4310320052",
type: "journal"
},
license: null,
version: "publishedVersion"
}

primary_topic

Object

The top ranked Topic for this work. This is the same as the first item in
Work.topics.

primary_topic: {
id: "https://openalex.org/T12419",
display_name: "Analysis of Cardiac and Respiratory Sounds",
score: 0.9997,
subfield: {
id: 2740,
display_name: "Pulmonary and Respiratory Medicine"
},
field: {
id: 27,
display_name: "Medicine"
},
domain: {
id: 4,
display_name: "Health Sciences"
}
}
publication_date

String: The day when this work was published, formatted as an ISO 8601 date.

Where different publication dates exist, we usually select the earliest available date
of electronic publication.

This date applies to the version found at Work.url. The other versions, found in
Work.locations, may have been published at different (earlier) dates.

publication_date: "2018-02-13"

publication_year

Integer: The year this work was published.

This year applies to the version found at Work.url. The other versions, found in
Work.locations, may have been published in different (earlier) years.

publication_year: 2018

referenced_works

List: OpenAlex IDs for works that this work cites. These are citations that go from
this work out to another work: This work ➞ Other works.

referenced_works: [
"https://openalex.org/W2753353163",
"https://openalex.org/W2785823074",
"https://openalex.org/W2511661767",
"https://openalex.org/W2115339903",
"https://openalex.org/W2031754690"
]
related_works

List: OpenAlex IDs for works related to this work. Related works are computed
algorithmically; the algorithm finds recent papers with the most concepts in
common with the current paper.

related_works: [
"https://openalex.org/W2753353163",
"https://openalex.org/W2785823074",
"https://openalex.org/W2511661767",
"https://openalex.org/W2115339903",
"https://openalex.org/W2031754690",
]

sustainable_development_goals

List: List of objects

The United Nations' 17 Sustainable Development Goals are a collection of goals at
the heart of a global "shared blueprint for peace and prosperity for people and the
planet." We tag works with their relevance to these goals using our OpenAlex SDG
Classifier, an mBERT machine learning model developed by the Aurora Universities
Network. The score represents the model's predicted probability of the work's
relevance for a particular goal.

We display all of the SDGs with a prediction score higher than 0.4.

sustainable_development_goals: [
{
id: "https://metadata.un.org/sdg/3",
display_name: "Good health and well-being",
score: 0.95
}
]

topics
List: List of objects

The top ranked Topics for this work. We provide up to 3 topics per work.

topics: [
{
id: "https://openalex.org/T12419",
display_name: "Analysis of Cardiac and Respiratory Sounds",
score: 0.9997,
subfield: {
id: 2740,
display_name: "Pulmonary and Respiratory Medicine"
},
field: {
id: 27,
display_name: "Medicine"
},
domain: {
id: 4,
display_name: "Health Sciences"
}
}
...
]

title

String: The title of this work.

This is exactly the same as Work.display_name. We include both attributes with
the same information because we want all entities to have a display_name, but
there's a longstanding tradition of calling this the "title," so we figured you'll be
expecting works to have it as a property.

title: "The state of OA: a large-scale analysis of the prevalence and imp

type

String: The type of the work.

You can see all of the different types along with their counts in the OpenAlex API
here: https://api.openalex.org/works?group_by=type.

Most works are type article. This includes what was formerly (and currently in
type_crossref) labeled as journal-article, proceedings-article, and
posted-content. We consider all of these to be article type works, and the
distinctions between them to be more about where they are published or hosted:

Journal articles will have a primary_location.source.type of journal
Conference proceedings will have a primary_location.source.type of conference
Preprints or "posted content" will have a primary_location.version of submittedVersion

(Note that distinguishing between journals and conferences is a hard problem, one
we often get wrong. We are working on improving this, but we also point out that
the two have a lot of overlap in terms of their roles as hosts of research
publications.)
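The rules above can be combined into a rough classifier for article-type works. A sketch only: the labels are hypothetical, checking version before source type is a design choice, and, as the note above says, the journal/conference distinction in the underlying data is often imperfect.

```python
def article_kind(work: dict) -> str:
    """Roughly label an article-type work from its primary location (hypothetical labels)."""
    location = work.get("primary_location") or {}
    # A submitted version on the primary location suggests a preprint or posted content.
    if location.get("version") == "submittedVersion":
        return "preprint or posted content"
    source_type = (location.get("source") or {}).get("type")
    if source_type == "journal":
        return "journal article"
    if source_type == "conference":
        return "conference paper"
    return "other"


work = {
    "type": "article",
    "primary_location": {"version": "publishedVersion", "source": {"type": "journal"}},
}
print(article_kind(work))  # journal article
```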

Works that are hosted primarily on a preprint server, or that are identified specifically as
preprints in the metadata we receive, are assigned the type preprint rather than
article.

Works that represent stuff that is about the venue (such as a journal), rather than a
scholarly work properly speaking, have type paratext. These include things like
front covers, back covers, tables of contents, and the journal itself (e.g.,
https://openalex.org/W4232230324).

We also have types for letter, editorial, erratum (corrections), libguides,
supplementary-materials, and review (currently, articles that come from
journals that exclusively publish review articles). Coverage is low on these but will
improve.

Other work types follow the Crossref "type" controlled vocabulary—see


type_crossref .

type: "article"
type_crossref

String: Legacy type information, using Crossref's "type" controlled vocabulary.

These are the work types that we used to use, before switching to our current
system (see type ).

You can see all possible values of Crossref's "type" controlled vocabulary via the
Crossref API here: https://api.crossref.org/types

Where possible, we just pass along Crossref's type value for each work. When
that's impossible (e.g., the work isn't in Crossref), we do our best to figure out the
type ourselves.

type_crossref: "journal-article"

updated_date

String: The last time anything in this Work object changed, expressed as an ISO
8601 date string (in UTC). This date is updated for any change at all, including
increases in various counts.

updated_date: "2022-01-02T00:22:35.180390"
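If you cache works locally, updated_date tells you when a cached copy may be stale. A minimal sketch, assuming the string carries no UTC offset (as in the example above) so the UTC zone is attached explicitly; the function names are hypothetical. Because any change, including a count increase, bumps this date, a recent value does not by itself mean the metadata you care about changed.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def parse_updated_date(value: str) -> datetime:
    """Parse an updated_date string into a timezone-aware UTC datetime."""
    parsed = datetime.fromisoformat(value)
    # The example string has no offset, but the field is documented as UTC.
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def changed_since(work: Dict[str, Any], cutoff: datetime) -> bool:
    """True if the work changed at or after cutoff (cutoff must be timezone-aware)."""
    return parse_updated_date(work["updated_date"]) >= cutoff
```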

The OpenAccess object


The OpenAccess object describes access options for a given work. It's only found
as part of the Work object.

any_repository_has_fulltext
Boolean: True if any of this work's locations has location.is_oa=true and
location.source.type=repository.

Use case: researchers want to track Green OA, using a definition of "any repository
hosts this." OpenAlex's definition (as used in oa_status) doesn't support this,
because as soon as there's a publisher-hosted copy (bronze, hybrid, or gold),
oa_status is set to that publisher-hosted status.

So there's a lot of repository-hosted content that the oa_status can't tell you
about. Our State of OA paper calls this "shadowed Green." This feature makes it
possible to track shadowed Green.

any_repository_has_fulltext: true
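As a sketch of how the definition and the "shadowed Green" idea fit together, the functions below recompute the flag from a work's locations and then look for works whose repository copy is hidden behind a publisher-hosted oa_status. The function names are hypothetical, and counting diamond as publisher-hosted is our assumption (the text above names bronze, hybrid, and gold).

```python
from typing import Any, Dict, List

# Statuses that win over green when a publisher-hosted copy exists (diamond is our assumption).
PUBLISHER_HOSTED_STATUSES = {"diamond", "gold", "hybrid", "bronze"}


def any_repository_has_fulltext(locations: List[Dict[str, Any]]) -> bool:
    """Recompute the flag: some location is OA and its source is a repository."""
    return any(
        bool(loc.get("is_oa"))
        and (loc.get("source") or {}).get("type") == "repository"
        for loc in locations
    )


def is_shadowed_green(work: Dict[str, Any]) -> bool:
    """A repository copy exists, but oa_status reports a publisher-hosted status."""
    oa = work.get("open_access") or {}
    return (
        bool(oa.get("any_repository_has_fulltext"))
        and oa.get("oa_status") in PUBLISHER_HOSTED_STATUSES
    )
```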

is_oa

Boolean: True if this work is Open Access (OA).

There are many ways to define OA. OpenAlex uses a broad definition: having a URL
where you can read the fulltext of this work without needing to pay money or log in.
You can use the locations and oa_status fields to narrow your results further,
accommodating any definition of OA you like.

is_oa: true

oa_status

String: The Open Access (OA) status of this work. Possible values are:

diamond: Published in a fully OA journal (one that is indexed by the DOAJ or
that we have determined to be OA) with no article processing charges (i.e., free
for both readers and authors).
gold: Published in a fully OA journal.
green: Toll-access on the publisher landing page, but there is a free copy in an
OA repository.
hybrid: Free under an open license in a toll-access journal.
bronze: Free to read on the publisher landing page, but without any identifiable
license.
closed: All other articles.

oa_status: "gold"

oa_url

String: The best Open Access (OA) URL for this work.

Although there are many ways to define OA, in this context an OA URL is one where
you can read the fulltext of this work without needing to pay money or log in. The
"best" such URL is the one closest to the version of record.

This URL might be a direct link to a PDF, or it might be to a landing page that links to
the free PDF.

oa_url: "https://peerj.com/articles/4375.pdf"
Authorship object
The Authorship object represents a single author and her institutional affiliations in
the context of a given work. It is only found as part of a Work object, in the
work.authorships property.

affiliations

List: List of objects

Each institutional affiliation that this author has claimed will be listed here: the raw
affiliation string that we found, along with the OpenAlex Institution ID or IDs that
we matched it to.

This information will be redundant with institutions below, but is useful if you
need to know what we used to match institutions.

affiliations: [
{
raw_affiliation_string: "Scholarly Communications Lab, Simon Fras
institution_ids: [
"https://openalex.org/I18014758"
]
}
]

author

Object: An author of this work, as a dehydrated Author object.

Note that, sometimes, we assign ORCID using author disambiguation, so the ORCID
we associate with an author was not necessarily included with this work.
author: {
id: "https://openalex.org/A5085171399",
display_name: "Juan Pablo Alperin",
orcid: "https://orcid.org/0000-0002-9344-7439"
}

author_position

String: A summarized description of this author's position in the work's author list.
Possible values are first , middle , and last .

It's not strictly necessary, because author order is already implicitly recorded by
the list order of Authorship objects; however, it's useful in some contexts to have
this as a categorical value.

author_position: "first"

countries

List: The country or countries for this author.

We determine the countries using a combination of matched institutions and
parsing of the raw affiliation strings, so we can have this information for some
authors even if we do not have a specific institutional affiliation.

countries: [
"US"
]

institutions

List: The institutional affiliations this author claimed in the context of this work, as
dehydrated Institution objects.
institutions: [
{
id: "https://openalex.org/I18014758",
display_name: "Simon Fraser University",
ror: "https://ror.org/0213rcc28",
country_code: "CA",
type: "education",
lineage: ["https://openalex.org/I18014758"]
}
]

is_corresponding

Boolean: If true , this is a corresponding author for this work.

This is a new feature, and the information may be missing for many works. We are
working on this, and coverage will improve soon.

raw_affiliation_strings

List: This author's affiliation as it originally came to us (on a webpage or in an API),
as a list of raw unformatted strings. If there is only one affiliation, it will be a list of
length one.

raw_affiliation_strings: [
"Canadian Institute for Studies in Publishing, Simon Fraser Universit
],

raw_author_name

String: This author's name as it originally came to us (on a webpage or in an API),
as a raw unformatted string.

raw_author_name: "Juan Pablo Alperin"
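Pulling the Authorship fields together, here is a minimal sketch that flattens work.authorships into one plain row per author, for example to load into a spreadsheet. The row keys are our own hypothetical names; because corresponding-author coverage is still incomplete, a missing is_corresponding is reported as False here, which means "unknown or no", not a confirmed no.

```python
from typing import Any, Dict, List


def summarize_authorships(work: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten work["authorships"] into one plain row per author, in author order."""
    rows = []
    for order, authorship in enumerate(work.get("authorships") or [], start=1):
        author = authorship.get("author") or {}
        rows.append({
            "order": order,  # list order of Authorship objects is the author order
            "position": authorship.get("author_position"),  # first / middle / last
            "name": author.get("display_name"),
            "raw_name": authorship.get("raw_author_name"),
            "orcid": author.get("orcid"),
            "institution_ids": [inst["id"] for inst in authorship.get("institutions") or []],
            "countries": list(authorship.get("countries") or []),
            # Missing corresponding-author data is treated as False.
            "is_corresponding": bool(authorship.get("is_corresponding")),
        })
    return rows
```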


Location object
The Location object describes the location of a given work. It's only found as part
of the Work object.

Locations are meant to capture the way that a work exists in different versions. So,
for example, a work may have a version that has been peer-reviewed and published
in a journal (the version of record). This would be one of the work's locations. It
may have another version available on a preprint server like bioRxiv—this version
having been posted before it was accepted for publication. This would be another
one of the work's locations.

For example, the work https://openalex.org/W2807749226 in OpenAlex has multiple
locations with different properties. The version of record, published in a
peer-reviewed journal, is listed first, and is not open access. The second location is
a university repository, where one can find an open-access copy of the published
version of the work. Other locations are listed after these.
Figure: One work can have multiple locations, which can differ in properties such as version and open-access status.

Locations are meant to cover anywhere that a given work can be found. This can
include journals, proceedings, institutional repositories, and subject-area
repositories like arXiv and bioRxiv. If you are only interested in a certain one of
these (like journal), you can use a filter to specify the locations.source.type.
(Learn more about types here.)

There are three places in the Work object where you can find locations:

primary_location: The best (closest to the version of record) copy of this work.
best_oa_location: The best available open access location of this work.
locations: A list of all of the locations where this work lives. This will include
the two locations above if available, and can also include other locations.
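When best_oa_location is not enough (for example, you want every free copy, not just the best one), you can walk the full locations list. A minimal sketch with a hypothetical function name; preferring pdf_url over landing_page_url is our own choice.

```python
from typing import Any, Dict, List, Optional, Tuple


def oa_copies(work: Dict[str, Any]) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """List (source name, version, URL) for every open access location of a work."""
    copies = []
    for location in work.get("locations") or []:
        if not location.get("is_oa"):
            continue
        source = location.get("source") or {}
        # Prefer a direct PDF link; fall back to the landing page.
        url = location.get("pdf_url") or location.get("landing_page_url")
        copies.append((source.get("display_name"), location.get("version"), url))
    return copies
```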

is_accepted

Boolean: true if this location's version is either acceptedVersion or
publishedVersion; otherwise false.

is_accepted: true

is_oa

Boolean: True if an Open Access (OA) version of this work is available at this
location.

There are many ways to define OA. OpenAlex uses a broad definition: having a URL
where you can read the fulltext of this work without needing to pay money or log in.
is_oa: true

is_published

Boolean: true if this location's version is publishedVersion; otherwise false.

is_published: true

landing_page_url
String: The landing page URL for this location.

landing_page_url: "https://doi.org/10.1590/s1678-77572010000100010"

license
String: The location's publishing license. This can be a Creative Commons license
such as cc0 or cc-by, a publisher-specific license, or null, which means we are not
able to determine a license for this location.

license: "cc-by"

source
Object: Information about the source of this location, as a DehydratedSource
object.

The concept of a source is meant to capture a certain social relationship between
the host organization and a version of a work. When an organization puts the work
on the internet, there is an understanding that they have, at some level, endorsed
the work. This level varies, and can be very different depending on the source!

source: {
id: "https://openalex.org/S125754415",
display_name: "Proceedings of the National Academy of Sciences of the
issn_l: "0027-8424",
issn: ["1091-6490", "0027-8424"],
host_organization: "https://openalex.org/P4310320052",
type: "journal"
}

pdf_url
String: A URL where you can find this location as a PDF.

pdf_url: "http://www.scielo.br/pdf/jaos/v18n1/a10v18n1.pdf"

version
String: The version of the work, based on the DRIVER Guidelines versioning
scheme. Possible values are:

publishedVersion: The document’s version of record. This is the most
authoritative version.
acceptedVersion: The document after having completed peer review and being
officially accepted for publication. It will lack publisher formatting, but the
content should be interchangeable with that of the publishedVersion.
submittedVersion: The document as submitted to the publisher by the authors,
but before peer review. Its content may differ significantly from that of the
accepted article.

version: "publishedVersion"
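The is_accepted and is_published booleans described earlier in this section follow directly from version. A small sketch (hypothetical function name) that derives both, which can be handy as a consistency check on your own copies of the data:

```python
from typing import Dict, Optional

DRIVER_VERSIONS = ("submittedVersion", "acceptedVersion", "publishedVersion")


def version_flags(version: Optional[str]) -> Dict[str, bool]:
    """Derive is_accepted / is_published from a location's version string."""
    if version is not None and version not in DRIVER_VERSIONS:
        raise ValueError(f"unexpected version: {version!r}")
    return {
        # Accepted and published versions have both passed peer review / acceptance.
        "is_accepted": version in ("acceptedVersion", "publishedVersion"),
        "is_published": version == "publishedVersion",
    }
```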
Get a single work
It's easy to get a work from the API with /works/<entity_id>. Here's an
example:

Get the work with the OpenAlex ID W2741809807:
https://api.openalex.org/works/W2741809807

That will return a Work object, describing everything OpenAlex knows about the
work with that ID.

{
"id": "https://openalex.org/W2741809807",
"doi": "https://doi.org/10.7717/peerj.4375",
"title": "The state of OA: a large-scale analysis of the prevalence a
"display_name": "The state of OA: a large-scale analysis of the preva
"publication_year": 2018,
"publication_date": "2018-02-13",
// other fields removed for brevity
}
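The same request from Python, using only the standard library. This is a sketch: the optional mailto parameter reflects the recommendation to add your email to API requests, and the function names are hypothetical. Network access is only needed when get_work is actually called.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

API_BASE = "https://api.openalex.org"


def work_url(work_id: str, mailto: Optional[str] = None) -> str:
    """Build /works/<entity_id>; the ID may be an OpenAlex ID or a full external ID URL."""
    url = f"{API_BASE}/works/{work_id}"
    if mailto:
        url += "?" + urllib.parse.urlencode({"mailto": mailto})
    return url


def get_work(work_id: str, mailto: Optional[str] = None) -> Dict[str, Any]:
    """Fetch one Work object and decode the JSON response."""
    with urllib.request.urlopen(work_url(work_id, mailto), timeout=30) as response:
        return json.load(response)
```

For example, get_work("W2741809807")["publication_year"] should give the publication_year shown in the response above.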

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax (tutorial).
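A sketch of that batching approach, assuming the "|" OR separator inside a filter value and the openalex alias for ids.openalex. Since the default page size is 25 (see per_page in the list response below), each URL also asks for per-page equal to the batch size so all matches come back on one page.

```python
import urllib.parse
from typing import List, Sequence

API_BASE = "https://api.openalex.org"


def batched_id_urls(work_ids: Sequence[str], batch_size: int = 50) -> List[str]:
    """Split OpenAlex work IDs into list queries of at most batch_size IDs each."""
    urls = []
    for start in range(0, len(work_ids), batch_size):
        batch = list(work_ids[start:start + batch_size])
        params = {
            # "|" ORs the values inside a single filter.
            "filter": "openalex:" + "|".join(batch),
            # Default pages hold 25 results, so request the whole batch at once.
            "per-page": len(batch),
        }
        urls.append(f"{API_BASE}/works?" + urllib.parse.urlencode(params, safe=":|"))
    return urls
```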

External IDs
You can look up works using external IDs such as a DOI:

Get the work with this DOI (https://doi.org/10.7717/peerj.4375):
https://api.openalex.org/works/https://doi.org/10.7717/peerj.4375

You can use the full ID or a shorter Uniform Resource Name (URN) format like so:

Get the work with this PubMed ID (https://pubmed.ncbi.nlm.nih.gov/14907713):
https://api.openalex.org/works/pmid:14907713
Available external IDs for works are:

External ID                        URN
DOI                                doi
Microsoft Academic Graph (MAG)     mag
PubMed ID (PMID)                   pmid
PubMed Central ID (PMCID)          pmcid

You must make sure that the ID(s) you supply are valid and correct. If an ID you
request is incorrect, you will get no result. If you request an illegal ID, such as one
containing a comma (,) or an ampersand (&), the query will fail and you will get a 403 error.
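Since illegal IDs produce a 403 rather than an empty result, it can help to validate before sending. A minimal sketch using the URNs from the table above; it assumes the doi URN takes the bare DOI (e.g. doi:10.7717/peerj.4375), and the function name is hypothetical.

```python
API_BASE = "https://api.openalex.org"

# URN prefixes from the table of external IDs for works.
EXTERNAL_ID_URNS = {"doi", "mag", "pmid", "pmcid"}


def external_id_url(urn: str, value: str) -> str:
    """Build a single-work lookup URL such as /works/pmid:14907713."""
    if urn not in EXTERNAL_ID_URNS:
        raise ValueError(f"unsupported external ID type: {urn!r}")
    # Commas and ampersands make the request fail with a 403, so refuse them up front.
    if "," in value or "&" in value:
        raise ValueError(f"illegal character in ID: {value!r}")
    return f"{API_BASE}/works/{urn}:{value}"
```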

Select fields
You can use select to limit the fields that are returned in a work object. More
details are here.

Display only the id and display_name for a work object:
https://api.openalex.org/works/W2741809807?select=id,display_name
Get lists of works
You can get lists of works:

Get all of the works in OpenAlex:
https://api.openalex.org/works

Which returns a response like this:

{
"meta": {
"count": 245684392,
"db_response_time_ms": 929,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/W1775749144",
"doi": "https://doi.org/10.1016/s0021-9258(19)52451-6",
"title": "PROTEIN MEASUREMENT WITH THE FOLIN PHENOL REAGENT",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/W2100837269",
"doi": "https://doi.org/10.1038/227680a0",
"title": "Cleavage of Structural Proteins during the Assembly
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
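The meta block tells you how big the full result set is. A small sketch (hypothetical function name) that reads it, works out how many pages the current per_page implies using integer ceiling division, and collects the IDs on this page:

```python
from typing import Any, Dict, List, Tuple


def summarize_list_response(response: Dict[str, Any]) -> Tuple[int, int, List[str]]:
    """Return (total matching works, pages at the current per_page, IDs on this page)."""
    meta = response["meta"]
    count, per_page = meta["count"], meta["per_page"]
    pages = -(-count // per_page)  # ceiling division without floating point
    ids = [work["id"] for work in response.get("results", [])]
    return count, pages, ids
```

With the meta values shown above (245,684,392 works at 25 per page), this reports 9,827,376 pages.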

Page and sort works


You can page through works and change the default number of results returned
with the page and per-page parameters:
Get a second page of results with 50 results per page:
https://api.openalex.org/works?per-page=50&page=2

You can sort results with the sort parameter:

Sort works by publication year:
https://api.openalex.org/works?sort=publication_year

Continue on to learn how you can filter and search lists of works.

Sample works
You can use sample to get a random batch of works. Read more about sampling
and how to add a seed value here.

Get 20 random works:
https://api.openalex.org/works?sample=20

Select fields
You can use select to limit the fields that are returned in a list of works. More
details are here.

Display only the id and display_name within works results:
https://api.openalex.org/works?select=id,display_name
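The list parameters above (page, per-page, sort, sample, select) can be combined in one query string. A minimal sketch of a URL builder; the function name is hypothetical, and the seed parameter name is our assumption for the sampling seed mentioned above. Colons and commas are left unescaped so the URLs read like the examples in this page.

```python
import urllib.parse
from typing import Dict, Optional, Sequence, Union

API_BASE = "https://api.openalex.org"


def works_query_url(
    page: Optional[int] = None,
    per_page: Optional[int] = None,
    sort: Optional[str] = None,
    sample: Optional[int] = None,
    seed: Optional[int] = None,  # assumed parameter name for the sampling seed
    select: Optional[Sequence[str]] = None,
) -> str:
    """Build a /works list URL from the optional paging, sorting, sampling, and select parameters."""
    params: Dict[str, Union[int, str]] = {}
    if page is not None:
        params["page"] = page
    if per_page is not None:
        params["per-page"] = per_page  # note the hyphen in the API parameter name
    if sort:
        params["sort"] = sort
    if sample is not None:
        params["sample"] = sample
    if seed is not None:
        params["seed"] = seed
    if select:
        params["select"] = ",".join(select)
    query = urllib.parse.urlencode(params, safe=":,")
    return f"{API_BASE}/works" + (f"?{query}" if query else "")
```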
Filter works
It's easy to filter works with the filter parameter:

Get works where the publication year is 2020:
https://api.openalex.org/works?filter=publication_year:2020

In this example the filter is publication_year and the value is 2020.

It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
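As a preview of that syntax, the sketch below joins conditions into one filter value: separate conditions are ANDed with ",", a list of values is ORed with "|", and a leading "!" negates a value (as in the repository:!S4306400393 example further down this page). The function name is hypothetical; because it takes a mapping, it cannot repeat the same filter name twice.

```python
from typing import Mapping, Sequence, Union

FilterValue = Union[str, int, bool, Sequence[str]]


def build_filter(conditions: Mapping[str, FilterValue]) -> str:
    """Join conditions into a filter= value: "," is AND, "|" is OR, a "!" prefix negates."""
    parts = []
    for name, value in conditions.items():
        if isinstance(value, bool):  # test bool first so True becomes "true", not "True"
            text = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            text = "|".join(str(v) for v in value)
        else:
            text = str(value)  # plain values, including inequalities like ">5"
        parts.append(f"{name}:{text}")
    return ",".join(parts)
```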

/works attribute filters


You can filter using these attributes of the Work object (click each one to view
their documentation on the Work object page):

The host_venue and alternate_host_venues properties have been deprecated in
favor of primary_location and locations. The attributes host_venue and
alternate_host_venues are no longer available in the Work object, and trying to
access them in filters or group-bys will return an error.

authorships.affiliations.institution_ids
authorships.author.id (alias: author.id): Authors for a work (OpenAlex ID)
authorships.author.orcid (alias: author.orcid): Authors for a work (ORCID)
authorships.countries
authorships.institutions.country_code (alias: institutions.country_code)
authorships.institutions.id (alias: institutions.id): Institutions affiliated with the authors of a work (OpenAlex ID)
authorships.institutions.lineage
authorships.institutions.ror (alias: institutions.ror): Institutions affiliated with the authors of a work (ROR ID)
authorships.institutions.type
authorships.is_corresponding (alias: is_corresponding): Whether or not we have corresponding author information for a given work
apc_list.value
apc_list.currency
apc_list.provenance
apc_list.value_usd
apc_paid.value
apc_paid.currency
apc_paid.provenance
apc_paid.value_usd
best_oa_location.is_accepted
best_oa_location.is_published
best_oa_location.license: The Open Access license for a work
best_oa_location.source.id
best_oa_location.source.is_in_doaj
best_oa_location.source.issn
best_oa_location.source.host_organization
best_oa_location.source.type
best_oa_location.version
biblio.first_page
biblio.issue
biblio.last_page
biblio.volume
cited_by_count
concepts.id (alias: concept.id): The concepts associated with a work
concepts.wikidata
corresponding_author_ids: Corresponding authors for a work (OpenAlex ID)
corresponding_institution_ids
countries_distinct_count
doi: The DOI (Digital Object Identifier) of a work
fulltext_origin
fwci
grants.award_id: Award IDs for grants
grants.funder: Funding organizations linked to grants for a work
has_fulltext
ids.pmcid
ids.pmid (alias: pmid)
ids.openalex (alias: openalex): The OpenAlex ID for a work
ids.mag (alias: mag)
indexed_in
institutions_distinct_count
is_paratext
is_retracted
keywords.keyword
language
locations.is_accepted
locations.is_oa
locations.is_published
locations.license
locations.source.id
locations.source.is_core
locations.source.is_in_doaj
locations.source.issn
locations.source.host_organization
locations.source.type
locations.version
locations_count
open_access.any_repository_has_fulltext
open_access.is_oa (alias: is_oa): Whether a work is Open Access
open_access.oa_status (alias: oa_status): The Open Access status for a work (e.g., gold, green, hybrid, etc.)
primary_location.is_accepted
primary_location.is_oa
primary_location.is_published
primary_location.license
primary_location.source.id
primary_location.source.is_core
primary_location.source.is_in_doaj
primary_location.source.issn
primary_location.source.host_organization
primary_location.source.type
primary_location.version
primary_topic.id
primary_topic.domain.id
primary_topic.field.id
primary_topic.subfield.id
publication_year
publication_date
sustainable_development_goals.id
topics.id
topics.domain.id
topics.field.id
topics.subfield.id
type
type_crossref

Want to filter by the display_name of an associated entity (author, institution, source,
etc.)? See here.

/works convenience filters


These filters aren't attributes of the Work object, but they're handy for solving
some common use cases:

abstract.search

Text search using abstracts

Value: a search string

Returns: works whose abstract includes the given string. See the search page for
details on the search algorithm used.

Get works with abstracts that mention "artificial intelligence":
https://api.openalex.org/works?filter=abstract.search:artificial%20intelligence

authors_count

Number of authors for a work

Value: an Integer

Returns: works with the chosen number of authorships objects (authors). You can
use the inequality filter to select a range, such as authors_count:>5.

Get works that have exactly one author:
https://api.openalex.org/works?filter=authors_count:1

authorships.institutions.continent (alias: institutions.continent )


Value: a String with a valid continent filter

Returns: works where at least one of the authors' institutions is in the chosen
continent.

Get works where at least one author's institution is located in Europe:
https://api.openalex.org/works?filter=authorships.institutions.continent:europe

authorships.institutions.is_global_south (alias:
institutions.is_global_south )

Value: a Boolean ( true or false )

Returns: works where at least one of the authors' institutions is in the Global South
(read more).

Get works where at least one author's institution is in the Global South:
https://api.openalex.org/works?filter=authorships.institutions.is_global_south:true

best_open_version

Value: a String with one of the following values:

any: This means that best_oa_location.version = submittedVersion,
acceptedVersion, or publishedVersion
acceptedOrPublished: This means that best_oa_location.version can be
acceptedVersion or publishedVersion
published: This means that best_oa_location.version = publishedVersion

Returns: works that meet the above criteria for best_oa_location.

Get works whose best_oa_location is a submitted, accepted, or published
version: https://api.openalex.org/works?filter=best_open_version:any
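If you already hold Work objects (for example, from the data snapshot), the same criteria can be checked locally. A sketch that encodes the three options above as sets of allowed best_oa_location.version values; the function name is hypothetical.

```python
from typing import Any, Dict

# Allowed best_oa_location.version values for each best_open_version option.
BEST_OPEN_VERSION = {
    "any": {"submittedVersion", "acceptedVersion", "publishedVersion"},
    "acceptedOrPublished": {"acceptedVersion", "publishedVersion"},
    "published": {"publishedVersion"},
}


def matches_best_open_version(work: Dict[str, Any], option: str) -> bool:
    """Client-side equivalent of filter=best_open_version:<option> for one work."""
    if option not in BEST_OPEN_VERSION:
        raise ValueError(f"unknown best_open_version value: {option!r}")
    location = work.get("best_oa_location") or {}  # closed works have no best OA location
    return location.get("version") in BEST_OPEN_VERSION[option]
```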

cited_by
Value: the OpenAlex ID for a given work

Returns: works found in the given work's referenced_works section. You can think
of this as outgoing citations.

Get works cited by https://openalex.org/W2766808518:
https://api.openalex.org/works?filter=cited_by:W2766808518

cites

Value: the OpenAlex ID for a given work

Returns: works that cite the given work. This is works that have the given OpenAlex
ID in the referenced_works section. You can think of this as incoming citations.

Get works that cite https://openalex.org/W2741809807:
https://api.openalex.org/works?filter=cites:W2741809807

The number of results returned by this filter may be slightly higher than the work's
cited_by_count due to a timing lag in updating that field.

concepts_count

Value: an Integer

Returns: works with the chosen number of concepts .

Get works with at least three concepts assigned:
https://api.openalex.org/works?filter=concepts_count:>2

default.search

Text search across titles, abstracts, and full text of works

Value: a search string


This works the same as using the search parameter for Works.

display_name.search (alias: title.search )

Text search across titles for works

Value: a search string

Returns: works whose display_name (title) includes the given string; see the
search page for details.

Get works with titles that mention the word "wombat":
https://api.openalex.org/works?filter=title.search:wombat

For most cases, you should use the search parameter instead of this filter, because it
uses a better search algorithm and searches over abstracts as well as titles.

from_created_date

Value: a date, formatted as yyyy-mm-dd

Returns: works with created_date greater than or equal to the given date.

This filter requires an OpenAlex Premium subscription to access. Click here to learn
more.

Get works created on or after January 12th, 2023 (does not work without a valid
API key):
https://api.openalex.org/works?filter=from_created_date:2023-01-12&api_key=myapikey

from_publication_date

Value: a date, formatted as yyyy-mm-dd

Returns: works with publication_date greater than or equal to the given date.
Get works published on or after March 14th, 2001:
https://api.openalex.org/works?filter=from_publication_date:2001-03-14

Filtering by publication date is not a reliable way to retrieve recently updated and
created works, due to the way publishers assign publication dates. Use
from_created_date or from_updated_date to get the latest changes in OpenAlex.

from_updated_date

Value: a date, formatted as an ISO 8601 date or date-time string (for example:
"2020-05-17", "2020-05-17T15:30", or "2020-01-02T00:22:35.180390").

Returns: works with updated_date greater than or equal to the given date.

This filter requires an OpenAlex Premium subscription to access. Click here to learn
more.

Get works updated on or after January 12th, 2023 (does not work without a valid
API key):
https://api.openalex.org/works?filter=from_updated_date:2023-01-12&api_key=myapikey

Learn more about using this filter to get the freshest data possible with our Premium
How-To.
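For a recurring sync job, the from_updated_date URL can be built from the time of your last run. A minimal sketch: the OPENALEX_API_KEY environment variable name is hypothetical, and passing a naive datetime (assumed to be UTC) keeps the value in the "2020-05-17T15:30"-style format accepted above without adding an offset.

```python
import os
import urllib.parse
from datetime import datetime
from typing import Optional

API_BASE = "https://api.openalex.org"


def updated_since_url(since: datetime, api_key: Optional[str] = None) -> str:
    """Build a Premium from_updated_date query; since should be a naive UTC datetime."""
    key = api_key or os.environ.get("OPENALEX_API_KEY")  # hypothetical variable name
    if not key:
        raise ValueError("from_updated_date requires a Premium API key")
    params = {
        "filter": f"from_updated_date:{since.isoformat(timespec='seconds')}",
        "api_key": key,
    }
    return f"{API_BASE}/works?" + urllib.parse.urlencode(params, safe=":")
```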

fulltext.search

Value: a search string

Returns: works whose fulltext includes the given string. Fulltext search is available
for a subset of works, obtained either from PDFs or n-grams; see
Work.has_fulltext for more details.

Get works with fulltext that mention "climate change":
https://api.openalex.org/works?filter=fulltext.search:climate%20change
We combined some n-grams before storing them in our search database, so querying
for an exact phrase using quotes does not always work well.

has_abstract

Works that have an abstract available

Value: a Boolean ( true or false )

Returns: works that have or lack an abstract, depending on the given value.

Get the works that have abstracts:
https://api.openalex.org/works?filter=has_abstract:true

has_doi

Value: a Boolean ( true or false )

Returns: works that have or lack a DOI, depending on the given value. It's especially
useful for grouping.

Get the works that have no DOI assigned:
https://api.openalex.org/works?filter=has_doi:false

has_oa_accepted_or_published_version

Value: a Boolean ( true or false )

Returns: works where at least one location has is_oa = true and a version of
acceptedVersion or publishedVersion. For works that undergo peer review, like
journal articles, this means there is a peer-reviewed OA copy somewhere. For some
items, like books, a published version doesn't imply peer review, so they aren't
quite synonymous.

Get works with an OA accepted or published copy:
https://api.openalex.org/works?filter=has_oa_accepted_or_published_version:true

has_oa_submitted_version

Value: a Boolean ( true or false )

Returns: works where at least one location has is_oa = true and a version of
submittedVersion. This is useful for finding works with preprints deposited
somewhere.

Get works with an OA submitted copy:
https://api.openalex.org/works?filter=has_oa_submitted_version:true

has_orcid

Value: a Boolean ( true or false )

Returns: if true, it returns works where at least one author has an ORCID ID. If
false, it returns works where no authors have an ORCID ID. This is based on the
orcid field within authorships.author. Note that, sometimes, we assign ORCID
using author disambiguation, so this does not necessarily mean that the work itself
has ORCID information.

Get the works where at least one author has an ORCID ID:
https://api.openalex.org/works?filter=has_orcid:true

has_pmcid

Value: a Boolean ( true or false )

Returns: works that have or lack a PubMed Central identifier (pmcid), depending on
the given value.

Get the works that have a pmcid:
https://api.openalex.org/works?filter=has_pmcid:true

has_pmid

Value: a Boolean ( true or false )


Returns: works that have or lack a PubMed identifier (pmid), depending on the
given value.

Get the works that have a pmid:
https://api.openalex.org/works?filter=has_pmid:true

has_ngrams (DEPRECATED)

Works that have n-grams available to enable full-text search in OpenAlex.

This filter has been deprecated. See instead: has_fulltext .

Value: a Boolean ( true or false )

Returns: works for which n-grams are available or unavailable, depending on the
given value. N-grams power fulltext searches through the fulltext.search filter
and the search parameter.

Get the works that have n-grams:
https://api.openalex.org/works?filter=has_ngrams:true

has_references

Value: a Boolean ( true or false )

Returns: works that have or lack referenced_works , depending on the given value.

Get the works that have references:
https://api.openalex.org/works?filter=has_references:true

journal

Value: the OpenAlex ID for a given source, where the source is type: journal

Returns: works where the chosen source ID is the primary_location.source .

locations.source.host_institution_lineage
Value: the OpenAlex ID for an Institution

Returns: works where the given institution ID is in
locations.source.host_organization_lineage

Get the works that have https://openalex.org/I205783295 in their
host_organization_lineage:
https://api.openalex.org/works?filter=locations.source.host_institution_lineage:https://openalex.org/I205783295

locations.source.publisher_lineage

Value: the OpenAlex ID for a Publisher

Returns: works where the given publisher ID is in
locations.source.host_organization_lineage

Get the works that have https://openalex.org/P4310320547 in their
publisher_lineage:
https://api.openalex.org/works?filter=locations.source.publisher_lineage:https://openalex.org/P4310320547

mag_only

Value: a Boolean ( true or false )

Returns: works which came from MAG (Microsoft Academic Graph), and no other
data sources.

MAG was a project by Microsoft Research to catalog all of the scholarly content on
the internet. After it was discontinued in 2021, OpenAlex built upon the data MAG
had accumulated, connecting and expanding it using a variety of other sources.
The methods that MAG used to identify and aggregate scholarly content were quite
different from most of our other sources, and so the content inherited from MAG,
especially works that we did not connect with data from other sources, can look
different from other works. While it's great to have these MAG-only works available,
you may not always want to include them in your results or analyses. This filter
allows you to include or exclude any works that came from MAG and only MAG.

Get all MAG-only works:
https://api.openalex.org/works?filter=mag_only:true

primary_location.source.has_issn

Value: a Boolean ( true or false )

Returns: works where the primary_location has at least one ISSN assigned.

Get the works that have an ISSN within the primary location:
https://api.openalex.org/works?filter=primary_location.source.has_issn:true

primary_location.source.publisher_lineage

Value: the OpenAlex ID for a Publisher

Returns: works where the given publisher ID is in
primary_location.source.host_organization_lineage

Get the works that have https://openalex.org/P4310320547 in their
publisher_lineage:
https://api.openalex.org/works?filter=primary_location.source.publisher_lineage:https://openalex.org/P4310320547

raw_affiliation_strings.search

This filter used to be named raw_affiliation_string.search, but it is now
raw_affiliation_strings.search (i.e., plural, with an 's').

Value: a search string

Returns: works where at least one of the raw_affiliation_strings includes
the given string. See the search page for details on the search algorithm used.

Get works with the words Department of Political Science, University of
Amsterdam somewhere in at least one author's raw_affiliation_strings:
https://api.openalex.org/works?filter=raw_affiliation_strings.search:department%20of%20political%20science%20university%20of%20amsterdam
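All the *.search filters take free text, which has to be percent-encoded in the URL. Note that the raw_affiliation_strings example drops the comma from "Political Science, University": since "," separates filters, the sketch below strips commas too. That is our conservative assumption, not documented behavior, and the function name is hypothetical.

```python
import urllib.parse

API_BASE = "https://api.openalex.org"


def search_filter_url(filter_name: str, text: str) -> str:
    """Build a /works URL for a *.search filter, percent-encoding the search text."""
    if not filter_name.endswith(".search"):
        raise ValueError(f"not a search filter: {filter_name!r}")
    # "," separates filters, so remove commas and collapse the leftover whitespace.
    cleaned = " ".join(text.replace(",", " ").split())
    return f"{API_BASE}/works?filter={filter_name}:{urllib.parse.quote(cleaned)}"
```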

related_to

Value: the OpenAlex ID for a given work

Returns: works found in the given work's related_works section.

Get works related to https://openalex.org/W2486144666:
https://api.openalex.org/works?filter=related_to:W2486144666

repository

Value: the OpenAlex ID for a given source, where the source is type: repository

Returns: works where the chosen source ID exists within the locations .

You can use this to find works where authors are associated with your university,
but the work is not part of the university's repository.👏
Get works that are available in the University of Michigan Deep Blue repository
(OpenAlex ID: https://openalex.org/S4306400393 )
https://api.openalex.org/works?filter=repository:S4306400393

Get works where at least one author is associated with the University of
Michigan, but the works are not found in the University of Michigan Deep Blue
repository
https://api.openalex.org/works?
filter=institutions.id:I27837315,repository:!S4306400393

You can also use this as a group_by to learn things about repositories:

Learn which repositories have the most open access works:
https://api.openalex.org/works?filter=is_oa:true&group_by=repository

title_and_abstract.search

Text search across titles and abstracts for works

Value: a search string

Returns: works whose display_name (title) or abstract includes the given string;
see the search page for details.

Get works with a title or abstract mentioning "gum disease":
https://api.openalex.org/works?filter=title_and_abstract.search:gum%20disease

to_created_date

Value: a date, formatted as yyyy-mm-dd

Returns: works with created_date less than or equal to the given date.

This filter requires an OpenAlex Premium subscription. Click here to learn more.

Get works created on or before January 12th, 2024 (does not work without a valid API key):
https://api.openalex.org/works?filter=to_created_date:2024-01-12&api_key=myapikey

to_publication_date

Value: a date, formatted as yyyy-mm-dd

Returns: works with publication_date less than or equal to the given date.

Get works published on or before March 14th, 2001:
https://api.openalex.org/works?filter=to_publication_date:2001-03-14

to_updated_date

Value: a date, formatted as an ISO 8601 date or date-time string (for example:
"2020-05-17", "2020-05-17T15:30", or "2020-01-02T00:22:35.180390").

Returns: works with updated_date less than or equal to the given date.

This filter requires an OpenAlex Premium subscription. Click here to learn more.

Get works updated on or before January 12th, 2023 (does not work without a valid API key):
https://api.openalex.org/works?filter=to_updated_date:2023-01-12&api_key=myapikey
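Because to_updated_date also accepts full ISO 8601 date-times, you can generate the value from a `datetime` rather than typing it by hand. A minimal Python sketch; `myapikey` is the same placeholder used above, `updated_on_or_before_url` is a hypothetical helper, and the request only succeeds with a real Premium key.

```python
from datetime import datetime
from urllib.parse import urlencode


def updated_on_or_before_url(cutoff: datetime, api_key: str) -> str:
    """Build a /works URL using the Premium-only to_updated_date filter."""
    params = {
        "filter": f"to_updated_date:{cutoff.isoformat()}",
        "api_key": api_key,  # placeholder value; substitute your own key
    }
    # ':' is left unescaped so the filter reads like the examples above.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":")


url = updated_on_or_before_url(datetime(2023, 1, 12, 15, 30), "myapikey")
print(url)
```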

version

Value: a String with value publishedVersion, submittedVersion, acceptedVersion, or null

Returns: works where the chosen version exists within the locations. If null, it returns works where no version is found in any of the locations.

Get works where a published version is available in at least one of the locations:
https://api.openalex.org/works?filter=version:publishedVersion

Search works
The best way to search for works is to use the search query parameter, which
searches across titles, abstracts, and fulltext. Example:

Get works with the search term "dna" in the title, abstract, or fulltext:
https://api.openalex.org/works?search=dna

Fulltext search is available for a subset of works; see Work.has_fulltext for more details.

You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.

Search a specific field


You can use search as a filter, allowing you to fine-tune the fields you're searching
over. To do this, you append .search to the end of the property you are filtering
for:

Get works with "cubist" in the title:
https://api.openalex.org/works?filter=title.search:cubist

The following fields can be searched within works:

Search filter                     Field that is searched
abstract.search                   abstract_inverted_index
display_name.search               display_name
fulltext.search                   fulltext via n-grams
raw_affiliation_strings.search    authorships.raw_affiliation_strings
title.search                      display_name
title_and_abstract.search         display_name and abstract_inverted_index

You can also use the filter default.search , which works the same as using the
search parameter.

These searches make use of stemming and stop-word removal. You can disable
this for searches on titles and abstracts. Learn how to do this here.

Why can't I search by name of related entity (author name, institution name, etc.)?
Rather than searching for the names of entities related to works, such as authors, institutions, and sources, you need to search by a more unique identifier for that entity, like the OpenAlex ID. This means that there is a two-step process:

1. Find the ID of the related entity. For example, if you're interested in works associated with NYU, you could search the /institutions endpoint for that name (https://api.openalex.org/institutions?search=nyu). Looking at the first result, you'll see that the OpenAlex ID for NYU is I57206974.
2. Use a filter with the /works endpoint to get all of the works: https://api.openalex.org/works?filter=institutions.id:I57206974

Why can't you do this in just one step? Well, if you use the search term "NYU," you might miss the works that use the full name "New York University" rather than the initials. Sure, you could try to think of all possible variants and search for all of them, but you might miss some, and you risk adding search terms that let in works you're not interested in. Figuring out which works are actually associated with the "NYU" you're interested in shouldn't be your responsibility; that's our job! We've done that work for you, so all the relevant works should be associated with one unique ID.
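The two-step process can be scripted. This Python sketch works on a trimmed, hypothetical copy of an /institutions search response (only the fields it reads are shown) so it runs offline; like step 1 above, it takes the first result, which you should check is really the institution you meant before relying on it. The function names are illustrative.

```python
import json

# Hypothetical, trimmed response from https://api.openalex.org/institutions?search=nyu
# (only the fields this sketch reads are included).
search_response = json.loads("""
{
  "results": [
    {"id": "https://openalex.org/I57206974", "display_name": "New York University"}
  ]
}
""")


def first_institution_id(response: dict) -> str:
    """Step 1: take the top search result and keep only the short ID."""
    return response["results"][0]["id"].rsplit("/", 1)[-1]


def works_by_institution_url(institution_id: str) -> str:
    """Step 2: filter /works on the unique institution ID, not on a name."""
    return f"https://api.openalex.org/works?filter=institutions.id:{institution_id}"


inst_id = first_institution_id(search_response)
print(works_by_institution_url(inst_id))
```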
Autocomplete works
You can autocomplete works to create a very fast type-ahead style search function:

Autocomplete works with "tigers" in the title:
https://api.openalex.org/autocomplete/works?q=tigers

This returns a list of work titles, with the authors of each work set as the hint:

{
"results": [
{
"id": "https://openalex.org/W2125098916",
"display_name": "Crouching tigers, hidden prey: Sumatran tiger and
"hint": "Timothy G. O'Brien, Margaret F. Kinnaird, Hariyo T. Wibiso
"cited_by_count": 620,
"works_count": null,
"entity_type": "work",
"external_id": "https://doi.org/10.1017/s1367943003003172"
},
...
]
}

Read more about autocomplete.
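For a type-ahead box, you typically rebuild the autocomplete URL on each keystroke and show each result's display_name next to its hint. A minimal Python sketch; the sample response is hypothetical and trimmed to the fields used, the helper names are illustrative, and treating a missing or null hint as optional is an assumption.

```python
from urllib.parse import urlencode


def autocomplete_url(entity: str, typed: str) -> str:
    """Autocomplete URL for an entity type; `q` carries what the user has typed."""
    return f"https://api.openalex.org/autocomplete/{entity}?" + urlencode({"q": typed})


def suggestion_labels(response: dict) -> list[str]:
    """Format each result as 'display_name (hint)', omitting empty hints."""
    labels = []
    for result in response.get("results", []):
        hint = result.get("hint")
        labels.append(f"{result['display_name']} ({hint})" if hint else result["display_name"])
    return labels


sample = {  # hypothetical, trimmed response
    "results": [
        {"display_name": "Tiger ecology", "hint": "A. Author, B. Author"},
        {"display_name": "Tigers in the wild", "hint": None},
    ]
}
print(autocomplete_url("works", "tigers"))
print(suggestion_labels(sample))
```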


Group works
You can group works with the group_by parameter:

Get counts of works by Open Access status:
https://api.openalex.org/works?group_by=oa_status

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

/works group_by attributes

The host_venue and alternate_host_venues properties have been deprecated in favor of primary_location and locations. The attributes host_venue and alternate_host_venues are no longer available in the Work object, and trying to access them in filters or group-bys will return an error.

authors_count
authorships.affiliations.institution_ids
authorships.author.id (alias: author.id)
authorships.author.orcid (alias: author.orcid)
authorships.countries
authorships.institutions.country_code (alias: institutions.country_code)
authorships.institutions.continent (alias: institutions.continent)
authorships.institutions.is_global_south
authorships.institutions.id (alias: institutions.id)
authorships.institutions.lineage
authorships.institutions.ror (alias: institutions.ror)
authorships.institutions.type (alias: institutions.type)
authorships.is_corresponding (alias: is_corresponding): marks whether or not we have corresponding author information for a given work
apc_list.value
apc_list.currency
apc_list.provenance
apc_list.value_usd
apc_paid.value
apc_paid.currency
apc_paid.provenance
apc_paid.value_usd
best_oa_location.is_accepted
best_oa_location.is_published
best_oa_location.license
best_oa_location.source.host_organization
best_oa_location.source.id
best_oa_location.source.is_in_doaj
best_oa_location.source.issn
best_oa_location.source.type
best_oa_location.version
best_open_version
biblio.first_page
biblio.issue
biblio.last_page
biblio.volume
cited_by_count
cites
concepts_count
concepts.id
concepts.wikidata
corresponding_author_ids
corresponding_institution_ids
countries_distinct_count
fulltext_origin
grants.award_id
grants.funder
has_abstract
has_doi
has_fulltext
has_orcid
has_pmid
has_pmcid
has_ngrams (DEPRECATED)
has_references
indexed_in
is_retracted
is_paratext
journal
keywords.keyword
language
locations.is_accepted
locations.is_published
locations.source.host_institutions_lineage
locations.source.is_core
locations.source.is_in_doaj
locations.source.publisher_lineage
locations_count
mag_only
open_access.any_repository_has_fulltext
open_access.is_oa (alias: is_oa)
open_access.oa_status (alias: oa_status)
primary_location.is_accepted
primary_location.is_oa
primary_location.is_published
primary_location.license
primary_location.source.has_issn
primary_location.source.host_organization
primary_location.source.id
primary_location.source.is_core
primary_location.source.is_in_doaj
primary_location.source.issn
primary_location.source.publisher_lineage
primary_location.source.type
primary_location.version
primary_topic.id
primary_topic.domain.id
primary_topic.field.id
primary_topic.subfield.id
publication_year
repository
sustainable_development_goals.id
topics.id
topics.domain.id
topics.field.id
topics.subfield.id
type
type_crossref
Get N-grams
N-grams are groups of sequential words that occur in the text of a Work.

N-grams list the words and phrases that occur in the full text of a Work. We obtain them from Internet Archive's publicly (and generously 👏) available General Index and use them to enable fulltext searches on the Works that have them, through both the fulltext.search filter and the more holistic search parameter.

Note that while n-grams are derived from the fulltext of a Work, the presence of n-grams for a given Work doesn't imply that the fulltext is available to you, the reader. It only means the fulltext was available to Internet Archive for indexing. Work.open_access is the place to go for information on public fulltext availability.

API Endpoint
The n-gram API endpoint is not currently in service. The n-grams are still used on our
backend to help power fulltext search. If you have any questions about this, please
submit a support ticket.

Fulltext Coverage
You can see which works we have full text for using the has_fulltext filter. This does not necessarily mean that the full text is available to you, dear reader; rather, it means that we have indexed the full text and can use it to help power searches. If you are trying to find the full text for yourself, try looking in open_access.oa_url.

We get access to the full text in one of two ways: either using an open-access PDF, or using n-grams obtained from the Internet Archive. You can learn where a work's full text came from at fulltext_origin.
About 57 million works have n-gram coverage through Internet Archive. OurResearch is the first organization to host this data in a highly usable way, and we are proud to integrate it into OpenAlex!

Highly cited works and older works are more likely to have n-grams, as shown by the coverage charts below.
Authors
People who create works

Authors are people who create works. You can get authors from the API like this:

Get a list of OpenAlex authors:
https://api.openalex.org/authors

The Canonical External ID for authors is ORCID; only a small percentage of authors have one, but the percentage is higher for more recent works.

Our information about authors comes from MAG, Crossref, PubMed, ORCID, and publisher websites, among other sources. To learn more about how we combine this information to get OpenAlex Authors, see Author disambiguation.

Authors are linked to works via the works.authorships property.

What's next
Learn more about what you can do with authors:

The Author object
Get a single author
Get lists of authors
Filter authors
Search authors
Group authors
Author object
When you use the API to get a single author or lists of authors, this is what's
returned.

affiliations

List: List of objects representing the affiliations this author has claimed in their publications. Each object in the list has two properties:

institution: a dehydrated Institution object
years: a list of the years in which this author claimed an affiliation with this institution

affiliations: [
{
institution: {
id: "https://openalex.org/I201448701",
ror: "https://ror.org/00cvxb145",
...
},
years: [2018, 2019, 2020]
},
{
institution: {
id: "https://openalex.org/I74973139",
ror: "https://ror.org/05x2bcf33",
...
},
years: [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
}
]

cited_by_count

Integer: The total number of Works that cite a work this author has created.

cited_by_count: 38

counts_by_year

List: Author.works_count and Author.cited_by_count for each of the last ten years, binned by year. To put it another way: for each year, you can see how many works this author published and how many citations their works received.

Any works or citations older than ten years aren't included. Years with zero works and zero citations have been removed, so you will need to add those back in if you need them.

counts_by_year: [
{
year: 2022,
works_count: 0,
cited_by_count: 8
},
{
year: 2021,
works_count: 1,
cited_by_count: 252
},
...
{
year: 2012,
works_count: 7,
cited_by_count: 79
}
]
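Since years with zero works and zero citations are dropped, a chart or time series usually needs them restored. A minimal Python sketch that fills the gaps with zeros; the input is the three entries visible in the example above (the elided middle years are treated as missing), and `fill_counts_by_year` is a hypothetical name.

```python
def fill_counts_by_year(counts_by_year: list[dict], first_year: int, last_year: int) -> list[dict]:
    """Return one entry per year from last_year down to first_year.

    Years absent from counts_by_year are added with works_count and
    cited_by_count set to 0, matching how OpenAlex omits all-zero years.
    """
    by_year = {entry["year"]: entry for entry in counts_by_year}
    return [
        by_year.get(year, {"year": year, "works_count": 0, "cited_by_count": 0})
        for year in range(last_year, first_year - 1, -1)
    ]


counts = [
    {"year": 2022, "works_count": 0, "cited_by_count": 8},
    {"year": 2021, "works_count": 1, "cited_by_count": 252},
    {"year": 2012, "works_count": 7, "cited_by_count": 79},
]
filled = fill_counts_by_year(counts, 2012, 2022)
print(len(filled), filled[2])
```

Newest-first ordering mirrors the order of the API example; reverse the list if your plotting code expects ascending years.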

created_date

String: The date this Author object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.

created_date: "2017-08-08"

display_name
String: The name of the author as a single string.

display_name: "Jason Priem"

display_name_alternatives

List: Other ways that we've found this author's name displayed.

display_name_alternatives: [
"Jason R Priem"
]

id

String: The OpenAlex ID for this author.

id: "https://openalex.org/A5023888391"

ids

Object: All the external identifiers that we know about for this author. IDs are
expressed as URIs whenever possible. Possible ID types:

openalex (String: this author's OpenAlex ID. Same as Author.id)
orcid (String: this author's ORCID ID. Same as Author.orcid)
scopus (String: this author's Scopus author ID)
twitter (String: this author's Twitter handle)
wikipedia (String: this author's Wikipedia page)

Most authors are missing one or more ID types (either because we don't know the ID, or because it was never assigned). Keys for null IDs are not displayed.

ids: {
openalex: "https://openalex.org/A5023888391",
orcid: "https://orcid.org/0000-0001-6187-6610",
scopus: "http://www.scopus.com/inward/authorDetails.url?authorID=3645
},

last_known_institution (deprecated)

This field has been deprecated. Its replacement is last_known_institutions .

last_known_institutions

List: List of Institution objects. This author's last known institutional affiliations. In
this context "last known" means that we took all the author's Works, sorted them by
publication date, and selected the most recent one. If there is only one affiliated
institution for this author for the work, this will be a list of length 1; if there are
multiple affiliations, they will all be included in the list.

Each item in the list is a dehydrated Institution object, and you can find more
documentation on the Institution page.

last_known_institutions: [{
id: "https://openalex.org/I4200000001",
ror: "https://ror.org/02nr0ka47",
display_name: "OurResearch",
country_code: "CA",
type: "nonprofit",
lineage: ["https://openalex.org/I4200000001"]
}],

orcid

String: The ORCID ID for this author. ORCID is a global and unique ID for authors.
This is the Canonical external ID for authors.
Compared to other Canonical IDs, ORCID coverage is relatively low in OpenAlex,
because ORCID adoption in the wild has been slow compared with DOI, for example.
This is particularly an issue when dealing with older works and authors.

orcid: "https://orcid.org/0000-0001-6187-6610"

summary_stats

Object: Citation metrics for this author

2yr_mean_citedness Float: The 2-year mean citedness for this author. Also known as impact factor. We use the year prior to the current year for the citations (the numerator) and the two years prior to that for the citation-receiving publications (the denominator).
h_index Integer: The h-index for this author.
i10_index Integer: The i10-index for this author.

While the 2-year mean citedness is normally a journal-level metric, it can be calculated for any set of papers, so we include it for authors.

summary_stats: {
2yr_mean_citedness: 1.5295340589458237,
h_index: 45,
i10_index: 205
}
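To make the three definitions concrete, here is a Python sketch that computes them from per-work data. The h-index and i10-index follow their standard definitions; the 2-year mean citedness follows the description above (citations received in the year before the current year, by works published in the two years before that). The input shapes and function names are hypothetical, and this is not OpenAlex's own code, so results may differ from the values OpenAlex reports.

```python
from typing import Optional


def h_index(citation_counts: list[int]) -> int:
    """Largest h such that h works each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def i10_index(citation_counts: list[int]) -> int:
    """Number of works with at least 10 citations."""
    return sum(1 for cites in citation_counts if cites >= 10)


def two_year_mean_citedness(works: list[dict], current_year: int) -> Optional[float]:
    """Mean citations received in current_year - 1 by works published in
    current_year - 3 or current_year - 2.

    Each work is a hypothetical dict: {"year": int, "cites_by_year": {int: int}}.
    Returns None when no works fall in the two-year window.
    """
    citing_year = current_year - 1
    window = {current_year - 3, current_year - 2}
    eligible = [w for w in works if w["year"] in window]
    if not eligible:
        return None
    total = sum(w["cites_by_year"].get(citing_year, 0) for w in eligible)
    return total / len(eligible)


works = [
    {"year": 2023, "cites_by_year": {2024: 3}},
    {"year": 2022, "cites_by_year": {2024: 1, 2023: 5}},
    {"year": 2020, "cites_by_year": {2024: 10}},  # outside the window
]
print(h_index([25, 8, 5, 3, 3, 0]), i10_index([25, 8, 5, 3, 3, 0]))
print(two_year_mean_citedness(works, current_year=2025))
```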

updated_date

String: The last time anything in this author object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.

updated_date: "2022-01-02T00:00:00"
works_api_url

String: A URL that will get you a list of all this author's works.

We express this as an API URL (instead of just listing the works themselves)
because sometimes an author's publication list is too long to reasonably fit into a
single author object.

works_api_url: "https://api.openalex.org/works?filter=author.id:A50238883

works_count

Integer: The number of Works this author has created.

works_count: 38

This is updated a couple of times per day, so the count may differ slightly from the number of works you get from works_api_url.

x_concepts

x_concepts will be deprecated and removed soon. We will be replacing this functionality with Topics instead.

List: The concepts most frequently applied to works created by this author. Each is represented as a dehydrated Concept object, with one additional attribute:

score (Float): The strength of association between this author and the listed concept, from 0-100.
x_concepts: [
{
id: "https://openalex.org/C41008148",
wikidata: null,
display_name: "Computer science",
level: 0,
score: 97.4
},
{
id: "https://openalex.org/C17744445",
wikidata: null,
display_name: "Political science",
level: 0,
score: 78.9
}
]

The Dehydrated Author object


The DehydratedAuthor is a stripped-down Author object, with most of its properties removed to save weight. Its only remaining properties are:

id

display_name

orcid
Get a single author
It's easy to get an author from the API with /authors/<entity_id>. Here's an example:

Get the author with the OpenAlex ID A5023888391:
https://api.openalex.org/authors/A5023888391

That will return an Author object, describing everything OpenAlex knows about
the author with that ID:

{
"id": "https://openalex.org/A5023888391",
"orcid": "https://orcid.org/0000-0001-6187-6610",
"display_name": "Jason Priem",
"display_name_alternatives": [],
"works_count": 53,
// other fields removed for brevity
}

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
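A sketch of that batching in Python, assuming the openalex filter (listed under the /authors attribute filters as an alias of ids.openalex) and OpenAlex's OR syntax, which separates values inside one filter with `|`; the batch size of 50 follows the limit stated above. `batched_author_urls` is a hypothetical helper, and the IDs in the example are made up.

```python
from urllib.parse import urlencode


def batched_author_urls(author_ids: list[str], batch_size: int = 50) -> list[str]:
    """Return one /authors list URL per batch of at most batch_size IDs.

    Values inside a single filter are OR'ed with "|", and per-page is set to
    the batch length so each batch fits on one page of results.
    """
    urls = []
    for start in range(0, len(author_ids), batch_size):
        batch = author_ids[start:start + batch_size]
        params = {"filter": "openalex:" + "|".join(batch), "per-page": len(batch)}
        urls.append("https://api.openalex.org/authors?" + urlencode(params, safe=":|"))
    return urls


ids = [f"A{5000000000 + n}" for n in range(120)]  # made-up IDs for illustration
urls = batched_author_urls(ids)
print(len(urls), urls[-1][-11:])
```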

Authors are also available via an alias: /people

External IDs
You can look up authors using external IDs such as an ORCID:

Get the author with this ORCID, https://orcid.org/0000-0002-1298-3089:
https://api.openalex.org/authors/https://orcid.org/0000-0002-1298-3089

You can use the full ID or a shorter Uniform Resource Name (URN) format like so:
https://api.openalex.org/authors/orcid:0000-0002-1298-3089
Available external IDs for authors are:

External ID    URN
ORCID          orcid
Scopus         scopus
Twitter        twitter
Wikipedia      wikipedia
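If you already hold full ORCID URLs, you can strip the prefix to use the shorter URN form. A minimal Python sketch; the helper names are hypothetical, and the allowed URN prefixes are the four in the table above.

```python
AUTHOR_URN_TYPES = {"orcid", "scopus", "twitter", "wikipedia"}


def author_lookup_url(urn_type: str, value: str) -> str:
    """Single-author URL in URN form, e.g. /authors/orcid:0000-0002-1298-3089."""
    if urn_type not in AUTHOR_URN_TYPES:
        raise ValueError(f"unsupported author external ID type: {urn_type!r}")
    return f"https://api.openalex.org/authors/{urn_type}:{value}"


def orcid_digits(orcid_url: str) -> str:
    """Drop the https://orcid.org/ prefix (if present) from a full ORCID."""
    return orcid_url.removeprefix("https://orcid.org/")


print(author_lookup_url("orcid", orcid_digits("https://orcid.org/0000-0002-1298-3089")))
```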

Select fields
You can use select to limit the fields that are returned in an author object. More
details are here.

Display only the id, display_name, and orcid fields for an author object:
https://api.openalex.org/authors/A5023888391?select=id,display_name,orcid
Get lists of authors
You can get lists of authors:

Get all authors in OpenAlex:
https://api.openalex.org/authors

Which returns a response like this:

{
"meta": {
"count": 93011659,
"db_response_time_ms": 150,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/A5053780153",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/A5032245741",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}

Page and sort authors

By default we return 25 results per page. You can change this default and page through authors with the per-page and page parameters:

Get the second page of authors results, with 50 results returned per page:
https://api.openalex.org/authors?per-page=50&page=2

You can also sort results with the sort parameter:

Sort authors by cited by count, descending:
https://api.openalex.org/authors?sort=cited_by_count:desc
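To walk through every page of a result, you can compute the page URLs from meta.count and per-page. A minimal Python sketch; `page_urls` is a hypothetical helper, and the 10,000-result cap reflects the OpenAlex paging guide's limit on basic (page-number) paging, beyond which cursor paging is needed.

```python
import math
from urllib.parse import urlencode


def page_urls(total_count: int, per_page: int = 50, sort: str = "cited_by_count:desc",
              max_results: int = 10_000) -> list[str]:
    """URLs for every page of /authors results, up to max_results items.

    total_count would normally come from the response's meta.count.
    """
    reachable = min(total_count, max_results)
    pages = math.ceil(reachable / per_page)
    return [
        "https://api.openalex.org/authors?"
        + urlencode({"per-page": per_page, "page": page, "sort": sort}, safe=":")
        for page in range(1, pages + 1)
    ]


urls = page_urls(total_count=120)
print(urls[1])
```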

Continue on to learn how you can filter and search lists of authors.

Sample authors
You can use sample to get a random batch of authors. Read more about sampling
and how to add a seed value here.

Get 25 random authors:
https://api.openalex.org/authors?sample=25

Select fields
You can use select to limit the fields that are returned in a list of authors. More
details are here.

Display only the id, display_name, and orcid fields within authors results:
https://api.openalex.org/authors?select=id,display_name,orcid
Filter authors
You can filter authors with the filter parameter:

Get authors that have an ORCID:
https://api.openalex.org/authors?filter=has_orcid:true

It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.

/authors attribute filters


You can filter using these attributes of the Author entity object (click each one to
view their documentation on the Author object page):

affiliations.institution.country_code

affiliations.institution.id

affiliations.institution.lineage

affiliations.institution.ror

affiliations.institution.type

cited_by_count

ids.openalex (alias: openalex )


last_known_institution.country_code

last_known_institution.id

last_known_institution.lineage

last_known_institution.ror

last_known_institution.type

orcid

scopus (the author's Scopus ID, as an integer)

summary_stats.2yr_mean_citedness (accepts a float, null, or !null; supports range queries such as < and >)

summary_stats.h_index (accepts an integer, null, or !null; supports range queries)

summary_stats.i10_index (accepts an integer, null, or !null; supports range queries)

works_count

x_concepts.id (alias: concepts.id or concept.id); will be deprecated soon

Want to filter by last_known_institution.display_name? This is a two-step process:

1. Find the institution.id by searching institutions by display_name.
2. Filter authors by last_known_institution.id.

To learn more about why we do it this way, see here.

/authors convenience filters


These filters aren't attributes of the Author object, but they're included to address
some common use cases:

default.search

Value: a search string

This works the same as using the search parameter for Authors.

display_name.search

Value: a search string

Returns: Authors whose display_name contains the given string; see the search
filter for details.

Get authors named "tupolev":
https://api.openalex.org/authors?filter=display_name.search:tupolev

has_orcid
Value: a Boolean ( true or false )

Returns: authors that have or lack an orcid, depending on the given value.

Get the authors that have an ORCID:
https://api.openalex.org/authors?filter=has_orcid:true

last_known_institution.continent

Value: a String with a valid continent filter

Returns: authors whose last known institution is in the chosen continent.

Get authors whose last known institution is located in Africa:
https://api.openalex.org/authors?filter=last_known_institution.continent:africa

last_known_institution.is_global_south

Value: a Boolean ( true or false )

Returns: authors where at least one of their last known institutions is in the Global South.

Get authors whose last known institution is located in the Global South:
https://api.openalex.org/authors?filter=last_known_institution.is_global_south:true
Search authors
The best way to search for authors is to use the search query parameter, which searches the display_name and the display_name_alternatives fields. Example:

Get authors with the name "Carl Sagan":
https://api.openalex.org/authors?search=carl%20sagan

Searching without a middle initial returns names with and without middle initials. So
a search for "John Smith" will also return "John W. Smith".

Names with diacritics are flexible as well. So a search for David Tarrago can return
David Tarragó, and a search for David Tarragó can return David Tarrago. When
searching with a diacritic, diacritic versions of the names are prioritized in order to
honor the original form of the author's name. Read more about our handling of
diacritics here.

You can read more in the search page in the API Guide. It will show you how relevance
score is calculated, how words are stemmed to improve search results, and how to do
complex boolean searches.

Search a specific field


You can also use search as a filter, by appending .search to the end of the
property you are filtering for:

Get authors with the name "john smith" in the display_name:
https://api.openalex.org/authors?filter=display_name.search:john%20smith

When searching for authors, there is no difference between using the search parameter and the filter display_name.search, since display_name is the only field searched when finding authors.
Search filter          Field that is searched
display_name.search    display_name

You can also use the filter default.search , which works the same as using the
search parameter.

Autocomplete authors
You can autocomplete authors to create a very fast type-ahead style search
function:

Autocomplete authors with "ronald sw" in the display name:
https://api.openalex.org/autocomplete/authors?q=ronald%20sw

This returns a list of authors with their last known affiliated institution as the hint:

{
"results": [
{
"id": "https://openalex.org/A5007433649",
"display_name": "Ronald Swanstrom",
"hint": "University of North Carolina at Chapel Hill, USA",
"cited_by_count": 19142,
"works_count": 339,
"entity_type": "author",
"external_id": "https://orcid.org/0000-0001-7777-0773",
"filter_key": "authorships.author.id"
},
...
]
}

Read more about autocomplete.


Group authors
You can group authors with the group_by parameter:

Get counts of authors by the last known institution continent:
https://api.openalex.org/authors?group_by=last_known_institution.continent

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

/authors group_by attributes


affiliations.institution.country_code

affiliations.institution.id

affiliations.institution.lineage

affiliations.institution.ror

affiliations.institution.type

cited_by_count

has_orcid

last_known_institution.continent

last_known_institution.country_code

last_known_institution.id

last_known_institution.is_global_south

last_known_institution.lineage

last_known_institution.ror

last_known_institution.type

summary_stats.2yr_mean_citedness

summary_stats.h_index

summary_stats.i10_index

works_count
Limitations
Works with more than 100 authors are truncated
When retrieving a list of works in the API, the authorships list within each work will be cut off at 100 authorships objects in order to keep things running well. When this happens, the boolean value is_authors_truncated will be available and set to true. This affects a small portion of OpenAlex, as there are around 35,000 works with more than 100 authors. This limitation does not apply to the data snapshot.

Example list of works with truncated authors:
https://api.openalex.org/works?filter=authors_count:>100

To see the full list of authors, go to the individual record for the work, which is never truncated.

Work with all 249 authors available:
https://api.openalex.org/works/W2168909179

This affects filtering as well: if you filter works using an author ID or ROR, you will not receive works where that author is listed beyond the 100th position in the list of authors. We plan to change this in the future, so that filtering works as expected.
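When processing list results, one way to handle truncation is to collect the affected works and re-fetch each one individually. A Python sketch over hypothetical, trimmed work records; the helper names are illustrative, and it checks is_authors_truncated with `.get()` because this section only promises the flag when truncation happens.

```python
def truncated_work_ids(works: list[dict]) -> list[str]:
    """IDs of works whose authorships list was cut off at 100 entries."""
    return [work["id"] for work in works if work.get("is_authors_truncated")]


def single_work_url(work_id: str) -> str:
    """URL for the individual (never truncated) record of a work."""
    return "https://api.openalex.org/works/" + work_id.rsplit("/", 1)[-1]


page = [  # hypothetical, trimmed list results
    {"id": "https://openalex.org/W2168909179", "is_authors_truncated": True},
    {"id": "https://openalex.org/W2125098916"},
]
to_refetch = [single_work_url(i) for i in truncated_work_ids(page)]
print(to_refetch)
```

Note that re-fetching individual records does not help with the filtering limitation described above; it only recovers the full author list for works you already have.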
Author disambiguation
Our information about authors comes from MAG, Crossref, PubMed, ORCID, and
publisher websites. We use an algorithm to disambiguate authors; this uses an
author’s name, their publication record, their citation patterns, and (where available)
their ORCID.

So, for example, if J. Schmidt and John Jacob Jingleheimer Schmidt both write about 19th-century ketchup production, we'll treat them as one author, but we won't include the JJJ Schmidt who writes about weasel migration (even though his name is their name, too).

Our methods, code, and models are all, of course, fully open. You can find technical documentation on the author disambiguation model on GitHub here. You will also find code and links to training data there.

In late July 2023, we switched to a new, more accurate author disambiguation system, with a better machine-learning model to identify authors, a smarter strategy for author assignments for new works, and much better integration with ORCID data when it is available. As part of that switch, we deprecated all of the old OpenAlex Author IDs and assigned new Author IDs to all authors. You can find the old Author IDs, along with their associated works, as a data dump here. New Author IDs have a numeric component greater than 5000000000 (for example, A5023888391). The new Author IDs have been used since late July 2023, and in the data snapshots starting in August 2023.
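The numeric threshold gives a quick way to tell whether a stored author ID predates the July 2023 switch. A Python sketch; `is_current_author_id` is a hypothetical helper, `A2208157607` is a made-up old-style ID, and note that the internal "null" author A9999999999 (described below) also passes this test.

```python
import re

_AUTHOR_ID_PATTERN = re.compile(r"A(\d+)$")


def is_current_author_id(author_id: str) -> bool:
    """True if the ID's numeric part is above 5000000000 (post-July-2023 scheme).

    Accepts short IDs ("A5023888391") or full URLs ("https://openalex.org/A...").
    """
    match = _AUTHOR_ID_PATTERN.search(author_id)
    if match is None:
        raise ValueError(f"not an OpenAlex author ID: {author_id!r}")
    return int(match.group(1)) > 5_000_000_000


print(is_current_author_id("https://openalex.org/A5023888391"))
print(is_current_author_id("A2208157607"))  # made-up old-style ID
```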

The "null" Author ID


You may come across an OpenAlex Author with ID A9999999999 , particularly if you
are using the data snapshot. We use this author ID internally within the
disambiguation system as our "null author". It is assigned to all authorships that do
not go through disambiguation. Usually, this is because we did not receive an
author name for that authorship, the name was too short to disambiguate, or it was
a phrase we have specifically called out to ignore in our disambiguation process
(for example, "Unknown Unknown" or "Unknown Author").
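If you are working with the data snapshot, you may want to set these authorships aside before counting authors. A minimal Python sketch over a hypothetical, trimmed work record; `disambiguated_authorships` is an illustrative name, and it compares only the trailing ID so that both short and full-URL forms match.

```python
NULL_AUTHOR = "A9999999999"


def disambiguated_authorships(work: dict) -> list[dict]:
    """Authorships whose author is not the internal "null author"."""
    kept = []
    for authorship in work.get("authorships", []):
        author_id = (authorship.get("author") or {}).get("id") or ""
        if author_id.rsplit("/", 1)[-1] != NULL_AUTHOR:
            kept.append(authorship)
    return kept


work = {  # hypothetical, trimmed snapshot record
    "authorships": [
        {"author": {"id": "https://openalex.org/A5023888391", "display_name": "Jason Priem"}},
        {"author": {"id": "https://openalex.org/A9999999999", "display_name": "Unknown Author"}},
    ]
}
print(len(disambiguated_authorships(work)))
```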
Sources
Journals and repositories that host works

Sources are where works are hosted. OpenAlex indexes about 249,000 sources. There are several types, including journals, conferences, preprint repositories, and institutional repositories.

Get a list of OpenAlex sources:
https://api.openalex.org/sources

The Canonical External ID for sources is ISSN-L, which is a special "main" ISSN assigned to every source (sources tend to have multiple ISSNs). About 90% of sources in OpenAlex have an ISSN-L or ISSN.

Our information about sources comes from Crossref, the ISSN Network, and MAG.
These datasets are joined automatically where possible, but there’s also a lot of
manual combining involved. We do not curate journals, so any journal that is
available in the data sources should make its way into OpenAlex.

Several sources may host the same work. OpenAlex reports both the primary host
source (generally wherever the version of record lives), and alternate host sources
(like preprint repositories).

Sources are linked to works via the works.primary_location and works.locations properties.

Check out the Japanese Sources tutorial, a Jupyter notebook showing how to use
Python and the API to learn about all of the sources in a country.

What's next
Learn more about what you can do with sources:

The Source object
Get a single source
Get lists of sources
Filter sources
Search sources
Group sources
Source object
These are the fields in a source object. When you use the API to get a single source
or lists of sources, this is what's returned.

abbreviated_title
String: An abbreviated title obtained from the ISSN Centre.

abbreviated_title: "J. addict. med. ther. sci."

alternate_titles
Array: Alternate titles for this source, as obtained from the ISSN Centre and
individual work records, like Crossref DOIs, that carry the source name as a string.
These are commonly abbreviations or translations of the source's canonical name.

alternate_titles: [
"ACRJ"
]

apc_prices
List: List of objects, each with price (Integer) and currency (String).

Article processing charge information, taken directly from DOAJ.

apc_prices: [
{
price: 3920,
currency: "GBP"
}
]
apc_usd
Integer: The source's article processing charge in US Dollars, if available from
DOAJ.

The apc_usd value is calculated by taking the APC price (see apc_prices) with a
currency of USD if it is available. If it's not available, we convert the first available
value from apc_prices into USD, using recent exchange rates.

apc_usd: 5200
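You can mirror the apc_usd rule locally, for example to recompute values with your own exchange rates. The function name below is hypothetical, and you supply the exchange-rate table yourself. The documentation says only "recent exchange rates" and does not state which rates OpenAlex uses.

```python
from typing import Dict, List, Optional


def estimate_apc_usd(apc_prices: List[dict], usd_per_unit: Dict[str, float]) -> Optional[int]:
    """Prefer an explicit USD price; otherwise convert the first listed price.

    usd_per_unit maps a currency code to US dollars per unit, e.g. {"GBP": 1.25}.
    These rates are caller-supplied assumptions.
    """
    if not apc_prices:
        return None  # no DOAJ APC data
    for entry in apc_prices:
        if entry["currency"] == "USD":
            return entry["price"]
    first = apc_prices[0]
    rate = usd_per_unit[first["currency"]]  # raises KeyError if no rate is given
    return round(first["price"] * rate)


print(estimate_apc_usd([{"price": 3920, "currency": "GBP"}], {"GBP": 1.25}))  # 4900
```

The 1.25 GBP-to-USD rate is a made-up value for illustration.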

cited_by_count

Integer: The total number of Works that cite a Work hosted in this source.

cited_by_count: 133702

country_code

String: The country that this source is associated with, represented as an ISO two-
letter country code.

country_code: "GB"

counts_by_year

List: works_count and cited_by_count for each of the last ten years, binned by
year. To put it another way: each year, you can see how many new works this
source started hosting, and how many times any work in this source got cited.

If the source was founded less than ten years ago, there will naturally be fewer than
ten years in this list. Years with zero citations and zero works have been removed
so you will need to add those in if you need them.

counts_by_year: [
{
year: 2021,
works_count: 4338,
cited_by_count: 127268
},
{
year: 2020,
works_count: 4363,
cited_by_count: 119531
},

// and so forth
]
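Because zero-activity years are omitted, a time series built from counts_by_year can have gaps. The sketch below fills those gaps with zeros, listing the newest year first to match the example above. The function name is hypothetical, and the caller chooses the year range. The example treats the two rows shown above as the whole list.

```python
from typing import List


def fill_missing_years(counts_by_year: List[dict], first_year: int, last_year: int) -> List[dict]:
    """Return one row per year from last_year down to first_year, adding zero rows for gaps."""
    by_year = {row["year"]: row for row in counts_by_year}
    return [
        by_year.get(year, {"year": year, "works_count": 0, "cited_by_count": 0})
        for year in range(last_year, first_year - 1, -1)
    ]


rows = [
    {"year": 2021, "works_count": 4338, "cited_by_count": 127268},
    {"year": 2020, "works_count": 4363, "cited_by_count": 119531},
]
for row in fill_missing_years(rows, 2019, 2021):
    print(row)
# {'year': 2021, 'works_count': 4338, 'cited_by_count': 127268}
# {'year': 2020, 'works_count': 4363, 'cited_by_count': 119531}
# {'year': 2019, 'works_count': 0, 'cited_by_count': 0}
```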

created_date

String: The date this Source object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.

created_date: "2017-08-08"

display_name

String: The name of the source.

display_name: "PeerJ"

homepage_url

String: The starting page for navigating the contents of this source; the homepage
for this source's website.

homepage_url: "http://www.peerj.com/"
host_organization

String: The host organization for this source as an OpenAlex ID. This will be an
Institution.id if the source is a repository, and a Publisher.id if the source is
a journal, conference, or eBook platform (based on the type field).

host_organization: "https://openalex.org/P4310320595"

host_organization_lineage

List: OpenAlex IDs — See Publisher.lineage . This will only be included if the
host_organization is a publisher (and not if the host_organization is an
institution).

host_organization_lineage: [
"https://openalex.org/P4310321285",
"https://openalex.org/P4310319900",
"https://openalex.org/P4310319965"
]

host_organization_name

String: The display_name from the host_organization, shown for convenience.

host_organization_name: "Elsevier BV"

id

String: The OpenAlex ID for this source.

id: "https://openalex.org/S1983995261"
ids

Object: All the external identifiers that we know about for this source. IDs are
expressed as URIs whenever possible. Possible ID types:

fatcat (String: this source's Fatcat ID)
issn (List: a list of this source's ISSNs. Same as Source.issn)
issn_l (String: this source's ISSN-L. Same as Source.issn_l)
mag (Integer: this source's Microsoft Academic Graph ID)
openalex (String: this source's OpenAlex ID. Same as Source.id)
wikidata (String: this source's Wikidata ID)

Many sources are missing one or more ID types (either because we don't know the ID,
or because it was never assigned). Keys for null IDs are not displayed.

Example

ids: {
openalex: "https://openalex.org/S1983995261",
issn_l: "2167-8359",
issn: [
"2167-8359"
],
mag: 1983995261,
fatcat: "https://fatcat.wiki/container/z3ijzhu7zzey3f7jwws7r
wikidata: "https://www.wikidata.org/entity/Q96326029"
}

is_core

Boolean: Whether this source is identified as a "core source" by CWTS, used in the
Open Leiden Ranking of universities around the world. The list of core sources can
be found here.
is_core: true

is_in_doaj

Boolean: Whether this is a journal listed in the Directory of Open Access Journals
(DOAJ).

is_in_doaj: true

is_oa

Boolean: Whether this is currently a fully open-access source. This could be true
for a preprint repository where everything uploaded is free to read, or for a Gold or
Diamond open access journal, where all newly published Works are available for
free under an open license.

We say "currently" because the status of a source can change over time. It's
common for journals to "flip" to Gold OA, after which they may make only future
articles open or also open their back catalogs. It's entirely possible for a source to
say is_oa: true, but for an article from last year to require a subscription.

is_oa: true

issn

List: The ISSNs used by this source. Many publications have multiple ISSNs, so
ISSN-L should be used when possible.

issn: ["2167-8359"]

issn_l
String: The ISSN-L identifying this source. This is the Canonical External ID for
sources.

ISSN is a global and unique ID for serial publications. However, different media
versions of a given publication (e.g., print and electronic) often have different
ISSNs. This is why we can't have nice things. The ISSN-L or Linking ISSN solves the
problem by designating a single canonical ISSN for all media versions of the title.
It's usually the same as the print ISSN.

issn_l: "2167-8359"

societies
Array: Societies on whose behalf the source is published and maintained, obtained
from our crowdsourced list. Thanks!

societies: [
{
"url": "http://www.counseling.org/",
"organization": "American Counseling Association on behalf of the
}
]

summary_stats

Object: Citation metrics for this source

2yr_mean_citedness Float: The 2-year mean citedness for this source. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this source.

i10_index Integer: The i-10 index for this source.

While the h-index and the i-10 index are normally author-level metrics, they can be
calculated for any set of papers, so we include them for sources.
summary_stats: {
2yr_mean_citedness: 1.5295340589458237,
h_index: 105,
i10_index: 5045
}
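The three summary_stats metrics can be illustrated on a small set of works. This sketch follows the definitions given above, but it is not OpenAlex's implementation. The per-work input shape (publication_year plus a hypothetical cites_by_year mapping) is an assumption. OpenAlex may also apply extra rules, for example about which work types count, that this sketch does not reproduce.

```python
from typing import List


def h_index(citation_counts: List[int]) -> int:
    """Largest h such that h works each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def i10_index(citation_counts: List[int]) -> int:
    """Number of works with at least 10 citations."""
    return sum(1 for cites in citation_counts if cites >= 10)


def two_year_mean_citedness(works: List[dict], current_year: int) -> float:
    """Citations made in (current_year - 1) to works published in the two years
    before that, divided by the number of such works."""
    citing_year = current_year - 1
    cohort = [w for w in works if w["publication_year"] in (citing_year - 1, citing_year - 2)]
    if not cohort:
        return 0.0  # a choice made here; the API may report null instead
    citations = sum(w["cites_by_year"].get(citing_year, 0) for w in cohort)
    return citations / len(cohort)


print(h_index([10, 8, 5, 4, 3]))  # 4
print(i10_index([10, 8, 5, 4, 3]))  # 1
works = [
    {"publication_year": 2022, "cites_by_year": {2023: 4}},
    {"publication_year": 2021, "cites_by_year": {2022: 7, 2023: 2}},
    {"publication_year": 2023, "cites_by_year": {2023: 9}},  # too recent, excluded
]
print(two_year_mean_citedness(works, current_year=2024))  # 3.0
```

The citation counts and works above are made-up inputs for illustration.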

type

String: The type of source, which will be one of: journal, repository,
conference, ebook platform, book series, metadata, or other.

type: "journal"

updated_date

String: The last time anything in this Source object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.

updated_date: "2022-01-02T00:00:00"

works_api_url

String: A URL that will get you a list of all this source's Works .

We express this as an API URL (instead of just listing the works themselves)
because sometimes a source's publication list is too long to reasonably fit into a
single Source object.

works_api_url: "https://api.openalex.org/works?filter=primary_location.so

works_count
Integer: The number of Works this source hosts.

works_count: 20184

x_concepts

x_concepts will be deprecated and removed soon. We will be replacing this
functionality with Topics instead.

List: The Concepts most frequently applied to works hosted by this source. Each is
represented as a dehydrated Concept object, with one additional attribute:

score (Float): The strength of association between this source and the listed
concept, from 0-100.

x_concepts: [
{
id: "https://openalex.org/C86803240",
wikidata: null,
display_name: "Biology",
level: 0,
score: 86.7
},
{
id: "https://openalex.org/C185592680",
wikidata: null,
display_name: "Chemistry",
level: 0,
score: 51.4
},

// and so forth
]

The DehydratedSource object


The DehydratedSource is a stripped-down Source object, with most of its
properties removed to save weight. Its only remaining properties are:

display_name

host_organization

host_organization_lineage

host_organization_name

id

is_core

is_in_doaj

is_oa

issn

issn_l

type
Get a single source
It's easy to get a source from the API with: /sources/<entity_id>. Here's an
example:

Get the source with the OpenAlex ID S137773608 :


https://api.openalex.org/sources/S137773608

That will return a Source object, describing everything OpenAlex knows about
the source with that ID:

{
"id": "https://openalex.org/S137773608",
"issn_l": "0028-0836",
"issn": [
"1476-4687",
"0028-0836"
],
"display_name": "Nature",
// other fields removed for brevity
}

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
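You can wrap the 50-ID batching described above in a small URL builder. The sketch assumes the openalex filter, which is listed among the source filters as an alias of ids.openalex, and the "|" OR separator from the filters guide. The function name is hypothetical.

```python
from typing import List

API_BASE = "https://api.openalex.org"
MAX_IDS_PER_REQUEST = 50  # per the limit stated above


def batched_lookup_urls(entity: str, ids: List[str], batch_size: int = MAX_IDS_PER_REQUEST) -> List[str]:
    """Build one list-endpoint URL per batch of up to 50 OpenAlex IDs."""
    if not 1 <= batch_size <= MAX_IDS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 50")
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        id_filter = "|".join(batch)  # "|" means OR within a single filter
        # per-page matches the batch size so each batch fits on one page
        urls.append(f"{API_BASE}/{entity}?filter=openalex:{id_filter}&per-page={len(batch)}")
    return urls


print(batched_lookup_urls("sources", ["S137773608", "S1983995261"]))
# ['https://api.openalex.org/sources?filter=openalex:S137773608|S1983995261&per-page=2']
```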

Sources are also available via an alias: /journals

External IDs
You can look up journals using external IDs such as an ISSN:

Get the source with ISSN: 2041-1723 :


https://api.openalex.org/sources/issn:2041-1723

Available external IDs for sources are:


External ID                      URN
ISSN                             issn
Fatcat                           fatcat
Microsoft Academic Graph (MAG)   mag
Wikidata                         wikidata
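Single-source URLs for external IDs follow the <urn>:<value> pattern shown in the ISSN example. The sketch below builds these URLs and validates the URN against the table. The names are hypothetical.

```python
from typing import Optional

SOURCE_ID_URNS = {"issn", "fatcat", "mag", "wikidata"}  # from the table above


def source_url(id_value: str, urn: Optional[str] = None) -> str:
    """URL for one source, by OpenAlex ID (urn=None) or by an external ID."""
    base = "https://api.openalex.org/sources"
    if urn is None:
        return f"{base}/{id_value}"
    if urn not in SOURCE_ID_URNS:
        raise ValueError(f"unsupported external ID type for sources: {urn!r}")
    return f"{base}/{urn}:{id_value}"


print(source_url("S137773608"))  # https://api.openalex.org/sources/S137773608
print(source_url("2041-1723", urn="issn"))  # https://api.openalex.org/sources/issn:2041-1723
```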

Select fields
You can use select to limit the fields that are returned in a source object. More
details are here.

Display only the id and display_name for a source object


https://api.openalex.org/sources/S137773608?select=id,display_name
Get lists of sources
You can get lists of sources:

Get all sources in OpenAlex


https://api.openalex.org/sources

Which returns a response like this:

{
"meta": {
"count": 226727,
"db_response_time_ms": 32,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/S2764455111",
"issn_l": null,
"issn": null,
"display_name": "PubMed Central",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/S4306400806",
"issn_l": null,
"issn": null,
"display_name": "PubMed Central - Europe PMC",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}

Page and sort sources


By default we return 25 results per page. You can change this default and page
through sources with the per-page and page parameters:
Get the second page of sources results, with 50 results returned per page
https://api.openalex.org/sources?per-page=50&page=2

You also can sort results with the sort parameter:

Sort sources by cited by count, descending


https://api.openalex.org/sources?sort=cited_by_count:desc

Continue on to learn how you can filter and search lists of sources.

Sample sources
You can use sample to get a random batch of sources. Read more about sampling
and how to add a seed value here.

Get 10 random sources


https://api.openalex.org/sources?sample=10

Select fields
You can use select to limit the fields that are returned in a list of sources. More
details are here.

Display only the id , display_name and issn within sources results


https://api.openalex.org/sources?select=id,display_name,issn
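The paging, sorting, sampling, and select options above are all ordinary query parameters, so a single builder can cover them. The sketch below uses only the standard library. It leaves ":", "," and "|" unescaped so that the URLs match the examples in this documentation. The function name is hypothetical.

```python
from typing import Dict, Union
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def list_url(entity: str, params: Dict[str, Union[str, int]]) -> str:
    """Build a list-endpoint URL; parameter names use the API's own spelling, e.g. "per-page"."""
    if not params:
        return f"{API_BASE}/{entity}"
    return f"{API_BASE}/{entity}?" + urlencode(params, safe=":,|")


print(list_url("sources", {"per-page": 50, "page": 2}))
# https://api.openalex.org/sources?per-page=50&page=2
print(list_url("sources", {"sort": "cited_by_count:desc"}))
# https://api.openalex.org/sources?sort=cited_by_count:desc
print(list_url("sources", {"select": "id,display_name,issn"}))
# https://api.openalex.org/sources?select=id,display_name,issn
```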
Filter sources
You can filter sources with the filter parameter:

Get sources that have an ISSN


https://api.openalex.org/sources?filter=has_issn:true

It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.

/sources attribute filters


You can filter using these attributes of the Source entity object (click each one to
view their documentation on the Source object page):

apc_prices.currency

apc_prices.price

apc_usd

cited_by_count

country_code

host_organization (alias: host_organization.id )


host_organization_lineage: Use this with a publisher ID to find sources from
that publisher and all of its children.
ids.openalex (alias: openalex )
is_core

is_in_doaj

is_oa

issn

publisher: Requires exact match. Use the host_organization_lineage filter
instead if you want to find sources from a publisher and all of its children.
summary_stats.2yr_mean_citedness (accepts float, null, !null, can use range
queries such as < >)
summary_stats.h_index (accepts integer, null, !null, can use range queries)
summary_stats.i10_index (accepts integer, null, !null, can use range queries)
type

works_count

x_concepts.id (alias: concepts.id or concept.id ) -- will be deprecated soon
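Several of these filters are often combined in a single request. The sketch below assembles a filter value using the conventions from the filters guide as we understand them: "," between attributes for AND, "|" between values for OR, a leading "!" for negation, and ">" or "<" for ranges. The function name is hypothetical. Because a dict cannot repeat a key, this sketch cannot express two conditions on the same attribute.

```python
from typing import Dict, List, Union

FilterValue = Union[str, int, bool, List[Union[str, int, bool]]]


def _format(value: Union[str, int, bool]) -> str:
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "true" if value else "false"
    return str(value)


def build_filter(conditions: Dict[str, FilterValue]) -> str:
    """Turn {attribute: value or [values]} into the value of the filter= parameter."""
    parts = []
    for attribute, value in conditions.items():
        if isinstance(value, (list, tuple)):
            formatted = "|".join(_format(v) for v in value)  # OR within one attribute
        else:
            formatted = _format(value)
        parts.append(f"{attribute}:{formatted}")
    return ",".join(parts)  # AND across attributes


print(build_filter({
    "is_oa": True,
    "country_code": ["gb", "us"],
    "type": "!repository",
    "works_count": ">1000",
}))
# is_oa:true,country_code:gb|us,type:!repository,works_count:>1000
```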

Want to filter by host_organization.display_name? This is a two-step process:

1. Find the host organization's ID by searching by display_name in
Publishers or Institutions, depending on which type you are looking for.
2. Filter sources by host_organization.id.
To learn more about why we do it this way, see here.
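You can script the two-step host-organization lookup as shown below. This sketch rests on several assumptions. It searches /publishers with the search parameter and trusts the top hit, and it reads only the first page of matching sources. The function names are hypothetical. The HTTP call is passed in as fetch so the logic can be tested without network access.

```python
import json
import urllib.request
from typing import Callable, List
from urllib.parse import quote_plus

API_BASE = "https://api.openalex.org"


def fetch_json(url: str) -> dict:
    """Minimal GET returning decoded JSON (no retries or rate-limit handling)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def sources_for_host_name(name: str, fetch: Callable[[str], dict] = fetch_json,
                          host_type: str = "publishers") -> List[dict]:
    """Step 1: resolve a host name to an ID. Step 2: filter sources by that ID."""
    hits = fetch(f"{API_BASE}/{host_type}?search={quote_plus(name)}&per-page=1")["results"]
    if not hits:
        return []
    host_id = hits[0]["id"].rsplit("/", 1)[-1]  # "https://openalex.org/P..." -> "P..."
    page = fetch(f"{API_BASE}/sources?filter=host_organization.id:{host_id}")
    return page["results"]
```

For repositories hosted by universities, pass host_type="institutions". Check the top search hit by hand before relying on it, because a name search can match more than one organization.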

/sources convenience filters


These filters aren't attributes of the Source object, but they're included to address
some common use cases:

continent

Value: a String with a valid continent filter

Returns: sources that are associated with the chosen continent.

Get sources that are associated with Asia


https://api.openalex.org/sources?filter=continent:asia

default.search

Value: a search string

This works the same as using the search parameter for Sources.

display_name.search
Value: a search string

Returns: sources with a display_name containing the given string; see the search
page for details.

Get sources with names containing "Neurology":


https://api.openalex.org/sources?filter=display_name.search:Neurology

In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.

has_issn

Value: a Boolean ( true or false )

Returns: sources that have or lack an ISSN, depending on the given value.

Get sources without ISSNs:


https://api.openalex.org/sources?filter=has_issn:false

is_global_south

Value: a Boolean ( true or false )

Returns: sources that are associated with the Global South.

Get sources that are located in the Global South


https://api.openalex.org/sources?filter=is_global_south:true
Search sources
The best way to search for sources is to use the search query parameter, which
searches across display_name , alternate_titles , and abbreviated_title .
Example:

Search for the abbreviated version of the Journal of the American Chemical
Society, "jacs":
https://api.openalex.org/sources?search=jacs

You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.

Search a specific field


You can also use search as a filter, allowing you to fine-tune the fields you're
searching over. To do this, you append .search to the end of the property you are
filtering for:

Get sources with "nature" in the title:


https://api.openalex.org/sources?filter=display_name.search:nature

The following fields can be searched as a filter within sources:

Search filter          Field that is searched
display_name.search    display_name

You can also use the filter default.search , which works the same as using the
search parameter.

Autocomplete sources
You can autocomplete sources to create a very fast type-ahead style search
function:

Autocomplete sources with "neuro" in the display_name:


https://api.openalex.org/autocomplete/sources?q=neuro

This returns a list of sources with the publisher set as the hint:

{
"results": [
{
"id": "https://openalex.org/S5555990",
"display_name": "The Journal of Neuroscience",
"hint": "Society for Neuroscience",
"cited_by_count": 4274712,
"works_count": 40376,
"entity_type": "source",
"external_id": "0270-6474"
},
// more results
]
}

Read more in the autocomplete page in the API guide.
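For a type-ahead box, you can turn each autocomplete result into a display label plus an ID. The sketch below works on the response shape shown above and leaves network access to the caller. The function names are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import quote_plus


def autocomplete_url(prefix: str) -> str:
    return f"https://api.openalex.org/autocomplete/sources?q={quote_plus(prefix)}"


def autocomplete_choices(response: dict) -> List[Tuple[str, str]]:
    """Return (label, id) pairs; the hint (here, the publisher) is appended when present."""
    choices = []
    for result in response.get("results", []):
        label = result["display_name"]
        if result.get("hint"):
            label += f" ({result['hint']})"
        choices.append((label, result["id"]))
    return choices


response = {"results": [{
    "id": "https://openalex.org/S5555990",
    "display_name": "The Journal of Neuroscience",
    "hint": "Society for Neuroscience",
}]}
print(autocomplete_url("neuro"))  # https://api.openalex.org/autocomplete/sources?q=neuro
print(autocomplete_choices(response))
# [('The Journal of Neuroscience (Society for Neuroscience)', 'https://openalex.org/S5555990')]
```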


Group sources
You can group sources with the group_by parameter:

Get counts of sources by publisher:


https://api.openalex.org/sources?group_by=publisher

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

/sources group_by attributes


apc_prices.currency

apc_usd

cited_by_count

has_issn

continent

country_code

host_organization (alias: host_organization.id )


host_organization_lineage (alias: host_organization.id )
is_global_south

is_core

is_in_doaj

is_oa

issn

publisher

summary_stats.2yr_mean_citedness

summary_stats.h_index

summary_stats.i10_index
type

works_count
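Grouped results come back in a group_by list rather than in results. The sketch below turns that list into a dict of counts. It assumes each group has key, key_display_name, and count fields, as described on the group-by page. The function name is hypothetical, and the counts in the example are made up.

```python
from typing import Dict


def group_counts(response: dict) -> Dict[str, int]:
    """Map each group's display name to its count, largest groups first."""
    groups = sorted(response.get("group_by", []), key=lambda g: g["count"], reverse=True)
    return {g["key_display_name"]: g["count"] for g in groups}


response = {
    "results": [],
    "group_by": [
        {"key": "repository", "key_display_name": "repository", "count": 80},
        {"key": "journal", "key_display_name": "journal", "count": 120},
    ],
}
print(group_counts(response))  # {'journal': 120, 'repository': 80}
```

Two groups can share a display name, in which case the later one overwrites the earlier. If that is a concern, key the dict on "key" instead.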
Institutions
Universities and other organizations to which authors claim affiliations

Institutions are universities and other organizations to which authors claim
affiliations. OpenAlex indexes about 109,000 institutions.

Get a list of OpenAlex institutions:


https://api.openalex.org/institutions

The Canonical External ID for institutions is the ROR ID. All institutions in OpenAlex
have ROR IDs.

Our information about institutions comes from metadata found in Crossref, PubMed,
ROR, MAG, and publisher websites. In order to link institutions to works, we parse
every affiliation listed by every author. These affiliation strings can be quite messy,
so we’ve trained an algorithm to interpret them and extract the actual institutions
with reasonably high reliability.

For a simple example: we will treat both “MIT, Boston, USA” and “Massachusetts
Institute of Technology” as the same institution (https://ror.org/042nb2s44).

Institutions are linked to works via the works.authorships property.

What's next
Learn more about what you can do with institutions:

The Institution object
Get a single institution
Get lists of institutions
Filter institutions
Search institutions
Group institutions
Institution object
These are the fields in an institution object. When you use the API to get a single
institution or lists of institutions, this is what's returned.

associated_institutions

List: Institutions related to this one. Each associated institution is represented
as a dehydrated Institution object, with one extra property:

relationship (String): The type of relationship between this institution and the
listed institution. Possible values: parent , child , and related .

Institution associations and the relationship vocabulary come from ROR's
relationships.

associated_institutions: [
{
id: "https://openalex.org/I2802101240",
ror: "https://ror.org/0483mr804",
display_name: "Carolinas Medical Center",
country_code: "US",
type: "healthcare",
relationship: "related"
},
{
id: "https://openalex.org/I69048370",
ror: "https://ror.org/01s91ey96",
display_name: "Renaissance Computing Institute",
country_code: "US",
type: "education",
relationship: "related"
},

// and so forth
]
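To pick out one kind of relationship, such as only parents, filter associated_institutions on its relationship field. The sketch below uses the example above, treating the two entries shown as the whole list. The function name is hypothetical.

```python
from typing import List

RELATIONSHIPS = {"parent", "child", "related"}


def associated_names(institution: dict, relationship: str) -> List[str]:
    """Display names of associated institutions with the given relationship."""
    if relationship not in RELATIONSHIPS:
        raise ValueError(f"relationship must be one of {sorted(RELATIONSHIPS)}")
    return [
        a["display_name"]
        for a in institution.get("associated_institutions", [])
        if a.get("relationship") == relationship
    ]


unc = {"associated_institutions": [
    {"id": "https://openalex.org/I2802101240", "display_name": "Carolinas Medical Center",
     "relationship": "related"},
    {"id": "https://openalex.org/I69048370", "display_name": "Renaissance Computing Institute",
     "relationship": "related"},
]}
print(associated_names(unc, "related"))
# ['Carolinas Medical Center', 'Renaissance Computing Institute']
print(associated_names(unc, "parent"))  # []
```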

cited_by_count
Integer: The total number of Works that cite a work created by an author affiliated
with this institution. Or less formally: the number of citations this institution has
collected.

cited_by_count: 21199844

country_code

String: The country where this institution is located, represented as an ISO two-
letter country code.

country_code: "US"

counts_by_year

List: works_count and cited_by_count for each of the last ten years, binned by
year. To put it another way: each year, you can see how many new works this
institution put out, and how many times any work affiliated with this institution got
cited.

Years with zero citations and zero works have been removed so you will need to
add those in if you need them.

counts_by_year: [
{
year: 2022,
works_count: 133,
cited_by_count: 32731
},
{
year: 2021,
works_count: 12565,
cited_by_count: 2180827
},

// and so forth
]
created_date

String: The date this Institution object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.

created_date: "2017-08-08"

display_name

String: The primary name of the institution.

display_name: "University of North Carolina at Chapel Hill"

display_name_acronyms

List: Acronyms or initialisms that people sometimes use instead of the full
display_name .

display_name_acronyms:["UNC"]

display_name_alternatives

List: Other names people may use for this institution.

display_name_alternatives: [
"UNC-Chapel Hill"
]

geo

Object: A bunch of stuff we know about the location of this institution:


city (String): The city where this institution lives.
geonames_city_id (String): The city where this institution lives, as a GeoNames
database ID.
region (String): The sub-national region (state, province) where this institution
lives.
country_code (String): The country where this institution lives, represented as
an ISO two-letter country code.
country (String): The country where this institution lives.
latitude (Float): Does what it says.
longitude (Float): Does what it says.

geo: {
city: "Chapel Hill",
geonames_city_id: "4460162",
region: "North Carolina",
country_code: "US",
country: "United States",
latitude: 35.9083,
longitude: -79.0492
}

homepage_url

String: The URL for institution's primary homepage.

homepage_url: "http://www.unc.edu/"

id

String: The OpenAlex ID for this institution.

id: "https://openalex.org/I114027177"

ids
Object: All the external identifiers that we know about for this institution. IDs are
expressed as URIs whenever possible. Possible ID types:

grid (String: this institution's GRID ID)
mag (Integer: this institution's Microsoft Academic Graph ID)
openalex (String: this institution's OpenAlex ID. Same as Institution.id)
ror (String: this institution's ROR ID. Same as Institution.ror)
wikipedia (String: this institution's Wikipedia page URL)
wikidata (String: this institution's Wikidata ID)

Many institutions are missing one or more ID types (either because we don't know the
ID, or because it was never assigned). Keys for null IDs are not displayed.

ids: {
openalex: "https://openalex.org/I114027177",
ror: "https://ror.org/0130frc33",
grid: "grid.10698.36",
wikipedia: "https://en.wikipedia.org/wiki/University%20of%20North%20C
wikidata: "https://www.wikidata.org/wiki/Q192334",
mag: 114027177
}

image_thumbnail_url

String: Same as image_url , but it's a smaller image.

image_thumbnail_url: "https://upload.wikimedia.org/wikipedia/en/thumb/5/5

is_super_system

Boolean: True if this institution is a "super system". This includes large university
systems such as the University of California System
(https://openalex.org/I2803209242), as well as some governments and
multinational companies.
We have this special flag for these institutions so that we can exclude them from
other institutions' lineage , which we do because these super systems are not
generally relevant in group-by results when you're looking at ranked lists of
institutions.

The list of institution IDs marked as super systems can be found in this file.

image_url

String: URL where you can get an image representing this institution. Usually this is
hosted on Wikipedia, and usually it's a seal or logo.

image_url: "https://upload.wikimedia.org/wikipedia/en/5/5c/University_of_

international

Object: The institution's display name in different languages. Derived from the
Wikipedia page for the institution in the given language.

display_name (Object)
key (String): language code in wikidata language code format. Full list of
languages is here.
value (String): display_name in the given language

international: {
display_name: {
"ar": "جامعة نورث كارولينا في تشابل هيل",
"en": "University of North Carolina at Chapel Hill",
"es": "Universidad de Carolina del Norte en Chapel Hill",
"zh-cn": "北卡罗来纳大学教堂山分校",
...
}
}
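When you show institution names in a user's language, you will usually want a fallback for languages that have no Wikipedia-derived name. In the sketch below, the fallback order (requested language, then English, then display_name) is our own choice, and the function name is hypothetical. The example uses an excerpt of the record above.

```python
def localized_name(institution: dict, lang: str, fallback: str = "en") -> str:
    """Pick a display name in `lang`, falling back to `fallback`, then display_name."""
    names = (institution.get("international") or {}).get("display_name") or {}
    return names.get(lang) or names.get(fallback) or institution["display_name"]


unc = {
    "display_name": "University of North Carolina at Chapel Hill",
    "international": {"display_name": {
        "en": "University of North Carolina at Chapel Hill",
        "es": "Universidad de Carolina del Norte en Chapel Hill",
        "zh-cn": "北卡罗来纳大学教堂山分校",
    }},
}
print(localized_name(unc, "es"))  # Universidad de Carolina del Norte en Chapel Hill
print(localized_name(unc, "fr"))  # not in this excerpt, so the English name is used
```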

lineage
List: OpenAlex IDs of institutions. The list will include this institution's ID, as well as
any parent institutions. If this institution has no parent institutions, this list will only
contain its own ID.

This information comes from ROR's relationships, specifically the Parent/Child
relationships.

Super systems are excluded from the lineage. See is_super_system above.

id: "https://openalex.org/I170203145",
...
lineage: [
"https://openalex.org/I170203145",
"https://openalex.org/I90344618"
]
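Because lineage always contains the institution itself plus its parents, checking whether one institution sits under another is a membership test. The sketch below accepts full or short IDs. The helper names are hypothetical. Since super systems are excluded from lineage, this check returns False when the ancestor is a super system.

```python
def short_id(openalex_id: str) -> str:
    """'https://openalex.org/I90344618' -> 'I90344618' (short IDs pass through)."""
    return openalex_id.rstrip("/").rsplit("/", 1)[-1]


def is_within(institution: dict, ancestor_id: str) -> bool:
    """True if ancestor_id is this institution or one of its (non-super-system) parents."""
    target = short_id(ancestor_id)
    return any(short_id(i) == target for i in institution.get("lineage", []))


inst = {
    "id": "https://openalex.org/I170203145",
    "lineage": ["https://openalex.org/I170203145", "https://openalex.org/I90344618"],
}
print(is_within(inst, "I90344618"))  # True
print(is_within(inst, "https://openalex.org/I114027177"))  # False
```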

repositories

List: Repositories (Sources with type: repository) that have this institution as
their host_organization.

repositories: [
{
id: "https://openalex.org/S4306402521",
display_name: "University of Minnesota Digital Conservancy (Unive
host_organization: "https://openalex.org/I130238516",
host_organization_name: "University of Minnesota",
host_organization_lineage: ["https://openalex.org/I130238516"]
}
// and so forth
]

roles

List: List of role objects, which include the role (one of institution, funder, or
publisher), the id (OpenAlex ID), and the works_count.
In many cases, a single organization does not fit neatly into one role. For example,
Yale University is a single organization that is a research university, funds research
studies, and publishes an academic journal. The roles property links the
OpenAlex entities together for a single organization, and includes counts for the
works associated with each role.

The roles list of an entity (Funder, Publisher, or Institution) always includes itself.
In the case where an organization only has one role, the roles will be a list of
length one, with itself as the only item.

roles: [
{
role: "funder",
id: "https://openalex.org/F4320308380",
works_count: 1004,
},
{
role: "publisher",
id: "https://openalex.org/P4310315589",
works_count: 13986,
},
{
role: "institution",
id: "https://openalex.org/I32971472",
works_count: 250031,
}
]
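Since roles links the funder, publisher, and institution records of one organization, it gives you a convenient way to jump between them. The sketch below uses the example above, and the function name is hypothetical. The per-role works counts describe different relationships (funded, published, affiliated), so summing them may count the same work more than once.

```python
from typing import Dict


def role_ids(entity: dict) -> Dict[str, str]:
    """Map each role ("institution", "funder", "publisher") to its OpenAlex ID."""
    return {role["role"]: role["id"] for role in entity.get("roles", [])}


org = {"roles": [
    {"role": "funder", "id": "https://openalex.org/F4320308380", "works_count": 1004},
    {"role": "publisher", "id": "https://openalex.org/P4310315589", "works_count": 13986},
    {"role": "institution", "id": "https://openalex.org/I32971472", "works_count": 250031},
]}
ids = role_ids(org)
print(ids["publisher"])  # https://openalex.org/P4310315589
print("funder" in ids)  # True
```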

ror

String: The ROR ID for this institution. This is the Canonical External ID for
institutions.

The ROR (Research Organization Registry) identifier is a globally unique ID for
research organizations. ROR is the successor to GRID, which is no longer being
updated.

ror: "https://ror.org/0130frc33"
summary_stats

Object: Citation metrics for this institution

2yr_mean_citedness Float: The 2-year mean citedness for this institution. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this institution.

i10_index Integer: The i-10 index for this institution.

While the h-index and the i-10 index are normally author-level metrics and the 2-
year mean citedness is normally a journal-level metric, they can be calculated for
any set of papers, so we include them for institutions.

summary_stats: {
2yr_mean_citedness: 5.065784263815827,
h_index: 985,
i10_index: 176682
}

type

String: The institution's primary type, using the ROR "type" controlled vocabulary.

Possible values are: Education, Healthcare, Company, Archive, Nonprofit,
Government, Facility, and Other.

type: "education"

updated_date

String: The last time anything in this Institution changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.

updated_date: "2022-01-02T00:27:23.088909"

works_api_url

String: A URL that will get you a list of all the Works affiliated with this institution.

We express this as an API URL (instead of just listing the Works themselves)
because most institutions have way too many works to reasonably fit into a single
return object.

works_api_url: "https://api.openalex.org/works?filter=institutions.id:I114

works_count

Integer: The number of Works created by authors affiliated with this institution. Or
less formally: the number of works coming out of this institution.

works_count: 202704

x_concepts

x_concepts will be deprecated and removed soon. We will be replacing this
functionality with Topics instead.

List: The Concepts most frequently applied to works affiliated with this institution.
Each is represented as a dehydrated Concept object, with one additional attribute:

score (Float): The strength of association between this institution and the listed
concept, from 0-100.
x_concepts: [
{
id: "https://openalex.org/C86803240",
wikidata: null,
display_name: "Biology",
level: 0,
score: 86.7
},
{
id: "https://openalex.org/C185592680",
wikidata: null,
display_name: "Chemistry",
level: 0,
score: 51.4
},

// and so forth
]

The DehydratedInstitution object


The DehydratedInstitution is a stripped-down Institution object, with most of
its properties removed to save weight. Its only remaining properties are:

country_code

display_name

id

lineage

ror

type
Get a single institution
It's easy to get an institution from the API with: /institutions/<entity_id>.
Here's an example:

Get the institution with the OpenAlex ID I27837315 :


https://api.openalex.org/institutions/I27837315

That will return an Institution object, describing everything OpenAlex knows
about the institution with that ID:

{
"id": "https://openalex.org/I27837315",
"ror": "https://ror.org/00jmfr291",
"display_name": "University of Michigan–Ann Arbor",
"country_code": "US",
"type": "education",
// other fields removed for brevity
}

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.

External IDs
You can look up institutions using external IDs such as a ROR ID:

Get the institution with ROR ID https://ror.org/00cvxb145:


https://api.openalex.org/institutions/ror:https://ror.org/00cvxb145

Available external IDs for institutions are:

External ID                      URN
ROR                              ror
Microsoft Academic Graph (MAG)   mag
Wikidata                         wikidata

Select fields
You can use select to limit the fields that are returned in an institution object.
More details are here.

Display only the id and display_name for an institution object


https://api.openalex.org/institutions/I27837315?select=id,display_name
Get lists of institutions
You can get lists of institutions:

Get all institutions in OpenAlex


https://api.openalex.org/institutions

Which returns a response like this:

{
"meta": {
"count": 108618,
"db_response_time_ms": 32,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/I27837315",
"ror": "https://ror.org/00jmfr291",
"display_name": "University of Michigan–Ann Arbor",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/I201448701",
"ror": "https://ror.org/00cvxb145",
"display_name": "University of Washington",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
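Reading a list response like the one above needs only the standard json module. The sketch below uses a trimmed copy of that response (the // comments are dropped because they are not valid JSON) and shows how meta.count and meta.per_page together tell you how many pages a full walk would take. The function name summarize_page is hypothetical.

```python
import json

# Trimmed copy of the list response above, with the "//" comments removed.
SAMPLE = """
{
  "meta": {"count": 108618, "db_response_time_ms": 32, "page": 1, "per_page": 25},
  "results": [
    {"id": "https://openalex.org/I27837315", "ror": "https://ror.org/00jmfr291",
     "display_name": "University of Michigan–Ann Arbor"},
    {"id": "https://openalex.org/I201448701", "ror": "https://ror.org/00cvxb145",
     "display_name": "University of Washington"}
  ],
  "group_by": []
}
"""


def summarize_page(body: str) -> dict:
    """Pull the total count, the page count, and the names out of one list response."""
    data = json.loads(body)
    meta = data["meta"]
    return {
        "total": meta["count"],
        # Ceiling division: pages needed to cover every result at this page size.
        "pages": -(-meta["count"] // meta["per_page"]),
        "names": [item["display_name"] for item in data["results"]],
    }


print(summarize_page(SAMPLE))
```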

Page and sort institutions


By default we return 25 results per page. You can change this default and page
through institutions with the per-page and page parameters:
Get the second page of institutions results, with 50 results returned per page
https://api.openalex.org/institutions?per-page=50&page=2

You also can sort results with the sort parameter:

Sort institutions by cited by count, descending


https://api.openalex.org/institutions?sort=cited_by_count:desc
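Paging and sorting can be combined when you walk a result set. The Python sketch below only generates the page URLs for a known total; it assumes you already have the total from meta.count. The name page_urls and the params argument are hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def page_urls(entity: str, total: int, per_page: int = 50, params=None):
    """Yield one URL per page needed to walk `total` results at `per_page` each."""
    pages = -(-total // per_page)  # ceiling division without floats
    for page in range(1, pages + 1):
        query = dict(params or {})  # e.g. a sort or filter that every page shares
        query.update({"per-page": per_page, "page": page})
        # Keep ":" and "," unescaped so the URLs match the examples above.
        yield f"{API_BASE}/{entity}?{urlencode(query, safe=':,')}"


for url in page_urls("institutions", 120, params={"sort": "cited_by_count:desc"}):
    print(url)
```

Putting the sort in params keeps the ordering stable across pages, so consecutive pages do not overlap or skip results.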

Continue on to learn how you can filter and search lists of institutions.

Sample institutions
You can use sample to get a random batch of institutions. Read more about
sampling and how to add a seed value here.

Get 50 random institutions


https://api.openalex.org/institutions?sample=50&per-page=50

Select fields
You can use select to limit the fields that are returned in a list of institutions. More
details are here.

Display only the id, ror, and display_name within institutions results


https://api.openalex.org/institutions?select=id,display_name,ror
Filter institutions
You can filter institutions with the filter parameter:

Get institutions that are located in Canada


https://api.openalex.org/institutions?filter=country_code:ca

It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
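As a rough illustration of combining filters, the Python sketch below assumes the conventions described on the filters page as we understand them: separate filters joined with "," are ANDed, values joined with "|" are ORed, and a value prefixed with "!" is negated. Check the filters page before relying on these. The names build_filter and filtered_url are hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def build_filter(conditions: dict) -> str:
    """Join attribute filters into one `filter` value.

    Separate filters are joined with "," (AND); a list value is joined with
    "|" (OR); a value that starts with "!" is passed through as a negation.
    """
    parts = []
    for attribute, value in conditions.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        parts.append(f"{attribute}:{value}")
    return ",".join(parts)


def filtered_url(entity: str, conditions: dict) -> str:
    # Leave the filter operators readable instead of percent-encoding them.
    query = urlencode({"filter": build_filter(conditions)}, safe=":,|!<>")
    return f"{API_BASE}/{entity}?{query}"


print(filtered_url("institutions", {"country_code": ["ca", "us"], "type": "!company"}))
```

A dict is used instead of keyword arguments because attribute names such as summary_stats.h_index contain dots.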

/institutions attribute filters


You can filter using these attributes of the Institution entity object (click each
one to view their documentation on the Institution object page):

cited_by_count

country_code

is_super_system

lineage: OpenAlex ID for an Institution

openalex: the OpenAlex ID of the Institution

repositories.host_organization: OpenAlex ID for an Institution

repositories.host_organization_lineage: OpenAlex ID for an Institution

repositories.id: the OpenAlex ID of a repository (a Source)

ror: the ROR ID of the Institution

summary_stats.2yr_mean_citedness (accepts float, null, !null, can use range
queries such as < >)

summary_stats.h_index (accepts integer, null, !null, can use range queries)

summary_stats.i10_index (accepts integer, null, !null, can use range queries)

type

works_count

x_concepts.id (alias: concepts.id or concept.id): will be deprecated soon


/institutions convenience filters
These filters aren't attributes of the Institution object, but they're included to
address some common use cases:

continent

Value: a String with a valid continent filter

Returns: institutions that are located in the chosen continent.

Get institutions that are located in South America


https://api.openalex.org/institutions?filter=continent:south_america

default.search

Value: a search string

This works the same as using the search parameter for Institutions.

display_name.search

Value: a search string

Returns: institutions with a display_name containing the given string; see the
search page for details.

Get institutions with names containing "technology":


https://api.openalex.org/institutions?filter=display_name.search:technology

In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.

has_ror
Value: a Boolean (true or false)

Returns: institutions that have or lack a ROR ID, depending on the given value.

Get institutions without ROR IDs:


https://api.openalex.org/institutions?filter=has_ror:false

is_global_south

Value: a Boolean (true or false)

Returns: institutions that are located in the Global South.

Get institutions that are located in the Global South


https://api.openalex.org/institutions?filter=is_global_south:true
Search institutions
The best way to search for institutions is to use the search query parameter,
which searches the display_name, the display_name_alternatives, and the
display_name_acronyms. Example:

Search institutions for San Diego State University:


https://api.openalex.org/institutions?search=san diego state university

You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.

Search a specific field


You can also use search as a filter, allowing you to fine-tune the fields you're
searching over. To do this, you append .search to the end of the property you are
filtering for:

Get institutions with "florida" in the display_name:


https://api.openalex.org/institutions?filter=display_name.search:florida

The following field can be searched as a filter within institutions:

Search filter          Field that is searched
display_name.search    display_name

You can also use the filter default.search, which works the same as using the
search parameter.

Autocomplete institutions
You can autocomplete institutions to create a very fast type-ahead style search
function:

Autocomplete institutions with "harv" in the display_name:


https://api.openalex.org/autocomplete/institutions?q=harv

This returns a list of institutions with the institution location set as the hint:

{
"results": [
{
"id": "https://openalex.org/I136199984",
"display_name": "Harvard University",
"hint": "Cambridge, USA",
"cited_by_count": 37792327,
"works_count": 242547,
"entity_type": "institution",
"external_id": "https://ror.org/03vek6s52"
},
...
]
}

Read more in the autocomplete page in the API guide.
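For a type-ahead box, the response above can be turned into display labels. The Python sketch below builds the autocomplete URL and formats each result as "name (hint)". It uses a copy of the example response with the "..." placeholder removed. The names autocomplete_url and suggestion_labels are hypothetical, and the fallback for a missing hint is our own defensive assumption.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"

# The autocomplete response above, minus the "..." placeholder.
SAMPLE = """
{"results": [{
  "id": "https://openalex.org/I136199984",
  "display_name": "Harvard University",
  "hint": "Cambridge, USA",
  "cited_by_count": 37792327,
  "works_count": 242547,
  "entity_type": "institution",
  "external_id": "https://ror.org/03vek6s52"
}]}
"""


def autocomplete_url(entity: str, prefix: str) -> str:
    """Build the autocomplete request; the typed prefix goes in the q parameter."""
    return f"{API_BASE}/autocomplete/{entity}?{urlencode({'q': prefix})}"


def suggestion_labels(body: str) -> list:
    """Turn an autocomplete response into "name (hint)" labels for a dropdown."""
    labels = []
    for result in json.loads(body)["results"]:
        hint = result.get("hint")
        # Fall back to the bare name if a result has no hint.
        labels.append(f"{result['display_name']} ({hint})" if hint else result["display_name"])
    return labels


print(autocomplete_url("institutions", "harv"))
print(suggestion_labels(SAMPLE))
```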


Group institutions
You can group institutions with the group_by parameter:

Get counts of institutions by country code:


https://api.openalex.org/institutions?group_by=country_code

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

/institutions group_by attributes


cited_by_count

continent

country_code

has_ror

is_global_south

is_super_system

lineage

repositories.host_organization

summary_stats.2yr_mean_citedness

summary_stats.h_index

summary_stats.i10_index

type

works_count
Topics
Topics assigned to works

Works in OpenAlex are tagged with Topics using an automated system that takes
into account the available information about the work, including title, abstract,
source (journal) name, and citations. There are around 4,500 Topics. A model
assigns each work a score for every topic, and the highest-scoring topic is that
work's primary_topic. We also provide additional highly ranked topics for each
work in Work.topics.

To learn more about how OpenAlex topics work in general, see the Topics page at
OpenAlex help pages.

For a detailed description of the methods behind OpenAlex Topics, see our paper:
"OpenAlex: End-to-End Process for Topic Classification". The code and model are
available at https://github.com/ourresearch/openalex-topic-classification.

What's next
Learn more about what you can do with topics:

The Topic object


Get a single topic
Get lists of topics
Filter topics
Search topics
Group topics
Topic object
These are the fields in a topic object. When you use the API to get a single topic or
lists of topics, this is what's returned.

description

String: A description of this topic, generated by AI.

description: "This cluster of papers explores the intersection of artific

display_name

String: The English-language label of the topic.

display_name: "Artificial Intelligence in Medicine"

domain

Object: The ID and the name (display_name) for the domain of this topic. The
domain is the highest level in the "domain, field, subfield, topic" system, which
means it is the least granular. See the topics overview for more explanation and a
diagram.

domain: {
id: 4,
display_name: "Health Sciences"
}

field

Object: The ID and the name (display_name) for the field of this topic. The field is
the second-highest level in the "domain, field, subfield, topic" system, which means
it is the second-least granular. See the topics overview for more explanation and a
diagram.

field: {
id: 27,
display_name: "Medicine"
}

id

String: The OpenAlex ID for this topic.

id: "https://openalex.org/T11636"

ids

Object: All the external identifiers that we know about for this topic. IDs are
expressed as URIs whenever possible. Possible ID types:

openalex (String: this topic's OpenAlex ID. Same as Topic.id)

wikipedia (String: this topic's Wikipedia page URL)

ids: {
openalex: "https://openalex.org/T11636",
wikipedia: "https://en.wikipedia.org/wiki/Artificial_intelligence_in_
}

keywords

List: Keywords consisting of one or several words each, meant to represent the
content of the papers in the topic. These keywords were generated as part of the AI
model. For now, they are provided as-is, but we will be providing more support and
documenting them more thoroughly.
keywords: [
"Artificial Intelligence",
"Machine Learning",
"Healthcare",
"Medical Imaging",
"Clinical Decision Support",
...
]

subfield

Object: The ID and the name (display_name) for the subfield of this topic. The
subfield is the third-highest level in the "domain, field, subfield, topic" system,
which means it is the third-least granular. See the topics overview for more
explanation and a diagram.

subfield: {
id: 2718,
display_name: "Health Informatics"
}
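Because domain, field, and subfield are nested objects on every topic, a readable breadcrumb can be assembled without extra requests. The Python sketch below uses the example values from the fields above; the function name topic_path and the " > " separator are our own choices.

```python
def topic_path(topic: dict, sep: str = " > ") -> str:
    """Return domain > field > subfield > topic, from least to most granular."""
    levels = [topic[level]["display_name"] for level in ("domain", "field", "subfield")]
    return sep.join(levels + [topic["display_name"]])


# Values copied from the examples above (other fields omitted).
topic = {
    "display_name": "Artificial Intelligence in Medicine",
    "domain": {"id": 4, "display_name": "Health Sciences"},
    "field": {"id": 27, "display_name": "Medicine"},
    "subfield": {"id": 2718, "display_name": "Health Informatics"},
}
print(topic_path(topic))
```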

updated_date

String: The last time anything in this topic object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.

updated_date: "2024-02-05T05:00:03.798420"

works_count

Integer: The number of works tagged with this topic.

works_count: 21737
Get a single topic
It's easy to get a topic from the API with: /topics/<entity_id>. Here's an
example:

Get the topic with the OpenAlex ID T11636:


https://api.openalex.org/topics/T11636

That will return a Topic object, describing everything OpenAlex knows about the
topic with that ID:

{
"id": "https://openalex.org/T11636",
"display_name": "Artificial Intelligence in Medicine",
// other fields removed for brevity
}

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.

Select fields
You can use select to limit the fields that are returned in a topic object. More
details are here.

Display only the id and display_name for a topic object


https://api.openalex.org/topics/T11636?select=id,display_name
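The id field in a returned object is a full URL (https://openalex.org/T11636), but the request path uses the short form (T11636). The Python sketch below converts between the two and optionally adds a select list. The names short_id and topic_url are hypothetical.

```python
API_BASE = "https://api.openalex.org"
OPENALEX_PREFIX = "https://openalex.org/"


def short_id(openalex_id: str) -> str:
    """Strip the "https://openalex.org/" prefix so the ID fits an API path."""
    if openalex_id.startswith(OPENALEX_PREFIX):
        return openalex_id[len(OPENALEX_PREFIX):]
    return openalex_id


def topic_url(openalex_id: str, select=None) -> str:
    """Build a single-topic URL, optionally limited to the given fields."""
    url = f"{API_BASE}/topics/{short_id(openalex_id)}"
    if select:
        # select takes a comma-separated list of field names.
        url += "?select=" + ",".join(select)
    return url


print(topic_url("https://openalex.org/T11636", ["id", "display_name"]))
```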
Get lists of topics
You can get lists of topics:

Get all topics in OpenAlex


https://api.openalex.org/topics

Which returns a response like this:

{
"meta": {
"count": 4516,
"db_response_time_ms": 10,
"page": 1,
"per_page": 25,
"groups_count": null
},
"results": [
{
"id": "https://openalex.org/T11475",
"display_name": "Territorial Governance and Environmental Par
// more fields (removed to save space)
},
{
"id": "https://openalex.org/T13445",
"display_name": "American Political Thought and History",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}

Page and sort topics


By default we return 25 results per page. You can change this default and page
through topics with the per-page and page parameters:

Get the second page of topics results, with 50 results returned per page
https://api.openalex.org/topics?per-page=50&page=2
You also can sort results with the sort parameter:

Sort topics by cited by count, descending


https://api.openalex.org/topics?sort=cited_by_count:desc

Continue on to learn how you can filter and search lists of topics.

Sample topics
You can use sample to get a random batch of topics. Read more about sampling
and how to add a seed value here.

Get 10 random topics


https://api.openalex.org/topics?sample=10

Select fields
You can use select to limit the fields that are returned in a list of topics. More
details are here.

Display only the id, display_name, and description within topics results


https://api.openalex.org/topics?select=id,display_name,description
Filter topics
You can filter topics with the filter parameter:

Get topics that are in the subfield "Epidemiology" (id: 2713)


https://api.openalex.org/topics?filter=subfield.id:2713

It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.

/topics attribute filters


You can filter using these attributes of the Topic object (click each one to view
their documentation on the Topic object page):

cited_by_count

domain.id

field.id

ids.openalex (alias: openalex)


subfield.id

works_count

/topics convenience filters


These filters aren't attributes of the Topic object, but they're included to address
some common use cases:

default.search

Value: a search string

This works the same as using the search parameter for Topics.
display_name.search

Value: a search string

Returns: topics with a display_name containing the given string; see the search
page for details.

Get topics with display_name containing "artificial" and "intelligence":


https://api.openalex.org/topics?filter=display_name.search:artificial+intelligence

In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.
Search topics
The best way to search for topics is to use the search query parameter, which
searches the display_name, description, and keywords fields. Example:

Search topics' display_name and description for "artificial intelligence":


https://api.openalex.org/topics?search=artificial intelligence

You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.
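The search examples in this document show literal spaces in the URL for readability. Most HTTP clients need those encoded, for example as "+" in a query string. The Python sketch below lets urlencode handle that; the name search_url is hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def search_url(entity: str, text: str) -> str:
    """Build a search request, encoding spaces and punctuation in the query."""
    return f"{API_BASE}/{entity}?{urlencode({'search': text})}"


print(search_url("topics", "artificial intelligence"))
```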

Search a specific field


You can also use search as a filter, allowing you to fine-tune the fields you're
searching over. To do this, you append .search to the end of the property you are
filtering for:

Get topics with "medical" in the display_name:


https://api.openalex.org/topics?filter=display_name.search:medical

The following fields can be searched as a filter within topics:

Search filter          Field that is searched
display_name.search    display_name
description.search     description
keywords.search        keywords

You can also use the filter default.search, which works the same as using the
search parameter.
Group topics
You can group topics with the group_by parameter:

Get counts of topics by domain:


https://api.openalex.org/topics?group_by=domain.id

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

Topics group_by attributes


cited_by_count

domain.id

field.id

subfield.id

works_count
Keywords
Short words or phrases assigned to works using AI

Works in OpenAlex are tagged with Keywords using an automated system based on
Topics.

To learn more about how OpenAlex Keywords work in general, see the Keywords
page at OpenAlex help pages.

Keyword object
These are the fields in a keyword object. When you use the API to get a single
keyword or lists of keywords, this is what's returned.

cited_by_count

Integer: The number of citations to works that have been tagged with this keyword.
Or less formally: the number of citations to this keyword.

For example, if there are just two works tagged with this keyword and one of them
has been cited 10 times, and the other has been cited 1 time, cited_by_count for
this keyword would be 11 .

cited_by_count: 4347000
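The arithmetic in the example above, written out in Python: a keyword's cited_by_count is the sum of the citation counts of the works tagged with it. The two works here are the hypothetical ones from the description (cited 10 times and 1 time), not real OpenAlex records.

```python
# Hypothetical works tagged with one keyword (citation counts from the example above).
tagged_works = [
    {"id": "work-a", "cited_by_count": 10},
    {"id": "work-b", "cited_by_count": 1},
]

# The keyword's counts are aggregates over its tagged works.
keyword_cited_by_count = sum(work["cited_by_count"] for work in tagged_works)
keyword_works_count = len(tagged_works)
print(keyword_cited_by_count, keyword_works_count)  # 11 2
```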

created_date

String: The date this Keyword object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.

created_date: "2024-04-10"
display_name

String: The English-language label of the keyword.

display_name: "Cardiac Imaging"

id

String: The OpenAlex ID for this keyword.

id: "https://openalex.org/keywords/cardiac-imaging"

updated_date

String: The last time anything in this keyword object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.

updated_date: "2024-05-09T05:00:03.798420"

works_count

Integer: The number of works tagged with this keyword.

works_count: 21737

Get a single keyword


It's easy to get a keyword from the API with: /keywords/<entity_id>. Here's an
example:

Get the keyword with the ID cardiac-imaging:


https://api.openalex.org/keywords/cardiac-imaging

That will return a Keyword object, describing everything OpenAlex knows about
the keyword with that ID:

{
"id": "https://openalex.org/keywords/cardiac-imaging",
"display_name": "Cardiac Imaging",
// other fields removed for brevity
}

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.

Select fields
You can use select to limit the fields that are returned in a keyword object. More
details are here.

Display only the id and display_name for a keyword object


https://api.openalex.org/keywords/cardiac-imaging?select=id,display_name

Get a list of keywords


You can get lists of keywords:

Get all keywords in OpenAlex


https://api.openalex.org/keywords

Which returns a response like this:


{
"meta": {
"count": 4516,
"db_response_time_ms": 10,
"page": 1,
"per_page": 25,
"groups_count": null
},
"results": [
{
"id": "https://openalex.org/T11475",
"display_name": "Territorial Governance and Environmental Par
// more fields (removed to save space)
},
{
"id": "https://openalex.org/T13445",
"display_name": "American Political Thought and History",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}

Filter keywords
You can filter keywords with the filter parameter:

Get keywords that have been assigned to more than 1,000 works


https://api.openalex.org/keywords?filter=works_count:>1000

It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.

/keywords attribute filters


You can filter using these attributes of the Keyword object:
cited_by_count

id

works_count

/keywords convenience filters


These filters aren't attributes of the Keyword object, but they're included to
address some common use cases:

default.search

Value: a search string

This works the same as using the search parameter for Keywords.

display_name.search

Value: a search string

Returns: keywords with a display_name containing the given string.

Get keywords with display_name containing "artificial" and "intelligence":


https://api.openalex.org/keywords?filter=display_name.search:artificial+intelligence

Search keywords
You can search for keywords using the search query parameter, which searches
the display_name field. For example:

Search keywords' display_name for "artificial intelligence":


https://api.openalex.org/keywords?search=artificial intelligence
You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.

Group keywords
You can group keywords with the group_by parameter:

Get counts of keywords by cited_by_count:


https://api.openalex.org/keywords?group_by=cited_by_count

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

Keywords group_by attributes


cited_by_count

works_count
Publishers
Companies and organizations that distribute works

Publishers are companies and organizations that distribute journal articles, books,
and theses. OpenAlex indexes about 10,000 publishers.

Get a list of OpenAlex publishers:


https://api.openalex.org/publishers

Our publisher data is closely tied to the publisher information in Wikidata. So the
Canonical External ID for OpenAlex publishers is a Wikidata ID, and almost every
publisher has one. Publishers are linked to sources through the
host_organization field.

What's next
Learn more about what you can do with publishers:

The Publisher object


Get a single publisher
Get lists of publishers
Filter publishers
Search publishers
Group publishers
Publisher object
Here are the fields in a publisher object. When you use the API to get a single
publisher or lists of publishers, this is what's returned.

alternate_titles

List: A list of alternate titles for this publisher.

alternate_titles: [
"Elsevier",
"elsevier.com",
"Elsevier Science",
"Uitg. Elsevier",
"السفیر",
"السویر",
"انتشارات الزویر",
"لودویک السفیر",
"爱思唯尔"
]

cited_by_count

Integer: The number of citations to works that are linked to this publisher through
journals or other sources.

For example, if a publisher publishes 27 journals and those 27 journals have 3,050
works, this number is the sum of the cited_by_count values for all of those 3,050
works.

cited_by_count: 407508754

country_codes
List: The countries where the publisher is primarily located, as ISO two-letter
country codes.

country_codes: ["DE"]

counts_by_year

List: The values of works_count and cited_by_count for each of the last ten
years, binned by year. To put it another way: for every listed year, you can see how
many new works are linked to this publisher, and how many times any work linked
to this publisher was cited.

Years with zero citations and zero works have been removed so you will need to
add those back in if you need them.

counts_by_year: [
{
year: 2021,
works_count: 4211,
cited_by_count: 120939
},
{
year: 2020,
works_count: 4363,
cited_by_count: 119531
},

// and so forth
]
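Because years with zero works and zero citations are dropped, a continuous series has to be rebuilt on the client. The Python sketch below fills the gaps with zero counts and keeps the newest-first order used in the example. The function name fill_missing_years and its year-range arguments are hypothetical; you choose the range you need.

```python
def fill_missing_years(counts_by_year: list, first_year: int, last_year: int) -> list:
    """Return one entry per year, newest first, with zero counts for dropped years."""
    by_year = {row["year"]: row for row in counts_by_year}
    filled = []
    for year in range(last_year, first_year - 1, -1):  # newest first, as in the example
        zero_row = {"year": year, "works_count": 0, "cited_by_count": 0}
        filled.append(by_year.get(year, zero_row))
    return filled


# The two entries shown in the example above.
counts = [
    {"year": 2021, "works_count": 4211, "cited_by_count": 120939},
    {"year": 2020, "works_count": 4363, "cited_by_count": 119531},
]
for row in fill_missing_years(counts, 2018, 2021):
    print(row)
```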

created_date

String: The date this Publisher object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.

created_date: "2017-08-08"
display_name

String: The primary name of the publisher.

display_name: "Elsevier BV"

hierarchy_level

Integer: The hierarchy level for this publisher. A publisher with hierarchy level 0 has
no parent publishers. A hierarchy level 1 publisher has one parent above it, and so
on.

hierarchy_level: 1

id

String: The OpenAlex ID for this publisher.

id: "https://openalex.org/P4310320990"

ids

Object: All the external identifiers that we know about for this publisher. IDs are
expressed as URIs whenever possible. Possible ID types:

openalex String: this publisher's OpenAlex ID


ror String: this publisher's ROR ID

wikidata String: this publisher's Wikidata ID


ids: {
openalex: "https://openalex.org/P4310320990",
ror: "https://ror.org/02scfj030",
wikidata: "https://www.wikidata.org/entity/Q746413"
},

image_thumbnail_url

String: Same as image_url, but it's a smaller image.

This is usually a hotlink to a Wikimedia image. You can change the width=300
parameter in the URL if you want a different thumbnail size.

image_thumbnail_url: "https://commons.wikimedia.org/w/index.php?title=Spe

image_url

String: URL where you can get an image representing this publisher. Usually this is a
hotlink to a Wikimedia image, and usually it's a seal or logo.

image_url: "https://commons.wikimedia.org/w/index.php?title=Special:Redir

lineage

List: OpenAlex IDs of publishers. The list will include this publisher's ID, as well as
any parent publishers. If this publisher's hierarchy_level is 0, this list will only
contain its own ID.

id: "https://openalex.org/P4310321285",
...
hierarchy_level: 2,
lineage: [
"https://openalex.org/P4310321285",
"https://openalex.org/P4310319900",
"https://openalex.org/P4310319965"
]
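The relationship between hierarchy_level and lineage can be checked in code. The Python sketch below assumes, from the descriptions above, that lineage holds the publisher's own ID plus one ID per parent level, so its length is hierarchy_level + 1 (the example, with level 2 and three IDs, fits this). It does not rely on the order of the list. The names lineage_is_consistent and ancestors are hypothetical.

```python
def lineage_is_consistent(publisher: dict) -> bool:
    """Check that lineage contains the publisher itself plus one ID per parent level."""
    lineage = publisher["lineage"]
    return publisher["id"] in lineage and len(lineage) == publisher["hierarchy_level"] + 1


def ancestors(publisher: dict) -> list:
    """Every ID in lineage except the publisher's own, in the order returned."""
    return [pid for pid in publisher["lineage"] if pid != publisher["id"]]


# Values from the lineage example above.
example = {
    "id": "https://openalex.org/P4310321285",
    "hierarchy_level": 2,
    "lineage": [
        "https://openalex.org/P4310321285",
        "https://openalex.org/P4310319900",
        "https://openalex.org/P4310319965",
    ],
}
print(lineage_is_consistent(example), ancestors(example))
```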

parent_publisher

String: An OpenAlex ID linking to the direct parent of the publisher. This will be null
if the publisher's hierarchy_level is 0.

parent_publisher: "https://openalex.org/P4310311775"

roles

List: List of role objects, which include the role (one of institution, funder, or
publisher), the id (OpenAlex ID), and the works_count.

In many cases, a single organization does not fit neatly into one role. For example,
Yale University is a single organization that is a research university, funds research
studies, and publishes an academic journal. The roles property links the
OpenAlex entities together for a single organization, and includes counts for the
works associated with each role.

The roles list of an entity (Funder, Publisher, or Institution) always includes itself.
In the case where an organization only has one role, the roles will be a list of
length one, with itself as the only item.
roles: [
{
role: "funder",
id: "https://openalex.org/F4320308380",
works_count: 1004,
},
{
role: "publisher",
id: "https://openalex.org/P4310315589",
works_count: 13986,
},
{
role: "institution",
id: "https://openalex.org/I32971472",
works_count: 250031,
}
]
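The roles list makes it easy to jump from one view of an organization to another. The Python sketch below maps each role to its OpenAlex ID and picks the role with the most works, using the example above. It deliberately does not add works_count values across roles, because nothing here says the works behind each role are disjoint. The names ids_by_role and largest_role are hypothetical.

```python
def ids_by_role(roles: list) -> dict:
    """Map each role ("institution", "funder", "publisher") to its OpenAlex ID."""
    return {r["role"]: r["id"] for r in roles}


def largest_role(roles: list) -> str:
    """Return the role with the highest works_count."""
    return max(roles, key=lambda r: r["works_count"])["role"]


# The roles example above.
roles = [
    {"role": "funder", "id": "https://openalex.org/F4320308380", "works_count": 1004},
    {"role": "publisher", "id": "https://openalex.org/P4310315589", "works_count": 13986},
    {"role": "institution", "id": "https://openalex.org/I32971472", "works_count": 250031},
]
print(ids_by_role(roles)["institution"], largest_role(roles))
```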

sources_api_url

String: A URL that will get you a list of all the sources published by this publisher.

We express this as an API URL (instead of just listing the sources themselves)
because there might be thousands of sources linked to a publisher, and that's too
many to fit here.

sources_api_url: "https://api.openalex.org/sources?filter=host_organizati

summary_stats

Object: Citation metrics for this publisher

2yr_mean_citedness Float: The 2-year mean citedness for this publisher. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this publisher.

i10_index Integer: The i-10 index for this publisher.


While the h-index and the i-10 index are normally author-level metrics and the
2-year mean citedness is normally a journal-level metric, they can be calculated for
any set of papers, so we include them for publishers.

summary_stats: {
2yr_mean_citedness: 5.065784263815827,
h_index: 985,
i10_index: 176682
}
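Since these metrics can be computed for any set of papers, here is a Python sketch of the standard definitions applied to a list of citation counts. It illustrates the metrics and is not the OpenAlex pipeline. Returning None for an empty denominator is our own choice. The citation lists in the usage line are made up for illustration.

```python
from typing import List, Optional


def h_index(citations: List[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def i10_index(citations: List[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


def two_year_mean_citedness(citations_last_year: int, works_prior_two_years: int) -> Optional[float]:
    """Citations received last year by works published in the two years before
    that, divided by the number of those works (None if there are no such works)."""
    if works_prior_two_years == 0:
        return None
    return citations_last_year / works_prior_two_years


print(h_index([10, 8, 5, 4, 3]), i10_index([10, 8, 5, 4, 3]), two_year_mean_citedness(30, 12))
```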

updated_date

String: The last time anything in this publisher object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.

updated_date: "2021-12-25T14:04:30.578837"

works_count

Integer: The number of works published by this publisher.

works_count: 13789818
Get a single publisher
It's easy to get a publisher from the API with: /publishers/<entity_id>.
Here's an example:

Get the publisher with the OpenAlex ID P4310319965:


https://api.openalex.org/publishers/P4310319965

That will return a Publisher object, describing everything OpenAlex knows about
the publisher with that ID:

{
"id": "https://openalex.org/P4310319965",
"display_name": "Springer Nature",
"alternate_titles": [
"エイプレス",
"Springer Nature Group",
"施普林格-自然出版集团"
],
"hierarchy_level": 0,
// other fields removed for brevity
}

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.

External IDs
You can look up publishers using external IDs such as a Wikidata ID:

Get the publisher with Wikidata ID Q1479654:


https://api.openalex.org/publishers/wikidata:Q1479654

Available external IDs for publishers are:

External ID    URN
ROR            ror
Wikidata       wikidata
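Publisher objects store Wikidata IDs as full URLs (see ids.wikidata above), while the lookup example uses the bare Q-number. The Python sketch below accepts either form and builds the wikidata: lookup URL. The name publisher_by_wikidata_url is hypothetical, and the Q-number check is our own sanity test.

```python
API_BASE = "https://api.openalex.org"


def publisher_by_wikidata_url(wikidata_id: str) -> str:
    """Accept "Q1479654" or a full Wikidata entity URL and build the lookup URL."""
    qid = wikidata_id.strip().rstrip("/").rsplit("/", 1)[-1]  # keep the trailing Q-number
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {wikidata_id!r}")
    return f"{API_BASE}/publishers/wikidata:{qid}"


print(publisher_by_wikidata_url("https://www.wikidata.org/entity/Q746413"))
```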

Select fields
You can use select to limit the fields that are returned in a publisher object. More
details are here.

Display only the id and display_name for a publisher object


https://api.openalex.org/publishers/P4310319965?select=id,display_name
Get lists of publishers
You can get lists of publishers:

Get all publishers in OpenAlex


https://api.openalex.org/publishers

Which returns a response like this:

{
"meta": {
"count": 7207,
"db_response_time_ms": 26,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/P4310311775",
"display_name": "RELX Group",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/P4310320990",
"display_name": "Elsevier BV",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}

Page and sort publishers


By default we return 25 results per page. You can change this default and page
through publishers with the per-page and page parameters:

Get the second page of publishers results, with 50 results returned per page
https://api.openalex.org/publishers?per-page=50&page=2
You also can sort results with the sort parameter:

Sort publishers by display name, descending


https://api.openalex.org/publishers?sort=display_name:desc

Continue on to learn how you can filter and search lists of publishers.

Sample publishers
You can use sample to get a random batch of publishers. Read more about
sampling and how to add a seed value here.

Get 10 random publishers


https://api.openalex.org/publishers?sample=10

Select fields
You can use select to limit the fields that are returned in a list of publishers. More
details are here.

Display only the id, display_name, and alternate_titles within publishers results


https://api.openalex.org/publishers?select=id,display_name,alternate_titles
Filter publishers
You can filter publishers with the filter parameter:

Get publishers that are hierarchy level 0


https://api.openalex.org/publishers?filter=hierarchy_level:0

It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.

/publishers attribute filters


You can filter using these attributes of the Publisher entity object (click each one
to view their documentation on the Publisher object page):

cited_by_count

country_codes

hierarchy_level

ids.openalex (alias: openalex)

ids.ror (alias: ror)

ids.wikidata (alias: wikidata)

lineage: use this with a publisher ID to find that publisher and all of its
children

parent_publisher

summary_stats.2yr_mean_citedness (accepts float, null, !null, can use range
queries such as < >)

summary_stats.h_index (accepts integer, null, !null, can use range queries)

summary_stats.i10_index (accepts integer, null, !null, can use range queries)

works_count

/publishers convenience filters


These filters aren't attributes of the Publisher object, but they're included to
address some common use cases:

continent

Value: a String with a valid continent filter

Returns: publishers that are located in the chosen continent.

Get publishers that are located in South America


https://api.openalex.org/publishers?filter=continent:south_america

default.search

Value: a search string

This works the same as using the search parameter for Publishers.

display_name.search

Value: a search string

Returns: publishers with a display_name containing the given string; see the
search page for details.

Get publishers with names containing "elsevier":


https://api.openalex.org/publishers?filter=display_name.search:elsevier

In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.
Search publishers
The best way to search for publishers is to use the search query parameter, which
searches the display_name and alternate_titles fields. Example:

Search publishers' display_name and alternate_titles for "springer":


https://api.openalex.org/publishers?search=springer

You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.

Search a specific field


You can also use search as a filter, allowing you to fine-tune the fields you're
searching over. To do this, you append .search to the end of the property you are
filtering for:

Get publishers with "elsevier" in the display_name:


https://api.openalex.org/publishers?filter=display_name.search:elsevier

The following field can be searched as a filter within publishers:

Search filter Field that is searched

display_name.search display_name

You can also use the filter default.search, which works the same as using the
search parameter.

Autocomplete publishers
You can autocomplete publishers to create a very fast type-ahead style search
function:
Autocomplete publishers with "els" in the display_name:
https://api.openalex.org/autocomplete/publishers?q=els

This returns a list of publishers:

{
"results": [
{
"id": "https://openalex.org/P4310320990",
"display_name": "Elsevier BV",
"hint": null,
"cited_by_count": 407508754,
"works_count": 20311868,
"entity_type": "publisher",
"external_id": "https://www.wikidata.org/entity/Q746413"
},
...
]
}

Read more in the autocomplete page in the API guide.
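A type-ahead box typically calls this endpoint as the user types and shows the display names it gets back. The Python sketch below (hypothetical helper names, standard library only) builds the request URL and pulls (display_name, short ID) pairs out of a response shaped like the one above; actually fetching the URL is left out.

```python
import json
from urllib.parse import urlencode


def autocomplete_url(entity, text):
    # urlencode turns spaces into "+", e.g. q=national+sci
    return f"https://api.openalex.org/autocomplete/{entity}?" + urlencode({"q": text})


def suggestions(response_text):
    """Return (display_name, short OpenAlex ID) pairs from an autocomplete response."""
    data = json.loads(response_text)
    return [
        (result["display_name"], result["id"].rsplit("/", 1)[-1])
        for result in data["results"]
    ]


sample = """{"results": [{"id": "https://openalex.org/P4310320990",
  "display_name": "Elsevier BV", "hint": null,
  "entity_type": "publisher"}]}"""
print(autocomplete_url("publishers", "els"))  # https://api.openalex.org/autocomplete/publishers?q=els
print(suggestions(sample))                    # [('Elsevier BV', 'P4310320990')]
```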


Group publishers
You can group publishers with the group_by parameter:

Get counts of publishers by country_codes:


https://api.openalex.org/publishers?group_by=country_codes

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

/publishers group_by attributes


country_codes

hierarchy_level

lineage

summary_stats.2yr_mean_citedness

summary_stats.h_index

summary_stats.i10_index
Funders
Organizations that fund research

Funders are organizations that fund research. OpenAlex indexes about 32,000
funders. Funder data comes from Crossref, and is enhanced with data from
Wikidata and ROR.

Get a list of OpenAlex funders:


https://api.openalex.org/funders

Funders are connected to works through grants.

What's next
Learn more about what you can do with funders:

The Funder object


Get a single funder
Get lists of funders
Filter funders
Search funders
Group funders
Funder object
These are the fields in a funder object. When you use the API to get a single funder
or lists of funders, this is what's returned.

alternate_titles

List: A list of alternate titles for this funder.

alternate_titles: [
"US National Institutes of Health",
"Institutos Nacionales de la Salud",
"NIH"
]

cited_by_count

Integer: The total number of works that cite a work linked to this funder.

cited_by_count: 7823467

country_code

String: The country where this funder is located, represented as an ISO two-letter
country code.

country_code: "US"

counts_by_year

List: The values of works_count and cited_by_count for each of the last ten
years, binned by year. To put it another way: for every listed year, you can see how
many new works are linked to this funder, and how many times any work linked to
this funder was cited.

Years with zero citations and zero works have been removed, so you will need to
add those back in if you need them.

counts_by_year: [
{
year: 2021,
works_count: 4211,
cited_by_count: 120939
},
{
year: 2020,
works_count: 4363,
cited_by_count: 119531
},

// and so forth
]
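Because empty years are dropped, a chart or time series built directly from counts_by_year can silently skip years. One way to restore them, sketched in Python with a hypothetical helper name and a year range you choose yourself, is to fill each missing year with zero counts:

```python
def fill_counts_by_year(counts_by_year, first_year, last_year):
    """Return one entry per year from last_year down to first_year (newest
    first, like the example above), inserting zero counts for years that
    are missing from counts_by_year."""
    by_year = {entry["year"]: entry for entry in counts_by_year}
    return [
        by_year.get(year, {"year": year, "works_count": 0, "cited_by_count": 0})
        for year in range(last_year, first_year - 1, -1)
    ]


counts = [
    {"year": 2021, "works_count": 4211, "cited_by_count": 120939},
    {"year": 2019, "works_count": 3950, "cited_by_count": 101000},  # illustrative values
]
for entry in fill_counts_by_year(counts, 2019, 2021):
    print(entry["year"], entry["works_count"], entry["cited_by_count"])
```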

created_date

String: The date this Funder object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.

created_date: "2023-02-13"

description

String: A short description of this funder, taken from Wikidata.

description: "medical research organization in the United States"

display_name
String: The primary name of the funder.

display_name: "National Institutes of Health"

grants_count

Integer: The number of grants linked to this funder.

grants_count: 7109

homepage_url

String: The URL for this funder's primary homepage.

homepage_url: "http://www.nih.gov/"

id

String: The OpenAlex ID for this funder.

id: "https://openalex.org/F4320332161"

ids

Object: All the external identifiers that we know about for this funder. IDs are
expressed as URIs whenever possible. Possible ID types:

crossref String: this funder's Crossref ID


doi String: this funder's DOI

openalex String: this funder's OpenAlex ID

ror String: this funder's ROR ID


wikidata String: this funder's Wikidata ID

ids: {
openalex: "https://openalex.org/F4320332161",
ror: "https://ror.org/01cwqze88",
wikidata: "https://www.wikidata.org/entity/Q390551",
crossref: "100000002",
doi: "https://doi.org/10.13039/100000002"
}
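Most of these values are full URIs. If you need the bare identifiers, for example to build lookups such as /funders/wikidata:Q390551, you could strip the known prefixes. This is a sketch with a hypothetical helper name: the prefix list only covers the URI forms shown in this guide, and values without a known prefix (like the Crossref ID, or an integer) are returned unchanged.

```python
# URI prefixes seen in the ids examples in this guide (not an exhaustive list).
ID_PREFIXES = (
    "https://openalex.org/",
    "https://ror.org/",
    "https://www.wikidata.org/entity/",
    "https://www.wikidata.org/wiki/",
    "https://doi.org/",
)


def short_ids(ids):
    """Return a copy of an ids object with known URI prefixes removed."""
    short = {}
    for id_type, value in ids.items():
        if isinstance(value, str):
            for prefix in ID_PREFIXES:
                if value.startswith(prefix):
                    value = value[len(prefix):]
                    break
        short[id_type] = value
    return short


nih = {
    "openalex": "https://openalex.org/F4320332161",
    "ror": "https://ror.org/01cwqze88",
    "wikidata": "https://www.wikidata.org/entity/Q390551",
    "crossref": "100000002",
    "doi": "https://doi.org/10.13039/100000002",
}
print(short_ids(nih))
```

Stripping only the doi.org prefix (rather than everything up to the last slash) keeps the full DOI, 10.13039/100000002, intact.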

image_thumbnail_url

String: Same as image_url, but it's a smaller image.

This is usually a hotlink to a Wikimedia image. You can change the width=300
parameter in the URL if you want a different thumbnail size.

image_thumbnail_url: "https://commons.wikimedia.org/w/index.php?title=Spe

image_url

String: URL where you can get an image representing this funder. Usually this is a
hotlink to a Wikimedia image, and usually it's a seal or logo.

image_url: "https://commons.wikimedia.org/w/index.php?title=Special:Redir

roles

List: List of role objects, which include the role (one of institution, funder, or
publisher), the id (OpenAlex ID), and the works_count.

In many cases, a single organization does not fit neatly into one role. For example,
Yale University is a single organization that is a research university, funds research
studies, and publishes an academic journal. The roles property links the
OpenAlex entities together for a single organization, and includes counts for the
works associated with each role.

The roles list of an entity (Funder, Publisher, or Institution) always includes itself.
In the case where an organization only has one role, the roles will be a list of
length one, with itself as the only item.

roles: [
{
role: "funder",
id: "https://openalex.org/F4320308380",
works_count: 1004,
},
{
role: "publisher",
id: "https://openalex.org/P4310315589",
works_count: 13986,
},
{
role: "institution",
id: "https://openalex.org/I32971472",
works_count: 250031,
}
]
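Because roles links every OpenAlex entity for the same organization, you can use it to hop from a funder to the matching publisher or institution record. A minimal Python sketch with hypothetical helper names, using the example list above; it assumes each role appears at most once in the list (if a role repeated, the last entry would win).

```python
def role_ids(roles):
    """Map each role name ("funder", "publisher", "institution") to its OpenAlex ID."""
    return {entry["role"]: entry["id"] for entry in roles}


def other_roles(entity_id, roles):
    """Drop the entity itself, which is always present in its own roles list."""
    return [entry for entry in roles if entry["id"] != entity_id]


roles = [
    {"role": "funder", "id": "https://openalex.org/F4320308380", "works_count": 1004},
    {"role": "publisher", "id": "https://openalex.org/P4310315589", "works_count": 13986},
    {"role": "institution", "id": "https://openalex.org/I32971472", "works_count": 250031},
]
institution_id = role_ids(roles)["institution"]
print(institution_id.rsplit("/", 1)[-1])  # I32971472
print([r["role"] for r in other_roles("https://openalex.org/F4320308380", roles)])  # ['publisher', 'institution']
```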

summary_stats

Object: Citation metrics for this funder

2yr_mean_citedness Float: The 2-year mean citedness for this funder. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this funder.

i10_index Integer: The i-10 index for this funder.

While the h-index and the i-10 index are normally author-level metrics and the
2-year mean citedness is normally a journal-level metric, they can be calculated for
any set of papers, so we include them for funders.
summary_stats: {
2yr_mean_citedness: 5.065784263815827,
h_index: 985,
i10_index: 176682
}
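To make the two index definitions concrete, here is how they could be computed from a list of per-work citation counts. This Python sketch uses the standard definitions (h-index: the largest h such that h works have at least h citations each; i10-index: the number of works with at least 10 citations). It is an illustration, not OpenAlex's own implementation.

```python
def h_index(citation_counts):
    """Largest h such that at least h works have >= h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts are sorted, so no later work can qualify
    return h


def i10_index(citation_counts):
    """Number of works with at least 10 citations."""
    return sum(1 for cites in citation_counts if cites >= 10)


papers = [25, 8, 5, 3, 3]  # citation counts for a hypothetical set of works
print(h_index(papers), i10_index(papers))  # 3 1
```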

updated_date

String: The last time anything in this funder object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.

updated_date: "2023-04-21T16:54:19.012138"

works_count

Integer: The number of works linked to this funder.

works_count: 260210
Get a single funder
It's easy to get a funder from the API with: /funders/<entity_id>. Here's an
example:

Get the funder with the OpenAlex ID F4320332161:


https://api.openalex.org/funders/F4320332161

That will return a Funder object, describing everything OpenAlex knows about the
funder with that ID:

{
"id": "https://openalex.org/F4320332161",
"display_name": "National Institutes of Health",
"alternate_titles": [
"US National Institutes of Health",
"Institutos Nacionales de la Salud",
"NIH"
],
// other fields removed for brevity
}

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
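For example, to look up more than 50 funders you could split your IDs into batches and send one OR-filtered request per batch. This Python sketch assumes the openalex filter alias listed under /funders attribute filters, and it sets per-page to the batch size so a full batch of 50 is not cut off by the default page size of 25. The helper name is hypothetical.

```python
MAX_IDS_PER_REQUEST = 50


def batched_id_urls(entity, ids, batch_size=MAX_IDS_PER_REQUEST):
    """Return one list URL per batch of IDs, joining each batch with "|" (OR)."""
    if not 1 <= batch_size <= MAX_IDS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 50")
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        urls.append(
            f"https://api.openalex.org/{entity}"
            f"?filter=openalex:{'|'.join(batch)}&per-page={len(batch)}"
        )
    return urls


print(batched_id_urls("funders", ["F4320332161", "F4320306076"]))
```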

External IDs
You can look up funders using external IDs such as a Wikidata ID:

Get the funder with Wikidata ID Q390551:


https://api.openalex.org/funders/wikidata:Q390551

Available external IDs for funders are:

External ID URN

ROR ror

Wikidata wikidata

Select fields
You can use select to limit the fields that are returned in a funder object. More
details are here.

Display only the id and display_name for a funder object


https://api.openalex.org/funders/F4320332161?select=id,display_name
Get lists of funders
You can get lists of funders:

Get all funders in OpenAlex


https://api.openalex.org/funders

Which returns a response like this:

{
"meta": {
"count": 32437,
"db_response_time_ms": 26,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/F4320321001",
"display_name": "National Natural Science Foundation of China",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/F4320306076",
"display_name": "National Science Foundation",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}

Page and sort funders


By default we return 25 results per page. You can change this default and page
through funders with the per-page and page parameters:

Get the second page of funders results, with 50 results returned per page
https://api.openalex.org/funders?per-page=50&page=2
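Since meta.count tells you how many results a list has, you can work out how many pages to request. This Python sketch (hypothetical helper name) only builds the page URLs; the API guide's section on paging describes limits on this kind of basic paging for very large lists, so check it before walking every page.

```python
import math


def page_urls(entity, total_count, per_page=25):
    """Build the URL of every page of a list, given meta.count from a first response."""
    pages = math.ceil(total_count / per_page)
    return [
        f"https://api.openalex.org/{entity}?per-page={per_page}&page={page}"
        for page in range(1, pages + 1)
    ]


# meta.count in the example funders response above is 32437
urls = page_urls("funders", 32437, per_page=50)
print(len(urls))  # 649
print(urls[1])    # https://api.openalex.org/funders?per-page=50&page=2
```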
You also can sort results with the sort parameter:

Sort funders by display name, descending


https://api.openalex.org/funders?sort=display_name:desc

Continue on to learn how you can filter and search lists of funders.

Sample funders
You can use sample to get a random batch of funders. Read more about sampling
and how to add a seed value here.

Get 10 random funders


https://api.openalex.org/funders?sample=10

Select fields
You can use select to limit the fields that are returned in a list of funders. More
details are here.

Display only the id, display_name, and alternate_titles within funders results


https://api.openalex.org/funders?select=id,display_name,alternate_titles
Filter funders
You can filter funders with the filter parameter:

Get funders that are located in Canada


https://api.openalex.org/funders?filter=country_code:ca

It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.

/funders attribute filters


You can filter using these attributes of the Funder entity object (click each one to
view their documentation on the Funder object page):

cited_by_count

country_code

grants_count

ids.openalex (alias: openalex)

ids.ror (alias: ror)

ids.wikidata (alias: wikidata)

summary_stats.2yr_mean_citedness (accepts float, null, !null, can use range queries such as < and >)

summary_stats.h_index (accepts integer, null, !null, can use range queries)

summary_stats.i10_index (accepts integer, null, !null, can use range queries)

works_count

/funders convenience filters


These filters aren't attributes of the Funder object, but they're included to address
some common use cases:
continent

Value: a String with a valid continent filter

Returns: funders that are located in the chosen continent.

Get funders that are located in South America


https://api.openalex.org/funders?filter=continent:south_america

default.search

Value: a search string

This works the same as using the search parameter for Funders.

description.search

Value: a search string

Returns: funders with a description containing the given string; see the search
page for details.

Get funders with description containing "health":


https://api.openalex.org/funders?filter=description.search:health

display_name.search

Value: a search string

Returns: funders with a display_name containing the given string; see the search
page for details.

Get funders with names containing "health":


https://api.openalex.org/funders?filter=display_name.search:health
In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.

is_global_south

Value: a Boolean (true or false)

Returns: funders that are located in the Global South.

Get funders that are located in the Global South


https://api.openalex.org/funders?filter=is_global_south:true
Search funders
The best way to search for funders is to use the search query parameter, which
searches the display_name, the alternate_titles, and the description fields.
Example:

Search funders' display_name, alternate_titles, and description for "health":


https://api.openalex.org/funders?search=health

You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.

Search a specific field


You can also use search as a filter, allowing you to fine-tune the fields you're
searching over. To do this, you append .search to the end of the property you are
filtering for:

Get funders with "florida" in the display_name:


https://api.openalex.org/funders?filter=display_name.search:florida

The following fields can be searched as a filter within funders:

Search filter Field that is searched

display_name.search display_name

description.search description

You can also use the filter default.search, which works the same as using the
search parameter.
Autocomplete funders
You can autocomplete funders to create a very fast type-ahead style search
function:

Autocomplete funders with "national sci" in the display_name:


https://api.openalex.org/autocomplete/funders?q=national+sci

This returns a list of funders with the funder location set as the hint:

{
"results": [
{
"id": "https://openalex.org/F4320306076",
"display_name": "National Science Foundation",
"hint": null,
"cited_by_count": 6705777,
"works_count": 264303,
"entity_type": "funder",
"external_id": "https://ror.org/021nxhr62"
},
...
]
}

Read more in the autocomplete page in the API guide.


Group funders
You can group funders with the group_by parameter:

Get counts of funders by country_code:


https://api.openalex.org/funders?group_by=country_code

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

/funders group_by attributes


cited_by_count

continent

country_code

grants_count

is_global_south

summary_stats.2yr_mean_citedness

summary_stats.h_index

summary_stats.i10_index

works_count
Geo
Where things are in the world

While geo is not a core entity within OpenAlex, geography is central to categorizing
scholarly data. That's why OpenAlex uses United Nations data to divide the globe
into continents and regions that make filtering data easier.

Here are some ways you can filter and group by continents and the Global South.

Get institutions located in South America


https://api.openalex.org/institutions?filter=continent:south_america

Get works where at least one author's institution is located in the Global South
https://api.openalex.org/works?filter=institutions.is_global_south:true

Group highly-cited authors by their last known institution's continent


https://api.openalex.org/authors?group-by=last_known_institution.continent&filter=cited_by_count:>100

What's next
Learn more about what you can do with geo:

Continents
Regions
Continents
Countries are mapped to continents using data from the United Nations Statistics
Division. You can see the actual mapping used by the API here.

Filter by continent

There are three ways to use continent filters:

Endpoint Format

Authors /authors?filter=last_known_institution.continent:<continent>

Institutions /institutions?filter=continent:<continent>

Works /works?filter=institutions.continent:<continent>

Available values for the <continent> filter are:

Continent Filter Value Canonical ID

Africa africa Q15

Antarctica antarctica Q51

Asia asia Q48

Europe europe Q46

North America north_america Q49

Oceania oceania Q55643

South America south_america Q18
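The filter values and canonical IDs in this table are enough to build a continent filter for each endpoint, and to turn a group-by key such as "Q46" back into a filter value. A Python sketch with hypothetical helper names; the per-endpoint attribute names are taken from the endpoint table above.

```python
# Filter values and canonical (Wikidata) IDs from the continent table.
CONTINENTS = {
    "africa": "Q15",
    "antarctica": "Q51",
    "asia": "Q48",
    "europe": "Q46",
    "north_america": "Q49",
    "oceania": "Q55643",
    "south_america": "Q18",
}

# The attribute that carries the continent on each endpoint.
CONTINENT_FIELD = {
    "authors": "last_known_institution.continent",
    "institutions": "continent",
    "works": "institutions.continent",
}


def continent_filter_url(endpoint, continent):
    if continent not in CONTINENTS:
        raise ValueError(f"unknown continent filter value: {continent!r}")
    field = CONTINENT_FIELD[endpoint]
    return f"https://api.openalex.org/{endpoint}?filter={field}:{continent}"


def continent_from_key(key):
    """Translate a group-by key such as "Q46" back into a filter value."""
    for value, qid in CONTINENTS.items():
        if qid == key:
            return value
    return None


print(continent_filter_url("institutions", "south_america"))
# https://api.openalex.org/institutions?filter=continent:south_america
```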

Group by continent

You can group by continent.


Group institutions by continent
https://api.openalex.org/institutions?group-by=continent

Response:

{
key: "Q46",
key_display_name: "Europe",
count: 41382
},
{
key: "Q49",
key_display_name: "North America",
count: 37458
},
{
key: "Q48",
key_display_name: "Asia",
count: 20432
}...

Groups are available in these endpoints:

Endpoint Format

Authors /authors?group-by=last_known_institution.continent

Institutions /institutions?group-by=continent

Works /works?group-by=institutions.continent
Regions
Global South
The Global South is a term used to identify regions within Latin America, Asia,
Africa, and Oceania. Our source for this group of countries is the United Nations
Finance Center for South-South Cooperation.

Filter by Global South

You can filter Global South countries by using the boolean filter is_global_south
in the following endpoints:

Endpoint Format

Authors /authors?filter=last_known_institution.is_global_south:<boolean>

Institutions /institutions?filter=is_global_south:<boolean>

Works /works?filter=institutions.is_global_south:<boolean>

Group by Global South

You can also group by the Global South:

Endpoint Format

Authors /authors?group-by=last_known_institution.is_global_south

Institutions /institutions?group-by=is_global_south

Works /works?group-by=institutions.is_global_south

Tips & Tricks


To see country-by-country details for a geographic region, filter by region, then
group by country_code.

Get number of authors with last known institution in the Global South, by country
https://api.openalex.org/authors?filter=last_known_institution.is_global_south:true&group-by=last_known_institution.country_code

Response:

// all countries are in the Global South


{
key: "CN",
key_display_name: "China",
count: 13926441
},
{
key: "IN",
key_display_name: "India",
count: 2632721
},
{
key: "BR",
key_display_name: "Brazil",
count: 2089957
}...
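To compare the groups, you could sort them and compute each country's share. Note that this share is relative to the groups you add up, not to all authors, since a group-by response may not list every group. A Python sketch with a hypothetical helper name, using the three entries above:

```python
def group_shares(groups):
    """Sort group-by entries by count (largest first) and attach each entry's
    share of the summed counts."""
    total = sum(g["count"] for g in groups)
    ranked = sorted(groups, key=lambda g: g["count"], reverse=True)
    return [
        (g["key"], g["key_display_name"], g["count"], g["count"] / total if total else 0.0)
        for g in ranked
    ]


groups = [
    {"key": "CN", "key_display_name": "China", "count": 13926441},
    {"key": "IN", "key_display_name": "India", "count": 2632721},
    {"key": "BR", "key_display_name": "Brazil", "count": 2089957},
]
for key, name, count, share in group_shares(groups):
    print(f"{key} {name}: {count} ({share:.1%})")
```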
Concepts
These are the original OpenAlex Concepts, which are being deprecated in favor of
Topics. We will continue to provide these Concepts for Works, but we will not be
actively maintaining, updating, or providing support for these concepts. Unless you
have a good reason to be relying on them, we encourage you to look into Topics
instead.

Concepts are abstract ideas that works are about. OpenAlex indexes about 65k
concepts.

Get all the concepts used by OpenAlex:


https://api.openalex.org/concepts

The Canonical External ID for OpenAlex concepts is the Wikidata ID, and each of
our concepts has one, because all OpenAlex concepts are also Wikidata concepts.

Concepts are hierarchical, like a tree. There are 19 root-level concepts, and six
layers of descendants branching out from them, containing about 65 thousand
concepts all told. This concept tree is a modified version of the one created by
MAG.

You can view all the concepts and their position in the tree as a spreadsheet here.
About 85% of works are tagged with at least one concept (here's the breakdown of
concept counts per work).

How concepts are assigned


Each work is tagged with multiple concepts, based on the title, abstract, and the
title of its host venue. The tagging is done using an automated classifier that was
trained on MAG’s corpus; you can read more about the development and operation
of this classifier in Automated concept tagging for OpenAlex, an open index of
scholarly articles. You can implement the classifier yourself using our models and
code.
A score is available for each concept in a work, showing the classifier's confidence
in choosing that concept. However, when assigning a lower-level child concept, we
also assign all of its parent concepts all the way up to the root. This means that
some concept assignment scores will be 0.0. The tagger adds concepts to works
written in different languages, but it is optimized for English.
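The ancestor step can be illustrated with a small Python sketch. It assumes you already have classifier scores for some concepts and a map from each concept to its parents; both inputs, the helper name, and the concept IDs in the example are hypothetical. Every ancestor the classifier did not score itself is added with a score of 0.0, which is why some assigned concepts carry that score. A concept may have more than one parent, so the walk handles shared ancestors.

```python
def with_ancestors(scored, parents):
    """Add every ancestor of the scored concepts to the result.

    scored:  {concept_id: classifier score} for directly assigned concepts
    parents: {concept_id: [parent_id, ...]} describing the concept tree
    Ancestors without their own score are added with a score of 0.0.
    """
    result = dict(scored)
    to_visit = list(scored)
    while to_visit:
        concept = to_visit.pop()
        for parent in parents.get(concept, []):
            if parent not in result:
                result[parent] = 0.0
                to_visit.append(parent)
    return result


# Hypothetical three-level chain: C3 -> C2 -> C1 (root)
tree = {"C3": ["C2"], "C2": ["C1"]}
print(with_ancestors({"C3": 0.8}, tree))  # {'C3': 0.8, 'C2': 0.0, 'C1': 0.0}
```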

Concepts are linked to works via the concepts property, and to other concepts via
the ancestors and related_concepts properties.

What's next
Learn more about what you can do with concepts:

The Concept object


Get a single concept
Get lists of concepts
Filter concepts
Search concepts
Group concepts
Concept object
These are the original OpenAlex Concepts, which are being deprecated in favor of
Topics. We will continue to provide these Concepts for Works, but we will not be
actively maintaining, updating, or providing support for these concepts. Unless you
have a good reason to be relying on them, we encourage you to look into Topics
instead.

These are the fields in a concept object. When you use the API to get a single
concept or lists of concepts, this is what's returned.

ancestors

List: List of concepts that this concept descends from, as dehydrated Concept
objects. See the concept tree section for more details on how the different layers of
concepts work together.

ancestors: [
{
id: "https://openalex.org/C2522767166",
wikidata: "https://www.wikidata.org/wiki/Q2374463",
display_name: "Data science",
level: 1
},
{
id: "https://openalex.org/C161191863",
wikidata: "https://www.wikidata.org/wiki/Q199655",
display_name: "Library science",
level: 1
},

// and so forth
]

cited_by_count
Integer: The number of citations to works that have been tagged with this concept. Or
less formally: the number of citations to this concept.

For example, if there are just two works tagged with this concept and one of them
has been cited 10 times, and the other has been cited 1 time, cited_by_count for
this concept would be 11 .

cited_by_count: 20248

counts_by_year

List: The values of works_count and cited_by_count for each of the last ten
years, binned by year. To put it another way: for every listed year, you can see how
many new works were tagged with this concept, and how many times any work
tagged with this concept got cited.

Years with zero citations and zero works have been removed, so you will need to
add those back in if you need them.

counts_by_year: [
{
year: 2021,
works_count: 4211,
cited_by_count: 120939
},
{
year: 2020,
works_count: 4363,
cited_by_count: 119531
},

// and so forth
]

created_date
String: The date this Concept object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.

created_date: "2017-08-08"

description

String: A brief description of this concept.

description: "study of alternative metrics for analyzing and informing sc

display_name

String: The English-language label of the concept.

display_name: "Altmetrics"

id

String: The OpenAlex ID for this concept.

id: "https://openalex.org/C2778407487"

ids

Object: All the external identifiers that we know about for this concept. IDs are
expressed as URIs whenever possible. Possible ID types:

mag (Integer: this concept's Microsoft Academic Graph ID)


openalex (String: this concept's OpenAlex ID. Same as Concept.id)
umls_cui (List: this concept's Unified Medical Language System Concept
Unique Identifiers)
umls_aui (List: this concept's Unified Medical Language System Atom Unique
Identifiers)
wikidata (String: this concept's Wikidata ID. Same as Concept.wikidata)
wikipedia (String: this concept's Wikipedia page URL)

Many concepts are missing one or more ID types (either because we don't know the
ID, or because it was never assigned). Keys for null IDs are not displayed.

ids: {
openalex: "https://openalex.org/C2778407487",
wikidata: "https://www.wikidata.org/wiki/Q14565201",
wikipedia: "https://en.wikipedia.org/wiki/Altmetrics",
mag: 2778407487
}

image_thumbnail_url

String: Same as image_url, but it's a smaller image.

image_thumbnail_url: "https://upload.wikimedia.org/wikipedia/commons/thum

image_url

String: URL where you can get an image representing this concept, where available.
Usually this is hosted on Wikipedia.

image_url: "https://upload.wikimedia.org/wikipedia/commons/f/f1/Altmetric

international

Object: This concept's display name in many languages, derived from article titles
on each language's Wikipedia. See the Wikidata entry for "Java Bytecode" for
example source data.
display_name (Object)
key (String): language code in wikidata language code format. Full list of
languages is here.
value (String): display_name in the given language

international: {
display_name: {
ca: "Altmetrics",
...
}
}
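If you show concept names to users in other languages, you could read them from international.display_name and fall back when a language is missing. A Python sketch; the helper name and the fallback order (requested language, then "en", then the English display_name field) are assumptions, and "xx" in the example is a placeholder for a language with no entry.

```python
def localized_name(concept, lang, fallback_lang="en"):
    """Pick a display name in `lang`, falling back to `fallback_lang`, then to
    the English-language display_name field."""
    names = concept.get("international", {}).get("display_name", {})
    return names.get(lang) or names.get(fallback_lang) or concept.get("display_name")


altmetrics = {
    "display_name": "Altmetrics",
    "international": {"display_name": {"ca": "Altmetrics"}},
}
print(localized_name(altmetrics, "ca"))  # Altmetrics (from international)
print(localized_name(altmetrics, "xx"))  # Altmetrics (falls back to display_name)
```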

level

Integer: The level in the concept tree where this concept lives. Lower-level
concepts are more general, and higher-level concepts are more specific. Computer
Science has a level of 0; Java Bytecode has a level of 5. Level 0 concepts have no
ancestors and level 5 concepts have no descendants.

level: 2

related_concepts

List: Concepts that are similar to this one. Each listed concept is a dehydrated
Concept object, with one additional attribute:

score (Float): The strength of association between this concept and the listed
concept, on a scale of 0-100.
related_concepts: [
{
id: "https://openalex.org/C2778793908",
wikidata: null,
display_name: "Citation impact",
level: 3,
score: 4.56749
},
{
id: "https://openalex.org/C2779455604",
wikidata: null,
display_name: "Impact factor",
level: 2,
score: 4.46396
}

// and so forth
]

summary_stats

Object: Citation metrics for this concept

2yr_mean_citedness Float: The 2-year mean citedness for this concept. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this concept.

i10_index Integer: The i-10 index for this concept.

While the h-index and the i-10 index are normally author-level metrics and the
2-year mean citedness is normally a journal-level metric, they can be calculated for
any set of papers, so we include them for concepts.

summary_stats: {
2yr_mean_citedness: 1.5295340589458237,
h_index: 105,
i10_index: 5045
}
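The 2-year mean citedness definition above can be written out as code. In this Python sketch (hypothetical function name and input format), each work is a publication year plus a map from year to the citations received in that year. Only works published two or three years before current_year are counted, and only citations received in the year before current_year go into the numerator. It follows the wording of the definition and is not OpenAlex's own implementation.

```python
def two_year_mean_citedness(works, current_year):
    """works: iterable of (publication_year, {year: citations received that year}).

    Numerator:   citations received in (current_year - 1) by works in the window
    Denominator: number of works published in (current_year - 3) or (current_year - 2)
    Returns None when no works fall in the window.
    """
    citing_year = current_year - 1
    window = {current_year - 3, current_year - 2}
    n_works = 0
    n_citations = 0
    for publication_year, citations_by_year in works:
        if publication_year in window:
            n_works += 1
            n_citations += citations_by_year.get(citing_year, 0)
    return n_citations / n_works if n_works else None


# Hypothetical works; for 2024 the window is 2021-2022 and citations come from 2023
works = [(2021, {2023: 4}), (2022, {2023: 2, 2024: 9}), (2020, {2023: 100})]
print(two_year_mean_citedness(works, 2024))  # 3.0
```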
updated_date

String: The last time anything in this concept object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.

updated_date: "2021-12-25T14:04:30.578837"

wikidata

String: The Wikidata ID for this concept. This is the Canonical External ID for
concepts.

All OpenAlex concepts have a Wikidata ID, because all OpenAlex concepts are also
Wikidata concepts.

wikidata: "https://www.wikidata.org/wiki/Q14565201"

works_api_url

String: A URL that will get you a list of all the works tagged with this concept.

We express this as an API URL (instead of just listing the works themselves)
because there might be millions of works tagged with this concept, and that's too
many to fit here.

works_api_url: "https://api.openalex.org/works?filter=concept.id:C27784074

works_count

Integer: The number of works tagged with this concept.


works_count: 3078

The DehydratedConcept object


The DehydratedConcept is a stripped-down Concept object, with most of its
properties removed to save weight. Its only remaining properties are:

display_name

id

level

wikidata
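If you store full Concept objects but need to embed them elsewhere the way OpenAlex does (for example in ancestors), you could reduce them to these four properties. A Python sketch; the helper name is hypothetical:

```python
DEHYDRATED_FIELDS = ("display_name", "id", "level", "wikidata")


def dehydrate(concept):
    """Keep only the DehydratedConcept properties; missing ones become None."""
    return {field: concept.get(field) for field in DEHYDRATED_FIELDS}


medicine = {
    "id": "https://openalex.org/C71924100",
    "wikidata": "https://www.wikidata.org/wiki/Q11190",
    "display_name": "Medicine",
    "level": 0,
    "related_concepts": [],  # placeholder extra field; dropped by dehydrate()
}
print(dehydrate(medicine))
```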
Get a single concept
These are the original OpenAlex Concepts, which are being deprecated in favor of
Topics. We will continue to provide these Concepts for Works, but we will not be
actively maintaining, updating, or providing support for these concepts. Unless you
have a good reason to be relying on them, we encourage you to look into Topics
instead.

It's easy to get a concept from the API with: /concepts/<entity_id>. Here's an
example:

Get the concept with the OpenAlex ID C71924100:


https://api.openalex.org/concepts/C71924100

That will return a Concept object, describing everything OpenAlex knows about
the concept with that ID:

{
"id": "https://openalex.org/C71924100",
"wikidata": "https://www.wikidata.org/wiki/Q11190",
"display_name": "Medicine",
"level": 0,
"description": "field of study for diagnosing, treating and preventin
// other fields removed for brevity
}

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.

External IDs
You can look up concepts using external IDs such as a Wikidata ID:

Get the concept with Wikidata ID Q11190:


https://api.openalex.org/concepts/wikidata:Q11190
Available external IDs for concepts are:

External ID URN

Microsoft Academic Graph (MAG) mag

Wikidata wikidata

Select fields
You can use select to limit the fields that are returned in a concept object. More
details are here.

Display only the id and display_name for a concept object


https://api.openalex.org/concepts/C71924100?select=id,display_name
Get lists of concepts
These are the original OpenAlex Concepts, which are being deprecated in favor of
Topics. We will continue to provide these Concepts for Works, but we will not be
actively maintaining, updating, or providing support for these concepts. Unless you
have a good reason to be relying on them, we encourage you to look into Topics
instead.

You can get lists of concepts:

Get all concepts in OpenAlex


https://api.openalex.org/concepts

Which returns a response like this:

{
"meta": {
"count": 65073,
"db_response_time_ms": 26,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/C41008148",
"wikidata": "https://www.wikidata.org/wiki/Q21198",
"display_name": "Computer science",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/C71924100",
"wikidata": "https://www.wikidata.org/wiki/Q11190",
"display_name": "Medicine",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Page and sort concepts
By default we return 25 results per page. You can change this default and page
through concepts with the per-page and page parameters:

Get the second page of concepts results, with 50 results returned per page
https://api.openalex.org/concepts?per-page=50&page=2

You also can sort results with the sort parameter:

Sort concepts by cited by count, descending


https://api.openalex.org/concepts?sort=cited_by_count:desc

Continue on to learn how you can filter and search lists of concepts.

Sample concepts
You can use sample to get a random batch of concepts. Read more about
sampling and how to add a seed value here.

Get 10 random concepts


https://api.openalex.org/concepts?sample=10

Select fields
You can use select to limit the fields that are returned in a list of concepts. More
details are here.

Display only the id, display_name, and description within concepts results

https://api.openalex.org/concepts?select=id,display_name,description

Filter concepts
You can filter concepts with the filter parameter:

Get concepts that are at level 0 (top level)


https://api.openalex.org/concepts?filter=level:0

It's best to read about filters before trying these out. It will show you how to combine
filters and build AND, OR, or negation queries.

/concepts attribute filters


You can filter using these attributes of the Concept object (each one is documented on
the Concept object page):

ancestors.id

cited_by_count

ids.openalex (alias: openalex)

level

summary_stats.2yr_mean_citedness (accepts float, null, !null, can use range queries such as < >)

summary_stats.h_index (accepts integer, null, !null, can use range queries)

summary_stats.i10_index (accepts integer, null, !null, can use range queries)

works_count

/concepts convenience filters


These filters aren't attributes of the Concept object, but they're included to
address some common use cases:

default.search

Value: a search string

This works the same as using the search parameter for Concepts.

display_name.search

Value: a search string

Returns: concepts with a display_name containing the given string; see the search
page for details.

Get concepts with display_name containing "electrodynamics":


https://api.openalex.org/concepts?filter=display_name.search:electrodynamics

In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.

has_wikidata

Value: a Boolean ( true or false )

Returns: concepts that have or lack a Wikidata ID, depending on the given value.
For now, all concepts in OpenAlex do have Wikidata IDs.

Get concepts without Wikidata IDs:


https://api.openalex.org/concepts?filter=has_wikidata:false

Search concepts
The best way to search for concepts is to use the search query parameter, which
searches the display_name and description fields. Example:

Search concepts' display_name and description for "artificial intelligence":


https://api.openalex.org/concepts?search=artificial intelligence

You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.

Search a specific field


You can also use search as a filter, allowing you to fine-tune the fields you're
searching over. To do this, you append .search to the end of the property you are
filtering for:

Get concepts with "medical" in the display_name:

https://api.openalex.org/concepts?filter=display_name.search:medical

The following field can be searched as a filter within concepts:

Search filter          Field that is searched
display_name.search    display_name

You can also use the filter default.search, which works the same as using the
search parameter.

Autocomplete concepts
You can autocomplete concepts to create a very fast type-ahead style search
function:

Autocomplete concepts with "comp" in the display_name:

https://api.openalex.org/autocomplete/concepts?q=comp

This returns a list of concepts with the description set as the hint:

{
"results": [
{
"id": "https://openalex.org/C41008148",
"display_name": "Computer science",
"hint": "theoretical study of the formal foundation enabling the
"cited_by_count": 392939277,
"works_count": 76722605,
"entity_type": "concept",
"external_id": "https://www.wikidata.org/wiki/Q21198"
},
...
]
}

Read more in the autocomplete page in the API guide.


Group concepts
You can group concepts with the group_by parameter:

Get counts of concepts by level:

https://api.openalex.org/concepts?group_by=level

Or you can group using one of the attributes below.

It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.

/concepts group_by attributes


ancestors.id

cited_by_count

has_wikidata

level

summary_stats.2yr_mean_citedness

summary_stats.h_index

summary_stats.i10_index

works_count
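Grouped responses are easy to reshape in client code. Here's a minimal Python sketch, assuming each entry in the response's group_by list carries key, key_display_name, and count fields (the group-by page has the authoritative format); group_counts is a hypothetical helper, and the counts in the example are placeholders, not real OpenAlex numbers.

```python
from typing import Any, Dict


def group_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each group's label to its count.

    Hypothetical helper; assumes group_by entries shaped like
    {"key": ..., "key_display_name": ..., "count": ...}.
    """
    counts: Dict[str, int] = {}
    for group in response.get("group_by", []):
        # Prefer the human-readable label; fall back to the raw key.
        label = str(group.get("key_display_name") or group.get("key"))
        counts[label] = int(group["count"])
    return counts


# Placeholder response shaped like /concepts?group_by=level (counts are not real).
example = {
    "group_by": [
        {"key": "0", "key_display_name": "0", "count": 100},
        {"key": "1", "key_display_name": "1", "count": 200},
    ]
}
print(group_counts(example))  # {'0': 100, '1': 200}
```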
Aboutness endpoint (/text)
You can use the /text API endpoint to tag your own free text with OpenAlex's
"aboutness" assignments—topics, keywords, and concepts.

Accepts a title and optional abstract in the GET params or as a POST request.
The results are straight from the model, with 0 values truncated.

Examples

Get OpenAlex Keywords for your text


https://api.openalex.org/text/keywords?title=type%201%20diabetes%20research%20for%20children

Get OpenAlex Topics for your text

https://api.openalex.org/text/topics?title=type%201%20diabetes%20research%20for%20children

Get OpenAlex Concepts for your text

https://api.openalex.org/text/concepts?title=type%201%20diabetes%20research%20for%20children

Get all of the above in one request

https://api.openalex.org/text?title=type%201%20diabetes%20research%20for%20children

Example response for that last one:


{
meta: {
keywords_count: 5,
topics_count: 3,
concepts_count: 3
},
keywords: [
{
id: "https://openalex.org/keywords/type-1-diabetes",
display_name: "Type 1 Diabetes",
score: 0.677
},
// more keyword objects with scores
],
primary_topic: {
id: "https://openalex.org/T10560",
display_name: "Management of Diabetes Mellitus and Hypoglycemia",
score: 0.995
// more information about the primary topic, removed for brevity
},
topics: [
// list of topic objects with scores
],
concepts: [
// list of concept objects with scores
]
}

Queries are limited to between 20 and 2,000 characters. The endpoints are rate
limited to 1 request per second and 1,000 requests per day.
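Here's a minimal Python sketch of calling the aboutness endpoint politely. It assumes the 20 to 2,000 character limit applies to the title and abstract combined, and that an abstract can be sent as an abstract GET parameter next to title; text_url and MinIntervalThrottle are hypothetical names. The throttle only enforces the one-request-per-second limit; staying under the daily cap is left to the caller.

```python
import time
from typing import Optional
from urllib.parse import quote, urlencode

TEXT_ENDPOINT = "https://api.openalex.org/text"
KINDS = ("", "keywords", "topics", "concepts")  # "" means all three at once


def text_url(title: str, abstract: Optional[str] = None, kind: str = "") -> str:
    """Build a GET URL for /text, /text/keywords, /text/topics or /text/concepts."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    # Assumption: the 20-2000 character limit covers title + abstract together.
    length = len(title) + len(abstract or "")
    if not 20 <= length <= 2000:
        raise ValueError("text must be between 20 and 2000 characters")
    params = {"title": title}
    if abstract:
        params["abstract"] = abstract  # assumed parameter name
    base = f"{TEXT_ENDPOINT}/{kind}" if kind else TEXT_ENDPOINT
    # quote_via=quote encodes spaces as %20, matching the examples above.
    return f"{base}?{urlencode(params, quote_via=quote)}"


class MinIntervalThrottle:
    """Sleep as needed so successive calls are at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self._last = float("-inf")

    def wait(self) -> None:
        delay = self._last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


print(text_url("type 1 diabetes research for children", kind="topics"))
# https://api.openalex.org/text/topics?title=type%201%20diabetes%20research%20for%20children
```

Call `throttle.wait()` before each request so that a loop over many titles never exceeds one request per second.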
How to use the API
API Overview
The API is the primary way to get OpenAlex data. It's free and requires no
authentication. The daily limit for API calls is 100,000 requests per user per day. For
best performance, add your email to all API requests with the mailto parameter, like
mailto=you@example.com.

Learn more about the API


Get single entities
Get lists of entities — Learn how to use paging, filtering, and sorting
Get groups of entities — Group and count entities in different ways
Rate limits and authentication — Learn about joining the polite pool
Tutorials — Hands-on examples with code

Client Libraries
There are several third-party libraries you can use to get data from OpenAlex:

openalexR (R)
KtAlex (Kotlin)
PyAlex (Python)
diophila (Python)
OpenAlexAPI (Python)

If you're looking for a visual interface, you can also check out the free VOSviewer,
which lets you make network visualizations based on OpenAlex data.

Get single entities
Get a single entity, based on an ID

This is a more detailed guide to single entities in OpenAlex. If you're just getting
started, check out get a single work.

It's easy to get a singleton entity object from the API:
/<entity_name>/<entity_id>. Here's an example:

Get the work with the OpenAlex ID W2741809807:

https://api.openalex.org/works/W2741809807

That will return a Work object, describing everything OpenAlex knows about the
work with that ID. You can use IDs other than OpenAlex IDs, and you can also
format the IDs in different ways. Read below to learn more.

You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.

To get a single entity, you need a single unambiguous identifier, like an ORCID or an
OpenAlex ID. If you've got an ambiguous identifier (like an author's name), you'll want
to search instead.

The OpenAlex ID
The OpenAlex ID is the primary key for all entities. It's a URL shaped like this:
https://openalex.org/<OpenAlex_key>. Here's a real-world example:

https://openalex.org/W2741809807

The OpenAlex Key


The OpenAlex ID has two parts. The first part is the Base; it's always
https://openalex.org/. The second part is the Key; it's the unique primary key
that identifies a given resource in our database.

The key starts with a letter; that letter tells you what kind of entity you've got:
W(ork), A(uthor), S(ource), I(nstitution), C(oncept), P(ublisher), or F(under). The IDs
are not case-sensitive, so w2741809807 is just as valid as W2741809807. So in the
example above, the Key is W2741809807, and the W at the front tells us that this is
a Work.

Because OpenAlex was launched as a replacement for Microsoft Academic Graph
(MAG), OpenAlex IDs are designed to be backwards-compatible with MAG IDs,
where they exist. To find the MAG ID, just take the first letter off the front of the
unique part of the ID (so in the example above, the MAG ID is 2741809807). Of
course this won't yield anything useful for entities that don't have a MAG ID.
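The Key rules above are simple enough to check in client code. Here's a minimal Python sketch, assuming IDs arrive either as full https://openalex.org/ URLs or as bare Keys; parse_openalex_id is a hypothetical helper. The T prefix for topics is inferred from topic IDs such as T10560 elsewhere in this documentation rather than from the list above, and the derived MAG ID is only meaningful for entities that actually have one.

```python
import re
from typing import Tuple

# Entity letters from the list above, plus T for topics (inferred from IDs like T10560).
ENTITY_TYPES = {
    "W": "work", "A": "author", "S": "source", "I": "institution",
    "C": "concept", "P": "publisher", "F": "funder", "T": "topic",
}

# Optional Base, then one letter and digits; case-insensitive like the IDs themselves.
_ID_RE = re.compile(r"^(?:https://openalex\.org/)?([a-z])(\d+)$", re.IGNORECASE)


def parse_openalex_id(openalex_id: str) -> Tuple[str, str, str]:
    """Return (key, entity_type, mag_id) for an OpenAlex ID or bare Key."""
    match = _ID_RE.match(openalex_id.strip())
    if match is None:
        raise ValueError(f"not an OpenAlex ID: {openalex_id!r}")
    letter, digits = match.group(1).upper(), match.group(2)
    if letter not in ENTITY_TYPES:
        raise ValueError(f"unknown entity letter: {letter}")
    # Dropping the letter gives the MAG ID (useful only if the entity has one).
    return letter + digits, ENTITY_TYPES[letter], digits


print(parse_openalex_id("https://openalex.org/W2741809807"))
# ('W2741809807', 'work', '2741809807')
```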

Merged Entity IDs


At times we need to merge two Entities, effectively deleting one of them. This
usually happens when we discover two Entities that represent the same real-world
entity - for example, two Authors that are really the same person.

If you request an Entity using its OpenAlex ID, and that Entity has been merged into
another Entity, you will be redirected to the Entity it has been merged into. For
example, https://openalex.org/A5092938886 has been merged into
https://openalex.org/A5006060960, so in the API the former will redirect to the
latter:

$ curl -i https://api.openalex.org/authors/A5092938886
HTTP/1.1 301 MOVED PERMANENTLY
Location: https://api.openalex.org/authors/A5006060960

Most clients will handle this transparently; you'll get the data for author
A5006060960 without knowing the redirect even happened. If you have stored
Entity ID lists and do notice the redirect, you might as well replace the merged-
away ID to skip the redirect next time.
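If you keep a local list of IDs, refreshing it can be a small loop. This is a minimal Python sketch, assuming urllib follows the 301 shown above (it does for GET requests by default) and that the last path segment of the final URL is the surviving entity's Key; refresh_ids and resolve_final_url are hypothetical names. It makes one request per ID, so it suits modest lists rather than bulk work.

```python
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def resolve_final_url(url: str) -> str:
    """GET `url` and return the URL actually served after any redirects."""
    with urlopen(url, timeout=30) as response:  # urllib follows 301s for GET
        return response.url


def refresh_ids(entity: str, keys: Iterable[str],
                resolve: Callable[[str], str] = resolve_final_url) -> Dict[str, str]:
    """Map each stored Key to its current Key, following merge redirects."""
    current: Dict[str, str] = {}
    for key in keys:
        final_url = resolve(f"https://api.openalex.org/{entity}/{key}")
        # Assumption: drop any query string, then keep the last path segment.
        current[key] = final_url.split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1]
    return current
```

Per the example above, `refresh_ids("authors", ["A5092938886"])` should map A5092938886 to A5006060960. The `resolve` parameter lets you plug in a different HTTP client, or a stub when testing.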
Supported IDs
For each entity type, you can retrieve the entity using any of the external IDs we
support--not just the native OpenAlex IDs. So for example:

Get the work with this DOI: https://doi.org/10.7717/peerj.4375:

https://api.openalex.org/works/https://doi.org/10.7717/peerj.4375

This works with DOIs, ISSNs, ORCIDs, and lots of other IDs...in fact, you can use
any ID listed in an entity's ids property, as listed below:

Work.ids

Author.ids

Source.ids

Institution.ids

Concept.ids

Publisher.ids

ID formats
Most of the external IDs OpenAlex supports are canonically expressed as URLs...for
example, the canonical form of a DOI always starts with https://doi.org/. You
can always use these URL-style IDs in the entity endpoints. Examples:

Get the institution with the ROR https://ror.org/02y3ad647 (University of Florida):

https://api.openalex.org/institutions/https://ror.org/02y3ad647

Get the author with the ORCID https://orcid.org/0000-0003-1613-5981 (Heather Piwowar):

https://api.openalex.org/authors/https://orcid.org/0000-0003-1613-5981

For simplicity and clarity, you may also want to express those IDs in a simpler, URN-
style format, and that's supported as well; you just write the namespace of the ID,
followed by the ID itself. Here are the same examples from above, but in the
namespace:id format:

Get the institution with the ROR https://ror.org/02y3ad647 (University of Florida):

https://api.openalex.org/institutions/ror:02y3ad647

Get the author with the ORCID https://orcid.org/0000-0003-1613-5981 (Heather Piwowar):

https://api.openalex.org/authors/orcid:0000-0003-1613-5981

Finally, if you're using an OpenAlex ID, you can be even more succinct, and just use
the Key part of the ID all by itself, the part that looks like w1234567:

Get the work with OpenAlex ID https://openalex.org/W2741809807:

https://api.openalex.org/works/W2741809807
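The three ID styles can be normalized before you build a request. The Python sketch below converts URL-style IDs to the shorter namespace:id form and trims OpenAlex IDs down to their Key; entity_url is a hypothetical helper, and the doi namespace is an assumption (ror, orcid, and wikidata appear in the examples in this documentation). Unrecognized identifiers pass through untouched, since the API accepts URL-style IDs directly anyway.

```python
# URL prefixes of common external IDs mapped to their namespace:id names.
# Assumption: "doi" works as a namespace; ror, orcid and wikidata appear above.
URL_PREFIXES = {
    "https://doi.org/": "doi",
    "https://ror.org/": "ror",
    "https://orcid.org/": "orcid",
    "https://www.wikidata.org/wiki/": "wikidata",
}
OPENALEX_BASE = "https://openalex.org/"


def to_short_id(identifier: str) -> str:
    """Convert a URL-style ID to namespace:id, or an OpenAlex ID to its Key."""
    identifier = identifier.strip()
    if identifier.startswith(OPENALEX_BASE):
        return identifier[len(OPENALEX_BASE):]
    for prefix, namespace in URL_PREFIXES.items():
        if identifier.startswith(prefix):
            return f"{namespace}:{identifier[len(prefix):]}"
    return identifier  # already short (or unknown): use as-is


def entity_url(entity: str, identifier: str) -> str:
    """Build a single-entity URL such as /institutions/ror:02y3ad647."""
    return f"https://api.openalex.org/{entity}/{to_short_id(identifier)}"


print(entity_url("institutions", "https://ror.org/02y3ad647"))
# https://api.openalex.org/institutions/ror:02y3ad647
```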

Canonical External IDs


Every entity has an OpenAlex ID. Most entities also have IDs in other systems, too.
There are hundreds of different ID systems, but we've selected a single external ID
system for each entity to provide the Canonical External ID--this is the ID in the
system that's been most fully adopted by the community, and is most frequently
used in the wild. We support other external IDs as well, but the canonical ones get a
privileged spot in the API and dataset.

These are the Canonical External IDs:

Works: DOI
Authors: ORCID
Sources: ISSN-L
Institutions: ROR ID
Concepts: Wikidata ID
Publishers: Wikidata ID
Dehydrated entity objects
The full entity objects can get pretty unwieldy, especially when you're embedding a
list of them in another object (for instance, a list of Concepts in a Work). For these
cases, all the entities except Works have a dehydrated version. This is a
stripped-down representation of the entity that carries only its most essential
properties. These properties are documented individually on their respective entity pages.

Random result
You can get a random result by using the string random where an ID would
normally go. OMG that's so random! Each time you call this URL you'll get a
different entity. Examples:

Get a random institution:


https://api.openalex.org/institutions/random

Get a random concept:


https://api.openalex.org/concepts/random

Select fields
You can use select to choose top-level fields you want to see in a result.

Display id and display_name for a work


https://api.openalex.org/works/W2138270253?select=id,display_name

{
id: "https://openalex.org/W2138270253",
display_name: "DNA sequencing with chain-terminating inhibitors"
}

Read more about this feature here.


Get lists of entities
It's easy to get a list of entity objects from the API: /<entity_name>. Here's an
example:

Get a list of all the topics in OpenAlex:


https://api.openalex.org/topics

This query returns a meta object with details about the query, a results list of
Topic objects, and an empty group_by list:

{
meta: {
count: 4516,
db_response_time_ms: 81,
page: 1,
per_page: 25
},
results: [
// long list of Topic entities
],
group_by: [] // empty
}

Listing entities is a lot more useful when you add parameters to page, filter, search,
and sort them. Keep reading to learn how to do that.
Paging
You can see executable examples of paging in this user-contributed Jupyter notebook!

Basic paging
Use the page query parameter to control which page of results you want (e.g.,
page=1, page=2, etc.). By default there are 25 results per page; you can use the
per-page parameter to change that to any number between 1 and 200.

Get the 2nd page of a list:


https://api.openalex.org/works?page=2

Get 200 results on the second page:


https://api.openalex.org/works?page=2&per-page=200

Basic paging only works to get the first 10,000 results of any list. If you want to see
more than 10,000 results, you'll need to use cursor paging.
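Paging arithmetic is easy to get wrong at the edges. This minimal Python sketch lists the basic-paging URLs for the first N results and refuses requests that basic paging can't serve; page_urls is a hypothetical helper built on the two limits stated above (per-page from 1 to 200, and at most 10,000 results).

```python
from typing import List
from urllib.parse import urlencode

MAX_BASIC_RESULTS = 10_000  # basic paging only reaches the first 10,000 results


def page_urls(entity: str, wanted: int, per_page: int = 25, **params: str) -> List[str]:
    """Return basic-paging URLs covering the first `wanted` results."""
    if not 1 <= per_page <= 200:
        raise ValueError("per-page must be between 1 and 200")
    if not 1 <= wanted <= MAX_BASIC_RESULTS:
        raise ValueError("basic paging covers 1 to 10,000 results; use cursor paging")
    pages = -(-wanted // per_page)  # ceiling division
    return [
        f"https://api.openalex.org/{entity}?"
        + urlencode({**params, "page": page, "per-page": per_page})
        for page in range(1, pages + 1)
    ]


print(page_urls("works", 400, per_page=200))
# ['https://api.openalex.org/works?page=1&per-page=200', 'https://api.openalex.org/works?page=2&per-page=200']
```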

Cursor paging
Cursor paging is a bit more complicated than basic paging, but it allows you to
access as many records as you like.

To use cursor paging, you request a cursor by adding the cursor=* parameter-
value pair to your query.

Get a cursor in order to start cursor pagination:


https://api.openalex.org/works?filter=publication_year:2020&per-page=100&cursor=*

The response to your query will include a next_cursor value in the response's
meta object. Here's what it looks like:
{
"meta": {
"count": 8695857,
"db_response_time_ms": 28,
"page": null,
"per_page": 100,
"next_cursor": "IlsxNjA5MzcyODAwMDAwLCAnaHR0cHM6Ly9vcGVuYWxleC5vcmcvV
},
"results" : [
// the first page of results
]
}

To retrieve the next page of results, copy the meta.next_cursor value into the
cursor field of your next request.

Get the next page of results using a cursor value:


https://api.openalex.org/works?filter=publication_year:2020&per-page=100&cursor=IlsxNjA5MzcyODAwMDAwLCAnaHR0cHM6Ly9vcGVuYWxleC5vcmcvVzI0ODg0OTk3NjQnXSI=

This second page of results will have a new value for meta.next_cursor. You'll use
this new value the same way you did the first, and it'll give you the next page of
results. To get all the results, keep repeating this process until meta.next_cursor
is null and the results set is empty.
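The cursor loop described above fits in a short generator. This Python sketch uses only the standard library; iter_cursor and fetch_json are hypothetical names, and the fetch parameter is there so you can substitute your own HTTP client (for example, one that adds your email) or a stub in tests. It stops when next_cursor is null or a page comes back empty.

```python
import json
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

Json = Dict[str, Any]


def fetch_json(url: str) -> Json:
    """GET a URL and decode its JSON body."""
    with urlopen(url, timeout=60) as response:
        return json.load(response)


def iter_cursor(entity: str, filter_value: str, per_page: int = 100,
                fetch: Callable[[str], Json] = fetch_json) -> Iterator[Any]:
    """Yield every result of a filtered list using cursor paging."""
    cursor = "*"  # cursor=* requests the first page and a first cursor
    while cursor:
        query = urlencode({"filter": filter_value, "per-page": per_page, "cursor": cursor})
        page = fetch(f"https://api.openalex.org/{entity}?{query}")
        results = page.get("results", [])
        if not results:
            break  # an empty page means we are past the end
        yield from results
        cursor = page["meta"].get("next_cursor")  # None once there are no more pages
```

Because it's a generator, results are processed page by page rather than held in memory. It's still meant for filtered lists, not for paging through all of /works.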

Besides using cursor paging to get entities, you can also use it in group_by
queries.

Don't use cursor paging to download the whole dataset.

It's bad for you because it will take many days to page through a long list
like /works or /authors.
It's bad for us (and other users!) because it puts a massive load on our
servers.
Instead, download everything at once, using the OpenAlex snapshot. It's free, easy,
fast, and you get all the results in the same format you'd get from the API.
Filter entity lists
Filters narrow the list down to just entities that meet a particular condition--
specifically, a particular value for a particular attribute.

A list of filters is set using the filter parameter, formatted like this:
filter=attribute:value,attribute2:value2. Examples:

Get the works whose type is book:

https://api.openalex.org/works?filter=type:book

Get the authors whose name is Einstein:

https://api.openalex.org/authors?filter=display_name.search:einstein

Filters are case-insensitive.

Logical expressions
Inequality
For numerical filters, use the less-than (<) and greater-than (>) symbols to filter
by inequalities. Example:

Get sources that host more than 1000 works:

https://api.openalex.org/sources?filter=works_count:>1000

Some attributes have special filters that act as syntactic sugar around commonly
expressed inequalities: for example, the from_publication_date filter on works.
See the endpoint-specific documentation below for more information. Example:

Get all works published between 2022-01-01 and 2022-01-26 (inclusive):

https://api.openalex.org/works?filter=from_publication_date:2022-01-01,to_publication_date:2022-01-26

Negation (NOT)
You can negate any filter, numerical or otherwise, by prepending the exclamation
mark symbol (!) to the filter value. Example:

Get all institutions except for ones located in the US:

https://api.openalex.org/institutions?filter=country_code:!us

Intersection (AND)
By default, the returned result set includes only records that satisfy all the supplied
filters. In other words, filters are combined as an AND query. Example:

Get all works that have been cited more than once and are free to read:
https://api.openalex.org/works?filter=cited_by_count:>1,is_oa:true

To create an AND query within a single attribute, you can either repeat a filter, or
use the plus symbol (+):

Get all the works that have an author from France and an author from the UK:

Using repeating filters:

https://api.openalex.org/works?filter=institutions.country_code:fr,institutions.country_code:gb

Using the plus symbol (+):

https://api.openalex.org/works?filter=institutions.country_code:fr+gb

Note that the plus symbol (+) syntax will not work for search filters, boolean
filters, or numeric filters.
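When filters are assembled in code, it helps to keep each operator in one place. Here's a minimal Python sketch of the syntax above; build_filter, not_, gt, lt, all_of, and filter_url are hypothetical helper names, and filter_url fully percent-encodes the value on the assumption that the API decodes %-escapes like any query string. The helpers don't know which filters are search, boolean, or numeric, so avoiding all_of (the + syntax) for those is up to you.

```python
from typing import Tuple
from urllib.parse import quote, urlencode


def not_(value: str) -> str:
    return "!" + value  # negation: prefix the value with !


def gt(number: float) -> str:
    return f">{number}"  # inequalities are for numeric filters


def lt(number: float) -> str:
    return f"<{number}"


def all_of(*values: str) -> str:
    return "+".join(values)  # AND within one attribute (not for search/boolean/numeric)


def build_filter(*conditions: Tuple[str, str]) -> str:
    """Join (attribute, value) pairs with commas; separate filters are ANDed."""
    for attribute, value in conditions:
        if "," in value:
            raise ValueError(f"commas separate filters; check the value for {attribute}")
    return ",".join(f"{attribute}:{value}" for attribute, value in conditions)


def filter_url(entity: str, filter_value: str) -> str:
    """Percent-encode the whole filter value into a list URL."""
    query = urlencode({"filter": filter_value}, quote_via=quote)
    return f"https://api.openalex.org/{entity}?{query}"


print(build_filter(("cited_by_count", gt(1)), ("is_oa", "true")))
# cited_by_count:>1,is_oa:true
```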

Addition (OR)
Use the pipe symbol (|) to input lists of values such that any of the values can be
satisfied--in other words, when you separate filter values with a pipe, they'll be
combined as an OR query. Example:

Get all the works that have an author from France or an author from the UK:

https://api.openalex.org/works?filter=institutions.country_code:fr|gb

This is particularly useful when you want to retrieve many records by ID all at
once. Instead of making a whole bunch of singleton calls in a loop, you can make
one call, like this:

Get the works with DOI 10.1371/journal.pone.0266781 or with DOI
10.1371/journal.pone.0267149 (note the pipe separator between the two DOIs):

https://api.openalex.org/works?filter=doi:https://doi.org/10.1371/journal.pone.0266781|https://doi.org/10.1371/journal.pone.0267149

You can combine up to 100 values for a given filter in this way. You will also need to
use the parameter per-page=100 to get all of the results per query. See our blog
post for a tutorial.

You can use OR for values within a given filter, but not between different filters. So
this, for example, doesn't work and will return an error:

Get either French works or ones published in the journal with ISSN
0957-1558:

https://api.openalex.org/works?filter=institutions.country_code:fr|primary_location.source.issn:0957-1558
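For large ID lists, the OR syntax pairs naturally with batching. The Python sketch below splits a list into groups of at most 100 values (the limit stated above), joins each group with |, and sets per-page=100 so that one page holds a full batch. or_batch_urls is a hypothetical helper; it assumes each value matches at most one record, as with DOIs or OpenAlex IDs, and it only ORs values within a single filter, because OR across different filters isn't supported.

```python
from typing import List, Sequence
from urllib.parse import urlencode

MAX_OR_VALUES = 100  # maximum values per filter joined with |


def or_batch_urls(entity: str, attribute: str, values: Sequence[str]) -> List[str]:
    """Return one URL per batch of up to 100 pipe-separated values."""
    urls = []
    for start in range(0, len(values), MAX_OR_VALUES):
        batch = values[start:start + MAX_OR_VALUES]
        query = urlencode({
            "filter": f"{attribute}:{'|'.join(batch)}",
            "per-page": MAX_OR_VALUES,  # room for every record in the batch
        })
        urls.append(f"https://api.openalex.org/{entity}?{query}")
    return urls


dois = ["https://doi.org/10.1371/journal.pone.0266781",
        "https://doi.org/10.1371/journal.pone.0267149"]
print(len(or_batch_urls("works", "doi", dois)))  # 1
```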

Available Filters
The filters for each entity can be found here:

Works
Authors
Sources
Institutions
Concepts
Publishers
Funders
Search entities
The search parameter
The search query parameter finds results that match a given text search.
Example:

Get works with search term "dna" in the title, abstract, or fulltext:
https://api.openalex.org/works?search=dna

When you search works, the API looks for matches in titles, abstracts, and fulltext.
When you search concepts, we look in each concept's display_name and
description fields. When you search sources, we look at the display_name,
alternate_titles, and abbreviated_title fields. When you search authors,
we look at the display_name and display_name_alternatives fields. When you
search institutions, we look at the display_name, display_name_alternatives,
and display_name_acronyms fields.

For most text search we remove stop words and use stemming (specifically, the
Kstem token filter) to improve results. So words like "the" and "an" are
transparently removed, and a search for "possums" will also return records using
the word "possum." With the exception of raw affiliation strings, we do not search
within words but rather try to match whole words. So a search with "lun" will not
match the word "lunar".

Search without stemming


To disable stemming and the removal of stop words for searches on titles and
abstracts, you can add .no_stem to the search filter. So, for example, if you want
to search for "surgery" and not get "surgeries" too:

https://api.openalex.org/works?filter=display_name.search.no_stem:surgery

https://api.openalex.org/works?filter=title.search.no_stem:surgery

https://api.openalex.org/works?filter=abstract.search.no_stem:surgery

https://api.openalex.org/works?filter=title_and_abstract.search.no_stem:surgery

Boolean searches
Including any of the words AND , OR , or NOT in any of your searches will enable
boolean search. Those words must be UPPERCASE. You can use this in all
searches, including using the search parameter, and using search filters.

This allows you to craft complex queries using those boolean operators along with
parentheses and quotation marks. Surrounding a phrase with quotation marks will
search for an exact match of that phrase, after stemming and stop-word removal
(be sure to use double quotation marks — " ). Using parentheses will specify
order of operations for the boolean operators. Words that are not separated by one
of the boolean operators will be interpreted as AND .

Behind the scenes, the boolean search is using Elasticsearch's query string query
on the searchable fields (such as title, abstract, and fulltext for works; see each
individual entity page for specifics about that entity). Wildcard and fuzzy searches
using * , ? or ~ are not allowed; these characters will be removed from any
searches. These searches, even when using quotation marks, will go through the
same cleaning as described above, including stemming and removal of stop
words.

Search for works that mention "elmo" and "sesame street," but not the words
"cookie" or "monster":
https://api.openalex.org/works?search=(elmo AND "sesame street") NOT (cookie OR monster)
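In a real request, quotes, parentheses, and spaces all need percent-encoding. This Python sketch composes boolean queries from small helpers and encodes the result; phrase, and_all, or_any, and_not, and search_url are hypothetical names. It only assembles strings: stemming, stop-word removal, and the stripping of * ? ~ still happen on the API side as described above.

```python
from urllib.parse import quote, urlencode


def phrase(text: str) -> str:
    return f'"{text}"'  # exact phrase: double quotation marks only


def and_all(*terms: str) -> str:
    return "(" + " AND ".join(terms) + ")"  # operators must be UPPERCASE


def or_any(*terms: str) -> str:
    return "(" + " OR ".join(terms) + ")"


def and_not(query: str, excluded: str) -> str:
    return f"{query} NOT {excluded}"


def search_url(entity: str, query: str) -> str:
    """Percent-encode the query (spaces become %20, quotes %22)."""
    encoded = urlencode({"search": query}, quote_via=quote)
    return f"https://api.openalex.org/{entity}?{encoded}"


query = and_not(and_all("elmo", phrase("sesame street")), or_any("cookie", "monster"))
print(query)  # (elmo AND "sesame street") NOT (cookie OR monster)
```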

Relevance score
When you use search, each returned entity in the results lists gets an extra
property called relevance_score , and the list is by default sorted in descending
order of relevance_score . The relevance_score is based on text similarity to
your search term. It also includes a weighting term for citation counts: more highly-
cited entities score higher, all else being equal.

If you search for a multiple-word phrase, the algorithm will treat each word
separately, and rank results higher when the words appear close together. If you
want to return only results where the exact phrase is used, just enclose your phrase
within quotes. Example:

Get works with the exact phrase "fierce creatures" in the title or abstract
(returns just a few results):
https://api.openalex.org/works?search="fierce%20creatures"

Get works with the words "fierce" and "creatures" in the title or abstract, with
works that have the two words close together ranked higher by
relevance_score (returns way more results):
https://api.openalex.org/works?search=fierce%20creatures

The search filter


You can also use search as a filter, allowing you to fine-tune the fields you're
searching over. To do this, you append .search to the end of the property you are
filtering for:

Get authors who have "Einstein" as part of their name:


https://api.openalex.org/authors?filter=display_name.search:einstein

Get works with "cubist" in the title:


https://api.openalex.org/works?filter=title.search:cubist

Additionally, the filter default.search is available on all entities; this works the
same as the search parameter.
You might be tempted to use the search filter to power an autocomplete or typeahead.
Instead, we recommend you use the autocomplete endpoint, which is much faster.

👎 https://api.openalex.org/institutions?filter=display_name.search:florida

👍 https://api.openalex.org/autocomplete/institutions?q=Florida

Sort entity lists
Use the ?sort parameter to specify the property you want your list sorted by. You
can sort by these properties, where they exist:

display_name

cited_by_count

works_count

publication_date

relevance_score (only exists if there's a search filter active)

By default, sort direction is ascending. You can reverse this by appending :desc to
the sort key like works_count:desc . You can sort by multiple properties by
providing multiple sort keys, separated by commas. Examples:

All works, sorted by cited_by_count (highest counts first)


https://api.openalex.org/works?sort=cited_by_count:desc

All sources, in alphabetical order by title:


https://api.openalex.org/sources?sort=display_name

You can sort by relevance_score when searching:

Sort by year, then by relevance_score when searching for "bioplastics":


https://api.openalex.org/works?filter=display_name.search:bioplastics&sort=publication_year:desc,relevance_score:desc

An error is thrown if you attempt to sort by relevance_score without a search
query.
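A client can catch the relevance_score mistake before the request goes out. This Python sketch builds a multi-key sort value and mirrors the documented rule that relevance_score needs a search; list_url is a hypothetical helper, and treating any *.search filter (including default.search) as a search is an assumption based on the description above.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def list_url(entity: str, sort: List[Tuple[str, bool]], **params: str) -> str:
    """Build a list URL; each sort entry is (field, descending)."""
    # Assumption: the search parameter or any *.search filter counts as a search.
    has_search = "search" in params or ".search" in params.get("filter", "")
    keys = []
    for field, descending in sort:
        if field == "relevance_score" and not has_search:
            raise ValueError("sorting by relevance_score requires a search query")
        keys.append(f"{field}:desc" if descending else field)  # ascending is the default
    query = urlencode({**params, "sort": ",".join(keys)}, safe=":,")
    return f"https://api.openalex.org/{entity}?{query}"


print(list_url("works", [("publication_year", True), ("relevance_score", True)],
               filter="display_name.search:bioplastics"))
# https://api.openalex.org/works?filter=display_name.search:bioplastics&sort=publication_year:desc,relevance_score:desc
```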
Select fields
You can use select to limit the fields that are returned in results.

Display works with only the id, doi, and display_name returned in the results:

https://api.openalex.org/works?select=id,doi,display_name

"results": [
{
"id": "https://openalex.org/W1775749144",
"doi": "https://doi.org/10.1016/s0021-9258(19)52451-6",
"display_name": "PROTEIN MEASUREMENT WITH THE FOLIN PHENOL REAGENT"
},
{
"id": "https://openalex.org/W2100837269",
"doi": "https://doi.org/10.1038/227680a0",
"display_name": "Cleavage of Structural Proteins during the Assembly
},
// more results removed for brevity
]

Limitations
The fields you choose must exist within the entity (of course). You can only select
root-level fields.

So if we have a record like so:

"id": "https://openalex.org/W2138270253",
"open_access": {
"is_oa": true,
"oa_status": "bronze",
"oa_url": "http://www.pnas.org/content/74/12/5463.full.pdf"
}

You can choose to display id and open_access, but you will get an error if you try
to choose open_access.is_oa.

You can use select fields when getting lists of entities or a single entity. It does not
work with group-by or autocomplete.
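Because only root-level fields can be selected, it's easy to reject a nested name before sending anything. Here's a minimal Python sketch; select_url is a hypothetical helper, and its group_by and autocomplete check simply mirrors the limitation stated above.

```python
from typing import Sequence
from urllib.parse import urlencode


def select_url(path: str, fields: Sequence[str], **params: str) -> str:
    """Build a URL with select=...; path is e.g. "works" or "works/W2138270253"."""
    nested = [field for field in fields if "." in field]
    if nested:
        raise ValueError(f"only root-level fields can be selected, not {nested}")
    if path.startswith("autocomplete") or "group_by" in params:
        raise ValueError("select does not work with group-by or autocomplete")
    query = urlencode({**params, "select": ",".join(fields)}, safe=",")
    return f"https://api.openalex.org/{path}?{query}"


print(select_url("works/W2138270253", ["id", "display_name"]))
# https://api.openalex.org/works/W2138270253?select=id,display_name
```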
Sample entity lists
You can use sample to get a random list of up to 10,000 results.

Get 100 random works


https://api.openalex.org/works?sample=100&per-page=100

Get 50 random works that are open access and published in 2021

https://api.openalex.org/works?filter=open_access.is_oa:true,publication_year:2021&sample=50&per-page=50

You can add a seed value in order to retrieve the same set of random records, in
the same order, multiple times.

Get 20 random sources with a seed value


https://api.openalex.org/sources?sample=20&seed=123

Depending on your query, random results with a seed value may change over time due
to new records coming into OpenAlex.

Limitations
The sample size is limited to 10,000 results.
You must provide a seed value when paging beyond the first page of results.
Without a seed value, you might get duplicate records in your results.
You must use basic paging when sampling. Cursor pagination is not supported.
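These limitations can be enforced client-side. This is a minimal sketch under the assumption that you assemble URLs yourself; sample_url is a hypothetical helper. It only uses basic page-based paging (never a cursor), and it refuses to build a URL for page 2 or later unless a seed is given.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"
MAX_SAMPLE = 10_000  # documented ceiling for sample


def sample_url(entity: str, size: int, *, per_page: Optional[int] = None,
               page: int = 1, seed: Optional[int] = None,
               filters: Optional[str] = None) -> str:
    """Build a sample URL that respects the documented sampling limits."""
    if not 1 <= size <= MAX_SAMPLE:
        raise ValueError(f"sample must be between 1 and {MAX_SAMPLE}")
    if page > 1 and seed is None:
        # without a seed, each page is drawn independently, so records can repeat
        raise ValueError("pass a seed when requesting pages beyond the first")
    params = {"sample": size}
    if per_page is not None:
        params["per-page"] = per_page
    if page > 1:
        params["page"] = page  # basic paging only; cursor is not supported
    if seed is not None:
        params["seed"] = seed
    if filters:
        params["filter"] = filters
    return f"{API_BASE}/{entity}?{urlencode(params, safe=',:')}"


print(sample_url("works", 50, per_page=50,
                 filters="open_access.is_oa:true,publication_year:2021"))
# https://api.openalex.org/works?sample=50&per-page=50&filter=open_access.is_oa:true,publication_year:2021
```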
Autocomplete entities
The autocomplete endpoint lets you add autocomplete or typeahead components
to your applications, without the overhead of hosting your own API endpoint.

Each endpoint takes a string, and (very quickly) returns a list of entities that match
that string.

Here's an example of an autocomplete component that lets users quickly select an
institution:

A user looking for information on the flagship of Florida's state university system.

This is the query behind that result:


https://api.openalex.org/autocomplete/institutions?q=flori

The autocomplete endpoint is very fast; queries generally return in around 200ms.
If you'd like to see it in action, we're using a slightly modified version of this
endpoint in the OpenAlex website here: https://explore.openalex.org/

Request format
The format for requests is simple: /autocomplete/<entity_type>?q=<query>

entity_type (optional): the name of one of the OpenAlex entities: works,
authors, sources, institutions, concepts, publishers, or funders.

query: the search string supplied by the user.

You can optionally filter autocomplete results.
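Putting the request format together, a small URL builder might look like this. It's a sketch: autocomplete_url is a hypothetical name, the list of valid entity types is copied from the text above, and the optional filter and search parameters are passed through unchanged. The query is URL-encoded, so a pasted ORCID or other ID URL can be sent as q.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"
ENTITY_TYPES = {"works", "authors", "sources", "institutions",
                "concepts", "publishers", "funders"}


def autocomplete_url(q: str, entity_type: Optional[str] = None,
                     filters: Optional[str] = None,
                     search: Optional[str] = None) -> str:
    """Build /autocomplete or /autocomplete/<entity_type> with q and optional filters."""
    if entity_type is None:
        path = "/autocomplete"  # entity_type is optional
    elif entity_type in ENTITY_TYPES:
        path = f"/autocomplete/{entity_type}"
    else:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    params = {"q": q}
    if filters:
        params["filter"] = filters
    if search:
        params["search"] = search
    return f"{API_BASE}{path}?{urlencode(params, safe=',:')}"


print(autocomplete_url("flori", "institutions"))
# https://api.openalex.org/autocomplete/institutions?q=flori
```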


Response format
Each request returns a response object with two properties:

meta: an object with information about the request, including timing and results
count
results: a list of up to ten results for the query, sorted by citation count. Each
result represents an entity that matched against the query.

{
meta: {
count: 183,
db_response_time_ms: 5,
page: 1,
per_page: 10
},
results: [
{
id: "https://openalex.org/I33213144",
display_name: "University of Florida",
hint: "Gainesville, USA",
cited_by_count: 17190001,
entity_type: "institution",
external_id: "https://ror.org/02y3ad647"
},
// more results...
]
}

Each object in the results list includes these properties:

id (string): The OpenAlex ID for this result entity.
external_id (string): The Canonical External ID for this result entity.
display_name (string): The entity's display_name property.
entity_type (string): The entity's type: author, concept, institution,
source, publisher, funder, or work.
cited_by_count (integer): The entity's cited_by_count property. For works
this is simply the number of incoming citations. For other entities, it's the sum of
incoming citations for all the works linked to that entity.
works_count (integer): The number of works associated with the entity. For
entity type work it's always null.
hint: Some extra information that can help identify the right item. Differs by
entity type.

The hint property


Result objects have a hint property. You can show this to users to help them
identify which item they're selecting. This is particularly helpful when the
display_name values of different results are the same, as often happens when
autocompleting an author entity--a user who types in John Smi is going to see a
lot of identical-looking results, even though each one is a different person.

The content of the hint property varies depending on what kind of entity you're
looking up:

Work: The work's authors' display names, concatenated. e.g. "R. Alexander
Pyron, John J. Wiens"
Author: The author's last known institution, e.g. "University of North Carolina at
Chapel Hill, USA"
Source: The host_organization, e.g. "Oxford University Press"
Institution: The institution's location, e.g. "Gainesville, USA"
Concept: The Concept's description, e.g. "the study of relation between plant species and genera"
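One way to use hint in a typeahead is to append it to the display name, so identical names become distinguishable. A minimal sketch, assuming you've already decoded the JSON response into a Python dict; option_labels is a hypothetical helper.

```python
from typing import List, Tuple


def option_labels(response: dict) -> List[Tuple[str, str]]:
    """Turn an autocomplete response into (id, label) pairs for a dropdown.

    The hint is appended in parentheses so that results with identical
    display_name values (e.g. several "John Smith" authors) stay distinguishable.
    """
    options = []
    for result in response.get("results", []):
        label = result["display_name"]
        hint = result.get("hint")
        if hint:  # hint may be missing or null for some results
            label = f"{label} ({hint})"
        options.append((result["id"], label))
    return options


example = {"results": [{"id": "https://openalex.org/I33213144",
                        "display_name": "University of Florida",
                        "hint": "Gainesville, USA"}]}
print(option_labels(example))
# [('https://openalex.org/I33213144', 'University of Florida (Gainesville, USA)')]
```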

IDs in autocomplete
Canonical External IDs and OpenAlex IDs are detected within autocomplete queries
and matched to the appropriate record if it exists. For example:

The query
https://api.openalex.org/autocomplete?q=https://orcid.org/0000-0002-7436-3176
will search for the author with ORCID ID
https://orcid.org/0000-0002-7436-3176 and return 0 records if it does not
exist.
The query https://api.openalex.org/autocomplete/sources?q=S49861241 will
search for the source with OpenAlex ID https://openalex.org/S49861241 and
return 0 records if it does not exist.

Filter autocomplete results

All entity filters and search queries can be added to autocomplete and work as
expected, like:

https://api.openalex.org/autocomplete/works?filter=publication_year:2010&search=frogs&q=greenhou
Get groups of entities
Sometimes instead of just listing entities, you want to group them into facets, and
count how many entities are in each group. For example, maybe you want to count
the number of Works by open access status. To do that, you call the entity
endpoint, adding the group_by parameter. Example:

Get counts of works by type:


https://api.openalex.org/works?group_by=type

This returns a meta object with details about the query, and a group_by object
with the groups you've asked for:
{
meta: {
count: 246136992,
db_response_time_ms: 271,
page: 1,
per_page: 200,
groups_count: 15
},
group_by: [
{
key: "article",
key_display_name: "article",
count: 202814957
},
{
key: "book-chapter",
key_display_name: "book-chapter",
count: 21250659
},
{
key: "dissertation",
key_display_name: "dissertation",
count: 6055973
},
{
key: "book",
key_display_name: "book",
count: 5400871
},
...
]
}

So from this we can see that the majority of works (202,814,957 of them) are type
article, with another 21,250,659 book-chapter, and so forth.

You can group by most of the same properties that you can filter by, and you can
combine grouping with filtering.
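Since each group carries a count, turning a group_by response into shares of meta.count takes only a few lines. This sketch assumes the response has already been decoded into a dict; group_shares is a hypothetical helper. Keep in mind that the unknown group is hidden by default, and that for multi-valued fields (for example authorships.countries) one work can fall into several groups, so the shares need not add up to 1.

```python
from typing import Dict


def group_shares(response: dict) -> Dict[str, float]:
    """Map each group's key to its fraction of meta.count."""
    total = response["meta"]["count"]
    if not total:
        return {}
    return {group["key"]: group["count"] / total for group in response["group_by"]}


# two of the groups from the example response above
response = {
    "meta": {"count": 246136992},
    "group_by": [
        {"key": "article", "key_display_name": "article", "count": 202814957},
        {"key": "book-chapter", "key_display_name": "book-chapter", "count": 21250659},
    ],
}
for key, share in group_shares(response).items():
    print(f"{key}: {share:.1%}")
# article: 82.4%
# book-chapter: 8.6%
```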

Group properties
Each group object in the group_by list contains three properties:
key

Value: a string; the OpenAlex ID or raw value of the group_by parameter for
members of this group. See details on key and key_display_name.

key_display_name

Value: a string; the display_name or raw value of the group_by parameter for
members of this group. See details on key and key_display_name.

count

Value: an integer; the number of entities in the group.

"Unknown" groups
The "unknown" group is hidden by default. If you want to include this group in the
response, add :include_unknown after the group-by parameter.

Group works by authorships.countries (unknown group hidden):

https://api.openalex.org/works?group_by=authorships.countries

Group works by authorships.countries (includes unknown group):

https://api.openalex.org/works?group_by=authorships.countries:include_unknown

key and key_display_name

If the value being grouped by is an OpenAlex Entity, the key and
key_display_name properties will be that Entity's id and display_name,
respectively.

Group Works by Institution:

https://api.openalex.org/works?group_by=authorships.institutions.id

For one group, key is "https://openalex.org/I136199984" and
key_display_name is "Harvard University".

Otherwise, key is the same as key_display_name; both are the raw value of the
group_by parameter for this group.

Group Concepts by level:

https://api.openalex.org/concepts?group_by=level

For one group, both key and key_display_name are "3".

Group-by meta properties

meta.count is the total number of entities that match the query (if no filter is
applied, this is every entity of that type; in the example above, all works).
meta.groups_count is the count of groups in the current page.

If there are no groups in the response, meta.groups_count is null.

Due to a technical limitation, we can only report the number of groups in the current
page, and not the total number of groups.

Paging
The maximum number of groups returned is 200. If you want to get more than 200
groups, you can use cursor pagination. This works the same as it does when
getting lists of entities, so head over to the section on paging through lists of results
to learn how.

Due to technical constraints, when paging, results are sorted by key, rather than by
count.
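Here is a sketch of paging through every group with a cursor. It assumes the same cursor convention as entity lists (start with cursor=*, then follow meta.next_cursor until it is empty); that convention is described in the paging section rather than here. iter_groups and fetch_json are hypothetical helpers, and fetch can be swapped out, for instance to add your email or a rate limiter.

```python
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_groups(entity: str, group_by: str,
                fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every group, following meta.next_cursor until it runs out.

    When paging, groups come back sorted by key, not by count.
    """
    cursor = "*"  # assumption: the first page is requested with cursor=*
    while cursor:
        query = urlencode({"group_by": group_by, "cursor": cursor}, safe="*.:")
        page = fetch(f"{API_BASE}/{entity}?{query}")
        groups = page.get("group_by") or []
        if not groups:
            break  # defensive stop in case an empty page still carries a cursor
        yield from groups
        cursor = page["meta"].get("next_cursor")
```

To re-sort by size once everything is collected, something like sorted(iter_groups("works", "authorships.institutions.id"), key=lambda g: g["count"], reverse=True) would work.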
Rate limits and authentication
The API is rate-limited. The limits are:

max 100,000 calls every day, and also
max 10 requests every second.

If you hit the API more than 100k times in a day or more than 10 in a second, you'll
get 429 errors instead of useful data.
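If your script makes many calls, it's easy to pace it client-side so you stay under 10 requests per second. A minimal sketch using only the standard library; Throttle is a hypothetical class, it covers a single process only, and it does nothing about the daily limit. The clock and sleep functions are injectable so the pacing can be checked without actually waiting.

```python
import time
from typing import Callable


class Throttle:
    """Keep successive calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second: float = 10,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call throttle.wait() right before each request; a shared instance paces every call made through it.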

Are those rate limits too low for you? No problem! We can raise those limits as high
as you need if you subscribe to our Premium plan. And if you're an academic
researcher we can likely do it for free; just drop us a line at [email protected].

Are you scrolling through a list of entities, calling the API for each? You can go way
faster by squishing 50 requests into one using our OR syntax. Here's a tutorial
showing how.
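A sketch of the batching idea, assuming the OR syntax from that tutorial, where the values for one filter are joined with a pipe (|), as in filter=doi:<doi1>|<doi2>. or_filter_batches is a hypothetical helper, and the 50-per-request batch size comes from the paragraph above.

```python
from typing import Iterator, Sequence


def or_filter_batches(field: str, values: Sequence[str],
                      batch_size: int = 50) -> Iterator[str]:
    """Yield filter strings that OR together up to batch_size values with '|'."""
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        yield f"{field}:" + "|".join(chunk)
```

Each string goes into the filter parameter of one request. You will probably also want per-page set high enough (for example 50) that all matches come back on a single page.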

Authentication
The OpenAlex API doesn't require authentication. However, it is helpful for us to
know who's behind each API call, for two reasons:

It allows us to get in touch with the user if something's gone wrong--for
instance, their script has run amok and we've needed to start blocking or
throttling their usage.
It lets us report back to our funders, which helps us keep the lights on.

Like Crossref (whose approach we are shamelessly stealing), we prefer carrots to
sticks for this. So, depending on your preferences, you'll be in one of two API pools:

The polite pool


The polite pool has much faster and more consistent response times. It's a good
place to be.

To get into the polite pool, you just have to give us an email where we can contact
you. You can give us this email in one of two ways (replace you@example.com with
your own address):

Add the mailto=you@example.com parameter in your API request, like this:

https://api.openalex.org/works?mailto=you@example.com

Add mailto:you@example.com somewhere in your User-Agent request header.
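Both options can be set when you build a request with Python's standard library. In this sketch, polite_request is a hypothetical helper, you@example.com is a placeholder for your own address, and the my-openalex-script/0.1 product name in the User-Agent is made up; only the mailto: part matters. Either the parameter or the header is enough on its own.

```python
import urllib.request
from typing import Dict, Optional
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"
CONTACT_EMAIL = "you@example.com"  # placeholder: use your own address


def polite_request(path: str,
                   params: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build a GET request that identifies you via mailto (parameter and header)."""
    query = dict(params or {})
    query["mailto"] = CONTACT_EMAIL
    url = f"{API_BASE}/{path}?{urlencode(query, safe=',:@')}"
    # the product name is arbitrary; the mailto: part is what identifies you
    headers = {"User-Agent": f"my-openalex-script/0.1 (mailto:{CONTACT_EMAIL})"}
    return urllib.request.Request(url, headers=headers)
```

Pass the result to urllib.request.urlopen to send it.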

The common pool


The common pool has slower and less consistent response times. It's a less good
place to be. We encourage everyone to get in the polite pool 😇👍

Usage tips
Calling the API in your browser
Because the API is all GET requests without fancy authentication, you can view any
request in your browser. This is a very useful and pleasant way to explore the API
and debug scripts; we use it all the time.

However, this is much nicer if you install an extension to pretty-print the JSON;
JSONVue (Chrome) and JSONView (Firefox) are popular, free choices. Here's what
an API response looks like with one of these extensions enabled:
A lot prettier than cURL
Download all data
OpenAlex snapshot
For most use cases, the REST API is your best option. However, you can also
download (instructions here) and install a complete copy of the OpenAlex database
on your own server, using the database snapshot. The snapshot consists of seven
files (split into smaller files for convenience), with one file for each of our seven
entity types. The files are in the JSON Lines format; each line is a JSON object,
exactly the same as you'd get from our API. The properties of these JSON objects
are documented in each entity's object section (for example, the Work object).

The snapshot is updated about once per month; you can read release notes for
each new update here.

If you've worked with a dataset like this before, the snapshot data format may be all
you need to get going. If not, read on.

The rest of this guide will tell you how to (a) download the snapshot and (b) upload
it to your own database. We’ll cover two general approaches:

Load the intact OpenAlex records to a data warehouse (we’ll use BigQuery as an
example) and use native JSON functions to query the Work, Author, Source,
Institution, Concept, and Publisher objects directly.
Flatten the records into a normalized schema in a relational database (we’ll use
PostgreSQL) while preserving the relationships between objects.

We'll assume you're initializing a fresh snapshot. To keep it up to date, you'll have
to take the information from Downloading updated Entities and generalize from the
steps in the guide.

This is hard. Working with such a big and complicated dataset hardly ever goes
according to plan. If it gets scary, try the REST API. In fact, try the REST API first. It can
answer most of your questions and has a much lower barrier to entry.

There’s more than one way to do everything. We’ve tried to pick one reasonable
default way to do each step, so if something doesn’t work in your environment or
with the tools you have available, let us know.
Up next: the snapshot data format, downloading the data and getting it into your
database.
Snapshot data format
Here are the details on where the OpenAlex data lives and how it's structured.

All the data is stored in Amazon S3, in the openalex bucket.

The data files are gzip-compressed JSON Lines, one row per entity.
The bucket contains one prefix (folder) for each entity type: works, authors,
sources, institutions, concepts, and publishers.
Records are partitioned by updated_date. Within each entity type prefix, each
object (file) is further prefixed by this date. For example, if an Author has an
updated_date of 2021-12-30 it will be prefixed
/data/authors/updated_date=2021-12-30/.

If you're initializing a fresh snapshot, the updated_date partitions aren't
important yet. You need all the entities, so for Authors you would get
/data/authors/*/*.gz

There are multiple objects under each updated_date partition. Each is under
2GB.
The manifest file is JSON (in redshift manifest format) and lists all the data files
for each object type - /data/works/manifest lists all the works.
The gzip-compressed snapshot takes up about 330 GB and decompresses to
about 1.6 TB.
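If you've already downloaded the files, reading them takes little more than gzip and json. This sketch assumes a local copy with the same data/<entity>/updated_date=.../*.gz layout as the bucket; iter_snapshot_entities is a hypothetical helper that streams one parsed record at a time instead of loading a whole file into memory.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_snapshot_entities(snapshot_dir: str, entity: str) -> Iterator[dict]:
    """Yield every record of one entity type from a downloaded snapshot.

    Reads data/<entity>/updated_date=*/*.gz, i.e. the same files as
    /data/authors/*/*.gz when entity is "authors".
    """
    data_dir = Path(snapshot_dir) / "data" / entity
    for gz_path in sorted(data_dir.glob("updated_date=*/*.gz")):
        # each file is gzip-compressed JSON Lines: one entity per line
        with gzip.open(gz_path, "rt", encoding="utf-8") as lines:
            for line in lines:
                if line.strip():
                    yield json.loads(line)
```

For example, sum(1 for _ in iter_snapshot_entities("openalex-snapshot", "authors")) counts the Authors in your copy.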

The structure of each entity type is documented here: Work, Author, Source,
Institution, Concept, and Publisher.

We have recently added folders for the new entities topics, fields, subfields, and
domains, and we will be adding others soon. This documentation will soon be
updated to reflect these changes.

Visualization of the entity_type/updated_date folder structure

This is a screenshot showing the "leaf" nodes of one entity type's updated_date
folder. You can also click around the browser links above to get a sense of the
snapshot's structure.
Downloading updated Entities
Once you have a copy of the snapshot, you'll probably want to keep it up to date.
The updated_date partitions make this easy, but the way they work may be
unfamiliar. Unlike a set of dated snapshots that each contain the full dataset as of a
certain date, each partition contains the records that last changed on that date.

If we imagine launching OpenAlex on 2021-12-30 with 1000 Authors, each being
newly created on that date, /data/authors/ looks like this:

/data/authors/
├── manifest
└── updated_date=2021-12-30 [1000 Authors]
├── 0000_part_00.gz
...
└── 0031_part_00.gz

If, on 2022-01-04, we made changes to 50 of those Authors , they would come out
of one of the files in /data/authors/updated_date=2021-12-30 and go into one in
/data/authors/updated_date=2022-01-04:
/data/authors/
├── manifest
├── updated_date=2021-12-30 [950 Authors]
│ ├── 0000_part_00.gz
│ ...
│ └── 0031_part_00.gz
└── updated_date=2022-01-04 [50 Authors]
├── 0000_part_00.gz
...
└── 0031_part_00.gz

If we also discovered 50 new Authors, they would go in that same partition, so the
totals would look like this:

/data/authors/
├── manifest
├── updated_date=2021-12-30 [950 Authors]
│ ├── 0000_part_00.gz
│ ...
│ └── 0031_part_00.gz
└── updated_date=2022-01-04 [100 Authors]
├── 0000_part_00.gz
...
└── 0031_part_00.gz

So if you made your copy of the snapshot on 2021-12-30, you would only need to
download /data/authors/updated_date=2022-01-04 to get everything that was
changed or added since then.

To update a snapshot copy that you created or updated on date X, insert or update
the records in objects where updated_date > X.
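The update rule can be sketched as a filter over partition names. partitions_to_fetch is a hypothetical helper; the names could come from a local listing or from aws s3 ls, and last_synced is the date X on which you last created or updated your copy.

```python
import re
from datetime import date
from typing import Iterable, List

PARTITION_DATE = re.compile(r"updated_date=(\d{4}-\d{2}-\d{2})")


def partitions_to_fetch(partition_names: Iterable[str], last_synced: date) -> List[str]:
    """Return the partitions whose updated_date is strictly after last_synced."""
    newer = []
    for name in partition_names:
        match = PARTITION_DATE.search(name)  # skips entries like "manifest"
        if match and date.fromisoformat(match.group(1)) > last_synced:
            newer.append(name)
    # ISO dates sort chronologically as strings
    return sorted(newer)


print(partitions_to_fetch(["updated_date=2021-12-30", "updated_date=2022-01-04"],
                          date(2021, 12, 30)))
# ['updated_date=2022-01-04']
```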

You never need to go back for a partition you've already downloaded. Anything that
changed isn't there anymore, it's in a new partition.

At the time of writing, these are the Author partitions and the number of records in
each (in the actual dataset):

updated_date=2021-12-30/ - 62,573,099
updated_date=2021-12-31/ - 97,559,192
updated_date=2022-01-01/ - 46,766,699
updated_date=2022-01-02/ - 1,352,773

This reflects the creation of the dataset on 2021-12-30 and 145,678,664 combined
updates and inserts since then - 1,352,773 of which were on 2022-01-02. Over
time, the number of partitions will grow. If we make a change that affects all
records, the partitions before the date of the change will disappear.

Merged Entities

See Merged Entities for an explanation of what Entity merging is and why we do it.

Alongside the folders for the six Entity types - work, author, source, institution,
concept, and publisher - you'll find a seventh folder: merged_ids. Within this folder
you'll find the IDs of Entities that have been merged away, along with the Entity IDs
they were merged into.

Keep in mind that merging an Entity ID is a way of deleting the Entity while
persisting its ID in OpenAlex. In practice, you can just delete the Entity it belongs to.
It's not necessary to keep track of the date or which entity it was merged into.

Merge operations are separated into files by date. Each file lists the IDs of Entities
that were merged on that date, and names the Entities they were merged into.

/data/merged_ids/
├── authors
│ └── 2022-06-07.csv.gz
├── institutions
│ └── 2022-06-01.csv.gz
├── venues
│ └── 2022-06-03.csv.gz
└── works
└── 2022-06-06.csv.gz

For example, data/merged_ids/authors/2022-06-07.csv.gz begins:


merge_date,id,merge_into_id
2022-06-07,A2257618939,A2208157607

When processing this file, all you need to do is delete A2257618939. The effects of
merging these authors, like crediting A2208157607 with their Works, are already
reflected in the affected Entities.

Like the Entities' updated_date partitions, you only ever need to download
merged_ids files that are new to you. Any later merges will appear in new files with
later dates.
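Processing a merged_ids file can be as simple as collecting the id column and deleting those records. This is a sketch: merged_away_ids and apply_merges are hypothetical helpers, and the in-memory dict stands in for your database. It also assumes your records are keyed by the full OpenAlex ID (https://openalex.org/A2257618939, as in the entity JSON), whereas the CSV lists the short form (A2257618939).

```python
import csv
import gzip
from typing import Dict, Set

OPENALEX_URL_PREFIX = "https://openalex.org/"


def merged_away_ids(csv_gz_path: str) -> Set[str]:
    """Read one merged_ids file and return the IDs that were merged away."""
    with gzip.open(csv_gz_path, "rt", encoding="utf-8", newline="") as f:
        return {row["id"] for row in csv.DictReader(f)}


def apply_merges(store: Dict[str, dict], merged_ids: Set[str]) -> int:
    """Delete merged-away records from a dict keyed by full OpenAlex ID URLs."""
    removed = 0
    for short_id in merged_ids:
        # merge_into_id is not needed: its record already reflects the merge
        if store.pop(OPENALEX_URL_PREFIX + short_id, None) is not None:
            removed += 1
    return removed
```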

The manifest file


When we start writing a new updated_date partition for an entity, we'll delete that
entity's manifest file. When we finish writing the partition, we'll recreate the
manifest, including the newly-created objects. So if manifest is there, all the
entities are there too.

The file is in redshift manifest format. To use it as part of the update process for an
Entity type (we'll keep using Authors as an example):

1. Download s3://openalex/data/authors/manifest.
2. Get the file list from the url property of each item in the entries list.
3. Download any objects with an updated_date you haven't seen before.
4. Download s3://openalex/data/authors/manifest again. If it hasn't changed
since (1), no records moved around and any date partitions you downloaded are
valid.
5. Decompress the files you downloaded and parse one JSON Author per line.
Insert or update into your database of choice, using each entity's ID as a
primary key.
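Steps 2 to 4 can be sketched in Python. These helpers are hypothetical; they assume the manifest text has already been downloaded (step 1) and read only the url property of each item in entries, as described above.

```python
import json
import re
from typing import Iterable, List

PARTITION_DATE = re.compile(r"updated_date=(\d{4}-\d{2}-\d{2})")


def manifest_urls(manifest_text: str) -> List[str]:
    """Step 2: list the url of every item in the manifest's entries."""
    return [entry["url"] for entry in json.loads(manifest_text)["entries"]]


def urls_to_download(manifest_text: str, seen_dates: Iterable[str]) -> List[str]:
    """Step 3: keep only objects whose updated_date partition is new to you."""
    seen = set(seen_dates)
    urls = []
    for url in manifest_urls(manifest_text):
        match = PARTITION_DATE.search(url)
        if match and match.group(1) not in seen:
            urls.append(url)
    return urls


def manifest_unchanged(before_text: str, after_text: str) -> bool:
    """Step 4: the download is consistent if the file list did not change."""
    return manifest_urls(before_text) == manifest_urls(after_text)
```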

If you’ve worked with a dataset like this before and have a toolchain picked out, this
may be all you need to know. If you want more detailed steps, proceed to download
the data.
Download to your machine
First off: anyone can get the data for free. While the files are hosted on S3 and we’ll
be using Amazon tools in these instructions, you don’t need an Amazon account.

Many thanks to the AWS Open Data program. They cover the data-transfer fees (about
$70 per download!) so users don't have to.

Before you load the snapshot contents to your database, you’ll need to get the files
that make it up onto your own computer. There are exceptions, like loading to
Redshift from S3 or using an ETL product like Xplenty with an S3 connector. If either
of these applies to you, see if the snapshot data format is enough to get you started.

The easiest way to get the files is with the Amazon Web Services Command Line
Interface (AWS CLI). Sample commands in this documentation will use the AWS
CLI. You can find instructions for installing it on your system here:
https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

You can also browse the snapshot files using the AWS console here:
https://openalex.s3.amazonaws.com/browse.html. This browser and the CLI will
work without an account.

This shell command will copy everything in the openalex S3 bucket to a local
folder named openalex-snapshot . It'll take up roughly 300GB of disk space.

aws s3 sync "s3://openalex" "openalex-snapshot" --no-sign-request

If you download the snapshot into an existing folder, you'll need to use the
aws s3 sync --delete flag to remove files from any previous downloads. You can
also remove the contents of the destination folder manually. If you don't, you will see
duplicate Entities that have moved from one file to another between snapshot updates.

The size of the snapshot will change over time. You can check the current size
before downloading by looking at the output of:
aws s3 ls --summarize --human-readable --no-sign-request --recursive "s3:

You should get a file structure like this (edited for length - there are more objects in
the actual bucket):

openalex-snapshot/
├── LICENSE.txt
├── RELEASE_NOTES.txt
└── data
├── authors
│ ├── manifest
│ └── updated_date=2021-12-28
│ ├── 0000_part_00.gz
│ └── 0001_part_00.gz
├── concepts
│ ├── manifest
│ └── updated_date=2021-12-28
│ ├── 0000_part_00.gz
│ └── 0001_part_00.gz
├── institutions
│ ├── manifest
│ └── updated_date=2021-12-28
│ ├── 0000_part_00.gz
│ └── 0001_part_00.gz
├── sources
│ ├── manifest
│ └── updated_date=2021-12-28
│ ├── 0000_part_00.gz
│ └── 0001_part_00.gz
└── works
├── manifest
└── updated_date=2021-12-28
├── 0000_part_00.gz
└── 0001_part_00.gz
Upload to your database
Now that you have a copy of the OpenAlex data, you can do one of these:

upload it to a data warehouse
upload it to a relational database
Load to a data warehouse
In many data warehouse and document store applications, you can load the
OpenAlex entities as-is and query them directly. We’ll use BigQuery as an example
here. (Elasticsearch docs coming soon). To follow along you’ll need the Google
Cloud SDK. You’ll also need a Google account that can make BigQuery tables that
are, well… big. Which means it probably won’t be free.

We'll show you how to do this in 4 steps:

1. Create a BigQuery Project and Dataset to hold your tables
2. Create the tables that will hold your entity JSON records
3. Copy the data files to the tables you created
4. Run some queries on the data you loaded

This guide will have you load each entity to a single text column, then use BigQuery's
JSON functions to parse them when you run your queries. This is convenient but
inefficient since each object has to be parsed every time you run a query.

This project, kindly shared by @DShvadron, takes a more efficient approach:

https://github.com/DrorSh/openalex_to_gbq

Separating the Entity data into multiple columns takes more work up front but lets you
write queries that are faster, simpler, and often cheaper.

Snowflake users can connect to a ready-to-query data set on the marketplace,
helpfully maintained by Util -
https://app.snowflake.com/marketplace/listing/GZT0ZOMX4O7

Step 1: Create a BigQuery Project and Dataset
In BigQuery, you need a Project and Dataset to hold your tables. We’ll call the
project “openalex-demo” and the dataset “openalex”. Follow the linked instructions
to create the Project, then create the dataset inside it:

bq mk openalex-demo:openalex

Dataset 'openalex-demo:openalex' successfully created

Step 2: Create tables for each entity type

Now, we’ll create tables inside the dataset, one for each entity type. Since we’re
using JSON, each table will have just one text column, named after the entity type
(work for the works table, author for the authors table).

bq mk --table openalex-demo:openalex.works work:string

Table 'openalex-demo:openalex.works' successfully created.

bq mk --table openalex-demo:openalex.authors author:string

Table 'openalex-demo:openalex.authors' successfully created

and so on for sources, institutions, concepts, and publishers.

Step 3: Load the data files


We’ll load each table’s data from the JSON Lines files we downloaded earlier. For
works , the files were:

openalex-snapshot/data/works/updated_date=2021-12-28/0000_part_00.gz
openalex-snapshot/data/works/updated_date=2021-12-28/0001_part_00.gz
Here’s a command to load one works file (don’t run it yet):

bq load \
--project_id openalex-demo \
--source_format=CSV -F '\t' \
--schema 'work:string' \
openalex.works \
'openalex-snapshot/data/works/updated_date=2021-12-28/0000_part_00.gz'

See the full documentation for the bq load command here:

https://cloud.google.com/bigquery/docs/reference/bq-cli-reference#bq_load

This part of the command may need some explanation:

--source_format=CSV -F '\t' --schema 'work:string'

BigQuery is expecting multiple columns with predefined datatypes (a “schema”).
We’re tricking it into accepting a single text column (--schema 'work:string') by
specifying CSV format (--source_format=CSV) with a column delimiter that isn’t
present in the file (-F '\t') (\t means “tab”).

bq load can only handle one file at a time, so you must run this command once
per file. But remember that the real dataset will have many more files than this
example does, so it's impractical to copy, edit, and rerun the command each time.
It's easier to handle all the files in a loop, like this:

for data_file in openalex-snapshot/data/works/*/*.gz;
do
    bq load --source_format=CSV -F '\t' \
        --schema 'work:string' \
        --project_id openalex-demo \
        openalex.works "$data_file";
done

This step is slow. How slow depends on your upload speed, but for Author and
Work we're talking hours, not minutes.
You can speed this up by using parallel or other tools to run multiple upload
commands at once. If you do, watch out for errors caused by hitting BigQuery quota
limits.
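The commands from the loop can also be generated and run a few at a time from Python. This is a sketch under these assumptions: bq_load_commands and run_in_parallel are hypothetical helpers, the bq flags are copied from the loop above, and -F receives a literal backslash-t exactly as the shell passes it. A thread pool is enough here because each worker just waits on a bq subprocess; keep workers small to stay clear of BigQuery quota limits.

```python
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def bq_load_commands(snapshot_dir: str, table: str, column: str,
                     project: str = "openalex-demo") -> List[List[str]]:
    """One bq load argv per data file, mirroring the shell loop."""
    pattern = f"{snapshot_dir}/data/{table}/*/*.gz"
    return [
        ["bq", "load", "--source_format=CSV", "-F", r"\t",  # literal backslash-t
         "--schema", f"{column}:string", "--project_id", project,
         f"openalex.{table}", path]
        for path in sorted(glob.glob(pattern))
    ]


def run_in_parallel(commands: List[List[str]], workers: int = 4) -> List[Tuple[str, str]]:
    """Run the loads a few at a time and return (file, stderr) for any failures."""
    def run(cmd: List[str]) -> subprocess.CompletedProcess:
        return subprocess.run(cmd, capture_output=True, text=True)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, commands))
    return [(cmd[-1], result.stderr) for cmd, result in zip(commands, results)
            if result.returncode != 0]
```

For example, run_in_parallel(bq_load_commands("openalex-snapshot", "works", "work")) returns the files that failed, so you can retry just those.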

Do this once per entity type, substituting each entity name for work / works as
needed. When you’re finished, you’ll have one table per entity type, each looking like this:

a screenshot of two rows of the works table from the BigQuery console

Step 4: Run your queries!


Now you have all the OpenAlex data in a place where you can do anything you
want with it using BigQuery JSON functions through bq query or the BigQuery
console.

Here’s a simple one, extracting the OpenAlex ID and OA status for each work:

select
json_value(work, '$.id') as work_id,
json_value(work, '$.open_access.is_oa') as is_oa
from
`openalex-demo.openalex.works`;

It will give you a list of IDs and OA statuses (this is a truncated sample; the real result will be millions
of rows):

https://openalex.org/W2741809807 TRUE
https://openalex.org/W1491283979 FALSE
https://openalex.org/W1491315632 FALSE

You can run queries like this directly in your shell:

bq query \
--project_id=openalex-demo \
--use_legacy_sql=false \
"select json_value(work, '$.id') as work_id, json_value(work, '$.open_acc

But even simple queries are hard to read and edit this way. It’s better to write them
in a file than directly on the command line. Here’s an example of a slightly more
complex query - finding the author with the most open access works of all time:

with work_authorships_oa as (
select
json_value(work, '$.id') as work_id,
json_query_array(work, '$.authorships') as authorships,
cast(json_value(work, '$.open_access.is_oa') as BOOL) as is_oa
from `openalex-demo.openalex.works`
), flat_authorships as (
select work_id, authorship, is_oa
from work_authorships_oa,
unnest(authorships) as authorship
)
select
json_value(authorship, '$.author.id') as author_id,
count(distinct work_id) as num_oa_works
from flat_authorships
where is_oa
group by author_id
order by num_oa_works desc
limit 1;

We get one result:

author_id num_oa_works
https://openalex.org/A2798520857 3297

Checking out https://api.openalex.org/authors/A2798520857, we see that this is
Ashok Kumar at Manipal University Jaipur.
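If you want to sanity-check that query on a small sample before paying for it on the full table, the same logic fits in a few lines of Python. This sketch assumes works is an iterable of parsed Work objects (for example, decoded lines from the snapshot files); top_oa_author is a hypothetical helper. It counts distinct open access works per author, like count(distinct work_id), but unlike the SQL it skips authorships with no author ID instead of grouping them under null.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple


def top_oa_author(works: Iterable[dict]) -> Optional[Tuple[str, int]]:
    """Return (author_id, number of distinct open access works) for the top author."""
    oa_works: Dict[str, Set[str]] = {}
    for work in works:
        if not (work.get("open_access") or {}).get("is_oa"):
            continue  # same as "where is_oa"
        for authorship in work.get("authorships") or []:
            author_id = (authorship.get("author") or {}).get("id")
            if author_id:
                # a set per author plays the role of count(distinct work_id)
                oa_works.setdefault(author_id, set()).add(work["id"])
    counts = Counter({author: len(ids) for author, ids in oa_works.items()})
    top = counts.most_common(1)  # order by num_oa_works desc limit 1
    return top[0] if top else None
```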
Load to a relational database
Compared to using a data warehouse, loading the dataset into a relational database
takes more work up front but lets you write simpler queries and run them on less
powerful machines. One important caveat is that this is a lot of data, and
exploration will be very slow in most relational databases.

By using a relational database, you trade flexibility for efficiency in certain selected
operations. The tables, columns, and indexes we have chosen in this guide represent
only one of many ways the entity objects could be stored. It may not be the best way
to store them given the queries you want to run. Some queries will be fast, others will
be painfully slow.

We’re going to use PostgreSQL as an example and skip the database server setup
itself. We’ll assume you have a working postgres 13+ installation on which you can
create schemas and tables and run queries. With that as a starting point, we'll take
you through these steps:

1. Define the tables the data will be stored in and some key relationships between
them (the "schema").
2. Convert the JSON Lines files you downloaded to CSV files that can be read by
the database application. We'll flatten them to fit a relational database model.
3. Load the CSV data into the tables you created.
4. Run some queries on the data you loaded.

Step 1: Create the schema


Running this SQL on your database (in the psql client, for example) will initialize a
schema for you.

Run it and you'll be set up to follow the next steps. To show you what it's doing,
we'll explain some excerpts here, using the concept entity as an example.
SQL in this section isn't anything additional you need to run. It's part of the schema we
already defined in the file above.

The key thing we're doing is "flattening" the nested JSON data. Some parts of this
are easy. Concept.id is just a string, so it goes in a text column called "id":

CREATE TABLE openalex.concepts (
    id text NOT NULL,
    -- plus some other columns ...
);

But Concept.related_concepts isn't so simple. You could store the JSON array
intact in a postgres JSON or JSONB column, but you would lose much of the
benefit of a relational database. It would be hard to answer questions about related
concepts with more than one degree of separation, for example. So we make a
separate table to hold these relationships:

CREATE TABLE openalex.concepts_related_concepts (
    concept_id text,
    related_concept_id text,
    score real
);

We can preserve score in this relationship table and look up any other attributes
of the dehydrated related concepts in the main table concepts . Creating indexes
on concept_id and related_concept_id lets us look up concepts on both sides
of the relationship quickly.
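Flattening one relationship looks roughly like this. It's a sketch of the idea, not the flatten-openalex-jsonl.py script from the next step: related_concept_rows and write_rows are hypothetical helpers that turn each Concept's related_concepts list into concept_id, related_concept_id, score rows, ready to copy into the concepts_related_concepts table.

```python
import csv
from typing import Iterable, Iterator, TextIO

FIELDS = ["concept_id", "related_concept_id", "score"]


def related_concept_rows(concept: dict) -> Iterator[dict]:
    """Yield one row per related concept, keeping the score on the relationship."""
    for related in concept.get("related_concepts") or []:
        yield {"concept_id": concept["id"],
               "related_concept_id": related["id"],
               "score": related.get("score")}


def write_rows(concepts: Iterable[dict], out: TextIO) -> None:
    """Write a header plus every relationship row for the given Concepts as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for concept in concepts:
        writer.writerows(related_concept_rows(concept))
```

Other attributes of the related concepts stay in the main concepts table, so only the two IDs and the score are written here.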

Step 2: Convert the JSON Lines files to CSV


This python script will turn the JSON Lines files you downloaded into CSV files that
can be copied to the tables you created in step 1.

This script assumes your downloaded snapshot is in openalex-snapshot and you've
made a directory csv-files to hold the CSV files.
Edit SNAPSHOT_DIR and CSV_DIR at the top of the script to read or write the files
somewhere else.

This script has only been tested using python 3.9.5.

Copy the script to the directory above your snapshot (if the snapshot is in
/home/yourname/openalex/openalex-snapshot/, name it something like
/home/yourname/openalex/flatten-openalex-jsonl.py).

Run it like this:

mkdir -p csv-files
python3 flatten-openalex-jsonl.py

This script is slow. Exactly how slow depends on the machine you run it on, but think
hours, not minutes.

If you're familiar with Python, there are two big improvements you can make:

Run flatten_authors and flatten_works at the same time, either by using threading
in Python or just running two copies of the script with the appropriate lines
commented out.
Flatten multiple .gz files within each entity type at the same time. This
means parallelizing the for jsonl_file_name ... loop in each
flatten_ function and writing multiple CSV files per entity type.
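The second improvement (one CSV per .gz file, with files processed concurrently) could look roughly like the sketch below. It is a hypothetical helper, not part of the published script: each worker reads one .gz file and writes its own concepts_related_concepts__<dir>__<part>.csv, so no two workers ever write to the same file. Processes are the default because JSON parsing is CPU-bound and Python threads share one interpreter lock; the executor_cls parameter lets you swap in ThreadPoolExecutor.

```python
import csv
import gzip
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def flatten_related_concepts_file(jsonl_gz, csv_dir):
    """Flatten one .gz file into its own CSV so workers never share an output file."""
    jsonl_gz, csv_dir = Path(jsonl_gz), Path(csv_dir)
    # Partition directory + part name keeps output names unique across directories.
    out_path = csv_dir / f"concepts_related_concepts__{jsonl_gz.parent.name}__{jsonl_gz.stem}.csv"
    with gzip.open(jsonl_gz, "rt", encoding="utf-8") as jsonl, \
         open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["concept_id", "related_concept_id", "score"])
        for line in jsonl:
            if line.strip():
                concept = json.loads(line)
                for related in concept.get("related_concepts") or []:
                    writer.writerow([concept["id"], related["id"], related.get("score")])
    return out_path


def flatten_in_parallel(jsonl_files, csv_dir, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Process every input file concurrently; return the CSV paths in input order."""
    files = [Path(p) for p in jsonl_files]
    csv_dir = Path(csv_dir)
    csv_dir.mkdir(parents=True, exist_ok=True)
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(flatten_related_concepts_file, files, [csv_dir] * len(files)))
```

Because every output file has its own header row, load each one with a separate copy command rather than concatenating them as-is.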

You should now have a directory full of nice, flat CSV files:
$ tree csv-files/
csv-files/
├── concepts.csv
├── concepts_ancestors.csv
├── concepts_counts_by_year.csv
├── concepts_ids.csv
└── concepts_related_concepts.csv
...
$ cat csv-files/concepts_related_concepts.csv
concept_id,related_concept_id,score
https://openalex.org/C41008148,https://openalex.org/C33923547,253.92
https://openalex.org/C41008148,https://openalex.org/C119599485,153.019
https://openalex.org/C41008148,https://openalex.org/C121332964,143.935
...

Step 3: Load the CSV files to the database


Now we run one postgres copy command to load each CSV file to its corresponding
table. Each command looks like this:

\copy openalex.concepts_ancestors (concept_id, ancestor_id) from csv-file

This script will run all the copy commands in the right order. Here's how to run it:

1. Copy it to the same place as the python script from step 2, right above the folder
with your CSV files.
2. Set the environment variable OPENALEX_SNAPSHOT_DB to the connection URI
for your database.
3. If your CSV files aren't in csv-files, replace each occurrence of 'csv-files/' in
the script with the correct path.
4. Run it like this (from your shell prompt):

psql $OPENALEX_SNAPSHOT_DB < copy-openalex-csv.sql

or like this (from psql):

\i copy-openalex-csv.sql
There are many ways to do this: just run the copy commands from the script above,
in the right order, in whatever client you're familiar with.
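If you would rather drive the load from Python than from psql, one option is psycopg2's cursor.copy_expert, which streams a client-side file through COPY ... FROM STDIN much as \copy does. This is an illustrative sketch under several assumptions: COPY_TARGETS is a hypothetical two-table subset, each CSV is assumed to start with a header row (as in the concepts_related_concepts.csv sample above), and the psycopg2 package must be installed.

```python
import os

# Hypothetical subset of (table, columns, CSV file name); extend it to match
# your own CREATE TABLE statements and the files in csv-files/.
COPY_TARGETS = [
    ("openalex.concepts_ancestors", ["concept_id", "ancestor_id"], "concepts_ancestors.csv"),
    ("openalex.concepts_related_concepts",
     ["concept_id", "related_concept_id", "score"], "concepts_related_concepts.csv"),
]


def copy_statement(table, columns):
    """COPY ... FROM STDIN for a CSV file whose first row is a header."""
    return f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv, HEADER true)"


def load_all(dsn, csv_dir="csv-files", targets=COPY_TARGETS):
    """Stream each CSV file to its table, in list order, inside one transaction."""
    import psycopg2  # third-party driver; imported here so copy_statement works without it

    conn = psycopg2.connect(dsn)  # accepts a connection URI like $OPENALEX_SNAPSHOT_DB
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            with conn.cursor() as cur:
                for table, columns, file_name in targets:
                    with open(os.path.join(csv_dir, file_name), encoding="utf-8") as f:
                        cur.copy_expert(copy_statement(table, columns), f)
    finally:
        conn.close()
```

You would call it as load_all(os.environ["OPENALEX_SNAPSHOT_DB"]), keeping COPY_TARGETS in the same order as the commands in copy-openalex-csv.sql.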

Step 4: Run your queries!


Now you have all the OpenAlex data in your database and can run queries in your
favorite client.

Here’s a simple one, getting the OpenAlex ID and OA status for each work:

select w.id, oa.oa_status
from openalex.works w
join openalex.works_open_access oa
    on w.id = oa.work_id;

You'll get results like this (truncated; the actual result will be millions of rows):

id                                  oa_status
https://openalex.org/W1496190310    closed
https://openalex.org/W2741809807    gold
https://openalex.org/W1496404095    bronze

Here’s a more complex query, finding the author with the most open
access works of all time:
select
    author_id,
    count(distinct work_id) as num_oa_works
from (
    select
        a.id as author_id,
        w.id as work_id,
        oa.is_oa
    from
        openalex.authors a
        join openalex.works_authorships wa on a.id = wa.author_id
        join openalex.works w on wa.work_id = w.id
        join openalex.works_open_access oa on w.id = oa.work_id
) work_authorships_oa
where is_oa
group by 1
order by 2 desc
limit 1;

We get the one row we asked for:

author_id                           num_oa_works
https://openalex.org/A2798520857    3297

Checking out https://api.openalex.org/authors/A2798520857, we see that this is
Ashok Kumar at Manipal University Jaipur. We could also have found this directly in
the query by selecting the author's name from openalex.authors, which the query already joins.
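Here is one way the query could return the name as well. It is a sketch run against a tiny in-memory SQLite stand-in (made-up IDs and names, no openalex. schema prefix). It also drops the subquery and the join to works, since works_open_access already keys on work_id; restore the openalex. prefixes and the same SELECT should work in Postgres.

```python
import sqlite3


def top_oa_author(conn):
    """Return (author_id, display_name, num_oa_works) for the author with the most OA works."""
    return conn.execute(
        """
        SELECT a.id, a.display_name, count(DISTINCT wa.work_id) AS num_oa_works
        FROM authors a
        JOIN works_authorships wa ON a.id = wa.author_id
        JOIN works_open_access oa ON wa.work_id = oa.work_id
        WHERE oa.is_oa
        GROUP BY a.id, a.display_name
        ORDER BY num_oa_works DESC
        LIMIT 1
        """
    ).fetchone()


def demo_connection():
    """Tiny in-memory stand-in for the openalex schema, with made-up IDs and names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id text, display_name text);
        CREATE TABLE works_authorships (work_id text, author_id text);
        CREATE TABLE works_open_access (work_id text, is_oa boolean);
        INSERT INTO authors VALUES ('A1', 'Ada'), ('A2', 'Bob');
        INSERT INTO works_authorships VALUES
            ('W1', 'A1'), ('W2', 'A1'), ('W3', 'A1'), ('W1', 'A2'), ('W3', 'A2');
        INSERT INTO works_open_access VALUES ('W1', 1), ('W2', 1), ('W3', 0);
        """
    )
    return conn
```

With the toy data, top_oa_author(demo_connection()) returns ("A1", "Ada", 2): A1 has two open access works (W1, W2), while A2 has only one.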
Postgres schema diagram
This is a diagram of one possible schema for storing the OpenAlex data in a
relational database. It's the one used in our examples here, but may not be the best
one for the ways you'll use the dataset.

Additional Help
Tutorials
We're working on a collection of tutorials to demonstrate how to use
OpenAlex to answer all sorts of questions. Check back often for more! Here's what
we have so far:

Turn the page - Use paging to collect all of the works from an author.
Monitoring Open Access publications for a given institution - Learn how to filter
and group with the API.
What are the publication sources located in Japan? - Use the source entity to
look at a country's publications over time.
Calculate the h-index for a given author - Use filtering, sorting, and paging to get
citation counts and calculate the h-index, an author-level metric.
How are my institution's researchers collaborating with people around the
globe? - Learn about institutions in OpenAlex while exploring the
international research collaborations made by a university.
Getting started with OpenAlex Premium - Use your Premium API Key to
download the latest updates from our API and keep your data in sync with ours.
Introduction to openalexR - In this R notebook, an accompaniment to the
webinar on openalexR, you'll learn the basics of using the openalexR library to
get data from OpenAlex.
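The h-index tutorial boils down to one small calculation once you have a citation count for each of an author's works (for example, the cited_by_count of every work collected by paging through the author's works). A minimal sketch: sort the counts in descending order and find the largest rank h whose count is still at least h.

```python
def h_index(citation_counts):
    """Largest h such that h works each have at least h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger rank can qualify
    return h
```

For example, h_index([10, 8, 5, 4, 3]) gives 4: four works have at least 4 citations each, but there are not five works with at least 5.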
Report bugs
Oh no, you found a bug! 🕷️
Please tell us about it using this form on our help page.
FAQ
How do I cite OpenAlex?
See our citation section here.

Are OpenAlex IDs stable?


Yes!* The work associated with ID W1234 will keep the ID W1234.

When we find duplicate works, authors, etc. that already have assigned IDs, we
merge them. Merged entities will redirect to the proper entity in the API. In the data
snapshot, there is a directory that lists the IDs that have been merged.
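Merges can chain (A merged into B, which was later merged into C), so a lookup table built from the merged-IDs directory should be followed until it stops. This sketch assumes you have already read those files into a plain dict from old ID to the ID it was merged into; the files' exact layout and column names aren't described here, so check them in your snapshot.

```python
def resolve_merged_id(openalex_id, merged_into):
    """Follow merge redirects until reaching an ID that was not merged away.

    merged_into: hypothetical dict mapping an old ID to the ID it was merged into.
    """
    seen = set()
    current = openalex_id
    while current in merged_into:
        if current in seen:
            raise ValueError(f"merge cycle involving {current}")
        seen.add(current)
        current = merged_into[current]
    return current
```

IDs that were never merged come back unchanged, so the function is safe to apply to every stored ID when refreshing a local copy.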

*In July 2023, OpenAlex switched to a new, more accurate author identification
system and replaced all OpenAlex Author IDs with new ones. This is a very rare case in
which we broke our rule of stable IDs, because doing so was necessary to make the
improvements. Old IDs and their connections to works remain available in the
historical OpenAlex data.

Can you index my journal?


We automatically index new journals and articles, so there is nothing you need to
do. We primarily retrieve new records from Crossref, so if you are not seeing your
journal or article in OpenAlex, it is best to check whether it is in Crossref with a query like
https://api.crossref.org/works/<doi> (example). We do not curate journals or
limit which journals are included in OpenAlex, so any discoverable journal will
be added to the dataset.

If your example DOI is in Crossref but not in OpenAlex, please send us a support
request so we can look into it further!
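Before sending a request, you can check both services for a DOI. This sketch uses only the standard library. The Crossref URL follows the pattern above; the doi: prefix on the OpenAlex works endpoint is an assumption about the lookup form, so confirm it against the API documentation.

```python
import urllib.error
import urllib.parse
import urllib.request


def crossref_url(doi):
    """Crossref works lookup for a bare DOI such as 10.1000/xyz123."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")


def openalex_url(doi):
    """OpenAlex works lookup by DOI (doi: prefix assumed)."""
    return "https://api.openalex.org/works/doi:" + urllib.parse.quote(doi, safe="/")


def found(url, timeout=10.0):
    """True if the URL answers 200, False on 404; other HTTP errors are re-raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

The case worth reporting is found(crossref_url(doi)) being True while found(openalex_url(doi)) is False.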

Do you disambiguate authors?


Yes. Using coauthors, references, and other features of the data, we can tell that
the same Jane Smith wrote both "Frog behavior" and "Frogs: A retrospective," but
it's a different Jane Smith who wrote "Oats before boats: The breakfast customs of
17th-Century Dutch bargemen." For more details on this, see the page on Author
disambiguation.

Do you gather author affiliations?


Yes. We automatically gather and normalize author affiliations from both structured
and unstructured sources.

Where does your data come from?


OpenAlex is not doing this alone! Rather, we're aggregating and standardizing data
from a whole bunch of other great projects, like a river fed by many tributaries. Our
two most important data sources are MAG and Crossref. Other key sources include:

ORCID
ROR
DOAJ
Unpaywall
PubMed
PubMed Central
The ISSN International Centre
Internet Archive
Web crawls
Subject-area and institutional repositories from arXiv to Zenodo and everywhere
in between

Learn more at our general help center article: About the data

How often is the data updated?


For now, the database snapshot is updated about once per month. We also offer a
much faster update cadence—as often as once every few hours—through
OpenAlex Premium.

Is your data quality better than ____?


Our dataset is still very young, so there's not a lot of systematic research
comparing OpenAlex to peer databases like MAG, Scopus, Dimensions, etc. We're
currently working on publishing some research like that ourselves. Our initial findings
are very encouraging: we believe OpenAlex is already comparable in coverage and
accuracy to the more established players, but OpenAlex is 100% open data, built
on 100% open-source code. We think that's a really important feature. We will also
continue improving the data quality in the days, weeks, months, and years ahead!

How is OpenAlex licensed?


OpenAlex data is licensed as CC0 so it is free to use and distribute.

How much does OpenAlex cost?


It's free! The website, the API, and the database snapshot are all available at no
charge. As a nonprofit, making this data free and open is part of our mission.

For those who would like a higher level of service and to provide direct financial
support for our mission, we offer OpenAlex Premium. Click here to learn more.

I've noticed incorrect data in an OpenAlex author profile. How can I correct it?
Please see the help section on Author profile curation.

What's your sustainability plan?


Our nonprofit (OurResearch) has a ten-year track record of building sustainable
scholarly infrastructure, and a formal commitment to sustainability as part of our
adoption of the POSI principles.

We're currently still exploring our options for OpenAlex's sustainability plan. Thanks
to a generous grant from Arcadia, we've got lots of runway, and we don't need to
roll anything out in a rush.

Our Unpaywall project (a free index of the world's open-access research literature)
has been self-sustaining via a freemium revenue model for nearly five years, and
we have recently introduced a similar model in OpenAlex Premium. Access to the
data will always be free for everyone, but OpenAlex Premium offers several service
benefits beyond what we offer for free.

I have a question about the openalexR library. Could you help me?
The openalexR package is a great way to work with the OpenAlex API using the R
programming language, but it is third-party software that we do not maintain
ourselves. Please direct any questions you have to its maintainers instead.

How can I count self-citations between works?


If you want to count self-citations (or, inversely, independent citations, where the
citing and cited works have no authors in common), check each citation to see
whether the two works share any Author IDs in their authorships field. See here
for more information.
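A minimal sketch of that check, assuming each work is a dict shaped like an OpenAlex work object (an authorships list whose items carry author.id). How you gather the citing works is up to you; one option is the API's cites: filter.

```python
def author_ids(work):
    """Set of author IDs found in a work's authorships list."""
    ids = set()
    for authorship in work.get("authorships") or []:
        author = authorship.get("author") or {}
        if author.get("id"):
            ids.add(author["id"])
    return ids


def is_self_citation(citing_work, cited_work):
    """True if the citing and cited works share at least one author ID."""
    return not author_ids(citing_work).isdisjoint(author_ids(cited_work))


def split_citations(cited_work, citing_works):
    """Return (self_citations, independent_citations) for cited_work."""
    citing_works = list(citing_works)
    self_count = sum(is_self_citation(w, cited_work) for w in citing_works)
    return self_count, len(citing_works) - self_count
```

A citing work with an empty authorships list counts as independent here, which is a design choice you may want to revisit for records with missing author data.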

Do you provide access to full-text papers?


We provide links to the full-text PDFs for open-access works whenever possible. In
addition, we have access to raw full-text for many works either through PDF
parsing we have done, or using the Internet Archive's general index, which we use
to power our search. You can learn more about this here. We do not currently offer
direct access to raw full-text through the API or data snapshot.
