OpenAlex Technical Documentation
Overview
OpenAlex is a fully open catalog of the global research system. It's named after the
ancient Library of Alexandria and made by the nonprofit OurResearch.
This is the technical documentation for OpenAlex, including the OpenAlex API
and the data snapshot. Here, you can learn how to set up your code to access
OpenAlex's data. If you want to explore the data as a human, you may be more
interested in OpenAlex Web.
Data
The OpenAlex dataset describes scholarly entities and how those entities are
connected to each other. Types of entities include works, authors, sources,
institutions, topics, publishers, and funders.
Together, these make a huge web (or more technically, heterogeneous directed
graph) of hundreds of millions of entities and billions of connections between them
all.
Learn more at our general help center article: About the data
Access
We offer a fast, modern REST API to get OpenAlex data programmatically. It's free
and requires no authentication. The limit for API calls is 100,000 requests per
user per day. For best performance, add your email to all API requests, like
[email protected] . Learn more
There is also a complete database snapshot available to download. Learn more
about the data snapshot here.
The API has a limit of 100,000 calls per day, and the snapshot is updated monthly. If
you need a higher limit, or more frequent updates, please look into OpenAlex
Premium.
The web interface for OpenAlex, built directly on top of the API, is the quickest and
easiest way to get started with OpenAlex.
Why OpenAlex?
OpenAlex offers an open replacement for industry-standard scientific knowledge
bases like Elsevier's Scopus and Clarivate's Web of Science. Compared to these
paywalled services, OpenAlex offers significant advantages in terms of inclusivity,
affordability, and availability.
OpenAlex is:
Big — We have about twice the coverage of the other services, and have
significantly better coverage of non-English works and works from the Global
South.
Easy — Our service is fast, modern, and well-documented.
Open — Our complete dataset is free under the CC0 license, which allows for
transparency and reuse.
Many people and organizations have already found great value using OpenAlex.
Have a look at the Testimonials to hear what they've said!
Contact
For tech support and bug reports, please visit our help page. You can also join the
OpenAlex user group, and follow us on Twitter (@OpenAlex_org) and Mastodon.
Citation
If you use OpenAlex in research, please cite this paper:
Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of
scholarly works, authors, venues, institutions, and concepts. ArXiv.
https://arxiv.org/abs/2205.01833
Quickstart tutorial
Query the OpenAlex dataset using the magic of The Internet
Let's use the OpenAlex API to get journal articles and books published by authors at
Stanford University. We'll limit our search to articles published between 2010 and
2020. Since OpenAlex is free and openly available, these examples work without
any login or account creation. 👍
If you open these examples in a web browser, they will look much better if you have a
browser plug-in such as JSONVue installed.
{
"id": "https://openalex.org/I97018004",
"ror": "https://ror.org/00f54p054",
"display_name": "Stanford University",
"country_code": "US",
"type": "education",
"homepage_url": "http://www.stanford.edu/"
// other fields removed
}
Show works where at least one author is associated with Stanford University
https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004
This is just one of the 50+ ways that you can filter works!
Show works with publication years 2010 to 2020, associated with Stanford
University
https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&sort=publication_date:desc
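The two filter URLs above can also be assembled in code. The following is a minimal Python sketch, not an official client: the helper name build_works_url is hypothetical, and it only builds the URL string without sending a request.

```python
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"  # base URL as used in the examples above


def build_works_url(institution_id, year_from=None, year_to=None, sort=None):
    """Build a /works URL filtered by institution and, optionally, a year range.

    Hypothetical helper: it only assembles the query string shown in the
    tutorial and does not contact the API.
    """
    filters = [f"institutions.id:{institution_id}"]
    if year_from is not None and year_to is not None:
        filters.append(f"publication_year:{year_from}-{year_to}")
    params = {"filter": ",".join(filters)}
    if sort:
        params["sort"] = sort
    # Leave ':', '/' and ',' unescaped so the result reads like the tutorial URLs.
    return f"{API_BASE}/works?{urlencode(params, safe=':/,')}"


url = build_works_url(
    "https://openalex.org/I97018004", 2010, 2020, sort="publication_date:desc"
)
print(url)
```

Multiple filters are joined with a comma inside a single filter parameter, which is how the second example URL combines the institution and the publication years.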
[
{
"key": "2020",
"key_display_name": "2020",
"count": 18627
},
{
"key": "2019",
"key_display_name": "2019",
"count": 15933
},
{
"key": "2017",
"key_display_name": "2017",
"count": 14789
},
...
]
There you have it! This same technique can be applied to hundreds of questions
around scholarly data. The data you received is under a CC0 license, so not only
did you access it easily, you can share it freely! 🎉
What's next?
Jump into an area of OpenAlex that interests you:
Works
Authors
Sources
Institutions
Topics
Publishers
Funders
And check out our tutorials page for some hands-on examples!
API Entities
Entities overview
The OpenAlex dataset describes scholarly entities and how those entities are
connected to each other. Together, these make a huge web (or more technically,
heterogeneous directed graph) of hundreds of millions of entities and billions of
connections between them all.
Works: Scholarly documents like journal articles, books, datasets, and theses
Authors: People who create works
Sources: Where works are hosted (such as journals, conferences, and
repositories)
Institutions: Universities and other organizations to which authors claim
affiliations
Topics: Topics assigned to works
Publishers: Companies and organizations that distribute works
Funders: Organizations that fund research
Geo: Where things are in the world
Works
Journal articles, books, datasets, and theses
Works are scholarly documents like journal articles, books, datasets, and theses.
OpenAlex indexes over 240M works, with about 50,000 added daily. You can
access a work in the OpenAlex API like this:
That will return a list of Work objects, describing everything OpenAlex knows about
each work. We collect new works from many sources, including Crossref, PubMed, and
institutional and discipline-specific repositories (e.g., arXiv). Many older works come
from the now-defunct Microsoft Academic Graph (MAG).
Works are linked to other works via the referenced_works (outgoing citations),
cited_by_api_url (incoming citations), and related_works properties.
What's next
Learn more about what you can do with works:
abstract_inverted_index
Object: The abstract of the work, as an inverted index, which encodes information
about the abstract's words and their positions within the text. Like Microsoft
Academic Graph, OpenAlex doesn't include plaintext abstracts due to legal
constraints.
abstract_inverted_index: {
Despite: [
0
],
growing: [
1
],
interest: [
2
],
in: [
3,
57,
73,
110,
122
],
Open: [
4,
201
],
Access: [
5
],
...
}
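Because the abstract is stored as a mapping from each word to its positions, the plain text can be recovered by sorting the words by position. The function below is a small Python sketch of that idea; the function name is hypothetical and the sample is just the first six entries of the example above.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild plain text from a {word: [positions]} inverted index."""
    if not inverted_index:
        return ""
    positioned = []
    for word, positions in inverted_index.items():
        for position in positions:
            positioned.append((position, word))
    positioned.sort()  # order by position in the original text
    return " ".join(word for _, word in positioned)


# Truncated sample: only the first six positions of the example are used.
sample = {
    "Despite": [0],
    "growing": [1],
    "interest": [2],
    "in": [3],
    "Open": [4],
    "Access": [5],
}
print(abstract_from_inverted_index(sample))
```

A word that occurs several times simply appears once for each of its positions.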
alternate_host_venues (deprecated)
authorships
List: List of Authorship objects, each representing an author and their institution.
Limited to the first 100 authors to maintain API performance.
For more information, see the Authorship object page.
authorships: [
// first authorship object:
{
author_position: "middle",
author: {
id: "https://openalex.org/A5023888391",
display_name: "Jason Priem",
orcid: "https://orcid.org/0000-0001-6187-6610"
},
institutions: [
{
id: "https://openalex.org/I4200000001",
display_name: "OurResearch",
ror: "https://ror.org/02nr0ka47",
country_code: "US",
type: "nonprofit"
}
],
// other fields removed for brevity. See the Authorship object do
},
apc_list
Object: Information about this work's APC (article processing charge). The object
contains:
value : Integer
currency : String
provenance : String — the source of this data. Currently the only value is “doaj”
(DOAJ)
value_usd : Integer — the APC converted into USD
This value is the APC list price: the price as listed by the journal’s publisher. That’s
not always the price actually paid, because publishers may offer various discounts
to authors. Unfortunately we don’t always know this discounted price, but when we
do you can find it in apc_paid .
Currently our only source for this data is DOAJ, and so doaj is the only value for
apc_list.provenance , but we’ll add other sources over time.
We currently don’t have information on the list price for hybrid journals (toll-access
journals that also provide an open-access option), but we will add this at some
point. We do have apc_paid information for hybrid OA works occasionally.
You can use this attribute to find works published in Diamond open access journals
by looking at works where apc_list.value is zero. See open_access.oa_status
for more info.
apc_list: {
value: 3200,
currency: "USD",
value_usd: 3200,
provenance: "doaj"
}
apc_paid
Object: Information about the paid APC (article processing charge) for this work.
The object contains:
value : Integer
currency : String
provenance : String — currently either openapc or doaj , but more will be
added; see below for details.
value_usd : Integer — the APC converted into USD
You can find the listed APC price (when we know it) for a given work using
apc_list . However, authors don’t always pay the listed price; often they get a
discounted price from publishers. So it’s useful to know the APC actually paid by
authors, as distinct from the list price. This is our effort to provide this.
Our best source for the actually paid price is the OpenAPC project. Where available,
we use that data, and so apc_paid.provenance is openapc . Where OpenAPC data
is unavailable (and unfortunately this is common) we make our best guess by
assuming the author paid the APC list price, and apc_paid.provenance will be set to
wherever we got the list price from.
apc_paid: {
value: 2250,
currency: "EUR",
value_usd: 2426,
provenance: "openapc"
}
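The relationship between apc_list and apc_paid can be checked in code. This is a minimal Python sketch with hypothetical helper names; the sample work combines the two example objects above purely for illustration and is not a real record.

```python
def apc_discount_usd(work):
    """Return list price minus paid price in USD, or None if either is unknown."""
    list_usd = (work.get("apc_list") or {}).get("value_usd")
    paid_usd = (work.get("apc_paid") or {}).get("value_usd")
    if list_usd is None or paid_usd is None:
        return None
    return list_usd - paid_usd


def paid_price_is_estimated(work):
    """True when apc_paid did not come from OpenAPC, so it is the list-price guess."""
    apc_paid = work.get("apc_paid")
    return bool(apc_paid) and apc_paid.get("provenance") != "openapc"


def has_zero_list_price(work):
    """True when apc_list.value is zero (the Diamond OA check described above)."""
    apc_list = work.get("apc_list")
    return bool(apc_list) and apc_list.get("value") == 0


# Illustrative work built from the two example objects in this section.
sample_work = {
    "apc_list": {"value": 3200, "currency": "USD", "value_usd": 3200, "provenance": "doaj"},
    "apc_paid": {"value": 2250, "currency": "EUR", "value_usd": 2426, "provenance": "openapc"},
}
print(apc_discount_usd(sample_work), paid_price_is_estimated(sample_work))
```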
best_oa_location
Object: A Location object with the best available open access location for this
work.
biblio
Object: Old-timey bibliographic info for this work. This is mostly useful only in
citation/reference contexts. These are all strings because sometimes you'll get fun
values like "Spring" and "Inside cover."
volume (String)
issue (String)
first_page (String)
last_page (String)
biblio: {
volume: "495",
issue: "7442",
first_page: "437",
last_page: "440"
}
cited_by_api_url
String: A URL that uses the cites filter to display a list of works that cite this work.
This is a way to expand cited_by_count into an actual list of works.
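As an illustration, a URL of this kind can be derived from a work's OpenAlex ID. The sketch below assumes the filter is written as cites:<short ID> on the /works endpoint; the helper name is hypothetical, and in practice you can simply use the cited_by_api_url value returned by the API.

```python
def cited_by_url(work_id):
    """Build a /works URL listing the works that cite the given work.

    Assumes the `cites` filter takes the short ID (the part after the last
    slash of the OpenAlex ID).
    """
    short_id = work_id.rstrip("/").rsplit("/", 1)[-1]
    return f"https://api.openalex.org/works?filter=cites:{short_id}"


print(cited_by_url("https://openalex.org/W2741809807"))
```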
cited_by_count
Integer: The number of citations to this work. These are the times that other works
have cited this work: Other works ➞ This work.
cited_by_count: 382
concepts
Each Concept object in the list also has one additional property:
score (Float): The strength of the connection between the work and this
concept (higher is stronger). This number is produced by AWS Sagemaker, in
the last layer of the machine learning model that assigns concepts.
Concepts with a score of at least 0.3 are assigned to the work. However, ancestors
of an assigned concept are also added to the work, even if the ancestor scores are
below 0.3.
Because ancestor concepts are assigned to works, you may see concepts in works
with very low scores, even some zero scores.
concepts: [
{
id: "https://openalex.org/C71924100",
wikidata: "https://www.wikidata.org/wiki/Q11190",
display_name: "Medicine",
level: 0,
score: 0.9187037
},
{
id: "https://openalex.org/C3007834351",
wikidata: "https://www.wikidata.org/wiki/Q82069695",
display_name: "Severe acute respiratory syndrome coronavirus 2 (SA
level: 5,
score: 0.8070164
},
...
{
id: "https://openalex.org/C191935318",
wikidata: "https://www.wikidata.org/wiki/Q148",
display_name: "China",
level: 2,
score: 0.5948172
},
...
{
id: "https://openalex.org/C121608353",
wikidata: "https://www.wikidata.org/wiki/Q12078",
display_name: "Cancer",
level: 2,
score: 0.46887803
},
...
{
id: "https://openalex.org/C17744445",
wikidata: "https://www.wikidata.org/wiki/Q36442",
display_name: "Political science",
level: 0,
score: 0
}
]
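Since any concept scoring below 0.3 can only be present as an ancestor of an assigned concept, the threshold gives a simple way to separate the two groups. The Python sketch below uses a hypothetical helper name and a shortened version of the example list. Note that an ancestor may also score 0.3 or higher, so the first group is not guaranteed to contain only directly assigned concepts.

```python
ASSIGNMENT_THRESHOLD = 0.3  # threshold stated in the text above


def split_concepts(concepts, threshold=ASSIGNMENT_THRESHOLD):
    """Split concepts into (score >= threshold, score < threshold)."""
    at_or_above = [c for c in concepts if c["score"] >= threshold]
    below = [c for c in concepts if c["score"] < threshold]
    return at_or_above, below


# Shortened sample taken from the example above.
sample_concepts = [
    {"display_name": "Medicine", "level": 0, "score": 0.9187037},
    {"display_name": "Cancer", "level": 2, "score": 0.46887803},
    {"display_name": "Political science", "level": 0, "score": 0},
]
strong, ancestors_only = split_concepts(sample_concepts)
print([c["display_name"] for c in ancestors_only])
```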
corresponding_author_ids
corresponding_institution_ids
List: OpenAlex IDs of any institutions found within an authorship for which
authorships.is_corresponding is true .
corresponding_institution_ids: ["https://openalex.org/I4210123613"]
countries_distinct_count
Integer: Number of distinct country_codes among the authorships for this work.
countries_distinct_count: 4
counts_by_year
List: Works.cited_by_count for each of the last ten years, binned by year. To put it
another way: each year, you can see how many times this work was cited.
Citations more than ten years old aren't included. Years with zero citations are
omitted, so you will need to add them back in if you need them.
counts_by_year: [
{
year: 2022,
cited_by_count: 8
},
{
year: 2021,
cited_by_count: 252
},
...
{
year: 2012,
cited_by_count: 79
}
]
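Because years with zero citations are left out, a consumer that needs a continuous series has to add them back. A minimal Python sketch follows; the function name and the sample values are illustrative only.

```python
def fill_missing_years(counts_by_year, first_year, last_year):
    """Return one entry per year from last_year down to first_year, adding zeros."""
    by_year = {row["year"]: row["cited_by_count"] for row in counts_by_year}
    return [
        {"year": year, "cited_by_count": by_year.get(year, 0)}
        for year in range(last_year, first_year - 1, -1)
    ]


# Illustrative input with 2021 and 2019 missing.
sparse = [
    {"year": 2022, "cited_by_count": 8},
    {"year": 2020, "cited_by_count": 3},
]
filled = fill_missing_years(sparse, first_year=2019, last_year=2022)
print(filled)
```

The output keeps the newest-first order used in the example above.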
created_date
String: The date this Work object was created in the OpenAlex dataset, expressed
as an ISO 8601 date string.
created_date: "2017-08-08"
display_name
String: Exactly the same as Work.title. It's useful for Works to include a
display_name property, since all the other entities have one.
doi
String: The DOI for the work. This is the Canonical External ID for works.
Occasionally, a work has more than one DOI. For example, there might be one DOI
for a preprint version hosted on bioRxiv, and another DOI for the published version.
However, this field always has just one DOI, the DOI for the published work.
doi: "https://doi.org/10.7717/peerj.4375"
fulltext_origin
fulltext_origin: "pdf"
fwci
Float: The Field-weighted Citation Impact (FWCI), calculated for a work as the ratio
of citations received to citations expected in the year of publication and the three
following years. Learn more in the reference article: Field Weighted Citation Impact
(FWCI).
fwci: 76.992
grants
List: List of grant objects, which include the Funder and the award ID, if available.
Our grants data comes from Crossref, and is currently fairly limited.
grants: [
// grant for which we have the grant details:
{
funder: "https://openalex.org/F4320306076",
funder_display_name: "National Science Foundation",
award_id: "ABI 1661218",
},
// grant for which we do not have the details:
{
funder: "https://openalex.org/F4320306084",
funder_display_name: "U.S. Department of Energy",
award_id: null,
},
]
has_fulltext
Boolean: Set to true if the work's full text is searchable in OpenAlex. This does
not necessarily mean that the full text is available to you, dear reader; rather, it
means that we have indexed the full text and can use it to help power searches. If
you are trying to find the full text for yourself, try looking in open_access.oa_url .
We get access to the full text in one of two ways: either using an open-access PDF,
or using N-grams obtained from the Internet Archive. You can learn where a work's
full text came from at fulltext_origin .
has_fulltext: true
host_venue (deprecated)
id
String: The OpenAlex ID for this work.
id: "https://openalex.org/W2741809807"
ids
Object: All the external identifiers that we know about for this work. IDs are
expressed as URIs whenever possible. Possible ID types:
Most works are missing one or more ID types (either because we don't know the ID, or
because it was never assigned). Keys for null IDs are not displayed.
ids: {
openalex: "https://openalex.org/W2741809807",
doi: "https://doi.org/10.7717/peerj.4375",
mag: 2741809807,
pmid: "https://pubmed.ncbi.nlm.nih.gov/29456894"
}
indexed_in
List: The sources this work is indexed in. Possible values: arxiv, crossref, doaj,
pubmed.
indexed_in: [
"arxiv", "crossref", "pubmed"
]
institutions_distinct_count
Integer: Number of distinct institutions among the authorships for this work.
institutions_distinct_count: 4
is_paratext
In our context, paratext is stuff that's in a scholarly venue (like a journal) but is
about the venue rather than a scholarly work properly speaking. Some examples
and nonexamples:
yep it's paratext: front cover, back cover, table of contents, editorial board
listing, issue information, masthead.
no, not paratext: research paper, dataset, letters to the editor, figures
Turns out there is a lot of paratext in registries like Crossref. That's not a bad
thing... but we've found that it's good to have a way to filter it out.
is_paratext: false
is_retracted
We identify works that have been retracted using the public Retraction Watch
database, a public resource made possible by a partnership between Crossref and
The Center for Scientific Integrity.
is_retracted: false
keywords
List of objects: Short phrases identified based on works' Topics. For background on
how Keywords are identified, see the Keywords page at OpenAlex help pages.
The score for each keyword represents the similarity score of that keyword to the
title and abstract text of the work.
We provide up to 5 keywords per work, for all keywords with scores above a certain
threshold.
[
{
id: "https://openalex.org/keywords/global-seaweed-distribution",
display_name: "Global Seaweed Distribution",
score: 0.559386
},
{
id: "https://openalex.org/keywords/climate-change-impacts",
display_name: "Climate Change Impacts",
score: 0.535795
},
{
id: "https://openalex.org/keywords/ecosystem-resilience",
display_name: "Ecosystem Resilience",
score: 0.502789
}
]
language
String: The language of the work in ISO 639-1 format. The language is automatically
detected using the information we have about the work. We use the langdetect
software library on the words in the work's abstract, or the title if we do not have
the abstract. The source code for this procedure is here. Keep in mind that this
method is not perfect, and that in some cases the language of the title or abstract
could be different from the body of the work.
language: "en"
license
String: The license applied to this work at this host. Most toll-access works don't
have an explicit license (they're under "all rights reserved" copyright), so this field
generally has content only if is_oa is true .
license: "cc-by"
locations
List: A list of Location objects describing all unique places where this work lives.
locations: [
{
is_oa: true,
landing_page_url: "https://doi.org/10.1073/pnas.17.6.401",
pdf_url: "http://www.pnas.org/content/17/6/401.full.pdf",
source: {
id: "https://openalex.org/S125754415",
display_name: "Proceedings of the National Academy of Sciences of t
issn_l: "0027-8424",
issn: ["1091-6490", "0027-8424"],
host_organization: "https://openalex.org/P4310320052",
type: "journal"
},
license: null,
version: "publishedVersion"
},
{
is_oa: true,
landing_page_url: "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10760
pdf_url: null,
source: {
id: "https://openalex.org/S2764455111",
display_name: "PubMed Central",
issn_l: null,
issn: null,
host_organization: "https://openalex.org/I1299303238",
type: "repository"
},
license: null,
version: "publishedVersion"
}
]
locations_count
locations_count: 3
mesh
List: List of MeSH tag objects. Only works found in PubMed have MeSH tags; for all
other works, this is an empty list.
mesh: [
{
descriptor_ui: "D017712",
descriptor_name: "Peer Review, Research",
qualifier_ui: "Q000379",
qualifier_name: "methods",
is_major_topic: false
},
{
descriptor_ui: "D017712",
descriptor_name: "Peer Review, Research",
qualifier_ui: "Q000592",
qualifier_name: "standards",
is_major_topic: true
}
]
open_access
Object: Information about the access status of this work, as an OpenAccess object.
open_access: {
is_oa: true,
oa_status: "gold",
oa_url: "https://peerj.com/articles/4375.pdf",
any_repository_has_fulltext: true
},
primary_location
The primary_location is where you can find the best (closest to the version of
record) copy of this work. For a peer-reviewed journal article, this would be a full
text published version, hosted by the publisher at the article's DOI URL.
primary_location: {
is_oa: true,
landing_page_url: "https://doi.org/10.1073/pnas.17.6.401",
pdf_url: "http://www.pnas.org/content/17/6/401.full.pdf",
source: {
id: "https://openalex.org/S125754415",
display_name: "Proceedings of the National Academy of Sciences of the
issn_l: "0027-8424",
issn: ["1091-6490", "0027-8424"],
host_organization: "https://openalex.org/P4310320052",
type: "journal"
},
license: null,
version: "publishedVersion"
}
primary_topic
Object
The top ranked Topic for this work. This is the same as the first item in
Work.topics .
primary_topic: {
id: "https://openalex.org/T12419",
display_name: "Analysis of Cardiac and Respiratory Sounds",
score: 0.9997,
subfield: {
id: 2740,
display_name: "Pulmonary and Respiratory Medicine"
},
field: {
id: 27,
display_name: "Medicine"
},
domain: {
id: 4,
display_name: "Health Sciences"
}
}
publication_date
String: The day when this work was published, formatted as an ISO 8601 date.
Where different publication dates exist, we usually select the earliest available date
of electronic publication.
This date applies to the version found at Work.url . The other versions, found in
Work.locations , may have been published at different (earlier) dates.
publication_date: "2018-02-13"
publication_year
This year applies to the version found at Work.url . The other versions, found in
Work.locations , may have been published in different (earlier) years.
publication_year: 2018
referenced_works
List: OpenAlex IDs for works that this work cites. These are citations that go from
this work out to another work: This work ➞ Other works.
referenced_works: [
"https://openalex.org/W2753353163",
"https://openalex.org/W2785823074",
"https://openalex.org/W2511661767",
"https://openalex.org/W2115339903",
"https://openalex.org/W2031754690"
]
related_works
List: OpenAlex IDs for works related to this work. Related works are computed
algorithmically; the algorithm finds recent papers with the most concepts in
common with the current paper.
related_works: [
"https://openalex.org/W2753353163",
"https://openalex.org/W2785823074",
"https://openalex.org/W2511661767",
"https://openalex.org/W2115339903",
"https://openalex.org/W2031754690"
]
sustainable_development_goals
We display all of the SDGs with a prediction score higher than 0.4.
sustainable_development_goals: [
{
id: "https://metadata.un.org/sdg/3",
display_name: "Good health and well-being",
score: 0.95
}
]
topics
List: List of objects
The top ranked Topics for this work. We provide up to 3 topics per work.
topics: [
{
id: "https://openalex.org/T12419",
display_name: "Analysis of Cardiac and Respiratory Sounds",
score: 0.9997,
subfield: {
id: 2740,
display_name: "Pulmonary and Respiratory Medicine"
},
field: {
id: 27,
display_name: "Medicine"
},
domain: {
id: 4,
display_name: "Health Sciences"
}
},
...
]
title
title: "The state of OA: a large-scale analysis of the prevalence and imp
type
Most works are type article . This includes what was formerly (and currently in
type_crossref ) labeled as journal-article , proceedings-article , and
posted-content . We consider all of these to be article type works, and the
distinctions between them to be more about where they are published or hosted:
(Note that distinguishing between journals and conferences is a hard problem, one
we often get wrong. We are working on improving this, but we also point out that
the two have a lot of overlap in terms of their roles as hosts of research
publications.)
Works that are hosted primarily on a preprint server, or that are identified specifically as
preprints in the metadata we receive, are assigned the type preprint rather than
article.
Works that represent stuff that is about the venue (such as a journal)—rather than a
scholarly work properly speaking—have type paratext . These include things like
front-covers, back-covers, tables of contents, and the journal itself (e.g.,
https://openalex.org/W4232230324).
type: "article"
type_crossref
These are the work types that we used to use, before switching to our current
system (see type ).
You can see all possible values of Crossref's "type" controlled vocabulary via the
Crossref API here: https://api.crossref.org/types.
Where possible, we just pass along Crossref's type value for each work. When
that's impossible (eg the work isn't in Crossref), we do our best to figure out the
type ourselves.
type_crossref: "journal-article"
updated_date
String: The last time anything in this Work object changed, expressed as an ISO
8601 date string (in UTC). This date is updated for any change at all, including
increases in various counts.
updated_date: "2022-01-02T00:22:35.180390"
any_repository_has_fulltext
Boolean: True if any of this work's locations has location.is_oa=true and
location.source.type=repository .
Use case: researchers want to track Green OA, using a definition of "any repository
hosts this." OpenAlex's definition (as used in oa_status ) doesn't support this,
because as soon as there's a publisher-hosted copy (bronze, hybrid, or gold),
oa_status is set to that publisher-hosted status.
So there's a lot of repository-hosted content that the oa_status can't tell you
about. Our State of OA paper calls this "shadowed Green." This feature makes it
possible to track shadowed Green.
any_repository_has_fulltext: true
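The definition above (any location with is_oa true and a source of type repository) can be expressed directly in code. This Python sketch recomputes the flag from a list of Location objects. The function name mirrors the field for readability but is a hypothetical helper, and the sample locations are shortened from the locations example.

```python
def any_repository_has_fulltext(locations):
    """True if any location is open access and hosted by a repository source."""
    for location in locations:
        source = location.get("source") or {}  # source may be null
        if location.get("is_oa") and source.get("type") == "repository":
            return True
    return False


# Shortened sample locations: one publisher-hosted copy, one repository copy.
sample_locations = [
    {"is_oa": True, "source": {"display_name": "PNAS", "type": "journal"}},
    {"is_oa": True, "source": {"display_name": "PubMed Central", "type": "repository"}},
]
print(any_repository_has_fulltext(sample_locations))
```

A work like this sample is a case of shadowed Green: the repository copy exists even though a publisher-hosted copy determines oa_status.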
is_oa
There are many ways to define OA. OpenAlex uses a broad definition: having a URL
where you can read the fulltext of this work without needing to pay money or log in.
You can use the locations and oa_status fields to narrow your results further,
accommodating any definition of OA you like.
is_oa: true
oa_status
String: The Open Access (OA) status of this work. Possible values are:
oa_status: "gold"
oa_url
String: The best Open Access (OA) URL for this work.
Although there are many ways to define OA, in this context an OA URL is one where
you can read the fulltext of this work without needing to pay money or log in. The
"best" such URL is the one closest to the version of record.
This URL might be a direct link to a PDF, or it might be to a landing page that links to
the free PDF.
oa_url: "https://peerj.com/articles/4375.pdf"
Authorship object
The Authorship object represents a single author and her institutional affiliations in
the context of a given work. It is only found as part of a Work object, in the
work.authorships property.
affiliations
Each institutional affiliation that this author has claimed will be listed here: the raw
affiliation string that we found, along with the OpenAlex Institution ID or IDs that
we matched it to.
This information will be redundant with institutions below, but is useful if you
need to know about what we used to match institutions.
affiliations: [
{
raw_affiliation_string: "Scholarly Communications Lab, Simon Fras
institution_ids: [
"https://openalex.org/I18014758"
]
}
]
author
Note that, sometimes, we assign ORCID using author disambiguation, so the ORCID
we associate with an author was not necessarily included with this work.
author: {
id: "https://openalex.org/A5085171399",
display_name: "Juan Pablo Alperin",
orcid: "https://orcid.org/0000-0002-9344-7439"
}
author_position
String: A summarized description of this author's position in the work's author list.
Possible values are first , middle , and last .
It's not strictly necessary, because author order is already implicitly recorded by
the list order of Authorship objects; however it's useful in some contexts to have
this as a categorical value.
author_position: "first"
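To show how the categorical value relates to list order, the sketch below derives a position label from an author's index in the authorships list. It is a hypothetical helper, not the API's own logic. In particular, labeling a sole author as first is an assumption made here.

```python
def author_position(index, total):
    """Map a zero-based index in the author list to first, middle, or last.

    Assumption: a work with a single author labels that author "first".
    """
    if total < 1 or not 0 <= index < total:
        raise ValueError("index must be within the author list")
    if index == 0:
        return "first"
    if index == total - 1:
        return "last"
    return "middle"


positions = [author_position(i, 4) for i in range(4)]
print(positions)
```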
countries
countries: [
"US"
]
institutions
List: The institutional affiliations this author claimed in the context of this work, as
dehydrated Institution objects.
institutions: [
{
id: "https://openalex.org/I18014758",
display_name: "Simon Fraser University",
ror: "https://ror.org/0213rcc28",
country_code: "CA",
type: "education",
lineage: ["https://openalex.org/I18014758"]
}
]
is_corresponding
This is a new feature, and the information may be missing for many works. We are
working on this, and coverage will improve soon.
raw_affiliation_strings
raw_affiliation_strings: [
"Canadian Institute for Studies in Publishing, Simon Fraser Universit
],
raw_author_name
Locations are meant to capture the way that a work exists in different versions. So,
for example, a work may have a version that has been peer-reviewed and published
in a journal (the version of record). This would be one of the work's locations. It
may have another version available on a preprint server like bioRxiv—this version
having been posted before it was accepted for publication. This would be another
one of the work's locations.
Locations are meant to cover anywhere that a given work can be found. This can
include journals, proceedings, institutional repositories, and subject-area
repositories like arXiv and bioRxiv. If you are only interested in a certain one of
these (like journal), you can use a filter to specify the locations.source.type .
(Learn more about types here.)
There are three places in the Work object where you can find locations:
is_accepted
is_accepted: true
is_oa
Boolean: True if an Open Access (OA) version of this work is available at this
location.
There are many ways to define OA. OpenAlex uses a broad definition: having a URL
where you can read the fulltext of this work without needing to pay money or log in.
is_oa: true
is_published
is_published: true
landing_page_url
String: The landing page URL for this location.
landing_page_url: "https://doi.org/10.1590/s1678-77572010000100010"
license
String: The location's publishing license. This can be a Creative Commons license
such as cc0 or cc-by, a publisher-specific license, or null which means we are not
able to determine a license for this location.
license: "cc-by"
source
Object: Information about the source of this location, as a DehydratedSource
object.
pdf_url
String: A URL where you can find this location as a PDF.
pdf_url: "http://www.scielo.br/pdf/jaos/v18n1/a10v18n1.pdf"
version
String: The version of the work, based on the DRIVER Guidelines versioning
scheme. Possible values are:
version: "publishedVersion"
Get a single work
It's easy to get a work from the API with: /works/<entity_id>. Here's an
example:
That will return a Work object, describing everything OpenAlex knows about the
work with that ID.
{
"id": "https://openalex.org/W2741809807",
"doi": "https://doi.org/10.7717/peerj.4375",
"title": "The state of OA: a large-scale analysis of the prevalence a
"display_name": "The state of OA: a large-scale analysis of the preva
"publication_year": 2018,
"publication_date": "2018-02-13",
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax (tutorial).
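The batching described above can be sketched as follows. This Python example assumes that OR values are separated by a pipe character and that the filter is named doi; neither detail is shown in this section, so check them against the filter documentation. The helper name is hypothetical.

```python
MAX_IDS_PER_REQUEST = 50  # limit stated in the text above


def batched_doi_filters(dois, batch_size=MAX_IDS_PER_REQUEST):
    """Split DOIs into groups and return one filter string per group.

    Assumes OR syntax joins values with '|' and that the filter is named 'doi'.
    """
    filters = []
    for start in range(0, len(dois), batch_size):
        batch = dois[start:start + batch_size]
        filters.append("doi:" + "|".join(batch))
    return filters


# Illustrative, made-up DOIs used only to demonstrate the batching.
example_dois = [f"10.1234/example.{n}" for n in range(120)]
doi_filters = batched_doi_filters(example_dois)
print(len(doi_filters))
```

Each returned string would go into the filter parameter of one /works list request, so 120 IDs need three requests instead of 120.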
External IDs
You can look up works using external IDs such as a DOI:
You can use the full ID or a shorter Uniform Resource Name (URN) format like so:
External ID URN
DOI doi
You must make sure that the ID(s) you supply are valid and correct. If an ID you
request is incorrect, you will get no result. If you request an illegal ID, such as one
containing a , or & character, the query will fail and you will get a 403 error.
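A minimal sketch of a DOI lookup that checks for the illegal characters mentioned above before building the URL. It assumes the short `doi:` URN form from the table; the helper name is hypothetical.

```python
API_BASE = "https://api.openalex.org"
ILLEGAL_CHARS = {",", "&"}  # characters the text above says make an ID illegal


def work_url_from_doi(doi):
    """Build a single-work URL from a bare DOI using the `doi:` URN form."""
    doi = doi.strip()
    bad = ILLEGAL_CHARS.intersection(doi)
    if bad:
        # Fail locally instead of sending a request that would be rejected.
        raise ValueError(f"illegal character(s) in ID: {sorted(bad)}")
    return f"{API_BASE}/works/doi:{doi}"
```

For example, `work_url_from_doi("10.7717/peerj.4375")` gives a URL for the work shown in the response above.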
Select fields
You can use select to limit the fields that are returned in a work object. More
details are here.
{
"meta": {
"count": 245684392,
"db_response_time_ms": 929,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/W1775749144",
"doi": "https://doi.org/10.1016/s0021-9258(19)52451-6",
"title": "PROTEIN MEASUREMENT WITH THE FOLIN PHENOL REAGENT",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/W2100837269",
"doi": "https://doi.org/10.1038/227680a0",
"title": "Cleavage of Structural Proteins during the Assembly
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Continue on to learn how you can filter and search lists of works.
Sample works
You can use sample to get a random batch of works. Read more about sampling
and how to add a seed value here.
Select fields
You can use select to limit the fields that are returned in a list of works. More
details are here.
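The sample and select parameters can be combined in one request. The sketch below only builds the URL; the `seed` parameter name is an assumption based on the "seed value" mentioned above, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def sample_works_url(sample_size, fields, seed=None):
    """Build a URL for a random batch of works limited to chosen fields."""
    params = {"sample": sample_size, "select": ",".join(fields)}
    if seed is not None:
        # Assumed parameter name; a fixed seed should make the sample repeatable.
        params["seed"] = seed
    return "https://api.openalex.org/works?" + urlencode(params, safe=",")
```

Calling `sample_works_url(20, ["id", "display_name"], seed=42)` yields a URL that asks for 20 random works with only two fields each.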
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
authorships.affiliations.institution_ids
apc_list.currency
apc_list.provenance
apc_list.value_usd
apc_paid.value
apc_paid.currency
apc_paid.provenance
apc_paid.value_usd
best_oa_location.is_accepted
best_oa_location.is_published
best_oa_location.source.is_in_doaj
best_oa_location.source.issn
best_oa_location.source.host_organization
best_oa_location.source.type
best_oa_location.version
biblio.first_page
biblio.issue
biblio.last_page
biblio.volume
cited_by_count
countries_distinct_count
fwci
ids.pmcid
institutions_distinct_count
is_paratext
is_retracted
keywords.keyword
language
locations.is_accepted
locations.is_oa
locations.is_published
locations.license
locations.source.id
locations.source.is_core
locations.source.is_in_doaj
locations.source.issn
locations.source.host_organization
locations.source.type
locations.version
locations_count
open_access.any_repository_has_fulltext
primary_location.is_oa
primary_location.is_published
primary_location.license
primary_location.source.id
primary_location.source.is_core
primary_location.source.is_in_doaj
primary_location.source.issn
primary_location.source.host_organization
primary_location.source.type
primary_location.version
primary_topic.id
primary_topic.domain.id
primary_topic.field.id
primary_topic.subfield.id
publication_year
publication_date
sustainable_development_goals.id
topics.id
topics.domain.id
topics.field.id
topics.subfield.id
type
type_crossref
Want to filter by the display_name of an associated entity (author, institution, source,
etc.)? See here.
abstract.search
Returns: works whose abstract includes the given string. See the search page for
details on the search algorithm used.
authors_count
Value: an Integer
Returns: works with the chosen number of authorships objects (authors). You can
use the inequality filter to select a range, such as authors_count:>5 .
authorships.institutions.continent
Returns: works where at least one of the author's institutions is in the chosen
continent.
Get works where at least one author's institution in each work is located in
Europe
https://api.openalex.org/works?filter=authorships.institutions.continent:europe
authorships.institutions.is_global_south (alias:
institutions.is_global_south )
Returns: works where at least one of the author's institutions is in the Global South
(read more).
Get works where at least one author's institution is in the Global South
https://api.openalex.org/works?filter=authorships.institutions.is_global_south:true
best_open_version
cited_by
Value: the OpenAlex ID for a given work
Returns: works found in the given work's referenced_works section. You can think
of this as outgoing citations.
cites
Returns: works that cite the given work. This is works that have the given OpenAlex
ID in the referenced_works section. You can think of this as incoming citations.
The number of results returned by this filter may be slightly higher than the work's
cited_by_count due to a timing lag in updating that field.
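Because cites and cited_by are easy to mix up, here is a small sketch that builds both URLs for one work and labels them by direction. The function name and dictionary keys are illustrative only.

```python
API_BASE = "https://api.openalex.org"


def citation_urls(work_id):
    """Return list URLs for both citation directions of one work."""
    return {
        # Works that cite `work_id` (incoming citations).
        "incoming": f"{API_BASE}/works?filter=cites:{work_id}",
        # Works that `work_id` references (outgoing citations).
        "outgoing": f"{API_BASE}/works?filter=cited_by:{work_id}",
    }
```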
concepts_count
Value: an Integer
default.search
Returns: works whose display_name (title) includes the given string; see the
search page for details.
For most cases, you should use the search parameter instead of this filter, because it
uses a better search algorithm and searches over abstracts as well as titles.
from_created_date
Returns: works with created_date greater than or equal to the given date.
This field requires an OpenAlex Premium subscription to access. Click here to learn
more.
Get works created on or after January 12th, 2023 (does not work without valid
API key):
https://api.openalex.org/works?filter=from_created_date:2023-01-12&api_key=myapikey
from_publication_date
Returns: works with publication_date greater than or equal to the given date.
Get works published on or after March 14th, 2001:
https://api.openalex.org/works?filter=from_publication_date:2001-03-14
Filtering by publication date is not a reliable way to retrieve recently updated and
created works, due to the way publishers assign publication dates. Use
from_created_date or from_updated_date to get the latest changes in OpenAlex.
from_updated_date
Value: a date, formatted as an ISO 8601 date or date-time string (for example:
"2020-05-17", "2020-05-17T15:30", or "2020-01-02T00:22:35.180390").
Returns: works with updated_date greater than or equal to the given date.
This field requires an OpenAlex Premium subscription to access. Click here to learn
more.
Get works updated on or after January 12th, 2023 (does not work without valid
API key):
https://api.openalex.org/works?filter=from_updated_date:2023-01-12&api_key=myapikey
Learn more about using this filter to get the freshest data possible with our Premium
How-To.
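A minimal sketch for building a from_updated_date request. It checks the date string locally with Python's ISO 8601 parser before building the URL; the function name is hypothetical and `myapikey` stands in for a real Premium key, as in the example above.

```python
from datetime import datetime


def updated_since_url(since, api_key):
    """Build a works URL for records updated on or after `since`."""
    # fromisoformat raises ValueError for strings that are not ISO 8601,
    # so malformed dates are caught before any request is made.
    datetime.fromisoformat(since)
    return (
        "https://api.openalex.org/works"
        f"?filter=from_updated_date:{since}&api_key={api_key}"
    )
```

Both date-only strings such as "2023-01-12" and date-time strings such as "2020-05-17T15:30" pass the check.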
fulltext.search
Returns: works whose fulltext includes the given string. Fulltext search is available
for a subset of works, obtained either from PDFs or n-grams; see
Work.has_fulltext for more details.
has_abstract
Returns: works that have or lack an abstract, depending on the given value.
has_doi
Returns: works that have or lack a DOI, depending on the given value. It's especially
useful for grouping.
has_oa_accepted_or_published_version
Returns: works where at least one of the locations has is_oa = true and a version
of acceptedVersion or publishedVersion. For Works that undergo peer review, like
journal articles, this means there is a peer-reviewed OA copy somewhere. For some
items, like books, a published version doesn't imply peer review, so the two aren't
quite synonymous.
has_oa_submitted_version
Returns: works where at least one of the locations has is_oa = true and a version
of submittedVersion. This is useful for finding works with preprints deposited
somewhere.
has_orcid
Returns: if true , it returns works where at least one author has an ORCID ID. If
false , it returns works where no authors have an ORCID ID. This is based on the
orcid field within authorships.author . Note that, sometimes, we assign ORCID
using author disambiguation, so this does not necessarily mean that the work itself
has ORCID information.
Get the works where at least one author has an ORCID ID:
https://api.openalex.org/works?filter=has_orcid:true
has_pmcid
Returns: works that have or lack a PubMed Central identifier ( pmcid ) depending on
the given value.
has_pmid
has_ngrams (DEPRECATED)
Returns: works for which n-grams are available or unavailable, depending on the
given value. N-grams power fulltext searches through the fulltext.search filter
and the search parameter.
has_references
Returns: works that have or lack referenced_works , depending on the given value.
journal
Value: the OpenAlex ID for a given source, where the source is type: journal
locations.source.host_institution_lineage
Value: the OpenAlex ID for an Institution
locations.source.publisher_lineage
mag_only
Returns: works which came from MAG (Microsoft Academic Graph), and no other
data sources.
MAG was a project by Microsoft Research to catalog all of the scholarly content on
the internet. After it was discontinued in 2021, OpenAlex built upon the data MAG
had accumulated, connecting and expanding it using a variety of other sources.
The methods that MAG used to identify and aggregate scholarly content were quite
different from most of our other sources, and so the content inherited from MAG,
especially works that we did not connect with data from other sources, can look
different from other works. While it's great to have these MAG-only works available,
you may not always want to include them in your results or analyses. This filter
allows you to include or exclude any works that came from MAG and only MAG.
primary_location.source.has_issn
Returns: works where the primary_location has at least one ISSN assigned.
Get the works that have an ISSN within the primary location:
https://api.openalex.org/works?filter=primary_location.source.has_issn:true
primary_location.source.publisher_lineage
raw_affiliation_strings.search
related_to
repository
Value: the OpenAlex ID for a given source, where the source is type: repository
Returns: works where the chosen source ID exists within the locations .
You can use this to find works where authors are associated with your university,
but the work is not part of the university's repository.👏
Get works that are available in the University of Michigan Deep Blue repository
(OpenAlex ID: https://openalex.org/S4306400393 )
https://api.openalex.org/works?filter=repository:S4306400393
Get works where at least one author is associated with the University of
Michigan, but the works are not found in the University of Michigan Deep Blue
repository
https://api.openalex.org/works?filter=institutions.id:I27837315,repository:!S4306400393
You can also use this as a group_by to learn things about repositories:
Learn which repositories have the most open access works
https://api.openalex.org/works?filter=is_oa:true&group_by=repository
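The repository examples above combine filters with a comma (AND), negate a value with a leading exclamation mark, and optionally add group_by. A small sketch of a URL builder covering those three cases follows; the function name and the tuple layout are assumptions for illustration.

```python
def works_filter_url(filters, group_by=None):
    """Build a works URL from (key, value, negate) filter tuples.

    Filters are joined with commas, which combines them as AND.
    """
    parts = [
        f"{key}:{'!' if negate else ''}{value}"
        for key, value, negate in filters
    ]
    url = "https://api.openalex.org/works?filter=" + ",".join(parts)
    if group_by:
        url += f"&group_by={group_by}"
    return url
```

Passing `[("institutions.id", "I27837315", False), ("repository", "S4306400393", True)]` reproduces the "at this university but not in its repository" query.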
title_and_abstract.search
Returns: works whose display_name (title) or abstract includes the given string;
see the search page for details.
to_created_date
Returns: works with created_date less than or equal to the given date.
This field requires an OpenAlex Premium subscription to access. Click here to learn
more.
Get works created on or before January 12th, 2024 (does not work without valid
API key):
https://api.openalex.org/works?filter=to_created_date:2024-01-12&api_key=myapikey
to_publication_date
Returns: works with publication_date less than or equal to the given date.
to_updated_date
Value: a date, formatted as an ISO 8601 date or date-time string (for example:
"2020-05-17", "2020-05-17T15:30", or "2020-01-02T00:22:35.180390").
Returns: works with updated_date less than or equal to the given date.
This field requires an OpenAlex Premium subscription to access. Click here to learn
more.
Get works updated before or on January 12th, 2023 (does not work without valid
API key):
https://api.openalex.org/works?filter=to_updated_date:2023-01-12&api_key=myapikey
version
Returns: works where the chosen version exists within the locations . If null , it
returns works where no version is found in any of the locations.
Get works where a published version is available in at least one of the locations:
https://api.openalex.org/works?filter=version:publishedVersion
Search works
The best way to search for works is to use the search query parameter, which
searches across titles, abstracts, and fulltext. Example:
Get works with search term "dna" in the title, abstract, or fulltext:
https://api.openalex.org/works?search=dna
Fulltext search is available for a subset of works, see Work.has_fulltext for more
details.
You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.
Search filter Field that is searched
abstract.search abstract_inverted_index
display_name.search display_name
raw_affiliation_strings.search authorships.raw_affiliation_strings
title.search display_name
You can also use the filter default.search , which works the same as using the
search parameter.
These searches make use of stemming and stop-word removal. You can disable
this for searches on titles and abstracts. Learn how to do this here.
1. Find the ID of the related entity. For example, if you're interested in works
associated with NYU, you could search the /institutions endpoint for that
name: https://api.openalex.org/institutions?search=nyu . Looking at the
first result, you'll see that the OpenAlex ID for NYU is I57206974 .
2. Use a filter with the /works endpoint to get all of the works:
https://api.openalex.org/works?filter=institutions.id:I57206974 .
Why can't you do this in just one step? Well, if you use the search term "NYU," you
might end up missing the works that use the full name "New York University" rather
than the initials. Sure, you could try to think of all possible variants and search for
all of them, but you might miss some, and you risk putting in search terms that let in
works that you're not interested in. Figuring out which works are actually
associated with the "NYU" you're interested in shouldn't be your responsibility. That's
our job! We've done that work for you, so all the relevant works should be
associated with one unique ID.
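The two steps above can be sketched as a pair of URL builders. The helper names are hypothetical, and the code makes no network calls: you would fetch the first URL, read the `id` of the result you want, and pass it to the second helper.

```python
from urllib.parse import quote

API_BASE = "https://api.openalex.org"


def institution_search_url(name):
    """Step 1: search the /institutions endpoint by name."""
    return f"{API_BASE}/institutions?search={quote(name)}"


def short_id(openalex_id):
    """Reduce a full OpenAlex URL to its short form, e.g. I57206974."""
    return openalex_id.rstrip("/").rsplit("/", 1)[-1]


def works_for_institution_url(institution_id):
    """Step 2: filter /works by the institution ID found in step 1."""
    return f"{API_BASE}/works?filter=institutions.id:{short_id(institution_id)}"
```

`works_for_institution_url` accepts either the full ID URL from a response or the short ID.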
Autocomplete works
You can autocomplete works to create a very fast type-ahead style search
function:
This returns a list of work titles with the author of each work set as the hint:
{
"results": [
{
"id": "https://openalex.org/W2125098916",
"display_name": "Crouching tigers, hidden prey: Sumatran tiger and
"hint": "Timothy G. O'Brien, Margaret F. Kinnaird, Hariyo T. Wibiso
"cited_by_count": 620,
"works_count": null,
"entity_type": "work",
"external_id": "https://doi.org/10.1017/s1367943003003172"
},
...
]
}
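For a type-ahead widget you usually need only the label and the hint from each result. The sketch below pulls those out of an already-parsed response shaped like the one above; the function name is hypothetical, and the sample values in the usage note are placeholders rather than real records.

```python
def suggestions(response):
    """Return (display_name, hint) pairs from an autocomplete response dict."""
    return [
        (item["display_name"], item.get("hint"))
        for item in response.get("results", [])
    ]
```

For example, `suggestions({"results": [{"display_name": "Example title", "hint": "A. Author"}]})` returns a one-item list holding that title and hint.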
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
authors_count
authorships.affiliations.institution_ids
apc_list.currency
apc_list.provenance
apc_list.value_usd
apc_paid.value
apc_paid.currency
apc_paid.provenance
apc_paid.value_usd
best_oa_location.is_accepted
best_oa_location.is_published
best_oa_location.license
best_oa_location.source.host_organization
best_oa_location.source.id
best_oa_location.source.is_in_doaj
best_oa_location.source.issn
best_oa_location.source.type
best_oa_location.version
best_open_version
biblio.first_page
biblio.issue
biblio.last_page
biblio.volume
cited_by_count
cites
concepts_count
concepts.id
concepts.wikidata
corresponding_author_ids
corresponding_institution_ids
countries_distinct_count
fulltext_origin
grants.award_id
grants.funder
has_abstract
has_doi
has_fulltext
has_orcid
has_pmid
has_pmcid
has_ngrams (DEPRECATED)
has_references
indexed_in
is_retracted
is_paratext
journal
keywords.keyword
language
locations.is_accepted
locations.is_published
locations.source.host_institutions_lineage
locations.source.is_core
locations.source.is_in_doaj
locations.source.publisher_lineage
locations_count
mag_only
open_access.any_repository_has_fulltext
primary_location.is_oa
primary_location.is_published
primary_location.license
primary_location.source.has_issn
primary_location.source.host_organization
primary_location.source.id
primary_location.source.is_core
primary_location.source.is_in_doaj
primary_location.source.issn
primary_location.source.publisher_lineage
primary_location.source.type
primary_location.version
primary_topic.id
primary_topic.domain.id
primary_topic.field.id
primary_topic.subfield.id
publication_year
repository
sustainable_development_goals.id
topics.id
topics.domain.id
topics.field.id
topics.subfield.id
type
type_crossref
Get N-grams
N-grams are groups of sequential words that occur in the text of a Work.
N-grams list the words and phrases that occur in the full text of a Work. We obtain
them from Internet Archive's publicly (and generously 👏) available General Index
and use them to enable fulltext searches on the Works that have them, through
both the fulltext.search filter, and as an element of the more holistic search
parameter.
Note that while n-grams are derived from the fulltext of a Work, the presence of n-
grams for a given Work doesn't imply that the fulltext is available to you, the reader.
It only means the fulltext was available to Internet Archive for indexing.
Work.open_access is the place to go for information on public fulltext availability.
API Endpoint
The n-gram API endpoint is not currently in service. The n-grams are still used on our
backend to help power fulltext search. If you have any questions about this, please
submit a support ticket.
Fulltext Coverage
You can see which works we have full-text for using the has_fulltext filter. This
does not necessarily mean that the full text is available to you, dear reader; rather, it
means that we have indexed the full text and can use it to help power searches. If
you are trying to find the full text for yourself, try looking in open_access.oa_url .
We get access to the full text in one of two ways: either using an open-access PDF,
or using N-grams obtained from the Internet Archive. You can learn where a work's
full text came from at fulltext_origin .
About 57 million works have n-grams coverage through Internet Archive.
OurResearch is the first organization to host this data in a highly usable way, and
we are proud to integrate it into OpenAlex!
Curious about n-grams used in search? Browse them all via the API. Highly-cited
works and less recent works are more likely to have n-grams, as shown by the
coverage charts below:
Authors
People who create works
Authors are people who create works. You can get an author from the API like this:
The Canonical External ID for authors is ORCID; only a small percentage of authors
have one, but the percentage is higher for more recent works.
Our information about authors comes from MAG, Crossref, PubMed, ORCID, and
publisher websites, among other sources. To learn more about how we combine
this information to get OpenAlex Authors, see Author disambiguation.
What's next
Learn more about what you can do with authors:
affiliations
List: List of objects, representing the affiliations this author has claimed in their
publications. Each object in the list has two properties, institution and years:
affiliations: [
{
institution: {
id: "https://openalex.org/I201448701",
ror: "https://ror.org/00cvxb145",
...
},
years: [2018, 2019, 2020]
},
{
institution: {
id: "https://openalex.org/I74973139",
ror: "https://ror.org/05x2bcf33",
...
},
years: [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
}
]
cited_by_count
Integer: The total number of Works that cite a work this author has created.
cited_by_count: 38
counts_by_year
Works and citations more than ten years old aren't included. Years with zero
works and zero citations have been removed, so you will need to add those in if you
need them.
counts_by_year: [
{
year: 2022,
works_count: 0,
cited_by_count: 8
},
{
year: 2021,
works_count: 1,
cited_by_count: 252
},
...
{
year: 2012,
works_count: 7,
cited_by_count: 79
}
]
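Since years with zero works and zero citations are dropped, a time series built from counts_by_year can have gaps. A minimal sketch for adding them back, assuming you choose the year range yourself (the function name is hypothetical):

```python
def fill_missing_years(counts_by_year, first_year, last_year):
    """Return one row per year, newest first, inserting zero rows for gaps."""
    by_year = {row["year"]: row for row in counts_by_year}
    return [
        by_year.get(year, {"year": year, "works_count": 0, "cited_by_count": 0})
        for year in range(last_year, first_year - 1, -1)
    ]
```

Rows that already exist are passed through unchanged, so the result keeps the newest-first ordering used in the example above.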
created_date
String: The date this Author object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.
created_date: "2017-08-08"
display_name
String: The name of the author as a single string.
display_name_alternatives
List: Other ways that we've found this author's name displayed.
display_name_alternatives: [
"Jason R Priem"
]
id
id: "https://openalex.org/A5023888391"
ids
Object: All the external identifiers that we know about for this author. IDs are
expressed as URIs whenever possible. Possible ID types:
Most authors are missing one or more ID types (either because we don't know the ID,
or because it was never assigned). Keys for null IDs are not displayed.
ids: {
openalex: "https://openalex.org/A5023888391",
orcid: "https://orcid.org/0000-0001-6187-6610",
scopus: "http://www.scopus.com/inward/authorDetails.url?authorID=3645
},
last_known_institution (deprecated)
last_known_institutions
List: List of Institution objects. This author's last known institutional affiliations. In
this context "last known" means that we took all the author's Works, sorted them by
publication date, and selected the most recent one. If there is only one affiliated
institution for this author for the work, this will be a list of length 1; if there are
multiple affiliations, they will all be included in the list.
Each item in the list is a dehydrated Institution object, and you can find more
documentation on the Institution page.
last_known_institutions: [{
id: "https://openalex.org/I4200000001",
ror: "https://ror.org/02nr0ka47",
display_name: "OurResearch",
country_code: "CA",
type: "nonprofit",
lineage: ["https://openalex.org/I4200000001"]
}],
orcid
String: The ORCID ID for this author. ORCID is a global and unique ID for authors.
This is the Canonical external ID for authors.
Compared to other Canonical IDs, ORCID coverage is relatively low in OpenAlex,
because ORCID adoption in the wild has been slow compared with DOI, for example.
This is particularly an issue when dealing with older works and authors.
orcid: "https://orcid.org/0000-0001-6187-6610"
summary_stats
2yr_mean_citedness Float: The 2-year mean citedness for this author. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this author.
summary_stats: {
2yr_mean_citedness: 1.5295340589458237,
h_index: 45,
i10_index: 205
}
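The 2yr_mean_citedness description above amounts to a simple ratio. The sketch below spells it out under that reading; the argument names are hypothetical, and returning 0.0 when there are no eligible publications is an assumption, not documented behavior.

```python
def two_year_mean_citedness(citations_last_year, works_in_prior_two_years):
    """Citations received last year, per work published in the two years before."""
    if works_in_prior_two_years == 0:
        # Assumption: treat "no citation-receiving publications" as zero.
        return 0.0
    return citations_last_year / works_in_prior_two_years
```

For instance, 30 citations last year to 20 works published in the two years before that gives 1.5.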
updated_date
String: The last time anything in this author object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.
updated_date: "2022-01-02T00:00:00"
works_api_url
String: A URL that will get you a list of all this author's works.
We express this as an API URL (instead of just listing the works themselves)
because sometimes an author's publication list is too long to reasonably fit into a
single author object.
works_api_url: "https://api.openalex.org/works?filter=author.id:A50238883
works_count
This is updated a couple times per day. So the count may be slightly different than
what's in works when viewed like this.
x_concepts
List: The concepts most frequently applied to works created by this author. Each is
represented as a dehydrated Concept object, with one additional attribute:
score (Float): The strength of association between this author and the listed
concept, from 0-100.
x_concepts: [
{
id: "https://openalex.org/C41008148",
wikidata: null,
display_name: "Computer science",
level: 0,
score: 97.4
},
{
id: "https://openalex.org/C17744445",
wikidata: null,
display_name: "Political science",
level: 0,
score: 78.9
}
]
id
display_name
orcid
Get a single author
It's easy to get an author from the API with: /authors/<entity_id> . Here's an
example:
That will return an Author object, describing everything OpenAlex knows about
the author with that ID:
{
"id": "https://openalex.org/A5023888391",
"orcid": "https://orcid.org/0000-0001-6187-6610",
"display_name": "Jason Priem",
"display_name_alternatives": [],
"works_count": 53,
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
External IDs
You can look up authors using external IDs such as an ORCID:
You can use the full ID or a shorter Uniform Resource Name (URN) format like so:
https://api.openalex.org/authors/orcid:0000-0002-1298-3089
Available external IDs for authors are:
External ID URN
ORCID orcid
Scopus scopus
Twitter twitter
Wikipedia wikipedia
Select fields
You can use select to limit the fields that are returned in an author object. More
details are here.
Display only the id and display_name and orcid for an author object
https://api.openalex.org/authors/A5023888391?select=id,display_name,orcid
Get lists of authors
You can get lists of authors:
{
"meta": {
"count": 93011659,
"db_response_time_ms": 150,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/A5053780153",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/A5032245741",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Get the second page of authors results, with 50 results returned per page
https://api.openalex.org/authors?per-page=50&page=2
Continue on to learn how you can filter and search lists of authors.
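A small sketch of paging arithmetic using the meta object shown above. The helper names are hypothetical. The page count is only what the numbers imply; the API may limit how deep this style of paging can go, so check the paging documentation before looping over every page.

```python
def page_url(entity, page, per_page):
    """Build a list URL for one page of results."""
    return f"https://api.openalex.org/{entity}?per-page={per_page}&page={page}"


def total_pages(meta):
    """Number of pages implied by meta.count and meta.per_page."""
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-meta["count"] // meta["per_page"])
```

With the meta values from the example response (count 93011659, per_page 25), `total_pages` returns 3720467.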
Sample authors
You can use sample to get a random batch of authors. Read more about sampling
and how to add a seed value here.
Select fields
You can use select to limit the fields that are returned in a list of authors. More
details are here.
Display only the id and display_name and orcid within authors results
https://api.openalex.org/authors?select=id,display_name,orcid
Filter authors
You can filter authors with the filter parameter:
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
affiliations.institution.country_code
affiliations.institution.id
affiliations.institution.lineage
affiliations.institution.ror
affiliations.institution.type
cited_by_count
last_known_institution.id
last_known_institution.lineage
last_known_institution.ror
last_known_institution.type
orcid
default.search
This works the same as using the search parameter for Authors.
display_name.search
Returns: Authors whose display_name contains the given string; see the search
filter for details.
has_orcid
Value: a Boolean ( true or false )
Returns: authors that have or lack an orcid, depending on the given value.
last_known_institution.continent
Returns: authors where the last known institution is in the chosen continent.
last_known_institution.is_global_south
Returns: authors where the last known institution is in the Global South.
Get authors where the last known institution is located in the Global South
https://api.openalex.org/authors?filter=last_known_institution.is_global_south:true
Search authors
The best way to search for authors is to use the search query parameter, which
searches the display_name and the display_name_alternatives fields. Example:
Searching without a middle initial returns names with and without middle initials. So
a search for "John Smith" will also return "John W. Smith".
Names with diacritics are flexible as well. So a search for David Tarrago can return
David Tarragó, and a search for David Tarragó can return David Tarrago. When
searching with a diacritic, diacritic versions of the names are prioritized in order to
honor the original form of the author's name. Read more about our handling of
diacritics here.
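To see why "David Tarrago" and "David Tarragó" can match each other, it helps to look at diacritic folding on the client side. The sketch below is only an illustration of the general idea using Unicode normalization; it is not a description of how OpenAlex implements search.

```python
import unicodedata


def fold_diacritics(name):
    """Strip combining accents so accented and plain spellings compare equal."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

This can be handy for de-duplicating names in your own results before comparing them with a query string.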
You can read more in the search page in the API Guide. It will show you how relevance
score is calculated, how words are stemmed to improve search results, and how to do
complex boolean searches.
When searching for authors, there is no difference when using the search
parameter or the filter display_name.search , since display_name is the only field
searched when finding authors.
Search filter Field that is searched
display_name.search display_name
You can also use the filter default.search , which works the same as using the
search parameter.
Autocomplete authors
You can autocomplete authors to create a very fast type-ahead style search
function:
This returns a list of authors with their last known affiliated institution as the hint:
{
"results": [
{
"id": "https://openalex.org/A5007433649",
"display_name": "Ronald Swanstrom",
"hint": "University of North Carolina at Chapel Hill, USA",
"cited_by_count": 19142,
"works_count": 339,
"entity_type": "author",
"external_id": "https://orcid.org/0000-0001-7777-0773",
"filter_key": "authorships.author.id"
},
...
]
}
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
affiliations.institution.id
affiliations.institution.lineage
affiliations.institution.ror
affiliations.institution.type
cited_by_count
has_orcid
last_known_institution.continent
last_known_institution.country_code
last_known_institution.id
last_known_institution.is_global_south
last_known_institution.lineage
last_known_institution.ror
last_known_institution.type
summary_stats.2yr_mean_citedness
summary_stats.h_index
summary_stats.i10_index
works_count
Limitations
Works with more than 100 authors are truncated
When retrieving a list of works in the API, the authorships list within each work
will be cut off at 100 authorships objects in order to keep things running well. When
this happens the boolean value is_authors_truncated will be available and set to
true . This affects a small portion of OpenAlex, as there are around 35,000 works
with more than 100 authors. This limitation does not apply to the data snapshot.
To see the full list of authors, go to the individual record for the work, which is
never truncated.
This affects filtering as well. So if you filter works using an author ID or ROR, you
will not receive works where that author is listed further than 100 places down on
the list of authors. We plan to change this in the future, so that filtering works as
expected.
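When processing list results, you can check is_authors_truncated and fall back to the individual record, which is never truncated. A minimal sketch, with a hypothetical helper name and no network calls:

```python
def full_record_url(work):
    """Return the single-work URL if the authorships list was cut off, else None."""
    if work.get("is_authors_truncated"):
        # The work's id is a full URL; the last path segment is the short ID.
        short = work["id"].rstrip("/").rsplit("/", 1)[-1]
        return f"https://api.openalex.org/works/{short}"
    return None
```

Works without the flag return `None`, so you only make extra requests for the small set of truncated records.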
Author disambiguation
Our information about authors comes from MAG, Crossref, PubMed, ORCID, and
publisher websites. We use an algorithm to disambiguate authors; this uses an
author’s name, their publication record, their citation patterns, and (where available)
their ORCID.
So for example, if J. Schmidt and John Jacob Jingleheimer Schmidt both write
about 19th-century ketchup production, we’ll treat them as one author, but we won’t
include the JJJ Schmidt who writes about weasel migration (even though his name
is their name, too).
Our methods, code, and models are all, of course, fully open. You can find technical
documentation on the author disambiguation model on GitHub here. You will also
find code and links to training data there.
Sources
Sources are where works are hosted. OpenAlex indexes about 249,000 sources.
There are several types, including journals, conferences, preprint repositories, and
institutional repositories.
The Canonical External ID for sources is ISSN-L, which is a special "main" ISSN
assigned to every source (sources tend to have multiple ISSNs). About 90% of
sources in OpenAlex have an ISSN-L or ISSN.
Our information about sources comes from Crossref, the ISSN Network, and MAG.
These datasets are joined automatically where possible, but there’s also a lot of
manual combining involved. We do not curate journals, so any journal that is
available in the data sources should make its way into OpenAlex.
Several sources may host the same work. OpenAlex reports both the primary host
source (generally wherever the version of record lives), and alternate host sources
(like preprint repositories).
Check out the Japanese Sources tutorial, a Jupyter notebook showing how to use
Python and the API to learn about all of the sources in a country.
What's next
Learn more about what you can do with sources:
abbreviated_title
String: An abbreviated title obtained from the ISSN Centre.
alternate_titles
Array: Alternate titles for this source, as obtained from the ISSN Centre and
individual work records, like Crossref DOIs, that carry the source name as a string.
These are commonly abbreviations or translations of the source's canonical name.
alternate_titles: [
"ACRJ"
]
apc_prices
List: List of objects, each with price (Integer) and currency (String).
apc_prices: [
{
price: 3920,
currency: "GBP"
}
]
apc_usd
Integer: The source's article processing charge in US Dollars, if available from
DOAJ.
The apc_usd value is calculated by taking the APC price (see apc_prices ) with a
currency of USD if it is available. If it's not available, we convert the first available
value from apc_prices into USD, using recent exchange rates.
apc_usd: 5200
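The rule above can be sketched in a few lines. This is only an illustration: the exchange rates in ASSUMED_USD_RATES are made-up placeholders (not the rates OpenAlex uses), and estimate_apc_usd is a hypothetical helper name.

```python
from typing import Optional

# Placeholder exchange rates (USD per one unit of the currency).
# These are assumptions for illustration, not the rates OpenAlex applies.
ASSUMED_USD_RATES = {"USD": 1.0, "GBP": 1.25, "EUR": 1.10}


def estimate_apc_usd(apc_prices, rates=ASSUMED_USD_RATES) -> Optional[int]:
    """Prefer a listed USD price; otherwise convert the first listed price."""
    if not apc_prices:
        return None
    # Step 1: use the USD entry directly if one exists.
    for entry in apc_prices:
        if entry["currency"] == "USD":
            return entry["price"]
    # Step 2: otherwise convert the first available entry.
    first = apc_prices[0]
    rate = rates.get(first["currency"])
    if rate is None:
        return None  # no rate known for this currency
    return round(first["price"] * rate)
```

For real analysis you would normally just read apc_usd from the Source object; a helper like this is only useful if you want to apply your own exchange rates to apc_prices.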
cited_by_count
Integer: The total number of Works that cite a Work hosted in this source.
cited_by_count: 133702
country_code
String: The country that this source is associated with, represented as an ISO two-
letter country code.
country_code: "GB"
counts_by_year
List: works_count and cited_by_count for each of the last ten years, binned by
year. To put it another way: each year, you can see how many new works this
source started hosting, and how many times any work in this source got cited.
If the source was founded less than ten years ago, there will naturally be fewer than
ten years in this list. Years with zero citations and zero works have been removed
so you will need to add those in if you need them.
counts_by_year: [
{
year: 2021,
works_count: 4338,
cited_by_count: 127268
},
{
year: 2020,
works_count: 4363,
cited_by_count: 119531
},
// and so forth
]
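Because years with zero works and zero citations are dropped, you may want to add them back before plotting or comparing sources. Here is a minimal sketch of one way to do that; fill_missing_years is a hypothetical helper, and the year range is something you choose yourself.

```python
def fill_missing_years(counts_by_year, first_year, last_year):
    """Return one row per year in [first_year, last_year], newest first.

    Years absent from counts_by_year are added with zero counts.
    """
    by_year = {row["year"]: row for row in counts_by_year}
    filled = []
    for year in range(last_year, first_year - 1, -1):
        row = by_year.get(year)
        if row is None:
            row = {"year": year, "works_count": 0, "cited_by_count": 0}
        filled.append(row)
    return filled
```

The same helper works for the counts_by_year lists on other entity types, since they share the same three keys.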
created_date
String: The date this Source object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.
created_date: "2017-08-08"
display_name
display_name: "PeerJ"
homepage_url
String: The starting page for navigating the contents of this source; the homepage
for this source's website.
homepage_url: "http://www.peerj.com/"
host_organization
String: The host organization for this source as an OpenAlex ID. This will be an
Institution.id if the source is a repository, and a Publisher.id if the source is
a journal, conference, or eBook platform (based on the type field).
host_organization: "https://openalex.org/P4310320595"
host_organization_lineage
List: OpenAlex IDs — See Publisher.lineage . This will only be included if the
host_organization is a publisher (and not if the host_organization is an
institution).
host_organization_lineage: [
"https://openalex.org/P4310321285",
"https://openalex.org/P4310319900",
"https://openalex.org/P4310319965"
]
host_organization_name
id
id: "https://openalex.org/S1983995261"
ids
Object: All the external identifiers that we know about for this source. IDs are
expressed as URIs whenever possible. Possible ID types:
Many sources are missing one or more ID types (either because we don't know the ID,
or because it was never assigned). Keys for null IDs are not displayed.
Example
ids: {
openalex: "https://openalex.org/S1983995261",
issn_l: "2167-8359",
issn: [
"2167-8359"
],
mag: 1983995261,
fatcat: "https://fatcat.wiki/container/z3ijzhu7zzey3f7jwws7r
wikidata: "https://www.wikidata.org/entity/Q96326029"
}
is_core
Boolean: Whether this source is identified as a "core source" by CWTS, used in the
Open Leiden Ranking of universities around the world. The list of core sources can
be found here.
is_core: true
is_in_doaj
Boolean: Whether this is a journal listed in the Directory of Open Access Journals
(DOAJ).
is_in_doaj: true
is_oa
Boolean: Whether this is currently a fully open-access source.
We say "currently" because the status of a source can change over time. It's
common for journals to "flip" to Gold OA, after which they may make only future
articles open or also open their back catalogs. It's entirely possible for a source to
say is_oa: true , but for an article from last year to require a subscription.
is_oa: true
issn
List: The ISSNs used by this source. Many publications have multiple ISSNs , so
ISSN-L should be used when possible.
issn: ["2167-8359"]
issn_l
String: The ISSN-L identifying this source. This is the Canonical External ID for
sources.
ISSN is a global and unique ID for serial publications. However, different media
versions of a given publication (e.g., print and electronic) often have different
ISSNs. This is why we can't have nice things. The ISSN-L or Linking ISSN solves the
problem by designating a single canonical ISSN for all media versions of the title.
It's usually the same as the print ISSN.
issn_l: "2167-8359"
societies
Array: Societies on whose behalf the source is published and maintained, obtained
from our crowdsourced list. Thanks!
societies: [
{
"url": "http://www.counseling.org/",
"organization": "American Counseling Association on behalf of the
}
]
summary_stats
2yr_mean_citedness Float: The 2-year mean citedness for this source. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this source.
i10_index Integer: The i-10 index for this source.
While the h-index and the i-10 index are normally author-level metrics, they can be
calculated for any set of papers, so we include them for sources.
summary_stats: {
2yr_mean_citedness: 1.5295340589458237,
h_index: 105,
i10_index: 5045
}
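To make the point that these metrics work for any set of papers, here is a short sketch using the standard definitions of the h-index and i-10 index. The input is simply a list of per-work citation counts; the function names are illustrative, and this is not necessarily how OpenAlex computes the values internally.

```python
def h_index(citation_counts):
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citation_counts):
    """Number of works with at least 10 citations."""
    return sum(1 for cites in citation_counts if cites >= 10)
```

For example, a set of works cited 25, 12, 10, 4, 3, and 0 times has four works with at least four citations each, and three works with at least ten citations.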
type
String: The type of source, which will be one of: journal , repository ,
conference , ebook platform , book series , metadata , or other .
type: "journal"
updated_date
String: The last time anything in this Source object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.
updated_date: "2022-01-02T00:00:00"
works_api_url
String: A URL that will get you a list of all this source's Works .
We express this as an API URL (instead of just listing the works themselves)
because sometimes a source's publication list is too long to reasonably fit into a
single Source object.
works_api_url: "https://api.openalex.org/works?filter=primary_location.so
works_count
Integer: The number of Works this source hosts.
works_count: 20184
x_concepts
List: The Concepts most frequently applied to works hosted by this source. Each is
represented as a dehydrated Concept object, with one additional attribute:
score (Float): The strength of association between this source and the listed
concept, from 0-100.
x_concepts: [
{
id: "https://openalex.org/C86803240",
wikidata: null,
display_name: "Biology",
level: 0,
score: 86.7
},
{
id: "https://openalex.org/C185592680",
wikidata: null,
display_name: "Chemistry",
level: 0,
score: 51.4
},
// and so forth
]
display_name
host_organization
host_organization_lineage
host_organization_name
id
is_core
is_in_doaj
is_oa
issn
issn_l
type
Get a single source
It's easy to get a source from the API with: /sources/<entity_id>. Here's an
example:
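A minimal Python sketch of that request, using only the standard library. The ID S137773608 is the one shown in the response in this section; source_url and fetch_json are hypothetical helper names, and the network call is left commented out so the snippet has no side effects.

```python
import json
import urllib.request

API_ROOT = "https://api.openalex.org"


def source_url(source_id):
    """Build the single-entity URL /sources/<entity_id>."""
    return f"{API_ROOT}/sources/{source_id}"


def fetch_json(url):
    """GET the URL and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


url = source_url("S137773608")
# source = fetch_json(url)  # uncomment to call the live API
```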
That will return a Source object, describing everything OpenAlex knows about
the source with that ID:
{
"id": "https://openalex.org/S137773608",
"issn_l": "0028-0836",
"issn": [
"1476-4687",
"0028-0836"
],
"display_name": "Nature",
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
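One way to do this is to split your IDs into batches of 50 and join each batch with the pipe character. The sketch below assumes the list endpoint accepts a filter named openalex for OpenAlex IDs and that | means OR; both are assumptions here, so check the filters page before relying on them. batched_source_urls is a hypothetical helper.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"
MAX_IDS_PER_REQUEST = 50  # the limit stated above


def batched_source_urls(source_ids, filter_name="openalex"):
    """Return one list URL per batch of up to 50 IDs joined with '|' (OR)."""
    urls = []
    for start in range(0, len(source_ids), MAX_IDS_PER_REQUEST):
        batch = source_ids[start:start + MAX_IDS_PER_REQUEST]
        query = urlencode(
            {
                "filter": f"{filter_name}:{'|'.join(batch)}",
                # ask for the whole batch on a single page
                "per-page": len(batch),
            },
            safe=":|",
        )
        urls.append(f"{API_ROOT}/sources?{query}")
    return urls
```

With 120 IDs this produces three URLs, so three requests instead of 120.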
External IDs
You can look up journals using external IDs such as an ISSN:
External ID URN
ISSN issn
Fatcat fatcat
Wikidata wikidata
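As a sketch, an external-ID lookup puts the URN prefix from the table in front of the ID, separated by a colon. The ISSN below is the PeerJ one used elsewhere in this section; source_url_by_external_id is a hypothetical helper, and the exact prefix:value form is an assumption based on the table.

```python
API_ROOT = "https://api.openalex.org"


def source_url_by_external_id(urn_prefix, value):
    """Build /sources/<urn_prefix>:<value>, e.g. /sources/issn:2167-8359."""
    return f"{API_ROOT}/sources/{urn_prefix}:{value}"


url = source_url_by_external_id("issn", "2167-8359")
```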
Select fields
You can use select to limit the fields that are returned in a source object. More
details are here.
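For illustration, here is how a select request might be built. The sketch assumes select takes a comma-separated list of top-level field names; select_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def select_url(entity_path, fields):
    """Append select=<field1>,<field2>,... to an entity or list path."""
    query = urlencode({"select": ",".join(fields)}, safe=",")
    return f"{API_ROOT}/{entity_path}?{query}"


url = select_url("sources/S137773608", ["id", "display_name", "issn_l"])
```

Requesting only the fields you need keeps responses small, which helps when you page through many results.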
{
"meta": {
"count": 226727,
"db_response_time_ms": 32,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/S2764455111",
"issn_l": null,
"issn": null,
"display_name": "PubMed Central",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/S4306400806",
"issn_l": null,
"issn": null,
"display_name": "PubMed Central - Europe PMC",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Continue on to learn how you can filter and search lists of sources.
Sample sources
You can use sample to get a random batch of sources. Read more about sampling
and how to add a seed value here.
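A small sketch of a sampling request. It assumes sample takes the number of random results you want and seed takes any value you pick to make the sample repeatable; sample_sources_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def sample_sources_url(sample_size, seed=None):
    """Build a /sources URL that asks for a random batch of sources."""
    params = {"sample": sample_size}
    if seed is not None:
        params["seed"] = seed  # same seed, same sample
    return f"{API_ROOT}/sources?{urlencode(params)}"
```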
Select fields
You can use select to limit the fields that are returned in a list of sources. More
details are here.
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
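As a rough sketch of how such a query could be assembled: the code below assumes the syntax from the filters page, where each filter is written attribute:value, a comma between filters means AND, a pipe between values means OR, and a leading ! negates a value. Treat those details as assumptions; build_filter and sources_filter_url are hypothetical helpers.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def build_filter(conditions):
    """Turn {attribute: value} into a filter string.

    A list of values is joined with '|' (OR); attributes are joined
    with ',' (AND). Booleans become 'true' or 'false'.
    """
    parts = []
    for attribute, value in conditions.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        elif isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{attribute}:{value}")
    return ",".join(parts)


def sources_filter_url(conditions):
    query = urlencode({"filter": build_filter(conditions)}, safe=":,|!")
    return f"{API_ROOT}/sources?{query}"
```

For example, sources_filter_url({"is_oa": True, "country_code": ["GB", "US"]}) asks for open-access sources associated with either country, and a value such as "!GB" would express negation.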
apc_prices.currency
apc_prices.price
apc_usd
cited_by_count
country_code
is_in_doaj
is_oa
issn
works_count
continent
default.search
This works the same as using the search parameter for Sources.
display_name.search
Value: a search string
Returns: sources with a display_name containing the given string; see the search
page for details.
In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.
has_issn
Returns: sources that have or lack an ISSN, depending on the given value.
is_global_south
Search for the abbreviated version of the Journal of the American Chemical
Society, "jacs":
https://api.openalex.org/sources?search=jacs
You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.
display_name.search display_name
You can also use the filter default.search , which works the same as using the
search parameter.
Autocomplete sources
You can autocomplete sources to create a very fast type-ahead style search
function:
This returns a list of sources with the publisher set as the hint:
{
"results": [
{
"id": "https://openalex.org/S5555990",
"display_name": "The Journal of Neuroscience",
"hint": "Society for Neuroscience",
"cited_by_count": 4274712,
"works_count": 40376,
"entity_type": "source",
"external_id": "0270-6474"
},
// more results
]
}
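A type-ahead box typically sends the text typed so far and shows each result's name next to its hint. The sketch below assumes the endpoint is /autocomplete/sources with the typed text in a q parameter (an assumption, since the request URL is not shown in this section); the response shape is the one shown above.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def autocomplete_sources_url(typed_text):
    """Build the autocomplete request for the text typed so far."""
    return f"{API_ROOT}/autocomplete/sources?{urlencode({'q': typed_text})}"


def suggestions(response):
    """Return (display_name, hint) pairs to show in a dropdown."""
    return [(r["display_name"], r["hint"]) for r in response["results"]]


# The first result from the example response above.
example_response = {
    "results": [
        {
            "id": "https://openalex.org/S5555990",
            "display_name": "The Journal of Neuroscience",
            "hint": "Society for Neuroscience",
        }
    ]
}
```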
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
apc_usd
cited_by_count
has_issn
continent
country_code
is_core
is_in_doaj
is_oa
issn
publisher
summary_stats.2yr_mean_citedness
summary_stats.h_index
summary_stats.i10_index
type
works_count
Institutions
Universities and other organizations to which authors claim affiliations
The Canonical External ID for institutions is the ROR ID. All institutions in OpenAlex
have ROR IDs.
Our information about institutions comes from metadata found in Crossref, PubMed,
ROR, MAG, and publisher websites. In order to link institutions to works, we parse
every affiliation listed by every author. These affiliation strings can be quite messy,
so we’ve trained an algorithm to interpret them and extract the actual institutions
with reasonably high reliability.
For a simple example: we will treat both “MIT, Boston, USA” and “Massachusetts
Institute of Technology” as the same institution (https://ror.org/042nb2s44).
What's next
Learn more about what you can do with institutions:
associated_institutions
relationship (String): The type of relationship between this institution and the
listed institution. Possible values: parent , child , and related .
associated_institutions: [
{
id: "https://openalex.org/I2802101240",
ror: "https://ror.org/0483mr804",
display_name: "Carolinas Medical Center",
country_code: "US",
type: "healthcare",
relationship: "related"
},
{
id: "https://openalex.org/I69048370",
ror: "https://ror.org/01s91ey96",
display_name: "Renaissance Computing Institute",
country_code: "US",
type: "education",
relationship: "related"
},
// and so forth
]
cited_by_count
Integer: The total number of Works that cite a work created by an author affiliated
with this institution. Or less formally: the number of citations this institution has
collected.
cited_by_count: 21199844
country_code
String: The country where this institution is located, represented as an ISO two-
letter country code.
country_code: "US"
counts_by_year
List: works_count and cited_by_count for each of the last ten years, binned by
year. To put it another way: each year, you can see how many new works this
institution put out, and how many times any work affiliated with this institution got
cited.
Years with zero citations and zero works have been removed so you will need to
add those in if you need them.
counts_by_year: [
{
year: 2022,
works_count: 133,
cited_by_count: 32731
},
{
year: 2021,
works_count: 12565,
cited_by_count: 2180827
},
// and so forth
]
created_date
String: The date this Institution object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.
created_date: "2017-08-08"
display_name
display_name_acronyms
List: Acronyms or initialisms that people sometimes use instead of the full
display_name .
display_name_acronyms: ["UNC"]
display_name_alternatives
display_name_alternatives: [
"UNC-Chapel Hill"
]
geo
geo: {
city: "Chapel Hill",
geonames_city_id: "4460162",
region: "North Carolina",
country_code: "US",
country: "United States",
latitude: 35.9083,
longitude: -79.0492
}
homepage_url
homepage_url: "http://www.unc.edu/"
id
id: "https://openalex.org/I114027177"
ids
Object: All the external identifiers that we know about for this institution. IDs are
expressed as URIs whenever possible. Possible ID types:
Many institutions are missing one or more ID types (either because we don't know the
ID, or because it was never assigned). Keys for null IDs are not displayed.
ids: {
openalex: "https://openalex.org/I114027177",
ror: "https://ror.org/0130frc33",
grid: "grid.10698.36",
wikipedia: "https://en.wikipedia.org/wiki/University%20of%20North%20C
wikidata: "https://www.wikidata.org/wiki/Q192334",
mag: 114027177
}
image_thumbnail_url
image_thumbnail_url: "https://upload.wikimedia.org/wikipedia/en/thumb/5/5
is_super_system
Boolean: True if this institution is a "super system". This includes large university
systems such as the University of California System (
https://openalex.org/I2803209242), as well as some governments and
multinational companies.
We have this special flag for these institutions so that we can exclude them from
other institutions' lineage , which we do because these super systems are not
generally relevant in group-by results when you're looking at ranked lists of
institutions.
The list of institution IDs marked as super systems can be found in this file.
image_url
String: URL where you can get an image representing this institution. Usually this is
hosted on Wikipedia, and usually it's a seal or logo.
image_url: "https://upload.wikimedia.org/wikipedia/en/5/5c/University_of_
international
Object: The institution's display name in different languages. Derived from the
Wikipedia page for the institution in the given language.
display_name (Object)
key (String): language code in Wikidata language code format. Full list of
languages is here.
value (String): display_name in the given language
international: {
display_name: {
"ar": "جامعة نورث كارولينا في تشابل هيل",
"en": "University of North Carolina at Chapel Hill",
"es": "Universidad de Carolina del Norte en Chapel Hill",
"zh-cn": "北卡罗来纳大学教堂山分校",
...
}
}
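Since not every institution has a name in every language, it helps to fall back gracefully. Here is a minimal sketch; localized_name is a hypothetical helper, and the fallback order (requested language, then English, then the plain display_name) is just one reasonable choice.

```python
def localized_name(institution, language, fallback="en"):
    """Pick the display name for a language code, with fallbacks."""
    names = institution.get("international", {}).get("display_name", {})
    return (
        names.get(language)
        or names.get(fallback)
        or institution.get("display_name")
    )


unc = {
    "display_name": "University of North Carolina at Chapel Hill",
    "international": {
        "display_name": {
            "en": "University of North Carolina at Chapel Hill",
            "es": "Universidad de Carolina del Norte en Chapel Hill",
        }
    },
}
```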
lineage
List: OpenAlex IDs of institutions. The list will include this institution's ID, as well as
any parent institutions. If this institution has no parent institutions, this list will only
contain its own ID.
Super systems are excluded from the lineage. See is_super_system above.
id: "https://openalex.org/I170203145",
...
lineage: [
"https://openalex.org/I170203145",
"https://openalex.org/I90344618"
]
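Because lineage always contains the institution's own ID, a common first step is to strip it out to get just the parents. A small sketch, with parent_ids as a hypothetical helper and the example values taken from above:

```python
def parent_ids(institution):
    """Return the lineage without the institution's own ID."""
    return [i for i in institution["lineage"] if i != institution["id"]]


example = {
    "id": "https://openalex.org/I170203145",
    "lineage": [
        "https://openalex.org/I170203145",
        "https://openalex.org/I90344618",
    ],
}
```

An empty result means the institution has no parent institutions in OpenAlex (super systems are never listed).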
repositories
List: Repositories ( Sources with type: repository ) that have this institution as
their host_organization
repositories: [
{
id: "https://openalex.org/S4306402521",
display_name: "University of Minnesota Digital Conservancy (Unive
host_organization: "https://openalex.org/I130238516",
host_organization_name: "University of Minnesota",
host_organization_lineage: ["https://openalex.org/I130238516"]
}
// and so forth
]
roles
List: List of role objects, which include the role (one of institution , funder , or
publisher ), the id (OpenAlex ID), and the works_count .
In many cases, a single organization does not fit neatly into one role. For example,
Yale University is a single organization that is a research university, funds research
studies, and publishes an academic journal. The roles property links the
OpenAlex entities together for a single organization, and includes counts for the
works associated with each role.
The roles list of an entity (Funder, Publisher, or Institution) always includes itself.
In the case where an organization only has one role, the roles will be a list of
length one, with itself as the only item.
roles: [
{
role: "funder",
id: "https://openalex.org/F4320308380",
works_count: 1004,
},
{
role: "publisher",
id: "https://openalex.org/P4310315589",
works_count: 13986,
},
{
role: "institution",
id: "https://openalex.org/I32971472",
works_count: 250031,
}
]
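If you want to hop from an institution to the same organization's funder or publisher record, you can index the roles list by role. A minimal sketch using the example values above; ids_by_role is a hypothetical helper and assumes each role appears at most once in the list.

```python
def ids_by_role(roles):
    """Map each role name to the OpenAlex ID that plays it."""
    return {entry["role"]: entry["id"] for entry in roles}


roles = [
    {"role": "funder", "id": "https://openalex.org/F4320308380", "works_count": 1004},
    {"role": "publisher", "id": "https://openalex.org/P4310315589", "works_count": 13986},
    {"role": "institution", "id": "https://openalex.org/I32971472", "works_count": 250031},
]

linked = ids_by_role(roles)
```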
ror
String: The ROR ID for this institution. This is the Canonical External ID for
institutions.
ror: "https://ror.org/0130frc33"
summary_stats
2yr_mean_citedness Float: The 2-year mean citedness for this institution. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this institution.
i10_index Integer: The i-10 index for this institution.
While the h-index and the i-10 index are normally author-level metrics and the 2-
year mean citedness is normally a journal-level metric, they can be calculated for
any set of papers, so we include them for institutions.
summary_stats: {
2yr_mean_citedness: 5.065784263815827,
h_index: 985,
i10_index: 176682
}
type
String: The institution's primary type, using the ROR "type" controlled vocabulary.
type: "education"
updated_date
String: The last time anything in this Institution changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.
updated_date: "2022-01-02T00:27:23.088909"
works_api_url
String: A URL that will get you a list of all the Works affiliated with this institution.
We express this as an API URL (instead of just listing the Works themselves)
because most institutions have way too many works to reasonably fit into a single
return object.
works_api_url: "https://api.openalex.org/works?filter=institutions.id:I114
works_count
Integer: The number of Works created by authors affiliated with this institution. Or
less formally: the number of works coming out of this institution.
works_count: 202704
x_concepts
List: The Concepts most frequently applied to works affiliated with this institution.
Each is represented as a dehydrated Concept object, with one additional attribute:
score (Float): The strength of association between this institution and the listed
concept, from 0-100.
x_concepts: [
{
id: "https://openalex.org/C86803240",
wikidata: null,
display_name: "Biology",
level: 0,
score: 86.7
},
{
id: "https://openalex.org/C185592680",
wikidata: null,
display_name: "Chemistry",
level: 0,
score: 51.4
},
// and so forth
]
country_code
display_name
id
lineage
ror
type
Get a single institution
It's easy to get an institution from the API with: /institutions/<entity_id>.
Here's an example:
{
"id": "https://openalex.org/I27837315",
"ror": "https://ror.org/00jmfr291",
"display_name": "University of Michigan–Ann Arbor",
"country_code": "US",
"type": "education",
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
External IDs
You can look up institutions using external IDs such as a ROR ID:
External ID URN
ROR ror
Wikidata wikidata
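As a sketch, a ROR lookup puts the ror prefix from the table in front of the ROR ID. The ROR ID below is the University of Michigan one shown in this section; institution_url_by_ror is a hypothetical helper, and passing the ROR ID in its full URL form is an assumption.

```python
API_ROOT = "https://api.openalex.org"


def institution_url_by_ror(ror_id):
    """Build /institutions/ror:<ROR ID> for a single-institution lookup."""
    return f"{API_ROOT}/institutions/ror:{ror_id}"


url = institution_url_by_ror("https://ror.org/00jmfr291")
```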
Select fields
You can use select to limit the fields that are returned in an institution object.
More details are here.
{
"meta": {
"count": 108618,
"db_response_time_ms": 32,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/I27837315",
"ror": "https://ror.org/00jmfr291",
"display_name": "University of Michigan–Ann Arbor",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/I201448701",
"ror": "https://ror.org/00cvxb145",
"display_name": "University of Washington",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Continue on to learn how you can filter and search lists of institutions.
Sample institutions
You can use sample to get a random batch of institutions. Read more about
sampling and how to add a seed value here.
Select fields
You can use select to limit the fields that are returned in a list of institutions. More
details are here.
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
cited_by_count
country_code
is_super_system
works_count
continent
default.search
This works the same as using the search parameter for Institutions.
display_name.search
Returns: institutions with a display_name containing the given string; see the
search page for details.
In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.
has_ror
Value: a Boolean ( true or false )
Returns: institutions that have or lack a ROR ID, depending on the given value.
is_global_south
You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.
display_name.search display_name
You can also use the filter default.search , which works the same as using the
search parameter.
Autocomplete institutions
You can autocomplete institutions to create a very fast type-ahead style search
function:
This returns a list of institutions with the institution location set as the hint:
{
"results": [
{
"id": "https://openalex.org/I136199984",
"display_name": "Harvard University",
"hint": "Cambridge, USA",
"cited_by_count": 37792327,
"works_count": 242547,
"entity_type": "institution",
"external_id": "https://ror.org/03vek6s52"
},
...
]
}
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
continent
country_code
has_ror
is_global_south
is_super_system
lineage
repositories.host_organization
summary_stats.2yr_mean_citedness
summary_stats.h_index
summary_stats.i10_index
type
works_count
Topics
Topics assigned to works
Works in OpenAlex are tagged with Topics using an automated system that takes
into account the available information about the work, including title, abstract,
source (journal) name, and citations. There are around 4,500 Topics. Works are
assigned topics using a model that assigns scores for each topic for a work. The
highest-scoring topic is that work's primary_topic . We also provide additional
highly ranked topics for works, in Work.topics.
To learn more about how OpenAlex topics work in general, see the Topics page at
OpenAlex help pages.
For a detailed description of the methods behind OpenAlex Topics, see our paper:
"OpenAlex: End-to-End Process for Topic Classification". The code and model are
available at https://github.com/ourresearch/openalex-topic-classification.
What's next
Learn more about what you can do with topics:
description
display_name
domain
Object: The ID and the name ( display_name ) for the domain of this topic. The
domain is the highest level in the "domain, field, subfield, topic" system, which
means it is the least granular. See the topics overview for more explanation and a
diagram.
domain: {
id: 4,
display_name: "Health Sciences"
}
field
Object: The ID and the name ( display_name ) for the field of this topic. The field is
the second-highest level in the "domain, field, subfield, topic" system, which means
it is the second-least granular. See the topics overview for more explanation and a
diagram.
field: {
id: 27,
display_name: "Medicine"
}
id
id: "https://openalex.org/T11636"
ids
Object: All the external identifiers that we know about for this topic. IDs are
expressed as URIs whenever possible. Possible ID types:
ids: {
openalex: "https://openalex.org/T11636",
wikipedia: "https://en.wikipedia.org/wiki/Artificial_intelligence_in_
}
keywords
List: Keywords consisting of one or several words each, meant to represent the
content of the papers in the topic. These keywords were generated as part of the AI
model. For now, they are provided as-is, but we will be providing more support and
documenting them more thoroughly.
keywords: [
"Artificial Intelligence",
"Machine Learning",
"Healthcare",
"Medical Imaging",
"Clinical Decision Support",
...
]
subfield
Object: The ID and the name ( display_name ) for the subfield of this topic. The
subfield is the third-highest level in the "domain, field, subfield, topic" system,
which means it is the third-least granular. See the topics overview for more
explanation and a diagram.
subfield: {
id: 2718,
display_name: "Health Informatics"
}
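The domain, field, and subfield objects together place a topic in the four-level hierarchy. Here is a small sketch that turns a Topic object into a readable path, from least to most granular; topic_path is a hypothetical helper, and the example values are the ones shown in this section.

```python
def topic_path(topic, separator=" > "):
    """Join domain, field, subfield, and topic names into one string."""
    levels = ("domain", "field", "subfield")
    names = [topic[level]["display_name"] for level in levels]
    names.append(topic["display_name"])
    return separator.join(names)


topic = {
    "id": "https://openalex.org/T11636",
    "display_name": "Artificial Intelligence in Medicine",
    "domain": {"id": 4, "display_name": "Health Sciences"},
    "field": {"id": 27, "display_name": "Medicine"},
    "subfield": {"id": 2718, "display_name": "Health Informatics"},
}
```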
updated_date
String: The last time anything in this topic object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.
updated_date: "2024-02-05T05:00:03.798420"
works_count
works_count: 21737
Get a single topic
It's easy to get a topic from the API with: /topics/<entity_id> . Here's an
example:
That will return a Topic object, describing everything OpenAlex knows about the
topic with that ID:
{
"id": "https://openalex.org/T11636",
"display_name": "Artificial Intelligence in Medicine",
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
Select fields
You can use select to limit the fields that are returned in a topic object. More
details are here.
{
"meta": {
"count": 4516,
"db_response_time_ms": 10,
"page": 1,
"per_page": 25,
"groups_count": null
},
"results": [
{
"id": "https://openalex.org/T11475",
"display_name": "Territorial Governance and Environmental Par
// more fields (removed to save space)
},
{
"id": "https://openalex.org/T13445",
"display_name": "American Political Thought and History",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Get the second page of topics results, with 50 results returned per page
https://api.openalex.org/topics?per-page=50&page=2
You also can sort results with the sort parameter:
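A sketch that combines paging and sorting in one URL builder. The per-page and page parameters are the ones used above; the sort value format attribute:desc is an assumption, and topics_list_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def topics_list_url(page=1, per_page=25, sort=None):
    """Build a /topics list URL with paging and an optional sort."""
    params = {"per-page": per_page, "page": page}
    if sort:
        params["sort"] = sort  # assumed form, e.g. "cited_by_count:desc"
    return f"{API_ROOT}/topics?{urlencode(params, safe=':')}"
```

To walk through every topic, keep increasing page until a page comes back with an empty results list.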
Continue on to learn how you can filter and search lists of topics.
Sample topics
You can use sample to get a random batch of topics. Read more about sampling
and how to add a seed value here.
Select fields
You can use select to limit the fields that are returned in a list of topics. More
details are here.
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
cited_by_count
domain.id
field.id
works_count
default.search
This works the same as using the search parameter for Topics.
display_name.search
Returns: topics with a display_name containing the given string; see the search
page for details.
In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.
Search topics
The best way to search for topics is to use the search query parameter, which
searches the display_name, description, and keywords fields. Example:
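A minimal sketch of building such a search request; the search text is just an illustration and topics_search_url is a hypothetical helper. The standard library takes care of encoding spaces in the query.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def topics_search_url(text):
    """Build a /topics URL that uses the search query parameter."""
    return f"{API_ROOT}/topics?{urlencode({'search': text})}"


url = topics_search_url("artificial intelligence")
```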
You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.
display_name.search display_name
description.search description
keywords.search keywords
You can also use the filter default.search , which works the same as using the
search parameter.
Group topics
You can group topics with the group_by parameter:
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
domain.id
field.id
subfield.id
works_count
Keywords
Short words or phrases assigned to works using AI
Works in OpenAlex are tagged with Keywords using an automated system based on
Topics.
To learn more about how OpenAlex Keywords work in general, see the Keywords
page at OpenAlex help pages.
Keyword object
These are the fields in a keyword object. When you use the API to get a single
keyword or lists of keywords, this is what's returned.
cited_by_count
Integer: The number of citations to works that have been tagged with this keyword.
Or less formally: the number of citations to this keyword.
For example, if there are just two works tagged with this keyword and one of them
has been cited 10 times, and the other has been cited 1 time, cited_by_count for
this keyword would be 11 .
cited_by_count: 4347000
created_date
String: The date this Keyword object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.
created_date: "2024-04-10"
display_name
id
id: "https://openalex.org/keywords/cardiac-imaging"
updated_date
String: The last time anything in this keyword object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.
updated_date: "2024-05-09T05:00:03.798420"
works_count
works_count: 21737
That will return a Keyword object, describing everything OpenAlex knows about
the keyword with that ID:
{
"id": "https://openalex.org/keywords/cardiac-imaging",
"display_name": "Cardiac Imaging",
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
Select fields
You can use select to limit the fields that are returned in a keyword object. More
details are here.
Filter keywords
You can filter keywords with the filter parameter:
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
id
works_count
default.search
This works the same as using the search parameter for Keywords.
display_name.search
Search keywords
You can search for keywords using the search query parameter, which searches
the display_name field. For example:
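As a minimal sketch, a keyword search request could be assembled like this in Python. The /keywords path and the search parameter come from the text above; the helper name and the sample query string are illustrative assumptions.

```python
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def keyword_search_url(query: str) -> str:
    """Build a /keywords URL that searches display_name for `query`."""
    # urlencode escapes spaces and other reserved characters for us.
    return f"{API_BASE}/keywords?{urlencode({'search': query})}"


# Hypothetical query string, used only for illustration.
print(keyword_search_url("artificial intelligence"))
```

Fetching the URL with any HTTP client should return a list response in the same meta/results shape shown elsewhere in these docs.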
Group keywords
You can group keywords with the group_by parameter:
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
works_count
Publishers
Companies and organizations that distribute works
Publishers are companies and organizations that distribute journal articles, books,
and theses. OpenAlex indexes about 10,000 publishers.
Our publisher data is closely tied to the publisher information in Wikidata. So the
Canonical External ID for OpenAlex publishers is a Wikidata ID, and almost every
publisher has one. Publishers are linked to sources through the
host_organization field.
What's next
Learn more about what you can do with publishers:
alternate_titles
alternate_titles: [
"Elsevier",
"elsevier.com",
"Elsevier Science",
"Uitg. Elsevier",
"السفیر",
"السویر",
"انتشارات الزویر",
"لودویک السفیر",
"爱思唯尔"
]
cited_by_count
Integer: The number of citations to works that are linked to this publisher through
journals or other sources.
For example, if a publisher publishes 27 journals and those 27 journals have 3,050
works, this number is the sum of the cited_by_count values for all of those 3,050
works.
cited_by_count: 407508754
country_codes
List: The countries where the publisher is primarily located, as ISO two-letter
country codes.
country_codes: ["DE"]
counts_by_year
List: The values of works_count and cited_by_count for each of the last ten
years, binned by year. To put it another way: for every listed year, you can see how
many new works are linked to this publisher, and how many times any work linked
to this publisher was cited.
Years with zero citations and zero works have been removed so you will need to
add those back in if you need them.
counts_by_year: [
{
year: 2021,
works_count: 4211,
cited_by_count: 120939
},
{
year: 2020,
works_count: 4363,
cited_by_count: 119531
},
// and so forth
]
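Because zero-activity years are dropped, a time series built from counts_by_year can have gaps. The following is a rough sketch of adding them back; the function name, the span argument, and the 2019 sample row are assumptions made for illustration, not part of the API.

```python
def fill_missing_years(counts_by_year, last_year, span=10):
    """Return one row per year for `span` years ending at `last_year`, newest first.

    Years that the API omitted are added back with zero counts.
    """
    by_year = {row["year"]: row for row in counts_by_year}
    filled = []
    for year in range(last_year, last_year - span, -1):
        row = by_year.get(year, {})
        filled.append({
            "year": year,
            "works_count": row.get("works_count", 0),
            "cited_by_count": row.get("cited_by_count", 0),
        })
    return filled


# The 2021 row mirrors the example above; the 2019 row is made up
# so that 2020 is missing and has to be filled in.
sample = [
    {"year": 2021, "works_count": 4211, "cited_by_count": 120939},
    {"year": 2019, "works_count": 10, "cited_by_count": 5},
]
rows = fill_missing_years(sample, last_year=2021, span=3)
```

The same approach should apply to the counts_by_year lists on other entity types, since they share this shape.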
created_date
String: The date this Publisher object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.
created_date: "2017-08-08"
display_name
hierarchy_level
Integer: The hierarchy level for this publisher. A publisher with hierarchy level 0 has
no parent publishers. A hierarchy level 1 publisher has one parent above it, and so
on.
hierarchy_level: 1
id
id: "https://openalex.org/P4310320990"
ids
Object: All the external identifiers that we know about for this publisher. IDs are
expressed as URIs whenever possible. Possible ID types:
image_thumbnail_url
This is usually a hotlink to a Wikimedia image. You can change the width=300
parameter in the URL if you want a different thumbnail size.
image_thumbnail_url: "https://commons.wikimedia.org/w/index.php?title=Spe
image_url
String: URL where you can get an image representing this publisher. Usually this is a
hotlink to a Wikimedia image, and usually it's a seal or logo.
image_url: "https://commons.wikimedia.org/w/index.php?title=Special:Redir
lineage
List: OpenAlex IDs of publishers. The list will include this publisher's ID, as well as
any parent publishers. If this publisher's hierarchy_level is 0, this list will only
contain its own ID.
id: "https://openalex.org/P4310321285",
...
hierarchy_level: 2,
lineage: [
"https://openalex.org/P4310321285",
"https://openalex.org/P4310319900",
"https://openalex.org/P4310319965"
]
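The relationship between lineage and hierarchy_level can be checked with a few lines of Python. This is only a sketch: it assumes, based on the field descriptions above, that the number of ancestors in lineage equals hierarchy_level, and the function name is hypothetical.

```python
def summarize_lineage(publisher):
    """Separate a publisher's own ID from its ancestors in `lineage`."""
    ancestors = [pid for pid in publisher["lineage"] if pid != publisher["id"]]
    return {
        "ancestor_count": len(ancestors),
        "is_top_level": not ancestors,
        # Assumption: one ancestor per hierarchy level above this publisher.
        "level_matches": len(ancestors) == publisher["hierarchy_level"],
    }


# Values copied from the example above.
example = {
    "id": "https://openalex.org/P4310321285",
    "hierarchy_level": 2,
    "lineage": [
        "https://openalex.org/P4310321285",
        "https://openalex.org/P4310319900",
        "https://openalex.org/P4310319965",
    ],
}
summary = summarize_lineage(example)
```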
parent_publisher
String: An OpenAlex ID linking to the direct parent of the publisher. This will be null
if the publisher's hierarchy_level is 0.
parent_publisher: "https://openalex.org/P4310311775"
roles
List: List of role objects, which include the role (one of institution , funder , or
publisher ), the id (OpenAlex ID), and the works_count .
In many cases, a single organization does not fit neatly into one role. For example,
Yale University is a single organization that is a research university, funds research
studies, and publishes an academic journal. The roles property links the
OpenAlex entities together for a single organization, and includes counts for the
works associated with each role.
The roles list of an entity (Funder, Publisher, or Institution) always includes itself.
In the case where an organization only has one role, the roles will be a list of
length one, with itself as the only item.
roles: [
{
role: "funder",
id: "https://openalex.org/F4320308380",
works_count: 1004,
},
{
role: "publisher",
id: "https://openalex.org/P4310315589",
works_count: 13986,
},
{
role: "institution",
id: "https://openalex.org/I32971472",
works_count: 250031,
}
]
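One way to use roles is to look up the sibling entity that represents the same organization in another role. A small sketch follows, with hypothetical helper names and the sample values from the example above:

```python
def role_ids(entity):
    """Map each role name to the OpenAlex ID that plays that role."""
    return {r["role"]: r["id"] for r in entity.get("roles", [])}


def works_count_for(entity, role):
    """Return works_count for `role`, or None if the organization lacks that role."""
    for r in entity.get("roles", []):
        if r["role"] == role:
            return r["works_count"]
    return None


organization = {
    "roles": [
        {"role": "funder", "id": "https://openalex.org/F4320308380", "works_count": 1004},
        {"role": "publisher", "id": "https://openalex.org/P4310315589", "works_count": 13986},
        {"role": "institution", "id": "https://openalex.org/I32971472", "works_count": 250031},
    ]
}
ids_by_role = role_ids(organization)
```

The per-role counts are kept separate here rather than summed, since the text does not say whether the same work can be counted under more than one role.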
sources_api_url
String: A URL that will get you a list of all the sources published by this publisher.
We express this as an API URL (instead of just listing the sources themselves)
because there might be thousands of sources linked to a publisher, and that's too
many to fit here.
sources_api_url: "https://api.openalex.org/sources?filter=host_organizati
summary_stats
2yr_mean_citedness Float: The 2-year mean citedness for this publisher. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this publisher.
i10_index Integer: The i-10 index for this publisher.
summary_stats: {
2yr_mean_citedness: 5.065784263815827,
h_index: 985,
i10_index: 176682
}
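For readers unfamiliar with these metrics, here is a sketch of the textbook definitions applied to a plain list of per-work citation counts. It illustrates the standard formulas only; how OpenAlex computes the values internally is not shown in this documentation, and the function names are assumptions.

```python
def h_index(citations):
    """Largest h such that h works each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for cites in citations if cites >= 10)


def two_year_mean_citedness(citations_last_year, works_in_two_prior_years):
    """Citations received last year divided by works published in the two years before it."""
    if works_in_two_prior_years == 0:
        return None  # undefined when there are no citation-receiving works
    return citations_last_year / works_in_two_prior_years


counts = [25, 10, 8, 5, 4, 3]  # made-up citation counts for six works
```

With the made-up counts above, four works have at least four citations each and two works reach ten citations.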
updated_date
String: The last time anything in this publisher object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.
updated_date: "2021-12-25T14:04:30.578837"
works_count
works_count: 13789818
Get a single publisher
It's easy to get a publisher from the API with: /publishers/<entity_id> .
Here's an example:
That will return a Publisher object, describing everything OpenAlex knows about
the publisher with that ID:
{
"id": "https://openalex.org/P4310319965",
"display_name": "Springer Nature",
"alternate_titles": [
"エイプレス",
"Springer Nature Group",
"施普林格-自然出版集团"
],
"hierarchy_level": 0,
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
External IDs
You can look up publishers using external IDs such as a Wikidata ID:
External ID URN
ROR ror
Wikidata wikidata
Select fields
You can use select to limit the fields that are returned in a publisher object. More
details are here.
{
"meta": {
"count": 7207,
"db_response_time_ms": 26,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/P4310311775",
"display_name": "RELX Group",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/P4310320990",
"display_name": "Elsevier BV",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Get the second page of publishers results, with 50 results returned per page
https://api.openalex.org/publishers?per-page=50&page=2
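To walk through every page, the meta.count value from a list response can be combined with the per-page setting. The sketch below only builds the URLs and does no fetching. The helper name is an assumption, and the API may cap how deep basic paging can go, so check the paging documentation before relying on it for very large result sets.

```python
import math
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def page_urls(entity, total_count, per_page=25):
    """Return one list URL per page needed to cover `total_count` results."""
    pages = math.ceil(total_count / per_page)
    return [
        f"{API_BASE}/{entity}?{urlencode({'per-page': per_page, 'page': page})}"
        for page in range(1, pages + 1)
    ]


# 7207 is the meta.count shown in the sample publishers response.
urls = page_urls("publishers", total_count=7207, per_page=50)
```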
You also can sort results with the sort parameter:
Continue on to learn how you can filter and search lists of publishers.
Sample publishers
You can use sample to get a random batch of publishers. Read more about
sampling and how to add a seed value here.
Select fields
You can use select to limit the fields that are returned in a list of publishers. More
details are here.
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
cited_by_count
country_codes
hierarchy_level
continent
default.search
This works the same as using the search parameter for Publishers.
display_name.search
Returns: publishers with a display_name containing the given string; see the
search page for details.
In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.
Search publishers
The best way to search for publishers is to use the search query parameter, which
searches the display_name and alternate_titles fields. Example:
You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.
display_name.search display_name
You can also use the filter default.search , which works the same as using the
search parameter.
Autocomplete publishers
You can autocomplete publishers to create a very fast type-ahead style search
function:
Autocomplete publishers with "els" in the display_name :
https://api.openalex.org/autocomplete/publishers?q=els
{
"results": [
{
"id": "https://openalex.org/P4310320990",
"display_name": "Elsevier BV",
"hint": null,
"cited_by_count": 407508754,
"works_count": 20311868,
"entity_type": "publisher",
"external_id": "https://www.wikidata.org/entity/Q746413"
},
...
]
}
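A type-ahead widget mostly needs a label and an ID per suggestion. The following sketch builds the autocomplete URL and reduces a response of the shape shown above; the function names and the label/id output format are illustrative assumptions.

```python
from urllib.parse import urlencode


def autocomplete_url(entity, typed):
    """Build the autocomplete URL for what the user has typed so far."""
    return f"https://api.openalex.org/autocomplete/{entity}?{urlencode({'q': typed})}"


def suggestions(response):
    """Reduce an autocomplete response to the fields a dropdown needs."""
    return [
        {"label": item["display_name"], "id": item["id"], "hint": item.get("hint")}
        for item in response.get("results", [])
    ]


# Trimmed copy of the sample response above.
sample_response = {
    "results": [
        {
            "id": "https://openalex.org/P4310320990",
            "display_name": "Elsevier BV",
            "hint": None,
            "entity_type": "publisher",
        }
    ]
}
items = suggestions(sample_response)
```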
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
hierarchy_level
lineage
summary_stats.2yr_mean_citedness
summary_stats.h_index
summary_stats.i10_index
Funders
Organizations that fund research
Funders are organizations that fund research. OpenAlex indexes about 32,000
funders. Funder data comes from Crossref, and is enhanced with data from
Wikidata and ROR.
What's next
Learn more about what you can do with funders:
alternate_titles
alternate_titles: [
"US National Institutes of Health",
"Institutos Nacionales de la Salud",
"NIH"
]
cited_by_count
Integer: The total number of works that cite a work linked to this funder.
cited_by_count: 7823467
country_code
String: The country where this funder is located, represented as an ISO two-letter
country code.
country_code: "US"
counts_by_year
List: The values of works_count and cited_by_count for each of the last ten
years, binned by year. To put it another way: for every listed year, you can see how
many new works are linked to this funder, and how many times any work linked to
this funder was cited.
Years with zero citations and zero works have been removed so you will need to
add those back in if you need them.
counts_by_year: [
{
year: 2021,
works_count: 4211,
cited_by_count: 120939
},
{
year: 2020,
works_count: 4363,
cited_by_count: 119531
},
// and so forth
]
created_date
String: The date this Funder object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.
created_date: "2023-02-13"
description
display_name
String: The primary name of the funder.
grants_count
grants_count: 7109
homepage_url
homepage_url: "http://www.nih.gov/"
id
id: "https://openalex.org/F4320332161"
ids
Object: All the external identifiers that we know about for this funder. IDs are
expressed as URIs whenever possible. Possible ID types:
ids: {
openalex: "https://openalex.org/F4320332161",
ror: "https://ror.org/01cwqze88",
wikidata: "https://www.wikidata.org/entity/Q390551",
crossref: "100000002",
doi: "https://doi.org/10.13039/100000002"
}
image_thumbnail_url
This is usually a hotlink to a Wikimedia image. You can change the width=300
parameter in the URL if you want a different thumbnail size.
image_thumbnail_url: "https://commons.wikimedia.org/w/index.php?title=Spe
image_url
String: URL where you can get an image representing this funder. Usually this is a
hotlink to a Wikimedia image, and usually it's a seal or logo.
image_url: "https://commons.wikimedia.org/w/index.php?title=Special:Redir
roles
List: List of role objects, which include the role (one of institution , funder , or
publisher ), the id (OpenAlex ID), and the works_count .
In many cases, a single organization does not fit neatly into one role. For example,
Yale University is a single organization that is a research university, funds research
studies, and publishes an academic journal. The roles property links the
OpenAlex entities together for a single organization, and includes counts for the
works associated with each role.
The roles list of an entity (Funder, Publisher, or Institution) always includes itself.
In the case where an organization only has one role, the roles will be a list of
length one, with itself as the only item.
roles: [
{
role: "funder",
id: "https://openalex.org/F4320308380",
works_count: 1004,
},
{
role: "publisher",
id: "https://openalex.org/P4310315589",
works_count: 13986,
},
{
role: "institution",
id: "https://openalex.org/I32971472",
works_count: 250031,
}
]
summary_stats
2yr_mean_citedness Float: The 2-year mean citedness for this funder. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this funder.
i10_index Integer: The i-10 index for this funder.
While the h-index and the i-10 index are normally author-level metrics and the 2-
year mean citedness is normally a journal-level metric, they can be calculated for
any set of papers, so we include them for funders.
summary_stats: {
2yr_mean_citedness: 5.065784263815827,
h_index: 985,
i10_index: 176682
}
updated_date
String: The last time anything in this funder object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.
updated_date: "2023-04-21T16:54:19.012138"
works_count
works_count: 260210
Get a single funder
It's easy to get a funder from the API with: /funders/<entity_id> . Here's an
example:
That will return a Funder object, describing everything OpenAlex knows about the
funder with that ID:
{
"id": "https://openalex.org/F4320332161",
"display_name": "National Institutes of Health",
"alternate_titles": [
"US National Institutes of Health",
"Institutos Nacionales de la Salud",
"NIH"
],
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
External IDs
You can look up funders using external IDs such as a Wikidata ID:
External ID URN
ROR ror
Wikidata wikidata
Select fields
You can use select to limit the fields that are returned in a funder object. More
details are here.
{
"meta": {
"count": 32437,
"db_response_time_ms": 26,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/F4320321001",
"display_name": "National Natural Science Foundation of China
// more fields (removed to save space)
},
{
"id": "https://openalex.org/F4320306076",
"display_name": "National Science Foundation",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Get the second page of funders results, with 50 results returned per page
https://api.openalex.org/funders?per-page=50&page=2
You also can sort results with the sort parameter:
Continue on to learn how you can filter and search lists of funders.
Sample funders
You can use sample to get a random batch of funders. Read more about sampling
and how to add a seed value here.
Select fields
You can use select to limit the fields that are returned in a list of funders. More
details are here.
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
cited_by_count
country_code
grants_count
default.search
This works the same as using the search parameter for Funders.
description.search
Returns: funders with a description containing the given string; see the search
page for details.
display_name.search
Returns: funders with a display_name containing the given string; see the search
page for details.
is_global_south
You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.
display_name.search display_name
description.search description
You can also use the filter default.search , which works the same as using the
search parameter.
Autocomplete funders
You can autocomplete funders to create a very fast type-ahead style search
function:
This returns a list of funders with the funder location set as the hint:
{
"results": [
{
"id": "https://openalex.org/F4320306076",
"display_name": "National Science Foundation",
"hint": null,
"cited_by_count": 6705777,
"works_count": 264303,
"entity_type": "funder",
"external_id": "https://ror.org/021nxhr62"
},
...
]
}
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
continent
country_code
grants_count
is_global_south
summary_stats.2yr_mean_citedness
summary_stats.h_index
summary_stats.i10_index
works_count
Geo
Where things are in the world
While geo is not a core entity within OpenAlex, geography is central to categorizing
scholarly data. That's why OpenAlex uses United Nations data to divide the globe
into continents and regions, which makes filtering data easier.
Here are some ways you can filter and group by continents and the Global South.
Get works where at least one author's institution is located in the Global South
https://api.openalex.org/works?filter=institutions.is_global_south:true
What's next
Learn more about what you can do with geo:
Continents
Regions
Continents
Countries are mapped to continents using data from the United Nations Statistics
Division. You can see the actual mapping used by the API here.
Filter by continent
Endpoint Format
Authors /authors?filter=last_known_institution.continent:<continent>
Institutions /institutions?filter=continent:<continent>
Works /works?filter=institutions.continent:<continent>
Group by continent
Response:
{
key: "Q46",
key_display_name: "Europe",
count: 41382
},
{
key: "Q49",
key_display_name: "North America",
count: 37458
},
{
key: "Q48",
key_display_name: "Asia",
count: 20432
}...
Endpoint Format
Authors /authors?group-by=last_known_institution.continent
Institutions /institutions?group-by=continent
Works /works?group-by=institutions.continent
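Group-by rows like the ones in the response above are easy to reshape for reporting. The sketch below works on the list of rows directly; where that list sits in the full response (presumably under the group_by key seen in list responses) is an assumption, as are the function names.

```python
def counts_by_display_name(groups):
    """Turn group-by rows into a {display name: count} mapping."""
    return {group["key_display_name"]: group["count"] for group in groups}


def shares(groups):
    """Express each group's count as a fraction of the returned groups' total."""
    total = sum(group["count"] for group in groups)
    if total == 0:
        return {}
    return {group["key_display_name"]: group["count"] / total for group in groups}


# Rows copied from the sample response above.
sample_groups = [
    {"key": "Q46", "key_display_name": "Europe", "count": 41382},
    {"key": "Q49", "key_display_name": "North America", "count": 37458},
    {"key": "Q48", "key_display_name": "Asia", "count": 20432},
]
by_name = counts_by_display_name(sample_groups)
```

Note that the shares are relative to the rows passed in, so a truncated response yields shares of the truncated total only.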
Regions
Global South
The Global South is a term used to identify regions within Latin America, Asia,
Africa, and Oceania. Our source for this group of countries is the United Nations
Finance Center for South-South Cooperation.
You can filter Global South countries by using the boolean filter is_global_south
in the following endpoints:
Endpoint Format
Authors /authors?filter=last_known_institution.is_global_south:<boolean>
Institutions /institutions?filter=is_global_south:<boolean>
Works /works?filter=institutions.is_global_south:<boolean>
Endpoint Format
Authors /authors?group-by=last_known_institution.is_global_south
Institutions /institutions?group-by=is_global_south
Works /works?group-by=institutions.is_global_south
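The two tables above can be folded into a small URL builder. This is a sketch under the assumption that the field prefixes are exactly those listed in the tables; the function and constant names are hypothetical.

```python
from urllib.parse import urlencode

# Prefix in front of is_global_south for each endpoint, per the tables above.
FIELD_PREFIX = {
    "authors": "last_known_institution.",
    "institutions": "",
    "works": "institutions.",
}


def global_south_url(entity, is_global_south=True, group_by=None):
    """Build a filter URL for the Global South flag, optionally grouped."""
    field = FIELD_PREFIX[entity] + "is_global_south"
    params = {"filter": f"{field}:{str(is_global_south).lower()}"}
    if group_by:
        params["group-by"] = group_by
    # Keep ':' unescaped so the URL matches the style used in these docs.
    return f"https://api.openalex.org/{entity}?" + urlencode(params, safe=":")


url = global_south_url("authors", group_by="last_known_institution.country_code")
```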
Get number of authors with last known institution in the Global South, by country
https://api.openalex.org/authors?filter=last_known_institution.is_global_south:true&group-by=last_known_institution.country_code
Response:
Concepts
Concepts are abstract ideas that works are about. OpenAlex indexes about 65k
concepts.
The Canonical External ID for OpenAlex concepts is the Wikidata ID, and each of
our concepts has one, because all OpenAlex concepts are also Wikidata concepts.
Concepts are hierarchical, like a tree. There are 19 root-level concepts, and six
layers of descendants branching out from them, containing about 65 thousand
concepts all told. This concept tree is a modified version of the one created by
MAG.
You can view all the concepts and their position in the tree as a spreadsheet here.
About 85% of works are tagged with at least one concept (here's the breakdown of
concept counts per work).
Concepts are linked to works via the concepts property, and to other concepts via
the ancestors and related_concepts properties.
What's next
Learn more about what you can do with concepts:
These are the fields in a concept object. When you use the API to get a single
concept or lists of concepts, this is what's returned.
ancestors
List: List of concepts that this concept descends from, as dehydrated Concept
objects. See the concept tree section for more details on how the different layers of
concepts work together.
ancestors: [
{
id: "https://openalex.org/C2522767166",
wikidata: "https://www.wikidata.org/wiki/Q2374463",
display_name: "Data science",
level: 1
},
{
id: "https://openalex.org/C161191863",
wikidata: "https://www.wikidata.org/wiki/Q199655",
display_name: "Library science",
level: 1
},
// and so forth
]
cited_by_count
Integer: The number of citations to works that have been tagged with this concept. Or
less formally: the number of citations to this concept.
For example, if there are just two works tagged with this concept and one of them
has been cited 10 times, and the other has been cited 1 time, cited_by_count for
this concept would be 11 .
cited_by_count: 20248
counts_by_year
List: The values of works_count and cited_by_count for each of the last ten
years, binned by year. To put it another way: for every listed year, you can see how
many new works were tagged with this concept, and how many times any work
tagged with this concept got cited.
Years with zero citations and zero works have been removed so you will need to
add those back in if you need them.
counts_by_year: [
{
year: 2021,
works_count: 4211,
cited_by_count: 120939
},
{
year: 2020,
works_count: 4363,
cited_by_count: 119531
},
// and so forth
]
created_date
String: The date this Concept object was created in the OpenAlex dataset,
expressed as an ISO 8601 date string.
created_date: "2017-08-08"
description
display_name
display_name: "Altmetrics"
id
id: "https://openalex.org/C2778407487"
ids
Object: All the external identifiers that we know about for this concept. IDs are
expressed as URIs whenever possible. Possible ID types:
Many concepts are missing one or more ID types (either because we don't know the
ID, or because it was never assigned). Keys for null IDs are not displayed.
ids: {
openalex: "https://openalex.org/C2778407487",
wikidata: "https://www.wikidata.org/wiki/Q14565201",
wikipedia: "https://en.wikipedia.org/wiki/Altmetrics",
mag: 2778407487
}
image_thumbnail_url
image_thumbnail_url: "https://upload.wikimedia.org/wikipedia/commons/thum
image_url
String: URL where you can get an image representing this concept, where available.
Usually this is hosted on Wikipedia.
image_url: "https://upload.wikimedia.org/wikipedia/commons/f/f1/Altmetric
international
Object: This concept's display name in many languages, derived from article titles
on each language's Wikipedia. See the Wikidata entry for "Java Bytecode" for
example source data.
display_name (Object)
key (String): language code in wikidata language code format. Full list of
languages is here.
value (String): display_name in the given language
international: {
display_name: {
ca: "Altmetrics",
...
}
}
level
Integer: The level in the concept tree where this concept lives. Lower-level
concepts are more general, and higher-level concepts are more specific. Computer
Science has a level of 0; Java Bytecode has a level of 5. Level 0 concepts have no
ancestors and level 5 concepts have no descendants.
level: 2
related_concepts
List: Concepts that are similar to this one. Each listed concept is a dehydrated
Concept object, with one additional attribute:
score (Float): The strength of association between this concept and the listed
concept, on a scale of 0-100.
related_concepts: [
{
id: "https://openalex.org/C2778793908",
wikidata: null,
display_name: "Citation impact",
level: 3,
score: 4.56749
},
{
id: "https://openalex.org/C2779455604",
wikidata: null,
display_name: "Impact factor",
level: 2,
score: 4.46396
}
// and so forth
]
summary_stats
2yr_mean_citedness Float: The 2-year mean citedness for this concept. Also
known as impact factor. We use the year prior to the current year for the
citations (the numerator) and the two years prior to that for the citation-receiving
publications (the denominator).
h_index Integer: The h-index for this concept.
i10_index Integer: The i-10 index for this concept.
While the h-index and the i-10 index are normally author-level metrics and the 2-
year mean citedness is normally a journal-level metric, they can be calculated for
any set of papers, so we include them for concepts.
summary_stats: {
2yr_mean_citedness: 1.5295340589458237,
h_index: 105,
i10_index: 5045
}
updated_date
String: The last time anything in this concept object changed, expressed as an ISO
8601 date string. This date is updated for any change at all, including increases in
various counts.
updated_date: "2021-12-25T14:04:30.578837"
wikidata
String: The Wikidata ID for this concept. This is the Canonical External ID for
concepts.
All OpenAlex concepts have a Wikidata ID, because all OpenAlex concepts are also
Wikidata concepts.
wikidata: "https://www.wikidata.org/wiki/Q14565201"
works_api_url
String: A URL that will get you a list of all the works tagged with this concept.
We express this as an API URL (instead of just listing the works themselves)
because there might be millions of works tagged with this concept, and that's too
many to fit here.
works_api_url: "https://api.openalex.org/works?filter=concept.id:C27784074
works_count
display_name
id
level
wikidata
Get a single concept
These are the original OpenAlex Concepts, which are being deprecated in favor of
Topics. We will continue to provide these Concepts for Works, but we will not be
actively maintaining, updating, or providing support for these concepts. Unless you
have a good reason to be relying on them, we encourage you to look into Topics
instead.
It's easy to get a concept from the API with: /concepts/<entity_id> . Here's an
example:
That will return a Concept object, describing everything OpenAlex knows about
the concept with that ID:
{
"id": "https://openalex.org/C71924100",
"wikidata": "https://www.wikidata.org/wiki/Q11190",
"display_name": "Medicine",
"level": 0,
"description": "field of study for diagnosing, treating and preventin
// other fields removed for brevity
}
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
External IDs
You can look up concepts using external IDs such as a Wikidata ID:
External ID URN
Wikidata wikidata
Select fields
You can use select to limit the fields that are returned in a concept object. More
details are here.
{
"meta": {
"count": 65073,
"db_response_time_ms": 26,
"page": 1,
"per_page": 25
},
"results": [
{
"id": "https://openalex.org/C41008148",
"wikidata": "https://www.wikidata.org/wiki/Q21198",
"display_name": "Computer science",
// more fields (removed to save space)
},
{
"id": "https://openalex.org/C71924100",
"wikidata": "https://www.wikidata.org/wiki/Q11190",
"display_name": "Medicine",
// more fields (removed to save space)
},
// more results (removed to save space)
],
"group_by": []
}
Page and sort concepts
By default we return 25 results per page. You can change this default and page
through concepts with the per-page and page parameters:
Get the second page of concepts results, with 50 results returned per page
https://api.openalex.org/concepts?per-page=50&page=2
Continue on to learn how you can filter and search lists of concepts.
Sample concepts
You can use sample to get a random batch of concepts. Read more about
sampling and how to add a seed value here.
Select fields
You can use select to limit the fields that are returned in a list of concepts. More
details are here.
It's best to read about filters before trying these out. It will show you how to combine
filters and build an AND, OR, or negation query.
ancestors.id
cited_by_count
default.search
This works the same as using the search parameter for Concepts.
display_name.search
Returns: concepts with a display_name containing the given string; see the search
page for details.
In most cases, you should use the search parameter instead of this filter because it
uses a better search algorithm.
has_wikidata
Returns: concepts that have or lack a Wikidata ID, depending on the given value.
For now, all concepts in OpenAlex do have Wikidata IDs.
The best way to search for concepts is to use the search query parameter, which
searches the display_name and description fields. Example:
You can read more about search here. It will show you how relevance score is
calculated, how words are stemmed to improve search results, and how to do complex
boolean searches.
display_name.search display_name
You can also use the filter default.search , which works the same as using the
search parameter.
Autocomplete concepts
You can autocomplete concepts to create a very fast type-ahead style search
function:
This returns a list of concepts with the description set as the hint:
{
"results": [
{
"id": "https://openalex.org/C41008148",
"display_name": "Computer science",
"hint": "theoretical study of the formal foundation enabling the
"cited_by_count": 392939277,
"works_count": 76722605,
"entity_type": "concept",
"external_id": "https://www.wikidata.org/wiki/Q21198"
},
...
]
}
It's best to read about group by before trying these out. It will show you how results
are formatted, the number of results returned, and how to sort results.
cited_by_count
has_wikidata
level
summary_stats.2yr_mean_citedness
summary_stats.h_index
summary_stats.i10_index
works_count
Aboutness endpoint (/text)
You can use the /text API endpoint to tag your own free text with OpenAlex's
"aboutness" assignments—topics, keywords, and concepts.
Accepts a title and optional abstract in the GET params or as a POST request.
The results are straight from the model, with 0 values truncated.
Examples
Queries are limited to between 20 and 2000 characters. The endpoints are rate
limited to 1 request per second and 1,000 requests per day.
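A client can check the length limit before spending one of its daily requests. The sketch below makes two assumptions that the text does not confirm: that the GET parameters are named title and abstract, and that the 20 to 2000 character limit applies to the title and abstract combined.

```python
from urllib.parse import urlencode

MIN_CHARS = 20
MAX_CHARS = 2000


def text_url(title, abstract=None):
    """Build a /text request URL, rejecting input outside the documented length range."""
    length = len(title) + len(abstract or "")  # assumption: limit is on the combined text
    if not MIN_CHARS <= length <= MAX_CHARS:
        raise ValueError(f"text is {length} characters; expected {MIN_CHARS}-{MAX_CHARS}")
    params = {"title": title}  # assumed parameter name
    if abstract:
        params["abstract"] = abstract  # assumed parameter name
    return "https://api.openalex.org/text?" + urlencode(params)


# Hypothetical title, used only for illustration.
url = text_url("type 1 diabetes research for children")
```

To stay within the rate limit, a caller could also pause for at least one second between consecutive requests.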
How to use the API
API Overview
The API is the primary way to get OpenAlex data. It's free and requires no
authentication. The daily limit for API calls is 100,000 requests per user per day. For
best performance, add your email to all API requests, like
mailto=you@example.com .
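As a sketch, an email address can be appended to any request URL with the standard library. The mailto parameter name is how OpenAlex identifies callers by email; the helper name and the example address are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_mailto(url, email):
    """Return `url` with a single mailto=<email> query parameter appended."""
    parts = urlsplit(url)
    # Drop any existing mailto so repeated calls do not duplicate it.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "mailto"]
    query.append(("mailto", email))
    # Keep ':', '@' and '|' readable, since filter syntax uses them.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":@|")))


tagged = with_mailto("https://api.openalex.org/funders?per-page=50&page=2", "you@example.com")
```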
Client Libraries
There are several third-party libraries you can use to get data from OpenAlex:
openalexR (R)
KtAlex (Kotlin)
PyAlex (Python)
diophila (Python)
OpenAlexAPI (Python)
If you're looking for a visual interface, you can also check out the free VOSviewer,
which lets you make network visualizations based on OpenAlex data.
Get single entities
Get a single entity, based on an ID
This is a more detailed guide to single entities in OpenAlex. If you're just getting
started, check out get a single work.
It's easy to get a singleton entity object from the API:
/<entity_name>/<entity_id>. Here's an example:
That will return a Work object, describing everything OpenAlex knows about the
work with that ID. You can use IDs other than OpenAlex IDs, and you can also
format the IDs in different ways. Read below to learn more.
You can make up to 50 of these queries at once by requesting a list of entities and
filtering on IDs using OR syntax.
To get a single entity, you need a single unambiguous identifier, like an ORCID or an
OpenAlex ID. If you've got an ambiguous identifier (like an author's name), you'll want
to search instead.
The OpenAlex ID
The OpenAlex ID is the primary key for all entities. It's a URL shaped like this:
https://openalex.org/<OpenAlex_key> . Here's a real-world example:
https://openalex.org/W2741809807
The key starts with a letter; that letter tells you what kind of entity you've got:
W(ork), A(uthor), S(ource), I(nstitution), C(oncept), P(ublisher), or F(under). The IDs
are not case-sensitive, so w2741809807 is just as valid as W2741809807 . So in the
example above, the Key is W2741809807 , and the W at the front tells us that this is
a Work .
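The key rules above can be checked in a few lines of code. The following is a minimal sketch, not an official helper: the function name `parse_openalex_id` is hypothetical, and the pattern "one letter followed by digits" is an assumption based on the examples shown here.

```python
import re

# Prefix letters listed in the text above, mapped to entity types.
ENTITY_TYPES = {
    "W": "work", "A": "author", "S": "source", "I": "institution",
    "C": "concept", "P": "publisher", "F": "funder",
}


def parse_openalex_id(value: str) -> tuple[str, str]:
    """Return (entity_type, normalized_key) for a full URL or a bare key."""
    # Take the last path segment and uppercase it, since IDs are not case-sensitive.
    key = value.strip().rstrip("/").rsplit("/", 1)[-1].upper()
    # Assumption: a key is exactly one known prefix letter followed by digits.
    if not re.fullmatch(r"[A-Z]\d+", key) or key[0] not in ENTITY_TYPES:
        raise ValueError(f"not an OpenAlex key: {value!r}")
    return ENTITY_TYPES[key[0]], key
```

Normalizing to uppercase before storing keys avoids treating w2741809807 and W2741809807 as two different records.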
If you request an Entity using its OpenAlex ID, and that Entity has been merged into
another Entity, you will be redirected to the Entity it has been merged into. For
example, https://openalex.org/A5092938886 has been merged into
https://openalex.org/A5006060960, so in the API the former will redirect to the
latter:
$ curl -i https://api.openalex.org/authors/A5092938886
HTTP/1.1 301 MOVED PERMANENTLY
Location: https://api.openalex.org/authors/A5006060960
Most clients will handle this transparently; you'll get the data for author
A5006060960 without knowing the redirect even happened. If you have stored
Entity ID lists and do notice the redirect, you might as well replace the merged-
away ID to skip the redirect next time.
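If you do keep stored ID lists, the replacement step might look like the sketch below. It assumes you have already collected a mapping of merged-away IDs to their targets (for example, from the redirects you observed); the function name `apply_redirects` and the mapping itself are hypothetical.

```python
def apply_redirects(stored_ids, redirects):
    """Replace merged-away IDs using a {old_id: new_id} map, dropping duplicates."""
    seen = set()
    updated = []
    for entity_id in stored_ids:
        current = entity_id
        # Follow chains (A -> B -> C), guarding against accidental loops.
        hops = set()
        while current in redirects and current not in hops:
            hops.add(current)
            current = redirects[current]
        # Two stored IDs may now point at the same entity; keep only one.
        if current not in seen:
            seen.add(current)
            updated.append(current)
    return updated
```

Deduplicating matters here because a list that held both the merged-away ID and its target would otherwise contain the same entity twice.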
Supported IDs
For each entity type, you can retrieve the entity using any of the external IDs we
support--not just the native OpenAlex IDs. So for example:
This works with DOIs, ISSNs, ORCIDs, and lots of other IDs...in fact, you can use
any ID listed in an entity's ids property, as listed below:
Work.ids
Author.ids
Source.ids
Institution.ids
Concept.ids
Publisher.ids
ID formats
Most of the external IDs OpenAlex supports are canonically expressed as URLs...for
example, the canonical form of a DOI always starts with https://doi.org/ . You
can always use these URL-style IDs in the entity endpoints. Examples:
For simplicity and clarity, you may also want to express those IDs in a simpler, URN-
style format, and that's supported as well; you just write the namespace of the ID,
followed by the ID itself. Here are the same examples from above, but in the
namespace:id format:
Finally, if you're using an OpenAlex ID, you can be even more succinct, and just use
the Key part of the ID all by itself, the part that looks like w1234567 :
Works: DOI
Authors: ORCID
Sources: ISSN-L
Institutions: ROR ID
Concepts: Wikidata ID
Publishers: Wikidata ID
Dehydrated entity objects
The full entity objects can get pretty unwieldy, especially when you're embedding a
list of them in another object (for instance, a list of Concept s in a Work ). For these
cases, all the entities except Work s have a dehydrated version. This is a stripped-
down representation of the entity that carries only its most essential properties.
These properties are documented individually on their respective entity pages.
Random result
You can get a random result by using the string random where an ID would
normally go. OMG that's so random! Each time you call this URL you'll get a
different entity. Examples:
{
id: "https://openalex.org/W2138270253",
display_name: "DNA sequencing with chain-terminating inhibitors"
}
This query returns a meta object with details about the query, a results list of
Topic objects, and an empty group_by list:
meta: {
count: 4516,
db_response_time_ms: 81,
page: 1,
per_page: 25
},
results: [
// long list of Topic entities
],
group_by: [] // empty
Listing entities is a lot more useful when you add parameters to page, filter, search,
and sort them. Keep reading to learn how to do that.
Paging
You can see executable examples of paging in this user-contributed Jupyter notebook!
Basic paging
Use the page query parameter to control which page of results you want (e.g.
page=1 , page=2 , etc.). By default there are 25 results per page; you can use the
per-page parameter to change that to any number between 1 and 200.
Basic paging only works to get the first 10,000 results of any list. If you want to see
more than 10,000 results, you'll need to use cursor paging.
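To make the limits concrete, here is a small sketch that generates the page URLs reachable with basic paging. The function name `basic_page_urls` is hypothetical, and it assumes `base_url` has no query string of its own.

```python
from urllib.parse import urlencode

MAX_PER_PAGE = 200          # upper bound for per-page, from the text above
MAX_BASIC_RESULTS = 10_000  # basic paging stops here


def basic_page_urls(base_url: str, total_count: int, per_page: int = 25):
    """Yield one URL per page, stopping at the 10,000-result ceiling."""
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError("per_page must be between 1 and 200")
    reachable = min(total_count, MAX_BASIC_RESULTS)
    pages = -(-reachable // per_page)  # ceiling division
    for page in range(1, pages + 1):
        yield f"{base_url}?{urlencode({'page': page, 'per-page': per_page})}"
```

With `per_page=200`, a list of any size yields at most 50 URLs; anything beyond that needs cursor paging.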
Cursor paging
Cursor paging is a bit more complicated than basic paging, but it allows you to
access as many records as you like.
To use cursor paging, you request a cursor by adding the cursor=* parameter-
value pair to your query.
The response to your query will include a next_cursor value in the response's
meta object. Here's what it looks like:
{
"meta": {
"count": 8695857,
"db_response_time_ms": 28,
"page": null,
"per_page": 100,
"next_cursor": "IlsxNjA5MzcyODAwMDAwLCAnaHR0cHM6Ly9vcGVuYWxleC5vcmcvV
},
"results" : [
// the first page of results
]
}
To retrieve the next page of results, copy the meta.next_cursor value into the
cursor field of your next request.
This second page of results will have a new value for meta.next_cursor . You'll use
this new value the same way you did the first, and it'll give you the third page of
results. To get all the results, keep repeating this process until meta.next_cursor
is null and the results set is empty.
Besides using cursor paging to get entities, you can also use it in group_by
queries.
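The cursor loop described above can be sketched as a generator. To keep the example independent of any HTTP library, it takes a `fetch_page` callable that you supply; that callable (and the name `iter_cursor_pages`) is hypothetical and is assumed to return the parsed JSON response for a given cursor value.

```python
from typing import Callable, Iterator


def iter_cursor_pages(fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every result, following meta.next_cursor until it is null."""
    cursor = "*"  # the initial cursor value described above
    while cursor is not None:
        page = fetch_page(cursor)
        results = page.get("results", [])
        if not results:
            break  # an empty results list means there is nothing left
        yield from results
        cursor = page["meta"].get("next_cursor")
```

In real use, `fetch_page` would issue a GET request with `cursor=<value>` added to your query and return the decoded JSON body.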
Don't use cursor paging to download the entire dataset:
It's bad for you because it will take many days to page through a long list
like /works or /authors.
It's bad for us (and other users!) because it puts a massive load on our
servers.
Instead, download everything at once, using the OpenAlex snapshot. It's free, easy,
fast, and you get all the results in the same format you'd get from the API.
Filter entity lists
Filters narrow the list down to just entities that meet a particular condition--
specifically, a particular value for a particular attribute.
A list of filters is set using the filter parameter, formatted like this:
filter=attribute:value,attribute2:value2 . Examples:
Logical expressions
Inequality
For numerical filters, use the less-than ( < ) and greater-than ( > ) symbols to filter
by inequalities. Example:
Some attributes have special filters that act as syntactic sugar around commonly-
expressed inequalities: for example, the from_publication_date filter on works .
See the endpoint-specific documentation below for more information. Example:
Negation (NOT)
You can negate any filter, numerical or otherwise, by prepending the exclamation
mark symbol ( ! ) to the filter value. Example:
Intersection (AND)
By default, the returned result set includes only records that satisfy all the supplied
filters. In other words, filters are combined as an AND query. Example:
Get all works that have been cited more than once and are free to read:
https://api.openalex.org/works?filter=cited_by_count:>1,is_oa:true
To create an AND query within a single attribute, you can either repeat a filter, or
use the plus symbol ( + ):
Get all the works that have an author from France and an author from the UK:
Using repeating filters:
https://api.openalex.org/works?filter=institutions.country_code:fr,institutions.country_code:gb
Note that the plus symbol ( + ) syntax will not work for search filters, boolean
filters, or numeric filters.
Addition (OR)
Use the pipe symbol ( | ) to input lists of values such that any of the values can be
satisfied--in other words, when you separate filter values with a pipe, they'll be
combined as an OR query. Example:
Get all the works that have an author from France or an author from the UK:
https://api.openalex.org/works?filter=institutions.country_code:fr|gb
This is particularly useful when you want to retrieve many records by ID all at
once. Instead of making a whole bunch of singleton calls in a loop, you can make
one call, like this:
You can combine up to 100 values for a given filter in this way. You will also need to
use the parameter per-page=100 to get all of the results per query. See our blog
post for a tutorial.
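As a rough illustration of batching, the sketch below splits a long list of values into groups of at most 100 and builds one list URL per group. The function name `batched_filter_urls` is hypothetical, and using `doi` as the filter attribute in the usage note is an assumption; check the available filters for the entity you are querying.

```python
from urllib.parse import quote

MAX_OR_VALUES = 100  # per-filter limit stated above


def batched_filter_urls(base_url, attribute, values, batch_size=MAX_OR_VALUES):
    """Build one list URL per batch of values joined with the pipe (OR) symbol."""
    if not 1 <= batch_size <= MAX_OR_VALUES:
        raise ValueError("batch_size must be between 1 and 100")
    urls = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        # Keep ':', '|' and '/' readable; percent-encode anything else unsafe.
        filter_value = quote(f"{attribute}:{'|'.join(batch)}", safe=":|/")
        # per-page matches the batch size so each response holds the whole batch.
        urls.append(f"{base_url}?filter={filter_value}&per-page={batch_size}")
    return urls
```

For example, `batched_filter_urls("https://api.openalex.org/works", "doi", dois)` turns 250 DOIs into three requests instead of 250 singleton calls.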
You can use OR for values within a given filter, but not between different filters. So
this, for example, doesn't work and will return an error:
Get either French works or ones published in the journal with ISSN
0957-1558:
https://api.openalex.org/works?filter=institutions.country_code:fr|primary_location.source.issn:0957-1558
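The operators in this section compose mechanically, so a small helper can assemble a filter string from a dictionary. This is a sketch under simple assumptions: the name `build_filter` is hypothetical, negation and inequality are written by the caller as part of the value (for example "!us" or ">1"), and because dictionary keys are unique it cannot express the repeated-filter form of AND.

```python
def build_filter(conditions: dict) -> str:
    """Join attribute/value pairs into a filter string.

    A list value becomes an OR (pipe) within that attribute; separate
    attributes are combined with commas, which the API treats as AND.
    """
    parts = []
    for attribute, value in conditions.items():
        if isinstance(value, (list, tuple)):
            if len(value) > 100:
                raise ValueError("at most 100 OR values per filter")
            value = "|".join(str(v) for v in value)
        elif isinstance(value, bool):
            # Boolean filters are written in lowercase, as in is_oa:true.
            value = "true" if value else "false"
        parts.append(f"{attribute}:{value}")
    return ",".join(parts)
```

Because OR only exists inside a single attribute's value list, this structure cannot produce the invalid "OR between different filters" query shown above.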
Available Filters
The filters for each entity can be found here:
Works
Authors
Sources
Institutions
Concepts
Publishers
Funders
Search entities
The search parameter
The search query parameter finds results that match a given text search.
Example:
Get works with search term "dna" in the title, abstract, or fulltext:
https://api.openalex.org/works?search=dna
When you search works , the API looks for matches in titles, abstracts, and fulltext.
When you search concepts , we look in each concept's display_name and
description fields. When you search sources , we look at the display_name ,
alternate_titles , and abbreviated_title fields. When you search authors ,
we look at the display_name and display_name_alternatives fields. When you
search institutions , we look at the display_name , display_name_alternatives
, and display_name_acronyms fields.
For most text search we remove stop words and use stemming (specifically, the
Kstem token filter) to improve results. So words like "the" and "an" are
transparently removed, and a search for "possums" will also return records using
the word "possum." With the exception of raw affiliation strings, we do not search
within words but rather try to match whole words. So a search with "lun" will not
match the word "lunar".
https://api.openalex.org/works?filter=display_name.search.no_stem:surgery
https://api.openalex.org/works?filter=title.search.no_stem:surgery
https://api.openalex.org/works?filter=abstract.search.no_stem:surgery
https://api.openalex.org/works?filter=title_and_abstract.search.no_stem:surgery
Boolean searches
Including any of the words AND , OR , or NOT in any of your searches will enable
boolean search. Those words must be UPPERCASE. You can use this in all
searches, including using the search parameter, and using search filters.
This allows you to craft complex queries using those boolean operators along with
parentheses and quotation marks. Surrounding a phrase with quotation marks will
search for an exact match of that phrase, after stemming and stop-word removal
(be sure to use double quotation marks — " ). Using parentheses will specify
order of operations for the boolean operators. Words that are not separated by one
of the boolean operators will be interpreted as AND .
Behind the scenes, the boolean search is using Elasticsearch's query string query
on the searchable fields (such as title, abstract, and fulltext for works; see each
individual entity page for specifics about that entity). Wildcard and fuzzy searches
using * , ? or ~ are not allowed; these characters will be removed from any
searches. These searches, even when using quotation marks, will go through the
same cleaning as described above, including stemming and removal of stop
words.
Search for works that mention "elmo" and "sesame street," but not the words
"cookie" or "monster":
https://api.openalex.org/works?search=(elmo AND "sesame street") NOT (cookie OR monster)
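A browser will usually encode the spaces and quotation marks in a query like this for you, but a script typically has to do it explicitly. Here is one way to do that with the Python standard library; the helper name `search_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def search_url(base_url: str, query: str) -> str:
    """Percent-encode a boolean search string into a ?search= URL."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base_url}?{urlencode({'search': query}, quote_via=quote)}"


url = search_url(
    "https://api.openalex.org/works",
    '(elmo AND "sesame street") NOT (cookie OR monster)',
)
```

The operators stay UPPERCASE through encoding, which matters because lowercase and, or, and not are not treated as boolean operators.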
Relevance score
When you use search, each returned entity in the results lists gets an extra
property called relevance_score , and the list is by default sorted in descending
order of relevance_score . The relevance_score is based on text similarity to
your search term. It also includes a weighting term for citation counts: more highly-
cited entities score higher, all else being equal.
If you search for a multiple-word phrase, the algorithm will treat each word
separately, and rank results higher when the words appear close together. If you
want to return only results where the exact phrase is used, just enclose your phrase
within quotes. Example:
Get works with the exact phrase "fierce creatures" in the title or abstract
(returns just a few results):
https://api.openalex.org/works?search="fierce%20creatures"
Get works with the words "fierce" and "creatures" in the title or abstract, with
works that have the two words close together ranked higher by
relevance_score (returns way more results):
https://api.openalex.org/works?search=fierce%20creatures
Additionally, the filter default.search is available on all entities; this works the
same as the search parameter.
You might be tempted to use the search filter to power an autocomplete or typeahead.
Instead, we recommend you use the autocomplete endpoint, which is much faster.
👎 https://api.openalex.org/institutions?filter=display_name.search:florida
👍 https://api.openalex.org/autocomplete/institutions?q=Florida
Sort entity lists
Use the ?sort parameter to specify the property you want your list sorted by. You
can sort by these properties, where they exist:
display_name
cited_by_count
works_count
publication_date
By default, sort direction is ascending. You can reverse this by appending :desc to
the sort key like works_count:desc . You can sort by multiple properties by
providing multiple sort keys, separated by commas. Examples:
Display works with only the id , doi , and display_name returned in the
results
https://api.openalex.org/works?select=id,doi,display_name
"results": [
{
"id": "https://openalex.org/W1775749144",
"doi": "https://doi.org/10.1016/s0021-9258(19)52451-6",
"display_name": "PROTEIN MEASUREMENT WITH THE FOLIN PHENOL REAGENT"
},
{
"id": "https://openalex.org/W2100837269",
"doi": "https://doi.org/10.1038/227680a0",
"display_name": "Cleavage of Structural Proteins during the Assembly
},
// more results removed for brevity
]
Limitations
The fields you choose must exist within the entity (of course). You can only select
root-level fields.
"id": "https://openalex.org/W2138270253",
"open_access": {
"is_oa": true,
"oa_status": "bronze",
"oa_url": "http://www.pnas.org/content/74/12/5463.full.pdf"
}
You can choose to display id and open_access , but you will get an error if you try
to choose open_access.is_oa .
You can use select fields when getting lists of entities or a single entity. It does not
work with group-by or autocomplete.
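Since nested paths produce an error, it can be convenient to catch them before sending a request. A minimal sketch, with the hypothetical helper name `build_select`:

```python
def build_select(fields):
    """Return a select value, rejecting nested paths such as open_access.is_oa."""
    nested = [field for field in fields if "." in field]
    if nested:
        raise ValueError(f"select only accepts root-level fields: {nested}")
    return ",".join(fields)
```

To read a nested value, select its root-level parent (for example open_access) and pick out the property on the client side.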
Sample entity lists
You can use sample to get a random list of up to 10,000 results.
You can add a seed value in order to retrieve the same set of random records, in
the same order, multiple times.
Depending on your query, random results with a seed value may change over time due
to new records coming into OpenAlex.
Limitations
The sample size is limited to 10,000 results.
You must provide a seed value when paging beyond the first page of results.
Without a seed value, you might get duplicate records in your results.
You must use basic paging when sampling. Cursor pagination is not supported.
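These limitations can be encoded in a small URL builder. The sketch below assumes the query parameters are named `sample` and `seed`, matching the terms used above; the function name `sample_url` is hypothetical.

```python
from urllib.parse import urlencode


def sample_url(base_url, sample_size, seed=None, page=1, per_page=25):
    """Build a sampling URL that respects the limitations listed above."""
    if not 1 <= sample_size <= 10_000:
        raise ValueError("sample size must be between 1 and 10,000")
    if page > 1 and seed is None:
        raise ValueError("a seed is required when paging past the first page")
    # Basic paging only: no cursor parameter is ever added.
    params = {"sample": sample_size, "per-page": per_page, "page": page}
    if seed is not None:
        params["seed"] = seed
    return f"{base_url}?{urlencode(params)}"
```

Reusing the same seed for every page of one sample is what keeps the pages consistent with each other.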
Autocomplete entities
The autocomplete endpoint lets you add autocomplete or typeahead components
to your applications, without the overhead of hosting your own API endpoint.
Each endpoint takes a string, and (very quickly) returns a list of entities that match
that string.
A user looking for information on the flagship of Florida's state university system.
The autocomplete endpoint is very fast; queries generally return in around 200ms.
If you'd like to see it in action, we're using a slightly-modified version of this
endpoint in the OpenAlex website here: https://explore.openalex.org/
Request format
The format for requests is simple: /autocomplete/<entity_type>?q=<query>
meta : an object with information about the request, including timing and results
count
results: a list of up to ten results for the query, sorted by citation count. Each
result represents an entity that matched against the query.
{
meta: {
count: 183,
db_response_time_ms: 5,
page: 1,
per_page: 10
},
results: [
{
id: "https://openalex.org/I33213144",
display_name: "University of Florida",
hint: "Gainesville, USA",
cited_by_count: 17190001,
entity_type: "institution",
external_id: "https://ror.org/02y3ad647"
},
// more results...
]
}
The content of the hint property varies depending on what kind of entity you're
looking up:
Work: The work's authors' display names, concatenated. e.g. "R. Alexander
Pyron, John J. Wiens"
Author: The author's last known institution, e.g. "University of North Carolina at
Chapel Hill, USA"
Source : The host_organization , e.g. "Oxford University Press"
Institution : The institution's location, e.g. "Gainesville, USA"
Concept : The Concept's description, e.g. "the study of relation between plant
species and genera"
IDs in autocomplete
Canonical External IDs and OpenAlex IDs are detected within autocomplete queries
and matched to the appropriate record if it exists. For example:
The query
https://api.openalex.org/autocomplete?q=https://orcid.org/0000-0002-7436-3176
will search for the author with ORCID ID
https://orcid.org/0000-0002-7436-3176 and return 0 records if it does not
exist.
The query https://api.openalex.org/autocomplete/sources?q=S49861241 will
search for the source with OpenAlex ID https://openalex.org/S49861241 and
return 0 records if it does not exist.
https://api.openalex.org/autocomplete/works?filter=publication_year:2010&search=frogs&q=greenhou
Get groups of entities
Sometimes instead of just listing entities, you want to group them into facets, and
count how many entities are in each group. For example, maybe you want to count
the number of Works by open access status. To do that, you call the entity
endpoint, adding the group_by parameter. Example:
This returns a meta object with details about the query, and a group_by object
with the groups you've asked for:
{
meta: {
count: 246136992,
db_response_time_ms: 271,
page: 1,
per_page: 200,
groups_count: 15
},
group_by: [
{
key: "article",
key_display_name: "article",
count: 202814957
},
{
key: "book-chapter",
key_display_name: "book-chapter",
count: 21250659
},
{
key: "dissertation",
key_display_name: "dissertation",
count: 6055973
},
{
key: "book",
key_display_name: "book",
count: 5400871
},
...
]
}
So from this we can see that the majority of works (202,814,957 of them) are type
article , with another 21,250,659 book-chapter , and so forth.
You can group by most of the same properties that you can filter by, and you can
combine grouping with filtering.
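Turning the raw counts into proportions is a common next step. The sketch below works on an already-parsed response shaped like the one above; the function name `group_shares` is hypothetical, and it assumes meta.count is the total number of entities matched by the query.

```python
def group_shares(response: dict) -> list[tuple[str, float]]:
    """Return (key_display_name, fraction of meta.count) for each group."""
    total = response["meta"]["count"]
    return [
        (group["key_display_name"], group["count"] / total)
        for group in response["group_by"]
    ]
```

Applied to the example response, article comes out at roughly 82% of all works. The fractions need not sum to 1 when some groups are missing from the page.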
Group properties
Each group object in the group_by list contains three properties:
key
Value: a string; the OpenAlex ID or raw value of the group_by parameter for
members of this group. See details on key and key_display_name .
key_display_name
Value: a string; the display_name or raw value of the group_by parameter for
members of this group. See details on key and key_display_name .
count
"Unknown" groups
The "unknown" group is hidden by default. If you want to include this group in the
response, add :include_unknown after the group-by parameter.
Otherwise, key is the same as key_display_name ; both are the raw value of the
group_by parameter for this group.
Due to a technical limitation, we can only report the number of groups in the current
page, and not the total number of groups.
Paging
The maximum number of groups returned is 200. If you want to get more than 200
groups, you can use cursor pagination. This works the same as it does when
getting lists of entities, so head over to the section on paging through lists of results
to learn how.
Due to technical constraints, when paging, results are sorted by key, rather than by
count.
Rate limits and authentication
The API is rate-limited. The limits are:
If you hit the API more than 100k times in a day or more than 10 in a second, you'll
get 429 errors instead of useful data.
Are those rate limits too low for you? No problem! We can raise those limits as high
as you need if you subscribe to our Premium plan. And if you're an academic
researcher we can likely do it for free; just drop us a line at [email protected].
Are you scrolling through a list of entities, calling the API for each? You can go way
faster by squishing 50 requests into one using our OR syntax. Here's a tutorial
showing how.
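One simple way to stay under the per-second limit is to space requests evenly on the client side. The class below is a sketch of that idea, not an official client feature; the name `Throttle` is hypothetical, and the clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time


class Throttle:
    """Space calls at least 1/per_second seconds apart."""

    def __init__(self, per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

Call `throttle.wait()` immediately before each request. This only handles the per-second limit; a script should still treat a 429 response as a signal to pause, and the daily limit needs its own counter.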
Authentication
The OpenAlex API doesn't require authentication. However, it is helpful for us to
know who's behind each API call, for two reasons:
To get into the polite pool, you just have to give us an email where we can contact
you. You can give us this email in one of two ways:
Usage tips
Calling the API in your browser
Because the API is all GET requests without fancy authentication, you can view any
request in your browser. This is a very useful and pleasant way to explore the API
and debug scripts; we use it all the time.
However, this is much nicer if you install an extension to pretty-print the JSON;
JSONVue (Chrome) and JSONView (Firefox) are popular, free choices. Here's what
an API response looks like with one of these extensions enabled:
A lot prettier than cURL
Download all data
OpenAlex snapshot
For most use cases, the REST API is your best option. However, you can also
download (instructions here) and install a complete copy of the OpenAlex database
on your own server, using the database snapshot. The snapshot consists of seven
files (split into smaller files for convenience), with one file for each of our seven
entity types. The files are in the JSON Lines format; each line is a JSON object,
exactly the same as you'd get from our API. The properties of these JSON objects
are documented in each entity's object section (for example, the Work object).
The snapshot is updated about once per month; you can read release notes for
each new update here.
If you've worked with a dataset like this before, the snapshot data format may be all
you need to get going. If not, read on.
The rest of this guide will tell you how to (a) download the snapshot and (b) upload
it to your own database. We’ll cover two general approaches:
Load the intact OpenAlex records to a data warehouse (we’ll use BigQuery as an
example) and use native JSON functions to query the Work, Author, Source,
Institution, Concept, and Publisher objects directly.
Flatten the records into a normalized schema in a relational database (we’ll use
PostgreSQL) while preserving the relationships between objects.
We'll assume you're initializing a fresh snapshot. To keep it up to date, you'll have
to take the information from Downloading updated Entities and generalize from the
steps in the guide.
This is hard. Working with such a big and complicated dataset hardly ever goes
according to plan. If it gets scary, try the REST API. In fact, try the REST API first. It can
answer most of your questions and has a much lower barrier to entry.
There’s more than one way to do everything. We’ve tried to pick one reasonable
default way to do each step, so if something doesn’t work in your environment or
with the tools you have available, let us know.
Up next: the snapshot data format, downloading the data and getting it into your
database.
Snapshot data format
Here are the details on where the OpenAlex data lives and how it's structured.
There are multiple objects under each updated_date partition. Each is under
2GB.
The manifest file is JSON (in redshift manifest format) and lists all the data files
for each object type - /data/works/manifest lists all the works.
The gzip-compressed snapshot takes up about 330 GB and decompresses to
about 1.6 TB.
The structure of each entity type is documented here: Work, Author, Source,
Institution, Concept, and Publisher.
We have recently added folders for new entities topics , fields , subfields , and
domains , and we will be adding others soon. This documentation will soon be
updated to reflect these changes.
This is a screenshot showing the "leaf" nodes of one entity type, updated date
folder. You can also click around the browser links above to get a sense of the
snapshot's structure.
Downloading updated Entities
Once you have a copy of the snapshot, you'll probably want to keep it up to date.
The updated_date partitions make this easy, but the way they work may be
unfamiliar. Unlike a set of dated snapshots that each contain the full dataset as of a
certain date, each partition contains the records that last changed on that date.
/data/authors/
├── manifest
└── updated_date=2021-12-30 [1000 Authors]
├── 0000_part_00.gz
...
└── 0031_part_00.gz
If, on 2022-01-04, we made changes to 50 of those Authors , they would come out
of one of the files in /data/authors/updated_date=2021-12-30 and go into one in
/data/authors/updated_date=2022-01-04:
/data/authors/
├── manifest
├── updated_date=2021-12-30 [950 Authors]
│ ├── 0000_part_00.gz
│ ...
│ └── 0031_part_00.gz
└── updated_date=2022-01-04 [50 Authors]
├── 0000_part_00.gz
...
└── 0031_part_00.gz
If we also discovered 50 new Authors, they would go in that same partition, so the
totals would look like this:
/data/authors/
├── manifest
├── updated_date=2021-12-30 [950 Authors]
│ ├── 0000_part_00.gz
│ ...
│ └── 0031_part_00.gz
└── updated_date=2022-01-04 [100 Authors]
├── 0000_part_00.gz
...
└── 0031_part_00.gz
So if you made your copy of the snapshot on 2021-12-30, you would only need to
download /data/authors/updated_date=2022-01-04 to get everything that was
changed or added since then.
To update a snapshot copy that you created or updated on date X , insert or update
the records in objects where updated_date > X .
You never need to go back for a partition you've already downloaded. Anything that
changed isn't there anymore, it's in a new partition.
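The rule "fetch objects where updated_date > X" can be applied directly to a list of object URLs. This is a minimal sketch: the function name `partitions_to_fetch` is hypothetical, and it assumes each URL contains an `updated_date=YYYY-MM-DD` path segment as in the listings above.

```python
import re
from datetime import date

PARTITION = re.compile(r"updated_date=(\d{4}-\d{2}-\d{2})")


def partitions_to_fetch(urls, last_sync: date):
    """Return the object URLs whose updated_date is later than last_sync."""
    wanted = []
    for url in urls:
        match = PARTITION.search(url)
        # URLs without a partition segment (such as the manifest) are skipped.
        if match and date.fromisoformat(match.group(1)) > last_sync:
            wanted.append(url)
    return wanted
```

Comparing parsed dates rather than raw strings makes the intent explicit, although ISO dates also sort correctly as text.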
At the time of writing, these are the Author partitions and the number of records in
each (in the actual dataset):
updated_date=2021-12-30/ - 62,573,099
updated_date=2021-12-31/ - 97,559,192
updated_date=2022-01-01/ - 46,766,699
updated_date=2022-01-02/ - 1,352,773
This reflects the creation of the dataset on 2021-12-30 and 145,678,664 combined
updates and inserts since then - 1,352,773 of which were on 2022-01-02. Over
time, the number of partitions will grow. If we make a change that affects all
records, the partitions before the date of the change will disappear.
Merged Entities
See Merged Entities for an explanation of what Entity merging is and why we do it.
Alongside the folders for the six Entity types - work, author, source, institution,
concept, and publisher - you'll find a seventh folder: merged_ids. Within this folder
you'll find the IDs of Entities that have been merged away, along with the Entity IDs
they were merged into.
Keep in mind that merging an Entity ID is a way of deleting the Entity while
persisting its ID in OpenAlex. In practice, you can just delete the Entity it belongs to.
It's not necessary to keep track of the date or which entity it was merged into.
Merge operations are separated into files by date. Each file lists the IDs of Entities
that were merged on that date, and names the Entities they were merged into.
/data/merged_ids/
├── authors
│ └── 2022-06-07.csv.gz
├── institutions
│ └── 2022-06-01.csv.gz
├── venues
│ └── 2022-06-03.csv.gz
└── works
└── 2022-06-06.csv.gz
When processing this file, all you need to do is delete A2257618939. The effects of
merging these authors, like crediting A2208157607 with their Works, are already
reflected in the affected Entities.
Like the Entities' updated_date partitions, you only ever need to download
merged_ids files that are new to you. Any later merges will appear in new files with
later dates.
The manifest file is in Redshift manifest format. To use it as part of the update process for an
Entity type (we'll keep using Authors as an example):
1. Download s3://openalex/data/authors/manifest .
2. Get the file list from the url property of each item in the entries list.
3. Download any objects with an updated_date you haven't seen before.
4. Download s3://openalex/data/authors/manifest again. If it hasn't changed
since (1), no records moved around and any date partitions you downloaded are
valid.
5. Decompress the files you downloaded and parse one JSON Author per line.
Insert or update into your database of choice, using each entity's ID as a
primary key.
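Steps 2 and 5 above can be sketched with the Python standard library. The function names are hypothetical, and the in-memory dictionary stands in for "your database of choice"; downloading the files (steps 1, 3 and 4) is left out.

```python
import gzip
import json


def manifest_urls(manifest_text: str) -> list[str]:
    """Step 2: collect the url of every item in the manifest's entries list."""
    return [entry["url"] for entry in json.loads(manifest_text)["entries"]]


def upsert_file(path: str, store: dict) -> int:
    """Step 5: read one gzipped JSON Lines file and upsert each record by id."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                record = json.loads(line)
                # The entity's ID is the primary key, so later records overwrite.
                store[record["id"]] = record
                count += 1
    return count
```

Reading line by line keeps memory use flat even for multi-gigabyte partitions, since only one record is parsed at a time.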
If you’ve worked with a dataset like this before and have a toolchain picked out, this
may be all you need to know. If you want more detailed steps, proceed to download
the data.
Download to your machine
First off: anyone can get the data for free. While the files are hosted on S3 and we’ll
be using Amazon tools in these instructions, you don’t need an Amazon account.
Many thanks to the AWS Open Data program. They cover the data-transfer fees (about
$70 per download!) so users don't have to.
Before you load the snapshot contents to your database, you’ll need to get the files
that make it up onto your own computer. There are exceptions, like loading to
redshift from s3 or using an ETL product like Xplenty with an S3 connector. If either
of these apply to you, see if the snapshot data format is enough to get you started.
The easiest way to get the files is with the Amazon Web Services Command Line
Interface (AWS CLI). Sample commands in this documentation will use the AWS
CLI. You can find instructions for installing it on your system here:
https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
You can also browse the snapshot files using the AWS console here:
https://openalex.s3.amazonaws.com/browse.html. This browser and the CLI will
work without an account.
This shell command will copy everything in the openalex S3 bucket to a local
folder named openalex-snapshot . It'll take up roughly 300GB of disk space.
If you download the snapshot into an existing folder, you'll need to use the
aws s3 sync --delete flag to remove files from any previous downloads. You can
also remove the contents of the destination folder manually. If you don't, you will see
duplicate Entities that have moved from one file to another between snapshot updates.
The size of the snapshot will change over time. You can check the current size
before downloading by looking at the output of:
aws s3 ls --summarize --human-readable --no-sign-request --recursive "s3:
You should get a file structure like this (edited for length - there are more objects in
the actual bucket):
openalex-snapshot/
├── LICENSE.txt
├── RELEASE_NOTES.txt
└── data
├── authors
│ ├── manifest
│ └── updated_date=2021-12-28
│ ├── 0000_part_00.gz
│ └── 0001_part_00.gz
├── concepts
│ ├── manifest
│ └── updated_date=2021-12-28
│ ├── 0000_part_00.gz
│ └── 0001_part_00.gz
├── institutions
│ ├── manifest
│ └── updated_date=2021-12-28
│ ├── 0000_part_00.gz
│ └── 0001_part_00.gz
├── sources
│ ├── manifest
│ └── updated_date=2021-12-28
│ ├── 0000_part_00.gz
│ └── 0001_part_00.gz
└── works
├── manifest
└── updated_date=2021-12-28
├── 0000_part_00.gz
└── 0001_part_00.gz
Upload to your database
Now that you have a copy of the OpenAlex data you can do one of these:
This guide will have you load each entity to a single text column, then use BigQuery's
JSON functions to parse them when you run your queries. This is convenient but
inefficient since each object has to be parsed every time you run a query.
Separating the Entity data into multiple columns takes more work up front but lets you
write queries that are faster, simpler, and often cheaper.
bq mk openalex-demo:openalex
openalex-snapshot/data/works/updated_date=2021-12-28/0000_part_00.gz
openalex-snapshot/data/works/updated_date=2021-12-28/0001_part_00.gz
Here’s a command to load one works file (don’t run it yet):
bq load \
--project_id openalex-demo \
--source_format=CSV -F '\t' \
--schema 'work:string' \
openalex.works \
'openalex-snapshot/data/works/updated_date=2021-12-28/0000_part_00.gz'
bq load can only handle one file at a time, so you must run this command once
per file. But remember that the real dataset will have many more files than this
example does, so it's impractical to copy, edit, and rerun the command each time.
It's easier to handle all the files in a loop, like this:
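One way to write such a loop, as a minimal sketch in Python rather than shell. It assumes the snapshot layout shown earlier (one folder per updated_date, each holding .gz parts) and reuses the flags from the single-file command above. The function name build_bq_load_commands is made up for this example. As written it only prints the commands (a dry run), so nothing is uploaded until you swap the print for a real call.

```python
import glob
import shlex


def build_bq_load_commands(data_dir, entity, column,
                           project="openalex-demo", dataset="openalex"):
    """Return one `bq load` argument list per .gz file found for an entity.

    data_dir, project and dataset mirror the example names used in this
    guide; adjust them to your own setup.
    """
    pattern = f"{data_dir}/{entity}/*/*.gz"
    commands = []
    for path in sorted(glob.glob(pattern)):
        commands.append([
            "bq", "load",
            "--project_id", project,
            "--source_format=CSV", "-F", "\\t",  # literal backslash-t, as in the shell command
            "--schema", f"{column}:string",
            f"{dataset}.{entity}",
            path,
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_bq_load_commands("openalex-snapshot/data", "works", "work"):
        # Dry run: print each command. To execute instead, use
        # subprocess.run(cmd, check=True) from the standard library.
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps file names with unusual characters safe from shell quoting problems, and printing first makes it easy to check the list before starting a long upload.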
This step is slow. How slow depends on your upload speed, but for Author and
Work we're talking hours, not minutes.
You can speed this up by using parallel or other tools to run multiple upload
commands at once. If you do, watch out for errors caused by hitting BigQuery quota
limits.
Do this once per entity type, substituting each entity name for work / works as
needed. When you’re finished, you’ll have five tables that look like this:
[Screenshot: two rows of the works table in the BigQuery console]
Here’s a simple query, extracting the OpenAlex ID and OA status for each work:
select
json_value(work, '$.id') as work_id,
json_value(work, '$.open_access.is_oa') as is_oa
from
`openalex-demo.openalex.works`;
It will give you a list of IDs (this is a truncated sample, the real result will be millions
of rows):
https://openalex.org/W2741809807 TRUE
https://openalex.org/W1491283979 FALSE
https://openalex.org/W1491315632 FALSE
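To spot-check a downloaded file locally, the same two fields can be read with a few lines of Python. This is a rough sketch, not part of the BigQuery workflow. It assumes each line of a .gz part is one JSON object with the id and open_access.is_oa fields used in the query above. The function name iter_id_and_oa is made up for this example.

```python
import gzip
import json


def iter_id_and_oa(path):
    """Yield (work_id, is_oa) for each JSON line in a gzipped works file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            work = json.loads(line)
            # open_access may be missing or null; fall back to an empty dict
            open_access = work.get("open_access") or {}
            yield work.get("id"), open_access.get("is_oa")
```

For example, `for work_id, is_oa in iter_id_and_oa("openalex-snapshot/data/works/updated_date=2021-12-28/0000_part_00.gz"): print(work_id, is_oa)` would list the pairs for one file. Unlike json_value in the query, this returns Python booleans (or None when the field is absent) rather than strings.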
bq query \
--project_id=openalex-demo \
--use_legacy_sql=false \
"select json_value(work, '$.id') as work_id, json_value(work, '$.open_acc
But even simple queries are hard to read and edit this way. It’s better to write them
in a file than directly on the command line. Here’s an example of a slightly more
complex query - finding the author with the most open access works of all time:
with work_authorships_oa as (
select
json_value(work, '$.id') as work_id,
json_query_array(work, '$.authorships') as authorships,
cast(json_value(work, '$.open_access.is_oa') as BOOL) as is_oa
from `openalex-demo.openalex.works`
), flat_authorships as (
select work_id, authorship, is_oa
from work_authorships_oa,
unnest(authorships) as authorship
)
select
json_value(authorship, '$.author.id') as author_id,
count(distinct work_id) as num_oa_works
from flat_authorships
where is_oa
group by author_id
order by num_oa_works desc
limit 1;
https://openalex.org/A2798520857 3297
By using a relational database, you trade flexibility for efficiency in certain selected
operations. The tables, columns, and indexes we have chosen in this guide represent
only one of many ways the entity objects could be stored. It may not be the best way
to store them given the queries you want to run. Some queries will be fast, others will
be painfully slow.
We’re going to use PostgreSQL as an example and skip the database server setup
itself. We’ll assume you have a working postgres 13+ installation on which you can
create schemas and tables and run queries. With that as a starting point, we'll take
you through these steps:
1. Define the tables the data will be stored in and some key relationships between
them (the "schema").
2. Convert the JSON Lines files you downloaded to CSV files that can be read by
the database application. We'll flatten them to fit a relational database model.
3. Load the CSV data into the tables you created.
4. Run some queries on the data you loaded.
Run it and you'll be set up to follow the next steps. To show you what it's doing,
we'll explain some excerpts here, using the concept entity as an example.
SQL in this section isn't anything additional you need to run. It's part of the schema we
already defined in the file above.
The key thing we're doing is "flattening" the nested JSON data. Some parts of this
are easy. Concept.id is just a string, so it goes in a text column called "id":
But Concept.related_concepts isn't so simple. You could store the JSON array
intact in a postgres JSON or JSONB column, but you would lose much of the
benefit of a relational database. It would be hard to answer questions about related
concepts with more than one degree of separation, for example. So we make a
separate table to hold these relationships:
We can preserve score in this relationship table and look up any other attributes
of the dehydrated related concepts in the main table concepts . Creating indexes
on concept_id and related_concept_id lets us look up concepts on both sides
of the relationship quickly.
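The flattening idea can be sketched in a few lines of Python. This is an illustration only, not the conversion script used in this guide. The function name flatten_related_concepts is made up, and the sample concept reuses IDs and scores from the CSV excerpt shown in this guide. It assumes each related concept is an object with id and score keys.

```python
import csv
import io


def flatten_related_concepts(concept):
    """Return one row per related concept, keyed by the parent concept's ID."""
    rows = []
    for related in concept.get("related_concepts") or []:
        rows.append({
            "concept_id": concept["id"],
            "related_concept_id": related.get("id"),
            "score": related.get("score"),
        })
    return rows


# A trimmed-down concept object with only the fields used here.
concept = {
    "id": "https://openalex.org/C41008148",
    "related_concepts": [
        {"id": "https://openalex.org/C33923547", "score": 253.92},
        {"id": "https://openalex.org/C119599485", "score": 153.019},
    ],
}

# Write the rows as CSV to an in-memory buffer and print the result.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["concept_id", "related_concept_id", "score"])
writer.writeheader()
writer.writerows(flatten_related_concepts(concept))
print(out.getvalue())
```

Each nested array element becomes its own row that carries the parent's ID, which is what lets the relationship table be joined back to the main concepts table on either side.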
Copy the script to the directory above your snapshot (if the snapshot is in
/home/yourname/openalex/openalex-snapshot/ , name it something like
/home/yourname/openalex/flatten-openalex-jsonl.py).
mkdir -p csv-files
python3 flatten-openalex-jsonl.py
This script is slow. Exactly how slow depends on the machine you run it on, but think
hours, not minutes.
If you're familiar with python, there are two big improvements you can make:
You should now have a directory full of nice, flat CSV files:
$ tree csv-files/
csv-files/
├── concepts.csv
├── concepts_ancestors.csv
├── concepts_counts_by_year.csv
├── concepts_ids.csv
└── concepts_related_concepts.csv
...
$ cat csv-files/concepts_related_concepts.csv
concept_id,related_concept_id,score
https://openalex.org/C41008148,https://openalex.org/C33923547,253.92
https://openalex.org/C41008148,https://openalex.org/C119599485,153.019
https://openalex.org/C41008148,https://openalex.org/C121332964,143.935
...
This script will run all the copy commands in the right order. Here's how to run it:
1. Copy it to the same place as the python script from step 2, right above the folder
with your CSV files.
2. Set the environment variable OPENALEX_SNAPSHOT_DB to the connection URI
for your database.
3. If your CSV files aren't in csv-files , replace each occurrence of 'csv-files/' in
the script with the correct path.
4. Run it like this (from your shell prompt)
\i copy-openalex-csv.sql
There are a bunch of ways you can do this - just run the copy commands from the
script above in the right order in whatever client you're familiar with.
Here’s a simple query, getting the OpenAlex ID and OA status for each work:
You'll get results like this (truncated, the actual result will be millions of rows):
id oa_status
https://openalex.org/W1496190310 closed
https://openalex.org/W2741809807 gold
https://openalex.org/W1496404095 bronze
Here’s an example of a more complex query - finding the author with the most open
access works of all time:
select
author_id,
count(distinct work_id) as num_oa_works
from (
select
a.id as author_id,
w.id as work_id,
oa.is_oa
from
openalex.authors a
join openalex.works_authorships wa on a.id = wa.author_id
join openalex.works w on wa.work_id = w.id
join openalex.works_open_access oa on w.id = oa.work_id
) work_authorships_oa
where is_oa
group by 1
order by 2 desc
limit 1;
author_id num_oa_works
https://openalex.org/A2798520857 3297
Additional Help
Tutorials
We're working on a collection of tutorials that demonstrate how to use
OpenAlex to answer all sorts of questions. Check back often for more! Here's what
we have currently:
Turn the page - Use paging to collect all of the works from an author.
Monitoring Open Access publications for a given institution - Learn how to filter
and group with the API.
What are the publication sources located in Japan? - Use the source entity to
look at a country's publications over time.
Calculate the h-index for a given author - Use filtering, sorting, and paging to get
citation counts and calculate the h-index, an author-level metric.
How are my institution's researchers collaborating with people around the
globe? - Learn about institutions in OpenAlex while exploring the
international research collaborations made by a university.
Getting started with OpenAlex Premium - Use your Premium API Key to
download the latest updates from our API and keep your data in sync with ours.
Introduction to openalexR - In this R notebook, an accompaniment to the
webinar on openalexR, you'll learn the basics of using the openalexR library to
get data from OpenAlex.
Report bugs
Oh no, you found a bug! 🕷️
Please tell us about it using this form on our help page.
FAQ
How do I cite OpenAlex?
See our citation section here.
When we find duplicated works, authors, etc. that already have assigned IDs, we
merge them. Merged entities will redirect to the proper entity in the API. In the data
snapshot, there is a directory which lists the IDs that have been merged.
*In July 2023, OpenAlex switched to a new, more accurate author identification
system, replacing all OpenAlex Author IDs with new ones. This is a very rare case in
which we violated the rule of having stable IDs, which was needed to make the
improvements. Old IDs and their connections to works remain available in the
historical OpenAlex data.
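If you keep your own copy of the data, you may want to map old IDs to the entity they were merged into. The sketch below assumes you have already loaded the merged-ID list into a Python dict from old ID to new ID. The layout of the snapshot's merged-ID files is not described here, so the loading step is left out. The function name and the short IDs in the usage note are made up for illustration.

```python
def resolve_merged_id(entity_id, merged_into):
    """Follow merge records until reaching an ID that was not merged away.

    merged_into maps an old ID to the ID it was merged into. The `seen`
    set guards against an accidental cycle in the mapping.
    """
    seen = set()
    while entity_id in merged_into and entity_id not in seen:
        seen.add(entity_id)
        entity_id = merged_into[entity_id]
    return entity_id
```

With a hypothetical mapping `{"A1": "A2", "A2": "A3"}`, `resolve_merged_id("A1", mapping)` follows both merges and returns `"A3"`, while an ID that was never merged is returned unchanged.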
If your example DOI is in Crossref but not in OpenAlex, please send us a support
request so we can look into it further!
ORCID
ROR
DOAJ
Unpaywall
Pubmed
Pubmed Central
The ISSN International Centre
Internet Archive
Web crawls
Subject-area and institutional repositories from arXiv to Zenodo and everywhere
in between
Learn more at our general help center article: About the data
For those who would like a higher level of service and to provide direct financial
support for our mission, we offer OpenAlex Premium. Click here to learn more.
We're currently still exploring our options for OpenAlex's sustainability plan. Thanks
to a generous grant from Arcadia, we've got lots of runway, and we don't need to
roll anything out in a rush.
Our Unpaywall project (a free index of the world's open-access research literature)
has been self-sustaining via a freemium revenue model for nearly five years, and
we have recently introduced a similar model in OpenAlex Premium. Access to the
data will always be free for everyone, but OpenAlex Premium offers several benefits
in service above the services we offer for free.