CWTS Leiden Ranking 2021
Methodology
Centre for Science and Technology Studies, Leiden University
Data
The CWTS Leiden Ranking 2021 is based on bibliographic data from the Web of
Science database produced by Clarivate Analytics. Below we discuss the Web of
Science data that is used in the Leiden Ranking. We also discuss the enrichments
made to this data by CWTS.
Web of Science
The Web of Science database consists of a number of citation indices. The Leiden
Ranking uses data from the Science Citation Index Expanded, the Social Sciences
Citation Index, and the Arts & Humanities Citation Index. The Leiden Ranking is
based on Web of Science data because Web of Science offers good coverage of the
international scientific literature and generally provides high-quality data.
The Leiden Ranking does not take into account conference proceedings publications
and book publications. This is an important limitation in certain research fields,
especially in computer science, engineering, and the social sciences and humanities.
Enriched data
CWTS enriches Web of Science data in a number of ways. First of all, CWTS performs
its own citation matching (i.e., matching of cited references to the publications they
refer to). Furthermore, in order to calculate the various indicators included in the
Leiden Ranking, CWTS identifies publications by industrial organizations in Web of
Science, CWTS performs geocoding of the addresses listed in publications, CWTS
assigns open access labels (gold, hybrid, bronze, green) to publications, and CWTS
disambiguates authors and attempts to determine their gender. Most importantly,
CWTS puts a lot of effort into assigning publications to universities in a consistent
and accurate way. This is by no means a trivial issue. Universities may be referred to
using many different name variants, and the definition and delimitation of
universities are not obvious at all. The methodology employed in the Leiden Ranking
to assign publications to universities is discussed below.
More information
Caron, E., & Van Eck, N.J. (2014). Large scale author name disambiguation using rule-
based scoring and clustering. In E. Noyons (Ed.), Proceedings of the 19th
International Conference on Science and Technology Indicators (pp. 79–86).
Olensky, M., Schmidt, M., & Van Eck, N.J. (2016). Evaluation of the citation matching
algorithms of CWTS and iFQ in comparison to Web of Science. Journal of the
Association for Information Science and Technology, 67(10), 2550–2564.
doi:10.1002/asi.23590.
Waltman, L., Tijssen, R.J.W., & Van Eck, N.J. (2011). Globalisation of science in
kilometres. Journal of Informetrics, 5(4), 574–582.
doi:10.1016/j.joi.2011.05.003.
Universities
The CWTS Leiden Ranking 2021 includes 1225 universities worldwide. These
universities have been selected based on their number of Web of Science indexed
publications in the period 2016–2019. As discussed below, a sophisticated data
collection methodology is employed to assign publications to universities.
Identification of universities
Ranking still distinguishes between the different constituent institutions. The Leiden
Ranking 2021 includes French organizations that are designated as “établissements
publics expérimentaux (EPE)”. This is a new type of HEI in France created by the law
of 12 December 2018 in which different research and higher education institutions
work together in order to eventually form a single HEI. Research and educational
organizations that are part of an EPE as “établissements-composantes” are no longer
included as separate organizations in the Leiden Ranking 2021.
Affiliated institutions
1. Component
3. Associated organization
The third type of affiliated institution is the associated organization, which is more
loosely connected to a university. This organization is an autonomous institution
that collaborates with one or more universities based on a joint purpose but at the
same time has separate missions and tasks. In many countries, hospitals that
operate as teaching or university hospitals fall into this category. The Massachusetts
General Hospital, one of the teaching hospitals of the Harvard Medical School, is an
example of an associated organization.
Selection of universities
The Leiden Ranking 2021 includes 1225 universities from 69 different countries.
These are all universities worldwide that have produced at least 800 Web of Science
indexed publications in the period 2016–2019. Only so-called core publications are
counted, which are publications in international scientific journals. Also, only
research articles and review articles are taken into account. Other types of
publications are not considered. Furthermore, collaborative publications are counted
fractionally. For instance, if a publication has five authors, two of whom belong to
a particular university, the publication is counted with a weight of 2 / 5 = 0.4 for
that university.
Data quality
Main fields
The CWTS Leiden Ranking 2021 provides statistics not only at the level of science as
a whole but also at the level of the following five main fields of science:
As discussed below, these five main fields are defined based on a large number of
micro-level fields.
Publications are assigned to the five main fields using an algorithmic approach.
Traditionally, fields of science are defined by sets of related journals. This approach
is especially problematic in the case of multidisciplinary journals such as Nature,
PLOS ONE, PNAS, and Science, which do not belong to one specific scientific field.
The five main fields listed above are defined at the level of individual publications
rather than at the journal level. In this way, publications in multidisciplinary journals
can be properly assigned to a field.
1. We start with 4140 micro-level fields of science. These fields are constructed
algorithmically. Using a computer algorithm, each publication in Web of
Science is assigned to one of the 4140 fields. This is done based on a large-
scale analysis of hundreds of millions of citation relations between
publications.
2. We then determine for each of the 4140 micro-level fields the overlap with
each of the 254 journal subject categories defined in Web of Science
(excluding the Multidisciplinary Sciences subject category).
3. Each subject category in Web of Science has been linked to one of the five
main fields. Based on the link between subject categories and main fields, we
assign each of the 4140 micro-level fields to one or more of the five main
fields. A micro-level field is assigned to a main field if at least 25% of the
publications in the micro-level field belong to subject categories linked to the
main field.
After the above steps have been taken, each publication in Web of Science has an
assignment to a micro-level field, and each micro-level field in turn has an
assignment to at least one main field. Combining these results, we obtain for each
publication an assignment to one or more main fields.
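To make the main field assignment in step 3 concrete, the sketch below applies the 25% overlap rule to a single illustrative micro-level field. The data structures, field names, and mapping from subject categories to main fields are invented for the example; only the 25% threshold is taken from the procedure described above.

```python
from collections import Counter

# Invented example: the Web of Science subject category of each publication in
# one micro-level field, and a mapping from subject categories to main fields.
micro_field_categories = ["Oncology", "Oncology", "Cell Biology", "Chemistry, Analytical"]
category_to_main_field = {
    "Oncology": "Biomedical and health sciences",
    "Cell Biology": "Life and earth sciences",
    "Chemistry, Analytical": "Physical sciences and engineering",
}

def assign_main_fields(categories, threshold=0.25):
    """Assign a micro-level field to every main field that covers at least
    `threshold` of the field's publications (step 3 above)."""
    counts = Counter(category_to_main_field[c] for c in categories)
    return [main_field for main_field, n in counts.items()
            if n / len(categories) >= threshold]

print(assign_main_fields(micro_field_categories))
# ['Biomedical and health sciences', 'Life and earth sciences',
#  'Physical sciences and engineering']
```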
More information
For more information on the methodology for the algorithmic construction of the
micro-level fields, we refer to a paper by Waltman and Van Eck (2012). The
methodology makes use of the Leiden algorithm. This algorithm is documented in a
paper by Traag et al. (2019).
Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a
publication-level classification system of science. Journal of the American Society
for Information Science and Technology, 63(12), 2378–2392.
doi:10.1002/asi.22748.
Traag, V.A., Waltman, L., & Van Eck, N.J. (2019). From Louvain to Leiden:
Guaranteeing well-connected communities. Scientific Reports, 9, 5233.
doi:10.1038/s41598-019-41695-z.
Indicators
The CWTS Leiden Ranking 2021 offers a sophisticated set of bibliometric indicators
that provide statistics at the level of universities on scientific impact, collaboration,
open access publishing, and gender diversity. The indicators available in the Leiden
Ranking are discussed in detail below.
Publications
The Leiden Ranking takes into account only a subset of the publications in the
Science Citation Index Expanded, the Social Sciences Citation Index, and the Arts &
Humanities Citation Index. We refer to the publications in this subset as core
publications. Core publications are publications in international scientific journals in
fields that are suitable for citation analysis. In order to be classified as a core
publication, a publication must satisfy the following criteria:
The last criterion is a very important one. In the Leiden Ranking, a journal is
considered a core journal if it meets the following conditions:
In the calculation of the Leiden Ranking indicators, only core publications are taken
into account. Excluding non-core publications ensures that the Leiden Ranking is
based on a relatively homogeneous set of publications, namely publications in
international scientific journals in fields that are suitable for citation analysis. The
use of such a relatively homogeneous set of publications enhances the international
comparability of universities. It should be emphasized that non-core publications are
excluded not because they are considered less important than core publications.
Non-core publications may have important scientific value. About one-sixth of the
publications in Web of Science are excluded because they have been classified as
non-core publications.
Our concept of core publications should not be confused with the Web of Science
Core Collection. The Web of Science Core Collection represents a subset of the
citation indices available in Web of Science. As explained above, the core
publications on which the Leiden Ranking is based represent a subset of the
publications in the Science Citation Index Expanded, the Social Sciences Citation
Index, and the Arts & Humanities Citation Index.
Indicators included in the Leiden Ranking have two variants: a size-dependent and a
size-independent variant. In general, size-dependent indicators are obtained by
counting the absolute number of publications of a university that have a certain
property, while size-independent indicators are obtained by calculating the
proportion of the publications of a university with a certain property. For instance,
the number of highly cited publications of a university and the number of
publications of a university co-authored with other organizations are size-dependent
indicators. The proportion of the publications of a university that are highly cited and
the proportion of a university’s publications co-authored with other organizations
are size-independent indicators. In the case of size-dependent indicators, universities
with a larger publication output tend to perform better than universities with a
smaller publication output. Size-independent indicators have been corrected for the
size of the publication output of a university. Hence, when size-independent
indicators are used, both larger and smaller universities may perform well.
• P(top 1%) and PP(top 1%). The number and the proportion of a university’s
publications that, compared with other publications in the same field and in
the same year, belong to the top 1% most frequently cited.
• P(top 5%) and PP(top 5%). The number and the proportion of a university’s
publications that, compared with other publications in the same field and in
the same year, belong to the top 5% most frequently cited.
• P(top 10%) and PP(top 10%). The number and the proportion of a university’s
publications that, compared with other publications in the same field and in
the same year, belong to the top 10% most frequently cited.
• P(top 50%) and PP(top 50%). The number and the proportion of a university’s
publications that, compared with other publications in the same field and in
the same year, belong to the top 50% most frequently cited.
• TCS and MCS. The total and the average number of citations of the
publications of a university.
• TNCS and MNCS. The total and the average number of citations of the
publications of a university, normalized for field and publication year. An
MNCS value of two, for instance, means that the publications of a university
have on average been cited twice as frequently as the average publication in the
same field and year.
Citations are counted until the end of 2020 in the calculation of the above indicators.
Author self-citations are excluded. All indicators except for TCS and MCS are
normalized for differences in citation patterns between scientific fields. For the
purpose of this field normalization, about 4000 fields are distinguished. These fields
are defined at the level of individual publications. Using a computer algorithm, each
publication in Web of Science is assigned to a field based on its citation relations
with other publications.
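As a rough illustration of how such field- and year-normalized indicators can be computed from publication records, the sketch below calculates MNCS and PP(top 10%) for a hypothetical university. The records and function names are invented, counting is simplified to full counting, and ties at the top-10% threshold are handled naively, so this is only a sketch of the idea, not the exact CWTS procedure.

```python
import numpy as np

# Invented records: each publication has an algorithmically assigned field, a
# publication year, and a citation count (author self-citations excluded).
publications = [
    {"field": "F1", "year": 2017, "citations": 12},
    {"field": "F1", "year": 2017, "citations": 3},
    {"field": "F2", "year": 2018, "citations": 40},
]

def peers(pub, all_pubs):
    """Citation counts of all publications in the same field and year."""
    return [p["citations"] for p in all_pubs
            if p["field"] == pub["field"] and p["year"] == pub["year"]]

def mncs(univ_pubs, all_pubs):
    """Mean normalized citation score: citations divided by the field/year average."""
    return np.mean([p["citations"] / np.mean(peers(p, all_pubs)) for p in univ_pubs])

def pp_top10(univ_pubs, all_pubs):
    """Share of publications at or above the 90th citation percentile of their field/year."""
    return np.mean([p["citations"] >= np.percentile(peers(p, all_pubs), 90)
                    for p in univ_pubs])

univ = publications[:2]  # hypothetical: the first two records belong to one university
print(mncs(univ, publications), pp_top10(univ, publications))  # 1.0 0.5
```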
The TCS, MCS, TNCS, and MNCS indicators are not available on the main ranking
page. These indicators can be accessed by clicking on the name of a university. An
overview of all bibliometric statistics available for the university will then be
presented. This overview also includes the TCS, MCS, TNCS, and MNCS indicators.
Collaboration indicators
• P(int collab) and PP(int collab). The number and the proportion of a
university’s publications that have been co-authored by researchers from two
or more countries.
• P(industry) and PP(industry). The number and the proportion of a
university’s publications that have been co-authored with one or more
industrial organizations.
• P(<100 km) and PP(<100 km). The number and the proportion of a
university’s publications with a geographical collaboration distance of less
than 100 km. The geographical collaboration distance of a publication equals
the largest geographical distance between two addresses mentioned in the
publication’s address list (see the sketch after this list).
• P(>5000 km) and PP(>5000 km). The number and the proportion of a
university’s publications with a geographical collaboration distance of more
than 5000 km.
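The sketch below illustrates how the geographical collaboration distance of a publication can be obtained from geocoded addresses: it takes the largest great-circle (haversine) distance over all pairs of address coordinates. The coordinates and function names are invented for the example; this is not the actual CWTS geocoding pipeline.

```python
from itertools import combinations
from math import radians, sin, cos, asin, sqrt

def haversine_km(coord1, coord2):
    """Great-circle distance in km between two (latitude, longitude) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*coord1, *coord2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def collaboration_distance_km(address_coordinates):
    """Largest distance between any two geocoded addresses in a publication's address list."""
    if len(address_coordinates) < 2:
        return 0.0
    return max(haversine_km(a, b) for a, b in combinations(address_coordinates, 2))

# Invented example: a publication with addresses in Leiden and Boston.
print(collaboration_distance_km([(52.16, 4.48), (42.36, -71.06)]))  # roughly 5,500 km
```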
Some limitations of the above indicators need to be mentioned. In the case of the
P(industry) and PP(industry) indicators, we have made an effort to identify industrial
organizations as accurately as possible. Inevitably, however, there will be
inaccuracies and omissions in the identification of industrial organizations. In the
case of the P(<100 km), PP(<100 km), P(>5000 km), and PP(>5000 km) indicators, we
rely on geocoding of addresses listed in Web of Science. There may be some
inaccuracies in the geocoding that we have performed, and for addresses that are
used infrequently no geocodes may be available. In general, we expect these
inaccuracies and omissions to have only a small effect on the indicators.
Open access indicators
The Leiden Ranking provides the following indicators of open access publishing:
• P(OA) and PP(OA). The number and the proportion of open access
publications of a university.
• P(gold OA) and PP(gold OA). The number and the proportion of gold open
access publications of a university. Gold open access publications are
publications in an open access journal.
• P(hybrid OA) and PP(hybrid OA). The number and the proportion of hybrid
open access publications of a university. Hybrid open access publications are
publications in a subscription journal that are open access with a license that
allows the publication to be reused.
• P(bronze OA) and PP(bronze OA). The number and the proportion of bronze
open access publications of a university. Bronze open access publications are
publications in a subscription journal that are open access without a license
that allows the publication to be reused.
• P(green OA) and PP(green OA). The number and the proportion of green open
access publications of a university. Green open access publications are
publications in a subscription journal that are open access not in the journal
itself but in a repository.
• P(OA unknown) and PP(OA unknown). The number and the proportion of a
university’s publications for which the open access status is unknown. These
publications typically do not have a DOI in the Web of Science database.
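The open access labels can be read as a simple decision rule over a publication's availability and license information. The sketch below only illustrates the label definitions given above; the argument names are invented, and the actual labeling relies on external open access data linked through DOIs.

```python
def oa_label(in_oa_journal, free_at_journal, reuse_license, in_repository, has_doi=True):
    """Illustrative open access label, following the definitions above."""
    if not has_doi:
        return "OA unknown"   # without a DOI the open access status typically cannot be determined
    if in_oa_journal:
        return "gold OA"      # published in an open access journal
    if free_at_journal and reuse_license:
        return "hybrid OA"    # subscription journal, openly available with a reuse license
    if free_at_journal:
        return "bronze OA"    # subscription journal, openly available without a reuse license
    if in_repository:
        return "green OA"     # openly available only in a repository
    return "closed"           # not an open access publication

# Invented example: only available via a repository, so green open access.
print(oa_label(False, False, False, True))  # green OA
```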
Gender indicators
For each authorship of a university, the gender is determined using the following
four-step procedure:
[1] https://unpaywall.org
most often in the author’s publications, the author is linked to this country.
Otherwise, the author is linked to all countries occurring in his or her
publications.
3. Retrieval of gender statistics. For each author, gender statistics are collected
from three sources: Gender API [2], Genderize.io [3], and Gender Guesser [4]. Gender
statistics are obtained based on the first name of an author and the countries
to which the author is linked.
4. Gender assignment. For each author, a gender (male or female) is assigned if
Gender API is able to determine the gender with a reported accuracy of more
than 90%. If Gender API does not recognize the first name of an author,
Gender Guesser and Genderize.io are used. If none of these sources is able to
determine the gender of an author with sufficient accuracy, the gender is
considered unknown. For authors from Russia and a number of other
countries, the last name is also used to determine the gender of the author.
Using the above procedure, the gender can be determined for about 70% of all
authorships of universities included in the Leiden Ranking. For the remaining
authorships, the gender is unknown.
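Step 4 amounts to a cascade over the three sources with a 90% accuracy threshold. The sketch below illustrates only that decision logic; the tuples of invented (gender, reported accuracy) values stand in for the responses of Gender API, Genderize.io, and Gender Guesser, and details such as the order of the fallback sources and the use of last names for some countries are simplified.

```python
def assign_gender(gender_api, genderize, gender_guesser, threshold=0.90):
    """Illustrative cascade. Each argument is None (name not recognized) or a
    tuple (gender, reported_accuracy) from the corresponding source."""
    if gender_api is not None:
        gender, accuracy = gender_api
        return gender if accuracy > threshold else "unknown"
    # Fall back to the other sources only if Gender API does not recognize the name.
    for source in (gender_guesser, genderize):
        if source is not None:
            gender, accuracy = source
            if accuracy > threshold:
                return gender
    return "unknown"

print(assign_gender(("female", 0.97), None, None))           # female
print(assign_gender(None, ("male", 0.55), ("male", 0.80)))   # unknown
```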
Counting method
The scientific impact indicators in the Leiden Ranking can be calculated using either
a full counting or a fractional counting method. The full counting method gives a full
weight of one to each publication of a university. The fractional counting method
gives less weight to collaborative publications than to non-collaborative ones. For
instance, if a publication has been co-authored by five researchers and two of these
researchers are affiliated with a particular university, the publication has a weight of
2 / 5 = 0.4 in the calculation of the scientific impact indicators for this university.
The fractional counting method leads to a more accurate field normalization of
scientific impact indicators and therefore to fairer comparisons between universities
active in different fields. For this reason, fractional counting is the preferred
counting method for the scientific impact indicators in the Leiden Ranking.
[2] https://gender-api.com
[3] https://genderize.io
[4] https://pypi.org/project/gender-guesser/0.4.0/
Collaboration, open access, and gender indicators are always calculated using the
full counting method.
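The difference between the two counting methods can be made concrete with a small sketch. The publication records below are invented; the first one corresponds to the 2 / 5 = 0.4 example given above.

```python
# Invented records: for each publication of a university, the number of authors
# affiliated with that university and the total number of authors.
publications = [
    {"univ_authors": 2, "total_authors": 5},    # the 2 / 5 = 0.4 example above
    {"univ_authors": 1, "total_authors": 1},    # single-author publication
    {"univ_authors": 3, "total_authors": 10},
]

# Full counting: every publication of the university has a weight of one.
p_full = sum(1 for _ in publications)

# Fractional counting: each publication is weighted by the university's share of authors.
p_fractional = sum(p["univ_authors"] / p["total_authors"] for p in publications)

print(p_full, round(p_fractional, 2))  # 3 1.7
```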
Trend analysis
To facilitate trend analyses, the Leiden Ranking provides statistics not only based on
publications from the period 2016–2019, but also based on publications from earlier
periods: 2006–2009, 2007–2010, …, 2015–2018. The statistics for the different
periods are calculated in a fully consistent way. For each period, citations are
counted until the end of the first year after the period has ended. For instance, in the
case of the period 2006–2009 citations are counted until the end of 2010, while in
the case of the period 2016–2019 citations are counted until the end of 2020.
Stability intervals
Stability intervals provide some insight into the uncertainty in bibliometric statistics.
A stability interval indicates a range of values of an indicator that are likely to be
observed when the underlying set of publications changes. For instance, the PP(top
10%) indicator may be equal to 15.3% for a particular university, with a stability
interval ranging from 14.1% to 16.5%. This means that the PP(top 10%) indicator
equals 15.3% for this university, but that changes in the set of publications of the
university may relatively easily lead to PP(top 10%) values in the range from 14.1% to
16.5%. The Leiden Ranking employs 95% stability intervals constructed using a
statistical technique known as bootstrapping.
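A minimal sketch of how such a stability interval could be obtained by bootstrapping is given below: the university's set of publications is resampled with replacement many times, the PP(top 10%) indicator is recomputed for each resample, and the 2.5th and 97.5th percentiles of the resulting values are reported. The data, the number of resamples, and the treatment of weighting are invented simplifications, not the exact CWTS procedure.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented data: for each publication of a university, 1 if it belongs to the
# top 10% most cited of its field and year, 0 otherwise.
is_top10 = (rng.random(400) < 0.153).astype(int)

def stability_interval(values, n_resamples=1000, level=0.95):
    """Bootstrap interval for the mean of `values`: resample publications with
    replacement and recompute the indicator for every resample."""
    means = [np.mean(rng.choice(values, size=len(values), replace=True))
             for _ in range(n_resamples)]
    lower = np.percentile(means, 100 * (1 - level) / 2)
    upper = np.percentile(means, 100 * (1 + level) / 2)
    return lower, upper

print(np.mean(is_top10), stability_interval(is_top10))
```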
More information
More information on the indicators available in the Leiden Ranking can be found in a
number of papers published by CWTS researchers. A detailed discussion of the
Leiden Ranking is presented by Waltman et al. (2012). This paper relates to the
2011/2012 edition of the Leiden Ranking. Although the paper is not up-to-date
anymore, it still provides relevant information on the Leiden Ranking. Field
normalization of scientific impact indicators based on algorithmically defined fields
is studied by Ruiz-Castillo and Waltman (2014). The methodology adopted in the
Leiden Ranking for identifying core publications and core journals is outlined by
Waltman and Van Eck (2013a, 2013b). Finally, the importance of using fractional
rather than full counting in the calculation of field-normalized scientific impact
indicators is explained by Waltman and Van Eck (2015).
Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E.C.M., Tijssen, R.J.W., Van Eck,
N.J., Van Leeuwen, T.N., Van Raan, A.F.J., Visser, M.S., & Wouters, P. (2012). The
Leiden Ranking 2011/2012: Data collection, indicators, and interpretation.
Journal of the American Society for Information Science and Technology, 63(12),
2419–2432. doi:10.1002/asi.22708.
Waltman, L., & Van Eck, N.J. (2013a). Source normalized indicators of citation impact:
An overview of different approaches and an empirical comparison.
Scientometrics, 96(3), 699–716. doi:10.1007/s11192-012-0913-4.
Waltman, L., & Van Eck, N.J. (2013b). A systematic empirical comparison of different
approaches for normalizing citation impact indicators. Journal of Informetrics,
7(4), 833–849. doi:10.1016/j.joi.2013.08.002.
Waltman, L., & Van Eck, N.J. (2015). Field-normalized citation impact indicators and
the choice of an appropriate counting method. Journal of Informetrics, 9(4), 872–
894. doi:10.1016/j.joi.2015.08.001.