Host and Viral Traits Predict Zoonotic Spillover From Mammals. Nature 546, 646–650 (2017).
1038/nature22975
The majority of human emerging infectious diseases are zoonotic, with viruses that originate in wild mammals of particular concern (for example, HIV, Ebola and SARS)1–3. Understanding patterns of viral diversity in wildlife and determinants of successful cross-species transmission, or spillover, are therefore key goals for pandemic surveillance programs4. However, few analytical tools exist to identify which host species are likely to harbour the next human virus, or which viruses can cross species boundaries5–7. Here we conduct a comprehensive analysis of mammalian host–virus relationships and show that both the total number of viruses that infect a given species and the proportion likely to be zoonotic are predictable. After controlling for research effort, the proportion of zoonotic viruses per species is predicted by phylogenetic relatedness to humans, host taxonomy and human population within a species range, which may reflect human–wildlife contact. We demonstrate that bats harbour a significantly higher proportion of zoonotic viruses than all other mammalian orders. We also identify the taxa and geographic regions with the largest estimated number of ‘missing viruses’ and ‘missing zoonoses’ and therefore of highest value for future surveillance. We then show that phylogenetic host breadth and other viral traits are significant predictors of zoonotic potential, providing a novel framework to assess if a newly discovered mammalian virus could infect people.
Viral zoonoses are a serious threat to public health and global security, and have caused the majority of recent pandemics in people4, yet our understanding of the factors driving viral diversity in mammals, viral host range, and cross-species transmission to humans remains poor. Recent studies have described broad patterns of pathogen host range1,3 and various host or microbial factors that facilitate cross-species transmission5,7,8, or have focused on factors promoting pathogen and parasite sharing within specific mammalian taxonomic groups including primates9–11, bats12–14, and rodents12,15, but to date there has been no comprehensive, species-level analysis of viral sharing between humans and all mammals. Here we create, and then analyse, a database of 2,805 mammal–virus associations, including 754 mammal species (14% of global mammal diversity) from 15 orders and 586 unique viral species (every recognized virus found in mammals16) from 28 viral families (Methods). We use these data to test hypotheses on the determinants of viral richness and viral sharing with humans. We fit three inter-related models to elucidate specific components of the process of zoonotic spillover (Extended Data Fig. 1). First, we identify factors that influence total viral richness (that is, the number of unique viral species found in a given host, including those which may have the potential to infect humans). Second, we identify and rank the ecological, phylogenetic and life-history traits that make some species more likely hosts of zoonoses than others. Third, recognizing that not all mammalian viruses will have the biological capacity to infect humans, we identify and rank viral traits that increase the likelihood of a virus being zoonotic.
In examining the raw data, we found that observed viral richness within mammals varies at a host order and viral family level, and is highest for Bunya-, Flavi- and Arenaviruses in rodents; Flavi-, Bunya- and Rhabdoviruses in bats; and Herpesviruses in non-human primates (Extended Data Fig. 2). Of 586 mammalian viruses in our dataset, 263 (44.9%) have been detected in humans, 75 of which are exclusively human and 188 (71.5% of human viruses) zoonotic, defined operationally here as viruses detected at least once in humans and at least once in another mammal species (Methods). The proportion of zoonotic viruses is higher for RNA (159 of 382, 41.6%) than DNA (29 of 205, 14.1%) viruses. The observed number of viruses per wild host species was comparable when averaged across orders, but bats, primates, and rodents had a higher proportion of observed zoonotic viruses compared to other groups of mammals (Fig. 1). Species in other orders (for example, Cingulata, Pilosa, Didelphimorphia, Eulipotyphla) also shared a majority of their observed viruses with humans, but data were limited in these less diverse and poorly studied orders. Several species of domesticated ungulates (orders Cetartiodactyla and Perissodactyla) are outliers for their number of observed viruses, but these species have a relatively low proportion of zoonotic viruses (Fig. 1; Supplementary Discussion).
Previous analyses show that zoonotic disease emergence events and human pathogen species richness are spatially correlated with mammal and bird diversity2,17. However, these studies weight all species equally. In reality, the risk of zoonotic viral transmission, or spillover, probably varies among host species owing to differences in underlying viral richness, opportunity for contact with humans, propensity to exhibit clinical signs that exacerbate viral shedding18, other ecological, behavioural and life-history differences5,12,15, and phylogenetic proximity to humans10. We hypothesize that the number of viruses a given mammal species shares with humans increases with phylogenetic proximity to humans and with opportunity for human contact. We used generalized additive models (GAMs) to identify and rank host-specific predictors (ecological, life history, taxonomic, and phylogenetic traits, and a control for research effort) of the number of total and zoonotic viruses in mammals (Methods; Supplementary Table 1).
The best-fit model for total viral richness per wild mammal species explained 49.2% of the total deviance, and included a per-species measure of disease-related research effort, phylogenetically corrected body mass, geographic range, mammal sympatry, and taxonomy (order) (Fig. 2a–e). Not surprisingly, research effort had the strongest effect on the total number of viruses per host, explaining 31.9% of the total deviance for this model (Extended Data Table 1). The remaining 17.3% can be explained by biological factors, a value greater than or comparable to studies examining much narrower groups of mammal hosts10,12,15 (Supplementary Discussion). Mammal sympatry was the second most important predictor of total viral richness (Fig. 2d). Our model selection consistently identified mammal sympatry calculated at a ≥20% area overlap over other thresholds explored (Methods), providing insight into the minimum geographic overlap needed to facilitate viral sharing between hosts. Host geographic range was also significantly associated with increasing total viral richness, although the strength of this effect was low (Fig. 2c). Several
1 EcoHealth Alliance, 460 West 34th Street, New York, New York 10001, USA.
646 | NATURE | VOL 546 | 29 June 2017
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
Letter RESEARCH
Orders shown in Figure 1 (axis labels, in plotted order): Cingulata, Pilosa, Didelphimorphia, Eulipotyphla, Chiroptera, Primates, Rodentia, Carnivora, Lagomorpha, Proboscidea, Diprotodontia, Cetartiodactyla, Perissodactyla, Peramelemorphia, Scandentia.
Figure 1 | Observed viral richness in mammals. a, b, Box plots of proportion of zoonotic viruses (a) and total viral richness per species (b), aggregated by order. Data points represent wild (light grey, n = 721) and domestic (dark red, n = 32) mammal species; lines represent median, boxes, interquartile range. Animal silhouettes from PhyloPic. Data based on 2,805 host–virus associations. See Methods for image credits and licensing.
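The two quantities summarized in Figure 1 follow from simple set arithmetic on host–virus associations: total viral richness is the number of unique viruses recorded for a host, and the zoonotic subset is those also recorded in humans. The sketch below illustrates that bookkeeping under stated assumptions; the records, the host and virus names, and the function `richness_table` are hypothetical and are not taken from the authors' database or code. The final lines only re-derive the percentages quoted in the text from the counts given there.

```python
from collections import defaultdict

# Hypothetical (host, virus) association records; all names are illustrative.
associations = [
    ("Host A", "Virus 1"),
    ("Host A", "Virus 2"),
    ("Host A", "Virus 2"),        # repeated report, counted once
    ("Host B", "Virus 3"),
    ("Homo sapiens", "Virus 2"),
    ("Homo sapiens", "Virus 4"),
]

def richness_table(records, human="Homo sapiens"):
    """Per-host total richness, zoonotic richness and proportion zoonotic."""
    viruses_by_host = defaultdict(set)
    for host, virus in records:
        viruses_by_host[host].add(virus)
    human_viruses = viruses_by_host.get(human, set())
    table = {}
    for host, viruses in viruses_by_host.items():
        if host == human:
            continue
        total = len(viruses)
        zoonotic = len(viruses & human_viruses)  # shared with humans
        table[host] = {
            "total": total,
            "zoonotic": zoonotic,
            "proportion": zoonotic / total,
        }
    return table

table = richness_table(associations)

# Percentages quoted in the text, recomputed from the stated counts.
pct_in_humans = round(100 * 263 / 586, 1)      # viruses detected in humans
pct_zoonotic = round(100 * 188 / 263, 1)       # zoonotic share of human viruses
pct_rna = round(100 * 159 / 382, 1)            # zoonotic share of RNA viruses
pct_dna = round(100 * 29 / 205, 1)             # zoonotic share of DNA viruses
```

Using sets per host means repeated reports of the same host–virus pair do not inflate richness, which matches the "unique viral species" wording of the text.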
mammalian orders, Chiroptera (bats), Rodentia (rodents), Primates, Cetartiodactyla (even-toed ungulates), and Perissodactyla (odd-toed ungulates) listed here in order of relative deviance explained, had a significantly greater mean viral richness than predicted by the other variables (Fig. 2e). This finding highlights these taxa as important targets for global viral discovery in wildlife4, and suggests that traits not captured in our analysis (for example, immunological function, social structure, and other life-history variables) may underlie their capacity to harbour a greater number of viral species. Our models to predict total viral richness were comparable when excluding virus–host associations detected by serology, that is, using the ‘stringent data’, and were robust when validated with random cross-validation tests (Extended Data Table 1; Supplementary Table 2). However, we identified several regions that showed significant bias when cross-
Figure 2 | Host traits that predict total viral richness (top row) and proportion of zoonotic viruses (bottom row) per wild mammal species. Partial effect plots show the relative effect of each variable included in the best-fit GAM, given the effect of the other variables (y axis, strength of effect on viruses per host). Shaded circles represent partial residuals; shaded areas, 95% confidence intervals around mean partial effect. a–e, Best model for total viral richness includes: a, number of disease-related citations per host species (research effort, log); b, phylogenetic eigenvector regression (PVR) of body mass (log); c, geographic range area of each species (log km²); d, number of sympatric mammal species overlapping with at least 20% area of target species range; and e, mammalian orders. f–i, Best model for proportion of zoonoses includes: f, research effort (log); g, phylogenetic distance from humans (cytochrome b tree constrained to the topology of the mammal supertree28); h, ratio of urban to rural human population within species range; and i, three mammalian orders. Bats are the only order with a significantly larger proportion of zoonotic viruses than would be predicted by the other variables in the all-data model. Three additional mammalian orders, and whether or not a species is hunted, improved the overall predictive power of the best zoonotic virus model but were non-significant and are not shown (see Extended Data Table 1).
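The model summaries in the text are reported as percentages of deviance explained. For readers less familiar with that statistic, the sketch below computes it as one minus the ratio of residual deviance to null deviance. It assumes a Poisson error family for count data, which is an assumption made here for illustration and not a statement of the authors' model specification; the function names and the toy counts are likewise hypothetical.

```python
import math

def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) = 0."""
    dev = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (log_term - (y - mu))
    return dev

def deviance_explained(observed, fitted):
    """Fraction of null deviance removed by the fitted values."""
    mean_y = sum(observed) / len(observed)
    null_fit = [mean_y] * len(observed)          # intercept-only model
    null_dev = poisson_deviance(observed, null_fit)
    return 1.0 - poisson_deviance(observed, fitted) / null_dev

# Toy viral counts per host (hypothetical values).
counts = [1, 2, 3, 10]
perfect = deviance_explained(counts, counts)                 # saturated fit
null_only = deviance_explained(counts, [sum(counts) / 4] * 4)  # no predictors

# Partition quoted for the total-richness model: research effort plus
# biological factors sums to the total deviance explained.
total_pct = round(31.9 + 17.3, 1)
```

A fit that reproduces every count explains all of the deviance, and an intercept-only fit explains none; reported values such as 49.2% lie between those two extremes.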
Figure 3 | Global distribution of the predicted number of ‘missing zoonoses’ by order. Warmer colours highlight areas predicted to be of greatest value for discovering novel zoonotic viruses. a, All wild mammals (n = 584 spp. included in the best-fit model). b, Carnivores (order Carnivora, n = 55). c, Even-toed ungulates (order Cetartiodactyla, n = 70). d, Bats (order Chiroptera, n = 157). e, Primates (order Primates, n = 73). f, Rodents (order Rodentia, n = 183). Hatched regions represent areas where model predictions deviate systematically for the assemblage of species in that grid cell (approximately 18 km × 18 km, see Methods). Animal silhouettes from PhyloPic.
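The ‘missing zoonoses’ mapped in Figure 3 are described in the text as expected counts under a scenario of increased research effort. One way to read that is: predict a species' count with research effort raised to the maximum observed value, then subtract what has already been observed. The sketch below illustrates only that subtraction. The `predict` callable, the trait names, the square-root effort response and the floor at zero are all assumptions introduced for illustration; they are not the authors' fitted GAM or their handling of negative differences.

```python
import math

def missing_estimate(predict, traits, observed, max_effort):
    """Predicted count at maximum research effort minus the observed count.

    `predict` is any callable mapping a trait dictionary to an expected
    count (a stand-in for a fitted model). The floor at zero is a choice
    made in this sketch.
    """
    scenario = dict(traits, research_effort=max_effort)  # copy with effort raised
    return max(0.0, predict(scenario) - observed)

def toy_predict(traits):
    """Hypothetical stand-in model: count grows with the square root of effort."""
    return traits["base"] * math.sqrt(traits["research_effort"])

species = {"base": 2.0, "research_effort": 4.0}   # hypothetical species
observed_now = 4                                  # viruses recorded so far
missing = missing_estimate(toy_predict, species, observed_now, max_effort=25.0)
```

Because the scenario is tied to the maximum effort seen in the data, estimates of this kind are best read as relative rankings between species or regions, as the text itself cautions.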
validated by excluding mammals from zoogeographic areas, suggesting that there are location-specific factors that remain unexplained in our models (Methods; Supplementary Table 3).
Our best model to predict the number of zoonotic viruses per wild mammal species explained 82% of the deviance, and included phylogenetic distance from humans, the ratio of urban to rural human population across a species range, host order, whether or not a species is hunted, disease-related research effort, and total viral richness (Extended Data Table 1). A large fraction of the deviance explained is driven by the observed total viral richness per host, supporting the biological assumption that the number of viruses that infect humans scales positively with the size of the potential ‘zoonotic pool’19 in each reservoir host. Removing this contribution by including observed total viral richness per host as an offset, the model explains 33% of the total deviance in the proportion of viruses that are zoonotic (Methods), with 30% of total deviance explained by biological factors (Fig. 2f–i). Some mammalian orders had a significant positive (bats) or negative (two ungulate orders) effect on the proportion of zoonotic viruses (Fig. 2i).
A number of previous studies have proposed that bats are special among mammals as reservoir hosts of a large number of recently emerging high-profile zoonoses (for example, SARS, Ebola virus, MERS)12,13,20. Our study tests this hypothesis in the context of all known mammalian viruses and hosts. While other mammalian orders have relatively high proportions of observed zoonoses and others have been poorly studied (Fig. 1a), our model results show that bats are host to a significantly higher proportion of zoonoses than all other mammalian orders after controlling for reporting effort and other predictor variables.
We found that the proportion of zoonotic viruses per species increases with host phylogenetic proximity to humans, and that this relationship is significant even when we removed ‘reverse zoonoses’ primarily associated with transmission from humans to primates (Methods). This is the first time this relationship has been demonstrated using data for all mammals and specifically as a determinant of zoonotic spillover, and is supported by previous taxon-specific studies that have examined host relatedness and parasite/pathogen sharing in primates9,10, bats14 and plants21. The proportion of zoonotic viruses shows some upward drift for mammals that are very phylogenetically distant from humans (Fig. 2g) that may represent an artefact of preferentially screening marsupials for human viruses. While primate species largely drive the phylogenetic effect, our best-fit model excluded the effect of the order Primates as a discrete variable (Fig. 2i), suggesting that continuous variation in phylogenetic distance across primate species is more important, and is significant even when all mammals are included. This finding highlights the need to uncover the mechanism by which phylogeny affects spillover risk, for example, evolutionarily related species sharing host cell receptors and viral binding affinities22,23 and specific viral mutations that may expand host range in related mammal species24.
We tested several measures to estimate human–wildlife contact at a global scale for the 721 wild mammals in our dataset, but only the ratio of urban to rural human population (all data model), the change in human population density, and the change in urban to rural population ratio from 1970–2005 across a species range (stringent data model) were included (Extended Data Table 1). The response curve for urban to rural population suggests that increasing urbanization raises the risk of zoonotic spillover (Fig. 2h), as does increasing human population density and the change in urban to rural population ratio over time. A single global metric of human–wildlife ecological contact did not emerge across models. However, the alternate inclusion of these related variables points to the importance of human–animal contact in defining per-species spillover risk globally, and the need for controlled field experiments and human behavioural risk studies to uncover the mechanisms underlying this risk. Overall, the strength of the effect of phylogenetic proximity was stronger than our proxies for animal–human contact in predicting proportion of zoonoses (30–44% stronger explanatory factor), but both remained significant after controlling for research effort (Extended Data Table 1).
The predominance of zoonoses of wildlife origin in emerging diseases has led to a series of programs to sample wildlife, discover novel viruses, and assess their zoonotic potential4,23,25,26. To inform their scale and scope we calculate the expected number of as-yet undiscovered viruses and zoonoses per host species using our best-fit GAMs and a scenario of increased research effort (Methods, Supplementary Table 4). We then project these ‘missing viruses’ and ‘missing zoonoses’ geographically (Fig. 3, Extended Data Figs 3–8) to identify regions of the world where targeted, future surveillance to find new viruses and zoonoses will be most effective. In the process of translating our non-spatial, species-level predictions to geographic space, we identified several regions where our model predictions of the number of total
Viral families shown in Figure 4a (axis labels, in plotted order): Picobirnaviridae, Hepeviridae, Bornaviridae, Orthomyxoviridae, Filoviridae, Togaviridae, Bunyaviridae, Paramyxoviridae, Flaviviridae, Rhabdoviridae, Arenaviridae, Poxviridae, Reoviridae, Picornaviridae, Hepadnaviridae, Retroviridae, Adenoviridae, Parvoviridae, Polyomaviridae, Coronaviridae, Caliciviridae, Herpesviridae, Circoviridae, Arteriviridae, Papillomaviridae, Astroviridae, Asfarviridae. Axes: a, maximum phylogenetic host breadth, coloured by proportion of family zoonotic (0 to 1); b–d, strength of effect against maximum phylogenetic host breadth (log), PubMed citations (log), and vector-borne or cytoplasmic replication status.
Figure 4 | Traits that predict zoonotic potential of a virus. a, Box plot of maximum phylogenetic host breadth per virus (PHB, see Methods) for each of 586 mammalian viruses, aggregated by 28 viral families. Individual points represent viral species, colour-coded by zoonotic status. Box plots coloured and sorted by the proportion of zoonoses in each viral family. b–d, Partial effect plots for the best-fit GAM to predict the zoonotic potential of a virus. b, Maximum PHB. Viruses that infect a phylogenetically broader range of hosts are more likely to be zoonotic. c, Research effort (log, number of PubMed citations per viral species). d, Whether or not a virus replicates in the cytoplasm or is vector-borne. Viral genome length and whether or not a virus is enveloped improved the overall predictive power but were non-significant and are not shown (see Extended Data Table 1).
and zoonotic viruses were systematically biased (hatched regions in Fig. 3 and Extended Data Figs 3–8; Methods). Local factors contributing to this bias may include geographic variation in the detection probability of human and/or wildlife viruses, indicating areas where additional research and capacity strengthening for viral detection are most needed. Our model predictions were not systematically biased or clustered across host phylogeny (Extended Data Fig. 9).
Geographic hotspots of ‘missing zoonoses’ vary by host taxonomic order, with foci for carnivores and even-toed ungulates in eastern and southern Africa, bats in South and Central America and parts of Asia, primates in specific tropical regions in Central America, Africa, and southeast Asia; and rodents in pockets of North and South America and Central Africa. Areas where ‘missing zoonoses’ predictions were systematically biased varied by taxonomic order, but included large parts of Africa for the all-mammal dataset (Fig. 3a, Extended Data Figs 3–8f). By contrast, the distribution of bias in predicting the ‘missing viruses’ for all mammals was limited to patches of northeastern Asia, Greenland, peninsular Malaysia, and scattered grid cells in western Asia and Patagonia (Extended Data Fig. 3c). We also identify geographic regions with large numbers of mammal species currently lacking any information regarding their viral diversity (Extended Data Figs 3i–8i). In combination, these maps can be used for cost-effective allocation of resources for viral discovery programs, such as the Global Virome Project (D. Carroll et al., submitted).
Finally, a significant challenge to preventing future disease emergence is estimating the zoonotic potential of a newly discovered viral species or strain based on viral traits4–6,27. The best model for determining whether or not a known virus (n = 586 mammalian viruses) has been observed as zoonotic explained 27.2% of total deviance and included maximum phylogenetic host breadth (PHB, a virus-specific trait that measures the phylogenetic range of known hosts, excluding humans), research effort, whether or not a virus replicates in the cytoplasm, is vector-borne, or is enveloped, and average genome length (Fig. 4). Using the ‘stringent’ dataset to define whether a virus is zoonotic resulted in a reduced model that excluded enveloped status and genome length (Extended Data Table 1). Our findings confirm a positive relationship between zoonotic potential and ability to replicate in the cytoplasm7, and that viruses with arthropod vectors may be able to infect a wider range of mammalian hosts5. Our phylogenetically explicit measure of host breadth, PHB, can be used at various hierarchical taxonomic levels to quantify and rank viruses from specialist to generalist, and was the strongest predictor of zoonotic potential (12.4% of total deviance explained). This highlights the value of field programs to identify the natural host range of newly discovered pathogens in order to develop early proxies for their zoonotic potential4. Significant variation in PHB across viral families is suggestive of intrinsic differences in the ability of a virus to infect diverse hosts, and this relates to the proportion of observed zoonoses in each family (Fig. 4a).
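The text describes PHB only as a virus-specific measure of the phylogenetic range of known non-human hosts; the exact formula is given in the Methods and is not reproduced here. As a rough illustration of what a "maximum" host-breadth statistic could look like, the sketch below takes the largest pairwise phylogenetic distance among a virus's non-human hosts. That definition, the function name `max_phb`, the distance values and the host names are all assumptions made for this example, not the authors' implementation.

```python
from itertools import combinations

def max_phb(hosts, distance, exclude=("Homo sapiens",)):
    """Largest pairwise phylogenetic distance among non-excluded hosts.

    `distance` maps frozenset({host_a, host_b}) to a non-negative distance.
    A virus with fewer than two non-human hosts gets a breadth of 0.0.
    """
    nonhuman = sorted(set(hosts) - set(exclude))
    if len(nonhuman) < 2:
        return 0.0
    return max(distance[frozenset(pair)] for pair in combinations(nonhuman, 2))

# Hypothetical pairwise distances between three host species.
distance = {
    frozenset({"Bat sp.", "Rodent sp."}): 0.8,
    frozenset({"Bat sp.", "Primate sp."}): 0.9,
    frozenset({"Rodent sp.", "Primate sp."}): 0.6,
}

generalist = max_phb(["Bat sp.", "Rodent sp.", "Primate sp.", "Homo sapiens"], distance)
specialist = max_phb(["Bat sp.", "Homo sapiens"], distance)
```

Under this reading, a virus recorded from distantly related hosts scores higher than one known from a single host, which is the specialist-to-generalist ranking the text refers to. Excluding humans keeps the predictor from trivially encoding the zoonotic status it is meant to predict.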
We acknowledge several important caveats in this study. First, our estimates of missing viruses and missing zoonoses per species are based on the current maximum observed research effort from the literature, and these estimates should be viewed as relative, not absolute. The true size of the undiscovered mammalian virome will probably increase with new genetic tools for unbiased viral discovery and in-depth studies that repeatedly sample wildlife populations over time25. Second, our ecological and biological predictor variables only explain a portion of the total variation in viral richness per host and zoonotic potential based on viral traits, although this is greater than that reported in comparable order-specific studies10,12. Third, while we control for research effort we cannot account for viruses or host associations that have completely evaded human detection to date, nor those identified but not published. Additional resources to support better data sharing and on-the-ground viral surveillance in the species and regions we identify would help validate predictive models to identify zoonotic viral hotspots, and streamline costly efforts to develop measures to prevent their future emergence.
The analyses reported herein have broad potential to assist in expediting viral discovery programs for public health. Our host-specific analyses and estimates of missing zoonoses allow us to identify which species and regions should be preferentially targeted to characterize the global mammalian virome. Our viral trait framework then allows prioritization of newly discovered wildlife viruses for detailed characterization (for example, by sequencing receptor-binding domains, and conducting in vitro and in vivo infection experiments23) to assess their potential to threaten human health.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.
Received 5 January 2016; accepted 24 May 2017.
Published online 21 June 2017.
12. Luis, A. D. et al. A comparison of bats and rodents as reservoirs of zoonotic viruses: are bats special? Proc. R. Soc. Lond. B 280, 20122753 (2013).
13. Brierley, L., Vonhof, M. J., Olival, K. J., Daszak, P. & Jones, K. E. Quantifying global drivers of zoonotic bat viruses: a process-based perspective. Am. Nat. 187, E53–E64 (2016).
14. Streicker, D. G. et al. Host phylogeny constrains cross-species emergence and establishment of rabies virus in bats. Science 329, 676–679 (2010).
15. Han, B. A., Schmidt, J. P., Bowden, S. E. & Drake, J. M. Rodent reservoirs of future zoonotic diseases. Proc. Natl Acad. Sci. USA 112, 7039–7044 (2015).
16. Fauquet, C., Mayo, M. A., Maniloff, J., Desselberger, U. & Ball, L. A. Virus taxonomy: Eighth Report of the International Committee on Taxonomy of Viruses. (Elsevier Academic Press, 2005).
17. Dunn, R. R., Davies, T. J., Harris, N. C. & Gavin, M. C. Global drivers of human pathogen richness and prevalence. Proc. R. Soc. Lond. B 277, 2587–2595 (2010).
18. Levinson, J. et al. Targeting surveillance for zoonotic virus discovery. Emerg. Infect. Dis. 19, 743–747 (2013).
19. Morse, S. S. in Emerging Viruses (ed. Morse, S. S.) 10–28 (Oxford University Press, 1993).
20. Zhou, P. et al. Contraction of the type I IFN locus and unusual constitutive expression of IFN-α in bats. Proc. Natl Acad. Sci. USA 113, 2696–2701 (2016).
21. Parker, I. M. et al. Phylogenetic structure and host abundance drive disease pressure in communities. Nature 520, 542–544 (2015).
22. Longdon, B., Brockhurst, M. A., Russell, C. A., Welch, J. J. & Jiggins, F. M. The evolution and genetics of virus host shifts. PLoS Pathog. 10, e1004395 (2014).
23. Ge, X.-Y. et al. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535–538 (2013).
24. Organtini, L. J., Allison, A. B., Lukk, T., Parrish, C. R. & Hafenstein, S. Global displacement of canine parvovirus by a host-adapted variant: A structural comparison between pandemic viruses with distinct host ranges. J. Virol. 89, 1909–1912 (2015).
25. Anthony, S. J. et al. A strategy to estimate unknown viral diversity in mammals. MBio 4, e00598–13 (2013).
26. Drexler, J. F. et al. Bats host major mammalian paramyxoviruses. Nat. Commun. 3, 796 (2012).
27. Geoghegan, J. L., Senior, A. M., Di Giallonardo, F. & Holmes, E. C. Virological factors that increase the transmissibility of emerging human viruses. Proc. Natl Acad. Sci. USA 113, 4170–4175 (2016).
28. Fritz, S. A., Bininda-Emonds, O. R. P. & Purvis, A. Geographical variation in predictors of mammalian extinction risk: big is bad, but only in the tropics. Ecol. Lett. 12, 538–549 (2009).
Methods
No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.
Database. To construct the mammal–virus association database we initially extracted all viruses listed as occurring in any mammal from the International Committee on Taxonomy of Viruses database (ICTVdb), and further individually went through each virus listed in the ICTV 8th edition master list and searched the literature for mammalian hosts. All viral species names were synonymized to ICTV 8th edition, which was the global authority on viral taxonomy at the start of our data collection in 2010 (ref. 16). From 2010–15 the authors and a team of research assistants and interns at EcoHealth Alliance compiled mammal species associations for each of 586 unique viruses published in the literature between 1940–2015 initially by using the virus name and synonyms as the search keywords in the major online reference databases (Web of Science, PubMed, and Google Scholar) in addition to searching in books, reviews, and literature cited in sources we had already obtained. To narrow the search for hosts for well-researched viruses, we additionally included the terms ‘host(s)’, ‘reservoir’, ‘wildlife’, ‘animals’, ‘surveillance’, and other relevant terms to find publications related to host range. Associations were cross-checked for completeness with the Global Mammal Parasite Database for primate, carnivore and ungulate viruses, version as of Nov 2006 (GMPD, http://www.mammalparasites.org)29 and other published reviews specific to bats and rodents12,30,31. We excluded all records without species-level host information, and those where we could not track down the primary references. Records of mammal–virus associations from experimental infection studies, zoological parks or captive breeding facilities, or cell culture discoveries were excluded. Host species were defined as domestic or wild following the list of domestic animal species from the Food and Agriculture Organization (FAO)32, and we removed the black rat (Rattus rattus) and domestic mouse (Mus musculus) from the domesticated list as these two species make up their own ‘peri-domestic’ category. Host species were categorized as either occurring in human modified habitats or being hunted by humans (both estimates for human contact) according to the IUCN Red List species descriptions33.
To control for the fact that some detection methods are more reliable than others in identifying the pathogen of interest, we recorded the detection method used for each host–virus association and scored these as 0, 1, or 2 according to the reliability of detection method used. Viral isolation and PCR detection with sequence confirmation were scored as a 2 (=stringent data); and serological methods were scored as a 0 or 1, with viral or serum neutralization tests (=1), and enzyme-linked immunoassays (ELISA), antigen detection assays, and other serological assays scored as (=0). ‘Stringent data’ were analysed separately to remove potential uncertainty owing to cross-reactivity with related viruses. We exhaustively searched the literature to identify a stringent detection for each mammal–virus pair, and only included the serological finding for that pair if no molecular or viral isolation studies were available. We partitioned data and conducted separate analyses for the entire data set (0 + 1 + 2 detection quality) and the stringent data (score of 2) to
measles virus; mumps virus)34,35. We present these additional analyses excluding reverse zoonoses and associated code at http://doi.org/10.5281/zenodo.596810.
Total viral richness was calculated as the number of unique ICTV-recognized viruses found in a given host species, and zoonotic viral richness was defined as the number of unique ICTV-recognized viruses in a given host species that were also detected in humans in our database.
To assess research bias for both host and virus, we searched ISI Web of Knowledge, including Web of Science and Zoological Record, and PubMed for the number of research publications for a given host or pathogen. We recorded two values for the number of research papers for a host. The first was a simple search by scientific binomial in Zoological Abstracts where we recorded the number of papers published between 1940–2013 for each host species. We also recorded the number of disease-related publications for each species using the scientific binomial AND topic keyword: disease* OR virus* OR pathogen* OR parasit*. The * operator was used in our search criteria to capture all words that begin with each term, for example, ‘parasit*’ would return hits for ‘parasite’, ‘parasites’, and ‘parasitic’. These search criteria broadly included papers that examined disease or diseases, virus or viruses, pathogen or pathogens, parasite or parasites, or parasitology, for each species. Only one measure of per-host research effort was included at a time in model selection. As these metrics are highly correlated and the number of disease-related citations per host outperformed the total number of publications per host in all but one model (all-data zoonoses), we decided to use disease-related publications as our per-species research effort measure for all models to improve interpretability. We also recorded the number of publications for each of 586 virus species using a keyword search by virus name in PubMed and Web of Science. Only one measure of per-virus research effort was included at a time in model selection.
We used a phylogenetically corrected measure of body mass (see details below under ‘Phylogenetic signal’) as our main life history predictor variable, because it was the only one for which a nearly complete dataset existed for the species in our dataset. We used the body mass recorded in the PanTHERIA database36 for 709 species. For 3 species, we used the second choice option, body mass recorded in the AnAge database37. For 11 species, we used the third choice option of the extrapolated body mass recorded in PanTHERIA, which is based on body length or forearm length, depending on species. For 36 species, we used the average body mass for members of the genus that had a recorded body mass. We explored other life-history variables related to longevity38, reproductive success, and basal metabolic rate but these were ultimately excluded owing to the high number of missing records.
Phylogenetic signal. We address the issue of non-independence of host species traits owing to shared ancestry39 in our analyses by first quantifying the phylogenetic signal for each variable in our model using Blomberg’s K40. Blomberg’s K measures phylogenetic signal in a given trait by quantifying trait variance relative to an expectation under a Brownian motion null model of evolution using a phylogenetic tree with varying branch lengths. Blomberg’s K-values are scaled from 0 to infinity, with a value of 0 equal to no phylogenetic signal and values greater than 1 equal to strong phylogenetic signal for closely related species
reduce the noise from potential serological cross-reactivity. Full list of host–virus that share more similar trait values. While there is no clearly defined K value cut-
associations, detection methods, and associated references are provided in our data off in which to apply phylogenetic comparative methods, non-significant value
and code repository at http://doi.org/10.5281/zenodo.596810. of <1, or more conservatively <0.5, are typical for traits that are phylogeneti-
Our operational definition of a zoonotic virus includes any virus that was cally independent. The only host variables we examined with significant K values
detected in humans and at least one other mammalian host in at least one primary >0.5 were host body mass, and our direct measure of phylogenetic distance to
publication, and does not imply directionality. Our complete dataset of mammalian humans. While there are several tools available to control for phylogeny in multi-
viral associations demonstrates evidence of past or current viral infection which variate analyses, for example, using phylogenetic generalized least square models
we believe is a reasonable proxy for measuring spillover, and our stringent dataset (for example, PGLS)41, there is currently no modelling approach to control for
specifically is more robust to exclude species that may have been exposed to a phylogeny using GAMs. More importantly, a wholesale effort to control for
given virus versus those that show some evidence for replication within the host phylogeny across all variables in our analysis was not appropriate here, as we
species. Our bi-directional definition of spillover follows a proposal by the WHO are explicitly testing the relative importance of phylogenetic distance to humans
that defines a zoonosis as “any disease or infection that is naturally transmissible versus other host traits including measures of human–wildlife contact to predict
from vertebrate animals to humans and vice-versa” (http://www.who.int/zoonoses/ the p roportion of zoonotic viruses for a given host species. This left body mass as
en/) and excludes any human pathogens that recently evolved from nonhuman the only variable in our models, excluding our direct measures of phylogenetic
pathogens (for example, HIV in primates), as per Woolhouse and Gowtage- distance, with a significant Blomberg K value that was greater than 1. We con-
Sequeria (2005) (ref. 1). trolled for the significant effect of shared evolutionary history using a phylogenetic
In order to address influence of transmission from humans to wildlife in our eigenvector regression (PVR)42,43 on body mass. The PVR approach allowed us to
models, we also ran our GAM model fitting and selection procedure (see below) remove phylogenetic signal for any phylogenetically non-independent v ariables
on a subset of data that excluded any probable ‘reverse zoonotic’ viruses. We first and then include the corrected values back in our GAMs, while retaining predictor
searched our entire dataset and removed any clear instances of transmission variables like phylogenetic distance to humans as unmodified. We calculated
from humans to primates, for example, including records from zoological parks PVR for body mass using the R package PVR and our custom-build maximum
and wildlife rehabilitation centres (as previously noted). We then additionally likelihood host phylogeny using cytochrome b sequences constrained to the
removed several human viruses most commonly transmitted from humans back order-level topology of the mammalian supertree28,44. Our new variable for body
to non-human primates to create a subset of data without the most common mass that controls for phylogenetic signal (PVRcytb_resid) removed most of the
reverse zoonotic viruses (adeno-associated virus-2; human adenovirus D; human phylogenetic signal, with K = 3.5 unadjusted, and K < 0.5 after PVR correction.
herpesvirus 4; human metapneumovirus; human respiratory syncytial virus; Our new metric of body mass scales in the same way, with larger values equal to
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
RESEARCH Letter
species with larger body mass. PVR body mass was included in our GAM model selection for the total viral richness and zoonotic virus models.
Host phylogenetic analysis and phylogenetic host breadth. We used two different mammal phylogenetic trees in our analyses and used a model selection framework to determine which best explained our observed association with zoonotic viral richness. First the mammal supertree was pruned in R (package ape, function drop.tip) to include only synonymous species for the 753 species in our database28,45. We synonymized all host species names between the mammal supertree and the host associations in our database using the IUCN Red List33. If the species was listed as ‘cattle’ it was assumed to be Bos taurus; all other records were excluded if there was ambiguity as to the scientific name for the host species. Second, a maximum likelihood cytochrome b tree was generated using the constraint of a multifurcating tree with taxa constrained to their respective orders and the order-level topology matching that of the mammal supertree6, as per this Newick tree file: (MONOTREMATA,((DIDELPHIMORPHIA,(DIPROTODONTIA,PERAMELEMORPHIA)),(PROBOSCIDEA,((PILOSA,CINGULATA),((((RODENTIA,LAGOMORPHA),(PRIMATES,SCANDENTIA)),((((CETARTIODACTYLA,PERISSODACTYLA),CARNIVORA),CHIROPTERA),EULIPOTYPHLA))))))). This generated a higher-resolution species-level mammal tree using cytochrome b data, with more reliable positioning of the higher-level taxonomic relationships than was obtained in exploratory phylogenetic analyses using cytochrome b data alone. GenBank accession numbers and cytochrome b sequence lengths for each species are provided in our data and code repository. Cytochrome b gene fragments ranged from 143 to 1,140 bp, with >1,000 bp available for 558/665 (84%) of the taxa. Data derived from the cytochrome b tree constrained to the topology of the mammal supertree was selected as the best option in all best-fit GAMs.
Sequences were aligned using MUSCLE with default settings in Geneious R6, and checked visually for errors46. The best maximum likelihood trees with and without the constraint tree were generated using RAxML-HPC2 on XSEDE via the CIPRES Science Gateway server v.3.1 (ref. 47) using a GTR model with parsimony seed, 1,000 bootstrap replicates, and the following specific parameters (raxmlHPC-HYBRID -s infile -n result -x 12345 -g constraint.tre -N 1000 -c 25 -p 12345 -f a -m GTRCAT).
Matrices of pairwise patristic distances between all species, including Homo sapiens, were calculated from the two phylogenies using the ‘cophenetic’ function in the R package ape45. Phylogenetic trees (Newick format for pruned supertree and cytochrome b tree) and matrices of phylogenetic distance from humans are provided in the data and code repository.
We calculated mean, median, max., min., IQR, and standard deviation (represented as generic function F in equation (1)) of phylogenetic host breadth (PHB) from all known mammalian hosts for each virus using the pairwise patristic distances ($d_{i,j}$) for each mammal–mammal association for all hosts of a given virus excluding humans, where i indexes each mammal in the database, as does j, and J represents the total mammals in the database. We aggregated these PHB values using mean, median, or maximum values at a viral species, genus and viral family level to generate higher-level taxonomic variables of host breadth per viral group. Our measure is similar to those developed by previous studies to understand parasite host specificity48–50, but here we create a generalizable variable to measure viral host breadth that can be aggregated at different viral taxonomic levels.
$$\mathrm{PHB}_i = F_{j=0}^{J}\left(d_{i,j}\right) \qquad (1)$$
To make Extended Data Fig. 9, taxon names and terminal branches of the cytochrome b tree constrained to the supertree were colour-coded using the residual from the best-fit zoonotic virus GAM (predicted minus observed zoonotic viral richness) for wildlife species, and plotted using the plot.phylo function in the R package ape45. Symbols (circles) were additionally added at terminal taxa to better visualize residual value colours, using the willeerd.nodelabels function (http://dx.doi.org/10.5281/zenodo.10855). All marine mammals, domestic animals, and other taxa with missing data were coded as grey for missing data.
The viral richness heat map (Extended Data Fig. 2) was generated using the R package pheatmap, and the ‘complete’ hierarchical clustering algorithm to sort cells across rows and columns by similar values of viral richness. All box plots, histograms and all other figures were generated in R v.3.3.0 (ref. 51). R code for primary figure generation is provided in the code repository.
GAM fitting and selection. We fit a set of generalized additive models (GAMs) that included all of our selected potential variables explaining the number of total viruses or number of zoonoses in hosts, as well as whether viruses were zoonotic (for conceptual framework and summary of each GAM see Extended Data Fig. 1; for full variable list and data sources see Supplementary Table 1). Our use of GAMs, an incorporation of smooth spline predictor functions into the generalized linear model (GLM) framework, allowed us to examine the functional form of our predictor variables (for example, Figs 2 and 4). Categorical and binary variables (for example, host order, IUCN status of hunted or not, and certain viral traits) were fit as random effects of each variable level. We used automated term selection by double penalty smoothing52 to eliminate variables from the models. This method removes variables with little to no predictive power and has been shown to be comparable or superior to comparing alternate models with and without variables. We did use the model comparison method for domestic animals, where the sample size was not sufficient for fitting all variables. In this case dropping variables by double penalty smoothing still allowed pruning the model list to eliminate redundant models. Where there were competing variables measuring the same mechanistic effect, we fit alternate GAMs using only one of each of these variables (as specified below and in Extended Data Fig. 1). These included phylogenetic variables, citation counts from alternate databases, and different measures of human population/host overlap. For example, to capture host phylogeny we used phylogenetic distance based on either the mammal supertree20 or a purpose-built cytochrome b tree constrained by the topology of the mammal supertree, but never both in the same model. For human population variables, we looked at either variables measuring overlap of species range with human-occupied areas, or human population in those areas, as area- and population-based measures were highly co-linear. For citation variables, we looked at either all citations or the number of disease-related citations for each host species, not both, and similarly citations in either PubMed or Web of Knowledge. We used a binomial GAM to analyse the 586 mammalian viruses in our database and identify viral traits that may serve as predictors of zoonotic potential. Co-linearity was not a major issue among variables included in the same model.
We inspected models within 2 AIC units of the model with the lowest AIC, and present the outputs of the best-fit and all other top models (<2 ΔAIC) in our data and code repository. In general, variable effects retained the same functional form and effect size across models within 2 ΔAIC; differences were limited to the adding or dropping of very weak, insignificant effects, or switching between highly correlated competing variables such as citation counts from different databases.
For our model of number of zoonoses per host, we used the total number of observed viruses per host as an offset, effectively fitting a model of proportion of zoonotic viruses per host. We found this variable had a coefficient near to one when it was used as a linear predictor, indicating its appropriateness as an offset.
We repeated the model selection process for all models using the more stringent set of data that used only viruses identified in mammal hosts using viral isolation, PCR, or other methods of nucleic acid sequence confirmation, that is, that excluded all associations detected via serology.
All models were fit using the MGCV package for R (version 1.8-12). We used the model with the lowest AIC to predict the number of expected zoonotic viruses for each host species, using all the data from our database that had complete observations for the best model. Our top models consistently outperform the alternatives by wide margins, as measured by AIC. We used standard methods in the R package MGCV to calculate deviance explained, which is defined as (D_null − D_model)/D_null. In this formula, D_null is the deviance (−2 × log-likelihood) of an intercept-only (or, in the case of the zoonoses model, offset-only) model, while D_model is the deviance of our best-fit model.
Analyses were limited to terrestrial mammal species as defined by the IUCN Red List (marine mammals were excluded) and we ran separate analyses for wild and domestic animals. As domestic animals made up a much smaller dataset (n = 32 species) with a unique set of explanatory variables that differed from the wild species analyses, these models were fit separately. Domestic species results are also discussed separately (see Supplementary Discussion) as they are tangential to the primary findings.
Model cross-validation. We used k-fold cross-validation to evaluate goodness of fit for all models. The data were divided into ten folds, selected randomly. For each fold, the model was re-fit based on the other nine folds, and goodness of fit was assessed by conducting a nonparametric permutation test comparing the predicted values versus the real values for the kth fold, where a non-significant result indicates that predictions are unbiased. Goodness of fit for Poisson models may be compared via a parametric χ² permutation test on deviance values, but this test is inappropriate in the case of models with low mean values, as is our case for some of our GAMs53. The k-fold cross-validation confirmed the robustness of our model predictions for wild mammals; code and outputs from these tests for each best-fit GAM are provided in Supplementary Table 2.
In addition to randomly selected k-fold cross-validation, we evaluated the robustness of our models via a non-random geographic cross-validation; code and a summary document are provided in our code and data repository. In order to meaningfully organize species in our dataset by geographic areas, we used the 34 zoogeographic regions for terrestrial mammals recently redefined by Holt et al.54. Using QGIS55, a mammal-specific zoogeographical shapefile provided by Holt’s group
at the University of Copenhagen (http://macroecology.ku.dk/resources/wallace) was intersected (using QGIS Vector > Geoprocessing Tools > Intersect) with a shapefile of IUCN’s host ranges for all mammals in our database. Areas of these intersections were then calculated using an equal-area projection (Mollweide), and each host was assigned to only the region that contained the greatest proportion of its range. We systematically removed all observations (species) from each given zoogeographical region, re-fit the model using all observations from outside the region, then performed a non-parametric permutation test comparing the predicted values to the observed values for that region. Non-significant results indicate that model predictions are unbiased. Significant results for a given zoogeographic region suggest that there are location-specific biases that remain unexplained. This systematic zoogeographic cross-validation supported the overall robustness of our model predictions for several models, that is, all-data zoonoses, all-data total viral richness, and stringent-data total viral richness models. For these models, even though a majority of zoogeographic regions were unbiased, we still identified several zoogeographic regions that showed significant bias. Our zoogeographic cross-validation was equivocal for the stringent-data zoonoses model, with eight regions that showed evidence of bias and seven regions which showed no evidence of bias (Supplementary Table 3).
The presence of biased regions in our zoogeographic cross-validation suggested the possibility that there is a systematic bias associated with geography not captured by the predictor variables in our models. To further investigate this, we added zoogeographical region as a categorical random effect to each of our best-fit models. For three of our best-fit GAMs (all-data total viruses, stringent-data total viruses, and stringent-data zoonoses) the addition of zoogeographical region as a categorical random effect decreased the model AIC and increased the total deviance explained by 3–5%. The all-data zoonoses model, which was used to create the series of maps in the main manuscript, does not improve with the inclusion of zoogeographical region. However, the improved predictive power of models using region-specific terms is offset by the increase in degrees of freedom (that is, if we included 31 zoogeographic regions as separate terms) and, more importantly, a decreased interpretability of our models, especially when compared to the geographical variables we used, such as host area or species range overlap with human modified habitat. We opted not to include these random effects in our final GAMs in favour of keeping only variables interpretable in the context of our host trait-specific framework. Instead, we indicate areas of geographic bias directly on our spatially mapped outputs. (See ‘Calculating and visualizing missing viruses and missing zoonoses’, below.) Summaries of these models, along with changes in relative deviance explained for the other explanatory variables when zoogeographic region is added as a random effect, are provided in our code and data repository.
Spatial variables. For all the wildlife hosts we used the geographic range information obtained from the IUCN spatial database version 2015.2. Wildlife host species shapefiles needed to replicate the analysis are hosted on our Amazon S3 storage (https://s3.amazonaws.com/hp3-shapefiles/Mammals_Terrestrial.zip)33. IUCN depicts species’ range distributions as polygons based on the extent of occurrence (EOO), which is defined as the area contained within a minimum convex hull around species’ observations or records. This convex hull or polygon is further improved by including areas known to be suitable or by removing unsuitable or unoccupied areas based on expert knowledge. To accurately calculate the area in km² of each host species we projected the polygons to an equal area projection (Mollweide).
We calculated various thresholds of mammal sympatry based on percentage of range overlap for each wild species in our database using IUCN shape files for all mammals globally. We define mammal sympatry as the number of mammalian species that overlap with the target species’ geographic range. We calculated mammal sympatry for each wild species in our database at six different thresholds based on the percentage area overlap with the target species’ geographic range, that is, the number of other wild mammal species with any (>0%), ≥ 20%, ≥ 40%, ≥ 50%, ≥ 80%, or 100% range overlap. The six different thresholds for mammal sympatry were included as competing terms in our model selection for the total viral richness models.
We derived and tested several global measures to estimate the level of human contact with each wild species in our database. To estimate the area of host geographic range covered by crops, pastures, rural and urban areas (as measures of global human contact with a given wildlife species), each species polygon was intersected (overlapped) with spatial data representing those land cover types. Additionally, we calculated the total number of people within each host geographic range using data from the HYDE database56, and also separately totalled the number of people in rural and urban populations. We obtained data on the distribution of cropland, pastures, rural and urban areas also from the HYDE database56 for the years 1970, 1980, 1990, 2000 and 2005 with a spatial resolution of 5 × 5 arc minutes, equivalent to 10 km by 10 km at the equator. These datasets were created by combining information from satellite imagery and sub-national crop and pasture statistics56. In our GAMs, we used several transformations of these variables as competing proxies for human–wildlife contact: the log-transformed area of host range that overlapped each type of human-modified land cover, log-transformed human population in the host range, log-transformed human population density in the host range, and the log-ratio of urban and rural human populations in the host range. For each of these, we also included as a variable the change in value from 1970 to 2005. Human–wildlife contact variables that significantly covaried were excluded (set as competing terms) during the model selection process. The ratio of urban to rural human population was used to disentangle variables of human–wildlife contact that significantly covaried. For example, the total area of a species range that overlapped with urban and rural areas was highly correlated with the total geographic area variables we examined (for example, total area, and area in crop, pasture, rural, and urban). The ratio of urban to rural population allowed us to separate these signals and best represent this proxy of per-species human–wildlife contact. All spatial analyses were performed in R (3.3.2)51, using the following R libraries: raster57, rgdal58, and sp58.
Calculating and visualizing missing viruses and missing zoonoses. We used each respective best-fit, all-data GAM from the total viral richness and proportion zoonoses models to calculate the estimated number of viruses that would be observed if the research effort variable for each species was equal to that of the most-studied wild species in our database (Vulpes vulpes with 4,433 total publications and 1,477 disease-related publications). We used the prediction of the total virus richness GAM as the offset for the zoonoses GAM. We then calculated the missing viruses and missing zoonoses by subtracting the observed number of viruses and zoonoses from the predictions based on maximum research for each wild mammalian species.
We used geographic range maps from the IUCN spatial database (2015.2) to visualize the spatial distribution of observed host–virus associations, observed host–zoonoses associations, these associations as predicted under maximum research, and the maximum predicted minus the observed viruses, or the missing viruses and missing zoonoses (for example, Fig. 3; Extended Data Figs 3–8; Supplementary Table 4). We also generated maps comparing species richness of all species in the IUCN database against those with viral associations in our database. For each species, the distribution range was converted to a grid system with cells 1/6 of a geographic degree (approximately 18 km × 18 km at the equator line). Each grid cell was assigned a value of one to indicate presence. We repeated this process and assigned the observed and predicted-under-maximum-effort number of zoonotic viruses to their corresponding grid cells. Viral and host species richness maps, and both the missing viruses and missing zoonoses maps were calculated by overlaying individual grids. Each richness map represents the sum of all values for a given grid cell. We repeated the process for all the host species in our database and created viral and species richness maps for the following orders: Carnivora, Cetartiodactyla, Chiroptera, Primates and Rodentia. These taxa were selected because they represent 681/736 (92.5%) of wild mammal species in our database.
In the process of translating our non-spatial, species-level predictions to geographic space (that is, layered raster maps), we identified several geographic areas where our model predictions of the number of total and zoonotic viruses were systematically biased, that is, P < 0.05 (Supplementary Table 3). In order to visualize the geographic biases of our non-spatial model predictions in our maps (see above regarding zoogeographic cross-validation), we demarcate regions with significant bias with hatching. Hatched regions represent areas where model predictions of total or zoonotic viral richness deviate systematically for the collection of species in that grid cell. For each grid cell we calculated whether the bias exceeded that expected from a random sampling of hosts. This was accomplished by summing the residuals from 100,000 random draws of species in our dataset, each equal in size to the number of species present in that grid cell, then identifying grid cells where the observed bias was outside the middle 95% of the randomly drawn distribution. We calculated this for all mammals, and separately for each order across all grid cells. Areas with observed bias (outside of 95% of the randomly drawn distribution) are shown with hatched regions on each missing virus and missing zoonoses map.
Animal images used in figures. Animal silhouettes added to Figs 1 and 3 and Extended Data Figs 1 and 2 to visually represent each mammalian order were downloaded from PhyloPic (http://www.phylopic.org). Images used to represent the orders Chiroptera, Cingulata, Diprotodontia, Lagomorpha, Peramelemorphia and Primates were available for use under the Public Domain Dedication license. Images used to represent the orders Carnivora and Rodentia (by R. Groom), Didelphimorphia, Pilosa, and Proboscidea (by S. Werning), Eulipotyphla (by C. Rebler), Cetartiodactyla and Perissodactyla (by J. A. Venter, H. H. T. Prins, D. A. Balfour & R. Slotow and vectorized by T. M. Keesey) were provided under a Creative Commons license (https://creativecommons.org/licenses/by/3.0/). We created the silhouette used to represent the order Scandentia.
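As a rough illustration of the grid-cell bias test described under ‘Calculating and visualizing missing viruses and missing zoonoses’, the sketch below draws random sets of species residuals and flags a cell whose observed residual sum falls outside the middle 95% of the random sums. It is written in Python with the standard library only and is not the R code from the authors' repository. The function name `cell_is_biased`, the toy residuals, and the reduced default number of draws are assumptions made for illustration. The text does not state whether species were drawn with or without replacement; sampling without replacement is assumed here.

```python
import random


def cell_is_biased(all_residuals, cell_residuals, n_draws=10_000, seed=0):
    """Return True if the summed residuals of the species in one grid cell
    fall outside the middle 95% of sums from random draws of equally many
    species (hypothetical helper; the paper reports 100,000 draws)."""
    rng = random.Random(seed)
    k = len(cell_residuals)
    observed = sum(cell_residuals)
    # Null distribution: residual sums for k species drawn at random
    # (without replacement, an assumption) from the full dataset.
    null_sums = sorted(sum(rng.sample(all_residuals, k)) for _ in range(n_draws))
    lower = null_sums[int(0.025 * n_draws)]
    upper = null_sums[int(0.975 * n_draws) - 1]
    return observed < lower or observed > upper


# Toy residuals (predicted minus observed) for 200 hypothetical species.
toy_rng = random.Random(42)
residuals = [toy_rng.gauss(0.0, 1.0) for _ in range(200)]

# A cell whose species are all strongly over-predicted.
print(cell_is_biased(residuals, [3.0, 2.5, 3.5, 2.8]))
# A cell with small, mixed-sign residuals.
print(cell_is_biased(residuals, [0.1, -0.2, 0.05, 0.0]))
```

Fixing the seed keeps the flagged cells reproducible between runs; in a mapping workflow the same check would be repeated for every grid cell and, as in the text, separately for each order.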
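The phylogenetic host breadth (PHB) summary of equation (1) can be sketched in a few lines. The example below is a hedged illustration in Python (standard library only), not the authors' R implementation. The distance values, the host list, and the helper `phylogenetic_host_breadth` are hypothetical. Counting each unordered pair of hosts once, and returning 0.0 for a virus with fewer than two non-human hosts, are assumptions about details the text leaves open.

```python
from itertools import combinations
from statistics import mean, median


def phylogenetic_host_breadth(hosts, dist, summary=mean):
    """Summarise pairwise patristic distances among a virus's non-human
    mammalian hosts with a generic function F (mean, median, max, ...)."""
    hosts = sorted(h for h in set(hosts) if h != "Homo sapiens")
    pair_dists = [dist[frozenset(pair)] for pair in combinations(hosts, 2)]
    # With fewer than two non-human hosts there is no pair to summarise;
    # returning 0.0 is an assumption, not taken from the paper.
    return summary(pair_dists) if pair_dists else 0.0


# Hypothetical patristic distances (arbitrary units) between three hosts.
dist = {
    frozenset({"Pteropus alecto", "Pteropus vampyrus"}): 0.2,
    frozenset({"Pteropus alecto", "Sus scrofa"}): 1.4,
    frozenset({"Pteropus vampyrus", "Sus scrofa"}): 1.5,
}
hosts = ["Pteropus alecto", "Pteropus vampyrus", "Sus scrofa", "Homo sapiens"]

print(phylogenetic_host_breadth(hosts, dist, mean))
print(phylogenetic_host_breadth(hosts, dist, median))
print(phylogenetic_host_breadth(hosts, dist, max))
```

Passing the summary function as an argument mirrors the generic F in equation (1), so the same helper covers the mean, median and maximum variants mentioned in the text.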
Data availability. All datasets (host traits, viral traits, full list of host–virus associations and associated references, phylogenetic trees, and phylogenetic distance matrices) needed to fully replicate and evaluate these analyses are provided at http://doi.org/10.5281/zenodo.596810. The top-level README.txt file in the directory details the file structure and metadata provided.
Code availability. All R code and R package dependencies needed to fully replicate and evaluate these analyses are provided at http://doi.org/10.5281/zenodo.596810.

29. Nunn, C. L. & Altizer, S. M. The global mammal parasite database: An online resource for infectious disease records in wild primates. Evol. Anthropol. 14, 1–2 (2005).
30. Olival, K. J., Epstein, J. H., Wang, L. F., Field, H. E. & Daszak, P. in New Directions in Conservation Medicine: Applied Cases of Ecological Health (eds Aguirre, A. A., Ostfeld, R. S. & Daszak, P.) Ch. 14, 195–212 (Oxford University Press, 2012).
31. Calisher, C. H., Childs, J. E., Field, H. E., Holmes, K. V. & Schountz, T. Bats: important reservoir hosts of emerging viruses. Clin. Microbiol. Rev. 19, 531–545 (2006).
32. Scherf, B. D. World Watch List for Domestic Animal Diversity. 3rd edn (Food and Agriculture Organization of the United Nations, 2000).
33. IUCN. The IUCN Red List of Threatened Species. Version 2014.1, http://www.iucnredlist.org (2014).
34. Epstein, J. H. & Price, J. T. The significant but understudied impact of pathogen transmission from humans to animals. Mt. Sinai J. Med. 76, 448–455 (2009).
35. Messenger, A. M., Barnes, A. N. & Gray, G. C. Reverse zoonotic disease transmission (zooanthroponosis): a systematic review of seldom-documented human biological threats to animals. PLoS One 9, e89055 (2014).
36. Jones, K. E. et al. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 90, 2648 (2009).
37. de Magalhães, J. P. & Costa, J. A database of vertebrate longevity records and their relation to other life-history traits. J. Evol. Biol. 22, 1770–1774 (2009).
38. Cooper, N., Kamilar, J. M. & Nunn, C. L. Host longevity and parasite species richness in mammals. PLoS One 7, e42190 (2012).
39. Felsenstein, J. Phylogenies and the comparative method. Am. Nat. 125, 1–15 (1985).
40. Blomberg, S. P., Garland, T., Jr & Ives, A. R. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57, 717–745 (2003).
41. Grafen, A. The phylogenetic regression. Phil. Trans. R. Soc. Lond. B 326, 119–157 (1989).
42. Diniz-Filho, J. A. F. et al. On the selection of phylogenetic eigenvectors for ecological analyses. Ecography 35, 239–249 (2012).
43. Diniz-Filho, J. A. F., de Sant’Ana, C. E. R. & Bini, L. M. An eigenvector method for estimating phylogenetic inertia. Evolution 52, 1247–1262 (1998).
44. Bininda-Emonds, O. R. P. et al. The delayed rise of present-day mammals. Nature 446, 507–512 (2007).
45. Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
46. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
47. Stamatakis, A., Hoover, P. & Rougemont, J. A rapid bootstrap algorithm for the RAxML Web servers. Syst. Biol. 57, 758–771 (2008).
48. Cuthill, J. H. & Charleston, M. A. A simple model explains the dynamics of preferential host switching among mammal RNA viruses. Evolution 67, 980–990 (2013).
49. Poulin, R., Krasnov, B. R. & Mouillot, D. Host specificity in phylogenetic and geographic space. Trends Parasitol. 27, 355–361 (2011).
50. Poulin, R. & Mouillot, D. Parasite specialization from a phylogenetic perspective: a new index of host specificity. Parasitology 126, 473–480 (2003).
51. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/ (2014).
52. Marra, G. & Wood, S. N. Practical variable selection for generalized additive models. Comput. Stat. Data Anal. 55, 2372–2387 (2011).
53. Pawitan, Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood. (Oxford University Press, 2001).
54. Holt, B. G. et al. An update of Wallace’s zoogeographic regions of the world. Science 339, 74–78 (2013).
55. QGIS Geographic Information System. Open Source Geospatial Foundation Project http://www.qgis.org/ (2016).
56. Goldewijk, K. K., Beusen, A., van Drecht, G. & de Vos, M. The HYDE 3.1 spatially explicit database of human-induced global land-use change over the past 12,000 years. Glob. Ecol. Biogeogr. 20, 73–86 (2011).
57. raster: Geographic Data Analysis and Modeling version 2.3-40 https://cran.r-project.org/package=raster (2015).
58. sp: Classes and Methods for Spatial Data version 1.2-1 https://cran.r-project.org/package=sp (2015).
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
[Extended Data Figure 1 diagram, panels a and b. Labels: Total Viruses (Zoonotic Pool); Spillover; Human Infection. Risk factors: ecological contact; phylogenetic distance; viral traits.]
Extended Data Figure 1 | Conceptual model of zoonotic spillover, viral richness, and summary of models. a, Conceptual model of zoonotic spillover showing primary risk factors examined, colour-coded according to generalized additive models used. b, Conceptual model of observed, predicted, and actual viral richness in mammals. c, GAMs used in our study to address specific components of a and b, colour-coded by model. Variables listed with ‘or’ under each GAM covaried and were provided as competing terms in model selection, and those in bold were included in the best-fit model using all host–virus associations. Significant variables from each best-fit GAM are noted with an asterisk. Zoonotic viral spillover first depends on the underlying total viral richness in mammal populations and the ecological, taxonomic, and life-history traits that govern this diversity (GAM 1). Second, host- and virus-specific factors may facilitate viral spillover. We examine the relative importance of host phylogenetic distance to humans, ecological opportunity for contact, or other species-specific life-history and taxonomic traits (GAM 2), and identify viral traits associated with a higher likelihood of an observed virus being zoonotic (GAM 3). We estimate the total and zoonotic viral richness per host species using GAMs 1 and 2, and calculate the missing viruses and missing zoonoses under a scenario of increased research effort (b, Methods). Owing to imperfect surveillance in both humans and wildlife and biases in viral detection, there may be uncertainty in the exact proportion of viruses that are zoonotic (b, light grey), and also between the actual, or true, viral richness (dotted lines) and the predicted maximum viral richness per host (dashed line).
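The 'missing viruses' and 'missing zoonoses' quantities described in this legend are differences between predicted richness under increased research effort and observed richness. A minimal Python sketch of that arithmetic follows; the function name, host labels and numbers are hypothetical and are not taken from the study's data or code.

```python
def missing_richness(predicted_max_effort: float, observed: int) -> float:
    """Predicted-minus-observed viral richness for one host species."""
    return predicted_max_effort - observed


# Hypothetical values for illustration only (not from the paper's dataset).
example_hosts = {
    "host_A": {"predicted": 12.4, "observed": 5},
    "host_B": {"predicted": 3.0, "observed": 3},
}

# Missing viruses per host: larger values suggest more undiscovered viruses.
missing = {
    name: missing_richness(values["predicted"], values["observed"])
    for name, values in example_hosts.items()
}
```

In this sketch a host whose observed richness already equals its predicted richness has no missing viruses, whereas a poorly studied host with a high prediction has many.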
Heat map values: number of observed viruses by viral family (rows) and mammalian order (columns).
Column key: Rod = Rodentia; Cet = Cetartiodactyla; Pri = Primates; Chi = Chiroptera; Dip = Diprotodontia; Cin = Cingulata; Pil = Pilosa; Pro = Proboscidea; Pera = Peramelemorphia; Sca = Scandentia; Lag = Lagomorpha; Did = Didelphimorphia; Eul = Eulipotyphla; Car = Carnivora; Peri = Perissodactyla.

Family            Rod  Cet  Pri  Chi  Dip  Cin  Pil  Pro  Pera Sca  Lag  Did  Eul  Car  Peri
Bunyaviridae      49   22   6    20   2    0    2    1    1    0    8    7    4    6    12
Rhabdoviridae     9    9    1    18   0    1    1    1    0    0    1    1    2    5    3
Reoviridae        8    15   3    4    3    0    1    2    0    0    0    0    0    6    6
Togaviridae       10   6    7    7    3    2    1    0    0    0    1    2    1    8    11
Arenaviridae      21   0    1    1    0    0    0    0    0    0    0    0    0    1    0
Parvoviridae      8    2    5    0    0    0    0    0    0    0    1    0    0    3    0
Filoviridae       1    2    4    3    0    0    0    0    0    0    0    0    0    0    0
Hepadnaviridae    3    0    1    1    0    0    0    0    0    0    0    0    0    0    0
Polyomaviridae    3    1    4    0    0    0    0    0    0    0    1    0    0    0    0
Bornaviridae      1    1    1    0    0    0    1    0    0    0    1    0    1    1    1
Picobirnaviridae  1    1    1    0    0    1    1    0    0    0    0    0    0    1    1
Arteriviridae     1    1    1    0    0    0    0    0    0    0    0    0    0    0    1
Hepeviridae       1    1    1    0    0    0    0    0    0    0    0    0    0    1    0
Unassigned        0    0    0    0    0    0    0    0    0    0    0    1    0    0    0
Circoviridae      0    2    0    0    0    0    0    0    0    0    0    0    0    0    0
Anelloviridae     0    1    1    0    0    0    0    0    0    0    0    0    0    0    0
Asfarviridae      0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
Astroviridae      0    3    0    0    0    0    0    0    0    0    0    0    0    2    0
Orthomyxoviridae  2    3    2    1    0    0    0    0    0    0    0    0    0    3    1
Poxviridae        8    11   3    0    0    0    0    0    0    0    3    0    0    3    1
Caliciviridae     2    7    0    0    0    0    0    0    0    0    2    0    0    7    1
Coronaviridae     2    6    1    1    0    0    0    0    0    0    1    0    0    3    1
Papillomaviridae  2    7    1    0    0    0    0    0    0    0    2    0    0    2    2
Adenoviridae      4    12   7    0    1    0    0    0    0    1    0    0    0    1    2
Picornaviridae    3    6    6    0    1    0    0    2    0    0    0    1    1    1    3
Paramyxoviridae   5    10   12   6    0    0    0    0    0    1    0    0    0    4    2
Retroviridae      5    7    12   0    0    0    0    0    0    0    0    0    0    4    2
Flaviviridae      21   10   13   24   1    1    1    0    0    0    1    2    1    4    8
Herpesviridae     13   15   23   0    2    0    0    1    0    1    0    0    0    8    9
Extended Data Figure 2 | Heat map of observed total viral richness by mammalian order and viral family. Dataset includes 754 mammalian species
and 586 unique ICTV recognized viral species. Heat map aggregated by rows and columns to group taxa with similar levels of observed viral richness.
Extended Data Figure 3 | Global distribution of viral and host species richness for all wild mammals. a, Observed total viral richness (for n = 576 host spp.); b, predicted total viral richness given maximum research effort; c, missing viruses or predicted minus observed total viral richness; d, observed zoonotic viral richness (n = 584); e, predicted zoonotic viral richness given maximum research effort; f, missing zoonoses or predicted minus observed zoonotic viral richness (same as included in Fig. 3a); g, global mammal species richness (n = 5,290); h, mammal richness for species in our database (n = 753); i, mammal species with no described viruses in the literature. Warmer colours (larger values) in panels c and f highlight areas predicted to be of greatest value for discovering novel viruses or novel viral zoonoses, respectively, in mammals. Red/pink colours in panel i highlight areas with poor viral surveillance in mammal species to date. Hatched regions represent areas where model predictions deviate systematically for the collection of species in that grid cell (see Methods).
Extended Data Figure 4 | Global distribution of viral and host species richness for wild carnivores (order Carnivora). a, Observed total viral richness (for n = 55 host spp.); b, predicted total viral richness given maximum research effort; c, missing viruses or predicted minus observed total viral richness; d, observed zoonotic viral richness (n = 55); e, predicted zoonotic viral richness given maximum research effort; f, missing zoonoses or predicted minus observed zoonotic viral richness (same as included in Fig. 3b); g, global host species richness for Carnivora (n = 276); h, host species richness for Carnivora in our database (n = 79); i, species of the order Carnivora with no described viruses in the literature. Warmer colours (larger values) in c and f highlight areas predicted to be of greatest value for discovering novel viruses or novel viral zoonoses, respectively, in carnivores. Red/pink colours in panel i highlight areas with poor viral surveillance in carnivore species to date. Hatched regions represent areas where model predictions deviate systematically for the collection of species in that grid cell (see Methods).
Extended Data Figure 5 | Global distribution of viral and host species richness for wild even-toed ungulates (order Cetartiodactyla). a, Observed total viral richness (for n = 70 host spp.); b, predicted total viral richness given maximum research effort; c, missing viruses or predicted minus observed total viral richness; d, observed zoonotic viral richness (n = 70); e, predicted zoonotic viral richness given maximum research effort; f, missing zoonoses or predicted minus observed zoonotic viral richness (same as included in Fig. 3c); g, global host species richness for Cetartiodactyla (n = 229); h, host species richness for Cetartiodactyla in our database (n = 105); i, species of the order Cetartiodactyla with no described viruses in the literature. Warmer colours (larger values) in c and f highlight areas predicted to be of greatest value for discovering novel viruses or novel viral zoonoses, respectively, in even-toed ungulates. Red/pink colours in panel i highlight areas with poor viral surveillance in even-toed ungulate species to date. Hatched regions represent areas where model predictions deviate systematically for the collection of species in that grid cell (see Methods).
Extended Data Figure 6 | Global distribution of viral and host species richness for bats (order Chiroptera). a, Observed total viral richness (for n = 156 host spp.); b, predicted total viral richness given maximum research effort; c, missing viruses or predicted minus observed total viral richness; d, observed zoonotic viral richness (n = 157); e, predicted zoonotic viral richness given maximum research effort; f, missing zoonoses or predicted minus observed zoonotic viral richness (same as included in Fig. 3d); g, global host species richness for Chiroptera (n = 1117); h, host species richness for Chiroptera in our database (n = 192); i, species of the order Chiroptera with no described viruses in the literature. Warmer colours (larger values) in c and f highlight areas predicted to be of greatest value for discovering novel viruses or novel viral zoonoses, respectively, in bats. Red/pink colours in panel i highlight areas with poor viral surveillance in bat species to date. Hatched regions represent areas where model predictions deviate systematically for the collection of species in that grid cell (see Methods).
Extended Data Figure 7 | Global distribution of viral and host species richness for primates (order Primates). a, Observed total viral richness (for n = 71 host spp.); b, predicted total viral richness given maximum research effort; c, missing viruses or predicted minus observed total viral richness; d, observed zoonotic viral richness (n = 73); e, predicted zoonotic viral richness given maximum research effort; f, missing zoonoses or predicted minus observed zoonotic viral richness (same as included in Fig. 3e); g, global host species richness for Primates (n = 400); h, host species richness for Primates in our database (n = 98); i, primate species with no described viruses in the literature. Warmer colours (larger values) in c and f highlight areas predicted to be of greatest value for discovering novel viruses or novel viral zoonoses, respectively, in primates. Red/pink colours in panel i highlight areas with poor viral surveillance in primate species to date. Hatched regions represent areas where model predictions deviate systematically for the collection of species in that grid cell (see Methods).
Extended Data Figure 8 | Global distribution of viral and host species richness for rodents (order Rodentia). a, Observed total viral richness (for n = 178 host spp.); b, predicted total viral richness given maximum research effort; c, missing viruses or predicted minus observed total viral richness; d, observed zoonotic viral richness (n = 183); e, predicted zoonotic viral richness given maximum research effort; f, missing zoonoses or predicted minus observed zoonotic viral richness (same as included in Fig. 3f); g, global host species richness for Rodentia (n = 2206); h, host species richness for Rodentia in our database (n = 221); i, rodent species with no described viruses in the literature. Warmer colours (larger values) in c and f highlight areas predicted to be of greatest value for discovering novel viruses or novel viral zoonoses, respectively, in wild rodents. Red/pink colours in panel i highlight areas with poor viral surveillance in rodent species to date. Hatched regions represent areas where model predictions deviate systematically for the collection of species in that grid cell (see Methods).
Extended Data Figure 9 | Order-level phylogenies showing residuals from zoonoses model. a–e, Subtrees from cytochrome b maximum likelihood phylogeny for 558 mammal species (constrained to order-level topology of mammal supertree) for bats (a), carnivores (b), even-toed ungulates (c), rodents (d) and primates (e). Species included have at least one described virus association and available genetic data. Wildlife species names and terminal branches are colour-coded by the residuals (predicted minus observed) from the best-fit GAM to predict the number of zoonotic viruses using all data. Species with residual values between −1 and 1 (black) are accurately predicted within one virus. Warm colours represent species with positive residuals (orange >1 to 3; red >3). Cool colours represent species with negative residuals (green <−1 to −3; blue <−3). Marine mammals, domestic animals, and species with missing data and not included in the best-fit models are shown in grey.
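The residual colour-coding in this legend can be expressed as a simple binning rule. The Python sketch below is illustrative only: the function name is hypothetical, and treating the boundary values (exactly ±1 and ±3) as belonging to the bin nearer zero is an assumption, since the legend does not state which side is inclusive.

```python
def residual_colour(predicted: float, observed: float) -> str:
    """Map a residual (predicted minus observed zoonoses) to a legend colour."""
    residual = predicted - observed
    if residual > 3:
        return "red"      # strongly positive residual
    if residual > 1:
        return "orange"   # residual >1 to 3
    if residual >= -1:
        return "black"    # predicted within one virus of observed
    if residual >= -3:
        return "green"    # residual <-1 to -3
    return "blue"         # strongly negative residual


# Hypothetical (predicted, observed) pairs, not values from the study.
examples = [(5.0, 1), (2.5, 1), (1.0, 1), (0.0, 2), (0.0, 5)]
colours = [residual_colour(p, o) for p, o in examples]
```

Species excluded from the best-fit models (shown in grey in the figure) have no residual and fall outside this rule.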
Extended Data Table 1 | Summary of best-fit GAMs for total and zoonotic viral richness per wild mammal species, and probability of a virus being zoonotic

Each model heading gives the total deviance explained. Intercept rows give the value, Z statistic and P-value; all other terms give the Chi-sq statistic, P-value, effective degrees of freedom (EDF) and relative deviance explained.

Term                                                          Chi-sq    P-value   EDF    Relative dev. explained

Total Viral Richness Model (all data, n=576 species); total deviance explained 49.2%
Intercept: value 0.52; Z statistic 7.43; P-value <0.001
Disease-related publications (log)                            1846.57   <0.001    5.55   64.8%
Mammal sympatry (>20% range overlap)                          301.38    <0.001    5.16   10.1%
Order CHIROPTERA                                              155.12    <0.001    1      9.9%
Order RODENTIA                                                95.49     <0.001    1      4.8%
Order PRIMATES                                                34.4      <0.001    0.94   2.5%
Phylogenetically-corrected body mass                          216.42    0.009     3.82   1.9%
Order CETARTIODACTYLA                                         24.37     <0.001    0.94   1.8%
Geographic range (log)                                        18.93     0.025     3.58   1.6%
Order PERISSODACTYLA                                          9.95      0.001     1      1.4%
Order EULIPOTYPHLA                                            5.87      0.009     0.85   1.1%

Total Viral Richness Model (stringent data, n=575 species); total deviance explained 35.8%
Intercept: value -0.47; Z statistic -5.31; P-value <0.001
Disease-related publications (log)                            923.02    <0.001    4.98   53.6%
Order RODENTIA                                                129.28    <0.001    0.98   12.6%
Order CHIROPTERA                                              109.23    <0.001    1      12.2%
Order PRIMATES                                                85.12     <0.001    1      11.8%
Mammal sympatry (>20% range overlap)                          44.96     <0.001    4.69   3.9%
Phylogenetically-corrected body mass                          9.65      0.036     3.51   2.8%
Geographic range (log)                                        11.14     0.079     2.66   1.5%
Order CINGULATA                                               0.87      0.286     0.76   0.6%
Order EULIPOTYPHLA                                            1.21      0.151     0.59   0.4%
Order PERAMELEMORPHIA                                         0.74      0.307     0.7    0.4%
Order SCANDENTIA                                              0.94      0.13      0.41   0.3%

Proportion Zoonoses Model (all data, n=584 species); total deviance explained 82.0% (number of zoonoses), 33.0% (proportion, w/offset)
Intercept: value -0.34; Z statistic -8.57; P-value <0.001
Order CETARTIODACTYLA                                         27        <0.001    0.88   36.3%
Phylog. dist. from humans (log, cytb tree)                    12.7      0.002     1.88   17.0%
Urban to rural human population ratio in species range (log)  10.01     0.002     1.25   13.0%
Disease-related publications (log)                            5.81      0.017     1.2    7.7%
Order CHIROPTERA                                              4.43      0.015     0.71   6.5%
Order PERISSODACTYLA                                          3.28      0.039     0.76   6.4%
Order SCANDENTIA                                              0.81      0.311     0.79   5.3%
Order PERAMELEMORPHIA                                         0.76      0.323     0.78   4.8%
Order DIPROTODONTIA                                           0.72      0.194     0.43   1.7%
Hunted species, IUCN                                          0.75      0.167     0.36   1.3%

Proportion Zoonoses Model (stringent data, n=576 species); total deviance explained 23.6%
Intercept: value -1.35; Z statistic -22.66; P-value <0.001
Phylog. dist. from humans (log, cytb tree)                    56.13     <0.001    2.36   34.5%
Order CETARTIODACTYLA                                         22.93     <0.001    0.94   28.0%
Urban to rural human population ratio change, 1970-2005       16.88     0.002     4.05   19.6%
Order PERISSODACTYLA                                          0.86      0.308     0.83   5.0%
Change in human population density in range, 1970-2005        3.16      0.132     1.47   4.3%
Disease-related publications (log)                            5.03      0.014     1.21   3.8%
Order DIPROTODONTIA                                           2.39      0.066     0.71   2.8%
Phylogenetically-corrected body mass                          0.12      0.294     0.12   1.1%
Order LAGOMORPHA                                              0.7       0.196     0.42   0.9%
Order PRIMATES                                                0.62      0.097     0.28   0.1%

Viral Traits Model (all data, n=464 viruses); total deviance explained 27.2%
Intercept: value -1.59; Z statistic -5.69; P-value <0.001
Max phylogenetic host breadth w/out humans, (log, cytb tree)  44.91     <0.001    2.94   45.6%
Number of publications (log)                                  35.83     <0.001    3.28   37.4%
Cytoplasmic replication                                       10.96     <0.001    0.86   9.2%
Vector-borne                                                  4.9       0.014     0.75   4.6%
Envelope                                                      0.88      0.166     0.46   2.3%
Average genome length (log)                                   0.12      0.266     0.09   0.9%

Viral Traits Model (stringent data, n=408 viruses); total deviance explained 21.1%
Intercept: value -2.23; Z statistic -7.51; P-value <0.001
Number of publications (log)                                  29.51     <0.001    2.64   53.1%
Max phylogenetic host breadth w/out humans, (log, cytb tree)  15.75     <0.001    2.53   25.5%
Cytoplasmic replication                                       10.33     0.001     0.88   17.5%
Vector-borne                                                  1.87      0.085     0.6    3.9%

Models were selected separately using the entire dataset and a stringent dataset that excluded host–virus associations detected by serology. Variables are sorted by relative per cent deviance explained within each model.
CORRECTIONS & AMENDMENTS
Erratum
doi:10.1038/nature23660
612 | NATURE | VOL 548 | 31 AUGUST 2017