J Public Health Dent.

ISSN 0022-4006

Using big data to promote precision oral health in the context of a learning healthcare system
Joseph Finkelstein, MD, PhD1; Frederick Zhang, BA2; Seth A. Levitin, BS2;
David Cappelli, DMD, MPH, PhD3
1 Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2 Center for Bioinformatics and Data Analytics in Oral Health, College of Dental Medicine, Columbia University, New York, NY, USA
3 Department of Biomedical Sciences, School of Dental Medicine, University of Nevada, Las Vegas, NV, USA

Keywords
precision health; big data; learning health system; public health dentistry.

Joseph Finkelstein, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai. New York, NY, USA.
Tel.: 212-659-9596
Fax: 212-423-2998
e-mail:[email protected].

Received: 2/14/2019; accepted 12/2/2019.

doi: 10.1111/jphd.12354

Summary
There has been a call for evidence-based oral healthcare guidelines to improve precision dentistry and oral healthcare delivery. The main challenges to this goal are the current lack of up-to-date evidence, the limited integrative analytical data sets, and the slow translation to routine care delivery. Overcoming these issues requires knowledge discovery pipelines based on big data and health analytics, intelligent integrative informatics approaches, and learning health systems. This article examines how this can be accomplished by utilizing big data. These data can be gathered from four major streams: patients, clinical data, biological data, and normative data sets. All these must then be uniformly combined for analysis and modelling so that the meaningful findings can be implemented clinically. By executing data capture cycles and integrating the subsequent findings, practitioners are able to improve public oral health and care delivery.

J Public Health Dent (2020)

Introduction

Precision medicine has become a key component in the modern delivery of healthcare. The aim of precision medicine is to tailor healthcare delivery to individual patients, thereby improving care delivery and outcomes while decreasing costs.1 In determining the best care for an individual, precision medicine takes various factors into account in addition to disease history, such as an individual’s environment, genome, and socioeconomic status. By combining these elements, a profile can be generated for each individual that better predicts their health outcomes and addresses risk factors. Combined information provides an opportunity to tailor treatment to the needs of the individual patient based on their risk factors. To successfully integrate precision medicine into oral healthcare, three major challenges must be addressed: development of up-to-date evidence-based guidelines,2,3 integration of large analytical data sets,4,5 and translation of new knowledge into routine clinical care delivery.6,7 In the fields of dentistry and oral health, implementation of these factors represents a major obstacle to the delivery of precision oral medicine.8

In recent years, the oral health sciences have called for a greater emphasis on evidence-based practice, especially as advances in the field of healthcare and medicine have accelerated.9–11 Without evidence, clinical practice is empirical, anecdotal, and experiential. In the field of dentistry, certain therapies or procedures are used in clinical practice without substantive evidence for their use. For example, a systematic review of scaling and root planing therapy and its adjuncts found that of the nine adjuncts investigated in the paper, only four showed evidence of benefit over scaling and root planing treatment alone.12 Another review could not locate long-term studies that determined how long the positive effects of scaling could be maintained. Accordingly, the review could not determine the optimal recall frequency.13 Scaling and root planing therapy is a routine procedure that is considered “first-line therapy” for chronic periodontal disease, so the lack of literature that outlines how long therapy should be maintained is concerning. The lack of well-established, evidence-based guidelines is partially a result of the limited availability of data sets in dental research. A call for evidence-based dentistry as the foundation for modern dental practice was recently reiterated in an editorial by Dr. Robert J. Weyant.2

The second challenge facing precision oral health implementation is the limited amount of integrative analytical data
sets, which precludes the possibility of making evidence-based decisions. The root causes of this problem are manifold. One significant problem is that dental insurance claims in the United States, which are a major source of data for dental research, carry treatment codes rather than diagnostic codes.14 The data elements used to track information in a dental clinic can vary widely from those used for research purposes. A study compared the data elements used by the National Institute of Dental and Craniofacial Research’s Cancer Data Standards Registry and the Dental Information Model, a taxonomy of patient records based on data charting by practicing general dentists. This study found that only 46% of the data elements based on a clinical setting overlapped with the data elements used in research.15 These obstacles hinder the development of large, analytical data sets that can be used for research purposes. Improvements in data curation and aggregation in dentistry and oral medicine are a necessary step for the integration of precision medicine practices into oral healthcare. An urgent need for intelligent and harmonized aggregation of heterogeneous oral health data sets has been well articulated in several recent publications.4,8

The third challenge is the slow translation from research to routine care delivery. A survey of clinical dental school faculty found that only 47 percent of faculty integrate evidence-based dentistry into their teaching.16 One potential cause for this is that the search for high-quality data is overwhelming and cumbersome, and that clear clinical practice guidelines are not available.17 An easily accessible resource based on up-to-date evidence could help address this need and improve the implementation of guidelines into clinical practice. Optimal approaches for facilitating translation of research findings into practice have been the subject of broad discussion across all domains of care delivery, as reflected by multiple publications in the field of dissemination and implementation science.18,19 In order to address the challenge of implementing personalized healthcare into routine practice, the Learning Healthcare System (LHS) paradigm has been introduced to capitalize on the great potential of high-scale healthcare data combined with patient perspectives into integrated models of continuous care improvement.20

An LHS was defined by the Institute of Medicine21 as a system in which “science, informatics, incentives, and culture are aligned for continuous improvement and innovation, with best practices seamlessly embedded in the delivery process and new knowledge captured as an integral by-product of the delivery experience.” A scalable infrastructure for big data aggregation and analytics provides the necessary framework for personalized health implementation in the context of LHS.22 Thus, one potential solution to the three challenges outlined above could be the implementation of big data and data science practices into both the dental research infrastructure and routine clinical oral healthcare.

The National Institute of Dental and Craniofacial Research has emphasized the importance of evidence-based treatment and developed ways to implement data science systematically through funding, education, and fellowships.9 The translation of new knowledge discovered by big data analytics into evidence-based dental practice has the potential to improve public oral health outcomes.4 The term “precision public health” has arisen as a descriptor for this approach. Precision public health has been defined as a method to “[improve] the ability to prevent disease, promote health, and reduce health disparities in populations by applying emerging methods and technologies for measuring disease, pathogens, exposures, behaviors, and susceptibility in populations; and developing policies and targeted implementation programs to improve health”.23 This approach requires the usage of big data in order to predict risk profiles, improve our understanding of the pathogenesis of disease, and develop targeted treatment strategies, among other applications.24

In this article, we discuss the potential role of big data in promoting the oral health of the public. The implementation of big data into oral health research and practice is still a developing process; however, successful implementation of big data practices can overcome the challenges preventing the integration of precision medicine and oral health. By addressing these challenges properly, the field of oral health can become integrated into the age of precision medicine while also addressing dental public health needs, a concept we have illustrated in Table 1. Through a scoping review, we examine the sources of data available for oral health research, their applications for public oral health care delivery, and current barriers to implementation.

Table 1 Challenges and Solutions Facing the Integration of Precision Medicine into Oral Public Health

Challenge                    Solution                       Precision medicine    Public health

Lack of up-to-date           Knowledge discovery            X                     X
evidence-based guidelines    pipelines based on big data
                             and health analytics
Limited integrative          Intelligent integrative        X                     X
analytical data sets         informatics approaches
Slow translation to          Learning health systems        X                     X
routine care delivery

Methods

Scoping review has been used in this article as a method to map out the available literature regarding a specific topic
of research in order to characterize the current landscape and identify future directions in this field. The scoping review methodology is based on the following five steps as previously described25:
1. Identify a research question
2. Identify relevant studies
3. Evaluate and select studies to include
4. Chart the data
5. Collect, summarize, and report the results

Research question

The primary goal of this scoping review was to identify current trends and future applications in the use of big data in oral healthcare. To that end, the research question we developed was as follows: “What is the current role of big data in oral health care and what new applications are being developed or proposed at this time?”

Search strategy

We conducted a literature search of the PubMed database in order to identify papers to be included in this study. Our search only included articles that were either published in English or had an English translation available. We used the following Boolean search term: (“big data” OR “data mining” OR “data science” OR “data repository” OR “precision dentistry”) AND (“oral health” OR “dental health” OR “dental care” OR “oral hygiene” OR “dental medicine” OR “dental public health”). This search term yielded 78 potential papers to be included in our study.

Study selection

After our initial search, we established inclusion and exclusion criteria for papers to be included in our final analysis. The inclusion criteria were as follows:
1. The paper had to investigate the applications of big data, data mining, or data aggregation for oral health
2. The paper had to be available in English.
3. We included papers where big data may not have been the primary focus, but data mining or other data analysis techniques were still used on large data sets in the methodology of the study.
We then established the following exclusion criteria:
1. Perspective or opinion pieces were not included in this review.
2. Other review articles were not included in this review.

Results

Data charting

After applying article inclusion and exclusion criteria, we found 21 total papers to be included in the final study. The scoping review identified three major categories of literature regarding big data and oral health to be discussed in this review. They are as follows:
1. Aggregation of big data for precision oral health: The implementation of big data in oral health practice requires the establishment of data streams to aggregate and warehouse data. In our search, we identified four papers that describe the establishment of a data repository for oral health research.
2. Use of big data for predictive analytics and knowledge discovery in oral health: Currently, the most established implementation of big data in oral health research is secondary outcomes analysis. This process consists of using data mining and other data analysis techniques as established in the field of biomedical informatics to identify new patterns and trends from existing data sets.
3. Development of future applications in oral health using big data: Beyond secondary outcome analysis studies, we identified studies that represent forays into novel areas of research. These studies did not fit either of the above categories but represented new applications of big data in the field of oral health, such as the role of social media in oral health education.

Data extraction

The general characteristics of each of the studies included in this review were extracted and summarized in Table 2. In Table 3, we synthesized the different types of primary data sources used for the 14 secondary outcomes analysis papers included in this review based on the country where the study was conducted.

Data repositories and sources: aggregation of big data for precision oral health

Of the four papers devoted to developing data repositories we identified in this review, three were based in the United States: BigMouth, the Consortium for Oral Health-Related Informatics (COHRI), and the National Dental Practice-Based Research Network (DPBRN). The BigMouth data repository was initially developed as a repository linking the electronic health records (EHRs) of four dental schools in the United States. This repository has since expanded to include eight dental schools and houses over 1.2 million oral health records. COHRI is a consortium of over

Table 2 General Characteristics of the Studies Included

Author Country of origin Study description

Data repository development

von Bültzingslöwen et al., 2019 Sweden The Swedish Quality Registry for Caries and Periodontal Diseases is a database of
electronic patient dental records collected from affiliated dental care organizations26
Walji et al., 2014 USA The BigMouth data repository is a collection of EHRs which was initially collected from
four dental schools in the United States27
Gilbert et al., 2013 USA The National Dental Practice-Based Research Network is a network of practicing and
academic dentists and researchers who collaborate for data collection and research
Stark et al., 2010 USA The Consortium for Oral Health-Related Informatics is a consortium of over 20 dental
schools designed to share best practices and develop standardized data collection tools
including BigMouth29
Predictive analytics
Rao et al., 2019 Canada EHRs from the Canadian Hospitals Injury Reporting and Prevention Program database
were mined to identify the incidence of toothbrush-related injuries30
Suni et al., 2013 Finland Municipal dental records in Finland were mined to develop Kaplan–Meier survival curves
for caries-free permanent teeth and restoration survival distribution31
Käkilehto et al., 2009 Finland EHRs from four public dental health centers in Finland were mined to develop
Kaplan–Meier curves for restorations of different restorative materials32
Raedel et al., 2017 Germany Claims data from a large German health insurance company were mined to develop a
Kaplan–Meier survival curve for posterior tooth restorations33
Lee et al., 2018 Korea The Korea National Health and Nutrition Examination Surveys from 2010 to 2015 were
mined to develop a decision tree model for predicting risk of periodontal disease34
Chan et al., 2016 Taiwan EHRs from the National Health Insurance research database in Taiwan were mined to
identify differences in outcomes between patients who receive conventional
periodontal therapy and patients who receive comprehensive periodontal therapy35
Su et al., 2019 Taiwan Data from the Taiwanese Nationwide Oral Cancer Screening Program were mined to
determine the relationship between anatomic site of oral cancer and its staging and
Nalilah et al., 2013 USA The Nationwide Emergency Department Sample database was mined to discover the
relationship between mental illness and dental disease37
Thyvalikakath et al., 2015 USA EHRs from the Indiana University School of Dentistry were mined in order to develop a
model for predicting risk of periodontal disease38
Rai et al., 2019 USA EHRs from the University of Colorado School of Dental Medicine were mined in order to
identify factors associated with partial edentulism39
Filker et al., 2013 USA EHRs from the Nova Southeastern University College of Dental Medicine were mined to
find characteristics associated with caries risk level, including geographic median
income level40
Boland et al., 2013 USA EHRs from the Columbia University College of Dental Medicine were linked to medical
records of the same patients at a nearby hospital and analyzed in order to identify
associations between medical and dental diseases41
Kalenderian et al., 2016 USA Data from the BigMouth data repository were queried for patients diagnosed with chronic
moderate periodontitis and analyzed for the percentage that received treatment that
followed current evidence-based guidelines42
Tiwari et al., 2019 USA Data claims for Medicaid-enrolled children from 13 states were mined to find the
association between number of routine pediatric physician visits and preventive dental
visits in children43
Future applications
Huber et al., 2019 USA Text-based social media posts responding to the 2016 ADA sealants guideline across a
variety of different platforms were analyzed for their alignment with the ADA
Helmi et al., 2018 USA The Media Cloud searchable big data platform was queried for published digital media
related to community water fluoridation. These media were then analyzed for their
stance on community water fluoridation45
Liu et al., 2013 USA The data elements from the Cancer Data Standard Registry and Repository and the Dental
Information Model were compared to each other in order to characterize the overlap in
data elements used for dental research purposes as opposed to general clinical dental
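
The final entry in Table 2 compares two sets of data elements to characterize how much they overlap. As a minimal sketch of that kind of comparison, and not the method used in the cited study, the share of clinical data elements that also appear in a research data set could be computed as below. The function name `overlap_share` and the element names are hypothetical, and exact string matching after normalization is an assumption; a real comparison would likely require semantic mapping between terminologies.

```python
def overlap_share(clinical_elements, research_elements):
    """Fraction of clinical data elements that also appear in the research set."""
    # Normalize case and whitespace so trivially different spellings still match.
    clinical = {e.strip().lower() for e in clinical_elements}
    research = {e.strip().lower() for e in research_elements}
    if not clinical:
        return 0.0
    return len(clinical & research) / len(clinical)


# Hypothetical element names, invented for illustration only.
clinical = ["Tooth number", "Probing depth", "Chief complaint", "Recall interval"]
research = ["tooth number", "probing depth", "Tumor stage"]

share = overlap_share(clinical, research)
print(f"{share:.0%} of clinical elements also appear in the research set")
```

Here two of the four hypothetical clinical elements are matched, so the share is one half.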

20 dental schools that share the common goal of furthering the field of oral health informatics and improving standardized data collection. The third US-based data repository is the DPBRN, which is sponsored by the NIDCR. This research network consists of practicing dentists and academics in the field of oral health who have agreed to share data and conduct research in a collaborative fashion. The fourth paper describing the development of a data repository was based in Sweden and presented the Swedish Quality Registry for Caries and Periodontal Diseases (SKaPa). In contrast to the data repositories being developed in the United States, which were primarily based on collecting data from academic institutions, SKaPa was developed as a registry for both private and public dental clinics in Sweden.

Predictive analytics for knowledge discovery

Of the data sources used for predictive analytics and knowledge discovery in the studies that were based in the United States, the most common was EHR data from dental schools, which was used in four of the seven studies included in this review. Other data sources used were the Nationwide Emergency Department Sample, which is a national database of emergency department visits, the BigMouth data repository described above, and Medicaid claims data. Studies based in countries outside of the United States used data from a variety of sources. A Canadian study queried the Canadian Hospitals Injury Reporting and Prevention Program, which is a hospital-based database of injury and poisoning events. Two studies based in Finland both used EHR data from community health centers. A study in Taiwan used data from the National Health Insurance research database, which is a database of claims data from Taiwan’s mandatory single-payer insurance program. The other study from Taiwan used data from the Taiwanese Nationwide Oral Cancer Screening Program, a public health initiative to screen adults in Taiwan who have risk factors for oral cancer. These patients were then linked to data from Taiwan’s National Death Registry. A study in Germany also used claims data from a large national health insurance company. Finally, the study from Korea used data from the Korean National Health and Nutrition Examination Surveys. This includes a survey of health and nutrition behaviors as well as a physical health examination, including an oral health examination.

Table 3 Primary Data Sources Used from Each Country

Country          Primary data sources used

United States    Dental school EHRs
                 Hospital EHRs
                 Academic data repositories
                 Insurance claim databases
Canada           Hospital EHRs
Finland          Public health center EHRs
Germany          Insurance claim databases
Korea            Public health screening
Taiwan           Insurance claims database
                 Public health screening

Large EHR data sets were used successfully to generate significant research output on a variety of topics, including risk factors for periodontal disease, incidence of rare adverse events, tooth survival curves, predictive data mining for periodontal disease, and outcomes of comprehensive periodontal treatment. Other areas where EHR data were used for secondary analysis included edentulism risk factors, obesity impact on oral cancer outcomes, posterior restoration outcomes, dental school strategic planning, quantitative model generation, and outcomes of dental preventive visits. A broad range of methodologies was employed, from simple descriptive statistics to logistic regression, Kaplan–Meier survival analysis, decision tree models, and various machine-learning approaches.

Future directions

Despite successful use of existing data sets for big data analytics, many articles emphasized the limitations of their data sets and the need for a more comprehensive integrated framework for big data allowing simultaneous inclusion of multiple basic science, clinical, and social science domains related to oral health. Existing data repositories such as SKaPa, the National Dental Practice-Based Research Network, and BigMouth were instrumental in building useful predictive models but lacked generalizability due to the limitations of the patient populations (dental school data, Medicaid data, insurance data) included in these databases. However, these repositories provided useful and valuable initial information for implementing precision dental public health strategies. A number of articles utilized big data generated by social media to characterize oral health in diverse patient populations and to generate tailored messaging promoting oral health guidelines. Overall, a consensus framework for future directions of big data for precision oral health evolved from these articles. It comprises a vision of synergistic and harmonized aggregation of multiple heterogeneous data sets pertinent to oral health and dental care delivery, from sequencing data, proteomics, and metabolomics to EHR data, exposome, and social media and environmental data. Future directions using big data were identified in articles describing generation of supporting materials for evidence-based dentistry, tailored healthcare guideline sharing via social media, development of common data elements for sharing clinical and research data at point of care, and innovative approaches for identifying and targeting population subgroups for preventive care.

Discussion

The scoping review identified three major categories of articles reflecting the current state and potential future use of big data in precision dental public health. A number of articles described approaches to aggregating multiple heterogeneous data streams to create a harmonized, systematic representation of oral disease prevention and care delivery. Despite multiple limitations, these data sets are rapidly increasing in volume and complexity, requiring innovative approaches for their visualization and analysis.47 The second group of articles deals with various applications of big data analytics for knowledge discovery and predictive modeling in the field of precision oral health.48 This area of research has been growing in geometric progression, demonstrating significant potential in identifying previously unrecognized subpopulations and approaches for oral health prevention and management.49 The third cohort of articles represents various innovative solutions based on precision data analytics that address challenging problems in dental public health.7

Where to gather data

Big data has been defined as large data sets that adhere to the four V’s: volume (size of the data set), velocity (speed at which data are generated), variety (different types of data being incorporated), and veracity (accuracy of data reported).50 Some authors specify two additional V’s51: value (relevance of the data), and variability (evolution and seasonality of diseases). Overall, the wide spectrum of available data sets can be represented by four major data streams.52 The first stream is patient-generated data from self-report, wearable devices, ambient data capture, social media, and devices maintained throughout patient homes. The second stream comprises clinical data such as EHRs, pathology reports, billing generated by routine care delivery, and provider characteristics. The third stream includes biologic data such as genetics, proteomics, microbiome, and multi-omics.53 The fourth is represented by normative data sets, which are data carefully collected in clinical trials, nationwide observations, and population surveys.54,55 The combination of these data streams can be used to fuel knowledge discovery in research in order to deliver personalized care, a concept summarized in Figure 1.

Figure 1 Summary of knowledge discovery process using big data. [Color figure can be viewed at]

Patient-generated data can come from the new devices and applications capturing objective measurements that are regularly being introduced to the market. In the era of “smart” devices, these data have become even more accessible for patients, providers, and researchers alike. A 2018 study used toothbrushes that were connected to a smartphone application in order to track brushing habits such as frequency, duration, and surface coverage.56 More advanced systems such as 3-D motion tracking devices and “selfie” tooth brushing video interventions have also been tested in pilot studies in order to provide an even more detailed analysis of tooth brushing technique.57,58 Currently, the implementation of this data collection method is still in its relative infancy. As these types of technology develop and gain widespread adoption, these data can be analyzed to provide immediate feedback to patients. In addition, potential exists for these data to be documented and uploaded into a central data repository to provide new avenues of research.

Our scoping review identified the development of major data repositories aimed at collecting comprehensive dental data as another important trend in oral health research. As we demonstrated in the Results section, three major initiatives have emerged in the United States in recent years as forerunners for the big data trend in oral health. The largest academic initiative is the Consortium for Oral Health-Related Informatics (COHRI), a consortium of members from over 20 dental schools aiming at standardizing oral health data collection and improving informatics utilization in dental education, health care, and research.29 The second data repository, BigMouth, was developed out of this consortium. The BigMouth data repository established the technical foundation and developed a data governance framework for secondary analysis of electronic dental records including patient demographics, diagnoses, medical history, dental history, procedures, odontogram, periodontal chart, and treatment provider information.27 The third US-based data repository is the Dental Practice-Based Research Network (DPBRN), which is sponsored by the NIDCR. This research network consists of practicing dentists and academics in the field of oral health who have agreed to share data and conduct research in a collaborative fashion.28 Outside of these data repositories that have been specially developed for academic purposes, clinical data can also come from a wide range of other sources. Data from Medicaid claims, private insurance claims, emergency department records, and many other primary data sources have been used in oral health data analysis studies across the globe.33,37,43 Recent studies identified potential limitations of using EHR data generated solely by dental schools, as they may represent a biased patient sample.59 Similarly, administrative and claims records such as Medicaid data represent a very valuable resource, the use of which requires an understanding of its strengths and limitations.60 Future research should capitalize on the growing availability of integrated comprehensive data sets representing all facets of oral health.4

With the rise of the disciplines of the OMICS (genomics, proteomics, metabolomics, etc.), biological data from in vitro and in vivo sampling has also entered the age of big data. Genomic sequencing of the oral microbiome has already helped identify novel strains of cariogenic bacteria to improve our understanding of the pathogenesis of caries.61,62 The accelerating rate of production of next generation sequencing (NGS) data has led to the development of data repositories for oral microbiome genomics. Repositories such as the Human Oral Microbiome database and the Human Microbiome Consortium have emerged as data aggregators to provide standardized and easily accessible sources of oral microbiome genomic and taxonomic data to researchers and clinicians alike.63,64 Genome-wide association studies have also been performed on traditionally at-risk populations such as US Hispanics and children to further identify characteristics that can be used to stratify these patients.65,66 However, the relationship between genetics and caries risk is still unclear. Currently, these studies have only found modest associations between gene variants and caries risk. A large number of databases describing genes, proteins, and other biological factors currently exist in order to facilitate research regarding the biological mechanisms of disease.67 Conducting more NGS experiments to find links between biological factors and oral disease can improve integration of oral health pathways into these databases and allow bioinformatics research in this field to become more accessible.68

Finally, the generation of data from normative data sets such as large clinical trials and national public health surveys represents another important source of data for oral health research at the population level. The randomized clinical trial has long been recognized as the gold standard for clinical investigational research. The data collected from clinical trials have been pivotal in clarifying the efficacy of many therapeutic agents used for caries and periodontal disease control. In addition, data from clinical trials can be analyzed to find a variety of secondary outcomes. For example, the X-ACT clinical trial for xylitol lozenge therapy in adults found that xylitol supplementation did not produce a statistically significant effect on caries reduction.69 Beyond this primary finding, baseline data collected from this clinical trial were analyzed to determine risk factors for root caries.70 However, in recent years clinical trials in the field of oral health have come under some scrutiny for their low adherence to best reporting practices.71 Prospective registration of clinical trials is necessary to promote study fidelity and data sharing. Unfortunately, some studies of dental clinical trials have found that only around 24–25 percent of clinical trials in the fields of dentistry and orthodontics are prospectively registered.72,73 Because of this, the volume of normative data sets available for public oral health research is lacking in some regards. Improved adherence to clinical trial registration and reporting guidelines and promotion of wide data sharing can facilitate the data analysis pipeline for oral health research.74

Social media platforms are another source of useful data for oral care delivery. In Japan, researchers set out to determine if social networks among older adults impacted the determinants of oral health. They found that the extent of

Figure 2 Illustration of knowledge discovery pipeline using electronic data. PCA: principal component analysis; PheWAS: phenome-wide association study; GWAS: genome-wide association study; CART: classification and regression trees; SVM: support vector machine; NN: neural network; RF: random forest.

edentulism in older adults is negatively correlated with the number of social networks.75 Another study looked at bullying on Twitter related to dentofacial features and orthodontic treatment. They identified cases of bullying, qualified them, and looked for coping mechanisms the victims and their families had for the mistreatment.76 These studies demonstrate the value of data gleaned from social media. Data gathered from social media demonstrate that researchers can measure the positive or negative impact of psychosocial variables on oral health. The data present vast opportunities for tailored interventions both by clinicians and public healthcare professionals.

How to use data

Using the four major data streams discussed above, big data provide four analytic domains that can be combined to deliver optimal precision oral health. In the clinical domain, multiple research opportunities exist to study precision dental care delivery tailored to specific patient profiles. Development of real-time decision support tools for individualized diagnosis and treatment planning based on a multitude of relevant factors provided before, during and after the dental encounter will significantly improve the quality of dental care and patient satisfaction. In the socio-behavioral domain, identifying oral health risk factors specific to particular population subgroups and delivering targeted preventative interventions using digital media will greatly facilitate individualized oral health on a population level. In the translational science domain, research on how the wired digital operatories access and utilize data from outside streams, including a patient’s genetic traits and microbiome, to facilitate personalized care delivery will be supported by Common Data Models and cross-linked biomedical ontologies such as the Observational Health Data Sciences and Informatics (OHDSI) framework.77 In the educational domain, research on utilizing multiple data streams to better monitor student performance and identify areas for personalized improvement will promote personalized education and individualized student support tailored to each student’s performance profile. In Figure 2, we illustrate an example of how electronic data can be used to facilitate knowledge discovery and eventually predictive modeling. Electronic phenotyping allows precise identification of specific oral health conditions and syndromes in the presence of data gaps and ambiguity of large heterogeneous oral health data. Care pathways visualization and analytics provides a temporal representation of the oral healthcare delivery process. Sequential pattern mining identifies characteristic trajectories of dental conditions and allows automated identification of patient subgroups not readily discernible from a manual chart review. Electronic phenotyping results can be correlated with treatments or medications used in the delivery of care in order to provide a holistic picture of each patient. Data mining or other analysis techniques can then be applied to these data to discover new patterns or associations and to identify optimal personalized treatment pathways. Finally, predictive models of outcomes and ideal treatments can be established based on individual patient characteristics. A similar analytical workflow can be applied to identify optimal precision health pathways both for individual patients and for unique patient subgroups such as the elderly.
The development of clinical decision support systems (CDS) could greatly improve integration of evidence-based dentistry into clinical practice. A recent survey of dental clinicians found that most providers were amenable to the idea of implementing CDS into their daily practice and saw potential for improving quality of care, patient oral health, and other similar benefits.78 These tools can give providers summaries

of current evidence for every step of their decision-making process based on analysis of digitally entered patient information or integration with patient EHR.79 Development and integration of these tools into routine clinical care is still in progress. A CDS developed in New York provides dental hygienists with assessments and recommendations for screening of chronic, systemic conditions such as hypertension and diabetes.80 As vital signs such as blood pressure are measured and recorded, the CDS provides assessments of personalized disease risk and prompts for referral to a specialist in appropriate situations as established by the current hypertension management guidelines.81 Another CDS was developed by Machado et al. for use in dental trauma management and was found to improve adherence to evidence-based guidelines in both dental students and experienced pediatric dentists.82 The use of CDS for tobacco abuse screening and interventions has been found to improve both provider adherence to current evidence and patient outcomes.83,84
Data mining techniques have high potential for improving our understanding of particular socio-behavioral risk factors for oral disease. These techniques can be applied to populations that are already considered at-risk in order to identify individual risk factors that can be addressed using targeted public health interventions. This strategy is particularly useful in the field of oral health because the development of caries and periodontal disease can be chronic and insidious in onset, leading to diagnosis in late stages of disease. Yoon et al. used big data and deep learning algorithms on a large sample of Latino patients to identify demographic, behavioral, and psychological factors associated with tooth mobility85 and other indicators of oral health status in older adults.86 Other studies utilized big data in order to develop risk prediction profiles for development of periodontal disease,34,38 implant failure,87 peri-implantitis,88 and alveolar osteitis.89 Using big EHR data, Boehm A et al. uncovered patient determinants of care utilization compliance in a student dental clinic.90 In addition to risk prediction of oral disease, research in this field can be used to find novel associations between oral health behavior and systemic health. For example, a study conducted in 2018 found that regular dental visits were independently associated with lower stroke risk.91 Similar studies can be used to further integrate the fields of medicine and dentistry in order to improve the overall health of the patient. Once individual risk factors have been identified, preventive interventions can be developed in order to increase patient awareness of these risk factors, change health behavior, and improve patient outcomes. An example of this is the My Smile Buddy iPad application, which engages families of at-risk populations to identify health behavior that puts their children at risk of caries development and encourages them to improve their oral health habits.92
Basic science data can be integrated into the clinician workspace so that biomarkers predicting disease risk and therapy outcomes can be identified for optimal treatment planning. The integration of oral microbiome data and genetic testing into clinical decision making can improve diagnostic precision and the risk stratification process. The principle of “deep phenotyping” can also be applied to improve our understanding of disease staging and outcomes. Deep phenotyping refers to “the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described” in order to facilitate a more comprehensive understanding of the pathologic basis of disease.93 A study on oral microbial profiles used principal components analysis to determine the microbial profile of healthy patients when compared to chronic and aggressive periodontitis patients.94 Deep phenotyping of periodontitis patients can provide new insights into pathologic phenotypic characteristics that are predictive of tooth loss.95 Similar principles can also be applied to implant patients in order to develop more comprehensive risk profiles of patients likely to develop implant failure or peri-implantitis.96 As we briefly described earlier, there is currently a lack of systematic evidence to support the use of genotyping in clinical dental practice.97 However, as patient genotyping becomes more widespread and our understanding of the role of genomics in oral health increases, there may be a role in the future for genetics to become another dimension for clinicians to stratify patients.68
Finally, data analysis can be implemented in the educational domain in order to improve clinician training for future performance. The application of big data analytics in the area of dental education spans from improving student training at dental schools to continuous professional improvement and to ongoing real-time support via EHR at the point of care. Recent studies demonstrated the potential of big data in predicting academic outcomes and professional performance in graduate students.98,99 This approach was shown to have implications for optimizing personalized learning and improving student assessments based on individualized feedback provided in a timely fashion.100 Additionally, use of real-life examples drawn from dental EHR in the process of dental education demonstrated promising results in predoctoral39 and postdoctoral training.101,102 As evidence-based dentist training is being considered as an essential means for improvements in the quality of patient care,103 data analytics workflows for ongoing reporting of dental care quality metrics and providing instructive feedback to dental professionals will be increasingly used in the context of LHS. Implementation of point of care CDS based on individual patient profile and driven by big data analytics has been shown to be an additional vehicle for delivering best clinical practices and supporting ongoing clinician education.104
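The point-of-care CDS logic described in this section (a vital sign is recorded, risk is assessed, and a referral prompt is raised when appropriate) can be pictured with a minimal rule-based sketch. This is not the logic of any of the cited systems. The `VitalSigns` type, the `screen_blood_pressure` function, the category labels, the prompts, and the numeric cut-offs are all hypothetical placeholders, loosely modeled on commonly cited adult blood pressure categories; a deployed tool would encode whichever guideline the clinic has adopted.

```python
from dataclasses import dataclass


@dataclass
class VitalSigns:
    """Blood pressure reading in mm Hg, as entered at chairside."""
    systolic: int
    diastolic: int


def screen_blood_pressure(v: VitalSigns) -> dict:
    """Map one reading to an illustrative category and a prompt.

    The thresholds are assumptions for illustration only, not clinical advice.
    Rules are checked from most to least severe so the first match wins.
    """
    if v.systolic >= 180 or v.diastolic >= 120:
        category, prompt = "severely elevated", "urgent medical referral"
    elif v.systolic >= 140 or v.diastolic >= 90:
        category, prompt = "stage 2 range", "refer to physician"
    elif v.systolic >= 130 or v.diastolic >= 80:
        category, prompt = "stage 1 range", "recommend physician follow-up"
    elif v.systolic >= 120:
        category, prompt = "elevated", "recheck at next visit"
    else:
        category, prompt = "normal", "no action"
    return {"category": category, "prompt": prompt}


if __name__ == "__main__":
    # Example: a reading recorded by a hygienist during a routine visit.
    print(screen_blood_pressure(VitalSigns(systolic=150, diastolic=85)))
```

Keeping the rules in one ordered chain makes the decision path easy to audit, which matters when the prompt is meant to support, not replace, the clinician's judgment.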

While social media can provide valuable data, social marketing is a powerful tool for the promotion of health messages, ones that can target and reduce oral health disparities among subpopulations by changing attitudes, increasing knowledge, and impacting behavior.105 The Mighty Mouth oral health program, which positions oral health as essential to overall health, is a great example of a program that embraces social marketing. It emphasizes the immediate rewards of good oral health, such as fresher breath, and frames oral health as easy, important, and cost-effective. It is presented as fun and informative, rather than demanding a more scientific and informational approach. With big data, oral healthcare providers can develop effective social marketing programs to tailor the message and improve care delivery. Social media can serve as a powerful source of patient-generated data for big data analytics as well as effective media for targeted messaging for oral health promotion.106
As of July 2017, Hispanics in the United States accounted for 18.1 percent of the total population.107 They face major general health and, more specifically, oral health inequities. More than half of Hispanics over the age of 64 will suffer from tooth decay and they are more likely to experience delays in accessing dental care. There has been a widespread call by the greater dental community to address the oral health needs of this population.108 The Hispanic Dental Association convened a workshop of health care providers and other experts to examine the current state of Hispanic oral health research and identify gaps in existing data and research methods. Research and development priorities were outlined by this workshop to better meet the oral healthcare needs of Hispanic patients and to implement standardized, validated instruments using a comprehensive data collection infrastructure.109 Another disadvantaged population whose oral healthcare needs require urgent attention is older adults.110 In a recent article using cross-sectional analyses of the British Regional Heart Study (BRHS) and the Health, Aging, and Body Composition (HABC) Study,111 markers of poor oral health were associated with disability and poor physical function in older populations. The authors proposed further prospective investigations of these associations and underlying pathways. These research initiatives require an increase in population-based studies, social and behavioral sciences, health promotion and communications, gene–environment interactions, and research training and workforce development. Broad inclusion of minority populations and other disadvantaged populations in big data initiatives was identified as a crucial component of addressing oral health inequities.112 By utilizing big data and precision oral health, public health professionals can promote oral health in minority communities and attempt to reduce these disparities.

Three major challenges

A lack of up-to-date evidence-based guidelines has been previously noted in the literature.113 This paucity of evidence-based practices can be corrected by employing knowledge discovery pipelines revealed through big data analytics and predictive modeling. Knowledge discovery in multiple interconnected databases allows the extraction of useful, nontrivial, and valid patterns from large heterogeneous data sets,114 avoiding the “garbage in garbage out” risk presented by the large quantities of data available. Broadly, this pipeline may entail the following steps: electronic phenotyping → care pathways visualization → sequential pattern mining and predictive models. Electronic phenotyping is essential for big data research using EHRs. This concept is based on the notion that every disease and syndrome has a unique digital signature that can be used to automatically identify cases in EHR even in the presence of erroneous or insufficient coding. This is well illustrated by a dry socket signature in EHR. When a patient is diagnosed with a dry socket, they must have had an extraction first, then returned within a set number of days with pain, open socket, or missing blood clot,59,89 with or without other possible symptoms. Even without the proper code for a dry socket, we still have an electronic signature of this event in the record and possible accompanying side effects as well. Electronic phenotyping can be learned by utilizing data science and classifying the exclusive signatures of conditions, adverse events, and procedures. An analytical system can read past what is stated in the note, and with high sensitivity and specificity, recognize the classes of the encounter, creating care pathway visualizations. The next step is employing sequential pattern mining, to ultimately build precise predictive models. Overall, this approach can sustain knowledge discovery and a quality improvement cycle.
Implementing these concepts requires verified, integrative, and harmonized data sets and the design of an intelligent integrative information architecture to manage these data. The systems are integrative because they utilize multiple heterogeneous data sets from basic science, electronic medical records, patients, environments, and so on. It is insufficient to simply merge silos into one data set. It is required that there be an understanding of what the data are and how they are interrelated; in that sense, the data sets must be intelligent. Data sets can only know about themselves if the system includes metadata that explain what is in the data set and what the relationships are between internal and external data elements. The underlying systems will understand these relationships using cross-linked biomedical ontologies that will facilitate comprehensive knowledge discovery in a harmonized systematic way.
This process of learning and intervening has already begun, utilizing social media. An innovative example of

Figure 3 Example workflow of precision medicine integration into clinical practice.

using big data analytics for targeted public health interventions was recently demonstrated by a team at Columbia University that developed a system that identifies foodborne illnesses in NYC restaurants by analyzing Yelp reviews.115 The system utilizes logistic regression trained with bias-adjusted augmented data, which has identified 10 outbreaks and 8,253 complaints of foodborne illness associated with NYC restaurants since 2012. Perhaps the best description of the necessary pieces of a learning health care system was given by Dr. Friedman, who chairs the Department of Learning Health Sciences at the University of Michigan School of Medicine in Ann Arbor. He outlines five major points that have been frequently repeated116:
1. One can learn from every patient’s characteristics and experiences.
2. Best practice knowledge is immediately available to support decisions.
3. Improvement is continuous through ongoing study.
4. An infrastructure enables this to happen routinely and with economy of scale.
5. All of this is part of the culture.
Progress in implementing these five major tenets of LHS into dental care has been slow.113 The major prerequisite for the successful introduction of a sustainable LHS ecosystem in dental care is the establishment of a comprehensive, integrated, intelligent big data infrastructure combining multiple heterogeneous data streams related to oral health. This is both a challenge to and impetus for the implementation of precision oral healthcare. Precision oral health is a cyclical process and once it reaches its action potential, it will be a self-perpetuating system when it is implemented in the context of LHS. It is a cycle of: capturing data actively and passively → organizing data into actionable information → analyzing outcomes → learning new protocols, and cycling back to the first step. From the patient’s perspective, the process has begun before the patient has even set foot into the office. In Figure 3, we illustrate the self-perpetuating cycle of integrating precision medicine principles into oral healthcare.
We can potentially use the data to identify a patient’s preferences and recognize moments when s/he can be delivered messaging tailored to their individual oral health needs. During the patient’s clinic visit, there is decision support, risk assessment, vital sign monitoring, pain reduction, resiliency support, oral health education, health screening, and survey completion. In the clinic, the decision support can be provided based on personalized algorithm analytics that can identify similar patients and understand which procedures were most effective for this group of patients and suggest these treatments at the point of delivery. Post-visit, the process repeats itself, while continually updating the intelligent integrative databases that will then be able to help this individual and others in the future. Overall, this environment would promote precision oral health and allow us to build powerful predictive models and hopefully improve dental care delivery and patient outcomes. This strategy would allow the provider to better employ different translational projects that also

© 2020 The Authors. Journal of Public Health Dentistry published by Wiley Periodicals, Inc. on behalf of American Association of Public Health Dentistry.

