Network Analytics: An Introduction and Illustrative Applications in Health Data Science
Research
To cite this article: Pankush Kalgotra & Ramesh Sharda (2023) Network analytics:
an introduction and illustrative applications in health data science, Journal of
Information Technology Case and Application Research, 25:3, 305-315, DOI:
10.1080/15228053.2023.2187995
ABSTRACT
Analytics researchers are widely using network analysis as a part
of their methodology. In this review paper, we discuss different
network concepts while summarizing some studies conducted
using descriptive, predictive, and prescriptive analytics
approaches. These applications illustrate the value of incorporating
network properties of a phenomenon in better understanding
the problem, prediction, and optimization of an
outcome of interest, especially in the health domain.
Introduction
Network analysis is a popular method for analyzing complex problems
involving interactions among features or observations. While network analysis
is not a new technique, it has recently gained momentum because computing has
become cheap and the algorithms that analyze large networks require
substantial processing power. In addition, its suitability for analyzing large datasets
involving underlying relationships or connectedness has made it one of the top
choices for analytics researchers.
A network comprises nodes connected through well-defined edges. One
major area that generates network-type data is social media, where relationships
are explicitly embedded. In other words, the nodes make decisions to
connect to other nodes. For instance, two friends on Facebook make
a connection in the Facebook network, which is an explicit edge. However,
there are other types of networks with implicit relationships, in which edges
are derived by computation from some underlying exchange. Examples
include product co-purchase networks (Dhar et al., 2014), ingredient networks
(Teng et al., 2012), comorbidity networks (Hidalgo et al., 2009; Kalgotra et al.,
2017), text-based networks (Celardo & Everett, 2020), and networks of brain
regions (Kalgotra & Sharda, 2018), among others.
In this paper, we discuss a variety of data science research that
uses network analysis. It is important to note that our focus is not
specifically on social networks. We refer the interested reader to a review paper
by Borgatti et al. (2009) in the journal Science, in which the authors elaborated
on the history of social network analysis, the theories emerging from social
network analysis, and the types of research questions studied in the past.
© 2023 The Author(s). Published with license by Taylor & Francis Group, LLC.
In this review paper, we start by discussing the representation of network
data and common outputs from network analysis. Then, different types of
network analytics research are discussed. In each type, we include some of our
published papers as examples in addition to other papers. Finally, we conclude
by describing the potential contributions of network analytics research.
The most common outputs of a network analysis are node
centralities such as degree, closeness, betweenness, and eigenvector centrality. Other
node-level measures include the clustering coefficient and PageRank,
among others. At the structural (macro) level, a typical network analysis
reports metrics such as the distribution of centralities, average path length, and
density. The micro (node) and macro (structure) level measures are used to
understand the properties of the network. For instance, Watts and Strogatz
(1998) used the clustering coefficient and path length to characterize the small-world
phenomenon, whereas Barabási and Albert (1999) used the distribution of
degree centrality to describe the scale-free property of networks.
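As a minimal sketch of how some of these node-level and structural measures can be computed, the following example uses only the Python standard library on a small, hypothetical undirected network. The graph, the function names, and the normalizations (degree divided by n - 1, closeness as (n - 1) divided by the sum of shortest-path distances) are illustrative assumptions, and the closeness and path-length functions assume a connected graph.

```python
from collections import deque

# Hypothetical toy undirected network stored as an adjacency list.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def degree_centrality(graph):
    """Number of neighbors divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """(reachable nodes - 1) / sum of distances; assumes a connected graph."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result

def clustering_coefficient(graph):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    result = {}
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0
            continue
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in graph[u])
        result[v] = 2 * links / (k * (k - 1))
    return result

def density(graph):
    """Observed edges divided by the number of possible edges."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * m / (n * (n - 1))

def average_path_length(graph):
    """Mean shortest-path length over all ordered node pairs (connected graph)."""
    n = len(graph)
    total = sum(sum(bfs_distances(graph, v).values()) for v in graph)
    return total / (n * (n - 1))
```

In this toy network, node B has the highest degree and closeness, while nodes A and C have a clustering coefficient of 1 because their two neighbors are connected to each other. Dedicated network libraries provide the same measures, along with betweenness, eigenvector centrality, and PageRank, for much larger graphs.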
In addition to the mathematical computations, a common output of
a network analysis is a visualization. Since it is difficult to make sense of the
visualization of a network with a large number of nodes and edges,
visualization researchers have developed different layouts to construct a view
of the network. Common layouts include Fruchterman-Reingold
(Fruchterman & Reingold, 1991) and Yifan Hu Multilevel (Hu, 2005).
Although a layout simplifies the structure of the network by reorganizing
the nodes in a visual space, it does not provide precise information about
the network, especially when the number of nodes is large. A network visual is
of little value if it is not annotated properly or presented creatively. Therefore,
the onus is on the researchers to present the visual by annotating it or
creatively organizing it. A good example of the annotation of a network can
be found in Hidalgo et al. (2009). On the other hand, Figure 1 presents
a network of diseases developed by the authors of this paper. In this undirected
comorbidity network, the nodes are diseases. Two diseases are connected if
they co-occur in patients. The strength of a connection is computed using
the Salton cosine index. To provide meaning to the network, we organized the
network in the shape of a human body and placed the diseases at the corresponding
organ system. The size of a node is based on its degree centrality. It
is easy to interpret from the visual that mental disorders and heart disorders
have the highest degree, which would have been difficult to infer using
a typical network layout. The same network was presented in Kalgotra et al.
(2017) with the Fruchterman-Reingold layout.
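To make the construction of such a comorbidity network concrete, the following sketch computes Salton cosine weights from a handful of hypothetical patient records. It assumes the usual definition of the index, in which the number of patients with both diseases is divided by the square root of the product of the two disease prevalences. The records, names, and function are illustrative only and are not the procedure or data used for Figure 1.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical patient records: each set lists the diseases of one patient.
patients = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "depression"},
    {"hypertension", "depression"},
    {"diabetes"},
]

def salton_cosine_network(patients):
    """Return {(disease_a, disease_b): weight} for every co-occurring pair.

    weight = c_ab / sqrt(c_a * c_b), where c_ab counts patients with both
    diseases and c_a, c_b count patients with each disease.
    """
    prevalence = Counter()
    co_occurrence = Counter()
    for diseases in patients:
        prevalence.update(diseases)
        # Sort so that each undirected pair has a single canonical key.
        for a, b in combinations(sorted(diseases), 2):
            co_occurrence[(a, b)] += 1
    return {
        (a, b): count / sqrt(prevalence[a] * prevalence[b])
        for (a, b), count in co_occurrence.items()
    }

edges = salton_cosine_network(patients)

# Degree of each disease: the number of distinct diseases it co-occurs with.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
```

The resulting edge weights lie between 0 and 1, and the degree counts are the kind of quantity that could drive node size in a visual such as the one described above.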
The original method of presenting the network is a graph matrix, which is
a mathematical representation of the network. With the advent of graphical
user interfaces, the two-dimensional visualizations of large networks became
popular with software such as UCINET (Borgatti et al., 2002), Gephi (Bastian
et al., 2009), etc. We expect the next step in visual network analysis to be
analysis through virtual and augmented reality, which is more immersive
and will likely further increase the adoption of network methods across
disciplines (see Figure 2). It seems to be the natural evolution of network
analysis. Some researchers and companies are already exploring
network analysis in virtual reality, for example with VRNetzer (Pirch et al., 2021).
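As a small illustration of the matrix representation mentioned above, the sketch below builds an adjacency matrix, which is one common form of graph matrix, for a hypothetical four-node undirected network. The node labels, edge list, and function name are assumptions made for the example.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected network."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected: mirror the entry
    return matrix

# Hypothetical path-shaped network A - B - C - D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]

matrix = adjacency_matrix(nodes, edges)

# Row sums of the adjacency matrix give the degree of each node.
degrees = [sum(row) for row in matrix]
```

The same information underlies the two-dimensional drawings produced by network software; the matrix is simply the form that is convenient for computation.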
is by Kalgotra and Sharda (2021), in which network analysis has been used to
predict an outcome exogenous to the network. Specifically, the comorbidity
network was used to predict hospital length of stay (LOS). In this paper, the
authors used electronic health records of more than 24.7 million patients
across 662 US hospitals over 16 years (2000–2015). The authors used a two-step
approach to create the machine learning model: creating comorbidity
networks in the first step and then creating machine learning models for
predicting LOS in the second step.
First, an independent sample of about three million patients was used to
create comorbidity networks in which the diseases were the nodes, and two
diseases were connected if they appeared in a patient during the same hospital
visit. The comorbidity networks were used to create new variables for the
remaining patients who were not part of the network analysis. To understand
the new features generated using comorbidity networks, consider a patient
who visits the hospital with a complaint of a hypothetical disease A. The only
disease-related information available at the time of admission is disease A. In
our application case, the network was searched for disease A, and the top five
connected diseases were identified. These diseases were labeled as probable
diseases as these were likely to be diagnosed during the hospital stay. In
addition, a patient may have a history of diseases in the system, termed
historical diseases. Together, the probable and historical diseases were called
latent comorbidities. The new construct of latent comorbidities was then used
in modeling and predicting LOS at the time of admission. The predictive
models for LOS were created with patient demographics, the known diseases
at the time of admission, and latent comorbidities as the independent variables.
Long Short-Term Memory (LSTM) models were created without
and with the latent comorbidities to compute the explanatory and predictive
power added by the proposed variables. In terms of variance explained, the
new construct added 3.6%, and in terms of mean absolute percentage error
(MAPE), the latent comorbidities improved the MAPE by 1.9%. Although the
numbers seem low, they are equivalent to an improvement in the forecast
of $882.8 million. Therefore, the gain is practically significant. See Kalgotra et
al. (2023) for another such study.
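The feature-generation step described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the network is a hypothetical dictionary of edge weights, "top five connected diseases" is interpreted as the five neighbors with the largest edge weight, and the function names are invented for the example. A helper for MAPE is included to show how the forecast error is typically measured.

```python
# Hypothetical weighted comorbidity network: disease -> {neighbor: edge weight}.
network = {
    "A": {"B": 0.9, "C": 0.7, "D": 0.6, "E": 0.5, "F": 0.4, "G": 0.1},
}

def probable_diseases(network, admitted, k=5):
    """Top-k neighbors of the admitting disease, ranked by edge weight.

    Ranking by weight is an assumption; ties are broken alphabetically.
    """
    neighbors = network.get(admitted, {})
    ranked = sorted(neighbors.items(), key=lambda item: (-item[1], item[0]))
    return [disease for disease, _ in ranked[:k]]

def latent_comorbidities(network, admitted, history, k=5):
    """Union of the probable diseases and the patient's historical diseases."""
    return set(probable_diseases(network, admitted, k)) | set(history)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

# A patient admitted with disease A who has a recorded history of disease Z.
features = latent_comorbidities(network, "A", history={"Z"})
```

In a predictive model, a set such as `features` would be encoded alongside demographics and the known admitting diseases, and the model would be compared with and without these variables on measures such as MAPE.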
As evident from the examples above, networks can be used for predictive
modeling to gain additional predictive power. More such network-driven
methods are required to predict outcomes that are endogenous or exogenous
to the networks.
Conclusions
Network science has been used as a theory to understand an emergent
phenomenon and as a methodology to model relationships. Although our
focus is not on theory-driven network analysis, it is worthwhile to mention
some important concepts and theories derived from network analysis. Some
of the well-known concepts include random graphs (Erdős & Rényi, 1959),
scale-free networks (Barabási & Albert, 1999), strength of weak ties
(Granovetter, 1973), power-law distribution of the WWW (Adamic &
Huberman, 2000), small-world phenomenon (Watts & Strogatz, 1998),
Benford’s law in online social networks (Golbeck, 2015), role of cliques
(Provan & Sebastian, 1998), information diffusion (Bakshy et al., 2012),
preferential attachment (Newman, 2001), network flow (Borgatti, 2005) and
community detection (Reichardt & Bornholdt, 2006), among others. In
addition, studies have been designed to understand the validity or the
structure of the network (Kalgotra, Sharda, & Luse, 2020).
In data-driven research, the purpose is to discover new theories and patterns.
Therefore, it is important to generalize the novel relationships between
concepts involving network analysis. In network analytics studies, the contextual
and methodological contributions are more apparent. In the papers
with contextual contributions, novel networks are created. In other words,
a problem is studied through a novel network lens. On the other hand, in the
papers with methodological contributions, the network analysis is a crucial
part of a bigger methodological process and thus contributes to the method of
the study. Consequently, it is important to generalize the methodology so that
it can be applied in different settings and different problem domains.
In this paper, we attempted to review different types of analytics research
conducted using network analysis. In each type of analytics, several papers
across the disciplines were discussed. In addition, the relevant concepts and
references were listed throughout the paper. Therefore, our paper can be used
as a guide by researchers and educators interested in learning and applying
network methodology.
Notes on contributors
Pankush Kalgotra is an Assistant Professor of Business Analytics at Auburn University. He
earned his Ph.D. in Management Science and Information Systems from Oklahoma State
University. His research interests include healthcare analytics, network science, neuroimaging
in information systems, and the dark side of information technology. His recent papers have
appeared in journals such as Journal of Management Information Systems, Journal of Business
Research, Nature: Scientific Reports, International Journal of Medical Informatics, Computer
Methods and Programs in Biomedicine, Information Systems Frontiers, Decision Sciences
Journal of Innovative Education, and others.
Ramesh Sharda is the Vice Dean for Research and Graduate Programs, Watson/ConocoPhillips
Chair and a Regents Professor of Management Science and Information Systems in the Spears
School of Business at Oklahoma State University. His research has been published in major
journals in management science and information systems including Management Science,
Information Systems Research, Journal of Management Information Systems, Operations
Research, Nature: Scientific Reports, INFORMS Journal on Computing, and many others. He
has coauthored two textbooks: Analytics, Data Science, and Artificial Intelligence: Systems for
Decision Support, 11th edition, and Business Intelligence, Analytics, and Data Science: A
Managerial Perspective, 4th edition. He is a member of several editorial boards, served as the
Executive Director of Teradata University Network, and was inducted into the Oklahoma Higher
Education Hall of Fame. Dr. Sharda is a Fellow of INFORMS and AIS.
References
Adamic, L. A., & Huberman, B. A. (2000). Power-law distribution of the world wide web.
Science, 287(5461), 2115. https://doi.org/10.1126/science.287.5461.2115a
Albert, R., Jeong, H., & Barabási, A. L. (1999). Diameter of the world-wide web. Nature,
401(6749), 130–131. https://doi.org/10.1038/43601
Bakshy, E., Rosenn, I., Marlow, C., & Adamic, L. (2012, April 16–20). The role of social
networks in information diffusion. In Proceedings of the 21st international conference on
World Wide Web, Lyon, France, (pp. 519–528).
Balakrishnan, V. K. (2019). Network optimization. Chapman and Hall/CRC.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science,
286(5439), 509–512. https://doi.org/10.1126/science.286.5439.509
Bastian, M., Heymann, S., & Jacomy, M. (2009, March). Gephi: An open source software for
exploring and manipulating networks. Proceedings of the International AAAI Conference on
Web and Social Media, 3(1), 361–362. https://doi.org/10.1609/icwsm.v3i1.13937
Borgatti, S. P. (2005). Centrality and network flow. Social Networks, 27(1), 55–71.
https://doi.org/10.1016/j.socnet.2004.11.008
Borgatti, S. P., Everett, M. G., & Freeman, L. C. (2002). Ucinet for windows: Software for social
network analysis. Harvard, MA: Analytic Technologies, 6, 12–15.
https://pages.uoregon.edu/vburris/hc431/Ucinet_Guide.pdf
Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social
sciences. Science, 323(5916), 892–895. https://doi.org/10.1126/science.1165821
Celardo, L., & Everett, M. G. (2020). Network text analysis: A two-way classification approach.
International Journal of Information Management, 51, 102009.
https://doi.org/10.1016/j.ijinfomgt.2019.09.005
Davazdahemami, B., Kalgotra, P., Zolbanin, H. M., & Delen, D. (2023). A developer-oriented
recommender model for the app store: A predictive network analytics approach. Journal of
Business Research, 158, 113649. https://doi.org/10.1016/j.jbusres.2023.113649
Dhar, V., Geva, T., Oestreicher-Singer, G., & Sundararajan, A. (2014). Prediction in economic
networks. Information Systems Research, 25(2), 264–284.
https://doi.org/10.1287/isre.2013.0510
Erdős, P., & Rényi, A. (1959). On random graphs. I. Publicationes Mathematicae, 6(3–4),
290–297. https://doi.org/10.5486/PMD.1959.6.3-4.12
Fruchterman, T. M., & Reingold, E. M. (1991). Graph drawing by force‐directed placement.
Software: Practice & Experience, 21(11), 1129–1164. https://doi.org/10.1002/spe.4380211102
Golbeck, J. (2015). Benford’s law applies to online social networks. Plos One, 10(8), e0135169.
https://doi.org/10.1371/journal.pone.0135169
Granovetter, M. S. (1973). The strength of weak ties. The American Journal of Sociology, 78(6),
1360–1380. https://doi.org/10.1086/225469
Hidalgo, C. A., Blumm, N., Barabási, A. L., & Christakis, N. A. (2009). A dynamic network
approach for the study of human phenotypes. PLoS Computational Biology, 5(4), e1000353.
https://doi.org/10.1371/journal.pcbi.1000353
Hu, Y. (2005). Efficient, high-quality force-directed graph drawing. Mathematica Journal,
10(1), 37–71.
Jaccard, P. (1901). Distribution de la flore alpine dans le bassin des Dranses et dans quelques
régions voisines. Bull Soc Vaudoise Sci Nat, 37, 241–272.
Kalgotra, P., & Sharda, R. (2018). BIARAM: A process for analyzing correlated brain regions
using association rule mining. Computer Methods and Programs in Biomedicine, 162,
99–108. https://doi.org/10.1016/j.cmpb.2018.05.001
Kalgotra, P., & Sharda, R. (2021). When will I get out of the hospital? Modeling length of stay
using comorbidity networks. Journal of Management Information Systems, 38(4),
1150–1184. https://doi.org/10.1080/07421222.2021.1990618
Kalgotra, P., Sharda, R., & Croff, J. M. (2017). Examining health disparities by gender:
A multimorbidity network analysis of electronic medical record. International Journal of
Medical Informatics, 108, 22–28. https://doi.org/10.1016/j.ijmedinf.2017.09.014
Kalgotra, P., Sharda, R., & Croff, J. M. (2020). Examining multimorbidity differences across
racial groups: A network analysis of electronic medical records. Scientific Reports, 10(1), 1–9.
https://doi.org/10.1038/s41598-020-70470-8
Kalgotra, P., Sharda, R., & Luse, A. (2020). Which similarity measure to use in network
analysis: Impact of sample size on phi correlation coefficient and Ochiai index.
International Journal of Information Management, 55, 102229.
https://doi.org/10.1016/j.ijinfomgt.2020.102229
Kalgotra, P., Sharda, R., & Parasa, S. (2023). Quantifying disease-interactions through
co-occurrence matrices to predict early onset colorectal cancer. Decision Support Systems,
113929. https://doi.org/10.1016/j.dss.2023.113929
Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical
Mechanics and Its Applications, 390(6), 1150–1170.
https://doi.org/10.1016/j.physa.2010.11.027
Miao, Z., & Balasundaram, B. (2017). Approaches for finding cohesive subgroups in large‐scale
social networks via maximum k‐plex detection. Networks, 69(4), 388–407.
https://doi.org/10.1002/net.21745
Newman, M. E. (2001). Clustering and preferential attachment in growing networks. Physical
Review E, 64(2), 025102. https://doi.org/10.1103/PhysRevE.64.025102
Pirch, S., Müller, F., Iofinova, E., Pazmandi, J., Hütter, C. V., Chiettini, M., Menche, J. . . .
Menche, J. (2021). The VRNetzer platform enables interactive network analysis in virtual
reality. Nature Communications, 12(1), 1–14. https://doi.org/10.1038/s41467-021-22570-w
Provan, K. G., & Sebastian, J. G. (1998). Networks within networks: Service link overlap,
organizational cliques, and network effectiveness. Academy of Management Journal, 41(4),
453–463. https://doi.org/10.2307/257084
Reichardt, J., & Bornholdt, S. (2006). Statistical mechanics of community detection. Physical
Review E, 74(1), 016110. https://doi.org/10.1103/PhysRevE.74.016110
Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. McGraw-Hill.
Selim, H., & Zhan, J. (2016). Towards shortest path identification on large networks. Journal of
Big Data, 3(1), 1–18. https://doi.org/10.1186/s40537-016-0042-7
Teng, C. Y., Lin, Y. R., & Adamic, L. A. (2012, June). Recipe recommendation using ingredient
networks. In Proceedings of the 4th Annual ACM Web Science Conference, Austin, TX, USA,
(pp. 298–307).
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature,
393(6684), 440–442. https://doi.org/10.1038/30918
Pankush Kalgotra
Harbert College of Business, Auburn University, AL, USA,
http://orcid.org/0000-0003-2684-0342
Ramesh Sharda
Oklahoma State University, Stillwater, OK, USA