Module-1 Data Analytics in Healthcare Systems
All content following this page was uploaded by s. Suganthi on 19 September 2021.
CONTENTS
1.1 Introduction
1.1.1 Data Analytics in Healthcare
1.1.2 Characteristics of Big Data
1.2 Architectural Framework
1.2.1 Data Aggregation
1.2.2 Data Processing
1.2.3 Data Visualization
1.3 Data Analytics Tools in Healthcare
1.3.1 Data Integration Tools
1.3.2 Searching and Processing Tools
1.3.3 Machine Learning Tools
1.3.4 Real-Time and Streaming Data Processing Tools
1.3.5 Visual Data Analytical Tools
1.4 Data Analytics Techniques in Healthcare
1.5 Applications of Data Analytics in Healthcare
1.6 Challenges Associated with Healthcare Data
1.7 Conclusion
References
1.1 INTRODUCTION
The healthcare industry is multidimensional, with multiple data sources involving
healthcare systems, health insurers, clinical researchers, social media, and government
[1], generating massive amounts of data of different types. Traditional software,
hardware, and storage methods cannot handle data at this scale.
Data analytics is the process of analyzing data to identify
2 ML and Analytics in Healthcare Systems
trends and patterns to gain valuable insights. The data generated in the health indus-
try are characterized by the four Vs of big data, namely volume, velocity, variety,
and veracity, which play crucial roles in health data analytics. Also, evidence-based
decision making has gained importance, which involves the sharing of data among
various data repositories. According to Deloitte Global Healthcare Outlook, it is
expected that global healthcare expenditure will continue to increase at an annual
rate of 5.4% between 2017 and 2022. This is due to the increased importance of
personalized medicine, the use of advanced technologies, the demand for new pay-
ment models, improvement and expansion of care delivery sites, and competition.
Various research attempts, based on big data, have provided strong evidence that
the efficiency of healthcare applications depends on the underlying architecture,
techniques, and tools. Patient records can be used to generate statistical data and
reports that aid knowledge discovery, which in turn supports value-added services
for patients, better healthcare quality, timely decisions, and lower costs. Hence,
there is a need to integrate big data analytics into existing healthcare systems.
Although healthcare analytics has massive potential for value-added change, many
technological, social, organizational, economic, and policy barriers are associated
with its application [2].
1.2 ARCHITECTURAL FRAMEWORK
Hadoop/MapReduce is an open-source platform for big data analytics used in
healthcare, which performs parallel processing in a distributed environment, involv-
ing multiple nodes in the network. The use of Hadoop and MapReduce technologies
has been found to be fruitful in many healthcare applications, by improving the
performance of, for example, image processing, neural signal processing, protein
structure alignments, signal detection algorithms, and lung texture classification [5].
The architectural framework of big data in healthcare is composed of three major
components, namely data aggregation, data processing, and data visualization [6].
Figure 1.1 illustrates the conceptual framework of big data architecture in healthcare.
1.2.1 Data Aggregation
Data aggregation in healthcare involves the process of collecting and integrating raw
data from various modalities and multiple systems and converting them into a single
standard format suitable for analysis, processing, and storage in a data warehouse.
The functionalities involved in the process include data extraction, data transforma-
tion, and data loading.
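The three functions named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the pipeline of any particular product: the CSV layout, the field names (`patient_id`, `visit_date`, `weight_lb`), and the `visits` table are all assumptions made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from one source system (field names are assumptions).
RAW_CSV = """patient_id,visit_date,weight_lb
P001,03/15/2021,154.0
P002,04/02/2021,198.5
"""

def extract(text):
    """Extract: read raw rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert to one standard format (ISO dates, kilograms)."""
    records = []
    for row in rows:
        month, day, year = row["visit_date"].split("/")
        records.append((
            row["patient_id"],
            f"{year}-{month}-{day}",
            round(float(row["weight_lb"]) * 0.45359237, 1),
        ))
    return records

def load(records, conn):
    """Load: write the standardized records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits "
        "(patient_id TEXT, visit_date TEXT, weight_kg REAL)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", records)
    conn.commit()

# In-memory SQLite stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT * FROM visits ORDER BY patient_id").fetchall()
```

Keeping the three steps as separate functions mirrors the separation described in the text: a new source system would need its own `extract` and `transform`, while `load` and the target format stay unchanged.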
• Data Extraction
Healthcare data occupy large volumes and come from heterogeneous
sources. The primary sources of medical data include medical records,
health surveys, claims data, disease registries, vital records, surveil-
lance data, peer-reviewed literature, clinical trial data, and administrative
1.2.2 Data Processing
The data processing used in healthcare includes batch processing and stream
processing methods [7]. Batch processing analyzes data in batches that are collected
and stored over a period, and response time is not a primary concern.
Stream processing, by contrast, analyzes huge volumes of data as they arrive,
where a real-time response is required. Some applications in
healthcare require real-time processing of data and they are characterized by noisy
data with missing or redundant values, continuous changes in data, and the need for
a rapid response. Stream processing overcomes these difficulties with simple and
rapid information extraction by using data-mining methods, such as clustering, clas-
sification, and frequent pattern mining [7]. Apache Hadoop MapReduce is a popular
framework used for batch processing, whereas Storm and S4 are frameworks used
for stream processing, with Apache Spark and Apache Flink being frameworks used
for both batch and stream processing.
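The difference between the two modes can be illustrated with a small sketch. It does not use any of the frameworks named above; it only contrasts computing a statistic over stored data with updating the same statistic as each value arrives. The heart-rate values and the names `batch_mean` and `StreamingMean` are made up for the example.

```python
def batch_mean(readings):
    """Batch: all readings are stored first, then analyzed together."""
    return sum(readings) / len(readings)

class StreamingMean:
    """Stream: update the result as each reading arrives, without storing history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean: shift the estimate toward the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean

heart_rates = [72, 75, 71, 90, 88]  # made-up values

# Batch mode: one answer after the whole batch has been collected.
overall = batch_mean(heart_rates)

# Stream mode: an up-to-date answer after every reading.
stream = StreamingMean()
running = [stream.update(x) for x in heart_rates]
```

Both approaches end at the same mean, but the streaming version has an answer available after every reading and uses constant memory, which is the property real-time monitoring relies on.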
The Hadoop platform is the most widely used for batch processing, in which parallel
processing of huge volumes of data is carried out in a distributed manner. It is
a framework in which big data analytics is conducted through a collection of
tools, methodologies, and libraries. It consists of two main components,
the Hadoop Distributed File System (HDFS) and Hadoop MapReduce.
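The MapReduce programming model itself can be sketched without Hadoop. The example below runs sequentially in a single Python process, so it shows only the map, shuffle, and reduce steps and not the distribution across nodes that Hadoop provides. The records, the diagnosis codes, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, diagnosis_code), split across two "nodes".
partitions = [
    [("P1", "J18"), ("P2", "E11"), ("P3", "J18")],
    [("P4", "E11"), ("P5", "I10"), ("P6", "J18")],
]

def map_phase(records):
    """Map: emit a (key, 1) pair for every record in one partition."""
    return [(code, 1) for _, code in records]

def shuffle(mapped_partitions):
    """Shuffle: group all emitted values by key across partitions."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

# Count how often each diagnosis code occurs across all partitions.
counts = reduce_phase(shuffle(map_phase(p) for p in partitions))
```

Because each call to `map_phase` touches only its own partition, the map step could run on separate machines; only the shuffle needs to bring values with the same key together.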
The Hadoop ecosystem includes tools beyond HDFS: HBase, Hive, and Cassandra for
storing and querying data, Pig for scripting data flows, and Apache Flume and
Apache Sqoop for moving data in from logs and relational databases.
Apache Oozie schedules workflows that span large numbers of interconnected jobs,
Apache ZooKeeper coordinates distributed services to maintain application
reliability, and Mahout is used for machine learning [7].
1.2.3 Data Visualization
Visualization is the graphical representation of data, which helps the practitioner
gain more insights from the data. The analytical tool cleanses and evaluates the data
with the help of data-mining algorithms, evaluation, and software tools before the
data are visualized. The main applications of data analytics to healthcare are que-
ries, reports, online analytical processing (OLAP), and data mining. They are used
However, data integration tools can be used to integrate health data from multi-
ple sources to generate meaningful insights from the data. Data integration tools
include software and platforms that can aggregate data from disparate sources. The
following are some data integration solutions that can be considered in healthcare
organizations to make better and more efficient use of healthcare data.
• Attunity is an integration tool that can aggregate disparate data and files
across all major databases, including cloud platforms, data warehouses, and
Hadoop. It also supports the Health Level 7 (HL7) messaging standard,
which is a healthcare standard. It can integrate and connect with web appli-
cations in real time.
• Informatica is an advanced, multi-cloud and hybrid data integration tool
that can integrate data from multiple, disparate datasets, such as data ware-
houses, Hadoop, enterprise applications, message applications, and mid-
range systems. It also provides data management tools for companies in
the healthcare field to facilitate patient services with improved outcomes
and reduced costs. Informatica’s cloud integration allows administrators to
integrate data with in-house applications, claims processing, etc., in health
organization environments.
• Information Builders offers a data integration and business intelligence tool
that can measure and aggregate very large volumes of healthcare data collected
throughout the patient lifecycle. It ensures the availability of data
in real time across the healthcare environment and enhances the quality of
health services.
• Jitterbit [8] is a single, secure, cloud-based data integration platform that
aggregates structured and unstructured health data or clinical data retrieved
from sources such as electronic health records (EHR). It enables more efficient
operations and provides complete access to health data in a format that can be
used with other systems.
• Magic is a data integration tool that connects disparate systems for health-
care organizations. It ensures the best possible care of patients by keeping
all health-related records up to date and available to all healthcare provid-
ers. Magic’s integration platform combines diverse systems into a single
interface via the graphical user interface.
• Lucene [8] is a scalable tool for indexing large blocks of unstructured text
that provides advanced, full-text search capabilities. It can integrate easily
with Hadoop to facilitate distributed text management.
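Full-text search engines of this kind are built around an inverted index, which maps each term to the documents that contain it. The sketch below shows that data structure in plain Python; it is not Lucene's API, and the notes, document ids, and function names are assumptions made up for the example.

```python
import re
from collections import defaultdict

# Made-up clinical notes keyed by a hypothetical document id.
notes = {
    1: "Patient reports chest pain and shortness of breath.",
    2: "No chest pain. Mild fever and cough.",
    3: "Persistent cough with fever for three days.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

index = build_index(notes)
matches = search(index, "fever cough")
```

A query is answered by intersecting a few small sets rather than scanning every note, which is why the approach scales to large blocks of unstructured text. Note that this simple version matches terms only: the query "chest pain" also returns note 2, which says "No chest pain".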
• SAS Visual Analytics [13] is a web-based analytical tool that allows multi-
ple users to access a massive amount of real-time data simultaneously from
a LASR analytical server. It allows parallel networking by transferring data
from one machine to another machine to access secure data quickly.
• Tableau [13] is a business intelligence visualization tool that transforms
raw and large datasets into a defined format to provide real-time structured
data to support decision making. One of the uses of Tableau has been to
quickly diagnose genetic diseases and to help health practitioners in provid-
ing rapid treatment to the patients.
• QlikView [13] is a visualization tool that transmits related data from different
sources into electronic medical records. It provides in-memory analysis,
reduces the risk of medical error by tracking safety metrics, and lowers
the cost of delivering services to patients. It also helps ensure that
regulatory compliance requirements in the healthcare system are met and
maintained in a timely manner.
• Data Mining is useful for discovering patterns and for extracting mean-
ingful information from large databases. With the rapid growth in mas-
sive health data, data-mining techniques have helped to search for new
and valuable information (knowledge) from large complex databases in
sheer variety of data generated from different sources at different time points. The
volume of and variation in data generated by this sector are what makes it a topic of
great interest for data analysts. Conventional computing mechanisms and systems
fail to provide real-time monitoring and preventive plans for patients as well as for
the doctor. Hence, there is a need for smart strategies that decipher the incoming
data to uncover trends and anomalies and which give recommendations for patients,
helping doctors in their practice.
Some applications of data analytics in healthcare are as follows.
• Image-Based Analytics
Image-based datasets are a common source of information and are primarily
used by doctors for internal imaging. X-rays, mammography, CT, positron
emission tomography–CT (PET–CT), ultrasound, and MRI are some
of the imaging technologies commonly used for diagnostics [5, 17]. Many
organizations and medical institutions release open datasets in the hope of
fostering research. Neuro-images and MRI of the brain are widely
used for detecting tumors, anomalies in which cells grow abnormally and
form solid neoplasms. Early-stage detection is necessary for effective
treatment and recovery, and for decreasing the risk of mortality [18–20]. X-rays
are used to detect fractures, pneumonia, cancer, etc. A large
amount of research has also analyzed X-rays and CT
scans to promote early detection of the novel coronavirus disease COVID-19, with
minimal human intervention and interaction [21, 22].
With the growing number of medical records, the reliance on computer-
aided diagnosis and analysis is increasing [8]. High-performance comput-
ing and advanced analytical methods, like ML and optimization techniques,
are aiming to minimize predictive errors. Countries with low doctor-to-
patient ratios can benefit greatly from such machine-aided diagnostics.
A significant challenge associated with image-based analytics is the
amount of data generated. Images are space intensive. A single X-ray can
take up several megabytes (MBs) of storage space. The quality of an image
plays a crucial role in correct diagnosis and hence must not be compro-
mised. What adds to the processing challenges is that the data are highly
unstructured. Image processing techniques like segmentation, denoising
(noise reduction), and enhancement form the pre-requisites before use-
ful features, like color, contour, shape, pixel intensity, edges, etc., can be
extracted to train models for classification and diagnosis.
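The preprocessing chain described here (denoising, segmentation, then feature extraction) can be sketched on a toy image. The example uses a 3x3 median filter and a fixed intensity threshold, which are common textbook choices and not methods prescribed by the cited studies. The 6x6 pixel grid, the threshold of 128, and the function names are assumptions.

```python
from statistics import median

# A tiny made-up grayscale image (0-255): a bright 3x3 region plus one noisy pixel.
image = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 200, 200, 10, 10],
    [10, 200, 200, 200, 10, 10],
    [10, 200, 200, 200, 10, 10],
    [10, 10, 10, 10, 10, 255],
    [10, 10, 10, 10, 10, 10],
]

def median_filter(img):
    """Denoise: replace each pixel with the median of its 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # The window is clipped at the image border.
            window = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            out[r][c] = median(window)
    return out

def segment(img, threshold):
    """Segment: mark pixels brighter than a fixed threshold."""
    return [[pixel > threshold for pixel in row] for row in img]

def extract_features(img, mask):
    """Extract simple region features: area and mean pixel intensity."""
    values = [
        img[r][c]
        for r, row in enumerate(mask)
        for c, inside in enumerate(row)
        if inside
    ]
    return {
        "area": len(values),
        "mean_intensity": sum(values) / len(values) if values else 0.0,
    }

denoised = median_filter(image)
mask = segment(denoised, threshold=128)
features = extract_features(denoised, mask)
```

The isolated bright pixel is removed by the filter before segmentation, so it does not end up in the region. The filter also trims the corners of the bright region, a reminder that each preprocessing step alters the data the later features are computed from.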
• Signal-Based Analytics
In a world full of wearable sensors, time-based signals are being generated
at a frequency greater than that at which they can be processed. Wearable
devices, like smart watches, smart rings, and fitness trackers, continuously
track heart rate, blood pressure, sleep patterns, calories burned, etc. Apart
from personal devices and gadgets, time-stream data are being generated
by electrocardiograms (ECG), ventilators, electroneurograms (ENG), elec-
troencephalograms (EEG), phonocardiograms (PCG), etc. [23]. Analysis of
personal data, etc., can all be easily retrieved and stored permanently
because of ever-decreasing hardware cost [17, 29].
Data analytics not only helps detect disease at the earliest stage but can
also predict, with high accuracy, the timeline and trajectory of disease
development. Furthermore, it can alert the doctor to any change in vital
signs that indicates a deviation from a healthy state. It can provide
transparency to patients and give them a more personalized experience
of the entire healthcare management system [8, 30]. Healthcare organizations
also benefit, as data analytics helps them provide cost-effective
care and personalized predictions for each patient. Clinical data can also
be useful for research purposes, such as studying the demographics of patients
with a particular condition, predicting the sales of drugs and their
profitability, identifying drug competitors, analyzing usage patterns of drugs,
designing effective drugs, and uncovering inter-drug associations [8].
Clinical decision support systems (CDSS) [5, 23, 31] may therefore be
an all-round solution for automated clinical diagnostics and research. But
achieving systems with high levels of accuracy and efficiency is a big chal-
lenge. The data gathered from different sources in multiple formats over
time adds volume, variety, and velocity to the data. Systems with high com-
putational power to deal with structured, semi-structured, and unstructured
data need to be implemented. Furthermore, handwritten clinical notes,
prescriptions, and medical journals require advanced ML algorithms that draw on
natural language processing (NLP). Thus, the heterogeneity of clinical data currently remains the
most significant challenge and an open research area.
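As one illustration of how a change in vital signs might be brought to a doctor's notice, the sketch below flags readings that deviate strongly from a patient's recent baseline using a rolling z-score. This is a simple statistical rule chosen for the example, not the method of any cited system. The window size, the threshold of 3 standard deviations, the heart-rate values, and the function name are assumptions.

```python
from collections import deque
from statistics import mean, stdev

def flag_deviations(readings, window=5, z_threshold=3.0):
    """Return indices of readings that deviate strongly from the recent baseline."""
    recent = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                flagged.append(i)
                continue  # keep the outlier out of the baseline
        recent.append(value)
    return flagged

heart_rate = [72, 74, 73, 71, 75, 74, 72, 118, 73, 74]  # made-up values
alerts = flag_deviations(heart_rate)
```

The baseline is computed per patient from that patient's own recent readings, which is one way to reflect the personalized monitoring the text describes. A fixed population-wide limit would be simpler but would ignore individual differences.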
• Disease Transmission and Prevention
Some diseases are infectious and can be spread via direct or indirect con-
tact with the infected person or carrier. To prevent the outbreak of such con-
ditions, it becomes critical to study the means of transmission and to predict
the spread of the disease to develop better mitigation plans and improved
disease management strategies [32]. With the help of data analytics,
mathematical and stochastic models can be generated to predict the spread of
the disease and to estimate its impact.
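A standard example of such a mathematical model is the SIR (susceptible, infected, recovered) compartmental model. The sketch below is a deterministic, discrete-time version with one-day steps; it is not the model used in the studies cited in this section. The population size, the rates `beta` and `gamma`, and the function name are assumptions chosen only to produce a plausible curve.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model with one-day steps.

    beta: transmission rate per day; gamma: recovery rate per day.
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Assumed parameters: beta / gamma = 3 new infections per case early on.
history = simulate_sir(
    population=10_000, initial_infected=10, beta=0.3, gamma=0.1, days=160
)

# Day with the largest number of simultaneously infected people.
peak_day, peak = max(enumerate(history), key=lambda item: item[1][1])
```

Lowering `beta`, for example to represent reduced contact between people, flattens and delays the peak. This is the kind of what-if question such models are used to explore when planning mitigation.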
Many researchers have studied the transmission of the novel coronavirus
that emerged in Wuhan in late 2019 and has since spread throughout the
world. The World Health Organization (WHO) declared the outbreak a pandemic
in March 2020, and it has been described as the worst epidemic of the past
two decades [33]. Based on early available data, symptoms, numbers of
positive cases, and international travel history, researchers developed a
predictive model to identify the extent of
transmission and the risk it posed to human life [34]. Many government
organizations have also funded projects to research the prevention and
preparedness of individual countries. Massive amounts of data gathered
globally have been used to develop preventive measures to stop or mini-
mize further spread of the virus. Many countries opted for complete lock-
down, halting businesses, closing schools and universities, banning travel,
vendors, etc. They continuously explore ways to understand data being gener-
ated from different sources and to understand the impact such data have on
their policies and services. This co-development and deployment environ-
ment aims to reduce the cost of healthcare services [43, 44].
1.7 CONCLUSION
Data analytics with big data in healthcare is still at a developing stage; as tools
and techniques advance, its applications will expand. In addition,
establishing proper standards and governance of data, ensuring data privacy and
security, and continuously updating healthcare systems are some of the
challenges faced by the healthcare industry. Improving communication and data sharing
among related sectors in healthcare would increase overall efficiency by
providing value-added services at minimal additional cost.
REFERENCES
1. A. Belle, R. Thiagarajan, S. M. R. Soroushmehr, F. Navidi, D. A. Beard, K. Najarian.
Big data analytics in healthcare. BioMed Research International, 2015, 16 pages,
2015. http://dx.doi.org/10.1155/2015/370194.
2. A. Kankanhalli, J. Hahn, S. Tan, G. Gao. Big data and analytics in healthcare:
Introduction to the special section. Information Systems Frontiers, 18, 233–235, 2016.
3. M. J. Ward, K. A. Marsolo, C. M. Froehle. Applications of business analytics in health-
care. Business Horizons, 57, 571–582, 2014.
4. S. Kumar, M. Singh. Big data analytics for healthcare industry: Impact, applications,
and tools. IEEE, 2 (1), 48–57, 2019.
5. N. Mehta, A. Pandit. Concurrence of big data analytics and healthcare: A systematic
review. International Journal of Medical Informatics, 114, 57–65, 2018.
6. Y. Wang, N. Hajli. Exploring the path to big data analytics success in healthcare.
Journal of Business Research, 70, 287–299, 2017.
7. N. El aboudi, L. Benhlima. Big data management for healthcare systems: Architecture,
requirements, and implementation. Advances in Bioinformatics, Hindawi, 2018.
https://doi.org/10.1155/2018/4059018.
8. V. Palanisamy, R. Thirunavukarasu. Implications of big data analytics in developing
healthcare frameworks–A review. Journal of King Saud University-Computer and
Information Sciences, 31(4), 415–425, 2019.
9. S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, T. Vassilakis.
Dremel: Interactive analysis of web-scale datasets. In: 36th International Conference
on Very Large Data Bases (VLDB), 2010.
10. C. E. Seminario, D. C. Wilson. Case study evaluation of Mahout as a recommender
platform. In: 6th ACM Conference on Recommender Systems (RecSys 2012), pp. 45–50,
2012.
11. https://www.softwaretestinghelp.com/big-data-tools/.
12. https://www.predictiveanalyticstoday.com/top-open-source-commercial-stream-analytics-platforms/.
13. https://www.yourtechdiet.com/blogs/impact-data-visualization-healthcare/.
14. N. Downing, A. Cloninger, A. Venkatesh, A. Hsieh, E. Drye, R. Coifman, et al.
Describing the performance of U.S. hospitals by applying big data analytics. PLoS One
12(6), e0179603, 2017.
15. A. Khalifa, S. Meystre. Adapting existing natural language processing resources for
cardiovascular risk factors identification in clinical notes. Journal of Biomedical
Informatics, 58, S128–S132, 2015.
16. D. D. Luxton, J. D. June, A. Sano, T. Bickmore. Intelligent mobile, wearable, and ambi-
ent technologies for behavioral health care. Artificial Intelligence in Behavioral and
Mental Health Care, Elsevier, 137, 2015.
17. B. Ristevski, M. Chen. Big data analytics in medicine and healthcare. Journal of
Integrative Bioinformatics, 15 (3), 1–5, 2018.
18. A. R. Kavitha, C. Chellamuthu. Brain tumour detection using self-adaptive learn-
ing PSO-based feature selection algorithm in MRI images. International Journal of
Business Intelligence and Data Mining, 15 (1), 2019.
19. S. Tchoketch Kebir, S. Mekaoui, M. Bouhedda. A fully automatic methodology for
MRI brain tumour detection and segmentation. The Imaging Science Journal, 67(1),
42–62, 2019.
20. T. V. N. Rao, H. Katukam, D. Guvva. Early brain tumour detection in MRI using
enhanced segmentation approach. image, 8, 9, 2019.
21. L. Brunese, F. Mercaldo, A. Reginelli, A. Santone. Explainable deep learning for pul-
monary disease and coronavirus COVID-19 detection from X-rays. Computer Methods
and Programs in Biomedicine, 196, 105608, 2020.
22. A. Jacobi, M. Chung, A. Bernheim, C. Eber. Portable chest X-ray in coronavirus dis-
ease-19 (COVID-19): A pictorial review. Clinical Imaging, 64 (April), 35–42, 2020.
23. C. K. Reddy, C. C. Aggarwal. An introduction to healthcare data analytics. In
Healthcare Data Analytics, 2015, pp. 1–18.
24. S. Dalal, V. P. Vishwakarma. GA-based KELM optimization for ECG classification.
Procedia Computer Science, 167, (2019), 580–588, 2020.
25. M. Bansal, B. Gandhi. IoT & Big Data in Smart Healthcare (ECG Monitoring). In
2019 International Conference on Machine Learning, Big Data, Cloud and Parallel
Computing (COMITCon), 390–396, IEEE, 2019, February.
26. S. Dalal, V. P. Vishwakarma, V. Sisaudia. ECG classification using Kernel extreme
learning machine. In 2nd IEEE International Conference on Power Electronics,
Intelligent Control and Energy systems (ICPEICES-2018), pp. 988–992, 2018.
27. S. Barua, M. U. Ahmed, S. Begum. Distributed multivariate physiological signal
analytics for drivers’ mental state monitoring. In International Conference on IoT
Technologies for HealthCare, pp. 26–33, Springer, Cham, 2017, October.
28. M. Neumann, O. Roesler, D. Suendermann-oeft, V. Ramanarayanan. On the utility
of audiovisual dialog technologies and signal analytics for real-time remote monitor-
ing of depression biomarkers. In Proceedings of First Workshop on Natural Language
Processing for Medical Conversations, pp. 47–52, 2020.
29. A. Belle, R. Thiagarajan, S. M. Soroushmehr, F. Navidi, D. A. Beard, K. Najarian.
Big data analytics in healthcare. BioMed Research International, 2015, 2015.
30. Y. Wang, N. Hajli. Exploring the path to big data analytics success in healthcare.
Journal of Business Research, 70, 287–299, 2017.
31. S. Shafqat, S. Kishwer, R. U. Rasool, J. Qadir, T. Amjad, H. F. Ahmad. Big data
analytics enhanced healthcare systems: A review. The Journal of Supercomputing,
76(3), 1754–1799, 2020.
32. Z. S. Wong, J. Zhou, Q. Zhang. Artificial intelligence for infectious disease big data
analytics. Infection, Disease & Health, 24(1), 44–48, 2019.
33. A. Koubâa. Understanding the covid19 outbreak: A comparative data analytics and
study. arXiv preprint arXiv:2003.14150, 2020.
34. A. J. Kucharski, T. W. Russell, C. Diamond, Y. Liu, J. Edmunds, S. Funk, ...,
S. Flasche. Early dynamics of transmission and control of COVID-19: A mathematical
modelling study. The Lancet Infectious Diseases, 20(5), 553–558, 2020.
35. N. Das, L. Das, S. S. Rautaray, M. Pandey. Detection and prevention of HIV
AIDS using big data tool. In 2018 3rd International Conference for Convergence in
Technology (I2CT), pp. 1–5. IEEE, 2018, April.
36. M. I. Razzak, M. Imran, G. Xu. Big data analytics for preventive medicine. Neural
Computing and Applications, 32(9), 4417–4451, 2020.
37. A. S. Panayides, M. S. Pattichis, S. Leandrou, C. Pitris, A. Constantinidou,
C. S. Pattichis. Radiogenomics for precision medicine with a big data analytics
perspective. IEEE Journal of Biomedical and Health Informatics, 23(5), 2063–2079, 2018.
38. T. Hulsen, S. S. Jamuar, A. R. Moody, J. H. Karnes, O. Varga, S. Hedensted, ...,
E. F. McKinney. From big data to precision medicine. Frontiers in Medicine, 6, 34,
2019.
39. D. Cirillo, A. Valencia. Big data analytics for personalized medicine. Current Opinion
in Biotechnology, 58, 161–167, 2019.
40. S. Gupta, P. Tripathi. An emerging trend of big data analytics with health insurance
in India. In 2016 International Conference on Innovation and Challenges in Cyber
Security (ICICCS-INBUSH), pp. 64–69, IEEE, 2016, February.
41. S. Revels, S. A. Kumar, O. Ben-Assuli. Predicting obesity rate and obesity-related
healthcare costs using data analytics. Health Policy and Technology, 6(2), 198–207,
2017.
42. S. Alotaibi, R. Mehmood, I. Katib. The role of big data and Twitter data analytics
in healthcare supply chain management. In Smart Infrastructure and Applications, pp.
267–279, Springer, Cham, 2020.
43. M. A. Pikkarainen. Data as a driver for shaping the practices of a preventive healthcare
service delivery network. Journal of Innovation Management, 1, 55–79, 2018.
44. M. Usak, M. Kubiatko, M. Salman. Health care service delivery based on the
Internet of things: A systematic and comprehensive study. International Journal of
Communication Systems, 33, 1–17, 2019.