
A Mini Project Report on

Health Care Analysis System

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Engineering

in
Computer Engineering
by
Kartikey Singh (21102096)
Suraj Yadav (21102120)
Parth Vora (21102007)

Under the Guidance of

Prof. Shamika Mule

for
Big Data Analytics Lab (CSL7012)

Department of Computer Engineering

A.P. Shah Institute of Technology

G. B. Road, Kasarvadavli, Thane (W), Mumbai 400615
UNIVERSITY OF MUMBAI

Academic Year 2024-2025

Index

1. Introduction
2. Objectives
3. Scope
4. Summarizing the Dataset
5. Visualizing the Dataset
6. Algorithms Details
7. Result
8. References

1. Introduction

MIMIC-III (Medical Information Mart for Intensive Care III) is one of the most comprehensive large-scale databases available for healthcare analytics. It contains health-related data associated with over 40,000 critical care patients admitted to the Beth Israel Deaconess Medical Center between 2001 and 2012. The database is a valuable resource not only for health professionals but also for researchers and data scientists seeking insights into patient care, treatment patterns, and outcomes in critical care units. Given the richness and depth of its data, MIMIC-III serves as a benchmark for numerous research studies in health informatics, machine learning, and clinical decision-making.

The data in MIMIC-III include detailed patient information such as demographics, vital signs, laboratory test results, procedures, medications, and free-text clinical notes. They cover many aspects of patient care, from admission details, diagnoses, and treatment procedures to discharge summaries, which makes the database useful for studying disease progression, treatment effectiveness, and patient outcomes. The granularity of the data lets healthcare researchers perform in-depth analyses that can advance medical research, predict patient outcomes, and identify areas where healthcare delivery can improve.

In this project, we apply Big Data analytics with Python and PySpark to the MIMIC-III dataset. The analysis extracts insights from several tables of the dataset, focusing on patients' diagnosis history, treatment procedures, and clinical notes. Through this project, we aim to demonstrate how data-driven insights can enhance healthcare strategies, improve patient care, and support better decision-making in the medical field.

2. Objectives
- To explore and analyze the MIMIC-III dataset to understand patient-care patterns in critical care units.
- To extract and visualize key insights from the dataset using Big Data analytics tools such as PySpark and Python.
- To identify the most common diagnoses and procedures and analyze their relationships using the ICD-9 coding system.
- To create visualizations that reveal trends and patterns in the healthcare data.
- To demonstrate how Big Data analytics can contribute to better healthcare delivery, decision-making, and patient outcomes.

3. Scope
- Data Ingestion: Extracting data from structured CSV files stored on Google Drive and loading them into PySpark DataFrames for efficient processing.
- Data Preprocessing: Cleaning, transforming, and integrating data from multiple tables (e.g., noteevents, diagnoses_icd, procedures_icd) to build a combined dataset suitable for analysis.
- Data Analysis: Analyzing the data with PySpark and Python to identify patterns, trends, and relationships among diagnoses, procedures, and patient outcomes.
- Data Visualization: Using Matplotlib and Seaborn to present the insights from the analysis clearly and concisely.
- Outcome Prediction: Providing actionable insights that help healthcare professionals make informed decisions and improve patient-care strategies.
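The integration step described under Data Preprocessing can be illustrated on a small scale. The sketch below is a plain-Python stand-in, not the project's PySpark code: it joins diagnosis rows to procedure rows on (SUBJECT_ID, HADM_ID) and counts how often each diagnosis/procedure pair occurs within the same admission, which is the kind of matrix a diagnosis-procedure heatmap could be drawn from. The rows and the function name `cooccurrence` are hypothetical; in PySpark the same idea would roughly correspond to a `join()` on both key columns followed by `groupBy()` and `count()`.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows (not real MIMIC-III data): (SUBJECT_ID, HADM_ID, ICD9_CODE)
diagnoses = [
    (1, 100, "4019"),
    (1, 100, "4280"),
    (2, 200, "4019"),
    (3, 300, "51881"),
]
procedures = [
    (1, 100, "9604"),
    (2, 200, "9604"),
    (3, 300, "9671"),
]


def cooccurrence(diagnoses, procedures):
    """Count (diagnosis, procedure) pairs recorded in the same admission.

    Equivalent to an inner join on (SUBJECT_ID, HADM_ID) followed by a
    count per (diagnosis code, procedure code) pair.
    """
    # Index procedure codes by admission key so the join is a dict lookup.
    procs_by_adm = defaultdict(list)
    for subject_id, hadm_id, code in procedures:
        procs_by_adm[(subject_id, hadm_id)].append(code)

    pairs = Counter()
    for subject_id, hadm_id, dx_code in diagnoses:
        # Admissions with no procedures yield no pairs (inner-join behaviour).
        for px_code in procs_by_adm.get((subject_id, hadm_id), []):
            pairs[(dx_code, px_code)] += 1
    return pairs


counts = cooccurrence(diagnoses, procedures)
for (dx, px), n in counts.most_common():
    print(dx, px, n)
```

Because this is an inner join, admissions with diagnoses but no recorded procedures contribute nothing, and duplicate input rows would inflate the counts, so de-duplication should come first.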

4. Summarizing the Dataset

1. noteevents Table
Columns Present: ROW_ID, SUBJECT_ID, HADM_ID, CHARTDATE, CHARTTIME, STORETIME, CATEGORY, DESCRIPTION, CGID, ISERROR, TEXT.
Columns Selected: SUBJECT_ID, HADM_ID, TEXT.
Description: This table contains over 2 million rows of free-text clinical notes. Long notes such as discharge summaries are organized into sections such as admission date, history of present illness, medications, allergies, and laboratory studies.
Purpose: Extract patient-related information, understand each patient's medical journey, and analyze common trends in disease progression.
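Selecting SUBJECT_ID, HADM_ID, and TEXT from the notes file can be sketched with Python's standard `csv` module. This is an assumed illustration rather than the project's code, and the two sample rows are invented. What it shows is that TEXT is quoted free text that can contain commas and line breaks, so a real CSV parser is needed instead of splitting lines on commas; in PySpark, `spark.read.csv` offers `header=True` and `multiLine=True` options for the same situation. The helper `read_notes` is hypothetical.

```python
import csv
import io

# Invented two-row excerpt using the NOTEEVENTS column layout (not real data).
# TEXT is quoted because notes contain commas and line breaks.
SAMPLE = """ROW_ID,SUBJECT_ID,HADM_ID,CHARTDATE,CHARTTIME,STORETIME,CATEGORY,DESCRIPTION,CGID,ISERROR,TEXT
1,10,1000,2101-01-01,,,Discharge summary,Report,,,"Admission Date: 2101-01-01
History of Present Illness: chest pain, dyspnea"
2,11,,2101-02-03,,,Radiology,CHEST (PORTABLE AP),,,"Findings: no acute process"
"""

SELECTED = ("SUBJECT_ID", "HADM_ID", "TEXT")


def read_notes(fileobj, columns=SELECTED):
    """Yield one dict per note containing only the selected columns."""
    # csv handles quoted fields that span several physical lines.
    reader = csv.DictReader(fileobj)
    for row in reader:
        yield {col: row[col] for col in columns}


notes = list(read_notes(io.StringIO(SAMPLE, newline="")))
for note in notes:
    print(note["SUBJECT_ID"], note["HADM_ID"] or "(no admission)", len(note["TEXT"]))
```

The second invented row has an empty HADM_ID, which is why the selected columns may still need a missing-value check before admissions are joined.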

2. diagnoses_icd Table
Columns Present: ROW_ID, SUBJECT_ID, HADM_ID, SEQ_NUM, ICD9_CODE.
Columns Selected: SUBJECT_ID, HADM_ID, ICD9_CODE.
Description: Contains around 651,000 rows covering 6,984 unique diagnosis codes. Each SUBJECT_ID and HADM_ID combination (one hospital admission) has between 1 and 38 diagnoses, with SEQ_NUM giving the priority order of each diagnosis.
Purpose: Identify the most frequent diagnoses and study the relationships between diseases using their ICD-9 codes.
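The per-admission structure of the diagnoses table can be made concrete with a short sketch. Assuming rows with a missing SEQ_NUM have already been dropped, the hypothetical helper `summarize_admissions` below groups DIAGNOSES_ICD-style rows by (SUBJECT_ID, HADM_ID), counts the diagnoses in each admission, and takes the code with the lowest SEQ_NUM as the primary one. The sample rows are invented.

```python
from collections import defaultdict

# Invented DIAGNOSES_ICD-style rows: (SUBJECT_ID, HADM_ID, SEQ_NUM, ICD9_CODE).
# Assumes rows with a missing SEQ_NUM were removed beforehand.
rows = [
    (1, 100, 2, "4280"),
    (1, 100, 1, "42731"),
    (1, 100, 3, "4019"),
    (2, 200, 1, "51881"),
]


def summarize_admissions(rows):
    """Map (SUBJECT_ID, HADM_ID) -> (number of diagnoses, primary ICD-9 code).

    The primary code is taken to be the one with the lowest SEQ_NUM.
    """
    by_adm = defaultdict(list)
    for subject_id, hadm_id, seq_num, code in rows:
        by_adm[(subject_id, hadm_id)].append((seq_num, code))

    summary = {}
    for key, codes in by_adm.items():
        codes.sort()  # sort by SEQ_NUM, i.e. priority order
        summary[key] = (len(codes), codes[0][1])
    return summary


summary = summarize_admissions(rows)
print(summary)
```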

3. procedures_icd Table
Columns Present: ROW_ID, SUBJECT_ID, HADM_ID, SEQ_NUM, ICD9_CODE.
Columns Selected: SUBJECT_ID, HADM_ID, ICD9_CODE.
Description: Contains around 240,000 rows with 2,009 unique procedure codes.

6. Algorithms Details

- Data Ingestion: Data from the MIMIC-III dataset was loaded into PySpark DataFrames using spark.read.csv().
- Data Preprocessing:
  - Handled missing values with dropna().
  - Removed duplicate records with dropDuplicates().
  - Converted data types with cast().
  - Merged tables with PySpark's join() function.
- Data Analysis:
  - Aggregated data with groupBy() and count() to identify the top diagnoses and procedures.
  - Generated insights on patient demographics, diagnoses, and treatments.
- Data Visualization: Used Seaborn and Matplotlib to create bar plots, heatmaps, and word clouds.
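The preprocessing and aggregation steps above can be mirrored in plain Python to show what each PySpark call does to the rows. This is a hedged sketch on invented rows, not the project's implementation: the hypothetical `clean` stands in for `dropna()`, `dropDuplicates()`, and `cast()`, and the hypothetical `top_codes` stands in for `groupBy("ICD9_CODE").count()` sorted by count in descending order.

```python
from collections import Counter

# Invented raw DIAGNOSES_ICD-style rows as read from CSV (all values are strings).
raw_rows = [
    {"SUBJECT_ID": "1", "HADM_ID": "100", "ICD9_CODE": "4019"},
    {"SUBJECT_ID": "1", "HADM_ID": "100", "ICD9_CODE": "4019"},  # exact duplicate
    {"SUBJECT_ID": "2", "HADM_ID": "200", "ICD9_CODE": "4019"},
    {"SUBJECT_ID": "2", "HADM_ID": "200", "ICD9_CODE": "4280"},
    {"SUBJECT_ID": "3", "HADM_ID": "", "ICD9_CODE": "4280"},  # missing HADM_ID
    {"SUBJECT_ID": "3", "HADM_ID": "300", "ICD9_CODE": "42731"},
]

REQUIRED = ("SUBJECT_ID", "HADM_ID", "ICD9_CODE")


def clean(rows, required=REQUIRED):
    """Drop incomplete rows, remove exact duplicates, and cast IDs to int."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(not row.get(col) for col in required):
            continue  # like dropna(): skip rows with an empty required field
        key = tuple(row[col] for col in required)
        if key in seen:
            continue  # like dropDuplicates(): keep only the first copy
        seen.add(key)
        cleaned.append({
            "SUBJECT_ID": int(row["SUBJECT_ID"]),  # like cast("int")
            "HADM_ID": int(row["HADM_ID"]),
            "ICD9_CODE": row["ICD9_CODE"],  # codes stay strings on purpose
        })
    return cleaned


def top_codes(rows, n=3):
    """Count rows per ICD9_CODE and return the n most frequent codes."""
    return Counter(row["ICD9_CODE"] for row in rows).most_common(n)


cleaned = clean(raw_rows)
top = top_codes(cleaned)
print(top)
```

ICD9_CODE is deliberately kept as a string: ICD-9 codes can have leading zeros (for example, 0389) and can begin with letters (V and E codes), so casting them to integers would corrupt them.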

7. Result

1. The most frequent diagnoses were related to cardiovascular and respiratory conditions.
2. Common procedures included respiratory intubation and mechanical ventilation.
3. The heatmap showed strong correlations between certain diagnoses and procedures, indicating typical treatment pathways.
4. Text analysis of clinical notes identified common themes such as "hypertension," "diabetes," and "heart failure."

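The report does not spell out how themes were extracted from the notes. One simple approach, assumed here purely for illustration, is to count how many notes mention each term using a case-insensitive, whole-word regular expression. The helper `theme_counts` and the sample notes are hypothetical.

```python
import re
from collections import Counter

# The three terms named in the results; any other list could be used.
THEMES = ("hypertension", "diabetes", "heart failure")


def theme_counts(texts, themes=THEMES):
    """Count how many notes mention each theme (case-insensitive, whole words).

    Counting notes rather than raw occurrences keeps one long note
    from dominating the tally.
    """
    patterns = {
        t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in themes
    }
    counts = Counter()
    for text in texts:
        for theme, pattern in patterns.items():
            if pattern.search(text):
                counts[theme] += 1
    return counts


# Invented note snippets (not MIMIC-III text).
notes = [
    "PMH: Hypertension, diabetes mellitus type 2.",
    "Admitted with acute on chronic systolic heart failure.",
    "History of hypertension; hypertension well controlled.",
]
counts = theme_counts(notes)
print(counts)
```

Plain keyword matching ignores negation, so a phrase such as "no history of diabetes" still counts as a mention; separating affirmed from negated findings would need a clinical NLP step.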
8. References

MIMIC-III Clinical Database. Available on PhysioNet (doi:10.13026/C2XW26).
Choi et al. (2017). Medical Data Analysis with Big Data Technologies.
Xiao, Choi, and Sun (2018). Healthcare Data Analytics: Challenges and Opportunities.
