A Study of Disease Diagnosis using Machine Learning
Samin Poudel*
*Computational Data Science and Engineering, North Carolina A&T State University,
Greensboro, NC 27409

The presentation is organized as follows:

➢ Introduction
➢ Data, Algorithms and Methods
➢ Result and Discussion
➢ Conclusion and Future Work

Introduction

• Artificial Intelligence (AI) and Machine Learning (ML) are successfully
applied in practically every domain, from robotics, education, and travel
to health care
• Various applications of ML in healthcare are shown in Figure 1
• In the past decade, investment in AI for healthcare applications has
increased significantly

Figure 1: Applications of Machine Learning in Healthcare*

*https://data-flair.training/blogs/machine-learning-in-healthcare/
Introduction

• Analysis of clinical data can lead to timely diagnosis of a disease, which
in turn helps treatment of the patient start in time
• The traditional approach to diagnosing disease is generally costly and
time-consuming
• ML techniques have been able to diagnose not only common diseases but
also rare diseases
• In general, a dataset table used to build an ML model for diagnosing a
disease has one column per attribute and one column for the class variable

Introduction
Problem Statement:
• The accuracy of ML in diagnosing diseases is still a concern
• Improving the performance of ML for disease diagnosis is an active topic in
the healthcare domain
• Different ML approaches perform differently on different healthcare datasets
• There is a need for a way to apply many state-of-the-art algorithms to the
same dataset in reasonable time and with minimal lines of code, so that the
search for the best ML method for diagnosing a particular disease can be
pursued efficiently

Probable Solution:
• Libraries like AutoGluon can help find the best-performing ML approach, out
of many, for diagnosing a disease on a given dataset with minimal lines of
code.
Data, Algorithms and Methods
Data:
• Dataset used: Pima Indian Diabetes
• The dataset has 8 attributes and one class variable named Outcome.
• The Outcome variable takes the value 0 or 1; 1 means tested positive for diabetes
• The dataset has 768 instances, of which 268 tested positive for diabetes
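As a reading aid for the class counts above, the following sketch works out the class balance and the accuracy a trivial classifier would reach by always predicting the majority class. Only the two counts come from the slide; the function name is illustrative, and the figures refer to the full dataset, not to whichever test split was used.

```python
# Counts taken from the slide: 768 instances, 268 of them with Outcome = 1.
TOTAL_INSTANCES = 768
POSITIVE_INSTANCES = 268


def class_shares(total, positives):
    """Return (positive share, negative share) of the dataset."""
    negatives = total - positives
    return positives / total, negatives / total


positive_share, negative_share = class_shares(TOTAL_INSTANCES, POSITIVE_INSTANCES)

# A classifier that always predicts the larger class is right this often.
majority_baseline_accuracy = max(positive_share, negative_share)

print(f"positive share:             {positive_share:.3f}")
print(f"negative share:             {negative_share:.3f}")
print(f"majority-baseline accuracy: {majority_baseline_accuracy:.3f}")
```

Roughly 35% of the instances are positive, so about 65% accuracy is attainable without learning anything. Reported accuracies are easier to judge against that reference.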
Table 1. Statistical description of the data by attribute

          Pregnancies  Glucose  Blood Pressure  Skin Thickness  Insulin  BMI    Diabetes Pedigree Function  Age
Count     768          768      768             768             768      768    768                         768
Mean      3.85         120.89   69.10           20.57           79.79    31.99  0.47                        33.24
Std       3.37         31.97    19.35           15.95           115.244  7.88   0.33                        11.76
Min       0            0        0               0               0        0      0.078                       21
25% (Q1)  1            99       62              0               0        27.3   0.24                        24
50% (Q2)  3            117      72              23              30.5     32     0.37                        29
75% (Q3)  6            140.25   80              32              127.25   36.6   0.63                        41
Max       17           199      122             99              846      67.1   2.42                        81.0
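
The rows of Table 1 are the usual summary statistics. A minimal sketch of how such a summary can be computed with Python's standard library follows. The `describe` helper and the toy column are made up for illustration and are not the Pima data, and the use of the sample standard deviation and linearly interpolated quartiles is an assumption about how the table was produced. The sketch also counts zeros, because Table 1 reports a minimum of 0 for Glucose, Blood Pressure, Skin Thickness, Insulin and BMI, and a first quartile of 0 for Skin Thickness and Insulin. The slides do not say whether those zeros are real measurements or placeholders.

```python
import statistics


def describe(values):
    """Summary statistics for one numeric column, plus a count of zeros."""
    # method="inclusive" interpolates linearly between the min and the max.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),   # sample standard deviation (assumed)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
        "zeros": sum(1 for v in values if v == 0),
    }


# Made-up toy column, NOT taken from the dataset.
toy_column = [0, 120, 0, 80, 0, 200, 100, 0]

for name, value in describe(toy_column).items():
    print(f"{name:>5}: {value}")
```

A column whose first quartile is 0, as in this toy example, has zeros in at least a quarter of its rows, which is worth knowing before deciding whether preprocessing is needed.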

Data, Algorithms and Methods
Data:
• Exploratory data visualization showed that ML models can be built without
preprocessing the data
• Every attribute may be important for disease diagnosis with Machine Learning

Figure 2: Histogram of Attributes
Data, Algorithms and Methods
Machine Learning Algorithms Used:
• 20 Machine Learning algorithms are used, imported from the scikit-learn and
AutoGluon libraries in AWS SageMaker

Table 2. List of ML Algorithms Used

Library       Number of ML approaches  ML Algorithms
Scikit-Learn  6                        Random Forest Classifier, Decision Tree Classifier,
                                       Naïve Bayes Classifier, Perceptron,
                                       Multilayer Perceptron, Voting Classifier
AutoGluon     14                       WeightedEnsemble_L2, LightGBM_BAG_L1, LightGBM_LARGE_BAG_L1,
                                       NeuralNetFastAI_BAG_L1, CATBoost_BAG_L1, ExtraTreesGini_BAG_L1,
                                       LightGBMXT_BAG_L1, XGBoost_BAG_L1, RandomForestEntr_BAG_L1,
                                       RandomForestGini_BAG_L1, ExtraTreesEntr_BAG_L1, NeuralNetMXNet_BAG_L1,
                                       KNeighborsUnif_BAG_L1, KNeighborsDist_BAG_L1

Data, Algorithms and Methods

• Overview of Methodology:
• Data loaded into an Amazon SageMaker Jupyter instance
• Data split into training and test sets
• Machine Learning algorithms trained and tested using the scikit-learn and
AutoGluon libraries
• The training and test sets should be the same for every ML algorithm so
that the comparison among them is reasonable. This was achieved by fixing
the random seed when splitting the data into training and test sets
• The ability of the ML algorithms to diagnose diabetes is evaluated using
the classification metrics Accuracy, Precision, Recall and F1-score
• A detailed implementation of the ML algorithms is on the author's GitHub page
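
The fixed-seed split described in the methodology can be sketched with the standard library alone. The function name, the 25% test fraction, and the seed value 42 are illustrative assumptions, not values from the slides. In scikit-learn, the `random_state` argument of `train_test_split` plays the same role.

```python
import random


def seeded_split(rows, test_fraction=0.25, seed=42):
    """Split rows into (train, test) reproducibly.

    test_fraction and seed are illustrative defaults, not values from the slides.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)   # private RNG: same seed, same order
    n_test = round(len(rows) * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(768))                    # stand-in for the 768 instances
train_a, test_a = seeded_split(rows)
train_b, test_b = seeded_split(rows)

print(len(train_a), len(test_a))
print(train_a == train_b and test_a == test_b)
```

Because both calls use the same seed, every algorithm that receives these splits sees exactly the same training and test instances, so differences in the metrics come from the algorithms and not from the partition.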
Result and Discussion
Evaluation of ML Algorithms:
• Despite being a classical ML algorithm, Naïve Bayes performed best among
the ML algorithms, based on a combined analysis of all the evaluation metrics

Table 3. Evaluation of ML Algorithms

S.N.  ML Algorithm                             Accuracy  F1-score  Precision  Recall
1     Random Forest Classifier (Scikit-learn)  0.74      0.81      0.78       0.84
2     Decision Tree Classifier (Scikit-learn)  0.65      0.73      0.73       0.73
3     Naïve Bayes Classifier (Scikit-learn)    0.77      0.83      0.80       0.86
4     Perceptron (Scikit-learn)                0.49      0.47      0.71       0.35
5     Multilayer Perceptron (Scikit-learn)     0.68      0.76      0.75       0.77
6     Voting Classifier (Scikit-learn)         0.72      0.78      0.79       0.77
7     AutoGluon Best Performer                 0.74      0.82      0.76       0.88
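
The four metrics in Table 3 are linked, so the table can be sanity-checked with a little arithmetic. The sketch below copies the table's values and recomputes F1 as the harmonic mean of precision and recall. It also derives the share of test instances belonging to whichever class was treated as positive, assuming precision and recall were reported for a single class. That is an assumption, since the slides do not say how the metrics were averaged, and the function names are illustrative.

```python
# Rows of Table 3: (name, accuracy, F1, precision, recall), copied from the slide.
TABLE_3 = [
    ("Random Forest", 0.74, 0.81, 0.78, 0.84),
    ("Decision Tree", 0.65, 0.73, 0.73, 0.73),
    ("Naive Bayes", 0.77, 0.83, 0.80, 0.86),
    ("Perceptron", 0.49, 0.47, 0.71, 0.35),
    ("Multilayer Perceptron", 0.68, 0.76, 0.75, 0.77),
    ("Voting Classifier", 0.72, 0.78, 0.79, 0.77),
    ("AutoGluon best performer", 0.74, 0.82, 0.76, 0.88),
]


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_positive_share(accuracy, precision, recall):
    """Share of test instances in the class the metrics treat as positive.

    With P positives out of N instances: TP = recall * P, FN = (1 - recall) * P,
    FP = TP * (1 - precision) / precision, and accuracy = 1 - (FP + FN) / N.
    Solving for P / N gives the expression below.
    """
    errors_per_positive = recall * (1 - precision) / precision + (1 - recall)
    return (1 - accuracy) / errors_per_positive


for name, acc, f1, prec, rec in TABLE_3:
    print(
        f"{name:26s} F1 recomputed={f1_from(prec, rec):.3f} reported={f1:.2f} "
        f"implied positive share={implied_positive_share(acc, prec, rec):.2f}"
    )
```

For every row the recomputed F1 agrees with the reported value to within rounding. The implied positive share comes out at roughly 0.64 to 0.66 for all seven rows, which is close to the share of Outcome = 0 instances in the full dataset (500 of 768). Under the single-class assumption, this suggests that precision, recall and F1 may have been reported with the non-diabetic class as positive. That is only an inference and should be checked against the implementation.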
Result and Discussion
• Figure 3a shows the accuracy of the different AutoGluon ML algorithms when trained with accuracy as the
validation metric. Similarly, Figure 3b shows their F1-scores when trained with F1-score as the validation metric
• The WeightedEnsemble_L2 technique performs best in both cases, and the KNN-based models perform worst in
both cases

Figure 3. (a) Evaluation of AutoGluon ML algorithms when trained with accuracy as validation metric
(b) Evaluation of AutoGluon ML algorithms when trained with F1-score as validation metric
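
The two runs behind Figure 3 differ only in the validation metric. A hedged sketch of how that can be expressed with AutoGluon's tabular API follows. `TabularPredictor`, its `eval_metric` argument, `fit`, and `leaderboard` are part of AutoGluon. The function name, the `presets` choice, and the assumption that the data arrive as pandas DataFrames with an `Outcome` column are illustrative guesses, not the author's code. The import sits inside the function so the block can be loaded without AutoGluon installed.

```python
VALIDATION_METRICS = ("accuracy", "f1")   # the two settings named in Figure 3


def autogluon_leaderboards(train_df, test_df, label="Outcome"):
    """Fit one AutoGluon predictor per validation metric.

    train_df and test_df are assumed to be pandas DataFrames that include the
    label column. Returns {metric name: leaderboard DataFrame}.
    """
    from autogluon.tabular import TabularPredictor  # needs `pip install autogluon`

    leaderboards = {}
    for metric in VALIDATION_METRICS:
        predictor = TabularPredictor(label=label, eval_metric=metric)
        # presets="best_quality" turns on bagging; assumed here because the
        # model names in Table 2 carry the _BAG_L1 suffix.
        predictor.fit(train_df, presets="best_quality")
        # One row per trained model, scored on the held-out test data.
        leaderboards[metric] = predictor.leaderboard(test_df)
    return leaderboards
```

Calling `autogluon_leaderboards(train_df, test_df)` with the same seeded split used for the scikit-learn models would yield one ranked table per metric, which is the kind of data a per-model comparison plot can be drawn from.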
Conclusion and Future Work

Conclusion:
• Libraries like AutoGluon help compare the performance of many ML approaches in diagnosing a
disease for a given dataset with minimal lines of code.
• This helps in finding the best-performing ML algorithm for a particular dataset or a particular
type of disease. It also decreases the probability of inaccurate diagnosis, which is an
important consideration when dealing with people's health.
• The performance of 20 ML approaches in diagnosing diabetes was tested on the Pima Indian Diabetes
Dataset
• For the dataset considered, the Naïve Bayes algorithm performed best among the
algorithms. This shows that using complex and computationally costly algorithms does not
necessarily improve the accuracy of diagnosing a disease.

Future Work:
• Future improvement in the performance of the ML models could start by finding the correlation
between each pair of attributes and dropping highly correlated attributes, because highly
correlated attributes can confuse a model in the learning phase.
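
The future-work idea can be sketched as follows: compute the Pearson correlation for every pair of attributes and flag pairs above a threshold. The helper names, the 0.9 threshold, the rule of dropping the second attribute of each pair, and the toy columns are all illustrative assumptions. Constant columns, which have zero variance, are not handled.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """List attribute pairs whose |correlation| is at or above the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


def attributes_to_drop(columns, threshold=0.9):
    """Drop the second attribute of each highly correlated pair."""
    return sorted({b for _, b in highly_correlated_pairs(columns, threshold)})


# Made-up toy columns, NOT the Pima data.
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # exactly 2 * a, so perfectly correlated with a
    "c": [5, 1, 4, 2, 3],
}
print(highly_correlated_pairs(toy))
print(attributes_to_drop(toy))
```

In the toy data only the pair ("a", "b") crosses the threshold, so "b" is the attribute flagged for removal. Whether removing it actually improves a model would still need to be checked on held-out data.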
Thank you
