Final PPT Heart Disease

This document presents a project on predicting heart disease using machine learning techniques. It discusses collecting heart disease data from an online source, understanding and preprocessing the data which includes checking data types, missing values, and duplicates. Various machine learning models - logistic regression, random forest, and neural networks - are built and their performance is compared using confusion matrices and accuracy scores. The models achieved around 93% accuracy. The document concludes the machine learning approaches were effective for heart disease prediction and discusses potential applications and future extensions of the project.

Uploaded by

nithish Kumar

UNIVERSITY COLLEGE OF SCIENCE
OSMANIA UNIVERSITY
DEPARTMENT OF STATISTICS

DATA MODELING USING MACHINE LEARNING TECHNIQUES
HEART DISEASE PREDICTION
PRESENTED BY
Mr. PUDATHU NITHISH KUMAR
Mr. POLOJU SATHEESH
Mr. B RAMESH

Under the Supervision of


Prof. N.Ch. Bhatracharyulu
Agenda
• Introduction
• Data Collection
• Data Understanding
• Data Preprocessing
• Data Visualization
• Model Building
• Applications
• Future Scope
• Conclusion
Introduction
• This project is related to "Heart Disease Prediction".
• Heart disease has been the leading cause of death in the world over the past 10 years.
• Several different symptoms are associated with heart disease, which makes it difficult to diagnose quickly and accurately.
• This issue can be addressed with machine learning techniques.

Problem Statement

• There are instruments available that can predict heart disease, but they are either expensive or not efficient at calculating the chance of heart disease in a person.

Problem Solution
Early detection of cardiac diseases can decrease the mortality rate and overall complications. By using machine learning algorithms, we can predict heart disease or a heart attack based on different symptoms.
Required packages and libraries
Data collection
• We have collected data on heart disease.

• We have extracted the data from the website below:

https://www.kaggle.com/datasets/alexteboul/heart-disease-health-indicators-dataset
Dataset
Data Understanding
• The data understanding process includes collecting and exploring the data.

• The data we have collected consists of 1000 observations with 22 attributes.

The attributes are:

• High Blood Pressure
• Diabetes
• High Cholesterol
• Body Mass Index (BMI)
• Smoking
• Age
• Sex
• Physical Activity
• Diet
• Alcohol Consumption
• Stroke
• Health Care
• General Health and Mental Health
• Education
• Heart disease (label): 1 = presence of heart disease, 0 = absence of heart disease
Understanding Data using Descriptive Statistics:
• Measures of central tendency and measures of dispersion
• Visual description
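The measures named above can be sketched with pandas. This is only an illustration: the column names `BMI` and `HighBP` and the five sample rows are hypothetical stand-ins, not values taken from the project's dataset.

```python
import pandas as pd

# Hypothetical sample values; the project itself works on the Kaggle data.
df = pd.DataFrame({
    "BMI": [22.0, 27.5, 31.0, 24.5, 35.0],
    "HighBP": [0.0, 1.0, 1.0, 0.0, 1.0],
})

# Measures of central tendency
means = df.mean()
medians = df.median()

# Measures of dispersion
std_devs = df.std()             # sample standard deviation (ddof=1)
ranges = df.max() - df.min()    # range of each column

# describe() reports count, mean, std, min, quartiles and max in one table
print(df.describe())
```

`describe()` is usually enough for a first look; the separate calls are shown only to make each measure explicit.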
Data pre-processing

Checking the data type:

We need to check whether the data is in numerical form. If it is not, we need to transform it into numerical form.

Checking for missing values:

We need to check whether the data contains any missing values. If any missing values are found in the dataset, we have to handle them by imputation, deletion, or prediction.

Missing values can be detected visually, as shown in the figure. In the figure there are no missing values.
Checking the data type: The data is in float, i.e., numerical form, so we do not need to transform the data type.
Checking the missing values: There are no missing values in the dataset.
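A minimal sketch of these two checks, assuming pandas. The small table below is hypothetical and deliberately contains one text column and one missing value so that the transformation and imputation steps have something to act on; the project's own data needed neither.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, built to show a non-numeric column and a missing value.
df = pd.DataFrame({
    "HighBP": [1.0, 0.0, 1.0, 0.0],
    "BMI": [28.0, np.nan, 31.5, 24.0],
    "Sex": ["M", "F", "F", "M"],
})

# Checking the data type of each column
print(df.dtypes)

# Checking for missing values: count of missing entries per column
missing_counts = df.isnull().sum()

# Transforming a non-numeric column into numerical form
df["Sex"] = df["Sex"].map({"F": 0.0, "M": 1.0})

# Imputation: fill the missing BMI with the column median
df_imputed = df.fillna({"BMI": df["BMI"].median()})

# Deletion: drop every row that still has a missing value
df_dropped = df.dropna()
```

Prediction-based filling, the third option mentioned on the slide, would train a model on the complete rows to estimate the missing entries; it is left out here to keep the sketch short.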
Checking and dropping the duplicate records

We need to check for duplicate records in the data set. If any duplicate records are detected, we need to drop them.

Before

After
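The before/after comparison can be sketched as follows, assuming pandas. The four rows are hypothetical; the first and third are identical so that exactly one duplicate is found.

```python
import pandas as pd

# Hypothetical rows; row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "HighBP": [1.0, 0.0, 1.0, 1.0],
    "BMI": [28.0, 24.0, 28.0, 30.0],
})

# duplicated() marks every repeat of an earlier row as True
n_duplicates = int(df.duplicated().sum())

print("Before:", df.shape)
df_unique = df.drop_duplicates().reset_index(drop=True)
print("After:", df_unique.shape)
```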
Data visualization

The data can be understood visually by plotting graphs such as box plots, count plots, bar charts, and pie charts.
This is the heat map, or correlation plot, which shows the pairwise relationships between the variables.
Here the graph shows no multicollinearity among the variables.
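The numbers behind such a heat map are a correlation matrix. The sketch below, assuming pandas and NumPy, computes one on randomly generated stand-in columns and lists any feature pairs whose absolute correlation exceeds a threshold. The column names, the random data, and the 0.8 cut-off are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in columns (not the project's data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "BMI": rng.normal(28, 5, n),
    "Age": rng.integers(1, 14, n).astype(float),
    "HighBP": rng.integers(0, 2, n).astype(float),
})

# Pearson correlation matrix: the values a heat map colours in
corr = df.corr()

# List feature pairs with a strong linear relationship (assumed cut-off)
threshold = 0.8
high_pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(corr.round(2))
print("Highly correlated pairs:", high_pairs)
```

An empty list of pairs is a quick screen rather than a proof: pairwise correlations cannot reveal a variable that is a combination of several others.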
Model Building
The data set is used to train and test three methods:
1) Logistic Regression
2) Random Forest Classification
3) Neural Networks Classification

1. Logistic Regression/Classification

Logistic regression falls under the category of supervised learning. It measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using a logistic (sigmoid) function. Logistic regression is generally used when the dependent variable is binary (dichotomous), meaning that it can take only two possible values, such as “Yes or No” or “Living or Dead”.
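How the sigmoid function turns a weighted sum of inputs into a probability can be sketched in plain Python. The weights, bias, and the two input features below are hypothetical numbers chosen for illustration, not coefficients fitted in this project.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of class 1 for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for two features (e.g. high blood pressure, BMI)
weights = [0.9, 0.04]
bias = -2.0

p = predict_proba([1.0, 30.0], weights, bias)
label = 1 if p >= 0.5 else 0   # 1 = presence, 0 = absence
print(round(p, 3), label)
```

The 0.5 cut-off is the usual default; a lower cut-off could be chosen if missing a true case is considered more costly than a false alarm.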
2. Random Forest Classifier
Random Forest is a tree-based classification algorithm.
As the name indicates, the algorithm creates a forest with a large number of trees.
It builds each decision tree from a random sample of the training set and combines the predictions of all the trees.
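A minimal sketch of this idea with scikit-learn. The data are generated by `make_classification`, so the resulting score says nothing about the project's dataset; the sample size, the 80/20 split, and the choice of 100 trees are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data for a binary classification problem
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# bootstrap=True: each tree is fitted on a random sample (with replacement)
# of the training set; the forest combines the trees' predictions.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

rf_accuracy = forest.score(X_test, y_test)   # fraction of correct predictions
print("Number of trees:", len(forest.estimators_))
print("Test accuracy:", rf_accuracy)
```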

3. Neural Networks Classifier

• A method of computing based on the interaction of multiple connected processing elements.
• A powerful technique for solving many real-world problems.
• The ability to deal with incomplete information.
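A small network of connected processing elements can be sketched with scikit-learn's `MLPClassifier`. The slides do not state which library or architecture was used, so the two hidden layers of 16 and 8 units, the scaling step, and the synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data for a binary classification problem
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Inputs are standardised first; the network has two hidden layers
# (16 and 8 units, an assumed architecture).
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0),
)
net.fit(X_train, y_train)

nn_accuracy = net.score(X_test, y_test)
print("Test accuracy:", nn_accuracy)
```

Standardising the inputs is a common choice for this kind of model because gradient-based training tends to behave better when features share a similar scale.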
Confusion Matrices of All Three Models
Test Scores and Comparison of All Models
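How a confusion matrix and an accuracy score are obtained from predictions can be shown on a toy example in plain Python. The ten true and predicted labels below are made up for illustration and are not results from this project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp

# Made-up labels: 1 = presence of heart disease, 0 = absence
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)

# Accuracy: correct predictions divided by all predictions
accuracy = (tp + tn) / (tn + fp + fn + tp)

# Recall: share of actual disease cases that the model found
recall = tp / (tp + fn)

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy:", accuracy)
```

In scikit-learn the same quantities are available through `confusion_matrix` and `accuracy_score` in `sklearn.metrics`. Looking at recall alongside accuracy is worthwhile when one class is much rarer than the other, since a model can score high accuracy while missing most positive cases.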
Applications

• Medical Institutions
To teach medical students how the risk of a heart attack or heart disease is measured and how to identify whether a person is suffering from heart disease.

• Hospitals
To detect whether a person has heart disease.
Conclusion

The machine learning algorithms used in this project are Logistic Regression, Random Forest, and Neural Networks.
After training and testing on the data, we calculated the confusion matrix and accuracy score of each algorithm. The accuracy score of all the models we applied is the same, 92.77%, which shows that all three algorithms perform equally well on this project. Approximately 93% of the cases were predicted correctly, and the remaining 7% were predicted incorrectly.
Future scope of this project
The correct prediction of heart disease can prevent life threats, while an incorrect prediction can prove fatal.

In this project, different machine learning algorithms are applied in order to compare their results and analysis.

The heart disease dataset can help provide better outcomes and help health professionals predict heart disease efficiently and effectively.
Thank you
