EDA Presentation
• The skewness score of -0.078 for age is close to zero, so the age distribution is approximately symmetric, with only a very slight left skew (a marginally longer tail toward younger ages).
• The skewness score of 1.5722 for average glucose level indicates a strongly right-skewed distribution (skewness above 1 is commonly read as strong skew): most individuals have lower glucose levels, with a long tail of high values.
• The skewness score of 1.076 for BMI also exceeds 1, so the BMI distribution is clearly right-skewed: most individuals have low-to-moderate BMIs, with a tail of very high values.
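The skewness readings above can be checked with a short sketch. It assumes the adjusted Fisher-Pearson estimator (the one pandas' Series.skew() reports) and a common rule of thumb: |skewness| below 0.5 is roughly symmetric, 0.5 to 1 is moderate, and above 1 is strong. The helper names skewness and describe_skew are hypothetical, and the thresholds are a convention rather than a standard.

```python
import math


def skewness(values, adjusted=True):
    """Sample skewness of a sequence of numbers.

    adjusted=False returns the moment coefficient g1 = m3 / m2**1.5.
    adjusted=True applies the small-sample correction
    G1 = g1 * sqrt(n * (n - 1)) / (n - 2), the estimator pandas'
    Series.skew() reports.
    """
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data: treat as unskewed
    g1 = m3 / m2 ** 1.5
    if not adjusted:
        return g1
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def describe_skew(s):
    """Label a skewness value using a rule of thumb (assumed thresholds 0.5 and 1)."""
    if abs(s) < 0.5:
        return "approximately symmetric"
    direction = "right" if s > 0 else "left"
    if abs(s) <= 1:
        return f"moderately {direction}-skewed"
    return f"highly {direction}-skewed"


# Skewness scores reported in this analysis
for name, s in [("age", -0.078), ("avg_glucose_level", 1.5722), ("bmi", 1.076)]:
    print(name, s, describe_skew(s))
```

Under this rule of thumb, age reads as approximately symmetric, while both glucose level and BMI read as highly right-skewed.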
Univariate Analysis of Numerical Variables
● Age: The mean age of individuals in the dataset is 45.68 years, with a standard deviation of 20.83 years.
Ages range from 5 to 82 years, and the middle 50% of individuals (the interquartile range) are between 29 and
62 years old.
● Avg_glucose_level: The mean average glucose level is 106.15 mg/dL, with a standard deviation of 45.28
mg/dL. Glucose levels range from 55.12 mg/dL to 271.74 mg/dL, and the distribution is right-skewed, with a
long tail of high values.
● BMI: The mean BMI of individuals in the dataset is 28.89 kg/m², with a standard deviation of 7.85 kg/m².
BMI values range from 10.30 kg/m² to 97.60 kg/m², and the middle 50% of individuals (the interquartile
range) have a BMI between 23.5 and 33.1 kg/m².
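The summary statistics above (mean, standard deviation, range, and the middle 50% between the first and third quartiles) can be sketched with the standard library. This assumes quartiles computed with linear interpolation (statistics.quantiles with method="inclusive"), which matches pandas' default describe(); the helper numeric_summary and the toy input are hypothetical.

```python
import statistics


def numeric_summary(values):
    """Mean, sample standard deviation, range and quartiles of a numeric column.

    method="inclusive" interpolates linearly between order statistics,
    the same convention pandas' describe() uses by default.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # width of the middle 50% of the data
    }


# Hypothetical toy column, for illustration only
summary = numeric_summary([1, 2, 3, 4, 5])
print(f"middle 50% lies between {summary['q1']} and {summary['q3']}")
```

The interquartile range describes the middle half of the data, which is why it is reported as "the middle 50%" rather than as a majority.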
Analyze Relationship Between Numerical Variables
All three correlations are weak and positive: age and average glucose level (0.23) are correlated
slightly more strongly than age and BMI (0.21), while BMI and average glucose level (0.17) show the
weakest correlation.
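The pairwise coefficients above are Pearson correlations (the default of pandas' DataFrame.corr()). A minimal stdlib sketch of the coefficient and a rough verbal label follows; the 0.3 and 0.7 cut-offs and the helper names pearson and strength are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Rough label; the 0.3 / 0.7 cut-offs are a convention, not a standard."""
    a = abs(r)
    if a < 0.3:
        return "weak"
    if a < 0.7:
        return "moderate"
    return "strong"


# Coefficients reported in this analysis
for pair, r in [("age-glucose", 0.23), ("age-BMI", 0.21), ("BMI-glucose", 0.17)]:
    print(pair, r, strength(r))
```

All three reported coefficients fall in the "weak" band, so the ordering between them (0.23 vs. 0.21 vs. 0.17) is a small difference.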
Univariate Analysis For Categorical Variables
● There are 2969 females (about 59%) and 2097 males (about 41%) in the dataset.
● The majority of the individuals (90.30%) in the dataset do not have hypertension, while 9.70% have
hypertension.
● Similarly, the majority of the individuals (94.6%) in the dataset do not have heart disease, while 5.4%
have heart disease.
● The majority of the individuals (65.6%) in the dataset are married, while 34.4% are not married.
● The most common work type is private (57.20%), followed by self-employed (24.91%), children (6.96%),
government jobs (6.57%), and never worked (4.36%).
● The majority of the individuals (51.4%) in the dataset reside in urban areas, while 48.6% reside in rural areas.
● Non-smokers are the most common group (37.0%), followed by individuals with unknown smoking status (30.2%),
former smokers (17.3%), and current smokers (15.4%).
● The majority of the individuals (95.1%) in the dataset have not had a stroke, while only 4.9% have had a
stroke.
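Each percentage above is a category's count divided by the total. A minimal sketch of that computation follows (pandas offers the same via value_counts(normalize=True)); the helper name category_percentages is hypothetical. Applied to the gender counts listed above, it gives 58.6% and 41.4%.

```python
from collections import Counter


def category_percentages(labels, decimals=1):
    """Percentage share of each category, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: round(100 * v / total, decimals) for k, v in counts.most_common()}


# Gender counts reported above, expanded into a list of labels
genders = ["Female"] * 2969 + ["Male"] * 2097
print(category_percentages(genders))  # {'Female': 58.6, 'Male': 41.4}
```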
Bivariate Analysis for Categorical Variables and Target
Does work type or residence type (urban vs. rural) have any effect on the incidence of stroke?
Self-employed individuals show a higher stroke rate than those in private or government jobs.
There is no notable difference in stroke incidence between individuals residing in urban and rural areas.
Does having hypertension or heart disease increase the likelihood of having a stroke?
Individuals with a pre-existing condition, either hypertension or heart disease, show a higher stroke rate than
those without it.
Does smoking or marital status have any association with stroke?
Current smokers show a higher stroke rate than former smokers and non-smokers.
Married individuals show a higher stroke rate than those who are not married; this association may partly
reflect age, since married individuals tend to be older.
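Each comparison above amounts to the stroke rate within each category, and a chi-square test of independence can check whether a difference (such as urban vs. rural) is statistically significant. The sketch below assumes a 0/1 stroke column and column names taken from the public Kaggle stroke dataset (Residence_type, stroke); the rows are hypothetical toy data, and the helper names are illustrative. Note that scipy.stats.chi2_contingency applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from this uncorrected version.

```python
import math
from collections import defaultdict


def stroke_rate_by_group(rows, group_key, target_key="stroke"):
    """Share of rows with target == 1 within each value of group_key."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        cases[group] += row[target_key]  # target coded 0/1
    return {group: cases[group] / totals[group] for group in totals}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],   # group 1: stroke, no stroke
         [c, d]]   # group 2: stroke, no stroke

    Returns (statistic, p_value). With 1 degree of freedom the p-value is
    erfc(sqrt(statistic / 2)). All row and column totals must be non-zero.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical toy rows (column names assumed from the Kaggle dataset)
rows = [
    {"Residence_type": "Urban", "stroke": 1},
    {"Residence_type": "Urban", "stroke": 0},
    {"Residence_type": "Rural", "stroke": 1},
    {"Residence_type": "Rural", "stroke": 0},
]
print(stroke_rate_by_group(rows, "Residence_type"))  # {'Urban': 0.5, 'Rural': 0.5}
```

A large p-value means the observed difference in rates is consistent with chance; it does not by itself establish a causal effect for any of the factors above.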
Analyze Relationship Between Numerical and Target Variable
How does age relate to stroke? Are older people more likely to have a stroke?
Stroke is more common among individuals with a BMI between 20 and 37.
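Relating a numerical variable to stroke can be sketched two ways: comparing the variable's mean between stroke and non-stroke groups (for the age question), and binning the variable (for the BMI range). When binning, the stroke rate within a bin and the bin's share of all stroke cases answer different questions: a bin can hold most cases simply because it holds most people. The helper names, bin edges, and toy inputs below are hypothetical.

```python
import bisect


def mean_by_stroke(values, strokes):
    """Mean of a numeric column among stroke (1) and non-stroke (0) individuals."""
    with_stroke = [v for v, s in zip(values, strokes) if s == 1]
    without = [v for v, s in zip(values, strokes) if s == 0]
    return sum(with_stroke) / len(with_stroke), sum(without) / len(without)


def binned_stroke_summary(values, strokes, edges):
    """For each half-open bin [edges[i], edges[i+1]): head count, stroke cases,
    stroke rate within the bin, and the bin's share of all stroke cases.
    Values outside [edges[0], edges[-1]) are ignored."""
    k = len(edges) - 1
    people = [0] * k
    cases = [0] * k
    for v, s in zip(values, strokes):
        i = bisect.bisect_right(edges, v) - 1  # index of the bin containing v
        if 0 <= i < k:
            people[i] += 1
            cases[i] += s
    total_cases = sum(cases)
    return [
        {
            "bin": (edges[i], edges[i + 1]),
            "n": people[i],
            "cases": cases[i],
            "rate": cases[i] / people[i] if people[i] else float("nan"),
            "share_of_cases": cases[i] / total_cases if total_cases else float("nan"),
        }
        for i in range(k)
    ]


# Hypothetical toy BMI values and stroke flags
for row in binned_stroke_summary([22, 25, 30, 40, 45], [1, 0, 1, 1, 0], [20, 37, 60]):
    print(row)
```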
❖ Conclusion
➢ Based on our analysis of the heart stroke dataset, we identified the following five factors
associated with the incidence of stroke:
➢ Pre-existing conditions: Individuals with pre-existing conditions such as hypertension and heart
disease show a higher stroke rate.
➢ Smoking: Current smokers show a higher stroke rate than former smokers and non-smokers.
➢ Employment: Self-employed individuals show a higher stroke rate than those in private or
government jobs.
➢ Marital status: Married individuals show a higher stroke rate than those who are not married.