Axis Insurance Project
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Objective
Perform statistical analysis and extract actionable insights from the data.
Data Information
Observations: 1338
Variables: 7

Variable: Description
Age: This is an integer indicating the age of the primary beneficiary (excluding those above 64 years, since they are generally covered by the government).
Sex: This is the policy holder's gender, either male or female.
BMI: This is the body mass index (BMI), which provides a sense of how over or under-weight a person is relative to their height. BMI is equal to weight (in kilograms) divided by height (in meters) squared. An ideal BMI is within the range of 18.5 to 24.9.
Children: This is an integer indicating the number of children / dependents covered by the insurance plan.
Smoker: This is yes or no depending on whether the insured regularly smokes tobacco.
Region: This is the beneficiary's place of residence in the U.S., divided into four geographic regions - northeast, southeast, southwest, or northwest.
Charges: Individual medical costs billed by health insurance.

Note:
● There are no missing values in the dataset
● The sex, smoker and region columns have been converted to category
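The BMI definition given in the variable descriptions can be written out as a small sketch. The function names `bmi` and `in_ideal_range` and the example person are illustrative assumptions, not part of the project code.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value: float, low: float = 18.5, high: float = 24.9) -> bool:
    """True when a BMI value lies inside the 18.5 to 24.9 range quoted in the slides."""
    return low <= value <= high


# Hypothetical person: 70 kg and 1.75 m tall.
example = bmi(70.0, 1.75)
print(round(example, 2), in_ideal_range(example))
```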
Exploratory Data Analysis – Age & BMI
[Figure panels: BMI, Age]
Exploratory Data Analysis – Children & Charges
Children
● The number of children has a right skewed distribution.
● The plot suggests that we should convert the children variable to categorical for further analysis.

Charges
● Charges have a right skewed distribution. The mean of the charges is higher than the median.
● This variable has a lot of outliers towards the higher end, indicating that some people spend very large amounts on medical care.
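The skewness and outlier observations can be checked numerically. The sketch below uses only the standard library; the `charges` values are made up for illustration, and the 1.5 × IQR fence is one common outlier convention assumed here, not something the slides specify.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation over the cubed population std. dev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def skew_direction(values):
    """Label the direction of the longer tail."""
    s = sample_skewness(values)
    if s > 0:
        return "right"
    if s < 0:
        return "left"
    return "symmetric"


def upper_outliers(values, k=1.5):
    """Values above Q3 + k * IQR (assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    fence = q3 + k * (q3 - q1)
    return [v for v in values if v > fence]


# Hypothetical charges, not taken from the dataset.
charges = [1200, 2500, 3100, 4800, 9000, 13000, 47000]
print(skew_direction(charges))
print(statistics.fmean(charges) > statistics.median(charges))
print(upper_outliers(charges))
```

For a right skewed variable the mean sits above the median, which is the pattern the slide reports for charges.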
Exploratory Data Analysis – Sex & Children
[Figure panels: Sex, Children]
Exploratory Data Analysis – Smoker & Region
Smoker
● 20% of the insurance holders are smokers. It will be interesting to see how smoking affects the insurance claims.

Region
● The distribution of insurance holders across the regions of the US is fairly uniform. The southeast region does have ~3% more observations than the others, but we will have to test whether this difference is statistically significant.
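One way to test whether the regional counts depart from a uniform split is a chi-square goodness-of-fit test. This is a sketch under assumptions: the slides do not say which test was used, and the counts below are illustrative values chosen only to sum to 1338.

```python
from scipy.stats import chisquare

# Illustrative region counts (hypothetical, not read from the dataset).
region_counts = {
    "northeast": 326,
    "northwest": 324,
    "southeast": 363,
    "southwest": 325,
}

# With no expected frequencies given, chisquare tests against equal counts.
statistic, p_value = chisquare(list(region_counts.values()))

alpha = 0.05  # assumed significance level
print(f"chi2 = {statistic:.3f}, p = {p_value:.3f}")
print("reject uniformity" if p_value < alpha else "fail to reject uniformity")
```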
Exploratory Data Analysis - Correlation matrix
[Figure: Correlation matrix]
● The correlation between all the continuous variables is positive but not very high.
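A correlation matrix of this kind holds the Pearson coefficient for every pair of continuous variables. The sketch below computes it by hand with the standard library; the column values are hypothetical and the helper names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson coefficients, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for three continuous variables.
data = {
    "age": [19, 28, 33, 45, 52, 61],
    "bmi": [27.9, 33.0, 22.7, 30.1, 29.4, 31.2],
    "charges": [1700.0, 4400.0, 21900.0, 8600.0, 11000.0, 14500.0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```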
Hypothesis Testing - Medical cost
Problem: Prove (or disprove) that the medical claims made by people who smoke are greater than those made by people who don't.
● Visually, the difference between the charges of smokers and the charges of non-smokers is apparent.
● The non-smokers have much lower medical bill claims compared to the smokers.
● We will have to perform a two sample t-test to check whether the mean charges of smokers are indeed higher than those of non-smokers.
Therefore, we reject the null hypothesis that the mean charges of smokers are less than or equal to those of non-smokers.
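A one-sided two sample t-test matching this null hypothesis (mean charges of smokers ≤ mean charges of non-smokers) might look like the sketch below. The samples are synthetic, and the choice of Welch's unequal-variance form (`equal_var=False`) and of α = 0.05 are assumptions, since the slides do not state them.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic samples for illustration only; not the project data.
rng = np.random.default_rng(0)
smoker_charges = rng.normal(loc=30000, scale=10000, size=60)
nonsmoker_charges = rng.normal(loc=9000, scale=6000, size=240)

# H0: mean(smokers) <= mean(non-smokers); H1: mean(smokers) > mean(non-smokers).
result = ttest_ind(
    smoker_charges,
    nonsmoker_charges,
    equal_var=False,        # Welch's t-test (assumed)
    alternative="greater",  # one-sided alternative
)

alpha = 0.05  # assumed significance level
reject_null = result.pvalue < alpha
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}, reject H0: {reject_null}")
```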
Hypothesis Testing - BMI
Problem: Prove (or disprove) with statistical evidence that BMI of females is different from that of males.
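Because the question asks whether BMI is different, not larger or smaller, a two-sided test is the natural fit. The helper below is a sketch: `two_sided_ttest` is a hypothetical name, the BMI values are invented, and Welch's form and α = 0.05 are assumptions.

```python
from scipy.stats import ttest_ind


def two_sided_ttest(sample_a, sample_b, alpha=0.05):
    """Welch two-sided t-test. H0: the two population means are equal."""
    result = ttest_ind(sample_a, sample_b, equal_var=False)
    return {
        "statistic": float(result.statistic),
        "pvalue": float(result.pvalue),
        "reject_null": bool(result.pvalue < alpha),
    }


# Hypothetical BMI values, not taken from the dataset.
female_bmi = [27.9, 25.7, 33.4, 30.1, 26.2, 31.9, 29.8, 24.6]
male_bmi = [33.8, 28.9, 22.7, 29.6, 34.1, 26.3, 31.4, 30.5]

outcome = two_sided_ttest(female_bmi, male_bmi)
print(outcome)
```

If the p-value is at or above α, the test fails to reject the hypothesis that the mean BMI of females and males is the same.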
Hypothesis Testing - Smokers across region
Problem: Is the proportion of smokers significantly different across different regions?
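A question about proportions across categories is commonly handled with a chi-square test of independence on a smoker-by-region contingency table. The sketch below assumes that approach; the counts are illustrative and were chosen only to total 1338 with roughly 20% smokers.

```python
from scipy.stats import chi2_contingency

regions = ["northeast", "northwest", "southeast", "southwest"]

# Rows: smokers, non-smokers. Columns follow `regions`. Hypothetical counts.
table = [
    [65, 60, 88, 61],
    [260, 265, 275, 264],
]

# H0: smoking status is independent of region.
chi2, p_value, dof, expected = chi2_contingency(table)

alpha = 0.05  # assumed significance level
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
print("proportions differ" if p_value < alpha else "no significant difference")
```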
Therefore, we fail to reject the null hypothesis that the number of children has no effect on the BMI of women insurance holders.
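The slides do not name the test behind this conclusion. One common choice for comparing a continuous variable across several groups is one-way ANOVA, sketched below as an assumption. The groups and BMI values are synthetic.

```python
import numpy as np
from scipy.stats import f_oneway

# Synthetic BMI samples for women grouped by number of children (hypothetical).
rng = np.random.default_rng(1)
bmi_by_children = {
    0: rng.normal(30.0, 6.0, size=120),
    1: rng.normal(30.0, 6.0, size=80),
    2: rng.normal(30.0, 6.0, size=60),
}

# H0: the mean BMI is the same in every group.
f_stat, p_value = f_oneway(*bmi_by_children.values())

alpha = 0.05  # assumed significance level
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```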
Conclusion
Based on our previous analysis, we can conclude that:
● The claims made by smokers are higher than those made by non-smokers. We should create personalised policies for these customer categories.
● Very few people have more than 2 children; 75% of the people have 2 or fewer children. However, the number of children has no effect on the BMI of women insurance holders.
● BMI has a slight positive correlation with the medical claims.