Axis Insurance Project

Solution

Uploaded by

Abhay Poddar
Axis Insurance Project

Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Objective
Perform statistical analysis and extract actionable insights from the data.

We will focus mainly on these problems:

● Extract insights using Exploratory Data Analysis.
● Prove (or disprove) that the medical claims made by people who smoke are greater than those made by people who don't.
● Prove (or disprove) with statistical evidence that the BMI of females is different from that of males.
● Is the proportion of smokers significantly different across regions?
● Is the mean BMI of women with no children, one child and two children the same? Explain your answer with statistical evidence.

Data Information

Observations: 1338
Variables: 7

Note:
● There are no missing values in the dataset.
● The sex, smoker and region columns have been converted to category.

| Variable | Description |
|----------|-------------|
| Age | An integer indicating the age of the primary beneficiary (excluding those above 64 years, since they are generally covered by the government). |
| Sex | The policy holder's gender, either male or female. |
| BMI | The body mass index (BMI), which provides a sense of how over- or underweight a person is relative to their height. BMI is equal to weight (in kilograms) divided by height (in meters) squared. An ideal BMI is within the range of 18.5 to 24.9. |
| Children | An integer indicating the number of children / dependents covered by the insurance plan. |
| Smoker | Yes or no, depending on whether the insured regularly smokes tobacco. |
| Region | The beneficiary's place of residence in the U.S., divided into four geographic regions: northeast, southeast, southwest, or northwest. |
| Charges | Individual medical costs billed by health insurance. |
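
The BMI definition given for the BMI variable can be checked with a few lines of Python. This is only a minimal sketch: the function names `bmi` and `in_ideal_range` and the sample weight and height are illustrative assumptions, not part of the dataset.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value: float, low: float = 18.5, high: float = 24.9) -> bool:
    """True when the BMI lies in the ideal range quoted in the variable description."""
    return low <= value <= high


# Hypothetical person: 70 kg and 1.75 m tall.
example = bmi(70.0, 1.75)
print(f"BMI = {example:.1f}, ideal range: {in_ideal_range(example)}")
```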

Exploratory Data Analysis – Age & BMI
BMI
● BMI looks to have a fairly normal distribution.

Age
● Age seems uniformly distributed, with both mean and median around 40 years.

Exploratory Data Analysis – Children & Charges
Children
● The number of children has a right-skewed distribution: most policy holders have few or no children.
● The plot suggests that we should convert the children variable to categorical for further analysis.

Charges
● Charges have a right-skewed distribution. The mean charge is higher than the median charge.
● This variable has many outliers towards the higher end, indicating that some people spend very large amounts on medical care.
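
The two observations about charges (mean above median, outliers at the high end) can be illustrated with the standard library alone. The values below are hypothetical and are not taken from the dataset; the 1.5 × IQR rule is one common outlier convention and is an assumption here, since the slides do not say which rule the plot used.

```python
from statistics import mean, median, quantiles

# Hypothetical charges (not from the dataset): mostly modest bills plus a few very large ones.
charges = [1200, 1800, 2500, 3100, 4000, 5200, 6800, 9000, 24000, 48000]

avg = mean(charges)
mid = median(charges)

# A mean well above the median is a quick sign of right skew.
right_skewed = avg > mid

# Tukey's rule: values above Q3 + 1.5 * IQR are flagged as high-end outliers.
q1, _, q3 = quantiles(charges, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = [c for c in charges if c > upper_fence]

print(f"mean={avg:.0f}, median={mid:.0f}, right_skewed={right_skewed}")
print("high-end outliers:", outliers)
```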
Exploratory Data Analysis – Sex & Children
Sex
● The distribution of observations across genders is fairly similar, as we saw earlier.

Children
● Nearly 42% of insurance holders do not have a child.
● Nearly 42% of insurance holders have 1 or 2 children.

Exploratory Data Analysis – Smoker & Region
Smoker
● 20% of the insurance holders are smokers. It will be interesting to see how smoking affects the insurance claims.

Region
● The distribution of insurance holders across the regions of the US is fairly uniform. The southeast region does have ~3% more observations than the others, but we will have to test whether this difference is statistically significant.
Exploratory Data Analysis - Correlation matrix
Correlation matrix

● The correlations between all the continuous variables are positive but not very high.
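
As a rough sketch of how such a matrix is built, the snippet below computes pairwise Pearson correlations by hand. The helper `pearson` and the six rows of age, bmi and charges are hypothetical, chosen only for illustration, so the printed values say nothing about the real dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values for the continuous variables (not rows from the dataset).
columns = {
    "age": [19, 28, 33, 45, 52, 61],
    "bmi": [27.9, 33.0, 22.7, 30.1, 26.3, 35.2],
    "charges": [3900.0, 4400.0, 5100.0, 9800.0, 8700.0, 14200.0],
}

# Pairwise correlations, rounded to two decimals for display.
matrix = {a: {b: round(pearson(columns[a], columns[b]), 2) for b in columns} for a in columns}

for name, row in matrix.items():
    print(name, row)
```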

Hypothesis Testing - Medical cost
Problem: Prove (or disprove) that the medical claims made by people who smoke are greater than those made by people who don't.

● Visually, the difference between the charges of smokers and non-smokers is apparent.
● The non-smokers have much lower medical bill claims than the smokers.
● We will have to perform a two-sample t-test to check whether the mean charges of smokers are indeed greater than those of non-smokers.

● Null Hypothesis = Ho = "Mean charges of smokers are less than or equal to those of non-smokers."

● Alternate Hypothesis = Ha = "Mean charges of smokers are greater than those of non-smokers."

Using an independent t-test, we get a p-value of 4.13e-283, which is < 0.05.

Therefore, we reject the null hypothesis that the mean charges of smokers are less than or equal to those of non-smokers.
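
A one-sided independent t-test of this kind might be run as sketched below, assuming SciPy 1.6 or later for the `alternative` argument of `scipy.stats.ttest_ind`. The charges are hypothetical values for illustration, so the output will not reproduce the p-value reported above, and the slides do not say whether equal variances were assumed (SciPy's default, used here, pools the variances).

```python
from scipy import stats

# Hypothetical charges for illustration only; these are not rows from the dataset.
smoker_charges = [31200.0, 28450.0, 39800.0, 22100.0, 44700.0, 35600.0, 27300.0, 41900.0]
non_smoker_charges = [4200.0, 7600.0, 2900.0, 11800.0, 6400.0, 9100.0, 3300.0, 8700.0, 5200.0, 12400.0]

ALPHA = 0.05

# alternative="greater" tests Ha: mean(smokers) > mean(non-smokers).
one_sided = stats.ttest_ind(smoker_charges, non_smoker_charges, alternative="greater")

# The default two-sided test, shown for comparison.
two_sided = stats.ttest_ind(smoker_charges, non_smoker_charges)

reject_h0 = one_sided.pvalue < ALPHA
print(f"t = {one_sided.statistic:.2f}, one-sided p = {one_sided.pvalue:.3g}, reject Ho: {reject_h0}")
print(f"two-sided p = {two_sided.pvalue:.3g}")
```

When the t statistic is positive, the one-sided p-value is half the two-sided one, which is why the direction of the alternative hypothesis matters when reading the result.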

Hypothesis Testing - BMI
Problem: Prove (or disprove) with statistical evidence that BMI of females is different from that of males.

● Visually, there is no apparent relation between gender and BMI.

● Null Hypothesis = Ho = "Mean BMI of females is the same as that of males"

● Alternate Hypothesis = Ha = "Mean BMI of females is different from that of males"

Using an independent t-test, we get a p-value of 0.0899, which is > 0.05.

Therefore, we fail to reject the null hypothesis that the mean BMI of females is the same as that of males.
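
To show what the two-sided independent t-test computes, the sketch below derives the pooled-variance t statistic by hand and compares it with `scipy.stats.ttest_ind`. The BMI values are hypothetical, and the equal-variance (Student's) form is an assumption, since the slides do not state which variant was used.

```python
from math import sqrt
from statistics import mean, variance

from scipy import stats

# Hypothetical BMI values for illustration only; not taken from the dataset.
female_bmi = [27.9, 33.0, 22.7, 30.1, 26.3, 35.2, 24.6, 29.8]
male_bmi = [28.9, 31.4, 25.7, 34.1, 27.6, 36.0, 30.3, 23.5]

n1, n2 = len(female_bmi), len(male_bmi)

# Pooled sample variance, assuming both groups share the same population variance.
pooled_var = ((n1 - 1) * variance(female_bmi) + (n2 - 1) * variance(male_bmi)) / (n1 + n2 - 2)

# t statistic: difference in means divided by its standard error.
t_manual = (mean(female_bmi) - mean(male_bmi)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

result = stats.ttest_ind(female_bmi, male_bmi)  # two-sided by default
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```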

Hypothesis Testing - Smokers across region
Problem: Is the proportion of smokers significantly different across different regions?

● The proportion of smokers in the southeast region is higher than in the others.

● Null Hypothesis = Ho = "Region has no effect on smoking habits"

● Alternate Hypothesis = Ha = "Region has an effect on smoking habits"

Using a chi-square test, we get a p-value of 0.062, which is > 0.05.

Therefore, we fail to reject the null hypothesis that the region has no effect on smoking habits.
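
A chi-square test of independence on a region-by-smoker contingency table could look like the sketch below, using `scipy.stats.chi2_contingency`. The counts are hypothetical and serve only to show the mechanics, so the output will not match the p-value reported above.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts for illustration only (rows: regions; columns: smokers, non-smokers).
regions = ["northeast", "northwest", "southeast", "southwest"]
observed = [
    [60, 240],
    [55, 245],
    [80, 260],
    [57, 243],
]

chi2, p_value, dof, expected = chi2_contingency(observed)

# Degrees of freedom = (rows - 1) * (columns - 1) = 3 for a 4 x 2 table.
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")

# Share of smokers per region, the quantity the test compares across rows.
for region, (smokers, non_smokers) in zip(regions, observed):
    print(f"{region}: {smokers / (smokers + non_smokers):.1%} smokers")
```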
Hypothesis Testing - BMI of women
Problem: Is the mean BMI of women with no children, one child and two children the same? Explain your answer with statistical evidence.

● Null Hypothesis = Ho = "The number of children has no effect on BMI"

● Alternate Hypothesis = Ha = "The number of children has an effect on BMI"

Using a one-way ANOVA test, we get a p-value of 0.716, which is > 0.05.

Therefore, we fail to reject the null hypothesis that the number of children has no effect on BMI.
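
A one-way ANOVA across the three groups might be sketched as follows with `scipy.stats.f_oneway`. The BMI values are hypothetical and the helper `decide` is an illustrative name, not something from the original analysis; it simply applies the 0.05 threshold used throughout these slides.

```python
from scipy.stats import f_oneway

# Hypothetical BMI values for women grouped by number of children; illustration only.
bmi_no_children = [27.9, 33.0, 22.7, 30.1, 26.3, 35.2]
bmi_one_child = [28.4, 31.7, 24.9, 29.6, 33.8, 26.1]
bmi_two_children = [30.2, 25.5, 34.0, 27.8, 29.1, 32.6]


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision at the given significance level."""
    return "reject Ho" if p_value < alpha else "fail to reject Ho"


result = f_oneway(bmi_no_children, bmi_one_child, bmi_two_children)
print(f"F = {result.statistic:.3f}, p = {result.pvalue:.3f} -> {decide(result.pvalue)}")

# Decisions implied by two of the p-values reported on the slides.
print(decide(0.716))      # ANOVA on BMI of women by number of children
print(decide(4.13e-283))  # one-sided t-test on charges of smokers
```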

Conclusion
Based on our previous analysis, we can conclude that:

● The claims made by smokers are higher than those made by non-smokers. We should create personalised policies for these customer categories.
● Very few people have more than 2 children: roughly 84% of the people have 2 or fewer (about 42% with none and about 42% with 1 or 2). However, the number of children has no statistically significant effect on the BMI of the women insurance holders.
● BMI has a slight positive correlation with the medical claims.
