Medical Insurance Analysis

This document analyzes medical cost data using Python. It loads and explores an insurance cost dataset, encodes categorical variables, calculates correlations, and creates various visualizations of the relationships between variables like age, BMI, smoking status and costs. It also builds and evaluates linear regression and polynomial regression models to predict costs.

Uploaded by

Uzzal Hossen


medical-cost-analysis

May 25, 2024

[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/insurance/insurance.csv

[2]: df = pd.read_csv("/kaggle/input/insurance/insurance.csv")
df.head()

[2]:    age     sex     bmi  children smoker     region      charges
     0   19  female  27.900         0    yes  southwest  16884.92400
     1   18    male  33.770         1     no  southeast   1725.55230
     2   28    male  33.000         3     no  southeast   4449.46200
     3   33    male  22.705         0     no  northwest  21984.47061
     4   32    male  28.880         0     no  northwest   3866.85520

[3]: df.isnull().sum()

[3]: age 0
sex 0
bmi 0
children 0
smoker 0
region 0
charges 0
dtype: int64

[4]: from sklearn.preprocessing import LabelEncoder

df_aug = pd.read_csv('/kaggle/input/insurance/insurance.csv')
#sex
le = LabelEncoder()

le.fit(df_aug.sex.drop_duplicates())
df_aug.sex = le.transform(df_aug.sex)
# smoker or not
le.fit(df_aug.smoker.drop_duplicates())
df_aug.smoker = le.transform(df_aug.smoker)
#region
le.fit(df_aug.region.drop_duplicates())
df_aug.region = le.transform(df_aug.region)
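
LabelEncoder assigns integer codes in the sorted order of the class labels, so it helps to check which code stands for which category before filtering on it. A minimal sketch on a small hypothetical frame (the four rows and the names `toy` and `mappings` are made up for illustration and are not taken from insurance.csv):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows, only to show how the codes are assigned.
toy = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
})

mappings = {}
for col in ["sex", "smoker"]:
    le = LabelEncoder()
    toy[col] = le.fit_transform(toy[col])
    # classes_ is sorted, so the position of each label is its integer code
    mappings[col] = {str(label): code for code, label in enumerate(le.classes_)}

print(mappings)
# {'sex': {'female': 0, 'male': 1}, 'smoker': {'no': 0, 'yes': 1}}
```

Under this mapping, `sex == 0` selects women and `smoker == 1` selects smokers.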

[5]: df_aug.corr()['charges'].sort_values()

[5]: region -0.006208

sex 0.057292
children 0.067998
bmi 0.198341
age 0.299008
smoker 0.787251
charges 1.000000
Name: charges, dtype: float64

[6]: f, ax = pl.subplots(figsize=(10, 8))
corr = df_aug.corr()
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool),
            cmap=sns.diverging_palette(260, 20, as_cmap=True),
            square=True, ax=ax)

[6]: <Axes: >


[7]: f= pl.figure(figsize=(12,5))

ax=f.add_subplot(121)
sns.distplot(df_aug[(df_aug.smoker == 1)]["charges"],color='c',ax=ax)
ax.set_title('Distribution of charges for smokers')

ax=f.add_subplot(122)
sns.distplot(df_aug[(df_aug.smoker == 0)]['charges'],color='b',ax=ax)
ax.set_title('Distribution of charges for non-smokers')

[7]: Text(0.5, 1.0, 'Distribution of charges for non-smokers')

[8]: sns.catplot(x="smoker", kind="count",hue = 'sex', palette="pink", data=df)

[8]: <seaborn.axisgrid.FacetGrid at 0x7b10fb55c730>

[9]: sns.catplot(x="sex", y="charges", hue="smoker",
kind="violin", data=df, palette = 'magma')

[9]: <seaborn.axisgrid.FacetGrid at 0x7b10faa3feb0>

[10]: pl.figure(figsize=(12,5))
pl.title("Box plot for charges of women")
# LabelEncoder sorts labels alphabetically: female = 0, male = 1
sns.boxplot(y="smoker", x="charges", data=df_aug[(df_aug.sex == 0)],
            orient="h", palette='magma')

[10]: <Axes: title={'center': 'Box plot for charges of women'}, xlabel='charges', ylabel='smoker'>

[11]: pl.figure(figsize=(12,5))
pl.title("Box plot for charges of men")
# LabelEncoder sorts labels alphabetically: female = 0, male = 1
sns.boxplot(y="smoker", x="charges", data=df_aug[(df_aug.sex == 1)],
            orient="h", palette='rainbow')

[11]: <Axes: title={'center': 'Box plot for charges of men'}, xlabel='charges', ylabel='smoker'>

[12]: pl.figure(figsize=(12,5))
pl.title("Distribution of age")
ax = sns.distplot(df_aug["age"], color = 'g')

[13]: g = sns.jointplot(x="age", y="charges", data=df_aug[(df_aug.smoker == 0)],
                  kind="kde", fill=True, cmap="flare")
g.plot_joint(pl.scatter, c="w", s=0, linewidth=1, marker="+")
g.ax_joint.collections[0].set_alpha(0)
g.set_axis_labels("$X$", "$Y$")
# Set the title on this figure; `ax` belongs to an earlier cell
g.ax_marg_x.set_title('Distribution of charges and age for non-smokers')

[13]: Text(0.5, 1.0, 'Distribution of charges and age for non-smokers')

[14]: g = sns.jointplot(x="age", y="charges", data=df_aug[(df_aug.smoker == 1)],
                  kind="kde", fill=True, cmap="magma")
g.plot_joint(pl.scatter, c="w", s=0, linewidth=1, marker="+")
g.ax_joint.collections[0].set_alpha(0)
g.set_axis_labels("$X$", "$Y$")
# Set the title on this figure; `ax` belongs to an earlier cell
g.ax_marg_x.set_title('Distribution of charges and age for smokers')

[14]: Text(0.5, 1.0, 'Distribution of charges and age for smokers')

[15]: g = sns.lmplot(x="age", y="charges", hue="smoker", data=df_aug, palette='inferno_r')
# Set the title on this plot's own axes; `ax` belongs to an earlier cell
g.ax.set_title('Smokers and non-smokers')

[15]: Text(0.5, 1.0, 'Smokers and non-smokers')

[16]: pl.figure(figsize=(12,5))
pl.title("Distribution of bmi")
ax = sns.distplot(df["bmi"], color = 'm')

[17]: pl.figure(figsize=(12,5))
pl.title("Distribution of charges for patients with BMI of 30 or more")
ax = sns.distplot(df[(df.bmi >= 30)]['charges'], color = 'c')

[18]: pl.figure(figsize=(12,5))
pl.title("Distribution of charges for patients with BMI less than 30")
ax = sns.distplot(df[(df.bmi < 30)]['charges'], color = 'b')

[19]: g = sns.jointplot(x="bmi", y="charges", data=df, kind="kde", fill=True,
                  cmap='viridis')
g.plot_joint(pl.scatter, c="w", s=0, linewidth=1, marker="+")
g.ax_joint.collections[0].set_alpha(0)
g.set_axis_labels("$X$", "$Y$")
# Set the title on this figure; `ax` belongs to an earlier cell
g.ax_marg_x.set_title('Distribution of bmi and charges')

[19]: Text(0.5, 1.0, 'Distribution of bmi and charges')

[20]: pl.figure(figsize=(10,6))
ax = sns.scatterplot(x='bmi', y='charges', data=df_aug, palette='magma', hue='smoker')
ax.set_title('Scatter plot of charges and bmi')

sns.lmplot(x="bmi", y="charges", hue="smoker", data=df_aug, palette = 'magma')

[20]: <seaborn.axisgrid.FacetGrid at 0x7b10f40a3610>

[21]: sns.catplot(x="children", kind="count", palette="ch:.25", data=df_aug)

[21]: <seaborn.axisgrid.FacetGrid at 0x7b10f40a25c0>

[22]: g = sns.catplot(x="smoker", kind="count", palette="rainbow", hue="sex",
                data=df[(df.children > 0)])
# Set the title on this plot's own axes; `ax` belongs to an earlier cell
g.ax.set_title('Smokers and non-smokers who have children')

[22]: Text(0.5, 1.0, 'Smokers and non-smokers who have children')

[23]: from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score,mean_squared_error
from sklearn.ensemble import RandomForestRegressor

[24]: x = df_aug.drop(['charges'], axis = 1)

y = df_aug.charges

x_train,x_test,y_train,y_test = train_test_split(x,y, random_state = 0)

lr = LinearRegression().fit(x_train,y_train)

y_train_pred = lr.predict(x_train)
y_test_pred = lr.predict(x_test)

print(lr.score(x_test,y_test))

0.7962732059725786
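
The value printed by `lr.score` is the coefficient of determination R² on the test split. Cell [23] also imports `r2_score` and `mean_squared_error`; the sketch below shows one way they could report the same R² together with an error in the units of the target. The data is synthetic (a fixed NumPy seed and hypothetical coefficients), not the insurance dataset, and names such as `X_demo` and `model` are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 rows, 3 features, linear target plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
y_te_pred = model.predict(X_te)

# r2_score on the test predictions matches model.score(X_te, y_te)
r2 = r2_score(y_te, y_te_pred)
# Root mean squared error, expressed in the same units as the target
rmse = np.sqrt(mean_squared_error(y_te, y_te_pred))

print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Comparing the same metrics on the training predictions is a quick check for overfitting.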

[25]: X = df_aug.drop(['charges','region'], axis = 1)
Y = df_aug.charges

quad = PolynomialFeatures (degree = 2)

x_quad = quad.fit_transform(X)

X_train,X_test,Y_train,Y_test = train_test_split(x_quad,Y, random_state = 0)

plr = LinearRegression().fit(X_train,Y_train)

Y_train_pred = plr.predict(X_train)
Y_test_pred = plr.predict(X_test)

print(plr.score(X_test,Y_test))

0.8849197344147234
