Week 8 Lab - Linear Regression
MARIAM ADEDOYIN-OLOWE
Welcome to the Week 8 lab session, where you will continue to work with the “Insurance.csv”
data. This time, you will apply Linear Regression to the data to predict the insurance premium
people will pay, based on attributes such as age, BMI, gender and smoking status.
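# files.upload() is provided by Google Colab; this import assumes you are running in Colab
from google.colab import files
file = files.upload()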
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
data = pd.read_csv('insurance.csv')
data.info()
data['sex'].value_counts()
# To convert the text columns into numbers, first display the value counts
# of every column with the object (text) dtype
display(data['sex'].value_counts())
display(data['smoker'].value_counts())
display(data['region'].value_counts())
from sklearn.preprocessing import LabelEncoder
# create one label encoder for sex and one label encoder for smoker
le_sex = LabelEncoder()
le_smoker = LabelEncoder()
# fit() learns the mapping from the unique values only,
# e.g. female -> 0, male -> 1 (labels are coded in sorted order)
le_sex.fit(data['sex'].drop_duplicates())
le_smoker.fit(data['smoker'].drop_duplicates())
# apply the encoding and save the results in new columns. Note that
# duplicates are not dropped here because we want to transform all the rows
data['sex_enc'] = le_sex.transform(data['sex'])
data['smoker_enc'] = le_smoker.transform(data['smoker'])
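To see what the encoder has learned, you can inspect its `classes_` attribute. The snippet below is a minimal sketch on a made-up four-row column (the `toy` DataFrame is hypothetical and is not the insurance data); it shows the fit/transform pattern used above and how to map the codes back to text.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column standing in for data['sex']
toy = pd.DataFrame({'sex': ['male', 'female', 'female', 'male']})

le = LabelEncoder()
le.fit(toy['sex'].drop_duplicates())   # learn the mapping from the unique values

# classes_ holds the labels in sorted order; a label's position is its code
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

toy['sex_enc'] = le.transform(toy['sex'])        # encode every row
decoded = le.inverse_transform(toy['sex_enc'])   # recover the text labels
```

Because the codes follow sorted label order, the same reasoning applies to `smoker`, where `no` comes before `yes`.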
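# NOTE: the definition of ct is not shown in this handout. The one below is an
# assumption, chosen because it reproduces the column order used in the rename
# step: one-hot encode 'region' and pass all other columns through unchanged.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer([('region', OneHotEncoder(), ['region'])],
remainder='passthrough', sparse_threshold=0)
trans = ct.fit_transform(data)
# wrap the transformed array in a DataFrame (also an assumption)
ins_data = pd.DataFrame(trans)
ins_data.head()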
CMP4293 INTRODUCTION TO AI PRODUCED BY DR. MARIAM ADEDOYIN-OLOWE
#rename columns
ins_data.columns = ['region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'age',
'sex',
'bmi',
'children',
'smoker',
'charges',
'sex_enc',
'smoker_enc']
#reorder columns
ins_data = ins_data[[ 'age',
'sex',
'sex_enc',
'bmi',
'children',
'smoker',
'smoker_enc',
'region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'charges'
]]
#remove object columns, save into new dataset, and convert to numeric
ins_data_t = ins_data[[ 'age',
'sex_enc',
'bmi',
'children',
'smoker_enc',
'region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'charges'
]]
ins_data_t = ins_data_t.apply(pd.to_numeric)
ins_data_t.info()
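The `pd.to_numeric` step matters because a DataFrame built from an array that mixes text and numbers stores every column as `object`, even the ones that look numeric. The following is a small sketch on a made-up two-row array (the values and the `toy` names are hypothetical) showing the dtypes before and after conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed array: numbers stored next to a text column
mixed = np.array([[1.0, 19, 'female', 27.9],
                  [0.0, 18, 'male', 33.77]], dtype=object)
toy = pd.DataFrame(mixed, columns=['region_northeast', 'age', 'sex', 'bmi'])
print(toy.dtypes)   # the numeric-looking columns are not numeric yet

# Drop the text column, then convert the remaining columns to numeric dtypes
toy_t = toy[['region_northeast', 'age', 'bmi']].apply(pd.to_numeric)
print(toy_t.dtypes)
```

Calling `info()` afterwards, as in the lab code, is a quick way to confirm that no `object` columns remain.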
df_corr = ins_data_t[['age',
]]
X = df_feat.iloc[:,0:-1]
y = df_feat.iloc[:,-1]
# y = a + B*X
# a = model.intercept_
# B = model.coef_
model.intercept_, model.coef_
y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)
print(f"R2 score: {r2:.3f}")
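The modelling steps above can be sketched end to end as follows. This is an illustration on synthetic, noise-free data rather than the insurance data: the coefficients, `test_size` and `random_state` are made-up choices, not the settings used in this lab. It checks that the prediction really is `a + B*X`, and that `model.score` matches R2 computed by hand as 1 - SS_res / SS_tot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noise-free data: y = 3 + 2*x1 - 1*x2 (made-up coefficients)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

# test_size and random_state are illustrative choices, not the lab's settings
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# y = a + B*X, with a = model.intercept_ and B = model.coef_
y_pred = model.predict(X_test)
y_manual = model.intercept_ + X_test @ model.coef_

# R2 = 1 - SS_res / SS_tot, which is what model.score returns for regressors
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
r2 = model.score(X_test, y_test)
print(f"R2 score: {r2:.3f}")
```

On real data such as the insurance charges, the fit will not be exact, so expect R2 below 1 and coefficients that only approximate the underlying relationship.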
# minimum, maximum and range of the target variable
(df_feat['charges'].min(), df_feat['charges'].max(),
df_feat['charges'].max() - df_feat['charges'].min())
df_feat.columns