Week 8 Lab - Linear Regression

The document outlines a Week 7 lab session for an AI course where students apply Linear Regression to predict insurance premiums using the 'Insurance.csv' dataset. It details the steps for data preprocessing, including handling null values, encoding categorical variables, and splitting the dataset into training and testing sets. Finally, it describes the model training process and evaluation metrics such as Mean Squared Error (MSE) and R-squared score.

Uploaded by

Zainab Segilola

CMP4293 INTRODUCTION TO AI PRODUCED BY DR. MARIAM ADEDOYIN-OLOWE

Welcome to the Week 7 lab session, where you will continue to work with the “Insurance.csv”
data. This time, you will apply Linear Regression to the data to predict the insurance premium
people will be charged based on attributes such as age, BMI, gender and smoking status.

from google.colab import files

file = files.upload()

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

data = pd.read_csv('insurance.csv')
data.info()

#check for null values

data.isnull().sum()
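
As a minimal sketch of what the null check reports, the example below runs the same call on a small made-up DataFrame. The name `toy` and its values are hypothetical and are not taken from insurance.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from insurance.csv
toy = pd.DataFrame({
    "age": [19, 28, np.nan, 45],
    "bmi": [27.9, np.nan, 33.0, np.nan],
    "smoker": ["yes", "no", "no", "yes"],
})

# isnull() marks every missing cell as True;
# sum() then counts the True values in each column
null_counts = toy.isnull().sum()
print(null_counts)
```

A count of zero for every column means no rows need to be dropped or imputed before modelling.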

#check for any duplicated rows

data.duplicated().sum()

#check the original and the duplicated rows

data[data.duplicated(keep=False)]

#drop the duplicated row

data.drop_duplicates(inplace=True)

#check to confirm the duplicated row has been dropped

data.duplicated().sum()
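
The difference between `duplicated()` and `duplicated(keep=False)` can be seen in a small sketch like the one below. The DataFrame `toy` is hypothetical data made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 are identical
toy = pd.DataFrame({
    "age": [19, 28, 19, 45],
    "bmi": [27.9, 33.0, 27.9, 30.1],
})

# Default keep='first': only the later copy is flagged as a duplicate
flagged_default = toy.duplicated().sum()

# keep=False: every row that has a twin is flagged, including the first one
flagged_all = toy.duplicated(keep=False).sum()

# drop_duplicates() keeps the first copy and removes the later one
deduped = toy.drop_duplicates()

print(flagged_default, flagged_all, len(deduped))
```

This is why `keep=False` is useful for inspecting both the original and its copy, while the default is the right count of rows that will actually be removed.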

data['sex'].value_counts()

#To convert text columns into numbers, let's first display the value counts of all the text (object) columns
display(data['sex'].value_counts())
display(data['smoker'].value_counts())
display(data['region'].value_counts())

#import the relevant sklearn libraries needed to convert the text columns into numeric values
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

#creating one label encoder for sex and one label encoder for smoker
le_sex = LabelEncoder()
le_smoker = LabelEncoder()

#fit() learns the distinct values in each column so that the two values (e.g. male, female) can be mapped to 0 and 1
le_sex.fit(data['sex'].drop_duplicates())
le_smoker.fit(data['smoker'].drop_duplicates())

#applying the encoding and saving the results in new columns. Note that duplicates
#are not dropped here because we want to transform all the rows
data['sex_enc'] = le_sex.transform(data['sex'])
data['smoker_enc'] = le_smoker.transform(data['smoker'])
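
To see which label ends up as 0 and which as 1, the encoder's `classes_` attribute can be inspected. The sketch below uses a short made-up Series (`sex`) rather than the lab data; `LabelEncoder` stores its classes in sorted order, so the code assigned to a label is its position in that sorted list.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column, not read from insurance.csv
sex = pd.Series(["male", "female", "female", "male"])

le = LabelEncoder()
codes = le.fit_transform(sex)

# classes_ is sorted, so position 0 is 'female' and position 1 is 'male'
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform maps the numeric codes back to the original labels
restored = le.inverse_transform(codes)
print(list(restored))
```

Checking the mapping this way avoids guessing what a coefficient on an encoded column means later on.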

#now let's check the transformation

data.head()

#transforming the 'region' column using the OneHotEncoder and applying 'passthrough'
#to the remaining columns so that the transformer leaves them as they are
ct = ColumnTransformer([('ohe', OneHotEncoder(), ['region'])],
                       remainder='passthrough')

trans = ct.fit_transform(data)

#building a new dataframe from the transformed array, using the transformer's feature names as headers

ins_data = pd.DataFrame(trans, columns=ct.get_feature_names_out())

#listing the new columns

list(ins_data.columns)

ins_data.head()

#rename columns
ins_data.columns = ['region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'age',
'sex',
'bmi',
'children',
'smoker',
'charges',
'sex_enc',
'smoker_enc']

#reorder columns
ins_data = ins_data[[ 'age',
'sex',
'sex_enc',
'bmi',
'children',
'smoker',
'smoker_enc',
'region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'charges'
]]

#remove object columns, save into new dataset, and convert to numeric
ins_data_t = ins_data[[ 'age',
'sex_enc',
'bmi',
'children',
'smoker_enc',
'region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'charges'
]]

ins_data_t = ins_data_t.apply(pd.to_numeric)
ins_data_t.info()

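#compute the pairwise correlations between the columns and plot them as a heatmap
#(the column list is cut off in the source after 'age'; using all numeric columns here is an assumption)
df_corr = ins_data_t.corr()

sns.heatmap(df_corr, vmin=-1, vmax=1, annot=True, fmt='.2f')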
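
A heatmap of this kind is drawn from a correlation matrix, and the `vmin=-1, vmax=1` limits match the range of the Pearson correlation. The following sketch, on made-up columns, shows the two extremes; the names `toy` and `corr` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one column rises with x, the other falls with x
toy = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "double_x": [2, 4, 6, 8],
    "neg_x": [-1, -2, -3, -4],
})

# corr() returns the pairwise Pearson correlation between numeric columns
corr = toy.corr()
print(corr.round(2))
```

Values close to +1 or -1 against the target column suggest a strong linear relationship, which is what a linear regression can exploit.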

from sklearn.model_selection import train_test_split

#select the features used for the model, keeping the target ('charges') as the last column
df_feat = ins_data_t[['age',
'sex_enc',
'bmi',
'children',
'smoker_enc',
'charges'
]]

X = df_feat.iloc[:,0:-1]
y = df_feat.iloc[:,-1]

#hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5,
                                                    test_size=0.3)
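
The effect of `test_size` and `random_state` can be checked on a small array. This is a minimal sketch on made-up data; `X_toy` and `y_toy` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, 2 features
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# test_size=0.3 sends 3 of the 10 rows to the test set
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy,
                                          test_size=0.3, random_state=5)

# Repeating the call with the same random_state gives the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_toy, y_toy,
                                              test_size=0.3, random_state=5)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```

Fixing `random_state` means the error metrics stay comparable from one run of the notebook to the next.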

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

# y = a + B*X
# a = model.intercept_
# B = model.coef_
model.intercept_, model.coef_
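
The equation y = a + B*X can be verified by computing predictions by hand from the fitted intercept and coefficients. The sketch below does this on synthetic data generated from known values; the names `X_toy`, `y_toy` and `m` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data built from a known intercept (2.0) and coefficients, with no noise
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
true_coef = np.array([1.5, -3.0, 0.5])
y_toy = 2.0 + X_toy @ true_coef

m = LinearRegression().fit(X_toy, y_toy)

# Manual prediction: intercept plus the dot product of each row with the coefficients
manual = m.intercept_ + X_toy @ m.coef_

print(np.allclose(manual, m.predict(X_toy)))
print(m.intercept_, m.coef_)
```

Each coefficient is the change in the predicted value for a one-unit change in that feature, with the other features held fixed.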

y_pred = model.predict(X_test)

from sklearn.metrics import mean_squared_error

from sklearn.metrics import mean_absolute_error

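#sklearn metrics take the true values first, then the predictions
mse = mean_squared_error(y_test, y_pred)

#the square root of the MSE (RMSE) is in the same units as the charges
sqrt_mse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)

print(f"MSE : {mse:.3f}, RMSE : {sqrt_mse:.3f}, MAE : {mae:.3f}")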

r2 = model.score(X_test, y_test)
print(f"R2 score: {r2:.3f}")
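
To make the metrics less of a black box, the sketch below computes MSE, RMSE, MAE and R2 by hand with NumPy on four made-up values and compares them with the sklearn functions. The arrays `y_true` and `y_hat` are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true - y_hat                    # [1, 0, -1, -2]
mse_manual = np.mean(errors ** 2)          # (1 + 0 + 1 + 4) / 4 = 1.5
rmse_manual = np.sqrt(mse_manual)
mae_manual = np.mean(np.abs(errors))       # (1 + 0 + 1 + 2) / 4 = 1.0

# R2 = 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(errors ** 2)                        # 6
ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # 9 + 1 + 1 + 9 = 20
r2_manual = 1 - ss_res / ss_tot                     # 0.7

print(mse_manual, rmse_manual, mae_manual, r2_manual)
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))
```

MSE and MAE give the same result if the two arguments are swapped, but R2 does not, so passing the true values first is a safer habit.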

#minimum, maximum, and range of the charges, to put the error metrics in context
(df_feat['charges'].min(), df_feat['charges'].max(),
 df_feat['charges'].max() - df_feat['charges'].min())

df_feat.columns

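#predict for one new person, passing the values in the same column order as X:
#age=50, sex_enc=1, bmi=45.9, children=1, smoker_enc=0
new_person = pd.DataFrame([[50, 1, 45.9, 1, 0]], columns=X.columns)
val = model.predict(new_person)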

print('Predicted Insurance Charge =', val)
