
Multi_Regression

The document outlines a lab exercise focused on supervised machine learning using regression with the Scikit Learn library. It includes instructions for performing linear regression on a fuel consumption dataset, extracting features and labels, fitting a model, and evaluating its performance using R-squared. Additionally, it covers multi-regression with multiple independent variables and provides steps for predicting outcomes based on various input features.

Uploaded by

nagulxlugan

1 Supervised Machine Learning - Regression with Scikit Learn Library

In this exercise, we will use the Linear Regression model from Scikit Learn.
1. Linear-Regression with Scikit Learn Library
2. Multi-Regression with Scikit Learn Library
Instructions to complete the lab exercises:
1. Open the Python notebook file under the Lab folder
2. Read the problem statement in the exercise and the expected output
3. Uncomment the placeholder lines and fill in your answer
4. Run your code to produce the expected output.
Note: Data files are stored in the dataset folder

1.1 Linear-Regression with Scikit Learn Library


Now we will use the same fuel consumption example to develop a simple linear regression
model with the Scikit Learn library.
Perform simple linear regression using sklearn on the fuel consumption dataset. Using the data
in the auto-mpg-clean.csv file, predict the fuel consumption (mpg) of a car based on its weight.
Show your R-squared. Refer to the following output as your reference.

1.1.1 Following steps will be performed:


1. Load the input dataset
2. Extract feature and label data
3. Create the LinearRegression class and fit (train) it on the data, e.g. LinearRegression().
   The normalize=True argument used in older versions of this lab was removed in scikit-learn 1.2.
4. Predict Y based on X
5. Display the coefficient and intercept
6. Compute R-squared using the r2_score function from the Scikit Learn metrics package
7. Predict a single data point with the model

[3]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

[4]: # 1) Load auto-mpg-clean data

car_data = pd.read_csv('dataset/auto-mpg-clean.csv', header=0,
                       skipinitialspace=True)  # read the data

## Uncomment following lines and fill in your answer

## Display the top 5 rows

car_data.head(5)

[4]:     mpg  cylinders  displacement  horsepower  weight  acceleration  \
     0  26.0          4          97.0          46    1835          20.5
     1  26.0          4          97.0          46    1950          21.0
     2  43.1          4          90.0          48    1985          21.5
     3  44.3          4          90.0          48    2085          21.7
     4  43.4          4          90.0          48    2335          23.7

        model year  origin                         car name
     0          70       2     volkswagen 1131 deluxe sedan
     1          73       2          volkswagen super beetle
     2          78       2  volkswagen rabbit custom diesel
     3          80       2             vw rabbit c (diesel)
     4          80       2               vw dasher (diesel)

[5]: ## Uncomment following lines and fill in your answer

## Describe the statistics of the data

[6]: car_data.describe()

[6]:                 mpg     cylinders  displacement    horsepower        weight  \
     count    392.000000    392.000000    392.000000    392.000000    392.000000
     mean      23.445918      5.471939    194.411990    104.469388   2977.584184
     std        7.805007      1.705783    104.644004     38.491160    849.402560
     min        9.000000      3.000000     68.000000     46.000000   1613.000000
     25%       17.000000      4.000000    105.000000     75.000000   2225.250000
     50%       22.750000      4.000000    151.000000     93.500000   2803.500000
     75%       29.000000      8.000000    275.750000    126.000000   3614.750000
     max       46.600000      8.000000    455.000000    230.000000   5140.000000

            acceleration    model year        origin
     count    392.000000    392.000000    392.000000
     mean      15.541327     75.979592      1.576531
     std        2.758864      3.683737      0.805518
     min        8.000000     70.000000      1.000000
     25%       13.775000     73.000000      1.000000
     50%       15.500000     76.000000      1.000000
     75%       17.025000     79.000000      2.000000
     max       24.800000     82.000000      3.000000

1.2 Extract weight as X (feature) and fuel consumption (mpg) as Y (label)

[7]: # 2) Extract weight as X (feature) and fuel consumption (mpg) as Y (label)


# 3) Create LinearRegression model from Scikit Learn package and fit the data
# You may add additional variables within the brackets or reshape the data

'''
X = car_data.iloc[:,4:5]
Y = car_data.iloc[:,0:1]

# or

X = car_data['weight'].to_numpy()
Y = car_data['mpg'].to_numpy()

X = X.reshape(-1, 1)
Y = Y.reshape(-1, 1)
'''
# or

X = car_data[['weight']]
Y = car_data['mpg']
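
The choice between double and single brackets matters because scikit-learn expects the features as a two-dimensional array and accepts a one-dimensional label. A minimal sketch, assuming a small hypothetical DataFrame built from the five rows printed by head() above rather than the full CSV file:

```python
import pandas as pd

# Hypothetical five-row frame copied from the head() output; not the full CSV.
sample = pd.DataFrame({
    "mpg": [26.0, 26.0, 43.1, 44.3, 43.4],
    "weight": [1835, 1950, 1985, 2085, 2335],
})

X = sample[["weight"]]   # double brackets: DataFrame with shape (5, 1)
Y = sample["mpg"]        # single brackets: Series with shape (5,)

# The NumPy route needs an explicit reshape to get the same 2-D layout.
X_np = sample["weight"].to_numpy().reshape(-1, 1)

print(X.shape, Y.shape, X_np.shape)
```

Keeping X as a DataFrame also preserves the column name, which makes the fitted coefficients easier to match to features later.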

[8]: ## Create LinearRegression model from Scikit Learn package and fit the data
## You may add additional variables within the brackets or reshape the data

[9]: # normalize=True was removed in scikit-learn 1.2; plain least squares gives the same fit
lin_reg = LinearRegression()

## Uncomment following lines and fill in your answer

lin_reg.fit(X, Y)

[9]: LinearRegression()
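
The `normalize` argument of `LinearRegression` was deprecated in scikit-learn 1.0 and removed in 1.2, so `LinearRegression(normalize=True)` raises an error on current versions. If feature scaling is wanted, one option is a pipeline with `StandardScaler`. The sketch below uses synthetic data (not the lab dataset) and hypothetical variable names to show that, for ordinary least squares, the predictions come out the same either way:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(1500, 5000, 50), rng.uniform(8, 25, 50)])
y = 46.0 - 0.007 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 1, 50)

plain = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Ordinary least squares predictions do not depend on feature scaling.
print(np.allclose(plain.predict(X), scaled.predict(X)))
```

Scaling does change the size of the individual coefficients, so it is mainly useful when coefficients need to be compared across features or when a regularized model is used instead.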

3
1.3 Predict all the data of X
1.4 Display the coefficients, intercept and R-squared value

[10]: ## Uncomment following lines and fill in your answer

Y_pred = lin_reg.predict(X)

r2_value = r2_score(Y, Y_pred)

print("Coefficients: ", lin_reg.coef_)
print("Intercept: ", lin_reg.intercept_)

# The coefficient of determination: 1 is perfect prediction
print('R-squared Coefficient of determination: %.2f'
      % r2_value)

Coefficients: [-0.00764734]
Intercept: 46.216524549017585
R-squared Coefficient of determination: 0.69
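
The R-squared value reported by r2_score is 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch of that calculation in plain Python, using made-up numbers and a hypothetical helper name rather than the lab data:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration, not taken from the dataset.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

print(round(r_squared(y_true, y_pred), 2))
```

A value of 0.69 therefore means that weight alone accounts for roughly 69% of the variance in mpg on the data the model was fitted on.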

1.5 Display the predicted data by appending it to the existing data


Append the predicted data and display it together with the original data as shown:

[11]: # 6) Append the predicted data


car_data['Y_pred'] = Y_pred
print(car_data.head())

    mpg  cylinders  displacement  horsepower  weight  acceleration  \
0  26.0          4          97.0          46    1835          20.5
1  26.0          4          97.0          46    1950          21.0
2  43.1          4          90.0          48    1985          21.5
3  44.3          4          90.0          48    2085          21.7
4  43.4          4          90.0          48    2335          23.7

   model year  origin                         car name     Y_pred
0          70       2     volkswagen 1131 deluxe sedan  32.183651
1          73       2          volkswagen super beetle  31.304207
2          78       2  volkswagen rabbit custom diesel  31.036550
3          80       2             vw rabbit c (diesel)  30.271815
4          80       2               vw dasher (diesel)  28.359980
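
The Y_pred column can be reproduced by hand from the printed coefficient and intercept, since the fitted model is the line mpg = intercept + coefficient * weight. A small sketch in plain Python, with the two numbers copied from the output above and a hypothetical helper name:

```python
# Values copied from the printed output above (coefficient rounded as displayed).
coefficient = -0.00764734
intercept = 46.216524549017585

def predict_mpg(weight):
    """Evaluate the fitted line mpg = intercept + coefficient * weight."""
    return intercept + coefficient * weight

# Weights of the first three rows of the table.
for weight in (1835, 1950, 1985):
    print(weight, round(predict_mpg(weight), 2))
```

The results agree with the Y_pred column to within the rounding of the displayed coefficient. The negative coefficient says that each additional unit of weight lowers the predicted mpg by about 0.0076.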

2 Exercise 1:
Perform linear regression using sklearn on the Income3 dataset. Predict the Income based on
the Years of Education.

2.0.1 Perform the following steps:


1. Load the input dataset
2. Extract feature and label data
3. Create the LinearRegression class and fit (train) it on the data, e.g. LinearRegression().
   The normalize=True argument used in older versions of this lab was removed in scikit-learn 1.2.
4. Predict Y based on X
5. Display the coefficient and intercept
6. Compute R-squared using the r2_score function from the Scikit Learn metrics package
7. Predict a single data point with the model

[2]: # 1) Load Input dataset

#my_data = ___________________________________________________________

#print the first 5 data

#_________________________________________

# 2) Extract data

#X = ___________________________________________________________

#Y = ___________________________________________________________

# 3) Create the Linear Regression class and fit the data.

#lin_reg = ___________________________________________________________

#= ___________________________________________________________

# 4) Predict income data (Y) based on number of years in higher education (X)

#Y_pred = ___________________________________________________________

# 5) Display Coefficient, Intercept

#___________________________________________________________

#___________________________________________________________

# 6) R-square using r2_score function from Scikit Learn metrics package

#r2_data = ___________________________________________________________

#___________________________________________________________

# 7) Predict a single data point, e.g. print the expected income for 3 years in
#    higher education

#yearsofeducation = ___________________________________________________________

#predicted_income = __________________________________________________________

#print("Predicted income with 3 years higher education is :%.2f"
#      % predicted_income)

   Observation  Years of Higher Education (x)  Income (y)
0            1                              6       89617
1            2                              0       39826
2            3                              6       79894
3            4                              3       56547
4            5                              4       64795
Coefficients: [[7692.92437864]]
Intercept: [37264.82601798]
R-squared Coefficient of determination: 0.95
Predicted income with 3 years higher education is :60343.60

2.1 Multi-Regression with Scikit Learn Library


Multi-Regression finds the relationship between multiple independent variables and one dependent
variable. The dependent variable is modeled as a function of several independent variables
with corresponding coefficients, along with a constant term. Multiple regression requires two
or more independent variables, which is why it is called multiple regression.
Use the same fuel consumption dataset and implement a multiple regression model. Select the MPG
data as the target and the columns ‘cylinders’, ‘displacement’, ‘horsepower’, ‘weight’,
‘acceleration’, ‘model year’ as input data X.

2.2 Load data from csv


[ ]: car_data = pd.read_csv('dataset/auto-mpg-clean.csv', header=0,
                       skipinitialspace=True)  # read the data

2.2.1 Prepare the input data and target (label)


• Take the columns at index 1 to 6 of the dataset as input X or
• Select ‘cylinders’, ‘displacement’, ‘horsepower’, ‘weight’, ‘acceleration’, ‘model year’ as X
• Select column 0 or ‘mpg’ from the dataset as the label variable Y

[10]: # Prepare input and target label data

# Setting the matrices
# For multiple regression you may add additional variables
# within the brackets

# Split
#X = car_data.to_numpy()
#X = X[:, 1:7]  # columns 1 to 6; 'origin' and 'car name' are left out

# or
# select the columns

X = car_data[['cylinders', 'displacement', 'horsepower', 'weight',
              'acceleration', 'model year']]

Y = car_data['mpg']

# Make sure the dimensions are correct

print(X.shape)
print(Y.shape)

(392, 6)
(392,)

2.2.2 Create LinearRegression model from Scikit Learn package


• Create a LinearRegression model from the Scikit Learn package and fit the data
• Display the coefficients, intercept and R-squared value
Note that the coefficients are displayed in the same order as the corresponding features.

[11]: # normalize=True was removed in scikit-learn 1.2 and is not needed here
lin_reg = LinearRegression()
lin_reg.fit(X, Y)

print('Intercept: \n', lin_reg.intercept_)
print('Coefficients: \n', lin_reg.coef_)

Intercept:
-14.535250480506125
Coefficients:
[-3.29859089e-01 7.67843024e-03 -3.91355574e-04 -6.79461791e-03
8.52732469e-02 7.53367180e-01]

2.2.3 Predict all the data of X and display r2_value

[12]: # 4) Predict all the data of X


# 5) Print out the Coefficients, Intercept and r-squared value

Y_pred = lin_reg.predict(X)
r2_value = r2_score(Y, Y_pred)

print("Coefficients: ", lin_reg.coef_)


print("Intercept: ", lin_reg.intercept_)

# The coefficient of determination: 1 is perfect prediction


print('R-squared Coefficient of determination: %.2f'
%r2_value)

# 6) Append the predicted data


car_data['Y_pred'] = Y_pred
print(car_data.head())

Coefficients: [-3.29859089e-01 7.67843024e-03 -3.91355574e-04 -6.79461791e-03
8.52732469e-02 7.53367180e-01]
Intercept: -14.535250480506125
R-squared Coefficient of determination: 0.81
    mpg  cylinders  displacement  horsepower  weight  acceleration  \
0  26.0          4          97.0          46    1835          20.5
1  26.0          4          97.0          46    1950          21.0
2  43.1          4          90.0          48    1985          21.5
3  44.3          4          90.0          48    2085          21.7
4  43.4          4          90.0          48    2335          23.7

   model year  origin                         car name     Y_pred
0          70       2     volkswagen 1131 deluxe sedan  26.887799
1          73       2          volkswagen super beetle  28.409156
2          78       2  volkswagen rabbit custom diesel  31.926285
3          80       2             vw rabbit c (diesel)  32.770612
4          80       2               vw dasher (diesel)  31.242504
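
With several features, each prediction is the intercept plus the sum of every coefficient multiplied by its feature value, taken in the same column order as X. A minimal sketch in plain Python that recomputes Y_pred for row 0, with the numbers copied from the printed output above and a hypothetical helper name:

```python
# Values copied from the printed output above (coefficients rounded as displayed).
intercept = -14.535250480506125
coefficients = [-3.29859089e-01, 7.67843024e-03, -3.91355574e-04,
                -6.79461791e-03, 8.52732469e-02, 7.53367180e-01]

# Row 0: cylinders, displacement, horsepower, weight, acceleration, model year
row0 = [4, 97.0, 46, 1835, 20.5, 70]

def predict_row(features):
    """Intercept plus the sum of coefficient * feature, in column order."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

print(round(predict_row(row0), 4))
```

The result matches the first Y_pred entry in the table to within the rounding of the displayed coefficients, which confirms that the coefficient order follows the column order used when building X.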

3 Exercise 1: Predict the mpg data of the car with the following information
• ‘cylinders’ = 4
• ‘displacement’ = 97
• ‘horsepower’ = 48
• ‘weight’ = 2000
• ‘acceleration’ = 23.8
• ‘model year’ = 80
Expected output: Predicted mpg info is 33.58.

[13]: # Create a two-dimensional array called new_data with the proposed data

#new_data = __________________________________________________________

# Predict mpg and display

#p_mpg = __________________________________________________________

#__________________________________________________________________

Predicted mpg info is 33.58.
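
The array has to be two-dimensional because predict expects one row per sample and one column per feature, even when there is only a single sample. The sketch below illustrates the shape requirement on synthetic two-feature data with hypothetical names; it is not the solution to the exercise:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic two-feature data generated from y = 1 + 2*a + 3*b (illustration only).
X_demo = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y_demo = 1 + 2 * X_demo[:, 0] + 3 * X_demo[:, 1]

demo_model = LinearRegression().fit(X_demo, y_demo)

# One sample still needs two dimensions: shape (1, n_features), not (n_features,).
one_sample = np.array([[3.0, 2.0]])
prediction = demo_model.predict(one_sample)

print(one_sample.shape, round(float(prediction[0]), 2))
```

The feature values must appear in the same order as the columns used for fitting. When the model was fitted on a DataFrame, passing a plain array may trigger a feature-name warning, but the prediction is computed the same way.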

4 Exercise 2:
Perform multi regression using sklearn lib on the fishcatch dataset. Predict the weight of fish
based on all features EXCEPT sex and species
Show your R-squared and Adjusted R-squared.

4.1 Load the data from fishcatch.csv

[14]: ## Load csv data and display top 5 data

#fishcatch_data = _____________________________________________________________

#= __________________________________________________________

[14]:    Observation  Species  Weight  Length1  Length2  Length3  Height  Width  \
      0            1        1   242.0     23.2     25.4     30.0    38.4   13.4
      1            2        1   290.0     24.0     26.3     31.2    40.0   13.8
      2            3        1   340.0     23.9     26.5     31.1    39.8   15.1
      3            4        1   363.0     26.3     29.0     33.5    38.0   13.3
      4            5        1   430.0     26.5     29.0     34.0    36.6   15.1

         Sex
      0  NaN
      1  NaN
      2  NaN
      3  NaN
      4  NaN

4.2 Select ‘Length1’, ‘Length2’, ‘Length3’, ‘Height’, ‘Width’ as features


4.3 Select ‘Weight’ as the label

[15]: #Select features X and label Y with proposed columns

#X = __________________________________________________________

#Y = __________________________________________________________

#Display the dimension of X and Y

#__________________________________________________________

#__________________________________________________________

(157, 5)
(157,)

4.4 Create LinearRegression model and train with X and Y data


4.5 Display the intercept and coefficient similar to figure shown
• Intercept: -725.5440014015896
• Coefficients: [ 35.74869687 -13.20361087 9.50021993 4.89828407 9.06157899]

[16]: # Create model and train

#lin_reg1 = _________________________________________________________

#__________________________________________________________

# Display intercept and coefficients

#__________________________________________________________

#__________________________________________________________

Intercept:
-725.5440014015896
Coefficients:
[ 35.74869687 -13.20361087 9.50021993 4.89828407 9.06157899]

4.6 Make prediction of input data X


4.7 Compute R2 score and display

[17]: ## Make prediction

#Y_pred = __________________________________________________________

## Calculate R2 value and display

#r2_value = __________________________________________________________

#__________________________________________________________

R-squared Coefficient of determination: 0.87
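
The exercise also asks for the adjusted R-squared, which r2_score does not return. The usual formula penalizes R-squared for the number of features: 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of features. A minimal sketch with a hypothetical helper name:

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# 157 rows and 5 features come from the shapes printed earlier; 0.87 is the
# rounded R-squared shown above, so the result is only approximate.
print(round(adjusted_r2(0.87, 157, 5), 3))
```

In the notebook, passing the unrounded r2_value together with X.shape[0] and X.shape[1] would give the exact figure.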

4.8 Display the predicted data together with the original data

[18]: ## Display the predicted data

#__________________________________________________________

#__________________________________________________________

   Observation  Species  Weight  Length1  Length2  Length3  Height  Width  \
0            1        1   242.0     23.2     25.4     30.0    38.4   13.4
1            2        1   290.0     24.0     26.3     31.2    40.0   13.8
2            3        1   340.0     23.9     26.5     31.1    39.8   15.1
3            4        1   363.0     26.3     29.0     33.5    38.0   13.3
4            5        1   430.0     26.5     29.0     34.0    36.6   15.1

   Sex      Y_pred
0  NaN  362.979915
1  NaN  402.557772
2  NaN  406.192554
3  NaN  456.653174
4  NaN  478.006268

5 Exercise 3:
5.1 Predict the weight of the fish with following input data
• ‘length1’ = 24
• ‘length2’ = 28
• ‘length3’ = 34
• ‘height’ = 41
• ‘width’ = 15
Expected output (estimate): Predicted weight of the fish is 422.48.

[19]: # Create a two-dimensional array called new_data with the proposed data

#new_data = __________________________________________________________

#predict fish weight and display

#fish_weight = __________________________________________________________

#__________________________________________________________

Predicted weight of the fish is 422.48.
