Multi_Regression
In this exercise, we will use the LinearRegression model from scikit-learn.
1. Linear regression with the scikit-learn library
2. Multiple regression with the scikit-learn library
Instructions to complete the lab exercises:
1. Open the Python notebook file under the Lab folder.
2. Read the problem statement in the exercise and the expected output.
3. Uncomment the placeholder lines, remove the blanks, and fill in your answer.
4. Run your code to produce the expected output.
Note: Data files are stored in the dataset folder.
6. R-squared using the r2_score function from the scikit-learn metrics package
7. Predict a single data point using the model
car_data = pd.read_csv('dataset/auto-mpg-clean.csv', header=0, skipinitialspace=True)  # read the data
car_data.head(5)
[6]: car_data.describe()
       acceleration  model year      origin
count    392.000000  392.000000  392.000000
mean      15.541327   75.979592    1.576531
std        2.758864    3.683737    0.805518
min        8.000000   70.000000    1.000000
25%       13.775000   73.000000    1.000000
50%       15.500000   76.000000    1.000000
75%       17.025000   79.000000    2.000000
max       24.800000   82.000000    3.000000
'''
X = car_data.iloc[:,4:5]
Y = car_data.iloc[:,0:1]
# or
X = car_data['weight'].to_numpy()
Y = car_data['mpg'].to_numpy()
X = X.reshape(-1, 1)
Y = Y.reshape(-1, 1)
'''
# or
X = car_data[['weight']]
Y = car_data['mpg']
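The selection options above differ mainly in the shape of the result, which matters because scikit-learn estimators expect a two-dimensional feature matrix X. The following is a minimal sketch of that difference, assuming a made-up three-row frame named `demo` (it is not the lab data; only the column names `mpg` and `weight` come from the exercise):

```python
import pandas as pd

# Hypothetical three-row frame standing in for the car data (values are made up)
demo = pd.DataFrame({"mpg": [18.0, 26.0, 31.0], "weight": [3504, 2130, 1985]})

X_frame = demo[["weight"]]   # double brackets give a DataFrame, shape (3, 1)
X_series = demo["weight"]    # single brackets give a Series, shape (3,)
X_array = demo["weight"].to_numpy().reshape(-1, 1)  # ndarray reshaped to (3, 1)

print(X_frame.shape, X_series.shape, X_array.shape)
```

The double-bracket form keeps the column as a 2D table, so it can be passed to `fit` as X without reshaping, while a one-dimensional Series is fine for the target Y.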
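[8]: ## Create a LinearRegression model from the scikit-learn package and fit the data
## You may add additional variables within the brackets or reshape the data
lin_reg = LinearRegression(normalize=True)  # `normalize` was removed in scikit-learn 1.2; omit it on newer versions
lin_reg.fit(X,Y)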
[9]: LinearRegression(normalize=True)
1.3 Predict all the data in X
Y_pred = lin_reg.predict(X)
1.4 Display the coefficients, intercept and R-squared value
Coefficients: [-0.00764734]
Intercept: 46.216524549017585
R-squared Coefficient of determination: 0.69
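The values printed above come from the fitted model's `coef_` and `intercept_` attributes and from `r2_score`. As a rough sketch of the full fit, predict and report cycle, the example below uses synthetic data that follows y = 2x + 1 exactly; the names `X_demo`, `y_demo`, `model` and `y_hat` are assumptions made for illustration and are not part of the lab notebook:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (an assumption for illustration): y = 2x + 1 with no noise
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([3.0, 5.0, 7.0, 9.0])

# Fit the model, then predict on the same inputs
model = LinearRegression().fit(X_demo, y_demo)
y_hat = model.predict(X_demo)

# Report the learned slope, intercept and goodness of fit
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("R-squared:", round(r2_score(y_demo, y_hat), 2))
```

Because the synthetic data lies exactly on a line, the slope and intercept should come out very close to 2 and 1, and the R-squared close to 1. Real data, such as the car dataset, will give a lower R-squared.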
0 26.0 4 97.0 46 1835 20.5
1 26.0 4 97.0 46 1950 21.0
2 43.1 4 90.0 48 1985 21.5
3 44.3 4 90.0 48 2085 21.7
4 43.4 4 90.0 48 2335 23.7
2 Exercise 1:
Perform linear regression using the sklearn library on the Income3 dataset. Predict the income based on
the years of education.
#my_data = ___________________________________________________________
#_________________________________________
# 2) Extract data
#X = ___________________________________________________________
#Y = ___________________________________________________________
#lin_reg = ___________________________________________________________
#= ___________________________________________________________
# 4) Predict income data (Y) based on number of years in higher education (X)
#Y_pred = ___________________________________________________________
#___________________________________________________________
#___________________________________________________________
#r2_data = ___________________________________________________________
#___________________________________________________________
# 7) Predict a single data point, e.g. print the expected income for 3 years in higher education
#yearsofeducation = ___________________________________________________________
#predicted_income = __________________________________________________________
with corresponding coefficients, along with the constant term. Multiple regression requires two
or more independent variables, and this is why it is called multiple regression.
Use the same fuel consumption dataset to implement a multiple regression model. Select the MPG
data as the target and the remaining columns (‘cylinders’, ‘displacement’, ‘horsepower’, ‘weight’,
‘acceleration’, ‘model year’) as the input data X.
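#split
#X = car_data.to_numpy()
#X = X[:, 1:]
#or
#select the columns
X = car_data[['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year']]
Y = car_data['mpg']
print(X.shape)
print(Y.shape)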
(392, 6)
(392,)
[11]: lin_reg = LinearRegression(normalize=True)
lin_reg.fit(X, Y)
Intercept:
-14.535250480506125
Coefficients:
[-3.29859089e-01 7.67843024e-03 -3.91355574e-04 -6.79461791e-03
8.52732469e-02 7.53367180e-01]
from sklearn.metrics import r2_score  # needed if not already imported earlier in the notebook
Y_pred = lin_reg.predict(X)
r2_value = r2_score(Y, Y_pred)
4 80 2 vw dasher (diesel) 31.242504
3 Exercise 1: Predict the mpg of the car with the following information
• ‘cylinders’ = 4
• ‘displacement’ = 97
• ‘horsepower’ = 48
• ‘weight’ = 2000
• ‘acceleration’ = 23.8
• ‘model year’ = 80
Expected output: Predicted mpg info is 33.58.
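A prediction from a linear model is just the intercept plus the sum of each coefficient multiplied by its feature value, so the expected output can be checked by hand. The sketch below copies the intercept and coefficients from the output printed earlier and assumes the features are in the order listed above (cylinders, displacement, horsepower, weight, acceleration, model year); the variable names are chosen for illustration only:

```python
import numpy as np

# Intercept and coefficients copied from the fitted model's printed output
intercept = -14.535250480506125
coefficients = np.array([-3.29859089e-01, 7.67843024e-03, -3.91355574e-04,
                         -6.79461791e-03, 8.52732469e-02, 7.53367180e-01])

# Feature values in the same column order that was used to fit the model
new_car = np.array([4, 97, 48, 2000, 23.8, 80])

# Linear prediction: intercept + dot product of coefficients and features
manual_mpg = intercept + coefficients @ new_car
print(f"Predicted mpg by hand: {manual_mpg:.2f}")
```

If the model's own `predict` call gives a different number, the most likely cause is that the values in the input array are not in the same column order as X.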
[13]: # Create a two-dimensional array called new_data with the proposed data
#new_data = __________________________________________________________
#p_mpg = __________________________________________________________
#__________________________________________________________________
4 Exercise 2:
Perform multiple regression using the sklearn library on the fishcatch dataset. Predict the weight of the fish
based on all features EXCEPT sex and species.
Show your R-squared and adjusted R-squared.
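The `r2_score` function returns the plain R-squared, so the adjusted value has to be computed separately. A minimal sketch follows, using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of features. The helper name `adjusted_r2` and the numbers passed to it are made up for illustration:

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Made-up numbers for illustration: R^2 = 0.80 with 100 rows and 5 features
print(round(adjusted_r2(0.80, 100, 5), 4))
```

The adjustment penalises extra features, so the adjusted value is never higher than the plain R-squared for the same model.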
[14]: ## Load csv data and display top 5 data
#fishcatch_data = _____________________________________________________________________________
#= __________________________________________________________
Sex
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
#X = __________________________________________________________
#Y = __________________________________________________________
#__________________________________________________________
#__________________________________________________________
(157, 5)
(157,)
[16]: # Create model and train
#lin_reg1 = _________________________________________________________
#__________________________________________________________
#__________________________________________________________
#__________________________________________________________
Intercept:
-725.5440014015896
Coefficients:
[ 35.74869687 -13.20361087 9.50021993 4.89828407 9.06157899]
#Y_pred = __________________________________________________________
#r2_value = __________________________________________________________
#__________________________________________________________
4.8 Display the predicted data together with the original data
#__________________________________________________________
#__________________________________________________________
Sex Y_pred
0 NaN 362.979915
1 NaN 402.557772
2 NaN 406.192554
3 NaN 456.653174
4 NaN 478.006268
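The output above shows the predictions as an extra `Y_pred` column next to the original columns. One possible way to produce such a view is sketched below with a made-up frame; the names `original`, `predictions` and `combined`, the column name `Weight`, and all values are assumptions for illustration rather than the lab's data:

```python
import pandas as pd

# Made-up stand-in for the original data and the model's predictions
original = pd.DataFrame({"Weight": [100.0, 200.0, 300.0]})
predictions = [110.5, 190.2, 305.7]

# Work on a copy so the original frame is left unchanged
combined = original.copy()
combined["Y_pred"] = predictions
print(combined.head())
```

Copying first keeps the prediction column out of the source frame, which avoids accidentally feeding `Y_pred` back in as a feature if X is re-selected later.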
5 Exercise 3:
5.1 Predict the weight of the fish with the following input data
• ‘length1’ = 24
• ‘length2’ = 28
• ‘length3’ = 34
• ‘height’ = 41
• ‘width’ = 15
Expected output (estimate): Predicted weight of the fish is 422.48.
[19]: # Create a two-dimensional array called new_data with the proposed data
#new_data = __________________________________________________________
#fish_weight = __________________________________________________________
#__________________________________________________________