Assignment 2 ML
0: Machine Learning
Assignment 02
Introduction
In this lab, we'll build on the fundamentals of linear regression that we covered in Lab 01.
Instead of implementing the algorithms from scratch, we'll use scikit-learn (sklearn) so we can
focus on the concepts of multiple linear regression, feature scaling, and model evaluation.
We'll work with a real-world dataset of fish measurements (the Fish Market dataset), which
records the species, weight, and several body measurements of 159 fish. Our goal is to predict
Weight from the other measurements.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
import seaborn as sns
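The cell that loads the dataset didn't survive the export; a minimal sketch of what it likely contained, assuming the data lives in a local CSV named Fish.csv (hypothetical filename):
# Reconstructed cell; 'Fish.csv' is an assumed filename
auto_data = pd.read_csv('Fish.csv')
# Inspect column types, non-null counts, and memory usage
auto_data.info()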
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159 entries, 0 to 158
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Species 159 non-null object
1 Weight 159 non-null float64
2 Length1 159 non-null float64
3 Length2 159 non-null float64
4 Length3 159 non-null float64
5 Height 159 non-null float64
6 Width 159 non-null float64
dtypes: float64(6), object(1)
memory usage: 8.8+ KB
# Summary statistics
print("\nSummary statistics:")
auto_data.describe()
Summary statistics:
{"columns":[{"name":"index","rawType":"object","type":"string"},
{"name":"Weight","rawType":"float64","type":"float"},
{"name":"Length1","rawType":"float64","type":"float"},
{"name":"Length2","rawType":"float64","type":"float"},
{"name":"Length3","rawType":"float64","type":"float"},
{"name":"Height","rawType":"float64","type":"float"},
{"name":"Width","rawType":"float64","type":"float"}],"conversionMethod
":"pd.DataFrame","ref":"dc61edd1-d244-4649-b2e7-d64e6459823e","rows":
[["count","159.0","159.0","159.0","159.0","159.0","159.0"],
["mean","398.3264150943396","26.247169811320756","28.415723270440253",
"31.227044025157234","8.970993710691824","4.417485534591195"],
["std","357.9783165508931","9.996441210553128","10.716328098884247","1
1.610245832690964","4.2862076199688675","1.685803869992167"],
["min","0.0","7.5","8.4","8.8","1.7284","1.0476"],
["25%","120.0","19.05","21.0","23.15","5.9448","3.38565"],
["50%","273.0","25.2","27.3","29.4","7.786","4.2485"],
["75%","650.0","32.7","35.5","39.650000000000006","12.3659","5.5845"],
["max","1650.0","59.0","63.4","68.0","18.957","8.142"]],"shape":
{"columns":6,"rows":8}}
Species 0
Weight 0
Length1 0
Length2 0
Length3 0
Height 0
Width 0
dtype: int64
Let's check the column names and visualize the relationships between the features and the target (Weight):
auto_data.columns
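The plotting cell itself was lost in the export; a minimal sketch of one way to visualize these relationships, using seaborn's pairplot (the choice of plot is an assumption):
# Sketch: scatter plots of every feature pair, colored by species (plot choice is assumed)
sns.pairplot(auto_data, hue='Species')
plt.show()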
LinearRegression()
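The LinearRegression() line above is the printed result of a fit whose code was dropped in the export; a minimal reconstruction, assuming Length3 as the single feature (the actual choice isn't recoverable):
# Reconstructed cell; the single feature 'Length3' is an assumption
X_simple = auto_data[['Length3']]
y = auto_data['Weight']
X_simple_train, X_simple_test, y_train, y_test = train_test_split(
    X_simple, y, test_size=0.2, random_state=42)
simple_model = LinearRegression()
simple_model.fit(X_simple_train, y_train)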
# Make predictions
y_simple_train_pred = simple_model.predict(X_simple_train)
y_simple_test_pred = simple_model.predict(X_simple_test)
print(y_simple_train_pred)
print(y_simple_test_pred)
# Calculate metrics
simple_train_mse = mean_squared_error(y_train, y_simple_train_pred)
simple_test_mse = mean_squared_error(y_test, y_simple_test_pred)
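MSE is in squared units of the target, so it helps to also report RMSE (back in the target's units) and R²; a short sketch (the print format is our own):
# Report MSE alongside RMSE and R² (r2_score was imported above)
print(f"Simple model - train MSE: {simple_train_mse:.2f}")
print(f"Simple model - test MSE:  {simple_test_mse:.2f}")
print(f"Simple model - test RMSE: {np.sqrt(simple_test_mse):.2f}")
print(f"Simple model - test R²:   {r2_score(y_test, y_simple_test_pred):.3f}")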
LinearRegression()
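As with the simple model, the cell that produced this LinearRegression() output is missing; a sketch, assuming all five numeric measurements serve as features:
# Reconstructed cell; the feature list is an assumption
feature_names = ['Length1', 'Length2', 'Length3', 'Height', 'Width']
X_multi = auto_data[feature_names]
X_multi_train, X_multi_test, y_train, y_test = train_test_split(
    X_multi, y, test_size=0.2, random_state=42)
multi_model = LinearRegression()
multi_model.fit(X_multi_train, y_train)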
# Calculate metrics (note: LinearRegression.score returns R², not MSE, so use mean_squared_error)
y_multi_train_pred = multi_model.predict(X_multi_train)
y_multi_test_pred = multi_model.predict(X_multi_test)
multi_train_mse = mean_squared_error(y_train, y_multi_train_pred)
multi_test_mse = mean_squared_error(y_test, y_multi_test_pred)
# Create a scaler
scaler = StandardScaler()
# Fit the scaler on the training data and transform both training and test data
X_multi_train_scaled = scaler.fit_transform(X_multi_train)
X_multi_test_scaled = scaler.transform(X_multi_test)
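The cell that fit a model on the standardized features was also lost in export; a minimal reconstruction (it produces the LinearRegression() output below):
# Reconstructed cell: fit a fresh model on the standardized features
multi_model_scaled = LinearRegression()
multi_model_scaled.fit(X_multi_train_scaled, y_train)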
LinearRegression()
# Make predictions
y_multi_train_pred_scaled = multi_model_scaled.predict(X_multi_train_scaled)
y_multi_test_pred_scaled = multi_model_scaled.predict(X_multi_test_scaled)
# Calculate metrics
multi_train_mse_scaled = mean_squared_error(y_train, y_multi_train_pred_scaled)
multi_test_mse_scaled = mean_squared_error(y_test, y_multi_test_pred_scaled)
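The plotting cell below starts at the second subplot, so its first half was evidently lost; a sketch of a plausible left panel showing the unscaled model's coefficients for comparison (the layout is assumed):
# Reconstructed first panel: unscaled coefficients, side by side with the scaled ones below
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.bar(feature_names, np.abs(multi_model.coef_))
plt.title('Feature Importance (Unscaled)')
plt.xticks(rotation=45)
plt.ylabel('|Coefficient|')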
plt.subplot(1, 2, 2)
plt.bar(feature_names, np.abs(multi_model_scaled.coef_))
plt.title('Feature Importance (Scaled)')
plt.xticks(rotation=45)
plt.ylabel('|Coefficient|')
plt.tight_layout()
plt.show()