
LAB-4

Name: Aaditya Kumar Dhaka

Reg no: 2448001


# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.feature_selection import RFE
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error

from sklearn.datasets import fetch_california_housing


data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target  # House prices (median house value, in units of $100,000)

# Combine features and target into one DataFrame for exploration
df = X.copy()
df["Target"] = y

# Display the first few rows
df.head()

   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  Target
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23   4.526
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22   3.585
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24   3.521
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25   3.413
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25   3.422

print("No of records/rows:", df.shape[0])
print("No of features/columns:", df.shape[1])
print("Features:", df.columns)

No of records/rows: 20640
No of features/columns: 9
Features: Index(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
'Population', 'AveOccup',
'Latitude', 'Longitude', 'Target'],
dtype='object')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MedInc 20640 non-null float64
1 HouseAge 20640 non-null float64
2 AveRooms 20640 non-null float64
3 AveBedrms 20640 non-null float64
4 Population 20640 non-null float64
5 AveOccup 20640 non-null float64
6 Latitude 20640 non-null float64
7 Longitude 20640 non-null float64
8 Target 20640 non-null float64
dtypes: float64(9)
memory usage: 1.4 MB

# Standardize Features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y,
test_size=0.2, random_state=42)

# Function to Train and Evaluate Model


def evaluate_model(X_train, X_test, y_train, y_test):
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return mse

Baseline Model
This is the model before any feature selection. Uses all available features to predict house prices.
# Train a model using ALL features (Baseline)
mse_all_features = evaluate_model(X_train, X_test, y_train, y_test)
print(f"MSE with All Features: {mse_all_features:.4f}")

MSE with All Features: 0.5559

This means, on average, the squared difference between predicted and actual values is 0.5559.

This serves as a reference for comparing feature selection techniques.
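To make this number easier to interpret, the MSE can be converted to an RMSE, which is in the target's own units (median house value, in units of $100,000). A minimal sketch:

rmse_all_features = np.sqrt(mse_all_features)  # RMSE is the square root of MSE
print(f"RMSE with All Features: {rmse_all_features:.4f}")

With an MSE of 0.5559 this gives an RMSE of roughly 0.75, i.e. a typical prediction error of about $75,000.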

Wrapper Method
Wrapper methods use a machine learning model to select features by evaluating their impact on
performance.

1. Forward Selection
Starts with no features.

Adds features one by one that improve the model the most.

Stops when adding more features does not improve performance.


# Plot MSE vs. Number of Features
model = LinearRegression()  # Model used by the feature selectors

mse_values = []
num_features = []

for k in range(1, X_train.shape[1] + 1):  # Test subset sizes from 1 up to all features
    sfs_k = SFS(model, k_features=k, forward=True, floating=False,
                scoring='neg_mean_squared_error', cv=5)
    sfs_k.fit(X_train, y_train)

    X_train_k = sfs_k.transform(X_train)
    X_test_k = sfs_k.transform(X_test)

    mse = evaluate_model(X_train_k, X_test_k, y_train, y_test)
    mse_values.append(mse)
    num_features.append(k)

# Plotting
plt.figure(figsize=(6, 4))
plt.plot(num_features, mse_values, marker="o", linestyle="--",
         color="blue", label="MSE Score")
plt.xlabel("Number of Selected Features")
plt.ylabel("MSE Score")
plt.title("Forward Selection: MSE vs. Number of Features")
plt.legend()
plt.grid()
plt.show()

We test different numbers of selected features, from 1 up to all features.

For each feature count, we select the best features using Forward Selection (SFS), train the model using only those features, and calculate and store the MSE.

Plotting MSE vs. number of features shows how model performance changes as features are added.
sfs = SFS(model, k_features=4, forward=True, floating=False,
scoring='neg_mean_squared_error', cv=5)
sfs.fit(X_train, y_train)

X_train_fs = sfs.transform(X_train)
X_test_fs = sfs.transform(X_test)

mse_forward = evaluate_model(X_train_fs, X_test_fs, y_train, y_test)


print(f"MSE with Forward Selection: {mse_forward:.4f}")

MSE with Forward Selection: 0.5490


MSE: 0.5490 (better than baseline)

Removing some features improved the model slightly, suggesting that not all features were useful.
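
To see which 4 features the forward search actually kept, the fitted selector can be inspected. A small sketch, assuming the sfs and data objects from the cells above are still in scope:

# k_feature_idx_ holds the indices of the selected columns; map them back to names
selected_idx = list(sfs.k_feature_idx_)
selected_names = [data.feature_names[i] for i in selected_idx]
print("Forward-selected features:", selected_names)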

2. Backward Elimination
Starts with all features in the model.

Removes the least important feature one by one, based on the effect on model performance.

Stops when removing another feature would increase the error (MSE).
# Store MSE values for different numbers of selected features
mse_values = []
num_features = []

for k in range(X_train.shape[1], 0, -1):  # Iterate from all features down to 1 feature
    sbs = SFS(model, k_features=k, forward=False, floating=False,
              scoring='neg_mean_squared_error', cv=5)
    sbs.fit(X_train, y_train)

    # Transform dataset with selected features
    X_train_bs = sbs.transform(X_train)
    X_test_bs = sbs.transform(X_test)

    # Compute MSE and store results
    mse = evaluate_model(X_train_bs, X_test_bs, y_train, y_test)
    mse_values.append(mse)
    num_features.append(k)

# Plot MSE vs. Number of Features for Backward Elimination
plt.figure(figsize=(6, 4))
plt.plot(num_features, mse_values, marker="o", linestyle="--",
         color="red", label="MSE Score")
plt.xlabel("Number of Selected Features")
plt.ylabel("MSE Score")
plt.title("Backward Elimination: Model Performance vs. Number of Features")
plt.legend()
plt.grid()
plt.show()
The model begins with all available features.

It then removes one feature at a time, starting with the least important one, where importance is judged by how much the feature contributes to reducing the error (MSE).

For each feature count, we select the remaining features using Backward Elimination (SBS), train the model using only those features, and calculate and store the MSE.

Plotting MSE vs. number of features shows how model performance changes as features are removed; the optimal subset is the one where MSE is lowest.
sbs = SFS(model, k_features=4, forward=False, floating=False,
scoring='neg_mean_squared_error', cv=5)
sbs.fit(X_train, y_train)

X_train_bs = sbs.transform(X_train)
X_test_bs = sbs.transform(X_test)

mse_backward = evaluate_model(X_train_bs, X_test_bs, y_train, y_test)


print(f"MSE with Backward Elimination: {mse_backward:.4f}")

MSE with Backward Elimination: 0.5490


MSE: 0.5490 (better than baseline)

Backward elimination removed some features and improved the model slightly, arriving at the same error as forward selection and again suggesting that not all features were useful.
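
Since forward selection and backward elimination report the same MSE, it is worth checking whether they chose the same 4-feature subset. A quick sketch, assuming the fitted sfs and sbs selectors are still available:

forward_idx = set(sfs.k_feature_idx_)
backward_idx = set(sbs.k_feature_idx_)
print("Forward subset :", sorted(forward_idx))
print("Backward subset:", sorted(backward_idx))
print("Same subset selected?", forward_idx == backward_idx)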

3. Recursive Feature Elimination (RFE)


Starts with all features and trains the model.

Removes the least important feature based on its contribution to the model.

Repeats the process recursively until only the desired number of features remain.

Key Idea: RFE ranks features by importance and eliminates them one by one until only the most
significant ones are left.
# Store MSE values for different feature counts
mse_values = []
num_features = []

for k in range(1, X_train.shape[1] + 1):  # Iterate from 1 feature to all features
    rfe = RFE(estimator=LinearRegression(), n_features_to_select=k)
    rfe.fit(X_train, y_train)

    # Transform dataset using selected features
    X_train_rfe = rfe.transform(X_train)
    X_test_rfe = rfe.transform(X_test)

    # Compute MSE and store results
    mse = evaluate_model(X_train_rfe, X_test_rfe, y_train, y_test)
    mse_values.append(mse)
    num_features.append(k)

# Plot MSE vs. Number of Features for RFE
plt.figure(figsize=(6, 4))
plt.plot(num_features, mse_values, marker="o", linestyle="--",
         color="purple", label="MSE Score")
plt.xlabel("Number of Selected Features")
plt.ylabel("MSE Score")
plt.title("RFE: Model Performance vs. Number of Features")
plt.legend()
plt.grid()
plt.show()
The model begins with all available features.

It then removes one feature at a time, starting with the least important one, where importance is judged from the magnitude of the feature's coefficient in the fitted model.

For each feature count, we select the features using Recursive Feature Elimination (RFE), train the model using only those features, and calculate and store the MSE.

Plotting MSE vs. number of features shows how model performance changes as features are removed; the optimal subset is the one where MSE is lowest.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe.fit(X_train, y_train)

X_train_rfe = rfe.transform(X_train)
X_test_rfe = rfe.transform(X_test)

mse_rfe = evaluate_model(X_train_rfe, X_test_rfe, y_train, y_test)


print(f" MSE with RFE: {mse_rfe:.4f}")

MSE with RFE: 0.5608


MSE with RFE: 0.5608 (slightly worse than baseline)

Here only 3 features were retained, and the error increased slightly, suggesting that an important feature was removed.
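
To see which 3 features RFE kept and in what order it would discard the rest, the fitted selector exposes a boolean mask and a ranking. A minimal sketch, assuming rfe and data from the cells above:

# support_ marks the retained columns; ranking_ is 1 for kept features,
# with larger numbers for features eliminated earlier
kept = [name for name, keep in zip(data.feature_names, rfe.support_) if keep]
print("Features kept by RFE:", kept)
print("Ranking (1 = kept):", dict(zip(data.feature_names, rfe.ranking_)))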

4. Exhaustive Search
Tests all possible feature subsets within the specified range (min_features=3, max_features=5).

Trains a model for each subset and calculates the corresponding MSE.

Selects the feature set that minimizes MSE, ensuring the best possible feature combination.
mse_values = []
feature_counts = list(range(3, 6))  # Testing feature sets of size 3, 4, and 5

for num_features in feature_counts:
    efs = EFS(model, min_features=num_features, max_features=num_features,
              scoring='neg_mean_squared_error', cv=3)
    efs.fit(X_train, y_train)

    X_train_efs = efs.transform(X_train)
    X_test_efs = efs.transform(X_test)

    mse_values.append(evaluate_model(X_train_efs, X_test_efs, y_train, y_test))

# Plot MSE vs. Number of Selected Features
plt.figure(figsize=(6, 4))
plt.plot(feature_counts, mse_values, marker='o', linestyle='-',
         color='orange')
plt.xlabel("Number of Selected Features")
plt.ylabel("MSE")
plt.title("Exhaustive Search: MSE vs. Number of Features")
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

Features: 56/56
efs = EFS(model, min_features=4, max_features=4,
          scoring='neg_mean_squared_error', cv=3)
efs.fit(X_train, y_train)

X_train_efs = efs.transform(X_train)
X_test_efs = efs.transform(X_test)

mse_exhaustive = evaluate_model(X_train_efs, X_test_efs, y_train, y_test)
print(f"MSE with Exhaustive Search: {mse_exhaustive:.4f}")

Features: 70/70

MSE with Exhaustive Search: 0.5490


Since MSE is lowest at 4 features, setting min_features and max_features to 4 ensures the model
selects only the optimal feature subset.

MSE with Exhaustive Search: 0.5490 (Better Performance than Baseline model)

Thus Exhaustive Search found the optimal 4-feature subset, matching the lowest error achieved by the wrapper methods and confirming that removing irrelevant features improved model accuracy.
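
To see exactly which 4-feature combination the exhaustive search settled on, the fitted selector stores the best subset and its cross-validated score. A minimal sketch, assuming efs and data from the cells above:

best_idx = list(efs.best_idx_)
best_names = [data.feature_names[i] for i in best_idx]
print("Best 4-feature subset:", best_names)
print("Best CV score (negative MSE):", efs.best_score_)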

Embedded Method
Embedded methods select features during model training by applying built-in regularization
techniques (e.g., LASSO shrinks coefficients, dropping less important features).

They are typically faster than wrapper methods and help prevent overfitting.

1. LASSO Regression
Performs feature selection by shrinking some coefficients to zero using L1 regularization.

Automatically removes less important features, keeping only the most significant ones.

Controls complexity through the alpha parameter, preventing overfitting.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Define alpha values (log scale for better visualization)
alpha_values = np.logspace(-3, 2, 10)  # From 0.001 to 100
mse_values = []

# Loop through different alpha values
for alpha in alpha_values:
    lasso = Lasso(alpha=alpha)
    lasso.fit(X_train, y_train)
    y_pred = lasso.predict(X_test)
    mse_values.append(mean_squared_error(y_test, y_pred))

# Find the best alpha (minimum MSE)
best_alpha = alpha_values[np.argmin(mse_values)]

# Plot Alpha vs. MSE for LASSO
plt.figure(figsize=(8, 5))
plt.plot(alpha_values, mse_values, marker='o', linestyle='-',
         color='r', label="LASSO")
plt.xscale('log')  # Log scale for alpha
plt.xlabel("Alpha (LASSO Penalty)")
plt.ylabel("Mean Squared Error (MSE)")
plt.title("LASSO Regression: MSE vs. Alpha")
plt.grid(True)
plt.legend()
plt.show()

X-axis represents different values of alpha (regularization strength).

Smaller alpha means less regularization (closer to standard Linear Regression). Larger alpha
means stronger regularization (more shrinkage of coefficients).

Y-axis represents the Mean Squared Error (MSE) of the model on the test data.

Lower MSE indicates better predictive performance. Higher MSE suggests underfitting (too
much shrinkage).

Small alpha (left side of the plot) → Low regularization : the model behaves almost like standard Linear Regression, so irrelevant features keep their full weight and the MSE stays near the slightly higher baseline value (overfitting risk).

Moderate alpha (middle of the plot) → Optimal point : This is where MSE is minimum, meaning
LASSO has effectively removed unnecessary features while retaining the important ones. This is the
best choice of alpha for balancing bias and variance.

Large alpha (right side of the plot) → High regularization : MSE increases because LASSO shrinks too
many coefficients to zero, leading to underfitting (important features are lost).

Choose alpha where MSE is lowest to get the best feature subset with good predictive power.
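
Here alpha is chosen by minimizing MSE on the held-out test set; an alternative is to choose it by cross-validation on the training data only, for example with scikit-learn's LassoCV. A minimal sketch, assuming the alpha_values grid defined above:

from sklearn.linear_model import LassoCV

# 5-fold cross-validation over the same alpha grid, fit on training data only
lasso_cv = LassoCV(alphas=alpha_values, cv=5)
lasso_cv.fit(X_train, y_train)
print(f"Alpha chosen by CV: {lasso_cv.alpha_:.4f}")
print(f"Test MSE at that alpha: {mean_squared_error(y_test, lasso_cv.predict(X_test)):.4f}")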
# Train Lasso with the best alpha
lasso = Lasso(alpha=best_alpha)
lasso.fit(X_train, y_train)

# Select only important features (non-zero coefficients)
selected_lasso = np.where(lasso.coef_ != 0)[0]

print(f"Selected {len(selected_lasso)} features out of {X_train.shape[1]}")
print(f"Selected feature indices: {selected_lasso}")  # Print selected feature indices

# Subset dataset to selected features
X_train_lasso = X_train[:, selected_lasso]
X_test_lasso = X_test[:, selected_lasso]

# Train Lasso again on reduced features
lasso_selected = Lasso(alpha=best_alpha)
lasso_selected.fit(X_train_lasso, y_train)

# Evaluate performance
y_pred = lasso_selected.predict(X_test_lasso)
mse_lasso = mean_squared_error(y_test, y_pred)

print(f"MSE after LASSO Feature Selection: {mse_lasso:.4f}")
print(f"Best Alpha Selected: {best_alpha:.4f}")

Selected 7 features out of 8
Selected feature indices: [0 1 2 3 5 6 7]
MSE after LASSO Feature Selection: 0.5486
Best Alpha Selected: 0.0129

Among the MSE results so far, this is the lowest (0.5486, better than the baseline model), meaning the LASSO-selected feature set performs best.
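
To see which feature was dropped, the selected indices can be mapped back to the column names (the index order matches the feature list printed earlier). A small sketch:

selected_names = [data.feature_names[i] for i in selected_lasso]
dropped_names = [name for i, name in enumerate(data.feature_names) if i not in selected_lasso]
print("Kept by LASSO   :", selected_names)
print("Dropped by LASSO:", dropped_names)  # index 4 corresponds to Population here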

2. Ridge Regression
Lasso (L1 regularization) removes unimportant features by setting coefficients to zero, making it
useful for feature selection.

Ridge (L2 regularization) shrinks coefficients without setting them to zero, meaning all features are
retained but with reduced impact.

When to use which:

If the goal is feature selection, Lasso is the right approach. If we want to reduce multicollinearity and stabilize coefficients without removing features, Ridge can be tried as well. If unsure, we can try both Lasso and Ridge, compare MSE values, and choose the one that performs best.

Keeps all features but shrinks coefficients to prevent overfitting.

Uses L2 regularization, which penalizes large coefficients.

Helps when features are highly correlated and prevents instability in predictions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

alphas = np.logspace(-4, 4, 50)  # Test values from 0.0001 to 10000
mse_values = []

for alpha in alphas:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    mse_values.append(mean_squared_error(y_test, y_pred))

# Plot MSE vs Alpha
plt.figure(figsize=(8, 5))
plt.plot(alphas, mse_values, 'bo-', markersize=3)
plt.xscale("log")
plt.xlabel("Alpha (log scale)")
plt.ylabel("Mean Squared Error (MSE)")
plt.title("MSE vs Alpha for Ridge Regression")
plt.show()

best_alpha = alphas[np.argmin(mse_values)]  # Get the alpha with lowest MSE

ridge = Ridge(alpha=best_alpha)
ridge.fit(X_train, y_train)

y_pred = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred)

print(f"Optimal Alpha: {best_alpha:.4f}")
print(f"MSE with Ridge Regression: {mse_ridge:.4f}")

Optimal Alpha: 232.9952
MSE with Ridge Regression: 0.5518
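
As with LASSO, the ridge penalty could instead be tuned by cross-validation on the training data rather than against the test set, for example with scikit-learn's RidgeCV. A minimal sketch, assuming the alphas grid above:

from sklearn.linear_model import RidgeCV

# 5-fold cross-validation over the same alpha grid, fit on training data only
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_train, y_train)
print(f"Alpha chosen by CV: {ridge_cv.alpha_:.4f}")
print(f"Test MSE at that alpha: {mean_squared_error(y_test, ridge_cv.predict(X_test)):.4f}")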

Comparing MSE Across Methods


import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

# Define the evaluation function
def evaluate_model(X_train, X_test, y_train, y_test, alpha=None):
    if alpha is None:
        model = LinearRegression()  # Default case with no alpha
    else:
        model = Ridge(alpha=alpha)  # If alpha is provided, use Ridge

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return mean_squared_error(y_test, y_pred)

# Define feature selection methods
methods = ["All Features", "Forward", "Backward", "RFE", "Exhaustive",
           "LASSO", "Ridge"]
mses = [evaluate_model(X_train, X_test, y_train, y_test), mse_forward,
        mse_backward, mse_rfe, mse_exhaustive, mse_lasso, mse_ridge]

# Plot
plt.figure(figsize=(10, 5))
ax = sns.barplot(x=methods, y=mses)

# Print MSE values on top of bars
for i, mse in enumerate(mses):
    ax.text(i, mse + 0.02, f"{mse:.4f}", ha='center', fontsize=12,
            fontweight='bold')

# Labels and title
plt.xlabel("Feature Selection Method")
plt.ylabel("Mean Squared Error (MSE)")
plt.title("Comparison of Feature Selection Methods")
plt.xticks(rotation=45)
plt.show()

LASSO Performs the Best (0.5486)

LASSO (L1 Regularization) removes unimportant features by setting their coefficients to zero,
leading to a more efficient and optimized model. By selecting only the most relevant features,
LASSO reduces noise and prevents overfitting, which improves MSE.

Forward, Backward, and Exhaustive Selection Perform Similarly (0.5490)

These methods systematically evaluate feature subsets, ensuring that only relevant features
remain. Since they rely on statistical evaluation, they tend to select similar subsets, leading to
almost identical MSE values.

All Features (0.5559) Performs Worse Than Selected Features

Keeping all features can introduce irrelevant or redundant ones, which may add noise and slightly reduce model performance. Feature selection methods remove these unnecessary features, leading to a small but noticeable improvement.

RFE Performs the Worst (0.5608)

RFE removes features recursively based on the size of the model coefficients; with only 3 features retained here, its selection process likely eliminated an important feature, leading to higher error.

Ridge Regression (0.5518) is Better Than Using All Features, But Worse Than LASSO

Ridge (L2 Regularization) shrinks coefficients instead of removing features. It helps control
overfitting but does not eliminate irrelevant features, meaning it doesn’t reduce MSE as
effectively as LASSO.
