ML-Lab07-Building and Evaluating Multivariate Regression Models in Python
This lab tutorial will guide you on analyzing, building, and testing regression models in Python. We will use
the popular scikit-learn library for this purpose.
1. Go to Google Colab.
2. Create a new notebook by clicking on File > New Notebook.
3. You're now ready to start writing and executing Python code in the notebook!
Let us use the "Diabetes" dataset available in scikit-learn, which contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) for each of 442 diabetes patients.
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
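The imports above can be put to work with a short sketch that loads the dataset and fits a baseline multivariate linear regression. The `test_size=0.2` split, `random_state=42`, and the use of `as_frame=True` (available in scikit-learn 0.23+) are illustrative choices, not part of the lab text:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the Diabetes dataset as a pandas DataFrame/Series pair
diabetes = load_diabetes(as_frame=True)
X, y = diabetes.data, diabetes.target

# Hold out a test set for evaluation (split ratio and seed are arbitrary choices)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: ordinary multivariate linear regression on all ten features
model = LinearRegression()
model.fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Linear Regression MSE: {mse:.2f}")
```

This baseline MSE is a useful point of comparison for the polynomial models built later in the lab.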
POINTS TO PONDER:
Nonlinear regression models are used when the relationship between the independent and dependent variables is not linear. One popular nonlinear regression technique is Polynomial Regression. In the next step, we will perform Polynomial Regression on the "Diabetes" dataset.
POINTS TO PONDER:
The use of a pipeline with Polynomial Features and Linear Regression allows us to seamlessly combine the
process of transforming the features into polynomial features and fitting a linear regression model in a single
step. Here is a breakdown of why we use this approach:
Step-1 - Polynomial Features: This transformation expands the original features into polynomial terms (for example, squares and pairwise products up to the chosen degree), which allows a linear model to capture nonlinear relationships.
Step-2 - Linear Regression: After transforming the features with Polynomial Features, we are left with a set of new, higher-degree features. However, the relationship between these transformed features and the target variable is still linear in the model coefficients.
Step-3 - Pipeline: A pipeline in scikit-learn allows us to chain multiple processing steps together. In this
case, we first apply the Polynomial Features transformation, followed by fitting a Linear Regression model.
The pipeline ensures that the same preprocessing steps are applied to both the training and testing data.
This approach simplifies the modeling process and allows us to utilize the existing tools for linear regression,
including evaluation metrics like Mean Squared Error.
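The three steps above can be sketched as a single pipeline. The `degree=2` expansion and the split parameters are illustrative assumptions, not values prescribed by the lab:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Load the data and hold out a test set
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Chain the two steps: polynomial feature expansion, then a linear fit.
# The pipeline guarantees the same transformation is applied to train and test data.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X_train, y_train)

mse_poly = mean_squared_error(y_test, poly_model.predict(X_test))
print(f"Degree-2 Polynomial Regression MSE: {mse_poly:.2f}")
```

Because the pipeline behaves like a single estimator, the usual tools (cross-validation, grid search, Mean Squared Error) apply to it unchanged.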
Yes, you can apply Polynomial Regression directly without using a pipeline. In scikit-learn, you can use the
PolynomialFeatures class to transform your features, and then apply a Linear Regression model on the
transformed features. This approach is also valid and provides you with more flexibility if you want to explore
the intermediate steps or apply additional customizations.
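A minimal sketch of this pipeline-free approach follows; `degree=2` and the split parameters are again arbitrary choices. The key detail is fitting the transformer on the training data only and reusing it on the test data:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 1: fit the transformer on the training data, then reuse it on the test data
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Step 2: fit an ordinary linear regression on the expanded features
model_poly = LinearRegression()
model_poly.fit(X_train_poly, y_train)

mse = mean_squared_error(y_test, model_poly.predict(X_test_poly))
print(f"MSE without a pipeline: {mse:.2f}")
```

Here the intermediate matrices `X_train_poly` and `X_test_poly` are available for inspection, which is the flexibility the pipeline-free approach buys you.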
You can access these coefficients using model_poly.coef_ and model_poly.intercept_. Remember that the
equation becomes more complex for higher-degree polynomials and involves multiple coefficients for each
feature.
Lab Task:
Use any of the publicly available datasets for the Multivariate Regression problem. The dataset must include a mix of numerical and categorical/ordinal features. Perform the following steps on the data.
a) Choose different regression models (e.g., Linear Regression, Ridge Regression, Lasso Regression, and Polynomial Regression) and train them using the features and target variable.
b) Train these models with Feature Normalization and Standardization as well.
c) Predict the target variable using the trained models and calculate the Mean Squared Error (MSE).
d) Compare the performance of the chosen regression models with and without feature scaling.
e) Visualize the predictions against the actual values for better understanding.
f) Take a random test sample and predict its value.
g) Prepare a 1-2 page report analyzing the results for the different regression models.