
SNT 7

The document outlines a linear regression analysis assignment using the Boston Housing Dataset to predict house prices based on independent variables such as the number of rooms, lower status population percentage, and pupil-teacher ratio. It details the steps for data preprocessing, model training using Ordinary Least Squares (OLS) regression, and performance evaluation through metrics like Mean Squared Error (MSE) and R² score. Additionally, it includes code snippets for data handling, exploratory data analysis, and interpretation of the regression results, along with limitations of the model.

Uploaded by

nisargbhatt.n


MA262-SNT D24IT179

Week 7: Linear Regression Analysis

Regression analysis is widely used in industries such as retail, meteorology, and real
estate to predict future trends. In this assignment, you will:

Select a dataset related to .

Preprocess the dataset for missing values and anomalies.

Apply to model the relationship between independent and dependent variables.

Evaluate model performance using a suitable metric.

Follow these steps:

Load the dataset using Pandas.

Handle missing values and perform data cleaning.

Select relevant independent variables (e.g., advertising budget, location, and product category for sales; humidity and altitude for temperature; house size and number of rooms for housing prices).

Plot histograms, scatter plots, and correlation heatmaps to identify relationships between variables.

Normalize or transform data if necessary.

Use from statsmodels to train the regression model.

Split data into training and testing sets.

Fit the model and interpret coefficients.

Use for model evaluation.

Identify key predictors and their impact on the dependent variable.

Analyze residuals to check for model validity.
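The steps above ask for handling anomalies as well as missing values, while the code later in this report only imputes missing values. As a minimal sketch (not part of the original submission), values can be flagged with Tukey's 1.5×IQR fences. The helper `iqr_outlier_mask` and the toy DataFrame are hypothetical stand-ins for the real columns. Flagging is only a first step: in the Boston data, for instance, MEDV appears to be censored at 50.0, so whether to drop, cap, or keep flagged rows is a modelling decision rather than something the rule decides.

```python
import pandas as pd


def iqr_outlier_mask(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR] for its own column (Tukey's fences)."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons align on column names, giving one boolean per cell.
    return (df[columns] < lower) | (df[columns] > upper)


# Toy data standing in for the housing predictors (values are invented).
toy = pd.DataFrame({
    "rm":      [6.0, 6.2, 6.1, 5.9, 6.3, 8.8],
    "lstat":   [10.0, 12.0, 11.0, 9.0, 13.0, 11.5],
    "ptratio": [18.0, 19.0, 18.5, 17.5, 19.5, 18.0],
})

mask = iqr_outlier_mask(toy, ["rm", "lstat", "ptratio"])
print(mask.sum())                      # flagged values per column
flagged_rows = toy[mask.any(axis=1)]   # rows with at least one flagged value
print(flagged_rows)
```

On the report's data the same call would be `iqr_outlier_mask(df, ['rm', 'lstat', 'ptratio', 'medv'])`, run after loading and before splitting.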

Submit the following:

Problem definition and dataset details.

Data preprocessing.

OLS regression implementation and performance analysis.

Model interpretation and limitations.

Jupyter Notebook with well-commented code.

CSPIT-IT

Problem Definition and Dataset Details

- Select a dataset relevant to regression analysis.
- Preprocess the dataset to handle missing values and anomalies.
- Use Ordinary Least Squares (OLS) regression to find relationships between variables.
- Evaluate the model using Mean Squared Error (MSE) and the R² score.
- Interpret the regression coefficients and model performance.
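For reference, the quantities named in this list can be written as follows. This assumes $X$ is the $n \times (p+1)$ design matrix whose first column is all ones (the intercept column that `sm.add_constant` adds in the code) and that $X$ has full column rank, so $X^\top X$ is invertible:

$$
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2 = (X^\top X)^{-1} X^\top \mathbf{y}, \qquad
\hat{\mathbf{y}} = X\hat{\boldsymbol{\beta}}
$$

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

When these metrics are computed on the held-out test set, $n$ is the number of test rows and $\bar{y}$ is the mean of the test targets (as in scikit-learn's `r2_score`), so $R^2$ can be negative if the model predicts worse than that constant.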

Dataset: Boston Housing Dataset (predicting house prices)

This dataset contains 506 records, one per census tract in the Boston area, with 13 candidate predictors and the target. This analysis uses:
- Independent variables (X):
  - RM (average number of rooms per dwelling)
  - LSTAT (percentage of lower-status population)
  - PTRATIO (pupil-teacher ratio in schools, by town)
- Dependent variable (y):
  - MEDV (median value of owner-occupied homes, in $1000s)

Code : -
# 📌 Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# Ensures plots are displayed inline in Jupyter Notebook (IPython magic, not plain Python)
%matplotlib inline

# 📌 Load the dataset (Boston Housing Data)

url = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
df = pd.read_csv(url)

# 📌 Display first 5 rows of the dataset

print("🔹 Initial 5 rows of the dataset:")
print(df.head())

# 📌 Check for missing values

print("\n🔹 Missing values in the dataset:\n", df.isnull().sum())

# 📌 If any missing values exist, fill them with column mean

df.fillna(df.mean(), inplace=True)

# 📌 Define independent (X) and dependent (y) variables

X = df[['rm', 'lstat', 'ptratio']] # Selecting important independent variables
y = df['medv'] # House price (dependent variable)

# 📌 Display dataset statistics

print("\n🔹 Summary statistics of dataset:")
print(df.describe())

# 📌 Exploratory Data Analysis (EDA)

# 🔹 Histogram of House Prices

plt.figure(figsize=(6,4))
sns.histplot(df['medv'], bins=20, kde=True, color="blue")
plt.title("Distribution of House Prices")
plt.xlabel("Median House Price ($1000s)")
plt.ylabel("Frequency")

plt.show()

# 🔹 Scatter plot of RM (rooms) vs. MEDV (price)

plt.figure(figsize=(6,4))
sns.scatterplot(x=df['rm'], y=df['medv'], color='red')
plt.title("Number of Rooms vs House Price")
plt.xlabel("Number of Rooms")
plt.ylabel("House Price ($1000s)")
plt.show()

# 🔹 Correlation Heatmap
plt.figure(figsize=(8,6))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Heatmap of Variables")
plt.show()

# 📌 Splitting Data into Training and Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 📌 Add constant to independent variables for OLS regression

X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)

# 📌 Print dataset shapes to confirm splitting

print("\n🔹 Training Data Shape (X_train):", X_train.shape)
print("🔹 Testing Data Shape (X_test):", X_test.shape)
print("🔹 Training Target Shape (y_train):", y_train.shape)
print("🔹 Testing Target Shape (y_test):", y_test.shape)

# 📌 Print first few rows to confirm constant addition

print("\n🔹 First 5 rows of training data (X_train):")
print(X_train.head())

# 📌 Train the regression model using statsmodels

model = sm.OLS(y_train, X_train).fit()

# 📌 Display the model summary

print("\n🔹 OLS Regression Model Summary:")
print(model.summary())

# 📌 Predicting on Test Data

y_pred = model.predict(X_test)

# 📌 Compute Mean Squared Error (MSE) and R² Score

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"\n🔹 Model Performance:")

print(f"✅ Mean Squared Error (MSE): {mse:.2f}")
print(f"✅ R² Score: {r2:.2f}")

# 📌 Residual Analysis - Plot Residuals

residuals = y_test - y_pred

plt.figure(figsize=(6,4))
sns.scatterplot(x=y_pred, y=residuals, color='purple')
plt.axhline(y=0, color='red', linestyle='--')
plt.title("Residuals vs Predicted Values")
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.show()

# 📌 Interpretation of Results
interpretation = """
🔹 **Interpretation of Coefficients:**
- `RM (Rooms)` → **Positive Coefficient**: holding LSTAT and PTRATIO fixed, tracts with more rooms per dwelling tend to have higher median house prices.
- `LSTAT (Lower Status %)` → **Negative Coefficient**: a higher percentage of lower-status population is associated with lower median house prices.
- `PTRATIO (Pupil-Teacher Ratio)` → **Negative Coefficient**: more pupils per teacher in local schools is associated with lower median house prices.

🔹 **Limitations:**
1. The model assumes **linear relationships**, but real estate pricing may have non-linear effects.
2. It uses only three predictors, so it does not account for **location**, crime rates, or other neighborhood factors, even though the dataset includes related columns such as `crim` and `dis`.
3. The dataset may contain **outliers** that distort the fitted coefficients and predictions.
"""

print("\n🔹 Interpretation of Regression Model:")

print(interpretation)

Output : -

🔹 Interpretation of Regression Analysis

- `RM (Rooms)` → **Positive Coefficient**: holding LSTAT and PTRATIO fixed, tracts with more rooms per dwelling tend to have higher median house prices.

- `LSTAT (Lower Status %)` → **Negative Coefficient**: a higher percentage of lower-status population is associated with lower median house prices.

- `PTRATIO (Pupil-Teacher Ratio)` → **Negative Coefficient**: more pupils per teacher in local schools is associated with lower median house prices.

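🔹 **Limitations:**

1. The model assumes **linear relationships**, but real estate pricing may have non-linear effects.

2. It uses only three predictors, so it does not account for **location**, crime rates, or other neighborhood factors, even though the dataset includes related columns such as `crim` (per-capita crime rate) and `dis` (distance to Boston employment centers).

3. The dataset may contain **outliers** that distort the fitted coefficients and predictions.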
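Limitations 1 and 3 can be probed directly. The sketch below uses synthetic data (the curved trend and the injected outlier are invented for illustration). It compares a straight-line fit against one with a quadratic term `I(lstat ** 2)` using adjusted R², and uses Cook's distance from statsmodels' influence diagnostics to locate the observation that most changes the fitted coefficients. The 4/n cutoff is a rule of thumb, not a formal test.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with a convex (curved) trend; all numbers are invented.
rng = np.random.default_rng(3)
n = 300
toy = pd.DataFrame({"lstat": rng.uniform(2, 35, n)})
toy["medv"] = 45 - 2.2 * toy["lstat"] + 0.04 * toy["lstat"] ** 2 + rng.normal(0, 2, n)
toy.loc[0, "medv"] = 80.0  # inject one gross outlier at row 0

# Limitation 1: compare a straight line with a quadratic term in lstat.
linear = smf.ols("medv ~ lstat", data=toy).fit()
quadratic = smf.ols("medv ~ lstat + I(lstat ** 2)", data=toy).fit()
print(f"adjusted R^2, linear:    {linear.rsquared_adj:.3f}")
print(f"adjusted R^2, quadratic: {quadratic.rsquared_adj:.3f}")

# Limitation 3: Cook's distance measures how much each row shifts the fit.
cooks_d, _ = quadratic.get_influence().cooks_distance
influential = np.flatnonzero(cooks_d > 4 / n)  # rule-of-thumb cutoff
print("rows with Cook's D > 4/n:", influential)
print("most influential row:", int(np.argmax(cooks_d)))
```

On the report's data, the analogous model would be `smf.ols("medv ~ rm + lstat + I(lstat ** 2) + ptratio", data=df)`, ideally fitted on the training split only.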
