Assignment 2
AIM: Assignment on Linear Regression
THEORY:
When we have a single input attribute (x) and we want to use linear regression, this is called
simple linear regression.
If we had multiple input attributes (e.g. x1, x2, x3, etc.), this would be called multiple linear
regression. The procedure for simple linear regression is different from and simpler than that for
multiple linear regression, so it is a good place to start.
In this section we are going to create a simple linear regression model from our training data,
then make predictions for our training data to get an idea of how well the model learned the
relationship in the data.
With simple linear regression we want to model our data as follows:
y = B0 + B1 * x
This is a line where y is the output variable we want to predict, x is the input variable we know
and B0 and B1 are coefficients that we need to estimate that move the line around.
Technically, B0 is called the intercept because it determines where the line intercepts the y-axis.
In machine learning we can call this the bias, because it is added to offset all predictions that we
make. The B1 term is called the slope because it defines the slope of the line or how x translates
into a y value before we add our bias.
The goal is to find the best estimates for the coefficients to minimize the errors in predicting y
from x.
Simple linear regression is great because, rather than having to search for coefficient values by
trial and error or calculate them analytically using more advanced linear algebra, we can estimate
them directly from our data.
We can start off by estimating the value for B1 as:
B1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)
Where mean() is the average value for the variable in our dataset. The xi and yi refer to the fact
that we need to repeat these calculations across all values in our dataset, and i refers to the i'th
value of x or y. We can calculate B0 using B1 and some statistics from our dataset, as follows:
B0 = mean(y) - B1 * mean(x)
Not that bad, right? We can calculate these coefficients right in our spreadsheet.
Estimating Slope (B1)
Let's start with the top part of the equation, the numerator. First we need to calculate the mean
value of x and y. The mean is calculated as sum(x) / n, where n is the number of values (5 in this
case). Let's calculate the mean value of our x and y variables:
mean(x) = 3, mean(y) = 2.8
We now have the parts for calculating the numerator. All we need to do is multiply the error for
each x with the error for each y and calculate the sum of these products.
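To make the arithmetic concrete, here is a minimal Python sketch of the coefficient estimation.
The five x and y values are illustrative assumptions (they are not listed explicitly in this
write-up), chosen so that mean(x) = 3 and mean(y) = 2.8 as stated above.

# Simple linear regression coefficients estimated directly from the data.
# The data values below are assumed for illustration; they reproduce the
# means quoted in the text (mean(x) = 3, mean(y) = 2.8).
x = [1, 2, 4, 3, 5]
y = [1, 3, 3, 2, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Numerator: sum of products of the x and y deviations from their means.
numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
# Denominator: sum of squared deviations of x from its mean.
denominator = sum((xi - mean_x) ** 2 for xi in x)

B1 = numerator / denominator   # slope
B0 = mean_y - B1 * mean_x      # intercept (bias)
print(B0, B1)                  # 0.4 0.8 for this illustrative data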
Putting these estimates together, our simple linear regression model becomes:
y = B0 + B1 * x
or
y = 0.4 + 0.8 * x
Let’s try out the model by making predictions for our training data.
We can plot these predictions as a line with our data. This gives us a visual idea of how well the
line models our data.
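Below is a minimal sketch of making predictions and plotting them against the training data,
assuming matplotlib is available and reusing B0, B1, x and y from the previous snippet.

import matplotlib.pyplot as plt

# Sort x so the fitted line is drawn left to right, and predict y for each value.
xs = sorted(x)
predictions = [B0 + B1 * xi for xi in xs]

plt.scatter(x, y, label="training data")                # actual points
plt.plot(xs, predictions, color="red", label="model")   # fitted line
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()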
Estimating Error
We can calculate an error score for our predictions called the Root Mean Squared Error or RMSE:
RMSE = sqrt( sum((pi - yi)^2) / n )
Where sqrt() is the square root function, p is the predicted value, y is the actual value, i is the
index for a specific instance and n is the number of predictions, because we must calculate the
error across all predicted values.
First we must calculate the difference between each model prediction and the actual y value. We
can easily calculate the square of each of these error values (error * error or error^2).
The sum of these squared errors is 2.4 units; dividing by n and taking the square root gives us:
RMSE = 0.692
Or, each prediction is on average wrong by about 0.692 units.
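A short sketch of the RMSE calculation, continuing from the illustrative predictions above:

from math import sqrt

# Squared error for each prediction, then the mean, then the square root.
predictions = [B0 + B1 * xi for xi in x]
squared_errors = [(p - yi) ** 2 for p, yi in zip(predictions, y)]
rmse = sqrt(sum(squared_errors) / len(squared_errors))
print(round(rmse, 3))   # about 0.692 for the illustrative data above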
The Boston Housing Dataset contains information about housing prices in different areas of
Boston, influenced by various factors like crime rate, proximity to employment centers, and
pollution levels. The key steps involved in the analysis include preparing the data, fitting a
regression model, and evaluating its predictions, as sketched below.
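The exact pipeline used for the Boston analysis is not reproduced here; the following is a minimal
sketch with pandas and scikit-learn, assuming the data is available as a local CSV file
(boston.csv is a hypothetical path) with a MEDV column holding the median house value.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the Boston Housing data; the file name and column layout are assumed.
data = pd.read_csv("boston.csv")
X = data.drop(columns=["MEDV"])   # features such as CRIM, RM, LSTAT, ...
y = data["MEDV"]                  # target: median house value

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit a multiple linear regression model and evaluate it with RMSE.
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
rmse = mean_squared_error(y_test, predictions) ** 0.5
print("RMSE:", rmse)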
REFERENCES:
1. Mitchell, T. M., Machine Learning, McGraw Hill (1997), 1st Edition.
2. Alpaydin, E., Introduction to Machine Learning, MIT Press (2014), 3rd Edition.
3. https://medium.com/analytics-vidhya/understanding-the-linear-regression-808c1f6941c0
CONCLUSION:
The analysis of the Boston Housing Dataset reveals that features like the number of rooms (RM),
crime rate (CRIM), and lower status population (LSTAT) significantly impact house prices.
Regression models can effectively predict median house values, but the dataset has limitations in
generalizability. Future improvements could include using advanced models and incorporating
real-world economic factors for more accurate and generalizable predictions.