Day 2-Data Science

Linear regression is a statistical method that models the relationship between dependent and independent variables by fitting a straight line to the data; it is primarily suited to cross-sectional data. It requires certain assumptions to be met for valid results, such as linearity and independence of observations, and can be applied in fields such as economics and business. The coefficient of determination (R-squared) measures the proportion of variation in the outcome explained by the model, while Adjusted R-squared accounts for the number of predictors, providing a more reliable assessment of model performance.

Uploaded by tugumekeith022

DAY 2

Linear Regression
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a straight line to the data. It assumes that a change in an independent variable corresponds to a proportional change in the dependent variable, so the slope of the line is constant.
A key feature of this technique is that it includes a random element (the error term), which purely theoretical models ignore.
For example, as a business owner, economic theory would tell me that demand for my products is influenced by their price, the income of consumers, the prices of similar goods, and the tastes and preferences of consumers.
However, other factors that the model does not capture could also affect demand for the goods I produce. The influence of these other factors is captured by the random variable (error term) in the model.
Given the nature of this technique, it is mainly used for cross-sectional data (data captured at a single point in time) and is not advised for time series data (data captured repeatedly over a long period of time), as it cannot account for structural breaks and other factors that may have affected the outcome variable over that period.
For example, consider a company that was established in 1950. As a market analyst, you are looking at the return on investment in marketing strategies over time. In the 1950s, marketing was harder: there was no internet, very few television stations, no social media, and so on. Today, the situation is different.
A linear regression cannot take such changes into account, which is why it is best suited to cross-sectional data. For time series data, more appropriate analysis techniques can be employed.
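The structural-break argument can be made concrete with a small example. The sketch below fits one straight line to made-up data in which the slope changes partway through the series. All numbers and names are hypothetical and are not taken from the slides. The single pooled line ends up with a slope that matches neither period.

```python
# Hypothetical illustration: one straight line fitted across a structural break.

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Made-up data: periods 0-9 follow one regime (slope 1) and
# periods 10-19 follow another (slope 3); the two lines meet at period 10.
early_x = list(range(0, 10))
early_y = [1.0 * x for x in early_x]
late_x = list(range(10, 20))
late_y = [3.0 * x - 20.0 for x in late_x]

pooled = ols_slope(early_x + late_x, early_y + late_y)
print(f"early slope:  {ols_slope(early_x, early_y):.2f}")  # 1.00
print(f"late slope:   {ols_slope(late_x, late_y):.2f}")    # 3.00
print(f"pooled slope: {pooled:.2f}")                       # 1.92
```

The pooled estimate averages over two different relationships. An analyst reading it would misjudge the effect in both periods.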
Key Notes
NB 1: The dependent variable must be a continuous variable for linear regression to be used.
NB 2: Vigilance should also be given to the nature of the independent variables. If all independent variables are quantitative, they can be used as they are. However, if any predictor variable is categorical, it must be coded; otherwise, it will be treated as a continuous variable and the results will be biased. A clearer example to this effect shall be given as we proceed.
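Here is a minimal sketch of coding a categorical predictor as dummy (0/1) variables, using a hypothetical `region` variable. A category with k levels becomes k - 1 dummy columns. The omitted level acts as the baseline, absorbed into the constant β0, which avoids perfect multicollinearity with the constant term. The helper `dummy_code` is illustrative only and is not a library function.

```python
# Hypothetical example: dummy coding of a categorical predictor.

def dummy_code(values, baseline=None):
    """Return (column_names, rows): one 0/1 column per non-baseline category."""
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]  # default: alphabetically first category
    kept = [c for c in categories if c != baseline]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows

regions = ["North", "South", "East", "South", "North"]
names, rows = dummy_code(regions)
print(names)  # ['North', 'South']  ('East' is the baseline)
for region, row in zip(regions, rows):
    print(region, row)
```

Coding the regions as 1, 2, 3 instead would wrongly tell the model that the categories are ordered and equally spaced.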
Stages of linear regression
• Specification of the model: This stage involves determining the dependent and independent variables to be included in the model and the mathematical form of the model.
• Estimation of the model: This involves gathering data, examining problems and peculiarities within variables, and performing tests such as multicollinearity tests.
• Evaluation of estimates: This involves hypothesis testing to determine whether the calculated estimates are statistically reliable.
• Determining the forecasting ability of the model: This involves residual diagnostics, such as heteroscedasticity tests, to determine whether results from the model can be relied on.
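The slides mention multicollinearity tests without naming one. A commonly used check is the variance inflation factor (VIF), VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. This sketch assumes just two predictors, in which case R_j² is simply the squared Pearson correlation between them. The data are hypothetical. A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a warning sign.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors (R_j^2 = r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictor values.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
advertising = [2.0, 4.0, 5.0, 4.0, 5.0]
vif = vif_two_predictors(price, advertising)
print(f"VIF = {vif:.2f}")  # VIF = 2.50
```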
Assumptions of Linear Regression
In statistics and data science, a model cannot be applied anyhow and anywhere. For a model to produce statistically sound results, certain assumptions must hold. For a linear regression model, the following conditions MUST be met to have valid results:
• Linearity: The relationship between the independent and dependent variables must be linear.
• Independence: Observations are independent of each other.
• Homoscedasticity: The variance of the residuals (errors) is constant across all levels of the independent variables.
• Normality of residuals: The errors follow a normal distribution.
• No multicollinearity (for multiple regression): The independent variables should not be highly correlated with each other.
• No autocorrelation of error terms: The error terms should not be correlated with each other, for example across successive points in time.
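One way to check the no-autocorrelation assumption is the Durbin-Watson statistic: the sum of (e_t - e_{t-1})² divided by the sum of e_t², computed on residuals ordered in time. The sketch below applies it to two hypothetical residual series. The choice of test is illustrative, since the slides do not prescribe one. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a time-ordered list of residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every period
trending = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]     # neighbours move together

print(f"alternating: {durbin_watson(alternating):.2f}")  # 3.33
print(f"trending:    {durbin_watson(trending):.2f}")     # 0.34
```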
Some Applications of Linear Regression
• Economics: Predicting GDP growth based on various economic factors.
• Finance: Estimating stock returns based on investments and other factors.
• Business: Forecasting sales based on advertising spending and forecasting demand for products.
Simple Linear Regression
Involves one independent variable (X) and one dependent variable (Y).

$$Y_t = \beta_0 + \beta_1 X_1 + \varepsilon_t$$

Where:
• $Y_t$ is the outcome variable
• $\beta_0$ is a constant term
• $\beta_1$ is a parameter to be estimated
• $X_1$ is a predictor variable
• $\varepsilon_t$ is a random variable (the error term)
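The parameters of the simple model are usually estimated by ordinary least squares (OLS), which has the closed form β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and β0 = ȳ - β1·x̄. Here is a minimal sketch using hypothetical advertising and sales figures:

```python
def fit_simple_ols(xs, ys):
    """Return (beta0, beta1) for y = beta0 + beta1 * x by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept (constant term)
    return b0, b1

# Hypothetical data: advertising spending (X) and sales (Y).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_ols(advertising, sales)
print(f"beta0 = {b0:.2f}, beta1 = {b1:.2f}")  # beta0 = 1.09, beta1 = 1.97
```

Here β1 ≈ 1.97 would be read as: each extra unit of advertising is associated with about 1.97 more units of sales.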
Coefficient of determination (R-squared)
This measure is best suited to models with one predictor variable. It gives the percentage of the variation in the outcome variable that is explained by variation in the predictor variable.
It is interpreted in terms of percentages. For example, if the R-squared value in a simple linear regression is 0.85, it means that 85% of the variation in the outcome variable is explained by changes in the independent variable.
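R-squared can be computed as 1 - SSR/SST. SSR is the sum of squared residuals, and SST is the total sum of squares of the outcome around its mean. The sketch below uses hypothetical advertising and sales figures and refits the line so that it runs on its own.

```python
def r_squared(xs, ys):
    """R-squared of a simple OLS fit of ys on xs: 1 - SSR/SST."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variation
    return 1.0 - ssr / sst

# Hypothetical data: advertising spending (X) and sales (Y).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
r2 = r_squared(advertising, sales)
print(f"R-squared = {r2:.4f}")  # R-squared = 0.9977, i.e. about 99.8%
```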
PRACTICAL EXAMPLE OF SIMPLE LINEAR REGRESSION
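Multiple Linear Regression

$$Y_t = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon_t$$

Where:
• $Y_t$ is the outcome variable
• $\beta_0$ is a constant term
• $\beta_i$ is a parameter to be estimated
• $X_i$ is a predictor variable
• $\varepsilon_t$ is a random variable (the error term)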
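With several predictors, the OLS estimates solve the normal equations (XᵀX)β = Xᵀy. Here X is the data matrix with a leading column of ones for β0. The sketch below solves them with plain Gaussian elimination on hypothetical data generated exactly as Y = 1 + 2X1 + 3X2, so the true coefficients are known. It is a teaching sketch only. It assumes XᵀX is invertible (no perfect multicollinearity), and in practice a library routine such as `numpy.linalg.lstsq` is numerically safer.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple_ols(predictor_columns, ys):
    """Return [beta0, beta1, ..., betak] from the normal equations X'X b = X'y."""
    n = len(ys)
    design = [[1.0] + [col[i] for col in predictor_columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * y for row, y in zip(design, ys)) for j in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical data built as y = 1 + 2*x1 + 3*x2 (no error term).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [9.0, 8.0, 19.0, 18.0, 29.0, 28.0]
betas = fit_multiple_ols([x1, x2], y)
print([round(b, 6) for b in betas])  # [1.0, 2.0, 3.0]
```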
Adjusted coefficient of determination (Adjusted R-squared)
In multiple linear regression, adding predictor variables never decreases the R-squared value and usually increases it, even if some of the added variables are irrelevant to the dependent variable.
For example, consider a model where demand for a business product is assumed to be influenced by price, income level, and people's perception of the product. If this model initially returns an R-squared value of 0.75, adding new predictors such as customer height and weight would still raise the R-squared value, say to 0.88, even though they have no logical connection to demand.
Ordinarily, a higher R-squared value suggests a better
model, but in this case, common sense tells us that
height and weight are irrelevant to demand. This
highlights a key limitation of R-squared: it does not
account for the significance of predictors, meaning it
can be misleading when unnecessary variables are
included.
To address this, we use the Adjusted R-squared, which penalizes R-squared for the number of predictors relative to the number of observations. Unlike R-squared, Adjusted R-squared can decrease when a predictor that adds little explanatory power is included. This makes it a more reliable measure for comparing models with different numbers of predictors, and it helps ensure that the chosen model reflects meaningful relationships between the independent variables and the dependent variable.
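The usual formula is Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k predictors. The numbers below are hypothetical and are not the slide's example. They show a case with n = 30 where two extra predictors nudge R-squared up from 0.75 to 0.76, yet Adjusted R-squared falls.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (constant excluded)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

n = 30                            # hypothetical sample size
base = adjusted_r2(0.75, n, 3)    # price, income, perception
bigger = adjusted_r2(0.76, n, 5)  # plus two irrelevant predictors
print(f"{base:.4f} -> {bigger:.4f}")  # 0.7212 -> 0.7100
```

The small rise in R-squared is outweighed by the penalty for the two extra parameters, so the larger model scores worse.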
