Prediction & Forecasting: Regression Analysis

Regression analysis is a statistical technique used to analyze relationships between variables and predict unknown variables. Simple linear regression models a linear relationship between one dependent and one independent variable, while multiple linear regression extends this to model relationships between one dependent and multiple independent variables. Key steps in regression analysis include defining the business problem, collecting and preprocessing data, building and evaluating models, and deploying the optimal model.


Regression analysis is a statistical technique used to analyse the relationship between different variables and to predict a continuous unknown variable using this relationship.


Regression analysis can be used for the following two types of problems: Prediction & Forecasting. The fundamental difference between prediction and forecasting is the introduction of the time dimension as an essential component in forecasting use-cases.
PROCESS FLOW: Define Business Problem -> Get Data -> Data Pre-Processing & Analysis -> Build Model -> Evaluate the Model -> Deploy
Simple Linear Regression: There is only one dependent variable and one independent variable for which the predictions are to be made, with a linear relationship between them. Independent variable == Predictor variable & Dependent variable == Output variable.
Difference w/ Correlation
- Correlation tells the extent of the relationship between 2 variables (-1 to 1), whereas regression makes predictions
- Correlation makes no distinction between dependent & independent variables, whereas regression makes a clear distinction
SLR Equation: y = β₀ + β₁x, where β₀ and β₁ are the coefficients (intercept & slope)
A best-fit line is fitted on the data to explain the relationship between the variables; it is found by minimising the residual sum of squares (RSS):

RSS = Σᵢ₌₁ⁿ (Y(i) − Y(i)pred)²   RSS is minimised to find the optimal values of β₀ and β₁
 Output of regression is always a continuous numeric variable
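As a minimal sketch (with made-up numbers), numpy's least-squares polynomial fit can recover β₀ and β₁ by minimising RSS:

import numpy as np

# Illustrative data: one predictor x and one continuous output y (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# polyfit with deg=1 fits y = beta0 + beta1*x by minimising the sum of squared residuals
beta1, beta0 = np.polyfit(x, y, deg=1)

y_pred = beta0 + beta1 * x
rss = np.sum((y - y_pred) ** 2)
print(f"beta0={beta0:.3f}, beta1={beta1:.3f}, RSS={rss:.3f}")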
Multiple Linear Regression: Represents the relationship between multiple independent variables of any type and a single dependent variable, which is continuous.

MLR is an extension of SLR: y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ϵ (ϵ models the error term)


 Find the best fit Hyperplane instead of best fit straight line
 Coefficients obtained by minimizing sum of squared errors (least squares) as in SLR
 There should be a linear relationship between the predictor variable/s & the response variable
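A short sketch of the hyperplane fit on illustrative data, assuming scikit-learn is available:

import numpy as np
from sklearn.linear_model import LinearRegression

# Two predictors (x1, x2) and one continuous response y -- illustrative values
X = np.array([[1, 4], [2, 5], [3, 7], [4, 8], [5, 11]], dtype=float)
y = np.array([10.2, 13.1, 17.8, 20.3, 26.0])

# Least-squares fit, as in SLR, but over a hyperplane instead of a line
model = LinearRegression().fit(X, y)
print("intercept (beta0):", model.intercept_)
print("coefficients (beta1, beta2):", model.coef_)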
Overfitting: The more variables, the greater the tendency to memorise outcomes in training
- makes the model complex and unable to generalise: high accuracy on training data but lower accuracy on test data
Multi-Collinearity: There should be no or minimal multi-collinearity between predictor variables
Feature Selection: Keep only the most relevant variables & avoid overfitting, which leads to lower test accuracy
 Not all potential predictors are significant
 Too many features lead to over-fitting & an increase in training time
 Helps in determining the adequate number of features for the highest accuracy
Methods of feature selection (see the sketch after this list):
 Brute force: Try all combinations. Not efficient
 Manual feature selection: Good when the number of features is small
 Forward Selection: Automated bottom-up method, starting with one feature and adding features until there is no additional benefit
 Backward Elimination: Top-down method, starting with all features and removing the least significant one at a time
 RFE (Recursive Feature Elimination): Automated backward elimination that refits the model after each removal
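A sketch of automated feature selection via scikit-learn's RFE on synthetic data (the feature counts here are arbitrary):

from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 10 candidate features, only 4 truly informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=4, random_state=0)

# RFE: fit, drop the weakest feature, refit -- repeated until 4 features remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=4)
rfe.fit(X, y)
print("selected feature mask:", rfe.support_)
print("feature ranking (1 = kept):", rfe.ranking_)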
Multi-Collinearity: Having related predictor variables in the input data set. While the model is assumed to be built using independent variables, some may be inter-related.
- One may not know which variables are actually contributing; explainability vanishes
- Does NOT impact the precision or accuracy of the prediction, or the value of the response variable
- P-values of the coefficients may not be reliable
Detection: Visual way: pair-wise correlation plot using a scatter plot or heat-map matrix
Objective way: VIF (Variance Inflation Factor)

VIF measures how well one predictor variable can be explained (predicted) by all the other predictor variables combined.

VIF = 1 / (1 − R²)   { > 10: Eliminate | 5–10: Worth inspecting | < 5: GOOD }
 Calculated by running a regression for every X considered as the dependent variable, with the rest as independent variables
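A sketch of VIF computation using statsmodels' variance_inflation_factor on illustrative data (the near-copy column x3 is contrived for the demo):

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
# Illustrative predictors: x3 is nearly a copy of x1, so it is highly collinear
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["x3"] = 0.95 * df["x1"] + 0.05 * rng.normal(size=200)

# One VIF per column: regress that column on all the others, VIF = 1 / (1 - R^2)
vif = pd.Series(
    [variance_inflation_factor(df.values, i) for i in range(df.shape[1])],
    index=df.columns,
)
print(vif)  # x1 and x3 should show values well above 5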
Solution: Drop highly correlated variables (only 1 at a time) or those with less business value; create a new variable by combining the related variables; transform the variables (e.g., PCA)
k-NN for Regression: Supervised learning algorithm that analyses a certain number of neighbours and assigns the dependent variable an aggregated value.
 The k-NN algorithm does not assume any kind of linear relationship between the independent and dependent variables. It simply finds the unknown value by assigning the most suitable value using the nearest-neighbour criterion.
 Find the nearest points to an unknown point and assign the new value by calculating the mean of these nearest points' dependent variables:
o Select K
o Calculate Euclidean distance
o Average of the points
o Assign value
Euclidean distance between two points is: √((x − x₁)² + (y − y₁)²)
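A minimal k-NN regression sketch with scikit-learn on made-up 1-D data:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Illustrative training data: one feature, continuous target
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

# k=3: find the 3 nearest points (Euclidean distance) and average their y values
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.predict([[2.5]]))  # mean of the 3 nearest neighbours' targets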
SIMPLE EVALUATION METRICS - would not be helpful unless you compare them to other models' equivalent values
Mean Squared Error (MSE): MSE = (1/n) Σᵢ₌₁ⁿ (Y(i) − Y(i)pred)²
Root Mean Squared Error (RMSE): RMSE = √MSE. The deviation of the values predicted by a model from the actual observed values. Lower the better.
Mean Absolute Error (MAE): MAE = (1/n) Σᵢ₌₁ⁿ |Y(i) − Y(i)pred|. Captures only the magnitude of the error and not the direction.
R SQUARED: R-Squared, also known as the Coefficient of Determination, is the percentage of the variance of the output variable that can be explained by the independent variables.
R² = 1 − RSS/TSS, where RSS (Residual Sum of Squares) is the unexplained variation and TSS (Total Sum of Squares) is the total variation.
 Lies between [0,1]
 Higher the better. A high value implies a strong linear relation, i.e., the data fits the regression line more accurately
 1 implies all the variance in the data is explained by the model
 0 implies none of the variance is being explained by the model
 0.1 R-square: the model explains 10% of the variation within the data
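These metrics can be computed with scikit-learn; the y_true/y_pred arrays below are illustrative:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 9.0])  # observed values (made-up)
y_pred = np.array([2.8, 5.3, 7.1, 9.4])  # model predictions (made-up)

mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))
print("MAE :", mean_absolute_error(y_true, y_pred))
print("R^2 :", r2_score(y_true, y_pred))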
Significance of the derived beta coefficient – P-Value [If a coefficient's p-value < significance level (usually 0.05), that coefficient is statistically significant]
A fitted line in the case of randomly scattered data does not serve the purpose, hence it is necessary to check whether the fit is statistically significant.
Finding whether β₁ is significant: Hypothesis testing
 Start by assuming β₁ is not significant, i.e. no relationship between X & Y
 NULL HYPOTHESIS H₀: β₁ = 0   ALTERNATE HYPOTHESIS Hₐ: β₁ ≠ 0
 If the p-value < 0.05, reject the NULL hypothesis and state that β₁ is indeed significant
 If you fail to reject the null hypothesis, the independent variable is insignificant in the prediction of the dependent variable
How can you find out the variables which are contributing the least to the model?
 The p-value for each term tests the null hypothesis that the coefficient is equal to zero (no effect).
 Variables with high p-values (> 0.05) are not relevant and should be dropped: they do not help in prediction
 A low p-value (< 0.05) indicates that you can reject the null hypothesis, i.e., a predictor with a low p-value is likely to be a meaningful addition to your model, because changes in the predictor's value are related to changes in the response variable
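A sketch of inspecting per-coefficient p-values with statsmodels OLS on synthetic data (x2 is deliberately pure noise):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)           # unrelated to y by construction
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.5, size=100)

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column
results = sm.OLS(y, X).fit()
# p-values for [const, x1, x2]: x1's is near 0 (keep), x2's is large (drop)
print(results.pvalues)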
