

Classical Machine Learning:

Linear Regression

Ramesh S
Regression
● “Draw a line through these dots. Yep, that's linear regression”

● Today this is used for:


○ Stock price forecasts

○ Demand and sales volume analysis

○ Medical diagnosis

○ Any number-time correlations


Regression
● Regression is basically classification, except that we forecast a number instead of a category.

● Examples are:
○ car price by its mileage

○ traffic by time of the day

○ demand volume by growth of the company, etc.

● Regression is perfect when something depends on time.


Regression
● Everyone who works with finance and analysis loves regression.

● It's even built into Excel.

● And it's simple inside: the machine just tries to draw a line that
represents the average correlation.

● Though, unlike a person with a pen and a whiteboard, the machine
does so with mathematical accuracy, calculating the average distance
to every point.
Regression

● When the line is straight, it's linear regression.

● When it's curved, it's polynomial regression.
Regression
● Linear and Polynomial are two major types of regression.

● The other ones are more exotic.

● Logistic regression is the black sheep of the flock.

● Don't let it trick you, as it's a classification method, not regression.


Regression Models
● Regression predicts a continuous target variable.

● It allows you to estimate a value, such as housing prices or human lifespan, based on input data.

● Here, target variable means the unknown variable we care about predicting, and continuous means there
aren’t gaps (discontinuities) in the values that it can take on.

● A person’s weight and height are continuous values.

● Discrete variables, on the other hand, can only take on a finite number of values — for example, the number
of kids somebody has is a discrete variable.
Regression Models
● This technique is used for forecasting, time series modelling and finding the causal effect relationship
between the variables.

● For example, relationship between rash driving and number of road accidents by a driver is best studied
through regression.

● Regression analysis is an important tool for modelling and analyzing data.

● Here, we fit a curve or line to the data points in such a manner that the sum of the distances of the
data points from the curve or line is minimized.
Why Use Regression Analysis?
● Regression Analysis estimates the relationship between two or more variables. Let’s understand this with
an easy example:

● Say, you want to estimate growth in sales of a company based on current economic conditions.

● You have the recent company data which indicates that the growth in sales is around two and a half times
the growth in the economy.

● Using this insight, we can predict future sales of the company based on current & past information.
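
A minimal sketch of that prediction (the 2.5x ratio comes from the slide; the function name and sample numbers are illustrative):

```python
# Sketch of the slide's example, assuming the observed ratio
# "sales growth is about 2.5 times economic growth".
def predict_sales_growth(economic_growth_pct: float) -> float:
    """Estimate company sales growth from economic growth."""
    return 2.5 * economic_growth_pct

# If the economy is expected to grow by 2%, predicted sales growth is 5%.
print(predict_sales_growth(2.0))  # -> 5.0
```
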
Regression Analysis

● There are various kinds of regression techniques available to make predictions.

● These techniques are mostly driven by three metrics:

○ the number of independent variables

○ the shape of the regression line

○ the type of dependent variables
Linear Regression
● A linear regression refers to a regression model that is completely made up of linear variables.

● It comes in two forms: Simple Linear Regression and Multi-variate Linear Regression.
Linear Regression
● Single Variable Linear Regression is a technique used to model the relationship between a single input
independent variable (feature variable) and an output dependent variable using a linear model i.e., a line.

y = mx + c

where y is the dependent variable,
x is the independent variable,
m is the slope, and
c is the intercept.
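
A minimal sketch of fitting such a line with closed-form least squares, on made-up mileage/price data:

```python
import numpy as np

# Simple linear regression y = m*x + c via closed-form least squares.
# The data (car price vs. mileage) is made up for illustration.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # mileage (thousands of km)
y = np.array([9.0, 8.1, 7.2, 6.1, 5.0])       # price (lakhs)

# Slope: covariance of x and y divided by variance of x.
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: the fitted line passes through the point of means.
c = y.mean() - m * x.mean()

print(f"y = {m:.3f}x + {c:.3f}")
print("Predicted price at 35,000 km:", m * 35 + c)
```
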
Linear Regression
The difference between simple linear regression and multiple linear regression is:

● multiple linear regression has more than one (>1) independent variable, whereas simple linear regression has
only one independent variable.

● The more general case is Multi Variable Linear Regression where a model is created for the relationship
between multiple independent input variables (feature variables) and an output dependent variable.

● The model remains linear in that the output is a linear combination of the input variables.

y = a₁x₁ + a₂x₂ + a₃x₃ + … + aₙxₙ + b

where the aᵢ are the coefficients,
the xᵢ are the variables, and b is the bias.
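
A minimal sketch using NumPy's least-squares solver; the feature values and targets below are made up for illustration:

```python
import numpy as np

# Multi-variable linear regression y = a1*x1 + a2*x2 + b,
# solved with NumPy's least-squares routine.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])               # columns are features x1, x2
y = np.array([8.0, 7.0, 17.0, 16.0, 23.0])

# Append a column of ones so the solver also learns the bias b.
X_b = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X_b, y, rcond=None)

a1, a2, b = coef
print(f"y = {a1:.2f}*x1 + {a2:.2f}*x2 + {b:.2f}")
```
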
Linear Regression - Steps
I. Build the Model

II. Estimate the Cost (Loss) Function

III. Obtain the best fit line. (How?) - Use Least Squares Method

IV. Model Development and Improvement (Gradient Descent)

V. Model Validation and Diagnostics


Building the Model
● Study the problem and the data

● Correlate them and plan for the cost function


Estimate the Cost Function
● Includes calculating the necessary variables and their coefficients. (Like Lalitha does for the Party cost function in the example that follows.)
Estimate the Cost Function

● Lalitha moved to a new city


● She is in need of new friends.
● So, she plans for a party
Estimate the Cost Function

● There are three supermarkets nearby: A-Mart, B-Mart and C-Mart


● Now she needs to buy groceries for the party.
● She went to all three supermarkets individually and noted prices
for the required groceries
Estimate the Cost Function

● A-Mart has discount on sugar


● B-Mart has discount on vegetables
● C-Mart has discount on rice
● Each supermarket has their own prices for the items
Estimate the Cost Function

● She deduced an equation like:

Party = 5(Sugar) + 10(Veg) + 10(Rice)

5 and 10 being their weights in kg

● Now she decides what to buy where, so that the value of the Cost
Function Party is minimum.
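
A minimal sketch of her decision, assuming hypothetical per-kg prices (only the 5/10/10 kg weights come from the slide):

```python
# For each item, buy at the cheapest supermarket, minimising
# Party = 5(Sugar) + 10(Veg) + 10(Rice). All prices are made up.
prices = {                 # price per kg at each supermarket
    "Sugar": {"A-Mart": 40, "B-Mart": 48, "C-Mart": 45},
    "Veg":   {"A-Mart": 30, "B-Mart": 22, "C-Mart": 28},
    "Rice":  {"A-Mart": 60, "B-Mart": 58, "C-Mart": 50},
}
weights = {"Sugar": 5, "Veg": 10, "Rice": 10}   # kg needed

total = 0
for item, kg in weights.items():
    store, price = min(prices[item].items(), key=lambda kv: kv[1])
    print(f"Buy {kg} kg of {item} at {store} for {kg * price}")
    total += kg * price
print("Minimum value of the Party cost function:", total)
```
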
Estimate the Cost Function

● What Lalitha just did is optimization, or Gradient Descent.

● She made the final price descend to the least value.
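
For comparison, here is a minimal gradient descent sketch that fits y = mx + c by repeatedly stepping downhill on the mean squared error; the data, learning rate, and step count are made up:

```python
import numpy as np

# Gradient descent for simple linear regression, minimising MSE.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])   # roughly y = 2x + 1

m, c = 0.0, 0.0
lr = 0.01                                  # learning rate
for _ in range(5000):
    error = (m * x + c) - y
    # Gradients of MSE = mean(error^2) with respect to m and c.
    m -= lr * 2 * np.mean(error * x)
    c -= lr * 2 * np.mean(error)

print(f"m = {m:.2f}, c = {c:.2f}")         # converges near m=2, c=1
```
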
Goodness of Fit
● Least Squares Method calculates the best-fit line for the
observed data by minimizing the sum of the squares of
the vertical deviations from each data point to the line.
● Because the deviations are first squared, when added,
there is no cancelling out between positive and negative
values.
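
In symbols, the quantity being minimized for a line y = mx + c, together with the resulting closed-form estimates (a standard formulation, stated here for reference):

```latex
% Sum of squared vertical deviations, and its minimizers.
\mathrm{SSE}(m, c) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + c) \bigr)^2
\qquad
m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\quad
c = \bar{y} - m \bar{x}
```
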
Goodness of Fit
● A model fits the data well if the differences
between the observed values and the model's
predicted values are small and unbiased. These
differences (the residuals) should be as small as possible.
Model Development and Improvement
● We need a better way to figure out how well we’ve
fit the data than staring at the graph.

● A common measure is the coefficient of determination (or R-squared),
which measures the fraction of the total variation in the dependent
variable that is captured by the model.
What is R-Squared?
● R-squared is a statistical measure of how close the data are to the fitted regression line.

● It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple
regression.

● It represents the proportion of the variance for a dependent variable that's explained by an independent variable.

● Correlation explains the strength of the relationship between an independent and a dependent variable, while
R-squared explains to what extent the variance of one variable explains the variance of the second variable.

● The sum of squared errors must be at least 0, which means that the R-squared can be at most 1.

● The higher the number, the better our model fits the data.
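
A minimal sketch of the computation, on made-up observed and predicted values:

```python
import numpy as np

# R-squared: 1 minus (sum of squared errors / total variation in y).
y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # observed
y_hat = np.array([3.2, 4.8, 7.1, 9.3, 10.6])   # model predictions

ss_res = np.sum((y - y_hat) ** 2)       # sum of squared errors
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
r2 = 1 - ss_res / ss_tot

print(f"R-squared = {r2:.3f}")          # closer to 1 means a better fit
```
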
Other Performance Metrics

Performance Statistic      | Usage Condition         | What it should be
---------------------------|-------------------------|----------------------------------
R-Squared                  | Any regression model    | Closer to 1, the better the model
Mean Absolute Deviation    | Continuous data         | As low as possible
Median Absolute Deviation  | When errors are skewed  | As low as possible
Root Mean Square Error     | To magnify errors       | As low as possible
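
A minimal sketch computing the table's metrics on made-up values; the last error is deliberately large, to show how the median deviation stays robust while RMSE magnifies it:

```python
import numpy as np

# Error metrics from the table, on one made-up set of values.
y     = np.array([10.0, 12.0, 15.0, 11.0, 30.0])
y_hat = np.array([11.0, 11.5, 14.0, 12.0, 18.0])

err = y - y_hat
mean_ad   = np.mean(np.abs(err))        # Mean Absolute Deviation
median_ad = np.median(np.abs(err))      # Median Absolute Deviation: robust to the skewed error
rmse      = np.sqrt(np.mean(err ** 2))  # RMSE: magnifies large errors

print(f"Mean AD   = {mean_ad:.2f}")
print(f"Median AD = {median_ad:.2f}")
print(f"RMSE      = {rmse:.2f}")
```
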
