

Classical Machine Learning:

Linear Regression

Ramesh S
Regression
● “Draw a line through these dots. Yep, that's linear regression”

● Today this is used for:


○ Stock price forecasts

○ Demand and sales volume analysis

○ Medical diagnosis

○ Any number-time correlations


Regression
● Regression is basically classification, except that we forecast a number instead of a category.

● Examples are:
○ car price by its mileage

○ traffic by time of the day

○ demand volume by growth of the company, etc.

● Regression is perfect when something depends on time.


Regression
● Everyone who works with finance and analysis loves regression.

● It's even built into Excel.

● And it's simple inside: the machine just tries to draw a line that
represents the average correlation.

● Though, unlike a person with a pen and a whiteboard, the machine
does so with mathematical accuracy, calculating the average distance
to every point.
Regression

● When the line is straight, it's linear regression.

● When it's curved, it's polynomial regression.
Regression
● Linear and Polynomial are two major types of regression.

● The other ones are more exotic.

● Logistic regression is the black sheep of the flock.

● Don't let it trick you, as it's a classification method, not regression.


Regression Models
● Regression predicts a continuous target variable.

● It allows you to estimate a value, such as housing prices or human lifespan, based on input data.

● Here, target variable means the unknown variable we care about predicting, and continuous means there
aren’t gaps (discontinuities) in the values that it can take on.

● A person’s weight and height are continuous values.

● Discrete variables, on the other hand, can only take on a finite number of values — for example, the number
of kids somebody has is a discrete variable.
Regression Models
● This technique is used for forecasting, time series modelling and finding the causal effect relationship
between the variables.

● For example, relationship between rash driving and number of road accidents by a driver is best studied
through regression.

● Regression analysis is an important tool for modelling and analyzing data.

● Here, we fit a curve or line to the data points in such a manner that the sum of the distances of the
data points from the curve or line is minimized.
Why Use Regression Analysis?
● Regression Analysis estimates the relationship between two or more variables. Let’s understand this with
an easy example:

● Say, you want to estimate growth in sales of a company based on current economic conditions.

● You have the recent company data which indicates that the growth in sales is around two and a half times
the growth in the economy.

● Using this insight, we can predict future sales of the company based on current & past information.
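
A minimal sketch of that prediction (the 2.5x ratio comes from the slide; the function name and sample numbers are illustrative):

```python
# Sketch of the slide's example, assuming the observed ratio
# "sales growth is about 2.5 times economic growth".
def predict_sales_growth(economic_growth_pct: float) -> float:
    """Estimate company sales growth from economic growth."""
    return 2.5 * economic_growth_pct

# If the economy is expected to grow by 2%, predicted sales growth is 5%.
print(predict_sales_growth(2.0))  # -> 5.0
```
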
Regression Analysis

● There are various kinds of regression techniques available to make predictions.

● These techniques are mostly driven by three metrics:

○ the number of independent variables

○ the shape of the regression line

○ the type of dependent variables
Linear Regression
● A linear regression refers to a regression model that is completely made up of linear variables.

● It comes in two forms: Simple Linear Regression and Multi-variate Linear Regression.
Linear Regression
● Single Variable Linear Regression is a technique used to model the relationship between a single input
independent variable (feature variable) and an output dependent variable using a linear model i.e., a line.

y = mx + c

where y is the dependent variable,
x is the independent variable,
m is the slope, and
c is the intercept.
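
A minimal sketch of fitting such a line with closed-form least squares, on made-up mileage/price data:

```python
import numpy as np

# Simple linear regression y = m*x + c via closed-form least squares.
# The data (car price vs. mileage) is made up for illustration.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # mileage (thousands of km)
y = np.array([9.0, 8.1, 7.2, 6.1, 5.0])       # price (lakhs)

# Slope: covariance of x and y divided by variance of x.
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: the fitted line passes through the point of means.
c = y.mean() - m * x.mean()

print(f"y = {m:.3f}x + {c:.3f}")
print("Predicted price at 35,000 km:", m * 35 + c)
```
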
Linear Regression
The difference between simple linear regression and multiple linear regression is:

● multiple linear regression has more than one (>1) independent variable, whereas simple linear regression has
only one independent variable.

● The more general case is Multi Variable Linear Regression where a model is created for the relationship
between multiple independent input variables (feature variables) and an output dependent variable.

● The model remains linear in that the output is a linear combination of the input variables.

y = a₁x₁ + a₂x₂ + a₃x₃ + … + aₙxₙ + b

where the aᵢ are the coefficients,
the xᵢ are the variables, and b is the bias.
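
A minimal sketch using NumPy's least-squares solver; the feature values and targets below are made up for illustration:

```python
import numpy as np

# Multi-variable linear regression y = a1*x1 + a2*x2 + b,
# solved with NumPy's least-squares routine.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])               # columns are features x1, x2
y = np.array([8.0, 7.0, 17.0, 16.0, 23.0])

# Append a column of ones so the solver also learns the bias b.
X_b = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X_b, y, rcond=None)

a1, a2, b = coef
print(f"y = {a1:.2f}*x1 + {a2:.2f}*x2 + {b:.2f}")
```
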
Linear Regression - Steps
I. Build the Model

II. Estimate the Cost (Loss) Function

III. Obtain the best fit line. (How?) - Use Least Squares Method

IV. Model Development and Improvement (Gradient Descent)

V. Model Validation and Diagnostics


Building the Model
● Study the problem and the data

● Correlate them and plan for the cost function


Estimate the Cost Function
● Includes calculating the necessary variables and their coefficients. (Like Lalitha does for the Party cost function in the example that follows.)
Estimate the Cost Function

● Lalitha moved to a new city


● She is in need of new friends.
● So, she plans for a party
Estimate the Cost Function

● There are three supermarkets nearby: A-Mart, B-Mart and C-Mart


● Now she needs to buy groceries for the party.
● She went to all three supermarkets individually and noted prices
for the required groceries
Estimate the Cost Function

● A-Mart has discount on sugar


● B-Mart has discount on vegetables
● C-Mart has discount on rice
● Each supermarket has their own prices for the items
Estimate the Cost Function

● She deduced an equation like:

Party = 5(Sugar) + 10(Veg) + 10(Rice)

5 and 10 being their weights in kg

● Now she decides what to buy where, so that the value of the Cost
Function Party is minimum.
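
A minimal sketch of her decision, assuming hypothetical per-kg prices (only the 5/10/10 kg weights come from the slide):

```python
# For each item, buy at the cheapest supermarket, minimising
# Party = 5(Sugar) + 10(Veg) + 10(Rice). All prices are made up.
prices = {                 # price per kg at each supermarket
    "Sugar": {"A-Mart": 40, "B-Mart": 48, "C-Mart": 45},
    "Veg":   {"A-Mart": 30, "B-Mart": 22, "C-Mart": 28},
    "Rice":  {"A-Mart": 60, "B-Mart": 58, "C-Mart": 50},
}
weights = {"Sugar": 5, "Veg": 10, "Rice": 10}   # kg needed

total = 0
for item, kg in weights.items():
    store, price = min(prices[item].items(), key=lambda kv: kv[1])
    print(f"Buy {kg} kg of {item} at {store} for {kg * price}")
    total += kg * price
print("Minimum value of the Party cost function:", total)
```
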
Estimate the Cost Function

● What Lalitha just did is optimization, or Gradient Descent.

● She made the final price descend to the least value.
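
For comparison, here is a minimal gradient descent sketch that fits y = mx + c by repeatedly stepping downhill on the mean squared error; the data, learning rate, and step count are made up:

```python
import numpy as np

# Gradient descent for simple linear regression, minimising MSE.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])   # roughly y = 2x + 1

m, c = 0.0, 0.0
lr = 0.01                                  # learning rate
for _ in range(5000):
    error = (m * x + c) - y
    # Gradients of MSE = mean(error^2) with respect to m and c.
    m -= lr * 2 * np.mean(error * x)
    c -= lr * 2 * np.mean(error)

print(f"m = {m:.2f}, c = {c:.2f}")         # converges near m=2, c=1
```
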
Goodness of Fit
● Least Squares Method calculates the best-fit line for the
observed data by minimizing the sum of the squares of
the vertical deviations from each data point to the line.
● Because the deviations are first squared, when added,
there is no cancelling out between positive and negative
values.
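
In symbols, the quantity being minimized for a line y = mx + c, together with the resulting closed-form estimates (a standard formulation, stated here for reference):

```latex
% Sum of squared vertical deviations, and its minimizers.
\mathrm{SSE}(m, c) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + c) \bigr)^2
\qquad
m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\quad
c = \bar{y} - m \bar{x}
```
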
Goodness of Fit
● A model fits the data well if the differences
between the observed values and the model's
predicted values are small and unbiased. These
differences (the residuals) should be as small as possible.
Model Development and Improvement
● We need a better way to figure out how well we’ve
fit the data than staring at the graph.

● A common measure is the coefficient of determination (or R-squared),
which measures the fraction of the total variation in the dependent
variable that is captured by the model.
What is R-Squared?
● R-squared is a statistical measure of how close the data are to the fitted regression line.

● It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple
regression.

● It represents the proportion of the variance for a dependent variable that's explained by an independent variable.

● Correlation explains the strength of the relationship between an independent and a dependent variable, while
R-squared explains to what extent the variance of one variable explains the variance of the second variable.

● The sum of squared errors must be at least 0, which means that the R-squared can be at most 1.

● The higher the number, the better our model fits the data.
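
A minimal sketch of the computation, on made-up observed and predicted values:

```python
import numpy as np

# R-squared: 1 minus (sum of squared errors / total variation in y).
y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # observed
y_hat = np.array([3.2, 4.8, 7.1, 9.3, 10.6])   # model predictions

ss_res = np.sum((y - y_hat) ** 2)       # sum of squared errors
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
r2 = 1 - ss_res / ss_tot

print(f"R-squared = {r2:.3f}")          # closer to 1 means a better fit
```
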
Other Performance Metrics

Performance Statistic      | Usage Condition         | What it should be
---------------------------|-------------------------|----------------------------------
R-Squared                  | Any regression model    | Closer to 1, the better the model
Mean Absolute Deviation    | Continuous data         | As low as possible
Median Absolute Deviation  | When errors are skewed  | As low as possible
Root Mean Square Error     | To magnify errors       | As low as possible
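
A minimal sketch computing the table's metrics on made-up values; the last error is deliberately large, to show how the median deviation stays robust while RMSE magnifies it:

```python
import numpy as np

# Error metrics from the table, on one made-up set of values.
y     = np.array([10.0, 12.0, 15.0, 11.0, 30.0])
y_hat = np.array([11.0, 11.5, 14.0, 12.0, 18.0])

err = y - y_hat
mean_ad   = np.mean(np.abs(err))        # Mean Absolute Deviation
median_ad = np.median(np.abs(err))      # Median Absolute Deviation: robust to the skewed error
rmse      = np.sqrt(np.mean(err ** 2))  # RMSE: magnifies large errors

print(f"Mean AD   = {mean_ad:.2f}")
print(f"Median AD = {median_ad:.2f}")
print(f"RMSE      = {rmse:.2f}")
```
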
