
Linear Models

What is the Best-Fit Line?


Our primary objective in linear regression is to find the best-fit line, i.e., the line for which the error between the predicted and actual values is as small as possible. The best-fit line is the one with the least total error.
Least Square Method
Overview
• The least squares method is the process of finding the regression line, or best-fit line, described by an equation, for a given data set.
• The method works by minimizing the sum of the squares of the residuals, the offsets of the data points from the fitted line or curve, so that the trend of the outcomes is found quantitatively.
• This kind of curve fitting arises in regression analysis, and fitting the equation that describes the curve is done by the least squares method.
Least Square Method Definition
• The least-squares method is a statistical method used to find the line of best fit, of the form y = mx + b, for the given data.
• The line (or curve) described by this equation is called the regression line.
• The main objective is to make the sum of the squared errors as small as possible.
• This is the reason the method is called the least-squares method.
• The method is widely used in data fitting, where the best-fit result is the one that minimizes the sum of squared errors, each error being the difference between an observed value and the corresponding fitted value.
• The sum of squared errors measures the variation in the observed data. For example, given 4 data points, the method produces a single fitted line through them (illustrated by a graph on the original slide).
Least Square Method Formula
• The least-squares line (or curve) is the one that best fits a set of observations with the minimum sum of squared residuals, or errors.
• Let the given data points be (x1, y1), (x2, y2), (x3, y3), …, (xn, yn), where the x's are values of the independent variable and the y's are values of the dependent variable.
• The method finds a line of the form y = mx + b, where y and x are the variables, m is the slope, and b is the y-intercept.
• The slope m and intercept b are given by:
• m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
• b = (∑y − m∑x) / n
• Here, n is the number of data points.
Steps to solve
Following are the steps to apply the least-squares method using the above formulas.
• Step 1: Draw a table with 4 columns, where the first two columns hold the x and y values.
• Step 2: In the next two columns, compute xy and x² for each point.
• Step 3: Find ∑x, ∑y, ∑xy, and ∑x².
• Step 4: Find the slope m using the formula above.
• Step 5: Calculate the intercept b using the formula above.
• Step 6: Substitute the values of m and b into y = mx + b to obtain the equation of the best-fit line (a short Python sketch follows these steps).
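As a sketch, the formulas and Steps 1–6 above can be written out in Python; the helper name least_squares_fit and the sample points are illustrative assumptions, not taken from the slides.

def least_squares_fit(x, y):
    # Return the slope m and intercept b of the best-fit line y = m*x + b.
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope formula
    b = (sum_y - m * sum_x) / n                                   # intercept formula
    return m, b

# Example usage on made-up points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, b = least_squares_fit(x, y)
print(f"y = {m:.2f}x + {b:.2f}")   # y = 0.60x + 2.20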
Example
Suppose the data consist of n = 5 points for which ∑x = 15, ∑y = 25, ∑xy = 88, and ∑x² = 55 (these sums are implied by the calculations below).

Find the value of m by using the formula,

m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)

m = [(5×88) − (15×25)] / [(5×55) − (15)²]

m = (440 − 375) / (275 − 225)

m = 65/50 = 1.3

Find the value of b by using the formula,

b = (∑y − m∑x) / n

b = (25 − 1.3×15) / 5

b = (25 − 19.5) / 5

b = 5.5/5 = 1.1

So, the required least-squares equation is y = mx + b = 1.3x + 1.1.

https://www.cuemath.com/data/least-squares/
https://www.geeksforgeeks.org/least-square-method/
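The example's arithmetic can be checked with a short Python snippet, using the sums implied by the slide.

n, sum_x, sum_y, sum_xy, sum_x2 = 5, 15, 25, 88, 55

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # (440 - 375) / (275 - 225)
b = (sum_y - m * sum_x) / n                                   # (25 - 19.5) / 5

print(m, b)  # 1.3 1.1, i.e. y = 1.3x + 1.1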
Multiple Linear Regression
Multivariate Linear Regression
• Multivariate regression is a statistical technique that establishes a relationship between multiple data variables. It estimates a linear equation that allows one or more dependent (outcome) variables to be analysed in terms of one or more predictor variables, possibly measured at different points in time (a brief scikit-learn sketch follows below).
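As a rough illustration, a regression with several predictors can be fitted with scikit-learn; the data values below are made up for illustration, and scikit-learn is assumed to be installed.

import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one observation; the two columns are two predictor variables.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 8.1, 10.8])   # outcome variable

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)          # one weight per predictor
print("intercept:", model.intercept_)
print("prediction for [6, 4]:", model.predict([[6.0, 4.0]]))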
Assumptions
• The validity and reliability of the multivariate regression findings depend upon
the following four assumptions:
• Linearity: The relationship between the predictor and outcome variables is linear.
• Independence: The observations are independent of each other, i.e., the value of one observation should not influence the value of another.
• Homoscedasticity: The variance of the errors (residuals) is even across all levels
of the explanatory variables. This ensures that the spread of residuals is the
same for all predicted values.
• Normality: The residuals (differences between observed and predicted values) should be normally distributed, ensuring that statistical inferences about the regression coefficients are valid (a short residual-check sketch follows this list).
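A minimal sketch of checking the normality and homoscedasticity assumptions by inspecting residuals; the synthetic data, sample size, and noise level are assumptions made for illustration.

import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # two predictors, 100 observations
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)                    # observed minus predicted values

# Normality: Shapiro-Wilk test (a large p-value means no strong evidence
# against normally distributed residuals).
print(stats.shapiro(residuals))

# Homoscedasticity: plotting residuals against fitted values should show a
# roughly constant spread, with no funnel shape.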
Advantages
• Some of the benefits of this model are discussed below:
• Better Understanding of Relationships: Unlike simple linear regression, which considers only one predictor, multivariate regression can account for interactions and interdependencies among various predictors, capturing complex relationships between these variables.
• Reliable Predictions: By including multiple predictors, the model might provide more
accurate estimations than simple regression models, leading to a better fit for the
data.
• Correlation, Strength, and Direction: Multivariate regression can help identify which
explanatory variables significantly influence the dependent variable, establishing a
correlation and quantifying the direction and strength of these correlations.
Disadvantages
The various limitations of this regression technique are as follows:
• Difficult to Interpret: Multivariate regression can be challenging to interpret, especially for
individuals unfamiliar with statistical analyses, due to multiple predictors.
• Complex Calculations: Since this model incorporates multiple variables, its computation involves
complex mathematical calculations.
• Extensive Data Requirement: Multivariate regression requires a larger sample size than simple
regression. Small sample sizes can result in unreliable parameter estimates and low statistical
power.
• Overfitting: It occurs when the model fits the training data too closely, capturing noise rather
than the underlying pattern.
https://www.wallstreetmojo.com/multivariate-regression/
Steps to achieve multivariate regression
• Step 1: Select the features
• First, select the features that drive the multivariate regression, i.e., the ones chiefly responsible for the change in the dependent variable.
• Step 2: Normalize the features
• Now that the features are selected, scale them into a fixed range (preferably 0–1) so that analysing them becomes easier.
• To rescale each feature we can use min–max scaling: x' = (x − min(x)) / (max(x) − min(x)) (a short sketch follows below).
• Step 3: Select a loss function and formulate a hypothesis
• The hypothesis is the predicted value of the response variable and is denoted by h(x).
• The loss function measures the loss incurred when the hypothesis predicts a wrong value for a single example; the cost function aggregates that loss over the whole training set.
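A minimal sketch of Step 2 (min–max scaling of each feature into the range 0–1); the feature matrix below is made up for illustration.

import numpy as np

X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 800.0]])

X_min = X.min(axis=0)                       # per-feature minimum
X_max = X.max(axis=0)                       # per-feature maximum
X_scaled = (X - X_min) / (X_max - X_min)    # x' = (x - min) / (max - min)
print(X_scaled)                             # every column now lies in [0, 1]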
Steps to achieve multivariate regression
• Step 4: Minimize the cost and loss function
• Both cost function and loss function are dependent on each other. Hence, in order to
minimize both of them, minimization algorithms can be run over the datasets. These
algorithms then adjust the parameters of the hypothesis.
• One of the minimization algorithms that can be used is the gradient descent algorithm.
• Step 5: Test the hypothesis
• The formulated hypothesis is then tested with a test set to check its accuracy and correctness.

• See https://brilliant.org/wiki/multivariate-regression/ for the different equations used in multivariate regression; a common choice of cost function is J(θ) = (1/2m) ∑ᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)², where we have m data points in the training data and y is the observed value of the dependent variable (a gradient-descent sketch follows below).
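A minimal sketch of Steps 3–5: a linear hypothesis h(x) = Xw + b, a squared-error cost, and batch gradient descent to minimize it. The synthetic data, learning rate, and number of iterations are illustrative assumptions, and the test step is only hinted at in a comment.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))          # m = 200 data points, 3 (already scaled) features
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

w = np.zeros(3)                        # parameters of the hypothesis
b = 0.0
lr = 0.1                               # learning rate (a hyper-parameter)

for _ in range(500):                   # number of iterations (a hyper-parameter)
    h = X @ w + b                      # hypothesis h(x)
    error = h - y
    cost = (error ** 2).mean() / 2     # J(θ) = (1/2m) Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
    w -= lr * (X.T @ error) / len(y)   # gradient step for the weights
    b -= lr * error.mean()             # gradient step for the intercept

print("weights:", w, "intercept:", b, "final cost:", cost)
# Step 5 would evaluate the fitted hypothesis on a held-out test set.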
What are Overfitting and Underfitting?
• Overfitting occurs when a machine learning model is tuned too closely to the training set and is not able to perform well on unseen data. That happens when the model learns the noise in the training data as well, i.e., it memorizes the training data instead of learning the patterns in it.
• Underfitting, on the other hand, is the case when the model is not able to learn even the basic patterns in the dataset. An underfitting model performs poorly even on the training data, so we cannot expect it to perform well on the validation data. In this case we should increase the complexity of the model or add more features to the feature set (a small comparison sketch follows below).
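A minimal sketch contrasting underfitting and overfitting by fitting polynomials of increasing degree to noisy data; the synthetic data and the chosen degrees are illustrative assumptions.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 3, 30)).reshape(-1, 1)
y_train = np.sin(2 * x_train).ravel() + rng.normal(scale=0.2, size=30)
x_val = np.linspace(0, 3, 50).reshape(-1, 1)        # validation grid
y_val = np.sin(2 * x_val).ravel()

for degree in (1, 4, 15):                           # too simple, reasonable, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    val_err = mean_squared_error(y_val, model.predict(x_val))
    print(degree, round(train_err, 3), round(val_err, 3))

# Degree 1 tends to underfit (high error everywhere); degree 15 tends to
# overfit (very low training error, higher validation error).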
What is Regularized Regression?
• Regularized regression is a type of regression in which the coefficient estimates are constrained, shrinking them toward zero.
• The magnitude (size) of the coefficients is penalized in addition to the error term.
• Complex models are discouraged, primarily to avoid overfitting.
• "Regularization" is a way to impose a penalty on certain models (usually overly complex ones).
Types of Regularized Regression
Two commonly used types of regularized regression methods are
ridge regression and lasso regression.
• Ridge regression is a way to create a parsimonious model when the
number of predictor variables in a set exceeds the number of
observations, or when a data set has multicollinearity (correlations
between predictor variables).
• Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, like the mean. It is very useful when you have high levels of multicollinearity or when you want to automate certain parts of model selection, like variable selection/parameter elimination (a brief scikit-learn sketch follows).
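A minimal sketch of ridge and lasso regression with scikit-learn; the synthetic data and the regularization strength alpha are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                   # 10 predictors, only 2 truly matter
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=50)

ridge = Ridge(alpha=1.0).fit(X, y)              # shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)              # can set some coefficients exactly to zero

print("ridge coefficients:", np.round(ridge.coef_, 2))
print("lasso coefficients:", np.round(lasso.coef_, 2))   # note the exact zeros

In both models the regularization strength alpha is a hyper-parameter chosen by the user.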
A hyper-parameter is a setting chosen before training rather than learned from the data; examples include the learning rate, the number of epochs, or the regularization strength.
