Supervised Learning - Regression Algorithms
Linear Regression
> Linear regression is a popular and widely used machine learning algorithm that is
used to model the relationship between a dependent variable and one or more
independent variables.
> Dependent Variable: also known as the response variable or output.
> Independent Variable: also known as the explanatory variables or input
features.
> It assumes a linear relationship between the independent variables (also known as
features or predictors) and the dependent variable (also known as the target or
response variable).
> The goal of linear regression is to find the best-fit line that minimizes the difference
between the predicted values and the actual values of the dependent variable.
> The equation of a simple linear regression model is: y = mx + b, where y is the
dependent variable, x is the independent variable, m is the slope, and b is the
y-intercept.
> The slope (m) represents the change in the dependent variable for a one-unit change
in the independent variable.
> The y-intercept (b) is the value of the dependent variable when the independent
variable is zero.
> The best-fit line is determined by minimizing the sum of squared differences between
the predicted values and the actual values (known as the residual sum of squares).
> This minimization is typically done using the least squares method.
> Linear regression can be extended to multiple independent variables, resulting in
multiple linear regression.
> In multiple linear regression, the equation becomes: y = b0 + b1x1 + b2x2 + ... +
bnxn, where b0 is the y-intercept and b1, b2, ..., bn are the slopes for each
independent variable.
> The coefficients (b0, b1, b2, ..., bn) are estimated using various statistical techniques
like ordinary least squares or gradient descent.
> Linear regression can be used for both prediction and inference. It can help us
understand the relationship between variables and make predictions about future
outcomes.
Finding the Best-Fit Line
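The least-squares fit described above can be sketched directly from its closed-form solution. This is a minimal illustration using only NumPy; the function name and sample points are made up for the example:

```python
# Closed-form least-squares fit for simple linear regression, y = m*x + b.
import numpy as np

def fit_simple_linear(x, y):
    """Return slope m and intercept b minimizing the residual sum of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b = y.mean() - m * x.mean()
    return m, b

# Points lying exactly on y = 2x + 1, so the fit should recover m=2, b=1.
m, b = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # 2.0 1.0
```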
Definition — Residual
Residual = Observed Y - Predicted Y
[Figure: a fitted regression line through scattered data points, with the dependent
variable on the vertical axis, the independent variable on the horizontal axis, and
each residual shown as the vertical distance Observed Y - Predicted Y from a point
to the line.]
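The residual definition above is a one-line computation once a fitted line is available. A small sketch with illustrative numbers (the slope and intercept are assumed to have been fit earlier):

```python
# Residual = observed y - predicted y for a fitted line y = m*x + b.
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y_observed = np.array([1.2, 2.9, 5.1])
m, b = 2.0, 1.0                      # assumed fitted parameters (illustrative)
y_predicted = m * x + b
residuals = y_observed - y_predicted
print(residuals)  # approximately [0.2, -0.1, 0.1]
```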
Types of Linear Regression
There are several types of linear regression techniques that can be used depending on the
specific characteristics of the data and the research question. Here are some common types
of linear regression:
1. Simple Linear Regression: Simple linear regression involves modeling the
relationship between a single independent variable and a dependent variable. It
assumes a linear relationship between the variables and estimates a straight line that
best fits the data.
2. Multiple Linear Regression: Multiple linear regression extends simple linear
regression to include multiple independent variables. It models the relationship
between the dependent variable and multiple predictors, assuming a linear
combination of the predictors. The goal is to estimate the coefficients for each
predictor that best explain the variation in the dependent variable.
3. Polynomial Regression: Polynomial regression allows for non-linear relationships
by including polynomial terms of the independent variable(s) in the regression model.
It can capture curved or nonlinear patterns in the data by fitting higher-degree
polynomial equations (e.g., quadratic, cubic) instead of a straight line.
4. Ridge Regression: Ridge regression is a regularization technique that addresses
multicollinearity (high correlation between predictors) in multiple linear regression. It
adds a penalty term to the least squares estimation to shrink the regression
coefficients towards zero. It can help improve the stability and generalization
performance of the model.
5. Lasso Regression: Lasso (Least Absolute Shrinkage and Selection Operator)
regression is another regularization technique that performs variable selection and
regularization simultaneously. It adds a penalty term based on the absolute values of
the regression coefficients, encouraging sparsity (some coefficients become exactly
zero). Lasso regression is useful for feature selection and can handle
high-dimensional datasets.
6. Elastic Net Regression: Elastic Net regression combines the regularization
techniques of ridge regression and lasso regression. It includes both the L1 (lasso)
and L2 (ridge) penalty terms in the objective function, providing a balance between
feature selection and coefficient shrinkage.
These are some commonly used types of linear regression techniques. It's important to
choose the appropriate regression method based on the nature of the data, the relationship
between variables, and the research objectives.
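Two of the techniques above can be sketched in a few lines of NumPy: multiple linear regression estimated by ordinary least squares, and ridge regression via its closed form w = (XᵀX + αI)⁻¹Xᵀy. The data, coefficients, and alpha value are arbitrary illustrations, not recommendations:

```python
# Multiple linear regression by ordinary least squares, plus ridge regression
# via its closed-form solution; demonstrates ridge's coefficient shrinkage.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                              # two predictors
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Ordinary least squares (intercept omitted for brevity).
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

w_ridge = ridge_fit(X, y, alpha=50.0)
print(np.round(w_ols, 2))                                # close to [3, -2]
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))   # True: shrinkage
```

The L2 penalty makes the ridge coefficient vector strictly smaller in norm than the OLS solution, which is the "shrinkage towards zero" described in point 4.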
Assumptions of Linear Regression
> Linearity: The relationship between the independent variables and the dependent
variable is linear.
> Independence: The observations are independent of each other.
> Homoscedasticity
○ The variance of the errors (residuals) is constant across all levels of the
independent variables. In simpler terms, it means that the spread or
dispersion of the residuals is consistent across the range of values of the
independent variable(s).
○ To understand homoscedasticity, let's first define the residuals. Residuals are
the differences between the observed values of the dependent variable and
the predicted values obtained from the regression model. They represent the
unexplained or leftover variation in the dependent variable.
> Normality: The errors are normally distributed with a mean of zero.
> No Multicollinearity: The independent variables are not highly correlated with each
other.
Evaluation of Linear Regression
Mean Squared Error (MSE): It measures the average squared difference between
the predicted values and the actual values. Lower values indicate better
performance.
R-squared (R²): It represents the proportion of variance in the dependent variable
that can be explained by the independent variables. Higher values indicate better fit.
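Both metrics follow directly from their definitions. A short sketch with illustrative numbers:

```python
# MSE and R-squared computed from their definitions.
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.2, 6.9, 9.1])

mse = np.mean((y_true - y_pred) ** 2)                 # average squared error
ss_res = np.sum((y_true - y_pred) ** 2)               # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)        # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(mse, r2)  # 0.025 0.995
```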
Limitations of Linear Regression:
Linear regression assumes a linear relationship between variables. If the relationship
is non-linear, linear regression may not provide accurate results.
Linear regression is sensitive to outliers. Outliers can significantly affect the
estimated coefficients and the overall model.
Linear regression assumes independence of observations. If there is a correlation
between observations, the assumptions of linear regression may be violated.
Interview Questions on Linear Regression
Here are some interview questions about linear regression:
1. What is linear regression?
2. Explain the difference between simple linear regression and multiple linear regression.
3. What are the assumptions of linear regression?
4. How do you interpret the coefficients in linear regression?
5. What is the purpose of the cost function in linear regression?
6. How does linear regression handle categorical variables?
7. What is the significance of the p-value in linear regression?
8. What is the difference between R-squared and adjusted R-squared?
9. What are some methods for handling multicollinearity in linear regression?
10. How do you evaluate the performance of a linear regression model?
11. What are some common techniques for feature selection in linear regression?
12. What are the potential problems or limitations of linear regression?
13. How do you handle outliers in linear regression?
14. Can you explain the concept of regularization in linear regression?
These questions cover a range of topics related to linear regression, including its basic principles,
assumptions, interpretation, evaluation, and potential challenges. Remember to study each topic
thoroughly to be well-prepared for your interview.