Course Notes Linear Regression
What is linear regression?
Regression analysis is one of the most widely used methods for prediction. Linear regression is probably the most
fundamental machine learning method out there and a starting point for the advanced analytical learning path of
every aspiring data scientist.
A linear regression is a linear approximation of a causal relationship between two or more variables.
Regression models are highly valuable, as they are one of the most common ways to make inferences and
predictions. Apart from this, regression analysis is also employed to determine and assess factors that affect a
certain outcome in a meaningful way.
Like many other statistical techniques, regression models help us make predictions about the population based on
sample data.
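$\hat{y} = b_0 + b_1 x_1$
$\hat{y}$: estimated (predicted) value of the dependent variable
$b_0$: constant (estimate of the intercept)
$b_1$: estimate of the slope
$x_1$: sample data for the independent variable
Note: When we refer to the population models, we use Greek letters.
[Figure: plot of y against x showing the regression line $\hat{y}_i = b_0 + b_1 x_i$, with the intercept ($b_0$), the slope ($b_1$), and the estimator of the error ($\hat{e}_i$) marked.]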
*The expected value of the error is 0, which is why it is not included in the regression equation.
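To make the equation concrete, here is a minimal sketch that estimates b0 and b1 from sample data. It assumes the ordinary least squares formulas for a single independent variable; the function name `fit_simple_ols` and the data are made up for illustration and do not come from the notes.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for y_hat = b0 + b1 * x using the least squares formulas."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: forces the line through the point (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical sample data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]                   # fitted values
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]  # estimates of the error
mean_residual = sum(residuals) / len(residuals)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"mean residual = {mean_residual:.1e}")
```

For this made-up data the slope comes out at about 1.99 and the intercept at about 0.05. The residuals of a least squares fit with an intercept average to zero (up to floating-point rounding), which is the sample counterpart of the statement that the expected value of the error is 0.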
Correlation vs Regression
Correlation
Shows that two variables move together (no matter in which direction).
Symmetrical w.r.t. the two variables: $\rho(x, y) = \rho(y, x)$.
Regression
Shows cause and effect (one variable is affected by the other).
One way: there is always only one variable that is causally dependent.
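The symmetry of correlation and the one-way nature of regression can be checked numerically. The sketch below is only an illustration: the helper names `pearson_r` and `ols_slope` and the data are hypothetical, and the slope uses the ordinary least squares formula for a single independent variable.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long samples."""
    a_mean = sum(a) / len(a)
    b_mean = sum(b) / len(b)
    s_ab = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_mean) ** 2 for ai in a)
    s_bb = sum((bi - b_mean) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def ols_slope(x, y):
    """Least squares slope when y is regressed on x."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical sample data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)           # same value: correlation is symmetric
slope_y_on_x = ols_slope(x, y)   # y treated as the dependent variable
slope_x_on_y = ols_slope(y, x)   # roles swapped: a different line

print(r_xy, r_yx)
print(slope_y_on_x, slope_x_on_y)
```

Swapping the arguments leaves the correlation unchanged but produces a different slope, so in regression it matters which variable is treated as the dependent one.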
Linear regression relies on the following assumptions:
Linearity: The specified model must represent a linear relationship.
No endogeneity: The independent variables shouldn't be correlated with the error term.
Homoscedasticity: The variance of the errors should be consistent across observations.
No autocorrelation: No identifiable relationship should exist between the values of the error term.
No multicollinearity: No predictor variable should be perfectly (or almost perfectly) explained by the other predictors.
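The requirement that no predictor be almost perfectly explained by the other predictors can be checked with a variance inflation factor (VIF), a diagnostic the notes do not name and that is brought in here as an assumption. The sketch covers only the special case of exactly two predictors, where the R-squared of regressing one predictor on the other equals their squared correlation. The function name and data are hypothetical.

```python
def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = (s12 * s12) / (s11 * s22)
    # r_squared == 1 means perfect collinearity; the VIF is then undefined.
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictors, made up for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_related = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0]   # roughly 2 * x1
x2_unrelated = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]  # little relation to x1

vif_high = vif_two_predictors(x1, x2_related)
vif_low = vif_two_predictors(x1, x2_unrelated)

print(f"VIF with a nearly collinear predictor: {vif_high:.1f}")
print(f"VIF with an unrelated predictor: {vif_low:.3f}")
```

A VIF close to 1 suggests the predictor is not explained by the other one. Values above roughly 5 or 10 are commonly cited rules of thumb for problematic multicollinearity, though the exact threshold is a convention and not a strict rule.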
Other methods for finding the regression line
OLS (ordinary least squares) is just the beginning. It is the simplest method for estimating the regression line, and it
is often sufficient. However, there are more complex methods that are more appropriate for certain datasets and problems.
Bayesian regression
Kernel regression
Gaussian process regression
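To give a feel for one of the methods listed above, here is a minimal sketch of kernel regression. The notes do not say which variant they mean, so it assumes the Nadaraya-Watson estimator with a Gaussian kernel. The function names, the bandwidth values, and the data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the weighted mean."""
    return math.exp(-0.5 * u * u)


def kernel_regression(x_train, y_train, x_query, bandwidth=1.0):
    """Predict y at x_query as a kernel-weighted average of the training targets."""
    weights = [gaussian_kernel((x_query - xi) / bandwidth) for xi in x_train]
    total = sum(weights)
    # If x_query is very far from every training point, total can underflow to 0.
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


# Hypothetical data following a curved (non-linear) pattern.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.0, 1.0, 4.0, 9.0, 16.0]

# A small bandwidth follows the nearest training point closely.
pred_narrow = kernel_regression(x_train, y_train, 2.0, bandwidth=0.1)
# A larger bandwidth averages over more neighbours and gives a smoother estimate.
pred_wide = kernel_regression(x_train, y_train, 2.0, bandwidth=1.0)

print(pred_narrow, pred_wide)
```

Unlike OLS, this estimator does not fit a single straight line. Each prediction is a local weighted average, which is one reason such methods can suit data where a linear relationship does not hold.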