Explain The Linear Regression Algorithm in Detail
Y = β₀ + β₁X
where,
Y - Dependent variable
X - Independent variable
β₀ - Intercept (β₀ is a constant to be determined. It is referred to as the intercept because, when X is 0, Y = β₀.)
β₁ – Slope (β₁ is a value to be determined. It is referred to as the coefficient, and it represents the magnitude of the change in Y when X changes.)
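As a minimal sketch (assuming NumPy is available and using a small made-up dataset), the intercept β₀ and slope β₁ can be estimated by least squares:

import numpy as np

# Small illustrative dataset (made up for this sketch)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
beta1, beta0 = np.polyfit(X, Y, deg=1)
print(f"Intercept (beta0): {beta0:.3f}")
print(f"Slope (beta1): {beta1:.3f}")

# Predicted values: Y_hat = beta0 + beta1 * X
Y_hat = beta0 + beta1 * X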
Multiple Linear Regression
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ
where,
Y - Dependent variable
X₁, X₂, …, Xₙ - Independent variables
β₀ - Intercept (β₀ is a constant to be determined. It is referred to as the intercept because, when all the X values are 0, Y = β₀.)
β₁ – The coefficient for X₁ (the first feature)
β₂ – The coefficient for X₂ (the second feature)
βₙ – The coefficient for Xₙ (the nth feature)
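A short illustrative sketch (assuming scikit-learn is available, and using a made-up two-feature dataset) shows how β₀ and the βᵢ are estimated for multiple linear regression:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up dataset with two features (X1, X2) for illustration
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]], dtype=float)
y = np.array([8.0, 7.1, 15.2, 14.0, 22.1])

model = LinearRegression().fit(X, y)
print("Intercept (beta0):", model.intercept_)
print("Coefficients (beta1, beta2):", model.coef_)

# Prediction uses Y = beta0 + beta1*X1 + beta2*X2
print("Prediction for X1=6, X2=5:", model.predict([[6.0, 5.0]])[0])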
RSS (Residual Sum of Squares): The sum of the squared differences between the observed values and the values predicted by the model, i.e. the total squared error across the whole sample.
TSS (Total Sum of Squares): The sum of the squared differences between the data points and the mean of the response variable.
R-squared is then R² = 1 − RSS/TSS.
For example, if r-squared = 0.850, this means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.
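As a minimal sketch (assuming NumPy and a small made-up dataset), RSS, TSS, and r-squared can be computed directly from their definitions:

import numpy as np

# Made-up observations, fitted with a least-squares line for illustration
x = np.arange(1, 6, dtype=float)
y = np.array([3.0, 5.1, 7.2, 8.8, 11.1])
beta1, beta0 = np.polyfit(x, y, deg=1)
y_hat = beta0 + beta1 * x

rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - rss / tss
print(f"RSS={rss:.3f}, TSS={tss:.3f}, R^2={r_squared:.3f}")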
But when plotting these four datasets on an x/y coordinate plane, we can observe that they show the same regression line, yet each dataset is telling a different story:
Dataset I appears to have a clean and well-fitting linear model.
Dataset II is not distributed normally.
In Dataset III the distribution is linear, but the calculated regression is thrown off by an outlier.
Dataset IV shows that one outlier is enough to produce a high correlation coefficient.
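A short sketch (assuming seaborn is available, since it bundles Anscombe's quartet as a sample dataset) makes the point: all four datasets produce nearly identical regression lines and correlations even though their scatter plots look very different:

import numpy as np
import seaborn as sns

# Anscombe's quartet ships with seaborn as a sample dataset
anscombe = sns.load_dataset("anscombe")

for name, group in anscombe.groupby("dataset"):
    slope, intercept = np.polyfit(group["x"], group["y"], deg=1)
    r = np.corrcoef(group["x"], group["y"])[0, 1]
    print(f"Dataset {name}: slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
# All four print roughly slope=0.50, intercept=3.00, r=0.82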
Furthermore:
Positive values denote positive linear correlation;
Negative values denote negative linear correlation;
A value of 0 denotes no linear correlation;
The closer the value is to 1 or –1, the stronger the linear correlation.
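As a tiny sketch (assuming NumPy and made-up data), these interpretations can be checked directly with the correlation coefficient:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y_pos = 2 * x + 1       # perfectly positive linear relationship -> r = 1
y_neg = -3 * x + 10     # perfectly negative linear relationship -> r = -1

print(np.corrcoef(x, y_pos)[0, 1])   # 1.0
print(np.corrcoef(x, y_neg)[0, 1])   # -1.0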
7. You might have observed that sometimes the value of VIF is infinite.
Why does this happen?
The Variance Inflation Factor (VIF) measures the impact of collinearity among the variables in a regression model. The Variance Inflation Factor (VIF) is
VIF = 1 / Tolerance
Where,
Tolerance = 1 − R²ᵢ
R²ᵢ is the coefficient of determination of a regression model where the ith factor is treated as the response variable and all of the other factors are the predictors.
When the ith factor can be predicted exactly from the other factors (perfect collinearity), R²ᵢ = 1, so the Tolerance is 0 and the VIF becomes infinite.
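A minimal sketch (assuming NumPy and a made-up design matrix; the helper name vif is my own) computes the VIF for each feature directly from the 1 / (1 − R²ᵢ) definition:

import numpy as np

def vif(X, i):
    """VIF of column i: regress X[:, i] on the remaining columns and use R^2_i."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    # Add an intercept column and solve the least-squares regression
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)

# Made-up data where the third column is an exact copy of the first,
# so its R^2_i is 1 and its VIF is infinite.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X = np.column_stack([X, X[:, 0]])
print([vif(X, i) for i in range(X.shape[1])])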
In practice, the Gauss-Markov assumptions are rarely all met perfectly, but they are still useful as a benchmark, and because they show us what ‘ideal’ conditions would be. They also allow us to pinpoint problem areas that might cause our estimated regression coefficients to be inaccurate or even unusable.
Cost Function is a way to determine how well the machine learning model has performed given the different values of each parameter.
For example, in the linear regression model, the parameters are the two coefficients, Beta 0 and Beta 1.
The cost function is the sum of squared errors (the least-squares criterion).
Since the cost function is a function of the parameters Beta 0 and Beta 1, we can plot out the cost function for each value of the Betas. (I.e. given the value of each coefficient, we can refer to the cost function to know how well the machine learning model has performed.)
When we are training the model, we are trying to find the values of the coefficients (the Betas, in the case of linear regression) that will give us the lowest cost. In other words, for the case of linear regression, we are finding the values of the coefficients that will reduce the cost to its minimum.
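As a small sketch (assuming NumPy and a made-up dataset; the function name sse_cost is my own), the cost can be evaluated for any candidate pair of coefficients:

import numpy as np

# Made-up training data for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.0, 8.2, 9.9])

def sse_cost(beta0, beta1):
    """Sum-of-squared-errors cost for a candidate intercept and slope."""
    y_hat = beta0 + beta1 * X
    return np.sum((y - y_hat) ** 2)

# Evaluate the cost at a few candidate coefficient values
for b0, b1 in [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]:
    print(f"beta0={b0}, beta1={b1} -> cost={sse_cost(b0, b1):.2f}")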
coefficient = 0.0
The cost of the coefficients is evaluated by plugging them into the function and calculating the cost.
cost = f(coefficient)
or
cost = evaluate(f(coefficient))
The derivative of the cost is calculated. The derivative is a concept from calculus
and refers to the slope of the function at a given point. We need to know the slope
so that we know the direction (sign) to move the coefficient values in order to get a
lower cost on the next iteration.
delta = derivative(cost)
Now that we know from the derivative which direction is downhill, we can update the coefficient values. A learning rate parameter (alpha) must be specified that controls how much the coefficients can change on each update.
coefficient = coefficient – (alpha * delta)
This process is repeated until the cost of the coefficients (cost) is 0.0 or close enough to zero to be good enough.
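Putting the steps together, here is a minimal sketch of gradient descent for simple linear regression (assuming NumPy, a made-up dataset, and an arbitrary learning rate and iteration count chosen for illustration; it extends the single-coefficient pseudocode above to both the intercept and the slope):

import numpy as np

# Made-up training data for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.0, 8.2, 9.9])

beta0, beta1 = 0.0, 0.0   # start with the coefficients at 0.0
alpha = 0.01              # learning rate: how far to move on each update
for _ in range(5000):
    y_hat = beta0 + beta1 * X
    error = y_hat - y
    # Derivatives of the sum-of-squared-errors cost with respect to each coefficient
    d_beta0 = 2.0 * np.sum(error)
    d_beta1 = 2.0 * np.sum(error * X)
    # Move each coefficient downhill by alpha times its derivative
    beta0 -= alpha * d_beta0
    beta1 -= alpha * d_beta1

print(f"beta0={beta0:.3f}, beta1={beta1:.3f}")  # close to the least-squares fit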
10. What is a Q-Q plot? Explain the use and importance of a Q-Q plot in
linear regression.
A quantile-quantile plot (also known as a QQ-plot) is another way you can determine
whether a dataset matches a specified probability distribution. QQ-plots are often
used to determine whether a dataset is normally distributed. Graphically, as
the name suggests, the horizontal and vertical axes of a QQ-plot are used to show
quantiles.
Quartiles divide a dataset into four equal groups, each consisting of 25 percent of
the data. But there is nothing particularly special about the number four. You can
choose any number of groups you please.
Another popular type of quantile is the percentile, which divides a dataset into 100
equal groups. For example, the 30th percentile is the boundary between the
smallest 30 percent of the data and the largest 70 percent of the data. The median
of a dataset is the 50th percentile of the dataset. The 25th percentile is the first
quartile, and the 75th percentile is the third quartile.
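As a tiny sketch (assuming NumPy and made-up data), these quantiles can be computed directly:

import numpy as np

data = np.array([12, 7, 3, 18, 25, 9, 14, 21, 5, 30], dtype=float)

# 25th, 50th (median) and 75th percentiles: the quartile boundaries
q1, median, q3 = np.percentile(data, [25, 50, 75])
print(q1, median, q3)

# An arbitrary percentile, e.g. the 30th
print(np.percentile(data, 30))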
With a QQ-plot, the quantiles of the sample data are on the vertical axis, and the
quantiles of a specified probability distribution are on the horizontal axis. The plot
consists of a series of points that show the relationship between the actual
data and the specified probability distribution. If the elements of a dataset
perfectly match the specified probability distribution, the points on the graph will
form a 45-degree line.
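A short sketch (assuming SciPy and Matplotlib are available, and using synthetic normally distributed data) draws such a plot; scipy.stats.probplot plots the sample quantiles against the normal quantiles and adds the reference line:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic sample: if the data really are normal, the points hug the reference line
rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

stats.probplot(sample, dist="norm", plot=plt)
plt.title("Normal QQ-plot of a synthetic sample")
plt.show()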
For example, this figure shows a normal QQ-plot for the price of Apple stock from
January 1, 2013 to December 31, 2013.
The QQ-plot shows that the prices of Apple stock do not conform very well to the
normal distribution. In particular, the deviation between Apple stock prices and the
normal distribution seems to be greatest in the lower left-hand corner of the graph,
which corresponds to the left tail of the normal distribution. The discrepancy is also
noticeable in the upper right-hand corner of the graph, which corresponds to the
right tail of the normal distribution.