
1. Explain the linear regression algorithm in detail.

Regression is a method of modelling a target value based on independent predictors. It is used to forecast and to find the relationship between the independent variables and the target value. The linear regression algorithm is one such method, used to find a linear relationship between the target and one or more predictors. There are two types of linear regression: simple and multiple linear regression.

Simple Linear Regression

The simple linear regression model explains the relationship between a dependent variable and one independent variable using a straight line:

Y = β₀ + β₁X

where,
Y - Dependent variable
X - Independent variable
β₀ - Intercept (β₀ is a constant to be determined; it is called the intercept because when X is 0, Y = β₀)
β₁ - Slope (β₁ is a coefficient to be determined; it represents the magnitude of the change in Y when X changes by one unit)
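
Below is a brief illustrative sketch (not part of the original text) of fitting a simple linear regression with scikit-learn; the data is made up for the example:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y is roughly 2 + 3x plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))   # one independent variable, as a column
y = 2.0 + 3.0 * X[:, 0] + rng.normal(0, 1, size=50)

model = LinearRegression().fit(X, y)
print("Intercept (beta_0):", model.intercept_)
print("Slope (beta_1):", model.coef_[0])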
Multiple Linear Regression

The multiple linear regression model explains the relationship between one continuous dependent variable (Y) and two or more independent variables (X₁, X₂, X₃, etc.):

Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ

where,
Y - Dependent variable
X₁, X₂, …, Xₙ - Independent variables
β₀ - Intercept (β₀ is a constant to be determined; when all the X variables are 0, Y = β₀)
β₁ - The coefficient for X₁ (the first feature)
β₂ - The coefficient for X₂ (the second feature)
βₙ - The coefficient for Xₙ (the nth feature)
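
As a minimal sketch (with made-up data), the betas of a multiple linear regression can be estimated in one step with NumPy's least-squares solver:

import numpy as np

# Hypothetical data: y = 1 + 2*x1 - 0.5*x2 plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                     # two independent variables
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.1, size=100)

# Prepend a column of ones so the intercept beta_0 is estimated as well
X_design = np.column_stack([np.ones(len(X)), X])
betas, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("beta_0, beta_1, beta_2:", betas)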

2. What are the assumptions of linear regression regarding residuals?


The important assumptions in regression analysis are:
 The relationship between the independent and dependent variables should be linear, i.e. you can depict the relationship between the two variables with the help of a straight line.
 The mean of the residuals should be zero, or as close to zero as possible. This is checked to confirm that our line is actually the line of "best fit".

Residual = Observed − Predicted

We want the arithmetic sum of these residuals to be as close to zero as possible.

 There should be homoscedasticity, or equal variance, in our regression model. This assumption means that the variance around the regression line is the same for all values of the predictor variable (X).
 The independent variables and the residuals should be uncorrelated.
 The independent variables should not be correlated with each other. Correlation among the independent variables is known as multicollinearity.
 There should be no perfect multicollinearity in your model. Multicollinearity generally occurs when there are high correlations between two or more independent variables; in other words, one independent variable can be used to predict another. This creates redundant information, skewing the results of a regression model. We can check multicollinearity using the VIF (variance inflation factor). The higher the VIF for an independent variable, the greater the chance that the variable is already explained by the other independent variables.
 Residuals should be normally distributed. (A sketch checking some of these assumptions follows this list.)
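
The following is a rough sketch of how some of these checks might look in practice; the data and the simple split-variance check are illustrative assumptions, not a formal test:

import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 4.0 + 1.5 * X[:, 0] + rng.normal(0, 1, size=200)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)       # Residual = Observed - Predicted

# 1. Mean of residuals should be close to zero
print("Mean of residuals:", residuals.mean())

# 2. Normality of residuals (Shapiro-Wilk; a high p-value is consistent with normality)
stat, p = stats.shapiro(residuals)
print("Shapiro-Wilk p-value:", p)

# 3. Crude homoscedasticity check: residual variance in the low-X vs high-X halves
median_x = np.median(X[:, 0])
print("Variance (low X): ", residuals[X[:, 0] <= median_x].var())
print("Variance (high X):", residuals[X[:, 0] > median_x].var())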

3. What is the coefficient of correlation and the coefficient of determination?

Correlation coefficients are used in statistics to measure how strong a relationship is between two variables. There are several types of correlation coefficient; Pearson's correlation (also called Pearson's R) is the correlation coefficient commonly used in linear regression.

Correlation coefficient formulas are used to find how strong a relationship is between data. The formulas return a value between −1 and 1, where:

 1 indicates a perfect positive linear relationship.
 −1 indicates a perfect negative linear relationship.
 A result of zero indicates no linear relationship at all.

For example, if the correlation coefficient of gold and crude oil prices is 1, the two are perfectly positively correlated; hence a rise in the price of oil is accompanied by a rise in the gold rate.

The coefficient of determination, R², is used to analyse how differences in one variable can be explained by differences in a second variable. More specifically, R-squared gives you the percentage of the variation in y explained by the x-variables. The range is 0 to 1 (i.e. 0% to 100% of the variation in y can be explained by the x-variables).
In general terms, it provides a measure of how well actual outcomes are replicated by the model. Overall, the higher the R-squared, the better the model fits your data. Mathematically, it is represented as:

R² = 1 − (RSS / TSS)
Where,

RSS (Residual Sum of Squares): the sum of squared errors across the whole sample.

TSS (Total Sum of Squares): the sum of squared deviations of the data points from the mean of the response variable.

For example, if R-squared = 0.850, then 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.
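
A small sketch computing R² directly from this definition, with made-up numbers:

import numpy as np

y_actual = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
y_pred = np.array([3.0, 5.0, 7.0, 9.0, 11.0])    # e.g. predictions from a fitted line

rss = np.sum((y_actual - y_pred) ** 2)           # residual sum of squares
tss = np.sum((y_actual - y_actual.mean()) ** 2)  # total sum of squares
print("R^2:", 1 - rss / tss)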

4. Explain Anscombe's quartet in detail.


Anscombe's quartet was developed by the statistician Francis Anscombe. It comprises four datasets, each containing eleven (x, y) pairs. The essential thing to note about these datasets is that they share the same descriptive statistics.
The summary statistics show that the means and the variances are identical for x and y across the groups:
 The mean of x is 9 and the mean of y is 7.50 for each dataset.
 Similarly, the variance of x is 11 and the variance of y is 4.13 for each dataset.
 The correlation coefficient (how strong a relationship is between two variables) between x and y is 0.816 for each dataset.

But when plotting these four datasets on an x/y coordinate plane, we can observe that they produce the same regression line as well, yet each dataset tells a different story:
 Dataset I appears to have a clean and well-fitting linear model.
 Dataset II is clearly not linear; the points follow a curve.
 In Dataset III the distribution is linear, but the calculated regression is thrown off by an outlier.
 Dataset IV shows that one outlier is enough to produce a high correlation coefficient.

This quartet emphasizes the importance of visualization in data analysis. Looking at the data reveals a lot of its structure and gives a clear picture of the dataset.
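
The quartet ships as a sample dataset with seaborn, so the identical summary statistics can be verified directly (assuming seaborn is installed and its sample data is available):

import seaborn as sns

df = sns.load_dataset("anscombe")   # columns: dataset, x, y
print(df.groupby("dataset").agg(
    x_mean=("x", "mean"), x_var=("x", "var"),
    y_mean=("y", "mean"), y_var=("y", "var"),
))

# Correlation between x and y within each dataset (~0.816 for all four)
for name, group in df.groupby("dataset"):
    print(name, group["x"].corr(group["y"]))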
5. What is Pearson’s R?

Pearson's correlation coefficient is a statistical measure of the strength of a linear relationship between paired data. In a sample it is denoted by r and is by design constrained as follows:

−1 ≤ r ≤ 1

Furthermore:
 Positive values denote positive linear correlation;
 Negative values denote negative linear correlation;
 A value of 0 denotes no linear correlation;
 The closer the value is to 1 or −1, the stronger the linear correlation.

Scatter plots of various samples, together with their corresponding sample correlation coefficient values, illustrate this; the first three represent the "extreme" correlation values of −1, 0 and 1.
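
Pearson's r is a one-liner in both SciPy and NumPy; the data here is made up:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # strongly, positively related to x

r, p_value = stats.pearsonr(x, y)
print("Pearson's r:", r)                   # close to +1
print("Via np.corrcoef:", np.corrcoef(x, y)[0, 1])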

6. What is scaling? Why is scaling performed? What is the difference between normalized scaling and standardized scaling?
Most of the time, your dataset will contain features that vary highly in magnitude, units and range. Since most machine learning algorithms use the Euclidean distance between two data points in their computations, this is a problem.
To suppress this effect, we need to bring all features to the same level of magnitude. This can be achieved by scaling.
The two most discussed scaling methods are normalization and standardization. Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling the data to have a mean of 0 and a standard deviation of 1 (unit variance).

Scaling is a step of data pre-processing that is applied to the independent variables or features of the data. It basically helps to normalize the data within a particular range. Sometimes, it also helps in speeding up the calculations in an algorithm.
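
A short sketch contrasting the two methods with scikit-learn (the feature values are illustrative):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales, e.g. height in metres and income in $1000s
X = np.array([[1.60, 30.0],
              [1.75, 55.0],
              [1.82, 120.0],
              [1.68, 80.0]])

print(MinMaxScaler().fit_transform(X))    # normalization: each column mapped to [0, 1]
print(StandardScaler().fit_transform(X))  # standardization: mean 0, std 1 per column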

7. You might have observed that sometimes the value of VIF is infinite. Why does this happen?

The Variance Inflation Factor (VIF) measures the impact of collinearity among the variables in a regression model. It is defined as:

VIF = 1 / Tolerance

where

Tolerance = 1 − R²ᵢ

and R²ᵢ is the coefficient of determination of a regression model in which the i-th independent variable is treated as the response variable and regressed on all of the other independent variables.

If VIF is infinite, it means that

Tolerance = 0

i.e. 1 − R²ᵢ = 0, which implies that R²ᵢ = 1. In other words, the variable is perfectly explained by the other independent variables; hence a multicollinearity issue exists and the model is not stable.
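
VIF can be computed per variable with statsmodels; in this made-up example X2 is almost an exact multiple of X1, so the VIFs for both explode:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
X = pd.DataFrame({"X1": rng.normal(size=200)})
X["X2"] = 2.0 * X["X1"] + rng.normal(scale=0.01, size=200)  # near-perfect collinearity
X["X3"] = rng.normal(size=200)                              # unrelated to the others

exog = sm.add_constant(X)
for i, name in enumerate(exog.columns):
    print(name, variance_inflation_factor(exog.values, i))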

8. What is the Gauss-Markov theorem?


The Gauss-Markov theorem tells us that if a certain set of assumptions is met, the ordinary least squares estimator of the regression coefficients gives you the best linear unbiased estimate (BLUE) possible.

Gauss-Markov Assumptions

There are five Gauss-Markov assumptions (also called conditions):
1. Linearity: the parameters we are estimating using the OLS method must themselves be linear.
2. Random: our data must have been randomly sampled from the population.
3. Non-collinearity: the regressors being calculated aren't perfectly correlated with each other.
4. Exogeneity: the regressors aren't correlated with the error term.
5. Homoscedasticity: no matter what the values of our regressors might be, the variance of the error is constant.

Purpose of the Assumptions

The Gauss-Markov assumptions guarantee the validity of ordinary least squares for estimating regression coefficients.
Checking how well our data matches these assumptions is an important part of estimating regression coefficients. When you know where these conditions are violated, you may be able to plan ways to change your experimental setup to help your situation fit the ideal Gauss-Markov situation more closely.

In practice, the Gauss-Markov assumptions are rarely all met perfectly, but they are still useful as a benchmark, and because they show us what 'ideal' conditions would be. They also allow us to pinpoint problem areas that might cause our estimated regression coefficients to be inaccurate or even unusable.

9. Explain the gradient descent algorithm in detail.

A cost function is a way to determine how well the machine learning model has performed given the different values of its parameters.

For example, in the linear regression model the parameters are the coefficients, β₀ and β₁, and the cost function is the sum of squared errors (least squares).

Since the cost function is a function of the parameters β₀ and β₁, we can plot the cost function for each value of the betas. (I.e. given the value of each coefficient, we can refer to the cost function to know how well the machine learning model has performed.)

When we are training the model, we are trying to find the values of the coefficients (the betas, in the case of linear regression) that will give us the lowest cost. In other words, for the case of linear regression, we are finding the values of the coefficients that reduce the cost to its minimum.

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

Gradient Descent Procedure

The procedure starts off with initial values for the coefficient or coefficients of the function. These could be 0.0 or a small random value.

coefficient = 0.0

The cost of the coefficients is evaluated by plugging them into the function and calculating the cost.

cost = f(coefficient)

or

cost = evaluate(f(coefficient))

The derivative of the cost is calculated. The derivative is a concept from calculus and refers to the slope of the function at a given point. We need to know the slope so that we know the direction (sign) in which to move the coefficient values in order to get a lower cost on the next iteration.

delta = derivative(cost)

Now that we know from the derivative which direction is downhill, we can update the coefficient values. A learning rate parameter (alpha) must be specified that controls how much the coefficients can change on each update.

coefficient = coefficient - (alpha * delta)

This process is repeated until the cost of the coefficients (cost) is 0.0, or close enough to zero to be good enough.
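
A minimal, self-contained sketch of this procedure for simple linear regression follows; the learning rate and iteration count are arbitrary choices for the example:

import numpy as np

# Hypothetical data generated from y = 4 + 3x plus noise
rng = np.random.default_rng(3)
x = rng.uniform(0, 2, size=100)
y = 4.0 + 3.0 * x + rng.normal(0, 0.3, size=100)

b0, b1 = 0.0, 0.0   # initial coefficient values
alpha = 0.1         # learning rate
n = len(x)

for _ in range(2000):
    error = (b0 + b1 * x) - y
    # Partial derivatives of the mean-squared-error cost w.r.t. each coefficient
    grad_b0 = (2.0 / n) * error.sum()
    grad_b1 = (2.0 / n) * (error * x).sum()
    # coefficient = coefficient - (alpha * delta)
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print("Estimated intercept:", b0)   # approximately 4
print("Estimated slope:", b1)       # approximately 3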

10. What is a Q-Q plot? Explain the use and importance of a Q-Q plot in
linear regression.

A quantile-quantile plot (also known as a Q-Q plot) is another way to determine whether a dataset matches a specified probability distribution. Q-Q plots are often used to determine whether a dataset is normally distributed. Graphically, as the name suggests, the horizontal and vertical axes of a Q-Q plot are used to show quantiles.

Quartiles divide a dataset into four equal groups, each consisting of 25 percent of the data. But there is nothing particularly special about the number four; you can choose any number of groups you please.
Another popular type of quantile is the percentile, which divides a dataset into 100 equal groups. For example, the 30th percentile is the boundary between the smallest 30 percent of the data and the largest 70 percent of the data. The median of a dataset is the 50th percentile of the dataset. The 25th percentile is the first quartile, and the 75th percentile is the third quartile.

With a Q-Q plot, the quantiles of the sample data are on the vertical axis, and the quantiles of a specified probability distribution are on the horizontal axis. The plot consists of a series of points that show the relationship between the actual data and the specified probability distribution. If the elements of a dataset perfectly match the specified probability distribution, the points on the graph will form a 45-degree line.
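
As an illustration (not from the original text), a normal Q-Q plot can be drawn with scipy.stats.probplot; the residuals here are simulated:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical residuals from a fitted regression model
rng = np.random.default_rng(5)
residuals = rng.normal(0, 1, size=300)

# Points lying near the 45-degree reference line indicate normality
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal Q-Q plot of residuals")
plt.show()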

For example, consider a normal Q-Q plot for the price of Apple stock from January 1, 2013 to December 31, 2013.

Such a plot shows that the prices of Apple stock do not conform very well to the normal distribution. In particular, the deviation between the Apple stock prices and the normal distribution is greatest in the lower left-hand corner of the graph, which corresponds to the left tail of the normal distribution. The discrepancy is also noticeable in the upper right-hand corner of the graph, which corresponds to the right tail of the normal distribution.
