Correlation and Regression - Interview Questions in Business Analytics
Several techniques have been developed to quantify the association between
variables measured on different scales (nominal, ordinal, interval, and
ratio), including the following:
A dip in a country’s gross domestic product (GDP), for example, is typically
accompanied by an increase in the unemployment rate. A casual look at the
correlation between these two variables would indicate a strong
relationship between them.
In our GDP vs. unemployment rate example, a causal link may well exist,
i.e., a lower GDP might increase unemployment. But we cannot and should not
infer causation from a correlation alone; that judgment should be left to a
competent researcher.
A correlation coefficient less than zero means that as one variable
increases, the other generally decreases. A coefficient greater than zero
means that as one variable increases, the other generally increases as
well.
https://learning.oreilly.com/library/view/interview-questions-in/9781484205990/A335095_1_En_5_Chapter.html 1/5
2/23/2019 5. Correlation and Regression - Interview Questions in Business Analytics
For each (x, y) pair of coordinates, subtract the mean of x from the x
observation and the mean of y from the y observation.
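Subtracting the means is the first step in computing Pearson's correlation coefficient, r. The Python sketch below assumes the standard product-moment formula r = Σ(dx·dy) / √(Σdx² · Σdy²), where dx and dy are the deviations from the means. The function name `pearson_r` is hypothetical. The sketch raises an error when the two series have unequal lengths.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means (hypothetical helper)."""
    if len(xs) != len(ys):
        # Both series must contain the same number of observations
        raise ValueError("x and y must contain the same number of observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Subtract the mean from each observation of x and y
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products of the deviations
    sxy = sum(a * b for a, b in zip(dx, dy))
    # Sums of squared deviations for each variable
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```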
PHI CORRELATION
This is also used as a follow-up to a chi-square test, to measure the
strength of the association. It is used when both variables are nominal
(typically dichotomous, giving a 2 × 2 table).
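Phi can be computed directly from the cell counts of a 2 × 2 table. When χ² is the uncorrected Pearson chi-square statistic (no Yates correction), the magnitude of phi equals √(χ²/n). The minimal Python sketch below assumes the table layout shown in the docstring. The names `phi_coefficient` and `chi_square_2x2` are hypothetical helpers, and the counts are illustrative.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as (hypothetical helper):
              y=1  y=0
        x=1    a    b
        x=0    c    d
    """
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the same 2x2 table."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

a, b, c, d = 30, 10, 10, 30
n = a + b + c + d
print(phi_coefficient(a, b, c, d))                # 0.5
print(math.sqrt(chi_square_2x2(a, b, c, d) / n))  # 0.5, |phi| recovered from chi-square
```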
POINT BISERIAL
This method is used when one variable is on a nominal/dichotomous scale
and the other is measured on an interval or ratio scale.
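The point-biserial coefficient is numerically the same as Pearson's r when the dichotomous variable is coded 0/1. The sketch below assumes that 0/1 coding and uses hypothetical function names. It computes the coefficient with the textbook formula r_pb = (M1 - M0) / s_n · √(pq), where M1 and M0 are the group means, p and q are the group proportions, and s_n is the population standard deviation of the continuous variable. It then checks the result against Pearson's r. The scores are illustrative, not data from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(groups, values):
    """Point-biserial r between a 0/1 variable and an interval/ratio variable."""
    n = len(values)
    ones = [v for g, v in zip(groups, values) if g == 1]
    zeros = [v for g, v in zip(groups, values) if g == 0]
    p = len(ones) / n            # share of observations coded 1
    q = 1 - p                    # share coded 0
    mean_all = sum(values) / n
    # Population (divide-by-n) standard deviation of the continuous variable
    s_n = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    m1 = sum(ones) / len(ones)   # mean of the continuous variable in group 1
    m0 = sum(zeros) / len(zeros) # mean in group 0
    return (m1 - m0) / s_n * math.sqrt(p * q)

groups = [0, 0, 0, 1, 1, 1]
scores = [2, 3, 4, 5, 6, 7]
print(round(point_biserial(groups, scores), 3))  # 0.878
print(round(pearson_r(groups, scores), 3))       # 0.878
```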
Figure 5-1. Graphs plotting the statistical analysis of relationships between the
variables x and y
Figure (a)
r = 1.0
This represents a perfect linear association. All the data points fall on the
line.
Figure (b)
r = 0
No linear relationship exists between the variables. The data points are
scattered randomly and may approximate a circle. Knowing the value of one
variable gives no linear indication of the value of the other, although a
nonlinear relationship may still exist.
Figure (c)
r = 0.70
Figure (d)
r = -1.0
Figure (e)
r = 0.51
The relationship between the variables is moderate: the data points are
more scattered, although they still loosely follow a straight line.
Figure (f)
r = -0.70
This is similar to figure (c), with the difference being that the variables are
negatively correlated.
We can see that as the absolute value of r decreases, the data points become
more scattered, whereas the data points lie closer to a straight line as r
approaches -1.0 or +1.0.
For example, if r = 0.9, the variables are strongly related, and higher
values of one tend to accompany higher values of the other. Similarly, if
r = -0.9, the variables are strongly related, and higher values of one tend
to accompany lower values of the other.
A #N/A error is returned when the two arrays contain unequal numbers of
data points.
Y = a + bX,
where X is the explanatory (independent) variable, Y is the dependent
variable, b is the slope of the line, and a is the intercept (the value of
Y when X = 0).
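The text does not say how a and b are estimated. Assuming ordinary least squares (the usual choice), the slope is b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and the intercept is a = ȳ - b·x̄. Below is a minimal Python sketch with a hypothetical `fit_line` helper and illustrative points.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of a and b in Y = a + bX (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: value of Y when X = 0
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

The line passes through the point of means (x̄, ȳ), which is why the intercept can be recovered from the slope.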
For example, if R² = 0.8, the independent variables explain 80% of the
variation in the value of the dependent variable.
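R² compares the variation left unexplained by the model with the total variation in the dependent variable: R² = 1 - SS_res / SS_tot. The sketch below uses illustrative values, not data from the text, chosen so that R² comes out to 0.8 as in the example. The helper name `r_squared` is hypothetical.

```python
def r_squared(ys, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # variation left unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                # total variation around the mean
    return 1 - ss_res / ss_tot

observed = [2, 4, 6, 8]
fitted = [3, 3, 7, 7]      # illustrative model predictions
print(r_squared(observed, fitted))  # 0.8
```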
Unfortunately, R² alone may not be a reliable measure of the accuracy of a
multiple regression model, because R² never decreases, and typically
increases, every time a new variable is added to the model, even if that
variable is not statistically significant. If there is a large number of
independent variables, R² may be high even though the variables do not
explain the dependent variable that well. This problem is called
overestimating the regression.
Adjusting R² for the number of independent variables (the adjusted R²)
helps overcome the problem of overestimating the regression.
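This sketch assumes a common form of the adjustment: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of independent variables. The sketch uses a hypothetical helper name and illustrative inputs. It shows that the same raw R² is penalized more as k grows.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables (hypothetical helper)."""
    if n - k - 1 <= 0:
        raise ValueError("n must exceed k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same raw R^2 of 0.80 with 30 observations, but more predictors
print(round(adjusted_r_squared(0.80, n=30, k=2), 3))   # 0.785
print(round(adjusted_r_squared(0.80, n=30, k=10), 3))  # 0.695
```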
A scatter plot of residuals vs. time can reveal the presence of serial
correlation. Figure 5-3 illustrates examples of positive and negative
serial correlation.
Figure 5-3. Scatter plot of residuals vs. time indicating positive and negative serial
correlations
              Conditional Heteroscedasticity           Serial Correlation
What is it?   Residual variance related to level of    Residuals are correlated
              independent variables
Effect?       Coefficients are consistent.             Coefficients are consistent.
              Standard errors are underestimated.      Standard errors are underestimated.
              Too many Type I errors                   Too many Type I errors
                                                       (positive correlation)
Detection?    Breusch-Pagan chi-square test            Durbin-Watson test
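Both detection statistics can be computed from a model's residuals. The sketch below assumes the residuals are ordered in time and that there is a single independent variable. It computes two statistics:

- The Durbin-Watson statistic, DW = Σ(e_t - e_{t-1})² / Σe_t². Values near 2 suggest no serial correlation, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation.
- The n·R² form of the Breusch-Pagan statistic found in many textbooks. The original Breusch-Pagan formulation scales the squared residuals differently.

Function names and residual values are hypothetical and illustrative.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2) for residuals ordered in time."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def breusch_pagan_lm(xs, residuals):
    """n * R^2 from regressing squared residuals on one regressor x.

    Under homoscedasticity this is approximately chi-square with 1 degree
    of freedom (one independent variable); large values suggest
    conditional heteroscedasticity.
    """
    n = len(xs)
    sq = [e ** 2 for e in residuals]
    mx, ms = sum(xs) / n, sum(sq) / n
    sxs = sum((x - mx) * (s - ms) for x, s in zip(xs, sq))
    sxx = sum((x - mx) ** 2 for x in xs)
    sss = sum((s - ms) ** 2 for s in sq)
    if sss == 0:
        return 0.0  # squared residuals are constant: nothing for x to explain
    r2_aux = sxs ** 2 / (sxx * sss)  # R^2 of a one-regressor OLS fit equals r^2
    return n * r2_aux

print(durbin_watson([1, 1, 1, -1, -1, -1]))  # runs of the same sign: well below 2
print(durbin_watson([1, -1, 1, -1, 1, -1]))  # alternating signs: well above 2
print(breusch_pagan_lm([1, 2, 3, 4, 5, 6], [0.1, -0.2, 0.4, -0.8, 1.6, -3.2]))
```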
Odds = a:b (as a single number, Odds = a/b)
Probability = a/(a + b)
Probability = Odds/(1 + Odds)
Odds = Probability/(1 - Probability)
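The conversions can be checked with exact fractions. This short Python sketch assumes that odds of a:b means a chances for the event against b chances against it. The helper names are hypothetical.

```python
from fractions import Fraction

def probability_from_ratio(a, b):
    """Odds of a:b in favour -> probability a / (a + b)."""
    return Fraction(a, a + b)

def probability_from_odds(odds):
    """Probability = Odds / (1 + Odds)."""
    return odds / (1 + odds)

def odds_from_probability(p):
    """Odds = Probability / (1 - Probability)."""
    return p / (1 - p)

p = probability_from_ratio(3, 1)            # odds of 3:1
print(p)                                    # 3/4
print(odds_from_probability(p))             # 3
print(probability_from_odds(Fraction(3)))   # 3/4
```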