

© Bhasker Gupta 2016

Bhasker Gupta, Interview Questions in Business Analytics, 10.1007/978-1-4842-0599-0_5

5. Correlation and Regression


Bhasker Gupta
(1) Bangalore, Karnataka, India

This chapter is concerned with measuring the relatedness between two variables. A simple measure, the correlation coefficient, is commonly used to quantify the degree of relationship between two variables. In this chapter, I will discuss different types of regression models, assumptions and questions related to them, and the estimation method used commonly with regression models.

Q: What is correlation and what does it do?


In analytics, we try to find relationships and associations among various events. In the probabilistic context, we determine the relationships between variables. Correlation is a method by which to calculate the relationship between two variables. The correlation coefficient is a numerical measure of the relationship between paired observations (Xi, Yi), i = 1,…, n. Different correlation coefficients capture different kinds of relationships between variables and are interpreted differently.

There are a number of techniques that have been developed to quantify the association between variables of different scales (nominal, ordinal, interval, and ratio), including the following:

Pearson product-moment correlation (both variables are measured on an interval or ratio scale)

Spearman rank-order correlation (both variables are measured on an ordinal scale)

Phi correlation (both variables are measured on a nominal/dichotomous scale)

Point biserial (one variable is measured on a nominal/dichotomous scale, and one is measured on an interval or ratio scale)

Q: When should correlation be used or not used?


Correlation is a good indicator of how two variables are related. It is a good
metric to look at during the early phases of research or analysis. Beyond a
certain point, however, correlation is of little use.

A dip in a country’s gross domestic product (GDP), for example, would lead
to an increase in the unemployment rate. A casual look at the correlation
between these two variables would indicate that there is a strong
relationship between them.

Yet, the extent or measure of this relationship cannot be ascertained through a correlation. A correlation of 0.75 between two variables does not mean that one variable is related to the other by a factor of 0.75.

This brings us to another issue that is often misunderstood by analysts. Correlation does not mean causation. A strong correlation between two variables does not necessarily imply that one variable causes the other to occur.

In our GDP vs. unemployment rate example, this might be true, i.e., a lower GDP might increase unemployment. But we cannot and should not infer this from a correlation. Establishing causation should be left to the sound judgment of a competent researcher.

Q: What is the Pearson product-moment correlation coefficient?

It is easy to determine whether two variables are correlated, simply by looking at a scatter plot (with one variable on each of the two axes). Essentially, for the two variables to have a strong correlation, the points should be scattered along a straight line.

However, to quantify the correlation, we use the Pearson product-moment correlation coefficient for samples, otherwise known as Pearson’s r.

The correlation can be either positive or negative, so the value of ρ can be any number between -1 and +1, as follows:

-1 ≤ ρ ≤ +1

A correlation coefficient of less than zero means that an increase in one variable is generally accompanied by a decrease in the other variable. A coefficient greater than zero means that an increase in one variable is accompanied by an increase in the other variable.

Higher absolute values mean stronger relationships (positive or negative), and values closer to zero depict weak relationships. A correlation of 1.00 means that the two variables are completely or perfectly positively correlated; -1.00 means that they are perfectly negatively correlated; and a correlation of 0.00 means that there is no linear relationship between the two variables.

Q: What is the formula for calculating the correlation coefficient?

The formula is as follows:

r = (1 / (n - 1)) × Σ [(xi - x̄) / sx] × [(yi - ȳ) / sy]

where x̄ and ȳ are the sample means, and sx and sy the sample standard deviations, of x and y.


To compute r, the following algorithm, corresponding to the preceding formula, is used:

For each (x, y) set of coordinates, subtract the mean from each
observation for x and y.

Divide by the corresponding standard deviation.

Multiply the two results together.

The result is then added to a sum.

The sum is divided by the degrees of freedom, n - 1.
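The steps above can be sketched directly in Python (the data values here are purely illustrative):

```python
import math

# Illustrative paired observations (hypothetical data)
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 7.0, 9.0, 15.0]
n = len(x)

# Sample means and sample standard deviations (divisor n - 1)
mean_x = sum(x) / n
mean_y = sum(y) / n
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# For each pair: subtract the mean, divide by the standard deviation,
# multiply the two standardized values, and add the result to a sum
total = sum(((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
            for xi, yi in zip(x, y))

# Finally, divide the sum by the degrees of freedom, n - 1
r = total / (n - 1)
```

For this sample, r works out to roughly 0.98, a strong positive correlation.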

Q: Briefly, what are the other techniques for calculating correlation?

Although the Pearson product-moment correlation is the most widely used correlation technique, other correlation techniques must be applied if the main tenet of the Pearson technique is violated, i.e., that both variables be on an interval or ratio scale.

SPEARMAN RANK-ORDER CORRELATION

This technique is used when both variables are on an ordinal scale. Each variable is ranked from 1 to n, the number of observations. The main tenet here is that, for any correlation to exist, an observation that ranks high on one variable should also rank high on the other.
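As a quick sketch (with made-up data and no tied ranks), Spearman’s ρ can be computed by ranking each variable and applying the shortcut formula ρ = 1 - 6Σd²/(n(n² - 1)), where d is the difference between the two ranks of each observation:

```python
# Hypothetical paired observations
x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
y = [2, 20, 28, 27, 50, 29, 7, 17, 6, 12]
n = len(x)

def ranks(values):
    # Rank 1 for the smallest value, n for the largest (assumes no ties)
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

rx, ry = ranks(x), ranks(y)

# Sum of squared rank differences, then the shortcut formula
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho = 1 - 6 * d2 / (n * (n * n - 1))
```

A ρ near zero here would indicate that high ranks on one variable do not go with high ranks on the other.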

PHI CORRELATION
This is also used as a follow-up to a chi-square test. It is used when both variables are nominal.

POINT BISERIAL
This method is used when one variable is on a nominal/dichotomous scale
and one is measured on an interval or ratio scale.

Q: How would you use a graph to illustrate and interpret a correlation coefficient?

Visualization helps us to understand the relationships among variables better. In Figure 5-1, graphs have been plotted to illustrate and visualize relationships between the variables x and y that are being statistically analyzed.

Figure 5-1. Graphs plotting the statistical analysis of relationships between the
variables x and y

Now let us interpret these graphs.

Figure (a)
r = 1.0

This represents a perfect linear association. All the data points fall on the
line.

Figure (b)
r = 0

No linear relationship exists between the variables. The data points are
scattered randomly and may approximate a circle. Changing the value of
one variable has no effect on the value of the other.

Figure (c)
r = 0.70

There is some positive linear relationship, although it is not perfect. Most of the data points fall on or close to a straight line.

Figure (d)
r = -1.0

This shows a perfect linear relationship between the variables, similar to figure (a), with the difference being that the variables are inversely related, i.e., increasing the value of one variable results in a decrease in the other variable.

Figure (e)
r = 0.51

The relationship between the variables is not very strong, and the data points are a little scattered, although still close to a straight line.

Figure (f)
r = -0.70

This is similar to figure (c), with the difference being that the variables are
negatively correlated.

We can see that as the value of r approaches zero, the data points become more scattered, whereas the data points lie closer to a straight line as the value of r approaches -1.0 or +1.0.


Q: How is a correlation coefficient interpreted?


A correlation coefficient has both magnitude (unitless) and direction (a positive or negative sign) and lies between -1 and +1. A correlation coefficient with a value of zero indicates that no relationship exists between the variables. When the correlation coefficient has a non-zero value and approaches ±1, this signifies a strong linear relationship between the variables. The magnitude of r signifies the strength of the relationship, and the positive or negative sign indicates whether the variables are directly or inversely related.

For example, if r = 0.9, this means that the variables are strongly related,
and increasing the value of one results in an increase in the value of the
other. Similarly, if r = -0.9, this indicates that the variables are strongly
related, and increasing the value of one results in a decrease in the value of
the other.

Q: What are the various issues with correlation?

The various issues with correlation are

Correlation analysis does not measure the strength of a nonlinear association between variables.

Accidental or spurious relationships are not accounted for.

Research problems, such as data contamination, sample bias, etc., hinder drawing reliable conclusions.

Correlation analysis measures the relationship and does not provide an explanation or basis for it, which can result in false conclusions.

Q: How would you calculate a correlation coefficient in Excel?

A correlation coefficient between two arrays can be calculated in Excel using the CORREL function. The syntax for using the CORREL function is

CORREL(array1, array2)

For example:
CORREL(A1:A20, B1:B20)

The CORREL function returns an error in the following cases:

A #N/A error for unequal numbers of data points in the two arrays

A #DIV/0! error when the standard deviation of array1 or array2 is zero

Q: What is meant by the term “linear regression”?

Linear regression is a statistical modeling technique that attempts to model the relationship between an explanatory variable and a dependent variable by fitting a linear equation to the observed data points, e.g., modeling the body mass index (BMI) of individuals by weight.

A linear regression is used if there is a relationship or significant association between the variables. This can be checked with scatter plots. If no association appears between the variables, fitting a linear regression model to the data will not provide a useful model.

A linear regression line takes an equation of the following form:

Y = a + bX

where X = explanatory variable,
Y = dependent variable,
b = slope of the line, and
a = intercept (the value of Y when X = 0).
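The slope and intercept are typically estimated by ordinary least squares. A minimal sketch in Python, with illustrative data (the slope b is the sum of cross-deviations divided by the sum of squared x-deviations, and a then follows from the means):

```python
# Hypothetical observations: fit Y = a + bX by ordinary least squares
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: sum of cross-deviations over sum of squared x-deviations
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))

# Intercept: the value of Y when X = 0
a = mean_y - b * mean_x
```

Here the fitted line is roughly Y = 0.09 + 1.99X.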

Q: What are the various assumptions that an analyst takes into account while running a regression analysis?

Regression analysis depends on the following assumptions:

The relationship between the variables should be linear (or approximately linear) over the range of the population being studied.

All the variables in the regression analysis should be normal, i.e., should follow the normal curve (exactly or approximately).

There should be no multicollinearity, i.e., the independent variables should not show correlation among themselves.

There should be no autocorrelation in the data, i.e., the residuals should be independent of each other.

There should be homoscedasticity, i.e., the error terms or residuals along the regression should have equal variance.

Q: How would you execute regression in Excel?

Regression in Excel can be performed by using three built-in functions to calculate the slope, intercept, and R² values, or by using the Regression tool provided under Data Analysis (after installing the Analysis ToolPak add-in). The built-in functions are SLOPE(), INTERCEPT(), and RSQ().

Q: What is the multiple coefficient of determination, or R-squared?

The multiple coefficient of determination, R², is a method by which to calculate the overall effectiveness of all the independent variables in explaining the dependent variable, expressed as the percentage of variation explained.

For example, if R² = 0.8, this means that the independent variables explain 80% of the variation in the value of the dependent variable.

Unfortunately, R² alone may not be a reliable measure of the accuracy of the multiple regression model, as R² increases every time a new variable is added to the model, even though the variable might not be statistically significant. If there is a large number of independent variables, the value of R² may be high, even though the variables do not explain the dependent variable that well. This problem is called overestimating the regression.

By adjusting the R² value for the number of independent variables, the problem of overestimating the regression can be overcome.
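The usual adjustment is adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. A small sketch with hypothetical values:

```python
# Hypothetical regression: n observations, k independent variables
n, k = 50, 3
r_squared = 0.80

# Adjusted R² penalizes every additional independent variable,
# so it only rises when a new variable genuinely improves the fit
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
```

With these numbers, the adjusted value drops slightly, to about 0.787.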

Q: What is meant by “heteroscedasticity”?

When the variance of the residuals differs across observations in the sample, this is called heteroscedasticity. It is one of the violations that analysts have to test for before relying on a regression analysis. One of the assumptions of multiple regression is that the variance of the residuals is constant across observations.

Q: How do you differentiate between conditional and unconditional heteroscedasticity?

Unconditional heteroscedasticity occurs in cases in which the level of the independent variables does not affect heteroscedasticity, i.e., the variance does not change systematically with changes in the value of the independent variables. Although this is a violation of the equal variance assumption, it frequently causes no serious problems with the regression.

Conditional heteroscedasticity is heteroscedasticity that is related to the level of (i.e., conditional upon) the independent variables.

Q: What are the different methods of detecting heteroscedasticity?

There are two methods of detecting heteroscedasticity: examining scatter plots of the residuals, and using the Breusch-Pagan chi-square test. Plotting the residuals against one or more of the independent variables can help us spot trends among the observations (see Figure 5-2).

Figure 5-2. Plotting residuals against an independent variable

The residual plot in the figure indicates the presence of conditional heteroscedasticity. Notice how the variation in the regression residuals increases as the independent variable increases. This indicates that the variance of the dependent variable about the mean is related to the level of the independent variable.

The more common way to detect conditional heteroscedasticity is the Breusch-Pagan test, which calls for the regression of the squared residuals on the independent variables. Under conditional heteroscedasticity, the independent variables contribute significantly to explaining the squared residuals.
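For the single-regressor case, the Breusch-Pagan idea can be sketched in plain Python: regress the squared residuals on x, take the R² of that auxiliary regression, and compare the LM statistic n·R² with a chi-square critical value (3.841 at the 5% level for one degree of freedom). The residuals below are made up so that their spread grows with x:

```python
# Hypothetical residuals from an original fit, ordered by x;
# their magnitude grows with x (conditional heteroscedasticity)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
resid = [0.1, -0.2, 0.3, -0.5, 0.7, -0.9, 1.2, -1.5]
n = len(x)

# Step 1: OLS regression of the SQUARED residuals on x
u2 = [e ** 2 for e in resid]
mean_x = sum(x) / n
mean_u2 = sum(u2) / n
b = (sum((xi - mean_x) * (ui - mean_u2) for xi, ui in zip(x, u2))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_u2 - b * mean_x

# Step 2: R² of the auxiliary regression
fitted = [a + b * xi for xi in x]
ss_res = sum((ui - fi) ** 2 for ui, fi in zip(u2, fitted))
ss_tot = sum((ui - mean_u2) ** 2 for ui in u2)
r2_aux = 1 - ss_res / ss_tot

# Step 3: LM statistic n * R² vs. the chi-square critical value
lm = n * r2_aux
heteroscedastic = lm > 3.841  # 5% critical value, df = 1
```

A statistics package would compute the p-value from the chi-square distribution rather than rely on a hardcoded cutoff.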

Q: What are the different methods to correct heteroscedasticity?

The most common remedy is to calculate robust standard errors. The t-statistics are then recalculated using the original regression coefficients and the robust standard errors. A second method to correct for heteroscedasticity is to use generalized least squares, by modifying the original equation.

Q: What is meant by the term “serial correlation”?


Serial correlation, or autocorrelation, is the phenomenon commonly observed in time series data, in which there is a correlation between the residual terms. It is of two types: positive and negative.

When a positive regression error in one time period increases the probability of observing a positive regression error in the next time period, this is a positive serial correlation. In a negative serial correlation, a positive regression error increases the probability of observing a negative error in the next time period.

Q: What are the different methods to detect serial correlation?

There are two methods that are commonly used to detect the presence of serial correlation: residual plots and the Durbin-Watson statistic.

A scatter plot of residuals vs. time, such as those shown in Figure 5-3, can
reveal the presence of serial correlation. Figure 5-3 illustrates examples of
positive and negative serial correlation.

Figure 5-3. Scatter plot of residuals vs. time indicating positive and negative serial
correlations

The more common method is to use the Durbin-Watson statistic (DW) to detect the presence of serial correlation.
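The DW statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals; values near 2 suggest no serial correlation, values near 0 positive serial correlation, and values near 4 negative serial correlation. A sketch with made-up residuals that drift slowly over time:

```python
# Hypothetical regression residuals ordered in time; they change
# sign slowly, which is typical of positive serial correlation
resid = [0.5, 0.6, 0.4, 0.3, 0.2, -0.1, -0.3, -0.4, -0.2, 0.1]

# DW = sum of squared successive differences / sum of squared residuals
num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
den = sum(e ** 2 for e in resid)
dw = num / den  # well below 2 here, pointing to positive serial correlation
```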

Q: What are the different methods to correct multicollinearity?

The most common method to remove multicollinearity is to omit independent variables that are highly correlated with others in the variable set. Unfortunately, it is not always an easy task to identify the variable(s) that are the source of the multicollinearity. There are statistical procedures that may help in this effort, such as stepwise regression, which systematically removes variables until multicollinearity is reduced.

A summary of what you have to know regarding violations of the assumptions of multiple regression is offered in Table 5-1.

Table 5-1. Summary of Violations of Assumptions of Multiple Regression

What is it?
Conditional heteroscedasticity: Residual variance related to the level of the independent variables
Serial correlation: Residuals are correlated
Multicollinearity: High correlation among the independent variables

Effect?
Conditional heteroscedasticity: Coefficients are consistent. Standard errors are underestimated. Too many Type I errors
Serial correlation: Coefficients are consistent. Standard errors are underestimated. Too many Type I errors (positive correlation)
Multicollinearity: Coefficients are consistent but unreliable. Standard errors are overestimated. Too many Type II errors

Detection?
Conditional heteroscedasticity: Breusch-Pagan chi-square test
Serial correlation: Durbin-Watson test
Multicollinearity: Conflicting t and F statistics; high correlations among the independent variables

Correction?
Conditional heteroscedasticity: Use White-corrected standard errors
Serial correlation: Use the Hansen method to adjust standard errors
Multicollinearity: Drop one of the correlated variables

Q: What is an odds ratio?

Odds express the relative occurrence of different outcomes as a ratio of the form a:b. For example, if the odds of an event are said to be 5:2 in favor of the first outcome, this means that the first outcome occurs five times for every two occurrences of the second outcome. Odds are related to probability and can be shown mathematically as follows:

Odds = a:b
Probability = a/(a + b)
Probability = Odds/(1 + Odds)
Odds = Probability/(1 - Probability)
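The conversions above can be checked directly in Python, using the 5:2 example:

```python
# Odds of 5:2 in favor of the first outcome
a, b = 5, 2

probability = a / (a + b)               # Probability = a/(a + b) = 5/7
odds = probability / (1 - probability)  # Odds = Probability/(1 - Probability) = 2.5
prob_again = odds / (1 + odds)          # Probability = Odds/(1 + Odds), back to 5/7
```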

Q: How is linear regression different from logistic regression?

Linear regression is applicable to numerical or continuous variables, whereas logistic regression is applicable when the dependent variable is categorical (commonly a dichotomous variable). The output of logistic regression is between 0 and 1, where 1 denotes “success” and 0 denotes “failure.” But in linear regression, the output is continuous and can assume any value in a range.

Linear regression predicts numerical outputs, such as sales or profit, whereas logistic regression predicts dichotomous outputs, such as yes and no or living and dead.
