Regression & Correlation
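2) Calculate the covariance: $S_{xy} = \frac{1}{n-1}\left(\sum xy - \frac{(\sum x)(\sum y)}{n}\right)$
3) Calculate the variance of X: $S_x^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$
4) Calculate the slope of Y on X: $b_{yx} = \frac{S_{xy}}{S_x^2}$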
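2) Calculate the covariance: $S_{xy} = \frac{1}{n-1}\left(\sum xy - \frac{(\sum x)(\sum y)}{n}\right)$
3) Calculate the variance of Y: $S_y^2 = \frac{1}{n-1}\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)$
4) Calculate the slope of X on Y: $b_{xy} = \frac{S_{xy}}{S_y^2}$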
X 1 1 2 3 4 4 5 6 6 7
Y 2.1 2.5 3.1 3.0 3.8 3.2 4.3 3.9 4.4 4.8
a) Compute the least-squares regression equation for Y values on X values, that is the equation Ŷ = a +
bX.
b) Compute the residuals and verify that they add to zero.
c) Use the regression equation to predict the value of Y when X=10.
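The formula steps given earlier (covariance, variance of X, slope) can be sketched in Python for this ten-point data set. This is a minimal illustration rather than a prescribed method: the function name `least_squares_line` is hypothetical, and the intercept is taken as a = mean(Y) - b * mean(X), which the listed steps do not state explicitly.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line Y-hat = a + b*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Step 2: covariance S_xy
    s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
    # Step 3: variance of X
    s_x2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    # Step 4: slope b_yx
    b = s_xy / s_x2
    # Assumed step: intercept from the means
    a = sum_y / n - b * sum_x / n
    return a, b


X = [1, 1, 2, 3, 4, 4, 5, 6, 6, 7]
Y = [2.1, 2.5, 3.1, 3.0, 3.8, 3.2, 4.3, 3.9, 4.4, 4.8]

a, b = least_squares_line(X, Y)
residuals = [y - (a + b * x) for x, y in zip(X, Y)]
prediction_at_10 = a + b * 10

print("a =", round(a, 3), "b =", round(b, 3))
print("sum of residuals =", round(sum(residuals), 10))
print("prediction at X=10:", round(prediction_at_10, 2))
```

With this data the fit comes out to roughly Ŷ = 2.00 + 0.387X, and the residuals sum to zero up to floating-point rounding. X = 10 lies outside the observed range of X (1 to 7), so the prediction of about 5.87 is an extrapolation.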
The owner of a retailing organization is interested in the relationship between price at which a commodity is
offered for sale and the quantity sold. The following sample data have been collected.
Price 25 45 30 50 35 40 65 75 70 60
Quantity sold 118 105 112 100 111 108 95 88 91 96
(a) Plot a scatter diagram for the above data.
(b) Using the method of least squares, determine the equation for the estimated regression line. Plot this
line on the scatter diagram.
Given the following set of values: Determine the equation of the least square regression line.
X 20 11 15 10 17 19
Y 5 15 14 17 8 9
The following table gives the market share of a product and its television advertising expenditure:
X=Advertising Expenditure 15 17 13 14 16
Y=Market Share 23 25 21 24 26
Estimate market share when advertising expenditure is 20:
A)28 B)28.8
C)26.5 D)25.6
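A worked sketch of this estimate, using the deviation form of the slope (equivalent to the ratio of covariance to variance, since the n-1 factors cancel). The intercept formula a = ȳ - b x̄ is assumed here.

```latex
\begin{aligned}
\bar{x} &= 15, \qquad \bar{y} = 23.8 \\
\sum (x-\bar{x})(y-\bar{y}) &= 0 + 2.4 + 5.6 - 0.2 + 2.2 = 10 \\
\sum (x-\bar{x})^2 &= 0 + 4 + 4 + 1 + 1 = 10 \\
b_{yx} &= \frac{10}{10} = 1, \qquad a = \bar{y} - b_{yx}\,\bar{x} = 23.8 - 15 = 8.8 \\
\hat{y}(20) &= 8.8 + 1 \times 20 = 28.8
\end{aligned}
```

Under these assumptions the estimate matches option B.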
If Y = 45 - 3x is the regression line of y on x, by how many units is y expected to increase if x is
decreased by two units?
A)6 B)7
C)8 D)9
A researcher finds that there is a linear relationship between amount of fertilizer supplied to tomato plants
and the subsequent yield of tomatoes obtained. Eight tomato plants of the same variety were selected at
random and treated weekly with a solution in which x grams of fertilizer was dissolved in a fixed quantity of
water. The yield, y kilograms of tomatoes was recorded
Plant A B C D E F G H
X 1 1.5 2 2.5 3 3.5 4 4.5
Y 3.9 4.4 5.8 6.6 7 7.1 7.3 7.7
Estimate the yield of a plant treated weekly with 3.2 grams of fertilizer
A)6.7 B)3.9
C)7.6 D)8.2
A regression analysis between sales (in Rs. 1000) and advertising (in Rs. 1000) resulted in the following
least squares line: Y = 80 + 5x. This implies that:
The two lines of regression are given by 8x+10y=25 and 16x+5y=12 respectively. If the variance of x is 25,
what is the standard deviation of y?
A)16 B)8
C)64 D)4
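One way to reason about this question, sketched under the assumption that the first equation is the regression of y on x and the second is the regression of x on y. The reverse assignment gives a product of regression coefficients equal to 4, which exceeds 1 and is therefore inadmissible. The relation b_yx = r S_y / S_x is a standard identity that the worksheet does not state.

```latex
\begin{aligned}
8x + 10y = 25 \;&\Rightarrow\; y = 2.5 - 0.8x, & b_{yx} &= -0.8 \\
16x + 5y = 12 \;&\Rightarrow\; x = 0.75 - 0.3125y, & b_{xy} &= -0.3125 \\
r^2 = b_{yx}\, b_{xy} = 0.25 \;&\Rightarrow\; r = -0.5 & &\text{(sign follows the coefficients)} \\
b_{yx} = r\,\frac{S_y}{S_x} \;&\Rightarrow\; S_y = \frac{b_{yx}\, S_x}{r} = \frac{(-0.8)(5)}{-0.5} = 8
\end{aligned}
```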
Given the following data
Variable X Y
Mean 80 98
Variance 4 9
Coefficient of correlation = 0.6
What is the most likely value of y when X = 90?
A)90 B)103
C)104 D)107
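A short worked sketch, assuming the regression line of y on x is written in mean-deviation form and that b_yx = r S_y / S_x (a standard identity not stated in the worksheet):

```latex
\begin{aligned}
S_x &= \sqrt{4} = 2, \qquad S_y = \sqrt{9} = 3 \\
b_{yx} &= r\,\frac{S_y}{S_x} = 0.6 \times \frac{3}{2} = 0.9 \\
\hat{y} &= \bar{y} + b_{yx}\,(x - \bar{x}) = 98 + 0.9\,(90 - 80) = 107
\end{aligned}
```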
Regression line ‘x on y ‘given. You are required to find the change in y due to 1 unit increase in x
a) Y will increase 3 units b) Y will increase 5 units
c) Y will decrease 3 units d) Y will increase 2 units
For the variables x and y, the regression equations are given as 7x-3y-18=0 and 4x-y-11=0 respectively.
Find the arithmetic means of x and y.
a) Mean (x)=3 and Mean (y)=1 b) Mean (x)=2 and Mean (y)=3
c) Mean (x)= 2 and Mean (y)=1 d) Mean (x)=1 and Mean (y)=3
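Both regression lines pass through the point of means, so the means can be found by solving the two equations simultaneously. A worked sketch:

```latex
\begin{aligned}
4\bar{x} - \bar{y} - 11 = 0 \;&\Rightarrow\; \bar{y} = 4\bar{x} - 11 \\
7\bar{x} - 3(4\bar{x} - 11) - 18 = 0 \;&\Rightarrow\; -5\bar{x} + 15 = 0 \;\Rightarrow\; \bar{x} = 3 \\
\bar{y} &= 4(3) - 11 = 1
\end{aligned}
```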
If y=3x+4 is the regression line ‘x on y’ and the arithmetic mean of x is -1 , what is the arithmetic mean of
y?
A) 1 b) -1
c) 7 d) 5
Suppose that four randomly chosen plots were treated with various levels of fertilizer, resulting in the following
yields of corn.
Fertilizer(kg/Acre) X 100 200 400 500
Production (Bushels/Acre) Y 70 70 80 100
i. Estimate the linear regression
ii. Estimate the yield when no fertilizer is applied.
iii. Estimate the yield when the average amount of fertilizer is applied.
iv. Estimate how much yield is increased for every kilogram of fertilizer
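The four parts of this exercise can be sketched in Python using the deviation form of the least-squares slope. The function name `fit_line` and the variable names are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope, mean_x) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope, mean_x


fertilizer = [100, 200, 400, 500]   # kg/acre
production = [70, 70, 80, 100]      # bushels/acre

intercept, slope, mean_x = fit_line(fertilizer, production)
yield_no_fertilizer = intercept                 # part ii: X = 0
yield_at_mean = intercept + slope * mean_x      # part iii: X = mean of X
increase_per_kg = slope                         # part iv

print(round(intercept, 2), round(slope, 2), round(yield_at_mean, 2))
```

Under this sketch the fitted line is roughly Ŷ = 59 + 0.07X, so the estimated yield at the average fertilizer level equals the mean yield of 80 bushels per acre.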
In regression, the dependent variable is assumed to be a random variable whereas the independent
variable is assumed to have:
A) Random values B) Fixed values
C) Both (a) or (b) D) None of these
The difference between the observed value and the estimated value in regression analysis is known as:
A) Error B) Residual
C) Deviation D) A or B
The errors in case of regression equations are:
A) Positive B) Negative
C) Zero D) All of these
The regression line of y on x is derived by:
A) The minimization of vertical distances in the scatter diagram
B) The minimization of horizontal distances in the scatter diagram
C) Both A and B D) A or B
Which of the following represents the proportion of variation in dependent variable that is explained by
the independent variable?
A) Co-efficient of determination B) Co-efficient of correlation
C) Regression co-efficient D) None of these
As the angle between two regression lines increases the correlation co-efficient:
A) Remains same B) increases
C) Decreases D) None of these
The independent variable is also called:
A) Regressor B) Predictor
C) Regression D) All of these
In regression problem, the independent and dependent variables are:
A) Both fixed B) Both random
C) Independent variable fixed & dependent variable random
D) Dependent variable fixed & independent variable random
If a constant amount of change in the predicted variable is associated with a unit change in the
predicting variable the relation is said to be:
A) Linear B) Non linear
C) Inverse D) None of these
The two regression co-efficients always have:
A) Opposite signs B) Same signs
C) Not definite D) No signs
Which of the following is true when the slope of a regression line is positive?
A) Correlation co-efficient between the dependent and independent variables is 1
B) The regression line is parallel to the horizontal line
C) There is positive correlation between the dependent and independent variables
D) None of these
The regression co-efficient is independent of change of:
A) Origin B) Scale
C) Both D) None of these
Regression equation is also called:
A) Predicting equation B) Estimating equation
C) Line of average relationship D) All A,B and C
The regression line of x on y derived by method of least square:
A) Minimizes the horizontal distances in scatter diagram
B) Maximizes the vertical distances in scatter diagram
C) Minimizes the vertical distances in scatter diagram
D) Maximizes the horizontal distances in scatter diagram
Correlation
Introduction
Correlation is the degree of covariation between variables. The correlation coefficient, denoted by r, is a
measure of the strength of the straight-line or linear relationship between two variables. The correlation
coefficient takes on values ranging between +1 and -1. The following points are the accepted guidelines for
interpreting the correlation coefficient
$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$
$r = \frac{S_{xy}}{S_x S_y}$
Note:
o If byx and bxy are both positive, then r is positive.
o If byx and bxy are both negative, then r is negative.
o The coefficient of correlation is a pure number; it does not depend on the units of measurement.
o The correlation coefficient is symmetrical with respect to X and Y, i.e. 𝑟𝑥𝑦 = 𝑟𝑦𝑥
o The correlation coefficient lies between –1 and +1. i.e. –1 ⩽ r ⩽+1.
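The two expressions for r, together with the sign rule and the symmetry noted above, can be checked numerically. The sketch below reuses the ten-point X/Y data from the regression exercise earlier in this worksheet; the helper name `sample_cov` is hypothetical.

```python
import math


def sample_cov(us, vs):
    """Sample covariance with divisor n-1; sample_cov(u, u) is the variance."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / (n - 1)


X = [1, 1, 2, 3, 4, 4, 5, 6, 6, 7]
Y = [2.1, 2.5, 3.1, 3.0, 3.8, 3.2, 4.3, 3.9, 4.4, 4.8]

s_xy = sample_cov(X, Y)
s_x2 = sample_cov(X, X)
s_y2 = sample_cov(Y, Y)

# r as covariance over the product of standard deviations
r_direct = s_xy / math.sqrt(s_x2 * s_y2)

# r as the signed square root of the product of the regression coefficients
b_yx = s_xy / s_x2
b_xy = s_xy / s_y2
r_from_slopes = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Symmetry: swapping the roles of X and Y leaves r unchanged
r_swapped = sample_cov(Y, X) / math.sqrt(s_y2 * s_x2)

print(round(r_direct, 4), round(r_from_slopes, 4), round(r_swapped, 4))
```

Both routes give the same value, which lies between -1 and +1 as the notes require.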
$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
The ranks are given:
Judge x 1 2 3 4 5 6 7 8
Judge y 6 5 1 4 3 2 8 7
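A minimal sketch of the rank correlation formula applied to the two judges' rankings, where d is the difference between the two ranks of each item. The function name `spearman_from_ranks` is hypothetical, and the formula as written assumes there are no tied ranks.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Rank correlation r = 1 - 6*sum(d^2) / (n*(n^2 - 1)), assuming no ties."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


judge_x = [1, 2, 3, 4, 5, 6, 7, 8]
judge_y = [6, 5, 1, 4, 3, 2, 8, 7]

sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(judge_x, judge_y))
r_s = spearman_from_ranks(judge_x, judge_y)

print(sum_d2, round(r_s, 4))
```

Here the squared differences sum to 60, giving r = 1 - 360/504, or about 0.286: a weak positive agreement between the judges.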
A 35 40 42 43 40 53 54 49 41 55
B 102 101 97 98 38 101 97 92 95 95
PRACTICE QUESTION
1. The two regression coefficients have the following values: 𝑏𝑦𝑥 = 0.86, 𝑏𝑥𝑦 = 0.95. Find r.
2. Find the coefficient of correlation if the two regression coefficients have the following values:
-0.1 and -0.4.
3. The following results are given: r = 0.60, 𝑆𝑥² = 9, 𝑏𝑥𝑦 = 0.80. Find 𝑆𝑦.
4. For a given set of data, we have r = 0.48, 𝑆𝑥² = 16, 𝑆𝑥𝑦 = 36. Find 𝑆𝑦.
5. For a set of 50 pairs of observations, the standard deviations of X and Y are 4.5 and 3.5
respectively. If the sum of products of deviations of X and Y values from their respective means
is 420, find Karl Pearson's coefficient of correlation.
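Questions 1 and 2 rely on r being the signed square root of the product of the two regression coefficients. A minimal sketch follows; the function name `r_from_regression_coefficients` is hypothetical, and the consistency checks (same sign, product not exceeding 1) follow from the stated limits -1 ⩽ r ⩽ +1.

```python
import math


def r_from_regression_coefficients(b_yx, b_xy):
    """Return r = ±sqrt(b_yx * b_xy), taking the common sign of the coefficients."""
    product = b_yx * b_xy
    if product < 0:
        # Coefficients with opposite signs cannot come from the same data
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        # Would imply |r| > 1, which is impossible
        raise ValueError("product of regression coefficients cannot exceed 1")
    return math.copysign(math.sqrt(product), b_yx)


r_q1 = r_from_regression_coefficients(0.86, 0.95)    # question 1
r_q2 = r_from_regression_coefficients(-0.1, -0.4)    # question 2

print(round(r_q1, 4), round(r_q2, 4))
```

The same helper flags inconsistent inputs, for example coefficients of opposite sign or a product above 1, by raising `ValueError`.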
The correlation coefficient between two variables:
A) is a unit free measure B) Is expressed as product of units of two variables
C) Is expressed in units of first variable D) Is expressed in units of second variable
Correlation co-efficient is independent of change of origin:
A) Is always false B) Is always true
C) can be false D) Can be true
In rank correlation the association should be linear:
A) False B) True
C) A & B D) None of these
If the values of two different variables (say x and y) are plotted on a rectangular axes, such a plot is
referred to as a:
A) Frequency diagram B) Value diagram
C) Scatter diagram D) None of these
From the inspection of scatter diagram if it is seen that the points follow closely a straight line, it
indicates that the two variables are to some extent:
A) Unrelated B) Related
C) Linearly related D) None of these
In a scatter diagram, if the points follow closely a straight line of positive slope, the two variables are
said to have:
A) No correlation B) High positive correlation
C) Negative correlation D) None of these
In a scatter diagram, if the points follow closely a straight line of negative slope, the two variables are
said to have:
A) No correlation B) High positive correlation
C) High negative correlation D) None of these
In a scatter diagram, if the points follow a strictly random pattern, the two variables are said to have:
A) No linear relationship B) Low positive relationship
C) Low negative relationship D) None of these
A measure of the strength or degree of relationship or the interdependence is called:
A) Correlation B) Regression
C) Least square estimate D) None of these
The phenomenon that investigates the dependence of one variable on one or more independent
variables is called:
A) Correlation B) Regression
C) Least square estimate D) None of these
The linear relation between a dependent and an independent variable is called:
A) Regression line B) Regression co-efficient
C) Co-efficient of correlation D) None of these
Slope of the regression line is called:
A) Regression parameter B) Sample parameter
C) Regression co-efficient D) None of these
In regression analysis, if the value of a is positive the value of b:
A) Must be positive B) May take any value
C) Must be negative D) Less than -1 or more than 1
The procedure which selects that particular line for which the sum of the squares of the vertical
distances from the observed points to the line is as small as possible, is called:
A) Sum of squares method B) Sum of squares of errors method
C) Least square method D) None of these
The numerical values of regression co-efficient must be:
A) Both positive B) Both negative
C) Both positive or both negative D) None of these
Which of the statements about Spearman’s Co-efficient of Correlation is NOT correct:
A) It can co-relate two or more set of rankings B) It applies only when no ties exist
C) Both (a) and (b) D) None of the above
If two variables tend to vary simultaneously in some direction, they are said to be:
A) Dependent B) Independent
C) Correlated D) None of these
If two variables tend to increase (or decrease) together, the correlation is said to be:
A) Zero B) Direct or positive
C) 1 D) None of these
If one variable tends to increase as the other variable decreases, the correlation is said to be:
A) Zero B) Inverse or negative
C) -1 D) None of these
While calculating "r", if x and y are interchanged, i.e. instead of calculating 𝑟𝑥𝑦, 𝑟𝑦𝑥 is calculated, then:
A) 𝑟𝑥𝑦 = 𝑟𝑦𝑥 B) 𝑟𝑥𝑦 > 𝑟𝑦𝑥
C) 𝑟𝑥𝑦 < 𝑟𝑦𝑥 D) None of these
Limits of the co-efficient of Correlation are:
A) -1 to 0 B) 0 to 1
C) -1 to +1 D) None of these
If r = 0.9 and if 5 is subtracted from each observation of x, then r will:
A) Decrease by 5 units B) Decreases by less than 5 units
C) Remain unchanged D) None of these
If r = 0.9 and if 5 is added to each observation of x, then r will:
A) Increase by 5 units B) Increase by more than 5 units
C) Remain unchanged D) None of these
If r = 0.9 and if 3 is subtracted from each observation of Y, then r will:
A) Decrease by 3 units B) Decrease by less than 3 units
C) Remain unchanged D) None of these
If r = 0.9 and if 3 is added to each observation of y, then r will:
A) Increase by 3 units B) Increase by more than 3 units
C) Remain unchanged D) None of these
If r = 0.9 and if 3 is subtracted from each observation of x and 5 is added to each observation of y, then
r will:
A) Decrease by 2 units B) Increase by 2 units
C) Remain unchanged D) None of these
If r = 0.9 and each observation of x is multiplied by 100, then r will:
A) Increase by 100 times B) Less than 100 times
C) Remain unchanged D) None of these
If r = 0.9 and each observation of Y is divided by 10, then r will:
A) Decrease by 10 times B) Decrease by less than 10 times
C) Remain unchanged D) None of these
If r = 0.9 and each observation of x and y is divided by 10, then r will:
A) Decrease by 10 times B) Decrease by 100 times
C) Remain unchanged D) None of these
The co-efficient of correlation is independent of:
A) Only origin B) Only scale
C) Origin and scale D) None of these
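The invariance probed by the preceding questions can be checked numerically. The sketch below reuses the ten-point X/Y data from the regression exercise earlier in this worksheet; the function name `pearson_r` is hypothetical. It covers shifts of origin and positive scale factors only, since multiplying one variable by a negative constant would flip the sign of r.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


X = [1, 1, 2, 3, 4, 4, 5, 6, 6, 7]
Y = [2.1, 2.5, 3.1, 3.0, 3.8, 3.2, 4.3, 3.9, 4.4, 4.8]

r_original = pearson_r(X, Y)
# Change of origin: subtract 3 from each x, add 5 to each y
r_shifted = pearson_r([x - 3 for x in X], [y + 5 for y in Y])
# Change of scale: multiply each x by 100, divide each y by 10
r_scaled = pearson_r([100 * x for x in X], [y / 10 for y in Y])

print(round(r_original, 4), round(r_shifted, 4), round(r_scaled, 4))
```

All three values agree up to floating-point rounding.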
The geometric mean of the two regression co-efficients is equal to:
A) Co-efficient of determination B) Co-efficient of correlation
C) Co-efficient of rank correlation D) None of these
If 𝑏𝑥𝑦 = -0.78 and 𝑏𝑦𝑥 = -0.45, then r is equal to:
A) +0.351 B) -0.351
C) Cannot be determined D) None of these
If 𝑏𝑥𝑦 = -0.78 and 𝑏𝑦𝑥 = 0.45, then r is equal to:
A) +0.351 B) -0.351
C) Cannot be determined D) None of these
If 𝑏𝑥𝑦 = +1.93 and 𝑏𝑦𝑥 = 0.6, then r is equal to:
A) 1.158 B) 1.0761
C) Data is fictitious D) None of these
If 𝑏𝑥𝑦 = 1.93 and 𝑏𝑦𝑥 = 0.51, then r is equal to:
A) 0.9843 B) 0.992
C) Data is fictitious D) None of these
If 𝑏𝑥𝑦 = -1.93 and 𝑏𝑦𝑥 = 0.51, then r is equal to:
A) -0.9843 B) -0.992
C) Data is fictitious D) None of these
The quantity which describes that the proportion (or percentage) of variation in the dependent
variable explained (or reduced) by the independent variable is called:
A) Co-efficient of determination B) Co-efficient of regression
C) Co-efficient of correlation D) None of these
If r = 0.8, then the variation in the dependent variable y due to independent variable x is about:
A) 80% B) 64%
C) 64% to 80% D) None of these
If r = 0.8 and byx = 1.04, then bxy is equal to:
A) 0.769 B) 0.615
C) Cannot be determined D) None of these
If 𝑟² = 0.796 and bxy = -1.04, then byx is equal to:
A) 0.765 B) -0.765
C) Cannot be determined D) None of these
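The last two questions both follow from r² = b_yx · b_xy. A worked sketch, reading the second question as giving r² = 0.796 (an assumption based on the stray superscript in the source):

```latex
\begin{aligned}
r = 0.8,\; b_{yx} = 1.04 \;&\Rightarrow\; b_{xy} = \frac{r^2}{b_{yx}} = \frac{0.64}{1.04} \approx 0.615 \\
r^2 = 0.796,\; b_{xy} = -1.04 \;&\Rightarrow\; b_{yx} = \frac{r^2}{b_{xy}} = \frac{0.796}{-1.04} \approx -0.765
\end{aligned}
```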
Correlation analysis aims at:
A) Predicting one variable for a given value of the other variable
B) Establishing relation between two variables
C) Measuring the extent of relation between two variables
D) Both B & C
What is spurious correlation?
A) It is bad relation between two variables
B) It is very low correlation between two variables
C) It is the correlation between two variables having no causal relation
D) It is negative correlation
Scatter diagram is considered for measuring:
A) Linear relationship between two variables B) Curvilinear relationship between two variables
C) Neither A or B D) Both A & B
If the plotted points in a scatter diagram lie from upper left to lower right, then the correlation is:
A) Positive B) Zero
C) Negative D) None of these
If the plotted points in a scatter diagram are evenly distributed, then the correlation is:
A) Zero B) negative
C) Positive D) None of these
If all the plotted points in a scatter diagram lie on a single line, then the correlation is:
A) Perfect positive B) Perfect negative
C) Both A & B D) Either A or B
The correlation between shoe size and intelligence is:
A) Zero B) Positive
C) Negative D) None of these
The correlation between the speed of an automobile and the distance travelled by it after applying the
brakes is:
A) Negative B) Zero
C) Positive D) None of these
A scatter diagram helps us to:
A) Find the nature of correlation between two variables
B) Compute the extent of correlation between two variables
C) Obtain the mathematical relationship between two variables
D) Both A & C
Pearson's correlation is used for finding:
A) Correlation for any type of relation B) Correlation for linear relation only
C) Correlation for curvilinear relation only D) Both A & C
Product moment correlation coefficient is considered for:
A) Finding the nature of correlation B) Finding the amount of correlation
C) Both A & B D) Either A or B
If the value of correlation coefficient is positive, then the points in a scatter diagram tend to cluster:
A) From lower left corner to upper right corner B) From lower left corner to lower right corner
C) From lower right corner to upper left corner D) From lower right corner to upper right corner
Product moment correlation coefficient may be defined as the ratio of:
A) The product of standard deviations of the two variables to the covariance between them
B) The covariance between the variables to the product of their variances
C) The covariance between the variables to the product of their standard deviations
D) Either B or C
The covariance between two variables is:
A) Strictly positive B) Strictly negative
C) Always 0 D) Either positive or negative or zero
Which of the following is NOT a possible value of the correlation coefficient?
A) Negative 0.9 B) Zero
C) positive 0.15 D) Positive 1.5
The coefficient of correlation between two variables:
A) Can have any unit
B) Is expressed as the product of units of the two variables
C) Is a unit free measure D) None of these
What are the limits of the correlation coefficient?
A) No limit B) -1 and 1
C) 0 and 1, including the limits D) -1 and 1, including the limits
For finding correlation between two attributes, we consider:
A) Pearson's correlation coefficient B) Scatter diagram
C) Spearman’s rank correlation coefficient D) Coefficient of concurrent deviations
For finding the degree of agreement about beauty, between two judges in a Beauty Contest, we use:
A) Scatter diagram B) Coefficient of rank correlation
C) Coefficient of correlation D) Coefficient of concurrent deviation
If there is a perfect disagreement between the marks in Geography and Statistics, then what would be the
value of rank correlation coefficient?
A) Any value B) Only 1
C) Only -1 D) B or C
When we are not concerned with the magnitude of the two variables under discussion, we consider:
A) Rank correlation coefficient B) Product moment correlation coefficient
C) Coefficient of concurrent deviation D) A or B but not C
What is the quickest method to find correlation between two variables?
A) Scatter diagram B) Method of concurrent deviation
C) Method of rank correlation D) Method of product moment correlation
What are the limits of the coefficient of concurrent deviations?
A) no limit B) Between -1 and 0, including the limiting values
C) Between 0 and 1, including the limiting values D) Between -1 and 1, including the limiting values
The correlation between shoe size and IQ is:
A) Positive B) Negative
C) Might be any D) None of these
If the coefficient of determination is positive, then 'r':
A) Must be positive B) Must be negative
C) Might be any D) None of these
Bivariate data are the data collected for:
A) One variable B) Two variables at different points of time
C) Two variables at the same point of time D) None of these
Correlation analysis is used to:
A) Predict one variable for a given value of the other variable
B) Establish relationship between two variables
C) Measure the extent of relation between two variables
D) Both B and C
If the plotted points in a scatter diagram lie from lower left to upper right, then correlation is:
A) Negative B) Positive
C) Perfect negative D) Perfect positive
Sign of product moment correlation co-efficient depends on:
A) Variance of X B) variance of Y
C) Co-variance D) Product of two standard deviation
Karl Pearson's correlation is the ratio of:
A) Two variances
B) The product of standard deviations of two variables to the covariance
C) The covariance to the product of standard deviations of two variables
D) The covariance to the product of variances of two variables
Which of the following methods takes the magnitude of observations into account:
A) Scatter diagram B) Correlation
C) Rank correlation D) All of these