
Appendix

REGRESSION AND CORRELATION ANALYSIS – A QUICK REFERENCE¹

¹ This part is based on Mica enciclopedie matematica (Mathematics at a Glance), Ed. Tehnica, Bucuresti, 1975, pp. 748-749, and Scherer, F.M., Industrial Market Structure and Economic Performance, Chicago, 1980, pp. 601-607.

Variables

For certain observed phenomena, we have:
- a dependent variable, or the variable (phenomenon) whose variation
  is to be explained, and
- independent variables, or the variables believed on the basis of
  economic theory to influence the dependent variable.

Relationship (the regression equation)

We assume a linear relationship between two variables, characterized by a
straight line of the form

Y = a + bX,

where
- Y (the dependent variable) is said to be regressed on X
  (the independent variable);
- a is the vertical intercept;
- b is the slope.

The relationship is visualized by plotting the actual observations on a
graph and fitting a straight regression line through them, using the
"least squares" criterion (a mathematical convenience). The intercept
value says that if one could extrapolate to a situation of zero X, the
average level of Y is predicted to be "a". The slope value "b" says that
with each percentage point increase in X, Y rises on average by "b"
percentage points.

A simple computational method for obtaining the numerical values of "a"
and "b" (the regression coefficients) is to deduct from each observation
the mean value (X̄, Ȳ), obtaining the corresponding deviations from the
mean (Xi − X̄; Yi − Ȳ).

$$
b = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}
  = \frac{\sum_{i=1}^{n}X_iY_i-\frac{1}{n}\left(\sum_{i=1}^{n}X_i\right)\left(\sum_{i=1}^{n}Y_i\right)}{\sum_{i=1}^{n}X_i^2-\frac{1}{n}\left(\sum_{i=1}^{n}X_i\right)^2}
$$

$$
a = \bar{Y} - b\,\bar{X}
$$
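As a numerical illustration, the following is a minimal Python sketch of this computation; the data values and the helper name fit_line are hypothetical, not part of the source:

```python
# A minimal sketch of the two-variable least-squares computation above.
# The data and the function name fit_line are hypothetical illustrations.

def fit_line(x, y):
    """Return (a, b) for the regression line Y = a + b*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared X-deviations.
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: a = Y-bar - b * X-bar.
    a = y_bar - b * x_bar
    return a, b

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")  # a ≈ 0.05, b ≈ 1.99 for this data
```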

If there is more than one explanatory variable, a multiple regression
analysis is required, of the form

Y = a + b1X1 + b2X2 + … + bkXk,

where b1, b2, ..., bk are multiple regression coefficients analogous to
the single slope coefficient "b" in the two-variable regression equation.
Each reflects the influence on Y of the independent variable to which it
relates, given the values of the other explanatory variables.
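For the multiple case, a sketch using NumPy's least-squares solver; the design matrix and data below are hypothetical, constructed so the true coefficients are known:

```python
import numpy as np

# A sketch of multiple regression via ordinary least squares (NumPy).
# Hypothetical data, built so that Y = 1 + 2*X1 + 1*X2 exactly.
# The leading column of ones lets the solver estimate the intercept a.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 5.0, 2.0],
              [1.0, 7.0, 3.0],
              [1.0, 8.0, 5.0]])
Y = np.array([6.0, 9.0, 13.0, 18.0, 22.0])

coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
intercept, b1, b2 = coef
print(f"a = {intercept:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")  # 1, 2, 1
```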

Correlation

The correlation coefficient (r) answers how strong the relationship
under observation is. The measurement produces
- the sum of squares of deviations between observed and mean values
  of Y (SSDM);
- the sum of squares of deviations between observed and predicted
  values of Y (deviations from regression) (SSDR).
We can now say that through regression we have explained
(SSDM − SSDR)/SSDM × 100 percent of the SSDM. This value in ratio form
is called r² (r-square) or, in analyses with additional explanatory
variables, R². The maximum possible value of r² is 1.0, and the minimum
possible value is 0.
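Continuing the hypothetical two-variable sketch above, the decomposition can be checked numerically (SSDM and SSDR follow the text's definitions):

```python
# Decomposition of the variation in Y for the two-variable sketch above,
# reusing x, y, a, b from fit_line.
y_bar = sum(y) / len(y)
y_hat = [a + b * xi for xi in x]  # values predicted by the regression line

ssdm = sum((yi - y_bar) ** 2 for yi in y)               # deviations from mean
ssdr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # deviations from regression

r_squared = (ssdm - ssdr) / ssdm  # share of SSDM explained by the regression
print(f"r^2 = {r_squared:.4f}")   # ≈ 0.997 for the hypothetical data
```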
The unsquared correlation coefficient, r, can take on either positive or
negative values ranging from +1.0 to -1.0. A formula to calculate r is
given by

$$
r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}
$$
A negative correlation reveals that the variables under investigation
vary inversely; the nearer the coefficient comes to the extremes of +1.0
or -1.0, the stronger the correlation is said to be.
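On the same hypothetical data, a direct sketch of the product-moment formula above; its square should agree with the r² obtained from the SSDM decomposition:

```python
import math

# Direct product-moment computation of r for the same hypothetical data.
def correlation(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                    * sum((yi - y_bar) ** 2 for yi in y))
    return num / den

r = correlation(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")  # r > 0: the variables vary together
```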

Statistical significance

The value estimated in the regression equation ("b") can explain the
relationship with a certain degree of reliability. The standard error of
the estimated slope coefficient is given by

$$
\sqrt{\frac{SSDR/(N-2)}{\sum(X-\bar{X})^2}},
$$

where N is the number of observations in the sample.
The ratio of the computed slope coefficient to this standard error gives
the t-ratio, which provides information – with reference to standard
statistical tables – about the probability of wrongly inferring the
existence of relationships that owe their appearance in data samples more
to chance than to some link of genuine economic significance. As a crude
rule of thumb, economists conclude that observed relationships are
"statistically significant" when a coefficient exceeds its standard error
by two times or more.
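Continuing the running sketch (reusing x, b, ssdr, and math from above), the standard error and t-ratio can be computed as follows; the two-standard-errors rule is the text's own:

```python
# Standard error of the slope and the t-ratio for the running example,
# with N = len(x) observations.
n = len(x)
x_bar = sum(x) / n
se_b = math.sqrt((ssdr / (n - 2)) / sum((xi - x_bar) ** 2 for xi in x))
t_ratio = b / se_b
print(f"se(b) = {se_b:.4f}, t = {t_ratio:.1f}")
# Rule of thumb from the text: a coefficient at least twice its standard
# error (|t| >= 2) is taken to be statistically significant.
```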

Summary
A regression equation of the form

Y = 4.99 + 0.088 X1 + 0.637 X2
           (.018)     (.211)

R² = 0.862

reads as follows:
The intercept value of 4.99 predicts the Y level when both X1 and X2 are
zero. Y rises by 0.088 percentage points for every percentage point
increase in X1 and by 0.637 for every percentage point increase in X2.
R² reveals the proportion of SSDM explained by the independent variables.
The values given in parentheses are the standard errors of the regression
coefficients. Both standard errors are less than a third of the
coefficient values (i.e. the t-ratios are above 3.0), suggesting that X1
and X2 each have a statistically significant influence on Y.
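As an arithmetic check of this reading, the t-ratios implied by the reported coefficients and standard errors can be computed directly:

```python
# t-ratios implied by the summary example's reported figures.
for name, coef, se in [("X1", 0.088, 0.018), ("X2", 0.637, 0.211)]:
    print(f"{name}: t = {coef / se:.2f}")  # X1: t ≈ 4.89, X2: t ≈ 3.02
```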
