Correlation and Regression
Fourth, once researchers decide to use correlations, they must check the assumptions
that underlie using the specific correlational tool. Naturally, the assumptions for different
types of correlations are different. Before researchers can complete correlation studies,
they must examine and report the evidence that assumptions underlying the use of the
statistics have been met.
A nonzero correlation does not mean that a causal relationship has been established. A
correlation coefficient is only a measure of how much the variables coincide with each
other. If there were a causal relationship between variables, we would expect to observe
a strong correlation, but it is not the case that a strong correlation alone means that a
cause-and-effect relationship exists. Other possibilities may explain high correlations.
3. Identifying a causal relationship often is viewed as a matter of logic rather than statistics, because claims about causal relationships between variables have traditionally been based on the fulfillment of the Humean criteria, as elaborated for the social sciences by Paul Lazarsfeld and his coworkers.
• A third factor may explain high correlations. For instance, until the virtual elimination of
polio, there was a strong correlation between the per capita amount of ice cream
consumed during a month by North Americans and the number of polio cases. Casual
observers wondered if something in ice cream might have contributed to susceptibility
to polio. Of course, the reason for the high correlation was that polio was a disease with
its highest incidence during the warm summer months. Naturally, ice cream sales tended
to increase during the summer months as well.
• Sometimes the causal relationship exists, but it is in the opposite direction presumed by
individuals. For instance, it was observed that the greater the number of small appliances
one owned, the fewer children one tended to have. This information seemed to suggest a
new breakthrough in family planning: sending small appliances such as toasters and hair
dryers to places where the population was exploding. Of course, the causal relationship
was not in the direction implied by the statement; it was in the opposite direction. If you
had few children, you could afford to buy small appliances for yourself, but if you had
many children, you might not have spare money to buy many small appliances.
• Sometimes correlations seem to advance causal relationships when the research
methods used did not permit drawing causal claims. For instance, survey research
showed a high correlation between one's amount of self-disclosure, trust, and
interpersonal solidarity (“feelings of closeness between people that develop as a result of
shared sentiments, similarities, and intimate behaviors” [Rubin, 1994, p. 223]). But the
survey measured all these matters at the same time. Identifying a clear starting point
may not really be possible. The survey method may show an association among variables,
but not which variable may trigger any effects.
In the case of a multivariate population, correlation can be studied through (a) the coefficient of multiple correlation and (b) the coefficient of partial correlation, whereas cause-and-effect relationships can be studied through multiple regression equations.
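As a rough illustration of the coefficient of partial correlation, the standard formula derives it from the three pairwise Pearson coefficients; it measures how X and Y covary once the influence of a third variable Z is held constant. The function names and data below are invented for this sketch:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """Correlation of x and y with the linear influence of z held constant:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

A raw correlation between x and y that is driven by a shared third variable z can shrink, or even reverse sign, once z is partialled out, which is the statistical counterpart of the ice cream and polio example above.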
The cross-tabulation approach is especially useful when the data are in nominal form. Under it, we classify each variable into two or more categories and then cross-classify the variables in these subcategories.
Then we look for interactions between them, which may be symmetrical, reciprocal, or asymmetrical. A symmetrical relationship is one in which the two variables vary together, but we assume that neither variable is due to the other. A reciprocal relationship exists when the two variables mutually influence or reinforce each other. An asymmetrical relationship is said to exist if one variable (the independent variable) is responsible for another variable (the dependent variable). The cross-classification procedure begins with a two-way table which indicates whether or not there is an interrelationship between the variables. This sort of analysis can be further elaborated by introducing a third factor into the association through cross-classifying the three variables. By doing so, we may find a conditional relationship in which factor X appears to affect factor Y only when factor Z is held constant. The correlation, if any, found through this approach is not considered a very powerful form of statistical correlation, and accordingly we use other methods when the data happen to be ordinal, interval, or ratio data.
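The cross-classification procedure can be sketched in Python as follows; the nominal categories and data here are hypothetical, chosen only to show the mechanics of building a two-way table and then conditioning on a third factor:

```python
from collections import Counter

# Hypothetical nominal observations: (variable X, variable Y) pairs
observations = [
    ("male", "yes"), ("male", "no"), ("female", "yes"),
    ("female", "yes"), ("male", "yes"), ("female", "no"),
]

# Two-way table: a count for each cross-classified cell (X category, Y category)
table = Counter(observations)
print(table[("female", "yes")])  # → 2

# Elaboration: introduce a third factor Z and cross-classify
# within each level of Z, i.e. hold Z constant
data3 = [
    ("male", "yes", "urban"), ("female", "no", "rural"),
    ("male", "no", "urban"), ("female", "yes", "rural"),
]
by_z = {}
for x, y, z in data3:
    by_z.setdefault(z, Counter())[(x, y)] += 1
print(by_z["urban"][("male", "yes")])  # → 1
```

Comparing the (X, Y) tables across the levels of Z is what reveals a conditional relationship: an X–Y association that appears only when Z is held constant.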
Charles Spearman’s coefficient of correlation (or rank correlation) is a technique for determining the degree of correlation between two variables in the case of ordinal data, where ranks are given to the different values of the variables. The main objective of this coefficient is to determine the extent to which the two sets of rankings are similar or dissimilar.
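The rank coefficient can be sketched using Spearman’s well-known formula for untied ranks, ρ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each observation. The function names below are illustrative:

```python
def ranks(values):
    """Assign rank 1 to the smallest value, n to the largest (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Because only the rankings matter, ρ equals 1 for any monotonically increasing relationship between the two variables, not just a linear one.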
SIMPLE REGRESSION ANALYSIS
Regression is the determination of a statistical relationship between two or more variables. In simple regression we have only two variables: one variable (defined as independent) is taken to be the cause of the behaviour of the other (defined as the dependent variable). Regression can only interpret what exists physically, i.e., there must be a physical way in which the independent variable X can affect the dependent variable Y. The basic relationship between X and Y is given by
Y’ = a + bX, where the symbol Y’ denotes the estimated value of Y for a given value of X. This equation is known as the regression equation of Y on X (it also represents the regression line of Y on X when drawn on a graph). The coefficient b indicates that each unit change in X produces a change of b in Y’; it is positive for direct and negative for inverse relationships.
Thus, regression analysis is a statistical method for formulating a mathematical model depicting the relationship among variables, which can then be used to predict the values of the dependent variable, given the values of the independent variable.
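A minimal sketch of estimating a and b from data uses the standard least-squares estimates, b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄ (the function name is illustrative):

```python
def fit_line(x, y):
    """Least-squares estimates of a and b in the regression equation Y' = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: covariation of X and Y divided by the variation of X
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    # Intercept: the line passes through the point of means (X-bar, Y-bar)
    a = my - b * mx
    return a, b

# These points lie exactly on y = 1 + 2x, so the fit recovers a = 1, b = 2
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # → 1.0 2.0
```

Once a and b are estimated, a predicted value Y’ for any given X is simply a + bX, which is the sense in which the regression equation is used for prediction.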