Correlation and Regression 2

The document discusses the relationship between two interval scale variables and the use of correlation and regression analysis to understand and predict the relationship. It defines linear regression as estimating the means of the dependent variable (Y) for different values of the independent variable (X) to find the best fitting regression line. The key assumptions of the linear regression model are that the relationship is linear, the variables are normally distributed, and the variances are equal. The method of least squares is described as the approach to estimating the regression line parameters (a and b) by minimizing the vertical distances between the data points and the line. An example calculation is provided. Correlation is then introduced as a measure of the strength of the relationship between the variables.


CORRELATION AND REGRESSION
RELATIONSHIP BETWEEN TWO INTERVAL SCALES

Ramakant Agrawal 14-09-2020


Correlation and regression
Introduction
• Relationship between two interval scales, e.g. the number of years of education completed (X) and the annual income of adults (Y), or the percentage of the labor force engaged in manufacturing (X) and a city’s population growth (Y).
• We may be interested not only in measures of the degree of relationship but also in describing the nature of the relationship between the two variables, so that we can predict the value of one variable if we know the other, e.g. we may want to predict a person’s future income from his/her education.
• When interest is focused primarily on the exploratory task of finding out which variables are related to a given variable, we
are likely to be mainly interested in measures of degree or strength of relationship such as correlation coefficients. On the
other hand, once we have found the significant variables we are more likely to turn our attention to regression analysis in
which we attempt to predict the exact value of one variable from the other.
• It will be advisable to begin our discussion with the prediction problem, because the notion of regression is both logically prior to and theoretically more important than that of correlation. After discussing the prediction problem, we shall turn our attention to measuring the strength of relationship.
Linear Regression and Least Squares
The Prediction Problem

• The ultimate goal of all sciences is that of prediction. This does not imply, of course, that one is only secondarily interested in “understanding” why two or more variables are interrelated as they are. Perhaps it is correct to say that such “understanding” is the ultimate goal and that, to the degree that understanding becomes perfected, prediction becomes more and more accurate.
• Suppose there is a dependent variable Y which is to be predicted from an independent variable X. In some problems X will clearly precede Y in time. For example, a person usually completes his education before earning his income. We want to be careful not to imply a necessary or causal relationship, or that X is the only variable influencing the value of Y. We may be equally interested in predicting Y from X or X from Y. Let us assume, however, that Y is taken as the dependent variable.
• If X and Y are independent, we cannot predict Y from X; or, more exactly, knowledge of X does not improve our prediction of Y. For example, we may wish to estimate a person’s future income, given that he/she has completed three years of college. Without this knowledge of education, our best guess would be the mean income of all adults. Knowing his/her education, however, ought to enable us to obtain a better prediction.
The Regression Equation
Conceptualising the Problem

• Let us conceptualize the problem in the following way. We imagine that for every fixed value of the
independent variable X (education) we have a distribution of Y’s (incomes). In other words, for each
educational level there will be a certain income distribution in the population. Not all persons who
have completed high school will have exactly the same income, but these incomes will be distributed
about some mean. There will be similar income distributions for school pass outs, college graduates,
post-graduates, etc.
• Each of these separate income distributions (for fixed X’s) will have a mean, and we can plot the position of these means in the familiar X-Y coordinates. We refer to the resulting path of these means of the Y’s for fixed X’s as the regression equation of Y on X. Diagram 17.1 is shown on the next slide. In Fig. 17.1 we have indicated the general nature of regression equations as involving the paths of the means of Y values for given values of X.
ASSUMPTIONS OF THE REGRESSION MODEL
• That the form of the regression equation is linear,
• That the distributions of the Y values for each X are normal, and
• That the variances of the Y distributions are the same for each value of X.
• If the regression of Y on X is linear, or a straight-line relationship, we can write an equation as follows:

Y = 𝜶 + βX
• Where both 𝜶 and β are constants. Greek letters have been used since for the present we are dealing with the total
population. If we set X equal to zero, we see that Y= 𝜶. Therefore, 𝜶 represents the point where the regression line
crosses the Y axis (i.e., where X = 0).
• The slope of the regression line is given by β since this constant indicates the magnitude of the change in Y for a
given change in X.
Assumptions Contd….
• It was indicated that we shall assume that the Y’s are distributed normally about each value of X. It will also be
convenient to assume that for each fixed value of Y the X’s are also distributed normally. We say that the joint
distribution of X and Y is a bivariate normal distribution, meaning that there are two variables, each of which is
distributed about the other normally.
• The bivariate normal distribution has the property that the regression of Y on X is linear. Therefore if we have a
bivariate normal distribution, we know that if we trace the means of Y’s for each X the result will be a straight line.
It does not follow, however, that if the regression is linear, the joint distribution is necessarily bivariate normal.
Bivariate normality will have to be assumed when we come to tests of significance.
• We shall also need to assume that the standard deviations of the Y’s for each X are the same regardless of the value of X. If the joint distribution is bivariate normal, the standard deviations of the Y’s for each X will in fact all be identical. This property of equal variances is referred to as homoscedasticity.
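These properties can be illustrated with a small simulation. The sketch below uses hypothetical parameter values (𝜶 = 2, β = 0.5, σ = 1, none of which come from the text) to generate bivariate normal data, then checks that the means of the Y’s within narrow bands of X lie close to a straight line, and that the spread of the Y’s is roughly the same in every band (homoscedasticity):

```python
import random
import statistics

random.seed(1)

# Hypothetical population parameters (illustrative only, not from the text).
# Y has a linear conditional mean in X, and the noise standard deviation
# does not depend on X -- this is the homoscedasticity assumption.
alpha, beta, sigma = 2.0, 0.5, 1.0

data = []
for _ in range(20000):
    x = random.gauss(0, 1)
    y = alpha + beta * x + random.gauss(0, sigma)
    data.append((x, y))

# Group the Y values into narrow bands of X and inspect the conditional
# means (they should track the line) and standard deviations (roughly equal).
for lo in (-1.0, 0.0, 1.0):
    band = [y for x, y in data if lo <= x < lo + 0.5]
    mid = lo + 0.25  # midpoint of the band
    print(f"band [{lo:.1f}, {lo + 0.5:.1f}): mean Y = {statistics.mean(band):.2f} "
          f"(line predicts {alpha + beta * mid:.2f}), "
          f"SD = {statistics.stdev(band):.2f}")
```

The printed conditional means should fall near the line 2 + 0.5X, and the standard deviations should all be near 1, in line with the assumptions above.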
LINEAR LEAST SQUARES
STEPS TO FOLLOW

• Draw a scatter diagram from the data pairs of X and Y. This will help identify whether an
association exists or not.
• Having plotted the scores on a scattergram, approximate these points by some sort of a best-
fitting curve
• One way of doing this is to draw a curve (in this case a straight line) by inspection (diagram in the next slide). There are other, more precise methods of doing this, however. One of these is the method of least squares, which will be discussed in the present section.
Least Squares Theory
• This means that we shall fit the data with a best-fitting straight line according to the least-squares criterion, getting an equation of the form

Ŷ = a + bX
• It will then turn out that the a and b obtained by this method are the most efficient unbiased estimates of the
population parameters 𝜶 and β if the regression equation actually is a straight line.
• The least-squares criterion involves our finding the unique straight line which has the property that the sum of
the squares of the deviations of the actual Y values from this line is a minimum.
• Thus if we draw vertical lines from each of the points to the least-squares line, and if we square these
distances and add, the resulting sum will be less than a comparable sum of squares from any other possible
straight line (see diagram in the next slide).
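The criterion can be checked numerically. In this sketch (with hypothetical data points, not the text’s Table 17.1), a brute-force search over a grid of candidate lines shows that a single line minimizes the sum of squared vertical deviations:

```python
# Hypothetical data points (illustrative only, not from the text).
pts = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 7)]

def sse(a, b):
    """Sum of squared vertical deviations of the points from the line Y = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in pts)

# Brute-force search over a grid of candidate intercepts and slopes;
# the least-squares line is the one with the smallest sum of squares.
best = min(
    ((a / 100, b / 100) for a in range(0, 301) for b in range(0, 201)),
    key=lambda ab: sse(*ab),
)
print(best)  # close to the analytic least-squares solution for these points
```

Of course, the grid search is only for illustration; the least-squares line is found directly from the formulas for a and b rather than by search.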
Formulae to compute a and b

b = (NΣXY − ΣX ΣY) / (NΣX² − (ΣX)²)

a = (ΣY − bΣX) / N = Ȳ − bX̄
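These formulae can be sketched directly in code. The (X, Y) values below are hypothetical stand-ins, since the actual Table 17.1 figures are not reproduced here:

```python
# Hypothetical (X, Y) pairs standing in for Table 17.1 (illustrative only).
xs = [10, 15, 20, 25, 30]       # e.g. per cent minority
ys = [150, 280, 390, 520, 610]  # e.g. income differential

# The sums that, together with N, are all we need for regression.
n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Least-squares slope and intercept from the computational formulae.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(f"b = {b:.2f}, a = {a:.2f}")
```

Note that only four of the five sums mentioned in the text are needed for a and b; the fifth, ΣY², is used later for the correlation coefficient.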
Numerical Example of Regression Equation Estimation

• Suppose we have the data given in Table 17.1 below, with X representing the percentage of minorities in certain
North Indian cities and Y indicating the difference between majority and minority median incomes as a measure of
economic discrimination.

• From the raw data we can compute five sums which, together with N, are all that we need in order to handle
regression and correlation problems. All but one of these sums will be used in computing a and b. Computations can
be summarized as follows:
Predicting from the regression equation

• Since discrimination scores indicate differences (in dollars) between the median incomes of
Majority and Minority, we see that an increase of 1 per cent Minority corresponds to a
difference of $19.93 in the median incomes of majority and minority. A scattergram and the
least-squares equation have been drawn in Fig. 17.6. To illustrate the use of such a prediction
equation, if we knew that there were 8 per cent minorities in a given city, the estimated median
income differential would be
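A prediction from such an equation can be sketched as follows. The slope of 19.93 is the value stated in the text; the intercept used here is a hypothetical placeholder, since the computations from Table 17.1 are not reproduced on this slide:

```python
# Slope from the text; the intercept is a HYPOTHETICAL placeholder value,
# used only to make the prediction function runnable for illustration.
b = 19.93
a = 50.0  # hypothetical intercept, for illustration only

def predict(percent_minority):
    """Estimated median income differential (dollars) for a given per cent minority."""
    return a + b * percent_minority

print(f"${predict(8):.2f}")  # estimate for a city with 8 per cent minority
```

With the actual intercept computed from Table 17.1 in place of the placeholder, this reproduces the estimate discussed in the text.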
Figure 17.6: Scattergram and least-squares line for data of Table 17.1 (horizontal axis: per cent minority).
Correlation

• It is necessary to know the degree or strength of the relationship. Obviously, if the relationship is very weak there is no point in trying to predict Y from X.
• Correlations of a very high order are necessary for even moderately accurate prediction.
• The correlation coefficient was introduced by Karl Pearson and is often referred to as product-
moment correlation in order to distinguish it from other measures of association.
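The product-moment coefficient can be computed from the same sums used for regression, plus the fifth sum ΣY². A sketch with hypothetical paired observations:

```python
import math

# Hypothetical paired observations (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)  # the fifth sum, needed only for correlation

# Pearson product-moment correlation coefficient.
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(f"r = {r:.3f}")
```

Note that the numerator is the same quantity as the numerator of the slope b, which reflects the close connection between regression and correlation: r measures how strong the linear relationship is, while b gives its magnitude.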
THANK YOU