Lecture 8 Bivariate Data

The document discusses bivariate frequency distribution, focusing on correlation and simple regression analysis. It explains the concepts of correlation, regression equations, and techniques for determining correlation, including scatter diagrams and various correlation coefficients. Additionally, it highlights common errors in correlation interpretation and provides examples to illustrate the application of these statistical methods.

Bivariate Frequency Distribution: Correlation and Simple Regression Analysis

Harold C Banda

Phone : +265 9997-733-78/8893-733-57.

Email : [email protected]

Paired Data
Is there a relationship?
If so, what is the equation?
Use the equation for prediction.

Assumptions of Correlation
The sample of paired data (x, y ) is a random sample.
The pairs of (x, y ) data have a bivariate normal distribution.

Introduction
Correlation measures the strength of the relationship between
two variables, while regression is a way of representing that
relationship.
Thus, correlation is the extent to which the two
variables vary directly (positive correlation) or inversely
(negative correlation).
The degree of relationship is expressed as a numeric index
called the coefficient of correlation denoted by r.

Properties/Interpretation of Correlation coefficient

−1 ≤ r ≤ 1
A value of r=1 means perfect positive correlation.
A value of r=-1 means perfect negative correlation.
0 < r < 1 means positive partial correlation.
−1 < r < 0 means negative partial correlation.
r=0 means no correlation (absence of a linear relationship
between the two variables).
r is not affected by which variable is labelled x and which is
labelled y (the variables can be interchanged).
r measures the strength of a linear relationship.
The value of r does not change if all values of either variable
are converted to a different scale (for example, a change of units).
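The interchange and rescaling properties can be checked numerically. The sketch below uses small made-up data (not from the lecture) and a helper named `pearson_r`, which is a hypothetical name written here using the mean-centred definition of r.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the mean-centred definition (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data, chosen only for this check.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)               # variables interchanged
x_cm = [100.0 * v for v in x]        # change of units, e.g. metres to centimetres
r_rescaled = pearson_r(x_cm, y)

print(r_xy, r_yx, r_rescaled)        # the three values agree up to rounding error
```

The rescaling here multiplies by a positive constant; multiplying one variable by a negative constant would flip the sign of r while leaving its magnitude unchanged.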

Techniques for determining correlation

1 Inspection of a scatter diagram: a graph in which the paired
(x, y) sample data are plotted with a horizontal x-axis and a
vertical y-axis. Each individual (x, y) pair is plotted as a
single point.

Exercise
Consider the paired data:
(x, y ) : (2, 1.4), (4, 1.8), (8, 2.1), (8, 2.3), (9, 2.6).
Draw a scatter diagram and comment on the relationship.

2 Pearson's product-moment correlation coefficient, which is
found by using the formula
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^2 - (\sum x)^2\,]\,[\,n\sum y^2 - (\sum y)^2\,]}} $$
where x = the values of the independent variable,
y = the values of the dependent variable,
n = the number of paired data points in the sample.

Example
Refer to the bivariate data set below, which gives the number of
hours (X) six students studied for a final exam and their final
exam scores (Y).
Hours of study (X) Exam score (Y)
3 86
5 95
4 92
4 83
2 78
3 82
Calculate the correlation coefficient between hours studied
and exam score and interpret your results.
From the table above, we have the following results: $n = 6$;
$\sum x = 21$; $\sum y = 516$; $\sum xy = 1835$; $\sum x^2 = 79$;
$\sum y^2 = 44582$.
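Substituting these results in the given formula, we get
$$ r = \frac{6(1835) - (21)(516)}{\sqrt{[\,6(79) - 21^2\,]\,[\,6(44582) - 516^2\,]}} = \frac{174}{\sqrt{(33)(1236)}} \approx 0.862. $$
Interpretation: There is a strong positive correlation
between hours of study and exam score. In this sample, students
who studied for more hours tended to score higher.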
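The substitution can be reproduced with a few lines of Python. This is a minimal sketch that builds the sums from the table and applies the raw-sum formula for r; the variable names are chosen here for illustration.

```python
import math

# Data from the example: hours of study (X) and exam score (Y).
hours = [3, 5, 4, 4, 2, 3]
score = [86, 95, 92, 83, 78, 82]

n = len(hours)
sum_x = sum(hours)
sum_y = sum(score)
sum_xy = sum(x * y for x, y in zip(hours, score))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in score)

# Raw-sum (computational) formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 3))
```

Printing the intermediate sums makes it easy to compare each one against the hand calculation before looking at r itself.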

Note: It is important to understand the limitations of
correlation as a measure.
While we have seen in the previous example a high correlation
between hours of study and test score, is there a causal
connection?
External evidence may lead us to think that studying for more
hours causes a higher score, but it is quite possible that some
gifted students would also score highly without spending more
hours on their studies.
We have to look at the other evidence and, unless we are
carrying out an experiment, we have no idea what the causal
connection is between two variables.

3 Spearman's rank correlation coefficient, which is found by
using the formula
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
where d is the difference between the two ranks for any one
item, and n is the number of items involved.
Example: The mid-semester results in Mathematics and Costing
of a sample of 6 students were as follows:
STUDENTS MATHEMATICS COSTING
John 98 77
Annie 72 84
Peter 52 50
Chikondi 65 64
Mary 45 49
George 50 20

Use Spearman's rank correlation coefficient to investigate
whether there is a relationship between ability in Mathematics and
Costing.

Solution:
STUDENTS   Rank in MATHS   Rank in COSTING   d    d^2
John       1               2                 -1   1
Annie      2               1                 1    1
Peter      4               4                 0    0
Chikondi   3               3                 0    0
Mary       6               5                 1    1
George     5               6                 -1   1
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 4}{6(6^2 - 1)} = 0.886. $$

This means that ability in Mathematics and ability in Costing are strongly positively related.
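The ranking and the formula can be sketched in Python as follows. The helper `ranks_desc` is a hypothetical name; it gives rank 1 to the highest mark and assumes there are no tied marks, which holds for this data set but not in general.

```python
def ranks_desc(values):
    """Rank 1 = largest value. Assumes no ties (true for this example)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

students = ["John", "Annie", "Peter", "Chikondi", "Mary", "George"]
maths = [98, 72, 52, 65, 45, 50]
costing = [77, 84, 50, 64, 49, 20]

rank_maths = ranks_desc(maths)
rank_costing = ranks_desc(costing)

# d = difference between the two ranks for each student.
d = [rm - rc for rm, rc in zip(rank_maths, rank_costing)]
sum_d2 = sum(di ** 2 for di in d)

n = len(students)
R = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

for name, rm, rc, di in zip(students, rank_maths, rank_costing, d):
    print(name, rm, rc, di, di ** 2)
print(round(R, 3))
```

With tied marks the usual convention is to assign average ranks, and the simple formula above is then only an approximation.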

Common Errors Involving Correlation

Causation: It is wrong to conclude that correlation implies
causality.
Averages: Averages suppress individual variation and may
inflate the correlation coefficient.
Linearity: There may be some relationship between x and y
even when there is no significant linear correlation.

Coefficient of Determination
$r^2$ is called the coefficient of determination and it gives the
proportion of the total variation in the dependent variable
which is explained by the variation in the independent variable.
From the example above, r = 0.862 (3 d.p.).
So, $r^2 = 0.862 \times 0.862 = 0.743$.
Thus 74.3% of the variation in the exam scores is explained by the
variation in the hours of study (x).
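The "proportion of variation explained" reading can be checked directly. The sketch below, using the hours and exam score data, fits the least squares line of y on x and compares 1 - SSE/SST with the square of r. The names `sse` and `sst` (residual and total sums of squares) are introduced here for illustration and do not appear in the lecture.

```python
import math

hours = [3, 5, 4, 4, 2, 3]
score = [86, 95, 92, 83, 78, 82]
n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(score) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score))
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in score)

r = sxy / math.sqrt(sxx * syy)

# Least squares line of y on x, then the variation it leaves unexplained.
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, score))
sst = syy  # total variation of y about its mean

r_squared_from_fit = 1 - sse / sst
print(round(r ** 2, 3), round(r_squared_from_fit, 3))
```

Working from the unrounded r gives a value of about 0.742; squaring the rounded value 0.862 gives 0.743, so the two differ only in the third decimal place.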

Covariance
Covariance extends the idea of the variance of one variable (how
spread out its values are) to a pair of variables: it measures how
the two variables vary together.
It is calculated as follows:
$$ s_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n - 1}. $$
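As an illustration, the sample covariance of the hours and exam score data from the earlier example can be computed as below. The last step uses the standard identity that r equals the covariance divided by the product of the two sample standard deviations; that identity is not stated in the lecture and is included here only as a cross-check.

```python
import math

hours = [3, 5, 4, 4, 2, 3]
score = [86, 95, 92, 83, 78, 82]
n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(score) / n

# Sample covariance with the n - 1 divisor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score)) / (n - 1)

# Sample standard deviations, also with the n - 1 divisor.
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in hours) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in score) / (n - 1))

r = s_xy / (s_x * s_y)
print(round(s_xy, 3), round(r, 3))
```

Unlike r, the covariance carries units (here hours times marks), so its size depends on the scales of measurement.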

Regression Equation
Given a collection of paired data, the regression equation
y = a + bx algebraically describes the relationship between
the two variables (x, y).
Regression Line is the line of best fit or least-squares line
which connects the two variables.
Given a value $x_i$ with its corresponding observed value $y_i$,
plugging $x_i$ into the equation (y = bx + a) yields, say, $\hat{y}_i$ as an
estimate of $y_i$.
The difference in the estimation is $y_i - \hat{y}_i$.

Linear Model or structure

Independent/predictor variable (x): A single numerical
variable assumed to measure a cause.
Dependent/response variable (y): A single numerical
variable assumed to measure an effect.
Equation model: $\hat{y} = \beta_1 x + \beta_0$.
Note: In a regression context, making a prediction means
taking an x-value that is not found in our sample, and
calculating a y -value for that individual. The ability to make
these sorts of predictions is very valuable in business, simply
because measurement costs money. If we can measure just
some of the variables and then calculate the rest, we can save
money, time, and resources.

Least squares Regression line

We want a line y = bx + a for which the differences $y_i - \hat{y}_i$
are as small as possible.
To do this we find the b and a which minimise $\sum (y_i - \hat{y}_i)^2$.
Such a line is called the Least Squares Regression Line.
Since y is assumed to be dependent on x, we call this line the
Least Squares Regression Line of y on x.
From calculus we obtain
$$ b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} $$
and $a = \bar{y} - b\bar{x}$ (this follows from the fact that the
point $(\bar{x}, \bar{y})$ lies on the line).
Hence we end up with the line y = bx + a.
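The calculus step is only mentioned, so a short sketch of it may help. Writing S(a, b) for the sum of squared differences (a symbol introduced here, not in the lecture), setting both partial derivatives to zero gives the two normal equations, from which the expressions for a and b follow.

```latex
\begin{align*}
S(a, b) &= \sum_i (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} = -2 \sum_i (y_i - a - b x_i) = 0
  &\;\Longrightarrow\; \sum y = n a + b \sum x
  \;\Longrightarrow\; a = \bar{y} - b \bar{x} \\
\frac{\partial S}{\partial b} = -2 \sum_i x_i (y_i - a - b x_i) = 0
  &\;\Longrightarrow\; \sum xy = a \sum x + b \sum x^2 \\
\text{Substituting } a = \bar{y} - b\bar{x} \text{ and } \sum x = n\bar{x}:\quad
  \sum xy - n \bar{x} \bar{y} &= b \left( \sum x^2 - n \bar{x}^2 \right) \\
b &= \frac{\sum xy - n \bar{x} \bar{y}}{\sum x^2 - n \bar{x}^2}
\end{align*}
```

The first normal equation is also what places the point of means on the fitted line.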

Assumptions
We are investigating only linear relationships.
For each x value, y is a random variable having a normal
(bell-shaped) distribution. All of these y distributions have the
same variance. Also, for a given value of x, the distribution of
y -values has a mean that lies on the regression line.

Guidelines for Using The Regression Equation

If there is no significant linear correlation, don’t use the
regression equation to make predictions.
When using the regression equation for predictions, stay
within the scope of the available sample data.
A regression equation based on old data is not necessarily
valid now.
Don’t make predictions about a population that is different
from the population from which the sample data was drawn.

Least squares Regression line-Example

The following table shows the amount of time (in hours) that
students spend preparing for an exam and the grade they get
in the exam:
Hours of study (X) Exam score (Y)
10 51
7 48
12 52
15 58
6 48
14 53
2 23
Find the equation of the least squares regression line of grade on
time (y on x) and hence estimate the grade obtained by a
student who spent 3 hours preparing for the exam.

We have the following results from the table above: $n = 7$,
$\sum x = 66$, $\sum y = 333$, $\sum xy = 3416$, $\sum x^2 = 754$,
$\bar{x} = 9.429$ (3 d.p.) and $\bar{y} = 47.571$ (3 d.p.).
Now b = 2.098 (3 d.p.) and a = 27.794 (3 d.p.).
Hence the line is y = 2.098x + 27.794.
So for x = 3 the corresponding y is
y = 2.098 × 3 + 27.794 ≈ 34.09 (2 d.p.).
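The fit can be reproduced in Python as a check on the arithmetic. This sketch applies the formulas for b and a to the table and carries no intermediate rounding, so the last decimal place may differ slightly from a hand calculation that uses rounded means. The function name `predict` is chosen here for illustration.

```python
# Data from the example: hours of study (X) and exam score (Y).
hours = [10, 7, 12, 15, 6, 14, 2]
score = [51, 48, 52, 58, 48, 53, 23]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(score) / n
sum_xy = sum(x * y for x, y in zip(hours, score))
sum_x2 = sum(x * x for x in hours)

# Least squares slope and intercept for the line y = bx + a.
b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
a = y_bar - b * x_bar

def predict(x):
    """Estimated exam score for x hours of study."""
    return b * x + a

print(f"b = {b:.3f}, a = {a:.3f}")
print(f"estimate at x = 3: {predict(3):.2f}")
```

Note that x = 3 lies inside the range of the sample (2 to 15 hours), which is in line with the guideline about staying within the scope of the data.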

Some Definitions
Marginal Change: the amount a variable changes when the
other variable changes by exactly one unit.
Outlier: a point lying far away from the other data points.
Influential Points: points which strongly affect the graph of
the regression line.
Residual: for a sample of paired (x, y) data, the difference
$(y - \hat{y})$ between an observed sample y-value and the value of
$\hat{y}$, which is the value of y that is predicted by using the
regression equation.
Least-Squares Property: A straight line satisfies this
property if the sum of the squares of the residuals is the
smallest sum possible.
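To make these two definitions concrete, the sketch below computes the residuals for the hours and grade data from the regression example and compares the sum of squared residuals of the fitted line with that of a slightly different line. The comparison line (slope increased by 0.1) is an arbitrary choice made here for illustration.

```python
hours = [10, 7, 12, 15, 6, 14, 2]
score = [51, 48, 52, 58, 48, 53, 23]
n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(score) / n

# Least squares slope and intercept.
b = (sum(x * y for x, y in zip(hours, score)) - n * x_bar * y_bar) / (
    sum(x * x for x in hours) - n * x_bar ** 2
)
a = y_bar - b * x_bar

def sum_squared_residuals(intercept, slope):
    """Sum of (y - y_hat)^2 for the line y_hat = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(hours, score))

residuals = [y - (b * x + a) for x, y in zip(hours, score)]
sse_fit = sum_squared_residuals(a, b)
sse_other = sum_squared_residuals(a, b + 0.1)  # arbitrary nearby line

print([round(e, 2) for e in residuals])
print(round(sum(residuals), 10))   # residuals of the fitted line sum to zero (up to rounding)
print(sse_fit < sse_other)         # the fitted line has the smaller sum of squares
```

The largest residual in absolute value belongs to the student with 2 hours and a grade of 23, which is the kind of point the definitions of outlier and influential point refer to.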

Contingency and Association Tables

We have so far discussed the relationships between two
quantitative variables—the strength, direction and form of the
linear relationship with the correlation.
What about qualitative (categorical) variables?
Suppose a class of 82 students is asked this question: “Do you
enjoy Statistics?” The following table shows the responses:

          Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
Males     9                13      5         2          1
Females   12               18      11        6          5

A contingency table relates two categories of data.
In the example above, the relationship is between the gender
of the student and his/her response to the question.
A marginal distribution of a variable is a frequency or a
relative frequency distribution of either the row or the column
variable in the contingency table (Totals).
If each of the totals above is divided by n (n=82), then the
result is called relative frequency marginal distribution.
A conditional distribution lists the relative frequency of each
category of one variable, given a specific value of the other
variable in the contingency table.
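The marginal and conditional distributions for the survey table can be sketched as follows. The dictionary layout and variable names are choices made here for illustration.

```python
responses = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]
table = {
    "Males":   [9, 13, 5, 2, 1],
    "Females": [12, 18, 11, 6, 5],
}

# Marginal (frequency) distributions: row totals and column totals.
row_totals = {gender: sum(counts) for gender, counts in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
n = sum(row_totals.values())

# Relative frequency marginal distributions: each total divided by n.
row_marginal = {gender: total / n for gender, total in row_totals.items()}
col_marginal = [total / n for total in col_totals]

# Conditional distribution of the response, given gender.
conditional = {
    gender: [count / row_totals[gender] for count in counts]
    for gender, counts in table.items()
}

print(row_totals, col_totals, n)
for gender in table:
    print(gender, [round(p, 3) for p in conditional[gender]])
```

Comparing the two conditional rows shows whether the spread of responses looks similar for males and females, which is the informal starting point for asking whether the two variables are associated.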
