
Chapter 03 Describing Bivariate Data


Introduction to Probability and Statistics
Fourteenth Edition

Chapter 3
Describing Bivariate Data

Copyright ©2006 Brooks/Cole


Some images © 2001-(current year) www.arttoday.com  A division of Thomson Learning, Inc.
Bivariate Data
• Sometimes the data that are collected consist of observations
for two variables on the same experimental unit.
• When two variables are measured on a single
experimental unit, the resulting data are called bivariate
data. Examples are:
• 1. An auto insurance company might be interested in the
number of vehicles owned by a policyholder as well as
the number of drivers in the household.
• 2. An economist might need to measure the amount spent
per week on groceries in a household and also the
number of people in that household.

Bivariate Data
• 3. A real estate agent might measure the selling price of a
residential property and the square footage of the living
area.
• You can describe each variable individually, and you can
also explore the relationship between the two variables.
• Bivariate data (qualitative or quantitative) can be
described with
– Graphs, which allow you to study the two variables together
– Numerical Measures

Graphs for Qualitative/Categorical Variables
• When at least one of the two variables is qualitative or
categorical, either simple or more intricate pie charts
(comparative), line charts, and bar charts can be used to
display and describe the data.
• Sometimes you will have one qualitative and one
quantitative variable that have been measured in two
different populations or groups.
• In this case, you can use two side-by-side pie charts or a
bar chart in which the bars for the two populations are
placed side by side.

Graphs for Qualitative/Categorical Variables
• Another option is to use a stacked bar chart, in which the
bars for each category are stacked on top of each other.
• Example 3.1: Are professors in private colleges paid more
than professors at public colleges?
• Rank, type of college, and salary are recorded for a sample of
400 college professors. The number in each cell is the average
salary (in thousands of dollars) for all professors who fell into
that category.

Graphs for Qualitative/Categorical Variables
• To display the average salaries of these 400 professors,
you can use a side-by-side bar chart.

• Salaries are substantially higher for full professors in
private colleges; however, the differences are less striking
at the lower two ranks.
Graphs for Qualitative/Categorical Variables
• Example 3.2 (From Book)
• Another Example: Do you think that men and women are
treated equally in the workplace?
Variable #1 = Opinion
Variable #2 = Gender
Men Women

Comparative Bar Charts
[Figure: two comparative bar charts of Percent by Opinion (Agree, Disagree, No Opinion) for Men and Women]
• Stacked Bar Chart • Side-by-Side Bar Chart


Describe the relationship between opinion and gender:
More women than men feel that they are
not treated equally in the workplace.
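
The percentages plotted in a comparative bar chart come from a two-way table of counts. The sketch below shows one way to compute them in Python. The counts are hypothetical (the survey counts are not reproduced in these slides), and `row_percents` is an illustrative helper, not a library function.

```python
# Hypothetical counts for illustration only; NOT the survey's actual data.
counts = {
    "Men":   {"Agree": 40, "Disagree": 50, "No Opinion": 10},
    "Women": {"Agree": 20, "Disagree": 70, "No Opinion": 10},
}

def row_percents(table):
    """Convert each group's counts to percentages of that group's total."""
    result = {}
    for group, cells in table.items():
        total = sum(cells.values())
        result[group] = {k: 100.0 * v / total for k, v in cells.items()}
    return result

percents = row_percents(counts)
for group, cells in percents.items():
    print(group, cells)
```

Each group's percentages sum to 100, so the two genders can be compared even when the group sizes differ.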
Scatterplot for Two Quantitative Variables
• When both of the variables are quantitative, call one
variable x and the other y. A single measurement is a pair
of numbers (x, y) that can be plotted using a two-
dimensional graph called a scatterplot.
• It is the two-dimensional extension of the dotplot we
used to graph one quantitative variable.
[Figure: scatterplot showing the single point (2, 5), plotted at x = 2 and y = 5]
Describing the Scatterplot
• We can describe the relationship between two variables, x
and y, using the patterns shown in the scatterplot.
• What pattern or form do you see?
– Straight line upward or downward
– Curve
– No pattern at all, just a random scattering of points
• How strong is the pattern?
– Strong: all of the points follow the pattern exactly
– Weak: the relationship is only weakly visible
• Are there any unusual observations?
– Clusters or outliers
Describing the Scatterplot
Example 3.3: The number of household members, x, and the
amount spent on groceries per week, y, are measured for six
households in a local area.

Example 3.4 (from Book)

Examples

[Figure: four scatterplots illustrating a strong positive linear pattern, a weak negative linear pattern, a curvilinear pattern, and no relationship]
Numerical Measures for Two Quantitative Variables
• A constant rate of increase or decrease is perhaps
the most common pattern found in bivariate
scatterplots.
• Assume that the two variables x and y exhibit a
linear pattern or form.
• There are two numerical measures to describe
– The strength and direction of the relationship
between x and y (Correlation Coefficient, r)
– The form of the relationship (Regression)
• Example 3.5
The Correlation Coefficient
• The strength and direction of the relationship between x
and y are measured using the correlation coefficient, r.
$$r = \frac{s_{xy}}{s_x s_y}$$
• The new quantity $s_{xy}$ is called the covariance between x and y and is defined as
$$s_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1}$$
$s_x$ = standard deviation of the x's
$s_y$ = standard deviation of the y's
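
The covariance formula above is a computational shortcut. As a brief check, it can be derived from the definitional form based on deviations from the sample means. The means $\bar{x}$ and $\bar{y}$ are not written out on the slide and are introduced here as an assumption of standard notation.

```latex
\[
\begin{aligned}
\sum (x_i - \bar{x})(y_i - \bar{y})
  &= \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\,\bar{x}\,\bar{y} \\
  &= \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n},
\end{aligned}
\]
\[
s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
       = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1}.
\]
```

Each of the last three terms in the first line equals $(\sum x_i)(\sum y_i)/n$ in absolute value, with signs minus, minus, plus, which leaves a single subtraction.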
The Correlation Coefficient
• When a data point (x, y) is in either area I or III in the
scatterplot, the cross product $(x - \bar{x})(y - \bar{y})$ will be positive;
• When a data point is in area II or IV, the cross product
will be negative. We can draw these conclusions:

The Correlation Coefficient
• If most of the points are in areas I and III (forming a
positive pattern), $s_{xy}$ and r will be positive.
• If most of the points are in areas II and IV (forming
a negative pattern), $s_{xy}$ and r will be negative.
• If the points are scattered across all four areas
(forming no pattern), $s_{xy}$ and r will be close to 0.

Example
• Living area x and selling price y of 5 homes.
Residence              1     2     3     4     5
x (hundred sq ft)     14    15    17    19    16
y ($000)             178   230   240   275   200

• The scatterplot indicates a positive linear relationship.
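
The quadrant argument from the preceding slides can be checked on these five homes. The Python sketch below assumes that x is in hundreds of square feet (consistent with the later prediction for a 1600 square foot home using x = 16) and that areas I to IV are numbered counterclockwise from the upper right around the point of means. The numbering is an assumption, since the figure is not reproduced here.

```python
# Living area (hundreds of sq ft) and selling price ($000) for the five homes.
x = [14, 15, 17, 19, 16]
y = [178, 230, 240, 275, 200]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

def area(xi, yi):
    """Label the quadrant of (xi, yi) relative to the point of means.

    Assumed numbering: I upper right, II upper left, III lower left, IV lower right.
    """
    if xi > x_bar and yi > y_bar:
        return "I"
    if xi < x_bar and yi > y_bar:
        return "II"
    if xi < x_bar and yi < y_bar:
        return "III"
    if xi > x_bar and yi < y_bar:
        return "IV"
    return "on a boundary"

cross_products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
areas = [area(xi, yi) for xi, yi in zip(x, y)]

for xi, yi, a, cp in zip(x, y, areas, cross_products):
    print(f"({xi}, {yi}) area {a}: cross product {cp:.2f}")
```

Four of the five points fall in areas I and III, so the positive cross products dominate the sum, which agrees with the positive pattern in the scatterplot.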

Example
Calculate the correlation coefficient.

  x      y      xy
 14    178    2492
 15    230    3450
 17    240    4080
 19    275    5225
 16    200    3200
 81   1123   18447   (column totals)

$\bar{x} = 16.2$, $s_x = 1.924$
$\bar{y} = 224.6$, $s_y = 37.360$

$$s_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1} = \frac{18447 - \frac{(81)(1123)}{5}}{4} = 63.6$$
$$r = \frac{s_{xy}}{s_x s_y} = \frac{63.6}{1.924(37.36)} = .885$$
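
The hand calculation for the five homes can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; `sample_std` is an illustrative helper written for this example.

```python
import math

x = [14, 15, 17, 19, 16]
y = [178, 230, 240, 275, 200]
n = len(x)

def sample_std(values):
    """Sample standard deviation with divisor n - 1."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

# Shortcut formula: s_xy = (sum(xy) - sum(x) * sum(y) / n) / (n - 1)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
s_xy = (sum_xy - sum(x) * sum(y) / n) / (n - 1)

s_x = sample_std(x)
s_y = sample_std(y)
r = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.1f}, s_x = {s_x:.3f}, s_y = {s_y:.3f}, r = {r:.3f}")
```

Rounded to the same number of places, the results should agree with the values worked out by hand above.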
Interpreting r
• -1 ≤ r ≤ 1: Range of r. The sign of r indicates the direction of the linear relationship.
• r ≈ 0: Weak relationship; random scatter of points
• r ≈ 1 or -1: Strong relationship; either positive or negative
• r = 1 or -1: All points fall exactly on a straight line.
Interpreting r
• The value of r always lies between -1 and 1.
• When r is positive, y tends to increase as x increases.
• When r is negative, y tends to decrease as x increases.
• When r takes the value exactly -1 or 1, all the points
lie exactly on a straight line.
• If r = 0, then there is no apparent linear relationship
between the two variables.
• The closer the value of r is to -1 or 1, the stronger
the linear relationship between the two variables.
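
The boundary cases in this list can be illustrated with small made-up data sets. In the Python sketch below, the x values and the three relationships are hypothetical and chosen only for illustration; `correlation` is a helper written for this example.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    # The (n - 1) divisors cancel, so sums of squares are enough.
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]                               # hypothetical x values
r_up = correlation(xs, [2 * v + 1 for v in xs])      # exact increasing line
r_down = correlation(xs, [5 - 3 * v for v in xs])    # exact decreasing line
r_curve = correlation(xs, [v ** 2 for v in xs])      # symmetric parabola
print(r_up, r_down, r_curve)
```

The parabola case shows why the wording "no apparent linear relationship" matters: y is completely determined by x there, yet r is 0 because the relationship is not linear.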
The Regression Line
• Sometimes x and y are related in a particular way—the value of y
depends on the value of x.
– y = dependent variable
– x = independent variable
• Examples: the cost of a home (y) may depend on its amount of
floor space (x); a student's grade point average (x) may explain
her score on an achievement test (y).
• The form of the linear relationship between x and y can be
described by fitting a line as best we can through the points. This
is the regression line,
y = a + bx.
– a = y-intercept of the line
– b = slope of the line

The Regression Line

• For every one-unit increase in x, y changes by an amount
b. The quantity b determines whether the line is increasing
(b > 0), decreasing (b < 0), or horizontal (b = 0) and is
appropriately called the slope of the line.
The Regression Line
• When plotting the (x, y) points for two variables x and y,
the points generally do not fall exactly on a straight line,
but they may show a trend that could be described as a
linear pattern.
• We can describe this trend by fitting a line as best we
can through the points.
• This best-fitting line relating y to x, often called the
regression or least squares line, is found by minimizing
the sum of the squared differences between the data
points and the line itself.
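
To make the least squares idea concrete, the Python sketch below computes the sum of squared vertical differences (SSE) for the five-home data and compares the fitted line with a few nearby lines. The closed-form slope used here is the standard least squares result, and the sizes of the nudges are arbitrary choices for illustration.

```python
x = [14, 15, 17, 19, 16]
y = [178, 230, 240, 275, 200]

def sse(a, b):
    """Sum of squared vertical differences between the points and y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Closed-form least squares estimates (standard result).
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a_hat = y_bar - b_hat * x_bar

best = sse(a_hat, b_hat)
# Nudging the intercept or slope in any direction should never lower the SSE.
nearby = [
    sse(a_hat + da, b_hat + db)
    for da in (-5, 0, 5)
    for db in (-1, 0, 1)
    if (da, db) != (0, 0)
]
print(f"SSE at least squares line: {best:.2f}")
print(f"smallest SSE among nearby lines: {min(nearby):.2f}")
```

Every perturbed line has a larger SSE than the fitted one, which is what "best-fitting" means in the least squares sense.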

The Regression Line
• To find the slope and y-intercept of the best-fitting line, use:
$$b = r\left(\frac{s_y}{s_x}\right)$$
$$a = \bar{y} - b\bar{x}$$
• The least squares regression line is y = a + bx.
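
Substituting the definition of r from the correlation slides into the slope formula gives an equivalent form that avoids computing r first. This short derivation uses only quantities already defined in the chapter.

```latex
\[
b = r\,\frac{s_y}{s_x}
  = \frac{s_{xy}}{s_x s_y}\cdot\frac{s_y}{s_x}
  = \frac{s_{xy}}{s_x^{2}}
\]
```

For the five-home example, $s_{xy} = 63.6$ and $s_x^2 = 3.7$, so $b = 63.6 / 3.7 \approx 17.189$.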
The Regression Line
• Since $s_x$ and $s_y$ are both positive, b and r have the
same sign, so that:
1. When r is positive, so is b, and the line is
increasing with x.
2. When r is negative, so is b, and the line is
decreasing with x.
3. When r is close to 0, then b is close to 0.
• Example 3.7 from Book

Example in Excel

  x      y      xy
 14    178    2492
 15    230    3450
 17    240    4080
 19    275    5225
 16    200    3200
 81   1123   18447   (column totals)

Recall:
$\bar{x} = 16.2$, $s_x = 1.9235$
$\bar{y} = 224.6$, $s_y = 37.3604$
$r = .885$

$$b = r\,\frac{s_y}{s_x} = (.885)\,\frac{37.3604}{1.9235} = 17.189$$
$$a = \bar{y} - b\bar{x} = 224.6 - 17.189(16.2) = -53.86$$
Regression Line: y = -53.86 + 17.189x
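
The same slope and intercept can be obtained in Python rather than Excel. The sketch below is one possible approach using the standard library `statistics` module; it follows the formulas b = r(s_y/s_x) and a = ȳ - b x̄ from the previous slides.

```python
import statistics

x = [14, 15, 17, 19, 16]
y = [178, 230, 240, 275, 200]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)
s_x = statistics.stdev(x)   # sample standard deviation (divisor n - 1)
s_y = statistics.stdev(y)

# Sample covariance from the definition, then r = s_xy / (s_x * s_y).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
r = s_xy / (s_x * s_y)

b = r * s_y / s_x          # slope
a = y_bar - b * x_bar      # y-intercept
print(f"b = {b:.3f}, a = {a:.2f}")
```

Because no intermediate value is rounded here, later decimal places may differ slightly from a calculation that starts from r = .885.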
Example
• Predict the selling price for another residence
with 1600 square feet of living area (x = 16, in hundreds of square feet).

Predict:
$$y = -53.86 + 17.189x = -53.86 + 17.189(16) = 221.16$$
or $221,160
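
The prediction step can be wrapped in a small function. The sketch below uses the rounded coefficients from the slide; the function name `predict_price` and the range check are illustrative assumptions. The check reflects the common caution against predicting outside the range of the observed x values, which the slide itself does not discuss.

```python
# Coefficients as rounded on the slide; x is in hundreds of square feet.
A = -53.86
B = 17.189
X_MIN, X_MAX = 14, 19  # range of living areas in the five-home sample

def predict_price(sq_ft):
    """Predicted selling price in dollars for a home with sq_ft of living area."""
    x = sq_ft / 100
    if not X_MIN <= x <= X_MAX:
        # Assumption: refuse to extrapolate beyond the observed x values.
        raise ValueError("living area is outside the range of the sample")
    return (A + B * x) * 1000

price = predict_price(1600)
print(f"${price:,.0f}")
```

The result differs from $221,160 only in the last digits, because the slide rounds the prediction to 221.16 before converting to dollars.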
Need to Know

Correlation and Regression
• The regression approach is used when the values of x are set
in advance and then the corresponding value of y is
measured.
• The correlation approach is used when an experimental unit
is selected at random and then measurements are made on
both variables x and y.
• Most data analysts begin any data-based investigation by
examining plots of the variables involved.
• If the relationship between two variables is of interest, data
analysts can also explore bivariate plots in conjunction with
numerical measures of location, dispersion, and correlation.
• Graphs and numerical descriptive measures are only the first
of many statistical tools.
Key Concepts
I. Bivariate Data
1. Both qualitative and quantitative variables
2. Describing each variable separately
3. Describing the relationship between the variables
II. Describing Two Qualitative Variables
1. Side-by-Side pie charts
2. Comparative line charts
3. Comparative bar charts
– Side-by-Side
– Stacked
4. Relative frequencies to describe the relationship between
the two variables.
Key Concepts
III. Describing Two Quantitative Variables
1. Scatterplots
– Linear or nonlinear pattern
– Strength of relationship
– Unusual observations; clusters and outliers
2. Covariance and correlation coefficient
3. The best fitting line
– Calculating the slope and y-intercept
– Graphing the line
– Using the line for prediction
