
Lesson 2: Summarizing Data

Uploaded by

minhchau
Lesson 2:

Summarizing data
Chapter Goals

After completing this lesson, you should be able to:
• Construct a frequency distribution table
• Construct and interpret a histogram
• Create and interpret bar charts, pie charts, and stem-and-leaf diagrams
• Present and interpret data in line charts and scatter diagrams
Frequency Distributions

What is a Frequency Distribution?

• A frequency distribution is a list or a table ...
• containing the values of a variable (or a set of ranges within which the data fall) ...
• and the corresponding frequencies with which each value occurs (or frequencies with which data fall within each range)
Why Use Frequency Distributions?

• A frequency distribution is a way to summarize data
• The distribution condenses the raw data into a more useful form...
• and allows for a quick visual interpretation of the data
Frequency Distribution:
Discrete Data
• The following data record the number of children in the families of the 47 workers in a company:
1 1 3 2 0 2 0 1 2 2 1 3
5 2 4 0 0 2 4 1 1 2 2 0
3 0 0 2 1 3 6 0 2 1 0 3
2 2 2 1 0 0 1 1 3 1 4
Frequency distribution table
Number of children in family    Number of workers
0
1
2
3
4
5
6
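The table can be filled in by tallying how often each value occurs. A minimal sketch in Python, assuming the 47 values are entered exactly as listed on the slide; the variable names are illustrative only:

```python
from collections import Counter

# Number of children per family for the 47 workers, as listed on the slide.
children = [
    1, 1, 3, 2, 0, 2, 0, 1, 2, 2, 1, 3,
    5, 2, 4, 0, 0, 2, 4, 1, 1, 2, 2, 0,
    3, 0, 0, 2, 1, 3, 6, 0, 2, 1, 0, 3,
    2, 2, 2, 1, 0, 0, 1, 1, 3, 1, 4,
]

# Tally how many workers report each number of children.
frequency = Counter(children)

print("Number of children  Number of workers")
for value in sorted(frequency):
    print(f"{value:>18}  {frequency[value]:>17}")
print(f"{'Total':>18}  {sum(frequency.values()):>17}")
```

The total row is a quick check that every one of the 47 observations was counted exactly once.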
Frequency Distribution:
Discrete Data
• Discrete data: possible values are countable

Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.

Number of days read    Frequency
0                      44
1                      24
2                      18
3                      16
4                      20
5                      22
6                      26
7                      30
Total                  200
Relative Frequency
Relative Frequency: What proportion is in each category?

Number of days read    Frequency    Relative Frequency
0                      44           .22
1                      24           .12
2                      18           .09
3                      16           .08
4                      20           .10
5                      22           .11
6                      26           .13
7                      30           .15
Total                  200          1.00

44/200 = .22: 22% of the people in the sample report that they read the newspaper 0 days per week
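The relative frequency column is each frequency divided by the total. A short Python sketch of that arithmetic, using the newspaper counts from the table; the dictionary layout is an assumption made for illustration:

```python
# Frequencies from the newspaper readership example (days read -> customers).
frequency = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

total = sum(frequency.values())  # 200 customers

# Relative frequency = class frequency / total number of observations.
relative = {days: count / total for days, count in frequency.items()}

for days, share in relative.items():
    print(f"{days} days: {share:.2f}")
```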
NOTE
For developing frequency and relative frequency distributions for discrete data:
(1) List all possible values of the variable. If the variable is quantitative, order the possible values from low to high.
(2) Count the number of occurrences at each value of the variable and place this value in a column labeled “frequency”.
(3) Determine the relative frequencies.
Frequency Distribution:
Continuous Data

• Continuous Data: may take on any value in some interval
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32,
13, 12, 38, 41, 43, 44, 27, 53, 27
(Temperature is a continuous variable because it could be measured to any degree of precision desired)
Definitions associated with
frequency distribution classes
• Class limits: are the lower and upper values of the classes as physically described in the distribution.

(Figure: the lower limit and upper limit of the classes, marked for discrete data and for continuous data)
Definitions associated with
frequency distribution classes
• Class widths (class lengths):
- continuous data: are the numerical differences between lower and upper class limits.
- discrete data: are the numerical differences between the lower limit of one class and the lower limit of the immediately following class
• Class mid-points: are situated in the centre of the classes.
Definitions associated with
frequency distribution classes
• Open-ended class:
- A class without a lower or upper limit.
- Usually used for the first class which has no defined lower limit and/or the last class which has no defined upper limit

Classes
< 10
10-15
15-20
>= 20
Grouping Data by Classes
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 20)
• Compute class width: 10 (46/5 = 9.2, then round up)
• Determine class boundaries: 10, 20, 30, 40, 50
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequency Distribution

Class              Frequency    Relative Frequency
10 but under 20    3            .15
20 but under 30    6            .30
30 but under 40    5            .25
40 but under 50    4            .20
50 but under 60    2            .10
Total              20           1.00
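The grouping steps can be sketched in Python as follows. This is only an illustration under stated assumptions: 5 classes are chosen as on the slide, and the minimum width of 9.2 is rounded up to the next multiple of 10 (the rounding unit is an assumption; any convenient value would do).

```python
import math

temperatures = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32,
                13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(temperatures) - min(temperatures)   # 58 - 12 = 46

# Minimum width is range / classes = 9.2; round up to a convenient
# multiple of 10 (the rounding unit of 10 is an assumption).
width = math.ceil(data_range / num_classes / 10) * 10

# Start the first class at a convenient value at or below the minimum.
start = (min(temperatures) // width) * width

frequency = [0] * num_classes
for t in temperatures:
    frequency[(t - start) // width] += 1   # lower <= t < lower + width

n = len(temperatures)
for i, count in enumerate(frequency):
    lower = start + i * width
    print(f"{lower} but under {lower + width}: {count}  {count / n:.2f}")
```

Using "lower <= t < lower + width" keeps the classes mutually exclusive: a value that falls exactly on a boundary belongs to the class that starts there.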
Questions for Grouping Data
into Classes

• 1. How wide should each interval be? (How many classes should be used?)
• 2. How should the endpoints of the intervals be determined?
• Often answered by trial and error, subject to user judgment
• The goal is to create a distribution that is neither too "jagged" nor too "blocky"
• Goal is to appropriately show the pattern of variation in the data
How Many Class Intervals?
• Many (Narrow class intervals)
◦ may yield a very jagged distribution with gaps from empty classes
◦ can give a poor indication of how frequency varies across classes
• Few (Wide class intervals)
◦ may compress variation too much and yield a blocky distribution
◦ can obscure important patterns of variation.

(Figures: two histograms of Frequency against Temperature, one with narrow classes of width 4 and one with wide classes of width 30. X axis labels are upper class endpoints)

Guidelines for grouping values
into classes
• Use between 5 and 20 classes.
Or Sturges’s Rule: Classes = 1 + 3.322[log10(n)]
where n: number of data values.
The classes should meet four criteria
- First, they must be mutually exclusive.
- Second, they must be all-inclusive.
- Third, if at all possible, they should be of equal width, and
- Fourth, avoid empty classes if possible.
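Sturges's Rule is easy to evaluate directly. A small Python sketch; the function name is hypothetical, and rounding the result up to a whole number of classes is an assumption, since the rule only suggests a starting point:

```python
import math

def sturges_classes(n: int) -> float:
    """Suggested number of classes from Sturges's Rule: 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

for n in (20, 200, 1000):
    raw = sturges_classes(n)
    # Rounding up is an assumption; judgment still applies to the final choice.
    print(f"n = {n}: {raw:.2f} -> {math.ceil(raw)} classes")
```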
General Guidelines

Number of Data Points    Number of Classes
under 50                 5 - 7
50 - 100                 6 - 10
100 - 250                7 - 12
over 250                 10 - 20
◦ Class widths can typically be reduced as the
number of observations increases
◦ Distributions with numerous observations are
more likely to be smooth and have gaps filled
since data are plentiful
Class Width

• The class width is the distance between the lowest possible value and the highest possible value for a frequency class

• The minimum class width is

W = (Largest Value - Smallest Value) / Number of Classes
NOTE
To develop a continuous data frequency distribution, perform the following steps:
(1) Determine the desired number of classes or
groups. The rule of thumb is to use 5 to 20
classes. Sturges’s rule can be used.
(2) Determine the minimum class width (round the
class width up to a more convenient value)
(3) Define the class boundaries, making sure that
the classes that are formed are mutually
exclusive and all inclusive. Ideally, the classes
should have equal width and should all contain
at least one observation.
(4) Count the number of values in each class.
Cumulative and relative
cumulative frequency distribution
• Cumulative frequency distribution: a summary of a set of data that displays the number of observations with values less than or equal to the upper limit of each of its classes.
• Relative cumulative frequency distribution: a summary of a set of data that displays the proportion of observations with values less than or equal to the upper limit of each of its classes.
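Both distributions are running totals of the class frequencies. A minimal Python sketch, assuming the class frequencies from the 20-day temperature example; the class labels are shorthand chosen for illustration:

```python
from itertools import accumulate

# Class frequencies from the temperature example
# (10 but under 20, 20 but under 30, ..., 50 but under 60).
classes = ["10-20", "20-30", "30-40", "40-50", "50-60"]
frequency = [3, 6, 5, 4, 2]

n = sum(frequency)
cumulative = list(accumulate(frequency))            # running totals
relative_cumulative = [c / n for c in cumulative]   # running proportions

for label, c, rc in zip(classes, cumulative, relative_cumulative):
    print(f"{label}: {c:>2}  {rc:.2f}")
```

The last cumulative value always equals the number of observations, and the last relative cumulative value equals 1.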
Histograms

• The classes or intervals are shown on the horizontal axis
• Frequency is measured on the vertical axis
• Bars of the appropriate heights can be used to represent the number of observations within each class
• Such a graph is called a histogram
Why start with Histograms?

It shows 3 general types of information:
(1) a visual indication of where the approximate centre of data is.
(2) the degree of spread (or variation) in the data.
(3) the shape of the distribution.
Histogram Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Histogram (figure): Frequency plotted against Class Midpoints
Class Midpoints: 5, 15, 25, 35, 45, 55, More
Frequency: 0, 3, 6, 5, 4, 2, 0
No gaps between bars, since continuous data
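For a quick look at the shape without a charting tool, the bars can be drawn as text. This is only a rough sketch; the function name and the "#" symbol are arbitrary choices, and the midpoints and frequencies are taken from the temperature example:

```python
# Class midpoints and frequencies from the temperature example.
midpoints = [15, 25, 35, 45, 55]
frequency = [3, 6, 5, 4, 2]

def text_histogram(midpoints, frequency, symbol="#"):
    """Return one line per class: the midpoint, a bar of symbols, the count."""
    return [f"{m:>3} | {symbol * f} {f}" for m, f in zip(midpoints, frequency)]

for line in text_histogram(midpoints, frequency):
    print(line)
```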
Relative frequency
histograms and ogives

• A relative frequency histogram is formed in the same manner as a frequency histogram, but it uses relative frequencies rather than frequencies.
• The cumulative relative frequency is presented using a graph called an ogive.
Example

• The director of emergency responses in Montreal, Canada, is interested in analyzing the time needed for response teams to reach their destinations in emergency situations after leaving their stations. She has acquired the response times for 1,220 calls last month.
Develop the frequency histogram, relative frequency histogram, and ogive.
Histograms in Excel

1 Select Tools/Data Analysis
Histograms in Excel
(continued)

2 Choose Histogram
3 Input data and bin ranges; select Chart Output

Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. (Chap 2-29)
Stem and Leaf Diagram

• A simple way to see distribution details in a data set

METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)
Example:

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Here, use the 10’s digit for the stem unit:
                      Stem    Leaf
• 12 is shown as      1       2
• 35 is shown as      3       5
Example:

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Completed Stem-and-leaf diagram:

Stem    Leaves
1       2 3 7
2       1 4 4 6 7 7
3       0 2 5 7 8
4       1 3 4 6
5       3 8
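The same diagram can be produced programmatically. A minimal Python sketch, assuming two-digit data with the 10's digit as the stem; the function name and the `stem_unit` parameter are illustrative, not part of the lesson:

```python
from collections import defaultdict

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38,
        41, 43, 44, 46, 53, 58]

def stem_and_leaf(values, stem_unit=10):
    """Group sorted values by stem (value // stem_unit); leaves are the remainders."""
    diagram = defaultdict(list)
    for v in sorted(values):
        diagram[v // stem_unit].append(v % stem_unit)
    return dict(diagram)

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting first means the leaves on each stem come out in ascending order, as in the completed diagram.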
Using other stem units

• Using the 100’s digit as the stem:
◦ Round off the 10’s digit to form the leaves
                        Stem    Leaf
• 613 would become      6       1
• 776 would become      7       8
• ...
• 1224 becomes          12      2
Graphing Categorical Data

Categorical Data:
• Pie Charts
• Bar Charts
• Pareto Diagram
Bar and Pie Charts

• Bar charts and Pie charts are often used for qualitative (category) data
• Height of bar or size of pie slice shows the frequency or percentage for each category
Pie Chart Example

Current Investment Portfolio

Investment Type    Amount (in thousands $)    Percentage
Stocks             46.5                       42.27
Bonds              32.0                       29.09
CD                 15.5                       14.09
Savings            16.0                       14.55
Total              110                        100

(Variables are Qualitative)

(Pie chart: Stocks 42%, Bonds 29%, Savings 15%, CD 14%. Percentages are rounded to the nearest percent)
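The percentage column and the rounded pie labels both come from dividing each amount by the total. A brief Python sketch of that calculation, with the amounts copied from the table; the dictionary name is illustrative:

```python
# Amounts in thousands of dollars, from the Current Investment Portfolio table.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(portfolio.values())   # 110

for category, amount in portfolio.items():
    pct = 100 * amount / total
    # Table column uses two decimals; pie labels use the nearest percent.
    print(f"{category:<8} {amount:>5.1f}  {pct:6.2f}  {round(pct)}%")
```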
Bar Chart Example

(Horizontal bar chart: "Investor's Portfolio"; categories: Savings, CD, Bonds, Stocks; horizontal axis: Amount in $1000's, 0 to 50)
Pareto Diagram Example

(Pareto diagram: bar graph of % invested in each category, left axis 0% to 45%, with a line graph of cumulative % invested, right axis 0% to 100%; categories in order: Stocks, Bonds, Savings, CD)
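The values behind a Pareto diagram are the category percentages sorted from largest to smallest, plus their running total. A hedged Python sketch using the portfolio amounts from the pie chart example; it computes the numbers only and does not draw the chart:

```python
from itertools import accumulate

# Amounts in thousands of dollars, from the investment portfolio example.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(portfolio.values())

# Pareto ordering: largest category first.
ordered = sorted(portfolio.items(), key=lambda item: item[1], reverse=True)

percents = [100 * amount / total for _, amount in ordered]   # bar heights
cumulative = list(accumulate(percents))                      # line graph values

for (category, _), pct, cum in zip(ordered, percents, cumulative):
    print(f"{category:<8} {pct:5.1f}%  {cum:5.1f}%")
```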
Bar Chart Example

Number of days read    Frequency
0                      44
1                      24
2                      18
3                      16
4                      20
5                      22
6                      26
7                      30
Total                  200

(Bar chart: "Newspaper readership per week"; horizontal axis: Number of days newspaper is read per week, 0 to 7; vertical axis: Frequency, 0 to 50)
Tabulating and Graphing
Multivariate Categorical Data

• Investment in thousands of dollars

Investment Category    Investor A    Investor B    Investor C    Total
Stocks                 46.5          55            27.5          129
Bonds                  32.0          44            19.0          95
CD                     15.5          20            13.5          49
Savings                16.0          28            7.0           51
Total                  110.0         147           67.0          324
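The Total row and Total column are simple sums across the table. A small Python sketch that recomputes them, assuming the amounts are stored per category as (Investor A, Investor B, Investor C); this layout is chosen only for illustration:

```python
# Investment in thousands of dollars: category -> (Investor A, B, C).
investments = {
    "Stocks":  (46.5, 55, 27.5),
    "Bonds":   (32.0, 44, 19.0),
    "CD":      (15.5, 20, 13.5),
    "Savings": (16.0, 28, 7.0),
}

# Row totals: one per investment category.
row_totals = {cat: sum(vals) for cat, vals in investments.items()}

# Column totals: one per investor (zip transposes the rows).
column_totals = [sum(col) for col in zip(*investments.values())]

grand_total = sum(row_totals.values())
print(row_totals, column_totals, grand_total)
```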
Tabulating and Graphing
Multivariate Categorical Data
(continued)
• Side by side charts

(Side-by-side horizontal bar chart: "Comparing Investors"; categories: Savings, CD, Bonds, Stocks; one bar each for Investor A, Investor B, Investor C; horizontal axis 0 to 60)
Side-by-Side Chart Example
• Sales by quarter for three sales territories:

         1st Qtr    2nd Qtr    3rd Qtr    4th Qtr
East     20.4       27.4       59         20.4
West     30.6       38.6       34.6       31.6
North    45.9       46.9       45         43.9

(Side-by-side bar chart: East, West, and North sales for each quarter, 1st Qtr to 4th Qtr; vertical axis 0 to 60)
Line Charts and
Scatter Diagrams
• Line charts show values of one variable vs. time
◦ Time is traditionally shown on the horizontal axis
• Scatter Diagrams show points for bivariate data
◦ one variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Line Chart Example

Year    Inflation Rate
1985    3.56
1986    1.86
1987    3.65
1988    4.14
1989    4.82
1990    5.40
1991    4.21
1992    3.01
1993    2.99
1994    2.56
1995    2.83
1996    2.95
1997    2.29
1998    1.56
1999    2.21
2000    3.36
2001    2.85
2002    1.58

(Line chart: "U.S. Inflation Rate"; horizontal axis: Year, 1984 to 2002; vertical axis: Inflation Rate (%), 0 to 6)
Scatter Diagram Example

Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200

(Scatter diagram: "Production Volume vs. Cost per Day"; horizontal axis: Volume per Day, 0 to 70; vertical axis: Cost per Day, 0 to 250)
Types of Relationships

• Linear Relationships

(Figure: two scatter plots of Y against X)
Types of Relationships
(continued)

• Curvilinear Relationships

(Figure: two scatter plots of Y against X)
Types of Relationships
(continued)

• No Relationship

(Figure: two scatter plots of Y against X)
Chapter Summary

• Data in raw form are usually not easy to use for decision making. Some type of organization is needed:
◦ Table
◦ Graph
• Techniques reviewed in this chapter:
◦ Frequency Distributions and Histograms
◦ Bar Charts and Pie Charts
◦ Stem and Leaf Diagrams
◦ Line Charts and Scatter Diagrams
Summary
Exercises
2.4 You are given the following data
6 10 6 4 9 5 5 5
5 7 6 2 5 5 5 4
5 7 6 7 8 6 8 4
7 5 5 5 5 7 8 7
6 7 5 4 6 4 4 7
4 6 6 7 8 6 7 6
7 8 5 6 5 7 3 6
4 7 4 4
a. Construct a frequency distribution for these data
b. Based on the frequency distribution, develop a histogram
Exercises
2.24 You are given the following data
79 104 76 89 110
109 145 117 162 108
117 87 87 150 152
85 143 101 137 111
149 154 90 150 117
147 87 177 190 66
153 97 106 86 62
Construct a stem and leaf diagram
Exercises
2.29 Real estate investment trusts (REITs) were
created by Congress in 1960 so small investors could
invest in real estate as shareholders rather than as
landlords. Using information taken from Paine
Webber and SEC filings, …., reported the following
proportions of REIT money invested in different
categories
Category Percentage
- Shopping Centers 20
- Multifamily 18
- Office 17
- Other 45
Present this information graphically using a pie chart
and alternatively, a bar chart
