Lesson 2: Summarizing Data
Chapter Goals
5 2 4 0 0 2 4 1 1 2 2 0
3 0 0 2 1 3 6 0 2 1 0 3
2 2 2 1 0 0 1 1 3 1 4
Frequency distribution table
Number of children in family   Number of workers
0                              9
1                              8
2                              9
3                              4
4                              3
5                              1
6                              1
Total                          35
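A minimal Python sketch of how this tally can be produced from the 35 raw observations listed at the start of the lesson. The variable names `children` and `freq` are hypothetical labels chosen for illustration.

```python
from collections import Counter

# The 35 raw observations (number of children in family) listed above
children = [
    5, 2, 4, 0, 0, 2, 4, 1, 1, 2, 2, 0,
    3, 0, 0, 2, 1, 3, 6, 0, 2, 1, 0, 3,
    2, 2, 2, 1, 0, 0, 1, 1, 3, 1, 4,
]

# Count how many workers report each number of children
freq = Counter(children)

# Print every value from the smallest to the largest, so a value with
# zero workers would still appear with a count of 0
for value in range(min(children), max(children) + 1):
    print(value, freq[value])
print("Total", sum(freq.values()))
```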
Frequency Distribution: Discrete Data
Discrete data: the possible values are countable.
Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.
Number of days read   Frequency
0                     44
1                     24
2                     18
3                     16
4                     20
5                     22
6                     26
7                     30
Total                 200
Relative Frequency
Relative frequency: What proportion is in each category? Relative frequency = frequency of the category / total number of observations.
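Applied to the newspaper-readership example, the relative frequency of each category is its frequency divided by the 200 customers. A minimal Python sketch (the names `freq` and `rel_freq` are illustrative):

```python
# Frequencies from the newspaper-readership example (days read per week -> customers)
freq = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

total = sum(freq.values())  # 200 customers

# Relative frequency = frequency / total number of observations
rel_freq = {days: count / total for days, count in freq.items()}

for days, proportion in rel_freq.items():
    print(f"{days} days: {proportion:.2f}")
```

The proportions run from 0.22 (0 days) to 0.15 (7 days) and add up to 1.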
[Figure: a class interval marked with its lower limit and upper limit]
Definitions associated with frequency distribution classes
Class widths (class lengths):
- Continuous data: the numerical difference between the upper and lower limits of a class.
- Discrete data: the numerical difference between the lower limit of one class and the lower limit of the immediately following class.
Class mid-points: lie at the centre of the classes, halfway between the lower and upper limits.
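These definitions reduce to simple arithmetic. A minimal Python sketch; the function names and the example classes (10-20 for continuous data, 0-4 followed by 5-9 for discrete data) are hypothetical illustrations, not taken from the slides.

```python
def width_continuous(lower, upper):
    """Class width for continuous data: upper limit minus lower limit."""
    return upper - lower

def width_discrete(lower, next_lower):
    """Class width for discrete data: distance between consecutive lower limits."""
    return next_lower - lower

def midpoint(lower, upper):
    """Class mid-point: halfway between the lower and upper class limits."""
    return (lower + upper) / 2

# Continuous class 10-20
print(width_continuous(10, 20), midpoint(10, 20))  # 10 15.0
# Discrete classes 0-4 and 5-9 (whole numbers only)
print(width_discrete(0, 5), midpoint(0, 4))        # 5 2.0
```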
Definitions associated with frequency distribution classes (continued)
Open-ended class:
- A class without a lower or upper limit.
- Usually used for the first class, which has no defined lower limit, and/or the last class, which has no defined upper limit.
Example classes:
< 10
10-15
15-20
>= 20
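A minimal Python sketch of placing values into the example classes, including the two open-ended ones. The slides do not say which class a boundary value such as 15 belongs to; this sketch assumes each closed class includes its lower limit and excludes its upper limit. The function name `assign_class` is hypothetical.

```python
def assign_class(x):
    """Return the example class label for x, assuming each closed class
    includes its lower limit and excludes its upper limit."""
    if x < 10:
        return "< 10"    # open-ended first class: no lower limit
    if x < 15:
        return "10-15"
    if x < 20:
        return "15-20"
    return ">= 20"       # open-ended last class: no upper limit

for value in [-3, 10, 14.9, 15, 250]:
    print(value, assign_class(value))
```

Arbitrarily small or large values still land in a class, which is the point of leaving the first and last classes open.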
Grouping Data by Classes
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41,
43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 20)
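The slide stops after choosing 5 classes. One common way to finish is sketched below in Python under stated assumptions: the class width is the range divided by the number of classes, rounded up (46 / 5 = 9.2, so 10); the first lower limit is the smallest value rounded down to a multiple of the width (10); and each class includes its lower limit and excludes its upper limit. These choices are assumptions, not steps given in the lesson.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

data.sort()                                  # step 1: ascending order
data_range = data[-1] - data[0]              # step 2: 58 - 12 = 46
num_classes = 5                              # step 3: chosen number of classes

# Assumed next steps: round the width up, start at a multiple of the width
width = math.ceil(data_range / num_classes)  # ceil(9.2) = 10
start = (data[0] // width) * width           # 12 rounded down -> 10

counts = []
for i in range(num_classes):
    lower = start + i * width
    upper = lower + width
    # each class includes its lower limit and excludes its upper limit
    n = sum(1 for x in data if lower <= x < upper)
    counts.append(n)
    print(f"{lower} to under {upper}: {n} (midpoint {(lower + upper) / 2})")
```

With these choices the five classes hold 3, 6, 5, 4 and 2 observations, accounting for all 20 values.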
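Frequency Distribution
- Many narrow classes: the frequency distribution can show gaps from empty classes.
[Figure: frequency histogram of Temperature with narrow classes (bins 4, 8, 12, ..., 60, More)]
- Few wide classes: the frequency distribution can hide patterns of variation across classes.
[Figure: frequency histogram of Temperature with wide classes (bins 0, 30, 60, More)]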
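The Temperature data behind these charts is not included in the lesson, so this hypothetical Python sketch reuses the 20 observations from "Grouping Data by Classes" to show the same effect. The class widths (2 starting at 12, and 30 starting at 0) and the helper name `class_counts` are illustrative choices.

```python
def class_counts(data, start, width):
    """Count observations in classes [start, start + width), [start + width, ...)
    until the largest value is covered."""
    largest = max(data)
    counts = []
    lower = start
    while lower <= largest:
        counts.append(sum(1 for x in data if lower <= x < lower + width))
        lower += width
    return counts

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

narrow = class_counts(data, start=12, width=2)   # many narrow classes
wide = class_counts(data, start=0, width=30)     # few wide classes

print(len(narrow), "narrow classes,", narrow.count(0), "of them empty")
print(len(wide), "wide classes:", wide)
```

The narrow version produces 24 classes, 8 of them empty; the wide version squeezes everything into two classes of 9 and 11, hiding how the values spread out.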
Histogram
[Figure: histogram of the grouped data; x-axis: Class Midpoints (5, 15, 25, 35, 45, 55, More); y-axis: Frequency; bar heights 0, 3, 6, 5, 4, 2, 0]
No gaps between bars, since the data are continuous.
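A text-only Python sketch of the same histogram, assuming classes of width 10 starting at 0 (each including its lower limit), labelled by their midpoints, plus a final "More" bin for values of 60 or above. The variable names are illustrative.

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

labels, heights = [], []
for lower in range(0, 60, 10):
    labels.append(str(lower + 5))                        # class midpoint
    heights.append(sum(1 for x in data if lower <= x < lower + 10))
labels.append("More")
heights.append(sum(1 for x in data if x >= 60))

# Text version of the histogram: one row per class, bar length = frequency
for label, h in zip(labels, heights):
    print(f"{label:>4} | {'#' * h} {h}")
```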
Relative frequency histograms and ogives
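A relative frequency histogram plots each class frequency divided by the number of observations; an ogive plots the cumulative relative frequency against each class's upper limit. A minimal Python sketch, assuming the class frequencies 3, 6, 5, 4, 2 shown in the histogram for the classes 10 to under 20, ..., 50 to under 60:

```python
from itertools import accumulate

counts = [3, 6, 5, 4, 2]               # class frequencies (20 observations)
upper_limits = [20, 30, 40, 50, 60]    # upper limit of each class
n = sum(counts)

relative = [c / n for c in counts]                     # relative frequencies
ogive = [total / n for total in accumulate(counts)]    # cumulative relative frequencies

for upper, rel, cum in zip(upper_limits, relative, ogive):
    print(f"less than {upper}: relative {rel:.2f}, cumulative {cum:.2f}")
```

The ogive climbs 0.15, 0.45, 0.70, 0.90 and ends at 1.00, since every observation falls below the last upper limit.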
Histograms in Excel
1. Select Tools/Data Analysis.
2. Choose Histogram.
3. Input the data and bin ranges.
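Excel's Histogram tool reads the bin range as a list of upper bounds. As we understand Microsoft's documentation, a value is counted in the first bin whose number is greater than or equal to it, and values above the last bin go to an extra "More" row. A minimal Python sketch of that rule, assuming bin numbers 10, 20, ..., 60 (a hypothetical choice) for the 20 observations from "Grouping Data by Classes"; the function name `excel_style_counts` is illustrative and not an Excel API.

```python
from bisect import bisect_left

def excel_style_counts(data, bins):
    """Count values per bin, treating each bin number as an inclusive upper
    bound (assumed to mirror Excel's Histogram tool); last slot is "More"."""
    counts = [0] * (len(bins) + 1)
    for x in data:
        # index of the first bin number >= x; len(bins) if x exceeds every bin
        counts[bisect_left(bins, x)] += 1
    return counts

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
bins = sorted([10, 20, 30, 40, 50, 60])   # bisect requires sorted bin numbers
print(excel_style_counts(data, bins))
```

Under this rule the value 30 is counted in the bin ending at 30, so the counts (0, 3, 7, 4, 4, 2, 0) differ from classes that include their lower limit: the bin convention can change a histogram.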
Stem-and-leaf display: using the tens digit as the stem and the units digit as the leaf, 35 is shown as 3 | 5.
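A minimal Python sketch of a stem-and-leaf display for the 20 observations from "Grouping Data by Classes", assuming non-negative whole numbers with the tens digit as stem and the units digit as leaf:

```python
from collections import defaultdict

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Stem = tens digit, leaf = units digit (so 35 -> stem 3, leaf 5)
stems = defaultdict(list)
for x in sorted(data):
    stems[x // 10].append(x % 10)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

The row for stem 3 reads `3 | 0 2 5 7 8`, which contains the leaf 5 for the value 35.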
Example: Categorical Data (variables are qualitative)
[Figure: pie chart of an investor's portfolio; the Bonds slice shows 29%. Percentages are rounded to the nearest percent.]
Bar Chart Example
[Figure: horizontal bar chart "Investor's Portfolio" with bars for Savings, CD, Bonds and Stocks; x-axis: Amount in $1000's, 0 to 50]
Pareto Diagram Example
[Figure: Pareto diagram for the categories Stocks, Bonds, Savings, CD; bars (bar graph, left axis 0% to 45%): % invested in each category; line (line graph, right axis 0% to 100%): cumulative % invested]
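The numbers behind a Pareto diagram come from sorting the categories from largest to smallest and accumulating their percentages. A minimal Python sketch; the amounts below are hypothetical round numbers chosen for illustration, not the values behind the slide's chart.

```python
# Hypothetical amounts invested ($1000's); NOT the values behind the slide
amounts = {"Stocks": 45, "Bonds": 30, "Savings": 15, "CD": 10}

total = sum(amounts.values())
# Pareto ordering: categories sorted from largest to smallest amount
ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)

cumulative = 0.0
rows = []
for category, amount in ordered:
    pct = 100 * amount / total     # bar height: % invested in this category
    cumulative += pct              # line height: cumulative % invested
    rows.append((category, pct, cumulative))
    print(f"{category:<8} {pct:5.1f}%  cumulative {cumulative:5.1f}%")
```

The bars fall from left to right while the cumulative line rises to 100%.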
Bar Chart Example
Number of days read   Frequency
0                     44
1                     24
2                     18
3                     16
4                     20
5                     22
6                     26
7                     30
Total                 200
[Figure: bar chart "Newspaper readership per week"; x-axis: Number of days newspaper is read per week (0 to 7); y-axis: Frequency (0 to 50)]
Tabulating and Graphing Multivariate Categorical Data
[Figure: side-by-side horizontal bar chart comparing Investor A, Investor B and Investor C for Savings, CD, Bonds and Stocks; axis 0 to 60]
Side-by-Side Chart Example
Sales by quarter for three sales territories:
         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      59        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9
[Figure: side-by-side column chart of the table, with one column each for East, West and North in every quarter (1st Qtr to 4th Qtr); y-axis 0 to 60]
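The side-by-side chart compares territories within each quarter; row and column totals summarize the same two-way table numerically. A minimal Python sketch using the sales figures above (the variable names are illustrative):

```python
quarters = ["1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr"]
sales = {
    "East": [20.4, 27.4, 59, 20.4],
    "West": [30.6, 38.6, 34.6, 31.6],
    "North": [45.9, 46.9, 45, 43.9],
}

# Row totals: each territory across all four quarters
row_totals = {region: sum(values) for region, values in sales.items()}
# Column totals: all territories within each quarter
col_totals = [sum(values[q] for values in sales.values()) for q in range(len(quarters))]

for region, total in row_totals.items():
    print(f"{region:<6} {total:7.1f}")
for quarter, total in zip(quarters, col_totals):
    print(f"{quarter} {total:7.1f}")
print("Grand total", round(sum(row_totals.values()), 1))
```

Both sets of totals add up to the same grand total (444.3), which is a quick consistency check on the table.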
Line Charts and Scatter Diagrams
Line charts show values of one variable vs. time
◦ Time is traditionally shown on the horizontal axis
Year   Inflation Rate (%)
1985   3.56
1986   1.86
1987   3.65
1988   4.14
1989   4.82
1990   5.40
1991   4.21
1992   3.01
1993   2.99
1994   2.56
1995   2.83
1996   2.95
1997   2.29
1998   1.56
1999   2.21
2000   3.36
2001   2.85
2002   1.58
[Figure: line chart "U.S. Inflation Rate"; x-axis: Year (1984 to 2002); y-axis: Inflation Rate (%), 0 to 6]
Scatter Diagram Example
Volume per Day   Y
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200
[Figure: scatter diagram of the pairs above; x-axis: Volume per Day (0 to 70)]
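As an optional aside not covered in these slides, the nearly straight upward pattern in this scatter diagram can be summarized by Pearson's correlation coefficient r, which is close to +1 for a strong positive linear relationship and near 0 when there is no linear relationship. A minimal Python sketch computing r directly from its definition (the function name `pearson_r` is illustrative):

```python
import math

x_vals = [26, 29, 33, 38, 42, 50, 55, 60]          # Volume per Day
y_vals = [140, 146, 160, 167, 170, 188, 195, 200]  # paired Y values

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations divided by the
    square root of the product of the sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r(x_vals, y_vals), 3))
```

For these eight pairs r is about 0.99, consistent with the linear pattern visible in the diagram. A curvilinear relationship can give a much smaller r even when Y depends strongly on X, so the scatter diagram remains the first thing to look at.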
Types of Relationships
Linear Relationships
[Figure: two scatter plots of Y against X showing linear relationships]
Types of Relationships (continued)
Curvilinear Relationships
[Figure: two scatter plots of Y against X showing curvilinear relationships]
Types of Relationships (continued)
No Relationship
[Figure: two scatter plots of Y against X showing no relationship]
Chapter Summary