Lecture 2 Descriptive Statistics: Tabular and Graphical Methods
Descriptive Statistics:
Tabular and Graphical Methods
Lecture Outline
Summarizing Qualitative Data
Summarizing Quantitative Data
Exploratory Data Analysis
Crosstabulations and Scatter Diagrams
Slide 2
Summarizing Qualitative Data
Frequency Distribution
Relative Frequency
Bar Graph
Pie Chart
Slide 3
Frequency Distribution
A frequency distribution is a tabular summary
of data showing the frequency (or number) of
items in each of several nonoverlapping classes.
Slide 4
Example: Marada Inn
Guests staying at Marada Inn were asked to
rate the quality of their accommodations as
being excellent, above average, average, below
average, or poor. The ratings provided by a
sample of 20 guests are shown below.
Slide 5
Example: Marada Inn
Frequency Distribution
Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
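The tally behind a frequency distribution can be sketched in a few lines of Python (our choice of language, not something the lecture prescribes). The original 20 responses are not reproduced here, so the `ratings` list below is rebuilt from the table counts and its ordering is an assumption.

```python
from collections import Counter

# Ratings rebuilt from the frequency table; the order of the original
# 20 responses is not given, so this ordering is an assumption.
ratings = (["Poor"] * 2 + ["Below Average"] * 3 + ["Average"] * 5
           + ["Above Average"] * 9 + ["Excellent"] * 1)

# Classes listed in the order used on the slide.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many items fall in each nonoverlapping class.
counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}

for rating, freq in frequency.items():
    print(f"{rating:<15}{freq:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Because every response belongs to exactly one class, the frequencies sum to the sample size.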
Slide 6
Relative Frequency Distribution
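The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.
Relative frequency of a class = (frequency of the class) / n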
Slide 7
Percent Frequency Distribution
The percent frequency of a class is the relative
frequency multiplied by 100.
Slide 8
Example: Marada Inn
Relative Frequency and Percent
Frequency Distributions
                 Relative     Percent
Rating           Frequency    Frequency
Poor                .10          10
Below Average       .15          15
Average             .25          25
Above Average       .45          45
Excellent           .05           5
Total              1.00         100
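As a minimal sketch, the relative and percent frequencies in this table can be recomputed from the Marada Inn counts. The variable names are illustrative only.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # sample size (20 guests)

# Relative frequency = class frequency / n.
relative = {c: f / n for c, f in frequency.items()}

# Percent frequency = relative frequency * 100 (computed as 100 * f / n).
percent = {c: 100 * f / n for c, f in frequency.items()}

for c in frequency:
    print(f"{c:<15}{relative[c]:>6.2f}{percent[c]:>6.0f}")
```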
Slide 9
Bar Graph
A bar graph is a graphical device for depicting
qualitative data that have been summarized in a
frequency, relative frequency, or percent
frequency distribution.
On the horizontal axis we specify the labels that
are used for each of the classes.
A frequency, relative frequency, or percent
frequency scale can be used for the vertical axis.
Using a bar of fixed width drawn above each
class label, we extend the height appropriately.
The bars are separated to emphasize the fact
that each class is a separate category.
Slide 10
Example: Marada Inn
Bar Graph
[Figure: bar graph of the rating frequencies. Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency, 1 to 9.]
Slide 11
Pie Chart
The pie chart is a commonly used graphical
device for presenting relative frequency
distributions for qualitative data.
First draw a circle; then use the relative
frequencies to subdivide the circle into sectors
that correspond to the relative frequency for
each class.
Since there are 360 degrees in a circle, a class
with a relative frequency of .25 would consume
.25(360) = 90 degrees of the circle.
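The sector-angle arithmetic extends to every class. A small sketch, using the Marada Inn relative frequencies:

```python
# Relative frequencies from the Marada Inn example.
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each class takes a share of the 360-degree circle equal to its
# relative frequency.
angles = {c: r * 360 for c, r in relative.items()}

for c, a in angles.items():
    print(f"{c:<15}{a:>6.1f} degrees")
```

The angles sum to 360 (up to floating-point rounding) because the relative frequencies sum to 1.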
Slide 12
Example: Marada Inn
Pie Chart
[Figure: pie chart of the quality ratings. Sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]
Slide 13
Example: Marada Inn
Insights Gained from the Preceding Pie Chart
One-half of the customers surveyed gave
Marada a quality rating of “above average” or
“excellent” (looking at the left side of the pie).
This might please the manager.
For each customer who gave an “excellent”
rating, there were two customers who gave a
“poor” rating (looking at the top of the pie).
This should displease the manager.
Slide 14
Summarizing Quantitative Data
Frequency Distribution
Dot Plot
Histogram
Cumulative Distributions
Ogive
Slide 15
Example: Hudson Auto Repair
The manager of Hudson Auto would like to get a
better picture of the distribution of costs for
engine tune-up parts. A sample of 50 customer
invoices has been taken and the costs of parts,
rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Slide 16
Frequency Distribution
Guidelines for Selecting Number of Classes
classes.
Slide 17
Frequency Distribution
Guidelines for Selecting Width of Classes
Slide 18
Example: Hudson Auto Repair
Frequency Distribution
If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
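The class-width calculation and the resulting frequency distribution can be sketched as follows. Two choices here are assumptions: the approximate width is rounded up to the next whole dollar, and the first class starts at 50, which matches the 50-59, 60-69, ... classes used in this example.

```python
import math

# The 50 parts costs from the Hudson Auto Repair sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52) / 6
class_width = math.ceil(approx_width)  # assumption: round up to a whole dollar

first_lower = 50  # assumption: lower limit of the first class
frequency = {}
for k in range(num_classes):
    lower = first_lower + k * class_width
    upper = lower + class_width - 1  # costs are whole dollars
    frequency[f"{lower}-{upper}"] = sum(lower <= c <= upper for c in costs)

print(approx_width, class_width)
for label, f in frequency.items():
    print(f"{label:>7}{f:>4}")
```

Dividing each count by 50 gives the relative frequencies for the same classes.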
Slide 19
Example: Hudson Auto Repair
Relative Frequency and Percent Frequency
Distributions
             Relative     Percent
Cost ($)     Frequency    Frequency
50-59           .04           4
60-69           .26          26
70-79           .32          32
80-89           .14          14
90-99           .14          14
100-109         .10          10
Total          1.00         100
Slide 20
Example: Hudson Auto Repair
Insights Gained from the Percent Frequency
Distribution
Only 4% of the parts costs are in the $50-59
class.
30% of the parts costs are under $70.
The greatest percentage (32%, or almost
one-third) of the parts costs are in the $70-79
class.
10% of the parts costs are $100 or more.
Slide 21
Dot Plot
One of the simplest graphical summaries of
data is a dot plot.
Slide 22
Example: Hudson Auto Repair
Dot Plot
[Figure: dot plot of the 50 parts costs. Horizontal axis: Cost ($), 50 to 110.]
Slide 23
Histogram
Another common graphical presentation of
quantitative data is a histogram.
The variable of interest is placed on the
horizontal axis and the frequency, relative
frequency, or percent frequency is placed on the
vertical axis.
A rectangle is drawn above each class interval
with its height corresponding to the interval’s
frequency, relative frequency, or percent
frequency.
Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent
classes.
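A rough text rendering of such a histogram, turned on its side, is sketched below. The class counts are the Hudson Auto relative frequencies multiplied by the sample size of 50. The `#` bar characters are purely illustrative.

```python
# Class frequencies for the Hudson Auto parts costs
# (relative frequency * 50 for each class).
frequency = {
    "50-59": 2,
    "60-69": 13,
    "70-79": 16,
    "80-89": 7,
    "90-99": 7,
    "100-109": 5,
}

# One row per class interval; the bar length equals the class frequency.
# Rows are printed with no blank line between them, mirroring the lack
# of separation between adjacent rectangles.
lines = [f"{label:>7} | {'#' * f} ({f})" for label, f in frequency.items()]
print("\n".join(lines))
```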
Slide 24
Example: Hudson Auto Repair
Histogram
[Figure: histogram of the parts costs. Horizontal axis: Parts Cost ($), 50 to 110. Vertical axis: Frequency, 2 to 18.]
Slide 25
Cumulative Distribution
The cumulative frequency distribution shows
the number of items with values less than or
equal to the upper limit of each class.
The cumulative relative frequency distribution
shows the proportion of items with values less
than or equal to the upper limit of each class.
The cumulative percent frequency distribution
shows the percentage of items with values less
than or equal to the upper limit of each class.
Slide 26
Example: Hudson Auto Repair
Cumulative Distributions
                           Cumulative   Cumulative
            Cumulative     Relative     Percent
Cost ($)    Frequency      Frequency    Frequency
≤ 59             2            .04           4
≤ 69            15            .30          30
≤ 79            31            .62          62
≤ 89            38            .76          76
≤ 99            45            .90          90
≤ 109           50           1.00         100
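The three cumulative columns are running totals of the class frequencies. A minimal sketch, where the per-class counts are the successive differences of the cumulative frequency column:

```python
from itertools import accumulate

upper_limits = [59, 69, 79, 89, 99, 109]
frequency = [2, 13, 16, 7, 7, 5]  # per-class counts for the six classes
n = sum(frequency)

# Running total of items with values <= each class's upper limit.
cum_freq = list(accumulate(frequency))
cum_rel = [cf / n for cf in cum_freq]        # cumulative relative frequency
cum_pct = [100 * cf / n for cf in cum_freq]  # cumulative percent frequency

for u, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {u:<5}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```

The last cumulative frequency always equals the sample size, so the last cumulative relative frequency is 1.00.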
Slide 27
Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal
axis.
Shown on the vertical axis are the:
cumulative frequencies, or
cumulative relative frequencies, or
cumulative percent frequencies
The cumulative frequency (one of the above) of each
class is plotted as a point.
The plotted points are connected by straight
lines.
Slide 28
Example: Hudson Auto Repair
Ogive
Because the class limits for the parts-cost
data are 50-59, 60-69, and so on, there
appear to be one-unit gaps from 59 to 60, 69
to 70, and so on.
These gaps are eliminated by plotting points
halfway between the class limits.
Thus, 59.5 is used for the 50-59 class, 69.5 is
used for the 60-69 class, and so on.
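The plotting positions described above can be sketched as a list of points, with a helper that reads a value off the straight-line segments. Starting the curve at (49.5, 0) is an assumption added for illustration, and `ogive_value` is a hypothetical helper name.

```python
upper_limits = [59, 69, 79, 89, 99, 109]
cum_pct = [4, 30, 62, 76, 90, 100]  # cumulative percent frequencies

# Plot each cumulative value halfway between a class's upper limit and
# the next class's lower limit (59.5, 69.5, ...).
points = [(u + 0.5, p) for u, p in zip(upper_limits, cum_pct)]

# Assumption: anchor the curve at zero, half a unit below the first class.
points.insert(0, (49.5, 0))


def ogive_value(x, pts):
    """Read the ogive at x by linear interpolation between plotted points."""
    if x <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return pts[-1][1]


print(points)
print(ogive_value(74.5, points))  # halfway through the 70-79 class
```

Interpolating between plotted points treats costs as spread evenly within each class, which is only an approximation.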
Slide 29
Example: Hudson Auto Repair
Ogive with Cumulative Percent Frequencies
[Figure: ogive. Horizontal axis: Parts Cost ($), 50 to 110. Vertical axis: Cumulative Percent Frequency, 20 to 100.]
Slide 30
Exploratory Data Analysis
The techniques of exploratory data analysis
consist of simple arithmetic and easy-to-draw
pictures that can be used to summarize data
quickly.
Slide 31
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank
order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has
the advantage of showing the actual data
values.
The first digits of each data item are arranged
to the left of a vertical line.
To the right of the vertical line we record the
last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
Slide 32
Example: Hudson Auto Repair
Stem-and-Leaf Display
 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9
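A display like this can be built by splitting each value into its leading digits (the stem) and its last digit (the leaf). A minimal sketch for whole-dollar data, where `stem_and_leaf` is an illustrative name and stems with no data are simply omitted:

```python
from collections import defaultdict

costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stem_and_leaf(values):
    """Return one text line per stem, with leaves in rank order."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in rank order
        stem, leaf = divmod(v, 10)    # leading digits, last digit
        stems[stem].append(leaf)
    return [f"{stem:>2} | {' '.join(map(str, leaves))}"
            for stem, leaves in sorted(stems.items())]


lines = stem_and_leaf(costs)
print("\n".join(lines))
```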
Slide 33
Stretched Stem-and-Leaf Display
If we believe the original stem-and-leaf display
has condensed the data too much, we can
stretch the display by using two or more stems for
each leading digit(s). With two stems, the first
holds leaves 0-4 and the second holds leaves 5-9.
Slide 34
Example: Hudson Auto Repair
Stretched Stem-and-Leaf Display
 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9
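Stretching can be sketched by keying each row on the stem plus a flag for which half of the leaf range (0-4 or 5-9) the value falls in. The function name is illustrative, and the two-way split is the one used in this example.

```python
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stretched_stem_and_leaf(values):
    """Two rows per stem: leaves 0-4 on the first, 5-9 on the second."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return [f"{stem:>2} | {' '.join(map(str, leaves))}"
            for (stem, _half), leaves in sorted(rows.items())]


lines = stretched_stem_and_leaf(costs)
print("\n".join(lines))
```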
Slide 35
Stem-and-Leaf Display
Leaf Units
Slide 36
Example: Leaf Unit = 0.1
If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8
a stem-and-leaf display of these data will be
Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
Slide 37
Example: Leaf Unit = 10
If we have data with values such as
1806 1717 1974 1791 1682 1910 1838
a stem-and-leaf display of these data will be
Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7
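The Leaf Unit = 10 display is consistent with dropping the units digit of each value (1806 becomes stem 18, leaf 0) rather than rounding it. A sketch under that reading, restricted to whole-number leaf units to avoid floating-point surprises; the function name is hypothetical.

```python
def stem_and_leaf_with_unit(values, leaf_unit):
    """Stem-and-leaf lines for integer data and an integer leaf unit.

    Digits below the leaf unit are dropped (truncated), not rounded.
    """
    rows = {}
    for v in sorted(values):
        scaled = v // leaf_unit          # e.g. 1806 // 10 -> 180
        stem, leaf = divmod(scaled, 10)  # 180 -> stem 18, leaf 0
        rows.setdefault(stem, []).append(leaf)
    return [f"{stem} | {' '.join(map(str, leaves))}"
            for stem, leaves in sorted(rows.items())]


data = [1806, 1717, 1974, 1791, 1682, 1910, 1838]
lines = stem_and_leaf_with_unit(data, leaf_unit=10)
print("Leaf Unit = 10")
print("\n".join(lines))
```

Multiplying a stem-and-leaf pair by the leaf unit recovers the value only approximately: `18 | 0` stands for roughly 1800.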
Slide 38
Crosstabulations and Scatter Diagrams
Thus far we have focused on methods that are
used to summarize the data for one variable at a
time.
Often a manager is interested in tabular and
graphical methods that will help understand the
relationship between two variables.
Crosstabulation and a scatter diagram are two
methods for summarizing the data for two (or
more) variables simultaneously.
Slide 39
Crosstabulation
Crosstabulation is a tabular method for
summarizing the data for two variables
simultaneously.
Crosstabulation can be used when:
One variable is qualitative and the other is
quantitative
Both variables are qualitative
Both variables are quantitative
The left and top margin labels define the classes
for the two variables.
Slide 40
Example: Finger Lakes Homes
Crosstabulation
The number of Finger Lakes homes sold for each
style and price for the past two years is shown
below.
Price Range    [style]  [style]  Split-Level  A-Frame  Total
≤ $99,000         18        6        19          12      55
> $99,000         12       14        16           3      45
Total             30       20        35          15     100
Slide 41
Example: Finger Lakes Homes
Insights Gained from the Preceding
Crosstabulation
The greatest number of homes in the sample
(19) are a split-level style and priced at less
than or equal to $99,000.
Only three homes in the sample are an A-Frame
style and priced at more than $99,000.
Slide 42
Crosstabulation: Row or Column
Percentages
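Converting the entries in the table into row
percentages or column percentages can provide
additional insight into the relationship between
the two variables.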
Slide 43
Example: Finger Lakes Homes
Row Percentages
Slide 44
Example: Finger Lakes Homes
Column Percentages
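Row and column percentages for the Finger Lakes crosstabulation can be computed as sketched below. Only the Split-Level and A-Frame columns are identified in the text, so the labels "Style 1" and "Style 2" are placeholders, not names from the lecture.

```python
# Column labels: the first two are placeholders (assumption).
styles = ["Style 1", "Style 2", "Split-Level", "A-Frame"]

# Home counts by price range (rows) and style (columns).
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row percentages: each entry as a percent of its row total.
row_pct = {row: [100 * v / sum(vals) for v in vals]
           for row, vals in counts.items()}

# Column percentages: each entry as a percent of its column total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {row: [100 * v / t for v, t in zip(vals, col_totals)]
           for row, vals in counts.items()}

for row in counts:
    print(row, [round(p, 2) for p in row_pct[row]])
for row in counts:
    print(row, [round(p, 2) for p in col_pct[row]])
```

Row percentages sum to 100 across each row, and column percentages sum to 100 down each column.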
Slide 45
Scatter Diagram
A scatter diagram is a graphical presentation of
the relationship between two quantitative
variables.
One variable is shown on the horizontal axis
and the other variable is shown on the vertical
axis.
The general pattern of the plotted points
suggests the overall relationship between the
variables.
Slide 46
Scatter Diagram
A Positive Relationship
Slide 47
Scatter Diagram
A Negative Relationship
Slide 48
Scatter Diagram
No Apparent Relationship
Slide 49
Example: Panthers Football Team
Scatter Diagram
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.
x = Number of      y = Number of
Interceptions      Points Scored
      1                 14
      3                 24
      2                 18
      1                 17
      3                 27
Slide 50
Example: Panthers Football Team
Scatter Diagram
[Figure: scatter diagram. Horizontal axis (x): Number of Interceptions, 0 to 3. Vertical axis (y): Number of Points Scored, 0 to 30.]
Slide 51
Example: Panthers Football Team
The preceding scatter diagram indicates a
positive relationship between the number of
interceptions and the number of points scored.
Higher points scored are associated with a
higher number of interceptions.
The relationship is not perfect; the plotted
points in the scatter diagram do not all lie on a
straight line.
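One simple numerical check of these two observations, offered as a sketch rather than a method from the lecture, is to group the games by number of interceptions and compare the points scored within and across groups.

```python
from collections import defaultdict

interceptions = [1, 3, 2, 1, 3]       # x
points_scored = [14, 24, 18, 17, 27]  # y

# Collect the y values observed for each x value.
by_x = defaultdict(list)
for x, y in zip(interceptions, points_scored):
    by_x[x].append(y)

# Average points scored for each interception count.
mean_points = {x: sum(ys) / len(ys) for x, ys in sorted(by_x.items())}
print(mean_points)

# Games with the same x but different y cannot all sit on one line.
print({x: ys for x, ys in sorted(by_x.items()) if len(set(ys)) > 1})
```

The averages rise with the number of interceptions, which agrees with the positive relationship, while the spread within x = 1 and x = 3 shows why the relationship is not perfect.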
Slide 52
Tabular and Graphical Procedures
Data
Slide 53