
Chapter 2 Summarizing Data S

Chapter 2 covers methods for summarizing data, including frequency distributions, descriptive statistics, and various data presentation techniques such as summary tables, bar charts, pie charts, and Pareto diagrams. It emphasizes the importance of visualizing categorical and numerical data to identify patterns and trends, as well as the use of contingency tables for bivariate categorical data. The chapter also outlines guidelines for grouping data into classes and determining appropriate class intervals for effective data analysis.

Chapter 2

Summarizing Data (Review)


Learning Objectives
In this chapter you learn:
■1. Frequency distribution classes
■2. Descriptive Statistics: Tables and Charts/Graphs
Data Presentation
■Categorical Data
- Summary Table, presented as a Bar Graph, Pie Chart, or Pareto Diagram
■Numerical Data
- Dot Plot
- Stem-&-Leaf Display
- Frequency Distribution, presented as a Histogram
Summary Table
1. Lists categories & the number of elements in each category
2. Obtained by tallying responses in each category
3. May show frequencies (counts), %, or both
Each row is a category (Major):
Major | Count
Accounting | 130
Economics | 20
Management | 50
Total | 200
Summary Table-Example
■A summary table of the retirement funds, categorized by risk
Example
■The sample of 318 retirement funds includes the variable risk that has the defined categories Low, Average, and High. Construct a summary table of the retirement funds, categorized by risk (Dataset: Retirement Funds).
Summary Table: Retirement Funds Categorized by Risk
Fund Risk Level | Number of Funds | Percentage of Funds
Low | 99 | 31.13%
Average | 145 | 45.60%
High | 74 | 23.27%
Total | 318 | 100.00%
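Building a summary table is a tally followed by a division by the total. The Python sketch below is only an illustration: it assumes the raw responses are available as a list of category labels, and because the Retirement Funds file is not reproduced here, that list is rebuilt from the counts in the table above. The helper name `summary_table` is hypothetical.

```python
from collections import Counter


# Hypothetical helper (not from the slides): tally labels, then add percentages.
def summary_table(responses, order):
    """Return (category, count, percent) rows in the requested category order."""
    counts = Counter(responses)  # tally: category -> number of responses
    total = sum(counts.values())
    return [(cat, counts[cat], 100 * counts[cat] / total) for cat in order]


# Illustrative raw data rebuilt from the table's counts (not the real data file).
risk = ["Low"] * 99 + ["Average"] * 145 + ["High"] * 74

for category, count, pct in summary_table(risk, ["Low", "Average", "High"]):
    print(f"{category:<8} {count:>4} {pct:6.2f}%")
```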
Bar Chart
(for an Investor’s Portfolio)
Bar Chart
■The bar chart visualizes a categorical variable as a series of bars.
■The length of each bar represents either the frequency or the percentage of values for each category.
■Each bar is separated from the next by a space called a gap.
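To make the point that bar length encodes the percentage, here is a rough text-only sketch. It assumes the percentages from the Current Investment Portfolio example in this chapter as input and draws one `#` per 2 percentage points; the scale and the helper name `text_bar_chart` are arbitrary, hypothetical choices. In practice a plotting tool would draw the chart.

```python
# Percentages from the Current Investment Portfolio example (assumed input).
portfolio = {"Stocks": 42.27, "Bonds": 29.09, "CD": 14.09, "Savings": 14.55}


# Hypothetical helper (not from the slides).
def text_bar_chart(values, scale=2.0):
    """Return one text bar per category; each '#' stands for `scale` units."""
    width = max(len(name) for name in values)
    lines = []
    for name, value in values.items():
        bar = "#" * round(value / scale)  # bar length is proportional to the value
        lines.append(f"{name:<{width}} | {bar} {value:.2f}%")
    return lines


for line in text_bar_chart(portfolio):
    print(line)
```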
Pie Chart Example
Current Investment Portfolio (Variables are Qualitative)
Investment Type | Amount (in thousands $) | Percentage
Stocks | 46.5 | 42.27
Bonds | 32.0 | 29.09
CD | 15.5 | 14.09
Savings | 16.0 | 14.55
Total | 110 | 100
[Figure: pie chart with slices Stocks 42%, Bonds 29%, Savings 15%, CD 14%; percentages are rounded to the nearest percent]
Pie Chart
■The pie chart is a circle broken up into slices that represent categories.
■The size of each slice of the pie varies according to the percentage in each category.
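Since the whole circle is 360 degrees, each slice's angle is its share of the total times 360. The sketch below assumes the amounts from the Current Investment Portfolio table as input; `pie_slices` is a hypothetical helper name.

```python
# Amounts (in thousands of $) from the Current Investment Portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


# Hypothetical helper (not from the slides).
def pie_slices(values):
    """Return {category: (percentage, slice angle in degrees)}."""
    total = sum(values.values())
    return {
        name: (100 * v / total, 360 * v / total)  # angle = share of the full circle
        for name, v in values.items()
    }


for name, (pct, angle) in pie_slices(amounts).items():
    print(f"{name:<8} {pct:6.2f}%  {angle:6.1f} degrees")
```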
Doughnut chart
Example
■The sample of 318 retirement funds includes the variable risk that has the defined categories Low, Average, and High. Construct a bar chart and a pie chart of the retirement funds, categorized by risk (Dataset: Retirement Funds).
Pareto Diagram
[Figure: Pareto diagram; the axis for the bar chart shows the % invested in each category, and the axis for the line graph shows the cumulative % invested]
VILFREDO PARETO
(1843–1923)

The Pareto Principle


■Pareto showed that approximately 80%
of the total wealth in a society lies with
only 20% of the families.
■This famous law about the “vital few
and the trivial many” is widely known as
the Pareto principle in economics.
Pareto Diagram
▪ Used to portray categorical data
▪ A bar chart, where categories are shown in
descending order of frequency
▪ A cumulative polygon is shown in the same graph
▪ Used to separate the “vital few” from the “trivial
many”.
▪ Pareto charts are also powerful tools for prioritizing
improvement efforts, such as when data are
collected that identify defective or nonconforming
items.
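The two ingredients of a Pareto diagram, descending order and a cumulative percentage line, can be computed before anything is drawn. This sketch assumes the investment amounts from the Current Investment Portfolio table as input (any category counts would work the same way); `pareto_table` is a hypothetical helper name.

```python
# Amounts (in thousands of $) from the Current Investment Portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


# Hypothetical helper (not from the slides).
def pareto_table(values):
    """Sort categories by descending value; add percentage and cumulative percentage."""
    total = sum(values.values())
    rows, running = [], 0.0
    for name, v in sorted(values.items(), key=lambda item: item[1], reverse=True):
        running += v  # running total drives the cumulative polygon
        rows.append((name, 100 * v / total, 100 * running / total))
    return rows


for name, pct, cum in pareto_table(amounts):
    print(f"{name:<8} {pct:6.2f}%  cumulative {cum:6.2f}%")
```

The bars use the second column and the cumulative line uses the third, so the first few rows with a steep cumulative rise are the "vital few".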
Pareto Diagram
[Figure: Pareto diagram highlighting the “vital few” categories]
Pareto Diagram
■Using the bank’s own processing systems as the primary data source, the causes of incomplete transactions were collected and stored in ATM Transactions. Use these data to construct a Pareto diagram.
Example
[Figure: Pareto diagram]
Pareto diagram

■The first three categories account for about 60% of the defects.
■Decision makers can see where to concentrate efforts to improve the process. Attempts to reduce defects due to warpage, damage, and pin marks should produce the greatest payoff.
Summary
Bar graph: The categories (classes) of the
qualitative variable are represented by bars, where
the height of each bar is either the class frequency,
class relative frequency, or class percentage.
Pie chart: The categories (classes) of the
qualitative variable are represented by slices of a
pie (circle). The size of each slice is proportional to
the class relative frequency.
Pareto diagram: A bar graph with the categories
(classes) of the qualitative variable (i.e., the bars)
arranged by height in descending order from left to
right.
Bivariate Categorical Data

■Contingency Tables
■Side By Side Bar Charts
Bivariate Categorical Data
■Contingency Table: Investment in Thousands of Dollars
Investment Category | Investor A | Investor B | Investor C | Total
Stocks | 46.5 | 55 | 27.5 | 129
Bonds | 32 | 44 | 19 | 95
CD | 15.5 | 20 | 13.5 | 49
Savings | 16 | 28 | 7 | 51
Total | 110 | 147 | 67 | 324
Bivariate Categorical Data

■Side by Side Charts


Contingency Table
■A contingency table cross-tabulates, or tallies
jointly, the data of two or more categorical
variables, allowing you to study patterns that may
exist between the variables.
■Tallies can be shown as a frequency, a percentage
of the overall total, a percentage of the row total, or
a percentage of the column total, depending on the
type of contingency table you use.
■ Each tally appears in its own cell, and there is a
cell for each joint response, a unique combination
of values for the variables being tallied.
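The three percentage versions of a contingency table differ only in the denominator: the overall total, the row total, or the column total. The sketch below assumes the investment table (thousands of dollars) from this chapter as input; `percentages` is a hypothetical helper name.

```python
# Investment amounts (thousands of $) from the investor contingency table.
table = {
    "Stocks":  {"A": 46.5, "B": 55, "C": 27.5},
    "Bonds":   {"A": 32,   "B": 44, "C": 19},
    "CD":      {"A": 15.5, "B": 20, "C": 13.5},
    "Savings": {"A": 16,   "B": 28, "C": 7},
}

row_totals = {r: sum(cells.values()) for r, cells in table.items()}
col_totals = {}
for cells in table.values():
    for c, v in cells.items():
        col_totals[c] = col_totals.get(c, 0) + v
grand_total = sum(row_totals.values())


# Hypothetical helper (not from the slides).
def percentages(kind):
    """Each cell as a % of the 'overall', 'row', or 'column' total."""
    result = {}
    for r, cells in table.items():
        for c, value in cells.items():
            if kind == "overall":
                base = grand_total
            elif kind == "row":
                base = row_totals[r]
            else:
                base = col_totals[c]
            result[(r, c)] = 100 * value / base
    return result


print(f"{percentages('overall')[('Stocks', 'A')]:.2f}")  # 14.35
print(f"{percentages('row')[('Stocks', 'A')]:.2f}")      # 36.05
print(f"{percentages('column')[('Stocks', 'A')]:.2f}")   # 42.27
```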
Contingency Table- Example
■Use Mutual Funds to construct a contingency table for the levels of risk and the types of funds.
Side by Side Bar Chart- Example
■Use Mutual Funds to construct a side-by-side chart to visualize the data for the levels of risk for growth and value funds.
Side-By-Side Bar Charts
The doughnut chart
Data Presentation
■Categorical Data
- Summary Table, presented as a Bar Graph, Pie Chart, or Pareto Diagram
■Numerical Data
- Dot Plot
- Stem-&-Leaf Display
- Frequency Distribution, presented as a Histogram
Dot Plot
1. Horizontal axis is a scale for the quantitative
variable, e.g., percent.
2. The numerical value of each measurement is
located on the horizontal scale by a dot.
Stem-and-Leaf
■Data in Raw Form (as Collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
■Data in Ordered Array from Smallest to Largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
■Stem-and-Leaf Display:
2 | 144677
3 | 028
4 | 1
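For whole numbers, the stem is the tens digit and the leaf is the units digit. The sketch below assumes non-negative integers and reproduces the display above from the raw data; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict


# Hypothetical helper (not from the slides); assumes non-negative integers.
def stem_and_leaf(data):
    """Build display rows: stem = tens digit, leaf = units digit."""
    groups = defaultdict(list)
    for x in sorted(data):  # ordered array first, so each row's leaves are sorted
        groups[x // 10].append(x % 10)
    stems = range(min(groups), max(groups) + 1)  # keep empty stems so gaps stay visible
    return [f"{s} | {''.join(str(leaf) for leaf in groups[s])}" for s in stems]


raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
print("\n".join(stem_and_leaf(raw)))
```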
Example
■Suppose you collect the following meal costs (in $) for 15 classmates who had lunch at a fast-food restaurant (stored in FastFood):
Example
■ To construct the stem-and-leaf display, you use whole dollar amounts as the stems and round the cents to one decimal place to use as the leaves. For the first value, 7.42, the stem is 7 and its leaf is 4. For the second value, 6.29, which rounds to 6.3, the stem is 6 and its leaf is 3.

■ A stem-and-leaf display turned sideways looks like a histogram.
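The stem/leaf rule for the meal costs can be sketched as follows. It assumes that ties are rounded half up, which the slide does not specify, and uses Python's `decimal` module so that values such as 6.29 are not distorted by binary floating point. `stem_leaf_dollars` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP


# Hypothetical helper (not from the slides); half-up rounding is an assumption.
def stem_leaf_dollars(cost):
    """Round a cost to one decimal place; stem = whole dollars, leaf = tenths digit."""
    tenths = Decimal(str(cost)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    stem = int(tenths)                # whole-dollar part
    leaf = int((tenths - stem) * 10)  # first decimal digit after rounding
    return stem, leaf


print(stem_leaf_dollars(7.42))  # (7, 4)
print(stem_leaf_dollars(6.29))  # (6, 3)
```

Rounding before splitting matters: a cost such as 6.96 rounds to 7.0 and therefore belongs on stem 7 with leaf 0, not on stem 6.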


Frequency Distributions
■A frequency distribution is a list or a table
containing the values of a variable (or a set of
ranges within which the data fall) and the
corresponding frequencies with which each value
occurs (or frequencies with which data fall within
each range).
■A frequency distribution is a way to summarize data
■The distribution condenses the raw data into a
more useful form and allows for a quick visual
interpretation of the data.
Example
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class | Frequency | Relative Frequency | Percentage
10 but under 20 | 3 | .15 | 15
20 but under 30 | 6 | .30 | 30
30 but under 40 | 5 | .25 | 25
40 but under 50 | 4 | .20 | 20
50 but under 60 | 2 | .10 | 10
Total | 20 | 1.00 | 100
Frequency Distribution:
Discrete Data
■The following data record the number of
children in the families of the 47 workers in a
company:
1 1 3 2 0 2 0 1 2 2 1 3

5 2 4 0 0 2 4 1 1 2 2 0

3 0 0 2 1 3 6 0 2 1 0 3

2 2 2 1 0 0 1 1 3 1 4
Frequency distribution table
Number of children in family | Number of workers
0
1
2
3
4
5
6
Frequency Distribution: Discrete Data
■Discrete data: possible values are countable
Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.
Number of days read | Frequency
0 | 44
1 | 24
2 | 18
3 | 16
4 | 20
5 | 22
6 | 26
7 | 30
Total | 200
Relative Frequency
Relative Frequency: What proportion is in each category?
Number of days read | Frequency | Relative Frequency
0 | 44 | .22
1 | 24 | .12
2 | 18 | .09
3 | 16 | .08
4 | 20 | .10
5 | 22 | .11
6 | 26 | .13
7 | 30 | .15
Total | 200 | 1.00
22% of the people in the sample report that they read the newspaper 0 days per week.
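Each relative frequency is the class frequency divided by the total number of observations. The sketch below assumes the newspaper-reading frequencies from the table above as input.

```python
# Frequencies from the newspaper-reading example: days read per week -> customers.
freq = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

n = sum(freq.values())  # total number of customers surveyed
relative = {days: f / n for days, f in freq.items()}  # proportion in each category

for days, rf in relative.items():
    print(f"{days} days: {rf:.2f}")
```

The relative frequencies always sum to 1 (up to floating-point rounding), which is a quick check on the arithmetic.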
NOTE
For developing frequency and relative frequency distributions for discrete data:
(1) List all possible values of the variable. If the variable is quantitative, order the possible values from low to high.
(2) Count the number of occurrences of each value of the variable and place this count in a column labeled “frequency”.
(3) Determine the relative frequencies (each frequency divided by the total number of observations).
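The three steps can be applied to the data on the number of children in the families of the 47 workers. This is a sketch under one assumption: the "possible values" are taken to be every whole number from the smallest to the largest observed value.

```python
from collections import Counter

# Number of children in the families of the 47 workers (data from the slide).
children = [
    1, 1, 3, 2, 0, 2, 0, 1, 2, 2, 1, 3,
    5, 2, 4, 0, 0, 2, 4, 1, 1, 2, 2, 0,
    3, 0, 0, 2, 1, 3, 6, 0, 2, 1, 0, 3,
    2, 2, 2, 1, 0, 0, 1, 1, 3, 1, 4,
]

n = len(children)
values = range(min(children), max(children) + 1)  # step 1: possible values, low to high
counts = Counter(children)                         # step 2: occurrences of each value

print("Children  Frequency  Relative frequency")
for v in values:
    f = counts[v]
    print(f"{v:>8}  {f:>9}  {f / n:>18.3f}")       # step 3: relative frequency = f / n
```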
Frequency Distribution: Continuous Data
■Continuous data: may take on any value in some interval
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
(Temperature is a continuous variable because it could be measured to any degree of precision desired.)
Distribution classes
■ Class limits: are the lower and upper values of the classes as physically described in the distribution.
[Figure: lower and upper class limits marked on discrete-data classes and on continuous-data classes]
Distribution classes
■Class widths (class lengths):
- continuous data: are the numerical differences
between lower and upper class limits.
- discrete data: are the numerical differences
between the lower limit of one class and the lower
limit of the immediately following class
■Class mid-points: are situated in the centre of the
classes.
Distribution classes
■ Open-ended class:
- A class without a lower or an upper limit.
- Usually used for the first class, which has no defined lower limit, and/or the last class, which has no defined upper limit.
Example classes: < 10, 10-15, 15-20, >= 20
Grouping Data by Classes
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

■ Find range: 58 - 12 = 46
■ Select number of classes: 5 (usually between 5 and 20)
■ Compute class width: 10 (46/5 = 9.2, rounded up to 10)
■ Determine class boundaries: 10, 20, 30, 40, 50, 60
■ Compute class midpoints: 15, 25, 35, 45, 55
■ Count observations & assign to classes
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Frequency Distribution
Class | Frequency | Relative Frequency
10 but under 20 | 3 | .15
20 but under 30 | 6 | .30
30 but under 40 | 5 | .25
40 but under 50 | 4 | .20
50 but under 60 | 2 | .10
Total | 20 | 1.00
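The grouping steps (classes of equal width starting at 10, counts, midpoints, relative frequencies) can be sketched for the 20 temperatures. The sketch assumes "a but under b" classes, i.e. each class includes its lower limit and excludes its upper limit; `grouped_frequency` is a hypothetical helper name.

```python
# Daily high temperatures for the 20 winter days (ordered array from the slides).
temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]


# Hypothetical helper (not from the slides).
def grouped_frequency(data, start, width, k):
    """k classes [start + i*width, start + (i+1)*width), i.e. 'lower but under upper'."""
    n = len(data)
    rows = []
    for i in range(k):
        lower, upper = start + i * width, start + (i + 1) * width
        f = sum(lower <= x < upper for x in data)  # count values in this class
        midpoint = (lower + upper) / 2
        rows.append((f"{lower} but under {upper}", midpoint, f, f / n))
    return rows


for label, mid, f, rel in grouped_frequency(temps, start=10, width=10, k=5):
    print(f"{label:<16} midpoint={mid:<5} frequency={f}  relative={rel:.2f}")
```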
Questions for Grouping Data into Classes
■1. How wide should each interval be? (How many classes should be used?)
■2. How should the endpoints of the intervals be determined?
■Often answered by trial and error, subject to user judgment
■The goal is to create a distribution that is neither too “jagged” nor too “blocky”
■Goal is to appropriately show the pattern of variation in the data
How Many Class Intervals?
■ Many (Narrow class intervals)
- may yield a very jagged distribution with gaps from empty classes
- can give a poor indication of how frequency varies across classes
■ Few (Wide class intervals)
- may compress variation too much and yield a blocky distribution
- can obscure important patterns of variation
(X axis labels are upper class endpoints)
Guidelines for grouping values into classes
• Use between 5 and 20 classes, or Sturges’s Rule: Classes = 1 + 3.322[log10(n)], where n is the number of data values.
The classes should meet four criteria:
- First, they must be mutually exclusive.
- Second, they must be all-inclusive.
- Third, if at all possible, they should be of equal width.
- Fourth, avoid empty classes if possible.
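Sturges’s Rule can be evaluated directly. The sketch assumes the result is rounded to the nearest whole number, which the slide does not specify (some texts round up instead); `sturges_classes` is a hypothetical helper name.

```python
import math


# Hypothetical helper (not from the slides); nearest-integer rounding is an assumption.
def sturges_classes(n):
    """Sturges's rule: 1 + 3.322 * log10(n), rounded to the nearest whole number."""
    return round(1 + 3.322 * math.log10(n))


for n in (20, 47, 200, 318):
    print(n, sturges_classes(n))
```

For the 20 temperatures this gives 5 classes (1 + 3.322 × 1.301 ≈ 5.32), matching the five classes used in the grouping example.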
General Guidelines
■ Number of Data Points Number of Classes
under 50 5- 7
50 – 100 6 - 10
100 – 250 7 - 12
over 250 10 - 20
■Class widths can typically be reduced as the number of
observations increases
■Distributions with numerous observations are more
likely to be smooth and have gaps filled since data are
plentiful
Class Width
■The class width is the distance between the lowest possible value and the highest possible value for a frequency class.
■The minimum class width is
$$W = \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of Classes}}$$
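Applied to the 20 temperatures with 5 classes, the formula gives (58 − 12)/5 = 9.2. The sketch assumes the width is then rounded up to the next whole number, as in the grouping example; `class_width` is a hypothetical helper name.

```python
import math

# Daily high temperatures for the 20 winter days.
temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]


# Hypothetical helper (not from the slides); rounding up is an assumption.
def class_width(data, k):
    """Return the minimum width (largest - smallest) / k and that width rounded up."""
    raw = (max(data) - min(data)) / k
    return raw, math.ceil(raw)


print(class_width(temps, 5))  # (9.2, 10)
```

If the division comes out to a whole number, rounding up leaves it unchanged, so check that the largest value still falls inside the last "but under" class; if it does not, start the first class a little below the minimum or use a wider class.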
The Histogram
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: histogram of these data with no gaps between bars; class boundaries and class midpoints marked on the horizontal axis]
The Histogram
▪ A graph of the data in a frequency distribution is called a histogram.
▪ The class boundaries (or class midpoints) are shown on the horizontal axis.
▪ The vertical axis is either frequency, relative frequency, or percentage.
▪ Bars of the appropriate heights are used to represent the number of observations within each class.
Example
■To improve the quality of a product, a sample of 10 products is randomly collected each day for ten days. The outside diameter is gauged and reported in dataset diameter. Construct a frequency distribution and a histogram to show the distribution of the diameter.
The Histogram
Three general types of information:
❑ A visual indication of where the approximate center of the data is.
❑ The degree of spread (or variation) in the data.
❑ The shape of the distribution.
Data pattern
■Patterns in data are commonly described in terms of center, spread, shape, and unusual features.
■Some common distributions have special descriptive labels, such as symmetric, bell-shaped, and skewed.
Center
■The center of a distribution is located at the median of the distribution.
■This is the point in a graphic display where about half of the observations are on either side.
■Example:
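As an illustration (not part of the original slides), the median of the 20 daily high temperatures used in this chapter can be found with Python's standard `statistics` module; with an even number of values it averages the two middle ones. The counts confirm that half of the observations lie on each side.

```python
import statistics

# Daily high temperatures for the 20 winter days (raw order from the example).
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

center = statistics.median(temps)       # mean of the 10th and 11th ordered values
below = sum(x < center for x in temps)  # observations below the center
above = sum(x > center for x in temps)  # observations above the center
print(center, below, above)             # 31.0 10 10
```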
Spread
■The spread of a distribution refers to the variability of the data.
■If the observations cover a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller.
Shape
The shape of a distribution is described by the following characteristics:
■Symmetry
■Number of peaks
■Skewness
■Uniform: the observations are spread evenly across the range of the distribution, so a uniform distribution has no clear peaks.
Example
Unusual features
■The two most common unusual features are gaps and outliers.
■Gaps: areas of a distribution where there are no observations.
■Outliers: extreme values that differ greatly from the other observations.
Example
Histogram-Example
■ A manufacturer of industrial wheels suspects that
profitable orders are being lost because of the long time
the firm takes to develop price quotes for potential
customers. To investigate this possibility, 50 requests for
price quotes were randomly selected from the set of all
quotes made last year, and the processing time was
determined for each quote. Each quote was classified
according to whether the order was “lost” or not (i.e.,
whether or not the customer placed an order after
receiving a price quote).
■ Use data set QUOTES to create a frequency histogram
for these data. Then shade the area under the histogram
that corresponds to lost orders. Interpret the result.
Histogram-Example
[Figure: histogram of processing times, with the “lost” orders in the upper tail of the distribution]
Histogram- Example
■ In the Journal of Experimental Social Psychology (Vol. 45, 2009) study on
whether money can buy love (p. 63), the researchers randomly assigned
participants to the role of either gift-giver or gift-receiver. (Gift-givers, recall,
were asked about a birthday gift they recently gave, while gift-recipients were
asked about a birthday gift they recently received.) Two quantitative variables
were measured for each of the 237 participants: gift price (measured in dollars)
and overall level of appreciation for the gift (measured as the sum of the two 7-
point appreciation scales, with higher values indicating a higher level of
appreciation).
■ One of the objectives of the research was to investigate whether givers and
receivers differ on the price of the gift reported and on the level of appreciation
reported.
■ Use BUYLOV to construct side-by-side histograms for the quantitative
variables, one histogram for gift-givers and one for gift-recipients.
The histograms for birthday gift price
The prices reported by gift-recipients tended to be higher than the prices reported by gift-givers.
The histograms for overall level of appreciation
Gift-givers and gift-recipients respond differently: gift-recipients are more likely to express a greater level of appreciation for the gift than gift-givers perceive.
Organizing Numerical Data
Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
■Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
■Stem-and-Leaf Display:
2 | 144677
3 | 028
4 | 1
■Frequency Distributions and Cumulative Distributions, presented as Tables, Histograms, Polygons, and Ogives
[Figure: Meal Costs (City); vertical axis Frequency, horizontal axis Meal cost]
Cumulative and relative cumulative frequency distribution
■Cumulative frequency distribution: a summary of a set of data that displays the number of observations with values less than or equal to the upper limit of each of its classes.
■Relative cumulative frequency distribution: a summary of a set of data that displays the proportion of observations with values less than or equal to the upper limit of each of its classes.
Relative frequency histograms and ogives
• A relative frequency histogram is formed in the same manner as a frequency histogram, but it uses relative frequencies rather than frequencies.
• The cumulative relative frequency distribution is presented using a graph called an ogive.
Tabulating Numerical Data: Cumulative Frequency
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Lower Limit | Cumulative Frequency | Cumulative %
10 | 0 | 0
20 | 3 | 15
30 | 9 | 45
40 | 14 | 70
50 | 18 | 90
60 | 20 | 100
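In this table each row counts the values that fall below a lower limit. The sketch below assumes the 20 temperatures as input and uses `bisect_left` from Python's standard library, which on a sorted list returns the number of items strictly less than the search value; `cumulative_below` is a hypothetical helper name.

```python
from bisect import bisect_left

temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]


# Hypothetical helper (not from the slides).
def cumulative_below(data, limits):
    """For each lower limit: (limit, count of values below it, cumulative %)."""
    ordered = sorted(data)
    n = len(ordered)
    rows = []
    for limit in limits:
        count = bisect_left(ordered, limit)  # number of values strictly less than limit
        rows.append((limit, count, 100 * count / n))
    return rows


for limit, count, pct in cumulative_below(temps, [10, 20, 30, 40, 50, 60]):
    print(f"{limit:>3} {count:>3} {pct:6.1f}%")
```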
The Ogive (Cumulative %)
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: ogive; the horizontal axis shows the lower class boundaries (not midpoints)]

Ogives
Unlike the percentage polygon, the ogive plots the lower boundaries of the class intervals of the numerical variable, at their respective cumulative percentages, as points on a line along the X axis.
The Frequency Polygon
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: frequency polygon; the horizontal axis shows class midpoints]
The Polygon
▪ A percentage polygon is formed by having the
midpoint of each class represent the data in that
class and then connecting the sequence of
midpoints at their respective class percentages.
▪ The cumulative percentage polygon, or ogive,
displays the variable of interest along the X axis,
and the cumulative percentages along the Y axis.
▪ Useful when there are two or more groups to
compare.
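The points of a percentage polygon are (class midpoint, class percentage) pairs. The sketch below assumes the 20 temperatures grouped into the five classes used earlier, and it also adds an empty class at each end so the line starts and ends on the horizontal axis; that end-class convention is common but is an assumption here, since the slides do not mention it. `percentage_polygon` is a hypothetical helper name.

```python
temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]


# Hypothetical helper (not from the slides).
def percentage_polygon(data, start, width, k):
    """(midpoint, percentage) points for k classes plus an empty class at each end."""
    n = len(data)
    points = []
    for i in range(-1, k + 1):  # i = -1 and i = k are the added empty end classes
        lower = start + i * width
        f = sum(lower <= x < lower + width for x in data)
        points.append((lower + width / 2, 100 * f / n))
    return points


for mid, pct in percentage_polygon(temps, start=10, width=10, k=5):
    print(f"midpoint {mid:5.1f}: {pct:5.1f}%")
```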
The Polygons
This chart uses the midpoints of each class interval to represent the data of each class and then plots the midpoints, at their respective class percentages, as points on a line along the X axis.
The Percentage Polygon
Line Charts and Scatter Diagrams
■Line charts show values of one variable vs. time
■Time is traditionally shown on the horizontal axis
■Scatter diagrams show points for bivariate data
■One variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Example
■ In collecting meal cost data as part of a study that reviews the travel and entertainment costs that a business incurs in a major city, you might want to determine whether the cost of meals at restaurants located in the center city district differs from the cost at restaurants in the surrounding metropolitan area. As you collect meal cost data for this study, you also note the restaurant location, center city or metro area. The file Restaurants stores the data.
■ Construct ogives and polygons of meal costs for center city and
metro area restaurants
Line Chart Example
Year Inflation Rate
1985 3.56
1986 1.86
1987 3.65
1988 4.14
1989 4.82
1990 5.40
1991 4.21
1992 3.01
1993 2.99
1994 2.56
1995 2.83
1996 2.95
1997 2.29
1998 1.56
1999 2.21
2000 3.36
2001 2.85
2002 1.58
Scatter Diagram Example
Volume per day Cost per day
23 125
26 140
29 146
33 160
38 167
42 170
50 188
55 195
60 200
Types of Relationships

■Linear Relationships
Types of Relationships (continued)
■Curvilinear Relationships
Types of Relationships (continued)
■No Relationship
Summary
■END OF CHAPTER 2
