
Week 1 - CH 2

Chapter 2 covers descriptive statistics, focusing on summarizing data for one variable at a time and displaying relationships between two variables. It discusses methods for displaying both quantitative and qualitative data, including frequency distributions, histograms, and crosstabulations. The chapter emphasizes the importance of choosing appropriate class widths and the number of classes for effective data presentation.

Uploaded by

vinctjc070

Chapter 2: Descriptive Statistics: Tabular and Graphical Presentations

I. Summarize the Data for One Variable at a Time:
• Displaying Quantitative Data
• Displaying Qualitative (Categorical) Data

II. Display Relationship Between Two Variables:
• Crosstabulation and Scatter Diagram
I. Display One Variable: Quantitative Data

Summarizing Quantitative Data

Tabular:
• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Cumulative Distributions
Graphical:
• Histogram
• Ogive

Copyright ©2013 Pearson Education, Inc. publishing as Prentice Hall


Recall: Discrete vs. Continuous Data
Discrete data are values based on observations that can be counted and are typically represented by whole numbers (how many)
• something that has been counted
• take on whole numbers such as 0, 1, 2, 3

Continuous data are values that can take on any real number, including numbers that contain decimal points (how much)
• usually measured rather than counted

Let’s focus on discrete data first


Frequency Distribution
A frequency distribution shows the number of
data observations that fall into specific intervals

Example: Number of iPads sold per day
[Table not reproduced: frequency distribution in which each row is one class]
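A frequency distribution of this kind can be tallied in a few lines. The sketch below is only an illustration: the `daily_sales` values are hypothetical (the slide's table is not available), and the function name `frequency_distribution` is an assumption, not something from the course material.

```python
from collections import Counter

# Hypothetical records: number of iPads sold on each of 10 days.
daily_sales = [2, 0, 3, 2, 1, 4, 2, 3, 1, 2]

def frequency_distribution(values):
    """Return {class: frequency}, ordered by class value."""
    counts = Counter(values)
    return dict(sorted(counts.items()))

freq = frequency_distribution(daily_sales)
for sold, days in freq.items():
    print(f"{sold} iPads sold: {days} day(s)")
```

Each distinct value is its own class here, which is reasonable for discrete data with few distinct values.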

Relative and Percentage Frequency
Distributions
Relative frequency distributions display the
proportion of observations of each class relative
to the total number of observations
• shows the fraction of observations in each
class
• found by dividing each frequency by the total
number of observations
• the fractions in a relative frequency
distribution add up to 1.00

Percentage frequency distributions display the percentage of observations of each class relative to the total number of observations
• the percentages add up to 100
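As a rough sketch of the two definitions above, the code below divides each frequency by the total. The frequencies are hypothetical counts over 50 business days, chosen only so that the class "2 iPads" works out to 28%, the figure quoted in these slides; the function names are assumptions.

```python
# Hypothetical frequencies: {iPads sold per day: number of days}, 50 days in total.
frequencies = {0: 6, 1: 9, 2: 14, 3: 11, 4: 7, 5: 3}

def relative_frequencies(freq):
    """Fraction of observations in each class (sums to 1.00)."""
    total = sum(freq.values())
    return {cls: count / total for cls, count in freq.items()}

def percent_frequencies(freq):
    """Percentage of observations in each class (sums to 100)."""
    return {cls: 100 * rel for cls, rel in relative_frequencies(freq).items()}

rel = relative_frequencies(frequencies)
pct = percent_frequencies(frequencies)
for cls in frequencies:
    print(f"{cls}: relative = {rel[cls]:.2f}, percent = {pct[cls]:.0f}")
```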
Example: Relative Frequency Distributions

Example:

Two iPads were sold on 28% of the days

Cumulative Frequency Distributions

A cumulative frequency distribution shows the number of items that are less than or equal to the upper limit of each class

A cumulative relative frequency distribution shows the proportion of items that are less than or equal to the upper limit of each class

A cumulative percent frequency distribution shows the percentage of items that are less than or equal to the upper limit of each class

Cumulative Frequency Distributions

• The last entry in a cumulative frequency distribution always equals the total number of observations
• The last entry in a cumulative relative
frequency distribution always equals 1.00
• The last entry in a cumulative percent
frequency distribution always equals 100
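The three cumulative distributions can be built from one running total. This is a minimal sketch with hypothetical frequencies (50 days, chosen so that "3 or fewer" comes to 80%, matching the figure quoted in these slides); `cumulative_distribution` is an assumed name.

```python
from itertools import accumulate

# Hypothetical frequencies: {iPads sold per day: number of days}.
frequencies = {0: 6, 1: 9, 2: 14, 3: 11, 4: 7, 5: 3}

def cumulative_distribution(freq):
    """Rows of (class, cumulative frequency, cumulative relative, cumulative percent)."""
    classes = sorted(freq)
    running = list(accumulate(freq[c] for c in classes))
    total = running[-1]  # the last running total is the number of observations
    return [(c, cum, cum / total, 100 * cum / total)
            for c, cum in zip(classes, running)]

table = cumulative_distribution(frequencies)
for row in table:
    print(row)
```

The last row ends in the total count, 1.0, and 100.0, as the bullets above state.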

Example: Cumulative Relative Frequency Distributions
Example:

Three or fewer iPads were sold on 80% of the business days

Histogram to Graph a Frequency
Distribution
A histogram is a graph showing the number of
observations in each class of a frequency
distribution:
1. The variable of interest is placed on the
horizontal axis
2. A rectangle is drawn above each class interval
with its height corresponding to the interval’s
frequency, relative frequency, or percent
frequency
3. A histogram has no natural separation between
rectangles of adjacent classes

The Shape of Histograms

Symmetric
• the right side is the mirror
image of the left side of the
distribution

Still symmetric, but wider spread

Not symmetric

Cumulative Distributions Plot (Ogive)

• The data values are shown on the horizontal axis
• Shown on the vertical axis are the:
cumulative frequencies, or cumulative
relative frequencies, or cumulative percent
frequencies
• The frequency (one of the above) of each
class is plotted as a point
• The plotted points are connected by straight
lines
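The points of an ogive can be computed before any plotting is done. The sketch below assumes grouped data described by class upper limits and class frequencies; all numbers are hypothetical, and `ogive_points` is an assumed name. It returns the (upper limit, cumulative percent) pairs that would then be joined by straight lines.

```python
from itertools import accumulate

# Hypothetical grouped data: upper limit and frequency of each class.
upper_limits = [10, 20, 30, 40, 50]
class_frequencies = [4, 10, 16, 12, 8]

def ogive_points(limits, freqs):
    """Return (upper limit, cumulative percent frequency) for each class."""
    total = sum(freqs)
    return [(limit, 100 * cum / total)
            for limit, cum in zip(limits, accumulate(freqs))]

points = ogive_points(upper_limits, class_frequencies)
print(points)
```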

Example: Cumulative Distributions

Propane mileage (in miles per gallon) for 50 taxis
[Figure not reproduced: cumulative relative frequencies]

Frequency Distribution Using Grouped
Continuous Quantitative Data

Ideally, the number of classes in a frequency distribution should be between 5 and 20
• Some data sets, particularly those with
continuous data, require several values to
be grouped together in a single class
• This grouping prevents having too many
classes in the frequency distribution, which
can make it difficult to detect patterns

Number of Classes

One method to determine the number of classes in a frequency distribution is the rule
$2^k \ge n$
where k = Number of classes
n = Number of data points
• Find the lowest value of k that satisfies the rule
E.g. suppose n = 50
$2^5 = 32 < 50$ (k = 5 is too small)
$2^6 = 64 > 50$ (k = 6 is a good choice)
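The search for the lowest k can be written as a short loop. This is a sketch of the rule as stated above; the function name `number_of_classes` and the choice to start the search at k = 1 are assumptions.

```python
def number_of_classes(n):
    """Smallest k (starting from 1) such that 2**k >= n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = 1
    while 2 ** k < n:
        k += 1
    return k

print(number_of_classes(50))
```

For n = 50 this reproduces the worked example: k = 5 fails because 32 < 50, so the loop stops at k = 6.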
Class Width

Once k is known, the width of each class can be found
$$\text{Estimated class width} = \frac{\text{Maximum data value} - \text{Minimum data value}}{k}$$

• Use equal width across all classes


• The width is the range of numbers to put
into each class
• Round this estimate to a useful whole
number that makes the frequency
distribution more readable
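The estimate is a single division. In the sketch below the data values are hypothetical, and rounding up with `math.ceil` is just one possible reading of "round to a useful whole number"; the slides leave the rounding to the analyst's judgment.

```python
import math

def estimated_class_width(values, k):
    """(maximum - minimum) / k, before any rounding."""
    return (max(values) - min(values)) / k

# Hypothetical data with minimum 3 and maximum 44, split into k = 6 classes.
sample = [3, 12, 20, 27, 35, 44]
estimate = estimated_class_width(sample, 6)
width = math.ceil(estimate)  # assumption: round up to the next whole number
print(f"estimate = {estimate:.2f}, rounded width = {width}")
```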
Note on Number of Classes and Class
Width
• There is no one correct answer for the number
of classes and class width
• The goal is to create a histogram to clearly
and usefully show the pattern in the data
• Often there is more than one acceptable way
to accomplish this
• Ultimately, the analyst uses judgment to
determine the combination of the number of
classes and class width that provides the best
frequency distribution for summarizing the
data
Class Boundaries
Class boundaries represent the minimum and maximum values for each class

Choose class boundaries that are easy to read

Easy to read:
• 3 to less than 6 minutes
• 6 to less than 9 minutes
vs. harder to read:
• 3.21 to less than 6.21 minutes
• 6.21 to less than 9.21 minutes

Class Frequencies

Find class frequencies by counting and recording the number of observations in each class
Example:

Rules for Classes for Grouped
Data
1. Equal-size classes. All classes in the frequency
distribution must be of equal width
2. Mutually exclusive classes. Class boundaries
cannot overlap
3. Include all data values. Make sure all data
values are accounted for in the total row of the
frequency distribution
4. Avoid empty classes. It is undesirable for a
histogram to display a class so narrow that there
are no observations in it
5. Avoid open-ended classes (if possible). That
is, do not use ± ∞ as lower or upper limit, because
these violate the first rule of equal class sizes
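Several of these rules can be enforced directly when grouping data. The following sketch assumes equal-width classes of the form "lower to less than upper" and hypothetical durations in minutes; `grouped_frequency` is an assumed name. Equal widths and non-overlapping boundaries follow from the index arithmetic, and a value that fits no class raises an error instead of being silently dropped.

```python
def grouped_frequency(values, start, width, k):
    """Count values in k equal-width classes [lower, upper), beginning at start."""
    counts = [0] * k
    for v in values:
        index = int((v - start) // width)  # each value maps to exactly one class
        if not 0 <= index < k:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return [((start + i * width, start + (i + 1) * width), count)
            for i, count in enumerate(counts)]

# Hypothetical durations in minutes.
minutes = [3.5, 4.2, 6.0, 7.8, 8.9, 9.0, 10.4, 11.9, 5.1, 6.6]
classes = grouped_frequency(minutes, start=3, width=3, k=3)
for (lower, upper), count in classes:
    print(f"{lower} to less than {upper}: {count}")
```

Checking for empty classes is left to the reader: any class whose count is 0 suggests the width is too narrow.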

The Consequences of Too Few or Too Many Classes (data presentation)

Wide classes result in few class intervals
• Can obscure important patterns
• Gives a “blocky” distribution graph
• Summarizes the data too much
• Tells us little about the true distribution shape

Too many narrow classes in a histogram also have consequences
• Results in a “jagged” histogram
• Some classes may be empty
• Does not summarize the data enough

Read By Yourself: Stem-and-Leaf Display
• A stem-and-leaf display shows both the rank order and shape of the distribution of the data
• It is similar to a histogram on its side, but it has the advantage of showing the actual data values
• The first digits of each data item are arranged to the left of a vertical line
• To the right of the vertical line we record the last digit for each item in rank order

Source: World Health Organization Global Database on Body-Mass Index. (Last accessed November 17, 2012)
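A stem-and-leaf display is easy to produce as text. This is a minimal sketch that assumes non-negative integers, with the last digit as the leaf and the remaining digits as the stem; the sample values are hypothetical and are not the WHO data cited above.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the display as text: stem, vertical bar, leaves in rank order."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in rank order
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)

# Hypothetical two-digit observations.
sample = [23, 25, 31, 19, 27, 34, 41, 22, 38, 25]
print(stem_and_leaf(sample))
```

Read sideways, the row lengths give the same shape as a histogram with classes of width 10, while every original value remains visible.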
Displaying Qualitative Data

• Qualitative data are values that are categorical
• Can be nominal or ordinal measurement level
• Describe a characteristic, such as gender or level of education

Summarizing Qualitative Data
• Tabular: Frequency Distribution, Relative Frequency Distribution, Percent Frequency Distribution
• Graphical: Bar Chart, Pareto Diagram, and Pie Chart

Frequency Distributions

Frequency distributions:
• Indicate the number of occurrences of various categories
• Techniques are similar to frequency distributions with quantitative data
• Can construct relative and percent frequency distributions

Example: Frequency, Relative, and
Percent Distributions

Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor.
The ratings provided by a sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average

Example: Frequency, Relative, and
Percent Distributions
Frequency, relative, and percent distribution of the rating of Marada Inn accommodations:

Rating           Frequency   Relative Frequency   Percent Frequency
Poor                  2             .10                  10
Below Average         3             .15                  15
Average               5             .25                  25
Above Average         9             .45                  45
Excellent             1             .05                   5
Total                20            1.00                 100
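The table can be reproduced from the 20 ratings listed in the example. In this sketch the ratings are the ones given in the slides; the function name `rating_table` and the constant `ORDER` are assumptions used for illustration.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn example.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Ordinal categories, listed from worst to best.
ORDER = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

def rating_table(data):
    """Rows of (rating, frequency, relative frequency, percent frequency)."""
    counts = Counter(data)
    n = len(data)
    return [(r, counts[r], counts[r] / n, 100 * counts[r] / n) for r in ORDER]

rows = rating_table(ratings)
for rating, freq, rel, pct in rows:
    print(f"{rating:<14} {freq:>3} {rel:>6.2f} {pct:>5.0f}")
```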

Bar Charts

• Can be arranged in a vertical or horizontal orientation
• On one axis (usually, horizontal), we specify
the labels that are used for each of the
classes
• A frequency, relative frequency, or percent
frequency scale can be used for the other
axis (usually, vertical)
• Using a bar of fixed width drawn above each
class label, we extend the height
appropriately
• The bars are separated to emphasize the
fact that each class is a separate category
Bar Charts
[Figures not reproduced: a horizontal bar chart and a vertical bar chart]

Can display multiple series with clustered or stacked bar charts
[Figures not reproduced]
Pareto Diagram

• In quality control, bar charts are used to identify the most important causes of problems
• When the bars are arranged in descending order of height from left to right (with the most frequently occurring cause appearing first), the bar chart is called a Pareto diagram
• This diagram is named for Vilfredo Pareto, an Italian economist
• The related Pareto principle (80/20 rule) holds that, as a rule of thumb:
  • 80% of a project's benefit can be achieved by doing 20% of the work
  • 80% of problems can be traced to 20% of the causes
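The ordering behind a Pareto diagram is a descending sort. The sketch below uses hypothetical defect causes and counts, and `pareto_table` is an assumed name. It also adds a cumulative percent column, which is a common companion to the diagram and not something these slides require.

```python
def pareto_table(cause_counts):
    """Rows of (cause, count, cumulative percent), most frequent cause first."""
    total = sum(cause_counts.values())
    ordered = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, count in ordered:
        running += count
        rows.append((cause, count, 100 * running / total))
    return rows

# Hypothetical defect causes and how often each occurred.
defects = {"Scratches": 8, "Wrong label": 45, "Missing part": 30, "Dents": 12, "Other": 5}
pareto = pareto_table(defects)
for row in pareto:
    print(row)
```

In this made-up data the two leading causes account for 75% of the defects, the kind of concentration the diagram is meant to expose.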
Example: Pareto Diagram

The Institute for Healthcare Improvement identified three vital types of errors discovered during surgical set-up.

80% of problems can be traced to 20% of the causes.

*https://www.investopedia.com/terms/p/pareto-analysis.asp
Pie Charts

Pie charts are a tool for comparing proportions for categorical data
Each segment of the pie represents the
relative frequency of one category
• All categories in the data set must be
included in the pie
• Use a pie chart to compare the relative sizes
of all possible categories (via degrees of
slices)
• Bar charts are more useful when you want
to highlight the actual data values and
when the classes combined don’t form a
whole
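Because each slice represents a relative frequency, its angle is that fraction of 360 degrees. The sketch below applies this to the Marada Inn frequencies from the earlier example; the function name `slice_angles` is an assumption.

```python
def slice_angles(freq):
    """Angle in degrees of each pie slice: relative frequency times 360."""
    total = sum(freq.values())
    return {category: 360 * count / total for category, count in freq.items()}

# Frequencies from the Marada Inn ratings table.
marada = {"Poor": 2, "Below Average": 3, "Average": 5, "Above Average": 9, "Excellent": 1}
angles = slice_angles(marada)
for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles add up to 360 only because every category is included, which is why the pie must cover all categories in the data set.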
Example: Pie Charts
Example:

Pie Charts
• For data with many categories, most
statisticians recommend that classes with
smaller frequencies be grouped into an
aggregate class called “others”

Example:

Source: IDC, The Economist (2012)

II. Relationship Between Two
Variables: Crosstabulation
A crosstabulation is a tabular summary
of data for two variables
Crosstabulation can be used when:
• one variable is qualitative and the other is
quantitative,
• both variables are qualitative, or
• both variables are quantitative
The left and top margin labels define the
classes for the two variables

Example: Crosstabulation
Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each style and price for the past two years is shown below

                          Home Style
Price Range    Colonial    Log    Split    A-Frame    Total
< $200,000        18         6      19       12         55
> $200,000        12        14      16        3         45
Total             30        20      35       15        100

• Price range is the quantitative variable; home style is the categorical variable
• The Total column is the frequency distribution for the price range variable
• The Total row is the frequency distribution for the home style variable
Crosstabulation: Row or Column Percentages
Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables

Example: Row Percentages (Finger Lakes Homes)

                          Home Style
Price Range    Colonial    Log      Split    A-Frame    Total
< $200,000      32.73     10.91    34.55     21.82      100
> $200,000      26.67     31.11    35.56      6.67      100
Note: row totals are actually 100.01 due to rounding.

(Colonial and ≥ $200K)/(All ≥ $200K) x 100 = (12/45) x 100
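Row percentages divide each cell by its row total. The sketch below uses the Finger Lakes counts from the crosstabulation; the variable and function names are assumptions, and rounding to two decimals mirrors the table.

```python
# Finger Lakes counts by price range, in the order Colonial, Log, Split, A-Frame.
counts = {
    "< $200,000": [18, 6, 19, 12],
    "> $200,000": [12, 14, 16, 3],
}

def row_percentages(table):
    """Each cell as a percentage of its row total, rounded to two decimals."""
    result = {}
    for label, row in table.items():
        row_total = sum(row)
        result[label] = [round(100 * cell / row_total, 2) for cell in row]
    return result

row_pct = row_percentages(counts)
for label, row in row_pct.items():
    print(label, row)
```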


Crosstabulation: Row or Column Percentages

Example: Column Percentages (Finger Lakes Homes)

                          Home Style
Price Range    Colonial    Log      Split    A-Frame
< $200,000      60.00     30.00    54.29     80.00
> $200,000      40.00     70.00    45.71     20.00
Total            100       100      100       100

(Colonial and ≥ $200K)/(All Colonial) x 100 = (12/30) x 100
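Column percentages use the column totals as the divisor instead. This sketch transposes the Finger Lakes counts with `zip(*rows)` to get those totals; the names are assumptions.

```python
# Rows: "< $200,000" and "> $200,000"; columns: Colonial, Log, Split, A-Frame.
rows = [
    [18, 6, 19, 12],
    [12, 14, 16, 3],
]

def column_percentages(table_rows):
    """Each cell as a percentage of its column total, rounded to two decimals."""
    column_totals = [sum(column) for column in zip(*table_rows)]
    return [[round(100 * cell / total, 2) for cell, total in zip(row, column_totals)]
            for row in table_rows]

col_pct = column_percentages(rows)
for row in col_pct:
    print(row)
```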


Simpson’s Paradox

Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.

We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.

In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data. The reversal of conclusions based on aggregate and unaggregated data is called Simpson’s paradox.

Simpson’s Paradox
One of the best-known examples of Simpson's
paradox is a study of gender bias among
graduate school admissions to
University of California, Berkeley. The admission
figures for the fall of 1973 showed that men
applying were more likely than women to be
admitted, and the difference was so large that it
was unlikely to be due to chance.
Simpson’s Paradox
However, when examining the individual
departments, it appeared that six out of 85
departments were significantly biased against men,
whereas four were significantly biased against
women. In fact, the pooled and corrected data showed a "small but statistically significant bias in favor of women". The data from the six largest departments, including the top two departments by number of applicants for each gender, are listed below.
[Table not reproduced]
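The reversal is easiest to see with small numbers. The counts below are entirely hypothetical (they are not the Berkeley figures), and the department names and function names are assumptions. Women have the higher admission rate in each department, yet men have the higher rate once the two departments are pooled, because most women applied to the department that admits few applicants.

```python
# Hypothetical (admitted, applicants) counts per department and group.
data = {
    "Dept A": {"men": (80, 100), "women": (9, 10)},
    "Dept B": {"men": (2, 10), "women": (30, 100)},
}

def rate(admitted, applicants):
    """Admission rate for one department and one group."""
    return admitted / applicants

def aggregate_rate(table, group):
    """Admission rate for a group after pooling all departments."""
    admitted = sum(dept[group][0] for dept in table.values())
    applicants = sum(dept[group][1] for dept in table.values())
    return admitted / applicants

for name, dept in data.items():
    print(name, f"men {rate(*dept['men']):.0%}", f"women {rate(*dept['women']):.0%}")
print("Pooled", f"men {aggregate_rate(data, 'men'):.1%}",
      f"women {aggregate_rate(data, 'women'):.1%}")
```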
Scatter Diagram and Trendline
• A scatter diagram is a graphical presentation of
the relationship between two quantitative variables
• One variable is shown on the horizontal axis and
the other variable is shown on the vertical axis
• The general pattern of the plotted points suggests
the overall relationship between the variables
• A trendline provides an approximation of the
relationship
• Usually the first step in analyzing the relationship between two variables

• Discussed further under correlation and simple linear regression
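One common way to obtain a trendline is an ordinary least-squares fit. The slides do not say which method is used, so treat the choice of least squares, the function name, and the data points below as assumptions.

```python
def least_squares_trendline(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations of two quantitative variables.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_trendline(xs, ys)
print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```

A positive slope corresponds to an upward-sloping cloud of points and a negative slope to a downward-sloping one; a slope near zero suggests no apparent linear relationship.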
Example: Scatter Diagram
Example: countries in the northern latitudes, which are mostly rich, will see smaller temperature fluctuations and be less affected by climate change

Source: Economist.com (Last access Jan 6, 2019)


Types of relationships

Example: Scatter Diagram and
Trendline
Example: the Phillips curve

Source: Economist.com (Last access Apr 13, 2018)


Summary
