Part 2 - Descriptive Statistics

The document covers descriptive statistics, focusing on methods for summarizing both categorical and quantitative data through various techniques such as frequency distributions, bar charts, and histograms. It also discusses exploratory data analysis, including stem-and-leaf displays and scatter diagrams, and highlights the importance of visual representations of data. Real-life examples are provided to illustrate these concepts effectively.


Section 2: Descriptive Statistics: Tabular and Graphical Presentations
❖ Summarizing Categorical Data
• Frequency Distributions
• Bar Charts & Pareto Diagrams
• Pie Charts
❖ Summarizing Quantitative Data
• Frequency Distributions
• Dot plots
• Histograms and Skewness
• Cumulative Distributions
• Ogives

1
Section 2 - (Continued)

❖ Exploratory Data Analysis


• Stem and Leaf Display
• Scatter Diagram
• Trendline
❖ Cross Tabulation
❖ Simpson’s Paradox
❖ Summary
❖ All topics are explained through real-life examples

2
Topics
❖ Descriptive Statistics
❖ Summarizing Data
❖ Frequency Distributions

3
Descriptive Statistics

❖ Information in newspapers, magazines, and other
publications is usually presented in forms that are easy to
understand.

❖ Such summaries of data, which may be tabular, graphical, or
numerical, are referred to as descriptive statistics.

4
Summarizing Data

❖ Summarizing Categorical Data
Categorical data use labels or names to identify categories
of like items.

❖ Summarizing Quantitative Data
Quantitative data are numerical values that indicate how
much or how many.

5
Summarizing Categorical Data

❖ Frequency Distribution
❖ Relative Frequency Distribution
❖ Percent Frequency Distribution
❖ Bar Chart
❖ Pie Chart

6
Frequency Distribution

It is a tabular summary of data showing the frequency
(or number) of items in each of several
non-overlapping classes.

The objective is to provide insights about the data
that cannot be quickly obtained by looking
only at the original data.

7
Frequency Distribution
❖ Example of a Motel:
Guests staying at a motel are usually asked to rate the quality
of their accommodation. Here is a sample of 20 ratings:

Below Average, Average, Above Average,
Above Average, Above Average, Above Average,
Above Average, Below Average, Below Average,
Average, Poor, Poor,
Above Average, Excellent, Above Average,
Average, Above Average, Average,
Above Average, Average

8
Frequency Distribution
❖ Motel Example:

Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
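
The tally above can be reproduced with a short Python sketch. The list `ratings` retypes the 20 sample ratings from the previous slide; the variable names are illustrative and not part of the original material.

```python
from collections import Counter

# The 20 sample ratings from the motel example, in slide order.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Fixed class order so the table reads from worst to best rating.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}

for rating, freq in frequency.items():
    print(f"{rating:<15}{freq:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Each rating lands in exactly one class, which is what "non-overlapping classes" requires.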

9
Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion
of the total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a
set of data showing the relative frequency for each class.

10
Percent Frequency Distribution

The percent frequency of a class is the relative frequency
multiplied by 100.

A percent frequency distribution is a tabular summary
showing the percent frequency for each class.

11
Relative and Percent Frequency Distributions

❖ Example: Motel

Rating          Relative Frequency   Percent Frequency
Poor                   .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100

Example for the Poor class: relative frequency = 2/20 = .10; percent frequency = .10(100) = 10.
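
As a minimal sketch, the two derived distributions can be computed from the motel frequencies as follows. The dictionary names are illustrative assumptions; the counts are those of the motel frequency distribution.

```python
# Frequencies from the motel example.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}
total = sum(frequency.values())

# Relative frequency = class frequency / total number of items.
relative = {c: f / total for c, f in frequency.items()}

# Percent frequency = relative frequency x 100 (computed as 100 * f / total
# to avoid avoidable floating-point noise).
percent = {c: 100 * f / total for c, f in frequency.items()}

for c in frequency:
    print(f"{c:<15}{relative[c]:>6.2f}{percent[c]:>6.0f}")
```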
12
Topics
❖ Bar Charts
❖ Pareto Diagrams
❖ Pie Charts

13
Bar Chart

❖ A bar chart is a graphical way of depicting qualitative data.

❖ On one axis (usually the horizontal axis), we specify the labels
that are used for each of the classes.

❖ A frequency, relative frequency, or percent frequency scale can
be used for the other axis (usually the vertical axis).

❖ Using a bar of fixed width drawn above each class label, we
extend the height appropriately.

❖ The bars are separated to emphasize the fact that each class is a
separate category.

14
Bar Chart
[Figure: bar chart titled "Ratings for the Motel". Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency, scaled from 1 to 10.]
15
Pareto Diagram

❖ In quality control, bar charts are used to identify the most important
causes of problems.

❖ When the bars are arranged in descending order of height from left
to right (with the most frequently occurring cause appearing first)
the bar chart is called a Pareto diagram.

❖ This diagram is named after Vilfredo Pareto, an Italian
economist.
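
The ordering step can be sketched in a few lines of Python. The slides give no quality-control data, so this example reuses the motel rating frequencies purely to illustrate the descending sort; that choice of data is an assumption.

```python
# Illustrative data only: the motel rating frequencies stand in for
# "causes of problems", which the slides do not list.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

# A Pareto diagram draws the bars in descending order of height.
pareto_order = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

for label, count in pareto_order:
    print(f"{label:<15}{'#' * count}")
```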

16
Pie Chart

❖ The pie chart is a commonly used graphical device for presenting
frequency distributions for categorical data.

❖ First draw a circle; then use the relative frequencies to subdivide
the circle into sectors that correspond to the portion for each class.

❖ Since there are 360 degrees in a circle, a class with a relative
frequency of .25 would consume .25(360) = 90 degrees of the
circle.
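
A small sketch of the sector-angle arithmetic, using the motel relative frequencies. The variable names are illustrative.

```python
# Relative frequencies from the motel example.
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each class takes a share of the 360 degrees equal to its relative frequency.
angles = {c: r * 360 for c, r in relative.items()}

for c, a in angles.items():
    print(f"{c:<15}{a:>6.1f} degrees")
```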

17
Pie Chart of Motel Ratings

[Figure: pie chart with sectors Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]

18
Insights Gained from the Pie Chart

❖ One-half of the customers rated the motel as “above average”
or “excellent” (looking at the left side of the pie).

❖ For each customer giving an “excellent” rating, two customers
rated it as “poor” (top).

[Figure: the pie chart of motel ratings is repeated beside these points.]

19
Topics
❖ Summarizing Quantitative Data
❖ Steps in drawing Frequency Distribution
❖ Example of Frequency Distribution

20
Summarizing Quantitative Data

❖Frequency Distribution
❖Relative Frequency and Percent Frequency Distributions
❖Dot Plot
❖Histogram and Skewness
❖Cumulative Distributions
❖Ogive

21
Frequency Distribution
The three steps necessary to define the classes with quantitative
data are:

1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.

22
Step 1: Classes

❖ Guidelines for Determining the Number of Classes

• Use between 5 and 20 classes.
• Large data sets usually require a larger number of classes, and
vice versa.

The goal is to use enough classes to show the variation in data,
but not so many classes that some contain only a few values.

23
Step 2: Class Width
❖ Guidelines for Determining the Width of Each Class
• Use classes of equal width.
• Approximate Class Width =
(Largest Data Value − Smallest Data Value) / Number of Classes

Using the same class width for the whole diagram makes it
simple and easy to understand.
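
The guideline can be sketched as a small helper. The function name is hypothetical, and rounding the result up to a convenient whole number with `math.ceil` is an assumption; the inputs 109, 52 and 6 are the values used in the A-Z Superstore example of this deck.

```python
import math


def approximate_class_width(largest, smallest, num_classes):
    """Approximate class width = (largest - smallest) / number of classes."""
    return (largest - smallest) / num_classes


# Values from the A-Z Superstore example: largest 109, smallest 52, six classes.
width = approximate_class_width(109, 52, 6)

# Assumption: round up to a convenient whole number for the final class width.
rounded_width = math.ceil(width)

print(width, rounded_width)
```

Trying a different number of classes only changes the third argument, which matches the trial-and-error process described on the next slide.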

24
Frequency Distribution

❖ Note on Number of Classes and Class Width
• The number of classes and the appropriate class width are
determined by trial and error.
• Once the number of classes is chosen, the appropriate
class width is found.
• The process can be repeated for a different number of classes.
• Ultimately, it takes practice to determine the combination of
number of classes and class width that leads to the best
frequency distribution for the given data.

25
Step 3: Class Limits
❖ How to determine Class Limits
• Class limits must be chosen so that each data item belongs
to one and only one class.
• The lower class limit identifies the smallest possible data
value assigned to the class.
• The upper class limit identifies the largest possible data
value assigned to the class.

An open-end class requires only a
lower class limit or an upper class limit.

26
Example: A-Z Super Store
The manager of A-Z Superstore wants to have a better
understanding of how much customers are spending in
her store. She examines 50 customer invoices.

The customer spending amounts, rounded to the nearest dollar,
are listed on the next slide.

27
Example: A-Z Super Store

Customer spending: 50 Samples (Raw Data)

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73

28
Frequency Distribution
❖ Example: A-Z Superstore
If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10

Spending ($)   Frequency
50-59              2
60-69             13
70-79             16
80-89              7
90-99              7
100-109            5
Total             50
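
A minimal sketch of how the 50 raw values fall into the six classes. The starting lower limit of 50 and the width of 10 are taken from the table; the variable names are illustrative.

```python
# The 50 customer spending values (raw data from the A-Z Superstore slide).
spending = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

lower, width, num_classes = 50, 10, 6

frequency = {}
for k in range(num_classes):
    lo = lower + k * width        # lower class limit
    hi = lo + width - 1           # upper class limit (data are whole dollars)
    frequency[f"{lo}-{hi}"] = sum(lo <= x <= hi for x in spending)

for label, count in frequency.items():
    print(f"{label:<10}{count:>3}")
print(f"{'Total':<10}{sum(frequency.values()):>3}")
```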

29
Relative and Percent Frequency Distributions

❖ Example: A-Z Superstore

Spending ($)   Relative Frequency   Percent Frequency
50-59                 .04                   4
60-69                 .26                  26
70-79                 .32                  32
80-89                 .14                  14
90-99                 .14                  14
100-109               .10                  10
Total                1.00                 100

Example for the 50-59 class: relative frequency = 2/50 = .04; percent frequency = .04(100) = 4.
Percent frequency is the relative frequency multiplied by 100.

30
Insights from the Frequency Distribution

• Only 4% of the customers are in the $50-59 class.

• 30% of the customers spent under $70.

• The greatest percentage (32% or almost one-third)
of the customers are in the $70-79 class.
• 10% of the customers spent $100 or more.

31
Topics
❖ Dot Plots
❖ Histograms
❖ Skewness in Histograms

32
Dot Plot

❖ It is one of the simplest graphical summaries of data.

❖ A horizontal axis shows the range of data values.

❖ Each data value is represented by a dot placed above
the axis.

33
Dot Plot

❖ Example: A-Z Superstore

[Figure: dot plot of the 50 spending values. Horizontal axis: Spending ($), from 50 to 110.]

34
Histogram

❖ Another common graphical presentation of quantitative data
is a histogram.

❖ The variable of interest is placed on the horizontal axis.

❖ A rectangle is drawn above each class interval, with its height
corresponding to the interval’s frequency, relative frequency,
or percent frequency.

❖ Unlike a bar graph, a histogram has no natural separation
between rectangles of adjacent classes.

35
Histogram

[Figure: histogram of A-Z Superstore spending. Horizontal axis: Spending ($), classes 50-59 through 100-109; vertical axis: Frequency, scaled from 2 to 18.]
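
As a rough stand-in for the drawn histogram, the class frequencies can be rendered as text. This sketch is rotated relative to the slide (classes run down the page and bars run sideways), which is a presentation assumption; the frequencies are those of the A-Z Superstore frequency distribution.

```python
# Class frequencies from the A-Z Superstore frequency distribution.
frequency = {
    "50-59": 2,
    "60-69": 13,
    "70-79": 16,
    "80-89": 7,
    "90-99": 7,
    "100-109": 5,
}

# One '#' per observation; the bar length plays the role of rectangle height.
bars = {label: "#" * count for label, count in frequency.items()}

for label, bar in bars.items():
    print(f"{label:>7} | {bar} ({len(bar)})")

# The class with the longest bar.
tallest = max(frequency, key=frequency.get)
```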

36
Skewness in Histograms
❖ Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people

[Figure: symmetric histogram. Vertical axis: Relative Frequency, 0 to .35.]

37
Skewed Histograms
❖ Moderately Skewed towards Left
• A longer tail to the left
• Example: Scores for an easy exam ☺

[Figure: histogram moderately skewed to the left. Vertical axis: Relative Frequency, 0 to .35.]

38
ACT scores of students

❖ Highly Skewed towards Left

39
Skewed Histograms
❖ Moderately Skewed towards Right
• A Longer tail to the right
• Example: housing values
[Figure: histogram moderately skewed to the right. Vertical axis: Relative Frequency, 0 to .35.]

40
Prices of houses in the US
❖ Highly Skewed towards Right

41
Executive salaries in the US
❖ Highly Skewed towards Right

42
Topics
❖ Cumulative Distributions
❖ Ogives

43
Cumulative Distributions

Cumulative frequency distribution: shows the
number of items with values less than or equal to the
upper limit of each class.

Cumulative relative frequency distribution: shows
the proportion of items with values less than or equal to
the upper limit of each class.

Cumulative percent frequency distribution: shows
the percentage of items with values less than or
equal to the upper limit of each class.

44
Cumulative Distributions

❖ The last entry in a cumulative frequency distribution always
equals the total number of observations.
❖ The last entry in a cumulative relative frequency distribution
always equals 1.00.
❖ The last entry in a cumulative percent frequency distribution
always equals 100.

45
Cumulative Distributions
❖ A-Z Superstore

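Spending   Cumulative   Cumulative Relative   Cumulative Percent
($)        Frequency    Frequency             Frequency
≤ 59            2             .04                    4
≤ 69           15             .30                   30
≤ 79           31             .62                   62
≤ 89           38             .76                   76
≤ 99           45             .90                   90
≤ 109          50            1.00                  100

Example for the ≤ 69 row: cumulative frequency = 2 + 13 = 15; cumulative relative frequency = 15/50 = .30; cumulative percent frequency = .30(100) = 30.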
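
The running totals can be sketched with `itertools.accumulate`. The list names are illustrative; the class frequencies come from the A-Z Superstore frequency distribution.

```python
from itertools import accumulate

# Upper class limits and class frequencies for the A-Z Superstore data.
upper_limits = [59, 69, 79, 89, 99, 109]
frequency = [2, 13, 16, 7, 7, 5]

# Running total of the class frequencies.
cum_freq = list(accumulate(frequency))
total = cum_freq[-1]

# Divide by the total for proportions; scale by 100 for percentages.
cum_rel = [c / total for c in cum_freq]
cum_pct = [100 * c / total for c in cum_freq]

for u, c, r, p in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {u:<4}{c:>4}{r:>7.2f}{p:>6.0f}")
```

The last entries come out as 50, 1.00 and 100, as the previous slide states they always must.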

46
Ogive
❖It is the graph of a Cumulative Distribution.
❖The data values are shown on the horizontal axis.
❖Shown on the vertical axis are the:
• cumulative frequencies, or
• cumulative relative frequencies, or
• cumulative percent frequencies
❖The cumulative frequency of each class is plotted as a point.

❖The plotted points are connected by straight lines.

47
Ogive
❖ A-Z Superstore
• Because the class limits for the customer spending data are 50-59,
60-69, and so on, there appear to be one-unit gaps from 59 to 60,
69 to 70, and so on.

• These gaps are eliminated by plotting points halfway between
the class limits.

• Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69
class, and so on.
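
A short sketch of the coordinates that would be plotted for the cumulative percent ogive. The names are illustrative, and only the points implied by the six classes are generated.

```python
# Upper class limits and cumulative percent frequencies (A-Z Superstore).
upper_limits = [59, 69, 79, 89, 99, 109]
cum_pct = [4, 30, 62, 76, 90, 100]

# Plot each point halfway between an upper limit and the next lower limit,
# e.g. halfway between 59 and 60 is 59.5.
points = [(u + 0.5, p) for u, p in zip(upper_limits, cum_pct)]

for x, y in points:
    print(f"({x}, {y})")
```

Connecting consecutive points with straight lines gives the ogive.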

48
Ogive with Cumulative Percent Frequencies
A-Z Superstore

[Figure: ogive of cumulative percent frequencies. Horizontal axis: Spending ($), 50 to 110; vertical axis: Cumulative Percent Frequency, 20 to 100; the point (89.5, 76) is labeled.]

49
Topics
❖ Exploratory Data Analysis
❖ Stem-and-Leaf Display
❖ Example

50
Exploratory Data Analysis

❖ The techniques of exploratory data analysis consist of
simple arithmetic and easy-to-draw pictures that can
be used to summarize data quickly.

❖ One such technique is the stem-and-leaf display.

51
Stem-and-Leaf Display
❖ A stem-and-leaf display shows both the rank order
and shape of the distribution of the data.
❖ It is like a histogram, but with the advantage of
showing the actual data values.
❖ The first digits of each data item are arranged to the
left of a vertical line.
❖ To the right of the vertical line, we record the last
digit for each item in sequence.
❖ Each row in the display is referred to as a stem.
❖ Each digit on a stem is a leaf.

52
Example: A-Z Super Store
The manager of A-Z Superstore wants to have a better
understanding of how much customers are spending in
her store. She examines 50 customer invoices.

The customer spending amounts, rounded to the nearest dollar,
are listed on the next slide.

53
Example: A-Z Super Store

Customer spending: 50 Samples (Raw Data)

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73

54
Stem-and-Leaf Display

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

(Each row is a stem; each digit to the right of the line is a leaf.)
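
The display can be built mechanically, as in this minimal sketch. It assumes whole-dollar values and a leaf unit of 1, so `divmod(value, 10)` splits each value into its stem (leading digits) and leaf (last digit); the variable names are illustrative.

```python
from collections import defaultdict

# The 50 customer spending values (raw data from the A-Z Superstore slide).
spending = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

stems = defaultdict(list)
for value in sorted(spending):        # sorting puts the leaves in rank order
    stem, leaf = divmod(value, 10)    # e.g. 104 -> stem 10, leaf 4
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem:>2} | {leaves}")
```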
55
Stretched Stem-and-Leaf Display

❖ If the data are being condensed too much, we can stretch the
display vertically by using two stems for each leading digit(s).

❖ Whenever a stem value is stated twice, the first value
corresponds to leaf values of 0 − 4, and the second
value corresponds to leaf values of 5 − 9.

56
Stretched Stem-and-Leaf Display
❖ Example: A-Z Superstore
 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9
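
A sketch of the stretched version: each stem is split into a lower half (leaves 0 to 4) and an upper half (leaves 5 to 9). Keying the rows by a `(stem, half)` tuple is an implementation choice made here for illustration.

```python
from collections import defaultdict

# The 50 customer spending values (raw data from the A-Z Superstore slide).
spending = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

rows = defaultdict(list)
for value in sorted(spending):
    stem, leaf = divmod(value, 10)
    half = 0 if leaf <= 4 else 1      # 0: leaves 0-4, 1: leaves 5-9
    rows[(stem, half)].append(leaf)

for stem, half in sorted(rows):
    leaves = " ".join(str(leaf) for leaf in rows[(stem, half)])
    print(f"{stem:>2} | {leaves}")
```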
57
Stem-and-Leaf Display
❖Leaf Units
• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.

• Where the leaf unit is not shown, it is assumed to equal 1.

• The leaf unit indicates how to multiply stem-and-leaf
numbers in order to approximate the original data.

58
Example: Leaf Unit = 0.1
If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8

Stem-and-leaf display of these data will be


Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7

59
Example: Leaf Unit = 10
If we have data with values such as
1806 1717 1974 1791 1682 1910 1838

A stem-and-leaf display of these data will be

Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.
Some accuracy is lost.
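
Both leaf-unit examples can be reproduced with one helper. This is a sketch under stated assumptions: the function name `stem_and_leaf` is hypothetical, values are non-negative, digits finer than the leaf unit are dropped (as the slide does for 1682), and `Decimal` is used so that 8.6 / 0.1 gives exactly 86 rather than a floating-point near miss.

```python
from decimal import Decimal


def stem_and_leaf(values, leaf_unit):
    """Return {stem: [leaves]} for non-negative values and a given leaf unit."""
    unit = Decimal(str(leaf_unit))
    rows = {}
    for v in sorted(values):
        # Express the value in whole leaf units; int() drops finer digits.
        units = int(Decimal(str(v)) / unit)
        stem, leaf = divmod(units, 10)    # last digit is the leaf
        rows.setdefault(stem, []).append(leaf)
    return rows


tenths = stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], 0.1)
tens = stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], 10)

for rows in (tenths, tens):
    for stem in sorted(rows):
        print(f"{stem:>2} | {' '.join(str(leaf) for leaf in rows[stem])}")
    print()
```

Multiplying a stem-and-leaf number by the leaf unit approximates the original value: stem 16 with leaf 8 and leaf unit 10 reads as 168 x 10 = 1680.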

60
Topics
❖ Cross-tabulation and Scatter Diagrams
❖ Cross-tabulation
❖ Row and Column Percentages

61
Cross-tabulations and Scatter Diagrams

❖ So far we have focused on methods that are used to summarize
data for one variable at a time.

❖ We are often interested in tabular and graphical methods that
will help understand the relationship between two variables.

❖ Cross-tabulation and a scatter diagram help in summarizing the
data for two variables simultaneously.

62
Cross-tabulation

❖ A cross-tabulation is a tabular summary of data for two variables.

❖ Cross-tabulation can be used when:
• one variable is qualitative and the other is quantitative,
• both variables are qualitative, or
• both variables are quantitative.

❖ The left and top margin labels define the classes for two variables.

63
Cross-tabulation
❖ Example: Motel Rooms
The daily rents and ratings for 150 motel rooms are shown below:

                    Room Rent ($)
Rating       <50   50-100   100-150   >150   Total
Good          21     20        1        0      42
Very Good     17     32       23        3      75
Excellent      1      7       14       11      33
Total         39     59       38       14     150

Rating is the categorical variable; Room Rent is the quantitative variable.

64
Cross-tabulation
❖ Example: Motel Rooms

                    Room Rent ($)
Rating       <50   50-100   100-150   >150   Total
Good          21     20        1        0      42
Very Good     17     32       23        3      75
Excellent      1      7       14       11      33
Total         39     59       38       14     150

The Total column (right margin) is the frequency distribution for the Rating variable.
The Total row (bottom margin) is the frequency distribution for the Room Rent variable.
65
Cross-tabulation
❖ Insights from the cross-tabulation:

• The greatest number of motel rooms in the sample (32) are
rated ‘Very Good’ with a rent in the $50-100 range.

• Only one motel room in the sample is rated ‘Excellent’
with a rent below $50.

• More expensive rooms generally have better ratings.

66
Cross-tabulation: Row & Column percentages

❖ Converting each entry into row percentages or column
percentages can provide additional insight about the
relationship between the two variables.

67
Cross-tabulation: Row Percentage

                    Room Rent ($)
Rating       <50    50-100   100-150   >150    Total
Good         50.00   47.62     2.38     0.00    100
Very Good    22.67   42.67    30.67     4.00    100
Excellent     3.03   21.21    42.42    33.33    100

Note: Some row totals are not exactly 100.00 due to rounding.

Example: (Good rating and <50)/(All with ‘Good’ rating) x 100 = (21/42) x 100 = 50.00
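
Row percentages can be computed directly from the cell counts, as in this sketch. The dictionary layout and names are illustrative; results are rounded to two decimals.

```python
# Cell counts from the motel room cross-tabulation (rows: Rating).
rent_classes = ["<50", "50-100", "100-150", ">150"]
counts = {
    "Good": [21, 20, 1, 0],
    "Very Good": [17, 32, 23, 3],
    "Excellent": [1, 7, 14, 11],
}

# Row percentage = cell count / row total x 100.
row_pct = {
    rating: [round(100 * n / sum(row), 2) for n in row]
    for rating, row in counts.items()
}

for rating, pcts in row_pct.items():
    cells = "".join(f"{p:>8.2f}" for p in pcts)
    print(f"{rating:<10}{cells}")
```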

68
Cross-tabulation: Column Percentage

                    Room Rent ($)
Rating       <50    50-100   100-150   >150
Good         53.85   33.90     2.63     0.00
Very Good    43.59   54.24    60.53    21.43
Excellent     2.56   11.86    36.84    78.57
Total       100     100      100      100

Example: (Good rating and <50)/(All <50) x 100 = (21/39) x 100 = 53.85
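
Column percentages follow the same idea, dividing by column totals instead. In this sketch `zip(*counts.values())` transposes the rows into columns; the names are illustrative.

```python
# Cell counts from the motel room cross-tabulation (rows: Rating).
counts = {
    "Good": [21, 20, 1, 0],
    "Very Good": [17, 32, 23, 3],
    "Excellent": [1, 7, 14, 11],
}

# Transpose the rows to get one tuple per rent class, then total each column.
col_totals = [sum(col) for col in zip(*counts.values())]

# Column percentage = cell count / column total x 100.
col_pct = {
    rating: [round(100 * n / t, 2) for n, t in zip(row, col_totals)]
    for rating, row in counts.items()
}

for rating, pcts in col_pct.items():
    cells = "".join(f"{p:>8.2f}" for p in pcts)
    print(f"{rating:<10}{cells}")
```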

69
Topics
❖ Scatter Diagram
❖ Trendline
❖ Summary of Section-2

70
Scatter Diagram and Trendline

❖ A scatter diagram is a graphical presentation of the relationship
between two quantitative variables.

❖ One variable is shown on the horizontal axis and the other
variable is shown on the vertical axis.

❖ The general pattern of the plotted points suggests the
overall relationship between the variables.

❖ A trendline provides an approximation of the relationship.

71
Scatter Diagram
A Positive Relationship

72
Scatter Diagram
A Negative Relationship

73
Scatter Diagram
No apparent Relationship

74
Scatter Diagram
❖ Example: Ice-cream Sales vs. Temperature
The manager of an ice-cream parlor is interested in investigating the
relationship, if any, between Sales and outside Temperature.

x = Temperature (F)   y = Number of Ice-creams Sold
        71                       37
        66                       27
        79                       61
        67                       35
        77                       49

75
Scatter Diagram
[Figure: scatter diagram of the five data points. Horizontal axis: x = Temperature (F), 65 to 80; vertical axis: y = Number of Ice-creams Sold, 10 to 70.]

76
Example: Ice-cream Sales vs. Temperature
❖ Insights Gained from the Preceding Scatter Diagram

• The scatter diagram indicates a positive relationship
between the outside temperature and the sales.

• Higher temperatures are associated with higher sales.

• The relationship is not perfect; not all points are on a
straight line.
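
One way to draw a trendline through these five points is an ordinary least-squares line. The slides do not say how the trendline is fitted, so least squares is an assumption here, and the variable names are illustrative.

```python
# Ice-cream example data: temperature (F) and number of ice-creams sold.
temperature = [71, 66, 79, 67, 77]
sold = [37, 27, 61, 35, 49]

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(sold) / n

# Least-squares slope and intercept (assumed fitting method).
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, sold))
sxx = sum((x - mean_x) ** 2 for x in temperature)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```

A positive slope agrees with the first insight above: higher temperatures go with higher sales.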

77
Tabular and Graphical Methods
Data
❖ Categorical Data
• Tabular Methods: Frequency Distribution; Relative Frequency Distribution; Percent Frequency Distribution; Cross-tabulation
• Graphical Methods: Bar Chart; Pie Chart
❖ Quantitative Data
• Tabular Methods: Frequency Distribution; Relative Frequency Distribution; Percent Frequency Distribution; Cumulative Frequency Distribution; Cumulative Relative Frequency Distribution; Cumulative Percent Frequency Distribution; Cross-tabulation
• Graphical Methods: Dot Plot; Histogram; Ogive; Stem-and-Leaf Display; Scatter Diagram
78