Descriptive Statistics
Data Organization and Presentation
Tufa Kolola
(MPH, Ass’t. Prof.)
Learning objectives
At the end of this session you will be able to:
• Present qualitative data using tabular methods
• Present qualitative data using graphical methods
• Present quantitative data using tabular methods
• Present quantitative data using graphical methods
Descriptive
summary statistics
Raw data
Definition
Data that have been collected or recorded but have not yet been
arranged or processed are called raw data.
Example 1: Ages of 50 students in years
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24
Example 2: Blood groups (50 observations)
O AB A AB AB B O B B O
O O B O A O O A B B
A A AB O O O A O O B
A O O O A B O O A A
O A A B AB B O A O A
Ordered array
An ordered array is a simple arrangement of the individual
observations in order of magnitude.
- Example: Ages of 50 students (read down each column)
18 19 19 21 22 23 23 25 26 31
18 19 20 21 22 23 24 25 27 33
18 19 20 21 22 23 24 25 27 34
19 19 20 21 22 23 25 25 28 37
19 19 21 21 22 23 25 26 29 37
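As a minimal sketch, the ordered array can be produced by sorting the raw ages from Example 1. The variable names are illustrative, and the output is printed row by row, whereas the array above reads down the columns.

```python
# Ages of the 50 students, copied from Example 1 (raw, unordered).
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# sorted() returns a new list in ascending order; the raw list is unchanged.
ordered_ages = sorted(ages)

# Print ten values per row for readability.
for start in range(0, len(ordered_ages), 10):
    print(" ".join(str(age) for age in ordered_ages[start:start + 10]))
```

Once the data are ordered, the smallest and largest values (here 18 and 37) can be read directly from the two ends of the list.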
Data
– Qualitative Data
– Quantitative Data
Frequency Distribution
A frequency distribution is a table that summarizes raw data
into non-overlapping classes or categories, along with their
corresponding class frequencies.
Frequency Distribution
for categorical variables
A relative frequency distribution shows the proportion of the total
count that falls into each class or category, that is, the class
frequency divided by the total number of observations.
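The sketch below tallies a frequency and a relative frequency distribution for the categorical data of Example 2, using Python's standard `collections.Counter`. The variable names are illustrative only.

```python
from collections import Counter

# The 50 categorical observations from Example 2, split on whitespace.
blood_groups = """
O AB A AB AB B O B B O
O O B O A O O A B B
A A AB O O O A O O B
A O O O A B O O A A
O A A B AB B O A O A
""".split()

# Frequency: how many observations fall into each category.
frequency = Counter(blood_groups)
total = sum(frequency.values())

# Relative frequency: class frequency divided by the total count.
relative_frequency = {group: count / total for group, count in frequency.items()}

print(f"{'Group':<6}{'Frequency':>10}{'Relative':>10}")
for group, count in frequency.most_common():
    print(f"{group:<6}{count:>10}{relative_frequency[group]:>10.2f}")
print(f"{'Total':<6}{total:>10}{sum(relative_frequency.values()):>10.2f}")
```

The relative frequencies sum to 1; multiplying each by 100 gives a percentage distribution.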
Frequency Distribution
for numerical variables
Ungrouped Frequency Distribution
An ungrouped frequency distribution lists each distinct data
value together with its frequency.
Example:
Leisure time in hours per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
Leisure time (hours)   Frequency
10 1
12 1
13 1
14 2
15 2
16 3
18 2
19 4
20 2
21 3
22 2
23 3
24 2
25 1
26 1
27 2
28 2
29 1
31 1
32 1
34 1
36 1
38 1
Total 40
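The table above can be reproduced with a short sketch that counts each distinct value in the leisure-time data. `collections.Counter` is from the standard library; the variable names are illustrative.

```python
from collections import Counter

# Leisure time (hours per week) for the 40 college students.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

# Count how often each distinct value occurs, then list values in ascending order.
counts = Counter(leisure_hours)
ungrouped = sorted(counts.items())

print("Leisure time (hours)   Frequency")
for value, freq in ungrouped:
    print(f"{value:<22} {freq}")
print(f"{'Total':<22} {sum(counts.values())}")
```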
Grouped Frequency
Distribution
Step 2: Determine the number of classes (K) and the
corresponding class width (W). We may use Sturges' rule:
$K = 1 + 3.322 \log_{10} n$
$W = \dfrac{L - S}{K}$
Where:
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
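A rough sketch of Step 2 for the 40 leisure-time observations used in this session follows. It assumes the formula intended here is Sturges' rule, K = 1 + 3.322 log10(n), with W = (L - S) / K. Rounding K to the nearest integer and rounding W up are assumed conventions, not stated in the source.

```python
import math

# Leisure time (hours per week) for the 40 college students.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

n = len(leisure_hours)          # number of observations
largest = max(leisure_hours)    # L
smallest = min(leisure_hours)   # S

# Sturges' rule (assumed to be the formula meant in Step 2).
k_exact = 1 + 3.322 * math.log10(n)
k = round(k_exact)              # assumed convention: nearest whole number of classes

# Class width; rounded up so that k classes cover the whole range (assumed convention).
w_exact = (largest - smallest) / k
w = math.ceil(w_exact)

print(f"n = {n}, L = {largest}, S = {smallest}")
print(f"K = {k_exact:.3f}, rounded to {k}")
print(f"W = {w_exact:.3f}, rounded up to {w}")
```

For these data the sketch suggests about 6 classes of width 5; in practice K and W are often adjusted to give convenient class limits.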
Step 3: For each class, count the number of
observations (class frequency)
Guidelines for Constructing a Frequency
Distribution:
The classes must be mutually exclusive
Example:
Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
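One possible grouped frequency distribution for these data is sketched below. The class width of 5 and the starting point at the smallest value are assumptions chosen for illustration; the classes are inclusive and mutually exclusive, so every observation is counted exactly once.

```python
# Leisure time (hours per week) for the 40 college students.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

width = 5                    # assumed class width
lower = min(leisure_hours)   # assumed start: the smallest observation

# Build inclusive classes (lower to upper) until the largest value is covered.
grouped = []
while lower <= max(leisure_hours):
    upper = lower + width - 1
    # Class frequency: number of observations with lower <= x <= upper.
    freq = sum(lower <= x <= upper for x in leisure_hours)
    grouped.append((lower, upper, freq))
    lower += width

print("Class      Frequency")
for low, high, freq in grouped:
    print(f"{low}-{high:<8}{freq}")
print(f"{'Total':<11}{sum(freq for _, _, freq in grouped)}")
```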
True limits (class boundaries) are the limits that make the
intervals of a continuous variable continuous in both directions,
so that no gaps remain between adjacent classes.
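A small sketch of how true limits are commonly obtained: half of the unit of measurement is subtracted from each lower class limit and added to each upper class limit. The class limits below are hypothetical (width-5 classes for whole-hour data), and the unit of 1 is an assumption.

```python
# Hypothetical class limits for data recorded in whole hours.
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

unit = 1          # assumed unit of measurement (values recorded to the nearest 1)
half = unit / 2

# True limits: extend each class by half a unit on both sides.
true_limits = [(low - half, high + half) for low, high in class_limits]

for (low, high), (true_low, true_high) in zip(class_limits, true_limits):
    print(f"{low}-{high}  ->  {true_low}-{true_high}")
```

The upper true limit of each class equals the lower true limit of the next one, which is what makes the scale continuous.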
Guidelines for constructing tables
Tables should be self-explanatory
Show totals
If the data are not original, indicate the source in a footnote
Graphical
presentation of data
Should be self-explanatory
Types of graphs
Categorical data
– Bar chart
– Pie-chart
Quantitative data
– Histogram
– Frequency Polygon
– Ogive
– Stem-and-leaf plot
– Box plot
– Scatter Diagram
Bar chart
Definition:
A graph made of bars whose heights represent
the frequencies of respective categories is called
a bar graph.
Used to display the frequencies in the frequency distribution
of a categorical variable
o All the bars should rest on the same line, called the base
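As a rough illustration only, a bar chart can be imitated in plain text. The counts below were tallied from the blood group data in Example 2. The chart is drawn horizontally, so bar length plays the role of bar height and the `|` column acts as the common base. The function name `text_bar_chart` is hypothetical.

```python
# Category frequencies tallied from the Example 2 data.
blood_group_counts = {"O": 21, "A": 14, "B": 10, "AB": 5}


def text_bar_chart(counts):
    """Return one text line per category; every bar starts at the same base column."""
    label_width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        # One '#' per observation, followed by the frequency itself.
        lines.append(f"{label:<{label_width}} | {'#' * count} {count}")
    return lines


chart = text_bar_chart(blood_group_counts)
print("\n".join(chart))
```

A plotting library would normally be used for a real chart; this text version only shows the idea that each bar is proportional to its category's frequency.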
[Figure: bar chart with categories First trimester, Second trimester and Third trimester; one bar label, 5.5, is legible]
[Figure: bar chart by Residence (Urban, Rural); bar labels 25.7 and 10.0]
Pie-chart
Example
[Pie chart] Share of each category:
– Circulatory system: 42%
– Neoplasms: 30%
– Respiratory system: 13%
– Others: 8%
– Digestive system: 4%
– Injury and poisoning: 3%
Histogram
In a histogram, the bars are drawn adjacent to
each other
Example
Using the following frequency distribution of the
home runs hit by Major League Baseball teams
during the 2002 season, construct the histogram
[Figure: histogram; vertical axis: Frequency; horizontal axis: class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
Figure 4: Total home runs hit by all players of each of the 30
Major League Baseball teams during the 2002 season
Frequency
polygon
[Figure: frequency polygon; vertical axis: Frequency; horizontal axis: class midpoints 134.5, 156.5, 178.5, 200.5, 222.5]
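A frequency polygon plots each class frequency against the class midpoint. The sketch below derives the midpoints from the class boundaries shown on the home-run histogram axis. The class frequencies did not survive in this copy, so only the horizontal-axis values are computed.

```python
# Class boundaries read from the horizontal axis of the home-run histogram.
boundaries = [123.5, 145.5, 167.5, 189.5, 211.5, 233.5]

# Midpoint of each class: the average of its lower and upper boundary.
midpoints = [(low + high) / 2 for low, high in zip(boundaries, boundaries[1:])]

print(midpoints)
```

The result matches the values marked on the horizontal axis of the frequency polygon.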
Ogive
It is obtained as follows:
On a vertical axis we mark cumulative frequency
On a horizontal axis we mark the upper
boundaries of all classes. However, the lower
boundary of the first class will be the starting
point
Then, a smooth curve is drawn joining all these
points
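The points of an ogive can be computed as sketched below. The class boundaries and frequencies are assumptions for illustration: they correspond to grouping the 40 leisure-time observations into width-5 classes (10-14, 15-19, and so on).

```python
from itertools import accumulate

# Assumed class boundaries and class frequencies (leisure-time data, width-5 classes).
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5]
frequencies = [5, 11, 12, 7, 3, 2]

# The curve starts at the lower boundary of the first class with cumulative frequency 0,
# then pairs each upper boundary with the cumulative frequency up to that boundary.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(boundaries, cumulative))

for boundary, cum_freq in ogive_points:
    print(f"{boundary:>5}  {cum_freq}")
```

Joining these points gives the ogive; the last cumulative frequency always equals the total number of observations.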
Class boundaries and their Frequency and
cumulative frequency distributions
[Figure: ogive; vertical axis: Cumulative frequency]
Stem-and-leaf plot
Can be constructed as follows:
(1) Separate each data point into a stem component
and a leaf component
The stem component consists of the number
formed by all but the rightmost digit of the
number, and the leaf component consists of the
rightmost digit. Thus the stem of the number
483 is 48, and the leaf is 3
Stem-and-leaf plot for the birth weight data
(N=100)
Stem Leaves
Stem-and-leaf plot can be constructed as
follows:
(3) Write the second stem, which equals the first stem
+ 1, below the first stem
(4) Continue with step (3) until you reach the largest stem
in the data set
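The construction can be sketched in a few lines. The birth weight data are not reproduced in this copy, so the ages of the 50 students are used instead. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

# Ages of the 50 students (Example 1), used here in place of the birth weight data.
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]


def stem_and_leaf(values):
    """Map each stem to its sorted leaves (non-negative integers assumed)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        # Stem: all digits except the rightmost; leaf: the rightmost digit.
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    # List every stem from the smallest to the largest, even one with no leaves.
    return {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}


plot = stem_and_leaf(ages)
print("Stem | Leaves")
for stem, stem_leaves in plot.items():
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in stem_leaves)}")
```

With `divmod(483, 10)` the number 483 splits into stem 48 and leaf 3, as in the definition of the stem and leaf components.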
Box plot: BP for 113 Males
[Figure: Boxplot of Systolic Blood Pressures, Sample of 113 Men; vertical axis: Blood Pressure. Features labelled on successive slides: Sample Median; 75th Percentile and 25th Percentile; Largest Observation and Smallest Observation]
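The five values labelled on the box plot (smallest observation, 25th percentile, median, 75th percentile, largest observation) can be computed as sketched below. The blood pressure data are not reproduced in this copy, so the ages of the 50 students are used as a stand-in. Percentile conventions differ between textbooks and software; the "inclusive" method of `statistics.quantiles` is one assumed choice.

```python
import statistics

# Ages of the 50 students (Example 1), used here in place of the blood pressure data.
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# n=4 returns the three cut points that split the data into quarters:
# the 25th percentile, the median and the 75th percentile.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")

five_number_summary = (min(ages), q1, median, q3, max(ages))
print(five_number_summary)
```

The box spans the 25th to the 75th percentile with a line at the median, and in the version shown on these slides the whiskers extend to the smallest and largest observations.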
Tabular and Graphical Procedures
Data
– Qualitative Data
– Quantitative Data