Descriptive Statistics

Uploaded by

abdi merga
Copyright
© © All Rights Reserved

Descriptive statistics: Data Organization and Presentation

Tufa Kolola
(MPH, Asst. Prof.)

Learning objectives
At the end of this session you will be able to:
• Present qualitative data using tabular methods
• Present qualitative data using graphical methods
• Present quantitative data using tabular methods
• Present quantitative data using graphical methods

Descriptive summary statistics

• Descriptive statistics: techniques used to organize and summarize a set of data in a more comprehensible and meaningful way
– Organization of data
– Summarization of data
– Presentation of data

• Numbers that have not been summarized and organized are called raw data

Raw data
Definition
• Data that have been collected or recorded but have not yet been arranged or processed are called raw data

Example 1: Ages of 50 students in years

21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24

Example 2:

• These are the blood groups for a sample of 50 outpatient department (OPD) patients

O AB A AB AB B O B B O
O O B O A O O A B B
A A AB O O O A O O B
A O O O A B O O A A
O A A B AB B O A O A

Ordered array
• Ordered array: a simple arrangement of individual observations in order of magnitude
- Example: Ages of 50 students (read down each column)

18 19 19 21 22 23 23 25 26 31
18 19 20 21 22 23 24 25 27 33
18 19 20 21 22 23 24 25 27 34
19 19 20 21 22 23 25 25 28 37
19 19 21 21 22 23 25 26 29 37

• Very difficult to use with a large sample size

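The ordered array above can be reproduced with a short script. This is only an illustrative sketch in Python (the slides do not prescribe any software); the variable names are assumptions, and the data are the 50 ages from Example 1.

```python
# Ages of the 50 students from Example 1, in the order they were recorded.
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# An ordered array is simply the observations sorted by magnitude.
ordered_ages = sorted(ages)

print(ordered_ages)
print("Smallest:", ordered_ages[0], "Largest:", ordered_ages[-1])
```

Sorting makes the smallest and largest values easy to read off, but the list is still 50 numbers long, which is why a frequency distribution is the usual next step.
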
Presentation of data

Data
– Qualitative data
   • Tabular methods
   • Graphical methods
– Quantitative data
   • Tabular methods
   • Graphical methods

Frequency Distribution
• Frequency distribution: a table that summarizes raw data into non-overlapping classes or categories along with their corresponding class frequencies

• Class frequency: the number of observations that fall into a class

• The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data

• The actual summarization and organization of data starts with the frequency distribution

• The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data

Frequency Distribution for categorical variables

• Count the number of observations (frequency) in each category and present the counts as relative frequencies

• Often presented in the form of tables, bar charts and pie charts

• Relative frequency: the value for any category obtained by dividing the number of observations in that category by the total number of observations
- Class relative frequency = Class frequency / Total number of observations

• This can be reported as a percentage by multiplying the resulting fraction by 100

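As a hedged illustration of the definitions above, the sketch below tallies the blood groups of the 50 OPD patients from Example 2 and reports each class frequency, relative frequency and percentage. Python and the variable names are assumptions made for illustration; only the data come from the slides.

```python
from collections import Counter

# Blood groups of the 50 OPD patients from Example 2.
blood_groups = (
    "O AB A AB AB B O B B O "
    "O O B O A O O A B B "
    "A A AB O O O A O O B "
    "A O O O A B O O A A "
    "O A A B AB B O A O A"
).split()

counts = Counter(blood_groups)   # class frequency of each category
total = sum(counts.values())     # total number of observations

print(f"{'Blood group':<12}{'Frequency':>10}{'Relative':>10}{'Percent':>9}")
for group, freq in counts.most_common():
    rel = freq / total           # class relative frequency
    print(f"{group:<12}{freq:>10}{rel:>10.2f}{rel * 100:>8.0f}%")
print(f"{'Total':<12}{total:>10}{1:>10.2f}{100:>8.0f}%")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no observation was missed.
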
Frequency Distribution for categorical variables
• A relative frequency distribution shows the proportion of counts that fall into each class or category

• For nominal and ordinal data, frequency distributions are often used as a summary

• The percentage of times that each value occurs, or the relative frequency, is often listed

• Tables make it easier to see how the data are distributed

Example 1: Nominal data
Table 1: Types of hospitals owned by the MOH in Ethiopia, 2006/07

Source: Health and health related indicators

Example 2: Ordinal data
Table 2: Level of satisfaction with nursing care among 475 psychiatric in-patients, 1991

Frequency Distribution for numerical variables

• A frequency distribution can also show the number of observations at different values or within certain ranges

• There are two types of frequency distribution:
– Single value (ungrouped frequency distribution)
– Interval type (classes): grouped frequency distribution

Ungrouped Frequency Distribution
• Ungrouped frequency distribution: lists each distinct data value together with its frequency

• Can be used when the range of values in the data set is not large

• Classes are one unit in width

Example:
• Leisure time in hours per week for 40 college students:

23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16

Construct a frequency distribution table.

Leisure time (hours)   Frequency
10 1
12 1
13 1
14 2
15 2
16 3
18 2
19 4
20 2
21 3
22 2
23 3
24 2
25 1
26 1
27 2
28 2
29 1
31 1
32 1
34 1
36 1
38 1
Total 40
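
The table above can be checked with a short tally. The following is a minimal sketch, assuming Python and illustrative variable names; the data are the 40 leisure-time values from the example.

```python
from collections import Counter

# Leisure time in hours per week for the 40 college students.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

# Ungrouped frequency distribution: one class per distinct value.
frequency = Counter(leisure_hours)

print("Leisure time (hours)  Frequency")
for value in sorted(frequency):
    print(f"{value:<22}{frequency[value]}")
print(f"{'Total':<22}{sum(frequency.values())}")
```

With 23 distinct values for only 40 observations, the ungrouped table is long and sparse, which motivates grouping the values into wider classes.
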

Grouped Frequency Distribution

• Can be used when the range of values in the data set is large

• The data must be grouped into classes that are more than one unit in width

• Steps in constructing frequency distribution tables

Step 1: Determine the range of the data

- R = Highest value – Lowest value

Step 2: Determine the number of classes (K) and the corresponding class width (W). We may use Sturges' rule:

K = 1 + 3.322 × log10(n)
W = (L – S) / K

Where:
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value

Step 3: For each class, count the number of observations (class frequency)

Step 4: Determine the relative frequency for each class

Relative frequency = Frequency of the class interval / Total number of observations

Grouped Frequency Distribution
• Guidelines for constructing a frequency distribution:
– The classes must be mutually exclusive
– The classes must be continuous
– The classes must be exhaustive
– The classes must be equal in width

Example:
• Leisure time (hours) per week for 40 college students:

23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16

Maximum value = 38, Minimum value = 10

K = 1 + 3.322 × log10(40) = 6.32 ≈ 6
Width = (38 – 10)/6 = 4.67 ≈ 5

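The four steps can be sketched in code for the leisure-time data. This is an illustrative Python sketch, not the table from the original slide: it assumes that the first class starts at the smallest observation (10) and that the class width is rounded up to the next whole number, and all variable names are hypothetical.

```python
import math

# Leisure time in hours per week for the 40 college students.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

n = len(leisure_hours)
smallest, largest = min(leisure_hours), max(leisure_hours)

# Step 1: range of the data.
data_range = largest - smallest

# Step 2: number of classes (Sturges' rule) and class width.
k = round(1 + 3.322 * math.log10(n))
# Assumption: the width is rounded up so that k classes cover the whole range.
width = math.ceil(data_range / k)

# Assumption: the first class starts at the smallest observation.
classes = [(smallest + i * width, smallest + (i + 1) * width - 1) for i in range(k)]

# Step 3: class frequencies. Step 4: relative frequencies.
frequencies = [sum(lower <= x <= upper for x in leisure_hours) for lower, upper in classes]
relative = [f / n for f in frequencies]

print("Class    Frequency  Relative frequency")
for (lower, upper), f, r in zip(classes, frequencies, relative):
    print(f"{lower}-{upper:<6}{f:>9}{r:>20.3f}")
print(f"{'Total':<9}{sum(frequencies):>9}{sum(relative):>20.3f}")
```

Under these assumptions the classes are 10-14, 15-19, 20-24, 25-29, 30-34 and 35-39; a different starting point for the first class would give different class frequencies.
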
• Cumulative frequency: the running total obtained when the frequencies of two or more successive classes are added

• Cumulative relative frequency: the proportion of the total number of observations that have a value less than or equal to the upper limit of the interval

• Mid-point: the value that lies midway between the lower and the upper limits of a class

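These three quantities follow directly from a grouped table. The sketch below is illustrative only: it assumes the leisure-time data grouped into classes of width 5 starting at 10 (an assumed choice of classes), with the class frequencies tallied from the 40 observations.

```python
from itertools import accumulate

# Assumed classes (width 5, starting at 10) and their tallied frequencies.
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
frequencies = [5, 11, 12, 7, 3, 2]
total = sum(frequencies)

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: share of observations at or below each upper limit.
cumulative_relative = [c / total for c in cumulative]

# Mid-point: halfway between the lower and upper limits of each class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

for (lower, upper), c, cr, m in zip(classes, cumulative, cumulative_relative, midpoints):
    print(f"{lower}-{upper}: cumulative = {c}, cumulative relative = {cr:.3f}, mid-point = {m}")
```

The last cumulative frequency must equal the total number of observations, and the last cumulative relative frequency must equal 1.
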
• True limits (class boundaries): the limits that make the intervals of a continuous variable continuous in both directions

• Used for smoothing the class intervals

• Subtract 0.5 from the lower limit and add 0.5 to the upper limit

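A small sketch of the true-limit rule, assuming Python and the same assumed leisure-time classes (width 5, starting at 10). The 0.5 is derived here as half the gap between adjacent class limits, which equals 0.5 when the data are recorded to the nearest whole unit.

```python
# Assumed class limits for the leisure-time data (whole hours).
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

# Half of the gap between one class's upper limit and the next class's lower limit.
half_gap = (classes[1][0] - classes[0][1]) / 2   # (15 - 14) / 2 = 0.5

# True limits: subtract from the lower limit, add to the upper limit.
true_limits = [(lower - half_gap, upper + half_gap) for lower, upper in classes]

for (lower, upper), (true_lower, true_upper) in zip(classes, true_limits):
    print(f"{lower}-{upper}  ->  {true_lower}-{true_upper}")
```

After the adjustment, the upper true limit of each class coincides with the lower true limit of the next, so the intervals leave no gaps.
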
Guidelines for constructing tables
• Tables should be self-explanatory

• Include a clear title telling what, when and where

• Clearly label the rows and columns

• State clearly the unit of measurement used

• Explain codes and abbreviations in a footnote

• Show totals

• If the data are not original, indicate the source in a footnote

Graphical presentation of data

• Helps users to obtain, at a glance, an intuitive feeling for the data

• Should be self-explanatory

• Must have a descriptive title, labeled axes and an indication of the units of measurement

Graphical presentation

Importance of graphical presentation:

• Diagrams are more attractive than mere figures

• They give a quick overall impression of the data

• They are easier to remember than mere figures

• They facilitate comparison

• They are used to understand patterns and trends

• Well-designed graphs can be a powerful means of communicating a great deal of information

• Poorly designed graphs not only convey the message ineffectively but are often misleading

Types of graphs
• Categorical data
– Bar chart
– Pie chart
• Quantitative data
– Histogram
– Frequency polygon
– Ogive
– Stem-and-leaf plot
– Box plot
– Scatter diagram

Bar chart

Definition:
• A graph made of bars whose heights represent the frequencies of the respective categories is called a bar chart

• Used with categorical data to display the frequencies in the frequency distribution of a categorical variable

• Each bar represents one category, and its height is the frequency or relative frequency
o y-axis: frequency, relative frequency or percentage
o x-axis: category

Rules
o Bars should be separated
o The gap between bars should be uniform
o All bars should be of the same width
o All the bars should rest on the same line, called the base
o It is very important that the y-axis begins at 0
o Label both axes clearly

Simple bar chart
The simple bar chart is appropriate if only one variable is to be shown

[Bar chart. y-axis: Percentage; x-axis: First trimester, Second trimester, Third trimester; data labels on the chart: 53.9, 40.6, 5.5]

Figure 1: First ANC booking time among pregnant women in X Town, Ethiopia, 2017

Clustered bar chart

[Clustered bar chart. y-axis: Percent; x-axis: Residence (Urban, Rural); legend: First day, Second and subsequent days; data labels on the chart: 90, 74.3, 25.7, 10.0]

Figure 2: Timing of health care seeking reported by place of residence, X District, Ethiopia, 2011

Pie chart
• A pie chart is a circle divided into sections according to the percentage of frequencies in each category of the distribution

• Used to show the relative frequencies of a single categorical variable

• Each slice of the pie corresponds to the relative frequency of one category of the variable

Steps to construct a pie chart

• Construct a frequency table

• Change the frequencies into percentages (P)

• Change the percentages into degrees, where:
degrees = (P / 100) × 360°

• Draw a circle and divide it accordingly

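The conversion from frequencies to slice angles can be sketched as follows. This is an illustrative Python example; it reuses the blood-group counts tallied from Example 2 (O = 21, A = 14, B = 10, AB = 5) rather than the data behind the slide's own figure, and the variable names are assumptions.

```python
# Frequency table: blood groups of the 50 OPD patients (tallied from Example 2).
counts = {"O": 21, "A": 14, "B": 10, "AB": 5}
total = sum(counts.values())

# Change the frequencies into percentages (P).
percentages = {group: 100 * freq / total for group, freq in counts.items()}

# Change the percentages into degrees: (P / 100) x 360.
degrees = {group: p / 100 * 360 for group, p in percentages.items()}

for group in counts:
    print(f"{group:<3}{percentages[group]:>6.1f}%{degrees[group]:>8.1f} degrees")
print(f"Sum of angles: {sum(degrees.values()):.1f} degrees")
```

The slice angles should add up to 360 degrees, just as the percentages add up to 100.
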
Example

[Pie chart. Slices: Circulatory system 42%, Neoplasms 30%, Respiratory system 13%, Others 8%, Digestive system 4%, Injury and poisoning 3%]

Figure 3: Distribution of causes of death for females in England and Wales, 1989

Histogram

• Histograms are frequency distributions with continuous class intervals that have been turned into graphs

• To construct a histogram, we draw the interval boundaries on a horizontal line and the frequencies on a vertical line

• The bars are drawn adjacent to (touching) each other, to show the underlying continuity of the data

• The area of each bar is proportional to the frequency of observations in the interval

Example
Using the following frequency distribution of the home runs hit by Major League Baseball teams during the 2002 season, construct the histogram

Total home runs    f
124 – 145          6
146 – 167          13
168 – 189          4
190 – 211          4
212 – 233          3

• Class boundaries, frequencies and cumulative frequencies

Total home runs   Class boundaries   Frequency   Cumulative frequency
124 – 145         123.5 – 145.5      6           6
146 – 167         145.5 – 167.5      13          19
168 – 189         167.5 – 189.5      4           23
190 – 211         189.5 – 211.5      4           27
212 – 233         211.5 – 233.5      3           30
Total                                30

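A rough text rendering of the histogram can be produced from the frequency table. The sketch below is only an illustration in Python with assumed variable names; it derives the class boundaries from the class limits and draws one row of asterisks per class instead of a vertical bar.

```python
# Class limits and frequencies from the home-run example.
class_limits = [(124, 145), (146, 167), (168, 189), (190, 211), (212, 233)]
frequencies = [6, 13, 4, 4, 3]

# Class boundaries: 0.5 below the lower limit and 0.5 above the upper limit.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in class_limits]

# One row per class; the number of asterisks equals the class frequency.
for (low, high), f in zip(boundaries, frequencies):
    print(f"{low:>6.1f} - {high:<6.1f} | {'*' * f} ({f})")
```

Because every class here has the same width (22), drawing bar lengths proportional to the frequencies also makes the bar areas proportional to the frequencies.
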
Histogram

[Histogram. y-axis: Frequency; x-axis: class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5]

Figure 4: Total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season

Frequency polygon

• Frequency polygon: a graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines

• The total area under the frequency polygon is equal to the area under the histogram, provided the classes are of equal width and the polygon is closed at both ends on the horizontal axis

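The equal-area property can be checked numerically for the home-run example. This is a hedged Python sketch with assumed variable names: it closes the polygon by adding a zero-frequency point one class width beyond each end, then compares the area under the polygon (trapezoid rule) with the total area of the histogram bars.

```python
# Class boundaries and frequencies from the home-run example.
boundaries = [123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
frequencies = [6, 13, 4, 4, 3]
width = boundaries[1] - boundaries[0]   # all classes are 22 units wide

# Polygon vertices sit above the class midpoints.
midpoints = [(low + high) / 2 for low, high in zip(boundaries, boundaries[1:])]

# Close the polygon on the horizontal axis, one class width beyond each end.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Area of the histogram bars and area under the polygon (trapezoid rule).
histogram_area = sum(f * width for f in frequencies)
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
)

print("Midpoints:", midpoints)
print("Histogram area:", histogram_area)
print("Polygon area:", polygon_area)
```

Both areas come out the same because each triangle cut off a bar by the polygon is matched by an equal triangle added next to it.
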
Frequency polygon

[Frequency polygon. y-axis: Frequency; x-axis: class midpoints 134.5, 156.5, 178.5, 200.5, 222.5]

Figure 5: Total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season

Ogive

• Ogive: a curve drawn for the cumulative frequency distribution by joining, with straight lines, the dots marked above the upper boundaries of the classes at heights equal to the cumulative frequencies of the respective classes

• It is obtained as follows:
– On the vertical axis, mark the cumulative frequencies
– On the horizontal axis, mark the upper boundaries of all classes; the lower boundary of the first class is the starting point, with a cumulative frequency of 0
– Then join all these points with straight lines (or a smooth curve)

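The points to plot for an ogive can be listed with a few lines of code. The sketch below is illustrative (Python, assumed variable names) and uses the class boundaries and frequencies of the home-run example.

```python
from itertools import accumulate

# Class boundaries and frequencies from the home-run example.
boundaries = [123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
frequencies = [6, 13, 4, 4, 3]

# Start at the lower boundary of the first class with cumulative frequency 0,
# then pair each upper boundary with the cumulative frequency reached there.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(boundaries, cumulative))

for x, y in ogive_points:
    print(f"boundary {x:>6.1f}  cumulative frequency {y}")
```

Each point answers a "how many at most" question; for example, the point at 189.5 shows that 23 of the 30 teams hit 189 home runs or fewer.
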
Ogive

[Ogive. y-axis: Cumulative frequency; x-axis: class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5]

Figure 6: Total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season

Stem-and-leaf plot
• Another common tool for visually displaying continuous data is the "stem-and-leaf" plot

• Allows for easier identification of individual values in the sample

• Very similar to a histogram

• Most effective with relatively small data sets

• Helps to understand the nature of the data
– Presence or absence of symmetry

Stem-and-leaf plot
• Can be constructed as follows:
(1) Separate each data point into a stem component and a leaf component. The stem consists of the number formed by all but the rightmost digit of the number, and the leaf consists of the rightmost digit. Thus the stem of the number 483 is 48, and the leaf is 3

(2) Write the smallest stem in the data set in the upper left-hand corner of the plot

(3) Write the second stem, which equals the first stem + 1, below the first stem

(4) Continue with step (3) until you reach the largest stem in the data set

(5) Draw a vertical bar to the right of the column of stems

(6) For each number in the data set, find the appropriate stem and write the leaf to the right of the vertical bar

Data of birth weights from 100 consecutive deliveries

Stem-and-leaf plot for the birth weight data (N = 100)
Stem | Leaves

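The construction steps can be sketched in code. Because the birth weight data are not reproduced in the text, this illustrative Python example uses the 50 student ages from Example 1 instead; the variable names are assumptions.

```python
from collections import defaultdict

# Ages of the 50 students from Example 1 (used in place of the birth weights).
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# Step 1: split each value into a stem (all but the last digit) and a leaf (last digit).
leaves_by_stem = defaultdict(list)
for value in sorted(ages):
    stem, leaf = divmod(value, 10)
    leaves_by_stem[stem].append(leaf)

# Steps 2-6: list every stem from the smallest to the largest, draw the bar,
# and write the leaves to the right of it (a stem with no leaves stays empty).
lines = []
for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
    leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
    lines.append(f"{stem:>2} | {leaves}")

print("\n".join(lines))
```

Read sideways, the rows look like the bars of a histogram, yet every individual age can still be recovered from its stem and leaf.
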
Box plot
• One way to give a concise profile of a data set is the box plot

• Gives good insight into the shape of the distribution in terms of skewness and outlying values

• A very useful tool for easily comparing the distribution of continuous data in multiple groups; the plots can be drawn side by side

Box plot: BP for 113 Males
Box plot of systolic blood pressures, sample of 113 men

[Box plot figure, annotated over successive slides with:]
• Sample median blood pressure
• 75th percentile
• 25th percentile
• Largest observation
• Smallest observation

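The five values marked on a box plot can be computed directly. The blood pressure data are not reproduced in the text, so this hedged Python sketch uses the 50 student ages from Example 1; it relies on `statistics.quantiles` with its default method, and other software may use slightly different percentile conventions.

```python
import statistics

# Ages of the 50 students from Example 1 (used in place of the blood pressures).
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# Quartiles: 25th percentile, median (50th) and 75th percentile.
q1, median, q3 = statistics.quantiles(ages, n=4)

summary = {
    "Smallest observation": min(ages),
    "25th percentile": q1,
    "Median": median,
    "75th percentile": q3,
    "Largest observation": max(ages),
}

for label, value in summary.items():
    print(f"{label:<22}{value}")
```

The box spans the 25th to the 75th percentile with a line at the median; here the median lies closer to the lower end of the range than to the upper end, which suggests a longer right-hand tail.
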
Tabular and Graphical Procedures

Data
– Qualitative data
   Tabular methods:
   • Frequency distribution
   • Relative frequency distribution
   • Cumulative frequency distribution
   • Cumulative relative frequency distribution
   Graphical methods:
   • Bar graph
   • Pie chart
– Quantitative data
   Tabular methods:
   • Frequency distribution
   • Relative frequency distribution
   • Cumulative frequency distribution
   • Cumulative relative frequency distribution
   Graphical methods:
   • Histogram
   • Frequency polygon
   • Ogive
   • Scatter diagram