
Lec 3 (Data Organization)

The document outlines methods for data organization and presentation, focusing on descriptive statistics and the importance of summarizing raw data for clarity. It describes various techniques for organizing categorical and quantitative variables, including frequency distribution tables and graphical representations such as bar charts and pie charts. Additionally, it provides guidelines for constructing tables and graphs to enhance data interpretation and communication.

Uploaded by

Begidu Yilma
Copyright
© © All Rights Reserved

Arba-Minch University

College of Medicine and Health Sciences

School of Public Health

Methods of data organization and presentation

By: Etenesh K. (BSc, MPH (Epidemiology & Biostatistics))


Learning objectives

At the end of this session, students will be able to:

• Identify different methods of data organization and presentation
• Use the different methods of data organization and presentation
Descriptive statistics

• Descriptive statistics are used to summarize data in the form of graphs and numerical measures.
• After collecting data, the first task for a researcher is to organize and simplify the data.
Brainstorming

• Why do we need data organization and summarization?
Cont..

• Collected data are raw.
• Hence, they convey little information unless organized.
• Organization makes it easier to determine what information the data contain.
• It gives a general overview of the results.
Cont..

• Numbers that have not been summarized and organized are called raw data.
• In most cases, useful information is not immediately evident from the mass of unsorted data.
• Before interpretation and communication of the findings, the raw data must be organized and presented in a clear and understandable way.
02/09/2025 6
Cont..
• Descriptive statistics are used to organize and interpret research observations and findings.
• However, before performing any analysis (including organization), you must first get to know your data (i.e. the characteristics of your variables).
• The summary technique used depends on the type of variable under consideration.
Cont..
1. For categorical variables
   A. Using a frequency distribution table
      1. Frequency counts
      2. Relative frequency
      3. Cumulative frequency
      4. Cumulative relative frequency
   B. Using pictorial forms
      1. Bar charts (graphs)
      2. Pie charts
Cont..
2. For quantitative variables
   A. Using a grouped frequency distribution table
      1. Frequency counts
      2. Relative frequency
      3. Cumulative frequency
      4. Cumulative relative frequency
   B. Using pictorial forms
      1. Histogram
      2. Frequency polygon
      3. Line graph, etc.
Frequency table & Frequency Distributions

Frequency:
• The number of times a given value occurs in a data set.

Frequency distribution table:
• A table which contains the values of a variable and the corresponding frequencies with which each value occurs (i.e. how often each value occurs).
Cont..
Example 1: The blood types of 30 patients are given as follows:

– A AB B B A O O AB AB B O A A B B
A AB A O AB B AB AB O A AB AB O A
O

• Construct a frequency distribution for these data.
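One possible way to tally Example 1 is sketched here in Python. The variable names (`blood_types`, `frequency`) are illustrative choices, not part of the lecture; the data are the 30 values listed above.

```python
from collections import Counter

# Blood types of the 30 patients, copied from Example 1.
blood_types = (
    "A AB B B A O O AB AB B O A A B B "
    "A AB A O AB B AB AB O A AB AB O A "
    "O"
).split()

# Counter tallies how many times each category occurs.
frequency = Counter(blood_types)

print(f"{'Blood type':<12}{'Frequency':>10}")
for category in sorted(frequency):
    print(f"{category:<12}{frequency[category]:>10}")
print(f"{'Total':<12}{sum(frequency.values()):>10}")
```

The tally gives A = 8, AB = 9, B = 6 and O = 7, which add up to the 30 patients.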

Cont..

• The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data.
• It is a simple and effective way of summarizing large amounts of data.
Cont..
Relative frequency
• A relative frequency distribution shows the proportion of counts that fall into each class or category.
• The relative frequency of a category is obtained by dividing the number of observations in that category by the total number of observations.
• It can be reported as a percentage by multiplying the resulting fraction by 100.
Cont..
From the previous example
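A minimal sketch of the relative frequency calculation for the blood-type example, assuming the counts obtained by tallying the 30 observations in Example 1 (A = 8, AB = 9, B = 6, O = 7). The dictionary names are illustrative.

```python
# Frequency counts tallied from the 30 blood types in Example 1.
counts = {"A": 8, "AB": 9, "B": 6, "O": 7}

total = sum(counts.values())

# Relative frequency = category count / total number of observations.
relative = {category: n / total for category, n in counts.items()}

# Percentage = relative frequency x 100, rounded to one decimal place.
percent = {category: round(100 * r, 1) for category, r in relative.items()}

print(f"{'Blood type':<12}{'Frequency':>10}{'Relative':>10}{'Percent':>9}")
for category, n in counts.items():
    print(f"{category:<12}{n:>10}{relative[category]:>10.3f}{percent[category]:>9}")
print(f"{'Total':<12}{total:>10}{sum(relative.values()):>10.3f}")
```

The relative frequencies always sum to 1 (100%), which is a quick check on the arithmetic.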

Cont..
Cumulative frequency
• It is the number of observations in the category plus the observations in all categories that come before it.

Cumulative relative frequency
• It is the proportion of observations in the category plus the observations in all categories that come before it.
• It is obtained by dividing the cumulative frequency by the total number of observations.
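As an illustration, cumulative and cumulative relative frequencies can be computed as a running sum. The sketch below borrows the class frequencies from the leisure-time table that appears later in these notes (5, 11, 12, 7, 3, 2); the variable names are illustrative.

```python
from itertools import accumulate

# Ordered classes and their frequencies (leisure-time example, 40 students).
classes = ["10-14", "15-19", "20-24", "25-29", "30-34", "35-39"]
frequencies = [5, 11, 12, 7, 3, 2]

# Running total of the frequencies, in class order.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Cumulative relative frequency = cumulative frequency / total.
cumulative_relative = [c / total for c in cumulative]

for label, f, cf, crf in zip(classes, frequencies, cumulative, cumulative_relative):
    print(f"{label:<8}{f:>4}{cf:>6}{crf:>8.3f}")
```

The last cumulative frequency equals the total number of observations, and the last cumulative relative frequency equals 1.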


Cont..
Distribution of birth weight of newborns between 1976-1996 at X town.

Cont..

• Cumulative frequency and cumulative relative frequency distributions are particularly useful for describing ordinal data.
• They are not meaningful for nominal data, whose categories have no natural order.
• E.g. you would never say that 70% are below the colour blue.
Frequency distribution for numerical variables

• For a quantitative variable, we need to select a set of contiguous, non-overlapping intervals such that each value can be placed in one, and only one, of the intervals.
• For both discrete and continuous data, the values are grouped into distinct non-overlapping intervals, usually of equal width.
• The first consideration is how many intervals to include.
Cont..
• To divide the data into groups (intervals or classes), we need to determine:

1. The number of intervals (k): how many classes to use.
2. The range (R): the difference between the largest and the smallest observation in the data set.
3. The width of the interval (w): class intervals should generally be of the same width. The width determines where each class starts and where it ends.
Cont..
To determine the number of class intervals and the corresponding width, we may use Sturges' rule:

K = 1 + 3.322 (log n)
W = (L - S) / K

where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
log = logarithm to base 10
Cont..
Example: Leisure time (hours) per week for 40 college students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13
10 19 27 29 22 38 28 34 32 23 19 21 31 16 28 19 18
12 27 15 21 25 16
K = 1 + 3.322 (log n)
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
W = (L - S) / K
W = (38 - 10) / 6 = 4.67 ≈ 5
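The same arithmetic can be checked with a short Python sketch. It assumes base-10 logarithms (3.322 is approximately 1/log10(2)) and rounds the width up with `math.ceil` so that the classes are guaranteed to cover the whole range; that rounding choice is an assumption of this sketch, and it happens to agree with the value of 5 used above.

```python
import math

# Leisure time (hours per week) for the 40 college students in the example.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13,
    10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31, 16, 28, 19, 18,
    12, 27, 15, 21, 25, 16,
]

n = len(leisure_hours)

# Sturges' rule: K = 1 + 3.322 * log10(n), rounded to a whole number of classes.
k_exact = 1 + 3.322 * math.log10(n)
k = round(k_exact)

# Class width: W = (L - S) / K, rounded up to the next whole hour.
largest, smallest = max(leisure_hours), min(leisure_hours)
w_exact = (largest - smallest) / k
w = math.ceil(w_exact)

print(f"n = {n}, K = {k_exact:.2f} -> {k} classes")
print(f"L = {largest}, S = {smallest}, W = {w_exact:.2f} -> {w}")
```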
Cont..
• Classes should be mutually exclusive.
• Make sure that the smallest and largest values fall within the classification.
• No value may fall into a gap between successive classes, and the classes must not overlap, i.e. successive classes have no values in common.
• That is, class intervals should be contiguous (no gaps), non-overlapping, mutually exclusive and exhaustive.
Cont..

Class limits: the smallest and largest values that can fall in a class.
• Lower class limit
• Upper class limit

Mid-point (class mark): the value of the interval which lies midway between the lower and the upper limits of a class.
Cont..
Class boundaries (true limits): the limits that make the intervals of a continuous variable continuous in both directions, with no gaps between successive classes.
• Lower class boundary
• Upper class boundary
• For data recorded to the nearest whole unit, subtract 0.5 from the lower class limit and add 0.5 to the upper class limit.
Time (hours)   True limit (class boundary)   Mid-point   Frequency
10-14          9.5 - 14.5                    12          5
15-19          14.5 - 19.5                   17          11
20-24          19.5 - 24.5                   22          12
25-29          24.5 - 29.5                   27          7
30-34          29.5 - 34.5                   32          3
35-39          34.5 - 39.5                   37          2
Total                                                    40
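The table above can be rebuilt from the raw data with a short sketch. It assumes whole-hour data, a starting lower limit of 10, a width of 5 and 6 classes, as in the example; the variable names are illustrative.

```python
# Leisure time (hours per week) for the 40 college students in the example.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13,
    10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31, 16, 28, 19, 18,
    12, 27, 15, 21, 25, 16,
]

start, width, k = 10, 5, 6  # lowest class limit, class width, number of classes

rows = []
for i in range(k):
    lower = start + i * width        # lower class limit
    upper = lower + width - 1        # upper class limit (whole-hour data)
    lower_boundary = lower - 0.5     # true lower limit
    upper_boundary = upper + 0.5     # true upper limit
    midpoint = (lower + upper) / 2   # class mark
    frequency = sum(lower <= x <= upper for x in leisure_hours)
    rows.append((lower, upper, lower_boundary, upper_boundary, midpoint, frequency))

for lower, upper, lb, ub, mid, freq in rows:
    print(f"{lower}-{upper}  {lb} - {ub}  {mid:g}  {freq}")
print("Total", sum(row[-1] for row in rows))
```

Each observation is counted in exactly one class, so the frequencies add up to 40.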

Exercise
• These data represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50 states. Construct a grouped frequency distribution.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

Cont..
Step 1: Determine the classes.
• Find the highest value and the lowest value: H = 134 and L = 100.
• Find the range: R = highest value - lowest value = 134 - 100 = 34.
• Select the number of classes desired using the formula K = 1 + 3.322 × log(n), where n is the number of observations (n = 50):
K = 1 + 3.322 × log(50) = 6.64 ≈ 7 classes
Cont..

• Find the class width by dividing the range by the number of classes.
• Width = R / number of classes = 34/7 = 4.9 ≈ 5
Cont..
• Select a starting point for the lowest class limit.
• Add the width to the lowest score taken as the starting point to get the lower limit of the next class, and keep adding:
  100, 105, 110, 115, 120, 125, 130
• Subtract one unit from the lower limit of the second class to get the upper limit of the first class: 105 - 1 = 104.
• Then add the width to each upper limit to get all the upper limits:
  104, 109, 114, 119, 124, 129, 134
Step 2: Determine the midpoint for each interval.
Step 3: Tally the frequency of each class from the data.
Cont..
Class limit   Class boundary (true limit)   Mid-point   Frequency
100-104       99.5-104.5                    102         2
105-109       104.5-109.5                   107         8
110-114       109.5-114.5                   112         18
115-119       114.5-119.5                   117         13
120-124       119.5-124.5                   122         7
125-129       124.5-129.5                   127         1
130-134       129.5-134.5                   132         1
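The frequencies in this table can be verified with a sketch that assigns each temperature to a class by integer division. It assumes the starting point of 100, the width of 5 and the 7 classes chosen in the steps above; the names `class_index` and `table` are illustrative.

```python
from collections import Counter

# Record high temperatures (degrees Fahrenheit) for the 50 states, from the exercise.
temperatures = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

start, width, k = 100, 5, 7  # lowest class limit, class width, number of classes

# Integer division maps each value to the index of its class:
# 100-104 -> 0, 105-109 -> 1, ..., 130-134 -> 6.
class_index = Counter((t - start) // width for t in temperatures)

table = []
for i in range(k):
    lower = start + i * width
    upper = lower + width - 1
    table.append((f"{lower}-{upper}", class_index[i]))

for label, freq in table:
    print(label, freq)
print("Total", sum(freq for _, freq in table))
```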

Guidelines for constructing tables

• Keep them simple (limit the number of variables to three or fewer).
• All tables should be self-explanatory (include a clear title telling what, when and where).
• Clearly label the rows and columns.
• State clearly the unit of measurement used.
• Explain codes and abbreviations in a footnote.
• Show totals.
• If the data are not original, indicate the source in a footnote.
Pictorial / diagrammatic presentation

Importance of diagrammatic presentation
1. Diagrams are more attractive than mere figures.
2. They give a quick overall impression of the data.
3. They are easier to remember than mere figures.
4. They facilitate comparison.
5. They help in understanding patterns and trends.
Cont..
Specific types of graphs include:
For nominal and ordinal data:
• Bar graph
• Pie chart
For quantitative data:
• Histogram
• Frequency polygon
• Box plot
• Ogive curve
• Scatter plot
• Line graph
• Others
1. Bar charts (graphs)

• A bar chart is the graphical equivalent of a frequency table.
• Categories are listed on the horizontal axis (X-axis).
• Frequencies or relative frequencies are represented on the vertical axis (Y-axis).
• The height of each bar is proportional to the frequency or relative frequency of the category it represents.
Simple bar chart
• It is a one-dimensional diagram in which each bar represents the whole of a magnitude.
• The height or length of each bar indicates the size (frequency) of the figure represented.
• The bars are not joined together (leave space between bars).
Multiple bar chart
• In this type of chart the component figures are shown as separate bars adjoining each other.
• The height of each bar represents the actual value of the component figure.
• It depicts the distributional pattern of more than one variable.
Sub-divided (component) bar chart

• If different quantities form the sub-divisions of the totals, simple bars may be sub-divided in the ratio of the various sub-divisions to exhibit the relationship of the parts to the whole.
• The order in which the components are shown in one bar is followed in all bars used in the diagram.
Cont..
Method of constructing a bar chart

• All the bars must have equal width.
• The different bars should be separated by equal distances.
• All the bars should rest on the same line, called the base.
• Label both axes clearly.
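As a rough illustration of these rules, the sketch below prints a text-only simple bar chart for the blood-type counts of Example 1 (A = 8, AB = 9, B = 6, O = 7). The bars are drawn horizontally for convenience, so the categories run down the page instead of along the X-axis; the helper name `bar_chart_lines` is hypothetical.

```python
# Frequency counts tallied from the 30 blood types in Example 1.
counts = {"A": 8, "AB": 9, "B": 6, "O": 7}


def bar_chart_lines(counts, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    label_width = max(len(label) for label in counts)
    # Every bar starts at the same base (the "|") and uses the same symbol,
    # so bar length is directly proportional to frequency.
    return [
        f"{label:<{label_width}} | {symbol * n} {n}"
        for label, n in counts.items()
    ]


lines = bar_chart_lines(counts)
print("Blood type | Frequency")
for line in lines:
    print(line)
```

Each bar sits on its own line, which keeps the bars separated and resting on a common base.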

2. Pie chart

• Shows the relative frequency for each category by dividing a circle into sectors.
• The angles are proportional to the relative frequencies.
• Used for a single categorical variable.
• Uses percentage distributions.
Steps to construct a pie chart

• Construct a frequency table.
• Change the frequencies into percentages (P).
• Change the percentages into degrees, where degrees = (P / 100) × 360°.
• Draw a circle and divide it accordingly.
Example: Distribution of deaths for females, in England and Wales, 1989.

Cause of death          No. of deaths
Circulatory system      100 000
Neoplasm                70 000
Respiratory system      30 000
Injury and poisoning    6 000
Digestive system        10 000
Others                  20 000
Total                   236 000
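The conversion from frequencies to percentages and then to sector angles can be sketched as follows, using the figures in this table. The dictionary names are illustrative; the angle is computed as (percentage / 100) × 360.

```python
# Number of deaths by cause, from the example table.
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())

# Step 2: frequency -> percentage of the total.
percent = {cause: 100 * n / total for cause, n in deaths.items()}

# Step 3: percentage -> sector angle in degrees.
angle = {cause: p / 100 * 360 for cause, p in percent.items()}

for cause in deaths:
    print(f"{cause:<22}{percent[cause]:6.1f}%{angle[cause]:8.1f} degrees")
print(f"{'Total':<22}{sum(percent.values()):6.1f}%{sum(angle.values()):8.1f} degrees")
```

The percentages sum to 100 and the angles to 360, so the sectors fill the circle exactly.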

[Pie chart] Distribution of cause of death for females, in England and Wales, 1989: Circulatory system 42%, Neoplasms 30%, Respiratory system 13%, Others 8%, Digestive system 4%, Injury and poisoning 3%.
Histogram
• Histograms are frequency distributions with continuous class intervals that have been turned into graphs.
• A histogram is a type of bar chart, but there are no spaces between the bars.
• Histograms are used to visually depict frequency distributions of continuous data.
• Given a set of numerical data, we can obtain an impression of the shape of its distribution by constructing a histogram.
Cont..
• A histogram is constructed by choosing a set of non-overlapping class intervals and counting the number of observations that fall in each class.
• The class intervals must be non-overlapping so that each observation falls in one and only one interval.
• Bars are drawn over the intervals.
• The area of each bar is proportional to the frequency of observations in the interval.
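The point about area matters when class widths differ. The sketch below uses the leisure-time classes from earlier in these notes, with the last two classes (30-34 and 35-39) merged into one wider class purely for illustration; that merge is an assumption of this sketch, not part of the lecture. Bar height is taken as frequency divided by class width, so that area (height × width) equals frequency.

```python
# (lower boundary, upper boundary, frequency) for each class.
# The last class merges 30-34 (3) and 35-39 (2) to show unequal widths.
classes = [
    (9.5, 14.5, 5),
    (14.5, 19.5, 11),
    (19.5, 24.5, 12),
    (24.5, 29.5, 7),
    (29.5, 39.5, 5),
]

bars = []
for lower, upper, freq in classes:
    width = upper - lower
    height = freq / width  # frequency per unit of class width
    bars.append((lower, upper, width, height))

for (lower, upper, width, height), (_, _, freq) in zip(bars, classes):
    area = width * height
    print(f"{lower}-{upper}: width={width:g}, height={height:.2f}, area={area:g} (freq={freq})")
```

With equal widths the heights are simply proportional to the frequencies; the wider merged class gets a lower bar so that its area still reflects its 5 observations.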


Example: Distribution of the age of women at the time of marriage

Cont..
Two problems with histograms
1. They are somewhat difficult to construct.
2. The actual values within the respective groups are lost and difficult to reconstruct (we "lose" the information about individual data values when we group the data).

• A stem-and-leaf plot overcomes these problems.
Frequency polygon

• The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.
– The frequencies are represented by the heights of the points.
• To draw a frequency polygon, we connect the midpoints of the tops of the bars of the histogram with straight lines.
• Frequency polygons are superior to histograms for comparing two or more sets of data.
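The points of a frequency polygon can be listed directly from a grouped frequency table. This sketch uses the mid-points and frequencies of the leisure-time table from earlier in these notes. Closing the polygon with a zero-frequency point one class width beyond each end is a common convention that is assumed here; the slides do not state it.

```python
# Class mid-points and frequencies from the leisure-time table.
midpoints = [12, 17, 22, 27, 32, 37]
frequencies = [5, 11, 12, 7, 3, 2]

width = midpoints[1] - midpoints[0]  # equal class width

# One point per class: (mid-point, frequency).
points = list(zip(midpoints, frequencies))

# Assumed convention: anchor the polygon on the horizontal axis at both ends.
closed = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]

# Consecutive points are joined by straight line segments.
segments = list(zip(closed, closed[1:]))

for (x1, y1), (x2, y2) in segments:
    print(f"({x1}, {y1}) -> ({x2}, {y2})")
```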

Frequency polygon for the ages of 2087 mothers with <5 children, Adami Tulu, 2003
[Figure: horizontal axis N1AGEMOTH (15.0 to 55.0), vertical axis frequency (0 to 700); Mean = 27.6, Std. Dev = 6.13, N = 2087]

It can also be drawn without erecting rectangles, by joining the midpoints of the intervals at heights representing the frequencies of the classes, as follows:
Scatter plot
• Most studies in medicine involve measuring more than one characteristic.
• For two quantitative variables we use bivariate plots (also called scatter plots or scatter diagrams).
• For example, in a study on percentage saturation of bile, information was also collected on the age of each patient, to see whether a relationship existed between the two measures (saturation of bile and age).
Line graph
• The line graph is especially useful for studying how a variable changes with the passage of time.
• Time, in weeks, months or years, is marked along the horizontal axis, and the value of the quantity being studied is marked on the vertical axis.
• The distance of each plotted point above the baseline indicates its numerical value.
• The line graph is suitable for depicting the trend of a series over a long period.
Cont..

Figure (1): Maternal mortality rate of (country), 1960-2000
Reading assignment

• Read about the stem-and-leaf plot, the ogive curve, and the box-and-whisker plot.

Thank you