Biostatistics and Demography - Lecture 2

Uploaded by

Christine Ayamba

Mbarara University of Science and Technology
Faculty of Medicine, Department of Pharmacy

Biostatistics & Demography
Summarizing and presenting data

Edward J. LUKYAMUZI (MPS)

P.O. Box 1410, Mbarara, Uganda, http://www.must.ac.ug

1
Objectives…
• Explain how proportions and percentages are
calculated
• Explain two methods for graphically
displaying the distribution of categorical data.
• Explain the method of construction of a
histogram; describe the shape, centre, and
spread
• Define a percentile, name the 3 important
percentiles and derive their values from a
frequency table
• Define and describe the characteristics of the
mean, median, and mode, and identify when
to use each.
• Define and describe the characteristics of
standard deviation, interquartile range and
identify when to use each
2
Recap…
Data types
• Categorical
– Nominal (includes binary or dichotomous)
– Ordinal
• Quantitative
– Discrete (count data)
– Continuous (measured on a continuum): ratio or interval scale
3
Four scales of
measurement…
Introduction…
• In public health and health research we are
interested in describing a group (of people,
things, etc.) rather than an individual person.
• We have noted the inherent variability in biological,
socio-economic, or behavioral processes.
• Knowledge and application of appropriate
statistical methods enable us to describe groups of
people with varying experiences.

5
Summarizing and Presenting data : Methods and
tools
• Summarising categorical data
• Counts
• Proportions
• Percentages
• Rates, Ratios
• Summarizing quantitative data
• Measures of central tendency (Mode, median,
mean)
• Measures of dispersion (minimum &
maximum, IQR, standard deviation)
• Graphical presentation of data
• Categorical data (bar chart, pie chart)
• Continuous data (histogram, box plot)

6
Summarising Categorical Data:
Frequency counts
• The frequency distribution tells us how many
times different values of a variable occur in a
given sample or population. Frequency
distribution of a random sample of 200 new
students:

Example: Frequency distribution of PHA III students by sex

Sex       Number of students (counts)
Males     36
Females   14
Total     50
7
Summarizing Categorical Data:
Proportions
• Proportion
• Numerator is included in the denominator:
a / (a + b), where a + b = n
e.g. number of students who are male / total number
of students

Frequency Distribution of Year 1 Students by Sex

Sex       Number of students   Proportion
Males     108                  0.54
Females   92                   0.46
Total     200                  1.0
8
Summarizing Categorical Data:
Percentages
• Proportions are often converted into
percentages for ease of interpretation.
• Percentages are relative frequencies
expressed per 100.
– Frequency % = (a / n) × 100

Percentage of Students by Sex

Sex       Number of students   Proportion   Percentage
Males     108                  0.54         54.0
Females   92                   0.46         46.0
Total     200                  1.0          100.0
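
The proportion and percentage columns can be reproduced from the counts alone. The following is a minimal Python sketch, not part of the original slides; the function name summarise_counts is chosen here for illustration only.

```python
# Counts taken from the table of Year 1 students by sex.
counts = {"Males": 108, "Females": 92}

def summarise_counts(counts):
    """Return {category: (count, proportion, percentage)} for a dict of counts."""
    n = sum(counts.values())  # denominator: a + b = n
    return {
        category: (count, count / n, 100 * count / n)
        for category, count in counts.items()
    }

summary = summarise_counts(counts)
for category, (count, proportion, percentage) in summary.items():
    print(f"{category:<8} {count:>4} {proportion:.2f} {percentage:.1f}")
```

Because each numerator is part of the denominator, the proportions add up to 1 and the percentages to 100.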

9
Other Basic Measures of Frequency:
Ratios

Ratios show the relative sizes of two events

Examples
• Number of doctors to population size
• Number of nurses to number of patients in
a hospital
• Number of girls to number of boys
• Number of households with a bed net to
number of households without a bed net

10
Other Basic Measures of Frequency: Rates

A rate is a measure of the frequency of
occurrence of events per unit time.

For example
– Number of disease events per year
– Number of cases of dysentery per week
• Numerator is the number of events
• Denominator is the total time of observation
• Time may be in hours, days, weeks, months,
years, etc.
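
As a small illustration of the numerator and denominator of a rate, here is a Python sketch. The figures (12 cases over 4 weeks) are hypothetical and are not taken from the lecture.

```python
def rate(events, observation_time):
    """Events per unit of observation time (in the unit used for observation_time)."""
    if observation_time <= 0:
        raise ValueError("observation time must be positive")
    return events / observation_time

# Hypothetical figures, for illustration only: 12 dysentery cases seen over 4 weeks.
cases_per_week = rate(12, 4)
print(cases_per_week)
```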

11
SUMMARIZING CONTINUOUS DATA

12
Distribution of continuous data

• Values of continuous data exist on a
continuum.
• As the sample size increases, the number of
unique values becomes so large that it is difficult
to summarise such data by making frequency
counts of individual values
• See the example of the frequency distribution of
total cholesterol of 250 people.

13
Example: Frequency distribution of
cholesterol in mg/dl (n=250)

totchol   Freq.   Percent   Cum.
135       1       0.40      0.40
150       2       0.81      1.21
154       1       0.40      1.61
159       1       0.40      2.02
163       1       0.40      2.42
164       1       0.40      2.82
...       ...     ...       ...
320       2       0.81      98.39
325       1       0.40      98.79
326       1       0.40      99.19
332       1       0.40      99.60
464       1       0.40      100.00
14
Distribution of Continuous data
• Continuous data is first grouped into
classes comprising successive
ranges of values of the variable of
interest
• In each class/group, frequency counts
are determined
• Frequency distributions of continuous
data have a shape

15
Frequency distribution of cholesterol data,
class width is 20 mg/dl

totcholcat   Freq.   Percent   Cum.
135-         4       1.61      1.61
155-         12      4.84      6.45
175-         26      10.48     16.94
195-         35      14.11     31.05
215-         40      16.13     47.18
235-         52      20.97     68.15
255-         39      15.73     83.87
275-         19      7.66      91.53
295-         15      6.05      97.58
315-         5       2.02      99.60
455-         1       0.40      100.00
Total        248     100.00

Frequency distributions of continuous data have a
shape; easier to see on a graph
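
The Percent and Cum. columns follow directly from the class frequencies. The sketch below, in Python, recomputes them from the frequencies in the grouped cholesterol table; the function name percent_table is an illustrative choice, not something from the slides.

```python
# Class lower limits and frequencies copied from the grouped cholesterol table.
classes = [135, 155, 175, 195, 215, 235, 255, 275, 295, 315, 455]
freqs = [4, 12, 26, 35, 40, 52, 39, 19, 15, 5, 1]

def percent_table(freqs):
    """Return (percent, cumulative percent) lists, each rounded to 2 decimals."""
    total = sum(freqs)
    percents, cumulative = [], []
    running = 0
    for f in freqs:
        running += f  # cumulative count up to and including this class
        percents.append(round(100 * f / total, 2))
        cumulative.append(round(100 * running / total, 2))
    return percents, cumulative

percents, cumulative = percent_table(freqs)
for lower, f, p, c in zip(classes, freqs, percents, cumulative):
    print(f"{lower}-  {f:>3}  {p:>6.2f}  {c:>6.2f}")
```

The cumulative column is computed from the running count rather than by adding rounded percentages, which avoids accumulating rounding error.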
16
Frequency distribution (from Table
above) presented on a Graph
(Histogram)

[Figure: histogram. Horizontal axis: Total cholesterol, mg/dl (100 to 500); vertical axis: Percent of the sample (0 to 20).]

Observe the shape, centre and spread of the
data.

17
Tools for Visualising distribution of continuous
data: Histogram

[Figure: histogram. Horizontal axis: Total cholesterol, mg/dl (100 to 500); vertical axis: Percent of the sample (0 to 20).]

Does the centre represent the mean, median
and mode?

18
Recall: Frequency distribution of cholesterol in
mg/dl (n=250)

totchol   Freq.   Percent   Cum.
135       1       0.40      0.40
150       2       0.81      1.21
154       1       0.40      1.61
159       1       0.40      2.02
163       1       0.40      2.42
164       1       0.40      2.82
...       ...     ...       ...
320       2       0.81      98.39
325       1       0.40      98.79
326       1       0.40      99.19
332       1       0.40      99.60
464       1       0.40      100.00
19
Summarising the distribution of total
cholesterol data (Continuous data)

• Minimum is 135 mg/dl

• Maximum is 464 mg/dl
• Sample size is 250
• The complete frequency table is too
long, making it difficult to make
meaning out of the frequency list.
• A histogram can help visualise the
distribution

20
Creating a Histogram
• Determine the minimum and maximum values
for the variable of interest, e.g. total cholesterol
– Minimum is 135, Maximum is 464
– Determine the range (max - min = 329)
• Determine the number of classes/groups you
want to have, e.g. 11.
• Determine the class interval width ≈ range
divided by number of classes ≈ 30 mg/dl
• Create mutually exclusive categories/classes of
the original (continuous) variable.
• Classes are also called bins.
• Start the first interval at a convenient value
below the minimum, e.g. 134.5 mg/dl
• Therefore the first class will be 134.5 to 164.5
mg/dl
• The second class will be 164.5 to 194.5 mg/dl
21
Creating a Histogram
• Determine the frequency counts and relative
frequency for each category/class
• To plot the graph:
– On the horizontal axis mark equally spaced
values of the lower boundary of each class
– On the vertical axis, the length represents
the frequency
– Plot the frequencies for each class as bars.
– The height of the bar will be proportional to
the frequency of that class.
– The width of the bars is the same.
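
The class-building steps can be sketched in Python as follows. This is an illustration under stated assumptions: the class width is rounded up to a whole number (the slides only say the width is about 30 mg/dl), classes are treated as closed on the left and open on the right, and the function names are hypothetical.

```python
import math

def class_edges(minimum, maximum, n_classes, start):
    """Return equally spaced class boundaries beginning at `start`.

    The width is the range divided by the number of classes, rounded up
    to a whole number (an assumption made for this sketch).
    """
    width = math.ceil((maximum - minimum) / n_classes)
    return [start + i * width for i in range(n_classes + 1)]

def class_counts(values, edges):
    """Count how many values fall in each class [lower, upper)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Figures from the slides: minimum 135, maximum 464, 11 classes, start at 134.5.
edges = class_edges(135, 464, 11, 134.5)
print(edges[:3])  # boundaries of the first two classes
```

Starting half a unit below the minimum means no whole-number observation can sit exactly on a class boundary, so the classes stay mutually exclusive.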

22
[Figure: histogram. Horizontal axis: total cholesterol, mg/dl (100 to 500); vertical axis: Percent (0 to 30). Class width is ≈30 mg/dl.]

Is the shape of a histogram sensitive to the
number of classes?

23
[Figure: histogram. Horizontal axis: total cholesterol, mg/dl (100 to 500); vertical axis: Percent (0 to 20). Class width is 16.45 mg/dl.]

The shape of a histogram is sensitive to the
number of classes

24
Distribution of continuous data
• The shape of the frequency distribution can
be symmetrical or asymmetrical
• A symmetric distribution has the same
shape on both sides of the mean (the
centre)
• If outlying values occur only in one direction,
the distribution is said to be skewed
• Normally distributed data has zero skewness

25
Shape of distribution of continuous
data: Symmetrical

[Figure: symmetrical distribution. Horizontal axis: Values of a continuous variable (X); vertical axis: Frequency.]

• Symmetrical distribution has the same shape
on both sides of the mean.
• Horizontal axis represents the X-variable while
the vertical axis represents the frequencies.

26
Shape of distribution of continuous data:
Skewed to the right

27
Shape of distribution of continuous data:
Skewed to the Left

28
MEASURES OF CENTRAL TENDENCY

29
Measures of central tendency
• The central tendency of a distribution is an
estimate of the "center" of a distribution of
values.
• Enable us to describe the characteristics of a
typical member of a group of people.
• Three major types of estimates of central
tendency:
• Mode
• Median
• Mean (Arithmetic mean)
• Other measures of central tendency
• Geometric mean
30
Mode

• The mode is the most frequently
occurring value in the set of observations

Using this age data,

45 19 23 10 16 21 25 17 21 18 15 18 21 13 16 23 21
24 18 19 26 20 21 19 20 25 26 20 23 8 23 18 24 16
30 24 15 22 27 20
What is the modal age?

• In a given dataset, a variable may have
more than one mode (bimodal
distribution)
– distribution with two different peaks
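
One way to find the modal age, sketched in Python with the standard library. multimode is used here, as an illustrative choice, because it returns every value tied for the highest frequency, so a bimodal variable would give two entries.

```python
from statistics import multimode

# Age data (years) copied from the slide; n = 40.
ages = [45, 19, 23, 10, 16, 21, 25, 17, 21, 18, 15, 18, 21, 13, 16, 23, 21,
        24, 18, 19, 26, 20, 21, 19, 20, 25, 26, 20, 23, 8, 23, 18, 24, 16,
        30, 24, 15, 22, 27, 20]

# List of the most frequently occurring value(s).
modes = multimode(ages)
print(modes)
```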

31
Median
• The median is the value found at the exact
middle of the set of values
• Arrange the observations in order of magnitude;
the median is the middle observation
• The median divides the set of observations into
two equal parts such that the number of
values equal to or greater than the median is
equal to the number of values less than or
equal to the median
Consider the following age data:
45 19 23 10 16 21 25 17 21 18 15 18 21 13 16 23 21 24 18 19 26 20 21 19
20 25 26 20 23 8 23 18 24 16 30 24 15 22 27 20

– With 40 observations, the median is the average of the
20th and 21st ordered values: (20 + 21)/2 = 20.5
– The median age of the 40 individuals is 20.5
years
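
The procedure (sort, then take the middle value or the average of the two middle values) can be written out as a short Python sketch. The function name median_of is illustrative; the standard library's statistics.median gives the same result.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    middle = n // 2
    if n % 2 == 1:
        return data[middle]
    return (data[middle - 1] + data[middle]) / 2

# Age data (years) copied from the slide; n = 40.
ages = [45, 19, 23, 10, 16, 21, 25, 17, 21, 18, 15, 18, 21, 13, 16, 23, 21,
        24, 18, 19, 26, 20, 21, 19, 20, 25, 26, 20, 23, 8, 23, 18, 24, 16,
        30, 24, 15, 22, 27, 20]

median_age = median_of(ages)
print(median_age)
```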

32
The Arithmetic Mean
• The arithmetic mean – is the most popular
measure of central tendency
• Calculation of mean requires numerical data
• For a given variable, the mean is obtained by
1. Adding all values in the sample
2. Dividing the sum by the number of
observations in the sample (sample size)
• For a given set of data and variable, there is
only one mean

33
Arithmetic mean- computation

• Calculating the arithmetic mean

• Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

– where n is the sample size and $x_i$ is the i-th
observation of the variable.
• The mean is affected by outliers (extreme but
legitimate values)

34
Arithmetic mean -Example
For this set of observations in a sample, calculate the mean.

45 19 23 10 16 21 25 17 21 18 15 18 21 13 16 23 21 24 18 19
26 20 21 19 20 25 26 20 23 8 23 18 24 16 30 24 15 22 27 20

Sample size (n)=40

Step 1: First obtain a sum:
45 + 19 + 23 + 10 + 16 + 21 + 25 + 17 + 21 + 18 + 15 + 18 + 21 + 13 + 16 + 23 +
21 + 24 + 18 + 19 + 26 + 20 + 21 + 19 + 20 + 25 + 26 + 20 + 23 + 8 + 23 + 18 + 24
+ 16 + 30 + 24 + 15 + 22 + 27 + 20 = 830

Step 2: Divide the sum by the sample size (n)

Thus, arithmetic mean = sum/n = 830/40 = 20.75

Note: The arithmetic mean is affected by outliers.
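
The two steps translate directly into code. A minimal Python sketch using the same 40 ages:

```python
# Age data (years) copied from the slide; n = 40.
ages = [45, 19, 23, 10, 16, 21, 25, 17, 21, 18, 15, 18, 21, 13, 16, 23, 21,
        24, 18, 19, 26, 20, 21, 19, 20, 25, 26, 20, 23, 8, 23, 18, 24, 16,
        30, 24, 15, 22, 27, 20]

total = sum(ages)            # Step 1: add all values in the sample
n = len(ages)                # sample size
arithmetic_mean = total / n  # Step 2: divide the sum by the sample size
print(total, n, arithmetic_mean)
```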

35
Summary of the Age data of 40
individuals:

• Median=20.5 years,
• Modal age=21 years
• Mean=20.75 years

• Based on these measures, what can
you conclude about the shape of the
distribution of age in this sample?

36
Using a graph to see the distribution helps
to identify key features such as the presence of
outliers

[Figure: histogram of age with the outliers marked. Horizontal axis: Age (years), 8 to 44; vertical axis: Percent (0 to 25).]

37
Outliers & Arithmetic Mean
• Outliers fall outside the general pattern of the
distribution.
• The value of the arithmetic mean is sensitive
to/affected by outliers

• If the data contains outliers:

1. Find out if there is a problem with data entry.
2. Assess the extent to which the outlier
affects the results: for example calculate the
mean with and without the outlier
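
The second check can be sketched in Python as below. Treating the largest age (45) as the suspect value is an assumption made for this illustration; the slides do not say which observations were flagged.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their number."""
    return sum(values) / len(values)

# Age data (years) copied from the slides; n = 40.
ages = [45, 19, 23, 10, 16, 21, 25, 17, 21, 18, 15, 18, 21, 13, 16, 23, 21,
        24, 18, 19, 26, 20, 21, 19, 20, 25, 26, 20, 23, 8, 23, 18, 24, 16,
        30, 24, 15, 22, 27, 20]

suspect = max(ages)            # assumed outlier for this sketch
without = list(ages)
without.remove(suspect)        # drop one occurrence of the suspect value

mean_with = mean(ages)
mean_without = mean(without)
print(f"with: {mean_with:.2f}  without: {mean_without:.2f}")
```

A large gap between the two means suggests the outlier is driving the result and that the median may describe the centre better.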

38
The Geometric Mean
• For a variable X with observations $x_i$, the
geometric mean of a set of n observations is
equal to the nth root of the product of the
n observations.
• For a given data set, the geometric mean is less
than or equal to the arithmetic mean.

Geometric mean = $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$

• Do not calculate the geometric mean if you
have a zero or negative value in your data
set.
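
A minimal Python sketch of the geometric mean. It averages the logarithms and exponentiates the result, which equals the nth root of the product but avoids overflow for long data sets; this log-based form is a choice made for this sketch rather than a method stated on the slides. The small data set is hypothetical.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values, computed on the log scale."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is undefined for zero or negative values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical, strongly right-skewed values for illustration only.
titers = [1, 10, 100]
gm = geometric_mean(titers)
am = sum(titers) / len(titers)
print(f"geometric mean: {gm:.1f}  arithmetic mean: {am:.1f}")
```

The function rejects zero or negative values because their logarithms are undefined.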
39
Application of Geometric Mean
• The geometric mean is a useful measure of
central tendency for highly skewed data e.g.
gametocyte density, antibody titers;
• Geometric mean is used to describe the central
tendency of data which is basically skewed BUT
normally distributed on a log-scale.
• Compared to the arithmetic mean, the geometric
mean is less affected by extreme values.
• The GM is appropriate for describing
proportional growth, e.g. annual population
growth, average rate of return on investment
over a period of time.
40
Choice of appropriate measure
of central tendency
• The arithmetic mean is calculated for
numerical data and for symmetric or ≈
symmetric distributions
• The median is suitable for ordinal data or
numerical data if the distribution is skewed
• The mode is used to describe bimodal
distributions (esp. in disease-age/time
distributions) e.g. seasonal distribution
of malaria
• For a symmetric distribution, the mean ≈
median ≈ mode
41
MEASURES OF VARIATION

43
Assessing Variation/Dispersion
in Data
• Variation of data is also commonly referred to as
dispersion.
• Dispersion refers to the spread of the values around the
central point.
• The starting point when assessing dispersion in a set of
data is to use visualisation tools such as the Histogram,
the Box Plot, Symmetry Plot, and Quantile plot.
• These tools enable the researcher to make a qualitative
description of the extent of variation observed in the
data.
• Data variation is often quantified using measures such
as the range, percentiles, inter-quartile range, the
standard deviation, coefficient of variation, and
standard error.

44
Tools for Visualising distribution of continuous
data: Histogram

[Figure: histogram. Horizontal axis: Total Cholesterol, mg/dl (100 to 500); vertical axis: Percent of the sample (0 to 20). Starts at 135 mg/dl; class width is 21.93 mg/dl.]

Observe the spread of the data.
The graph (histogram) helps to see the spread
of the data.

45
Alternative methods for viewing
the distribution of continuous
data
• Although the histogram is a popular tool for
visualising the distribution of
continuous data, histograms are sensitive to
the number of bins used in their construction.
• They can be inaccurate in informing
researchers about the nature of the distribution
of data.
• Other approaches to understanding the
distribution of continuous data include the box
plot, symmetry plot, quantile plot, etc.

46
Tools for Visualising variation:
the Box Plot

[Figure: Box plot showing the distribution of total cholesterol. Vertical axis: totchol (mg/dl), 100 to 500.]

47
Box Plot - visualising variation in
age

[Figure: Box plot showing the distribution of age (n=40). Vertical axis: Age (years), 10 to 50.]

48
Box plot…
• Standardized way of displaying data.
• Based on five number summary;
1. Minimum
2. First quartile (Q1)
3. Median
4. Third quartile (Q3)
5. Maximum

49
How to draw a Box Plot
1. Sort the data from minimum to maximum
2. Determine the Min, Q1, Median, Q3, Maximum
3. Determine the IQR (i.e. Q3-Q1) and the value of IQR*1.5
4. Obtain the values of Q1-IQR*1.5, and Q3+IQR*1.5
5. Draw and Label a vertical line that includes the range of
the distribution
6. Draw a central box from Q1 to Q3
7. Draw a horizontal line for the median inside the box
8. Extend vertical lines (whiskers) from the box (at Q1 and
at Q3) out to the lowest and highest data values falling
within the general distribution (i.e. not outliers). Each
whisker is therefore at most 1.5 times the IQR long.
Determining Q1 & Q3

• These are known as the lower and upper quartiles

1. Arrange the data set in numerical order
2. Split the data at the median into a lower half and an
upper half; if the data set has an odd number of
observations, leave the median out of both halves.
3. Q1 is the median of the lower half and Q3 is the median
of the upper half; if a half has an even number of
values, take the average of its middle two values.
• Consider the data set {1, 3, 4, 7, 8, 9, 10, 12, 14}.
• Arrange the data in numerical order: {1, 3, 4, 7,
8, 9, 10, 12, 14}.
• The median is 8, so the lower half is {1, 3, 4, 7} and
Q1 = (3+4)/2 = 3.5.
• The upper half is {9, 10, 12, 14}, so Q3 = (10+12)/2 = 11.
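
The rule for Q1 and Q3 can be sketched in Python as below. The sketch assumes the "median of each half" rule, with the overall median left out of both halves when n is odd; statistical packages use several different quartile rules and may return slightly different values. The function name quartiles is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd number of observations the overall median is left out.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)

q1, q3 = quartiles([1, 3, 4, 7, 8, 9, 10, 12, 14])
print(q1, q3, q3 - q1)  # Q1, Q3 and the inter-quartile range
```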
51
The Box Plot

[Figure: box plot for displaying the distribution of quantitative data.]

Upper inner fence = Q3 + (1.5 × IQR)
Lower inner fence = Q1 - (1.5 × IQR)

Any data values that fall outside the fences are outliers.
Source: https://en.wikipedia.org/wiki/Boxplot
52

53
Box Plot: Location of fences
• When the calculated value of Upper
fence is greater than the maximum
observation in the data, the fence will
be located at the observed maximum
value.

• When the calculated value of the Lower

fence is less than the minimum value in
the data, the fence will be located at
the observed minimum value.
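
The fence rule can be illustrated on the 40 ages used earlier in the lecture. This Python sketch takes Q1 = 18 and Q3 = 23.5 for those ages, which is what the "median of each half" quartile rule gives; a different quartile rule could shift the fences slightly. The function names are hypothetical.

```python
def inner_fences(q1, q3):
    """Lower and upper inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def split_by_fences(values, q1, q3):
    """Return (outliers, whisker_ends) for the given quartiles."""
    lower, upper = inner_fences(q1, q3)
    outliers = sorted(v for v in values if v < lower or v > upper)
    inside = [v for v in values if lower <= v <= upper]
    # Whiskers stop at the most extreme observations that are not outliers.
    return outliers, (min(inside), max(inside))

# Age data (years) copied from the slides; n = 40.
ages = [45, 19, 23, 10, 16, 21, 25, 17, 21, 18, 15, 18, 21, 13, 16, 23, 21,
        24, 18, 19, 26, 20, 21, 19, 20, 25, 26, 20, 23, 8, 23, 18, 24, 16,
        30, 24, 15, 22, 27, 20]

# Q1 and Q3 assumed from the median-of-halves rule.
lower_fence, upper_fence = inner_fences(18, 23.5)
outliers, whisker_ends = split_by_fences(ages, 18, 23.5)
print(lower_fence, upper_fence, outliers, whisker_ends)
```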
Box Plot showing the distribution of
cholesterol.

[Figure: box plot. Vertical axis: totchol (mg/dl), 100 to 500.]

Observe the location of the median relative to
Q1 and Q3.

Data values which fall outside the fences are called
outliers.
55
Box Plot - distribution of age

[Figure: box plot. Horizontal axis: Age (years), 8 to 44.]

Using this box plot, estimate
the median, Q1 and Q3.

Data values which fall outside the fences are called
outliers.
56
Tools for Visualising variation: the
Symmetry Plot
• Also known as a normal probability plot or a
normal plot.
• Assess whether a data set follows a normal
distribution.
• Scatter plot of the data against a theoretical
normal distribution.
• If the data is symmetric and approximately
bell-shaped, the points on the plot will form a
roughly straight line.
• If the data deviates significantly from
normality, the points on the plot will deviate
from a straight line.
57
Tools for Visualising variation: the
Symmetry Plot

[Figure: symmetry plot showing the distribution of data around the median. Horizontal axis: Distance below median (0 to 100); vertical axis: Distance above median (0 to 250).]

If data is symmetrically distributed, all plotted values
will lie along the reference line.
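
The points of a symmetry plot can be computed without any plotting library. The Python sketch below uses one common construction, assumed here rather than taken from the slides: the i-th smallest value is paired with the i-th largest, and each pair is plotted as (distance below median, distance above median). The function name is illustrative.

```python
from statistics import median

def symmetry_points(values):
    """Pair the i-th smallest and i-th largest values as distances from the median."""
    data = sorted(values)
    m = median(data)
    n = len(data)
    return [(m - data[i], data[n - 1 - i] - m) for i in range(n // 2)]

# Age data (years) copied from the slides; n = 40.
ages = [45, 19, 23, 10, 16, 21, 25, 17, 21, 18, 15, 18, 21, 13, 16, 23, 21,
        24, 18, 19, 26, 20, 21, 19, 20, 25, 26, 20, 23, 8, 23, 18, 24, 16,
        30, 24, 15, 22, 27, 20]

points = symmetry_points(ages)
for below, above in points[:3]:
    print(below, above)
```

Points lying above the line "above = below" indicate that the upper tail stretches further from the median than the lower tail, i.e. skewness to the right.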
58
Quantifying variation: The
Range
• Range
• The range is the highest value (maximum) minus
the lowest value (minimum)
• In practice the lowest and the highest values in the
data are reported

• Inter-quartile range
• Is the difference between the 3rd quartile
(75th percentile) and the 1st quartile (25th
percentile)
• The inter-quartile range contains the central
50% of the observations
59
Quantifying variation: Standard
Deviation
• Standard deviation is a measure of the
spread of observations about their mean
• It is a measure of how much on average each
of the values in the distribution deviates
from the mean
• Standard deviation is an essential part of
many statistical tests
• The value of the standard deviation is
affected by outliers

60
Calculating the Standard
Deviation
1. Calculate the arithmetic mean
2. Calculate the difference between each
observation in the data set and the mean,
and square it
3. Obtain the sum of the squared deviations
4. Divide the sum of the squared deviations by
n-1 (the number of observations in the sample
minus one); this gives the variance
5. Take the square root of the variance to obtain
the standard deviation
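
The calculation can be followed step by step in code. A minimal Python sketch using the 40 ages from earlier in the lecture; the function name sample_variance is illustrative, and the standard library's statistics.stdev performs the same n-1 calculation.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                                  # arithmetic mean
    squared_deviations = [(x - mean) ** 2 for x in values]  # square each deviation
    return sum(squared_deviations) / (n - 1)                # sum, then divide by n - 1

# Age data (years) copied from the slides; n = 40.
ages = [45, 19, 23, 10, 16, 21, 25, 17, 21, 18, 15, 18, 21, 13, 16, 23, 21,
        24, 18, 19, 26, 20, 21, 19, 20, 25, 26, 20, 23, 8, 23, 18, 24, 16,
        30, 24, 15, 22, 27, 20]

variance = sample_variance(ages)
standard_deviation = math.sqrt(variance)  # the SD is the square root of the variance
print(f"variance: {variance:.2f}  SD: {standard_deviation:.2f}")
```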

61
Computation of variance and standard
deviation

Id_number   age: xi   (xi-mean)   (xi-mean)^2
UG001       45        24.25       588.0625
UG002       19        -1.75       3.0625
UG003       23
UG004       10
UG005       16
UG006       21
UG007       25
UG008       17
UG009       21
UG010       18
UG011       15
UG012       18
UG013       21
UG014       13
UG015       16
UG016       23
UG017       21
UG018       24
UG019       18
UG020       19
UG021       26
UG022       20
UG023       21        0.25        0.0625
UG024       19        -1.75       3.0625
UG025       20        -0.75       0.5625
UG026       25        4.25        18.0625
UG027       26
UG028       20
UG029       23
UG030       8
UG031       23
UG032       18
UG033       24
UG034       16
UG035       30
UG036       24
UG037       15
UG038       22
UG039       27
UG040       20
sum         830
mean
Variance=
Standard deviation=square-root of variance=
62
Median= 20.5, Mean=20.75,
Relationship between standard
deviation, the mean and
distribution of observations
• If the distribution of observations of a given
variable is approx normal:
– Approximately 68% of the observations in
the sample fall within one standard deviation
of the mean (Mean±1SD)
– Approximately 95% of the observations
in the sample fall within two standard
deviations of the mean (Mean±2SD)
– Approximately 99.7% of the observations in
the sample fall within three standard
deviations of the mean (Mean±3SD)
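
These percentages come from the normal distribution itself and can be checked numerically. The Python sketch below uses the standard identity that the share of a normal distribution within k standard deviations of the mean is erf(k/√2); the function name is illustrative.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Mean ± {k} SD: {100 * share_within(k):.1f}%")
```

For real data the observed shares will only be close to these values if the distribution is approximately normal.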

63
64
Summary of Cholesterol Data
• Sample size: 250 persons
• Minimum: 135 mg/dl
• Median: 237 mg/dl
• Mean (Average): 236.3 mg/dl
• Maximum: 464 mg/dl
• Standard deviation: 42.6 mg/dl

• Question. Based on the values of the mean

and median cholesterol, what can you say
about the distribution of cholesterol in this
sample?
65
Coefficient of Variation (CV)
• It is the ratio of the standard deviation to the
mean

• The formula is $CV = \frac{SD}{\bar{x}}$

• where SD = standard deviation and $\bar{x}$ is the mean

• CV can be used to compare variation
between data sets
• Mainly applied in laboratory testing
and quality control procedures
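
As a worked illustration, the CV of the cholesterol data can be computed from the summary figures given in the lecture (standard deviation 42.6, mean 236.3). This is a Python sketch; expressing the ratio as a percentage is a common convention assumed here, not something stated on the slide.

```python
def coefficient_of_variation(sd, mean):
    """Ratio of the standard deviation to the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean

# Summary figures for total cholesterol from the lecture: SD 42.6, mean 236.3.
cv = coefficient_of_variation(42.6, 236.3)
print(f"CV = {cv:.3f} ({100 * cv:.1f}%)")
```

Because the units cancel, the CV allows variables measured on different scales to be compared.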
66
Standard Error (SE)
• SE is used to assess how closely sample
estimates (like the sample mean) relate to
the population parameter (population mean)

• Used in computation of confidence intervals
and testing statistical significance

• (more on standard error later)

67
Choice of measures of
dispersion
• Standard deviation is appropriate when the mean is
used to describe central tendency (symmetric data)
• The inter-quartile range is used to describe the
central 50% of a distribution, regardless of its shape
• The percentile may also be used when the mean is
used but the objective is to compare a set of
observations with the norm
• The range is used with numerical data when the
purpose is to emphasize extreme values
• Percentiles and inter-quartile range are used when
the median is used (skewed data)
• The coefficient of variation is used when the intent is
to compare distributions of variables measured on
different scales
68
Take home assignment
• Using dummy data from the research questions that
were pitched in class, provide a summary of the data
collected as follows:
1. Summarize and present data on the socio-demographics of
the study population in percentages.
2. Present 4 separate variables of continuous data as:
a) Box plots
b) Symmetry lines.
3. Comment on the spread and symmetry of your data
findings
Please work in your groups to have a PowerPoint
presentation ready for a 5-minute presentation in
our next class
