Chapter 1 Presentation

The document provides an overview of biostatistics, emphasizing its role in analyzing biological and medical data through management, descriptive, and inferential statistics. It discusses the importance of sampling methods, data collection techniques, and the distinction between primary and secondary data. Additionally, it highlights the limitations of statistics and the need for careful interpretation to avoid misuse.

Chapter I: Basic Concepts in Biostatistics

 Statistics is the science of data analysis. It deals with
scientific methods of collecting, organizing and analyzing
data so as to make valid conclusions and reasonable
decisions on the basis of such analysis.

 It is helpful to think of the process of data analysis as
consisting of three stages: management, descriptive and
inferential.
1
 The tools of statistics are employed in many
fields.

 When the data being analyzed are derived
from the biological sciences and medicine,
we use the term biostatistics to distinguish
this particular application of statistical tools
and concepts.
 Biostatistics: The application of statistical
methods to the fields of biological and
medical sciences.
2
 Biostatistics is concerned with the interpretation of
biological data and the communication of information
derived from these data.

 The numbers must be presented in such a way that
valid interpretations are possible.

 Statistics are everywhere – just look at any
newspaper or the current medical and public health
literature.

3
 In applying statistics to a scientific, industrial,
societal or health problem, it is necessary to begin
with a population to be studied.

• A population is composed of all observations on the
entire group under our consideration.

• For practical reasons, a chosen subset of the
population, called a sample, is studied, as opposed
to compiling data about the entire group, an
operation called a census.

4
Once a sample of the population is
determined, data is collected for the sample
members.
This data can then be subjected to statistical
analysis, serving two related purposes:
description and inference.
 Descriptive statistics summarize the data
by describing what was observed in the sample
numerically or graphically.

5
 Inferential statistics uses patterns revealed through
analysis of sample data to draw inferences about the
population represented.

 For a sample to be used as a guide to an entire
population, it is important that it is truly
representative of that overall population.

 Appropriate and scientific sampling procedures
ensure that the inferences and conclusions can be
safely extended from the sample to the
population as a whole.

6
 The raw materials for any statistical analysis are the
data.
 Thus, the first thing we have to do is collect relevant
data using appropriate methods.
 Once the tedious task of collecting data is
completed, this collection of raw data in itself
reveals very little.
 It is extremely difficult to determine the true
meaning of a bunch of numbers that have simply
been recorded on a piece of paper.

7
 It remains for us to organize and describe these data in
a concise, meaningful manner.
 In order to determine their significance, we must display the
data in the form of tables, graphs and charts (so that we can
have a good overall picture of the data).

 Then, we have to analyze the data, i.e., we calculate
summary measures such as the mean and standard
deviation; assess the extent of relationship (correlation)
between two (or more) variables; and the like.

 Finally, based on the analysis, we have to make
generalizations and arrive at reasonable decisions.
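
A minimal Python sketch of the summary measures named above (mean, standard deviation, correlation). The weight and height values are hypothetical and only illustrate the calculation; `pearson_r` is a helper written for this sketch.

```python
import math
import statistics

# Hypothetical measurements for six subjects (illustrative values only).
weight_kg = [52.0, 61.0, 58.0, 70.0, 66.0, 75.0]
height_cm = [155.0, 163.0, 160.0, 172.0, 168.0, 178.0]

mean_weight = statistics.mean(weight_kg)
sd_weight = statistics.stdev(weight_kg)  # sample standard deviation (divides by n - 1)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(weight_kg, height_cm)
print(f"mean = {mean_weight:.2f} kg, sd = {sd_weight:.2f} kg, r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.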

8
Limitations of Statistics
• Although statistics is widely applied and has shown
its merit in planning, policy making, marketing
decisions, quality control, medical studies, etc., it has
some limitations:

• Statistical laws are not exact. They are probabilistic in
nature, and inferences based on them are only
approximate.

• Statistics is liable to be misused. It deals with figures
which are innocent by themselves, but which can be
easily distorted and manipulated.
9
Example:
 Information released from the President’s Office: the number
of minority students has increased from 10 to 20 in this
academic year; the student population of the university has
also increased from 1000 to 2500.
 Based on this information, a newspaper headline reads:
"Number of minority students doubled."
 By focusing on only one aspect of the data (the number of
minority students has increased from 10 to 20), the
newspaper ignored the other fact, that is, the student
population of the university has increased from 1000 to 2500
this year.
 The fact is that the percentage of minority students has
decreased: from 1% last year to 0.8% this year.
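
The arithmetic behind this example can be checked with a short Python sketch. The helper name `percentage` is introduced here for illustration; the figures are the ones quoted in the example.

```python
def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Figures quoted in the example.
last_year = percentage(10, 1000)  # minority students / all students, last year
this_year = percentage(20, 2500)  # same ratio, this year

count_ratio = 20 / 10             # what the headline reports as "doubled"

print(f"count ratio: {count_ratio:.1f}x")
print(f"share last year: {last_year:.1f}%")
print(f"share this year: {this_year:.1f}%")
```

The count doubles while the share falls, which is why reporting the count alone is misleading.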

10
Major steps in a statistical investigation

 Define the objectives and scope of the survey.
 Define the population and sampling units.
 Identify the proper sampling technique and collect
data.
 Organize the data to have a good overall picture of
the data.
 Analyze the data (calculate various statistics of
interest).
 Make conclusions/predictions based on the statistics
computed from the sample by applying mathematical
statistics and probability theory.
11
Methods of data collection
1. Personal interview: This method of data collection
may take two forms:
a) Face-to-face: This involves trained interviewers visiting the
desired people (respondents) in person to collect data.
 To determine whether workers in a given factory are satisfied
with their salary or not, for example, an investigator may
contact each worker and ask his/her opinion.
 The advantage of this method is that it ensures a high
response rate, and trained interviewers gather better quality
data.

12
b) Telephone: This involves trained interviewers phoning
people to collect data. This method is quicker and less
expensive than face-to-face interviewing.

 For example, to determine whether a new drug
product is favored by the public or not, a company
may randomly pick telephone numbers from the
telephone directory, call each of these numbers and
then ask the preference or opinion of the
respondents.

13
2. Self-completed (written questionnaire): In this
method, written questions are mailed or hand-delivered to
respondents.
(a) Mail survey: Here questionnaires are mailed to people and
mailed back by the respondents after completion.
– It is a relatively inexpensive method of collecting data: one
can distribute a large number of questionnaires in a short
time.
– This requires the questionnaire to be simple and
straightforward.
– A major disadvantage of a mail survey is that it usually has
lower response rates than other data collection methods.
– Also, people with limited ability to read or write
may experience problems.

14
(b) Hand-delivered questionnaire: This is a
self-enumerated survey where questionnaires are
hand-delivered to people and mailed back by the
respondents after completion.

 This method usually results in better response rates
than a mail survey, and is particularly suitable when
information is needed from several household
members.

 The hand-delivered with respondent mail-back
method can reduce the cost of collecting forms.
15
 The following is a list of some key points to think
about when designing your questionnaire:
– The introduction should be informative and stimulate
respondents’ interest:
 interviewers give the respondent their name and
provide identification;
 explain that a survey is being conducted;
 describe the survey's purpose;
 give the respondent time to read or be informed
about confidentiality issues; etc.

16
– The questions should read well and have a good flow.
– The words should be simple, direct and familiar to all
respondents.
– The questions should be clear and as specific as
possible.
– Questions should not be double-barreled.
• Example: Does your company provide training for
new employees and re-training for existing staff?
• This example is double-barreled as it asks two
questions rather than one.

17
– Questions should not be leading.
– If the questions are close-ended, the response
categories should be mutually exclusive and
exhaustive.
– Open-ended questions give respondents an
opportunity to answer the question in their own
words.
– Close-ended questions give respondents a choice
of answers and the respondent is supposed to
select one.
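
One way to see what "mutually exclusive and exhaustive" means for close-ended response categories is to check a set of age brackets in Python. This is only a sketch: the function `check_categories`, the inclusive integer ranges, and the age limits 18 to 65 are assumptions made for illustration.

```python
def check_categories(categories, lowest, highest):
    """Check inclusive integer ranges, e.g. [(18, 29), (30, 44)].

    Returns (mutually_exclusive, exhaustive) for answers between
    lowest and highest.
    """
    ordered = sorted(categories)
    exhaustive = ordered[0][0] <= lowest and ordered[-1][1] >= highest
    exclusive = True
    for (lo1, hi1), (lo2, hi2) in zip(ordered, ordered[1:]):
        if lo2 <= hi1:
            exclusive = False   # two categories overlap
        if lo2 > hi1 + 1:
            exhaustive = False  # some ages fall in no category
    return exclusive, exhaustive


good = [(18, 29), (30, 44), (45, 65)]
bad = [(18, 30), (30, 45), (50, 65)]  # 30 appears twice; 46-49 is missing

print(check_categories(good, 18, 65))
print(check_categories(bad, 18, 65))
```

With the second set, a 30-year-old could tick two boxes and a 47-year-old could tick none.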

18
Sources of Data
 Primary Data: Data for which an individual, agency or
organization controls the design and data collection
processes.
 Secondary Data: Data previously collected
by others for their own purposes.
 In this case, data are obtained from existing sources such as
newspapers, magazines, CSA, DHS, hospital records and existing
data such as:
 Mortality reports
 Morbidity reports
 Epidemic reports
 Reports of laboratory utilization (including laboratory test results)
19
– As a general rule, primary data sources are preferred
to secondary sources since the primary source
contains much pertinent information about:
– collection methods and
– limitations associated with the data.

– If the information is derived from a secondary
source, it is possible that the data might
have been altered for some reason.

20
Some Basic Terms in Biostatistics
 In collecting data concerning the characteristics of a group of
individuals or objects, it is often impossible or impractical (from
the point of view of time and cost) to observe the entire group.
 In such cases, instead of examining the entire group, called
the population, we examine only a small part of the population,
called a sample.

 A population is the set of all elements that belong to a certain
defined group.
 A sample is a part (or a subset) of the population.

 A numerical characteristic of a population is called a parameter.
 A numerical characteristic of a sample is called a statistic.
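
The difference between a parameter and a statistic can be sketched in Python. The population below is simulated (hypothetical blood pressure values), and the seed and sample size are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: systolic blood pressure for 1,000 people.
population = [rng.gauss(120, 15) for _ in range(1000)]

# A sample is a subset of the population (here 50 people, chosen at random).
sample = rng.sample(population, 50)

parameter = statistics.mean(population)  # population mean: a parameter
statistic = statistics.mean(sample)      # sample mean: a statistic

print(f"parameter (population mean): {parameter:.1f}")
print(f"statistic (sample mean):     {statistic:.1f}")
```

In practice the parameter is usually unknown, and the statistic computed from the sample is used to estimate it.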

21
Sampling Techniques
A census is a complete enumeration of the entire population.
 There are several reasons for taking a sample instead of a
complete enumeration of the whole population or census. These
include:
– A census may be very expensive.
– A census may require too much time.
– A carefully obtained sample may be more accurate than a
census.
 For example, in a large inventory census or in a study of the
prevalence of HIV among adolescents in Ethiopia, errors due to
fatigue or carelessness on the part of the census taker may
introduce a serious bias in the results.

22
 Broadly speaking, there are two types of
sampling techniques: random sampling and
non-random sampling.
 In random sampling, the elements to be
included in the sample entirely depend on
chance.
 Random sampling techniques often yield
samples that are representative of the
population from which they are drawn.
 In non-random sampling, the units in the
sample are chosen by the investigator based on
his/her personal convenience and beliefs.
23
Random Sampling Techniques
1. Simple Random Sampling
 This is a method of sampling in which every member
of the population has the same chance of being
included in the sample.
2. Systematic Random Sampling
 In some instances, the most practical way of
sampling is to select, say, every 20th name on a list,
every 12th house on one side of a street, every 50th
item coming off a production line, and so
on.
 This is called systematic sampling.
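
Both techniques on this slide can be sketched in Python. The list of 200 names is hypothetical, and the random starting point for the systematic sample is an assumption added here (a common convention, not something stated on the slide).

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: a list of 200 names.
frame = [f"person_{i:03d}" for i in range(1, 201)]

# Simple random sampling: 10 names drawn without replacement,
# so every name has the same chance of being included.
srs = rng.sample(frame, 10)

# Systematic sampling: every 20th name on the list,
# starting from a randomly chosen position in the first 20.
k = 20
start = rng.randrange(k)
systematic = frame[start::k]

print(srs)
print(systematic)
```

With 200 names and k = 20, the systematic sample always contains 10 names, whichever starting position is drawn.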
24
3. Stratified Random Sampling
 Stratified random sampling is the procedure of dividing the
population into relatively homogeneous groups, called strata,
and then taking a simple random sample from each stratum.
 If the population elements are homogeneous, then there is
no need to apply this technique.

 Example: If our interest is the income of households in a city,
then our strata may be:
– low income households
– middle income households
– high income households

25
 To obtain a sample from each stratum:

– Take a sample of size proportional to the sub-population
(stratum) size, i.e., draw a large sample
from a large stratum and a small sample from a
small sub-population.

– This is known as proportional allocation.
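
Proportional allocation can be illustrated with a small Python sketch. The stratum sizes and the total sample size of 200 are hypothetical, and `proportional_allocation` is a helper written for this illustration.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate total_sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    # Rounding can make the allocated total differ slightly from
    # total_sample when the shares are not whole numbers.
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}


# Hypothetical numbers of households in each income stratum.
strata = {"low income": 6000, "middle income": 3000, "high income": 1000}

allocation = proportional_allocation(strata, 200)
print(allocation)
# {'low income': 120, 'middle income': 60, 'high income': 20}
```

A simple random sample of the allocated size would then be drawn within each stratum.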

26
4. Cluster Sampling
 This is a method of sampling in which the total population is
divided into relatively small subdivisions, called clusters, and
then some of these clusters are randomly selected using
simple random sampling.

 Once the clusters are selected, one possibility is to use all the
elements in the selected clusters.

 However, this seems uneconomical. Instead, we
take a random sample of elements from each of
the selected clusters (called two-stage
sampling).

27
Example: Suppose we want to make a survey
of households in Addis Ababa.
 Collecting information on each and every
household is impractical from the point of view of
cost and time.
 What we do is divide the city into a number of
relatively small subdivisions, say, Kebeles. So the
Kebeles are our clusters.
 Then we randomly select, say, 20 Kebeles using
simple random sampling.

28
 To collect information about individual households,
we have two options:
a) We visit all households in these 20 Kebeles
b) We randomly select households from each of these
20 selected Kebeles using simple random sampling.
– This method is called two-stage sampling since
simple random sampling is applied twice (first,
to select a sample of Kebeles and second, to
select a sample of households from the
selected Kebeles)
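
Option (b) can be sketched in Python as follows. The number of Kebeles in the city (100), the number of households per Kebele, and the second-stage sample size of 10 are hypothetical values chosen for illustration; only the 20 selected Kebeles come from the example.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical city: 100 Kebeles, each with between 30 and 80 households.
city = {
    f"kebele_{k:03d}": [f"k{k:03d}_hh{h:03d}" for h in range(rng.randint(30, 80))]
    for k in range(1, 101)
}

# Stage 1: simple random sample of 20 Kebeles (the clusters).
chosen_kebeles = rng.sample(sorted(city), 20)

# Stage 2: simple random sample of 10 households within each chosen Kebele.
households = {kb: rng.sample(city[kb], 10) for kb in chosen_kebeles}

total = sum(len(hh) for hh in households.values())
print(f"{len(chosen_kebeles)} Kebeles, {total} households sampled")
```

Option (a) would skip the second stage and keep every household in the 20 chosen Kebeles.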

29
B. Non-random Sampling Techniques
 Convenience or Accidental sampling (members of
the population are chosen based on their relative
ease of access)
 Judgment or Purposive sampling – (The researcher
chooses the sample based on who he/she thinks
would be appropriate for the study)
– Purposive sampling starts with a purpose in mind and the
sample is thus selected to include people or objects of
interest and exclude those who do not suit the purpose.
Purposive sampling can be subject to bias and error.

30
– Case study (The research is limited to one
group, often with a similar characteristic or
of small size.)

– Ad hoc quotas (A quota is established and
researchers are free to choose any
respondent they wish as long as the quota is
met.)

– Snowball sampling (The first respondent
refers a friend. The friend also refers a
friend, etc.)
31
Comparison: Probability and non-probability
sampling

 Probability sampling (or random sampling) is a
sampling technique in which the probability of
getting any particular sample may be calculated.

 Non-probability sampling does not meet this
criterion and should be used with caution.

 Non-probability sampling techniques cannot be used
to infer from the sample to the general population.
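
As an illustration of a sample probability that "may be calculated": under simple random sampling without replacement, every possible sample of the same size is equally likely. The population size N = 10 and sample size n = 3 below are hypothetical.

```python
import math

N, n = 10, 3  # hypothetical population size and sample size

# Number of distinct samples of size n that can be drawn from N units.
num_samples = math.comb(N, n)

# Under simple random sampling each of these samples is equally likely.
prob_of_any_sample = 1 / num_samples

print(num_samples)                   # 120
print(f"{prob_of_any_sample:.4f}")   # 0.0083
```

No comparable calculation is available for a convenience or judgment sample, because the selection does not follow a known chance mechanism.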

32
• The difference between non-probability and
probability sampling is that non-probability
sampling does not involve random selection
and probability sampling does.

• Does that mean non-probability samples aren't
representative of the population? Not necessarily.
But it does mean that non-probability samples do
not depend upon the rationale of probability theory.

33
 With non-probability samples, we may or may
not represent the population well, and it will
often be hard for us to know how well we've
done so.

 In general, researchers prefer probabilistic or
random sampling methods over non-probabilistic
ones, and consider them to be
more accurate and rigorous.

34
Criteria for the acceptability of a
sampling method

Chance of Selection for Each Unit:


• The sample must be selected so that it
properly represents the population that is to
be covered.

• This means that each unit (farm, patient,
household, person, or whatever unit is being
sampled) must have a nonzero probability
(chance) of being selected.
35
Measurable Reliability:
• It should be possible to measure the
reliability of the estimates made from the
sample.

• That is, in addition to the desired estimates
of characteristics of the population (totals,
averages, percentages, etc.) the sample
should give measures of the precision of
these estimates.
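
One common measure of precision is the standard error of the sample mean, sketched below in Python. The haemoglobin readings are hypothetical, and the multiplier 1.96 is the usual normal approximation for a 95% interval (an assumption of this sketch; the slides do not name a specific method).

```python
import math
import statistics

# Hypothetical sample of 10 haemoglobin readings (g/dL).
sample = [13.1, 12.4, 14.0, 13.6, 12.9, 13.3, 14.2, 12.7, 13.8, 13.0]

n = len(sample)
estimate = statistics.mean(sample)                   # estimate of the population mean
std_error = statistics.stdev(sample) / math.sqrt(n)  # precision of that estimate

# Rough 95% interval using the normal multiplier 1.96; for a sample
# this small a t multiplier would give a somewhat wider interval.
lower = estimate - 1.96 * std_error
upper = estimate + 1.96 * std_error

print(f"estimate = {estimate:.2f}, standard error = {std_error:.2f}")
print(f"approximate 95% interval: ({lower:.2f}, {upper:.2f})")
```

A smaller standard error means the estimate is more precise; it shrinks as the sample size grows.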
36
Feasibility
 Another characteristic is that the sampling
plan must be practical.

 It must be sufficiently simple and
straightforward so that it can be carried out
substantially as planned.

 A plan for selecting a sample, no matter how
attractive it may appear on paper, is useful
only to the extent that it can be carried out in
practice.
37
Economy and Efficiency

• Finally, the design should be efficient.

• Among the various sampling methods that
meet the three criteria stated above, we
would naturally choose the method which,
to the best of our knowledge, produces the
most information at the smallest cost.

38
