1B. Topic 1 - Introduction To Statistics - 16 - 04 - 2009


INTRODUCTION TO BIOSTATISTICS

Lawrence Muthami
MB.CHB Level II
Basic Concept I: What is data?
• Indeed the world is becoming quantitative. Large and
rich datasets are collected for government and the
various business communities.
• Data are not just numbers, but numbers that carry
information about a specific situation.
• This valuable information is used to take important
decisions and is communicated to its intended users.
• In general, by data we mean a collection of numerical
values that has structure that we are willing to
investigate
What is statistics?
• “Statistics... the most important science in the whole world:
for upon it depends the practical application of every other
science and of every art; the one science essential to all
political and social administration, all education, all
organization based upon experience, for it only gives the
results of our experience.” - Florence Nightingale (1820 – 1910)
• The Random House College Dictionary defines statistics
as “the science that deals with the collection, classification,
analysis, and interpretation of information or data”.
• Statistics is the art and science of learning from data.
What is Statistics?
• A natural question is what to do with this data and how
to read anything from it.
• To answer this basic question, one needs to learn the art of
applied statistics.
• Applied statistics is defined as the art of interrogating and
reading a dataset critically.
• Statistics is fundamentally about designing an experiment that
would produce good data; collecting, handling and analyzing this
data to discover structures, patterns, and trends; interpreting
the results of the analysis within its specific settings and
boundaries; and eventually drawing practical conclusions.
The three main aspects of statistics
1) Design: designing the process of data collection (identify the population,
what kind of data and how much is needed, and how to collect a sample).
2) Description: the methods of summarizing/describing data.
3) Inference: making decisions and predictions based on the data.
Remember : we observe samples but we are interested in populations.
• Is Statistics important?
– Market research.
– Public health research.
– Design and analysis of experiments.
– Making decisions in an uncertain world.
– Cool applications: forensic statistics, medical statistics,
environmental statistics, ...
Statistics
• We also need statistics because of the
variation in measurements and in
manufacturing processes.
• Do you think every medical treatment has
exactly the same effect on every patient?
• With statistics we’ve all come to realize the
importance of planned experiments along
with careful measurements.
Basic Concepts - II
• Sources of Data
– Routinely kept medical records
– Surveys: KDHS, KAIS, MIS
– Experiments / Research Studies
• Biostatistics
– Statistics applied to biological sciences and medicine
– Statistics including not only analytic techniques but also
study design issues
• Variables
– a characteristic that can take on different values for
different persons, places or things
– Statistical analyses need variability; otherwise there is
nothing to study
Biostatistics in Public Health Research
• Methodological Research:
– New Statistical technique
– High speed computing
– Geographical patterns of diseases
– Clinical Trials
– Longitudinal Analysis
– Data Analysis in Epidemiological studies
Biostatistics in Public Health Research (Cont.)
• Collaborative Research
– What is the objective of the research
– What is the main hypothesis
– What is the target population
– How to draw a representative sample
– How many people to sample
– How to take measurements
Basic Concepts - III
• Types of Variables
– Qualitative
– Quantitative
– Random
– Discrete random
– Continuous random
• Population
– Sample
– Statistical analysis infers from a sample the
characteristics of the population
Measurements and Measurement Scales
• Measurement
– assignment of representative numbers to objects
or events according to a set of rules
• Measurement Scales
– Nominal: - Yes/No
– Ordinal: - Income (low/medium/high)
– Interval: - Degrees (Fahrenheit, Celsius)
– Ratio : - Height, weight
Basic concepts and Terminology
• Population: The entire collection of subjects/items
under investigation.
• A population parameter is a numerical quantity that
describes a characteristic of a population.
• A population parameter can be considered to be a
constant number but its true value can be known if and
only if the outcome for every subject/item in the
population is recorded.
• A population parameter can be considered random
at times.
• The role of applied statistics is to provide ways to know
more about a population parameter of interest.
Basic concepts and Terminology
• A sample is an arbitrary subset of the population selected
for study in order to gain more information about the entire
population.
• Sample implies an idea of partial collection.
• Each individual/item in the sample will be interrogated and
their outcome will be recorded.
• Information collected on a sample is often used to draw
conclusions or infer about the population parameter.
• It is therefore crucial that the sample be like the population
in every aspect (except that the sample would be of smaller
size).
• A sample needs to be a good representative of the larger
population under investigation. Otherwise conclusions
inferred on the population parameter cannot be valid.
Random Sample
• Reason
– sample a ‘small’ number of subjects from a population to
make inference about the population
– Essence of statistical inference
• Definition
– A sample of size n drawn from a population of size N in such a
way that every possible sample of size n has the same chance
of being selected
• Sampling with and without replacement
– In biostatistics, most sampling done without replacement
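The two sampling schemes above can be sketched in a few lines of Python. This is only an illustration: the population of 100 patient IDs, the function names and the seed are all assumptions made for the example, not part of the lecture.

```python
import random

def sample_without_replacement(population, n, seed=None):
    """Simple random sample: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def sample_with_replacement(population, n, seed=None):
    """Each draw is made from the full population, so repeats are possible."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)

# Hypothetical population of N = 100 patient IDs.
patient_ids = list(range(1, 101))

without = sample_without_replacement(patient_ids, 10, seed=1)
with_repl = sample_with_replacement(patient_ids, 10, seed=1)

print("Without replacement:", without)
print("With replacement:   ", with_repl)
```

Without replacement, no patient can appear twice in the sample, which is why it is the usual choice when each subject is examined once.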
Why do we focus on sample?
• Limited time, money and human resources, large and
inaccessible population, economic advantage, destructive
nature of observations dictate that using a sample is the
preferred way to go to know more about the population
under study.
• Indeed, SAMPLING is always a practical necessity, that is,
one has to rely on a sample to draw conclusion on a
population parameter.
• It avoids having to examine every single subject/item in
the population.
– Can you imagine a survey about the 2012 presidential election
that would take more than a year to complete? The presidential
election would be over by then!
– We want information on current unemployment and public
opinion next week, not next year.
Why do we focus on sample?
• Facts:
• Attempting to count every last item in a pharmaceutical store
could be inaccurate. Bored people tend not to count carefully.
• A carefully conducted opinion poll can produce more accurate
information than a census.
• Accountants, for example, sample a firm’s inventory to verify the
accuracy of the records.
• It is tempting to simply base our conclusion on our experience
without making any use of systematic data. We may rely on
anecdotes rather than data.
• We often recall an unusual incident that sticks in our memory
exactly because it is unusual.
• We remember an airplane crash that killed several hundred
people, and therefore we refuse to board a plane.
• However, according to government records, flying is much safer than driving.
Why do we focus on sample?
(Cont.)
• New theories in the social, biological, medical and
physical sciences rely on statistical evidence.
Sometimes, the population is not well-defined for
these studies.
• Selecting a sample is an important component in
making accurate and valid statistical claims.
Basic concepts and terminology
• A sample statistic, or simply a statistic, is a numerical descriptive
measure calculated from the observed outcomes in the sample.
• Examples: the sample mean (X-bar) and the sample standard deviation (sd).
• It is important to note that a parameter is a term that refers to a
population quantity whereas a statistic refers to a sample.
• Examples of parameters: the population mean μ and the population
standard deviation σ.
• The value of a statistic depends on the particular sample selected
from the population. In other words, a statistic changes its value
each time a new sample (of size n) is selected.
• Therefore there is always a potential risk that the sample results
will be very different from the population parameter under
investigation. Hence, it is important to quantify how likely it is
that a sample result will be far from the population parameter. This
is where probability comes into play.
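A small simulation can make the difference between a parameter and a statistic concrete. The sketch below assumes a made-up population of 10,000 systolic blood pressure values (mean about 120 mmHg, standard deviation about 15 mmHg); these numbers and all variable names are illustrative assumptions only.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 systolic blood pressure values (mmHg).
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Parameters: computed from the whole population, so they are fixed numbers.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def sample_statistics(n):
    """Draw a simple random sample of size n and return (x_bar, sd)."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample), statistics.stdev(sample)

# Statistics: they change every time a new sample of size n = 50 is drawn.
results = [sample_statistics(50) for _ in range(5)]

print(f"Population: mu = {mu:.1f}, sigma = {sigma:.1f}")
for i, (x_bar, sd) in enumerate(results, start=1):
    print(f"Sample {i}: x_bar = {x_bar:.1f}, sd = {sd:.1f}")
```

Each run of `sample_statistics` gives a slightly different X-bar and sd, while μ and σ stay the same.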
Basic concepts and terminology
• The theory of probability then comes in to tell us that,
after many repeats of an identical experiment, one would
discover structures and/or trends that are highly probable.
• Probability will give a statement about how confident
we can be in the claim provided by the sample data
analysis (that is, how confident we can be that the
answer is correct).
• The process that generalizes the result obtained from a
sample to the entire population under study is called
inferential statistics.
• An observed effect so large that it would rarely occur
by chance is said to be statistically significant.
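As a rough illustration of "would rarely occur by chance", the sketch below computes an exact binomial tail probability. The scenario is an assumption made for the example: 530 of 1,000 respondents choose one of two options, and we ask how often a result at least this large would appear if the population were really split 50/50. The function name is hypothetical.

```python
from math import comb

def upper_tail_probability(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumed scenario: 530 "successes" out of 1,000 under a 50/50 population.
p_value = upper_tail_probability(530, 1000, 0.5)
print(f"Chance of 530 or more out of 1,000 under a 50/50 split: {p_value:.3f}")
```

A small probability here means the observed effect would rarely occur by chance alone. Where to draw the line (0.05 is a common convention) is a choice made by the analyst, not something the calculation decides.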
Sampling
• Interviewing all Kenyans would be impractical.
• Select a random sample.
• Sampling: 1,000 adults, 18 years and older, were randomly
selected and interviewed on the telephone.
• Random selection means that this group of individuals
represents the population of Kenyan registered voters.
Data Description
1. Summary: 530 said ODM, 470 said PNU.
It is better to compare relative frequencies:
53% = 530/1,000 x 100 and 47% = 470/1,000 x 100
These figures describe the sample preferences.
2. How can we deduce that “53% of Kenyans say that
the ODM better represents their values,
compared to 47% who say that of the PNU”?
3. Use statistical inference.
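The relative frequencies in the summary can be reproduced with a short calculation. The counts come from the slide; the variable names are illustrative.

```python
# Counts reported in the sample summary.
counts = {"ODM": 530, "PNU": 470}

# Sample size is the total number of respondents.
n = sum(counts.values())

# Relative frequency (in percent) = count / n x 100.
relative_freq = {party: 100 * c / n for party, c in counts.items()}

for party, pct in relative_freq.items():
    print(f"{party}: {counts[party]} of {n} = {pct:.0f}%")
```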
Data Collection
Which party do you think better represents your interests?

DP, ODM, PNU, ODM-K, SHIRIKISHO, SISI KWA SISI
Inference
The result inferred from the sample is uncertain.
We measure the uncertainty using probability.
The poll’s declaration:
For results based on this sample, one can say with 95
percent confidence that the maximum error attributable
to sampling and other random effects is plus or minus 3
percentage points.
In addition to sampling error, question wording and
practical difficulties in conducting surveys can introduce
error or bias into the findings of public opinion polls.
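The "plus or minus 3 percentage points" figure can be checked approximately. The sketch below assumes a simple random sample and uses the usual normal-approximation formula for a proportion, z x sqrt(p(1 - p)/n) with z = 1.96 for 95 percent confidence. Whether the pollster used exactly this formula is not stated in the slides, and the function name is hypothetical.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

n = 1000       # sample size from the poll
p_hat = 0.53   # sample proportion for ODM

moe = margin_of_error(p_hat, n)
worst_case = margin_of_error(0.5, n)  # p = 0.5 gives the largest possible margin

low, high = p_hat - moe, p_hat + moe

print(f"Margin of error: +/- {100 * moe:.1f} percentage points")
print(f"Worst case (p = 0.5): +/- {100 * worst_case:.1f} percentage points")
print(f"Approximate 95% interval: {100 * low:.1f}% to {100 * high:.1f}%")
```

With n = 1,000 both margins come out at about 3 percentage points, which agrees with the poll's declaration. This covers sampling error only; it says nothing about bias from question wording or non-response.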
What is data?
• Pick up any newspaper, watch any TV news channel or read any
online news site, and you will find a collection of numbers
summarizing some data, with a sentence telling you what to do
with this information. You could also look at a patient register
and find out how patient characteristics are recorded.
• What do these summarized numbers represent? What
formulae are used to compute these numbers? What justifies
the use of the formulae?
• Even if one knew the answers to the questions above, another
problem is: “Upon reading these summarized numbers we
draw a conclusion without really knowing how to read and
what to take away from this summarized information.”
Joke on control trials
A Biostatistician’s wife had twins. He was
delighted. He rang the priest who was also
delighted. “Bring them to church on Sunday
and we’ll baptize them” said the priest. “No,”
replied the Biostatistician. “Baptize one. We’ll
keep the other as a control.”