Lesson 1 Introduction To Statistics
003
Quantitative Methods
Mr. Melvin Ledesma
Instructor
Introductory Lessons: Definitions
OBJECTIVES
Define statistics
Distinguish clearly between
◼ Descriptive and Inferential Statistics
◼ Surveys and Experiments
◼ Retrospective and Prospective Studies
◼ Descriptive and Analytical Surveys
Define bias
Describe a clinical trial
Basics of Statistics
Definition: Statistics is the science of the collection, presentation,
analysis, and reasonable interpretation of data.
Statistics provides a rigorous scientific method for gaining insight into
data. For example, suppose we measure the weight of 100 patients in
a study. With so many measurements, simply looking at the data fails
to provide an informative account. However, statistics can give an
instant overall picture of the data through graphical presentation or
numerical summarization, irrespective of the number of data points.
Besides data summarization, another important task of statistics is to
make inferences and predict relations among variables.
A Taxonomy of Statistics
The Uses of Statistics
Descriptive vs Inferential Statistics
Descriptive statistics – deals with the
enumeration, organization, and
graphical representation of data.
Inferential statistics – concerned with
reaching conclusions from incomplete
information—that is, generalizing from
the specific. It uses information taken
from a sample to say something about
an entire population.
Example of Descriptive Statistics
An example of descriptive statistics is
the decennial census of the USA,
where all residents are requested to
provide information such as age, sex,
race and marital status.
The collected data can be compiled
and arranged into tables and graphs
that DESCRIBE the characteristics of
the population at a given time.
Example of Inferential Statistics
An example of inferential statistics is an
opinion poll such as the Gallup Poll,
which attempts to draw inferences as to
the outcome of an election.
In such a poll, a sample of individuals
(frequently fewer than 2000) is selected,
their preferences are tabulated, and
inferences are made as to how more
than 80 million persons would vote if an
election were held that day.
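To make the inference step concrete, here is a minimal Python sketch of how a poll result might be generalized to the whole electorate. It assumes a simple random sample and uses the common normal-approximation (Wald) 95% interval; the function name `poll_estimate` and the counts (1,040 of 2,000 respondents) are hypothetical, not Gallup's actual method or data.

```python
import math


def poll_estimate(yes_count: int, sample_size: int, z: float = 1.96):
    """Estimate a population proportion from a simple random sample.

    Returns the sample proportion and an approximate confidence interval
    based on the normal (Wald) approximation; z = 1.96 gives roughly 95%.
    """
    p_hat = yes_count / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, (p_hat - margin, p_hat + margin)


# Hypothetical poll: 1,040 of 2,000 sampled voters favor candidate A.
p_hat, (low, high) = poll_estimate(1040, 2000)
print(f"Sample proportion (descriptive): {p_hat:.3f}")
print(f"Approx. 95% interval for all voters (inferential): ({low:.3f}, {high:.3f})")
```

The first printed value (0.520) only describes the 2,000 respondents; the interval (0.498, 0.542) is the inferential statement about the much larger electorate, carrying roughly 2.2 percentage points of sampling error either way under these assumptions.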
Sources of Data
Surveys and Experiments
◼ These are the two fundamental kinds of
investigation.
◼ Data from a survey may represent
observations of events or phenomena over
which few, if any, controls are imposed.
◼ In an experiment, we design a research plan
purposely to impose controls over the
amount of exposure (treatment) to a
phenomenon.
Sources of Data
Retrospective Studies (case-control
studies)
◼ Gather data from selected cases and
controls to determine differences, if any,
in the exposure to a suspected factor.
◼ The researcher identifies individuals with
a specific disease or condition (cases)
and also identifies a comparable sample
without that disease or condition
(controls).
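A case-control comparison comes down to contrasting how often cases and controls were exposed to the suspected factor. The sketch below assumes a simple 2x2 table with made-up counts; it computes the exposure proportion in each group and the odds ratio, the usual summary measure for case-control data (the lesson itself does not name a specific measure, and `compare_exposure` is a hypothetical helper).

```python
def compare_exposure(cases_exposed, n_cases, controls_exposed, n_controls):
    """Compare exposure to a suspected factor between cases and controls.

    Returns the proportion exposed in each group and the odds ratio
    (a*d)/(b*c) from the 2x2 table:
                 exposed   not exposed
      cases         a           b
      controls      c           d
    """
    a, b = cases_exposed, n_cases - cases_exposed
    c, d = controls_exposed, n_controls - controls_exposed
    return a / n_cases, c / n_controls, (a * d) / (b * c)


# Hypothetical study: 60 of 100 cases and 30 of 100 controls report exposure.
p_cases, p_controls, odds_ratio = compare_exposure(60, 100, 30, 100)
print(f"Exposed among cases: {p_cases:.0%}, among controls: {p_controls:.0%}")
print(f"Odds ratio: {odds_ratio:.2f}")  # 3.50
```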
Sources of Data
Prospective Studies (cohort studies)
◼ The researchers enroll a group of healthy
persons (a cohort) and follow them over a
certain period to determine the frequency
with which a disease develops.
◼ The advantage of such studies is that they
permit the accurate estimation of disease
incidence in a population. They also make it
possible to include potentially relevant
variables (e.g. age, gender, ethnicity,
occupation) that may be related to the
outcome variable.
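Because everyone in a cohort starts out disease-free, incidence can be estimated directly by counting new cases among those at risk. This minimal sketch assumes a fixed follow-up period with no losses to follow-up and uses invented data; it also shows how recording a variable such as age group lets the incidence be broken down by that variable. The function name `cumulative_incidence` is illustrative.

```python
from collections import defaultdict


def cumulative_incidence(cohort):
    """Cumulative incidence = new cases / persons at risk at the start.

    cohort: list of (group, developed_disease) pairs, one per person who was
    disease-free at enrollment. Returns the overall incidence and the
    incidence within each group (e.g. age band, occupation).
    """
    at_risk = defaultdict(int)
    new_cases = defaultdict(int)
    for group, developed in cohort:
        at_risk[group] += 1
        new_cases[group] += int(developed)
    overall = sum(new_cases.values()) / sum(at_risk.values())
    by_group = {g: new_cases[g] / at_risk[g] for g in at_risk}
    return overall, by_group


# Hypothetical 5-year follow-up of 2,000 initially healthy persons.
cohort = ([("under 50", True)] * 10 + [("under 50", False)] * 990
          + [("50 and over", True)] * 40 + [("50 and over", False)] * 960)
overall, by_group = cumulative_incidence(cohort)
print(f"Overall: {overall:.1%}")          # 2.5%
for group, risk in by_group.items():
    print(f"{group}: {risk:.1%}")         # under 50: 1.0%, 50 and over: 4.0%
```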
Descriptive vs Analytical Surveys
Retrospective surveys are usually
descriptive surveys that provide
estimates of a population’s
characteristics.
Prospective surveys may be descriptive
or analytical. Analytical surveys seek
to determine the degree of association
between a variable and a factor in the
population.
Clinical Trial
A carefully designed experiment that
is generally considered to be the best
method for evaluating the
effectiveness of a new drug or
treatment method.
Clinical trials are used extensively to test the
efficacy of new drugs and treatments.
Statistical Description of Data
Statistics describes a numeric set of
data by its
Center
Variability
Shape
Statistics describes a categorical set
of data by
Frequency, percentage or proportion of
each category
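The two descriptions above can be illustrated with Python's standard library. This is a sketch on invented data: mean and median for center, standard deviation and range for variability, a moment-based skewness coefficient for shape (one of several possible shape measures), and frequencies with proportions for a categorical variable. The helper names are hypothetical.

```python
import statistics
from collections import Counter


def describe_numeric(values):
    """Summarize numeric data by center, variability and shape."""
    n = len(values)
    mean = statistics.mean(values)
    # Moment coefficient of skewness: > 0 means a longer right tail.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {
        "mean": mean,                            # center
        "median": statistics.median(values),     # center
        "sd": statistics.stdev(values),          # variability (sample SD)
        "range": max(values) - min(values),      # variability
        "skewness": m3 / m2 ** 1.5,              # shape
    }


def describe_categorical(labels):
    """Summarize categorical data by frequency and proportion per category."""
    counts = Counter(labels)
    return {cat: (freq, freq / len(labels)) for cat, freq in counts.items()}


weights_kg = [58, 62, 65, 67, 70, 72, 75, 80, 95, 110]        # hypothetical
marital = ["married"] * 5 + ["single"] * 3 + ["widowed"] * 2  # hypothetical

numeric_summary = describe_numeric(weights_kg)
categorical_summary = describe_categorical(marital)
print(numeric_summary)
print(categorical_summary)
```

Here the mean (75.4 kg) sits above the median (71 kg) and the skewness is positive, both pointing to a right tail created by the two heaviest patients.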
Populations and Samples
OBJECTIVES
Distinguish between
◼ Populations and samples
◼ Parameters and statistics
◼ Various methods of sampling
Define Random Sample
Explain why it is important to use random
sampling
Populations and Samples
Definition: A population is a set of persons (or
objects) having a common observable
characteristic. A sample is a subset of a
population.
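The objectives above also mention parameters and statistics. One way to see the difference is to simulate a population, compute its mean (a parameter, usually unknown in practice), then draw a random sample and compute the sample mean (a statistic that estimates the parameter). The population below is simulated and entirely hypothetical.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 adults.
population = [round(rng.gauss(120, 15)) for _ in range(10_000)]
mu = statistics.mean(population)            # parameter: describes the population

sample = rng.sample(population, 100)        # a sample is a subset of the population
x_bar = statistics.mean(sample)             # statistic: computed from the sample

print(f"Population mean (parameter): {mu:.1f}")
print(f"Sample mean (statistic):     {x_bar:.1f}")
```

Rerunning with a different seed changes the statistic from sample to sample, while the parameter of a fixed population does not change.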
SAMPLING……
What is your population of interest?
To whom do you want to generalize your
results?
◼ All doctors
◼ School children
◼ Indians
◼ Women aged 15-45 years
◼ Other
Can you sample the entire population?
SAMPLING…….
3 factors that influence sample
representativeness
Sampling procedure
Sample size
Participation (response)
SAMPLING BREAKDOWN
Types of Samples
Probability (Random) Samples
◼ Simple random sample
◼ Systematic random sample
◼ Stratified random sample
◼ Cluster sample
Non-Probability Samples
◼ Convenience sample
◼ Purposive sample
◼ Quota sample
Process
The sampling process comprises several stages:
◼ Defining the population of concern
◼ Specifying a sampling frame, a set of items or events
possible to measure
◼ Specifying a sampling method for selecting items or
events from the frame
◼ Determining the sample size
◼ Implementing the sampling plan
◼ Sampling and data collecting
◼ Reviewing the sampling process
Population definition
A population can be defined as including all
people or items with the characteristic one
wishes to understand.
Because there is very rarely enough time
or money to gather information from
everyone or everything in a population, the
goal becomes finding a representative
sample (or subset) of that population.
SAMPLING FRAME
In the simplest case, it is possible to identify and measure every single
item in the population and to include any one of them in our sample.
However, in the more general case this is not possible. There
is no way to identify all rats in the set of all rats. Where
voting is not compulsory, there is no way to identify which
people will actually vote at a forthcoming election (in advance
of the election).
As a remedy, we seek a sampling frame which has the
property that we can identify every single element and include
any of them in our sample.
The sampling frame must be representative of the population.
PROBABILITY SAMPLING
NON-PROBABILITY SAMPLING
Any sampling method where some elements of the population
have no chance of selection (these are sometimes
referred to as 'out of coverage'/'undercovered'), or
where the probability of selection cannot be accurately
determined. It involves the selection of elements based
on assumptions regarding the population of interest,
which form the criteria for selection. Hence, because
the selection of elements is nonrandom, nonprobability
sampling does not allow the estimation of sampling errors.
SIMPLE RANDOM SAMPLING……..
Definition: Simple random sampling can be thought of as a 'pick a name out of the hat'
technique. Samples are chosen from a population either by
using a random number table or a random number generator.
Each member of the population has an equal, independent
and known chance of being selected.
Advantages
Easy to implement
Each member of the population has an equal chance of being
selected
Free from bias
Disadvantages
If the sampling frame is large, this method may be impractical.
A complete list of the population may not be available.
Minority subgroups of interest in the population may not be present
in the sample in sufficient numbers for study.
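A random number generator can play the role of the hat. In this minimal sketch (the frame contents and the function name `simple_random_sample` are hypothetical), Python's `random.sample` draws without replacement, so each of the 500 members has the same 20/500 = 4% chance of selection.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n members from the sampling frame without replacement.

    random.sample gives every subset of size n the same chance, so each
    member's chance of selection is n / len(frame).
    """
    return random.Random(seed).sample(frame, n)


# Hypothetical sampling frame of 500 patient IDs.
frame = [f"patient_{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(frame, 20, seed=42)
print(sample)
```

Note that the sketch needs the complete list in `frame`, which is exactly the practical limitation listed among the disadvantages.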
SYSTEMATIC SAMPLING
Systematic sampling relies on arranging the target
population according to some ordering scheme and then
selecting elements at regular intervals through that
ordered list.
Systematic sampling involves a random start and then
proceeds with the selection of every kth element from
then onwards. In this case, k = N/n, where N is the
population size and n is the sample size.
It is important that the starting point is not
automatically the first in the list, but is instead
randomly chosen from within the first to the kth
element in the list.
A simple example would be to select every 10th name
from the telephone directory (an 'every 10th' sample,
also referred to as 'sampling with a skip of 10').
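The procedure above can be sketched in a few lines of Python. The directory of names is hypothetical, and rounding k down when N/n is not a whole number is an assumption (the lesson does not say how to handle that case); with N = 1,000 and n = 100 the skip is exactly 10.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Random start within the first k elements, then every k-th element."""
    k = len(frame) // n                         # sampling interval k = N / n (rounded down)
    start = random.Random(seed).randrange(k)    # 0-based index of 1st..kth element
    return [frame[start + i * k] for i in range(n)]


# Hypothetical "directory" of 1,000 names, sample of 100, so k = 10.
directory = [f"name_{i:04d}" for i in range(1, 1001)]
sample = systematic_sample(directory, 100, seed=7)
print(sample[:5])
```

Only the start is random; every later selection is fixed by it, which is why a periodic pattern in the list can bias the sample.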
SYSTEMATIC SAMPLING……
ADVANTAGES:
Sample easy to select
Suitable sampling frame can be identified easily
Sample evenly spread over entire reference population
DISADVANTAGES:
May be biased where the pattern used for the samples
coincides with a pattern in the population.
Difficult to assess precision of estimate from one survey.
STRATIFIED SAMPLING
Where the population embraces a number of distinct
categories, the frame can be organized into separate
"strata." Each stratum is then sampled as an
independent sub-population, out of which individual
elements can be randomly selected.
Every unit in a stratum has the same chance of being
selected.
Using the same sampling fraction for all strata ensures
proportionate representation in the sample.
Adequate representation of minority subgroups of
interest can be ensured by stratification and by varying the
sampling fraction between strata as required.
STRATIFIED SAMPLING……
Finally, since each stratum is treated as an
independent population, different sampling approaches
can be applied to different strata.
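Stratification can be sketched as a simple random sample drawn separately inside each stratum. The function and data below are hypothetical; the example contrasts proportionate allocation (the same fraction everywhere) with a larger fraction for a small "island" stratum, one way to secure adequate numbers from a minority subgroup. Because each stratum is handled independently, the inner `rng.sample` call could be swapped for another method in any stratum.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, fractions, seed=None):
    """Stratified random sampling.

    frame: list of units; stratum_of: function returning a unit's stratum;
    fractions: dict stratum -> sampling fraction. Using one fraction for all
    strata gives proportionate allocation; different fractions let small
    subgroups be over-sampled.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(units, round(len(units) * fractions[name]))
        for name, units in strata.items()
    }


# Hypothetical frame: 600 urban, 350 rural and 50 island households.
frame = ([("urban", i) for i in range(600)] + [("rural", i) for i in range(350)]
         + [("island", i) for i in range(50)])
proportionate = stratified_sample(frame, lambda u: u[0],
                                  {"urban": 0.1, "rural": 0.1, "island": 0.1}, seed=0)
boosted = stratified_sample(frame, lambda u: u[0],
                            {"urban": 0.1, "rural": 0.1, "island": 0.4}, seed=0)
print({k: len(v) for k, v in proportionate.items()})  # {'urban': 60, 'rural': 35, 'island': 5}
print({k: len(v) for k, v in boosted.items()})        # {'urban': 60, 'rural': 35, 'island': 20}
```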
CLUSTER SAMPLING
Cluster sampling is an example of 'two-stage sampling'.
In the first stage, a sample of areas is chosen;
in the second stage, a sample of respondents within those
areas is selected.
The population is divided into clusters of homogeneous units,
usually based on geographical contiguity.
Sampling units are groups rather than individuals.
A sample of such clusters is then selected.
All units from the selected clusters are studied.
CLUSTER SAMPLING…….
Advantages :
Cuts down on the cost of preparing a
sampling frame.
Can reduce travel and other
administrative costs.
Disadvantages:
Sampling error is higher than for a simple random
sample of the same size.
CLUSTER SAMPLING…….
• Identification of clusters
– List all cities, towns, villages & wards of cities with
their population falling in the target area under study.
– Calculate the cumulative population & divide the total by 30;
this gives the sampling interval.
– Select a random no. less than or equal to the sampling
interval, having the same no. of digits. The area whose
cumulative population first includes this number forms the 1st
cluster.
– Random no. + sampling interval = cumulative population that
identifies the 2nd cluster.
– 2nd cluster + sampling interval = 3rd cluster.
– Last or 30th cluster = 29th cluster + sampling interval.
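The cluster-identification steps amount to systematic sampling with probability proportional to size along the cumulative population list. A minimal sketch follows; the area names and populations are made up, `select_clusters_pps` is a hypothetical helper, and using integer division for the interval is an assumption about how a fractional interval would be handled.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def select_clusters_pps(areas, n_clusters=30, seed=None):
    """Select clusters with probability proportional to population size.

    areas: list of (name, population) in listed order. An area whose
    population exceeds the sampling interval can receive more than one cluster.
    """
    cumulative = list(accumulate(pop for _, pop in areas))
    interval = cumulative[-1] // n_clusters              # sampling interval
    start = random.Random(seed).randint(1, interval)     # random no. <= interval
    targets = [start + i * interval for i in range(n_clusters)]
    # Each target falls in the first area whose cumulative population reaches it.
    return [areas[bisect_left(cumulative, t)][0] for t in targets]


# Hypothetical list of 8 areas with a total population of 20,000 (interval = 666).
pops = [1200, 3400, 800, 5600, 2100, 950, 4300, 1650]
areas = [(f"area_{i}", p) for i, p in enumerate(pops, start=1)]
clusters = select_clusters_pps(areas, seed=11)
print(clusters)
```

Because selection follows the cumulative totals, the largest area (5,600 people) always receives at least 8 of the 30 clusters, while the smallest receives one or two.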
CLUSTER SAMPLING…….
Two types of cluster sampling methods.
One-stage sampling. All of the elements
within selected clusters are included in
the sample.
Two-stage sampling. A subset of elements
within selected clusters is randomly
selected for inclusion in the sample.
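The difference between the two designs lies only in what happens inside the chosen clusters. In this sketch, with hypothetical schools as clusters and an illustrative `cluster_sample` helper, the first stage randomly picks clusters; the one-stage design then keeps every element, while the two-stage design takes a random subset from each chosen cluster.

```python
import random


def cluster_sample(clusters, n_clusters, m_per_cluster=None, seed=None):
    """One-stage (m_per_cluster=None) or two-stage cluster sampling.

    clusters: dict mapping cluster name -> list of its elements.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)     # stage 1: random clusters
    if m_per_cluster is None:
        return {name: list(clusters[name]) for name in chosen}   # all elements
    return {name: rng.sample(clusters[name], m_per_cluster)     # stage 2: random subset
            for name in chosen}


# Hypothetical: 20 schools (clusters) with 40 pupils each.
schools = {f"school_{s:02d}": [f"s{s:02d}_pupil_{p:02d}" for p in range(40)]
           for s in range(20)}
one_stage = cluster_sample(schools, 5, seed=1)
two_stage = cluster_sample(schools, 5, m_per_cluster=10, seed=1)
print({k: len(v) for k, v in one_stage.items()})   # 5 schools, 40 pupils each
print({k: len(v) for k, v in two_stage.items()})   # same 5 schools, 10 pupils each
```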
Non-PROBABILITY SAMPLING
A Non-Probability Sampling is a
method of selecting units from
population using a subjective (i.e.
nonrandom) method.
It does not require a sampling frame.
(1) QUOTA SAMPLING
The population is first segmented into mutually exclusive subgroups,
just as in stratified sampling.
Then judgment is used to select subjects or units from each segment
based on a specified proportion.
For example, an interviewer may be told to sample 200 females and
300 males between the ages of 45 and 60.
It is this second step which makes the technique one of nonprobability
sampling.
In quota sampling the selection of the sample is non-random.
For example, interviewers might be tempted to interview those who
look most helpful. The problem is that these samples may be biased
because not everyone gets a chance of selection. This non-random
element is its greatest weakness, and quota versus probability sampling
has been a matter of controversy for many years.
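To see why quota sampling is non-random, here is a sketch of an interviewer filling the quotas from the example above. It assumes people are simply taken in the order they are approached; the arrival stream and the `fill_quotas` helper are invented for illustration. Whoever happens to come first fills the quota, so people approached later, or never approached, have no chance of selection.

```python
def fill_quotas(arrivals, quotas):
    """Quota sampling sketch: accept people in the order they are approached
    until every quota is full. No random mechanism is involved.

    arrivals: iterable of (person_id, segment); quotas: dict segment -> count.
    """
    counts = {segment: 0 for segment in quotas}
    sample = []
    for person, segment in arrivals:
        if segment in quotas and counts[segment] < quotas[segment]:
            sample.append(person)
            counts[segment] += 1
        if counts == quotas:
            break
    return sample, counts


# Hypothetical stream of people approached, alternating female/male aged 45-60.
quotas = {"female 45-60": 200, "male 45-60": 300}
arrivals = [(i, "female 45-60" if i % 2 == 0 else "male 45-60") for i in range(1000)]
sample, counts = fill_quotas(arrivals, quotas)
print(len(sample), counts)  # 500 {'female 45-60': 200, 'male 45-60': 300}
```

Everyone approached after person 599 is excluded regardless of their characteristics, which is the selection bias the slide warns about.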
(2) CONVENIENCE SAMPLING
Sometimes known as grab, opportunity, accidental, or
haphazard sampling.
A type of nonprobability sampling in which the sample is drawn
from that part of the population which is close to hand, that is, readily
available and convenient.
The researcher using such a sample cannot scientifically make
generalizations about the total population from this sample because it would
not be representative enough.
For example, if an interviewer were to conduct a survey at a shopping
center early in the morning on a given day, the people he/she could
interview would be limited to those present at that time. Their views
would not represent those of other members of society in the area,
as a survey conducted at different times of day and several
times per week would.
This type of sampling is most useful for pilot testing.
In social science research, snowball sampling is a similar technique, where
existing study subjects are used to recruit more subjects into the sample.
(3) Judgmental sampling or Purposive sampling
PANEL SAMPLING
Method of first selecting a group of participants through a
random sampling method and then asking that group for the
same information again several times over a period of time.
Therefore, each participant is given the same survey or interview at
two or more time points; each period of data collection is called a
"wave".
This sampling methodology is often chosen for large-scale or
nationwide studies in order to gauge changes in the population
with regard to any number of variables, from chronic illness to
job stress to weekly food expenditures.
Panel sampling can also be used to inform researchers about
within-person health changes due to age or help explain changes
in continuous dependent variables such as spousal interaction.
Several methods of analyzing panel sample data have been
proposed, including growth curves.
REVIEW:
Introduction to Statistics
Exercise
Identify whether each situation illustrates
descriptive or inferential statistics.