Lecture - 5 - Start
Outline
1. Science, Method & Measurement
2. On Building An Index
3. Correlation & Causality
4. Probability & Statistics
5. Samples & Surveys
6. Experimental & Quasi-experimental Designs
7. Conceptual Models
8. Quantitative Models
9. Complexity & Chaos
10. Recapitulation - Envoi
Quantitative Techniques for Social Science Research
Lecture # 5:
Samples And Surveys
Ismail Serageldin
Alexandria
2012
Sample surveys are among the most studied
and written-about topics in statistics.
So: no textbooks. Just follow the
presentation.
Why Do Sample Surveys
Why do we do sample surveys?
We want to know something about the Population so
we study a small sample of the Population
(making sure that the sample is representative)
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Quantitative Variables:
Continuous and Discrete
• Continuous variables can take any value
within a range, between its minimum and
maximum: e.g. the weight of the persons in a class.
• Discrete variables take only separate, countable
values (typically integers): e.g. when tossing a coin
n times, how many times do we get heads? It can
never be 2.7 times; it has to be 0, 1, 2, 3, …, n.
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
TEST
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Two Snapshots, Two “states”:
Discrete variables imply sudden moves
from state to state
Continuous variables imply constantly
changing transitions between two
snapshots
Transitions can be cut up in discrete
states
But many transitions are really
continuous
Example:
Students leaving school and
entering the Labor Market
Later we will discuss how this fits in
Markov chains and the manpower model
But let’s go back to the issues of
Data Collection
Methods Of Data Collection
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Methods of Data Collection (Cont’d)
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Why do Sample Surveys?
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Pros and Cons
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Pros and Cons (continued)
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
We will have a lot more to say on
Experimental Designs later.
We must distinguish between
the sample statistic
and
the population parameter
From Population To Sample To Population:
(From Sample Statistic To Population Parameter)
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Example Of Population Parameter vs.
Sample Statistic
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Bad Surveys make for bad estimates
Estimates of the front runners in the
Egyptian Presidential Election 2012
• Before the first round:
• After the first round:
Non-Probability samples
and
Probability samples
Sampling Methods
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Non-Probability Sampling
Pros & cons of Non-Probability Sampling
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Two of the main types of non-probability
sampling methods
• Voluntary sample. A voluntary sample is made up of
people who self-select into the survey. Often, these
people have a strong interest in the main topic of the
survey. E.g. those who call in to a talk show, or
participate in an on-line poll.
• Convenience sample. A convenience sample is made
up of people who are easy to reach. E.g. interviewing
my students, my employees, or shoppers at a local
mall. If the group or the location was chosen
because it was convenient, this is a
convenience sample.
• Note: Neither allows generalization to the population.
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Non-probability Sample Surveys
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Probability Samples are representative
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Simple Random sampling
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Stratified Sampling
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Cluster sampling
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Multistage sampling.
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Systematic random sampling.
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
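The probability sampling methods named on the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 ID numbers, the "urban"/"rural" strata, and the ten "class" clusters are hypothetical, and the function names are invented for this sketch. Multistage sampling is not shown separately; it amounts to applying these steps in sequence (for example, choosing clusters first and then sampling within them).

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random starting point, then take every k-th unit."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from each stratum."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(2012)                      # fixed seed so the run is repeatable
population = list(range(1, 101))               # hypothetical ID numbers 1..100
strata = {"urban": population[:60], "rural": population[60:]}
clusters = {f"class_{j}": population[j * 10:(j + 1) * 10] for j in range(10)}

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 0.10, rng)
clus = cluster_sample(clusters, 2, rng)

print("simple random:", sorted(srs))
print("systematic:   ", sys_s)
print("stratified:   ", sorted(strat))
print("cluster:      ", clus)
```

Note that the stratified sample always contains 6 urban and 4 rural units, matching the 60/40 split of the population, while the simple random sample only matches that split on average.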
How To Select A
Probability Sample
How to select a probability sample
Probability Sampling
Major Types of Bias In Surveys
• Non-response bias
• Coverage bias
• Selection bias
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Examples of Response Bias
(Due to error in the Measurement process)
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Examples of Response Bias – Cont’d
(Due to error in the Measurement process)
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Sampling Statistic and Sampling Error
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Increasing The Sample size:
Reduces Sampling Error but NOT Survey Bias
• Increasing the sample size tends to reduce the
sampling error; that is, it makes the sample statistic
less variable. However, increasing sample size does
not affect survey bias.
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
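A small simulation can make the point above concrete. It is a sketch under assumptions, not part of the original lecture: the population is a hypothetical set of 10,000 normally distributed values, and the bias is modeled as a coverage problem in which the lowest 30% of the population can never be reached by the survey.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values (mean about 100, sd about 20).
population = [rng.gauss(100, 20) for _ in range(10_000)]
# Assumed coverage bias: the lowest 30% of the population is unreachable.
reachable = sorted(population)[3_000:]
true_mean = statistics.fmean(population)

def summarize(frame, n, repeats=300):
    """Return (average estimate, spread of estimates) over repeated samples of size n."""
    means = [statistics.fmean(rng.sample(frame, n)) for _ in range(repeats)]
    return statistics.fmean(means), statistics.stdev(means)

results = {}
for n in (25, 400):
    _, fair_spread = summarize(population, n)   # sampling error, unbiased frame
    biased_avg, _ = summarize(reachable, n)     # average estimate, biased frame
    results[n] = (fair_spread, biased_avg - true_mean)
    print(f"n={n}: sampling error = {fair_spread:.2f}, "
          f"bias with bad coverage = {biased_avg - true_mean:.2f}")
```

The spread of the sample means shrinks as n grows from 25 to 400, but the gap between the biased estimates and the true mean stays roughly the same at both sample sizes.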
The Null Hypothesis &
Types of Error
To analyze survey data and arrive at a
conclusion, we need to formulate a
Null Hypothesis
Null Hypothesis
• It is symbolized by H0
The first to formalize the notion of the
“Null Hypothesis”
• One-Tailed :
Accept H0 Reject H0
• Two Tailed:
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
More samples means more accurate
estimation of the population parameter
Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
Choosing the significance level for a test
Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
Assume that we have the mean of a
distribution. We need to find the
standard deviation (or its square:
the variance)
The Variance is the square of the
Standard Deviation
Calculating the Variance and the
standard deviation
• The formula for calculating the
variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• The standard deviation is given by:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Example: calculating Variance and
Standard Deviation
For example, using these six measures:
3, 9, 1, 2, 5 and 4:
$\sum x = 3 + 9 + 1 + 2 + 5 + 4 = 24$
$\sum x^2 = 3^2 + 9^2 + 1^2 + 2^2 + 5^2 + 4^2 = 9 + 81 + 1 + 4 + 25 + 16 = 136$
The quantities are then substituted into the
shortcut formula to find $\sum (x - \bar{x})^2$:
$\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 136 - \frac{24^2}{6}$
Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
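Example: calculating Variance and
Standard Deviation
$\sum (x - \bar{x})^2 = 136 - \frac{576}{6} = 136 - 96 = 40$
The variance and standard deviation are now
found as before:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{40}{5} = 8$
$s = \sqrt{8} \approx 2.83$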
Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
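The worked example can be checked with a short Python sketch. It computes the sum of squared deviations both ways (directly from the definition and with the shortcut formula) for the same six measures; the variable names are chosen only for this illustration.

```python
import math

data = [3, 9, 1, 2, 5, 4]
n = len(data)
mean = sum(data) / n                      # 24 / 6 = 4

# Definition: sum of squared deviations from the mean.
ss_definition = sum((x - mean) ** 2 for x in data)

# Shortcut: sum of squares minus (sum of values)^2 / n.
ss_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n

variance = ss_shortcut / (n - 1)          # divide by n - 1 for a sample
std_dev = math.sqrt(variance)

print(ss_definition, ss_shortcut, variance, round(std_dev, 2))
# prints: 40.0 40.0 8.0 2.83
```

Both routes give a sum of squares of 40, so the variance is 40 / 5 = 8 and the standard deviation is about 2.83.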
We will say more about the
standard deviation and the
variance in a moment
Understanding What Is
Behind A Formula
Clear thinking about statistics:
understanding what is behind the
formula
• I want you to understand the logic behind a
formula. You do not need to memorize any
formula. You do that by asking questions…
• For example, let’s look at the formula for
computing the sample variance:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Why do we square the deviations
from the mean?
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• Because if we add up all the deviations from the
mean, we always get zero.
• So, to deal with this problem, we square the
deviations.
• Bonus: Notice that squaring also magnifies
the large deviations; therefore it helps us get a
better feel for the spread of the data.
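A quick check in Python shows why the raw deviations cannot be used. The sketch reuses the six measures 3, 9, 1, 2, 5 and 4 from the worked example earlier in the lecture; the variable names are only illustrative.

```python
data = [3, 9, 1, 2, 5, 4]
mean = sum(data) / len(data)              # 4.0

deviations = [x - mean for x in data]     # [-1.0, 5.0, -3.0, -2.0, 1.0, 0.0]
sum_of_deviations = sum(deviations)       # positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in deviations)

print(deviations)
print(sum_of_deviations)                  # prints: 0.0
print(sum_of_squares)                     # prints: 40.0
```

The deviations cancel out exactly, so their sum says nothing about spread. After squaring, every term is non-negative, and the single large deviation (5) contributes 25 of the total 40.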
Why not raise to the power of four
(three will not work)?
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• Squaring does the trick; why should we
make life more complicated than it is?
Why is there a summation notation
in the formula?
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Why do we divide the sum of
squares by n-1?
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• The amount of deviation should also reflect
how large the sample is, so we must bring in
the sample size.
• Why? Because, in general, larger samples
have a larger sum of squared deviations
from the mean.
Why divide by n-1, not n?
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• When you divide by n-1, the sample
variance gives an estimate that is, on average,
closer to the population variance than when
you divide by n (dividing by n tends to
underestimate it).
• But for larger samples (say over 30), it hardly
matters whether you divide by n or n-1. The
results are almost the same, and both are
acceptable.
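The effect of the divisor can be seen in a small simulation. This is a sketch under assumptions: the population is a hypothetical set of 20,000 normally distributed values, and the function name is invented here. It uses `statistics.pvariance` (which divides by n) and `statistics.variance` (which divides by n-1) from the Python standard library.

```python
import random
import statistics

rng = random.Random(30)  # fixed seed so the run is repeatable

# Hypothetical population with a variance of roughly 100.
population = [rng.gauss(50, 10) for _ in range(20_000)]
pop_var = statistics.pvariance(population)

def average_estimates(n, repeats=2_000):
    """Average the two variance estimates over many samples of size n."""
    div_n, div_n1 = [], []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        div_n.append(statistics.pvariance(sample))   # divides by n
        div_n1.append(statistics.variance(sample))   # divides by n - 1
    return statistics.fmean(div_n), statistics.fmean(div_n1)

small_n, small_n1 = average_estimates(5)
large_n, large_n1 = average_estimates(50)

print(f"population variance: {pop_var:.1f}")
print(f"n=5:  divide by n -> {small_n:.1f}, divide by n-1 -> {small_n1:.1f}")
print(f"n=50: divide by n -> {large_n:.1f}, divide by n-1 -> {large_n1:.1f}")
```

With n = 5, dividing by n gives estimates that average only about 80% of the population variance, while dividing by n-1 lands close to it. With n = 50 the two versions differ by only 2%.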
Does n-1 Have a Meaning?
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• The factor n-1 is what we call the
"degrees of freedom" (but that is another
discussion).
• Degrees of freedom is the number of values
in the final calculation of a statistic that are
free to vary.
Explain number of values that are
allowed to vary
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
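One way to see what "free to vary" means is the following sketch, which reuses the six measures 3, 9, 1, 2, 5 and 4 from the earlier worked example. Once the sample mean is fixed, any n-1 of the values can be chosen freely, but the last one is then forced. The variable names are only illustrative.

```python
data = [3, 9, 1, 2, 5, 4]
n = len(data)
mean = sum(data) / n                          # 4.0

# Suppose we know the mean and the first n - 1 values.
free_values = data[:-1]                       # 5 values that were free to vary
# The last value is forced: all n values must add up to n * mean.
forced_last = n * mean - sum(free_values)

# The same constraint, seen through the deviations: they must sum to zero,
# so the last deviation is minus the sum of the others.
free_deviations = [x - mean for x in free_values]
forced_last_deviation = -sum(free_deviations)

print(forced_last)                            # prints: 4.0
print(forced_last_deviation)                  # prints: 0.0
```

Only n-1 = 5 of the six deviations carry independent information about the spread, which is why the sum of squares is divided by 5 rather than 6.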