Lecture - 5 - Start

The document discusses sample surveys and how they are used to make inferences about populations. Sample surveys are typically cheaper and easier to conduct than a full census. The key aspects covered include population parameters versus sample statistics, sources of error in surveys, and how sample size affects sampling error.

Uploaded by ss t
Copyright © All Rights Reserved

Outline
1. Science, Method & Measurement
2. On Building An Index
3. Correlation & Causality
4. Probability & Statistics
5. Samples & Surveys
6. Experimental & Quasi-experimental Designs
7. Conceptual Models
8. Quantitative Models
9. Complexity & Chaos
10. Recapitulation - Envoi
Quantitative Techniques for Social Science Research

Lecture # 5:
Samples And Surveys

Ismail Serageldin
Alexandria
2012
Sample surveys are among the most studied
and most written-about topics in statistics.
So: no textbooks; just follow the
presentation.
Why Do Sample Surveys
Why do we do sample surveys?
We want to know something about the Population so
we study a small sample of the Population
(making sure that the sample is representative)

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


So we will discuss how to undertake
sampling and how to do surveys
Let’s start with some definitions
Data, Variables,
Statistics and
Parameters
Variables

• A variable is an attribute that describes a person, place, thing, or idea.
• The value of the variable can "vary" from one entity to another.
• Qualitative variables are categorical: e.g. the color of a ball (green, red or blue).
• Quantitative variables are numeric: e.g. the population of a city.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Quantitative Variables:
Continuous and Discrete
• Continuous variables can take any value within a range (between the minimum and the maximum): e.g. the weights of the persons in a class.
• Discrete variables can take only separate, countable values, usually whole numbers: e.g. when tossing a coin n times, how many times do we get heads? It can never be 2.7 times; it has to be 0, 1, 2, …, n.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
TEST

• Which of the following statements are true?
– I. All variables can be classified as quantitative or categorical variables.
– II. Categorical variables can be continuous variables.
– III. Quantitative variables can be discrete variables.
• Answer: I and III are correct

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Two Snapshots, Two “states”:
Discrete variables imply sudden moves
from state to state
Continuous variables imply constantly
changing transitions between two
snapshots
Transitions can be cut up into discrete
states
But many transitions are really
continuous
Example:
Students leaving school and
entering the Labor Market
Later we will discuss how this fits in
Markov chains and the manpower model
But let’s go back to the issues of
Data Collection
Methods Of Data Collection

• There are four main methods of data collection.
• Census. A census is a study that obtains data
from every member of a population. In most
studies, a census is not practical, because of
the cost and/or time required.
• Sample survey. A sample survey is a study
that obtains data from a subset of a
population, in order to estimate population
attributes.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Methods of Data Collection (Cont’d)

• Experiment. An experiment is a controlled study in which the researcher attempts to understand cause-and-effect relationships.
• Observational study. The researcher is not
able to control (1) how subjects are assigned
to groups and/or (2) which treatments each
group receives.
• (Case Studies are observations of one case.)
• Note: Observational Studies do NOT allow
you to generalize the findings.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Why do Sample Surveys?

• The reason for conducting a sample survey is to estimate the value of some attribute of a population.
• It is much cheaper and easier than doing a whole census.
• When done scientifically, we can define the error term accurately (e.g. ±3%).

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Pros and Cons

• Resources. A well-designed sample survey can provide very precise estimates of population parameters: quicker, cheaper, and with less manpower than a census.
• Generalizability. Generalizability means applying findings from a study to a larger population; it requires random selection.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Pros and Cons (continued)

• Causal inference. Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups.
• Therefore, experiments, which allow the researcher to control assignment of subjects to treatment groups, are the best method for investigating causal relationships.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
We will have a lot more to say on
Experimental Designs later.
We must distinguish between
the sample statistic
and
the population parameter
From Population To Sample To Population:
(From Sample Statistic To Population Parameter)

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Population Parameter vs.
Sample Statistic
• Population parameter. A population
parameter is the true value of a population
attribute.

• Sample statistic. A sample statistic is an estimate, based on sample data, of a population parameter.
• The estimate comes with an error term (e.g. ±3%).

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Example Of Population Parameter vs.
Sample Statistic

• Example. We want to know the percentage of voters that favor a new tax.
– The actual percentage of all the voters is a population
parameter.
– The estimate of that percentage, based on sample data,
is a sample statistic.
• The quality of a sample statistic (i.e., accuracy,
precision, representativeness) is strongly
affected by the way that sample observations are
chosen; that is, by the sampling method.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Bad Surveys make for bad estimates
Estimates of the front runners in the
Egyptian Presidential Election 2012
• Before the first round:
1. Abdel Moneim Aboulfotouh
2. Amr Moussa
3. Mohamed Morsi
4. Hamdein Sabahi
5. Ahmed Shafik
• After the first round:
1. Mohamed Morsi
2. Ahmed Shafik
3. Hamdein Sabahi
4. Abdel Moneim Aboulfotouh
5. Amr Moussa
The US 1948 Presidential Election:
Truman vs. Dewey
Bad (Inaccurate) Polls
What does it mean to say: “the poll says
52% (±3%) at 95% confidence level?”
• The 52% is the finding from the sample survey.
• The error term (±3%) is related to the sampling error: it means that we think the real value is between 49% and 55%.
• The 95% confidence level means that if we repeated the survey many times, about 95 out of every 100 intervals computed this way would contain the real population value. Loosely: we are 95% confident that the real figure in the population falls in that range.
• The error term will vary according to the size of the sample: larger samples give smaller error terms.
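As a worked sketch of where a figure like ±3% comes from, assume (the slide does not say) a simple random sample of n = 1,000 and the usual normal approximation for a proportion, with z = 1.96 for a 95% confidence level:

```latex
\text{MoE} = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
           = 1.96\sqrt{\frac{0.52 \times 0.48}{1000}}
           \approx 1.96 \times 0.0158
           \approx 0.031
```

That is about ±3.1 percentage points, i.e. an interval of roughly 48.9% to 55.1%, which a poll would report as 52% ± 3%.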
What is sampling error?
(The margin of error, or the ± 3%)

• Sampling error is the calculated statistical imprecision due to interviewing a random sample instead of the entire population.
• The margin of error provides an estimate of how much the results of the sample may differ, due to chance, from what would have been found if the entire population had been interviewed.
• The confidence level (95%, or 95 out of 100) says how confident we are that the true value lies within that ± error term.
Sampling error

• Sampling error is related to sample size, but it is not the only kind of error possible in a sample survey.
• You can look it up in sampling error tables such as the one I can show you here.
• This table is produced by Gallup for a sample from a target population of 200 million, with a confidence level of 95%.
Recommended allowance for sampling error of a percentage*
In percentage points (at 95 in 100 confidence level)**

                          SAMPLE SIZE
                     1,000   750   500   250   100
Percentage near 10     2      2     3     4     6
Percentage near 20     3      3     4     5     9
Percentage near 30     3      4     4     6    10
Percentage near 40     3      4     5     7    10
Percentage near 50     3      4     5     7    11
Percentage near 60     3      4     5     7    10
Percentage near 70     3      4     4     6    10
Percentage near 80     3      3     4     5     9
Percentage near 90     2      2     3     4     6

Table extracted from 'The Gallup Poll Monthly'. Cited at
http://www.ropercenter.uconn.edu/education/polling_fundamentals_error.html
An Important Observation:
Statistical Error and sample size
• As the sample size increases, there are
diminishing returns in percentage error.
• At percentages near 50%, the statistical
error drops from 7 to 5% as the sample
size is increased from 250 to 500.
• But, if the sample size is increased from
750 to 1,000, the statistical error drops
from 4 to 3%.
• As the sample size rises above 1,000, the diminishing returns are even more noticeable.
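The diminishing returns can be reproduced with the textbook margin-of-error formula for a proportion, z·√(p(1 − p)/n). This is a minimal sketch assuming simple random sampling and the normal approximation; the function name `margin_of_error` is illustrative. Because the margin shrinks with 1/√n, halving it requires quadrupling the sample. Its values come out slightly below Gallup's recommended allowances (about 9.8 rather than 11 points at n = 100); presumably Gallup's figures also allow for a design more complex than simple random sampling, but that is an assumption, not something the table states.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a
    proportion p estimated from a simple random sample of size n.
    z = 1.96 corresponds to a 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Percentages near 50% are the worst case: p * (1 - p) is largest there.
    for n in (100, 250, 500, 750, 1000, 2000, 4000):
        print(f"n = {n:>5}: +/- {100 * margin_of_error(0.5, n):.1f} points")
```

Running it prints roughly 9.8, 6.2, 4.4, 3.6, 3.1, 2.2 and 1.5 points: going from 250 to 500 buys almost two points, while going from 2,000 to 4,000 buys less than one.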
Among others, Langer Research Associates
offers a margin-of-error calculator -- MoE
Machine -- as a convenient tool for data
producers and everyday data users. Access
the MoE Machine at
http://langerresearch.com/moe.php.
So, let’s learn more about surveys
and sampling…
Types of Samples
What is a Survey?

• A survey may refer to many different types or techniques of observation, but it most often involves a questionnaire used to measure the characteristics and/or attitudes of people.
• Since we do not cover the entire population, we select a sample.
• The different ways of contacting members of a sample once they have been selected are the subject of survey data collection.
What is Survey Sampling?

• In statistics, survey sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey.
• The purpose of sampling is to reduce the
cost and/or the amount of work that it would
take to survey the entire target population.
• A survey that measures the entire target
population is called a census.
Sampling
Two Kinds of Survey Samples

Non-Probability samples
and
Probability samples
Sampling Methods

• Non-probability samples. We do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.
• Probability samples. Each population element has a known (non-zero) chance of being chosen for the sample.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Non-Probability Sampling
Pros & cons of Non-Probability Sampling

• Advantages: convenience and cost.
• Disadvantage: We cannot estimate the extent to which sample statistics are likely to differ from population parameters.
• Only probability sampling methods permit that kind of analysis.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Two of the main types of non-probability
sampling methods
• Voluntary sample. People who self-select into the survey. Often, these folks have a strong interest in the main topic of the survey, e.g. those who call in to a talk show or participate in an on-line poll. This would be a volunteer sample.
• Convenience sample. A convenience sample is made up of people who are easy to reach, e.g. interviewing my students, my employees or shoppers at a local mall. If the group or the location was chosen because it was convenient, this would be a convenience sample.
• Note: Neither allows generalization to the population.
Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Non-probability Sample Surveys

• Surveys that are not based on probability sampling have no way of measuring their bias or sampling error.
• Surveys based on non-probability samples are not externally valid. You cannot generalize from them to the general population. They can only be said to be representative of the people that have actually completed the survey.
Non-Probability Samples

• The relationship between the target population and the survey sample is immeasurable and potential bias is unknowable.
• Sophisticated users of non-probability survey samples tend to view the survey as an experimental condition, rather than a tool for population measurement.
• Analysts examine the results for internally consistent relationships.
Examples Of Non-Probability Samples

• Judgment Samples: A researcher decides which population members to include in the sample based on his or her judgment. The researcher may provide some alternative justification for the representativeness of the sample.
• Snowball Samples: Often used when a target
population is rare, members of the target
population recruit other members of the
population for the survey.
Examples Of Non-Probability Samples

• Quota Samples: The sample is designed to include a designated number of people with certain specified characteristics. For example, 100 coffee drinkers. This type of sampling is common in non-probability market research surveys.
• Convenience Samples: The sample is
composed of whatever persons can be most
easily accessed to fill out the survey.
Probability Sampling
Probability samples are the only ones
whose results will be generalizable to the
entire population
Random Samples
Ronald Fisher (1890-1962)
Extract from table of random numbers
Main types of probability sampling

• Simple random sampling,
• Stratified sampling,
• Cluster sampling,
• Multistage sampling, and
• Systematic random sampling.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Probability Samples are representative

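• The key benefit of all these probability sampling methods is that every unit's chance of selection is known, so the sample is free of selection bias and representative of the population on average, and we can measure how far a sample statistic is likely to be from the population parameter. This is what makes the statistical conclusions valid.

Hence the conclusions are generalizable.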

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Simple Random sampling

• The population consists of N objects.
• The sample consists of n objects.
• If all possible samples of n objects are equally likely to occur, the sampling method is called simple random sampling.
• Selection is done by a lottery method, by using a table of random numbers, or with a computerized random number generator.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
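A minimal sketch of simple random sampling with Python's standard-library `random.sample`, which draws n distinct elements without replacement so that every subset of size n is equally likely. The frame of 100 numbered people and the function name `simple_random_sample` are hypothetical; the fixed seed only makes the draw reproducible.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sample frame; every possible
    sample of n units is equally likely (sampling without replacement)."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.sample(frame, n)


if __name__ == "__main__":
    frame = [f"person_{i}" for i in range(1, 101)]  # hypothetical N = 100
    print(simple_random_sample(frame, n=10, seed=42))
```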
Stratified Sampling

• Stratified sampling. The population is divided into groups, based on some characteristic.
• The groups are called strata.
• Then, within each group, a probability sample (often a simple random sample) is selected.
• As an example, suppose we conduct a national survey. We might divide the population into groups or strata, based on geography: north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
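The regional example can be sketched as follows. This assumes equal allocation (the same number of respondents from every stratum); proportional allocation, where each stratum's share of the sample matches its share of the population, is a common alternative. The population of 200 people and the function name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into strata with stratum_of(unit), then draw a simple
    random sample of n_per_stratum units inside each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):  # sorted so the result is reproducible
        members = strata[name]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


if __name__ == "__main__":
    regions = ["north", "east", "south", "west"]
    # Hypothetical population: 200 people, 50 in each region.
    people = [(person_id, regions[person_id % 4]) for person_id in range(200)]
    for person in stratified_sample(people, lambda p: p[1], n_per_stratum=5, seed=1):
        print(person)
```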
Cluster sampling

• Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster.
• A sample of clusters is chosen, using a probability method (often simple random sampling).
• Only individuals within sampled clusters are surveyed.
• E.g. select a sample of BA units, survey all the staff in these units.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
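A minimal sketch of one-stage cluster sampling, in the spirit of the BA-units example: whole clusters are chosen at random and everyone in a chosen cluster is kept. The unit names and staff lists are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """clusters maps a cluster name to its list of members. Randomly choose
    n_clusters whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


if __name__ == "__main__":
    # Hypothetical organisational units and their staff.
    units = {
        "unit_A": ["a1", "a2", "a3"],
        "unit_B": ["b1", "b2"],
        "unit_C": ["c1", "c2", "c3", "c4"],
        "unit_D": ["d1"],
        "unit_E": ["e1", "e2", "e3"],
    }
    for name, staff in cluster_sample(units, n_clusters=2, seed=3).items():
        print(name, staff)
```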
Multistage sampling.

• Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods.
• For example, in Stage 1, we might use cluster sampling to choose clusters from a population. Then, in Stage 2, we might use simple random sampling to select a subset of elements from each chosen cluster for the final sample.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
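The two-stage example above (clusters first, then a simple random sample inside each chosen cluster) can be sketched like this. The districts, households and function name are hypothetical.

```python
import random


def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: simple random sample of whole clusters.
    Stage 2: simple random sample of members inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1
    return {
        # stage 2: sample within the cluster (all of it if it is small)
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


if __name__ == "__main__":
    # Hypothetical frame: 10 districts with 20 households each.
    districts = {f"district_{d}": [f"d{d}_hh{h}" for h in range(20)] for d in range(10)}
    for name, households in multistage_sample(districts, 3, 4, seed=5).items():
        print(name, households)
```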
Systematic random sampling.

• Systematic random sampling. With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from the first k elements on the population list. Thereafter, we select every kth element on the list.
• This method is different from simple random sampling since not every possible sample of n elements is equally likely.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
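A minimal sketch of systematic random sampling, assuming the sampling interval is k = N // n (N = list length) so that the last pick always stays inside the list. The function name is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k units, then every kth unit,
    where k = len(frame) // n is the sampling interval."""
    if not 0 < n <= len(frame):
        raise ValueError("need 0 < n <= len(frame)")
    k = len(frame) // n
    start = random.Random(seed).randrange(k)  # 0 <= start < k
    return [frame[start + i * k] for i in range(n)]


if __name__ == "__main__":
    print(systematic_sample(list(range(1, 101)), n=10, seed=3))
```

Only k different samples are possible, one per starting point, which is exactly why not every sample of n elements is equally likely here.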
How To Select A
Probability Sample
How to select a probability sample
Probability Sampling

• A probability-based survey sample is created by constructing a list of the target population, called the sample frame; a randomized process for selecting units from the sample frame, called a selection procedure; and a method of contacting selected units and enabling them to complete the survey, called a data collection method or mode.
Probability Sampling: Step 1
• Construct a sample frame: a probability-based survey sample is created by constructing a list of the target population, called the sample frame.
• For some target populations this process may be easy, for example, sampling the employees of a company by using a payroll list.
• However, in large, disorganized populations simply constructing a suitable sample frame is often a complex and expensive task.
Probability Sampling: Step 2

• Selecting a sample from within the sample frame: a randomized process for selecting units from the sample frame, called a selection procedure.
• Common methods of conducting a probability sample of the household population in the United States are Area Probability Sampling, Random Digit Dialing telephone sampling, and, more recently, Address-Based Sampling.
Specialized Techniques Of Probability
Sampling
• Within probability sampling there are
specialized techniques such as:
– stratified sampling &
– cluster sampling
• These techniques improve the precision or
efficiency of the sampling process without
altering the fundamental principles of
probability sampling.
Probability Sampling: Step 3

• Collecting the data:
• There must be a method of contacting selected units and enabling them to complete the survey, called a data collection method or mode.
Sources Of Bias
Major Types of Bias In Surveys

• Non-response bias

• Coverage bias

• Selection bias
Major Types of Bias In Surveys

• Non-response bias: When individuals or households selected in the survey sample cannot or will not complete the survey, there is the potential for bias to result from this non-response. Non-response bias occurs when the observed value deviates from the population parameter due to differences between respondents and non-respondents.
Major Types of Bias In Surveys

• Coverage bias: Coverage bias can occur when population members do not appear in the sample frame (undercoverage). Coverage bias occurs when the observed value deviates from the population parameter due to differences between covered and non-covered units. Telephone surveys suffer from a well-known source of coverage bias because they cannot include households without telephones.
Major Types of Bias In Surveys

• Selection Bias: Selection bias occurs when some units have a differing probability of selection that is unaccounted for by the researcher. For example, some households have multiple phone numbers, making them more likely to be selected in a telephone survey than households with only one phone number. This selection bias would be corrected by applying a survey weight equal to [1/(# of phone numbers)] to each household.
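A minimal sketch of the weighting correction just described: each responding household gets weight 1/(number of phone numbers), and the estimate becomes a weighted proportion. The six responses are invented for illustration only.

```python
def weighted_proportion(responses):
    """responses: list of (answer, phone_numbers) pairs, answer 1 = yes, 0 = no.
    Each household is weighted by 1 / phone_numbers to undo its higher
    chance of selection in a telephone sample."""
    total_weight = sum(1 / lines for _, lines in responses)
    weighted_yes = sum(answer / lines for answer, lines in responses)
    return weighted_yes / total_weight


if __name__ == "__main__":
    # Hypothetical sample: the two households with two numbers both said "yes".
    sample = [(1, 2), (1, 2), (1, 1), (0, 1), (0, 1), (0, 1)]
    unweighted = sum(answer for answer, _ in sample) / len(sample)
    print(f"unweighted: {unweighted:.2f}")                  # 0.50
    print(f"weighted:   {weighted_proportion(sample):.2f}")  # 0.40
```

The two-number households count half each, so the "yes" share falls from 3/6 to 2/5.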
But how you select your sample
is only one of the issues in doing survey
research
Bias Due to Measurement Error

• In survey research, the measurement process includes the environment in which the survey is conducted, the way that questions are asked, and the state of the survey respondent.
• Response bias refers to the bias that results from problems in the measurement process.
Some examples of response bias:

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Examples of Response Bias
(Due to error in the Measurement process)

• Leading questions. The wording of the question may be loaded in some way to unduly favor one response over another. For example, a satisfaction survey may ask the respondent to indicate whether she is satisfied, dissatisfied, or very dissatisfied.
• By giving the respondent one response option to express satisfaction and two response options to express dissatisfaction, this survey question is biased toward getting a dissatisfied response.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Examples of Response Bias – Cont’d
(Due to error in the Measurement process)

• Social desirability. Most people like to present themselves in a favorable light, so they will be reluctant to admit to unsavory attitudes or illegal activities in a survey, particularly if survey results are not confidential. Instead, their responses may be biased toward what they believe is socially desirable.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Sampling Statistic and Sampling Error

• A survey produces a sample statistic, which is used to estimate a population parameter. If you repeated a survey many times, using different samples each time, you might get a different sample statistic with each replication. And each of the different sample statistics would be an estimate for the same population parameter.
• If the statistic is unbiased, the average of all the statistics from all possible samples will equal the true population parameter, even though any individual statistic may differ from the population parameter. The variability among statistics from different samples is called sampling error.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
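The idea of repeating a survey many times can be simulated directly. A minimal sketch, assuming a hypothetical population of 10,000 voters of whom exactly 40% favor the tax, and 2,000 repeated simple random samples of 100. The average of the estimates should land close to 0.40 (the sample proportion is unbiased), while their standard deviation, close to √(0.4 × 0.6 / 100) ≈ 0.049, is the sampling error.

```python
import random
import statistics


def repeated_surveys(population, n, reps, seed=None):
    """Run the same survey design reps times; return each sample proportion."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


if __name__ == "__main__":
    population = [1] * 4_000 + [0] * 6_000  # hypothetical: 40% favor the tax
    estimates = repeated_surveys(population, n=100, reps=2_000, seed=7)
    print(f"average of the estimates: {statistics.fmean(estimates):.3f}")
    print(f"spread (sampling error):  {statistics.stdev(estimates):.3f}")
```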
Increasing The Sample size:
Reduces Sampling Error but NOT Survey Bias
• Increasing the sample size tends to reduce the
sampling error; that is, it makes the sample statistic
less variable. However, increasing sample size does
not affect survey bias.

• A large sample size cannot correct for the methodological problems (undercoverage, nonresponse bias, etc.) that produce survey bias.
• Example: The Literary Digest survey sample size was very large (over 2 million surveys were completed), but the large sample size could not overcome problems with the sample: undercoverage and nonresponse bias.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
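A minimal simulation, in the spirit of the Literary Digest example, of why a bigger sample does not remove bias. All numbers are hypothetical assumptions: 50% of the population favors a candidate, but supporters reply with probability 0.2 and opponents with probability 0.4. Among respondents, supporters are then only 0.1 / (0.1 + 0.2) = 1/3, so the estimate settles near 33% however many people are contacted; only its spread shrinks.

```python
import random


def biased_poll(n_contacted, p_true, resp_yes, resp_no, seed=None):
    """Contact n_contacted people; supporters (probability p_true) reply with
    probability resp_yes, opponents with probability resp_no.
    Returns the share of supporters among those who replied."""
    rng = random.Random(seed)
    supporters = replied = 0
    for _ in range(n_contacted):
        favors = rng.random() < p_true
        if rng.random() < (resp_yes if favors else resp_no):
            replied += 1
            supporters += int(favors)
    return supporters / replied


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        estimate = biased_poll(n, p_true=0.5, resp_yes=0.2, resp_no=0.4, seed=n)
        print(f"contacted {n:>9,}: estimate {estimate:.3f} (true value 0.500)")
```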
The Null Hypothesis &
Types of Error
To analyze survey data and arrive at a
conclusion, we need to formulate a
Null Hypothesis
Null Hypothesis

• It is usually a statement that can be falsified and whose acceptance or rejection yields a useful insight into the problem being studied and for which the data was collected.
• The null hypothesis is a hypothesis which the researcher tries to disprove, reject or nullify.
• It is symbolized by H0.
The first to formalize the notion of the
“Null Hypothesis”

Ronald Fisher (1890-1962)


How do you state your basic (null)
Hypothesis?
Usually:
the normal state
(don’t worry, no effect, no change)
Or:
there is no difference between expected
and observed (i.e. difference is due to
chance only)
One-tailed or Two-tailed Tests

• One-tailed:
  Accept H0 | Reject H0
• Two-tailed:
  Reject H0 | Accept H0 | Reject H0


Usually:
No directionality: use two-tailed test
Directionality: use one-tailed test
The Null Hypothesis identifies which kind of
test is needed: One tailed or two-tailed

• In classical science, the H0 statement is most typically that there is no effect of a particular treatment; in observations, it is typically that there is no difference between the value of a particular measured variable and that of a prediction, or between two means. For such statements we use a two-tailed test.
• But when there is directionality, i.e. when we say that something is better than, bigger than or less than something else, we use a one-tailed test.
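The practical difference shows up in the critical values. A minimal sketch, assuming a z-test (normal sampling distribution) at α = 0.05, using the standard-library `statistics.NormalDist`; a t-test would use the t distribution with n − 1 degrees of freedom and give somewhat larger cutoffs for small samples.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# One-tailed: all of alpha sits in a single tail.
z_one_tailed = standard_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha is split, alpha / 2 in each tail.
z_two_tailed = standard_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed: reject H0 if z > {z_one_tailed:.3f}")
print(f"two-tailed: reject H0 if |z| > {z_two_tailed:.3f}")
```

A one-tailed test reaches significance at a smaller z (1.645 versus 1.960), which is why it should be used only when the hypothesis really is directional.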
BUT:
In Accepting or rejecting the Null Hypothesis
we could be making
Two different types of error
Type I error:
(False Positive: rejecting H0 when it is actually true)
• Test says: This person has cancer
• Reality: This person is healthy
• Test says: This person is guilty
• Reality: This person is not guilty
• Test says: This product is faulty
• Reality: This product is good
Type II error:
(False Negative: failing to reject H0 when it is actually false)
• Test says: This person is healthy
• Reality: This person has cancer
• Test says: This person is not guilty
• Reality: This person is guilty
• Test says: This product is good
• Reality: This product is faulty
Type I & Type II Error

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Two other kinds of error:

• In 1948, Frederick Mosteller (1916-2006) proposed a Type III error: "correctly rejecting the null hypothesis for the wrong reason" (1948, p.61).
Two other kinds of error:

• In 1970, Marascuilo and Levin proposed a "fourth kind of error", a "Type IV error", defined as the mistake of "the incorrect interpretation of a correctly rejected hypothesis";
• which, they suggested, was the equivalent of
"a physician's correct diagnosis of an
ailment followed by the prescription of a
wrong medicine" (1970, p.398).
Other risks of error:
This is in addition to many other risks:
• Correctly specifying the problem
• Sampling design
• Experimental or quasi-experimental
designs
• Correctly understanding the kind of data
and its limitations
• Correctly specifying the type of statistical
analysis
• Correctly interpreting the results
Calculation &
Conclusions
Conclusion of the statistical analysis is
to accept/reject the Null Hypothesis
Type I & Type II Error

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Type I & Type II Errors

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
Larger samples mean more precise
estimation of the population parameter

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


How to refer to significance level of a test
(all these statements are equivalent)

You should be familiar with these expressions


Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
Tips to Help Avoid Common Mistakes

• Remember to convert between variance and standard deviation.
• Check if the hypothesis is one- or two-tailed. For two-tailed, split α into α/2 in each tail.
• Always use n - 1 degrees of freedom for a one-sample t-test.
• Keep statistics (x̄, s) distinct from population parameters (μ, σ).
Choosing the significance level for
a test
• Remember: the smaller the significance level α (say 0.01 rather than 0.05), the more stringent the test.
• Choose the level based on:
– Sample size
– Estimated size of the effect being tested
– Consequences of making a mistake
• Common significance levels:
– .05 (1 chance in 20);
– .01 (1 chance in a hundred) or
– .001 (1 chance in a thousand)
Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
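The phrase "1 chance in 20" can be checked by simulation. A minimal sketch, assuming a two-tailed z-test of H0: μ = 0 with known σ = 1, applied to samples of 30 drawn from a population where H0 is in fact true. The share of tests that wrongly reject H0, i.e. the Type I error rate, should come out close to the chosen α. The sample size, number of trials and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha, n=30, trials=4_000, seed=None):
    """Fraction of two-tailed z-tests that reject a TRUE null hypothesis
    (population mean 0, known sigma 1) at significance level alpha."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 really holds
        z = fmean(sample) / (1.0 / math.sqrt(n))  # standard error = sigma / sqrt(n)
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.001):
        rate = type_i_error_rate(alpha, seed=11)
        print(f"alpha = {alpha}: rejected a true H0 in {rate:.2%} of tests")
```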
Common Mistakes

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Let's take a few simple examples of a
calculation
Remember: the normal (Gaussian)
distribution, the Bell Curve…
It has a mean, and a standard deviation.
The standard deviation defines how
“spread out” the distribution is:
Remember:
The sample statistic (measured)
is only an estimate for
the Population parameter (inferred)

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Common Statistical Notation

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Numerical Measures (Formulae)
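Mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$
Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Standard Error of the Mean: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Median: the middle value of ordered values
Nth percentile: the value such that N% of ordered values lie below it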
Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
Assume that we have the mean of a
distribution. We need to find the
standard deviation (or its square:
the variance)
The Variance is the square of the
Standard Deviation
Calculating the Variance and the
standard deviation
• The formula for calculating the variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• The standard deviation is given by:
$s = \sqrt{s^2}$
Example: calculating Variance and
Standard Deviation
For example, using these six measures: 3, 9, 1, 2, 5 and 4:
$\sum x = 3 + 9 + 1 + 2 + 5 + 4 = 24$
$\sum x^2 = 3^2 + 9^2 + 1^2 + 2^2 + 5^2 + 4^2 = 9 + 81 + 1 + 4 + 25 + 16 = 136$
The quantities are then substituted into the shortcut formula to find $\sum (x - \bar{x})^2$:
$\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 136 - \frac{24^2}{6} = 136 - \frac{576}{6} = 40$
The variance and standard deviation are now found as before:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{40}{5} = 8$
$s = \sqrt{s^2} = \sqrt{8} \approx 2.83$
Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
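The same arithmetic can be checked with a short Python sketch. The standard-library `statistics.variance` and `statistics.stdev` use the n − 1 denominator, so they should agree with the hand calculation; the standard error of the mean from the formula list is added for completeness.

```python
import math
import statistics

data = [3, 9, 1, 2, 5, 4]
n = len(data)

sum_x = sum(data)                    # 24
sum_x2 = sum(x * x for x in data)    # 136
ss = sum_x2 - sum_x ** 2 / n         # shortcut formula: 136 - 576 / 6 = 40.0
variance = ss / (n - 1)              # 40 / 5 = 8.0
std_dev = math.sqrt(variance)        # sqrt(8), about 2.83
sem = std_dev / math.sqrt(n)         # standard error of the mean, about 1.15

print(variance, statistics.variance(data))
print(round(std_dev, 2), round(statistics.stdev(data), 2))
print(round(sem, 2))
```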
We will say more about the
standard deviation and the
variance in a moment
Understanding What Is
Behind A Formula
Clear thinking about statistics:
understanding what is behind the
formula
• I want you to understand the logic behind a formula. You do not need to memorize any formula. You understand it by asking questions….
• For example, let's look at the formula for computing the sample variance:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• Let's ask: why this? and why that?
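Why do we square the deviations from the mean?
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
• Because the deviations from the mean always add up to zero, so their plain sum tells us nothing about the spread.
• So, to deal with this problem, we square the deviations: a squared deviation is never negative, so the terms cannot cancel out.
• Bonus: notice that squaring also gives extra weight to large deviations (a deviation of 5 counts 25 times as much as a deviation of 1); therefore it helps us better feel the spread of the data.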
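A small check of both points, using the six values from the worked example (the variable names are ours):

```python
data = [3, 9, 1, 2, 5, 4]
mean = sum(data) / len(data)            # 4.0

deviations = [x - mean for x in data]   # [-1.0, 5.0, -3.0, -2.0, 1.0, 0.0]
squared = [d ** 2 for d in deviations]  # [1.0, 25.0, 9.0, 4.0, 1.0, 0.0]

print(sum(deviations))  # 0.0: positive and negative deviations cancel
print(sum(squared))     # 40.0: squared deviations cannot cancel
```

Note how the single deviation of 5 contributes 25 of the total 40: that is the extra weight squaring gives to large deviations.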
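Why not raise to the power of four (three will not work)?
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
• Three will not work because an odd power keeps the sign of each deviation, so positive and negative deviations can still cancel.
• Four would work, but squaring does the trick; why should we make life more complicated than it is?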
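To see why an odd power fails, compare two hypothetical data sets: one with obvious spread and one with none. The sketch below (the helper `sum_of_powered_deviations` is our own) sums the deviations raised to the powers 2, 3 and 4:

```python
def sum_of_powered_deviations(xs, power):
    """Sum of (x - mean) ** power over the data."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** power for x in xs)

spread_out = [1, 2, 3]  # clearly has spread
no_spread = [2, 2, 2]   # no spread at all

for p in (2, 3, 4):
    print(p, sum_of_powered_deviations(spread_out, p), sum_of_powered_deviations(no_spread, p))
```

The cubed deviations of the spread-out data sum to zero, exactly as for the data with no spread, so the cube cannot tell the two apart; the squares (and the fourth powers) can.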
Why is there a summation notation in the formula?
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
• To add up the squared deviation of each data point and obtain the total sum of squared deviations.
Why do we divide the sum of squares by n-1?
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
• The measure of spread should also reflect how large the sample is, so we must bring in the sample size.
• Why? Because the sum of squared deviations grows as the sample gets larger: each additional observation adds another term that is zero or positive.
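An illustration, assuming we simply observe the same six values from the worked example twice: the spread of the data is unchanged, but the sum of squared deviations doubles. Dividing by the sample size (here n - 1) brings the two results back to a comparable scale. The helper name is our own.

```python
def sum_of_squares(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

small = [3, 9, 1, 2, 5, 4]
large = small * 2  # the same values observed twice: same spread, twice the data

for xs in (small, large):
    ss = sum_of_squares(xs)
    print(len(xs), ss, ss / (len(xs) - 1))
```

The sum of squares goes from 40 to 80, while the variance changes only from 8.0 to about 7.27.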
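Why divide by n-1, not n?
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
• Deviations are measured from the sample mean, and the sample mean is the point that makes the sum of squared deviations as small as possible; so these deviations are, in total, no larger than deviations from the unknown population mean. Dividing by n therefore tends to underestimate the population variance. Dividing by n-1 corrects this, so that on average the sample variance equals the population variance.
• But for larger samples (say, over 30), it makes little difference whether we divide by n or n-1: the results are almost the same (for n = 30 they differ by only about 3%), and both are acceptable.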
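A minimal Monte Carlo sketch of this bias, assuming a normal population with variance 4 and samples of size 5 (both choices are ours, for illustration only). It averages the two versions of the estimate over many random samples:

```python
import random

random.seed(1)        # fixed seed for a repeatable illustration
TRUE_VARIANCE = 4.0   # assumed population: normal, mean 0, standard deviation 2
n, trials = 5, 20_000

total_div_n = total_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # deviations from the SAMPLE mean
    total_div_n += ss / n
    total_div_n_minus_1 += ss / (n - 1)

avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials
print(f"true variance:          {TRUE_VARIANCE}")
print(f"average, divide by n:   {avg_div_n:.2f}")          # tends toward 4 * (n-1)/n = 3.2
print(f"average, divide by n-1: {avg_div_n_minus_1:.2f}")  # tends toward 4.0

# For a larger sample the two divisors barely differ:
print(f"n = 30: ratio n/(n-1) = {30 / 29:.3f}")
```

With n = 5, dividing by n settles near 4 × 4/5 = 3.2, while dividing by n-1 settles near the true value of 4. For n = 30 the two divisors differ only by the factor 30/29.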
Does n-1 Have a Meaning?
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
• The factor n-1 is what we call the "degrees of freedom" (but that is another discussion).
• Degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
Explaining the number of values that are free to vary
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
• For example, if we have two observations, when calculating the mean we have two independent observations;
• however, when calculating the variance, we have only one independent observation, since the two observations are equally distant from the mean (their deviations are equal in size and opposite in sign).
Degrees of Freedom
• The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom (df).
• So, to calculate the sample mean, we use all n observations in the sample.
• But to calculate the deviations from the mean, we have one fewer (n-1). Why?
• Because the deviations must add up to zero: once n-1 of them are known, the last one is fixed. With two observations, for example, both are at the same distance from the mean.
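A short sketch of why one degree of freedom is lost. It uses the six values from the worked example, listed in a different order (our choice, so that the last deviation is not zero), plus a hypothetical pair of observations:

```python
data = [3, 9, 1, 2, 4, 5]  # same six values as the worked example, reordered
n = len(data)
mean = sum(data) / n       # 4.0

# Suppose we know the mean and only the first n-1 values.
known = data[:-1]
# The values must sum to n * mean, so the last value is not free to vary:
last = n * mean - sum(known)
print(last)                # 5.0, the same as data[-1]

# Equivalently, the deviations must sum to zero, so the last deviation is fixed too.
known_deviations = [x - mean for x in known]
last_dev = -sum(known_deviations)
print(last_dev)            # 1.0, which is 5 - 4

# Two observations: their deviations are equal in size and opposite in sign.
a, b = 10, 16
m = (a + b) / 2            # 13.0
print(a - m, b - m)        # -3.0 3.0
```

Once the mean is fixed, only n-1 values (or deviations) can be chosen freely; the last one is forced.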
This example shows how to question statistical formulas,
to help you understand them rather than memorize them.
Then you can use the concepts better.
Clear thinking is always more important than the ability to calculate something.
Clear Thinking
Social surveys
• Framing the Issues
• Identifying the target population
• Sample Frame and Sample design
• Instrument design
• Gathering data
• Analyzing data
• Interpreting Results
That is done within the framework of
a research design
Applications
• Market research
• Opinion poll
• Voting expectations
• Educational or Health studies
• Sociological studies
• Medical clinical studies
And so much more…
Examples of Major US/UK Surveys
• National Election Studies
• Gallup poll
• General Social Survey
• International Social Survey
• United Kingdom Census
• United States Census
• National Health and Nutrition
Examination Survey
• World Values Survey
Again:
Clear thinking is always more
important than the ability to calculate
something.
So, One More Time…
With clear thinking you will not be a
turkey…
You will learn to fly…
Some will even soar like an eagle
Thank You
