Introduction To Statistics: February 21, 2006

Introduction to Statistics
February 21, 2006

Statistics and Research Design
• Statistics: Theory and method of analyzing quantitative data from samples
of observations … to help make decisions about hypothesized relations.
– Tools used in research design
• Research Design: Plan and structure of the investigation so as to answer
the research questions (or hypotheses)
Statistics and Research Design
• Analogy:
– Research design is the blueprint of the study.
– In quantitative designs, statistical design and procedures are the craft
and tools used to conduct quantitative studies.
– The logic of hypothesis testing is the decision-making process that links
statistical design to research design.
Statistics
• There are two types of statistics
– Descriptive Statistics: tabulate, depict, and describe data
– Inferential Statistics: predict or estimate characteristics of a
population from knowledge of the characteristics of only a sample of that
population
Descriptive Statistics
Scales of Measurement
– Nominal
• No numerical or quantitative properties. A way to
classify groups or categories.
• Gender: Male and Female
• Major: RC or PH
– Ordinal
• Used to rank and order the levels of the variable
being studied. No particular value is placed
between the numbers in the rating scale.
• Movie Ratings: 4 Stars, 3 Stars, 2 Stars, and 1 Star
Scales of Measurement Cont.

– Interval
• Difference between the numbers on the scale is meaningful
and intervals are equal in size. NO absolute zero.
• Allows for comparisons between things being measured
• Temperatures on a thermometer: The difference between 60 and 70 is the
same as the difference between 90 and 100. You cannot say that 70 degrees
is twice as hot as 35 degrees; it is only 35 degrees warmer.
– Ratio
• Scales that have an absolute zero point that indicates the absence of the
variable being studied. Can form ratios.
• Weight: 100 pounds is ½ of 200.
• Time
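A minimal sketch of why ratios are meaningful on a ratio scale but not on an interval scale, using the temperature and weight examples above. The helper names are illustrative, not from the slides. Changing the unit changes the ratio of two temperatures but leaves the ratio of two weights alone.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def pounds_to_kilograms(pounds: float) -> float:
    """Convert pounds to kilograms (1 lb = 0.45359237 kg)."""
    return pounds * 0.45359237


# Interval scale: zero is arbitrary, so the ratio depends on the unit.
ratio_f = 70.0 / 35.0
ratio_c = fahrenheit_to_celsius(70.0) / fahrenheit_to_celsius(35.0)

# Ratio scale: zero means "none of the quantity", so the ratio is unit-free.
ratio_lb = 100.0 / 200.0
ratio_kg = pounds_to_kilograms(100.0) / pounds_to_kilograms(200.0)

print("Temperature ratio in F:", ratio_f)
print("Temperature ratio in C:", round(ratio_c, 2))
print("Weight ratio in lb:", ratio_lb)
print("Weight ratio in kg:", round(ratio_kg, 2))
```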
• Frequency Distributions
– In tables, the frequency distribution is constructed by summarizing data
in terms of the number or frequency of observations in each category,
score, or score interval
– In graphs, the data can be concisely summarized into bar graphs,
histograms, or frequency polygons
– Distribution shapes shown: Normal Curve, Bimodal Curve, Positively
Skewed, Negatively Skewed
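A frequency distribution table like the one described above can be sketched in a few lines. The scores and the function name frequency_table are illustrative assumptions. The text bars are only a rough stand-in for a bar graph.

```python
from collections import Counter


def frequency_table(scores):
    """Return (score, frequency) pairs sorted by score."""
    counts = Counter(scores)
    return sorted(counts.items())


# Illustrative scores, not real data.
scores = [3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6]
table = frequency_table(scores)

print("Score  Freq")
for score, freq in table:
    # One '#' per observation gives a crude text bar graph.
    print(f"{score:>5}  {freq:>4}  {'#' * freq}")
```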
• Measures of Central Tendency

– Mode
• The most frequently occurring score
• 3 3 3 4 4 4 5 5 5 6 6 6 6: Mode is 6
• 3 3 3 4 4 4 5 5 6 6 7 7 8: Modes are 3 and 4 (two modes)
– Median
• The score that divides a group of scores in half with 50% falling above
and 50% falling below the median.
• 3 3 3 5 8 8 8: The median is 5
• 3 3 5 6: The median is 4 (Average of two middle numbers)
– Mean
• Preferred whenever possible and is the measure of central tendency most
often used in advanced statistical calculations:
– More reliable and accurate
– Better suited to arithmetic calculations
• Basically, an average of all scores. Add up all scores and divide by the
total number of scores.
• 2 3 4 6 10: Mean is 5 (25/5)
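The worked examples above can be checked with Python's standard statistics module. This is only a sketch of one way to compute the three measures. The variable names are illustrative.

```python
from statistics import mean, median, multimode

# Mode: multimode returns every most-frequent score, so ties are visible.
single_mode = multimode([3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6])
two_modes = multimode([3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7, 8])

# Median: the middle score, or the average of the two middle scores.
median_odd = median([3, 3, 3, 5, 8, 8, 8])
median_even = median([3, 3, 5, 6])

# Mean: sum of the scores divided by the number of scores.
mean_score = mean([2, 3, 4, 6, 10])

print("Modes:", single_mode, two_modes)      # [6] [3, 4]
print("Medians:", median_odd, median_even)   # 5 4.0
print("Mean:", mean_score)                   # 5
```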
• Measures of Central Tendency

– Your Turn!
– Mode
• Example: 2 3 4 4 4 6 8 9 10 11 11
– Median
• Example: 2 3 4 4 4 6 8 9 10 11 11
– Mean
• Example: 2 3 4 4 4 6 8 9 10 11 11
• Measures of Variability (Dispersion)

– Range
• Calculated by subtracting the lowest score from the highest
score.
• Used only for Ordinal, Interval, and Ratio scales as the data
must be ordered
– Example: 2 3 4 6 8 11 24 (Range is 22)
– Variance
• The extent to which individual scores in a distribution of scores
differ from one another, measured as the average squared deviation of the
scores from the mean
– Standard Deviation
• The square root of the variance
• Most widely used measure to describe the dispersion among a
set of observations in a distribution.
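A short sketch of the three measures of variability, using the standard statistics module. The slides do not say whether the variance divides by N (population) or by N - 1 (sample), so both are shown here as an assumption. score_range is an illustrative helper name.

```python
from statistics import pstdev, pvariance, stdev, variance


def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


range_example = score_range([2, 3, 4, 6, 8, 11, 24])

scores = [2, 3, 4, 6, 10]  # mean is 5; squared deviations sum to 40
population_variance = pvariance(scores)  # divides by N
sample_variance = variance(scores)       # divides by N - 1
population_sd = pstdev(scores)           # square root of population variance
sample_sd = stdev(scores)                # square root of sample variance

print("Range:", range_example)                            # 22
print("Variance (population, sample):", population_variance, sample_variance)
print("SD (population, sample):", round(population_sd, 3), round(sample_sd, 3))
```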
• Standard Scores: Z-Scores and T-Scores

– Z-Scores
• Most widely used standard score in statistics
– It is the number of standard deviations above or below the
mean.
• A Z score of 1.5 means that the score is 1.5 standard
deviations above the mean; a Z score of -1.5 means that
the score is 1.5 standard deviations below the mean
• Always have the same meaning in all distributions
• To find a percentile rank, first convert to a Z score and
then find percentile rank off a normal-curve table
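As a rough sketch, the conversion to a Z score and the normal-curve table lookup can be done in code. The test mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration, and the percentile rank assumes the scores are normally distributed.

```python
from statistics import NormalDist


def z_score(raw: float, mean: float, sd: float) -> float:
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd


def percentile_rank(z: float) -> float:
    """Percent of a normal distribution falling at or below z."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical test: mean 100, standard deviation 15, raw score 122.5.
z = z_score(122.5, mean=100.0, sd=15.0)
rank = percentile_rank(z)

print("Z score:", z)                        # 1.5
print("Percentile rank:", round(rank, 1))   # about 93.3
```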
• Standard Scores: Z-Scores and T-Scores

– T-Scores
• Most commonly used standard score for
reporting performance
• May be converted from Z-scores and are always rounded to two digits,
which eliminates decimals
• Always reported in positive numbers
• The mean is always 50 and the standard
deviation is always 10.
– A T-score of 70 is 2 SDs above the mean
– A T-score of 20 is 3 SDs below the mean
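Because the T-score mean is 50 and the standard deviation is 10, the conversion from a Z score is T = 50 + 10 × Z. A minimal sketch follows. The function names are illustrative. Note that Python's round sends exact halves to the nearest even integer, which may differ from a hand-rounding convention.

```python
def t_score(z: float) -> int:
    """Convert a Z score to a T-score (mean 50, SD 10), rounded to a whole number."""
    return round(50 + 10 * z)


def z_from_t(t: float) -> float:
    """Convert a T-score back to a Z score."""
    return (t - 50) / 10


print(t_score(2.0))    # 70, i.e. 2 SDs above the mean
print(t_score(-3.0))   # 20, i.e. 3 SDs below the mean
print(z_from_t(65))    # 1.5
```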
• Correlation or Covariation
– A correlation coefficient is a statistical summary of the degree or
magnitude and direction of the relationship or association between two
variables
– It is possible to have a negative or positive correlation
• Linear Regression
– The purpose of a regression equation is to make predictions on a new
sample of observations from the findings on a previous sample
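The slides do not name a specific coefficient or fitting method. As an assumption, the sketch below uses the Pearson correlation coefficient and ordinary least squares, the most common choices. The data and function names are made up for illustration.

```python
from math import sqrt


def _sums_of_squares(xs, ys):
    """Return (Sxy, Sxx, Syy, mean_x, mean_y) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation: sign gives direction, size gives magnitude."""
    sxy, sxx, syy, _, _ = _sums_of_squares(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    sxy, sxx, _, mean_x, mean_y = _sums_of_squares(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical paired observations from a previous sample.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
slope, intercept = fit_line(xs, ys)
predicted = intercept + slope * 6  # prediction for a new observation, x = 6

print("r:", round(r, 3))
print("y =", round(intercept, 2), "+", round(slope, 2), "* x")
print("Predicted y at x = 6:", round(predicted, 2))
```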
Inferential Statistics: Sampling
• Sampling relates to the degree to which those surveyed are representative
of a specific population
• The sample frame is the set of people who have the chance to respond to
the survey
• A question related to external validity is the degree to which the sample
frame corresponds to the population to which the researcher wants to apply
the results (Fowler, 1988)
Sampling
• Two basic types: probability and non-probability
• Probability sampling can include random sampling, stratified random
sampling, and cluster sampling
• Non-probability sampling can include quota sampling, haphazard sampling,
and convenience sampling
Random Sampling
• Every unit has an equal chance of selection
• Although it is relatively simple, members of specific subgroups may not
be included in appropriate proportions
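A minimal sketch of simple random sampling without replacement, assuming the sample frame is available as a list. The population of 100 placeholder names and the function name are hypothetical.

```python
import random


def simple_random_sample(units, sample_size, seed=None):
    """Draw sample_size units without replacement; every unit is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(units, sample_size)


# Hypothetical sample frame of 100 people.
population = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(population, 10, seed=42)

print(sample)
```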
Stratified Random Sampling
• The population is grouped according to meaningful characteristics or strata
• This method is more likely to reflect the general population, and
subgroup analysis is possible
• However, it can be time consuming and costly
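One way to sketch stratified random sampling is to group the units by stratum and then draw the same fraction at random from each group (proportional allocation, which is an assumption here, since the slides do not specify an allocation rule). The majors RC and PH reuse the nominal-scale example from earlier. The group sizes are invented.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Randomly sample the same fraction of units from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Take at least one unit so that no stratum is left out.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 RC majors and 40 PH majors.
population = [("RC", i) for i in range(60)] + [("PH", i) for i in range(40)]
sample = stratified_sample(population, lambda unit: unit[0], 0.10, seed=1)

rc_count = sum(1 for major, _ in sample if major == "RC")
ph_count = sum(1 for major, _ in sample if major == "PH")
print("RC sampled:", rc_count, "PH sampled:", ph_count)  # 6 and 4
```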
Systematic Sampling
• Every xth unit is selected
– (e.g., every other person entering the Swamp at Gate 1 was selected)
• The method is convenient and close to random sampling if the starting
point is randomly chosen
• Recurring patterns can occur and should be examined
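Systematic sampling with a random starting point can be sketched with a list slice. The arrival list and the function name are hypothetical. A step of 2 mirrors the "every other person" example.

```python
import random


def systematic_sample(units, step, seed=None):
    """Select every step-th unit, starting from a random position within the first step."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start keeps the method close to random sampling
    return units[start::step]


# Hypothetical arrival order of 20 people at a gate.
arrivals = list(range(1, 21))
sample = systematic_sample(arrivals, step=2, seed=7)

print(sample)
```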
Cluster/Multistage Sampling
• Natural groups are sampled and then their members are sampled
• This method is convenient and can use existing units
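A two-stage version of cluster sampling can be sketched as follows: first pick some natural groups at random, then pick members within each chosen group. The classrooms, their sizes, and the function name are assumptions made for illustration.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample members within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical natural groups: five classrooms of 20 students each.
classrooms = {
    f"class_{c}": [f"class_{c}_student_{s}" for s in range(20)]
    for c in "ABCDE"
}
sample = two_stage_cluster_sample(classrooms, n_clusters=2, n_per_cluster=5, seed=3)

for name, members in sample.items():
    print(name, members)
```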
Convenience Sampling
• This method uses readily available groups or units of individuals
• It is practical and easy to use
• However, it may produce a biased sample
• Convenience sampling can be perfectly acceptable if the purpose of the
research is to test a hypothesis that certain variables are related to one
another
Snowball Sampling
• Previously identified members identify others
• This method is useful when a list of potential names is difficult to obtain
• However, it may produce a biased sample
Quota Sampling
• The population is divided into subgroups and the sample is selected based
on the proportions of the subgroups necessary to represent the population
• This method depends on reliable data about the proportions in the
population
Statistics & Parameters
– A parameter is a value, usually unknown (and which therefore has to be
estimated), used to represent a certain population characteristic. For
example, the population mean is a parameter that is often used to indicate
the average value of a quantity.
– A statistic is a quantity that is calculated from a sample of data. It is
used to give information about unknown values in the corresponding
population. For example, the average of the data in a sample is used to
give information about the overall average in the population from which
that sample was drawn.
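The distinction can be illustrated with a toy population where the parameter happens to be known. In real research it usually is not. The population values 1 to 100 and the sample size of 30 are arbitrary assumptions.

```python
import random
from statistics import mean

# Toy population where the parameter can be computed directly.
population = list(range(1, 101))
population_mean = mean(population)  # the parameter: 50.5

# A statistic is computed from a sample and used to estimate the parameter.
rng = random.Random(0)
sample = rng.sample(population, 30)
sample_mean = mean(sample)          # the statistic

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
print("Estimation error:", sample_mean - population_mean)
```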
Sampling Distribution
• The sampling distribution describes probabilities associated with a
statistic when a random sample is drawn from a population
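A sampling distribution can be approximated by simulation: draw many random samples, compute the statistic for each, and look at how those values are distributed. The sketch below does this for the sample mean. The population, sample size, and number of repetitions are arbitrary assumptions.

```python
import random
from statistics import mean, pstdev


def sample_means(population, sample_size, n_samples, seed=None):
    """Means of repeated random samples: an empirical sampling distribution."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, sample_size)) for _ in range(n_samples)]


population = list(range(1, 101))
means = sample_means(population, sample_size=25, n_samples=2000, seed=0)

center = mean(means)    # should sit close to the population mean of 50.5
spread = pstdev(means)  # much smaller than the spread of individual scores

print("Mean of sample means:", round(center, 2))
print("SD of sample means:", round(spread, 2))
print("SD of population:", round(pstdev(population), 2))
```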
Response Rates
• Whatever the sampling technique, response rates and non-response bias
must be considered
http://content.apa.org/journals/pro/32/3/248.html
• Low response rates can introduce bias into the sample
• In cases of low response rates, people who respond to the survey are
likely to be systematically different from people who do not respond
Response Rates
• In mail surveys, the results of non-response bias can be examined by
comparing those who respond early with those who respond after follow up
• Most government-sponsored surveys require response rates of 75%
• For mail surveys, post-cards, follow-up letters, and telephone calls are
used to increase the response rates (Fowler, 1988)
• According to Babbie (1989), a response rate of 70% is very good, 60% is
good, and 50% is adequate
Inferential Statistics
• Interval Estimate
– A range or band within which the parameter is
thought to lie, instead of a single point or value as
the estimate of the parameter
• Sampling Distributions
– The sampling distribution of the mean is a frequency distribution, not of
observations, but of means of samples, each based on n observations.
– The standard error of the mean is used as an estimate of the magnitude of
sampling error. It is the standard deviation of the sampling distribution
of the sample means.
• Confidence Intervals
– The confidence level corresponds to the percentage of cases in a normal
distribution that lie within 1, 2, or 3 standard deviations of the mean
(about 68%, 95%, and 99.7%)
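The ideas on this slide can be sketched together: estimate the standard error of the mean as s / sqrt(n), build an interval estimate a chosen number of standard errors either side of the sample mean, and read the matching coverage off the normal curve. The function names are illustrative, and the normal approximation is an assumption. With a sample this small, a t-based multiplier would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def standard_error(sample):
    """Estimated standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


def interval_estimate(sample, n_standard_errors):
    """Band around the sample mean, n_standard_errors wide on each side."""
    center = mean(sample)
    margin = n_standard_errors * standard_error(sample)
    return center - margin, center + margin


def normal_coverage(n_sd):
    """Proportion of a normal distribution within n_sd SDs of the mean."""
    z = NormalDist()
    return z.cdf(n_sd) - z.cdf(-n_sd)


sample = [2, 3, 4, 6, 10]
se = standard_error(sample)
low, high = interval_estimate(sample, 2)

print("Standard error:", round(se, 3))
print("Interval (mean +/- 2 SE):", round(low, 2), "to", round(high, 2))
for k in (1, 2, 3):
    print(f"Within {k} SD: {100 * normal_coverage(k):.1f}%")
```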
• Central Limit Theorem

– States that the sampling distribution of the sample mean (and of many
other statistics) approaches a normal distribution as the sample size, n,
increases
• Hypothesis Testing – will cover next.
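As a rough illustration of the central limit theorem stated above, the simulation below draws sample means from a strongly skewed (exponential) population and measures how skewed the distribution of those means is. The choice of population, the sample sizes, and the function names are assumptions for this sketch. Skewness near 0 indicates a symmetric, more normal-looking shape.

```python
import random
from math import sqrt


def skewness(values):
    """Standardized third moment: about 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / n)
    return sum(((v - m) / sd) ** 3 for v in values) / n


def skewness_of_sample_means(sample_size, n_samples, rng):
    """Skewness of the means of n_samples samples from an exponential population."""
    means = []
    for _ in range(n_samples):
        draws = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(draws) / sample_size)
    return skewness(means)


rng = random.Random(0)
skew_n2 = skewness_of_sample_means(sample_size=2, n_samples=2000, rng=rng)
skew_n50 = skewness_of_sample_means(sample_size=50, n_samples=2000, rng=rng)

print("Skewness of sample means, n = 2:", round(skew_n2, 2))
print("Skewness of sample means, n = 50:", round(skew_n50, 2))
```

The skewness for n = 50 should come out well below the value for n = 2, which is the pattern the theorem predicts.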

• Types of Statistical Analysis - Descriptive
– Quantify the degree of relationship between variables
– Parametric tests are used to test hypotheses with stringent assumptions
about observations
• e.g., t-test, ANOVA
– Nonparametric tests are used with data in a nominal or ordinal scale
• e.g., Chi-Square, Mann-Whitney U, Wilcoxon
• Types of Statistical Analysis - Inferential
– Allow generalization about populations using data from samples
– Non-parametric
• Non-parametric tests do not require any assumptions about normal
distribution, but are generally less sensitive than parametric tests.
• The test for nominal data is the Chi-Square test
• The tests for ordinal data are the Kolmogorov-Smirnov test, the
Mann-Whitney U test, and the Wilcoxon Matched-Pairs Signed-Ranks test
– Parametric
• The tests for interval and ratio data include the t-test, ANOVA,
ANCOVA, and Post-Hoc ANOVA tests
