Advanced Statistics
Janette Walde
[email protected]
Department of Statistics
University of Innsbruck
Contents
Introduction
Basics/Descriptive Statistics
Scales of measurement
Graphical exploration of data
Descriptive characteristics for a variable
Estimation
Characteristics of an estimator
Confidence interval
Statistical hypothesis testing
Statistical testing principle
Testing errors
Power analysis
Why multivariate analysis?
Introduction
Preliminary comments
1. You will learn to apply statistical tools correctly,
interpret the findings appropriately and get an
idea about the possibilities of analyzing
research questions employing statistics.
2. It is neither possible nor worthwhile to learn
all statistical methods in such a course.
However, this course is successful if it enables
you to extend your knowledge of statistical
methods on your own. The course therefore
gives you a thorough understanding of some
statistical analysis tools and shows you how to
apply them correctly.
3. Even with the most sophisticated analysis
tools, one may run into limits in obtaining
results, finding appropriate interpretations, or
applying the tools in the given framework.
This has to be accepted
(“If we torture the data long enough, they will
confess.”).
4. Be aware: Never confuse statistical significance
with biological significance.
Basics/Descriptive Statistics
Scales of measurement
1. Nominal Scale. Nominal data are attributes like
sex or species, and represent measurement at
its weakest level. We can determine if one
object is different from another, and the only
formal property of nominal scale data is
equivalence.
2. Ranking Scale. Some biological variables
cannot be measured on a numerical scale, but
individuals can be ranked in relation to one
another. Two formal properties occur in
ranking data: equivalence and greater than.
Histogram
[Figure: histograms of a normal distribution (variable X) and a skewed distribution (variable Y); vertical axis: frequency (density).]
Graphical exploration of data
Box Plot
[Figure: box plots of a normal distribution (variable X) and a skewed distribution (variable Y).]
Q-Q Plot
• Many statistical methods make some
assumptions about the distribution of the data
(e.g. normality).
• The quantile-quantile (Q-Q) plot provides a way to
investigate such an assumption visually.
• The Q-Q plot shows the theoretical quantiles
versus the empirical quantiles. If the
assumed (theoretical) distribution is indeed
the correct one, the points should lie
approximately on a straight line.
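
The construction of a normal Q-Q plot can be sketched in a few lines. This is a minimal illustration and not part of the original slides: the two simulated samples, the plotting positions (i − 0.5)/n and the helper names `normal_qq_points` and `correlation` are assumptions chosen for the example. The correlation of the Q-Q points is used here only as a rough numerical stand-in for "the points lie on a straight line".

```python
import random
from statistics import NormalDist, mean


def normal_qq_points(sample):
    """Return (theoretical, empirical) quantiles for a normal Q-Q plot."""
    ordered = sorted(sample)          # empirical quantiles
    n = len(ordered)
    std_normal = NormalDist()         # standard normal distribution
    # Plotting positions (i - 0.5) / n are one common convention (assumption).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered


def correlation(xs, ys):
    """Pearson correlation; near 1 when the Q-Q points are close to a line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)                # fixed seed for reproducibility
normal_sample = [rng.gauss(0, 1) for _ in range(500)]
skewed_sample = [rng.expovariate(0.25) for _ in range(500)]  # right-skewed

r_normal = correlation(*normal_qq_points(normal_sample))
r_skewed = correlation(*normal_qq_points(skewed_sample))
print(f"normal sample: r = {r_normal:.3f}")
print(f"skewed sample: r = {r_skewed:.3f}")
```

Plotting the two lists returned by `normal_qq_points` against each other gives the Q-Q plot itself; the skewed sample should bend away from the line in the upper tail.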
[Figure: two normal Q-Q plots, one per sample; vertical axis: sample quantiles.]
Descriptive characteristics for a variable
Summary statistics
• Mean, median
• Percentiles, interquartile range
• Minimum, maximum, range
• Standard deviation, variance
• Coefficient of variation
• Median absolute deviation, mean absolute
deviation
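
As a rough illustration of the summary statistics listed above, the following sketch computes each of them with Python's standard library. The data values are hypothetical, and the quartiles use the default ("exclusive") method of `statistics.quantiles`; other software may use a different quantile definition and give slightly different quartiles.

```python
import statistics as st

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]   # hypothetical measurements

mean = st.mean(data)
median = st.median(data)
q1, q2, q3 = st.quantiles(data, n=4)          # quartiles (25th, 50th, 75th percentile)
iqr = q3 - q1                                 # interquartile range
minimum, maximum = min(data), max(data)
value_range = maximum - minimum
sd = st.stdev(data)                           # sample standard deviation (divisor n - 1)
variance = st.variance(data)                  # sample variance (divisor n - 1)
cv = sd / mean                                # coefficient of variation
median_abs_dev = st.median([abs(x - median) for x in data])
mean_abs_dev = st.mean([abs(x - mean) for x in data])

print(f"mean = {mean}, median = {median}")
print(f"quartiles = {q1}, {q2}, {q3}, IQR = {iqr}")
print(f"min = {minimum}, max = {maximum}, range = {value_range}")
print(f"sd = {sd:.3f}, variance = {variance}, CV = {cv:.3f}")
print(f"median abs. deviation = {median_abs_dev}, mean abs. deviation = {mean_abs_dev:.3f}")
```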
Estimation
Fundamental concepts
Populations must be defined at the start of any
study and this definition should include the spatial
and temporal limits to the inference. The formal
statistical inference is restricted to these limits.
Characteristics of an estimator
A good estimator of a population parameter should
have the following characteristics:
• The estimator should be unbiased, meaning that its expected value equals the population parameter.
[Figure: mean of each sample (vertical axis) against the number of the sample, 1 to 10 (horizontal axis).]
[Figure: panels plotted against sample number (1 to 10) for different sample sizes; surviving panel labels: n = 100 and n = 10,000.]
[Figure: distribution of the means, comparing the mean and the median as estimators.]
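
The behaviour of an estimator under repeated sampling can be explored by simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of the figures: it draws repeated samples from a standard normal population (true mean and median 0), uses the sample sizes n = 100 and n = 10,000 that appear in the panel labels, and compares the sample mean with the sample median. The function name `sampling_spread` and the number of replications are hypothetical choices.

```python
import random
import statistics as st

rng = random.Random(42)   # fixed seed for reproducibility


def sampling_spread(estimator, n, replications=50):
    """Draw `replications` samples of size n from N(0, 1) and return the
    average and the standard deviation of the resulting estimates."""
    estimates = [
        estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]
    return st.mean(estimates), st.stdev(estimates)


results = {}
for n in (100, 10_000):
    for name, estimator in (("mean", st.fmean), ("median", st.median)):
        centre, spread = sampling_spread(estimator, n)
        results[(name, n)] = (centre, spread)
        print(f"{name:>6}, n = {n:>6}: average estimate {centre:+.4f}, "
              f"spread {spread:.4f}")
```

In such a simulation the average estimate should stay close to the true value 0 and the spread of the estimates should shrink as n grows. For normally distributed data the sample median is known to be more variable than the sample mean, although with only 50 replications the simulated difference can be noisy.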
Confidence interval
[Figure: panels plotted against sample number (1 to 10) for different sample sizes; surviving panel labels: n = 100 and n = 10,000.]
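
How the width of a confidence interval depends on the sample size can be sketched as follows. The slides do not state which interval formula was used, so this example assumes the large-sample normal approximation x̄ ± z · s/√n with a 95% level; the simulated N(0, 1) data and the function name `mean_confidence_interval` are likewise assumptions made for illustration.

```python
import math
import random
import statistics as st

rng = random.Random(7)                       # fixed seed for reproducibility
Z_95 = st.NormalDist().inv_cdf(0.975)        # two-sided 95% quantile, about 1.96


def mean_confidence_interval(sample, z=Z_95):
    """Approximate confidence interval for the mean: x̄ ± z · s / √n."""
    n = len(sample)
    centre = st.fmean(sample)
    half_width = z * st.stdev(sample) / math.sqrt(n)
    return centre - half_width, centre + half_width


widths = {}
for n in (100, 10_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    low, high = mean_confidence_interval(sample)
    widths[n] = high - low
    print(f"n = {n:>6}: 95% CI = [{low:+.3f}, {high:+.3f}], width = {widths[n]:.3f}")
```

Because the half-width is proportional to 1/√n, increasing the sample size by a factor of 100 narrows the interval by roughly a factor of 10.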
Statistical hypothesis testing
Statistical testing principle
Model: H0 : µ = 0 and Ha : µ ≠ 0
Real world: x̄, s
Example
Suppose we have a coin, and that our hypothesis is
that the coin is fair, i.e. that P(head) = P(tail) =
1/2. Suppose we toss the coin n = 25 times and
observe 21 heads. The probability of observing
exactly these data under the model is P(21 heads,
4 tails) ≈ 0.0004. Seeing such data is very unlikely
(but possible) if the model is true. In this
falsification process we employ the interpretation
principle of statistics:
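
The probability quoted in the coin example can be checked directly from the binomial distribution. This is a small verification sketch, not part of the original slides; the second quantity (21 or more heads) is an added illustration of a tail probability and is not taken from the source.

```python
from math import comb

n, heads = 25, 21

# P(exactly 21 heads in 25 tosses of a fair coin) = C(25, 21) / 2^25
p_exact = comb(n, heads) * 0.5 ** n

# P(21 or more heads), i.e. results at least as extreme in this direction
p_at_least = sum(comb(n, k) for k in range(heads, n + 1)) * 0.5 ** n

print(f"P(exactly 21 heads) = {p_exact:.6f}")
print(f"P(21 or more heads) = {p_at_least:.6f}")
```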
• Statistical power = 1 − β
Power analysis
Example (cont.)
1. State the hypotheses: let µ denote the mean
percent change:
H0 : µ = 0
Ha : µ > 0
2. Calculate the rejection region: with σ = 2 and n = 25, the z test
rejects H0 at the α = 0.05 level whenever
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar{x}}{2/\sqrt{25}} \geq 1.645 $$
That is, we reject H0 when x̄ ≥ 0.658.
3. Compute the power at a specific alternative:
The power of the test at the alternative µ = 1 is
P(x̄ ≥ 0.658 | µa = 1) = P(Z ≥ (0.658 − 1)/0.4)
= P(Z ≥ −0.855) ≈ 0.80,
where 0.4 = σ/√n is the standard error of x̄.
Plot graph.
4. Statistical power is the probability of rejecting
H0 given population effect size (ES), α and
sample size (n). This calculation also requires
knowledge of the sampling distribution of the
test statistic under the alternative hypothesis:
Power curve.
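
The steps of the power example can be reproduced numerically. The sketch below uses the values from the example (σ = 2, n = 25, α = 0.05, µ0 = 0); the function name `power` and the grid of alternatives are assumptions made for illustration.

```python
from statistics import NormalDist

ALPHA = 0.05
SIGMA = 2.0    # population standard deviation used in the example
N = 25         # sample size used in the example
MU_0 = 0.0     # mean under H0

std_normal = NormalDist()
se = SIGMA / N ** 0.5                                       # standard error, 0.4
critical_mean = MU_0 + std_normal.inv_cdf(1 - ALPHA) * se   # about 0.658


def power(mu_a):
    """P(sample mean >= critical_mean) when the true mean is mu_a."""
    return 1 - NormalDist(mu_a, se).cdf(critical_mean)


print(f"reject H0 when the sample mean is at least {critical_mean:.3f}")
for mu_a in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"mu_a = {mu_a:.1f}: power = {power(mu_a):.3f}")
```

Evaluating `power` over a fine grid of alternatives gives the power curve. At µa = 0 the function returns α, and at µa = 1 it returns roughly 0.80, in line with step 3.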
Power function in dependence of the effect size
[Figure: power curve; vertical axis: power = 1 − β (0 to 1); horizontal axis: difference between µ0 and µa (0 to 2).]
Further readings
Cohen, J. 1992. A power primer. Psychological
Bulletin 112: 155-159.
Simpson’s Paradox
                Engineering        English
                Male   Female      Male   Female
Accept            30       10         5       10
Refuse entry      30       10        15       30
Total             60       20        20       40
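• Within each programme the acceptance rate is the
same for both sexes (Engineering: 50%, English: 25%),
so there is no relationship between sex and acceptance
for either programme and no evidence of
discrimination. Pooled over both programmes, however,
35 of 80 males (44%) but only 20 of 60 females (33%)
are accepted. Why?
• More females apply for the English programme,
which is hard to get into. More males apply
to Engineering, which has a higher acceptance
rate than English. One must look deeper than
a single cross-tab to find this out!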
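
The effect can be verified from the counts in the table. This sketch only recomputes acceptance rates within each programme and pooled over both; the dictionary layout and variable names are hypothetical.

```python
# (accepted, total applicants) per programme and sex, taken from the table
applicants = {
    "Engineering": {"male": (30, 60), "female": (10, 20)},
    "English": {"male": (5, 20), "female": (10, 40)},
}

# Acceptance rate within each programme
within = {
    programme: {sex: accepted / total for sex, (accepted, total) in by_sex.items()}
    for programme, by_sex in applicants.items()
}

# Acceptance rate pooled over both programmes (the single cross-tab)
pooled = {}
for sex in ("male", "female"):
    accepted = sum(applicants[p][sex][0] for p in applicants)
    total = sum(applicants[p][sex][1] for p in applicants)
    pooled[sex] = accepted / total

for programme, rates in within.items():
    print(f"{programme}: male {rates['male']:.0%}, female {rates['female']:.0%}")
print(f"Pooled: male {pooled['male']:.1%}, female {pooled['female']:.1%}")
```

Within each programme the rates are identical for both sexes, while the pooled rates differ because the sexes apply to the two programmes in different proportions.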