Testing For Normality and Transforming Data
How to tell whether data is
appropriate for parametric tests?
IF DATA normally distributed:
can utilize Parametric Tests (POWERFUL & ROBUST)
IF DATA NOT normally distributed:
can utilize Nonparametric Tests, which are also called distribution-free tests
because they don’t assume that your data follow a specific distribution.
ASSIGNMENT: Give examples of data sets/analyses using the one-sample t-test, two-sample t-test,
one-way ANOVA, two-way ANOVA, and chi-square test.
TEST:
“shape” of distribution can be measured
SKEWNESS: “asymmetry”
KURTOSIS: “height of hump”
MODALITY: “no. of humps”
SKEWNESS: how sample differs in shape
from a symmetrical distribution
sample skewness = G1
Many software programs compute the
adjusted Fisher-Pearson coefficient of SKEWNESS
syntax in MS Excel: = SKEW (number1,[number2],…)
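The adjusted Fisher-Pearson coefficient can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the commonly cited form G1 = sqrt(n(n-1))/(n-2) × m3/m2^(3/2), where m2 and m3 are the second and third central moments of the sample; the function name `sample_skewness` and the example data are made up for this sketch.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (needs at least 3 non-identical values)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # 2nd central moment (divide by n)
    m3 = sum((v - mean) ** 3 for v in values) / n  # 3rd central moment (divide by n)
    g1 = m3 / m2 ** 1.5                            # unadjusted skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample adjustment

symmetric = [1, 2, 3, 4, 5]          # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(sample_skewness(symmetric))     # 0.0 for perfectly symmetric data
print(sample_skewness(right_skewed))  # positive value: tail to the right
```

A value near zero points to a symmetric sample, a positive value to a right tail, and a negative value to a left tail.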
KURTOSIS: extent to which data are distributed
in the “tails vs. center” of the distribution
sample kurtosis = G2
Many software programs compute
excess KURTOSIS: actual value - 3 (so a normal distribution scores 0)
syntax in MS Excel: = KURT (number1,[number2],…)
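A similar sketch works for sample excess kurtosis. It assumes the commonly cited small-sample form G2 = (n-1)/((n-2)(n-3)) × ((n+1)·g2 + 6), where g2 = m4/m2² - 3 is the unadjusted excess kurtosis; the function name and the example data are illustrative only.

```python
def sample_excess_kurtosis(values):
    """Small-sample adjusted excess kurtosis G2 (needs at least 4 non-identical values)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # 2nd central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # 4th central moment
    g2 = m4 / m2 ** 2 - 3                          # unadjusted excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

flat = list(range(1, 11))                      # hypothetical data: evenly spread values
heavy_tails = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]  # hypothetical data: tight center, two extremes

print(sample_excess_kurtosis(flat))         # negative: lighter tails than a normal curve
print(sample_excess_kurtosis(heavy_tails))  # positive: heavier tails than a normal curve
```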
MODALITY: “mode” is more than just
the “observation value” that occurs most frequently
syntax in MS Excel: = MODE.MULT (number1,[number2],…)
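As a rough counterpart in Python, the standard library's `statistics.multimode` returns every value tied for most frequent. This is only a sketch on made-up data: counting repeated values is a crude proxy for “no. of humps”, and for continuous measurements a histogram is usually the better guide.

```python
from statistics import multimode

def modes_of(values):
    """Return all values tied for the highest frequency, in first-seen order."""
    return multimode(values)

one_hump = [1, 2, 2, 3, 4]          # hypothetical data
two_humps = [1, 2, 2, 3, 4, 4, 5]   # hypothetical data

print(modes_of(one_hump))    # [2]
print(modes_of(two_humps))   # [2, 4]
```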
TEST:
“Normality tests” are the 1st inferential statistics you will employ
Shapiro-Wilk Test
best “power” at the same significance level
compared to the other two tests.
Anderson-Darling Test
TEST:
TAKE NOTE! Null hypotheses are different for the three tests
Histogram with Kernel Density Estimates
1. State QUESTION
VISUALIZATION
TEST STATISTIC
SIGNIFICANCE
INFERENCE
TEST:
Visualization is as important as the test statistics
The quantile-quantile (q-q) plot is a graphical
technique for determining if two data sets come
from populations with a common distribution.
Quantile-Quantile Plot
Normal Quantile-Quantile Plots:
your data set vs. theoretical normal distribution
q-q plots of normal data.
points tend to fall in a straight line
points form a curve instead of a straight line
points fall along a line in the middle of the graph, but curve off in the extremities
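The points of a normal q-q plot can be computed without plotting software. The sketch below pairs each sorted observation with the matching quantile of a fitted normal distribution and then uses the correlation of those pairs as a rough “straightness” score. The plotting positions (i - 0.5)/n, the function names, and the data are assumptions made for illustration; other conventions for plotting positions exist.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pair theoretical normal quantiles with the sorted observations."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))

def straightness(points):
    """Pearson correlation of the q-q points; close to 1 means close to a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

roughly_normal = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.7, 6.0, 6.2, 6.9]  # hypothetical data
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10, 25]                      # hypothetical data

print(straightness(normal_qq_points(roughly_normal)))  # closer to 1
print(straightness(normal_qq_points(right_skewed)))    # noticeably lower
```

The score is only a summary; the shape of the departure (a curve, or bending at the extremities) still has to be read from the plotted points themselves.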
ASSIGNMENT:
Jarque-Bera
Anderson-Darling
TEST:
Take note of the deeper meaning of p-values
A p-value is not the
probability of rejecting the null hypothesis.
A small p-value suggests that the observed data are inconsistent with
the assumption that the null hypothesis is true.
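To make the p-value reading concrete, here is a minimal sketch built on the Jarque-Bera statistic, chosen because it needs only skewness and kurtosis. It assumes the textbook form JB = n/6 × (S² + K²/4), with S the moment skewness and K the moment excess kurtosis, and the large-sample chi-square approximation with 2 degrees of freedom, whose upper-tail probability is exp(-JB/2). That approximation is known to be rough for small samples, and the data and the 0.05 cutoff are illustrative choices.

```python
import math
from statistics import NormalDist

def jarque_bera(values):
    """Return (JB statistic, approximate p-value from a chi-square with 2 df)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurt ** 2 / 4)
    p_value = math.exp(-jb / 2)  # upper tail of chi-square with 2 degrees of freedom
    return jb, p_value

# Hypothetical illustration data: evenly spaced normal quantiles and their exponentials.
z = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
bell_shaped = [10 + 2 * q for q in z]
right_skewed = [math.exp(q) for q in z]

alpha = 0.05  # illustrative significance level
for name, data in [("bell_shaped", bell_shaped), ("right_skewed", right_skewed)]:
    jb, p = jarque_bera(data)
    # p is the chance of a statistic at least this large IF the null hypothesis is true.
    verdict = "inconsistent with normality" if p < alpha else "no evidence against normality"
    print(f"{name}: JB={jb:.2f}, p={p:.4f} -> {verdict}")
```

A large p-value here does not prove the data are normal; it only means this test found nothing against that assumption.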
SOLUTION:
Maximize possibility of parametric testing
only if data is appropriate
TRANSFORM :
data transformations are important tools
Playing around with data VS. “re-expression”
Trying different transformations until one gives a significant result is cheating;
better to use transformations that other researchers commonly use in your field.
TRANSFORM :
Two most common
data transformations in Biology
Log transformation
often useful when there is a high degree of variation within variables or
among attributes within a sample.
Square-root transformation
used for reducing right skewness, and also has the advantage that it
can be applied to zero values.
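The zero-value point is easy to check in Python. In this small sketch the function names and the count data are made up, and the log10(x + 1) shift at the end is a common workaround that the slides do not mention.

```python
import math

def sqrt_transform(values):
    """Square-root transform; defined for zero and positive values."""
    return [math.sqrt(v) for v in values]

def log10_transform(values):
    """Base-10 log transform; raises ValueError if any value is zero or negative."""
    return [math.log10(v) for v in values]

counts = [0, 1, 4, 9, 100]  # hypothetical count data containing a zero

print(sqrt_transform(counts))  # [0.0, 1.0, 2.0, 3.0, 10.0]

try:
    log10_transform(counts)
except ValueError:
    print("log10 is undefined for zero values")

# Assumed workaround (not from the slides): shift by 1 before taking logs.
shifted = log10_transform([c + 1 for c in counts])
print(shifted[0])  # 0.0
```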
TRANSFORM :
Log and square root transformations
have different advantages
Log transformation
compresses high values and spreads low values by expressing
the values as orders of magnitude.
Square-root transformation
can bring count data from a Poisson (discrete) distribution closer to a
normal (continuous) distribution
The log transformation can be used to make highly skewed distributions less skewed.
This can be valuable both for making patterns in the data more interpretable and for
helping to meet the assumptions of inferential statistics.
X      Log10(X)
1      0
10     1
100    2
reduce skewness
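The table of X against Log10(X) can be reproduced and extended with a short sketch that shows how equal ratios in the raw values become equal steps after the transformation. The extra value 1000 and the helper name `gaps` are additions for illustration.

```python
import math

def gaps(sequence):
    """Differences between neighbouring values."""
    return [b - a for a, b in zip(sequence, sequence[1:])]

values = [1, 10, 100, 1000]               # the table's X values plus one more row
logged = [math.log10(v) for v in values]

print(logged)        # [0.0, 1.0, 2.0, 3.0]
print(gaps(values))  # [9, 90, 900]     raw gaps grow with each order of magnitude
print(gaps(logged))  # [1.0, 1.0, 1.0]  log gaps are equal
```

The large values that stretch out the right tail are pulled together, which is why the log transformation reduces right skewness.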
The square root, x to x^(1/2) = sqrt(x), is a transformation with a moderate effect on
distribution shape: it is weaker than the logarithm and the cube root. It is used for
reducing right skewness and has the advantage that it can be applied to zero values.
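The relative strength of these transformations can be checked on a small right-skewed sample. The sketch below is an illustration only: it uses the simple moment skewness m3/m2^(3/2) without a small-sample adjustment, and the data are hypothetical positive values so that the logarithm is defined.

```python
import math

def moment_skewness(values):
    """Unadjusted moment skewness m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 4, 5, 8, 15, 40, 120]  # hypothetical positive data

transforms = {
    "raw": lambda v: v,
    "square root": math.sqrt,
    "cube root": lambda v: v ** (1 / 3),
    "log10": math.log10,
}

skew_by_transform = {
    name: moment_skewness([f(v) for v in right_skewed])
    for name, f in transforms.items()
}

for name, s in skew_by_transform.items():
    print(f"{name:12s} skewness = {s:.2f}")
```

On data like these the skewness shrinks step by step from raw to square root to cube root to log10, in line with the ordering described above.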