The document discusses how empirical measurements of quantities like size, weight, and concentration are often skewed and positively valued, making the normal distribution inappropriate. It introduces the log-normal distribution, which results when the logarithm of a positive variable has a normal distribution. The log-normal distribution is more suitable than the normal for modeling many natural phenomena where values are restricted to being positive and quantities are often multiplied rather than added.


The normal distribution is

the log-normal distribution

Werner Stahel, Seminar für Statistik, ETH Zürich
and Eckhard Limpert

2 December 2014

The normal Normal distribution

We like it!

• Nice shape.

• Named after Gauss. Decorated the 10 DM bill.

• We know it. Passed the exam.


[Figure: normal density with µ, µ ± σ, and µ ± 2σ marked; µ ± σ covers 2/3 (68%) of the probability, µ ± 2σ covers 95% (more precisely 95.5%).]

Why it is right.

It is given by mathematical theory.

• Adding normal random variables gives a normal sum.

• Linear combinations Y = α0 + α1X1 + α2X2 + ...


remain normal.

• −→ Means of normal variables are normally distributed.


• Central Limit Theorem: Means of non-normal variables
are approximately normally distributed.

• −→ “Hypothesis of Elementary Errors”:


If random variation is the sum of many small random effects,
a normal distribution must be the result.

• Regression models assume normally distributed errors.



Is it right?

Mathematical statisticians believe(d) that it is prevalent in Nature.

Well, it is not. Purpose of this talk: What are the consequences?

1. Empirical Distributions

2. Laws of Nature

3. Logarithmic Transformation, the Log-Normal Distribution

4. Regression

5. Advantages of using the log-normal distribution

6. Conclusions

1. Empirical Distributions

Measurements:
size, weight, concentration, intensity, duration, price, activity
All > 0 −→ “amounts” (John Tukey)
Example: hydroxymethylfurfural (HMF) in honey (Renner 1970)
[Histogram: frequency vs. HMF concentration (0–50) in honey; strongly right-skewed.]

Measurements:
size, weight, concentration, intensity, duration, price, activity
All > 0 −→ “amounts”
Distribution is skewed: left steep, right flat, skewness > 0,
unless the coefficient of variation cv(X) = sd(X)/E(X) is small.

Other variables may have other ranges and negative skewness.


They may have a normal distribution.
They are usually derived variables, not original measurements.
Any examples?

Our examples: Position in space and time, angles, directions. That’s it!

For some, 0 is a probable value: rain, expenditure for certain goods, ...

pH, sound and other energies [dB] −→ log scale!



The 95% Range Check


For every normal distribution, negative values have probability > 0.
−→ The normal distribution is inadequate for positive variables.
This becomes relevant when the 95% range x̄ ± 2σ̂ reaches below 0.
Then the distribution is noticeably skewed.
[Histogram: HMF concentration with the fitted normal 95% range reaching below 0 (axis from −15 to 50).]
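The 95% range check is easy to apply in practice. A minimal sketch, with a log-normal sample standing in for a skewed, all-positive measurement such as a concentration:

```python
import numpy as np

# Hypothetical skewed "amount" data: a log-normal sample stands in
# for an all-positive measurement such as a concentration.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

def lower_95(x):
    """Lower end of the 95% range, mean − 2·sd."""
    return x.mean() - 2 * x.std(ddof=1)

# All values are positive, yet the normal 95% range reaches below 0:
# the normal model is inadequate and a log transform is indicated.
print((x > 0).all(), lower_95(x) < 0)
```

When the lower end is clearly above 0 (small cv), the normal and log-normal models are hard to tell apart and either may serve.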

2. Laws of Nature

(a) Physics E = m · c²
Stopping distance s = v²/(2 · a) ; velocity v = F · t/m
Gravitation F = G · m1 · m2/r²

Gas laws p · V = n · R · T ; R = p0 · V0/T0

Radioactive decay Nt = N0 · e^(−k·t)

(b) Chemistry

Reaction velocity v = k · [A]^nA · [B]^nB

Change with temperature: ∆T = +10 °C =⇒ v → ·2,
based on Arrhenius’ law k = A · e^(−EA/(R·T))
EA = activation energy; R = gas constant
Law of mass action: A + B ↔ C + D : Kc = ([C]·[D])/([A]·[B])
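The “times 2 per 10 °C” rule can be checked numerically from Arrhenius’ law; the activation energy of 50 kJ/mol below is an assumed, typical order of magnitude, not a value from the slides:

```python
import math

R = 8.314              # gas constant, J/(mol·K)
EA = 50_000.0          # assumed activation energy, J/mol
T1, T2 = 298.0, 308.0  # a 10 °C step near room temperature

# Ratio of Arrhenius rate constants k = A·exp(−EA/(R·T)); A cancels.
ratio = math.exp(EA / R * (1 / T1 - 1 / T2))
print(round(ratio, 2))  # close to 2, the rule-of-thumb doubling
```

Note that the temperature acts multiplicatively on the rate — exactly the kind of mechanism this section argues generates log-normal variation.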

(c) Biology

Multiplication (of unicellular organisms): 1 → 2 → 4 → 8 → 16

Growth, size st = s0 · k^t

Hagen–Poiseuille law; volume flow:
V/t = (∆P · r⁴ · π)/(8 · η · L) ; ∆P: pressure difference
Permeability

Other laws in biology?



3. Logarithmic Transformation, Log-Normal Distribution

Transform data by log transformation


[Two histograms: frequency vs. concentration (right-skewed, axis −30 to 50) and frequency vs. log(concentration) (roughly symmetric, axis −1.6 to 1.2).]

The log transform Z = log(X)


• turns multiplication into addition,

• turns variables X > 0 into Z with unrestricted values,


• reduces (positive) skewness (may turn it negatively skewed)

• Often turns skewed distributions into normal ones.

Note: Base of logarithm is not important.

• natural log for theory,

• log10 for practice.



The Log-Normal Distribution

If Z = log(X) is normally distributed (Gaussian), then


the distribution of X is called log-normal.

Densities

[Figure: log-normal densities for σ∗ = 1.2, 1.5, 2.0, 4.0, 8.0 (x from 0 to 2.5); green: normal distribution for comparison.]
Density: f(x) = (1/(σ·√(2π))) · (1/x) · exp( −(1/2) · ((log(x) − µ)/σ)² )

Parameters: µ, σ : Expectation and st.dev. of log(X)


More useful:

• e^µ = µ∗ : median, geometric “mean”, scale parameter

• e^σ = σ∗ : multiplicative standard deviation, shape parameter
σ∗ (or σ) determines the shape of the distribution.

Contrast to

• expectation E(X) = e^(µ + σ²/2)

• standard deviation sd(X) from var(X) = (e^(σ²) − 1) · e^(2µ + σ²)

Less useful!
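The contrast between the two parameter sets can be checked numerically; a minimal sketch with assumed values µ = 1, σ = 0.5:

```python
import math
import numpy as np

mu, sigma = 1.0, 0.5          # assumed parameters of log(X)
mu_star = math.exp(mu)        # median / scale parameter mu*
sigma_star = math.exp(sigma)  # multiplicative standard deviation sigma*
mean = math.exp(mu + sigma**2 / 2)  # E(X), larger than the median
sd = math.sqrt(math.exp(sigma**2) - 1) * math.exp(mu + sigma**2 / 2)

rng = np.random.default_rng(1)
x = rng.lognormal(mu, sigma, size=200_000)
print(round(float(np.median(x)), 2), round(mu_star, 2))  # median ≈ mu*
print(round(float(x.mean()), 2), round(mean, 2))         # mean ≈ E(X) > mu*
```

The sample median matches µ∗ while the sample mean sits noticeably higher, illustrating why the median is the more typical value for skewed data.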

Ranges

Probability   normal     log-normal
2/3 (68%)     µ ± σ      µ∗ ×/ σ∗
95%           µ ± 2σ     µ∗ ×/ σ∗²

×/ : “times-divide”

[Figure: log-normal density with µ∗ ÷ σ∗², µ∗ ÷ σ∗, µ∗, µ∗ · σ∗, µ∗ · σ∗² marked; µ∗ ×/ σ∗ covers 2/3 (68%), µ∗ ×/ σ∗² covers 95% (95.5%).]
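A simulation sketch of the multiplicative 95% range, with assumed parameters µ = 0, σ = 0.6 (i.e. σ∗ ≈ 1.82):

```python
import math
import numpy as np

mu, sigma = 0.0, 0.6  # assumed parameters of log(X)
mu_star, sigma_star = math.exp(mu), math.exp(sigma)
# The "times-divide" 95% range mu* ×/ sigma*^2:
lo, hi = mu_star / sigma_star**2, mu_star * sigma_star**2

rng = np.random.default_rng(2)
x = rng.lognormal(mu, sigma, size=100_000)
coverage = float(np.mean((x >= lo) & (x <= hi)))
print(round(coverage, 3))  # ≈ 0.955, just like µ ± 2σ for the normal
```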

Properties

We had for the normal distribution:

• Adding normal random variables gives a normal sum.

• Linear combinations Y = α0 + α1X1 + α2X2 + ...


remain normal.

• −→ Means of normal variables are normally distributed.


• Central Limit Theorem: Means of non-normal variables
are approximately normally distributed.

• −→ “Hypothesis of Elementary Errors”:


If random variation is the sum of many small random effects,
a normal distribution must be the result.

• Regression models assume normally distributed errors.



Properties: We have for the log-normal distribution:

• Multiplying log-normal random variables gives a log-normal product.

• −→ Geometric means of log-normal variables are log-normally distributed.


• Multiplicative Central Limit Theorem: Geometric means
of (non-log-normal) variables are approx. log-normally distributed.

• −→ Multiplicative “Hypothesis of Elementary Errors”:


If random variation is the product of several random effects,
a log-normal distribution must be the result.

Better name: Multiplicative normal distribution!
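The multiplicative central limit theorem can be illustrated with a product of many small positive factors; uniform factors on [0.9, 1.1] are an arbitrary, assumed choice:

```python
import numpy as np

# Product of 40 small positive random factors per observation:
rng = np.random.default_rng(3)
factors = rng.uniform(0.9, 1.1, size=(50_000, 40))
product = factors.prod(axis=1)

def skewness(v):
    """Sample skewness: mean of standardized cubes."""
    z = (v - v.mean()) / v.std()
    return float((z**3).mean())

print(round(skewness(np.log(product)), 2))  # ≈ 0: symmetric on the log scale
print(skewness(product) > 0)                # the product itself is right-skewed
```

The product is approximately log-normal: its logarithm (a sum of many small effects) is approximately normal by the ordinary central limit theorem.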



Quincunx

Galton: additive (steps of ± 50):

                100
             50     150
          0     100     200
      −50     50     150     250
  −100     0     100     200     300

probabilities 1 : 4 : 6 : 4 : 1

Limpert (improving on Kapteyn): multiplicative (steps of ×/ 1.5):

                100
             67     150
          44     100     225
       30     67     150     338
   20     44     100     225     506

probabilities 1 : 4 : 6 : 4 : 1
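The two boards can be reproduced in a few lines: after four rows, a ball that bounced right i times out of 4 lands at the positions below, with binomial landing probabilities 1 : 4 : 6 : 4 : 1 in both cases.

```python
start, rows = 100, 4
step, factor = 50, 1.5  # additive step vs. multiplicative step

# Position after i right-bounces and (rows − i) left-bounces:
additive = [start + step * (2 * i - rows) for i in range(rows + 1)]
multiplicative = [round(start * factor ** (2 * i - rows)) for i in range(rows + 1)]

print(additive)        # [-100, 0, 100, 200, 300]
print(multiplicative)  # [20, 44, 100, 225, 506]
```

The additive board produces a symmetric (normal-like) spread, the multiplicative one a right-skewed (log-normal-like) spread.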

Back to Properties

• −→ Multiplicative “Hypothesis of Elementary Errors”:


If random variation is the product of several random effects,
a log-normal distribution must be the result.

Note: For “many small” effects, the geometric mean will have
a small σ ∗ −→ approx. normal AND log-normal!

Such normal distributions are “intrinsically log-normal”.


Keeping this in mind may lead to new insight!

• Regression models assume normally distributed errors! ???



4. Regression

Multiple linear regression:

Y = β0 + β1X1 + β2X2 + ... + E

Regressors Xj may be functions of original input variables


−→ model also describes nonlinear relations, interactions, ...
Categorical (nominal) input variables = “factors”
−→ “dummy” binary regressors
−→ Model includes Analysis of Variance (ANOVA)!
Linear in the coefficients βj
−→ “simple”, exact theory, exact inference
estimation by Least Squares −→ simple calculation

Characteristics of the model:


Formula:
Y = β0 + β1X1 + β2X2 + ... + E

additive effects, additive error


Error term E ∼ N(0, σ²) −→
– constant variance
– symmetric error distribution

If the target variable has a skewed (error) distribution
and the standard deviation of the error increases with Y:
−→ transform Y −→ log(Y)!
log(Ỹ) = Y = β0 + β1·X1 + β2·X2 + ... + E

Ordinary, additive model                    Multiplicative model

Formula:
Y = β0 + β1·X1 + β2·X2 + ... + E            log(Ỹ) = Y = β0 + β1·X1 + β2·X2 + ... + E
                                            Ỹ = β̃0 · X1^β1 · X2^β2 · ... · Ẽ

additive effects, additive error            multiplicative effects, multiplicative errors

Error term:
E ∼ N(0, σ²) −→                             Ẽ ∼ ℓN(1, σ∗) −→
– constant variance                         – constant relative error
– symmetric error distribution              – skewed error distribution
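A sketch of fitting a multiplicative model by regressing on the log scale; all “true” coefficients below are assumed for illustration, and the regressor enters through exp(β1·x) rather than the power-law form X^β1 (log-transforming the regressor as well would give the latter):

```python
import numpy as np

rng = np.random.default_rng(4)
b0, b1, sigma = 2.0, 0.5, 0.2  # assumed multiplicative model
x = rng.uniform(0.0, 3.0, size=500)
# Ytilde = b0 · exp(b1·x) · Etilde, with a log-normal error Etilde:
y = b0 * np.exp(b1 * x) * rng.lognormal(0.0, sigma, size=500)

# Taking logs yields an ordinary linear model:
# log(Ytilde) = log(b0) + b1·x + E, fit by least squares.
slope, intercept = np.polyfit(x, np.log(y), deg=1)
print(round(float(slope), 1), round(float(np.exp(intercept)), 1))  # ≈ b1, b0
```

Back-transforming the fitted intercept with exp() recovers the multiplicative scale coefficient β̃0, i.e. the median response at x = 0.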

Yu et al (2012): Upregulation of transmitter release probability improves a conversion of synaptic analogue signals into neuronal digital spikes

Figure 1. The probability of releasing glutamates increases during sequential presynaptic spikes...

Yu et al (2012): Upregulation of transmitter release probability improves a conversion of synaptic analogue signals into neuronal digital spikes

Figure 4. Presynaptic Ca²⁺ enhances an efficiency of probability-driven facilitation.



5. Advantages of using the log-normal distribution

... or of applying the log transformation to data.

The normal and log-normal distributions are difficult to distinguish
for σ∗ < 1.2 ↔ cv < 0.18,
where the coefficient of variation is cv ≈ σ∗ − 1.
−→ We discuss the case of larger σ∗.
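The correspondence σ∗ < 1.2 ↔ cv < 0.18 follows from the exact formula cv = √(e^(σ²) − 1) with σ = log(σ∗); a quick check at the threshold:

```python
import math

sigma_star = 1.2
sigma = math.log(sigma_star)
cv_exact = math.sqrt(math.exp(sigma**2) - 1)  # exact cv of the log-normal
print(round(cv_exact, 2))        # 0.18, the threshold quoted above
print(round(sigma_star - 1, 2))  # 0.2, the rough approximation cv ≈ sigma* − 1
```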

More meaningful parameters

• The expected value of a skewed distribution is less typical


than the median.

• (cv or) σ∗ characterizes the size of the relative error.

• A characteristic σ∗ is found in diseases:
latent periods for different infections: σ∗ ≈ 1.4;
survival times after diagnosis of cancer, for different types: σ∗ ≈ 3
−→ Deeper insight?

Fulfilling assumptions, power

What happens to inference based on the normal distribution


if the data is log-normal?

• Level (= probability of falsely rejecting the null hypothesis)
and coverage probability of confidence intervals remain o.k.

• Loss of power! −→ wasted effort!
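A Monte-Carlo sketch of this power loss, under an assumed setup (log-normal groups with σ∗ = e, medians differing by a factor e^0.9, n = 15 per group, a simple two-sample z-type test — not the exact design behind the slide’s figure):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, shift = 15, 2000, 0.9  # assumed sample size, repetitions, effect

def rejects(a, b):
    """Two-sample z-type test: reject when |z| > 1.96."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return abs(a.mean() - b.mean()) / se > 1.96

raw_hits = log_hits = 0
for _ in range(reps):
    a = rng.lognormal(0.0, 1.0, n)
    b = rng.lognormal(shift, 1.0, n)
    raw_hits += rejects(a, b)            # test on the raw, skewed scale
    log_hits += rejects(np.log(a), np.log(b))  # test after log transform

# Testing on the log scale detects the group difference more often.
print(log_hits > raw_hits)
```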



[Figure: difference between two groups (samples); relative effort n/n0 (%) needed for 90% power, plotted against σ∗ (1.0–3.5), for n0 = 5, 10, 50.]

More informative graphics

[Figure: latency (0–40) vs. time (−1 to 11) for groups ad.0, ad.30, leth.0, leth.20, leth.30, plotted on the original scale, with significance marks ∗ and ^.]

More informative graphics

[Figure: the same latency data on the original scale and on a log scale (latency 1–20); on the log scale, more comparisons are marked significant (∗, ∗∗, ^, +).]

More significance

6. Conclusions

Genesis

• The normal distribution is good for estimators, test statistics,


data with small coefficient of variation, and log-transformed data.
The log-normal distribution is good for original data.

• Summation, Means, Central limit theorem, Hyp. of elem. errors


−→ normal distribution
Multiplication, Geometric means, ...
−→ log-normal distribution

Applications

• Adequate ranges: µ∗ ×/ σ∗² covers ≈ 95% of the data


• Gain of power of hypothesis tests −→ save efforts for experiments
(e.g., saves animals!)

• Regression models assume normally distributed errors.
−→ Use a regression model for log(Y) instead of Y.
Back transformation: Ỹ = β̃0 · X1^β1 · X2^β2 · ... · Ẽ
• Parameter σ ∗ may characterize a class of phenomena
(e.g., diseases) −→ new insight ?!

Mathematical Statistics adds              Nature multiplies
−→ uses the normal distribution          −→ yields the log-normal distribution

Scientists (and applied statisticians) add logarithms:
use the normal distribution for log(data) and theory,
use the log-normal distribution for data.

Thank you for your attention!
