The document discusses how empirical measurements of quantities like size, weight, and concentration are often skewed and positively valued, making the normal distribution inappropriate. It introduces the log-normal distribution, which results when the logarithm of a positive variable has a normal distribution. The log-normal distribution is more suitable than the normal for modeling many natural phenomena where values are restricted to being positive and quantities are often multiplied rather than added.


The normal distribution is

the log-normal distribution

Werner Stahel, Seminar für Statistik, ETH Zürich
and Eckhard Limpert

2 December 2014

The normal Normal distribution

We like it!

• Nice shape.

• Named after Gauss. Decorated the 10 DM bill.

• We know it. Passed the exam.


[Figure: normal density with µ, µ ± σ, and µ ± 2σ marked; µ ± σ covers 2/3 (68%) of the probability, µ ± 2σ covers 95% (more precisely 95.5%).]

Why it is right.

It is given by mathematical theory.

• Adding normal random variables gives a normal sum.

• Linear combinations Y = α0 + α1X1 + α2X2 + ...


remain normal.

• −→ Means of normal variables are normally distributed.


• Central Limit Theorem: Means of non-normal variables
are approximately normally distributed.

• −→ “Hypothesis of Elementary Errors”:


If random variation is the sum of many small random effects,
a normal distribution must be the result.

• Regression models assume normally distributed errors.



Is it right?

Mathematical statisticians believe(d) that it is prevalent in Nature.

Well, it is not. Purpose of this talk: What are the consequences?

1. Empirical Distributions

2. Laws of Nature

3. Logarithmic Transformation, the Log-Normal Distribution

4. Regression

5. Advantages of using the log-normal distribution

6. Conclusions

1. Empirical Distributions

Measurements:
size, weight, concentration, intensity, duration, price, activity
All > 0 −→ “amounts” (John Tukey)
Example: hydroxymethylfurfural (HMF) in honey (Renner 1970)
[Histogram: frequency vs. HMF concentration (0–50) in honey; strongly right-skewed.]

Measurements:
size, weight, concentration, intensity, duration, price, activity
All > 0 −→ “amounts”
Distribution is skewed: left steep, right flat, skewness > 0,
unless the coefficient of variation cv(X) = sd(X)/E(X) is small.

Other variables may have other ranges and negative skewness.


They may have a normal distribution.
They are usually derived variables, not original measurements.
Any examples?

Our examples: Position in space and time, angles, directions. That’s it!

For some, 0 is a probable value: rain, expenditure for certain goods, ...

pH, sound and other energies [dB] −→ log scale!



The 95% Range Check


For every normal distribution, negative values have probability > 0.
−→ The normal distribution is inadequate for positive variables.
This becomes relevant when the 95% range x̄ ± 2σ̂ reaches below 0.
Then the distribution is noticeably skewed.
[Histogram: HMF concentration with the fitted normal 95% range reaching below 0 (axis from −15 to 50).]
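The 95% range check is easy to apply in practice. A minimal sketch, with a log-normal sample standing in for a skewed, all-positive measurement such as a concentration:

```python
import numpy as np

# Hypothetical skewed "amount" data: a log-normal sample stands in
# for an all-positive measurement such as a concentration.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

def lower_95(x):
    """Lower end of the 95% range, mean − 2·sd."""
    return x.mean() - 2 * x.std(ddof=1)

# All values are positive, yet the normal 95% range reaches below 0:
# the normal model is inadequate and a log transform is indicated.
print((x > 0).all(), lower_95(x) < 0)
```

When the lower end is clearly above 0 (small cv), the normal and log-normal models are hard to tell apart and either may serve.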

2. Laws of Nature

(a) Physics E = m · c²
Stopping distance s = v²/(2 · a) ; velocity v = F · t/m
Gravitation F = G · m1 · m2/r²

Gas laws p · V = n · R · T ; R = p0 · V0/T0

Radioactive decay Nt = N0 · e^(−k·t)

(b) Chemistry

Reaction velocity v = k · [A]^nA · [B]^nB

Change with temperature: ∆T = +10 °C =⇒ v → ·2,
based on Arrhenius’ law k = A · e^(−EA/(R·T))
EA = activation energy; R = gas constant
Law of mass action: A + B ↔ C + D : Kc = ([C]·[D])/([A]·[B])
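The “times 2 per 10 °C” rule can be checked numerically from Arrhenius’ law; the activation energy of 50 kJ/mol below is an assumed, typical order of magnitude, not a value from the slides:

```python
import math

R = 8.314              # gas constant, J/(mol·K)
EA = 50_000.0          # assumed activation energy, J/mol
T1, T2 = 298.0, 308.0  # a 10 °C step near room temperature

# Ratio of Arrhenius rate constants k = A·exp(−EA/(R·T)); A cancels.
ratio = math.exp(EA / R * (1 / T1 - 1 / T2))
print(round(ratio, 2))  # close to 2, the rule-of-thumb doubling
```

Note that the temperature acts multiplicatively on the rate — exactly the kind of mechanism this section argues generates log-normal variation.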

(c) Biology

Multiplication (of unicellular organisms): 1 → 2 → 4 → 8 → 16

Growth, size st = s0 · k^t

Hagen–Poiseuille law; volume flow:
V/t = (∆P · r⁴ · π)/(8 · η · L) ; ∆P: pressure difference
Permeability

Other laws in biology?



3. Logarithmic Transformation, Log-Normal Distribution

Transform data by log transformation


[Two histograms: frequency vs. concentration (right-skewed, axis −30 to 50) and frequency vs. log(concentration) (roughly symmetric, axis −1.6 to 1.2).]

The log transform Z = log(X)


• turns multiplication into addition,

• turns variables X > 0 into Z with unrestricted values,


• reduces (positive) skewness (may turn it negatively skewed)

• Often turns skewed distributions into normal ones.

Note: Base of logarithm is not important.

• natural log for theory,

• log10 for practice.



The Log-Normal Distribution

If Z = log(X) is normally distributed (Gaussian), then


the distribution of X is called log-normal.

Densities

[Figure: log-normal densities for σ∗ = 1.2, 1.5, 2.0, 4.0, 8.0 (x from 0 to 2.5); green: normal distribution for comparison.]
Density: f(x) = (1/(σ·√(2π))) · (1/x) · exp( −(1/2) · ((log(x) − µ)/σ)² )

Parameters: µ, σ : Expectation and st.dev. of log(X)


More useful:

• e^µ = µ∗ : median, geometric “mean”, scale parameter

• e^σ = σ∗ : multiplicative standard deviation, shape parameter
σ∗ (or σ) determines the shape of the distribution.

Contrast to

• expectation E(X) = e^(µ + σ²/2)

• standard deviation sd(X) from var(X) = (e^(σ²) − 1) · e^(2µ + σ²)

Less useful!
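The contrast between the two parameter sets can be checked numerically; a minimal sketch with assumed values µ = 1, σ = 0.5:

```python
import math
import numpy as np

mu, sigma = 1.0, 0.5          # assumed parameters of log(X)
mu_star = math.exp(mu)        # median / scale parameter mu*
sigma_star = math.exp(sigma)  # multiplicative standard deviation sigma*
mean = math.exp(mu + sigma**2 / 2)  # E(X), larger than the median
sd = math.sqrt(math.exp(sigma**2) - 1) * math.exp(mu + sigma**2 / 2)

rng = np.random.default_rng(1)
x = rng.lognormal(mu, sigma, size=200_000)
print(round(float(np.median(x)), 2), round(mu_star, 2))  # median ≈ mu*
print(round(float(x.mean()), 2), round(mean, 2))         # mean ≈ E(X) > mu*
```

The sample median matches µ∗ while the sample mean sits noticeably higher, illustrating why the median is the more typical value for skewed data.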

Ranges

Probability   normal     log-normal
2/3 (68%)     µ ± σ      µ∗ ×/ σ∗
95%           µ ± 2σ     µ∗ ×/ σ∗²

×/ : “times-divide”

[Figure: log-normal density with µ∗ ÷ σ∗², µ∗ ÷ σ∗, µ∗, µ∗ · σ∗, µ∗ · σ∗² marked; µ∗ ×/ σ∗ covers 2/3 (68%), µ∗ ×/ σ∗² covers 95% (95.5%).]
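A simulation sketch of the multiplicative 95% range, with assumed parameters µ = 0, σ = 0.6 (i.e. σ∗ ≈ 1.82):

```python
import math
import numpy as np

mu, sigma = 0.0, 0.6  # assumed parameters of log(X)
mu_star, sigma_star = math.exp(mu), math.exp(sigma)
# The "times-divide" 95% range mu* ×/ sigma*^2:
lo, hi = mu_star / sigma_star**2, mu_star * sigma_star**2

rng = np.random.default_rng(2)
x = rng.lognormal(mu, sigma, size=100_000)
coverage = float(np.mean((x >= lo) & (x <= hi)))
print(round(coverage, 3))  # ≈ 0.955, just like µ ± 2σ for the normal
```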

Properties

We had for the normal distribution:

• Adding normal random variables gives a normal sum.

• Linear combinations Y = α0 + α1X1 + α2X2 + ...


remain normal.

• −→ Means of normal variables are normally distributed.


• Central Limit Theorem: Means of non-normal variables
are approximately normally distributed.

• −→ “Hypothesis of Elementary Errors”:


If random variation is the sum of many small random effects,
a normal distribution must be the result.

• Regression models assume normally distributed errors.



Properties: We have for the log-normal distribution:

• Multiplying log-normal random variables gives a log-normal product.

• −→ Geometric means of log-normal variables are log-normally distributed.


• Multiplicative Central Limit Theorem: Geometric means
of (non-log-normal) variables are approx. log-normally distributed.

• −→ Multiplicative “Hypothesis of Elementary Errors”:


If random variation is the product of several random effects,
a log-normal distribution must be the result.

Better name: Multiplicative normal distribution!
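The multiplicative central limit theorem can be illustrated with a product of many small positive factors; uniform factors on [0.9, 1.1] are an arbitrary, assumed choice:

```python
import numpy as np

# Product of 40 small positive random factors per observation:
rng = np.random.default_rng(3)
factors = rng.uniform(0.9, 1.1, size=(50_000, 40))
product = factors.prod(axis=1)

def skewness(v):
    """Sample skewness: mean of standardized cubes."""
    z = (v - v.mean()) / v.std()
    return float((z**3).mean())

print(round(skewness(np.log(product)), 2))  # ≈ 0: symmetric on the log scale
print(skewness(product) > 0)                # the product itself is right-skewed
```

The product is approximately log-normal: its logarithm (a sum of many small effects) is approximately normal by the ordinary central limit theorem.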



Quincunx

Galton: additive (steps of ± 50):

                100
             50     150
          0     100     200
      −50     50     150     250
  −100     0     100     200     300

probabilities 1 : 4 : 6 : 4 : 1

Limpert (improving on Kapteyn): multiplicative (steps of ×/ 1.5):

                100
             67     150
          44     100     225
       30     67     150     338
   20     44     100     225     506

probabilities 1 : 4 : 6 : 4 : 1
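The two boards can be reproduced in a few lines: after four rows, a ball that bounced right i times out of 4 lands at the positions below, with binomial landing probabilities 1 : 4 : 6 : 4 : 1 in both cases.

```python
start, rows = 100, 4
step, factor = 50, 1.5  # additive step vs. multiplicative step

# Position after i right-bounces and (rows − i) left-bounces:
additive = [start + step * (2 * i - rows) for i in range(rows + 1)]
multiplicative = [round(start * factor ** (2 * i - rows)) for i in range(rows + 1)]

print(additive)        # [-100, 0, 100, 200, 300]
print(multiplicative)  # [20, 44, 100, 225, 506]
```

The additive board produces a symmetric (normal-like) spread, the multiplicative one a right-skewed (log-normal-like) spread.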

Back to Properties

• −→ Multiplicative “Hypothesis of Elementary Errors”:


If random variation is the product of several random effects,
a log-normal distribution must be the result.

Note: For “many small” effects, the geometric mean will have
a small σ ∗ −→ approx. normal AND log-normal!

Such normal distributions are “intrinsically log-normal”.


Keeping this in mind may lead to new insight!

• Regression models assume normally distributed errors! ???



4. Regression

Multiple linear regression:

Y = β0 + β1X1 + β2X2 + ... + E

Regressors Xj may be functions of original input variables


−→ model also describes nonlinear relations, interactions, ...
Categorical (nominal) input variables = “factors”
−→ “dummy” binary regressors
−→ Model includes Analysis of Variance (ANOVA)!
Linear in the coefficients βj
−→ “simple”, exact theory, exact inference
estimation by Least Squares −→ simple calculation

Characteristics of the model:


Formula:
Y = β0 + β1X1 + β2X2 + ... + E

additive effects, additive error


Error term E ∼ N(0, σ²) −→
– constant variance
– symmetric error distribution

If the target variable has a skewed (error) distribution
and the standard deviation of the error increases with Y:
−→ transform Y −→ log(Y)!
log(Ỹ) = Y = β0 + β1·X1 + β2·X2 + ... + E

Ordinary, additive model                    Multiplicative model

Formula:
Y = β0 + β1·X1 + β2·X2 + ... + E            log(Ỹ) = Y = β0 + β1·X1 + β2·X2 + ... + E
                                            Ỹ = β̃0 · X1^β1 · X2^β2 · ... · Ẽ

additive effects, additive error            multiplicative effects, multiplicative errors

Error term:
E ∼ N(0, σ²) −→                             Ẽ ∼ ℓN(1, σ∗) −→
– constant variance                         – constant relative error
– symmetric error distribution              – skewed error distribution
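A sketch of fitting a multiplicative model by regressing on the log scale; all “true” coefficients below are assumed for illustration, and the regressor enters through exp(β1·x) rather than the power-law form X^β1 (log-transforming the regressor as well would give the latter):

```python
import numpy as np

rng = np.random.default_rng(4)
b0, b1, sigma = 2.0, 0.5, 0.2  # assumed multiplicative model
x = rng.uniform(0.0, 3.0, size=500)
# Ytilde = b0 · exp(b1·x) · Etilde, with a log-normal error Etilde:
y = b0 * np.exp(b1 * x) * rng.lognormal(0.0, sigma, size=500)

# Taking logs yields an ordinary linear model:
# log(Ytilde) = log(b0) + b1·x + E, fit by least squares.
slope, intercept = np.polyfit(x, np.log(y), deg=1)
print(round(float(slope), 1), round(float(np.exp(intercept)), 1))  # ≈ b1, b0
```

Back-transforming the fitted intercept with exp() recovers the multiplicative scale coefficient β̃0, i.e. the median response at x = 0.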

Yu et al (2012): Upregulation of transmitter release probability improves a conversion of synaptic analogue signals into neuronal digital spikes

Figure 1. The probability of releasing glutamates increases during sequential presynaptic spikes...

Yu et al (2012): Upregulation of transmitter release probability improves a conversion of synaptic analogue signals into neuronal digital spikes

Figure 4. Presynaptic Ca²⁺ enhances an efficiency of probability-driven facilitation.



5. Advantages of using the log-normal distribution

... or of applying the log transformation to data.

The normal and log-normal distributions are difficult to distinguish
for σ∗ < 1.2 ↔ cv < 0.18,
where the coefficient of variation is cv ≈ σ∗ − 1.
−→ We discuss the case of larger σ∗.
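The correspondence σ∗ < 1.2 ↔ cv < 0.18 follows from the exact formula cv = √(e^(σ²) − 1) with σ = log(σ∗); a quick check at the threshold:

```python
import math

sigma_star = 1.2
sigma = math.log(sigma_star)
cv_exact = math.sqrt(math.exp(sigma**2) - 1)  # exact cv of the log-normal
print(round(cv_exact, 2))        # 0.18, the threshold quoted above
print(round(sigma_star - 1, 2))  # 0.2, the rough approximation cv ≈ sigma* − 1
```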

More meaningful parameters

• The expected value of a skewed distribution is less typical


than the median.

• (cv or) σ∗ characterizes the size of the relative error.

• A characteristic σ∗ is found in diseases:
latent periods for different infections: σ∗ ≈ 1.4;
survival times after diagnosis of cancer, for different types: σ∗ ≈ 3
−→ Deeper insight?

Fulfilling assumptions, power

What happens to inference based on the normal distribution


if the data is log-normal?

• Level (= probability of falsely rejecting the null hypothesis)
and coverage probability of confidence intervals remain o.k.

• Loss of power! −→ wasted effort!
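A Monte-Carlo sketch of this power loss, under an assumed setup (log-normal groups with σ∗ = e, medians differing by a factor e^0.9, n = 15 per group, a simple two-sample z-type test — not the exact design behind the slide’s figure):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, shift = 15, 2000, 0.9  # assumed sample size, repetitions, effect

def rejects(a, b):
    """Two-sample z-type test: reject when |z| > 1.96."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return abs(a.mean() - b.mean()) / se > 1.96

raw_hits = log_hits = 0
for _ in range(reps):
    a = rng.lognormal(0.0, 1.0, n)
    b = rng.lognormal(shift, 1.0, n)
    raw_hits += rejects(a, b)            # test on the raw, skewed scale
    log_hits += rejects(np.log(a), np.log(b))  # test after log transform

# Testing on the log scale detects the group difference more often.
print(log_hits > raw_hits)
```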



[Figure: difference between two groups (samples); relative effort n/n0 (%) needed for 90% power, plotted against σ∗ (1.0–3.5), for n0 = 5, 10, 50.]

More informative graphics

[Figure: latency (0–40) vs. time (−1 to 11) for groups ad.0, ad.30, leth.0, leth.20, leth.30, plotted on the original scale, with significance marks ∗ and ^.]

More informative graphics

[Figure: the same latency data on the original scale and on a log scale (latency 1–20); on the log scale, more comparisons are marked significant (∗, ∗∗, ^, +).]

More significance

6. Conclusions

Genesis

• The normal distribution is good for estimators, test statistics,


data with small coefficient of variation, and log-transformed data.
The log-normal distribution is good for original data.

• Summation, Means, Central limit theorem, Hyp. of elem. errors


−→ normal distribution
Multiplication, Geometric means, ...
−→ log-normal distribution

Applications

• Adequate ranges: µ∗ ×/ σ∗² covers ≈ 95% of the data


• Gain of power of hypothesis tests −→ save efforts for experiments
(e.g., saves animals!)

• Regression models assume normally distributed errors.
−→ Use a regression model for log(Y) instead of Y.
Back transformation: Ỹ = β̃0 · X1^β1 · X2^β2 · ... · Ẽ
• Parameter σ ∗ may characterize a class of phenomena
(e.g., diseases) −→ new insight ?!

Mathematical Statistics adds              Nature multiplies
−→ uses the normal distribution          −→ yields the log-normal distribution

Scientists (and applied statisticians) add logarithms:
use the normal distribution for log(data) and theory,
use the log-normal distribution for data.

Thank you for your attention!
