Intro to Statistics and Data Analysis in Geology - Dr. Franz J Meyer


Statistics and Data Analysis in Geology

1. Introduction

Dr. Franz J Meyer


Earth and Planetary Remote Sensing,
University of Alaska Fairbanks

Statistics & Data Analysis in Geology Franz Meyer 1


Who Am I?

Graduated from the Technical University of Munich, Germany, in 2000 with a Master's Degree in Geodetic Engineering
I am NOT a statistician; we are somewhat in the same boat

Since Then:
2000-2003: Scientific Employee at the Chair for Photogrammetry and Remote Sensing of the TU Munich.

2004: PhD in SAR Interferometry and Differential SAR Interferometry for the Monitoring of Arctic Ice Caps.

2003-2006: SAR Scientist at the Remote Sensing Technology Institute of the German Aerospace Center (DLR)

2007 - Sept. 2008: Remote Sensing Scientist at the Alaska Satellite Facility (ASF)

Since October 2008: Research Professor at the Geophysical Institute



Main Fields of Expertise
Systems

SAR Image Generation

SAR Interferometry and differential SAR Interferometry

SAR Data Quality Analysis


Main Fields of Expertise
Methods & Applications

RFI Suppression in SAR Processing

Object Detection and Tracking

InSAR Time-Series Analysis for Volcanic and Tectonic Phenomena

Atmospheric & Ionospheric Correction of InSAR Data

Atmospheric & Ionospheric Mapping

Mapping of Landfast Ice, Ice Sheets, and Inland Ice Masses

[Figure panels: Atmospheric Signals; Ionospheric Effects; ~3 cm/y subsidence; Glacier Dynamics; Fast Ice Mapping]



Let's get started



Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is also concerned with prediction and forecasting based on data. It is applicable to a wide variety of academic disciplines, from the natural and social sciences to the humanities, government and business.



... but most importantly, Statistics is

Your Friend!!!



Why is Statistics Important?

Statistics is part of the quantitative approach to knowledge

In the past, geology was largely qualitative, but it is now becoming increasingly quantitative. Statistics can be used to quantify data, but oftentimes statistics are ignored or misrepresented

In the old days, geologists relied more on their observation skills: "This looks like granite"

Nowadays, geologists have large sets of numbers to deal with

Methods of statistical data analysis are required to retrieve information from the set of numbers in the computer



Explosion of Observations

In recent years, sophisticated new observation methods have led to an overwhelming amount of data in many disciplines that needs to be analyzed

Retrieving information from the data, as well as choosing the right observation source for information retrieval, are the main challenges of the information age

Statistics is a way of stepping back and getting the big picture



Explosion of Observations
Example: Earth Observation

The Earth Observing System Data and Information System (EOSDIS)

Collects and archives data from more than 30 Earth-observing satellites

Largest scientific data system in the world

To get some perspective, think of the largest library in the world: the
Library of Congress in Washington, D.C. This massive library contains
29 million books and other printed materials, 2.7 million recordings, 12
million photographs, 4.8 million maps and 57 million manuscripts.

As massive as that sounds, the scientific data from EOSDIS could fill the
Library of Congress 300 times.

Extracting the useful information from these data is one of the tasks of statistics and data analysis



Explosion of Observations
Typical Observation Matrix in Geology

NAME FMTN LONG LAT AS AU CR CU NA PB U ZN LA
17657S 2 146.473 65.371 35 -5 60 22 0.79 36 9.4 110 40
16933S 2 145.859 65.478 42 -12 70 26 0.71 20 7.5 67 73
19063S 2 145.889 65.466 57 21 50 41 0.92 16 3 61 43
19100S 2 145.891 65.462 140 30 90 94 0.83 16 3.5 77 40
21999S 2 145.906 65.475 140 43 100 110 0.77 16 3.7 76 40
15849S 2 146.143 65.411 25 -7 70 34 0.75 20 5.7 64 37
24248S 3 146.358 65.453 16 -5 40 20 1.3 12 14.1 120 37
21935S 3 146.364 65.451 14 -8 50 20 1.3 14 16.5 110 38
17130S 3 146.374 65.449 22 -8 50 27 1.3 16 15.4 140 39
23555S 3 146.38 65.446 17 -6 40 23 1.1 16 15.1 110 34
16754S 3 146.387 65.442 18 -10 80 25 1.3 22 21.5 130 51
18145S 3 146.423 65.452 24 -11 60 19 0.94 38 85.9 68 29
19974S 3 146.427 65.455 18 -5 10 7 1.9 10 37 39 37
16716S 3 146.434 65.464 8 -7 20 9 2.4 12 18.5 63 43
17180S 3 146.437 65.454 38 -12 40 12 2 16 88 78 52
18417S 3 146.441 65.412 23 -8 70 13 1.6 18 46.4 58 40
24426S 3 146.443 65.444 12 11 20 7 2.3 14 25.1 55 63
17185S 3 146.445 65.448 28 -10 40 10 2.5 14 51.6 68 85
19652S 3 146.445 65.451 16 -5 -10 6 2.3 8 14.4 38 29
23558S 3 146.446 65.415 17 -5 40 14 1.5 16 44.5 62 32
18273S 3 146.447 65.464 11 9 10 9.5 2.1 16 20 48 45
23455S 3 146.451 65.452 31 11 40 17 1.3 28 149 78 41
22928S 3 146.454 65.448 44 -10 30 12 1.3 18 114 63 43
15975S 3 146.456 65.42 5 -9 60 8 2.1 14 26.8 43 51
15971S 3 146.459 65.418 8 -7 10 10 2.1 10 16.2 48 27
24247S 3 146.466 65.446 25 -9 30 12 1.4 18 86.1 70 47
23028S 3 146.467 65.448 18 6 30 7.5 1.7 16 40.7 47 39
19634S 3 146.473 65.417 13 -5 40 12 2.1 12 19 48 39
22140S 3 146.475 65.443 28 -14 50 11 1.9 20 107 53 50
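A minimal sketch of how such an observation matrix might be read into a program, assuming it is stored as whitespace-separated text with the header row first (only five of the rows above are copied here). The function name `parse_matrix` is hypothetical. The table states no units, and the meaning of the negative entries (e.g. in the AU column) is not explained, so the sketch simply reads every measured value as a number.

```python
# Five rows copied from the observation matrix above (header first)
MATRIX = """\
NAME FMTN LONG LAT AS AU CR CU NA PB U ZN LA
17657S 2 146.473 65.371 35 -5 60 22 0.79 36 9.4 110 40
16933S 2 145.859 65.478 42 -12 70 26 0.71 20 7.5 67 73
24248S 3 146.358 65.453 16 -5 40 20 1.3 12 14.1 120 37
21935S 3 146.364 65.451 14 -8 50 20 1.3 14 16.5 110 38
17130S 3 146.374 65.449 22 -8 50 27 1.3 16 15.4 140 39
"""


def parse_matrix(text):
    """Return one dict per sample: NAME as str, FMTN as int, all other columns as float."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        record = {}
        for key, value in zip(header, line.split()):
            if key == "NAME":
                record[key] = value
            elif key == "FMTN":
                record[key] = int(value)
            else:
                record[key] = float(value)
        records.append(record)
    return records


records = parse_matrix(MATRIX)

# Group sample names by formation code
by_formation = {}
for rec in records:
    by_formation.setdefault(rec["FMTN"], []).append(rec["NAME"])

print(by_formation)  # {2: ['17657S', '16933S'], 3: ['24248S', '21935S', '17130S']}
```

Once the rows are records, any column can be pulled out per formation for the graphical and statistical comparisons that follow.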



A First Approach to Data Analysis
On the Use of Graphs

Bivariate Plots (plots having, or relating to, two variables):

Are these fields distinct? Are trends significant?



A First Approach to Data Analysis
On the Use of Graphs

Ternary Diagrams:



A First Approach to Data Analysis
On the Use of Graphs

Spider Diagrams:



A First Approach to Data Analysis
On the Use of Graphs

Graphs are a helpful tool for getting a first grasp of the information content in the data

However, the following questions remain:

What do all of these show?
Is there any critical analysis of these curves and fields?

We need a better answer than "They look like it, so they must be different"

We need to apply more rigorous scientific methods to geologic problems, e.g., hypothesis testing

We need to turn to statistics
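As a first taste of hypothesis testing, here is a minimal sketch of a two-sided permutation test. It asks whether the NA values of formations 2 and 3 in the observation matrix could plausibly come from the same population. A permutation test is only one of several options (a t-test is a common alternative); the function names and the choice of 10,000 random relabellings are illustrative assumptions.

```python
import random

# NA values from the observation matrix above, split by the FMTN column
na_fmtn2 = [0.79, 0.71, 0.92, 0.83, 0.77, 0.75]
na_fmtn3 = [1.3, 1.3, 1.3, 1.1, 1.3, 0.94, 1.9, 2.4, 2, 1.6, 2.3, 2.5,
            2.3, 1.5, 2.1, 1.3, 1.3, 2.1, 2.1, 1.4, 1.7, 2.1, 1.9]


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: the group labels are exchangeable, i.e. a and b come from the same
    population. Returns (observed mean difference b - a, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of all samples
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if abs(diff) >= abs(observed):
            hits += 1
    # Adding 1 counts the observed labelling itself and avoids p = 0
    return observed, (hits + 1) / (n_perm + 1)


diff, p = permutation_test(na_fmtn2, na_fmtn3)
print(f"difference of means = {diff:.3f}, p = {p:.4f}")
```

A small p-value means that randomly reshuffling the formation labels almost never produces a difference in means as large as the observed one. Random relabelling is used instead of enumerating all 475,020 possible 6-versus-23 splits simply to keep the sketch short.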



What is the Field of Statistics?

Statistics is the determination of the probable from the possible

This implies we need a rigorous definition and quantification of "probable"
Statistics is the quantitative study of variance

Statistics usually deals with data in the form of numbers

Two common uses of the word "Statistics":

Descriptive Statistics: numerical or graphical summary of data (what was observed)

Inferential Statistics: used to model patterns in the data, accounting for randomness, and to draw inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation), or modeling of relationships (regression).

Other modeling techniques include ANOVA, time series, and data mining.



What is the Field of Statistics?
Descriptive Statistics

Examples:
The average age of citizens who voted for the winning candidate in the last presidential
election
The average length of all books about statistics
The variation in the weight of 100 boxes of cereal selected from a factory's production line
Or, more technically: The adjustments of 14 GPS control points for this orthorectification ranged from 3.63 to 8.36 m, with an arithmetic mean of 5.14 m

Interpretation:
You are most likely to be familiar with this branch of statistics, because many examples
arise in everyday life.
Descriptive statistics form the basis for analysis and discussion in many fields.
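The GPS example above reports a range and an arithmetic mean. Since the 14 GPS adjustments themselves are not given, this minimal sketch computes such a descriptive summary (plus median and standard deviation) for the U column of the observation matrix shown earlier, using Python's standard-library `statistics` module.

```python
import statistics

# U column of the observation matrix above (all 29 samples)
u = [9.4, 7.5, 3, 3.5, 3.7, 5.7,
     14.1, 16.5, 15.4, 15.1, 21.5, 85.9, 37, 18.5, 88, 46.4, 25.1, 51.6,
     14.4, 44.5, 20, 149, 114, 26.8, 16.2, 86.1, 40.7, 19, 107]

summary = {
    "min": min(u),
    "max": max(u),
    "mean": statistics.mean(u),
    "median": statistics.median(u),
    "stdev": statistics.stdev(u),  # sample standard deviation (divides by n - 1)
}

print(f"n = {len(u)}")
for name, value in summary.items():
    print(f"{name:>6} = {value:.2f}")
```

A mean well above the median, as in this column, is a quick hint that a few large values (a right-skewed distribution) dominate the average.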



What is the Field of Statistics?
Inferential Statistics

Examples:
A survey that sampled 2001 full- or part-time workers ages 50 to 70, conducted by the
American Association of Retired Persons (AARP), discovered that 70% of those polled
planned to work past the traditional mid-60s retirement age.
This statistic could be used to draw conclusions about the population of all workers ages 50 to 70.
Or, again more technically: The mean adjustment of any set of GPS points used for orthorectification is no less than 4.3 m and no more than 6.1 m; this statement has a 5% probability of being wrong

Interpretation:
If you use inferential statistics, you start with a hypothesis and look to see whether the data are consistent with this hypothesis.
Inferential statistical methods can easily be misapplied or misconstrued, and many methods require the use of a calculator or computer.
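The GPS statement above is a confidence interval for a mean. A minimal sketch of the usual t-based interval, mean +/- t * s / sqrt(n), applied to the 23 NA values of formation 3 in the observation matrix and assuming those values are roughly normally distributed. Python's standard library has no Student-t quantile function, so the critical value t(0.975, df = 22) of about 2.074 is taken from a standard t table and passed in by hand; `mean_confidence_interval` is a hypothetical helper name.

```python
import math
import statistics

# NA values of the 23 FMTN 3 samples in the observation matrix above
na_fmtn3 = [1.3, 1.3, 1.3, 1.1, 1.3, 0.94, 1.9, 2.4, 2, 1.6, 2.3, 2.5,
            2.3, 1.5, 2.1, 1.3, 1.3, 2.1, 2.1, 1.4, 1.7, 2.1, 1.9]


def mean_confidence_interval(values, t_crit):
    """Two-sided CI for the mean: mean +/- t_crit * s / sqrt(n).

    t_crit must match the chosen confidence level and df = n - 1.
    """
    n = len(values)
    m = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return m - half_width, m + half_width


# 95 % level with df = 22: t(0.975, 22) is about 2.074 (standard t table)
low, high = mean_confidence_interval(na_fmtn3, t_crit=2.074)
print(f"95 % CI for the mean NA of FMTN 3: {low:.2f} to {high:.2f}")
```

The 95 % describes the procedure: intervals built this way cover the true mean in about 95 % of repeated samples.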



What is Variance?

Variance measures how the data values of a variable fluctuate around the mean of that variable.

Variance is the natural error, scatter, or variability in measurements; it can be thought of as the natural spread of data

Variance is an inherent property of the measurement device one is using, or of the object that is observed

Variance is also one of many quantitative measures of variability of data (assuming the data is of Gaussian nature). It is usually written as σ² (population) or s² (sample).
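A minimal sketch of the two usual variance formulas, σ² = Σ(x_i - μ)² / N for a whole population and s² = Σ(x_i - x̄)² / (n - 1) for a sample, computed by hand for the six NA values of formation 2 in the observation matrix and checked against Python's `statistics` module. The standard deviation is the square root of the variance.

```python
import math
import statistics

# NA values of the six FMTN 2 samples in the observation matrix above
x = [0.79, 0.71, 0.92, 0.83, 0.77, 0.75]

n = len(x)
mean = sum(x) / n
squared_dev = [(xi - mean) ** 2 for xi in x]

pop_var = sum(squared_dev) / n           # sigma^2: whole population, divide by n
sample_var = sum(squared_dev) / (n - 1)  # s^2: estimated from a sample, divide by n - 1
sample_sd = math.sqrt(sample_var)        # standard deviation = square root of variance

print(f"mean = {mean:.3f}, sigma^2 = {pop_var:.5f}, s^2 = {sample_var:.5f}, s = {sample_sd:.3f}")

# Cross-check against the standard library
assert math.isclose(pop_var, statistics.pvariance(x))
assert math.isclose(sample_var, statistics.variance(x))
```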



Why do we need Statistics?
Why do Data Vary?
No two measurements/samples/natural objects will ever be the same! A certain variation is inherent to all natural objects, and geological variability is usually what you want to determine, e.g., what is the composition or variability of a granite pluton?

Field sampling errors
not getting a representative sample

Preparation errors
contamination, final split does not represent the field sample

Analytical errors
calibration errors (setting up the machine)
measurement errors (fluctuations in counting)
machine errors (properties of the machine, mass fraction)

Much of the statistical analysis of data focuses on discovering sources of variation



With Great Power Comes Great Responsibility

Drawing conclusions from a set of erroneous data is difficult, and using the wrong analysis methods or the wrong models may lead to incorrect results

"There are three kinds of lies: lies, damned lies and statistics." - Twain attributed this to B. Disraeli

"It has long been recognized by public men of all kinds ... that statistics come under the head of lying, and that no lie is so false or inconclusive as that which is based on statistics." - H. Belloc

"If your experiment needs statistics, you ought to have done a better experiment." - Ernest Rutherford

"Never trust results you haven't forged yourself" - famous saying among engineers



With Great Power Comes Great Responsibility
The Infamous Hockey Stick Graph

Dr Michael Mann of the Department of Geosciences, University of Massachusetts, was the primary author of a paper that overturned the whole of climate history in one scientific coup
Mann M.E. et al, "Northern Hemisphere Temperatures During the Past Millennium:
Inferences, Uncertainties, and Limitations", AGU GRL, v.3.1, 1999

The Hockey Stick Graph:
The most influential and most controversial plot of modern times
Shown here in its original form, it depicts a dramatic increase of temperature in modern times



With Great Power Comes Great Responsibility
The Infamous Hockey Stick Graph

Why is this graph so significant? Because of two features:

It shows a strong temperature change since the beginning of the industrial age, which could be connected to CO2 emissions, and
equally important, it shows fairly flat temperature behavior for the 900 years before
It emphasizes the anthropogenic nature of climate change



With Great Power Comes Great Responsibility
The Infamous Hockey Stick Graph

Let's change the graph by adding the standard deviation (square root of variance) to it
We still see a strong temperature increase since 1900, but the temperature in the past suddenly seems very noisy and not as threatening anymore
The anthropogenic nature still seems present, but not as obvious anymore

The Hockey Stick Graph:
Modified version with uncertainties added
Rise looks less dramatic and less significant



With Great Power Comes Great Responsibility
The Infamous Hockey Stick Graph

Let's change the graph again by adding other studies to it
The temperature rise still sticks out, but the differences between the studies render its amplitude questionable
The temperature trend in the past doesn't look linear anymore
Anthropogenic climate change?

The Hockey Stick Graph:
Many people accused Mann of forgery and came up with different models
Suddenly it all looks very confusing



With Great Power Comes Great Responsibility
The Infamous Hockey Stick Graph

Let's compare the two most contradictory plots

Of course, each party blames the other for forging the data, so what is the truth?
Plots can be very deceiving

The Hockey Stick Graph:
In this battle of the graphs, the two most extreme temperature graphs are compared
In this case, the bottom graph is called "the truth" and Mann's graph is called "forged"
What's the truth? Do you get my point?



How to Lie with Statistics

"How to Lie with Statistics" is Darrell Huff's perennially popular introduction to statistics for the general reader, written in 1954

A guide to how to misuse and distort the results of statistics to fit your interests

The methods presented here will of course NOT be taught in the class



How to Lie with Statistics
The Power of Visualization

Since the eye is a "fat pipe" to the mind, that is, since a great deal of
(mis)information can be quickly communicated visually, the (im)proper display of
statistics offers a fast track to selling ideas, and potentially to lying with statistics

Therefore, the guideline is: keep illustrations simple and accurate

Examples of how different illustrations of the same data convey different information can be found on the following viewgraphs



How to Lie with Statistics
The Power of Visualization

The frogs in pond example

The diagram shows that there were roughly 10 frogs in the pond in May and about 40 in September

The irregular contours make it hard to extract the exact number of frogs

The different size suggests that the difference was much bigger than it really was
How to lie with statistics, Darrell Huff, 1954
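The size effect can be quantified with a little arithmetic. Assuming the larger picture is scaled up by the data ratio in both height and width (the original figure is not reproduced here, so its exact scaling is unknown), the area the eye compares grows with the square of that ratio. A minimal sketch using the slide's rough counts of 10 and 40 frogs; the function name is hypothetical.

```python
def pictogram_area_ratio(small, large):
    """Area ratio of two pictograms when the larger one is scaled by large/small
    in BOTH height and width (an assumption about how the picture was drawn)."""
    k = large / small  # linear scale factor equals the true data ratio
    return k ** 2      # area grows with the square of the linear factor


true_ratio = 40 / 10
seen_ratio = pictogram_area_ratio(10, 40)
print(f"data ratio: {true_ratio:.0f}x, area ratio seen: {seen_ratio:.0f}x")  # data ratio: 4x, area ratio seen: 16x
```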



How to Lie with Statistics
The Power of Visualization

The frogs in pond example

The stacked frogs diagram makes it clearer that there were roughly 3 times as many frogs in September as in May

As it is unlikely that there were exactly 3 times as many frogs, a confusing fractional frog would be required to be more precise

How to lie with statistics, Darrell Huff, 1954



How to Lie with Statistics
The Power of Visualization

The frogs in pond example

Using a bar plot helps to put the raw numbers into perspective

Exact numbers can be displayed

Changes are visually represented

How to lie with statistics, Darrell Huff, 1954



How to Lie with Statistics
The Power of Visualization

The frogs in pond example

WARNING! The information conveyed by something as simple as a bar plot can be distorted by truncating the y-axis

In the truncated example, the population growth appears significantly exaggerated, although the numbers still tell the truth

This shows that statistical results can be manipulated easily and have to be handled with care
How to lie with statistics, Darrell Huff, 1954
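How much a truncated y-axis exaggerates can be computed directly. If the axis starts at a baseline b instead of zero, the drawn bar heights are proportional to (value - b), so the visual ratio between two bars becomes (v2 - b) / (v1 - b). A minimal sketch with hypothetical frog counts (not read from the slide's figure) and a hypothetical function name:

```python
def apparent_ratio(v1, v2, baseline=0.0):
    """Ratio of drawn bar heights when the y-axis starts at `baseline`."""
    if baseline >= min(v1, v2):
        raise ValueError("baseline must lie below both values")
    return (v2 - baseline) / (v1 - baseline)


# Hypothetical counts: 30 frogs in May, 36 in September (a 20 % increase)
may, september = 30, 36
print(f"axis from 0:  {apparent_ratio(may, september):.1f}x")               # axis from 0:  1.2x
print(f"axis from 25: {apparent_ratio(may, september, baseline=25):.1f}x")  # axis from 25: 2.2x
```

The same 20 % increase looks like more than a doubling once the axis starts at 25, even though both labels still show the true counts.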



The Syllabus



Tell me and I will forget,
Show me and I will remember,
Involve me and I will understand.
Lao-Tse



A few last words before we start

The goal of the course

is not to create statisticians, but rather to show you the potential and the concepts of statistical approaches that are (believe me, they will be) useful for you
is to hand you a set of tools that will make your data more valuable and your arguments more powerful
My hope is to give you a somewhat intuitive understanding of which tools to use in a certain situation and why

The homework will help you reach this goal and will help you and me to identify topics that need to be reiterated

ALSO, my foremost goal is to help you understand and to make you pass, so please don't hesitate to contact me when you have problems:
Franz Meyer, Room 106d, Westridge Research Building
Phone: 907-474-7767
email: [email protected]



Ok, a few more words

All the material of the course can also be found online at

http://avo-ftp.images.alaska.edu/TEMP/geos430_geostats/

For downloading the material, just type the URL into your web browser

The PowerPoint material you find on the CD may change a bit over the course of the semester

I will update the material on the webpage constantly and you can download the
latest versions from there
