Lecture 1 Introduction

Uploaded by

sharontao
Copyright
© All Rights Reserved
ENVR 5320

Environmental Data Analysis

Lecture 1

Dr. Zhi NING


Division of Environment and Sustainability
The Hong Kong University of Science and Technology
Agenda

• Environmental problems and statistics


• Brief review of statistics
• Get familiar with the Excel tools
• Statistical distribution measures
• Probability distributions

Environmental Problems and Statistics

• The goal of statistics


– to make the discovery process efficient.
• Environmental laws and regulations:
– toxic chemicals;
– water quality criteria;
– air quality criteria.
• Environmental data
– the limit of detection;
– acute and chronic toxicity criteria;
– cancer potency factors.

Environmental Problems and Statistics

• Use statistical tools to understand nature

Structure of course teaching

• Introduction
– engineering problem and statistical method.
• Case study
– introduce a specific environmental example with
real-world data
• Method
– give a brief explanation of statistical method that is
used to prepare the solution.
• Analysis
– show how the data suggest and influence the
method of analysis and give the solution.
Brief Review of Statistics

• Population and sample


– A population is a very large set of N observations
(or data values) from which the sample of n
observations can be imagined to have come.
• Two types of statistics:
– DESCRIPTIVE
• a way of summarizing the complexity of data with a
single number.
– INFERENTIAL
• answer the question, "To what extent can these findings
be GENERALIZED?"

Brief Review of Statistics

• DESCRIPTIVE Statistics
– For one variable ("univariate analysis"):
• Measures of "CENTRAL TENDENCY" (averages) and
of DISPERSION or variance around that average.
• Examples: means, modes, medians, standard
deviation, quartiles, etc.
– For multiple variables:
• The strength of relationship between two variables
(bivariate analysis) or among a set of variables
(multivariate analysis)
• Examples: correlation coefficient
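
As a rough illustration of the bivariate case, the sketch below computes the Pearson correlation coefficient by hand (the quantity Excel's CORREL returns). The function name pearson_r and the temperature/ozone values are hypothetical, chosen only for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (not from the lecture)
temperature = [10, 15, 20, 25, 30]
ozone = [22, 30, 41, 48, 61]
r = pearson_r(temperature, ozone)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.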
Brief Review of Statistics

• INFERENTIAL Statistics
– Measures of the SIGNIFICANCE of the relationship
between two or more variables. Significance refers to
the probability that the findings could be attributed to
sampling error.
– Appropriate statistics depend on the LEVEL OF
MEASUREMENT OF THE DEPENDENT VARIABLE
(and of the independent variable).
– Examples: t-test, ANOVA (F-ratio)

Let’s Get Familiar with Excel Advanced Tools
• Formulas in Excel
• Hidden Developer functions in Excel.
• Practice calculations in Excel data example
• Good practice in using Excel

Excel basics I

• Use of formulas
• Use of $ (absolute cell references)
• Use of shortcut to go to cells
• Note the black and white cross
• Plot
• Use of Ctrl + Shift + Enter for array calculation
• Developer tool
• ActiveX

Statistical distribution measures

• Central values
– Arithmetic mean, Geometric mean
– Mode, Median
• Measures of spread
– The range
– The interquartile range (IQR)
– Standard deviation, variance
– Coefficient of variation (CV)
• Quartiles, Quantiles and percentiles

Statistical distribution measures

• Central values
– Arithmetic mean: AVERAGE(a,b,c)
– Geometric mean: GEOMEAN(a,b,c)
– Mode: value with highest probability of occurrence
– Median: central value of the ordered data, MEDIAN(a,b,c)
• Trimmed mean:
– e.g. the 5 percent trimmed mean is the average of the
data between the 5th and 95th percentiles
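
The central values above can be sketched outside Excel as well. The example below uses Python's standard statistics module; the data values and the helper trimmed_mean are hypothetical, and the percentile interpolation is assumed to follow the "inclusive" rule.

```python
import statistics

# Hypothetical concentrations (mg/L); one high value mimics right-skewed data
data = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.8, 3.3, 4.0, 12.5]

arithmetic_mean = statistics.mean(data)           # Excel: AVERAGE
geometric_mean = statistics.geometric_mean(data)  # Excel: GEOMEAN
median = statistics.median(data)                  # Excel: MEDIAN

def trimmed_mean(values, percent):
    """Average of the values between the percent-th and (100 - percent)-th percentiles."""
    # cuts[i - 1] is the i-th percentile (99 cut points in total)
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[percent - 1], cuts[100 - percent - 1]
    kept = [v for v in values if low <= v <= high]
    return statistics.mean(kept)

print(arithmetic_mean, geometric_mean, median, trimmed_mean(data, 5))
```

With the single high value of 12.5 in the data, the arithmetic mean is pulled upward while the median, geometric mean and trimmed mean stay closer to the bulk of the observations.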
Statistical distribution measures

• Influence of the shape of the data distribution
– Right skewed:
• Arithmetic mean is influenced by high values;
• G is same as median
• G best represents
– “Heavy tails”:
• The “heaviness” of the tails depends on the degrees of
freedom (df)
• Higher df leads to normal dist.

• Bimodal distribution in nature
• The implications

Statistical distribution measures

• Measures of spread
– The range (MIN and MAX)
– The interquartile range (IQR)
PERCENTILE(array, k)
QUARTILE(array, 0/1/2/3/4)
IQR = Q3 - Q1; for normally distributed data, 0.7413*(Q3-Q1) estimates σ
– The standard deviation

Statistical distribution measures

• Measures of spread
– Variance
VAR(array)

– Coefficient of variation (CV)
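
A minimal sketch of the measures of spread, assuming hypothetical data and the "inclusive" quartile interpolation. The CV is taken as the standard deviation divided by the mean, and 0.7413 × (Q3 - Q1) is the usual robust estimate of σ for normally distributed data.

```python
import statistics

# Hypothetical concentrations (mg/L)
data = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.8, 3.3, 4.0, 12.5]

data_range = max(data) - min(data)                 # MAX - MIN
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")  # quartiles 1, 2, 3
iqr = q3 - q1                                      # interquartile range
sigma_from_iqr = 0.7413 * iqr                      # robust estimate of sigma (normal data)
std_dev = statistics.stdev(data)                   # sample standard deviation
variance = statistics.variance(data)               # sample variance
cv = std_dev / statistics.mean(data)               # coefficient of variation

print(data_range, iqr, sigma_from_iqr, std_dev, variance, cv)
```

For this skewed data set the IQR-based estimate of σ is much smaller than the sample standard deviation, because the quartiles are barely affected by the single high value.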

Statistical distribution measures

• Measures of spread
– Quartiles, quantiles and percentiles
QUARTILE(array, 0/1/2/3/4)
PERCENTILE(array, 0.05/0.10/0.95)
– Skewness:
• measure of symmetry of data distribution
SKEW(array)
0 is symmetric; <0, left skewed; >0, right skewed.
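
The sign convention for skewness can be checked with a short sketch. The function below uses the bias-adjusted sample skewness formula, n / ((n-1)(n-2)) × Σ((x - mean)/s)³, which is the form Excel documents for SKEW; the three small data sets are hypothetical.

```python
import statistics

def skew(values):
    """Bias-adjusted sample skewness (requires at least 3 values)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]          # long tail toward high values
left_skewed = [-v for v in right_skewed]    # mirror image, long tail toward low values

print(skew(symmetric), skew(right_skewed), skew(left_skewed))
```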

Statistical distribution measures

• Frequency distributions
– Identify cut points to divide the data into
categories. The cut points should be chosen to
divide the data fairly evenly.
FREQUENCY(data_array, bins_array)
Press Ctrl + Shift + Enter (array formula)

No.   Bin   Frequency
1     10    2
2     20    0
3     30    2
4     40    3
5     50    5
6     60    4
7     70    2
8     80    0
9     90    1
10    100   1
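
The binning rule can be sketched in a few lines. This is an assumption-laden imitation of FREQUENCY rather than its definition: each bin counts values greater than the previous bin limit and less than or equal to its own limit, and one extra count collects values above the last bin. The function name frequency and the data are hypothetical.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin; bins must be sorted upper limits.

    Returns len(bins) + 1 counts; the last one holds values above the final bin.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # Index of the first bin limit that is >= value
        counts[bisect_left(bins, value)] += 1
    return counts

bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
data = [5, 10, 11, 35, 35.5, 50, 99, 120]  # hypothetical observations
print(frequency(data, bins))
```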
Statistical distribution measures

• Accuracy, Bias and Precision


– Bias measures systematic errors
– Precision measures the degree of scatter in the
data
– Accuracy is a function of both bias and precision.

A known concentration of 8.00 mg/L.
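
To make the distinction concrete, the sketch below compares two hypothetical sets of replicate measurements of the 8.00 mg/L standard. Bias is taken as the mean minus the known value, and precision is summarized by the sample standard deviation; the measurement values are invented for illustration.

```python
import statistics

TRUE_VALUE = 8.00  # known concentration, mg/L

# Hypothetical replicate measurements from two analysts
analyst_a = [8.3, 8.4, 8.2, 8.5, 8.3]  # tightly grouped but offset: precise, biased
analyst_b = [7.2, 8.9, 8.1, 7.5, 8.3]  # centred on 8.0 but scattered: unbiased, imprecise

def bias(measurements, true_value):
    """Systematic error: average measurement minus the known value."""
    return statistics.mean(measurements) - true_value

bias_a = bias(analyst_a, TRUE_VALUE)
bias_b = bias(analyst_b, TRUE_VALUE)
scatter_a = statistics.stdev(analyst_a)  # smaller value = better precision
scatter_b = statistics.stdev(analyst_b)

print(f"A: bias={bias_a:.2f}, s={scatter_a:.2f}")
print(f"B: bias={bias_b:.2f}, s={scatter_b:.2f}")
```

Neither analyst is fully accurate here: accuracy requires both small bias and small scatter.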

Probability distributions

• The Normal Distribution


– Often called Gaussian distribution
– Characterized completely by N(η, σ²), “a normal
distribution with mean η and variance σ²”.

Read and type Greek letters correctly

• Alt 956 (μ)
• Alt 963 (σ)
• Alt 961 (ρ)
• Alt 960 (π)

https://www.thespruceeats.com/the-greek-alphabet-1705558
Probability distributions

• The Normal Distribution


1. The vertical axis (probability density) is scaled
such that area under the curve is unity (1.0).
2. The standard deviation σ measures the distance
from the mean to the point of inflection.
3. The probability that a positive deviation from the
mean will exceed one σ is 0.1587.
4. Because of symmetry, the probabilities are the
same for negative deviations.
5. The chance that a deviation in either direction will
exceed 2σ is 2(0.0228) = 0.0456.
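
The tail probabilities in points 3 and 5 can be checked numerically. The sketch below uses NormalDist from Python's standard library as a stand-in for the Excel functions; with unrounded tails the two-sided value comes out as about 0.0455, and the 0.0456 above results from rounding each tail to 0.0228 first.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

# Point 3: probability that a positive deviation exceeds one sigma
p_above_1_sigma = 1 - Z.cdf(1)

# Point 5: probability that a deviation in either direction exceeds two sigma
p_beyond_2_sigma = 2 * (1 - Z.cdf(2))

print(f"{p_above_1_sigma:.4f}")
print(f"{p_beyond_2_sigma:.4f}")
```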

Probability distributions

• NORM.DIST(x, mean, standard_dev, cumulative)
– Returns the normal cumulative distribution at x for the specified η and σ (with cumulative = TRUE).
– Returns the α value for given x, η and σ values.
• NORM.INV(probability, mean, standard_dev)
– Returns the inverse of the normal cumulative distribution for η and σ.
– Returns the x value for given α, η and σ values.
• NORM.S.DIST(z, cumulative)
– Returns the standard normal cumulative distribution at z, with η=0 and σ=1.
• NORM.S.INV(probability)
– Returns the inverse of the standard normal cumulative distribution, with η=0 and σ=1.

• Cumulative or not?
• Left tailed or right tailed?
• How to generate a normal distribution in Excel?

Probability distributions

• Examples
– A normal distribution with η = 8 mg/L and σ = 1 mg/L;
– What is the value below which 95% of the data fall?
– What is the probability that a reading is
below 6.4 mg/L?

– How to draw a normal distribution in Excel?

– Use function: NORM.INV(RAND(),8,1)
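
The three questions in this example can be worked through with Python's NormalDist as a stand-in for NORM.INV and NORM.DIST. Random values are generated by feeding uniform random numbers into the inverse cumulative distribution, the same idea as wrapping RAND() in NORM.INV; the seed and the number of draws are arbitrary choices for the sketch.

```python
import random
from statistics import NormalDist, mean

dist = NormalDist(mu=8.0, sigma=1.0)  # eta = 8 mg/L, sigma = 1 mg/L

# Value below which 95% of the data fall (like NORM.INV(0.95, 8, 1))
value_95 = dist.inv_cdf(0.95)

# Probability of a reading below 6.4 mg/L (like NORM.DIST(6.4, 8, 1, TRUE))
p_below = dist.cdf(6.4)

# Inverse-transform sampling: uniform random number -> normal random value
rng = random.Random(42)  # fixed seed so the sketch is repeatable
draws = [dist.inv_cdf(rng.random()) for _ in range(1000)]

print(f"95th percentile: {value_95:.3f} mg/L")
print(f"P(reading < 6.4): {p_below:.4f}")
print(f"mean of simulated draws: {mean(draws):.2f} mg/L")
```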

Probability distributions

• t distribution
– In normal distributions, both η and σ are known;
– In practice, σ is often not known and we use Se to
replace σ:

– Bell shaped and symmetric but tails are wider.


– Width of the t distribution depends on the degrees of
freedom.

Gosset, Guinness brewer, 1908; “Student” as pen name
Probability distributions

• Part of the t table as function of ν and α

Probability distributions

• T.INV(probability, deg_freedom)
– Returns the left-tailed inverse of the Student t distribution
• T.INV.2T(probability, deg_freedom)
– Returns the two-tailed inverse of the Student t distribution

• T.DIST(x, deg_freedom, cumulative)
– Returns the left-tailed Student t distribution
• T.DIST.RT(x, deg_freedom)
– Returns the right-tailed Student t distribution
• T.DIST.2T(x, deg_freedom)
– Returns the two-tailed Student t distribution

• If we enter α as probability and n-1 as deg_freedom, then T.INV.2T
outputs t(n-1, 1-α/2), the (1-α/2)th percentile of a t distribution with n-1
degrees of freedom.
Probability distributions

• Example
– What is the 97.5th percentile of a t distribution with
24 degrees of freedom?
– T.INV.2T(0.05,24) = 2.064
OR -T.INV(0.025,24)

– What is the probability of a t value larger than 2.064
in a t distribution with 24 degrees of freedom?

– T.DIST.RT(2.064,24) = 0.025
(T.DIST.2T(2.064,24) = 0.05 is the two-tailed probability, for |t| > 2.064)
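
As a cross-check on this example, the sketch below evaluates the t distribution directly from its density. Python's standard library has no t distribution, so the helpers t_pdf, t_cdf and t_inv are hypothetical names written for this illustration: the cumulative probability comes from Simpson's rule and the percentile from bisection. The step and iteration counts are arbitrary but ample for three-decimal agreement.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Left-tailed probability P(T <= x): 0.5 plus the Simpson integral from 0 to x."""
    h = x / steps  # negative for x < 0, which makes the integral negative as required
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_inv(p, df):
    """Left-tailed inverse: the value x with P(T <= x) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

percentile_975 = t_inv(0.975, 24)      # 97.5th percentile, 24 degrees of freedom
p_right = 1 - t_cdf(2.064, 24)         # one-tailed: P(t > 2.064)
p_two_tailed = 2 * p_right             # two-tailed: P(|t| > 2.064)

print(f"{percentile_975:.3f} {p_right:.4f} {p_two_tailed:.4f}")
```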

Distribution of average and variance

• Consider the sampling distribution of the
average when many random samples of size n
are collected from a population
• Sample standard deviation: Sd = √[Σ(yᵢ − ȳ)² / (n − 1)]

• Standard error of the mean is: Se = Sd/√n
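
A small sketch of the two quantities, using four hypothetical measurements. The sample standard deviation uses the n - 1 divisor, and the standard error of the mean is that standard deviation divided by √n.

```python
import math
import statistics

sample = [7.0, 8.0, 9.0, 8.0]  # hypothetical measurements, mg/L
n = len(sample)

sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)         # standard error of the mean

print(f"Sd = {sd:.4f}, Se = {se:.4f}")
```

The standard error shrinks as n grows, even though the standard deviation of individual observations does not.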

Distribution of average and variance

• Central limit effect:


– If the parent distribution from which the samples
come is normal, the distribution of the average is
normal.
– If the parent distribution is not normal, the
distribution of the average will be more nearly normal
than the parent one.
– With increasing sample size n, the
distribution of the average becomes increasingly normal.
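
The central limit effect can be seen in a small simulation. This sketch assumes an exponential parent distribution (strongly right skewed) and uses skewness as a rough measure of departure from normality; the seed, the sample sizes and the number of replicates are arbitrary choices.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness; 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(n, replicates, rng):
    """Averages of `replicates` random samples of size n from an exponential parent."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (1, 5, 30)}

for n, value in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the averages = {value:.2f}")
```

The skewness of the averages falls toward zero as n increases, which is the "more nearly normal" behaviour described above.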

Distribution of average and variance

• How to estimate the t statistic?
– From a normal parent population to samples following a t
distribution with df = n-1:
t = (ȳ − η) / (Sd/√n)
– The sample variance s² is distributed as a chi-square
distribution:
(n − 1)s²/σ² follows χ² with n − 1 degrees of freedom

Distribution of average and variance

• Example:

From Sd to Se: Se = 0.266

Normal distribution, N(8,0.27): NORM.DIST(7.51,8,0.27,1)
t distribution, with t = -1.842 and ν = 26: T.DIST(-1.842,26,1)
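
The numbers in this example can be reproduced with a short sketch. It assumes that 7.51 is the sample average, 8 is η, Se = 0.266 and ν = n - 1 = 26 (so n = 27), and that 0.27 is used as the standard deviation in the normal calculation. The helpers t_pdf and t_cdf are hypothetical names; the t probability is obtained by Simpson's rule because Python's standard library has no t distribution.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Left-tailed probability P(T <= x): 0.5 plus the Simpson integral from 0 to x."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

y_bar, eta = 7.51, 8.0  # assumed: sample average and true mean, mg/L
se = 0.266              # standard error of the mean from the example
nu = 26                 # degrees of freedom, n - 1

t_stat = (y_bar - eta) / se                  # about -1.842
p_normal = NormalDist(eta, 0.27).cdf(y_bar)  # like NORM.DIST(7.51, 8, 0.27, 1)
p_t = t_cdf(t_stat, nu)                      # like T.DIST(-1.842, 26, 1)

print(f"t = {t_stat:.3f}, normal p = {p_normal:.4f}, t p = {p_t:.4f}")
```

The t distribution gives a slightly larger tail probability than the normal distribution because its tails are wider.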

Tutorial session
