Lecture 1 Introduction

Uploaded by

sharontao

ENVR 5320

Environmental Data Analysis

Lecture 1

Dr. Zhi NING

Division of Environment and Sustainability
The Hong Kong University of Science and Technology
Agenda

• Environmental problems and statistics

• Brief review of statistics
• Get familiar with the Excel tools
• Statistical distribution measures
• Probability distributions

Environmental Problems and Statistics

• The goal of statistics

– to make the discovery process efficient.
• Environmental laws and regulations:
– toxic chemicals;
– water quality criteria;
– air quality criteria.
• Environmental data
– the limit of detection;
– acute and chronic toxicity criteria;
– cancer potency factors.

Environmental Problems and Statistics

• Use statistical tools to understand nature

Structure of course teaching

• Introduction
– engineering problem and statistical method.
• Case study
– introduce a specific environmental example with
real world data
• Method
– give a brief explanation of statistical method that is
used to prepare the solution.
• Analysis
– show how the data suggest and influence the
method of analysis and give the solution.
Brief Review of Statistics

• Population and sample

– A population is a very large set of N observations
(or data values) from which the sample of n
observations can be imagined to have come.
• Two types of statistics:
– DESCRIPTIVE
• a way of summarizing the complexity of data with a
single number.
– INFERENTIAL
• answer the question, "To what extent can these findings
be GENERALIZED?"

Brief Review of Statistics

• DESCRIPTIVE Statistics
– For one variable ("univariate analysis"):
– Measures of "CENTRAL TENDENCY" (averages) and
of DISPERSION or variance around that average.
– Examples: means, modes, medians, standard
deviation, quartiles, etc.

– For multiple variables:

– The strength of relationship between two variables
(bivariate analysis) or among a set of variables
(multivariate analysis)
– Examples: correlation coefficient
Brief Review of Statistics

• INFERENTIAL Statistics
– Measures of the SIGNIFICANCE of the relationship
between two or more variables. Significance refers to
the probability that the findings could be attributed to
sampling error.
– Appropriate statistics depend on the LEVEL OF
MEASUREMENT OF THE DEPENDENT VARIABLE
(and of the independent variable).
– Examples: t-test, ANOVA (F-ratio)

Let’s Get Familiar with Excel Advanced Tools
• Formula in Excel
• Hidden Developer functions in Excel.
• Practice calculations in Excel data example
• Good practice in using Excel

Excel basics I

• Use of formula
• Use of $
• Use of shortcut to go to cells
• Note the black and white cross
• Plot
• Use of Ctrl + Shift + Enter for array calculation
• Developer tool
• ActiveX

Statistical distribution measures

• Central values
– Arithmetic mean, Geometric mean
– Mode, Median
• Measures of spread
– The range
– The interquartile range (IQR)
– Standard deviation, variance
– Coefficient of variation (CoV)
• Quartiles, Quantiles and percentiles

Statistical distribution measures

• Central values
– Arithmetic mean: AVERAGE(a,b,c)
– Geometric mean: GEOMEAN(a,b,c)
– Mode: value with the highest probability of occurrence
– Median: central value of the ordered data: MEDIAN(a,b,c)
• Trimmed mean:
– e.g. the 5 percent trimmed mean is the average of the
data between the 5th and 95th percentiles
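
As a rough illustration outside Excel, the central values above can be computed with Python's standard `statistics` module. The data values and the `trimmed_mean` helper are made up for this sketch, and cutting the lowest and highest 5 percent of the sorted values is only one simple reading of a 5 percent trimmed mean.

```python
import statistics

def trimmed_mean(values, fraction=0.05):
    """Average after dropping `fraction` of the sorted values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of values cut from each tail
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)

# Hypothetical data: the integers 1..19 plus one high outlier
data = list(range(1, 20)) + [200]

arithmetic = statistics.mean(data)            # counterpart of AVERAGE
geometric = statistics.geometric_mean(data)   # counterpart of GEOMEAN
median = statistics.median(data)              # counterpart of MEDIAN
trimmed = trimmed_mean(data, 0.05)            # outlier and lowest value dropped

print(arithmetic, geometric, median, trimmed)
```

With these numbers the single high value pulls the arithmetic mean well above the median, while the trimmed mean stays close to it.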
Statistical distribution measures

• Influence of the shape of the data distribution

• Right skewed
– Arithmetic mean is influenced by high values;
– G is same as median
– G best represents
• “Heavy tails”
– The “heaviness” of the tails depends on the degrees of freedom (df)
– Higher df leads to normal dist.

• Bimodal distribution in nature
• The implications

Statistical distribution measures

• Measures of spread
– The range (MIN and MAX)
– The interquartile range (IQR)
Percentile (array, k)
Quartile (array, 0/1/2/3/4)
IQR = Q3-Q1
Normalized IQR = 0.7413*(Q3-Q1), which estimates σ for normally distributed data
– The standard deviation
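
The spread measures listed here can be sketched in Python as follows. The data are hypothetical, and `method="inclusive"` is chosen on the assumption that it interpolates the same way as Excel's QUARTILE and PERCENTILE (the .INC variants). The factor 0.7413 rescales the quartile spread so that it is comparable to a standard deviation when the data are roughly normal.

```python
import statistics

# Hypothetical data set
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

data_range = max(data) - min(data)            # MAX - MIN

# Quartiles Q1, Q2, Q3 with linear interpolation between order statistics
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                                 # interquartile range
normalized_iqr = 0.7413 * iqr                 # robust estimate of sigma
sd = statistics.stdev(data)                   # sample standard deviation

print(data_range, q1, q3, iqr, normalized_iqr, sd)
```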

Statistical distribution measures

• Measures of spread
– Variance
VAR(array)

– Coefficient of variation (CV)

Statistical distribution measures

• Measures of spread
– Quartiles, quantiles and percentiles
Quartile (array, 0/1/2/3/4)
Percentile (array, 0.05/0.10/0.95)
– Skewness:
• measure of symmetry of data distribution
Skew (array)
0 is symmetric; <0, left skewed; >0, right
skewed.
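
A minimal Python sketch of the variance, coefficient of variation and skewness, using made-up data. The coefficient of variation is taken here as the sample standard deviation divided by the mean, and `sample_skewness` is a hypothetical helper that follows the bias-adjusted formula usually documented for Excel's SKEW; both are assumptions of this sketch rather than definitions taken from the slides.

```python
import statistics

def sample_skewness(values):
    """Bias-adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)

# Hypothetical data with a longer right tail
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.variance(data)      # sample variance, like VAR
sd = statistics.stdev(data)               # sample standard deviation
cv = sd / statistics.mean(data)           # coefficient of variation
skew = sample_skewness(data)              # > 0 means right skewed

print(variance, sd, cv, skew)
```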

Statistical distribution measures

• Frequency distributions
– Identify cutting points to divide the data into
categories. The cutoff points should be chosen to
divide the data fairly evenly.
FREQUENCY(data_array, bin_array)
Press Ctrl + Shift + Enter (array formula)

No.  Bin  Frequency
1    10   2
2    20   0
3    30   2
4    40   3
5    50   5
6    60   4
7    70   2
8    80   0
9    90   1
10   100  1
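
The counting rule behind such a table can be sketched in Python. The `frequency` helper and the data below are hypothetical. The sketch assumes the convention that each bin counts values greater than the previous cutoff and less than or equal to its own cutoff, with one extra slot for values above the last cutoff.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin: (previous cutoff, cutoff], plus an overflow slot."""
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left finds the first cutoff that is >= value
        counts[bisect_left(bins, value)] += 1
    return counts

# Hypothetical cutoffs (must be sorted) and data
bins = [10, 20, 30]
data = [3, 10, 11, 25, 30, 31, 99]

counts = frequency(data, bins)
print(counts)  # one count per cutoff, then the overflow count
```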
Statistical distribution measures

• Accuracy, Bias and Precision

– Bias measures systematic errors
– Precision measures the degree of scatter in the
data
– Accuracy is a function of both bias and precision.

A known concentration of 8.00 mg/L.

Probability distributions

• The Normal Distribution

– Often called Gaussian distribution
– Characterized completely by N(η, σ²), “a normal
distribution with mean η and variance σ²”.

Read and type Greek letters correctly

• Alt 956 (μ)
• Alt 963 (σ)
• Alt 961 (ρ)
• Alt 960 (π)

https://www.thespruceeats.com/the-greek-alphabet-1705558
Probability distributions

• The Normal Distribution

1. The vertical axis (probability density) is scaled
such that area under the curve is unity (1.0).
2. The standard deviation σ measures the distance
from the mean to the point of inflection.
3. The probability that a positive deviation from the
mean will exceed one σ is 0.1587.
4. Because of symmetry, the probabilities are the
same for negative deviations.
5. The chance that a deviation in either direction will
exceed 2σ is 2(0.0228) = 0.0456.
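
The tail probabilities quoted in points 3 to 5 can be checked with a short Python sketch based on the standard library's `NormalDist`. Variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability that a positive deviation exceeds one sigma
p_above_1sigma = 1 - z.cdf(1)

# By symmetry the same probability applies to negative deviations
p_below_minus_1sigma = z.cdf(-1)

# Probability that a deviation in either direction exceeds two sigma
# (the slide's 0.0456 comes from doubling the rounded value 0.0228)
p_beyond_2sigma = 2 * (1 - z.cdf(2))

print(round(p_above_1sigma, 4), round(p_below_minus_1sigma, 4), round(p_beyond_2sigma, 4))
```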

Probability distributions

• NORM.DIST(x, mean, standard_dev, cumulative)

– Returns the normal distribution of x for the specified η and σ: the cumulative probability if cumulative is TRUE, the probability density if FALSE.
– Returns the cumulative probability α for given x, η and σ.
• NORM.INV(probability, mean, standard_dev)
– Returns the inverse of the normal cumulative distribution for η and σ.
– Returns the x value for given α, η and σ.
• NORM.S.DIST(z, cumulative)
– Returns the standard normal distribution of z, with η=0 and σ=1
• NORM.S.INV(probability)
– Returns the inverse of the standard normal cumulative distribution, with η=0 and σ=1

• Cumulative or not?
• Left tailed or right tailed?
• How to generate a normal distribution in Excel?
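
To make the "cumulative or not" and "left or right tail" questions concrete, here is a hedged Python sketch that mirrors the four functions with `statistics.NormalDist`. The values η = 8, σ = 1 and x = 9 are arbitrary choices for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=8, sigma=1)   # assumed mean and standard deviation
x = 9.0

density = dist.pdf(x)              # like NORM.DIST(x, 8, 1, FALSE): curve height
left_tail = dist.cdf(x)            # like NORM.DIST(x, 8, 1, TRUE): P(X <= x)
right_tail = 1 - left_tail         # right tail is the complement of the left tail

x_back = dist.inv_cdf(left_tail)   # like NORM.INV(probability, 8, 1)

std = NormalDist()                 # standard normal, mean 0 and sigma 1
z_left = std.cdf(1.0)              # like NORM.S.DIST(1, TRUE)
z_975 = std.inv_cdf(0.975)         # like NORM.S.INV(0.975)

print(density, left_tail, right_tail, x_back, z_left, z_975)
```

The inverse functions always take a left-tail probability, so a right-tail question has to be converted with 1 - α first.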

Probability distributions

• Examples
– A normal distribution with η = 8 mg/L and σ = 1 mg/L;
– What is the value below which 95% of the data fall?
– What is the probability that a reading is
below 6.4 mg/L?

– How to draw a normal distribution in Excel?

– Use function: NORM.INV(RAND(),8,1)
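
The same questions can be worked through in Python as a cross-check. This is a sketch only: the seed and the number of draws are arbitrary, and feeding uniform random numbers into the inverse cumulative distribution is the same idea as combining NORM.INV with RAND().

```python
import random
from statistics import NormalDist, mean

dist = NormalDist(mu=8, sigma=1)   # eta = 8 mg/L, sigma = 1 mg/L

x95 = dist.inv_cdf(0.95)           # value with 95% of the data below it
p_below = dist.cdf(6.4)            # probability of a reading below 6.4 mg/L

# Draw random values by passing uniform numbers through the inverse CDF
rng = random.Random(1)             # fixed seed so the sketch is repeatable
draws = [dist.inv_cdf(rng.random()) for _ in range(1000)]

print(round(x95, 3), round(p_below, 4), round(mean(draws), 2))
```

A histogram of `draws` should approximate the bell curve centered on 8 mg/L.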

Probability distributions

• t distribution
– In normal distributions, both η and σ are known;
– In practice, σ is often not known and we use Se to
replace σ:

– Bell shaped and symmetric but tails are wider.

– Width of the t distribution depends on the degrees of
freedom.

Guinness brewer
Gosset, 1908
“Student” as pen name
Probability distributions

• Part of the t table as a function of α and ν

Probability distributions

• T.INV (probability, degree of freedom)

– Returns the inverse of the left tailed Student t distribution
• T.INV.2T (probability, degree of freedom)
– Returns the inverse of the two tailed Student t distribution

• T.DIST (x, degree of freedom, cumulative)

– Returns the left tailed Student t distribution
• T.DIST.RT (x, degree of freedom)
– Returns the right tailed Student t distribution
• T.DIST.2T (x, degree of freedom)
– Returns the two tailed Student t distribution

• If we enter α as probability and n-1 as Deg_freedom, then T.INV.2T
outputs t_{n-1, 1-α/2}, the (1-α/2)th percentile of a t distribution with n-1
degrees of freedom.
Probability distributions

• Example
– What is the 97.5th percentile of a t distribution with
24 degrees of freedom?
– T.INV.2T(0.05, 24)=2.06
OR -T.INV(0.025,24)

– What is the probability of a t value larger than 2.064
in a t distribution with 24 degrees of freedom?

– T.DIST.RT(2.064,24) = 0.025; T.DIST.2T(2.064,24) = 0.05 is the probability that |t| exceeds 2.064
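
Python's standard library has no t distribution, so the sketch below integrates the t density numerically (Simpson's rule) and inverts it by bisection. The helper names `t_pdf`, `t_cdf` and `t_inv` are hypothetical and the approach is a stand-in for checking the numbers, not how Excel computes them.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """Left-tail probability: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv(p, df):
    """Left-tail inverse found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t975 = t_inv(0.975, 24)             # compare with -T.INV(0.025, 24)
right_tail = 1 - t_cdf(2.064, 24)   # one-sided: P(t > 2.064)
two_tail = 2 * right_tail           # two-sided: P(|t| > 2.064)

print(round(t975, 3), round(right_tail, 4), round(two_tail, 4))
```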

Distribution of average and variance

• Consider the sampling distribution of the
average, where many random samples of size n
are collected from a population
• Sample standard deviation:

• Standard error of the mean is:
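
In standard notation, assuming y_i denotes the individual observations and ȳ their average (symbols chosen for this sketch), the two quantities are usually written as:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}},
\qquad
\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}
```

The second expression uses s in place of σ when the population standard deviation is not known.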

Distribution of average and variance

• Central limit effect:

– If the parent distribution that the samples come
from is normal, the distribution of the average is
normal.
– If the parent distribution is not normal, the
distribution of the average will be more nearly normal
than the parent one.
– With increasing sample size n, the
distribution becomes increasingly more normal.
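
A small simulation can illustrate the central limit effect. The sketch below assumes an exponential parent distribution (strongly right skewed) and uses skewness as a rough indicator of how far the averages are from normal. The seed, the number of replicates and the sample sizes are arbitrary choices.

```python
import random
import statistics

def skewness(values):
    """Simple moment-based skewness (0 for a symmetric distribution)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_averages(n, replicates, rng):
    """Averages of `replicates` random samples of size n from an exponential parent."""
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
skew_by_n = {n: skewness(sample_averages(n, 2000, rng)) for n in (1, 5, 30)}

for n, value in skew_by_n.items():
    print(n, round(value, 2))
```

The skewness of the averages should shrink toward zero as n grows, which is the behavior described in the last bullet.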

Distribution of average and variance

• How to estimate the t statistic?

– From normal parent population to samples with t
distribution with df= n-1:

– The sample variance s², scaled as (n-1)s²/σ², follows a chi-square distribution with n-1 degrees of freedom.
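
For reference, the usual textbook forms of these two results for a normal parent population, written with ȳ for the sample average, s for the sample standard deviation and ν = n - 1 for the degrees of freedom (ȳ and ν being symbols assumed here), are:

```latex
t = \frac{\bar{y} - \eta}{s / \sqrt{n}} \sim t_{\nu},
\qquad
\frac{(n - 1)\, s^2}{\sigma^2} \sim \chi^2_{\nu},
\qquad
\nu = n - 1
```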

Distribution of average and variance

• Example:

– From Sd to Se: 0.266
– Normal distribution: N(8,0.27), NORM.DIST(7.51,8,0.27,1)
– t distribution: with t=-1.842 and ν=26, T.DIST(-1.842,26,1)
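
The arithmetic in this example can be traced with a few lines of Python. The sketch assumes that 7.51 is the observed sample average and 8 mg/L the known concentration, which is how the NORM.DIST call reads. The numbers themselves are taken from the slide.

```python
from statistics import NormalDist

eta = 8.0      # known concentration, mg/L
y_bar = 7.51   # assumed to be the observed sample average
se = 0.266     # standard error of the mean, from the slide

# t statistic: distance of the average from eta in standard-error units
t = (y_bar - eta) / se

# Normal approach, counterpart of NORM.DIST(7.51, 8, 0.27, 1)
p_normal = NormalDist(mu=eta, sigma=0.27).cdf(y_bar)

print(round(t, 3), round(p_normal, 4))
```

The t approach, T.DIST(-1.842,26,1), would give a somewhat larger left-tail probability than the normal one because the t distribution has wider tails.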

Tutorial session
