Seoul National University Instructor: Junho Song

Dept. of Civil and Environmental Engineering [email protected]

457.212 Statistics for Civil & Environmental Engineers

In-Class Material: Class 03
Numerical Descriptors of Data (A&T 1.2-1.3, Supp #1)

Partial descriptors, measures or descriptors to capture (estimated from data)

i) Central tendency: median, sample (s.) mean
ii) Dispersion: range, IQR, mean absolute deviation, s. variance, s. standard dev., s.c.o.v.
iii) Asymmetry: skewness
iv) Linear dependence: s. covariance, s. correlation coeff.

1. Measure of Central Tendency

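(a) Median ($x_{0.5}$): the middle value of the data set, ( )-percentile, ( )-quantile, ( )-quartile

With the data sorted in ascending order and $x_k$ denoting the $k$-th smallest value:

N odd: median $= x_{(N+1)/2}$; e.g., {10, 29, 35}: $x_{0.5} =$

N even: median $= \dfrac{x_{N/2} + x_{N/2+1}}{2}$; e.g., {10, 29, 35, 49}: $x_{0.5} =$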

(b) Sample mean ($\bar{x}$): the average of the sample values

$\bar{x} = \dfrac{1}{N} \sum_{i=1}^{N} x_i$

* Example 1: ( ) is less sensitive to “outliers” (extreme values) than ( ).

{1, 2, 3, …, 100, $10^6$}

$x_{0.5} =$
$\bar{x} =$

X1 = c(1:100,1000000)

median(X1) # quantile(X1, 0.5) should give the same result

mean(X1)
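As a quick check of Example 1, the following sketch evaluates both descriptors with and without the extreme value. The comparison set `X1_clean` and the `shift_*` names are not in the handout; they are introduced here only for illustration.

```r
# Data from Example 1: the integers 1..100 plus one extreme value (10^6)
X1 <- c(1:100, 1000000)
# Hypothetical comparison set (not in the handout): same data without the outlier
X1_clean <- 1:100

med_with     <- median(X1)        # 51st of 101 sorted values
med_without  <- median(X1_clean)  # average of the 50th and 51st values
mean_with    <- mean(X1)
mean_without <- mean(X1_clean)

# Shift caused by the single extreme value in each descriptor
shift_median <- med_with - med_without
shift_mean   <- mean_with - mean_without
print(c(shift_median = shift_median, shift_mean = shift_mean))
```

Comparing the two shifts side by side shows which descriptor reacts more strongly to a single extreme observation.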

* Example 2: In the case of a multi-peak distribution, median and sample mean can be
significantly different.


Data Set (N = 2,001)               $x_{0.5}$    $\bar{x}$

{1, …, 1, 25, 100, …, 100}

{24, …, 24, 25, 26, …, 26}

# Two data sets of size N = 2,001 that share the same middle value (25)
X2 = c(rep(1,1000), 25, rep(100,1000))
X3 = c(rep(24,1000), 25, rep(26,1000))

mean(X2)
mean(X3)

median(X2)
median(X3)

2. Measure of Dispersion

(a) Range: r =

~ nondecreasing function of the sample ( ), therefore not stable

~ unduly affected by high and low values
~ e.g. range of golf driving distances for 100 and 1,000 hits
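The golf example can be imitated with simulated data. The sketch below is only an illustration: the "driving distances" are hypothetical normal draws, and the mean, standard deviation, and seed are arbitrary assumptions.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
# Hypothetical "driving distances": 1,000 draws from a normal distribution
hits <- rnorm(1000, mean = 200, sd = 20)

# Range of the first n observations, for n = 1, ..., 1000
running_range <- cummax(hits) - cummin(hits)

running_range[100]    # range after 100 hits
running_range[1000]   # range after 1,000 hits (never smaller than the above)
```

Because the running maximum can only go up and the running minimum can only go down, the range can never shrink as more observations arrive.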

(b) IQR (Inter Quartile Range) =

~ more stable
~ spread of ( )% population at the center
~ generally, ($x_{1-q} - x_q$) for small $q$ can be used as a measure of dispersion ($q = 0.25$ for IQR)

AddisonCreek = read.table("AddisonCreek.txt", header=TRUE)

FR = AddisonCreek$FlowRate

range_FR = diff(range(FR))
IQR_FR = IQR(FR, type=2)

# minimum and maximum

min(FR)
max(FR)
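The general quantile-based spread $x_{1-q} - x_q$ mentioned above can be sketched as follows. The helper name `quantile_spread` and the sample `x` are made up for illustration. The helper uses R's default quantile definition (type 7), whereas the handout calls `IQR(FR, type=2)`, so values can differ slightly between the two conventions.

```r
# Spread between the (1-q)- and q-quantiles (helper name is hypothetical)
quantile_spread <- function(x, q = 0.25) {
  qs <- quantile(x, probs = c(q, 1 - q), names = FALSE)
  qs[2] - qs[1]
}

x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 40)   # made-up sample with one large value

quantile_spread(x, 0.25)   # same as IQR(x) under R's default quantile type
quantile_spread(x, 0.10)   # wider central band
diff(range(x))             # range, for comparison
```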

How about using “the average of the deviations from the mean” as a measure of dispersion?

• Data set 1: {10, 20, 30, 40}

• Data set 2: {10, 10, 40, 40}

Question 1: Which data set has larger dispersion?

Question 2: What are the sample means?

Question 3: What is the average of the deviations for each data set?
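Questions 2 and 3 can be checked numerically. The sketch below uses the two data sets above; the variable names `D1`, `D2`, and `avg_dev_*` are introduced here for illustration only.

```r
D1 <- c(10, 20, 30, 40)
D2 <- c(10, 10, 40, 40)

# Question 2: sample means
mean(D1)
mean(D2)

# Question 3: average of the (signed) deviations from the mean
avg_dev_D1 <- mean(D1 - mean(D1))
avg_dev_D2 <- mean(D2 - mean(D2))
c(avg_dev_D1, avg_dev_D2)
```

Positive and negative deviations cancel for any data set, since $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = \bar{x} - \bar{x} = 0$, so this quantity cannot distinguish the two sets.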


Since “the average of the deviations” idea does not work …

(c) Mean Absolute Deviation ($d$): average of absolute deviations

$d = \dfrac{1}{N} \sum_{i=1}^{N} |x_i - \bar{x}|$

(d) Sample Variance ($s^2$): average of squared deviations

$s^2 = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$

(e) Sample Standard Deviation ($s$): square root of sample variance

                                 $d$      $s^2$      $s$

Data Set 1: {10, 20, 30, 40}

Data Set 2: {10, 10, 40, 40}

(f) “Unbiased” sample variance and standard deviations: divide by (N–1) instead of (N)

X4 = c(10,20,30,40)
X5 = c(10,10,40,40)

mad_X4 = mean(abs(X4-mean(X4)))
mad_X5 = mean(abs(X5-mean(X5)))

var(X4)
var(X5)

sd(X4)
sd(X5)
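R's built-in `var()` and `sd()` use the divisor N-1, so they return the "unbiased" versions in (f) rather than the divisor-N definitions in (d) and (e). A minimal sketch of the difference, assuming a hypothetical helper `var_N` that is not part of the handout:

```r
# Variance with divisor N, as in definition (d); helper name is hypothetical
var_N <- function(x) mean((x - mean(x))^2)

X4 <- c(10, 20, 30, 40)
X5 <- c(10, 10, 40, 40)
N  <- length(X4)

var_N(X4); var_N(X5)        # divisor N
var(X4);   var(X5)          # divisor N - 1 (R's built-in)
var_N(X4) * N / (N - 1)     # rescaling recovers var(X4)
sqrt(var_N(X4)); sd(X4)     # the two standard deviations
```

For small N the two versions differ noticeably; the gap shrinks as N grows because N/(N-1) approaches 1.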

Comparison of dispersion of data sets with different units or quantities? Consider the unbiased sample variances of {1, 2, 3} and {2, 4, 6}.

We need a measure of dispersion that is not affected by “scaling” or “unit changes.”

(g) Sample Coefficient of Variation (C.O.V.; δ̂ )

δ̂ =

- dimensionless
- independent of ( ) or ( )


- useful for comparing ( ) of data sets with different magnitude or quantity

- does not work when $\bar{x}$ is close to ( )

Sample c.o.v. of {1, 2, 3} and {2, 4, 6}?

X6 = c(1,2,3)
X7 = c(2,4,6)
sd(X6)
sd(X7)
sd(X6)/abs(mean(X6))
sd(X7)/abs(mean(X7))
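The same comparison can be wrapped in a small helper to see the effect of rescaling directly. This is a sketch only; `coef_var` and `scale_factors` are hypothetical names chosen for illustration.

```r
# Sample c.o.v. = standard deviation / |mean| (helper name is hypothetical)
coef_var <- function(x) sd(x) / abs(mean(x))

X6 <- c(1, 2, 3)
scale_factors <- c(1, 2, 1000)   # e.g., a change of units

sapply(scale_factors, function(k) sd(k * X6))         # grows with k
sapply(scale_factors, function(k) coef_var(k * X6))   # unchanged by k
```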

3. How to install R packages

• Collections of functions and data sets developed by the community

• Can make R more powerful by improving existing base R functions, or by adding new
ones

- Example : R package "e1071"

install.packages("e1071") # install packages

library(e1071) # load and attach add-on packages

4. Measure of Asymmetry

(a) Sample Coefficient of Skewness (θ̂ )

θ̂ =

- Symmetric distribution:
- Asymmetric distribution:

If positive: “positive skewness” or “skewed to the ( )”

If negative: “negative skewness” or “skewed to the ( )”

# Compute the skewness coeff. using the function skewness in the "e1071" package
skewness(FR)
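To see how the sign of a skewness coefficient behaves without installing a package, the sketch below uses a moment-based estimator (third central moment divided by the 3/2 power of the second central moment). This particular estimator is an assumption made here, since the handout leaves the formula blank, and `e1071::skewness` offers several estimator types that differ by sample-size factors, so its values may differ slightly. The helper name and the three small data sets are hypothetical.

```r
# Moment-based skewness: m3 / m2^(3/2), with central moments using divisor N.
# One common estimator, assumed here for illustration (helper name is hypothetical).
skew_moment <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

sym   <- c(1, 2, 3, 4, 5)     # symmetric about 3
right <- c(1, 2, 3, 4, 20)    # long tail toward large values
left  <- -right               # mirror image of the previous set

skew_moment(sym)
skew_moment(right)
skew_moment(left)
```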

5. Measure of Linear Dependence between Two Data Samples

Data are given in pairs, i.e., $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, and we are interested in the dependence.

• “the larger $x_i$, the larger $y_i$”: ( ) linear dependence
• “the larger $x_i$, the smaller $y_i$”: ( ) linear dependence

Can be seen from “scatter plots.” Numerically?


(a) Sample Covariance

$s_{XY} = \dfrac{1}{N-1}\,(\qquad\qquad)$

~ the sign tells us the trend, but not about the ( ) of the dependence

(b) Sample Correlation Coefficient: divide the sample covariance by the product of sample
standard deviations
$r_{XY} =$

- dimensionless
- Bounded by ( ) and ( ): [ ] $\le r_{XY} \le$ [ ]
- $r_{XY} \approx -1$: strong ( ) linear dependence
- $r_{XY} \approx 1$: strong ( ) linear dependence
- $r_{XY} \approx 0$: no significant linear dependence

Sketches of scatter plots of these three cases?

HT = AddisonCreek$Height
cov(FR,HT)
cor(FR,HT)
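The built-in `cov()` and `cor()` can be compared against a hand computation. The sketch below assumes the standard definitions (sum of products of deviations divided by N-1, then divided by the product of the sample standard deviations) and uses made-up paired data, not the Addison Creek measurements.

```r
# Made-up paired data for illustration (not the Addison Creek measurements)
x <- c(1.0, 2.0, 3.0, 4.0, 5.0)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
N <- length(x)

s_xy <- sum((x - mean(x)) * (y - mean(y))) / (N - 1)   # sample covariance
r_xy <- s_xy / (sd(x) * sd(y))                         # sample correlation coeff.

c(s_xy, cov(x, y))    # manual vs built-in
c(r_xy, cor(x, y))

cov(100 * x, y)       # changing the units of x rescales the covariance ...
cor(100 * x, y)       # ... but leaves the correlation coefficient unchanged
```

The last two lines show why the covariance alone cannot indicate how strong the dependence is: its magnitude depends on the units of the data.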

6. Example 1: Computational simulations of steel structures under earthquake ground motions

Download the dataset ‘Kim_Collapse.txt’ from the eTL website (generated during Mr. Taeyong
Kim’s PhD research)
Related reference: Deniz, D., J. Song, and J.F. Hajjar (2018). Energy-based sidesway collapse fragilities for ductile structural frames under earthquake loadings. Engineering Structures, Vol. 174, 282-294.

# Exercise 01: Scatter plot of Velocity Ratio (VR) and Drift Ratio (DR)

Kim = read.table("Kim_Collapse.txt", header=TRUE)
VR = Kim$EquivalentVelocityRatio
DR = Kim$DriftRatio

plot(DR,VR)

# Exercise 02: Compare partial descriptors of two sets - median, mean,
# maximum, minimum, variance, standard deviation, and c.o.v.

median(VR); mean(VR); max(VR); min(VR); var(VR); sd(VR);

sd(VR)/abs(mean(VR))
median(DR); mean(DR); max(DR); min(DR); var(DR); sd(DR);
sd(DR)/abs(mean(DR))

# Exercise 03: Compare boxplots of DR and VR (before/after scaling by
# means)

boxplot(DR,VR); boxplot(DR/mean(DR),VR/mean(VR))


7. Example 2: Sample correlation coefficient between DO (dissolved oxygen) and BOD (biochemical oxygen demand) in water?
