Numerical Descriptors of Data
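(a) Median ($x_{0.5}$): the middle value of the data set; the (50)-percentile, (0.5)-quantile, (2nd)-quartile
$x_{0.5} = x_{((N+1)/2)}$ for $N$ odd, and $x_{0.5} = \frac{1}{2}\left(x_{(N/2)} + x_{(N/2+1)}\right)$ for $N$ even, where $x_{(k)}$ denotes the $k$-th smallest value
Sample mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$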
X1 = c(1:100,1000000)
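As a quick check of how the two measures respond to the single extreme value in X1, the following R sketch compares the median and the sample mean. The names `med_X1` and `mean_X1` are introduced here for illustration only.

```r
# X1 holds the integers 1..100 plus one extreme value
X1 = c(1:100, 1000000)

med_X1  = median(X1)   # middle (51st) of the 101 sorted values
mean_X1 = mean(X1)     # pulled upward by the single extreme value

print(c(median = med_X1, mean = mean_X1))
```

The median stays at 51, while the sample mean is close to 9951, a value far from every observation except the extreme one.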
* Example 2: In the case of a multi-peak distribution, median and sample mean can be
significantly different.
Seoul National University Instructor: Junho Song
Dept. of Civil and Environmental Engineering [email protected]
X2 = c(array(1,1000),25,array(100,1000))
X3 = c(array(24,1000),25,array(26,1000))
mean(X2)
mean(X3)
median(X2)
median(X3)
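To make the contrast in Example 2 explicit, the sketch below collects the mean and median of each data set side by side. It uses `rep()` instead of `array()` only as an equivalent way to repeat a value, and the names `summary_X2`, `summary_X3`, and `gap` are illustrative assumptions.

```r
X2 = c(rep(1, 1000), 25, rep(100, 1000))   # two separated peaks, at 1 and at 100
X3 = c(rep(24, 1000), 25, rep(26, 1000))   # values tightly packed around 25

summary_X2 = c(mean = mean(X2), median = median(X2))
summary_X3 = c(mean = mean(X3), median = median(X3))

# Gap between mean and median for each data set
gap = c(X2 = abs(mean(X2) - median(X2)), X3 = abs(mean(X3) - median(X3)))

print(summary_X2)
print(summary_X3)
print(gap)
```

Both data sets have a median of 25, but only for X3 does the mean agree with it; for the two-peak set X2 the mean lies near the midpoint between the peaks, where almost no data are found.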
2. Measure of Dispersion
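(a) Range: $r = x_{\max} - x_{\min}$
(b) Interquartile range: $\mathrm{IQR} = x_{0.75} - x_{0.25}$
~ more stable than the range
~ spread of the (50)% of the population at the center
~ generally, $(x_{1-q} - x_q)$ for small $q$ can be used as a measure of dispersion ($q = 0.25$ for the IQR)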
range_FR = diff(range(FR))
IQR_FR = IQR(FR, type=2)
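The stability of the IQR relative to the range can be illustrated on made-up data. The vectors `A` and `B` and the helper `spread_q` below are hypothetical and are not the FR data used above; `spread_q` is simply one way to compute a quantile-based spread between the q-quantile and the (1 - q)-quantile.

```r
# Hypothetical data: 1..101, and the same set with the largest value made extreme
A = 1:101
B = c(1:100, 1000000)

range_A = diff(range(A)); range_B = diff(range(B))   # max minus min
IQR_A = IQR(A, type = 2); IQR_B = IQR(B, type = 2)   # 0.75-quantile minus 0.25-quantile

# General quantile-based spread; q = 0.25 reproduces the IQR
spread_q = function(x, q) diff(quantile(x, c(q, 1 - q), type = 2, names = FALSE))

print(c(range_A = range_A, range_B = range_B))
print(c(IQR_A = IQR_A, IQR_B = IQR_B, spread_A = spread_q(A, 0.25)))
```

Replacing one value by an extreme one changes the range from 100 to 999999, while the IQR is unchanged because it depends only on the central half of the sorted data.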
How about using “the average of the deviations from the mean” as a measure of dispersion?
Question 3: What is the average of the deviations for each data set?
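A short derivation, using only the definition of the sample mean $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$, suggests why the plain average of deviations is not a useful measure of dispersion:

```latex
\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})
  = \frac{1}{N}\sum_{i=1}^{N} x_i - \frac{1}{N}\, N \bar{x}
  = \bar{x} - \bar{x}
  = 0
```

Positive and negative deviations cancel for any data set, which motivates averaging the absolute or the squared deviations instead.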
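Mean absolute deviation: $d = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|$
Sample variance: $s^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$
Sample standard deviation: $s = \sqrt{s^2}$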
Data set                        $d$    $s^2$    $s$
Data Set 1 {10, 20, 30, 40}     ( )    ( )      ( )
Data Set 2 {10, 10, 40, 40}     ( )    ( )      ( )
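The three measures for the two data sets can be computed with a small helper. This is a minimal sketch that divides by N throughout; the function name `disp_N` is an assumption made for illustration.

```r
# Dispersion measures with division by N (not N - 1)
disp_N = function(x) {
  dev = x - mean(x)            # deviations from the sample mean
  c(d  = mean(abs(dev)),       # mean absolute deviation
    s2 = mean(dev^2),          # variance, divided by N
    s  = sqrt(mean(dev^2)))    # standard deviation
}

res_1 = disp_N(c(10, 20, 30, 40))   # Data Set 1
res_2 = disp_N(c(10, 10, 40, 40))   # Data Set 2
print(rbind(res_1, res_2))
```

Both data sets have the same mean (25), yet Data Set 2 gives the larger values (d = 15 versus 10, s = 15 versus about 11.18), reflecting that its points sit farther from the center.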
(f) “Unbiased” sample variance and standard deviation: divide by $(N-1)$ instead of $N$, i.e. $s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$
X4 = c(10,20,30,40)
X5 = c(10,10,40,40)
mad_X4 = mean(abs(X4-mean(X4)))
mad_X5 = mean(abs(X5-mean(X5)))
var(X4)
var(X5)
sd(X4)
sd(X5)
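Note that R's built-in `var()` and `sd()` use the (N - 1) divisor. The sketch below checks this for X4 and shows how to rescale the result when the divide-by-N form is wanted; the names `var_unbiased` and `var_N` are illustrative.

```r
X4 = c(10, 20, 30, 40)
N  = length(X4)

var_unbiased = sum((X4 - mean(X4))^2) / (N - 1)   # divide by N - 1
var_N        = sum((X4 - mean(X4))^2) / N         # divide by N

print(all.equal(var(X4), var_unbiased))           # var() matches the N - 1 form
print(all.equal(var_N, var(X4) * (N - 1) / N))    # rescaling recovers the N form
```

For small N the two versions differ noticeably (here 500/3 versus 125); the difference fades as N grows.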
Sample coefficient of variation (c.o.v.): $\hat{\delta} = s / |\bar{x}|$
- dimensionless
- independent of ( ) or ( )
X6 = c(1,2,3)
X7 = c(2,4,6)
sd(X6)
sd(X7)
sd(X6)/abs(mean(X6))
sd(X7)/abs(mean(X7))
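The comparison above can be wrapped in a small helper to show that rescaling the data changes the standard deviation but not the ratio of standard deviation to mean. The function name `cv` and the factor 1000 (standing in for a change of units) are assumptions for illustration.

```r
cv = function(x) sd(x) / abs(mean(x))   # ratio of standard deviation to |mean|

X6 = c(1, 2, 3)
X7 = 2 * X6          # same data, doubled
X8 = 1000 * X6       # same data after a hypothetical change of units

print(c(sd_X6 = sd(X6), sd_X7 = sd(X7), sd_X8 = sd(X8)))   # scales with the data
print(c(cv_X6 = cv(X6), cv_X7 = cv(X7), cv_X8 = cv(X8)))   # identical for all three
```

The standard deviations are 1, 2, and 1000, while the ratio is 0.5 in every case.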
4. Measure of Asymmetry
θ̂ =
- Symmetric distribution:
- Asymmetric distribution:
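The exact normalization used for the asymmetry measure is not reproduced here. As a sketch only, the code below assumes one common convention: the third central moment divided by the cube of the standard deviation, with division by N throughout. The function name `skew_N` and the three small data sets are hypothetical.

```r
# Assumed convention: third central moment / (second central moment)^(3/2)
skew_N = function(x) {
  dev = x - mean(x)
  mean(dev^3) / mean(dev^2)^(3/2)
}

sym   = c(1, 2, 3, 4, 5)     # symmetric about 3
right = c(1, 1, 1, 2, 10)    # long tail toward large values
left  = -right               # mirror image: long tail toward small values

print(c(sym = skew_N(sym), right = skew_N(right), left = skew_N(left)))
```

Under this convention the symmetric set gives zero, the right-tailed set a positive value, and its mirror image the same magnitude with a negative sign.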
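(a) Sample Covariance: $s_{XY} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{N-1}\left( \sum_{i=1}^{N} x_i y_i - N \bar{x}\bar{y} \right)$
~ the sign tells us the trend, but not the (strength) of the dependence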
(b) Sample Correlation Coefficient: divide the sample covariance by the product of sample
standard deviations
$r_{XY} = \frac{s_{XY}}{s_X s_Y}$
- dimensionless
- Bounded by (-1) and (1): $-1 \le r_{XY} \le 1$
- $r_{XY} \approx -1$: strong (negative) linear dependence
- $r_{XY} \approx 1$: strong (positive) linear dependence
- $r_{XY} \approx 0$: no significant linear dependence
HT = AddisonCreek$Height
cov(FR,HT)
cor(FR,HT)
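The relation between the two measures can be checked on made-up data. The vectors `x` and `y` below are hypothetical (not FR and HT), and the factor 1000 stands in for a change of units.

```r
x = c(1, 2, 3, 4, 5)    # hypothetical data
y = c(2, 4, 5, 4, 5)

# Correlation coefficient = covariance / product of standard deviations
r_manual = cov(x, y) / (sd(x) * sd(y))
print(all.equal(r_manual, cor(x, y)))

# Covariance depends on the units of the data; correlation does not
print(cov(1000 * x, y) / cov(x, y))
print(all.equal(cor(1000 * x, y), cor(x, y)))

# Exact linear relations give the bounds +1 and -1
print(c(cor(x, 2 * x + 1), cor(x, -x)))
```

Because the covariance changes with the units, its magnitude alone says little about the strength of the dependence, whereas the correlation coefficient stays the same after rescaling.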
Download the dataset ‘Kim_Collapse.txt’ from the eTL website (generated during Mr. Taeyong
Kim’s PhD research)
Related reference: Deniz, D., J. Song, and J.F. Hajjar (2018). Energy-based sidesway collapse fragilities for ductile structural
frames under earthquake loadings. Engineering Structures. Vol. 174, 282-294.
# Exercise 01: Scatter plot of Velocity Ratio (VR) and Drift Ratio (DR)
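Kim = read.table("Kim_Collapse.txt", header = TRUE)  # header row assumed, since columns are accessed by name
VR = Kim$EquivalentVelocityRatio                     # velocity ratio
DR = Kim$DriftRatio                                  # drift ratio
plot(DR, VR)                                         # scatter plot: DR on the x-axis, VR on the y-axis
boxplot(DR, VR)                                      # box plots on the original scales
boxplot(DR/mean(DR), VR/mean(VR))                    # box plots after normalizing each variable by its mean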