Probability Distribution Functions and Partial Descriptors

The document provides in-class material for a statistics course focused on civil and environmental engineering, covering probability distribution functions and their descriptors. It includes examples of calculating probability density functions (PDF), cumulative distribution functions (CDF), and various statistical measures such as mean, median, variance, and skewness. Additionally, it presents methods for generating random samples and visualizing empirical distributions using R programming.

Seoul National University Instructor: Junho Song

Dept. of Civil and Environmental Engineering [email protected]

457.212 Statistics for Civil & Environmental Engineers


In-Class Material: Class 10
Probability Distribution Functions and Partial Descriptors (A&T: 3.1)

Example 1: Suppose the likelihood of accidents is uniform along the 100 km highway. Let
X denote the distance between the starting point and an accident location. Determine

[Sketch: highway axis from 0 to 100 km]

(a) Probability density function (PDF) of X and plot

(b) Cumulative distribution function (CDF) of X and plot

(c) $P(20 \le X \le 50)$ by use of PDF and CDF

[Plot axes: $f_X(x)$ and $F_X(x)$ versus $x$]

ex01_sample = runif(10000, min=0, max=100)


# generate random numbers following uniform distribution U(0,100)
hist(ex01_sample, freq=FALSE, breaks=seq(0,100,5), main="PDF")
# plot empirical PDF from samples: use hist with freq=FALSE
plot(ecdf(ex01_sample), verticals=TRUE, pch="", main="CDF")

# plot empirical CDF using samples

ex01_c = subset(ex01_sample, (ex01_sample>=20 & ex01_sample<=50))


# subset of elements satisfying the condition
c_ans = length(ex01_c)/length(ex01_sample) # P(20≤X≤50)
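
The sampling estimate above can be compared with the closed-form result for U(0, 100), where $f_X(x) = 1/100$ and $F_X(x) = x/100$ on $[0, 100]$. The following is a minimal sketch using base R's dunif and punif; the names a, b, p_pdf and p_cdf are illustrative and do not come from the handout.

```r
# Closed-form check for Example 1, X ~ U(0, 100)
a = 0; b = 100

# (c) via the PDF: integrate the constant density 1/(b-a) over [20, 50]
p_pdf = integrate(dunif, lower=20, upper=50, min=a, max=b)$value

# (c) via the CDF: F_X(50) - F_X(20)
p_cdf = punif(50, min=a, max=b) - punif(20, min=a, max=b)

print(c(p_pdf, p_cdf)) # both should be close to 0.3
```

With 10,000 samples, c_ans from the simulation should land near the same value, up to sampling error.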

Example 2: Suppose an object can fall anywhere within a 10-km radius circle at random
(i.e., uniform likelihood over points inside the circle). Determine PDF and CDF of the
distance between the location and the center of the circle, 𝑅.

# creating a user-defined function


ex02 = function(num) { # create a function
  samp = matrix(nrow=0, ncol=2) # empty 2-column matrix, so dim(samp) is defined before the first accepted point
  repeat{ # repeated infinitely without 'break'
    x = runif(1, min=-10, max=10)
    y = runif(1, min=-10, max=10)
    if(x^2+y^2 < 100){ # random sample is located inside of the circle
      samp = rbind(samp, c(x,y))
    }
    if(dim(samp)[1]==num) break # break when number of samples is 'num'
  }
  return(samp) # result of function 'ex02' is 'samp'
}

# scatter plot of samples


ex02_sample = ex02(1000)

plot(ex02_sample, xlab="x", ylab="y", asp=1)

# draw CDF
distance = sqrt(ex02_sample[,1]^2+ex02_sample[,2]^2)
ex02_sample = cbind(ex02_sample, distance)
plot(ecdf(ex02_sample[,3]), verticals=TRUE, pch="", main="CDF")
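
For a point uniform over the circle, $P(R \le r)$ is the ratio of the area of the radius-$r$ circle to the full area, so $F_R(r) = r^2/100$ and $f_R(r) = r/50$ for $0 \le r \le 10$. The sketch below compares this with an empirical CDF. It uses a vectorised variant of the rejection sampling above rather than the handout's ex02 function, and the names cdf_R, pdf_R, r_samp and max_gap are illustrative.

```r
# Theoretical CDF of R: (area of radius-r circle)/(area of full circle) = r^2/100
cdf_R = function(r) { pmin(pmax(r, 0), 10)^2/100 }
# Corresponding PDF, the derivative of the CDF on [0, 10]
pdf_R = function(r) { ifelse(r >= 0 & r <= 10, r/50, 0) }

set.seed(1) # fixed seed so the sketch is reproducible
# draw points in the square [-10,10]^2 and keep those inside the circle
x = runif(20000, min=-10, max=10)
y = runif(20000, min=-10, max=10)
inside = (x^2 + y^2 < 100)
r_samp = sqrt(x[inside]^2 + y[inside]^2)

# largest gap between the empirical and theoretical CDF on a grid of radii
r_grid = seq(0, 10, by=0.1)
max_gap = max(abs(ecdf(r_samp)(r_grid) - cdf_R(r_grid)))
print(max_gap) # should be small
```

Adding curve(cdf_R, from=0, to=10, add=TRUE) after the ecdf plot above would overlay the theoretical curve on the empirical one.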

1. Partial Descriptors of a random variable

(a) “Complete” description by probability functions:

(b) “Partial” descriptors: measures of key characteristics; can be derived from ( )

“Numerical descriptions of a sample are estimates of partial descriptors representing populations,” e.g., 𝑥̅ and 𝜇

Note:

• Expectation: $E[\cdot] = \int_{-\infty}^{\infty} (\cdot)\, f_X(x)\, dx$ (continuous) or $\sum_{\text{all } x} (\cdot)\, p_X(x)$ (discrete)

• Moment: $E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\, dx$ or $\sum_{\text{all } x} x^n p_X(x)$

• Central Moment: $E[(X-\mu_X)^n] = \int_{-\infty}^{\infty} (x-\mu_X)^n f_X(x)\, dx$ or $\sum_{\text{all } x} (x-\mu_X)^n p_X(x)$
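
The central-moment definition for a continuous random variable can be evaluated numerically with base R's integrate. The sketch below is one possible way to do it; central_moment and dens_unif are hypothetical helper names, not part of the handout, and U(0, 100) from Example 1 serves as the test case.

```r
# n-th central moment of a continuous r.v. by numerical integration
# (central_moment is a hypothetical helper, not part of the handout)
central_moment = function(dens, n, lower, upper) {
  mu = integrate(function(x) x*dens(x), lower=lower, upper=upper)$value # E[X]
  integrate(function(x) (x-mu)^n*dens(x), lower=lower, upper=upper)$value # E[(X-mu)^n]
}

dens_unif = function(x) { dunif(x, min=0, max=100) } # PDF of U(0,100)
m2 = central_moment(dens_unif, 2, 0, 100) # variance, exact value 100^2/12
m3 = central_moment(dens_unif, 3, 0, 100) # vanishes for a symmetric PDF
print(c(m2, m3))
```

The integration limits are passed explicitly here because the PDF is zero outside its support; an infinite range would also work with integrate but is less robust for densities with jumps.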

| Measure | Name | Definition | Meaning (PDF/CDF) |
|---|---|---|---|
| Measure of Central Location | Mean, $\mu_X$ | First moment, $E[X]$ | Location of the ( ) of an area underneath ( ) |
| Measure of Central Location | Median, $x_{0.5}$ | $F_X(x_{0.5}) = 0.5$, i.e. $x_{0.5} = F_X^{-1}(0.5)$ | The value of a r.v. at which values above and below it are _______lly probable. If symmetric? |
| Measure of Central Location | Mode, $\tilde{x}$ | $\arg\max_x f_X(x)$ | The outcome that has the _______est probability mass or density. Becomes more useful as the asymmetry of the distribution increases |
| Measure of Dispersion | Variance, $\sigma_X^2$ | Second-order central moment, $E[(X-\mu_X)^2] = E[X^2] - E[X]^2$ | Average of squared deviations |
| Measure of Dispersion | Standard Deviation, $\sigma_X$ | $\sqrt{\sigma_X^2}$ | Radius of ( ) |
| Measure of Dispersion | Coefficient of Variation (C.O.V.) | $\sigma_X / \lvert \mu_X \rvert$ | __________ed radius of ( ) |
| Asymmetry | Coefficient of Skewness | Third-order central moment normalized by $\sigma_X^3$: $E[(X-\mu_X)^3] / \sigma_X^3$ | Behavior of two tails; cases > 0, = 0, < 0 |
| Flatness | Coefficient of Kurtosis | Fourth-order central moment normalized by $\sigma_X^4$: $E[(X-\mu_X)^4] / \sigma_X^4$ | “Peakedness”: more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations. |

Example 3: Compute the mean, median, mode, variance, standard deviation and
coefficient of variation for a discrete random variable X whose PMF is given as

x    P_X(x)
0    0.20
1    0.50
2    0.30
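
For a discrete random variable the integrals in the definitions become sums over the PMF. The sketch below applies them to the PMF of Example 3; all variable names are illustrative. The median line takes the smallest x with $F_X(x) \ge 0.5$, which is one common convention for discrete distributions and is an assumption here, not something the handout states.

```r
x = c(0, 1, 2) # possible values of X
p = c(0.20, 0.50, 0.30) # PMF values P_X(x)

mean_x = sum(x*p) # first moment, E[X]
var_x = sum((x-mean_x)^2*p) # second-order central moment
std_x = sqrt(var_x) # standard deviation
cov_x = std_x/abs(mean_x) # coefficient of variation
mode_x = x[which.max(p)] # value with the largest probability mass
median_x = x[which(cumsum(p) >= 0.5)[1]] # smallest x with F_X(x) >= 0.5 (assumed convention)
print(c(mean_x, var_x, std_x, cov_x, mode_x, median_x))
```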

Example 4: Compute the mean, median, mode, variance, standard deviation and coefficient
of variation for a continuous random variable X whose PDF is given by
$f_X(x) = (3/1000)\,x^2$, $0 \le x \le 10$, and 0 elsewhere. What is your guess on the sign of the
coefficient of skewness? Confirm your guess by computing it.

# mean
ex04_pdf = function(x) {x*(3/1000)*x^2} # integrand x*f_X(x) for the mean, not the PDF itself
mean_data = integrate(ex04_pdf, lower=0, upper=10) # integrate function
mean = mean_data$value # only 'value' of the integrate result

# variance
ex04_var = function(x) {(x-mean)^2*(3/1000)*x^2}
var_data = integrate(ex04_var, lower=0, upper=10)
var = var_data$value
std = sqrt(var)

# skewness
ex04_skew = function(x) {(x-mean)^3*(3/1000)*x^2}

skew_data = integrate(ex04_skew, lower=0, upper=10)


skew = skew_data$value/std^3
# kurtosis
ex04_kurt = function(x) {(x-mean)^4*(3/1000)*x^2}
kurt_data = integrate(ex04_kurt, lower=0, upper=10)
kurt = kurt_data$value/std^4
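
The code above covers the mean, variance, skewness and kurtosis; the median, mode and coefficient of variation asked for in Example 4 can be obtained along the same lines. The following is a minimal, self-contained sketch using base R's uniroot and optimize. The names ex04_f, ex04_F, med, mode4, m1, v and cov4 are illustrative, and the CDF $F_X(x) = x^3/1000$ follows from integrating the given PDF from 0 to x.

```r
ex04_f = function(x) { (3/1000)*x^2 } # PDF of Example 4 on [0, 10]
ex04_F = function(x) { x^3/1000 } # CDF: integral of the PDF from 0 to x

# median: solve F_X(x) = 0.5 on [0, 10]
med = uniroot(function(x) ex04_F(x) - 0.5, lower=0, upper=10, tol=1e-9)$root

# mode: maximise the PDF over its support (the PDF is increasing, so the
# maximiser sits at the upper boundary, up to the optimiser's tolerance)
mode4 = optimize(ex04_f, interval=c(0, 10), maximum=TRUE)$maximum

# coefficient of variation: standard deviation divided by |mean|
m1 = integrate(function(x) x*ex04_f(x), lower=0, upper=10)$value
v = integrate(function(x) (x-m1)^2*ex04_f(x), lower=0, upper=10)$value
cov4 = sqrt(v)/abs(m1)

print(c(med, mode4, cov4))
```

Since the probability mass is concentrated near the upper end with a longer tail toward 0, the mean lies below the median and the mode, which is consistent with a negative coefficient of skewness.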
