CE-613 - DOC - 01 Introduction To Course, Measurement Scale, Simulation, Graphs, Fallacies-1
Figure (data analysis process) with the stages: Population, Sample; Develop Data Collection Plan; Clean and Manage Data; Analysis (Descriptive -> Diagnostic -> Predictive); Interpretation, Discussion, and Conclusion
Course Textbook
• Mostly for guiding the syllabus
Data Measurement Scales: Qualitative
• Nominal/Categorical Scale (nominal ~ name)
• Labels variables without any quantitative value
• Data fall into categories with no specific order (eye color, land use, transportation modes)
• Dichotomous (only two categories): Failed and Passed, Male and Female
• Graphs: Bar, Pie
• Ordinal Scale (ord ~ order)
• Labels/categories follow a natural order
• Examples: (1) "agree, neutral, disagree", (2) strong, moderately strong, weak
• The size of the steps between categories is unequal or unknown, so differences between them are meaningless
• Graphs: Bar, Pie, Stem and Leaf
Data Measurement Scales: Quantitative
• Interval Scale (Interval = space in between)
• Labelled, in natural order, and the difference between adjacent values is identical (equal intervals)
• The "zero point" does not mean absence of the quantity. Zero is just a number on the scale, set by convention
• Negative values are possible
• Ratios are meaningless: we cannot say 20 deg C is twice as warm as 10 deg C
• Examples: Temperature in deg C, time shown on a clock
• The location of the zero point is not fixed: there is no pre-decided starting point or true zero value
• Graphs: Bar, Pie, Stem and Leaf, Box plot and Histogram
• Ratio scale
• Items have an order, the intervals between units are exact, an absolute zero exists, and ratios are meaningful (the ratio scale provides the most detailed information)
• Examples: Age, Weight, Temperature in Kelvin, Travel time in minutes/hours
• Since an absolute zero exists, negative values do not exist
• The true zero point is what makes ratios meaningful
• 40 kg is twice as heavy as 20 kg
• Graphs: Bar, Pie, Stem and Leaf, Box plot and Histogram, Line Plot, Scatter Plot
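A short sketch of why the same quotient means nothing on an interval scale but something on a ratio scale, using the temperature and mass examples above. The 273.15 offset is the standard Celsius-to-kelvin conversion; the function and variable names are illustrative only.

```python
# Compare the quotient of two temperatures on an interval scale (Celsius)
# with the same quotient on a ratio scale (Kelvin).
KELVIN_OFFSET = 273.15  # 0 deg C = 273.15 K


def celsius_to_kelvin(t_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_c + KELVIN_OFFSET


t_hot_c, t_cold_c = 20.0, 10.0

# On the Celsius scale the quotient is 2, but it has no physical meaning,
# because 0 deg C is a conventional zero, not an absence of heat.
ratio_celsius = t_hot_c / t_cold_c

# On the Kelvin scale (true zero) the quotient is meaningful.
ratio_kelvin = celsius_to_kelvin(t_hot_c) / celsius_to_kelvin(t_cold_c)

# Mass is on a ratio scale, so "twice as heavy" is meaningful.
ratio_mass = 40.0 / 20.0

print(f"Celsius quotient: {ratio_celsius:.3f}")  # 2.000
print(f"Kelvin quotient:  {ratio_kelvin:.3f}")   # 1.035
print(f"Mass quotient:    {ratio_mass:.3f}")     # 2.000
```

In kelvin, 20 deg C is only about 3.5 % "warmer" than 10 deg C, which is why the Celsius quotient of 2 is rejected.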
Data Measurement Scales
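Operation: scales that support it (Nominal, Ordinal, Interval, Ratio)
1 Order values: Ordinal, Interval, Ratio
2 Counts, frequency distribution: Nominal, Ordinal, Interval, Ratio
3 Estimate mode: Nominal, Ordinal, Interval, Ratio
4 Estimate median: Ordinal, Interval, Ratio
5 Estimate mean: Interval, Ratio
6 Quantify the difference between values: Interval, Ratio
7 Add and subtract values: Interval, Ratio
8 Multiply and divide values: Ratio
9 Quantify a true/absolute zero: Ratio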
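The operations above decide which summary statistic is allowed for each scale. A minimal sketch with Python's `statistics` module, using made-up sample data (the variable names and values are hypothetical): mode for nominal data, a median that stays an actual category for ordinal data, and mean and quotients for ratio data.

```python
import statistics

# Hypothetical sample data, one variable per measurement scale.
modes_of_travel = ["car", "bus", "car", "walk", "car"]           # nominal
responses = ["agree", "neutral", "disagree", "agree", "neutral"]  # ordinal
travel_time_min = [12.0, 25.0, 18.0, 30.0, 15.0]                  # ratio

# Nominal: only counts and the mode make sense.
print("Mode of travel:", statistics.mode(modes_of_travel))       # car

# Ordinal: rank the categories, then take a median that is an actual
# category (median_low) instead of averaging ranks, which would assume
# equal steps between categories.
ORDER = ["disagree", "neutral", "agree"]
ranks = [ORDER.index(r) for r in responses]
median_response = ORDER[statistics.median_low(ranks)]
print("Median response:", median_response)                       # neutral

# Ratio: mean, differences and quotients are all meaningful.
mean_time = statistics.mean(travel_time_min)
print("Mean travel time (min):", mean_time)                      # 20.0
print("Longest / shortest:", max(travel_time_min) / min(travel_time_min))  # 2.5
```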
Simulation for Data Analysis
• Real-world data may not cover extreme situations that are important in design
• Nothing prevents extreme cases from occurring in the future
• But we do not know when they will occur, so any system is always at risk
• Uncertainty is everywhere in engineering
• We can simulate the extreme cases to check how the system would behave
• Is the system ready for those events?
• To find out, we need to simulate such cases, assess their impact in a "model environment", and then select a course of action
• We need to assess how sensitive a system is to risk by studying the consequences of hazardous events. That is where simulation helps: it lets us account for uncertainty and quantify risk
• We can use simulation to sample realizations of uncertain outcomes and answer "what would happen in various scenarios?". This is also called "what-if analysis".
Transformation of random numbers
• Linear or non-linear
• Continuous or discrete
• Rolling a fair die: each of the values 1 through 6 is equally probable. This is a uniform distribution in the discrete sense, since values other than those 6 are not possible.
• Similarly, there is the uniform distribution in the continuous sense, where every value between (and including) two bounds is equally likely to occur
• Tossing a balanced/unbiased coin: Head and Tail are equally probable.
• How can we simulate the flip of a coin using a die? Think about even/odd
• Can we transform the random outcome(s) of a coin to simulate the numbers generated by an unbiased six-sided die?
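One possible answer to both questions, as a sketch: an even/odd rule turns a die into a coin. Going the other way needs more than one flip; the version below uses rejection sampling with three flips, which is our choice of technique and not taken from the slides. Python's `random.Random` stands in for the physical die and coin, and the seed 613 is arbitrary.

```python
import random
from collections import Counter


def coin_from_die(rng: random.Random) -> str:
    """Fair coin from a fair six-sided die: even face -> Head, odd face -> Tail."""
    face = rng.randint(1, 6)  # each face 1..6 has probability 1/6
    return "Head" if face % 2 == 0 else "Tail"


def die_from_coins(rng: random.Random) -> int:
    """Fair six-sided die from fair coin flips, by rejection sampling.

    Three flips form a 3-bit number that is uniform on 0..7. Values 6 and 7
    are rejected and the flips are repeated, so 0..5 stay equally likely.
    """
    while True:
        value = 0
        for _ in range(3):
            flip = rng.randint(0, 1)  # 0 = Tail, 1 = Head
            value = 2 * value + flip
        if value < 6:
            return value + 1  # shift 0..5 to faces 1..6


rng = random.Random(613)  # fixed seed so the run is repeatable
n = 60_000
coins = Counter(coin_from_die(rng) for _ in range(n))
dice = Counter(die_from_coins(rng) for _ in range(n))
print({k: round(v / n, 3) for k, v in sorted(coins.items())})
print({k: round(v / n, 3) for k, v in sorted(dice.items())})
```

Both relative frequencies should come out close to 0.5 and 1/6 respectively; the rejected outcomes cost extra flips (on average 4 flips per die roll) but keep the result unbiased.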
Simulating Data: Linear transformation
• Simulation is done by using a random number that is mapped to the variable of interest, e.g., Excel's RAND() function returns a uniform random number in [0, 1)
• Example1 → If the height h of waves lies between 0 and 5 m, a random number u ∈ [0,1] can be used to generate h
• $h_{min} = 0$ and $h_{max} = 5$
• $h_{random} = u\,(h_{max} - h_{min}) + h_{min}$ (this is a linear transformation)
• If u = 0.13, 0.69, 0.10, then h = 0.65, 3.45, 0.50
• The generated random value depends on the distribution from which it is drawn. If no distribution is specified, the random variable is generally drawn from a uniform distribution.
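The linear transformation in Example1 can be sketched in a few lines, assuming Python's `random.random()` (uniform on [0, 1)) plays the role of Excel's RAND(). The names `H_MIN`, `H_MAX` and `wave_height` are illustrative.

```python
import random

H_MIN, H_MAX = 0.0, 5.0  # wave-height bounds in metres (Example1)


def wave_height(u: float) -> float:
    """Linear transformation of u in [0, 1] to a height in [H_MIN, H_MAX]."""
    return u * (H_MAX - H_MIN) + H_MIN


# Reproduce the worked example (rounded to cm to hide floating-point noise).
us = [0.13, 0.69, 0.10]
print([round(wave_height(u), 2) for u in us])  # [0.65, 3.45, 0.5]

# Fresh simulated heights from a seeded generator.
rng = random.Random(42)
simulated = [wave_height(rng.random()) for _ in range(5)]
print([round(h, 2) for h in simulated])
```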
Simulating Data: Non-linear transformation
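• Example2 → The stress at the extreme fibers of a structural steel beam of a bridge is given by
$$s = \frac{Mc}{I} \le f_y,$$
where M is the bending moment, c is the distance from the neutral axis to the extreme fiber, I is the moment of inertia of the section, and f_y is the yield stress of the steel
• A uniform X ∈ [0,1] can also be mapped to the face Y of a fair die with a step function:
$$Y = \begin{cases} 1 & \text{if } X \le 1/6 \\ 2 & \text{if } 1/6 < X \le 2/6 \\ 3 & \text{if } 2/6 < X \le 3/6 \\ 4 & \text{if } 3/6 < X \le 4/6 \\ 5 & \text{if } 4/6 < X \le 5/6 \\ 6 & \text{if } 5/6 < X \le 1 \end{cases}$$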
• The transformation can also be depicted as a graph using the cumulative probability
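A minimal sketch of the step-function (non-linear) transformation from a uniform X to the face of a fair die, assuming Python's standard `bisect` module for the interval lookup. `CUTS` and `die_face` are hypothetical names; the cut points are the cumulative probabilities 1/6, 2/6, ..., 5/6.

```python
import random
from bisect import bisect_left
from collections import Counter

# Upper bounds of the first five intervals: 1/6, 2/6, ..., 5/6.
CUTS = [k / 6 for k in range(1, 6)]


def die_face(x: float) -> int:
    """Step-function map of x in [0, 1] to a die face 1..6.

    bisect_left counts the cut points strictly below x, so
    x <= 1/6 -> 1, 1/6 < x <= 2/6 -> 2, ..., 5/6 < x <= 1 -> 6.
    """
    return bisect_left(CUTS, x) + 1


print([die_face(x) for x in (0.02, 0.25, 0.5, 0.94)])  # [1, 2, 3, 6]

# Each face should appear with relative frequency close to 1/6.
rng = random.Random(7)
n = 60_000
counts = Counter(die_face(rng.random()) for _ in range(n))
print({face: round(c / n, 3) for face, c in sorted(counts.items())})
```

Using `bisect_left` keeps the right end of each interval closed, matching the "≤" in the definition.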
Simulating a Dice
• X is sampled from a continuous uniform random generator, X ∈ [0,1]
• For a fair six-faced die with faces labelled A through F, could we use the following transformation?
$$Y = \begin{cases} A & \text{if } X \le 1/6 \\ B & \text{if } 1/6 < X \le 2/6 \\ C & \text{if } 2/6 < X \le 3/6 \\ D & \text{if } 3/6 < X \le 4/6 \\ E & \text{if } 4/6 < X \le 5/6 \\ F & \text{if } 5/6 < X \le 1 \end{cases}$$
Simulating a Dice contd.
• X is sampled from a continuous uniform random generator, X ∈ [0,1]
• For the fair die, could we use the following transformation?
$$Y = \begin{cases} 1 & \text{if } X \le 1/6 \\ 2 & \text{if } 1/6 < X \le 2/6 \\ 6 & \text{if } 2/6 < X \le 3/6 \\ 5 & \text{if } 3/6 < X \le 4/6 \\ 4 & \text{if } 4/6 < X \le 5/6 \\ 3 & \text{if } 5/6 < X \le 1 \end{cases}$$
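A sketch suggesting that the answer to both questions is yes: every face is still tied to exactly one interval of width 1/6, so each keeps probability 1/6 no matter which label (letters, or the permuted order 1, 2, 6, 5, 4, 3) is attached to it. `map_uniform` and `CUTS` are hypothetical names, and the seed is arbitrary.

```python
import random
from bisect import bisect_left
from collections import Counter

CUTS = [k / 6 for k in range(1, 6)]  # 1/6, 2/6, ..., 5/6


def map_uniform(x: float, labels):
    """Return labels[i] when x lies in the i-th interval of width 1/6 (right-closed)."""
    return labels[bisect_left(CUTS, x)]


rng = random.Random(613)
draws = [rng.random() for _ in range(60_000)]

# Same intervals, different labels: letters, and the permuted faces.
letters = Counter(map_uniform(x, "ABCDEF") for x in draws)
permuted = Counter(map_uniform(x, [1, 2, 6, 5, 4, 3]) for x in draws)

for name, counts in (("A-F", letters), ("1,2,6,5,4,3", permuted)):
    print(name, {k: round(v / len(draws), 3) for k, v in sorted(counts.items())})
```

What matters for a fair die is equal interval widths, not the order in which faces are assigned to them.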
Prob-1 (Analytical Solution)
• Here is the transformation function
• $f(X) = \begin{cases} 0 & \text{if } X \le 0.4 \\ 1 & \text{if } 0.4 < X \le 0.7 \\ 2 & \text{if } 0.7 < X \le 0.9 \\ 3 & \text{if } 0.9 < X \le 1.0 \end{cases}$
• This function can be used to compute the number of fatal accidents corresponding to the generated random numbers (0.37, 0.82, 0.64, 0.25, 0.02, 0.94)
• Is this a linear or non-linear transformation?

X       f(X)
0.37    0
0.82    2
0.64    1
0.25    0
0.02    0
0.94    3
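A minimal sketch of the Prob-1 transformation, assuming Python's `bisect` module for the lookup (`CUTS` and `fatal_accidents` are hypothetical names). Because the output jumps at 0.4, 0.7 and 0.9 instead of following a form a·X + b, it is a step function, i.e. a non-linear transformation.

```python
from bisect import bisect_left

# Upper bounds of the intervals for 0, 1 and 2 fatal accidents; X > 0.9 gives 3.
CUTS = [0.4, 0.7, 0.9]


def fatal_accidents(x: float) -> int:
    """Step function of Prob-1: counts the cut points strictly below x."""
    return bisect_left(CUTS, x)


xs = [0.37, 0.82, 0.64, 0.25, 0.02, 0.94]
print([fatal_accidents(x) for x in xs])  # [0, 2, 1, 0, 0, 3]
```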
Prob-2
• The probabilities of the largest magnitude of an earthquake in any decade are
as follows:
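Magnitude                3 to 4   4 to 5   5 to 6   6 to 7   7 to 8   8 to 9
Probability              0.78     0.13     0.04     0.03     0.01     0.01
Cumulative probability   0.78     0.91     0.95     0.98     0.99     1.00
• For E = {0.27, 0.62, 0.13, 0.49, 0.96, 0.06, 0.84}, each E is located among the cumulative probabilities and M is taken as the midpoint of the corresponding magnitude class, giving M = {3.5, 3.5, 3.5, 3.5, 6.5, 3.5, 4.5}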
• You can check this graphically using the following bar chart
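The analytical solution can be sketched as an inverse-CDF lookup: accumulate the class probabilities, find the first cumulative value that is at least E, and return that class midpoint. This assumes Python's `bisect` and `itertools.accumulate`; `LOWER`, `PROB`, `CUM` and `magnitude` are hypothetical names.

```python
from bisect import bisect_left
from itertools import accumulate

LOWER = [3, 4, 5, 6, 7, 8]                   # lower bound of each magnitude class
PROB = [0.78, 0.13, 0.04, 0.03, 0.01, 0.01]  # probability of each class
CUM = list(accumulate(PROB))                 # about 0.78, 0.91, 0.95, 0.98, 0.99, 1.00


def magnitude(e: float) -> float:
    """Inverse-CDF lookup: midpoint of the first class whose cumulative probability is >= e."""
    i = min(bisect_left(CUM, e), len(LOWER) - 1)  # guard against round-off in CUM[-1]
    return LOWER[i] + 0.5


E = [0.27, 0.62, 0.13, 0.49, 0.96, 0.06, 0.84]
print([magnitude(e) for e in E])  # [3.5, 3.5, 3.5, 3.5, 6.5, 3.5, 4.5]
```

Values of E that land exactly on a cumulative boundary go to the lower class here, the same "≤" convention used in Prob-1.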
Prob-2 (Graphical Solution)
• For E = {0.27, 0.62, 0.13, 0.49, 0.96, 0.06, 0.84} the corresponding values of M
are M={ 3.5, 3.5, 3.5, 3.5, 6.5, 3.5, 4.5}
Pie Chart
• Disadvantages
• No exact numerical data
• Hard to compare 2 data sets
• "Other" category can be a problem
• Total unknown unless specified
• Best for 2-3 categories
• Use only with discrete data
Bar/Column Chart