Lectures - ProbaStat For Engineers
• Test 1
30% • Test 2
• Final Exam
50%
Pre-Course Assessment
https://www.survio.com/survey/d/N6A1G8A9E5G1H5W1J
Introduction
Statistics
• Modern mathematical statistics has various
engineering applications.
Probability Theory
QUICK REVIEW OF DESCRIPTIVE STATISTICS
As an “Engineer-to-be”
Suppose that you are working on designing a new type of
durable material for building bridges.
You collect data from different tests, and now, your challenge
is to understand what this data is telling you about the
material's reliability and durability.
To ensure the building remains stable and safe under different conditions,
you need to analyze the soil properties at the construction site.
Types of Data:
Quantitative data: Data that represents numerical values (e.g., the load-bearing capacity (in tons) of bridges, the number of students in a class, height, temperature).
1.1. Key Terms and Definitions
Sample: A subset of the population chosen for analysis.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$$
Geometric Mean
For a set of positive numbers 𝑥1 , 𝑥2 , … , 𝑥𝑛, the geometric mean is
the 𝑛𝑡ℎ root of the product of these numbers. This average is
useful for datasets involving rates of growth, ratios, or
percentages, especially when values vary exponentially.
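$$\bar{x}_G = \sqrt[n]{\prod_{i=1}^{n} x_i} = \sqrt[n]{\prod_{i=1}^{k} x_i^{R_i}}$$
In the second form, the product runs over the $k$ distinct values, $R_i$ is the frequency of $x_i$, and $n = \sum_{i=1}^{k} R_i$ is the total number of observations.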
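To illustrate why the geometric mean suits growth rates, the sketch below averages three yearly growth factors. The factors are hypothetical values chosen only for this illustration, and `geometric_mean` is a helper defined here, not something taken from the lecture.

```python
import math

def geometric_mean(values):
    """n-th root of the product of positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive numbers")
    # Averaging the logs avoids overflow when the raw product is very large.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical yearly growth factors: +10 %, +50 %, -20 %
growth_factors = [1.10, 1.50, 0.80]
average_factor = geometric_mean(growth_factors)

# Applying the average factor once per year reproduces the overall growth.
overall_growth = math.prod(growth_factors)
print(average_factor, average_factor ** 3, overall_growth)
```

The arithmetic mean of the same factors would overstate the growth, because growth compounds multiplicatively rather than additively.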
Harmonic Mean
The harmonic mean of a set of positive real numbers 𝑥1 , 𝑥2 , … , 𝑥𝑛 is defined as:
$$H(x_1, x_2, \ldots, x_n) = \bar{x}_H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
This type of average is useful when you want to find the average
of rates or ratios, particularly when the data involves quantities
like speeds or rates of change. It gives more weight to smaller
values in the dataset.
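A classic case for averaging rates is a round trip driven at two different speeds over the same distance. The sketch below uses hypothetical speeds (60 km/h and 40 km/h) and a helper `harmonic_mean` defined here for illustration.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals of positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty list of positive numbers")
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical trip: the same distance at 60 km/h out and 40 km/h back.
speeds = [60.0, 40.0]
average_speed = harmonic_mean(speeds)        # true average speed over the trip
naive_average = sum(speeds) / len(speeds)    # arithmetic mean, for comparison
print(average_speed, naive_average)
```

The harmonic mean comes out at about 48 km/h, below the arithmetic mean of 50 km/h, because more time is spent travelling at the slower speed.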
Example
Consider the height measurements of 25 students given in the frequency
table below. Calculate the arithmetic mean, harmonic mean, and
geometric mean, and compare the results. Also find the
median and mode.
i     1    2    3    4    5    6    7    8    9    10   11   12
x_i   152  154  155  159  160  161  162  167  170  171  172  173
R_i   1    1    1    2    6    2    1    1    4    1    4    1
Solution
i      x_i   R_i   R_i x_i   x_i^{R_i}      R_i / x_i
1      152   1     152       152            0.0066
2      154   1     154       154            0.0065
3      155   1     155       155            0.0065
4      159   2     318       25281          0.0126
5      160   6     960       1.67772E+13    0.0375
6      161   2     322       25921          0.0124
7      162   1     162       162            0.0062
8      167   1     167       167            0.0060
9      170   4     680       835210000      0.0235
10     171   1     171       171            0.0058
11     172   4     688       875213056      0.0233
12     173   1     173       173            0.0058
Total        25    4102      2.3337E+55     0.1526
(The last row gives the sum of each column, except for x_i^{R_i}, where it gives the product.)
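• Arithmetic mean: $\bar{x} = \frac{\sum_{i=1}^{12} R_i x_i}{25} = \frac{4102}{25} = 164.08$
• Geometric mean: $\bar{x}_G = \sqrt[25]{\prod_{i=1}^{12} x_i^{R_i}} = \sqrt[25]{2.3337 \times 10^{55}} = 163.95$
• Harmonic mean: $\bar{x}_H = \frac{n}{\sum_{i=1}^{12} \frac{R_i}{x_i}} = \frac{25}{0.1526} = 163.8270$
• Obviously, $\bar{x}_H < \bar{x}_G < \bar{x}$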
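The three means can be cross-checked numerically. The following is a minimal sketch that takes the values and frequencies from the frequency table of this example; the variable names are chosen here for illustration.

```python
import math

# Heights x_i and frequencies R_i from the example's frequency table
heights = [152, 154, 155, 159, 160, 161, 162, 167, 170, 171, 172, 173]
freqs = [1, 1, 1, 2, 6, 2, 1, 1, 4, 1, 4, 1]

n = sum(freqs)  # total number of observations

# Weighted arithmetic mean: sum(R_i * x_i) / n
arithmetic = sum(r * x for x, r in zip(heights, freqs)) / n

# Geometric mean via logs: exp(sum(R_i * ln x_i) / n)
geometric = math.exp(sum(r * math.log(x) for x, r in zip(heights, freqs)) / n)

# Harmonic mean: n / sum(R_i / x_i)
harmonic = n / sum(r / x for x, r in zip(heights, freqs))

print(n, round(arithmetic, 2), round(geometric, 2), round(harmonic, 2))
```

Working with logarithms for the geometric mean sidesteps the very large intermediate product that appears in the table.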
Solution
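Quartiles
• Let $n$ be the number of observations in the given data set, sorted in increasing order. The first quartile $Q_1$ is given by:
$$Q_1 = \frac{x_{\lfloor n/4 \rfloor} + x_{\lfloor n/4 \rfloor + 1}}{2},$$
where $\lfloor n/4 \rfloor$ is the integer part of $n/4$.
• The second quartile $Q_2$ is equal to the median, i.e., $Q_2 = Me$.
• The third quartile $Q_3$ is given by:
$$Q_3 = \frac{x_{\lfloor 3n/4 \rfloor} + x_{\lfloor 3n/4 \rfloor + 1}}{2},$$
where $\lfloor 3n/4 \rfloor$ is the integer part of $3n/4$.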
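The quartile rule above can be sketched in a few lines. The helper `quartiles` and the sample data are hypothetical; the median rule used for $Q_2$ (middle value, or the average of the two middle values) is the usual convention and is an assumption here, since the text only states $Q_2 = Me$.

```python
def quartiles(data):
    """Return (Q1, Q2, Q3) using the integer-part rule for Q1 and Q3."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("this rule needs at least 4 observations")

    def average_pair(k):
        # k is a 1-based position: average x_k and x_(k+1)
        return (x[k - 1] + x[k]) / 2

    q1 = average_pair(n // 4)        # integer part of n/4
    q3 = average_pair(3 * n // 4)    # integer part of 3n/4

    # Assumed median convention for Q2
    mid = n // 2
    q2 = x[mid] if n % 2 else (x[mid - 1] + x[mid]) / 2
    return q1, q2, q3

# Hypothetical sample of 8 observations
sample = [12, 4, 9, 2, 7, 10, 4, 5]
q1, q2, q3 = quartiles(sample)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Other textbooks and software use different quartile conventions (for example, interpolation between positions), so results may differ slightly from this rule.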
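Example
i     1    2    3    4    5    6    7    8    9    10   11   12
x_i   152  154  155  159  160  161  162  167  170  171  172  173
R_i   1    1    1    2    6    2    1    1    4    1    4    1
$$Q_1 = \frac{x_{\lfloor n/4 \rfloor} + x_{\lfloor n/4 \rfloor + 1}}{2} = \frac{x_{\lfloor 25/4 \rfloor} + x_{\lfloor 25/4 \rfloor + 1}}{2} = \frac{x_6 + x_7}{2} = \frac{160 + 160}{2} = 160$$
$$Q_2 = Me = 161$$
$$Q_3 = \frac{x_{\lfloor 3n/4 \rfloor} + x_{\lfloor 3n/4 \rfloor + 1}}{2} = \frac{x_{\lfloor 75/4 \rfloor} + x_{\lfloor 75/4 \rfloor + 1}}{2} = \frac{x_{18} + x_{19}}{2} = \frac{170 + 170}{2} = 170$$
The interquartile range is
$$\Delta = Q_3 - Q_1 = 170 - 160 = 10$$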
Measures of Dispersion
The degree to which numerical data tend to spread about an
average value is called the dispersion, or variation, of the data.
The most common measures of the dispersion are the range,
standard deviation, variance and coefficient of variation.
a) Range
The range is the largest value in a data set minus the smallest
value, i.e.,
𝑅 = 𝑥𝑚𝑎𝑥 − 𝑥𝑚𝑖𝑛 = 𝑥(𝑛) − 𝑥(1)
b) Standard Deviation (and Variance)
$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{n}$$
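• Variance: $\sigma^2 = \frac{\sum_{i=1}^{12} f_i (x_i - 164.08)^2}{25} = \frac{1031.84}{25} = 41.27$
• Standard deviation: $s = \sqrt{41.27} = 6.42$
• Coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100 = \frac{6.42}{164.08} \times 100 = 3.91\%$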
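These dispersion measures can be reproduced with a short script. This is a minimal sketch using the height data from the earlier frequency table and the divide-by-$n$ (population) form of the variance shown above; the variable names are chosen here for illustration.

```python
import math

# Heights x_i and frequencies f_i from the earlier frequency table
heights = [152, 154, 155, 159, 160, 161, 162, 167, 170, 171, 172, 173]
freqs = [1, 1, 1, 2, 6, 2, 1, 1, 4, 1, 4, 1]

n = sum(freqs)
mean = sum(f * x for x, f in zip(heights, freqs)) / n

# Range: largest value minus smallest value
data_range = max(heights) - min(heights)

# Variance with divisor n, as in the formula above
variance = sum(f * (x - mean) ** 2 for x, f in zip(heights, freqs)) / n
std_dev = math.sqrt(variance)

# Coefficient of variation, in percent
cv = std_dev / mean * 100

print(data_range, round(variance, 2), round(std_dev, 2), round(cv, 2))
```

The last digit of the coefficient of variation can differ slightly from a hand calculation, depending on whether the standard deviation is rounded to 6.42 before dividing.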
1.3. Data Representation
In descriptive statistics, data representation refers
to methods of summarizing and visualizing data
to understand its distribution, patterns, and
relationships.
Effective data representation helps in simplifying
complex data sets, making them easier to interpret
and analyze.
Tabular Representation of Data
a) Frequency Distribution Table
50,60,75,80,65,70,85,90,95,100,70,60,85,90,75,60,55,85,90,100
The data can be represented as a frequency distribution as follows:
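One way to tabulate these marks programmatically is to count how often each value occurs. The sketch below is only an illustration using Python's `collections.Counter`; the variable names are chosen here.

```python
from collections import Counter

# Marks listed above
marks = [50, 60, 75, 80, 65, 70, 85, 90, 95, 100,
         70, 60, 85, 90, 75, 60, 55, 85, 90, 100]

frequency = Counter(marks)          # value -> number of occurrences
table = sorted(frequency.items())   # rows ordered by mark

print(f"{'Mark':>5} {'Freq':>5}")
for value, count in table:
    print(f"{value:>5} {count:>5}")
print(f"{'Total':>5} {sum(frequency.values()):>5}")
```

Sorting the rows by mark makes it easy to read off the smallest and largest values and to build cumulative frequencies later.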
[Figure: bar chart of the number of students (y-axis, "# of Students") per fruit: Banana, Orange, Apple, Grapes, Avocado]
Histogram (Quantitative Data)
• A histogram is the graphical representation of data where data
is grouped into continuous number ranges and each range
corresponds to a vertical bar.
The number ranges depend upon the data that is being used.
Let’s use our previous example about the marks of MEE students:
50,60,75,80,65,70,85,90,95,100,70,60,85,90,75,60,55,85,90,100
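The counts behind a histogram of these marks can be computed by assigning each mark to a bin. The sketch below assumes bins defined by the upper limits 55, 64, 73, 82, 91, 100 (the labels on the histogram's axis) with each upper limit included in its bin; that inclusive convention is an assumption, not something stated in the lecture.

```python
from bisect import bisect_left

marks = [50, 60, 75, 80, 65, 70, 85, 90, 95, 100,
         70, 60, 85, 90, 75, 60, 55, 85, 90, 100]

# Assumed bin upper limits (inclusive); one extra slot collects anything larger ("More")
upper_limits = [55, 64, 73, 82, 91, 100]
counts = [0] * (len(upper_limits) + 1)

for mark in marks:
    # bisect_left returns the index of the first limit that is >= mark
    counts[bisect_left(upper_limits, mark)] += 1

labels = [f"<= {limit}" for limit in upper_limits] + ["More"]
for label, count in zip(labels, counts):
    print(f"{label:>7}: {'#' * count} ({count})")
```

Changing the number or width of the bins changes the shape of the histogram, so the bin choice should be reported alongside the plot.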
Histogram
[Figure: histogram of the students' marks; x-axis "Bin" (55, 64, 73, 82, 91, 100, More), y-axis "Frequency"]
Frequency Polygon
[Figure: frequency polygon; x-axis "Students' Marks" (150 to 175), y-axis "Frequency"]
Cumulative Frequency Polygon
A cumulative frequency polygon (also called ogive) is used to represent
cumulative frequencies graphically. It shows the cumulative frequency
of the data at each point and is particularly useful for understanding
percentiles, medians, and other distribution characteristics.
To construct a cumulative frequency polygon (for ungrouped data), the
following steps are followed:
Step 1: Organize the data in increasing order and create a frequency
distribution table that records the frequency of each value
Step 4: Draw the ogive: Connect the points with straight lines.
Optionally, the curve can start from zero on the y-axis, depending
on the dataset.
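The points of an ogive can be sketched as follows. The intermediate steps are not shown above, so this sketch assumes they consist of computing running totals of the frequencies and pairing each value with its cumulative frequency. The data are the heights from the earlier frequency table.

```python
from itertools import accumulate

# Values (already in increasing order) and their frequencies
heights = [152, 154, 155, 159, 160, 161, 162, 167, 170, 171, 172, 173]
freqs = [1, 1, 1, 2, 6, 2, 1, 1, 4, 1, 4, 1]

# Assumed intermediate step: running totals of the frequencies
cumulative = list(accumulate(freqs))

# Assumed intermediate step: points (value, cumulative frequency) to plot
ogive_points = list(zip(heights, cumulative))

for value, total in ogive_points:
    print(value, total)
```

Joining these points with straight lines gives the ogive; the last cumulative frequency always equals the total number of observations.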
Example
The following graph is the ogive for the example in the previous slides (the step-by-step construction will be done during the lecture).
[Figure: ogive; x-axis from 150 to 175, cumulative frequency on the y-axis from 0 to 30]
Pie-Chart
$$\text{Angle of Sector } S = \frac{\text{Frequency of } S \times 360^{\circ}}{\text{Total Frequency}}$$
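The sector-angle formula can be applied to every category at once. In the sketch below the fruit labels match the lecture's fruit example, but the counts are hypothetical because the original frequencies are not given in the text; `sector_angles` is a helper defined here.

```python
def sector_angles(frequencies):
    """Map each category to its pie-chart angle in degrees."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("total frequency must be positive")
    return {label: count * 360 / total for label, count in frequencies.items()}

# Hypothetical counts, for illustration only
favorite_fruit = {"Banana": 4, "Orange": 3, "Apple": 6, "Grapes": 2, "Avocado": 5}
angles = sector_angles(favorite_fruit)

for label, angle in angles.items():
    print(f"{label:>8}: {angle:.1f} degrees")
```

A quick sanity check is that the angles of all sectors add up to 360 degrees.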
By using the example about the students' favorite fruit, the pie chart is drawn as follows:
[Figure: pie chart with sectors Banana, Orange, Apple, Grapes, Avocado]
Grouped Data
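The mean of grouped data is given by:
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
Where,
• $f_i$ is the frequency of the $i$-th class
• $x_i$ is the midpoint of the $i$-th class, $x_i = \frac{\text{Lower class boundary} + \text{Upper class boundary}}{2}$
• $\sum_{i=1}^{n} f_i$ is the total frequency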
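As an illustration of the midpoint approach, the sketch below computes the mean of a grouped data set. It uses the class table from the example later in this section (classes 40-49 through 110-119); `grouped_mean` is a helper name chosen here.

```python
# (lower limit, upper limit, frequency) for each class
classes = [(40, 49, 5), (50, 59, 7), (60, 69, 12), (70, 79, 15),
           (80, 89, 10), (90, 99, 6), (100, 109, 3), (110, 119, 2)]

def grouped_mean(classes):
    """Mean of grouped data: sum(f_i * midpoint_i) / sum(f_i)."""
    total = sum(f for _, _, f in classes)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    weighted = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return weighted / total

mean = grouped_mean(classes)
print(round(mean, 2))
```

The midpoint is the same whether it is computed from the class limits (40 and 49) or from the class boundaries (39.5 and 49.5), so either can be used here.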
Median for Grouped Data
The median is the value that separates the data into two equal halves. For
grouped data, it can be found using the formula:
$$\text{Median}\,(Me) = L + \frac{\frac{N}{2} - F}{f} \times h$$
Where,
• 𝐿 is the lower class boundary of the median class
• 𝑁 is the total frequency
• 𝐹 is the cumulative frequency before the median class
• f is the frequency of the median class
• h is the class width
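The median formula can be sketched as a small function that first locates the median class. The example uses the class table given later in this section and assumes class boundaries of 39.5-49.5, 49.5-59.5, and so on (a half-unit adjustment of the stated limits, which is an assumption); `grouped_median` is a hypothetical helper.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L + (N/2 - F) / f * h."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative_before = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        # The median class is the first one whose cumulative frequency reaches N/2
        if cumulative_before + f >= half:
            return lower + (half - cumulative_before) / f * (upper - lower)
        cumulative_before += f
    raise ValueError("median class not found")

# Assumed class boundaries for classes 40-49, 50-59, ..., 110-119
boundaries = [(39.5 + 10 * k, 49.5 + 10 * k) for k in range(8)]
freqs = [5, 7, 12, 15, 10, 6, 3, 2]

median = grouped_median(boundaries, freqs)
print(median)
```

Here $N/2 = 30$ falls in the 70-79 class, with $F = 24$, $f = 15$ and $h = 10$.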
Mode for Grouped Data
The mode is the value that appears most frequently. For grouped data,
the mode is calculated using the following formula:
$$\text{Mode}\,(Mo) = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times h$$
Where,
• L is the lower class boundary of the modal class
• 𝑓1 is the frequency of the modal class
• 𝑓0 is the frequency of the class before the modal class
• 𝑓2 is the frequency of the class after the modal class
• h is the class width
Variance and Standard Deviation for Grouped Data
The variance ($\sigma^2$) is given by:
$$\sigma^2 = \frac{\sum f_i \times (x_m - \bar{x})^2}{\sum f}$$
Where,
• $x_m$ is the midpoint of each class
• $\bar{x}$ is the mean
• $f_i$ is the frequency
Standard deviation: $\sigma = \sqrt{\sigma^2}$
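The grouped variance and standard deviation can be computed directly from class midpoints. The sketch below uses the class table from the example later in this section; the midpoints 44.5, 54.5, and so on follow from the class limits, and `grouped_variance` is a helper name chosen here.

```python
import math

def grouped_variance(midpoints, freqs):
    """Variance of grouped data: sum(f_i * (x_m - mean)^2) / sum(f_i)."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / total

# Midpoints of classes 40-49, 50-59, ..., 110-119 and their frequencies
midpoints = [44.5 + 10 * k for k in range(8)]
freqs = [5, 7, 12, 15, 10, 6, 3, 2]

variance = grouped_variance(midpoints, freqs)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```

Because every observation in a class is replaced by the class midpoint, the result approximates the variance of the underlying raw data rather than reproducing it exactly.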
Example
Consider the following dataset:
Class Interval   40-49   50-59   60-69   70-79   80-89   90-99   100-109   110-119
Frequency        5       7       12      15      10      6       3         2
72,78,85,85,102,110,120,123,126,129,132,136,138,141,145,150,
153,155,160,165,172,175,180,182,185,188,191,193,195,198,202,
204,210,215,218,220,225,230,235,240,243,245,248,252,255,260,
264,270,275,280,285,290,295,300,305,310,315,320,325,330,335,
340,345,350,355,360,365,370,375,380,385,390,395,400,405,410,
415,420,425,430,435,440,445,450,455,460,465,470,475,480,485,
490,495,500,505,510,515,520,525,530.
Exercise (Assignment 1)
a) Arrange the data into classes