
Lectures - ProbaStat For Engineers

The document outlines a module on Probability & Statistics for Engineers at the University of Rwanda, covering topics such as descriptive statistics, probability theory, regression analysis, and quality control methods. It emphasizes the application of statistical methods in engineering contexts, including material testing and process control. Evaluation methods include assignments, quizzes, and exams, with a focus on understanding data through various statistical measures.


University of Rwanda

College of Science and Technology

Module: Probability & Statistics for Engineers

Facilitator: Dr. Rongin Uwitije


E-mail: [email protected]
Main Content
Quick Review of Descriptive Statistics

• Measures of central tendency, representation of data, frequency distributions, histograms, frequency polygons, cumulative frequency polygons, grouped and ungrouped data, dispersion, skewness and peakedness (kurtosis), ...

Elementary Probability Theory


• Definition of probability, conditional probability, interpretations and properties of probability, discrete random variables and probability distributions, mathematical expectation and variance, continuous random variables and probability distributions, hypothesis testing, experimental design, ...
Main Content (Cont’d)
Regression Analysis and Correlation

• Curve fitting by the least-squares method, Pearson’s coefficient of correlation, confidence intervals for the regression coefficients, auto- and cross-correlations, examples from engineering problems

Quality Control Methods

• Detecting process change; control charts: x̄-chart, R-chart, runs analysis, p-chart, c-chart; tolerance limits; acceptance sampling for defectives; application to engineering problems
Evaluation
• Assignments, HW and Quizzes: 20%
• Test 1 and Test 2: 30%
• Final Exam: 50%
Pre-Course Assessment

https://www.survio.com/survey/d/N6A1G8A9E5G1H5W1J
Introduction
Statistics
• Modern mathematical statistics has many engineering applications.
• Examples include materials testing, control of production processes, quality control of production outputs, performance testing of systems, robotics and automation in general, production planning and marketing analysis.
Introduction (Cont’d)

Probability Theory

• Provides models of probability distributions (theoretical models of the observable reality involving chance effects) to be tested by statistical methods.
• Supplies the mathematical foundation of these methods.
Unit 1: QUICK REVIEW OF DESCRIPTIVE STATISTICS
As an “Engineer-to-be”
• Suppose that you are working on designing a new type of durable material for building bridges.
• To ensure the material meets safety standards, you need to test its strength under various conditions.
• You collect data from different tests, and now, your challenge is to understand what this data is telling you about the material's reliability and durability.
• By using descriptive statistics, you are able to summarize and interpret your data effectively and provide insights that help make critical decisions about the material's performance.
As an “Engineer-to-be”
• Suppose that you are tasked with designing the foundation of a skyscraper.
• To ensure the building remains stable and safe under different conditions, you need to analyze the soil properties at the construction site.
• You collect data on soil density, moisture content, and load-bearing capacity from multiple locations around the site. Your challenge is to make sense of this data, summarize it, and determine if the soil can support the building.
• By applying descriptive statistics, you can summarize the data, identify trends, and provide insights that will guide critical decisions in the design of the foundation.
1.1. Key Terms and Definitions
• Population: The set of objects or entities that we are interested in.
Example: All the bridges in Rwanda.
• Data: A collection of observations gathered from the population or a sample.
• Types of data:
  • Qualitative data: data that describe categories or groups (e.g., types of cars, colors, gender).
  • Quantitative data: data that represent numerical values (e.g., the load-bearing capacity (in tons) of bridges, the number of students in a class, height, temperature).
1.1. Key Terms and Definitions
• Sample: A subset of the population chosen for analysis.
Example: 10 bridges selected from 100 bridges in Rwanda to be tested for structural integrity.
• Variable: Any characteristic that can take on different values.
Example: The compressive strength of the concrete used in constructing the bridges.
1.1. Key Terms and Definitions
• Descriptive Statistics: Methods used to summarize and describe the features of a dataset.
• Frequency Distribution: A table that shows how often each value (or range of values) occurs in a dataset.
Example: A table listing the number of samples with a tensile strength between 500-600 MPa, 600-700 MPa, and so on.
• Central Tendency: A measure that represents the center of the dataset.
• Mean (Average): The sum of all values divided by the number of values.
1.2. Measures of Central Tendency and Location
• Descriptive statistical measures have two functions: they provide a mental image of a data distribution, and they are an essential component of inferential statistics, the basis of both estimation and hypothesis testing.
• The clustering of the measurements near the centre of a distribution is called “central tendency”, and the statistical measures that describe aspects of the “centre” of a distribution are called “measures of central tendency”.
1.2. Measures of Central Tendency and Location
• Measures of location show where the characteristics of a distribution are located in relation to the measurement scale.
• $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the data set, respectively.
• $x_{\text{median}}$ (the median, $M_e$) is the middle value when the data are ordered from least to greatest. If the data set has an even number of values, the median is the average of the two middle values.
• The mode ($M_o$) is the most frequently occurring value in a set of data.
Arithmetic Mean
Let $x_1, x_2, \dots, x_n$ be the $n$ distinct observed values of a variable $X$ and $f_1, f_2, \dots, f_n$ their respective frequencies. Then the arithmetic mean of the measurements is given by:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$$
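Geometric Mean
For a set of positive numbers $x_1, x_2, \dots, x_n$, the geometric mean is the $n$th root of the product of these numbers. This average is useful for datasets involving rates of growth, ratios or percentages, especially when values vary exponentially.

$$\bar{x}_G = \sqrt[n]{\prod_{i=1}^{n} x_i}$$

When the distinct values $x_1, \dots, x_k$ occur with frequencies $R_1, \dots, R_k$ (so that $n = \sum_{i=1}^{k} R_i$), this becomes

$$\bar{x}_G = \sqrt[n]{\prod_{i=1}^{k} x_i^{R_i}}$$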
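Harmonic Mean
The harmonic mean of a set of positive real numbers $x_1, x_2, \dots, x_n$ is defined as:

$$H(x_1, x_2, \dots, x_n) = \bar{x}_H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

When distinct values $x_i$ occur with frequencies $R_i$, the denominator becomes $\sum_i R_i / x_i$, with $n = \sum_i R_i$.
This type of average is useful when you want to find the average of rates or ratios, particularly when the data involve quantities like speeds or rates of change. It gives more weight to smaller values in the dataset.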
Example
Consider the height measurements of 25 students given in the frequency table below, where $R_i$ is the number of students with height $x_i$. Calculate the arithmetic mean, harmonic mean and geometric mean, and compare the results. Also find the median and mode.

i      1    2    3    4    5    6    7    8    9    10   11   12
x_i    152  154  155  159  160  161  162  167  170  171  172  173
R_i    1    1    1    2    6    2    1    1    4    1    4    1
Solution
i      x_i   R_i   R_i·x_i   x_i^(R_i)     R_i/x_i
1      152   1     152       152           0.0066
2      154   1     154       154           0.0065
3      155   1     155       155           0.0065
4      159   2     318       25281         0.0126
5      160   6     960       1.67772E+13   0.0375
6      161   2     322       25921         0.0124
7      162   1     162       162           0.0062
8      167   1     167       167           0.0060
9      170   4     680       835210000     0.0235
10     171   1     171       171           0.0058
11     172   4     688       875213056     0.0233
12     173   1     173       173           0.0058
Total        25    4102      2.3337E+55    0.1526
(Each total is the sum of its column, except for x_i^(R_i), where it is the product.)
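Solution
• Arithmetic mean: $\bar{x} = \frac{\sum_{i=1}^{12} R_i x_i}{\sum_{i=1}^{12} R_i} = \frac{4102}{25} = 164.08$
• Geometric mean: $\bar{x}_G = \sqrt[25]{\prod_{i=1}^{12} x_i^{R_i}} = \sqrt[25]{2.3337 \times 10^{55}} = 163.95$
• Harmonic mean: $\bar{x}_H = \frac{n}{\sum_{i=1}^{12} R_i / x_i} = \frac{25}{0.1526} = 163.827$
• As expected for positive data that are not all equal, $\bar{x}_H < \bar{x}_G < \bar{x}$.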
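The three averages above can be checked with a short Python sketch (Python is used here only for illustration; the module does not prescribe a language). The function name `weighted_means` is hypothetical. It computes the geometric mean as exp(Σ R_i ln x_i / n), which equals the 25th root of the product but keeps the intermediate numbers in a comfortable floating-point range.

```python
import math

# Heights example: distinct values x_i and their frequencies R_i.
xs = [152, 154, 155, 159, 160, 161, 162, 167, 170, 171, 172, 173]
rs = [1, 1, 1, 2, 6, 2, 1, 1, 4, 1, 4, 1]


def weighted_means(values, freqs):
    """Return (arithmetic, geometric, harmonic) means of a frequency table."""
    n = sum(freqs)  # total number of observations (25 here)
    arithmetic = sum(r * x for x, r in zip(values, freqs)) / n
    # n-th root of prod(x_i ** R_i), computed as exp(sum(R_i * ln x_i) / n).
    geometric = math.exp(sum(r * math.log(x) for x, r in zip(values, freqs)) / n)
    harmonic = n / sum(r / x for x, r in zip(values, freqs))
    return arithmetic, geometric, harmonic


am, gm, hm = weighted_means(xs, rs)
print(f"arithmetic = {am:.2f}, geometric = {gm:.2f}, harmonic = {hm:.2f}")
```

Run as is, it should print 164.08, 163.95 and 163.83, matching the hand calculation to two decimals.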
Solution
• Let $n$ be the number of observations in the data set, arranged in ascending order.
• If $n$ is even, the median is calculated as:

$$M_e = \frac{x_{n/2} + x_{n/2+1}}{2}$$

• If $n$ is odd, the median is calculated as:

$$M_e = x_{(n+1)/2}$$

• For our example, $n = 25$ (odd), so $M_e = x_{(25+1)/2} = x_{13} = 161$.
• $M_o = 160$ (it occurs 6 times).

Quartiles
The median is one of many possible quantiles that can be calculated from a data set arranged in ascending order. There are three quartiles: the first quartile $Q_1$, the second quartile $Q_2$ and the third quartile $Q_3$. They divide the ordered array into four equal parts.
The first quartile $Q_1$ is given by:

$$Q_1 = \frac{x_{\lfloor n/4 \rfloor} + x_{\lfloor n/4 \rfloor + 1}}{2}, \quad \text{where } \lfloor n/4 \rfloor \text{ is the integer part of } n/4.$$
Quartiles
• The second quartile $Q_2$ is equal to the median, i.e., $Q_2 = M_e$.
• The third quartile $Q_3$ is given by:

$$Q_3 = \frac{x_{\lfloor 3n/4 \rfloor} + x_{\lfloor 3n/4 \rfloor + 1}}{2}, \quad \text{where } \lfloor 3n/4 \rfloor \text{ is the integer part of } 3n/4.$$

• The interquartile range $\Delta$ is the difference between the third and first quartiles, i.e., $\Delta = Q_3 - Q_1$.
Example
Consider the height measurements of 25 students given in the frequency table below.

i      1    2    3    4    5    6    7    8    9    10   11   12
x_i    152  154  155  159  160  161  162  167  170  171  172  173
R_i    1    1    1    2    6    2    1    1    4    1    4    1

$$Q_1 = \frac{x_{\lfloor 25/4 \rfloor} + x_{\lfloor 25/4 \rfloor + 1}}{2} = \frac{x_6 + x_7}{2} = \frac{160 + 160}{2} = 160$$
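Example

$Q_2 = M_e = 161$

$$Q_3 = \frac{x_{\lfloor 75/4 \rfloor} + x_{\lfloor 75/4 \rfloor + 1}}{2} = \frac{x_{18} + x_{19}}{2} = \frac{170 + 170}{2} = 170$$

(In the ordered data, positions 16 to 19 hold 170 and position 20 holds 171.)
The interquartile range is
$\Delta = Q_3 - Q_1 = 170 - 160 = 10$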
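The quartile rule on these slides can be sketched in Python as follows, assuming n ≥ 4. The function `quartile` is a hypothetical helper that follows the slide's integer-part convention literally (for n = 25, ⌊3n/4⌋ = 18, so Q3 averages x_18 and x_19); statistical software often uses other quartile conventions, which can give slightly different values.

```python
# Heights example: distinct values x_i and frequencies R_i.
xs = [152, 154, 155, 159, 160, 161, 162, 167, 170, 171, 172, 173]
rs = [1, 1, 1, 2, 6, 2, 1, 1, 4, 1, 4, 1]


def expand(values, freqs):
    """Ordered list x_1 <= ... <= x_n rebuilt from the frequency table."""
    data = []
    for x, r in sorted(zip(values, freqs)):
        data.extend([x] * r)
    return data


def quartile(sorted_data, m):
    """Q_m for m = 1 or 3: k = integer part of m*n/4, Q_m = (x_k + x_{k+1}) / 2.

    Positions k and k+1 are 1-based as on the slides; assumes n >= 4.
    """
    if m not in (1, 3):
        raise ValueError("use m = 1 or m = 3; Q2 is the median")
    n = len(sorted_data)
    k = (m * n) // 4  # integer part of m*n/4, using exact integer arithmetic
    return (sorted_data[k - 1] + sorted_data[k]) / 2  # x_k and x_{k+1}


data = expand(xs, rs)
q1, q3 = quartile(data, 1), quartile(data, 3)
print(q1, q3, q3 - q1)  # 160.0 170.0 10.0
```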
Measures of Dispersion
The degree to which numerical data tend to spread about an average value is called the dispersion, or variation, of the data. The most common measures of dispersion are the range, standard deviation, variance and coefficient of variation.
a) Range
The range is the largest value in a data set minus the smallest value, i.e.,

$$R = x_{\max} - x_{\min} = x_{(n)} - x_{(1)}$$

where $x_{(1)}$ and $x_{(n)}$ denote the smallest and largest observations.
Measures of Dispersion
b) Standard Deviation (and Variance)
The standard deviation and variance measure the spread of data points around the mean.
• Variance ($\sigma^2$) for a population:

$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{n},$$

where $x_i$ is each individual value, $f_i$ its frequency, $\mu$ is the mean of the dataset and $n = \sum f_i$ is the total number of values.
Measures of Dispersion
• Variance ($s^2$) for a sample:

$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1},$$

where $x_i$ is each individual value, $f_i$ its frequency, $\bar{x}$ is the sample mean and $n$ is the sample size.
• The standard deviation ($\sigma$ for a population and $s$ for a sample) is the square root of the variance.
c) The Coefficient of Variation (CV) is defined as:

$$CV = \frac{s}{\bar{x}} \times 100\%,$$

where $s$ is the standard deviation and $\bar{x}$ is the mean.
Measures of Dispersion
Using the example from the previous slides:
• Variance: $\sigma^2 = \frac{\sum_{i=1}^{12} f_i (x_i - 164.08)^2}{25} = \frac{1031.84}{25} = 41.27$
• Standard deviation: $\sigma = \sqrt{41.27} \approx 6.42$
• $CV = \frac{\sigma}{\bar{x}} \times 100 = \frac{6.42}{164.08} \times 100 = 3.91\%$
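The dispersion measures for the heights example can be reproduced with the short Python sketch below. It assumes the same frequency-table representation as before (variable names are illustrative) and computes both divisors, n for the population variance and n − 1 for the sample variance.

```python
import math

# Heights example: distinct values x_i and frequencies f_i (R_i on the earlier slides).
xs = [152, 154, 155, 159, 160, 161, 162, 167, 170, 171, 172, 173]
rs = [1, 1, 1, 2, 6, 2, 1, 1, 4, 1, 4, 1]

n = sum(rs)                                            # 25 observations
mean = sum(r * x for x, r in zip(xs, rs)) / n          # 164.08
ss = sum(r * (x - mean) ** 2 for x, r in zip(xs, rs))  # sum of f_i (x_i - mean)^2

data_range = max(xs) - min(xs)  # R = x_(n) - x_(1)
pop_var = ss / n                # sigma^2, divisor n
sample_var = ss / (n - 1)       # s^2, divisor n - 1
pop_sd = math.sqrt(pop_var)
cv = pop_sd / mean * 100        # coefficient of variation, in percent

print(f"R = {data_range}, sigma^2 = {pop_var:.2f}, s^2 = {sample_var:.2f}")
print(f"sigma = {pop_sd:.2f}, CV = {cv:.2f}%")
```

The CV prints as about 3.92% because σ is not rounded before dividing; rounding σ to 6.42 first gives 3.91%.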
1.3. Data Representation
• In descriptive statistics, data representation refers to methods of summarizing and visualizing data to understand its distribution, patterns, and relationships.
• Effective data representation helps in simplifying complex data sets, making them easier to interpret and analyze.
Tabular Representation of Data
a) Frequency Distribution Table
A frequency distribution table summarizes data by listing the categories (qualitative data) or ranges (quantitative data) and showing how often each occurs.
E.g. Consider the following exam scores of 20 engineering students in one of the modules at UR:
Tabular Representation of Data
The data can be represented as a frequency distribution as follows:

Score Range (Marks)   Frequency
50-59                 2
60-69                 4
70-79                 4
80-89                 4
90-100                6
Tabular Representation of Data
b) Relative Frequency Table
The relative frequency table shows the proportion (or percentage) of the total observations that fall into each category.
Using the previous example, this table is as follows:

Score Range (Marks)   Frequency   Relative Frequency (%)
50-59                 2           10%
60-69                 4           20%
70-79                 4           20%
80-89                 4           20%
90-100                6           30%
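Counting the 20 listed scores by machine is a useful check on a hand-made frequency table. The Python sketch below assumes the class limits used on the slides (inclusive, with a wider last class 90-100); `frequency_table` is a hypothetical helper name. Counted this way, the five classes hold 2, 4, 4, 4 and 6 students.

```python
scores = [50, 60, 75, 80, 65, 70, 85, 90, 95, 100,
          70, 60, 85, 90, 75, 60, 55, 85, 90, 100]

# Class limits (inclusive) used on the slides; the last class is 90-100.
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 100)]


def frequency_table(data, classes):
    """Return rows of (label, frequency, relative frequency in %)."""
    total = len(data)
    rows = []
    for lo, hi in classes:
        count = sum(1 for v in data if lo <= v <= hi)
        rows.append((f"{lo}-{hi}", count, 100 * count / total))
    return rows


table = frequency_table(scores, classes)
for label, count, pct in table:
    print(f"{label:>7}  {count:>2}  {pct:5.1f}%")
```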
Graphical Representation of Data
• Graphical representation allows a quick understanding of trends, patterns and outliers in data. Common choices are the bar chart, the histogram and the pie chart.
• Bar charts represent categorical data with rectangular bars, where the length of each bar is proportional to the frequency or relative frequency.
• A bar chart is used when you want to show a distribution of data points or compare metric values across different subgroups of your data.
Graphical Representation of Data
E.g. 30 MEE students were asked about their favorite fruits, and their responses are summarized in the table below. Plot the bar chart for the data.

Fruit           Banana   Orange   Apple   Grapes   Avocado
# of Students   8        6        5       4        7
Solution
[Figure: Bar Chart for Students' Favorite Fruit; x-axis: fruit (Banana, Orange, Apple, Grapes, Avocado); y-axis: # of Students]
Histogram (Quantitative Data)
• A histogram is a graphical representation of data in which the data are grouped into continuous number ranges and each range corresponds to a vertical bar.
  • The horizontal axis displays the number ranges.
  • The vertical axis (frequency) represents the amount of data present in each range.
  • The number ranges depend on the data being used.
Tabular Representation of Data
Let’s use our previous example about the marks of MEE students:

Score Range (Marks)   Frequency
50-59                 2
60-69                 4
70-79                 4
80-89                 4
90-100                6
Histogram
Let’s use our previous example on the marks of MEE students:

50,60,75,80,65,70,85,90,95,100,70,60,85,90,75,60,55,85,90,100

[Figure: Histogram of the marks; x-axis: Bin (55, 64, 73, 82, 91, 100, More); y-axis: Frequency]
Frequency Polygon
A frequency polygon is a graphical representation of the distribution of a dataset. It is created by plotting points that represent the frequencies of data points at each individual value (for ungrouped data) or at the midpoints of each class interval (for grouped data). These points are then connected with straight lines, forming a polygon shape.
Frequency polygon for our previous example
[Figure: Frequency Polygon Example; x-axis: Students' Marks (150 to 175); y-axis: Frequency]
Cumulative Frequency Polygon
A cumulative frequency polygon (also called an ogive) is used to represent cumulative frequencies graphically. It shows the cumulative frequency of the data at each point and is particularly useful for understanding percentiles, medians and other distribution characteristics.
To construct a cumulative frequency polygon (for ungrouped data), follow these steps:
Step 1: Organize the data in increasing order and create a frequency distribution table that records the frequency of each value.
Step 2: Calculate the cumulative frequency of each value.

Cumulative Frequency Polygon
Step 3: Plot the points: on the x-axis, place the data values; on the y-axis, place the cumulative frequencies; then plot each data value against its corresponding cumulative frequency.
Step 4: Draw the ogive: connect the points with straight lines. Optionally, the curve can start from zero on the y-axis, depending on the dataset.
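Steps 1 and 2 can be sketched in a few lines of Python, here assuming the heights data (x_i, R_i) from the earlier example as input. The helper name `cumulative_points` is hypothetical; its output is the list of (value, cumulative frequency) points plotted in Step 3.

```python
# Heights example: distinct values x_i and frequencies R_i.
xs = [152, 154, 155, 159, 160, 161, 162, 167, 170, 171, 172, 173]
rs = [1, 1, 1, 2, 6, 2, 1, 1, 4, 1, 4, 1]


def cumulative_points(values, freqs):
    """Steps 1-2: sort the values and accumulate their frequencies."""
    points = []
    running = 0
    for x, f in sorted(zip(values, freqs)):
        running += f
        points.append((x, running))
    return points


points = cumulative_points(xs, rs)
for x, cf in points:
    print(x, cf)
```

The last point always carries the total frequency (25 here), which is a quick sanity check before drawing the ogive.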
Example
E.g. The following graph is the ogive for the example in the previous slides (the step-by-step construction will be done during the lecture).
[Figure: Ogive; x-axis: data values (150 to 175); y-axis: cumulative frequency (0 to 30)]
Pie-Chart
A pie chart is used to display a set of categorical data. It is a circle divided into sections, or wedges, according to the percentage of frequencies in each category of the distribution.

$$\text{Angle of sector } S = \frac{\text{Frequency of } S \times 360^\circ}{\text{Total frequency}}$$
Pie-Chart
Using the example about the students’ favorite fruits, the pie chart is drawn as follows:
[Figure: Pie chart of students' favorite fruits; sectors: Banana, Orange, Apple, Grapes, Avocado]
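The sector-angle formula can be applied to the fruit survey with a short Python sketch. The dictionary layout and the name `sector_angles` are illustrative assumptions; the angles always add up to 360°.

```python
# Favorite-fruit survey of 30 MEE students (from the bar-chart example).
counts = {"Banana": 8, "Orange": 6, "Apple": 5, "Grapes": 4, "Avocado": 7}


def sector_angles(freqs):
    """Angle of each sector in degrees: frequency * 360 / total frequency."""
    total = sum(freqs.values())
    return {name: f * 360 / total for name, f in freqs.items()}


angles = sector_angles(counts)
for name, angle in angles.items():
    print(f"{name:8s} {angle:6.1f} deg")
```

For this survey the sectors come out as 96°, 72°, 60°, 48° and 84°.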
Grouped Data
Grouped data refers to data that has been organized into classes or intervals. When the raw data is too large, grouping it simplifies analysis. In this lecture, we will focus on calculating the mean, median and mode for grouped data and learn how to plot a histogram, frequency polygon and cumulative frequency polygon.
Mean for Grouped Data
The mean (or average) of grouped data is calculated using the formula:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

where
• $f_i$ is the frequency of the $i$th class,
• $x_i$ is the midpoint of the $i$th class, $x_i = \frac{\text{Lower class boundary} + \text{Upper class boundary}}{2}$,
• $\sum_{i=1}^{n} f_i$ is the total frequency.
Median for Grouped Data
The median is the value that separates the data into two equal halves. For grouped data, it can be found using the formula:

$$M_e = L + \frac{\frac{N}{2} - F}{f} \times h$$

where
• $L$ is the lower class boundary of the median class,
• $N$ is the total frequency,
• $F$ is the cumulative frequency before the median class,
• $f$ is the frequency of the median class,
• $h$ is the class width.
Mode for Grouped Data
The mode is the value that appears most frequently. For grouped data, the mode is calculated using the following formula:

$$M_o = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times h$$

where
• $L$ is the lower class boundary of the modal class,
• $f_1$ is the frequency of the modal class,
• $f_0$ is the frequency of the class before the modal class,
• $f_2$ is the frequency of the class after the modal class,
• $h$ is the class width.
Variance and Standard Deviation for Grouped Data
The variance ($\sigma^2$) is given by:

$$\sigma^2 = \frac{\sum f_i (x_m - \bar{x})^2}{\sum f_i}$$

where
• $x_m$ is the midpoint of each class,
• $\bar{x}$ is the mean,
• $f_i$ is the frequency.
Standard deviation: $\sigma = \sqrt{\sigma^2}$
Example
Consider the following dataset:

Class Interval   40-49   50-59   60-69   70-79   80-89   90-99   100-109   110-119
Frequency        5       7       12      15      10      6       3         2

Calculate the mean, median, mode, variance and standard deviation.
Solution (the step-by-step working will be done during the class session)
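Example
• Mean: $\bar{x} = \frac{4450}{60} = 74.17$
• Median: $M_e = 69.5 + \frac{30 - 24}{15} \times 10 = 73.5$ (the median class is 70-79, whose lower class boundary is 69.5)
• Mode: $M_o = 69.5 + \frac{15 - 12}{(15 - 12) + (15 - 10)} \times 10 = 73.25$
• $\sigma^2 = \frac{17793.33}{60} = 296.56$
• $\sigma = \sqrt{296.56} = 17.22$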
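A minimal Python sketch of the grouped-data formulas, applied to this example. It assumes integer-valued data, so each lower class boundary is taken as the lower class limit minus 0.5 (69.5 for the class 70-79), and equal class widths; `grouped_stats` is a hypothetical helper name. With L taken as the boundary, as the formulas define it, the median is 73.5 and the mode 73.25; using the class limit 70 instead would shift both up by 0.5.

```python
import math

# Example data: class limits (integer-valued data assumed) and frequencies.
classes = [(40, 49), (50, 59), (60, 69), (70, 79),
           (80, 89), (90, 99), (100, 109), (110, 119)]
freqs = [5, 7, 12, 15, 10, 6, 3, 2]


def grouped_stats(classes, freqs):
    """Mean, median, mode, variance and standard deviation of grouped data."""
    N = sum(freqs)
    h = classes[0][1] - classes[0][0] + 1          # class width (assumes equal widths)
    mids = [(lo + hi) / 2 for lo, hi in classes]   # class midpoints x_m
    lower = [lo - 0.5 for lo, _ in classes]        # lower class boundaries L

    mean = sum(f * m for f, m in zip(freqs, mids)) / N

    # Median class: first class whose cumulative frequency reaches N/2.
    median = None
    F = 0  # cumulative frequency before the current class
    for i, f in enumerate(freqs):
        if F + f >= N / 2:
            median = lower[i] + (N / 2 - F) / f * h
            break
        F += f

    # Modal class: highest frequency; f0 and f2 are its neighbours (0 at the ends).
    k = freqs.index(max(freqs))
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    mode = lower[k] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h

    variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / N
    return mean, median, mode, variance, math.sqrt(variance)


mean, median, mode, var, sd = grouped_stats(classes, freqs)
print(f"mean={mean:.2f} median={median:.2f} mode={mode:.2f} var={var:.2f} sd={sd:.2f}")
```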
Graphical Representation of Grouped Data
a) Histogram
• A histogram represents grouped frequency data. The x-axis represents the class intervals and the y-axis the frequencies. The height of each bar corresponds to the frequency, and the bars are adjacent to each other.
• To plot a histogram, mark the class intervals on the x-axis, then mark the frequencies on the y-axis, and finally draw a bar for each class interval's frequency, leaving no gaps between the bars.
Graphical Representation of Grouped Data
b) Frequency Polygon
• A frequency polygon is a line graph that shows frequencies at the midpoints of each class interval. The graph helps visualize the shape of the distribution.
• To plot a frequency polygon, start by calculating the midpoint of each class interval, then plot points at the midpoints with the corresponding frequencies, and then connect the points with straight lines.
Graphical Representation of Grouped Data
c) Cumulative Frequency Polygon (Ogive)
• A cumulative frequency polygon (ogive) shows cumulative frequencies and helps in identifying percentiles and medians visually.
• To plot an ogive, start by creating a cumulative frequency table, then plot the cumulative frequencies against the upper class boundaries, and then connect the points with a smooth curve.
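Reading the median from an ogive amounts to finding where the cumulative frequency reaches N/2. The Python sketch below builds the ogive points for the grouped example above (upper class boundaries, assuming integer data so that boundaries sit at limit ± 0.5, plus a starting point at the first lower boundary) and interpolates along straight segments, i.e. the polygon version rather than a smooth curve. The helper names `ogive_points` and `read_ogive` are hypothetical.

```python
# Grouped example: class limits (integer data assumed) and frequencies.
classes = [(40, 49), (50, 59), (60, 69), (70, 79),
           (80, 89), (90, 99), (100, 109), (110, 119)]
freqs = [5, 7, 12, 15, 10, 6, 3, 2]


def ogive_points(classes, freqs):
    """(upper class boundary, cumulative frequency) pairs, starting at 0."""
    points = [(classes[0][0] - 0.5, 0)]  # lower boundary of the first class
    cum = 0
    for (_, hi), f in zip(classes, freqs):
        cum += f
        points.append((hi + 0.5, cum))
    return points


def read_ogive(points, target):
    """x-value where the polygon reaches `target`, by straight-line interpolation."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the range of the ogive")


points = ogive_points(classes, freqs)
print(points)
print(read_ogive(points, sum(freqs) / 2))  # estimated median
```

Interpolating linearly inside the median class gives the same result as the grouped-median formula with L taken as the lower class boundary.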
HW1
For the dataset in the previous example,
construct:
a) A histogram
b) A frequency polygon
c) A cumulative frequency polygon
Exercise
The following data set represents the tensile strength (in MPa) of a certain type of alloy used in construction.

72,78,85,85,102,110,120,123,126,129,132,136,138,141,145,150,
153,155,160,165,172,175,180,182,185,188,191,193,195,198,202,
204,210,215,218,220,225,230,235,240,243,245,248,252,255,260,
264,270,275,280,285,290,295,300,305,310,315,320,325,330,335,
340,345,350,355,360,365,370,375,380,385,390,395,400,405,410,
415,420,425,430,435,440,445,450,455,460,465,470,475,480,485,
490,495,500,505,510,515,520,525,530.
Exercise (Assignment 1)
a) Arrange the data into classes.
b) Perform a thorough descriptive statistical analysis (calculate the mean, median, mode, variance and standard deviation using the grouped data in a)).
c) Construct the histogram, frequency polygon and cumulative frequency polygon (ogive).
d) Using the ogive constructed in c), find the median tensile strength and compare it with the median obtained in b).
