1 - Chapter (1) Analysis of Data and Its Types Exercise
Analysis of Data
Introduction:
Statistics is the study that deals with the collection, analysis and presentation of data to make decisions and
answer questions about samples of observations. It refers to a collection of methods (tools) developed over
hundreds of years for working with data and using data.
In simple terms, statistics is the science of data. Because many aspects of engineering practice involve working
with data, knowledge of statistics is just as important to an engineer as the other engineering
sciences. Specifically, statistical techniques can be powerful aids in designing new products and systems,
improving existing designs, and designing, developing, and improving production processes. Statistics can
also be used to see whether scores on two variables are related and to make predictions.
When it comes to the statistical tools that we use in practice, it can be helpful to divide the field of statistics
into two large groups of methods:
i - Descriptive Statistics consists of methods for organizing, displaying and describing data with the help
of tools such as tables, bar charts, pie charts, graphs, etc., and summary results.
ii - Inferential Statistics consists of methods that use sample data to make inferences, decisions or
predictions about a population of data.
Definition: A population consists of a set of elements (items or objects) whose characteristics are being
studied and a sample is a subset (portion) of the population selected for study. A random sample is a sample
drawn such that each element of the population has the same chance of being selected.
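The definition of a random sample can be illustrated with a minimal Python sketch. The population of 1000 part IDs and the sample size of 10 are hypothetical values chosen only for illustration; `random.sample` draws distinct elements without replacement, so every element has the same chance of being selected.

```python
import random

# Hypothetical population: ID numbers of 1000 items (illustrative only).
population = list(range(1, 1001))

# A seeded generator is used only so that the sketch is repeatable.
rng = random.Random(42)

# Draw a simple random sample of 10 distinct elements (no replacement).
sample = rng.sample(population, k=10)

print(sample)
```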
Central Tendency
In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability
distribution. It may also be called a center or location of the distribution. Colloquially, measures of central
tendency are often called averages. The term central tendency dates from the late 1920s.
The most common measures of central tendency are the arithmetic mean, the median and the mode.
A central tendency can be calculated for either a finite set of values or for a theoretical distribution, such as the
normal distribution.
1. Mean (The Arithmetic Mean) equals the sum of observations (values) divided by the number of values.
2. Median (The Middle of the road) equals the observation (value) in the center when all observations
are ordered from smallest to largest; when there is an even number of observations, the median is defined
as the average of the middle two values. In both cases, the Median = the value of the $\left(\frac{n+1}{2}\right)^{th}$ term in a ranked
data set, where a fractional position such as 4.5 means the average of the 4th and 5th terms.
3. Mode equals the value that occurs with highest frequency among all values of the data set, in other words
the mode is the number(s) that appear(s) the most out of a given set of data. A data set can have more
than one mode value.
4. Geometric Mean (GM) equals the nth root of the product of the data values, where there are n of these:
$GM = \sqrt[n]{x_1 \, x_2 \cdots x_n}$. It is an appropriate measure when values change exponentially, so it is
commonly used in microbiological and serological research.
One important disadvantage of the GM is that it cannot be used if any of the values are zero or negative, so this
measure is valid only for data that are measured on a strictly positive scale.
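The four measures listed above can be sketched with Python's standard `statistics` module. The data values below are illustrative and are not taken from the text; the list has an even number of values so that the median is the average of the two middle terms.

```python
import statistics

# Illustrative data (not from the text); all values are strictly positive,
# which the geometric mean requires.
data = [2, 4, 4, 8, 16, 32]

mean = statistics.mean(data)            # sum of values / number of values
median = statistics.median(data)        # average of the two middle values (4 and 8)
modes = statistics.multimode(data)      # list of every most-frequent value
gm = statistics.geometric_mean(data)    # nth root of the product of the values

print(mean, median, modes, round(gm, 3))
```

`multimode` returns a list because, as noted above, a data set can have more than one mode.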
1. Standard deviation: The concept of Standard Deviation (SD) was introduced by Karl Pearson in 1893.
It is by far the most important and widely used measure of dispersion.
It measures the dispersion (variability) of a set of data about its mean and is calculated as the square root of the
variance. It measures the absolute variability of a distribution: the higher the dispersion or variability, the greater
the standard deviation and the greater the magnitude of the deviations of the values from their mean.
2. Coefficient of variation:
In probability theory and statistics, the coefficient of variation (CV), is a standardized measure of dispersion
of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as
the ratio of the standard deviation to the mean (or its absolute value). The CV is commonly used in
engineering when doing quality assurance studies.
A direct comparison of two or more measures of dispersion (such as the standard deviation of a distribution of
annual incomes and the standard deviation of a distribution of the number of days absent for the
same group of employees) is not meaningful, because the two values are measured in different
units: absenteeism is measured in days, while incomes are measured in dollars ($).
To compare the two standard deviations meaningfully, we need to convert them to
relative values. So when comparing data sets with different units or widely different means, one should
use the coefficient of variation.
Example: The mean weekly salary of a group of employees is 200 L.E. with S.D. 35 L.E., and the mean of
their experience is 10 years with S.D. 6 years. Compare the dispersion of their salaries with that of their
experience.
Solution
$(CV)_{salaries} = \frac{s}{\bar{x}} \times 100\% = \frac{35}{200} \times 100\% = 17.5\%$ and
$(CV)_{experience} = \frac{s}{\bar{x}} \times 100\% = \frac{6}{10} \times 100\% = 60\%$
Therefore,
$(CV)_{salaries} = 17.5\% < (CV)_{experience} = 60\%$, so experience is relatively more dispersed than salaries.
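The comparison in this example can be checked with a short Python sketch. The function name `coefficient_of_variation` is a hypothetical helper introduced here for illustration; it follows the definition given earlier (standard deviation divided by the absolute value of the mean, as a percentage).

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the CV as a percentage: 100 * std_dev / |mean|."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / abs(mean)

# Values from the salaries/experience example.
cv_salaries = coefficient_of_variation(35, 200)
cv_experience = coefficient_of_variation(6, 10)

# The quantity with the larger CV is relatively more dispersed.
print(cv_salaries, cv_experience, cv_salaries < cv_experience)
```

Because the CV is unitless, the two values can be compared even though one is in L.E. and the other in years.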
(1) UNGROUPED DATA
Ungrouped data are data that have not been organized into groups; they look like a list of numbers.
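Arithmetic Mean: $\bar{x} = \frac{\sum x}{n}$ (sample), $\mu = \frac{\sum x}{N}$ (population)
Variance: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$ (sample), $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ (population)
Standard Deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}}$ (sample), $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$ (population)
Coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\%$ (sample), $CV = \frac{\sigma}{\mu} \times 100\%$ (population)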
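As a rough check of the computational formulas for the variance, the sketch below implements them directly and compares the results with Python's `statistics.variance` (sample, divisor n−1) and `statistics.pvariance` (population, divisor N). The function names and the data values are illustrative assumptions, not part of the text.

```python
import statistics

def sample_variance(xs):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def population_variance(xs):
    """sigma^2 = (sum(x^2) - (sum(x))^2 / N) / N."""
    N = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / N) / N

# Illustrative data (not from the text): sum = 50, sum of squares = 520.
data = [7, 9, 10, 11, 13]

s2 = sample_variance(data)           # (520 - 500) / 4
sigma2 = population_variance(data)   # (520 - 500) / 5

# Both should agree with the standard-library implementations.
print(s2, statistics.variance(data))
print(sigma2, statistics.pvariance(data))
```

The only difference between the two formulas is the final divisor, which is why the sample variance is always slightly larger than the population variance for the same numbers.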
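b) Standard deviation of data set A: $\sigma_A = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} = \sqrt{\frac{520 - 500}{5}} = \sqrt{4} = 2$
Standard deviation of data set B: $\sigma_B = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} = \sqrt{\frac{500 - 500}{5}} = \sqrt{0} = 0$
Standard deviation of data set C: $\sigma_C = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} = \sqrt{\frac{824 - 500}{5}} = \sqrt{64.8} = 8.05$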
Example (2): The exam grades of a sample of 9 students in a class are: 4 , 10 , 7 , 7 , 6 , 9 , 3 , 8 , 9
Calculate: i) The range ii) The mode iii) The mean iv) The median
v) The variance and standard deviation vi) The coefficient of variation.
Solution
First, we arrange the data in ascending order: 3 , 4 , 6 , 7 , 7 , 8 , 9 , 9 , 10
i) The range = difference between the max. and min. value of grades = 10 − 3 = 7
ii) The given data set has 2 modes: 7 and 9
iii) The mean, $\bar{x} = \frac{\sum x_i}{n} = \frac{3 + 4 + 6 + 7 + 7 + 8 + 9 + 9 + 10}{9} = \frac{63}{9} = 7$
iv) The middle term is term number $\frac{n+1}{2} = \frac{9+1}{2} = 5$, so the median = the value of the 5th term = 7
v) The variance, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{485 - 441}{8} = 5.5$
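The results of this example can be reproduced with Python's standard `statistics` module, as in the sketch below. The standard deviation and coefficient of variation are obtained here from the formulas given earlier ($s = \sqrt{s^2}$ and $CV = \frac{s}{\bar{x}} \times 100\%$); the variable names are illustrative.

```python
import statistics

grades = [4, 10, 7, 7, 6, 9, 3, 8, 9]   # data from Example (2)

value_range = max(grades) - min(grades)        # i) range
modes = sorted(statistics.multimode(grades))   # ii) all most-frequent values
mean = statistics.mean(grades)                 # iii) mean
median = statistics.median(grades)             # iv) median
variance = statistics.variance(grades)         # v) sample variance (divisor n - 1)
std_dev = statistics.stdev(grades)             # v) square root of the sample variance
cv = std_dev / mean * 100                      # vi) coefficient of variation in percent

print(value_range, modes, mean, median, variance)
print(round(std_dev, 3), round(cv, 1))
```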
Example (3): The duration of time from the first exposure to HIV infection to AIDS diagnosis is called the
incubation period.
The incubation periods of a random sample of 7 HIV infected individuals are given below (in years):
12.0 , 10.5 , 9.5 , 6.3 , 13.5 , 12.5 , 7.2
Calculate: i) The range and mode. ii) The sample mean and median.
iii) The variance and standard deviation. iv) The coefficient of variation.
Solution
First we arrange the data in ascending order:
6.3 , 7.2 , 9.5 , 10.5 , 12 , 12.5 , 13.5
i) Range is the difference between the max. and min. incubation periods, so Range = 13.5 − 6.3 = 7.2
There is no mode, since no value is repeated.
ii) Mean, $\bar{x} = \frac{\sum x}{n} = \frac{71.5}{7} \approx 10.21$
Median is the value of the middle term, which is term number $\frac{n+1}{2} = \frac{7+1}{2} = 4$; therefore the median = 10.5
iii) Variance, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{774.53 - 730.32}{7-1} = \frac{44.21}{6} = 7.37$
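(2) GROUPED DATA
Grouped data are data that have been organized into groups or sets (into a frequency distribution).
1- Estimating the range for grouped data
Range = center of final class – center of initial class
$Mode = L_{mo} + \left(\frac{\Delta_-}{\Delta_- + \Delta_+}\right) w$
Where, $L_{mo}$ is the lower boundary of the modal class (the class with the highest frequency), $w$ is the class
width, $\Delta_-$ is the frequency of the modal class minus the frequency of the preceding class, and $\Delta_+$ is the
frequency of the modal class minus the frequency of the following class.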
Example (1): The following table displays the frequency distribution of the daily commuting time (in minutes)
from home to the 6th of October campus for 50 staff members working at MSA. Find the range and mode.
Solution
i) Since Range = center of final class – center of initial class,
$Range = \frac{41 + 50}{2} - \frac{1 + 10}{2} = \frac{91}{2} - \frac{11}{2} = 45.5 - 5.5 = 40$
ii) From the table, the modal class (which has the highest frequency) is the class [11 : 20], therefore,
$L_{mo} = \frac{10 + 11}{2} = 10.5$ , $w = (20 - 11) + 1 = 10$
$\Delta_- = 14 - 8 = 6$ , $\Delta_+ = 14 - 12 = 2$
$Mode = L_{mo} + \left(\frac{\Delta_-}{\Delta_- + \Delta_+}\right) w = 10.5 + \frac{6}{6 + 2} \times 10 = 10.5 + \frac{3}{4} \times 10 = 18$ minutes
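The mode calculation for grouped data can be sketched as a small Python function. The name `grouped_mode` and its parameter names are hypothetical, introduced only for this illustration; the numbers passed in are the ones used in the worked solution above (lower boundary 10.5, frequencies 14, 8 and 12, width 10).

```python
def grouped_mode(lower_boundary, f_modal, f_before, f_after, width):
    """Mode = L_mo + (d_minus / (d_minus + d_plus)) * w for a frequency table."""
    d_minus = f_modal - f_before   # modal frequency minus preceding class frequency
    d_plus = f_modal - f_after     # modal frequency minus following class frequency
    if d_minus + d_plus == 0:
        raise ValueError("mode is not defined when both differences are zero")
    return lower_boundary + d_minus * width / (d_minus + d_plus)

# Values taken from the commuting-time solution above.
mode_minutes = grouped_mode(lower_boundary=10.5, f_modal=14,
                            f_before=8, f_after=12, width=10)
print(mode_minutes)
```

When the modal class is the first or last class, the missing neighbour's frequency is taken as 0.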
Example (2): The following table shows the frequencies of ages for 4 age groups. Find the range and
mode.
Solution
i) Since Range = center of final class – center of initial class,
$Range = \frac{60 + 51}{2} - \frac{30 + 21}{2} = \frac{111}{2} - \frac{51}{2} = 55.5 - 25.5 = 30$
ii) From the table, the modal class (which has the highest frequency) is the class [51 : 60], therefore,
$L_{mo} = \frac{50 + 51}{2} = 50.5$ , $w = (60 - 51) + 1 = 10$
$\Delta_- = 15 - 5 = 10$ , $\Delta_+ = 15 - 0 = 15$
$Mode = L_{mo} + \left(\frac{\Delta_-}{\Delta_- + \Delta_+}\right) w = 50.5 + \frac{10}{10 + 15} \times 10 = 50.5 + \frac{10}{25} \times 10 = 54.5$
Example (1): The following table displays the frequency distribution of the daily commuting time (in minutes)
from home to work for a sample of 21 employees in a certain company. Find the range and median.
frequency (f) 2 7 8 4
Solution
ii) Since $\frac{N}{2} = \frac{21}{2} = 10.5$, the median class (the class that contains the median) is the class [61 : 65].
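Example (2): The following table shows the frequencies per age group. Find the median of ages.
Age group t (years) | 21 ≤ t ≤ 30 | 31 ≤ t ≤ 40 | 41 ≤ t ≤ 50 | 51 ≤ t ≤ 60
Frequency for age group | 1 | 10 | 5 | 15
Solution
Since $\frac{N}{2} = \frac{31}{2} = 15.5$, the median class (the class that contains the median) is the class [41 : 50].
Therefore, $Median = L_{me} + \left(\frac{(N/2) - B}{G}\right) w = 40.5 + \frac{(31/2) - 11}{5} \times 10 = 40.5 + \frac{4.5}{5} \times 10 = 40.5 + 9 = 49.5$
Here $L_{me} = 40.5$ is the lower boundary of the median class, $B = 11$ is the cumulative frequency of the classes
before it, $G = 5$ is the frequency of the median class, and $w = 10$ is the class width.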
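The median calculation for grouped data can be sketched in Python as follows. The function name `grouped_median` is hypothetical, and the code assumes integer class limits with a gap of 1 between consecutive classes, so that the lower boundary is the lower limit minus 0.5 (as in $L_{me} = 40.5$ for the class [41 : 50]).

```python
def grouped_median(classes, freqs):
    """Estimate the median of a frequency table.

    classes: list of (lower_limit, upper_limit) pairs in ascending order,
             assumed to be integer limits with a gap of 1 between classes.
    freqs:   class frequencies in the same order.
    """
    half = sum(freqs) / 2      # N / 2
    before = 0                 # B: cumulative frequency before the current class
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and before + f >= half:
            lower_boundary = lo - 0.5    # L_me
            width = hi - lo + 1          # w
            return lower_boundary + (half - before) * width / f   # f plays the role of G
        before += f
    raise ValueError("frequency table is empty")

# Age-group table from the example above.
age_classes = [(21, 30), (31, 40), (41, 50), (51, 60)]
age_freqs = [1, 10, 5, 15]

median_age = grouped_median(age_classes, age_freqs)
print(median_age)
```

The loop locates the median class as the first class whose cumulative frequency reaches N/2, which mirrors the manual step of reading the cumulative frequency column.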
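Mean, Variance, Standard deviation and Coefficient of variation for Grouped Data (m is the class midpoint, f is the class frequency):
Mean: $\bar{x} = \frac{\sum mf}{n}$ (sample), $\mu = \frac{\sum mf}{N}$ (population)
Variance: $s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n-1}$ (sample), $\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}$ (population)
Standard Deviation: $s = \sqrt{\frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n-1}}$ (sample), $\sigma = \sqrt{\frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}}$ (population)
Coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\%$ (sample), $CV = \frac{\sigma}{\mu} \times 100\%$ (population)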
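It may help to see why the numerator $\sum m^2 f - \frac{(\sum mf)^2}{n}$ is the same quantity as the sum of squared deviations of the midpoints from the mean. The short derivation below is a sketch that assumes $n = \sum f$ and $\bar{x} = \frac{\sum mf}{n}$, as in the formulas above.

```latex
\begin{aligned}
\sum f\,(m - \bar{x})^2
  &= \sum m^2 f \;-\; 2\bar{x}\sum mf \;+\; \bar{x}^2 \sum f \\
  &= \sum m^2 f \;-\; 2\,\frac{(\sum mf)^2}{n} \;+\; \frac{(\sum mf)^2}{n^2}\, n \\
  &= \sum m^2 f \;-\; \frac{(\sum mf)^2}{n}
\end{aligned}
```

Dividing this quantity by $n-1$ gives the sample variance, and the same steps with $N$ and $\mu$ give the population form.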
𝑬𝒙𝒂𝒎𝒑𝒍𝒆 (𝟑): The following table gives the frequency distribution of the number of orders received each
day during the past 50 days at the office of a mail-order company.
Intervals | 10 ≤ t ≤ 12 | 13 ≤ t ≤ 15 | 16 ≤ t ≤ 18 | 19 ≤ t ≤ 21 | Total
Midpoint (m) | 11 | 14 | 17 | 20 |
Frequency (f) | 4 | 12 | 20 | 14 |
Cumulative frequency | 4 | 16 | 36 | 50 |
mf | 44 | 168 | 340 | 280 | ∑mf = 832
m²f | 484 | 2352 | 5780 | 5600 | ∑m²f = 14216
iii) From the table, the modal class (which has the highest frequency) is the class [16 : 18], therefore,
$L_{mo} = \frac{15 + 16}{2} = 15.5$ , $w = (18 - 16) + 1 = 2 + 1 = 3$
$\Delta_- = 20 - 12 = 8$ , $\Delta_+ = 20 - 14 = 6$
$Mode = L_{mo} + \left(\frac{\Delta_-}{\Delta_- + \Delta_+}\right) w = 15.5 + \frac{8}{8 + 6} \times 3 = 15.5 + \frac{4}{7} \times 3 = 17.214$
iv) Since $\frac{N}{2} = \frac{50}{2} = 25$, the median class (the class that contains the median) is the class [16 : 18].
$Median = L_{me} + \left(\frac{(N/2) - B}{G}\right) w = 15.5 + \frac{25 - 16}{20} \times 3 = 15.5 + 1.35 = 16.85$
v) Variance, $s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n-1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = 7.582$
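The table sums and the variance for this example can be recomputed with a few lines of Python. This is a sketch using the sample formulas for grouped data; the mean, standard deviation and coefficient of variation shown at the end are derived from those same formulas, and the variable names are illustrative.

```python
import math

# Midpoints and frequencies from the mail-order table.
midpoints = [11, 14, 17, 20]
freqs = [4, 12, 20, 14]

n = sum(freqs)                                               # total number of days
sum_mf = sum(m * f for m, f in zip(midpoints, freqs))        # sum of m*f
sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))   # sum of m^2*f

mean = sum_mf / n                                   # grouped mean
variance = (sum_m2f - sum_mf ** 2 / n) / (n - 1)    # sample variance
std_dev = math.sqrt(variance)                       # sample standard deviation
cv = std_dev / mean * 100                           # coefficient of variation in percent

print(n, sum_mf, sum_m2f)
print(round(mean, 2), round(variance, 3), round(std_dev, 2), round(cv, 1))
```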
MSA University Module Title: Probability and Statistics
Faculty of Engineering Module Code: MAT 361
Exercise (1)
(1) i) If the mean of the data set { 3 , 6 , 12 , x } is 9, find the value of x.
ii) If the mode of the data set { 1 , 0 , 2 , 5 , 9 , 11 , 13 , 14 , 7 , x } is 0, find the value of x.
iii) If the median of { 2 , 3 , 11 , 1 , 6 , 14 , x , 8 , 9 } is 8, is it true that the value of x is greater than 8?
(2) Find the values of x and y so that the ordered data set { 17 , 22 , 26 , 29 , 34 , x , 42 , 67 , 70 , y }
has a mean of 42 and a median of 35.
(3) If the grades of a sample of 9 students in a Math exam are: 1 , 3 , 22 , 7 , 17 , 9 , 10 , 4 , 17. Calculate:
i) The mode and range of the grades. ii) The mean and median of the grades.
iii) The variance and standard deviation of the grades. iv) The coefficient of variation.
(4) Based on the grouped data below, find the range, mode and median. Ans: [40 , 17.5 , 24]
(5) The following table shows the distribution of the number of hours worked each week (on average) for a
sample of 100 community college students.
Calculate: i) Range. ii) Mean. iii) Mode. iv) Median. v) Variance and standard deviation.
(6) Consider the following frequency table for a population of marks obtained in a test by 88 students.
Marks (x) | 0 ≤ x ≤ 9 | 10 ≤ x ≤ 19 | 20 ≤ x ≤ 29 | 30 ≤ x ≤ 39 | 40 ≤ x ≤ 49
Frequency (f) | 6 | 16 | 24 | 25 | 17
Calculate: i) Range. ii) Mean. iii) Mode. iv) Median. v) Variance and standard deviation.