
1 - Chapter (1) Analysis of Data and Its Types Exercise

This document discusses descriptive and inferential statistics. Descriptive statistics involve organizing and describing data using tables, charts, and summary statistics. Inferential statistics use sample data to make inferences about populations. It defines key concepts like populations, samples, random samples, measures of central tendency (mean, median, mode), measures of dispersion (range, variance, standard deviation), and ungrouped vs grouped data. It provides examples to calculate and compare the mean, standard deviation, and coefficient of variation for different data sets to measure central tendency and dispersion.

Uploaded by

Alaa Farouk

CHAPTER (1)

Analysis of Data
Introduction:
Statistics is the study that deals with the collection, analysis and presentation of data to make decisions and
answer questions about samples of observations. It refers to a collection of methods (tools) developed over
hundreds of years for working with data and using data.
In simple terms, statistics is the science of data. Because many aspects of engineering practice involve working
with data, knowledge of statistics is just as important to an engineer as the other engineering
sciences. Specifically, statistical techniques can be powerful aids in designing new products and systems,
improving existing designs, and designing, developing, and improving production processes. Statistics can
also be used to see if scores on two variables are related and to make predictions.

When it comes to the statistical tools that we use in practice, it can be helpful to divide the field of statistics
into two large groups of methods:
i - Descriptive Statistics consists of methods for organizing, displaying and describing data with the help
of tools such as tables, bar charts, pie charts and graphs, together with summary results.

ii - Inferential Statistics consists of methods that use sample data to make inferences, decisions or
predictions about a population of data.

Definition: A population consists of a set of elements (items or objects) whose characteristics are being
studied and a sample is a subset (portion) of the population selected for study. A random sample is a sample
drawn such that each element of the population has the same chance of being selected.
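
The definition of a random sample can be illustrated with a short Python sketch. The population of 100 numbered items, the sample size of 10, and the fixed seed are all assumptions made only for illustration; they do not come from the notes.

```python
import random

# Hypothetical population: 100 items numbered 1..100 (illustrative only)
population = list(range(1, 101))

# A fixed seed is used only so that the sketch gives the same sample every run
rng = random.Random(42)

# Simple random sample without replacement: every element of the
# population has the same chance of being selected
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Every value in `sample` belongs to `population`, and no element is selected twice.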

Central Tendency
In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability
distribution. It may also be called a center or location of the distribution. Colloquially, measures of central
tendency are often called averages. The term central tendency dates from the late 1920s.
The most common measures of central tendency are the arithmetic mean, the median and the mode.
A central tendency can be calculated for either a finite set of values or for a theoretical distribution, such as the
normal distribution.
1. Mean (The Arithmetic Mean) equals the sum of observations (values) divided by the number of values.
2. Median (the middle of the road) equals the observation (value) in the center when all observations
are ordered from smallest to largest. For an odd number n of observations, the median is the value of the
$\left(\frac{n+1}{2}\right)^{\text{th}}$ term in the ranked data set; for an even number of observations, the
median is defined as the average of the middle two values.
3. Mode equals the value that occurs with highest frequency among all values of the data set, in other words
the mode is the number(s) that appear(s) the most out of a given set of data. A data set can have more
than one mode value.
4. Geometric Mean (GM) is an appropriate measure when values change exponentially; it is
commonly used in microbiological and serological research. The GM equals the nth root of the product
of the n data values:

$G.M. = \sqrt[n]{x_1 x_2 \cdots x_n}$

One important disadvantage of the GM is that it cannot be used if any of the values are zero or negative, so this
measure is valid only for data that are measured on a strictly positive scale.
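
As a minimal sketch, the four measures above can be computed with Python's standard `statistics` module. The two small data lists are illustrative values chosen here, not data taken from the notes.

```python
import statistics

# Illustrative ranked data set with two modes (7 and 9)
values = [3, 4, 6, 7, 7, 8, 9, 9, 10]

mean = statistics.mean(values)        # sum of the values / number of values
median = statistics.median(values)    # value of the middle term of the ranked data
modes = statistics.multimode(values)  # all values sharing the highest frequency

# Geometric mean: nth root of the product; requires strictly positive data
gm = statistics.geometric_mean([2, 8])

print(mean, median, modes, round(gm, 6))
```

Here `multimode` returns every mode, which matches the remark that a data set can have more than one mode value.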

Measures of Dispersion: (Range, Variance, standard deviation, standard error)


Definition: What is Dispersion (variability or spread)?
Dispersion describes the manner in which the data are scattered around a specific value (such as the mean).
To measure the dispersion of a set of numbers means to measure how spread out the numbers in the set are.

1. Standard deviation: The concept of Standard Deviation (SD) was introduced by Karl Pearson in 1893.
It is by far the most important and widely used measure of dispersion.
It measures the dispersion (variability) of a set of data about its mean and is calculated as the square root of the
variance. It measures the absolute variability of a distribution: the higher the dispersion or variability, the greater
the standard deviation and the greater the magnitude of the deviations of the values from their mean.

2. Coefficient of variation:
In probability theory and statistics, the coefficient of variation (CV), is a standardized measure of dispersion
of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as
the ratio of the standard deviation to the mean (or its absolute value). The CV is commonly used in
engineering when doing quality assurance studies.

A direct comparison of two or more measures of dispersion (such as the standard deviation for a distribution of
annual incomes compared to the standard deviation for a distribution of the number of days absent for the
same group of employees) is not meaningful, because the two values are measured in different
units: absenteeism is measured in days, while incomes are measured in dollars ($).
To make a meaningful comparison of the two standard deviations, we need to convert them to
relative values. So when comparing data sets with different units or widely different means, one should
use the coefficient of variation.

Example: The mean weekly salary of a group of employees is 200 L.E. with S.D. 35 L.E., and the mean of
their experience is 10 years with S.D. 6 years. Compare the dispersion of their salaries with that of their
experience.
Solution
$(CV)_{salaries} = \frac{s}{\bar{x}} \times 100\% = \frac{35}{200} \times 100\% = 17.5\%$ and

$(CV)_{experience} = \frac{s}{\bar{x}} \times 100\% = \frac{6}{10} \times 100\% = 60\%$

Therefore,
$(CV)_{salaries} = 17.5\% < (CV)_{experience} = 60\%$, so the salaries are relatively less dispersed than the years of experience.
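
The comparison above can be sketched in Python. The function name `coefficient_of_variation` is a hypothetical helper introduced here for illustration only.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the CV as a percentage: (standard deviation / |mean|) * 100."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return std_dev / abs(mean) * 100.0


# Values from the salaries / experience example
cv_salaries = coefficient_of_variation(35, 200)
cv_experience = coefficient_of_variation(6, 10)

print(round(cv_salaries, 2), round(cv_experience, 2))
```

Because the CV is unit-free, the two results can be compared directly even though one quantity is in L.E. and the other in years.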

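(1) UNGROUPED DATA
Ungrouped data are data that have not been organized into groups; they look like a list of numbers.

Measures | Sample | Population
Arithmetic mean | $\bar{x} = \frac{\sum x}{n}$ | $\mu = \frac{\sum x}{N}$
Variance | $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$ | $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Standard deviation | $s = \sqrt{s^2} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}}$ | $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
Coefficient of variation | $CV = \frac{s}{\bar{x}} \times 100\%$ | $CV = \frac{\sigma}{\mu} \times 100\%$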
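
A minimal Python sketch of the shortcut variance formulas in the table follows. The helper name `variance_shortcut` and its `sample` flag are assumptions made for illustration; the only difference between the two cases is the divisor, n − 1 for a sample and N for a population.

```python
import math


def variance_shortcut(values, sample=True):
    """Variance via (sum of x^2 - (sum of x)^2 / n) / d,
    where d = n - 1 for a sample and d = n for a population."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to compute the variance")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    numerator = sum_x2 - sum_x ** 2 / n
    return numerator / (n - 1 if sample else n)


data = [9, 10, 11, 7, 13]                         # illustrative values
pop_var = variance_shortcut(data, sample=False)   # divides by N
samp_var = variance_shortcut(data, sample=True)   # divides by n - 1
pop_sd = math.sqrt(pop_var)

print(pop_var, samp_var, pop_sd)
```

For the same list, the sample variance is always larger than the population variance because it divides the same numerator by a smaller number.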

Examples for Ungrouped Data


Example (1): Consider the following three data sets A, B and C.
A = {9, 10, 11, 7, 13}, B = {10, 10, 10, 10, 10} and C = {1, 1, 10, 19, 19}.
a) Calculate the mean of each data set.
b) Calculate the standard deviation of each data set.
c) Which set has the largest standard deviation?
d) Is it possible to answer question c) without calculating the standard deviation?
Solution
a) Mean of data set A: $\mu_A = \frac{\sum x_i}{N} = \frac{9 + 10 + 11 + 7 + 13}{5} = \frac{50}{5} = 10$

Mean of data set B: $\mu_B = \frac{\sum x_i}{N} = \frac{10 + 10 + 10 + 10 + 10}{5} = \frac{50}{5} = 10$

Mean of data set C: $\mu_C = \frac{\sum x_i}{N} = \frac{1 + 1 + 10 + 19 + 19}{5} = \frac{50}{5} = 10$

b) Standard deviation of data set A: $\sigma_A = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} = \sqrt{\frac{520 - 500}{5}} = \sqrt{4} = 2$

Standard deviation of data set B: $\sigma_B = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} = \sqrt{\frac{500 - 500}{5}} = \sqrt{0} = 0$

Standard deviation of data set C: $\sigma_C = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} = \sqrt{\frac{824 - 500}{5}} = \sqrt{64.8} = 8.05$

c) Data set C has the largest standard deviation.


d) Yes, since data Set C has data values that are further away from the mean compared to sets A and B.
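
The results for the three data sets can be checked with a short Python sketch. It uses `statistics.pstdev`, the population standard deviation (divisor N), because the solution treats each set as a population; the dictionary layout is just one convenient way to organize the check.

```python
import statistics

data_sets = {
    "A": [9, 10, 11, 7, 13],
    "B": [10, 10, 10, 10, 10],
    "C": [1, 1, 10, 19, 19],
}

# (mean, population standard deviation) for each data set
results = {
    name: (statistics.mean(xs), statistics.pstdev(xs))
    for name, xs in data_sets.items()
}

# Name of the set with the largest standard deviation
largest = max(results, key=lambda name: results[name][1])

for name, (mu, sigma) in results.items():
    print(f"{name}: mean = {mu}, sigma = {sigma:.2f}")
print("largest standard deviation:", largest)
```

All three sets share the same mean, so the standard deviation alone distinguishes how widely each set is spread.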

Example (2): The exam grades of a sample of 9 students in a class are: 4, 10, 7, 7, 6, 9, 3, 8, 9
Calculate: i) The range ii) The mode iii) The mean iv) The median
v) The variance and standard deviation vi) The coefficient of variation.
Solution
First, we arrange the data in ascending order: 3, 4, 6, 7, 7, 8, 9, 9, 10
i) The range = difference between the max. and min. grades = 10 − 3 = 7
ii) The given data set has 2 modes: 7 and 9
iii) The mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{3 + 4 + 6 + 7 + 7 + 8 + 9 + 9 + 10}{9} = \frac{63}{9} = 7$
iv) Position of the middle term $= \frac{n+1}{2} = \frac{9+1}{2} = 5$, so the median = the value of the 5th term = 7
v) The variance: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{485 - 441}{8} = 5.5$

The standard deviation: $s = \sqrt{s^2} = \sqrt{5.5} = 2.35$

vi) The coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\% = \frac{2.35}{7} \times 100\% = 33.57\%$
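
As a cross-check, the same six quantities can be sketched with Python's `statistics` module. The functions `variance` and `stdev` use the sample formulas with divisor n − 1, which is the convention used in this example.

```python
import statistics

grades = [4, 10, 7, 7, 6, 9, 3, 8, 9]

data_range = max(grades) - min(grades)
modes = sorted(statistics.multimode(grades))
mean = statistics.mean(grades)
median = statistics.median(grades)
variance = statistics.variance(grades)   # sample variance (divisor n - 1)
std_dev = statistics.stdev(grades)       # square root of the sample variance
cv = std_dev / mean * 100                # coefficient of variation in percent

print(data_range, modes, mean, median, variance, round(std_dev, 2), round(cv, 2))
```

The CV computed from the unrounded standard deviation is about 33.50 %; the hand calculation gives 33.57 % because it rounds s to 2.35 before dividing.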

Example (3): The duration of time from the first exposure to HIV infection to AIDS diagnosis is called the
incubation period.
The incubation periods of a random sample of 7 HIV-infected individuals are given below (in years):
12.0, 10.5, 9.5, 6.3, 13.5, 12.5, 7.2
Calculate: i) The range and mode. ii) The sample mean and median.
iii) The variance and standard deviation. iv) The coefficient of variation.
Solution
First we arrange the data in ascending order:
6.3, 7.2, 9.5, 10.5, 12.0, 12.5, 13.5
i) The range is the difference between the max. and min. incubation periods, so Range = 13.5 − 6.3 = 7.2 years.
There is no mode, since no value is repeated.

ii) The mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{6.3 + 7.2 + 9.5 + 10.5 + 12 + 12.5 + 13.5}{7} = \frac{71.5}{7} = 10.21$

The median is the value of the middle term, which is the $\frac{7+1}{2} = 4$th term; therefore the median = 10.5

iii) Variance: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{774.53 - 730.32}{7-1} = \frac{44.21}{6} = 7.37$

The standard deviation: $s = \sqrt{s^2} = \sqrt{7.37} = 2.71$

iv) The coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\% = \frac{2.71}{10.21} \times 100\% = 26.54\%$

(2) GROUPED DATA
Grouped data are data that have been organized into groups or classes (into a frequency distribution).
1- Estimating the range for grouped data
Range = center of final class − center of initial class

2- Estimating the mode of grouped data

We have defined the mode as the value that has the highest frequency in a given data set. In grouped
data, we distinguish the modal class, which is the class with the highest frequency, from the mode
itself, which we estimate from the modal class using the following formula:

$Mode = L_{mo} + \left( \frac{\Delta_-}{\Delta_- + \Delta_+} \right) w$
where
w is the class width = (upper class limit) − (lower class limit) + 1, for classes with integer limits as in the examples below,
$\Delta_-$ is the difference between the frequency of the modal class and the frequency of the class before it,
$\Delta_+$ is the difference between the frequency of the modal class and the frequency of the class after it,
$L_{mo}$ is the lower boundary of the modal class.
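
The mode formula can be sketched in Python as follows. The function name `grouped_mode` and its argument names are hypothetical. The sketch assumes integer class limits with a gap of 1 between consecutive classes, so each lower boundary lies 0.5 below the lower limit, and it treats the frequency of a missing neighbouring class as 0.

```python
def grouped_mode(lower_limits, upper_limits, freqs):
    """Estimate the mode of grouped data with integer class limits.

    Assumes consecutive classes separated by a gap of 1 (e.g. 1-10, 11-20).
    """
    # Index of the modal class (the first one if several share the top frequency)
    i = max(range(len(freqs)), key=freqs.__getitem__)

    lower_boundary = lower_limits[i] - 0.5
    width = upper_limits[i] - lower_limits[i] + 1

    # A class outside the table is treated as having frequency 0
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0

    d_minus = freqs[i] - before
    d_plus = freqs[i] - after
    if d_minus + d_plus == 0:
        raise ValueError("the mode formula is undefined for this table")
    return lower_boundary + d_minus / (d_minus + d_plus) * width


# Illustrative table: classes 1-10, 11-20, ..., 41-50
mode = grouped_mode([1, 11, 21, 31, 41], [10, 20, 30, 40, 50], [8, 14, 12, 9, 7])
print(mode)
```

Passing the class limits rather than the boundaries keeps the call close to how the frequency tables are written.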

Example (1): The following table displays the frequency distribution of the daily commuting time (in minutes)
from home to the 6th of October campus for 50 staff members working at MSA. Find the range and mode.

Daily commuting time t (min) | 1 ≤ t ≤ 10 | 11 ≤ t ≤ 20 | 21 ≤ t ≤ 30 | 31 ≤ t ≤ 40 | 41 ≤ t ≤ 50
Number of staff members (f) | 8 | 14 | 12 | 9 | 7

Solution
i) Range = center of final class − center of initial class

$Range = \frac{41 + 50}{2} - \frac{1 + 10}{2} = \frac{91}{2} - \frac{11}{2} = 45.5 - 5.5 = 40$ minutes

ii) From the table, the modal class (the class with the highest frequency) is [11 : 20]; therefore,

$L_{mo} = \frac{10 + 11}{2} = 10.5$, $w = (20 - 11) + 1 = 10$

$\Delta_- = 14 - 8 = 6$, $\Delta_+ = 14 - 12 = 2$

$Mode = L_{mo} + \left( \frac{\Delta_-}{\Delta_- + \Delta_+} \right) w = 10.5 + \frac{6}{6 + 2} \times 10 = 10.5 + \frac{3}{4} \times 10 = 18$ minutes

Example (2): The following table shows the frequencies of ages for 4 age groups. Find the range and
mode.

Age group t (years) | 21 ≤ t ≤ 30 | 31 ≤ t ≤ 40 | 41 ≤ t ≤ 50 | 51 ≤ t ≤ 60
Frequency for age group | 1 | 10 | 5 | 15

Solution
i) Range = center of final class − center of initial class

$Range = \frac{60 + 51}{2} - \frac{30 + 21}{2} = \frac{111}{2} - \frac{51}{2} = 55.5 - 25.5 = 30$

ii) From the table, the modal class (the class with the highest frequency) is [51 : 60]; therefore,

$L_{mo} = \frac{50 + 51}{2} = 50.5$, $w = (60 - 51) + 1 = 10$

$\Delta_- = 15 - 5 = 10$, $\Delta_+ = 15 - 0 = 15$ (there is no class after the modal class, so its frequency is taken as 0)

$Mode = L_{mo} + \left( \frac{\Delta_-}{\Delta_- + \Delta_+} \right) w = 50.5 + \frac{10}{10 + 15} \times 10 = 50.5 + \frac{10}{25} \times 10 = 54.5$

3- Estimating the median from grouped data

We have defined the median as the value of the middle term of an ordered data set; that is, it
divides the ordered data into two equal parts. In other words, it is the point on the scale that separates the
upper half of the distribution from the lower half.
To find the median, we apply the following steps:
Step 1: Construct the cumulative frequency distribution.
Step 2: Identify the median class (the class that contains the median), which is the first class whose
cumulative frequency is at least N/2.
Step 3: Find the median by using the following formula:
$Median = L_{me} + \left( \frac{(N/2) - B}{G} \right) w$
where:
$L_{me}$ is the lower class boundary of the median class,
N is the total number of values,
B is the cumulative frequency of the classes before the median class,
G is the frequency of the median class,
w is the class width.
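
The three steps can be sketched in Python as follows. The function name `grouped_median` and its arguments are hypothetical, and the sketch assumes integer class limits with a gap of 1 between consecutive classes, so the lower boundary is the lower limit minus 0.5.

```python
from itertools import accumulate


def grouped_median(lower_limits, upper_limits, freqs):
    """Estimate the median of grouped data with integer class limits.

    Assumes consecutive classes separated by a gap of 1 (e.g. 51-55, 56-60).
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("the table contains no observations")

    # Step 1: cumulative frequency distribution
    cumulative = list(accumulate(freqs))

    # Step 2: median class = first class whose cumulative frequency is at least N/2
    i = next(k for k, c in enumerate(cumulative) if c >= n / 2)

    # Step 3: apply the median formula
    lower_boundary = lower_limits[i] - 0.5
    width = upper_limits[i] - lower_limits[i] + 1
    before = cumulative[i - 1] if i > 0 else 0   # B in the formula
    return lower_boundary + (n / 2 - before) / freqs[i] * width


# Illustrative table: classes 51-55, 56-60, 61-65, 66-70
median = grouped_median([51, 56, 61, 66], [55, 60, 65, 70], [2, 7, 8, 4])
print(median)
```

The comments map each part of the function to Steps 1 to 3, with `before` playing the role of B and `freqs[i]` the role of G.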

Example (1): The following table displays the frequency distribution of the daily commuting time (in minutes)
from home to work for a sample of 21 employees in a certain company. Find the range and median.

Time in min (t) | 51 ≤ t ≤ 55 | 56 ≤ t ≤ 60 | 61 ≤ t ≤ 65 | 66 ≤ t ≤ 70
Frequency (f) | 2 | 7 | 8 | 4

Solution

Time in min (t) | 51 ≤ t ≤ 55 | 56 ≤ t ≤ 60 | 61 ≤ t ≤ 65 | 66 ≤ t ≤ 70
Frequency (f) | 2 | 7 | 8 | 4
Cumulative frequency | 2 | 9 | 17 | 21

i) Range = center of final class − center of initial class

$Range = \frac{70 + 66}{2} - \frac{55 + 51}{2} = \frac{136}{2} - \frac{106}{2} = 68 - 53 = 15$

ii) $\frac{N}{2} = \frac{21}{2} = 10.5$; therefore, the median class is [61 : 65] (the class that contains the median).

In our case, $L_{me} = 60.5$, N = 21, B = 9, G = 8, w = 5

Therefore, $Median = L_{me} + \left( \frac{(N/2) - B}{G} \right) w = 60.5 + \frac{(21/2) - 9}{8} \times 5 = 60.5 + 0.9375 = 61.4375$

Example (2): The following table shows the frequencies per age group. Find the median of ages.

Age group t (years) | 21 ≤ t ≤ 30 | 31 ≤ t ≤ 40 | 41 ≤ t ≤ 50 | 51 ≤ t ≤ 60
Frequency for age group | 1 | 10 | 5 | 15

Solution

Age group t (years) | 21 ≤ t ≤ 30 | 31 ≤ t ≤ 40 | 41 ≤ t ≤ 50 | 51 ≤ t ≤ 60
Frequency for age group | 1 | 10 | 5 | 15
Cumulative frequency | 1 | 11 | 16 | 31

$\frac{N}{2} = \frac{31}{2} = 15.5$; therefore, the median class is [41 : 50] (the class that contains the median).

In our example, $L_{me} = 40.5$, N = 31, B = 11, G = 5, w = 10

Therefore, $Median = L_{me} + \left( \frac{(N/2) - B}{G} \right) w = 40.5 + \frac{(31/2) - 11}{5} \times 10 = 40.5 + \frac{4.5}{5} \times 10 = 40.5 + 9 = 49.5$
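Mean, Variance, Standard deviation and Coefficient of variation for Grouped Data:
(m is the class midpoint and f is the class frequency.)

Measures | Sample | Population
Mean | $\bar{x} = \frac{\sum mf}{n}$ | $\mu = \frac{\sum mf}{N}$
Variance | $s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n-1}$ | $\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}$
Standard deviation | $s = \sqrt{\frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n-1}}$ | $\sigma = \sqrt{\frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}}$
Coefficient of variation | $CV = \frac{s}{\bar{x}} \times 100\%$ | $CV = \frac{\sigma}{\mu} \times 100\%$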
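
A minimal Python sketch of the grouped-data formulas in the table follows. The function name `grouped_summary` and its `sample` flag are assumptions made for illustration; the midpoints and frequencies passed in are those of a four-class table with classes 10-12, 13-15, 16-18 and 19-21.

```python
import math


def grouped_summary(midpoints, freqs, sample=True):
    """Return (mean, variance, standard deviation, CV %) for grouped data.

    Uses sum(m*f) and sum(m^2*f); the divisor is n - 1 for a sample
    and N for a population.
    """
    n = sum(freqs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))

    mean = sum_mf / n
    variance = (sum_m2f - sum_mf ** 2 / n) / (n - 1 if sample else n)
    std_dev = math.sqrt(variance)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    cv = std_dev / abs(mean) * 100
    return mean, variance, std_dev, cv


mean, variance, std_dev, cv = grouped_summary([11, 14, 17, 20], [4, 12, 20, 14])
print(round(mean, 2), round(variance, 3), round(std_dev, 2))
```

For this table the sketch gives a mean of 16.64, a sample variance of about 7.582 and a standard deviation of about 2.75.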

Example (3): The following table gives the frequency distribution of the number of orders received each
day during the past 50 days at the office of a mail-order company.

Number of orders per day (x) | 10 ≤ x ≤ 12 | 13 ≤ x ≤ 15 | 16 ≤ x ≤ 18 | 19 ≤ x ≤ 21
Number of days (f) | 4 | 12 | 20 | 14

Calculate: i) Range. ii) Mean. iii) Mode.
iv) Median. v) Variance and standard deviation. vi) Coefficient of variation.
Solution

Intervals | 10 ≤ x ≤ 12 | 13 ≤ x ≤ 15 | 16 ≤ x ≤ 18 | 19 ≤ x ≤ 21 | Total
Midpoint (m) | 11 | 14 | 17 | 20 |
Frequency (f) | 4 | 12 | 20 | 14 | 50
Cumulative frequency | 4 | 16 | 36 | 50 |
mf | 44 | 168 | 340 | 280 | $\sum mf = 832$
m²f | 484 | 2352 | 5780 | 5600 | $\sum m^2 f = 14216$

i) Range = center of final class − center of initial class

$Range = \frac{21 + 19}{2} - \frac{12 + 10}{2} = \frac{40}{2} - \frac{22}{2} = 20 - 11 = 9$

ii) Mean: $\bar{x} = \frac{\sum mf}{n} = \frac{832}{50} = 16.64$

iii) From the table, the modal class (the class with the highest frequency) is [16 : 18]; therefore,

$L_{mo} = \frac{15 + 16}{2} = 15.5$, $w = (18 - 16) + 1 = 2 + 1 = 3$

$\Delta_- = 20 - 12 = 8$, $\Delta_+ = 20 - 14 = 6$

$Mode = L_{mo} + \left( \frac{\Delta_-}{\Delta_- + \Delta_+} \right) w = 15.5 + \frac{8}{8 + 6} \times 3 = 15.5 + \frac{4}{7} \times 3 = 17.214$

iv) $\frac{N}{2} = \frac{50}{2} = 25$; therefore, the median class is [16 : 18] (the class that contains the median).

In our example, $L_{me} = 15.5$, N = 50, B = 16, G = 20, w = 3. Therefore,

$Median = L_{me} + \left( \frac{(N/2) - B}{G} \right) w = 15.5 + \frac{25 - 16}{20} \times 3 = 15.5 + 1.35 = 16.85$

v) Variance: $s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n-1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = \frac{14216 - 13844.48}{49} = 7.582$

Standard deviation: $s = \sqrt{s^2} = \sqrt{7.582} = 2.75$

Thus, the standard deviation of the number of orders received per day at the office of this mail-order company
during the past 50 days is 2.75.

vi) The coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\% = \frac{2.75}{16.64} \times 100\% = 16.53\%$

MSA University Module Title: Probability and Statistics
Faculty of Engineering Module Code: MAT 361

Exercise (1)
(1) i) If the mean of the data set {3, 6, 12, x} is 9, find the value of x.
ii) If the mode of the data set {1, 0, 2, 5, 9, 11, 13, 14, 7, x} is 0, find the value of x.
iii) If the median of {2, 3, 11, 1, 6, 14, x, 8, 9} is 8, is it true that the value of x is greater than 8?

(2) Find the values of x and y so that the ordered data set {17, 22, 26, 29, 34, x, 42, 67, 70, y}
has a mean of 42 and a median of 35.

(3) The grades of a sample of 9 students in a Math exam are: 1, 3, 22, 7, 17, 9, 10, 4, 17. Calculate:
i) The mode and range of the grades. ii) The mean and median of the grades.
iii) The variance and standard deviation of the grades. iv) The coefficient of variation.

(4) Based on the grouped data below, find the range, mode and median. Ans: [40, 18, 23]

Time to travel to work (h) | 1 ≤ h ≤ 10 | 11 ≤ h ≤ 20 | 21 ≤ h ≤ 30 | 31 ≤ h ≤ 40 | 41 ≤ h ≤ 50
Frequency | 8 | 14 | 12 | 9 | 7

(5) The following table shows the distribution of the number of hours worked each week (on average) for a
sample of 100 community college students.

Hours worked (h) | 0 ≤ h ≤ 9 | 10 ≤ h ≤ 19 | 20 ≤ h ≤ 29 | 30 ≤ h ≤ 39 | 40 ≤ h ≤ 49
No. of students | 24 | 14 | 39 | 18 | 5

Calculate: i) Range. ii) Mean. iii) Mode. iv) Median. v) Variance and standard deviation.

(6) Consider the following frequency table for a population of marks obtained in a test by 88 students.
Marks (x) | 0 ≤ x ≤ 9 | 10 ≤ x ≤ 19 | 20 ≤ x ≤ 29 | 30 ≤ x ≤ 39 | 40 ≤ x ≤ 49
Frequency (f) | 6 | 16 | 24 | 25 | 17

Calculate: i) Range. ii) Mean. iii) Mode. iv) Median. v) Variance and standard deviation.

Dr. Mohamed Said.

