Module 1.3 - Central Tendency and Variability of Data

The document provides an overview of probability and statistics, focusing on measures of central tendency, including mean, median, and mode, as well as variability of data through measures such as range, variance, and standard deviation. It includes definitions, formulas, and examples for calculating these statistical measures for both ungrouped and grouped data distributions. Additionally, it discusses the importance of understanding data distribution shapes and variability in data analysis.

Uploaded by

Chyrra Macatula
Copyright
© All Rights Reserved

IE 101

ENGINEERING DATA ANALYSIS

Module 1 – Part 1: Probability and Statistics
Review of Basic Concepts of Central Tendency
Measures of Central Tendency
o To obtain a feel for a large amount of data, it is useful
to be able to summarize it by some suitably chosen
measures.
o In this section, we present some summarizing
statistics, where a statistic is a numerical quantity
whose value is determined by the data.
o To start, we introduce some statistics that are used
for describing the center of a set of data values.
Measures of Central Tendency
o A statistic is a characteristic or measure obtained by
using the data values from a sample.
o A parameter is a characteristic or measure obtained
by using the data values from a specific population.
Mean
o The mean is defined to be the sum of the data values divided by the total number of values.

o For the sample:

$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum X}{n}$

The symbol $\bar{X}$ represents the sample mean and is read as “X-bar”. The Greek symbol $\Sigma$ is read as “sigma” and it means “to sum”. “n” is the sample size.
Mean
o For a finite population of values:

$\mu = \frac{X_1 + X_2 + \dots + X_N}{N}$

The symbol $\mu$ represents the population mean and is read as “mu”. “N” is the size of the population.
Mean
o For ungrouped frequency distribution:

$\bar{X} = \frac{\sum (f \cdot X)}{n}$

$f$ is the frequency for the corresponding value of $X$, and $n = \sum f$.
Mean
o For ungrouped frequency distribution:

Example: The scores for 25 students on a 4-point quiz are given in the table. Find the mean score.

Scores, X    Frequency, f
0            2
1            4
2            12
3            4
4            3
Mean
o For ungrouped frequency distribution:

Solution:

Scores, X    Frequency, f    f·X
0            2               0
1            4               4
2            12              24
3            4               12
4            3               12
Total        n = 25          52

$\bar{X} = \frac{\sum (f \cdot X)}{n} = \frac{52}{25} = 2.08$
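The frequency-weighted mean above can be checked with a few lines of Python. This is only a minimal sketch; the function name `frequency_mean` is an illustrative choice and does not come from the module.

```python
def frequency_mean(values, freqs):
    """Mean of an ungrouped frequency distribution: sum(f * X) / n."""
    n = sum(freqs)  # the sample size is the total frequency
    return sum(f * x for x, f in zip(values, freqs)) / n


scores = [0, 1, 2, 3, 4]
freqs = [2, 4, 12, 4, 3]
print(frequency_mean(scores, freqs))  # 52 / 25 = 2.08
```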
Mean
o For grouped frequency distribution:

$\bar{X} = \frac{\sum (f \cdot X_m)}{n}$

Here $X_m$ is the corresponding class midpoint.
Mean
o For grouped frequency distribution:

Example: Given the table below, find the mean.

Class Frequency, f
15.5 – 20.5 3
20.5 – 25.5 5
25.5 – 30.5 4
30.5 – 35.5 3
35.5 – 40.5 2
Mean
o For grouped frequency distribution:

Solution:

Class          Frequency, f    Midpoint, X_m    f·X_m
15.5 – 20.5    3               18               54
20.5 – 25.5    5               23               115
25.5 – 30.5    4               28               112
30.5 – 35.5    3               33               99
35.5 – 40.5    2               38               76
Total          n = 17                           456

$\bar{X} = \frac{\sum (f \cdot X_m)}{n} = \frac{456}{17} \approx 26.82$
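The grouped-data mean can be sketched the same way, with each class represented by its midpoint. The helper name `grouped_mean` and the choice to store each class as a (lower boundary, upper boundary) pair are assumptions made for this illustration.

```python
def grouped_mean(classes, freqs):
    """Mean of a grouped frequency distribution using class midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)  # total frequency
    return sum(f * xm for xm, f in zip(midpoints, freqs)) / n


classes = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [3, 5, 4, 3, 2]
print(round(grouped_mean(classes, freqs), 2))  # 456 / 17 ≈ 26.82
```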
Median
o When a data set is ordered, it is called a data array.

o The median is defined to be the midpoint of the data array.

o The symbol used to denote the median is MD.

o When there is an even number of values in the data set, the median is obtained by taking the average of the two middle numbers.
Median
o Example: The weights (in pounds) of seven army recruits
are 180, 201, 220, 191, 219, 209, and 186. Find the
median.

o Solution:

o Arrange the data in order and select the middle point.

o Data array: 180, 186, 191, 201, 209, 219, 220.

o The median, MD = 201 lbs.


Median
o Example: The weights (in pounds) of eight army recruits are 180, 201, 220, 191, 219, 209, 186, and 202. Find the median.

o Solution:

o Arrange the data in order and select the middle point.

o Data array: 180, 186, 191, 201, 202, 209, 219, 220.

o The median, MD = (201 + 202)/2 = 201.5 lbs
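Both median examples follow the same procedure: sort the data, then take the middle value or the average of the two middle values. A minimal Python sketch (the function name `median` is an illustrative choice) is:

```python
def median(data):
    """Median of raw data: middle value, or average of the two middle values."""
    array = sorted(data)  # the ordered data array
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]  # odd n: single middle value
    return (array[mid - 1] + array[mid]) / 2  # even n: average the middle pair


print(median([180, 201, 220, 191, 219, 209, 186]))       # 201
print(median([180, 201, 220, 191, 219, 209, 186, 202]))  # 201.5
```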


Median
o For ungrouped frequency distribution:

o For an ungrouped frequency distribution, find the median by examining the cumulative frequencies to locate the middle value.

o If n is the sample size, compute n/2. Locate the data point where n/2 values fall below, and n/2 values fall above.
Median
o For ungrouped frequency distribution:

o Example: LRJ Appliance recorded the number of VCRs sold per week over a one-year period. The data are given below.

No. of Sets Sold    Frequency, f    Cumulative Frequency
1                   4               4
2                   9               13
3                   6               19
4                   2               21
5                   3               24

$\frac{n}{2} = \frac{24}{2} = 12$

MD = 2 sets sold. This class contains the 5th through 13th values, including the 12th.
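The cumulative-frequency search can be sketched in Python as follows. The function name `frequency_median` is hypothetical, and the sketch assumes the values are listed in ascending order. It returns the first value whose cumulative frequency reaches n/2; if the cumulative frequency equals n/2 exactly for an even n, the two middle observations straddle two values and this sketch simply returns the lower one.

```python
from itertools import accumulate


def frequency_median(values, freqs):
    """Median value of an ungrouped frequency distribution (values ascending)."""
    n = sum(freqs)
    # Walk the running (cumulative) frequencies until n/2 is reached.
    for x, cumulative in zip(values, accumulate(freqs)):
        if cumulative >= n / 2:
            return x


sets_sold = [1, 2, 3, 4, 5]
freqs = [4, 9, 6, 2, 3]
print(frequency_median(sets_sold, freqs))  # 2
```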
Median
o For grouped frequency distribution:

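$MD = \frac{\frac{n}{2} - cf}{f}(w) + L_m$

where $n$ = sum of the frequencies, $cf$ = cumulative frequency of the class immediately preceding the median class, $f$ = frequency of the median class, $w$ = class width, and $L_m$ = lower boundary of the median class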
Median
o For grouped frequency distribution:

o Example: Given the table below, find the median.

Class          Frequency, f    Cumulative Frequency
15.5 – 20.5    3               3
20.5 – 25.5    5               8
25.5 – 30.5    4               12
30.5 – 35.5    3               15
35.5 – 40.5    2               17

$\frac{n}{2} = \frac{17}{2} = 8.5 \approx 9$

The 9th value falls in the class 25.5 – 30.5. This will be the median class.

$MD = \frac{\frac{17}{2} - 8}{4}(5) + 25.5 = 26.125$
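The grouped-median computation in this example can be reproduced with a short Python sketch. The name `grouped_median` and the representation of each class as a (lower boundary, upper boundary) pair are assumptions for illustration; the arithmetic follows the worked example (n/2 minus the cumulative frequency before the median class, divided by the median-class frequency, times the class width, plus the lower boundary).

```python
def grouped_median(classes, freqs):
    """Median of a grouped frequency distribution by interpolation."""
    n = sum(freqs)
    cum_before = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum_before + f >= n / 2:  # this is the median class
            width = upper - lower
            return (n / 2 - cum_before) / f * width + lower
        cum_before += f


classes = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [3, 5, 4, 3, 2]
print(grouped_median(classes, freqs))  # (8.5 - 8) / 4 * 5 + 25.5 = 26.125
```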
Mode
o The mode is defined to be the value that occurs most
often in a data set.

o A data set can have more than one mode.

o A data set is said to have no mode if all values occur with equal frequency.
Mode
Example: The following data represent the duration (in
days) of U.S. space shuttle voyages for the years 1992-94.
Find the mode.

o Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11.

Solution:

o Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14.

o Mode = 8 days
Mode
Example: Six strains of bacteria were tested to see how
long they could remain alive outside their normal
environment. The time, in minutes, is given below. Find the
mode.

o Data set: 2, 3, 5, 7, 8, 10.

Solution:

o There is no mode since each data value occurs equally with a frequency of one.
Mode
Example: Eleven different automobiles were tested at a
speed of 15 mph for stopping distances. The distance, in
feet, is given below. Find the mode.

o Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26.

Solution:

o There are two modes (bimodal). The values are 18 and 24.
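The three mode examples (one mode, no mode, two modes) can be covered by one small Python sketch. The function name `modes` is illustrative, and returning an empty list for "no mode" is a convention chosen here, not something the module prescribes. The sketch assumes a non-empty data set.

```python
from collections import Counter


def modes(data):
    """Return the list of modes; an empty list means the data set has no mode."""
    counts = Counter(data)
    top = max(counts.values())  # highest frequency observed
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return sorted(x for x, c in counts.items() if c == top)


shuttle = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
bacteria = [2, 3, 5, 7, 8, 10]
stopping = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]
print(modes(shuttle))   # [8]
print(modes(bacteria))  # []
print(modes(stopping))  # [18, 24]
```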
Mode
o For ungrouped frequency distribution:

o Find the class with the highest frequency.

Example: The scores for 25 students on a 4-point quiz are given in the table. Find the mode.

Scores, X Frequency, f
0 2
1 4
2 12
3 4
4 3

Mode = 2
Mode
o For grouped frequency distribution:

o The mode for grouped data is the modal class.

o The modal class is the class with the largest frequency.

o Sometimes the midpoint of the class is used rather than the boundaries.
Mode
o For grouped frequency distribution:

Example: Given the table below, find the mode.

Class Frequency, f
15.5 – 20.5 3
20.5 – 25.5 5
25.5 – 30.5 4
30.5 – 35.5 3
35.5 – 40.5 2

Modal Class = 20.5 – 25.5
Midrange
o The midrange is found by adding the lowest and highest
values in the data set and dividing by 2.

o The midrange is a rough estimate of the middle value of the data.

o The symbol that is used to represent the midrange is MR.

Example: The weights (in pounds) of seven army recruits are 180, 201, 220, 191, 219, 209, and 186. Find the midrange.

Solution: MR = (180 + 220)/2 = 200 lbs.
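As a quick check of the midrange definition, a minimal Python sketch (the function name `midrange` is an illustrative choice) applied to the recruit weights is:

```python
def midrange(data):
    """Midrange: (lowest value + highest value) / 2."""
    return (min(data) + max(data)) / 2


weights = [180, 201, 220, 191, 219, 209, 186]
print(midrange(weights))  # (180 + 220) / 2 = 200.0
```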
Distribution Shapes
o Frequency distributions can assume many shapes.
o The three most important shapes are positively skewed,
symmetrical, and negatively skewed.

[Figures: positively skewed, symmetrical, and negatively skewed distributions]
Variability of Data
Measures of Variation
o We have presented statistics that describe the central tendencies of a data set.

o We are also interested in ones that describe the spread or variability of the data values.

o A statistic that could be used for this purpose would be one that measures the average value of the squares of the distances between the data values and the sample mean.
Measures of Variation
o Illustration:

        SET A                SET B
No.     Score        No.     Score
1       12           1       15
2       10           2       9
3       13           3       10
4       12           4       7
5       15           5       16
6       14           6       17
7       12           7       18
8       15           8       19
9       12           9       8
10      10           10      6
Mean    12.5         Mean    12.5
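The illustration shows two sets with the same mean but visibly different spread. A short Python sketch makes this concrete by reporting the mean, the range, and the standard deviation of each set. The helper name `summarize` is hypothetical, and treating the scores as samples (divisor n - 1) is an assumption, since the slide does not say whether they are samples or populations.

```python
import statistics


def summarize(scores):
    """Return (mean, range, sample standard deviation) for a list of scores."""
    return (
        statistics.mean(scores),
        max(scores) - min(scores),
        statistics.stdev(scores),  # sample SD with divisor n - 1 (assumed here)
    )


set_a = [12, 10, 13, 12, 15, 14, 12, 15, 12, 10]
set_b = [15, 9, 10, 7, 16, 17, 18, 19, 8, 6]
for name, scores in (("Set A", set_a), ("Set B", set_b)):
    mean, value_range, sd = summarize(scores)
    print(f"{name}: mean={mean}, range={value_range}, sd={sd:.2f}")
```

Both sets report a mean of 12.5, while Set B has the larger range and standard deviation, which is exactly the difference the central-tendency measures alone cannot show.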
Measures of Variation
o Variance and SD can be used to determine the spread of
the data.

o If the variance or SD is large, the data are more dispersed.

o The variance and SD are used to determine the consistency of a variable.

o In the manufacture of fittings, such as nuts and bolts, the variation in diameter must be small, or the parts will not fit together.

o The variance and SD are used to determine the number of data values that fall within a specified interval in a distribution.
Range
o The range is defined to be the highest value minus the
lowest value. The symbol R is used for the range.

o R = highest value – lowest value

o Extremely large or extremely small data values can drastically affect the range.
Population Variance
o The variance is the average of the squares of the distance each value is from the mean.

o The symbol for the population variance is $\sigma^2$ ($\sigma$ is the Greek lowercase letter sigma).

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Population Standard Deviation
o The standard deviation is the square root of the variance.

$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Population Variance and Standard Deviation
Example: Consider the following data to constitute the population: 10, 60, 50, 30, 40, 20. Find the mean and variance.

Solution:

$\mu = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35$

X      X − μ    (X − μ)²
10     −25      625
60     +25      625
50     +15      225
30     −5       25
40     +5       25
20     −15      225
Total           1750

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} = 291.67$

$\sigma = \sqrt{\sigma^2} = \sqrt{291.67} = 17.08$

Note: Do not round off at an early stage of the computation. For the standard deviation, use the exact value of the variance from your calculator. The two decimal places are only to show the solution.
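The population variance and standard deviation in this example can be verified with a minimal Python sketch. The function name `population_variance` is an illustrative choice; note that the square root is taken from the unrounded variance, as the note on rounding recommends.

```python
import math


def population_variance(data):
    """Population variance: average squared distance from the population mean."""
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N


population = [10, 60, 50, 30, 40, 20]
variance = population_variance(population)
sd = math.sqrt(variance)  # uses the exact variance, not the rounded one
print(round(variance, 2), round(sd, 2))  # 291.67 17.08
```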
Sample Variance
o The sample variance is an unbiased estimator of the population variance: a statistic whose expected value equals the population variance. It is denoted by $s^2$, where

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Alternative Formula:

$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$

where $n$ is the sample size.
Sample Standard Deviation
o The sample standard deviation is the square root of the sample variance.

$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

o Alternative Formula:

$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$
Sample Variance and Sample Standard
Deviation
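Example: Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14.

Solution:

$\sum X = 79$, $\sum X^2 = 1263$, $n = 5$

$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} = \frac{1263 - \frac{(79)^2}{5}}{5 - 1} = 3.7$

$s = \sqrt{3.7} = 1.92$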
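The definitional formula and the alternative (shortcut) formula for the sample variance should agree. The Python sketch below computes both for this sample; the function names are illustrative and not part of the module.

```python
import math


def sample_variance(data):
    """Definitional form: sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Alternative form: (sum of X^2 - (sum of X)^2 / n) divided by n - 1."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


sample = [16, 19, 15, 15, 14]
print(round(sample_variance(sample), 2))             # 3.7
print(round(sample_variance_shortcut(sample), 2))    # 3.7
print(round(math.sqrt(sample_variance(sample)), 2))  # 1.92
```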
Sample Standard Deviation
o For ungrouped frequency distribution:

o Use the actual observed value.

$s^2 = \frac{\sum (f \cdot X^2) - \frac{(\sum f \cdot X)^2}{n}}{n - 1}$
Sample Standard Deviation
o For ungrouped frequency distribution:

Example: Find the sample variance of the data set below:

X     Frequency, f
5     2
6     3
7     8
8     1
9     6
10    4
Sample Standard Deviation
o For ungrouped frequency distribution:

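Solution:

X       f         f·X    f·X²
5       2         10     50
6       3         18     108
7       8         56     392
8       1         8      64
9       6         54     486
10      4         40     400
Total   n = 24    186    1500

$s^2 = \frac{\sum (f \cdot X^2) - \frac{(\sum f \cdot X)^2}{n}}{n - 1} = \frac{1500 - \frac{(186)^2}{24}}{24 - 1} = \frac{58.5}{23} \approx 2.54$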
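The column totals in this solution feed directly into the frequency form of the sample variance. A minimal Python sketch, with the hypothetical helper name `frequency_sample_variance`, is:

```python
def frequency_sample_variance(values, freqs):
    """Sample variance of an ungrouped frequency distribution (shortcut form)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))       # sum of f * X
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))  # sum of f * X^2
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


values = [5, 6, 7, 8, 9, 10]
freqs = [2, 3, 8, 1, 6, 4]
print(round(frequency_sample_variance(values, freqs), 2))  # 58.5 / 23 ≈ 2.54
```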
Sample Standard Deviation
o For grouped frequency distribution:

o Use the class midpoints, $X_m$.

$s^2 = \frac{\sum (f \cdot X_m^2) - \frac{(\sum f \cdot X_m)^2}{n}}{n - 1}$
Sample Standard Deviation
o For grouped frequency distribution:

Example: Find the variance and standard deviation for the frequency distribution:
Class Frequency
5.5 – 10.5 1
10.5 – 15.5 2
15.5 – 20.5 3
20.5 – 25.5 5
25.5 – 30.5 4
30.5 – 35.5 3
35.5 – 40.5 2
Sample Standard Deviation
o For grouped frequency distribution:

Solution:

Class          Frequency, f    X_m    f·X_m    f·X_m²
5.5 – 10.5     1               8      8        64
10.5 – 15.5    2               13     26       338
15.5 – 20.5    3               18     54       972
20.5 – 25.5    5               23     115      2645
25.5 – 30.5    4               28     112      3136
30.5 – 35.5    3               33     99       3267
35.5 – 40.5    2               38     76       2888
Total          n = 20                 490      13310

$s^2 = \frac{\sum (f \cdot X_m^2) - \frac{(\sum f \cdot X_m)^2}{n}}{n - 1} = \frac{13310 - \frac{(490)^2}{20}}{20 - 1} = 68.68$

$s = \sqrt{68.68} = 8.29$
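The grouped-data result can be reproduced with a short Python sketch that builds the midpoints from the class boundaries. The function name `grouped_sample_variance` and the (lower boundary, upper boundary) class representation are assumptions for illustration.

```python
import math


def grouped_sample_variance(classes, freqs):
    """Sample variance of grouped data, using class midpoints X_m."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    sum_fx = sum(f * xm for xm, f in zip(midpoints, freqs))        # sum of f * X_m
    sum_fx2 = sum(f * xm * xm for xm, f in zip(midpoints, freqs))  # sum of f * X_m^2
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]
variance = grouped_sample_variance(classes, freqs)
print(round(variance, 2))             # 1305 / 19 ≈ 68.68
print(round(math.sqrt(variance), 2))  # 8.29
```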
Coefficient of Variation
o The coefficient of variation is defined to be the standard
deviation divided by the mean. The result is expressed as a
percentage.
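A minimal Python sketch of the coefficient of variation follows. The function name `coefficient_of_variation` is illustrative, and using the sample standard deviation (divisor n - 1) is an assumption; for a population, the population standard deviation would be used instead. The data are the five sample values from the sample variance example.

```python
import statistics


def coefficient_of_variation(data):
    """Standard deviation divided by the mean, expressed as a percentage."""
    # Assumes sample data, so the sample SD (divisor n - 1) is used.
    return statistics.stdev(data) / statistics.mean(data) * 100


sample = [16, 19, 15, 15, 14]
print(f"{coefficient_of_variation(sample):.2f}%")
```

Because the result is unit-free, it can be used to compare the variability of data sets measured in different units or with very different means.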
Chebyshev’s Theorem
o The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - \frac{1}{k^2}$, where k is any number greater than 1.

o For k = 2, at least 75% of the values will lie within 2 standard deviations of the mean. For k = 3, at least approximately 89% will lie within 3 standard deviations.
Chebyshev’s Theorem
Example: The mean price of houses in a certain
neighborhood is $50,000, and the SD is $10,000. Find the
price range for which at least 75% of the houses will sell.

Solution:
For k = 2, at least 75% of the values will lie within 2 standard deviations of the mean.

$50,000 + 2($10,000) = $70,000
$50,000 − 2($10,000) = $30,000

Therefore, according to Chebyshev’s Theorem, at least 75% of the values will lie within the range of $30,000 to $70,000.
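Chebyshev’s bound and the corresponding interval can be sketched in Python as below. The function names `chebyshev_bound` and `chebyshev_interval` are hypothetical helpers introduced for this illustration.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, sd, k):
    """Interval (mean - k*sd, mean + k*sd) to which the bound applies."""
    return mean - k * sd, mean + k * sd


print(chebyshev_bound(2))                     # 0.75
print(round(chebyshev_bound(3), 4))           # 0.8889
print(chebyshev_interval(50_000, 10_000, 2))  # (30000, 70000)
```

The bound is a guaranteed minimum for any distribution, so the actual proportion inside the interval may be considerably higher.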
Empirical (Normal) Rule
For any bell-shaped distribution:

o Approximately 68% of the data values will fall within one standard deviation of the mean.

o Approximately 95% will fall within two standard deviations of the mean.

o Approximately 99.7% will fall within three standard deviations of the mean.
Empirical (Normal) Rule
Example: The following data gives the scores on a statistics
exam taken by engineering students.
90, 91, 94, 83, 85, 85, 87, 88, 72, 74, 74, 75, 77, 77, 78, 60,
62, 63, 64, 66, 66, 52, 55, 55, 56, 58, 43, 46
The corresponding histogram is approximately normal. Use it
to assess the empirical rule.

Solution:
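Note: Do not round off at an early stage of the computation. Use the exact value from your calculator. The two decimal places are only to show the solution.

$\bar{X} = 70.57$, $s = 14.35$ (sample standard deviation)

o At one standard deviation, approximately 68% are expected to fall within the range of 70.57 ± 14.35, that is, 56.22 to 84.93.

o At two standard deviations, approximately 95% are expected to fall within the range of 70.57 ± 2(14.35), that is, 41.86 to 99.28.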
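To assess the empirical rule on these exam scores, one can count how many scores actually fall within one, two, and three standard deviations of the mean and compare the observed shares with 68%, 95%, and 99.7%. The sketch below does this; the helper name `share_within` is hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics

scores = [90, 91, 94, 83, 85, 85, 87, 88, 72, 74, 74, 75, 77, 77, 78, 60,
          62, 63, 64, 66, 66, 52, 55, 55, 56, 58, 43, 46]

mean = statistics.mean(scores)
s = statistics.stdev(scores)  # sample SD (divisor n - 1), assumed here


def share_within(data, center, spread, k):
    """Count and proportion of values within k * spread of center."""
    low, high = center - k * spread, center + k * spread
    inside = sum(low <= x <= high for x in data)
    return inside, inside / len(data)


for k, expected in ((1, 0.68), (2, 0.95), (3, 0.997)):
    inside, share = share_within(scores, mean, s, k)
    print(f"k={k}: {mean - k * s:.2f} to {mean + k * s:.2f}, "
          f"{inside} of {len(scores)} scores ({share:.1%}), rule says about {expected:.1%}")
```

With only 28 observations, the observed shares need not match the rule closely; the rule describes an ideal bell-shaped distribution.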
