Describing The Data Using Numerical Measures
1. Compute the mean, median, mode, and weighted mean for a set of data and
understand what these values represent.
2. Compute the range, interquartile range, variance, and standard deviation and know
what these values mean.
3. Compute a z score and the coefficient of variation and understand how they
are applied in decision making situations.
4. Understand and apply some basic statistical and mathematical formulas in
analyzing numeric data behavior.
You could:
o You should:
Compute the summary mileage measures for the various tire brands
information
It can show us the center of data and degree of spread
o Parameter
A measure computed from the entire population
o Statistic
A measure computed from a sample that has been selected from a
population
The value of the statistic will depend on which sample is selected
1. Collect the data for the variable of interest for all items in the
population. The data must be quantitative.
= 121
IS 106 – QUANTITATIVE METHODS
Population Mean - Example
Parameter Measure
o The average for all values in the sample computed by dividing the
sum of all sample values by the sample size.
Extreme Value
Conclusion:
o With only one value in the sample changed, the mean is now substantially higher than before.
o Because the mean is affected by extreme values, it may be a misleading measure of data’s center.
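The effect of an extreme value on the mean can be checked with a short Python sketch. The sample values below are made up for illustration; they are not the data from the example above.

```python
from statistics import mean, median

# Hypothetical sample (illustrative values only).
sample = [22, 25, 27, 30, 31]
# Same sample with the last value replaced by an extreme value.
sample_with_extreme = [22, 25, 27, 30, 310]

mean_before = mean(sample)                # 135 / 5 = 27
mean_after = mean(sample_with_extreme)    # 414 / 5 = 82.8

# The middle value of the sorted data does not move.
median_before = median(sample)
median_after = median(sample_with_extreme)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

Changing one value moves the mean from 27 to 82.8, while the middle value of the sorted data stays at 27.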
$i = \frac{1}{2} n = \frac{1}{2}(7) = 3.5$, rounded up to $i = 4$
=4
o Symmetric Data
Datasets whose values are evenly spread around the center.
Right Skewed
Left Skewed
o Management Salaries.
Mean > Median
Right Skewed
Mode
o A data set may have more than one mode if multiple values tie for
the most frequently occurring values.
o In our example,
The modes are 2 and 4; each occurred 6 times.
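A minimal Python sketch of finding every mode when several values tie. The function name `modes` and the data list are assumptions made for illustration; the list is constructed only so that 2 and 4 each occur 6 times.

```python
from collections import Counter

def modes(values):
    """Return every value that ties for the highest frequency, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical data in which 2 and 4 each occur 6 times.
data = [2] * 6 + [4] * 6 + [1, 3, 3, 5]
result = modes(data)
print(result)
```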
PERCENTILES
QUARTILES
o The mean of data values that have been weighted according to their
relative importance.
o Consider this:
o The above formula is what we have used to compute for the mean of a sample.
o However, there will be times when some values are weighted more heavily than
others.
1. Collect the desired data and determine the weight to be assigned to each data
value.
o One of the most common uses of the weighted mean is in computing your
Grade Point Average (GPA).
1. Collect the desired data and determine the weight to be assigned to each data
value.
$\sum w_i x_i = 21.75$
$\sum w_i = 21$
$\mu_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{21.75}{21} = 1.0357 \approx 1.04$
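The weighted mean computation can be sketched in Python as follows. The grades and units in the usage lines are hypothetical; the individual course values behind the totals 21.75 and 21 are not reproduced here.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical grades and credit units (illustrative only).
grades = [1.0, 1.25, 1.5]
units = [3, 3, 2]
gpa = weighted_mean(grades, units)   # 9.75 / 8

# The final division from the example above: 21.75 / 21.
example_ratio = round(21.75 / 21, 2)
print(gpa, example_ratio)
```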
o Used to describe the location of the data in terms other than center of data.
o Definition:
The pth percentile in a data array is a value that divides the data set into two parts.
The lower segment contains at least p% of the data and the upper segment contains at
least (100 – p)%.
o Suppose:
You are enrolling in a university and took an entrance exam. You then received the result
saying that your score is at the 90th percentile. What does it mean?
• It means that you scored as high as or higher than 90% of the other students.
1. Sort the data in order from the lowest to the highest value.
b) If i is an integer, the pth percentile is the average of the values at location index
positions i and i + 1
1. Sort the data in order from the lowest to the highest value.
o Therefore, the distance at the 80th percentile that will be subject to the surcharge is:
(20.5 + 21) / 2 = 20.75
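The location-index procedure can be sketched in Python. The integer-index rule (average the values at positions i and i + 1) comes from step b above. The rule for a non-integer index, rounding i up to the next whole position, is an assumption based on the common form of this method. The distance values are hypothetical apart from 20.5 and 21.

```python
import math

def percentile(data, p):
    """pth percentile using the location index i = (p / 100) * n."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    i = p * len(ordered) / 100
    if i != int(i):
        # Assumed rule: non-integer index is rounded up (1-based position).
        return ordered[math.ceil(i) - 1]
    i = int(i)
    # Integer index: average the values at 1-based positions i and i + 1.
    return (ordered[i - 1] + ordered[i]) / 2

# Hypothetical distances; only 20.5 and 21 come from the example.
distances = [12, 14, 15, 16.5, 18, 19, 20, 20.5, 21, 25]
p80 = percentile(distances, 80)   # i = 8, so average the 8th and 9th values
print(p80)
```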
o Quartiles in a data array are those values that divide the data set into four equal-
sized groups.
o Fourth Quartile
The rest
o Let’s say we’re only interested in the 3rd quartile of this data.
$i = \frac{P}{100}(n) = \frac{75}{100}(30) = 22.5$, rounded up to $i = 23$
o Therefore, the third quartile is the 23rd value from the low end of the sorted data.
o A descriptive tool that incorporates the median and the quartiles to graphically display data.
o Used to identify outliers that are usually small or large data values that lie
mostly by themselves.
o Definition:
• Limits are located at a value that is 1.5 times the difference between Q1 and Q3, below Q1 and above Q3.
• The whiskers extend to the left to the lowest value within the limits and to the right to the highest value within the limits.
1. Sort the data values from low to high.
2. Calculate the 25th percentile (1st quartile), the 50th percentile(median), and 75th percentile(3rd
quartile).
3. Create a graph with the data values on the horizontal axis. Draw a box so the ends
correspond to Q1 and Q3.
4. Draw a vertical line through the box at the median. Half the data values in the box will be on
either side of the median.
5. Use the interquartile range (IQR = Q3 - Q1) to compute the upper and lower limits.
Lower Limit = Q1 - 1.5(Q3 - Q1)
Upper Limit = Q3 + 1.5(Q3 - Q1)
6. Draw the whiskers using dashed lines from each end of the box to the lowest and highest
value within the limits.
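The numbers needed to draw the box and whisker plot can be computed with a short Python sketch of the steps above. The data are hypothetical, and the percentile helper assumes that a non-integer location index is rounded up to the next position.

```python
import math

def percentile(ordered, p):
    """Location-index percentile on already sorted data (0 < p < 100)."""
    i = p * len(ordered) / 100
    if i != int(i):
        return ordered[math.ceil(i) - 1]      # assumed round-up rule
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2  # average positions i and i + 1

def box_plot_summary(data):
    ordered = sorted(data)                    # step 1
    q1 = percentile(ordered, 25)              # step 2
    med = percentile(ordered, 50)
    q3 = percentile(ordered, 75)
    iqr = q3 - q1                             # step 5
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    inside = [x for x in ordered if lower <= x <= upper]
    return {
        "q1": q1, "median": med, "q3": q3,
        "lower_limit": lower, "upper_limit": upper,
        # Step 6: whiskers end at the extreme values inside the limits.
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [x for x in ordered if x < lower or x > upper],
    }

# Hypothetical data with one unusually large value.
data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 60]
summary = box_plot_summary(data)
print(summary)
```

With these values the box runs from 13.5 to 19.5, the limits are 4.5 and 28.5, and 60 is flagged as an outlier.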
2. Calculate the 25th percentile (1st quartile), the 50th percentile (median), and
75th percentile (3rd quartile).
3. Create a graph and draw the box so the ends correspond to Q1 and Q3.
5. Compute for upper and lower limits.
Lower Limit =
Upper Limit =
Customers prefer a color somewhere between black and white but closer to white – meaningless.
o A set of data exhibits variation if all the data are not the same value.
o Some facts:
There will always be variation in everything made by humans or occurring in nature;
it could be small, but it is there.
It could be
Interquartile Range
Variance
Standard Deviation
1. Collect the data set and sort it from low to high value.
1. Collect the data set and sort it from low to high value
Plant A Plant B
15 23
20 24
25 25
30 26
35 27
o Plant B : 23 and 27
1. Because we only use the high and low values, it is very sensitive to extreme
values.
2. Regardless of how many data values are in the sample or population, the
range is computed from only two values.
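A small Python sketch of the range for the Plant A and Plant B values in the table above. The function name `data_range` and the extra value 100 are assumptions for illustration; 100 is added only to show how a single extreme value changes the result.

```python
def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

plant_a = [15, 20, 25, 30, 35]
plant_b = [23, 24, 25, 26, 27]

range_a = data_range(plant_a)   # 35 - 15
range_b = data_range(plant_b)   # 27 - 23

# Hypothetical extreme value appended to Plant B.
range_b_extreme = data_range(plant_b + [100])   # 100 - 23

print(range_a, range_b, range_b_extreme)
```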
o Definition:
A measure of variation that is determined by computing the difference between the third
and first quartiles.
IQR = Third Quartile - First Quartile
How to?
2. Compute the range.
o Note: The range is sensitive to extreme values and that made our range very large.
First quartile: $i = \frac{P}{100}(n) = \frac{25}{100}(100) = 25$
Third quartile: $i = \frac{P}{100}(n) = \frac{75}{100}(100) = 75$
o Note: The IQR would be unchanged even if the values on the high or low end of the distribution were
even more extreme.
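The note about the IQR can be checked with a Python sketch. The data set (the integers 1 to 100) is hypothetical and is chosen only so that n = 100 gives the whole-number location indexes 25 and 75. The helper covers just that whole-number case.

```python
def value_at_integer_index(ordered, p):
    """Percentile when i = (p / 100) * n is a whole number:
    the average of the values at 1-based positions i and i + 1."""
    i, remainder = divmod(p * len(ordered), 100)
    if remainder != 0:
        raise ValueError("this sketch only covers a whole-number index")
    return (ordered[i - 1] + ordered[i]) / 2

def iqr(data):
    ordered = sorted(data)
    return value_at_integer_index(ordered, 75) - value_at_integer_index(ordered, 25)

# Hypothetical data: n = 100, so i = 25 and i = 75.
data = list(range(1, 101))
# Same data with the largest value made far more extreme.
data_extreme = data[:-1] + [10_000]

iqr_original = iqr(data)              # 75.5 - 25.5
iqr_extreme = iqr(data_extreme)       # same quartiles, same IQR
range_original = max(data) - min(data)
range_extreme = max(data_extreme) - min(data_extreme)

print(iqr_original, iqr_extreme, range_original, range_extreme)
```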
Variance and Standard Deviation
o Neither the range nor the interquartile range uses all of the available data in its
computation.
This means that potentially valuable information in the data is ignored.
o Standard Deviation is used as the measure of spread when the mean is used to
measure central location; thus it measures spread around the mean.
Stddev is small when the data are concentrated close to the mean
Stddev is large when the data are spread widely around the mean
o Population Variance
Average of the squared distances of the values from the mean.
o Let’s try computing the variance and standard deviation for the BLC scenario.
o We’ll use the original variance formula.
Plant A:
$\sigma^2 = \frac{250}{5} = 50$
$\sigma = \sqrt{\sigma^2} = \sqrt{50}$ products
Plant B:
$\sigma = \sqrt{\sigma^2} = \sqrt{2}$ products
o We can see that Plant A has a standard deviation that is five times larger than Plant B’s.
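The Plant A and Plant B results can be reproduced with a short Python sketch of the original population variance formula. It uses the two columns from the table shown earlier; the function name `population_variance` is an assumption for illustration.

```python
import math

def population_variance(values):
    """Average of the squared distances of the values from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

plant_a = [15, 20, 25, 30, 35]
plant_b = [23, 24, 25, 26, 27]

var_a = population_variance(plant_a)   # 250 / 5
var_b = population_variance(plant_b)   # 10 / 5
sd_a = math.sqrt(var_a)
sd_b = math.sqrt(var_b)
ratio = sd_a / sd_b                    # sqrt(50) / sqrt(2)

print(var_a, var_b, sd_a, sd_b, ratio)
```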
1. Collect the quantitative data for the variable of interest for the entire population.
2. Use either the original or the shortcut formula to compute for the variance
3. (Assuming we use the shortcut) Find the sum of the x-values ($\sum x$) and then square this sum, $(\sum x)^2$.
1. Collect the quantitative data for the variable of interest for the entire
population.
2. Select a formula.
$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
4. Square each x value and sum these squared values ($\sum x^2$).
$\sigma^2 = \frac{\sum x^2 - \frac{1849}{N}}{N} = 2.4082$ weeks squared
$\sigma = \sqrt{\sigma^2} = \sqrt{2.4082} = 1.5518$ weeks
Thus, the standard deviation for the number of shipping weeks between
Vancouver and London for the seven shipments is 1.5518 weeks.
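A Python sketch comparing the shortcut formula with the original formula. The individual shipping times are not listed above, so the check uses the Plant A values from the earlier table instead; only the final square root is taken from the shipping example.

```python
import math

def variance_original(values):
    """Average squared distance from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def variance_shortcut(values):
    """(sum of x^2 - (sum of x)^2 / N) / N."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    return (sum_x_squared - sum_x ** 2 / n) / n

plant_a = [15, 20, 25, 30, 35]
v_original = variance_original(plant_a)
v_shortcut = variance_shortcut(plant_a)   # (3375 - 15625 / 5) / 5

# Last step of the shipping example: square root of 2.4082.
shipping_sd = round(math.sqrt(2.4082), 4)

print(v_original, v_shortcut, shipping_sd)
```

Both formulas give the same variance; the shortcut only needs the two running sums.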
o There are slight differences with the formula for sample variance and standard
deviation compared to the one we used for the population
The notations
o The general reason for having (n-1) is that we want the average sample variance
to equal the population variance.
o Suppose we computed the variance of 5 samples from a population.
o The average of these 5 sample variances ($s^2$) will tend to equal the population variance if (n-1) is used as the divisor.
o If we use n instead, we will tend to get a value smaller than the population variance.
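The reason for (n-1) can be illustrated with a Python sketch that lists every possible sample instead of only five. It assumes a small population (the Plant A values from the earlier table) and samples of size 2 drawn with replacement. Averaged over all such samples, the (n-1) version matches the population variance, while the n version comes out smaller.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [15, 20, 25, 30, 35]

# Every ordered sample of size 2, drawn with replacement (25 samples).
samples = list(product(population, repeat=2))

# statistics.variance divides by n - 1; statistics.pvariance divides by n.
avg_with_n_minus_1 = mean(variance(s) for s in samples)
avg_with_n = mean(pvariance(s) for s in samples)
population_variance = pvariance(population)

print(population_variance, avg_with_n_minus_1, avg_with_n)
```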
1. Select the sample and record the data for the variable of interest.
3. Compute $\bar{x}$.
4. (Assuming we use the original formula) Determine the sum of the squared deviations of each x value
from $\bar{x}$.
1. Select the sample and record the data for the variable of interest.
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$\bar{x} = \frac{\sum x}{n} = \frac{30}{10} = 3.0$
$s = \sqrt{s^2} = \sqrt{6} = 2.4495$
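A Python sketch of the sample variance and standard deviation using the original formula. The ten sample values from the example are not listed above, so the sample here is hypothetical; only the last step (the square root of 6) is taken from the example.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Hypothetical sample (illustrative only).
sample = [1, 2, 3, 4, 5]
s_squared = sample_variance(sample)   # 10 / 4
s = math.sqrt(s_squared)

# Last step of the example above: s = sqrt(6).
example_s = round(math.sqrt(6), 4)

print(s_squared, s, example_s)
```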
o Two of the most important measures that we’ve talked about are the
mean
standard deviation.
o Standard Deviation
the distribution with the largest CV is said to have the greatest relative
spread
o In finance, the CV can be used to measure the relative risk of a stock portfolio.
Standard deviation of 3%
Standard deviation of 2%
o We can compute the CV as follows:
o Even though portfolio B has a lower stddev, it is still riskier than A because it
has greater CV.
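The comparison can be sketched in Python using the usual definition of the coefficient of variation, the standard deviation divided by the mean and expressed as a percentage. The standard deviations (3% and 2%) come from the example; the mean returns of 12% and 6% are assumed values for illustration, since the portfolio means are not given above.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100, as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

# Standard deviations from the example; mean returns are ASSUMED.
cv_a = coefficient_of_variation(std_dev=3, mean=12)   # 25.0
cv_b = coefficient_of_variation(std_dev=2, mean=6)

riskier = "B" if cv_b > cv_a else "A"
print(cv_a, cv_b, riskier)
```

Under these assumed means, portfolio B has the smaller standard deviation but the larger CV.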
The Human Resource will collect the scores from the applicants who applied
using these two tests.
Assuming these data reflect the population of interest for the university, the population mean
can be computed using:
AIMS: BHS:
AIMS: John’s score of 2344 converts to z = 1.72
BHS: Mary’s score of 95 converts to z = 1.25
Conclusion:
1. Compared to the average score on the AIMS Hiring Test, John’s score is 1.72 standard deviations higher.
2. Compared to the average score on the BHS Hiring Test, Mary’s score is only 1.25 standard deviations
higher.
Therefore, even though the two tests used different scales, standardizing the data allowed us
to see that John scored relatively better than Mary.
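The standardization can be sketched in Python with the usual z score, the distance of a value from the population mean measured in standard deviations. John's and Mary's scores come from the example. The population means and standard deviations are assumed values, chosen only so that the arithmetic agrees with the z scores quoted in the conclusion; the original figures are not given above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# ASSUMED population parameters (not taken from the text above).
aims_mu, aims_sigma = 2000, 200
bhs_mu, bhs_sigma = 80, 12

z_john = z_score(2344, aims_mu, aims_sigma)   # (2344 - 2000) / 200
z_mary = z_score(95, bhs_mu, bhs_sigma)       # (95 - 80) / 12

print(z_john, z_mary)
```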