Data Description
Chapter Outline
• Measures of Central Tendency:
Summarize data using the measures of central tendency,
such as the Mean, Median, Mode, Geometric mean &
Harmonic mean.
• Measures of Variation & Position:
Describe data using the measures of variation, such as
the Range, Quartile deviation, Mean deviation, Variance
and Standard deviation. Identify the position of a data
value in a data set. Chebyshev’s Theorem. Moments.
Moment Ratios. Box-and-Whisker Plot.
• Shape:
Symmetry, Skewness & Kurtosis; detecting outliers in data.
Mathematical Average: (i) The Arithmetic Mean (iv) Geometric Mean (v) Harmonic Mean
Positional Average: (ii) The Median (iii) The Mode
The mean is the arithmetic average of a data set. It is found by
adding the numbers in the data set and dividing by how many numbers
there are; equivalently, it is the sum of all the observations divided
by the number of observations. It is abbreviated as A.M and denoted by
$\bar{X}$, that is
Ungrouped data: $\bar{X} = \dfrac{\sum X}{n}$
Grouped data: $\bar{X} = \dfrac{\sum fX}{\sum f}$, where X is the class midpoint
Example # 1
The marks obtained by 8 students are given below:
46, 32, 37, 46, 39, 36, 48, 36. Calculate the arithmetic mean.
$\bar{X} = \dfrac{46 + 32 + 37 + 46 + 39 + 36 + 48 + 36}{8} = \dfrac{320}{8} = 40$ marks
Example # 2
Calculate the mean of the data given below.
9, 16, 9, 13, 15, 7, –12, 4, 11, –8.
$\bar{X} = \dfrac{9 + 16 + 9 + 13 + 15 + 7 - 12 + 4 + 11 - 8}{10} = \dfrac{64}{10} = 6.4$
Example # 3
Find the arithmetic mean from the following distribution.
Classes     0 – 5   6 – 11   12 – 17   18 – 23   24 – 29
Frequency     4       8        14         3         1

Classes      f      X       fX
0 – 5        4     2.5     10
6 – 11       8     8.5     68
12 – 17     14    14.5    203
18 – 23      3    20.5     61.5
24 – 29      1    26.5     26.5
Total       30      –     369

$\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{369}{30} = 12.30$
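The two mean formulas can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `arithmetic_mean` and `grouped_mean` are hypothetical, and the grouped version assumes each class is represented by its midpoint, as in Example 3.

```python
def arithmetic_mean(values):
    """Mean of ungrouped data: sum of the values divided by their count."""
    return sum(values) / len(values)


def grouped_mean(class_limits, frequencies):
    """Mean of grouped data: sum(f * X) / sum(f), with X the class midpoint."""
    midpoints = [(low + high) / 2 for low, high in class_limits]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


marks = [46, 32, 37, 46, 39, 36, 48, 36]                     # Example 1
classes = [(0, 5), (6, 11), (12, 17), (18, 23), (24, 29)]    # Example 3
freqs = [4, 8, 14, 3, 1]

print(arithmetic_mean(marks))        # 40.0
print(grouped_mean(classes, freqs))  # 12.3
```

Both results agree with the hand calculations in Examples 1 and 3.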
Question # 1
Find the arithmetic mean of the set of numbers 45, 48, 50, 55, 40, 32, 41. Ans: 44.43
Question # 2
Question # 3
Calculate the mean.
Classes     0 – 10   10 – 20   20 – 30   30 – 40
Frequency     10        20        40        30
Ans: 24
Question # 4
The ages of residents of a town are given below. Find the mean.
Age         29 – 33   34 – 38   39 – 43   44 – 48
Frequency      11        12         2         5
Ans: 36.2
Weighted Mean
The weighted mean is found by multiplying each value by its corresponding
weight, adding the products, and dividing by the sum of the weights. It is
denoted by $\bar{X}_w$.
$\bar{X}_w = \dfrac{\sum wX}{\sum w}$
Example # 4: You are taking a class in which your
grade is determined from five sources: 50% from your
test mean, 15% from your midterm, 20% from your final
exam, 9% from your computer lab work and 6% from
your homework. Your scores are 86 (test mean), 96
(midterm), 82 (final exam), 98 (computer lab), and 100
(homework). What is the weighted mean of your scores?
Source          X (score)   w (weight)     wX
Test mean           86         0.50       43.00
Midterm             96         0.15       14.40
Final               82         0.20       16.40
Computer Lab        98         0.09        8.82
Homework           100         0.06        6.00
Total                          1.00       88.62

$\bar{X}_w = \dfrac{\sum wX}{\sum w} = \dfrac{88.62}{1.00} = 88.62$
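The weighted-mean calculation of Example 4 can be checked with a short Python sketch. The helper name `weighted_mean` is hypothetical and chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * X) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


scores = [86, 96, 82, 98, 100]             # test mean, midterm, final, lab, homework
weights = [0.50, 0.15, 0.20, 0.09, 0.06]   # weights from Example 4

print(round(weighted_mean(scores, weights), 2))  # 88.62
```

The same function works when the weights are counts, for example the numbers of cars sold at each price.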
Example # 5 Find the weighted mean price of three models of cars
sold. The number and price of each model sold are shown in the
following list.
Model   Number   Price
A          8     $10,000
B         10     $12,000
C         12     $8,000
Solution
Number (W)   Price (X)       WX
     8         10000        80000
    10         12000       120000
    12          8000        96000
Total 30          –        296000

Weighted mean = ΣWX / ΣW = 296000 / 30 = 9866.67
Question # 5
A recent survey of a new diet cola reported the following percentages of
people who liked the taste. Find the weighted mean of the percentages.
Area   % Favoured   Number surveyed
1          40            1000
2          30            3000
3          50             800
Ans: 35.42
Question # 6
In an advertisement, a retail store stated that its employees
averaged nine years of service. The distribution is shown here.
Number of employees Years of service
8 2
2 6
3 10
Calculate weighted mean for years of service.
Ans: 4.46
Question # 7
The costs of three models of helicopters are shown below. Find the
weighted mean of the costs of the models.
Model Number sold Cost
Sun scraper 9 $427,000
Sky coaster 6 $365,000
Highflyer 12 $725,000
Ans: $545666.67
Question # 8
For the month of April, a checking account has a balance
of $523 for 24 days, $2415 for 2 days, and $250 for 4 days.
What is the account’s mean daily balance for April?
Ans: Σw = 30, Σwx = 18382, Mean = 612.7333
Question # 9
A student receives the following grades, with
an A+ worth 4 points, a B worth 3 points,
a C worth 2 points, and a D worth 1 point.
What is the student’s mean grade point score?
B in 2 three-credit classes     D in 1 two-credit class
A+ in 1 four-credit class       C in 1 three-credit class
Ans: Σw = 15, Σwx = 42, Mean = 2.8
Grades   A+    B    C    D
X         4    3    2    1
w         4    6    3    2
wX       16   18    6    2
Question # 10
Question # 12
An instructor gives four 1-hour exams and one final exam, which counts as
two 1-hour exams. Find weighted mean of a student’s grade if she received
62, 83, 97, and 90 on the hour exams and 82 on the final exam. Ans: 82.67
Question # 13
Using the weighted mean, find the average number of grams of fat in meat
or fish a person would consume over a five-day period if he ate the
following:
Meat or Fish Fat (grams/oz)
3 oz. fried shrimp 3.33
3 oz. veal cutlet (broiled) 3.00
2 oz. roast beef (lean) 2.50
2.5 oz. fried chicken drumstick 4.40
4 oz. tuna (canned in oil) 1.75
Ans: 2.8959
Properties of the Arithmetic Mean.
(i) The sum of the deviations taken from the arithmetic mean is zero, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$
Solution
  X     X – X̄    (X – X̄)²    X – A    (X – A)²
  40    – 26        676       – 20       400
  50    – 16        256       – 10       100
  60     – 6         36          0         0
  80      14        196         20       400
 100      34       1156         40      1600
 330       0       2320          –      2500

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{330}{5} = 66$. Hence $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
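The property can be verified numerically for the five values in the table. The sketch below is illustrative only; it takes A = 60, the value implied by the X – A column.

```python
values = [40, 50, 60, 80, 100]
mean = sum(values) / len(values)          # 66.0

deviations = [x - mean for x in values]
print(sum(deviations))                    # 0.0, the deviations cancel out

# Sums of squared deviations about the mean and about A = 60
ss_about_mean = sum(d ** 2 for d in deviations)
ss_about_a = sum((x - 60) ** 2 for x in values)
print(ss_about_mean, ss_about_a)          # 2320.0 2500
```

The squared deviations about the mean (2320) are smaller than those about A = 60 (2500), matching the last row of the table.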
Question # 14
Three teachers of statistics reported mean examination grades of 75,
82, and 85 in their classes, which consisted of 30, 25 and 17 students,
respectively. Determine the mean grade for all the classes. Ans: 79.79
The median is the middle number in a data set when the numbers
are listed in either ascending or descending order; that is, it is the
middlemost value of an arrayed series. The median divides the data into two
equal parts: fifty percent of the data falls below the median and fifty
percent falls above it. It is denoted by $\tilde{X}$ (X tilde).
When the number of values is odd, the median is the middle
value; when the number of values is even, the median is the mean of
the two middle values.
Example # 9 The prices, in dollars, for a sample of room air conditioners
are listed. Find the median. 180, 201, 220, 191, 219, 209, 186
In order: 180 186 191 201 209 219 220
Median = $\tilde{X}$ = 201
Example # 10 The prices, in dollars, for a sample of room air
conditioners are listed. Find the median. 18, 24, 20, 35, 19, 23, 26, 18
In order: 18 18 19 20 23 24 26 35
$\tilde{X} = \dfrac{20 + 23}{2} = 21.5$
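The odd/even rule for the median can be sketched in Python as follows. The function name `median` is a hypothetical helper written for this illustration (Python's standard `statistics.median` gives the same results).

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([180, 201, 220, 191, 219, 209, 186]))   # 201  (Example 9, odd n)
print(median([18, 24, 20, 35, 19, 23, 26, 18]))      # 21.5 (Example 10, even n)
```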
$\tilde{X} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$
l = lower class boundary of the median class, n = total frequency
h = size of class interval of the median class
f = frequency of the median class
C = cumulative frequency of the class preceding the median class.
Example # 11
From the following frequency distribution find median.
Classes 100 – 149 150 – 199 200 – 249 250 – 299
Frequency 10 30 40 20
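A minimal Python sketch of the grouped-median formula, applied to the distribution of Example 11. The function name `grouped_median` and its `gap` parameter are hypothetical; the sketch assumes class boundaries are obtained by subtracting half the gap between consecutive classes (here 0.5) from the lower class limit.

```python
def grouped_median(lower_limits, class_width, frequencies, gap=1):
    """Median = l + (h / f) * (n / 2 - C) for a frequency distribution."""
    n = sum(frequencies)
    cumulative = 0                      # C: cumulative frequency before the class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= n / 2:     # first class whose cumulative count reaches n/2
            l = lower - gap / 2         # lower class boundary
            return l + class_width / f * (n / 2 - cumulative)
        cumulative += f


# Classes 100-149, 150-199, 200-249, 250-299 with h = 50
print(grouped_median([100, 150, 200, 250], 50, [10, 30, 40, 20]))  # 212.0
```

Here n/2 = 50 falls in the class 200–249, so l = 199.5, f = 40 and C = 40. For classes that share boundaries (such as 200–250, 250–300), `gap=0` would be passed instead.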
Hourly wages (Rs)   40–60   60–80   80–100   100–120   120–140   140–160   160–180
No. of employees      13      23      101       182       105        19         7
Question # 18
During a quality assurance check, the actual coffee
contents (in ounces) of six jars of instant coffee were
recorded as 6.03, 5.59, 6.40, 6.00, 5.99, and 6.02.
(a) Find the mean and the median of the coffee content.
(b) The third value was incorrectly measured and is
actually 6.04. Find the mean and median of the
coffee content again.
(c) Which measure of central tendency, the mean or the
median, was affected more by the data entry error?
Ans: (a) Mean = 6.005, Median = 6.01 (b) Mean = 5.945, Median = 6.01 (c) Mean
Question # 19
Quartiles
Quartiles are not a measure of central tendency. They split the ordered data
into 4 quarters at Q1, Q2 and Q3.
Example # 13
From the data find the Q1, Q2, & Q3. 8, 6, 5, 4, 3, 9, 14, 16.
Q1 = 4.5 Q2 = 7 Q3 = 11.5
Example # 14
From the data find the Q1, Q2, & Q3. 2, 3, 5, 4, 6, 9, 7, 8, 10.
Solution
Data in Ordered Array: 2 3 4 5 6 7 8 9 10
Q1 = 3.5 Q2 = 6 Q3 = 8.5
Example # 15
From the data find the Q1, Q2, & Q3. 2, 3, 5, 4, 6, 9, 7, 8, 10,11.
Solution
Q2 = 6.5
Q1 = 4 Q3 = 9
Example # 16
From the data find the Q1, Q2, & Q3. 8, 3, 5, 4, 6, 7, 2.
Solution
Data in Ordered Array: 2 3 4 5 6 7 8
Q2 = 5
Q1 = 3 Q3 = 7
Example # 17
From the data find the Q1, Q2, & Q3. 3, 5, 4, 6, 7, 2.
Solution
Data in Ordered Array: 2 3 4 5 6 7
Q1 = 3 Q2 = 4.5 Q3 = 6
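The answers to Examples 13 to 17 are all consistent with one convention: Q2 is the median, Q1 is the median of the values below it, and Q3 is the median of the values above it (the median itself is left out of both halves when n is odd). The Python sketch below assumes that convention; the function names are hypothetical, and other quartile conventions exist that give slightly different values.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles([8, 6, 5, 4, 3, 9, 14, 16]))      # (4.5, 7.0, 11.5)  Example 13
print(quartiles([2, 3, 5, 4, 6, 9, 7, 8, 10]))    # (3.5, 6, 8.5)     Example 14
```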
The mode is the value that occurs most often in a data set. A
distribution having only one mode is called a unimodal distribution; a
distribution having two modes is called a bimodal distribution; and a
distribution having more than two modes is called a multimodal
distribution.
Example # 18 Identify the mode for each of the following lists of numbers.
(i) 2, 4, 5, 5, 5, 6, 9, 16, 22
(ii) 4, 9, 3, 5, 4, 6, 9, 3, 4, 11, 9
(iii) 18, 22, 9, 4, 6, 12
(i) Since 5 occurs the maximum number of times, mode = 5.
(ii) Since 4 and 9 both occur the maximum number of times (three times each),
the modes are 4 and 9. A distribution with two modes is called a bimodal distribution.
(iii) Here the mode does not exist, because each number occurs only once.
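The three cases of Example 18 can be reproduced with a small Python sketch. The function name `modes` is hypothetical; it returns every value tied for the highest count, and an empty list when no value repeats.

```python
from collections import Counter


def modes(values):
    """All values with the highest frequency; empty list if every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 4, 5, 5, 5, 6, 9, 16, 22]))          # [5]
print(modes([4, 9, 3, 5, 4, 6, 9, 3, 4, 11, 9]))     # [4, 9]
print(modes([18, 22, 9, 4, 6, 12]))                  # []
```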
$\text{Mode} = l + \dfrac{f_m - f_1}{2f_m - f_1 - f_2} \times h$
OR
$\text{Mode} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
l = lower class boundary of the modal class,
fm = frequency of the modal class (the maximum frequency),
f1 = frequency of the class preceding the modal class,
f2 = frequency of the class following the modal class,
h = class interval size of the modal class.
Example # 19
From the following frequency distribution find mode.
Classes 100 – 149 150 – 199 200 – 249 250 – 299
Frequency 10 30 40 20
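A minimal Python sketch of the grouped-mode formula, applied to the distribution of Example 19. The function name `grouped_mode` and its `gap` parameter are hypothetical; as an assumption, the lower class boundary is taken as the lower limit minus half the gap between classes (0.5 here), and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_limits, class_width, frequencies, gap=1):
    """Mode = l + (fm - f1) / (2*fm - f1 - f2) * h for a frequency distribution."""
    i = frequencies.index(max(frequencies))          # position of the modal class
    fm = frequencies[i]
    f1 = frequencies[i - 1] if i > 0 else 0          # preceding class
    f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0   # following class
    l = lower_limits[i] - gap / 2                    # lower class boundary
    return l + (fm - f1) / (2 * fm - f1 - f2) * class_width


# Classes 100-149, 150-199, 200-249, 250-299 with h = 50
print(round(grouped_mode([100, 150, 200, 250], 50, [10, 30, 40, 20]), 2))  # 216.17
```

The modal class is 200–249 (fm = 40, f1 = 30, f2 = 20), so Mode = 199.5 + (10 / 30) × 50.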
Question # 20
Calculate the mode for the following frequency distribution. Ans: 18.14
Classes Frequency
11 – 13 1
13 – 15 4
15 – 17 6
17 – 19 10
19 – 21 7
21 – 23 2
Question # 22
The weights of the 40 students at a university are given in the following frequency
table.
Weight 118–126 127–135 136–144 145–153 154–162 163–171 172–180
Frequency 3 5 9 12 5 4 2
Calculate arithmetic mean, median, mode, Q1, Q3 & IQR.
Ans: 146.975, 146.75, 147.2, 137.5, 155.3, 17.8
$Q_1 = l + \dfrac{h}{f}\left(\dfrac{n}{4} - C\right)$, $Q_3 = l + \dfrac{h}{f}\left(\dfrac{3n}{4} - C\right)$
Empirical Relation between Mean, Median and Mode
For a moderately skewed distribution there exists an empirical
relationship among the mean, median and mode:
Mean – Mode = 3(Mean – Median)
Mode = 3Median – 2Mean
Example # 20
In a symmetrical distribution, mean is 40, what is the median and mode?
Solution
In a symmetrical distribution the mean, median and mode coincide. Hence the
mean, median and mode are also 40.
If Mean = 75, Mode = 70, using empirical relation, find the value of Median.
Solution
From the empirical relation between the mean, median and mode.
Mode = 3Median – 2Mean
70 = 3Median – 2(75)
3Median = 70 + 150 = 220
$\text{Median} = \dfrac{220}{3} = 73.33$
Question # 23
Find the mode from the following distribution.
Classes 0 – 5 6 – 11 12 – 17 18 – 23 24 – 29
Frequency 4 14 14 6 2
Ans: Mean = 12.7, Median = 12.36, Mode = 3Median – 2Mean = 11.68
Choosing between Mean, Median, and Mode
Example # 25
Question # 25
Which of the averages would be suitable for each of the following?
(i) Heights of students.
(ii) Dress and shoe sizes.
(iii) Half of the factory workers make more than $5.37 per hour and
half make less than $5.37 per hour.
(iv) Number of tomatoes on plants.
(v) Comparison of Intelligence.
(vi) Per capita income in Pakistan.
(vii) Marks obtained in any examination.
(viii) The distribution is open-ended.
(ix) The data is categorical.
(x) The average person cuts the lawn once a week.
(xi) The most common fear today is fear of speaking in public.
(xii) The average age of university professors is 42.3 years.
Ans: (i) Mean (ii) Mode (iii) Median (iv) Mode (v) Median or Mean (vi) Mean (vii) Mean or Median (viii) Median (ix) Mode
(x) Mode (xi) Mode (xii) Mean
Example # 26.
The data shown consist of the number of games
played each year in the career of Baseball Hall of
Famer Bill Mazeroski. Check for outliers.
81 148 152 135 151 152 159 142 34 162 130
162 163 143 67 112 70.
Q1 = 96.5, Q3 = 155.5
Find the interquartile range (IQR): IQR = Q3 – Q1 = 155.5 – 96.5 = 59
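The remaining steps of the outlier check can be sketched in Python. The sketch assumes the common rule that a value is an outlier if it lies below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR; that rule is not spelled out in the text above, so treat it as an assumption. The function names are hypothetical, and the quartiles are taken as medians of the lower and upper halves, which reproduces Q1 = 96.5 and Q3 = 155.5.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def find_outliers(values):
    """Return (Q1, Q3, outliers) using the assumed 1.5 * IQR fences."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr     # lower and upper fences
    return q1, q3, [x for x in ordered if x < low or x > high]


games = [81, 148, 152, 135, 151, 152, 159, 142, 34,
         162, 130, 162, 163, 143, 67, 112, 70]
print(find_outliers(games))   # (96.5, 155.5, [])
```

With IQR = 59 the fences are 8 and 244, and every value lies between them, so under this rule the data set has no outliers.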
Question # 27
Check each data set for outliers.
a. 16, 18, 22, 19, 3, 21, 17, 20 b. 24, 32, 54, 31, 16, 18, 19, 14, 17, 20
c. 321, 343, 350, 327, 200 d. 88, 72, 97, 84, 86, 85, 100
e. 145, 119, 122, 118, 125, 116 f. 14, 16, 27, 18, 13, 19, 38, 15, 20
Ans: (a) 3 (b) 54 (c) No outlier (d) No outlier (e) 145 (f) 38
Geometric Mean
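The geometric mean (G.M) of a set of positive values X1, X2, …, Xn is
defined as the positive nth root of their product, i.e.
$G.M = \sqrt[n]{X_1 \times X_2 \times \cdots \times X_n} = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
or
Ungrouped data: $G.M = \text{Antilog}\left(\dfrac{\sum \log X}{n}\right)$
Grouped data: $G.M = \text{Antilog}\left(\dfrac{\sum f \log X}{\sum f}\right)$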
The geometric mean applies only to positive numbers. It is often used for
a set of numbers whose values are meant to be multiplied together or are
exponential in nature, such as data on the growth of the human population
or interest rates of a financial investment. It is also used in certain financial
and stock market indexes, such as Financial Time’s Value Line Geometric
index.
Example # 27
Find the G.M of the values 2, 9, 12.
$G.M = \sqrt[3]{2 \times 9 \times 12} = \sqrt[3]{216} = (216)^{1/3} = 6$
Example # 28
Find the geometric mean of the values 7.96, 13.82, 24.14, 30.27, 37.44
X        log X
7.96     0.9009
13.82    1.1405
24.14    1.3827
30.27    1.4810
37.44    1.5733
Total    6.4784

$G.M = \text{Antilog}\left(\dfrac{\sum \log X}{n}\right) = \text{Antilog}\left(\dfrac{6.4784}{5}\right) = \text{Antilog}(1.29568) = 19.76$
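The logarithm method used in Example 28 can be sketched in Python. The function name `geometric_mean` is a hypothetical helper; it averages the base-10 logarithms and then takes the antilog, exactly as in the table above.

```python
import math


def geometric_mean(values):
    """G.M = antilog(sum(log X) / n), using base-10 logarithms."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


print(round(geometric_mean([2, 9, 12]), 2))                           # 6.0
print(round(geometric_mean([7.96, 13.82, 24.14, 30.27, 37.44]), 2))   # 19.76
```

For average growth rates the same function is applied to the growth factors: increases of 25%, 40% and 50% give `geometric_mean([1.25, 1.40, 1.50])`, which is about 1.3795, an average increase of roughly 37.95% per year.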
Example # 29
Find the geometric mean for the following distribution.
Classes 0 – 30 30 – 50 50 – 80 80 – 100
Frequency 20 30 40 10
Question # 29
Given the following frequency distribution of weights, calculate the
geometric mean. Ans: 117.7
$H.M = \dfrac{\sum f}{\sum \dfrac{f}{X}} = \dfrac{\sum f}{\sum f\left(\dfrac{1}{X}\right)}$
Harmonic means are often used in averaging things like rates (e.g. the
average travel speed given a duration of several trips). The weighted
harmonic mean is used in finance to average multiples like the price-
earnings ratio because it gives equal weight to each data point.
Example # 30
Find the H.M of the values 2, 6, and 8.
$H.M = \dfrac{3}{\dfrac{1}{2} + \dfrac{1}{6} + \dfrac{1}{8}} = \dfrac{3}{0.5 + 0.167 + 0.125} = \dfrac{3}{0.792} = 3.788$
Example # 31
Find the harmonic mean for the following distribution.
Weights 20–40 41–61 62–82 83–103 104–124
Frequency 8 7 10 6 4
Weights        f      X      1/X       f(1/X)
20 – 40        8     30    0.0333     0.2664
41 – 61        7     51    0.0196     0.1372
62 – 82       10     72    0.0139     0.1390
83 – 103       6     93    0.0108     0.0648
104 – 124      4    114    0.0088     0.0352
Total         35      –       –       0.6426

$H.M = \dfrac{\sum f}{\sum f\left(\dfrac{1}{X}\right)} = \dfrac{35}{0.6426} = 54.47$
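A short Python sketch of the harmonic mean for both ungrouped and grouped data. The function name `harmonic_mean` is hypothetical; when no frequencies are given, each value is counted once.

```python
def harmonic_mean(values, frequencies=None):
    """H.M = sum(f) / sum(f / X); with no frequencies every f is 1."""
    if frequencies is None:
        frequencies = [1] * len(values)
    return sum(frequencies) / sum(f / x for f, x in zip(frequencies, values))


print(round(harmonic_mean([2, 6, 8]), 3))          # 3.789 (Example 30)

midpoints = [30, 51, 72, 93, 114]                  # class midpoints, Example 31
freqs = [8, 7, 10, 6, 4]
print(round(harmonic_mean(midpoints, freqs), 2))   # 54.48
```

The results differ from the hand calculations (3.788 and 54.47) only in the last digit, because the tables above round each reciprocal to three or four decimals before adding.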
Question # 30
Find the harmonic mean. 4, 5, 7, 9, 20. Ans: 6.63
Question # 31
Given the following frequency distribution, calculate the harmonic mean.
Hourly wages (Rs)   40 – 50   50 – 60   60 – 70   70 – 80   80 – 90
Number of persons       4         8        16         8         4
Ans: 63.05
Question # 32
Calculate geometric mean and harmonic mean from the following distribution.
X   37   42   47   52   57   62   67
f   15   13   17   29   11   10    5
Ans: 49.18, 48.47
Question # 33
Find the average rate of increase of income, if the income of a worker
increases by 25% during the 1st year, 40% during the 2nd year and 50% during
the 3rd year. Ans: percentage increase 37.95%
Question # 34
Find average rate of increase in population, which in the first decade
increased 20%, in the next 25% and in the third 44%. Ans: percentage increase 29.27%
Question # 35
Find out the average rate of motion in the case of a person who rides the
first mile at the rate of 10 miles per hour, the next mile at the rate of 8 miles
per hour, and the third mile at the rate of 6 miles per hour. Ans: 7.66 m.p.h.
Question # 36
Find out the average speed of person who rides the first mile at the rate of 8
miles an hour, the next mile at the rate of 7.5 miles an hour, and the third mile
at the rate of 5.5 miles an hour. Ans: 6.8 m.p.h.
Question # 37
A man gets a rise of 20% in salary at the end of his first year of service and
further rises of 30% and 35% at the end of second and third years respectively;
the rise in each year being calculated on his salary at the beginning of the year.
To what average annual percentage increase is this equivalent? Ans: 28.18%
Relative Measures of Dispersion
When dispersion is expressed as a ratio, percentage or coefficient, free
from any units of measurement, it is called a relative measure of
dispersion. Each absolute measure of dispersion can be converted into
its relative measure. Relative dispersion is defined as:
Relative dispersion = Absolute dispersion / Average. The main measures of
relative dispersion are:
(i) Coefficient of range or range coefficient of dispersion.
(ii) Coefficient of quartile deviation.
(iii) Coefficient of mean deviation or mean coefficient of dispersion.
(iv) Coefficient of variation.
Range
Range is defined as the difference between the largest and smallest
observations. Symbolically, the range is given by the relation: R = Xm – Xo
Where Xm denotes the largest observation and Xo denotes the smallest
observation in the data set.
In case of grouped data, the range is the difference between the upper
boundary of the highest class and the lower boundary of the lowest class.
Coefficient of Range
It is a relative measure of dispersion based on the value of the range; it is
defined by the following relation.
$\text{Coefficient of Range} = \dfrac{X_m - X_o}{X_m + X_o}$
Example # 32
Weights of eight students are recorded below:
84, 87, 91, 67, 60, 74, 69, 84. Find the range and coefficient of range.
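The range and its coefficient for the eight weights can be worked out with a few lines of Python. This is an illustrative sketch; the variable names are arbitrary.

```python
weights = [84, 87, 91, 67, 60, 74, 69, 84]

largest, smallest = max(weights), min(weights)     # Xm = 91, Xo = 60
value_range = largest - smallest                   # R = Xm - Xo
coefficient = value_range / (largest + smallest)   # (Xm - Xo) / (Xm + Xo)

print(value_range, round(coefficient, 3))          # 31 0.205
```

So the range is 31 and the coefficient of range is about 0.205, or 20.5%.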
Example # 33
Find the range and coefficient of range from the following frequency
distribution.
Classes     2 – 4   4 – 6   6 – 8   8 – 10   10 – 12
Frequency     4       5       8        2         1

Classes      f     X
2 – 4        4     3
4 – 6        5     5
6 – 8        8     7
8 – 10       2     9
10 – 12      1    11

Xm = 12, Xo = 2
R = Xm – Xo = 12 – 2 = 10
$\text{Coefficient of Range} = \dfrac{X_m - X_o}{X_m + X_o} = \dfrac{12 - 2}{12 + 2} = 0.71 = 71\%$
Mean Deviation for grouped data
About the mean: $M.D = \dfrac{\sum f|X - \bar{X}|}{\sum f}$
About the median: $M.D = \dfrac{\sum f|X - \text{Median}|}{\sum f}$
(i) $\bar{X} = \dfrac{36 + 48 + 45 + 32 + 37 + 46 + 39 + 36 + 41}{9} = \dfrac{360}{9} = 40$
(ii) Data in an array: 32, 36, 36, 37, 39, 41, 45, 46, 48. Median = 39
X        |X – X̄|    |X – Median|
36          4            3
48          8            9
45          5            6
32          8            7
37          3            2
46          6            7
39          1            0
36          4            3
41          1            2
Total      40           39

(i) $M.D = \dfrac{\sum |X - \bar{X}|}{n} = \dfrac{40}{9} = 4.44$
Coefficient of M.D (mean) = M.D from Mean / Mean = 4.44 / 40 = 0.111 or 11.1%
(ii) $M.D = \dfrac{\sum |X - \text{Median}|}{n} = \dfrac{39}{9} = 4.33$
Coefficient of M.D (median) = M.D from Median / Median = 4.33 / 39 = 0.111 or 11.1%
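The mean deviation about the mean and about the median can be checked with a short Python sketch. The function name `mean_deviation` is hypothetical; it averages the absolute deviations from whatever centre is passed in.

```python
def mean_deviation(values, center):
    """Average absolute deviation of the values from the given centre."""
    return sum(abs(x - center) for x in values) / len(values)


data = [36, 48, 45, 32, 37, 46, 39, 36, 41]
mean = sum(data) / len(data)                   # 40.0
median = sorted(data)[len(data) // 2]          # 39 (n is odd, so the middle value)

md_mean = mean_deviation(data, mean)
md_median = mean_deviation(data, median)

print(round(md_mean, 2), round(md_mean / mean, 3))         # 4.44 0.111
print(round(md_median, 2), round(md_median / median, 3))   # 4.33 0.111
```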
Example # 37
Compute mean deviation from mean and median from the following
frequency distribution. Also calculate the coefficient of mean deviation
from mean and median.
Daily wages (Rs) 200 – 250 250 – 300 300 – 350 350 – 400 400 – 450
No of persons 10 20 40 20 10
$\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{32500}{100} = 325$, $\text{Median} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right) = 325$

Daily wages    f     X      fX      C    |X – X̄|   f|X – X̄|   |X – X̃|   f|X – X̃|
200 – 250     10    225    2250    10     100       1000        100       1000
250 – 300     20    275    5500    30      50       1000         50       1000
300 – 350     40    325   13000    70       0          0          0          0
350 – 400     20    375    7500    90      50       1000         50       1000
400 – 450     10    425    4250   100     100       1000        100       1000
Total        100     –    32500     –       –       4000          –       4000

$M.D = \dfrac{\sum f|X - \bar{X}|}{\sum f} = \dfrac{4000}{100} = 40$
Coefficient of M.D (mean) = M.D from Mean / Mean = 40 / 325 = 0.123 or 12.3%
$M.D = \dfrac{\sum f|X - \text{Median}|}{\sum f} = \dfrac{4000}{100} = 40$
Coefficient of M.D (median) = M.D from Median / Median = 40 / 325 = 0.123 or 12.3%
Question # 39
Following are the wages of 8 workers of a factory. Find the range and the
coefficient of range. Wages (in Rs) 1400, 1450, 1520, 1380, 1485, 1495,
1575, 1440. Ans: 195, 6.6%
Question # 40
The following distribution gives the height distribution of 320 students.
Calculate its range and coefficient of range.
Height      50–52   53–55   56–58   59–61   62–64   65–67   68–70
Frequency     2      20      72      65      83      70       8
Ans: 21, 17.5%
Question # 41
Compute Quartile Deviation and coefficient of Quartile deviation from the
given data: 4, 4, 3, 6, 8, 7, 10, 14. Ans: 2.5, 19.23%
Question # 42
The following table shows the distribution of marks of students. Compute the
lower and the upper quartiles, quartile deviation, and coefficient of quartile
deviation.
Marks       41–50   51–60   61–70   71–80   81–90   91–100
Frequency     30      36      43     104      73      14
Ans: 62.59, 82.17, 9.78, 13.5%
Question # 43
Find the mean deviation from the mean and median for the values
30, 36, 32, 33, 35, 39, 36.5, 35, 34. Also calculate the coefficient of mean
deviation from mean and median.
Ans: Mean = 34.5, Median = 35, M.D(mean) = 2.0, M.D(median) = 1.94, coefficient of M.D (mean) = 5.8%, coefficient of M.D (median) = 5.5%
Question # 44
Compute the mean deviation from mean and median from the data given
below. Also calculate the coefficient of mean deviation from mean and
median.
Classes 20–24 25–29 30–34 35–39 40–44 45–49 50–54
Frequency 1 4 8 11 15 9 2
Ans: Mean = 39, Median = 39.8, M.D(mean) = 5.72, M.D(median) = 5.69, coefficient of M.D (mean) = 14.7%, coefficient of M.D (median) = 14.3%
[Diagram: measures of dispersion. Range; Variance (population variance, sample variance s²); Standard Deviation (population standard deviation, sample standard deviation s); Coefficient of Variation, $CV = \dfrac{s}{\bar{X}} \times 100\%$; Standard Error of Mean, $SE(\bar{X}) = \dfrac{s}{\sqrt{n}}$]
Midrange
It is a rough estimate of the middle, defined as the sum of the lowest and
highest values in the data set divided by 2.
MR = (Lowest value + Highest value ) / 2
Variance
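• Important measure of variation
• Shows variation about the mean
Ungrouped data (biased): $S^2 = \dfrac{\sum (X - \bar{X})^2}{n}$
• For the sample, ungrouped data (unbiased): $s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$, or $S^2 = \dfrac{1}{n - 1}\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]$
Grouped data (unbiased): $S^2 = \dfrac{1}{\sum f - 1}\left[\sum fX^2 - \dfrac{(\sum fX)^2}{\sum f}\right]$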
X         X²
11.2     125.44
11.9     141.61
12.0     144.00
12.8     163.84
13.4     179.56
14.3     204.49
75.6     958.94

R = X(largest) – X(smallest) = 14.3 – 11.2 = 3.1
$S^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2 = \dfrac{958.94}{6} - \left(\dfrac{75.6}{6}\right)^2 = 1.063$
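The difference between the biased (divide by n) and unbiased (divide by n – 1) variance can be seen with a short Python sketch on the six values above. The function name `variance` and its `unbiased` flag are hypothetical.

```python
def variance(values, unbiased=False):
    """Sum of squared deviations from the mean, divided by n (biased) or n - 1 (unbiased)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1) if unbiased else squared_deviations / n


data = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]

print(round(variance(data), 3))                  # 1.063 (matches the worked value)
print(round(variance(data, unbiased=True), 3))   # 1.276
```

The standard deviation is the square root of either figure, and the coefficient of variation divides that standard deviation by the mean.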
Question # 46
The average score on a Marketing final examination was 80, with a
standard deviation of 9; the average score on a Finance final exam was
120, with a standard deviation of 18. Which class was more consistent?
Ans: Marketing class is more consistent (CV = 11.25% against 15% for Finance)
Question # 47
The average score on an English final examination was 85, with a
standard deviation of 5; the average score on an Accounting final
examination was 110, with a standard deviation of 8. Which class was
more variable? Ans: Accounting class is more variable.
Question # 48
A student calculated mean and standard deviation of 25 values as 20 and
4 respectively. Find the value of coefficient of variation. Ans: 20%
Question # 49
If X = 5.2, 4.4, 3.1, find the variance, mean and coefficient of variation.
Ans: 0.75, 4.23, 20.44%
Question # 50
Two candidates A and B at the BBA (Hons) Examination obtained the following
marks in ten papers. Which of the candidates showed a more consistent
performance?
Ans: (A) X̄ = 62.7, S = 12.72, CV = 20.29%; (B) X̄ = 63.4, S = 17.10, CV = 26.97%. A is more consistent.
Rate (X)    67   68   69   70   72   75
Frequency    1    1    3    5    8    2
Question # 53
Goals recorded by two teams A and B in a football season were as follows:
Number of goals scored in a match     0    1    2    3    4
Number of matches, Team A            24    9    8    5    4
Number of matches, Team B            17    9    6    5    3
By calculating the C.V. in each case, find which team may be considered as
more consistent.
Ans: Mean (A) = 1.12, S(A) = 1.32, CV(A) = 117.86% | Mean (B) = 1.2, S(B) = 1.31, CV(B) = 109.17% , Team B more consistent.
Question # 54
If X = 2, 3, 6, 8, 11. Find SD, mean & coefficient of variation.
Ans: 3.28, 6, 54.67%
Question # 55
or SD(X ± Y) =
Example # 39
The mean and standard deviation of a variable X are 80 and 2
respectively. Find the mean and variance of a new variable if
(i) All the values of X are increased by 20 points.
(ii) All the values of X are increased by 20%.
(iii) All the values of X are decreased by 0.05%.
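Example 39 rests on a standard fact: if every value is transformed as Y = aX + b, the mean becomes a × mean + b, while the variance is multiplied by a² and is unaffected by b. The Python sketch below applies that fact; the function name `transform` and its parameters are hypothetical.

```python
def transform(mean, sd, scale=1.0, shift=0.0):
    """Mean and variance of Y = scale * X + shift, given the mean and SD of X."""
    return scale * mean + shift, (scale * sd) ** 2


# X has mean 80 and standard deviation 2 (variance 4)
mean_i, var_i = transform(80, 2, shift=20)        # (i) every value increased by 20 points
mean_ii, var_ii = transform(80, 2, scale=1.20)    # (ii) every value increased by 20%
mean_iii, var_iii = transform(80, 2, scale=0.9995)  # (iii) every value decreased by 0.05%

print(mean_i, var_i)                              # 100.0 4.0
print(round(mean_ii, 2), round(var_ii, 2))        # 96.0 5.76
print(round(mean_iii, 2), round(var_iii, 3))      # 79.96 3.996
```

Adding a constant shifts the mean but leaves the variance at 4; multiplying by a factor changes both.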
Percentile Formula
The percentile corresponding to a given value (X) is computed by using
the following formula:
$\text{Percentile} = \dfrac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100\%$
Procedure for finding a data value corresponding to a given percentile.
1. Arrange the data in order from lowest to highest.
2. Substitute in the formula
$C = \dfrac{n \times p}{100}$
n = total number of values
p = percentile
Example # 41: A professor gives a 100-point test to 6 students. The
scores are shown below. Find the percentile rank of a score of 82.
88, 92, 86, 97, 78, 82.
What value corresponds to the 45th percentile?
Arrange the data in order from lowest to highest: 78, 82, 86, 88, 92, 97.
Then substitute in the formula. One value (78) lies below 82, so
$\text{Percentile} = \dfrac{1 + 0.5}{6} \times 100\% = 25\text{th percentile}$
Thus, a student whose score was 82 did better than 25% of the class.
For the 45th percentile, substitute in the formula:
$C = \dfrac{6 \times 45}{100} = 2.7$, which is rounded up to 3.
Start at the lowest value and count over to the third value, which is 86. Hence,
the value 86 corresponds to the 45th percentile.
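Both percentile calculations can be sketched in Python. The function names are hypothetical. Rounding a fractional C up follows the example above; the rule used when C is a whole number (average the C-th and (C+1)-th values) is a common textbook convention and is an assumption here, since the text does not state it.

```python
import math


def percentile_rank(values, x):
    """Percentile of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100


def value_at_percentile(values, p):
    """Data value at the p-th percentile, using C = n * p / 100 (for 0 < p < 100)."""
    ordered = sorted(values)
    c = len(ordered) * p / 100
    if c != int(c):
        return ordered[math.ceil(c) - 1]        # fractional C: round up, take that value
    c = int(c)
    return (ordered[c - 1] + ordered[c]) / 2    # whole C: assumed averaging convention


scores = [88, 92, 86, 97, 78, 82]
print(percentile_rank(scores, 82))        # 25.0
print(value_at_percentile(scores, 45))    # 86
```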
Example # 42: A professor gives a 20-point test to 10 students. The
scores are shown below. Find the value that corresponds to the 60th
percentile. 18, 15, 12, 6, 8, 2, 3, 5, 20, 10.
Question # 59
The number of credits in business courses eight job applicants had is
shown here. 9, 12, 15, 27, 33, 45, 63, 72.
(i) Find the percentile for each value.
(ii) What value corresponds to the 40th percentile?
Ans: (i) 6.25th, 18.75th, 31.25th , 43.75th, 56.25th, 68.75th, 81.25th, 93.75th (ii) X = 27
Empirical Rule for Normal Distributions
Approximately 68% of the data values fall within one standard deviation of the mean.
Approximately 95% of the data values fall within two standard deviations of the mean.
Approximately 99.7% of the data values fall within three standard deviations of the mean.
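[Figure: normal curve with the horizontal axis marked at x̄ – 3s, x̄ – 2s, x̄ – 1s, x̄, x̄ + 1s, x̄ + 2s, x̄ + 3s (z = –3 to +3). Areas between successive marks: 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%; 99.7% of the area lies within three standard deviations, and each half of the curve has area 0.5.]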
Standard Z Scores
A variable is said to be standardized, or in standard units, if it is expressed
as a deviation from its mean divided by its standard deviation. It
is denoted by Z. Its mean is zero and its standard deviation is one. The formula is
For samples: $Z = \dfrac{X - \bar{X}}{s}$
For populations: $Z = \dfrac{X - \mu}{\sigma}$
The Z-values, being independent of the units of measurement, provide a basis for comparison
between individual values, even though they belong to different distributions. That is why they
are often used in psychological and educational testing, where they are known as standard
scores. Negative numbers are avoided by multiplying the Z values by 10, an arbitrary standard
deviation, and adding 50, an arbitrary mean. The values so obtained are called standard Z
scores. It is used to produce random numbers that are normally distributed in SPSS. Thus, a
standard Z score is given by the relation
$Z = 50 + 10\left(\dfrac{X - \bar{X}}{s}\right)$
Chebyshev’s Theorem
The proportion of values from a data set that will fall within k standard
deviations of the mean will be at least $1 - \dfrac{1}{k^2}$, where k is a number
greater than 1.
Example # 44.
Find the Z score for each test and state which is higher.
Test A: X = 38, $\bar{X}$ = 40, s = 5
Test B: X = 94, $\bar{X}$ = 100, s = 10
For test A, $Z = \dfrac{X - \bar{X}}{s} = \dfrac{38 - 40}{5} = -0.4$
For test B, $Z = \dfrac{X - \bar{X}}{s} = \dfrac{94 - 100}{10} = -0.6$
The Z score for test A is relatively higher than the Z score for test B.
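The comparison in Example 44 can be sketched in Python. The function names `z_score` and `standard_score` are hypothetical; the second applies the 50 + 10Z rescaling described earlier.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def standard_score(x, mean, sd):
    """Rescaled score 50 + 10 * Z, which avoids negative values."""
    return 50 + 10 * z_score(x, mean, sd)


z_a = z_score(38, 40, 5)      # Test A
z_b = z_score(94, 100, 10)    # Test B
print(z_a, z_b)               # -0.4 -0.6
print("Test A" if z_a > z_b else "Test B", "has the higher relative position")
```

On the rescaled scale the two scores are about 46 and 44, so the ordering is unchanged.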
Question # 60
Which of the following exam grades has a better relative position?
(i) A grade of 43 on a test with $\bar{X}$ = 40 and s = 3.
(ii) A grade of 75 on a test with $\bar{X}$ = 72 and s = 5.
Ans: (i) 1 (ii) 0.6, (i) is higher
Question # 61
A student scores 60 on a mathematics test that has a mean of 54 and a
standard deviation of 3, and she scores 80 on a history test with a mean of
75 and a standard deviation of 2. On which test did she do better than the
rest of the class? Ans: Math = 2.0, History = 2.5, student did better in history.
Question # 62
Which score indicates the highest relative position?
(i) A score of 3.2 on a test with $\bar{X}$ = 4.6 and s = 1.5.
(ii) A score of 630 on a test with $\bar{X}$ = 800 and s = 200.
(iii) A score of 43 on a test with $\bar{X}$ = 50 and s = 5.
Ans: (i) - 0.93 (ii) -0.85 (iii) - 1.4, part (ii) is highest
Question # 63
Which score has the highest relative position?
(i) A score of 12 on a test with $\bar{X}$ = 10 and s = 4.
(ii) A score of 170 on a test with $\bar{X}$ = 120 and s = 32.
(iii) A score of 180 on a test with $\bar{X}$ = 60 and s = 8.
Ans: (i) 0.5 (ii) 1.6 (iii) 15, part (iii) is highest
Question # 64
A set of data is mounded, with a mean of 450 and a variance of 625.
Approximately what proportion of the observations is
(i) Greater than 425?
(ii) Less than 500?
(iii) Greater than 525?
Ans: (i) Approx 84% of the observations will be greater than 425. (ii) Approx 97.5% of the observations will be less than 500. (iii) Approx 0% of the observations will be greater than 525.
Example # 45.
The mean price of houses in a certain neighborhood is $50000, and the
standard deviation is $10000. Find the price range for which at least 75% of
the houses will sell.
At least 75% corresponds to k = 2, since 1 – 1/2² = 0.75.
X̄ ± kS = $50000 ± 2($10000)
$50000 − 2($10000) = $30000
$50000 + 2($10000) = $70000
Hence, at least 75% of all homes sold in the area will have a price
between $30000 and $70000.
Example # 46.
A survey of local companies found that the mean amount of travel
allowance for executives was $0.25 per mile. The standard
deviation was $0.02. Using Chebyshev’s theorem, find the minimum
percentage of the data values that will fall between $0.20 and $0.30.
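Chebyshev's theorem can be used in two directions: from k to a minimum proportion, or from a required proportion back to an interval. The Python sketch below shows both; the function names are hypothetical, and the second function simply solves 1 – 1/k² = fraction for k.

```python
import math


def chebyshev_min_fraction(k):
    """Smallest fraction of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, sd, fraction):
    """Interval mean ± k*sd guaranteed to contain at least `fraction` of the values."""
    k = math.sqrt(1 / (1 - fraction))
    return mean - k * sd, mean + k * sd


# House prices: mean $50000, SD $10000, at least 75% of the values
print(chebyshev_interval(50000, 10000, 0.75))     # (30000.0, 70000.0)

# Travel allowance: mean $0.25, SD $0.02, limits $0.20 and $0.30
k = (0.30 - 0.25) / 0.02                          # limits are about 2.5 SDs from the mean
print(round(chebyshev_min_fraction(k), 2))        # 0.84
```

For the travel-allowance data this gives a minimum of about 84% of the values between $0.20 and $0.30.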
Question # 65
Use Chebyshev's theorem to answer each of the following for a set of
observations with a mean of 250 and a standard deviation of 20.
Approximately what proportion of the observations is
(i) Between 190 and 310 (ii) Between 210 and 290
(iii) Between 230 and 270 (iv) Less than 215 or more than 285
Ans: (i) 88.89% (ii) 75% (iii) 0% (iv) 33%
Question # 66
The average cost of a certain type of grass seed is $4.00 per box. The
standard deviation is $0.10. Using Chebyshev’s theorem, find the minimum
percentage of data values that will fall in the range of $3.82 to $4.18.
Ans: 69%
Question # 67
Using Chebyshev's theorem, solve the following problems for a distribution
with a mean of 80 and a standard deviation of 10.
(i) At least what percentage of values will fall between 60 and 100?
(ii) At least what percentage of values will fall between 65 and 95?
Ans: (i) 75% (ii) 56%
Question # 68
The mean of a distribution is 20 and standard deviation is 2. Answer each
using Chebyshev's.
(i) At least what percentage of values will fall between 10 and 30?
(ii) At least what percentage of values will fall between 12 and 28?
Ans: (i) 96% (ii) 93.75%
Question # 69
The average delivery charge for a refrigerator is $32. The standard
deviation is $4. Find the minimum percentage of data values that will fall in
the range of $20 to $44. Use Chebyshev’s theorem. Ans: 88.89%
Question # 70
For a certain type of job, it costs a company an average of $231 to train an
employee to perform the task. The standard deviation is $5. Find the
minimum percentage of data values that will fall in the range of $219 to
$243. Use Chebyshev’s theorem. Ans: 83%
Question # 71
The average score on a special test of knowledge of wood refinishing has a
mean of 53 and a standard deviation of 6. Using Chebyshev’s theorem, find
the range of values in which at least 75% of the scores will lie.
Ans: 41 ~ 65
Question # 72
A survey of several leading brands of cereal shows that the mean content
of potassium per serving is 95 milligrams, and the standard deviation is 2
milligrams. Find the values in which at least 88.89% of the data will fall. Use
Chebyshev‘s theorem. Ans: 89 ~ 101
Question # 73
A sample of the hourly wages of employees who work in restaurants in a
large city has a mean of $5.02 and a standard deviation of $0.09. Using
Chebyshev’s theorem, find the range in which at least 75% of the data
values will fall. Ans: $4.84 ~ $5.20
Question # 74
A sample of the labour costs per hour to assemble a certain product has a
mean of $2.60 and a standard deviation of $0.15. Using Chebyshev’s
theorem, find the values in which at least 88.89% of the data will lie.
Ans: $2.15 ~ $3.05
Question # 75
In a distribution of 200 values, the mean is 50 and the standard deviation is
5. Answer each using Chebyshev's theorem.
(i) At least how many values will fall between 30 and 70?
(ii) At most how many values will be less than 40 or more than 60?
Ans: (i) 93.75% (188 values) (ii) 25% (50 values)
Question # 76
In a distribution of 300 values, the mean is 50 and the standard deviation is
15. Answer each using Chebyshev's theorem.
(i) At least how many values will fall between 20 and 80?
(ii) At most how many values will be less than 30 or more than 70?
Ans: (i) 75% (225 values) (ii) 56.25% (169 values)
Question # 77
A study of the nicotine contents of a certain brand of cigarette shows that
on the average one cigarette contains 1.52 milligrams of nicotine with a
standard deviation of 0.07 milligram. According to Chebyshev's theorem,
between what values must the nicotine content be for
(i) At least 24/25 of all cigarettes of this brand?
(ii) At least 48/49 of all cigarettes of this brand?
Ans: (i) 1.17 ~ 1.87 (ii) 1.03 ~ 2.01
Question # 78
Old Faithful is a famous geyser at Yellowstone National Park. From a
sample with n = 32, the mean duration of Old Faithful’s eruptions is 3.32
minutes, and the standard deviation is 1.09 minutes. Using Chebyshev’s
Theorem, determine at least how many of the eruptions lasted between
1.14 minutes and 5.5 minutes.
Ans: k = 2, 75%, No of eruptions = 0.75 × 32 = 24
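The Chebyshev questions above all use the same bound: at least 1 - 1/k^2 of the values lie within k standard deviations of the mean (for k > 1). The following is a minimal Python sketch of that arithmetic; the function names are illustrative and are not part of the original notes.

```python
import math

def chebyshev_min_fraction(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k ** 2

def k_for_fraction(fraction):
    """Value of k for which at least `fraction` of the values are covered."""
    return 1 / math.sqrt(1 - fraction)

def chebyshev_interval(mean, sd, k):
    """Interval mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

# Training-cost question: mean 231, sd 5, range 219 to 243, so k = 12 / 5.
training_fraction = chebyshev_min_fraction((243 - 231) / 5)

# Question # 71: mean 53, sd 6, at least 75% of the scores.
q71_interval = chebyshev_interval(53, 6, k_for_fraction(0.75))

# Question # 76 (ii): mean 50, sd 15, values below 30 or above 70 (k = 20 / 15).
q76_outside = 1 - chebyshev_min_fraction(20 / 15)

print(round(100 * training_fraction, 1), q71_interval, round(100 * q76_outside, 2))
```

Multiplying a fraction by the number of values and rounding gives the counts asked for in Questions # 75, # 76 and # 78.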
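Moments
A moment is the average of the deviations of the values from a chosen point, each
raised to a given power; the order of the moment is the power to which the deviations
are raised before averaging. Moments taken about the mean are called the central
moments or the mean moments and are used to describe a set of data.
The first four moments about the mean are defined as follows.
Ungrouped data:
\[ \mu_1 = \frac{\sum (X - \bar{X})}{n} = 0, \qquad \mu_2 = \frac{\sum (X - \bar{X})^2}{n} = S^2 \]
\[ \mu_3 = \frac{\sum (X - \bar{X})^3}{n}, \qquad \mu_4 = \frac{\sum (X - \bar{X})^4}{n} \]
Grouped data:
\[ \mu_1 = \frac{\sum f(X - \bar{X})}{\sum f} = 0, \qquad \mu_2 = \frac{\sum f(X - \bar{X})^2}{\sum f} = S^2 \]
\[ \mu_3 = \frac{\sum f(X - \bar{X})^3}{\sum f}, \qquad \mu_4 = \frac{\sum f(X - \bar{X})^4}{\sum f} \]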
Moment Ratios
Moment ratios are ratios in which both the numerator and the denominator are
moments. The most common of these are $\beta_1$ and $\beta_2$, defined by the relations
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \]
For a symmetrical distribution $\beta_1 = 0$. $\beta_2$ is used to explain the shape of the
curve and is a measure of peakedness. $\beta_1$ is known as the moment coefficient of
skewness and $\beta_2$ is known as the moment coefficient of kurtosis.
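Example # 47
Find the first four moments about the mean for the values 2, 4, 6, 8, 10. Also
find $\beta_1$ and $\beta_2$. State whether the distribution is leptokurtic or platykurtic.
X     $X - \bar{X}$   $(X - \bar{X})^2$   $(X - \bar{X})^3$   $(X - \bar{X})^4$
2     –4              16                  –64                 256
4     –2              4                   –8                  16
6     0               0                   0                   0
8     2               4                   8                   16
10    4               16                  64                  256
30    0               40                  0                   544
\[ \bar{X} = \frac{\sum X}{n} = \frac{30}{5} = 6 \]
\[ \mu_1 = \frac{\sum (X - \bar{X})}{n} = \frac{0}{5} = 0 \]
\[ \mu_2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{40}{5} = 8 \]
\[ \mu_3 = \frac{\sum (X - \bar{X})^3}{n} = \frac{0}{5} = 0 \]
\[ \mu_4 = \frac{\sum (X - \bar{X})^4}{n} = \frac{544}{5} = 108.8 \]
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(0)^2}{(8)^3} = 0 \]
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{108.8}{(8)^2} = 1.7 \]
Since $\beta_2 = 1.7$ is less than 3, the distribution is platykurtic.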
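The arithmetic of Example # 47 can be checked with a short Python sketch. It assumes the divisor-n definition of the moments used in this section and treats ungrouped data as grouped data with every frequency equal to 1; the function names and the optional `freqs` argument are illustrative choices, not part of the original notes.

```python
def central_moments(values, freqs=None):
    """Return the mean and the first four moments about the mean (divisor n)."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: each value counted once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    moments = [
        sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n
        for r in range(1, 5)
    ]
    return mean, moments

def moment_ratios(moments):
    """Return (beta1, beta2) = (m3^2 / m2^3, m4 / m2^2)."""
    _, m2, m3, m4 = moments
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

def kurtosis_type(beta2, tol=1e-9):
    """Classify the curve by comparing beta2 with 3."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"

mean, moments = central_moments([2, 4, 6, 8, 10])
beta1, beta2 = moment_ratios(moments)
print(mean, moments, beta1, beta2, kurtosis_type(beta2))
```

For a frequency distribution, pass the class marks as `values` and the frequencies as `freqs`.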
Example # 48
Find the first four moments about the mean from the following distribution. Also
find $\beta_1$ and $\beta_2$. State whether the distribution is leptokurtic or platykurtic.
Classes 1–3 3–5 5–7 7–9
Frequency 40 30 20 10
Question # 80
Find the first four moments about the mean from the following distribution. Also
find $\beta_1$ and $\beta_2$.
X    74.5   94.5   114.5   134.5   154.5   174.5   194.5
f    9      10     17      10      5       4       5
Ans: (i) 0, 1216, 23104, 3717632 (ii) 0.297, 2.51
Question # 81
The first four moments about the arithmetic mean of a distribution are 0, 4,
6 and 48. Find $\beta_2$. Ans: 3
Question # 82
The following information was obtained from a frequency distribution of patients’
weights: $\sum f(X - \bar{X}) = 0$, $\sum f(X - \bar{X})^2 = 124$, $\sum f(X - \bar{X})^3 = 180$ and total number
of patients = 48. Find $\beta_1$. Ans: 0.819
Symmetrical Distribution.
A distribution is said to be symmetrical when the data values are
evenly distributed about the mean. In a symmetrical distribution, a
deviation below the mean is equal to the corresponding deviation above
the mean. In a symmetrical distribution
(i) Mean = Median = Mode
(ii) Q3 – Median = Median – Q1
(iii) $\mu_3 = 0$
(iv) $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0$
[Figure: symmetrical frequency curve with Mean, Median and Mode at the centre]
Skewness.
(a) Positive Skewness.
When a distribution departs from symmetry, the mean, median and mode
are pulled apart and one tail becomes longer than the other. If the
frequency curve has a longer tail to the right, as in the figure, the distribution is
said to be positively skewed.
In a positively skewed distribution
(i) Mean > Median > Mode
(ii) Q3 – Median > Median – Q1
(b) Negative Skewness.
If the frequency curve has a longer tail to the left, as in the figure, the distribution is
said to be negatively skewed.
In a negatively skewed distribution
(i) Mean < Median < Mode
(ii) Q3 – Median < Median – Q1
Distribution Shape
Symmetric or Skewed
Example # 49
Using box plots, compare the two distributions of the sodium contents of a
sample of real cheese and a cheese substitute.
Real Cheese:        310  420  45   40   220  240  180  90
Cheese Substitute:  270  180  250  290  130  260  340  310
Real Cheese: Q1 = 67.5, Q2 = 200, Q3 = 275, Min = 40, Max = 420.
Cheese Substitute: Q1 = 215, Q2 = 265, Q3 = 300, Min = 130, Max = 340.
[Figure: box plots of the Cheese Substitute and Real Cheese data on a common scale from 0 to 500]
Compare the plots. It is quite apparent that the distribution for the cheese
substitute data has a higher median than the distribution for the real cheese data.
The variation or spread for the distribution of the real cheese data is
larger than the variation for the distribution of the cheese substitute data.
Question # 83
The number of previous jobs held by each of six applicants is shown here.
2, 4, 5, 6, 8, 9. Construct a box plot and comment on the nature of the
distribution. Ans: Q1 = 4, Q2 = 5.5, Q3 = 8, Min = 2, Max = 9, positively skewed.
Question # 84
The number of credits in business courses eight job applicants had is shown
here. 9, 10, 15, 27, 44, 45, 53, 67.
Construct a boxplot and comment on the nature of the distribution.
Ans: Q1 = 12.5, Q2 = 35.5, Q3 = 49, Min = 9, Max = 67, negatively skewed.
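The quartiles quoted in Example # 49 and in Questions # 83 and # 84 are consistent with taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data. The Python sketch below works under that assumption (other quartile conventions give slightly different values); the function names are illustrative. The shape comment uses the rule stated earlier in this section, comparing Q3 – Median with Median – Q1.

```python
from statistics import median

def five_number_summary(data):
    """Min, Q1, Q2, Q3, Max using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For an odd count the middle value belongs to neither half.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return {"min": xs[0], "q1": median(lower), "q2": median(xs),
            "q3": median(upper), "max": xs[-1]}

def shape(summary):
    """Compare the two halves of the box to judge the direction of skewness."""
    right = summary["q3"] - summary["q2"]
    left = summary["q2"] - summary["q1"]
    if right > left:
        return "positively skewed"
    if right < left:
        return "negatively skewed"
    return "symmetric"

real_cheese = [310, 420, 45, 40, 220, 240, 180, 90]
substitute = [270, 180, 250, 290, 130, 260, 340, 310]
jobs = [2, 4, 5, 6, 8, 9]          # Question # 83
credits = [9, 10, 15, 27, 44, 45, 53, 67]  # Question # 84

for name, data in [("Real Cheese", real_cheese), ("Cheese Substitute", substitute),
                   ("Jobs", jobs), ("Credits", credits)]:
    s = five_number_summary(data)
    print(name, s, shape(s))
```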
Kurtosis.
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} \]
It measures the peakedness of the distribution.
[Figure: frequency curve labelled Leptokurtic, centred at the Mean]
(i) Leptokurtic.
The distribution is said to be leptokurtic when $\beta_2$ is greater than 3; the
curve is more sharply peaked and has wider tails than the normal curve.
(ii) Mesokurtic.
The distribution is said to be mesokurtic or normal when $\beta_2 = 3$; the curve
is neither flat nor highly peaked.
(iii) Platykurtic.
The distribution is said to be platykurtic when $\beta_2$ is less than 3; the curve
has a flatter top and relatively narrower tails than the normal curve.
Coefficient of Skewness
Bowley’s coefficient of skewness:
\[ Sk = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} \]
Its value lies between ±1.
Karl Pearson’s coefficient of skewness (in terms of the mode):
\[ Sk = \frac{\text{Mean} - \text{Mode}}{S} \]
Its value usually lies between ±1.
Karl Pearson’s coefficient of skewness (in terms of the median):
\[ Sk = \frac{3(\text{Mean} - \text{Median})}{S} \]
Its value lies between ±3.
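As an illustration of these coefficients, the Python sketch below evaluates Bowley’s coefficient from the quartiles given in the answers to Questions # 83 and # 84, and Karl Pearson’s mode-based coefficient from the answer values of Question # 86. The function names are illustrative and not part of the original notes.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

def pearson_mode_skewness(mean, mode, sd):
    """Karl Pearson's coefficient based on the mode: (Mean - Mode) / S."""
    return (mean - mode) / sd

def direction(sk):
    """A positive coefficient means a longer right tail, a negative one a longer left tail."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"

sk_q83 = bowley_skewness(4, 5.5, 8)          # quartiles from Question # 83
sk_q84 = bowley_skewness(12.5, 35.5, 49)     # quartiles from Question # 84
sk_q86 = pearson_mode_skewness(33.8, 29.42, 11.6)  # values from Question # 86

for label, sk in [("Q83", sk_q83), ("Q84", sk_q84), ("Q86", sk_q86)]:
    print(label, round(sk, 4), direction(sk))
```

The signs agree with the directions of skewness stated in the answers to those questions.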
Determining Normality
There are several ways of checking normality. The easiest way is to draw
a histogram for the data and check its shape. If the histogram is not
approximately bell-shaped, then the data are not normally distributed.
Skewness can be checked by using Pearson’s Index of skewness (PI).
The formula is
\[ PI = \frac{3(\bar{X} - \text{Median})}{s} \]
If the index is greater than or equal to +1 or less than or equal to –1, it
can be concluded that the data are significantly skewed. In addition, the
data should be checked for outliers. If there are two or more, then
reject normality.
Example # 50.
The data shown consist of the number of games
played each year in the career of Baseball Hall
of Famer Bill Mazeroski. Check for normality.
81 148 152 135 151 152 159 142 34 162
130 162 163 143 67 112 70
Mean = 127.24, Median = 143, St Dev = 38.68
Check for normality:
\[ PI = \frac{3(127.24 - 143)}{38.68} = -1.222 \]
Since the PI is less than –1, it can be concluded that the distribution is
significantly skewed to the left. There is no outlier in the data (already
solved in example # 26).
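The figures in Example # 50 can be reproduced with Python’s standard `statistics` module. This is a sketch under one assumption: the standard deviation is computed with divisor n (`pstdev`), because that form reproduces the 38.68 quoted in the example, whereas the divisor n – 1 form gives a larger value. The function name is illustrative.

```python
from statistics import mean, median, pstdev

games = [81, 148, 152, 135, 151, 152, 159, 142, 34, 162,
         130, 162, 163, 143, 67, 112, 70]

def pearson_index(data):
    """Pearson's index of skewness: 3 * (mean - median) / standard deviation."""
    # pstdev divides by n (assumption made to match the example's 38.68).
    return 3 * (mean(data) - median(data)) / pstdev(data)

index = pearson_index(games)
significantly_skewed = abs(index) >= 1  # rule stated in the notes

print(round(mean(games), 2), median(games), round(pstdev(games), 2))
print(round(index, 2), significantly_skewed)
```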
Standard Error of Skewness.
\[ \text{Std. Error of Sk} = \sqrt{\frac{6\,n\,(n - 1)}{(n - 2)(n + 1)(n + 3)}} \]
The ratio of skewness to its standard error can be used as a test of normality
(that is, you can reject normality if the ratio is less than –2 or greater than +2).
A large positive value for skewness indicates a long right tail; an extreme
negative value indicates a long left tail.
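A small Python sketch of this rule follows. It assumes the square-root form of the standard error, sqrt(6n(n – 1) / ((n – 2)(n + 1)(n + 3))), and takes the skewness value as an input, since the notes do not state which skewness statistic is intended. The function names and the skewness value 1.5 in the usage lines are hypothetical.

```python
import math

def std_error_of_skewness(n):
    """Standard error of skewness for a sample of size n (requires n >= 3)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def skewness_ratio(skewness, n):
    """Return the ratio skewness / standard error and whether it falls outside -2 to +2."""
    ratio = skewness / std_error_of_skewness(n)
    return ratio, abs(ratio) > 2

se_5 = std_error_of_skewness(5)
ratio, reject_normality = skewness_ratio(1.5, 17)  # hypothetical skewness value
print(round(se_5, 4), round(ratio, 2), reject_normality)
```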
Example # 52.
Annual salaries for a sample of five employees are
$39000 $37500 $35200 $40400 $100000
Describe the central tendency and symmetry of the data.
Question # 86
The cost per load (in cents) of 35 laundry detergents tested by a consumer
organization is shown below. Calculate the coefficient of skewness using
Karl Pearson’s Method, also interpret your result.
Class Limits   13 – 19   20 – 26   27 – 33   34 – 40   41 – 47   48 – 54   55 – 61   62 – 68
Frequency      2         7         12        5         6         1         0         2
Ans: Mean = 33.8, S = 11.6, Mode = 29.42, SK = 37.76%, positively skewed.
Question # 87
Draw an ogive and locate median, quartiles, D4, D7, P10, P90 & IQR
graphically
Classes 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-99 100-109 110-119
f 1 4 17 28 25 18 13 6 5 2 1
The necessary calculations are given below, and graphic locations are in
Fig. 3.1
Classes      f     Class Boundaries   C
10 – 19      1     9.5–19.5           1
20 – 29      4     19.5–29.5          5
30 – 39      17    29.5–39.5          22
40 – 49      28    39.5–49.5          50
50 – 59      25    49.5–59.5          75
60 – 69      18    59.5–69.5          93
70 – 79      13    69.5–79.5          106
80 – 89      6     79.5–89.5          112
90 – 99      5     89.5–99.5          117
100 – 109    2     99.5–109.5         119
110 – 119    1     109.5–119.5        120
The approximate values of median, quartiles D4, D7, P10, P90 & IQR can be
located from an ogive.
Fig. 3.1
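Reading a value off an ogive amounts to linear interpolation between the plotted points (upper class boundary, cumulative frequency). The Python sketch below does that interpolation numerically for the table of Question # 87, so the graphical readings can be checked. The function name is illustrative, and the quantile position is taken as p × n, which is an assumption about the convention used.

```python
def ogive_quantile(boundaries, freqs, p):
    """Value below which a fraction p of the data lies, by linear
    interpolation along the ogive (cumulative frequency polygon)."""
    target = p * sum(freqs)
    cum = 0
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    return boundaries[-1]

# Class boundaries 9.5, 19.5, ..., 119.5 and frequencies from Question # 87.
boundaries = [9.5 + 10 * i for i in range(12)]
freqs = [1, 4, 17, 28, 25, 18, 13, 6, 5, 2, 1]

q1 = ogive_quantile(boundaries, freqs, 0.25)
med = ogive_quantile(boundaries, freqs, 0.50)
q3 = ogive_quantile(boundaries, freqs, 0.75)
d4 = ogive_quantile(boundaries, freqs, 0.40)
d7 = ogive_quantile(boundaries, freqs, 0.70)
p10 = ogive_quantile(boundaries, freqs, 0.10)
p90 = ogive_quantile(boundaries, freqs, 0.90)
iqr = q3 - q1

print(round(q1, 2), round(med, 2), round(q3, 2), round(iqr, 2))
print(round(d4, 2), round(d7, 2), round(p10, 2), round(p90, 2))
```

Values read from a hand-drawn ogive will only approximate these figures.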
The necessary calculations are given below, and graphic location is in fig 3.2
Classes    f     Class Boundaries
7 – 11     2     6.5–11.5
12 – 16    6     11.5–16.5
17 – 21    11    16.5–21.5
22 – 26    18    21.5–26.5
27 – 31    22    26.5–31.5
32 – 36    17    31.5–36.5
37 – 41    10    36.5–41.5
42 – 46    7     41.5–46.5
47 – 51    5     46.5–51.5
Fig 3.2
Question # 89
Question # 90
Ans: S = 0.831
Solution: X: 1 2 3 4, f: 5 15 12 3, ΣfX = 83, ΣfX² = 221, Σf = 35, S = 0.831
Question # 91
Circle the correct option i.e. A / B / C / D.
(1) If any value in a series is zero, then we cannot calculate the:
(A) Mean (B) Geometric mean
(C) Mode (D) Median
(2) The empirical relationship between mean, median and mode: the value of mode =
(A) 3 Mean – 2 Median (B) 2 Mean – 3 Median
(C) 3 Median – 2 Mean (D) 2 Median – 3 Mean
(3) If each value of a variable is increased by 10%, the GM of the new variable is increased by:
(A) 10 (B) 0.01
(C) 10% (D) 0.11
(4) If mean of 5 values is 10, then the sum of the values will be:
(A) 2 (B) 15
(C) 25 (D) 50
(5) If the arithmetic mean of two numbers X1 and X2 is 5 and X1 = 3, then X2 is:
(A) 3 (B) 5
(C) 7 (D) 10
(6) In a moderately skewed distribution, the mean is 11 and the median is 13 then the
value of mode is:
(A) 15 (B) 13
(C) 11 (D) 17
(7) The lack of uniformity or symmetry is called:
(A) Skewness (B) Dispersion
(C) Kurtosis (D) Standard deviation
Answer: 1. B 2. C 3. C 4. D 5. C 6. D 7. A
CRITICAL THINKING PROBLEM
( No # 1 )
SKILL CHECK
An Internet site compares the strokes
per round of two professional golfers.
Which golfer is more consistent: Player
A with $\bar{x}$ = 71.5 strokes and s = 2.3
strokes, or Player B with $\bar{x}$ = 70.1
strokes and s = 1.2 strokes? Explain.
Answer: Player B
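One way to support this answer is to compare relative variability. The Python sketch below assumes the coefficient of variation (100 × s / mean) is the intended measure of consistency; Player B also has the smaller standard deviation outright. The function name is illustrative.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(71.5, 2.3)  # Player A
cv_b = coefficient_of_variation(70.1, 1.2)  # Player B

# The smaller coefficient of variation indicates the more consistent golfer.
more_consistent = "Player A" if cv_a < cv_b else "Player B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)
```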