MEFall2023 3

The document discusses measures of central tendency, including mean, median, and mode, which are used to summarize data with a single representative value. It provides definitions, formulas, and examples for calculating these measures, highlighting their sensitivity to outliers and their applicability to grouped data. Additionally, the document covers quantiles, particularly quartiles, which divide data into equal parts to provide insights into data distribution.

Uploaded by Muhammad Ibrahim

Probability and Random Variables

The math, the computation, and examples.

Prof. Dr. Asad Ali

Department of Applied Mathematics and Statistics


Institute of Space Technology
Islamabad, Pakistan


Chapter 3: Descriptive Statistics


Measures of Central Tendency


Instead of describing or presenting a whole data set, we sometimes use a single value to represent it. This makes it easier to compare two or more data sets of the same type. A data set can be summarized by a single value, usually located somewhere in the center of the data. It is the value around which the data tend to concentrate; the point at which the distribution is in balance. These measures are also called averages, or measures of location or position.
The most commonly used measures of central tendency are the mean, the median and the mode.
Mean: Also called the arithmetic mean, the mean is typically what is meant by the word “average”. It is perhaps the most common measure of central tendency. The mean of a data set is given by
$$\text{Mean} = \frac{\text{the sum of all data values}}{\text{the number of values}}$$
For example, the mean of 4, 8, and 9 is $\frac{4+8+9}{3} = \frac{21}{3} = 7$.
The sample mean is written as x̄, and the population mean as the Greek letter mu (µ). Despite its
popularity, the mean may not be an appropriate measure of central tendency in skewed distributions,
or in data with outliers.


Definition
The sample mean $\bar{x}$ of a set of n observations $x_1, x_2, \ldots, x_n$ is defined as
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}.$$
You can omit the limits (i = 1 to n) of the summation symbol and simply write $\sum$.
For grouped data arranged in a frequency table with k classes having midpoints $x_1, x_2, \ldots, x_k$ and frequencies $f_1, f_2, \ldots, f_k$, the mean is given by
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}.$$
Note that $\sum_{i=1}^{k} f_i = n$.
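The two definitions above translate almost directly into code. The following is a minimal Python sketch; the function names `sample_mean` and `grouped_mean` are our own illustrative choices, and the small grouped table in the usage lines is hypothetical.

```python
from typing import Sequence


def sample_mean(values: Sequence[float]) -> float:
    """Raw data: x̄ = (x1 + x2 + ... + xn) / n."""
    if len(values) == 0:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


def grouped_mean(midpoints: Sequence[float], freqs: Sequence[int]) -> float:
    """Grouped data: x̄ = Σ f_i x_i / Σ f_i, where Σ f_i = n."""
    if len(midpoints) != len(freqs):
        raise ValueError("midpoints and freqs must have the same length")
    n = sum(freqs)  # total frequency equals the sample size n
    if n <= 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for f, x in zip(freqs, midpoints)) / n


print(sample_mean([4, 8, 9]))                 # 7.0, as in the example above
# Hypothetical grouped data: midpoints 10, 20, 30 with frequencies 1, 2, 1
print(grouped_mean([10, 20, 30], [1, 2, 1]))  # (10 + 40 + 30) / 4 = 20.0
```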


Example 9.
In a mathematics test, the marks of six students are 36, 52, 57, 58, 67 and 90. Find the mean of their
marks.
Solution:
The mean is given by
$$\bar{x} = \frac{\sum x_i}{n} = \frac{36 + 52 + 57 + 58 + 67 + 90}{6} = \frac{360}{6} = 60 \text{ marks}$$

The arithmetic mean is quite sensitive to a change in any single value, which makes it an inappropriate measure under certain circumstances. It gives good results when the observations are reasonably similar. Its value can be greatly affected by the presence of a single outlier (an extreme observation). For example, if in the above example one value, say 52, had been mistakenly recorded as 352, the resulting mean would be 660/6 = 110, which is quite different from the previous one and could lead to a different decision about the data.
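A quick way to see this sensitivity is to recompute the mean of Example 9 with the mistyped value. The short Python sketch below does exactly that; the variable names are illustrative.

```python
def mean(xs):
    """Arithmetic mean: sum of the values divided by their number."""
    return sum(xs) / len(xs)


marks = [36, 52, 57, 58, 67, 90]      # Example 9
mistyped = [36, 352, 57, 58, 67, 90]  # the same marks with 52 recorded as 352

print(mean(marks))     # 360 / 6 = 60.0
print(mean(mistyped))  # 660 / 6 = 110.0
```

A single wrong entry shifts the mean by 50 marks, even though five of the six values are unchanged.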


Example 10.
Find the mean for data in Example 2.
Solution:
Let's put the necessary columns in the frequency table obtained in Example 2.
Classes    Midpoints (x)    f     fx
360-369    364.5            2     729.0
370-379    374.5            3     1123.5
380-389    384.5            5     1922.5
390-399    394.5            7     2761.5
400-409    404.5            5     2022.5
410-419    414.5            4     1658.0
420-429    424.5            3     1273.5
430-439    434.5            1     434.5
Total                       30    11925.0
Thus the mean is
$$\bar{x} = \frac{\sum_{i=1}^{8} f_i x_i}{\sum_{i=1}^{8} f_i} = \frac{11925}{30} = 397.5$$
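The table calculation can be reproduced in a few lines, which is a handy check on the fx column and its total. This Python sketch hard-codes the midpoints and frequencies from the table above; nothing beyond those numbers is assumed.

```python
midpoints = [364.5, 374.5, 384.5, 394.5, 404.5, 414.5, 424.5, 434.5]
freqs = [2, 3, 5, 7, 5, 4, 3, 1]

fx = [f * x for f, x in zip(freqs, midpoints)]  # the fx column
n = sum(freqs)                                  # Σ f = 30
total_fx = sum(fx)                              # Σ fx = 11925.0
grouped_mean = total_fx / n                     # 11925 / 30 = 397.5

for x, f, p in zip(midpoints, freqs, fx):
    print(f"{x:6.1f} {f:3d} {p:8.1f}")
print("n =", n, " sum fx =", total_fx, " mean =", grouped_mean)
```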


Median: The median is the middle value of a data set when the values are arranged in ascending or descending order. It is the value above and below which 50% of the ordered data lie. The median is insensitive to outliers.

Definition
The sample median x̃ (the “∼” is pronounced “tilde”, till-day) of a set of n observations $x_1, x_2, \ldots, x_n$ is obtained as follows. Arrange the n observations in ascending (or descending) order.
(i) If n is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation.
(ii) If n is even, the median is the mean of the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2}+1\right)$th observations, i.e., the two middle observations.
For grouped data, the median is calculated by the formula
$$\tilde{x} = l + \frac{h}{f}\left(\frac{n}{2} - C\right)$$
where $l$ is the lower class boundary of the median class, $h$ is the class width, $f$ is the frequency of the median class, and $C$ is the cumulative frequency of the class preceding the median class. The median class is the class containing the $\left(\frac{n}{2}\right)$th observation; there is no need to worry about whether n is odd or even.
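The odd/even rule for raw data can be written as a small function. This is a Python sketch; `median_raw` is an illustrative name, and the positions in the comments are 1-based as in the definition, while Python lists are 0-based. (Python's standard library also provides `statistics.median`, which follows the same odd/even rule.)

```python
def median_raw(values):
    """Sample median: the middle observation of the ordered data."""
    xs = sorted(values)  # step 1: arrange in ascending order
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th observation (1-based)
        return xs[(n + 1) // 2 - 1]
    # n even: mean of the (n / 2)th and (n / 2 + 1)th observations (1-based)
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


print(median_raw([3, 1, 2]))     # odd n: 2
print(median_raw([4, 1, 3, 2]))  # even n: (2 + 3) / 2 = 2.5
```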


Example 11.
Consider the following 12 data points.
15.2 9.3 7.6 11.9 10.4 9.7
20.4 9.4 11.5 16.2 9.4 8.3

(a) Find the median for the full data.


(b) Omit the largest (or the smallest) observation and find the median again.
Solution:
Rearranging the values in ascending order, we have
7.6 8.3 9.3 9.4 9.4 9.7 10.4 11.5 11.9 15.2 16.2 20.4

(a) Here n = 12 is even, and hence the median is
$$\tilde{x} = \text{mean of the } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2}+1\right)\text{th observations}.$$
The two middle values (the 6th and 7th) are shown in brackets in the ordered data:
7.6 8.3 9.3 9.4 9.4 [9.7 10.4] 11.5 11.9 15.2 16.2 20.4

$$\tilde{x} = \frac{9.7 + 10.4}{2} = 10.05$$


(b) Let's omit the last observation, 20.4, and calculate the median again. Since n = 11 is now odd, the median is just the middle observation, that is, the $\left(\frac{n+1}{2}\right)$th observation, which is the 6th here:
7.6 8.3 9.3 9.4 9.4 [9.7] 10.4 11.5 11.9 15.2 16.2

Thus the median is x̃ = 9.7.


For n = 12, the mean is $\bar{x} = \frac{139.3}{12} \approx 11.61$. Now let's replace 20.4 by 50.0: the mean becomes $\bar{x} = \frac{168.9}{12} = 14.075$, but the median remains unchanged at 10.05. Thus the median is insensitive to outliers.
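The comparison above is easy to reproduce. The following Python sketch uses the standard-library `statistics.mean` and `statistics.median`; replacing 20.4 by 50.0 is the same hypothetical change as in the text.

```python
from statistics import mean, median

data = [15.2, 9.3, 7.6, 11.9, 10.4, 9.7,
        20.4, 9.4, 11.5, 16.2, 9.4, 8.3]  # Example 11

with_outlier = list(data)
with_outlier[data.index(20.4)] = 50.0     # replace the largest value by 50.0

for label, xs in (("original", data), ("20.4 -> 50.0", with_outlier)):
    # the mean moves with the extreme value; the two middle values do not change
    print(f"{label:12s} mean = {mean(xs):.3f}  median = {median(xs):.2f}")
```

It should report a mean of 11.608 and then 14.075, with the median at 10.05 in both cases.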

Example 12.
Find the median for grouped data in Example 2.
Solution:
Let's put the necessary columns (class boundaries, CBs, and cumulative frequencies, cf) in the frequency table obtained in Example 2. First we find the median class using n/2 = 30/2 = 15. Since 10 < 15 ≤ 17, the 15th observation falls in the class whose cumulative frequency is 17, so 389.5-399.5 is our median class, where the median lies.
Classes    CBs            f    cf
360-369    359.5-369.5    2    2
370-379    369.5-379.5    3    5
380-389    379.5-389.5    5    10   ← C
390-399    389.5-399.5    7    17   ← median class (n/2 = 15)
400-409    399.5-409.5    5    22
410-419    409.5-419.5    4    26
420-429    419.5-429.5    3    29
430-439    429.5-439.5    1    30

Now we have l = 389.5, h = 10, f = 7 and C = 10. Substituting these values into the formula, we have
$$\tilde{x} = l + \frac{h}{f}\left(\frac{n}{2} - C\right) = 389.5 + \frac{10}{7}(15 - 10) \approx 396.64$$
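The grouped-median formula can be sketched as a scan over the cumulative frequencies. In this Python sketch, `grouped_median` is an illustrative name; it assumes equal class widths h and takes the lower class boundaries and frequencies of the Example 2 table above as input.

```python
def grouped_median(lower_bounds, freqs, h):
    """x̃ = l + (h / f) * (n/2 - C) for grouped data with class width h."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    target = n / 2
    cum = 0  # cumulative frequency of the classes before the current one (C)
    for l, f in zip(lower_bounds, freqs):
        # the median class is the first class whose cumulative frequency reaches n/2
        if f > 0 and cum + f >= target:
            return l + (h / f) * (target - cum)
        cum += f
    raise ValueError("median class not found")


lower_bounds = [359.5 + 10 * i for i in range(8)]  # 359.5, 369.5, ..., 429.5
freqs = [2, 3, 5, 7, 5, 4, 3, 1]
print(round(grouped_median(lower_bounds, freqs, h=10), 2))  # 396.64
```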

Mode: The mode is simply the most frequent observation in a data set. A data set can have more than one mode; if it has two, it is said to be bimodal. When you have categorical data, i.e., data that appear as words instead of numbers, you need to use the mode. For example, if a sandwich shop sells 10 different types of sandwiches, the mode would be the most popular sandwich. The mode is also insensitive to outliers.
Example 13.
The mode of {1, 1, 2, 3, 5, 8} is 1.
The modes of {1, 3, 5, 7, 9, 9, 21, 25, 25, 31} are 9 and 25. Thus, the data are bimodal.
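For raw data the mode is just a frequency count, which also handles several modes and categorical values. Below is a minimal Python sketch using `collections.Counter`; the function name `modes` and the sandwich list are illustrative, and returning every value when all counts are equal is a choice of this sketch (many texts would say such data have no mode).

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 1, 2, 3, 5, 8]))                  # [1]
print(modes([1, 3, 5, 7, 9, 9, 21, 25, 25, 31]))  # [9, 25] -> bimodal
print(modes(["tuna", "club", "tuna", "veggie"]))  # ['tuna'] (categorical data)
```

Python 3.8 and later also offer `statistics.multimode`, which returns the modes in order of first appearance rather than sorted.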
Definition
For grouped data, the mode is calculated by the formula
$$\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$$
where
$l$ is the lower class boundary of the modal class (the class with the largest frequency),
$f_m$ is the frequency of the modal class (the largest frequency),
$f_1$ is the frequency of the class preceding the modal class,
$f_2$ is the frequency of the class succeeding the modal class, and
$h$ is the class width.

Example 14.
Find the mode for grouped data in Example 2.
Solution:
We know that the class with the highest frequency is 389.5-399.5, so it is our modal class.
Classes    CBs            f
360-369    359.5-369.5    2
370-379    369.5-379.5    3
380-389    379.5-389.5    5    ← f1
390-399    389.5-399.5    7    ← fm
400-409    399.5-409.5    5    ← f2
410-419    409.5-419.5    4
420-429    419.5-429.5    3
430-439    429.5-439.5    1

From the above table we have l = 389.5, h = 10, $f_1$ = 5, $f_m$ = 7 and $f_2$ = 5. So the mode is
$$\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 389.5 + \frac{7 - 5}{2 \times 7 - 5 - 5} \times 10 = 394.5$$
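A Python sketch of the grouped-mode formula follows. The name `grouped_mode` is illustrative; it assumes equal class widths, picks the first class if several share the largest frequency, and treats a missing neighbour (when the modal class is the first or last class) as having frequency 0, which is an assumption of this sketch rather than something stated above.

```python
def grouped_mode(lower_bounds, freqs, h):
    """Mode = l + (fm - f1) / (2fm - f1 - f2) * h."""
    if not freqs:
        raise ValueError("no classes given")
    m = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                   # preceding class (assumed 0 if none)
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0      # succeeding class (assumed 0 if none)
    denom = 2 * fm - f1 - f2
    if denom == 0:
        raise ValueError("formula undefined: neighbouring classes equal the modal class")
    return lower_bounds[m] + (fm - f1) / denom * h


lower_bounds = [359.5 + 10 * i for i in range(8)]  # Example 2 class boundaries
freqs = [2, 3, 5, 7, 5, 4, 3, 1]
print(grouped_mode(lower_bounds, freqs, h=10))  # 389.5 + 2/4 * 10 = 394.5
```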


Quantiles: Quantiles are relatives of the median: they mark positions that split the ordered data into equal-sized parts. The median divides ordered data into two equal parts, while quantiles divide it into more than two parts, which helps indicate the extent to which the data lie near the median or near the extremes. Some commonly used quantiles are defined here.
Quartiles: Quartiles divide an ordered data set into four equal parts. The values that separate the parts are called the first, second, and third quartiles, and are denoted by $Q_1$, $Q_2$, and $Q_3$, respectively.
$Q_1$ is the “middle” value in the first half of the rank-ordered data set, i.e., the value below which 25% of the data lie.
$Q_2$ is equal to the median.
$Q_3$ is the “middle” value in the second half of the rank-ordered data set, i.e., the value below which 75% of the data lie.
Quartiles are calculated in the same manner as the median, except that the position $\frac{n+1}{4}$ is multiplied by a factor of 1, 2, or 3 for the first, second, and third quartiles, respectively. Since $Q_2$ = median, we usually calculate only the first and third quartiles for a given data set. Quartiles are also known as fourths (Devore’s terminology): $Q_1$ is called the lower quartile or lower fourth, and $Q_3$ the upper quartile or upper fourth.


Definition
For a sample of size n, the jth quartile (j = 1, 3) is defined as follows. First arrange the n observations in ascending order. Then
$$Q_j = \left(j \times \frac{n+1}{4}\right)\text{th observation}, \quad j = 1, 3,$$
so that
$$Q_1 = \left(\frac{n+1}{4}\right)\text{th observation} \quad \text{and} \quad Q_3 = \left(3 \times \frac{n+1}{4}\right)\text{th observation}.$$
If the position is not a whole number (i.e., j(n + 1) is not a multiple of 4), the quartile is the mean of the values at the whole-number positions just below and just above it.


Example 15.
Find the two quartiles for the following ungrouped data.
65 55 89 56 35 14 56 55 87 45 92
Solution:
First arrange the values in ascending order,
14 35 45 55 55 56 56 65 87 89 92
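We have
$$Q_1 = \left(\frac{n+1}{4}\right)\text{th} = \left(\frac{11+1}{4}\right)\text{th} = \left(\frac{12}{4}\right)\text{th} = 3\text{rd observation} \;\Rightarrow\; Q_1 = 45$$
and
$$Q_3 = \left(\frac{3(n+1)}{4}\right)\text{th} = \left(\frac{3(11+1)}{4}\right)\text{th} = \left(\frac{36}{4}\right)\text{th} = 9\text{th observation} \;\Rightarrow\; Q_3 = 87$$
Now, if n were 10, the position of the first quartile would be (10 + 1)/4 = 2.75, so $Q_1$ would be the average of the 2nd and 3rd values in the ordered list.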
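The positional rule can be sketched in Python as follows. `quartile` is an illustrative name; it uses integer arithmetic for the position j(n + 1)/4 so that whole-number positions are detected exactly, follows the "mean of the two neighbours" rule stated above for fractional positions, and requires n ≥ 3 so that both positions fall inside the data.

```python
from statistics import quantiles


def quartile(values, j):
    """Q_j at position j(n + 1)/4 of the ordered data (1-based)."""
    if j not in (1, 2, 3):
        raise ValueError("j must be 1, 2 or 3")
    xs = sorted(values)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    whole, rem = divmod(j * (n + 1), 4)  # position = whole + rem/4
    if rem == 0:
        return xs[whole - 1]             # whole-number position
    # fractional position: mean of the observations just below and above it
    return (xs[whole - 1] + xs[whole]) / 2


data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]  # Example 15
print(quartile(data, 1), quartile(data, 3))          # 45 87
print(quartile(list(range(1, 11)), 1))               # n = 10: position 2.75 -> (2 + 3) / 2 = 2.5

# Cross-check: statistics.quantiles (default method="exclusive") also uses
# the (n + 1) positions, so it agrees whenever the positions are whole numbers.
print(quantiles(data, n=4))                          # cut points 45, 56, 87
```

For fractional positions `statistics.quantiles` interpolates linearly (for n = 10 it would give 2.75 rather than 2.5), which is one more example of different methods giving different answers.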

There are several other ways of finding quartiles, but this one is simple. Different methods may give different results.
For example, one method, found in many texts, divides the ordered data into two halves. The median of the lower half is $Q_1$ and the median of the upper half is $Q_3$. If n is odd, the leftover middle value of the entire data set is included in both halves.
Now let's find the quartiles for the data in the previous example by this method. The ordered data are:
14 35 45 55 55 56 56 65 87 89 92
Lower half: 14 35 45 55 55 56, so $Q_1 = \frac{45 + 55}{2} = 50$
Upper half: 56 56 65 87 89 92, so $Q_3 = \frac{65 + 87}{2} = 76$
Since n is odd, there is a single middle value, 56; we divide the data into two halves and include 56 in both. Each half then has an even number (six) of data points, so using the rule for the median with even n, we get $Q_1$ = 50 and $Q_3$ = 76.
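The halves method can be sketched with the standard-library `statistics.median`. `quartiles_by_halves` is an illustrative name; slicing with `(n + 1) // 2` puts the middle value in both halves when n is odd, as described above.

```python
from statistics import median


def quartiles_by_halves(values):
    """(Q1, Q3) as medians of the lower and upper halves of the ordered data."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 observations")
    half = (n + 1) // 2    # for odd n this includes the middle value
    lower = xs[:half]      # first `half` observations
    upper = xs[n - half:]  # last `half` observations
    return median(lower), median(upper)


data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]  # Example 15
print(quartiles_by_halves(data))  # (50.0, 76.0)
```

Comparing with the positional rule ($Q_1$ = 45, $Q_3$ = 87) shows concretely how much two reasonable methods can differ on a small sample.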
Deciles: Deciles divide a rank-ordered data set into ten equal parts. They are defined in the same way as quartiles, except that the divisor is 10 instead of 4 and j runs from 1 to 9. They are denoted by $D_1, D_2, \ldots, D_9$.
Percentiles: Percentiles divide a rank-ordered data set into one hundred equal parts. They are also defined in the same way as quartiles, except that the divisor is 100 instead of 4 and j runs from 1 to 99. They are denoted by $P_1, P_2, \ldots, P_{99}$.

In fact, quartiles and deciles are also percentiles. We can relate them as follows:
$P_{25}$ is equal to $Q_1$.
$P_{50}$ is the median.
$P_{75}$ is equal to $Q_3$.
$P_{60}$ is equal to $D_6$, and so on.
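Because quartiles, deciles and percentiles all use a position of the form j(n + 1)/k, one function covers all three. This Python sketch (the name `positional_quantile` is illustrative) applies the same "mean of the two neighbours" rule as the quartile definition, and checks the equalities listed above on the Example 15 data.

```python
def positional_quantile(values, j, k):
    """jth k-quantile at position j(n + 1)/k of the ordered data (1-based).

    k = 4 gives quartiles, k = 10 deciles, k = 100 percentiles.
    """
    if not 1 <= j < k:
        raise ValueError("j must satisfy 1 <= j < k")
    xs = sorted(values)
    n = len(xs)
    whole, rem = divmod(j * (n + 1), k)  # position = whole + rem/k
    if whole < 1 or whole + (rem > 0) > n:
        raise ValueError("position falls outside the data; sample too small")
    if rem == 0:
        return xs[whole - 1]
    return (xs[whole - 1] + xs[whole]) / 2


data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]  # Example 15, n = 11
print(positional_quantile(data, 25, 100), positional_quantile(data, 1, 4))   # P25 = Q1 = 45
print(positional_quantile(data, 60, 100), positional_quantile(data, 6, 10))  # P60 = D6 = 60.5
print(positional_quantile(data, 75, 100), positional_quantile(data, 3, 4))   # P75 = Q3 = 87
```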
For grouped data (frequencies), one has to use the cumulative frequency, as was done in the calculation of the median. Percentiles are useful for giving the relative standing of an individual observation in a population; they essentially indicate the rank position of that observation.
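For grouped data, we calculate the quartiles, deciles and percentiles using the same formula as for the median, with a slight modification. Thus,
Quartiles
$$Q_j = l + \frac{h}{f}\left(\frac{jn}{4} - C\right), \quad j = 1, 2, 3$$
Deciles
$$D_j = l + \frac{h}{f}\left(\frac{jn}{10} - C\right), \quad j = 1, 2, \ldots, 9$$
and Percentiles
$$P_j = l + \frac{h}{f}\left(\frac{jn}{100} - C\right), \quad j = 1, 2, \ldots, 99$$
where $l$, $h$, $f$ and $C$ now refer to the class containing the $\left(\frac{jn}{4}\right)$th, $\left(\frac{jn}{10}\right)$th or $\left(\frac{jn}{100}\right)$th observation, respectively.
Solve an exercise using these formulas.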
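As a sketch of such an exercise, the grouped formulas can be applied to the Example 2 table. The Python function below (the name `grouped_quantile` is illustrative) assumes equal class widths h, takes the class containing the (jn/k)th observation to be the first class whose cumulative frequency reaches jn/k, and reduces to the grouped-median formula for j = 1, k = 2.

```python
def grouped_quantile(lower_bounds, freqs, h, j, k):
    """l + (h / f) * (j*n/k - C) for the class containing the (jn/k)th observation."""
    if not 1 <= j < k:
        raise ValueError("j must satisfy 1 <= j < k")
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    target = j * n / k
    cum = 0  # C: cumulative frequency before the current class
    for l, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= target:
            return l + (h / f) * (target - cum)
        cum += f
    raise ValueError("quantile class not found")


lower_bounds = [359.5 + 10 * i for i in range(8)]  # Example 2 class boundaries
freqs = [2, 3, 5, 7, 5, 4, 3, 1]

for j in (1, 2, 3):
    print(f"Q{j} = {grouped_quantile(lower_bounds, freqs, 10, j, 4):.2f}")
print(f"D9 = {grouped_quantile(lower_bounds, freqs, 10, 9, 10):.2f}")
print(f"P90 = {grouped_quantile(lower_bounds, freqs, 10, 90, 100):.2f}")
```

With these numbers $Q_1$ = 384.50, $Q_2$ = 396.64 (the median found in Example 12), $Q_3$ = 410.75, and $D_9$ = $P_{90}$ ≈ 422.83.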

In J. L. Devore's book (soft copy):
Have a look at Section 1.3 in Chapter 1.
Exercises, Section 1.3: Questions 30-37.
Note: Ignore the trimmed mean, trimmed median, etc. for now. However, the concept is quite easy, and if you would like, I can explain it to you.
