Lec 3

The document outlines a course on Statistics, focusing on measures of central tendency including arithmetic mean, median, and mode. It explains how to calculate these measures for both raw and grouped data, along with their properties and when to use each measure. Additionally, it includes exercises for practical application of the concepts discussed.

Uploaded by

Hassan Khan
Copyright
© All Rights Reserved
COURSE TITLE: Statistics

COURSE CODE: BSLP 2407

Instructor: Dr. Md. Ershadul Haque

Associate Professor

Department of Statistics, DU
Describing data: Measures of Central Tendency

• Goals:

✓ Understand and compute the arithmetic mean, weighted mean, median, mode, and geometric mean.

✓ Explain the characteristics, uses, advantages, and disadvantages of each measure of central tendency.

Measures of Central Tendency
• In a data set, the values tend to cluster around a certain point. This tendency of the values to cluster around the center of the series is called central tendency.

• A numerical measure of this concentration is known variously as a measure of central tendency, a measure of location, or an average.

• Different measures of central tendency

✓ Arithmetic mean

✓ Median

✓ Mode
Mean
• The most popular and best-understood measure of central tendency for a quantitative data set is the arithmetic mean (or simply the mean).

• The mean of sample data is denoted by $\bar{x}$, read "x bar".

• Suppose there are $n$ values $x_1, x_2, \ldots, x_n$ of $x$. Then $\bar{x}$ is defined as

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

• Example: Consider hypothetical data from a sample of 9 subjects, consisting of scores on the Peabody Picture Vocabulary Test-Revised (PPVT-R). The data are as follows:

115, 105, 110, 95, 89, 126, 77, 100, 90


Mean
• The mean (arithmetic mean) is calculated as

$$\bar{x} = \frac{\sum x}{n} = \frac{115 + 105 + \cdots + 90}{9} = \frac{907}{9} = 100.78$$

• Thus the mean PPVT-R score of the sample is 100.78.
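
The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` is chosen here for illustration and is not part of the lecture.

```python
# PPVT-R scores from the example above.
scores = [115, 105, 110, 95, 89, 126, 77, 100, 90]


def arithmetic_mean(values):
    """Return sum(x) / n for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


mean_score = arithmetic_mean(scores)
print(sum(scores))           # 907
print(round(mean_score, 2))  # 100.78
```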

• Arithmetic mean (AM) for grouped data:

Mid-values: $x_1, x_2, \ldots, x_k$
Frequencies: $f_1, f_2, \ldots, f_k$

• Then $\bar{x}$ for grouped data is defined as $\bar{x} = \frac{\sum f x}{n}$, where $n = \sum f$.
Mean (cont…)
• The following frequency distribution shows the time taken to utter a particular sentence, measured for 100 speakers.

Table: Frequency distribution of times taken to utter a sentence

Time (sec)    Frequency
3.1-3.5       5
3.6-4.0       18
4.1-4.5       25
4.6-5.0       27
5.1-5.5       20
5.6-6.0       5

• To compute the mean (arithmetic mean), we construct the following table:

Time (sec)    Mid-point ($x_i$)    Frequency ($f_i$)    $f_i x_i$
3.1-3.5       3.3                  5                    16.5
3.6-4.0       3.8                  18                   68.4
4.1-4.5       4.3                  25                   107.5
4.6-5.0       4.8                  27                   129.6
5.1-5.5       5.3                  20                   106.0
5.6-6.0       5.8                  5                    29.0
Total         -                    $n = 100$            $\sum_{i=1}^{6} f_i x_i = 457.0$
Mean (cont…)
• The mean (arithmetic mean) is calculated as

$$\bar{x} = \frac{\sum f x}{n} = \frac{457.0}{100} = 4.57 \text{ sec}$$

• Thus the mean time taken to utter the sentence is 4.57 sec.
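
The grouped-data calculation can be sketched in Python as follows. The representation of each class as a `(lower limit, upper limit, frequency)` tuple and the name `grouped_mean` are assumptions made for this illustration.

```python
# Each class is (lower limit, upper limit, frequency), taken from the table above.
time_classes = [
    (3.1, 3.5, 5),
    (3.6, 4.0, 18),
    (4.1, 4.5, 25),
    (4.6, 5.0, 27),
    (5.1, 5.5, 20),
    (5.6, 6.0, 5),
]


def grouped_mean(classes):
    """Return sum(f * x) / sum(f), where x is the mid-point of each class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


mean_time = grouped_mean(time_classes)
print(round(mean_time, 2))  # 4.57
```

Because the mid-point stands in for every observation in a class, the result is an approximation to the mean of the underlying raw data.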

• Properties of the arithmetic mean

✓ Every set of interval- or ratio-level data has a mean. That is, the mean can be computed only for quantitative variables.

✓ All the values are included in computing the mean.

✓ The mean is unique.

✓ The sum of the deviations of each value from the mean is zero. Symbolically, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

✓ If one set consists of $n_1$ observations $x_{11}, x_{12}, \ldots, x_{1n_1}$ with mean $\bar{x}_1$ and a second set consists of $n_2$ observations $x_{21}, x_{22}, \ldots, x_{2n_2}$ with mean $\bar{x}_2$, then the mean of all $n_1 + n_2$ observations, called the combined mean or pooled mean, is given by

$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
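
Two of these properties can be verified numerically. The sketch below reuses the nine PPVT-R scores from the earlier example; splitting them into a first group of 4 and a second group of 5 is an arbitrary choice made here purely for illustration.

```python
import math

scores = [115, 105, 110, 95, 89, 126, 77, 100, 90]
overall_mean = sum(scores) / len(scores)

# Property: the deviations from the mean sum to zero
# (up to floating-point rounding error).
sum_of_deviations = sum(x - overall_mean for x in scores)

# Property: the combined (pooled) mean of two groups equals the overall mean.
# The split into two groups is arbitrary and only for illustration.
group1, group2 = scores[:4], scores[4:]
n1, n2 = len(group1), len(group2)
mean1 = sum(group1) / n1
mean2 = sum(group2) / n2
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(abs(sum_of_deviations) < 1e-9)              # True
print(math.isclose(combined_mean, overall_mean))  # True
```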
Median
• The median is the middle value when the observations are arranged in ascending (or descending) order of magnitude.
✓ The number of observations below the median position equals the number of observations above it.

• Median for raw data: Consider $n$ observations on a variable. First arrange the observations in ascending or descending order of magnitude, then check whether $n$ is even or odd.

✓ If $n$ is odd: Median = the $\left(\frac{n+1}{2}\right)$th observation.

✓ If $n$ is even: Median = the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th observations.
Median
• Find the median of the data: 12, 7, 2, 34, 17, 21, and 19
✓ Arrange the values in ascending order: 2, 7, 12, 17, 19, 21, 34

✓ Count the number of values: here $n = 7$, which is odd.

✓ Median = $\left(\frac{n+1}{2}\right)$th observation = 4th observation = 17

• Find the median of the data: 12, 7, 2, 34, 17, 40, 21, and 18
✓ Arrange the values in ascending order: 2, 7, 12, 17, 18, 21, 34, 40

✓ Count the number of values: here $n = 8$, which is even.

✓ Median = $\frac{\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2} + 1\right)\text{th value}}{2} = \frac{4\text{th value} + 5\text{th value}}{2} = \frac{17 + 18}{2} = 17.5$
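
The odd/even rule for raw data translates directly into code. The following is a minimal sketch; the function name `median` is chosen here for illustration. Note that Python lists are indexed from 0, so the $k$th observation sits at index $k - 1$.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th observation, i.e. index mid
        return ordered[mid]
    # n even: mean of the (n / 2)th and (n / 2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([12, 7, 2, 34, 17, 21, 19]))      # 17
print(median([12, 7, 2, 34, 17, 40, 21, 18]))  # 17.5
```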
Median
• Median for grouped data: To calculate the median of a grouped frequency distribution, we assume that the frequencies are evenly spread over each class interval and that the class intervals are formed so that there are no gaps between them. For example, the classes 1-5, 6-10, 11-15 should be replaced by 0.5-5.5, 5.5-10.5, 10.5-15.5. The median is then calculated as

$$\text{Median} = L + \frac{n/2 - F}{f_m} \times h$$

✓ $L$ = lower boundary of the median class.

✓ $F$ = cumulative frequency of the pre-median class.

✓ $f_m$ = frequency of the median class.

✓ $h$ = width of the median class = (upper boundary - lower boundary) of the median class.

✓ The median class is the class that contains the $\left(\frac{n}{2}\right)$th observation of the data.
Median
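• Calculate the median age (in years) of workers from the following data

Age      Frequency
11-20    5
21-30    15
31-40    50
41-50    45
51-60    35

• To compute the median age of workers, we construct the following table

Age      Class boundaries    Frequency    Cumulative frequency
11-20    10.5-20.5           5            5
21-30    20.5-30.5           15           20
31-40    30.5-40.5           50           70
41-50    40.5-50.5           45           115
51-60    50.5-60.5           35           150

• Here $n = 150$, so $n/2 = 75$. The median class is the class that contains the 75th value, namely 41-50 (class boundaries 40.5-50.5), with $F = 70$, $f_m = 45$, and $h = 10$.

$$\text{Median} = 40.5 + \frac{75 - 70}{45} \times 10 = 40.5 + 1.11 = 41.61$$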
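
The grouped-median formula can be sketched as a short Python function. It assumes the classes are supplied in ascending order as `(lower boundary, upper boundary, frequency)` tuples with no gaps between them; the name `grouped_median` is hypothetical.

```python
# Classes given by their boundaries (no gaps), from the worked example above.
age_classes = [
    (10.5, 20.5, 5),
    (20.5, 30.5, 15),
    (30.5, 40.5, 50),
    (40.5, 50.5, 45),
    (50.5, 60.5, 35),
]


def grouped_median(classes):
    """Return L + (n/2 - F) / f_m * h for the median class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:
            # This class contains the (n/2)th observation: the median class.
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")


median_age = grouped_median(age_classes)
print(round(median_age, 2))  # 41.61
```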
Mode
• Mode: The mode is the value that has the highest frequency.

• The scores obtained by 5 students in a statistics test are 10, 7, 7, 7, and 0. The value 7 has the highest frequency; therefore the mode is 7.

• Find the measure of central tendency for the following frequency distribution, which shows the opinion of DU students regarding their curriculum load.

Laboratory service is excellent    Frequency
Strongly agree                     16
Agree                              22
Undecided                          33
Disagree                           178
Strongly disagree                  118

The category "Disagree" has the highest frequency; therefore the mode is "Disagree".
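
Both mode examples can be reproduced with Python's standard library. This is a minimal sketch: the helper `modes` is a name chosen here, and it returns a list because a data set can have more than one value tied for the highest frequency.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Raw data: test scores of 5 students.
print(modes([10, 7, 7, 7, 0]))  # [7]

# Frequency table: the mode is the category with the largest frequency.
opinion_freq = {
    "Strongly agree": 16,
    "Agree": 22,
    "Undecided": 33,
    "Disagree": 178,
    "Strongly disagree": 118,
}
modal_category = max(opinion_freq, key=opinion_freq.get)
print(modal_category)  # Disagree
```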
Mode for grouped data
• For grouped data, the mode is obtained by using the following formula

$$M_o = L + \frac{f_0 - f_{-1}}{(f_0 - f_{-1}) + (f_0 - f_1)} \times h$$

✓ $L$ = lower boundary of the modal class.

✓ $f_{-1}$ = frequency of the pre-modal class.

✓ $f_0$ = frequency of the modal class.

✓ $f_1$ = frequency of the post-modal class.

✓ $h$ = width of the modal class.

✓ The class with the highest frequency is the modal class.


Mode for grouped data (cont…)
• Calculate the modal age (in years) of workers from the following data

Age      Frequency
11-20    5
21-30    15
31-40    50
41-50    45
51-60    35

• The highest frequency (50) belongs to the class 31-40, so the modal class has class boundaries 30.5-40.5, with $f_{-1} = 15$, $f_0 = 50$, $f_1 = 45$, and $h = 10$.

• Mode = $30.5 + \frac{50 - 15}{(50 - 15) + (50 - 45)} \times 10 = 30.5 + 8.75 = 39.25$
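
A sketch of the grouped-mode formula in Python is given below. The function name `grouped_mode` is hypothetical. Two choices in it are assumptions not stated in the lecture: when the modal class is the first or last class, the missing neighbouring frequency is treated as 0, and if several classes tie for the highest frequency the first one is used.

```python
# Classes given by their boundaries, from the worked example above.
age_classes = [
    (10.5, 20.5, 5),
    (20.5, 30.5, 15),
    (30.5, 40.5, 50),
    (40.5, 50.5, 45),
    (50.5, 60.5, 35),
]


def grouped_mode(classes):
    """Return L + (f0 - f_prev) / ((f0 - f_prev) + (f0 - f_next)) * h."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # index of the (first) modal class
    lower, upper, f0 = classes[i]
    # Assumption: a missing neighbouring class has frequency 0.
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    denominator = (f0 - f_prev) + (f0 - f_next)
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring frequencies equal f0")
    return lower + (f0 - f_prev) / denominator * (upper - lower)


modal_age = grouped_mode(age_classes)
print(modal_age)  # 39.25
```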
Choosing measures of central tendency
• The mean is suitable only for ratio or interval data. For such data, the median is used as the measure of central tendency when some unusual (extreme) values are present.

• The mode may be the only measure available when arithmetic operations cannot be performed on the data, as in the case of a qualitative (nominal/ordinal) variable.

• The arithmetic mean should not be used in the following cases:

✓ When the observations include very large or very small values (the median can be used).

✓ In distributions with open-ended classes (the median can be used).

✓ When the distribution is unevenly spread, with small or large concentrations at irregular points (see Figure 2) (the median can be used).

✓ When the variable under study is qualitative.
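
The sensitivity of the mean to unusual values can be illustrated with a small experiment. The sketch below takes the nine PPVT-R scores used earlier and appends one extreme score of 300; that extra value is purely hypothetical and is not part of the lecture's data.

```python
import statistics

scores = [115, 105, 110, 95, 89, 126, 77, 100, 90]
# Hypothetical: add one unusually large value to see its effect.
scores_with_outlier = scores + [300]

mean_before = statistics.mean(scores)
mean_after = statistics.mean(scores_with_outlier)
median_before = statistics.median(scores)
median_after = statistics.median(scores_with_outlier)

print(round(mean_before, 2), round(mean_after, 2))  # 100.78 120.7
print(median_before, median_after)                  # 100 102.5
```

A single extreme value moves the mean by about 20 points while the median moves by only 2.5, which is why the median is preferred when such values are present.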


Choosing measures of central tendency (cont…)

Figure 1: Bell-shaped distribution

Figure 2: Skewed distribution


Exercise
• Problem#01: An investigation into how certain linguistic features affect the politeness of a sentence in a particular social context is carried out. 25 informants are asked to rate the sentence on a scale from 1 (very impolite) to 5 (very polite), with the following results:
3, 2, 4, 3, 1, 4, 1, 3, 5, 3, 2, 4, 1, 4, 3, 2, 1, 5, 2, 3, 3, 2, 1, 1, 3
Decide on the most appropriate measure of central tendency, and calculate a value for it.

• Problem#02: The following data represent pause length (in milliseconds) at a particular point of a sentence read by a sample of 16 subjects:
24, 22, 13, 21, 16, 21, 17, 23, 20, 25, 22, 14, 29, 17, 14, 20
Calculate the mean, median, and mode.
Exercise (cont…)
• Problem#03: Suppose a researcher is interested in comparing the performance of five groups of subjects (I, II, III, IV, V) on a test of auditory sequential memory (ASM). The following scores (higher scores indicate better memory performance) were obtained from a sample of subjects:

I: 7, 5, 4, 4

II: 6, 8, 10, 4, 9, 5, 8, 6

III: 3, 7, 9, 5, 6

IV: 9, 10, 8

V: 2, 6, 5, 4
Which group has the poorest memory performance? Which group has the best memory performance?
Exercise (cont…)
• Problem#04: Consider hypothetical data from a sample of 30 subjects, consisting of scores on the Peabody Picture Vocabulary Test-Revised (PPVT-R). The following frequency distribution was constructed from the PPVT-R scores of the sample.

Table: Frequency distribution of PPVT-R scores

Score      Frequency
65-74      3
75-84      8
85-94      10
95-104     5
105-114    3
115-124    1

Calculate the mean and the median. Which one is the more appropriate measure of central tendency for this data set? Why?
Thank You
