
Stat Chapter 3

Chapter three discusses numerical descriptive measures, focusing on central tendency, variation, and position of data values. It explains measures such as mean, median, mode, and midrange for both grouped and ungrouped data, along with their calculations and examples. Additionally, it covers measures of dispersion, highlighting the importance of understanding data spread around an average value.

Uploaded by

Tesfisha Altaseb
Copyright
© All Rights Reserved

Chapter three

Numerical Descriptive Measures

Objectives

• Describe data using measures of central tendency, such as the mean, median, mode, and midrange.
• Summarize data using measures of variation, such as the range, variance, and standard deviation.
• Determine the position of a data value in a data set using various measures of position, such as percentiles, deciles, and quartiles.
A. Measures of central tendency
A measure of central tendency is an important tool that describes the center of a histogram or a frequency distribution curve.
Such measures include the mean, the median, and the mode, each computed for two cases: ungrouped and grouped data sets.
The mean
◦ The most commonly used measure of central tendency is the mean, also known as the arithmetic average.
• It is calculated by adding all the values in the group and then dividing by the number of values.
• It helps to summarize the essential features of a data set and enables comparison.

Cont…
• The mean is the sum of the values divided by the number of values. The mean of a set of numbers x1, x2, ..., xn is typically denoted by x̄.
• It is the arithmetic mean, the "standard" average, often simply called the "mean".
• The mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the data set.
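Cont…
The Mean for Ungrouped Data
The mean is calculated as

Mean for population data: $\mu = \dfrac{\sum x}{N}$

Mean for sample data: $\bar{x} = \dfrac{\sum x}{n}$

where N is the population size and n is the sample size.

Example 1: Find the mean score of 10 students in a midterm exam in a class if their scores are
25 27 30 23 16 27 29 14 20 28

$$\mu = \frac{25 + 27 + 30 + 23 + 16 + 27 + 29 + 14 + 20 + 28}{10} = \frac{239}{10} = 23.9$$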
Example 2: Continuing Example 1, suppose we take a sample of 4 students from the class and find their scores to be 23, 27, 16, and 29. Find the mean of these scores.

$$\bar{x} = \frac{23 + 27 + 16 + 29}{4} = \frac{95}{4} = 23.75$$
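The two examples above can be checked with a short script. This is only a sketch: Python is an assumption here (the slides contain no code), and the variable names are illustrative.

```python
from statistics import mean

# Scores of all 10 students (treated as the population, as in Example 1)
scores = [25, 27, 30, 23, 16, 27, 29, 14, 20, 28]
# Sample of 4 students taken from the class (Example 2)
sample = [23, 27, 16, 29]

# Sum of the values divided by the number of values
population_mean = sum(scores) / len(scores)
sample_mean = mean(sample)

print(population_mean)  # 23.9
print(sample_mean)      # 23.75
```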
ii. Weighted Mean

• If x1, x2, …, xn represent the values of the items and w1, w2, …, wn are the corresponding weights, then the weighted mean x̄w is given by

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
Example: A student's final marks in Mathematics, Physics, Chemistry and Biology are A, B, D and C respectively. If the respective credits for these courses are 4, 4, 3 and 2, determine the approximate average mark the student has obtained for the courses.
Solution: Take the grade points A = 4, B = 3, C = 2 and D = 1 as the values (xi) and the credits as the weights (wi):

Course   Mathematics  Physics  Chemistry  Biology  Total
xi                 4        3          1        2
wi                 4        4          3        2     13
xi·wi             16       12          3        4     35

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{16 + 12 + 3 + 4}{4 + 4 + 3 + 2} = \frac{35}{13} \approx 2.69$$

That is, the average mark of the student is 2.69.
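A minimal sketch of the weighted mean, using the grade points and credits from the example. The function name `weighted_mean` is illustrative, not something defined in the slides.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 1, 2]  # A, B, D, C as in the solution table
credits = [4, 4, 3, 2]       # credit hours used as weights

average_mark = weighted_mean(grade_points, credits)
print(round(average_mark, 2))  # 2.69
```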
iii. Combined mean
• When a set of observations is divided into k groups, where x̄1 is the mean of the n1 observations in group 1, x̄2 is the mean of the n2 observations in group 2, …, and x̄k is the mean of the nk observations in group k, the combined mean x̄c of all observations taken together is given by

$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$
Example: There are two classes, Class A and Class B. Class A has 30 students
with an average score of 70 on a test. Class B has 20 students with an average
score of 80. What is the combined average score for both classes?
Solution:

$$\bar{x}_c = \frac{30(70) + 20(80)}{30 + 20} = \frac{2100 + 1600}{50} = \frac{3700}{50} = 74$$

The combined mean for all 50 students is 74.
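The combined mean is a weighted mean of the group means, with the group sizes as weights. A small sketch, where the function name `combined_mean` is an illustrative choice:

```python
def combined_mean(means, sizes):
    """Combine group means, weighting each by its group size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)

# Class A: 30 students, mean 70; Class B: 20 students, mean 80
class_mean = combined_mean([70, 80], [30, 20])
print(class_mean)  # 74.0
```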
Note:

If a constant c is added to or subtracted from every value in the data set, the
mean increases or decreases by that constant:

• New Mean = Old Mean + c, when c is added;

• New Mean = Old Mean – c, when c is subtracted.

If each value in the data set is multiplied by a constant k, the mean is also
multiplied by k: New Mean=k × Old Mean.

Question 1: If the mean of a data set is 50, what will the new mean be if a constant
value of 5 is added to every value in the data set?
Given mean = 50 and constant = 5; New mean = 50 + 5 = 55.
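These properties can be verified numerically. The sketch below reuses the ten midterm scores from the ungrouped-mean example; the constants c = 5 and k = 3 are hypothetical values chosen only for illustration.

```python
from math import isclose
from statistics import mean

data = [25, 27, 30, 23, 16, 27, 29, 14, 20, 28]
c, k = 5, 3  # hypothetical constants

old_mean = mean(data)
added = mean([x + c for x in data])       # every value shifted up by c
subtracted = mean([x - c for x in data])  # every value shifted down by c
scaled = mean([k * x for x in data])      # every value multiplied by k

print(isclose(added, old_mean + c))       # True
print(isclose(subtracted, old_mean - c))  # True
print(isclose(scaled, k * old_mean))      # True
```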
The midrange

The midrange (MR) is defined as the sum of the lowest and highest values in the data set divided by 2:

$$MR = \frac{\text{Lowest value} + \text{Highest value}}{2}$$

Example: Find the midrange (MR) for the following data:
11, 13, 20, 30, 9, 4, 15
Solution: The lowest value is 4 and the highest value is 30, so

$$MR = \frac{4 + 30}{2} = \frac{34}{2} = 17$$

Note that the midrange is a weak measure of central tendency, since it depends on only two of the values in the data set.
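A short sketch of the midrange calculation, with `midrange` as an illustrative helper name:

```python
def midrange(values):
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2

mr = midrange([11, 13, 20, 30, 9, 4, 15])
print(mr)  # 17.0
```

Because only `min` and `max` enter the calculation, changing any of the other five values leaves the result unchanged, which is why the measure is considered weak.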
Mean for Grouped data
• If data are given in the form of a continuous frequency distribution, the sample mean can be computed as

$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k}$$

where xi is the midpoint of the i-th class, fi is its frequency, and xi fi is the product of the midpoint and the frequency.

Example: Find the mean for the following frequency distribution.

Class boundary   60-62  62-64  64-66  66-68  68-70  70-72  Total
Frequency (fi)       5     18     42     20      8      7    100
Midpoint (xi)       61     63     65     67     69     71
xi·fi              305   1134   2730   1340    552    497   6558

Solution: Using the formula above,

$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58$$
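The grouped-mean calculation can be sketched as follows, with the class boundaries and frequencies taken from the table. The variable names are illustrative.

```python
from math import isclose

boundaries = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
freqs = [5, 18, 42, 20, 8, 7]

# Midpoint of each class: (lower boundary + upper boundary) / 2
midpoints = [(lo + hi) / 2 for lo, hi in boundaries]

# Sum of midpoint * frequency, divided by the total frequency
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
grouped_mean = sum_xf / sum(freqs)

print(sum_xf)                  # 6558.0
print(round(grouped_mean, 2))  # 65.58
```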
• Median

• It is the value of the middle item of a series when the series is arranged in ascending or descending order.
• It divides the series into two halves.
• It is a positional average.
• When n is odd, the median is the value of the ((n + 1)/2)th term of the ordered data:

$$\text{Median} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ value}, \quad n \text{ odd}$$
03/08/2025 By: Menberu T.
Cont…
• Example: Find the median for the data set:

312, 257, 421, 289, 526, 374, 497

• Solution: First, rank the data in increasing order:

x1   x2   x3   x4   x5   x6   x7
257  289  312  374  421  497  526

• Since there are 7 values in this data set, the median is the ((7 + 1)/2)th = 4th term in the ranked data.
• Median = 4th item = 374
Cont…
• Median of an even number of observations
• Step 1: Arrange the data in either ascending or descending order.
• Step 2: If the number of observations (say n) is even, identify the (n/2)th and [(n/2) + 1]th observations.
• Step 3: The average of the two observations identified in Step 2 is the median of the given data.
Cont…
Example: Find the median for the data set:
8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11

Solution: First, we rank the data in increasing order:

x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11  x12
7   8   9   10  11  12  13  13  14  17   17   45

Since there are 12 values in this data set, the median is given by the average of the two middle values, whose ranks are 12/2 = 6 and 12/2 + 1 = 7. The 6th and 7th values are 12 and 13, so Median = (12 + 13)/2 = 12.5.
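The odd and even cases can be sketched in one function. This is an illustrative implementation of the positional rule described above; the function name `median_value` is an assumption, and Python's built-in `statistics.median` gives the same results.

```python
def median_value(values):
    """Median by position: middle item if n is odd, mean of the two middle items if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # ((n + 1) / 2)th item in 1-based ranking is index n // 2 in 0-based indexing
        return ordered[mid]
    # (n / 2)th and (n / 2 + 1)th items in 1-based ranking
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median_value([312, 257, 421, 289, 526, 374, 497])
even_case = median_value([8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11])
print(odd_case)   # 374
print(even_case)  # 12.5
```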
Median for grouped data
• For grouped data, the median is obtained by the following formula:

$$\text{Median} = L + \left(\frac{\frac{n}{2} - cf}{f}\right) h$$

where L = lower limit of the median class
n = number of observations
f = frequency of the median class
cf = cumulative frequency of the class preceding the median class
h = class width
Example: The water percentage in the body of a species of fish is given below. Calculate the median.

Class interval   15-24  25-34  35-44  45-54  55-64  Total
Frequency            7     17     16      6      4     50

Solution: Construct the less-than cumulative frequency distribution:

Class interval     15-24  25-34  35-44  45-54  55-64  Total
Frequency              7     17     16      6      4     50
Cumulative freq.       7     24     40     46     50

• Since n = 50, n/2 = 50/2 = 25, so the median class is 35-44, the first class whose cumulative frequency reaches 25.
• L = 35
• f = 16
• h = 9
• cf = 24
• Median = L + ((n/2 – cf)/f) h = 35 + ((25 – 24)/16) × 9 = 35.56
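A sketch of the grouped-median steps: build the cumulative frequencies, locate the median class, then apply the formula. The values L = 35 and h = 9 are taken as given in the example; the variable names are illustrative.

```python
from itertools import accumulate

lower_limits = [15, 25, 35, 45, 55]
freqs = [7, 17, 16, 6, 4]
h = 9  # class width as used in the example

n = sum(freqs)
cum_freqs = list(accumulate(freqs))  # [7, 24, 40, 46, 50]

# Median class: first class whose cumulative frequency reaches n / 2
k = next(i for i, c in enumerate(cum_freqs) if c >= n / 2)
cf = cum_freqs[k - 1] if k > 0 else 0  # cumulative frequency before the median class

grouped_median = lower_limits[k] + ((n / 2 - cf) / freqs[k]) * h
print(round(grouped_median, 2))  # 35.56
```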
The mode
• The mode is another measure of central tendency. It is the most common value in a data set.
• Data set with no mode: each value occurs only once.
• Data set with one mode: only one value occurs with the highest frequency. Such a data set is called unimodal.
• Data set with two modes: two values occur with the same (highest) frequency. The distribution in this case is said to be bimodal.
• Data set with more than two modes: more than two values occur with the same (highest) frequency. The data set is said to be multimodal.
Cont…
• Example: Find the mode for the given data set:
• 22, 19, 21, 19, 27, 21, 29, 22, 19, 25, 21, 22, 25
• Solution: Each of the three values 19, 21, and 22 occurs three times, which is the highest frequency in the data set. Therefore each of these is a mode; that is, the modes for this data set are 19, 21, and 22.
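A minimal sketch that counts frequencies and returns every value tied for the highest count. It uses `statistics.multimode` from the Python standard library (available from Python 3.8), which is a choice made here and not something the slides prescribe.

```python
from collections import Counter
from statistics import multimode

data = [22, 19, 21, 19, 27, 21, 29, 22, 19, 25, 21, 22, 25]

counts = Counter(data)           # frequency of each value
modes = sorted(multimode(data))  # all values with the highest frequency

print(counts[19], counts[21], counts[22])  # 3 3 3
print(modes)                               # [19, 21, 22]
```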
Mode for grouped data
The formula for calculating the mode of grouped data is:

$$\text{Mode} = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) h$$

In this formula, the variables are:

• L: The lower limit of the modal class
• h: The size of the class interval
• f1: The frequency of the modal class
• f0: The frequency of the class preceding the modal class
• f2: The frequency of the class succeeding the modal class
Example: The following table shows the distribution of scores obtained by students in an exam:

Score Range   Number of Students (Frequency)
50 – 60       8
60 – 70       12
70 – 80       25
80 – 90       10
90 – 100      5

What is the mode of the exam scores?
Answer: The modal class is 70 – 80, since it has the highest frequency.
• L = lower boundary of the modal class = 70
• f1 = frequency of the modal class = 25
• f0 = frequency of the class before the modal class = 12
• f2 = frequency of the class after the modal class = 10
• h = class width = 10
• Using the formula: Mode = 70 + ((25 – 12)/(2 × 25 – 12 – 10)) × 10 = 70 + (13/28) × 10 ≈ 74.64, or about 75.
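The sketch below applies the standard grouped-mode formula, Mode = L + ((f1 – f0)/(2f1 – f0 – f2)) × h, to the values listed in the answer. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower, f0, f1, f2, h):
    """Mode of grouped data from the modal class and its two neighbours."""
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * h

exam_mode = grouped_mode(lower=70, f0=12, f1=25, f2=10, h=10)
print(round(exam_mode, 2))  # 74.64
print(round(exam_mode))     # 75
```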
Relationships Between Mean, Median and Mode:
For moderately skewed distributions, the mean, median and mode are related by an empirical rule: the mode is approximately equal to 3 times the median minus 2 times the mean.
That is, Mean – Mode = 3 (Mean – Median), or equivalently
Mode = 3 Median – 2 Mean.
Example: If the difference between the mean and the mode of a population is 48 and the median is 12, find the mean.
Solution:
• Mean – Mode = 3(Mean – Median);

• 48 = 3(Mean – 12);
• 16 = Mean – 12;
• Mean = 28.
B. Measures of dispersion
• An average can represent a series only as well as a single figure can; it cannot reveal the entire story of the phenomenon under study.
• Averages tell us nothing about the scatter of observations within the distribution.
• Dispersion is the degree to which numerical data tend to spread around an average value (the mean).
• To measure the degree of scatter, statistical devices called measures of dispersion are calculated.
• Range = highest value – lowest value
• It shows the difference between the highest value and the lowest value. Because it uses only these two values, it is the weakest measure of dispersion.
• Variance
• First calculate the mean, then subtract the mean from each value in the group, square each result, add the squares, and divide the sum by the number of values.
• The variance is used as a measure of how far a set of numbers is spread out.
• It describes how far the numbers lie from the mean (expected value).

$$Var(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

• Standard deviation
• The most reliable measure of the degree to which the data are spread around the mean.
• It is obtained by taking the square root of the variance:

$$SD = \sqrt{Var(x)}$$


Example: Find the mean, median, mode, range, variance and standard deviation for the following raw data.

ID   Age of respondent
1    53
2    44
3    56
4    70
5    45
6    62
7    36
8    23
9    56
10   55
Solution: A) Mean

x̄ = ∑xi/n = (53 + 44 + 56 + 70 + 45 + 62 + 36 + 23 + 56 + 55)/10 = 500/10 = 50

B) Median: first arrange the raw data in ascending or descending order as follows:

23, 36, 44, 45, 53, 55, 56, 56, 62, 70. Since n is even, the median is the average of the two middle values (the 5th and 6th):

Median = (53 + 55)/2 = 54

C) Mode: the most frequently occurring value is 56, which occurs twice while every other value occurs once. So 56 is the mode, and the data set is unimodal.

D) Range = highest value – lowest value = 70 – 23 = 47

E) Variance = ∑(xi – x̄)²/n

ID     xi   xi – x̄   (xi – x̄)²
1      53       3          9
2      44      –6         36
3      56       6         36
4      70      20        400
5      45      –5         25
6      62      12        144
7      36     –14        196
8      23     –27        729
9      56       6         36
10     55       5         25
Total                    1636

∑(xi – x̄)² = 1636, so variance = ∑(xi – x̄)²/n = 1636/10 = 163.6

F) SD = √variance = √163.6 = 12.79
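All six results of this example can be cross-checked with Python's standard `statistics` module. This is a sketch: `pvariance` and `pstdev` are used because the solution divides by n, and the variable names are illustrative.

```python
from statistics import mean, median, mode, pstdev, pvariance

ages = [53, 44, 56, 70, 45, 62, 36, 23, 56, 55]

age_mean = mean(ages)
age_median = median(ages)           # average of the 5th and 6th ordered values
age_mode = mode(ages)               # most frequent value
age_range = max(ages) - min(ages)
age_variance = pvariance(ages)      # divides by n, as in part E
age_sd = pstdev(ages)               # square root of the variance above

print(age_mean)                 # 50
print(age_median)               # 54.0
print(age_mode)                 # 56
print(age_range)                # 47
print(round(age_variance, 1))   # 163.6
print(round(age_sd, 2))         # 12.79
```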
Measure of dispersion for Grouped Data

• Sample variance formula for grouped data: s² = ∑ f(mi – x̄)²/(n – 1)

• Population variance formula for grouped data: σ² = ∑ f(mi – x̄)²/n

• where,

• f is the frequency of each interval

• mi is the midpoint of the ith interval

• x̄ is the mean of the grouped data

• n is the total frequency (∑ f)


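Cont…

• Find the variance and the standard deviation for the following frequency distribution of a sample:

Class     Frequency (f)   Midpoint (m)    fm
5 – 9           2               7          14
10 – 14         4              12          48
15 – 19         7              17         119
20 – 24         3              22          66
25 – 29         1              27          27
30 – 34         3              32          96
Total          20                         370

Cont…

• Mean: x̄ = ∑fm/n = 370/20 = 18.5

Class     m – x̄   (m – x̄)²   f(m – x̄)²
5 – 9     –11.5     132.25      264.50
10 – 14    –6.5      42.25      169.00
15 – 19    –1.5       2.25       15.75
20 – 24     3.5      12.25       36.75
25 – 29     8.5      72.25       72.25
30 – 34    13.5     182.25      546.75
Total                          1105.00

Cont…

• Variance = ∑ f(m – x̄)²/(n – 1) = 1105/19 = 58.158

• Standard deviation = √58.158 = 7.626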
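The grouped sample variance can be reproduced step by step from the class limits and frequencies. A sketch with illustrative variable names:

```python
from math import sqrt

classes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
freqs = [2, 4, 7, 3, 1, 3]

# Midpoint of each class interval
mids = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)
g_mean = sum(f * m for f, m in zip(freqs, mids)) / n

# Sum of f * (m - mean)^2, then divide by n - 1 for a sample
sum_sq = sum(f * (m - g_mean) ** 2 for f, m in zip(freqs, mids))
sample_variance = sum_sq / (n - 1)
sample_sd = sqrt(sample_variance)

print(g_mean)                     # 18.5
print(sum_sq)                     # 1105.0
print(round(sample_variance, 3))  # 58.158
print(round(sample_sd, 3))        # 7.626
```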


C. Measures of relationship
1. Coefficient of variation
The coefficient of variation (CV) is a normalized measure of dispersion.
It is also known as unitized risk or the variation coefficient.
It is defined as the ratio of the standard deviation to the mean.
CV is a relative measure of dispersion, usually expressed as a percentage:

$$CV = \left(\frac{SD}{\text{Mean}}\right) \times 100\%$$


Example: If the standard deviation of a given
distribution is 0.20 and the mean is 0.50, what is
the coefficient of variation (CV)?

CV = (0.20/0.50)*100% = 40%
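A small sketch of the CV calculation. The function name and the guard against a zero mean are illustrative additions, since the ratio is undefined when the mean is 0.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100

cv = coefficient_of_variation(0.20, 0.50)
print(round(cv, 2))  # 40.0
```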
2. Covariance
Covariance between X and Y is a measure of how much the two variables change together.
It indicates how two variables are related.
A positive covariance means the variables are positively related, while a negative covariance means the variables are inversely related.
The formula for calculating the covariance of sample data is:

$$Cov(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Note: for a population, divide by N; for a sample, divide by n – 1.
• The magnitude of the covariance often has no direct meaning, so we focus on its sign.


3. Correlation
Covariance only shows the direction of the relationship. It has no upper or lower bound.
Correlation tells the degree to which the variables tend to move together.
The most familiar measure of dependence between two quantities is Pearson's correlation.
It is obtained by dividing the covariance of the two variables by the product of their standard deviations.
The Pearson correlation is defined only if both standard deviations are finite and nonzero.
The correlation coefficient is symmetric: corr(X, Y) = corr(Y, X).
The Pearson correlation is +1 if there is a perfect positive linear relationship and −1 if there is a perfect negative linear relationship.
If the variables are independent, Pearson's correlation coefficient is 0.
The sample correlation coefficient is written

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$$


The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.
The population correlation, denoted by ρ, can take on any value from −1 to 1.
• ρ = −1 indicates a perfect negative linear relationship
• −1 < ρ < 0 indicates a negative linear relationship
• ρ = 0 indicates no linear relationship
• 0 < ρ < 1 indicates a positive linear relationship
• ρ = 1 indicates a perfect positive linear relationship
The absolute value of ρ indicates the strength or exactness of the relationship.


Example: Find the covariance and Pearson correlation for the following hypothetical raw data.

Here x̄ = 70/5 = 14 and ȳ = 60/5 = 12.

xi    yi   xi – x̄   yi – ȳ   (xi – x̄)(yi – ȳ)   (xi – x̄)²   (yi – ȳ)²
10    18      –4        6            –24              16          36
30     6      16       –6            –96             256          36
8     12      –6        0              0              36           0
16    15       2        3              6               4           9
6      9      –8       –3             24              64           9
Total                                –90             376          90

Cov(X, Y) = ∑(xi – x̄)(yi – ȳ)/n = –90/5 = –18 (dividing by n, as in the population formula)

r(x, y) = ∑(xi – x̄)(yi – ȳ)/√(∑(xi – x̄)² ∑(yi – ȳ)²) = –90/√(376 × 90) = –90/√33,840 ≈ –90/183.96 ≈ –0.49
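The table and both results can be reproduced with a few lines. This sketch divides the covariance by n, as the worked example does; the variable names are illustrative.

```python
from math import sqrt

x = [10, 30, 8, 16, 6]
y = [18, 6, 12, 15, 9]
n = len(x)

x_bar = sum(x) / n  # 14.0
y_bar = sum(y) / n  # 12.0

# Sums of cross products and squared deviations (the column totals of the table)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)

covariance = s_xy / n            # divided by n, as in the example
r = s_xy / sqrt(s_xx * s_yy)     # Pearson correlation

print(s_xy, s_xx, s_yy)  # -90.0 376.0 90.0
print(covariance)        # -18.0
print(round(r, 2))       # -0.49
```

The divisor cancels in the correlation, so r is the same whether the covariance and standard deviations use n or n – 1.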
D. Shape of Frequency Distribution
Skewness
• It refers to the symmetry or asymmetry of the distribution.
• A distribution is symmetric if its left half is a mirror image of its right half.
• The skewness value can be positive or negative.
• A symmetric distribution with a single peak and a bell shape is known as a normal distribution.


Kurtosis:
• It refers to the peakedness or flatness of the distribution.
• Higher kurtosis means more of the variance is the result of infrequent extreme deviations.
• The fourth standardized moment is defined as

$$KU = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{(n - 1) S^4}$$

where S is the standard deviation.
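A sketch of the kurtosis formula KU = ∑(xi – x̄)⁴ / ((n – 1) S⁴). Two assumptions are made here: S is taken to be the sample standard deviation (divisor n – 1), and the data set is a hypothetical one chosen so the arithmetic is easy to follow.

```python
from statistics import mean, stdev

def kurtosis(values):
    """Sum of fourth-power deviations divided by (n - 1) * S^4."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (assumption: divisor n - 1)
    fourth_powers = sum((x - x_bar) ** 4 for x in values)
    return fourth_powers / ((n - 1) * s ** 4)

data = [1, 2, 3, 4, 5]  # hypothetical data, not from the slides
ku = kurtosis(data)
print(round(ku, 2))  # 1.36
```

For this data the deviations are –2, –1, 0, 1, 2, so the numerator is 16 + 1 + 0 + 1 + 16 = 34, S² = 10/4 = 2.5, and KU = 34/(4 × 6.25) = 1.36.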
