
Descriptive Lec

Uploaded by

fafa1980002
Copyright
© © All Rights Reserved


DESCRIPTIVE STATISTICS
2024

Lecture (3)

The MEDIAN (Q2) and quartiles
The median is defined to be a value that splits the sorted data into two equal parts (# of smaller observations = # of
larger observations). Two other measures similar to the median are the first quartile (Q1) and the third
quartile (Q3). The first quartile splits the data such that 25% of the observations are less than or equal to Q1
and 75% are larger. The third quartile splits the data such that 75% of the observations are less than or equal
to Q3 and 25% are larger. After sorting the data, the positions of the median and the two quartiles are determined as:
$$\text{Position}(Q_1) = \frac{n+1}{4}, \qquad \text{Position}(Q_2) = \frac{n+1}{2}, \qquad \text{Position}(Q_3) = \frac{3(n+1)}{4}$$

Example (1):
Given the following data, find Q1, Q2, and Q3:
0, -8, -10, 2, 8, 9, 80, 15.
Solution:
First, arrange the data (from smallest to largest)
-10 -8 0 2 8 9 15 80
X1 X2 X3 X4 X5 X6 X7 X8

$$\text{Position}(Q_1) = \frac{n+1}{4} = \frac{8+1}{4} = \frac{9}{4} = 2.25$$

$$Q_1 = X_2 + 0.25\,(X_3 - X_2) = -8 + 0.25\,(0 - (-8)) = -8 + 0.25(8) = -8 + 2 = -6$$

$$\text{Position}(Q_2) = \frac{n+1}{2} = \frac{8+1}{2} = \frac{9}{2} = 4.5$$

$$Q_2 = X_4 + 0.5\,(X_5 - X_4) = 2 + 0.5\,(8 - 2) = 2 + 0.5(6) = 2 + 3 = 5$$

$$\text{Position}(Q_3) = \frac{3(n+1)}{4} = \frac{3(8+1)}{4} = \frac{27}{4} = 6.75$$

$$Q_3 = X_6 + 0.75\,(X_7 - X_6) = 9 + 0.75\,(15 - 9) = 9 + 0.75(6) = 9 + 4.5 = 13.5$$

Example(2):
Find Q1, Q2, and Q3 from the following data: 25, 40, 15, 90, 50.
Solution:
First, arrange the data (from smallest to largest)
15 25 40 50 90
X1 X2 X3 X4 X5

$$\text{Position}(Q_1) = \frac{n+1}{4} = \frac{5+1}{4} = \frac{6}{4} = 1.5$$

$$Q_1 = X_1 + 0.5\,(X_2 - X_1) = 15 + 0.5\,(25 - 15) = 15 + 0.5(10) = 15 + 5 = 20$$

$$\text{Position}(Q_2) = \frac{n+1}{2} = \frac{5+1}{2} = \frac{6}{2} = 3$$

$$Q_2 = X_3 = 40$$

$$\text{Position}(Q_3) = \frac{3(n+1)}{4} = \frac{3(5+1)}{4} = \frac{18}{4} = 4.5$$

$$Q_3 = X_4 + 0.5\,(X_5 - X_4) = 50 + 0.5\,(90 - 50) = 50 + 0.5(40) = 50 + 20 = 70$$
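The position rule used in Examples (1) and (2) can be sketched in Python as follows. This is only an illustrative sketch: the function name `quartile` and its arguments are hypothetical, and it assumes that a fractional position is resolved by linear interpolation between the two neighbouring sorted observations, as in the worked solutions.

```python
def quartile(data, k):
    """Return the k-th quartile (k = 1, 2, 3) using the position k(n+1)/4.

    Hypothetical helper: a fractional position is resolved by linear
    interpolation between the two neighbouring sorted observations.
    """
    xs = sorted(data)                 # arrange from smallest to largest
    n = len(xs)
    pos = k * (n + 1) / 4             # 1-based position, possibly fractional
    lower = int(pos)                  # integer part of the position
    frac = pos - lower                # fractional part of the position
    if lower < 1:                     # position falls before the first value
        return xs[0]
    if lower >= n:                    # position falls at or after the last value
        return xs[-1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


example_1 = [0, -8, -10, 2, 8, 9, 80, 15]
example_2 = [25, 40, 15, 90, 50]

print([quartile(example_1, k) for k in (1, 2, 3)])   # [-6.0, 5.0, 13.5]
print([quartile(example_2, k) for k in (1, 2, 3)])   # [20.0, 40.0, 70.0]
```

Other textbooks and software packages use slightly different position rules, so quartiles computed elsewhere may differ a little from these values.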

Properties of the median


1) Unique value.
2) Insensitive to extreme values (outliers).
3) Can be computed for ordinal data.
4) Does not use all the observations in its calculation.
5) If 𝒚 = 𝒂 + 𝒃. 𝒙, then 𝒎𝒆𝒅𝒊𝒂𝒏 (𝒚) = 𝒂 + 𝒃. 𝒎𝒆𝒅𝒊𝒂𝒏 (𝒙).
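Properties (2) and (5) can be checked numerically with a short Python sketch. The data are those of Example (1); the outlier value 800 and the constants a = 9, b = -2 are hypothetical choices made only for illustration.

```python
from statistics import median

x = [0, -8, -10, 2, 8, 9, 80, 15]            # data of Example (1)
x_outlier = [0, -8, -10, 2, 8, 9, 800, 15]   # largest value made more extreme

med_x = median(x)
med_outlier = median(x_outlier)

# Property (2): the median does not change when the extreme value changes.
print(med_x, med_outlier)                    # 5.0 5.0

# Property (5): y = a + b*x with hypothetical constants a = 9, b = -2.
a, b = 9, -2
y = [a + b * value for value in x]
med_y = median(y)
print(med_y, a + b * med_x)                  # -1.0 -1.0
```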

The MODE

The mode is the most frequently repeated observation. For example, the mode of the data set
{9, 6, 8, 8, 10, 12, 8, 9} is 8. The mode is not frequently used as a measure of location with
quantitative data. However, it is the only measure of location that can be used with nominal
(unordered qualitative) data.

Example (1):
Find the Mode from the following data:
i. 3, 8, 4, 20, 5. No mode
ii. 5, 5, 8, 10, 7, 7, 14, 7. One mode = 7
iii. 0, 0, -2, 7, 6, -2, 9. Two modes: 0 and -2
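The three cases above can be reproduced with a small Python sketch. The function name `modes` is hypothetical, and the sketch assumes the convention suggested by case (i): when every distinct value occurs equally often, the data set is reported as having no mode.

```python
from collections import Counter


def modes(data):
    """Return the sorted list of modes; an empty list means 'no mode'.

    Assumed convention: if all distinct values occur equally often,
    the data set has no mode.
    """
    if not data:
        return []
    counts = Counter(data)                     # frequency of each value
    if len(set(counts.values())) == 1:         # every value equally frequent
        return []
    top = max(counts.values())                 # highest frequency
    return sorted(value for value, c in counts.items() if c == top)


print(modes([3, 8, 4, 20, 5]))                 # []       (no mode)
print(modes([5, 5, 8, 10, 7, 7, 14, 7]))       # [7]      (one mode)
print(modes([0, 0, -2, 7, 6, -2, 9]))          # [-2, 0]  (two modes)
```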

Properties of the mode


1- Can be evaluated for all types of data.
2- Insensitive to extreme values (outliers).
3- A data set may have no mode, one mode, or more than one mode.
4- If 𝒚 = 𝒂 + 𝒃. 𝒙, then 𝒎𝒐𝒅𝒆 (𝒚) = 𝒂 + 𝒃. 𝒎𝒐𝒅𝒆 (𝒙).

Measures of Absolute Dispersion

1- Range     2- Inter-Quartile Range     3- Standard Deviation

Measures of dispersion (variability or spread) describe the variability structure of a data set,
that is, how far the observations are scattered from one another. They indicate whether the
observations in a given data set are widely dispersed (large dispersion) or concentrated close
to each other (small dispersion). For example:

                    |         Center          | Dispersion
Data set            | Mean    Median    Mode  | Range
-5, 5, 5, 15        |  5        5        5    |  20
-20, 5, 5, 30       |  5        5        5    |  50
-40, 5, 5, 50       |  5        5        5    |  90
The three data sets have the same measures of central tendency, but their deviations from the
center are different.
Note that all measures of dispersion are nonnegative. Also, when all observations in a data set
have the same value, every measure of dispersion equals zero; for the range and the standard
deviation the converse also holds.
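The comparison of the three data sets can be reproduced with Python's standard `statistics` module. The helper name `summarize` is hypothetical and the sketch is meant only as a check of the tabulated values.

```python
from statistics import mean, median, mode

data_sets = [
    [-5, 5, 5, 15],
    [-20, 5, 5, 30],
    [-40, 5, 5, 50],
]


def summarize(data):
    """Return (mean, median, mode, range) for one data set."""
    return mean(data), median(data), mode(data), max(data) - min(data)


summaries = [summarize(data) for data in data_sets]
for data, summary in zip(data_sets, summaries):
    # Same centre (5, 5, 5) in every row, but the range grows: 20, 50, 90.
    print(data, summary)
```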

1- Range:
The range is the simplest measure of dispersion to calculate. It is obtained by taking the
difference between the largest and the smallest values in a data set.

𝑹 = 𝑳𝒂𝒓𝒈𝒆𝒔𝒕 − 𝑺𝒎𝒂𝒍𝒍𝒆𝒔𝒕

Properties of the range:


1) Very simple to compute.
2) Its calculation is based on two values only: the largest and the smallest, and these
two values may be outliers or extreme.
3) Sensitive to extreme values.

2- Inter-Quartile Range:
The I.Q.R. measures the range of only the middle 50% of the observations, and it
eliminates any information about the first and last quarters of the data.

𝑰𝑸𝑹 = 𝑸𝟑 − 𝑸𝟏

Properties of I.Q.R:
1) Depends on only two values.
2) Insensitive to extreme outliers.
Note that:
The range and the I.Q.R. each depend on only two values: the smallest and the largest
observations for the range, and Q1 and Q3 for the I.Q.R. A measure of dispersion that makes
full use of the information provided by the data is generally preferable.
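The different sensitivity of the range and the I.Q.R. to extreme values can be illustrated with a short Python sketch. The helper names and the outlier value 800 are hypothetical. The sketch relies on `statistics.quantiles` with `method="exclusive"`, which, as far as we are aware, uses (n+1)-based positions with linear interpolation and so should agree with the position rule of this lecture.

```python
from statistics import quantiles


def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


def iqr(data):
    """Q3 - Q1, with quartiles taken at (n+1)-based positions (assumed)."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    return q3 - q1


x = [0, -8, -10, 2, 8, 9, 80, 15]           # data of the quartile Example (1)
x_extreme = [0, -8, -10, 2, 8, 9, 800, 15]  # same data with a larger outlier

print(data_range(x), data_range(x_extreme))  # 90 810    (range changes)
print(iqr(x), iqr(x_extreme))                # 19.5 19.5 (I.Q.R. unchanged)
```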

3- Standard deviation and Variance:


𝑺𝒕𝒂𝒏𝒅𝒂𝒓𝒅 𝒅𝒆𝒗𝒊𝒂𝒕𝒊𝒐𝒏 = +√𝑽𝒂𝒓𝒊𝒂𝒏𝒄𝒆
The standard deviation is the most used measure of dispersion. The value of the standard
deviation tells how closely the values of a data set are clustered around the mean. The
standard deviation is obtained by taking the positive square root of the variance. The
variance calculated for population data is denoted by 𝝈𝟐 , and the variance calculated for
sample data is denoted by 𝑺𝟐 . Consequently, the standard deviation calculated for the
population data is denoted by 𝝈, and the standard deviation calculated for the sample data
is denoted by 𝑺.
The formulas for the sample variance and the sample standard deviation are:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

$$S = +\sqrt{S^2}$$
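The two forms of the sample variance have the same numerator. A short derivation, using only the definition of the sample mean, $\bar{x} = \sum x / n$, is sketched below:

```latex
\begin{aligned}
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
  &= \sum x^2 - 2\,\frac{(\sum x)^2}{n} + \frac{(\sum x)^2}{n} \\
  &= \sum x^2 - \frac{(\sum x)^2}{n}
\end{aligned}
```

Dividing both sides by $n - 1$ gives the two expressions for $S^2$.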

Note that:
• The values of the variance and the standard deviation are never negative.
• The measurement units of variance are always the square of the measurement units
of the original data.

Example (1):
Find the sample standard deviation from the following data: 4, 9, 5
Solution:

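x          x − x̄            (x − x̄)²     x²
4          4 − 6 = −2        4            16
9          9 − 6 = 3         9            81
5          5 − 6 = −1        1            25
Σx = 18    0                 14           Σx² = 122

Note that:
$$\bar{x} = \frac{\sum x}{n} = \frac{18}{3} = 6$$

The sample variance, using the first formula:
$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{14}{3 - 1} = \frac{14}{2} = 7$$

The sample variance, using the second formula:
$$S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{122 - \frac{(18)^2}{3}}{3 - 1} = \frac{122 - 108}{2} = \frac{14}{2} = 7$$

The sample standard deviation:
$$S = \sqrt{7} \approx 2.646$$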
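The calculation in this example can be checked with a few lines of Python. The variable names are hypothetical; the comparison with `statistics.variance` and `statistics.stdev` is included only as a cross-check, since both use the n − 1 divisor.

```python
import math
from statistics import stdev, variance

x = [4, 9, 5]
n = len(x)
x_bar = sum(x) / n                                   # sample mean = 6.0

# First formula: sum of squared deviations from the mean over n - 1.
s2_definition = sum((v - x_bar) ** 2 for v in x) / (n - 1)

# Second formula: (sum of squares - (sum)^2 / n) over n - 1.
s2_shortcut = (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)

s = math.sqrt(s2_definition)                         # sample standard deviation

print(s2_definition, s2_shortcut)                    # 7.0 7.0
print(round(s, 3))                                   # 2.646
print(variance(x) == s2_definition)                  # True
print(math.isclose(stdev(x), s))                     # True
```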
Properties of S:
1) Depends on all values.
2) Sensitive to extreme values.
Notes:
• If all values in a data set are equal, for example 8, 8, 8, 8, 8, then:
▪ The mean = the median.
▪ No mode.
▪ All measures of dispersion = zero.
• If 𝒚 = 𝒂 + 𝒃. 𝒙, then
▪ 𝑹𝒂𝒏𝒈𝒆 (𝒚) = |𝒃|. 𝑹𝒂𝒏𝒈𝒆 (𝒙)

▪ 𝑰. 𝑸. 𝑹 (𝒚) = |𝒃|. 𝑰. 𝑸. 𝑹 (𝒙)


▪ 𝑺𝒅 (𝒚) = |𝒃|. 𝑺𝒅 (𝒙)

Example (2):
If 𝒚 = 𝟗 − 𝟐𝒙, and the variance of x = 25, then
𝑺𝒅 (y) = |−𝟐| × 𝑺𝒅(𝒙) = 𝟐 × √𝟐𝟓 = 𝟐 × 𝟓 = 𝟏𝟎
Example (3):
If 𝒚 = 𝟓 − 𝟐𝒙, and the variance of x = 10, then, since squaring 𝑺𝒅 (𝒚) = |𝒃|. 𝑺𝒅 (𝒙) gives
Variance (y) = 𝒃𝟐 × Variance (x),
Variance of (y) = (−𝟐)𝟐 × 𝟏𝟎 = 𝟒 × 𝟏𝟎 = 𝟒𝟎
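The scaling rules for the range, the standard deviation, and the variance can be verified numerically. In the sketch below the sample 4, 9, 5 is borrowed from the standard deviation example purely for illustration (it is not the data behind the variances quoted above), and the transformation y = 9 − 2x is applied to it.

```python
import math
from statistics import stdev, variance

x = [4, 9, 5]                      # illustrative sample (an assumption)
a, b = 9, -2                       # y = 9 - 2x
y = [a + b * value for value in x] # [1, -9, -1]

range_x = max(x) - min(x)          # 5
range_y = max(y) - min(y)          # 10

print(range_y == abs(b) * range_x)                   # True
print(math.isclose(stdev(y), abs(b) * stdev(x)))     # True
print(variance(y) == b ** 2 * variance(x))           # True (28 = 4 * 7)
```

The constant a shifts every observation by the same amount, so it does not appear in any of the three rules.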
Notes:
(1) Measures of location are used to describe the general level of the data. For example, to
compare the standards of living in two countries, one important indicator is the
average (personal) income. To evaluate the performance level of workers, we consider their
average daily production or the average time per worker to produce one unit. Here, we seek a
single value at the middle of the data to determine its location on the real line.
(2) Measures of absolute dispersion are used to describe the variability structure of one group,
while measures of relative dispersion, to be discussed next, are used to compare the variability
structures of two or more groups.

Homework
Question (1):
The following data set represents the average monthly income (in hundreds) in a random
sample of 6 families.
40 3 3 4 4 6
Find each of the following:
(1) The mean, the median, and the mode.
(2) The quartiles.
Question (2):
The following is the time in minutes taken to serve customers at a certain bank.
8 9 9 10 11 11 12 14 14 16 19
Answer the following:
a) Compute the mean of the serving time.
b) The median and quartiles, 𝑸𝟏 and 𝑸𝟑 .
c) The mode.

Question (3):
The following frequency distribution table represents the daily wages in L.E for a random
sample of 100 workers in the drug industry.
Class       17.5-   22.5-   27.5-   32.5-   37.5-   42.5-   47.5-52.5
Frequency    15      10      15      20      25      10       5
1) Determine the population, the sampling unit, and the variable.
2) Draw the Histogram and the polygon.
3) Construct the cumulative frequency distribution table.
4) Find the number of workers whose wages are between 30 and 42.5.
5) Use the cumulative frequency curve to find the percentage of workers whose wages
are greater than 30.
Question (4):
Choose the correct answer in each of the following questions:
1. If the Statistics score of all students in some exam is 90, this means that …..
a) The mode = zero.
b) The mode = 90.
c) The interquartile range = 90.
d) The standard deviation = zero.
2. If 80% of the unemployed in some country searched for new jobs for at least 5 months, then
…..
a) The first quartile is less than or equal to 5.
b) The first quartile is greater than or equal to 5.
c) The third quartile is less than or equal to 5.
d) The median is less than or equal to 5.
3. If 25% of the unemployed in some country searched for new jobs for at most 5 months, then
…..
a) 𝑸𝟑 = 𝟓.
b) 𝑸𝟏 = 𝟓.
c) 𝑸𝟐 = 𝟓.
d) 𝒙̅=𝟓
4. If 80% of middle-sized companies paid their tax obligations by the deadline, then …..
a) The first quartile is greater than or equal to the deadline.
b) The median is less than or equal to the deadline.
c) The median is greater than or equal to the deadline.
d) The third quartile is greater than or equal to the deadline.
5. If 75% of middle-sized companies paid their tax obligations by the deadline, then this deadline is
simply …..
a) The first quartile.
b) The median.
c) The third quartile.
d) The mean.
6. If the median depth of a lake is 1.5 meters, it means that …..
a) An adult of average height can walk through the first half of the lake.
b) An adult of average height can walk through the first quartile of the lake.
c) An adult of average height cannot walk through the lake.
d) Most spots in the lake have depth less than or equal to 1.5 meters.
7. If the average depth of a lake is 1.5 meters, it means that …..
a) The deepest point of the lake is 1.5 meters.
b) An adult of average height can walk through the lake.
c) 50% of the lake has depth less than or equal to 1.5 meters.
d) There could be a spot in the lake where it is deeper than 1.5 meters.
