
Lecture 2 - Descriptive Statistics



Lecture 2

DESCRIPTIVE STATISTICS
DR HARIATI ABDULLAH HASHIM

hariati@utm.my
Lecture Content
• Measure of central tendency
• Measure of dispersion
• Statistical graphics and frequency distribution
PART I
Measure of Central Tendency
Measures of central tendency are measures of the location of the middle or centre of a distribution. They identify a single value as representative of the entire distribution.

The most frequently used are:

Mean Median Mode




MEAN
Arithmetic Mean
Geometric Mean
Weighted Mean
Trimmed Mean
Arithmetic Mean
It is an average, a measure of the centre of a set of data.
Most commonly used, accurate but sensitive to extreme values (outliers).

Ungrouped data: $\bar{x} = \frac{\sum x_i}{n}$
  $\bar{x}$ = sample mean
  $x_i$ = value of x for a particular case
  $n$ = number of cases

Grouped data: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
  $\bar{x}$ = sample mean
  $x_i$ = midpoint for the range of values
  $f_i$ = frequency of values in the range
Example
Arithmetic Mean
The following table shows the number of plants in 20 houses in a group. Find the mean number of plants per house.

Number of plants   Number of houses ($f_i$)   $x_i$   $f_i x_i$
0-2                1
2-4                2
4-6                2
6-8                4
8-10               6
10-12              2
12-14              3

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
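The empty $x_i$ and $f_i x_i$ columns can be filled in mechanically. The sketch below is one way to do it, assuming each class is represented by its midpoint as the grouped-data formula requires; the variable names are illustrative only.

```python
# Class intervals (number of plants) and frequencies (number of houses) from the table.
classes = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (10, 12), (12, 14)]
freqs = [1, 2, 2, 4, 6, 2, 3]

# x_i: midpoint of each class interval.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# f_i * x_i for each row of the table.
fx = [f * x for f, x in zip(freqs, midpoints)]

# Grouped mean = sum(f_i * x_i) / sum(f_i).
grouped_mean = sum(fx) / sum(freqs)

for (lo, hi), f, x, prod in zip(classes, freqs, midpoints, fx):
    print(f"{lo}-{hi}: f={f}, x={x}, f*x={prod}")
print("sum f =", sum(freqs), " sum f*x =", sum(fx), " mean =", grouped_mean)
```

The frequencies add up to the 20 houses stated in the question, which is a quick check that the table was entered correctly.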
Geometric Mean
Normally used to average indexes and percentages (e.g. percent increase in sales, production or other business figures from one time to another).

Growth rate:
$GM = \sqrt[n]{(x_1)(x_2)(x_3)(x_4) \cdots (x_n)}$

OR

Portfolio return:
$GM = \sqrt[n]{\frac{\text{value at the end of period}}{\text{value at the beginning of period}}} - 1$
Example
Geometric Mean in Growth Rate
Consider a stock that grows by 10% in year one, declines by 20% in year two, and
then grows by 30% in year three. The geometric mean of the growth rate is
calculated as follows:

$GM = \sqrt[n]{(x_1)(x_2)(x_3)(x_4) \cdots (x_n)}$

$GM = \sqrt[3]{(1 + 0.1)(1 - 0.2)(1 + 0.3)}$
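A short sketch of this calculation, assuming each yearly change is first converted to a growth factor (1 + rate) as in the expression above. The last step, subtracting 1 to read the result as an average yearly rate, is an added interpretation and is not shown on the slide.

```python
import math

# Yearly growth rates from the example: +10%, -20%, +30%.
rates = [0.10, -0.20, 0.30]

# Convert each rate to a growth factor, e.g. +10% -> 1.10.
factors = [1 + r for r in rates]

# Geometric mean: n-th root of the product of the n factors.
gm = math.prod(factors) ** (1 / len(factors))

# Assumption: read GM - 1 as the average growth rate per year.
avg_growth = gm - 1

print(f"GM = {gm:.4f}, average growth per year = {avg_growth:.2%}")
```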
Example
Geometric Mean in Portfolio Return
Consider a portfolio of stocks that goes up from RM100 to RM110 in year one, then declines to
RM80 in year two and goes up to RM150 in year three. The return on portfolio is then
calculated as follows:

$GM = \sqrt[n]{\frac{\text{value at the end of period}}{\text{value at the beginning of period}}} - 1$

$GM = \sqrt[3]{\frac{150}{100}} - 1$
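The same calculation as a minimal sketch. Only the starting and ending values enter the formula; the intermediate values (RM110, RM80) do not affect the result. Variable names are illustrative.

```python
# Portfolio value at the start and at the end of the three-year period (RM).
start_value = 100.0
end_value = 150.0
n_years = 3

# Geometric mean return: n-th root of (end / start), minus 1.
annual_return = (end_value / start_value) ** (1 / n_years) - 1

print(f"Average annual return = {annual_return:.2%}")

# Sanity check: compounding the result for 3 years recovers the end value.
print(start_value * (1 + annual_return) ** n_years)
```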
Weighted Mean
Weighted mean is calculated when certain values in a data set are more important than the others.
A weight $w_i$ is attached to each of the values $x_i$ to reflect this importance.

$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$
Example
Weighted Means
Sarah wants to buy a new camera, and decides on the following rating system:

Image Quality   50%
Battery Life    30%
Zoom Range      20%

The Sony camera gets 8 (out of 10) for Image Quality, 6 for Battery Life and 7 for Zoom Range.
The Conan camera gets 9 for Image Quality, 4 for Battery Life and 6 for Zoom Range.

$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$

Sony: $\bar{x}_w = \frac{0.5 \times 8 + 0.3 \times 6 + 0.2 \times 7}{1}$

Which camera is best?
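One way to compare the two cameras is to apply the weighted-mean formula to each set of scores. The sketch below does that; the dictionary keys and the helper name `weighted_mean` are illustrative choices, not part of the lecture.

```python
# Weights from Sarah's rating system (they sum to 1).
weights = {"image_quality": 0.5, "battery_life": 0.3, "zoom_range": 0.2}

# Scores out of 10 for each camera.
scores = {
    "Sony": {"image_quality": 8, "battery_life": 6, "zoom_range": 7},
    "Conan": {"image_quality": 9, "battery_life": 4, "zoom_range": 6},
}


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    total = sum(weights[k] * values[k] for k in weights)
    return total / sum(weights.values())


results = {name: weighted_mean(s, weights) for name, s in scores.items()}
best = max(results, key=results.get)

for name, value in results.items():
    print(f"{name}: {value:.1f}")
print("Highest weighted score:", best)
```

Although the Conan scores higher on Image Quality, its low Battery Life score pulls its weighted mean below the Sony's.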


Trimmed Mean
Trimmed mean is calculated by discarding a certain percentage of the extreme values in a distribution and then calculating the mean of the remaining values.

It trims any outliers


Example
Trimmed mean
Find the 20% trimmed mean for the following test scores: 60, 81, 83, 91, 99

Step 1:

Trim the top and bottom 20% from the data. With five scores, that removes the lowest (60) and the highest (99), leaving the middle three values:

81, 83, 91

Step 2:

Find the mean of the remaining values. The mean is (81 + 83 + 91) / 3 = ?
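The two steps can be sketched as follows, assuming the trim percentage is applied to each end of the sorted data and the number of values to drop is rounded down to a whole number.

```python
scores = [60, 81, 83, 91, 99]
trim = 0.20  # fraction to discard from EACH end

# Step 1: sort, then drop k values from the bottom and k from the top.
ordered = sorted(scores)
k = int(len(ordered) * trim)          # 20% of 5 values -> 1 value per end
kept = ordered[k:len(ordered) - k]

# Step 2: ordinary mean of what remains.
trimmed_mean = sum(kept) / len(kept)

print("kept:", kept)
print("20% trimmed mean:", trimmed_mean)
```

For comparison, the untrimmed mean of the five scores is 82.8, pulled down by the low score of 60.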


Why MEDIAN?
▪ The mean is typically skewed by extreme values (outliers).
▪ The median is preferred when our data is skewed (i.e., the frequency distribution for our data is skewed).

As the data becomes skewed, the mean loses its ability to provide the best central location for the data because the skewed data drags it away from the typical value.
MEDIAN
Median for ungrouped data
• The median is the middle value that lies in the centre of the data when the values are ranked in ascending or descending order.

• If there is an ODD number of values, the median is the ((n+1)/2)th value.

• If there is an EVEN number of values, the median is the average of the two middle values, that is, the (n/2)th and ((n/2)+1)th values.
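The odd and even rules above can be sketched in a few lines. The data used here are the test scores from the trimmed-mean example, with the last score dropped to make an even-sized set; that second data set is an illustrative assumption.

```python
def median(values):
    """Median of ungrouped data using the odd/even position rules."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th value, i.e. zero-based index n // 2.
        return ordered[mid]
    # Even n: average of the (n / 2)-th and ((n / 2) + 1)-th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([60, 81, 83, 91, 99])   # 5 values -> 3rd value
even_case = median([60, 81, 83, 91])      # 4 values -> mean of 2nd and 3rd

print(odd_case, even_case)
```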


Median for Grouped Data
Marks      No. of Students
0 – 10     10
10 – 20    20
20 – 30    30
30 – 40    40
40 – 50    50
50 – 60    30

Step 1: Obtain the cumulative frequencies.
Step 2: Determine the location of the median class interval using the cumulative frequency column.
Step 3: Calculate the value of the median.
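The three steps can be sketched as below. The slide does not state the formula for Step 3, so this assumes the usual linear-interpolation formula, median = L + ((N/2 − CF) / f) × h, where L is the lower limit of the median class, CF the cumulative frequency before it, f its frequency and h its width.

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [10, 20, 30, 40, 50, 30]

# Step 1: cumulative frequencies.
cum = list(accumulate(freqs))
n_total = cum[-1]

# Step 2: the median class is the first class whose cumulative
# frequency reaches N / 2.
half = n_total / 2
idx = next(i for i, c in enumerate(cum) if c >= half)
lower, upper = classes[idx]
cf_before = cum[idx - 1] if idx > 0 else 0

# Step 3: interpolate inside the median class (assumed formula).
grouped_median = lower + (half - cf_before) * (upper - lower) / freqs[idx]

print("cumulative frequencies:", cum)
print("median class:", classes[idx])
print("median:", grouped_median)
```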


Why MODE?
The mode is a statistical term that refers to the most frequently occurring number
found in a set of numbers.

The mode is useful when the most common item, characteristic or value of a data set
is required.

The mode is commonly used to describe ordinal and categorical data.


Mode for Grouped Data
Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7
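The slide gives the table but not the calculation. The sketch below assumes the common interpolation formula for the grouped mode, mode = L + ((f1 − f0) / (2·f1 − f0 − f2)) × h, and assumes class boundaries of ±0.5 around the stated limits (so the class 11 – 20 runs from 10.5 to 20.5). Both are assumptions, not taken from the lecture.

```python
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]

# Modal class: the class with the highest frequency.
idx = freqs.index(max(freqs))
lo, hi = classes[idx]

# Assumed class boundaries: extend each limit by 0.5.
lower_boundary = lo - 0.5
width = (hi + 0.5) - lower_boundary

f1 = freqs[idx]                                   # modal class frequency
f0 = freqs[idx - 1] if idx > 0 else 0             # class before
f2 = freqs[idx + 1] if idx < len(freqs) - 1 else 0  # class after

# Interpolated mode inside the modal class (assumed formula).
grouped_mode = lower_boundary + (f1 - f0) * width / (2 * f1 - f0 - f2)

print("modal class:", classes[idx])
print("estimated mode:", grouped_mode)
```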
End of Part 1
PART II
Measure of Dispersion

Range   Variance   Standard Deviation
Range
• Range is the difference between the largest value (maximum) and the smallest (minimum) value
in the data set.

• Range is sensitive to outliers

• Range is a good data screening technique

64, 66, 77, 72, 80, 72, 78, 63, 68, 79

What is the range for the above data set?
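A minimal sketch of the calculation for this data set:

```python
data = [64, 66, 77, 72, 80, 72, 78, 63, 68, 79]

# Range = largest value minus smallest value.
data_range = max(data) - min(data)

print("max:", max(data), "min:", min(data), "range:", data_range)
```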


Interquartile Range
▪ Interquartile Range (IQR) is an alternative measure of dispersion, which is less influenced by extreme values than the range.

▪ IQR is the range of the middle 50% of the values in a data set, which is calculated as the difference between the 75th and 25th percentile values.
Interquartile Range
The table below shows the marks obtained by a group of Form 4 students in a school mathematics test.

Marks      Frequency
20 – 29    4
30 – 39    8
40 – 49    20
50 – 59    16
60 – 69    9
70 – 79    3

Estimate the interquartile range.
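One possible way to estimate the quartiles is linear interpolation inside the class that contains each quartile. The sketch below assumes quartile positions of N/4 and 3N/4 and class boundaries of ±0.5 around the stated limits (e.g. 39.5 to 49.5); other textbooks use slightly different conventions, so treat the numbers as one reasonable estimate. The helper `grouped_quantile` is a hypothetical name.

```python
from itertools import accumulate

classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [4, 8, 20, 16, 9, 3]
cum = list(accumulate(freqs))
n_total = cum[-1]


def grouped_quantile(p):
    """Interpolated p-th quantile (0 < p < 1) of the grouped data."""
    target = p * n_total
    # First class whose cumulative frequency reaches the target position.
    idx = next(i for i, c in enumerate(cum) if c >= target)
    lo, hi = classes[idx]
    lower_boundary = lo - 0.5            # assumed class boundary
    width = (hi + 0.5) - lower_boundary
    cf_before = cum[idx - 1] if idx > 0 else 0
    return lower_boundary + (target - cf_before) * width / freqs[idx]


q1 = grouped_quantile(0.25)
q3 = grouped_quantile(0.75)
iqr = q3 - q1

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```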


Measure of Dispersion

Standard
Range Variance
Deviation
Variance and SD
Variance and SD are the most common measures of dispersion for continuous data.

Variance and SD are used to describe how far the individual values disperse from the mean value.

Variance (σ²): the average of the squared deviations from the mean.

Standard deviation (σ): the square root of the variance.
Variance for a Population Sample Variance

Standard Deviation for a Population Standard Deviation for a Sample
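For reference, the four quantities named above have the following standard textbook definitions. The notation is an assumption chosen here: N and μ are the population size and mean, n and x̄ the sample size and mean.

```latex
% Population variance and standard deviation
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
\qquad
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}

% Sample variance and standard deviation
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

The only difference between the two pairs is the divisor: N for a population, n − 1 for a sample.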


Example
Mean, Variance, and Standard Deviation
Sales for five days = RM600, RM470, RM170, RM430, RM300

Find the mean, variance and SD for the 5 days of business.
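A sketch of the calculation using Python's standard `statistics` module. The question does not say whether the five days are a whole population or a sample, so both versions are computed; which one applies is left as an assumption for the reader.

```python
import statistics

sales = [600, 470, 170, 430, 300]  # RM, five days

mean = statistics.mean(sales)

# Deviations from the mean and their squares, as in a hand calculation.
deviations = [x - mean for x in sales]
sum_sq = sum(d ** 2 for d in deviations)

# Population versions divide by N; sample versions divide by n - 1.
pop_var = statistics.pvariance(sales)
pop_sd = statistics.pstdev(sales)
sample_var = statistics.variance(sales)
sample_sd = statistics.stdev(sales)

print("mean:", mean)
print("sum of squared deviations:", sum_sq)
print(f"population: variance={pop_var}, SD={pop_sd:.2f}")
print(f"sample:     variance={sample_var}, SD={sample_sd:.2f}")
```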
SKEWNESS IN RELATION TO MEAN, MODE, MEDIAN
Normal distribution – BELL CURVE
Skewness
▪ A distribution can be symmetric, skewed to the right or skewed to the left.

▪ Pearson's coefficient of skewness is usually used to measure the skewness of the distribution.
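The slide does not say which of Pearson's coefficients is meant. The sketch below assumes the median-based form (often called Pearson's second coefficient), 3 × (mean − median) / SD, and applies it to the five-day sales figures from the earlier example using the sample standard deviation. A negative result indicates skew to the left, a positive one skew to the right.

```python
import statistics

sales = [600, 470, 170, 430, 300]  # RM, from the variance example

mean = statistics.mean(sales)
median = statistics.median(sales)
sd = statistics.stdev(sales)  # sample SD (an assumption)

# Pearson's median-based skewness coefficient (assumed form).
skewness = 3 * (mean - median) / sd

print(f"mean={mean}, median={median}, SD={sd:.2f}, skewness={skewness:.3f}")
```

Here the mean (394) lies below the median (430), so the coefficient is negative.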
Thank You
