Tutorial Wk3
Topic 1
Introduction to Statistics
Concept Review
Measures of Central Tendency: Mean, Median and Mode
Distribution Shape
Mean
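Sample mean (pronounced "x-bar"), where n is the sample size:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Population mean (pronounced "mu"), where N is the population size:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$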
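As a small illustration of the mean formula, the sketch below sums the values and divides by the count. The data values are made up for this example; the same arithmetic applies to a sample (divide by n) or to a population (divide by N).

```python
from statistics import mean

# Hypothetical sample of n = 5 observations (made-up values).
sample = [4, 8, 6, 5, 7]

# Apply the formula directly: sum of the X_i divided by n.
x_bar = sum(sample) / len(sample)

# The standard library agrees with the hand computation.
assert x_bar == mean(sample)
print(x_bar)  # 6.0
```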
Median
In an ordered array, the median is the “middle” number (50% above, 50% below)
If n or N is odd, the median is the middle number, i.e. the number ranked $\frac{N+1}{2}$ in the ordered array
If n or N is even, the median is the average of the 2 middle numbers
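The odd/even rule can be sketched in a few lines. The function name `median` and the data are chosen for this illustration only; Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Return the median using the odd/even rule on the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value ranked (n + 1) / 2, which is index `mid` (0-based).
        return ordered[mid]
    # Even n: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5, 3, 9]))      # 5
print(median([7, 1, 5, 3, 9, 11]))  # 6.0
```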
Mode
Value that occurs most often
Not affected by extreme values (outliers)
Used for both numerical and categorical data
There may be no mode
There may be several modes
[Figure: two dot plots. One, on an axis from 0 to 14, has modes 5, 9 and 12; the other, on an axis from 0 to 6, has no mode.]
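A minimal sketch of finding the mode(s) follows. The helper name `modes` and the data are made up for illustration, and it assumes the convention that there is no mode when no value repeats.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Assumed convention: every value occurs once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([0, 1, 2, 3, 4, 5, 6]))            # []
print(modes([5, 5, 9, 9, 12, 12, 3, 7]))       # [5, 9, 12]
print(modes(["red", "blue", "red", "green"]))  # ['red']
```

The last call shows that the mode also works for categorical data.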
Comparison Table
Measure   Variable type   Can it not exist?   Can there be more than one?   Affected by outliers?
Median    Numerical       No                  No                            No
Range
Simplest measure of variation
Difference between the largest and the smallest values
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Ignores the way in which data are distributed
[Figure: two data sets plotted on an axis from 7 to 12; for each, Range = 12 - 7 = 5]
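The sketch below illustrates why the range ignores how the data are distributed. Both data sets are made up; they share the same smallest and largest values but are spread differently in between.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Two hypothetical data sets with the same extremes.
spread_out = [7, 8, 9, 10, 11, 12]
clustered = [7, 10, 10, 10, 10, 12]

print(value_range(spread_out))  # 5
print(value_range(clustered))   # 5
```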
Quartiles
Interquartile Range
[Figure: ordered data with Q1, Q2 and Q3 marked; labelled "Interquartile Range"]
Interquartile range (IQR = Q3 - Q1) is also called the mid-spread because it covers the middle 50% of the data
Not influenced by outliers or extreme values
Usually, values that fall outside the range [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are considered outliers
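A minimal sketch of the 1.5*IQR outlier rule follows. The data are made up, and the quartiles come from Python's `statistics.quantiles` with its default "exclusive" method, which is an assumption here: the quartile rule used in this course may give slightly different values for other data sets.

```python
from statistics import quantiles

# Hypothetical data set with one suspiciously large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]

# Assumption: default method="exclusive"; other quartile rules exist.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences from the rule [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print(q1, q3, iqr)   # 5.0 11.0 6.0
print(lower, upper)  # -4.0 20.0
print(outliers)      # [40]
```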
Variance
Most preferred measure of variation due to its mathematical properties
It measures how far the values vary from the mean, using squared deviations
Sample Variance
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Population Variance (pronounced "sigma squared")
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
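The two variance formulas differ only in the divisor. The sketch below applies both to the same made-up data set, treating it once as a sample (divide by n - 1) and once as a whole population (divide by N), and checks the results against Python's `statistics` module.

```python
from math import isclose
from statistics import pvariance, variance

# Hypothetical data (made-up values).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n  # mean = 5.0

# Sum of squared deviations from the mean.
squared_devs = sum((x - x_bar) ** 2 for x in data)  # 32.0

sample_var = squared_devs / (n - 1)  # divide by n - 1
population_var = squared_devs / n    # divide by N

assert isclose(sample_var, variance(data))
assert isclose(population_var, pvariance(data))
print(round(sample_var, 3), population_var)  # 4.571 4.0
```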
Standard Deviation
Sample Standard Deviation
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Population Standard Deviation (pronounced "sigma")
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
Standard Deviation
[Figure: distribution centred at $\mu$ or $\bar{X}$, labelled "Smaller standard deviation"]
A smaller standard deviation means most values of X are closer to the mean. A larger standard deviation means the values of X are more spread out.
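To make the comparison concrete, the sketch below computes the sample standard deviation for two made-up data sets that share the same mean but differ in spread. The helper name `sample_sd` is chosen for this illustration.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

# Two hypothetical samples, both with mean 10.
tight = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

print(round(sample_sd(tight), 2))  # 0.71
print(round(sample_sd(wide), 2))   # 6.32
```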
Distribution Shape
Position of mean and median for a unimodal continuous distribution
Left-Skewed      Symmetric        Right-Skewed
Mean < Median    Mean = Median    Median < Mean
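The mean/median comparison can be sketched as a rule of thumb. The function name and data are made up for illustration; with real data the mean and median are rarely exactly equal, so treat the "symmetric" branch as an idealised case.

```python
from statistics import mean, median

def shape_from_mean_median(values):
    """Classify shape by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m < med:
        return "left-skewed"
    if m > med:
        return "right-skewed"
    return "symmetric"

print(shape_from_mean_median([1, 2, 3, 4, 5]))       # symmetric
print(shape_from_mean_median([1, 2, 3, 4, 20]))      # right-skewed
print(shape_from_mean_median([1, 17, 18, 19, 20]))   # left-skewed
```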
The Five Number Summary and Boxplot
The five numbers that help describe the center, spread and shape of data are
$X_{\text{smallest}}$ -- $Q_1$ -- Median -- $Q_3$ -- $X_{\text{largest}}$
[Figure: boxplot; each of its four sections contains 25% of the data]
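A minimal sketch of the five number summary follows. The data are made up, and the quartiles use the default "exclusive" method of `statistics.quantiles`, which is an assumption and may differ from the quartile rule taught in this course.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (X_smallest, Q1, median, Q3, X_largest)."""
    ordered = sorted(values)
    # Assumption: default method="exclusive" for the quartiles.
    q1, q2, q3 = quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]

print(five_number_summary([2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]))
# (2, 5.0, 8.0, 11.0, 40)
```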
Distribution Shape and Boxplot
[Figure: boxplot marked $X_{\text{smallest}}$, $Q_1$, $Q_2$, $Q_3$, $X_{\text{largest}}$, with three pairs of lengths A/B, C/D and E/F marked on it]
Comparison   Symmetric   Left-Skewed   Right-Skewed
A vs B       A = B       A > B         A < B
C vs D       C = D       C > D         C < D
E vs F       E = F       E > F         E < F
Look at all three pairs of comparisons and go with the majority
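The majority rule can be sketched as follows. The figure that defines A to F is not reproduced here, so the assignment of the six lengths below is an assumption (a common textbook convention), not necessarily the one on the slide; the function names are also hypothetical.

```python
from collections import Counter

def compare(left, right):
    """A longer left side suggests left skew; a longer right side, right skew."""
    if left > right:
        return "left-skewed"
    if left < right:
        return "right-skewed"
    return "symmetric"

def shape_from_boxplot(x_min, q1, q2, q3, x_max):
    # ASSUMED definitions of the three pairs (not confirmed by the slide):
    pairs = [
        (q2 - x_min, x_max - q2),  # A, B: each extreme to the median
        (q1 - x_min, x_max - q3),  # C, D: the two whiskers
        (q2 - q1, q3 - q2),        # E, F: the two halves of the box
    ]
    votes = Counter(compare(left, right) for left, right in pairs)
    label, count = votes.most_common(1)[0]
    # Go with the majority; report when all three comparisons disagree.
    return label if count >= 2 else "no majority"

print(shape_from_boxplot(2, 5, 8, 11, 40))    # right-skewed
print(shape_from_boxplot(1, 3, 5, 7, 9))      # symmetric
print(shape_from_boxplot(0, 30, 36, 39, 40))  # left-skewed
```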
Distribution Shape and Boxplot
[Figure: three example boxplots, each marked Q1, Q2 and Q3]
Quick Check
Kahoot! (20s for each question)
Please Google Kahoot and click the first link.
Tutorial Question
Topic 1
Questions 9 & 10.
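Formula Reference
Mean
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$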
If r is not an integer
Interquartile Range = Q3 – Q1
Summary