Central Tendency - Lecture Notes

The document provides an overview of central tendency in statistics, including definitions and calculations for mean, median, and mode. It also discusses weighted mean, mean of grouped data, and variability measures such as range, variance, and standard deviation. Additionally, it covers quartiles and interquartile range, including methods for identifying outliers.

Uploaded by Fahim Ansari
Copyright © All Rights Reserved

3. Data Distribution
Chapter 1 - Central Tendency

[Diagram: Statistics splits into Descriptive Statistics and Inferential Statistics; Central Tendency falls under Descriptive Statistics]

Central Tendency
● Part of descriptive statistics; the outcome is one value

Definition: a descriptive summary of a dataset through a single value that reflects the center of the data distribution.

Purpose: to tell us where the center of the distribution lies.


3. Data Distribution
Chapter 2 - Understanding Mean, Median, Mode
Central Tendency

Name    Age
John    33
Mark    28
Susan   25
Joe     46
…       …
Bob     29

● Average age of employees in the data distribution (data set) = ?
● Average = Mean
[Diagram: Statistics splits into Descriptive Statistics and Inferential Statistics; Descriptive Statistics → Central Tendency → Mean (= Average), Median, Mode]
Calculating Mean

Name    Age
John    33
Mark    28
Susan   25
Joe     46
Ema     32
Bob     29
Keith   42
Julia   21

Mean = (33 + 28 + 25 + 46 + 32 + 29 + 42 + 21) / 8 = 256 / 8 = 32
Calculating Mean

Day        Sales
Sunday     9500
Monday     100
Tuesday    50
Wednesday  150
Thursday   100
Friday     150
Saturday   100

Mean = (9500 + 100 + 50 + 150 + 100 + 150 + 100) / 7 = 10150 / 7 = 1450
● Sunday's 9500 is more than 10 times most of the other daily sales, so the mean (1450) is higher than the sales on every other day.
Mean
● Average value of a data series
○ e.g. data[100, 200, 50, 150], then mean is (100 + 200 + 50 + 150) / 4 = 500 / 4 = 125
● Outliers in a data set can make the mean misleading
○ e.g. data[9500, 100, 50, 350], then mean is (9500 + 100 + 50 + 350) / 4 = 10000 / 4 = 2500
○ in the example above, the mean is far higher than most of the values in the data set
● Formula
○ Mean = Σx / N, where Σx is the sum of all values and N is the number of values
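A minimal Python sketch of Mean = Σx / N, run on the examples above. The helper name `mean` is an illustrative choice, not something from the notes; Python's standard library also provides `statistics.mean`.

```python
def mean(data):
    """Arithmetic mean: sum of the values divided by how many there are (Σx / N)."""
    if not data:
        raise ValueError("mean requires at least one value")
    return sum(data) / len(data)


# Examples from the slides above
ages = [33, 28, 25, 46, 32, 29, 42, 21]
daily_sales = [9500, 100, 50, 150, 100, 150, 100]

print(mean(ages))                  # 32.0
print(mean([100, 200, 50, 150]))   # 125.0
print(mean([9500, 100, 50, 350]))  # 2500.0, pulled up by the 9500 outlier
print(mean(daily_sales))           # 1450.0
```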
Median
● Middle value in the data set
● The data must first be sorted from low to high
○ e.g. data[150, 50, 600, 200, 350]
○ sorted_data[50, 150, 200, 350, 600]
○ therefore, median value is 200
● Formula
○ {(n + 1) ÷ 2}th value of the sorted data
● If n is even, the median is calculated by averaging the two middle values
○ e.g. data[150, 50, 600, 200, 350, 100]
○ sorted_data[50, 100, 150, 200, 350, 600]
○ therefore, median value is (150 + 200) / 2 = 175
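The sort-then-pick-the-middle rule can be sketched in Python as below. The function name `median` is illustrative; `statistics.median` in the standard library gives the same results for these examples.

```python
def median(data):
    """Median: sort the data, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    if not data:
        raise ValueError("median requires at least one value")
    sorted_data = sorted(data)
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the {(n + 1) / 2}th value, which sits at zero-based index n // 2
        return sorted_data[mid]
    # Even n: average the two middle values
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


print(median([150, 50, 600, 200, 350]))       # 200
print(median([150, 50, 600, 200, 350, 100]))  # 175.0
```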
Mode
● The value that appears most frequently in the data set
○ e.g. data[100, 50, 200, 50, 150]
○ therefore, mode is 50 as it appears twice in the data set
● A data set can have more than one mode
○ e.g. data[100, 50, 200, 50, 150, 100]
○ therefore, the modes are 50 and 100
● Some data sets have no mode because no value repeats
○ e.g. data[50, 100, 150, 200, 250, 300]
● Sorting is not required, but it makes repeated values easier to spot
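A small sketch of the three mode cases using `collections.Counter`. The helper `modes` is hypothetical: Python's `statistics.multimode` returns every value when nothing repeats, so this version adds the "no repeating value means no mode" rule from the slide.

```python
from collections import Counter


def modes(data):
    """Return every value that appears most often, sorted ascending.

    Returns an empty list when no value repeats, matching the
    convention that such a data set has no mode.
    """
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no repeating value, so no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([100, 50, 200, 50, 150]))        # [50]
print(modes([100, 50, 200, 50, 150, 100]))   # [50, 100]
print(modes([50, 100, 150, 200, 250, 300]))  # []
```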
3. Data Distribution
Chapter 3 - Understanding Weighted Mean
and Mean of Grouped Data
[Diagram: Statistics splits into Descriptive Statistics and Inferential Statistics; Descriptive Statistics → Central Tendency → Mean (= Average), Median, Mode]
Weighted Mean

Weighted:
● some parts of the data are more important than others
● the weighted mean is calculated by giving each value its own weight

Weighted Mean - Example

Assessment Type Scores/Marks Weight in Percentage

Mid-term Exam 95 15%

Practical Project 85 35%

Final Exam 82 50%

● Year end score is computed based on:


○ Get 15% of overall score from mid-term exam
○ Get 35% of overall score from practical project
○ Get 50% of overall score from final exam
Weighted Mean - Calculation

Assessment Type     Score   Weight   Weight x Score
Mid-term Exam       95      0.15     14.25
Practical Project   85      0.35     29.75
Final Exam          82      0.50     41.00
Total                       1.00     85.00

Grade Point = 85.00 / 1.00 = 85

● To calculate the weighted mean:

$$WA = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

○ Numerator: sum of (weight of each value × the value)
○ Denominator: sum of the weights
○ w = weight of each value, x = data value
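A minimal Python sketch of WA = Σ wᵢxᵢ / Σ wᵢ, applied to the assessment example. The name `weighted_mean` is illustrative. The weights are given as percentages (15, 35, 50); because the formula divides by the sum of the weights, this gives the same result as 0.15, 0.35, 0.50 while keeping the arithmetic exact.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


scores = [95, 85, 82]   # mid-term exam, practical project, final exam
weights = [15, 35, 50]  # weights in percent; dividing by sum(weights) normalises them

print(weighted_mean(scores, weights))  # 85.0
```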
Mean of Grouped Data - Example

Frequency Distribution

Sales Group   No. of Days
0-2           11
3-5           8
6-8           5
9-11          3
12-14         1
15-17         2

Mean/Avg = 150/30 = 5
Mean of Grouped Data - Calculation

Sales Group   Midpoint (x)   No. of Days (frequency, f)   f·x
0-2           1              11                           11
3-5           4              8                            32
6-8           7              5                            35
9-11          10             3                            30
12-14         13             1                            13
15-17         16             2                            32
Total                        30                           153

$$GM = \frac{\sum_i f_i x_i}{\sum_i f_i}$$

f = frequency of each group, x = midpoint

Grouped Mean = 153 / 30 = 5.1
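The grouped-mean calculation can be sketched in Python as follows. The helper name `grouped_mean` and the `((low, high), frequency)` input layout are assumptions made for this illustration; the midpoint of each group is taken as (low + high) / 2, as in the table.

```python
def grouped_mean(groups):
    """Estimate the mean of grouped data: sum(f_i * x_i) / sum(f_i),
    where x_i is the midpoint of group i and f_i is its frequency."""
    total_f = 0
    total_fx = 0.0
    for (low, high), f in groups:
        midpoint = (low + high) / 2
        total_f += f
        total_fx += f * midpoint
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return total_fx / total_f


# (sales group bounds, number of days) from the frequency table above
sales_groups = [((0, 2), 11), ((3, 5), 8), ((6, 8), 5),
                ((9, 11), 3), ((12, 14), 1), ((15, 17), 2)]

print(grouped_mean(sales_groups))  # 5.1
```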
3. Data Distribution
Chapter 4 - Variability
[Diagram: Statistics splits into Descriptive Statistics and Inferential Statistics; Descriptive Statistics → Central Tendency (Mean, Median, Mode) and Variability (Range, Variance, Standard Deviation)]
3. Data Distribution
Chapter 5 - Understanding Range, Variance
and Standard Deviation
[Diagram: Statistics splits into Descriptive Statistics and Inferential Statistics; Descriptive Statistics → Central Tendency (Mean, Median, Mode) and Variability (Range, Variance, Standard Deviation)]

Range - Example & Calculation

● The difference between the largest number and the smallest number
○ e.g. data[100, 50, 200, 50, 150]
○ therefore, range is 200 - 50 = 150
● Like the mean, the range is strongly affected by outliers
○ e.g. data[100, 50, 9000, 50, 100]
○ therefore, range is 9000 - 50 = 8950
● Formula
○ Range = Largest number - Smallest number
● Sorting is not required, but it makes the largest and smallest values easy to see
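A one-line sketch of Range = Largest number - Smallest number. The function is named `data_range` (an illustrative name) to avoid shadowing Python's built-in `range`.

```python
def data_range(data):
    """Range = largest value - smallest value."""
    if not data:
        raise ValueError("range requires at least one value")
    return max(data) - min(data)


print(data_range([100, 50, 200, 50, 150]))   # 150
print(data_range([100, 50, 9000, 50, 100]))  # 8950, driven by the 9000 outlier
```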


Variance - Calculation

● Calculating Variance
○ Step 1 - calculate the mean value
○ Step 2 - subtract the mean value from each data point
○ Step 3 - square each of these differences
○ Step 4 - average the squared values (sum them and divide by n)


Variance - Example
● E.g. data[15, 17, 16, 14, 18, 16]

Step 1
mean = (15 + 17 + 16 + 14 + 18 + 16) / 6 = 96 / 6 = 16

Step 2 & 3
15 - 16 = -1 => (-1)² = 1
17 - 16 = 1 => (1)² = 1
16 - 16 = 0 => (0)² = 0
14 - 16 = -2 => (-2)² = 4
18 - 16 = 2 => (2)² = 4
16 - 16 = 0 => (0)² = 0

Step 4
1 + 1 + 0 + 4 + 4 + 0 = 10
n = 6, therefore VAR = 10 / 6 ≈ 1.67
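The four steps translate directly into Python; the sketch below mirrors them one line each. The name `population_variance` is illustrative. Dividing by n, as in Step 4, gives the population variance (Python's `statistics.pvariance`); `statistics.variance` instead computes the sample variance, which divides by n - 1.

```python
def population_variance(data):
    """Variance following the four steps above (divides by n)."""
    n = len(data)
    if n == 0:
        raise ValueError("variance requires at least one value")
    mean = sum(data) / n                    # Step 1: mean
    deviations = [x - mean for x in data]   # Step 2: subtract the mean
    squared = [d ** 2 for d in deviations]  # Step 3: square each difference
    return sum(squared) / n                 # Step 4: average the squares


data = [15, 17, 16, 14, 18, 16]
print(round(population_variance(data), 2))  # 1.67
```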
Variance - Interpretation

● A small variance means the data points lie close to the mean
○ E.g. data[15, 17, 16, 14, 18, 16]
○ Variance = 1.67, Mean = 16
○ Since the variance is small, the data points are not far from the mean
● A large variance means the data points are spread far from the mean
○ E.g. data[13, 3, 40, 12, 3, 25]
○ Variance = 170, Mean = 16
○ Since the variance is large, the data points are spread far from the mean
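To see the contrast between the two data sets above, this sketch uses the standard library's `statistics.mean` and `statistics.pvariance`; `pvariance` divides by n, matching the hand calculation. Both data sets have a mean of 16, while their variances come out at about 1.67 and 170.

```python
from statistics import mean, pvariance

tight = [15, 17, 16, 14, 18, 16]
spread = [13, 3, 40, 12, 3, 25]

for label, data in (("tight", tight), ("spread", spread)):
    # pvariance divides by n (population variance), as in the slides;
    # statistics.variance would divide by n - 1 (sample variance) instead.
    print(f"{label}: mean = {mean(data)}, variance = {pvariance(data):.2f}")
```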
Standard Deviation - Interpretation

● The standard deviation shows how far the data points typically deviate from the mean, in the same units as the data
● Formula
○ take the square root of the variance
● E.g. data[15, 17, 16, 14, 18, 16]
○ VAR = 1.67
○ SD = √1.67 ≈ 1.29
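A minimal sketch of SD = √VAR for the same data, computing the variance by dividing by n and then taking the square root with `math.sqrt`. The helper name `standard_deviation` is illustrative; the last line cross-checks it against the standard library's `statistics.pstdev`.

```python
import math
from statistics import pstdev


def standard_deviation(data):
    """Population standard deviation: square root of the variance (divides by n)."""
    n = len(data)
    if n == 0:
        raise ValueError("standard deviation requires at least one value")
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return math.sqrt(variance)


data = [15, 17, 16, 14, 18, 16]
print(round(standard_deviation(data), 2))                    # 1.29
print(math.isclose(standard_deviation(data), pstdev(data)))  # True
```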
3. Data Distribution
Chapter 6 - Understanding Quartiles and
Interquartile Range
Quartiles

● Quartiles divide the data set into four equal parts after it is arranged in ascending order
Quartiles
● First quartile, denoted as Q1
○ splits off the lowest 25% of data from the highest 75%
● Second quartile, denoted as Q2
○ cuts the data set in half; this is the median
● Third quartile, denoted as Q3
○ splits off the highest 25% of data from the lowest 75%
Quartiles - Calculating Q1, Q2, Q3

● Step 1: Arrange data in ascending order
● Step 2: Find the median value, i.e. Q2
● Step 3: Find the median value of the lower half of the data set, i.e. Q1
● Step 4: Find the median value of the upper half of the data set, i.e. Q3
○ when n is odd, leave the median itself out of both halves
Quartiles - Example
● e.g. 7, 18, 16, 10, 2, 5, 13, 11, 3
● Step 1: 2, 3, 5, 7, 10, 11, 13, 16, 18
● Step 2: Median = 10 = Q2
● Step 3: Lower half 2, 3, 5, 7 → Median Lower = (3 + 5) / 2 = 4 = Q1
● Step 4: Upper half 11, 13, 16, 18 → Median Upper = (13 + 16) / 2 = 14.5 = Q3
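The median-of-halves steps can be sketched in Python as below; the helper names `median` and `quartiles` are illustrative. This is only one quartile convention: library routines such as `statistics.quantiles` or NumPy's `percentile` use interpolation methods that may return slightly different quartiles for the same data.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Q1, Q2, Q3 via the median-of-halves method.

    When n is odd, the median itself is left out of both halves.
    """
    values = sorted(data)              # Step 1: sort ascending
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    q2 = median(values)                # Step 2: overall median
    lower = values[: n // 2]           # Step 3: lower half
    upper = values[(n + 1) // 2 :]     # Step 4: upper half
    return median(lower), q2, median(upper)


print(quartiles([7, 18, 16, 10, 2, 5, 13, 11, 3]))  # (4.0, 10, 14.5)
```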


Interquartile Range (IQR)

● Measures the spread of the middle half of the data set
○ IQR = Q3 - Q1
● Useful to spot outliers: a value is treated as an outlier if it is
○ more than Q3 + 1.5 × IQR, OR
○ less than Q1 - 1.5 × IQR
Finding Outliers - Example
● e.g. 11, 41, 44, 47, 51, 53, 57, 75 (already in ascending order)
● Q2 = (47 + 51) / 2 = 49
● Q1 = median of 11, 41, 44, 47 = (41 + 44) / 2 = 42.5
● Q3 = median of 51, 53, 57, 75 = (53 + 57) / 2 = 55
● IQR = 55 - 42.5 = 12.5
● Upper limit: Q3 + 1.5 IQR = 55 + (1.5 × 12.5) = 73.75
● Lower limit: Q1 - 1.5 IQR = 42.5 - (1.5 × 12.5) = 23.75
● Therefore, 75 (above 73.75) and 11 (below 23.75) are outliers
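A sketch of the 1.5 × IQR outlier rule, reusing the median-of-halves quartiles. The function `iqr_outliers` and its parameter `k` (the 1.5 multiplier) are illustrative names introduced for this example.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def iqr_outliers(data, k=1.5):
    """Return (lower limit, upper limit, outliers) using Q1 - k*IQR and Q3 + k*IQR."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = median(values[: n // 2])          # median of lower half
    q3 = median(values[(n + 1) // 2 :])    # median of upper half
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    outliers = [x for x in values if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, outliers


print(iqr_outliers([11, 41, 44, 47, 51, 53, 57, 75]))  # (23.75, 73.75, [11, 75])
```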
