3-Lect - Finding The Center of Data Set. Mean, Median, Mode
FIVE CHARACTERISTICS
1. CENTER
2. VARIATION
3. DISTRIBUTION
4. OUTLIER
5. CHANGES OVER TIME
• CENTER (THE MIDDLE OF THE DATA SET)
1- MEAN
$\Sigma$ = SUM
$x$ = DATA VALUE
$n$ = NUMBER OF VALUES/ITEMS (IN A SAMPLE)
$N$ = NUMBER OF VALUES/ITEMS (IN A POPULATION)
$\bar{x}$ = SAMPLE MEAN
$\mu$ = POPULATION MEAN
POPULATIONS AND SAMPLES USE DIFFERENT LETTERS. THE MATH IS THE SAME; ONLY THE NAMES DIFFER.
PARAMETERS (POPULATION) AND STATISTICS (SAMPLE) USE DIFFERENT LETTERS, BUT THEY ARE COMPUTED EXACTLY THE SAME WAY.
$\bar{x} = \frac{\sum x}{n}$ (SAMPLE)
$\mu = \frac{\sum x}{N}$ (POPULATION)
SAMPLE DATA
QUESTION:
• EVERY MONTH YOU COUNT THE LOOSE CHANGE IN YOUR CAR. HERE IS WHAT YOU GOT, IN DOLLARS:
5.40 1.10 0.42 0.73 0.48 1.10
ANSWER
WE WILL USE THE SAMPLE FORMULA, SO
$\bar{x} = \frac{\sum x}{n}$
$\bar{x} = \frac{5.40 + 1.10 + 0.42 + 0.73 + 0.48 + 1.10}{6}$
$\bar{x} = \frac{9.23}{6}$
$\bar{x} \approx 1.54$
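The calculation above can be checked with a few lines of Python. This is only a minimal sketch; the names `change` and `sample_mean` are made up for illustration.

```python
# Sample mean: add up every value, then divide by how many values there are.
change = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]  # data from the example above

def sample_mean(values):
    """Return sum(x) / n for a non-empty list of numbers."""
    return sum(values) / len(values)

x_bar = sample_mean(change)
print(round(x_bar, 2))   # 1.54
```

Rounding happens only when printing, so the unrounded value is still available for later steps.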
2- MEDIAN
• THE MIDDLE VALUE OF THE DATA SET
• MUST BE IN ORDER
• FIND THE MIDDLE VALUE:
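• IF THERE IS AN ODD NUMBER OF VALUES, THE MEDIAN IS THE MIDDLE NUMBER
• IF THERE IS AN EVEN NUMBER OF VALUES, THE MEDIAN IS THE MEAN OF THE TWO MIDDLE NUMBERS
EXP: 1 , 3 , 4 , 5 , 6 , 7 → $M = \frac{4+5}{2} = 4.5$
EXP: 8 , 3 , 5 , 11 , 13 , 4 , 6 → NOT IN ORDER, SO THE MIDDLE ENTRY (11) IS NOT THE MEDIAN
• IN ORDER: 3 , 4 , 5 , 6 , 8 , 11 , 13 → M=6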
EXP: 3 , 4 , 5 , 6 , 8 , 11 , 13 , 412 → $M = \frac{6+8}{2} = 7$
EXP: 0.42 , 0.48 , 0.73 , 1.10 , 1.10 , 5.40 → $M = \frac{0.73+1.10}{2} = 0.915$
NOTE: REPORT THE RESULT WITH ONE MORE DECIMAL PLACE THAN THE DATA, AND NEVER ROUND UNTIL THE FINAL RESULT.
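The median steps (sort, then pick the middle) can be sketched in Python as follows. The function name `median` is chosen for illustration; Python's standard library also offers `statistics.median`, which does the same job.

```python
def median(values):
    """Sort the data, then take the middle value (odd n)
    or the mean of the two middle values (even n)."""
    ordered = sorted(values)          # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count

print(median([8, 3, 5, 11, 13, 4, 6]))         # 6
print(median([3, 4, 5, 6, 8, 11, 13, 412]))    # 7.0
print(median([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))
```

The last call gives the median of the car-change data, 0.915, up to floating-point rounding.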
OUTLIER FOR MEAN AND MEDIAN
• IF WE ADD $310.05 TO THE DATA FROM THE MEAN EXAMPLE, THE MEAN CHANGES A LOT (FROM 1.54 TO 319.28/7 ≈ 45.61). THE MEDIAN BARELY MOVES (FROM 0.915 TO 1.10), BECAUSE THE NEW VALUE JUST BECOMES THE LAST NUMBER IN THE ORDERED LIST.
• 310.05 IS AN OUTLIER BECAUSE IT IS NOWHERE NEAR THE MAJORITY OF THE DATA.
• THE MEAN IS STRONGLY AFFECTED BY AN OUTLIER BECAUSE EVERY DATA VALUE IS ADDED INTO IT; THE MEDIAN IS NOT, BECAUSE IT DEPENDS ONLY ON THE ORDER OF THE VALUES.
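To see how differently the mean and the median react to an outlier, the comparison can be run with Python's built-in `statistics` module. This is a small sketch using the car-change data plus the 310.05 value; the variable names are illustrative.

```python
import statistics

change = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
with_outlier = change + [310.05]     # the outlier from the bullets above

for label, data in (("original", change), ("with outlier", with_outlier)):
    print(label,
          "mean =", round(statistics.mean(data), 2),
          "median =", round(statistics.median(data), 3))
```

The mean jumps from about 1.54 to about 45.61, while the median only moves from 0.915 to 1.1.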
3- MODE
• THE MOST FREQUENTLY OCCURRING DATA VALUE
EXP: 5.40 , 1.10 , 0.42 , 0.73 , 0.48 , 1.10
• MODE=1.10
EXP: 27 , 27 , 27 , 55 , 55 , 55 , 88 , 88 , 87
• MODES: 27 , 55
EXP: 1 , 2 , 4 , 7 , 9 , 10 , 12
• MODE=Ø (NO VALUE REPEATS, SO THERE IS NO MODE)
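The three mode examples can be reproduced with a short counting function. This is a sketch under one assumption taken from the slides: when no value repeats, the data set has no mode, so the function returns an empty list. The name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count.
    If no value repeats, return an empty list (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                      # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))     # [1.1]
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 87]))     # [27, 55]
print(modes([1, 2, 4, 7, 9, 10, 12]))                  # []
```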
MEAN OF THE FREQUENCY DISTRIBUTION
Age (Class)   f    x (Midpoint)   f·x
21-30         28   25.5           714
31-40         30   35.5           1065
41-50         12   45.5           546
51-60         2    55.5           111
61-70         2    65.5           131
71-80         2    75.5           151
$n = \sum f = 76$
$\sum f \cdot x = 2718$
$\bar{x} = \frac{\sum f \cdot x}{n} = \frac{2718}{76} \approx 35.76$
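The table arithmetic can be sketched in Python. Each class is stored as (lower limit, upper limit, frequency), and the midpoint is computed as the average of the two limits; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each age class in the table
classes = [(21, 30, 28), (31, 40, 30), (41, 50, 12),
           (51, 60, 2), (61, 70, 2), (71, 80, 2)]

n = sum(f for _, _, f in classes)                       # total frequency
# The class midpoint stands in for every value in that class.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
mean = sum_fx / n

print(n, sum_fx, round(mean, 2))   # 76 2718.0 35.76
```

Because midpoints replace the actual ages, the result is an estimate of the mean rather than the exact mean of the raw data.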
WEIGHTED MEAN
FINDING THE MEAN OF THE WEIGHTED DISTRIBUTION
Let's say you scored 185 out of 220 on homework: $\frac{185}{220} \times 100 \approx 0.84 \times 100 = 84$
It is similar to the idea of a frequency distribution.
      w      x (Points)   x·w
HW    15%    70           10.5
T1    20%    90           18.0
T2    20%    68           13.6
T3    20%    85           17.0
F     25%    95           23.75
$\sum w = 100\%$
$\sum x \cdot w = 82.85$
$\bar{x} = \frac{\sum x \cdot w}{\sum w} = \frac{82.85}{100\%} = 82.85$
FINDING THE MEAN OF THE WEIGHTED DISTRIBUTION
Another example (only HW, T1, and T2 are filled in)
      w      x (Points)   x·w
HW    15%    70           10.5
T1    20%    90           18.0
T2    20%    68           13.6
T3
F
$\sum w = 55\%$
$\sum x \cdot w = 42.1$
$\bar{x} = \frac{\sum x \cdot w}{\sum w} = \frac{42.1}{55\%} \approx 76.5\%$
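Both weighted-mean tables follow the same recipe, which can be sketched as one Python function. Weights are written as decimals (15% as 0.15); the name `weighted_mean` is illustrative.

```python
def weighted_mean(scores, weights):
    """Return sum(x * w) / sum(w). Weights need not add up to 1."""
    total_w = sum(weights)
    return sum(x * w for x, w in zip(scores, weights)) / total_w

# Full table: HW, T1, T2, T3, F
full = weighted_mean([70, 90, 68, 85, 95], [0.15, 0.20, 0.20, 0.20, 0.25])
# Partial table: only HW, T1, T2 so far (weights sum to 0.55)
partial = weighted_mean([70, 90, 68], [0.15, 0.20, 0.20])

print(round(full, 2), round(partial, 1))   # 82.85 76.5
```

Dividing by the sum of the weights is what makes the partial case work: with only 55% of the course weight available, the result is the grade on the work counted so far.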
NORMAL DISTRIBUTION , SKEWED RIGHT , SKEWED LEFT
• THE OUTLIERS SKEW YOUR DATA.
WHAT IS THE MOST COMMON WAY TO DESCRIBE THE CENTER OF THE DATA?
• THE MEDIAN
• THE MODE
• THE MEAN
VARIATION
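• HOW THE DATA IS SPREAD OUT
• EXAMPLE: THREE BANK LINES (PERSON 1, PERSON 2, PERSON 3)
• SAMPLE STANDARD DEVIATION: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
• $x$: DATA VALUE
• $\bar{x}$: MEAN
• WE SQUARE EACH DIFFERENCE SO THAT IT IS ALWAYS POSITIVE. (IF WE ADDED THE POSITIVE AND NEGATIVE DIFFERENCES AS THEY ARE, THEY WOULD CANCEL OUT TO ZERO.) EACH DIFFERENCE MEASURES THE DISTANCE OF A VALUE FROM THE MEAN.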
THERE IS ANOTHER FORMULA THAT CAN BE USED INSTEAD OF THE PREVIOUS ONE
• $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$ → THIS GIVES THE SAME RESULT, BUT WE DO NOT NEED TO CALCULATE THE MEAN, SO IT IS EASIER
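• EXP:
• FIND THE STANDARD DEVIATION OF THE VALUES 1, 3, 14
• ANSWER-1:
• $\bar{x} = \frac{1 + 3 + 14}{3} = 6$
$x$    $x - \bar{x}$    $(x - \bar{x})^2$
1      1 - 6 = -5       25
3      3 - 6 = -3       9
14     14 - 6 = 8       64
$\sum (x - \bar{x})^2 = 98$
• $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{98}{3 - 1}} = \sqrt{49} = 7$
• $s = 7$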
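EXP:
FIND THE STANDARD DEVIATION OF THE VALUES 1, 3, 14
• ANSWER-2:
$x$    $x^2$
1      1
3      9
14     196
$\sum x = 18$, $\sum x^2 = 206$
• $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$
• $s = \sqrt{\frac{3 \cdot 206 - 18^2}{3(3 - 1)}}$
• $s = \sqrt{\frac{618 - 324}{6}}$
• $s = \sqrt{\frac{294}{6}}$
• $s = \sqrt{49}$
• $s = 7$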
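Both sample standard deviation formulas can be written out in Python to confirm that they agree. This is a minimal sketch; the function names `sd_definition` and `sd_shortcut` are made up for illustration.

```python
import math

def sd_definition(values):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def sd_shortcut(values):
    """s = sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )"""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(sd_definition([1, 3, 14]), sd_shortcut([1, 3, 14]))   # 7.0 7.0
```

The shortcut is convenient by hand, but with very large floating-point values it can lose precision, so the first form is usually the safer choice in code.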
EXP:
FIND THE STANDARD DEVIATION OF THE VALUES 4, 7, 7
EITHER FORMULA CAN BE USED:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ OR $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$
ANSWER (SECOND FORMULA):
$x$    $x^2$
4      16
7      49
7      49
$\sum x = 18$, $\sum x^2 = 114$
$s = \sqrt{\frac{3 \cdot 114 - 18^2}{3 \cdot 2}}$
$s = \sqrt{\frac{18}{6}}$
$s = \sqrt{3}$
$s \approx 1.73$
STANDARD DEVIATION ($\sigma$) FOR A POPULATION
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
• WHEN YOU CALCULATE THE STANDARD DEVIATION, YOU HAVE AUTOMATICALLY CALCULATED THE VARIANCE.
• THE VARIANCE IS THE NUMBER YOU HAVE BEFORE YOU TAKE THE SQUARE ROOT. SEE THE PREVIOUS EXAMPLES:
• $s = \sqrt{3}$ → VARIANCE = 3
• $s = \sqrt{49}$ → VARIANCE = 49
• SO IF WE HAVE THE STANDARD DEVIATION, WE CAN EASILY GET THE VARIANCE BY SQUARING IT.
• SAMPLE VARIANCE: $s^2$
• POPULATION VARIANCE: $\sigma^2$
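Python's built-in `statistics` module has separate functions for the sample and population versions, which makes the difference between dividing by n - 1 and dividing by N easy to see. The sketch below reuses the values 4, 7, 7 from the earlier example.

```python
import statistics

data = [4, 7, 7]

s2 = statistics.variance(data)       # sample variance, divides by n - 1
s = statistics.stdev(data)           # sample standard deviation = sqrt(s2)
sigma2 = statistics.pvariance(data)  # population variance, divides by N
sigma = statistics.pstdev(data)      # population standard deviation

print(float(s2), round(s, 2))          # 3.0 1.73
print(float(sigma2), round(sigma, 2))  # 2.0 1.41
```

Treating the same three numbers as a whole population gives a smaller variance (2 instead of 3) because the sum of squared differences is divided by N = 3 rather than n - 1 = 2.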
STANDARD DEVIATION AND HOW SPREAD OUT THE DATA IS
• DOES A LARGE STANDARD DEVIATION MEAN THAT THE DATA IS REALLY SPREAD OUT? YES.
• IT MEANS THE AVERAGE DISTANCE FROM THE MEAN IS LARGE.
• THE LARGER THE STANDARD DEVIATION, THE MORE SPREAD OUT THE DATA IS, RATHER THAN CLOSE TOGETHER.
• DATA THAT IS CLOSE TOGETHER HAS A SMALL STANDARD DEVIATION, AND DATA THAT IS SPREAD APART HAS A LARGE STANDARD DEVIATION.
EMPIRICAL RULE (FOR NORMALLY DISTRIBUTED DATA):
• ABOUT 68% OF THE DATA SET FALLS WITHIN 1 STANDARD DEVIATION OF THE MEAN
• IF I TAKE THE AVERAGE HEIGHT OF THIS CLASS AND GO ONE STANDARD DEVIATION ABOVE AND BELOW THAT MEAN, ABOUT 68% OF YOU WILL FALL IN THAT RANGE
• ABOUT 95% OF THE DATA SET FALLS WITHIN 2 STANDARD DEVIATIONS OF THE MEAN
• ABOUT 99.7% OF THE DATA SET FALLS WITHIN 3 STANDARD DEVIATIONS OF THE MEAN
• IF A DATA VALUE LIES WITHIN 2 STANDARD DEVIATIONS OF THE MEAN, IT IS CONSIDERED A USUAL VALUE
• IF A DATA VALUE LIES MORE THAN 2 STANDARD DEVIATIONS FROM THE MEAN, IT IS CONSIDERED AN UNUSUAL VALUE
• CAN WE EVER COVER 100% OF OUR DATA IF WE KEEP GOING? IN PRACTICE WE MIGHT BE ABLE TO, BUT IN THEORY WE CAN'T: ACROSS THE ENTIRE POPULATION THERE IS ALWAYS A CHANCE OF GETTING SOMETHING HIGHER OR LOWER THAN YOU COULD HAVE IMAGINED.
• SOMETIMES YOU CAN NEVER REACH THE END OF THE DATA
• IF A DATA VALUE LIES MORE THAN 3 STANDARD DEVIATIONS FROM THE MEAN, IT IS EXTREMELY RARE
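The usual/unusual rule can be sketched as a small Python function. The mean of 50 and standard deviation of 10 are hypothetical numbers chosen only to exercise each branch, and treating a value exactly 2 (or 3) standard deviations away as still inside the range is an assumption.

```python
def classify(value, mean, sd):
    """Label a value by how many standard deviations it is from the mean."""
    distance = abs(value - mean) / sd     # number of standard deviations
    if distance <= 2:
        return "usual"
    if distance <= 3:
        return "unusual"
    return "unusual (extremely rare)"

# Hypothetical data set with mean 50 and standard deviation 10
for v in (62, 75, 85):
    print(v, classify(v, mean=50, sd=10))
```

Here 62 is 1.2 standard deviations from the mean (usual), 75 is 2.5 (unusual), and 85 is 3.5 (extremely rare).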
EXAMPLE
• SAMPLE: HEIGHTS ARE NORMALLY DISTRIBUTED WITH A MEAN OF 65" AND A STANDARD DEVIATION OF 3".
ANSWER:
• WE HAVE $\bar{x} = 65$
• WHERE IS THE S.D. ON THE GRAPH?
• SINCE THE S.D. = 3, WE ADD IT TO THE MEAN TO GET THE FIRST STANDARD DEVIATION
• ONE S.D. TO THE RIGHT: $\bar{x} + s$ ➔ 65 + 3 = 68
[FIGURE: NORMAL CURVE WITH AXIS 56, 59, 62, 65, 68, 71, 74, MARKED AT -s/+s (68%) AND -2s/+2s (95%)]
EXAMPLE (CONTINUED)
• SAMPLE: HEIGHTS ARE NORMALLY DISTRIBUTED WITH A MEAN OF 65" AND A STANDARD DEVIATION OF 3".
ANSWER (CONTINUED):
• WE HAVE $\bar{x} = 65$ AND S.D. = 3
• NOW WE CALCULATE THE THIRD STANDARD DEVIATION
• THREE S.D. TO THE RIGHT: $\bar{x} + 3s$ ➔ 65 + 9 = 74
• THREE S.D. TO THE LEFT: $\bar{x} - 3s$ ➔ 65 - 9 = 56
[FIGURE: NORMAL CURVE WITH AXIS 56, 59, 62, 65, 68, 71, 74; 68% WITHIN ±s, 95% WITHIN ±2s, 99.7% WITHIN ±3s]
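The tick marks for the height example (mean 65, standard deviation 3) can be generated in Python by stepping 1, 2 and 3 standard deviations to each side of the mean. This is a sketch; `empirical_ranges` is a hypothetical helper name.

```python
def empirical_ranges(mean, sd):
    """Return {percent: (low, high)} for 1, 2 and 3 standard deviations."""
    percents = {1: 68, 2: 95, 3: 99.7}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in percents.items()}

for pct, (low, high) in empirical_ranges(mean=65, sd=3).items():
    print(f"about {pct}% of heights fall between {low} and {high} inches")
```

For this example the ranges come out as 62 to 68, 59 to 71, and 56 to 74 inches.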
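EXAMPLE
• THE MEAN IS 34 LBS AND THE STANDARD DEVIATION IS 8 LBS
• WHAT PERCENT OF THE DATA FALLS BETWEEN 10 LBS (POUNDS) AND 58 LBS (POUNDS)?
• DISTANCE FROM THE MEAN: 34 - 10 = 24 AND 58 - 34 = 24
• NUMBER OF STANDARD DEVIATIONS: $\frac{x - \bar{x}}{s} = \frac{24}{8} = 3$
• BOTH 10 AND 58 ARE 3 STD. DEV. FROM THE MEAN, SO ABOUT 99.7% OF THE DATA LIES BETWEEN THEM
[FIGURE: NUMBER LINE 10, 34, 58 WITH 3 STD. DEV. (24 LBS) ON EACH SIDE OF THE MEAN]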
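The same reasoning can be sketched in Python: convert each endpoint into a number of standard deviations, then look that number up in the empirical rule. The helper name `std_devs_from_mean` and the `rule` dictionary are illustrative, and the lookup only handles symmetric ranges of exactly 1, 2 or 3 standard deviations.

```python
def std_devs_from_mean(x, mean, sd):
    """Signed number of standard deviations between x and the mean."""
    return (x - mean) / sd

low = std_devs_from_mean(10, mean=34, sd=8)    # -3.0
high = std_devs_from_mean(58, mean=34, sd=8)   #  3.0

# Empirical-rule percentages for symmetric ranges of 1, 2 and 3 std. dev.
rule = {1: 68, 2: 95, 3: 99.7}
if low == -high and high in rule:
    print(f"about {rule[high]}% of the data falls between 10 and 58 lbs")
```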
COEFFICIENT OF VARIATION-EXAMPLE
Coefficient of variation
The coefficient of variation expresses the standard deviation as a percentage of the mean, because we can't compare values in different units directly.
          $\bar{x}$    $s$
height    65"          3"
weight    175 lbs      4 lbs
$C.V. = \frac{s}{\bar{x}} \times 100\%$
height: $C.V. = \frac{3}{65} \times 100\% \approx 4.6\%$ → this one varies more
weight: $C.V. = \frac{4}{175} \times 100\% \approx 2.3\%$
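The height and weight comparison can be checked with a one-line formula in Python. This is a minimal sketch; the function name `coefficient_of_variation` is chosen for illustration.

```python
def coefficient_of_variation(mean, sd):
    """Return the standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_height = coefficient_of_variation(mean=65, sd=3)     # inches
cv_weight = coefficient_of_variation(mean=175, sd=4)    # pounds

print(round(cv_height, 1), round(cv_weight, 1))         # 4.6 2.3
```

Because the units cancel in the division, the two percentages can be compared directly even though one set is in inches and the other in pounds.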