
3-Lect - Finding The Center of Data Set. Mean, Median, Mode

The document discusses methods for finding the center of a data set, including mean, median, and mode, and explains their characteristics and calculations. It also covers measures of variation such as range and standard deviation, highlighting their importance in understanding data spread. Additionally, the document introduces the empirical rule for normally distributed data and the concept of outliers affecting mean and median calculations.

Uploaded by

belaoo4454

FINDING THE CENTER OF DATA SET.

MEAN, MEDIAN, MODE


M MAIKEY ZAKI BIA
DESCRIBING DATA

FIVE CHARACTERISTICS
1. CENTER
2. VARIATION
3. DISTRIBUTION
4. OUTLIER
5. CHANGES OVER TIME
• CENTER: THE MIDDLE OF THE DATA SET

• THE VALUE THAT THE REST OF THE DATA SURROUNDS


THREE WAYS

1- MEAN: ARITHMETIC AVERAGE


ADD ALL THE VALUES AND DIVIDE BY THE NUMBER OF VALUES ADDED

$\text{MEAN} = \frac{\sum x}{\#\text{ OF VALUES}}$

$\sum$ = SUM
$x$ = DATA VALUE
$n$ = # OF VALUES/ITEMS (IN A SAMPLE)
$N$ = # OF VALUES/ITEMS (IN A POPULATION)
$\bar{x}$ = SAMPLE MEAN
$\mu$ = POPULATION MEAN

POPULATIONS AND SAMPLES ARE WRITTEN WITH DIFFERENT LETTERS. (THE MATH IS THE SAME; ONLY THE NAMES DIFFER.)

PARAMETERS (POPULATION) AND STATISTICS (SAMPLE) USE DIFFERENT LETTERS, BUT THEY ARE CALCULATED EXACTLY THE SAME WAY.
$\bar{x} = \frac{\sum x}{n}$ (SAMPLE)

$\mu = \frac{\sum x}{N}$ (POPULATION)
SAMPLE DATA
QUESTION:
• EVERY MONTH YOU COUNT THE LOOSE CHANGE IN YOUR CAR. HERE IS WHAT YOU GOT, IN DOLLARS:
5.40 1.10 0.42 0.73 0.48 1.10

ANSWER
WE WILL USE SAMPLE DATA SO

$\bar{x} = \frac{\sum x}{n}$

$\bar{x} = \frac{5.40 + 1.10 + 0.42 + 0.73 + 0.48 + 1.10}{6}$

$\bar{x} = \frac{9.23}{6}$

$\bar{x} = 1.54$
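The same calculation can be sketched in a few lines of Python. This is only an illustration of the sample mean formula; the names `change` and `sample_mean` are made up for this sketch and are not part of the lecture.

```python
# Monthly loose change from the example, in dollars.
change = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]


def sample_mean(values):
    """Add all the values and divide by the number of values added."""
    return sum(values) / len(values)


mean_change = sample_mean(change)
print(round(mean_change, 2))  # rounded only at the very end
```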
2- MEDIAN
• THE MIDDLE VALUE OF THE DATA SET
• MUST BE IN ORDER
• FIND THE MIDDLE VALUE:
• IF THERE IS AN ODD NUMBER OF VALUES, THE MEDIAN IS THE MIDDLE NUMBER
• IF THERE IS AN EVEN NUMBER OF VALUES, THE MEDIAN IS THE MEAN OF THE TWO MIDDLE NUMBERS

EXP: 1, 3, 4, 5, 6, 7 → $M = \frac{4+5}{2}$, $M = 4.5$
EXP: 8, 3, 5, 11, 13, 4, 6 → NOT IN ORDER, SO THE MIDDLE ENTRY (11) IS NOT THE MEDIAN

• IN ORDER: 3, 4, 5, 6, 8, 11, 13 → $M = 6$

EXP: 3, 4, 5, 6, 8, 11, 13, 412 → $M = \frac{6+8}{2}$, $M = 7$

EXP: 0.42, 0.48, 0.73, 1.10, 1.10, 5.40 → $M = \frac{0.73+1.10}{2}$, $M = 0.915$

NOTE: REPORT THE RESULT WITH ONE MORE DECIMAL PLACE THAN THE DATA, AND NEVER ROUND UNTIL THE FINAL RESULT.
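The median steps (put the values in order, then take the middle value or the mean of the two middle values) could be written roughly as follows. The function name `median` is chosen for this sketch; it assumes a non-empty list of numbers.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if even)."""
    ordered = sorted(values)          # the data MUST be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle number
        return ordered[mid]
    # even count: mean of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([8, 3, 5, 11, 13, 4, 6]))         # sorted first, so not 11
print(median([3, 4, 5, 6, 8, 11, 13, 412]))
```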
OUTLIER FOR MEAN AND MEDIAN

• IF WE ADD $310.05 TO THE DATA FROM THE MEAN EXAMPLE, THE MEAN CHANGES A LOT (FROM 1.54
TO 45.61). HOWEVER, IF WE ADD IT TO THE MEDIAN EXAMPLE, THE MEDIAN HARDLY CHANGES (FROM
0.915 TO 1.10), BECAUSE THIS NUMBER IS FAR AWAY FROM THE OTHER NUMBERS AND ONLY ADDS ONE
MORE VALUE AT THE END OF THE ORDERED LIST.
• IT IS AN OUTLIER BECAUSE IT IS NOT EVEN CLOSE TO THE MAJORITY OF THE DATA.
• THE MEAN IS STRONGLY AFFECTED BY AN OUTLIER BECAUSE YOU ADD UP ALL THE DATA VALUES, BUT
THE MEDIAN IS NOT, BECAUSE IT DEPENDS ONLY ON THE ORDER OF THE VALUES.
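To see the effect of the outlier in numbers, here is a small sketch that uses Python's standard `statistics` module on the car-change data with and without the extra $310.05. The variable names are illustrative only.

```python
from statistics import mean, median

change = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
with_outlier = change + [310.05]   # add the outlier from the slide

# The mean uses every value, so one extreme value pulls it far away.
print(round(mean(change), 2), round(mean(with_outlier), 2))

# The median only looks at the middle of the ordered data, so it barely moves.
print(median(change), median(with_outlier))
```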
3- MODE
• THE MOST FREQUENTLY OCCURRING DATA VALUE
EXP: 5.40, 1.10, 0.42, 0.73, 0.48, 1.10
• MODE = 1.10
EXP: 27, 27, 27, 55, 55, 55, 88, 88, 87
• MODES: 27, 55
EXP: 1, 2, 4, 7, 9, 10, 12
• MODE = Ø (NO MODE, BECAUSE NO VALUE REPEATS)
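A possible way to find the mode(s) in Python is to count how often each value occurs. The helper `modes` below is a hypothetical name for this sketch; returning an empty list when no value repeats mirrors the "MODE = Ø" case on the slide.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                  # every value occurs once: no mode
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 87]))
print(modes([1, 2, 4, 7, 9, 10, 12]))
```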
MEAN OF THE FREQUENCY DISTRIBUTION
AGE CLASS    f     MIDPOINT x    f·x
21-30        28    25.5          714
31-40        30    35.5          1065
41-50        12    45.5          546
51-60        2     55.5          111
61-70        2     65.5          131
71-80        2     75.5          151

$n = \sum f = 76$
$\sum f \cdot x = 2718$

$\bar{x} = \frac{\sum f \cdot x}{n} = \frac{2718}{76} = 35.76$
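The frequency-distribution mean can be checked with a short sketch: each class is represented by its midpoint, multiplied by its frequency, and the total is divided by the number of values. The tuple layout `(low, high, f)` and the function name `grouped_mean` are assumptions made for this example.

```python
# (lower limit, upper limit, frequency) for each age class in the table.
classes = [
    (21, 30, 28),
    (31, 40, 30),
    (41, 50, 12),
    (51, 60, 2),
    (61, 70, 2),
    (71, 80, 2),
]


def grouped_mean(classes):
    """Mean of a frequency distribution: sum(f * midpoint) / sum(f)."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (low + high) / 2 for low, high, f in classes)
    return total / n


print(round(grouped_mean(classes), 2))
```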
WEIGHTED MEAN
FINDING THE MEAN OF THE WEIGHTED DISTRIBUTION
Let's say you scored 185 out of 220 on homework: $\frac{185}{220} \times 100 = 0.84 \times 100 = 84$
It is similar to the idea of the frequency distribution.

        W (WEIGHT)   X (POINTS)   X·W
HW      15%          70           10.5
T1      20%          90           18.0
T2      20%          68           13.6
T3      20%          85           17.0
F       25%          95           23.75

$\sum w = 100\%$
$\sum x \cdot w = 82.85$

$\bar{x} = \frac{\sum x \cdot w}{\sum w} = \frac{82.85}{100\%} = 82.85$
FINDING THE MEAN OF THE WEIGHTED DISTRIBUTION

Another example (T3 and F have no points yet)

        W (WEIGHT)   X (POINTS)   X·W
HW      15%          70           10.5
T1      20%          90           18.0
T2      20%          68           13.6
T3
F

$\sum w = 55\%$
$\sum x \cdot w = 42.1$

$\bar{x} = \frac{\sum x \cdot w}{\sum w} = \frac{42.1}{55\%} \approx 76.5\%$
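Both weighted-mean examples follow the same recipe: multiply each score by its weight, add the products, and divide by the total weight used so far. A minimal sketch, with weights written as decimals (15% as 0.15) and the name `weighted_mean` chosen for illustration:

```python
def weighted_mean(scores):
    """scores is a list of (weight, points) pairs; returns sum(x*w) / sum(w)."""
    total_weight = sum(w for w, _ in scores)
    return sum(w * x for w, x in scores) / total_weight


# (weight, points) for HW, T1, T2, T3, F from the first table.
full_course = [(0.15, 70), (0.20, 90), (0.20, 68), (0.20, 85), (0.25, 95)]

# Only HW, T1 and T2 are known in the second table (55% of the weight).
so_far = full_course[:3]

print(round(weighted_mean(full_course), 2))
print(round(weighted_mean(so_far), 1))
```

Dividing by the total weight is what makes the second example work even though the weights add up to only 55%.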
NORMAL DISTRIBUTION , SKEWED RIGHT , SKEWED LEFT
• THE OUTLIERS SKEW YOUR DATA.
WHAT IS THE MOST COMMON WAY TO DESCRIBE
THE CENTER OF THE DATA?
• THE MEDIAN
• THE MODE
• THE MEAN
VARIATION
• HOW THE DATA IS SPREAD
• THREE BANK LINES, THREE PERSONS WAITING IN EACH

LINE   PERSON 1   PERSON 2   PERSON 3   $\bar{x}$
#1     6          6          6          6
#2     4          7          7          6
#3     1          3          14         6

WAYS TO MEASURE VARIATION:

1. RANGE: MAX VALUE – MIN VALUE


• EASY TO FIND
• BUT DOES NOT CONSIDER ALL THE VALUES
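As a quick illustration, the range of each of the three bank lines can be computed like this. All three lines have the same mean of 6, yet their ranges differ. The dictionary `lines` and the function `value_range` are names invented for this sketch.

```python
# The three bank lines from the variation slide.
lines = {"#1": [6, 6, 6], "#2": [4, 7, 7], "#3": [1, 3, 14]}


def value_range(values):
    """Range = max value - min value (it ignores everything in between)."""
    return max(values) - min(values)


ranges = {name: value_range(values) for name, values in lines.items()}
print(ranges)
```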
2- STANDARD DEVIATIONS
• MOST USED MEASURE FOR VARIATION, MOST IMPORTANT AND USEFUL
• MEASURES THE AVERAGE DISTANCE YOUR DATA POINTS/VALUES ARE FROM THE MEAN.
• IT IS NEVER GOING TO BE NEGATIVE BECAUSE IT IS DISTANCE
• IT IS NEVER ZERO UNLESS ALL THE DATA ENTRIES ARE THE SAME.
• GREATLY AFFECTED BY OUTLIERS.
• SAMPLE STANDARD DEVIATION IS DENOTED: s

• $s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$
• $x$: DATA VALUE
• $\bar{x}$: MEAN
• WE SQUARE EACH DISTANCE SO THAT IT IS ALWAYS POSITIVE. (IF WE ADDED THE POSITIVE AND
NEGATIVE DISTANCES AS THEY ARE, THEY WOULD CANCEL AND THE SUM WOULD BE ZERO.)
THERE IS ANOTHER FORMULA THAT CAN BE USED INSTEAD OF THE
PREVIOUS ONE
• $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n-1)}}$ → HERE WE WILL GET THE SAME RESULT,
BUT WE WILL NOT NEED TO CALCULATE THE MEAN,
SO IT WILL BE EASIER

• EXP:
• FIND THE STANDARD DEVIATION OF VALUES 1, 3, 14
• ANSWER-1:
• $\bar{x} = 6$

$x$     $x - \bar{x}$     $(x - \bar{x})^2$
1       1 - 6 = -5        25
3       3 - 6 = -3        9
14      14 - 6 = 8        64

• $\sum (x-\bar{x})^2 = 98$
• $s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}} = \sqrt{\frac{98}{3-1}} = \sqrt{49} = 7$
• s = 7
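EXP:
FIND THE STANDARD DEVIATION OF VALUES 1, 3, 14
• ANSWER 2:

$x$     $x^2$
1       1
3       9
14      196

$\sum x = 18$     $\sum x^2 = 206$

• $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n-1)}}$
• $s = \sqrt{\frac{3 \cdot 206 - 18^2}{3(3-1)}}$
• $s = \sqrt{\frac{618 - 324}{6}}$
• $s = \sqrt{\frac{294}{6}}$
• $s = \sqrt{49}$
• s = 7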
EXP:
FIND THE STANDARD DEVIATION OF VALUES 4, 7, 7

$s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$ OR $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n-1)}}$

$x$
4
7
7

$\sum x =$          $\sum x^2 =$
EXP:
FIND THE STANDARD DEVIATION OF VALUES 4, 7, 7

$x$     $x^2$
4       16
7       49
7       49

$\sum x = 18$     $\sum x^2 = 114$

$s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n-1)}}$

$s = \sqrt{\frac{3 \cdot 114 - 18^2}{3 \cdot 2}}$

$s = \sqrt{\frac{18}{6}}$

$s = \sqrt{3}$
$s = 1.73$
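The two sample standard deviation formulas can be compared side by side in code. The sketch below implements both as written on the slides and applies them to the worked examples; the function names `sd_definition` and `sd_shortcut` are made up here.

```python
import math


def sd_definition(values):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def sd_shortcut(values):
    """s = sqrt( (n * sum(x^2) - (sum(x))^2) / (n * (n - 1)) ), no mean needed."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return math.sqrt((n * total_sq - total ** 2) / (n * (n - 1)))


for data in ([1, 3, 14], [4, 7, 7], [6, 6, 6]):
    print(data, round(sd_definition(data), 2), round(sd_shortcut(data), 2))
```

The last data set (6, 6, 6) shows the case where the standard deviation is zero because all entries are the same.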
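STANDARD DEVIATION ($\sigma$) FOR A POPULATION

$\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}}$

THE STANDARD DEVIATION FOR A POPULATION IS DIFFERENT FROM THE STANDARD
DEVIATION FOR A SAMPLE: THE POPULATION FORMULA DIVIDES BY $N$, WHILE THE
SAMPLE FORMULA DIVIDES BY $n - 1$.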
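Python's standard `statistics` module has both versions: `stdev` (sample, divides by n - 1) and `pstdev` (population, divides by N). A small sketch on the values 1, 3, 14, treating the same three numbers first as a sample and then as a whole population purely for comparison:

```python
from statistics import pstdev, stdev

data = [1, 3, 14]

sample_sd = stdev(data)        # sqrt(98 / (3 - 1))
population_sd = pstdev(data)   # sqrt(98 / 3)

print(sample_sd, round(population_sd, 2))
```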
VARIANCE
• THERE IS ALSO A SAMPLE VARIANCE AND A POPULATION VARIANCE.

• WHEN YOU CALCULATE THE STANDARD DEVIATION, YOU HAVE ALREADY CALCULATED THE
VARIANCE.

• THE VARIANCE IS THE NUMBER BEFORE YOU TAKE THE SQUARE ROOT; SEE THE PREVIOUS EXAMPLES
• $s = \sqrt{3}$ → VARIANCE = 3

• $s = \sqrt{49}$ → VARIANCE = 49
• SO IF WE HAVE THE STANDARD DEVIATION, WE CAN EASILY GET THE VARIANCE BY SQUARING IT.
• SAMPLE VARIANCE: $s^2$
• POPULATION VARIANCE: $\sigma^2$
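The relationship between variance and standard deviation can be checked with the standard `statistics` module, using the two worked examples. This is just a sanity check, not part of the original slides.

```python
from statistics import stdev, variance

for data in ([4, 7, 7], [1, 3, 14]):
    s = stdev(data)            # sample standard deviation
    s_squared = variance(data) # sample variance, the value under the square root
    print(data, s_squared, round(s, 2))
```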
HOW DOES STANDARD DEVIATION RELATE TO HOW SPREAD OUT
THE DATA IS?
• DOES A LARGE STANDARD DEVIATION MEAN THAT THE DATA IS REALLY SPREAD OUT? YES.
• A LARGE STANDARD DEVIATION MEANS THE AVERAGE DISTANCE FROM THE MEAN IS LARGE.
• THE LARGER THE STANDARD DEVIATION, THE MORE SPREAD OUT THE DATA IS, RATHER THAN
CLOSE TOGETHER.

• CLOSELY GROUPED DATA WILL HAVE A SMALL STANDARD DEVIATION


• SPREAD OUT DATA WILL HAVE A LARGE STANDARD DEVIATION
EMPIRICAL RULE
• WE TALKED ABOUT NORMAL DATA PREVIOUSLY. THE EMPIRICAL RULE WORKS ONLY WITH NORMAL DATA; IF THE DATA IS
SKEWED LEFT OR RIGHT, IT WILL NOT WORK.
• IF A DATA SET IS NORMALLY DISTRIBUTED, WE CAN USE THE EMPIRICAL RULE

EMPIRICAL RULE:
• 68% OF THE DATA SET WILL FALL WITHIN 1 STANDARD DEVIATION OF THE MEAN
• IF I TAKE YOUR AVERAGE HEIGHT AS A CLASS AND GO ONE STANDARD DEVIATION ABOVE AND BELOW THAT
AVERAGE/MEAN, 68% OF YOU WILL FALL IN THAT RANGE
• 95% OF THE DATA SET WILL FALL WITHIN 2 STANDARD DEVIATIONS OF THE MEAN
• 99.7% OF THE DATA SET WILL FALL WITHIN 3 STANDARD DEVIATIONS OF THE MEAN
• IF A DATA VALUE LIES WITHIN 2 STANDARD DEVIATIONS OF THE MEAN, IT IS CONSIDERED A USUAL VALUE
• IF A DATA VALUE LIES OUTSIDE 2 STANDARD DEVIATIONS OF THE MEAN, IT IS CONSIDERED AN UNUSUAL VALUE
• CAN WE EVER COVER 100% OF OUR DATA? IN PRACTICE, IF WE KEEP GOING, WE MIGHT BE ABLE TO, BUT IN THEORY WE
CAN'T: ACROSS AN ENTIRE POPULATION THERE IS ALWAYS A CHANCE OF GETTING SOMETHING HIGHER OR
LOWER THAN YOU COULD HAVE IMAGINED.
• SOMETIMES YOU CAN NEVER REACH THE END OF THE DATA
• IF A DATA VALUE LIES OUTSIDE 3 STANDARD DEVIATIONS FROM THE MEAN, IT IS EXTREMELY RARE
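The usual / unusual / extremely rare labels can be turned into a small helper that counts how many standard deviations a value lies from the mean. The function name `classify` is hypothetical, and treating a value exactly 2 or 3 standard deviations away as still "within" is an assumption of this sketch; the slides do not say how to handle the boundary. The mean of 65 and standard deviation of 3 are illustrative inputs.

```python
def classify(value, mean, sd):
    """Label a value by its distance from the mean, measured in standard deviations."""
    k = abs(value - mean) / sd
    if k <= 2:            # within 2 standard deviations (boundary included: assumption)
        return "usual"
    if k <= 3:            # outside 2 but within 3 standard deviations
        return "unusual"
    return "extremely rare"


for height in (68, 73, 80):
    print(height, classify(height, mean=65, sd=3))
```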
EXAMPLE
• SAMPLE: HEIGHTS ARE NORMALLY
DISTRIBUTED WITH A MEAN OF 65 INCHES AND A
STANDARD DEVIATION OF 3 INCHES.
ANSWER:

• WE HAVE $\bar{x} = 65$
• WHERE IS THE S.D. IN THE GRAPH?
• SINCE THE S.D. = 3, THE NEXT MARK AFTER THE MEAN IS 68
• TO GET FROM 65 TO 68, WE ADD ONE S.D. TO THE MEAN
WE CALCULATE THE FIRST STANDARD DEVIATION:
• ONE S.D. TO THE RIGHT
• $\bar{x}$ + S.D. ➔ 65 + 3 = 68
• ONE S.D. TO THE LEFT
• $\bar{x}$ - S.D. ➔ 65 - 3 = 62
• 68% OF THE DATA FALL BETWEEN 62 AND 68

[FIGURE: NORMAL CURVE, AXIS MARKED 56, 59, 62, 68, 71, 74 AT -3s, -2s, -s, +s, +2s, +3s; 68% AND 95% BANDS LABELED]


EXAMPLE
• SAMPLE: HEIGHTS ARE NORMALLY
DISTRIBUTED WITH A MEAN OF 65 INCHES AND A
STANDARD DEVIATION OF 3 INCHES.
ANSWER (CONTINUED, PART 1):
• WE HAVE $\bar{x} = 65$
• WHERE IS THE S.D. IN THE GRAPH?
• SINCE THE S.D. = 3

WE CALCULATE THE SECOND STANDARD DEVIATION:

• TWO S.D. TO THE RIGHT
• $\bar{x}$ + 2*S.D. ➔ 65 + 6 = 71
• TWO S.D. TO THE LEFT
• $\bar{x}$ - 2*S.D. ➔ 65 - 6 = 59
• 95% OF THE DATA FALL BETWEEN 59 AND 71

[FIGURE: NORMAL CURVE, AXIS MARKED 56, 59, 62, 68, 71, 74 AT -3s, -2s, -s, +s, +2s, +3s; 68% AND 95% BANDS LABELED]
EXAMPLE
• SAMPLE: HEIGHTS ARE NORMALLY DISTRIBUTED
WITH A MEAN OF 65 INCHES AND A STANDARD
DEVIATION OF 3 INCHES.
ANSWER (CONTINUED, PART 2):
• WE HAVE $\bar{x} = 65$
• WHERE IS THE S.D. IN THE GRAPH?
• SINCE THE S.D. = 3
WE CALCULATE THE THIRD STANDARD
DEVIATION:
• THREE S.D. TO THE RIGHT
• $\bar{x}$ + 3*S.D. ➔ 65 + 9 = 74
• THREE S.D. TO THE LEFT
• $\bar{x}$ - 3*S.D. ➔ 65 - 9 = 56
• 99.7% OF THE DATA FALL BETWEEN 56 AND 74

[FIGURE: NORMAL CURVE, AXIS MARKED 56, 59, 62, 68, 71, 74 AT -3s, -2s, -s, +s, +2s, +3s; 68%, 95% AND 99.7% BANDS LABELED]
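The three steps of the height example (mean plus and minus one, two and three standard deviations) can be generated in one go. The helper name `empirical_intervals` is an assumption of this sketch.

```python
def empirical_intervals(mean, sd):
    """Return {k: (mean - k*sd, mean + k*sd)} for k = 1, 2, 3 standard deviations."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


intervals = empirical_intervals(mean=65, sd=3)
for k, share in zip((1, 2, 3), ("68%", "95%", "99.7%")):
    low, high = intervals[k]
    print(f"{share} of heights fall between {low} and {high} inches")
```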


EXAMPLE
• MEAN IS 34 LBS, STANDARD DEVIATION IS 8 LBS
• WHAT PERCENT OF THE DATA FALL BETWEEN 10 LBS (POUNDS) AND 58 LBS (POUNDS)?

[FIGURE: NUMBER LINE MARKED 10, 34, 58]
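EXAMPLE
• MEAN IS 34 LBS, STANDARD DEVIATION IS 8 LBS
• WHAT PERCENT OF THE DATA FALL BETWEEN 10 LBS (POUNDS) AND 58 LBS (POUNDS)?

• DISTANCE FROM THE MEAN ON EACH SIDE: 34 - 10 = 24 AND 58 - 34 = 24
• # OF STANDARD DEVIATIONS FROM THE MEAN: $\frac{x - \bar{x}}{s} = \frac{24}{8} = 3$
• BOTH VALUES ARE 3 STD. DEV. FROM THE MEAN, SO 99.7% OF THE DATA LIES BETWEEN THEM

[FIGURE: NUMBER LINE MARKED 10, 34, 58, WITH A DISTANCE OF 24 (3 STD. DEV.) ON EACH SIDE OF THE MEAN]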
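A rough sketch of the same reasoning in code: find how many standard deviations each endpoint lies from the mean, then look up the empirical-rule percentage. The function `percent_between` is a made-up helper that only handles intervals that are symmetric about the mean and exactly 1, 2 or 3 standard deviations wide on each side; anything else raises an error.

```python
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}   # percent within k standard deviations


def percent_between(low, high, mean, sd):
    """Empirical-rule percentage for a symmetric interval around the mean."""
    k_low = (mean - low) / sd      # standard deviations below the mean
    k_high = (high - mean) / sd    # standard deviations above the mean
    if k_low != k_high or k_low not in (1, 2, 3):
        raise ValueError("interval is not 1, 2 or 3 standard deviations on both sides")
    return EMPIRICAL_RULE[int(k_low)]


print(percent_between(10, 58, mean=34, sd=8))
```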
COEFFICIENT OF VARIATION-EXAMPLE
Coefficient of variation

Can we say that weight is more spread out than height? We can't compare the two standard
deviations directly, because one is in inches and the other is in pounds.

The coefficient of variation expresses the standard deviation as a percentage of its mean, so
data measured in different units can be compared.

          $\bar{x}$     $s$
height    65 in         3 in
weight    175 lbs       4 lbs

$C.V. = \frac{s}{\bar{x}} \times 100\%$

Height: $C.V. = \frac{3}{65} \times 100\% = 4.6\%$ → this one varies more

Weight: $C.V. = \frac{4}{175} \times 100\% = 2.3\%$
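The coefficient of variation is a one-line calculation, shown here for the height and weight figures as a quick check. The function name `coefficient_of_variation` is chosen for this sketch.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


cv_height = coefficient_of_variation(sd=3, mean=65)    # inches
cv_weight = coefficient_of_variation(sd=4, mean=175)   # pounds

print(round(cv_height, 1), round(cv_weight, 1))
print("height varies more" if cv_height > cv_weight else "weight varies more")
```

Because the units cancel in the division, the two percentages can be compared even though one data set is in inches and the other in pounds.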

There is another way to compare, called the z-score (we will talk about it later).
