Lecture 02 - Exploratory Data and Descriptive Statistics

Exploratory Data Analysis / Descriptive Statistics
• In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.
• Perform Exploratory Data Analysis (EDA) to understand the distribution of a variable and to check for anomalies and outliers.
Descriptive Statistics
Mode
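The mode is the value that occurs most often. A data set may have no mode, or it may have several modes.
[Figure: two dot plots; in the first, Mode = 9; in the second, No Mode]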
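A minimal Python sketch of the mode, including the "No Mode" case from the figure. The helper name `modes` and the data values are hypothetical, since the figure's exact data points are not recoverable. The standard library's `statistics.multimode` returns every value when nothing repeats, so this sketch adds an explicit check to report no mode in that case.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); return [] when no value repeats (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs exactly once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical data sets, chosen only to mirror the two panels
print(modes([1, 3, 5, 9, 9, 9, 12]))  # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))   # []
```

Returning a list also covers data sets with more than one mode, e.g. `modes([1, 1, 2, 2, 3])` gives `[1, 2]`.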
Probability & Statistics
QUARTILES
[Figure: ranked data divided at Q1, Q2 and Q3]
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
• Q2 is the same as the median (50% are smaller, 50% are larger).
• Only 25% of the observations are greater than the third quartile, Q3.
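Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
• First quartile position: $\frac{n+1}{4}$
• Second quartile position (the median): $\frac{n+1}{2}$
• Third quartile position: $\frac{3(n+1)}{4}$
and $n$ is the number of observed values.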
Quartiles
Example: Find the first quartile
Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so Q1 lies halfway between the 2nd value (12) and the 3rd value (13): Q1 = 12.5.
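The position rule can be sketched in Python as follows. `quartile` is a hypothetical helper; when the position is fractional it interpolates linearly between the two neighbouring ranked values, which gives the halfway value at a .5 position as in the example. Some textbooks instead round positions such as 2.25 or 2.75 to the nearest whole position, so results can differ slightly for other sample sizes.

```python
def quartile(data, k):
    """Quartile Q_k (k = 1, 2, 3) using the (n + 1) * k / 4 position rule.

    Fractional positions are interpolated linearly between neighbouring ranked values.
    """
    x = sorted(data)
    n = len(x)
    pos = (n + 1) * k / 4  # 1-based position in the ranked data
    lower = int(pos)       # whole-number part of the position
    frac = pos - lower     # fractional part of the position
    if lower < 1:          # position falls before the first value
        return x[0]
    if lower >= n:         # position falls on or after the last value
        return x[-1]
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(quartile(sample, 1))                         # 12.5 (position 2.5)
print([quartile(sample, k) for k in (1, 2, 3)])    # [12.5, 16.0, 19.5]
```

For this sample, `statistics.quantiles(sample, n=4)` (Python 3.8+, default `method='exclusive'`) should return the same three values, since it also uses the (n + 1) positions.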
EXERCISE
Consider the following stem-and-leaf display (key: 8 | 6 = 86, -2 | 2 = -22).
Find the Range, Median, Mode, Q1, Q3 and Interquartile Range.
-2 | 2
-1 | 20
-0 | 5320
0 | 01146688
1 | 3357
2 | 23346889999
3 | 056789
4 | 235799
5 | 48
6 | 38
7 |
8 | 6
EXERCISE: Solution (n = 47 observations)
Range = 86 - (-22) = 108
Median = ((47+1)/2)th value = 24th value = 26
Mode = 29 (it occurs four times)
Q1 = ((47+1)/4)th value = 12th value = 6
Q3 = (3(47+1)/4)th value = 36th value = 39
IQR = Q3 - Q1 = 39 - 6 = 33
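The hand calculation can be checked with a short Python sketch. The display is transcribed row by row; `decode` and `value_at` are hypothetical helpers, and the sketch assumes the key 8 | 6 = 86, so a leaf on the `-0` stem such as 5 stands for -5. `value_at` only handles whole-number positions, which is all that n = 47 requires here.

```python
from collections import Counter

# Stem-and-leaf display from the exercise: (stem, leaves) for each row.
DISPLAY = [
    ("-2", "2"), ("-1", "20"), ("-0", "5320"), ("0", "01146688"),
    ("1", "3357"), ("2", "23346889999"), ("3", "056789"),
    ("4", "235799"), ("5", "48"), ("6", "38"), ("7", ""), ("8", "6"),
]


def decode(display):
    """Convert rows to sorted values, assuming the key 8 | 6 = 86 and -2 | 2 = -22."""
    values = []
    for stem, leaves in display:
        sign = -1 if stem.startswith("-") else 1  # "-0" holds values from -9 to 0
        tens = abs(int(stem)) * 10
        values.extend(sign * (tens + int(leaf)) for leaf in leaves)
    return sorted(values)


def value_at(ranked, position):
    """Value at a whole-number, 1-based position in the ranked data."""
    if position != int(position):
        raise ValueError("this sketch only handles whole-number positions")
    return ranked[int(position) - 1]


data = decode(DISPLAY)
n = len(data)
data_range = data[-1] - data[0]
median = value_at(data, (n + 1) / 2)
mode = Counter(data).most_common(1)[0][0]  # single most frequent value
q1 = value_at(data, (n + 1) / 4)
q3 = value_at(data, 3 * (n + 1) / 4)
print(n, data_range, median, mode, q1, q3, q3 - q1)  # 47 108 26 29 6 39 33
```

`Counter.most_common(1)` reports only one value, which is enough here because 29 is the only value that occurs four times.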
Measures of Variation
• Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center but different variation]
Range
• Simplest measure of variation
• Difference between the largest and the smallest observations:
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
Example:
[Figure: dot plot of observations on an axis from 0 to 14]
Range = 14 - 1 = 13
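Computed directly, the range needs only the largest and smallest values. The helper name `data_range` and the data below are hypothetical, chosen so that the smallest and largest observations match the example (1 and 14).

```python
def data_range(values):
    """Range = X_largest - X_smallest."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)


example = [1, 4, 6, 9, 9, 12, 14]  # hypothetical observations
print(data_range(example))  # 13
```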
Saturday, January 2, 2016
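Interquartile Range
• The interquartile range, IQR = Q3 - Q1, measures the spread of the middle 50% of the data.
• Because it ignores the lowest 25% and the highest 25% of the observations, the IQR can eliminate some outlier problems that affect the range.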
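A small Python sketch of why the IQR resists outliers: replacing one value with an extreme one changes the range a lot but leaves the IQR unchanged. The outlier (220 instead of 22) is a hypothetical data-entry error. The sketch uses the standard library's `statistics.quantiles` (Python 3.8+), whose default `method='exclusive'` follows the (n + 1) quartile positions used in this lecture.

```python
import statistics


def data_range(values):
    """Range = largest - smallest observation."""
    return max(values) - min(values)


def iqr(values):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (method='exclusive')."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


clean = [11, 12, 13, 16, 16, 17, 18, 21, 22]
with_outlier = [11, 12, 13, 16, 16, 17, 18, 21, 220]  # hypothetical: 22 mistyped as 220

for label, values in [("clean", clean), ("with outlier", with_outlier)]:
    print(label, data_range(values), iqr(values))
# clean 11 7.0
# with outlier 209 7.0
```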
Variance
– Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
where
$\bar{X}$ = arithmetic mean
$n$ = sample size
$X_i$ = $i$th value of the variable $X$
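The sample variance formula translates almost term by term into Python. This is a minimal sketch; `sample_variance` and the data values are hypothetical. The standard library's `statistics.variance` also divides by n - 1 and should give the same result.

```python
def sample_variance(values):
    """S^2 = sum((X_i - X_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n                           # arithmetic mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)          # divide by n - 1, not n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5
print(sample_variance(data))     # 32 / 7, about 4.571
```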
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
– Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
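The sample standard deviation is the square root of the sample variance, which returns the result to the units of the data. A minimal sketch; `sample_std` and the measurements (assumed to be in cm) are hypothetical. `statistics.stdev` computes the same n - 1 quantity.

```python
import math


def sample_std(values):
    """S = sqrt(sum((X_i - X_bar)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two observations")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


lengths_cm = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample measured in cm
print(sample_std(lengths_cm))          # about 2.138, also in cm
```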
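Population Standard Deviation
Here we use the formula
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}}$$
where $\mu$ is the population mean and $n$ is the number of values in the population.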
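For a whole population the squared deviations are divided by n rather than n - 1. A minimal sketch; `population_std` and the eight values are hypothetical. The standard library draws the same distinction with `statistics.pstdev` (divide by n) and `statistics.stdev` (divide by n - 1).

```python
import math


def population_std(values):
    """sigma = sqrt(sum((X_i - mu)^2) / n): the data are the whole population."""
    n = len(values)
    if n == 0:
        raise ValueError("population standard deviation needs at least one value")
    mu = sum(values) / n  # population mean
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population of 8 values
print(population_std(population))      # 2.0
```

Treating the same eight values as a sample instead (dividing by n - 1) gives about 2.138, slightly larger than the population value.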
Shape of a Distribution
• Describes how data is distributed
• Measures of shape
  – Symmetric or skewed
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example:
[Figure: three box plots, each marked with Q1, Q2 and Q3]
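The Minimum -- Q1 -- Median -- Q3 -- Maximum summary can be computed and used for a rough reading of shape by comparing the distance from Q1 to the median with the distance from the median to Q3. This is a sketch of one quartile-based heuristic, not a formal skewness measure: it ignores the whiskers (minimum and maximum), and the `tol` tolerance is an arbitrary assumption. The helper names and data sets are hypothetical; quartiles come from `statistics.quantiles` (Python 3.8+), whose default `method='exclusive'` uses the (n + 1) positions.

```python
import statistics


def five_number_summary(values):
    """(Minimum, Q1, Median, Q3, Maximum) with quartiles at the (n + 1) positions."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default method='exclusive'
    return min(values), q1, q2, q3, max(values)


def skew_direction(values, tol=1e-9):
    """Rough heuristic: compare the spread below and above the median inside the box."""
    _, q1, q2, q3, _ = five_number_summary(values)
    lower, upper = q2 - q1, q3 - q2
    if abs(lower - upper) <= tol:
        return "symmetric"
    return "right-skewed" if upper > lower else "left-skewed"


# Hypothetical data sets
left = [1, 9, 14, 17, 18, 19, 19, 19, 20, 20, 21]
symmetric = [1, 2, 3, 4, 5, 6, 7]
right = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21]
for values in (left, symmetric, right):
    print(five_number_summary(values), skew_direction(values))
```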