Class 2 Exploratory Data Analysis
[Figure: Histogram of MR Variable. x-axis: MR, binned as <= 10, (10,20], (20,30], (30,40], > 40; y-axis: No of obs]
ADVANTAGES
1. Provides powerful insight into how the data are distributed.
2. A normal curve can be superimposed to see how closely the data follow a normal distribution.
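As a rough illustration of how the bars of such a histogram are obtained, the sketch below counts observations in the same right-closed intervals used on the slides (<= 10, (10,20], ..., > 40). The function names and the sample values are assumptions made for this example; the sample is NOT the MR data from EDA.STA.

```python
from collections import Counter

# Bin labels in display order, matching the intervals shown on the histogram axis.
BIN_ORDER = ["<= 10", "(10,20]", "(20,30]", "(30,40]", "> 40"]


def bin_label(x):
    """Return the label of the right-closed interval that contains x."""
    if x <= 10:
        return "<= 10"
    if x <= 20:
        return "(10,20]"
    if x <= 30:
        return "(20,30]"
    if x <= 40:
        return "(30,40]"
    return "> 40"


def histogram_counts(values):
    """Return (label, count) pairs for every bin, including empty ones."""
    counts = Counter(bin_label(v) for v in values)
    return [(label, counts.get(label, 0)) for label in BIN_ORDER]


# Made-up values for illustration only (not the course data set).
sample = [8, 12, 20, 21, 25, 30, 33, 37, 41, 45]

# Print a simple text histogram: one star per observation.
for label, n in histogram_counts(sample):
    print(f"{label:>8} | {'*' * n} ({n})")
```

Note that a value sitting exactly on a boundary, such as 20, falls in the lower interval (10,20] because the intervals are closed on the right.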
The Histogram
[Figure: Histogram (EDA.STA 5v*35c), two panels: Histogram of IM Variable and Histogram of MR Variable. x-axis: IM / MR, binned as <= 10, (10,20], (20,30], (30,40], > 40; y-axis: No of obs]
[Figure: Histogram of SM Variable. x-axis: SM, binned as <= 10, (10,20], (20,30], (30,40], > 40]
Stem & Leaf Plot
• Breaks the data into “STEM” and “LEAVES”
– STEM: the root (leading) value of a data point
– LEAF: the residual (trailing) value of a data point
– Consider the following: 37, 32 and 39
– Stem is “3” and leaves are 7, 2 and 9
• It gives the histogram (or a close approximation) of the variable
• Each value in the dataset is represented and is visible
• Allows visual identification of the Mode
• However, the choice of stem may produce different histograms.
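The stem and leaf split described above can be sketched in a few lines. This is a minimal illustration that assumes non-negative integer data and uses the tens digit as the stem; the function name and the `leaf_unit` parameter are hypothetical, not part of any package used in the course.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=10):
    """Group non-negative integers into {stem: [sorted leaves]}.

    With leaf_unit=10 the stem is the tens part and the leaf is the units digit.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, leaf_unit)
        groups[stem].append(leaf)
    return dict(groups)


# The example from the slide: 37, 32 and 39 share the stem 3.
rows = stem_and_leaf([37, 32, 39])
for stem, leaves in sorted(rows.items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

For the slide's values the stem is 3 and the leaves, once sorted, are 2, 7 and 9. Changing the stem definition regroups the same data, which is why different stems can give differently shaped plots.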
Stem & Leaf Plot
• The plot for MR Data looks as follows
[Figure: Stem & Leaf Plot of the MR data]
Scatter Plot
[Figure: Scatter plot of MR (y-axis) against IM (x-axis)]
[Figure: Matrix Plot (EDA.STA 4v*35c) of IM, MR and SM]
Box & Whisker Plot
• Used to examine group differences between two or more variables. Can also be used on a single variable
• Provides insight into
– Spread & Skewness of the data
– Extreme values and outliers
• Construction:
– Calculate Q1, Median, Q3 and the Interquartile Range (IQ = Q3 - Q1)
– Inner fences: L1 = Q1 - 1.5 IQ, U1 = Q3 + 1.5 IQ
– Outer fences: L2 = Q1 - 3.0 IQ, U2 = Q3 + 3.0 IQ
– Data beyond L2 or U2 are “Extreme Values”; data between L1 and L2, or between U1 and U2, are “Outliers”
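The construction steps above can be sketched as follows. The function names are hypothetical, and the quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions; the software used for the slides may compute Q1 and Q3 slightly differently. The sample values are made up for illustration.

```python
import statistics


def fences(values):
    """Compute quartiles, IQ and the 1.5 IQ / 3.0 IQ fences for a data set."""
    # Assumption: "inclusive" quartiles; other conventions give slightly different cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iq = q3 - q1
    return {
        "Q1": q1, "Median": median, "Q3": q3, "IQ": iq,
        "L1": q1 - 1.5 * iq, "U1": q3 + 1.5 * iq,
        "L2": q1 - 3.0 * iq, "U2": q3 + 3.0 * iq,
    }


def classify(x, f):
    """Label a value as 'extreme', 'outlier' or 'regular' relative to the fences f."""
    if x < f["L2"] or x > f["U2"]:
        return "extreme"
    if x < f["L1"] or x > f["U1"]:
        return "outlier"
    return "regular"


# Made-up values for illustration only (not the course data set).
data = [10, 12, 14, 16, 18, 20, 22, 24, 26]
f = fences(data)
print(f)
for x in (20, 40, 50):
    print(x, classify(x, f))
```

Values are checked against the outer limits first, so a point beyond L2 or U2 is reported as an extreme value rather than as an outlier.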
Box & Whisker Plot
[Figure: Box Whisker Plot from Selected Block, Cases 1 through 35. Variables: IM, MR, SM; legend: Max, Min, 75th %, 25th %, Median]
1. Chernoff Faces are always complete. As such, some components will always remain constant.