Chapter 4 - Dimension Reduction: Data Mining For Business Intelligence
where X1 and X2 are the original variables in the dataset; a1, a2, b1, b2 are called “weights”; Z1 is the first principal component and Z2 is the second principal component.
PCA output for these 2 variables (XLMiner)

Top: weights to project the original data onto Z1 & Z2
e.g. (-0.847, 0.532) are the weights for Z1 and (0.532, 0.847) are the weights for Z2

              Components
Variable      1              2
calories      -0.84705347    0.53150767
rating         0.53150767    0.84705347

Bottom: reallocated variance for the new variables
Z1: 86% of total variance
Z2: 14%

Variance      498.0244751    78.932724
Variance%     86.31913757    13.68086338
Cum%          86.31913757    100
P-value       0              1
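The percentages in the bottom half of the output can be checked directly from the reported variances. The short Python sketch below copies the numbers from the XLMiner output. The variable names are illustrative only and are not part of XLMiner. It also checks that the two weight vectors have unit length and are perpendicular to each other.

```python
import numpy as np

# Weights and variances copied from the XLMiner output above.
# Rows are the original variables; columns are components 1 and 2.
weights = np.array([
    [-0.84705347, 0.53150767],   # calories
    [0.53150767, 0.84705347],    # rating
])
variances = np.array([498.0244751, 78.932724])

# Each column is one component's weight vector. If the columns are
# orthonormal, weights.T @ weights is (numerically) the identity matrix.
gram = weights.T @ weights

# Share of total variance carried by each component, and the running total.
variance_pct = 100 * variances / variances.sum()
cum_pct = np.cumsum(variance_pct)

print(np.round(gram, 6))
print(np.round(variance_pct, 2))
print(np.round(cum_pct, 2))
```

The computed shares should agree with the Variance% and Cum% rows, up to rounding in the printed output.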
Row Id.                        1         2
100%_Bran                    44.92      2.20
100%_Natural_Bran           -15.73     -0.38
All-Bran                     40.15     -5.41
All-Bran_with_Extra_Fiber    75.31     13.00
Almond_Delight               -7.04     -5.36
Apple_Cinnamon_Cheerios      -9.63     -9.49
Apple_Jacks                  -7.69     -6.38
Basic_4                     -22.57      7.52
Bran_Chex                    17.73     -3.51
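The two numeric columns above appear to be each cereal's values on Z1 and Z2. A minimal sketch of how such values can be computed is given below. It assumes the usual PCA convention that the weights are applied to mean-centered data. The toy data are hypothetical, not the cereal dataset, and NumPy's eigendecomposition stands in for whatever routine XLMiner uses internally.

```python
import numpy as np

# Hypothetical toy data (NOT the cereal dataset): columns are X1 and X2.
X = np.array([
    [70.0, 68.0],
    [120.0, 34.0],
    [110.0, 30.0],
    [50.0, 93.0],
    [100.0, 40.0],
])

# Subtract each column's mean, then form the 2 x 2 covariance matrix.
centered = X - X.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order, so reorder them so that
# the first component (Z1) has the largest variance.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are the weights; each row of scores is (Z1, Z2).
scores = centered @ eigvecs

print(np.round(eigvecs, 3))   # weights (the sign of each column is arbitrary)
print(np.round(eigvals, 3))   # variances of Z1 and Z2
print(np.round(scores, 2))    # one row of (Z1, Z2) per observation
```

Because the sign of each weight vector is arbitrary, another tool may report a component with all signs flipped; the variances are unaffected.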
overlap)
Generalization
• X1, X2, X3, … Xp, original p variables