
MULTIVARIATE DATA ANALYSIS
Logistic Regression
Logistic Regression
 Regression analysis for data with categorical outcomes, modelling the outcome probability as a function of one or more explanatory variables through the logistic function.
 Also known as logit regression.

Pr Yi  yi x1i ,..., xki   piyi 1  pi 


1 yi

i   0  1 x1i  ...   k xki


 pi 
logit pi   ln    0  1 x1i  ...   k xki
 1  pi 
Logistic Regression (cont.)
 Case Study 2: kyphosis
Variable – Description
Kyphosis – presence/absence of the postoperative spinal deformity
Age – age of the child in months
Number – number of vertebrae involved in the surgery
Start – beginning of the range of vertebrae involved

 Perform a logistic regression analysis to study the relationship between the Age, Number and Start variables and the presence of kyphosis (see the R sketch below).
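A possible fit in R, which mirrors the S-PLUS call described in the User's Guide (the kyphosis data frame is assumed to be the one shipped with the rpart package):

library(rpart)                     # provides the kyphosis data frame
fit <- glm(Kyphosis ~ Age + Number + Start,
           family = binomial,      # logit link by default
           data   = kyphosis)
summary(fit)                       # coefficients, standard errors, deviance
predict(fit, type = "response")    # fitted probabilities of kyphosis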
Logistic Regression (cont.)
 Probit regression – a variation of logistic regression in which the logit link is replaced by the probit (standard normal quantile) link; see the sketch below.
 Example 1: See User’s Guide (pages 299 – 301)
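For comparison, a sketch of how the probit variant could be requested with the same model formula (again using the rpart kyphosis data purely for illustration):

library(rpart)                     # kyphosis data, as in the previous sketch
probit.fit <- glm(Kyphosis ~ Age + Number + Start,
                  family = binomial(link = "probit"),  # probit instead of logit
                  data   = kyphosis)
summary(probit.fit)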
Principal Component Analysis (PCA)
 PCA is a projection method, which finds projections
of maximal variability.
 It seeks linear combinations of the columns of data
X with maximal (or minimal) variance.
 Let S denote the covariance matrix of the data X.
nS = (X - n^{-1} 1 1^T X)^T (X - n^{-1} 1 1^T X) = X^T X - n \bar{x} \bar{x}^T, \qquad \bar{x} = 1^T X / n,
where \bar{x} is the row vector of means of the variables.
 Then the sample variance of a linear combination x a of a row vector x is a^T S a.
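The centring identity above can be checked numerically; a small R sketch (the iris measurements are used only as a convenient numeric matrix, and note that R's cov() divides by n - 1 rather than n):

X    <- as.matrix(iris[, 1:4])                      # any numeric data matrix
n    <- nrow(X)
xbar <- colMeans(X)                                 # row vector of means
S    <- (crossprod(X) - n * tcrossprod(xbar)) / n   # (X'X - n xbar xbar') / n
max(abs(S - cov(X) * (n - 1) / n))                  # agrees with cov() up to the divisor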
PCA (cont.)
 The above is maximized (or minimized) subject to \|a\|^2 = a^T a = 1.

 As S is a non-negative definite matrix, it has an eigendecomposition
S = C^T \Lambda C,
where \Lambda is a diagonal matrix of its (non-negative) eigenvalues in decreasing order.
PCA (cont.)
 Let b = Ca (same length as a since C is orthogonal).
 Equivalently, maximize
b^T \Lambda b = \sum_i \lambda_i b_i^2
subject to
\sum_i b_i^2 = 1.

 Variance is maximized by taking b to be the first unit vector, or equivalently by taking a to be the column eigenvector corresponding to the largest eigenvalue of S.
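The eigendecomposition argument translates directly into code; a short R sketch (illustrative only, again using the iris measurements):

X  <- as.matrix(iris[, 1:4])
S  <- cov(X)                 # sample covariance matrix
e  <- eigen(S)               # eigenvalues in decreasing order, eigenvectors in columns
a1 <- e$vectors[, 1]         # direction of the first principal component
var(X %*% a1)                # sample variance of the combination X a1 ...
e$values[1]                  # ... equals the largest eigenvalue of S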
PCA (cont.)
 Taking subsequent eigenvectors gives combinations
with as large as possible variance that are
uncorrelated with those that have been taken
earlier.
 The ith principal component is then the ith linear
combination picked by this procedure.
 The first k principal components span a subspace
containing the ‘best’ k-dimensional view of the data.
 This subspace has a maximal covariance matrix, in the sense that the projection retains as much of the total variance as possible.
PCA (cont.)
 It also best approximates the original points in the
sense of minimizing the sum of squared distances
from the points to their projections.
 The first few principal components are often useful
to reveal structure in the data.
 The principal components corresponding to the
smallest eigenvalues are the most nearly constant
combinations of the variables.
PCA (cont.)
 Note: Principal components depend on the scaling
of the original variables. It is conventional to take
the principal components of the correlation matrix,
implicitly rescaling all variables to have unit sample
variance.
 Example 2: See S+ User’s Guide (pages 344 –
346)
 loadings are the columns giving the linear combinations a for each principal component.
 scores are the data on the principal components.
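A sketch of the corresponding call in R, which is close to the S-PLUS code in the guide (the choice of the iris data and of the log transformation here is for illustration):

pc <- princomp(log(iris[, 1:4]), cor = TRUE)  # PCA of the correlation matrix
summary(pc)          # standard deviations and proportions of variance
loadings(pc)         # columns give the linear combinations a
head(pc$scores)      # the data expressed on the principal components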
PCA (cont.)

[Figure: biplot of the first two principal components (Comp.1 on the horizontal axis, Comp.2 on the vertical axis) of the log-transformed iris measurements; arrows mark the loadings of log(Sepal.Length), log(Sepal.Width), log(Petal.Length) and log(Petal.Width), and points are labelled by observation number.]
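A plot of this kind can be produced with biplot(); a hedged R sketch (it assumes the figure was built from a PCA of the log-transformed iris measurements, which matches the variable labels above):

ir    <- log(iris[, 1:4])   # log-transformed measurements
ir.pc <- princomp(ir, cor = TRUE)
biplot(ir.pc)               # observations plus variable loadings, Comp.1 vs Comp.2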
Multivariate ANOVA (MANOVA)
 Extension of univariate ANOVA to multiple, typically correlated, responses measured on each subject.
 Example 3: See S+ User’s Guide (pages 346 –
347)
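A sketch of a MANOVA fit in R (the guide's own S-PLUS example is not reproduced here; the iris data and grouping by Species are used purely for illustration):

fit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
              data = iris)
summary(fit, test = "Wilks")   # multivariate test (Wilks' lambda)
summary.aov(fit)               # follow-up univariate ANOVAs, one per response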
