What is "Multivariate"?
Multivariate data analysis is a set of statistical models that examine patterns in multidimensional data by considering, at once, several data variables. It is an expansion of bivariate data analysis, which considers only two variables in its models. As multivariate models consider more variables, they can examine more complex phenomena and find data patterns that more accurately represent the real world.
Consider, as an example, the regression model, a method to analyze correlations in data. The non-multivariate case of regression is the analysis between two variables, and it is called a bivariate regression. It could be used, for instance, to see how the height of a swimmer correlates with their speed. By doing a bivariate regression, the analyst could find that taller swimmers tend to swim faster. Although that may be right, we know that height is not the only thing influencing speed, so the bivariate model hardly explains the complete phenomenon of swimming.
In contrast, the multivariate version, called multiple regression, could take into account many more variables: weight, age, carbohydrate intake, protein intake, number of training hours, number of resting hours, and many others. In theory, the higher the number of relevant variables, the more accurately the regression can represent the phenomenon of swimming, to a point where it could pinpoint the speed of a new swimmer with little error.
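The swimmer comparison can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data is synthetic, the coefficients used to generate it are invented, and `r_squared` is a hypothetical helper built on NumPy's least-squares solver. Note that the in-sample fit never gets worse when variables are added, so a higher value alone does not prove that the extra variables are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic, made-up swimmer data (not real measurements).
height = rng.normal(180, 8, n)       # cm
training = rng.normal(15, 4, n)      # hours per week
age = rng.normal(24, 4, n)           # years
speed = (0.8 + 0.004 * height + 0.02 * training - 0.005 * age
         + rng.normal(0, 0.03, n))   # m/s, generated by an invented formula


def r_squared(predictors, target):
    """Fit ordinary least squares and return the coefficient of determination."""
    X = np.column_stack([np.ones(len(target))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    return 1.0 - residuals.var() / target.var()


r2_bivariate = r_squared([height], speed)
r2_multiple = r_squared([height, training, age], speed)
print(f"bivariate R^2: {r2_bivariate:.2f}, multiple R^2: {r2_multiple:.2f}")
```

With this synthetic data, height alone explains only a small share of the variation in speed, while the three variables together explain most of it.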
Untangling the web of variable relationships, where each one correlates with many others, is at the heart of multivariate data analysis. In many cases, the higher these intercorrelations, the harder the task of detecting meaningful data relationships: since all variables seem to influence something, any underlying structure or any cause and effect becomes diluted. Solving this issue is partly a task for the analyst, who must know the data and reduce noise and bias as much as possible, and partly for the multivariate technique, which is designed to deal with the remaining imperfections.
So, a multivariate data analysis tries to find patterns in a sea of data variables. But what are those patterns? Which statistical techniques were designed to find them? Let us step a little further and see the possibilities of this type of analysis.
Techniques are like tools: there is a bunch; some are very similar, some are more general than others, but, more importantly, each of them has a use case.
Multivariate Data Analysis Techniques
There are two categories of multivariate techniques, each pursuing a different type of relationship in the data: dependence and interdependence. Dependence relates to cause-effect situations and tries to see if one set of variables can describe or predict the values of other ones. Interdependence refers to structural intercorrelation and aims to understand the underlying patterns of the data.
There are several multivariate models capable of finding those relationships, and many factors distinguish them. One of the primary factors that must be taken into account when choosing a technique is the nature of the data variables: they can be metric or non-metric.
Metric variables: always numeric, they represent information that can be measured on some scale. Examples include age (20 years), temperature (25 ºC), and profit (US$2000).
Non-metric variables: they categorize the data but do not specify its magnitude. Examples include an operating system (Windows, Linux, macOS) and house size (small, medium, large).
Note that a non-metric variable can also be numeric when it is not attached to any scale, such as a variable that holds the ID number of objects.
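To make the distinction concrete, here is a small sketch of how the two natures might be prepared for a model. Everything in it is an assumption made for illustration: the records are hypothetical, one-hot coding is used for the operating system, and a rank code is used for house size (one possible choice among several, which keeps the order of the levels but does not claim that the gaps between them are equal). The numeric `id` is left out because it is non-metric.

```python
# Hypothetical records mixing metric and non-metric variables.
records = [
    {"id": 101, "age": 20, "os": "Linux", "house_size": "small"},
    {"id": 102, "age": 35, "os": "Windows", "house_size": "large"},
    {"id": 103, "age": 28, "os": "macOS", "house_size": "medium"},
]

OS_LEVELS = ["Windows", "Linux", "macOS"]            # categories with no order
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}   # categories with an order


def encode(record):
    """Turn one record into a list of numbers a model could consume."""
    one_hot_os = [1 if record["os"] == level else 0 for level in OS_LEVELS]
    # "id" is numeric but non-metric, so it is deliberately left out.
    return [record["age"], *one_hot_os, SIZE_ORDER[record["house_size"]]]


encoded = [encode(r) for r in records]
print(encoded)
```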
The nature of the variables is the primary factor that distinguishes multivariate techniques. The following sections summarize some of the available models, their goals, and the data natures that they can operate on. It is not an exhaustive list, but it covers enough techniques to analyze data with any combination of natures.
Dependence Techniques
Dependence techniques investigate cause-effect relationships in the data.
The swimmer example mentioned earlier is a classic use of a dependence technique: the goal is to establish a cause-effect relationship between independent (or predictor) variables (the cause) and dependent variables (the effect). If the reader is familiar with machine learning, which shares many similarities with multivariate data analysis, dependence techniques can be associated with supervised learning techniques.
In dependence techniques, the analyst feeds a model with input data, specifying which variables are independent and which are dependent. The dependent variables are the ones the model will try to predict or explain (e.g., swimmer speed). The independent variables (e.g., swimmer height) are the ones whose effect on the dependent variables the analyst wants to study.
The goal of all dependence techniques is to establish a cause-effect relationship. The most notable differences between them are the number of dependent variables they support and the nature of the variables involved.
So let us see how the techniques relate to those characteristics and when they can be used.
Multiple Regression
Dependent Variable: one metric variable.
Independent Variables' Nature: any.
Multiple regression is an option when the analyst stipulates only one dependent variable, which is metric. The result of applying a multiple regression is the degree of impact that each independent variable has on the dependent one. That result also leads to an estimation function, which accepts values for the independent variables and returns the expected value for the dependent one.
An analyst could use multiple regression, for instance, to predict the sales performance of different stores based on their attributes (e.g., number of vendors, number of hours open). Such an analysis would lead to a deeper understanding of what makes each store sell more, which could drive administrative changes in the most important attributes towards values that give higher profit.
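The store example can be sketched as follows. This is a minimal illustration, not a full analysis: the store data is synthetic, the numbers used to generate it are invented, and `estimate_sales` is a hypothetical name for the estimation function described above. The fitted coefficients play the role of the degree of impact of each attribute.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stores = 120

# Made-up store attributes and sales, for illustration only.
vendors = rng.integers(2, 15, n_stores).astype(float)
hours_open = rng.integers(6, 16, n_stores).astype(float)
sales = 500 + 120 * vendors + 40 * hours_open + rng.normal(0, 50, n_stores)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n_stores), vendors, hours_open])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, per_vendor, per_hour = coef


def estimate_sales(n_vendors, n_hours):
    """Estimation function: expected sales for the given attribute values."""
    return intercept + per_vendor * n_vendors + per_hour * n_hours


print(f"each vendor adds about {per_vendor:.0f}, each open hour about {per_hour:.0f}")
print(f"expected sales with 10 vendors and 12 hours: {estimate_sales(10, 12):.0f}")
```

Because the raw coefficients depend on the units of each attribute, comparing the importance of attributes usually calls for standardizing the variables first.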
Conjoint Analysis
Dependent Variable(s): one variable of any nature.
Independent Variables' Nature: non-metric.
Conjoint analysis is an option when the independent variables are non-metric and they affect only one variable. If anyone were asked how this analysis could be done, an intuitive answer would be to test all the combinations of levels of the non-metric variables and observe the value of the dependent variable in each of them. However, that could be very costly, as the number of combinations grows exponentially with each new independent variable. The attractiveness of conjoint analysis is that there is no need to test all attribute combinations to achieve good results, so collecting the data needed for this analysis is faster and cheaper.
That makes conjoint analysis a widely used technique in the commercial domain, where an analyst may want to examine the acceptance of the user (the dependent variable) for products of varying attributes (independent variables) without wasting too many resources. For instance, if a product has three independent variables (e.g., color, size, price perception), each with three levels, instead of measuring users' acceptance in all 27 combinations, the collected data only needs to cover a few of them: the technique takes care of isolating the effect of each variable.
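The arithmetic behind the 27 combinations, and the idea of learning from only a few of them, can be sketched as below. Several assumptions are made for illustration: each attribute has three levels with invented names, acceptance is simulated from invented additive "part-worth" values with no noise, and the nine profiles are picked with a simple orthogonal rule. Real conjoint studies choose their designs and estimation methods with more care.

```python
import itertools
import numpy as np

LEVELS = {
    "color": ["red", "blue", "green"],
    "size": ["S", "M", "L"],
    "price": ["cheap", "fair", "expensive"],
}
full_factorial = list(itertools.product(*LEVELS.values()))
print(len(full_factorial))  # 3 * 3 * 3 = 27 profiles

# A 9-profile orthogonal subset: the third level index is (a + b) mod 3.
subset = [(a, b, (a + b) % 3) for a in range(3) for b in range(3)]

# Hypothetical "true" part-worths, used only to simulate acceptance scores.
TRUE_PARTWORTHS = [[0.0, 1.0, 0.5], [0.0, 0.3, 0.8], [0.0, -0.5, -2.0]]
BASELINE = 5.0


def acceptance(profile):
    """Simulated acceptance: baseline plus one part-worth per attribute."""
    return BASELINE + sum(TRUE_PARTWORTHS[k][lvl] for k, lvl in enumerate(profile))


def dummy_row(profile):
    """Intercept plus dummy codes; level 0 of each attribute is the reference."""
    row = [1.0]
    for lvl in profile:
        row += [1.0 if lvl == 1 else 0.0, 1.0 if lvl == 2 else 0.0]
    return row


X = np.array([dummy_row(p) for p in subset])
y = np.array([acceptance(p) for p in subset])
estimates, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(estimates, 2))  # baseline, then two part-worths per attribute
```

Under these assumptions, nine observed profiles are enough to recover the effect of every level, which is one third of the full set.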
Conjoint analysis is deeply tied to the efficiency of the data collection process, so it is most useful when the data has not been collected yet, rather than on an already complete dataset.
Multiple Discriminant Analysis
Dependent Variable(s): one non-metric variable.
Independent Variables' Nature: metric.
Multiple discriminant analysis is very similar to machine learning classifiers. It is an option when there is only one dependent variable, which is non-metric (also called a "class" or "label"). The goal is to understand the characteristics of the data that pertain to each class.
The classic example is classification. After processing the data, the model can classify future entries that don't have labels. For instance, a model could analyze characteristics of music fragments (independent variables), where each fragment is assigned to a musical genre (dependent variable). If the analyst builds a successful model, it can classify the genre of fragments it never saw before.
Multiple discriminant analysis is not optimal when some of the independent variables are non-metric: variables that are originally metric lead to better results.
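The music example can be sketched with a linear discriminant rule written directly in NumPy. This is an illustration under assumptions, not a reference implementation: the two features (tempo and loudness), the genre names, and the data are all made up, the classes are given equal prior weight, and `classify` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up metric features for music fragments: (tempo in BPM, loudness in dB).
GENRES = ["ambient", "rock", "techno"]
CENTERS = np.array([[70.0, -30.0], [115.0, -12.0], [150.0, -6.0]])
features = np.vstack([rng.normal(c, [5.0, 2.0], size=(50, 2)) for c in CENTERS])
labels = np.repeat(np.arange(3), 50)

# Fit: per-class means and one pooled within-class covariance matrix.
means = np.array([features[labels == k].mean(axis=0) for k in range(3)])
centered = features - means[labels]
pooled_cov = centered.T @ centered / (len(features) - 3)
precision = np.linalg.inv(pooled_cov)


def classify(x):
    """Pick the class with the highest linear discriminant score (equal priors)."""
    x = np.asarray(x, dtype=float)
    linear_term = means @ precision @ x
    constant_term = 0.5 * np.einsum("ij,jk,ik->i", means, precision, means)
    return GENRES[int(np.argmax(linear_term - constant_term))]


print(classify([72.0, -28.0]))  # a slow, quiet fragment the model never saw
```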
Linear Probability Models
Dependent Variable(s): one non-metric variable (preferably binary).
Independent Variables' Nature: any.
Linear probability models work similarly to multiple discriminant analysis, since the goal is to classify a non-metric dependent variable, but without the limitation of requiring metric independent variables. However, another limitation takes its place: this technique works better when the dependent variable is binary; that is, it only has two levels.
If the classification involves multiple possible labels for the dependent variable and the independent variables are metric, the analyst should give preference to multiple discriminant analysis. If the classification involves a binary dependent variable and the independent variables include non-metric ones, it is better to apply linear probability models.
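As a sketch of the binary case with mixed predictors, the example below fits a least-squares line to a 0/1 outcome, which is the most literal reading of a linear probability model. The customer data is synthetic, the probabilities used to generate it are invented, and `purchase_probability` is a hypothetical helper. Logistic regression is a commonly used alternative that keeps predictions between 0 and 1 by construction; here the linear output is simply clipped.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400

# Made-up customers: one metric predictor and one non-metric (binary) predictor.
income = rng.uniform(1.0, 10.0, n)        # arbitrary units
is_member = rng.integers(0, 2, n)         # 0 = no, 1 = yes
true_prob = 0.05 + 0.06 * income + 0.25 * is_member
bought = (rng.uniform(0, 1, n) < true_prob).astype(float)  # binary dependent variable

# Least squares on the 0/1 outcome: fitted values are read as probabilities.
X = np.column_stack([np.ones(n), income, is_member])
coef, *_ = np.linalg.lstsq(X, bought, rcond=None)


def purchase_probability(income_value, member):
    """Fitted probability, clipped because a linear fit can leave [0, 1]."""
    raw = coef @ np.array([1.0, income_value, float(member)])
    return float(np.clip(raw, 0.0, 1.0))


print(round(purchase_probability(8.0, 1), 2), round(purchase_probability(2.0, 0), 2))
```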
Multivariate Analysis of Variance and Covariance
Dependent Variable(s): many metric variables.
Independent Variables' Nature: non-metric.
The multivariate analysis of variance (MANOVA) and multivariate analysis of covariance (MANCOVA) are techniques that the analyst can use to measure the effect of many non-metric independent variables on two or more metric dependent variables. If the reader is familiar with ANOVA, which supports only one dependent variable, MANOVA is the multivariate extension of that technique.
The nature of the data that MANOVA supports makes it suitable for the research domain. To test a hypothesis, researchers usually manipulate a non-metric variable with two or more levels, called treatments, and then take some measures to see if the objects under one treatment are different from the objects under others. However, complex research may require many independent variables and many measures: that's where MANOVA comes in.
Consider the example of a team of aerodynamics engineers who are designing a new aircraft and want to measure if several combinations of engines and wings affect the magnitude of the forces in airplanes (e.g., thrust, drag, lift, weight).
In a simulation environment, the engineers choose three types of engines (E1, E2, E3) and three types of wings (W1, W2, W3); both the engine type and the wing type are independent variables. They develop several airplanes for all of the engine-wing combinations and launch them in many virtual spaces to collect as much force data as possible (the dependent variables).
The application of MANOVA to the collected data could reveal that the combination E1-W2 is significantly worse, while E3-W1 is significantly better. The engineers can see how each engine, each wing, and each combination impacts each of the forces. It is not an easy technique to conduct or to interpret, but it is a rewarding and powerful one.
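One statistic commonly reported by MANOVA is Wilks' lambda, which compares the variation within groups to the total variation across all dependent variables at once. The sketch below computes it for a simplified one-way layout in which each engine-wing combination is treated as a single group; a full analysis would also separate the engine, wing, and interaction effects and attach a significance test. The force measurements are synthetic and `wilks_lambda` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(4)


def wilks_lambda(groups):
    """Wilks' lambda = det(W) / det(W + B) for a list of (n_i, p) arrays."""
    all_rows = np.vstack(groups)
    grand_mean = all_rows.mean(axis=0)
    p = all_rows.shape[1]
    within = np.zeros((p, p))    # W: scatter of entries around their group mean
    between = np.zeros((p, p))   # B: scatter of group means around the grand mean
    for g in groups:
        mean = g.mean(axis=0)
        dev = g - mean
        within += dev.T @ dev
        diff = (mean - grand_mean)[:, None]
        between += len(g) * (diff @ diff.T)
    return np.linalg.det(within) / np.linalg.det(within + between)


# Made-up (drag, lift) measurements for three engine-wing combinations.
same = [rng.normal([10.0, 50.0], 1.0, size=(30, 2)) for _ in range(3)]
shifted = [rng.normal([10.0 + 2 * k, 50.0 - 3 * k], 1.0, size=(30, 2)) for k in range(3)]

lambda_same = wilks_lambda(same)
lambda_shifted = wilks_lambda(shifted)
print(f"no treatment effect: {lambda_same:.2f}")
print(f"strong treatment effect: {lambda_shifted:.2f}")
```

Values close to 1 suggest that the groups do not differ, while values close to 0 suggest that the treatments shift the dependent variables.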
The multivariate analysis of covariance (MANCOVA) can fine-tune the results and reinforce the study's validity by removing the effects of possible uncontrolled variables, known as covariates (for example, whether it was raining or not in the simulations). Thus, even if these factors affect the dependent variables, MANCOVA reduces their impact to isolate the effect of the treatments as much as possible.
Interdependence Techniques
Interdependence techniques aim to find an inner structure in the midst of data chaos.
Interdependence techniques do not aim at solving cause-effect problems, but instead at understanding the underlying structure of the data. That makes each of them very distinct, with different goals and needs.
Factor Analysis
Goal: understand which variables highly correlate with others.
Factor analysis aims at reducing the dimensionality of the data by reducing the number of data variables. It detects groups of variables with high correlation, which the analyst may use as a basis to create a new variable that can replace them with little information loss. Factor analysis includes techniques such as principal component analysis and common factor analysis.
Analysts commonly use this type of technique as a pre-processing step to transform the data before using other models. When the data has too many variables, the performance of multivariate techniques tends to be suboptimal, as patterns are more difficult to find. By using factor analysis to condense information into a smaller set of new variables, the patterns become less diluted and easier to analyze.
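As an illustration of condensing correlated variables, the sketch below runs a principal component analysis by decomposing the correlation matrix with NumPy. The data is synthetic: three variables are generated from one hidden factor (an assumption made so that the expected answer is known) and a fourth is unrelated. Common factor analysis follows a different model and is not shown here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300

# Made-up data: three variables driven by one hidden factor, plus one unrelated variable.
hidden = rng.normal(0, 1, n)
data = np.column_stack([
    hidden + rng.normal(0, 0.2, n),
    2 * hidden + rng.normal(0, 0.2, n),
    -hidden + rng.normal(0, 0.2, n),
    rng.normal(0, 1, n),
])

# Standardize, then eigendecompose the correlation matrix.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
corr = np.corrcoef(standardized, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()    # share of variance per component
new_variable = standardized @ eigenvectors[:, 0]   # scores on the first component
print(np.round(explained, 2))
```

Here the first component stands in for the three correlated variables, so the four original columns could be reduced to two with little information loss.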
Cluster Analysis
Goal: find patterns in data entries.
Cluster analysis aims at detecting groups (clusters) of data entries that have similar values. This technique is not exclusive to multivariate data analysis: even unidimensional data can be clustered. However, this task is much harder when there are more variables along which to compare the data.
Analysts can use clusters to understand the distribution of entries. By finding similar data points, the analyst can reason about this similarity and grow their knowledge about the behavior that drives the values of the analyzed objects.
For instance, in commercial data, such analysis could result in acknowledging the existence of a group of consumers that have similar characteristics and buy certain products very often: a consumer profile. The organization can then take action to make such products more accessible to those potential consumers.
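The consumer-profile idea can be sketched with k-means, one of many clustering algorithms (the text above does not prescribe a specific one). The consumers are synthetic, the two variables are invented, and `k_means` is a hypothetical helper with a simple deterministic start. In practice the variables would usually be standardized first so that no single scale dominates the distances.

```python
import numpy as np

rng = np.random.default_rng(6)

# Made-up consumers described by (age, purchases per month).
young_frequent = rng.normal([25.0, 12.0], [3.0, 2.0], size=(40, 2))
older_occasional = rng.normal([55.0, 3.0], [4.0, 1.0], size=(40, 2))
consumers = np.vstack([young_frequent, older_occasional])


def k_means(points, k, n_iter=20):
    """Lloyd's algorithm with a simple deterministic start (evenly spaced rows)."""
    centroids = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, then nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        for j in range(k):
            members = points[assignment == j]
            if len(members) > 0:   # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return assignment, centroids


assignment, centroids = k_means(consumers, k=2)
print(np.round(centroids, 1))  # one row per detected consumer profile
```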
Multidimensional Scaling
Goal: obtain tabular data from a weighted graph structure.
This technique aims at measuring similarities between objects, and it can transform a graph into a table. The input is a dissimilarity matrix: an adjacency matrix of a weighted graph where the edge weights represent an artificial distance measure between the nodes (the objects). The output is a set of new metric variables (usually two) where objects with high similarities are closer in the data space, and objects with low similarities are far apart.
One common use of multidimensional scaling is to detect which products have a similar consumer profile. By using as input a graph where the nodes are the products and the edge weights represent how different their consumer profiles are, the analyst transforms this structure into a tabular dataset. Visualizing the new data in a scatter plot shows which products have similar consumers, as they are the ones that are closer to each other in the data space.
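The transformation from a dissimilarity matrix to a table of coordinates can be sketched with classical (Torgerson) multidimensional scaling, one of several variants of the technique. The four products and their dissimilarities are hypothetical, chosen so that they can be laid out exactly in two dimensions, and `classical_mds` is a hypothetical helper.

```python
import numpy as np


def classical_mds(dissimilarity, n_dims=2):
    """Classical MDS: double-center the squared distances, then eigendecompose."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    top = np.argsort(eigenvalues)[::-1][:n_dims]       # largest eigenvalues first
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale


# Hypothetical dissimilarities between four products (symmetric, zero diagonal).
PRODUCTS = ["A", "B", "C", "D"]
FAR = 17 ** 0.5
DISSIMILARITY = [
    [0.0, 1.0, FAR, 4.0],
    [1.0, 0.0, 4.0, FAR],
    [FAR, 4.0, 0.0, 1.0],
    [4.0, FAR, 1.0, 0.0],
]

coords = classical_mds(DISSIMILARITY)
for name, (x, y) in zip(PRODUCTS, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

In the resulting table, A lands next to B and C next to D, which is what a scatter plot of the two new variables would show.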
How to Design a Multivariate Data Analysis
Designing an effective study goes much further than choosing a multivariate technique. After defining the conceptual problem, engineering a model that solves the problem at hand requires the analyst to define the specific aspects that are inherent to the selected technique, such as estimation methods and distance metrics. Additionally, the analyst must be aware of the model assumptions and transform the data if needed. If the data has not been collected yet, then the analyst must define the sample size by weighing the statistical significance, statistical power, and effect size.
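The trade-off between significance, power, and effect size can be illustrated with a textbook approximation for the simplest case, a two-sided comparison of two group means. This is only a sketch: multivariate techniques need their own power analyses, and `sample_size_per_group` is a hypothetical helper that assumes a standardized effect size (Cohen's d) and a normal approximation.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(d, sample_size_per_group(d))
```

The smaller the effect the analyst wants to detect, the more entries each group needs.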
Only then can the analyst apply the technique to produce results. However, the model rarely gives satisfactory results in the first run. Sometimes the model fails to support or to reject a hypothesis due to type I and type II errors. Sometimes the model fits too heavily to the input data and cannot generalize its results to new entries, a problem called overfitting.
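A common way to spot overfitting is to compare the error on the entries used for fitting with the error on entries that were held out. The sketch below does that with polynomial fits of two different degrees; the data is synthetic (a straight line plus noise), and the degrees, the split, and the `errors` helper are arbitrary choices made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(7)

# Made-up data: a linear trend plus noise, split into training and holdout entries.
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + rng.normal(0, 0.3, x.size)
train = np.arange(x.size) % 2 == 0   # even positions are used for fitting
holdout = ~train                     # odd positions are kept aside


def errors(degree):
    """Return (training MSE, holdout MSE) for a polynomial of the given degree."""
    model = Polynomial.fit(x[train], y[train], degree)
    squared = (model(x) - y) ** 2
    return squared[train].mean(), squared[holdout].mean()


for degree in (1, 10):
    train_mse, holdout_mse = errors(degree)
    print(f"degree {degree}: train {train_mse:.3f}, holdout {holdout_mse:.3f}")
```

A flexible model always matches the training entries at least as well as a simple one; if its holdout error is clearly larger than its training error, the model is likely fitting noise.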
Examining the errors and respecifying the model is an iterative process, and the analyst should only proceed when the results are solid and generalizable. The reader is encouraged to search for workflows and guides to learn how this process of building robust models takes place.