Simulating Multivariate Structures
Simulating Multivariate Structures
http://www.personality-project.org/r/r.datageneration.html
numberofvariables <- dim(fmodel)[1] #problem size determined by input to the function numberoflatent <- dim(fmodel)[2] tmodel <- t(fmodel) #transpose of model # fmodel %*% tmodel #show the resulting measurement structure communality=diag(fmodel%*%tmodel) uniqueness=1-communality errorweight=sqrt(uniqueness) errorweight=diag(errorweight) #find how much to weight true scores and errors given the measurement model
latentscores <- matrix(rnorm(numberofcases*(numberoflatent)),numberofcases) #create true scores for the latent variables #round(cor(latentscores),2) #if uncommented, this shows the sample true score correlation matrix of the factors latentscores <- latentscores%*%effect #create true scores to reflect structural relations between the factors # round(cor(latentscores),2) #note that the factors are now correlated truescores <- latentscores %*% tmodel #round(cor(truescores),2) #show the true score correlation matrix (without error) error<- matrix(rnorm(numberofcases*(numberofvariables)),numberofcases) #create normal error scores error=error%*%errorweight observedscore=truescores+error allscores<- data.frame(observedscore,truescores) return(allscores) } #end of function mes
The first example is the classic multiple regression of two predictors and one criterion. We are assuming no errors of measurement and uncorrelated (in the population) predictors. The values of beta for X1 and X2 may be changed for alternative demonstrations. Note that the unique variance of the Y variable is specified in terms of what is not predicted by X1 and X2.
#example 1 2 predictors, 1 criterion variable no errors of measurement
beta1 <- .5 beta2 <- .6 uniquey <- sqrt(1-(beta1^2 + beta2^2)) fmodel <- matrix(c (1,0,0, 0,1,0, 0,0,1), nrow=3,ncol=3,byrow=TRUE) effect <- matrix(c ( 1,0,beta1, 0,1,beta2, 0,0,uniquey),nrow=3,ncol=3,byrow=TRUE) observed.df <- mes(fmodel,effect) names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty") round(var(observed.df),2) summary(lm(Y ~ X1+X2,data=observed.df))
This next example is merely two correlated predictors, still with no measurement error.
#example 2 2 predictors, 1 criterion variable no errors of measurement correlated predictors
beta1 <- .5 beta2 <- .6 rx1x2 <- .5 uniquex2 <- sqrt(1-rx1x2) uniquey <- sqrt(1-(beta1^2 + beta2^2)) fmodel <- matrix(c (1,0,0, 0,1,0, 0,0,1), nrow=3,ncol=3,byrow=TRUE) effect <- matrix(c ( 1,rx1x2,beta1, 0,beta2,0, 0,0,uniquey),nrow=3,ncol=3,byrow=TRUE) observed.df <- mes(fmodel,effect) names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty") round(var(observed.df),2) round(cor(observed.df),2) summary(lm(Y ~ X1+X2,data=observed.df))
Here we modify example 1 by introducing errors of measurement into the criterion variable. This does not affect the relationships between the true scores, but does between the observed scores.
#example 3 2 predictors, 1 criterion variable errors in measurement of criterion
beta1 <- .5 beta2 <- .6 uniquey <- sqrt(1-(beta1^2 + beta2^2)) fmodel <- matrix(c (1,0,0, 0,1,0,
1 de 3
04/11/2011 23:57
http://www.personality-project.org/r/r.datageneration.html
0,0,.5), nrow=3,ncol=3,byrow=TRUE) effect <- matrix(c ( 1,0,beta1, 0,1,beta2, 0,0,uniquey),nrow=3,ncol=3,byrow=TRUE) observed.df <- mes(fmodel,effect) names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty") round(var(observed.df),2) round(cor(observed.df),2) summary(lm(Y ~ X1+X2,data=observed.df))
Now we introduce errors in measurement on the predictors. An interesting exercise is to vary the amount of error for the X1 and X2 variables. Also, compare the effect of changing the beta weights.
#example 4 2 predictors, 1 criterion variable errors in measurement of predictors
beta1 <- .5 beta2 <- .6 uniquey <- sqrt(1-(beta1^2 + beta2^2)) fmodel <- matrix(c (.8,0,0, 0,.6,0, 0,0,1), nrow=3,ncol=3,byrow=TRUE) effect <- matrix(c ( 1,0,beta1, 0,1,beta2, 0,0,uniquey),nrow=3,ncol=3,byrow=TRUE) observed.df <- mes(fmodel,effect) names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty") round(var(observed.df),2) round(cor(observed.df),2) summary(lm(Y ~ X1+X2,data=observed.df))
Now introduce errors in measurement into both predictors as well as the criterion variable.
#example 5 2 predictors, 1 criterion variable errors in measurement
beta1 <- .5 beta2 <- .6 uniquey <- sqrt(1-(beta1^2 + beta2^2)) fmodel <- matrix(c (.8,0,0, 0,.6,0, 0,0,.5), nrow=3,ncol=3,byrow=TRUE) effect <- matrix(c ( 1,0,beta1, 0,1,beta2, 0,0,uniquey),nrow=3,ncol=3,byrow=TRUE) observed.df <- mes(fmodel,effect) names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty") round(var(observed.df),2) round(cor(observed.df),2) summary(lm(Y ~ X1+X2,data=observed.df))
Now, finally, we can consider what happens in the case where we have errors of measurement in the predictors as well as the criterion, but more importantly, we have multiple indicators of all of the latent variables. This is, of course, an example of the general problem encountered in structural equation modeling. By having multiple indicators of the latent variables, we are able to estimate the errors of measurement that affected all of the previous models but that was not possible to estimate. For this case, notice that the factor model (fmodel) has more than one variable loading on each factor. The effects matrix is the same as before. This particular example then rescales the observed score matrix to put the variables into more "realistic" terms. The model may be seen as having three estimates of ability, two of achievement, and three of performance. The particular example assumes that there are 3 measures of ability (GREV, GREQ, GREA), two measures of motivation (achievment motivation and anxiety), and three measures of performance (Prelims, GPA, MA). These titles are, of course, arbitrary and can be changed easily.
#example 6 9 variables on 3 factors, with the first two predicting the 3rd
title<-
#measurement (factor) model for 3 factors and 9 variables fmodel <- matrix(c (.9, 0, 0, .8, 0, 0, .7, .7, 0, 0, .6, 0, 0, -.8, 0, 0, 0, .7, 0, 0, .6, 0, 0, .5), nrow=numberofvariables,ncol=numberoflatent,byrow=TRUE)
#structural model for 3 factors (the diagonals reflect unique variance, the off diagonals the structure coefficients beta1 <- .7 beta2 <- .6 uniquey <- sqrt(1-(beta1^2 + beta2^2)) effect <- matrix(c ( 1,0,beta1, 0,1,beta2, 0,0,uniquey),nrow=3,ncol=3,byrow=TRUE) observedscore <- mes(fmodel,effect) #1000 subjects is the default
round(cor(observedscore),2)
GREV=round(observedscore[,1]*100+500,0) GREQ=round(observedscore[,2]*100+500,0)
2 de 3
04/11/2011 23:57
http://www.personality-project.org/r/r.datageneration.html
GREA=round(observedscore[,3]*100+500,0) Ach=round(observedscore[,4]*10+50,0) Anx=round(-observedscore[,5]*10+50,0) Prelim=round(observedscore[,6]+10,0) GPA=round(observedscore[,7]*.5+4,2) MA=round(observedscore[,8]*.5+3,1) data=data.frame(GREV,GREQ,GREA,Ach,Anx,Prelim,GPA,MA) summary(data) #basic summary statistics round(cor(data),2) #show the resulting correlations #it is, of course, identical to the previous one
#example 6 9 variables on 3 factors, with the first two predicting the 3rd # The particular example assumes that there are 3 measures of ability (GREV, GREQ, GREA), two measures of motivation (achievment #motivation and anxiety), a
title<-
#measurement (factor) model for 3 factors and 9 variables fmodel <- matrix(c (.9, 0, 0, .8, 0, 0, .7, 0, 0, 0, .6, 0, 0, .8, 0, 0, 0, .7, 0, 0, .6, 0, 0, .5), nrow=numberofvariables,ncol=numberoflatent,byrow=TRUE)
#structural model for 3 factors (the diagonals reflect unique variance, the off diagonals the structure coefficients effect <- matrix(c(1,0,.7, 0,1,.6, 0,.0,.39),nrow=numberoflatent,byrow=TRUE) observedscore <- mes(fmodel,effect) #1000 subjects is the default
round(cor(observedscore),2)
#show the correlation matrix #give the data "realistic" properties sds <-c(100,100,100,10,10,1,.5,.5,rep(1,8)) means <- c(500,500,500,50,50,10,4,3,rep(0,8)) t <-observedscore *sds+means GREV=round(observedscore[,1]*100+500,0) GREQ=round(observedscore[,2]*100+500,0) GREA=round(observedscore[,3]*100+500,0) Ach=round(observedscore[,4]*10+50,0) Anx=round(-observedscore[,5]*10+50,0) Prelim=round(observedscore[,6]+10,0) GPA=round(observedscore[,7]*.5+4,2) MA=round(observedscore[,8]*.5+3,1) data=data.frame(GREV,GREQ,GREA,Ach,Anx,Prelim,GPA,MA) summary(data) #basic summary statistics round(cor(data),2) #show the resulting correlations #it is, of course, identical to the previous one
part of a short guide to R Version of May 4, 2004 - revised October 22, 2005 to be somewhat more readable
3 de 3
04/11/2011 23:57