Treatment of Missing Data
[Data excerpt: first lines of the data file, with variables DeptP, AnxtP, GSItP, DeptS, AnxtS, GSItS, SexChild, and Totbpt; missing values are coded -9.]
We read in the data as we normally do in SPSS, in my case as a "dat" file. Then, from the Analyze menu, choose Multiple Imputation and select Impute Missing Values. When you have assigned the variables to their roles, you will have a dialog box that looks like the following.
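SPSS reads the file through its menus, but the same reading step can be sketched outside SPSS. The Python below is a minimal sketch that assumes a whitespace-delimited file with a header row of variable names and -9 as the missing-value code. The function name `read_dat` and the sample rows are hypothetical. It turns every -9 into `None` so a missing cell cannot be mistaken for a real score, which is the same job done by declaring -9 a user-missing value in SPSS.

```python
import io

MISSING_CODE = -9  # assumed missing-value code in the data file


def read_dat(stream):
    """Parse a whitespace-delimited file whose first line holds variable names.

    Returns one dict per case, with MISSING_CODE replaced by None.
    """
    lines = [line.split() for line in stream if line.strip()]
    names, rows = lines[0], lines[1:]
    cases = []
    for fields in rows:
        values = [float(f) for f in fields]
        cases.append({name: (None if v == MISSING_CODE else v)
                      for name, v in zip(names, values)})
    return cases


# Hypothetical two-case file using three of the variable names from the text.
sample = io.StringIO("DeptP AnxtP Totbpt\n50 52 -9\n-9 65 57\n")
cases = read_dat(sample)
print(cases)
```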
http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/MissingDataSPSS.html
7/10/2014
Notice that I have included all nine variables in the imputations, even though I will use only six of them in the regressions. I do this because the extra variables may contribute useful information to the imputed values. For example, suppose that I had a second measure of depression but chose not to use it in the final analysis. That measure would presumably be well correlated with DeptP, and would therefore be useful in imputing missing values for that variable. So I include it here, even though I drop it later.
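To see why a correlated auxiliary variable helps, here is a small Python simulation. It is a simplified sketch, not SPSS's imputation algorithm: it uses a single deterministic regression fill rather than multiple random imputations, and all of the data, including the hypothetical second depression measure `aux`, are simulated. It compares the error of filling in missing DeptP values with the observed mean against filling them in from a regression on `aux`.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(pred, truth):
    """Root mean squared difference between predicted and true values."""
    return (sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)) ** 0.5


random.seed(1)
n = 200
# Simulated DeptP scores and a hypothetical second measure correlated with them.
dep = [random.gauss(55, 8) for _ in range(n)]
aux = [d + random.gauss(0, 3) for d in dep]

# Delete 20% of DeptP completely at random; aux stays fully observed.
miss_idx = sorted(random.sample(range(n), n // 5))
missing = set(miss_idx)
obs = [i for i in range(n) if i not in missing]

# Option 1: ignore aux and fill in the observed mean of DeptP.
mean_fill = statistics.fmean(dep[i] for i in obs)
# Option 2: regress DeptP on aux among complete cases, then predict.
b0, b1 = fit_line([aux[i] for i in obs], [dep[i] for i in obs])

truth = [dep[i] for i in miss_idx]
err_mean = rmse([mean_fill] * len(truth), truth)
err_aux = rmse([b0 + b1 * aux[i] for i in miss_idx], truth)
print(f"RMSE, mean fill:      {err_mean:.2f}")
print(f"RMSE, aux regression: {err_aux:.2f}")
```

Because `aux` tracks DeptP closely, the regression fill lands much nearer the deleted values than the mean does; an imputation model that includes such a variable benefits in the same way.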
The important thing to notice here is the section called "Location of Imputed Data." I have accepted the default and specified that the new dataset be named SPSSImputations. Note that this will NOT create a file with that name in your directory. Instead, it creates a dataset in your current session, to which we will turn shortly.
I am not going to present the output from that procedure because it doesn't get us very far. Basically you will
see a list of variables with their means, standard deviations, etc. from the raw data and from the imputed
data. You should look at that, but it is not very exciting.
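The comparison that output gives you, each variable's mean and standard deviation before and after imputation, is easy to reproduce by hand. This Python sketch assumes missing values are stored as `None` and that a completed copy of the variable is available; the function name `describe` and all numbers are hypothetical, not SPSS output.

```python
import statistics


def describe(values):
    """Return (n, mean, sd) over the non-missing entries of one variable."""
    observed = [v for v in values if v is not None]
    return len(observed), statistics.mean(observed), statistics.stdev(observed)


# Hypothetical DeptP scores: None marks a missing value, and the completed
# copy holds made-up imputed values in those two positions.
raw = [50, None, 73, 61, 65, None, 44]
completed = [50, 58.4, 73, 61, 65, 55.1, 44]

print("raw:     n=%d mean=%.2f sd=%.2f" % describe(raw))
print("imputed: n=%d mean=%.2f sd=%.2f" % describe(completed))
```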
This step of the procedure doesn't look as if it has done much for us, but in fact it has. It has created five datasets containing imputed values, and those are held in SPSSImputations. If you open the Window menu in the main SPSS window, it will offer you the choice of switching to that dataset. You can see this in the following image. There are other choices in that menu because I created other output while writing this page, but you want to select "Untitled[SPSSImputations]-IBM SPSS Statistics Editor." When you make that selection, you will get the following dataset. Notice that it looks like the original, but with a new variable called "Imputation_." This variable takes the values 0 to 5, identifying the imputation that each row comes from. (Imputation_ = 0 refers to the original data file.) You can see part of that data file below, showing the last few lines of the original data and the first few lines of the data from imputation 1. The cells shaded in yellow hold imputed values where the value was missing in the original.
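The stacked layout described above can be mimicked in a few lines of Python. This is a minimal sketch of the data structure only and does no imputing. The completed datasets, the function name `stack_imputations`, and the `filled` field are hypothetical, and missing cells are assumed to be stored as `None`. The `filled` list plays the role of the yellow shading.

```python
def stack_imputations(original, completed):
    """Stack the original rows (Imputation_ = 0) above each completed dataset
    (Imputation_ = 1..m), and record which cells were filled in each row."""
    stacked = []
    for k, dataset in enumerate([original] + completed):
        for orig_row, row in zip(original, dataset):
            # Only imputed copies (k >= 1) have filled-in cells.
            filled = [] if k == 0 else [
                name for name, v in orig_row.items() if v is None]
            stacked.append({"Imputation_": k, **row, "filled": filled})
    return stacked


# Two hypothetical cases and five hypothetical completed copies of them.
original = [{"DeptP": 50, "AnxtP": None}, {"DeptP": None, "AnxtP": 61}]
completed = [[{"DeptP": 50, "AnxtP": 55 + i}, {"DeptP": 60 - i, "AnxtP": 61}]
             for i in range(5)]

long_data = stack_imputations(original, completed)
for row in long_data[:4]:
    print(row)
```

With two cases and five imputations the stacked data have (5 + 1) × 2 = 12 rows, just as the SPSS file holds six copies of every case.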
Now we are ready to do our analysis, but we do it in a somewhat unusual way. If you look back at the first dialog box that I showed you, you will see a note at the bottom referring to a special icon. If you now take this new dataset to the standard Analyze menu, you will see that some of the procedures have this icon next to them. The icon means that if you use this dataset with that procedure, SPSS will recognize that you want to combine the imputed datasets and will allow you to do so. For example, we want to use linear regression to predict Totbpt from five other variables. You set this up as follows:
Notice that I have added "Imputation_" to the box labeled "Selection Variable" and used the "Rule" button to specify that I want it to use all imputations numbered 1 or more. Partial results of this printout follow.
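In effect, the selection rule drops the Imputation_ = 0 rows (the original data, which still contain missing values) and runs the regression once within each imputed dataset. The Python sketch below mirrors that logic with a single predictor for brevity, whereas the analysis in the text uses five. `fit_by_imputation` and the data are hypothetical, and this is not how SPSS is implemented internally.

```python
from collections import defaultdict
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_by_imputation(rows, x, y, min_imputation=1):
    """Keep rows with Imputation_ >= min_imputation, then fit the
    regression separately within each imputed dataset."""
    groups = defaultdict(list)
    for row in rows:
        if row["Imputation_"] >= min_imputation:
            groups[row["Imputation_"]].append(row)
    return {k: fit_line([r[x] for r in g], [r[y] for r in g])
            for k, g in sorted(groups.items())}


# Hypothetical stacked data: imputation 0 still has a missing Totbpt,
# so including it would break the fit; the rule excludes it.
rows = [{"Imputation_": 0, "DeptP": 50, "Totbpt": None}]
rows += [{"Imputation_": k, "DeptP": d, "Totbpt": 10 + 0.5 * d + k}
         for k in range(1, 4) for d in (40, 50, 60)]

fits = fit_by_imputation(rows, "DeptP", "Totbpt")
for k, (intercept, slope) in fits.items():
    print(k, round(intercept, 3), round(slope, 3))
```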
The important part is the last set of output. It shows the regression coefficients, their standard errors, and related statistics for each of the five separate imputations, and then for the "pooled" results. The pooled result is what you were looking for, and it is comparable to what we found in the last printouts for NORM and SAS. The values will not be exactly the same, but they will be reasonably close. I think that the "error message" in that last window is not really an error message. It is simply saying that I did not choose to include Imputation 0, which was the original data.
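Pooled results of this kind are conventionally computed with Rubin's rules. The pooled coefficient is the mean of the m per-imputation estimates. Its variance is the average squared standard error (the within-imputation variance W) plus (1 + 1/m) times the variance of the estimates across imputations (the between-imputation variance B). The Python below is a sketch of those rules for a single coefficient. The function name `pool_rubin` is hypothetical, and SPSS's pooled output may differ in details such as the degrees-of-freedom formula, so treat it as a way to understand the pooled row rather than to reproduce it exactly.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m >= 2 imputations with Rubin's rules.

    Returns the pooled estimate, its standard error, and Rubin's (1987)
    degrees of freedom for the t reference distribution.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)                 # pooled estimate
    w = statistics.fmean(se ** 2 for se in std_errors)  # within-imputation variance W
    b = statistics.variance(estimates)                  # between-imputation variance B (divisor m - 1)
    t = w + (1 + 1 / m) * b                             # total variance T
    if b == 0:
        df = math.inf                                   # no between-imputation spread
    else:
        r = (1 + 1 / m) * b / w                         # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, math.sqrt(t), df
```

For example, `pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])` gives a pooled estimate of 2.0, a standard error of sqrt(7/3) ≈ 1.53, and 6.125 degrees of freedom. The standard error is larger than any single imputation's because the spread between imputations is added to the average within-imputation uncertainty.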
SPSS Syntax
For those who prefer to work with syntax rather than the GUI, the syntax for this analysis follows.