
Analyzing Missing Data: Problems Using Scripts

This document discusses analyzing missing data through SPSS. It will examine missing data using SPSS statistics and procedures, then use an SPSS script to produce output for missing data analysis without requiring individual commands. Three key issues for evaluating missing data are discussed: the number of cases missing per variable, the number of variables missing per case, and the pattern of correlations among variables representing missing and valid data. Examples are provided to identify the number of missing cases for each variable, create a variable counting the number of missing values per case, and compute dichotomous valid/missing variables for further analysis of the missing data pattern.

Uploaded by Melvin A. Vidar
Copyright: © All Rights Reserved

SW388R7
Data Analysis & Computers II
Slide 1
Analyzing Missing Data
Introduction
Problems
Using Scripts
Slide 2
Missing data and data analysis

Missing data is a problem in multivariate data because a case will be excluded from the analysis if it is missing data for any variable included in the analysis.

If our sample is large, we may be able to allow cases to be excluded.

If our sample is small, we will try to use a substitution method so that we can retain enough cases to have sufficient power to detect effects.

In either case, we need to make certain that we understand the potential impact that missing data may have on our analysis.
Slide 3
Tools for evaluating missing data

SPSS has a specific package for evaluating missing data, but it is not included under the UT license.

In place of this package, we will first examine missing data using SPSS statistics and procedures.

After studying the standard SPSS procedures that we can use to examine missing data, we will use an SPSS script that will produce the output needed for missing data analysis without requiring us to issue all of the SPSS commands individually.
Slide 4
Key issues in missing data analysis

We will focus on three key issues for evaluating missing data:

The number of cases missing per variable
The number of variables missing per case
The pattern of correlations among variables created to represent missing and valid data

Further analysis may be required depending on the problems identified in these analyses.
Slide 5
Problem 1
1. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
The variables "number of hours worked in the past week" and "employment status" are missing data for more than half of the cases in the data set and should be examined carefully before deciding how to handle missing data.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
Slide 6
Identifying the number of cases in the data set
This problem asks whether a variable is missing data for more than half the cases. Our first task is to identify the number of missing cases that would meet that criterion.
If we scroll to the bottom of the data set, we see that there are 270 cases in the data set.
270 / 2 = 135.
If any variable included in the analysis has more than 135 missing cases, the answer to the problem will be true.
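The half-of-the-cases check can be sketched in Python. This is a minimal illustration, not SPSS output. It assumes each variable is stored as a list of values, with a missing value (system- or user-missing) represented as None. The names `missing_per_variable` and `variables_missing_more_than_half` are hypothetical helpers, and the four-case data set is made up.

```python
from typing import Dict, List, Optional

Data = Dict[str, List[Optional[float]]]

def missing_per_variable(data: Data) -> Dict[str, int]:
    """Count missing (None) values per variable, like the Missing row of the Frequencies table."""
    return {name: sum(value is None for value in values) for name, values in data.items()}

def variables_missing_more_than_half(data: Data) -> List[str]:
    """Return the variables that are missing data for more than half of the cases."""
    counts = missing_per_variable(data)
    flagged = []
    for name, values in data.items():
        half = len(values) / 2  # for the GSS2000 example: 270 / 2 = 135
        if counts[name] > half:
            flagged.append(name)
    return flagged

# Hypothetical four-case data set
toy = {"wrkstat": [1, 2, 1, 3], "hrs1": [40, None, None, None]}
print(missing_per_variable(toy))               # {'wrkstat': 0, 'hrs1': 3}
print(variables_missing_more_than_half(toy))   # ['hrs1']
```

The comparison is a strict inequality. With 270 cases, a variable missing exactly 135 values would not be flagged.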
Slide 7
Request frequency distributions
We will use the output for frequency distributions to find the number of missing cases for each variable.
Select the Descriptive Statistics > Frequencies... command from the Analyze menu.
Slide 8
Completing the specification for frequencies
First, move the five variables included in the problem statement to the list box for variables.
Second, click on the OK button to complete the request for statistical output.
Slide 9
Number of missing cases for each variable
In the table of statistics at the top of the Frequencies output, there is a table detailing the number of missing cases for each variable in the analysis.
None of the variables has more than 135 missing cases, although "number of hours worked in the past week" comes close.
The answer to the question is false.
Slide 10
Problem 2
2. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
14 cases are missing data for more than half of the variables in the analysis and should be examined carefully before deciding how to handle missing data.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
Slide 11
Create a variable that counts missing data
We want to know how many of the five variables in the analysis had missing data for each case in the data set.
We will create a variable containing this information that uses an SPSS function to count the number of variables with missing data.
To compute a new variable, select the Compute... command from the Transform menu.
Slide 12
Enter specifications for new variable
First, type in the name for the new variable, nmiss, in the Target Variable text box.
Second, scroll down the list of functions and highlight the NMISS function.
Third, click on the up arrow button to move the NMISS function into the Numeric Expression text box.
Slide 13
Enter specifications for new variable
The NMISS function is moved into the Numeric Expression text box.
To add the list of variables to count missing data for, we first highlight the first variable to include in the function, wrkstat.
Second, click on the right arrow button to move the variable name into the function arguments.
Slide 14
Enter specifications for new variable
First, before we add another variable to the function, we type a comma to separate the names of the variables.
Second, to add the next variable, we highlight the second variable to include in the function, hrs1.
Third, click on the right arrow button to move the variable name into the function arguments.
Slide 15
Complete specifications for new variable
Continue adding variables to the function until all of the variables specified in the problem have been added. Be sure to type a comma between the variable names.
When all of the variables have been added to the function, click on the OK button to complete the specifications.
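SPSS's NMISS function returns, for each case, how many of its arguments are missing. Below is a rough Python equivalent of that per-case count. The assumptions are:

- The data set is a list of case dictionaries.
- None marks a missing value, and user-missing codes are not handled.
- A variable absent from a case also counts as missing.
- The function name `nmiss` and the case values are illustrative only.

```python
from typing import Dict, List, Optional, Sequence

Case = Dict[str, Optional[float]]

def nmiss(case: Case, variables: Sequence[str]) -> int:
    """Count how many of the listed variables are missing (None or absent) for one case."""
    return sum(case.get(var) is None for var in variables)

VARIABLES = ["wrkstat", "hrs1", "wrkslf", "wrkgovt", "prestg80"]

# Hypothetical cases; the values are made up for illustration
cases: List[Case] = [
    {"wrkstat": 1, "hrs1": 40, "wrkslf": 2, "wrkgovt": 2, "prestg80": 45},
    {"wrkstat": 5, "hrs1": None, "wrkslf": 2, "wrkgovt": 1, "prestg80": 38},
    {"wrkstat": 7, "hrs1": None, "wrkslf": None, "wrkgovt": None, "prestg80": None},
]

# Store the count on each case, like the new nmiss column in the Data Editor
for case in cases:
    case["nmiss"] = nmiss(case, VARIABLES)

print([case["nmiss"] for case in cases])  # [0, 1, 4]
```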
Slide 16
The nmiss variable in the data editor
If we scroll the worksheet to the right, we see the new variable that SPSS has just computed for us.
Slide 17
A frequency distribution for nmiss
To answer the question of how many cases had each of the possible numbers of missing values, we create a frequency distribution.
Select the Descriptive Statistics > Frequencies... command from the Analyze menu.
Slide 18
Completing the specification for frequencies
First, move the nmiss variable to the list of variables.
Second, click on the OK button to complete the request for statistical output.
Slide 19
The frequency distribution
SPSS produces a frequency distribution for the nmiss variable.
170 cases had valid, non-missing values for all 5 variables. 85 cases had one missing value; 1 case had 2 missing values; and 14 cases had missing values for 4 variables.
Slide 20
Answering the problem
The problem asked whether or not 14 cases had missing data for more than half the variables. For a set of five variables, cases that had 3, 4, or 5 missing values would meet this requirement.
The number of cases with 3, 4, or 5 missing values is 14.
The answer to the problem is true.
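The frequency table for nmiss and the more-than-half count can be reproduced with `collections.Counter`. This sketch does not read GSS2000.sav. It rebuilds a list of nmiss values from the counts in the frequency distribution for nmiss reported above: 170, 85, 1, and 14 cases with 0, 1, 2, and 4 missing values. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable

def frequency_table(nmiss_values: Iterable[int]) -> Counter:
    """Number of cases with each possible count of missing values."""
    return Counter(nmiss_values)

def cases_missing_more_than_half(freq: Counter, n_variables: int) -> int:
    """Total cases whose missing count exceeds half the variables (3, 4, or 5 of 5)."""
    return sum(count for k, count in freq.items() if k > n_variables / 2)

# nmiss values implied by the reported frequency distribution
nmiss_values = [0] * 170 + [1] * 85 + [2] * 1 + [4] * 14
freq = frequency_table(nmiss_values)

print(sorted(freq.items()))                   # [(0, 170), (1, 85), (2, 1), (4, 14)]
print(sum(freq.values()))                     # 270
print(cases_missing_more_than_half(freq, 5))  # 14
```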
Slide 21
Problem 3
3. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.
After excluding cases with missing data for more than half of the variables from the analysis if necessary, the presence of statistically significant correlations in the matrix of dichotomous missing/valid variables suggests that the missing data pattern may not be random.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
Slide 22
Compute valid/missing dichotomous variables
To evaluate the pattern of missing data, we need to compute dichotomous valid/missing variables for each of the five variables included in the analysis.
We will compute the new variables using the Recode command.
To create the new variable, select the Recode > Into Different Variables... command from the Transform menu.
Slide 23
Enter specifications for new variable
First, move the first variable in the analysis, wrkstat, into the Numeric Variable -> Output Variable text box.
Second, type the name for the new variable into the Name text box. My convention is to add an underscore character to the end of the variable name.

If this would make the variable name more than 8 characters long, delete characters from the end of the original variable name.
Slide 24
Enter specifications for new variable
Next, type the label for the new variable into the Label text box. My convention is to add the phrase "(Valid/Missing)" to the end of the variable label for the original variable.
Finally, click on the Change button to add the name of the dichotomous variable to the Numeric Variable -> Output Variable text box.
Slide 25
Enter specifications for new variable
To specify the values for the new variable, click on the Old and New Values... button.
Slide 26
Change the value for missing data
The dichotomous variable should be coded 1 if the variable has a valid value, 0 if the variable has a missing value.
First, mark the System- or user-missing option button.
Second, type 0 in the Value text box.
Third, click on the Add button to include this change in the Old -> New list box.
Slide 27
Change the value for valid data
First, mark the All other values option button.
Second, type 1 in the Value text box.
Third, click on the Add button to include this change in the Old -> New list box.
Slide 28
Complete the value specifications
Having entered the values for recoding the variable into dichotomous values, we click on the Continue button to complete this dialog box.
Slide 29
Complete the recode specifications
Having entered specifications for the new variable and the values for recoding the variable into dichotomous values, we click on the OK button to produce the new variable.
Slide 30
The dichotomous variable
The procedure for creating a dichotomous valid/missing variable is repeated for the four other variables in the analysis: hrs1, wrkslf, wrkgovt, and prestg80.
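The recode steps and the naming convention can be summarized in a short Python sketch. It follows the slides' rules:

- A missing value is recoded to 0 and all other values to 1.
- An underscore is appended to each new name, and the original name is trimmed if the result would exceed 8 characters.
- The original variables are left intact.

It assumes missing values (including any user-missing codes) have already been converted to None. `dichotomous_name`, `recode_valid_missing`, and `add_dichotomies` are hypothetical helpers, not SPSS functions, and the data values are made up.

```python
from typing import Dict, List, Optional

MAX_NAME_LENGTH = 8  # variable-name limit assumed by the naming convention

def dichotomous_name(name: str) -> str:
    """Append '_' to the name, trimming the original name if the result would be too long."""
    if len(name) + 1 > MAX_NAME_LENGTH:
        name = name[: MAX_NAME_LENGTH - 1]
    return name + "_"

def recode_valid_missing(values: List[Optional[float]]) -> List[int]:
    """Recode into a different variable: missing -> 0, all other values -> 1."""
    return [0 if v is None else 1 for v in values]

def add_dichotomies(data: Dict[str, list], variables: List[str]) -> None:
    """Create one valid/missing variable per listed variable, leaving the originals intact."""
    for var in variables:
        data[dichotomous_name(var)] = recode_valid_missing(data[var])

# Hypothetical three-case data set
data = {"wrkstat": [1, 5, 7], "hrs1": [40, None, None], "prestg80": [45, 38, None]}
add_dichotomies(data, ["wrkstat", "hrs1", "prestg80"])
print(data["hrs1_"])                 # [1, 0, 0]
print(dichotomous_name("prestg80"))  # prestg8_
```

Under this convention, wrkstat becomes wrkstat_ (exactly 8 characters). prestg80 becomes prestg8_, because prestg80_ would be 9 characters long.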
Slide 31
Filtering cases with excessive missing variables
The problem calls for us to exclude cases that have missing data for more than half of the variables.
We do this by selecting in, or filtering, cases that have missing data for no more than half of the variables, i.e. fewer than 3 missing variables.
To filter cases included in further analysis, we choose the Select Cases... command from the Data menu.
Slide 32
Enter specifications for selecting cases
First, click on the If condition is satisfied option button on the Select panel.
Second, click on the If... button to enter the criteria for including cases.
Slide 33
Enter specifications for selecting cases
First, enter the criteria for including cases: nmiss < 3
Second, click on the Continue button to complete the If specification.
Slide 34
Complete the specifications for selecting cases
To complete the specifications, click on the OK button.
Slide 35
Cases excluded from further analyses
SPSS marks the cases that will not be included in further analyses by drawing a slash mark through the case number.
We can verify that the selection is working correctly by noting that the case which is omitted had 4 missing variables.
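Selecting cases with the condition nmiss < 3 leaves the unselected cases in the data file but excludes them from later procedures. A Python analog follows. It assumes each case already carries its nmiss count and rebuilds the 270 nmiss values from the frequency distribution for nmiss. `select_cases` and `filter_flags` are illustrative names, not SPSS APIs.

```python
from typing import Dict, List

def select_cases(cases: List[Dict[str, int]], max_missing: int) -> List[Dict[str, int]]:
    """Keep cases whose nmiss is below max_missing; mirrors the condition nmiss < 3."""
    return [case for case in cases if case["nmiss"] < max_missing]

def filter_flags(cases: List[Dict[str, int]], max_missing: int) -> List[int]:
    """Mimic a filter column: 1 = selected, 0 = excluded (the case stays in the file)."""
    return [1 if case["nmiss"] < max_missing else 0 for case in cases]

# nmiss counts for 270 cases, matching the frequency distribution for nmiss
cases = [{"nmiss": k} for k in [0] * 170 + [1] * 85 + [2] * 1 + [4] * 14]
selected = select_cases(cases, max_missing=3)
print(len(selected))  # 256
```

The 256 retained cases (170 + 85 + 1) match the N reported for every pair in the correlation matrix that follows.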
Slide 36
Correlating the dichotomous variables
To compute a correlation matrix for the dichotomous variables, select the Correlate command from the Analyze menu.
Slide 37
Specifications for correlations
First, move the dichotomous variables to the variables list box.
Second, click on the OK button to complete the request.
Slide 38
The correlation matrix

Correlations
Pearson Correlation, with Sig. (2-tailed) in parentheses; N = 256 for every pair.

(1) LABOR FORCE STATUS (Valid/Missing)
(2) NUMBER OF HOURS WORKED LAST WEEK (Valid/Missing)
(3) R SELF-EMP OR WORKS FOR SOMEBODY (Valid/Missing)
(4) GOVT OR PRIVATE EMPLOYEE (Valid/Missing)
(5) RS OCCUPATIONAL PRESTIGE SCORE (1980) (Valid/Missing)

       (1)    (2)            (3)            (4)    (5)
(1)    .a     .a             .a             .a     .a
(2)    .a     1              -.049 (.437)   .a     -.042 (.501)
(3)    .a     -.049 (.437)   1              .a     -.010 (.877)
(4)    .a     .a             .a             .a     .a
(5)    .a     -.042 (.501)   -.010 (.877)   .a     1

a. Cannot be computed because at least one of the variables is constant.
The correlation matrix is symmetric along the diagonal (shown by the blue line). The correlation for any pair of variables is included twice in the table, so we only count the correlations below the diagonal (the cells with the yellow background).
Slide 39
The correlation matrix
The correlations marked with footnote a could not be computed because one of the variables was a constant, i.e. the dichotomous variable has the same value for all cases. This happens when one of the valid/missing variables has no missing cases, so that all of the cases have a value of 1 and none have a value of 0.
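The footnote follows from the Pearson formula, which divides by the spread (sum of squared deviations) of each variable. A valid/missing variable with no missing cases is all 1s, so its spread is zero and the ratio is undefined. The minimal sketch below returns None where SPSS prints ".a". The function name and the six-case toy vectors are hypothetical.

```python
import math
from typing import List, Optional

def pearson_r(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson correlation; None when either variable is constant (SPSS prints '.a')."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # zero variance: the correlation cannot be computed
    return sxy / math.sqrt(sxx * syy)

# Hypothetical valid/missing vectors for six cases
wrkstat_ = [1, 1, 1, 1, 1, 1]  # no missing data, so the variable is constant
hrs1_ = [1, 0, 1, 1, 0, 1]
prestg8_ = [1, 1, 0, 1, 0, 1]

print(pearson_r(wrkstat_, hrs1_))            # None
print(round(pearson_r(hrs1_, prestg8_), 3))  # 0.25
```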
Slide 40
The correlation matrix
In the cells for which the correlation could be computed, the probabilities indicating significance are 0.437, 0.501, and 0.877.
None of the correlations is statistically significant at the 0.01 level. The answer to the question is false. We do not need to be concerned about a missing data problem for this set of variables.
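The decision rule here is that a correlation counts as significant only if its two-tailed probability is below 0.01. The sketch below also shows where such probabilities come from. It converts r to a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)).

Python's standard library has no t distribution, so the sketch approximates the two-tailed p-value with the normal distribution via `math.erfc`. That approximation is an assumption: it is reasonable for n = 256 but will not match the SPSS probabilities digit for digit. The function names are hypothetical.

```python
import math

ALPHA = 0.01  # level of significance from the problem statement

def t_statistic(r: float, n: int) -> float:
    """t for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def approx_two_tailed_p(r: float, n: int) -> float:
    """Large-sample approximation: treat t as standard normal (SPSS uses the t distribution)."""
    t = t_statistic(r, n)
    return math.erfc(abs(t) / math.sqrt(2))

# Correlations that could be computed in the matrix above (N = 256)
correlations = {"hrs1 / wrkslf": -0.049, "hrs1 / prestg80": -0.042, "wrkslf / prestg80": -0.010}
for pair, r in correlations.items():
    p = approx_two_tailed_p(r, 256)
    print(pair, round(p, 3), "significant" if p < ALPHA else "not significant")
```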
Slide 41
Using scripts

The process of evaluating missing data requires numerous SPSS procedures and outputs that are time consuming to produce.

These procedures can be automated by creating an SPSS script. A script is a program that executes a sequence of SPSS commands.

Though writing scripts is not part of this course, we can take advantage of scripts that I use to reduce the burdensome tasks of evaluating missing data.
Slide 42
Using a script for missing data

The script "MissingDataCheck.sbs" will produce all of the output we have used for evaluating missing data, as well as other outputs described in the textbook.

Navigate to the link "SPSS Scripts and Syntax" on the course web page.

Download the script file "MissingDataCheck.exe" to your computer and install it, following the directions on the web page.
Slide 43
Open the data set in SPSS
Before using a script, a data set should be open in the SPSS data editor.
Slide 44
Invoke the script
To invoke the script, select the Run Script... command in the Utilities menu.
Slide 45
Select the missing data script
First, navigate to the folder where you put the script. If you followed the directions, you will have a file with an ".SBS" extension in the C:\SW388R7 folder.
If you only see a file with an ".EXE" extension in the folder, you should double click on that file to extract the script file to the C:\SW388R7 folder.
Second, click on the script name to highlight it.
Third, click on the Run button to start the script.
Slide 46
The script dialog
The script dialog box acts similarly to SPSS dialog boxes. You select the variables to include in the analysis and choose options for the output.
Slide 47
Complete the specifications
Select the variables for the analysis. This analysis uses the variables for the example in the textbook.
The checkboxes are marked to produce the output we need for our problems. The only additional option is to compute the t-tests and chi-square tests for all of the variables.
Click on the OK button to produce the output.
Slide 48
The script finishes
If your SPSS output viewer is open, you will see the output produced in that window.
Since it may take a while to produce the output, and since there are times when it appears that nothing is happening, there is an alert to tell you when the script is finished.
Unless you are absolutely sure something has gone wrong, let the script run until you see this alert.
When you see this alert, click on the OK button.
Slide 49
Output from the script
The script will produce lots of output. Additional descriptive material in the titles should help link specific outputs to specific tasks.
