This document discusses analyzing missing data through SPSS. It will examine missing data using SPSS statistics and procedures, then use an SPSS script to produce output for missing data analysis without requiring individual commands. Three key issues for evaluating missing data are discussed: the number of cases missing per variable, the number of variables missing per case, and the pattern of correlations among variables representing missing and valid data. Examples are provided to identify the number of missing cases for each variable, create a variable counting the number of missing values per case, and compute dichotomous valid/missing variables for further analysis of the missing data pattern.
Analyzing Missing Data: Problems Using Scripts
SW388R7 Data Analysis & Computers II

Slide 1: Analyzing Missing Data
Introduction
Problems
Using Scripts

Slide 2: Missing data and data analysis
Missing data is a problem in multivariate data because a case will be excluded from the analysis if it is missing data for any variable included in the analysis.
If our sample is large, we may be able to allow cases to be excluded.
If our sample is small, we will try to use a substitution method so that we can retain enough cases to have sufficient power to detect effects.
In either case, we need to make certain that we understand the potential impact that missing data may have on our analysis.

Slide 3: Tools for evaluating missing data
SPSS has a specific package for evaluating missing data, but it is not included under the UT license.
In place of this package, we will first examine missing data using SPSS statistics and procedures.
After studying the standard SPSS procedures that we can use to examine missing data, we will use an SPSS script that will produce the output needed for missing data analysis without requiring us to issue all of the SPSS commands individually.

Slide 4: Key issues in missing data analysis
We will focus on three key issues for evaluating missing data:
The number of cases missing per variable
The number of variables missing per case
The pattern of correlations among variables created to represent missing and valid data.
Further analysis may be required depending on the problems identified in these analyses.

Slide 5: Problem 1
1. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
The variables "number of hours worked in the past week" and "employment status" are missing data for more than half of the cases in the data set and should be examined carefully before deciding how to handle missing data.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic

Slide 6: Identifying the number of cases in the data set
This problem wants to know if a variable is missing data for more than half the cases. Our first task is to identify the number of cases that meets that criterion. If we scroll to the bottom of the data set, we see that there are 270 cases in the data set. 270 / 2 = 135. If any variable included in the analysis has more than 135 missing cases, the answer to the problem will be true.

Slide 7: Request frequency distributions
We will use the output for frequency distributions to find the number of missing cases for each variable. Select the Descriptive Statistics > Frequencies... command from the Analyze menu.

Slide 8: Completing the specification for frequencies
First, move the five variables included in the problem statement to the list box for variables.
Second, click on the OK button to complete the request for statistical output.

Slide 9: Number of missing cases for each variable
In the table of statistics at the top of the Frequencies output, there is a table detailing the number of missing cases for each variable in the analysis.
None of the variables has more than 135 missing cases, although number of hours worked in the past week comes close. The answer to the question is false.

Slide 10: Problem 2
2. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
14 cases are missing data for more than half of the variables in the analysis and should be examined carefully before deciding how to handle missing data.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic

Slide 11: Create a variable that counts missing data
We want to know how many of the five variables in the analysis had missing data for each case in the data set. We will create a variable containing this information that uses an SPSS function to count the number of variables with missing data. To compute a new variable, select the Compute... command from the Transform menu.

Slide 12: Enter specifications for new variable
First, type in the name for the new variable, nmiss, in the Target variable text box.
Second, scroll down the list of functions and highlight the NMISS function.
Third, click on the up arrow button to move the NMISS function into the Numeric Expression text box.

Slide 13: Enter specifications for new variable
The NMISS function is moved into the Numeric Expression text box.
To add the list of variables to count missing data for, we first highlight the first variable to include in the function, wrkstat.
Second, click on the right arrow button to move the variable name into the function arguments.

Slide 14: Enter specifications for new variable
First, before we add another variable to the function, we type a comma to separate the names of the variables.
Second, to add the next variable we highlight the second variable to include in the function, hrs1.
Third, click on the right arrow button to move the variable name into the function arguments.

Slide 15: Complete specifications for new variable
Continue adding variables to the function until all of the variables specified in the problem have been added. Be sure to type a comma between the variable names.
When all of the variables have been added to the function, click on the OK button to complete the specifications.

Slide 16: The nmiss variable in the data editor
If we scroll the worksheet to the right, we see the new variable that SPSS has just computed for us.
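The two counts used so far, missing cases per variable (Problem 1) and missing variables per case (the nmiss variable), can also be sketched outside SPSS. The following is a minimal Python illustration, not the course's method: the three toy cases are invented, `None` stands in for both system- and user-missing values, and the helper names `nmiss` and `missing_per_variable` are hypothetical. Only the variable names come from the slides.

```python
from typing import Dict, List, Optional

# Variable names taken from the slides.
VARIABLES = ["wrkstat", "hrs1", "wrkslf", "wrkgovt", "prestg80"]

Record = Dict[str, Optional[float]]


def nmiss(record: Record, variables: List[str]) -> int:
    """Count how many of the listed variables are missing for one case."""
    return sum(1 for name in variables if record.get(name) is None)


def missing_per_variable(records: List[Record], variables: List[str]) -> Dict[str, int]:
    """Count how many cases are missing each of the listed variables."""
    return {
        name: sum(1 for record in records if record.get(name) is None)
        for name in variables
    }


# Invented toy data (assumption): None marks a missing value.
cases: List[Record] = [
    {"wrkstat": 1, "hrs1": 40, "wrkslf": 2, "wrkgovt": 2, "prestg80": 51},
    {"wrkstat": 5, "hrs1": None, "wrkslf": 2, "wrkgovt": 2, "prestg80": 36},
    {"wrkstat": 7, "hrs1": None, "wrkslf": None, "wrkgovt": None, "prestg80": None},
]

nmiss_values = [nmiss(case, VARIABLES) for case in cases]
per_variable = missing_per_variable(cases, VARIABLES)

print(nmiss_values)   # missing variables per case
print(per_variable)   # missing cases per variable
```

In this toy data the third case is missing four of the five variables, which is the kind of case the next problem asks about.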
Slide 17: A frequency distribution for nmiss
To answer the question of how many cases had each of the possible numbers of missing values, we create a frequency distribution. Select the Descriptive Statistics > Frequencies... command from the Analyze menu.

Slide 18: Completing the specification for frequencies
First, move the nmiss variable to the list of variables.
Second, click on the OK button to complete the request for statistical output.

Slide 19: The frequency distribution
SPSS produces a frequency distribution for the nmiss variable. 170 cases had valid, non-missing values for all 5 variables. 85 cases had one missing value; 1 case had 2 missing values; and 14 cases had missing values for 4 variables.

Slide 20: Answering the problem
The problem asked whether or not 14 cases had missing data for more than half the variables. For a set of five variables, cases that had 3, 4, or 5 missing values would meet this requirement. The number of cases with 3, 4, or 5 missing values is 14. The answer to the problem is true.
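The arithmetic behind this answer can be checked with a short Python sketch. The frequency counts are the ones reported on the slide; the variable names in the code are hypothetical and chosen only for illustration.

```python
# Frequency distribution of nmiss as reported on the slide:
# number of missing variables -> number of cases.
nmiss_frequencies = {0: 170, 1: 85, 2: 1, 4: 14}

n_variables = 5
total_cases = sum(nmiss_frequencies.values())

# "More than half" of 5 variables means nmiss > 2.5, i.e. 3, 4, or 5.
cases_over_half = sum(
    count
    for n_missing, count in nmiss_frequencies.items()
    if n_missing > n_variables / 2
)

# Cases that would remain if the cases over the threshold were excluded.
cases_retained = total_cases - cases_over_half

print(total_cases, cases_over_half, cases_retained)  # 270 14 256
```

The total of 270 agrees with the number of cases found in the data editor for Problem 1.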
Slide 21: Problem 3
3. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.
After excluding cases with missing data for more than half of the variables from the analysis if necessary, the presence of statistically significant correlations in the matrix of dichotomous missing/valid variables suggests that the missing data pattern may not be random.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic

Slide 22: Compute valid/missing dichotomous variables
To evaluate the pattern of missing data, we need to compute dichotomous valid/missing variables for each of the five variables included in the analysis. We will compute the new variable using the Recode command. To create the new variable, select Recode > Into Different Variables... from the Transform menu.

Slide 23: Enter specifications for new variable
First, move the first variable in the analysis, wrkstat, into the Numeric Variable -> Output Variable text box.
Second, type the name for the new variable into the Name text box. My convention is to add an underscore character to the end of the variable name. If this would make the variable more than 8 characters long, delete characters from the end of the original variable name.

Slide 24: Enter specifications for new variable
Next, type the label for the new variable into the Label text box. My convention is to add the phrase (Valid/Missing) to the end of the variable label for the original variable.
Finally, click on the Change button to add the name of the dichotomous variable to the Numeric Variable -> Output Variable text box.

Slide 25: Enter specifications for new variable
To specify the values for the new variable, click on the Old and New Values... button.

Slide 26: Change the value for missing data
The dichotomous variable should be coded 1 if the variable has a valid value, 0 if the variable has a missing value.
First, mark the System- or user-missing option button.
Second, type 0 in the Value text box.
Third, click on the Add button to include this change in the Old -> New list box.

Slide 27: Change the value for valid data
First, mark the All other values option button.
Second, type 1 in the Value text box.
Third, click on the Add button to include this change in the Old -> New list box.

Slide 28: Complete the value specifications
Having entered the values for recoding the variable into dichotomous values, we click on the Continue button to complete this dialog box.

Slide 29: Complete the recode specifications
Having entered specifications for the new variable and the values for recoding the variable into dichotomous values, we click on the OK button to produce the new variable.

Slide 30: The dichotomous variable
The procedure for creating a dichotomous valid/missing variable is repeated for the four other variables in the analysis: hrs1, wrkslf, wrkgovt, and prestg80.

Slide 31: Filtering cases with excessive missing variables
To filter cases included in further analysis, we choose the Select Cases... command from the Data menu.
The problem calls for us to exclude cases that have missing data for more than half of the variables. We do this by selecting in, or filtering, cases that have fewer than half missing variables, i.e. less than 3 missing variables.

Slide 32: Enter specifications for selecting cases
First, click on the If condition is satisfied option button on the Select panel.
Second, click on the If... button to enter the criteria for including cases.

Slide 33: Enter specifications for selecting cases
First, enter the criteria for including cases: nmiss < 3
Second, click on the Continue button to complete the If specification.

Slide 34: Complete the specifications for selecting cases
To complete the specifications, click on the OK button.

Slide 35: Cases excluded from further analyses
SPSS marks the cases that will not be included in further analyses by drawing a slash mark through the case number. We can verify that the selection is working correctly by noting that the case which is omitted had 4 missing variables.

Slide 36: Correlating the dichotomous variables
To compute a correlation matrix for the dichotomous variables, select the Correlate command from the Analyze menu.

Slide 37: Specifications for correlations
First, move the dichotomous variables to the variables list box.
Second, click on the OK button to complete the request.
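The recode, filter, and correlate steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reproduction of the SPSS procedures: the six toy cases are invented, `None` stands in for any missing value, and the helpers `to_flags` and `pearson` are hypothetical names. The sketch computes the Pearson correlation only; it does not compute the significance levels that SPSS reports.

```python
import math
from typing import Dict, List, Optional

# Variable names taken from the slides.
VARIABLES = ["wrkstat", "hrs1", "wrkslf", "wrkgovt", "prestg80"]


def to_flags(record: Dict[str, Optional[float]]) -> Dict[str, int]:
    """Recode each variable to 1 if valid, 0 if missing."""
    return {name: (0 if record[name] is None else 1) for name in VARIABLES}


def pearson(x: List[int], y: List[int]) -> Optional[float]:
    """Pearson correlation; None when either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined: no variation in at least one variable
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Invented toy data (assumption): None marks a missing value.
cases = [
    {"wrkstat": 1, "hrs1": 40, "wrkslf": 2, "wrkgovt": 2, "prestg80": 51},
    {"wrkstat": 1, "hrs1": None, "wrkslf": 2, "wrkgovt": 2, "prestg80": 36},
    {"wrkstat": 2, "hrs1": 20, "wrkslf": None, "wrkgovt": 1, "prestg80": 44},
    {"wrkstat": 5, "hrs1": None, "wrkslf": None, "wrkgovt": 2, "prestg80": 29},
    {"wrkstat": 1, "hrs1": 45, "wrkslf": 1, "wrkgovt": 2, "prestg80": 63},
    {"wrkstat": 7, "hrs1": None, "wrkslf": None, "wrkgovt": None, "prestg80": None},
]

# Keep cases missing fewer than 3 of the 5 variables (nmiss < 3).
kept = [c for c in cases if sum(v is None for v in c.values()) < 3]

# Build one list of 0/1 flags per variable from the retained cases.
flags = [to_flags(c) for c in kept]
columns = {name: [f[name] for f in flags] for name in VARIABLES}

r_hrs1_wrkslf = pearson(columns["hrs1"], columns["wrkslf"])
r_wrkstat_hrs1 = pearson(columns["wrkstat"], columns["hrs1"])

print(len(kept), r_hrs1_wrkslf, r_wrkstat_hrs1)
```

In this toy data wrkstat is valid for every retained case, so its flag is constant and `pearson` returns `None` for any pair that includes it.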
Slide 38: The correlation matrix

Correlations
Columns (1) to (5) are the same five variables as the rows, in the same order.

                                                 (1)     (2)     (3)     (4)     (5)
(1) LABOR FRCE STATUS (Valid/Missing)
    Pearson Correlation                          .a      .a      .a      .a      .a
    Sig. (2-tailed)                              .       .       .       .       .
    N                                            256     256     256     256     256
(2) NUMBER OF HOURS WORKED LAST WEEK (Valid/Missing)
    Pearson Correlation                          .a      1       -.049   .a      -.042
    Sig. (2-tailed)                              .       .       .437    .       .501
    N                                            256     256     256     256     256
(3) R SELF-EMP OR WORKS FOR SOMEBODY (Valid/Missing)
    Pearson Correlation                          .a      -.049   1       .a      -.010
    Sig. (2-tailed)                              .       .437    .       .       .877
    N                                            256     256     256     256     256
(4) GOVT OR PRIVATE EMPLOYEE (Valid/Missing)
    Pearson Correlation                          .a      .a      .a      .a      .a
    Sig. (2-tailed)                              .       .       .       .       .
    N                                            256     256     256     256     256
(5) RS OCCUPATIONAL PRESTIGE SCORE (1980) (Valid/Missing)
    Pearson Correlation                          .a      -.042   -.010   .a      1
    Sig. (2-tailed)                              .       .501    .877    .       .
    N                                            256     256     256     256     256
a. Cannot be computed because at least one of the variables is constant.

The correlation matrix is symmetric along the diagonal (shown by the blue line). The correlation for any pair of variables is included twice in the table. So we only count the correlations below the diagonal (the cells with the yellow background).

Slide 39: The correlation matrix
The correlations marked with footnote a could not be computed because one of the variables was a constant, i.e. the dichotomous variable has the same value for all cases. This happens when one of the valid/missing variables has no missing cases, so that all of the cases have a value of 1 and none have a value of 0.

Slide 40: The correlation matrix
In the cells for which the correlation could be computed, the probabilities indicating significance are 0.437, 0.501, and 0.877. None of the correlations are statistically significant. The answer to the question is false. We do not need to be concerned about a missing data problem for this set of variables.

Slide 41: Using scripts
The process of evaluating missing data requires numerous SPSS procedures and outputs that are time consuming to produce.
These procedures can be automated by creating an SPSS script. A script is a program that executes a sequence of SPSS commands.
Though writing scripts is not part of this course, we can take advantage of scripts that I use to reduce the burdensome tasks of evaluating missing data.

Slide 42: Using a script for missing data
The script "MissingDataCheck.sbs" will produce all of the output we have used for evaluating missing data, as well as other outputs described in the textbook.
Navigate to the link "SPSS Scripts and Syntax" on the course web page.
Download the script file "MissingDataCheck.exe" to your computer and install it, following the directions on the web page.

Slide 43: Open the data set in SPSS
Before using a script, a data set should be open in the SPSS data editor.

Slide 44: Invoke the script
To invoke the script, select the Run Script... command in the Utilities menu.

Slide 45: Select the missing data script
First, navigate to the folder where you put the script. If you followed the directions, you will have a file with an ".SBS" extension in the C:\SW388R7 folder. If you only see a file with an ".EXE" extension in the folder, you should double click on that file to extract the script file to the C:\SW388R7 folder.
Second, click on the script name to highlight it.
Third, click on the Run button to start the script.

Slide 46: The script dialog
The script dialog box acts similarly to SPSS dialog boxes. You select the variables to include in the analysis and choose options for the output.

Slide 47: Complete the specifications
Select the variables for the analysis. This analysis uses the variables for the example on page 5[?] in the textbook.
The checkboxes are marked to produce the output we need for our problems. The only additional option is to compute the t-tests and chi-square tests for all of the variables.
Click on the OK button to produce the output.

Slide 48: The script finishes
If your SPSS output viewer is open, you will see the output produced in that window.
Since it may take a while to produce the output, and since there are times when it appears that nothing is happening, there is an alert to tell you when the script is finished. Unless you are absolutely sure something has gone wrong, let the script run until you see this alert.
When you see this alert, click on the OK button.
Slide 49: Output from the script
The script will produce lots of output. Additional descriptive material in the titles should help link specific outputs to specific tasks.