SPSS Tutorials

The document provides an overview of using SPSS for data analysis and management. It covers opening and understanding different file types in SPSS, exploring the data and variable views, using the output window to view results, and writing syntax. Descriptive statistics like frequencies, summaries, and graphs for single variables are demonstrated. Methods for selecting cases, transforming variables, and recoding values are explained. Hypothesis testing is also mentioned.


SPSS TUTORIALS

Module 1

Type of file
- Introduction
o To open a file = file > open > data (there are also other file types)
o Then, select dataset and click open > a dataset window and an output
window will appear
o DATA.sav: data view and variable view
o OUTPUT.spo: results shown in a dedicated output window
o SYNTAX.sps: shown in a specific syntax window + can have a script in the SPSS
coding language (e.g. useful when needed to repeat the same command with
small changes)
- Data
o Data view: reports each case in a row and the values of each variable in
columns
o Variable view: for each variable (in rows), there are different types of info
reported in columns
 If we click on the row of the chosen variable, we are moved in its
position within the data view
o Variable view info: name, type (e.g. numeric, string…), label (details the name of the variable), values (reports the legend of the levels, e.g. 1 “yes”, 2 “no”), missing (reports the levels for missing values, e.g. -1 “no answer”), measure (reports the variable type as the scale of measurement used, i.e. scale, ordinal, or nominal)
- Output
o Output window: reports everything that we asked SPSS to do and contains
the results that it provides us for the analysis
o Analyze > descriptive statistics > frequencies > drag variable and click OK
o Log and analysis (e.g. frequencies), which is divided into title, notes, statistics,
and variable question (e.g. do you have a phone?)
o Can be saved: file > save or save as...
- Syntax
o In the output file, log reports the code corresponding to the command
launched through the drop-down menu
 We can use a syntax file to draft the script with the command lines of
the analysis we want to perform according to the SPSS language
o Analyze > descriptive statistics > frequencies and drag the variable to analyze
> if you click “paste” instead of “ok”, then the command will appear in the
syntax window
 > from there, you can select it, click on the play green button, and the
results of the analysis will appear on the output window
 Command name is blue, variable name is black, parameter is red, and
the option is green
 Can add comments using * at the beginning of the phrase
 Can copy and paste commands, e.g. changing variable name and
running the analysis again
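As a sketch, a minimal syntax file following the conventions above might look like this (PESEX and PEEDUCA are the variables used later in these notes):

```spss
* Comment lines start with an asterisk and end with a period.
* Frequencies of the gender variable.
FREQUENCIES VARIABLES=PESEX
  /ORDER=ANALYSIS.

* Copy, paste, and edit the variable name to repeat the analysis.
FREQUENCIES VARIABLES=PEEDUCA
  /ORDER=ANALYSIS.
```

Selecting the lines and clicking the green play button runs them and sends the results to the output window.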
- The four main menus
o In the graphical user interface of SPSS, there are 4 very relevant menus: data,
transform, analyze, and graph
1. Data: collects commands that perform operations at the dataset level, e.g.
merging datasets, sorting cases, filtering cases, extracting a smaller portion of
the dataset by selecting some variables…
2. Transform: commands for managing and preparing the dataset before doing
an analysis, e.g. recoding a variable, creating a new variable through a math
operation on other variables, collapsing some variables etc…
3. Analyze: commands to perform analysis (descriptive statistics and model
computation)
4. Graph: commands to ask SPSS to plot a graph

Data management
- Creating a smaller dataset
o File > save as… > variables… (a new window appears – only the selected
variables will be saved) > “drop all” and select the needed variables >
continue (and paste or save)
- Selecting cases
o Data > select cases > “if condition is satisfied” > “if…” (a new window
appears) > select the needed variables, drag them and write the condition
(e.g. PESEX=2) > continue > choose final output
 “Filter out selected cases”: from the variable view, double-click on the
row of the variable PESEX and you’ll see that the cases with value
different than 2 will be crossed out
 If you launch a command, e.g. analyze > descriptive statistics >
frequencies > PESEX variable chosen > “ok” > the output
analysis will be run only on PESEX=2
 “Copy selected cases to a new dataset” > “ok” > a new dataset will
appear with only the cases where PESEX=2 (remember to save this
window as it won’t automatically save)
 “Delete unselected cases” (not suggested because those cases will be
deleted and won’t be recovered in any way)
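Clicking “paste” instead of “ok” in the Select Cases dialog produces syntax along these lines (a sketch; filter_$ is the temporary filter variable SPSS creates):

```spss
* Filter out cases where PESEX is not 2.
USE ALL.
COMPUTE filter_$=(PESEX=2).
FILTER BY filter_$.
EXECUTE.

* Turn the filter off when done.
FILTER OFF.
USE ALL.
```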
- Transforming a variable
o Transform > compute variable
o “Target variable” (to give it a name) = “Numeric expression” (can search “function group”, e.g. for log, use arithmetic > “Ln”)
 To write in “numeric expression”, either use the blue arrow or rewrite
the function (same thing to select the variable), e.g. LN(PTC1Q10) >
then click “ok”
 Output window result + new variable appears on the main grid
 In variable view, you can write a label for the variable
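The pasted syntax for this transformation is roughly the following sketch (LN_PTC1Q10 is just an illustrative target name):

```spss
* Create a new variable as the natural log of PTC1Q10
* (LN_PTC1Q10 is an illustrative name for the target variable).
COMPUTE LN_PTC1Q10=LN(PTC1Q10).
VARIABLE LABELS LN_PTC1Q10 'Log of number of visits'.
EXECUTE.
```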
- Recoding the values of a variable
o E.g. PEEDUCA is a categorical variable with many categories, so we might
want to reduce them (have only 3) in order to lower complexity
o Transform > “recode into different variables…” > drag chosen variable > on the
right, write name and label and press “change”
o Click “old and new values” (a new window appears) > “system-missing” to
“system-missing” and add, “system- or user- missing” to “system-missing”
and add (don’t forget to do this!)
 Then, old “value” e.g. 31, to new “value” e.g. 1 and click add
 Then, old “range” e.g. 32 through 40, to new “value” e.g. 2
 Then, old “range, value through highest” e.g. 41, to new “value” e.g. 3
 Click “continue” (you’ll go back to the previous window) and “ok” > results will appear in the output window (reporting the syntax) and in the data view (the new variable will show)
o Useful when creating a dummy variable: identifying that the case belongs to a
category of interest, with value 1 (of interest) or 0 (not of interest)
 Transform > “recode into different variables…” > “reset” to clean up
 Drag chosen variable and write name & label > “old and new values…”
 Do everything as before with the system-missing and user-missing
 Old “value” e.g. 31 to new “value” e.g. 1 (value of interest), and add
 Old “all other values” to new “value” e.g. 0, “add” > “continue” >
“change” > “ok”
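Both recodes can be written directly in syntax; a sketch following the example cut-offs in these notes (R_EDU and DU_less_first_grade are the names used in Module 2):

```spss
* Recode PEEDUCA into a new 3-level variable R_EDU
* (cut-offs follow the example in these notes).
RECODE PEEDUCA (SYSMIS=SYSMIS) (MISSING=SYSMIS)
  (31=1) (32 thru 40=2) (41 thru Highest=3) INTO R_EDU.
VARIABLE LABELS R_EDU 'Education recoded into 3 levels'.
EXECUTE.

* Dummy variable: 1 for the category of interest, 0 otherwise.
RECODE PEEDUCA (MISSING=SYSMIS) (31=1) (ELSE=0) INTO DU_less_first_grade.
EXECUTE.
```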

Analysis of one variable – descriptive statistics


- Frequency distribution
o Analyze > descriptive statistics > frequencies > drag variable and click “ok”
o In the output window, a table with results will appear
 At the top, the label of the variable reported will be shown
 In the first column, the values taken by the variable are shown (or
corresponding label)
 In the second column, the absolute frequencies (number of
respondents who gave a certain answer) are shown
 In the third column, the relative frequencies are shown (percentage
computed out of total value)
 In the fourth column, the percentages considering only those who
actually gave an answer are shown (total excluding missing values)
 In the fifth column, cumulative percentages are shown: for a
numerical variable, this means that for example 80% of visits to art
museums are not greater than 4
- Summary measures
o Analyze > descriptive statistics > frequencies > drag variable and click
“statistics”
o In the new window, you can choose the measures for the output file (e.g.
mean, median, and standard deviation) > click “continue” and “ok”
o In the output window, a new “statistics” table will appear
- Plotting a graph
o Analyze > descriptive statistics > frequencies > drag variable and click “charts”
o In the new window, choose the chart type (e.g. histogram for numerical)
 You can also compare the distribution of the variable to a normal
distribution by clicking on the box “show normal curve on histogram”
 Click “continue” and “ok” > in the output window, the plot will appear
o To check for the normal distribution, you can also click “analyze > “descriptive
statistics” > “Q-Q Plots”
 In the dialogue window, drag the variable and click “ok” > the output
window will show the Q-Q Plot
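The summary measures, histogram, and Q-Q plot together can be sketched in syntax as:

```spss
* Frequencies with summary measures and a histogram with normal curve.
FREQUENCIES VARIABLES=PTC1Q10
  /STATISTICS=MEAN MEDIAN STDDEV
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.

* Normal Q-Q plot for the same variable.
PPLOT
  /VARIABLES=PTC1Q10
  /TYPE=Q-Q
  /DIST=NORMAL.
```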

Analysis of one variable – statistical inference


- Hypothesis testing
o E.g. suppose you want to test the hypothesis that the average number of
visits to art museums or galleries in the past 12 months is greater than 3
o Analyze > compare means > one-sample T test
 In the new window, drag the chosen variable (e.g. number of…) and
set the test value as 3, then click “ok”
o In the output window, a new table will appear, reporting the test statistic, the
number of degrees of freedom of the t distribution of the test statistic, the
significance (p-value > in this case, we have one-sided test so the p-value is
the shown number divided by 2)
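A sketch of the corresponding syntax for this one-sample test:

```spss
* One-sample t test of H0: mean = 3. SPSS reports a two-sided
* p-value; halve it for the one-sided test described above.
T-TEST
  /TESTVAL=3
  /MISSING=ANALYSIS
  /VARIABLES=PTC1Q10
  /CRITERIA=CI(.95).
```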
- Confidence intervals
o Analyze > descriptive statistics > explore
o Drag variable to “dependent list” and click “statistics” > “descriptives” and
choose confidence level for the mean > “continue” and “ok”
o In the output window, a new table “descriptives” will appear, showing the
confidence intervals (e.g. we are 90% confident that, for those who visited art
galleries and museums, the average number of visits is included within 3.03
and 3.25)
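The Explore dialog pastes syntax roughly like this (here with the 90% confidence level of the example):

```spss
* Descriptives table with a 90% confidence interval for the mean.
EXAMINE VARIABLES=PTC1Q10
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 90
  /MISSING LISTWISE.
```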

Evaluating association
- Crosstabs
o To study the association between two variables that are categorical or numerical with a small number of values, we can use crosstabs
 E.g. understand if visits to an art museum or gallery are related to
gender
o Analyze > descriptive statistics > crosstabs
 Put one variable in rows and the other in columns
 To evaluate the presence of association, we need conditional frequencies: click on “cells” > click “row” (relative frequencies of the variable in column conditioned on the variable in row) and “column” (for the opposite) > “continue”
 To carry out a test for independence, click on “statistics” and “chi-
square” > “continue” and “ok”
o The output will show a new table “crosstabulation”
 First row (count) = joint absolute frequencies
 Second row shows relative frequencies by row
 Third row shows relative frequencies by column
o For example:
 1260 females interviewed answered “yes”, while 3071 males answered “no”
 Among males, 77.4% answered “no”
 Among those who answered “yes”, 58.4% are females
o The output will also show a new table “chi-square tests”
 In the last column, we have the p-value (exact sig.)
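A sketch of the pasted crosstabs syntax (museum_visit is a placeholder name for the yes/no response variable):

```spss
* Crosstab with row and column percentages and a chi-square test.
* museum_visit is a placeholder name for the yes/no variable.
CROSSTABS
  /TABLES=PESEX BY museum_visit
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT ROW COLUMN
  /COUNT ROUND CELL.
```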
- Boxplot and group means
o To understand whether a numerical variable changes its behavior for different
groups/individuals/objects defined by the categories of an ordinal or nominal
variable, we can use a boxplot to compare the distribution of the quantitative
variable in each group
o Graph > legacy dialogs > boxplot > keep default settings and click “define”
 E.g. boxplot of number of visits by gender: drag “number of visits…” to
“variable” and drag “sex” to “category axis” > “ok” and the boxplot will
appear in the output window
o Another method: analyze > descriptive statistics > explore > drag “number of
visits…” to “dependent list” and “sex” to “factor list” > “ok” and the
“descriptives” table will appear in the output window
 There, we can compare the means
 The boxplot will also be shown
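The Explore route can be sketched in syntax as:

```spss
* Boxplots and group descriptives of number of visits by gender.
EXAMINE VARIABLES=PTC1Q10 BY PESEX
  /PLOT=BOXPLOT
  /STATISTICS=DESCRIPTIVES
  /NOTOTAL.
```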
- Linear relationship
o To assess the sign and strength of the linear relationship between 2 numerical
variables (e.g. number of visits and age), we could use the Pearson’s
correlation coefficient
o Analyze > correlate > bivariate
 Drag the two variables and select the Pearson correlation coefficient >
“ok” and the “correlations” matrix will show in the output window
 In the example, the Pearson correlation is 0.006 and Sig. (p-value) is 0.77 = no significant linear relationship between the 2 variables
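A sketch of the corresponding syntax (PRTAGE is a placeholder name for the age variable):

```spss
* Pearson correlation between number of visits and age.
* PRTAGE is a placeholder name for the age variable.
CORRELATIONS
  /VARIABLES=PTC1Q10 PRTAGE
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```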
Module 2

Linear regression command


- New variables (to run a regression analysis, the relevant categorical variables must first be transformed into dummy variables before being included as independent variables)
o R_EDU
o DU_less_first_grade
o DU_more_high_school
o DU_male
o DU_female
- Analyze > regression > linear
o E.g. regress number of visits on age, gender, and level of education
o Drag “number of visits…” to dependent and everything else (age, less than
first grade, more than high school, male) > “ok”
- Three tables will appear in the output window:
o Model summary
o ANOVA
o Coefficients
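A sketch of the pasted regression syntax, using the dummy variables listed above (PRTAGE is a placeholder name for the age variable):

```spss
* Regress number of visits on age and the dummy variables.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /DEPENDENT PTC1Q10
  /METHOD=ENTER PRTAGE DU_male DU_less_first_grade DU_more_high_school.
```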

Violation of the model assumptions


- To check for violations, analyze > regression > linear
o Drag relevant variables and click “statistics” > on top of the default options, select “part and partial correlations” (will add 3 columns to the coefficients table: zero-order, partial, and part correlations) and “collinearity diagnostics” (will add 2 columns to the coefficients table: tolerance and VIF)
 Select “Durbin-Watson”: used for detecting serial correlation of the
error terms (another possible violation) > “continue”
o “Continue” will lead us to the previous table > select “plots” > select
“produce all partial plots” (to assess the event of violation of the linearity
assumption)
o “Scatter 1 of 1”: scatter plot to check for homoscedasticity of error terms
 Drag “ZPRED” to “X” and “ZRESID” to “Y”
o Select “histogram” and “normal probability plot” to see if the error terms are
normally distributed
o Then, click “continue” and “ok”
- The output window will show the tables that we need to check for violations
o Violation of the linearity assumption: check partial regression plots
o Multicollinearity assumption: check “coefficients” table
o Homoscedasticity of error terms: check the scatterplot of the standardized
residual against the standardized predicted value
o Normality of the error terms: check the histogram and P-P plot
o No correlation of the error terms: check “model summary” table (look at
Durbin-Watson value)
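Together, the diagnostic options above correspond roughly to this syntax sketch (same placeholder variable names as before):

```spss
* Regression with the diagnostic options described above:
* ZPP = zero-order/partial/part correlations, COLLIN TOL = VIF and tolerance.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA ZPP COLLIN TOL
  /DEPENDENT PTC1Q10
  /METHOD=ENTER PRTAGE DU_male DU_less_first_grade DU_more_high_school
  /PARTIALPLOT ALL
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID).
```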
Outliers and influential cases
- Identifying outliers
o Analyze > regression > linear > select the variables and click statistics > casewise diagnostics > continue and ok
o In the output window, there will be a list of outlier cases with their position in
the dataset
- Identifying influential cases
o Analyze > regression > linear > select the variables and click statistics > save >
select standardized residuals, Cook’s distance, and DfFit > continue and ok
 In the output window, a “residuals statistics” table will appear
o Analyze > descriptive statistics > descriptives > select relevant statistics > options (make sure that maximum is selected) > continue and ok
 In the output window, a “descriptive statistics” table will appear
 If maximum values of Cook’s distance and DFFIT are < 1, there are no
influential cases
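A syntax sketch of the two steps (COO_1 and DFF_1 are the default names SPSS assigns to the saved measures; variable names as before):

```spss
* Flag outliers and save influence measures.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT PTC1Q10
  /METHOD=ENTER PRTAGE DU_male DU_less_first_grade DU_more_high_school
  /CASEWISE PLOT(ZRESID) OUTLIERS(3)
  /SAVE ZRESID COOK DFFIT.

* Check the maxima of the saved influence measures.
DESCRIPTIVES VARIABLES=COO_1 DFF_1
  /STATISTICS=MAX.
```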

Module 3

Odds ratio
- For a binary categorical variable, the odds ratio can be used to evaluate association
o E.g. the association between the response variable (visits to art museums) and the variable sex
- Analyze > descriptive statistics > crosstabs
o Response variable in column and independent variables in rows
o Cell > percentages = row > continue
o Statistics > risk and chi-square > ok
- Output = risk estimate shows the odds ratio of visiting an art museum or gallery,
comparing males against females
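A sketch of the corresponding syntax (museum_visit is a placeholder name for the yes/no response variable):

```spss
* Row percentages plus the risk estimate (odds ratio) and chi-square.
* museum_visit is a placeholder name for the yes/no variable.
CROSSTABS
  /TABLES=PESEX BY museum_visit
  /STATISTICS=CHISQ RISK
  /CELLS=COUNT ROW.
```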

Binary logistic regression


- SPSS can automatically fit a logistic model
- Analyze > regression > binary logistic
o Select dependent variable and covariates
o To signal categorical variables: categorical > select variables and choose
reference category (first or last) > continue
o Options > Hosmer-Lemeshow goodness-of-fit and CI for exp(B) (choose %) >
continue
o For measure of outliers: options > Casewise listing of residuals > continue
o For influential cases: save > Cook’s > continue > ok
- Output tables:
o Case processing summary = how many cases were retained for the analysis
o Dependent variable encoding = how the response variable has been recoded
 E.g. yes = 0 and no = 1 means that the odds ratio will describe the log of the odds of not having visited / having visited
o Categorical variables codings = how the categorical variables were recoded into dummy variables

 Female is the reference category for sex
 3 is the reference category for education recoded
 Education recoded has 2 dummy variables: the first equals 1 when education recoded is 1 (0 otherwise), and the second equals 1 when education recoded is 2 (0 otherwise)
o Classification table = shows % of correctly classified cases
o Variables in the equation = what we need to know about each covariate (B,
S.E., sig, confidence interval…)
o Casewise list = find out if there are residuals that are higher than 2 standard
deviations
o …
- To analyze influential cases, check out the new variable that will get automatically
created in the variable view
o Analyze > descriptive statistics > frequencies > choose the new variable and untick “display frequency tables” > statistics > min and max
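Taken together, the dialog choices above correspond roughly to this syntax sketch (museum_visit is a placeholder name for the response; PESEX and R_EDU as elsewhere in these notes):

```spss
* Binary logistic regression with categorical covariates,
* goodness of fit, CIs, outlier listing, and Cook's distance.
LOGISTIC REGRESSION VARIABLES museum_visit
  /METHOD=ENTER PRTAGE PESEX R_EDU
  /CONTRAST (PESEX)=Indicator(1)
  /CONTRAST (R_EDU)=Indicator
  /SAVE=COOK
  /CASEWISE OUTLIER(2)
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
```

Indicator(1) takes the first category as the reference; plain Indicator takes the last.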

Multinomial logistic regression


- How to perform a multinomial logistic regression analysis, e.g. when we are also interested in no response, refused, and don’t know as responses to visits to art museums
o Create a new variable with those values set to 0, not in universe set as missing, and everything else (yes and no) copied from the old values
- Analyze > regression > multinomial logistic
o New variable as the dependent variable and select reference category >
choose first, last, or custom > continue
o Select numerical covariates as “covariates” and categorical ones as “factors”
(to automatically create dummy variables for the latter)
o Statistics > classification table and goodness-of-fit > continue > ok
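A sketch of the corresponding NOMREG syntax (museum_visit_m is a placeholder name for the recoded response; factors follow BY, numerical covariates follow WITH):

```spss
* Multinomial logistic regression; BASE=LAST sets the last
* category as the reference.
NOMREG museum_visit_m (BASE=LAST ORDER=ASCENDING) BY PESEX R_EDU WITH PRTAGE
  /INTERCEPT=INCLUDE
  /PRINT=CLASSTABLE GOODFIT PARAMETER SUMMARY.
```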

Module 4

Factor analysis
- To create a new variable equal to an existing one, e.g. N_jazz_zero_new: transform > compute variable > choose and drag “number of live jazz performances” and “ok”
o Then, change the missing values to create the new variable: transform >
recode into same variables > choose and drag N_jazz_zero_new > select “old
and new values” (system or user missing have new value = 0) > continue
o Select “if” > “include if case satisfies condition” > choose and drag “attended a live jazz performance” > set it equal to 2 > continue > ok
- To perform a factor analysis, analyze > dimension reduction > factor
o Under variables, select and drag variables to include in the analysis
o Descriptives > under “statistics” select univariate descriptives and initial
solution, under “correlation matrix” select coefficients, significance levels,
inverse (for partial correlations), anti-image (for measure of sampling
adequacy for each variable), and KMO and Bartlett’s test (to understand if
factor analysis is possible because of strong correlation) > continue
o Extraction > select scree plot as well (can also change “extract” to “fixed
number of factors” and choose a number) > continue
o Rotation > Varimax > continue
o Options > select “sorted by size” and “suppress small coefficients” (can
choose absolute value, e.g. 0.3) > continue
o Scores > select “save as variables” (saving factors created as new variables –
will appear in variable view window) > continue > ok
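The factor analysis options above correspond roughly to this syntax sketch (the variable names are placeholders for the recoded attendance variables):

```spss
* Factor analysis with KMO/Bartlett, anti-image, scree plot,
* Varimax rotation, sorted/suppressed loadings, and saved scores.
FACTOR
  /VARIABLES N_jazz_zero_new N_classical_zero_new N_opera_zero_new
  /PRINT UNIVARIATE INITIAL CORRELATION SIG INV AIC KMO EXTRACTION ROTATION
  /FORMAT SORT BLANK(.3)
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION VARIMAX
  /SAVE REG(ALL)
  /METHOD=CORRELATION.
```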

Cluster analysis
- Analyze > classify > K-means cluster
o Run it on the factor score variables created by the factor analysis
o Choose and drag variables and choose number of clusters
- Output window: if the iteration history says “…iterations failed to converge…”, the result may be unreliable > increase the maximum number of iterations for more robust results
o Analyze > classify > K-means cluster
o Iterate > maximum = e.g. 20 > continue
o Options > ANOVA table > continue > ok
- New output: iteration history reports that convergence was reached = good (the change also became 0 at the last iteration)
o Number of cases in each cluster table shows uneven concentration in cluster
2 (too much) > need to run analysis again
o Analyze > classify > K-means cluster > change number of clusters to e.g. 5
o Save > cluster membership (to describe clusters) > continue > ok
- New output: iteration history is good + ANOVA p-values are significant + number of
cases in each cluster is still not balanced but it’s the best option
o Under variable view, there will be a new variable (QCL_1): can analyze it, e.g.
analyze > descriptive statistics > frequencies and choose the new variable
o Analyze > descriptive statistic > cross tabs (new variable in row and another
one in column)
 Cells > click row and column percentages
 Statistics > chi-square (to see if there is association)
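The final cluster run can be sketched in syntax as follows (FAC1_1 etc. are the default names SPSS gives to saved factor scores):

```spss
* K-means with 5 clusters, up to 20 iterations, an ANOVA table,
* and cluster membership saved as a new variable (QCL_1).
QUICK CLUSTER FAC1_1 FAC2_1 FAC3_1
  /MISSING=LISTWISE
  /CRITERIA=CLUSTER(5) MXITER(20) CONVERGE(0)
  /METHOD=KMEANS(NOUPDATE)
  /SAVE CLUSTER
  /PRINT ANOVA.
```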
