An Introduction To Statistical Package For The Social Sciences



6. An Introduction to Statistical Package for the Social Sciences
Nick Emtage and Stephen Duthy

This module provides an introduction to statistical analysis, particularly in regard to survey data. Some of the features of the Statistical Package for the Social Sciences (SPSS) are then explained, with reference to a farm forestry survey. Of necessity, this is a brief overview of the highly complex and powerful SPSS package.

1. INTRODUCTION

Computer-based statistical packages are an important tool for researchers in the social sciences. The prospect of using statistics is sometimes either repugnant or simply frightening for people, yet most researchers recognise the potential utility of statistical analysis to help them describe, analyse, interpret and report their data. The mathematics behind statistical analysis can be daunting for those who have little formal training in either mathematics or the use of statistics. The development of specialist statistical analysis packages has greatly reduced the mathematical challenge of undertaking many analyses. It should be emphasised, however, that these packages have not reduced the need for researchers to understand the assumptions behind statistical analyses, and to be able to interpret their results. The packages have, however, reduced the need for researchers to be able to undertake many of the calculations that are required for statistical analyses. In this way they allow researchers to concentrate on understanding the assumptions behind the various methods, as well as the potential applications and limitations of various statistical tests.

Statistical software packages have, like other software packages, changed greatly since the advent of the personal computer a little over 20 years ago. Some of the authors still remember programming mainframe computers with paper cards. Holes were punched into the cards and these were then fed into the computer. Computers in those days were scarce, especially the big ones with four megabytes of memory! Needless to say, statistical tests were difficult to perform unless the user had an advanced understanding of the mathematics required. More recent computer software packages are reasonably easy to use for people with some familiarity with computers. Most of the packages have features such as drop-down menus, tree structure diagrams and on-line help systems. This said, it should be remembered that the packages discussed here are large and highly complex. While they are considerably easier to use today than they were even 10 years ago, like other large software packages, familiarity and ease of use are only developed through practice with the package. A user can become functionally proficient with a package such as Excel or Word after several weeks' use, but development of a high level of expertise can take many months or even several years.

When choosing which package to use for statistical analyses a number of factors must be considered. These include the availability of a package, its cost, the functions it can perform, familiarity with the package and the availability of an expert statistician to assist with the analysis process. As discussed above, the packages take time and effort to learn, and many researchers prefer to continue using a particular package once they have learned how to use it. Other factors may affect this, however. Availability of a package is an important factor in deciding whether to use it or not. If an institution has already obtained the rights to use a particular package, it may be the only choice available. Buying copies of the latest versions of the specialist statistical packages is expensive, as is the cost of maintaining the license to use the package. If an institution already has a package that
54 Socio-economic Research Methods in Forestry

can provide the functions required then the researcher may be forced to use that package despite preferences for other software because of limited funds.

Where expert statisticians are available to assist with data analyses, the preferred package of the expert is likely to be the one used. As discussed in other modules, it is important to discuss research projects with expert statisticians during their design to ensure that the data collected will be in a format that allows the use of the desired analysis techniques. It is also important at this stage to discuss the packages available to the researcher and the time available to access a computer for data entry, analysis and reporting. Where access to the computers with the statistical software is limited, it may be possible for the researcher to enter the data into a spreadsheet program like Excel and then transfer the data set to the statistics package in order to carry out the analyses. In this case it is important to have some understanding of the formatting required by the statistical package to be used, so as to avoid unnecessary reformatting of the data in the statistical package. Where possible, data should be entered directly into the statistical package to avoid the potential need to reformat the data.

2. THE STATISTICAL PROGRAM FOR SOCIAL SCIENTISTS (SPSS)

The SPSS Corporation first produced the SPSS software package in the early 1980s and has recently released version 11.0. It is presently one of the most commonly used statistical packages in Australian research institutions and is available at all Australian universities. The advantages of the package are its relative ease of use, its familiarity to many statistical experts and its functionality. One of SPSS's major disadvantages is its cost. The SPSS Corporation appears to be progressively breaking up the program into different sections that can be purchased separately. For Australian students an individual user's license (one year) costs approximately $A100 for a base student version and $A350 for a graduate pack licensed for 5 years (as of March 2002). The different versions have varying analytical functions and different capacities in terms of the number of cases and variables that can be used. An institutional license costs even more, depending on the number of expected users. The different packages have licenses that also differ. In most cases licenses are set up to expire automatically after a limited period, after which the package can no longer be used. The package is developed for a number of operating systems including Windows and Unix. Information about SPSS products is available on-line at

Organisation of the SPSS package

The set-up of the version 10.0 package (used for illustration here) is organised into two main sections, for defining and entering data and for output. When defining and entering data, users can move between the variable and data views by clicking on the tabs at the bottom of the screen. The output section opens in a separate window and displays the results of the statistical analyses. The output data are saved as a separate file to the data set.

In the variable view (Figure 1) the user sets up the data entry and analysis cells by naming and defining the variables included in the data set. Users are required to use names for the variables of eight or fewer characters. Names must begin with an alphabetic character. Longer descriptions of the variables can be added using the Labels dialog box (Figure 2). If a number of variables have a similar format, a quick way to define the variable format (including the variable type, the number of characters used and labels) is to copy the attributes of a variable then paste them into other variable fields.

Once the variables to be recorded have been named and defined, the user can access the data view to enter the values for each variable. The SPSS data view looks similar to a spreadsheet program. The variables are organised as columns, with each row as a single case in the data set containing values for the variables relating to that case. It is common practice to use codes to enter data into the package, and labels can be used to describe values where needed. For example, codes may be used to record the types of agriculture practiced on a landholding, or respondents' educational levels. The defined labels will appear when the user clicks the drop-down list arrow on the right side of the cell, and the user can then select the relevant value (Figure 3).

Figure 1. Variable view in SPSS 10.0

Figure 2. Defining variable labels using the Value labels dialog box
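The naming rule and the use of coded values with labels can also be illustrated outside SPSS, for example when a data set is prepared in a spreadsheet before being transferred. The following Python sketch is not part of SPSS; the variable names, the education coding scheme and the restriction to letters, digits and underscores after the first character are assumptions made for illustration only.

```python
import re

# Naming rule as described in the text for SPSS 10: eight or fewer
# characters, beginning with an alphabetic character. Restricting the
# remaining characters to letters, digits and underscores is an
# assumption made for this sketch.
NAME_RULE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,7}$")


def invalid_names(names):
    """Return the variable names that break the rule above."""
    return [name for name in names if not NAME_RULE.match(name)]


# Hypothetical coding scheme for a respondent's education level.
EDUC_LABELS = {1: "Primary", 2: "Secondary", 3: "Tertiary"}


def label_for(code, labels):
    """Translate a stored code to its label and flag unknown codes."""
    return labels.get(code, f"UNDEFINED CODE {code}")


# Hypothetical column headings from a spreadsheet.
print(invalid_names(["farmsize", "educ", "2ndcrop", "landholding"]))
print([label_for(code, EDUC_LABELS) for code in [1, 3, 9]])
```

Checking names and codes before transfer is one way to reduce the reformatting that the text warns about.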

Figure 3. Entering data into the SPSS

This is handy when there is a large number of possible responses, and thus codes, for a variable, and the user cannot remember all of them. The user can choose to have the codes or the labels displayed in the data view by selecting the Value labels option under the View menu.

Data analysis using SPSS

The SPSS student pack has a wide range of analytical functions, from basic descriptive statistics to advanced general linear modeling capabilities. Specific functions are also included to allow the transformation of variables as preparation for different tests (e.g. for creating standardised or logarithmic values, or the calculation of scales from a number of variables) (Figure 4). The use of these functions allows researchers to calculate new variables quickly from the values of other variables, test variations in category schemes used to classify responses to open-ended questions, and collapse categories where necessary.

Once the data are entered into the SPSS program it is important to check the database for typographic errors that may affect the results of statistical analyses. One means of achieving this is to examine the frequencies of categorical (nominal) data, and descriptive statistics of numeric (ordinal, scale or interval) data. All of the analytical functions available in SPSS can be accessed using the Analyze menu (Figure 6). If the Descriptive statistics then the Frequencies options are selected, the dialog box illustrated in Figure 5 appears. This dialog box enables users to select the variables for which frequencies are computed as well as control the types and, to a limited extent, the formatting of displays of the analyses.

If calculation of descriptive statistics is required, users should select the Descriptive statistics and then the Descriptives options under the Analyze menu to reveal the Descriptives dialog box (Figure 7).

Figure 4. Data transformation functions in SPSS
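The transformations mentioned in the text (standardised values, logarithmic values, scales calculated from several variables, and collapsed categories) reduce to simple arithmetic. The Python sketch below shows that arithmetic only; it is not SPSS code, and the data values, function names and category mapping are hypothetical.

```python
import math
import statistics


def standardise(values):
    """Convert values to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def log_transform(values):
    """Natural logarithm of each value; every value must be positive."""
    return [math.log(v) for v in values]


def scale_score(items):
    """Combine one case's responses to several items into a mean score."""
    return statistics.mean(items)


def collapse(code, mapping):
    """Recode a detailed category code into a broader category."""
    return mapping[code]


# Hypothetical farm areas in hectares.
areas = [10.0, 20.0, 30.0]
print(standardise(areas))
print(log_transform(areas))

# Hypothetical attitude items answered on a 1 to 5 scale by one respondent.
print(scale_score([4, 5, 3]))

# Hypothetical scheme collapsing three land-use codes into two groups.
BROAD_USE = {1: "cropping", 2: "cropping", 3: "grazing"}
print([collapse(code, BROAD_USE) for code in [1, 3, 2]])
```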

Figure 5. Frequencies dialog box
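The idea of using frequencies to find data entry errors can be sketched in a few lines of Python. This is an illustration of the checking logic rather than of the SPSS Frequencies procedure itself; the land-use codes and the data are hypothetical.

```python
from collections import Counter


def frequencies(values):
    """Count and percentage for each distinct value, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return {v: (n, 100.0 * n / total) for v, n in sorted(counts.items())}


def undefined_codes(values, valid_codes):
    """Values present in the data that are not in the coding scheme."""
    return sorted(set(values) - set(valid_codes))


# Hypothetical land-use codes; valid codes are 1, 2 and 3, so the
# value 22 is probably a keying error for 2.
landuse = [1, 2, 2, 3, 1, 22, 2, 3]

for value, (count, percent) in frequencies(landuse).items():
    print(f"{value:>4} {count:>3} {percent:6.1f}%")

print("Codes to check:", undefined_codes(landuse, {1, 2, 3}))
```

A value that appears only once or twice in a frequency table, or that is not in the coding scheme at all, is a natural candidate for checking against the original questionnaire.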


Figure 6. Analysis options available in SPSS

Figure 7. Descriptives function dialog box
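For numeric variables, a summary of the kind produced by a descriptive statistics routine can be sketched as follows. The choice of statistics (count, minimum, maximum, mean and sample standard deviation) and the age data are assumptions for illustration, not a reproduction of SPSS output.

```python
import statistics


def descriptives(values):
    """Basic summary statistics for one numeric variable."""
    return {
        "n": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),
    }


# Hypothetical respondent ages; 430 is likely a mistyped 43.
ages = [34, 45, 52, 61, 430]

for name, value in descriptives(ages).items():
    print(f"{name:>8}: {value}")
```

An implausible minimum or maximum, as in this example, points to a typographic error that would otherwise distort the mean and standard deviation.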


Once the Descriptives dialog box is shown, the variables to be included in the analyses are selected from the list on the left side of the box (Figure 7), and transferred to the list on the right side of the box (labeled Variables in Figure 7) using the arrow in the centre of the box. The types of descriptive statistics that will be calculated using this function can be selected by clicking on the Options button (Figure 7). This reveals the Options dialog box for the Descriptives function (Figure 8).

Other analytical functions included in the SPSS student pack (Version 10) include chi-square tests, correlations, regressions, principal components analyses, ANOVA, cluster analyses, general linear modeling and more.

Whilst this paper does not attempt to provide the reader with statistical skills, the flowchart in Figure 9 may act as a guide for the reader to access quickly those functions in SPSS that will best serve their statistical analysis needs.

The analytical functions are adequate for all but the most advanced researchers or those requiring highly specific analyses. Most advanced or specific applications can be met as well, with SPSS open to manipulation via user-compiled Sax Basic computer code (also known as scripts in SPSS). This is similar to the use of the Visual Basic programming language to develop and execute macros in Microsoft Excel. The Sax Basic language is compatible with Visual Basic for Applications.

Figure 8. Options dialog box for the Descriptives function
Purpose of statistical analysis

  Univariate data
    Descriptive statistics (variance, etc)

  Exploring relationships between variables (by form of data)
    Frequencies (by number of variables)
      One, compared to theoretical distribution: Chi-square test for goodness-of-fit
      Two, tested for association: Chi-square test for association
    Measurements (by number of variables)
      Two, degree of relationship (by level of measurement)
        Ordinal: Spearman's rho
        Interval: Pearson's correlation coefficient
      Multiple, effect of 2+ predictors on a dependent variable: Multiple regression

  Testing significance of differences (by number of groups)
    One, mean compared to a specified value: One-sample t-test
    Two
      Independent samples (by form of data)
        Ordinal: Mann-Whitney U test
        Interval: Independent-samples t-test
      Related samples (by form of data)
        Ordinal: Wilcoxon matched-pairs test
        Interval: Paired-samples t-test
    Multiple
      Independent samples
        One independent variable: One-way ANOVA
        Multiple independent variables: Multifactorial ANOVA
      Related samples: Repeated-measures ANOVA

Source: Corston and Colman (2000)

Figure 9. Choosing an appropriate statistical procedure
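The "testing significance of differences" branch of Figure 9 is a small decision procedure, and can be written out as one. The Python sketch below encodes that branch only, as it is read from the figure; the function name and parameter names are hypothetical, and the sketch is no substitute for checking the assumptions of each test with a statistician.

```python
def choose_difference_test(groups, related=False, level="interval", factors=1):
    """Suggest a test for differences, following one branch of Figure 9.

    groups  -- number of groups being compared (1, 2 or more)
    related -- True for related samples, False for independent samples
    level   -- "ordinal" or "interval" (used only when groups == 2)
    factors -- number of independent variables (used for 3+ independent groups)
    """
    if groups < 1:
        raise ValueError("groups must be at least 1")
    if groups == 1:
        return "One-sample t-test"
    if groups == 2:
        if level not in ("ordinal", "interval"):
            raise ValueError("level must be 'ordinal' or 'interval'")
        if related:
            return ("Wilcoxon matched-pairs test" if level == "ordinal"
                    else "Paired-samples t-test")
        return ("Mann-Whitney U test" if level == "ordinal"
                else "Independent-samples t-test")
    # More than two groups.
    if related:
        return "Repeated-measures ANOVA"
    return "One-way ANOVA" if factors == 1 else "Multifactorial ANOVA"


print(choose_difference_test(2, related=False, level="ordinal"))
print(choose_difference_test(3, related=False, factors=2))
```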


Using SPSS to Describe Data

Whilst computer-based statistical packages provide a high degree of functionality with regard to data analysis, they also provide a number of highly useful tools for the description and presentation of summaries of the dataset.

These functions include Descriptives and Frequencies, as explained earlier, and Crosstabs, also found under the Descriptive Statistics menu, as well as Basic Tables, General Tables, Multiple Response Tables and Tables of Frequencies, all located under the Custom Tables menu item (Figure 10). It is often useful to undertake one or more of these processes before commencing data analysis to identify any weaknesses in the dataset, such as poorly represented groups within the sample, that may limit the statistical validity of some forms of analysis. Crosstabs are also an efficient way of presenting data summaries in research and project reports.

The charting functions available in SPSS also provide a number of techniques for the initial exploration and the presentation of data. Scatter Plots (Figures 11 and 12) can be used to identify quickly the presence and nature of any correlations between variables, while Histograms (Figures 13 and 14) can be used to present a graphical representation of the shape of the distribution of the data for important variables.

There is a reasonable amount of literature available to assist users of the SPSS package, produced by the SPSS Corporation and by independent authors. The tutorial and help facilities for the package are comprehensive, generally easy to understand and include the on-line Statistics Coach and Syntax Guide.

Figure 10. Custom Tables drop-down menu selections
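A cross-tabulation simply counts cases for each combination of two categorical variables, which is what makes it useful for spotting poorly represented groups. The following Python sketch shows the counting; the variables, the data and the minimum cell size of 5 are assumptions chosen for illustration.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count cases for every combination of a row and a column category."""
    counts = Counter(zip(rows, cols))
    row_keys = sorted(set(rows))
    col_keys = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}


def sparse_cells(table, minimum=5):
    """List the (row, column) cells holding fewer cases than `minimum`."""
    return [(r, c) for r, row in table.items()
            for c, n in row.items() if n < minimum]


# Hypothetical survey variables: region and whether trees were planted.
region = ["north", "north", "south", "south", "south"]
planted = ["yes", "no", "yes", "yes", "no"]

table = crosstab(region, planted)
for row_name, row in table.items():
    print(row_name, row)
print("Cells with fewer than 2 cases:", sparse_cells(table, minimum=2))
```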


Figure 11. Design options for a Simple Scatterplot

Figure 12. Simple Scatterplot displayed in the Output Viewer
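The strength and direction of the linear relationship that a scatterplot reveals visually can be summarised with Pearson's correlation coefficient. The sketch below computes it directly from its definition in Python; the data are hypothetical, and the calculation assumes two numeric variables of equal length that each vary.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: farm size (ha) and area planted to trees (ha).
farm_size = [20, 45, 60, 110, 150]
tree_area = [1, 2, 2, 6, 7]

print(round(pearson_r(farm_size, tree_area), 3))
```

A value near +1 or -1 corresponds to points lying close to a straight line in the scatterplot, while a value near 0 indicates no linear relationship.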


Figure 13. Design options for a histogram

Figure 14. Histogram displayed in the Output Viewer
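A histogram is built by grouping values into intervals of equal width and counting the cases in each interval. The following Python sketch shows that grouping and prints a rough text version of the chart; the bin width, starting point and data are assumptions for illustration.

```python
def histogram_counts(values, bin_width, start=0):
    """Count the values falling in each interval [lower, lower + bin_width)."""
    counts = {}
    for v in values:
        lower = start + bin_width * int((v - start) // bin_width)
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical areas planted to trees, in hectares.
tree_area = [3, 7, 12, 15, 18, 41]

for lower, count in histogram_counts(tree_area, bin_width=10).items():
    print(f"{lower:>3}-{lower + 10:<3} {'*' * count}")
```

Isolated bars far from the rest of the distribution, such as the single large value here, are worth checking before analysis.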


3. CONCLUDING COMMENTS

Researchers frequently collect large quantities of data, from surveys, experiments and other forms of observation. A statistical computing package provides a convenient means to store these data, and derive descriptive and inferential statistics. The Statistical Package for the Social Sciences (SPSS) is a widely used general-purpose survey analysis package, and hence a useful one to master. It is necessary to allow some learning time to become familiar with this package, and annual license fees can be a disincentive.

REFERENCES

Corston, R. and Colman, A. (2000), A Crash Course in SPSS for Windows, Blackwell, Oxford.
