

SPSS Interpretation

SPSS : Statistical Package for the Social Sciences : IBM SPSS Statistics
1) Provides plenty of basic statistical functions, e.g., frequencies, cross tabulation, and bivariate statistics.
2) Enables researchers to build and validate predictive models using advanced statistical procedures.
3) Helps researchers uncover powerful insights from responses to open-ended survey questions.
4) Allows researchers to use their data to create a wide variety of visuals, e.g., charts, bar plots and graphs.

Usage

SPSS Environment

▪ Data Editor – Data View: Data displayed in spreadsheet format; columns represent variables, rows represent cases.
▪ Data Editor – Variable View: Shows information about the variables present in the open data.
▪ Output Editor: Displays a log and the output of the actions taken, shows the results of statistical analyses, and contains an outline of the content in the Viewer.
▪ Syntax Editor: A programming language in which users can write, debug and run commands; shows an outline of the commands and the editor.
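For illustration, a minimal sketch of what a command in the Syntax Editor looks like (the variable name Alter is only a placeholder taken from the later examples):

* Request a frequency table for one variable.
FREQUENCIES VARIABLES=Alter
  /ORDER=ANALYSIS.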

Variables

Variables can be managed using the Variable View.

Once we enter the variable name, all the fields are filled with default values that we can adjust if needed.
1) Variable names must be unique.
2) Keep variable names up to 64 characters long.
3) No spaces are allowed, but names can include alphanumeric and non-punctuation characters.
4) Upper and lower case letters are allowed.
5) Reserved keywords are not allowed, e.g., ALL, AND, BY, NOT, OR, TO or WITH.

▪ Type: Type of variable, e.g., string, numeric, comma, date
▪ Width: Number of displayed digits or characters
▪ Decimals: Number of decimal digits
▪ Columns: The width of the actual column
▪ Align: The alignment of content in the cell
▪ Measure: Level of measurement for the variable: nominal, ordinal, or scale (interval or ratio)
▪ Role: How the variable will be used in the analysis process, e.g., input for an IV or target for a DV
▪ Labels: Brief, descriptive display name for the variable that is printed in the output
▪ Defining values: For coding categorical variables, e.g., 0 = “Male”, 2 = “Female”, which helps to understand
what each value represents (we can use the Value Labels option to switch between label and value.)
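The same properties can also be set in syntax. A minimal sketch, assuming a numeric age variable Alter and a coded variable Geschlecht (both names are only illustrative here):

* Set a descriptive label and the level of measurement.
VARIABLE LABELS Alter 'Age of respondent'.
VARIABLE LEVEL Alter (SCALE).
* Define value labels for a coded categorical variable.
VALUE LABELS Geschlecht 0 'Male' 2 'Female'.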

Missing Data

▪ System missing values are values that are completely absent from the data.
▪ They are shown as periods in Data View.
▪ System missing values are only found in numeric variables.
▪ Data may contain system missing values for several reasons:
- Some respondents weren't asked some questions during the questionnaire routing.
- A respondent skipped some questions.
- Something went wrong while converting or editing the data.
- Some values weren't recorded due to equipment failure.

▪ User missing values are values that are invisible while analyzing or editing data.
The SPSS user specifies which values must be excluded.
▪ For instance:
- For categorical variables, answers such as “don't know” or “no answer” are typically excluded from analysis.
- For metric variables, unlikely values (a reaction time of 50 ms or a salary of € 9,999) are set as user missing.
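User missing values can also be declared in syntax. A minimal sketch, assuming a categorical item q1 where 9 codes “no answer” and a salary variable where 9999 is a placeholder (both variable names are hypothetical):

* Tell SPSS which codes to treat as user missing.
MISSING VALUES q1 (9) salary (9999).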

Recoding

Example: Recode the variable “Alter” to get a new variable “AlterCluster” with these four groups, keeping the original variable “Alter”:
18 – 19 years old -> (group) 1
20 – 29 years old -> (group) 2
30 – 39 years old -> (group) 3
40 – 49 years old -> (group) 4

▪ Transform > Recode into different variables
▪ Input variable = old variable “Alter”
▪ Output variable = new variable “AlterCluster”
▪ Input the range of the old variable and define its new value
▪ Result: A new variable with bigger clusters instead of individual values
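The same recode can be written in the Syntax Editor; a minimal sketch, assuming the variables are named exactly as above:

* Recode age into four clusters, keeping the original variable Alter.
RECODE Alter (18 thru 19=1) (20 thru 29=2) (30 thru 39=3) (40 thru 49=4) INTO AlterCluster.
EXECUTE.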

Values

Assign value labels to the new variable “AlterCluster”:
Variable View > Values > three-dot symbol > the Value Labels dialog appears > define Value & Label

Result: Meaningfully labelled categories. Click Value Labels to switch between value and label for our analyses.
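Equivalently in syntax, a sketch of the value labels for the new variable:

* Attach meaningful labels to the cluster codes.
VALUE LABELS AlterCluster 1 '18-19 years' 2 '20-29 years' 3 '30-39 years' 4 '40-49 years'.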

Sorting

Sort your cases according to certain criteria: Data > Sort cases > (First) Sort by “AlterCluster” > (Additionally) Sort by “bed” condition. Result: Data sorted.
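In syntax, the same sort could be written as follows (assuming the condition variable is really named bed):

* Sort by cluster first, then by condition, both ascending.
SORT CASES BY AlterCluster (A) bed (A).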



Computing

(1) Creating a new variable giving the percentage of positive adjectives out of the total number of remembered adjectives:
Transform > Compute variable > Define the expression. Result: The new variable is called “pos_percent”.

(2) Creating a new variable giving the total number of negative and positive remembered adjectives (without the neutral ones):
posneg_noneutral = positive + negative    or    posneg_noneutral2 = ges - neutral
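A sketch of the two computations in syntax, assuming the counts are stored in variables named positive, negative, neutral and ges (total):

* (1) Percentage of positive adjectives out of all remembered adjectives.
COMPUTE pos_percent = positive / ges * 100.
* (2) Total of positive and negative adjectives, without the neutral ones.
COMPUTE posneg_noneutral = positive + negative.
COMPUTE posneg_noneutral2 = ges - neutral.
EXECUTE.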


p-value: The probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis.

α-level: The significance level, i.e., the probability of rejecting the null hypothesis when it is true. It is usually set at 0.05 (5%).

𝑝 < 𝛼 : The observed result is statistically significant and we reject the null hypothesis (H0).
𝑝 > 𝛼 : The observed result is not statistically significant and we fail to reject the null hypothesis (H0).

Examples
▪ In performing a hypothesis test the null hypothesis is defined as μ = 6.9. It can be assumed that the population is normally distributed and α = 0.05. After running our significance test, we get a p-value of 0.156.
→ Fail to reject H0
▪ Suppose the p-value for a hypothesis test is 0.0304. Using α = 0.05, what is the appropriate conclusion?
→ Reject the null hypothesis
(The probability of obtaining such an effect, assuming the truth of the null hypothesis, is 3.04%, which is less than 5%.)
▪ A mayor is concerned about the percentage of city residents who express disapproval of his job performance. His political committee pays for a newspaper ad, hoping to keep his disapproval rating below 21%. They will use a follow-up poll to assess effectiveness. What are the correct null and alternative hypotheses?
→ H0: p ≥ 0.21
→ Ha: p < 0.21

This probability concept is very important, DON’T forget it!



Test Decision

These two assumptions can be used in almost every test we have learnt. If we remember whether a test is parametric or non-parametric, then we can apply the exact assumptions for it.

Pearson’s 𝑟

Focus only on these highlighted boxes, since the boxes on the right contain the same values.

▪ Type: Parametric test
▪ Purpose: Tests the relationship between 2 quantitative variables
▪ Assumptions:
- Normally distributed
- The two variables should be quantitative data (measured at interval or ratio level)
- Linear relationship between the variables
▪ Negative correlation: ranges from -1 to 0
▪ Positive correlation: ranges from 0 to 1
▪ Strength: Weak → 0.1 to 0.29 / -0.1 to -0.29
Medium → 0.3 to 0.49 / -0.3 to -0.49
Strong → more than 0.5 / less than -0.5

Example: The data above hold psychological test scores of 128 children between 12 and 14 years old. We attempt to find the relationship between the test scores (IQ, Depression, Anxiety, Social Functioning, Well-being).

Report Results:
The strongest correlation is between depression and overall well-being: r(115) = -0.801, p = 0.000. There is a strong, negative, significant correlation between depression and overall well-being scores.
There is a weak, positive, non-significant correlation between IQ and anxiety test scores: r(110) = 0.152, p = 0.110.

* A correlation is significant if its Sig. (2-tailed) < 0.05
** Degrees of freedom = N - 2
*** The results we need to report depend on which exact variables were required, so the example shows only the idea.
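A correlation matrix like the one described can be requested with a command along these lines (the variable names are placeholders for the five test scores):

* Pearson correlations, two-tailed significance, pairwise handling of missing data.
CORRELATIONS
  /VARIABLES=IQ Depression Anxiety SocialFunctioning WellBeing
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.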

Spearman’s 𝑟𝑠

▪ Type: Non-parametric test
▪ Purpose: Tests the relationship between 2 quantitative variables
▪ Assumptions:
- The two variables should be quantitative data (measured at ordinal level at least)
- Monotonic relationship between the two variables

Example: A Spearman's rank-order correlation was run to determine the relationship between 10 students' English and maths exam marks.

Report Results: There is a strong, positive correlation between English and maths marks, which was statistically significant: rs(8) = 0.669, p = 0.035.

Chi²

▪ Type: Non-parametric test
▪ Purpose: Tests whether 2 variables are independent of each other
▪ Assumptions:
- The two variables should be categorical data (measured at ordinal or nominal level)
- The two variables should consist of two or more categorical groups, like gender
- Expected frequencies should be at least 5 for the majority of cells
- Percentages and means cannot be used

Example: A chi-square test was run to determine the relationship between gender and preferred way of taking notes, conducted by an educator.

Report Results: There is no significant association between gender and preferred way of taking notes: χ²(1) = 0.258, p = 0.611.

* An association is significant if its Sig. (2-tailed) < 0.05
** The degrees of freedom are stated in the table
*** Expected frequencies are 15 for the majority of cells, which means Chi² can be applied.
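A crosstabs request along these lines produces the chi-square table (variable names are hypothetical); for a 2 x 2 table the same output also contains the Fisher's Exact Test row used in the next section:

* Chi-square test of independence with observed and expected counts.
CROSSTABS
  /TABLES=gender BY notes_preference
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.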

Fisher’s Exact Test

▪ Applied when an expected frequency is less than 5 in a Chi² test (look for details under the table)
* The Fisher’s Exact Test has no test statistic; we consider only the p-value directly.

Example: A Fisher’s Exact Test was used to determine the relationship between gender and learning medium, conducted by an educator.

Report Results: There is no significant association between gender and learning medium, with p = 0.335.

** Expected frequencies are less than 5 for the majority of cells, which means Chi² cannot be applied.

One sample t-test

The test value is the claimed number: the manufacturer declares that there are 50 red gummy bears per bag.

▪ Type: Parametric test
▪ Purpose: Tests whether the mean of a population is statistically different from a known value.
▪ Assumptions:
- Normally distributed
- No outliers
- The test variable should be quantitative data (measured at interval or ratio level)

Example: A one sample t-test was run to determine the difference between the sample mean of red gummy bears per bag and the population mean of red gummy bears per bag (the test value).

Report Results: The number of red gummy bears per bag in the sample is statistically significantly different from the claimed population value of 50, with t(9) = 2.481, p = 0.035.
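A sketch of the corresponding syntax, assuming the counted red gummy bears per bag are stored in red_count:

* One sample t-test against the claimed value of 50.
T-TEST
  /TESTVAL=50
  /VARIABLES=red_count
  /CRITERIA=CI(.95).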

t-test for independent samples

▪ Known as: Unpaired t-test, independent-measures t-test
▪ Type: Parametric test
▪ Purpose: Compares the means of two independent groups to see whether the means are statistically different
▪ Assumptions:
- Normally distributed
- No outliers
- Homogeneity of variances
- The dependent variable should be quantitative data (measured at interval or ratio level)
- The independent variable should be categorical data (measured at ordinal level at least) with two categorical groups

* If the Levene test becomes significant (Sig. < 0.05), homogeneity is not given and you need to read the results from the “equal variances not assumed” row.

Example: A t-test for independent samples was run to determine the difference in pre-test scores between males and females, conducted by an edutech student.

Report Results: There is a significant difference in the pre-test scores between males and females with t(15) = 2.870, p = 0.012, d = 1.457, a large effect size. Females’ scores are significantly higher.

Welch’s t-test: Applied when homogeneity of variances is not assumed in the t-test for independent samples.

Mann-Whitney U test: Applied when the assumptions are not met in the t-test for independent samples.

Suitability of the t-test: The t-test is not suitable for every experimental design because we can only use it to compare two sample groups' means. Also, this test is a parametric test, requiring (1) independence, (2) (approximately) normally distributed data, (3) no outliers and (4) homogeneity of variances.

Why only two groups? When many sample groups are being compared, the error structure for a t-test will undervalue the exact error; conducting multiple t-tests can lead to severe inflation of the Type I error rate.
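A sketch of the independent samples t-test in syntax, together with the Mann-Whitney U fallback (group codes and variable names are assumptions); the t-test output also contains the “equal variances not assumed” row that corresponds to Welch’s correction:

* Independent samples t-test: pre-test score by gender (1 = male, 2 = female).
T-TEST GROUPS=gender(1 2)
  /VARIABLES=pretest
  /CRITERIA=CI(.95).
* Non-parametric alternative when the assumptions are not met.
NPAR TESTS
  /M-W= pretest BY gender(1 2).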

One-way ANOVA
(ANOVA = Analysis of Variance)

▪ Type: Parametric test
▪ Purpose: Compares the means of two or more independent groups to see whether the means are statistically different
▪ Assumptions:
- Normally distributed
- No outliers
- Homogeneity of variances
- The dependent variable should be quantitative data (measured at interval or ratio level)
- The independent variable should be categorical data (measured at ordinal level at least) with more than two categorical groups

Example: A hospital wants to know how a homeopathic medicine for depression performs in comparison to alternatives. They administered 4 treatments to 100 patients for 2 weeks and then measured their depression levels. The one-way ANOVA was run by the hospital's statistician team.

Report Results:
In Levene’s test, there is no statistically significant difference among the variances, with p = 0.949. This means the variances of the BDI scores for the four medicine groups are all equal.

There is a significant difference in scores on Beck’s depression inventory depending on the medicine administered, with F(3, 96) = 20.730, p = 0.000, η² = 0.393, a large effect size.

→ The omnibus test is statistically significant.
→ Consider a post hoc test to see which groups differ significantly from each other.

In the post hoc test, there is a significant mean difference between each pair of conditions except between the Placebo and Homeopathic conditions, which are not significantly different, with p = 0.994.
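A sketch of the one-way ANOVA in syntax, with Levene's test and a Tukey post hoc (variable names assumed):

* One-way ANOVA of BDI score by medicine, with homogeneity test and post hoc comparisons.
ONEWAY BDI BY medicine
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /POSTHOC=TUKEY ALPHA(0.05).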
Advantages of ANOVA
While it is possible to compare each sample's mean with each other mean by applying t-tests when there are more than two samples, conducting multiple t-tests can lead to severe inflation of the Type I error rate. Fortunately, we can use ANOVA to test differences among several means for statistical significance without raising the Type I error rate; in short, it reduces the experimental error to a great extent. Besides, ANOVA is simple and suitable for laboratory experiments, and treatments can be added or removed.

Omnibus & Post hoc test
We first test whether all means are equal. This is often called the omnibus test (omnibus = about everything). If we conclude that not all means are equal, we sometimes test precisely which means are not equal. This involves post hoc tests (post hoc = after that, in which “that” refers to the omnibus test).
→ Only run post hoc tests if the omnibus test is statistically significant.

Two-way ANOVA

▪ Type: Parametric test
▪ Purpose: Compares the means of two or more independent groups to see whether the means are statistically different
▪ Assumptions:
- Normally distributed
- No outliers
- Homogeneity of variances
- The dependent variable should be quantitative data (measured at interval or ratio level)
- Two or more independent variables should be categorical data (measured at ordinal level at least), each with two or more categorical groups

Example: The same hospital wants to know whether their different medicines result in different mean Beck’s depression inventory scores, and whether the Beck’s depression inventory scores are related to gender in any way. The two-way ANOVA was run by the hospital's statistician team.

In short, we will try to gain insight into 4 (medicines) x 2 (gender) = 8 mean BDI scores.

Report Results:
There is a significant main effect of gender with F(1, 92) = 9.035, p = 0.003, ηp² = 0.089. This means that within the gender variable there is a significant difference between males and females. → Run a t-test
There is a significant main effect of medicine with F(3, 92) = 22.842, p = 0.000, ηp² = 0.427. This means that within the medicine variable one or more groups differ significantly from each other. → Run a post hoc test
There is a significant interaction effect between gender and medicine with F(3, 92) = 5.147, p = 0.002, ηp² = 0.144. (We might run further tests to see their relationship.)
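A sketch of the two-way ANOVA in syntax using the general linear model procedure (variable names assumed):

* Two-way ANOVA: BDI by medicine, gender and their interaction, with partial eta squared.
UNIANOVA BDI BY medicine gender
  /PRINT=ETASQ HOMOGENEITY
  /DESIGN=medicine gender medicine*gender.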

t-test for dependent samples

▪ Known as: Paired t-test, repeated-measures t-test
▪ Type: Parametric test
▪ Purpose: Compares the means of two measurements taken from the same individuals, or related units:
- two different times (pre-test & post-test)
- two different conditions (control & experimental)
▪ Assumptions:
- Normally distributed
- No outliers
- Homogeneity of variances
- The dependent variable should be quantitative data (measured at interval or ratio level)
- The independent variable should be categorical data (measured at ordinal level at least) with two categorical groups, or one group with repeated measures

Example: An educator would like to check whether the number of typed sequences (Anzahl getippter Sequenzen) differs between measurement 1 and measurement 3 (they are pre- and post-tests).

Report Results: There is a statistically significant difference between measurement 1 and measurement 2 over time with t(35) = 2.494, p = 0.018, d = 0.349, a medium effect size.
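In syntax, a sketch assuming the two measurements are stored as seq_t1 and seq_t3 (hypothetical names):

* Paired samples t-test between the two measurements.
T-TEST PAIRS=seq_t1 WITH seq_t3 (PAIRED).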

ANOVA Repeated Measures

▪ Type: Parametric test
▪ Purpose: Compares the means of measurements taken from the same individuals, or related units:
- at different times (e.g., pre-test & post-test)
- under different conditions (control & experimental)
▪ Assumptions:
- Normally distributed
- No outliers
- The dependent variable should be quantitative data (measured at interval or ratio level)
- The independent variable should be categorical data (measured at ordinal level at least) with more than two categorical groups
- Sphericity = homogeneity of variances (used only when the factor has more than two levels)

Example: The educator would like to check whether gender affects the number of typed sequences (Anzahl getippter Sequenzen; 3 different times).

Report Results:
In the sphericity test, there is no statistically significant difference among the variances, with p = 0.189. This means the variances of the number of typed sequences are all equal.
There is no statistically significant interaction effect of gender and the measurements with F(2, 68) = 0.023, p = 0.977, ηp² = 0.001.

* The test of sphericity is only used when our factor has more than two levels (the number of typed sequences has three levels, in this case).

Greenhouse-Geisser or Huynh-Feldt: Applied when the factor has more than two levels in a repeated measures ANOVA and the variances are not homogeneous.
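A sketch of the mixed repeated measures ANOVA in syntax (three measurement variables and a gender factor, all names assumed); the output includes Mauchly's test of sphericity as well as the Greenhouse-Geisser and Huynh-Feldt corrections:

* Repeated measures ANOVA: within-subject factor time (3 levels), between-subject factor gender.
GLM seq_t1 seq_t2 seq_t3 BY gender
  /WSFACTOR=time 3 Polynomial
  /PRINT=ETASQ
  /WSDESIGN=time
  /DESIGN=gender.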

t-test for dependent samples & ANOVA Repeated Measures

These two tests are similar because they share the same assumptions. We generally apply the dependent (paired-samples) t-test to examine the means of two related groups. On the other hand, we primarily apply the repeated measures ANOVA to examine the means of three or more related groups.
Effect sizes

A value that estimates the strength or size of the association between two variables. Cohen’s d (for t-tests) and partial eta-squared (for ANOVA) are examples of effect sizes used in social science. The larger the effect size, the larger the significance test’s power and the smaller the probability of getting errors.

Effect size is more useful than the significance level because the p-value only indicates whether a result is statistically significant; it does not reveal the effect’s size or strength. Besides, a p-value is not as informative as an effect size.
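For reference, the standard textbook definitions behind these two effect sizes (not tied to any particular output in this document):

$$ d = \frac{M_1 - M_2}{SD_{\text{pooled}}}, \qquad \eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}} $$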
Which test?

Study: You compare three groups (18-30-year-old people, 30-50-year-old people, or 50-80-year-old people) to find out if there is a difference in owning a car (vs. not owning a car) between the groups.
Test: Chi-square test
Justification: We could apply a Chi² test to evaluate whether there is a significant difference in the car-owning responses across the age groups, since our dependent variable does not follow a normal distribution (it is binary data).

Study: You compare the reaction times (ms) of three groups that had either coffee, milk, or nothing for breakfast.
Test: ANOVA
Justification: We attempt to analyze the difference between the means of more than two groups, with one categorical independent variable and one quantitative dependent variable.

Study: You compare several experimental conditions with each other. The assumption of normal distribution is violated. The sample size (n) is 7 per group.
Test: Non-parametric test
Justification: The values of the dependent variable do not follow a normal distribution. In this case, we could apply non-parametric tests instead.
