SEMINAR Data Screening

The document outlines a session on data screening and cleaning for multivariate analyses using SPSS, focusing on a dataset of 465 male athletes. Key learning outcomes include identifying missing data, checking for normality, and addressing outliers. The session involves practical exercises on continuous and categorical variables to prepare the data for future analyses.

Uploaded by

Selvakrishna

Data Screening & Cleaning for Multivariate Analyses

Dataset
Please download ‘Data – Data Screening.sav’ from the course Moodle page to your Desktop.

Learning Outcomes
Successful completion of the session means you will be able to use SPSS to perform
comprehensive data screening and cleaning. In particular, you will be able to:
1. Screen for missing data, violations of normality, outliers, linearity and
homoscedasticity
2. Tackle problems revealed by data screening appropriately

Dataset to be Used
This hypothetical dataset of 465 male athletes contains information on various characteristics
that might predict sporting success. We would typically go on to explore how well these
characteristics predicted some dependent measure of sporting performance (e.g. time taken
to run a mile), probably in a multiple regression analysis (covered in a few weeks). But for the
purposes of this exercise we are only performing data screening and cleaning in order to
ensure the data would be ready for this, or any, type of multivariable analysis.

Continuous variables
Height (in cm)
Age (in years)
Weight (in lbs)
Strength (rated 0 - 10 on some performance test)
Speed (graded on a 0 - 5 scale on some performance test)

Categorical variables
High-protein diet (0=no, 1=yes)
Train more than 4 times a week (0=no, 1=yes)

Note: To keep the length of the seminar session manageable, we will only perform data
screening on a subset of variables (usually on variables where there is a problem). You would
usually perform full data screening on all variables. Note also that there are often alternative,
equally valid, data screening procedures that are not performed in this session.

Question 1 – Missing Data

First, we will check how many values are missing for each variable. Go to
Analyze > Missing Value Analysis
Put the continuous variables into the ‘Quantitative Variables’ box and the categorical
variables into the ‘Categorical Variables’ box (see below)

Now click Descriptives and select the options below

Click Continue then OK

(a) From the ‘Univariate Statistics’ table, which variables have missing data?
(the relevant column is highlighted below)
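
For readers who want to see what the missing-value count amounts to outside SPSS, here is a minimal Python sketch. The records and their values are invented for illustration and are not taken from the seminar dataset; None stands in for a missing value.

```python
# Hypothetical records: None marks a missing value.
records = [
    {"height": 180.0, "weight": 165.0, "strength": 6.0},
    {"height": 175.0, "weight": None, "strength": None},
    {"height": 169.0, "weight": 150.0, "strength": None},
    {"height": 183.0, "weight": 172.0, "strength": 7.0},
]

def missing_summary(rows):
    """Return {variable: (n_missing, percent_missing)} for every variable."""
    n = len(rows)
    summary = {}
    for name in rows[0]:
        n_missing = sum(1 for row in rows if row[name] is None)
        summary[name] = (n_missing, 100.0 * n_missing / n)
    return summary

summary = missing_summary(records)
for name, (count, percent) in summary.items():
    print(f"{name}: {count} missing ({percent:.1f}%)")
```

The count and percentage per variable are the same kind of information the question asks you to read from the 'Univariate Statistics' table.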

(b) For one of these variables, the single missing case could simply be deleted (i.e. excluded
from the analysis). For the other, however, excluding missing cases could be potentially
problematic – why?

If the people with missing values for strength differ in some way (e.g. they are a different
age or height) from the people for whom strength values are present, this suggests these
cases (people) are not missing randomly. You can check this by using t-tests to compare the
group with missing strength values against the group with strength values present on each of
the other continuous variables.

The second table in the SPSS output gives the results of these t-tests. Each column shows the
t-test for one variable as the DV, with strength group (missing vs. present) as the IV; e.g.
column 1 below would show the results of a t-test with height as the DV.

Note: t-tests are not performed for DVs of protein diet or training as these are not continuous
DVs (which is a requirement of t-tests).
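
The comparison described above can be sketched in Python as follows. This is an illustration only: the (age, strength) pairs are invented, and the sketch assumes a separate-variance (Welch) t-test, which may not match the SPSS table in every detail. It returns the t statistic and approximate degrees of freedom; the p-value is left to SPSS.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Separate-variance (Welch) t statistic and approximate degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df

# Hypothetical (age, strength) pairs; None marks a missing strength score.
cases = [(21, 6.0), (34, None), (23, 7.0), (36, None), (22, 5.0),
         (38, None), (25, 6.0), (24, 8.0), (35, None), (20, 7.0)]

# Split age (the DV) by whether strength is present or missing (the IV).
age_present = [age for age, strength in cases if strength is not None]
age_missing = [age for age, strength in cases if strength is None]

t, df = welch_t(age_present, age_missing)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large t statistic here would mean that the cases with missing strength scores differ in age from the rest, which is the pattern that points to non-random missingness.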

(c) Row 3 of the t-tests table shows the p-values. Are there any significant differences for
missing vs. non-missing cases for any of the continuous variables? Do the missing cases
therefore appear to be missing randomly or non-randomly?

Although we will not do this now, the Missing Value Analysis option in SPSS offers several
methods of imputing (estimating) missing data.

Question 2 – Normality and Univariate Outliers

To check for univariate outliers and normality, inspect the histograms for the continuous
variables in the dataset:

Analyze > Descriptive Statistics > Frequencies

Put height, age, weight, strength and speed in the Variables box
Hit the Charts button, select Histograms and click Continue
Hit the Statistics button and from the pop-up window, check skewness and
kurtosis and click Continue
Untick the option to display frequency tables, then click OK

(a) Write down the name of the variable that appears to contain an obvious univariate outlier.

(b) In this case we will delete the outlier. The easiest way to identify the outlier is to order
weight values from highest to lowest. To do this, right-click the name of the problem variable
(at the top of the column) and choose Sort Descending.

Look at the data and you will see there are two values of 234 (lbs). Delete these values.

NOTE: It is very important both (A) to make sure that any deletion of data is performed on a
COPY of the dataset, and (B) to consider very carefully whether deletion is the right option
(the lecture notes provide more detailed notes on handling outliers). In this case deletion is
unproblematic because the number of univariate outliers is so small.

(c) Which variable appears to be skewed and is this skew positive or negative? (The ‘Statistics’
table also gives information about skewness and kurtosis and should confirm what the chart
suggests)

Given the apparent non-normality of Speed, remedial action is required. We can try to reduce
the positive skew by a logarithmic transformation.

Using Transform > Compute, create a log-transformed variable:

log_speed = LG10(speed) (see footnote 1)
Click OK.
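
As a rough illustration of what the log transformation does to a positively skewed variable, here is a minimal Python sketch. The speed values are invented, and the skewness function uses the bias-corrected sample formula, which is assumed here to correspond to the figure SPSS reports.

```python
from math import log10
from statistics import mean, stdev

def sample_skewness(values):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)

# Hypothetical positively skewed scores, all greater than 0.
speed = [1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 1.5, 1.8, 2.5, 4.8]

# log10 is undefined for 0, so a variable containing zeros would need log10(v + 1).
log_speed = [log10(v) for v in speed]

print(f"skew before: {sample_skewness(speed):.2f}")
print(f"skew after:  {sample_skewness(log_speed):.2f}")
```

The transformation compresses the long right tail more than the bulk of the scores, which is why the skewness statistic drops.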

Plot the histogram for log_speed as we did before:

Analyze > Descriptive Statistics > Frequencies

Enter log_speed as the variable
Click OK

(d) Have skew and kurtosis been reduced?

Footnote 1: Sometimes (not here) you will need to add ‘+1’ to the variable you are transforming
to ensure you don’t ask SPSS to calculate the log of 0, which is undefined.
Question 3 - Homoscedasticity & Linearity

Use the Scatterplot command from the Graphs menu, as shown below, to plot the
relationship between strength (x-axis) and weight (y-axis).

(a) Does it look linear?

(b) Does it look homoscedastic?

Note: In a full data screening, you would usually examine scatterplots for more variable
pairings.
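
Judging a scatterplot by eye is the method used in this session. As a rough numerical companion, the Python sketch below fits a straight line and compares the spread of the residuals in the lower and upper halves of the x-axis. The data values and the half-split comparison are assumptions made for illustration, not a procedure from the lecture notes.

```python
from statistics import mean, pvariance

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def residual_spread_ratio(x, y):
    """Residual variance for the upper half of x divided by that for the lower half."""
    slope, intercept = fit_line(x, y)
    pairs = sorted(zip(x, y))
    residuals = [b - (intercept + slope * a) for a, b in pairs]
    half = len(residuals) // 2
    return pvariance(residuals[half:]) / pvariance(residuals[:half])

# Hypothetical strength (x) and weight (y) values with a fanning-out pattern.
strength = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
weight = [150, 153, 155, 158, 161, 158, 172, 160, 181, 163]

ratio = residual_spread_ratio(strength, weight)
print(f"upper/lower residual variance ratio: {ratio:.1f}")
```

A ratio close to 1 is what homoscedastic data would give; a ratio far from 1, as in this invented example, corresponds to the funnel shape you would look for in the scatterplot.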

HOMEWORK EXTRA

Try these additional exercises at home (or in class if you have finished all of the other exercises)
to develop a more in-depth understanding of how to handle outliers in SPSS.

Univariate Outliers
Another way to identify the highest and lowest values for weight, or any other variable, is to
use the Explore option:

Analyze > Descriptive Statistics > Explore and enter weight in the Dependent List box.

Then click on the Statistics button and select Outliers and Percentiles, then click Continue
and OK

The resulting output will give you a boxplot of the data to help you identify univariate outliers
(as an alternative to histograms). The ‘Extreme Values’ table will also give you the 5 lowest
and 5 highest values for this variable along with the row number showing where these values
occur (under the ‘Case Number’ column).
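
The rule a boxplot uses to flag points can be sketched in Python as follows. The weights are invented, and the quartiles come from Python's "inclusive" method, which may differ slightly from the hinges SPSS computes; the multipliers 1.5 and 3 are the conventional boxplot cut-offs for outliers and extreme values.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical weights in lbs.
weight = [148, 152, 155, 158, 160, 161, 163, 165, 168, 171, 174, 234]

print(boxplot_outliers(weight))         # beyond 1.5 * IQR
print(boxplot_outliers(weight, k=3.0))  # beyond 3 * IQR
```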

Multivariate Outliers

To check for multivariate outliers:

Analyze > Regression > Linear

Enter id into the ‘Dependent’ box (see footnote 2)
Enter all variables except id into the ‘Independent(s)’ list
Click Save to record the Mahalanobis distance (MD) for each subject (this will be
added to the dataset as a new variable MAH_1).

Footnote 2: The multivariate outlier statistic we want looks only at the independent variables so
we aren’t interested in what goes in the dependent variable box as it isn’t actually included in the
calculations. A dependent variable is entered purely to get the analysis to run.
You are not interested in the output of this regression analysis, but it calculates the MD values
you want and puts them in the SPSS dataset for you. Have a look at the dataset and you will
see the new MAH_1 variable which has a value for every participant.

You can identify who is a multivariate outlier by finding the participants whose value on
MAH_1 exceeds the critical value of χ² (chi-square). This critical value corresponds to that for
p = 0.001 and df = the number of IVs (see the MV Outliers slide of the lecture notes). (You could
use the Compute command on the Transform menu to find the relevant critical value of χ², by
using the IDF.CHISQ function that SPSS provides.) The critical value when df = 7 is 24.32.

The easiest way to identify if any MD values have exceeded this critical value is to Sort
MAH_1 in descending order as we did earlier with the univariate outlier for Weight.
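
To show where the critical value comes from, the Python sketch below finds the chi-square value whose upper-tail probability is 0.001 with df = 7, then flags distances above it. In SPSS the same value should be obtainable with IDF.CHISQ(0.999, 7). The case ids and Mahalanobis distances here are invented for illustration.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    # Closed forms for df = 1 and df = 2, then a recurrence that raises df by 2.
    k = 1 if df % 2 else 2
    q = erfc(sqrt(x / 2)) if k == 1 else exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * exp(-x / 2) / gamma(k / 2 + 1)
        k += 2
    return q

def chi2_critical(p, df, lo=0.0, hi=200.0):
    """Find x with P(X > x) = p by bisection (the tail probability falls as x grows)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical = chi2_critical(0.001, 7)

# Hypothetical Mahalanobis distances keyed by case id.
mah_1 = {101: 3.4, 102: 27.9, 103: 11.2, 104: 24.1}
flagged = sorted((cid for cid, d in mah_1.items() if d > critical),
                 key=lambda cid: mah_1[cid], reverse=True)
print(f"critical value: {critical:.2f}; flagged cases: {flagged}")
```

Sorting the flagged cases from largest to smallest distance mirrors the Sort Descending step: any multivariate outliers end up at the top of the list.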
