Ibm Spss Anubhav
"RESEARCH
METHODOLOGY"
This is to certify that the practical titled “Research methodology- Lab” submitted by
Anubhav Gupta to New Delhi Institute of Management, Guru Gobind Singh
Indraprastha University in partial fulfilment of requirement for the award of the
Bachelor of Business Administration degree is an original piece of work carried out
under my guidance and may be submitted for evaluation.
The assistance rendered during the study has been duly acknowledged. No part of
this work has been submitted for any other degree.
Any accomplishment requires the effort of many people, and this work is no
different. Regardless of the source, I wish to express my gratitude to all those who
have contributed to this work, even anonymously.
My final thanks go to myself, for always finding the encouragement to persevere
through this entire process.
Anubhav Gupta
TABLE OF CONTENTS
S. No. Contents
Chapter 1 Introduction to SPSS
1.1 Introduction to SPSS
1.2 About SPSS
1.3 Functions of SPSS
1.4 Advantages of SPSS
1.5 Disadvantages of SPSS
Chapter 2 Layout of SPSS
2.1 Layout of SPSS
2.2 Components of SPSS: Data view
2.3 Variable view
2.4 Analyze
2.5 File
2.6 Edit
2.7 Transform
2.8 Graphs
Chapter 3 Entering Data to SPSS
3.1 Typing data to SPSS
3.2 Opening file in SPSS
3.3 Steps to enter data in SPSS
Chapter 4 SPSS lab exercise
4.1 Multiple response
4.2 Cross tabs
4.3 Chi square
4.4 Descriptive (sum, avg, standard deviation)
4.5 Split file
4.6 T test
4.7 Anova
4.8 Correlation
4.9 Regression
CHAPTER 1
INTRODUCTION TO SPSS
1.1 INTRODUCTION TO SPSS
SPSS stands for “Statistical Package for the Social Sciences”. It is a software
package, first launched in 1968 and now owned by IBM, that is mainly used for
statistical analysis of data. SPSS is used in areas such as healthcare, marketing
and educational research, by market researchers, health researchers, survey
companies, education researchers, government, marketing organizations, data
miners and many others. It provides data analysis for descriptive statistics,
numerical outcome prediction and the identification of groups. The software also
offers data transformation, graphing and direct marketing features to manage
data smoothly. SPSS is a Windows-based program that can be used to perform
data entry and analysis and to create tables and graphs. It is capable of handling
large amounts of data and can perform all of the analyses covered in this report
and much more. SPSS is commonly used in the social sciences and in the business
world, so familiarity with this program should serve you well in the future. SPSS
is updated often. This document was written around an earlier version, but the
differences should not cause any problems. If you want to go further and learn
much more about SPSS, I strongly recommend Andy Field’s book (Field, 2009,
Discovering Statistics Using SPSS).
1.2 ABOUT SPSS
SPSS Statistics is a statistical software suite developed by IBM for data
management, advanced analytics, multivariate analysis, business intelligence,
and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM
in 2009. Current versions have the brand name: IBM SPSS Statistics.
CORE FEATURES OF SPSS
The core functionalities offered in SPSS are:
CHAPTER 2
LAYOUT OF SPSS
On the File menu, click Open and select Output. Select appendixoutput.spo from the
files that can be found at (At the moment this set of web pages is the most
recent version whichever of my books you are using.) Click OK. The following
will appear. The left-hand side is an outline of all of the output in the file. The
right-hand side is the actual output. To shrink or enlarge either side, put your
cursor on the line that divides them. When the double-headed arrow appears, hold
the left mouse button and move the line in either direction. Release the button and
the size will be adjusted.
Finally, there is the Syntax window, which displays the command language used
to run various operations. Typically, you will simply use the dialog boxes to set
up commands and will not see the Syntax window. The Syntax window
is activated if you paste the commands from a dialog box into it, or if
you write your own syntax, something we will not focus on here. Syntax files
end in the extension .sps.
2.2 COMPONENTS OF SPSS: DATA VIEW
The Data view
The Data Editor opens immediately upon starting SPSS and,
when empty, looks like a typical spreadsheet. When data is
loaded into the Data Editor, each column will represent a
variable and each row will represent a case. Selecting the tab
at the bottom that's labeled "Variable View" allows the user to
view and edit information about each variable. To open a new
Data Editor, select "File"->"New"->"Data." When the contents
of the Data Editor are saved, the resulting file will have a ".sav"
extension. If a file has been saved in the SAV format you can
open it by selecting "File"->"Open"->"Data."
2.3 VARIABLE VIEW
Each variable in an SPSS dataset has a set of attributes that can be edited by toggling
to the "Variable View" tab in the Data Editor:
Name is the variable's machine readable name. This is the name used to
refer to the variable in SPSS's underlying code and, if no "Label" is
defined, the name that will appear at the top of the column in the "Data
View."
Type indicates the type of data that can be stored in the variable's
column. The most frequently used types are "String" (for text) and
"Numeric." SPSS uses the type to know what rules can be applied to a
specific variable. It won't do arithmetic on a string variable, for example.
Label sets the name that will be displayed at the top of the column in the
Data Editor, allowing for a human readable representation of the
variable name.
Values sets names given to coded values (e.g. if the variable contains
survey responses where a "0" represents "no" and "1" represents a "yes"
this field can be used to tell SPSS to display the text values instead of the
numerical raw data).
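As a rough illustration of how these attributes fit together, the sketch below models one variable in Python. The variable name owns_car, its label and its codes are hypothetical, and this is not how SPSS stores variables internally; it only mimics how value labels replace raw codes when the data are displayed.

```python
# Hypothetical variable definition mirroring the Variable View attributes
variable = {
    "name": "owns_car",                          # machine-readable name
    "type": "Numeric",                           # data type of the column
    "label": "Does the respondent own a car?",   # human-readable label
    "values": {0: "No", 1: "Yes"},               # code -> value label
}

def display_value(var, raw):
    """Return the value label for a raw code, or the raw code if none is defined."""
    return var["values"].get(raw, raw)

raw_data = [1, 0, 1, 1]
shown = [display_value(variable, code) for code in raw_data]
print(shown)  # ['Yes', 'No', 'Yes', 'Yes']
```

The raw codes stay numeric, so arithmetic on them remains possible; only the display changes.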
2.4 ANALYZE
Analyze Menu: The Analyze menu is where all statistical analysis takes place,
from descriptive statistics to regression analysis and nonparametric tests.
2.5 FILE
File Menu: From the File menu you can open existing files of several different
types, including database files and Excel files, or read in a text file. You can also
save any changes to the current file.
2.6 EDIT
Edit Menu: From the Edit menu, you can cut, copy, paste, insert variables, insert
cases, or use Find in the Data Editor window.
2.7 TRANSFORM
Transform Menu: The transform menu is where you will find the options to do some
computations on variables, to create new variables from existing ones or recode old variables.
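The two operations mentioned above, computing a new variable and recoding an existing one, can be sketched in Python as follows. The variable names (math, science, total, math_band) and the cut-off scores are hypothetical and chosen only for illustration; SPSS performs the same kind of operation through its Transform dialogs.

```python
# Hypothetical cases: each dict is one case (row), each key one variable (column)
cases = [
    {"math": 72, "science": 65},
    {"math": 48, "science": 55},
    {"math": 90, "science": 88},
]

def recode(score):
    """Recode a numeric score into a category code: 1 = low, 2 = medium, 3 = high."""
    if score < 50:
        return 1
    if score < 75:
        return 2
    return 3

for case in cases:
    # Compute: a new variable derived from existing ones
    case["total"] = case["math"] + case["science"]
    # Recode: an existing variable mapped into a new categorical variable
    case["math_band"] = recode(case["math"])

print(cases)
```

Writing the recoded values to a new variable, rather than overwriting math, keeps the original data available for later checks.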
2.8 GRAPHS
Graphs Menu: The graph menu is where you can create high resolution plots and
graphs to be edited in the chart editor window or you can create interactive graphs.
CHAPTER 3
ENTERING DATA TO SPSS
3.1 TYPING DATA TO SPSS
In this section, we will learn about the various file formats available in SPSS and
how to work with them. To import a file into SPSS, go to the File menu, click
Open and select Data, like this:
Alternatively, we can click on the following folder icon and browse directly
to the location of our data file.
Once we click on this option, we have to select where our data file is located.
Suppose the data file is located on the desktop, so we select the desktop. There
we can see a file named SPSS File.sav. By default, the .sav file type is selected,
which is the standard file extension in SPSS. It is the most commonly used file
format when working with SPSS.
Apart from this, when we click on the Files of type option, we can see a range of
other file formats that are available in SPSS. We work with some of them and
rarely use others, but it is important to understand them. The first file format
is SPSS Statistics. It is the standard, default file type, with the .sav and .zsav
extensions. .zsav is a compressed version of the standard .sav format. When we
work with a large data set, we may want to compress the file to save disk space.
Instead of compressing a large .sav file with external software such as WinZip or
WinRAR, we can save it in the .zsav format, which SPSS opens directly and
which produces a smaller file.
After that, we have the SPSS/PC+ file format with the extension .sys. This format
is rarely used these days. It was used with DOS (Disk Operating System) and is
compatible with old IBM computers.
Next we have the Portable file format with the extension .por. This is an important
file format that is used whenever we want to share our data file across various
operating systems or various versions of SPSS. The .por file format ensures that
our file opens easily if someone uses IBM SPSS version 15 or 16 or uses
a different operating system from ours.
Next we have the Excel file format with the extensions .xls, .xlsx and .xlsm. This is
the most popular file format other than SPSS Statistics (*.sav). In this case, we
collect our data in MS Excel, then import the data into SPSS and start the analysis.
So the Excel file format is very common.
So these are the various types of file formats that are used in SPSS. Now, we
have to know which file formats are important for us. At the top of the
hierarchy comes the standard SPSS file format, .sav. We use it when we are
saving our data directly in SPSS, or when we already have standard SPSS files
(for example, from the internet) that we want to open in SPSS. Excel is the most
important format after .sav. In some cases, people use the Excel file format more
than the .sav file format, but once we have imported our data into SPSS, we save
it in the .sav file format. After this, the Text file format is important for us.
We work with it less often because we usually work with Excel, but it is still
important to understand it.
3.3 STEPS TO ENTER DATA IN SPSS
When you open the SPSS program, you will see a blank spreadsheet in Data View. If
you already have another dataset open but want to create a new one, click File
> New > Data to open a blank spreadsheet.
You will notice that each of the columns is labeled “var.” The column names will
represent the variables that you enter in your dataset. You will also notice that each
row is labeled with a number (“1,” “2,” and so on). The rows will represent cases that
will be a part of your dataset. When you enter values for your data in the spreadsheet
cells, each value will correspond to a specific variable (column) and a specific case
(row).
CHAPTER 4
SPSS LAB EXERCISE
4.1 MULTIPLE RESPONSE
Variable Sets creates subsets of variables to display in the Data Editor and in
dialog box variable lists.
The steps below show how to perform a multiple response analysis:
Case Summary
Cases
$grocery Frequencies
N Percent
4.2 CROSSTABS AND 4.3 CHI-SQUARE TEST
A crosstab shows the relationship between two or more variables by recording
the frequency of observations that have multiple characteristics.
The Chi-Square Test of Independence determines whether there is an
association between categorical variables.
The following shows how to perform crosstabs and the chi-square test:
Case Processing Summary
                                          Cases
                            Valid            Missing          Total
                            N     Percent    N     Percent    N     Percent
Swiggy * Zepto              56    100.0%     0     0.0%       56    100.0%
Swiggy * Country Delight    56    100.0%     0     0.0%       56    100.0%
Swiggy * JioMart            56    100.0%     0     0.0%       56    100.0%
Swiggy * Zomato             56    100.0%     0     0.0%       56    100.0%
OTIPY * Zepto               56    100.0%     0     0.0%       56    100.0%
OTIPY * Country Delight     56    100.0%     0     0.0%       56    100.0%
OTIPY * JioMart             56    100.0%     0     0.0%       56    100.0%
OTIPY * Zomato              56    100.0%     0     0.0%       56    100.0%
Dunzo * Zepto               56    100.0%     0     0.0%       56    100.0%
Dunzo * Country Delight     56    100.0%     0     0.0%       56    100.0%
Dunzo * JioMart             56    100.0%     0     0.0%       56    100.0%
Dunzo * Zomato              56    100.0%     0     0.0%       56    100.0%
Crosstab
Zepto Total
0 1
Count 20 9 29
Chi-Square Tests
                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             3.433(a)   1    .064
Continuity Correction(b)       2.505      1    .114
Likelihood Ratio               3.466      1    .063
Fisher's Exact Test                                          .104         .056
Linear-by-Linear Association   3.372      1    .066
N of Valid Cases               56
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 11.57.
b. Computed only for a 2x2 table
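The Pearson Chi-Square value in the table above can be checked by hand. The sketch below computes the statistic for a 2x2 table in Python. Only the first row of the crosstab (20, 9) appears in the output above; the second row (12, 15) is an assumption, inferred from N = 56 and the reported minimum expected count of 11.57, and is not a value taken from the original output.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# First row from the crosstab above; second row is ASSUMED (see the note above)
observed = [[20, 9],
            [12, 15]]
chi2 = pearson_chi_square(observed)
print(round(chi2, 3))  # 3.433
```

With these counts the smallest expected count is 27 * 24 / 56 = 11.57, which agrees with footnote a, so the assumed row is at least consistent with the reported output.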
Symmetric Measures
Crosstab
0 1
Count 23 6 29
Chi-Square Tests
                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             4.703(a)   1    .030
Continuity Correction(b)       3.558      1    .059
Likelihood Ratio               4.781      1    .029
Fisher's Exact Test                                          .048         .029
Linear-by-Linear Association   4.619      1    .032
N of Valid Cases               56
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 9.16.
b. Computed only for a 2x2 table
Symmetric Measures
4.4 DESCRIPTIVE
The descriptive statistics feature of SPSS can also give summary statistics such
as the mean, median and standard deviation.
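To make these summary statistics concrete, the short Python sketch below computes them for a small set of scores. The scores are hypothetical and are not taken from the lab data set.

```python
import statistics

# Hypothetical scores for eight cases
scores = [12, 15, 11, 18, 15, 20, 14, 15]

summary = {
    "N": len(scores),
    "Sum": sum(scores),
    "Mean": statistics.mean(scores),
    "Median": statistics.median(scores),
    # stdev uses the n - 1 denominator (sample standard deviation)
    "Std. Deviation": statistics.stdev(scores),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```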
The pictures below show how to run Descriptives:
Descriptive Statistics
4.5 SPLIT FILE
When analyzing data, it is sometimes useful to temporarily "group" or "split"
your data to compare results across different subsets.
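The idea of splitting can be sketched in Python as grouping the cases by one variable and then repeating the same analysis for each group. The cases and the variable names (gender, marks) below are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical cases: (gender, marks)
cases = [("male", 60), ("female", 75), ("male", 70),
         ("female", 85), ("male", 80), ("female", 65)]

# Group the cases by the split variable
groups = defaultdict(list)
for gender, marks in cases:
    groups[gender].append(marks)

# Run the same analysis once per group
group_means = {gender: statistics.mean(marks) for gender, marks in groups.items()}
print(group_means)  # {'male': 70, 'female': 75}
```

The original list of cases is left untouched, which mirrors the temporary nature of a split: the grouping affects the analysis, not the stored data.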
The steps below show how to use the Split File function:
Warnings
Frequency tables are not produced for the following variables because they are split variables: Gender of student.
Statistics
N   Valid      11   11
    Missing     0    0
Frequency Table
Gender of student = female
Statistics
N   Valid       9    9
    Missing     0    0
Frequency Table
4.6 T-TEST
A t-test is an inferential statistic used to determine whether there is a significant
difference between the means of two groups. T-tests are used when the data sets
follow an approximately normal distribution and have unknown variances. The
t-test is used for hypothesis testing in statistics and uses the t-statistic, the
t-distribution values, and the degrees of freedom to determine statistical
significance.
Using a T-Test
Consider that a drug manufacturer tests a new medicine. Following standard
procedure, the drug is given to one group of patients and a placebo to another
group called the control group. The placebo is a substance with no therapeutic
value and serves as a benchmark to measure how the other group, administered
the actual drug, responds. After the drug trial, the members of the placebo-fed
control group reported an increase in average life expectancy of three years,
while the members of the group who are prescribed the new drug reported an
increase in average life expectancy of four years. Initial observation indicates
that the drug is working. However, it is also possible that the observation may
be due to chance. A t-test can be used to determine whether the difference is
statistically significant and applicable to the entire population. Four assumptions
are made while using a t-test: the data collected follow a continuous or ordinal
scale, such as the scores for an IQ test; the data are collected from a randomly
selected portion of the total population; the data result in a normal, bell-shaped
distribution; and the variances are equal or homogeneous, which holds when the
standard deviations of the groups are equal.
T-Test Formula
Calculating a t-test requires three fundamental data values. They include the
difference between the mean values from each data set, or the mean difference,
the standard deviation of each group, and the number of data values of each
group. This comparison helps to determine the effect of chance on the
difference, and whether the difference is outside that chance range. The t-test
questions whether the difference between the groups represents a true difference
in the study or merely a random difference. The t-test produces two values as its
output: t-value and degrees of freedom. The t-value, or t-score, is a ratio of the
difference between the means of the two sample sets and the variation that exists
within the sample sets. The numerator value is the difference between the means
of the two sample sets. The denominator is the variation that exists within the
sample sets and is a measurement of the dispersion or variability.
This calculated t-value is then compared against a value obtained from a critical
value table called the T-distribution table. Higher values of the t-score indicate
that a large difference exists between the two sample sets. The smaller the t-
value, the more similarity exists between the two sample sets.
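The ratio described above can be written out as a minimal Python sketch for two independent samples. It assumes equal variances and therefore uses the pooled-variance form of the t statistic; the function name independent_t and the two small samples are hypothetical and are not the lab data.

```python
import math
import statistics

def independent_t(sample_a, sample_b):
    """t statistic and degrees of freedom for two independent samples,
    assuming equal variances (pooled-variance form)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Numerator: difference between the two sample means
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * statistics.variance(sample_a) +
                  (n_b - 1) * statistics.variance(sample_b)) / (n_a + n_b - 2)
    # Denominator: standard error of the mean difference
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_error, n_a + n_b - 2

# Hypothetical outcome scores for a treatment group and a control group
drug = [4, 5, 6, 5, 5]
placebo = [3, 4, 3, 2, 3]
t_value, df = independent_t(drug, placebo)
print(round(t_value, 3), df)
```

The resulting t-value would then be compared against the critical value for the given degrees of freedom, as described above.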
INDEPENDENT SAMPLES TEST
In an independent-samples t-test, the samples are selected independently of each other, and
the data sets in the two groups do not contain measurements of the same subjects. For
example, a group of 100 randomly selected, unrelated patients may be split into two groups
of 50 patients each. One of the groups becomes the control group and is administered a
placebo, while the other group receives a prescribed treatment. This constitutes two
independent sample groups that are unpaired and unrelated to each other.
PAIRED TEST
The correlated t-test, or paired t-test, is a dependent type of test and is performed when the
samples consist of matched pairs of similar units, or when there are cases of repeated
measures. For example, there may be instances where the same patients are repeatedly tested
before and after receiving a particular treatment. Each patient is being used as a control
sample against themselves. This method also applies to cases where the samples are related
or have matching characteristics, like a comparative analysis involving children, parents, or
siblings.
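A paired t-test works on the difference within each pair rather than on the two samples separately. The sketch below shows this in Python; the function name paired_t and the before and after measurements are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for matched pairs."""
    # One difference per pair (each patient acts as their own control)
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

# Hypothetical measurements on the same five patients
before = [10, 12, 9, 11, 13]
after = [12, 13, 12, 12, 15]
t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

Because each pair contributes a single difference, the degrees of freedom are the number of pairs minus one.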
4.7 ANOVA
The one-way analysis of variance (ANOVA) is used to determine whether there are any
statistically significant differences between the means of two or more independent (unrelated)
groups (although you tend to only see it used when there are a minimum of three, rather than
two groups). For example, you could use a one-way ANOVA to understand whether exam
performance differed based on test anxiety levels amongst students, dividing students into
three independent groups (e.g., low, medium and high-stressed students). Also, it is important
to realize that the one-way ANOVA is an omnibus test statistic and cannot tell you which
specific groups were statistically significantly different from each other; it only tells you that
at least two groups were different. Since you may have three, four, five or more groups in
your study design, determining which of these groups differ from each other is important.
You can do this using a post hoc test.
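As a minimal sketch of what the one-way ANOVA computes, the Python code below derives the F statistic from the between-groups and within-groups sums of squares. The function name one_way_anova and the three small groups of scores are hypothetical.

```python
import statistics

def one_way_anova(groups):
    """F statistic with its between- and within-group degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Between-groups sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of values around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical exam scores for low, medium and high anxiety groups
low, medium, high = [8, 9, 10], [6, 7, 8], [4, 5, 6]
f_value, df1, df2 = one_way_anova([low, medium, high])
print(f_value, df1, df2)
```

A large F value only says that the group means are not all equal; it does not say which groups differ, which is why a post hoc test is still needed.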
4.8 CORRELATION
Correlation is a statistical technique that shows how strongly two variables are related to
each other, that is, the degree of association between the two. For example, if we have
weight and height data for a group of people, the correlation between them tells us how these
two variables are related; we may find, for instance, that weight is positively related to
height. Correlation is measured by the correlation coefficient, which is very easy to
calculate in SPSS. Before calculating the correlation in SPSS, we should have some basic
knowledge about correlation. The correlation coefficient always lies in the range of -1 to 1.
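For reference, the Pearson correlation coefficient can be computed by hand as in the Python sketch below. The function name pearson_r and the height and weight values are hypothetical and serve only to illustrate a positive relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical heights (cm) and weights (kg) of five people
height = [150, 160, 170, 180, 190]
weight = [52, 58, 69, 75, 86]
r = pearson_r(height, weight)
print(round(r, 3))
```

A value close to 1 indicates a strong positive relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.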
4.9 REGRESSION
In this section, we will learn linear regression. Linear regression is used to study the cause
and effect relationship between variables. There are many types of regression; when we do a
cause and effect analysis, we begin with linear regression.
Linear regression refers to an analysis used to establish the cause and effect relationship
between two variables, which we presume to be linearly related.
Linear regression means that if we increase the independent (input) variable by one unit,
there will be a fixed amount of change in the dependent variable. So when we want to
quantify, for every unit increase in the independent variable, the increase or decrease in
the dependent variable, we have a linear regression kind of arrangement.
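The "fixed amount of change per unit" is the slope of the fitted line. The Python sketch below estimates the slope and intercept by ordinary least squares for one independent variable; the function name fit_line and the spend and sales figures are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) /
             sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (x) and sales (y)
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
intercept, slope = fit_line(spend, sales)
print(intercept, slope)
```

In this made-up data set every extra unit of spend adds two units of sales, so the slope is 2 and the intercept is 1. Real data will not fall exactly on a line, and the fit then describes the average change.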
Notes
Variables Entered/Removed(a)
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .274(a)   .075       .035                .937
ANOVA(a)
Total   43.673   48
Coefficients(a)
Unstandardized Coefficients   Standardized Coefficients