Final Spss
(2023-2026)
Roll.no.00590201823
TABLE OF CONTENTS
1. Worksheet 1: Frequency Distribution
2. Worksheet 2: Measures of Central Tendency
3. Worksheet 3: Outlier Testing
4. Worksheet 4: One Sample T-Test
5. Worksheet 5: Paired Sample T-Test
6. Worksheet 6: Independent-Sample T-Test
7. Worksheet 7: One-Way ANOVA
8. Worksheet 8: Chi-Square Test
WORKSHEET 1
FREQUENCY DISTRIBUTION
Frequency distribution is a very common and important method for analyzing the
nominal (categorical) and ordinal (ranked) variables in a dataset. Most
questionnaires dedicate one section to the demographic profile. The different
categories of a demographic profile are normally represented by a frequency
distribution in tabular as well as graphical form.
A dataset of workers in small and medium-scale enterprises in a city in India
is shown below in Table 1.1.
Table 1.1: Dataset of workers
S No.  Gender  Age Group  Religion  Education    S No.  Gender  Age Group  Religion  Education
1      1       1          3         2            26     1       5          3         2
2      1       4          2         1            27     1       1          1         2
3      1       3          3         4            28     1       5          2         2
4      1       3          1         3            29     1       1          2         4
5      2       4          1         1            30     1       5          2         2
6      1       4          1         1            31     1       2          3         5
7      2       2          1         1            32     1       3          2         1
8      1       2          3         1            33     2       2          2         2
9      1       2          2         1            34     1       5          2         1
10     2       2          2         2            35     2       5          1         2
11     1       3          1         2            36     2       5          2         3
12     1       3          1         3            37     2       2          3         4
13     1       4          1         4            38     2       5          2         32
14     2       1          2         3            39     1       3          3         3
15     1       5          2         2            40     1       5          2         2
16     2       2          2         2            41     1       2          1         1
17     1       1          1         5            42     1       2          3         1
18     1       5          1         5            43     1       3          2         1
19     1       5          2         2            44     1       5          2         5
20     1       2          2         5            45     1       2          1         2
21     2       5          2         2            46     2       5          2         3
22     1       2          2         1            47     2       2          1         2
23     1       2          3         1            48     1       3          3         4
24     1       2          1         5            49     1       4          2         4
25     2       5          2         5            50     2       1          1         1
The coding details of the different variables in the dataset are shown below in
Table 1.2.
Table 1.2: Dataset of coding of different variables
Variables        Numeric codes
1. Gender        1. Male
                 2. Female
2. Age Group     2. 26-35 yrs.
                 3. 36-45 yrs.
                 4. 46-55 yrs.
                 5. 56 and above
3. Religion      1. Hindu
                 2. Muslim
                 3. Other religion
4. Education     1. Below 10th
                 2. High school
                 3. Intermediate
                 4. Technical diploma
                 5. Degree level
SPSS Commands
Step 1: Click Analyze → Descriptive Statistics → Frequencies.
Step 3: Select the type of chart.
Education of worker
             Frequency   Percent   Valid Percent   Cumulative Percent
below 10th   14          30.8      32.0            32.0
Conclusion: The education levels of the 50 workers were tabulated. The numbers
of workers at the below 10th, high school, intermediate, technical diploma and
degree levels are 14 (28%), 17 (34%), 5 (10%) and 7 (14%) respectively.
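The frequency table that SPSS produces can be cross-checked with a few lines of
code. The sketch below is a minimal Python illustration and not part of the
SPSS procedure. It tabulates the education codes of workers 1 to 25 from
Table 1.1 only, and the function name frequency_table is an assumption made for
this example.

```python
from collections import Counter

# Education codes of workers 1-25 from Table 1.1
# (1 = below 10th, 2 = high school, 3 = intermediate,
#  4 = technical diploma, 5 = degree level).
education = [2, 1, 4, 3, 1, 1, 1, 1, 1, 2, 2, 3, 4,
             3, 2, 2, 5, 5, 2, 5, 2, 1, 1, 5, 5]


def frequency_table(codes):
    """Return rows of (code, frequency, percent, cumulative percent)."""
    counts = Counter(codes)
    total = len(codes)
    rows = []
    running = 0  # cumulative frequency so far
    for code in sorted(counts):
        running += counts[code]
        rows.append((code,
                     counts[code],
                     round(100 * counts[code] / total, 1),
                     round(100 * running / total, 1)))
    return rows


for row in frequency_table(education):
    print(row)
```

Each printed row corresponds to one line of the SPSS frequency table: the
category code, its frequency, its percent and the cumulative percent.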
WORKSHEET-2
Measures of Central Tendency
observations are lower than the median value. The extensions of the median are
quartiles, deciles, and percentiles.
Mode
The mode of a variable is the observation with the highest frequency or the
highest concentration of frequencies.
Objective: To calculate the mean, median, mode and quartiles of the monthly
sales of a company.
The monthly sales figures (in crores) of an enterprise for 50 consecutive
months are given in Table 2.1.
Table 2.1: Monthly sales figures of an enterprise for 50 consecutive months
15 12 30 34 45 54
SPSS Commands
Step 1: Click Analyze → Descriptive Statistics → Frequencies.
Step 2: Transfer the variable to the variable window and click ‘Statistics’ as
shown in Figure 2.2.
Step 3: Select the options ‘Mean’, ‘Median’, ‘Mode’ and ‘Quartiles’, click
‘Continue’ and then ‘OK’ as shown in Figure 2.3.
Statistics
monthly sales
N             Valid      50
              Missing    0
Mean                     60.90
Median                   55.00
Mode                     45a
Percentiles   25         38.50
              50         55.00
              75         76.25
a. Multiple modes exist. The smallest value is shown
Conclusion: Table 2.2 presents the SPSS output.
Mean value of the sales figures is 60.90
Median value of the sales figures is 55.00
Mode value of the sales figures is 45 (multiple modes exist; the smallest is shown)
Percentile (25) value is 38.50
Percentile (50) value is 55.00
Percentile (75) value is 76.25
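The same measures can be reproduced outside SPSS. The following is a minimal
Python sketch using the standard statistics module. The sales values are
hypothetical and are not the worksheet's data. The quartiles use the module's
default "exclusive" method, which places the p-th quantile at position
p(n + 1); treating that as equivalent to the SPSS default percentile
definition is an assumption.

```python
import statistics

# Hypothetical monthly sales figures (in crores); not the worksheet's data.
sales = [12, 15, 30, 30, 34, 45, 45, 54]

mean = statistics.mean(sales)
median = statistics.median(sales)
modes = statistics.multimode(sales)   # every value tied for the highest frequency
smallest_mode = min(modes)            # SPSS reports the smallest when several modes exist
q1, q2, q3 = statistics.quantiles(sales, n=4)   # default method is "exclusive"

print("mean:", mean)
print("median:", median)
print("modes:", modes, "smallest:", smallest_mode)
print("quartiles:", q1, q2, q3)
```

The second quartile always equals the median, which is a quick consistency
check on any SPSS output of this kind.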
WORKSHEET-3
Outlier Testing
Outliers are:
• The extreme observations lying in the extreme tails of the probability
distribution of the variables.
• The observations with the highest residuals for a relation model (regression
model).
• The observations that, if not included in the analysis, cause a significant
difference in the result.
On the basis of the cases mentioned above, outliers can be divided into three
different types:
1. Extreme values or univariate outliers
2. Multivariate outliers
3. Influencers
Two popular methods of detecting outliers are:
1. Extreme values
2. Box plot
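Both detection methods can be sketched in Python. The data values and the
function names extreme_values and boxplot_outliers are hypothetical. The box
plot rule used here is the conventional 1.5 × IQR fence computed from ordinary
quartiles; SPSS may compute the box edges slightly differently, so boundary
cases can differ.

```python
import statistics

# Hypothetical hours values; not the worksheet's data.
hours = [1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.5, 5.0, 13.0]


def extreme_values(values, k=3):
    """Return the k highest (descending) and k lowest (ascending) observations."""
    ordered = sorted(values)
    return ordered[-k:][::-1], ordered[:k]


def boxplot_outliers(values, whisker=1.5):
    """Flag values lying more than `whisker` IQRs beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


highest, lowest = extreme_values(hours)
print("highest:", highest)
print("lowest:", lowest)
print("outliers:", boxplot_outliers(hours))
```

The extreme-values listing only ranks observations, while the box plot rule
decides whether an observation is far enough from the bulk of the data to be
flagged.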
Table 3.1: Data of 50 players
SPSS Commands
Step 2: Move the ‘hours spent’ variable to the Dependent List and then click
‘Statistics’.
The required output is shown in Table 3.2 and the box plot is shown in Figure
3.4.
Table 3.2: SPSS output of outlier testing
Extreme Values
                        Case Number   Value
hours   Highest   1     22            13.0
                  2     8             5.0
                  3     29            5.0
                  4     4             4.5
                  5     26            4.5
        Lowest    1     30            1.0
                  2     11            1.0
                  3     10            1.0
                  4     15            1.5
                  5     13            1.5a
Conclusion: Table 3.2 presents the SPSS output of outlier testing. It lists the
extreme high and extreme low values in the sportsman dataset. Case numbers 22,
8, 29, 4 and 26 have the highest values and case numbers 30, 11, 10, 15 and 13
have the lowest values. Figure 3.4 shows that case number 22 is an outlier.
WORKSHEET-4
Test of Difference: One sample T-Test
Table 4.1: Data of weight lost by 50 customers a month after joining the
weight loss program
SPSS Commands
Step 3: Click ‘Options’.
The final SPSS (Statistical Package for the Social Sciences) output in tabular
form is shown below in Table 4.1 and Table 4.2, respectively.
Table 4.1
One-Sample Statistics
             N    Mean   Std. Deviation   Std. Error Mean
weightlost   50   4.02   1.116            .158
Table 4.2
One-Sample Test
Test Value = 0
Conclusion: The sample mean is 4.02 kg, which is less than the claimed
population mean of 5 kg. The t statistic is 25.481 with a p-value of .000.
Since the p-value of the t statistic is less than the 5% level of significance,
the null hypothesis of no difference between the sample mean and the population
mean is rejected at the 95% confidence level, and it can be concluded that the
sample mean is significantly different from the population mean. Therefore, the
company's claim about the weight loss of its customers is not supported by the
data.
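The t statistic can be recomputed from the summary statistics alone. The sketch
below is a minimal Python illustration under stated assumptions: the mean and
standard deviation are read from the One-Sample Statistics table as 4.02 and
1.116, and the function name one_sample_t is hypothetical. It evaluates the
statistic for two test values: 0, which is the Test Value shown in Table 4.2,
and 5, which is the company's claimed mean.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, test_value):
    """t statistic for H0: population mean equals test_value."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - test_value) / standard_error


# Summary statistics read from the One-Sample Statistics table.
n, mean, sd = 50, 4.02, 1.116

t_against_zero = one_sample_t(mean, sd, n, test_value=0)    # Test Value = 0
t_against_claim = one_sample_t(mean, sd, n, test_value=5)   # claimed mean of 5 kg

print(round(t_against_zero, 2), round(t_against_claim, 2))
```

With a test value of 0 the result is close to the reported 25.481 (the small
gap comes from rounding in the table). Testing the 5 kg claim directly requires
entering 5 as the Test Value, which gives a negative t statistic because the
sample mean lies below the claim.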
WORKSHEET-5
Paired Sample T-Test
A paired sample t-test is also known as a repeated sample t-test because the
data (responses) are collected from the same respondents at different time
periods. A paired sample t-test should be used when we want to test the impact
of an event or experiment on the variable under study. In this case, the data
are collected from the same respondents before and after the event, and the
two means are then compared. The null hypothesis of the paired sample t-test is
that the means of the pre-sample and post-sample are equal. Some of the
instances where the paired sample t-test can be applied are as follows:
a. Analyzing the effectiveness of training program on the performance of
employees of a business enterprise.
b. Analyzing the impact of a new advertisement on the sales of a product.
c. Analyzing the impact of a policy on the volatility in the stock market.
d. Analyzing the difference in the responses of the same group to two different
treatments.
Example: The HR manager of a business enterprise wants to analyze the impact of
a training program conducted for 30 employees. The purpose of conducting the
training program was to improve the performance of employees. The performance
scores of the employees are noted before and after the training program. Now, the
paired samples t-test is applied in order to analyze the impact of the training
program.
Objective: To find out whether the performance scores differ before and after
training.
The data is given in Table 1.1
Table 1.1: Data of the Performance of Employees
SPSS Commands
Step 2: Click on the pre-training score variable. Then click on the
post-training score variable. Now, move the pair into the Paired Variables box
by clicking on the right arrow button. Finally, click on ‘OK’ as shown in
Fig. 4.2.2.
Paired Samples Test
pretraining - postraining
Paired Differences
  Mean                                          -17.36667
  Std. Deviation                                9.56460
  Std. Error Mean                               1.74625
  95% Confidence Interval of the Difference
    Lower                                       -20.93815
    Upper                                       -13.79519
t                                               -9.945
df                                              29
Sig. (2-tailed)                                 .000
Since the significance value is .000, which is less than the 5% significance
level, we can state with a 95% confidence level that the null hypothesis is
rejected. Hence there is a significant difference between the scores before and
after training.
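The paired t statistic is the mean of the pairwise differences divided by its
standard error. The following minimal Python sketch shows the calculation. The
four pre and post scores are hypothetical, the function name paired_t is an
assumption, and the last lines cross-check the summary values reported in the
Paired Samples Test table.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired observations."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical scores for four employees; not the worksheet's data.
pre = [50, 60, 55, 70]
post = [65, 72, 70, 90]
t_stat, df = paired_t(pre, post)
print(round(t_stat, 3), df)

# Cross-check with the table: mean difference / (std. deviation / sqrt(n)).
t_from_table = -17.36667 / (9.56460 / math.sqrt(30))
print(round(t_from_table, 3))
```

The cross-check reproduces the reported t value of -9.945 with 29 degrees of
freedom.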
WORKSHEET 6
Independent-Sample T-Test
When we want to test the difference between two independent sample means, we
use the independent-samples t-test. The independent samples may belong to the
same population or to different populations. Some of the instances in which the
independent-samples t-test can be used are as follows:
where N1 and N2 are the sample sizes of the two independent samples.
Table 7.4: Average Performance of Employees
Performance Score   Gender   Age    Performance Score   Gender   Age
56                  Male     34     56                  Female   30
60                  Female   45     76                  Female   34
45                  Female   40     78                  Male     45
65                  Male     60     54                  Male     34
73                  Male     54     87                  Male     23
45                  Female   42     67                  Female   38
60                  Female   55     98                  Female   43
34                  Male     35     89                  Female   72
56                  Male     54     54                  Male     56
59                  Female   39     34                  Male     32
35                  Female   38     45                  Male     26
65                  Male     29     56                  Female   34
45                  Male     60     34                  Female   54
58                  Female   32     56                  Male     34
32                  Female   25     76                  Female   45
65                  Male     23     87                  Male     40
34                  Male     26     54                  Female   60
78                  Male     54     98                  Male     60
87                  Female   42     34                  Female   32
90                  Male     55     23                  Male     25
45                  Female   35     45                  Female   23
56                  Male     54     65                  Male     26
76                  Female   39     63                  Female   30
76                  Male     38     68                  Male     34
78                  Female   29     87                  Female   45
SPSS Commands
Step 3: Now, type the gender codes (1 for males and 2 for females). Next, click
‘Continue’ as shown in Figure 7.5.
Step 4: Finally, click on ‘OK’ to get the group statistics and
independent-samples t-test results (shown in Table 7.5 and Table 7.6,
respectively). Table 7.5 shows that the average performance score for males is
62.48 with a standard deviation of 19.395, and the average performance score
for females is 60.60 with a standard deviation of 18.949. The difference
between the sample means is small. Table 7.6 shows that the p-value of Levene's
test for equality of variances is 0.956, which is higher than the 5 per cent
level of significance. Thus, the null hypothesis of equal variances cannot be
rejected. The result also shows that the p-value of the t-statistic is 0.543,
which is also higher than the 5 per cent level of significance. Hence, the null
hypothesis of equal performance of male and female employees cannot be rejected
at the 5 per cent level, and the independent-samples t-test gives no evidence
that the average performance of males and females in the enterprise differs.
Group Statistics
                    Gender   N    Mean    Std. Deviation   Std. Error Mean
Performance score   male     25   62.48   19.395           3.879
                    female   25   60.60   18.949           3.790
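When equal variances are assumed, the t statistic can be computed from the
group statistics alone using the sample sizes N1 and N2. The sketch below is a
minimal Python illustration of the pooled-variance formula applied to the
values in the Group Statistics table. The function name pooled_t is an
assumption, and no p-value is computed here.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance (pooled) t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Values from the Group Statistics table: males first, then females.
t_stat, df = pooled_t(62.48, 19.395, 25, 60.60, 18.949, 25)
print(round(t_stat, 3), df)
```

A t statistic this close to zero on 48 degrees of freedom is consistent with
the conclusion that the two group means do not differ significantly.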
The group statistics and independent-samples test result are shown in Table 7.7 and
7.8 respectively.
Group Statistics
Independent Samples Test
Levene's Test for Equality of Variances: F, Sig.
t-test for Equality of Means: t, df, Sig. (2-tailed), Mean Difference, Std. Error
Difference, 95% Confidence Interval of the Difference (Lower, Upper)
WORKSHEET 7
Concept of ANOVA
Independent-samples t-test can be applied to situations where there are only two
independent samples. In other words, we can use independent-samples t-tests for
comparing the means of two populations (such as males and females). When we
have more than two independent samples, t-test is inappropriate. The Analysis of
Variance (ANOVA) has an advantage over t-test when the researcher wants to
compare the means of a large number of populations (i.e., three or more). ANOVA
is a parametric test that is used to study the difference among more than two
groups in the datasets. It helps in explaining the amount of variation in the dataset.
In a dataset, two main types of variations can occur. One type of variation occurs
due to chance and the other type of variation occurs due to specific reasons. These
variations are studied separately in ANOVA to identify the actual cause of
variation and help the researcher in taking effective decisions.
In case of more than two independent samples, the ANOVA test explains three
types of variance. These are as follows:
• Total variance
• Between-group variance
• Within-group variance
The ANOVA test is based on the logic that if the between group variance is
significantly greater than the within group variance, it indicates that the means of
different samples are significantly different.
There are two main types of ANOVA, namely, one-way ANOVA and two-way ANOVA.
One-way ANOVA determines whether all the independent samples (groups) have the
same group means or not. On the other hand, two-way ANOVA is used when you need
to study the impact of two categorical variables on a scale variable.
Objective: To find out the difference between the salaries of graduates,
postgraduates and PhDs.
Ho: There is no difference between the salaries of graduates, postgraduates and
PhDs.
Salary      Qualification
65000.00    Postgraduate
60000.00    Postgraduate
45000.00    Graduate
40000.00    PhD
35000.00    Graduate
56000.00    Postgraduate
36000.00    PhD
45000.00    PhD
40000.00    Postgraduate
35000.00    Graduate
56000.00    PhD
36000.00    PhD
25000.00    Graduate
23000.00    Graduate
40000.00    Graduate
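The comparison of between-group and within-group variance described earlier can
be sketched in Python. This is a minimal illustration and not the SPSS
procedure: it uses only the 15 salary rows listed above, not the full dataset
behind the SPSS output, so its F value will not match the reported one. The
function name one_way_anova_f is an assumption.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Only the 15 salary rows listed above, grouped by qualification.
graduate = [45000, 35000, 35000, 25000, 23000, 40000]
postgraduate = [65000, 60000, 56000, 40000]
phd = [40000, 36000, 45000, 56000, 36000]

f_stat, df_b, df_w = one_way_anova_f([graduate, postgraduate, phd])
print(round(f_stat, 3), df_b, df_w)
```

A large F means the group means vary more than would be expected from the
spread inside each group, which is the logic the ANOVA test relies on.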
SPSS Commands
Step 2: Transfer the variable ‘salary’ to the Dependent List window and the
variable ‘Qualification’ to the Factor window.
Step 3: Click ‘Post Hoc’ and then select ‘Tukey’ as shown below in Figure 7.3.
Step 4: Click ‘Options’ and select ‘Homogeneity of variance test’ and ‘Means
plot’ as shown below in Figure 7.4.
The final SPSS (Statistical Package for the Social Sciences) output in tabular
form is shown below in Table 7.2, Table 7.3, Table 7.4, Table 7.5, Table 7.6
and Figure 7.5, respectively.
Descriptives
Tukey HSD
(J) Qualification   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
Phd                 -6555.55556             5734.74573   .497   -20839.8330   7728.7219
Based on trimmed mean: Levene Statistic 1.268, df1 2, df2 25, Sig. .299
Table 7.4 SPSS Output of one-way ANOVA
ANOVA
Salary
Total   5354678571.429   27
Table 7.5 SPSS Output of one-way ANOVA
Multiple Comparisons
Tukey HSD
(I) Qualification   (J) Qualification   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
Phd                 Graduate            18355.55556*            5589.53873   .008   4432.9641     32278.1470

Salary
Tukey HSD a,b
Graduate   10   32200.0000
Phd        9    50555.5556
a. Uses Harmonic Mean Sample Size = 9.310.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type
I error levels are not guaranteed.
Conclusion: Table 7.2 indicates that the average salary of graduates is 32200,
of postgraduates is 44000 and of PhDs is 50555.5556. The average salary of PhDs
is therefore the highest and that of graduates the lowest. Table 7.3 presents
the Levene test, whose null hypothesis is that all sample variances are the
same. The significance value of 0.254 indicates that, at the 95% level of
confidence, the null hypothesis cannot be rejected. Homogeneity of variance is
one of the required conditions of the one-way ANOVA test. Table 7.4 presents
the results of the F test in one-way ANOVA. As shown in Table 7.4, the p-value
of the F statistic (5.591) is less than the 5% level of significance. Hence,
with a 95% confidence level, the null hypothesis of equal group means is
rejected. Thus it can be concluded that the average salaries of graduates,
postgraduates and PhDs are not the same.
WORKSHEET-8
Chi-Square Test
The chi-square test is one of the most popular non-parametric tests. It is used
in two cases, which are as follows:
• To test the association between nominal variables in research.
• To test the difference between the expected and observed frequencies of an
event.
The chi-square test compares the actual observed frequencies with the
calculated expected frequencies of different combinations of nominal variables.
The difference between observed and expected frequencies indicates a possible
association between the categorical variables. The chi-square statistic
compares the observed count in each table cell to the count that would be
expected under the assumption of no association between the row and column
classifications. A negligible difference between observed and expected
frequencies may indicate no association, whereas a large difference may
indicate the possibility of association.
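The comparison of observed and expected counts can be sketched in Python. The
2 x 2 table below is hypothetical and is not the worksheet's data, and the
function name chi_square is an assumption. Each expected count is the row total
times the column total divided by the grand total, which is the count implied
by no association.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count under no association
            stat += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical 2 x 2 table of observed counts; not the worksheet's data.
observed = [[10, 20],
            [30, 40]]
stat, dof = chi_square(observed)
print(round(stat, 4), dof)
```

The statistic grows as the observed counts move away from the expected counts,
so a small value such as this one points towards no association.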
Objective: To analyze the association between education background and level of
familiarity with the internet.
Ho: There is no significant association between education background and level
of familiarity with the internet.
H1: There is a significant association between education background and level
of familiarity with the internet.
Table 8.2 has the data collected from 100 internet users. The data consists of
two nominal variables, ‘Level of familiarity with the internet’ and ‘Education
Background’. The details of the codes provided to the different sub-categories
of these nominal variables are shown in Table 8.1.
Table 8.1: Codes provided to sub-categories
Level of familiarity with the internet
1. Low familiarity
2. Medium
3. High
Education Background
1. Humanities
2. Management
3. Technology
4. IT
Table 8.2: Data of 100 internet users
S.no.   Level of familiarity with the internet   Education background
1. 3.00 1.00
2. 2.00 3.00
3. 3.00 1.00
4. 3.00 1.00
5. 3.00 4.00
6. 3.00 4.00
7. 3.00 1.00
8. 3.00 1.00
9. 3.00 1.00
10. 3.00 3.00
11. 2.00 1.00
12. 1.00 1.00
13. 3.00 1.00
14. 3.00 1.00
15. 3.00 3.00
16. 2.00 4.00
17. 2.00 2.00
18. 2.00 4.00
19. 2.00 2.00
20. 2.00 4.00
21. 3.00 1.00
22. 3.00 1.00
23. 3.00 4.00
24. 3.00 1.00
25. 3.00 2.00
26. 3.00 2.00
27. 3.00 4.00
28. 3.00 3.00
29. 2.00 2.00
30. 3.00 1.00
31. 1.00 3.00
32. 3.00 2.00
33. 2.00 4.00
34. 3.00 2.00
35. 2.00 2.00
36. 1.00 2.00
37. 2.00 1.00
38. 2.00 4.00
39. 1.00 1.00
40. 2.00 3.00
41. 2.00 2.00
42. 1.00 1.00
43. 2.00 3.00
44. 2.00 4.00
45. 2.00 2.00
46. 3.00 1.00
47. 3.00 3.00
48. 2.00 2.00
50. 2.00 2.00
51. 1.00 2.00
52. 2.00 2.00
53. 1.00 4.00
54. 3.00 2.00
55. 2.00 2.00
56. 2.00 4.00
57. 1.00 3.00
58. 1.00 3.00
60. 3.00 1.00
61. 1.00 2.00
62. 1.00 2.00
63. 2.00 2.00
64. 1.00 2.00
65. 2.00 2.00
66. 1.00 2.00
67. 2.00 2.00
68. 2.00 3.00
69. 1.00 2.00
70. 3.00 1.00
71. 2.00 2.00
72. 2.00 3.00
73. 1.00 1.00
74. 2.00 2.00
75. 2.00 2.00
77. 1.00 1.00
78. 2.00 3.00
79. 1.00 2.00
80. 1.00 1.00
81. 1.00 1.00
82. 1.00 3.00
83. 1.00 1.00
84. 1.00 1.00
85. 1.00 2.00
86. 1.00 1.00
87. 2.00 1.00
88. 1.00 2.00
89. 2.00 1.00
90. 2.00 4.00
91. 2.00 1.00
92. 1.00 3.00
93. 1.00 4.00
94. 2.00 1.00
95. 1.00 1.00
96. 1.00 3.00
97. 1.00 2.00
98. 1.00 1.00
99. 1.00 1.00
100. 1.00 3.00
SPSS Commands
Step 3: Select ‘Chi-square’ and ‘Phi and Cramer’s V’ and click ‘Continue’ as
shown in Figure 8.3.
Step 4: Click on ‘Cells’, select ‘Observed’ and ‘Expected’ and click ‘Continue’
as shown in Figure 8.4.
The final SPSS (Statistical Package for the Social Sciences) output in tabular
form is shown below in Table 8.3, Table 8.4, Table 8.5 and Table 8.6,
respectively.
Table 8.5 SPSS Output of chi-Square test
Chi-Square Tests
Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi          .308    .147
                     Cramer's V   .218    .147
N of Valid Cases                  100
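Phi and Cramer's V are both derived from the chi-square statistic, so the two
values in the Symmetric Measures table can be checked against each other. The
sketch below is a minimal Python illustration. It assumes a 3 x 4 table (three
familiarity levels by four education backgrounds, as coded in Table 8.1), and
the function name cramers_v is an assumption.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from a chi-square statistic and the table dimensions."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


phi, n = 0.308, 100          # values from the Symmetric Measures table
chi2 = phi ** 2 * n          # because phi = sqrt(chi2 / n)
v = cramers_v(chi2, n, n_rows=3, n_cols=4)   # assumed 3 x 4 table
print(round(chi2, 4), round(v, 3))
```

The result agrees with the reported Cramer's V of .218, which confirms that the
two measures in the table come from the same chi-square value.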