Final Spss

The document is a research methodology project report submitted for a Bachelor's degree in Business Administration, detailing various statistical analyses including frequency distribution, measures of central tendency, and outlier testing. It includes worksheets with objectives, datasets, SPSS commands, and outputs for each statistical method applied to data from workers and sales figures. The report is structured with a table of contents and concludes with findings from the analyses conducted.


RESEARCH METHODOLOGY PROJECT

Project report submitted in partial fulfilment of the requirement of the degree of
Bachelor of Business Administration (2023-2026)

Under the guidance of
DR AMANPREET KAUR LUTHRA

Submitted by:
JIYA CHHABRA
Roll no. 00590201823

SRI GURU TEGH BAHADUR INSTITUTE OF MANAGEMENT AND INFORMATION TECHNOLOGY

TABLE OF CONTENTS

S.NO.  TITLE
1.     Worksheet 1: Frequency Distribution
2.     Worksheet 2: Measures of Central Tendency
3.     Worksheet 3: Outlier Testing
4.     Worksheet 4: One-Sample T-Test
5.     Worksheet 5: Paired-Sample T-Test
6.     Worksheet 6: Independent-Sample T-Test
7.     Worksheet 7: One-Way ANOVA
8.     Worksheet 8: Chi-Square Test

WORKSHEET 1
FREQUENCY DISTRIBUTION

Frequency distribution is a method of displaying the frequency (the number of times
a particular value of a variable occurs in the data) of the different values of a
variable in a dataset. It represents the counts of all outcomes of a variable in a
sample. The frequency distribution of a variable can be represented in tabular as
well as graphical form.

Frequency distribution is a very common and important method for analyzing the
nominal (categorical) and ordinal (ranked) variables in a dataset. A questionnaire
usually dedicates one section to demographic profiles. The categories of each
demographic profile in a dataset are normally summarized by a frequency
distribution in tabular as well as graphical form.
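
As a rough illustration of how such a table is built, the sketch below tabulates the Education codes as they are transcribed in Table 1.1 of this worksheet, with the labels from Table 1.2. The function name `frequency_table` is hypothetical, and treating any code that has no label as missing is an assumption made here, so the counts need not agree exactly with the SPSS output.

```python
from collections import Counter

# Education codes for workers 1-50, transcribed from Table 1.1.
education = [2, 1, 4, 3, 1, 1, 1, 1, 1, 2, 2, 3, 4, 3, 2, 2, 5, 5, 2, 5, 2, 1, 1, 5, 5,
             2, 2, 2, 4, 2, 5, 1, 2, 1, 2, 3, 4, 32, 3, 2, 1, 1, 1, 5, 2, 3, 2, 4, 4, 1]

# Value labels from Table 1.2.
labels = {1: "Below 10th", 2: "High school", 3: "Intermediate",
          4: "Technical diploma", 5: "Degree level"}


def frequency_table(codes, labels):
    """Return (rows, missing); each row is
    (label, frequency, percent, valid percent, cumulative percent)."""
    # Assumption: a code without a label is treated as missing.
    valid = [c for c in codes if c in labels]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for code in sorted(labels):
        n = counts.get(code, 0)
        percent = 100 * n / len(codes)        # share of all cases
        valid_percent = 100 * n / len(valid)  # share of non-missing cases
        cumulative += valid_percent
        rows.append((labels[code], n, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows, len(codes) - len(valid)


rows, missing = frequency_table(education, labels)
for row in rows:
    print(row)
print("Missing:", missing)
```

The "Percent" column divides by all cases while "Valid Percent" divides by the non-missing cases only, which is the same distinction the SPSS frequency table makes.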

Objective: To calculate the frequency distribution and present a bar chart of the
education profiles of the workers.

The dataset of workers in small- and medium-scale enterprises in an Indian city is
shown below in Table 1.1.

Table 1.1: Data of workers in small- and medium-scale enterprises

S No.  Gender  Age Group  Religion  Education  |  S No.  Gender  Age Group  Religion  Education
1 1 1 3 2 26 1 5 3 2
2 1 4 2 1 27 1 1 1 2
3 1 3 3 4 28 1 5 2 2
4 1 3 1 3 29 1 1 2 4
5 2 4 1 1 30 1 5 2 2
6 1 4 1 1 31 1 2 3 5
7 2 2 1 1 32 1 3 2 1
8 1 2 3 1 33 2 2 2 2
9 1 2 2 1 34 1 5 2 1
10 2 2 2 2 35 2 5 1 2
11 1 3 1 2 36 2 5 2 3
12 1 3 1 3 37 2 2 3 4
13 1 4 1 4 38 2 5 2 32
14 2 1 2 3 39 1 3 3 3
15 1 5 2 2 40 1 5 2 2
16 2 2 2 2 41 1 2 1 1
17 1 1 1 5 42 1 2 3 1
18 1 5 1 5 43 1 3 2 1
19 1 5 2 2 44 1 5 2 5
20 1 2 2 5 45 1 2 1 2
21 2 5 2 2 46 2 5 2 3
22 1 2 2 1 47 2 2 1 2

23 1 2 3 1 48 1 3 3 4
24 1 2 1 5 49 1 4 2 4
25 2 5 2 5 50 2 1 1 1

The coding details of the variables in the dataset are shown below in Table 1.2.

Table 1.2: Coding of the variables

Variable          Numeric codes
1. Gender         1. Male
                  2. Female
2. Age Group      1. Less than 25 yrs. old
                  2. 26-35 yrs.
                  3. 36-45 yrs.
                  4. 46-55 yrs.
                  5. 56 and above
3. Religion       1. Hindu
                  2. Muslim
                  3. Other religion
4. Education      1. Below 10th
                  2. High school
                  3. Intermediate
                  4. Technical diploma
                  5. Degree level

SPSS Commands
Step 1: Click Analyze → Descriptive Statistics → Frequencies.

Step 2: Transfer the variable 'education' to the Variable(s) window.

Step 3: Click 'Charts' and select the type of chart (bar chart).

Step 4: Finally, click 'Continue' and then 'OK'.

The final SPSS output in tabular form is shown below in Table 1.3.

Education of worker
                               Frequency   Percent   Valid Percent   Cumulative Percent
Valid     below 10th               14        30.8        32.0              32.0
          high school              16        26.9        28.0              60.0
          intermediate              7        11.5        12.0              72.0
          technical diploma         6         7.7         8.0              80.0
          degree level              7        19.2        20.0             100.0
          Total                    49        96.2       100.0
Missing   System                    1         3.8
Total                              50       100.0

Table 1.3: SPSS output of frequency distribution

The SPSS output in graphical form is shown below in Figure 1.4.

Figure 1.4: Bar chart of education of workers

Conclusion: The education levels of the 50 workers were tabulated. According to
Table 1.3, 14 workers are below 10th grade, 16 are at high school level, 7 are at
intermediate level, 6 hold a technical diploma and 7 are at degree level, and one
response is missing.

WORKSHEET-2
Measures of Central Tendency

There are three main measures of central tendency.
These are as follows:
• Arithmetic mean
• Median
• Mode
Let us discuss these three in detail.
Arithmetic Mean
The mean of a variable represents its average value. It can be calculated by using the
following formula:
$$\bar{X} = \frac{\sum_i f_i X_i}{\sum_i f_i}$$
where $\bar{X}$ represents the mean, $X_i$ the $i$th observation of the variable and
$f_i$ the frequency of that observation.
One of the problems with the arithmetic mean is that it is highly sensitive to the
presence of outliers in the data. To avoid this problem, the trimmed mean of the
variable can be estimated. The trimmed mean is the mean of a variable after
removing some extreme observations (e.g., 2.5 percent from each tail of the
distribution) from the frequency distribution.
Median
The median is known as the 'positional average' of a variable. If we arrange the
observations of a variable in ascending or descending order, the value of the
observation that lies in the middle of the series is the median. The median divides
the observations of a variable into two equal halves: half of the observations are
higher than the median value and the other half are lower. Quartiles, deciles and
percentiles are extensions of the median.
Mode
The mode of a variable is the observation with the highest frequency or the highest
concentration of frequencies.
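
The three measures, plus the trimmed mean and quartiles, can be sketched with Python's standard `statistics` module. The figures below are the monthly sales as transcribed in Table 2.1 of this worksheet, so the results may differ slightly from the SPSS output. The helper `trimmed_mean` is hypothetical, and the choice of `method="exclusive"` (which interpolates at position (n + 1)p) is an assumption about which percentile definition to use.

```python
import statistics

# Monthly sales (in crores) for months 1-50, transcribed from Table 2.1.
sales = [60, 70, 45, 90, 110, 40, 90, 50, 70, 65, 54, 72, 45, 24, 12,
         8, 15, 40, 54, 56, 25, 43, 56, 120, 120, 130, 23, 32, 54, 34,
         45, 49, 68, 65, 70, 60, 30, 40, 110, 150, 34, 56, 97, 34, 54,
         70, 98, 45, 89, 100]


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each tail."""
    data = sorted(values)
    k = int(len(data) * proportion)  # observations removed per tail
    return statistics.mean(data[k:len(data) - k] if k else data)


mean = statistics.mean(sales)
median = statistics.median(sales)
modes = statistics.multimode(sales)  # every value tied for the highest frequency
q1, q2, q3 = statistics.quantiles(sales, n=4, method="exclusive")

print("mean:", round(mean, 2))
print("2.5% trimmed mean:", round(trimmed_mean(sales, 0.025), 2))
print("median:", median)
print("modes:", sorted(modes), "smallest:", min(modes))
print("quartiles:", q1, q2, q3)
```

When several values tie for the highest frequency, `multimode` returns all of them; reporting only the smallest mirrors the footnote SPSS prints in that situation.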
Objective: To calculate the mean, median, mode and quartiles of the monthly sales of
a company.
The monthly sales figures (in crores) of an enterprise for 50 consecutive months are
given in Table 2.1.
Table 2.1: Monthly sales figures of an enterprise for 50 consecutive months

Month Sales Month Sales Month Sales Month Sales


1 60 16 8 31 45 46 70
2 70 17 15 32 49 47 98
3 45 18 40 33 68 48 45
4 90 19 54 34 65 49 89
5 110 20 56 35 70 50 100
6 40 21 25 36 60
7 90 22 43 37 30
8 50 23 56 38 40
9 70 24 120 39 110
10 65 25 120 40 150
11 54 26 130 41 34
12 72 27 23 42 56
13 45 28 32 43 97
14 24 29 54 44 34

15 12 30 34 45 54

SPSS Commands
Step 1: Click Analyze → Descriptive Statistics → Frequencies.

Step 2: Transfer the variable to the Variable(s) window and click 'Statistics' as
shown in Figure 2.2.

Step 3: Select the options 'Mean', 'Median', 'Mode' and 'Quartiles', click
'Continue' and then 'OK' as shown in Figure 2.3.

The SPSS output is shown in Table 2.2.

Table 2.2: SPSS output of measures of central tendency

Statistics
monthly sales
N             Valid      50
              Missing     0
Mean                     60.90
Median                   55.00
Mode                     45(a)
Percentiles   25         38.50
              50         55.00
              75         76.25
a. Multiple modes exist. The smallest value is shown.
Conclusion: Table 2.2 presents the SPSS output.
The mean value of the sales figures is 60.90.
The median value of the sales figures is 55.00.
The mode of the sales figures is 45 (multiple modes exist; the smallest is shown).
The 25th percentile is 38.50.
The 50th percentile is 55.00.
The 75th percentile is 76.25.

WORKSHEET-3
Outlier Testing

Outliers are:
• The extreme observations lying in the tails of the probability distribution of a
variable.
• The observations with the highest residuals in a relational (regression) model.
• The observations that, if excluded from the analysis, cause a significant
difference in the result.

On the basis of the cases mentioned above, outliers can be divided into three
different types:
1. Extreme values or univariate outliers
2. Multivariate outliers
3. Influencers
Two popular methods of detecting outliers are:
1. Extreme values
2. Box plot
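
Both detection methods can be sketched in a few lines of Python. The hours below are transcribed from Table 3.1 of this worksheet. The function names are hypothetical, and the 1.5 × IQR fences are the usual box-plot convention, assumed here; SPSS computes its box-plot hinges slightly differently, so borderline cases could be classified differently.

```python
import statistics

# Hours for players 1-30, transcribed from Table 3.1.
hours = [2.0, 3.0, 4.0, 4.5, 2.5, 3.0, 2.5, 5.0, 2.0, 1.0, 1.0, 1.5, 1.5, 3.5, 1.5,
         2.0, 2.0, 1.5, 3.5, 3.0, 3.0, 13.0, 4.0, 2.0, 3.0, 4.5, 4.0, 4.0, 5.0, 1.0]


def extreme_values(values, n=5):
    """Return the n lowest and n highest (case number, value) pairs.
    The order among tied values may differ from the SPSS listing."""
    ranked = sorted(enumerate(values, start=1), key=lambda cv: cv[1])
    return ranked[:n], ranked[-n:][::-1]


def iqr_outliers(values, k=1.5):
    """Box-plot rule: flag values beyond k * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(case, v) for case, v in enumerate(values, start=1)
            if v < lower or v > upper]


lowest, highest = extreme_values(hours)
print("Highest:", highest)
print("Lowest:", lowest)
print("Box-plot outliers:", iqr_outliers(hours))
```

The extreme-values list only ranks the observations; it is the box-plot rule that decides whether any of them lies far enough from the bulk of the data to be called an outlier.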

Objective: To detect whether any outlier is present in the given data.

The dataset of 30 players is given in Table 3.1.

Table 3.1: Data of 30 players

S.No. Gender Age Sports Hours


1 1 1 1 2.0
2 1 2 2 3.0
3 2 1 3 4.0
4 1 3 4 4.5
5 1 4 5 2.5
6 2 1 2 3.0
7 1 2 2 2.5
8 1 3 3 5
9 2 2 5 2.0
10 2 2 5 1.0
11 1 2 5 1.0
12 2 2 4 1.5
13 2 3 4 1.5
14 2 1 1 3.5
15 1 2 3 1.5
16 1 3 1 2.0
17 1 3 1 2.0
18 1 3 5 1.5
19 2 1 2 3.5
20 1 1 4 3.0
21 2 2 3 3.0
22 1 2 5 13.0
23 1 2 1 4.0
24 2 3 2 2.0
25 1 3 2 3.0
26 1 2 2 4.5
27 2 3 2 4.0
28 1 2 4 4.0
29 1 3 3 5.0
30 1 1 1 1.0

SPSS Commands

Step 1: Click Analyze → Descriptive Statistics → Explore.

Step 2: Send the 'hours' variable to the Dependent List and then click 'Statistics'.

Step 3: Select 'Outliers' and click 'Continue' as shown in Figure 3.3.

The required output is shown in Table 3.2 and the box plot is shown in Figure 3.4.
Table 3.2: SPSS output of outlier testing

Extreme Values
                        Case Number   Value
hours   Highest   1         22        13.0
                  2          8         5.0
                  3         29         5.0
                  4          4         4.5
                  5         26         4.5
        Lowest    1         30         1.0
                  2         11         1.0
                  3         10         1.0
                  4         15         1.5
                  5         13         1.5(a)

a. Only a partial list of cases with the value 1.5 are shown in the table of lower
extremes.

Figure 3.4: Screenshot of SPSS output (box plot diagram)

Conclusion: Table 3.2 presents the SPSS output of extreme values. It lists the five
highest and the five lowest values of hours in the players' dataset. Case numbers
22, 8, 29, 4 and 26 have the highest values and case numbers 30, 11, 10, 15 and 13
have the lowest values. Figure 3.4 shows that case number 22 is an outlier.

WORKSHEET-4
Test of Difference: One sample T-Test

In many situations, we come across claims made by marketers about their products.
For example, a car manufacturer may claim that the average mileage of a car is,
say, 199.9 kmpl, or a business school may claim that the average package offered to
its students is Rs. 12 lakh per annum. A researcher may be interested in analyzing
the truthfulness of these claims. For this analysis, the researcher needs to
randomly pick a small sample from the population and compare its mean with the
claimed population mean. The sample mean and the population mean may be different
from each other. In order to test whether this difference is statistically
significant, we should apply the one-sample t-test.
The null hypothesis of the one-sample t-test is:
H0: "There is no significant difference between the sample mean and the population mean."
The t-statistic in the one-sample t-test can be estimated by using the following
formula:
$$t = \frac{\bar{X} - \mu}{s / \sqrt{N}}$$
where $\bar{X}$ = sample mean, $\mu$ = population mean, $s$ = standard deviation of
the sample and $N$ = sample size.
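
The formula can be checked by hand with a short sketch. The values are the weight-loss figures as transcribed in Table 4.1 of this worksheet, and `one_sample_t` is a hypothetical helper. Two test values are shown: 0, and the claimed population mean of 5 kg that the conclusion of this worksheet refers to; which of the two is appropriate depends on the hypothesis being tested.

```python
import math
import statistics

# Weight lost by customers 1-50, transcribed from Table 4.1.
weight_lost = [2, 3, 2, 4, 5, 3, 3, 2, 3, 4, 2, 3, 3, 4, 3,
               4, 5, 4, 3, 4, 5, 6, 4, 5, 6, 5, 4, 4, 5, 5,
               6, 2, 5, 5, 4, 4, 3, 4, 3, 4, 5, 4, 3, 4, 5,
               5, 4, 5, 6, 5]


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: population mean equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample standard deviation (n - 1)
    standard_error = s / math.sqrt(n)  # standard error of the mean
    return (mean - mu0) / standard_error, n - 1


t_zero, df = one_sample_t(weight_lost, 0)
t_claim, _ = one_sample_t(weight_lost, 5)
print("mean:", statistics.mean(weight_lost), "df:", df)
print("t against test value 0:", round(t_zero, 3))
print("t against test value 5:", round(t_claim, 3))
```

The sign of t shows the direction of the difference: a negative value means the sample mean lies below the test value.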
Objective: To find out whether the sample mean differs from the population mean.
H0: There is no difference between the population mean and the sample mean.
H1: There is a difference between the population mean and the sample mean.
The weight lost (in kg) by 50 customers a month after joining the weight loss
program is shown in Table 4.1.

Table 4.1: Data of weight lost by 50 customers a month after joining the
weight loss program

S.No. Weight lost S.No. Weight lost S.No. Weight lost S.No. Weight lost


1 2 16 4 31 6 46 5
2 3 17 5 32 2 47 4
3 2 18 4 33 5 48 5
4 4 19 3 34 5 49 6
5 5 20 4 35 4 50 5
6 3 21 5 36 4
7 3 22 6 37 3
8 2 23 4 38 4
9 3 24 5 39 3
10 4 25 6 40 4
11 2 26 5 41 5
12 3 27 4 42 4
13 3 28 4 43 3
14 4 29 5 44 4
15 3 30 5 45 5

SPSS Commands

Step 1: Click Analyze → Compare Means → One-Sample T Test.

Step 2: Transfer the variable 'weight lost' to the Test Variable(s) window.

Step 3: Click 'Options'.

Step 4: Click 'Continue' and then 'OK'.

The final output of SPSS (Statistical Package for the Social Sciences) in tabular
form is shown below in Table 4.2 and Table 4.3 respectively.

Table 4.2
One-Sample Statistics
              N     Mean    Std. Deviation   Std. Error Mean
weightlost    50    4.02        1.116             .158

Table 4.3
One-Sample Test
Test Value = 0
              t        df    Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                                 Lower      Upper
weightlost    25.481   49        .000              4.020          3.70       4.34

When the significance value is less than 0.05, H0 is rejected.

Conclusion: The sample mean is 4.02 kg, which is less than the claimed population
mean of 5 kg. The t-statistic is found to be 25.481 with a p-value of .000. Since the
p-value of the t-statistic is less than the 5% level of significance, with 95%
confidence the null hypothesis of no difference between the sample mean and the
population mean is rejected, and it can be concluded that the sample mean is
significantly different from the population mean. Therefore, the company is making a
wrong statement about the weight loss of its customers.

WORKSHEET-5
Paired Sample T-Test

A paired sample t-test is also known as a repeated sample t-test because the data
(responses) are collected from the same respondents at different time periods. A
paired sample t-test should be used when we want to test the impact of an event or
experiment on the variable under study. In this case, the data is collected from the
same respondents before and after the event, and the two means are then compared.
The null hypothesis of the paired sample t-test is that the means of the pre-sample
and the post-sample are equal. Some of the instances where the paired sample t-test
can be applied are as follows:
a. Analyzing the effectiveness of a training program on the performance of the
employees of a business enterprise.
b. Analyzing the impact of a new advertisement on the sales of a product.
c. Analyzing the impact of a policy on the volatility in the stock market.
d. Analyzing the difference in the responses of the same group to two different
treatments.
Example: The HR manager of a business enterprise wants to analyze the impact of
a training program conducted for 30 employees. The purpose of conducting the
training program was to improve the performance of employees. The performance
scores of the employees are noted before and after the training program. Now, the
paired samples t-test is applied in order to analyze the impact of the training
program.
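
A minimal sketch of the computation behind this example follows. The scores are the pre- and post-training values as transcribed in the data table of this worksheet, so the figures may differ slightly from the SPSS output, and `paired_t` is a hypothetical helper. The test works on the per-employee differences, which is what makes it a paired test.

```python
import math
import statistics

# Pre- and post-training scores of the 30 employees, read column pair by
# column pair from the data table of this worksheet.
pre = [56, 45, 56, 34, 56, 42, 43, 56, 70, 56,
       38, 44, 76, 34, 38, 42, 83, 72, 47, 48,
       65, 53, 49, 42, 53, 58, 34, 43, 45, 65]
post = [82, 76, 78, 64, 62, 60, 68, 69, 78, 87,
        67, 56, 91, 48, 68, 67, 90, 87, 64, 53,
        68, 56, 53, 56, 76, 82, 45, 76, 67, 72]


def paired_t(before, after):
    """Return (mean difference, sd of differences, t, df) for before - after."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    t = mean_d / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t, n - 1


mean_d, sd_d, t, df = paired_t(pre, post)
print("mean difference:", round(mean_d, 3))
print("sd of differences:", round(sd_d, 3))
print("t:", round(t, 3), "df:", df)
```

A negative mean difference for pre minus post means that the scores rose after training.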

Objective: To find out whether performance differs before and after training.

The data is given in Table 5.1.

Table 5.1: Data of the Performance of Employees

Pre-training  Post-training  Pre-training  Post-training  Pre-training  Post-training
56 82 38 67 65 68
45 76 44 56 53 56
56 78 76 91 49 53
34 64 34 48 42 56
56 62 38 68 53 76
42 60 42 67 58 82
43 68 83 90 34 45
56 69 72 87 43 76
70 78 47 64 45 67
56 87 48 53 65 72

SPSS Commands

Step 1: Click Analyze → Compare Means → Paired-Samples T Test.

Step 2: Click on the pre-training score variable. Then click on the post-training
score variable. Now, move the pair into the Paired Variables box by clicking on the
right arrow button. Finally, click on 'OK' as shown in fig4.2.2.

Paired Samples Test
                                    Paired Differences                                              t        df   Sig. (2-tailed)
                                    Mean        Std.        Std. Error   95% Confidence Interval
                                                Deviation   Mean         of the Difference
                                                                         Lower       Upper
Pair 1  pretraining - postraining   -17.36667   9.56460     1.74625      -20.93815   -13.79519     -9.945   29       .000

Paired Samples Statistics
                       Mean      N    Std. Deviation   Std. Error Mean
Pair 1  pretraining    51.4333   30     12.79192          2.33548
        postraining    68.8000   30     12.41634          2.26690

Paired Samples Correlations
                                    N    Correlation   Sig.
Pair 1  pretraining & postraining   30      .712       .000

Since the significance value is .000, which is less than the 5% significance level,
we can state with 95% confidence that the null hypothesis is rejected. Hence there is
a significant difference between performance before and after training.

WORKSHEET 6
Independent-Sample T-Test

When we want to test the difference between two independent sample means, we
use the independent-samples t-test. The independent samples may belong to the same
population or to different populations. Some of the instances in which the
independent-samples t-test can be used are as follows:

1. Testing the difference in the average level of performance between employees
with an MBA degree and employees without an MBA degree.

2. Testing the difference in the average wages received by labor in two different
industries.

3. Testing the difference in the average monthly sales of two firms.

The null hypothesis of the independent-samples t-test is:

H0: "There is no significant difference between the sample means of two
independent groups."

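The t-statistic in the case of independent-samples t-test can be calculated by using
the following formula:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $s_1^2$ and $s_2^2$ are the
sample variances, and $N_1$ and $N_2$ are the sample sizes of the two independent samples.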
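
The pooled-variance formula can be evaluated directly from summary statistics. The sketch below plugs in the group sizes, means and standard deviations printed in the Group Statistics output for gender later in this worksheet; `pooled_t` is a hypothetical helper, not an SPSS function.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variances t-statistic from summary statistics.
    Returns (t, df, standard error of the difference)."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df, standard_error


# Males vs. females, values as printed in the Group Statistics output.
t, df, se = pooled_t(25, 62.48, 19.395, 25, 60.60, 18.949)
print("t:", round(t, 3), "df:", df, "std. error difference:", round(se, 3))
```

Working from summary statistics keeps the example short; with the raw scores the same function could be fed `len`, `statistics.mean` and `statistics.stdev` of each group.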

In SPSS, the independent-samples t-test is conducted in two stages. At stage one,
SPSS compares the variances of the two samples. The statistical method of comparing
two sample variances is known as Levene's test of homogeneity of variance. The null
hypothesis of this test is that the variances are equal, i.e. there is no significant
difference between the sample variances of the two independent samples. On the basis
of Levene's test, SPSS reports two values of the t-statistic, one in the 'Equal
variances assumed' row and one in the 'Equal variances not assumed' row. If Levene's
test is not significant, the 'Equal variances assumed' row should be used; if it is
significant, the 'Equal variances not assumed' row should be used for the final
analysis.

Example 7.2: A researcher is interested in analyzing the difference in the average
performance of the employees of an enterprise across demographic profiles. He
divides the employees on the basis of gender and age group. The data is given in
Table 7.4:

Table 7.4: Average Performance of Employees

Performance Score  Gender  Age  |  Performance Score  Gender  Age
56 Male 34 56 Female 30

60 Female 45 76 Female 34

45 Female 40 78 Male 45

65 Male 60 54 Male 34

73 Male 54 87 Male 23

45 Female 42 67 Female 38

60 Female 55 98 Female 43

34 Male 35 89 Female 72

56 Male 54 54 Male 56

59 Female 39 34 Male 32

35 Female 38 45 Male 26

65 Male 29 56 Female 34

45 Male 60 34 Female 54

58 Female 32 56 Male 34

32 Female 25 76 Female 45

65 Male 23 87 Male 40

34 Male 26 54 Female 60

78 Male 54 98 Male 60

87 Female 42 34 Female 32

90 Male 55 23 Male 25

45 Female 35 45 Female 23

56 Male 54 65 Male 26

76 Female 39 63 Female 30

76 Male 38 68 Male 34

78 Female 29 87 Female 45

SPSS Commands

Step 1: Click Analyze → Compare Means → Independent-Samples T Test.

Step 2: Send the test variable 'Performance_score' to the 'Test Variable(s)'
window. Then, send the 'Gender' variable to 'Grouping Variable' and click 'Define
Groups'.

Step 3: Now, type the codes of gender (1 for males and 2 for females). Next, click
'Continue' as shown in Figure 7.5.

Step 4: Finally, click on 'OK' to get the group statistics and independent-samples
t-test results (shown in Table 7.5 and 7.6, respectively). Now, let us analyze and
interpret the output. Table 7.5 shows that the average performance score for males
is 62.48 with a standard deviation of 19.40 and the average performance score for
females is 60.60 with a standard deviation of 18.95. The difference between the
sample means is found to be very small. Table 7.6 shows that the p-value of
Levene's test for equality of variances is 0.933, which is higher than the 5 per cent
level of significance. Thus, the null hypothesis of equal sample variances cannot be
rejected. The result also shows that the p-value of the t-statistic is 0.730, which
is also higher than the 5 per cent level of significance. Hence, at the 5 per cent
level of significance the null hypothesis of equal performance of male and female
employees cannot be rejected. Thus, it can be concluded from the results of the
independent-samples t-test that the average performance of males and females of the
enterprise does not differ significantly.


Table 7.5: Group Statistics
                     Gender    N    Mean    Std. Deviation   Std. Error Mean
Performance score    male      25   62.48      19.395            3.879
                     female    25   60.60      18.949            3.790

Table 7.6: Independent Samples Test
                               Levene's Test for         t-test for Equality of Means
                               Equality of Variances
                               F       Sig.     t      df       Sig.         Mean         Std. Error   95% Confidence Interval
                                                                (2-tailed)   Difference   Difference   of the Difference
                                                                                                       Lower     Upper
Equal variances assumed        .007    .933    .347    48       .730         1.880        5.423        -9.024    12.784
Equal variances not assumed                    .347    47.974   .730         1.880        5.423        -9.024    12.784

In addition to the instances discussed above, the independent-samples t-test can
also be applied when we have a scale variable (such as age) rather than a category
(age group). In this case, while defining groups, use the 'Cut point' option and give
an appropriate value for the cut point. SPSS divides the respondents into two groups
on the basis of this cut point. For example, if the cut point is 40 in our example
(shown in Figure 7.6), one group consists of respondents aged 40 or above and the
other group of respondents aged below 40. The output analysis remains the same.

The group statistics and independent-samples test results are shown in Table 7.7 and
7.8 respectively.

Table 7.7: Group Statistics
                     Age      N    Mean    Std. Deviation   Std. Error Mean
Performance score    >= 40    22   68.86      19.075            4.067
                     < 40     28   55.79      17.152            3.241

Table 7.8: Independent Samples Test
                               Levene's Test for         t-test for Equality of Means
                               Equality of Variances
                               F       Sig.     t       df       Sig.         Mean         Std. Error   95% Confidence Interval
                                                                 (2-tailed)   Difference   Difference   of the Difference
                                                                                                        Lower     Upper
Equal variances assumed        1.035   .314    2.548    48       .014         13.078       5.133        2.757     23.399
Equal variances not assumed                    2.515    42.741   .016         13.078       5.200        2.588     23.567

WORKSHEET 7

One-Way ANOVA

Concept of ANOVA

Independent-samples t-test can be applied to situations where there are only two
independent samples. In other words, we can use independent-samples t-tests for
comparing the means of two populations (such as males and females). When we
have more than two independent samples, t-test is inappropriate. The Analysis of
Variance (ANOVA) has an advantage over t-test when the researcher wants to
compare the means of a large number of populations (i.e., three or more). ANOVA
is a parametric test that is used to study the difference among more than two
groups in the datasets. It helps in explaining the amount of variation in the dataset.
In a dataset, two main types of variations can occur. One type of variation occurs
due to chance and the other type of variation occurs due to specific reasons. These
variations are studied separately in ANOVA to identify the actual cause of
variation and help the researcher in taking effective decisions.

In case of more than two independent samples, the ANOVA test explains three
types of variance. These are as follows:

• Total variance

• Between group variance

• Within group variance

The ANOVA test is based on the logic that if the between group variance is
significantly greater than the within group variance, it indicates that the means of
different samples are significantly different.
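
The comparison of between-group and within-group variance can be sketched as follows. The group sizes and means are those printed in the homogeneous-subsets output later in this worksheet, and the within-group sum of squares is taken from the ANOVA table there rather than recomputed, because the full dataset is not listed. `one_way_f` is a hypothetical helper.

```python
# (N, mean salary) per qualification, as printed in the Tukey HSD
# homogeneous-subsets output of this worksheet.
groups = {
    "Graduate": (10, 32200.0),
    "Post Graduate": (9, 44000.0),
    "Phd": (9, 50555.5556),
}

# Within-group sum of squares, copied from the ANOVA table.
ss_within = 3699822222.222


def one_way_f(groups, ss_within):
    """Return (F, df_between, df_within, ss_between) from group summaries."""
    n_total = sum(n for n, _ in groups.values())
    grand_mean = sum(n * m for n, m in groups.values()) / n_total
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within, ss_between


f, df_between, df_within, ss_between = one_way_f(groups, ss_within)
print("SS between:", round(ss_between, 1))
print("F:", round(f, 3), "with df", df_between, "and", df_within)
```

The F ratio is simply the between-group mean square divided by the within-group mean square, so a value well above 1 signals that the group means differ by more than chance variation would suggest.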

There are two main types of ANOVA, namely, one-way ANOVA and two-way
ANOVA. One-way ANOVA determines whether all the independent samples
(groups) have the same group mean or not. On the other hand, two-way ANOVA
is used when you need to study the impact of two categorical variables on a scale
variable.

Objective: To find out whether the salaries of graduates, postgraduates and PhDs
differ.

H0: There is no difference between the salaries of graduates, postgraduates and PhDs.

H1: There is a difference between the salaries of graduates, postgraduates and PhDs.

Table 7.1 Data of salaries and qualification

Salary Qualification
65000.00 Postgraduate
60000.00 Postgraduate
45000.00 Graduate
40000.00 Phd
35000.00 Graduate
56000.00 Postgraduate
36000.00 Phd
45000.00 Phd
40000.00 Post graduate
35000.00 Graduate
56000.00 Phd
36000.00 Phd
25000.00 Graduate
23000.00 Graduate
40000.00 Graduate

SPSS Commands

Step 1: Click Analyze → Compare Means → One-Way ANOVA.

Step 2: Transfer the variable 'salary' to the Dependent List window and the variable
'Qualification' to the Factor window.

Step 3: Select 'Post Hoc' and then click 'Tukey' as shown below in Figure 7.3.

Step 4: Click 'Options' and select 'Homogeneity of variance test' and 'Means plot'
as shown below in Figure 7.4.

The final output of SPSS (Statistical Package for the Social Sciences) in tabular
form is shown below in Table 7.2, Table 7.3, Table 7.4, Table 7.5, Table 7.6 and
Figure 7.5 respectively.

Table 7.2 SPSS Output of one-way ANOVA

Descriptives

Dependent Variable: salary

Tukey HSD

(I) Qualification   (J) Qualification   Mean Difference (I-J)   Std. Error    Sig.   95% Confidence Interval
                                                                                     Lower Bound    Upper Bound
Graduate            Post Graduate       -11800.00000            5589.53873    .108   -25722.5914    2122.5914
                    Phd                 -18355.55556*           5589.53873    .008   -32278.1470    -4432.9641
Post Graduate       Graduate            11800.00000             5589.53873    .108   -2122.5914     25722.5914
                    Phd                 -6555.55556             5734.74573    .497   -20839.8330    7728.7219
Phd                 Graduate            18355.55556*            5589.53873    .008   4432.9641      32278.1470
                    Post Graduate       6555.55556              5734.74573    .497   -7728.7219     20839.8330

*. The mean difference is significant at the 0.05 level.

Table 7.3 SPSS Output of one-way ANOVA

Tests of Homogeneity of Variances

                                               Levene Statistic   df1   df2      Sig.
Salary   Based on Mean                              1.450          2    25       .254
         Based on Median                             .421          2    25       .661
         Based on Median and with adjusted df        .421          2    17.231   .663
         Based on trimmed mean                      1.268          2    25       .299

Table 7.4 SPSS Output of one-way ANOVA

ANOVA

Salary
                  Sum of Squares    df   Mean Square      F       Sig.
Between Groups    1654856349.206     2   827428174.603    5.591   .010
Within Groups     3699822222.222    25   147992888.889
Total             5354678571.429    27

Table 7.5 SPSS Output of one-way ANOVA

Multiple Comparisons

Dependent Variable: Salary

Tukey HSD

(I) Qualification   (J) Qualification   Mean Difference (I-J)   Std. Error    Sig.   95% Confidence Interval
                                                                                     Lower Bound    Upper Bound
Graduate            Post Graduate       -11800.00000            5589.53873    .108   -25722.5914    2122.5914
                    Phd                 -18355.55556*           5589.53873    .008   -32278.1470    -4432.9641
Post Graduate       Graduate            11800.00000             5589.53873    .108   -2122.5914     25722.5914
                    Phd                 -6555.55556             5734.74573    .497   -20839.8330    7728.7219
Phd                 Graduate            18355.55556*            5589.53873    .008   4432.9641      32278.1470
                    Post Graduate       6555.55556              5734.74573    .497   -7728.7219     20839.8330

*. The mean difference is significant at the 0.05 level.

Table 7.6 SPSS Output of one-way ANOVA

Salary

Tukey HSD(a,b)

Qualification    N    Subset for alpha = 0.05
                      1             2
Graduate         10   32200.0000
Post Graduate     9   44000.0000    44000.0000
Phd               9                 50555.5556
Sig.                  .112          .486

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 9.310.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type
I error levels are not guaranteed.

Figure 7.5 Screenshot of SPSS Output-Graphical Form

Conclusion: Table 7.2 indicates that the average salary of graduates is 32200, of
postgraduates is 44000 and of PhDs is 50555.5556. This indicates that the average
salary of PhDs is the highest and the average salary of graduates is the lowest.
Table 7.3 presents Levene's test, whose null hypothesis is that all sample variances
are the same. The significance value of 0.254 indicates that at the 5% level of
significance the null hypothesis cannot be rejected. Homogeneity of variance is one
of the desired conditions of the one-way ANOVA test. Table 7.4 presents the results
of the F test in one-way ANOVA. As shown in Table 7.4, the p-value (.010) of the F
statistic (5.591) is less than the 5% level of significance. Hence, with 95%
confidence, the null hypothesis of equal group means is rejected. Thus it can be
concluded that the average salaries of graduates, postgraduates and PhDs are not the
same.

WORKSHEET-8
Chi-Square Test

The chi-square test is one of the most popular non-parametric tests. It is used in
two cases, which are as follows:
• To test the association between nominal variables in research.
• To test the difference between the expected and observed frequencies of an
event.
The chi-square test compares the actual observed frequencies with the calculated
expected frequencies of the different combinations of nominal variables. The
difference between observed and expected frequencies indicates a possible
association between the categorical variables. The chi-square statistic compares the
observed count in each table cell with the count that would be expected for the row
and column classifications under the assumption of no association. A negligible
difference between observed and expected frequencies may indicate no association,
whereas a big difference may indicate the possibility of an association.
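
The comparison of observed and expected counts can be sketched as below. The observed counts are those printed in the crosstabulation output later in this worksheet (rows are education background, columns are level of familiarity), and `chi_square` is a hypothetical helper. The expected count of a cell is its row total times its column total divided by the grand total.

```python
import math

# Observed counts from the crosstabulation output:
# rows = education background 1-4, columns = level of familiarity 1-3.
observed = [
    [13, 7, 15],
    [11, 16, 6],
    [6, 6, 4],
    [3, 8, 5],
]


def chi_square(observed):
    """Return (chi-square, df, expected counts, Cramer's V)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under no association: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    cramers_v = math.sqrt(chi2 / (n * min(len(row_totals) - 1, len(col_totals) - 1)))
    return chi2, df, expected, cramers_v


chi2, df, expected, cramers_v = chi_square(observed)
print("chi-square:", round(chi2, 3), "df:", df)
print("Cramer's V:", round(cramers_v, 3))
print("smallest expected count:", round(min(min(row) for row in expected), 2))
```

The p-value is not computed here because it needs the chi-square distribution, which the Python standard library does not provide.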
Objective: To analyze the association between education background and level of
familiarity with the internet.
H0: There is no significant association between education background and level of
familiarity with the internet.
H1: There is a significant association between education background and level of
familiarity with the internet.
Table 8.2 has the data collected from 100 internet users. The data consists of two
nominal variables, 'Level of familiarity with the internet' and 'Education
Background'. The codes assigned to the sub-categories of these nominal variables are
shown in Table 8.1.

Table 8.1 Codes provided to sub-categories

1. Codes for the variable 'Level of Familiarity with the internet’

Level of Familiarity with the internet

1. Low Familiarity
2. Medium
3. High

2. Codes for the variable ‘Education Background’

Education Background

1. Humanities
2. Management
3. Technology
4. IT

Table 8.2 Data of 100 internet users
S.no.  Level of familiarity with the internet  Education background
1. 3.00 1.00
2. 2.00 3.00
3. 3.00 1.00
4. 3.00 1.00
5. 3.00 4.00
6. 3.00 4.00
7. 3.00 1.00
8. 3.00 1.00
9. 3.00 1.00
10. 3.00 3.00
11. 2.00 1.00
12. 1.00 1.00
13. 3.00 1.00
14. 3.00 1.00
15. 3.00 3.00
16. 2.00 4.00
17. 2.00 2.00
18. 2.00 4.00
19. 2.00 2.00
20. 2.00 4.00
21. 3.00 1.00
22. 3.00 1.00
23. 3.00 4.00
24. 3.00 1.00
25. 3.00 2.00
26. 3.00 2.00
27. 3.00 4.00
28. 3.00 3.00
29. 2.00 2.00
30. 3.00 1.00
31. 1.00 3.00
32. 3.00 2.00
33. 2.00 4.00
34. 3.00 2.00
35. 2.00 2.00
36. 1.00 2.00
37. 2.00 1.00
38. 2.00 4.00
39. 1.00 1.00
40. 2.00 3.00
41. 2.00 2.00
42. 1.00 1.00
43. 2.00 3.00
44. 2.00 4.00
45. 2.00 2.00
46. 3.00 1.00
47. 3.00 3.00
48. 2.00 2.00

49. 3.00 2.00

50. 2.00 2.00
51. 1.00 2.00
52. 2.00 2.00
53. 1.00 4.00
54. 3.00 2.00
55. 2.00 2.00
56. 2.00 4.00
57. 1.00 3.00
58. 1.00 3.00
60. 3.00 1.00
61. 1.00 2.00
62. 1.00 2.00
63. 2.00 2.00
64. 1.00 2.00
65. 2.00 2.00
66. 1.00 2.00
67. 2.00 2.00
68. 2.00 3.00
69. 1.00 2.00
70. 3.00 1.00
71. 2.00 2.00
72. 2.00 3.00
73. 1.00 1.00
74. 2.00 2.00
75. 2.00 2.00

76. 2.00 1.00

77. 1.00 1.00
78. 2.00 3.00
79. 1.00 2.00
80. 1.00 1.00
81. 1.00 1.00
82. 1.00 3.00
83. 1.00 1.00
84. 1.00 1.00
85. 1.00 2.00
86. 1.00 1.00
87. 2.00 1.00
88. 1.00 2.00
89. 2.00 1.00
90. 2.00 4.00
91. 2.00 1.00
92. 1.00 3.00
93. 1.00 4.00
94. 2.00 1.00
95. 1.00 1.00
96. 1.00 3.00
97. 1.00 2.00
98. 1.00 1.00
99. 1.00 1.00
100. 1.00 3.00

SPSS Commands

Step 1: Click Analyze → Descriptive Statistics → Crosstabs.

Step 2: Transfer 'education background' to the Row(s) window and 'familiarity
with the internet' to the Column(s) window. Click 'Statistics' as shown in Figure 8.2.

Step 3: Select 'Chi-square' and 'Phi and Cramer's V' and click 'Continue' as
shown in Figure 8.3.

Step 4: Click on 'Cells', select 'Observed' and 'Expected' and click 'Continue'
as shown in Figure 8.4.

The final output of SPSS (Statistical Package for the Social Sciences) in tabular
form is shown below in Table 8.3, Table 8.4, Table 8.5 and Table 8.6 respectively.

Table 8.3 SPSS Output of chi-square test

Case Processing Summary
                                            Cases
                                            Valid            Missing          Total
                                            N     Percent    N     Percent    N     Percent
Education background * level of
familiarity with the internet               100   100.0%     0     0.0%       100   100.0%

Table 8.4 SPSS Output of chi-square test

Education background * level of familiarity with the internet Crosstabulation
                                               Level of familiarity with the internet   Total
                                               1.00      2.00      3.00
Education background   1.00   Count            13        7         15                    35
                              Expected Count   11.6      13.0      10.5                  35.0
                       2.00   Count            11        16        6                     33
                              Expected Count   10.9      12.2      9.9                   33.0
                       3.00   Count            6         6         4                     16
                              Expected Count   5.3       5.9       4.8                   16.0
                       4.00   Count            3         8         5                     16
                              Expected Count   5.3       5.9       4.8                   16.0
Total                         Count            33        37        30                    100
                              Expected Count   33.0      37.0      30.0                  100.0

Table 8.5 SPSS Output of chi-square test

Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              9.515(a)   6    .147
Likelihood Ratio                10.096     6    .121
Linear-by-Linear Association    .002       1    .963
N of Valid Cases                100

a. 2 cells (16.7%) have expected count less than 5. The minimum expected count is 4.80.

Table 8.6 SPSS Output of chi-square test

Symmetric Measures
                                    Value   Approx. Sig.
Nominal by Nominal   Phi            .308    .147
                     Cramer's V     .218    .147
N of Valid Cases                    100

Conclusion: The p-value (.147) is more than the 5% level of significance, which
indicates that the null hypothesis of no association between education background
and level of familiarity with the internet cannot be rejected.
