
Business Report: Advanced Statistics Module Project I

This document analyzes how educational qualification and occupation affect salary through ANOVA testing. It collects salary data for 40 individuals along with their education level and occupation. ANOVA tests show that mean salary differs significantly by education level but not occupation level. Further analysis using Tukey's test finds that doctorate salaries are significantly higher than other education levels. An interaction plot shows that doctorates earn the highest salaries in professional occupations while high school graduates earn the least in sales. A two-way ANOVA rejects the hypothesis that mean salary is the same across all education and occupation combinations. This suggests that both factors jointly impact salary levels. The implications are that an individual's salary is mainly determined by their education level and occupation.

Uploaded by

Prasad Mohan

Business Report

Advanced Statistics Module Project I

Prasad Mohan
PGPDSBA MAY 21 -A
Date: 15-08-2021

Executive Summary

To understand the dependency of salary on educational qualification and occupation, salaries of 40
individuals are collected and each person’s educational qualification and occupation are noted.
Educational qualification is at three levels, High school graduate, Bachelor, and Doctorate.
Occupation is at four levels, Administrative and clerical, Sales, Professional or speciality, and
Executive or managerial. Salary is hypothesized to depend on educational qualification and
occupation.

Introduction

The purpose of this exercise is to perform ANOVA tests to ascertain whether the mean of the
dependent variable, Salary, differs significantly across the levels of the independent variables,
Education and Occupation.

Data Description

1. Education - object
2. Occupation - object
3. Salary - int64

Sample Dataset

Problem 1A:

1) State the null and the alternate hypothesis for conducting one-way ANOVA for both
Education and Occupation individually.
The hypotheses in this case are as follows:
Case 1:
H0: Mean salary is the same for all educational qualifications
H1: Mean salary differs for at least one educational qualification
Case 2:
H0: Mean salary is the same for all occupation levels
H1: Mean salary differs for at least one occupation level

2) Perform a one-way ANOVA on Salary with respect to Education. State whether the null
hypothesis is accepted or rejected based on the ANOVA results.
The hypotheses are as framed in question 1.
H0: Mean salary is the same for all educational qualifications
H1: Mean salary differs for at least one educational qualification
Assuming a level of significance of 0.05, when the ANOVA is performed in Python using the stats
package, we get the following results:
Since the p-value 1.257709e-08 is less than the alpha 0.05, we reject the null hypothesis. Hence,
mean salary differs for at least one level of education. The same can be observed from the chart
below:

Fig 1: Salary vs education
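
The one-way test described above can be sketched as follows. This is a minimal illustration, not the report's actual code: it assumes the "stats package" is scipy.stats, and the data frame is randomly generated stand-in data with hypothetical level names and group means, so its F statistic and p-value will not match the figures quoted in this report.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in data: NOT the report's 40-row dataset.
rng = np.random.default_rng(0)
assumed_means = {"HS-grad": 75_000, "Bachelors": 165_000, "Doctorate": 210_000}
rows = [
    (edu, rng.normal(mean, 30_000))
    for edu, mean in assumed_means.items()
    for _ in range(12)
]
df = pd.DataFrame(rows, columns=["Education", "Salary"])

# One array of salaries per education level, passed to the F-test.
groups = [g["Salary"].to_numpy() for _, g in df.groupby("Education")]
f_stat, p_value = stats.f_oneway(*groups)

# Decision rule used in the report: reject H0 when p < alpha.
alpha = 0.05
reject_h0 = p_value < alpha
print(f"F = {f_stat:.3f}, p = {p_value:.3g}, reject H0: {reject_h0}")
```

The same call with the data grouped by Occupation instead of Education gives the second one-way test.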

3) Perform a one-way ANOVA on Salary with respect to Occupation. State whether the null
hypothesis is accepted or rejected based on the ANOVA results.
H0: Mean salary is the same for all occupation levels
H1: Mean salary differs for at least one occupation level
Assuming a level of significance of 0.05, when the ANOVA is performed in Python using the stats
package, we get the following results:
Since the p-value 0.458508 is greater than the alpha 0.05, we fail to reject the null hypothesis.
Hence, there is not enough evidence to conclude that mean salary differs across occupation levels.
The chart below shows that the sample means do vary with the occupation level, but this variation
is not statistically significant:

Fig 2: Salary vs occupation level

4) If the null hypothesis is rejected in either (2) or in (3), find out which class means are
significantly different. Interpret the result.
There are three post hoc tests that can be conducted after ANOVA if the null hypothesis is rejected:
a) Tukey's range test (Tukey's HSD)
b) Bonferroni approach
c) Least significant difference test
Tukey’s HSD test is a single-step multiple comparison procedure and statistical test that is carried
out if the null hypothesis is rejected. It can be used to find means that are significantly different
from each other. Since the null hypothesis was rejected only for Education in (2), the test is
applied to the education levels.
Based on the results, all three pairwise comparisons have a reject value of True, i.e. the mean
salary differs significantly between every pair of education levels. This confirms our results from
the ANOVA.
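
A Tukey HSD comparison of this kind can be sketched with statsmodels. The choice of `pairwise_tukeyhsd` is an assumption (the report does not name the function it used), and the data below are randomly generated stand-ins with hypothetical level names, so the printed table will not reproduce the report's results.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical stand-in data: NOT the report's dataset.
rng = np.random.default_rng(1)
assumed_means = {"HS-grad": 75_000, "Bachelors": 165_000, "Doctorate": 210_000}
rows = [
    (edu, rng.normal(mean, 30_000))
    for edu, mean in assumed_means.items()
    for _ in range(12)
]
df = pd.DataFrame(rows, columns=["Education", "Salary"])

# Compare every pair of education levels at a family-wise alpha of 0.05.
tukey = pairwise_tukeyhsd(endog=df["Salary"], groups=df["Education"], alpha=0.05)

# One row per pair; the "reject" column flags significantly different means.
print(tukey.summary())
n_pairs = len(tukey.reject)
```

With three education levels there are three pairs, which corresponds to the three rows mentioned above.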

Problem 1B:

1) What is the interaction between two treatments? Analyze the effects of one variable on the
other (Education and Occupation) with the help of an interaction plot. [Hint: use the ‘pointplot’
function from the ‘seaborn’ library]
The graph generated depicting the effect of one variable on the other is as follows:

Fig 3: Salary vs occupation level

From the above graph, it becomes evident that education and occupation interact in their effect on
salary, i.e. the effect of education level on salary changes with the occupation. Doctorate holders
earn the highest salary in the Prof-speciality occupation level, while HS grads earn the least in
the Sales level. Salary also rises with education level, and it varies to some extent with
occupation.
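
An interaction plot of this kind can be sketched with seaborn's `pointplot`, as the hint in the question suggests. The data, level names, and column names below are hypothetical stand-ins, not the report's dataset. The table of cell means printed first is what the plot draws: one line per education level across the occupations.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data: NOT the report's dataset.
rng = np.random.default_rng(2)
educations = ["HS-grad", "Bachelors", "Doctorate"]
occupations = ["Adm-clerical", "Sales", "Prof-specialty", "Exec-managerial"]
rows = [
    (edu, occ, rng.normal(100_000 + 50_000 * i, 25_000))
    for (i, edu), occ in itertools.product(enumerate(educations), occupations)
    for _ in range(4)
]
df = pd.DataFrame(rows, columns=["Education", "Occupation", "Salary"])

# Mean salary for each Occupation x Education cell.
cell_means = df.groupby(["Occupation", "Education"])["Salary"].mean().unstack("Education")
print(cell_means.round(0))

# Interaction plot: occupation on the x-axis, one line per education level.
ax = sns.pointplot(data=df, x="Occupation", y="Salary", hue="Education")
ax.set_title("Mean salary by occupation and education (illustrative data)")
```

Lines that cross or diverge in such a plot suggest an interaction, while roughly parallel lines suggest that the two factors act independently.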

2) Perform a two-way ANOVA based on Salary with respect to both Education and
Occupation (along with their interaction Education*Occupation). State the null and
alternative hypotheses and state your results. How will you interpret this result?
The null and alternate hypotheses are as follows:
H0: Mean salary is the same for all levels of education and occupation
H1: Mean salary differs for at least one level of education and occupation
The alpha is assumed to be 0.05.
Upon conducting the test in Python, the p-value is less than the alpha, so the null hypothesis is
rejected.
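
A two-way ANOVA with an interaction term can be sketched with the statsmodels formula API. The use of `ols` and `anova_lm` is an assumption about tooling, and the data are randomly generated stand-ins with hypothetical level names, so the resulting table is illustrative only.

```python
import itertools

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical stand-in data: NOT the report's dataset.
rng = np.random.default_rng(3)
educations = ["HS-grad", "Bachelors", "Doctorate"]
occupations = ["Adm-clerical", "Sales", "Prof-specialty", "Exec-managerial"]
rows = [
    (edu, occ, rng.normal(100_000 + 50_000 * i, 25_000))
    for (i, edu), occ in itertools.product(enumerate(educations), occupations)
    for _ in range(4)
]
df = pd.DataFrame(rows, columns=["Education", "Occupation", "Salary"])

# Main effects for both factors plus their interaction.
formula = "Salary ~ C(Education) + C(Occupation) + C(Education):C(Occupation)"
model = smf.ols(formula, data=df).fit()
table = anova_lm(model, typ=2)
print(table)

# One p-value per term; the Residual row has none and is dropped.
alpha = 0.05
significant = table["PR(>F)"].dropna() < alpha
print(significant)
```

The table reports a separate F statistic and p-value for Education, for Occupation, and for the Education:Occupation interaction, so each term can be compared against alpha on its own.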
3) Explain the business implications of performing ANOVA for this particular case study.
From the ANOVA tests conducted individually for education and occupation, we can conclude that
the salary is mainly determined by the education level, though occupation has some effect on salary.
From the two-way ANOVA on Salary with respect to education, occupation and the combined effect of
both, we can conclude that mean salary varies across the different combinations of education and
occupation levels.
******
