Business Report: Advanced Statistics Module Project I
Prasad Mohan
PGPDSBA MAY 21 -A
Date: 15-08-2021
Executive Summary
Introduction
The purpose of this exercise is to perform ANOVA tests to ascertain whether mean salary differs
significantly across the levels of the independent variables, Education and Occupation.
Data Description
1. Education - object
2. Occupation - object
3. Salary - int64
Sample Dataset
Problem 1A:
1) State the null and the alternate hypothesis for conducting one-way ANOVA for both
Education and Occupation individually.
The hypotheses in this case are as follows:
Case 1:
H0: Mean salary is same for any educational qualification
H1: Mean salary differs for at least one educational qualification
Case 2:
H0: Mean salary is same for any occupation level
H1: Mean salary differs for at least one occupation level
2) Perform a one-way ANOVA on Salary with respect to Education. State whether the null
hypothesis is accepted or rejected based on the ANOVA results.
The hypotheses are as framed in question 1.
H0: Mean salary is same for any educational qualification
H1: Mean salary differs for at least one educational qualification
Assuming a level of significance of 0.05, performing the ANOVA with the stats package in
Python gives the following results:
Since the p-value 1.257709e-08 is less than the alpha 0.05, we reject the null hypothesis. Hence,
mean salary differs for at least one level of education. The same can be observed from the chart
below:
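The one-way test described above can be sketched as follows. This is a minimal sketch, assuming the statsmodels formula API; the report does not name the stats package it used, and the salary values below are made up for illustration and are not the report's data.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Toy data for illustration only; these are NOT the report's salaries.
df = pd.DataFrame({
    "Education": ["HS-grad"] * 4 + ["Bachelors"] * 4 + ["Doctorate"] * 4,
    "Salary": [50, 55, 60, 52, 150, 160, 155, 170, 200, 210, 220, 205],
})

# Fit Salary against Education treated as a categorical factor.
model = ols("Salary ~ C(Education)", data=df).fit()

# ANOVA table with columns df, sum_sq, mean_sq, F and PR(>F).
aov_table = anova_lm(model, typ=1)
p_value = aov_table.loc["C(Education)", "PR(>F)"]

alpha = 0.05
reject_h0 = p_value < alpha  # True means at least one group mean differs
print(aov_table)
print("Reject H0:", reject_h0)
```

The decision rule is the same one used in the text: reject H0 when the p-value in the PR(>F) column is below alpha.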
3) Perform a one-way ANOVA on Salary with respect to Occupation. State whether the null
hypothesis is accepted or rejected based on the ANOVA results.
H0: Mean salary is same for any occupation level
H1: Mean salary differs for at least one occupation level
Assuming a level of significance of 0.05, performing the ANOVA with the stats package in
Python gives the following results:
Since the p-value of 0.458508 is greater than the alpha of 0.05, we fail to reject the null
hypothesis. Hence, there is not enough evidence to conclude that mean salary differs across
occupation levels. The chart below shows that the sample means do vary somewhat with occupation
level, but these differences are not statistically significant:
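To illustrate how sample means can differ while the test still fails to reject H0, here is a small sketch using scipy.stats.f_oneway. The group labels (Occ-A, Occ-B, Occ-C) and salaries are hypothetical and chosen so that the spread within each group is large relative to the gaps between group means; they are not the report's data.

```python
from scipy.stats import f_oneway

# Made-up salaries for three hypothetical occupation groups.
groups = {
    "Occ-A": [100, 200, 150, 250],
    "Occ-B": [120, 220, 170, 230],
    "Occ-C": [110, 190, 160, 260],
}

# The sample means are not identical...
sample_means = {name: sum(vals) / len(vals) for name, vals in groups.items()}

# ...but the one-way ANOVA compares between-group to within-group variation.
f_stat, p_value = f_oneway(*groups.values())

alpha = 0.05
fail_to_reject = p_value > alpha
print(sample_means)
print("F =", f_stat, "p =", p_value, "fail to reject H0:", fail_to_reject)
```

Here the between-group variation is small compared with the variation inside each group, so the F statistic is close to zero and the differences in sample means are not statistically significant.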
4) If the null hypothesis is rejected in either (2) or in (3), find out which class means are
significantly different. Interpret the result.
Three post hoc tests can be conducted after ANOVA if the null hypothesis is rejected:
a) Tukey's range test (HSD)
b) Bonferroni approach
c) Least significant difference test
Tukey’s HSD test is a single-step multiple comparison procedure and statistical test that is carried
out if the null hypothesis is rejected. It can be used to find means that are significantly different
from each other.
Based on the results, all three pairwise comparisons have a reject value of True, so every pair of
education levels differs significantly in mean salary. This confirms our results from ANOVA.
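A Tukey HSD comparison of this kind can be sketched as below. This assumes the pairwise_tukeyhsd function from statsmodels, which may or may not be what the report used, and it runs on made-up toy values rather than the report's data.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Toy data for illustration only; these are NOT the report's salaries.
df = pd.DataFrame({
    "Education": ["HS-grad"] * 4 + ["Bachelors"] * 4 + ["Doctorate"] * 4,
    "Salary": [50, 55, 60, 52, 150, 160, 155, 170, 200, 210, 220, 205],
})

# Compare every pair of education levels at a family-wise alpha of 0.05.
tukey = pairwise_tukeyhsd(endog=df["Salary"], groups=df["Education"], alpha=0.05)
print(tukey.summary())

# One boolean per pairwise comparison; True means the pair differs significantly.
all_pairs_differ = bool(tukey.reject.all())
print("All pairs differ:", all_pairs_differ)
```

With three education levels there are three pairwise comparisons, which matches the three rows referred to in the text.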
Problem 1B:
1) What is the interaction between two treatments? Analyze the effects of one variable on the
other (Education and Occupation) with the help of an interaction plot. [Hint: use the
‘pointplot’ function from the ‘seaborn’ library]
The graph generated to depict the effect of one variable on the other is as follows:
From the above graph, it is evident that education and occupation interact in their effect on
salary. Doctorate holders earn the highest salary in the Prof-speciality occupation, while HS-grads
earn the least in Sales. Overall, salary tends to increase with education level, and the pattern
across occupations is not the same at every education level, which indicates some interaction
between the two factors.
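An interaction plot of the kind described above can be sketched with seaborn's pointplot, as the question's hint suggests. The values and the two occupation labels below are toy assumptions for illustration, not the report's data; the non-interactive Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Toy data: two made-up observations per Education x Occupation cell.
df = pd.DataFrame({
    "Education": ["HS-grad"] * 4 + ["Bachelors"] * 4 + ["Doctorate"] * 4,
    "Occupation": ["Sales", "Sales", "Prof-specialty", "Prof-specialty"] * 3,
    "Salary": [50, 60, 70, 80, 150, 160, 140, 150, 180, 190, 250, 260],
})

# Cell means are the points that the interaction plot draws.
cell_means = df.pivot_table(index="Education", columns="Occupation",
                            values="Salary", aggfunc="mean")
print(cell_means)

# One line per occupation; non-parallel lines suggest an interaction.
ax = sns.pointplot(data=df, x="Education", y="Salary", hue="Occupation")
ax.set_title("Mean Salary by Education and Occupation (toy data)")
```

If the lines for the different occupations are roughly parallel, there is little interaction; lines that cross or diverge suggest that the effect of education on salary depends on occupation.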
2) Perform a two-way ANOVA based on Salary with respect to both Education and
Occupation (along with their interaction Education*Occupation). State the null and
alternative hypotheses and state your results. How will you interpret this result?
The null and alternate hypotheses are as follows:
H0: Mean salary is the same for all combinations of education and occupation levels
H1: Mean salary differs for at least one combination of education and occupation levels
The alpha is assumed to be 0.05.
Upon conducting the test in Python, the p-value is less than the alpha, so the null hypothesis is
rejected.
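A two-way ANOVA with an interaction term can be sketched as follows. This is a minimal sketch, assuming the statsmodels formula API and a Type II table; the report does not state which options it used, and the values below are toy data, not the report's.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Toy data: two made-up observations per Education x Occupation cell.
df = pd.DataFrame({
    "Education": ["HS-grad"] * 4 + ["Bachelors"] * 4 + ["Doctorate"] * 4,
    "Occupation": ["Sales", "Sales", "Prof-specialty", "Prof-specialty"] * 3,
    "Salary": [50, 60, 70, 80, 150, 160, 140, 150, 180, 190, 250, 260],
})

# "A * B" expands to both main effects plus the A:B interaction.
model = ols("Salary ~ C(Education) * C(Occupation)", data=df).fit()
aov_table = anova_lm(model, typ=2)
print(aov_table)

# Each effect has its own p-value, compared against alpha separately.
alpha = 0.05
effects = ["C(Education)", "C(Occupation)", "C(Education):C(Occupation)"]
decisions = {e: bool(aov_table.loc[e, "PR(>F)"] < alpha) for e in effects}
print(decisions)
```

The table has one row per effect, so the two main effects and the interaction are each tested with their own p-value.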
3) Explain the business implications of performing ANOVA for this particular case study.
From the one-way ANOVA tests conducted individually for education and occupation, we can conclude
that salary is mainly determined by education level; the effect of occupation on its own was not
statistically significant at the 0.05 level.
From the two-way ANOVA on Salary with respect to education, occupation and their interaction, we
can conclude that mean salary varies across combinations of education and occupation levels.