Assignment 2 Solutions
Question 1 (2 marks)
Which department in the company is the largest in staffing? Include a pivot table to support
your answer.
The Research and Development department has the largest staffing, with 961 employees.
Question 2 (2 marks)
Calculate the attrition rate for new staff, i.e. those who have been with the company for less
than one year. Show your working.
OR
Count of YearsAtCompany    Attrition
YearsAtCompany             No        Yes       Grand Total
0                          63.64%    36.36%    100.00%
Grand Total                63.64%    36.36%    100.00%
Question 3 (2 marks)
Compare the attrition rate based on the different genders. Explain briefly and provide relevant
evidence in your discussions.
Count of Employee ID    Attrition
Gender                  No        Yes       Grand Total
Female                  85.20%    14.80%    100.00%
Male                    82.99%    17.01%    100.00%
Grand Total             83.88%    16.12%    100.00%
Overall, the attrition rate for male employees (17.01%) is greater than that for female
employees (14.80%).
Question 4 (3 marks)
Suppose that we randomly select 15 employees. What is the probability that at least 5 of them
will leave the company? Define the distribution and show all working.
n = 15; p = P(Attrition = yes) = 0.1612
Y ~ Bin(15,0.1612)
P(Y >= 5) = 1 - P(Y <= 4)
=1-BINOM.DIST(4,15,0.1612,TRUE)
= 0.0799 or 7.99%
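The Excel calculation above can be cross-checked outside the spreadsheet. The sketch below is one possible way to do so, assuming Python's standard library only; the helper name binom_cdf is illustrative and not part of the assignment.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(Y <= k) for Y ~ Bin(n, p) by summing the individual terms."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 15, 0.1612  # values from the solution: 15 employees, P(Attrition = Yes)

# P(Y >= 5) is the complement of P(Y <= 4), mirroring 1 - BINOM.DIST(4, 15, 0.1612, TRUE).
prob_at_least_5 = 1 - binom_cdf(4, n, p)
print(f"P(Y >= 5) = {prob_at_least_5:.4f}")
```

The complement is taken at 4, not 5, because "at least 5" excludes the outcomes 0 to 4 inclusive.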
Question 5 (3 marks)
What is the likelihood that a staff employed in Research & Development will leave the
company? Describe two different ways to obtain this answer using Pivot tables. Show your
working.
P(Attrition = Yes|Research & Development) = 13.84%
Two Ways of calculating:
1) Using the Grand Total Table
Count of Employee ID      Attrition
Department                No        Yes       Grand Total
Human Resources           3.47%     0.82%     4.29%
Research & Development    56.33%    9.05%     65.37%
Sales                     24.08%    6.26%     30.34%
Grand Total               83.88%    16.12%    100.00%
P(Attrition = Yes | Research & Development) = 9.05% / 65.37% = 13.84%
Question 6 (4 marks)
The HR Director claims that employees from Research & Development are less likely to leave
the company than their colleagues from other departments. Based on the data provided, do you
agree with the HR Director? Provide a clear explanation, including a pivot table to support
your answer.
Yes, I agree with the HR Director because P(Attrition = Yes | R&D) = 13.84% < P(Attrition
= Yes | HR) = 19.05% < P(Attrition = Yes | Sales) = 20.63%. (The answer can also be a sentence
without probability statements, as long as the values are mentioned.)
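The three conditional probabilities quoted for Questions 5 and 6 can be reproduced from raw counts. The sketch below assumes the (No, Yes) counts per department are the observed frequencies listed in the chi-square table of Question 7; the dictionary and function names are illustrative only.

```python
# Observed (No, Yes) attrition counts per department.
# Assumption: these are the observed frequencies from the Question 7 chi-square table.
counts = {
    "Human Resources": (51, 12),
    "Research & Development": (828, 133),
    "Sales": (354, 92),
}


def attrition_rate(no: int, yes: int) -> float:
    """P(Attrition = Yes | department) = leavers / department headcount."""
    return yes / (no + yes)


rates = {dept: attrition_rate(no, yes) for dept, (no, yes) in counts.items()}
for dept, rate in rates.items():
    print(f"{dept}: {rate:.2%}")
```

Dividing within each department row is equivalent to showing the pivot table values as a percentage of the row total.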
Question 7 ( 1 + 7 + 1 = 9 marks)
a) State two methods you have learnt which can be used to investigate if employee 'Attrition'
(Column C) and your variable of choice (Variable 2) are independent. Do not show any
calculations. (Word Limit: 50 words)
The two methods are using Probability Concepts and the Chi-Square Test of Independence.
b) Use the data provided and apply both methods mentioned in your answer above to
investigate if employee 'Attrition' (Column C) and your variable of choice (Variable 2) are
independent. Remember to apply all workings as shown in tutorials and lecture examples.
The below solution is based on the variables ATTRITION and DEPARTMENT
(Variable 2). If you have chosen a different variable 2, then follow similar steps as
outlined below. The numbers and calculations are provided in the Assignment 2
Solutions.xlsx file
PROBABILITY CONCEPTS
Count of Employee ID      Attrition
Department                No        Yes       Grand Total
Human Resources           3.47%     0.82%     4.29%
Research & Development    56.33%    9.05%     65.37%
Sales                     24.08%    6.26%     30.34%
Grand Total               83.88%    16.12%    100.00%
P(Attrition = Yes|HR) = P(Attrition = Yes)?
P(Attrition = Yes|HR) = 19.05%
P(Attrition = Yes) = 16.12%
P(Attrition = Yes|HR) ≠ P(Attrition = Yes)
Therefore 'Attrition' and the 'Department' to which the employees belong are not
independent.
CHI-SQUARE TEST OF INDEPENDENCE
Observed frequencies
fo      fe          (fo − fe)² / fe
51      52.84286    0.064268
12      10.15714    0.334358
828     806.0633    0.597001
133     154.9367    3.105915
354     374.0939    1.079312
92      71.90612    5.615153
Total = 10.8
As all expected frequencies are greater than 5, it is appropriate to use the Chi-Square
distribution.
Step 1: Hypotheses
H0: Attrition and Department are independent
H1: Attrition and Department are not independent
χ²calc = Σ (fo − fe)² / fe = 10.8
Degrees of freedom = (r − 1) × (c − 1) = (3 − 1) × (2 − 1) = 2 × 1 = 2
p-value = P(χ² > 10.8) = CHISQ.DIST.RT(10.8, 2) = 0.0045
Step 5: Conclusion
Therefore, the variables Attrition and Department are not independent OR are
dependent OR there is a relationship between Attrition and Department
OR
As all expected frequencies are greater than 5, it is appropriate to use the Chi-Square
distribution.
Step 1: Hypotheses
H0: Attrition and Department are independent
H1: Attrition and Department are not independent
Step 3: p-value
Step 5: Conclusion
Therefore, the variables Attrition and Department are not independent OR are
dependent OR there is a relationship between Attrition and Department
Yes, both methods conclude that there is a relationship between Attrition and
Department because the two probabilities are different and the chi square test of
independence confirms this.
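The chi-square working for Attrition and Department can be reproduced as a rough check. The sketch below uses only the Python standard library and assumes the observed counts are laid out as three departments (rows) by two attrition outcomes (columns), which gives (3 − 1) × (2 − 1) = 2 degrees of freedom. The p-value uses the closed form exp(−x/2), which holds only for 2 degrees of freedom; a different Variable 2 would need a general chi-square routine.

```python
from math import exp

# Observed counts. Rows: Human Resources, Research & Development, Sales.
# Columns: Attrition = No, Attrition = Yes.
observed = [[51, 12], [828, 133], [354, 92]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail probability of a chi-square variable; this shortcut is valid for df == 2 only.
if df != 2:
    raise ValueError("closed-form p-value below applies to df = 2 only")
p_value = exp(-chi2 / 2)

print(f"chi2 = {chi2:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The smallest expected count is also available from expected, which allows the "all expected frequencies greater than 5" condition to be checked directly.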
Question 8 (4 marks)
Use an appropriate measure to compare the average monthly income for male and female
employees. Discuss the best measure used for this comparison and why you have chosen this
measure. Include relevant evidence for your answer.
Female Male
For both males and females, mean > median, therefore the distribution of monthly
income is right skewed, which indicates that there are outliers. Since the mean is
sensitive to outliers, the median is a better measure of central tendency. So, based
on the median, the average monthly income for male employees is lower than
that for female employees.
Question 9 (3 marks)
Compare the relative dispersion of monthly salaries for male and female employees. Which
measure have you chosen to do the comparison, and why have you chosen this measure?
Question 10 ( 4 + 5 = 9 marks)
We would like to estimate the average income per month for all female/male employees in the
company.
a) Discuss 2 ways of estimating the average income per month for
all female/male employees.
The two ways of estimating the average income per month for all male/female
employees are:
• point estimate, which is the sample mean income per month for males/females
• interval estimate, which provides a range of possible values within which the average
income per month for males/females can fall
b) Using both the methods you outlined in your answer above, estimate the average income
per month for all female/male employees. Show all working.
Male
Point Estimate
x̄ = $6380.5079
Interval Estimate (95% CI)
Let X = monthly male salary ($)
Invoke CLT as the sample size 882 > 30, so X̄ ~ N(μ, σ²/n)
Since σ is unknown, the confidence interval for the mean is x̄ ± t(α/2, n−1) × s/√n
α = 0.05; df = n − 1 = 881
n = 882
s = 4714.8566
t critical value = t(n−1, α/2) = t(881, 0.05/2) = 1.963 (using Excel) or 1.960 (tables)
95% Confidence interval for the mean:
= 6380.5079 ± 1.963 × 4714.8566/√882
= ($6069.03, $6692.10)
CI t Lower Limit Upper Limit
99% 2.581 $5970.69 $6790.33
90% 1.647 $6119.10 $6641.92
Female
Point Estimate
x̄ = $6686.5663
Interval Estimate (95% CI)
Let X = monthly female salary ($)
Invoke CLT as the sample size 588 > 30, so X̄ ~ N(μ, σ²/n)
Since σ is unknown, the confidence interval for the mean is x̄ ± t(α/2, n−1) × s/√n
α = 0.05; df = n − 1 = 587
n = 588
s = 4695.6085
t critical value = t(n−1, α/2) = t(587, 0.05/2) = 1.964 (using Excel) or 1.960 (tables)
95% Confidence interval for the mean:
= 6686.5663 ± 1.964 × 4695.6085/√588
= ($6306.25, $7066.89)
CI t Lower Limit Upper Limit
99% 2.584 $6186.15 $7186.99
90% 1.648 $6367.55 $7005.59
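The interval arithmetic above can be checked with a few lines of code. The sketch below assumes the t critical value is supplied by hand (here the Excel value quoted for the female sample), since the Python standard library has no t-distribution quantile function; the function name t_interval is illustrative.

```python
from math import sqrt


def t_interval(mean: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) for mean ± t_crit × s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin


# Female sample figures as quoted in the solution; t_crit is the Excel value for df = 587.
lower, upper = t_interval(mean=6686.5663, s=4695.6085, n=588, t_crit=1.964)
print(f"95% CI: (${lower:.2f}, ${upper:.2f})")
```

Because the critical value is rounded to three decimals, the limits can differ from the Excel output in the last cent. Swapping in the 99% or 90% critical values from the table reproduces the other rows in the same way.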
Question 11 (5 + 2 + 2 = 9 marks)
a) The Human Resource (HR) Manager is concerned about the attrition rate in the HR
department. He claims that the attrition rate in the HR department is higher than the
overall attrition rate for all departments which is approximately 16%. Based on the sample
data, is there evidence to support the claim at the 5% level of significance?
nπ = 63*0.16 = 10.08
n(1-π) = 63*(1-0.16) = 52.92
Step 1: Hypothesis
H0: π ≤ 16%
H1: π > 16%
Z = (p − π) / √(π × (1 − π) / n)
Zcalc = (0.1905 − 0.16) / √(0.16 × (1 − 0.16) / 63) = 0.66
Critical value
Zcrit = Zɑ = 1.645
OR
p-value
OR
Step 5: Conclusion
Type II Error:
We conclude that the attrition rate in the HR department is not higher than the overall
attrition rate for all departments, when it actually is
Type II Error:
The manager might not focus on the problem of attrition in HR when there is a need to
do so. Resources are wasted elsewhere when they should be channelled into the HR
department.
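The one-proportion Z test in part a) can be sketched in code as a cross-check. The example below assumes a right-tailed test (H1: π > 0.16) at the 5% level and builds the standard normal CDF from math.erf; the variable names are illustrative and the inputs are the figures quoted in the solution.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n = 63           # HR sample size
p_hat = 0.1905   # sample attrition rate in HR, as quoted in the solution
pi_0 = 0.16      # hypothesised overall attrition rate

# Test statistic: (p − π) / sqrt(π(1 − π)/n)
z_calc = (p_hat - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)

# Right-tailed test: compare with Z_alpha = 1.645, or equivalently p-value with 0.05.
z_crit = 1.645
p_value = 1 - normal_cdf(z_calc)
reject_h0 = z_calc > z_crit

print(f"Zcalc = {z_calc:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Under these inputs the statistic falls below the critical value, so H0 is not rejected, which is the situation in which a Type II error becomes possible.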
Once you have filtered for your sample, copy and paste the sample into a new worksheet. Use
this new worksheet to answer the following question.
Question: We wish to explore the relationship between Work Experience (Total Working
Years) and Monthly Income ($).
Based on your chosen sample, provide a brief report on the relationship between the two
variables. Use ALL relevant simple linear regression tools which you have learned in Lecture
9 and Tutorial 10.
The below solution is based on Sample A. The scatterplots and regression outputs for
the other Samples are in the Assignment 2 Solutions.xlsx excel file. Follow similar steps
as outlined below for any chosen sample.
[Scatterplot, Sample A: Monthly Income ($) against Work Experience (Total Working Years)]
- The data points are clustered around a line indicating that the relationship is
linear
- The relationship is positive because as Work Experience (X) increases,
Monthly Income (Y) increases as well
- The data points are closely clustered indicating a strong relationship between
Work Experience and Monthly income
Coefficient of determination
R² is 50.00%.
50% of the variation in Monthly Income is explained by the regression model. 50%
of the variation in Monthly Income is left unexplained by the model. Hence the model has a
moderate fit.
r = 0.71. There is a strong, positive, linear relationship between Monthly Income and
Work Experience
For every extra year of work experience, the estimated Monthly Income increases by
$469.93 on average.
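The correlation coefficient and the coefficient of determination quoted above are linked: in simple linear regression, |r| = √R² and r takes the sign of the slope. A minimal sketch of that relationship, using the Sample A figures from the text:

```python
from math import copysign, sqrt

r_squared = 0.50   # coefficient of determination for Sample A
slope = 469.93     # estimated slope; only its sign matters here

# r has the magnitude sqrt(R²) and the same sign as the slope.
r = copysign(sqrt(r_squared), slope)
print(f"r = {r:.2f}")
```

This identity applies only to regression with a single predictor; with several predictors R² no longer determines a single correlation coefficient.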
Testing significance of the variable (can perform a right tail test as well)
Step 1: Hypotheses
H0: β1 = 0
H1: β1 ≠ 0
α = 0.05/0.01/0.1
Step 3: P-value
p-value = 0
Step 5: Conclusion