Assignment 2 Solutions

Business Statistics (Monash University)

Downloaded by Bibi Yusra Ruhomally ([email protected])

ASSIGNMENT 2 SOLUTIONS
Question 1 (2 marks)
Which department in the company is the largest in staffing? Include a pivot table to support
your answer.
The Research & Development department has the largest staffing, with 961 employees.

Department               Count of Employee ID
Human Resources                            63
Research & Development                    961
Sales                                     446
Grand Total                              1470

Question 2 (2 marks)
Calculate the attrition rate for new staff, i.e. those who have been with the company for less
than one year. Show your working.

Attrition rate = 16/44 = 0.3636, or about 36%

OR

Count of YearsAtCompany, by Attrition (% of row total)

YearsAtCompany   No       Yes      Grand Total
0                63.64%   36.36%   100.00%
Grand Total      63.64%   36.36%   100.00%

Question 3 (2 marks)
Compare the attrition rate based on the different genders. Explain briefly and provide relevant
evidence in your discussions.

Count of Employee ID, by Attrition (% of row total)

Gender        No       Yes      Grand Total
Female        85.20%   14.80%   100.00%
Male          82.99%   17.01%   100.00%
Grand Total   83.88%   16.12%   100.00%

Overall, the attrition rate for male employees (17.01%) is higher than that for female
employees (14.80%), a difference of about 2.2 percentage points.

Question 4 (3 marks)
Suppose that we randomly select 15 employees. What is the probability that at least 5 of them
will leave the company? Define the distribution and show all working.
Let Y = number of employees, out of the 15 selected, who leave the company.
n = 15; p = P(Attrition = Yes) = 0.1612
Y ~ Bin(15, 0.1612)
P(Y ≥ 5) = 1 − P(Y ≤ 4)
= 1 − BINOM.DIST(4, 15, 0.1612, TRUE)
= 0.0799 or 7.99%
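
The same calculation can be checked outside Excel. The following is a minimal Python sketch, assuming the values n = 15 and p = 0.1612 from the working above; the helper name `binom_cdf` is illustrative and plays the role of BINOM.DIST with the cumulative flag set to TRUE.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(Y <= k) for Y ~ Bin(n, p) by summing the individual terms."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 15, 0.1612  # sample size and overall attrition rate from the working above

# "At least 5" is the complement of "at most 4".
prob_at_least_5 = 1 - binom_cdf(4, n, p)
print(f"P(Y >= 5) = {prob_at_least_5:.4f}")
```

The complement is taken at 4, not 5, because P(Y ≥ 5) excludes exactly the outcomes 0, 1, 2, 3 and 4.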

Question 5 (3 marks)
What is the likelihood that a staff employed in Research & Development will leave the
company? Describe two different ways to obtain this answer using Pivot tables. Show your
working.
P(Attrition = Yes|Research & Development) = 13.84%
Two Ways of calculating:
1) Using the Grand Total Table
Count of Employee ID, by Attrition (% of grand total)

Department               No       Yes      Grand Total
Human Resources          3.47%    0.82%    4.29%
Research & Development   56.33%   9.05%    65.37%
Sales                    24.08%   6.26%    30.34%
Grand Total              83.88%   16.12%   100.00%

P(Attrition = Yes | R&D) = P(Attrition = Yes and R&D) / P(R&D) = 9.05% / 65.37% = 13.84%

2) Using Row/Column Total Table


Count of Employee ID, by Attrition (% of row total)

Department               No       Yes      Grand Total
Human Resources          80.95%   19.05%   100.00%
Research & Development   86.16%   13.84%   100.00%
Sales                    79.37%   20.63%   100.00%
Grand Total              83.88%   16.12%   100.00%

P(Attrition = Yes | R&D) = 13.84%
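
Both routes give the same conditional probability, which can be confirmed from the raw counts. The following is a minimal Python sketch, assuming the observed counts listed in the chi-square working for Question 7(b) (for example, 828 "No" and 133 "Yes" in Research & Development); the variable names are illustrative.

```python
# Observed counts of employees by department and attrition status
# (taken from the observed frequencies in Question 7(b)).
counts = {
    "Human Resources": {"No": 51, "Yes": 12},
    "Research & Development": {"No": 828, "Yes": 133},
    "Sales": {"No": 354, "Yes": 92},
}
total = sum(sum(row.values()) for row in counts.values())

rd = counts["Research & Development"]

# Way 1: percentages of the grand total, then divide joint by marginal.
p_yes_and_rd = rd["Yes"] / total
p_rd = sum(rd.values()) / total
way1 = p_yes_and_rd / p_rd

# Way 2: percentages of the row total, read off directly.
way2 = rd["Yes"] / sum(rd.values())

print(f"Way 1: {way1:.2%}  Way 2: {way2:.2%}")
```

The grand total cancels in Way 1, which is why the two pivot-table layouts must agree.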

Question 6 (4 marks)
The HR Director claims that employees from Research & Development are less likely to leave
the company than their colleagues from other departments. Based on the data provided, do you
agree with the HR Director? Provide a clear explanation, including a pivot table to support
your answer.

Count of Employee ID, by Attrition (% of row total)

Department               No       Yes      Grand Total
Human Resources          80.95%   19.05%   100.00%
Research & Development   86.16%   13.84%   100.00%
Sales                    79.37%   20.63%   100.00%
Grand Total              83.88%   16.12%   100.00%

Yes, I agree with the HR Director because P(Attrition = Yes | R&D) = 13.84% is lower than
both P(Attrition = Yes | HR) = 19.05% and P(Attrition = Yes | Sales) = 20.63%. (The answer can
also be a sentence without probability statements, as long as the values are mentioned.)

Question 7 ( 1 + 7 + 1 = 9 marks)

For this question, you are required to select two variables:


• Variable 1: Column C “Attrition”
• Variable 2: Any categorical variable of your choice from the dataset

a) State two methods you have learnt which can be used to investigate if employee ‘Attrition’
(Column C) and your variable of choice (Variable 2) are independent. Do not show any
calculations. (Word Limit: 50 words)

The two methods are Probability Concepts and the Chi-Square Test of Independence.

b) Use the data provided and apply both methods mentioned in your answer above to
investigate if employee ‘Attrition’ (Column C) and your variable of choice (Variable 2) are
independent. Remember to apply all workings as shown in tutorials and lecture examples.
The solution below is based on the variables ATTRITION and DEPARTMENT
(Variable 2). If you have chosen a different Variable 2, follow similar steps to those
outlined below. The numbers and calculations are provided in the Assignment 2
Solutions.xlsx file.
PROBABILITY CONCEPTS
Count of Employee ID, by Attrition (% of grand total)

Department               No       Yes      Grand Total
Human Resources          3.47%    0.82%    4.29%
Research & Development   56.33%   9.05%    65.37%
Sales                    24.08%   6.26%    30.34%
Grand Total              83.88%   16.12%   100.00%

Is P(Attrition = Yes | HR) = P(Attrition = Yes)?
P(Attrition = Yes | HR) = 12/63 = 19.05%
P(Attrition = Yes) = 16.12%
P(Attrition = Yes | HR) ≠ P(Attrition = Yes)
Therefore ‘Attrition’ and the ‘Department’ to which the employees belong are not
independent.
CHI-SQUARE TEST OF INDEPENDENCE
Observed (fo) and expected (fe) frequencies

Cell          fo     fe         (fo − fe)²/fe
HR, No        51     52.84286   0.064268
HR, Yes       12     10.15714   0.334358
R&D, No       828    806.0633   0.597001
R&D, Yes      133    154.9367   3.105915
Sales, No     354    374.0939   1.079312
Sales, Yes    92     71.90612   5.615153
Total                           10.8

As all expected frequencies are greater than 5, it is appropriate to use the chi-square
distribution.

Step 1: Hypotheses

H0: The variables Attrition and Department are independent

H1: The variables Attrition and Department are dependent

Step 2: Test Statistic

χ²calc = ∑ (fo − fe)²/fe = 10.8

Step 3: p-value/critical value

Degrees of freedom = (r − 1) × (c − 1) = (3 − 1) × (2 − 1) = 2 × 1 = 2
p-value = P(χ² > 10.8) = CHISQ.DIST.RT(10.8, 2) = 0.0045

Critical value = χ²(α, df) = χ²(5%, 2) = 5.991

(If α = 1%, Critical value = 9.210; If α = 10%, Critical value = 4.605)

Step 4: Decision Rule and Decision

Reject H0 if p-value < α

Since 0.0045 < 0.05/0.10/0.01 , we can reject the null hypothesis

Step 5: Conclusion

We can reject H0 at the 5%/10%/1% significance level.

The sample does provide sufficient evidence against H0.

Therefore, the variables Attrition and Department are not independent OR are
dependent OR there is a relationship between Attrition and Department
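
The test statistic and p-value can be reproduced from the six observed counts. The following is a minimal Python sketch using only the standard library; it assumes the 3 × 2 table of observed frequencies shown in the working, and it uses the closed form exp(−x/2) for the right-tail probability, which holds only when the chi-square distribution has exactly 2 degrees of freedom.

```python
from math import exp

# Observed counts: rows = HR, R&D, Sales; columns = Attrition No, Yes.
observed = [[51, 12], [828, 133], [354, 92]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail probability; this closed form is valid only for df == 2.
p_value = exp(-chi2 / 2)

print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}")
```

For tables with other dimensions the closed form no longer applies, and a statistical library or Excel's CHISQ.DIST.RT would be needed instead.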

OR

As all expected frequencies are greater than 5, it is appropriate to use the chi-square
distribution.

Step 1: Hypotheses

H0: The variables Attrition and Department are independent

H1: The variables Attrition and Department are dependent

Step 2: Expected Freq

Expected frequencies, by Attrition

Department               No         Yes
Human Resources          52.84286   10.15714
Research & Development   806.0633   154.9367
Sales                    374.0939   71.90612

Step 3: p-value

p-value = CHISQ.TEST(observed range, expected range) = 0.0045

Step 4: Decision Rule and Decision

Reject H0 if p-value < α

Since 0.0045 < 0.05/0.10/0.01 , we can reject the null hypothesis

Step 5: Conclusion

We can reject H0 at the 5%/10%/1% significance level.

The sample does provide sufficient evidence against H0.

Therefore, the variables Attrition and Department are not independent OR are
dependent OR there is a relationship between Attrition and Department

c) Are your conclusions from both methods consistent? Explain briefly.

Yes, both methods conclude that there is a relationship between Attrition and
Department because the two probabilities are different and the chi square test of
independence confirms this.

If the two methods give inconsistent results:


No, both methods provide different conclusions about the relationship between
Attrition and Department. The two probabilities are not exactly the same but are very
similar/close.

Question 8 (4 marks)
Use an appropriate measure to compare the average monthly income for male and female
employees. Discuss the best measure used for this comparison and why you have chosen this
measure. Include relevant evidence for your answer.

          Female     Male
Mean      $6686.57   $6380.51
Median    $5081.50   $4837.50

For both males and females, the mean is greater than the median, so the distribution
of monthly income is right-skewed, which indicates that there are high-income outliers.
Since the mean is sensitive to outliers, the median is the better measure of central
tendency. Based on the median, the typical monthly income for male employees
($4837.50) is lower than that for female employees ($5081.50).

Question 9 (3 marks)
Compare the relative dispersion of monthly salaries for male and female employees. Which
measure have you chosen to do the comparison, and why have you chosen this measure?

The coefficient of variation (CV) is a measure of relative dispersion: it expresses the
standard deviation as a percentage of the mean (CV = s/x̄ × 100%). It is the appropriate
measure here because the two distributions have different means. Based on the CV, the
monthly income for male employees has a higher relative dispersion than the monthly
income for female employees (73.89% > 70.22%).
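
The two CV figures follow directly from the sample means and standard deviations. The following is a minimal Python sketch, assuming the summary statistics reported in Question 10 (means of $6380.5079 and $6686.5663, standard deviations of 4714.8566 and 4695.6085); the dictionary layout is illustrative.

```python
# (mean, standard deviation) of monthly income, as reported in Question 10.
summary = {
    "Male": (6380.5079, 4714.8566),
    "Female": (6686.5663, 4695.6085),
}

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = {group: s / mean * 100 for group, (mean, s) in summary.items()}

for group, value in cv.items():
    print(f"{group}: CV = {value:.2f}%")
```

Because the CV is unit-free, it allows a fair comparison of spread even though the two groups have different average incomes.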

Question 10 ( 4 + 5 = 9 marks)

We would like to estimate the average income per month for all female/male employees in the
company.
a) Discuss 2 ways of estimating the average income per month for
all female/male employees.

The two ways of estimating the average income per month for all male/female
employees are:
• a point estimate, which is the sample mean income per month for male/female employees
• an interval estimate, which provides a range of plausible values within which the average
income per month for all male/female employees is likely to fall

b) Using both the methods you outlined in your answer above, estimate the average income
per month for all female/male employees. Show all working.
Male
Point Estimate
x̄ = $6380.5079
Interval Estimate (95% CI)
Let X = monthly male salary ($)
Invoke the CLT as the sample size 882 > 30, so X̄ ~ N(μ, σ²/n)
Since σ is unknown, the confidence interval for the mean is x̄ ± t(α/2, n−1) × s/√n

α = 0.05; df = n − 1 = 881
n = 882
s = 4714.8566
t critical value = t(α/2, n−1) = t(0.05/2, 881) = 1.963 (using Excel) or 1.960 (tables)
95% Confidence interval for the mean:
= 6380.5079 ± 1.963 × 4714.8566/√882
= ($6069.03, $6692.10)

CI     t       Lower Limit   Upper Limit
99%    2.581   $5970.69      $6790.33
90%    1.647   $6119.10      $6641.92

Female
Point Estimate
x̄ = $6686.5663
Interval Estimate (95% CI)
Let X = monthly female salary ($)
Invoke the CLT as the sample size 588 > 30, so X̄ ~ N(μ, σ²/n)
Since σ is unknown, the confidence interval for the mean is x̄ ± t(α/2, n−1) × s/√n

α = 0.05; df = n − 1 = 587
n = 588
s = 4695.6085
t critical value = t(α/2, n−1) = t(0.05/2, 587) = 1.964 (using Excel) or 1.960 (tables)
95% Confidence interval for the mean:
= 6686.5663 ± 1.964 × 4695.6085/√588
= ($6306.25, $7066.89)

CI     t       Lower Limit   Upper Limit
99%    2.584   $6186.15      $7186.99
90%    1.648   $6367.55      $7005.59
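
The interval arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming the sample statistics and the rounded Excel critical values quoted in the working (1.963 for males, 1.964 for females); the function name `t_interval` is hypothetical. Because the critical values are rounded to three decimals, the limits may differ from the figures in the working by a few cents.

```python
from math import sqrt


def t_interval(mean: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) limits of mean +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin


# 95% intervals using the sample statistics and critical values from the working.
male = t_interval(6380.5079, 4714.8566, 882, 1.963)
female = t_interval(6686.5663, 4695.6085, 588, 1.964)

print(f"Male:   (${male[0]:.2f}, ${male[1]:.2f})")
print(f"Female: (${female[0]:.2f}, ${female[1]:.2f})")
```

The 99% and 90% intervals in the tables follow from the same function by substituting the corresponding critical values.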

Question 11 (5 + 2 + 2 = 9 marks)

a) The Human Resource (HR) Manager is concerned about the attrition rate in the HR
department. He claims that the attrition rate in the HR department is higher than the
overall attrition rate for all departments which is approximately 16%. Based on the sample
data, is there evidence to support the claim at the 5% level of significance?

Let X = number of employees in the HR department who left the company

nπ = 63 × 0.16 = 10.08
n(1 − π) = 63 × (1 − 0.16) = 52.92

Since both are greater than 5, the normal approximation is valid.

Sample proportion: p = 12/63 = 0.1905

Step 1: Hypothesis

H0: π ≤ 16%

H1: π > 16%

Step 2: Test statistic

Z = (p − π) / √(π(1 − π)/n)

Zcalc = (0.1905 − 0.16) / √(0.16 × (1 − 0.16)/63) = 0.66

Step 3: Critical Value/p-value

Critical value

Zcrit = Zα = 1.645

OR

p-value

P(Z > 0.66) = 0.2545

Step 4: Decision Rule and Decision

Reject H0 if Zcalc > Zcrit

Since 0.66 < 1.645 , we cannot reject the null hypothesis

OR

Reject H0 if p-value < α

Since 0.2545 > 0.05, we cannot reject the null hypothesis

Step 5: Conclusion

There is insufficient evidence at the 5% level of significance to conclude that the
attrition rate in the HR department is higher than the overall attrition rate for all
departments, which is approximately 16%. Therefore, the sample does not support the
manager's claim.
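
The five steps above can be sketched in Python with the standard library. This is a minimal illustration, assuming the values from the working (n = 63, p = 0.1905, π = 0.16, α = 0.05); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 63          # employees in the HR department
p_hat = 0.1905  # sample attrition rate in HR
pi0 = 0.16      # hypothesised attrition rate under H0
alpha = 0.05    # level of significance

# Test statistic for a one-sample proportion test.
z_calc = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

# Right-tailed test: critical value and p-value from the standard normal.
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z_calc)

reject_h0 = z_calc > z_crit
print(f"Zcalc = {z_calc:.2f}, Zcrit = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Cannot reject H0")
```

The critical-value rule and the p-value rule always lead to the same decision, so either can be reported.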

b) Interpret the Type I Error/Type II Error in the context of this question.


Type I Error:
We conclude that the attrition rate in the HR department is higher than the overall
attrition rate for all departments, when it is actually not.

Type II Error:
We conclude that the attrition rate in the HR department is not higher than the overall
attrition rate for all departments, when it actually is

c) Discuss the business implication of the error interpreted above.


Type I Error:
HR may end up implementing policies to reduce attrition in its own department when
the policies are actually required in other departments.

Type II Error:
The manager might not focus on the problem of attrition in HR when there is a need to
do so. Resources are spent elsewhere when they should be channelled to the HR
department.

Question 12 (10 marks)


For this question, you are NOT required to use the entire dataset. Instead, select a sample based
on your Student ID as explained below. You will need to refer to column “R” (Random Sample)
in order to choose the right sample. Filter your data based on the following instructions:

• If your Student ID ends with 0, choose Sample A

• If your Student ID ends with 1, choose Sample B

• If your Student ID ends with 2, choose Sample C

• If your Student ID ends with 3, choose Sample D

• If your Student ID ends with 4, choose Sample E

• If your Student ID ends with 5, choose Sample F

• If your Student ID ends with 6, choose Sample G

• If your Student ID ends with 7, choose Sample H

• If your Student ID ends with 8, choose Sample I

• If your Student ID ends with 9, choose Sample J

Once you have filtered for your sample, copy and paste the sample into a new worksheet. Use
this new worksheet to answer the following question.

Question: We wish to explore the relationship between Work Experience (Total Working
Years) and Monthly Income ($).

Based on your chosen sample, provide a brief report on the relationship between the two
variables. Use ALL relevant simple linear regression tools which you have learned in Lecture
9 and Tutorial 10.

The below solution is based on Sample A. The scatterplots and regression outputs for
the other Samples are in the Assignment 2 Solutions.xlsx excel file. Follow similar steps
as outlined below for any chosen sample.

Sample A

[Figure: scatterplot "Monthly Income on Work Experience"; x-axis: Work Experience (0 to 40 years); y-axis: Monthly Income ($0 to $25,000)]

- The data points are clustered around a line, indicating that the relationship is
linear
- The relationship is positive because as Work Experience (X) increases, Monthly
Income (Y) increases as well
- The data points are closely clustered, indicating a strong relationship between
Work Experience and Monthly Income

Coefficient of determination

R² is 50.00%.

50% of the variation in Monthly Income is explained by the regression model. The remaining
50% of the variation in Monthly Income is left unexplained by the model. Hence the model
has a moderate fit.

Correlation coefficient (r):

r = 0.71. There is a strong, positive, linear relationship between Monthly Income and
Work Experience
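
In simple linear regression the correlation coefficient is the square root of R², carrying the sign of the slope. The following is a minimal Python sketch of that link, assuming the Sample A values reported in this solution (R² = 0.50 and a slope of 469.93).

```python
from math import copysign, sqrt

r_squared = 0.50  # coefficient of determination for Sample A
slope = 469.93    # estimated slope b1 for Sample A

# r has the magnitude sqrt(R^2) and the same sign as the slope.
r = copysign(sqrt(r_squared), slope)
print(f"r = {r:.2f}")
```

This shortcut applies only to regression with a single explanatory variable.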

Slope coefficient (b1):

For every extra year of work experience, the estimated Monthly Income increases by
$469.93 on average.

Testing significance of the variable (can perform a right tail test as well)

Step 1: Hypotheses

H0: β1 = 0

H1: β1 ≠ 0

Step 2: Level of Significance

α = 0.05/0.01/0.1

Step 3: P-value

p-value ≈ 0

Step 4: Decision and Decision Rule

Reject H0 if p-value < α

Since p-value ≈ 0 < 0.05/0.01/0.1, we reject H0

Step 5: Conclusion

We can reject H0 at the 5%/1%/10% significance level.

The sample provides enough evidence against H0.

There is a significant linear relationship between Work Experience and Income.
