Data Analysis - Answer
IC NO : 930314-05-5057
STUDENT ID : 930314055057001
QUESTION 1
A Human Resource manager in an IT company would like to know whether the median
salary of fresh graduate employees with IT background is more than RM3500 per month.
Data of salary for 10 fresh graduates is shown in Table 1.
Table 1
Employee Salary (RM)
1 2300
2 1800
3 3600
4 3700
5 4000
6 5500
7 4400
8 3500
9 2900
10 3700
a) Explain which statistic distribution should be used in order to test the manager’s claim.
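The t-distribution (one-sample t-test) with n − 1 = 9 degrees of freedom, because the sample is small (n = 10) and the population standard deviation is unknown.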
b) Construct a complete hypothesis test to test whether the manager’s claim is correct or
not at α = 0.05.
Hypothesis testing
H0: μ ≤ 3500
H1: μ > 3500
Critical value
The significance level is α = 0.05.
With n − 1 = 9 degrees of freedom, the critical value for a right-tailed test is t = 1.833, from the t-distribution table.
Test Statistic
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{2300 + 1800 + 3600 + 3700 + 4000 + 5500 + 4400 + 3500 + 2900 + 3700}{10} = \frac{35400}{10} = 3540$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$
Employee   Salary (RM), x   x²
1          2300             5290000
2          1800             3240000
3          3600             12960000
4          3700             13690000
5          4000             16000000
6          5500             30250000
7          4400             19360000
8          3500             12250000
9          2900             8410000
10         3700             13690000
Total      35400            135140000
$$s = \sqrt{\frac{135140000 - \frac{(35400)^2}{10}}{10 - 1}} = 1044.7754$$
$$t = \frac{3540 - 3500}{1044.7754 / \sqrt{10}} = 0.121$$
Conclusion
Since the test statistic t = 0.121 is less than the critical value 1.833, the null hypothesis H0 is not rejected. Therefore, there is not enough evidence to claim that the population mean μ is greater than 3500 at the α = 0.05 significance level.
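The hand calculation above can be checked with a short script. This is only a sketch using the Python standard library: the variable names are illustrative, and the critical value 1.833 is taken from the t-table lookup above rather than computed.

```python
import math

# Salaries (RM) from Table 1
salaries = [2300, 1800, 3600, 3700, 4000, 5500, 4400, 3500, 2900, 3700]
mu0 = 3500          # hypothesised value under H0
t_critical = 1.833  # right-tailed critical value quoted above (alpha = 0.05, df = 9)

n = len(salaries)
sum_x = sum(salaries)                   # sum of x
sum_x2 = sum(x * x for x in salaries)   # sum of x squared

mean = sum_x / n
# Computational formula for the sample standard deviation
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
# One-sample t statistic
t_stat = (mean - mu0) / (s / math.sqrt(n))

# Right-tailed test: reject H0 only if t exceeds the critical value
reject_h0 = t_stat > t_critical
print(f"mean = {mean:.0f}, s = {s:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The script reproduces the column totals, the standard deviation and the t statistic used in the working, so any slip in the arithmetic would show up as a mismatch.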
c) Based on your answer in (b) state a brief summary of the median salary of fresh
graduate employees with IT background.
Based on the answer in (b), at the 0.05 significance level, there is not enough evidence to conclude that the median salary of fresh graduate employees with an IT background is more than RM3500 per month.
QUESTION 2
According to a fast-food manager, the average number of customers between 8am to 11am
is 10, with a variance of two. His friend from another outlet does not believe that the
variance is two. He counts the number of customers in 15 other outlets and obtained the
data as in Table 2.
Table 2
a) Explain which statistic distribution should be used in order to test whether the variance
is different from two.
Chi-Square test.
Test Statistics
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
$$s^2 = \frac{1}{n - 1}\left(\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2\right)$$
Outlet   X     X²
1        11    121
2        10    100
3        9     81
4        10    100
5        10    100
6        11    121
7        11    121
8        10    100
9        12    144
10       9     81
11       7     49
12       9     81
13       11    121
14       10    100
15       11    121
TOTAL    151   1541
$$s^2 = \frac{1}{15 - 1}\left(1541 - \frac{151^2}{15}\right) = 1.4952$$
So:
Sample variance (s²) = 1.4952
Population variance (σ²) = 2
Sample size (n) = 15
Significance level (α) = 0.05
$$\chi^2 = \frac{(15 - 1) \cdot 1.4952}{2} = 10.466$$
Decision
Since the test statistic lies between the lower and upper critical values, χ²L = 5.629 ≤ χ² = 10.466 ≤ χ²U = 26.119, the null hypothesis is not rejected.
Conclusion
It is concluded that the null hypothesis H0 is not rejected. Therefore, there is not enough evidence to claim that the population variance σ² is different from 2 at the 0.05 significance level.
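The chi-square working above can be verified in the same way. The sketch below uses only the Python standard library. The variable names are illustrative, and the two critical values are the table values quoted in the decision step (df = 14, α/2 = 0.025 in each tail), not values computed by the script.

```python
# Customer counts for the 15 outlets, as tabulated above
customers = [11, 10, 9, 10, 10, 11, 11, 10, 12, 9, 7, 9, 11, 10, 11]
sigma2_0 = 2                            # hypothesised population variance under H0
chi2_lower, chi2_upper = 5.629, 26.119  # two-tailed critical values quoted above

n = len(customers)
sum_x = sum(customers)                   # sum of X
sum_x2 = sum(x * x for x in customers)   # sum of X squared

# Sample variance via the computational formula
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
# Chi-square test statistic for a single variance
chi2_stat = (n - 1) * s2 / sigma2_0

# Two-tailed test: reject H0 if the statistic falls outside the critical interval
reject_h0 = chi2_stat < chi2_lower or chi2_stat > chi2_upper
print(f"s^2 = {s2:.4f}, chi-square = {chi2_stat:.3f}, reject H0: {reject_h0}")
```

Because the script carries the unrounded sample variance through, its chi-square value can differ from the hand-computed 10.466 in the third decimal place. The decision is unaffected.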
c) Based on your answer in (b) state a brief summary of the variance number of
customers.
Based on the answer in (b), at the 0.05 significance level there is not enough evidence to conclude that the variance of the number of customers is different from 2.
QUESTION 3
A homeowner is interested in the effect that using the air conditioner and washing machine
had on the electric bill. He recorded the number of hours the air conditioner and the
number of times washing machine were used for 14 days. He also monitored the electric
meter for these 14 days, and computed the amount of electricity used each day in kilowatt-
hours. Data is shown in Table 3.
Table 3
a) Use Microsoft Excel to produce Analysis of Variance table for the multiple linear
regression and state the model.
Regression Statistics
Multiple R 0.904805
R Square 0.818672
Adjusted R Square 0.785703
Standard Error 10.09556
Observations 14
ANOVA
             df   SS        MS        F        Significance F
Regression   2    5061.73   2530.87   24.832   0.000
Residual     11   1121.12   101.92
Total        13   6182.86
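The model has the general form y = β0 + β1 x1 + β2 x2 + ε, where:
y = Electricity used (kWh)
x1 = Air conditioner (hours)
x2 = Washing machine (number of times used)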
b) Construct a complete hypothesis test to test whether the model in (a) is fit at α = 0.05.
H0: β1 = β2 = 0
H1: at least one of β1, β2 is not equal to 0
F statistic = 24.832
P-value = 0.000
The p-value is less than α = 0.05, so we reject the null hypothesis (H0).
In conclusion, at α = 0.05, the model is significant.
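The entries of the ANOVA table are linked by simple arithmetic, which the following sketch re-derives from the two sums of squares reported by Excel. The variable names are illustrative. The p-value line relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here, and is included as a cross-check on the reported "0.000", not as a general method.

```python
# Sums of squares and sample size taken from the Excel output above
ss_regression = 5061.73
ss_residual = 1121.12
n_obs = 14   # observations
k = 2        # number of predictors (x1, x2)

df_regression = k
df_residual = n_obs - k - 1

# Mean squares and the overall F statistic
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Goodness-of-fit measures
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# Upper-tail probability of F(2, df_residual); this closed form is valid
# only because the numerator degrees of freedom equal 2
p_value = (1 + 2 * f_stat / df_residual) ** (-df_residual / 2)

alpha = 0.05
print(f"F = {f_stat:.3f}, R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print(f"p-value = {p_value:.6f}, reject H0: {p_value < alpha}")
```

Small differences from the table in the last decimal place come from the rounding of the sums of squares in the Excel output.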
c) Construct a complete hypothesis test to test whether the regression coefficients, β1 and β2, are significantly different from zero at α = 0.05.
For β1:
H0: β1 = 0
H1: β1 ≠ 0
t = 6.1294
P-value = 0.000
Since the p-value is less than 0.05, we reject the null hypothesis (H0).
For β2:
H0: β2 = 0
H1: β2 ≠ 0
t = 6.1294
P-value = 0.000
Since the p-value is less than 0.05, we reject the null hypothesis (H0).
d) Based on your answer in (b) and (c) state a brief summary on the linear relationship
between electricity used and the two factors.
Based on the answers in (b) and (c), there is a positive linear relationship between the dependent variable (electricity consumption) and the independent variables (air conditioner and washing machine usage). When the number of hours the air conditioner is used increases, electricity consumption also increases. Similarly, when the washing machine is used more often, electricity consumption also increases.