Data Analysis Problems
Kaihua chen
46192837
The University of Queensland Data Analysis Assignment SEM2, 2020
CIVL7505 Research Methods for Civil Engineers
Day     1   2   3   4   5   6   7   8   9  10  11  12  13  14
After  52  53  50  52  49  50  54  54  60  56  50  48  48  46

Day    15  16  17  18  19  20  21  22  23  24  25
After  50  51  48  55  49  48  55  58  47  53  51
For the hypothesis you formulated, test your hypothesis at a 5% significance level,
using the t Table provided below.
(10 marks)
Q1 Answer:
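H0: Providing real-time information to drivers does not affect traffic efficiency (average speed = 50 km/h).
H1: Providing real-time information to drivers improves traffic efficiency (average speed > 50 km/h).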
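$$ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = 3.525 $$
$$ t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{51.48 - 50}{3.5251 / \sqrt{25}} = 2.099 $$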
Because the significance level is 5%, the degrees of freedom are calculated and the critical value is looked up in the t table:
$$ df = n - 1 = 24, \quad t_{0.95}(24) = 1.711 $$
$$ t = 2.099 > t_{0.95}(24) = 1.711 $$
The null hypothesis is therefore rejected in favour of the alternative hypothesis: providing real-time information to drivers improves traffic efficiency (average speed > 50 km/h).
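The calculation above can be checked with a short script. This is a minimal sketch in Python using only the standard library, not the method used in the assignment; the variable names are illustrative, and the critical value 1.711 is simply copied from the t table rather than computed.

```python
import math
import statistics

# "After" speeds (km/h) for days 1-25, copied from the table in Q1.
after = [52, 53, 50, 52, 49, 50, 54, 54, 60, 56, 50, 48, 48, 46,
         50, 51, 48, 55, 49, 48, 55, 58, 47, 53, 51]

mu0 = 50.0       # mean speed under H0 (km/h)
t_crit = 1.711   # t_{0.95}(24), read from the t table (one-sided test, 5% level)

n = len(after)
mean = statistics.mean(after)
s = statistics.stdev(after)                 # sample standard deviation (divisor n - 1)
t_stat = (mean - mu0) / (s / math.sqrt(n))  # one-sample t statistic
reject_h0 = t_stat > t_crit

print(f"n = {n}, mean = {mean:.2f}, s = {s:.4f}, t = {t_stat:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The test is one-sided because the alternative hypothesis only considers an increase in average speed.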
Q2: People’s weight (y) is assumed to be related to the height (x). Six observations
obtained from six people are as follows.
No.  Height x (cm)  Weight y (kg)
1    160            48
2    168            60
3    170            63
4    155            40
5    180            74
6    172            66
Use the data in the table to fit a linear regression model using the ordinary least-
squares method, assess your model’s performance, and interpret your model’s result.
(You can use R or Rstudio)
(10 marks)
Q2 Answer:
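The parameters of the linear regression model were calculated with the following ordinary least-squares formulas.
1. $$ \hat{\beta}_1 = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2} $$
2. $$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} $$
The result is $\hat{\beta}_1 = 1.3894$ and $\hat{\beta}_0 = -174.2212$.
The linear regression model is $y = 1.3894x - 174.2212$.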
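The two least-squares formulas can be evaluated directly on the six observations. The following is a minimal sketch in plain Python, not the R code used for the assignment; the names `beta0` and `beta1` are illustrative labels for the intercept and slope.

```python
# Heights (cm) and weights (kg) of the six people in the Q2 table.
x = [160, 168, 170, 155, 180, 172]
y = [48, 60, 63, 40, 74, 66]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope: (sum(xy) - sum(x) * sum(y) / n) / (sum(x^2) - sum(x)^2 / n)
beta1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# Intercept: mean(y) - slope * mean(x)
beta0 = sum_y / n - beta1 * sum_x / n

print(f"beta1 = {beta1:.4f}, beta0 = {beta0:.4f}")
```

Rounded to four decimals, the values agree with the coefficients reported in the answer.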
The linear regression model and scatter plot are shown in “Figure 1” below.
Figure 1: Scatter plot of weight (kg) against height (cm) with the fitted line y = 1.3894x - 174.22 (R2 = 0.9896)
RStudio was used to analyse the model, with the following results. The code for the analysis is shown in Figure 2 and Figure 3.
Figure 2
The R-squared is 0.9896 and the adjusted R-squared is 0.987, which means that height explains 98.96% of the variation in weight, so the model fits well.
The F-statistic is 380.1 and the p-value is 0.0000482. At a 5% significance level, the p-value is far below the threshold, so the null hypothesis (H0) is rejected and height has a significant effect on weight. This also indicates that the model is good.
Figure 3
The following two information criteria can be obtained:
AIC = 24.77984.
BIC = 24.15512.
As for the regression coefficients of this model, the predicted weight (kg) is 1.3894 times the height (cm) minus 174.2212, so each additional centimetre of height corresponds to about 1.3894 kg of additional weight.
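The fit statistics quoted in this answer can be reproduced from the six observations without R. The sketch below is a plain-Python illustration under two assumptions: the errors are Gaussian, and the parameter count used for AIC and BIC is three (intercept, slope, and error variance), which is the convention that matches the values reported above. The p-value is not computed because the standard library has no F distribution.

```python
import math

# Heights (cm) and weights (kg) from the Q2 table.
x = [160, 168, 170, 155, 180, 172]
y = [48, 60, 63, 40, 74, 66]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
syy = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares

beta1 = sxy / sxx
beta0 = y_bar - beta1 * x_bar
rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))

r2 = 1 - rss / syy
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)         # one predictor
f_stat = (syy - rss) / (rss / (n - 2))            # F on (1, n - 2) degrees of freedom

# Gaussian log-likelihood evaluated at the ML variance estimate rss / n.
log_lik = -n / 2 * (math.log(2 * math.pi) + math.log(rss / n) + 1)
k = 3                                             # assumed: intercept, slope, variance
aic = 2 * k - 2 * log_lik
bic = k * math.log(n) - 2 * log_lik

print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.3f}, F = {f_stat:.1f}")
print(f"AIC = {aic:.5f}, BIC = {bic:.5f}")
```

With only six observations, ln(6) is smaller than 2, which is why BIC comes out below AIC here.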
Q3: A time series of speed for vehicles passing by a motorway location is presented in
the table below. Use two-sided moving average to de-noise the data and complete the
table. The window size n is 3.
Time  Speed
0     60
20    64
40    58
60    74
80    48
100   52
120   62
140   70
(10 marks)
Q3 Answer:
Using the two-sided moving average method with window size n = 3 to de-noise the data, the answer is as follows.
Time  Speed  Moving average
20    64     (60+64+58)/3 = 60.67
40    58     (64+58+74)/3 = 65.33
60    74     (58+74+48)/3 = 60.00
80    48     (74+48+52)/3 = 58.00
100   52     (48+52+62)/3 = 54.00
120   62     (52+62+70)/3 = 61.33
140   70     70 (end point, observed value kept)
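The smoothing in the table can be sketched as a small function. This is an illustrative plain-Python version, and the function name `centered_moving_average` is hypothetical. One deliberate difference from the table: end points that lack a full window are returned as `None` here rather than keeping the observed value, so the choice of end-point treatment stays visible.

```python
times = [0, 20, 40, 60, 80, 100, 120, 140]
speeds = [60, 64, 58, 74, 48, 52, 62, 70]

def centered_moving_average(values, window=3):
    """Two-sided moving average; positions without a full window give None."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i >= len(values) - half:
            smoothed.append(None)                    # not enough neighbours
        else:
            chunk = values[i - half:i + half + 1]    # point plus neighbours on each side
            smoothed.append(sum(chunk) / window)
    return smoothed

smoothed = centered_moving_average(speeds, window=3)
for t, v, m in zip(times, speeds, smoothed):
    shown = "n/a" if m is None else f"{m:.2f}"
    print(f"{t:4d} {v:4d} {shown:>6}")
```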
Q4: Download the traffic speed data from the blackboard, fit an ARIMA model in R or
RStudio, and use the graphical approach to justify the order you selected for your
ARIMA model.
(10 marks)
Q4 Answer:
First, the traffic speed data are imported into RStudio and loaded as a time series, as shown in Figure 1 below:
Figure 1
Figure 2
Next, because the plot shows that the original series is non-stationary, the trend and seasonality are removed by differencing, as shown in Figure 2 and Figure 3:
Figure 3
Figure 4
It can be seen from this graph that the series is now stationary, so no further differencing is needed and d can be taken as 0.
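The differencing step can be illustrated in a few lines. This is a minimal sketch in plain Python rather than the R commands used in the figures; the `difference` helper and the short `speeds` list are hypothetical and are not the Blackboard data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical speeds with an upward trend (not the assignment data).
speeds = [50, 52, 51, 54, 53, 56, 55, 58]

first_diff = difference(speeds)             # lag 1: removes a linear trend
seasonal_diff = difference(speeds, lag=2)   # lag 2: removes a pattern repeating every 2 steps

print(first_diff)
print(seasonal_diff)
```

Each differencing pass shortens the series by `lag` observations, which matters when the series is short.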
After that, the ACF and PACF plots generated by R are examined to see where each plot tails off or cuts off, which determines the orders p and q, as shown in Figure 5:
Figure 5
From Figure 5, p = 0 and q = 4, which gives ARIMA(0,0,4).
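The graphical rule used here relies on the fact that the ACF of a pure MA(q) process cuts off after lag q. The sketch below shows how a sample ACF and its approximate 95% band can be computed. It is plain Python, not the R `acf` output in the figures, and the series is a simulated MA(4) process with made-up coefficients, used only as a stand-in for the traffic speed data.

```python
import math
import random

def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (same divisor at every lag)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Hypothetical stand-in for the stationary series: a simulated MA(4) process.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(304)]
theta = [0.6, 0.5, 0.4, 0.3]   # made-up MA coefficients
series = [noise[t] + sum(th * noise[t - j - 1] for j, th in enumerate(theta))
          for t in range(4, len(noise))]

bound = 1.96 / math.sqrt(len(series))   # approximate 95% band under white noise
r = acf(series, max_lag=10)
for k in range(1, 11):
    flag = "outside band" if abs(r[k]) > bound else ""
    print(f"lag {k:2d}: r = {r[k]: .3f} {flag}")
```

Lags flagged as outside the band correspond to the spikes one would read off an ACF plot; sampling noise can push an occasional higher lag outside the band as well.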
Figure 6
Then, the ACF of the residuals is used to check whether any structure remains in the model errors, as shown in the figure below.
Figure 7
As can be seen from the above figure, ARIMA(0,0,4) has passed the test. Then, the Ljung-Box test is used to test whether there is significant autocorrelation between the model residuals.
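For reference, the Ljung-Box statistic is Q = n(n+2) Σ r_k² / (n−k), summed over lags k = 1..h, and it is compared with a chi-square distribution. The sketch below is a plain-Python illustration, not the R call shown in the figure. The residuals are hypothetical, the chi-square tail uses a closed form that holds only for even degrees of freedom, and the degrees of freedom are taken as the number of lags without subtracting the fitted ARMA parameters, which is a simplifying assumption.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]

def ljung_box_q(residuals, lags):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k), k = 1..lags."""
    n = len(residuals)
    r = acf(residuals, lags)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, lags + 1))

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only holds for positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Hypothetical residuals (not the residuals of the fitted model).
residuals = [0.3, -1.1, 0.8, 0.2, -0.5, 1.4, -0.9, 0.1, 0.6, -0.7,
             -0.2, 0.9, -1.3, 0.4, 0.5, -0.6, 1.0, -0.4, 0.0, -0.3]
lags = 6
q_stat = ljung_box_q(residuals, lags)
p_value = chi2_sf_even_df(q_stat, df=lags)   # assumed: df equals the number of lags
print(f"Q = {q_stat:.3f}, p = {p_value:.4f}")
```

A p-value above 0.05 means the test finds no evidence of autocorrelation in the residuals at the 5% level.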
Figure 8
From the RStudio output in the figure above, p = 0.07961 > 0.05, so we fail to reject the null hypothesis of no autocorrelation. The residuals are consistent with white noise, and ARIMA(0,0,4) has passed the test. Finally, the auto.arima command can be used to obtain the best model for the series, as shown in the figures below.
Figure 9
Figure 10
Figure 10
Figure 11