Data Analysis Problems

The document is a data analysis assignment for a civil engineering course. It contains 4 questions analyzing traffic and human data. Question 1 asks students to formulate hypotheses to test if providing real-time traffic information improves efficiency. Question 2 asks students to perform linear regression on human height and weight data. Question 3 asks students to apply a moving average to "de-noise" traffic speed data. Question 4 asks students to fit an ARIMA model to other traffic speed data and justify their order selection.


The University of Queensland Data Analysis Assignment SEM2, 2020

CIVL7505 Research Methods for Civil Engineers

Data analysis problem

Kaihua chen

46192837

IMPORTANT: Please complete each question independently and submit the assessment before 23:59, the 30th of September 2020.
Q1: You want to test whether providing real-time information to drivers can increase
traffic efficiency (i.e., speed). Before providing real-time information, the average
speed on the road of interest is 50 km/h. After providing real-time information, you
observed that over 25 days, the average speed is as shown in table below. Formulate
the null hypothesis and the alternative hypothesis for the research question: Did
providing real-time information to drivers improve traffic efficiency?

Day                  1   2   3   4   5   6   7   8   9  10  11  12  13  14
Speed after (km/h)  52  53  50  52  49  50  54  54  60  56  50  48  48  46

Day                 15  16  17  18  19  20  21  22  23  24  25
Speed after (km/h)  50  51  48  55  49  48  55  58  47  53  51

For the hypothesis you formulated, test your hypothesis at a 5% significance level,
using the t Table provided below.

(10 marks)
Q1 Answer:
$H_0$: Providing real-time information to drivers does not affect traffic efficiency (mean speed $\mu = 50$ km/h).

$H_A$: Providing real-time information to drivers improves traffic efficiency (mean speed $\mu > 50$ km/h).
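The test statistic is

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

The sample mean is

$$ \bar{x} = \frac{52 + 53 + 50 + 52 + 49 + 50 + 54 + 54 + 60 + 56 + 50 + 48 + 48 + 46 + 50 + 51 + 48 + 55 + 49 + 48 + 55 + 58 + 47 + 53 + 51}{25} = 51.48 $$

The sample standard deviation is

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = 3.525 $$

so that

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{51.48 - 50}{3.5251 / \sqrt{25}} = 2.099 $$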
Because the significance level is 5% and the test is one-sided, compute the degrees of freedom and look up the critical value in the t table:

$$ df = n - 1 = 24, \qquad t_{0.95}(24) = 1.711 $$

$$ t = 2.099 > t_{0.95}(24) = 1.711 $$

The null hypothesis is therefore rejected in favour of the alternative hypothesis: at the 5% significance level, providing real-time information to drivers improved traffic efficiency (average speed > 50 km/h).
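
The arithmetic of this one-sample t-test can be checked with a short script. The following is a minimal sketch in Python using only the standard library (the assignment itself suggests R); the variable names are illustrative, and the critical value 1.711 is simply copied from the t table rather than computed.

```python
import math
import statistics

# Average speeds (km/h) on the 25 days after real-time information was provided (Q1 table).
speeds = [52, 53, 50, 52, 49, 50, 54, 54, 60, 56, 50, 48, 48, 46,
          50, 51, 48, 55, 49, 48, 55, 58, 47, 53, 51]

mu0 = 50.0          # mean speed before the intervention (null hypothesis value)
t_critical = 1.711  # one-sided 5% critical value for 24 degrees of freedom, taken from the t table

n = len(speeds)
x_bar = statistics.mean(speeds)   # sample mean
s = statistics.stdev(speeds)      # sample standard deviation (n - 1 in the denominator)
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Reject H0 when the statistic exceeds the one-sided critical value.
reject_h0 = t_stat > t_critical

print(f"mean = {x_bar:.2f}, s = {s:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The script reproduces the hand calculation: the test is one-sided because the alternative hypothesis only considers an increase in speed.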
Q2: People’s weight (y) is assumed to be related to the height (x). Six observations
obtained from six people are as follows.

ID   Height x (cm)   Weight y (kg)
1    160             48
2    168             60
3    170             63
4    155             40
5    180             74
6    172             66

Use the data in the table to fit a linear regression model using the ordinary least-
squares method, assess your model’s performance, and interpret your model’s result.
(You can use R or Rstudio)

(10 marks)
Q2 Answer:
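The parameters of the linear regression model $y = \alpha + \beta x$ were calculated with the following ordinary least-squares formulas.

1. $$ \hat{\beta} = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2} $$

2. $$ \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x} $$

The result is $\hat{\beta} = 1.3894$, $\hat{\alpha} = -174.2212$.
The linear regression model is $y = 1.3894x - 174.2212$.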
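
The two least-squares formulas can be evaluated directly on the six observations. The sketch below is in standard-library Python rather than R, and the function name `fit_ols` is an illustrative choice, not something taken from the assignment.

```python
# Heights (cm) and weights (kg) of the six people in the Q2 table.
heights = [160, 168, 170, 155, 180, 172]
weights = [48, 60, 63, 40, 74, 66]


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Numerator: sum(x*y) - (1/n) * sum(x) * sum(y)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    # Denominator: sum(x^2) - (1/n) * (sum(x))^2
    s_xx = sum(a * a for a in x) - sum_x ** 2 / n
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


intercept, slope = fit_ols(heights, weights)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

The printed values agree with the coefficients reported in the answer to four decimal places.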
The linear regression model and scatter plot are shown in “Figure 1” below.
Figure 1: Scatter plot of weight (kg) against height (cm) with the fitted line y = 1.3894x - 174.22 (R² = 0.9896)
RStudio was used to analyze the model, giving the following results. The code of the analysis process is shown in Figure 2 and Figure 3.

Figure 2

The R-squared is 0.9896 and the adjusted R-squared is 0.987, which means that height explains 98.96% of the variation in weight, and the model fits well.

The F-statistic is 380.1 and the p-value is 0.0000482. At the 5% significance level, the p-value is far below the threshold, so the null hypothesis ($\beta = 0$) is rejected: height has a significant effect on weight. This also indicates that the model is good.
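
The goodness-of-fit figures quoted above follow from the residual and total sums of squares. Here is a minimal sketch in standard-library Python (not the R output shown in the figures); it recomputes R-squared, adjusted R-squared and the F-statistic, but not the p-value, which would require the F distribution.

```python
# Heights (cm) and weights (kg) from the Q2 table.
heights = [160, 168, 170, 155, 180, 172]
weights = [48, 60, 63, 40, 74, 66]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Least-squares fit in centred form.
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Total and residual sums of squares.
tss = sum((y - y_bar) ** 2 for y in weights)
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(heights, weights))

r_squared = 1 - rss / tss
# One predictor, so the residual degrees of freedom are n - 2.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
# F-statistic for the single slope: explained mean square over residual mean square.
f_stat = (tss - rss) / (rss / (n - 2))

print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.3f}, F = {f_stat:.1f}")
```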

Figure 3
The following two information criteria can also be obtained:
AIC = 24.77984.
BIC = 24.15512.
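
These two criteria can be recomputed from the Gaussian log-likelihood of the fitted line. The sketch below is in standard-library Python and assumes three estimated parameters (intercept, slope and error variance), which is the counting convention that reproduces the reported values.

```python
import math

# Heights (cm) and weights (kg) from the Q2 table.
heights = [160, 168, 170, 155, 180, 172]
weights = [48, 60, 63, 40, 74, 66]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
         / sum((x - x_bar) ** 2 for x in heights))
intercept = y_bar - slope * x_bar
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(heights, weights))

# Maximised Gaussian log-likelihood, using the maximum-likelihood variance rss / n.
log_lik = -n / 2 * (math.log(2 * math.pi) + math.log(rss / n) + 1)

k = 3  # assumed parameter count: intercept, slope, error variance
aic = 2 * k - 2 * log_lik
bic = k * math.log(n) - 2 * log_lik

print(f"AIC = {aic:.5f}, BIC = {bic:.5f}")
```

Both criteria are only meaningful relative to competing models fitted to the same data; lower values indicate a better trade-off between fit and complexity.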

As for the regression coefficients of this model, the predicted weight (kg) is 1.3894 times the height (cm) minus 174.2212; that is, each additional centimetre of height is associated with an increase of about 1.39 kg in predicted weight.
Q3: A time series of speed for vehicles passing by a motorway location is presented in
the table below. Use two-sided moving average to de-noise the data and complete the
table. The window size n is 3.

Time (s)   Speed (km/h)   Processed speed using two-sided moving average
0          60
20         64
40         58
60         74
80         48
100        52
120        62
140        70

(10 marks)

Q3 Answer:
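Using the two-sided moving average method with a window of 3 to de-noise the data, the answer is as follows. The first and last observations have no complete three-point window, so their original values are kept.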

Time (s)   Speed (km/h)   Processed speed using two-sided moving average
0          60             60
20         64             (60+64+58)/3 = 60.67
40         58             (64+58+74)/3 = 65.33
60         74             (58+74+48)/3 = 60.00
80         48             (74+48+52)/3 = 58.00
100        52             (48+52+62)/3 = 54.00
120        62             (52+62+70)/3 = 61.33
140        70             70
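
The table can be reproduced with a few lines of code. This is a minimal sketch in standard-library Python; the function name `two_sided_moving_average` is illustrative, and keeping the raw value where the window is incomplete is a choice made to match the answer table, not the only possible edge treatment.

```python
def two_sided_moving_average(values, window=3):
    """Centred moving average with an odd window; edge points keep their raw value."""
    half = window // 2
    smoothed = []
    for i, value in enumerate(values):
        if i < half or i >= len(values) - half:
            # Not enough neighbours on one side: keep the original observation.
            smoothed.append(float(value))
        else:
            neighbourhood = values[i - half:i + half + 1]
            smoothed.append(sum(neighbourhood) / window)
    return smoothed


# Speeds (km/h) recorded every 20 s, from the Q3 table.
speeds = [60, 64, 58, 74, 48, 52, 62, 70]
processed = two_sided_moving_average(speeds, window=3)

for time_s, raw, smooth in zip(range(0, 160, 20), speeds, processed):
    print(f"{time_s:>4} s  {raw:>3}  {smooth:6.2f}")
```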
Q4: Download the traffic speed data from the blackboard, fit an ARIMA model in R or
RStudio, and use the graphical approach to justify the order you selected for your
ARIMA model.
(10 marks)

Q4 Answer:
First, the traffic speed data are imported into RStudio and loaded as a time series, as shown in Figure 1 below:

Figure 1

Figure 2
Next, because the plot shows that the series is initially nonstationary, the trend and seasonality are removed by differencing, as shown in Figure 2 and Figure 3:

Figure 3

Then the ADF (augmented Dickey-Fuller) test is used to check the stationarity of the differenced series.

Figure 4

The result shows that the differenced series is stationary, so no further differencing is needed and the value of d can be taken as 0.

After that, the ACF and PACF plots generated by R are examined for tailing-off and cut-off behaviour to determine the orders p and q, as shown in Figure 5 below:
Figure 5
From the above figure, we obtain p = 0 and q = 4, giving the order (0,0,4).
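
The graphical approach rests on two operations: differencing the series and computing its sample autocorrelation function (ACF). As a rough guide, an ACF that cuts off after lag q while the PACF tails off points to an MA(q) term. The sketch below is in standard-library Python rather than R, and because the Blackboard data are not reproduced here, `example_speeds` is a purely hypothetical series used only to show the mechanics.

```python
import math


def difference(series):
    """First difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical speeds (km/h) with an upward trend; NOT the assignment data.
example_speeds = [60, 62, 61, 65, 64, 68, 67, 71, 70, 74, 73, 77]

diffed = difference(example_speeds)
acf = sample_acf(diffed, max_lag=4)

# Approximate 95% bound used on ACF plots: lags outside +/- 1.96 / sqrt(n) stand out.
bound = 1.96 / math.sqrt(len(diffed))
significant_lags = [k for k in range(1, len(acf)) if abs(acf[k]) > bound]

print("differenced series:", diffed)
print("ACF:", [round(r, 3) for r in acf])
print("lags outside the bound:", significant_lags)
```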

The numerical values are shown in Figure 6 below.

Figure 6

Then the ACF of the residuals is used to check whether any structure remains in the model errors, as shown in the figure below.

Figure 7

As can be seen from the above figure, ARIMA(0,0,4) has passed the test. The Ljung-Box test is then used to check whether there is significant autocorrelation among the model residuals.

Figure 8
From the RStudio output in the figure above, p = 0.07961 > 0.05, so the null hypothesis of no autocorrelation is not rejected. The residuals are consistent with white noise, and ARIMA(0,0,4) has passed the test. The auto.arima command is then used to obtain the best model for the series, as shown in the figure below.
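
For reference, the Ljung-Box statistic is $Q = n(n+2) \sum_{k=1}^{h} r_k^2 / (n-k)$, where $r_k$ is the lag-k autocorrelation of the residuals and h is the number of lags tested. The sketch below computes Q in standard-library Python; the residuals are hypothetical placeholders (the fitted model's residuals are not available here), and the p-value is not computed because it requires the chi-squared distribution.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical residuals; NOT taken from the fitted ARIMA(0,0,4) model.
example_residuals = [0.4, -0.7, 0.1, 0.9, -0.3, -0.5, 0.6, -0.2, 0.3, -0.6]
q_stat = ljung_box_q(example_residuals, max_lag=3)
print(f"Ljung-Box Q over 3 lags = {q_stat:.3f}")
```

A large Q relative to the chi-squared critical value indicates leftover autocorrelation; a small Q, as with the reported p = 0.07961, is consistent with white-noise residuals.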

Figure 9

Figure 10
Figure 10

Figure 11
