Visualization
<jtis_speedmap>
<LINK_ID>3006-30069</LINK_ID>
<REGION>K</REGION>
<ROAD_TYPE>URBAN ROAD</ROAD_TYPE>
<ROAD_SATURATION_LEVEL>
TRAFFIC GOOD
</ROAD_SATURATION_LEVEL>
<TRAFFIC_SPEED>47</TRAFFIC_SPEED>
<CAPTURE_DATE>
2014-11-07T10:38:34
</CAPTURE_DATE>
</jtis_speedmap>
Field description:
Tag | Data Format | Description | Sample
<LINK_ID> | start_node-end_node | The node ids of a start node and an end node of the roads in Hong Kong are included. | 3006-30069
<REGION> | ENUM | K Kowloon; ST Sha Tin; TM Tuen Mun; HK HK Island | K, ST, TM, HK
<ROAD_TYPE> | ENUM | | URBAN ROAD, MAJOR ROUTE
<ROAD_SATURATION_LEVEL> | ENUM | Indicate current road traffic condition in general. | TRAFFIC GOOD, TRAFFIC AVERAGE, TRAFFIC BAD
<TRAFFIC_SPEED> | INT | Exact figures of current road traffic speed on average per five minutes. | 45
<CAPTURE_DATE> | yyyy-MM-ddTHH:mm:ss | Exact date-time of record. | 2014-11-07T10:38:34
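As an illustration of the field description, the sample record can be read into typed fields as sketched below. This is a minimal Python sketch, not the Java programme used in the project; the function name parse_record and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The sample record shown at the top of this section.
SAMPLE = """<jtis_speedmap>
<LINK_ID>3006-30069</LINK_ID>
<REGION>K</REGION>
<ROAD_TYPE>URBAN ROAD</ROAD_TYPE>
<ROAD_SATURATION_LEVEL>
TRAFFIC GOOD
</ROAD_SATURATION_LEVEL>
<TRAFFIC_SPEED>47</TRAFFIC_SPEED>
<CAPTURE_DATE>
2014-11-07T10:38:34
</CAPTURE_DATE>
</jtis_speedmap>"""


def parse_record(xml_text):
    """Parse one <jtis_speedmap> element into a dictionary of typed fields."""
    elem = ET.fromstring(xml_text)

    def text(tag):
        # Some values sit on their own line, so surrounding whitespace is stripped.
        return elem.findtext(tag).strip()

    # LINK_ID has the form start_node-end_node.
    start_node, end_node = text("LINK_ID").split("-")
    return {
        "link_id": text("LINK_ID"),
        "start_node": int(start_node),
        "end_node": int(end_node),
        "region": text("REGION"),
        "road_type": text("ROAD_TYPE"),
        "road_saturation_level": text("ROAD_SATURATION_LEVEL"),
        "traffic_speed": int(text("TRAFFIC_SPEED")),
        "capture_date": datetime.strptime(text("CAPTURE_DATE"), "%Y-%m-%dT%H:%M:%S"),
    }


record = parse_record(SAMPLE)
print(record["start_node"], record["end_node"], record["traffic_speed"])
```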
Field relationship:
Road type | urban road | urban road | major road | major road
Saturation level | max | min | max | min
traffic bad | 14 | | 24 |
traffic average | 29 | 15 | 49 | 25
traffic good | | 30 | | 50
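The relationship between road type, traffic speed and saturation level can be expressed as a small lookup. The sketch below is only an illustration of the speed bounds listed above; the function name saturation_level is hypothetical, and treating the ROAD_TYPE value MAJOR ROUTE as the "major road" column is an assumption.

```python
# Lowest speed of the "average" and "good" levels for each road type,
# taken from the min values in the field relationship table.
# Assumption: ROAD_TYPE "MAJOR ROUTE" corresponds to "major road".
THRESHOLDS = {
    "URBAN ROAD": {"average": 15, "good": 30},
    "MAJOR ROUTE": {"average": 25, "good": 50},
}


def saturation_level(road_type, speed):
    """Return the saturation level implied by a road type and a traffic speed."""
    bounds = THRESHOLDS[road_type]
    if speed >= bounds["good"]:
        return "TRAFFIC GOOD"
    if speed >= bounds["average"]:
        return "TRAFFIC AVERAGE"
    return "TRAFFIC BAD"


# The sample record (URBAN ROAD, speed 47) is labelled TRAFFIC GOOD.
print(saturation_level("URBAN ROAD", 47))
```

The same speed of 47 on a major route would fall in the "traffic average" band, which is why the road type has to be taken into account when comparing speeds.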
Geographic positions cannot be calculated from LINK_ID alone. Another table provided
by GovHK has information about these links and nodes. The structure of the table is
as follows:
Attribute | Sample
Link ID | 722-50059
Start Node | 722
Start Node Eastings | 834038.674
End Node | 50059
End Node Eastings | 833862.7
End Node Northings | 816441.533
Region | HK
Road Type | MAJOR ROUTE
Each link is related to two nodes whose geographic positions are given, so the
position of a road can be calculated from those of its start node and end node.
All the geographic information in the table is based on the HK 1980 Grid
Coordinate.
system crashes, we finally obtained complete data from 5th Oct to 14th Oct. In
other words, 2880 XML files are the data source for the time-varying traffic condition
analysis, and each XML file consists of 617 records in the raw data format
above.
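These figures are consistent with one file being captured every five minutes over ten full days, as the short check below shows. The constant names are hypothetical.

```python
FILES_PER_DAY = 24 * 60 // 5   # one XML file every 5 minutes
DAYS = 10                      # 5th Oct to 14th Oct inclusive
RECORDS_PER_FILE = 617

total_files = FILES_PER_DAY * DAYS
total_records = total_files * RECORDS_PER_FILE
print(total_files, total_records)  # prints "2880 1776960"
```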
So it is vitally important to find a way to store and fetch these data efficiently and
flexibly, and using a database is the best choice. Another Java programme
was implemented to extract all the raw data from the 2880 XML files and store each
record in a database table. In total, 1,776,960 records of raw data were stored
in a single table. That is too many for Tableau to process, and attributes like
LINK_ID have no real-life meaning. So some pre-processing must be done:
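Transfer attributes:
HK 1980 Grid Coordinate is used nowhere but Hong Kong to locate places. So the
eastings and northings of the nodes are converted to longitude and latitude.
[Equations: longitude and latitude computed from the grid eastings and northings]
The projection parameters are:
Parameter | HK1980 Grid
Northing of projection origin (N0) | 819,069.80 m N
Easting of projection origin (E0) | 836,694.05 m E
Latitude of projection origin | 22°18'43.68" N
Longitude of projection origin | 114°10'42.80" E
Scale factor on the central meridian (m0) | 1
Meridian distance from the equator to the origin (M0) | 2,468,395.723 m
Radius of curvature in the prime vertical (νs) | 6,381,480.500 m
Radius of curvature in the meridian (ρs) | 6,359,840.760 m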
The conversion is finished by a Java application which updates all the records in
the MySQL database setting the longitude and latitude of the nodes.
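The conversion can be sketched as follows. This is not the Java application used in the project, and it is not confirmed to be the exact formula set the authors applied: it uses the standard truncated inverse transverse Mercator series with the projection parameters listed above, the International 1924 ellipsoid (an assumption not stated in this report), and the commonly quoted approximate shift from HK80 geodetic coordinates to WGS84 of -5.5" in latitude and +8.8" in longitude (also an assumption). All function names are hypothetical.

```python
import math

# Assumption: International 1924 (Hayford) ellipsoid underlying the HK80 datum.
A = 6378388.0
F = 1 / 297.0
E2 = 2 * F - F * F

# Projection parameters of the HK 1980 Grid (false northing as in EPSG:2326).
N0, E0 = 819069.80, 836694.05
PHI0 = math.radians(22 + 18 / 60 + 43.68 / 3600)
LAM0 = math.radians(114 + 10 / 60 + 42.80 / 3600)
M0 = 2468395.723
SCALE = 1.0


def meridian_arc(phi):
    """Meridian distance from the equator to latitude phi (radians)."""
    e2, e4, e6 = E2, E2 ** 2, E2 ** 3
    a0 = 1 - e2 / 4 - 3 * e4 / 64 - 5 * e6 / 256
    a2 = 3 / 8 * (e2 + e4 / 4 + 15 * e6 / 128)
    a4 = 15 / 256 * (e4 + 3 * e6 / 4)
    a6 = 35 * e6 / 3072
    return A * (a0 * phi - a2 * math.sin(2 * phi)
                + a4 * math.sin(4 * phi) - a6 * math.sin(6 * phi))


def radii(phi):
    """Radii of curvature in the prime vertical (nu) and in the meridian (rho)."""
    s2 = math.sin(phi) ** 2
    nu = A / math.sqrt(1 - E2 * s2)
    rho = A * (1 - E2) / (1 - E2 * s2) ** 1.5
    return nu, rho


def grid_to_hk80(easting, northing):
    """HK 1980 Grid (metres) -> HK80 geodetic latitude, longitude (degrees)."""
    d_e, d_n = easting - E0, northing - N0
    # Footpoint latitude: solve meridian_arc(phi) = (dN + M0) / m0 by Newton steps.
    target = (d_n + M0) / SCALE
    phi = PHI0
    for _ in range(10):
        _, rho = radii(phi)
        phi += (target - meridian_arc(phi)) / rho
    nu, rho = radii(phi)
    t = math.tan(phi)
    psi = nu / rho
    x = d_e / (SCALE * nu)
    lam = LAM0 + (x - x ** 3 / 6 * (psi + 2 * t * t)) / math.cos(phi)
    lat = phi - t / (SCALE * rho) * d_e ** 2 / (2 * SCALE * nu)
    return math.degrees(lat), math.degrees(lam)


def grid_to_wgs84(easting, northing):
    """Apply the approximate HK80 -> WGS84 shift (assumed: -5.5", +8.8")."""
    lat, lon = grid_to_hk80(easting, northing)
    return lat - 5.5 / 3600, lon + 8.8 / 3600


# End node of the sample link 722-50059.
print(grid_to_wgs84(833862.7, 816441.533))
```

The approximate shift is only good to a few metres, which is enough for drawing roads on a map; a full datum transformation would be needed for survey-grade accuracy.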
Data aggregation:
Going through the database, we found that the saturation level of a road mostly
lasts for at least 5 consecutive records. Since a record is captured every five
minutes, this means that even the shortest traffic jam lasts for about half an hour.
The data can therefore be aggregated to reduce the number of
records.
Besides, 5-minute traffic fluctuations should be regarded as noise relative to
a day's traffic condition.
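A minimal sketch of such an aggregation is given below. It groups the 5-minute records of each link into 30-minute buckets and averages the traffic speed; taking the mean is an assumption, since the report does not say how the values within a bucket were combined, and the project's own aggregation was written in Java. The function names and the example speeds are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_start(ts, minutes=30):
    """Round a timestamp down to the start of its bucket within the hour."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def aggregate(records, minutes=30):
    """records: iterable of (link_id, capture_date, traffic_speed) tuples.

    Returns {(link_id, bucket_start): mean speed of the bucket}.
    """
    groups = defaultdict(list)
    for link_id, ts, speed in records:
        groups[(link_id, bucket_start(ts, minutes))].append(speed)
    return {key: sum(speeds) / len(speeds) for key, speeds in groups.items()}


# Made-up speeds for one link, one record every 5 minutes.
base = datetime(2014, 11, 7, 10, 3, 34)
speeds = [47, 45, 40, 38, 42, 46, 30]
records = [("3006-30069", base + timedelta(minutes=5 * i), s)
           for i, s in enumerate(speeds)]

result = aggregate(records)
for (link_id, start), mean_speed in sorted(result.items()):
    print(link_id, start, mean_speed)
```

Seven input rows collapse into two: one for the 10:00 bucket and one for the 10:30 bucket.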
3. Database Design
As mentioned above, a database is necessary for both the traffic condition
analysis and the real-time traffic condition visualization.
Since the data are not very complicated, our database consists of three tables for
data storage and one view for Tableau input.
Tables:
1) Jtis_speedmap: store the original raw data provided by GovHK
Table Size: 1,776,960
Field Name | Type
link_id | char
region | char
road_type | char
road_saturation_level | char
traffic_speed | int
id | int
form_capture_date | datetime
Field Name | Type
link_id | char
region | char
road_type | char
road_saturation_level | char
traffic_speed | int
id | int
form_capture_date | datetime
Field Name | Type
link_id | varchar
start_node | int
start_node_eastings | double
start_node_northings | double
end_node | int
end_node_eastings | double
end_node_northings | double
region | varchar
road_type | varchar
Field Name | Type
region | char
road_type | char
road_saturation_level | char
traffic_speed | int
id | int
longitude | float
latitude | float
form_capture_date | datetime
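How the view is defined is not given in this report. The sketch below shows one plausible way to combine speed records with node positions into the field list above. It uses SQLite from Python in place of MySQL, and the table name link_node, the view name tableau_input, the columns start_node_longitude and start_node_latitude, and the coordinates in the example row are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jtis_speedmap (
    id INTEGER PRIMARY KEY,
    link_id TEXT,
    region TEXT,
    road_type TEXT,
    road_saturation_level TEXT,
    traffic_speed INTEGER,
    form_capture_date TEXT
);

-- Hypothetical node table: the longitude/latitude columns are assumed to be
-- filled in by the coordinate conversion step.
CREATE TABLE link_node (
    link_id TEXT PRIMARY KEY,
    start_node INTEGER,
    start_node_longitude REAL,
    start_node_latitude REAL
);

-- Hypothetical view exposing the fields listed for the Tableau input.
CREATE VIEW tableau_input AS
SELECT s.region, s.road_type, s.road_saturation_level, s.traffic_speed, s.id,
       n.start_node_longitude AS longitude,
       n.start_node_latitude AS latitude,
       s.form_capture_date
FROM jtis_speedmap AS s
JOIN link_node AS n ON n.link_id = s.link_id;
""")

conn.execute(
    "INSERT INTO jtis_speedmap VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, "3006-30069", "K", "URBAN ROAD", "TRAFFIC GOOD", 47, "2014-11-07 10:38:34"),
)
# Made-up coordinates, for illustration only.
conn.execute("INSERT INTO link_node VALUES (?, ?, ?, ?)",
             ("3006-30069", 3006, 114.17, 22.32))

row = conn.execute(
    "SELECT region, traffic_speed, longitude, latitude FROM tableau_input"
).fetchone()
print(row)
```

Keeping the join in a view means the stored tables hold each fact once, while Tableau reads a single flat input.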
Difficulties
1. Data Preprocessing
The raw data are not friendly to users or to Tableau. The only field related to
geographic position in the real-time data is LINK_ID. LINK_ID is the primary key of
another, static table which contains the geographic positions of the start node and
end node of every link. But that information is based on the HK 1980 Grid
Coordinate rather than on WGS84 longitudes and latitudes. Therefore, a
transformation of the geographic information is necessary, and this is done in Java.
The real-time data is captured every 5 minutes, which implies the need for
granularity reduction. The aggregation is also done in Java, after which the interval is
enlarged to 30 minutes.
allow cross-domain requests. In the end, the solution is to capture and save the data in
the local database and to send the requests to localhost for updated data.
Visualization Approaches
Undoubtedly, maps are used to give an intuitive, direct view. We also use line charts
for observing fluctuation and comparison, and area charts for changes of proportion.
Many colour schemes were tried for clear labelling and quantification. Dark
colours are used for important objects and light colours for unimportant ones.
Last but not least, recorded videos are very important for conveying the time-varying data.
Visualization Techniques
1. Java: data capture, data store, data aggregation, data transfer
2. JavaScript: Google Maps JavaScript API v3 for real-time traffic condition
visualization
3. PHP: fetch pre-processed data from local server
4. MySQL: data store, data export for Tableau, data support for real-time traffic
condition visualization
5. Tableau: the best tool for making the visualization. Discoveries turn up with the
use of pages, data filtering, all types of charts, different colour combinations
in the charts, etc.
6. Fraps: record screen
7. Weebly: whole project presentation
Implementation
1. The process of preparing data for Tableau
New Insights
1. The roads that suffer traffic jams are usually the same ones: the section of the New
Territories Ring Road adjacent to Tuen Mun and Sha Tin Metro Station, Waterloo
Road, Princess Margaret Road, the West and East Kowloon Corridors and Gloucester
Road from Admiralty to North Point.
2. For urban roads, traffic speed on Hong Kong Island fluctuates most over the course
of a day, followed by that in Kowloon. There is little fluctuation in traffic
speed in Tuen Mun, and traffic speed in Sha Tin is the fastest.
3. Most of the time, traffic jams are most serious on HK Island. Morning peaks on
HK Island and in Kowloon usually appear at 9, but those in Sha Tin tend to start at 8.
4. From Sunday to Monday, the average traffic speed on major routes drops sharply
to a low level, which indicates a serious morning peak.
5. Traffic speed on major routes is generally much higher than that on urban roads.
But during traffic jams, the urban roads are the better choice to go
through, except those in Tuen Mun.