A Unified Storage Format of Traffic Data Atomic Files in LibCity
Jingyuan Wang
Beihang University, Beijing, China
Atomic Files
• To represent different types of traffic data uniformly, LibCity defines five kinds
of atomic files, i.e., the five smallest information units of traffic data.
.dyna
.ext: Describes external information that helps traffic forecasting, such as weather and temperature.
Atomic Files
• Different traffic prediction tasks use different atomic files, so a dataset need not
contain all five kinds of atomic files.
• The .geo, .usr, .rel, .dyna, and .ext files use a CSV-like format consisting of
multiple columns of data.
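Because these files are CSV-like, a standard CSV reader is enough to load them. The minimal Python sketch below assumes a comma-separated file with a header row and standard double-quote quoting for values that contain commas (such as coordinates); the function name read_atomic_file and the sample values are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_atomic_file(handle: TextIO) -> List[Dict[str, str]]:
    """Load a CSV-like atomic file (.geo, .usr, .rel, .dyna, .ext) as row dicts.

    The first line is used as the header. All values are left as strings;
    converting them according to the declared column types is a separate step.
    """
    return list(csv.DictReader(handle))


# Hypothetical .geo content; coordinates contain commas, so they are quoted.
sample = io.StringIO(
    "geo_id,type,coordinates\n"
    '0,Point,"[100.0, 0.0]"\n'
    '1,Point,"[101.0, 1.0]"\n'
)
rows = read_atomic_file(sample)
print(rows[1]["coordinates"])  # [101.0, 1.0]
```

For a file on disk, open it with newline="" as the csv module recommends, e.g. open("METR_LA.geo", newline="").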
geo_id type coordinates
• Polygon: [[ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]]
METR_LA.geo, Foursquare.geo
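Coordinates are stored as GeoJSON coordinate strings, so an ordinary JSON parser can decode them. The Python sketch below parses the Polygon example and computes its planar area with the shoelace formula; the helper names and the closed-ring check (taken from the GeoJSON specification, RFC 7946) are illustrative additions, not part of LibCity.

```python
import json
from typing import List

Ring = List[List[float]]

# Coordinates string copied from the Polygon example above.
POLYGON = "[[ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]]"


def parse_polygon(text: str) -> List[Ring]:
    """Parse GeoJSON Polygon coordinates and verify every ring is closed."""
    rings = json.loads(text)
    for ring in rings:
        # A GeoJSON linear ring has at least four positions and ends where it starts.
        if len(ring) < 4 or ring[0] != ring[-1]:
            raise ValueError("polygon ring is not a closed linear ring")
    return rings


def ring_area(ring: Ring) -> float:
    """Shoelace area in squared coordinate units (planar, ignores Earth curvature)."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


rings = parse_polygon(POLYGON)
print(len(rings), ring_area(rings[0]))  # 1 1.0
```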
Usr Table
usr_id
0
1
2
3
4
METR_LA.rel METR_LA.geo
Road Network
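A .rel file lists relations between geo entities, which for a road network can be turned into an adjacency matrix. This sketch assumes each .rel row carries origin_id and destination_id columns plus a numeric link_weight property (the property name used in the config example of this deck); the column names, function name, and sample rows are assumptions for illustration.

```python
from typing import Dict, List


def build_adjacency(geo_ids: List[str], rel_rows: List[Dict[str, str]]) -> List[List[float]]:
    """Build a dense, directed, weighted adjacency matrix ordered like geo_ids.

    Assumes each .rel row has origin_id, destination_id and link_weight columns.
    Missing relations are left as 0.0.
    """
    index = {geo_id: i for i, geo_id in enumerate(geo_ids)}
    n = len(geo_ids)
    adj = [[0.0] * n for _ in range(n)]
    for row in rel_rows:
        i = index[row["origin_id"]]
        j = index[row["destination_id"]]
        adj[i][j] = float(row["link_weight"])
    return adj


# Hypothetical road network with three sensors.
rels = [
    {"rel_id": "0", "type": "geo", "origin_id": "0", "destination_id": "1", "link_weight": "0.5"},
    {"rel_id": "1", "type": "geo", "origin_id": "1", "destination_id": "2", "link_weight": "0.8"},
]
adj = build_adjacency(["0", "1", "2"], rels)
print(adj[0][1], adj[1][2], adj[1][0])  # 0.5 0.8 0.0
```

The matrix is not symmetrized, so a relation stored in one direction stays one-way; an undirected model would also fill adj[j][i].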
Dyna Table
• Rows in the table must be grouped by <entity_id>, and rows with the same
<entity_id> are sorted by <time>.
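The required row order can be produced with a single sort on the composite key (entity_id, time). The sketch assumes numeric entity IDs and UTC timestamps written exactly as in the examples of this deck (e.g. 2013-01-01T00:00:00Z); the function name and sample rows are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # e.g. 2013-01-01T00:00:00Z


def order_dyna(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Group rows by entity_id (ascending numeric IDs) and sort each group by time."""
    return sorted(
        rows,
        key=lambda r: (int(r["entity_id"]), datetime.strptime(r["time"], TIME_FORMAT)),
    )


# Hypothetical unordered rows for two sensors.
rows = [
    {"dyna_id": "0", "entity_id": "1", "time": "2013-01-01T01:00:00Z"},
    {"dyna_id": "1", "entity_id": "0", "time": "2013-01-01T01:00:00Z"},
    {"dyna_id": "2", "entity_id": "1", "time": "2013-01-01T00:00:00Z"},
    {"dyna_id": "3", "entity_id": "0", "time": "2013-01-01T00:00:00Z"},
]
print([r["dyna_id"] for r in order_dyna(rows)])  # ['3', '1', '2', '0']
```

Sorting entities in ascending ID order is one way to satisfy the grouping rule; any entity order works as long as each entity's rows are contiguous and time-ordered.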
Dyna Table
• For entities that can be numbered one-dimensionally (sensors, road sections, areas,
etc.), entity_id is the corresponding ID, the column is named [entity_id], and the file
extension is .dyna.
• For grid-based traffic data, entity_id is [row_id, column_id], and the file extension
is .grid.
Grid-based
NYC_RISK.grid
dyna_id type time row_id column_id risk
0 state 2013-01-01T00:00:00Z 0 0 0
1 state 2013-01-01T01:00:00Z 0 0 0
2 state 2013-01-01T02:00:00Z 0 0 0
3 state 2013-01-01T03:00:00Z 0 0 0
4 state 2013-01-01T04:00:00Z 0 0 0
NYC_RISK.geo
geo_id type coordinates row_id column_id
0 Polygon [] 0 7
1 Polygon [] 0 8
2 Polygon [] 0 10
3 Polygon [] 0 11
4 Polygon [] 0 12
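Because a grid entity is addressed by [row_id, column_id], grid data maps naturally onto a time × rows × columns array. In this sketch the grid size is passed in explicitly (an assumption; the deck does not say where it is stored), timestamps are assumed to share one ISO-8601 format so that string sorting is chronological, and the function name and records are hypothetical.

```python
from typing import Dict, List

Tensor = List[List[List[float]]]


def grid_to_tensor(rows: List[Dict[str, str]], value_col: str,
                   n_rows: int, n_cols: int) -> Tensor:
    """Arrange .grid rows into a [time][row_id][column_id] nested list.

    Time steps are the distinct timestamps in ascending order; cells without
    a record stay 0.0.
    """
    times = sorted({r["time"] for r in rows})  # uniform ISO strings sort chronologically
    t_index = {t: k for k, t in enumerate(times)}
    tensor = [[[0.0] * n_cols for _ in range(n_rows)] for _ in times]
    for r in rows:
        k = t_index[r["time"]]
        tensor[k][int(r["row_id"])][int(r["column_id"])] = float(r[value_col])
    return tensor


# Hypothetical records in the NYC_RISK.grid layout (risk as the value column).
rows = [
    {"time": "2013-01-01T00:00:00Z", "row_id": "0", "column_id": "0", "risk": "0"},
    {"time": "2013-01-01T01:00:00Z", "row_id": "0", "column_id": "1", "risk": "2"},
    {"time": "2013-01-01T01:00:00Z", "row_id": "1", "column_id": "0", "risk": "1"},
]
tensor = grid_to_tensor(rows, "risk", n_rows=2, n_cols=2)
print(tensor[1])  # [[0.0, 2.0], [1.0, 0.0]]
```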
Dyna Table
• For OD-based traffic data, entity_id is [origin_id, destination_id], and the file
extension is .od.
Data.od Data.geo
OD-based
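With entity_id = [origin_id, destination_id], each timestamp of an .od file describes one origin-destination matrix. The sketch below assumes origin_id and destination_id reference geo_ids in the accompanying .geo file and that each (time, origin, destination) combination appears at most once; the function name and the value column name flow are hypothetical.

```python
from typing import Dict, List


def od_matrix(rows: List[Dict[str, str]], geo_ids: List[str],
              time: str, value_col: str) -> List[List[float]]:
    """Build the origin-destination matrix for one timestamp from .od rows."""
    index = {geo_id: i for i, geo_id in enumerate(geo_ids)}
    n = len(geo_ids)
    matrix = [[0.0] * n for _ in range(n)]
    for r in rows:
        if r["time"] == time:
            matrix[index[r["origin_id"]]][index[r["destination_id"]]] = float(r[value_col])
    return matrix


# Hypothetical .od rows; "flow" is an illustrative value column name.
rows = [
    {"time": "2013-01-01T00:00:00Z", "origin_id": "0", "destination_id": "1", "flow": "5"},
    {"time": "2013-01-01T00:00:00Z", "origin_id": "1", "destination_id": "0", "flow": "3"},
    {"time": "2013-01-01T01:00:00Z", "origin_id": "0", "destination_id": "1", "flow": "7"},
]
print(od_matrix(rows, ["0", "1"], "2013-01-01T00:00:00Z", "flow"))  # [[0.0, 5.0], [3.0, 0.0]]
```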
Dyna Table
Grid-based OD
Data.gridod
dyna_id type time origin_row_id origin_column_id destination_row_id destination_column_id
0 state 2013-01-01T00:00:00Z 0 1 2 1
1 state 2013-01-01T01:00:00Z 0 1 2 1
2 state 2013-01-01T02:00:00Z 0 1 2 1
3 state 2013-01-01T03:00:00Z 0 1 2 1
4 state 2013-01-01T04:00:00Z 0 1 2 1
Data.geo
geo_id type coordinates row_id column_id
0 Polygon [] 0 7
1 Polygon [] 0 8
2 Polygon [] 0 10
3 Polygon [] 0 11
4 Polygon [] 0 12
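In the example above, the .geo rows also carry row_id and column_id, so each grid cell referenced by a .gridod row can be mapped back to a geo_id. The sketch below assumes this cell-to-geo_id mapping is one-to-one; the function names and the sample .gridod record are hypothetical, while the .geo rows are taken from the example.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def cell_to_geo(geo_rows: List[Dict[str, str]]) -> Dict[Cell, str]:
    """Map (row_id, column_id) to geo_id using the row_id/column_id columns of .geo."""
    return {(int(g["row_id"]), int(g["column_id"])): g["geo_id"] for g in geo_rows}


def gridod_pairs(rows: List[Dict[str, str]],
                 geo_rows: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Convert .gridod rows into (time, origin geo_id, destination geo_id) triples."""
    cells = cell_to_geo(geo_rows)
    pairs = []
    for r in rows:
        origin = cells[(int(r["origin_row_id"]), int(r["origin_column_id"]))]
        dest = cells[(int(r["destination_row_id"]), int(r["destination_column_id"]))]
        pairs.append((r["time"], origin, dest))
    return pairs


# First three Data.geo rows from the example above.
geo_rows = [
    {"geo_id": "0", "row_id": "0", "column_id": "7"},
    {"geo_id": "1", "row_id": "0", "column_id": "8"},
    {"geo_id": "2", "row_id": "0", "column_id": "10"},
]
# Hypothetical .gridod record from cell (0, 7) to cell (0, 10).
od_rows = [{"time": "2013-01-01T00:00:00Z", "origin_row_id": "0", "origin_column_id": "7",
            "destination_row_id": "0", "destination_column_id": "10"}]
print(gridod_pairs(od_rows, geo_rows))  # [('2013-01-01T00:00:00Z', '0', '2')]
```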
Dyna Table
• The traj_id column numbers the trajectories of the same user, starting from 0; if a
user has only one trajectory, this column can be left empty.
• Rows in the table must be grouped by <entity_id>; rows with the same <entity_id> are
sorted by <traj_id>, and rows with the same <traj_id> are sorted by <time>.
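The three-level ordering can again be expressed as one sort, after which consecutive rows sharing (entity_id, traj_id) form one trajectory. This sketch reads an empty traj_id as 0, which is consistent with numbering from 0 when a user has a single trajectory but is an interpretation rather than a stated rule; the function names and sample rows are hypothetical.

```python
from datetime import datetime
from itertools import groupby
from typing import Dict, List, Tuple

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def traj_key(r: Dict[str, str]) -> Tuple[int, int]:
    """(entity_id, traj_id); an empty traj_id is read as 0 (single trajectory)."""
    return int(r["entity_id"]), int(r.get("traj_id") or 0)


def split_trajectories(rows: List[Dict[str, str]]) -> Dict[Tuple[int, int], List[str]]:
    """Sort by entity_id, traj_id, time and collect the dyna_ids of each trajectory."""
    ordered = sorted(
        rows,
        key=lambda r: (*traj_key(r), datetime.strptime(r["time"], TIME_FORMAT)),
    )
    return {key: [r["dyna_id"] for r in group] for key, group in groupby(ordered, key=traj_key)}


# Hypothetical rows: user 0 has two trajectories, user 1 has one (traj_id left empty).
rows = [
    {"dyna_id": "0", "entity_id": "0", "traj_id": "1", "time": "2012-04-03T10:00:00Z"},
    {"dyna_id": "1", "entity_id": "0", "traj_id": "0", "time": "2012-04-03T09:00:00Z"},
    {"dyna_id": "2", "entity_id": "1", "traj_id": "", "time": "2012-04-03T12:00:00Z"},
    {"dyna_id": "3", "entity_id": "1", "traj_id": "", "time": "2012-04-03T08:00:00Z"},
    {"dyna_id": "4", "entity_id": "0", "traj_id": "0", "time": "2012-04-03T08:30:00Z"},
]
print(split_trajectories(rows))
# {(0, 0): ['4', '1'], (0, 1): ['0'], (1, 0): ['3', '2']}
```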
Dyna Table
• For GPS trajectories, the format is: dyna_id, type, time, entity_id, (traj_id), coordinates, properties.
• The coordinates column holds the latitude and longitude of the GPS point.
• For POI-based trajectories, the format is: dyna_id, type, time, entity_id, (traj_id), location, properties.
• The location column holds a geo_id, which refers to the Geo table and represents a
POI.
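Both trajectory formats resolve to a position, either directly from the coordinates column or indirectly through the geo_id in location. The sketch assumes coordinates are GeoJSON Point strings and returns them in the stored order without swapping axes (GeoJSON itself orders positions as longitude, latitude); the function names and sample values are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_lookup(geo_rows: List[Dict[str, str]]) -> Dict[str, Point]:
    """Map geo_id to the position of every Point entity in the Geo table."""
    return {g["geo_id"]: tuple(json.loads(g["coordinates"]))
            for g in geo_rows if g["type"] == "Point"}


def position(row: Dict[str, str], pois: Dict[str, Point]) -> Point:
    """Return a trajectory row's position from coordinates (GPS) or location (POI)."""
    if row.get("coordinates"):
        return tuple(json.loads(row["coordinates"]))
    return pois[row["location"]]


# Hypothetical Geo table entry and one row of each trajectory format.
pois = point_lookup([{"geo_id": "5", "type": "Point", "coordinates": "[-73.99, 40.73]"}])
gps_row = {"dyna_id": "0", "coordinates": "[116.3, 39.9]"}
poi_row = {"dyna_id": "1", "location": "5"}
print(position(gps_row, pois), position(poi_row, pois))  # (116.3, 39.9) (-73.99, 40.73)
```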
Ext Table
0 2012-03-01T00:00:00Z 272.03
1 2012-03-01T00:05:00Z 271.46
2 2012-03-01T00:10:00Z 271.19
3 2012-03-01T00:15:00Z 271.07
4 2012-03-01T00:20:00Z 270.83
Data.ext
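External data usually has its own sampling times, so it must be aligned with the .dyna timestamps before use. The sketch performs an as-of lookup (latest external record at or before a given time), which is one common alignment choice rather than a LibCity rule; it assumes the second column of Data.ext is the timestamp and the third a numeric value whose column name is not shown in the example. The function name is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# (time, value) pairs from the Data.ext example; the value column's name is not shown.
EXT: List[Tuple[str, float]] = [
    ("2012-03-01T00:00:00Z", 272.03),
    ("2012-03-01T00:05:00Z", 271.46),
    ("2012-03-01T00:10:00Z", 271.19),
    ("2012-03-01T00:15:00Z", 271.07),
    ("2012-03-01T00:20:00Z", 270.83),
]


def ext_as_of(ext: List[Tuple[str, float]], when: str) -> Optional[float]:
    """Return the latest value recorded at or before `when`; ext must be time-sorted."""
    times = [datetime.strptime(t, TIME_FORMAT) for t, _ in ext]
    k = bisect_right(times, datetime.strptime(when, TIME_FORMAT))
    return ext[k - 1][1] if k > 0 else None


print(ext_as_of(EXT, "2012-03-01T00:07:30Z"))  # 271.46
```

Re-parsing the timestamps on every call keeps the sketch short; for many lookups, parse them once and reuse the list.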
Data Type Definition
The data type of each column in the dataset must be declared in the config file,
which simplifies subsequent data processing.
Type Description
geo_id Discrete IDs from a finite set that exist in the Geo table.
usr_id Discrete IDs from a finite set that exist in the Usr table.
rel_id Discrete IDs from a finite set that exist in the Rel table.
time Time string conforming to the ISO-8601 standard.
coordinate String conforming to the coordinate representation of the GeoJSON format.
num Real number.
enum Enumerated (categorical) string.
other Any remaining value, stored as a string.
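These type names are simple enough to check mechanically when loading a dataset. The sketch below is a hypothetical validator, not LibCity code: for time it relies on Python's datetime.fromisoformat, which accepts only a subset of ISO-8601 on versions before 3.11, and for coordinate it only checks that the string is a JSON array rather than validating the full GeoJSON structure.

```python
import json
from datetime import datetime
from typing import Optional, Set

ID_TYPES = {"geo_id", "usr_id", "rel_id"}


def is_valid(value: str, type_name: str, known_ids: Optional[Set[str]] = None) -> bool:
    """Check a raw string against one of the type names in the table above."""
    if type_name in ID_TYPES:
        # IDs must already exist in the referenced table.
        return known_ids is not None and value in known_ids
    if type_name == "time":
        try:
            # fromisoformat rejects a trailing "Z" before Python 3.11, so normalize it.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return True
        except ValueError:
            return False
    if type_name == "coordinate":
        try:
            return isinstance(json.loads(value), list)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return False
    if type_name == "num":
        try:
            float(value)
            return True
        except ValueError:
            return False
    # enum and other are plain strings.
    return True


print(is_valid("2013-01-01T00:00:00Z", "time"), is_valid("warm", "num"))  # True False
```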
Config File
• The config file supplements the five tables above with descriptive information.
It is stored in JSON format and has six keys: geo, usr, rel, dyna, ext, and info.
• For geo, rel, and dyna:
• An including_types key holds an array listing the type values that appear in the table.
Each of these types is then used as a key whose value lists the property columns of
rows with that type, together with their data types.
• For info:
• It contains other necessary statistical information about the dataset; its contents
differ across traffic prediction tasks.
Config File
{
  "geo": {
    "including_types": [
      "Point"
    ],
    "Point": {
      "poi_name": "other"
    }
  },
  "usr": {
    "properties": {
      "user_type": "enum",
      "birth_year": "time",
      "gender": "enum"
    }
  },
  "rel": {
    "including_types": [
      "geo"
    ],
    "geo": {
      "link_weight": "num"
    }
  }
}
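A loader can use these sections to look up the declared type of each property column before converting values. The sketch below parses the example sections as JSON and casts only num columns; the function names are hypothetical, and the origin_id/destination_id columns in the sample .rel row are assumed rather than shown in this deck.

```python
import json
from typing import Dict, Optional

# The geo/usr/rel sections from the config example above.
CONFIG = json.loads("""
{
  "geo": {"including_types": ["Point"], "Point": {"poi_name": "other"}},
  "usr": {"properties": {"user_type": "enum", "birth_year": "time", "gender": "enum"}},
  "rel": {"including_types": ["geo"], "geo": {"link_weight": "num"}}
}
""")


def property_types(config: dict, table: str, row_type: Optional[str] = None) -> Dict[str, str]:
    """Return {column: type name} for one table and, where used, one row type."""
    section = config[table]
    if "including_types" in section:
        if row_type not in section["including_types"]:
            raise KeyError(f"type {row_type!r} is not declared for table {table!r}")
        return section[row_type]
    return section.get("properties", {})


def cast_row(row: Dict[str, str], types: Dict[str, str]) -> dict:
    """Convert num-typed properties to float; other types stay strings here."""
    return {col: float(val) if types.get(col) == "num" and val != "" else val
            for col, val in row.items()}


rel_row = {"rel_id": "0", "type": "geo", "origin_id": "0",
           "destination_id": "1", "link_weight": "0.5"}
print(cast_row(rel_row, property_types(CONFIG, "rel", rel_row["type"]))["link_weight"])  # 0.5
```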
LibCity: https://libcity.ai/
Application
• LibCity has converted 35 open-source datasets covering 22 cities in 11 countries
into datasets in the standard atomic format.
• LibCity has also open-sourced its atomic file conversion scripts, which users can
consult when converting their own traffic datasets.
http://www.bigcity.ai