A Unified Storage Format of Traffic Data:

Atomic Files in LibCity

Jingyuan Wang
Beihang University, Beijing, China
Atomic Files
• To represent different types of traffic data uniformly, LibCity defines five kinds
of atomic files, i.e., the five smallest information units of traffic data.

File Name     Information                        Meaning
xxx.geo       Geographical entity information    Describes the attributes of three types of entities in geographic space (points, lines, and areas), such as POIs, road segments, and areas.
xxx.usr       User entity information            Describes the attributes of people involved in transportation, such as age and gender.
xxx.rel       Entity relational information      Describes the relationships between entities, such as the adjacency between road segments.
xxx.dyna      Traffic state information          Describes the state of the traffic system on each entity, such as the speed at each intersection.
xxx.ext       Additional auxiliary information   Describes information that helps traffic forecasting, such as weather and temperature.
config.json   Configuration information          Supplements the descriptions of the tables above.
Atomic Files
• .geo: entities of type Point, LineString, or Polygon, stored as <geo_id, type, coordinates, properties>
• .usr: <usr_id, properties>
• .rel: <rel_id, type, origin_id, destination_id, properties>
• .dyna:
  • Group traffic dynamics: flow / speed / demand…
  • Individual traffic dynamics: individual travel trajectory, individual OD…
  • Relationship dynamics: changes in connection, changes in OD flow, peering…
• .ext: information that helps traffic forecasting, such as weather and temperature.
Atomic Files

• Different traffic prediction tasks may use different atomic files, so a dataset need not
contain all six kinds of atomic files (the five data files plus config.json).

• The .geo, .usr, .rel, .dyna, and .ext files use a CSV-like format consisting of
multiple columns of data.
METR-LA.geo
geo_id    type    coordinates
773869    Point   [-118.32,34.15]
767541    Point   [-118.24,34.12]
...       ...     ...
769373    Point   [-118.32,34.10]

METR-LA.dyna
dyna_id   type    time                   entity_id   traffic_speed
0         state   2012-03-01T00:00:00Z   773869      64.375
1         state   2012-03-01T00:05:00Z   773869      62.667
...       ...     ...                    ...         ...
7094303   state   2012-06-27T23:55:00Z   769373      61.778

METR-LA.rel
rel_id    type   origin_id   destination_id   cost
0         geo    716328      716328           0.0
1         geo    716328      716331           4123.8
...       ...    ...         ...              ...
11752     geo    774207      774207           0.0


Geo Table

Geo Table: Geographical entity information


geo_id, type, coordinates, properties (multiple columns).
• geo_id: The primary key that uniquely identifies a geo entity (e.g., a sensor, a
latitude-longitude point, a road segment, or an area).
• type: The geometry type of the entity, one of [Point, LineString, Polygon]. These three
values correspond to the point, line, and polygon geometry types in GeoJSON.
• coordinates: An array or nested array of floats describing the location of the geo
entity, in the GeoJSON coordinates format.
• properties: The attributes of the geo entity. Multiple attributes can be stored in
multiple columns with different column names, such as POI_name and POI_type.
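As a minimal sketch (not LibCity's own loader), a .geo file can be read with Python's standard csv module, decoding the coordinates column as a JSON array and treating every remaining column as a property. The helper name load_geo and the two sample rows (taken from the Foursquare example later in this deck) are illustrative assumptions.

```python
import csv
import io
import json

GEO_TYPES = {"Point", "LineString", "Polygon"}


def load_geo(text):
    """Parse .geo CSV text into {geo_id: {"type", "coordinates", "properties"}}."""
    entities = {}
    for row in csv.DictReader(io.StringIO(text)):
        geo_id = row.pop("geo_id")
        geo_type = row.pop("type")
        if geo_type not in GEO_TYPES:
            raise ValueError(f"unknown geo type: {geo_type}")
        if geo_id in entities:
            raise ValueError(f"duplicate geo_id (primary key): {geo_id}")
        entities[geo_id] = {
            "type": geo_type,
            # coordinates are stored as a JSON array in GeoJSON order
            "coordinates": json.loads(row.pop("coordinates")),
            # every remaining column is treated as a property column
            "properties": row,
        }
    return entities


sample = """geo_id,type,coordinates,venue_category_name
0,Point,"[-74.003,40.733]",Music Venue
1,Point,"[-73.975,40.758]",Gym / Fitness Center
"""
geo = load_geo(sample)
print(geo["1"]["coordinates"])  # [-73.975, 40.758]
```

The coordinates field must be quoted in the CSV because the JSON array itself contains a comma.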
Geo Table

Geo Table: Geographical entity information


geo_id, type, coordinates, properties (multiple columns).

• Note: GeoJSON coordinates are ordered longitude first, latitude second.


• Point: [102.0, 0.5]

• LineString: [ [102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0] ]

• Polygon: [[ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]]

(Figure: example Point, LineString, and Polygon geometries)
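The nesting rules above can be checked mechanically. The sketch below is an illustrative validator, not part of LibCity; it assumes 2D positions only (GeoJSON also permits an optional altitude) and checks nesting depth, [longitude, latitude] ranges, and that each polygon ring is closed, but not ring orientation or self-intersection. The function name check_coordinates is hypothetical.

```python
def check_coordinates(geo_type, coords):
    """Return True if coords has the GeoJSON nesting expected for geo_type."""

    def is_position(p):
        return (
            isinstance(p, list)
            and len(p) == 2
            and all(isinstance(v, (int, float)) for v in p)
            and -180.0 <= p[0] <= 180.0  # longitude first
            and -90.0 <= p[1] <= 90.0    # latitude second
        )

    if geo_type == "Point":
        return is_position(coords)
    if geo_type == "LineString":
        # two or more positions
        return isinstance(coords, list) and len(coords) >= 2 and all(map(is_position, coords))
    if geo_type == "Polygon":
        # a list of closed linear rings: >= 4 positions, first == last
        return isinstance(coords, list) and len(coords) >= 1 and all(
            isinstance(ring, list)
            and len(ring) >= 4
            and all(map(is_position, ring))
            and ring[0] == ring[-1]
            for ring in coords
        )
    return False


print(check_coordinates("Point", [102.0, 0.5]))  # True
print(check_coordinates("Point", [0.5, 102.0]))  # False: 102.0 is not a valid latitude
```

Swapping the order of a point is caught here only when the swapped value falls outside the latitude range, so the check is a guard, not a proof of correct ordering.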


Geo Table

Geo Table: Geographical entity information


geo_id, type, coordinates, properties (multiple columns).

METR_LA.geo
geo_id    type    coordinates
773869    Point   [-118.31828, 34.15497]
767541    Point   [-118.23799, 34.11620]
767542    Point   [-118.23818, 34.11640]
717447    Point   [-118.26772, 34.07248]
717446    Point   [-118.26572, 34.07142]

Foursquare.geo
geo_id   type    coordinates        venue_category_id          venue_category_name
0        Point   [-74.003,40.733]   4bf58dd8d48988d1e7931735   Music Venue
1        Point   [-73.975,40.758]   4bf58dd8d48988d176941735   Gym / Fitness Center
2        Point   [-74.003,40.652]   4bf58dd8d48988d1e4931735   Bowling Alley
3        Point   [-73.980,40.726]   4bf58dd8d48988d118941735   Bar
4        Point   [-73.967,40.756]   4bf58dd8d48988d11d941735   Bar

Usr Table

Usr Table: User entity information


usr_id, properties (multiple columns).
• usr_id: The primary key that uniquely identifies a usr entity.
• properties: The attributes of the usr entity. Multiple attributes can be stored in
multiple columns with different column names, such as gender and birth_date.

Foursquare.usr
usr_id
0
1
2
3
4

(Figure: user portrait and travel preferences)


Rel Table

Rel Table: Entity relational information


rel_id, type, origin_id, destination_id, properties (multiple columns).
• rel_id: The primary key that uniquely identifies a relationship between entities.
• type: The type of the relationship, one of [usr, geo], indicating whether it is a
relationship between geo entities or between usr entities.
• origin_id: The ID of the origin entity of the relationship, taken from either the Geo
table or the Usr table.
• destination_id: The ID of the destination entity of the relationship, taken from either
the Geo table or the Usr table.
• properties: The attributes of the relationship. Multiple attributes can be stored in
multiple columns with different column names.
Rel Table

Rel Table: Entity relational information


rel_id, type, origin_id, destination_id, properties (multiple columns).

METR_LA.rel
rel_id   type   origin_id   destination_id   cost
0        geo    716328      716328           0
1        geo    716328      716331           4123.8
2        geo    716328      716337           5179.6
3        geo    716328      716339           7245.5
4        geo    716328      716939           4785.1

METR_LA.geo
geo_id    type    coordinates
773869    Point   [-118.31828, 34.15497]
767541    Point   [-118.23799, 34.11620]
767542    Point   [-118.23818, 34.11640]
717447    Point   [-118.26772, 34.07248]
717446    Point   [-118.26572, 34.07142]

(Figures: Road Network; Social Network)
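For graph-based models, a .rel file of type geo is typically turned into an adjacency matrix. The sketch below is an illustration under stated assumptions, not LibCity's loader: the caller supplies the geo_id order, the weight column defaults to cost as in the METR_LA example (the config example later uses link_weight instead), and absent pairs are filled with 0.0, which a real pipeline cannot distinguish from a zero-cost self-loop. The function name rel_to_adjacency is hypothetical.

```python
import csv
import io


def rel_to_adjacency(rel_text, geo_ids, weight_col="cost"):
    """Build a dense weighted adjacency matrix from .rel CSV text."""
    index = {gid: i for i, gid in enumerate(geo_ids)}  # geo_id -> row/column
    n = len(geo_ids)
    adj = [[0.0] * n for _ in range(n)]
    for row in csv.DictReader(io.StringIO(rel_text)):
        if row["type"] != "geo":
            continue  # keep only geo-to-geo relations (the road network)
        i = index[row["origin_id"]]
        j = index[row["destination_id"]]
        adj[i][j] = float(row[weight_col])  # directed: origin -> destination
    return adj


rel_text = """rel_id,type,origin_id,destination_id,cost
0,geo,716328,716328,0.0
1,geo,716328,716331,4123.8
2,geo,716328,716337,5179.6
"""
adj = rel_to_adjacency(rel_text, ["716328", "716331", "716337"])
print(adj[0])  # [0.0, 4123.8, 5179.6]
```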
Dyna Table

Dyna Table: Traffic state information


dyna_id, type, time, entity_id (multiple columns), properties (multiple columns).
• dyna_id: The primary key that uniquely identifies a record in the Dyna table.
• type: The type of the record, one of two values: trajectory (for trajectory-based tasks)
or state (for traffic state prediction tasks).
• time: Time information in the ISO-8601 combined date and time notation, e.g.,
2020-12-07T02:59:46Z.
• entity_id: The entity the record refers to, i.e., a geo ID or a usr ID.
• properties: The attributes of the record. Multiple attributes can be stored in multiple
columns with different column names, e.g., both speed and flow.
Dyna Table

Dyna Table: Traffic state information


• type: state

• Point-based / Road-based / Region-based / Grid-based / OD-based / Grid-OD-based

• The format is: dyna_id, type (= state), time, entity_id, properties.

• For ease of use, the entity_id column(s) change with the data structure: a single ID for
point-, road-, or region-based data, [row_id, column_id] for grid-based data, and so on.

• Rows should be grouped by <entity_id>, and rows with the same <entity_id> should be
sorted by <time>.
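The ordering rule can be verified in a single pass. This is a sketch, assuming timestamps always use the exact UTC form shown in the examples (seconds precision, trailing Z); other valid ISO-8601 variants would need a more general parser. The names parse_time and is_properly_ordered are hypothetical.

```python
from datetime import datetime, timezone


def parse_time(s):
    """Parse a UTC timestamp in the form used above, e.g. 2020-12-07T02:59:46Z."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_properly_ordered(rows):
    """Check rows (entity_id, time) are grouped by entity and time-sorted in each group."""
    seen = set()
    prev_entity, prev_time = None, None
    for entity, time_str in rows:
        t = parse_time(time_str)
        if entity != prev_entity:
            if entity in seen:
                return False  # entity reappears after another one: not grouped
            seen.add(entity)
        elif t < prev_time:
            return False  # time goes backwards within a group
        prev_entity, prev_time = entity, t
    return True


good = [
    ("773869", "2012-03-01T00:00:00Z"),
    ("773869", "2012-03-01T00:05:00Z"),
    ("767541", "2012-03-01T00:00:00Z"),
]
print(is_properly_ordered(good))  # True
```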
Dyna Table

Dyna Table: Traffic state information


• type: state (Point-based / Road-based / Region-based)

• The format is: dyna_id, type (= state), time, entity_id, properties.

• For entities that can be numbered one-dimensionally, such as sensors, road segments, and
areas, entity_id is the corresponding ID, the column name is [entity_id], and the file
extension is .dyna.

METR_LA.dyna
dyna_id   type    time                   entity_id   traffic_speed
0         state   2012-03-01T00:00:00Z   773869      64.375
1         state   2012-03-01T00:05:00Z   773869      62.66667
2         state   2012-03-01T00:10:00Z   773869      64
3         state   2012-03-01T00:15:00Z   773869      0
4         state   2012-03-01T00:20:00Z   773869      0

METR_LA.geo
geo_id    type    coordinates
773869    Point   [-118.31828, 34.15497]
767541    Point   [-118.23799, 34.11620]
767542    Point   [-118.23818, 34.11640]
717447    Point   [-118.26772, 34.07248]
717446    Point   [-118.26572, 34.07142]

(Figure: point-based data)
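Point-based state data is usually consumed as a time-by-entity matrix. A minimal sketch, assuming the caller provides the geo_id column order and the name of the property column to extract; entities with no reading at a timestamp are left as None. The function name dyna_to_matrix is hypothetical.

```python
import csv
import io


def dyna_to_matrix(dyna_text, geo_ids, value_col):
    """Reshape point-based .dyna CSV text into matrix[time_index][entity_index]."""
    cells = {}
    for row in csv.DictReader(io.StringIO(dyna_text)):
        cells[(row["time"], row["entity_id"])] = float(row[value_col])
    # timestamps in the same fixed ISO-8601 UTC form sort chronologically as strings
    times = sorted({t for t, _ in cells})
    # entities with no reading at a timestamp get None
    matrix = [[cells.get((t, gid)) for gid in geo_ids] for t in times]
    return times, matrix


dyna_text = """dyna_id,type,time,entity_id,traffic_speed
0,state,2012-03-01T00:00:00Z,773869,64.375
1,state,2012-03-01T00:05:00Z,773869,62.66667
2,state,2012-03-01T00:10:00Z,773869,64
"""
times, matrix = dyna_to_matrix(dyna_text, ["773869", "767541"], "traffic_speed")
print(matrix[0])  # [64.375, None]
```

Sorting the timestamps as strings is only safe because every value uses the same fixed format.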


Dyna Table

Dyna Table: Traffic state information


• type: state (Grid-based)

• The format is: dyna_id, type (= state), time, entity_id, properties.

• For grid-based traffic data, entity_id is the pair of columns [row_id, column_id], and the
file extension is .grid.

NYC_RISK.grid
dyna_id   type    time                   row_id   column_id   risk
0         state   2013-01-01T00:00:00Z   0        0           0
1         state   2013-01-01T01:00:00Z   0        0           0
2         state   2013-01-01T02:00:00Z   0        0           0
3         state   2013-01-01T03:00:00Z   0        0           0
4         state   2013-01-01T04:00:00Z   0        0           0

NYC_RISK.geo
geo_id   type      coordinates   row_id   column_id
0        Polygon   []            0        7
1        Polygon   []            0        8
2        Polygon   []            0        10
3        Polygon   []            0        11
4        Polygon   []            0        12

(Figure: grid-based data)
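Grid-based records map naturally onto a [time][row][column] tensor. In this sketch the grid dimensions n_rows and n_cols are assumed to be known in advance (the slides do not say where they are recorded), and cells without a record stay None. The function name grid_to_tensor is hypothetical.

```python
import csv
import io


def grid_to_tensor(grid_text, n_rows, n_cols, value_col):
    """Reshape .grid CSV text into tensor[time_index][row_id][column_id]."""
    frames = {}
    for row in csv.DictReader(io.StringIO(grid_text)):
        t = row["time"]
        if t not in frames:
            # one n_rows x n_cols frame per timestamp; unseen cells stay None
            frames[t] = [[None] * n_cols for _ in range(n_rows)]
        frames[t][int(row["row_id"])][int(row["column_id"])] = float(row[value_col])
    times = sorted(frames)
    return times, [frames[t] for t in times]


grid_text = """dyna_id,type,time,row_id,column_id,risk
0,state,2013-01-01T00:00:00Z,0,0,0
1,state,2013-01-01T01:00:00Z,0,0,0
"""
times, tensor = grid_to_tensor(grid_text, n_rows=2, n_cols=2, value_col="risk")
print(tensor[0])  # [[0.0, None], [None, None]]
```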
Dyna Table

Dyna Table: Traffic state information


• type: state (OD-based)

• The format is: dyna_id, type (= state), time, entity_id, properties.

• For OD-based traffic data, entity_id is the pair of columns [origin_id, destination_id],
and the file extension is .od.

Data.od
dyna_id   type    time                   origin_id   destination_id   flow
0         state   2012-03-01T00:00:00Z   0           1                345
1         state   2012-03-01T00:05:00Z   0           2                277
2         state   2012-03-01T00:10:00Z   0           3                64
3         state   2012-03-01T00:15:00Z   0           4                0
4         state   2012-03-01T00:20:00Z   1           2                0

Data.geo
geo_id   type    coordinates
0        Point   [-118.31828, 34.15497]
1        Point   [-118.23799, 34.11620]
2        Point   [-118.23818, 34.11640]
3        Point   [-118.26772, 34.07248]
4        Point   [-118.26572, 34.07142]

(Figure: OD-based data)
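OD-based records fill one N-by-N matrix per timestamp. A minimal sketch, assuming the caller supplies the geo_id order; OD pairs without a record are filled with 0.0, an assumption that suits counts such as flow but not every property. The function name od_to_tensor is hypothetical.

```python
import csv
import io


def od_to_tensor(od_text, geo_ids, value_col="flow"):
    """Reshape .od CSV text into tensor[time_index][origin_index][destination_index]."""
    index = {gid: i for i, gid in enumerate(geo_ids)}
    n = len(geo_ids)
    frames = {}
    for row in csv.DictReader(io.StringIO(od_text)):
        t = row["time"]
        if t not in frames:
            # assumption: OD pairs without a record are treated as zero flow
            frames[t] = [[0.0] * n for _ in range(n)]
        o, d = index[row["origin_id"]], index[row["destination_id"]]
        frames[t][o][d] = float(row[value_col])
    times = sorted(frames)
    return times, [frames[t] for t in times]


od_text = """dyna_id,type,time,origin_id,destination_id,flow
0,state,2012-03-01T00:00:00Z,0,1,345
1,state,2012-03-01T00:05:00Z,0,2,277
"""
times, tensor = od_to_tensor(od_text, ["0", "1", "2"])
print(tensor[0][0])  # [0.0, 345.0, 0.0]
```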
Dyna Table

Dyna Table: Traffic state information


• type: state (Grid-OD-based)

• The format is: dyna_id, type (= state), time, entity_id, properties.

• For grid-OD-based traffic data, entity_id is the four columns [origin_row_id,
origin_column_id, destination_row_id, destination_column_id], and the file extension is
.gridod.

Data.gridod
dyna_id   type    time                   origin_row_id   origin_column_id   destination_row_id   destination_column_id
0         state   2013-01-01T00:00:00Z   0               1                  2                    1
1         state   2013-01-01T01:00:00Z   0               1                  2                    1
2         state   2013-01-01T02:00:00Z   0               1                  2                    1
3         state   2013-01-01T03:00:00Z   0               1                  2                    1
4         state   2013-01-01T04:00:00Z   0               1                  2                    1

Data.geo
geo_id   type      coordinates   row_id   column_id
0        Polygon   []            0        7
1        Polygon   []            0        8
2        Polygon   []            0        10
3        Polygon   []            0        11
4        Polygon   []            0        12
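One way to work with grid-OD data is to flatten each (row_id, column_id) cell into a single cell number, which turns the records into ordinary OD pairs between numbered cells, matching the OD-based layout above. The row-major convention row_id * n_cols + column_id and the function name gridod_pairs are illustrative assumptions, not necessarily what LibCity does internally.

```python
import csv
import io


def gridod_pairs(gridod_text, n_cols):
    """Yield (time, origin_cell, destination_cell) tuples from .gridod CSV text.

    Each (row_id, column_id) is flattened row-major to row_id * n_cols + column_id.
    """
    for row in csv.DictReader(io.StringIO(gridod_text)):
        origin = int(row["origin_row_id"]) * n_cols + int(row["origin_column_id"])
        dest = int(row["destination_row_id"]) * n_cols + int(row["destination_column_id"])
        yield row["time"], origin, dest


gridod_text = """dyna_id,type,time,origin_row_id,origin_column_id,destination_row_id,destination_column_id
0,state,2013-01-01T00:00:00Z,0,1,2,1
"""
print(list(gridod_pairs(gridod_text, n_cols=3)))  # [('2013-01-01T00:00:00Z', 1, 7)]
```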
Dyna Table

Dyna Table: Traffic state information


• type: trajectory

• GPS point trajectory / Road segment-based trajectory / Check-in trajectory

• The format is: dyna_id, type, time, entity_id, (traj_id), properties.

• The entity_id column holds the usr_id.

• The traj_id column numbers the trajectories of the same user, starting from 0. If a user
has only one trajectory, this column can be left empty.

• Rows should be grouped by <entity_id>; rows with the same <entity_id> are sorted by
<traj_id>, and rows with the same <traj_id> are sorted by <time>.
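The three-level ordering is a lexicographic sort on (entity_id, traj_id, time). This sketch assumes numeric IDs and treats an empty traj_id (a user with a single trajectory) as trajectory 0; both are assumptions for illustration. The sample rows come from the chengdu example on the next slide: row 22 has an earlier time but sorts last because its traj_id is larger. The function name sort_trajectory_rows is hypothetical.

```python
def sort_trajectory_rows(rows):
    """Sort trajectory rows (dicts of strings) by entity_id, then traj_id, then time."""

    def key(r):
        # assumption: an empty traj_id means the user's only trajectory, i.e. 0
        traj = int(r["traj_id"]) if r.get("traj_id") else 0
        # fixed-format ISO-8601 UTC strings sort chronologically
        return (int(r["entity_id"]), traj, r["time"])

    return sorted(rows, key=key)


rows = [
    {"dyna_id": "22", "entity_id": "810", "traj_id": "1", "time": "2014-08-03T18:13:00Z"},
    {"dyna_id": "1", "entity_id": "810", "traj_id": "0", "time": "2014-08-03T18:29:40Z"},
    {"dyna_id": "0", "entity_id": "810", "traj_id": "0", "time": "2014-08-03T18:29:00Z"},
]
print([r["dyna_id"] for r in sort_trajectory_rows(rows)])  # ['0', '1', '22']
```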
Dyna Table

Dyna Table: Traffic state information


• type: trajectory (GPS point trajectory)

• The format is: dyna_id, type, time, entity_id, (traj_id), coordinates, properties.

• The coordinates column holds the position of the GPS point as [longitude, latitude],
following the GeoJSON order.

chengdu.dyna
dyna_id   type         time                   entity_id   traj_id   coordinates                current_state
0         trajectory   2014-08-03T18:29:00Z   810         0         "[104.115353,30.64392]"    1
1         trajectory   2014-08-03T18:29:40Z   810         0         "[104.113091,30.642129]"   1
...       ...          ...                    ...         ...       ...                        ...
21        trajectory   2014-08-03T18:53:23Z   810         0         "[104.076552,30.626844]"   0
22        trajectory   2014-08-03T18:13:00Z   810         1         "[104.106701,30.6916]"     1
...       ...          ...                    ...         ...       ...                        ...
241       trajectory   2014-08-03T18:16:00Z   11919       7         "[104.100816,30.706191]"   1

chengdu.usr
usr_id
810
811
812
813
814

(Figure: GPS point trajectory)
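Given the ordering rule for trajectory rows, GPS points can be grouped into per-user trajectories in one pass. A minimal sketch, assuming the coordinates column is a quoted JSON array in [longitude, latitude] order as in chengdu.dyna, and that an empty traj_id stands for trajectory 0. The function name group_gps_trajectories is hypothetical.

```python
import csv
import io
import json
from collections import defaultdict


def group_gps_trajectories(dyna_text):
    """Group GPS trajectory rows into {(usr_id, traj_id): [(time, lon, lat), ...]}."""
    trajs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(dyna_text)):
        lon, lat = json.loads(row["coordinates"])  # GeoJSON order: longitude first
        traj_id = row["traj_id"] or "0"  # assumption: empty traj_id = single trajectory
        # rows are already ordered by time, so appending keeps each trajectory in order
        trajs[(row["entity_id"], traj_id)].append((row["time"], lon, lat))
    return dict(trajs)


dyna_text = """dyna_id,type,time,entity_id,traj_id,coordinates,current_state
0,trajectory,2014-08-03T18:29:00Z,810,0,"[104.115353,30.64392]",1
1,trajectory,2014-08-03T18:29:40Z,810,0,"[104.113091,30.642129]",1
22,trajectory,2014-08-03T18:13:00Z,810,1,"[104.106701,30.6916]",1
"""
trajs = group_gps_trajectories(dyna_text)
print(sorted(trajs))  # [('810', '0'), ('810', '1')]
```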
Dyna Table

Dyna Table: Traffic state information


• type: trajectory (Road segment-based trajectory)
• The format is: dyna_id, type, time, entity_id, (traj_id), location, properties.
• The location column holds a geo_id that references the Geo table and identifies a road
segment.

Data.usr
usr_id
0
1
2
3
4

Data.dyna
dyna_id   type         time                   entity_id   location
0         trajectory   2009-01-17T20:27:37Z   0           0
1         trajectory   2009-01-17T20:27:38Z   0           1
2         trajectory   2009-01-17T20:27:39Z   0           2
3         trajectory   2009-01-17T20:27:40Z   0           2
4         trajectory   2009-01-17T20:27:41Z   0           3

Data.geo
geo_id   type         coordinates
0        LineString   [[-122.7323, 47.8899], [-122.7321, 47.8903]]
1        LineString   [[-122.7321, 47.8903], [-122.7318, 47.8910]]
2        LineString   [[-122.7318, 47.8910], [-122.7313, 47.8921]]
3        LineString   [[-122.7313, 47.8921], [-122.7307, 47.8933]]
4        LineString   [[-122.7307, 47.8933], [-122.7302, 47.8944]]

(Figure: road segment-based trajectory)
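Because the location column only stores geo_ids, recovering the path geometry means resolving each reference against the Geo table. A minimal sketch, assuming the .geo coordinates have already been parsed into a dict of LineString coordinate lists; consecutive repeats of the same segment (as in rows 2 and 3 of Data.dyna) are collapsed, and endpoints shared by adjacent segments are emitted once. The function name trajectory_to_polyline is hypothetical.

```python
def trajectory_to_polyline(locations, geo):
    """Expand a road-segment trajectory into one coordinate polyline."""
    line = []
    prev = None
    for gid in locations:
        if gid == prev:
            continue  # still on the same road segment
        for point in geo[gid]:
            # skip the endpoint shared with the previous segment
            if not line or line[-1] != point:
                line.append(point)
        prev = gid
    return line


geo = {
    0: [[-122.7323, 47.8899], [-122.7321, 47.8903]],
    1: [[-122.7321, 47.8903], [-122.7318, 47.8910]],
    2: [[-122.7318, 47.8910], [-122.7313, 47.8921]],
    3: [[-122.7313, 47.8921], [-122.7307, 47.8933]],
}
line = trajectory_to_polyline([0, 1, 2, 2, 3], geo)
print(len(line))  # 5
```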
Dyna Table

Dyna Table: Traffic state information


• type: trajectory (Check-in trajectory)

• The format is: dyna_id, type, time, entity_id, (traj_id), location, properties.

• The location column holds a geo_id that references the Geo table and identifies a POI.

Data.usr
usr_id
0
1
2
3
4

Data.dyna
dyna_id   type         time                   entity_id   location
0         trajectory   2009-01-17T20:27:37Z   0           0
1         trajectory   2009-01-17T20:27:38Z   0           1
2         trajectory   2009-01-17T20:27:39Z   0           2
3         trajectory   2009-01-17T20:27:40Z   0           2
4         trajectory   2009-01-17T20:27:41Z   0           3

Data.geo
geo_id   type    coordinates
0        Point   [-122.7323, 47.8899]
1        Point   [-122.7321, 47.8903]
2        Point   [-122.7318, 47.8910]
3        Point   [-122.7313, 47.8921]
4        Point   [-122.7307, 47.8933]

(Figure: check-in trajectory)
Ext Table

Ext Table: Additional auxiliary information


ext_id, time, properties (multiple columns).
• ext_id: The primary key that uniquely identifies a record in the Ext (external data) table.
• time: Time information in the ISO-8601 combined date and time notation, e.g.,
2020-12-07T02:59:46Z.
• properties: The attributes of the record, such as temperature.

Data.ext
ext_id   time                   temperature
0        2012-03-01T00:00:00Z   272.03
1        2012-03-01T00:05:00Z   271.46
2        2012-03-01T00:10:00Z   271.19
3        2012-03-01T00:15:00Z   271.07
4        2012-03-01T00:20:00Z   270.83
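Since Ext records carry a time but no entity, a common use is to attach them to every state record with the same timestamp. This sketch assumes exact timestamp matches (as in the METR_LA and Data.ext examples, which share a 5-minute grid) and leaves unmatched values as None rather than interpolating. The function name attach_ext is hypothetical.

```python
def attach_ext(dyna_rows, ext_rows, ext_cols):
    """Attach .ext columns to .dyna records whose time matches exactly."""
    by_time = {r["time"]: r for r in ext_rows}
    joined = []
    for r in dyna_rows:
        ext = by_time.get(r["time"], {})
        # missing ext entries become None rather than being interpolated
        joined.append({**r, **{c: ext.get(c) for c in ext_cols}})
    return joined


dyna_rows = [
    {"dyna_id": "0", "time": "2012-03-01T00:00:00Z", "entity_id": "773869", "traffic_speed": "64.375"},
    {"dyna_id": "1", "time": "2012-03-01T00:05:00Z", "entity_id": "773869", "traffic_speed": "62.66667"},
]
ext_rows = [
    {"ext_id": "0", "time": "2012-03-01T00:00:00Z", "temperature": "272.03"},
    {"ext_id": "1", "time": "2012-03-01T00:05:00Z", "temperature": "271.46"},
]
joined = attach_ext(dyna_rows, ext_rows, ["temperature"])
print(joined[1]["temperature"])  # 271.46
```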
Data Type Definition

The data type of each column in the dataset must be declared in the config file, which
facilitates subsequent data processing.

Type         Description
geo_id       Discrete IDs from a finite set that exist in the Geo table.
usr_id       Discrete IDs from a finite set that exist in the Usr table.
rel_id       Discrete IDs from a finite set that exist in the Rel table.
time         Time string conforming to the ISO-8601 standard.
coordinate   String conforming to the GeoJSON coordinate representation.
num          Real number.
enum         Enumerated string.
other        Any other value, stored as a string.
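The type definitions can drive a simple per-value check during data cleaning. This is an illustrative validator, not LibCity code: it accepts only the full UTC datetime form used in the examples for time (ISO-8601 allows other forms, e.g. a bare year), requires the coordinate string to decode as a JSON array, and treats enum and other as free-form strings. The function name check_value and the known_ids parameter are hypothetical.

```python
import json
from datetime import datetime


def check_value(value, dtype, known_ids=None):
    """Return True if a raw CSV string conforms to a declared column type.

    known_ids maps "geo_id" / "usr_id" / "rel_id" to sets of valid IDs.
    """
    if dtype in ("geo_id", "usr_id", "rel_id"):
        return known_ids is not None and value in known_ids.get(dtype, set())
    if dtype == "time":
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return True
        except ValueError:
            return False
    if dtype == "coordinate":
        try:
            return isinstance(json.loads(value), list)
        except json.JSONDecodeError:
            return False
    if dtype == "num":
        try:
            float(value)
            return True
        except ValueError:
            return False
    # enum and other are free-form strings in this sketch; unknown types fail
    return dtype in ("enum", "other")


print(check_value("64.375", "num"))  # True
```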
Config File

• The config file supplements the description of the five tables above. It is stored in
JSON format and has six keys: geo, usr, rel, dyna, ext, and info.
• For geo, rel, and dyna:
• Each contains an including_types key whose value is an array listing the type values that
appear in the table. Each listed type then appears as a key of its own, mapping the property
columns under that type to their data types.

• For usr and ext:

• Each contains a properties key that maps the property columns in the table to their data types.

• For info:
• Contains other statistics about the dataset that are needed; its contents differ across
traffic prediction tasks.
Config File

{
  "geo": {
    "including_types": ["Point"],
    "Point": {
      "poi_name": "other"
    }
  },
  "usr": {
    "properties": {
      "user_type": "enum",
      "birth_year": "time",
      "gender": "enum"
    }
  },
  "rel": {
    "including_types": ["geo"],
    "geo": {
      "link_weight": "num"
    }
  },
  "dyna": {
    "including_types": ["state"],
    "state": {
      "entity_id": "geo_id",
      "traffic_speed": "num"
    }
  },
  "ext": {
    "properties": {
      "temperature": "num"
    }
  },
  "info": {
    "time_interval": 300
  }
}
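The two config layouts (per-type keys under including_types for geo, rel, and dyna; a single properties key for usr and ext) can be handled by one lookup. A minimal sketch using part of the example config above; the function name property_types is hypothetical.

```python
import json


def property_types(config, table, type_value=None):
    """Look up declared column types for one atomic file in config.json."""
    section = config[table]
    if "including_types" in section:
        # geo / rel / dyna: one mapping per declared type value
        if type_value not in section["including_types"]:
            raise KeyError(f"{table} has no type {type_value!r}")
        return section[type_value]
    # usr / ext: a single properties mapping
    return section["properties"]


config = json.loads("""{
  "geo": {"including_types": ["Point"], "Point": {"poi_name": "other"}},
  "usr": {"properties": {"user_type": "enum", "birth_year": "time", "gender": "enum"}},
  "dyna": {"including_types": ["state"],
           "state": {"entity_id": "geo_id", "traffic_speed": "num"}},
  "info": {"time_interval": 300}
}""")
print(property_types(config, "dyna", "state")["traffic_speed"])  # num
```

In this example, time_interval is 300, which matches the 5-minute spacing of the METR_LA timestamps if it is read as seconds.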
Application

• We used the atomic file format to integrate 35 traffic datasets into LibCity, an open
library for traffic prediction, resolving inconsistencies in how traffic datasets are stored.

LibCity: https://libcity.ai/
Application
• LibCity has converted 35 open-source datasets covering 22 cities in 11 countries into
the standard atomic file format.
• LibCity has also open-sourced its atomic file conversion scripts, which users can refer to
when converting their own traffic datasets.

(Figures: spatial distribution of the datasets; dataset statistics table)

LibCity Datasets: https://github.com/LibCity/Bigscity-LibCity-Datasets
Thanks for listening.

Jingyuan Wang
Beihang University, Beijing, China
[email protected], http://www.bigcity.ai
