SIDT 2009 International Conference 1

Clustering methods for the automatic design of traffic zones


Guido Gentile¹, Daniele Tiddi¹
¹ DITS - Dipartimento di Idraulica Trasporti e Strade, Sapienza Università di Roma

1. Introduction

1.1. Motivation of this research


No transport study can disregard the zoning phase, since land discretization implies a
relevant approximation in the demand model and constitutes the earliest assumption
in the simulation process: all the activities carried out in a traffic zone are
concentrated in one single point, the centroid node. Therefore any trip starts at an
origin centroid and ends at a destination centroid. Moreover, zoning is performed
manually in most cases, so that the suitability of the result heavily relies on the
modeller’s skills and experience. For these reasons the design of traffic zones is
considered an expensive and delicate phase of transport planning.
When approaching a new project, practitioners therefore avoid, if possible, deep
modifications to the pattern of an existing land discretization and usually limit
the changes to aggregation or disaggregation of the current traffic zones, because
this also allows an easier comparison with previous studies and available data.
Indeed, many results are mapped to traffic zones: o-d matrices expressing demand
flows and skim matrices expressing levels of service, as well as their aggregations,
i.e. trip generation and attraction, and active and passive accessibility, respectively.
However, transport modelling is nowadays also applied in fields other than urban
planning, in particular traffic management and infomobility. In these cases, a model
often has to be developed from scratch and with few resources, so that the
modeller has to rely on standardized data and procedures. To this end, in this paper a
clustering algorithm for the automatic design of traffic zones is proposed. The
method will be applied to a real case with the aim of analyzing the sensitivity of the
result to the parameters of the heuristic and to the weights of the problem objectives.

2 Milan 29-30 June 2009

1.2. Literature review


Studies on geographic clustering have been carried out in many different fields, such
as chemistry, economics, medicine and, of course, transportation. Barnard (1996)
and Downs (2001) review the existing algorithms and analyze the strengths and
weaknesses of each specific approach. Recent works integrate the classical clustering
methods with the latest tools of operations research; e.g. Corsini et al. (2005) adopt
fuzzy logic to set belonging relations, while Laszlo and Mukherjee (2007) propose a
genetic algorithm to determine the cluster sets.
However, the k-means method still appears to be one of the most used paradigms, owing
to its fast computation and ease of implementation. Unfortunately, that approach
does not easily accommodate contiguity constraints, which are instead essential to
the problem addressed in this paper, and in general few contributions are available
in this perspective. Among the few, Murtagh (1995) tackles a clustering problem with
contiguity constraints on the pixels of a grid using a hierarchical agglomerative
algorithm. In this paper a similar approach is proposed, where the 8-neighbour pixels
are replaced by a more general cell adjacency matrix.

2. The zoning process

2.1. Mathematical formulation


Let the study area be partitioned into a set $U$ of $m$ two-dimensional cells overlapping
only at borders. The generic cell $u \in U$ is characterized by a list of $l$ attributes that
numerically describe its relevant features, expressed by a vector $\{X_1^u, \dots, X_k^u, \dots, X_l^u\}$.
From a geographical point of view each cell $u$ is associated with a point of
coordinates $(\rho_u, \lambda_u)$, which typically coincides with the mass centre of its area, and
with a shape described by $r(u)$ polylines; the generic $h$-th polyline is a sequence of
$q(u,h)$ points $\{p_1^{u,h}, \dots, p_i^{u,h}, \dots, p_{q(u,h)}^{u,h}\}$, where $p_i^{u,h}$ has coordinates $(\rho_i^{u,h}, \lambda_i^{u,h})$.
One last segment $p_{q(u,h)}^{u,h}$ – $p_{q(u,h)+1}^{u,h}$, where $p_{q(u,h)+1}^{u,h} = p_1^{u,h}$, implicitly closes the
area, defined by convention as the space on the right of the polyline. A simply
connected polygon is described by a single sequence of points listing its
perimeter vertices clockwise, while the additional polylines are used to describe holed and
disconnected shapes. We also consider the effect of physical barriers, which are
represented here by a set $D$ of $s$ dividers. Each element $d \in D$ is associated with a
polyline $\{p_1^d, \dots, p_i^d, \dots, p_{q(d)}^d\}$ that denies all adjacencies fully covered by it.
Starting from the cell polylines, it is possible to obtain a more useful representation
of the partition topology, given by the adjacency matrix: a symmetric Boolean $m \times m$
matrix whose generic element $A_{u,v}$ is 0 if cell $u$ is not adjacent to cell $v$. Otherwise,
two degrees of adjacency can be distinguished: weak adjacency, if their
polylines have at least one vertex in common, in which case $A_{u,v}$ is 2; and strong
adjacency, if they share at least one edge, in which case $A_{u,v}$ is 1. We will consider
only the latter.
To determine matrix A we can apply the following algorithm:

A = 0
for u = 1, …, m and v = u+1, …, m
    for h = 1, …, r(u) and k = 1, …, r(v)
        if obb(p^{u,h}, p^{v,k}) then
            for i = 1, …, q(u,h) and j = 1, …, q(v,k)
                if ppd(p_i^{u,h}, p_{j+1}^{v,k}) < ε and ppd(p_{i+1}^{u,h}, p_j^{v,k}) < ε then
                    a = 0
                    for d = 1, …, s
                        if obb(p^d, (0.5 ⋅ p_{i+1}^{u,h} + 0.5 ⋅ p_j^{v,k}) – (0.5 ⋅ p_i^{u,h} + 0.5 ⋅ p_{j+1}^{v,k})) then
                            b1 = 0, b2 = 0
                            for x = 1, …, q(d) − 1
                                if psd(0.5 ⋅ p_i^{u,h} + 0.5 ⋅ p_{j+1}^{v,k}, p_x^d, p_{x+1}^d) < ε then b1 = 1
                                if psd(0.5 ⋅ p_{i+1}^{u,h} + 0.5 ⋅ p_j^{v,k}, p_x^d, p_{x+1}^d) < ε then b2 = 1
                            if b1 = 1 and b2 = 1 then a = 1, exit for
                    if a = 0 then A_{u,v} = 1
    A_{v,u} = A_{u,v}

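To speed up the geographic analysis we have introduced the function obb($p^1$, $p^2$), which
returns TRUE if the bounding boxes of the two polylines $p^1$ and $p^2$ overlap:
$$\mathrm{obb}(p^1, p^2) = (\rho_{\max}^1 \ge \rho_{\min}^2) \wedge (\rho_{\max}^2 \ge \rho_{\min}^1) \wedge (\lambda_{\max}^1 \ge \lambda_{\min}^2) \wedge (\lambda_{\max}^2 \ge \lambda_{\min}^1)$$
where $\rho_{\min}^1$ and $\rho_{\max}^1$ denote the minimum and maximum $\rho$ coordinate among the points of
polyline $p^1$, and similarly for $\lambda$ and for $p^2$. Function ppd($p_1$, $p_2$) expresses the distance between
point $p_1$ and point $p_2$, while function psd($p_1$, $p_2$, $p_3$) expresses the distance between
point $p_1$ and the segment $p_2$ – $p_3$.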
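
As an illustration, the three helper functions and the shared-edge test at the core of the algorithm can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: points are (ρ, λ) tuples in planar coordinates, polylines are lists of such tuples, distances are Euclidean, and the function share_edge and the tolerance eps are hypothetical names introduced here. The divider test is omitted for brevity.

```python
import math

def obb(p1, p2):
    """True if the bounding boxes of polylines p1 and p2 overlap (or touch)."""
    rho1, lam1 = [p[0] for p in p1], [p[1] for p in p1]
    rho2, lam2 = [p[0] for p in p2], [p[1] for p in p2]
    return (max(rho1) >= min(rho2) and max(rho2) >= min(rho1)
            and max(lam1) >= min(lam2) and max(lam2) >= min(lam1))

def ppd(p1, p2):
    """Euclidean distance between points p1 and p2."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def psd(p1, p2, p3):
    """Euclidean distance between point p1 and the segment p2 - p3."""
    dx, dy = p3[0] - p2[0], p3[1] - p2[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:          # degenerate segment: a single point
        return ppd(p1, p2)
    # Projection parameter of p1 on the line, clamped to the segment.
    t = ((p1[0] - p2[0]) * dx + (p1[1] - p2[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return ppd(p1, (p2[0] + t * dx, p2[1] + t * dy))

def share_edge(ring_u, ring_v, eps=1e-9):
    """True if two implicitly closed clockwise rings have an edge in common.

    Both rings turn clockwise, so a common edge is traversed in opposite
    directions: start of one matches end of the other and vice versa.
    """
    qu, qv = len(ring_u), len(ring_v)
    if not obb(ring_u, ring_v):   # cheap rejection before the double loop
        return False
    for i in range(qu):
        a, b = ring_u[i], ring_u[(i + 1) % qu]
        for j in range(qv):
            c, d = ring_v[j], ring_v[(j + 1) % qv]
            if ppd(a, d) < eps and ppd(b, c) < eps:
                return True
    return False
```

Two unit squares placed side by side would be reported as strongly adjacent by share_edge, whereas two squares touching only at a corner would not, which mirrors the distinction between strong and weak adjacency made in the text.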


Our clustering problem consists in defining an $m \times n$ matrix $B$, where $n$ is the number
of zones included in the set $Z$, with $B_{u,z} = 1$ if cell $u$ belongs to zone $z$ and $B_{u,z} = 0$ otherwise:

$$\max \sum_o \beta_o \cdot f_o(B, X)$$
s.t.
$$\sum_{z=1}^{n} B_{u,z} = 1, \quad \forall u \in U$$
$$\sum_{\chi \in U(z)^*} \sum_{\pi \in P(\chi)} A_{u,\pi(1)} \cdot A_{v,\pi(|\pi|)} \cdot \prod_{i=1}^{|\pi|-1} A_{\pi(i),\pi(i+1)} \ge 1, \quad \forall z \in Z, \ \forall u \in U(z), \ \forall v \in U(z)$$

where $f_o(B, X)$ is the $o$-th objective and $\beta_o$ is the corresponding weight, $U(z)$ is the set
of cells grouped into zone $z \in Z$, $U(z)^*$ is the power set of $U(z)$, given by all the
possible combinations of its cells, and $P(\chi)$ is the set of all possible permutations of
the combination $\chi$. The generic permutation $\pi = \{\pi(1), \dots, \pi(i), \dots, \pi(|\pi|)\}$ represents a
sequence of cells.
The first constraint implies that every cell must belong to one and only one zone. The
second constraint says that every zone $z$ must be connected. In this respect, a specific
sequence $\pi$ identifies a "path" of adjacent cells connecting two cells $u$ and $v$ if
$A_{u,\pi(1)} \cdot A_{v,\pi(|\pi|)} \cdot \prod_{i=1}^{|\pi|-1} A_{\pi(i),\pi(i+1)} = 1$. On this basis, the above constraint requires
the existence of a path connecting any two cells grouped into the same zone; this path
must consist of a sequence of adjacent cells all belonging to the zone itself.
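
In practice, the connectivity requirement can be verified without enumerating subsets and permutations: a zone is connected exactly when a graph search started from any of its cells, and restricted to cells of the zone, reaches all the others. The following Python sketch illustrates this with a breadth-first search. It assumes that the adjacency matrix is stored as a list of lists with entries 0 or 1 and that cells are numbered from 0; the function name is_connected is hypothetical.

```python
from collections import deque

def is_connected(zone_cells, A):
    """True if the cells of a zone form a connected set under adjacency A.

    zone_cells: iterable of cell indices grouped into the zone.
    A: symmetric adjacency matrix, A[u][v] == 1 if u and v share an edge.
    """
    cells = set(zone_cells)
    if not cells:
        return True
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        # Only step to cells of the same zone, as the constraint requires.
        for v in cells:
            if v not in seen and A[u][v] == 1:
                seen.add(v)
                queue.append(v)
    return seen == cells
```

For a row of four cells where each cell is adjacent only to its neighbours, the group {0, 1, 2} passes the check, while {0, 2} does not, because the only path between the two cells leaves the zone.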

2.2. Criteria and objectives

The zoning process entails two conflicting objectives: to minimize the topological
displacement of trip terminals, i.e. to limit the approximations in the representation
of mobility; and to minimize the number of traffic zones, i.e. to limit the resources
involved in data collection and computation (the run time and memory storage
required by traffic simulation grow more than linearly with the number of centroids).
An additional criterion in the design of zone borders is to let them coincide with
administrative boundaries, because this facilitates data matching. However, physical
barriers (rivers, parks, infrastructures, etc.) hinder diffuse connection, so that the
transport network topology may lead to an anisotropic accessibility pattern, where
long paths are required to travel between two points that are instead geographically
close to each other. Respecting these shapes will also enhance the interpretation and
the visualization of results.
The design of traffic zones should also be strictly related to the definition of the
relevant transport network and to the specific scope of the simulation, but this
question is of less importance when street and rail layers are retrieved from
commercial GIS databases that include all the existing infrastructures and when the
model is not aimed at a localized intervention.
In most cases the traffic zones result from a territorial aggregation of adjacent
administrative units (ranging from entire municipalities down to census cells,
depending on the application) to allow the association with statistical attributes
(residents, workers, etc.), thus calling for contiguity constraints.
In the following we formalize four possible objectives.

1) demand displacement, i.e. the additional distance travelled on the network due to
concentrating all trip terminals at the zone centroids
$$f_1(B, X) = -\sum_{z \in Z} \sum_{u \in U(z)} X_k^u \cdot \mathrm{imp}\big((\rho_u, \lambda_u), (\rho_z^k, \lambda_z^k)\big)$$
$$(\rho_z^k, \lambda_z^k) = \frac{\sum_{u \in U(z)} X_k^u \cdot (\rho_u, \lambda_u)}{\sum_{u \in U(z)} X_k^u}$$
where $(\rho_z^k, \lambda_z^k)$ is the centre of mass of zone $z$ with respect to attribute $k$, $k$ is a cell
attribute (or combination of attributes) related to travel demand (e.g. residents plus
workers), and imp($p_1$, $p_2$) expresses the average undirected impedance on the road
network between the geographic points $p_1$ and $p_2$

2) zone shape regularity, i.e. the degree of similarity of each zone to a circle (between 0 and 1)
$$f_2(B, X) = \sum_{z \in Z} \frac{A_z}{\pi \cdot R_z^2}$$
$$A_z = \sum_{u \in U(z)} A_u$$
$$R_z = \max\{\mathrm{ppd}(p_i^{u,h}, (\rho_z^k, \lambda_z^k)) : u \in U(z);\ h = 1, \dots, r(u);\ i = 1, \dots, q(u,h)\}$$
where $A_z$ is the area of the zone, $A_u$ is the area of cell $u$, taken as a particular attribute,
and $R_z$ is the radius of the zone

3) intra-zone homogeneity, i.e. the standard deviation among the cells grouped into the
same zone, with reference to a proper density attribute $k$, or combination of attributes
(e.g. density of residents, density of workers)
$$f_3(B, X) = \sum_{z \in Z} \left( \frac{\sum_{u \in U(z)} (X_k^u - M_z^k)^2}{\sum_{u \in U(z)} 1} \right)^{0.5}$$
$$T_z^k = \sum_{u \in U(z)} X_k^u, \quad M_z^k = \frac{T_z^k}{\sum_{u \in U(z)} 1}$$

4) inter-zone homogeneity, i.e. the standard deviation among the zone totals of some
proper mass attribute $k$, or combination of attributes (e.g. number of residents,
number of workers)
$$f_4(B, X) = \left( \frac{\sum_{z \in Z} (T_z^k - M^k)^2}{\sum_{z \in Z} 1} \right)^{0.5}$$
$$M^k = \frac{\sum_{z \in Z} T_z^k}{\sum_{z \in Z} 1}$$
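
To make the definitions above concrete, the following Python sketch evaluates the weighted centre of mass and the two homogeneity measures $f_3$ and $f_4$ for a given assignment of cells to zones. It is only an illustration under stated assumptions: the assignment is stored as a list zone_of giving the zone label of each cell (rather than as the matrix $B$), every zone has at least one cell and a positive total of the weighting attribute, and all function names are hypothetical. How the resulting values enter the weighted sum, including the sign of the weights, is left to the caller.

```python
import math

def cells_by_zone(zone_of):
    """Group cell indices by zone label: {z: [u, ...]}."""
    zones = {}
    for u, z in enumerate(zone_of):
        zones.setdefault(z, []).append(u)
    return zones

def mass_centre(cells, coords, x):
    """Centre of mass of a group of cells, weighted by attribute x."""
    total = sum(x[u] for u in cells)
    rho = sum(x[u] * coords[u][0] for u in cells) / total
    lam = sum(x[u] * coords[u][1] for u in cells) / total
    return rho, lam

def f3_intra_zone(zone_of, density):
    """Sum over zones of the standard deviation of a density attribute."""
    value = 0.0
    for cells in cells_by_zone(zone_of).values():
        mean = sum(density[u] for u in cells) / len(cells)
        var = sum((density[u] - mean) ** 2 for u in cells) / len(cells)
        value += math.sqrt(var)
    return value

def f4_inter_zone(zone_of, mass):
    """Standard deviation among the zone totals of a mass attribute."""
    totals = [sum(mass[u] for u in cells)
              for cells in cells_by_zone(zone_of).values()]
    mean = sum(totals) / len(totals)
    return math.sqrt(sum((t - mean) ** 2 for t in totals) / len(totals))
```

With four cells of mass 10, 30, 20 and 20, grouping them as {0, 1} and {2, 3} gives two zones of total 40 each, so the inter-zone measure is zero, while grouping them as {0} and {1, 2, 3} gives totals 10 and 70 and a standard deviation of 30.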


2.3. Algorithm

The proposed procedure takes as input geographical information data in a standard form,
such as the ISTAT database, including the shapefile of the cells. On this basis, we can
build the attribute vector $X$ and the adjacency matrix $A$.
Then an agglomerative hierarchical clustering process is adopted. Starting from $m$
different clusters, each one containing a single cell, at every step two adjacent
clusters are merged, according to the best improvement of the objective function.
This way, at each step we have one less cluster; when the desired number $n$ of zones
is reached, the procedure is stopped. The presence of multiple possible objectives
introduces as many control parameters, whose effect on the result can be explored
through a sensitivity analysis.
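
The merging loop described in this section can be sketched in Python as follows. This is a simplified illustration rather than the authors' code: the adjacency matrix is a list of lists with entries 0 or 1, the objective is passed as a function of the whole partition and is re-evaluated from scratch for every candidate merge, and the names agglomerate, adjacent and balance_objective are hypothetical. The example objective is the negative standard deviation of the zone totals, i.e. it assumes that inter-zone homogeneity enters the maximization with a negative sign.

```python
import math
from itertools import combinations

def adjacent(c1, c2, A):
    """True if at least one cell of c1 is adjacent to a cell of c2."""
    return any(A[u][v] == 1 for u in c1 for v in c2)

def agglomerate(A, n, objective):
    """Greedy agglomerative clustering with contiguity constraints.

    Starts from one cluster per cell and repeatedly merges the pair of
    adjacent clusters that yields the highest objective value, until n
    clusters remain or no adjacent pair is left.
    """
    clusters = [{u} for u in range(len(A))]
    while len(clusters) > n:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            if not adjacent(clusters[a], clusters[b], A):
                continue  # merging would break contiguity
            trial = [c for i, c in enumerate(clusters) if i not in (a, b)]
            trial.append(clusters[a] | clusters[b])
            value = objective(trial)
            if best is None or value > best[0]:
                best = (value, trial)
        if best is None:
            break  # remaining clusters are mutually non-adjacent
        clusters = best[1]
    return clusters

def balance_objective(mass):
    """Example objective: minus the standard deviation of cluster totals."""
    def objective(clusters):
        totals = [sum(mass[u] for u in c) for c in clusters]
        mean = sum(totals) / len(totals)
        return -math.sqrt(sum((t - mean) ** 2 for t in totals) / len(totals))
    return objective
```

Since only adjacent clusters are ever merged, every zone produced this way is connected by construction. Re-evaluating the objective on the whole partition keeps the sketch short, but for a large number of cells an incremental update of the objective after each merge would likely be needed.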

References
Barnard J.M. (1996) Agglomerative Hierarchical Clustering Package from Barnard Chemical
Information, Ltd. Presented at the Daylight EUROMUG Meeting, Basel, Switzerland.
Downs G. (2001) Clustering in Chemistry. Presented at the MathFIT Workshop, Belfast, UK.
Corsini P., Lazzerini B., Marcelloni F. (2005) A new fuzzy relational clustering algorithm based on the
fuzzy C-means algorithm. Soft Computing 9, 439-447.
Laszlo M., Mukherjee S. (2007) A genetic algorithm that exchanges neighbouring centers for k-means
clustering. Pattern Recognition Letters 28, 2359-2366.
Murtagh F. (1995) Interpreting the Kohonen self-organizing feature map using contiguity-constrained
clustering. Pattern Recognition Letters 16, 399-408.
