56 ZoneClustering (SIDT2009)
1. Introduction
disconnected shapes. We also consider the effect of physical barriers, which are
represented here by a set $D$ of $s$ dividers. Each element $d \in D$ is associated with a
polyline $\{p_1^d, \dots, p_i^d, \dots, p_{q(d)}^d\}$ that cancels every adjacency it fully covers.
Starting from the cell polylines, it is possible to obtain a more useful representation
of the partition topology, given by the adjacency matrix: a symmetric $m \times m$
matrix whose generic element $A_{u,v}$ is 0 if cell $u$ is not adjacent to cell $v$. Otherwise,
two degrees of adjacency can be distinguished: weak adjacency, if the two
polylines have at least one vertex in common, in which case $A_{u,v} = 2$; and strong
adjacency, if they share at least one edge, in which case $A_{u,v} = 1$. We will consider
only the latter, so that $A$ is Boolean.
To determine the matrix $A$ we can apply the following algorithm:
$A = 0$
for $u = 1, \dots, m$ and $v = u+1, \dots, m$
    for $h = 1, \dots, r(u)$ and $k = 1, \dots, r(v)$
        if obb($p^{u,h}$, $p^{v,k}$) then
            for $i = 1, \dots, q(u,h)$ and $j = 1, \dots, q(v,k)$
                if ppd($p_i^{u,h}$, $p_{j+1}^{v,k}$) < $\varepsilon$ and ppd($p_{i+1}^{u,h}$, $p_j^{v,k}$) < $\varepsilon$ then
                    $a = 0$
                    for $d = 1, \dots, s$
                        if obb($p^d$, $(0.5\,p_{i+1}^{u,h} + 0.5\,p_j^{v,k})$ – $(0.5\,p_i^{u,h} + 0.5\,p_{j+1}^{v,k})$) then
                            $b_1 = 0$, $b_2 = 0$
                            for $x = 1, \dots, q(d) - 1$
                                if psd($0.5\,p_i^{u,h} + 0.5\,p_{j+1}^{v,k}$, $p_x^d$, $p_{x+1}^d$) < $\varepsilon$ then $b_1 = 1$
                                if psd($0.5\,p_{i+1}^{u,h} + 0.5\,p_j^{v,k}$, $p_x^d$, $p_{x+1}^d$) < $\varepsilon$ then $b_2 = 1$
                            if $b_1 = 1$ and $b_2 = 1$ then $a = 1$, exit for
                    if $a = 0$ then $A_{u,v} = 1$
    $A_{v,u} = A_{u,v}$
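To speed up the geographic analysis we have introduced the function obb($p_1$, $p_2$), which
returns TRUE if the bounding boxes of the two polylines $p_1$ and $p_2$ overlap:
$$\mathrm{obb}(p_1, p_2) = \rho_{\max}^1 \ge \rho_{\min}^2 \ \text{and}\ \rho_{\max}^2 \ge \rho_{\min}^1 \ \text{and}\ \lambda_{\max}^1 \ge \lambda_{\min}^2 \ \text{and}\ \lambda_{\max}^2 \ge \lambda_{\min}^1$$
where $\rho$ and $\lambda$ are the two geographic coordinates, and the subscripts min and max
denote their extreme values over the vertices of the polyline indicated by the superscript.
Function ppd($p_1$, $p_2$) expresses the distance between point $p_1$ and point $p_2$, while
function psd($p_1$, $p_2$, $p_3$) expresses the distance between point $p_1$ and the segment $p_2$ – $p_3$.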
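As an illustration, the adjacency procedure above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the data layout (each cell as a list of closed rings with a repeated first vertex, each divider as a list of points), the helper names `bbox`, `midpoint`, `near_polyline` and `covered`, and the tolerance value are all hypothetical. As in the pseudocode, an edge of cell u matches an edge of cell v only when the two are traversed in opposite directions, which holds when all rings share the same orientation.

```python
from itertools import combinations
from math import hypot

EPS = 1e-6  # tolerance epsilon; this value is an assumption, not taken from the paper


def bbox(points):
    """Bounding box (xmin, xmax, ymin, ymax) of a sequence of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), max(xs), min(ys), max(ys)


def obb(p1, p2):
    """True if the bounding boxes of the two point sequences overlap."""
    a = bbox(p1)
    b = bbox(p2)
    return a[1] >= b[0] and b[1] >= a[0] and a[3] >= b[2] and b[3] >= a[2]


def ppd(p1, p2):
    """Point-to-point distance."""
    return hypot(p1[0] - p2[0], p1[1] - p2[1])


def psd(p, a, b):
    """Distance between point p and the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    if length2 == 0.0:
        return ppd(p, a)
    # Projection parameter of p on the line through a and b, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2
    t = max(0.0, min(1.0, t))
    return ppd(p, (a[0] + t * dx, a[1] + t * dy))


def midpoint(p1, p2):
    return (0.5 * (p1[0] + p2[0]), 0.5 * (p1[1] + p2[1]))


def near_polyline(p, line, eps):
    """True if p lies within eps of at least one segment of the polyline."""
    return any(psd(p, line[x], line[x + 1]) < eps for x in range(len(line) - 1))


def covered(e1, e2, dividers, eps):
    """True if a single divider passes within eps of both endpoints of the shared edge."""
    return any(
        obb(d, [e1, e2]) and near_polyline(e1, d, eps) and near_polyline(e2, d, eps)
        for d in dividers
    )


def adjacency(cells, dividers=(), eps=EPS):
    """Strong-adjacency matrix; cells[u] is a list of closed rings of (x, y) points."""
    m = len(cells)
    A = [[0] * m for _ in range(m)]
    for u, v in combinations(range(m), 2):
        for ru in cells[u]:
            for rv in cells[v]:
                if not obb(ru, rv):
                    continue  # cheap rejection before comparing edges
                for i in range(len(ru) - 1):
                    for j in range(len(rv) - 1):
                        if ppd(ru[i], rv[j + 1]) < eps and ppd(ru[i + 1], rv[j]) < eps:
                            e1 = midpoint(ru[i], rv[j + 1])
                            e2 = midpoint(ru[i + 1], rv[j])
                            if not covered(e1, e2, dividers, eps):
                                A[u][v] = A[v][u] = 1
    return A
```

For two unit squares sharing the edge at x = 1, the sketch reports a strong adjacency unless a divider runs along the whole shared edge; a square touching another only at a corner is not reported, since only shared edges are considered.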
Our clustering problem consists in defining an $m \times n$ matrix $B$, where $n$ is the number
of zones included in the set $Z$, and $B_{u,z} = 1$ if cell $u$ belongs to zone $z$, 0 otherwise:
$$\max \sum_o \beta_o \cdot f_o(B, X)$$
s.t.
$$\sum_{z=1}^{n} B_{u,z} = 1, \quad \forall u \in U$$
$$\sum_{\chi \in U(z)^*} \sum_{\pi \in P(\chi)} A_{u,\pi(1)} \cdot A_{v,\pi(|\pi|)} \cdot \prod_{i=1}^{|\pi|-1} A_{\pi(i),\pi(i+1)} \ge 1, \quad \forall z \in Z,\ \forall u \in U(z),\ \forall v \in U(z)$$
where $f_o(B, X)$ is the $o$-th objective and $\beta_o$ is the corresponding weight, $U(z)$ is the set
of cells grouped into zone $z \in Z$, $U(z)^*$ is the power set of $U(z)$, i.e. all the
possible combinations of its cells, and $P(\chi)$ is the set of all possible permutations of
the combination $\chi$. The generic permutation $\pi = \{\pi(1), \dots, \pi(i), \dots, \pi(|\pi|)\}$ represents a
sequence of cells.
The first constraint implies that every cell must belong to one and only one zone. The
second constraint says that every zone $z$ must be connected. In this respect, a specific
sequence $\pi$ identifies a “path” of adjacent cells connecting two cells $u$ and $v$ if
$$A_{u,\pi(1)} \cdot A_{v,\pi(|\pi|)} \cdot \prod_{i=1}^{|\pi|-1} A_{\pi(i),\pi(i+1)} = 1.$$
On this basis, the above constraint requires
the existence of a path connecting any two cells grouped into the same zone; this path
must consist of a sequence of adjacent cells that all belong to the zone itself.
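Enumerating every subset and permutation, as the connectivity constraint is written, grows exponentially with the zone size. In practice the property it is meant to express (the cells of each zone form a single connected component of the adjacency graph) can be checked with a breadth-first search. The sketch below swaps in that technique; the function names and the list-of-lists layout of $A$ and $B$ are assumptions made for illustration.

```python
from collections import deque


def zone_members(B):
    """Check the assignment constraint and return the cell list U(z) of each zone."""
    n = len(B[0])
    zones = [[] for _ in range(n)]
    for u, row in enumerate(B):
        if any(b not in (0, 1) for b in row) or sum(row) != 1:
            raise ValueError(f"cell {u} is not assigned to exactly one zone")
        zones[row.index(1)].append(u)
    return zones


def is_connected(cells, A):
    """True if the given cells are mutually reachable through adjacent cells of the same set."""
    if not cells:
        return True  # an empty zone is treated as trivially connected (assumption)
    allowed = set(cells)
    seen = {cells[0]}
    queue = deque([cells[0]])
    while queue:
        u = queue.popleft()
        for v in allowed:
            if v not in seen and A[u][v] == 1:
                seen.add(v)
                queue.append(v)
    return seen == allowed


def is_feasible(A, B):
    """True if B satisfies both the assignment and the connectivity requirement."""
    return all(is_connected(cells, A) for cells in zone_members(B))
```

With four cells where 0, 1 and 2 form a chain and 3 is isolated, grouping {0, 1, 2} and {3} is feasible, whereas grouping {0, 2} and {1, 3} is not, because cells 0 and 2 are linked only through cell 1, which lies in another zone.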
2.2. Criteria and objectives
The zoning process entails two conflicting objectives: to minimize the topological
displacement of trip terminals, i.e. to limit the approximations in the representation
of mobility; and to minimize the number of traffic zones, i.e. to limit the resources
involved in data collection and computation (the run time and memory
required by traffic simulation grow more than linearly with the number of centroids).
An additional criterion in the design of zone borders is to make them coincide with
administrative boundaries, because this facilitates data matching. However, physical
barriers (rivers, parks, infrastructure, etc.) hinder diffuse connection, so that the
transport network topology may lead to an anisotropic accessibility pattern, where
long paths are required to travel between two points that are geographically
close to each other. Zone shapes that respect these barriers also make the results
easier to interpret and visualize.
The design of traffic zones should also be strictly related to the definition of the
relevant transport network and to the specific scope of the simulation, but this
question is of less importance when street and rail layers are retrieved from
commercial GIS databases that include all the existing infrastructure and when the
model is not aimed at a localized intervention.
In most cases the traffic zones result from a territorial aggregation of adjacent
administrative units (ranging from entire municipalities down to census cells,
depending on the application) to allow the association with statistical attributes
(residents, workers, etc.), thus calling for contiguity constraints.
In the following we formalize four possible objectives.
1) demand displacement, i.e. the additional distance travelled on the network due to
concentrating all trip terminals at the zone centroids
$$f_1(B, X) = -\sum_{z \in Z} \sum_{u \in U(z)} X_u^k \cdot \mathrm{imp}((\rho_u, \lambda_u), (\rho_z^k, \lambda_z^k))$$
$$(\rho_z^k, \lambda_z^k) = \sum_{u \in U(z)} X_u^k \cdot (\rho_u, \lambda_u) \Big/ \sum_{u \in U(z)} X_u^k \quad \text{(center of mass with respect to attribute } k\text{)}$$
where $k$ is a cell attribute (or combination of attributes) related to travel demand (e.g.
residents plus workers), and imp($p_1$, $p_2$) expresses the average undirected impedance
on the road network between the geographic points $p_1$ and $p_2$
2) zone shape regularity, i.e. the degree of similarity to a circle (between 0 and 1)
$$f_2(B, X) = \sum_{z \in Z} A_z / (3.14 \cdot R_z^2)$$
$$A_z = \sum_{u \in U(z)} A_u \quad \text{(area of the zone, where } A_u \text{ is the area of cell } u \text{, as a particular attribute)}$$
$$R_z = \max\{\mathrm{ppd}(p_i^{u,h}, (\rho_z^k, \lambda_z^k)) : u \in U(z);\ h = 1, \dots, r(u);\ i = 1, \dots, q(u,h)\} \quad \text{(radius of the zone)}$$
3) intra-zone homogeneity, i.e. the standard deviation among the cells grouped into the
same zone, with reference to a proper density attribute $k$, or combination of attributes
(e.g. density of residents, density of workers)
$$f_3(B, X) = \sum_{z \in Z} \left( \sum_{u \in U(z)} (X_u^k - M_z^k)^2 \Big/ \sum_{u \in U(z)} 1 \right)^{0.5}$$
$$T_z^k = \sum_{u \in U(z)} X_u^k, \qquad M_z^k = T_z^k \Big/ \sum_{u \in U(z)} 1$$
4) inter-zone homogeneity, i.e. the standard deviation among the zone totals of some
proper mass attribute $k$, or combination of attributes (e.g. number of residents,
number of workers)
$$f_4(B, X) = \left( \sum_{z \in Z} (T_z^k - M^k)^2 \Big/ \sum_{z \in Z} 1 \right)^{0.5}$$
$$M^k = \sum_{z \in Z} T_z^k \Big/ \sum_{z \in Z} 1$$
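The four objectives can be illustrated with a short Python sketch. It rests on several assumptions that are not in the text: zones are given as lists of cell indices, `coords[u]` holds $(\rho_u, \lambda_u)$, `vertices[u]` is a flat list of all polyline vertices of cell $u$, and the network impedance imp is replaced by the straight-line distance as a stand-in, since computing it would require the road graph. The sketch uses `math.pi` where the formula for $f_2$ writes 3.14, and it assumes every zone has a positive total mass.

```python
from math import hypot, pi, sqrt


def ppd(p1, p2):
    """Point-to-point distance."""
    return hypot(p1[0] - p2[0], p1[1] - p2[1])


def center_of_mass(zone, coords, mass):
    """Mass-weighted centroid of a zone; the total mass is assumed positive."""
    total = sum(mass[u] for u in zone)
    rho = sum(mass[u] * coords[u][0] for u in zone) / total
    lam = sum(mass[u] * coords[u][1] for u in zone) / total
    return (rho, lam)


def f1(zones, coords, mass, imp=ppd):
    """Demand displacement (negated); imp defaults to Euclidean distance, a stand-in."""
    total = 0.0
    for zone in zones:
        c = center_of_mass(zone, coords, mass)
        total += sum(mass[u] * imp(coords[u], c) for u in zone)
    return -total


def f2(zones, coords, mass, area, vertices):
    """Shape regularity: zone area over the area of the circle of radius R_z."""
    total = 0.0
    for zone in zones:
        c = center_of_mass(zone, coords, mass)
        radius = max(ppd(p, c) for u in zone for p in vertices[u])
        total += sum(area[u] for u in zone) / (pi * radius ** 2)
    return total


def std(values):
    """Population standard deviation (divides by the number of values)."""
    mean = sum(values) / len(values)
    return sqrt(sum((x - mean) ** 2 for x in values) / len(values))


def f3(zones, density):
    """Intra-zone homogeneity: sum over zones of the deviation among their cells."""
    return sum(std([density[u] for u in zone]) for zone in zones)


def f4(zones, mass):
    """Inter-zone homogeneity: deviation among the zone totals."""
    return std([sum(mass[u] for u in zone) for zone in zones])
```

Since $f_3$ and $f_4$ are dispersion measures while the problem is stated as a maximization, a caller combining these functions would presumably give them negative weights; that choice is left to the weights $\beta_o$ and is not fixed by the sketch.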
2.3. Algorithm
The proposed procedure takes as input geographic data in a standard form,
such as the ISTAT database, including the shapefile of the cells. On this basis, we can
build the attribute vector $X$ and the adjacency matrix $A$.
Then an agglomerative hierarchical clustering process is adopted. Starting from $m$
different clusters, each containing a single cell, at every step two adjacent
clusters are merged, choosing the pair that yields the best improvement of the objective
function. This way, each step leaves one cluster fewer; when the desired number $n$ of zones
is reached, the procedure stops. Each of the multiple possible objectives
introduces a control parameter (its weight $\beta_o$), which lends itself to a sensitivity
analysis.
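The merging loop described above might look like the following Python sketch. It is a greedy illustration under assumptions, not the authors' code: `objective` is a hypothetical caller-supplied function that scores a whole partition (for instance a weighted sum of the objectives), "best improvement" is read as the highest score after the merge, ties go to the first pair found, and the loop stops early if no adjacent pair remains.

```python
from itertools import combinations


def clusters_adjacent(c1, c2, A):
    """Two clusters are adjacent if any pair of their cells is adjacent."""
    return any(A[u][v] == 1 for u in c1 for v in c2)


def agglomerate(A, n, objective):
    """Merge adjacent clusters greedily until n clusters remain.

    A is the adjacency matrix, n the target number of zones, and
    objective(clusters) returns a score to be maximized.
    """
    clusters = [[u] for u in range(len(A))]  # start with one cluster per cell
    while len(clusters) > n:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            if not clusters_adjacent(clusters[i], clusters[j], A):
                continue  # contiguity: only adjacent clusters may merge
            candidate = [c for k, c in enumerate(clusters) if k not in (i, j)]
            candidate.append(clusters[i] + clusters[j])
            score = objective(candidate)
            if best is None or score > best[0]:
                best = (score, candidate)
        if best is None:
            break  # no adjacent pair left, so n cannot be reached
        clusters = best[1]
    return clusters
```

Because only adjacent clusters are ever merged, every cluster stays connected by construction, so the connectivity requirement does not need to be rechecked at each step. Each step evaluates the objective once per adjacent pair, which keeps the sketch simple but would be slow on large partitions.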