
2011 International Conference on Digital Image Computing: Techniques and Applications

Structural Image Classification with Graph Neural Networks
Alyssa Quek∗ , Zhiyong Wang∗ , Jian Zhang† and Dagan Feng∗

∗ School of Information Technologies, The University of Sydney
† Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology, Sydney

Email: [email protected], [email protected], [email protected], [email protected]

Abstract—Many approaches to image classification transform an image into an unstructured set of numeric feature vectors obtained globally and/or locally, and as a result lose important relational information between regions. In order to encode the geometric relationships between image regions, we propose a variety of structural image representations that are not specialised for any particular image category. Besides the traditional grid-partitioning and global segmentation methods, we investigate the use of local scale-invariant region detectors. Regions are connected based not only upon nearest-neighbour heuristics, but also upon minimum spanning trees and Delaunay triangulation. In order to maintain the topological and spatial relationships between regions, and also to effectively process undirected connections represented as graphs, we utilise the recently-proposed graph neural network model. To the best of our knowledge, this is the first utilisation of the model to process graph structures based on local-sampling techniques for the task of image classification. Our experimental results demonstrate great potential for further work in this domain.

Keywords-Image classification, structural representation, graph neural networks, region adjacency graph, minimum spanning tree, Delaunay triangulation.

I. INTRODUCTION

Current popular image representations have shifted to the use of local invariant features, which are more robust to noise and are able to handle various common photometric and geometric image transformations (for example, lighting changes and viewpoint differences), in comparison to their global counterparts. One such representation, which has demonstrated great success in image classification and retrieval, is known as the bag-of-features or bag-of-visual-words model [1][2][3]. However, a key disadvantage of such an orderless and unstructured model is the loss of structural information, namely, the relationships between regions. Another popular representation is the constellation model, which involves connecting a fixed and limited number of parts (typically six or seven) into a pre-defined structure such as the fully-connected or star model [4]. Unfortunately, the formation of these models involves applying very precise geometric constraints on the feature locations, and the limited number of parts means ignoring a good deal of the information available in images. Meanwhile, structural representations allow an arbitrary number of features to be situated at varying locations and tend to deal with a larger number of regions. These regions are often identified using global segmentation or partitioning techniques [5] and are usually connected as a region adjacency graph (RAG) [6][7] or simple tree structures [8]. While local features are generally favoured for their segmentation-free ability to locate distinctive regions, global features are able to capture the "gist" of an image and supply a rich set of cues to its image category [9].

Thus, we explore variants of structural approaches that can handle models with hundreds of regions. In order to be able to select distinctive regions, in the context of common photometric and geometric image transformations, we consider region detection with local scale-invariant regions and compare this approach with global image segmentation and partitioning methods.

Graphs are natural data structures for modelling relationships, with nodes representing regions and edges encoding the relationships between them. Images within the same category often possess a similar structure (for example, hair-eyes-nose-mouth in the category face). In addition, the spatially-proximate regions in an image can be connected in a variety of loose geometric assemblies. However, finding an optimal geometric model or graph suitable for use across all image categories is combinatorially expensive.

Traditional machine learning models used for classification problems cope with graph-structured data by performing a preprocessing stage that maps the graph information into a simpler representation, such as a numeric vector of floating-point values [10]. The "flattened" list-based data loses some important topological relationships between the structural components (e.g. nodes), and the final result is ultimately dependent on the details of the preprocessing algorithm used. Recently, the graph neural network (GNN) model [11] has been proposed to perform supervised learning on graph data structures. It has been successfully applied to a number of applications such as web search [12], text mining [13][14], object localization [15], and image classification [6][7].

In this paper, we investigate various graph formations with different node selections and edge connections, which enable an integration of both visual features and structural context. We begin by constructing undirected graphs out of some commonly-used structures in image analysis: a 4-connected uniformly-sampled grid and the RAG. These are compared with two well-known graph structures, the minimum spanning tree (MST) [16] and Delaunay triangulation [17], constructed with local regions. In order to preserve the graph structure of an image and incorporate the topological relationships between
the region nodes during the learning and classification phases,
we employ the recently-proposed graph neural network (GNN)
model [11]. Employing the GNN also allows us to model the
input domain as undirected graphs, which was previously not
possible with the traditional approaches (such as recursive
neural networks and Markov chains). To the best of our
knowledge, this is the first combination of the GNN model
with local-based structural image representations.
II. STRUCTURAL IMAGE REPRESENTATION
Graphs can be conceived as a straightforward extension
to the set representation of images as a collection of local
image features. Owing to the simplicity of the unstructured
set representation, the contribution of each feature is assumed to be
independent, with no connection to any other. The use of a
graph allows us to form relationships and encode information
between features.
A. Graph Construction
If we partition an image into a set of regions, the struc-
tural context of an image can be represented as a graph
G = {N, E}, where N (nodes) correspond to the interest
regions and E (edges) correspond to the connections between
nodes. We explore four main methods to form an undirected
connectivity graph for an image. Fig. 1 presents the four
structural representations which will be described in turn.
1) Region adjacency graph : To create a region adjacency
graph (RAG), an image is first segmented into several parts.
We utilise a boundary-detection scheme based on edge flow
[18] to isolate regions. The edge flow technique is based
on a predictive coding method and identifies the directional
change of colour and texture at each image location at a
given scale (the only key parameter influencing the segmentation result) to create an edge flow vector. By propagating the edge flow vectors, boundaries can be detected at locations which encounter two opposing directions of colour and texture flow in the stable state.

For all experiments, we set the scale control parameter in the algorithm to 16 for reasonable segmentation results. Due to the input requirements of our selected descriptor, the SIFT (Scale-Invariant Feature Transform) descriptor [19], these segmented regions must be fitted to ellipses. For this step, we adopt the least-squares method proposed by Fitzgibbon et al. [20] to find the best-fit ellipse for the 2D points of each region. Finally, to form the RAG, adjacent segmented regions are connected. Regions that occupy fewer than four pixels are automatically discarded by our approach.
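As a concrete illustration of the final adjacency step only (this is an illustrative Python sketch, not our Matlab pipeline; the edge-flow segmentation and ellipse fitting are assumed to have already produced a per-pixel label map):

```python
import numpy as np

def region_adjacency_edges(labels):
    """Undirected edges between segment labels that share a horizontal or
    vertical pixel boundary in a per-pixel segmentation label map."""
    pairs = set()
    # Horizontally adjacent pixels belonging to different segments.
    left, right = labels[:, :-1], labels[:, 1:]
    mask = left != right
    pairs.update(zip(left[mask].tolist(), right[mask].tolist()))
    # Vertically adjacent pixels belonging to different segments.
    top, bottom = labels[:-1, :], labels[1:, :]
    mask = top != bottom
    pairs.update(zip(top[mask].tolist(), bottom[mask].tolist()))
    # Treat (a, b) and (b, a) as the same undirected edge.
    return {tuple(sorted(p)) for p in pairs}

# Toy 3-segment label map: segments 0, 1 and 2 all touch one another.
labels = np.array([[0, 0, 1],
                   [0, 2, 1],
                   [2, 2, 1]])
print(region_adjacency_edges(labels))  # {(0, 1), (0, 2), (1, 2)}
```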
Fig. 1. Structural representation. (a) The original image. (b) and (c) display the structural representations based on global-sampling techniques, while (d) and (e) display the structural representations based on local-sampling techniques (for illustrative purposes, we have reduced the number of regions).

2) Grid with dense sampling: Dense sampling consistently samples a circular region at every nth pixel of the image. To reduce the overlap between circular regions, we offset every odd row by half the sampling increment (n/2). To create a grid structure, each node is connected to its 4-neighbours (one above, below, left and right) where applicable. We set n to 16 pixels in the grid representation, since the employed SIFT descriptor divides a detected region into 16 × 16-pixel sub-regions.
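A small sketch of this grid construction (illustrative only; it works purely with pixel coordinates and sampling indices, and omits the SIFT description of each circular region):

```python
def dense_grid_graph(width, height, n=16):
    """Sample a circular region every n pixels, offset odd rows by n/2,
    and connect each sample to its 4-neighbours (up, down, left, right)."""
    nodes, index = [], {}
    for row, y in enumerate(range(0, height, n)):
        x_offset = n // 2 if row % 2 == 1 else 0      # shift every odd row
        for col, x in enumerate(range(0, width, n)):
            index[(row, col)] = len(nodes)
            nodes.append((x + x_offset, y))            # region centre in pixels
    edges = []
    for (row, col), i in index.items():
        for drow, dcol in ((0, 1), (1, 0)):            # right and down only,
            j = index.get((row + drow, col + dcol))    # so each edge is added once
            if j is not None:
                edges.append((i, j))
    return nodes, edges

nodes, edges = dense_grid_graph(256, 192)              # hypothetical image size
```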
3) Minimum spanning tree: A spanning tree is a graph that connects all nodes. A minimum spanning tree (MST) [16] is a spanning tree with the lowest total cost of forming a connected path to all nodes. Each node is represented by a region identified using a local feature detector. We detect interest regions using the scale-invariant blob detector Hessian-Laplace [21], and connect them to form an MST. The cost is based on the Euclidean distance between the region centers in the 2-dimensional image space. Note that it is also feasible to employ other distance metrics in constructing the MST.

4) Delaunay triangulation: Adopted from the field of computational geometry, a Delaunay triangulation [17] for a set P of points in the image plane is a triangulation such that no point in P is inside the circumcircle of any triangle. The circumcircle of a triangle is the circle which passes through all corners of the triangle.
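As an illustration of how these two edge sets can be derived from the detected region centers, the following sketch uses SciPy (one possible implementation given for exposition, not the code used in our experiments):

```python
import numpy as np
from itertools import combinations
from scipy.spatial import Delaunay, distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_edges(centers):
    """MST over region centers, weighted by Euclidean distance in image space."""
    tree = minimum_spanning_tree(distance_matrix(centers, centers))
    rows, cols = tree.nonzero()                      # n - 1 edges for n centers
    return list(zip(rows.tolist(), cols.tolist()))

def delaunay_edges(centers):
    """Delaunay triangulation over region centers; each triangle contributes 3 edges."""
    edges = set()
    for simplex in Delaunay(centers).simplices:
        for a, b in combinations(simplex.tolist(), 2):
            edges.add(tuple(sorted((a, b))))
    return sorted(edges)

centers = np.array([[10.0, 12.0], [40.0, 18.0], [25.0, 60.0], [70.0, 55.0], [55.0, 30.0]])
print(len(mst_edges(centers)), len(delaunay_edges(centers)))  # 4 MST edges, more Delaunay edges
```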

We use the same scale-invariant detector as was used for the MST, which yields the same set of interest regions. The centers of the detected regions make up the set of points on which to create the Delaunay triangulation. Among the four mentioned structural representations, Delaunay triangulation creates the most edges per node in an image.

B. Node and Edge Labels

Given a set of regions extracted from an image, constructing a fully-labelled graph also involves the selection of suitable node and edge attributes for the respective node and edge labels. Node labels incorporate features extracted from interest regions. Edge labels incorporate inter-node spatial information that is discarded in traditional set-based representations.

Fig. 2. Node and edge labelling method. Each node n has co-ordinates (x, y) and scale s, ln is a node label, and l(ni,nj) is an edge label.

Our labelling method is illustrated schematically in Fig. 2. Each region yields a single node n, which is defined by the center co-ordinates (x, y) and the scale s of the region. This scale is automatically determined by the scale-invariant Hessian-Laplace detector. The label ln of a node n is a 128-dimensional SIFT (Scale-Invariant Feature Transform) descriptor [19], which is based on grey-level gradient intensities. Each edge is labelled with three attributes derived from the positions and labels of the two nodes (ni, nj) it connects. We employ edge labels similar to those used by Revaud et al. [22] for object recognition; an illustrative sketch follows the list:
• dist(ni, nj) / (si + sj), the Euclidean distance between the nodes ni and nj, normalised with respect to their scales. The denominator is always greater than 1.
• |si − sj| / max(si, sj), the normalised scale difference.
• dist(lni − lnj), the normalised Euclidean distance between the descriptors of the node labels.
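A minimal sketch of computing these three attributes for one edge (an illustrative Python fragment, not our Matlab implementation; the normalisation of the descriptor distance is not specified above, so dividing by the square root of the descriptor dimensionality is an assumption):

```python
import numpy as np

def edge_label(pos_i, pos_j, s_i, s_j, desc_i, desc_j):
    """Three edge attributes for a pair of connected nodes i and j."""
    pos_i, pos_j = np.asarray(pos_i, float), np.asarray(pos_j, float)
    desc_i, desc_j = np.asarray(desc_i, float), np.asarray(desc_j, float)
    norm_dist = np.linalg.norm(pos_i - pos_j) / (s_i + s_j)   # scale-normalised distance
    scale_diff = abs(s_i - s_j) / max(s_i, s_j)               # normalised scale difference
    # Descriptor distance; dividing by sqrt(dimension) is an assumed normalisation.
    desc_dist = np.linalg.norm(desc_i - desc_j) / np.sqrt(desc_i.size)
    return np.array([norm_dist, scale_diff, desc_dist])
```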
C. Node Filtering

With each interest region being represented by a node, this approach may become computationally infeasible when the number of regions rises to the hundreds or thousands per image. This is especially the case when using the local scale-invariant detector, since multiple regions can be detected at different scales around the same locations. These identified regions of interest can contain recurring patterns in different attribute spaces, such as recurring colours, textures, or SIFT descriptors. In order to reduce the quantity of nodes in each image graph, we employ a clustering method. This heuristically identifies nodes with similar patterns and selects a set of representative nodes as part of our node filtering method.

Firstly, for each image category in the dataset, a set of k representative nodes is pre-selected based upon clustering. In our experiments, we set k to 150 and perform clustering on 100 training images. With k clusters computed per category across the whole dataset, each cluster yields one representative node, the centroid of that cluster. Each node in an image-graph is associated, based on a criterion, with a node from the representative set. We use the criterion of minimum Euclidean distance, though this can be substituted with other metrics such as correlation or Mahalanobis distance. At this point, each node in a graph is associated with one centroid. In each cluster, only the node closest to the centroid is retained. Finally, the number of nodes per graph is reduced to at most the number of clusters, k. The key advantage of employing this filtering technique is that clustering need be performed only once per category, thus allowing new categories to be added without re-computing the clusters. We apply further heuristics to remove non-meaningful clusters (such as clusters with few node members, and outliers based on a threshold), which leaves behind 85–100 final clusters per category.
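A rough sketch of this filtering step, assuming k-means clustering over the 128-dimensional SIFT descriptors (the heuristics for discarding non-meaningful clusters are omitted):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def build_representatives(pooled_descriptors, k=150):
    """Cluster the pooled node descriptors of one category into k centroids
    (done once per category, e.g. over 100 training images)."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(pooled_descriptors).cluster_centers_

def filter_graph_nodes(descriptors, centroids):
    """Associate each node of one image-graph with its nearest centroid and,
    within each cluster, retain only the node closest to that centroid."""
    dists = cdist(descriptors, centroids)      # (num_nodes, k) Euclidean distances
    nearest = dists.argmin(axis=1)             # centroid assigned to each node
    kept = []
    for c in np.unique(nearest):
        members = np.where(nearest == c)[0]
        kept.append(int(members[dists[members, c].argmin()]))
    return sorted(kept)                        # indices of retained nodes (at most k)
```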
III. CLASSIFICATION WITH GRAPH NEURAL NETWORKS

The graph neural network (GNN) model [23][11] extends existing neural network methods for processing graph-structured data. Unlike traditional recursive neural network (RNN) models [24][25], whose input domain consists of directed acyclic graphs, GNNs are able to process a wider, more general class of graphs, including acyclic, cyclic, directed and undirected graphs. Additionally, GNNs are able to process both positional and non-positional graphs.

The underlying idea of the GNN model is that nodes in a graph can represent objects or concepts, and edges represent their relationships. Information attached to a node n, called a node label (ln), usually includes features of the object (for example, area and colour intensity). Similarly, an edge label (l(n1,n2)) includes features of the object relationships (such as distance and angle).

For each node n, the GNN defines a state xn which is attached to that node, based on the information contained in the neighbourhood of n (see Fig. 3). The state xn contains a representation of the concept being modelled and can be used to produce an output decision value on. For the task of image classification, this output value can be interpreted as a confidence value that a particular node belongs to a category or class. In the RNN, an output state is only attached to a single supersource node, which has to be explicitly selected in the input domain. In contrast, the GNN allows supervision to be placed on every node, or a subset of them, to each produce an output decision value. For each iteration and for each node in the graph, the connection weights are used to adapt the overall network to fit the desired targets. The weights are updated by the resilient back-propagation strategy [26], which is one of the most efficient strategies for feedforward neural architectures.

Fig. 3. Each unique node is represented by a number. The bold node, node 1, has a state (x1), which depends on its label (l1), the labels of its connecting edges (lco[n]), the states of neighbouring nodes (xne[1]) and their labels (lne[n]) [11].
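For illustration only, the following schematic sketch mimics the flavour of this state propagation. It is not the GNN model itself: the actual model [11] learns its transition and output functions and enforces a contraction mapping, whereas here small fixed random weights stand in for them.

```python
import numpy as np

def gnn_forward(node_labels, edges, edge_labels, state_dim=4, iters=50):
    """Schematic forward pass: iterate a (fixed, random) transition function until
    the node states settle, then read out one value per node in (-1, 1)."""
    rng = np.random.default_rng(0)
    n, label_dim = node_labels.shape
    edge_dim = edge_labels.shape[1]
    # Small random weights stand in for the learned transition/output networks.
    W_label = rng.normal(scale=0.1, size=(label_dim, state_dim))
    W_state = rng.normal(scale=0.1, size=(state_dim, state_dim))
    W_edge = rng.normal(scale=0.1, size=(edge_dim, state_dim))
    w_out = rng.normal(scale=0.1, size=state_dim)

    x = np.zeros((n, state_dim))
    for _ in range(iters):
        new_x = np.tanh(node_labels @ W_label)            # contribution of the node's own label
        for (i, j), le in zip(edges, edge_labels):
            msg = np.tanh(le @ W_edge)                    # contribution of the edge label
            new_x[i] += np.tanh(x[j] @ W_state) + msg     # undirected edge: both directions
            new_x[j] += np.tanh(x[i] @ W_state) + msg
        x = new_x / 2.0   # damping; the real model instead enforces a contraction mapping
    return np.tanh(x @ w_out)                             # one output value per node
```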

IV. EXPERIMENTS

A. Experimental Settings

Evaluation of the structural representations is performed upon four widely-varying unnormalized image categories from the dataset collected by Fergus et al. [27]. A small selection of images (resized to similar heights for the purpose of presentation) from each of the four classes is shown in Fig. 4. Some images were extracted from the benchmark Caltech database [28] and others were collected from Google's image search, and are highly variable in nature.

Fig. 4. Sample images from the dataset.

We followed a similar experimental setup to that used by Di Massa [6] with the same image dataset. For each category, a subset of 350 images was randomly selected from the original dataset. The selected images were split into two equally-sized positive and negative sets. For the negative half, we performed stratification, which ensured that each category was represented in approximately equal proportion, in order to avoid learning bias towards any particular category.

To evaluate the performance of the classification task, we performed a holdout method on the dataset. This involved splitting the per-category image subset into a training set, validation set and test set containing 150, 50 and 150 images respectively. Due to the random initialisation of weights in the GNN model, we reported the average performance over five experimental runs. Each run repeated the training and classification process with different image subsets from the entire dataset. In each run, the one-against-all strategy was used to build a separate model for each category, in which each model was trained to distinguish the images of that category (positive class) from the images of the other categories (negative class).

Instead of selecting one representative node to be supervised per graph, we set all nodes per graph to be supervised. While this increased the learning time significantly, we chose not to constrain the model with one supersource node, which is required in common RNNs [24]. This decision was further supported by Di Massa's [7] experimental results, in which it was noted that more stable results were achieved by supervising more nodes. By supervising all nodes, each node in a graph produced its own classification output value between −1 and 1. In order to report a graph-focused or image-level result, for each input graph we simply averaged all of its individual node output values. We note that other heuristics may be utilised to obtain a graph-focused result (such as removing outlier node output values or introducing thresholds), but we have left this experimentation for future work.

We configured the GNN with the commonly-used two hidden layers, with ten hidden neurons and a linear activation function, matching the setup used in [6]. The maximum number of iterations was set to 2500, and to reduce over-fitting, evaluation of the GNN model parameters was performed every 20 epochs with the validation set [11]. We defined the cost to be the mean squared error between the GNN's output values and the target values (1 or −1) from the validation set. The model that achieved the lowest cost on the validation set was considered the optimal model, and was then applied to the test set.

Rather than using a fixed classification threshold on the output values, we calculated the Receiver Operating Characteristic (ROC) and reported the Area Under the ROC Curve (AUC) [29]. Experiments were conducted with Matlab 7.9 on a Linux system running on a quad-core 2.4GHz Intel CPU with 16GB of memory. Due to the current implementation of the GNN emulator, dataset sizes and the number of classes had to be limited. This is due to the oversized matrices created internally for larger input datasets, which hinder Matlab from performing the necessary complex mathematical operations on them. This issue is exacerbated by our chosen approach of supervising all nodes per graph, and also by the large quantities of nodes and edges in some structural image representations.
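As a small, purely illustrative example of the image-level scoring and AUC computation (hypothetical numbers; scikit-learn's roc_auc_score stands in for the ROC analysis used here):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def image_scores(node_outputs_per_graph):
    """Average the per-node output values of each graph into one score per image."""
    return np.array([np.mean(outputs) for outputs in node_outputs_per_graph])

# Hypothetical test set of three images with per-node outputs in [-1, 1].
node_outputs = [np.array([0.7, 0.4, 0.9]),
                np.array([-0.6, -0.1]),
                np.array([0.2, -0.3, 0.1, 0.5])]
labels = np.array([1, 0, 1])                              # 1 = positive class, 0 = negative
print(roc_auc_score(labels, image_scores(node_outputs)))  # 1.0 for this toy example
```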

B. Experimental Results and Discussions

Fig. 5 presents the average AUC performance of the four structural representations explained above. For the MST and Delaunay structural representations, we present the clustering-based node-filtering results, denoted with Clust, alongside the non-filtered results, denoted with All.

Fig. 5. AUC performance comparison between structures created from global-based and local-based sampling techniques. Global-based structural representations include the RAG and grid, while local-based structural representations include the MST and Delaunay triangulation. Also shown are the performance averages across all categories and the average number of nodes.

Based on Fig. 5, the results reveal the MST formed with Hessian-Laplace regions to be the best performing structural representation, and the RAG based on global segmentation to be the worst performing structural representation. While our clustering-based node-filtering method lowered performance by around 11% for the MST and Delaunay triangulation, there was a reduction of 88% in the number of nodes and a reduction of 3.5 hours in the learning time. We can see that the use of more edges by Delaunay triangulation (an average of 3 edges per node) did not gain any significant improvement over the MST (an average of 1 edge per node). We note that the grid structure based upon dense sampling performed relatively well. It produced comparable results to the use of all Hessian-Laplace regions connected by Delaunay triangulation, but with an average of 46% fewer utilised nodes per graph.

For both the MST and the grid structure, their best performing category, Houses, contains images with the most consistent visual content at similar locations, such as the sky, grass or concrete ground. These large textured areas, including the house itself, are well described by the grey-level, intensity-based SIFT descriptor. The presence of these multiple similar patterns across most images in the same category, and the connections between them, were able to aid the classification process. This suggests that a combination of local regions detected around objects of interest, along with global background information, could produce comparable results with fewer nodes. For the other two categories with lower classification performance, Camels and Guitars, both objects and backgrounds are very diverse and do not share a strong correlation.

V. CONCLUSION AND FUTURE WORK

In this paper, we presented structural image classification with graph neural networks by exploring local-based structures for graph representation. Specifically, we investigated two strategies for graph construction: the minimum spanning tree and Delaunay triangulation. Our experimental results also demonstrated that the proposed local-based structural representation with the GNN model has great potential for image classification. Structural image representations based on local techniques are advantageous over global techniques in the presence of higher intra-class image variability (such as lighting and viewpoint differences, and varying numbers of instances). In particular, the MST achieved better classification accuracy than Delaunay triangulation when there were a large number of nodes, and was inferior when there were a small number of nodes. The former situation could indicate that the addition of edges merely incurred a higher learning cost, with little or no classification improvement, when a large number of nodes are already reasonably connected. Therefore, it may be worthwhile to further investigate different graph construction approaches, such as a combination of both local and global structures.

The above observations could be further validated with larger datasets and more categories. In order to train the GNN within a feasible amount of time, the incorporation of a filtering stage to reduce the number of detected local regions is pertinent to larger-scale applications. Our cluster-based filtering approach could be improved by an alternative centroid-selection method, perhaps by selecting an actual node rather than averaging, as in our method. Better heuristics could be formulated for the removal of non-meaningful clusters, perhaps based on an evaluation of cluster quality. Based on our observations, any new filtering technique applied to this task should retain a greater proportion of regions that correspond to patterns recurring across the majority of images in a category. In regard to the current computational cost, faster algorithms should also be investigated.

Other aspects that are worth further exploration include the use of different detectors, which may incorporate additional affine properties (such as skew and rotation), colour information, or detectors based on parts [30]. Different heuristics to connect regions could also be explored, for instance joining the closest regions in feature space, using second-order neighbourhoods, or using scale information. Furthermore, we could exploit the isolation of objects or areas by a global segmentation technique, and form graphs with higher intra-segment connectivity and lower inter-segment connectivity. Finally, we could examine different attributes and attribute fusion for both the node and edge labels. These might include colour-invariant descriptors [31] for node labels and angles for edge labels.

VI. ACKNOWLEDGEMENTS

The work presented in this paper was partially supported by ARC (Australian Research Council) grants. A/Prof Zhang was affiliated with NICTA (National ICT Australia) during the course of these experiments. The authors would like to thank S. Zhang and M. Hagenbuchner for their assistance with the GNN.

REFERENCES

[1] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, "Visual categorization with bags of keypoints," in European Conference on Computer Vision, Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, May 2004.
[2] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," in IEEE International Conference on Computer Vision, Washington, DC, USA, 2003, pp. 1470–1477.
[3] P. Tirilly, V. Claveau, and P. Gros, "Language modeling for bag-of-visual words image categorization," in ACM International Conference on Image and Video Retrieval, Niagara Falls, Ontario, Canada, July 2008, pp. 249–258.
[4] X. Cheng, Y. Hu, and L.-T. Chia, "Hierarchical word image representation for parts-based object recognition," in IEEE International Conference on Image Processing, November 2009, pp. 301–304.
[5] C. Jiang and F. Coenen, "Graph-based image classification by weighting scheme," in Proceedings of Artificial Intelligence, Springer, 2008, pp. 63–76.
[6] V. Di Massa, G. Monfardini, L. Sarti, F. Scarselli, M. Maggini, and M. Gori, "A comparison between recursive neural networks and graph neural networks," in International Joint Conference on Neural Networks, 2006, pp. 778–785.
[7] V. Di Massa, "Graph neural networks, image classification and object recognition," Ph.D. dissertation, Università degli Studi di Siena, Dipartimento di Ingegneria dell'Informazione, Siena, Italy, 2008.
[8] Z. Wang, D. Feng, and Z. Chi, "Comparison of image partition methods for adaptive image categorization based on structural image representation," in the 8th International Conference on Control, Automation, Robotics and Vision, vol. 1, Kunming, China, December 2004, pp. 676–680.
[9] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," International Journal of Computer Vision, vol. 42, pp. 145–175, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ, USA: Prentice Hall, 1998.
[11] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009.
[12] F. Scarselli, S. L. Yong, M. Gori, M. Hagenbuchner, A. C. Tsoi, and M. Maggini, "Graph neural networks for ranking web pages," in International Conference on Web Intelligence, Washington, DC, USA, 2005, pp. 666–672.
[13] S. L. Yong, M. Hagenbuchner, A. C. Tsoi, F. Scarselli, and M. Gori, "Document mining using graph neural network," in Proceedings of the 5th Initiative for the Evaluation of XML Retrieval Workshop, N. Fuhr, M. Lalmas, and A. Trotman, Eds., 2007, pp. 458–472.
[14] R. Chau, A. C. Tsoi, M. Hagenbuchner, and V. C. S. Lee, "A conceptlink graph for text structure mining," in Proceedings of the Thirty-Second Australasian Conference on Computer Science, Wellington, New Zealand, January 2009, pp. 129–137.
[15] G. Monfardini, V. Di Massa, F. Scarselli, and M. Gori, "Graph neural networks for object localization," in European Conference on Artificial Intelligence, Riva del Garda, Italy, August 2006, pp. 665–669.
[16] R. C. Prim, "Shortest connection networks and some generalizations," Bell System Technical Journal, vol. 36, pp. 1389–1401, 1957.
[17] B. Delaunay, "Sur la sphère vide," Izv. Akad. Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk, vol. 7, pp. 793–800, 1934.
[18] W.-Y. Ma and B. Manjunath, "EdgeFlow: a technique for boundary detection and image segmentation," IEEE Transactions on Image Processing, vol. 9, no. 8, pp. 1375–1388, August 2000.
[19] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[20] A. W. Fitzgibbon, M. Pilu, and R. B. Fisher, "Direct least-squares fitting of ellipses," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 476–480, May 1999.
[21] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, "A comparison of affine region detectors," International Journal of Computer Vision, vol. 65, no. 1-2, pp. 43–72, 2005.
[22] J. Revaud, G. Lavoué, Y. Ariki, and A. Baskurt, "Scale-invariant proximity graph for fast probabilistic object recognition," in Proceedings of the ACM International Conference on Image and Video Retrieval, 2010, pp. 414–421.
[23] G. Monfardini, "A recursive model for neural processing in graphical domains," Ph.D. dissertation, Università degli Studi di Siena, Dipartimento di Ingegneria dell'Informazione, Siena, Italy, 2007.
[24] P. Frasconi, M. Gori, and A. Sperduti, "A general framework for adaptive processing of data structures," IEEE Transactions on Neural Networks, vol. 9, no. 5, pp. 768–786, September 1998.
[25] M. Bianchini, M. Gori, and F. Scarselli, "Processing directed acyclic graphs with recursive neural networks," IEEE Transactions on Neural Networks, vol. 12, no. 6, pp. 1464–1470, November 2001.
[26] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in IEEE International Conference on Neural Networks, 1993, pp. 586–591.
[27] R. Fergus, P. Perona, and A. Zisserman, "A sparse object category model for efficient learning and exhaustive recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, Washington, DC, USA, 2005, pp. 380–387.
[28] L. Fei-Fei, R. Fergus, and P. Perona, "Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories," in IEEE Conference on Computer Vision and Pattern Recognition, Workshop on Generative-Model Based Vision, 2004.
[29] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, April 1982.
[30] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, September 2010.
[31] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek, "Evaluating color descriptors for object and scene recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1582–1596, August 2010.
