Transactions in GIS, 2012, 16(4): 561–579

Research Article

The Annotation Process in OpenStreetMap

Peter Mooney
Department of Computer Science, National University of Ireland Maynooth

Padraig Corcoran
School of Computer Science and Informatics, University College Dublin

Abstract
In this article we describe the analysis of 25,000 objects from the OpenStreetMap
(OSM) databases of Ireland, United Kingdom, Germany, and Austria. The objects
are selected as exhibiting the characteristics of “heavily edited” objects. We consider
“heavily edited” objects as having 15 or more versions over the object’s lifetime. Our
results indicate that there are some serious issues arising from the way contributors
tag or annotate objects in OSM. Values assigned to the “name” and “highway”
attributes are often subject to frequent and unexpected change. However, this “tag
flip-flopping” is not found to be strongly correlated with increasing numbers of
contributors. We also show problems with usage of the OSM ontology/controlled
vocabulary. The majority of errors occurring were caused by contributors choosing
values from the ontology “by hand” and spelling these values incorrectly. These
issues could have a potentially detrimental effect on the quality of OSM data while
at the same time damaging the perception of OSM in the GIS community. The
current state of tagging and annotation in OSM is not perfect. We feel that the
problems identified are a combination of the flexibility of the tagging process in
OSM and the lack of a strict mechanism for checking adherence to the OSM
ontology for specific core attributes. More studies related to comparing the names of
features in OSM to recognized ground-truth datasets are required.

Address for correspondence: Peter Mooney, Department of Computer Science, National University
of Ireland Maynooth, Maynooth, Co. Kildare, Ireland. E-mail: [email protected]

© 2012 Blackwell Publishing Ltd
doi: 10.1111/j.1467-9671.2012.01306.x

1 Introduction

OpenStreetMap (OSM) is a collaborative project to create a free editable map of the
world. It is currently probably the most prominent and well-known example of Volunteered
Geographic Information (VGI) on the Internet (Goodchild 2007, 2008). The OSM
database is a very significant collection of volunteer-collected spatial data and is worthy
of research investigation. OSM is based on the “wiki collaborative model”. Prasarnphanich
and Wagner (2011) remark that the wiki model’s read-write web paradigm,
which enables peer production and incremental improvement in an integral and organic
way “has led to the creation of significant knowledge assets and corresponding knowl-
edge communities”. The OSM Statistics page on the OSM wiki (OSM-Stats 2011) shows,
in real-time, the number of users, GPS traces, nodes, ways, and relations currently stored
in the OSM database. At the time of writing (July 2011) there are over 100 million ways
in the database. There are over 400,000 contributors registered in the OSM project.
Volunteers in the OSM community collect geographic information using GPS devices and
submit this to the global OSM database (Ciepluch et al. 2009). In recent years companies
such as Yahoo! and Bing have made global aerial imagery available to the OSM project.
This imagery can be used as a base layer in one of several OSM editors where volunteers
can trace the outline of geographical features from the aerial imagery. Spatial datasets
which are available under OSM compatible free and open data licenses can be “bulk”
imported into OSM. Examples include: Automotive Navigation Data’s donation of the
entire road and street network database of the Netherlands, CORINE Landcover data-
bases for France and Estonia, and the TIGER road network dataset in the United States.
However, for many reasons, OSM discourages bulk importing unless it is supported by
the larger OSM community (see OpenStreetMap 2011b). For example, if bulk imports are
incorrectly managed, they may delete existing contributed data within the OSM database
in areas where specific OSM contributors are actively maintaining this data. The preference
and priority for spatial data collection, importing, and editing of data in the OSM
global database lies with ‘volunteer mappers’ (individuals and groups).
Real world geographic features are represented in OSM as points, lines, and poly-
gons. Thematic attributes for these features are stored as tags. Tagging has emerged as a
popular means to annotate online objects such as bookmarks, photos, and videos
(Cantador et al. 2011). In most collaborative systems, users create or upload content
(items), annotate them with freely chosen words (tags), and share it with other users
(who may in turn edit or update the annotations). In OSM there are no upper limits to
the number of tags associated with any object. While it is discouraged, and editing
software will identify the problem, objects do not necessarily have to be assigned any
tags. OSM does not have any content restrictions on tags that can be assigned to points,
lines, or polygons. One can use any tags provided “the values are verifiable” (Open-
StreetMap 2011a). The Map Features guide (OpenStreetMap 2011a) emphasizes that
there is “a benefit in agreeing to a recommended set of features and corresponding tags
in order to create, interpret and display a common basemap.” Using tags (and their
recommended set of values) increases the likelihood that spatial data contributed to OSM
will be understood by various cartographic rendering engines which create map visual-
izations from OSM data. OSM contributors can also tag and edit objects that they did
not create themselves. Tagging has been described as “one of the dilemmas in user
behaviour in Web 2.0” (Liu et al. 2011). Properly and exhaustively tagging all objects
(for example objects in OSM or photographs in an online album) is labour intensive and
time-consuming but is important for the overall quality of the collection of objects. Poor
or incorrect tagging leads to unsatisfactory results (Liu et al. 2011). In this article we use
“annotation” as a synonym for tagging or assigning tags to objects in OSM.
This article gives an overview of characteristics of the annotation process in OSM to
investigate what happens when multiple contributors edit objects in a spatial database.
The article shows that OSM data is constantly changing and evolving and through
analysis of the history of the evolution of objects one can see how objects have changed
(in response to edits) over time. OSM is a very large (and growing) spatial dataset and
probably the best-known example of VGI. We analyse the annotation of 25,000
spatial objects or features in the OSM databases of Ireland (653), United Kingdom
(10,040), Germany (10,604), and Austria (3,367). Extracting the history of objects in
OSM is currently a difficult and time-consuming process (Mooney and Corcoran 2011a,
b). We specifically selected objects which are “heavily edited”, that is, objects with 15 or
more versions. We set the lower threshold on the number of versions at V ≥ 15. These
objects are more likely to exhibit collaborative editing where multiple OSM contributors
edit and annotate these objects. We feel these objects are particularly interesting and
could eventually assist us in understanding the nature of contributions in the OSM
collaborative project model. Antin (2011) remarks that for collaborative projects (in
their case Wikipedia) focus often quickly turns to the “practical challenges of informa-
tion quality, coordination, and contributor bias” related to these open models.
The remainder of this article is organized as follows. Section 2 provides a discussion
of the literature related to the topic of contributors to VGI projects and the quality of
their contributions. In Section 3 we discuss the experimental setup for this research. It is
necessary to describe how the 25,000 objects were selected from OSM and how their
histories were compiled. The core of the article is Section 4 which outlines the results of
our analysis on the selected 25,000 OSM objects. Section 5 summarizes the key outcomes
from the article and in this section we present some issues for future work and research
on this topic.

2 Overview of Related Work

In this section we provide an overview of related work. To better organize this overview,
we have divided related work into literature-based related work (section 2.1) and then
web-based related work (section 2.2). Web-based related work covers research that may
not have associated peer-reviewed literature.

2.1 OSM in Peer-reviewed Literature

VGI and OpenStreetMap are exciting research areas at present (Mooney et al. 2010b) in
GIS and related disciplines (Goodchild 2008). Qian et al. (2009) remark that “since
general users can add and change data in VGI, the stored data should update frequently,
and result in an abundant and updated geographic dataset.” This has “reversed the
traditional top-down flow of information” and Flanagin and Metzger (2008) state that as
the amount of VGI continues to grow “the issues of credibility and quality should assume
a prominent place on the research agenda.” Without quantitative measures for
assessing the quality of OSM data, the GIS community has been slow to consider OSM
as a serious source of data (Mooney et al. 2010b). While spatio-temporal accuracy and
quality are fundamental requirements for GIS modelling and applications, documentation,
metadata, and attribution of data are also of major importance. This problem is experienced
in almost all domains. Bulterman (2004) suggests that the “complete disregard for
documentation of data resources” has made it almost impossible for one to perform a
fitness for use or fitness for purpose evaluation on data resources on the Internet. Brando
and Bucher (2010) advise that the quality of VGI is enhanced if proper metadata is created
and maintained, containing information on: types of changes and edits, methods of survey
and collection, and finally a fitness for purpose statement.
Many papers in the literature report very positive experiences and results for OSM.
Haklay (2010) describes a comparison of the road network in OSM for England with the
road network in the Ordnance Survey UK Meridian dataset. The conclusion of Haklay’s
study is that OSM is “as good if not better than the Meridian dataset in terms of
positional accuracy”. Haklay remarks that “completeness is very good for major urban
centres” and draws the conclusion that if mapping applications want to use OSM data for
these locations, it is as good a choice as any other source of spatial data. A similar study
by Zielstra and Zipf (2010a, b) of OSM and Tele Atlas for Germany shows that for larger
cities (Berlin, Frankfurt, Munich, etc.) the OSM spatial data “is so rich that OSM is now
replacing proprietary data for many projects”. Ludwig et al. (2011) compare NAVTEQ
and OSM street and road networks for Germany and conclude that between the two
datasets there “are considerable qualitative differences between regions, towns, and
street categories” but at a national level the “relative completeness of OSM objects is
high enough for maps and cartographic production.” Zielstra and Hochmair (2011)
compared OSM, TIGER, NAVTEQ NAVSTREETS and Tele Atlas Multinet street data
for the state of Florida, USA in a related study to Zielstra and Zipf (2010a, b) and found
“strong heterogeneity of OpenStreetMap data for the US in terms of its completeness”.
Ather (2009) comments that as OSM grows, most regions (the UK in their study)
will eventually fulfil the levels of map quality required for other GIS applications. He
goes on to comment that “it would be useful if long-term measures were in place to
provide continued assessments of OSM map quality and then communicate these results
back to users as they browse through the map.” In Over et al. (2010) the authors
comment that the quality control of OSM differs fundamentally from professionally
edited maps. The community-based approach allows anyone to upload and alter the
spatial data. But due to the huge number of editors, errors and conflicts are usually
quickly resolved. They state that “OSM has probably the most up-to-date map data.” In
urban areas, changes in the road network appear in the OSM dataset long before
appearing in those of other map data providers. Haklay et al. (2010) investigate the relationship
between the number of contributors to OSM and data accuracy against a ground-truth
dataset. They conclude that, beyond 15 contributors per square kilometre,
the positional accuracy of OSM becomes very good (below 6 m). At the other end of
the scale, the first five contributors to an area seem to provide the biggest contribution in
terms of positional accuracy improvement. Girres and Touya (2010), in their quality
assessment of OSM dataset for France, show that the number of OSM objects in an area
clearly grows in relation to the number of contributors in the area but under a non-linear
relationship. As is clearly shown here, OSM is a multiple representation database
containing Points-of-Interest, land cover, transportation networks, buildings, waterways
and waterbodies. There is also some literature on the nature of contributors to VGI and
OSM. Coleman et al. (2009) show the participants in the production process of VGI are
both users and producers or “prosumers”. Assessing the credibility of contributors is
important for evaluating the overall reliability of their contribution. They find many
reasons why contributors take part in VGI, including: for social rewards, to take part in
an outlet for creative and independent self-expression, pride in one’s home place,
and intellectual stimulation. Budhathoki et al. (2008) argue that the motivations of
contributors to VGI are very strong and can assist in the “distribution of the production
of GI for Spatial Data Infrastructures (SDI) among organizations, individuals, and
groups of individuals”. Then a “hybrid SDI model that draws on the synergy between
the conceptual foundation of SDI and an extensive produser base of VGI can be
developed.”
However, there are also several negative outcomes for OSM from some of these
research studies. Welser et al. (2011) remark that there is the perception amongst many
in the scientific domain that the quality of open collaborative projects, such as Wikipedia,
“will never be sufficient as long as it relies on non-expert volunteers of unknown
identity” and this appears to be an issue for some in the GIS community regarding OSM.
Qian et al. (2009) conclude that a serious negative aspect of the VGI model is that the
underlying data is acquired by non-professionals with non-professional equipment,
meaning that there cannot be any guarantee of quality about the VGI (or OSM) data
unless it can be compared to some other source. Ballatore and Bertolotto (2011) call
OSM “spatially-rich but semantically-poor”. Smart et al. (2011) show how freely avail-
able sources of georeferenced data can be used for automated enrichment of 3D city
models. OSM is included as a key data source. They concluded that “matching the
georeferenced point locations from sources, such as OSM, to the geometry of the
buildings in the registered 3D model” posed significant problems due to accuracy and
sparse attribute problems. While Haklay’s (2010) comparison of OSM and Ordnance
Survey UK data reflects very positively on OSM, the author concludes by warning that
“there are serious issues about completeness and coverage” in the UK. Coverage is also
commented upon in the work of Zielstra and Zipf (2010a,b) who state that “while
professional data is not without its faults, the coverage of OSM in rural areas is too small
to be seriously considered a sophisticated alternative for any applications.” When one
moves away from large urban centres, the major issue for quality becomes one of
coverage – in many rural areas there is little or no OSM coverage at all. While Ludwig
et al.’s (2010) comparison of NAVTEQ and OSM street/roads is positive from the OSM
perspective the authors conclude that “other attributes of OSM, which are needed for
other advanced GIS problems, are still relatively incomplete.” Mooney and Corcoran
(2011c) investigate the potential role VGI can play in eEnvironment and various Spatial
Data Infrastructures (SDI) on a local, regional, and national level. Specifically for OSM,
the authors conclude that while currently problems such as inconsistency of metadata
and unpredictable changes to geometries are a barrier to inclusion in SDI, the quantity of
spatial data in OSM means it has a role to play in SDI. Mooney et al. (2010a) investigate
the spatial representation of natural features in OSM. They report that there are differ-
ences in polygon structure for natural and landuse polygons based on: sampling point
density, simplification and generalisation of imported data, and inconsistency in manual
tracing from aerial imagery. Overall this highlights inconsistencies in representation of
natural features in OSM databases for different countries, regions, and contributors.
Mooney et al. (2010b) apply shape-matching techniques from pattern recognition to
compare OSM polygons (lakes) with Ordnance Survey Ireland NMA data. Their results
reveal that the shapes of these polygons in OSM compare poorly with authoritative
NMA data. The authors conclude that the quality of OSM data is not necessarily solely
restricted to a geometric comparison to some other dataset but should include other
aspects such as metadata and tagging. Girres and Touya (2010) also suggest that tagging
and annotation of objects within OSM deserves immediate attention.
The production of cartographic output from the OSM database is the most popular
use of the raw spatial data with some authors (Kessler et al. 2009, Over et al. 2010)
remarking that OSM is not considered for “serious Geomatics applications”. GIS-based
research using OSM as input data for models and testing is beginning to appear in the
literature. Wallgrün et al. (2010) use OSM for matching a sketch map to a geo-referenced
dataset. Corcoran et al. (2011) use OSM data for testing map simplification models. The
same authors use OSM data for model testing in their work on progressive transmission
of vector data (Corcoran and Mooney 2011). Jacob et al. (2010) use OSM as the source
of spatial data for routing algorithms used in the development of haptic-feedback
applications for mobile pedestrian navigation systems.

2.2 OSM-driven Web-based Applications and Research

Some related work which uses the history of objects in OSM is available as web services
online. While MapCompare (Geofabrik 2011b) is not a history browsing tool, it allows
visual comparison of OSM with Google Maps, Bing, etc. Snapshot images could be
compiled over time to provide a visual “history” of a specific area or feature(s). The OSM
History Browser by Langläufer (2011) provides a simple interface to retrieve the entire
version history of any OSM object (node, way, or relation). The OSM identification
number of the object is required. The version history of the object is provided in HTML
table format. Differences between versions can be inspected using the “compare versions”
function. The OSM History Viewer (OSM History Viewer 2011) is a similar
web application. It requires a little more knowledge about the internal data
management of OSM, as one must supply the identification number of a
“changeset” to the application. A changeset is a group of edits made by a single user
within a limited period (a changeset remains open for at most 24 hours). The OSM History Coverage viewer by Ramm
(2011) is a web-based service which creates animated GIF images that depict how OSM
coverage of an area has changed over time. Computation is too time-consuming to offer
this as a live service, but one can request images to be created and then view or download
them once they are ready. Trame and Kessler (2011) describe a web-application which
generates heat-maps for nodes in OSM. The version number of the nodes (and polygons)
is used in heat-map visualization. Roick (2011) creates visualizations of OSM data for
Europe in hexagonal cells. The version number is one of the attributes visualized. van Exel
et al. (2010) consider whether version history could be used as a variable in building trust and
quality metrics for OSM data. Similar work is appearing for Wikipedia but this work is
at a more advanced and mature stage of development. iChase, developed by Riche et al.
(2010) visualizes the trend of activities for articles and contributors. It allows users to
interactively explore the history of changes by drilling down into specific articles and
contributors, or time points to access the details of the changes. In similar work Suh et al.
(2008) describe WikiDashBoard, which also provides drill down functionality on the
history of Wikipedia articles. Pirolli et al. (2009) claim that their user trials using
WikiDashBoard suggested that “increased exposure to the editing/authoring histories of
Wikipedia increases credibility judgements by users.”

3 Experimental Setup

In this section we provide an overview of the experimental setup for the analysis presented
in Section 4. In section 3.1 we discuss how the OSM data is obtained, processed, and
prepared for analysis. In section 3.2 we discuss the process of selecting OSM objects for our
analysis. Section 3.3 discusses the characteristics of the selected objects and regions from

© 2012 Blackwell Publishing Ltd

Transactions in GIS, 2012, 16(4)
The Annotation Process in OpenStreetMap 567

which they are drawn. As the OSM global database contains several million objects we feel
that it is necessary to carefully select a subset of these objects for analysis.

3.1 Understanding the OSM Data

OSM data is freely available, in OSM-XML format and Esri Shapefile format, from the
GeoFabrik web service (see Geofabrik 2011a). This data is updated almost hourly.
Consequently, the most up-to-date version of the OSM database for any region of the
world is always available. GeoFabrik provides the OSM data divided into country and
continent packages which makes it very easy to download specific regions of interest
rather than processing the enormous and rather unwieldy entire “planet.osm” dataset for
the global OSM. The “planet.osm” history dump file (OSM-XML format) is available
for download. The uncompressed version of this file is currently close to 500 GB in size.
The enormity of this OSM-XML file makes it difficult to work with in both conventional
XML processing software packages and programming languages. Hardware issues of
disk space availability and memory usage mean that processing this file is beyond the
capabilities of most standard desktop or server computers. The OSM API (2011)
allows access to the history of nodes, ways, and relations. These are also
returned in OSM-XML format. One must make a separate API call for each unique node,
way, or relation required. In the case of ways the OSM-XML returned containing the
history is structured as follows. Each version of the way is included in chronological
order of when it was created. For each version of the way an unordered listing of the tags
(key-value pairs) associated with that version is also included. The timestamp and user ID
of the contributor is included. Unfortunately only the OSM identifiers of the nodes in
each way are provided. A separate API call must be performed to look up and retrieve the
spatial coordinates of nodes in each way. This makes the process very time-consuming
rather than computationally complex. In Mooney and Corcoran (2011a), we describe a
software-based method for downloading the history of a chosen set of objects from
OSM. This involves firstly identifying objects in the most current version of the OSM
database. Then for each of these objects their history (in OSM-XML format) is downloaded
directly from the OSM servers, using the OSM API (2011). The
history of each object contains only references to the nodes used to create each geometry.
Subsequently, each node must be downloaded from the OSM API (2011) to
create the geometry of the objects in a PostGIS database.
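The structure of a way history described above can be illustrated with a short sketch. The XML below is a hand-written, hypothetical sample in the shape of an OSM-XML history (the identifiers, users, and tags are invented for illustration), and `parse_way_history` is an illustrative helper, not the software described in Mooney and Corcoran (2011a). It shows that each version carries its tags, timestamp, user ID, and changeset, but only node references rather than coordinates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like an OSM-XML way history (all values invented).
SAMPLE_HISTORY = """<osm version="0.6">
  <way id="1001" version="1" timestamp="2009-03-01T10:00:00Z" uid="11" user="alice" changeset="501">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
  <way id="1001" version="2" timestamp="2010-06-15T08:30:00Z" uid="22" user="bob" changeset="742">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="4"/>
    <tag k="highway" v="tertiary"/>
    <tag k="name" v="Main Street"/>
  </way>
</osm>"""


def parse_way_history(xml_text):
    """Return one dict per version of a way, in document order."""
    versions = []
    for way in ET.fromstring(xml_text).iter("way"):
        versions.append({
            "version": int(way.get("version")),
            "timestamp": way.get("timestamp"),
            "uid": way.get("uid"),
            "changeset": way.get("changeset"),
            # Only node identifiers are present; coordinates need separate lookups.
            "node_refs": [int(nd.get("ref")) for nd in way.findall("nd")],
            # Tags are key-value pairs attached to this version.
            "tags": {t.get("k"): t.get("v") for t in way.findall("tag")},
        })
    return versions


history = parse_way_history(SAMPLE_HISTORY)
for v in history:
    print(v["version"], v["uid"], len(v["node_refs"]), v["tags"])
```

Because the node coordinates are absent from the history, a real pipeline would still need one further lookup per node, which is the time-consuming step noted in the text.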

3.2 Selecting OSM Objects for Analysis

The OSM global database contains several million objects (OSM-Stats 2011). Conse-
quently, we felt that it was necessary to carefully select a subset of these objects for analysis.
As our emphasis is on tagging and annotation it was necessary to select objects which had
a non-empty set of tags and had tags for the most frequently occurring attributes including:
name, highway, and landuse. “Heavily edited” objects in OSM should provide good
examples of “significant editing and revision work by many contributors” (Anderka et al.
2011). This criterion allows us to exclude from this study objects in OSM with a
very low number of edits. A closely related concept in Wikipedia to heavily edited objects
in OSM is the “featured article”, and much research work related to the quality and
trustworthiness of Wikipedia articles focuses on “featured articles”. In Javanmardi and
Lopes (2010), the authors discuss the development of a model for the evolution of content
quality in Wikipedia articles in order to estimate the fraction of time during which articles
retain high-quality status. They select only “featured articles”. As outlined by Anderka
et al. (2011), featured articles in Wikipedia are “made” after significant editing and
revision work by many contributors and moderators. Stein and Hess (2007) argue that
“instead of just looking at the formal characteristics of featured articles one should look
at what contributors do on these pages” in order to understand the effects of multiple
contributors over an extended period of time.

Table 1 The distribution of version numbers of ways in the four OpenStreetMap datasets

Versions    Ireland    United Kingdom       Germany      Austria
1           139,722         2,106,647     6,442,209      682,155
2            51,306           729,014     2,126,222      178,694
3            23,806           305,536       988,126       85,113
4            11,274           158,557       552,740       71,552
5             6,599            84,889       336,239       67,489
6–10          9,296           131,836       594,210      113,652
11–15         1,571            22,929       124,019       13,833
16–20           369             6,242        36,596        3,697
21–30           198             3,183        19,321        2,050
31–40            36               670         4,294          520
>40              15               328         2,332          290
Total       244,192         3,549,831    11,226,308    1,219,045

3.3 Heavily Edited Objects

Table 1 provides a summary of the distribution of version numbers in the four OSM
datasets we have used in the analysis in this article. The Ireland OSM dataset covers
the island of Ireland, that is, both the Republic of Ireland and Northern Ireland.
The United Kingdom dataset consists of England, Scotland, and Wales. Germany and
Austria were chosen for inclusion in this study because they have two of the most
active OSM communities in Europe. Table 1 shows that a
very large percentage of ways in all four datasets have five or fewer versions. The percentages
are as follows: Ireland (95.3%, 232,707 of 244,192), UK (95.4%,
3,384,643 of 3,549,831), Germany (93.1%, 10,445,536 of 11,226,308), and
Austria (89.0%, 1,085,003 of 1,219,045). It was necessary to choose a threshold
V on or above which features could be considered as “heavily edited” or “popular”
features. Unfortunately, to our current knowledge, there is no similar work in the
literature on which we can base this choice. For the purposes of this work we chose
to set the threshold value V as 15. Setting V as 15 should allow us to gather features
from the OSM databases which have been edited by multiple contributors. We manually
sampled 200 features with V ≥ 15 from the UK dataset. These features exhibited
a number of interesting characteristics including: long editing timespan from feature
creation to current version timestamp, multiple unique contributors adding/deleting

© 2012 Blackwell Publishing Ltd

Transactions in GIS, 2012, 16(4)
The Annotation Process in OpenStreetMap 569

nodes and tags on the feature, and contributors returning after a number of edits have
been made by other contributors. As evident from Table 1 there are very few (relatively
speaking) “heavily edited objects” in the OSM database. Yet they offer, in our opinion,
the most interesting opportunities for analysis of the collaborative aspect of OSM
editing and contribution. The equivalent object in Wikipedia, the “featured article”, is
similarly rare. In September 2011 there were almost four million
articles in the English language Wikipedia but only 3,377 featured articles – which
roughly translates to one featured article for every 1,000 articles. In our case study
databases there are approximately 16 million objects. Just over 12 million of these
(about 75%) have only one or two versions. Consequently we felt that the choice of
V ≥ 15 was appropriate because of the small number of objects available.
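As a worked check of the figures quoted in this section, the following sketch recomputes them from the way counts in Table 1. The counts are copied from the table; the function and variable names are illustrative only. The computed shares agree with the percentages in the text to within about 0.1 of a percentage point, and the grand total is roughly 16 million ways.

```python
# Way counts per version bin, copied from Table 1.
# Bin order: 1, 2, 3, 4, 5, 6-10, 11-15, 16-20, 21-30, 31-40, >40.
TABLE_1 = {
    "Ireland": [139722, 51306, 23806, 11274, 6599, 9296, 1571, 369, 198, 36, 15],
    "United Kingdom": [2106647, 729014, 305536, 158557, 84889, 131836,
                       22929, 6242, 3183, 670, 328],
    "Germany": [6442209, 2126222, 988126, 552740, 336239, 594210,
                124019, 36596, 19321, 4294, 2332],
    "Austria": [682155, 178694, 85113, 71552, 67489, 113652,
                13833, 3697, 2050, 520, 290],
}


def share_up_to_five_versions(counts):
    """Percentage of ways with five or fewer versions (the first five bins)."""
    return 100.0 * sum(counts[:5]) / sum(counts)


def share_one_or_two_versions(table):
    """Percentage of all ways, across datasets, with only one or two versions."""
    low = sum(sum(counts[:2]) for counts in table.values())
    total = sum(sum(counts) for counts in table.values())
    return 100.0 * low / total


for name, counts in TABLE_1.items():
    print(f"{name}: {sum(counts):,} ways, "
          f"{share_up_to_five_versions(counts):.2f}% with five or fewer versions")

print(f"One or two versions overall: {share_one_or_two_versions(TABLE_1):.1f}%")
```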
In the Ireland dataset there are 776 features with V or more versions, in the UK dataset
there are 12,804, in the Germany dataset there are 76,355, and in the Austrian dataset
there are 7,950. The total number of features with versions V ≥ 15 is 97,885. We randomly
selected 25,000 of these features and finally 653 features were selected from the Ireland
dataset, 10,040 from the United Kingdom dataset, 10,604 from the German dataset, and
3,367 from the Austrian dataset. Each dataset is a mix of landuse, highway, amenity,
waterways, and natural features. In total the OSM-XML processing and OSM data
download from the OSM-API took 920 hours. The scripts to automatically download the
OSM-XML using the OSM-API were carefully monitored, as connection breaks occurred
frequently. We were also mindful of bandwidth limiting on the OSM servers. The scripts
were usually only run during normal working hours. The datasets were downloaded and
processed during May and June 2011. The data was stored in a PostGIS database.
For each object P the history of edits is downloaded as an OSM-XML file from the
OSM API. Suppose that the object $P$ has $n$ versions ($n \geq V$), where $i = 0$ is the first
version and $i = n-1$ is the final or current version. Then each version $P_i$ of $P$ is stored
as the tuple represented in Equation 1.

$$P_i = \bigl(u_i,\ v_i,\ N_i,\ \tau_i,\ c_i,\ NSR_i,\ G(i),\ A(i),\ L(i),\ D_i,\ T(i)\bigr) \qquad (1)$$

where the elements of the tuple $P_i$ are defined as follows: $u_i$ is the user ID of the OSM
contributor who edited version $v_i$, $v_i$ is the version of the OSM object, $N_i$ is the number
of nodes in object $P_i$, $\tau_i$ is the timestamp for the edit, $c_i$ is the changeset that the edit was
saved in, $NSR_i$ is the number of nodes which “survived” from the previous version $P_{i-1}$
of $P$, $G(i)$ is the geometry of $P_i$, $A(i)$ is the area of $G(i)$ in hectares (only
calculated for polygons), $L(i)$ is the length or perimeter of $G(i)$ in meters, $D_i$ is the mean
spacing in meters between the adjacent nodes of $P_i$, and $T(i)$ is the set of tags (keys, values)
assigned to this version of $P_i$, which are stored as a comma-separated list. Finally, if
specified, a vector data file representation of each version $P_i$ of the object $P$ is written out
to disk. There are a number of possible output formats: Esri Shapefile, KML file, or GPX
format. This allows for quick visualization within most desktop GIS software and some
web-based GIS.
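As a rough illustration of how the non-geometric elements of this tuple might be extracted, the sketch below parses an OSM-XML way history with the Python standard library. The `Version` class and `parse_way_history` function are hypothetical names, the geometric elements (G, A, L, D) are omitted, and the count of “survived” nodes is assumed here to mean node references shared with the previous version, which may differ from the authors' implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Version:
    user_id: int         # u_i
    version: int         # v_i
    node_count: int      # N_i
    timestamp: str       # tau_i, kept as the ISO 8601 string found in the XML
    changeset: int       # c_i
    nodes_survived: int  # NSR_i (assumed: node refs shared with previous version)
    tags: dict           # T(i)


def parse_way_history(osm_xml: str) -> list:
    """Turn an OSM-XML way history into Version records, oldest first."""
    root = ET.fromstring(osm_xml)
    ways = sorted(root.iter("way"), key=lambda w: int(w.get("version")))
    versions = []
    previous_refs = set()
    for way in ways:
        ref_list = [nd.get("ref") for nd in way.iter("nd")]
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        versions.append(Version(
            user_id=int(way.get("uid", 0)),  # 0 when no uid attribute is present
            version=int(way.get("version")),
            node_count=len(ref_list),
            timestamp=way.get("timestamp", ""),
            changeset=int(way.get("changeset", 0)),
            nodes_survived=len(set(ref_list) & previous_refs),
            tags=tags,
        ))
        previous_refs = set(ref_list)
    return versions
```

Keeping the tags as a dictionary, and not as a comma-separated string, makes the per-key comparisons of section 4.1 straightforward.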

4 Experimental Analysis

In this section we outline results from our experimental analysis of the 25,000 OSM
features. The analysis focuses on three aspects of annotation of these features in OSM,
namely: assignment of values to attributes (or, in OSM terminology, values to tags or tag
keys) (section 4.1), types of contribution by the OSM volunteers on these features
(section 4.2), and use of the OSM Map Features page as a controlled vocabulary (section
4.3).

Table 2 The percentage of ways in the four OpenStreetMap datasets with the specified
number of unique tags over each object’s history

UniqueTags   UK%      Germany%   Austria%   Ireland%
1,2          15.96    35.01      13.51      20.31
3,4          36.11    35.46      21.09      36.92
5,6          27.15    16.94      16.31      23.69
7,8          13.07    6.61       11.7       11.23
9,10         4.68     2.58       8.61       5.38
11,12        1.72     1.52       11.4       2
13,14        0.74     0.84       9.21       0.46
≥15          0.56     1.05       8.18       0
Total        10,040   10,604     3,367      653

4.1 Tag Assignment

One of the major concerns about OSM is that the flexibility of the tagging/annotation
model is such that spurious data or noise will be created (Mooney et al. 2010b). In
Table 2 we show the distribution of unique tags (key-value pairs) assigned to the objects
in the four OSM datasets. The first column shows the number of unique tags. All values
are percentages of the total number of objects in the corresponding OSM dataset
(outlined in the final row of the table). A low number (≤4) of unique tags can indicate
stable tags which remain unchanged over the lifetime of the object. A higher number of
unique tags can reveal either a more detailed set of tags or frequent changes to the values
of tags over the lifetime of the object (referred to as tag “flip-flopping”). There are some
interesting observations. In the case of Austria one can notice the large percentage of
objects with ≥9 unique tags. We speculate that this could be caused by the large bulk
import of government spatial data into the Austrian OSM database with very rich
metadata. Germany has the highest percentage of objects in the four OSM datasets with
just one or two unique tags. In the four datasets some objects were available without any
tags assigned. Often this problem was corrected very quickly (within one hour).
However, in Austria 47 objects, in UK 310 objects, in Ireland 46 objects, and Germany
398 objects had an empty tag set for at least one day.
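The quantities discussed in this section can be computed from an object's tag history along the following lines. This is a sketch under stated assumptions: the history is represented as a list of tag dictionaries (one per version, oldest first), the function names are hypothetical, and a “flip-flop” is counted as any change in value between consecutive versions that carry the key, since the text does not give a formal definition. The example history is invented for illustration.

```python
def unique_tags(history):
    """Distinct (key, value) pairs seen across all versions of an object."""
    return {(k, v) for tags in history for k, v in tags.items()}


def unique_values(history, key):
    """Distinct values assigned to one key over the object's history."""
    return {tags[key] for tags in history if key in tags}


def flip_flops(history, key):
    """Count value changes for one key between consecutive versions carrying it.

    Assumption: every such change counts once, including a change back to an
    earlier value.
    """
    values = [tags[key] for tags in history if key in tags]
    return sum(1 for before, after in zip(values, values[1:]) if before != after)


# Hypothetical six-version history of one way (tags only).
history = [
    {"highway": "primary"},
    {"highway": "tertiary", "name": "Example Road"},
    {"highway": "tertiary", "name": "Example Road"},
    {"highway": "residential", "name": "Example Road"},
    {"highway": "residential", "name": "Example Rd"},
    {"highway": "tertiary", "name": "Example Road"},
]

print(len(unique_tags(history)))                # unique key-value pairs
print(len(unique_values(history, "highway")))   # unique highway values
print(flip_flops(history, "highway"))           # highway value changes
```

Counting unique values and counting changes are deliberately kept separate: an object can alternate many times between only two values, which the unique-value count alone would hide.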
Table 3 summarizes the number of unique values assigned to the “name” tag of
objects in the four OSM databases. As expected, approximately 70% or more of the objects
that have a “name” attribute have a single assigned value which remains unchanged up to the
current version. The high percentage of objects having two name value assignments is
probably a result of placename spelling errors, incorrect naming, or the splitting of a
single polygon or way into two or more new objects. From our knowledge of the data we
believe that contributor disagreement, spelling errors, and uncertainty in local knowledge
(possibly resulting from aerial imagery tracing rather than physical sampling) are responsible
for the assignment of three or more values to the “name” tag of any object. For the
purpose of illustration we provide Tables 4, 5, and 6, outlining the edit history of three
road polylines in OSM where there are changes to either the road “name” (Tables 4 and
6) or the highway designation attribute (Table 5). The asterisk symbol indicates the
current version of the objects in the OSM database. Each table shows the date of edit, the
version number, the tag value, and the ID of the contributor who made the edit. In all
three cases multiple contributors are involved. Some edits made on the same day
probably, in our opinion, correspond to self-corrections by the contributors who made
them.

Table 3 Number of unique values assigned to the “name” tag of objects in the four OSM
databases

Number of Names   UK              Ireland       Austria         Germany
1                 5,528 (76.6%)   299 (69.9%)   1,804 (70.2%)   3,950 (78.1%)
2                 1,333 (18.5%)   101 (23.6%)   587 (22.8%)     851 (16.8%)
3                 280 (3.9%)      19 (4.4%)     148 (5.8%)      195 (3.9%)
4                 58 (0.8%)       6 (1.4%)      27 (1.1%)       49 (1%)
≥5                15 (0.2%)       3 (0.7%)      5 (0.2%)        15 (0.3%)

Table 4 Example of changes to the value of the name attribute of the road polyline
24276789 (England)

Date of Edit   Version   Name                           Contributor
2008-05-08     1         Oakthorp Drive                 35691
2008-05-09     6         Over Green Drive               35691
2008-05-09     9         Oak Thorp Cr                   35691
2008-05-09     10        Oak Thorp Dr                   35691
2008-05-11     14        Oak Thorp Dr; Broomcroft Rd    35691
2008-05-11     15        Oak Thorp Dr                   35691
2010-02-07     18        Oak Thorp Drive                9065
*2010-08-24    19        Oak Thorpe Drive               35691

Table 5 Example of changes to the value of the highway attribute of the road polyline
9779683 (Germany)

Date of Edit   Version   Highway                  Contributor
2007-10-18     1         primary                  16631
2007-10-18     2         tertiary                 16631
2007-12-06     8         tertiary; primary        16631
2007-12-06     9         residential              16631
2008-01-04     10        residential; secondary   16631
2008-01-04     14        residential              16631
2008-06-20     22        tertiary                 46829
*2010-03-21    29        tertiary                 95223

Table 6 Example of changes to the value of the name attribute of the road polyline
4755815 (Scotland)

Date of Edit   Version   Name               Contributor
2007-06-14     1         A199               6871
2008-01-24     2         null               5121
2009-03-18     17        Edinburgh Road     108345
2011-01-04     24        Milton Road East   364126
2011-01-04     25        Edinburgh Road     364126
2011-01-13     27        Milton Road East   364126
2011-01-13     28        Edinburgh Road     364126
*2011-02-10    29        Edinburgh Road     108345

Table 7 Number of unique values assigned to the “highway” tag of objects in the four
OSM databases. The column ‘Highway’ indicates the number of unique values assigned.

Highway   UK              Ireland       Austria         Germany
1         4,999 (59.4%)   298 (50.5%)   1,110 (47.1%)   495 (54.8%)
2         2,621 (31.2%)   222 (37.6%)   855 (36.3%)     271 (30%)
3         650 (7.7%)      60 (10.2%)    305 (12.9%)     110 (12.2%)
4         117 (1.4%)      8 (1.4%)      78 (3.3%)       22 (2.4%)
≥5        22 (0.3%)       2 (0.3%)      10 (0.4%)       5 (0.6%)
Table 7 is similar to the results presented in Table 3. In this table we summarize the
number of unique values that the “highway” attribute is assigned for all objects in the four
databases with the “highway” tag. There are some interesting observations. A very small,
but not negligible, number of objects have a high number of changes of highway
designation. For example, object 9779683 in Germany has four different values assigned to
its highway attribute over its 29-version history. These values are “primary”, “tertiary”,
“residential”, and “secondary”. There are seven unique contributors to this object. A
significant percentage of objects in all four databases have three or more unique values
assigned to their highway attribute: UK (9.3%), Ireland (11.8%), Austria (16.7%), and
Germany (15.1%). We believe that it is unlikely that real-world physical highways
(motorways, roads, paths, etc.) could change their designation this frequently. Without
comparison to a ground-truth dataset it is difficult to understand precisely the reason for
the tag “flip-flopping” with the highway attribute. We believe this could demonstrate
uncertainty amongst contributors regarding the designation for a given highway object.
This could also reveal a deeper issue of semantics within the OSM Map Features. Different
contributors may have conflicting understanding of similar descriptions such as
“living_street” and “residential”.

4.2 Influence of Contributors

In our case study datasets there are 2,779 unique contributors to the UK dataset, 355 to
the Ireland dataset, 1,485 to the Austria dataset, and 9,325 to the Germany dataset. Any
contributor to OSM can add tags or edit existing tags on OSM objects, whether they created
them or not. Haklay et al. (2010) and Girres and Touya (2010) show that increases in the
number of OSM contributors in an area are strongly related to an increase in geometric
data quality and spatial data volume. What effect does the number of contributors to
each object have on the number of changes to the “name” tag or changes to the
designation value of “highway” attributes in Tables 3 and 7? We calculated the correlation
(corr) and the Spearman correlation (r, reported with a p-value) for the number of
unique contributors to each object against the number of tag “flip-flops” on the
“highway” tag. Objects are included if they had been assigned a “highway” tag for more
than V/2 versions of their history. Unfortunately the results are inconclusive. We calculated
the two-sided p-value for a hypothesis test where the null hypothesis was that the
number of contributors and the number of tag “flip-flops” were uncorrelated. A p-value
exceeding 0.05 corresponded to failing to reject the null hypothesis. The results were as
follows (N, corr, r, p-value): UK (8210, 0.21, 0.13, 0.561), Ireland (590, -0.18, -0.07,
0.46), Austria (2350, 0.09, 0.08, 0.061), and Germany (903, 0.22, 0.18, 0.112). The
results are disappointing but expected. The correlation values in all cases correspond to
very weak correlations. Similar results were calculated for tag “flip-flops” on the “name”
tag. While the correlation values are weak to moderate, no conclusion can be drawn to
indicate a relationship between the number of contributors to an object and tag
“flip-flopping” on the object. We calculated the correlation between the number of unique
contributors to an object and the number of tags at the current version. The results did
not reveal any obvious relationship. Correlations were: Germany: 0.05, UK: 0.19,
Ireland: 0.18, and Austria: 0.12. On the one hand it is reasonable to assume that the
number of tags will increase as more contributors are involved in editing an object.
However, Kessler et al. (2011) state that no changes to tagging over many versions, under
the eyes of many contributors, could be used as a mechanism for assigning trust or
stability to an object’s tags.
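For readers who wish to reproduce this kind of test, the following sketch computes a Spearman rank correlation with the standard library only, as the Pearson correlation of average ranks. The function names and the sample data are hypothetical, and the sketch does not compute a p-value; a statistics package such as SciPy (`scipy.stats.spearmanr`) returns both the coefficient and the p-value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-object data: unique contributors and "highway" flip-flops.
contributors = [1, 2, 3, 4, 5]
flip_flop_counts = [0, 1, 1, 3, 2]
rho = spearman(contributors, flip_flop_counts)
```

A rank-based measure is a natural choice here because counts of contributors and of tag changes are discrete and heavily skewed, so a linear relationship cannot be taken for granted.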

4.3 Adherence to OSM Controlled Vocabulary

As discussed above, the OSM Map Features (OSM Map Features 2011) page provides a
listing of the most popular values for the most frequently occurring attributes (highway,
amenity, landuse, natural, etc.). Interestingly, we found that there was a core set of
values causing non-compliance. For example, “landuse = forest” was incorrectly spelled as
“forrest”, “forestry”, and “forrestry”, while “highway = residential” had the incorrect
spellings “ressidential”, “resident”, and “residental”. Errors such as these could, potentially,
be corrected automatically. For the “highway” attribute there are 37 core values (primary,
motorway, cycleway, living_street, etc.). For the “landuse” attribute there are 29 core values
(forest, farmyard, industrial, grass, etc.). While editor software for OSM usually presents
these core values in a drop-down list, contributors can type these values in
as free text or supply their own values for the attribute. For example, in the UK there are
577 objects which have the “landuse” attribute at some stage of their history. In total 39
values were assigned to “landuse” attribute tags (so 39 − 29 = 10 free-text values not defined
in OSM Map Features). From a visual inspection, spelling errors accounted for the
majority of these variants. In Table 9 a summary of the number of objects with landuse
or highway attributes (at first and current version) is provided. The number of objects
with these attributes is shown in the #Objects column. The number of these objects where
the values for either landuse or highway attributes are drawn from the Map Features
controlled vocabulary is shown in the #Compliant column. In all cases the number of
objects with these tags increases from the first to the current version. Compliance with
the Map Features controlled vocabulary is very good overall. Being compliant with the
Map Features controlled vocabulary does not, however, indicate that the attribute
assignment is correct; correctness would need to be confirmed by comparison to
ground-truth datasets.

Table 8 Overall usage of values from the OSM controlled vocabulary ‘Map Features’
from all unique values assigned to “landuse” and “highway” tags. The Compliance
column indicates the number of unique values found, with the number of these not in the
controlled vocabulary in brackets

Database   Attribute   Compliance   Observations
UK         Landuse     39 (10)      Spelling errors
UK         Highway     138 (101)    Spelling errors ‘pedestrianissed’, ‘tersiary’ and assigning the name of the road or highway to the highway tag
Ireland    Landuse     5 (0)        All valid
Ireland    Highway     30 (0)       All valid
Germany    Landuse     105 (76)     Spelling errors ‘medow’, ‘forrest’, and invalid assignments ‘fruit trees’
Germany    Highway     49 (12)      Street names assigned to highway attribute
Austria    Landuse     72 (43)      Spelling errors of core values, invalid values
Austria    Highway     118 (81)     Spelling errors, multiple value assignments, alternative values from bulk import
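The observation that spelling errors could potentially be corrected automatically can be illustrated with approximate string matching from the Python standard library. This is only a sketch: `CORE_VALUES` is a small hypothetical subset of the Map Features vocabulary, `check_value` is an invented helper, and the similarity cutoff of 0.8 is an assumed threshold that would need tuning against real data.

```python
from difflib import get_close_matches

# Illustrative subset only; the text reports 29 core "landuse" values and
# 37 core "highway" values on the Map Features page.
CORE_VALUES = {
    "landuse": ["forest", "farmyard", "industrial", "grass"],
    "highway": ["primary", "secondary", "tertiary", "residential",
                "motorway", "cycleway"],
}


def check_value(key, value, cutoff=0.8):
    """Classify a tag value as compliant, a likely misspelling, or unknown.

    Returns (status, suggestion). The suggestion is None unless the value is a
    core value or some core value is at least `cutoff` similar to it.
    """
    core = CORE_VALUES[key]
    cleaned = value.strip().lower()
    if cleaned in core:
        return "compliant", cleaned
    close = get_close_matches(cleaned, core, n=1, cutoff=cutoff)
    if close:
        return "likely misspelling", close[0]
    return "unknown", None


for key, value in [("landuse", "forrest"), ("highway", "residental"),
                   ("highway", "tersiary"), ("landuse", "fruit trees")]:
    print(key, value, check_value(key, value))
```

Values flagged as unknown, such as street names placed in the highway tag, would still need manual review, since no core value is close enough to suggest.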

Table 9 Compliance of tagging with the Map Features controlled vocabulary: first and
current versions

Database   Attribute   Version   #Objects   #Compliant
UK         Highway     First     6,730      6,650
UK         Highway     Current   8,269      8,267
Ireland    Highway     First     456        455
Ireland    Highway     Current   579        579
Austria    Highway     First     2,019      1,760
Austria    Highway     Current   2,326      2,325
Germany    Highway     First     608        587
Germany    Highway     Current   697        697
UK         Landuse     First     463        437
UK         Landuse     Current   577        558
Ireland    Landuse     First     6          6
Ireland    Landuse     Current   7          7
Austria    Landuse     First     516        406
Austria    Landuse     Current   914        910
Germany    Landuse     First     5,253      5,154
Germany    Landuse     Current   7,058      7,039

5 Conclusions and Future Work

In this article we have investigated how spatial objects are tagged in OSM databases. We
selected four OSM databases and from these 25,000 heavily edited objects were chosen
for analysis. The article began with an introduction to “tagging” in OSM. This was
followed by a comprehensive overview of the literature on OSM. We then described the
process of choosing heavily edited objects and working with OSM-XML data. The
locations of Ireland, United Kingdom, Germany, and Austria were chosen because of the
home location of the authors and the activity of the OSM communities in the other three
regions. The analysis could be easily extended to other regions. Table 1 shows that over
90% of objects in the four OSM databases have ≤3 versions. This makes it difficult to
undertake analysis to investigate collaborative editing on these objects. Subsequently our
analysis chose to investigate “heavily edited” objects. These offer a similar concept to the
Wikipedia Featured Article. Heavily edited articles in Wikipedia are usually those that
gain the status of “featured article”. Featured articles are recognized as articles of high
quality, with a long history of collaborative editing, and have become relatively stable (no
major recent edits) (Anderka et al. 2011). Welser et al. (2011) explain that heavily edited
articles in Wikipedia usually gain the status of “featured article” and are subsequently
recognized as articles of high quality. Korfiatis et al. (2006) based their analysis of quality
of Wikipedia articles on successive edits and therefore focused on articles with a long edit
history. Hecht and Gergle (2010) focus on articles that have been edited frequently,
particularly those by the same contributor. Nemoto et al. (2011) indicate that quality
increases and stabilizes, the more contributors work on a given article.

The tagging and annotation of these heavily edited objects in OSM is restricted to the
use of a small number of tags. In all four datasets at least 50% of objects use six or fewer
tags over their history. We found the use of tags such as “source” and “description” (to
indicate how the data was captured, etc.) was sparse. Only 3.5% of the 25,000 objects
used one or both of these tags. Tag “flip-flopping” occurs where the values assigned to
tags such as “name” and “highway” change multiple times. Tables 3 and 7 show that a
small, but not negligible, percentage of objects have their “name” or “highway” tags
assigned different values over the object lifetime. The OSM Map Features page offers a
controlled vocabulary from which contributors can choose values for tags such as
“landuse” and “highway”. Table 9 shows the number of objects which draw the values
for their “landuse” and “highway” tags from the Map Features controlled vocabulary.
The rate of compliance is very high (>98% for the current versions of all objects).
However this compliance does not imply that the current values assigned to these
attributes are correct. Table 8 shows wide variations on the controlled vocabulary used
and Tables 7 and 3 show significant “flip-flopping” of values assigned to the
“highway” and “name” tags, respectively. Finally, no relationship was found to exist in
our four datasets between the number of contributors to an object and the number of
tags or tag “flip-flopping” on that object. Overall, this work shows that there are issues
in how contributors tag and annotate spatial features in OSM. These issues need to be
addressed before OSM can be considered for use in “serious geomatics applications”
(Mooney et al. 2010b, Over et al. 2010).
As described in section 3.2 our database of history for the selected 25,000 objects is
a very detailed record of contributor activity to OSM over a period of approximately
four years. This provides us with a very rich dataset from which future research work can
be developed. Haklay et al. (2010) and Girres and Touya (2010) show that increases in
the number of OSM contributors in an area are strongly related to an increase in
geometric data quality and spatial data volume. An immediate issue for future work
would be comparison of the tags of these 25,000 with ground-truth data to investigate
if a relationship exists between the number of contributors and accuracy of tagging.
There are no moderators for content in OSM. Contributors can take a ‘moderator’
responsibility for a particular OSM region or a set of objects. It would be interesting to
conduct a survey of OSM contributors to understand the causes of tag value changes for
example. This would allow us to formulate some indication of the methods of contri-
butions of different communities of OSM from different countries, etc. This work could
also include an analysis of the geometric and positional accuracy of heavily edited objects
over time measured against some ground-truth dataset. As outlined in section 3.3 there
are just over 97,000 objects in the four databases with V ≥ 15. From the complete set of
objects in all four databases this represents less than 1% of all objects. For future work
we will also consider reducing the threshold value V to investigate the effects it has on our
analysis and results. Finally, in the field of visualization we feel there is scope for work
on the visualization of the historical evolution of features in OSM. Gilbert and Kara-
halios (2009) remark that they “see vast potential for social visualization to make large
impacts on social production projects” because added value can be gained for both those
involved in the project and outside it from being able “to observe the long-term
impacts on motivation and production in real, working social production communities.”
Several authors (Suh et al. 2008, Pirolli et al. 2009, Riche et al. 2010) claim that
increased exposure to the editing/authoring histories, using visualization software
applications, for collaborative knowledge projects like Wikipedia, increases credibility
judgements and offers transparency. This can eventually lead to “improvements in the
interpretation, communication, and trustworthiness” (Suh et al. 2008) of collaboratively
generated knowledge. We feel that this could extend to include OpenStreetMap and other
VGI projects.

References

Anderka M, Stein B, and Lipka N 2011 Towards automatic quality assurance in Wikipedia. In
Proceedings of the Twentieth International Conference Companion on World Wide Web
(WWW ’11), Hyderabad, India: 5–6
Antin J 2011 My kind of people?: Perceptions about Wikipedia contributors and their motivations.
In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems
(CHI ’11), Vancouver, British Columbia: 3411–20
Ather A 2009 A Quality Analysis of OpenStreetMap Data. Unpublished M.Eng., Department of
Civil, Environmental and Geomatic Engineering, University College London
Ballatore A and Bertolotto M 2011 Semantically enriching VGI in support of implicit feedback
analysis. In Tanaka K, Fröhlich P, and Kim K-S (eds) Web and Wireless Geographical
Information Systems. Berlin, Springer Lecture Notes in Computer Science Vol. 6574: 78–93
Brando C and Bucher B 2010 Quality in user generated spatial content: A matter of specifications.
In Proceedings of the Thirteenth AGILE International Conference on Geographic Information
Science, Guimarães, Portugal
Budhathoki N, Bruce B, and Nedovic-Budic Z 2008 Reconceptualizing the role of the user of spatial
data infrastructure. GeoJournal 72: 149–60
Bulterman D C A 2004 Is it time for a moratorium on metadata? IEEE MultiMedia 11: 10–17
Cantador I, Konstas I, and Jose J M 2011 Categorising social tags to improve folksonomy-based
recommendations. Web Semantics: Science, Services and Agents on the World Wide Web 9:
1–15
Ciepluch B, Mooney P, Jacob R, and Winstanley A C 2009 Using OpenStreetMap to deliver
location-based environmental information in Ireland. In Proceedings of the Seventeenth ACM
SIGSPATIAL International Conference on Advances in Geographic Information Systems,
Seattle, Washington: 17–22
Coleman D J, Georgiadou Y, and Labonte J 2009 Volunteered geographic information: The nature
and motivation of producers. International Journal of Spatial Data Infrastructures Research 4:
332–58
Corcoran P and Mooney P 2011 Topologically consistent selective progressive transmission. In
Geertman S, Reinhardt W, and Toppen F (eds) Advancing Geoinformation Science for a
Changing World. Berlin, Springer Lecture Notes in Geoinformation and Cartography: 519–38
Corcoran P, Mooney P, and Winstanley A C 2011 Planar and non-planar topologically consistent
vector map simplification. International Journal of Geographical Information Science 25: in
press
Flanagin A J and Metzger M J 2008 The credibility of volunteered geographic information.
GeoJournal 72: 137–48
Geofabrik 2011a Data Downloads for OpenStreetMap Data. WWW document,
http://www.geofabrik.de/data/download.html
Geofabrik 2011b Mapcompare: Visual Comparison of Google Maps and OpenStreetMap. WWW
document, http://tools.geofabrik.de/mc/
Gilbert E and Karahalios K 2009 Using social visualization to motivate social production. IEEE
Transactions on Multimedia 11: 413–21
Girres J-F and Touya G 2010 Quality assessment of the French OpenStreetMap dataset. Transac-
tions in GIS 14: 435–59
Goodchild M F 2007 Citizens as sensors: The world of volunteered geography. GeoJournal 69:
211–21
Goodchild M F 2008 Commentary: Whither VGI? GeoJournal 72: 239–44
Haklay M 2010 How good is volunteered geographical information? A comparative study of
OpenStreetMap and Ordnance Survey datasets. Environment and Planning B 37: 682–703

Haklay M, Basiouka S, Antoniou V, and Ather A 2010 How many volunteers does it take to map
an area well? The validity of Linus’ Law to volunteered geographic information. Cartographic
Journal 47: 315–22
Hecht B J and Gergle D 2010 On the “localness” of user-generated content. In Proceedings of the
2010 ACM Conference on Computer Supported Cooperative Work (CSCW ’10), Savannah,
Georgia: 229–32
Jacob R, Mooney P, Corcoran P, and Winstanley A C 2010 Haptic-GIS: Exploring the possibilities.
In Proceedings of the Eighteenth ACM SIGSPATIAL Conference on Advances in Geographic
Information Systems (Volume 2), San Jose, California: 13–18
Javanmardi S and Lopes C 2010 Statistical measure of quality in Wikipedia. In Proceedings of the
First Workshop on Social Media Analytics (SOMA ‘10), Washington, D.C.: 132–38
Kessler C, Janowicz K, and Bishr M 2009 An agenda for the next generation gazetteer: Geographic
information contribution and retrieval. In Proceedings of the Seventeenth ACM SIGSPATIAL
International Conference on Advances in Geographic Information Systems, Seattle, Washing-
ton: 91–100
Kessler C, Trame J, and Kauppinen T 2011 Tracking editing processes in volunteered geographic
information: The case of OpenStreetMap. In Proceedings of Workshop on Identifying Objects,
Processes and Events in Spatio-Temporally Distributed Data (IOPE), Belfast, Maine
Korfiatis N, Poulos M, and Bokos G 2006 Evaluating authoritative sources using social networks:
An insight from Wikipedia. Online Information Review 30: 252–62
Langläufer 2011 The OSM History Browser Web Application. WWW document,
http://osm.virtuelle-loipe.de/history/
Liu D, Wang M, Hua X-S, and Zhang H-J 2011 Semi-automatic tagging of photo albums via
exemplar selection and tag inference. IEEE Transactions on Multimedia 13: 82–91
Ludwig I, Voss A, and Krause-Traudes M 2011 A comparison of the street networks of Navteq and
OSM in Germany. In Geertman S, Reinhardt W, and Toppen F (eds) Advancing Geoinforma-
tion Science for a Changing World. Berlin, Springer Lecture Notes in Geoinformation and
Cartography: 65–84
Mooney P and Corcoran P 2011a Accessing the history of objects in OpenStreetMap. In Proceed-
ings of the Fourteenth AGILE International Conference on Geographic Information Science,
Utrecht, The Netherlands
Mooney P and Corcoran P 2011b Annotating spatial features in OpenStreetMap. In Proceedings
of the Nineteenth Annual GIS Research UK Conference (GISRUK 2011), Portsmouth,
England
Mooney P and Corcoran P 2011c Can volunteered geographic information be a participant in
eEnvironment and SDI? In Hrebícek J, Schimak G, and Denzer R (eds) Environmental
Software Systems: Frameworks of eEnvironment. Boston, MA, Springer Advances in Infor-
mation and Communication Technology Vol. 359: 115–22
Mooney P, Corcoran P, and Winstanley A C 2010a A study of data representation of natural
features in OpenStreetMap. In Proceedings of the Sixth International Conference on Geo-
graphic Information Science (GIScience 2010), Zurich, Switzerland
Mooney P, Corcoran P, and Winstanley A C 2010b Towards quality metrics for OpenStreetMap.
In Proceedings of the Eighteenth ACM SIGSPATIAL International Conference on Advances in
Geographic Information Systems, Seattle, Washington: 514–17
Nemoto K, Gloor P, and Laubacher R 2011 Social capital increases efficiency of collaboration
among Wikipedia editors. In Proceedings of the Twenty-second ACM Conference on Hyper-
text and Hypermedia (HT ‘11), Eindhoven, The Netherlands: 231–40
OpenStreetMap 2011a The Map-features Page on OpenStreetMap: A Guide to the Recommended
Tagging of OSM Features. WWW document, http://wiki.openstreetmap.org/wiki/Map_Features
OpenStreetMap 2011b OpenStreetMap Automated Import and Editing Code of Conduct. WWW
document, http://wiki.openstreetmap.org/wiki/Automated_Edits/Code_of_Conduct
OSM API 2011 OpenStreetMap Editing API for Fetching and Saving Raw Geodata From/to the
Global OpenStreetMap Database. WWW document, http://wiki.openstreetmap.org/wiki/API
(Online Wiki)
OSM History Viewer 2011 The OSM Changeset Visualiser Web Application. WWW document,
http://osmhv.openstreetmap.de/index.jsp

OSM Map Features 2011 The Map Features Page Guide to Tagging Features on OpenStreetMap.
WWW document, http://wiki.openstreetmap.org/wiki/Map_Features
OSM-Stats 2011 The OpenStreetMap Statistics Page. WWW document,
http://www.openstreetmap.org/stats/data_stats.html
Over M, Schilling A, Neubauer S, and Zipf A 2010 Generating web-based 3D city models from
OpenStreetMap: The current situation in Germany. Computers Environment and Urban
Systems 34: 496–507
Pirolli P, Wollny E, and Suh B 2009 So you know you’re getting the best possible information: A tool
that increases Wikipedia credibility. In Proceedings of the Twenty-seventh International Con-
ference on Human Factors in Computing Systems (CHI ‘09), Boston, Massachusetts: 1505–08
Prasarnphanich P and Wagner C 2011 Explaining the sustainability of digital ecosystems based on
the Wiki model through critical-mass theory. IEEE Transactions on Industrial Electronics 58:
2065–72
Qian X, Di L, Li D, Li P, Shi L, and Cai L 2009 Data cleaning approaches in Web2.0 VGI
application. In Proceedings of the Seventeenth International Conference on Geomatics,
Fairfax, Virginia: 1–4
Ramm F 2011 Displaying historic OpenStreetMap coverage. WWW document,
http://labs.geofabrik.de/history/
Riche N H, Lee B, and Chevalier F 2010 iChase: Supporting exploration and awareness of editing
activities on Wikipedia. In Proceedings of the International Conference on Advanced Visual
Interfaces (AVI ‘10), Rome, Italy: 59–66
Roick O 2011 OSMatrix: Visualisation of contributions to OSM. WWW document,
http://koenigstuhl.geog.uni-heidelberg.de/osmatrix/
Smart P D, Quinn J A, and Jones C B 2011 City model enrichment. ISPRS Journal of Photogram-
metry and Remote Sensing 66: 223–34
Stein K and Hess C 2007 Does it matter who contributes: A study on featured articles in the
German Wikipedia. In Proceedings of the Eighteenth International ACM Conference on
Hypertext and Hypermedia (HT ‘07), Manchester, United Kingdom: 171–74
Suh B, Chi E H, Kittur A, and Pendleton B A 2008 Lifting the veil: Improving accountability and
social transparency in Wikipedia with Wikidashboard. In Proceeding of the Twenty-sixth
International Conference on Human Factors in Computing Systems (CHI ‘08), Florence, Italy:
1037–40
Trame J and Kessler C 2011 Exploring the lineage of volunteered geographic information with heat
maps. In Proceedings of the GeoViz 2011: Linking Geovisualization with Spatial Analysis and
Modeling Workshop, Hamburg, Germany
van Exel M, Dias E, and Fruijtier S 2010 The impact of crowdsourcing on spatial data quality
indicators. In Proceedings of the Sixth International Conference on Geographic Information
Science (GIScience 2010), Zurich, Switzerland
Wallgrün J O, Wolter D, and Richter K-F 2010 Qualitative matching of spatial information. In
Proceedings of the Eighteenth ACM SIGSPATIAL International Conference on Advances in
Geographic Information Systems, San Jose, California: 300–09
Welser H T, Cosley D, Kossinets G, Lin A, Dokshin F, Gay G, and Smith M 2011 Finding social
roles in Wikipedia. In Proceedings of the 2011 iConference, Seattle, Washington: 122–29
Zielstra D and Hochmair H H 2011 Digital street data: Free versus proprietary. GIM International
25(7): 29–33
Zielstra D and Zipf A 2010a A comparative study of proprietary geodata and volunteered
geographic information for Germany. In Proceedings of the Thirteenth AGILE International
Conference on Geographic Information Science, Guimarães, Portugal
Zielstra D and Zipf A 2010b Quantitative studies on the data quality of OpenStreetMap in
Germany. In Fabrikant S I, Reichenbacher T, van Kreveld M, and Schlieder C (eds) Geographic
Information Science: Proceedings of GIScience 2010. Berlin, Springer Lecture Notes in
Computer Science Vol. 6292: 20–26

Transactions in GIS, 2012, 16(4)
Copyright of Transactions in GIS is the property of Wiley-Blackwell and its content may not be copied or
emailed to multiple sites or posted to a listserv without the copyright holder's express written permission.
However, users may print, download, or email articles for individual use.

