Background: Title: Creator: Date
Background
The National Statistics Postcode Directory (NSPD) relates both current and terminated postcodes in
the United Kingdom to a range of current statutory administrative, electoral, health and other area
geographies. It helps support the production of area based statistics from postcode data and is the
fundamental link file between different core geographies. The NSPD is produced by ONS
Geography, which provides geographic support to the Office for National Statistics (ONS) and
geographic services used by other organisations. The NSPD is issued quarterly and is available to
the academic community via UKBORDERS.
Linked Data is growing in importance in the UK: the “Smarter Government” White
Paper aims to make all UK government data released through data.gov.uk available as
Linked Data by July 2011. The UK government is committed to publishing Linked Data
because it is convinced that this is the best approach available for publishing data in a hugely
diverse and distributed environment, in a gradual and sustainable way.
One pervasive aspect of all data likely to be published (whether linked or not) is that
geography provides a common referencing system. Ordnance Survey Research has published, and is
still developing, a Linked Data ontology of the Administrative Geography of Great Britain based upon
the OS BoundaryLine product. The Office for National Statistics is publishing URI sets
corresponding to many of the code lists used in the NSPD (identifying strategic health authority
areas, local government areas, census output areas, etc.). Key parts of the 2011 census are likely to
be published as Linked Data.
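As an illustration of what linking a postcode to area code lists could look like, the sketch below
turns one NSPD-style row into N-Triples statements. The namespace http://example.org/nspd/, the
property names, the column names and the sample codes are all hypothetical placeholders; they are
not the URI sets published by ONS or Ordnance Survey.

```python
# Minimal sketch: one NSPD-style record expressed as N-Triples.
# All URIs, property names, and sample codes are hypothetical placeholders.

BASE = "http://example.org/nspd/"  # assumed namespace, not an official one


def postcode_uri(postcode):
    """Build a URI for a postcode by removing spaces and upper-casing."""
    return f"{BASE}postcode/{postcode.replace(' ', '').upper()}"


def row_to_ntriples(row):
    """Emit one triple per area code that the postcode is related to."""
    subject = postcode_uri(row["postcode"])
    triples = []
    for column, code in row.items():
        if column == "postcode" or not code:
            continue  # skip the key column and empty cells
        predicate = f"{BASE}def/{column}"
        obj = f"{BASE}{column}/{code}"
        triples.append(f"<{subject}> <{predicate}> <{obj}> .")
    return triples


example_row = {"postcode": "AB1 2CD", "ward": "W001", "healthAuthority": "H42"}
for line in row_to_ntriples(example_row):
    print(line)
```

In practice the predicate and object URIs would be taken from existing published vocabularies
wherever they exist, rather than minted under a single local namespace as they are here.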
Project Plan
The project begins with three weeks of data modelling investigation and research, led by Yin Chen.
The first week will be spent identifying the scope of the work and plotting out future directions in
which it could go. We will also investigate the current data.gov.uk stores of Linked Data to
establish which ontologies (vocabularies, or conceptual schemas) are currently being used.
Where appropriate namespaces cannot be identified, we will mint our own and attempt to engage
the UK government data developer community with the work in progress.
Overlapping the modelling stage of the project will be three weeks of development work, building a
Linked Data store and a simple web interface for browsing the NSPD and for publishing new URI
sets. This will include an OpenLayers-based visualisation of the Linked Data NSPD, drawing on data
sources from Unlock. Time allowing, this can evolve into a “mashup” interface drawing on the SPARQL
stores from the data.gov.uk project.
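A browsing or mashup interface of this kind would typically fetch its data by sending SPARQL
queries over HTTP. The sketch below only builds such a request URL, using the standard `query`
parameter of the SPARQL protocol; it makes no network call. The endpoint address and the postcode
URI are hypothetical and do not refer to an existing service.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, for illustration only


def build_query_url(postcode_uri, limit=10):
    """Return a GET URL asking for every property and value of one postcode."""
    query = (
        "SELECT ?property ?value WHERE { "
        f"<{postcode_uri}> ?property ?value "
        f"}} LIMIT {limit}"
    )
    # urlencode percent-escapes the query text so it is safe inside a URL
    return f"{ENDPOINT}?{urlencode({'query': query})}"


print(build_query_url("http://example.org/nspd/postcode/AB12CD"))
```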
On our project blog we will document decisions made and paths not taken; these posts will be used to
engage potential communities of users and to gain feedback on work in progress. One aim is to benefit
other developers and researchers by leaving “breadcrumb trails” for future investigation, including:
- Issues around future maintenance of the resource including data versioning
- Identifying the benefits to data user communities
- Assessing whether Linked Data really is a good fit for this kind of resource, and identifying
the limitations in the approach
- Investigating native RDF store vs RDF to RDBMS mapping approaches
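The last item in the list above contrasts two storage approaches, which can be sketched roughly as
follows. SQLite is used purely for convenience: the three-column `triples` table only stands in for
a native RDF store, and the table layout, namespace and sample values are all assumptions made for
illustration.

```python
import sqlite3

BASE = "http://example.org/nspd/"  # hypothetical namespace

conn = sqlite3.connect(":memory:")

# Approach A: keep the relational table and map rows to triples on demand.
conn.execute("CREATE TABLE nspd (postcode TEXT PRIMARY KEY, ward TEXT)")
conn.execute("INSERT INTO nspd VALUES ('AB12CD', 'W001')")


def mapped_triples(connection):
    """Generate (subject, predicate, object) tuples from relational rows."""
    for postcode, ward in connection.execute("SELECT postcode, ward FROM nspd"):
        yield (f"{BASE}postcode/{postcode}", f"{BASE}def/ward", f"{BASE}ward/{ward}")


# Approach B: store the triples themselves, as a native RDF store would.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", list(mapped_triples(conn)))


def wards_for(connection, postcode):
    """Look up ward URIs for a postcode in the triple table."""
    rows = connection.execute(
        "SELECT o FROM triples WHERE s = ? AND p = ?",
        (f"{BASE}postcode/{postcode}", f"{BASE}def/ward"),
    )
    return [o for (o,) in rows]


print(wards_for(conn, "AB12CD"))
```

With the mapping approach the existing table remains the single copy of the data, whereas the
triple table is schema-free at the cost of a second copy that must be kept in step with each
quarterly release.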
There should be broader applicability to other EDINA projects and services. Foremost is the
conceptual space between UKBORDERS and Unlock – the potential for automated combination
and visualisation of diverse data sources, some of which was suggested by DIAD.
This will be an opportunity to establish basic infrastructure which can be re-used to enhance other
EDINA projects which may benefit from publishing Linked Data, particularly MediaHub.
Benefits
− Learning experience for staff in a fashionable subject area
− Opportunity to research (and disseminate conclusions on) the implicit information
architecture of data.gov.uk and provide practical insight into how academia can contribute to
and re-use its results
− Service and published research helping to put EDINA “on the map” regarding
development of the UK's more general data infrastructure
− Adding valuable new relations to broader web of data
− Opportunity to become de facto source of authority for URI namespaces that have not been
formally developed as part of the data.gov.uk programme, giving EDINA a stake in future
development
Dissemination
Dissemination depends on which data sets are released under an open license on April 1st 2010. If, as is now widely
expected, the Ordnance Survey products BoundaryLine and CodePoint are released for free re-use,
it will be possible to reconstruct a dataset equivalent to the NSPD. It will also be possible to make
explicit links to geometries available via Unlock.
Access to UKBORDERS requires a set of sub-license agreements to use data licensed by ESRC.
Access is therefore restricted to Shibboleth-authenticated sessions. Programs for consuming Linked
Data cannot at present negotiate Shibboleth authentication. The WSTIERIA work may help in
future to manage access to Linked Data where authentication is a requirement.
Ordnance Survey Research have taken the approach of publishing Linked Data without geometries
attached, but nevertheless annotated with properties which allow anyone with the right data and
license to reconstruct the geometries (e.g. the same unique names that are used in BoundaryLine
shapes).
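The sketch below illustrates that approach under stated assumptions: openly published records
carry no geometry but share a unit name with boundary data held by a licensed user, who can then
join the two locally. The record layout, the `unitName` key and the sample values are hypothetical
and are not taken from the Ordnance Survey ontology.

```python
# Published Linked Data: no geometry, but a name shared with the boundary data.
# All URIs, keys, and values below are invented for illustration.
published = [
    {"uri": "http://example.org/id/ward/1", "unitName": "Example Ward"},
    {"uri": "http://example.org/id/ward/2", "unitName": "Another Ward"},
]

# Geometries held locally by a licensed user, keyed by the same unit name.
licensed_shapes = {"Example Ward": "POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))"}


def attach_geometries(records, shapes):
    """Return copies of the records with a geometry where the name matches."""
    joined = []
    for record in records:
        enriched = dict(record)  # copy, so the published data is left untouched
        enriched["geometry"] = shapes.get(record["unitName"])  # None when no shape is held
        joined.append(enriched)
    return joined


for item in attach_geometries(published, licensed_shapes):
    print(item["uri"], item["geometry"])
```

Users without the licensed boundary data still get the full set of links; they simply see no
geometry for each unit.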
Costs
- 3 weeks information modelling and research
- 3 weeks software engineering
- 1 day infrastructure support
- 2 days graphic and interface design
- 3 days project management and editing
Detailed FTE costs are described in the accompanying spreadsheet.
Opportunity cost
3 weeks FTE software engineering will come from Unlock’s staff allocation. Establishment of an
RDF store and linking to data in the Unlock service is an item on Unlock’s operational plan. This
project is an opportunity to begin this effort in a constrained way and focus on where Linked Data
can enhance both Unlock and UKBORDERS. If anything this is an opportunity gain.
3 weeks data modelling and scoping research – this may involve the postponement of work on
AgMap. [Yin to confirm timing, which Joe’s involvement then depends on]
References
http://statistics.data.gov.uk/
http://blog.dbtune.org/post/2008/05/20/Ceriese:-RDF-translator-for-Eurostat-data