Data Models and Structures Compact
One can use five components to describe data in a GIS: model, structure, type, format & space.
Model: representation/conceptual understanding of geographic space.
Structure: how the data (and the model) are represented and stored in the computer (e.g. vector, raster). Confusingly, this
is often also called the data model.
Type: the nature of the feature represented by the data model and structure. Defined mainly by which numerical
manipulations are valid. Nominal, ordinal, interval, ratio, cyclic.
Format: down at the numerical level on the computer – integer or floating point?
Space: What does the data represent, e.g. taxonomic, spectral, topographic etc.
Data models are fairly inseparable from data structures, and so both are covered here.
An alternative approach (and the one used in Longley et al. 2001) is a progression from:
reality -> conceptual model -> logical model -> physical model
These are analogous to the first system (which is from Burrough & McDonnell 1998) in terms of:
reality -> data model -> data structure -> data format & type (data spaces are a slightly separate concept)
Both approaches are valid, and it matters little which one you use so long as the context is clear.
A model is an abstraction, or simplification, of reality. It summarises complexity in a way we can understand and, in the GIS case,
represent in a computer. Consider how you represent the information available to you when you deal with it.
Interactive time - we drew a topographic map using volunteers. Audience nominated things like:
· elevation
· buildings and other structures
· roads
· vegetation
Think about what sort of variation there was, looking for continuous versus discrete variation.
The data model is the model used for the data. Is it discrete or continuous? This relates to the perception of space. Both are used
in the topographic map example.
Entity/object (cartographic, discrete) data model. Everything is represented on the map. Distinct units where variation is
assumed to be zero, or where there is an assumed distribution within (relates to scale).
Field (continuous) data model. Assumes variation is continuous in both geographic and data dimensions.
Objects vs fields may be summarised as "Where is something?" vs "What is here?"
Objects: Where are the houses? Where is a patch of E. maculata woodland?
Fields: What is the elevation at some location (e.g. where my house is)? This is derived by inspecting the value of the field at
the relevant location, and may need some mental interpolation.
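The "What is here?" query on a field can be sketched in code. The example below is only an illustration: it assumes the field is stored as sample values on a unit-spaced grid and uses bilinear interpolation as a stand-in for the "mental interpolation" mentioned above. The function name sample_field and the elevation values are hypothetical.

```python
def sample_field(grid, x, y):
    """Bilinearly interpolate a field sampled on a unit-spaced grid.

    grid[row][col] holds the field value at position (x=col, y=row).
    Assumes 0 <= x <= columns - 1 and 0 <= y <= rows - 1.
    """
    # Index of the sample at or before (x, y), clamped so that the
    # next sample along each axis is still inside the grid.
    col0 = min(int(x), len(grid[0]) - 2)
    row0 = min(int(y), len(grid) - 2)
    fx, fy = x - col0, y - row0

    # Interpolate along x on the two bracketing rows, then along y.
    row_a = grid[row0][col0] * (1 - fx) + grid[row0][col0 + 1] * fx
    row_b = grid[row0 + 1][col0] * (1 - fx) + grid[row0 + 1][col0 + 1] * fx
    return row_a * (1 - fy) + row_b * fy


# Hypothetical elevations (metres) at four surveyed grid points.
elevation = [
    [100.0, 110.0],
    [120.0, 130.0],
]

# "What is the elevation where my house is?" for a house midway between samples.
house_elevation = sample_field(elevation, 0.5, 0.5)
print(house_elevation)
```

The field is only known at the sample points; every other answer is an estimate that depends on the interpolation method chosen.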
A data model in a GIS needs to be a formal, mathematical and repeatable representation of the world.
This is mostly so someone who has not measured the phenomenon can understand what is being represented.
Objects can be represented using a Boolean system, which is a traditional form of logic where 1 is true and 0 is false.
Given this we can have a simple map of something like "houses with more than three bedrooms" or "forested areas".
o This can then be extended to maps with more than two (presence/absence) classes simply by there being
several mutually exclusive representations.
There is no overlap, and special classes must be defined to cope with exceptions (e.g. rainforest ecotone).
Fields can be represented by mathematical functions, but in many cases real variation is too complex to be represented
using a single function.
o This is where data structures become important (see below).
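As a rough sketch of the Boolean representation of objects described above, the example below builds one presence/absence layer per class from a small, made-up class map and checks that the layers are mutually exclusive. The class names, the map and the boolean_map helper are all hypothetical.

```python
# Hypothetical 3 x 3 map in which each cell carries exactly one class.
vegetation = [
    ["forest", "forest", "grass"],
    ["forest", "grass", "grass"],
    ["water", "water", "grass"],
]


def boolean_map(class_map, target):
    """Return a presence/absence layer: 1 where the cell is `target`, else 0."""
    return [[1 if cell == target else 0 for cell in row] for row in class_map]


# A simple two-class map: "forested areas" versus everything else.
forested = boolean_map(vegetation, "forest")

# More than two classes: one Boolean layer per class.
classes = ["forest", "grass", "water"]
layers = {name: boolean_map(vegetation, name) for name in classes}

# No overlap: every cell is true in exactly one layer.
coverage = [
    [sum(layers[name][r][c] for name in classes) for c in range(3)]
    for r in range(3)
]
print(forested)
print(coverage)
```

A cell that genuinely belongs to two classes (an ecotone, say) cannot be expressed this way without adding a new class for it.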
Descriptions of Distance.
Data models in GIS relate most to the spatial aspect of the data. This is described by the distance between objects, or the distance
covered by a field.
Relative distance and location - elements are arranged so that they occur in the correct sequence, but the distances
between them are not preserved. Good examples of this are
o train routes, in particular the London Underground.
o Other examples include maps you might draw to direct people from one place to another, and mental maps.
Exact (metric) distance and location are what we use on standard maps. Locations are spaced on the map as they are on
the ground.
Both measures are applicable with a GIS, but they allow differing applications because of the topological connections they model.
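The difference between the two descriptions of distance can be illustrated with a small sketch. The station names and coordinates below are invented: the relative description only records the order of stops along a line, while the metric description records ground coordinates.

```python
import math

# Relative description: hypothetical stations on one line, in travel order.
line_order = ["A", "B", "C", "D"]

# Metric description: hypothetical ground coordinates in metres.
ground_xy = {
    "A": (0.0, 0.0),
    "B": (300.0, 400.0),
    "C": (300.0, 1400.0),
    "D": (3300.0, 1400.0),
}


def stops_between(a, b):
    """Relative distance: how many steps along the line separate a and b."""
    return abs(line_order.index(a) - line_order.index(b))


def ground_distance(a, b):
    """Metric distance: straight-line distance on the ground, in metres."""
    (x1, y1), (x2, y2) = ground_xy[a], ground_xy[b]
    return math.hypot(x2 - x1, y2 - y1)


# A-B and C-D are both "one stop" apart, yet differ greatly on the ground.
print(stops_between("A", "B"), ground_distance("A", "B"))
print(stops_between("C", "D"), ground_distance("C", "D"))
```

The sequence alone is enough for route planning, whereas anything involving area, length or proximity needs the metric coordinates.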
Moving between data models.
o It is easy to move from fields to objects, as all that need be done is to define some thresholds. This is the common case of
classification.
o An example of this might be deriving a vegetation classification from satellite spectral data using an
unsupervised clustering algorithm.
With this you are breaking a series of continuous responses in spectral space into a set of 30-50
clusters which are summarised by their means and standard deviations.
These clusters are then further aggregated into fewer units of the vegetation types you are using.
o Going the other way is another matter. Once you have aggregated continuous data values into a set of discrete objects,
you cannot confidently return them to continuity.
o Aggregation is a one way street.
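Moving from a field to objects by thresholds can be sketched as follows. This is a deliberately simple illustration using fixed thresholds rather than the unsupervised clustering mentioned above; the elevation values, thresholds and class labels are hypothetical.

```python
import bisect


def classify(values, thresholds, labels):
    """Assign each continuous value to a class using ascending thresholds.

    len(labels) must be len(thresholds) + 1. A value exactly equal to a
    threshold falls into the class above it.
    """
    return [labels[bisect.bisect_right(thresholds, v)] for v in values]


# Hypothetical continuous elevations (metres) sampled from a field.
elevations = [12.0, 48.5, 51.0, 230.0, 875.0]
thresholds = [50.0, 500.0]
labels = ["lowland", "upland", "mountain"]

classes = classify(elevations, thresholds, labels)
print(classes)

# The one-way street: 12.0 and 48.5 are now both just "lowland", and
# nothing in the class label says which original value produced it.
```

Going back would require inventing a value for each class (its mean, say), which restores a number but not the original variation.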
Data Structures to Represent Data Models.
Fields are difficult to represent in a GIS database because of their continuity.
What is usually used is a compromise between using discrete objects (or spatial units) and the need to maintain
continuity.
The easiest means of doing this is by using very small spatial units.
o This also depends partly on what is being represented, as roads, for example, are continuous units and are
adequately represented by lines on the map.
There are no such problems for objects.
Most data structures use (geo)graphical primitives to describe variation. Irregular vs regular, lines vs tessellation (tiling),
vector vs raster.
Vector Data Structures.
Point, line, area defined by XY coordinates.
Irregular shapes built from groups of these shapes.
Points make lines make areas.
Polygons are commonly used because file sizes stay small even where polygons are large: only the bounding line is needed to
define them.
o However, they are poor for representing continuity.
TINs (triangulated irregular networks) allow the representation of continuity down to a certain scale: large triangles are
used where variation is low, small triangles where variation is high.
o Can be difficult to access data algorithmically.
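The vector primitives above can be sketched with plain coordinate lists. This is a minimal illustration with made-up coordinates, not a description of any particular GIS file format: a point is one XY pair, a line is an ordered list of points, and a polygon is a line whose last vertex is taken to connect back to the first.

```python
import math

# Points make lines make areas (hypothetical coordinates, in metres).
point = (2.0, 3.0)
line = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
polygon = line + [(0.0, 3.0)]  # boundary ring, implicitly closed


def line_length(coords):
    """Sum of the straight segment lengths along an ordered list of points."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def polygon_area(ring):
    """Area enclosed by a ring of vertices, using the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(line))
print(polygon_area(polygon))
```

Only four vertices are stored for the rectangle here, and the same four would describe it whether it covered twelve square metres or twelve square kilometres.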
Raster Data Structures.
Regular shaped units of same size.
Most often square.
Distributed evenly across the dataset.
Triangular, square, hexagonal matrix of tiles.
o Raster = square pixels (for pedants).
Location is defined by the cell size and the number of cells from an origin point.
Very easy to derive algorithms using raster data, but it can take up a huge amount of disk space.
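Raster addressing by origin and cell size can be sketched as below. The sketch assumes square cells, an origin at the top-left corner of the grid and rows counted downwards (a common convention, but an assumption here); the function names and numbers are hypothetical.

```python
import math


def cell_to_xy(row, col, origin_x, origin_y, cell_size):
    """Return the map coordinates of the centre of cell (row, col)."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size  # rows increase downwards
    return x, y


def xy_to_cell(x, y, origin_x, origin_y, cell_size):
    """Return the (row, col) of the cell containing map point (x, y)."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    return row, col


# Hypothetical grid: top-left corner at (1000, 2000), 25 m cells.
centre = cell_to_xy(2, 3, 1000.0, 2000.0, 25.0)
cell = xy_to_cell(centre[0], centre[1], 1000.0, 2000.0, 25.0)
print(centre, cell)
```

No coordinates need to be stored per cell, which is part of why algorithms on rasters are easy to write; the cost is that every cell is stored whether or not anything varies there.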
Which Data Structure Should I Use?
It depends on the data model being used and the type of data represented, and also on how much disk space you have. You will note
there is a degree of overlap; which one you use depends on the detail you need.
Fields:
· isolines (contours)
· vector networks (traffic, drainage flows)
· choropleth maps (eg soils maps)
· TINs (elevation, anything else)
· rasters (just about anything with right cell size)
Objects:
· vector (less storage space required)
· choropleth maps
· TINs
· rasters.
Advantages:
Raster: simple to analyse, so very sophisticated analyses can be run; simple software; intuitive data structure.
Vector: a precise representation of the original data; can be converted to any grid cell size; small datasets for the object data model.
Disadvantages:
Raster: loses spatial resolution and precision (how much depends on cell size); poor representation of points and lines; constant cell
size (redundancy where there is little variation); large storage requirements.
Vector: requires sophisticated encoding and display equipment; complex to analyse; large datasets when representing extreme
variation (fields).
Summary.
You need to consider the nature of what you are looking at and what you need to do to model it and combine it with other
information. Is it continuous variation? Is it a collection of objects?