Lesson 3. Data Models and Data Structures
Introduction
To visualize natural phenomena, one must first determine how best to
represent geographic space. A data model is a set of rules and/or constructs used
to describe and represent aspects of the real world in a computer. Two primary
data models are available for this task: the raster data model and the vector data
model.
This lesson discusses both data models and guides you in creating your own
spatial data.
Learning Outcomes
Upon completion of this lesson, students will be able to:
1. Discuss the different spatial data models.
2. Explain geodatabases and metadata.
3. Create a map with vector and raster data.
4. Perform heads-up digitizing.
5. Join layers.
6. Perform an interpolation technique.
ACTIVITY
Please refer to the attached activity.
Activity No. 4 Creating Vector Data
Activity No. 5 Importing CSV File into QGIS
ANALYSIS
1. Compare the raster and vector models in representing geographic features.
2. Illustrate raster and vector data using figures.
3. Why is the snapping tool important?
4. How does scale visibility matter in digitizing?
5. What do you think are the applications of interpolation?
ABSTRACTION
A. Terminologies
Spatial data describes the absolute or relative location of geographic features
on the earth. Non-spatial data, or attribute data, describes the characteristics
of those spatial features. These characteristics can be quantitative or
qualitative. For example, a road is a spatial feature, while its name, route
number, classification and the number of cars on it are non-spatial attributes.
GIS 205 – GIS and Remote Sensing
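The split between spatial and attribute data can be sketched as a small record. This is only an illustration: the coordinates, road name and traffic count are invented, and the dictionary layout is an assumption, not the storage format of any particular GIS.

```python
# Hypothetical road feature. The geometry is the spatial data;
# the attribute dictionary is the non-spatial data.
road = {
    "geometry": [(121.00, 14.50), (121.02, 14.52), (121.05, 14.53)],  # (x, y) vertices
    "attributes": {
        "name": "Example Road",          # qualitative
        "route_number": "R-7",           # qualitative (an identifier)
        "classification": "secondary",   # qualitative
        "daily_vehicle_count": 1850,     # quantitative
    },
}

# Separate the quantitative attributes (numbers) from the qualitative ones.
quantitative = {
    key: value
    for key, value in road["attributes"].items()
    if isinstance(value, (int, float))
}
qualitative = {
    key: value
    for key, value in road["attributes"].items()
    if key not in quantitative
}

print("Vertices:", len(road["geometry"]))
print("Quantitative:", quantitative)
print("Qualitative:", qualitative)
```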
Representation of Space
Burrough & McDonnell (1998) described two ways to represent space (an area,
landscape or some larger unit):
a) Discrete entities: The space is seen as occupied by entities that
are described by their properties and can be located on earth using
coordinate systems. The entities have clear boundaries. (Ex. buildings,
roads, land parcels, etc.)
b) Continuous fields: The space is described by the variation of an
attribute as a continuous field. No physical boundary can be observed in
such a case. (Ex. temperature, pressure, elevation, etc. across an area)
Figure 12. The same data is represented differently in vector and raster
formats; the diagram also reflects the corresponding difference between
discrete and continuous data.
Data models are conceptual models of the real world. They describe how
geographic data are represented and stored. The data models used in GIS
are described below:
Figure 13. In vector formats, points, lines, and polygons represent spatial features.
Note: A city can be marked as a single point on a world map but would be
marked as a polygon on a state map. The scale plays an important role in
deciding the geometry of a geographical feature.
Figure 14. A close look at this raster of ocean depth shows that it is composed of square cells. Each cell holds
a numeric value indicating ocean depth.
Geographic entities encoded using the vector data model are often called
features. The features can be divided into two classes:
a) Simple features
These are easy to create and store, and are rendered on screen very quickly.
They lack connectivity relationships and so are inefficient for modeling
phenomena conceptualized as fields.
b) Topological features
A topology is a mathematical procedure that describes how features are
spatially related and ensures the data quality of the spatial relationships.
Topological relationships include the following three basic elements:
i. Connectivity: Information about linkages among spatial objects
ii. Contiguity: Information about neighboring spatial objects
iii. Containment: Information about inclusion of one spatial object within
another spatial object
Connectivity
Arc-node topology defines connectivity: arcs are connected to each other if they
share a common node. This is the basis for many network tracing and path-finding
operations.
Arcs represent linear features and the borders of area features. Every arc has a
from-node which is the first vertex in the arc and a to-node which is the last vertex.
These two nodes define the direction of the arc. Nodes indicate the endpoints and
intersections of arcs. They do not exist independently and therefore cannot be
added or deleted except by adding and deleting arcs.
Nodes can, however, be used to represent point features which connect segments
of a linear feature (e.g., intersections connecting street segments, valves
connecting pipe segments).
Arc-node topology is supported through an arc-node list. For each arc in the list
there is a from node and a to node. Connected arcs are determined by common
node numbers.
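The arc-node list described above can be sketched in a few lines of Python. The arc and node numbers are hypothetical, and the breadth-first trace is just one simple way to follow connected arcs; it ignores arc direction and is not the routine of any specific GIS package.

```python
from collections import defaultdict, deque

# Hypothetical arc-node list: arc id -> (from_node, to_node).
arc_node_list = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    4: (13, 14),
}

def arcs_by_node(arcs):
    """Index the list by node so arcs sharing a node can be looked up."""
    index = defaultdict(set)
    for arc_id, (from_node, to_node) in arcs.items():
        index[from_node].add(arc_id)
        index[to_node].add(arc_id)
    return index

def connected_arcs(arcs, arc_id):
    """Arcs are connected if they share a common node number."""
    index = arcs_by_node(arcs)
    neighbours = set()
    for node in arcs[arc_id]:
        neighbours |= index[node]
    neighbours.discard(arc_id)
    return neighbours

def trace_path(arcs, start_node, end_node):
    """Breadth-first trace; returns the arc ids along a path, or None."""
    index = arcs_by_node(arcs)
    queue = deque([(start_node, [])])
    visited = {start_node}
    while queue:
        node, path = queue.popleft()
        if node == end_node:
            return path
        for arc_id in sorted(index[node]):
            a, b = arcs[arc_id]
            next_node = b if a == node else a
            if next_node not in visited:
                visited.add(next_node)
                queue.append((next_node, path + [arc_id]))
    return None

print(connected_arcs(arc_node_list, 1))
print(trace_path(arc_node_list, 10, 14))
```

Here arc 1 is connected to arcs 2 and 3 because all three share node 11, and the trace from node 10 to node 14 passes along arcs 1, 3 and 4.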
Contiguity
Polygon topology defines contiguity. Polygons are said to be contiguous if they
share a common arc. Contiguity allows the vector data model to determine
adjacency.
The from-node and to-node of an arc indicate its direction, which helps determine
the polygons on its left and right sides. Left-right topology refers to the polygons on
the left and right sides of an arc. In the illustration above, polygon B is on the left
and polygon C is on the right of arc 4.
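A left-right list can be turned into an adjacency lookup with very little code. In the sketch below only the row for arc 4 (B on the left, C on the right) comes from the text; the other rows, polygon A and the "outside" label for the area beyond the map are assumptions made for illustration.

```python
# Left-right list: arc id -> (left_polygon, right_polygon).
# Only arc 4 is taken from the lesson; the other rows are hypothetical.
left_right = {
    1: ("outside", "A"),
    2: ("A", "B"),
    3: ("outside", "B"),
    4: ("B", "C"),
    5: ("outside", "C"),
}

def adjacency(table):
    """Two polygons are contiguous if they lie on either side of one arc."""
    neighbours = {}
    for left, right in table.values():
        if "outside" in (left, right) or left == right:
            continue  # the arc lies on the outer boundary
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)
    return neighbours

for polygon, adjacent in sorted(adjacency(left_right).items()):
    print(polygon, "is adjacent to", sorted(adjacent))
```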
Containment
Polygon D is made up of arcs 5, 6 and 7. The 0 before the 7 indicates that
arc 7 creates an island in the polygon.
Simple Features
Point entities: These represent all geographical entities that are positioned by a
single XY coordinate pair. Along with the XY coordinates, the point must store other
information, such as what the point represents.
Line entities: Linear features made by tracing two or more XY coordinate pairs.
Simple line: It requires a start point and an end point.
Arc: A set of XY coordinate pairs describing a continuous complex line. The
shorter the line segments and the higher the number of coordinate pairs, the
more closely the chain approximates a complex curve.
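The last statement can be checked numerically. The sketch below approximates a quarter circle of radius 1 (an arbitrary curve chosen for illustration) with an increasing number of straight segments and compares the chain length with the true curve length of π/2.

```python
import math

def polyline_length(points):
    """Sum of the straight-line distances between consecutive XY pairs."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def quarter_circle_arc(n_segments, radius=1.0):
    """XY coordinate pairs sampled along a quarter circle."""
    step = (math.pi / 2) / n_segments
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step))
        for i in range(n_segments + 1)
    ]

true_length = math.pi / 2
errors = {
    n: true_length - polyline_length(quarter_circle_arc(n))
    for n in (1, 4, 16, 64)
}

for n, error in errors.items():
    print(f"{n:3d} segments ({n + 1:3d} coordinate pairs): length error = {error:.6f}")
```

The error shrinks as the segments get shorter, at the cost of storing more coordinate pairs.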
Topologic Features
A fully topological polygon network structure is built using boundary chains that are
digitized in any direction. It takes care of islands and lakes and allows automatic
checks for improper polygons. Neighborhood searches are fully supported. These
structures are edited by moving the coordinates of individual points and nodes, by
changing polygon attributes and by cutting out or adding sections of lines or whole
polygons. Changing coordinates requires no modification to the topology, but cutting
out or adding lines and polygons requires recalculation of the topology and rebuilding
of the database.
The triangulated irregular network (TIN) is a topologic data structure that manages
information about the nodes that form each triangle and the neighbors of each
triangle.
Because points can be placed irregularly over a surface, a TIN can have higher
resolution in areas where the surface is highly variable. The model incorporates
the original sample points, providing a check on the accuracy of the model. The
information related to a TIN is stored in a file or a database table. Calculation of
elevation, slope, and aspect is easy with a TIN, but TINs are less widely available
than raster surface models and more time-consuming to construct and process.
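To see why elevation, slope and aspect are easy to derive, note that each TIN triangle is a flat facet, so one plane through its three nodes describes the whole triangle. The sketch below is a minimal illustration under stated assumptions: x points east, y points north, aspect is the compass bearing of the steepest downhill direction, and the three node coordinates are invented.

```python
import math

def triangle_surface(p1, p2, p3):
    """Fit the plane z = z1 + a*(x - x1) + b*(y - y1) through three (x, y, z) nodes."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal vector of the facet: cross product of two edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("triangle is vertical or degenerate")
    a, b = -nx / nz, -ny / nz  # dz/dx and dz/dy
    slope_deg = math.degrees(math.atan(math.hypot(a, b)))
    # Downhill direction is (-a, -b); bearing is measured clockwise from north.
    # For a flat facet (a = b = 0) the aspect is undefined and reported as 0.
    aspect_deg = math.degrees(math.atan2(-a, -b)) % 360

    def elevation(x, y):
        """Linear interpolation of z anywhere on the facet."""
        return p1[2] + a * (x - p1[0]) + b * (y - p1[1])

    return slope_deg, aspect_deg, elevation

# Hypothetical triangle that rises 10 units towards the north.
slope, aspect, elevation = triangle_surface((0, 0, 10), (10, 0, 10), (0, 10, 20))
print(f"slope = {slope:.1f} degrees, aspect = {aspect:.1f} degrees")
print(f"elevation at (2, 5) = {elevation(2, 5):.1f}")
```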
The TIN model is a vector data model that is stored using relational attribute
tables. A TIN dataset contains three basic attribute tables:
a) Arc attribute table: contains the length, from-node and to-node of all the
edges of all the triangles.
b) Node attribute table: contains the x, y coordinates and z (elevation) of the
vertices.
c) Polygon attribute table: contains the areas of the triangles, the
identification numbers of the edges and the identifiers of the adjacent polygons.
Storing data in this manner eliminates redundancy, as all the vertices and edges
are stored only once even if they are used for more than one triangle. As a TIN stores
topological relationships, the datasets can be applied to vector-based
geoprocessing such as automatic contouring, 3D landscape visualization,
volumetric design and surface characterization.
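The saving from storing shared vertices and edges once can be illustrated with two triangles that share an edge. The node coordinates and identifiers below are hypothetical, and plain dictionaries stand in for the relational tables.

```python
from itertools import combinations

# Node table: node id -> (x, y, z). Each vertex is stored once.
nodes = {
    1: (0.0, 0.0, 10.0),
    2: (10.0, 0.0, 12.0),
    3: (0.0, 10.0, 15.0),
    4: (10.0, 10.0, 11.0),
}

# Triangle (polygon) table: triangle id -> ids of its three nodes.
triangles = {"T1": (1, 2, 3), "T2": (2, 4, 3)}

def edge_table(tris):
    """Build a table of unique edges and the triangles that use each one."""
    edges = {}
    for tri_id, node_ids in tris.items():
        for a, b in combinations(sorted(node_ids), 2):
            edges.setdefault((a, b), []).append(tri_id)
    return edges

edges = edge_table(triangles)
# An edge used by two triangles means those triangles are neighbors.
shared = {edge: tris for edge, tris in edges.items() if len(tris) == 2}

vertex_references = sum(len(ids) for ids in triangles.values())
print(f"{len(nodes)} nodes stored for {vertex_references} vertex references")
print(f"{len(edges)} unique edges; shared edges: {shared}")
```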
In a simple raster data structure, geographical entities are stored in a matrix of
rectangular cells. A code is given to each cell, telling users which entity is
present in that cell. The simplest way of encoding raster data in a computer
can be understood as follows:
(a) Entity model: It represents the whole raster data. Let us assume
that the raster data belongs to an area where land is surrounded by
water. Here a particular entity (land) is shown in green color and the
area where land is not present is shown by white.
(b) Pixel values: The pixel value for the full image is shown. Cells
having a part of the land are encoded as 1 and others where land is
not present are encoded as 0.
The large size of the data is a major problem with raster data. An image consisting
of twenty different land-use classes takes the same storage space as a similar
raster map showing the location of a single forest. To address this problem, many
data compaction methods have been developed; they are discussed below:
Run length encoding
o Reduction of data on a row by row basis
o Stores a single value for a group of cells rather than storing values
for individual cells
o The first line represents the dimension of the matrix (8×8) and the
number of entities (1) present. In the second and subsequent lines, the
first number in the pair represents absence (0) or presence (1) of the
entity, and the second number gives the number of cells in the run.
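Run length encoding can be sketched as follows. The 8×8 grid is a hypothetical land (1) and water (0) raster, not the one from the lesson figure, and each row is stored as (value, run length) pairs.

```python
from itertools import groupby

# Hypothetical 8x8 raster: 1 = entity present, 0 = entity absent.
raster = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

def rle_encode(grid):
    """Encode each row as (value, run length) pairs."""
    return [
        [(value, len(list(run))) for value, run in groupby(row)]
        for row in grid
    ]

def rle_decode(rows):
    """Expand the pairs back into individual cell values."""
    return [
        [value for value, length in row for _ in range(length)]
        for row in rows
    ]

encoded = rle_encode(raster)
pair_count = sum(len(row) for row in encoded)

print(f"{len(raster)} x {len(raster[0])} raster, 1 entity")
for row in encoded:
    print(row)
print(f"{pair_count} pairs instead of {len(raster) * len(raster[0])} cell values")
```

Decoding the pairs reproduces the original grid exactly, so the compaction loses no information.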
Block encoding
o Data is stored in blocks in the raster matrix.
o The entity is subdivided into hierarchical blocks and the blocks are
located using coordinates.
o The first cell at the top left is used as the origin for locating the
blocks.
o Ex. Instead of storing 64 grid cells, only 7 blocks are needed. Using
block coding, it requires one 3×3 block, two 2×2 blocks and four 1×1
cell blocks to encode this raster image. In this block coding example,
the top-left corner is used as a reference for each block.
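A simple way to experiment with block encoding is a greedy search that takes the largest square blocks first. This is only one possible strategy, it is not guaranteed to find the smallest number of blocks, and the 5×5 grid below is hypothetical rather than the raster from the example above. Each block is stored as (size, row, column) of its top-left cell.

```python
def block_encode(grid):
    """Greedy block coding: largest squares first, scanning from the top-left cell."""
    rows, cols = len(grid), len(grid[0])
    free = [[cell == 1 for cell in row] for row in grid]  # entity cells not yet covered

    def fits(r, c, size):
        if r + size > rows or c + size > cols:
            return False
        return all(free[i][j] for i in range(r, r + size) for j in range(c, c + size))

    blocks = []
    for size in range(min(rows, cols), 0, -1):
        for r in range(rows):
            for c in range(cols):
                if fits(r, c, size):
                    blocks.append((size, r, c))
                    for i in range(r, r + size):
                        for j in range(c, c + size):
                            free[i][j] = False
    return blocks

def block_cells(blocks):
    """Expand blocks back into the set of (row, column) cells they cover."""
    return {
        (r + i, c + j)
        for size, r, c in blocks
        for i in range(size)
        for j in range(size)
    }

# Hypothetical raster: 1 = entity present, 0 = entity absent.
grid = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

blocks = block_encode(grid)
entity_cells = {(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == 1}
print(f"{len(entity_cells)} entity cells stored as {len(blocks)} blocks: {blocks}")
```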
Chain encoding
o Works by defining the boundary of the entity, i.e. a sequence of cells
starting from and returning to a given origin
o Ex. We start at position (5,2). From here we define the border using
cardinal directions and number of movements. We move east 3
positions until we hit the edge. At this location, we move south 4
positions. This process continues until the end point meets the start
point.
o Note: For the purpose of this exercise only, north, east, south and
west are written as letters. When encoded, each direction is stored as
a numerical value.
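The boundary walk can be sketched as a small decoder. The start position and the first two moves (east 3, south 4) follow the example above; the last two moves are invented so that the boundary closes, and reading (5,2) as (column, row) with rows counted downward is an assumption. The numeric direction codes are likewise illustrative.

```python
# Direction -> (column step, row step); rows are assumed to grow downward.
MOVES = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}

# Illustrative numeric codes, since stored chain codes use numbers, not letters.
NUMERIC_CODE = {"E": 0, "N": 1, "W": 2, "S": 3}

def trace_chain(start, chain):
    """Return every cell visited while following (direction, steps) pairs."""
    col, row = start
    path = [(col, row)]
    for direction, steps in chain:
        d_col, d_row = MOVES[direction]
        for _ in range(steps):
            col, row = col + d_col, row + d_row
            path.append((col, row))
    return path

# East 3 and south 4 come from the example; west 3 and north 4 are hypothetical.
chain = [("E", 3), ("S", 4), ("W", 3), ("N", 4)]
path = trace_chain((5, 2), chain)

print("Boundary closes at origin:", path[0] == path[-1])
print("Numeric form:", [(NUMERIC_CODE[d], n) for d, n in chain])
```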
Quadtree
o A raster is divided into a hierarchy of quadrants that are subdivided
based on similar value pixels.
o The division of the raster stops when a quadrant is made entirely
from cells of the same value.
o A quadrant that cannot be subdivided is called a leaf node.
A satellite or remote sensing image is raster data in which each cell has a
value, and together these values create a layer. A raster may have a single layer
or multiple layers. In a multi-layer (multi-band) raster, each layer is congruent with
all other layers: they have identical numbers of rows and columns and the same
locations in the plane. A digital elevation model (DEM) is an example of a
single-band raster dataset, each cell of which contains only one value representing
surface elevation.
A satellite image can have multiple bands, i.e. the scene is captured at
different wavelengths (ultraviolet, visible and infrared portions) of the
electromagnetic spectrum. When creating a map, we can choose to display a single
band of data or form a color composite using multiple bands. A combination of any
three of the available bands can be used to create an RGB composite. These
composites present more information than a single-band raster.
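Forming a composite amounts to assigning three congruent bands to the red, green and blue display channels. The sketch below uses plain Python lists and invented 2×2 band values; the band names and the near-infrared false-color combination are assumptions for illustration, not a description of any particular sensor.

```python
# Hypothetical 2x2 bands of one scene; all bands share the same rows and columns.
bands = {
    "blue":  [[10, 20], [30, 40]],
    "green": [[11, 21], [31, 41]],
    "red":   [[12, 22], [32, 42]],
    "nir":   [[90, 80], [70, 60]],
}

def composite(bands, red, green, blue):
    """Assign three bands to the R, G and B channels, pixel by pixel."""
    r, g, b = bands[red], bands[green], bands[blue]
    same_rows = len(r) == len(g) == len(b)
    same_cols = all(len(x) == len(y) == len(z) for x, y, z in zip(r, g, b))
    if not (same_rows and same_cols):
        raise ValueError("bands must be congruent (same rows and columns)")
    return [list(zip(r_row, g_row, b_row)) for r_row, g_row, b_row in zip(r, g, b)]

true_color = composite(bands, "red", "green", "blue")
false_color = composite(bands, "nir", "red", "green")

print("True color, top-left pixel:", true_color[0][0])
print("False color, top-left pixel:", false_color[0][0])
```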
Geodatabase
A geodatabase supports various elements of GIS such as attribute data, CAD data,
geographic features, satellite and aerial images, GPS data and survey
measurements. These types of data can be represented as data objects, viz.
annotation, dimension, feature class, geometric network, raster dataset, tables,
topology, relationship class, etc. Geodatabase design is based on a fundamental
step of GIS design, which involves organizing geographic information into a series
of data themes and then specifying the content and representation of the thematic
layers. Advanced capabilities (network, topology, subtypes, etc.) are added later to
the geodatabase to model GIS behavior and maintain data integrity. Other key
properties of geodatabase design include the definition of coordinate properties and
spatial properties, tolerances, coordinate resolution and metadata documentation
for each dataset.
Metadata
Need for metadata
o To enable searching over distributed archives: Similar to a
library catalog, it sorts data and makes it easy for a user to find it.
o To help assess the fitness of a dataset for a given use: Metadata is needed
to determine whether a dataset will satisfy a user’s requirements. Does the
data have acceptable quality? It may also have comments from previous
users.
o To provide information about data content: In the case of remotely sensed
images, it may include the percentage of cloud obscuring the scene and
some other information.
o To provide information about handling the dataset: It includes the technical
specification of the data format, software compatible with the data, data
volume, etc.
The most widely used standard for metadata is the US Federal Geographic Data
Committee’s Content Standard for Digital Geospatial Metadata (CSDGM).
CSDGM describes the items that should be present in a metadata archive but
doesn’t prescribe the format in which to present them. Developers implement the
standard in ways that suit them, while making sure that the implementations are
interoperable, i.e. can be understood by others.
Spatial features may change over time in terms of space and content. The
changes could be geometrical (change in the geometry of features), positional (change
in the position of features), or a change in the attributes of the features. When
changes in the locations of a group of objects are observed together, the changes in
the spatial distribution pattern of the objects can be deciphered.
One may analyze temporal data sets to monitor the changes that happen over
time. Many things change with time, but monitoring the changes must be done
prudently, as it involves a large investment of resources. The monitoring intervals
must be fixed in a manner that captures the change in the spatial phenomena while
remaining efficient and viable.
The effect of urbanization on the land use of an area can be monitored by a change
detection analysis that makes use of temporal satellite images and GIS to
determine the nature, extent and rate of land cover change and fragmentation over
time and space. Temporal GIS studies are quite popular in the field of forest
conservation and management. One study described the monitoring of
deforestation in a land resource inventory project in Nepal, where within an interval
of 30 years (1950-1980) 50% of the forest land was lost to shrub and agriculture.
Similar temporal studies are carried out for various sectors of natural resources
management, such as biodiversity, water and land/soil, where, considering
future needs, striking a balance between consumption and availability of the
natural resources is of utmost importance.
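A basic change detection step compares two classified rasters of the same area cell by cell. The two small land cover grids below are invented for illustration (F = forest, S = shrub, A = agriculture) and are not data from the Nepal study; they are merely arranged so that half of the forest cells change class.

```python
from collections import Counter

# Hypothetical classified rasters of one area at two dates.
land_cover_before = [
    ["F", "F", "F", "F"],
    ["F", "F", "A", "A"],
    ["F", "F", "S", "A"],
]
land_cover_after = [
    ["F", "F", "A", "A"],
    ["F", "S", "A", "A"],
    ["F", "S", "S", "A"],
]

def change_matrix(before, after):
    """Count cells for every (class before, class after) combination."""
    return Counter(
        (b, a)
        for row_before, row_after in zip(before, after)
        for b, a in zip(row_before, row_after)
    )

def percent_lost(before, after, land_class):
    """Share of a class's original cells that changed to another class."""
    matrix = change_matrix(before, after)
    original = sum(n for (b, _), n in matrix.items() if b == land_class)
    if original == 0:
        raise ValueError(f"class {land_class!r} is absent from the first raster")
    return 100 * (original - matrix[(land_class, land_class)]) / original

matrix = change_matrix(land_cover_before, land_cover_after)
forest_loss = percent_lost(land_cover_before, land_cover_after, "F")

print(f"Forest to agriculture: {matrix[('F', 'A')]} cells")
print(f"Forest to shrub: {matrix[('F', 'S')]} cells")
print(f"Forest lost: {forest_loss:.0f}%")
```

The change matrix shows the nature of the change (which class replaced which), while the percentage summarizes its extent. Dividing by the number of years between the two dates would give a rate.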
APPLICATION
Please refer to the attached activity.
Activity No. 4 Creating Vector Data
Activity No. 5 Importing CSV File into QGIS
Closure
You have finished the lesson on data models and data structures. You were
able to create a map with different types of data models by performing heads-up
digitizing and interpolation. In the next lesson, we will explore more about
spatial data input and editing.