Chapter - 2 - Geographic Information and Spatial Data Types

Geographic information and spatial data types

By

Gebremedhin Gebremeskel Haile (PhD)

June 2021
Geographic information and spatial data types

❖ Contents
▪ Definition of geographic phenomena
▪ Types of geographic phenomena
▪ Computer representation of geographic information
Introduction

❖ Geographic phenomena are the objects of study in the field of GIS.
❖ GIS facilitates such study because it represents these
phenomena digitally in a computer.
Introduction…cont.

❖ There is a difference between the real world, the GIS/computer
world and the simulation world.
❖ Representing the real world in a computer is an area of
expertise in itself.
❖ The representation is obtained:
▪ By direct observation using sensors and digitizing the
sensor output for computer use (remote sensing);
▪ By digitizing paper maps (indirect method).
Introduction…cont.

❖ Digitizing
◼ The process of converting the geographic features on an
analog map into digital format using a digitizing tablet, or
digitizer, which is connected to a computer.
◼ Features on a paper map are traced with a digitizer puck, a
device similar to a mouse, and the x, y coordinates of these
features are automatically recorded and stored as spatial data.
Geographic phenomena

❖ GIS is software that allows us to analyze geographic
phenomena and understand them better.
❖ A geographic phenomenon is a manifestation of an entity or
process of interest that:
▪ Can be named or described;
▪ Can be georeferenced;
▪ Can be assigned a time (interval) at which it is/was
present.
Geographic phenomena…cont.

❖ E.g. in water management, the objects of study include:
▪ Measurements of actual evapotranspiration,
▪ Meteorological data,
▪ Measurements of total water use.

❖ Each of the above phenomena can be:
▪ Named/described (what it is),
▪ Georeferenced (where it is),
▪ Given a time interval during which it exists (when it
happened).
Geographic phenomena…cont.

❖ Not everything comes with this triplet; e.g. a legal document in a
cadastral system.
▪ Its position in space is considered irrelevant.
❖ If the description is missing, the phenomenon is of no interest in GIS.
❖ If the time interval is missing, the phenomenon is usually assumed to
be always present (infinite time interval).
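The what/where/when triplet can be sketched as a small record type. This is only an illustration: the class name `Phenomenon`, its field names and the sample values are hypothetical and do not come from any GIS package. A missing time interval is treated as "always present", as described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Phenomenon:
    """Hypothetical record for one geographic phenomenon."""
    description: str                             # what it is
    location: Optional[Tuple[float, float]]      # where it is (x, y); None if irrelevant
    interval: Optional[Tuple[int, int]] = None   # when (start year, end year); None = always

    def is_geographic(self) -> bool:
        # Both a description and a georeference are needed to be of interest in GIS.
        return bool(self.description) and self.location is not None

    def present_in(self, year: int) -> bool:
        # A missing interval is read as an infinite time interval.
        if self.interval is None:
            return True
        start, end = self.interval
        return start <= year <= end


# Made-up sample records.
rain_gauge = Phenomenon("rainfall measurement", (39.47, 13.49), (2015, 2020))
legal_doc = Phenomenon("title deed", None)
mountain = Phenomenon("mountain peak", (38.40, 13.20))
```

Here `legal_doc` has a description but no position, so `legal_doc.is_geographic()` is false, while `mountain.present_in(1900)` is true because no interval was recorded.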
Geographic phenomena…cont.

❖ How can we represent geographic phenomena for both
resource management (what/where) and modeling (what-if)
analysis?
▪ A geographic phenomenon can be represented in various
ways; the choice of representation depends on two issues:
• The type of raw data available;
• The type of data manipulation required.


Types of Geographic phenomena

❖ There are two types of geographic phenomena:
▪ Geographic fields,
▪ Geographic objects.
❖ What is a geographic field?
▪ A field is a geographic phenomenon for which, for every point in
the study area, a value can be determined.
• E.g. Temperature, Elevation.
❖ What is a geographic object?
▪ Geographic objects populate the study area, and are usually well
distinguished, discrete and bounded entities. The space between
them is potentially empty or undetermined.
• E.g. buildings, roads, rivers. (Land use and soil classification
cover the whole study area, so they are discrete fields.)
Geographic fields

❖ A field is a geographic phenomenon that has a value
everywhere in the study area.
❖ A field can be viewed as a mathematical function ƒ: if (x, y) is a
position in the study area, then ƒ(x, y) is the value of the field at
locality (x, y).
❖ Fields can be classified as:
▪ Continuous, e.g. air temperature, soil salinity, elevation.
▪ Discrete, e.g. geological classes, soil type.
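The field-as-function idea can be sketched in a few lines. Both functions below are invented for illustration (the surface formula and the class boundaries are assumptions, not real data); the point is only that each returns a value for every position (x, y).

```python
import math


def elevation(x: float, y: float) -> float:
    """Continuous field: a made-up smooth surface, in metres."""
    return 500.0 + 40.0 * math.sin(x / 100.0) + 25.0 * math.cos(y / 100.0)


def soil_type(x: float, y: float) -> str:
    """Discrete field: every position falls in exactly one class."""
    if x < 300.0:
        return "clay"
    return "sand" if y < 500.0 else "loam"
```

A continuous field changes gradually between nearby positions, whereas a discrete field jumps from one class to another at a boundary.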
Geographic fields…cont.

❖ Continuous field
▪ Elevation of Falset, Spain.
Geographic fields…cont.

❖ Discrete field
▪ Geologic unit

❖ Discrete fields are a step from continuous fields towards geographic
objects (both use bounded features).
Geographic objects

❖ Geographic objects are easily distinguished and named.
❖ Their position in space is determined by a combination of one
or more of the following parameters:
▪ Location (where is it?)
▪ Shape (what form is it?)
▪ Size (how big is it?)
▪ Orientation (in which direction is it facing?)
Geographic objects

❖ Dimension answers the question of whether an object is perceived as a
point, line, area or volume feature.
❖ Geographic objects are usually studied not in isolation but as whole
collections of objects viewed as a unit.
❖ Different objects do not occupy the same location.
❖ Boundaries
▪ When a geographic object has shape and size, its boundary serves
to differentiate it from other objects.
▪ Two types of boundaries:
• Crisp boundary
• Fuzzy boundary
Progress of geographic data models

◼ A geographic data model is an abstraction of the real
world that employs a set of data objects.
◼ Three generations of geographic data models:
◼ CAD data model (1960s and 1970s),
◼ Coverage data model (1980s),
◼ Geo-database data model (1990s).


CAD data model
◼ The very first computerized mapping systems drew vector maps with lines
displayed on cathode ray tubes and raster maps using overprinted
characters on line printers.
◼ From this genesis, the 1960s and 1970s saw the refinement of graphics
hardware and mapping software that could render maps with reasonable
cartographic fidelity.
◼ In this era, maps were usually created with general purpose CAD
(computer-aided design) software.
◼ The CAD data model stored geographic data in binary file formats with
representations for points, lines and areas.
◼ Scant information about attributes was kept in these files; map layers and
annotation labels were the primary representation of attributes.
Coverage data model
◼ In 1981 ESRI introduced its first commercial GIS software
called ArcInfo, which implemented a second-generation
geographic data model, the coverage data model (also known
as the geo-relational data model).
Coverage…cont’
◼ The coverage model has two key facets:
1. Spatial data is combined with attribute data. The spatial data is
stored in indexed binary files, which are optimized for display and
access. The attribute data is stored in tables with a number of rows
equal to the number of features in the binary files, joined by a
common identifier.
2. Topological relationships between vector features can be stored. This
means that the spatial data record for a line contains information about
which nodes delimit that line and, by inference, which lines are connected;
it also contains information about which polygons are on its right
and left sides.
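The two facets can be sketched with plain dictionaries. This is not the actual coverage file layout: the field names (`from_node`, `left_poly`, and so on), the identifiers and the attribute values are all assumptions made for illustration.

```python
# Spatial side: one record per arc (a stand-in for the indexed binary file).
# Polygon 0 is used here to mean "outside the study area".
arcs = {
    1: {"from_node": 1, "to_node": 2, "left_poly": 0, "right_poly": 10},
    2: {"from_node": 2, "to_node": 3, "left_poly": 11, "right_poly": 10},
    3: {"from_node": 3, "to_node": 4, "left_poly": 11, "right_poly": 0},
}

# Attribute side: one row per feature, keyed by the same identifier.
attributes = {
    1: {"kind": "road", "name": "Ring Road"},
    2: {"kind": "stream", "name": "North Creek"},
    3: {"kind": "road", "name": "Market Street"},
}


def connected_arcs(arc_id):
    """Arcs sharing a node with arc_id, inferred from the stored end nodes."""
    nodes = {arcs[arc_id]["from_node"], arcs[arc_id]["to_node"]}
    return sorted(
        other for other, rec in arcs.items()
        if other != arc_id and nodes & {rec["from_node"], rec["to_node"]}
    )


def feature(arc_id):
    """Join the spatial record and the attribute row on the common identifier."""
    return {**arcs[arc_id], **attributes[arc_id]}
```

Connectivity is never stored directly; `connected_arcs` derives it from the node numbers, which is the "by inference" step mentioned in facet 2.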
Coverage…cont’
◼ Limitation
◼ Features are aggregated into homogeneous collections of points,
lines and polygons with generic behavior (a line representing a
road behaves the same as a line representing a stream, and special
behaviors are not represented).
◼ To overcome this problem, developers added such behaviors
through macro code written in the Arc Macro Language
(AML).
◼ However, as applications became more complex, it became
apparent that a better way to associate behavior with features
was needed.
◼ The time had come for a new model with an infrastructure to
tightly couple behavior with features.
Geo-database data model
◼ ArcInfo 8 introduced a new object-oriented (OO) data model
called the geodatabase data model.
◼ The three key hallmarks of object orientation (polymorphism,
encapsulation and inheritance) are used in this model.
◼ The defining purpose of this new data model is to let you
make the features in your GIS datasets smarter by
endowing them with natural behaviors, and to allow any sort
of relationship to be defined among features.
Geo-database data model…cont’
◼ The geodatabase data model brings the physical data model closer to its
logical data model (with objects such as owners, buildings, parcels, and roads).
◼ This model lets you implement the majority of custom behaviors without
writing any code.
◼ Most behaviors are implemented through domains, validation rules and
other functions of the framework provided in ArcInfo.
◼ Writing software code is only necessary for the more specialized behavior
of features.
Computer representation of geographic
information
❖ We saw in the introduction that the real world has to
be represented in the computer.
❖ When representing a real-world field such as elevation
in the computer, we can either:
▪ Store as many (location, elevation) pairs as possible, or
▪ Find a symbolic representation of the elevation field
function as a formula in x and y, e.g. 3x² + 2x - 7y, which
after evaluation gives the elevation value at any given
(x, y).
❖ In GIS a combination of both approaches is used.
Computer representation…cont.

❖ A finite, intelligently chosen set of locations is stored together with
their elevations.
❖ The stored values are paired with an interpolation function to infer a
reasonable elevation value for locations that are not stored.
❖ The underlying principle is called spatial autocorrelation: locations
that are close are more likely to have similar values than locations
that are far apart.
❖ In GIS, fields are usually implemented with a tessellation approach
and objects with a (topological) vector approach.
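Pairing stored samples with an interpolation function can be sketched as follows. The slides do not name a particular interpolation function, so inverse distance weighting (IDW) is swapped in here as one simple choice that reflects spatial autocorrelation: nearer samples get larger weights. The sample points and the name `idw` are made up.

```python
import math

# Made-up stored samples: ((x, y), elevation in metres).
samples = [((0.0, 0.0), 100.0), ((10.0, 0.0), 120.0), ((0.0, 10.0), 140.0)]


def idw(x, y, points=samples, power=2.0):
    """Estimate the field value at (x, y) by inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (px, py), value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value          # exactly on a stored location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

At a stored location the stored value is returned unchanged; elsewhere the estimate is pulled towards the closest samples.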
Computer representation…cont.

❖Modeling
Raster representation
❖ Regular tessellation

❖ Square (pixel), hexagonal and triangular cells


❖ A tessellation is a partition of space into mutually
exclusive cells that together make up the complete
study space.
❖ E.g. Landsat image
Regular tessellation

❖ With each cell a thematic value is associated that
characterizes that part of space.
❖ The cells are the same in size and shape.
❖ The smaller the area of land that each cell represents,
the higher the resolution of the data and the larger the
file needed to store the data.
Regular tessellation…cont.

❖ The size of the file increases rapidly with resolution.
▪ E.g.
1. If a cell represents a 250 x 250 m area on the ground, how many
cells are needed to represent 1 km² on the ground?
2. How many cells are needed to represent a 1 x 1 km area if each cell
represents a 250 x 250 m area on the ground?
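The exercise above is simple arithmetic, sketched here with a hypothetical helper `cells_needed` for a square area: 1000 m / 250 m = 4 cells per side, so 4 x 4 = 16 cells.

```python
import math


def cells_needed(extent_m: float, cell_size_m: float) -> int:
    """Number of square cells needed to cover a square area of the given side length."""
    per_side = math.ceil(extent_m / cell_size_m)
    return per_side * per_side


print(cells_needed(1000, 250))   # 16
print(cells_needed(1000, 125))   # 64
```

Halving the cell size from 250 m to 125 m quadruples the number of cells, which is why file size grows rapidly with resolution.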
Regular tessellation…cont.

❖ The most commonly used regular tessellation is the square tessellation.
❖ The size of the area that a single raster cell represents is called the
raster resolution.
❖ An interpolation technique is applied to assign a value to each cell.
❖ The location associated with a raster cell is fixed by convention:
▪ Cell centroid or
▪ Lower left corner
❖ In a raster, many cells contain the same value as their neighbors,
so the same value is stored many times, which significantly increases
the file size.
Irregular tessellation

❖ Quadtree
Irregular tessellation…cont.

❖ Regular tessellations are not adaptive to the phenomena
they represent.
❖ Effort has therefore been put into irregular tessellations.
▪ Irregular tessellations partition space into mutually disjoint
cells, but now the cells may vary in size and shape, adapting
to the spatial phenomena that they represent.
▪ Irregular tessellations are more complex than regular
tessellations.
▪ Irregular tessellations are more adaptive, which helps to
reduce the amount of memory used to store the data.
Irregular tessellation…cont.

❖ A quadtree is based on a regular tessellation of square cells,
but takes advantage of cases where neighboring cells have the
same field value, so that they can together be represented as
one bigger cell.
❖ A quadtree is constructed by repeatedly splitting the area
into four quadrants (NW, NE, SE, SW).
❖ The splitting stops when all cells in a quadrant have the
same field value.
❖ A higher level of resolution is provided only where it is
needed.
Irregular tessellation…cont.

❖ In a quadtree, a coarse resolution is used to encode large
homogeneous areas.
❖ A finer resolution is used for areas of high spatial variability.
❖ The procedure produces an upside-down tree-like structure called a
quadtree.
❖ Quadtrees are adaptive because they apply the spatial autocorrelation
principle.
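The quadtree construction described above can be sketched recursively. This is a minimal illustration, assuming a square grid whose side length is a power of two; the function names and the sample grid are made up. A homogeneous block is stored as a single value, and a mixed block as a dictionary of four quadrants.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return one value for a homogeneous block, else a dict of four subtrees."""
    if size is None:
        size = len(grid)              # assumes a square, power-of-two grid
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first                  # all cells equal: stop splitting
    half = size // 2
    return {
        "NW": build_quadtree(grid, row, col, half),
        "NE": build_quadtree(grid, row, col + half, half),
        "SE": build_quadtree(grid, row + half, col + half, half),
        "SW": build_quadtree(grid, row + half, col, half),
    }


def count_leaves(node):
    """Number of cells actually stored in the quadtree."""
    if isinstance(node, dict):
        return sum(count_leaves(child) for child in node.values())
    return 1


grid = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 3, 2],
    [1, 1, 2, 2],
]
tree = build_quadtree(grid)
```

For this grid only the SE quadrant is split further, so the tree stores 7 leaves instead of 16 raster cells.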
Summary on tessellations

❖ Tessellations cut up the study space into cells and assign a
value to each cell.
❖ The study space is cut up arbitrarily, so cell boundaries
usually do not coincide with boundaries of real-world phenomena.
❖ Tessellations do not explicitly store georeferences of the
phenomena; instead they store the georeference of the lower left
corner of the raster, which implicitly provides georeferences for
all cells in the raster.
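Implicit georeferencing can be sketched as follows. The sketch assumes square cells, rows counted upward from the bottom edge and columns counted from the left edge; real raster formats often count rows from the top instead. The function names and the origin coordinates are hypothetical.

```python
def cell_lower_left(origin, cell_size, row, col):
    """Lower left corner of a cell, derived from the raster's lower left corner."""
    origin_x, origin_y = origin
    return (origin_x + col * cell_size, origin_y + row * cell_size)


def cell_centroid(origin, cell_size, row, col):
    """Centroid of a cell under the same conventions."""
    x, y = cell_lower_left(origin, cell_size, row, col)
    return (x + cell_size / 2, y + cell_size / 2)


origin = (500000.0, 1400000.0)   # made-up lower left corner of the raster, in metres
corner = cell_lower_left(origin, 250.0, row=2, col=3)
centre = cell_centroid(origin, 250.0, row=2, col=3)
```

Only the origin and the cell size are stored; every cell position follows from its row and column index.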
Vector representation

❖ Vector representations explicitly associate georeferences
with geographic phenomena.
❖ A georeference is a coordinate pair from some geographic
space, also known as a vector.
Vector representation…cont

❖ The vector data model provides for the precise positioning of features in space.
❖ Vector representations can be built using:
▪ Triangulated Irregular Networks (TIN),
▪ Points,
▪ Lines,
▪ Areas.
❖ A TIN can be used to represent any continuous field.
▪ TINs use a feature data model to represent surfaces.
▪ TIN is the standard implementation technique for digital terrain models (DTM).
▪ TIN is built from a set of locations for which we have a measurement,
for instance an elevation.
▪ The locations can be arbitrarily scattered in space and are usually not
in a regular grid.
▪ Any location together with its elevation value can be viewed as a point
in 3D space.
TIN

❖ TINs represent a surface as a set of irregularly located points linked to
form a network of triangles with z-values stored at the nodes.
❖ Since the nodes can be placed irregularly over the surface, the resolution
can be adjusted according to the homogeneity or heterogeneity of the area.
TIN…cont.

❖ Using a TIN model, terrain parameters such as
elevation, slope and aspect can be calculated for
each location in an area.
❖ Where the elevation values are an irregular set of
data points, a regular grid of points can be
generated by estimating values for the cells that
do not contain data points.
❖ The process of estimating values for missing
points is called interpolation.
TIN…cont.

❖ From the 3D points it is possible to construct an
irregular tessellation made of triangles.
TIN…cont.

❖ In 3D space, three points uniquely determine a plane, as long as they are not
collinear.
❖ A plane fitted through these points has a fixed aspect and gradient and can be
used to compute an approximation of the elevation at other locations.
❖ There are many different tessellations for a given input set of anchor points.
❖ Some tessellations are better than others in that they make smaller errors of
elevation approximation.
❖ The triangulation that is considered optimal in this respect is called the
Delaunay triangulation.
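Approximating elevation from the plane through three anchor points can be sketched as below. The function name and the sample points are made up; the plane's normal vector is obtained as the cross product of two triangle edges.

```python
def plane_elevation(p1, p2, p3, x, y):
    """Elevation at (x, y) on the plane through three 3D anchor points."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    # Two edge vectors of the triangle.
    ux, uy, uz = x2 - x1, y2 - y1, z2 - z1
    vx, vy, vz = x3 - x1, y3 - y1, z3 - z1
    # Normal vector of the plane = cross product of the edges.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("anchor points are collinear in plan view")
    # Plane equation nx*(x-x1) + ny*(y-y1) + nz*(z-z1) = 0, solved for z.
    return z1 - (nx * (x - x1) + ny * (y - y1)) / nz


# Made-up anchor points lying on the plane z = 10 + x + 2y.
a, b, c = (0, 0, 10), (10, 0, 20), (0, 10, 30)
print(plane_elevation(a, b, c, 5, 5))   # 25.0
```

Because the plane has a fixed gradient and aspect, the same three anchors give an estimate for every location inside the triangle.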
TIN…cont.

❖ Two important properties of a Delaunay triangulation:
▪ The triangles are as equilateral as they can be, given the
set of anchor points,
▪ For each triangle, the circumcircle through its three anchor
points does not contain any other anchor point.
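The circumcircle property can be checked with a standard determinant test. This sketch uses plain floating-point arithmetic and hypothetical function names; it treats a point lying exactly on the circle as not inside.

```python
def orientation(a, b, c):
    """Positive if a, b, c are in counterclockwise order, negative if clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle a, b, c."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # The sign of det flips with the vertex order, so normalise by the orientation.
    return det * orientation(a, b, c) > 0


def satisfies_delaunay(triangle, other_points):
    """True if no other anchor point falls inside the triangle's circumcircle."""
    a, b, c = triangle
    return not any(in_circumcircle(a, b, c, p) for p in other_points)
```

For the triangle (0, 0), (4, 0), (0, 4), whose circumcircle is centred on (2, 2), the point (2, 2) violates the property while (5, 5) does not.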
Point representation

❖ Points are defined as single coordinate pairs (x, y) in 2D or
coordinate triplets (x, y, z) in 3D.
❖ Points are used to represent objects that are best described as
shapeless, sizeless, single-locality features.
Line representations

❖ Line data are used to represent one-dimensional objects, such as roads,
railroads, canals, rivers and power lines.
❖ In some software a line is called an arc or an edge.
❖ The straight parts of a line between two consecutive vertices or end nodes
are called line segments.
❖ GISs store a line as a simple sequence of coordinates of its end nodes
and vertices (assuming all segments are straight).
❖ Collections of (connected) lines may represent
phenomena that are best viewed as networks.
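A line stored as a sequence of coordinates can be sketched as a plain list, from which segments and length are derived. The sample coordinates and function names are made up.

```python
import math

# End node, one intermediate vertex, end node (made-up coordinates in metres).
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]


def segments(line):
    """Straight parts between consecutive vertices or end nodes."""
    return list(zip(line[:-1], line[1:]))


def length(line):
    """Total length, assuming all segments are straight."""
    return sum(math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in segments(line))
```

The road above has two segments of length 5 and 6, so `length(road)` gives 11.0.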
Area representations

❖ When area objects are stored using a vector approach, the usual
technique is to apply a boundary model.
❖ Each area feature is represented by some arc/node structure that
determines a polygon as the area's boundary.
❖ Areas are represented by their boundaries, and each boundary is a
cyclic sequence of line features.
❖ Storing each polygon's full boundary separately duplicates shared
borders, producing large volumes of data that are difficult to analyze.
Area representations…cont.

❖ The boundary model is an improved model that reduces the amount of data.
❖ The model stores parts of a polygon's boundary as non-overlapping
arcs and indicates which polygon is on the left and right of each arc.
❖ The boundary model is also called the topological data model (as it captures
some topological information, such as polygon neighborhood).
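The left/right bookkeeping of the boundary model can be sketched with a small arc table. The arc identifiers, polygon names and the `"outside"` label are hypothetical; the point is that each shared arc is stored once and polygon neighborhood is read directly from it.

```python
# Each arc is stored once as (arc id, polygon on the left, polygon on the right).
boundary_arcs = [
    ("a1", "outside", "A"),
    ("a2", "A", "B"),
    ("a3", "outside", "B"),
    ("a4", "B", "C"),
    ("a5", "outside", "C"),
]


def neighbours(polygon):
    """Polygons that share at least one arc with the given polygon."""
    found = set()
    for _arc_id, left, right in boundary_arcs:
        if left == polygon and right != "outside":
            found.add(right)
        elif right == polygon and left != "outside":
            found.add(left)
    return sorted(found)


def boundary_of(polygon):
    """Identifiers of the arcs that make up the polygon's boundary."""
    return [arc_id for arc_id, left, right in boundary_arcs
            if polygon in (left, right)]
```

Neighborhood queries become simple table lookups; no coordinate comparison is needed.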
Spaghetti Data Model

▪ The spaghetti analogy refers to the messy representation of
geographic features by a heap of tangled lines that
twist, curl and cross one another, no more structured
than a plate of spaghetti.
▪ Representing geographic features using a spaghetti
data structure poses four distinct problems:
1. The lines have little or no relationship to the geographic
features they represent.
2. The relationships between adjacent features are not explicitly
stored, so advanced analyses are more difficult to conduct.
3. The shared boundaries of adjacent polygons must be represented
twice, which creates data redundancy.
4. Inconsistencies in the boundaries between adjacent
polygons can cause overlaps or gaps.
Spaghetti Data Model

❖ In the spaghetti model the paper map is translated line-for-line into a list
of XY coordinates.
❖ A point is encoded as a single XY coordinate pair and a line as a
string of XY coordinate pairs.
❖ An area is represented as a polygon and is recorded as a closed loop
of XY coordinates that define its boundary.
❖ A file of spatial data constructed in this manner is essentially a
collection of coordinate strings with no inherent structure (which is
why it is called the spaghetti model).
Spaghetti Data Model…cont.

❖ The spaghetti model is very inefficient for most types of spatial analysis (since any
spatial relationships must be derived by computation).
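The redundancy and the need to compute relationships can be seen in a small sketch. The two parcels and the function names are made up; each polygon is a closed loop of XY pairs, and adjacency has to be derived by comparing coordinates. The comparison below only detects edges whose end points match exactly.

```python
# Two adjacent parcels, each stored as its own closed loop of XY coordinates.
parcel_a = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
parcel_b = [(10, 0), (20, 0), (20, 10), (10, 10), (10, 0)]


def edges(ring):
    """Undirected edges of a closed coordinate loop."""
    return {frozenset(pair) for pair in zip(ring[:-1], ring[1:])}


def shared_edges(ring1, ring2):
    """Edges stored in both loops, found only by comparing coordinates."""
    return edges(ring1) & edges(ring2)


common = shared_edges(parcel_a, parcel_b)
```

The border from (10, 0) to (10, 10) appears in both loops, so it is stored twice; if the two copies were digitized slightly differently, the parcels would overlap or leave a gap and this comparison would no longer find the shared edge.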
Topology

▪ Topology is defined as the spatial relationships between connecting or
adjacent features.
▪ Topological data structures were developed to provide a
mathematically robust means of verifying data entry and to
increase the computational efficiency of complex queries that
involve adjacency or containment.
▪ Topology is the mathematical method used to define
spatial relationships.
▪ Topology deals with spatial properties that do not change
under certain transformations.
Topology and spatial relationships

❖ Topological relationships (intersection, interior, etc.) are invariant under
continuous transformation. Such properties are called topological properties and
the transformation is called a topological mapping.
Topology and spatial relationships…cont.

❖ The concept of topology is built on arcs, each of which is a series of points
that starts and ends at a node.
❖ A node is an intersection point where two or more arcs meet.
❖ Isolated nodes, not connected to arcs, represent points.
❖ A polygon is composed of a closed chain of arcs that represents the
boundary of the area.
❖ To relate the map features to real-world positions, the XY coordinates
are needed; they are stored in the arc coordinate data table.
❖ Each arc is represented by one or more straight line segments defined
by a series of coordinates.
Topology and spatial relationships…cont.

❖ Mathematical properties of the geometric space used for spatial data:
▪ In 3D we can define features such as points, lines, polygons and volumes as geometric
primitives of the respective dimension.
• A point is zero-dimensional,
• A line is one-dimensional,
• A polygon is two-dimensional,
• A volume is three-dimensional.
❖ Within topological space, features that are easy to handle should be used as
representations of geographic fields.
❖ These features are called simplices, as they are the simplest geometric shapes of some
dimension.
▪ Point (0-simplex),
▪ Line segment (1-simplex),
▪ Triangle (2-simplex),
▪ Tetrahedron (3-simplex).
▪ When various simplices are combined into a single feature, it is called a simplicial complex.
Topology and spatial relationships…cont
Topology of two dimensions

❖ The topological properties of interior and boundary can be used to define
relationships between spatial features.
❖ The interior of a region R is the largest set of points of R for which we can
construct a disk-like environment around each point (no matter how small)
that also falls completely inside R.
❖ The boundary of R is the set of those points belonging to R that do not
belong to the interior of R (one cannot construct a disk-like environment
around such points that still belongs to R completely).
Topology of two dimensions…cont.

❖ If two regions A and B have a meet relationship, this can be stated
mathematically in terms of their interiors and boundaries.
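The formula itself is not reproduced in this text. Assuming the usual formalization in terms of interior and boundary as defined earlier, the meet relationship is commonly written as follows: the interiors of A and B do not intersect, but their boundaries do.

```latex
A \text{ meets } B \;\iff\;
\operatorname{interior}(A) \cap \operatorname{interior}(B) = \emptyset
\;\wedge\;
\operatorname{boundary}(A) \cap \operatorname{boundary}(B) \neq \emptyset
```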

❖ There are eight such spatial relationships: meets, disjoint, equals, inside,
covered by, contains, covers and overlap.
❖ These relationships can be used in queries against a spatial database.
Topology of two dimensions…cont.

❖ Spatial relationships between two regions are derived from the topological
invariants of the intersections of boundary and interior.
ArcGIS selection by location queries

1. Intersect
2. Are within a distance of
3. Completely contain
4. Are completely within
5. Have their center in
6. Share a line segment with
7. Touch the boundary of
8. Are identical to
9. Are crossed by the outline of
10. Contain
11. Are contained by
Representations of geographic fields

❖ Geographic fields can be represented by a tessellation or by vectors (TIN).
▪ A raster representation of elevation.
Representations of geographic fields…cont.

❖ A vector representation of elevation using isolines.
▪ An isoline is a linear feature that connects points with equal
field value.
▪ When the field is elevation, the isolines are called contour lines.
Representation of geographic objects

❖ Geographic objects are mostly represented by vectors.
❖ Objects are identified by parameters such as location, shape, size and
orientation.
Representation of geographic objects…cont.

❖ Objects can also be represented by tessellations.
▪ E.g. remotely sensed images.
▪ Boundaries appear ragged.
▪ Line and point objects are more awkward to represent using
rasters.
Advantages and disadvantages
❖ Raster representations
▪ Advantages
1. Simple data structure
2. Simple representation of overlays
3. Efficient for image processing
▪ Disadvantages
1. Less compact data structure
2. Difficulties in representing topology
3. Cell boundaries independent of feature boundaries
❖ Vector representations
▪ Advantages
1. Efficient representation of topology
2. Adapts well to scale changes
3. Allows representing networks
4. Allows easy associations with attribute data
▪ Disadvantages
1. Complex data structure
2. Overlay more difficult to implement
3. Inefficient for image processing
4. More update-intensive