Unit 2

Gis Unit-2 Notes

Uploaded by

arunasekaran
Copyright
© © All Rights Reserved

OCE 552 - Geographic

Information System
UNIT II SPATIAL DATA MODELS
9

Database Structures – Relational,


Object Oriented – ER diagram - spatial
data models – Raster Data Structures
– Raster Data Compression - Vector
Data Structures - Raster vs Vector
Models - TIN and GRID data models -
OGC standards - Data Quality.
Database
Structures
Database Structures:
A geodatabase can be designed for
single or multiple users.
A single-user database can be a
personal geodatabase or a file
geodatabase.
A personal geodatabase stores data as
tables in a Microsoft Access database.
A file geodatabase, on the other hand,
stores data in many small-sized binary
files in a folder.
The geodatabase organizes vector
data sets into feature classes and
feature datasets.
In a geodatabase, feature classes
can be standalone feature classes or
members of a feature dataset.
The presence of feature attribute and
nonspatial data tables means that a GIS
requires a database management
system (DBMS) to manage these tables.
A DBMS is a software package that
enables us to build and manipulate a
database. A DBMS provides tools for
data input, search, retrieval,
manipulation, and output.
For example, ArcGIS uses Microsoft
Access for managing personal
geodatabases.
Many GIS packages also have
database connection capabilities to
access remote databases. This is
important for GIS users who routinely
access data from centralized
databases.
For example, GIS users at a ranger
district office may regularly retrieve
data maintained at the headquarters
office of a national forest. This
scenario represents a client-server
distributed database system.
THE RELATIONAL
MODEL
A database is a collection of
interrelated tables in digital
format. At least four types of
database designs have been
proposed in the literature:
Flat file,
Hierarchical,
Network, and
Relational
A flat file contains all data in a large table.
A feature attribute table is like a flat file.
A hierarchical database organizes its data
at different levels and uses only the one-to-many
association between levels.
A network database builds connections
across tables through explicit linkages
between the tables.
A common problem with both the hierarchical
and the network database designs is that
the linkages (i.e., access paths)
between tables must be known in
advance and built into the database at
design time.
GIS packages, both commercial and
open source, typically use the
relational model for database
management.
A relational database is a collection of
tables, also called relations, that can be
connected to each other by keys.
A primary key represents one or more
attributes whose values can uniquely
identify a record in a table.
A foreign key is one or more attributes
that refer to a primary key in another
table.
In GIS, the primary key and the foreign
key often have the same name, such as
the feature ID. In that case, the feature
ID is also called the common field.
In the figure, Zonecode is the common
field connecting zoning and parcel,
and PIN (parcel ID number) is the
common field connecting parcel
and owner. When used together,
the fields can relate zoning and
owner.
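The chain of common fields described above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch: only the table names (zoning, parcel, owner) and the key names (Zonecode, PIN) come from the text; the other columns and all row values are hypothetical.

```python
import sqlite3

# In-memory database; all row values below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zoning (Zonecode INTEGER PRIMARY KEY, Zoning TEXT);
    CREATE TABLE parcel (PIN TEXT PRIMARY KEY, Acres REAL,
                         Zonecode INTEGER REFERENCES zoning(Zonecode));
    CREATE TABLE owner  (Owner TEXT, PIN TEXT REFERENCES parcel(PIN));
    INSERT INTO zoning VALUES (1, 'Residential'), (2, 'Commercial');
    INSERT INTO parcel VALUES ('P101', 1.0, 1), ('P102', 3.0, 2);
    INSERT INTO owner  VALUES ('Wang', 'P101'), ('Chang', 'P101'),
                              ('Smith', 'P102');
""")

# PIN links owner to parcel; Zonecode links parcel to zoning.
# Used together, the two common fields relate owner and zoning.
rows = conn.execute("""
    SELECT owner.Owner, parcel.PIN, zoning.Zoning
    FROM owner
    JOIN parcel ON owner.PIN = parcel.PIN
    JOIN zoning ON parcel.Zonecode = zoning.Zonecode
    ORDER BY owner.Owner
""").fetchall()

for row in rows:
    print(row)
```

No access path between owner and zoning is stored in advance; the relation is built at query time from the keys.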
Normalization
 Normalization is a process of
decomposition, taking a table with all
the attribute data and breaking it down
into small tables while maintaining the
necessary linkages between them.
Normalization is designed to achieve the
following objectives:
• To avoid redundant data in tables
• To ensure that attribute data in
separate tables can be maintained
and updated separately and can be
linked whenever necessary
• To facilitate a distributed database.
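The decomposition can be illustrated with a small sketch. The four PINs match the parcels mentioned in these notes, but the field names, owner names, acreages and zoning values are hypothetical and are not taken from Tables 2.1 and 2.2.

```python
# Hypothetical unnormalized table: owners repeat inside a row and the zoning
# description is repeated for every parcel that shares a zone code.
unnormalized = [
    {"PIN": "P101", "owners": ["Wang", "Chang"], "acres": 1.0,
     "zonecode": 1, "zoning": "Residential"},
    {"PIN": "P102", "owners": ["Smith", "Jones"], "acres": 3.0,
     "zonecode": 2, "zoning": "Commercial"},
    {"PIN": "P103", "owners": ["Costello"], "acres": 2.5,
     "zonecode": 2, "zoning": "Commercial"},
    {"PIN": "P104", "owners": ["Smith"], "acres": 1.0,
     "zonecode": 1, "zoning": "Residential"},
]


def normalize(rows):
    """Split the big table into parcel, owner and zoning tables."""
    parcel, owner, zoning = [], [], {}
    for r in rows:
        # Parcel table keeps the zone code as a foreign key only.
        parcel.append({"PIN": r["PIN"], "acres": r["acres"],
                       "zonecode": r["zonecode"]})
        # One owner row per (PIN, owner) pair removes the repeating group.
        for name in r["owners"]:
            owner.append({"PIN": r["PIN"], "owner": name})
        # Each zoning description is stored once, keyed by zone code.
        zoning[r["zonecode"]] = r["zoning"]
    return parcel, owner, zoning


parcel, owner, zoning = normalize(unnormalized)
print(len(parcel), len(owner), zoning)
```

After the split, a zoning description can be updated in one place, and PIN and the zone code remain as the linkages between the tables.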
The map shows four land parcels with the PINs
P101, P102, P103 and P104.
[Table 2.1: Unnormalised table]
[Table 2.2: First normalisation]
[Fig 2.4: Second normalisation]
[Fig 2.5: Final normalised table]
Types of Relationship
A relational database may
contain four types of
relationships or cardinalities
between tables or, more
precisely, between records in
tables:
one-to-one, one-to-many, many-
to-one, and many-to-many
Four types of data relationships between tables
OBJECT ORIENTED
DATABASE STRUCTURE:
• An object-based spatial database
is a spatial database that stores
locations as objects.
• The object-based spatial model treats
the world as a surface littered with
recognizable objects (e.g. cities,
rivers), which exist independent of
their locations.
• Objects can be as simple as polygons
and lines, or more complex, such as
cities.
• A field-based data model, by contrast, sees the world as a
continuous surface over which features (e.g. elevation)
vary. With an object-based spatial database, it is
easier to store additional attributes with the objects,
such as direction and speed.
• The geodatabase model supports an object-oriented
vector data model. In this model, entities
are represented as objects with properties, behaviour,
and relationships.
• These object types include simple objects, geographic
features (objects with location), network features
(objects with geometric integration with other
features), annotation features, and other more
specialized feature types.
Classes, Methods and Relationships

Each data model object is essentially an
instance of a class. Classes are object-oriented
constructs which group objects
that share the same set of attributes and
methods.
Methods are the functions that define the
interaction of objects with the outside
world.
In addition to a description for objects, its
attributes and behaviors, a data model
also explains the relationship between
• An example of a class is a Line feature, and
one of its instances might be a river. Attribute
fields of the river line are an integer identifier,
the number of line segments, and the start and end points
of each segment.
• Calculation of total flow volume using the river
dimension attributes is an example of a method
for the river object.
• In order to account for flow and interactions
between each river segment and the watershed,
and also to streamline query and storage, a definition
of (topological) relationships between classes is
needed.
The three main relationships between
classes that have been implemented
in the design of the hydrologic data
model are Generalization,
Association and Aggregation.
A generalization relationship
between any two classes means that
one of the classes (Child class) is
derived from the other (Base class).
Association shows the relationship
between instances of classes.
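A minimal sketch of these ideas in Python follows. The class names (LineFeature, River, Watershed), their attributes and the simple length × width × depth volume are hypothetical illustrations, not the classes of the hydrologic data model referred to above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LineFeature:
    """Base class: a line stored as an ordered list of (x, y) vertices."""
    feature_id: int
    vertices: list

    def length(self):
        # Sum the lengths of consecutive segments.
        return sum(math.dist(a, b)
                   for a, b in zip(self.vertices, self.vertices[1:]))


@dataclass
class River(LineFeature):
    """Generalization: River (child class) is derived from LineFeature."""
    mean_width: float = 0.0
    mean_depth: float = 0.0

    def channel_volume(self):
        # A method computed from the river's dimension attributes
        # (a deliberately crude stand-in for a flow volume calculation).
        return self.length() * self.mean_width * self.mean_depth


@dataclass
class Watershed:
    """Holds references to the River instances it is related to."""
    name: str
    rivers: list = field(default_factory=list)


river = River(1, [(0, 0), (3, 4), (3, 10)], mean_width=2.0, mean_depth=0.5)
basin = Watershed("Upper basin", rivers=[river])
print(river.length(), river.channel_volume(), len(basin.rivers))
```

River inherits `length` from its base class without redefining it, which is the practical benefit of the generalization relationship.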
Spatial object class inheritance hierarchy
[Figure: classes Spatial, Point, Polyline, Open Polyline,
Closed Polyline, Polygon, Extent]
ENTITY RELATIONSHIP
MODEL (ER MODEL)
 The entity relationship (ER) model represents
the conceptual design of a database. The ER
diagram helps in understanding the components
of a database and relationships among them.
Entity Record
 An entity is a real world item that exists on its
own. The set of all possible values for an entity
is the entity type. For example, a particular
student such as ‘Ravi Kumar’ is an entity record.
Student is the entity type in this case.
 In ER diagram we show entity type as a
rectangle containing the type name.
Attribute
Properties that describe an entity
are known as its attributes. The
value of an attribute could be
expressed in numbers or in text.
In ER diagram attributes are
represented by ovals attached to
the entity by a line.
Attributes can be classified as:
Key attributes: An attribute whose
values are distinct for each individual
entity record and are used to
identify an individual entity record
is known as a key attribute.
For example, in the student
entity type, StudentID is the key
attribute, since no two students can
have the same StudentID.
A key attribute is underlined in an ER
diagram.
• Non-key attributes: Attributes that are not unique
but are used to describe the entities are known as
non-key attributes. The name, age and address of a student
are non-key attributes.
• Simple: Attributes that can’t be divided into
subparts are called simple attributes. For example,
StudentID, which is just a number, is a simple
attribute.
• Composite: Attributes that can be divided into
subparts, with each subpart having its own
independent meaning, are composite attributes. For
example, the Name of a student can be divided into two
parts, i.e. first name and last name. This can be
illustrated by branching off the components of the
attribute.
• Single valued: Attributes that can hold only
a single value at a time are called single
valued attributes. The age of a student can’t
have more than one value, and hence it is a
single valued attribute.
• Multiple valued: Attributes that can have
more than one value are called multiple
valued attributes. For example, the contact
number of a student can hold two or more
phone numbers.
• A multi valued attribute is shown as a double oval in an ER diagram.
• Derived attributes: The attributes that are
derived using a mathematical formula and
operations on other attributes are called derived
attributes.
• Stored attributes: The attributes from which
other attributes can be derived are called stored
attributes. The age of a student can be calculated
by counting the number of years from the
date of birth to the present date. In this case age
is the derived attribute and date of birth is the
stored attribute. In an ER diagram a derived attribute
is represented with a dotted oval and a line.
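The stored versus derived distinction can be sketched as follows. The Student class and its field names are hypothetical; only the idea that age is computed from the date of birth comes from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Student:
    student_id: int        # key attribute
    date_of_birth: date    # stored attribute

    def age(self, today):
        """Derived attribute: completed years between birth and `today`."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if the birthday has not yet occurred this year.
        if (today.month, today.day) < (self.date_of_birth.month,
                                       self.date_of_birth.day):
            years -= 1
        return years


student = Student(student_id=1, date_of_birth=date(2000, 6, 15))
print(student.age(date(2024, 6, 14)), student.age(date(2024, 6, 15)))
```

Because age is never stored, it cannot go out of date; it is recomputed from the stored attribute whenever it is needed.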
Relationship
• A relationship is an association among entity types.
It is represented as a diamond in an ER diagram.
• For example, an entity ‘student’ can be associated
with another entity ‘class’; ‘attends’ is the
relationship between the two entities.
• The degree of a relationship type is the number of
participating entity types. The above example has
degree 2 and is therefore a binary relationship.
Cardinality
Cardinality denotes the occurrences of data
on either side of a relation.
 The cardinality ratio for a binary
relationship specifies the maximum
number of relationship instances an entity
can participate in.
A one to one relationship indicates that
a single instance of one entity is associated
with a single instance in the related entity.
A one to many or a many to one
relationship indicates that a single
instance of one entity is associated
with one or more instances of the
related entity.
A many to many relationship
indicates that either entity
participating in the relationship
may have many instances.
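As a rough illustration, the cardinality observed in a set of relationship instances can be classified by counting how many partners each record has. The function name and the sample pairs are hypothetical, and the result describes the sample data only; the cardinality ratio in an ER design is a rule about what is allowed, not a count.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify the relationship seen in (left_id, right_id) pairs."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    # Does any left record link to several right records, and vice versa?
    left_has_many = any(len(v) > 1 for v in left_to_right.values())
    right_has_many = any(len(v) > 1 for v in right_to_left.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical instances: a department has several faculty members,
# and students and courses are linked in both directions.
print(cardinality([("D1", "F1"), ("D1", "F2"), ("D2", "F3")]))
print(cardinality([("S1", "C1"), ("S1", "C2"), ("S2", "C1")]))
```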
Example: The diagram shown
below represents the academic
functioning of a college. There are
five entities viz. Department,
Faculty, Student, Course, and
Hostel. All the five entities have
their own attributes. DNumber,
FacultyID, StudentID, CourseID, and
HostelID are the key attributes of
Department, Faculty, Student,
Course and Hostel respectively.
ER-Diagram showing academic functioning
of a college
Spatial Data
Model
Types of Raster Data
A large variety of data that we use in GIS
are encoded in raster format. They include:
Satellite Imagery
USGS Digital Elevation Models (DEMs)
Non-USGS DEMs
Global DEMs
Digital Orthophotos
Bi-Level Scanned Files
Digital Raster Graphics (DRGs)
Graphic Files
GIS Software-Specific Raster Data
Satellite Imagery
Remotely sensed satellite data are
familiar to GIS users.
The spatial resolution of a satellite
image relates to the ground pixel size.
For example, a resolution of 30 meters
means that each pixel covers 900 square
meters (30 m × 30 m) on the ground.
The pixel value, also called the
brightness value, represents light
energy emitted or reflected from the earth's
surface.
USGS-Digital Elevation
Models
A digital elevation model (DEM) consists of
an array of uniformly spaced elevation data.
• A DEM is point-based, but it is easily converted
to raster data by placing each elevation
point at the center of a cell.
• Most GIS users in the US use DEMs from the
USGS.
• USGS DEMs include the following:
◦ 7.5-minute DEM
◦ 30-minute DEM
◦ 1-degree DEM
◦ Alaska DEM
7.5 Minute DEM
• The 7.5-minute DEM provides
elevation data at a spacing of 30
meters or 10 meters on a grid
measured in UTM coordinates that
are referenced to either NAD27 or
NAD83.
• Each DEM covers a 7.5 by 7.5
minute block that corresponds to a
USGS 1:24,000 scale quadrangle.
Vector Data
Structures
Vector data structure
Geographic entities encoded
using the vector data model are
often called features.
The features can be divided into
two classes:
a. Simple features
b. Topological features
a. Simple features
These are easy to create, store
and are rendered on screen very
quickly. They lack connectivity
relationships and so are
inefficient for modeling
phenomena conceptualized as
fields.
Point entities :
These represent all geographical
entities that are positioned by a
single XY coordinate pair.
Along with the XY coordinates, the
point must store other information,
such as what the point
represents.
Line entities: Linear features made by
tracing two or more XY coordinate pairs.
Simple line: It requires a start and an end
point.
Arc: A set of XY coordinate pairs describing
a continuous complex line. The shorter the
line segments and the higher the number of
coordinate pairs, the closer the chain
approximates a complex curve.
Simple polygons: Enclosed structures
formed by joining a set of XY coordinate
pairs.
b. Topological features
A topology is a mathematical procedure
that describes how features are spatially
related and ensures data quality of the
spatial relationships.
Topological relationships include the
following three basic elements:
I. Connectivity: Information about linkages
among spatial objects
II. Contiguity: Information about
neighbouring spatial objects
III. Containment: Information about
inclusion of one spatial object within
another spatial object
Connectivity
Arc node topology defines
connectivity –
1. Arcs are connected to each other if
they share a common node. This is the
basis for many network tracing and
path finding operations.
2. Arcs represent linear features and the
borders of area features.
3. Every arc has a from-node which is
the first vertex in the arc and a to-node
which is the last vertex.
Arc-node Topology
Nodes can, however, be used to
represent point features which
connect segments of a linear
feature (e.g., intersections
connecting street segments,
valves connecting pipe
segments).

Node showing intersection


Arc-Node Topology with
list
Arc-node topology is supported
through an arc-node list. For each arc
in the list there is a from node and a
to node. Connected arcs are
determined by common node
numbers.
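A small sketch shows how connected arcs can be found from an arc-node list. The arc and node numbers are hypothetical and do not reproduce the figure in these notes.

```python
from collections import defaultdict

# Hypothetical arc-node list: arc id -> (from-node, to-node).
arc_node_list = {
    1: (11, 12),
    2: (12, 13),
    3: (12, 15),
    4: (13, 15),
    5: (15, 14),
    6: (15, 16),
}


def connected_arcs(arcs):
    """Return, for each arc, the set of arcs that share a node with it."""
    arcs_at_node = defaultdict(set)
    for arc_id, (from_node, to_node) in arcs.items():
        arcs_at_node[from_node].add(arc_id)
        arcs_at_node[to_node].add(arc_id)
    result = {arc_id: set() for arc_id in arcs}
    for members in arcs_at_node.values():
        for arc_id in members:
            # Every other arc meeting at this node is connected to arc_id.
            result[arc_id] |= members - {arc_id}
    return result


connections = connected_arcs(arc_node_list)
print(connections[1])
print(connections[4])
```

Only node numbers are compared, so no coordinate geometry is needed to answer connectivity questions; this is what makes network tracing on topological data cheap.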
Contiguity
Polygon topology defines
contiguity. The polygons are said to
be contiguous if they share a
common arc.
Contiguity allows the vector data
model to determine adjacency.
The from-node and to-node of an arc
indicate its direction, which helps
determine the polygons on its left
and right sides.
In the illustration above, polygon B is
on the left and polygon C is on the
right of the arc 4.
Polygon A is outside the boundary of
the area covered by polygons B, C
and D. It is called the external or
universe polygon
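Adjacency can be derived from a left-right list in a few lines. In this sketch, only the entry for arc 4 (polygon B on the left, polygon C on the right) and the role of A as the universe polygon come from the text; the other arcs are hypothetical.

```python
from collections import defaultdict

# Left-right list: arc id -> (left polygon, right polygon).
# Arc 4 follows the text; the remaining entries are made up.
left_right_list = {
    1: ("A", "B"),
    2: ("A", "C"),
    3: ("A", "D"),
    4: ("B", "C"),
    5: ("C", "D"),
    6: ("D", "B"),
}


def adjacent_polygons(left_right):
    """Two polygons are contiguous if they lie on opposite sides of an arc."""
    neighbours = defaultdict(set)
    for left, right in left_right.values():
        neighbours[left].add(right)
        neighbours[right].add(left)
    return neighbours


adjacency = adjacent_polygons(left_right_list)
# Drop the universe polygon A to list only real neighbours of B.
print(sorted(adjacency["B"] - {"A"}))
```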
Containment

Geographic features cover distinguishable
areas on the surface of the earth.
The polygons can be simple or they can
be complex, with a hole or island in the
middle.
In the illustration, assume a
lake with an island in the middle.
The lake actually has two boundaries, one
which defines its outer edge and the
other (island) which defines its inner
edge.
The polygon D is made up of arcs
5, 6 and 7.
The 0 before the 7 indicates that
arc 7 creates an island in the
polygon.
Polygons are represented as an
ordered list of arcs and not in
terms of X, Y coordinates. This is
called Polygon-Arc topology
Since arcs define the boundary of
polygon, arc coordinates are
stored only once, thereby
reducing the amount of data and
ensuring no overlap of
boundaries of the adjacent
polygons.
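The polygon-arc list for polygon D can be read with a short helper. The arc list 5, 6, 0, 7 follows the description above; the function name and the returned structure are assumptions made for this sketch.

```python
# Polygon-arc list: polygon id -> ordered arc ids, with 0 marking the
# start of an island (inner boundary), as described for polygon D.
polygon_arc_list = {"D": [5, 6, 0, 7]}


def split_rings(arc_ids):
    """Separate the outer boundary arcs from the arcs of each island."""
    outer, islands = [], []
    current = outer
    for arc_id in arc_ids:
        if arc_id == 0:
            # A zero starts a new island ring.
            current = []
            islands.append(current)
        else:
            current.append(arc_id)
    return outer, islands


outer, islands = split_rings(polygon_arc_list["D"])
print(outer, islands)
```

The polygon stores only arc identifiers. The coordinates of each arc live once in a separate arc table, so two adjacent polygons that share an arc refer to the same stored boundary.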
Polygon as a topological feature
Raster Data
Structures
Raster Data
Compression
Raster Vs
Vector Models
Comparison between Vector and Raster
Data Models
Data Model | Advantages | Disadvantages
Raster | Simple data structure | Cell size determines the resolution at which the data is represented
Raster | Compatible with remote sensing or scanned data | Requires a lot of storage space
Raster | Spatial analysis is easier | Projection transformations are time consuming
Raster | Simulation is easy because each unit has the same size and shape | Network linkages are difficult to establish
Vector | Data is represented at its original resolution and form without generalization | The location of each vertex is to be stored explicitly
Vector | Requires less storage space | Overlay based on criteria is difficult
Vector | Editing is faster and convenient | Spatial analysis is cumbersome
Vector | Network analysis is fast | Simulation is difficult because each unit has a different topological form
Vector | Projection transformations are |
Raster Data
Compression
Data compression refers to the
reduction of data volume, a topic
particularly important for data
delivery and Web mapping.
Data compression is related to
how raster data are encoded.
Quadtree and RLE, because of
their efficiency in data encoding,
can also be considered as data
compression methods.
A variety of techniques are
available for data compression.
They can be lossless or lossy.
A lossless compression
preserves the cell or pixel values
and allows the original raster or
image to be precisely
reconstructed.
RLE is an example of lossless
compression.
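A minimal sketch of run-length encoding shows why it is lossless. It assumes the simple (value, run length) variant applied to one raster row; GIS packages may instead record the row together with the start and end columns of each run.

```python
def rle_encode(row):
    """Encode one raster row as a list of (cell value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Rebuild the original row exactly from its runs."""
    return [value for value, count in runs for _ in range(count)]


row = [0, 0, 0, 1, 1, 0, 0, 0]
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

Decoding returns every original cell value, which is the defining property of a lossless method. The saving grows with the length of the runs, so RLE suits rasters with large uniform areas.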
A lossy compression cannot
reconstruct fully the original
image but can achieve higher
compression ratios than a
lossless compression.
Lossy compression is therefore
useful for raster data that are
used as background images
rather than for analysis
Newer image compression
techniques can be both lossless
and lossy. An example is MrSID
(Multi-resolution Seamless Image
Database) patented by
LizardTech Inc.
MrSID uses the wavelet
transform for data compression.
The wavelet-based compression
is also used by JPEG 2000 and
ECW (Enhanced Compressed
Wavelet).
The wavelet transform treats an
image as a wave and
progressively decomposes the
wave into simpler wavelets
Using a wavelet (mathematical)
function, the transform repetitively
averages groups of adjacent pixels
(e.g., 2, 4, 6, 8, or more) and, at
the same time, records the
differences between the original
pixel values and the average.
The differences, also called wavelet
coefficients, can be 0, greater than
0, or less than 0.
Consider, for example, a row of eight
pixels with the values (1, 3, 7, 9, 8, 8,
6, 2). Using the Haar function, we take
the average of each pair of
adjacent pixels. The averaging
results in the string (2, 8, 8, 4)
and retains the quality of the
original image at a lower
resolution.
But if the process continues, the
averaging results in the string (5,
6) and loses the darker center in
the original image.
Suppose that the process stops at
the string (2, 8, 8, 4). The wavelet
coefficients will be −1 (1 − 2), −1 (7
− 8), 0 (8 − 8), and 2 (6 − 4).
 If, however, a lossless compression
is needed, we can use the
coefficients to reconstruct the
original image. For example, 2 − 1 =
1 (the first pixel), 2 − (−1) = 3 (the
second pixel), and so on.
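The averaging and reconstruction steps can be checked with a short sketch. The pixel row (1, 3, 7, 9, 8, 8, 6, 2) is the one implied by the averages and coefficients quoted above; the function names are hypothetical.

```python
def haar_step(values):
    """One level of Haar averaging over adjacent pixel pairs."""
    averages, coefficients = [], []
    for first, second in zip(values[0::2], values[1::2]):
        average = (first + second) / 2
        averages.append(average)
        # Coefficient = first pixel of the pair minus the pair average.
        coefficients.append(first - average)
    return averages, coefficients


def haar_inverse(averages, coefficients):
    """Rebuild the pixel pairs from averages and wavelet coefficients."""
    pixels = []
    for average, coefficient in zip(averages, coefficients):
        pixels.extend([average + coefficient, average - coefficient])
    return pixels


pixels = [1, 3, 7, 9, 8, 8, 6, 2]
averages, coefficients = haar_step(pixels)
coarser, _ = haar_step(averages)
restored = haar_inverse(averages, coefficients)
print(averages, coefficients, coarser, restored)
```

Keeping the coefficients makes the step lossless. Discarding them, or rounding small ones to zero, gives a lossy result at a higher compression ratio.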
The UTM (Universal Transverse Mercator) system
is a system of coordinates that describes position
on a map
TIN and GRID
Models
TIN and Grid Models
Triangular Irregular Network (TIN)
 A surface representation derived from
irregularly spaced points and breakline
features. Each sample point has an x,y
coordinate and a z-value or surface
value.
A TIN can be created using the following
methods:
• Delaunay Triangulation method
• Important Points method
• Adaptive Densification
Delaunay Triangulation Method
• A TIN represents a surface as contiguous
non-overlapping triangles created by
performing Delaunay triangulation.
• These triangles have a unique property:
the circumcircle that passes
through the vertices of a triangle
contains no other point inside it.
• This topologic data structure manages
information about the nodes that form
each triangle and the neighbors of each
triangle.
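The empty circumcircle property can be tested with the standard in-circle determinant. This sketch only checks the property for a given triangle and point; it is not a triangulation algorithm, and the function names and sample coordinates are hypothetical.

```python
def orientation(a, b, c):
    """Positive if a, b, c are in counter-clockwise order."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of a, b, c."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of the determinant flips for clockwise triangles.
    if orientation(a, b, c) < 0:
        det = -det
    return det > 0


A, B, C, D = (0, 0), (4, 0), (0, 4), (5, 5)
# Triangles ABC and BDC satisfy the property for these four points.
print(in_circumcircle(A, B, C, D), in_circumcircle(B, D, C, A))
# The alternative split along A-D does not: C falls inside circle ABD.
print(in_circumcircle(A, B, D, C))
```

Of the two ways to split the quadrilateral ABDC into triangles, only the split that passes this test for every triangle is the Delaunay triangulation.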
Delaunay Triangulation
Method
Advantages of Delaunay
triangulation
The triangles are as equiangular
as possible, thus reducing
potential numerical precision
problems created by long skinny
triangles.
The triangulation is independent
of the order the points are
processed.
Ensures that any point on the
surface is as close as possible to
• The TIN model is a vector data model that
is stored using relational attribute tables.
A TIN dataset contains three basic attribute
tables:
• Arc attribute table, which contains the length, from-node
and to-node of all the edges of all the
triangles.
• Node attribute table, which contains the x, y
coordinates and z (elevation) of the vertices.
• Polygon attribute table, which contains the
areas of the triangles, the identification
numbers of the edges and the identifiers of the
adjacent polygons.
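The three tables can be sketched as plain dictionaries. The layout, identifiers and coordinates are hypothetical; the length and area values that the text lists as stored attributes are computed here from the node table to show how the tables relate.

```python
import math

# Node attribute table: node id -> (x, y, z).
nodes = {1: (0.0, 0.0, 100.0), 2: (30.0, 0.0, 105.0),
         3: (0.0, 40.0, 110.0), 4: (30.0, 40.0, 120.0)}

# Arc attribute table: arc id -> (from-node, to-node).
arcs = {1: (1, 2), 2: (2, 3), 3: (3, 1), 4: (2, 4), 5: (4, 3)}

# Polygon attribute table: triangle id -> edge ids and adjacent triangles.
polygons = {1: {"edges": [1, 2, 3], "neighbours": [2]},
            2: {"edges": [2, 4, 5], "neighbours": [1]}}


def arc_length(arc_id):
    """Planimetric (x, y) length of an edge."""
    from_node, to_node = arcs[arc_id]
    return math.dist(nodes[from_node][:2], nodes[to_node][:2])


def triangle_nodes(polygon_id):
    """Collect the three node ids used by a triangle's edges."""
    ids = set()
    for arc_id in polygons[polygon_id]["edges"]:
        ids.update(arcs[arc_id])
    return sorted(ids)


def triangle_area(polygon_id):
    """Planimetric area of a triangle from its node coordinates."""
    (x1, y1, _), (x2, y2, _), (x3, y3, _) = (
        nodes[n] for n in triangle_nodes(polygon_id))
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


print(arc_length(2), triangle_nodes(2), triangle_area(1), triangle_area(2))
```

Triangles 1 and 2 share arc 2, which is recorded once in the arc table and referenced from both polygon rows.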
As a TIN stores topological
relationships, the datasets can be
applied to vector based
geoprocessing such as automatic
contouring, 3D landscape
visualization, volumetric design and
surface characterization.
A triangulated irregular network
(TIN) approximates the terrain with
a set of non-overlapping triangles.
Each triangle in the TIN assumes a
constant gradient. Flat areas of the
land surface have fewer but larger
triangles, whereas areas with higher
variability in elevation have denser
but smaller triangles. The TIN is
commonly used for terrain mapping
and analysis, especially for 3-D
display.
Important Points Method:
The Extract Important Points method
creates vector points from raster
elevation data.
Points are created automatically for
cell values at regular grid intersections
or at locations that mark significant changes in
surface elevation, depending on the
chosen point extraction method.
Adaptive Densification
Method:
It is used to create TIN objects
using raster surface data as the
input object. This process
iteratively inserts nodes inside
existing triangles at the location of
maximum surface deviation from
the plane of the triangle.
Grid vs TIN
Aspect | TIN | Grid
Features | TIN represents features more accurately. Flow directions can be arbitrary. | In a grid, flow directions are restricted to grid points. There are only 8 possible flow directions.
Advantages | Ability to describe the surface at different levels of resolution; efficiency in storing data | Easy to store and manipulate; easy integration with raster databases; smoother, more natural appearance of derived terrain features
Disadvantages | In many cases requires visual inspection and manual control of the network | Inability to use grid sizes to reflect areas of different complexity of relief
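The restriction to eight flow directions on a grid can be illustrated with a steepest-descent sketch, an approach often called D8. The function name and the small elevation grid are hypothetical, and diagonal neighbours are assumed to be √2 cell widths away.

```python
def d8_direction(grid, row, col):
    """Return the (d_row, d_col) offset of the steepest downslope
    neighbour among the 8 surrounding cells, or None for a pit or flat."""
    best_offset, best_slope = None, 0.0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue
            r, c = row + d_row, col + d_col
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                # Distance is 1 for edge neighbours, sqrt(2) for diagonals.
                distance = (d_row * d_row + d_col * d_col) ** 0.5
                slope = (grid[row][col] - grid[r][c]) / distance
                if slope > best_slope:
                    best_offset, best_slope = (d_row, d_col), slope
    return best_offset


elevation = [[9, 8, 7],
             [8, 6, 4],
             [7, 5, 1]]
print(d8_direction(elevation, 1, 1))   # flow from the centre cell
print(d8_direction(elevation, 2, 2))   # lowest cell has no outflow
```

Whatever the true aspect of the terrain, the returned offset is always one of eight values, whereas a triangle facet in a TIN can drain in any direction.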
OGC Standards
The Open Geospatial Consortium
(OGC) is a not-for-profit
organisation focused on
developing and defining open
standards for the geospatial
community to allow
interoperability between various
software and data services.
OGC Interoperable Sectors
Data Quality
In GIS, data quality is used to give
an indication of how good data are.
It describes the overall fitness or
suitability of data for a specific
purpose or is used to indicate data
free from errors and other problems.
Examining issues such as error,
accuracy, precision and bias can
help to assess the quality of
individual data sets.
 Data sets used for analysis need to be
complete, compatible and consistent, and
applicable for the analysis being
performed.
 Flaws in data are usually referred to as
errors.
 Error is the physical difference between
the real world and the GIS facsimile.
 A more systematic error would have
occurred if the co-ordinates for all the ski
lift stations in the data set had been
entered in (y,x) order instead of (x,y).
 Accuracy is the extent to which an
estimated data value approaches its true
value.
 If a GIS database is accurate, it is a true
representation of reality.
 It is impossible for a GIS database to be
100 per cent accurate, though it is possible
to have data that are accurate to within
specified tolerances.
 For example, a ski lift station co-ordinate
may be accurate to within plus or minus 10
metres.
Precision is the recorded level
of detail of your data.
A co-ordinate in metres to the
nearest 12 decimal places is
more precise than one specified
to the nearest three decimal
places.
Computers store data with a high
level of precision, though a high
level of precision does not imply
a high level of accuracy.
The difference between accuracy and precision is important. It is illustrated
by the results produced by four contestants in a shooting competition (figure).
Bias in GIS data is the systematic
variation of data from reality. Bias
is a consistent error throughout a
data set.
A consistent overshoot in digitized
data caused by a badly calibrated
digitizer, or the consistent
truncation of the decimal points
from data values by a software
program, are possible examples.
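Bias and scatter can be told apart numerically, as in the following sketch. The measurements are hypothetical, and the standard deviation is used here as a measure of repeatability in the sense of the shooting-target illustration, which is an assumption beyond the definition of precision as recorded detail given above.

```python
import statistics


def error_summary(measured, true_value):
    """Return (bias, spread, rmse) for repeated measurements of one value."""
    errors = [m - true_value for m in measured]
    bias = statistics.mean(errors)          # consistent offset from reality
    spread = statistics.pstdev(measured)    # scatter of the measurements
    rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
    return bias, spread, rmse


# Hypothetical easting readings (metres) for a point whose true value is 100.
readings = [110.0, 110.5, 109.5, 110.0]
bias, spread, rmse = error_summary(readings, 100.0)
print(round(bias, 3), round(spread, 3), round(rmse, 3))
```

These readings are tightly grouped (small spread) yet about 10 metres off, the pattern a badly calibrated digitizer would produce: consistent, but not accurate.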
Resolution and generalization are
two important issues that may
affect the representation of
features in a GIS database.
In raster GIS, resolution is
determined by cell size. For
example, for a raster data set with
a 20-metre cell size, only those
features that are 20 × 20 metres
or larger can be distinguished.
The figure allows comparison of a 25
metre resolution vegetation map
with a 5 metre resolution aerial
photograph of the same area.
For vector data, resolution is dependent on the scale
of the original map, the point size
and line width of the features
represented thereon and the
precision of digitizing.
Generalization is the process of
simplifying the complexities of the
real world to produce scale models
and maps. Cartographic
generalization is a subject in itself
and is the cause of many errors in
GIS data derived from maps.
