
Gis v3 Datamodel

This document discusses data modeling concepts for geoscience information systems. It covers spatial objects like digital elevation models and how they can be represented using raster and vector data models. Raster models use a grid structure while vector models represent boundaries. Attribute data about spatial objects is organized in tables. The relational data model represents data in normalized tables to reduce redundancy. Key concepts like tuples, fields, and keys are defined for the relational model.

Uploaded by

Leonardo Olarte
Copyright
© Attribution Non-Commercial (BY-NC)

Geoscience Information Systems

Module code GeoData BA Nr. 041

© 2012 Helmut Schaeben, Geomathematics and Geoinformatics, TU Bergakademie Freiberg, Germany

TUBAF Winter Term 2012/13

Data Modeling

Contents

1 Spatial Objects, Geoobjects
2 Raster and Vector Spatial Data Model
3 Attribute Data
4 The Relational Data Model
5 Raster Data Structures
6 Vector Data Structures

Data modeling: the process of defining and organizing data into a consistent digital dataset that is useful and reveals information.
Data model: the logical organization of data according to a scheme.
Data structure: the representation of the data model.

Spatial Objects (1)

Digital Elevation Model (DEM)


A digital elevation model based on sufficiently many measurements of the height of the surface makes it possible to determine numerically the height at any given point of a well-defined region, and to represent the surface digitally as a geoobject. This involves:
- data (x, y, h)
- metadata such as physical units of measurement, coordinate system, date of the recording, reliability of data, weather (visibility), name of the author, ...
- experience, knowledge, modeling assumptions, ...
- methods of spatial interpolation, approximation, prediction, ... (maths, numerics)
- digital representation of the DEM (informatics, data model, data structure)
- visualization (computer graphics)

The abstraction process of representing real-world phenomena in a computer-accessible form involves the use of symbolic models, i.e. simplified representations, e.g.
- digital elevation model (DEM): the cells of the grid are the spatial objects, whose values are symbolized by numbers in the data file. Unless the organizational scheme is known, the data are worthless.
- digital data set containing well data from drill holes in a coalfield: the spatial objects are the sample locations in the wells at which formation tops, and other attributes, are recorded.

Spatial Objects (2)

Spatial Objects (3)

Digital elevation model (DEM) data are organized according to a raster (grid) model, the raster is organized in a run-length encoded data structure, and the data are written to a digital storage device in a selected file format.

Digital elevation model (DEM) data are organized according to a vector model, expressed as polygons bounded by contour lines, arranged in some topological structure, and written to a storage device in a digital line graph format.

Spatial Objects (4)

Spatial Objects (5)


Spatial objects can be classified according to
- being continuous or discontinuous: temperature, gravity; state of matter, rock bodies;
- being natural or imposed spatial objects: discrete spatial entities, e.g. river, ore body; artificial entities, e.g. pixels (picture elements);
- their dimension: Euclidean, fractal, etc.;
- being regularly or irregularly shaped;
- being sampling-limited or definition-limited.
  Sampling-limited: information about shape and extent is limited only by the amount of available sampling, e.g. an oil pool bounded by water from below and by caprock from above.
  Definition-limited: a metallic orebody is defined by the cutoff grade, seismic epicentres are defined by the sensitivity of the seismometer, an elevation contour line is defined by a given elevation, a geochemical anomaly is defined by the threshold.

Digital elevation model (DEM) data locations are organized according to a graph, e.g. triangulated according to some criterion (e.g. Delaunay), the triangulation is represented as a table containing the vertices of each triangle and its neighboring triangles sharing a common edge, and the table is written to a digital storage device in a selected file format.

Raster and Vector Model (1)

Raster and Vector Model (2)

The vector model uses irregular spatial objects that can be either natural or imposed, and it employs a boundary representation of these area objects. The raster model uses regular imposed spatial objects, i.e. pixels, voxels, etc., which do not require individual boundary definitions. In either model, a spatial object is assumed to have properties that are homogeneous.

Raster and Vector Model (3)

Raster and Vector Model (4)

What are the required operations to determine the area and the perimeter of a geoobject with respect to the raster or the vector model?
area: raster: counting ...; vector: adding ...
perimeter: raster: counting ...; vector: adding ...
Which is more expensive in terms of CPU time?
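One possible reading of the "counting" and "adding" hints is sketched below. It assumes a binary raster in which object cells carry the value 1, and a simple polygon given as a list of vertices without the repeated closing vertex; the function names and the sample data are illustrative and not part of the lecture.

```python
import math

def raster_area(grid, cell_size=1.0):
    """Area by counting: number of object cells times the area of one cell."""
    return sum(row.count(1) for row in grid) * cell_size ** 2

def raster_perimeter(grid, cell_size=1.0):
    """Perimeter by counting: cell edges that separate object from non-object."""
    rows, cols = len(grid), len(grid[0])
    edges = 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] != 1:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                outside = not (0 <= ni < rows and 0 <= nj < cols)
                if outside or grid[ni][nj] != 1:
                    edges += 1
    return edges * cell_size

def vector_area(vertices):
    """Area by adding signed cross terms along the closed boundary."""
    n = len(vertices)
    total = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def vector_perimeter(vertices):
    """Perimeter by adding the lengths of all boundary segments."""
    n = len(vertices)
    return sum(math.dist(vertices[k], vertices[(k + 1) % n]) for k in range(n))

# Hypothetical test object: a 3 x 2 rectangle in both representations.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
RECTANGLE = [(0.0, 0.0), (3.0, 0.0), (3.0, 2.0), (0.0, 2.0)]
```

In this sketch the raster operations visit every cell of the grid, while the vector operations visit every boundary vertex once, which gives a first idea of how the cost of each approach scales.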

Raster and Vector Model (5)


Area of a polygon P given by the ordered set of points $P_i = (x_i, y_i)$, $i = 1, \ldots, n+1$, with $P_1 = P_{n+1}$. Introducing an arbitrary additional point $(x, y) \in P$, then

\[
\begin{aligned}
A = \sum_{i=1}^{n} A_i
  &= \frac{1}{2} \sum_{i=1}^{n} \left| \det \begin{pmatrix} x & x_i & x_{i+1} \\ y & y_i & y_{i+1} \\ 1 & 1 & 1 \end{pmatrix} \right| \\
  &= \frac{1}{2} \sum_{i=1}^{n} \left| (x - x_i)(y_i - y_{i+1}) + (x_i - x_{i+1})(y_i - y) \right| \\
  &= \frac{1}{2} \sum_{i=1}^{n} \left| x (y_i - y_{i+1}) + x_i y_{i+1} - x_{i+1} y_i + (x_{i+1} - x_i) y \right| \\
  &= \frac{1}{2} \sum_{i=1}^{n} \left| x_i y_{i+1} - x_{i+1} y_i \right|
\end{aligned}
\]

Strictly, the terms in x and y cancel only in the signed sum over the closed boundary, so for a general simple polygon the absolute value is taken of the whole sum: $A = \frac{1}{2} \left| \sum_{i=1}^{n} (x_i y_{i+1} - x_{i+1} y_i) \right|$.

Raster Model (1)

Each raster cell (pixel) is associated with a number quantifying the observed attribute; each layer of grid cells records a separate attribute. A raster can be represented as a matrix $A = (a_{ij})_{i=1,\ldots,m;\; j=1,\ldots,n}$; its cells are addressed by row and column number $(i, j)$, and can be stored with addressing by sequence $(s_\ell)_{\ell=1,\ldots,mn}$ in the file:

\[
a_{ij} \mapsto s_{(i-1)n+j}; \qquad s_\ell \mapsto a_{[\ell/n]+1,\; \ell-[\ell/n]\,n}
\]

where $[q]$ denotes the largest natural number (0 included) strictly smaller than $q$. Spatial coordinates are not explicitly stored for each cell, because the storage order does this implicitly. The raster model represents a spatial object by enumeration. Processing raster data is efficient for e.g. overlaying images, neighborhood queries, spatial filtering, morphological operations, gradients, etc. Raster devices producing or displaying digital raster images are scanners, video digitizers, video display monitors, line printers, and inkjet plotters.
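The addressing of raster cells by row and column versus by position in the stored sequence can be sketched as follows. This is a minimal sketch assuming 1-based indices throughout and row-by-row storage; the function names are illustrative.

```python
def cell_to_sequence(i, j, n):
    """Position in the stored sequence of cell (i, j) in a raster with n columns."""
    return (i - 1) * n + j

def sequence_to_cell(position, n):
    """Row and column of the cell stored at the given 1-based position."""
    # (position - 1) // n is the largest natural number strictly smaller
    # than position / n, so the last cell of a row stays in that row.
    q = (position - 1) // n
    return q + 1, position - q * n
```

For a raster with 4 columns, `cell_to_sequence(2, 3, 4)` gives position 7, and `sequence_to_cell(7, 4)` gives back row 2, column 3.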

Raster Model (2)

Vector Model (1)

The spatial resolution of a raster image is the size of a geoobject in the real world represented by an individual pixel. At 100 m resolution, a square area of 100 km on a side requires a raster with 1000 rows and 1000 columns, or 1 000 000 (1 million) pixels; at 10 m resolution, it requires 10 000 by 10 000, or 100 000 000 (100 million) pixels. If 1 byte (8 bits of computer storage, integer numbers 0 to 255) is used per pixel, the storage needed for the latter raster image is 100 MB.
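The arithmetic behind these figures can be checked with a short sketch. It assumes a square area, square pixels, and uncompressed storage; the function name and its parameters are illustrative.

```python
import math

def raster_size(extent_m, resolution_m, bytes_per_pixel=1):
    """Return (cells per side, number of pixels, storage in bytes)."""
    cells_per_side = math.ceil(extent_m / resolution_m)
    pixels = cells_per_side ** 2
    return cells_per_side, pixels, pixels * bytes_per_pixel

# 100 km on a side at 100 m and at 10 m resolution, 1 byte per pixel.
coarse = raster_size(100_000, 100)
fine = raster_size(100_000, 10)
```

Refining the resolution by a factor of 10 multiplies the number of pixels, and hence the storage, by a factor of 100.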

In vector mode, vertices are ordered pairs of spatial coordinates, lines surrounding polygonal areas are made by linking sequences of vertices, and areas are defined by lines that form closed loops or polygons. The vector model represents a spatial object by its boundaries, and uses a labelling scheme to keep track of their attributes. The straightforward storing of strings of coordinate pairs is referred to as the spaghetti model. The spatial objects can be regarded as graphical elements. The boundary between two adjacent polygons is stored twice, once for each polygon. Vector devices producing or displaying digital vector images are digitizers that use line-following principles, manual digitizing, and digital pen plotters.

Vector Model (2)

Vector Model (3)


Reasonable data structures to store vector data are considerably more complex. The structuring of vector data according to topological criteria is referred to as the topological model. The boundaries of polygons are broken down into a series of arcs and nodes, and the spatial relationships between arcs, nodes and polygons are explicitly defined in attribute tables. Planar enforcement results in a set of polygon objects that fill the plane of the map. The vector model requires topological attributes to facilitate operations related to adjacency, containment, etc. Then, processing vector data is efficient, e.g. find all arcs which have granite on one side.

Vector Model (4)


Planar enforcement

Graph Models (1)

Thiessen / Voronoi / Dirichlet tessellation
Delaunay triangulation

Graph Models (2)

Attribute Data (1)

Attributes of objects to be recorded in a database can be spatial, temporal, and thematic.
- Spatial attributes: data about location, topology, and geometry of spatial objects
- Temporal attributes: age of objects (geological age), time of data collection or measurement
- Thematic attributes: rock type, annual rainfall, presence of minerals or fossil taxa
Attributes of spatial objects are usually organized into lists or tables.

Attribute Data (2)

Attribute Data (3)

Attribute tables form a unifying link between raster and vector models. For example, a soil map may be given in both the vector and the raster model, and both models utilize the same polygon attribute table, because the attribute value in the raster is the pointer to the polygon label.
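The role of the attribute table as a link between the two models can be sketched as follows. The soil classes, the drainage field, and the tiny raster are hypothetical values chosen only for illustration.

```python
# Shared polygon attribute table, keyed by polygon label (hypothetical data).
POLYGON_ATTRIBUTES = {
    1: {"soil": "loam", "drainage": "good"},
    2: {"soil": "clay", "drainage": "poor"},
    3: {"soil": "sand", "drainage": "excessive"},
}

# Raster model: each cell stores the label of the polygon it falls into.
LABEL_RASTER = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]

def attribute_of_polygon(label, field):
    """Vector side: look up an attribute by the polygon's label."""
    return POLYGON_ATTRIBUTES[label][field]

def attribute_of_cell(row, col, field):
    """Raster side: the cell value is the pointer into the same table."""
    return attribute_of_polygon(LABEL_RASTER[row][col], field)
```

Because the raster stores only labels, changing an attribute in the table updates what both representations report without touching the raster itself.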

The Relational Model (1)

The Relational Model (2)

Definition of technical terms
A relation is a two-dimensional structure that contains data. It is an abstract concept that corresponds in practice to a table. It is a major aspect of data modeling that pertains to general DBMS. The relational model is an informatician's favourite model!
- tuple: a row of a relation (data record in a flat file; statistics: sample)
- field: a column of a relation (referring to an attribute; property in a flat file)
- key, key field: an attribute uniquely identifying tuples, providing links between one relation and another

The Relational Model (3)

The Relational Model (4)

Definition: Normalization of a relation is the process of converting a complex relation into a larger number of simpler relations that refer to each other and satisfy relational rules. First, second, third, fourth, fifth normal form, ..., aiming at the reduction or removal of redundancy in a relation.

1 All data must be represented in tabular form (as opposed to hierarchies or graphs).
2 All data must be atomic, i.e. any cell in the table can contain only a single value.
3 No duplicate tuples are allowed.
4 Tuples can be rearranged without changing the meaning of the table.

The Relational Model (5)

The Relational Model (6)

First normal form: Relations without repeating groups of attributes.
Second normal form: Each non-identifying attribute is functionally dependent on the whole key.
Third normal form: Non-identifying attributes are mutually independent.
Fourth normal form: ...
Fifth normal form: ...

The Relational Model (7)


Notice: formation numbers are repeated; formation uniquely determines lithology and age.
First normal form: Relations without repeating groups of attributes.
New FORMATION relation and simplified POLYGON relation, linked by formation number to eliminate repeating attributes:
POLYGON(poly#, Fm#)
FORMATION(Fm#, Fm name, lith#, lithology, age#, age)
Rectifying for repeating groups by simplifying the FORMATION relation and creating an AGE relation:
POLYGON(poly#, Fm#)
FORMATION(Fm#, Fm name, lith#, lithology, age#)
AGE(age#, age)

The Relational Model (8)

Third normal form: Non-identifying attributes are mutually independent.
Rectifying for dependencies:
POLYGON(poly#, Fm#)
FORMATION(Fm#, Fm name, lith#, age#)
LITHOLOGY(lithology#, lithology)
AGE(age#, age)
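The normalized relations can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the column names (`poly_no`, `fm_no`, `lith_no`, `age_no`) are illustrative stand-ins for the key fields, and all formation names, lithologies and ages are hypothetical sample values.

```python
import sqlite3

def build_database():
    """Create the four normalized relations in memory and fill in sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE age (age_no INTEGER PRIMARY KEY, age TEXT);
        CREATE TABLE lithology (lith_no INTEGER PRIMARY KEY, lithology TEXT);
        CREATE TABLE formation (
            fm_no INTEGER PRIMARY KEY,
            fm_name TEXT,
            lith_no INTEGER REFERENCES lithology(lith_no),
            age_no INTEGER REFERENCES age(age_no)
        );
        CREATE TABLE polygon (
            poly_no INTEGER PRIMARY KEY,
            fm_no INTEGER REFERENCES formation(fm_no)
        );
    """)
    con.executemany("INSERT INTO age VALUES (?, ?)",
                    [(1, "Devonian"), (2, "Carboniferous")])
    con.executemany("INSERT INTO lithology VALUES (?, ?)",
                    [(1, "granite"), (2, "limestone")])
    con.executemany("INSERT INTO formation VALUES (?, ?, ?, ?)",
                    [(10, "Formation A", 1, 1), (20, "Formation B", 2, 2)])
    # Polygons 1 and 3 share a formation; its lithology and age are stored once.
    con.executemany("INSERT INTO polygon VALUES (?, ?)",
                    [(1, 10), (2, 20), (3, 10)])
    return con

def polygon_view(con):
    """Join the relations back into one row per polygon via the key fields."""
    return con.execute("""
        SELECT p.poly_no, f.fm_name, l.lithology, a.age
        FROM polygon p
        JOIN formation f ON p.fm_no = f.fm_no
        JOIN lithology l ON f.lith_no = l.lith_no
        JOIN age a ON f.age_no = a.age_no
        ORDER BY p.poly_no
    """).fetchall()
```

The join reproduces the original wide table on demand, while each fact about a formation, lithology or age is stored only once.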

The Relational Model (9)

The Relational Model (10)

Spatial Data Structures

Spatial Raster Structures (1)

Spatial data structures refer to the organization of spatial data in a form suitable for digital computers. According to the raster and vector model, there are raster structures and vector structures. While the model is unique, the structure is not. There are also several structures to represent graph models.

Full raster structure restricts each layer to a single attribute, and limits the values to integers in the range of 0 to 255. Given information about the size of the array, and the ordering convention (scan order), arrays can be stored as one-dimensional lists.

Spatial Raster Structures (2)

Spatial Raster Structures (3)


Attributes are often called bands in digital imagery, referring to the bandwidths of the electromagnetic spectrum registered by satellite imagers like LANDSAT.
- Band sequential (BSQ): pixel by pixel, row by row, layer by layer
- Band interleaved by line (BIL): pixels of a row of one layer, followed by the pixels of the same row of the next layer
- Band interleaved by pixel (BIP): pixel of one layer, followed by the same pixel of the next layer. In this way, the band values for each pixel are stored physically close together on the medium.
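The three storage orders can be compared on a tiny example. The sketch below assumes the image is held as a nested list indexed as `image[band][row][column]`; the two 2 x 2 bands are made-up values.

```python
def band_sequential(image):
    """BSQ: all rows of band 1, then all rows of band 2, and so on."""
    return [value for band in image for row in band for value in row]

def band_interleaved_by_line(image):
    """BIL: row r of every band in turn, then row r + 1 of every band."""
    n_rows = len(image[0])
    return [value for r in range(n_rows) for band in image for value in band[r]]

def band_interleaved_by_pixel(image):
    """BIP: all band values of one pixel, then all band values of the next."""
    n_rows, n_cols = len(image[0]), len(image[0][0])
    return [band[r][c] for r in range(n_rows) for c in range(n_cols) for band in image]

# Two hypothetical bands of a 2 x 2 image.
IMAGE = [
    [[1, 2], [3, 4]],   # band 1
    [[5, 6], [7, 8]],   # band 2
]
```

For this image, BSQ yields 1 2 3 4 5 6 7 8, BIL yields 1 2 5 6 3 4 7 8, and BIP yields 1 5 2 6 3 7 4 8.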

Run-length Encoding (1)


Adjacent pixels having the same value are combined together as a run, represented by a pair of numbers (run length, pixel value).

Run-length Encoding (2)

The number of bits required for the run length depends on the number of columns in the image. An image with 1024 columns requires 10 bits to encode the run length, an image with 4096 columns requires 12 bits. Thus, each run pair consists of a number for the length of the run in pixels, and a second number for the attribute or class value of the run. Each row starts with a new run.
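A minimal sketch of run-length encoding with one list of runs per row is given below. The bit count assumes that a run length between 1 and the number of columns is stored as length minus one, which is an assumption of this sketch rather than something stated in the lecture.

```python
from itertools import groupby

def encode_row(row):
    """Combine adjacent equal pixels into (run length, pixel value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(row)]

def encode_raster(raster):
    """Encode row by row, so that each row starts with a new run."""
    return [encode_row(row) for row in raster]

def decode_row(runs):
    """Expand (run length, pixel value) pairs back into a row of pixels."""
    return [value for length, value in runs for _ in range(length)]

def run_length_bits(n_columns):
    """Bits for a run length in 1..n_columns, stored as length - 1 (assumption)."""
    return max(1, (n_columns - 1).bit_length())
```

For example, `encode_row([3, 3, 3, 0, 0, 7])` gives `[(3, 3), (2, 0), (1, 7)]`, and `run_length_bits(1024)` gives 10.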

Scan Orders for Rasters (1)

Scan Orders for Rasters (2)

- row order
- row-prime order
- Morton order
- Peano-Hilbert order
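Of these scan orders, the Morton order is the easiest to compute: the key of a cell is obtained by interleaving the bits of its row and column numbers. The sketch below assumes 0-based row and column numbers and places the row bit above the column bit in each pair; the function names are illustrative.

```python
def morton_key(row, col, bits=16):
    """Interleave the bits of row and column, row bit above column bit."""
    key = 0
    for b in range(bits):
        key |= ((col >> b) & 1) << (2 * b)
        key |= ((row >> b) & 1) << (2 * b + 1)
    return key

def morton_decode(key, bits=16):
    """Recover (row, col) from a Morton key."""
    row = col = 0
    for b in range(bits):
        col |= ((key >> (2 * b)) & 1) << b
        row |= ((key >> (2 * b + 1)) & 1) << b
    return row, col

def to_base4(key):
    """Write a key in base 4; each digit selects one quadrant per level."""
    digits = ""
    while True:
        digits = str(key % 4) + digits
        key //= 4
        if key == 0:
            return digits

def morton_scan(n_rows, n_cols):
    """List all cells of a raster in Morton order."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return sorted(cells, key=lambda cell: morton_key(*cell))
```

With this convention, row 1 and column 3 give the key 7, which reads 13 in base 4, and `morton_scan(2, 2)` visits the cells in a Z-shaped pattern.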

Quadtrees, Octrees (1)

Quadtrees, Octrees (2)

Quadtrees and octrees are hierarchical data structures based on successive subdivision of blocks into 4 quadrants, or 8 octants, for pixels or voxels, respectively. A quadrant (block) is not further subdivided if it is either homogeneous, i.e. if all its pixels have the same value, or if it is the size of a pixel. A quadtree is usually represented by a tree (graph) structure.

Example of a cell key obtained by interleaving the bits of the row and column numbers: row $1_{10} = 01_2$, column $3_{10} = 11_2$, so that

\[
7_{10} = 0111_2 = 01_2\,11_2 = 13_4
\]
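The subdivision rule for quadtrees can be sketched recursively. The sketch assumes a square raster whose side length is a power of two, stores a leaf as the plain pixel value and an internal node as a list of four children in the order NW, NE, SW, SE; these representation choices are assumptions made for illustration.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value for a homogeneous block, else a list of 4 subtrees."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    homogeneous = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    # Stop subdividing when the block is homogeneous or one pixel in size.
    if homogeneous or size == 1:
        return first
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW quadrant
        build_quadtree(grid, row, col + half, half),         # NE quadrant
        build_quadtree(grid, row + half, col, half),         # SW quadrant
        build_quadtree(grid, row + half, col + half, half),  # SE quadrant
    ]

def count_leaves(node):
    """Number of blocks actually stored in the tree."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)

# Hypothetical 4 x 4 binary raster.
GRID = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
```

For this raster only the SE quadrant has to be subdivided, so the tree stores 7 blocks instead of 16 pixels.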

Quadtrees, Octrees (3)

Quadtrees, Octrees (4)

Quadtrees, Octrees (5)

Quadtrees, Octrees (6)

Quadtrees, Octrees (7)

Quadtrees, Octrees (8)

Take a look at: Tiles à la Google Maps: Coordinates, Tile Bounds and Projection
http://www.maptiler.org/google-maps-coordinates-tile-bounds-projection/

Vector Structures (1)

Vector Structure (2)

Spaghetti Structure
In the spaghetti structure, tables of locational coordinates are associated with each of the basic objects: points, lines, polygons. No topological attributes are used. Relationships between spatial objects are not considered; they have to be computed from the spatial coordinates.

Vector Structure (3)

Topological Vector Structure (1)


Topological Structure
Definition of technical terms
- points: isolated points, vertices linked to form a line
- lines: sequence of ordered vertices with a start node and an end node
- chain (arc, edge): line which is part of one or more polygons
- node: point where lines or chains meet or terminate
- ring: consists of one or more chains
- polygon: consists of one outer ring and zero or more inner rings
- simple polygon: no inner ring
- complex polygon: one or more inner rings

Topological Vector Structure (2)


Basic topological structure in terms of a normalized relation by van Roessel (1987).

Topological Vector Structure (3)

Topological Vector Structure (4)

Spaghetti vs. Topological Structure by Example (1)

- Find all granite contacts that are also limestone contacts
- Remove all boundary lines between adjacent polygons that have the same classification
- Find points on a structure map where fault traces intersect

Spaghetti vs. Topological Structure by Example (2)

Spaghetti vs. Topological Structure by Example (3)


Spaghetti vs. Topological Structure by Example (4)

Spaghetti vs. Topological Structure by Example (5)

Find all granite contacts that are also limestone contacts
Spaghetti structure: start with a list of granite polygons and another list of limestone polygons. Then match the vertices of each granite polygon with the vertices of every limestone polygon.
Topological structure: search the chain topology table for (left, right) polygon pairs that are either (granite, limestone) or (limestone, granite).

Spaghetti vs. Topological Structure by Example (6)

Spaghetti vs. Topological Structure by Example (7)


Remove all boundary lines between adjacent polygons that have the same classification
Spaghetti structure: polygons belonging to the same class need to be matched with one another to find common boundaries.
Topological structure: look in the chain topology table for (left, right) polygon pairs where left and right have the same class.
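Both lookups in the chain topology table can be sketched with a few lines of code. The table layout (chain identifier, left polygon, right polygon), the polygon classes, and all identifiers below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical polygon attribute table: polygon label -> rock class.
POLYGON_CLASS = {1: "granite", 2: "limestone", 3: "granite", 4: "shale"}

# Hypothetical chain topology table: (chain id, left polygon, right polygon).
CHAINS = [
    ("c1", 1, 2),   # granite | limestone
    ("c2", 2, 4),   # limestone | shale
    ("c3", 1, 3),   # granite | granite
    ("c4", 3, 4),   # granite | shale
    ("c5", 2, 3),   # limestone | granite
]

def contacts_between(chains, polygon_class, class_a, class_b):
    """Chains whose (left, right) classes are (a, b) or (b, a)."""
    wanted = {class_a, class_b}
    return [
        chain_id
        for chain_id, left, right in chains
        if {polygon_class[left], polygon_class[right]} == wanted
    ]

def same_class_chains(chains, polygon_class):
    """Chains that separate two polygons of the same class (removable)."""
    return [
        chain_id
        for chain_id, left, right in chains
        if polygon_class[left] == polygon_class[right]
    ]
```

Each query is a single pass over the chain table and never touches the coordinates, which is the advantage of storing adjacency explicitly.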

Spaghetti vs. Topological Structure by Example (8)

Spaghetti vs. Topological Structure by Example (9)



Raster vs. Vector Structure

Find points on a structure map where fault traces intersect
Spaghetti structure: each fault must be matched with every other fault, but pairwise comparison of vertices is not enough, because faults could intersect anywhere, not just at vertices. Each adjacent vertex pair from one fault must be compared with every adjacent pair of another fault to check if the lines intersect.
Topological structure: the node list in the node topology table is searched for nodes with at least two lines where the lines are classified as faults.
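The pairwise segment comparison required in the spaghetti structure can be sketched as follows. Faults are assumed to be given as lists of (x, y) vertices; parallel and collinear segments are simply skipped in this sketch, and the fault traces are made-up coordinates.

```python
from itertools import combinations

def segment_intersection(p1, p2, p3, p4):
    """Intersection point of segments p1-p2 and p3-p4, or None.

    Parallel and collinear segments are reported as None in this sketch.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None
    # Parameters along the first (t) and the second (u) segment.
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def fault_intersections(fault_a, fault_b):
    """Compare every adjacent vertex pair of one fault with every pair of the other."""
    points = []
    for a0, a1 in zip(fault_a, fault_a[1:]):
        for b0, b1 in zip(fault_b, fault_b[1:]):
            point = segment_intersection(a0, a1, b0, b1)
            if point is not None:
                points.append(point)
    return points

def all_fault_intersections(faults):
    """Match each fault with every other fault."""
    points = []
    for fault_a, fault_b in combinations(faults, 2):
        points.extend(fault_intersections(fault_a, fault_b))
    return points

# Two hypothetical fault traces crossing between their vertices.
FAULTS = [
    [(0.0, 0.0), (2.0, 2.0), (4.0, 2.0)],
    [(0.0, 2.0), (2.0, 0.0)],
]
```

The nested loops make the cost grow with the product of the segment counts of every pair of faults, in contrast to the single search of the node list in the topological structure.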

Different structures are used for different tasks, depending on which are the most efficient and most suitable. The raster structure is particularly efficient for the overlay of multiple data layers (image processing); raster images occupy large amounts of storage space. The spaghetti structure is efficient for displaying objects by their boundaries (cartography). The topological structure facilitates searches that require adjacency, containment, and connectivity information, because these are explicitly stored and separated from the spatial coordinates (geometry).

Raster and Vector Model, Structure, for larger dimensions

Raster model: The generalization from 2d pixels to 3d voxels preserves representation by enumeration, and a rudimentary topology of pixels and voxels, respectively, but not of geoobjects.

Vector model: The boundary representation generalizes from polygons given by ordered 0d vertices connecting to 1d edges to polyhedra given by 0d vertices connecting to 1d edges and 2d faces. Topology is a major challenge (GMaps).
