Gis v3 Datamodel
Data Modeling
Contents
1 Spatial Objects, Geoobjects
2 Raster and Vector Spatial Data Model
3 Attribute Data
4 The Relational Data Model
5 Raster Data Structures
6 Vector Data Structures
Data modeling: the process of defining and organizing data into a consistent digital dataset that is useful and reveals information.
Data model: the logical organization of data according to a scheme.
Data structure: the means of representing the data model.
The abstraction process of representing real-world phenomena in a computer-accessible form involves the use of symbolic models, i.e. simplified representations. Examples:
Digital elevation model (DEM): the cells of the grid are the spatial objects, whose values are symbolized by numbers in the data file. Unless the organizational scheme is known, the data are worthless.
Digital data set containing well data from drill holes in a coalfield: the spatial objects are the sample locations in the wells at which formation tops, and other attributes, are recorded.
Digital elevation model (DEM): data organized according to a raster (grid) model; the raster is organized in a run-length encoded data structure, and the data are written on a digital storage device in a selected file format.
Digital elevation model (DEM): data organized according to a vector model, expressed as polygons bounded by contour lines, arranged in some topological structure, and written on a storage device in a digital line graph format.
Digital elevation model (DEM): data locations are organized according to a graph, e.g. triangulated according to some criterion (e.g. Delaunay); the triangulation is represented as a table containing the vertices of each triangle and its neighboring triangles sharing a common edge, and written on a digital storage device in a selected file format.
The vector model uses irregular spatial objects that can be either natural or imposed, and it employs a boundary representation of these area objects. The raster model uses regular imposed spatial objects, i.e. pixels, voxels, etc., which do not require individual boundary definitions. In either model, a spatial object is assumed to have properties that are homogeneous.
What are the required operations to determine the area and the perimeter of a geoobject with respect to the raster or the vector model?
area: raster: counting ... vector: adding ...
perimeter: raster: counting ... vector: adding ...
Which is more expensive in terms of CPU time?
where [q] denotes the largest natural number smaller than q. Spatial coordinates are not explicitly stored for each cell, because the storage order does this implicitly. The raster model represents a spatial object by enumeration. Processing raster data is efficient for e.g. overlaying images, neighborhood queries, spatial filtering, morphological operations, gradients, etc. Raster devices producing/displaying digital raster images are scanners, video digitizers, video display monitors, line printers, inkjet plotters.
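How the storage order carries the coordinates implicitly can be illustrated with a minimal sketch. It assumes a row-major scan order, zero-based row, column and list indices, and a raster origin at the upper left corner; these conventions and all function names are assumptions for illustration, not necessarily those used on the slide.

```python
def index_to_cell(k, ncols):
    """Return (row, column) of the k-th cell of a row-major raster.

    Zero-based: row is the integer quotient k // ncols,
    column is the remainder k % ncols.
    """
    return divmod(k, ncols)


def cell_to_index(row, col, ncols):
    """Return the position of cell (row, col) in the one-dimensional list."""
    return row * ncols + col


def cell_center(row, col, x0, y0, cell_size):
    """Return the map coordinates of the centre of cell (row, col).

    (x0, y0) is assumed to be the upper-left corner of the raster,
    so y decreases as the row number increases.
    """
    x = x0 + (col + 0.5) * cell_size
    y = y0 - (row + 0.5) * cell_size
    return x, y
```

Only the number of columns, the origin and the cell size have to be stored once for the whole raster; no coordinate pair is stored per cell.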
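Area $A$ of a polygon with vertices $(x_i, y_i)$, $i = 1, \dots, n$, where $(x_{n+1}, y_{n+1}) = (x_1, y_1)$, taken relative to a reference point $(x, y)$:

$$A = \frac{1}{2} \left| \sum_{i=1}^{n} \bigl[ x (y_i - y_{i+1}) + x_i y_{i+1} - x_{i+1} y_i + (x_{i+1} - x_i) y \bigr] \right|$$

With the reference point at the origin, $x = y = 0$, this reduces to

$$A = \frac{1}{2} \left| \sum_{i=1}^{n} \bigl( x_i y_{i+1} - x_{i+1} y_i \bigr) \right|$$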
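For the vector model, area and perimeter of a polygon are obtained by adding terms over the boundary vertices; for the raster model, by counting cells and exposed cell edges. The following is a minimal sketch under stated assumptions: the polygon is simple (not self-intersecting), the raster is a list of rows, and the perimeter of a raster object is measured along cell edges. All function names are hypothetical.

```python
import math


def polygon_area(vertices):
    """Vector model: add the cross products of successive vertices."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x_i, y_i = vertices[i]
        x_next, y_next = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x_i * y_next - x_next * y_i
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices):
    """Vector model: add the lengths of all boundary edges."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x_i, y_i = vertices[i]
        x_next, y_next = vertices[(i + 1) % n]
        total += math.hypot(x_next - x_i, y_next - y_i)
    return total


def raster_area(grid, value, cell_size):
    """Raster model: count the cells carrying the given value."""
    count = sum(1 for row in grid for cell in row if cell == value)
    return count * cell_size ** 2


def raster_perimeter(grid, value, cell_size):
    """Raster model: count cell edges between the object and anything else."""
    nrows, ncols = len(grid), len(grid[0])
    edges = 0
    for r in range(nrows):
        for c in range(ncols):
            if grid[r][c] != value:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < nrows and 0 <= cc < ncols)
                if outside or grid[rr][cc] != value:
                    edges += 1
    return edges * cell_size
```

The vector functions touch each vertex once, whereas the raster functions visit every cell of the grid, which is one way to approach the question of CPU cost.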
The spatial resolution of a raster image is the size of the area in the real world that is represented by an individual pixel. At 100 m resolution, a square area of 100 km on a side requires a raster with 1000 rows and 1000 columns, or 1 000 000 (1 million) pixels; at 10 m resolution, it requires 10 000 by 10 000, or 100 000 000 (100 million) pixels. If 1 byte (8 bits of computer storage, i.e. integer numbers from 0 to 255) is used per pixel, the storage needed for the latter raster image is 100 MB.
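The arithmetic above can be checked with a short sketch. The function name and its parameters are hypothetical; the raster is assumed to be square and uncompressed.

```python
import math


def raster_storage_bytes(extent_m, resolution_m, bytes_per_pixel=1):
    """Storage of an uncompressed square raster.

    extent_m:      side length of the square area in metres
    resolution_m:  side length of one pixel in metres
    """
    pixels_per_side = math.ceil(extent_m / resolution_m)
    return pixels_per_side * pixels_per_side * bytes_per_pixel


coarse = raster_storage_bytes(100_000, 100)  # 100 km at 100 m resolution
fine = raster_storage_bytes(100_000, 10)     # 100 km at 10 m resolution
```

Refining the resolution by a factor of 10 multiplies the storage by a factor of 100, because the number of pixels grows with the square of the number of pixels per side.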
In vector mode, vertices are ordered pairs of spatial coordinates, lines surrounding polygonal areas are made by linking sequences of vertices, and areas are defined by lines that form closed loops or polygons. The vector model represents a spatial object by its boundaries, and uses a labelling scheme to keep track of their attributes. The straightforward storing of strings of coordinate pairs is referred to as the spaghetti model. The spatial objects can be regarded as graphical elements. The boundary between two adjacent polygons is stored twice, once for each polygon. Vector devices producing/displaying digital vector images are digitizers that use line-following principles, manual digitizing, digital pen plotters.
Attributes of objects to be recorded in a database can be spatial, temporal, and thematic.
Spatial attributes: data about location, topology, and geometry of spatial objects.
Temporal attributes: age of objects (geological age), time of data collection or measurement.
Thematic attributes: rock type, annual rainfall, presence of minerals or fossil taxa.
Attributes of spatial objects are usually organized into lists or tables.
Attribute tables form a unifying link between raster and vector models. For example, a soil map may be given in both the vector and the raster model, and both models use the same polygon attribute table, because the attribute value stored in a raster cell is the pointer to the polygon label.
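The shared attribute table can be illustrated with a minimal sketch. The table contents, the tiny raster and the polygon boundaries are invented for illustration only, and all names are hypothetical.

```python
# Hypothetical polygon attribute table, keyed by polygon label.
POLYGON_ATTRIBUTES = {
    1: {"soil_type": "loam", "drainage": "good"},
    2: {"soil_type": "clay", "drainage": "poor"},
}

# Raster model: each cell stores a polygon label (the pointer into the table).
SOIL_RASTER = [
    [1, 1, 2],
    [1, 2, 2],
]

# Vector model: each polygon is a labelled boundary (closed ring of vertices).
SOIL_POLYGONS = {
    1: [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0), (0.0, 2.0)],
    2: [(2.0, 0.0), (3.0, 0.0), (3.0, 2.0), (1.0, 2.0)],
}


def attribute_of_cell(row, col, name):
    """Raster access: the cell value is the key of the attribute table."""
    label = SOIL_RASTER[row][col]
    return POLYGON_ATTRIBUTES[label][name]


def attribute_of_polygon(label, name):
    """Vector access: the polygon label is the key of the same table."""
    if label not in SOIL_POLYGONS:
        raise KeyError(label)
    return POLYGON_ATTRIBUTES[label][name]
```

Both access paths end in the same table, so a change of an attribute value is visible in the raster and in the vector representation alike.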
Definition of technical terms
A relation is a two-dimensional structure that contains data. It is an abstract concept that corresponds in practice to a table. It is a major aspect of data modeling that pertains to general DBMS. The relational model is an informatician's favourite model!
tuple: a row of a relation (a data record in a flat file; in statistics, a sample)
field: a column of a relation (referring to an attribute; a property in a flat file)
key, key field: an attribute uniquely identifying tuples, providing links between one relation and another
Definition
Normalization of a relation is the process of converting a complex relation into a larger number of simpler relations that refer to each other while satisfying relational rules. First, second, third, fourth, fifth normal form ... aiming at the reduction/removal of redundancy in a relation.
All data must be represented in tabular form (as opposed to hierarchies or graphs). All data must be atomic, i.e. any cell in the table can contain only a single value. No duplicate tuples are allowed. Tuples can be rearranged without changing the meaning of the table.
First normal form: relations without repeating groups of attributes.
Second normal form: each non-identifying attribute is functionally dependent on the whole key.
Third normal form: non-identifying attributes are mutually independent.
Fourth normal form: ...
Fifth normal form: ...
Third normal form: non-identifying attributes are mutually independent. Rectifying for dependencies:
POLYGON(poly , Fm )
FORMATION(Fm , Fm name, lith , age )
LITHOLOGY(lithology , lithology)
AGE(age , age)
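A decomposition of this kind can be sketched with Python's built-in sqlite3 module. The key column names (poly_id, fm_id, lith_id, age_id) and all sample rows are illustrative assumptions, not taken from the slide; only the four relation names follow the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE age (age_id INTEGER PRIMARY KEY, age TEXT);
CREATE TABLE lithology (lith_id INTEGER PRIMARY KEY, lithology TEXT);
CREATE TABLE formation (
    fm_id   INTEGER PRIMARY KEY,
    fm_name TEXT,
    lith_id INTEGER REFERENCES lithology(lith_id),
    age_id  INTEGER REFERENCES age(age_id)
);
CREATE TABLE polygon (
    poly_id INTEGER PRIMARY KEY,
    fm_id   INTEGER REFERENCES formation(fm_id)
);
-- Invented sample rows: each fact is stored exactly once.
INSERT INTO age VALUES (1, 'Jurassic'), (2, 'Devonian');
INSERT INTO lithology VALUES (1, 'limestone'), (2, 'granite');
INSERT INTO formation VALUES (10, 'Formation A', 1, 1), (20, 'Formation B', 2, 2);
INSERT INTO polygon VALUES (100, 10), (101, 10), (102, 20);
""")

# Joining along the keys rebuilds the full (redundant) polygon table on demand.
rows = con.execute("""
    SELECT p.poly_id, f.fm_name, l.lithology, a.age
    FROM polygon p
    JOIN formation f ON f.fm_id = p.fm_id
    JOIN lithology l ON l.lith_id = f.lith_id
    JOIN age a ON a.age_id = f.age_id
    ORDER BY p.poly_id
""").fetchall()
```

Polygons 100 and 101 share one formation record, so the formation name, lithology and age are stored once rather than once per polygon.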
Spatial data structures refer to the organization of spatial data in a form suitable for digital computers. According to the raster and vector model, there are raster structures and vector structures. While the model is unique, the structure is not. There are also several structures to represent graph models.
The full raster structure restricts each layer to a single attribute, and limits the values to integers in the range of 0 to 255. Given information about the size of the array and the ordering convention (scan order), arrays can be stored as one-dimensional lists.
The number of bits required for the run length depends on the number of columns in the image. An image with 1024 columns requires 10 bits to encode the run length; an image with 4096 columns requires 12 bits. Thus, each run pair consists of a number for the length of the run in pixels, and a second number for the attribute or class value of the run. Each row starts with a new run.
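Run-length encoding of a single raster row can be sketched as follows. The bit count assumes that a run is at least one pixel long and that the stored number is the run length minus one, so that the values 0 to ncols - 1 have to be representable; this convention and the function names are assumptions for illustration.

```python
from itertools import groupby


def run_length_encode(row):
    """Encode one raster row as (run length, class value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(row)]


def run_length_decode(pairs):
    """Expand (run length, class value) pairs back into a raster row."""
    return [value for length, value in pairs for _ in range(length)]


def bits_for_run_length(ncols):
    """Bits needed to store run length - 1 for runs of 1..ncols pixels."""
    return max(1, (ncols - 1).bit_length())


row = [3, 3, 3, 3, 7, 7, 1, 1, 1]
pairs = run_length_encode(row)
```

Because each row starts with a new run, the encoder is applied row by row, and no run is ever longer than the number of columns.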
Quadtrees and octrees are hierarchical data structures based on successive subdivision of blocks into 4 quadrants, or 8 octants, for pixels or voxels, respectively. A quadrant (block) is not subdivided further if it is either homogeneous, i.e. if all its pixels have the same value, or if it is the size of a pixel. It is usually represented by a tree (graph) structure.
Example of addressing by interleaving the bits of row and column: row $1_{10} = 01_2$ and column $3_{10} = 11_2$ give $0111_2 = 7_{10}$.
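The subdivision rule can be sketched for a square raster whose side length is a power of two. The nested-list representation of the tree, the quadrant order (NW, NE, SW, SE) and the function names are assumptions made for this illustration. The last function assumes that cells are addressed by interleaving the bits of the row and column numbers (often called Morton order), which is consistent with row 1, column 3 mapping to 7.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value, or a list [NW, NE, SW, SE] of subtrees."""
    if size is None:
        size = len(grid)  # assumed square, side length a power of two
    first = grid[row][col]
    homogeneous = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if homogeneous or size == 1:
        return first  # block is not subdivided further
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]


def count_leaves(node):
    """Number of blocks actually stored in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


def interleave_bits(row, col, bits):
    """Interleave row and column bits, row bit first, highest bit first."""
    code = 0
    for b in range(bits - 1, -1, -1):
        code = (code << 1) | ((row >> b) & 1)
        code = (code << 1) | ((col >> b) & 1)
    return code


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(grid)
```

For this 4 by 4 example only the south-east quadrant is inhomogeneous, so 7 blocks are stored instead of 16 pixels.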
Take a look at: Tiles à la Google Maps: Coordinates, Tile Bounds and Projection, http://www.maptiler.org/google-maps-coordinates-tile-bounds-projection/
Spaghetti Structure
In the spaghetti structure, tables of locational coordinates are associated with each of the basic objects: points, lines, polygons. No topological attributes are used. Relationships between spatial objects are not stored; they have to be computed from the spatial coordinates.
Example queries:
Find all granite contacts that are also limestone contacts.
Remove all boundary lines between adjacent polygons that have the same classification.
Find points on a structure map where fault traces intersect.
Find all granite contacts that are also limestone contacts
Spaghetti structure: start with a list of granite polygons and another list of limestone polygons. Then match the vertices of each granite polygon with the vertices of every limestone polygon.
Topological structure: search the chain topology table for (left, right) polygon pairs that are either (granite, limestone) or (limestone, granite).
Remove all boundary lines between adjacent polygons that have the same classification
Spaghetti structure: polygons belonging to the same class need to be matched with one another to find common boundaries.
Topological structure: look in the chain topology table for (left, right) polygon pairs where left and right have the same class.
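The two lookups in the chain topology table can be sketched as follows. The table layout (one record per chain with its left and right polygon), the polygon labels and the class assignments are invented for illustration, and the function name is hypothetical.

```python
# Hypothetical chain topology table: each chain knows its left and right polygon.
CHAINS = [
    {"chain": 1, "left": "A", "right": "B"},
    {"chain": 2, "left": "B", "right": "C"},
    {"chain": 3, "left": "D", "right": "B"},
    {"chain": 4, "left": "A", "right": "D"},
    {"chain": 5, "left": "C", "right": "A"},
]

# Hypothetical polygon attribute table: polygon label -> rock class.
POLYGON_CLASS = {"A": "granite", "B": "limestone", "C": "shale", "D": "granite"}


def chains_between(chains, polygon_class, class_a, class_b):
    """Return the chains separating class_a from class_b, in either order.

    With class_a == class_b the result is the list of chains whose
    left and right polygons belong to the same class.
    """
    wanted = {class_a, class_b}
    return [
        record["chain"]
        for record in chains
        if {polygon_class[record["left"]], polygon_class[record["right"]]} == wanted
    ]


granite_limestone = chains_between(CHAINS, POLYGON_CLASS, "granite", "limestone")
removable = chains_between(CHAINS, POLYGON_CLASS, "granite", "granite")
```

Each query is a single pass over the chain table and never touches the coordinates, in contrast to the vertex matching required in the spaghetti structure.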
Find points on a structure map where fault traces intersect
Spaghetti structure: each fault must be matched with every other fault, but pairwise comparison of vertices is not enough, because faults could intersect anywhere, not just at vertices. Each adjacent vertex pair from one fault must be compared with every adjacent pair of another fault to check if the lines intersect.
Topological structure: the node list in the node topology table is searched for nodes with at least two lines where the lines are classified as faults.
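The pairwise comparison of adjacent vertex pairs needed without stored topology can be sketched as follows. Fault traces are assumed to be lists of (x, y) vertices; parallel and collinear segments are reported as non-intersecting, which is a simplification. The function names are hypothetical.

```python
def segment_intersection(p1, p2, q1, q2):
    """Return the intersection point of segments p1-p2 and q1-q2, or None."""
    (x1, y1), (x2, y2) = p1, p2
    (x3, y3), (x4, y4) = q1, q2
    # Cross product of the two direction vectors; zero means parallel.
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # simplification: overlapping collinear segments ignored
    # Parameters of the intersection along each segment (0..1 = on the segment).
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def fault_intersections(fault_a, fault_b):
    """Compare every adjacent vertex pair of one fault with every pair of the other."""
    points = []
    for i in range(len(fault_a) - 1):
        for j in range(len(fault_b) - 1):
            point = segment_intersection(
                fault_a[i], fault_a[i + 1], fault_b[j], fault_b[j + 1]
            )
            if point is not None:
                points.append(point)
    return points


fault_a = [(0, 0), (2, 2), (4, 2)]
fault_b = [(0, 2), (2, 0), (5, 0)]
crossings = fault_intersections(fault_a, fault_b)
```

The cost grows with the product of the segment counts of each pair of faults, which is why an explicit node table is attractive for this kind of query.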
Different structures are used for different tasks, depending on which are the most efficient and most suitable. The raster structure is particularly efficient for the overlay of multiple data layers (image processing); raster images occupy large amounts of storage space. The spaghetti structure is efficient for displaying objects by their boundaries (cartography). The topological structure facilitates searches that require adjacency, containment, and connectivity information, because these relationships are explicitly stored and separated from the spatial coordinates (geometry).
Raster model: The generalization from 2d pixels to 3d voxels preserves representation by enumeration, and a rudimentary topology of pixels and voxels, respectively, but not of geoobjects.
Vector model: The boundary representation generalizes from polygons given by ordered 0d vertices connecting to 1d edges to polyhedra given by 0d vertices connecting to 1d edges and 2d faces. Topology is a major challenge (GMaps).