Unit 2 Data Management and Processing System
• Data processing refers to performing specific
operations on a set of data or a database.
A database is an organized collection of facts and information, such as
records on employees, inventory, customers, and potential customers. As
these examples suggest, numerous forms of data processing exist and
serve diverse applications.
Data processing is primarily performed on information systems, a
broad concept that encompasses computer systems and related
devices. At its core, an information system consists of input,
processing, and output. In addition, an information system
provides for feedback from output to input.
The input mechanism (such as a keyboard, scanner,
microphone, or camera) gathers and captures raw data and
can be either manual or automated.
Processing, which also can be accomplished manually or
automatically, involves transforming the data into useful
outputs. This can involve making comparisons, taking
alternative actions, and storing data for future use.
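The input, processing, output, and feedback stages described above can be sketched as a tiny loop in plain Python (an illustrative toy with hypothetical names, not any particular information system):

```python
def run_information_system(raw_inputs, process):
    """Process each raw input; feed each output back as context for the next."""
    outputs, feedback = [], None
    for raw in raw_inputs:
        data = (raw, feedback)      # input stage: capture raw data plus feedback
        result = process(data)      # processing stage: transform into useful output
        outputs.append(result)      # output stage
        feedback = result           # feedback loop: output informs the next input
    return outputs

# Usage: a trivial "process" that averages a reading with the previous output.
readings = [10.0, 14.0, 18.0]
smoothed = run_information_system(
    readings,
    lambda d: d[0] if d[1] is None else (d[0] + d[1]) / 2,
)
```

The feedback parameter is what distinguishes this loop from a simple one-way pipeline: each output can influence how the next input is handled.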
A relational DBMS would follow the relational model, for example. The
functions of a DBMS include data storage and retrieval, database
modifications, data manipulation, and report generation.
• Many GIS applications are implemented within organizations, where they play an important role.
The effectiveness of GIS projects therefore depends as much on forward planning of hardware
and software requirements as on the technicalities of implementation. It is therefore necessary to
address how a GIS can succeed in an organization and continue to do so for
many years before hardware and software upgrades are needed.
Hardware Requirement
Adequate hardware will make all of your work go faster, not just a single
GIS package such as Manifold. As always, keep in mind that to use large
amounts of RAM effectively you should be running a 64-bit version of
Windows, such as Windows Vista x64, Windows Server 2008 x64, or Windows XP x64.
Software Requirement
The following is a list of GIS application packages from which one can
choose. The choice depends on the needs of the organization, the
functionality desired, the budget available, and the period for which the
planning is being done. One may need to compare costs and benefits
(both of which change rapidly) before making a final decision.
Most widely used notable proprietary software
applications and providers:
Hydro GeoAnalyst – Environmental data management and visualization software
by Schlumberger Water Services.
Autodesk – Products include MapGuide and other products that interface with its
flagship AutoCAD software package.
Cadcorp – Developers of GIS software supporting OpenGIS standards (e.g. read/write
access to the open-source PostGIS database).
Intergraph – Products include GeoMedia, GeoMedia Professional, GeoMedia
WebMap, and add-on products for industry sectors, as well as photogrammetry.
ERDAS IMAGINE – A proprietary GIS, Remote Sensing, and Photogrammetry
software developed by ERDAS, Inc.
• ESRI – Products include ArcView 3.x, ArcGIS, ArcSDE,
ArcIMS, and ArcWeb services.
• IDRISI – Proprietary GIS product developed by Clark Labs.
• MapInfo – Products include MapInfo Professional and
MapXtreme; MapInfo integrates GIS software, data, and services.
• MapPoint – Proprietary GIS product developed by
Microsoft.
• GISNet – A web-based GIS system developed by
Geosystems Corporation.
• Caliper – Products include Maptitude, TransCAD, and
TransModeler. Develops general-purpose GIS as well as GIS
designed specifically for transportation.
• Pictometry – Proprietary software that allows oblique
images to be draped with shapefiles. Many government applications
(fire, 911, police, assessor) as well as commercial uses.
• Black Coral Inc – A company developing geospatial
collaboration capabilities that enable better outcomes
for personnel and tactical teams operating in emergency
response and military environments.
2.2 GIS data creation and organization system
If you have a table of x,y coordinates, such as GPS measurements, you can add
it to ArcMap to create a new point layer (known as an x, y event layer). If you
want to make that layer permanent, you can export it from ArcMap or create a
new point feature class in ArcCatalog from the data.
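The idea behind an x,y event layer can be sketched in plain Python (illustrative only; `table_to_points` is a hypothetical helper, not an ArcMap function): each table row carrying coordinate fields becomes a point feature, with the remaining columns kept as attributes.

```python
def table_to_points(rows, x_field="x", y_field="y"):
    """Build simple point features from table rows that carry coordinates."""
    features = []
    for oid, row in enumerate(rows):
        # Everything except the coordinate fields becomes an attribute.
        attrs = {k: v for k, v in row.items() if k not in (x_field, y_field)}
        features.append({"id": oid,
                         "geometry": (row[x_field], row[y_field]),
                         "attributes": attrs})
    return features

# Usage: a tiny table of GPS measurements with one attribute column.
gps_table = [
    {"x": 77.59, "y": 12.97, "site": "A"},
    {"x": 77.61, "y": 12.99, "site": "B"},
]
points = table_to_points(gps_table)
```

Exporting such a layer to a permanent feature class would, in a real workflow, be done with ArcMap or ArcCatalog as the text describes.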
• With a table of addresses, you can also add the
table to ArcMap and use it to create new point
features that represent the addresses. This is
known as geocoding.
Survey data can be directly entered into a GIS from digital data collection systems on survey
instruments using a technique called coordinate geometry (COGO). Positions from a global
navigation satellite system (GNSS) like Global Positioning System can also be collected and
then imported into a GIS.
A current trend in data collection gives users the ability to use field
computers to edit live data over wireless connections or in
disconnected editing sessions.
• This has been enhanced by the availability of low-cost mapping-grade GPS units
with decimeter accuracy in real time, which eliminates the need to post-process,
import, and update the data in the office after fieldwork.
This includes the ability to incorporate positions collected using a laser rangefinder.
New technologies also allow users to create maps as well as perform analysis directly
in the field, making projects more efficient and mapping more accurate.
Remotely sensed data also plays an important role in data collection; such systems
consist of sensors attached to a platform. Sensors include cameras, digital scanners, and lidar,
while platforms usually consist of aircraft and satellites. In England in the mid-1990s,
hybrid kite/balloons called helikites pioneered the use of compact airborne
digital cameras as airborne geo-information systems.
Aircraft measurement software, accurate to 0.4 mm, was used to link
the photographs and measure the ground. Helikites are inexpensive
and can gather more accurate data than aircraft. Helikites can be used
over roads, railways, and towns where unmanned aerial vehicles
(UAVs) are banned.
• Recently, aerial data collection has become possible with miniature
UAVs. For example, the Aeryon Scout was used to map a 50-acre area
with a ground sample distance of 1 inch (2.54 cm) in only 12 minutes.
• The majority of digital data currently comes from photo interpretation
of aerial photographs. Soft-copy workstations are used to digitize
features directly from stereo pairs of digital photographs.
• These systems allow data to be captured in two and three dimensions,
with elevations measured directly from a stereo pair using principles
of photogrammetry.
• Analog aerial photos must be scanned before being entered into a
soft-copy system; for high-quality digital cameras this step is skipped.
• Satellite remote sensing provides another important source of
spatial data. Here satellites use different sensor packages to
passively measure the reflectance from parts of the
electromagnetic spectrum or radio waves that were sent out
from an active sensor such as radar. Remote sensing collects
raster data that can be further processed using different bands
to identify objects and classes of interest, such as land cover.
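As an example of processing raster bands to identify classes of interest, the widely used Normalized Difference Vegetation Index (NDVI) combines red and near-infrared reflectance. A minimal sketch in plain Python, operating on nested lists rather than a real raster library:

```python
def ndvi(red, nir):
    """Compute NDVI = (NIR - Red) / (NIR + Red) per cell of two raster bands.
    Bands are given as equal-shaped nested lists of reflectance values."""
    out = []
    for red_row, nir_row in zip(red, nir):
        out.append([(n - r) / (n + r) if (n + r) else 0.0
                    for r, n in zip(red_row, nir_row)])
    return out
```

High NDVI values indicate vegetated cover, which is why the index is a common first step in land-cover classification.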
• When data is captured, the user should consider whether the data
should be captured with relative accuracy or absolute accuracy,
since this influences not only how the information will be
interpreted but also the cost of data capture.
• After entering data into a GIS, the data usually requires editing
to remove errors, or further processing.
• Vector data must be made "topologically
correct" before it can be used for some advanced
analysis. For example, in a road network, lines
must connect with nodes at an intersection. Errors
such as undershoots and overshoots must also be
removed. For scanned maps, blemishes on the
source map may need to be removed from the
resulting raster; for example, a fleck of dirt might
connect two lines that should not be connected.
• Metadata is an important but unfortunately often-overlooked component of GIS data.
Metadata is ‘data about the data’, and it is vital to understanding the source, currency,
scale, and appropriateness of GIS data.
• Metadata-L
The main objective of this list is to provide the GIS community with an open forum to
share ideas and discuss metadata issues and strategies
• National States GIC
A primer on what metadata is and how to use it by the National States Geographic
Information Council (GIC).
• NBII
Standards available for downloading for the National Biological Information Infrastructure.
• USGS Metadata FAQ
FAQ compiled by Peter Schweitzer of the USGS.
• USGS Metadata Tool Page
Lists some metadata tools for download that comply with FGDC standards.
Using geoprocessing to add data
• With geoprocessing, you can use tools to update existing fields, append records
to a table permanently, or append fields to a table dynamically with a join.
The Append tool
• Use this tool to add new features or other data from multiple datasets into an
existing dataset. This tool can append point, line, or polygon feature classes,
tables, rasters, raster catalogs, annotation feature classes, or dimensions feature
classes into an existing dataset of the same type. For example, several tables can
be appended to an existing table, or several rasters can be appended to an
existing raster dataset, but a line feature class cannot be appended to a point
feature class.
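The like-to-like rule described above can be sketched for plain attribute tables in Python (`append_records` is a hypothetical helper; the real Append geoprocessing tool also handles feature classes, rasters, and annotation):

```python
def append_records(target, *datasets):
    """Append rows from several same-schema tables into an existing table.
    Raises if a dataset's fields differ from the target's (like to like only)."""
    schema = set(target[0]) if target else None
    for dataset in datasets:
        for row in dataset:
            if schema is not None and set(row) != schema:
                raise ValueError("schema mismatch: can only append like to like")
            target.append(dict(row))   # copy so the source tables stay intact
    return target

# Usage: merge two point tables into an existing one.
sites = [{"id": 1, "name": "a"}]
append_records(sites, [{"id": 2, "name": "b"}], [{"id": 3, "name": "c"}])
```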
The Calculate Field tool
• Calculates the values of a field for a feature class, feature layer, or raster catalog.
• The Calculate Field tool is great for updating either existing fields or newly
created fields. You can calculate numbers, text, or date values into a field. Using
code blocks, you can write scripts to perform advanced calculations.
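The same pattern, writing a computed value into a field of every row, looks like this in plain Python (a sketch with a hypothetical `calculate_field` helper, not the ArcGIS tool itself):

```python
def calculate_field(rows, field, expression):
    """Write expression(row) into `field` for every row of a table.
    Works for existing fields and newly created ones alike."""
    for row in rows:
        row[field] = expression(row)
    return rows

# Usage: derive a square-kilometre area field from a square-metre one.
parcels = [{"area_m2": 2_500_000}, {"area_m2": 750_000}]
calculate_field(parcels, "area_km2", lambda r: r["area_m2"] / 1e6)
```

The `expression` callable plays the role of the tool's expression or code block: it can compute numbers, text, or dates from the other fields of the row.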
The Add Join tool
• The Add Join tool will append the fields from the join table to a base table.
• Typically, you'll join a table of data to a layer based on the value of a field that can be found in both tables. The
name of the field does not have to be the same, but the data type has to be the same; you join numbers to
numbers, strings to strings, and so on. You can perform a join with either the Join Data dialog box, accessed by
right-clicking a layer in ArcMap, or the Add Join tool.
• When you create a joined table, the fields that are appended from the join table are not permanently attached
to the base table. You can remove a join to remove the appended fields.
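Join semantics, matching on a shared key value (with differently named key fields allowed) while keeping the appended fields separate from the base table, can be sketched in plain Python (a hypothetical helper, not the Add Join tool):

```python
def join_tables(base, join, base_key, join_key):
    """Match each base row to a join row on key value; return new merged rows.
    Key field names may differ, but the key values must be of the same type."""
    lookup = {row[join_key]: row for row in join}
    joined = []
    for row in base:
        extra = lookup.get(row[base_key], {})   # unmatched rows get no new fields
        merged = dict(row)                      # base table is left untouched
        merged.update({k: v for k, v in extra.items() if k != join_key})
        joined.append(merged)
    return joined

# Usage: join an owners table to a parcels table on the owner identifier.
parcels = [{"parcel": 1, "owner_id": "A"}, {"parcel": 2, "owner_id": "Z"}]
owners = [{"oid": "A", "name": "Smith"}]
rows = join_tables(parcels, owners, base_key="owner_id", join_key="oid")
```

Because the base rows are copied rather than modified, dropping the joined result recovers the original table, mirroring how removing a join removes the appended fields.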
Using joins to add data
• You don't have to use geoprocessing to create a join. You can use the Join Data dialog box in ArcMap to create a
join. Whether created using the Join Data dialog box or with a geoprocessing tool, the join will behave in the
exact same way. Join fields are appended to the base table (or target table), and the appended fields can be
used in field calculations or labeling, symbolizing, or querying the data.
Adding data by editing
• Editing data to update existing attributes or to create new data is a process that can be done in ArcMap. You
can edit attributes through the attribute table of a layer or table or by using the Attributes window.
The Attribute table
• When you open an attribute table, the default view of the table is read-only. However, if you start an edit
session, you can manually edit the attributes in the cells of the table. When you are editing in the attribute
table, a blank row is added to the bottom of the table where you can add new data to the table.
• To make automated edits in the attribute table, you can use the Field Calculator or Calculate Geometry tools to
update the table. You can use these tools outside an edit session; however, you will not be able to undo your
calculations unless you are in an edit session.
Manual Digitizing
• This is traditionally the most common way to convert paper-based sources of spatial
information (e.g. maps) to digital data. The paper map is attached by tape to a digitizing
table (or tablet, as the smaller digitizers are known). Usually between four and six initial
control points with known coordinates are logged. Optimally these points are locations
such as the intersections of graticule lines. In the absence of an overlying grid system,
points are taken from identifiable locations such as street intersections or landmarks.
• The data is then digitized by tracing the features of interest with a mouse-like handheld
device called a puck. Once all the features are traced, the newly acquired data is
transformed from table units (the coordinates of the digitizing table) to real-world units
using an algorithm. This algorithm takes the known table coordinates of the initial
points and warps the data to match the real-world coordinates assigned to those points.
• The error in the adjustment from table units to real-world coordinates is called the
RMS error. Results are reported as root-mean-square (RMS) error and average error. The
RMS value reflects the range of the error, i.e. the precision of the digitized data.
• Factors contributing to this error include human error, shrinkage or physical alteration of
the paper map, and projection differences.
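The table-to-world adjustment and its RMS error can be sketched as a least-squares affine fit in plain Python (an illustration of the idea, not any vendor's algorithm; the function names are hypothetical):

```python
import math

def fit_affine(table_pts, world_pts):
    """Least-squares 6-parameter affine fit from control points:
    X = a*x + b*y + c,  Y = d*x + e*y + f."""
    def solve3(A, b):
        # Gaussian elimination with partial pivoting for a 3x3 system.
        M = [row[:] + [b[i]] for i, row in enumerate(A)]
        for col in range(3):
            pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
            M[col], M[pivot] = M[pivot], M[col]
            for r in range(col + 1, 3):
                f = M[r][col] / M[col][col]
                for c in range(col, 4):
                    M[r][c] -= f * M[col][c]
        x = [0.0, 0.0, 0.0]
        for r in (2, 1, 0):
            x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
        return x

    # Build the normal equations from the control points.
    N = [[0.0] * 3 for _ in range(3)]
    bx, by = [0.0] * 3, [0.0] * 3
    for (x, y), (X, Y) in zip(table_pts, world_pts):
        v = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                N[i][j] += v[i] * v[j]
            bx[i] += v[i] * X
            by[i] += v[i] * Y
    a, b, c = solve3(N, bx)
    d, e, f = solve3(N, by)
    return (a, b, c, d, e, f)

def rms_error(params, table_pts, world_pts):
    """Root-mean-square residual of the fitted transform at the control points."""
    a, b, c, d, e, f = params
    sq = 0.0
    for (x, y), (X, Y) in zip(table_pts, world_pts):
        dx = a * x + b * y + c - X
        dy = d * x + e * y + f - Y
        sq += dx * dx + dy * dy
    return math.sqrt(sq / len(table_pts))
```

With four or more well-spread control points the fit is over-determined, and a large RMS value flags problems such as map shrinkage or mis-logged points.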
Scanning
• With the proliferation of low cost sources of digital imagery and
large-format scanners, heads-up digitizing is becoming a popular
method of digital conversion. Also known as on-screen digitizing,
this method involves digitizing directly on top of an orthorectified
image such as a satellite image or an aerial photograph. The
features of interest are traced from the image. The benefit of this
over manual digitizing is that no transformation is needed to
convert the data into the needed projection. In addition, the level
of accuracy of the derived dataset is taken from the initial
accuracy of the digital image. Heads-up digitizing is also utilized in
extracting data from scanned and referenced maps.
Coordinate Geometry (COGO)
• Coordinate geometry is a keyboard-based method of
spatial data entry, most commonly used to enter
cadastral or land-record data. The method is highly
precise because the database is created from the
actual survey measurements of the property lines.
Distances and bearings are entered into the GIS from
the original surveyor plats, and the GIS software
builds the digital vector file from these values.
• Geocoding: Geocoding is another keyboard-based
method. Geocoding uses addresses from a flat file
(such as a .dbf file, MS Access database, or Excel
spreadsheet) to create x,y coordinate locations
interpolated from a geocodable spatial database.
• These spatial databases are most commonly
street centerline files but can be other types.
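Address interpolation along a street centerline segment can be sketched as follows (plain Python; the segment layout with an address range and endpoint coordinates is a simplified stand-in for a real street centerline file):

```python
def geocode(house_no, segment):
    """Interpolate an x,y position for a house number along a street segment,
    assuming addresses are evenly spaced between the segment's address range."""
    lo, hi = segment["from_addr"], segment["to_addr"]
    t = (house_no - lo) / (hi - lo)            # fraction along the segment
    (x0, y0), (x1, y1) = segment["from_xy"], segment["to_xy"]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

# Usage: house number 150 on a block addressed 100-200.
block = {"from_addr": 100, "to_addr": 200,
         "from_xy": (0.0, 0.0), "to_xy": (100.0, 0.0)}
location = geocode(150, block)
```

Real geocoders additionally match the street name, handle odd/even address parity per side of the street, and offset the point from the centerline.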
Global Positioning Systems (GPS)
• GPS is a way to gather accurate linear and point location data. Originally
devised in the 1970s by the Department of Defense for military purposes,
the current GPS consists of 28 satellites that orbit the earth, transmitting
navigational signals. Signals received by a data logger allow it to pinpoint
the holder’s location. Depending on the unit, the locational accuracy can
reach the millimeter. Combined with attribute data entered at the time of
collection, GPS is a rapid and accurate method of data collection. For more
information, access Trimble’s excellent tutorial on GPS.
Image Processing
• Geodatasets can be derived from digital imagery. Most commonly, satellite
imagery is used in a process called supervised classification, in which the
user selects a sampling of pixels whose type (vegetation species, land use,
etc.) is known. Using a classification algorithm, remote sensing software
such as ERDAS or ENVI classifies a digital image into these named categories
based on the sample pixels. In contrast to the other methods discussed,
supervised classification results in a raster dataset.
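The essence of supervised classification, class means learned from user-selected sample pixels followed by a minimum-distance assignment of every other pixel, can be sketched in plain Python (a toy minimum-distance classifier, not the algorithms used by ERDAS or ENVI):

```python
import math

def train_means(samples):
    """Compute per-class mean spectra from (band_values, label) training pixels."""
    sums, counts = {}, {}
    for bands, label in samples:
        acc = sums.setdefault(label, [0.0] * len(bands))
        for i, value in enumerate(bands):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def classify(pixel, means):
    """Assign a pixel to the class whose mean spectrum is nearest."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

# Usage: two bands (red, NIR); water is dark in NIR, vegetation is bright.
training = [((0.10, 0.05), "water"), ((0.12, 0.07), "water"),
            ((0.10, 0.60), "vegetation"), ((0.14, 0.70), "vegetation")]
means = train_means(training)
```

Running `classify` over every cell of an image yields the raster of named categories that the text describes.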
2.3 Spatial and non-spatial data base management systems
1. Spatial Data
• Data that define a location. These are in the form of graphic primitives that are
usually points, lines, polygons, or pixels.
• Spatial data includes location, shape, size, and orientation.
o For example, consider a particular square: its center (the intersection of its
diagonals) specifies its location; its shape is a square; the length of one of its
sides specifies its size; and the angle its diagonals make with, say, the x-axis
specifies its orientation.
• Spatial data includes spatial relationships. For example, the arrangement
of ten bowling pins is spatial data.
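The square example can be made concrete: given the four corner points in order, its location, size, and orientation fall out directly (plain Python sketch; the function name is hypothetical):

```python
import math

def square_spatial_properties(corners):
    """From a square's four corner points (in order around the square), derive
    its location (centre), size (side length), and orientation (angle of a
    diagonal with the x-axis, in degrees)."""
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0          # centre = mean of corners
    side = math.dist(corners[0], corners[1])        # size = one side's length
    dx = corners[2][0] - corners[0][0]              # diagonal: corner 0 -> 2
    dy = corners[2][1] - corners[0][1]
    angle = math.degrees(math.atan2(dy, dx))        # orientation
    return (cx, cy), side, angle

# Usage: the unit square with corners listed counter-clockwise.
centre, side, angle = square_spatial_properties(
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
```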
2. Non-spatial Data
• Data that relate to a specific, precisely defined location. The data are often
statistical but may be text, images, or multimedia. These are linked in the GIS
to spatial data that define the location.
• Non-spatial data (also called attribute or characteristic data) is information
that is independent of all geometric considerations.
o For example, a person’s height, mass, and age are non-spatial data
because they are independent of the person’s location.
• It is interesting to note that, while mass is non-spatial data, weight is
spatial data in the sense that something’s weight is very much
dependent on its location.
• It is possible to ignore the distinction between spatial and non-spatial data.
However, there are fundamental differences between them:
o spatial data are generally multi-dimensional and autocorrelated;
o non-spatial data are generally one-dimensional and independent.
2.4 Data quality and sources of error in GIS
• Data used in GIS projects for visualization, analysis,
compilation, and sharing should meet a defined standard
for quality.
• Data quality requirements vary from project to project.
Requirements for how accurate or complete a dataset
needs to be are based on how the data will be used.
• Requirements are also influenced by technical, product,
and client requirements. ArcGIS Pro has a data quality
management system that provides tools to systematically
perform quality assurance and quality control (QA/QC) on
your GIS data, efficiently work through the issues
identified, and generate quality-related reports.
• Quality can simply be defined as fitness for use of a
specific data set. Data that is appropriate for use with one
application may not be fit for use with another.
• It is fully dependent on the scale, accuracy, and extent of the
data set, as well as the quality of other data sets to be used.
• The recent U.S. Spatial Data Transfer Standard (SDTS)
identifies five components to data quality definitions.
– Lineage
– Positional Accuracy
– Attribute Accuracy
– Logical Consistency
– Completeness
Lineage
• The lineage of data is concerned with historical
and compilation aspects of the data, such as:
the source of the data; the content of the data;
data capture specifications; the geographic coverage
of the data; the compilation method of the data
(e.g. digitized versus scanned); transformation
methods applied to the data; and the use of any
pertinent algorithms during compilation, e.g.
linear simplification, feature generalization.
Positional Accuracy
• The identification of positional accuracy is important.
This includes consideration of inherent error (source
error) and operational error (introduced error).