
UNIT 2 DATA MANAGEMENT AND PROCESSING SYSTEM (5)
• Data processing refers to the process of
performing specific operations on a set of data
or a database.
A database is an organized collection of facts and information, such as
records on employees, inventory, customers, and potential customers. As
these examples suggest, numerous forms of data processing exist and
serve diverse applications.
Data processing is primarily performed on information systems, a
broad concept that encompasses computer systems and related
devices. At its core, an information system consists of input,
processing, and output. In addition, an information system
provides for feedback from output to input.
The input mechanism (such as a keyboard, scanner,
microphone, or camera) gathers and captures raw data and
can be either manual or automated.
Processing, which also can be accomplished manually or
automatically, involves transforming the data into useful
outputs. This can involve making comparisons, taking
alternative actions, and storing data for future use.

Output typically takes the form of reports and documents that are used by managers. Feedback is used to make necessary adjustments to the input and processing stages of the information system.
Characteristics of data
• Accurate – accurate information is free from error.
• Complete – complete information contains all of the important facts.
• Simple – information should be simple to find and understand.
• Economical – information should be relatively inexpensive to produce.
• Flexible – flexible information can be used for a variety of purposes, not just one.
• Timely – timely information is readily available when needed.
• Reliable – reliable information is dependable information.
• Relevant – relevant information is important to the decision-maker.
• Verifiable – verifiable information can be checked to make sure it is accurate.
DATA ORGANIZATION
• Data organization is critical to optimal data use. Consequently, it is
important to organize data in such a manner as to reflect business
operations and practices.
• As such, careful consideration should be given to content, access, logical
structure, and physical organization. Content refers to what data will be
collected. Access refers to which users the data are provided to, and when
that is appropriate. Logical structure refers to how the data will be
arranged. Physical structure refers to where the data will be located.
• One tool that database designers use to show the logical relationships
among data is a data model, which is a map or diagram of entities and
their relationships. Consequently, data modeling requires a thorough
understanding of business practices and what kind of data and
information is needed.
DATABASE MANAGEMENT SYSTEMS
• As indicated previously, a database management system (DBMS) is a group
of programs used as an interface between a database and an applications
program. DBMSs are classified by the type of database model they support.

A relational DBMS would follow the relational model, for example. The
functions of a DBMS include data storage and retrieval, database
modifications, data manipulation, and report generation.

A data definition language (DDL) is a collection of instructions and commands used to define and describe data and data relationships in a particular database.
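The DDL and storage/retrieval functions described above can be sketched in Python using the standard-library sqlite3 module. This is only an illustration: the table names, fields, and values below are hypothetical, and SQLite stands in for the full (often spatial) DBMS a real GIS would use.

```python
import sqlite3

# In-memory database for illustration only; a real GIS attribute
# store would usually be a spatial DBMS such as PostGIS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the data and the relationship between the tables.
cur.execute("""
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        owner     TEXT NOT NULL,
        area_m2   REAL
    )
""")
cur.execute("""
    CREATE TABLE building (
        building_id INTEGER PRIMARY KEY,
        parcel_id   INTEGER REFERENCES parcel(parcel_id),
        floors      INTEGER
    )
""")

# The DBMS also handles storage, retrieval and modification.
cur.execute("INSERT INTO parcel VALUES (1, 'A. Sharma', 450.0)")
cur.execute("INSERT INTO building VALUES (10, 1, 2)")
row = cur.execute(
    "SELECT p.owner, b.floors FROM parcel p "
    "JOIN building b ON p.parcel_id = b.parcel_id"
).fetchone()
print(row)  # ('A. Sharma', 2)
```

The CREATE TABLE statements are the DDL; the INSERT and SELECT statements exercise the storage, retrieval and report-generation functions listed above.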
2.1 Hardware and software requirements

• Many GIS applications are implemented within organizations, where they have to play an important role.
The effectiveness of GIS projects therefore depends as much on forward-looking planning of hardware
and software requirements as on the technicalities of implementation. It is therefore necessary to
address how a GIS can be successful in an organization and continue to be so for
many years before hardware and software upgrades are needed.

GIS projects are successful because they help the organisation perform better in society. But it is also true that organisational issues contribute to the long-term success of a GIS project. In other words, GIS and the host organisation have a close, two-way relationship.

It is neither possible nor desirable to recommend here a generalized requirement in terms of hardware and software for a GIS program.
Hardware Requirement
• It is obvious that hardware and software requirements vary considerably depending
on the tasks undertaken.  The following minimum configuration allows installation of
most modern GIS applications for work with small components. Recommended
configurations are noted in parentheses for work with anything other than small
drawings.
2.8 GHz Pentium IV (PIV) true PC compatible (dual-core processor recommended).
1 GB RAM (4 GB or greater recommended).
800 x 600 SVGA (Super Video Graphics Array) display (1280 x 1024 or greater recommended).
250 MB hard disk free space (gigabytes of free space recommended).
Windows 2000, Windows XP, Windows Server 2003, Windows Server 2008 or Vista with
most recent service pack, in standard 32-bit versions or in 64-bit versions. (Windows XP
or greater recommended).
Internet Explorer 6 or most recent IE version plus most recent service pack.
Microsoft's .NET Framework 2.0 or more recent.
IIS 5.1 or greater to operate IMS.
For very large tasks, such as intensive web server applications using IMS, investing in a dual-socket machine with quad-core processors (eight cores in total) may be considered. Various processes within modern GIS applications can use multiple processors.
• Memory Requirements for Large Projects - Manifold is designed for
an era where RAM memory is cheap and personnel costs are high.
Undo and many other user-friendly features require a lot of memory
to implement.
For best performance we recommend installing lots of RAM and having plenty of free disk
space for temporary files. RAM is so cheap in modern times it is "penny wise and pound
foolish" not to load up your system with the maximum amount of RAM it can hold.

This will help all your work go faster, not just Manifold. As always, keep in mind that to use lots of RAM effectively you should be running 64-bit Windows such as Windows Vista x64, Windows Server 2008 x64 or Windows XP x64.
Software Requirement
The following is a list of GIS application packages from which one can
choose. The choice depends on the needs of the organization, the
functionality desired, the money available, and the period for which the
planning is being done. One may need to compare costs and benefits
(both of which change rapidly) before making a final decision.
Most widely used notable proprietary software
applications and providers:
Hydro GeoAnalyst – Environmental data management and visualization software
by Schlumberger Water Services.
Autodesk – Products include MapGuide and other products that interface with its
flagship AutoCAD software package.
Cadcorp – Developers of GIS software supporting OpenGIS standards (e.g. read/write access to the open-source PostGIS database).
Intergraph – Products include GeoMedia, GeoMedia Professional, GeoMedia
WebMap, and add-on products for industry sectors, as well as photogrammetry.
ERDAS IMAGINE – A proprietary GIS, Remote Sensing, and Photogrammetry
software developed by ERDAS, Inc.
• ESRI – Products include ArcView 3.x, ArcGIS, ArcSDE,
ArcIMS, and ArcWeb services.
• IDRISI – Proprietary GIS product developed by Clark Labs.
• MapInfo – Products include MapInfo Professional and
MapXtreme; integrates GIS software, data and services.
• MapPoint – Proprietary GIS product developed by
Microsoft.
• GISNet – A web-based GIS system developed by
Geosystems Corporation.
• Caliper – Products include Maptitude, TransCAD and
TransModeler. Develops general-purpose GIS as well as
the only GIS designed specifically for transportation.
• Pictometry – Proprietary software which allows oblique
images to be draped with shapefiles. Many gov't applications
(fire, 911, police, assessor) as well as commercial.
• Black Coral Inc – a product company developing
geospatial collaboration capabilities that enable
better outcomes for personnel and tactical teams operating
in emergency response and military environments.
2.2 GIS data creation and organization system

• You can turn some types of tabular data into geographic data. For example, if you have a table or text file containing spatial positions and attributes, you can create a layer or new feature class from the data in the table.

If you have a table of x,y coordinates, such as GPS measurements, you can add
it to ArcMap to create a new point layer (known as an x, y event layer). If you
want to make that layer permanent, you can export it from ArcMap or create a
new point feature class in ArcCatalog from the data.
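The x,y event-layer idea above can be sketched in plain Python. The coordinates and station names below are hypothetical; a real workflow would use ArcMap itself or a spatial library, but the core step is just reading each x,y record into a point feature.

```python
import csv
import io
from typing import NamedTuple

class PointFeature(NamedTuple):
    x: float
    y: float
    attributes: dict

# Hypothetical table of GPS measurements; in ArcMap this would be
# added as an x,y event layer and exported to a feature class.
table = io.StringIO(
    "x,y,station\n"
    "85.32,27.71,Kathmandu\n"
    "83.98,28.21,Pokhara\n"
)

features = []
for rec in csv.DictReader(table):
    features.append(PointFeature(float(rec["x"]), float(rec["y"]),
                                 {"station": rec["station"]}))

print(len(features))                      # 2
print(features[0].attributes["station"])  # Kathmandu
```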
• With a table of addresses, you can also add the
table to ArcMap and use it to create new point
features that represent the addresses. This is
known as geocoding.

In addition, you can create a layer from a table of the positions of certain phenomena along a route. A table like this, known as a route event table, might describe speed limits or pavement conditions along a road.
• Field-map technology hardware is used mainly for forest
inventories, monitoring and mapping.
Data capture—entering information into the
system—consumes much of the time of
GIS practitioners.
There are a variety of methods used to enter data into a GIS, where it is stored in a digital format.
• Existing data printed on paper or PET film maps can be digitized or
scanned to produce digital data. A digitizer produces vector data as an
operator traces points, lines, and polygon boundaries from a map.
Scanning a map results in raster data that could be further processed to
produce vector data.

Survey data can be directly entered into a GIS from digital data collection systems on survey
instruments using a technique called coordinate geometry (COGO). Positions from a global
navigation satellite system (GNSS) like Global Positioning System can also be collected and
then imported into a GIS.

A current trend in data collection gives users the ability to utilize field
computers with the ability to edit live data using wireless connections
or disconnected editing sessions.
• This has been enhanced by the availability of low-cost mapping-grade GPS units
with decimeter accuracy in real time. This eliminates the need to post-process,
import, and update the data in the office after fieldwork.

This includes the ability to incorporate positions collected using a laser rangefinder.
New technologies also allow users to create maps as well as analysis directly in the
field, making projects more efficient and mapping more accurate.

Remotely sensed data also plays an important role in data collection; it consists of
sensors attached to a platform. Sensors include cameras, digital scanners and lidar,
while platforms usually consist of aircraft and satellites. In England in the mid 1990s,
hybrid kite/balloons called helikites first pioneered the use of compact airborne
digital cameras as airborne geo-information systems.
Aircraft measurement software, accurate to 0.4 mm, was used to link
the photographs and measure the ground. Helikites are inexpensive
and gather more accurate data than aircraft. Helikites can be used
over roads, railways and towns where unmanned aerial vehicles
(UAVs) are banned.
• Recently, aerial data collection has become possible with miniature
UAVs. For example, the Aeryon Scout was used to map a 50-acre area
with a ground sample distance of 1 inch (2.54 cm) in only 12 minutes.
• The majority of digital data currently comes from photo interpretation
of aerial photographs. Soft-copy workstations are used to digitize
features directly from stereo pairs of digital photographs.
• These systems allow data to be captured in two and three dimensions,
with elevations measured directly from a stereo pair using principles
of photogrammetry.
• Analog aerial photos must be scanned before being entered into a
soft-copy system; with high-quality digital cameras this step is skipped.
• Satellite remote sensing provides another important source of
spatial data. Here satellites use different sensor packages to
passively measure the reflectance from parts of the
electromagnetic spectrum or radio waves that were sent out
from an active sensor such as radar. Remote sensing collects
raster data that can be further processed using different bands
to identify objects and classes of interest, such as land cover.
• When data is captured, the user should consider whether the data
should be captured with relative accuracy or absolute accuracy, since
this can influence not only how information will be interpreted but
also the cost of data capture.
• After entering data into a GIS, the data usually requires editing,
to remove errors, or further processing.
• Vector data must be made "topologically correct" before it can be used for some advanced analysis. For example, in a road network, lines
must connect with nodes at an intersection. Errors
such as undershoots and overshoots must also be
removed. For scanned maps, blemishes on the
source map may need to be removed from the
resulting raster. For example, a fleck of dirt might
connect two lines that should not be connected.
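Fixing undershoots of the kind described above usually means snapping line endpoints onto nearby intersection nodes. A minimal sketch, with hypothetical coordinates and a made-up tolerance, might look like this:

```python
import math

def snap_endpoints(lines, nodes, tol=0.5):
    """Snap each line endpoint to the nearest node within `tol`,
    repairing small undershoots/overshoots at intersections."""
    snapped = []
    for line in lines:
        pts = list(line)
        for i in (0, -1):  # only endpoints participate in topology
            x, y = pts[i]
            nearest = min(nodes, key=lambda n: math.hypot(n[0] - x, n[1] - y))
            if math.hypot(nearest[0] - x, nearest[1] - y) <= tol:
                pts[i] = nearest
        snapped.append(pts)
    return snapped

# A road segment that undershoots the intersection node (10, 10).
nodes = [(0.0, 0.0), (10.0, 10.0)]
roads = [[(0.0, 0.0), (9.8, 9.9)]]

fixed = snap_endpoints(roads, nodes)
print(fixed[0][-1])  # (10.0, 10.0)
```

Real GIS packages apply much richer topology rules (dangles, gaps, overlaps); this only shows the endpoint-snapping idea.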
• Metadata is an important but often overlooked component of GIS data.
Metadata is 'data about the data', and it is vital to understanding the source,
currency, scale, and appropriateness of GIS data.
• Metadata-L
The main objective of this list is to provide the GIS community with an open forum to
share ideas and discuss metadata issues and strategies
• National States GIC
A primer on what metadata is and how to use it by the National States Geographic
Information Council (GIC).
• NBII
Standards available for downloading for the National Biological Information Infrastructure.
• USGS Metadata FAQ
FAQ compiled by Peter Schweitzer of the USGS.
• USGS Metadata Tool Page
Lists some metadata tools for download that comply with FGDC standards.
Using geoprocessing to add data
• With geoprocessing, you can use tools to update existing fields, append records
to a table permanently, or append fields to a table dynamically with a join.
The Append tool
• Use this tool to add new features or other data from multiple datasets into an
existing dataset. This tool can append point, line, or polygon feature classes,
tables, rasters, raster catalogs, annotation feature classes, or dimensions feature
classes into an existing dataset of the same type. For example, several tables can
be appended to an existing table, or several rasters can be appended to an
existing raster dataset, but a line feature class cannot be appended to a point
feature class.
The Calculate Field tool
• Calculates the values of a field for a feature class, feature layer, or raster catalog.
• The Calculate Field tool is great for updating either existing fields or newly
created fields. You can calculate numbers, text, or date values into a field. Using
code blocks, you can write scripts to perform advanced calculations.
The Add Join tool
• The Add Join tool will append the fields from the join table to a base table.
• Typically, you'll join a table of data to a layer based on the value of a field that can be found in both tables. The
name of the field does not have to be the same, but the data type has to be the same; you join numbers to
numbers, strings to strings, and so on. You can perform a join with either the Join Data dialog box, accessed by
right-clicking a layer in ArcMap, or the Add Join tool.
• When you create a joined table, the fields that are appended from the join table are not permanently attached
to the base table. You can remove a join to remove the appended fields.
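The join behaviour described above (fields looked up from a join table on a key field, without permanently altering the base table) can be sketched in plain Python. The parcel numbers and owner names are hypothetical, and the field names in the two tables differ deliberately, as the text allows:

```python
# Base table: layer attributes keyed by parcel number.
base = [
    {"OBJECTID": 1, "PARCEL_NO": "P-101"},
    {"OBJECTID": 2, "PARCEL_NO": "P-102"},
]

# Join table: different field name, but the same data type (strings).
join_table = {"P-101": {"OWNER": "Rai"}, "P-102": {"OWNER": "Gurung"}}

# Appended fields are merged on demand rather than stored in the base
# table, so "removing the join" is simply dropping this derived view.
joined = [{**row, **join_table.get(row["PARCEL_NO"], {})} for row in base]

print(joined[0]["OWNER"])  # Rai
```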
Using joins to add data
• You don't have to use geoprocessing to create a join. You can use the Join Data dialog box in ArcMap to create a
join. Whether created using the Join Data dialog box or with a geoprocessing tool, the join will behave in the
exact same way. Join fields are appended to the base table (or target table), and the appended fields can be
used in field calculations or labeling, symbolizing, or querying the data.
Adding data by editing
• Editing data to update existing attributes or to create new data is a process that can be done in ArcMap. You
can edit attributes through the attribute table of a layer or table or by using the Attributes window.
The Attribute table
• When you open an attribute table, the default view of the table is read-only. However, if you start an edit
session, you can manually edit the attributes in the cells of the table. When you are editing in the attribute
table, a blank row is added to the bottom of the table where you can add new data to the table.
• To make automated edits in the attribute table, you can use the Field Calculator or Calculate Geometry tools to
update the table. You can use these tools outside an edit session; however, you will not be able to undo your
calculations unless you are in an edit session.
Manual Digitizing
• This is traditionally the most common way to convert paper-based sources of spatial
information (e.g. maps) to digital data. The paper map is attached by tape to a digitizing
table (or tablet, as the smaller digitizers are known). Usually between four and six initial points
whose coordinates are known are logged. Optimally, these points are such locations
as the intersections of graticule lines. In the absence of an overlying grid system, points
are taken from identifiable locations such as street intersections or landmarks.
• The data is then digitized by tracing the features of interest with a mouse-like
handheld device called a puck. Once all the features are traced, the newly acquired data is
transformed from table units (the coordinates of the digitizing table) to real world units
using an algorithm. This algorithm takes the known table coordinates of the initial
points and warps the data to match the real world coordinates assigned to those points.
• The error in the adjustment from table units to real-world coordinates is called the
RMS error. Results are reported as root-mean-square (RMS) error and average error. The
RMS value reflects the range of the error, i.e. the precision of the digitized data.
• Factors contributing to this error can be human error, shrinkage or physical alteration of
the paper map and projection differences.
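The RMS error described above can be computed directly: apply the fitted table-to-world transform to each control point and take the root mean square of the residuals. The control points and the transform below are hypothetical, purely to show the arithmetic:

```python
import math

def rms_error(table_pts, world_pts, transform):
    """Root-mean-square of residuals between transformed table
    coordinates and their known real-world coordinates."""
    sq = []
    for (tx, ty), (wx, wy) in zip(table_pts, world_pts):
        px, py = transform(tx, ty)
        sq.append((px - wx) ** 2 + (py - wy) ** 2)
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical control points: digitizer-table units vs. metres.
table_pts = [(1.0, 1.0), (11.0, 1.0), (11.0, 9.0), (1.0, 9.0)]
world_pts = [(500000, 3080000), (510020, 3080010),
             (510010, 3088005), (499990, 3087995)]

# An assumed fitted transform: ~1000 m per table unit plus offsets.
transform = lambda x, y: (499000 + 1000 * x, 3079000 + 1000 * y)

print(round(rms_error(table_pts, world_pts, transform), 1))  # 13.7
```

A large RMS value here would signal problems such as map shrinkage or mis-logged control points, exactly the error sources listed above.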
Scanning
• With the proliferation of low cost sources of digital imagery and
large format scanners, heads-up digitizing is becoming a popular
method of digital conversion. Also known as on-screen digitizing,
this method involves digitizing directly on top of an orthorectified
image such as a satellite image or an aerial photograph. The
features of interest are traced from the image. The benefit of this
over manual digitizing is that no transformation is needed to
convert the data into the needed projection. In addition, the level
of accuracy of the derived dataset is taken from the initial
accuracy of the digital image. Heads-up digitizing is also utilized in
extracting data from scanned and referenced maps.
Coordinate Geometry (COGO)
• Coordinate geometry is a keyboard-based method of
spatial data entry. This method is most commonly
used to enter cadastral or land record data. This
method is highly precise as entering the actual
survey measurements of the property lines creates
the database. Distances and bearings are entered
into the GIS from the original surveyor plats. The GIS
software builds the digital vector file from these
values.
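The distance-and-bearing entry described above reduces to simple trigonometry: each plat call advances the current point along a bearing for a given distance. This sketch assumes bearings measured in degrees clockwise from north, and the calls themselves are hypothetical:

```python
import math

def cogo_point(x, y, bearing_deg, distance):
    """Advance from (x, y) along a surveyor's bearing
    (degrees clockwise from north) for the given distance."""
    b = math.radians(bearing_deg)
    return (x + distance * math.sin(b), y + distance * math.cos(b))

# Hypothetical plat calls: due east for 100 units, then due
# north for 50 units, starting from the point of beginning.
p1 = cogo_point(0.0, 0.0, 90.0, 100.0)
p2 = cogo_point(*p1, 0.0, 50.0)
print(round(p1[0], 6), round(p1[1], 6))  # 100.0 0.0
print(round(p2[0], 6), round(p2[1], 6))  # 100.0 50.0
```

Chaining such calls around a parcel boundary is how the GIS builds the digital vector file from the surveyor's measurements.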
• Geocoding: Geocoding is another keyboard-
based method. Geocoding uses addresses
from a flat file (such as a .dbf file, MS Access
database or Excel spreadsheet) to create x,y
coordinate locations interpolated from a
geocodable spatial database.
• These spatial databases are most commonly
street centerline files but can be other types.
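The interpolation step at the heart of geocoding can be sketched as follows: given a street-centerline segment with a known address range, a house number is placed proportionally along the segment. The segment geometry and address range below are hypothetical:

```python
def geocode(address_no, seg_from, seg_to, range_lo, range_hi):
    """Interpolate an x,y location for a house number along a
    street-centerline segment with a known address range."""
    frac = (address_no - range_lo) / (range_hi - range_lo)
    x = seg_from[0] + frac * (seg_to[0] - seg_from[0])
    y = seg_from[1] + frac * (seg_to[1] - seg_from[1])
    return (x, y)

# Hypothetical "Main St" segment: addresses 100-200 run
# from (0, 0) to (1000, 0), so no. 150 lands halfway along.
pt = geocode(150, (0.0, 0.0), (1000.0, 0.0), 100, 200)
print(pt)  # (500.0, 0.0)
```

Production geocoders also handle left/right address ranges, parity, and fuzzy address matching; this shows only the interpolation.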
Global Positioning Systems (GPS)
• GPS is a way to gather accurate linear and point location data. Originally
devised in the 1970s by the Department of Defense for military purposes,
the current GPS consists of 28 satellites that orbit the earth, transmitting
navigational signals. Through trilateration, the signals received by a data
logger can pinpoint the holder’s location. Depending on the unit, the
locational accuracy can reach to the millimeter. Combined with attribute
data entered at the time of collection, GPS is a rapid and accurate method
of data collection. For more information access Trimble’s excellent tutorial
on GPS.
Image Processing
• Geodatasets can be derived from digital imagery. Most commonly satellite
imagery is utilized in a process called supervised classification, in which a
user selects a sampling of pixels whose type (vegetation species, land
use, etc.) is known. Using a classification algorithm,
remote sensing software such as ERDAS or ENVI classifies a digital image into
these named categories based on the sample pixels. In contrast to the other
methods discussed, supervised classification results in a raster dataset.
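A minimal sketch of supervised classification is the minimum-distance (nearest-centroid) rule: average the analyst's sample pixels per class, then assign each pixel to the class with the nearest centroid. The band values and class names are made up, and real packages such as ERDAS or ENVI offer far richer algorithms (e.g. maximum likelihood):

```python
# Training samples: band-value pairs for pixels the analyst
# has labelled by hand (the class "signatures").
samples = {
    "water":  [(10, 20), (12, 18)],
    "forest": [(60, 90), (64, 88)],
}

def centroid(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))

sigs = {cls: centroid(pts) for cls, pts in samples.items()}

def classify(pixel):
    """Assign the class whose signature centroid is nearest
    (the minimum-distance supervised classifier)."""
    return min(sigs, key=lambda c: sum((p - s) ** 2
                                       for p, s in zip(pixel, sigs[c])))

# Classify a small row of pixels from a hypothetical raster.
row = [(11, 19), (62, 89), (9, 22)]
print([classify(px) for px in row])  # ['water', 'forest', 'water']
```

As the text notes, the output is a raster: every pixel receives one of the named categories.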
2.3 Spatial and non-spatial data base management systems

1. Spatial Data
• Data that define a location. These are in the form of graphic primitives that are
usually either points, lines, polygons or pixels.
• Spatial data includes location, shape, size, and orientation.
• o For example, consider a particular square:
its center (the intersection of its diagonals) specifies its location;
its shape is a square;
the length of one of its sides specifies its size;
the angle its diagonals make with, say, the x-axis specifies its orientation.
• Spatial data includes spatial relationships. For example, the arrangement
of ten bowling pins is spatial data.
2. Non-spatial Data
• Data that relate to a specific, precisely defined location. The data are often
statistical but may be text, images or multimedia. These are linked in the GIS
to spatial data that define the location.
• Non-spatial data (also called attribute or characteristic data) is that
information which is independent of all geometric considerations.
• o For example, a person’s height, mass, and age are non-spatial data
because they are independent of the person’s location.
• It is interesting to note that, while mass is non-spatial data, weight is
spatial data in the sense that something’s weight is very much
dependent on its location.
• It is possible to ignore the distinction between spatial and non-spatial data.
However, there are fundamental differences between them:
• o spatial data are generally multi-dimensional and autocorrelated;
• o non-spatial data are generally one-dimensional and independent.
2.4 Data quality and sources of error in GIS
• Data used in GIS projects for visualization, analysis,
compilation, and sharing should meet a defined standard
for quality.
• Data quality requirements vary from project to project.
Requirements for how accurate or complete a dataset
needs to be are based on how the data will be used.
• Requirements are also influenced by technical, product,
and client requirements. ArcGIS Pro has a data quality
management system that provides tools to systematically
perform quality assurance and quality control (QA/QC) on
your GIS data, efficiently work through the issues
identified, and generate quality-related reports.
• Quality can simply be defined as the fitness for use for a
specific data set. Data that is appropriate for use with one
application may not be fit for use with another.
• It is fully dependent on the scale, accuracy, and extent of the
data set, as well as the quality of other data sets to be used.
• The recent U.S. Spatial Data Transfer Standard (SDTS)
identifies five components to data quality definitions.
– Lineage
– Positional Accuracy
– Attribute Accuracy
– Logical Consistency
– Completeness
Lineage
• The lineage of data is concerned with historical
and compilation aspects of the data such as the:
• source of the data; content of the data; data capture specifications;
geographic coverage of the data; compilation method of the data
(e.g. digitized versus scanned); transformation methods applied to the
data; and the use of any pertinent algorithms during compilation, e.g.
linear simplification, feature generalization.
Positional Accuracy
• The identification of positional accuracy is important.
This includes consideration of inherent error (source
error) and operational error (introduced error).

Attribute Accuracy: Consideration of the accuracy of attributes also helps to define the quality of the data. This quality component concerns the identification of the reliability, or level of purity (homogeneity), in a data set.

Logical Consistency: This component is concerned with determining the faithfulness of the data structure for a data set. This typically involves spatial data inconsistencies such as incorrect line intersections, duplicate lines or boundaries, or gaps in lines. These are referred to as spatial or topological errors.
Completeness: The final quality component involves a statement
about the completeness of the data set. This includes consideration
of holes in the data, unclassified areas, and any compilation
procedures that may have caused data to be eliminated.
Error: inherent and operational
Inherent error is the error present in source documents and data.
Operational error is the amount of error produced through the data
capture and manipulation functions of a GIS. Possible sources of
operational errors include: mis-labelling of areas on thematic maps;
misplacement of horizontal (positional) boundaries; human error in
digitizing; classification error; GIS algorithm inaccuracies; and human bias.
Source of Errors:
  a. Instrumental inaccuracies:
● Satellite/ air photo/ GPS/ surveying (spatial).
● Inaccuracies in attribute measuring instruments.
b.  Human Processing:
● Misinterpretation (e.g. photos), spatial and attribute.
 ● Effects of scale change and generalization.
● Effects of classification (nominal / ordinal / interval).
c. Actual Changes:
● Gradual 'natural' changes: river courses, glacier
● Catastrophic change: fires, floods, landslides.
● Seasonal and daily changes: lake/sea/ river levels.
● Man-made: urban development, new roads.
● Attribute change: forest growth (height etc.),
Processing Errors:
a. Input:  
• Digitizing: human error, the width of a line, spikes,
knots, also entering attribute  data. 
• Dangling nodes (connected to only one arc):
permissible in arc themes (river headwaters etc.).
• Pseudo-nodes (connected to exactly two arcs) –
permissible in island arcs, and where attributes
change, e.g. a road becomes paved from dirt or vice
versa.
• Projection input error.
b. Manipulation:
● Interpolation of point data into lines and surfaces.
● Overlay of layers digitized separately, e.g. soils and vegetation.
● The compounding effects of processing and analysis of multiple
layers: for example, if two layers each have a correctness of 90%,
the accuracy of the resulting overlay is around 81%.
● Density of observations.
● Inappropriate or inadequate inputs for models.
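The compounding effect mentioned above is a simple product of the per-layer accuracies, which a few lines of Python make explicit (the accuracy figures are the illustrative ones from the text):

```python
def compound_accuracy(*layer_accuracies):
    """Multiply per-layer correctness to estimate the
    accuracy of the resulting overlay."""
    result = 1.0
    for a in layer_accuracies:
        result *= a
    return result

print(round(compound_accuracy(0.9, 0.9), 2))       # 0.81
print(round(compound_accuracy(0.9, 0.9, 0.9), 3))  # 0.729
```

Each extra layer degrades the overlay further, which is why the density and quality of inputs matter so much in multi-layer analysis.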
c. Output: 
• Scale changes - detail and scale bars.
• Color palettes: intended colors don't match from screen to
printer.
 
GIS Errors, Accuracy, and Precision
• Errors can be injected at many points in a GIS analysis, and one of the
largest sources of error is the data collected.
• Each time a new dataset is used in a GIS analysis, new error
possibilities are also introduced.
• One of the key benefits of GIS is the ability to use information
from many sources, so understanding the quality of the data
is extremely important.
• Accuracy in GIS is the degree to which information on a map matches
real-world values. It is an issue that pertains both to the quality of the
data collected and the number of errors contained in a dataset or a
map.
• One everyday example of this sort of error would be if an online
advertisement showed a sweater of a certain color and pattern, yet
when you received it, the color was slightly off.
• Precision refers to the level of measurement and
exactness of description in a GIS database. Map precision
is similar to decimal precision. Precise location data may
measure position to a fraction of a unit (meters, feet,
inches, etc.).
• Precision attribute information may specify the
characteristics of features in great detail.
• Consider ordering two pairs of shoes in the same listed size.
One pair fits as you would expect, but the other pair is too
short. Do you suspect a quality issue with the shoes, or do
you buy the shoes that fit? Would you do the same when
selecting GIS data for a project?
• The more accurate and precise the data, the higher the cost to
obtain and store it, because such data can be very difficult to
obtain and will require larger data files.
• For example, a 1-meter-resolution aerial photograph will cost
more to collect (increased equipment resolution) and cost more
to store (greater pixel volume) than a 30-meter-resolution aerial
photograph.
• Highly precise data does not necessarily correlate to highly
accurate data, nor does highly accurate data imply highly precise
data. They are two separate and distinct measurements.
Relative accuracy and precision, and the inherent error of both,
determine the quality of GIS data.
THANK YOU
