
Lecture 3

Uploaded by

Leopord Leon
Copyright
© © All Rights Reserved

Data Analysis

› For many users data analysis is the most interesting and rewarding
part of a GIS project. This is where they can start to find answers
to some of their questions, and use GIS to help develop new
questions for research. The analysis they undertake with GIS may
lead to new information that will inform decision making. For
example, end users of a web-based mapping application may be
able to answer simple queries about the location of places of
interest, or to identify shortest routes between features using
mobile GIS platforms. As a result, they can make decisions about
where to go next, and how to get there. In a more
research-oriented application, ‘what if’ questions might be developed and
investigated. For example, presentation of the results of research
into complex planning issues may inform decision-making that will
affect the development of whole communities and regions.
TABLE 1 Examples of questions for GIS analysis

Question type | Wind farm siting application | Retail store location application
Location | Where is the proposed wind farm location? | Where is the region in which the store will be located?
Patterns | What is the pattern of land use in this area? | What are the flows of traffic along major routes in this area?
Trends | How has land use changed over the last 20 years? | Which towns in the area have experienced the greatest population growth in recent years?
Conditions | Where is there an area of privately owned land of a suitable size, with adequate access from an existing road? | Where is there an area of commercially zoned land, close to major transport routes and with potential catchment of over 10,000 customers?
Implications | If we located the wind farm here, | If we locate a store here, what would
Data analysis terminology
Term | Definition
Entity | An individual point, line or area in a GIS database.
Attribute | Data about an entity. In a vector GIS attributes are stored in a database. For example, the street name for a line entity that represents a road may be stored. In a raster GIS the value of a cell in the raster grid is a numerical code used to represent the attribute present. For example, code ‘1’ may be used for a motorway, ‘2’ for a main road and ‘3’ for a minor road. Further attributes of an entity (for instance street name) may be stored in a database linked to the raster image.
Feature | An object in the real world to be encoded in a GIS database.
Data layer | A data set for the area of interest in a GIS. For example, the Happy Valley GIS contains data layers on nine themes, including roads, ski runs, hotels and land use. Data layers in a GIS normally contain data of only one entity type (that is points, lines or areas). It would be unusual for a data layer to contain, for instance, roads and hotels; these would be stored as separate thematic data layers.
Image | A data layer in a raster GIS. It should be remembered that each cell in a raster image will contain a single value that is a key to the attribute present there.
MEASUREMENTS IN GIS – LENGTHS, PERIMETERS AND AREAS
› Calculating lengths, perimeters and areas is a common
application of GIS. Measuring the length of a ski piste
from a digital map is a relatively straightforward task.
However, it is possible that different measurements can
be obtained depending on the type of GIS used (raster or
vector) and the method of measurement employed. It is
important to remember that all measurements from a GIS
will be an approximation, since vector data are made up
of straight line segments (even lines which appear as
curves on the screen are stored as a collection of short
straight line segments), and all raster entities are
approximated using a grid cell representation.
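› The difference between vector and raster measurement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the coordinates and the 10 m cell size are hypothetical, the raster length rule (one cell width for an orthogonal step, √2 cell widths for a diagonal step) is one common convention rather than the method of any particular GIS, and the area function uses the standard shoelace formula.

```python
import math

def vector_length(vertices):
    """Length of a polyline: the sum of its straight line segments."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

def raster_length(cells, cell_size):
    """Approximate length of a line stored as an ordered list of (row, col) cells.

    Each step between consecutive cells contributes the distance between
    the two cell centres (1 cell width orthogonally, sqrt(2) diagonally).
    """
    return sum(math.hypot(r1 - r0, c1 - c0) * cell_size
               for (r0, c0), (r1, c1) in zip(cells, cells[1:]))

def polygon_area(vertices):
    """Area of a simple polygon from its boundary vertices (shoelace formula)."""
    total = 0.0
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

# Hypothetical ski piste digitized as three vertices (coordinates in metres)
piste = [(0, 0), (30, 40), (30, 100)]
piste_length = vector_length(piste)

# The same kind of line held as raster cells on a 10 m grid
piste_cells = [(0, 0), (1, 1), (2, 1)]
piste_cells_length = raster_length(piste_cells, cell_size=10)

# A 10 m by 10 m square area entity
square_area = polygon_area([(0, 0), (10, 0), (10, 10), (0, 10)])
```

› Both results are approximations of the real piste: the vector value depends on how many vertices were digitized, and the raster value depends on the cell size.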
QUERIES
› Performing queries on a GIS database to retrieve data is
an essential part of most GIS projects. Queries offer a
method of data retrieval, and can be performed on data
that are part of the GIS database, or on new data
produced as a result of data analysis. Queries are useful
at all stages of GIS analysis for checking the quality of
data and the results obtained. For example, a query may
be used if a data point representing a hotel is found to lie
erroneously in the sea after data encoding. A query may
establish that the address of the hotel had been wrongly
entered into a database, resulting in the allocation of an
incorrect spatial reference.
› The method of specifying queries in a GIS can have a highly
interactive flavour. Users may interrogate a map on the
computer screen or browse through databases with the help of
prompts and query builders. A user may point at a hotel on
the computer screen and click to obtain the answer to ‘What is
the name of this hotel?’. Queries can be made more complex
by combination with questions about distances, areas and
perimeters, particularly in a vector GIS, where these data are
stored as attributes in a database. This allows questions such
as ‘Where is the longest ski run?’ to be answered in one step,
where a raster GIS might require two – one to calculate the
lengths of all the ski runs, and the second to identify the
longest.
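› As a rough sketch of attribute queries, the Python fragment below treats a vector attribute table as a list of records. The ski run names, grades and lengths are invented for illustration; a real GIS would normally expose such queries through a query builder or a database language. Because length is already stored as an attribute, the ‘longest ski run’ question is answered in one step, and the Boolean operators AND, OR and NOT combine conditions.

```python
# Hypothetical attribute table for a ski run data layer
ski_runs = [
    {"name": "Run A", "grade": "blue", "length_m": 1450.0},
    {"name": "Run B", "grade": "red", "length_m": 2100.0},
    {"name": "Run C", "grade": "red", "length_m": 800.0},
    {"name": "Run D", "grade": "black", "length_m": 1900.0},
]

# 'Where is the longest ski run?' answered in a single step
longest = max(ski_runs, key=lambda run: run["length_m"])

# Boolean AND: red runs that are also longer than 1000 m
long_red = [r["name"] for r in ski_runs
            if r["grade"] == "red" and r["length_m"] > 1000]

# Boolean OR: runs graded red or black
red_or_black = [r["name"] for r in ski_runs
                if r["grade"] == "red" or r["grade"] == "black"]

# Boolean NOT: every run that is not red
not_red = [r["name"] for r in ski_runs if not r["grade"] == "red"]
```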
Boolean operators: Venn diagrams
RECLASSIFICATION
› Reclassification is an important variation on the query idea
in GIS, and can be used in place of a query in raster GIS. If
we wished to ask ‘Where are all the areas of forestry?’ an
answer could be obtained using a query or by reclassifying
the image. Reclassification would result in a new image.
For example, if cells representing forestry in the original
image had a value of 10, a set of rules for the
reclassification could be:
› Cells with values = forestry (value 10) should take the new
value of 1
› Cells with values ≠ forestry should take the new value of 0
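› The two rules can be sketched as follows, assuming the raster image is held as a list of rows of cell values. The small land use image and the function name are hypothetical; only the forestry code of 10 comes from the example above.

```python
FORESTRY = 10  # cell value representing forestry in the original image

def reclassify(image, target=FORESTRY):
    """Return a new image: 1 where the cell equals the target value, otherwise 0."""
    return [[1 if value == target else 0 for value in row] for row in image]

# Hypothetical land use image (10 = forestry, other codes = other land uses)
land_use = [
    [10, 10, 3],
    [4, 10, 3],
    [4, 4, 10],
]

forestry_image = reclassify(land_use)
```

› The original image is left unchanged; the result is a new Boolean image that answers ‘Where are all the areas of forestry?’.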
BUFFERING AND NEIGHBOURHOOD FUNCTIONS
› There is a range of functions available in GIS that allow a spatial
entity to influence its neighbours, or the neighbours to influence the
character of an entity. The most common example is buffering, the
creation of a zone of interest around an entity. Other neighbourhood
functions include data filtering. This involves the recalculation of
cells in a raster image based on the characteristics of neighbours.
› The question ‘Which hotels are within 200 m of a main road?’ could
be approached in a number of ways. One option would be, first, to
produce a buffer zone identifying all land up to 200 m from the
main roads; and second, to find out which hotels fall within this
buffer zone using a point-in-polygon overlay (see later in this
chapter). Then a query would be used to find the names of the
hotels.
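› A minimal sketch of the hotel question is given below. It takes a shortcut that should be stated plainly: instead of constructing the buffer polygon and then running a point-in-polygon overlay, it tests whether each hotel lies within 200 m of any road segment, which selects the same hotels. The road geometry, hotel names and coordinates are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the straight line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(point, line, radius):
    """True if the point lies inside the buffer of the given radius around the line."""
    return any(point_segment_distance(point, a, b) <= radius
               for a, b in zip(line, line[1:]))

# Hypothetical main road and hotels (coordinates in metres)
main_road = [(0, 0), (1000, 0)]
hotels = {
    "Hotel A": (500, 150),
    "Hotel B": (500, 250),
    "Hotel C": (1100, 100),
}

near_road = sorted(name for name, xy in hotels.items()
                   if within_buffer(xy, main_road, radius=200))
```

› Hotel C lies beyond the end of the road but is still selected, because a line buffer is rounded at its ends.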
› Buffering, as already stated, is used to identify a zone of
interest around an entity, or set of entities. If a point is
buffered a circular zone is created. Buffering lines and areas
creates new areas. Buffering is very simple conceptually but a
complex computational operation. Creating buffer zones
around point features is the easiest operation; a circle of the
required radius is simply drawn around each point. Creating
buffer zones around line and area features is more
complicated. Some GIS do this by placing a circle of the
required radius at one end of the line or area boundary to be
buffered. This circle is then moved along the length of the
segment. The path that the edge of the circle tangential to the
line makes is used to define the boundary of the buffer zone.
Buffer zones around (a) point, (b) line and (c) area features
› Whilst buffer zones are often created with the use of one
command or option in vector GIS, a different approach is
used in many raster GIS. Here proximity is calculated.
This method was outlined earlier and will result in a new
raster data layer where the attribute of each cell is a
measure of distance.
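› The raster proximity idea can be sketched as below. This is a brute-force illustration that assumes at least one source cell and measures straight-line distance between cell centres; production GIS software uses much faster distance transforms. The grid and the 100 m cell size are hypothetical.

```python
import math

def proximity(grid, cell_size=1.0):
    """Return a new raster in which each cell holds the distance to the nearest source cell.

    Source cells are those with a non-zero value in the input grid.
    """
    sources = [(r, c) for r, row in enumerate(grid)
               for c, value in enumerate(row) if value]
    return [[min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(grid)]

# Hypothetical raster with a single road cell (1) in the top-left corner
roads = [
    [1, 0, 0],
    [0, 0, 0],
]

distance = proximity(roads, cell_size=100)

# A 200 m buffer can then be obtained by reclassifying the distance raster
buffer_200 = [[1 if d <= 200 else 0 for d in row] for row in distance]
```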
› Other operations in raster GIS where the values of
individual cells are altered on the basis of adjacency
are called neighbourhood functions.
› Filtering is one example used for the processing of remotely
sensed imagery. Filtering will change the value of a cell based
on the attributes of neighbouring cells. The filter is defined as
a group of cells around a target cell. The size and shape of the
filter are determined by the operator. Common filter shapes
are squares and circles, and the dimensions of the filter
determine the number of neighbouring cells used in the
filtering process. The filter is passed across the raster data set
and used to recalculate the value of the target cell that lies at
its centre. The new value assigned to the target cell is
calculated using one of a number of algorithms. Examples
include the maximum cell value within the filter and the most
frequent value.
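› A small sketch of a square filter follows, using the two algorithms mentioned above (maximum value and most frequent value). The treatment of edge cells is an assumption made for this sketch: the filter is simply clipped to the cells that exist. The image values are hypothetical.

```python
from collections import Counter

def window(image, row, col, size=1):
    """Values inside a square filter centred on (row, col), clipped at the image edge."""
    rows, cols = len(image), len(image[0])
    return [image[i][j]
            for i in range(max(0, row - size), min(rows, row + size + 1))
            for j in range(max(0, col - size), min(cols, col + size + 1))]

def apply_filter(image, func, size=1):
    """Pass the filter across the image and recalculate every target cell with func."""
    return [[func(window(image, r, c, size)) for c in range(len(image[0]))]
            for r in range(len(image))]

def most_frequent(values):
    """Most frequent value in the filter window."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical image with one anomalous cell in the centre
image = [
    [1, 1, 1],
    [1, 9, 1],
    [1, 1, 1],
]

max_image = apply_filter(image, max)                 # a 3 x 3 maximum filter
majority_image = apply_filter(image, most_frequent)  # a 3 x 3 most-frequent-value filter
```

› The maximum filter spreads the anomalous value to its neighbours, while the most-frequent-value filter removes it, which is why the choice of algorithm matters.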
› A combination of distance and neighbourhood operations can be
used to perform some quite complex distance or proximity
calculations that take into account not only horizontal linear
distance but also the effects of vertical distance or slope (for
example climbing or descending a hill). Other cost factors that can
be accounted for include the effect of wind speed or resistance,
trafficability, load carried or other push/pull factors. Cost factors
such as slope and wind speed are not the same in all directions, so
proximity models need to take this into account. When the factors
controlling relative distance are the same in all directions, proximity
models are said to be ‘isotropic’ (e.g. simple buffering or linear
distance surfaces). When the factors controlling relative distance
are not the same in all directions, proximity models are said to be
‘anisotropic’.
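› The distinction can be illustrated with a toy cost function. The penalty values below are invented purely for illustration and are not taken from any published model; the point is only that an isotropic cost is the same in both directions, whereas an anisotropic cost depends on the direction of travel.

```python
def isotropic_cost(distance_m):
    """Cost depends on horizontal distance only, so it is the same in every direction."""
    return float(distance_m)

def anisotropic_cost(distance_m, height_change_m,
                     uphill_penalty=10.0, downhill_penalty=2.0):
    """Cost also depends on whether the move climbs or descends.

    The penalties are hypothetical weights per metre of height gained or lost.
    """
    if height_change_m > 0:
        return distance_m + uphill_penalty * height_change_m
    return distance_m + downhill_penalty * abs(height_change_m)

# Moving 100 m between two cells that differ in height by 20 m
uphill = anisotropic_cost(100, +20)    # climbing the hill
downhill = anisotropic_cost(100, -20)  # descending the same hill
flat = isotropic_cost(100)             # ignores slope altogether
```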
INTEGRATING DATA – MAP OVERLAY
› The ability to integrate data from two sources using map
overlay is perhaps the key GIS analysis function. Using
GIS it is possible to take two different thematic map layers
of the same area and overlay them one on top of the
other to form a new layer. The techniques of GIS map
overlay may be likened to sieve mapping, the overlaying
of tracings of paper maps on a light table. Map overlay
has its origins in.
› Overlays where new spatial data sets are created involve the
merging of data from two or more input data layers to create a
new output data layer. This type of overlay may be used in a
variety of ways. For example, obtaining an answer to the
question ‘Which hotels are within 200 m of a main road?’
requires the use of several operations. First, a buffering
operation must be applied to find all the areas of land within
200 m of a main road, then an overlay function used to
combine this buffer zone with the hotel data layer. This will
allow the identification of hotels within the buffer zone.
Alternatively, the selection of a site for a new ski piste may
require the overlay of several data sets to investigate criteria of
land use, hotel location, slope and aspect.
Vector overlay
› Vector map overlay relies heavily on the two associated
disciplines of geometry and topology. The data layers
being overlaid need to be topologically correct so that
lines meet at nodes and all polygon boundaries are
closed. To create topology for a new data layer produced
as a result of the overlay process, the intersections of
lines and polygons from the input layers need to be
calculated using geometry. For complex data this is no
small task and requires considerable computational
power.
› Figure 3 shows the three main types of vector overlay: point-in-polygon,
line-in-polygon and polygon-on-polygon. This
figure also illustrates the complexity of the overlay
operations. The overlay of two or more data layers
representing simple spatial features results in a more
complex output layer. This will contain more polygons,
more intersections and more line segments than either of
the input layers.
› Point-in-polygon overlay is used to find out the polygon
in which a point falls. Using point-in-polygon overlay on
vector data layers of meteorological stations and land use,
it is possible to find out in which
land use polygon each meteorological station is located.
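› One widely used way to test whether a point falls inside a polygon is ray casting: count how many polygon edges a horizontal ray from the point crosses; an odd count means the point is inside. The sketch below uses this test, which is a standard technique but not necessarily the one a given GIS package applies. The land use polygons and station coordinates are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    for i, (x0, y0) in enumerate(polygon):
        x1, y1 = polygon[(i + 1) % len(polygon)]
        # Does this edge straddle the horizontal line through the point?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:  # the edge is crossed to the right of the point
                inside = not inside
    return inside

# Hypothetical land use polygons and meteorological stations
land_use = {
    "forest": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "farmland": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
stations = {"Station 1": (5, 5), "Station 2": (15, 3), "Station 3": (25, 5)}

# Attach to each station the land use polygon in which it falls (None if outside all)
station_land_use = {
    name: next((use for use, poly in land_use.items()
                if point_in_polygon(xy, poly)), None)
    for name, xy in stations.items()
}
```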
› Line-in-polygon overlay is more complicated. Imagine
that we need to know where roads pass through forest
areas to plan a scenic forest drive. To do this we need to
overlay the road data on a data layer containing forest
polygons. The output map will contain roads split into
smaller segments representing ‘roads in forest areas’ and
‘roads outside forest areas’. Topological information must
be retained in the output map; the output map is therefore more
complex than either of the two input maps. The output
map will contain a database record for each new road
segment.
› Polygon-on-polygon overlay is a vector overlay operation in
which polygons in one data set are overlaid onto
polygons in another to determine where polygons from the
two data sets coincide.
› One problem with vector overlay is the possible
generation of sliver (or ‘weird’) polygons. These appear
after the overlay of two data sets that contain the same
spatial entities. These ‘sliver’ polygons arise from
inconsistencies and inaccuracies in the digitized data.
Frequently such errors go undetected but they can
become apparent during vector overlay operations.
Identifying areas suitable for a nuclear waste repository
Raster overlay
› In the raster data structure everything is represented by
cells – a point is represented by a single cell, a line by a
string of cells and an area by a group of cells. Therefore, the
methods of performing overlays are different from those in
vector GIS. Raster map overlay introduces the idea of map
algebra or ‘mapematics’. Using map algebra input data
layers may be added, subtracted, multiplied or divided to
produce output data. Mathematical operations are
performed on individual cell values from two or more input
layers to produce an output value. Thus, the most important
consideration in raster overlay is the appropriate coding of
point, line and area features in the input data layers.
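› A minimal map algebra sketch is shown below for a point-in-polygon overlay using addition. The coding scheme is an assumption chosen for the illustration: forest cells are coded 1 and station cells 2, so that every possible sum has a single meaning. This is the sense in which the coding of the input layers matters.

```python
import operator

def overlay(layer_a, layer_b, op):
    """Apply a mathematical operation cell by cell to two rasters of equal size."""
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]

# Hypothetical input layers: 1 = forest, 0 = not forest
forest = [
    [1, 1, 0],
    [0, 1, 0],
]
# 2 = meteorological station, 0 = no station
stations = [
    [0, 2, 0],
    [0, 0, 2],
]

added = overlay(forest, stations, operator.add)
# Meaning of the output codes under this coding scheme:
# 0 = neither, 1 = forest only, 2 = station outside forest, 3 = station in forest
```

› Had both layers been coded 1, a sum of 1 would be ambiguous between ‘forest only’ and ‘station outside forest’.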
Raster overlays: (a) point-in-polygon (using add); (b) line-in-polygon (using add); (c) polygon-on-polygon
› There are two issues affecting raster overlay that need to be
considered by users: resolution and scales of measurement.
› Resolution is determined by the size of the cell used. SPOT
satellite data, for example, are collected at a resolution of 10
m. For some analyses you may wish to overlay a SPOT image
with data collected at a different resolution, say 40 m. The
result will be an output grid with a resolution of 10 m, which is
greater than the resolution at which the second data set was
collected. Since you cannot disaggregate data with any
degree of certainty, a better approach to the overlay of these
two data sets would be to aggregate cells in the SPOT image
to match the resolution of the second data set.
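› Aggregating from 10 m to 40 m cells means combining each block of 4 × 4 fine cells into one coarse cell. The sketch below uses the block mean, which is an assumption suited to ratio data such as reflectance; for nominal data a most-frequent-value rule would be more appropriate. The grid values are hypothetical.

```python
def aggregate(grid, factor):
    """Aggregate a raster by averaging non-overlapping blocks of factor x factor cells."""
    rows, cols = len(grid), len(grid[0])
    output = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(sum(block) / len(block))
        output.append(out_row)
    return output

# Hypothetical 10 m grid of 4 rows by 8 columns
fine = [[1.0] * 4 + [3.0] * 4 for _ in range(4)]

# 40 m / 10 m = 4, so each output cell summarizes a 4 x 4 block
coarse = aggregate(fine, factor=4)
```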
› The second issue is that of scales of measurement. Nonsensical
overlays can be performed on map layers coded using nominal,
ordinal, interval and ratio scales if these scales are not sufficiently
understood. For example, it is possible to add, subtract or
multiply two maps, one showing land use coded using a nominal
scale (where 1 represents settlement and 2 represents water)
and another showing mean annual rainfall coded on a ratio
scale. The result, however, is complete nonsense because there
is no logical relationship between the numbers. A mean annual
rainfall of 1000 mm minus land use type 1 is meaningless! Care
is therefore needed when overlaying raster data layers to
determine whether the operation makes real sense according to
the scales of measurement used.
SPATIAL INTERPOLATION
› Spatial interpolation is the procedure of estimating the
values of properties at unsampled sites within an area
covered by existing observations. In an ideal situation a
spatial data set would provide an observed value at every
spatial location. Satellite or aerial photography goes
some way to providing such data; however, more often
data are stratified (consisting of regularly spaced
observations but not covering every spatial location),
patchy (clusters of observations at specific locations) or
even random (randomly spaced observations across the
study area). The role for interpolation in GIS is to fill in
the gaps between observed data points.
› A common application of interpolation is for the construction
of height contours. Contours on a topographic map are drawn
from a finite number of height observations taken from surveys
and aerial photographs. The height of the land surface
between these points is estimated using an interpolation
method and represented on a map using contours.
Traditionally, contour maps were produced by hand, but today
they are most often drawn by computer. In the old hand–eye
method, often referred to as line threading or eye-balling,
contour lines were drawn between adjacent spot heights and
divided into the chosen contour interval by assuming that the
slope between adjacent spot heights remained constant.
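› The constant-slope assumption behind line threading amounts to linear interpolation along the line joining two spot heights. The sketch below finds where each contour level crosses that line; the function name, spot heights and contour interval are hypothetical.

```python
import math

def contour_crossings(p0, h0, p1, h1, interval):
    """Locate contour levels between two spot heights, assuming constant slope.

    p0 and p1 are (x, y) positions with heights h0 and h1.
    Returns a list of (level, (x, y)) pairs in order of increasing level.
    """
    if h0 == h1:
        return []  # flat line: no contour is crossed
    low, high = sorted((h0, h1))
    level = math.ceil(low / interval) * interval  # first contour at or above the lower height
    crossings = []
    while level <= high:
        t = (level - h0) / (h1 - h0)  # fraction of the way from p0 to p1
        x = p0[0] + t * (p1[0] - p0[0])
        y = p0[1] + t * (p1[1] - p0[1])
        crossings.append((level, (x, y)))
        level += interval
    return crossings

# Two hypothetical spot heights 100 m apart: 95 m and 125 m, with a 10 m contour interval
crossings = contour_crossings((0, 0), 95, (100, 0), 125, interval=10)
levels = [level for level, _ in crossings]
```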
ANALYSIS OF SURFACES
› Consideration of techniques available in GIS for surface analysis
follows on logically from the discussion of interpolation, since
interpolation techniques will invariably have been used to create
a surface for analysis. Since the Earth is three-dimensional, it
would seem that all GIS applications should include some
element of three-dimensional analysis. However, software
packages able to handle and analyze true three-dimensional
data are limited, and analysis in GIS is more likely to be 2.5D,
since the surfaces that are produced are simply that – surfaces.
There is no underlying or overlying information. This prevents
the analysis of, for example, geological or atmospheric data, and
even to add realistic-looking trees with height to a GIS terrain
model, a CAD or other design package may be necessary.
Calculating slope and aspect
› Slope is the steepness or gradient of a unit of terrain,
usually measured as an angle in degrees or as a
percentage. Aspect is the direction in which a unit of
terrain faces, usually expressed in degrees from north.
These two variables are important for many GIS
applications. Slope values may be important for the
classification of the slope for skiing. Aspect is important
to ensure that the piste selected will retain snow cover
throughout the ski season (a completely south-facing
slope may not be suitable, since snow melt would be
more pronounced than on a north-facing slope).
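› As a hedged sketch, slope and aspect for one cell of a raster terrain model can be estimated from the heights of its four direct neighbours using central differences. This is a simplified method chosen for the illustration; GIS packages often use a weighted 3 × 3 window instead. The sketch assumes row 0 is the northern edge of the grid, the cell is not on the grid edge, and the heights and 10 m cell size are hypothetical.

```python
import math

def slope_aspect(dem, row, col, cell_size):
    """Return (slope in degrees, slope in percent, aspect in degrees from north).

    Aspect is the compass direction the cell faces (the downslope direction),
    or None for a flat cell.
    """
    # Rate of change of height towards the east and towards the north
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell_size)
    dz_dy = (dem[row - 1][col] - dem[row + 1][col]) / (2.0 * cell_size)
    gradient = math.hypot(dz_dx, dz_dy)
    slope_deg = math.degrees(math.atan(gradient))
    slope_pct = gradient * 100.0
    if gradient == 0:
        return slope_deg, slope_pct, None
    # Bearing of the downslope direction, measured clockwise from north
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope_deg, slope_pct, aspect

# Hypothetical terrain that is highest in the north, so the centre cell faces south
dem = [
    [20, 20, 20],
    [10, 10, 10],
    [0, 0, 0],
]
slope_deg, slope_pct, aspect = slope_aspect(dem, 1, 1, cell_size=10)
```

› Here the land drops 10 m for every 10 m travelled south, giving a slope of 45 degrees (100 per cent) and an aspect of 180 degrees, that is, a south-facing slope.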
Visibility analysis
› One of the common uses of terrain models is in visibility
analysis, the identification of areas of terrain that can be
seen from a particular point on a terrain surface. This is
used in a variety of applications ranging from locating
radio transmitters and cellular communications (mobile
phone) masts for maximum coverage to minimizing the
impact of commercial forestry in protected areas.
Conversely, it could be used to determine other locations
in the valley that could see the top of the new ski piste, or
any other point along the piste and so determine the
visual impact of this new development and its associated
infrastructure.
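› The core of visibility analysis is a line-of-sight test. The sketch below reduces the problem to a single terrain profile running away from the observer, which is a simplification: a full analysis repeats the test along lines of sight to every cell of the terrain model. A point is visible if the sight line to it is at least as steep as the sight line to every nearer point. The heights, spacing and observer height are hypothetical.

```python
def visible_along_profile(profile, spacing, observer_height=0.0):
    """Return a list of booleans: can each point on the profile be seen from the first point?

    profile is a list of terrain heights at equal spacing; the observer stands
    at profile[0] with the eye observer_height above the ground.
    """
    eye = profile[0] + observer_height
    visible = [True]  # the observer's own position
    steepest = float("-inf")  # steepest sight line to any nearer point
    for i in range(1, len(profile)):
        gradient = (profile[i] - eye) / (i * spacing)
        visible.append(gradient >= steepest)
        steepest = max(steepest, gradient)
    return visible

# Hypothetical profile sampled every 100 m; the ridge at index 2 hides index 3
profile = [100, 90, 110, 100, 130]
visibility = visible_along_profile(profile, spacing=100, observer_height=2)
```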
NETWORK ANALYSIS
› A network is a set of interconnected lines making up a set
of features through which resources can flow. Rivers are
one example, but roads, pipelines and cables also form
networks that can be modelled in GIS. Here we consider
the applications and analysis of network data and how
answers to networking problems could be obtained using
a raster GIS. There are several classic network-type
problems, including identifying shortest paths, the
travelling salesperson problem, allocation modelling and
route tracing.
The shortest path problem
› The shortest path between one point and another on a network
may be based on shortest distance, in which case either raster
or vector GIS could attempt a solution. A raster GIS could
provide an answer from a proximity analysis. Impediments to
travel can be added to a raster grid by increasing the value of
cells that are barriers to travel, then finding a ‘least cost’ route
through a grid. Networks structured in vector GIS offer more
flexibility and a more thorough analysis of impediments such as
traffic restrictions and congestion. However, the shortest path
may not be defined simply in terms of distance. For example,
for an emergency vehicle to reach an accident the quickest
route may be needed and this may require the traverse of less
congested minor roads.
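› The difference between the shortest and the quickest route can be sketched with Dijkstra's algorithm, a standard shortest path method (used here as an illustration, not as the method of any particular GIS). The network, place names, distances and travel times are hypothetical; each link carries both a length in kilometres and a travel time in minutes, and the same function is run with either attribute as the impedance.

```python
import heapq

def shortest_path(graph, start, goal, weight):
    """Dijkstra's algorithm: return (total cost, path) or None if the goal is unreachable.

    graph maps each node to {neighbour: {attribute name: value}}.
    weight names the link attribute to minimize, for example 'km' or 'min'.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, attrs in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + attrs[weight], neighbour, path + [neighbour]))
    return None

# Hypothetical road network: a short but congested main road via Junction,
# and a longer but faster minor road via Village
roads = {
    "Depot": {"Junction": {"km": 2, "min": 10}, "Village": {"km": 3, "min": 4}},
    "Junction": {"Accident": {"km": 2, "min": 10}},
    "Village": {"Accident": {"km": 3, "min": 4}},
}

shortest = shortest_path(roads, "Depot", "Accident", weight="km")
quickest = shortest_path(roads, "Depot", "Accident", weight="min")
```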
The travelling salesperson problem
› The travelling salesperson problem is a common
application of network analysis. The name arises from
one application area where a salesperson needs to visit a
specific set of clients in a day, and to do so by the best
route (usually the quickest). The waste collection vehicle
has the same problem – it needs to visit all the hotels in
Happy Valley, then return to base. In each case the
question is ‘In which order should the stops be visited,
and which path should be taken between them?’ This is a
complex computing task. Imagine a situation where the
delivery van has to visit just 10 customers.
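› To give a sense of scale for the ten-customer case: ten stops can be visited in 10! = 3,628,800 different orders, and each extra stop multiplies that count again. The sketch below solves a tiny instance by brute force, trying every order. It assumes straight-line travel between stops rather than travel along a network, and the stop names and coordinates are hypothetical; it is practical only for a handful of stops.

```python
import itertools
import math

def best_round_trip(base, stops, positions):
    """Try every visiting order and return (length, route) of the shortest round trip."""
    best = None
    for order in itertools.permutations(stops):
        route = (base,) + order + (base,)
        length = sum(math.dist(positions[a], positions[b])
                     for a, b in zip(route, route[1:]))
        if best is None or length < best[0]:
            best = (length, route)
    return best

# Number of possible visiting orders for 10 customers
orders_for_10 = math.factorial(10)

# Hypothetical base and three hotels at the corners of a 1 km square
positions = {"Base": (0, 0), "Hotel A": (0, 1), "Hotel B": (1, 1), "Hotel C": (1, 0)}
length, route = best_round_trip("Base", ["Hotel A", "Hotel B", "Hotel C"], positions)
```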
Location-allocation modelling
› Network analysis may also be used for the allocation of resources by
the modelling of supply and demand through a network. To match
supply with demand requires the movement of goods, people,
information or services through the network. In other words, supply
must be moved to the point of demand or vice versa. Allocation
methods usually work by allocating links in the network to the
nearest supply centre, taking impedance values into account. Supply
and demand values can also be used to determine the maximum
catchment area of a particular supply centre based on the demand
located along adjacent links in the network. Without regard for
supply and demand limitations, a given set of supply centres would
service a whole network. If limits to supply and demand levels are
indicated then situations can arise where parts of the network are not
serviced despite a demand being present.
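› A simplified allocation can be sketched as below. Several assumptions are made for the illustration: demand is attached to network nodes rather than to links, each node is offered only to its nearest supply centre (measured by accumulated impedance), and a centre with insufficient remaining supply leaves that node unserved. The network, demand and supply figures are hypothetical.

```python
import heapq

def allocate(graph, centres, demand, supply=None):
    """Allocate each node to its nearest supply centre.

    graph maps node -> {neighbour: impedance}; demand maps node -> units required;
    supply maps centre -> units available (None means unlimited supply).
    Returns node -> centre, or node -> None where demand cannot be met.
    """
    remaining = dict(supply) if supply is not None else None
    queue = [(0.0, centre, centre) for centre in centres]
    heapq.heapify(queue)
    allocation = {}
    while queue:
        cost, node, centre = heapq.heappop(queue)
        if node in allocation:
            continue  # already reached from a nearer centre
        need = demand.get(node, 0)
        if remaining is None or remaining[centre] >= need:
            allocation[node] = centre
            if remaining is not None:
                remaining[centre] -= need
        else:
            allocation[node] = None  # demand present but not serviced
        for neighbour, impedance in graph[node].items():
            if neighbour not in allocation:
                heapq.heappush(queue, (cost + impedance, neighbour, centre))
    return allocation

# Hypothetical linear network A - B - C - D - E with supply centres at A and E
network = {
    "A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1},
    "D": {"C": 1, "E": 1}, "E": {"D": 1},
}
demand = {"B": 10, "C": 10, "D": 10}

unlimited = allocate(network, ["A", "E"], demand)
limited = allocate(network, ["A", "E"], demand, supply={"A": 10, "E": 10})
```

› With unlimited supply the two centres service the whole network between them; with 10 units each, node C is left unserved even though demand is present there.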
› The ability to trace flows of goods, people, services and
information through a network is another useful function
of network analysis. Route tracing is particularly useful
for networks where flows are unidirectional, such as
stream networks, sewerage systems and cable TV
networks. In hydrological applications route tracing can
be used to determine the streams contributing to a
reservoir or to trace pollutants downstream from the site
of a spillage. Route tracing can be used to find all the
customers serviced by a particular sewer main or find
those affected by a broken cable.
› Connectivity, the way network links join at network nodes, is
the key concept in route tracing. Without the correct
connectivity in a network, route tracing and most other forms
of network analysis would not work. Directionality is also
important for route tracing as this indicates the direction in
which the materials are moving along the network. Knowledge
of the flow direction is critical to establishing upstream and
downstream links in the network. This gives rise to the concept
of a directed network in which each link in the system has an
associated direction of flow. This is usually achieved during
the digitizing process by keeping the direction of digitizing the
same as the flow direction in the network.
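› Route tracing on a directed network can be sketched as a simple graph traversal. Each link is stored as a (from node, to node) pair in the direction of flow, mirroring the digitizing convention described above; tracing upstream is the same traversal with every link reversed. The stream network and node names are hypothetical.

```python
def trace_downstream(links, start):
    """Return the set of nodes reachable from start by following the flow direction."""
    next_nodes = {}
    for source, target in links:
        next_nodes.setdefault(source, []).append(target)
    reached, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbour in next_nodes.get(node, []):
            if neighbour not in reached:
                reached.add(neighbour)
                stack.append(neighbour)
    return reached

def trace_upstream(links, start):
    """Return the set of nodes that drain to start (a trace against the flow direction)."""
    return trace_downstream([(target, source) for source, target in links], start)

# Hypothetical stream network, each link stored in the direction of flow
streams = [
    ("Spring", "Confluence"),
    ("Tarn", "Confluence"),
    ("Confluence", "Reservoir"),
    ("Reservoir", "Outflow"),
]

# Where would a pollutant spilled at the tarn travel?
affected = trace_downstream(streams, "Tarn")

# Which parts of the network contribute to the reservoir?
contributing = trace_upstream(streams, "Reservoir")
```

› If a single link were digitized against the flow direction, both traces would give wrong answers, which is why connectivity and directionality must be correct before analysis.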
Quantitative spatial analysis
› Quantitative spatial analysis allows ideas about spatial
processes and patterns to be tested and is used to help
find meaning in spatial data. Quantitative analysis
methods can:
– Reduce large data sets to smaller amounts of more meaningful
information.
– Explore data to suggest hypotheses or examine the distribution
of data. Exploratory data analysis (EDA) techniques are used for
this.
– Explore spatial patterns, test hypotheses about these patterns
and examine the role of randomness in their generation.
