Mining Spatial Databases
A spatial database stores a large amount of space-related data, such as maps, preprocessed
remote sensing or medical imaging data, and VLSI chip layout data.
Spatial databases have many features distinguishing them from relational databases.
They are usually organized by multidimensional spatial indexing structures that are accessed
by spatial data access methods, and they often require spatial reasoning, geometric
computation, and spatial knowledge representation techniques.
Spatial data mining refers to the extraction of knowledge, spatial relationships, or other
interesting patterns not explicitly stored in spatial databases. Such mining demands an
integration of data mining with spatial database technologies. It has applications in
remote sensing, image database exploration, medical imaging, navigation, traffic control,
environmental studies, and many other areas where spatial data are used.
A crucial challenge in spatial data mining is the development of efficient spatial data mining
techniques, given the huge amount of spatial data and the complexity of spatial data types and
spatial access methods.
“Can we construct a spatial data warehouse?” Yes, as with relational data, we can integrate
spatial data to construct a data warehouse that facilitates spatial data mining.
A spatial data warehouse is a collection of both spatial and nonspatial data in support of
spatial data mining and spatial-data-related decision making.
As an example, there are about 3,000 weather probes distributed in British Columbia (BC),
Canada, each recording daily temperature and precipitation for a designated small area and
transmitting its readings.
With a spatial data warehouse that supports spatial OLAP, a user can view weather patterns on a
map by month, by region, and by different combinations of temperature and precipitation, and
can dynamically drill down or roll up along any dimension to explore desired patterns.
There are challenges in both constructing and using spatial data warehouses.
The first challenge is the integration of spatial data from heterogeneous sources and systems.
The second challenge is the realization of fast and flexible on-line analytical processing in spatial
data warehouses. The star schema is a good choice for modeling spatial data warehouses because
it provides a concise and organized warehouse structure and facilitates OLAP operations.
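The roll-up described for the weather example can be illustrated with a minimal sketch. The record layout, region names, and the `roll_up` helper below are assumptions made for illustration; they are not part of any warehouse schema given in the text.

```python
from collections import defaultdict

# Hypothetical daily readings:
# (probe_id, region, month, temperature_c, precipitation_mm)
readings = [
    ("p1", "south", "1999-07", 27.0, 2.0),
    ("p1", "south", "1999-07", 29.0, 0.0),
    ("p2", "south", "1999-08", 25.0, 4.0),
    ("p3", "north", "1999-07", 31.0, 0.0),
]

def roll_up(rows, key_fn):
    """Group rows by key_fn and return, per group,
    (average temperature, total precipitation)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    summary = {}
    for key, members in groups.items():
        temps = [r[3] for r in members]
        rain = [r[4] for r in members]
        summary[key] = (sum(temps) / len(temps), sum(rain))
    return summary

# Finer view: one cell per (region, month).
by_region_month = roll_up(readings, lambda r: (r[1], r[2]))
# Rolled up along the time dimension: one cell per region.
by_region = roll_up(readings, lambda r: r[1])
```

Drilling down is the reverse step: moving from `by_region` back to `by_region_month` by adding the month to the grouping key.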
There are three types of dimensions in a spatial data cube:
A nonspatial dimension contains only nonspatial data. Nonspatial dimensions temperature and
precipitation can be constructed for the weather warehouse in the example above, since each
contains nonspatial data whose generalizations are nonspatial (such as “hot” for temperature
and “wet” for precipitation).
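A spatial-to-nonspatial dimension is one whose primitive-level data are spatial but whose
generalization, starting at some level, becomes nonspatial.
For example, the spatial dimension city relays geographic data for the U.S. map. Suppose that the
dimension’s spatial representation of, say, Seattle is generalized to the string “pacific northwest.”
Although “pacific northwest” is a spatial concept, its representation is not spatial (since, in our
example, it is a string).
A spatial-to-spatial dimension is one whose primitive-level data and all of its generalizations
are spatial.
For example, the dimension equi_temperature_region contains spatial data, as do all of its
generalizations, such as with regions covering 0-5 degrees (Celsius), 5-10 degrees, and so on.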
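The two generalization examples above (temperature bands such as 0-5 and 5-10 degrees, and Seattle generalized to a string) can be sketched as small lookup functions. The function names, the band width parameter, and the choice to place a boundary value such as 5 in the upper band are assumptions for illustration only.

```python
import math

def temperature_band(celsius, width=5):
    """Map a temperature to a band label such as '5-10'.
    Assumption: a boundary value belongs to the upper band."""
    low = math.floor(celsius / width) * width
    return f"{low}-{low + width}"

# Hypothetical concept hierarchy: a city generalizes to a region name,
# which is a plain string and therefore no longer spatial data.
CITY_TO_REGION = {"Seattle": "pacific northwest"}

def generalize_city(city):
    """Return the nonspatial (string) generalization of a city."""
    return CITY_TO_REGION.get(city, "unknown")
```

In a spatial-to-spatial dimension, each band label would additionally be tied to the geometry of the regions it covers; here only the labels are modeled.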
There are two types of measures in a spatial data cube:
A numerical measure contains only numerical data. For example, one measure in a spatial
data warehouse could be the monthly revenue of a region, so that a roll-up may compute the
total revenue by year, by county, and so on. Numerical measures can be further classified into
distributive, algebraic, and holistic measures.
A spatial measure contains a collection of pointers to spatial objects. For example, in a
generalization (or roll-up) in the spatial data cube of the weather example, the regions with
the same range of temperature and precipitation will be grouped into the same cell, and the
measure so formed contains a collection of pointers to those regions.
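Possible choices for computing spatial measures when constructing a spatial data cube include
the following:
Collect and store the corresponding spatial object pointers but do not perform precomputation
of spatial measures in the spatial data cube.
Precompute and store a rough approximation of the spatial measures in the spatial data cube.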
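The two storage choices can be sketched together: each cube cell keeps the pointers (here, region identifiers) of the spatial objects grouped into it, plus a rough approximation of their combined extent. Using a minimum bounding rectangle as the approximation is an assumption of this sketch, as are the region data and the `build_cells` helper.

```python
from collections import defaultdict

# Hypothetical regions:
# id -> (temperature band, precipitation band, polygon vertices)
regions = {
    "r1": ("hot", "wet", [(0, 0), (2, 0), (2, 1), (0, 1)]),
    "r2": ("hot", "wet", [(3, 2), (5, 2), (5, 4)]),
    "r3": ("cold", "dry", [(10, 10), (11, 10), (11, 12)]),
}

def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def build_cells(regions):
    """Group regions by (temperature, precipitation). Each cell stores
    the object pointers and the bounding rectangle of all its members."""
    cells = defaultdict(lambda: {"pointers": [], "mbr": None})
    for region_id, (temp, precip, polygon) in regions.items():
        cell = cells[(temp, precip)]
        cell["pointers"].append(region_id)
        box = bounding_box(polygon)
        if cell["mbr"] is None:
            cell["mbr"] = box
        else:
            old = cell["mbr"]
            cell["mbr"] = (min(old[0], box[0]), min(old[1], box[1]),
                           max(old[2], box[2]), max(old[3], box[3]))
    return dict(cells)

cells = build_cells(regions)
```

Storing only pointers defers the expensive geometric merge to query time, while the rectangle gives a cheap, coarse answer when an exact merged region is not needed.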
Similar to the mining of association rules in transactional and relational databases, spatial
association rules can be mined in spatial databases. A spatial association rule is of the form A ⇒ B
[s%,c%], where A and B are sets of spatial or nonspatial predicates, s% is the support of the rule,
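and c% is the confidence of the rule. For example, the following is a spatial association rule:
is_a(X, “school”) ∧ close_to(X, “sports_center”) ⇒ close_to(X, “park”) [s%, 80%]
This rule states that 80% of schools that are close to sports centers are also close to parks, and
that s% of the data belongs to such a case.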
Finding subsets of spatial features that are frequently located together is essentially the
problem of mining spatial co-locations.
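To make support and confidence concrete for a rule such as "schools close to sports centers are also close to parks," the following sketch evaluates both on a tiny hypothetical data set. The coordinates, the distance threshold used for `close_to`, and the choice to measure support against all objects are assumptions of this sketch.

```python
import math

# Hypothetical spatial objects: (kind, x, y)
objects = [
    ("school", 0.0, 0.0),
    ("school", 10.0, 0.0),
    ("school", 20.0, 0.0),
    ("sports_center", 0.5, 0.0),
    ("sports_center", 10.5, 0.0),
    ("park", 0.0, 0.5),
]

def close_to(obj, kind, objects, radius):
    """True if some object of the given kind lies within radius of obj."""
    _, x, y = obj
    return any(k == kind and math.hypot(x - ox, y - oy) <= radius
               for k, ox, oy in objects)

def rule_stats(objects, radius=1.0):
    """Support and confidence of:
    is_a(X, school) and close_to(X, sports_center) => close_to(X, park)."""
    schools = [o for o in objects if o[0] == "school"]
    antecedent = [s for s in schools
                  if close_to(s, "sports_center", objects, radius)]
    both = [s for s in antecedent if close_to(s, "park", objects, radius)]
    support = len(both) / len(objects)
    confidence = len(both) / len(antecedent) if antecedent else 0.0
    return support, confidence

support, confidence = rule_stats(objects)
```

Here two schools satisfy the antecedent and one of them is also near a park, so the confidence is one half. On large data sets the pairwise distance checks would normally be replaced by a spatial index.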
Spatial data clustering identifies clusters, or densely populated regions, according to some
distance measure.
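One simple way to find densely populated regions is a grid-based approach: count the points per grid cell, keep the cells that reach a density threshold, and merge adjacent dense cells into clusters. This is only one of many clustering techniques and is not claimed to be the method the text has in mind; the function name and parameters are assumptions.

```python
from collections import Counter, deque

def grid_clusters(points, cell_size, min_points):
    """Return clusters as sorted lists of dense grid cells.
    A cell is dense if it holds at least min_points points; dense cells
    that touch (including diagonally) belong to the same cluster."""
    counts = Counter((int(x // cell_size), int(y // cell_size))
                     for x, y in points)
    dense = {cell for cell, n in counts.items() if n >= min_points}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = []
        while queue:  # breadth-first search over neighboring dense cells
            cx, cy = queue.popleft()
            cluster.append((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in dense and neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        clusters.append(sorted(cluster))
    return clusters

# Hypothetical points: two dense areas and one isolated point.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.4, 0.6),
          (8.2, 8.1), (8.4, 8.3), (5.0, 5.0)]
clusters = grid_clusters(points, cell_size=1.0, min_points=2)
```

The isolated point at (5.0, 5.0) falls in a cell below the threshold and is treated as noise.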
Suppose that you would like to classify regions in a province into rich versus poor.
In doing so, you would like to identify the important spatial-related factors that determine a
region’s classification.
Many properties are associated with spatial objects, such as hosting a university.
These properties can be used for relevance analysis and to find interesting classification schemes.
Spatial trend analysis deals with another issue: the detection of changes and trends along a spatial
dimension.
Typically, trend analysis detects changes with time, such as the changes of temporal patterns in
time-series data.
Spatial trend analysis replaces time with space and studies the trend of nonspatial or spatial data
changing with space.
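A minimal sketch of a spatial trend: fit a least-squares line to a nonspatial attribute as a function of distance from a chosen center. The observations, the choice of center, and the use of a linear model are assumptions for illustration; real trends need not be linear.

```python
import math

def spatial_trend(center, observations):
    """Least-squares (slope, intercept) of value against distance
    from center. observations is a list of (x, y, value) tuples."""
    cx, cy = center
    dists = [math.hypot(x - cx, y - cy) for x, y, _ in observations]
    values = [v for _, _, v in observations]
    n = len(observations)
    mean_d = sum(dists) / n
    mean_v = sum(values) / n
    var_d = sum((d - mean_d) ** 2 for d in dists)
    if var_d == 0:
        raise ValueError("all observations are equally far from the center")
    cov = sum((d - mean_d) * (v - mean_v) for d, v in zip(dists, values))
    slope = cov / var_d
    return slope, mean_v - slope * mean_d

# Hypothetical attribute values that fall with distance from (0, 0).
observations = [(1, 0, 90.0), (0, 2, 80.0), (3, 0, 70.0), (0, 4, 60.0)]
slope, intercept = spatial_trend((0, 0), observations)
```

A negative slope indicates that the attribute decreases when moving away from the center, which is the spatial analogue of a downward trend in a time series.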
Spatial database systems usually handle vector data that consist of points, lines, and polygons.
Typical examples of such data include maps, design graphs, and 3-D representations.
However, a huge amount of space-related data are in digital raster (image) forms, such as satellite
images.