
High Dimensional Data Management: Dr. Mohammed Eunus Ali

The document discusses high dimensional data management. It provides an overview of spatial, temporal and multimedia data and the need to manage large collections of such data. It describes spatial data management techniques including raster and vector data models. It also discusses moving object data management, high dimensional data management techniques, and outlines topics to be covered in the course like spatial databases, modeling spatial data using DBMS, and modeling spatial relationships and data types in a spatial database management system (SDBMS) data model.


High Dimensional Data Management

Dr. Mohammed Eunus Ali


([email protected])

Department of Computer Science and Engineering


Bangladesh University of Engineering and Technology (BUET)
Dhaka-1000, Bangladesh
Motivation
• Spatial data
– Geographic Information: Melbourne (37, 145)
– Which city is at (30, 140)?

– Computer Aided Design: width and height (40, 50)


– Any part that has a width of 40 and height of 50?

• Temporal data
– Location of moving cars
– How many cars are in the downtown area?

• Multimedia data
– Color histograms of images
– Give me the most similar image to a given query image

– Multimedia Features: color, shape, texture


Objectives

Motivation of Spatial, Temporal and Multimedia Data

Spatial Data Management Techniques

Moving Object Data Management Techniques

High Dimensional Data Management Techniques

Study of Advanced Topics on Database Management

Assessment

Critical Review on a Research Topic and Presentation: 30% (Report, 5 pages: 15%; Presentation: 15%)

Project: 20%

Final Exam: 50%

Course Outline

Weeks 1-7: Lectures by the Instructor

Week 2: Groups Formation (Each group of 1-3 Students)

Week 3: Selecting a Research Topic + Project


Weeks 4-12: Presentations on the research topics. The order will be decided by lottery.
Week 8: Preliminary draft of the research report due
Week 14: Final Submission of Project (Whole Day)

Research Report

• Each group works on its project independently, and the report should contain your own written text and figures.
• The report should include critical reviews of 2-3 research papers.
• For each reviewed paper, you should cover the following:
– Summary
– What important problem is solved in the paper?
– Strengths and weaknesses of the paper
– Is there any way to improve the paper?
• The report should include Abstract, Introduction, Critical Reviews, Analysis and Recommendation, and Conclusion.
• The report should be 5 pages long and should be emailed in PDF form with the following subject title: [Course#-report2013] and your Student IDs
DBMS

• A DBMS is a (usually complex) piece of software that sits in front of a collection of data and mediates applications' accesses to the data, guaranteeing many properties about the data and the accesses.

A Database Management System (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.

Spatial Databases

• A common technology for some applications:
– GIS (geographic/geo-referenced data)
– VLSI design (geometric data)
– Modeling complex phenomena (spatial data)
• All need to manage large collections of
relatively simple spatial objects
• Spatial DB vs. Image/pictorial DB
– Spatial DB contains objects “ in ” the space
– Image DB contains representations “ of ” a space
(images, pictures,… : raster data)

Spatial Data Model?

The spatial component of a layer may be represented in two ways:
• in raster (image) format as pixels
• in vector format as points, lines, and areas (PLA model)

Raster versus Vector Model

• Raster data model
– location is referenced by a grid cell in a rectangular array (matrix)
– attribute is represented as a single value for that cell
– much data comes in this form
• images from remote sensing (LANDSAT, SPOT)
• scanned maps
• elevation data from USGS
– best for continuous features:
• elevation
• temperature
• soil type
• land use

• Vector data model
– location is referenced by x,y coordinates, which can be linked to form lines and polygons
– attributes are referenced through a unique ID number to tables
– much data comes in this form
• DIME and TIGER files from US Census
• DLG from USGS for streams, roads, etc.
• census data (tabular)
– best for features with discrete boundaries
• property lines
• political boundaries
• transportation
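
The contrast between the two models can be sketched in a few lines of Python. This is only an illustration: the grid values, feature IDs, and attribute names below are made up for the example and do not come from any of the data sources listed above.

```python
# Raster: a rectangular grid where each cell holds a single attribute value
# (here, invented elevation values).
elevation = [
    [10, 12, 15],
    [11, 14, 18],
    [13, 17, 21],
]

def raster_value(grid, row, col):
    """Look up the attribute stored for one grid cell."""
    return grid[row][col]

# Vector: x,y coordinates linked into lines and polygons; attributes are kept
# in a separate table and reached through a unique feature ID.
features = {
    101: [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],  # polygon ring
    102: [(0.0, 0.0), (2.0, 2.0), (5.0, 2.0)],              # polyline
}
attributes = {
    101: {"kind": "property"},
    102: {"kind": "road"},
}

def vector_attribute(feature_id, key):
    """Follow the feature ID into the attribute table."""
    return attributes[feature_id][key]

print(raster_value(elevation, 1, 2))
print(vector_attribute(102, "kind"))
```

The raster lookup is positional (row, column), while the vector lookup goes through an ID, which is why the vector model suits features with discrete boundaries and tabular attributes.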
SDBMS

A spatial database system:


• Is a database system
– A DBMS with additional capabilities for handling spatial data
• Offers spatial data types (SDTs) in its data model
and query language
– Structure in space: e.g., POINT, LINE, REGION
– Relationships among them: (l intersects r)
• Supports SDT in its implementation providing at
least
– spatial indexing (retrieving objects in particular area without
scanning the whole space)
– efficient algorithms for spatial joins (not simply filtering the
cartesian product)

An Example SDBMS: Oracle

• Oracle Spatial Extension


– can work with Oracle 10g DBMS
– has spatial data types (e.g. polygon), operations (e.g. overlap)
callable from SQL3 query language
– has spatial indices, e.g. R-trees

SDBMS

Spatial Data using DBMS

Modeling

Assuming 2-D and a GIS application, two basic things need to be represented:
• Objects in space: cities, forests, or rivers → single objects
• Space/Coverage: say something about every point in space (e.g., partitions, thematic maps) → spatially related collections of objects
Modeling: spatial primitives for objects

• Point: object represented only by its location in space, e.g. center of a state

• Line (actually a curve or polyline): representation of moving through or connections in space, e.g. road, river

• Region: representation of an extent in 2-D space, e.g. lake, city
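
As a rough sketch, the three primitives could be represented with plain Python dataclasses. The class and field names are assumptions chosen for this example; a real SDBMS would use its own type definitions.

```python
from dataclasses import dataclass
from typing import Tuple

Coord = Tuple[float, float]

@dataclass(frozen=True)
class Point:
    """An object represented only by its location."""
    x: float
    y: float

@dataclass(frozen=True)
class Line:
    """A polyline: consecutive vertices are joined by straight segments."""
    vertices: Tuple[Coord, ...]

    def segment_count(self) -> int:
        return max(len(self.vertices) - 1, 0)

@dataclass(frozen=True)
class Region:
    """A simple polygon given by its boundary ring (closing edge implied)."""
    boundary: Tuple[Coord, ...]

# Illustrative values only.
center = Point(2.0, 1.5)
river = Line(((0.0, 0.0), (2.0, 1.0), (5.0, 1.0)))
lake = Region(((0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)))

print(center, river.segment_count(), len(lake.boundary))
```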
Modeling: coverages

• Partition: set of region objects that are required to be disjoint (adjacency is of interest, i.e. region objects with common boundaries), e.g. thematic maps
• Networks: embedded graph in plane consisting of
set of points (vertices) and lines (edges) objects, e.g.
highways, power supply lines, rivers
• Others:
nested partitions
digital terrain models
Discrete Geometric Bases

• Is Euclidean geometry a suitable base for modeling?


• Problem: space is continuous, but computer numbers are discrete
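
A small worked example may make the problem concrete. The sketch below, which uses exact rational arithmetic and two made-up segments with integer endpoints, shows that their intersection point does not fall on the integer grid, and that snapping it to the nearest grid point moves it off one of the segments.

```python
import math
from fractions import Fraction

def line_intersection(p1, p2, p3, p4):
    """Exact intersection of the lines through p1-p2 and p3-p4 (None if parallel)."""
    x1, y1 = Fraction(p1[0]), Fraction(p1[1])
    x2, y2 = Fraction(p2[0]), Fraction(p2[1])
    x3, y3 = Fraction(p3[0]), Fraction(p3[1])
    x4, y4 = Fraction(p4[0]), Fraction(p4[1])
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / denom,
            (a * (y3 - y4) - (y1 - y2) * b) / denom)

def on_grid(pt):
    """True if both coordinates are integers."""
    return all(c.denominator == 1 for c in pt)

def snap(pt):
    """Move a point to the nearest grid point (halves round up)."""
    return tuple(math.floor(c + Fraction(1, 2)) for c in pt)

def on_line(q, p1, p2):
    """True if q is collinear with p1 and p2 (zero cross product)."""
    return (p2[0] - p1[0]) * (q[1] - p1[1]) - (p2[1] - p1[1]) * (q[0] - p1[0]) == 0

a1, a2 = (0, 0), (3, 2)   # segment A, integer endpoints
b1, b2 = (0, 2), (3, 0)   # segment B, integer endpoints
exact = line_intersection(a1, a2, b1, b2)
snapped = snap(exact)
print(exact, on_grid(exact))
print(snapped, on_line(snapped, a1, a2))
```

Here the exact intersection is (3/2, 1); after snapping to (2, 1) the point no longer lies on segment A, which is the kind of inconsistency a discrete geometric basis is meant to avoid.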

Discrete Model

• Two approaches:
– Simplicial complexes
• Frank & Kuhn 86
• Egenhofer, Frank & Jackson 89
– Realms
• Güting & Schneider 93
• Schneider 97

Realm concepts

• A realm is a set of points and non-intersecting line segments over a discrete domain that is a grid.
• Values of spatial data types can be composed from
the objects present in a realm.

Realm for points, lines and regions

Regions : A, B
Lines : C
Points : D

ROSE Algebra (Güting & Schneider 95)

ROSE = RObust Spatial Extension
A system of realm-based spatial data types → objects composed from realm elements
Types: points, lines, regions

Modeling: a sample spatial type system (1/2)
EXT={lines, regions}, GEO={points, lines, regions}
• Spatial predicates for topological
relationships:
– inside: geo x regions → bool
– intersect, meets: ext1 x ext2 → bool
– adjacent, encloses: regions x regions → bool
• Operations returning atomic spatial data
types:
– intersection: lines x lines → points
– intersection: regions x regions → regions
– plus, minus: geo x geo → geo
– contour: regions → lines
Modeling: a sample spatial type system (2/2)
• Spatial operators returning numbers
– dist: geo1 x geo2 → real
– perimeter, area: regions → real
• Spatial operations on sets of objects
– sum: a spatial aggregate function that returns the geometric union of all attribute values, e.g. the union of a set of provinces determines the area of the country
– closest: determines, within a set of objects, those whose spatial attribute value has minimal distance from a geometric query object
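
A few of these operators can be sketched in Python under simplifying assumptions: dist is restricted to point arguments, regions are simple polygons given as vertex lists, and closest returns a single object. The function names follow the slide, but the signatures are choices made for this example.

```python
import math

def dist(p, q):
    """Euclidean distance between two points (a restricted case of geo x geo)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def perimeter(region):
    """Sum of the edge lengths of a polygon ring (closing edge implied)."""
    n = len(region)
    return sum(dist(region[i], region[(i + 1) % n]) for i in range(n))

def area(region):
    """Polygon area via the shoelace formula."""
    n = len(region)
    total = 0.0
    for i in range(n):
        x1, y1 = region[i]
        x2, y2 = region[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def closest(objects, query, key):
    """Object whose spatial attribute (extracted by key) is nearest to query."""
    return min(objects, key=lambda obj: dist(key(obj), query))

# Illustrative data only.
block = [(0, 0), (4, 0), (4, 3), (0, 3)]
sites = [("a", (1, 1)), ("b", (6, 8)), ("c", (3, 4))]

print(perimeter(block), area(block))
print(closest(sites, (3, 3), key=lambda s: s[1]))
```

The geometric union behind sum is left out because it needs a polygon clipping algorithm, which is beyond a short sketch.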
Modeling: spatial relationships

• Topological relationships: e.g. adjacent, inside, disjoint. These are invariant under topological transformations like translation, scaling, rotation
• Direction relationships: e.g. above, below, or north_of, southwest_of, …
• Metric relationships: e.g. distance

Valid topological relationships between two simple regions (no holes, connected): disjoint, in, touch, equal, cover, overlap
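
For intuition, the relationships can be classified directly when both regions are axis-aligned rectangles. This is a deliberately narrow sketch, not a general algorithm for simple regions. The function name relate and the labels "contains" and "covered_by" (the inverses of "in" and "cover") are assumptions made for this example.

```python
def relate(a, b):
    """Classify rectangle a against rectangle b.

    Each rectangle is (xmin, ymin, xmax, ymax) with xmin < xmax and ymin < ymax.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if a == b:
        return "equal"
    # Closed rectangles share at least one point.
    closures_meet = ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1
    if not closures_meet:
        return "disjoint"
    # Open interiors share at least one point.
    interiors_meet = ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1
    if not interiors_meet:
        return "touch"
    a_in_b = bx0 <= ax0 and by0 <= ay0 and ax1 <= bx1 and ay1 <= by1
    b_in_a = ax0 <= bx0 and ay0 <= by0 and bx1 <= ax1 and by1 <= ay1
    # Under containment, a shared side coordinate means the boundaries meet.
    boundaries_meet = ax0 == bx0 or ay0 == by0 or ax1 == bx1 or ay1 == by1
    if a_in_b:
        return "covered_by" if boundaries_meet else "in"
    if b_in_a:
        return "cover" if boundaries_meet else "contains"
    return "overlap"

print(relate((1, 1, 2, 2), (0, 0, 3, 3)))
print(relate((0, 0, 1, 1), (1, 0, 2, 1)))
print(relate((0, 0, 2, 2), (1, 1, 3, 3)))
```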
Topological Relationship

Modeling: SDBMS data model

• DBMS data model must be extended by SDTs at the level of atomic data types (such as integer, string), or better be open for user-defined types (OR-DBMS approach):

relation states (sname: STRING; area: REGION; spop: INTEGER);
relation cities (cname: STRING; center: POINT; ext: REGION; cpop: INTEGER);
relation rivers (rname: STRING; route: LINE)
Querying

• Two main issues:


1. Connecting the operations of a spatial algebra (including predicates for spatial relationships) to the facilities of a DBMS query language. The fundamental spatial algebra operators are:
– Spatial selection
– Spatial join
2. Providing graphical presentation of spatial data (i.e.
results of queries), and graphical input of SDT
values used in queries.
Querying: spatial selection

• Spatial selection: returning those objects satisfying a spatial predicate with the query object
– “All cities in Bavaria”
SELECT cname FROM cities c WHERE c.center inside Bavaria.area
– “All rivers intersecting a query window”
SELECT * FROM rivers r WHERE r.route intersects Window
– “All big cities no more than 100 km from Hagen”
SELECT cname FROM cities c
WHERE dist(c.center, Hagen.center) < 100 and c.cpop > 500k
(conjunction with other predicates and query optimization)
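
The third query can be mimicked over an in-memory list to show how a spatial predicate combines with an ordinary one. Everything in this sketch is hypothetical: the City class, the planar coordinates in km, the populations, and the query point standing in for the city center.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class City:
    cname: str
    center: Tuple[float, float]  # planar coordinates in km (made up)
    cpop: int

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def select(relation, predicate):
    """Spatial selection by full scan: keep the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

cities = [
    City("A", (10.0, 0.0), 600_000),     # near and big
    City("B", (30.0, 40.0), 200_000),    # near but small
    City("C", (90.0, 120.0), 900_000),   # big but too far
    City("E", (30.0, 40.0), 700_000),    # near and big
]
query_center = (0.0, 0.0)

result = select(
    cities,
    lambda c: dist(c.center, query_center) < 100 and c.cpop > 500_000,
)
print([c.cname for c in result])
```

A full scan like this is what spatial indexing is meant to avoid. An optimizer would also choose which of the two predicates to evaluate first.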
Querying: spatial join

• Spatial join: A join which compares any two joined objects based on a predicate on their spatial attribute values.
– “For each river passing through Bavaria, find all cities within 50 km of it.”
SELECT r.rname, c.cname, length(intersection(r.route, c.ext))
FROM rivers r, cities c
WHERE r.route intersects Bavaria.area and dist(r.route, c.ext) < 50
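
A naive nested-loop version of such a join is sketched below. It simplifies the query in two ways that should be kept in mind: each city is reduced to its center point rather than its region, and the Bavaria filter is omitted. The helper names and all data are invented for the example.

```python
import math

def point_segment_dist(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_line_dist(p, route):
    """Distance from a point to a polyline: minimum over its segments."""
    return min(point_segment_dist(p, route[i], route[i + 1])
               for i in range(len(route) - 1))

def spatial_join(rivers, cities, max_dist):
    """Nested-loop join: test the predicate on every (river, city) pair."""
    return [(rname, cname)
            for rname, route in rivers
            for cname, center in cities
            if point_line_dist(center, route) < max_dist]

rivers = [("R1", [(0, 0), (100, 0)]), ("R2", [(0, 200), (100, 200)])]
cities = [("X", (50, 30)), ("Y", (50, 180)), ("Z", (50, 100))]

print(spatial_join(rivers, cities, 50))
```

This evaluates the predicate on the full Cartesian product, which is exactly what efficient spatial join algorithms try to avoid.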
Querying: I/O (1/2)
• Graphical I/O issue: how to determine “Window” or
“Bavaria” in previous examples (input); or how to
show “intersection(route, Bavaria.area)” or “r.route”
(output) (results are usually a combination of several
queries).
• Requirements for spatial querying [Egenhofer]:
– Spatial data types
– Graphical display of query results
– Graphical combination (overlay) of several query results (start a
new picture, add/remove layers, change order of layers)
– Display of context (e.g., show background such as a raster
image (satellite image) or boundary of states)
– Facility to check the content of a display (which query
contributed to the content)
Querying: I/O (2/2)
Other requirements for spatial querying [Egenhofer]:
– Extended dialog: use pointing device to select objects within a
subarea, zooming, …
– Varying graphical representations: different colors, patterns, intensity, symbols for different object classes or even objects within a class
– Legend: clarify the assignment of graphical representations to
object classes
– Label placement: selecting object attributes (e.g., population) as
labels
– Scale selection: determines not only the size of the graphical representations but also what kind of symbol is used and whether an object is shown at all
– Subarea for queries: focus attention for follow-up queries
Data Structures & Algorithms
1. Implementation of spatial algebra in an integrated
manner with the DBMS query processing

2. Not just simply implementing atomic operations using computational geometry algorithms, but considering the use of the predicates within set-oriented query processing, i.e., use of spatial indexing or access methods, and spatial join algorithms
Data Structures (1/3)

• Representation of a value of an SDT must be compatible with two different views:
1. DBMS perspective:
– Same as attribute values of other types with respect
to generic operations
– Can have varying and possibly large size
– Reside permanently on disk page(s)
– Can efficiently be loaded into memory
– Offers a number of type-specific implementations for
generic operations needed by the DBMS (e.g.,
transformation functions from/to ASCII or graphic)
Data Structures (2/3)

2. Spatial algebra implementation perspective, the representation:
– Is a value of some programming language data type
– Is some arbitrary data structure which is possibly
quite complex
– Supports efficient computational geometry algorithms
for spatial algebra operations
– Is not geared only to one particular algorithm but is
balanced to support many operations well enough
Data Structures (3/3)

• From both perspectives, the representation should be mapped by the compiler into a single or perhaps a few contiguous areas (to support DBMS paging). It also supports:
– Plane sweep sequence: the object's vertices are stored in a specific sweep order (e.g. x-order) to expedite plane-sweep operations
– Approximations: some approximations (e.g. the minimum bounding rectangle, MBR) are stored as well to speed up operations (e.g. comparison)
– Stored unary function values: values such as perimeter or area are stored once the object is constructed to eliminate expensive computations later
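
The three supporting features can be illustrated with a small Python class that computes them once at construction time. The class name StoredRegion and its fields are hypothetical. The sketch says nothing about how the value would actually be laid out on disk pages.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StoredRegion:
    ring: List[Tuple[float, float]]                            # boundary vertices in order
    sweep_order: List[Tuple[float, float]] = field(init=False)
    mbr: Tuple[float, float, float, float] = field(init=False)
    area: float = field(init=False)
    perimeter: float = field(init=False)

    def __post_init__(self):
        n = len(self.ring)
        # Plane sweep sequence: vertices sorted by x (ties broken by y).
        self.sweep_order = sorted(self.ring)
        # Approximation: minimum bounding rectangle (xmin, ymin, xmax, ymax).
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        self.mbr = (min(xs), min(ys), max(xs), max(ys))
        # Stored unary function values, computed once.
        edges = [(self.ring[i], self.ring[(i + 1) % n]) for i in range(n)]
        self.area = abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2
        self.perimeter = sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in edges)

def mbr_intersect(a, b):
    """Cheap filter: if two MBRs are disjoint, the exact geometries are too."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

r = StoredRegion([(0, 0), (4, 0), (4, 3), (0, 3)])
print(r.sweep_order, r.mbr, r.area, r.perimeter)
print(mbr_intersect(r.mbr, (5, 5, 6, 6)))
```

The MBR test only rules candidates out. Pairs that pass it still need the exact geometric comparison.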
Spatial Indexing

• Used to expedite spatial selection (as well as other operations such as spatial joins, …)
• It organizes space and the objects in it in some way so that only parts of the space and a subset of the objects need to be considered to answer a query.
• Two main approaches:
1. Dedicated spatial data structures (e.g. R-tree)
2. Spatial objects mapped to a 1-D space to utilize
standard indexing techniques (e.g. B-tree)
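
One well-known way to realize the second approach is a Z-order (Morton) code, which interleaves the bits of x and y into a single key. The sketch below is an assumption-laden illustration: it uses a sorted Python list with bisect as a stand-in for a B-tree, and it expects non-negative integer coordinates below 2^16. The slide does not prescribe this particular mapping.

```python
import bisect

def interleave(x, y, bits=16):
    """Z-order key: bit i of x goes to position 2i, bit i of y to position 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

class ZOrderIndex:
    def __init__(self):
        self._entries = []  # sorted list of (z, (x, y)), standing in for a B-tree

    def insert(self, x, y):
        bisect.insort(self._entries, (interleave(x, y), (x, y)))

    def range_query(self, x0, y0, x1, y1):
        """Points with x0 <= x <= x1 and y0 <= y <= y1."""
        # The key is monotone in x and in y, so every hit has a key between
        # the keys of the lower-left and upper-right corners.
        zmin, zmax = interleave(x0, y0), interleave(x1, y1)
        lo = bisect.bisect_left(self._entries, (zmin,))
        hi = bisect.bisect_left(self._entries, (zmax + 1,))
        # The 1-D key range may contain points outside the rectangle: filter them.
        return [p for _, p in self._entries[lo:hi]
                if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]

idx = ZOrderIndex()
for pt in [(1, 1), (2, 3), (5, 5), (7, 2), (3, 3)]:
    idx.insert(*pt)

print(interleave(3, 5))
print(idx.range_query(1, 1, 3, 3))
```

The final filter matters because a contiguous key interval is only a superset of the query rectangle. Practical systems split the rectangle into several key intervals to reduce such false hits.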
Spatial Indexing: operations

• Spatial data structures store either points or rectangles (for line or region values)
• Operations on those structures: insert, delete, member
• Query types for points:
– Range query: all points within a query rectangle
– Nearest neighbor: point closest to a query point
– Distance scan: enumerate points in increasing distance from a
query point.
• Query types for rectangles:
– Intersection query
– Containment query
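
The point operations and query types listed above can be demonstrated with a simple uniform grid. This is a minimal sketch under stated assumptions, not one of the structures named in the slides: the GridIndex class, its cell size, and the sample points are made up, and the distance scan is a brute-force heap over all stored points rather than an incremental index traversal.

```python
import heapq
import math
from collections import defaultdict

class GridIndex:
    def __init__(self, cell_size=10.0):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> set of points

    def _cell(self, p):
        return (math.floor(p[0] / self.cell_size), math.floor(p[1] / self.cell_size))

    def insert(self, p):
        self.cells[self._cell(p)].add(p)

    def delete(self, p):
        self.cells[self._cell(p)].discard(p)

    def member(self, p):
        return p in self.cells.get(self._cell(p), ())

    def range_query(self, x0, y0, x1, y1):
        """All points within the query rectangle; only overlapping cells are visited."""
        cx0, cy0 = self._cell((x0, y0))
        cx1, cy1 = self._cell((x1, y1))
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for p in self.cells.get((cx, cy), ()):
                    if x0 <= p[0] <= x1 and y0 <= p[1] <= y1:
                        hits.append(p)
        return sorted(hits)

    def distance_scan(self, q):
        """Yield points in increasing distance from q (brute force over all points)."""
        heap = [(math.dist(p, q), p) for cell in self.cells.values() for p in cell]
        heapq.heapify(heap)
        while heap:
            yield heapq.heappop(heap)[1]

    def nearest(self, q):
        """Nearest neighbor: the first point produced by the distance scan."""
        return next(self.distance_scan(q), None)

grid = GridIndex(cell_size=10.0)
for pt in [(1, 1), (5, 5), (12, 3), (25, 25)]:
    grid.insert(pt)

print(grid.range_query(0, 0, 12, 6))
print(grid.nearest((11, 3)))
print(list(grid.distance_scan((0, 0))))
```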
