Multidimensional Indexes

This document outlines the concepts of multidimensional indexes in the context of data structures and queries, focusing on applications that require multiple dimensions such as Geographic Information Systems. It discusses various data structures like grid files, kd-trees, quad trees, and R-trees, as well as query types such as nearest-neighbor and range queries. Additionally, it covers hash-like structures for multidimensional data and introduces bitmap indexes for efficient data retrieval.

Unit II: Multidimensional Indexes
Course code: CSE 432
Program: B.Tech. , Sem VI

Dr. Md Asif Thanedar (Ph.D. NITW)

Assistant Professor
Department of CSE
[email protected]
9494802627
June 16, 2025 1
Topics
• Applications which require Multiple dimensions
• Hash-like structures for Multidimensional data
• Tree-like structures for Multidimensional data

Applications that require Multiple Dimensions
• We consider two classes of multidimensional applications:
• Geographic: the data elements lie in a two-dimensional or three-dimensional world.
• More abstract: every attribute of a relation can be thought of as a dimension, so all tuples are points in the space defined by those dimensions.

Geographic Information Systems
• A geographic information system stores objects in a (typically) two-dimensional space.
• Examples:
• Points in a square.
• Maps where the objects represent houses, bridges, roads, or other physical objects.
• An integrated-circuit design with different regions in a 2D space.
• Windows and icons on a screen, viewed as a collection of objects.
Figure 1: A map of objects in 2D space

Geographic Information Systems
• What kinds of queries are asked?
• The queries are not typical SQL queries; however, they can be expressed in SQL with some effort.
• The types of queries are:
• Partial-match queries:
• We specify values for one or more dimensions and look for all points matching those values.
• Range queries:
• We give ranges for one or more dimensions and ask for the set of points within those ranges.
• Nearest-neighbor queries:
• We ask for the closest point to a given point.
• Where-am-I queries:
• We are given a point and want to know in which shape, object, or location the point lies.
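The four query types can be illustrated on a small in-memory data set. This is only a minimal sketch that scans every point and uses no index; the sample points, the named regions, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical sample data: (x, y) points and named rectangles (xll, yll, xur, yur).
POINTS = [(10, 20), (10, 35), (40, 20), (55, 60)]
REGIONS = {"park": (0, 0, 30, 30), "lake": (35, 10, 60, 70)}

def partial_match(points, x=None, y=None):
    """Points whose specified dimensions equal the given values."""
    return [p for p in points
            if (x is None or p[0] == x) and (y is None or p[1] == y)]

def range_query(points, x_range, y_range):
    """Points inside the closed ranges given for both dimensions."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [p for p in points if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi]

def nearest_neighbor(points, q):
    """The point closest to q by Euclidean distance."""
    return min(points, key=lambda p: math.dist(p, q))

def where_am_i(regions, q):
    """Names of all regions that contain the point q."""
    return [name for name, (xll, yll, xur, yur) in regions.items()
            if xll <= q[0] <= xur and yll <= q[1] <= yur]
```

Each function costs time linear in the number of points or regions; the index structures in the rest of this unit aim to answer the same questions while touching far fewer records.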
Data Cube
• A data cube is built from a fact table whose data can be seen as existing in a high-dimensional space.
• It is common to view the data as a relation with an attribute for each property.
• These attributes can be seen as the dimensions of a multidimensional space, the “data cube”.

Figure 2: Data cube

Multidimensional Queries in SQL
Query type 1:
• Suppose we want to answer nearest-neighbor queries about a set of points in two-dimensional space.
• We represent the points as a relation of pairs:
Points(x, y)
• The two attributes, x and y, represent the x-coordinate and y-coordinate, respectively.
• We want the nearest point to the point (10.0, 20.0).
• The query is shown in Fig. 3.
Figure 3: SQL query to find nearest point
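The figure itself is not reproduced in this text. One way such a query could be written, which is not necessarily the formulation in the figure, is to select every point p for which no other point q is strictly closer to (10.0, 20.0). The sketch below runs that SQL through Python's built-in sqlite3 module; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Points (x REAL, y REAL)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Points VALUES (?, ?)",
                 [(0.0, 0.0), (12.0, 21.0), (9.0, 19.0), (30.0, 40.0)])

# p is a nearest point if no point q has a smaller squared distance to (10, 20).
NEAREST = """
SELECT p.x, p.y FROM Points p
WHERE NOT EXISTS (
    SELECT * FROM Points q
    WHERE (q.x - 10.0) * (q.x - 10.0) + (q.y - 20.0) * (q.y - 20.0)
        < (p.x - 10.0) * (p.x - 10.0) + (p.y - 20.0) * (p.y - 20.0)
)
"""
nearest = conn.execute(NEAREST).fetchall()
```

Comparing squared distances avoids a square-root function. If several points tie for the minimum distance, this query returns all of them.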
Multidimensional Queries in SQL
Query type 2:
• Rectangular shapes are common in geographic systems.
• A rectangle can be represented in several ways.
• A popular one uses the coordinates of the lower-left and upper-right corners.
• Consider the relation Rectangles with the schema
Rectangles(id, xll, yll, xur, yur)
• A query can then return the set of rectangles enclosing the point (10.0, 20.0).
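A rectangle encloses the point when its lower-left corner is at or below-left of the point and its upper-right corner is at or above-right of it. The following sketch, again using sqlite3, shows one possible form of the query; the column names xur and yur stand for the upper-right corner, and the three sample rectangles are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Rectangles (id INTEGER, xll REAL, yll REAL, xur REAL, yur REAL)")
conn.executemany("INSERT INTO Rectangles VALUES (?, ?, ?, ?, ?)", [
    (1, 0.0, 0.0, 15.0, 25.0),    # encloses (10, 20)
    (2, 11.0, 0.0, 30.0, 30.0),   # lies entirely to the right of x = 10
    (3, 5.0, 15.0, 10.0, 20.0),   # (10, 20) is exactly its upper-right corner
])

ENCLOSING = """
SELECT id FROM Rectangles
WHERE xll <= 10.0 AND yll <= 20.0
  AND xur >= 10.0 AND yur >= 20.0
ORDER BY id
"""
enclosing_ids = [row[0] for row in conn.execute(ENCLOSING)]
```

With non-strict comparisons, a point on the boundary counts as enclosed, so rectangles 1 and 3 qualify here. Using strict comparisons instead would exclude rectangle 3.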

Multidimensional Queries in SQL
Query type 3:
• A data cube
• Suitable data is typically organized into a fact table.
• The fact table
• Provides the basic elements being recorded (i.e., attributes such as item).
• Dimension tables
• Provide the properties of the values of each dimension.

Executing Range queries
• Consider all the points in 2D space.
• We are given ranges in both dimensions.
• We use a B-tree index on each dimension to get pointers to all the records in the range for x and for y.
• Finally, we intersect the two sets of pointers.
• Example:
• Consider 1,000,000 points in 2D space.
• The x and y coordinates range from 0 to 1000, and there are B-tree indexes on both x and y.
• The range query asks for the points in the square of side 100 at the center of the space, i.e., 450 ≤ x ≤ 550 and 450 ≤ y ≤ 550.
• Using the B-trees for x and y, we can find the pointers to all records in each range. There are about 100,000 pointers for each of x and y.
• Assuming the points are evenly distributed, approximately 10,000 pointers lie in the intersection.
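The two-index strategy can be sketched on a scaled-down data set. In this illustration a sorted list searched with bisect stands in for each B-tree, and the points form a regular 100 x 100 lattice rather than 1,000,000 arbitrary points; both choices, along with the function names, are assumptions made to keep the example small and deterministic.

```python
from bisect import bisect_left, bisect_right

def build_index(points, dim):
    """Sorted (key, record_id) pairs: a stand-in for a B-tree on one dimension."""
    return sorted((p[dim], rid) for rid, p in enumerate(points))

def pointers_in_range(index, lo, hi):
    """Record ids whose key lies in [lo, hi], located by binary search."""
    start = bisect_left(index, (lo, -1))
    end = bisect_right(index, (hi, float("inf")))
    return {rid for _, rid in index[start:end]}

# Scaled-down lattice: coordinates 0, 10, ..., 990 in each dimension.
points = [(x, y) for x in range(0, 1000, 10) for y in range(0, 1000, 10)]
x_index = build_index(points, 0)
y_index = build_index(points, 1)

x_ptrs = pointers_in_range(x_index, 450, 550)   # all records in the x-range
y_ptrs = pointers_in_range(y_index, 450, 550)   # all records in the y-range
answer = x_ptrs & y_ptrs                        # records in both ranges
```

On this lattice each single-dimension lookup returns 1,100 of the 10,000 record ids, but only 121 survive the intersection. The same imbalance appears in the full-size example: most of the pointers retrieved from each index are discarded.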
Executing Nearest-Neighbor Queries
• Any data structure that supports range queries can be used to answer nearest-neighbor queries by picking a range in each dimension.
• Unfortunately, two things can go wrong:
• There is no point within the selected range.
• The closest point within the range might not be the closest point overall.
• Example: Consider the relation Points(x, y) with dimensions x and y.
• We want the closest point to (10, 20) and guess that it lies within distance d.
• We use the B-tree on x to get all records with x-coordinate between 10 - d and 10 + d and, similarly, the B-tree on y for y-coordinates between 20 - d and 20 + d.
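Both failure cases can be handled by repeating the range query with a different d. The sketch below is one possible way to do that, not a method prescribed by the slides: it doubles d while the range is empty and, if the best candidate is farther than d, repeats the query once with d set to that distance. The brute-force range_query is a stand-in for the two B-tree lookups, and all names are illustrative.

```python
import math

def range_query(points, q, d):
    """Points in the square of half-side d centered at q (stand-in for index lookups)."""
    return [p for p in points
            if abs(p[0] - q[0]) <= d and abs(p[1] - q[1]) <= d]

def nearest_neighbor(points, q, d=1.0):
    """Nearest point to q, found by range queries with an adjusted distance d."""
    if d <= 0:
        raise ValueError("d must be positive")
    if not points:
        return None
    while True:
        candidates = range_query(points, q, d)
        if not candidates:
            d *= 2        # case 1: the range is empty, so widen it
            continue
        best = min(candidates, key=lambda p: math.dist(p, q))
        r = math.dist(best, q)
        if r <= d:
            return best   # every point outside the square is farther than d
        d = r             # case 2: a closer point may lie just outside the square
```

The second case arises because the range is a square while distance is measured in a circle: a point near a corner of the square can be farther from the query point than a point just outside one of its sides.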

Hash-Like structures for Multidimensional Data
• Hash table: the bucket for a point is a function of all the attributes or dimensions.
• Grid file: does not hash values along the dimensions, but rather partitions each dimension by sorting the values along it.
• Partitioned hashing: another hash-like structure, which does hash the various dimensions, with each dimension contributing to the bucket number.

Grid Files
• One of the simplest data structures for queries involving multidimensional data is the grid file.
• In each dimension, grid lines partition the space into stripes.
• The points of the space are thereby partitioned into the cells of a grid.
• Example: A database of customers for gold jewelry with many attributes (for simplicity, consider only age and salary).
• Who buys gold jewelry?
• The database has 12 customers, given as (age, salary in $K): (25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110), (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)
Lookup in a Grid File (Implementation)
• We use an array with as many dimensions as the data file.
• To locate the bucket for a point, we look at each component of the point and determine its position in the grid for that dimension.
• To do so, we need to know the list of values at which the grid lines occur in each dimension.
• The positions of the point in each dimension together determine the bucket.
• Identify the buckets for points with:
• Salary between $90K and $225K and age between 0 and 40.
• Salary below $90K and age above 55.
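The lookup procedure amounts to one binary search per dimension. The sketch below assumes grid lines at ages 40 and 55 and at salaries $90K and $225K, which are inferred from the two lookups above rather than stated explicitly, and it assumes that a point lying exactly on a grid line belongs to the stripe above it. The function and variable names are illustrative.

```python
from bisect import bisect_right

# Assumed grid lines (inferred from the lookup example); salaries are in $K.
AGE_LINES = [40, 55]
SALARY_LINES = [90, 225]

CUSTOMERS = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
             (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]

def bucket_of(age, salary):
    """Grid cell (age stripe, salary stripe) that holds the point."""
    return (bisect_right(AGE_LINES, age), bisect_right(SALARY_LINES, salary))

# The grid is an array with one dimension per attribute; each cell is a bucket.
grid = [[[] for _ in range(len(SALARY_LINES) + 1)]
        for _ in range(len(AGE_LINES) + 1)]
for age, salary in CUSTOMERS:
    i, j = bucket_of(age, salary)
    grid[i][j].append((age, salary))
```

Under these assumptions the first bucket asked about is grid[0][1] and the second is grid[2][0], and both turn out to be empty for the 12 sample customers. Empty buckets are typical of grid files when the data is not spread evenly.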
Insertion into Grid Files
• We insert a new record into a grid file by using the lookup procedure to find its bucket.
• If there is room in the block for the bucket, we insert the record.
• When there is no room, there are two approaches:
• Add an overflow block to the bucket and insert the record.
• Reorganize the structure by adding or moving grid lines. Adding a grid line splits all the buckets along that line.
• As a result, it may not be possible to select a new grid line that does the best for all buckets.
• Example: we want to add the record (52, $200K) to the data file. The possible splits in this case are:
• A vertical line at age = 51. This line does nothing to split the buckets above or below.
• A horizontal line at salary = 130, which also splits the bucket to the right (age 55-100, salary 90-225).
• A horizontal line at salary = 115.
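The effect of each candidate line can be checked by counting how it divides the points of every bucket it crosses. The sketch below does this for the three candidates; the bucket contents are derived from the 12 sample customers under the assumption that the existing grid lines are at ages 40 and 55 and salaries $90K and $225K, and the helper name is illustrative.

```python
AGE, SALARY = 0, 1

def split_counts(bucket, dim, line):
    """Number of points of a bucket below, and at or above, a new grid line."""
    below = sum(1 for p in bucket if p[dim] < line)
    return below, len(bucket) - below

# Assumed bucket contents after adding (52, 200) to the central bucket.
central = [(50, 100), (50, 120), (52, 200)]   # age 40-55, salary 90-225: overfull
right = [(70, 110), (85, 140)]                # age 55-100, salary 90-225
below_central = [(45, 60), (50, 75)]          # age 40-55, salary under 90
above_central = [(45, 350), (50, 275)]        # age 40-55, salary 225 and up

candidates = {
    "age = 51": (AGE, 51, [central, below_central, above_central]),
    "salary = 130": (SALARY, 130, [central, right]),
    "salary = 115": (SALARY, 115, [central, right]),
}
report = {name: [split_counts(b, dim, line) for b in buckets]
          for name, (dim, line, buckets) in candidates.items()}
```

All three lines relieve the central bucket, but only the two horizontal lines also divide a neighboring bucket evenly; the vertical line leaves both points of each neighbor on the same side.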
Partitioned Hash Functions
• A hash function produces a sequence of k bits. These k bits are divided among the n attributes of a relation.
• More precisely, the hash function h is actually a list of hash functions (h1, h2, …, hn), where hi applies to the value of the ith attribute and produces its share of the k bits.
• The bucket for a tuple with values (v1, v2, …, vn) is computed by concatenating the bit sequences h1(v1)h2(v2) … hn(vn).

Partitioned Hash Functions: Example
• Consider the gold jewelry database, which we want to store in a partitioned hash table with eight buckets (3 bits for the bucket number). We assume that each overflow block holds two records. To locate a bucket, we devote one bit to the age attribute and the remaining two bits to the salary attribute.
• The database has 12 customers, given as (age, salary in $K): (25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110), (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)
• For the hash function on age, we take the age modulo 2.
• For the hash function on salary, we take the salary modulo 4.
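The bucket assignment for this example can be computed directly. In the sketch below the bucket number is a 3-character bit string, with the age bit first and the two salary bits after it; that ordering and the function names are assumptions made for illustration.

```python
from collections import defaultdict

CUSTOMERS = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
             (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]

def h_age(age):
    """One bit: age modulo 2."""
    return format(age % 2, "01b")

def h_salary(salary):
    """Two bits: salary (in $K) modulo 4."""
    return format(salary % 4, "02b")

def bucket_of(age, salary):
    """Bucket number: the age bit followed by the two salary bits."""
    return h_age(age) + h_salary(salary)

buckets = defaultdict(list)
for age, salary in CUSTOMERS:
    buckets[bucket_of(age, salary)].append((age, salary))
```

For example, (25, 60) has an odd age and a salary divisible by 4, so it lands in bucket 100. With these hash functions, buckets 000 and 100 each receive four of the twelve records, which is more than any other bucket, because so many of the salaries are multiples of 4.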
Tree-like Structures for Multidimensional Data
• Multiple-key indexes
• kd-trees
• Quad trees
• R-trees

Multiple-key indexes
• Consider a relation with n attributes representing data points, for which we want to support range or nearest-neighbor queries.
• A simple tree-like scheme for accessing these points is an index of indexes: a tree in which the nodes at each level are indexes for one attribute.
• Example: a relation with 2 attributes.
• The root is the index for the first attribute.
• This can be a B-tree or a hash table.
• The index associates each value of the first attribute with a pointer to another index.
• If V is a value of the first attribute, following its pointer leads to an index on the second attribute for the set of points that have V as their first attribute.
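An index of indexes can be sketched with a dictionary whose values are sorted lists. This is only an illustration under simplifying assumptions: the dictionary stands in for the root index on age, each sorted list stands in for a second-level index on salary, and the function names are made up. The data is the 12-customer gold jewelry set used throughout this unit.

```python
from bisect import bisect_left, bisect_right

CUSTOMERS = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
             (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]

def build_multikey_index(points):
    """Root index on age; each age value points to a sorted index on salary."""
    root = {}
    for age, salary in points:
        root.setdefault(age, []).append(salary)
    for salaries in root.values():
        salaries.sort()
    return root

def range_query(root, age_range, salary_range):
    """Points whose age and salary fall inside the given closed ranges."""
    (a_lo, a_hi), (s_lo, s_hi) = age_range, salary_range
    result = []
    for age in sorted(root):                  # visit root keys in order
        if a_lo <= age <= a_hi:
            salaries = root[age]              # follow the pointer for this age
            lo = bisect_left(salaries, s_lo)
            hi = bisect_right(salaries, s_hi)
            result.extend((age, s) for s in salaries[lo:hi])
    return result

index = build_multikey_index(CUSTOMERS)
hits = range_query(index, (35, 55), (100, 200))
```

A range query first narrows the root index to the qualifying ages and then runs a range search in each second-level index it reaches. This works well when the first attribute is restricted; a query that specifies only salary would have to visit every second-level index.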
Example: Multiple key indexes for gold jewelry
• Consider the gold jewelry database with two attributes (age, salary).
• The database has 12 customers: (25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110), (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)

kd-Trees
• A k-dimensional (kd) tree is like a binary search tree for multidimensional data.
• A kd-tree is a binary tree in which each interior node has an associated attribute a and value V.
• Example: a node with attribute Age and value 45.
• The node splits the data points into two parts:
• Those with a-value less than V (left subtree of the node).
• Those with a-value greater than or equal to V (right subtree of the node).
• The attributes at different levels of the tree are different, i.e., they alternate from level to level.
• The leaves are blocks, with space for as many records as a block can hold.
A kd-Tree: Example
• We assume a block holds two records.
• Consider the 12 points of the gold jewelry database with attributes (age, salary):
• (25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110), (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)

Insertion on kd-Tree: Example
• To insert a new record, we proceed as for a lookup.
• When we reach a leaf, if its block has room, we put the new record into it.
• If there is no room, we split the block into two and divide its contents according to whatever attribute is appropriate at that level.
• We want to insert (35, 500).
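Lookup and insertion can be sketched as follows for two attributes and blocks of two records. The class and function names are illustrative, and two details are assumptions not taken from the slides: the split value is a middle value among the distinct attribute values in the block, and the split falls back to the other attribute when all points in the block agree on the attribute for that level.

```python
BLOCK_SIZE = 2   # records per leaf block

class Leaf:
    def __init__(self, points=None):
        self.points = points or []

class Interior:
    def __init__(self, dim, value, left, right):
        self.dim, self.value, self.left, self.right = dim, value, left, right

def split(points, depth):
    """Split an overfull block on the attribute for this level, if it separates the points."""
    for offset in range(2):
        dim = (depth + offset) % 2
        distinct = sorted({p[dim] for p in points})
        if len(distinct) >= 2:
            value = distinct[len(distinct) // 2]
            left = Leaf([p for p in points if p[dim] < value])
            right = Leaf([p for p in points if p[dim] >= value])
            return Interior(dim, value, left, right)
    return Leaf(points)   # identical points cannot be separated; keep an overfull leaf

def insert(node, point, depth=0):
    """Insert point and return the (possibly new) root of this subtree."""
    if isinstance(node, Interior):
        if point[node.dim] < node.value:
            node.left = insert(node.left, point, depth + 1)
        else:
            node.right = insert(node.right, point, depth + 1)
        return node
    node.points.append(point)
    if len(node.points) <= BLOCK_SIZE:
        return node
    return split(node.points, depth)

def lookup(node, point):
    """Follow the splits down to a leaf and check whether the block holds the point."""
    while isinstance(node, Interior):
        node = node.left if point[node.dim] < node.value else node.right
    return point in node.points

# A block holding (25, 400) and (45, 350) overflows when (35, 500) arrives.
root = Leaf()
for p in [(25, 400), (45, 350), (35, 500)]:
    root = insert(root, p)
```

In this small run the third insertion overflows the only block, which is at the top level, so the split uses age. The chosen value is 35, leaving (25, 400) on the left and (45, 350) and (35, 500) on the right.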

Quad Trees
• In a quad tree, each interior node corresponds to a square region in 2D space, or to a k-dimensional cube in k-dimensional space.
• If the number of points in a square is no larger than the number of records that fit in a block, we treat the square as a leaf, represented by the block that holds its points.
• If there are too many points to fit in one block, we treat the square as an interior node, with children corresponding to its four quadrants.
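The leaf-or-interior rule translates into a short recursive insertion routine. The sketch below is an illustration under assumptions: blocks hold two records, the root square spans 0 to 512 in both dimensions so that it covers the gold jewelry data, and a point on a dividing line goes to the upper or right quadrant. The class and helper names are made up.

```python
BLOCK_SIZE = 2   # records per leaf block (assumed)

class QuadNode:
    """A square with lower-left corner (x0, y0) and side length size."""
    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size
        self.points = []        # used while the node is a leaf
        self.children = None    # the four quadrants once the node is interior

    def _child_for(self, p):
        half = self.size / 2
        col = 1 if p[0] >= self.x0 + half else 0
        row = 1 if p[1] >= self.y0 + half else 0
        return self.children[2 * row + col]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Split only if the block overflows and the points are not all identical.
        if len(self.points) > BLOCK_SIZE and len(set(self.points)) > 1:
            half = self.size / 2
            self.children = [QuadNode(self.x0 + dx * half, self.y0 + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            pending, self.points = self.points, []
            for q in pending:
                self._child_for(q).insert(q)

def leaves(node):
    """All leaf squares below a node."""
    if node.children is None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

CUSTOMERS = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
             (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]
root = QuadNode(0, 0, 512)
for c in CUSTOMERS:
    root.insert(c)
```

Unlike a kd-tree, the split positions here are fixed by the geometry of the square rather than chosen from the data, so a cluster of nearby points can force several levels of splitting before the points separate.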

Quad Tree: Gold jewelry database

R-Trees
• An R-tree represents data that consists of regions in 2D or higher-dimensional space.
• An interior node of an R-tree corresponds to some interior region. The region can be of any shape, although in practice it is usually a rectangle.
• In place of keys, a node of an R-tree has subregions that represent the contents of its children.
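A where-am-I search on such a structure descends only into subregions that contain the query point. The sketch below shows the idea on a hand-built two-level tree; the node class, the region names, and all coordinates are hypothetical, and insertion and node splitting are not covered.

```python
class RNode:
    """R-tree node: a bounding rectangle plus child nodes or, at a leaf, data regions."""
    def __init__(self, rect, children=None, entries=None):
        self.rect = rect                  # (xll, yll, xur, yur)
        self.children = children or []    # interior node: subregions
        self.entries = entries or []      # leaf: (name, rect) data regions

def contains(rect, p):
    xll, yll, xur, yur = rect
    return xll <= p[0] <= xur and yll <= p[1] <= yur

def where_am_i(node, p):
    """Names of all data regions containing p."""
    if not contains(node.rect, p):
        return []                         # prune this whole subtree
    found = [name for name, rect in node.entries if contains(rect, p)]
    for child in node.children:           # subregions may overlap: check each one
        found.extend(where_am_i(child, p))
    return found

# Hypothetical map data.
left = RNode((0, 0, 60, 50), entries=[("road1", (0, 0, 60, 10)),
                                      ("house1", (20, 20, 40, 40))])
right = RNode((40, 30, 100, 90), entries=[("school", (50, 40, 80, 70)),
                                          ("pipeline", (40, 60, 100, 65))])
root = RNode((0, 0, 100, 90), children=[left, right])
```

Because the two subregions overlap, a point such as (55, 45) lies inside both of them, and the search has to explore both children even though only one of them holds a matching data region.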

June 16, 2025 26


R-tree: Insertion

R-tree after Insertion

Bitmap Indexes
• We assume that the records of a file have permanent numbers 1, 2, 3, …, n (i.e., the number of records is n).
• A bitmap index for a field F is a collection of bit-vectors of length n, one for each possible value that may appear in the field F.
• The bit-vector for value v has 1 in the ith position if the ith record has v in field F, and 0 otherwise.
• Example: Suppose a file with two fields (F, G) has 6 records, numbered 1 to 6, with the following values in order: (30, foo), (30, bar), (40, baz), (50, foo), (40, bar), (30, baz).

Bitmap indexes
• The bitmap index for the first field, F, has 3 entries, each 6 bits long:
F     Vector
30    110001
40    001010
50    000100
• The bitmap index for field G:
G     Vector
foo   100100
bar   010010
baz   001001
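The vectors for the six example records can be derived mechanically, and two vectors can be combined with a bitwise AND to answer a query on both fields. In this minimal sketch the bit-vectors are kept as strings for readability, which is an illustrative choice, and the function names are made up.

```python
RECORDS = [(30, "foo"), (30, "bar"), (40, "baz"),
           (50, "foo"), (40, "bar"), (30, "baz")]

def bitmap_index(records, field):
    """Map each value of the field to a bit-vector string, one bit per record."""
    index = {}
    for i, rec in enumerate(records):
        bits = index.setdefault(rec[field], ["0"] * len(records))
        bits[i] = "1"
    return {value: "".join(bits) for value, bits in index.items()}

def bit_and(a, b):
    """Bitwise AND of two equal-length bit-vector strings."""
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))

F_INDEX = bitmap_index(RECORDS, 0)
G_INDEX = bitmap_index(RECORDS, 1)

# Records with F = 30 AND G = baz.
match = bit_and(F_INDEX[30], G_INDEX["baz"])
```

The result, 000001, has its only 1 in the sixth position, so record 6 is the one with F = 30 and G = baz.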
Next class
• Unit 3: Query execution
