Advanced Database Management System CO-318 Advanced Datatypes and New Applications
CO-318
ADVANCED DATATYPES AND NEW APPLICATIONS
Motivation:
Spatial data support in databases is important for efficiently storing, indexing, and querying data on the basis of spatial location.
1. Raster data: Raster data can be three-dimensional; for example, temperature at different altitudes, or surface temperature at different points in time.
2. Vector data: Vector data are constructed from basic geometric objects, such as points, line segments,
polylines, triangles, and other polygons in two dimensions, and cylinders, spheres, cuboids, and other
polyhedrons in three dimensions. In the context of geographic data, points are usually represented by latitude
and longitude, and where the height is relevant, additionally by elevation.
Geographic databases have a variety of uses, including online map services; vehicle-navigation systems;
distribution-network information for public-service utilities such as telephone, electric-power, and water-
supply systems; and land usage information for ecologists and planners.
Vehicle-navigation systems are systems that are mounted in automobiles and provide road maps and trip-
planning services. They include a Global Positioning System (GPS) unit, which uses information broadcast
from GPS satellites to find the current location with an accuracy of tens of meters.
Web-based road map services form a very widely used application of map data. At the simplest level, these
systems can be used to generate online road maps of a desired region.
Several Web-based map services have defined APIs that allow programmers to create customized maps that
include data from the map service along with data from other sources. Such customized maps can be used to
display, for example, houses available for sale or rent, or shops and restaurants, in a particular area.
Map services such as Google Maps and Yahoo! Maps provide
APIs that allow users to create specialized map displays,
containing application specific data overlaid on top of standard
map data. Google Maps uses vector data to represent locations and city blocks. Web mapping (or online mapping) is the
process of using, creating, and distributing maps on the World Wide Web, usually through the use of Web
geographic information systems (Web GIS). A web map (or online map) is both served and consumed on the Web;
thus, web mapping is more than just web cartography: it is an interactive service
where consumers may choose what the map will show.
Example: websites may show a map of an area with information
about restaurants overlaid on the map.
GEOGRAPHIC INFORMATION SYSTEMS
Geographic information systems (GIS) are special-purpose databases that produce connected visualizations
of geospatial data—that is, data spatially referenced to Earth. Beyond creating visualizations, GIS is capable
of capturing, storing, analyzing, and managing geospatial data. With GIS, users can create interactive
queries, analyze spatial information, edit data, integrate maps, and present the results of these tasks. GIS
connect and overlay what are often considered disparate data sets to help people, businesses, and
governments better understand our world, identifying patterns and relationships that would otherwise go unnoticed.
Through GIS mapping and analysis, organizations can improve decision making and optimize
resource management, asset management, environmental impact assessment, marketing, supply-chain
management, and many other activities.
SPATIAL DATA: REPRESENTATION OF GEOMETRIC
INFORMATION
A line segment can be represented by the coordinates of its endpoints. For example, in a map database, the
two coordinates of a point would be its latitude and longitude.
A polyline (also called a line string) consists of a connected sequence of line segments and can be
represented by a list containing the coordinates of the endpoints of the segments, in sequence.
We can approximately represent an arbitrary curve by polylines, by partitioning the curve into a sequence
of segments.
We can represent a polygon by listing its vertices in order; the list of vertices specifies the boundary of a
polygonal region.
A polygon can be divided into a set of triangles; this process is called triangulation.
Circles and ellipses can be represented by corresponding types, or can be approximated by polygons.
Figure: representation of geometric information in a database.
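As an illustration of how such geometric objects can be stored in a relational database, the following is a minimal SQL sketch using the PostGIS extension for PostgreSQL; the roads and parcels tables, their columns, and the coordinates are illustrative assumptions, not part of the original material.

-- A table storing polylines (line strings), with coordinates given as longitude/latitude (WGS 84).
CREATE TABLE roads (
    road_id INTEGER PRIMARY KEY,
    name    VARCHAR(100),
    geom    GEOMETRY(LINESTRING, 4326)
);

-- Insert a polyline by listing the coordinates of its segment endpoints, in sequence.
INSERT INTO roads (road_id, name, geom)
VALUES (1, 'Ring Road',
        ST_GeomFromText('LINESTRING(77.10 28.70, 77.12 28.71, 77.15 28.73)', 4326));

-- A polygon is stored by listing its vertices in order; the first and last vertex coincide.
CREATE TABLE parcels (
    parcel_id INTEGER PRIMARY KEY,
    boundary  GEOMETRY(POLYGON, 4326)
);
INSERT INTO parcels
VALUES (1, ST_GeomFromText(
    'POLYGON((77.10 28.70, 77.11 28.70, 77.11 28.71, 77.10 28.71, 77.10 28.70))', 4326));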
SPATIAL DATA: SPATIAL QUERIES
Nearness Queries: Nearness queries request objects that lie near a specified location. A query to find all
restaurants that lie within a given distance of a given point is an example of a nearness query. The nearest-
neighbor query requests the object that is nearest to a specified point.
Region queries: deal with spatial regions. Such a query can ask for objects that lie partially or fully inside a
specified region.
Intersection and union queries: Queries may also request intersections and unions of regions. For
example, given region information, such as annual rainfall and population density, a query may request all
regions with a low annual rainfall as well as a high population density.
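To make these query types concrete, here is a hedged PostGIS sketch; the restaurants, rainfall_regions, and population_regions tables, their columns, and all constants are assumptions introduced for illustration.

-- Nearness query: restaurants within 500 metres of a given point.
SELECT name
FROM restaurants
WHERE ST_DWithin(location::geography,
                 ST_SetSRID(ST_MakePoint(77.12, 28.71), 4326)::geography,
                 500);

-- Nearest-neighbour query: the single restaurant closest to the point.
SELECT name
FROM restaurants
ORDER BY location <-> ST_SetSRID(ST_MakePoint(77.12, 28.71), 4326)
LIMIT 1;

-- Region query: objects lying inside a specified rectangular region.
SELECT name
FROM restaurants
WHERE ST_Within(location, ST_MakeEnvelope(77.0, 28.6, 77.3, 28.8, 4326));

-- Intersection query: regions with low annual rainfall and high population density.
SELECT r.region_id, ST_Intersection(r.geom, p.geom) AS overlap
FROM rainfall_regions r
JOIN population_regions p
  ON ST_Intersects(r.geom, p.geom)
WHERE r.annual_rainfall_mm < 500
  AND p.density_per_km2  > 1000;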
SPATIAL DATA: INDEXING OF SPATIAL DATA
Motivation:
Indices are required for efficient access to spatial data.
2. Quadtree: The top node is associated with the entire target space. Each non-leaf node in a quadtree
divides its region into four equal-sized quadrants, and correspondingly each such node has four child nodes
corresponding to the four quadrants. Leaf nodes have between zero and some fixed maximum number of
points; correspondingly, if the region corresponding to a node has more than the maximum number of
points, child nodes are created for that node.
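Quadtrees (and related structures such as R-trees) are normally built inside the database engine rather than coded by the application. As a hedged illustration of how a spatial index is requested in SQL, the sketch below creates a PostGIS GiST index, an R-tree-like structure used here in place of a quadtree, on the hypothetical restaurants table from the earlier example.

-- Create a spatial index so nearness and region queries need not scan every row.
CREATE INDEX idx_restaurants_location
    ON restaurants
    USING GIST (location);

-- The nearest-neighbour query from the previous section can now be answered
-- with an index-ordered scan rather than a full table scan.
SELECT name
FROM restaurants
ORDER BY location <-> ST_SetSRID(ST_MakePoint(77.12, 28.71), 4326)
LIMIT 1;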
TEMPORAL DATABASES – INTRODUCTION
Most traditional databases store only the current state of data. When updates occur, previous values are
overwritten and lost, unless special logging mechanisms are used.
However, many real-world applications need to track how data changes over time. For example:
• A hospital needs a patient's medical history.
• A factory tracks sensor data across shifts.
• An HR system maintains employee role changes.
To support such use cases, temporal databases store data along with associated time information, enabling
time-aware queries. This allows users to view past, current, or future states of the data, making temporal
databases ideal for systems where historical context matters.
PROCESS FLOW IN TEMPORAL DATABASES
Temporal databases manage and query data with respect to time. The core idea is that each
data item (or tuple) is associated with one or more time intervals. The key concepts in this
flow include:
•Time Dimensions:
•Valid Time: When the data is true in the real world.
•Transaction Time: When the data is stored in the database.
•Temporal Relations:
•Each tuple includes time intervals (from, to).
•If both valid and transaction times are stored, it’s called a bitemporal relation.
•Data Operations:
•Inserts/updates include time tagging.
•Deletions may adjust the time interval instead of removing the data.
•Querying with Time:
•Temporal queries retrieve data based on time.
•Examples: “What was valid on 2010-01-01?”, “Show salary changes over time”.
CODE EXAMPLE
Table Creation
Data Insertion
Data Retrieval
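Below is a minimal plain-SQL sketch of the three steps named above; the emp_salary table and its columns (valid_from, valid_to) are illustrative assumptions, not a prescribed schema.

-- Table creation: each fact carries a valid-time interval.
CREATE TABLE emp_salary (
    emp_id     INTEGER,
    salary     DECIMAL(10,2),
    valid_from DATE,
    valid_to   DATE            -- e.g. '9999-12-31' marks "currently valid"
);

-- Data insertion: a salary change closes the old interval and opens a new one.
INSERT INTO emp_salary VALUES (101, 40000.00, DATE '2008-01-01', DATE '2009-12-31');
INSERT INTO emp_salary VALUES (101, 45000.00, DATE '2010-01-01', DATE '9999-12-31');

-- Data retrieval: "What was valid on 2010-01-01?"
SELECT emp_id, salary
FROM emp_salary
WHERE DATE '2010-01-01' BETWEEN valid_from AND valid_to;

-- "Show salary changes over time" for one employee.
SELECT salary, valid_from, valid_to
FROM emp_salary
WHERE emp_id = 101
ORDER BY valid_from;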
ADVANTAGES
• Historical Data Tracking
Temporal databases preserve past data states, enabling analysis of historical trends and changes over time (e.g., salary
progression, medical history, sensor patterns).
• Improved Auditability
They maintain a full record of when data was inserted, modified, or deleted, making them ideal for audit trails and
compliance with regulations.
• Time-based Querying
Users can perform powerful queries like:
"What was the value on a specific date?" or
"Show all changes between 2010 and 2020",
which standard (non-temporal) databases do not support directly.
• Data Consistency Over Time
Helps maintain data accuracy by associating each fact with the correct time interval, minimizing ambiguity about when
data was valid.
• Support for Real-world Applications
Used in domains like finance, healthcare, insurance, and supply chain systems where the temporal dimension is critical
to operations and decision-making.
DISADVANTAGES
• Increased Complexity
Managing multiple time dimensions (valid and transaction time) adds complexity to schema design, query writing, and data
management.
• Higher Storage Requirements
Since changes aren’t overwritten but stored with new time intervals, temporal databases can consume significantly more
storage over time.
• Performance Overhead
Temporal queries (e.g., joins or filters over time intervals) can be slower due to the additional processing of time attributes
and multiple versions of the same data.
• Limited Tool Support
Not all database management systems support temporal features natively, and those that do may implement them differently
or with limitations.
• Steeper Learning Curve
Developers and analysts need to understand temporal logic and constructs (like bitemporal relations, time-based joins),
which can require additional training or experience.
REAL-WORLD USAGE
Healthcare
Used to maintain a complete and accurate timeline of patient records, including diagnosis, treatments,
prescriptions, and test results — enabling long-term patient history tracking and medical analysis.
Industrial & Manufacturing (Factories)
Supports tracking of machine performance and sensor readings over time, allowing historical analysis for
predictive maintenance, anomaly detection, and efficiency optimization.
Human Resource Systems
Captures and preserves historical data on employee roles, salary changes, promotions, and department
transfers — essential for workforce planning and regulatory compliance.
Finance & Legal
Enables creation of immutable audit trails by recording every change made to transactions or documents,
helping meet strict compliance and legal traceability requirements.
Web Applications
Stores time-based versions of user data (e.g., profile updates, preferences, settings), allowing rollback to
previous states and analyzing user behavior over time for personalization and insights.
TEMPORAL QUERY LANGUAGES – CONCEPTS &
OPERATIONS
Overview
•Temporal databases support time-aware queries and historical tracking.
•A snapshot relation (non-temporal) shows data at a single point in time.
Snapshot Operation
•Returns all tuples valid at a specific time t.
•Time interval attributes are excluded in the result.
•If t is not provided, the current system time is assumed.
Temporal Query Types
•Temporal Selection: Filters tuples based on time (e.g., during, overlaps).
•Temporal Projection: Projects attributes while retaining their valid time intervals.
•Temporal Join:
 •Combines tuples with overlapping time intervals.
 •The result's time is the intersection of the overlapping intervals.
 •If there is no overlap, the tuple pair is discarded.
Interval Predicates and Operations
•Predicates: precedes, overlaps, contains.
•Intersect: Returns the common time range (can be empty).
•Union: May result in a single interval or multiple intervals, depending on overlap.
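A hedged SQL sketch of a temporal join on the emp_salary table introduced earlier, plus an assumed emp_dept table: tuples are combined only where their valid-time intervals overlap, and the result carries the intersection of the two intervals (GREATEST/LEAST are common but not universal SQL functions).

-- Temporal join: salary and department histories combined over overlapping intervals.
SELECT s.emp_id,
       s.salary,
       d.dept_name,
       GREATEST(s.valid_from, d.valid_from) AS valid_from,  -- start of the intersection
       LEAST(s.valid_to,   d.valid_to)      AS valid_to     -- end of the intersection
FROM emp_salary s
JOIN emp_dept   d
  ON  s.emp_id = d.emp_id
  AND s.valid_from <= d.valid_to     -- the two intervals overlap
  AND d.valid_from <= s.valid_to
ORDER BY s.emp_id, GREATEST(s.valid_from, d.valid_from);

-- Snapshot operation: the relation as of a specific time t, interval columns dropped.
SELECT emp_id, salary
FROM emp_salary
WHERE DATE '2015-06-01' BETWEEN valid_from AND valid_to;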
SQL STANDARD 2011
Core Addition: Temporal Table Support
SQL:2011 introduced native support for temporal databases,
allowing time-based data management directly within SQL
without custom logic.
Temporal Table Types:
•System-Versioned Tables
Automatically track changes over time using system-managed columns (e.g., SysStartTime, SysEndTime), enabling queries on historical data as of a specific timestamp.
•Application-Time Period Tables
Use user-defined date/time columns, declared using PERIOD FOR, to represent when a row is valid in the real world, enabling queries on data valid during a specific business period.
•Bi-Temporal Tables
Combine system and application time for full temporal accuracy, ideal for auditing and regulatory compliance.
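A hedged sketch of what such statements can look like, using SQL Server-flavoured syntax for the system-versioned table and SQL:2011-style PERIOD FOR syntax for the application-time table; exact syntax varies by DBMS, and the employee and policy tables are assumed for illustration.

-- System-versioned table (SQL Server-flavoured syntax; other DBMSs differ).
CREATE TABLE employee (
    emp_id       INT PRIMARY KEY,
    dept         VARCHAR(50),
    salary       DECIMAL(10,2),
    SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
) WITH (SYSTEM_VERSIONING = ON);

-- Query historical data as of a specific timestamp.
SELECT emp_id, dept, salary
FROM employee
FOR SYSTEM_TIME AS OF '2020-01-01 00:00:00';

-- Application-time period table: validity defined by user-supplied columns.
CREATE TABLE policy (
    policy_id  INT,
    premium    DECIMAL(10,2),
    valid_from DATE NOT NULL,
    valid_to   DATE NOT NULL,
    PERIOD FOR policy_period (valid_from, valid_to)
);

-- Query data valid during a specific business period.
SELECT policy_id, premium
FROM policy
WHERE valid_from <= DATE '2021-12-31'
  AND valid_to   >= DATE '2021-01-01';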
MULTIMEDIA DATABASES -
INTRODUCTION
Multimedia data—images, audio, and video—is widely used in modern
applications. Initially stored in file systems, this approach worked for small
volumes but doesn’t scale well. File systems lack efficient indexing, advanced
querying, and can lead to inconsistencies like missing or mismatched files.
A common setup stores metadata in the database and media files externally. While
manageable, this limits direct content indexing and can cause data mismatches. A
more reliable solution is storing both metadata and media in the database, enabling
full integration, better consistency, and improved querying.
STORING MULTIMEDIA IN DATABASES – KEY
CHALLENGES
Storing multimedia data directly in a database presents several technical challenges that must be addressed to ensure
efficiency, reliability, and usability in real-world applications.
• Support for Large Objects:
Multimedia files like videos can be several gigabytes in size. Databases must support large objects or manage them
using external file pointers (e.g., via the SQL/MED standard); a minimal sketch of a large-object table follows this list.
• Continuous Media Handling:
Audio and video require steady-rate data delivery (isochronous data).
• Too slow → playback gaps
• Too fast → buffer overflow and data loss
• Similarity-Based Retrieval:
Essential for applications like image or fingerprint matching.
• Standard indexes (e.g., B+ trees, R-trees) are insufficient
• Requires specialized index structures for effective search
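Relating to the large-object point above, here is a minimal sketch of a metadata-plus-media table using the standard SQL BLOB type; the video_clip table and its columns are hypothetical.

-- Metadata and the media object itself are stored together in the database.
CREATE TABLE video_clip (
    clip_id    INTEGER PRIMARY KEY,
    title      VARCHAR(200),
    format     VARCHAR(20),      -- e.g. 'MPEG-4'
    duration_s INTEGER,          -- playback length in seconds
    content    BLOB              -- binary large object holding the encoded video
);

-- Retrieval by metadata; the BLOB itself is fetched only when actually needed.
SELECT clip_id, title, duration_s
FROM video_clip
WHERE format = 'MPEG-4'
  AND duration_s < 600;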
MULTIMEDIA FORMATS
Need for Compression
•Multimedia data (images, audio, video) requires large storage space.
•Compression ensures efficient storage and faster transmission.
Image Compression – JPEG
•JPEG (Joint Photographic Experts Group) is the standard format.
•Reduces file size by removing visual redundancies.
•Maintains acceptable image quality with smaller storage.
Video Compression – MPEG Standards
•MPEG (Moving Picture Experts Group) enables efficient video/audio compression.
•Leverages similarities between successive frames.
MPEG-1
•Compresses to ~12.5 MB per minute at 30 fps.
•Uses lossy compression; quality similar to VHS tapes.
MPEG-2
•Used in DVDs and digital TV broadcasts.
•Compresses to ~17 MB/minute with minimal loss in quality.
MPEG-4
•Supports variable bandwidth and high compression efficiency.
•Ideal for streaming content.
•Variants like MPEG-4 AVC and AVCHD support HD video.
Audio Compression Formats
•MP3 (MPEG-1 Layer 3) is the most widely used audio format.
•Other formats: RealAudio and Windows Media Audio (WMA).
•Each format uses unique compression techniques for efficient audio storage.
CONTINUOUS-MEDIA DATA – CONCEPTS
AND DELIVERY
Key Types
•Audio and video (e.g., movie databases).
•Requires real-time delivery for smooth playback.
Real-Time Requirements
•Data must arrive quickly to avoid playback gaps.
•Pacing is crucial to prevent buffer overflows.
•Media stream synchronization is essential (e.g., lip sync).
Data Fetching Mechanism
•Fetched in periodic cycles (e.g., every n seconds).
•Each cycle loads n seconds of data into memory buffers.
•Previously fetched data is streamed during the current cycle.
Cycle Period Trade-Offs
•Short cycle: low memory usage, but high disk activity (frequent seeks).
•Long cycle: fewer disk seeks, but higher memory needs and a longer initial delay.
Admission Control
•On a new request, the system checks whether sufficient resources are available.
 • If yes → the request is admitted.
 • If no → the request is rejected to ensure QoS for existing streams.
CONTINUOUS-MEDIA ARCHITECTURE – VIDEO-ON-
DEMAND SYSTEMS
Video Server:
Stores multimedia content (videos, audio) across multiple hard disks, often arranged in RAID configurations for
redundancy and performance. For rarely accessed content, tertiary storage systems like optical disks or tapes
may be used.
Terminals:
Playback is handled through end-user devices such as personal computers, smart TVs, or set-top boxes. These
terminals decode and render the streamed media for user consumption.
Network:
A high-bandwidth, reliable network is essential for transmitting multimedia content from the server to numerous
terminals simultaneously. This ensures seamless, uninterrupted playback.
System Characteristics:
Most VoD platforms rely on traditional file systems rather than database systems, since conventional database
systems often lack the real-time capabilities needed for continuous media delivery. These systems are optimized
to deliver content predictably and without delay.
Deployment:
Widely adopted in cable and internet-based streaming services, VoD systems power modern platforms offering
on-demand access to movies, TV shows, and educational content.
SIMILARITY-BASED RETRIEVAL
•Approximate Descriptions:
Multimedia data (e.g., fingerprints, images, audio) is often stored with approximate representations, not exact
matches.
•Examples:
•Pictorial Data:
Used in applications like trademark databases, where visually similar designs must be retrieved.
•Audio Data:
In speech-based interfaces, spoken input is matched to stored commands based on similarity.
•Handwritten Data:
Handwritten input is compared to stored samples to identify matches.
•Subjective Nature of Similarity:
Similarity can vary between users, but matching is often easier than full speech or handwriting recognition since
comparisons are limited to known data.
•Techniques & Applications:
•Specialized algorithms are used to find best matches.
•Widely used in voice-activated systems for phones, smart assistants, and in-vehicle controls.
Figure: a conceptual architecture for similarity-based multimedia information retrieval.
MOBILE DATABASES – KEY TECHNICAL ISSUES
• Adaptive Query Processing: Dynamic decision-making based on real-time network conditions and cache state, ensuring
optimal routing of queries either locally or to a central/edge server.
• Data Synchronization and Conflict Resolution: Techniques such as optimistic replication, version vectors, and merge
algorithms are critical for maintaining data consistency across distributed environments.
• Replication and Mobile Transaction Management: Lightweight replication schemes and transaction management protocols
are adapted to the mobile context to support high availability and reliability.
• Security and Privacy: Encryption (both in-transit and at-rest), robust authentication, and access control mechanisms are
integrated to protect sensitive information despite the exposed nature of wireless communications.
Mobile Database Architecture
Mobile database systems typically comprise several layers that work in unison to address the challenges posed by mobile
environments. The architecture can be broadly divided into the following components:
Mobile Hosts (MH)
• Definition: End-user devices (e.g., smartphones, tablets, laptops) that contain an embedded database engine.
• Technical Characteristics:
• Local Storage: Uses lightweight databases like SQLite or Couchbase Lite to store a subset of the central
data.
• Processing Power: Optimized for low-power operation; query processing and transaction management are kept lightweight to fit device constraints.
• Role: Provides immediate, offline access to data and serves as the primary point for local transaction logging.
Figure: decentralized mobile database architecture schema.
Mobile Support Stations (MSS)
• Definition: Fixed nodes such as cellular base stations or Wi-Fi access points that facilitate wireless communication between
mobile hosts and higher-tier servers.
• Technical Characteristics:
• Role: Acts as the first point of aggregation and relay for data, ensuring that mobile queries and transactions are routed
efficiently.
Edge Servers
• Definition: Servers deployed in close geographical proximity to mobile hosts, often at the network edge (e.g., MEC nodes or
cloudlets).
• Technical Characteristics:
• Connectivity Management: Handles handoffs, ensuring that mobile hosts maintain continuous
connectivity as they move between cells.
• Bandwidth Optimization: Implements protocols to efficiently manage limited wireless bandwidth and
reduce transmission overhead
• Role: Provides an intermediate processing layer that reduces round-trip times, supports adaptive query processing, and enhances
user experience through near-real-time responses.
Central Server
• Definition: The authoritative data repository that maintains the master copy of the database.
• Technical Characteristics:
• Robust Query Processing: Equipped with high-end processing capabilities to execute complex queries
and manage large-scale transactions.
• Data Integrity and Consistency: Implements ACID-compliant transaction management, backup, and
recovery protocols.
• Synchronization Hub: Coordinates data replication and conflict resolution across mobile hosts and edge
servers.
• Role: Acts as the backbone of the system, ensuring overall data consistency, security, and providing centralized administrative
control.
Local Caching and Data Broadcasting
Local Caching:
• Mechanism: The central server periodically broadcasts updates (or hot data) to all mobile hosts.
• Benefits: Reduces the number of individual requests and optimizes bandwidth usage by pushing common
updates.
Online Operation and Data Synchronization
Synchronization Protocols:
Mechanism: Once connectivity is restored, a synchronization engine reconciles local data with the central server.
Key Techniques:
• Optimistic Replication: Assumes conflicts are rare and resolves them post hoc using version vectors or timestamps.
• Conflict Resolution Strategies: Predefined rules or machine learning models are used to automatically merge conflicting
updates.
• Batch Processing: Aggregates multiple offline transactions to reduce synchronization overhead.
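A hedged sketch of optimistic, version-based reconciliation in SQL; the accounts table, its version column, and the constants are assumptions for illustration, and real synchronization engines generalize this pattern.

-- Each row carries a version number that is incremented on every committed change.
-- During synchronization, an offline update is applied only if the row has not
-- changed on the server since the mobile host last read it.
UPDATE accounts
SET    balance = 950.00,
       version = version + 1
WHERE  account_id = 42
  AND  version    = 7;   -- version observed when the mobile host went offline

-- If no row was updated (version mismatch), the change is flagged as a conflict
-- and resolved by a predefined merge rule or manual review.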
Handoff and Dynamic Routing
Handoff Mechanisms:
Definition: Procedures that ensure a mobile host’s active session is seamlessly transferred from one MSS to another as it moves
geographically.
Technical Requirements:
• Fast handoff protocols to minimize connection interruptions.
• Synchronization of session data between adjacent MSSs.
Dynamic Routing:
Definition: Algorithms that determine the optimal path for data packets from mobile hosts to edge or central servers.
Key Considerations:
• Real-time network topology awareness.
• Adaptability to fluctuating network conditions and load variations.
• Prioritization of latency-sensitive data.
Sample Queries and Code Examples with Theoretical Explanations
Additional Query Examples
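As an illustration of the cache-aware lookup pattern discussed in this section, here is a minimal SQLite-style sketch; the product_cache table, its columns, and the 10-minute freshness window are assumptions introduced for the example.

-- Local cache table on the mobile host.
CREATE TABLE product_cache (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    price      REAL,
    cached_at  TEXT               -- ISO-8601 timestamp of when the row was cached
);

-- Adaptive lookup: serve from the local cache only if the entry is fresh enough
-- (here, cached within the last 10 minutes); otherwise the application falls
-- back to querying the central or edge server.
SELECT product_id, name, price
FROM product_cache
WHERE product_id = 1001
  AND cached_at >= datetime('now', '-10 minutes');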
Advantages and Disadvantages of Adaptive Query Processing and Caching Strategies
Advantages
• Improved Query Efficiency: AQP continuously monitors system and network conditions to select the most efficient
execution plan in real time, reducing latency and increasing throughput.
• Context-Aware Execution: By considering real-time parameters like bandwidth, battery life, and data freshness, AQP
ensures more intelligent and context-sensitive query decisions.
• Reduced Network Load: Caching significantly reduces redundant requests to the central server, minimizing network usage,
which is particularly beneficial in bandwidth-constrained or intermittently connected environments.
• Energy Efficiency: Minimizing network communication and offloading computationally expensive operations to local caches
reduces energy consumption on mobile devices.
• Resilience to Network Fluctuations: AQP allows graceful degradation of query accuracy or performance in the event of poor
connectivity, allowing continued functionality in degraded scenarios.
• Semantic and Intelligent Caching: Semantic caching leverages context and data relevance rather than just access frequency,
improving cache hit rates for complex, real-world queries.
• Dynamic Adaptability: Adaptive strategies can self-tune over time based on historical data and usage patterns, requiring less
manual tuning and making the system more robust in varied environments.
Disadvantages
• Increased System Complexity: The runtime optimization logic in AQP and intelligent caching mechanisms introduce
algorithmic and architectural complexity, increasing the burden on developers and maintainers.
• Overhead from Monitoring and Replanning: Continuously monitoring runtime parameters (like network condition and
system load) and re-optimizing queries introduces additional computational overhead.
• Latency from Decision-Making Logic: In certain cases, the decision-making layer (e.g., whether to fetch from cache or
remote server) adds a small delay that could impact real-time applications.
• Staleness of Cached Data: Despite consistency mechanisms like TTL and versioning, there remains a risk of stale or
outdated data being served from cache, especially in rapidly changing datasets.
• Complex Cache Management: Cache replacement policies that factor in semantic or contextual data require additional
metadata tracking and may introduce computational overhead.
• Unpredictable Performance in Edge Cases: Under certain conditions (e.g., highly volatile networks or workload spikes),
the adaptive logic may make suboptimal decisions, causing inconsistent performance.
• Storage Constraints on Mobile Devices: Local caching consumes device storage, which may be limited, especially in
resource-constrained or legacy mobile devices.
Case Study: Mobile Financial Services
Theoretical Background
Mobile financial services require that critical transaction data be captured reliably on mobile devices and later reconciled with
central banking systems. The theoretical foundation for these systems is built upon:
• Distributed Transaction Management: Mobile environments often adopt an optimistic replication model to allow
transactions to execute locally and later merge with the central system. This approach is grounded in eventual consistency
theory, which guarantees that, despite temporary divergence, the system converges to a consistent state once connectivity is
restored.
• Conflict Detection and Resolution: By leveraging concepts from concurrent systems, mobile financial services employ
conflict detection algorithms using timestamp comparisons or version vectors. These techniques, derived from optimistic
concurrency control, allow the system to identify and merge conflicting transactions effectively.
• Adaptive Query Processing: Adaptive strategies dynamically determine whether to process queries on the mobile host or
offload them to edge or central servers. This decision is based on real-time network conditions, device battery status, and local
cache freshness, following cost-based optimization models.
Query Examples and Explanations
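A hedged sketch of the pattern described above, local transaction capture followed by timestamp-based reconciliation; the txn_log table, its columns, and the values are assumed for illustration.

-- On the mobile host: capture transactions locally while offline.
CREATE TABLE txn_log (
    txn_id     INTEGER PRIMARY KEY,
    account_id INTEGER,
    amount     DECIMAL(10,2),
    txn_time   TIMESTAMP,         -- when the transaction occurred (valid time)
    synced     CHAR(1) DEFAULT 'N'
);

INSERT INTO txn_log (txn_id, account_id, amount, txn_time)
VALUES (501, 42, -250.00, TIMESTAMP '2024-03-01 10:15:00');

-- On reconnection: push all unsynchronized transactions, oldest first, so the
-- central system can detect conflicts via timestamp comparison.
SELECT txn_id, account_id, amount, txn_time
FROM txn_log
WHERE synced = 'N'
ORDER BY txn_time;

-- Mark the pushed transactions as synchronized once the central server acknowledges them.
UPDATE txn_log SET synced = 'Y' WHERE synced = 'N';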
Case Study: Location-Based Services (LBS)
Theoretical Background
LBS applications must rapidly process spatial data to provide users with context-aware, location-specific recommendations. The
core theories involved include:
• Spatial Data Indexing and Geometry: The use of spatial databases and R-tree indexing, based on computational geometry
principles, allows efficient retrieval of geospatial data. Functions such as ST_Distance and ST_DWithin are critical for
calculating distances and filtering data within a geographic radius.
• Adaptive Query Processing in Spatial Contexts: Adaptive query strategies in LBS evaluate the trade-off between local
execution and offloading to edge servers. These strategies rely on cost-based optimization, taking into account dynamic
network conditions, data freshness, and processing power constraints.
Query Examples and Explanations
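A hedged PostGIS sketch using the ST_DWithin and ST_Distance functions mentioned above; the places table, its columns, and the coordinates are assumptions introduced for the example.

-- Find restaurants within 2 km of the user's current position, nearest first.
SELECT p.name,
       ST_Distance(p.location::geography,
                   ST_SetSRID(ST_MakePoint(77.2090, 28.6139), 4326)::geography) AS metres
FROM places p
WHERE p.category = 'restaurant'
  AND ST_DWithin(p.location::geography,
                 ST_SetSRID(ST_MakePoint(77.2090, 28.6139), 4326)::geography,
                 2000)
ORDER BY metres;

-- An R-tree-backed (GiST) index on the location column keeps this filter efficient.
CREATE INDEX idx_places_location ON places USING GIST (location);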
Thank You