Adbms Notes


ADBMS

Module 1

Advanced Database Management Systems:

Basic Concepts of Query Processing:

• Definition:
• Query processing involves translating high-level queries (like SQL)
into a series of low-level operations that the database management system
(DBMS) can execute to retrieve the requested data.
• Steps in Query Processing:
• Parsing: The query is parsed to check for syntax errors and to create
a parse tree.
• Translation: The parse tree is translated into a relational algebra
expression.
• Optimization: The relational algebra expression is optimized to
produce an efficient query execution plan.
• Execution: The optimized query plan is executed to retrieve the data.
• Key Components:
• Query Parser: Converts SQL queries into a format that the query
optimizer can process.
• Query Optimizer: Determines the most efficient way to execute a
query by evaluating various execution plans.
• Query Executor: Executes the optimized query plan and retrieves the
results.

Converting SQL Queries into Relational Algebra:

• Relational Algebra Basics:
• A procedural query language that operates on relations (tables) and
produces a new relation as a result.
• Conversion Process:
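• Select-From-Where Clause:
• SELECT translates to projection (π).
• FROM names the base relations; several relations are combined with a
Cartesian product (×) or a join (⋈).
• WHERE translates to selection (σ).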
• Example Conversion:
• SQL: SELECT name FROM students WHERE age > 20;
• Relational Algebra: π_name(σ_age > 20(students))
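The conversion above can be sketched in Python by treating a relation as a list of dictionaries. This is only an illustration: the sample rows and the helper names select and project are assumptions, not part of any DBMS API.

```python
# Hypothetical sample data; the notes only name the attributes name and age.
students = [
    {"name": "Asha", "age": 22},
    {"name": "Ravi", "age": 19},
    {"name": "Meera", "age": 25},
]

def select(relation, predicate):
    """σ: keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """π: keep only the named attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        projected = tuple((a, row[a]) for a in attributes)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(projected))
    return result

# π_name(σ_age > 20(students))
result = project(select(students, lambda r: r["age"] > 20), ["name"])
print(result)
```

Note that relational-algebra projection removes duplicates, whereas SQL's SELECT keeps them unless DISTINCT is specified.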

Basic Algorithms for Executing Query Operations:

• Selection Algorithms:
• Linear Search: Scans each tuple in the relation.
• Binary Search: Requires sorted data; divides the search space in half
iteratively.
• Index-Based Search: Uses indices to directly access tuples satisfying
the selection condition.
• Join Algorithms:
• Nested-Loop Join: For each tuple in relation R, scans all tuples in
relation S.
• Sort-Merge Join: Sorts both relations on the join attribute and then
merges them.
• Hash Join: Partitions the relations using a hash function and joins
matching partitions.
• Projection Algorithms:
• Simple Projection: Eliminates unwanted columns and may remove
duplicates.
• Projection with Sorting: Sorts the result to remove duplicates
efficiently.
• Projection with Hashing: Uses a hash table to remove duplicates.
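As a rough illustration of the join algorithms listed above, the sketch below compares a nested-loop join with a simple in-memory hash join (build and probe, without the partitioning step a disk-based hash join would use). The relations R and S and their contents are hypothetical.

```python
from collections import defaultdict

# Hypothetical relations R(student_id, name) and S(student_id, course).
R = [(1, "Asha"), (2, "Ravi"), (3, "Meera")]
S = [(1, "DBMS"), (3, "Networks"), (3, "DBMS"), (4, "OS")]

def nested_loop_join(r, s):
    """For each tuple in r, scan every tuple in s: |r| * |s| comparisons."""
    return [(rid, name, course)
            for rid, name in r
            for sid, course in s
            if rid == sid]

def hash_join(r, s):
    """Build a hash table on r's join key, then probe it once per tuple of s."""
    buckets = defaultdict(list)
    for rid, name in r:              # build phase
        buckets[rid].append(name)
    out = []
    for sid, course in s:            # probe phase
        for name in buckets.get(sid, []):
            out.append((sid, name, course))
    return out

print(sorted(nested_loop_join(R, S)) == sorted(hash_join(R, S)))
```

Both functions return the same set of tuples; the hash join reads each input once instead of rescanning S for every tuple of R.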

Query Tree and Query Graph:

• Query Tree:
• A tree representation of a relational algebra expression where:
• Internal nodes represent relational algebra operations.
• Leaf nodes represent base relations (tables).
• Example:
• SQL: SELECT name FROM students WHERE age > 20;
• Query Tree:

π_name
|
σ_age > 20
|
students

• Query Graph:
• A graph representation of a query that shows the relations involved and
the conditions linking them, without fixing an order of operations.
• Nodes represent relations (and constants), and edges represent the
selection and join conditions between them.

Heuristic Optimization of Query Tree:

• Heuristics:
• Push Selections Down: Apply selection operations as early as
possible to reduce the size of intermediate results.
• Push Projections Down: Apply projection operations early to reduce
the number of columns in intermediate results.
• Join Ordering: Choose the most efficient order for join operations to
minimize the size of intermediate results.
• Combining Operations: Combine adjacent operations when possible
to simplify the query tree.
• Benefits:
• Reduces the cost of query execution by minimizing the amount of
data processed at each step.
• Simplifies query plans, making them easier to understand and
execute.
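The effect of pushing selections down can be seen with a small experiment. The sketch below uses hypothetical relations students(id, name, age) and enrolments(student_id, course) and compares the size of the data fed through the join in two equivalent plans.

```python
# Hypothetical data: 100 students with ages 18..27, 50 enrolment rows.
students = [(i, f"s{i}", 18 + i % 10) for i in range(100)]
enrolments = [(i, "DBMS") for i in range(0, 100, 2)]

def join(left, right):
    """Equi-join on students.id = enrolments.student_id."""
    return [(sid, name, age, course)
            for sid, name, age in left
            for eid, course in right
            if sid == eid]

# Plan A: join first, then apply the selection age > 25.
joined = join(students, enrolments)
plan_a = [row for row in joined if row[2] > 25]

# Plan B: push the selection below the join.
older = [s for s in students if s[2] > 25]
plan_b = join(older, enrolments)

print(len(students), len(older), plan_a == plan_b)
```

Both plans produce the same answer, but in Plan B the join sees 20 student tuples instead of 100.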

Functional Dependencies:

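• Definition:
• A functional dependency (FD) X → Y is a constraint between two sets of
attributes in a relation: any two tuples that agree on X must also agree
on Y.
• Example:
• In a relation such as students(student_id, name, age), student_id → name
holds if each student_id is associated with exactly one name.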
• Use in Normalization:
• Functional dependencies are used to identify and eliminate
redundancy in database design through normalization.
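A functional dependency can be tested against a concrete set of rows. The following is a minimal sketch; the function name fd_holds and the sample rows are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this key fixes the value; later rows must match.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical relation instance.
rows = [
    {"student_id": 1, "name": "Asha", "course": "DBMS"},
    {"student_id": 1, "name": "Asha", "course": "Networks"},
    {"student_id": 2, "name": "Ravi", "course": "DBMS"},
]

print(fd_holds(rows, ["student_id"], ["name"]))
print(fd_holds(rows, ["student_id"], ["course"]))
```

A check like this can only refute a dependency: an FD is a constraint on every legal state of the relation, so a single instance that happens to satisfy it does not prove it.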

Normal Forms:

• Purpose:
• Normal forms are guidelines to reduce redundancy and avoid
anomalies in database design.
• First Normal Form (1NF):
• Ensures each column contains atomic (indivisible) values and each
row is unique.
• Second Normal Form (2NF):
• Achieved when a relation is in 1NF and every non-key attribute is fully
functionally dependent on the whole primary key (no partial dependencies).
• Third Normal Form (3NF):
• Achieved when a relation is in 2NF and no non-key attribute is
transitively dependent on the primary key.
• Boyce-Codd Normal Form (BCNF):
• Achieved when, for every non-trivial functional dependency X → Y, X is a
superkey.
• Higher Normal Forms (4NF, 5NF):
• Deal with more complex types of dependencies like multi-valued and
join dependencies.
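Checking a normal form usually starts from the attribute closure. The sketch below computes a closure and uses it to list functional dependencies whose left side is not a superkey, which is the standard BCNF test. The schema and dependencies are a hypothetical textbook-style example, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the non-trivial FDs in fds whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(schema)]

# Hypothetical schema: each instructor teaches exactly one course.
schema = {"student_id", "course", "instructor"}
fds = [
    ({"student_id", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]

print(bcnf_violations(schema, fds))
```

Here instructor → course is reported because instructor alone does not determine student_id, so the relation is in 3NF but not in BCNF.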

Module 2

Overview of Object-Oriented Database (OODB):

• Definition:
• An Object-Oriented Database (OODB) stores data in the form of
objects, similar to how object-oriented programming languages like Java and C++
manage data.
• Key Characteristics:
• Persistence: Objects persist beyond program execution, stored in the
database.
• Complex Data Types: Supports storage of complex data types such
as multimedia, spatial data, and more.
• Inheritance: Objects can inherit properties and methods from other
objects.
• Encapsulation: Data and behavior are encapsulated within objects.
• Polymorphism: Objects can be treated as instances of their parent
class.
• Benefits:
• Seamless integration with object-oriented programming.
• Efficient handling of complex data types.
• Enhanced data modeling capabilities.

OO Concepts (Object-Oriented Concepts):

• Objects:
• Instances of classes containing both data (attributes) and behavior
(methods).
• Classes:
• Blueprints for creating objects, defining attributes and methods.
• Inheritance:
• Mechanism by which a class (subclass) inherits attributes and
methods from another class (superclass).
• Encapsulation:
• Bundling data and methods that operate on the data within one unit,
restricting direct access to some of the object’s components.
• Polymorphism:
• Ability to process objects differently based on their data type or class.
• Abstraction:
• Hiding complex implementation details and showing only the essential
features of the object.
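The concepts above can be illustrated with two small Python classes. The class names and attributes are hypothetical; Python enforces encapsulation only by convention, so the leading underscore marks attributes as internal.

```python
class Person:
    def __init__(self, name, age):
        self._name = name          # internal state (encapsulation by convention)
        self._age = age

    @property
    def name(self):
        """Read-only access to the name."""
        return self._name

    def describe(self):
        return f"{self._name} ({self._age})"

class Student(Person):             # inheritance: Student is a subclass of Person
    def __init__(self, name, age, course):
        super().__init__(name, age)
        self._course = course

    def describe(self):            # overriding gives polymorphic behaviour
        return f"{super().describe()}, enrolled in {self._course}"

people = [Person("Ravi", 41), Student("Asha", 22, "ADBMS")]
descriptions = [p.describe() for p in people]
print(descriptions)
```

The same call, describe(), runs different code depending on the object's class, and a Student can be used wherever a Person is expected.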

Architecture of ORDBMS and OODBMS:

• Object-Relational Database Management System (ORDBMS):
• Integration: Combines features of relational databases with object-
oriented databases.
• Extensibility: Supports custom data types, methods, and inheritance.
• Query Language: Extends SQL to support object-oriented features.
• Object-Oriented Database Management System (OODBMS):
• Pure Object Model: Uses a pure object model without relational
features.
• Data Management: Manages data as objects, with support for
complex data types and relationships.
• Query Language: Often uses Object Query Language (OQL) for
querying objects.

OOD Modeling (Object-Oriented Database Modeling):

• Concepts:
• Classes and Objects: Central elements of OOD modeling.
• Relationships: Defines how objects interact (e.g., associations,
inheritance).
• Methods: Functions or procedures defined within a class.
• Modeling Techniques:
• Class Diagrams: Visual representation of classes, their attributes,
methods, and relationships.
• Use Case Diagrams: Depict how users interact with the system.
• Sequence Diagrams: Show object interactions over time.

ORD Modeling (Object-Relational Database Modeling):

• Concepts:
• Tables and Objects: Integrates relational tables with object-oriented
concepts.
• Inheritance: Supports class hierarchies within the database schema.
• Complex Data Types: Allows the use of user-defined types and
structures.
• Modeling Techniques:
• ER Diagrams (Enhanced with Objects): Extends traditional ER
diagrams to include object-oriented elements.
• Relational Tables with Object Columns: Incorporates columns that
can store objects or references to objects.

Specialization, Generalization, Aggregation, and Associations:

• Specialization:
• Creating sub-classes from a parent class, adding specific attributes or
methods.
• Generalization:
• Creating a parent class from multiple sub-classes, abstracting
common features.
• Aggregation:
• A “whole-part” relationship where a class is composed of one or more
classes.
• Associations:
• Defines relationships between classes, such as one-to-one, one-to-
many, and many-to-many.

Object Query Language (OQL):

• Definition:
• A query language used for querying object-oriented databases, similar
to SQL but tailored for object retrieval.
• Key Features:
• Object Identification: Retrieves objects based on their identity.
• Complex Queries: Supports complex queries involving nested objects
and collections.
• Inheritance: Handles queries involving class hierarchies and
polymorphism.
• Example Syntax:
• SELECT p.name FROM Person p WHERE p.age > 30;

Object Relational Concepts:

• User-Defined Types (UDTs):
• Custom data types defined by users to handle complex data
structures.
• Inheritance:
• Allows tables to inherit properties and methods from other tables,
creating a class hierarchy within the database.
• Methods and Functions:
• Enables defining functions and procedures within the database that
operate on custom types.
• Nested Tables and Arrays:
• Supports the storage of arrays and nested tables as column data
types.
• Complex Relationships:
• Manages complex relationships between objects, such as
aggregations and compositions, within the relational framework.

Module 3

Introduction - Parallel and Distributed Database:

• Parallel Database Systems:
• Definition: Systems that use multiple processors and storage devices
to execute database operations concurrently.
• Purpose: Improve performance and handle large-scale data
processing tasks by parallelizing tasks.
• Key Features: Data partitioning, parallel query processing, load
balancing, and fault tolerance.
• Distributed Database Systems:
• Definition: Systems where the database is distributed across multiple
locations, interconnected by a network.
• Purpose: Enhance data availability, reliability, and performance by
distributing data closer to where it is needed.
• Key Features: Data fragmentation, replication, distributed query
processing, and transaction management.

Design of Parallel Databases:

• Data Partitioning:
• Horizontal Partitioning: Divides a table by rows, distributing different
rows to different nodes.
• Vertical Partitioning: Divides a table by columns, distributing different
columns to different nodes.
• Hybrid Partitioning: Combination of horizontal and vertical partitioning.
• Parallel Query Processing:
• Inter-Query Parallelism: Executes multiple queries concurrently.
• Intra-Query Parallelism: Executes a single query using multiple
processors concurrently.
• Load Balancing:
• Ensures even distribution of workload across all processors to prevent
any single node from becoming a bottleneck.
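The two basic partitioning schemes can be sketched as follows. This is a simplified illustration: the function names are hypothetical, integer keys are assumed, and the nodes are simulated as in-memory lists.

```python
def hash_partition(rows, key, n_nodes):
    """Horizontal partitioning: assign each row to a node by its key value."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key] % n_nodes].append(row)   # integer keys assumed
    return nodes

def vertical_partition(rows, key, column_groups):
    """Vertical partitioning: each fragment keeps the key plus one column group."""
    return [[{c: row[c] for c in [key, *group]} for row in rows]
            for group in column_groups]

# Hypothetical table with six rows.
rows = [{"id": i, "name": f"s{i}", "age": 20 + i} for i in range(6)]

by_row = hash_partition(rows, "id", 3)
by_column = vertical_partition(rows, "id", [["name"], ["age"]])
print([len(node) for node in by_row])
print(by_column[0][0], by_column[1][0])
```

Repeating the key in every vertical fragment is what allows the original rows to be rebuilt later by joining the fragments.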

Parallel Query Evaluation:

• Pipeline Parallelism:
• Executes different stages of a query simultaneously on different
processors.
• Partitioned Parallelism:
• Splits data into partitions, processing each partition in parallel.
• Parallel Join Algorithms:
• Parallel Nested-Loop Join: Distributes the outer loop across multiple
processors.
• Parallel Sort-Merge Join: Distributes sorting and merging operations
across multiple processors.
• Parallel Hash Join: Distributes hashing and joining operations across
multiple processors.
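Partitioned parallelism can be sketched with Python's standard concurrent.futures module: each worker evaluates the same operation on its own partition, and the partial results are combined at the end. The data and the function local_count are hypothetical, and threads are used only to show the structure (in CPython they do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

def local_count(partition):
    """Evaluate COUNT(*) WHERE age > 20 on one partition."""
    return sum(1 for age in partition if age > 20)

# Hypothetical column of ages, split round-robin into four partitions.
ages = list(range(15, 35))
partitions = [ages[i::4] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(local_count, partitions))

total = sum(partial_counts)     # combine step
print(partial_counts, total)
```

The same split, evaluate, and combine pattern underlies the parallel join algorithms, where both relations are partitioned on the join attribute so that matching tuples land on the same processor.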

Distributed Databases Principles:

• Data Fragmentation:
• Horizontal Fragmentation: Divides a relation into subsets of tuples.
• Vertical Fragmentation: Divides a relation into subsets of columns.
• Mixed Fragmentation: Combination of horizontal and vertical
fragmentation.
• Data Replication:
• Full Replication: Copies the entire database to multiple sites.
• Partial Replication: Copies parts of the database to multiple sites.
• Data Allocation:
• Centralized: All data is stored in a single location.
• Decentralized: Data is stored across multiple locations.

Architectures:

• Client-Server Architecture:
• Clients request services from a centralized server.
• Peer-to-Peer Architecture:
• Each node in the network can act as both a client and a server.
• Federated Architecture:
• Independent databases are integrated under a common interface.
• Multi-Database System:
• Multiple autonomous databases that can cooperate without full
integration.

Design:

• Global Schema Design:
• Provides a unified view of the distributed data.
• Fragmentation Schema Design:
• Defines how data is fragmented and allocated.
• Replication Schema Design:
• Specifies which data fragments are replicated and where.

Implementation:

• Communication Infrastructure:
• Ensures reliable and efficient data transfer between distributed sites.
• Middleware:
• Facilitates communication and data exchange between
heterogeneous databases.
• Distributed DBMS Software:
• Manages distributed data, ensuring consistency and coordination.

Fragmentation:

• Purpose:
• Improve query performance and ensure data is stored close to where
it is needed.
• Types:
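• Primary Fragmentation: Fragments a relation using predicates on its own
attributes.
• Derived Fragmentation: Fragments a relation according to the
fragmentation of a related relation, typically one it references through
a foreign key.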
• Reconstruction:
• Ensuring that fragmented data can be recombined to recreate the
original data set.
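Reconstruction can be illustrated for both fragmentation styles: horizontal fragments are recombined with a union, and vertical fragments with a join on the shared key. The relation and its contents below are hypothetical.

```python
# Hypothetical relation customer(id, name, city).
relation = [
    {"id": 1, "name": "Asha", "city": "Mumbai"},
    {"id": 2, "name": "Ravi", "city": "Pune"},
    {"id": 3, "name": "Meera", "city": "Mumbai"},
]

# Horizontal fragmentation by predicate; reconstruction is a union.
mumbai = [r for r in relation if r["city"] == "Mumbai"]
others = [r for r in relation if r["city"] != "Mumbai"]
rebuilt_h = sorted(mumbai + others, key=lambda r: r["id"])

# Vertical fragmentation keeps the key in every fragment;
# reconstruction is a join on that key.
names = [{"id": r["id"], "name": r["name"]} for r in relation]
cities = [{"id": r["id"], "city": r["city"]} for r in relation]
city_by_id = {r["id"]: r["city"] for r in cities}
rebuilt_v = [{**n, "city": city_by_id[n["id"]]} for n in names]

print(rebuilt_h == relation, rebuilt_v == relation)
```

The horizontal predicates here are complete and disjoint, so every tuple lands in exactly one fragment and the union loses nothing.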

Transparencies in Distributed Databases:

• Location Transparency:
• Users do not need to know the physical location of data.
• Fragmentation Transparency:
• Users are unaware of how data is fragmented.
• Replication Transparency:
• Users do not need to know about data replication.
• Failure Transparency:
• Ensures database operations are unaffected by site failures.

Transaction Control in Distributed Database:

• Two-Phase Commit Protocol (2PC):
• Ensures all nodes in a distributed system either commit or abort a
transaction.
• Phase 1: Prepare Phase: Coordinator sends prepare message, and
nodes reply with a vote.
• Phase 2: Commit Phase: If all votes are yes, coordinator sends
commit message; otherwise, sends abort message.
• Three-Phase Commit Protocol (3PC):
• Similar to 2PC but adds a third phase to reduce chances of blocking.
• Concurrency Control:
• Distributed Locking: Manages locks on distributed data items.
• Timestamp Ordering: Uses timestamps to order transactions and
avoid conflicts.
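The message flow of the two-phase commit protocol described above can be simulated in a few lines. This is a deliberately simplified sketch: the Participant class is hypothetical, and real implementations also need logging, timeouts, and recovery, which are omitted here.

```python
class Participant:
    """A hypothetical site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "initial"

    def prepare(self):
        # Phase 1: vote yes (and promise to commit) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator logic: collect votes, then broadcast one global decision."""
    votes = [p.prepare() for p in participants]          # phase 1
    decision = "commit" if all(votes) else "abort"
    for p in participants:                                # phase 2
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision

ok = two_phase_commit([Participant("A"), Participant("B")])
failed = two_phase_commit([Participant("A"), Participant("B", can_commit=False)])
print(ok, failed)
```

A single "no" vote forces every site to abort, which is how the protocol keeps the transaction atomic across sites.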

Query Processing in Distributed Database:

• Distributed Query Optimization:
• Cost-Based Optimization: Estimates cost of various query plans to
choose the most efficient one.
• Heuristic Optimization: Uses rules of thumb to simplify query
processing.
• Query Decomposition:
• Breaking down a global query into subqueries that can be executed
on different nodes.
• Join Processing:
• Semi-Join: Reduces data transfer by only sending necessary data for
join operations.
• Bloom Join: Uses bloom filters to reduce data transfer in join
operations.
• Data Localization:
• Ensures queries are processed locally as much as possible to reduce
data transfer.
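A semi-join reduction can be sketched by simulating two sites in one process. The relations, their contents, and the variable names are hypothetical; the point is to compare how many tuples would cross the network.

```python
# Site 1 holds employees(emp_id, name, dept_id).
employees = [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meera", 10)]
# Site 2 holds departments(dept_id, dept_name).
departments = [(10, "CS"), (20, "IT"), (30, "EE"), (40, "ME")]

# Step 1: site 1 ships only the distinct join-attribute values to site 2.
join_keys = {dept_id for _, _, dept_id in employees}

# Step 2: site 2 computes the semi-join, keeping only matching departments.
reduced = [d for d in departments if d[0] in join_keys]

# Step 3: site 2 ships the reduced relation back; site 1 finishes the join.
dept_name = dict(reduced)
result = [(emp_id, name, dept_name[dept_id])
          for emp_id, name, dept_id in employees
          if dept_id in dept_name]

print(len(departments), len(reduced), result)
```

Only two of the four department tuples travel to site 1. The reduction pays off when the join keys plus the reduced relation are smaller than the full relation.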

Module 4

Web Interfaces to the Web:

• Definition:
• Interfaces that enable interaction between web browsers and web
servers to access and manipulate data.
• Key Components:
• HTML/CSS: For structuring and styling web content.
• JavaScript: For client-side scripting to enhance user interaction.
• Web Servers: Serve web pages to users (e.g., Apache, Nginx).
• APIs (Application Programming Interfaces): Enable communication
between different software systems.
• HTTP/HTTPS: Protocols for transferring data over the web.
• Technologies:
• RESTful Services: Web services that follow REST (Representational
State Transfer) principles.
• SOAP (Simple Object Access Protocol): Protocol for exchanging
structured information in web services.

Overview of XML:

• Definition:
• Extensible Markup Language (XML) is a flexible text format used for
structuring and exchanging data.
• Key Features:
• Self-descriptive: Tags define the data and its structure.
• Platform-independent: Can be used across different systems and
technologies.
• Hierarchical Structure: Data is organized in a tree-like structure.
• Common Uses:
• Data interchange: Between different systems.
• Configuration files: For software applications.
• Web services: For data exchange in web applications.

Structure of XML Data:

• Elements:
• Basic building blocks, enclosed in tags (e.g., <book>).
• Attributes:
• Provide additional information about elements
(e.g., <book genre="fiction">).
• Text Content:
• The actual data within elements.
• Hierarchy:
• Nested elements to represent complex data structures.
• Example:

<library>
  <book id="1">
    <title>Database Systems</title>
    <author>John Doe</author>
  </book>
</library>
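The structure of the example document can be inspected with Python's standard xml.etree.ElementTree module. This short sketch parses the same XML and reads an element, an attribute, and text content.

```python
import xml.etree.ElementTree as ET

xml_text = """<library>
  <book id="1">
    <title>Database Systems</title>
    <author>John Doe</author>
  </book>
</library>"""

root = ET.fromstring(xml_text)          # root element: <library>
book = root.find("book")                # first child element named book
book_id = book.get("id")                # attribute value (always a string)
title = book.find("title").text         # text content of <title>

print(root.tag, book_id, title)
```

The nesting of find calls mirrors the hierarchy of the document: library contains book, which contains title and author.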

Document Schema:

• Definition:
• Defines the structure and rules for an XML document.
• Types:
• DTD (Document Type Definition): Specifies the structure and legal
elements/attributes.
• XML Schema (XSD): More powerful and expressive than DTD,
supports data types and namespaces.
• Purpose:
• Ensure data consistency and validation.

Querying XML Data:

• XPath:
• Language for navigating through elements and attributes in an XML
document.
• Example Query: //book/title
• XQuery:
• Powerful language for querying and transforming XML data.
• Example Query:

for $b in //book
where $b/author = 'John Doe'
return $b/title

• XSLT (Extensible Stylesheet Language Transformations):
• Used for transforming XML documents into different formats.
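The XPath and XQuery examples above can be approximated in Python. ElementTree supports only a limited subset of XPath and no XQuery, so the sketch below expresses the XQuery filter as an XPath predicate instead. The second book is hypothetical, added so that the filter has something to exclude.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring(
    "<library>"
    "<book id='1'><title>Database Systems</title><author>John Doe</author></book>"
    "<book id='2'><title>Operating Systems</title><author>Jane Roe</author></book>"
    "</library>"
)

# Counterpart of the XPath query //book/title.
all_titles = [t.text for t in library.findall(".//book/title")]

# Counterpart of the XQuery: titles of books whose author is 'John Doe'.
doe_titles = [t.text
              for t in library.findall(".//book[author='John Doe']/title")]

print(all_titles, doe_titles)
```

The leading "." is needed because ElementTree evaluates paths relative to the element on which findall is called.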

Storage of XML Data:

• Native XML Databases:
• Databases designed specifically to store and query XML data
efficiently (e.g., eXist, BaseX).
• XML-Enabled Databases:
• Traditional databases (relational or NoSQL) that provide support for
storing and querying XML data.
• Techniques:
• Text-Based Storage: Stores XML as text files.
• Shredding: Decomposes XML into relational tables.
• Binary XML Storage: Stores XML in a compact, binary format for
better performance.
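Shredding can be sketched with the standard sqlite3 module: each book element becomes one row of a relational table, after which ordinary SQL applies. The table layout is an assumption chosen to fit the sample document.

```python
import sqlite3
import xml.etree.ElementTree as ET

xml_text = """<library>
  <book id="1">
    <title>Database Systems</title>
    <author>John Doe</author>
  </book>
</library>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")

# Shred: one row per <book> element, one column per attribute or child element.
for book in ET.fromstring(xml_text).findall("book"):
    conn.execute(
        "INSERT INTO book VALUES (?, ?, ?)",
        (int(book.get("id")), book.findtext("title"), book.findtext("author")),
    )

rows = conn.execute(
    "SELECT title FROM book WHERE author = ?", ("John Doe",)
).fetchall()
print(rows)
```

A fixed table like this works for regular documents; irregular or deeply nested XML needs more tables or a generic node/edge layout.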

XML Applications:

• Web Services:
• SOAP messages are XML; RESTful services may also exchange XML, although
JSON is common as well.
• RSS Feeds:
• Websites use XML for syndicating content updates.
• Configuration Files:
• Many software applications use XML for configuration (e.g.,
web.config in ASP.NET).
• Data Interchange:
• Used in B2B (Business-to-Business) applications for data exchange.

The Semi-Structured Data Model:

• Definition:
• Data model that does not adhere to a rigid structure, allowing flexibility
in data representation (e.g., XML, JSON).
• Characteristics:
• Flexible Schema: Structure can vary from one data item to another.
• Self-descriptive: Data items carry schema information within them.
• Hierarchical: Often represented in a tree-like structure.
• Examples:
• XML, JSON, NoSQL databases (e.g., MongoDB).

Implementation Issues:

• Complexity:
• Parsing and processing XML can be complex and resource-intensive.
• Performance:
• Efficient indexing and query optimization techniques are required to
handle large XML datasets.
• Storage:
• Need for effective storage solutions to handle large and nested XML
documents.
• Consistency:
• Ensuring data consistency and integrity in semi-structured data.

Indexes for Text Data:

• Purpose:
• Improve the performance of text-based queries.
• Types of Indexes:
• Full-Text Indexing: Allows efficient searching within text documents.
• Inverted Index: Maps terms to their locations in a document collection,
commonly used in search engines.
• Techniques:
• Tokenization: Breaking down text into smaller units (tokens).
• Stemming and Lemmatization: Reducing words to their base or root
form.
• Stop Words Removal: Eliminating common words (e.g., “and”, “the”)
that do not carry significant meaning.
• Examples:
• Lucene: Open-source search library for full-text indexing.
• Elasticsearch: Distributed search engine built on top of Lucene.
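The techniques above fit together in a small inverted index. The sketch below applies tokenization and stop-word removal (stemming is left out for brevity); the stop-word list, the documents, and the function names are all illustrative and unrelated to the Lucene or Elasticsearch APIs.

```python
import re
from collections import defaultdict

STOP_WORDS = {"and", "the", "a", "of", "in", "is"}   # tiny illustrative list

def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))

# Hypothetical document collection.
docs = {
    1: "The design of database systems",
    2: "Distributed systems and the web",
    3: "Database indexing in search engines",
}
index = build_inverted_index(docs)
print(search(index, "database systems"), search(index, "systems"))
```

A query is answered by intersecting a few posting sets rather than scanning every document, which is why search engines rely on this structure.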

Module 5
