Adbms Notes

Uploaded by

Technical Wala
Copyright © All Rights Reserved

ADBMS

Module 1

Advanced Database Management Systems:

Basic Concepts of Query Processing:

• Definition:
• Query processing involves translating high-level queries (like SQL)
into a series of low-level operations that the database management system
(DBMS) can execute to retrieve the requested data.
• Steps in Query Processing:
• Parsing: The query is parsed to check for syntax errors and to create
a parse tree.
• Translation: The parse tree is translated into a relational algebra
expression.
• Optimization: The relational algebra expression is optimized to
produce an efficient query execution plan.
• Execution: The optimized query plan is executed to retrieve the data.
• Key Components:
• Query Parser: Converts SQL queries into a format that the query
optimizer can process.
• Query Optimizer: Determines the most efficient way to execute a
query by evaluating various execution plans.
• Query Executor: Executes the optimized query plan and retrieves the
results.

Converting SQL Queries into Relational Algebra:

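• Relational Algebra Basics:
• A procedural query language that operates on relations (tables) and
produces a new relation as a result.
• Conversion Process:
• Select-From-Where Clause:
• SELECT translates to projection (π). (SQL keeps duplicate rows unless
DISTINCT is used, whereas π in set-based relational algebra removes them.)
• FROM supplies the input relations; multiple relations are combined
with a Cartesian product (×) or a join (⋈).
• WHERE translates to selection (σ).
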
• Example Conversion:
• SQL: SELECT name FROM students WHERE age > 20;
• Relational Algebra: π_name(σ_age > 20(students))
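As a rough illustration of the conversion above, the sketch below models a relation as a Python list of dicts and implements σ and π as plain functions. The `students` rows and the function names `select` and `project` are hypothetical, chosen only for this example; a real DBMS does not evaluate queries this way.

```python
# Relations modelled as lists of dicts (one dict per tuple); an illustrative
# assumption, not how a DBMS actually stores data.
students = [
    {"id": 1, "name": "Asha", "age": 22},
    {"id": 2, "name": "Ravi", "age": 19},
    {"id": 3, "name": "Meera", "age": 21},
]

def select(relation, predicate):
    """σ_predicate(relation): keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """π_attributes(relation): keep only the listed columns and drop duplicate
    rows, since relational algebra uses set semantics."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

# SQL: SELECT name FROM students WHERE age > 20;
# RA:  π_name(σ_age > 20(students))
answer = project(select(students, lambda t: t["age"] > 20), ["name"])
print(answer)  # [{'name': 'Asha'}, {'name': 'Meera'}]
```

The nesting of the calls mirrors the nesting of the algebra expression: the selection is evaluated first and its output feeds the projection.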

Basic Algorithms for Executing Query Operations:

• Selection Algorithms:
• Linear Search: Scans each tuple in the relation and tests the selection
condition.
• Binary Search: Requires the relation to be sorted on the search
attribute; halves the search space at each step.
• Index-Based Search: Uses indices to directly access tuples satisfying
the selection condition.
• Join Algorithms:
• Nested-Loop Join: For each tuple in relation R, scans all tuples in
relation S.
• Sort-Merge Join: Sorts both relations on the join attribute and then
merges them.
• Hash Join: Partitions the relations using a hash function and joins
matching partitions.
• Projection Algorithms:
• Simple Projection: Eliminates unwanted columns and may remove
duplicates.
• Projection with Sorting: Sorts the result so that duplicates become
adjacent and can be removed in a single pass.
• Projection with Hashing: Uses a hash table to remove duplicates.
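The three join algorithms listed above can be compared on a tiny in-memory example. This is a minimal sketch, assuming hypothetical relations R(emp, dept) and S(dept, dname) stored as Python tuples; it shows the control flow of each algorithm, not the page-based I/O a real DBMS performs.

```python
from collections import defaultdict

# Hypothetical sample relations R(emp, dept) and S(dept, dname).
R = [("Asha", 10), ("Ravi", 20), ("Meera", 10), ("John", 30)]
S = [(10, "Sales"), (20, "HR"), (40, "Ops")]

def nested_loop_join(r, s):
    """For each tuple of r, scan every tuple of s (about |r| * |s| comparisons)."""
    return [(e, d, n) for (e, d) in r for (d2, n) in s if d == d2]

def sort_merge_join(r, s):
    """Sort both inputs on the join attribute, then merge them in one pass."""
    r_sorted = sorted(r, key=lambda t: t[1])
    s_sorted = sorted(s, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_sorted[i][1], s_sorted[j][0]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Join r_sorted[i] with the whole group of s tuples sharing this key.
            k = j
            while k < len(s_sorted) and s_sorted[k][0] == rk:
                out.append((r_sorted[i][0], rk, s_sorted[k][1]))
                k += 1
            i += 1  # the next r tuple may have the same key, so j stays put
    return out

def hash_join(r, s):
    """Build a hash table on s's join attribute, then probe it with each r tuple."""
    table = defaultdict(list)
    for d, n in s:
        table[d].append(n)
    return [(e, d, n) for (e, d) in r for n in table.get(d, [])]

assert sorted(nested_loop_join(R, S)) == sorted(sort_merge_join(R, S)) == sorted(hash_join(R, S))
print(sorted(hash_join(R, S)))
# [('Asha', 10, 'Sales'), ('Meera', 10, 'Sales'), ('Ravi', 20, 'HR')]
```

All three produce the same set of tuples (possibly in a different order); they differ in cost, which is why the optimizer chooses among them.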

Query Tree and Query Graph:

• Query Tree:
• A tree representation of a relational algebra expression where:
• Internal nodes represent relational algebra operations.
• Leaf nodes represent base relations (tables).
• Example:
• SQL: SELECT name FROM students WHERE age > 20;
• Query Tree:

π_name
|
σ_age > 20
|
students

• Query Graph:
• A graph representation of a query in which nodes represent relations
(and constants) and edges represent selection and join conditions.
• Unlike a query tree, a query graph does not fix an order of execution,
so one query graph can correspond to many query trees.
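A query tree maps naturally onto a recursive data structure: leaves hold base relations and internal nodes hold operators that evaluate their child first. The sketch below is an illustration under that assumption; the class names `Relation`, `Select` and `Project` and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

Row = dict

@dataclass
class Relation:                      # leaf node: a base relation
    name: str
    rows: List[Row]

    def evaluate(self):
        return self.rows

@dataclass
class Select:                        # internal node: σ
    condition: Callable[[Row], bool]
    child: object

    def evaluate(self):
        return [r for r in self.child.evaluate() if self.condition(r)]

@dataclass
class Project:                       # internal node: π
    attributes: List[str]
    child: object

    def evaluate(self):
        out = []
        for r in self.child.evaluate():
            row = {a: r[a] for a in self.attributes}
            if row not in out:       # set semantics: drop duplicate rows
                out.append(row)
        return out

# The tree  π_name -> σ_age > 20 -> students, evaluated bottom-up from the leaf.
students = Relation("students", [{"name": "Asha", "age": 22}, {"name": "Ravi", "age": 19}])
tree = Project(["name"], Select(lambda r: r["age"] > 20, students))
print(tree.evaluate())  # [{'name': 'Asha'}]
```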

Heuristic Optimization of Query Tree:

• Heuristics:
• Push Selections Down: Apply selection operations as early as
possible to reduce the size of intermediate results.
• Push Projections Down: Apply projection operations early to reduce
the number of columns in intermediate results.
• Join Ordering: Choose the most efficient order for join operations to
minimize the size of intermediate results.
• Combining Operations: Combine adjacent operations when possible to
simplify the query tree (e.g., a Cartesian product followed by a
selection on the join condition becomes a single join).
• Benefits:
• Reduces the cost of query execution by minimizing the amount of
data processed at each step.
• Simplifies query plans, making them easier to understand and
execute.
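The benefit of pushing selections down can be seen by counting intermediate tuples. This is a minimal sketch with hypothetical tables (1,000 employees over 10 departments, one of them named "Sales"); both plans return the same answer, but the second one joins far fewer tuples.

```python
# Hypothetical tables: 1,000 employees spread evenly over 10 departments.
employees = [{"emp_id": i, "dept_id": i % 10} for i in range(1000)]
departments = [{"dept_id": d, "dname": "Sales" if d == 3 else f"Dept{d}"} for d in range(10)]

def join(r, s, attr):
    """Hash-based equi-join on an attribute name shared by r and s."""
    index = {}
    for t in s:
        index.setdefault(t[attr], []).append(t)
    return [{**a, **b} for a in r for b in index.get(a[attr], [])]

# Plan A: join first, then select (selection at the top of the tree).
joined = join(employees, departments, "dept_id")
plan_a = [t for t in joined if t["dname"] == "Sales"]

# Plan B: push the selection down onto departments before joining.
sales_depts = [t for t in departments if t["dname"] == "Sales"]
plan_b = join(employees, sales_depts, "dept_id")

assert plan_a == plan_b                # same result either way
print(len(joined), len(plan_b))        # 1000 100  (rows produced by each join)
```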

Functional Dependencies:

• Definition:
• A functional dependency (FD) X → Y is a constraint between two sets of
attributes X and Y in a relation: any two tuples that agree on the
values of X must also agree on the values of Y.

• Example:

• Use in Normalization:
• Functional dependencies are used to identify and eliminate
redundancy in database design through normalization.
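The definition above can be checked mechanically against a relation instance. The sketch below uses a hypothetical `enrolment` table; note that a single instance can only show that an FD is violated, since an FD is a constraint on every legal state of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this relation instance:
    any two tuples that agree on lhs must also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != value:
            return False
        seen.setdefault(key, value)
    return True

# Hypothetical relation instance.
enrolment = [
    {"student_id": 1, "name": "Asha", "course": "DBMS"},
    {"student_id": 1, "name": "Asha", "course": "OS"},
    {"student_id": 2, "name": "Ravi", "course": "DBMS"},
]
print(fd_holds(enrolment, ["student_id"], ["name"]))    # True
print(fd_holds(enrolment, ["course"], ["student_id"]))  # False
```

Here student_id → name holds, so storing name alongside every enrolment repeats it, which is the redundancy normalization removes.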

Normal Forms:

• Purpose:
• Normal forms are guidelines to reduce redundancy and avoid
anomalies in database design.
• First Normal Form (1NF):
• Ensures each column contains atomic (indivisible) values and each
row is unique.
• Second Normal Form (2NF):
• Achieved when a relation is in 1NF and all non-key attributes are fully
functionally dependent on the primary key.
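• Third Normal Form (3NF):
• Achieved when a relation is in 2NF and no non-key attribute is
transitively dependent on the primary key, i.e., non-key attributes
depend only on the key and not on other non-key attributes.
• Boyce-Codd Normal Form (BCNF):
• A stricter form of 3NF: for every non-trivial functional dependency
X → Y in the relation, X must be a superkey.
• Higher Normal Forms (4NF, 5NF):
• Deal with more complex types of dependencies like multi-valued and
join dependencies.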
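Testing for BCNF relies on attribute closure: X is a superkey exactly when its closure X+ contains every attribute of the relation. The sketch below assumes FDs are given as (lhs, rhs) pairs of Python sets; the schema and FDs are a hypothetical textbook-style example.

```python
def closure(attrs, fds):
    """Compute the closure X+ of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(schema)
    ]

# Hypothetical schema R(student_id, course, instructor) with
# {student_id, course} -> instructor and instructor -> course.
schema = {"student_id", "course", "instructor"}
fds = [
    ({"student_id", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]
print(bcnf_violations(schema, fds))  # [({'instructor'}, {'course'})]
```

This relation is in 3NF (course is part of the candidate key {student_id, course}) but not in BCNF, because instructor is not a superkey.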

Module 2

Overview of Object-Oriented Database (OODB):

• Definition:
• An Object-Oriented Database (OODB) stores data in the form of
objects, similar to how object-oriented programming languages like Java and C++
manage data.
• Key Characteristics:
• Persistence: Objects persist beyond program execution, stored in the
database.
• Complex Data Types: Supports storage of complex data types such
as multimedia, spatial data, and more.
• Inheritance: Classes can inherit attributes and methods from other
classes.
• Encapsulation: Data and behavior are encapsulated within objects.
• Polymorphism: Objects can be treated as instances of their parent
class.
• Benefits:
• Seamless integration with object-oriented programming.
• Efficient handling of complex data types.
• Enhanced data modeling capabilities.

OO Concepts (Object-Oriented Concepts):

• Objects:
• Instances of classes containing both data (attributes) and behavior
(methods).
• Classes:
• Blueprints for creating objects, defining attributes and methods.
• Inheritance:
• Mechanism by which a class (subclass) inherits attributes and
methods from another class (superclass).
• Encapsulation:
• Bundling data and methods that operate on the data within one unit,
restricting direct access to some of the object’s components.
• Polymorphism:
• Ability to process objects differently based on their data type or class.
• Abstraction:
• Hiding complex implementation details and showing only the essential
features of the object.
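The concepts above map directly onto ordinary object-oriented code. The sketch below uses hypothetical `Shape`, `Rectangle` and `Square` classes; note that Python enforces encapsulation only by convention (a leading underscore), unlike the access modifiers of Java or C++.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    """Abstraction: callers use area() without knowing how it is computed."""

    def __init__(self, name):
        self._name = name          # encapsulated state, exposed read-only below

    @property
    def name(self):
        return self._name

    @abstractmethod
    def area(self):
        ...

class Rectangle(Shape):            # inheritance: a Rectangle is a Shape
    def __init__(self, w, h):
        super().__init__("rectangle")
        self._w, self._h = w, h

    def area(self):
        return self._w * self._h

class Square(Rectangle):           # a subclass of a subclass
    def __init__(self, side):
        super().__init__(side, side)
        self._name = "square"

shapes = [Rectangle(2, 3), Square(4)]   # objects: instances of classes
for s in shapes:
    print(s.name, s.area())             # polymorphism: area() is chosen by class
```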

Architecture of ORDBMS and OODBMS:

• Object-Relational Database Management System (ORDBMS):
• Integration: Combines features of relational databases with
object-oriented databases.
• Extensibility: Supports custom data types, methods, and inheritance.
• Query Language: Extends SQL to support object-oriented features.
• Object-Oriented Database Management System (OODBMS):
• Pure Object Model: Uses a pure object model without relational
features.
• Data Management: Manages data as objects, with support for
complex data types and relationships.
• Query Language: Often uses Object Query Language (OQL) for
querying objects.

OOD Modeling (Object-Oriented Database Modeling):

• Concepts:
• Classes and Objects: Central elements of OOD modeling.
• Relationships: Defines how objects interact (e.g., associations,
inheritance).
• Methods: Functions or procedures defined within a class.
• Modeling Techniques:
• Class Diagrams: Visual representation of classes, their attributes,
methods, and relationships.
• Use Case Diagrams: Depict how users interact with the system.
• Sequence Diagrams: Show object interactions over time.

ORD Modeling (Object-Relational Database Modeling):

• Concepts:
• Tables and Objects: Integrates relational tables with object-oriented
concepts.
• Inheritance: Supports class hierarchies within the database schema.
• Complex Data Types: Allows the use of user-defined types and
structures.
• Modeling Techniques:
• ER Diagrams (Enhanced with Objects): Extends traditional ER
diagrams to include object-oriented elements.
• Relational Tables with Object Columns: Incorporates columns that
can store objects or references to objects.

Specialization, Generalization, Aggregation, and Associations:

• Specialization:
• Creating sub-classes from a parent class, adding specific attributes or
methods.
• Generalization:
• Creating a parent class from multiple sub-classes, abstracting
common features.
• Aggregation:
• A “whole-part” relationship where an object of one class is composed
of objects of one or more other classes.
• Associations:
• Defines relationships between classes, such as one-to-one, one-to-
many, and many-to-many.
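The four relationship kinds above can be sketched with Python dataclasses. All class names here (`Person`, `Student`, `Teacher`, `Department`, `Course`) are hypothetical and chosen only to illustrate each idea.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Person:                      # generalization: features common to subclasses
    name: str

@dataclass
class Student(Person):             # specialization: adds a student-specific attribute
    roll_no: int = 0

@dataclass
class Teacher(Person):             # specialization: adds a teacher-specific attribute
    subject: str = ""

@dataclass
class Department:                  # aggregation: a whole made of Teacher parts
    name: str
    teachers: List[Teacher] = field(default_factory=list)

@dataclass
class Course:                      # association: many-to-many with Student
    title: str
    students: List[Student] = field(default_factory=list)

cs = Department("CS", [Teacher("Dr. Rao", "DBMS")])
asha, ravi = Student("Asha", 1), Student("Ravi", 2)
dbms = Course("DBMS", [asha, ravi])
os_course = Course("OS", [asha])
print([c.title for c in (dbms, os_course) if asha in c.students])  # ['DBMS', 'OS']
```

One course has many students and one student takes many courses, which is what makes the Course–Student association many-to-many.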

Object Query Language (OQL):

• Definition:
• A query language used for querying object-oriented databases, similar
to SQL but tailored for object retrieval.
• Key Features:
• Object Identification: Retrieves objects based on their identity.
• Complex Queries: Supports complex queries involving nested objects
and collections.
• Inheritance: Handles queries involving class hierarchies and
polymorphism.
• Example Syntax:
• SELECT p.name FROM Person p WHERE p.age > 30;
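To show what the OQL example computes, here is a rough Python analogue over an in-memory collection. It is not OQL: the list `persons` stands in for the Person extent (assumed to include subclass instances), and the `Employee` subclass is hypothetical.

```python
class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age

class Employee(Person):            # subclass instances also belong to the Person extent
    def __init__(self, name, age, salary):
        super().__init__(name, age)
        self.salary = salary

# Stand-in for the Person extent: every stored Person object, including subclasses.
persons = [Person("Asha", 35), Employee("Ravi", 28, 50000), Employee("Meera", 41, 70000)]

# Rough analogue of: SELECT p.name FROM Person p WHERE p.age > 30
result = [p.name for p in persons if p.age > 30]
print(result)  # ['Asha', 'Meera']
```

Because Employee objects are also Person objects, the query reaches them without naming the subclass, which is the class-hierarchy behaviour described above.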

Object Relational Concepts:

• User-Defined Types (UDTs):
• Custom data types defined by users to handle complex data
structures.
• Inheritance:
• Allows tables to inherit properties and methods from other tables,
creating a class hierarchy within the database.
• Methods and Functions:
• Enables defining functions and procedures within the database that
operate on custom types.
• Nested Tables and Arrays:
• Supports the storage of arrays and nested tables as column data
types.
• Complex Relationships:
• Manages complex relationships between objects, such as
aggregations and compositions, within the relational framework.

Module 3

Introduction - Parallel and Distributed Database:

• Parallel Database Systems:
• Definition: Systems that use multiple processors and storage devices
to execute database operations concurrently.
• Purpose: Improve performance on large-scale data processing by
splitting the work into tasks that run in parallel.
• Key Features: Data partitioning, parallel query processing, load
balancing, and fault tolerance.
• Distributed Database Systems:
• Definition: Systems where the database is distributed across multiple
locations, interconnected by a network.
• Purpose: Enhance data availability, reliability, and performance by
distributing data closer to where it is needed.
• Key Features: Data fragmentation, replication, distributed query
processing, and transaction management.

Design of Parallel Databases:

• Data Partitioning:
• Horizontal Partitioning: Divides a table into subsets of rows, storing
each subset on a different node.
• Vertical Partitioning: Divides a table into subsets of columns, storing
each subset on a different node.
• Hybrid Partitioning: Combination of horizontal and vertical partitioning.
• Parallel Query Processing:
• Inter-Query Parallelism: Executes multiple queries concurrently.
• Intra-Query Parallelism: Executes a single query using multiple
processors concurrently.
• Load Balancing:
• Ensures even distribution of workload across all processors to prevent
any single node from becoming a bottleneck.
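The partitioning schemes above can be sketched as functions that assign rows to nodes. Hash and range partitioning are two common ways of doing horizontal partitioning; the function names, the `emps` table and the range boundaries are hypothetical.

```python
import bisect

def hash_partition(rows, key, n):
    """Horizontal partitioning: send each row to node hash(row[key]) % n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def range_partition(rows, key, boundaries):
    """Horizontal partitioning by ranges: with boundaries [b0, b1], node 0 gets
    key < b0, node 1 gets b0 <= key < b1, node 2 gets key >= b1."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts

def vertical_partition(rows, key, column_groups):
    """Vertical partitioning: each node stores the key plus one group of columns,
    so full rows can be rebuilt by joining on the key."""
    return [[{key: r[key], **{c: r[c] for c in cols}} for r in rows] for cols in column_groups]

emps = [{"id": i, "name": f"E{i}", "salary": 1000 * i} for i in range(1, 9)]
print([len(p) for p in hash_partition(emps, "id", 3)])                  # [2, 3, 3]
print([len(p) for p in range_partition(emps, "salary", [3000, 6000])])  # [2, 3, 3]
```

Hash partitioning spreads rows evenly when keys are well distributed, while range partitioning keeps neighbouring key values together, which suits range queries but can create hot spots that hurt load balancing.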

Parallel Query Evaluation:

• Pipeline Parallelism:
• Executes different stages of a query simultaneously on different
processors.
• Partitioned Parallelism:
• Splits data into partitions, processing each partition in parallel.
• Parallel Join Algorithms:
• Parallel Nested-Loop Join: Distributes the outer loop across multiple
processors.
• Parallel Sort-Merge Join: Distributes sorting and merging operations
across multiple processors.
• Parallel Hash Join: Distributes hashing and joining operations across
multiple processors.
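Partitioned parallelism and the parallel hash join fit together: both inputs are hash-partitioned on the join attribute, so matching tuples land in the same partition, and each partition pair is joined independently. This sketch uses a thread pool only to show that structure; CPython threads do not give real CPU parallelism here, and a parallel DBMS would run each partition on a separate processor or node. Relation contents and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, key_index, n):
    """Hash-partition tuples on the join attribute so matches share a partition."""
    parts = [[] for _ in range(n)]
    for t in rows:
        parts[hash(t[key_index]) % n].append(t)
    return parts

def local_hash_join(r_part, s_part):
    """Ordinary hash join of one partition pair (r joins on t[1], s on t[0])."""
    table = {}
    for t in s_part:
        table.setdefault(t[0], []).append(t)
    return [(e, d, n) for (e, d) in r_part for (_, n) in table.get(d, [])]

def parallel_hash_join(r, s, workers=4):
    r_parts = partition(r, 1, workers)
    s_parts = partition(s, 0, workers)
    # Each worker joins one (r_i, s_i) pair independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(local_hash_join, r_parts, s_parts))
    return [row for part in results for row in part]

R = [(f"E{i}", i % 5) for i in range(20)]   # hypothetical (employee, dept) tuples
S = [(d, f"Dept{d}") for d in range(3)]     # hypothetical (dept, name) tuples
out = parallel_hash_join(R, S)
print(len(out))  # 12
```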

Distributed Databases Principles:

• Data Fragmentation:
• Horizontal Fragmentation: Divides a relation into subsets of tuples.
• Vertical Fragmentation: Divides a relation into subsets of columns.
• Mixed Fragmentation: Combination of horizontal and vertical
fragmentation.
• Data Replication:
• Full Replication: Copies the entire database to multiple sites.
• Partial Replication: Copies parts of the database to multiple sites.
• Data Allocation:
• Centralized: All data is stored in a single location.
• Decentralized: Data is stored across multiple locations.
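Horizontal and vertical fragments must be recombinable into the original relation. The sketch below uses a hypothetical EMP relation: horizontal fragments are rebuilt with a union and vertical fragments with a join on the key. The predicates chosen (city = Mumbai and city ≠ Mumbai) are complete and disjoint, so every tuple lands in exactly one fragment.

```python
# Hypothetical global relation EMP(id, name, city, salary).
emp = [
    {"id": 1, "name": "Asha", "city": "Mumbai", "salary": 50000},
    {"id": 2, "name": "Ravi", "city": "Pune", "salary": 42000},
    {"id": 3, "name": "Meera", "city": "Mumbai", "salary": 61000},
]

# Horizontal fragmentation: subsets of tuples selected by a predicate.
emp_mumbai = [t for t in emp if t["city"] == "Mumbai"]
emp_other = [t for t in emp if t["city"] != "Mumbai"]

# Vertical fragmentation: subsets of columns, each keeping the key "id".
emp_public = [{"id": t["id"], "name": t["name"], "city": t["city"]} for t in emp]
emp_payroll = [{"id": t["id"], "salary": t["salary"]} for t in emp]

def rebuild_horizontal(*fragments):
    """Horizontal fragments are recombined with a union."""
    return sorted((t for f in fragments for t in f), key=lambda t: t["id"])

def rebuild_vertical(left, right):
    """Vertical fragments are recombined with a join on the shared key."""
    by_id = {t["id"]: t for t in right}
    return sorted(({**t, **by_id[t["id"]]} for t in left), key=lambda t: t["id"])

assert rebuild_horizontal(emp_mumbai, emp_other) == emp
assert rebuild_vertical(emp_public, emp_payroll) == emp
```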

Architectures:

• Client-Server Architecture:
• Clients request services from a centralized server.
• Peer-to-Peer Architecture:
• Each node in the network can act as both a client and a server.
• Federated Architecture:
• Independent databases are integrated under a common interface.
• Multi-Database System:
• Multiple autonomous databases that can cooperate without full
integration.

Design:

• Global Schema Design:
• Provides a unified view of the distributed data.
• Fragmentation Schema Design:
• Defines how data is fragmented and allocated.
• Replication Schema Design:
• Specifies which data fragments are replicated and where.

Implementation:

• Communication Infrastructure:
• Ensures reliable and efficient data transfer between distributed sites.
• Middleware:
• Facilitates communication and data exchange between
heterogeneous databases.
• Distributed DBMS Software:
• Manages distributed data, ensuring consistency and coordination.

Fragmentation:

• Purpose:
• Improve query performance and ensure data is stored close to where
it is needed.
• Types:
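• Primary Fragmentation: Divides a relation using predicates on its own
attributes.
• Derived Fragmentation: Divides a relation according to the
fragmentation of another (owner) relation that it references, typically
computed with a semi-join.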
• Reconstruction:
• Ensuring that fragmented data can be recombined to recreate the
original data set.
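Primary and derived fragmentation can be illustrated with a hypothetical owner relation DEPT and member relation EMP, where EMP.dept_id references DEPT.dept_id. DEPT is fragmented on its own location attribute; EMP then follows the DEPT fragments through a semi-join, and the EMP fragments reconstruct EMP by union.

```python
# Hypothetical owner relation DEPT and member relation EMP.
dept = [
    {"dept_id": 1, "location": "Mumbai"},
    {"dept_id": 2, "location": "Pune"},
]
emp = [
    {"emp_id": 10, "dept_id": 1},
    {"emp_id": 11, "dept_id": 2},
    {"emp_id": 12, "dept_id": 1},
]

# Primary fragmentation: predicates on DEPT's own attribute.
dept_fragments = {
    loc: [d for d in dept if d["location"] == loc] for loc in ("Mumbai", "Pune")
}

def semi_join(r, s, attr):
    """r ⋉ s: the tuples of r that match at least one tuple of s on attr."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]

# Derived fragmentation: EMP is split by following the DEPT fragments.
emp_fragments = {loc: semi_join(emp, frag, "dept_id") for loc, frag in dept_fragments.items()}
print({loc: [e["emp_id"] for e in f] for loc, f in emp_fragments.items()})
# {'Mumbai': [10, 12], 'Pune': [11]}

# Reconstruction: the union of the EMP fragments gives back EMP.
assert sorted((e for f in emp_fragments.values() for e in f), key=lambda e: e["emp_id"]) == emp
```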

Transparencies in Distributed Databases:

• Location Transparency:
• Users do not need to know the physical location of data.
• Fragmentation Transparency:
• Users are unaware of how data is fragmented.
• Replication Transparency:
• Users do not need to know about data replication.
• Failure Transparency:
• Ensures database operations are unaffected by site failures.

Transaction Control in Distributed Database:

• Two-Phase Commit Protocol (2PC):
• Ensures all nodes in a distributed system either commit or abort a
transaction.
• Phase 1: Prepare Phase: Coordinator sends prepare message, and
nodes reply with a vote.
• Phase 2: Commit Phase: If all votes are yes, coordinator sends
commit message; otherwise, sends abort message.
• Three-Phase Commit Protocol (3PC):
• Similar to 2PC but adds a pre-commit phase between voting and the
final commit, reducing the chance that participants block if the
coordinator fails.
• Concurrency Control:
• Distributed Locking: Manages locks on distributed data items.
• Timestamp Ordering: Uses timestamps to order transactions and
avoid conflicts.
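The two phases of 2PC can be simulated in a single process. This is a minimal sketch: the `Participant` class and its `can_commit` flag are hypothetical, and the sketch leaves out logging, timeouts and site failures, which is exactly where the blocking problem of 2PC arises.

```python
from enum import Enum

class Vote(Enum):
    YES = "yes"
    NO = "no"

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # hypothetical: did the local work succeed?
        self.state = "active"

    def prepare(self):
        """Phase 1: vote. A YES voter promises it can commit if told to."""
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, decision):
        """Phase 2: apply the coordinator's global decision."""
        self.state = decision

def two_phase_commit(participants):
    # Phase 1 (prepare): the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only if every vote is YES.
    decision = "committed" if all(v is Vote.YES for v in votes) else "aborted"
    for p in participants:
        p.finish(decision)
    return decision

sites = [Participant("Mumbai"), Participant("Pune"), Participant("Delhi", can_commit=False)]
print(two_phase_commit(sites))                                  # aborted
print(two_phase_commit([Participant("A"), Participant("B")]))  # committed
```

A single NO vote aborts the whole transaction at every site, which is how 2PC keeps the outcome atomic across nodes.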

Query Processing in Distributed Database:

• Distributed Query Optimization:
• Cost-Based Optimization: Estimates the cost of candidate query plans
(including data transfer between sites) and chooses the cheapest one.
• Heuristic Optimization: Uses rules of thumb to simplify query
processing.
• Query Decomposition:
• Breaking down a global query into subqueries that can be executed
on different nodes.
• Join Processing:
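• Semi-Join: Reduces data transfer by sending only the join-attribute
values of one relation to the other site, so that only tuples that will
find a match are shipped for the join.
• Bloom Join: Sends a Bloom filter (a compact bit-vector summary) of the
join-attribute values instead of the values themselves, further reducing
data transfer in join operations.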
• Data Localization:
• Ensures queries are processed locally as much as possible to reduce
data transfer.
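The saving from a semi-join can be estimated by counting what crosses the network. This sketch assumes two hypothetical sites, EMP at site 1 and DEPT at site 2, and counts shipped tuples and join values rather than bytes; a real optimizer would weigh actual sizes.

```python
# Hypothetical sites: EMP(emp_id, dept_id, name) at site 1, DEPT(dept_id, dname) at site 2.
emp = [{"emp_id": i, "dept_id": i % 50, "name": f"E{i}"} for i in range(1000)]
dept = [{"dept_id": d, "dname": f"D{d}"} for d in range(0, 50, 10)]  # 5 departments

# Plain approach: ship all of EMP to site 2 and join there.
shipped_plain = len(emp)

# Semi-join approach:
# 1. Site 2 sends only the distinct join values of DEPT to site 1.
dept_keys = {d["dept_id"] for d in dept}
# 2. Site 1 computes EMP ⋉ DEPT, keeping only tuples that will find a match.
emp_reduced = [e for e in emp if e["dept_id"] in dept_keys]
# 3. Site 1 ships the reduced EMP to site 2, which completes the join.
by_dept = {d["dept_id"]: d for d in dept}
result = [{**e, **by_dept[e["dept_id"]]} for e in emp_reduced]

print(shipped_plain, len(dept_keys) + len(emp_reduced))  # 1000 105
```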

Module 4

Web Interfaces to the Web:

• Definition:
• Interfaces that enable interaction between web browsers and web
servers to access and manipulate data.
• Key Components:
• HTML/CSS: For structuring and styling web content.
• JavaScript: For client-side scripting to enhance user interaction.
• Web Servers: Serve web pages to users (e.g., Apache, Nginx).
• APIs (Application Programming Interfaces): Enable communication
between different software systems.
• HTTP/HTTPS: Protocols for transferring data over the web.
• Technologies:
• RESTful Services: Web services that follow REST (Representational
State Transfer) principles.
• SOAP (Simple Object Access Protocol): Protocol for exchanging
structured information in web services.

Overview of XML:

• Definition:
• Extensible Markup Language (XML) is a flexible text format used for
structuring and exchanging data.
• Key Features:
• Self-descriptive: Tags define the data and its structure.
• Platform-independent: Can be used across different systems and
technologies.
• Hierarchical Structure: Data is organized in a tree-like structure.
• Common Uses:
• Data interchange: Between different systems.
• Configuration files: For software applications.
• Web services: For data exchange in web applications.

Structure of XML Data:

• Elements:
• Basic building blocks, enclosed in tags (e.g., <book>).
• Attributes:
• Provide additional information about elements (e.g., <book
genre="fiction">).
• Text Content:
• The actual data within elements.
• Hierarchy:
• Nested elements to represent complex data structures.
• Example:

<library>
<book id="1">
<title>Database Systems</title>
<author>John Doe</author>
</book>
</library>
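The example document can be loaded with Python's standard `xml.etree.ElementTree` module to show how elements, attributes, text content and hierarchy appear to a program. This is only a reading sketch; it does not validate the document.

```python
import xml.etree.ElementTree as ET

doc = """<library>
  <book id="1">
    <title>Database Systems</title>
    <author>John Doe</author>
  </book>
</library>"""

root = ET.fromstring(doc)                       # element: <library>
for book in root:                               # hierarchy: <book> nested in <library>
    print(book.tag, book.get("id"))             # attribute: id="1"
    for child in book:
        print(" ", child.tag, "=", child.text)  # text content of each child element
```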

Document Schema:

• Definition:
• Defines the structure and rules for an XML document.
• Types:
• DTD (Document Type Definition): Specifies the structure and legal
elements/attributes.
• XML Schema (XSD): More powerful and expressive than DTD,
supports data types and namespaces.
• Purpose:
• Ensure data consistency and validation.

Querying XML Data:

• XPath:
• Language for navigating through elements and attributes in an XML
document.
• Example Query: //book/title
• XQuery:
• Powerful language for querying and transforming XML data.
• Example Query:

for $b in //book
where $b/author = 'John Doe'
return $b/title

• XSLT (Extensible Stylesheet Language Transformations):
• Used for transforming XML documents into different formats.
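Python's `xml.etree.ElementTree` supports only a subset of XPath and no XQuery, but it is enough to mirror both example queries. In this sketch the second book ("XML in Practice" by "Jane Roe") is hypothetical data added so the filter has something to exclude.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book id='1'><title>Database Systems</title><author>John Doe</author></book>"
    "<book id='2'><title>XML in Practice</title><author>Jane Roe</author></book>"
    "</library>"
)

# XPath //book/title, using ElementTree's limited XPath support in findall().
titles = [t.text for t in root.findall(".//book/title")]

# XQuery  for $b in //book where $b/author = 'John Doe' return $b/title
# written as a loop: for -> loop, where -> if, return -> append.
by_john = []
for b in root.iter("book"):
    if b.findtext("author") == "John Doe":
        by_john.append(b.find("title").text)

# The same filter as an ElementTree path predicate of the form [child='text'].
by_john_path = [t.text for t in root.findall(".//book[author='John Doe']/title")]

print(titles)   # ['Database Systems', 'XML in Practice']
print(by_john)  # ['Database Systems']
```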

Storage of XML Data:

• Native XML Databases:
• Databases designed specifically to store and query XML data
efficiently (e.g., eXist, BaseX).
• XML-Enabled Databases:
• Traditional databases (relational or NoSQL) that provide support for
storing and querying XML data.
• Techniques:
• Text-Based Storage: Stores XML as text files.
• Shredding: Decomposes XML into relational tables.
• Binary XML Storage: Stores XML in a compact, binary format for
better performance.
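Shredding can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The one-table mapping (each `<book>` becomes a row of a hypothetical `book` table) is an assumption made for this example. Real shredding schemes must also handle deeper nesting, repeated child elements and document order.

```python
import sqlite3
import xml.etree.ElementTree as ET

xml_doc = (
    "<library>"
    "<book id='1'><title>Database Systems</title><author>John Doe</author></book>"
    "</library>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")

# Shredding: each <book> element becomes one row; its child elements become columns.
for book in ET.fromstring(xml_doc).iter("book"):
    conn.execute(
        "INSERT INTO book (id, title, author) VALUES (?, ?, ?)",
        (int(book.get("id")), book.findtext("title"), book.findtext("author")),
    )

# The shredded data can now be queried with ordinary SQL.
rows = conn.execute("SELECT title FROM book WHERE author = 'John Doe'").fetchall()
print(rows)  # [('Database Systems',)]
```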

XML Applications:

• Web Services:
• SOAP and RESTful services use XML for data interchange.
• RSS Feeds:
• Websites use XML for syndicating content updates.
• Configuration Files:
• Many software applications use XML for configuration (e.g.,
web.config in ASP.NET).
• Data Interchange:
• Used in B2B (Business-to-Business) applications for data exchange.

The Semi-Structured Data Model:

• Definition:
• Data model that does not adhere to a rigid structure, allowing flexibility
in data representation (e.g., XML, JSON).
• Characteristics:
• Flexible Schema: Structure can vary from one data item to another.
• Self-descriptive: Data items carry schema information within them.
• Hierarchical: Often represented in a tree-like structure.
• Examples:
• XML, JSON, NoSQL databases (e.g., MongoDB).

Implementation Issues:

• Complexity:
• Parsing and processing XML can be complex and resource-intensive.
• Performance:
• Efficient indexing and query optimization techniques are required to
handle large XML datasets.
• Storage:
• Need for effective storage solutions to handle large and nested XML
documents.
• Consistency:
• Ensuring data consistency and integrity in semi-structured data.

Indexes for Text Data:

• Purpose:
• Improve the performance of text-based queries.
• Types of Indexes:
• Full-Text Indexing: Allows efficient searching within text documents.
• Inverted Index: Maps terms to their locations in a document collection,
commonly used in search engines.
• Techniques:
• Tokenization: Breaking down text into smaller units (tokens).
• Stemming and Lemmatization: Reducing words to their base or root
form.
• Stop Words Removal: Eliminating common words (e.g., “and”, “the”)
that do not carry significant meaning.
• Examples:
• Lucene: Open-source search library for full-text indexing.
• Elasticsearch: Distributed search engine built on top of Lucene.
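The indexing techniques above fit together in a small inverted index. This is a minimal sketch, not how Lucene works internally: the stop-word list is a tiny illustrative set, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's. Positions are counted among the indexed terms, after stop words are removed.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is"}   # small illustrative list

def tokenize(text):
    """Tokenization: lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        terms = [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]
        for pos, term in enumerate(terms):
            index[term][doc_id].append(pos)
    return index

def search(index, query):
    """AND query: the documents that contain every query term."""
    terms = [crude_stem(t) for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result

docs = {
    1: "Indexing the text of documents",
    2: "Query processing and indexes",
    3: "Distributed databases",
}
idx = build_inverted_index(docs)
print(search(idx, "indexes"))           # {1, 2}
print(search(idx, "index processing"))  # {2}
```

Because stemming maps "Indexing", "indexes" and "index" to the same term, a query for one form finds documents that use another.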

Module 5
