ADBMS Notes
Module 1
• Definition:
• Query processing involves translating high-level queries (like SQL)
into a series of low-level operations that the database management system
(DBMS) can execute to retrieve the requested data.
• Steps in Query Processing:
• Parsing: The query is parsed to check for syntax errors and to create
a parse tree.
• Translation: The parse tree is translated into a relational algebra
expression.
• Optimization: The relational algebra expression is optimized to
produce an efficient query execution plan.
• Execution: The optimized query plan is executed to retrieve the data.
• Key Components:
• Query Parser: Converts SQL queries into a format that the query
optimizer can process.
• Query Optimizer: Determines the most efficient way to execute a
query by evaluating various execution plans.
• Query Executor: Executes the optimized query plan and retrieves the
results.
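As a rough illustration of the parse, optimize, and execute stages, the sketch below asks SQLite (through Python's standard sqlite3 module) for the plan it chose and then runs the query. The students table and its rows are made up for the example, and the plan text differs between SQLite versions.

```python
import sqlite3

# In-memory database with a hypothetical students table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students (name, age) VALUES (?, ?)",
    [("Asha", 21), ("Ben", 19)],
)

query = "SELECT name FROM students WHERE age > 20"

# EXPLAIN QUERY PLAN returns the plan the optimizer selected, without running the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Executing the query runs that plan and returns the result tuples.
names = [row[0] for row in conn.execute(query)]

print(plan)
print(names)
```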
• Selection Algorithms:
• Linear Search: Scans each tuple in the relation.
• Binary Search: Requires sorted data; divides the search space in half
iteratively.
• Index-Based Search: Uses indices to directly access tuples satisfying
the selection condition.
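The three selection strategies above can be sketched on a small in-memory relation. The students list and the dict used as an "index" are assumptions for illustration; a real DBMS works on disk blocks and uses structures such as B+-trees.

```python
from bisect import bisect_left

# Hypothetical relation of (student_id, name) tuples, sorted by student_id.
students = [(1, "Asha"), (3, "Ben"), (5, "Chen"), (8, "Dev")]

def linear_search(relation, key):
    """Scan every tuple; works on unsorted data."""
    return [t for t in relation if t[0] == key]

def binary_search(relation, key):
    """Requires the relation to be sorted on the search attribute."""
    keys = [t[0] for t in relation]
    i = bisect_left(keys, key)
    return [relation[i]] if i < len(keys) and keys[i] == key else []

# A dict stands in for an index that maps a key directly to its tuple.
index = {t[0]: t for t in students}

def index_search(idx, key):
    """Look the key up in the index instead of scanning the relation."""
    return [idx[key]] if key in idx else []

print(linear_search(students, 5), binary_search(students, 5), index_search(index, 5))
```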
• Join Algorithms:
• Nested-Loop Join: For each tuple in relation R, scans all tuples in
relation S.
• Sort-Merge Join: Sorts both relations on the join attribute and then
merges them.
• Hash Join: Partitions the relations using a hash function and joins
matching partitions.
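A minimal in-memory sketch of the three join algorithms follows. The relations R and S are hypothetical, and the hash join shown is the simple build-and-probe variant rather than a partitioned (Grace-style) hash join, which would first split both inputs into partitions on disk.

```python
from collections import defaultdict

# Hypothetical relations: R(dept_id, dept_name) and S(emp_name, dept_id).
R = [(1, "Sales"), (2, "HR"), (3, "IT")]
S = [("Ann", 2), ("Bob", 1), ("Cy", 2)]

def nested_loop_join(r, s):
    # For each tuple in r, scan all of s.
    return [(a, b) for a in r for b in s if a[0] == b[1]]

def sort_merge_join(r, s):
    # Sort both inputs on the join attribute, then merge matching runs.
    r = sorted(r, key=lambda t: t[0])
    s = sorted(s, key=lambda t: t[1])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][1]:
            i += 1
        elif r[i][0] > s[j][1]:
            j += 1
        else:
            key = r[i][0]
            j_end = j
            while j_end < len(s) and s[j_end][1] == key:
                j_end += 1
            while i < len(r) and r[i][0] == key:
                out.extend((r[i], b) for b in s[j:j_end])
                i += 1
            j = j_end
    return out

def hash_join(r, s):
    # Build phase: hash r on the join attribute.
    table = defaultdict(list)
    for a in r:
        table[a[0]].append(a)
    # Probe phase: look up each tuple of s in the hash table.
    return [(a, b) for b in s for a in table.get(b[1], [])]

print(sorted(hash_join(R, S)))
```

All three functions return the same set of matching pairs; they differ only in how much work they do to find them.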
• Projection Algorithms:
• Simple Projection: Eliminates unwanted columns and may remove
duplicates.
• Projection with Sorting: Sorts the result to remove duplicates
efficiently.
• Projection with Hashing: Uses a hash table to remove duplicates.
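Duplicate elimination during projection can be sketched in two ways, one per bullet above. The rows and attribute names are made up for the example.

```python
# Hypothetical relation stored as a list of dicts.
rows = [
    {"name": "Asha", "age": 21, "city": "Pune"},
    {"name": "Ben", "age": 19, "city": "Pune"},
    {"name": "Asha", "age": 21, "city": "Delhi"},
]

def project_hash(relation, attrs):
    """Keep only attrs; a set (hash table) filters out duplicates."""
    seen, out = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out

def project_sort(relation, attrs):
    """Keep only attrs; sorting places duplicates next to each other."""
    projected = sorted(tuple(row[a] for a in attrs) for row in relation)
    out = []
    for t in projected:
        if not out or out[-1] != t:
            out.append(t)
    return out

print(project_hash(rows, ["name", "age"]))
print(project_sort(rows, ["name", "age"]))
```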
• Query Tree (example, read bottom-up: select students with age > 20, then
project name):
    π_name
      |
    σ_age > 20
      |
    students
• Query Graph:
• A graph representation of a query that shows the relationships and
operations in a non-linear structure.
• Nodes represent operations or relations, and edges represent the flow
of data.
• Heuristics:
• Push Selections Down: Apply selection operations as early as
possible to reduce the size of intermediate results.
• Push Projections Down: Apply projection operations early to reduce
the number of columns in intermediate results.
• Join Ordering: Choose the most efficient order for join operations to
minimize the size of intermediate results.
• Combining Operations: Combine adjacent operations when possible
to simplify the query tree.
• Benefits:
• Reduces the cost of query execution by minimizing the amount of
data processed at each step.
• Simplifies query plans, making them easier to understand and
execute.
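The effect of pushing a selection below a join can be seen by comparing intermediate result sizes. The sketch below uses two hypothetical relations and a naive join; both plans return the same rows, but the second one joins far fewer tuples.

```python
# Hypothetical relations.
students = [{"id": i, "age": 18 + i % 6} for i in range(100)]
enrolled = [{"student_id": i, "course": "DB"} for i in range(0, 100, 2)]

def join(left, right):
    # Naive join on students.id = enrolled.student_id.
    return [{**l, **r} for l in left for r in right if l["id"] == r["student_id"]]

def select_age(rows):
    # Selection: age > 20.
    return [r for r in rows if r["age"] > 20]

# Plan A: join first, then select.
joined = join(students, enrolled)
plan_a = select_age(joined)

# Plan B: select first (pushed down), then join.
filtered = select_age(students)
plan_b = join(filtered, enrolled)

print(len(students), "tuples enter the join in plan A")
print(len(filtered), "tuples enter the join in plan B")
```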
Functional Dependencies:
• Definition:
• A functional dependency (FD) X → Y is a constraint between two sets of
attributes in a relation: any two tuples that agree on X must also agree on Y.
• Example:
• Use in Normalization:
• Functional dependencies are used to identify and eliminate
redundancy in database design through normalization.
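A functional dependency can be tested against a relation instance as sketched below. The rows and attribute names are hypothetical. Note that an instance can only refute a dependency; whether it holds as a design constraint depends on the meaning of the data.

```python
def holds(relation, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this relation instance."""
    seen = {}
    for row in relation:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two rows with the same lhs values must have the same rhs values.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical instance.
rows = [
    {"student_id": 1, "name": "Asha", "course": "DB"},
    {"student_id": 1, "name": "Asha", "course": "OS"},
    {"student_id": 2, "name": "Ben", "course": "DB"},
]

print(holds(rows, ["student_id"], ["name"]))    # student_id -> name
print(holds(rows, ["student_id"], ["course"]))  # student_id -> course
```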
Normal Forms:
• Purpose:
• Normal forms are guidelines to reduce redundancy and avoid
anomalies in database design.
• First Normal Form (1NF):
• Ensures each column contains atomic (indivisible) values and each
row is unique.
• Second Normal Form (2NF):
• Achieved when a relation is in 1NF and all non-key attributes are fully
functionally dependent on the primary key.
• Third Normal Form (3NF):
• Achieved when a relation is in 2NF and no non-key attribute is
transitively dependent on the primary key.
• Boyce-Codd Normal Form (BCNF):
• Achieved when, for every non-trivial functional dependency X → Y, X is a
superkey.
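A normal-form check can be sketched with attribute closure, using the usual textbook definition of BCNF (every non-trivial dependency must have a superkey on its left side). The schema and dependencies below are a made-up example, and the check only examines the dependencies it is given.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(schema, fds):
    """True if every non-trivial FD in fds has a superkey as its left side."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency
        if not closure(lhs, fds) >= set(schema):
            return False
    return True

# Hypothetical schema and dependencies.
schema = {"student_id", "course", "instructor"}
fds = [
    ({"student_id", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]

print(closure({"instructor"}, fds))
print(is_bcnf(schema, fds))
```

Here instructor → course violates BCNF because instructor alone does not determine student_id.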
Module 2
• Definition:
• An Object-Oriented Database (OODB) stores data in the form of
objects, similar to how object-oriented programming languages like Java and C++
manage data.
• Key Characteristics:
• Persistence: Objects persist beyond program execution, stored in the
database.
• Complex Data Types: Supports storage of complex data types such
as multimedia, spatial data, and more.
• Inheritance: Objects can inherit properties and methods from other
objects.
• Encapsulation: Data and behavior are encapsulated within objects.
• Polymorphism: Objects can be treated as instances of their parent
class.
• Benefits:
• Seamless integration with object-oriented programming.
• Efficient handling of complex data types.
• Enhanced data modeling capabilities.
• Objects:
• Instances of classes containing both data (attributes) and behavior
(methods).
• Classes:
• Blueprints for creating objects, defining attributes and methods.
• Inheritance:
• Mechanism by which a class (subclass) inherits attributes and
methods from another class (superclass).
• Encapsulation:
• Bundling data and methods that operate on the data within one unit,
restricting direct access to some of the object’s components.
• Polymorphism:
• Ability to process objects differently based on their data type or class.
• Abstraction:
• Hiding complex implementation details and showing only the essential
features of the object.
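The concepts above can be seen together in a small class hierarchy. The Shape, Rectangle, and Square classes are invented for this sketch and are not tied to any particular object database.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    """Abstraction: callers only see name and area()."""

    def __init__(self, name):
        self._name = name  # encapsulated; exposed read-only below

    @property
    def name(self):
        return self._name

    @abstractmethod
    def area(self):
        ...

class Rectangle(Shape):
    """Inheritance: Rectangle is a subclass of Shape."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width, self._height = width, height

    def area(self):
        return self._width * self._height

class Square(Rectangle):
    """Specialization of Rectangle with equal sides."""

    def __init__(self, side):
        super().__init__(side, side)
        self._name = "square"

# Polymorphism: the same call works on any Shape.
shapes = [Rectangle(2, 3), Square(4)]
areas = [s.area() for s in shapes]
print(areas)
```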
• Concepts:
• Classes and Objects: Central elements of OOD modeling.
• Relationships: Defines how objects interact (e.g., associations,
inheritance).
• Methods: Functions or procedures defined within a class.
• Modeling Techniques:
• Class Diagrams: Visual representation of classes, their attributes,
methods, and relationships.
• Use Case Diagrams: Depict how users interact with the system.
• Sequence Diagrams: Show object interactions over time.
• Concepts:
• Tables and Objects: Integrates relational tables with object-oriented
concepts.
• Inheritance: Supports class hierarchies within the database schema.
• Complex Data Types: Allows the use of user-defined types and
structures.
• Modeling Techniques:
• ER Diagrams (Enhanced with Objects): Extends traditional ER
diagrams to include object-oriented elements.
• Relational Tables with Object Columns: Incorporates columns that
can store objects or references to objects.
• Specialization:
• Creating sub-classes from a parent class, adding specific attributes or
methods.
• Generalization:
• Creating a parent class from multiple sub-classes, abstracting
common features.
• Aggregation:
• A “whole-part” relationship where a class is composed of one or more
classes.
• Associations:
• Defines relationships between classes, such as one-to-one,
one-to-many, and many-to-many.
• Definition:
• Object Query Language (OQL): a query language for object-oriented
databases, similar to SQL but tailored for object retrieval.
• Key Features:
• Object Identification: Retrieves objects based on their identity.
• Complex Queries: Supports complex queries involving nested objects
and collections.
• Inheritance: Handles queries involving class hierarchies and
polymorphism.
• Example Syntax:
• SELECT p.name FROM Person p WHERE p.age > 30;
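For comparison, the same selection can be written over in-memory Python objects. The Person class and its instances are assumptions for the sketch; an object database would evaluate the query against persistent objects instead.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# Hypothetical extent of the Person class.
people = [Person("Asha", 42), Person("Ben", 25), Person("Chen", 31)]

# Equivalent of: SELECT p.name FROM Person p WHERE p.age > 30
names = [p.name for p in people if p.age > 30]
print(names)
```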
Module 3
• Data Partitioning:
• Horizontal Partitioning: Divides a table by rows, distributing different
rows to different nodes.
• Vertical Partitioning: Divides a table by columns, distributing different
columns to different nodes.
• Hybrid Partitioning: Combination of horizontal and vertical partitioning.
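A minimal sketch of the two basic partitioning schemes follows. The rows, the modulo placement rule, and the column groups are assumptions for illustration; vertical partitions repeat the key so that rows can be matched up again later.

```python
# Hypothetical table.
rows = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ben", "city": "Delhi"},
    {"id": 3, "name": "Chen", "city": "Pune"},
    {"id": 4, "name": "Dev", "city": "Goa"},
]

def horizontal_partition(relation, key, n_nodes):
    """Assign each row to a node by key modulo the number of nodes."""
    parts = [[] for _ in range(n_nodes)]
    for row in relation:
        parts[row[key] % n_nodes].append(row)
    return parts

def vertical_partition(relation, key, column_groups):
    """Split columns into groups; every group also keeps the key column."""
    return [
        [{key: row[key], **{c: row[c] for c in group}} for row in relation]
        for group in column_groups
    ]

h_parts = horizontal_partition(rows, "id", 2)
v_parts = vertical_partition(rows, "id", [["name"], ["city"]])
print(h_parts)
print(v_parts)
```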
• Parallel Query Processing:
• Inter-Query Parallelism: Executes multiple queries concurrently.
• Intra-Query Parallelism: Executes a single query using multiple
processors concurrently.
• Load Balancing:
• Ensures even distribution of workload across all processors to prevent
any single node from becoming a bottleneck.
• Pipeline Parallelism:
• Executes different stages of a query simultaneously on different
processors.
• Partitioned Parallelism:
• Splits data into partitions, processing each partition in parallel.
• Parallel Join Algorithms:
• Parallel Nested-Loop Join: Distributes the outer loop across multiple
processors.
• Parallel Sort-Merge Join: Distributes sorting and merging operations
across multiple processors.
• Parallel Hash Join: Distributes hashing and joining operations across
multiple processors.
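A parallel hash join can be sketched by partitioning both relations on the join attribute and joining each pair of partitions independently. The relations are hypothetical, and a thread pool is used only to show the structure: in CPython, threads do not give real CPU parallelism for this kind of work, so a real system would use separate processes or nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical relations: R(dept_id, dept_name) and S(emp_name, dept_id).
R = [(1, "Sales"), (2, "HR"), (3, "IT")]
S = [("Ann", 2), ("Bob", 1), ("Cy", 2), ("Di", 3)]

def partition(relation, key_pos, n):
    """Hash-partition tuples on the join attribute so matches share a partition."""
    parts = [[] for _ in range(n)]
    for t in relation:
        parts[hash(t[key_pos]) % n].append(t)
    return parts

def local_hash_join(pair):
    """Build-and-probe hash join on one pair of partitions."""
    r_part, s_part = pair
    table = defaultdict(list)
    for a in r_part:
        table[a[0]].append(a)
    return [(a, b) for b in s_part for a in table.get(b[1], [])]

def parallel_hash_join(r, s, workers=3):
    r_parts = partition(r, 0, workers)
    s_parts = partition(s, 1, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(local_hash_join, zip(r_parts, s_parts)))
    return [row for part in results for row in part]

print(sorted(parallel_hash_join(R, S)))
```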
• Data Fragmentation:
• Horizontal Fragmentation: Divides a relation into subsets of tuples.
• Vertical Fragmentation: Divides a relation into subsets of columns.
• Mixed Fragmentation: Combination of horizontal and vertical
fragmentation.
• Data Replication:
• Full Replication: Copies the entire database to multiple sites.
• Partial Replication: Copies parts of the database to multiple sites.
• Data Allocation:
• Centralized: All data is stored in a single location.
• Decentralized: Data is stored across multiple locations.
Architectures:
• Client-Server Architecture:
• Clients request services from a centralized server.
• Peer-to-Peer Architecture:
• Each node in the network can act as both a client and a server.
• Federated Architecture:
• Independent databases are integrated under a common interface.
• Multi-Database System:
• Multiple autonomous databases that can cooperate without full
integration.
Design:
• Global Schema Design:
• Provides a unified view of the distributed data.
• Fragmentation Schema Design:
• Defines how data is fragmented and allocated.
• Replication Schema Design:
• Specifies which data fragments are replicated and where.
Implementation:
• Communication Infrastructure:
• Ensures reliable and efficient data transfer between distributed sites.
• Middleware:
• Facilitates communication and data exchange between
heterogeneous databases.
• Distributed DBMS Software:
• Manages distributed data, ensuring consistency and coordination.
Fragmentation:
• Purpose:
• Improve query performance and ensure data is stored close to where
it is needed.
• Types:
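• Primary Fragmentation: Divides a relation using predicates on its own
attributes.
• Derived Fragmentation: Divides a relation according to the fragmentation
of a related relation (e.g., one it references through a foreign key).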
• Reconstruction:
• Ensuring that fragmented data can be recombined to recreate the
original data set.
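Reconstruction can be sketched as a union for horizontal fragments and a join on the key for vertical fragments. The employees relation and the fragmentation rules are assumptions made for the example.

```python
# Hypothetical original relation.
employees = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ben", "city": "Delhi"},
    {"id": 3, "name": "Chen", "city": "Pune"},
]

# Horizontal fragments: disjoint subsets of tuples.
pune = [r for r in employees if r["city"] == "Pune"]
others = [r for r in employees if r["city"] != "Pune"]

def reconstruct_horizontal(*fragments):
    """Union of the fragments, ordered by id for comparison."""
    merged = [r for f in fragments for r in f]
    return sorted(merged, key=lambda r: r["id"])

# Vertical fragments: subsets of columns, each keeping the key.
names = [{"id": r["id"], "name": r["name"]} for r in employees]
cities = [{"id": r["id"], "city": r["city"]} for r in employees]

def reconstruct_vertical(left, right, key="id"):
    """Join the fragments on the shared key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]

print(reconstruct_horizontal(pune, others) == employees)
print(reconstruct_vertical(names, cities) == employees)
```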
• Location Transparency:
• Users do not need to know the physical location of data.
• Fragmentation Transparency:
• Users are unaware of how data is fragmented.
• Replication Transparency:
• Users do not need to know about data replication.
• Failure Transparency:
• Ensures database operations are unaffected by site failures.
Module 4
• Definition:
• Interfaces that enable interaction between web browsers and web
servers to access and manipulate data.
• Key Components:
• HTML/CSS: For structuring and styling web content.
• JavaScript: For client-side scripting to enhance user interaction.
• Web Servers: Serve web pages to users (e.g., Apache, Nginx).
• APIs (Application Programming Interfaces): Enable communication
between different software systems.
• HTTP/HTTPS: Protocols for transferring data over the web.
• Technologies:
• RESTful Services: Web services that follow REST (Representational
State Transfer) principles.
• SOAP (Simple Object Access Protocol): Protocol for exchanging
structured information in web services.
Overview of XML:
• Definition:
• Extensible Markup Language (XML) is a flexible text format used for
structuring and exchanging data.
• Key Features:
• Self-descriptive: Tags define the data and its structure.
• Platform-independent: Can be used across different systems and
technologies.
• Hierarchical Structure: Data is organized in a tree-like structure.
• Common Uses:
• Data interchange: Between different systems.
• Configuration files: For software applications.
• Web services: For data exchange in web applications.
• Elements:
• Basic building blocks, enclosed in tags (e.g., <book>).
• Attributes:
• Provide additional information about elements
(e.g., <book genre="fiction">).
• Text Content:
• The actual data within elements.
• Hierarchy:
• Nested elements to represent complex data structures.
• Example:
<library>
  <book id="1">
    <title>Database Systems</title>
    <author>John Doe</author>
  </book>
</library>
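The example document can be read with Python's standard xml.etree.ElementTree module, as sketched below. It shows how elements, attributes, and text content map onto the parsed tree; the record dict is just one convenient way to collect the values.

```python
import xml.etree.ElementTree as ET

doc = """<library>
  <book id="1">
    <title>Database Systems</title>
    <author>John Doe</author>
  </book>
</library>"""

root = ET.fromstring(doc)   # root element: <library>
book = root.find("book")    # first <book> child

record = {
    "id": book.get("id"),                # attribute value
    "title": book.findtext("title"),     # text content of <title>
    "author": book.findtext("author"),   # text content of <author>
}
print(root.tag, record)
```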
Document Schema:
• Definition:
• Defines the structure and rules for an XML document.
• Types:
• DTD (Document Type Definition): Specifies the structure and legal
elements/attributes.
• XML Schema (XSD): More powerful and expressive than DTD,
supports data types and namespaces.
• Purpose:
• Ensure data consistency and validation.
• XPath:
• Language for navigating through elements and attributes in an XML
document.
• Example Query: //book/title
• XQuery:
• Powerful language for querying and transforming XML data.
• Example Query:
    for $b in //book
    where $b/author = 'John Doe'
    return $b/title
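Both queries can be approximated with the limited XPath subset in Python's standard xml.etree.ElementTree module. The standard library has no XQuery engine, so the second query is expressed as an XPath predicate instead; the second book in the sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

doc = """<library>
  <book id="1">
    <title>Database Systems</title>
    <author>John Doe</author>
  </book>
  <book id="2">
    <title>XML Basics</title>
    <author>Jane Roe</author>
  </book>
</library>"""

root = ET.fromstring(doc)

# Counterpart of the XPath query //book/title
titles = [t.text for t in root.findall(".//book/title")]

# Counterpart of the XQuery example: titles of books whose author is 'John Doe'
by_doe = [t.text for t in root.findall(".//book[author='John Doe']/title")]

print(titles)
print(by_doe)
```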
XML Applications:
• Web Services:
• SOAP messages are XML; RESTful services may also use XML (alongside
other formats such as JSON) for data interchange.
• RSS Feeds:
• Websites use XML for syndicating content updates.
• Configuration Files:
• Many software applications use XML for configuration (e.g.,
web.config in ASP.NET).
• Data Interchange:
• Used in B2B (Business-to-Business) applications for data exchange.
• Definition:
• Data model that does not adhere to a rigid structure, allowing flexibility
in data representation (e.g., XML, JSON).
• Characteristics:
• Flexible Schema: Structure can vary from one data item to another.
• Self-descriptive: Data items carry schema information within them.
• Hierarchical: Often represented in a tree-like structure.
• Examples:
• XML, JSON, NoSQL databases (e.g., MongoDB).
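A flexible schema can be illustrated with two JSON items that carry different fields. The sample data is invented; the point is that each item describes its own structure, so code reading it has to tolerate missing keys.

```python
import json

# Two items in the same collection with different structure.
raw = """[
  {"title": "Database Systems", "author": "John Doe"},
  {"title": "XML Basics", "authors": ["A. Rao", "B. Lee"], "year": 2020}
]"""

items = json.loads(raw)

# Field names travel with each item (self-descriptive).
fields = [sorted(item.keys()) for item in items]

# Missing fields are handled at read time rather than by a fixed schema.
years = [item.get("year") for item in items]

print(fields)
print(years)
```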
Implementation Issues:
• Complexity:
• Parsing and processing XML can be complex and resource-intensive.
• Performance:
• Efficient indexing and query optimization techniques are required to
handle large XML datasets.
• Storage:
• Need for effective storage solutions to handle large and nested XML
documents.
• Consistency:
• Ensuring data consistency and integrity in semi-structured data.
Indexes for Text Data:
• Purpose:
• Improve the performance of text-based queries.
• Types of Indexes:
• Full-Text Indexing: Allows efficient searching within text documents.
• Inverted Index: Maps terms to their locations in a document collection,
commonly used in search engines.
• Techniques:
• Tokenization: Breaking down text into smaller units (tokens).
• Stemming and Lemmatization: Reducing words to their base or root
form.
• Stop Words Removal: Eliminating common words (e.g., “and”, “the”)
that do not carry significant meaning.
• Examples:
• Lucene: Open-source search library for full-text indexing.
• Elasticsearch: Distributed search engine built on top of Lucene.
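A toy inverted index with tokenization and stop-word removal is sketched below. The documents and the stop-word list are assumptions, and stemming is left out to keep the example short; libraries such as Lucene implement these steps far more thoroughly.

```python
import re
from collections import defaultdict

# Small illustrative stop-word list.
STOP_WORDS = {"and", "the", "a", "of", "is"}

def tokenize(text):
    """Lowercase, split into word tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical document collection.
docs = {
    1: "The design of database systems",
    2: "Query optimization and indexing",
    3: "Database query processing",
}
index = build_inverted_index(docs)
print(search(index, "database query"))
```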
Module 5