Unit1 5

The document provides a comprehensive overview of Database Management Systems (DBMS), covering concepts such as database architecture, data models, SQL, normalization, and transaction processing. It distinguishes between database systems and file systems, explains the relational data model, and outlines the importance of normalization in database design. Additionally, it discusses ACID properties of transactions and methods for ensuring recoverability and serializability in database operations.

Uploaded by

arelaji2005

UNIT-01

Introduction

 Overview:

o A Database Management System (DBMS) is a software system designed to manage and organize data in a structured manner, making it efficient to store, retrieve, update, and delete data.

o It provides a controlled environment for data access and manipulation, ensuring data integrity, security, and safe concurrent access.

 Database System vs. File System:

o File System:

 Collection of files stored on a storage device.

 Data is stored in individual files with limited relationships between


them.

 Limited data integrity and security features.

o Database System:

 Centralized repository of data with well-defined relationships between


different data items.

 Provides mechanisms for data integrity, security, concurrency control,


and recovery.

 Offers a higher level of abstraction and data independence.

 Database System Concept and Architecture:

o Three-Schema Architecture:

 External Schema: Individual user views of the database.

 Conceptual Schema: An abstract representation of the entire


database.

 Internal Schema: How data is physically stored on the storage devices.

 Data Model, Schema, and Instances:

o Data Model: A collection of concepts that describe the structure of a database.


(e.g., Relational Model, Entity-Relationship Model)
o Schema: A specific description of a particular database within a given data
model.

o Instance: A specific set of data stored in the database at a particular point in


time.

 Data Independence and Database Language and Interfaces:

o Data Independence: The ability to change the schema at one level of the
database system without affecting the schema at the next higher level.

 Logical Data Independence: Changes in the conceptual schema do not


affect external schemas.

 Physical Data Independence: Changes in the internal schema do not


affect the conceptual schema.

o Database Languages:

 Data Definition Language (DDL): Used to define the database schema


(e.g., CREATE, ALTER, DROP).

 Data Manipulation Language (DML): Used to manipulate data within


the database (e.g., SELECT, INSERT, UPDATE, DELETE).

 Overall Database Structure:

o Typically consists of tables (relations) with rows (tuples) and columns


(attributes).

o Relationships between tables are defined using primary keys and foreign keys.
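
A minimal sketch of the DDL, DML, and key concepts above, using Python's built-in sqlite3 module. The Department and Student tables, their columns, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# DDL: define two relations linked by a foreign key.
conn.execute("CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Student ("
    " StudentID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " DeptID INTEGER NOT NULL REFERENCES Department(DeptID))"
)

# DML: insert rows, then query them.
conn.execute("INSERT INTO Department VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO Student VALUES (100, 'Asha', 1)")

try:
    # Department 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO Student VALUES (101, 'Ravi', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

rows = conn.execute(
    "SELECT s.Name, d.DeptName FROM Student s JOIN Department d ON s.DeptID = d.DeptID"
).fetchall()
```

The rejected insert shows the foreign key doing its job: a Student row cannot point at a Department row that does not exist.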

Data Modeling Using the Entity-Relationship Model (ER Model)

 ER Model Concepts:

o Entity: A real-world object or concept that can be uniquely identified (e.g.,


Student, Department, Course).

o Attribute: A property or characteristic of an entity (e.g., StudentID, Name,


Department).

o Relationship: An association between two or more entities (e.g., Student


'enrolls in' Course).

 Notation for ER Diagram:

o Entity: Represented by a rectangle.


o Attribute: Represented by an oval or circle.

o Relationship: Represented by a diamond.

o Cardinality: Represents the number of instances of one entity that can be


associated with instances of another entity (e.g., 1:1, 1:N, N:M).

 Mapping Constraints:

o One-to-One (1:1): One instance of entity A is associated with at most one


instance of entity B, and vice versa.

o One-to-Many (1:N): One instance of entity A is associated with many instances of entity B, but one instance of entity B is associated with at most one instance of entity A.

o Many-to-Many (N:M): One instance of entity A is associated with many instances of entity B, and one instance of entity B is associated with many instances of entity A.

 Keys:

o Super Key: A set of attributes that uniquely identifies each tuple in a relation.

o Candidate Key: A minimal super key (no subset of the attributes can uniquely
identify tuples).

o Primary Key: A candidate key chosen to uniquely identify each tuple in a


relation.
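
The difference between a super key and a candidate key can be sketched in a few lines of Python. The student rows below are made up, and the helper names are hypothetical. Note that a key is a property of the schema: a sample instance can rule a key out, but it cannot prove that one holds for all future data.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal superkeys of this instance, found by trying smaller sets first."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip any set that contains an already-found key: it is not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

# Illustrative data: two students share a name, so Name alone is not a key.
students = [
    {"StudentID": 1, "Email": "a@x.edu", "Name": "Asha"},
    {"StudentID": 2, "Email": "b@x.edu", "Name": "Ravi"},
    {"StudentID": 3, "Email": "c@x.edu", "Name": "Asha"},
]
keys = candidate_keys(students, ["StudentID", "Email", "Name"])
```

Here (StudentID, Name) is a super key but not a candidate key, because StudentID alone already identifies each row. The designer would then pick one candidate key, for example StudentID, as the primary key.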

 Generalization:

o Represents an 'is-a' relationship between entities.

o A more general entity is called a superclass, and more specific entities are
called subclasses.

 Aggregation:

o Represents a 'part-of' or 'has-a' relationship.

o Treats a relationship, together with the entities it connects, as a single higher-level entity that can itself take part in other relationships.

 Reduction of an ER Diagram to Tables:

o Translate entities and relationships into tables.

o Represent attributes as columns in tables.

o Use primary and foreign keys to represent relationships between tables.
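
As a sketch of these reduction steps, the many-to-many relationship Student 'enrolls in' Course can be mapped to three tables. The table name Enrolls, the column names, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT NOT NULL);
-- The M:N relationship becomes its own table; its key combines both foreign keys.
CREATE TABLE Enrolls (
    StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
    CourseID  INTEGER NOT NULL REFERENCES Course(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO Course  VALUES (10, 'DBMS'), (20, 'Networks');
INSERT INTO Enrolls VALUES (1, 10), (1, 20), (2, 10);
""")

# Count the courses each student is enrolled in by going through the relationship table.
courses_per_student = conn.execute(
    "SELECT s.Name, COUNT(*) FROM Student s"
    " JOIN Enrolls e ON e.StudentID = s.StudentID"
    " GROUP BY s.StudentID ORDER BY s.StudentID"
).fetchall()
```

A 1:N relationship usually needs no separate table: a foreign key column on the 'many' side is enough.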

 Extended ER Model:
o Includes additional features like weak entities, subtypes, and
specialization/generalization hierarchies.

 Relationship of Higher Degree:

o Represents a relationship between more than two entities.

o Can be modeled using additional tables or by decomposing the relationship


into binary relationships.

This provides a comprehensive introduction to database systems, including the ER


model and its key concepts.

UNIT-02

Relational Data Model

 Concepts:

o Relation: A named, two-dimensional table with rows (tuples) and columns


(attributes).

o Tuple: A single row in a relation, representing a record.

o Attribute: A named column in a relation, representing a specific property of an


entity.

o Domain: A set of permitted values for an attribute.

 Integrity Constraints:

o Entity Integrity:

 Primary Key: Uniquely identifies each row in a table.

 No attribute of a primary key can be NULL.

o Referential Integrity:

 Foreign Key: A column in one table that references the primary key of
another table.

 Ensures that related data exists and maintains consistency between


tables.

o Domain Constraints: Define the valid values for each attribute.


o Key Constraints: Primary Key, Unique Key, Candidate Key, Super Key.

 Relational Algebra:

o A set of operations to manipulate relations:

 Selection (σ): Selects rows that satisfy a given condition.

 Projection (π): Selects specific columns.

 Union (∪): Combines rows from two relations.

 Intersection (∩): Selects rows that exist in both relations.

 Difference (-): Selects rows that exist in one relation but not the other.

 Cartesian Product (×): Combines all rows from one relation with all
rows from another.

 Join (⨝): Combines rows from two relations based on a matching


condition.
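
The operators above can be sketched over relations held as lists of dictionaries. This is only an illustration of the semantics, not how a DBMS evaluates queries; the function names and sample data are hypothetical.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = []
    for t in r:
        for u in s:
            shared = set(t) & set(u)
            if all(t[a] == u[a] for a in shared):
                result.append({**t, **u})
    return result

student = [
    {"StudentID": 1, "Name": "Asha", "DeptID": 10},
    {"StudentID": 2, "Name": "Ravi", "DeptID": 20},
    {"StudentID": 3, "Name": "Meena", "DeptID": 10},
]
department = [
    {"DeptID": 10, "DeptName": "CS"},
    {"DeptID": 20, "DeptName": "EE"},
]

# Names of students in the CS department: project(select(join)).
cs_names = project(
    select(natural_join(student, department), lambda t: t["DeptName"] == "CS"),
    ["Name"],
)
```

The nested call mirrors the algebra expression: a projection on Name applied to a selection on DeptName = 'CS' applied to Student joined with Department.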

 Relational Calculus:

o Tuple Relational Calculus (TRC): Defines a query as a set of tuples that satisfy
a given formula.

o Domain Relational Calculus (DRC): Defines a query as a set of domain values


that satisfy a given formula.

Introduction to SQL

 Structured Query Language: A powerful language for managing relational databases.

 Characteristics:

o Declarative language: Specifies what data to retrieve, not how to retrieve it.

o High-level language: Easy to learn and use.

o Standard language: Supported by most relational database systems.

 Advantages:

o Data independence: Changes in data storage do not affect SQL queries.

o High-level abstraction: Simplifies complex data manipulation tasks.

o Powerful and flexible: Can perform a wide range of operations.

SQL Data Types and Literals

 Data Types:
o Numeric (INTEGER, FLOAT, DECIMAL)

o Character (CHAR, VARCHAR, TEXT)

o Date/Time (DATE, TIME, TIMESTAMP)

o Boolean (BOOLEAN)

o Binary (BLOB)

 Literals:

o Constant values used in SQL statements.

o Example: 'John Doe', 123, '2023-11-28'

Types of SQL Commands

 DDL (Data Definition Language):

o CREATE: Creates database objects (tables, views, indexes).

o ALTER: Modifies the structure of database objects.

o DROP: Deletes database objects.

 DML (Data Manipulation Language):

o SELECT: Retrieves data from tables.

o INSERT: Adds new rows to a table.

o UPDATE: Modifies existing rows in a table.

o DELETE: Removes rows from a table.

 DCL (Data Control Language):

o GRANT: Grants privileges to users.

o REVOKE: Revokes privileges from users.

SQL Operators and Their Procedure

 Arithmetic Operators: +, -, *, /

 Comparison Operators: =, <>, <, >, <=, >=

 Logical Operators: AND, OR, NOT

 Set Operators: UNION, INTERSECT, EXCEPT (MINUS)

Tables, Views, and Indexes

 Tables: The fundamental data storage unit in a relational database.


 Views: Virtual tables that provide a customized view of data from one or more tables.

 Indexes: Data structures that improve the speed of data retrieval.

Queries and Subqueries

 Queries: SELECT statements that retrieve data from the database.

 Subqueries: Nested SELECT statements within another SQL statement.

Aggregate Functions

 Functions that summarize data:

o AVG(): Calculates the average value.

o SUM(): Calculates the sum of values.

o COUNT(): Counts the number of rows.

o MIN(): Finds the minimum value.

o MAX(): Finds the maximum value.
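
A short sketch of the aggregate functions, run through sqlite3. The Employee table and its values are invented for the example, and GROUP BY is used here (an addition beyond the list above) to compute one summary row per department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Dept TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "CS", 50), (2, "CS", 70), (3, "EE", 40)],
)

# One output row per department, each column produced by an aggregate function.
summary = conn.execute(
    "SELECT Dept, COUNT(*), SUM(Salary), AVG(Salary), MIN(Salary), MAX(Salary)"
    " FROM Employee GROUP BY Dept ORDER BY Dept"
).fetchall()
```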

Insert, Update, and Delete Operations

 INSERT: Adds new rows to a table.

 UPDATE: Modifies existing rows in a table.

 DELETE: Removes rows from a table.

Joins

 Combine data from multiple tables:

o INNER JOIN: Returns rows where there is a match in both tables.

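o LEFT JOIN: Returns all rows from the left table and matching rows from the
right table; unmatched left rows get NULLs in the right table's columns.

o RIGHT JOIN: Returns all rows from the right table and matching rows from the
left table; unmatched right rows get NULLs in the left table's columns.

o FULL JOIN: Returns all rows from both tables, with NULLs wherever one side has no match.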
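
The difference between an inner and an outer join shows up as soon as one row has no partner. In this sketch (hypothetical tables and data), the student Ravi has no department. Only INNER and LEFT joins are shown because older SQLite versions do not support RIGHT or FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, DeptID INTEGER);
INSERT INTO Department VALUES (10, 'CS'), (20, 'EE');
INSERT INTO Student VALUES (1, 'Asha', 10), (2, 'Ravi', NULL);
""")

# INNER JOIN drops the student that has no matching department.
inner = conn.execute(
    "SELECT s.Name, d.DeptName FROM Student s"
    " INNER JOIN Department d ON s.DeptID = d.DeptID ORDER BY s.StudentID"
).fetchall()

# LEFT JOIN keeps every student and fills the missing department with NULL (None in Python).
left = conn.execute(
    "SELECT s.Name, d.DeptName FROM Student s"
    " LEFT JOIN Department d ON s.DeptID = d.DeptID ORDER BY s.StudentID"
).fetchall()
```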

Unions, Intersections, Minus

 Set operations that combine or compare results from multiple queries.

Cursors

 Allow row-by-row processing of a result set.

Triggers
 Automatically execute a block of code in response to a database event (e.g., INSERT,
UPDATE, DELETE).
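
As an illustration, the following sketch defines a trigger in SQLite that records every balance change in an audit table. The tables Account and AuditLog and the trigger name log_balance_change are hypothetical, and trigger syntax differs somewhat between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (AccID INTEGER PRIMARY KEY, Balance INTEGER NOT NULL);
CREATE TABLE AuditLog (AccID INTEGER, OldBalance INTEGER, NewBalance INTEGER);
-- Fires once per updated row and records the before/after values.
CREATE TRIGGER log_balance_change AFTER UPDATE OF Balance ON Account
BEGIN
    INSERT INTO AuditLog VALUES (OLD.AccID, OLD.Balance, NEW.Balance);
END;
INSERT INTO Account VALUES (1, 100);
UPDATE Account SET Balance = 80 WHERE AccID = 1;
""")

# The application never wrote to AuditLog directly; the trigger did.
audit = conn.execute("SELECT * FROM AuditLog").fetchall()
```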

Procedures in SQL/PL SQL

 Stored blocks of code that can be executed multiple times.

 Improve code reusability and maintainability.

This comprehensive overview covers the essential concepts of the relational data model, SQL,
and related topics.

UNIT-03

Database Design & Normalization

1. Functional Dependencies (FDs)

 Definition: A functional dependency X → Y holds between two sets of attributes X and Y in a relation R if, for every pair of tuples t1 and t2 in R, t1[X] = t2[X] implies t1[Y] = t2[Y].

o In simpler terms: If you know the value of X, you can uniquely determine the
value of Y.

 Example: In a relation Employee(EmpID, Name, Dept), EmpID → Name holds because each EmpID is associated with exactly one Name. The reverse, Name → EmpID, need not hold, since two employees can share a name.
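
The definition can be checked mechanically on a sample instance. In this sketch, the function name fd_holds and the employee rows are assumptions; a check like this can show that a dependency is violated by the data, but it cannot prove the dependency holds for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one relation instance: equal X values imply equal Y values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True

employee = [
    {"EmpID": 1, "Name": "Asha", "Dept": "CS"},
    {"EmpID": 2, "Name": "Ravi", "Dept": "CS"},
    {"EmpID": 3, "Name": "Asha", "Dept": "EE"},
]

empid_determines_name = fd_holds(employee, ["EmpID"], ["Name"])
name_determines_dept = fd_holds(employee, ["Name"], ["Dept"])
```

EmpID → Name holds in this instance, while Name → Dept fails because the two employees named Asha work in different departments.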

2. Normal Forms

 1NF (First Normal Form):

o Atomic Values: Each attribute contains only a single, atomic value (no lists or
repeating groups within a single cell).

o No Repeating Groups: Each row must have the same number of columns.

 2NF (Second Normal Form):

o In 1NF.

o Full Functional Dependency: Every non-key attribute is fully functionally


dependent on the entire primary key.
 No partial dependencies: Non-key attributes should not depend on only
a part of the primary key if it's composite.

 3NF (Third Normal Form):

o In 2NF.

o No Transitive Dependencies: No non-key attribute depends on the primary key only indirectly, through another non-key attribute.

 BCNF (Boyce-Codd Normal Form):

o A stricter version of 3NF.

o Every determinant is a candidate key: If a non-trivial dependency X → Y holds, then X must be a superkey (a set of attributes that uniquely identifies a tuple).

 Higher Normal Forms (4NF, 5NF):

o Address more complex dependencies like multi-valued dependencies (4NF)


and join dependencies (5NF).
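
The BCNF test can be sketched with the standard attribute-closure computation: X is a superkey exactly when the closure of X contains every attribute. The schema Enrollment(StudentID, CourseID, Instructor) and its two dependencies are a hypothetical example, chosen because the relation is in 3NF but not in BCNF.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != set(all_attrs):
            bad.append((lhs, rhs))
    return bad

# Hypothetical schema: Enrollment(StudentID, CourseID, Instructor)
attrs = ["StudentID", "CourseID", "Instructor"]
fds = [
    (("StudentID", "CourseID"), ("Instructor",)),  # each student has one instructor per course
    (("Instructor",), ("CourseID",)),              # each instructor teaches one course
]
violations = bcnf_violations(attrs, fds)
```

Instructor → CourseID violates BCNF because Instructor alone is not a superkey. The relation still satisfies 3NF because CourseID is part of a candidate key.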

3. Inclusion Dependencies

 Constraints that specify that the values appearing in certain attributes of one relation must also appear in certain attributes of another relation. A foreign key is the most common example.

4. Lossless Join Decompositions

 A decomposition of a relation into multiple relations is lossless if the original relation


can be perfectly reconstructed by joining the decomposed relations.
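
A small experiment makes the idea concrete: project a relation onto two attribute sets, join the projections, and compare the result with the original. The relation and helper names below are invented for illustration. In this instance Dept → Building holds, so splitting on Dept is lossless, while splitting on Building produces spurious tuples.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    out = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(r, s):
    """Join tuples that agree on every shared attribute, without duplicates."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in set(t) & set(u)):
                row = {**t, **u}
                if row not in out:
                    out.append(row)
    return out

def same_relation(a, b):
    """Compare two relations as sets of tuples, ignoring row order."""
    canon = lambda rows: {tuple(sorted(r.items())) for r in rows}
    return canon(a) == canon(b)

r = [
    {"EmpID": 1, "Dept": "CS", "Building": "A"},
    {"EmpID": 2, "Dept": "EE", "Building": "A"},
]

# Common attribute Dept determines Building, so nothing is lost.
good = natural_join(project(r, ["EmpID", "Dept"]), project(r, ["Dept", "Building"]))
# Common attribute Building determines neither side, so extra tuples appear.
bad = natural_join(project(r, ["EmpID", "Building"]), project(r, ["Dept", "Building"]))
```

The lossy join returns four tuples instead of two: it can no longer tell which employee belongs to which department.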

5. Normalization Using FDs, MVDs, and JDs

 FDs (Functional Dependencies): Used to achieve 2NF, 3NF, and BCNF (1NF concerns atomic values rather than dependencies).

 MVDs (Multi-valued Dependencies): Used to achieve 4NF.

o Express that multiple values of one attribute are associated with a single value
of another attribute, independent of other attributes.

 JDs (Join Dependencies): Used to achieve 5NF.

o Express that a relation can be reconstructed by joining a set of its projections.

6. Alternative Approaches to Database Design

 Denormalization: Intentionally introducing redundancy to improve performance in


specific cases (e.g., for frequently accessed data).

 Object-Oriented Databases: Store data as objects with attributes and methods, better
suited for complex data structures.
 NoSQL Databases: Designed to handle large volumes of unstructured or semi-
structured data (e.g., JSON, XML).

Key Goals of Normalization

 Minimize Redundancy: Reduce data duplication to improve data integrity and


efficiency.

 Reduce Anomalies: Prevent insertion, deletion, and update anomalies that can occur
due to redundancy.

 Improve Data Integrity: Ensure the accuracy and consistency of data.

 Improve Data Maintainability: Make it easier to modify and update the database
schema.

Normalization Process

1. Identify FDs, MVDs, and JDs: Analyze the data requirements and identify the
dependencies between attributes.

2. Decompose Relations: Break down relations into smaller relations to achieve the
desired normal form.

3. Evaluate Decompositions: Ensure that the decompositions are lossless and


dependency-preserving.

By following these principles, database designers can create well-structured and efficient
databases that meet the specific needs of an application.

UNIT-04

Transaction Processing Concept


 Transaction: A logical unit of work that accesses and possibly modifies
data in a database.
 ACID Properties:
o Atomicity: All-or-nothing execution. A transaction either completes
entirely or has no effect.
o Consistency: Preserves database integrity constraints.
o Isolation: Concurrent transactions appear to execute serially.
o Durability: Once committed, changes are permanent and survive
system failures.
Testing of Serializability
 Serializability: A schedule is serializable if its effect on the database is
equivalent to the effect of executing the transactions in some serial order.
 Methods:
o Precedence Graph: A directed graph to visualize the order of
conflicting operations. If the graph has no cycles, the schedule is
conflict serializable.
o View Serializability: A schedule is view serializable if it is equivalent
to some serial schedule with respect to the final state of the
database and the values read by each transaction.
Conflict & View Serializable Schedule
 Conflict Serializable: A schedule is conflict serializable if it is equivalent to
some serial schedule with respect to the order of conflicting operations.
 View Serializable: A schedule is view serializable if it is equivalent to some
serial schedule with respect to the final state of the database and the
values read by each transaction.
 Relationship: Conflict serializability is a stronger condition than view
serializability. All conflict serializable schedules are view serializable, but
not all view serializable schedules are conflict serializable.
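
The precedence-graph test can be sketched as follows. A schedule is represented as a list of (transaction, action, item) triples, which is an encoding chosen here for illustration; the function names are hypothetical.

```python
def precedence_graph(schedule):
    """Edges Ti -> Tj for each pair of conflicting operations where Ti's comes first.

    schedule is a list of (transaction, action, item) with action 'R' or 'W'.
    Two operations conflict if they come from different transactions, touch
    the same item, and at least one of them is a write.
    """
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> 'visiting' or 'done'

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# T1 finishes with A before T2 touches it: equivalent to the serial order T1, T2.
s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
# T2 writes A between T1's read and write: edges T1 -> T2 and T2 -> T1 form a cycle.
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
```

Testing view serializability is not attempted here; that problem is much harder in general, which is why the precedence-graph test is the one used in practice.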
Recoverability
 Recoverability: A schedule is recoverable if every transaction commits
only after all transactions whose writes it has read have committed, so a
failed transaction can be rolled back without invalidating a committed one.
 Types:
o Cascadeless Schedule: A transaction reads only data items written by
transactions that have already committed, so one abort never forces
other transactions to roll back.
o Strict Schedule: A transaction neither reads nor overwrites a data item
until the transaction that last wrote it has committed or aborted.
Recovery from Transaction Failures
 Log-Based Recovery:
o Log: A sequence of records that record the actions of transactions.
o Types of Log Records:
 Start: Indicates the beginning of a transaction.
 Commit: Indicates the successful completion of a
transaction.
 Abort: Indicates the termination of a transaction due to
failure.
 Write: Records the value written to a data item.
o Recovery Techniques:
 Undo: Reverts the effects of aborted transactions.
 Redo: Re-executes the effects of committed transactions
that were not written to the stable storage before a system
crash.
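
A toy version of undo/redo recovery is sketched below. It assumes that each write record stores both the old and the new value of the data item and that the log itself survived the crash. The record layout and the function name recover are illustrative, not those of any particular DBMS.

```python
def recover(log, disk):
    """Rebuild a consistent state from the log and the on-disk state after a crash.

    Log records: ('start', T), ('write', T, item, old, new), ('commit', T), ('abort', T).
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = dict(disk)

    # Undo pass (backward): restore old values written by transactions that never committed.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, item, old, _new = rec
            db[item] = old

    # Redo pass (forward): reapply every write of a committed transaction.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, _old, new = rec
            db[item] = new
    return db

log = [
    ("start", "T1"), ("write", "T1", "A", 100, 50), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "B", 200, 300),
]
# State found on disk after the crash: T1's committed write never reached disk,
# while T2's uncommitted write did.
disk = {"A": 100, "B": 300}
recovered = recover(log, disk)
```

After recovery, A reflects the committed transaction T1 and B is back to its value before the unfinished transaction T2.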
Checkpoints
 Checkpoints: Points at which the DBMS periodically forces log records and
modified data to stable storage and writes a checkpoint record to the log.
 Purpose:
o Reduce recovery time after a system crash.
o Minimize the amount of log that needs to be processed during
recovery.
Deadlock Handling
 Deadlock: A situation where two or more transactions are waiting for
resources held by each other, resulting in a stalemate.
 Detection:
o Wait-for Graph: A directed graph to represent the wait-for
relationships between transactions. If the graph contains a cycle, a
deadlock exists.
 Prevention:
o Timestamp Ordering: Assign timestamps to transactions and use them to
decide whether a requester waits or is rolled back (e.g., the wait-die
and wound-wait schemes).
o Resource Ordering: Impose a partial order on resources and
require transactions to request resources in that order.
 Resolution:
o Timeout: Terminate a transaction that has been waiting for
resources for too long.
o Deadlock Detection and Rollback: Detect deadlocks and abort one
or more transactions involved in the deadlock.
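
Deadlock detection on a wait-for graph can be sketched as a cycle search. The graph encoding (a dictionary from each transaction to the set it waits on), the start times, and the 'abort the youngest' victim policy are all assumptions made for this example.

```python
def find_deadlock(wait_for):
    """Return one cycle in the wait-for graph as a list of transactions, or None.

    wait_for maps each transaction to the set of transactions it is waiting on.
    """
    done = set()  # nodes already proven not to lead into a cycle

    def visit(node, path):
        if node in path:
            return path[path.index(node):]  # the cycle starts where node first appeared
        if node in done:
            return None
        for nxt in sorted(wait_for.get(node, ())):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        done.add(node)
        return None

    for start in sorted(wait_for):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2, T2 for T3, T3 for T1; T4 waits for T1 but is not part of the cycle.
waits = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
cycle = find_deadlock(waits)

# One simple policy (an assumption here): abort the youngest transaction in the cycle.
start_time = {"T1": 5, "T2": 9, "T3": 7, "T4": 1}
victim = max(cycle, key=lambda t: start_time[t]) if cycle else None
```

Aborting the victim removes its edges from the graph, which breaks the cycle and lets the remaining transactions proceed.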
Distributed Database
 Distributed Data Storage: Data is physically stored on multiple computers
in a network.
 Advantages:
o High Availability: Increased fault tolerance.
o Scalability: Easy to add more nodes to the system.
o Local Autonomy: Each site can manage its own data.
 Concurrency Control: Techniques to ensure data consistency in a
distributed environment.
o Two-Phase Locking (2PL): Each transaction acquires all locks before
releasing any locks.
o Timestamp Ordering: Transactions are assigned timestamps, and
operations are executed in timestamp order.
o Optimistic Concurrency Control: Transactions execute without any
restrictions, and conflicts are detected and resolved at commit
time.
Directory System
 Maintains information about the location of data fragments in the
distributed database.
 Enables efficient data access and query processing.
 Types:
o Centralized Directory: A single node maintains the directory
information.
o Distributed Directory: Directory information is partitioned and
replicated across multiple nodes.
This comprehensive overview covers the key concepts of transaction processing,
concurrency control, and distributed databases.

UNIT-05

Concurrency Control
 Definition: Mechanisms to ensure that concurrent execution of
transactions in a database system preserves data consistency and avoids
anomalies such as lost updates, dirty reads (uncommitted dependencies),
and unrepeatable reads.
Locking Techniques
 Two-Phase Locking (2PL):
o Growing Phase: A transaction may acquire locks but may not release
any.
o Shrinking Phase: A transaction may release locks but may not acquire
any new ones.
o Strict 2PL: A transaction holds all its exclusive (write) locks until it
commits or aborts. Holding every lock, shared or exclusive, until then is
called rigorous 2PL.
 Lock Compatibility Matrix: Defines which locks can be granted
concurrently on the same data item.
 Granularity of Locking:
o Fine-grained Locking: Locks are acquired on individual data items
(e.g., tuples, records).
o Coarse-grained Locking: Locks are acquired on larger units (e.g.,
tables, pages).
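
The two-phase rule is easy to state as a check over a sequence of lock operations: once a transaction has released any lock, it may not acquire another. The encoding of operations as (transaction, action, item) triples is an assumption for this sketch.

```python
def is_two_phase(ops):
    """True if, for every transaction, no lock is acquired after its first unlock.

    ops is a list of (transaction, action, item) with action 'lock' or 'unlock'.
    """
    shrinking = set()  # transactions that have released at least one lock
    for txn, action, _item in ops:
        if action == "unlock":
            shrinking.add(txn)
        elif action == "lock" and txn in shrinking:
            return False
    return True

# All locks are acquired before the first release: two-phase.
ok = [("T1", "lock", "A"), ("T1", "lock", "B"), ("T1", "unlock", "A"), ("T1", "unlock", "B")]
# B is locked after A was released: not two-phase.
not_ok = [("T1", "lock", "A"), ("T1", "unlock", "A"), ("T1", "lock", "B"), ("T1", "unlock", "B")]
```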
Timestamping Protocols
 Basic Timestamp Ordering: Each transaction is assigned a unique
timestamp. Conflicting operations are executed in timestamp order; an
operation that arrives too late causes its transaction to roll back.
 Thomas Write Rule: If a transaction tries to write a data item that a
transaction with a later timestamp has already written, the obsolete write
is ignored instead of rolling the transaction back.
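
The read and write tests of timestamp ordering can be sketched as below, following the usual textbook formulation in which each data item remembers the largest read timestamp and the latest write timestamp. The class Item, the function names, and the thomas flag are hypothetical names for this illustration.

```python
class Item:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read the item
        self.write_ts = 0  # timestamp of the transaction that last wrote the item

def read(item, ts):
    """Reject a read if a younger transaction has already overwritten the item."""
    if ts < item.write_ts:
        return "rollback"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts, value, thomas=False):
    """Timestamp-ordering write test, optionally with the Thomas write rule."""
    if ts < item.read_ts:
        return "rollback"  # a younger transaction already read the old value
    if ts < item.write_ts:
        # Obsolete write: basic ordering rolls back, the Thomas rule just skips it.
        return "ignored" if thomas else "rollback"
    item.value, item.write_ts = value, ts
    return "ok"

q = Item(value=0)
first = write(q, ts=2, value="from T2")                     # younger transaction writes first
basic = write(q, ts=1, value="from T1")                     # older write arrives late
with_thomas = write(q, ts=1, value="from T1", thomas=True)  # same write under the Thomas rule
```

In both cases the item keeps the value written by the younger transaction; the Thomas write rule only avoids an unnecessary rollback.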
Validation Based Protocol
 Optimistic Concurrency Control: Assumes that conflicts are rare.
 Phases:
o Read Phase: Transactions read data items without acquiring locks.
o Validation Phase: Before committing, the transaction checks for
conflicts with other committed transactions.
o Write Phase: If no conflicts are detected, the transaction writes its
changes to the database.
Multiple Granularity
 Hierarchical Locking: Data is organized in a hierarchy (e.g., database,
table, page, tuple).
 Intention Locks: Indicate the intention to acquire locks at finer granularity
levels.
 Example: A transaction may acquire an intention lock on a table before
acquiring exclusive locks on specific tuples within that table.
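
The usual textbook compatibility matrix for multiple-granularity locking can be written as a lookup table. The mode names IS (intention shared), IX (intention exclusive), S, SIX, and X follow common convention and do not appear in the notes above; the function can_grant is a hypothetical helper.

```python
# For each held mode, the set of modes another transaction may be granted on the same node.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}

def can_grant(requested, held_modes):
    """A request is granted only if it is compatible with every lock already held."""
    return all(requested in COMPATIBLE[h] for h in held_modes)

# T1 holds IX on a table because it is updating one tuple inside it.
table_locks = ["IX"]
another_writer = can_grant("IX", table_locks)   # a second tuple-level writer
full_table_read = can_grant("S", table_locks)   # a reader of the whole table
```

Two transactions updating different tuples can both hold IX on the table, while a request to read the whole table must wait until the writer is done.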
Multi-Version Schemes
 Maintain multiple versions of data items.
 Allow concurrent readers to access older versions of data while writers
create new versions.
 Reduces blocking and improves concurrency.
Recovery with Concurrent Transactions
 Log-Based Recovery:
o Undo/Redo Logs: Record the actions of transactions.
o Checkpointing: Periodically record the state of the system to
reduce recovery time.
 Concurrency Control and Recovery:
o Concurrency control mechanisms must be coordinated with
recovery techniques to ensure consistent database state after
failures.
Case Study of Oracle
 Oracle Database: A popular commercial database system that employs a
combination of concurrency control techniques, including:
o Multi-Version Concurrency Control (MVCC)
o Locking (row-level, table-level)
o Granularity Control

 Oracle also incorporates advanced features like:


o Parallel Execution
o Data Warehousing
o High Availability and Disaster Recovery
This provides a comprehensive overview of concurrency control techniques in
database systems.
