DBMS Unit-1 Notes

The document provides an overview of databases and Database Management Systems (DBMS), highlighting their significance in various applications such as banking, reservations, and social media. It discusses the characteristics of databases, the functionalities of DBMS, and the roles of different users and administrators involved in database management. Additionally, it outlines the advantages of using a DBMS approach, including reduced redundancy, improved data integrity, and efficient query processing.

Uploaded by

classykomal902
Copyright
© All Rights Reserved

DBMS

UNIT I Notes
Introduction to Database

Databases are essential in everyday activities.

Common interactions involving databases:

• Banking transactions (deposit/withdrawal)


• Hotel and airline reservations.
• Library catalogue searches.
• Online purchases.
• Supermarket purchases.

Traditional Database Applications: Store and access textual or numeric data.

Advancements in Database System:

• Social Media Databases – Store non-traditional data (posts, tweets, images, video),
e.g., Facebook, Instagram, etc.
• Big Data Storage Systems (NoSQL Databases) – Created to manage data for
social media applications. Handle large-scale data; used by companies such as Google,
Amazon and Yahoo. Provide cloud storage for various data types (documents, images,
video, emails, etc.)

DATABASES play a critical role in almost all areas where computers are used,
including business, e-commerce, social media, engineering, medicine, law, and
education.

A Database is a collection of related data. (But any haphazard collection of data is not
a database)

Data – Facts that can be recorded.

CHARACTERISTICS OF DATABASE:

➢ A database represents some aspect of the real world (sometimes called the miniworld
or Universe of Discourse (UOD)). Changes in the miniworld are reflected in the database.
➢ A database is a logically coherent collection of data with some inherent meaning. A
random assortment of data is not a database.
➢ A database can be of any size and of varying complexity.
➢ A database is designed, built and populated with data for a specific purpose. It may
be created and maintained manually, or it may be computerized.

DBMS (DataBase Management System)


The DBMS is a general-purpose software system that facilitates the processes of
defining, constructing, manipulating, and sharing databases among various users and
applications.
• Definition/ Defining a database – Specifying data types, structures and
constraints of the data to be stored in database.
• Metadata – Data that describes the other data. Meta-data is stored by DBMS in
the form of database catalogue or directory.
• Constructing the database – Process of storing the data on some storage medium
that is controlled by DBMS.
• Manipulating a database – Involves querying, updating and generating reports.
• Sharing a database – Allows multiple users and applications to access database
simultaneously.
In the relational model, ROWS of a table are known as TUPLES.
COLUMNS of a table are known as ATTRIBUTES.
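
The defining, constructing and manipulating steps above can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module as the DBMS; the STUDENT table, its columns and the sample rows are made up for the example and are not part of the notes.

```python
import sqlite3

# In-memory database; table, column names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure and constraints.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT)"
)

# Constructing: store the data on a medium controlled by the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, major) VALUES (?, ?, ?)",
    [(1, "Asha", "CS"), (2, "Ravi", "Math"), (3, "Meena", "CS")],
)

# Manipulating: update the data, then query it.
conn.execute("UPDATE student SET major = 'Physics' WHERE roll_no = 2")
cs_students = conn.execute(
    "SELECT name FROM student WHERE major = 'CS' ORDER BY roll_no"
).fetchall()

# Each returned row is a tuple; each selected column is an attribute.
print(cs_students)
```

Sharing is not shown here, since it needs several connections working on the same database file.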

DBMS Functionalities:
a) Queries – Retrieve specific data.
b) Transactions – Read and modify (write) data.
c) Protection – Security against unauthorized access and system failures.
d) Maintenance – Ensures long-term usability and adapts to changing requirements.

SIMPLIFIED DATABASE SYSTEM ENVIRONMENT

The diagram illustrates the components and workflow of a database system,
which consists of three main parts:
1) Users/Programmers – They interact with the database system by sending queries
or application programs. These queries request data retrieval, modification or
updates.
2) DBMS Software –
• Processes Queries/Programs – The DBMS receives queries from users and
processes them.
• Accesses stored data – After processing a request, the DBMS retrieves or
modifies the necessary data in storage.
3) Storage components –
• Stored Database Definition (Meta-Data) – This contains the database
schema, data structures and constraints.
• Stored Database – The actual data stored in the system.
This structure ensures efficient data management, security and integrity while allowing
multiple users to interact with the database simultaneously.

CHARACTERISTICS OF THE DATABASE APPROACH:


In traditional file processing, each user or department creates and maintains separate
files and programs, leading to redundancy and inefficiencies. The database approach, on
the other hand, provides a single repository for maintaining data that is defined once
and then accessed by various users repeatedly through queries, transactions, and
application programs.
Key Characteristics:
1) Self-describing nature of the database system:

• A database system contains not only the data but also metadata, which
describes the structure of the database. This makes the system self-
descriptive and independent of individual applications.
• The database system includes a DBMS catalog, which stores the definitions
of database structures, data types, storage formats, and constraints.
• The information in the catalog is called metadata, which helps the DBMS
understand and manage the database structure efficiently.
• Unlike traditional file-processing systems where data definitions are
embedded within programs, a database system uses the catalog, allowing
flexibility in handling multiple applications.
• Some modern NoSQL databases store self-describing data, meaning the
data itself includes information about its structure rather than relying on a
separate catalog.
• The DBMS catalog allows for greater adaptability, enabling the same
DBMS software to work with multiple database applications, such as
university, banking, or company databases.
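
As a rough illustration of the catalog idea, the sketch below asks a DBMS to describe its own structure. It assumes SQLite (through Python's sqlite3 module), whose catalog table is called sqlite_master; other systems expose their catalogs differently. The STUDENT table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# sqlite_master is SQLite's catalog: one row per schema object.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
# for every column; keep only the column name and declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(catalog_rows)
print(columns)
```

The program never hard-codes the record layout; it reads the metadata from the DBMS, which is what lets the same software serve different databases.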

2) Insulation between programs and data, and data abstraction:

• In traditional file processing, data structures are embedded within
application programs, making changes to data structure difficult and
requiring modifications to all programs accessing the data.
• In a DBMS environment, the structure of data is stored in the catalog,
allowing for program-data independence—programs do not need
modification when the structure of the data changes.
• For example, adding a new attribute (e.g., Birth_date) to a STUDENT record
requires only an update to the catalog rather than modifying all access
programs.
• Program-operation independence is another key feature, especially in
object-oriented and object-relational database systems, where operations
(methods) on data are defined separately from their implementation. This
allows changes to the implementation without affecting application
programs.
• Data abstraction ensures that users interact with a conceptual
representation of data rather than worrying about storage details.
• The data model provides a logical structure that hides complex storage
details, allowing users to focus on relationships and data properties rather
than physical implementation.
• For example, users querying a STUDENT database refer to attributes like
Name, without needing to know the byte locations or storage format of each
record.
• In object-oriented databases, abstraction extends beyond data structure to
include operations (e.g., a CALCULATE_GPA function for STUDENT records),
allowing users to perform computations without knowing the underlying
implementation.
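
The Birth_date example can be tried out directly. The sketch below is a minimal illustration, assuming SQLite via Python's sqlite3 module; the function student_names stands in for an existing "access program" and is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")


def student_names(connection):
    """An 'access program' that refers to attributes by name only."""
    rows = connection.execute("SELECT name FROM student ORDER BY roll_no")
    return [row[0] for row in rows]


before = student_names(conn)

# Structural change: a new attribute is recorded in the catalog.
conn.execute("ALTER TABLE student ADD COLUMN birth_date TEXT")

# The access program runs unchanged after the change.
after = student_names(conn)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(before, after, column_names)
```

Because the program names the attribute it needs instead of relying on record positions or byte offsets, adding a column does not affect it.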

3) Support of multiple views of the data:

• A database serves multiple types of users, each requiring a different
view/perspective of the data. A view can be a subset of the database or a
virtual representation derived from stored data.
• Users do not need to know whether the data they access is physically stored
or derived.
• A multiuser DBMS must support multiple views to cater to different user
needs.
• Example:

a) A user interested in student transcripts may require a view that
displays only student names, courses, and grades.
b) Another user verifying course prerequisites may need a different view
showing only student enrolments and prerequisite courses.
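
The transcript view from example (a) could look roughly like this. It is a sketch only, assuming SQLite via Python's sqlite3 module; the tables student and grade_report and the view name transcript are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grade_report (roll_no INTEGER, course TEXT, grade TEXT);
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO grade_report VALUES
        (1, 'DBMS', 'A'), (2, 'DBMS', 'B'), (1, 'OS', 'B');

    -- A derived (virtual) view: nothing new is stored.
    CREATE VIEW transcript AS
        SELECT s.name, g.course, g.grade
        FROM student AS s
        JOIN grade_report AS g ON g.roll_no = s.roll_no;
    """
)

# The user queries the view exactly as if it were a stored table.
transcript = conn.execute(
    "SELECT name, course, grade FROM transcript ORDER BY name, course"
).fetchall()
print(transcript)
```

The user of transcript does not need to know that the rows are derived from two stored tables.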

4) Sharing of data and multiuser transaction processing:

• A multiuser DBMS allows multiple users to access and update the database
simultaneously, ensuring data consistency and integrity.
• Concurrency control software ensures that multiple transactions do not
interfere with each other. For example, when multiple airline reservation
agents attempt to book the same seat, the DBMS ensures that only one
agent can successfully assign it.
• Applications that require such features are known as online transaction
processing (OLTP) applications.
• A transaction is a sequence of one or more database operations (e.g.,
reading, updating records) that must be executed correctly and completely.
• The isolation property ensures that each transaction appears to execute
independently, even if many transactions run concurrently.
• The atomicity property guarantees that either all operations in a
transaction are executed, or none of them are, maintaining data integrity.
• Transactions play a critical role in ensuring data reliability in banking,
airline reservations, and other real-time applications.
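
Atomicity can be illustrated with a small funds transfer. This is a sketch under stated assumptions: SQLite via Python's sqlite3 module, a made-up account table, and a hypothetical transfer function. The connection is used as a context manager, which commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move amount from src to dst as one transaction; True on success."""
    try:
        with connection:  # commit on success, roll back on exception
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # both updates are applied
failed = transfer(conn, 1, 2, 500)  # debit violates CHECK, so the credit is undone
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the second call the credit to account 2 has already run when the debit fails, yet the final balances show no trace of it: either all operations take effect or none do.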

By implementing the database approach, organizations reduce data redundancy,
improve data integrity, and enhance overall efficiency in managing and retrieving
information.

ACTORS ON THE SCENE

In a large organization, multiple individuals are involved in designing, using, and
maintaining a database. These individuals are categorized as follows:

1) Database Administrators (DBA):

• Oversee and manage database resources.
• Responsible for authorizing access to the database, coordinating and
monitoring its usage, and acquiring software and hardware resources as
needed.
• Ensure database security and optimize system performance.
• In large organizations, the DBA is assisted by a staff that carries out these
functions.

2) Database Designers:

• Identify and define data to be stored in the database.
• Choose appropriate structures for data storage.
• Work closely with potential database users to understand requirements.
• Develop and integrate database views that support user needs.
• Often part of the DBA team and may take on additional responsibilities after
design completion.

3) End Users: End users interact with the database for querying, updating, and
generating reports. The database primarily exists for their use. They are
categorized as:

• Casual End Users:
o Access the database occasionally for different information needs.
o Use sophisticated query interfaces.
o Typically, middle- or high-level managers or other occasional browsers.
• Naive or Parametric End Users:
o Regularly query and update the database using standard types of queries
and updates (called canned transactions).
o Example tasks:
▪ Bank tellers checking balances and posting transactions.
▪ Airline and hotel reservation agents making bookings.
▪ Shipping employees updating package statuses.
▪ Social media users posting and reading content.
• Sophisticated End Users:
o Engineers, scientists, business analysts, and other professionals.
o Familiar with DBMS functionalities and develop complex applications.
• Standalone Users:
o Maintain personal databases using ready-made software with menu-based
or graphical interfaces.
o Example: Personal finance software users.

4) System Analysts and Application Programmers (Software Engineers)

• System Analysts:
o Determine the database requirements of naive and parametric end users.
o Develop specifications for canned transactions.
• Application Programmers:
o Implement, test, debug, document, and maintain the database applications.
o Should be knowledgeable about DBMS capabilities to develop efficient
applications.

WORKERS BEHIND THE SCENE

In addition to database designers, users, and administrators, there are individuals
involved in the development, maintenance, and operation of the DBMS software and
system environment. These individuals typically do not interact with the database
content itself but play a crucial role in ensuring the system functions smoothly.

Categories of Workers Behind the Scene:

1) DBMS System designers and implementers –

• Responsible for designing and implementing the DBMS modules and
interfaces as a software package.
• Work on complex components such as query processing, data buffering,
concurrency control, data recovery, and security.
• Ensure DBMS compatibility with system software like operating systems
and programming language compilers.

2) Tool Developers –

• Design and develop software tools that facilitate database modeling,
system design, and improved performance.
• Create optional packages such as: Database design tools, Performance
monitoring software, Natural language or graphical interfaces,
Prototyping, simulation, and test data generation tools.
• Many of these tools are developed and marketed by independent software
vendors.

3) Operators and maintenance personnel (System Administration Personnel)

• Responsible for the actual running and maintenance of the hardware and
software environment for the database system.
ADVANTAGES OF USING DBMS APPROACH
1) Controlling redundancy
• Traditional File-Based Systems: Different user groups maintain
separate files, leading to redundancy. Each department may store the
same data, resulting in duplication of effort, wasted storage space, and
data inconsistencies when updates are not uniformly applied.
• DBMS Approach: The database system integrates different user views,
using data normalization to store each logical item in only one place,
ensuring consistency and saving storage space. Controlled redundancy
(denormalization) may be used to improve query performance, and the
DBMS enforces integrity constraints to prevent inconsistencies.

2) Restricting Unauthorized Access

• Not all users can access all data; sensitive data (e.g., salaries) is restricted.
• Users are assigned accounts with passwords to control access levels (read-
only or update).
• DBMS enforces security restrictions set by the DBA.
• DBA staff have exclusive access to privileged software (e.g., account
creation).
• Parametric users access data via predefined applications.

3) Providing Persistent Storage for Program Objects

• DBMS stores program objects and complex data structures persistently.
• Object-oriented databases integrate with languages like C++ and Java.
• Eliminates manual file storage conversions, ensuring seamless data
retrieval.

4) Providing Storage Structures and Search Techniques for Efficient Query
Processing

• Uses specialized data structures (indexes, trees, hash tables) to speed up
searches.
• DBMS often has a buffering or caching module that maintains the database
in main memory buffers for faster access.
• Query processing and optimization module of the DBMS ensures efficient
execution of queries.
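
The effect of an index on query processing can be observed by asking the optimizer for its plan. The sketch assumes SQLite via Python's sqlite3 module and its EXPLAIN QUERY PLAN statement; the exact plan wording differs between SQLite versions, so the code only checks whether the index name appears. The table, data and index name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.executemany(
    "INSERT INTO student (name, major) VALUES (?, ?)",
    [(f"student{i}", "CS") for i in range(1000)],
)

query = "SELECT roll_no, major FROM student WHERE name = 'student500'"


def plan(connection, sql):
    """Return the plan text; the last column of each plan row is the detail."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql)
    return " ".join(row[-1] for row in rows)


plan_before = plan(conn, query)  # no index on name yet: the table is scanned
result_before = conn.execute(query).fetchall()

conn.execute("CREATE INDEX idx_student_name ON student (name)")

plan_after = plan(conn, query)   # the optimizer can now use the index
result_after = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
```

The query text and its result stay the same; only the access path chosen by the optimizer changes.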

5) Providing Backup and Recovery

• Provides mechanisms for recovering from system failures.
• Ensures database consistency after unexpected failures.
• Supports disk backups to prevent data loss from catastrophic failures.

6) Providing Multiple User Interfaces

• Supports diverse user interfaces: query languages, programming APIs, GUI,
mobile apps.
• Offers menu-driven and forms-style interfaces for ease of use.
7) Representing Complex Relationships among data

• DBMS efficiently manages interrelated data (e.g., student-course
relationships).
• Allows defining and retrieving relationships easily.

8) Enforcing Integrity Constraints

• Ensures data correctness via constraints (data types, uniqueness, referential
integrity).
• Prevents invalid data entries and maintains data consistency.
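
A small sketch of uniqueness and referential integrity being enforced by the DBMS rather than by application code. It assumes SQLite via Python's sqlite3 module (where foreign keys must be switched on per connection); the course and enrolment tables and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE course (code TEXT PRIMARY KEY);
    CREATE TABLE enrolment (
        roll_no INTEGER NOT NULL,
        code    TEXT NOT NULL REFERENCES course(code),
        UNIQUE (roll_no, code)
    );
    INSERT INTO course VALUES ('DBMS');
    INSERT INTO enrolment VALUES (1, 'DBMS');
    """
)


def try_insert(roll_no, code):
    """Attempt one enrolment; report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO enrolment VALUES (?, ?)", (roll_no, code))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert(2, "DBMS"),  # valid row
    try_insert(1, "DBMS"),  # duplicate: violates UNIQUE
    try_insert(3, "NOPE"),  # unknown course: violates referential integrity
]
print(results)
```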

9) Permitting Inferencing and Actions using Rules and Triggers

• Deductive database systems provide capabilities for defining deduction
rules for inferring new information from the stored database facts.
• Active database systems provide active rules that can automatically initiate
actions when certain events and conditions occur.
• Triggers are rules activated by updates to a table. Stored procedures are more
involved procedures that become part of the database definition and are invoked
when specific conditions occur.
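
An active rule can be sketched as a trigger with an event, a condition and an action. The example assumes SQLite's trigger syntax via Python's sqlite3 module; the account and audit_log tables and the trigger name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (
        account_id INTEGER, old_balance INTEGER, new_balance INTEGER
    );

    -- Event: an update of balance. Condition: the value really changed.
    -- Action: record the change in audit_log.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON account
    WHEN OLD.balance <> NEW.balance
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100);
    """
)

conn.execute("UPDATE account SET balance = 80 WHERE id = 1")  # fires the trigger
conn.execute("UPDATE account SET balance = 80 WHERE id = 1")  # condition is false
log = conn.execute("SELECT * FROM audit_log").fetchall()
print(log)
```

The application only issues the UPDATE; the DBMS itself initiates the logging action.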

10) Potential for Enforcing Standards

• DBA enforces naming conventions, formats, and reporting standards.
• Improves consistency and cooperation across departments.

11) Reduced Application Development Time

• Creating new applications using DBMS is faster than traditional file-based
systems.
• DBMS reduces development time significantly once the database is
operational.

12) Flexibility

• Allows modifications to database structure without affecting existing data.
• Adapts to evolving user requirements with minimal disruption.

13) Up-to-Date Information Availability

• Ensures real-time updates for all users (useful in banking, reservations,
etc.).
• Concurrency control prevents conflicts in multi-user environments.

14) Economies of Scale

• Reduces redundancy by consolidating data and applications.
• Organizations can invest in high-performance hardware rather than
redundant systems.
WHEN NOT TO USE A DBMS

There are a few situations in which a DBMS may involve unnecessary overhead costs
that would not be incurred in traditional file processing. The overhead costs of using a
DBMS are due to the following:

• High initial investment in hardware, software, and training.
• The generality that a DBMS provides for defining and processing data.
• Overhead for providing security, concurrency control, recovery, and integrity
functions.

Data Models, Schemas, and Instances


Data abstraction: Suppression of details of data organization and storage, highlighting
essential features for better understanding.
One of the main characteristics of the database approach is to support data abstraction
so that different users can perceive data at their preferred level of detail.
Data Model is a collection of concepts that can be used to describe the structure of a
database including data types, relationships, and constraints. Data models also include
a set of basic operations for specifying retrievals and updates on the database.

Categories of Data Models


We can categorize the data models according to the types of concepts they use to
describe the database structure:
1. Conceptual (High-Level) Data Models
• Close to how users perceive data.
• Use concepts such as entities, attributes, and relationships.
• Example: Entity–Relationship (ER) Model.
• Includes generalization, specialization, and categories.

2. Representational (Implementation) Data Models


• Bridge between conceptual and physical models.
• Hide many details of data storage on disk but can be directly implemented
on a computer system.
• Examples:
o Relational model: Most widely used in commercial DBMSs.
o Legacy data models: Network and hierarchical models.
o Object data model: Closer to conceptual models, used in software
engineering.
o Object-relational model: Extends the relational model with object-
oriented features.
3. Physical (Low-Level) Data Models
• Describe details of how data is stored in files on storage media.
• Focus on record formats, ordering, and access paths (e.g., indexing,
hashing).
• Example: Index structures for efficient data retrieval.

4. Self-Describing Data Models


• These models combine description of the data (schema) with actual data
values.
• Different from traditional DBMSs, where schema is separate from data.
• Examples:
o XML-based models
o Key-value stores
o NoSQL systems (for big data management).
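
The difference between a separate schema and self-describing data can be sketched with plain Python and the standard json module. The field names and documents below are illustrative assumptions, not taken from any particular NoSQL system.

```python
import json

# Traditional style: the schema (attribute names) is kept apart from the values.
schema = ("roll_no", "name", "major")
row = (1, "Asha", "CS")

# Self-describing style: every document carries its own field names,
# so two documents in the same collection may have different structures.
documents = [
    '{"roll_no": 1, "name": "Asha", "major": "CS"}',
    '{"roll_no": 2, "name": "Ravi", "emails": ["ravi@example.com"]}',
]
parsed = [json.loads(text) for text in documents]
fields_per_document = [sorted(doc.keys()) for doc in parsed]

print(dict(zip(schema, row)))
print(fields_per_document)
```

The row is meaningless without the separately stored schema, whereas each document can be interpreted on its own.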

Schemas, Instances, and Database State


The description of the database structure, specified during database design, is called
the Database Schema; it is not expected to change frequently. Schemas are displayed as
schema diagrams, which show the structure but not the actual data instances.
[Figure: example schema diagram]

Each object in the schema (e.g., STUDENT, COURSE) is called a schema construct.
Database State (Snapshot): The actual data in a database at a specific moment in time.
• Changes frequently as records are inserted, deleted, or modified.
• Also called current set of occurrences/instances in the database.
• Every schema construct has its own set of instances (e.g., the STUDENT construct
has the individual student records as its instances).
• Many database states can correspond to the same schema.
Difference between Schema and Database State:
Schema                                                   | Database State
Defines structure and constraints of the database.       | Represents the data at a particular time.
Provided to the DBMS at the time of database definition. | Changes with every database update operation.
Stored as meta-data in the DBMS catalog.                 | Must satisfy the constraints defined in the schema.
Also known as intension.                                 | Also known as extension of the schema.
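
The distinction can be observed by reading the schema and the state separately before and after an update. This is a minimal sketch assuming SQLite via Python's sqlite3 module; the helper names schema and state are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")


def schema(connection):
    """Intension: the stored definition, read from SQLite's catalog."""
    return connection.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'"
    ).fetchone()[0]


def state(connection):
    """Extension: the set of instances at this moment."""
    return connection.execute(
        "SELECT roll_no, name FROM student ORDER BY roll_no"
    ).fetchall()


schema_before, state_before = schema(conn), state(conn)
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
schema_after, state_after = schema(conn), state(conn)

print(schema_before == schema_after)  # the schema did not change
print(state_before, state_after)      # the state did
```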
Three-Schema Architecture and Data Independence

Three-Schema Architecture
The three-schema architecture is designed to separate user applications from the
physical database. It provides three levels of abstraction:
1. Internal Level
o Has an internal schema that describes the physical storage structure of the
database.
o It uses a physical data model to define storage details and access paths for
the database.
2. Conceptual Level
o Contains a conceptual schema that describes the structure of entire
database for a community of users.
o Hides physical storage details and focuses on entities, data types,
relationships, user operations, and constraints.
o Representational data model is used to describe the conceptual schema
when a database system is implemented.
3. External (View) Level
o Includes multiple external schemas/user views, each tailored for specific
user groups.
o Hides the rest of the database from each user group.
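o Each external schema is typically implemented using a representational data
model, possibly based on an external schema design in a high-level conceptual
data model.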
Key Points:
• The three-schema architecture provides data abstraction and independence.
• The three-schema architecture is a convenient tool with which the user can
visualize the schema levels in a database system.
• The actual data resides at the physical level, while users interact with external
schemas.
• DBMS translates user requests from external schemas to conceptual schemas and
then to internal schemas.
• These transformations between levels are called mappings, and they can be
time-consuming.
• Some DBMSs do not fully implement the three-schema separation but still follow
its principles.

Data Independence
Data independence is the capacity to change the schema at one level of a database
system without having to change the schema at the next higher level. There are two types:

1. Logical Data Independence


o The ability to change the conceptual schema without altering external
schemas or application programs.
o Used when expanding the database (adding a record type or data item) or
modifying constraints.
o Ensures that applications depending on the external schema remain
unaffected by changes in database structure.
2. Physical Data Independence
o The capacity to change the internal schema without changing the
conceptual schema.
o Changes may include reorganization of physical storage, creation of
additional access structures, or performance optimizations.
o Ensures that application programs and user queries remain functional
despite changes in data storage.
Physical data independence is easier to achieve because physical storage details, such
as the exact location of data on disk and the hardware details of storage encoding,
placement, compression, and the splitting and merging of records, are hidden from the
user.
Logical data independence is harder to achieve, since structural and constraint
changes at the conceptual level must be made without affecting application programs.
This architecture and data independence are fundamental in modern DBMSs to provide
flexibility, scalability, and ease of maintenance.
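
Logical data independence can be sketched by letting an application read through an external view and then restructuring the tables underneath it. The example assumes SQLite via Python's sqlite3 module; the table names, the view student_names and the function app_query are illustrative assumptions. Only the mapping (the view definition) is changed, not the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, full_name TEXT);
    INSERT INTO student VALUES (1, 'Asha Rao');
    -- External schema: what the application is allowed to see.
    CREATE VIEW student_names AS
        SELECT roll_no, full_name AS name FROM student;
    """
)


def app_query(connection):
    """Application program written against the external view only."""
    return connection.execute(
        "SELECT name FROM student_names ORDER BY roll_no"
    ).fetchall()


before = app_query(conn)

# Conceptual-level change: the name is now stored as two attributes.
# The external/conceptual mapping is redefined to match.
conn.executescript(
    """
    DROP VIEW student_names;
    CREATE TABLE student_v2 (
        roll_no INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
    );
    INSERT INTO student_v2 VALUES (1, 'Asha', 'Rao');
    DROP TABLE student;
    CREATE VIEW student_names AS
        SELECT roll_no, first_name || ' ' || last_name AS name FROM student_v2;
    """
)

after = app_query(conn)
print(before, after)
```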
DBMS LANGUAGES
1. Data Definition Language (DDL)
• Used by DBA and database designers to define database schemas.
• The DDL compiler processes DDL statements and stores the schema in the DBMS
catalog.
• If a clear separation of levels exists:
o DDL defines the conceptual schema.
o Storage Definition Language (SDL) defines the internal schema.
o View Definition Language (VDL) defines external schemas (user views).
• In relational DBMSs, SQL serves as DDL, VDL, and DML in a single language.

2. Data Manipulation Language (DML)


• Used for retrieving, inserting, deleting, and modifying data.
• Two types of DML:
1. High-level (Nonprocedural) DML
▪ Specifies complex database operations concisely.
▪ Can be used interactively or embedded in a programming language.
▪ Known as set-oriented or set-at-a-time DML.
▪ Example: SQL queries (e.g., SELECT * FROM STUDENT;).
2. Low-level (Procedural) DML
▪ Requires embedding in a general-purpose programming language.
▪ Retrieves individual records/objects and processes them separately.
▪ Uses loops to retrieve and process records.
▪ Known as record-at-a-time DML.
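
The two styles can be contrasted in a short program. This is only an approximation, assuming SQLite via Python's sqlite3 module: SQLite has no truly procedural DML, so the record-at-a-time style is imitated by fetching one record per loop iteration and filtering in the host language. The table and the pass mark of 50 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 47), (3, "Meena", 65)],
)

# Set-at-a-time: one declarative statement states WHAT is wanted.
passed_declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE marks >= 50 ORDER BY roll_no"
    )
]

# Record-at-a-time style: the program states HOW, one record per iteration.
passed_procedural = []
cursor = conn.execute("SELECT roll_no, name, marks FROM student ORDER BY roll_no")
while True:
    record = cursor.fetchone()
    if record is None:  # no more records
        break
    if record[2] >= 50:
        passed_procedural.append(record[1])

print(passed_declarative, passed_procedural)
```

Here Python plays the role of the host language and SQL the role of the data sublanguage.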

3. Query Language
• A high-level DML used in a standalone, interactive manner, mainly for data retrieval.
• Users specify what to retrieve, not how.
• High-level DMLs like SQL are declarative.

4. Embedded and Host Languages


• When DML commands, whether high level or low level, are embedded in a general-
purpose language:
o The programming language is the host language.
o The DML is called the data sublanguage.
• Example: SQL embedded in Python, Java, C++.

5. User Interaction with DBMS Languages


• Casual users: Use high-level query languages interactively.
• Programmers: Use embedded DMLs.
• Naive and parametric users: Use user-friendly interfaces.

DATABASE SYSTEM ENVIRONMENT


A DBMS is a complex software system interacting with various system components.
DBMS components can be divided into:
• User interfaces (for DBAs, application programmers, casual users, and
parametric users).
• Internal modules (for data storage and transaction processing).

DBMS Component Modules:


• Storage of Data
o Database and DBMS catalog are stored on disk.
o Disk access is controlled by the Operating System (OS).
o Many DBMSs have their own buffer management module to optimize
read/write operations.
o A high-level Stored Data Manager module of DBMS controls access to DBMS
information that is stored on disk whether part of database or catalog.
• User Interfaces
o DBA staff: Uses DDL to define database schemas and manage performance.
o Casual users: Use an interactive query interface to request data.
o Application programmers: Write/create programs using host programming
languages (Java, C, Python, PHP, etc.), embedding DML commands.
o Parametric users: Do data entry work by supplying parameters to predefined
transactions (canned transactions).
• DBMS Catalog
o Stores metadata (schema descriptions, file sizes, data types, constraints,
mapping info).
o Accessed by DBMS modules for query processing and optimization.

Query Processing & Optimization


• Query Compiler
o Parses and validates queries for correct syntax and data references.
o Converts queries into an internal form for execution.
• Query Optimizer
o Rearranges and optimizes query execution.
o Eliminates redundancies and chooses efficient search algorithms.
o Uses the system catalog for data access statistics.

Program Execution and Transaction Management


• Application programs are processed as follows:
1. Precompiler extracts DML commands from an application program written
in a host programming language.
2. DML Compiler converts DML commands into object code for database
access.
3. Host language code is compiled separately.
4. The compiled DML object code and host language code are linked to form a
canned transaction.
• Canned Transactions
o Executed repeatedly by parametric users via PCs or mobile apps (e.g., online
banking transactions).
o Each execution is treated as a separate transaction.
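
A canned transaction can be pictured as a fixed piece of database code whose parameters are supplied at run time. The sketch assumes SQLite via Python's sqlite3 module; the seat table, the flight number and the function book_seat are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seat ("
    " flight TEXT, seat_no TEXT, passenger TEXT,"
    " PRIMARY KEY (flight, seat_no))"
)
conn.executemany(
    "INSERT INTO seat VALUES (?, ?, NULL)",
    [("AI101", "12A"), ("AI101", "12B")],
)
conn.commit()


def book_seat(connection, flight, seat_no, passenger):
    """Canned transaction: the SQL is fixed, only the parameters vary."""
    with connection:  # each call runs as its own transaction
        cursor = connection.execute(
            "UPDATE seat SET passenger = ?"
            " WHERE flight = ? AND seat_no = ? AND passenger IS NULL",
            (passenger, flight, seat_no),
        )
    # Exactly one row is updated only if the seat was still free.
    return cursor.rowcount == 1


first = book_seat(conn, "AI101", "12A", "Asha")
second = book_seat(conn, "AI101", "12A", "Ravi")
print(first, second)
```

A parametric user, such as a reservation agent, only supplies the flight, seat and passenger; the second attempt on the same seat is refused.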

Runtime Database Processor


• Executes the privileged commands.
• Executes the executable query plans.
• Executes the canned transactions with runtime parameters.
• Works with System catalog (and updates with statistics).
• Works with stored data manager (handles disk-memory I/O).
• Some DBMSs have their own buffer management module whereas others depend
on the OS for buffer management.
• Works with the concurrency control and recovery modules (for transaction management).

Client-Server Architecture
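• The client program that accesses the DBMS often runs on a separate computer from
the one on which the database resides. The former is the client computer, which runs
DBMS client software; the latter is the database server.
• The database server hosts the DBMS and the database.
• An application server (middleware) may sit between the two: it accesses the
database server and handles requests between the client and the database server.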

Classification of Database Management Systems


DBMSs can be classified based on various criteria:
1. Based on Data Model
o Relational DBMS (RDBMS): Uses tables (SQL-based systems).
o Object DBMS: Uses objects and classes.
o Object-Relational DBMS: Hybrid of relational and object models.
o NoSQL DBMS:
▪ Key-Value Storage Systems (e.g., Redis)
▪ Document-Based (e.g., MongoDB)
▪ Graph-Based (e.g., Neo4j)
▪ Column-Based (e.g., Cassandra)
o Hierarchical & Network DBMS: Older models used in legacy systems.
o XML DBMS: Based on XML tree structures.
2. Based on Number of Users supported by the system
o Single-User System: Supports one user at a time.
o Multi-User System: Supports multiple concurrent users.
3. Based on the number of sites over which the Database is Distributed
o Centralized DBMS: Data stored at single computer site.
o Distributed DBMS (DDBMS): Data distributed across multiple sites.
▪ Homogeneous DDBMS: Same DBMS software across all sites.
▪ Heterogeneous DDBMS: Different DBMS software across sites.
▪ Federated DBMS: Middleware integrates multiple heterogeneous
databases.
4. Based on Cost
o Open-Source DBMS: Free (e.g., MySQL, PostgreSQL).
o Commercial DBMS: Paid with licensing (e.g., Oracle, SQL Server).
5. Based on types of Access Paths
o Inverted File DBMS: Uses inverted file structures as its access paths.
o General-Purpose vs. Special-Purpose DBMS (a further criterion, based on purpose):
When performance is a primary consideration, a special-purpose DBMS can be
designed and built for a specific application; such a system cannot be used for
other applications without major changes (e.g., airline reservation systems,
which fall under OLTP, Online Transaction Processing).
6. Legacy Database Models
o Network Model (CODASYL DBTG): Uses record types and set types.
o Hierarchical Model (IMS DL/1): Represents data as hierarchical tree
structures.
Each type of DBMS is suited for different use cases based on scalability, complexity, and
performance needs.
