Module 1
● A database is a collection of related data. By data, we mean known facts that can be recorded and that
have implicit meaning.
● For example, consider the names, telephone numbers, and addresses of the people you know.
● This collection of related data with an implicit meaning is a database.
● A database is a logically coherent collection of data with some inherent meaning. A random assortment
of data cannot correctly be referred to as a database.
● A database is designed, built, and populated with data for a specific purpose.
● In other words, a database has some source from which data is derived.
● The end users of a database may perform business transactions (for example, a customer buys a
camera) or events may happen (for example, an employee has a baby) that cause the information in the
database to change.
● A database can be of any size and complexity.
● An example of a large commercial database is Amazon.com.
● It contains data for over 20 million books, CDs, videos, DVDs, games, electronics, apparel, and other
items. The database occupies over 2 terabytes (a terabyte is 10^12 bytes of storage) and is stored
on 200 different computers (called servers). About 15 million visitors access Amazon.com each day
and use the database to make purchases.
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The
DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing
databases among various users and applications.
Defining a database involves specifying the data types, structures, and constraints of the data to be stored in the
database. The database definition or descriptive information is also stored by the DBMS in the form of a database
catalog or dictionary; it is called meta-data.
Constructing the database is the process of storing the data on some storage medium that is controlled by the DBMS.
Manipulating a database includes functions such as querying the database to retrieve specific data, updating the
database to reflect changes in the miniworld, and generating reports from the data.
Sharing a database allows multiple users and programs to access the database simultaneously.
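The defining, constructing, and manipulating steps described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for a general DBMS, and the table name, column names, and sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure, and constraints.
conn.execute(
    "CREATE TABLE student ("
    " student_number INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " class INTEGER NOT NULL)"
)

# Constructing: store the data under control of the DBMS.
conn.executemany(
    "INSERT INTO student (student_number, name, class) VALUES (?, ?, ?)",
    [(17, "Smith", 1), (8, "Brown", 2)],
)

# Manipulating: update the database, then query it.
conn.execute("UPDATE student SET class = 2 WHERE name = 'Smith'")
names_in_class_2 = [
    row[0]
    for row in conn.execute("SELECT name FROM student WHERE class = 2 ORDER BY name")
]
print(names_in_class_2)
conn.close()
```

Sharing is not shown here; it would require several connections or programs working against the same database at once.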
● An application program accesses the database by sending queries or requests for data to the DBMS. A query typically
causes some data to be retrieved; a transaction may cause some data to be read and some data to be written into the
database.
● Other important functions provided by the DBMS include protecting the database and maintaining it over a long period
of time.
Protection includes system protection against hardware or software malfunction (or crashes) and security protection
against unauthorized or malicious access.
● The DBMS must be able to maintain the database system by allowing the system to evolve as requirements change
over time.
Characteristics of Database Approach
● Traditional file processing system, e.g., a grade reporting office keeps files on students
and their grades, while an accounting office keeps track of students' fees and their
payments.
● Each user maintains separate files and the programs to manipulate these files.
● This leads to redundancy and wasted storage space.
● The database approach uses a single repository: data is defined once and then accessed by
various users.
● In a database, the names or labels of data are defined once and used repeatedly by
queries, transactions, and applications.
1. Self-describing nature of a database system
● A database system contains not only the database itself but also a definition and description of the database
structure and constraints.
● Catalog: this definition is stored in the DBMS catalog, which contains information
such as the structure of each file, the type and storage format of each data item, and
various constraints on the data. The catalog describes the structure of the primary database.
● Who uses the catalog? The DBMS software, and also database users who need information
about the database structure.
● A general-purpose DBMS works with any number of database applications, e.g., a university database, a college
database, or a company database.
● In traditional file processing, the data definition is typically part of the application programs,
so each program works with only one specific database. For example, an
application program written in C++ may have struct or class declarations, and a
COBOL program has data division statements to define its files.
● E.g., to access the name of a student record, the DBMS software refers to the catalog.
Fig: DB that stores student and course info
Fig:example of a DB catalog
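As a rough illustration of the self-describing property, the sketch below reads a table's definition back from the DBMS instead of from application code. It assumes SQLite, whose catalog is exposed through the sqlite_master table and the table_info pragma; other DBMSs expose their catalogs differently, and the student table here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_number INTEGER, name TEXT, class INTEGER)")

# SQLite keeps its catalog in the sqlite_master table.
catalog_tables = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(catalog_tables)
print(columns)
conn.close()
```

The program never declares the record layout itself; it asks the catalog for it at run time.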
2. Insulation between programs and data, and data abstraction
● In traditional file processing, the structure of data files is embedded in the application
programs, so any changes to the structure of a file may require changing all programs
that access that file.
● In a DBMS, the structure of data files is stored in the DBMS catalog separately from the access
programs. We call this property program-data independence.
● For example, a file access program may be written in such a way that it can access only
STUDENT records of one fixed structure. If we want to add another piece of data to each STUDENT record, say the
Birth_date, such a program will no longer work and must be changed. By contrast, in a DBMS
environment, we only need to change the description of STUDENT records in the catalog.
● The characteristic that allows program-data independence and program-operation
independence is called data abstraction. A DBMS provides users with a conceptual
representation of data that does not include many of the details of how the data is stored or how
the operations are implemented.
● A data model is a type of data abstraction that is used to provide this conceptual representation.
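The Birth_date example can be sketched as follows, assuming SQLite as the DBMS. The function student_names plays the role of an existing application program; its name, the table layout, and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_number INTEGER, name TEXT)")
conn.execute("INSERT INTO student VALUES (17, 'Smith')")


def student_names(connection):
    """An 'application program' that names only the columns it needs."""
    return [row[0] for row in connection.execute("SELECT name FROM student ORDER BY name")]


names_before = student_names(conn)

# Change the stored description of STUDENT records: add a birth date.
conn.execute("ALTER TABLE student ADD COLUMN birth_date TEXT")
conn.execute("INSERT INTO student VALUES (8, 'Brown', '1988-01-29')")

names_after = student_names(conn)  # the same program, unchanged
print(names_before, names_after)
conn.close()
```

Only the table description changed; the program that reads names did not have to be edited. A program that depended on the exact record layout (for example, one using SELECT * and fixed positions) would not enjoy the same insulation.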
3. Support of multiple views of the data
● A database typically has many users, each of whom may require a different
perspective or view of the database.
● What is a view? A view may be a subset of the database, or it may contain virtual data
that is derived from the database files but is not explicitly stored. For example, one
user of the database may be interested only in accessing and printing the transcript of
each student; the view for this user is shown in Figure 1.5(a). A second user, who is
interested only in checking that students have taken all the prerequisites of each
course for which they register, may require the view shown in Figure 1.5(b).
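A transcript-style view might look like the sketch below. It assumes SQLite; the two base tables, the view name transcript, and the sample rows are hypothetical and are not taken from Figure 1.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE grade_report (student_number INTEGER, course TEXT, grade TEXT);
INSERT INTO student VALUES (17, 'Smith'), (8, 'Brown');
INSERT INTO grade_report VALUES (17, 'MATH2410', 'B'), (8, 'CS1310', 'A');

-- The view stores no rows of its own; it is derived from the base tables.
CREATE VIEW transcript AS
    SELECT s.name AS student_name, g.course, g.grade
    FROM student AS s JOIN grade_report AS g USING (student_number);
""")

brown_transcript = conn.execute(
    "SELECT course, grade FROM transcript WHERE student_name = 'Brown'"
).fetchall()
print(brown_transcript)
conn.close()
```

The user of the view works with student names, courses, and grades without needing to know that the data lives in two separate tables.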
4. Sharing of data and multiuser transaction processing
● A multiuser DBMS must allow multiple users to access the database at the same time.
● Concurrency control: the DBMS includes software to ensure that several users trying to update the same data
do so in a controlled manner so that the result of the updates is correct.
● For example, when several reservation agents try to assign a seat on an airline flight, the DBMS
should ensure that each seat can be accessed by only one agent at a time for assignment to a
passenger.
● These types of applications are generally called online transaction processing (OLTP)
applications.
● The isolation property ensures that each transaction appears to execute in isolation from other
transactions, even though hundreds of transactions may be executing concurrently.
● The atomicity property ensures that either all the database operations in a transaction are
executed or none are.
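The seat-assignment example can be sketched as a conditional update, assuming SQLite. This is a simplification: both "agents" share one connection and run one after the other, whereas a real multiuser DBMS relies on its own locking or other concurrency control. The seat table and the assign_seat helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no TEXT PRIMARY KEY, passenger TEXT)")
conn.execute("INSERT INTO seat VALUES ('12A', NULL)")
conn.commit()


def assign_seat(connection, seat_no, passenger):
    """Assign the seat only if it is still free; return True on success."""
    with connection:  # commits on success, rolls back on an exception
        cursor = connection.execute(
            "UPDATE seat SET passenger = ? WHERE seat_no = ? AND passenger IS NULL",
            (passenger, seat_no),
        )
        return cursor.rowcount == 1


first = assign_seat(conn, "12A", "passenger of agent 1")
second = assign_seat(conn, "12A", "passenger of agent 2")
holder = conn.execute("SELECT passenger FROM seat WHERE seat_no = '12A'").fetchone()[0]
print(first, second, holder)
conn.close()
```

Because the test for "still free" and the assignment happen in a single statement, the second agent cannot overwrite the first agent's assignment.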
Actors on the scene
In large organizations, many people are involved in the design, use, and maintenance of a
large database with hundreds of users. In this section we identify the people whose jobs
involve the day-to-day use of a large database; we call them the actors on the scene.
1. Database Administrators
● In any organization where many people use the same resources, there is a need for a
chief administrator to oversee and manage these resources.
● In a database environment, the primary resource is the data, and the secondary resource is the DBMS
and related software.
● The DBA is responsible for authorizing access to the database, coordinating and monitoring its
use, and acquiring software and hardware resources as needed.
● The DBA is accountable for problems such as security breaches and poor system
response time.
2. Database Designers
● Database designers are responsible for identifying the data to be stored in the database
and for choosing appropriate structures to represent and store this data.
● It is the responsibility of database designers to communicate with all prospective
database users in order to understand their requirements and to create a design that
meets these requirements.
● In many cases, the designers are on the staff of the DBA and may be assigned other
staff responsibilities after the database design is completed.
3. End users
● End users access the database for querying, updating, and generating reports. Categories:
● Casual end users: occasionally access the database, but they may need different information each
time.
● Naive or parametric end users: their main job function revolves around constantly
querying and updating the database, using standard types of queries and
updates called canned transactions.
Examples:
1. Bank tellers check account balances and post withdrawals and deposits.
2. Reservation agents for airlines, hotels, and car rental companies check availability for a
given request and make reservations.
● Sophisticated end users: include engineers, scientists, business analysts, and others
who thoroughly familiarize themselves with the facilities of the DBMS in order to
implement their own applications to meet their complex requirements.
2. Tool developers
Design and implement tools, the software packages that support database modeling and design and
database system design. Tools are optional packages that are often purchased
separately. They include packages for database design, performance monitoring, and
natural language or graphical interfaces.
3. Operators and maintenance personnel
Run and maintain the hardware and software environment, and handle
backup and recovery, security, performance improvement, availability, etc.
Advantages of using the DBMS Approach
1. Controlling Redundancy
In traditional file processing, every user group maintains its own files for handling its data-processing applications;
this may lead to duplicate data across their files.
This redundancy in storing the same data multiple times leads to several problems.
First, there is the need to perform a single logical update—such as entering data on a
new student—multiple times: once for each file where student data is recorded. This
leads to duplication of effort. Second, storage space is wasted when the same data is
stored repeatedly, and this problem may be serious for large databases. Third, files
that represent the same data may become inconsistent.
For example, one user group may enter a student’s birth date erroneously as
‘JAN-19-1988’, whereas the other user groups may enter the correct value of
‘JAN-29-1988’.
2. Restricting unauthorized access
● When multiple users share a large database, it is likely that most users will not be
authorized to access all information in the database.
● For example, financial data such as salaries and bonuses is often considered
confidential, and only authorized persons are allowed to access such data. In addition, some users
may be permitted only to retrieve data, whereas others may both retrieve and update it.
● Users or user groups are given account numbers protected by passwords, which they
can use to gain access to the database. A DBMS should provide a security and
authorization subsystem, which the DBA uses to create accounts and to specify
account restrictions.
3. Providing Persistent Storage for Program Objects
● Programming languages such as C++ and Java have complex data structures; object-oriented
database systems can store such program objects persistently.
● Traditional file system
i) The values of program variables or objects are discarded once a program terminates, unless the
programmer explicitly stores them in permanent files, which often means converting them into a format suitable for file storage.
● DBMS
i) Once the program terminates, the values of program objects are not discarded; the DBMS stores each object
permanently. Such an object is said to be persistent.
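The idea of a persistent object can be sketched with Python's standard shelve module. This is an analogy only: shelve is a simple persistent dictionary, not an object-oriented DBMS, and the key and the stored structure are made up for the example. The two with-blocks stand in for two separate program runs.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "objects")

    # "First program run": build a nested object and store it.
    with shelve.open(path) as store:
        store["student:8"] = {"name": "Brown", "courses": ["CS1310", "MATH2410"]}

    # "Second program run": the object survives after the first run has ended,
    # and the programmer wrote no file-format conversion code.
    with shelve.open(path) as store:
        restored = store["student:8"]

print(restored["courses"])
```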
5. Providing backup and recovery
● A DBMS must provide facilities for recovering from hardware or software failures.
The backup and recovery subsystem of the DBMS is responsible for recovery.
● For example, if the computer system fails in the middle of a complex update
transaction, the recovery subsystem is responsible for making sure that the database is
restored to the state it was in before the transaction started executing.
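A failure in the middle of an update transaction can be imitated as below, assuming SQLite. The exception only simulates a crash; recovery from a real hardware or software failure depends on the DBMS's logging and backup facilities. The account table and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(connection, source, target, amount, fail_midway=False):
    """Move money between accounts as one all-or-nothing transaction."""
    with connection:  # rolls back every change if an exception escapes
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
        )
        if fail_midway:
            raise RuntimeError("simulated failure in the middle of the transaction")
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
        )


try:
    transfer(conn, 1, 2, 30, fail_midway=True)
except RuntimeError:
    pass

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(balances)
conn.close()
```

The first update had already been applied when the failure occurred, yet the balances are back to their values from before the transaction started.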
6. Providing multiple user interfaces
● Many types of users with varying levels of technical knowledge use a database, so a DBMS should
provide a variety of user interfaces:
● Apps for mobile users
● Query languages for casual users
● Programming language interfaces for application programmers
● Forms and command codes for parametric users
● Menu-driven and natural language interfaces for standalone users
● Both forms-style and menu-driven interfaces are commonly known as graphical user interfaces (GUIs).
7. Representing complex relationships among data
● A database may include numerous varieties of data that are interrelated in many ways.
- For example, the record for ‘Brown’ in the STUDENT file is related to four records in the
GRADE_REPORT file.
- A DBMS must have the capability to represent a variety of complex relationships among the
data, to define new relationships as they arise, and to retrieve and update related data
easily and efficiently.
8. Enforcing Integrity Constraints
● Constraints are the restrictions imposed on the data.
● A DBMS should provide capabilities for defining and enforcing these constraints. The
simplest type of integrity constraint involves specifying a data type for each data item.
● E.g., in a STUDENT record, Class must be a one-digit integer and Name must be a string of alphabetic characters.
● Enforcing integrity constraints is vital for maintaining data accuracy and consistency.
Common constraints include primary key, foreign key, unique, and check constraints, as well as
entity integrity and referential integrity rules; together they ensure that data
adheres to predefined rules.
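The constraint types listed above can be tried out with a short sketch, assuming SQLite (which enforces foreign keys only after PRAGMA foreign_keys = ON). The tables, the one-digit rule for class, and the is_rejected helper are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (
    student_number INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    class INTEGER NOT NULL CHECK (class BETWEEN 1 AND 9)
);
CREATE TABLE grade_report (
    student_number INTEGER NOT NULL REFERENCES student(student_number),
    grade TEXT
);
INSERT INTO student VALUES (8, 'Brown', 2);
""")


def is_rejected(statement):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        with conn:
            conn.execute(statement)
    except sqlite3.IntegrityError:
        return True
    return False


bad_class = is_rejected("INSERT INTO student VALUES (9, 'Lee', 12)")    # violates CHECK
duplicate_key = is_rejected("INSERT INTO student VALUES (8, 'Kim', 1)")  # violates PRIMARY KEY
orphan_grade = is_rejected("INSERT INTO grade_report VALUES (99, 'A')")  # violates FOREIGN KEY
valid_row = is_rejected("INSERT INTO grade_report VALUES (8, 'A')")      # satisfies all rules
print(bad_class, duplicate_key, orphan_grade, valid_row)
```

The rules are declared once in the schema, and the DBMS checks them for every program that updates the data.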
9. Permitting inferencing and action using rules
DB
● Instance: a specific occurrence or individual item of data within a defined
structure or schema.
For example, if we have a schema construct called "Student," each individual student record stored
in the database is an instance of the "Student" schema. Similarly, if
we have a schema construct called "Product," each item in the database
representing a specific product is an instance of the "Product" schema.
● When we define a new database, we specify its database schema only to the
DBMS. At this point, the corresponding database state is the empty state with no
data. We get the initial state of the database when the database is first populated
or loaded with the initial data. From then on, every time an update operation is applied to the
database, we get another database state. At any point in time, the database has
a current state.
● The DBMS is partly responsible for ensuring that every state of the database is a
valid state, that is, a state that satisfies the structure and constraints specified in
the schema.
● The DBMS stores the descriptions of the schema constructs and constraints, also
called the meta-data, in the DBMS catalog so that DBMS software can refer to the
schema whenever it needs to. The schema is sometimes called the intension, and a
database state is called an extension of the schema.
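The difference between the schema (intension) and the successive database states (extension) can be observed in a small sketch, assuming SQLite. The helper functions schema and state, the table, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT)")


def schema(connection):
    """Return the stored definition of the student table (the intension)."""
    return connection.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'"
    ).fetchone()[0]


def state(connection):
    """Return the current rows of the student table (the extension)."""
    return connection.execute(
        "SELECT student_number, name FROM student ORDER BY student_number"
    ).fetchall()


schema_at_definition = schema(conn)
empty_state = state(conn)                # schema defined, no data yet

conn.execute("INSERT INTO student VALUES (17, 'Smith')")
initial_state = state(conn)              # database first populated

conn.execute("UPDATE student SET name = 'Smythe' WHERE student_number = 17")
current_state = state(conn)              # a new state after each update

schema_unchanged = schema(conn) == schema_at_definition
print(empty_state, initial_state, current_state, schema_unchanged)
conn.close()
```

The state changes with every update, while the schema stays the same throughout.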
Three-Schema Architecture
Internal Schema
Conceptual Schema
External Schema
Data Independence
Logical Data Independence
Logical data independence is the capacity to change the conceptual schema without having to change
the external schemas or application programs. For example, the conceptual schema may be expanded
by adding a record type or data item; external schemas that refer only to the remaining data are
not affected, so user interfaces are preserved.
Physical Data Independence
Physical data independence is the capacity to change the internal schema without having to change
the conceptual schema. Hence, the external schemas need not be changed as well. Changes to the
internal schema may be needed because some physical files were reorganized (for example, by
creating additional access structures) to improve the performance of retrieval or update.
Because each level can change independently of the levels above it, the database system is easier
to maintain and to adapt as requirements evolve.
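Physical data independence can be illustrated by adding an access structure (an index) and checking that the same query still returns the same answer. The sketch assumes SQLite; the index name idx_student_name and the sample table are hypothetical, and the exact wording of SQLite's query plan text varies between versions, so it is only inspected for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(17, "Smith"), (8, "Brown")])

QUERY = "SELECT student_number FROM student WHERE name = ?"


def run_query(connection):
    """The unchanged query, as an application would issue it."""
    return connection.execute(QUERY, ("Brown",)).fetchall()


def query_plan(connection):
    """Return SQLite's description of how it would execute the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("Brown",)).fetchall()
    return " ".join(row[-1] for row in rows)


result_before, plan_before = run_query(conn), query_plan(conn)

# Internal-level change: create an additional access structure.
conn.execute("CREATE INDEX idx_student_name ON student (name)")

result_after, plan_after = run_query(conn), query_plan(conn)
print(result_before == result_after)
print("idx_student_name" in plan_after)
conn.close()
```

The query text and its result are untouched; only the way the DBMS retrieves the data has changed.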
Database Languages and Interfaces
- Once the design of a database is complete and a DBMS is chosen to implement the
database, the first step is to specify conceptual and internal schemas for the database and
any mappings between the two.
Other utilities may be available for sorting files, handling data compression, monitoring
access by users, interfacing with the network, and performing other functions.
Tools, Application Environments, and Communications Facilities
• CASE tools are used in the design phase of database systems
• Data dictionary (or data repository) system for storing catalog information about
schemas and constraints
• Information repository stores information such as design decisions, usage standards,
application program descriptions, and user information
• Application development environment systems provide an environment for
developing database applications, including database design, GUI development,
querying and updating, and application program development
• Communications software, whose function is to allow users at locations remote
from the database system site to access the database through computer terminals,
workstations, or personal computers. These are connected to the database site through
data communications hardware such as Internet routers, phone lines, long-haul
networks, local networks, or satellite communication devices. The integrated DBMS
and data communications system is called a DB/DC system
Centralized and Client/Server Architectures for DBMSs
Centralized DBMS Architecture
• Earlier architectures used mainframe computers to provide the main
processing for all system functions
• Over time, most users replaced their terminals with PCs and
workstations
• At first, database systems used these computers similarly to how they had used display
terminals
• As a result, the DBMS itself was still a centralized DBMS in which all the DBMS
functionality, application program execution, and user interface processing
were carried out on one machine
A second approach to two-tier client/server architecture was taken by some object-oriented DBMSs, where the
software modules of the DBMS were divided between client and server.
The server level may include the part of the DBMS software responsible for handling data storage on disk pages,
local concurrency control and recovery, buffering and caching of disk pages, and other such functions.
The client level may handle the user interface; data dictionary functions; DBMS interactions with programming
language compilers; global query optimization, concurrency control, and recovery across multiple servers;
and structuring of complex objects from the data in the buffers.
The architectures described here are called two-tier architectures because the software components are distributed
over two systems: client and server. The advantages of this architecture are its simplicity and seamless compatibility
with existing systems.
Three-Tier and n-Tier Architectures for Web Applications
• Many Web applications use an architecture called the three-tier architecture, which
adds an intermediate layer between the client and the database server
• This intermediate layer or middle tier is called the application server or the Web
server, depending on the application
• This server plays an intermediary role by running application programs and storing
business rules (procedures or constraints) that are used to access data from the
database server
• It can also improve database security by checking a client’s credentials before
forwarding a request to the database server
• Clients contain GUI interfaces and some additional application-specific business rules
• The intermediate server accepts requests from the client, processes the request and
sends database queries and commands to the database server, and then acts as a
conduit for passing (partially) processed data from the database server to the clients
• Thus, the user interface, application rules, and data access act as the three tiers
• The presentation layer displays information to the user and allows data entry
• The business logic layer handles intermediate rules and constraints before data is
passed up to the user or down to the DBMS
• The bottom layer includes all data management services. The middle layer can also
act as a Web server, which retrieves query results from the database server and
formats them into dynamic Web pages that are viewed by the Web browser at the
client side
• If the business logic layer is divided into multiple layers, the result is called an n-tier architecture
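The division of work among the three tiers can be sketched in a single Python file. This is a toy model: in a real deployment each tier runs on its own machine and they communicate over a network. The function names, the token check, and the product table are all assumptions made for the illustration.

```python
import sqlite3

# Bottom tier: data management services (here, an in-memory SQLite database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (name TEXT PRIMARY KEY, price_cents INTEGER)")
db.execute("INSERT INTO product VALUES ('camera', 19900)")

# Hypothetical credential store used by the middle tier.
VALID_TOKENS = {"secret-token"}


def application_server(token, product_name):
    """Middle tier: check credentials, apply business rules, query the database."""
    if token not in VALID_TOKENS:
        return {"status": "denied"}  # request never reaches the database server
    row = db.execute(
        "SELECT price_cents FROM product WHERE name = ?", (product_name,)
    ).fetchone()
    if row is None:
        return {"status": "not found"}
    return {"status": "ok", "name": product_name, "price_cents": row[0]}


def client(token, product_name):
    """Top tier: presentation only; formats whatever the middle tier returns."""
    reply = application_server(token, product_name)
    if reply["status"] != "ok":
        return f"Request failed: {reply['status']}"
    return f"{reply['name']}: ${reply['price_cents'] / 100:.2f}"


page = client("secret-token", "camera")
refused = client("wrong-token", "camera")
print(page)
print(refused)
```

The client never issues SQL itself, and the credential check happens before any request is forwarded to the database tier.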
Classification of DBMS
Data Model
- Main data models used in commercial DBMSs, e.g., the relational data model and the object data model
- Many legacy applications still run on database systems based on the hierarchical and
network data models
Number of users
- Single-user systems support only one user at a time and are mostly used with PCs
- Multiuser systems, which include the majority of DBMSs, support concurrent multiple
users
Number of sites
- Centralized DBMS : the data is stored at a single computer site
- Distributed DBMS [DDBMS] : DBMS software distributed over many sites
- Homogeneous DDBMSs use the same DBMS software at all the sites
- Heterogeneous DDBMSs can use different DBMS software at each site
Cost
- Open source, such as MySQL and PostgreSQL
- Free 30-day copy versions
- Commercial systems sold in the form of licenses