DBMS
Introduction
Database - A database is a collection of related data. For example, when we speak of a database
of archeological artifacts, we mean that all the recorded facts about those artifacts are kept
there. A database can be physical or logical.
Data - By data, we mean known facts that can be recorded and that have implicit meaning.
We usually say, a database is a collection of records.
Implicit properties of a database:
a. It represents some aspect of the real world , sometimes called the miniworld or the
universe of discourse (UoD). Changes to the miniworld are reflected in the database.
b. It is a logically coherent collection of data with some inherent meaning. It is not a random
assortment of data.
c. It is designed and built for a specific purpose. It would have an intended group of users
and some preconceived applications.
It can be of any size and complexity.
It can be manual (library card) or computerized.
Users of a database range from administrators (DBAs) to simple transactional users.
Database Management System (DBMS): This is a collection of programs that enables
users to create and maintain a database. It is a general-purpose software system that
facilitates the processes of defining, constructing, manipulating, and sharing databases among
various users and applications. Tuning, security, and maintenance are other important
functions of a database management system.
Metadata - Data about data
Schema - A database schema is a structure that represents the logical storage of the data in a
database. It represents the organization of data and provides information about the relationships
between the tables in a given database.
Data dictionary - A component of an RDBMS that stores the names, definitions, and
attributes of the data elements used in the database. It holds the information about every
relation (table) in the database: its schema and the constraints defined on it. In general,
metadata refers to information about data. The relation schemas and all other metadata are
kept together in a single structure, called the data dictionary or system directory. A data
dictionary is like an A-Z dictionary of a relational database system: it can be consulted for
the description of every relation in the database.
RDBMS: Relational DBMS
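As a rough illustration of a data dictionary, the sketch below reads back the metadata that SQLite keeps about its own tables. SQLite is only an assumed example system here; other RDBMSs expose their catalogs differently. The `student` table and the helper name `describe_catalog` are made up for the example.

```python
import sqlite3


def describe_catalog(conn):
    """Return {table_name: [(column_name, declared_type), ...]} read from SQLite's catalog."""
    catalog = {}
    # sqlite_master is SQLite's built-in catalog table: one row per table, index, view, ...
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
        catalog[table_name] = [(col[1], col[2]) for col in columns]
    return catalog


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_number INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT)"
)

catalog = describe_catalog(conn)
print(catalog)
```

Note that no rows were inserted: the catalog describes the data (metadata), not the data itself.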
Users of a database:
a. DBA - Database administrators administer the database system and its related software.
They are responsible for authorizing access, coordinating and monitoring usage,
acquiring software and hardware as required, tuning the database from time to time, and
looking after security and response issues.
b. Database designers - they are responsible for identifying the data to be stored, choosing
the appropriate structures to represent and store the data, and building the database.
Nowadays, this work is often done by the DBAs during the early stages of database
development.
c. End users - the users who are implicitly accessing the database when carrying out a
transaction in the related application. Their actions typically involve querying, updating,
deleting and generating reports. The database primarily exists for their use.
d. System analysts/application programmers - These are software developers who build
the application that uses the database. They need to know all the functionalities, design
and data being provided by the DBMS.
File Systems: File-based record keeping was the older way of organizing data.
DBMS vs File Systems
Disadvantages of file-based systems:
a. Data redundancy and inconsistency: Over long periods of time, different programmers
create different applications and files to store similar data. So the same information may
be duplicated in several places and files of different formats over time. This leads to
redundancy. It can also lead to inconsistency, because the various copies of the same data
may no longer agree. For example, a changed customer address may be updated in one file but
not in the others.
b. Difficulty in accessing data: In file based systems, it becomes complex to query data
based on various parameters. While designing the application, only certain reports as
mentioned in the requirements would be programmed for. But as the usage of the
application increases, one may need various other reports. Generating these reports
may become very complex and time consuming on file based systems.
c. Data Isolation: Because data is scattered in various files, and the files may be in
different formats, writing new applications to retrieve the same may not always be
possible.
d. Integrity problems: Integrity constraints are enforced by code inside the individual
application programs, so it is difficult to add or change constraints when laws/policies
change or get added.
e. Atomicity problems: Some transactions must be atomic, such as a money transfer that
consists of a debit and a credit. Either both the debit and the credit occur, or neither
does; it must never happen that the debit occurred but the credit did not. In file-based
systems, if a failure occurs midway through the transaction, it is very difficult to ensure
atomicity.
f. Concurrent access anomalies: When two users access the same record and change it at
the same time, their updates can interfere and leave the data in an inconsistent state.
File-based systems offer little support for coordinating such concurrent access.
g. Security problems: Not all users should be able to access all data. Access to data
should be based on the user’s privileges, so that a user can view/modify only the
records that the user is authorized for. This is hard to enforce across scattered files.
These difficulties with file based systems prompted the development of database systems.
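The atomicity point from the list above can be sketched with a DBMS transaction. This is a minimal example assuming SQLite through Python's sqlite3 module; the `account` table, the account numbers, and the `transfer` helper are hypothetical. The failure is simulated by crediting an account that does not exist.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Debit src and credit dst; either both updates are kept or neither is."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, src),
            )
            credited = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, dst),
            )
            if credited.rowcount == 0:
                # The debit has already run; raising here makes the DBMS undo it.
                raise LookupError(f"no such account: {dst}")
        return True
    except LookupError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)]
)
conn.commit()

ok = transfer(conn, "A-101", "A-215", 200)      # debit and credit both applied
failed = transfer(conn, "A-101", "A-999", 100)  # credit fails, so the debit is rolled back

balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(ok, failed, balances)
```

After the failed transfer, A-101 still holds the balance it had before the attempt; the partial debit never becomes visible.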
A typical DBMS has a client-server architecture. It has a Server module and a Client module.
Data Model, Architecture:
Data Abstraction - refers to the suppression of details of data organisation and storage, and
highlighting of the essential features for an improved understanding of data.
Physical level: The lowest level of abstraction; it describes how the data are actually stored. The
physical level describes complex low-level data structures in detail, for example how integers,
floats, and strings are laid out in storage.
Logical level: The next level of abstraction describes what data are stored in the database and
the relationship(s) among them, for example name, age, number of pages, etc.
View level: This is the highest level of abstraction. This is at application level. Here the details of
the data are hidden. The system may provide many views for the same database.
In DBMS, we deal with abstracted data.
Database schema is the description of the database. This is specified during database design,
and is not expected to change frequently. The schema may evolve over time.
A displayed schema is called a schema diagram. A schema diagram displays only some
aspects of a schema. Example:
STUDENT (Name, Student_number, Class, Major)
COURSE (Course_name, Course_number, Credit_hours, Department)
PREREQUISITE (Course_number, Prerequisite_number)
SECTION (Section_identifier, Course_number, Semester, Year, Instructor)
GRADE_REPORT (Student_number, Section_identifier, Grade)
Each object in the schema, like STUDENT, COURSE, is called a schema construct.
The actual data in a database may change quite frequently. The data in the database at a
particular moment in time is called a database state or snapshot.
When we define a new database, we specify only its database schema to the DBMS. At this
point, the corresponding database state is the ‘empty state’ with no data. The initial state of the
database is obtained when the database is first populated or loaded with initial data.
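The difference between schema and state can be seen in a small sketch, here assuming SQLite via Python's sqlite3 module. The `course` table follows the COURSE construct from the schema diagram; the sample rows and the `snapshot` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining the database: only the schema is given to the DBMS.
conn.execute(
    "CREATE TABLE course ("
    " course_number TEXT PRIMARY KEY,"
    " course_name TEXT,"
    " credit_hours INTEGER)"
)


def snapshot(conn):
    """Return the current database state (all rows of course) as a list of tuples."""
    return conn.execute(
        "SELECT course_number, course_name, credit_hours"
        " FROM course ORDER BY course_number"
    ).fetchall()


empty_state = snapshot(conn)  # the schema exists, but there is no data yet

# Loading initial data gives the initial state.
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [("CS1310", "Intro to Computer Science", 4),
     ("MATH2410", "Discrete Mathematics", 3)],
)
conn.commit()
initial_state = snapshot(conn)

# Every update produces a new state; the schema itself is unchanged.
conn.execute("UPDATE course SET credit_hours = 4 WHERE course_number = 'MATH2410'")
conn.commit()
later_state = snapshot(conn)

print(empty_state, initial_state, later_state, sep="\n")
```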
3-Schema architecture
The goal of 3-schema architecture is to separate the user application and the physical
database. In this architecture, schemas can be defined at the following 3 levels:
a. Internal Schema - describes the physical storage structure of the database.
b. Conceptual Schema - describes the structure of the whole database for a community of
users. It hides the details of the physical storage structures and concentrates on
describing the entities, data types, relationships, constraints, and user operations. Usually a
representational data model is used to describe the conceptual schema.
c. External Schema (view level) - this level includes a number of external schemas or user
views. Each external schema describes the part of the database relevant to a user group, and
hides the details of the rest of the database.
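A loose sketch of the three levels, assuming SQLite via Python's sqlite3 module: a table stands in for the conceptual schema, a view for one external schema, and an index for an internal (storage) choice. The names `student_directory`, `idx_student_major`, and `read_directory`, and the sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical structure shared by all users.
conn.execute(
    "CREATE TABLE student ("
    " student_number INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(17, "Smith", "CS"), (8, "Brown", "CS")],
)

# External level: this user group sees names and majors only.
conn.execute("CREATE VIEW student_directory AS SELECT name, major FROM student")


def read_directory(conn):
    """Query the external view; return (column names, rows)."""
    cur = conn.execute("SELECT * FROM student_directory ORDER BY name")
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


view_columns, rows_before = read_directory(conn)

# Internal level: a storage detail changes (an index is added).
conn.execute("CREATE INDEX idx_student_major ON student (major)")
_, rows_after = read_directory(conn)

print(view_columns, rows_before, rows_after)
```

The application that reads `student_directory` is unaffected by the new index, which is the separation between user application and physical database that the architecture aims for.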
DBMS languages:
● Data Definition Language (DDL): The overall design of the database is called the database
schema. A database schema is specified by a set of definitions that are expressed using
a DDL.
● Data Manipulation Language (DML): This language enables users to access or
manipulate data.
● Storage Definition Language (SDL): Used to specify the internal schema.
● View Definition Language (VDL): Used to specify user views and their mappings to the
conceptual schema.
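In SQL, definition and manipulation statements belong to one language. The sketch below, assuming SQLite via Python's sqlite3 module, uses one DDL statement followed by the four common DML statements. The `section` table follows the SECTION construct from the schema diagram; the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute(
    "CREATE TABLE section ("
    " section_identifier INTEGER PRIMARY KEY,"
    " course_number TEXT NOT NULL,"
    " semester TEXT,"
    " year INTEGER,"
    " instructor TEXT)"
)

# DML: insert rows.
conn.executemany(
    "INSERT INTO section VALUES (?, ?, ?, ?, ?)",
    [(85, "MATH2410", "Fall", 2007, "King"),
     (92, "CS1310", "Fall", 2007, "Anderson"),
     (102, "CS3320", "Spring", 2008, "Knuth")],
)

# DML: update a row.
conn.execute("UPDATE section SET instructor = 'Chang' WHERE section_identifier = 92")

# DML: retrieve rows.
fall_sections = conn.execute(
    "SELECT section_identifier, instructor FROM section"
    " WHERE semester = 'Fall' ORDER BY section_identifier"
).fetchall()

# DML: delete rows.
conn.execute("DELETE FROM section WHERE year < 2008")
remaining = conn.execute("SELECT COUNT(*) FROM section").fetchone()[0]

print(fall_sections, remaining)
```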
Data Model - A data model is a collection of concepts that can be used to describe the structure
of a database. By ‘structure of a database’, we mean the data types, relationships, and
constraints that should hold for the data. Most data models also include a set of basic
operations for specifying retrievals and updates on the database.
Types of data model:
● Entity Relationship model - E-R model is based on perception of the real world that
consists of a collection of basic objects called entities, and of relationships among these
objects.
An entity is a thing or an object in the real world that is distinguishable from other
objects. Entities are described in a database by a set of attributes. For example, Customer
and Account may be entities in a database.
A relationship is an association among several entities. For example, a depositor
relationship associates a customer with each account that the customer holds.
The overall logical structure (schema) can be expressed graphically by an E-R diagram.
● Relational data model - A relational data model uses a collection of tables to represent
both data and the relationships among those data. Each table has multiple columns,
and each column has a unique name. It is an example of a record-based model. This is the
most widely used data model.
● Object oriented data model
● Network model
● Hierarchical model
● Physical model
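To connect the first two models in the list, here is one possible way the Customer, Account, and depositor example could be represented as relational tables. It assumes SQLite via Python's sqlite3 module; the column names, IDs, and rows are hypothetical, and other table designs are equally valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce REFERENCES clauses

conn.executescript("""
-- Entity sets become tables; attributes become columns.
CREATE TABLE customer (
    customer_id   TEXT PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE account (
    account_number TEXT PRIMARY KEY,
    balance        INTEGER NOT NULL
);
-- The depositor relationship also becomes a table: each row links one customer to one account.
CREATE TABLE depositor (
    customer_id    TEXT NOT NULL REFERENCES customer (customer_id),
    account_number TEXT NOT NULL REFERENCES account (account_number),
    PRIMARY KEY (customer_id, account_number)
);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("C1", "Johnson"), ("C2", "Smith")])
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-215", 700)])
conn.executemany("INSERT INTO depositor VALUES (?, ?)",
                 [("C1", "A-101"), ("C2", "A-215"), ("C2", "A-101")])
conn.commit()

# Following the relationship: which customer holds which account?
holders = conn.execute(
    "SELECT c.customer_name, a.account_number"
    " FROM depositor d"
    " JOIN customer c ON c.customer_id = d.customer_id"
    " JOIN account a ON a.account_number = d.account_number"
    " ORDER BY c.customer_name, a.account_number"
).fetchall()
print(holders)
```

Both the data (customers, accounts) and the relationship among them (depositor) end up as tables, which is the defining trait of the relational model described above.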
Relational model:
A relational database consists of a collection of tables, each of which is assigned a unique
name. A row in a table represents a relationship among a set of values. Since a table is a
collection of such relationships, there is a close correspondence between the concept of a table
and the mathematical concept of relation. Example,
ACCOUNT:
Account_number branch_name balance
Every row is a 3-tuple (v1, v2, v3), where v1 belongs to domain D1, v2 to D2, and v3 to D3.
In general, account is a subset of D1 x D2 x D3, the Cartesian product of the list of domains.
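The subset statement can be checked directly on a toy example. The domains below are deliberately tiny and invented; real domains (all possible account numbers, branch names, balances) are far larger, which is why the helper `is_relation_over` tests membership per component instead of building the whole product.

```python
from itertools import product

# Tiny, made-up domains for the three attributes of ACCOUNT.
D1 = {"A-101", "A-215"}          # account_number
D2 = {"Downtown", "Perryridge"}  # branch_name
D3 = {500, 700}                  # balance

# D1 x D2 x D3: every possible 3-tuple (v1, v2, v3).
cartesian = set(product(D1, D2, D3))

# The account relation holds only the tuples that are actually true in the miniworld.
account = {
    ("A-101", "Downtown", 500),
    ("A-215", "Perryridge", 700),
}


def is_relation_over(rows, domains):
    """True if every row has one value from each domain, in order."""
    return all(
        len(row) == len(domains) and all(v in d for v, d in zip(row, domains))
        for row in rows
    )


print(len(cartesian))                            # 2 * 2 * 2 possible tuples
print(account <= cartesian)                      # subset test on the full product
print(is_relation_over(account, [D1, D2, D3]))   # same check without the product
```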
Keys
We must have a way to specify how entities within an entity set are distinguished. The
attribute values of an entity must be such that they identify it uniquely. No two entities in an
entity set are allowed to have exactly the same value for all attributes.
Super key:
Candidate key
Primary key
Composite key
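The key types listed above are usually defined as follows: a super key is any set of attributes whose values identify a row uniquely; a candidate key is a minimal super key (no attribute can be dropped); the primary key is the candidate key the designer chooses; a composite key is a key made of more than one attribute. The sketch below tests these definitions against sample GRADE_REPORT rows. The rows and function names are invented, and a real key is a constraint on every legal state of the table, so a check on one sample can only suggest a key, never prove it.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs (in this sample)."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def candidate_keys(rows, attributes):
    """Return the minimal superkeys of this sample, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


ATTRIBUTES = ("student_number", "section_identifier", "grade")
grade_report = [
    {"student_number": 17, "section_identifier": 112, "grade": "B"},
    {"student_number": 17, "section_identifier": 119, "grade": "C"},
    {"student_number": 8, "section_identifier": 85, "grade": "A"},
    {"student_number": 8, "section_identifier": 92, "grade": "A"},
    {"student_number": 8, "section_identifier": 112, "grade": "B"},
]

keys = candidate_keys(grade_report, ATTRIBUTES)
print(keys)
print(is_superkey(grade_report, ATTRIBUTES))           # all attributes together
print(is_superkey(grade_report, ("student_number",)))  # one attribute alone
```

In this sample the only candidate key is the pair (student_number, section_identifier), which makes it a composite key and the natural choice of primary key.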
SQL relational algebra