DBMS Unit 1
Rajam, AP REV.: 00
(An Autonomous Institution Affiliated to JNTUGV, AP)
1. Objective
To impart knowledge of database management systems and their domain.
5. Evocation
6. Deliverables
Lecture-1
Introduction to DBMS:
A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. The primary goal of a DBMS is to provide a
way to store and retrieve database information that is both convenient and efficient. Database
systems are designed to manage large bodies of information. Management of data involves
both defining structures for storage of information and providing mechanisms for the
manipulation of information. In addition, the database system must ensure the safety of the
information stored, despite system crashes or attempts at unauthorized access. If data are to
be shared among several users, the system must avoid possible anomalous results. Because
information is so important in most organizations, computer scientists have developed a large
body of concepts and techniques for managing data. These concepts and techniques form the
focus of this book. This chapter briefly introduces the principles of database systems.
Consider a savings bank that keeps its customer and account information in permanent files
and uses application programs to manipulate those files.
System programmers wrote these application programs to meet the needs of the bank. New
application programs are added to the system as the need arises. For example, suppose that
the savings bank decides to offer checking accounts. As a result, the bank creates new
permanent files that contain information about all the checking accounts maintained in the
bank, and it may have to write new application programs to deal with situations that do not
arise in savings accounts, such as overdrafts. Thus, as time goes by, the system acquires more
files and more application programs. This typical file-processing system is supported by a
conventional operating system. The system stores permanent records in various files, and it
needs different application programs to extract records from, and add records to, the
appropriate files.
Before database management systems (DBMSs) came along, organizations usually stored
information in such systems. Keeping organizational information in a file-processing system
has a number of major disadvantages.
Data redundancy and inconsistency: Since different programmers create the files and
application programs over a long period, the various files are likely to have different formats
and the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places (files). For example, the address and
telephone number of a particular customer may appear in a file that consists of savings-
account records and in a file that consists of checking-account records. This redundancy leads
to higher storage and access cost. In addition, it may lead to data inconsistency.
Data inconsistency: The various copies of the same data may no longer agree. For example, a
changed customer address may be reflected in savings-account records but not elsewhere
in the system.
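The inconsistency described above can be reproduced in a few lines. The following is only a sketch: the two dictionaries stand in for the savings and checking files, and the customer id and addresses are invented for illustration.

```python
# Hypothetical in-memory stand-ins for two separate files that both
# store the same customer's address (all values are illustrative).
savings_file = {"C-101": {"address": "12 Main St", "balance": 500}}
checking_file = {"C-101": {"address": "12 Main St", "balance": 200}}

# Only the savings-account program is asked to record the new address.
savings_file["C-101"]["address"] = "98 Lake Rd"

# The two copies of the same fact now disagree.
inconsistent = (
    savings_file["C-101"]["address"] != checking_file["C-101"]["address"]
)
print(inconsistent)
```

Storing the address once and letting both account records refer to it removes the second copy, and with it the chance of disagreement.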
Difficulty in accessing data: Suppose that one of the bank officers needs to find out the
names of all customers who live within a particular postal-code area. The officer asks the data
processing department to generate such a list. Because the designers of the original system
did not anticipate this request, there is no application program on hand to meet it. There is,
however, an application program to generate the list of all customers. The bank officer now
has two choices: either obtain the list of all customers and extract the needed information
manually, or ask a system programmer to write the necessary application program. Both
alternatives are obviously unsatisfactory.
Suppose that such a program is written, and that, several days later, the same officer needs
to trim that list to include only those customers who have an account balance of $10,000 or
more. As expected, a program to generate such a list does not exist.
Again, the officer has the preceding two options, neither of which is satisfactory. The point
here is that conventional file-processing environments do not allow needed data to be
retrieved in a convenient and efficient manner. More responsive data-retrieval systems are
required for general use.
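For contrast, here is a minimal sketch of how a DBMS with a query language handles both of the officer's requests without a new application program. It uses Python's built-in sqlite3 module; the table name, column names, and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, postal_code TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Asha", "532127", 12000.0),
     ("Ravi", "532127", 800.0),
     ("Meena", "530001", 25000.0)],
)

def customers_in_area(postal_code, min_balance=0.0):
    """Names of customers in a postal-code area with at least min_balance."""
    rows = conn.execute(
        "SELECT name FROM customer "
        "WHERE postal_code = ? AND balance >= ? ORDER BY name",
        (postal_code, min_balance),
    )
    return [name for (name,) in rows]

in_area = customers_in_area("532127")               # first request
in_area_10k = customers_in_area("532127", 10000)    # trimmed list, days later
print(in_area, in_area_10k)
```

The second request only changes a condition in the query; nothing has to be rewritten or recompiled.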
Data isolation: Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
Integrity problems: The data values stored in the database must satisfy certain types of
consistency constraints. For example, the balance of a bank account may never fall below a
prescribed amount (say, $25). Developers enforce these constraints in the system by adding
appropriate code in the various application programs. However, when new constraints are
added, it is difficult to change the programs to enforce them. The problem is compounded
when constraints involve several data items from different files.
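In a DBMS, such a constraint can be declared once with the data instead of being repeated in every program. A minimal sketch using Python's sqlite3 module follows; the account table and its columns are assumed names, and the $25 minimum is the figure from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_no TEXT PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 25))"  # declared once, here
)
conn.execute("INSERT INTO account VALUES ('A-1', 100)")

try:
    # Any program that tries to break the rule is stopped by the DBMS.
    conn.execute("UPDATE account SET balance = 10 WHERE account_no = 'A-1'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(rejected, balance)
```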
Atomicity problems: A computer system, like any other mechanical or electrical device, is
subject to failure. In many applications, it is crucial that, if a failure occurs, the data be
restored to the consistent state that existed prior to the failure. Consider a program to
transfer $50 from account A to account B. If a system failure occurs during the execution of
the program, it is possible that the $50 was removed from account A but was not credited to
account B, resulting in an inconsistent database state. Clearly, it is essential to database
consistency that either both the credit and debit occur, or that neither occur. That is, the funds
transfer must be atomic: it must happen in its entirety or not at all. It is difficult to ensure
atomicity in a conventional file-processing system.
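The transfer example can be sketched with a transaction. This is an illustration only, using Python's sqlite3 module: the table layout, the starting balances, and the fail_midway flag (which simulates a crash between the debit and the credit) are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 20.0)])
conn.commit()

def transfer(src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one unit; returns True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except RuntimeError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer("A", "B", 50, fail_midway=True)
after_failure = balances()   # debit was undone: nothing changed
ok = transfer("A", "B", 50)
after_success = balances()   # both debit and credit applied
print(after_failure, after_success)
```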
Concurrent-access anomalies: For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. In such an
environment, interaction of concurrent updates may result in inconsistent data.
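One such anomaly is the lost update. The sketch below replays a fixed interleaving of two withdrawals by hand (no real threads are used), so the outcome is deterministic; the starting balance and amounts are invented for illustration.

```python
start_balance = 100

# Interleaved schedule: both clerks read before either writes.
balance = start_balance
read_by_clerk_1 = balance
read_by_clerk_2 = balance
balance = read_by_clerk_1 - 50   # clerk 1 writes 50
balance = read_by_clerk_2 - 30   # clerk 2 overwrites it with 70
interleaved_result = balance     # clerk 1's withdrawal is lost

# Serial schedule: clerk 2 reads only after clerk 1 has written.
balance = start_balance
balance = balance - 50
balance = balance - 30
serial_result = balance

print(interleaved_result, serial_result)
```

A DBMS prevents the first outcome by controlling how concurrent transactions are allowed to interleave.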
Security problems: Not every user of the database system should be able to access all the
data. For example, in a banking system, payroll personnel need to see only that part of the
database that has information about the various bank employees. They do not need access to
information about customer accounts. But, since application programs are added to the
system in an ad hoc manner, enforcing such security constraints is difficult.
These difficulties, among others, prompted the development of database systems. In what
follows, we shall see the concepts and algorithms that enable database systems to solve the
problems with file-processing systems. In most of this book, we use a bank enterprise as a
running example of a typical data-processing application found in a corporation.
Instances and Schemas:
Databases change over time as information is inserted and deleted.
The collection of information stored in the database at a particular moment is called an
instance of the database. The overall design of the database is called the database schema.
Schemas are changed infrequently, if at all. The concept of database schemas and instances
can be understood by analogy to a program written in a programming language.
A database schema corresponds to the variable declarations (along with associated type
definitions) in a program. Each variable has a particular value at a given instant. The values of
the variables in a program at a point in time correspond to an instance of a database schema.
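The analogy can be made concrete with a short sketch. The Account class and its field names are hypothetical; the class declaration plays the role of the schema, and the lists of objects play the role of instances at two moments in time.

```python
from dataclasses import dataclass, fields

@dataclass
class Account:              # the "schema": attribute names and types
    account_number: str
    balance: float

# Two "instances": the stored values at two different moments.
instance_monday = [Account("A-101", 500.0)]
instance_tuesday = instance_monday + [Account("A-215", 700.0)]

schema = [f.name for f in fields(Account)]   # unchanged between the two days
print(schema, len(instance_monday), len(instance_tuesday))
```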
Lecture-2
Abstraction is one of the main features of database systems. Hiding irrelevant details from
users and giving them an abstract view of the data makes user-database interaction easy
and efficient.
Data Abstraction:
The process of hiding irrelevant information at each level of a database is known as data
abstraction. Data abstraction in a DBMS is very helpful in dealing with a complex database
system because it breaks the problem into sub-problems, which makes it easier to manage.
Example: We use Google daily but we have no idea how it stores its data. Information such as
how and where Google stores its data is irrelevant to us, which is why it is hidden
from us. This is known as data abstraction.
Three levels of Data abstraction:
Physical Level:
This is the first or lowest level of abstraction
It defines how data is stored in the system memory
It tells the actual location of the data that is being stored by the user
The Database Administrator (DBA) manages the physical level. The DBA decides certain
things, such as the drive where the data will actually be stored and whether
the storage will be centralized or decentralized.
How the database is managed at the physical level depends entirely on the DBA.
Access modes such as sequential or random access, and file organization methods such as B+
trees, indexing, and hashing, are implemented at this level.
Logical Level or Conceptual Level:
This level describes what data is stored in the database and what relationships exist
among these data.
In simple words, we create the blueprint of the database at the logical level.
The logical level describes the structure of the entire database.
The DBA uses the logical level for abstraction purposes.
Example:
To store the data of students, the columns in the student table will be
stu_name, stu_age, stu_rno, stu_mail.
The data itself is stored in the database, but the structure of tables such as the student table,
teacher table, book table, etc. is defined at the logical level.
How the tables are related to each other is also defined here.
View level:
This is the highest level of data abstraction. It describes the user's interaction with the
database system.
Different views of the same database can be created so that each user can interact with the
database in a user-friendly way.
The application program (which general users use) shows the data according to
the user's role. Data that is irrelevant to a user is hidden from that user's view. This is easier to
understand with an example.
Example:
In a university system with a login id and password, students can view only their scores,
courses, attendance and other details that are relevant to them. Students cannot view the
teacher's salary, cannot edit the marks and cannot update the attendance, because that data is
not meant for them. Teachers, however, can view every detail of the students as well as their
own data.
Here we create two separate views, one for the students and the other for the teachers,
each with the appropriate set of data. Doing so also increases the security of the system.
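The two views in the example can be sketched in SQL through Python's sqlite3 module. The table names, view names, and columns are assumptions. Note also that SQLite has no user accounts, so this only shows what each view exposes; in a multi-user DBMS, access would additionally be restricted by granting privileges on the views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (stu_rno INTEGER PRIMARY KEY, stu_name TEXT, score INTEGER);
CREATE TABLE teacher (t_id INTEGER PRIMARY KEY, t_name TEXT, salary REAL);
INSERT INTO student VALUES (1, 'Asha', 88);
INSERT INTO teacher VALUES (10, 'Dr. Rao', 90000);

-- Student-facing view of teachers: the salary column is left out.
CREATE VIEW teacher_for_students AS SELECT t_id, t_name FROM teacher;
-- Teacher-facing view of students: all student details.
CREATE VIEW student_for_teachers AS SELECT stu_rno, stu_name, score FROM student;
""")

def columns_of(view_name):
    """Column names that a query on the given view can see."""
    cur = conn.execute(f"SELECT * FROM {view_name}")
    return [col[0] for col in cur.description]

student_sees = columns_of("teacher_for_students")
teacher_sees = columns_of("student_for_teachers")
print(student_sees, teacher_sees)
```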
Data Independence:
If a database system is not multi-layered, it becomes difficult to make any changes to it.
Database systems are therefore designed in multiple layers, as we learnt earlier. A
database system normally contains a lot of data in addition to users’ data. For example, it
stores data about data, known as metadata, to locate and retrieve data easily. It is rather
difficult to modify or update a set of metadata once it is stored in the database. But as a DBMS
expands, it needs to change over time to satisfy the requirements of the users. If all the
data were interdependent, making such changes would be a tedious and highly complex job.
Metadata itself follows a layered architecture, so that a change to data at one layer
does not affect the data at another layer. The layers are independent of, but mapped to, each other.
Logical Data Independence: Logical data is data about the database, that is, it stores information
about how data is managed inside. An example is a table (relation) stored in the database
together with all the constraints applied on that relation.
Logical data independence is a mechanism that decouples this logical description from the
actual data stored on the disk. If we make some changes to the table format, the data
residing on the disk should not change.
Physical Data Independence: All the schemas are logical, and the actual data is stored in bit
format on the disk. Physical data independence is the ability to change the physical data
without impacting the schema or logical data.
For example, if we want to change or upgrade the storage system itself, say by replacing
hard disks with SSDs, it should not have any impact on the logical data or
schemas.
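A small experiment illustrates physical data independence. Swapping disks cannot be shown in code, so this sketch substitutes a different physical change: adding an index. The query text and its result stay the same while the access path changes. Table, column, and index names are assumptions, and the plan text is printed rather than quoted because its wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Asha", "Rajam"), ("Ravi", "Vizag"), ("Meena", "Rajam")],
)

QUERY = "SELECT name FROM customer WHERE city = ? ORDER BY name"

def run(city):
    return conn.execute(QUERY, (city,)).fetchall()

def plan(city):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (city,)).fetchall()
    return " | ".join(row[3] for row in rows)

before, plan_before = run("Rajam"), plan("Rajam")

# Physical-level change only: the table definition and the query are untouched.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

after, plan_after = run("Rajam"), plan("Rajam")
print(plan_before)
print(plan_after)
```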
Lecture-3
Structure of DBMS:
[Figure: DBMS Structure]
Database Users:
Naive Users:
These users do not access the database directly. They access the database through an
application program and do not write application programs themselves. They are also called
unsophisticated users.
For example, a bank teller who needs to transfer $50 from account A to account B invokes a
program called transfer. This program asks the teller for the amount of money to be
transferred, the account from which the money is to be transferred, and the account to which
the money is to be transferred.
Application Programmers:
Users who write and develop application programs by using different tools.
Rapid application development (RAD) tools are tools that enable an application programmer
to construct forms and reports without writing a program.
Sophisticated Users:
These users do not need an application program to interact with the database.
They interact with the system by making requests in a query language.
These queries are submitted to the query processor.
Database Administrator:
Coordinates all the activities of the database system.
Handles the physical and logical levels of the database.
Gives privileges to users.
Specialized Users:
Sophisticated users who write special database application programs are called
specialized users. They write complex programs.
Stand-alone Users: Those who use a database for personal purposes.
Query processor:
DML Compiler:
It translates DML statements into low-level instructions that the query evaluation engine
understands.
DDL Interpreter:
It interprets the DDL statements and records them in a set of tables containing metadata,
known as the data dictionary.
Query evaluation engine:
This engine executes the low-level instructions generated by the DML compiler.
Compiler and Linker:
It converts DML statements embedded in an application program into procedure calls and
links them.
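The role of the DDL interpreter can be observed in SQLite, used here only as a convenient stand-in for a full DBMS. After a CREATE TABLE statement runs, its definition can be read back from SQLite's catalog table sqlite_master, which plays the part of a data dictionary. The student table is an assumed example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement: it defines structure and stores no user data.
conn.execute("CREATE TABLE student (stu_rno INTEGER PRIMARY KEY, stu_name TEXT)")

# The definition is now recorded as metadata in the catalog.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# Column-level metadata for the same table (row[1] is the column name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(catalog, columns)
```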
Storage Manager:
Authorization and Integrity Manager:
It enforces role-based access control, i.e. it checks whether a particular person is privileged
to perform the requested operation.
Transaction Manager:
It ensures that the database remains in a consistent state despite system failures and that
concurrent transactions execute without conflicting.
File Manager:
It manages the allocation of space on disk storage and the data structures used to represent
information stored on disk.
Buffer Manager:
It is responsible for fetching data from disk storage into main memory and deciding what data
to cache in memory.
Disk Storage
Data Files:
They store the database itself.
Data Dictionary:
It contains information about the structure of every database object.
It is a repository of the metadata that describes the database.
Indices:
They provide faster retrieval of data items.
Statistical Data:
It stores statistical information about the data in the database. This information is used by
the query processor to select efficient ways to execute a query.
Database Architecture:
Database architecture is logically divided into two types:
Logical two-tier Client / Server architecture
Logical three-tier Client / Server architecture
Two-tier Client / Server Architecture
In the two-tier Client / Server architecture, the user interface programs and application
programs run on the client side. An interface called ODBC (Open Database Connectivity)
provides an API that allows client-side programs to call the DBMS. Most DBMS vendors
provide ODBC drivers. A client program may connect to several DBMSs. Some variations
of this architecture are also possible; for example, in some DBMSs more
functionality is transferred to the client, including data dictionary functions, optimization, etc.
In such an arrangement the server is called a data server.
Three-tier Client / Server Architecture
The three-tier Client / Server database architecture is commonly used for web
applications. An intermediate layer, called the application server or web server, stores the web
connectivity software and the business logic (constraints) part of the application used to
access the right amount of data from the database server. This layer acts as a medium for
passing partially processed data between the database server and the client.
Lecture-4
Database Model
A database model defines the logical design of data. The model describes the
relationships between different parts of the data. Historically, in database design,
the following models have commonly been used:
Hierarchical Model
Network Model
Relational Model
Object-Oriented Data Model
ER Model
Hierarchical Model
In this model each entity has only one parent but can have several children. At the
top of the hierarchy there is only one entity, which is called the root.
Network Model
In the network model, entities are organized in a graph, in which some entities
can be accessed through several paths.
Relational Model
In this model, data is organised in two-dimensional tables called relations. The tables or
relations are related to each other.
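As a minimal sketch of two related tables, the example below links each student row to a department row through a shared dept_id column and combines them with a join. It uses Python's sqlite3 module; all table names, column names, and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE student (
    stu_rno  INTEGER PRIMARY KEY,
    stu_name TEXT,
    dept_id  INTEGER REFERENCES department (dept_id)  -- relates the two tables
);
INSERT INTO department VALUES (1, 'CSE'), (2, 'ECE');
INSERT INTO student VALUES (101, 'Asha', 1), (102, 'Ravi', 2), (103, 'Meena', 1);
""")

rows = conn.execute(
    "SELECT s.stu_name, d.dept_name FROM student AS s "
    "JOIN department AS d ON s.dept_id = d.dept_id ORDER BY s.stu_rno"
).fetchall()
print(rows)
```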
Object-Oriented Data Model.
Real-world problems are represented more closely through the object-oriented
data model.
In this model, both the data and the relationships are present in a single structure known as
an object.
We can store audio, video, images, etc. in the database, which was not possible in the
relational model.
In this model, two or more objects are connected through links. We use these links to
relate one object to other objects.
Entity-Relationship Model
The Entity-Relationship Model, or simply ER Model, is a high-level data model represented
as a diagram. In this model, we represent the real-world problem in pictorial form to make it
easy for the stakeholders to understand.
It is also very easy for developers to understand the system just by looking at the
ER diagram.
Lecture-5:
Lecture-6
Identifying Relationships
The next step is to determine the relationships between the entities and to determine the
cardinality of each relationship. The relationship is the connection between the entities, just
like in the real world: what does one entity do with the other, how do they relate to each
other? For example, customers buy products, products are sold to customers, a sale
comprises products, a sale happens in a shop.
The cardinality shows how many instances on one side of the relationship belong to how many
instances on the other side. First, you need to state for each relationship how many of one
side belong to exactly 1 of the other side. For example: How many customers belong to 1
sale? How many sales belong to 1 customer? How many sales take place in 1 shop?
Identifying Attributes
The data elements that you want to save for each entity are called 'attributes'.
About the products that you sell, you want to know, for example, what the price is, what the
name of the manufacturer is, and what the type number is. About the customers you know
their customer number, their name, and address. About the shops you know the location code,
the name, and the address. About the sales you know when they happened, in which shop, what
products were sold, and the sum total of the sale. About the vendors you know the staff number,
name, and address. Exactly what will be included is not important yet; for now it is only
about what you want to save.
Attributes
• Entities are represented by means of their properties, called attributes.
• All attributes have values.
– For example, a student entity may have name, class, and age as attributes.
• There exists a domain or range of values that can be assigned to attributes.
– For example, a student's name cannot be a numeric value. It has to be alphabetic. A student's
age cannot be negative, etc.
• Simple attributes are those attributes which cannot be divided further.
• Composite attributes are those attributes which are composed of several simple
attributes.
• Single-valued attributes are those attributes which can take only one value for a given entity
from an entity set.
• Multi-valued attributes are those attributes which can take more than one value for a given
entity from an entity set.
• Derived attributes are those attributes which can be derived from other attribute(s).
• Key attributes are those attributes which can identify an entity uniquely in an entity set.
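The attribute kinds listed above can be pictured with a small sketch. The Student and Address classes and their field names are hypothetical, chosen only so that each kind of attribute appears once.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:                  # composite attribute: built from simple ones
    street: str
    city: str

@dataclass
class Student:
    roll_no: int                # key attribute: identifies the student
    name: str                   # simple, single-valued attribute
    birth_date: date
    address: Address            # composite attribute
    phones: list = field(default_factory=list)   # multi-valued attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from birth_date, not stored."""
        had_birthday = (today.month, today.day) >= (
            self.birth_date.month, self.birth_date.day)
        return today.year - self.birth_date.year - (0 if had_birthday else 1)

s = Student(1, "Asha", date(2004, 6, 15), Address("12 Main St", "Rajam"),
            ["90000-00001", "90000-00002"])
age_day_before = s.age_on(date(2024, 6, 14))
age_on_birthday = s.age_on(date(2024, 6, 15))
print(age_day_before, age_on_birthday)
```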
Lecture-7
Entity Set
An entity set in DBMS is a set that collectively represents a group of entities of a similar
type.
Example: An entity set of cars, an entity set of bank accounts, etc. In DBMS, the whole table in
the tabular representation of data is an entity set, while each row inside this table is an entity.
[Figure: Entity set of bank accounts]
Strong Entity
• A strong entity has a primary key. Its existence does not depend on any other entity,
whereas weak entities depend on a strong entity.
• A strong entity is represented by a single rectangle.
Weak Entity
• A weak entity does not have a primary key of its own and depends on a parent (strong)
entity for its identification.
• A weak entity is represented by a double rectangle.
• Strong entity types are those entity types which have a key attribute.
• The primary key helps in identifying each entity uniquely.
• A strong entity type is represented by a single rectangle.
• For example, in a STUDENT table, Roll_no identifies each row of the table uniquely and
hence we can say that STUDENT is a strong entity type.
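One common way to store a weak entity in tables is to combine the owner's key with a partial key of the weak entity. The sketch below assumes a loan (strong) and payment (weak) pair, which is not taken from the notes above: payment numbers repeat across loans, so a payment is identified only together with its loan, and it disappears when the loan is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE loan (loan_no TEXT PRIMARY KEY, amount REAL);      -- strong entity
CREATE TABLE payment (                                           -- weak entity
    loan_no    TEXT NOT NULL REFERENCES loan (loan_no) ON DELETE CASCADE,
    payment_no INTEGER NOT NULL,          -- partial key: unique per loan only
    paid       REAL,
    PRIMARY KEY (loan_no, payment_no)     -- owner's key + partial key
);
INSERT INTO loan VALUES ('L-1', 1000), ('L-2', 2000);
INSERT INTO payment VALUES ('L-1', 1, 100), ('L-1', 2, 100), ('L-2', 1, 250);
""")

# Removing the owning loan removes its dependent payments as well.
conn.execute("DELETE FROM loan WHERE loan_no = 'L-1'")
remaining = conn.execute(
    "SELECT loan_no, payment_no FROM payment ORDER BY loan_no, payment_no"
).fetchall()
print(remaining)
```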
Lecture-8
Relationship:
• The association among entities is called a relationship.
• For example, ‘Enrolled in’ is a relationship that exists between entities Student and Course.
Relationship Set:
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts that S1 is enrolled in C2, S2 is enrolled in C1 and S3 is enrolled in C3.
Types:
A recursive relationship is one in which an entity set has a relationship with itself.
Cardinality Ratio:
The number of times an entity of an entity set participates in a relationship set is
known as cardinality. Cardinality can be of different types:
One-to-One Relationship
When each instance of an entity is associated with at most one instance of the other entity
through the relationship, it is known as a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.
One-to-many relationship:
When one instance of the entity on the left can be associated with more than one instance of
the entity on the right through the relationship, it is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by one
specific scientist.
Many-to-one relationship:
When more than one instance of the entity on the left can be associated with only one instance
of the entity on the right through the relationship, it is known as a many-to-one relationship.
Many-to-many relationship:
When more than one instance of the entity on the left can be associated with more than one
instance of the entity on the right through the relationship, it is known as a many-to-many
relationship.
For example, an employee can be assigned to many projects and a project can have many employees.
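In tables, a many-to-many relationship is usually stored in a separate table that pairs the keys of the two entity sets. The sketch below does this for the employee and project example; the table names, column names, and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE works_on (                    -- one row per (employee, project) pair
    emp_id  INTEGER REFERENCES employee (emp_id),
    proj_id INTEGER REFERENCES project (proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project VALUES (10, 'Payroll'), (20, 'Portal');
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

def projects_of(emp_id):
    rows = conn.execute(
        "SELECT p.title FROM works_on AS w JOIN project AS p "
        "ON p.proj_id = w.proj_id WHERE w.emp_id = ? ORDER BY p.title",
        (emp_id,))
    return [title for (title,) in rows]

def employees_on(proj_id):
    rows = conn.execute(
        "SELECT e.name FROM works_on AS w JOIN employee AS e "
        "ON e.emp_id = w.emp_id WHERE w.proj_id = ? ORDER BY e.name",
        (proj_id,))
    return [name for (name,) in rows]

asha_projects = projects_of(1)      # one employee, many projects
payroll_staff = employees_on(10)    # one project, many employees
print(asha_projects, payroll_staff)
```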
Participation constraints:
Participation constraints define the least number of relationship instances in which an entity
must compulsorily participate.
Total Participation:
It specifies that each entity in the entity set must compulsorily participate in at least one
relationship instance in that relationship set.
That is why it is also called mandatory participation.
Total participation is represented using a double line between the entity set and relationship
set.
Example:
– Double line between the entity set “Student” and relationship set “Enrolled in” signifies total
participation.
– It specifies that each student must be enrolled in at least one course.
Partial participation:
It specifies that each entity in the entity set may or may not participate in the relationship
instance in that relationship set.
That is why it is also called optional participation.
Partial participation is represented using a single line between the entity set and relationship
set.
Example:
• Single line between the entity set “Course” and relationship set “Enrolled in” signifies partial
participation.
• It specifies that there might exist some courses for which no enrollments are made.
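Both kinds of participation can be checked on a relationship set with plain set operations. The enrolments below reuse the S1, S2, S3 pairs given earlier in these notes; course C4, which has no enrolments, is an added assumption so that partial participation shows up.

```python
students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3", "C4"}          # C4 is an assumed extra course
enrolled_in = {("S1", "C2"), ("S2", "C1"), ("S3", "C3")}

students_enrolled = {s for s, _ in enrolled_in}
courses_taken = {c for _, c in enrolled_in}

# Total participation: every student appears in at least one enrolment.
student_participation_total = students <= students_enrolled
# Partial participation: some course may appear in no enrolment.
course_participation_total = courses <= courses_taken
courses_without_enrolment = sorted(courses - courses_taken)

print(student_participation_total, course_participation_total,
      courses_without_enrolment)
```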
Lecture-9
The entity-relationship (E-R) data model is based on a perception of a real world that consists
of a collection of basic objects, called entities, and of relationships among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects. For example,
each person is an entity, and bank accounts can be considered as entities.
Entities are described in a database by a set of attributes.
For example, the attributes account-number and balance may describe one particular account
in a bank, and they form attributes of the account entity set. Similarly, attributes customer-
name, customer-street address and customer-city may describe a customer entity.
An extra attribute customer-id is used to uniquely identify customers (since it may be possible
to have two customers with the same name, street address, and city). A unique customer
identifier must be assigned to each customer. In the United States, many enterprises use the
social-security number of a person (a unique number the U.S. government assigns to every
person in the United States) as a customer identifier.
A relationship is an association among several entities. For example, a depositor relationship
associates a customer with each account that she has.
The set of all entities of the same type and the set of all relationships of the same type are
termed an entity set and relationship set, respectively.
The overall logical structure (schema) of a database can be expressed graphically by an E-R
diagram, which is built up from components that include the following:
• Lines, which link attributes to entity sets and entity sets to relationships
Lecture-10
Features of ER Model:
Specialization –
In specialization, an entity is divided into sub-entities based on their characteristics. It is a
top-down approach in which a higher-level entity is specialized into two or more lower-level
entities. For example, the EMPLOYEE entity in an employee management system can be
specialized into DEVELOPER, TESTER, etc., as shown in Figure 2. In this case, common
attributes like E_NAME, E_SAL, etc. become part of the higher-level entity (EMPLOYEE) and
specialized attributes like TES_TYPE become part of the specialized entity.
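Specialization resembles class inheritance in a programming language, which gives a compact way to picture it. In this sketch E_NAME, E_SAL, and TES_TYPE come from the example above, while the Developer attribute named language is a hypothetical addition.

```python
from dataclasses import dataclass

@dataclass
class Employee:             # higher-level entity: the common attributes
    e_name: str
    e_sal: float

@dataclass
class Tester(Employee):     # lower-level entity: adds its own attribute
    tes_type: str

@dataclass
class Developer(Employee):
    language: str           # hypothetical specialized attribute

tester = Tester("Ravi", 50000.0, "manual")
developer = Developer("Asha", 60000.0, "Java")

# Every specialized entity is still an employee and keeps the common attributes.
all_employees = [e.e_name for e in (tester, developer) if isinstance(e, Employee)]
print(all_employees, tester.tes_type)
```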
Aggregation –
A basic ER diagram cannot represent a relationship between an entity and a
relationship, which may be required in some scenarios. In those cases, a relationship with
its corresponding entities is aggregated into a higher-level entity. Aggregation is an
abstraction through which we can represent relationships as higher-level entity sets.
For example, an employee working for a project may require some machinery. So, a REQUIRES
relationship is needed between the relationship WORKS_FOR and the entity MACHINERY. Using
aggregation, the WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is
aggregated into a single entity, and the relationship REQUIRES is created between the
aggregated entity and MACHINERY.
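One possible table layout for this aggregation is sketched below, with every table and column name assumed. Each row of works_for is treated as a unit, and requires refers to that unit through a composite foreign key, so machinery can be attached only to an existing employee and project pairing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE works_for (                   -- the aggregated relationship
    emp_id  INTEGER,
    proj_id INTEGER,
    PRIMARY KEY (emp_id, proj_id)
);
CREATE TABLE machinery (machine_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE requires (                    -- relates WORKS_FOR pairs to MACHINERY
    emp_id     INTEGER,
    proj_id    INTEGER,
    machine_id INTEGER REFERENCES machinery (machine_id),
    PRIMARY KEY (emp_id, proj_id, machine_id),
    FOREIGN KEY (emp_id, proj_id) REFERENCES works_for (emp_id, proj_id)
);
INSERT INTO works_for VALUES (1, 100);
INSERT INTO machinery VALUES (7, 'drill');
INSERT INTO requires VALUES (1, 100, 7);
""")

try:
    # Employee 2 does not work for project 100, so there is nothing to attach to.
    conn.execute("INSERT INTO requires VALUES (2, 100, 7)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

requirement_count = conn.execute("SELECT COUNT(*) FROM requires").fetchone()[0]
print(orphan_allowed, requirement_count)
```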
7. Keywords
DBMS
Data Abstraction
Instance
Database Administrator
Data Models
8. Sample Questions
Remember:
1. Define DBMS.
2. List 5 differences between database systems and file systems
3. Define data abstraction
4. What is Data Independence
5. What are the advantages of DBMS
Understand:
1. Explain Database Management Systems.
2. Identify various applications of Database Management Systems.
3. Compare database systems and file systems
4. Identify the major difference between database systems and file systems
5. Describe structure of DBMS
6. Illustrate different levels of abstraction
At the end of this session, the facilitator (teacher) shall randomly pick a few students to
summarize the deliverables.
All these basics are essential for interacting with a DBMS.