DBMS 2units
C-5048
DATABASE MANAGEMENT SYSTEM
Department of COMPUTER SCIENCE
STUDENT NAME :
HALLTICKET NO :
GROUP &YEAR : B.Sc (MPCs) – II Year – IV Sem
In a computerized information system, data is the basic resource of the organization. Proper organization and
management of data is therefore required for the organization to run smoothly. A database management system
(DBMS) deals with how data is stored and managed in a computerized information system. Every organization
requires accurate and reliable data for better decision making, for ensuring the privacy of data, and for
controlling data efficiently.
Examples include depositing or withdrawing money at a bank; making a hotel, airline, or railway reservation; and
purchasing items from a supermarket. In all these cases, a database is accessed.
DATA:
Data is a collection of known raw facts or figures that carry no meaning by themselves. Data must be interpreted,
by a human or a machine, to derive meaning. Data may be simple, and it remains unorganized until it is
organized. Data can be represented using alphabets (A-Z, a-z), digits (0-9), and special characters (+, -, ., #, $, etc.).
E.g.: 25, “ajit”.
INFORMATION:
Information is the processed data on which decisions and actions are based. Information can be defined as the
organized and classified data to provide meaningful values.
Eg: “The age of Ravi is 25”
DATA PROCESSING:
Data processing refers to the process of transforming raw data into meaningful output, i.e. information.
Data processing can be done manually using pen and paper, mechanically using simple devices like typewriters,
or electronically using modern data processing tools such as computers. It refers to the sequence of activities
involved in transforming data from its raw form into information. It is often referred to as a cycle because the
output obtained can be stored after processing and may be used in future as input. The four main stages of the data
processing cycle are:
Data collection
Data input
Data processing
Data output
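The four stages can be sketched in a few lines of Python. This is only an illustration: the function names and the sample records are assumptions made for the example, not part of any real system.

```python
# Hypothetical illustration of the data processing cycle.

def collect():
    # Data collection: raw facts gathered from some source (hard-coded here).
    return ["Ravi,25", "Ajit,31", "Sita,19"]

def read_input(raw_lines):
    # Data input: convert the raw text into a form the program can work with.
    records = []
    for line in raw_lines:
        name, age = line.split(",")
        records.append({"name": name, "age": int(age)})
    return records

def process(records):
    # Data processing: derive something meaningful from the records.
    total = sum(r["age"] for r in records)
    return {"count": len(records), "average_age": total / len(records)}

def output(summary):
    # Data output: present the resulting information to the user.
    return f"{summary['count']} people, average age {summary['average_age']:.1f}"

report = output(process(read_input(collect())))
print(report)
```

The stored report could itself become the input of a later run, which is why the sequence is described as a cycle.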
META DATA:
Metadata is data that describes other data. Metadata represents data about data.
File:
File is a collection of related data stored in secondary memory.
P.vVENKATESH PAGE 1
DATABASE:
A database is a collection of data, usually stored in electronic form. A database is typically designed so that it is
easy to store and access information.
A good database is crucial to any company or organisation. This is because the database stores all the pertinent
details about the company such as employee records, transactional records, salary details etc.
3. Integrity Problems:
Developers enforce data validation in the system by adding appropriate code to the various application programs.
However, when new constraints are added, it is difficult to change the programs to enforce them.
4. Atomicity:
It is difficult to ensure atomicity in a file processing system when transaction failure occurs due to power failure,
networking problems etc.
5. Concurrent access:
A file processing system does not allow different users to safely access the same file for transactions at the same time.
6. Security problems:
There is no security provided in file processing system to secure the data from unauthorized user access.
Advantages of DBMS:
Controls data redundancy: A DBMS can control data redundancy because all the data is stored in a single
database, so each item is recorded in one place.
Data sharing: In a DBMS, the authorized users of an organization can share the data among themselves.
Easy maintenance: A DBMS is easy to maintain because of the centralized nature of the database system.
Reduced time: It reduces development time and maintenance needs.
Backup: It provides backup and recovery subsystems that automatically back up data to protect against hardware and
software failures and restore the data if required.
Multiple user interfaces: It provides different types of user interfaces, such as graphical user interfaces and application
program interfaces.
Applications of DBMS
Railway Reservation System:
A database is required to keep records of ticket bookings and of train departure and arrival status. If a train is
running late, people get to know it through database updates.
Banking:
We make thousands of transactions through banks daily, and we can do this without going to the bank. Banking
has become so easy that we can send or receive money while sitting at home. All of this is possible
because a DBMS manages all the bank transactions.
Telecommunications:
No telecommunication company can run its business without a DBMS. A DBMS is a must for
these companies to store call details and monthly postpaid bills.
Finance:
The days when information related to money was stored in registers and files are long gone. Today there
are lots of things to do with finance, such as storing sales, holding information, and
financial statement management.
Military:
The military keeps records of millions of soldiers and has millions of files that should be kept secure and safe. Because
a DBMS provides strong security for military information, it is widely used by militaries. One can
easily search for all the information about anyone within seconds with the help of a DBMS.
Online Shopping:
Online shopping has become a big trend these days. Many people prefer to shop from home rather than spend time
going to shops. Products are listed and sold with the help of a DBMS.
Purchase information, invoices, and payments are all handled with the help of a DBMS.
Manufacturing:
Manufacturing companies make products and sell them on a daily basis. A DBMS is used to keep records of all the details
about the products, such as quantity, bills, purchases, and supply chain management.
DBMS Architecture:
The DBMS design depends upon its architecture. The basic client/server architecture is used to deal
with a large number of PCs, web servers, database servers and other components that are connected
with networks.
The client/server architecture consists of many PCs and a workstation which are connected via the
network.
DBMS architecture depends upon how users are connected to the database to get their request done.
Database architecture can be seen as single-tier or multi-tier. Logically, however, database architecture is of two types:
2-tier architecture and 3-tier architecture.
1-Tier Architecture
In this architecture, the database is directly available to the user. It means the user can work directly on the
DBMS and use it.
Any changes done here are made directly on the database itself. It doesn't provide a handy tool for end
users.
The 1-Tier architecture is used for development of local applications, where programmers can communicate
directly with the database for a quick response.
2-Tier Architecture
The 2-Tier architecture is the same as the basic client-server architecture. In the two-tier architecture, applications on the
client end can directly communicate with the database at the server side. For this interaction, APIs like
ODBC and JDBC are used.
The user interfaces and application programs are run on the client-side.
The server side is responsible to provide the functionalities like: query processing and transaction
management.
To communicate with the DBMS, client-side application establishes a connection with the server side.
3-Tier Architecture
The 3-Tier architecture contains another layer between the client and server. In this architecture, client
can't directly communicate with the server.
The application on the client-end interacts with an application server which further communicates with the
database system.
End user has no idea about the existence of the database beyond the application server. The database also
has no idea about any other user beyond the application.
The 3-Tier architecture is used for large web applications.
A database management system that provides three levels of data abstraction is said to follow the three-level architecture:
External level
Conceptual level
Internal level
External level:
The external level is the highest level of database abstraction. At this level, many views are
defined for different users' requirements. A view describes only a subset of the database. Any number of
user views may exist for a given global schema or subschema.
For example, each student has a different view of the timetable; the view of a student is different from the
view of another user. Thus this level of abstraction is concerned with different categories of users. Each
external view is described by means of a schema.
Conceptual level:
At this level of database abstraction, all the database entities and the relationships among them are included.
One conceptual view represents the entire database. This conceptual view is defined by the conceptual
schema. The conceptual schema hides the details of physical storage structures and concentrates on describing
entities, data types, relationships, user operations, and constraints.
It describes all the records and relationships included in the conceptual view. There is only
one conceptual schema per database. It includes features that specify the checks needed to retain data consistency
and integrity.
Internal level:
It is the lowest level of abstraction, closest to the physical storage method used. It indicates how
the data will be stored and describes the data structures and access methods to be used by the database.
The internal view is expressed by the internal schema. The following aspects are considered at this level:
1. Storage allocation, e.g. B-tree, hashing
2. Access paths, e.g. specification of primary and secondary keys, indexes, etc.
3. Miscellaneous, e.g. data compression and encryption techniques, optimization of the internal
structures.
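The three levels can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, not a full mapping of the architecture: the table, view, and index names are assumptions chosen for the example. A table definition stands in for the conceptual schema, a view for one external schema, and an index for an internal-level access path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical structure of the whole database.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER, fee_due INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ravi", 82, 0), (2, "Ajit", 67, 1500)],
)

# External level: a view exposing only the subset one group of users needs.
conn.execute("CREATE VIEW exam_view AS SELECT roll_no, name, marks FROM student")

# Internal level: an access path chosen for storage and retrieval.
conn.execute("CREATE INDEX idx_student_name ON student (name)")

exam_rows = conn.execute("SELECT * FROM exam_view ORDER BY roll_no").fetchall()
print(exam_rows)
```

Users of exam_view never see the fee_due column, and adding or dropping the index changes nothing in either the table definition or the view.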
DATA MODELS
Underlying the structure of a database is the data model: a collection of conceptual tools for describing
data, data relationships, data semantics, and consistency constraints. A data model provides a way to
describe the design of a database at the physical, logical, and view levels.
Data models are fundamental entities for introducing abstraction in a DBMS. Data models define how data items are
connected to each other and how they are processed and stored inside the system. Earlier data models were not so
scientific, hence they were prone to a lot of duplication and update anomalies.
A database model defines the logical design and structure of a database and defines how data will be stored,
accessed, and updated in a database management system. The Relational Model is the most widely used
database model. Data models can be classified into four different categories:
Hierarchical Model
Network Model
Entity-relationship Model
Relational Model
Hierarchical Model
This database model organises data into a tree-like structure, with a single root, to which all the other data is
linked. The hierarchy starts from the root data and expands like a tree, adding child nodes to the parent nodes. In
this model, a child node has only a single parent node. This model efficiently describes many real-world
relationships, such as the index of a book, recipes, etc.
In the hierarchical model, data is organised into a tree-like structure with a one-to-many relationship between two
different types of data. For example, one department can have many courses, many professors, and of course many
students.
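The single-parent rule can be sketched with a small tree in Python. The node names below are hypothetical and only loosely follow the department example; the point is that every record is reached by one path from the root.

```python
# Hypothetical department tree: every child has exactly one parent.
tree = {
    "Department": ["Courses", "Professors", "Students"],
    "Courses": ["DBMS", "Networks"],
    "Professors": [],
    "Students": [],
    "DBMS": [],
    "Networks": [],
}

def parent_of(tree, node):
    # Return the single parent of a node, or None for the root.
    for parent, children in tree.items():
        if node in children:
            return parent
    return None

def path_from_root(tree, node):
    # Walk upward through the parents to rebuild the access path.
    path = [node]
    while parent_of(tree, path[-1]) is not None:
        path.append(parent_of(tree, path[-1]))
    return list(reversed(path))

print(path_from_root(tree, "DBMS"))
```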
Network Model
This is an extension of the Hierarchical model. In this model data is organised more like a graph, and a node is allowed
to have more than one parent node. Because more relationships are established in this model, the data is more
closely related, which makes accessing the data easier and
faster. This database model was used to map many-to-many data relationships. This was the most widely used
database model before the Relational Model was introduced.
Entity-relationship Model
In this database model, relationships are created by dividing an object of interest into an entity and its characteristics
into attributes. Different entities are related using relationships. E-R models represent these
relationships in pictorial form to make them easier for different stakeholders to understand. This model is good for
designing a database, which can then be turned into tables in the relational model (explained below).
Let's take an example. If we have to design a School Database, then Student will be an entity with attributes
name, age, address, etc. As Address is generally complex, it can be another entity with attributes street name,
pincode, city, etc., and there will be a relationship between them.
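The School Database example can be written down as two Python dataclasses. This is only a sketch: the class layout and the sample values are assumptions, and a real E-R design would be drawn as a diagram rather than coded.

```python
from dataclasses import dataclass

@dataclass
class Address:
    # Entity: Address, with its attributes.
    street_name: str
    pincode: str
    city: str

@dataclass
class Student:
    # Entity: Student, with its attributes.
    name: str
    age: int
    address: Address  # relationship: a student "lives at" an address

# Hypothetical sample values, for illustration only.
ravi = Student("Ravi", 25, Address("MG Road", "500001", "Hyderabad"))
print(ravi.address.city)
```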
Relational Model
In this model, data is organised in two-dimensional tables, and the relationship is maintained by storing a common
field. This model was introduced by E. F. Codd in 1970, and it has since become the most widely used database
model around the world.
The basic structure of data in the relational model is the table. All the information related to a particular type is
stored in rows of that table. Tables are also known as relations in the relational model.
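A short sketch with Python's sqlite3 module shows how a common field maintains the relationship between two tables. The table and column names are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER, dept_name TEXT)")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, dept_id INTEGER)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(10, "Computer Science"), (20, "Physics")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ravi", 10), (2, "Ajit", 20), (3, "Sita", 10)],
)

# The shared dept_id column is the only link between the two tables.
rows = conn.execute(
    "SELECT s.name, d.dept_name FROM student s "
    "JOIN department d ON s.dept_id = d.dept_id ORDER BY s.roll_no"
).fetchall()
print(rows)
```

No pointers or parent-child links are stored; matching values in dept_id are enough to relate a student to a department.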
Databases are stored in files, which contain records. At the physical level, the actual data is stored in
electromagnetic format on some device. A database system provides a high-level view of the stored data, while the
data itself, in the form of bits and bytes, is stored on different storage devices.
For storing the data, there are different types of storage options available. These storage types differ from one
another in speed and accessibility. The following types of storage devices are used for storing
data:
1. Primary Storage
2. Secondary Storage
3. Tertiary Storage
Primary Storage
It is the primary area that offers quick access to the stored data. Primary storage is also known as volatile
storage because this type of memory does not store data permanently. As soon as the system suffers a
power cut or a crash, the data is lost. Main memory and cache are the types of primary storage.
Main Memory: It is the memory that operates on the data made available from the storage
medium. The main memory handles each instruction of a computer. This type of memory can
store gigabytes of data on a system but is usually too small to hold the entire database. Finally, the main
memory loses its whole content if the system shuts down because of power failure or other reasons.
Cache: It is one of the costliest storage media, but it is also the fastest. A cache is a tiny
storage medium that is usually maintained by the computer hardware. While designing algorithms and
query processors for the data structures, designers take cache effects into account.
Secondary Storage:
Secondary storage is also called online storage. It is the storage area that allows the user to save and store
data permanently. This type of memory does not lose data because of a power failure or system crash. That's
why we also call it non-volatile storage. Some secondary storage media are
available in almost every type of computer system:
Flash Memory: Flash memory stores data, for example in USB (Universal Serial Bus) keys that are
plugged into the USB slots of a computer system. These USB keys help transfer data to a computer
system and vary in storage capacity. Unlike the main memory, flash memory retains the stored data
after a power cut or other failure. This type of memory is also commonly
used in server systems for caching frequently used data. This leads to higher
performance, and flash memory can store larger amounts of data than the main memory.
Magnetic Disk Storage: This type of storage media is also known as online storage media. A magnetic
disk is used for storing data for a long time. It is capable of storing an entire database. It is the
responsibility of the computer system to make the data available from the disk to the main memory for
further access. Also, if the system performs any operation on the data, the modified data should be
written back to the disk. The great strength of a magnetic disk is that its data is not affected
by a system crash or power failure, but a disk failure can destroy the stored data.
Tertiary Storage:
It is the storage type that is external to the computer system. It has the slowest speed but is capable of
storing a large amount of data. It is also known as offline storage. Tertiary storage is generally used for data
backup. The following tertiary storage devices are available:
Optical Storage: An optical storage can store megabytes or gigabytes of data. A Compact Disk (CD) can
store 700 megabytes of data with a playtime of around 80 minutes. On the other hand, a Digital Video
Disk or a DVD can store 4.7 or 8.5 gigabytes of data on each side of the disk.
Tape Storage: It is a cheaper storage medium than disks. Generally, tapes are used for archiving or
backing up data. Tape provides slow access to data because it accesses data sequentially from the start. Thus,
tape storage is also known as sequential-access storage. Disk storage is known as direct-access storage because
we can directly access the data at any location on the disk.
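The difference between sequential and direct access can be sketched in Python. An in-memory byte stream stands in for the storage device here, and the fixed record size and record contents are assumptions made for the example; real tapes and disks are far more involved.

```python
import io

RECORD_SIZE = 8  # fixed-length records make the offset of record n computable

# An in-memory byte stream stands in for a storage device holding 100 records.
device = io.BytesIO(b"".join(f"REC{n:05d}".encode() for n in range(100)))

def read_sequential(dev, n):
    # Tape-style access: start at the beginning and read past every earlier record.
    dev.seek(0)
    reads = 0
    record = b""
    for _ in range(n + 1):
        record = dev.read(RECORD_SIZE)
        reads += 1
    return record, reads

def read_direct(dev, n):
    # Disk-style access: jump straight to the record's offset.
    dev.seek(n * RECORD_SIZE)
    return dev.read(RECORD_SIZE), 1

print(read_sequential(device, 42))
print(read_direct(device, 42))
```

Both functions return the same record, but the sequential version performs one read for every record that precedes it, which is why tape is suited to backup rather than to lookups.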
Transactions in DBMS:
A transaction is a set of operations used to perform a logical unit of work. A transaction usually means
that the data in the database has changed. One of the major uses of a DBMS is to protect the user's data from
system failures. This is done by ensuring that all the data is restored to a consistent state when the computer is
restarted after a crash. A transaction is any one execution of a user program in a DBMS. Executing the same
program multiple times will generate multiple transactions.
Example:
The steps of a transaction to withdraw cash from an ATM are:
Transaction Start.
Insert your ATM card.
Select language for your transaction.
Select Savings Account option.
Enter the amount you want to withdraw.
Enter your secret pin.
Wait for some time for processing.
Collect your Cash.
Transaction Completed.
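The all-or-nothing behaviour of a withdrawal can be sketched with Python's sqlite3 module. This is a simplified illustration, not how a bank implements it: the account table, the withdraw function, and the amounts are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 5000)")
conn.commit()

def withdraw(conn, acc_no, amount):
    # Everything inside the 'with' block is one transaction:
    # it is committed on success and rolled back if an exception is raised.
    try:
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balance_of(conn, acc_no):
    return conn.execute(
        "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
    ).fetchone()[0]

print(withdraw(conn, 1, 2000), balance_of(conn, 1))  # succeeds and is committed
print(withdraw(conn, 1, 9000), balance_of(conn, 1))  # fails and is rolled back
```

In the failed withdrawal the UPDATE has already run when the error is raised, yet the rollback leaves the balance exactly as it was before the transaction started.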
Components of DBMS
A database environment is a collective system of components that comprises and regulates the collection,
management, and use of data. It consists of software, hardware, people, techniques for handling the database, and
the data itself.
The database management system can be divided into four major components:
1. Hardware
2. Software
3. Data
4. Users
Let's have a simple diagram to see how they all fit together to form a database management system.
Hardware
The hardware in a database environment means the computers and computer peripherals that are used to
manage a database.
When we say hardware, we mean the computer, hard disks, I/O channels for data, and any other physical component
involved before any data is successfully stored into memory. When we run Oracle or MySQL on our personal
computer, our computer's hard disk, the keyboard on which we type the commands, and our
computer's RAM and ROM all become part of the DBMS hardware.
Software
The software means everything from the operating system (OS) to the application programs, including
database management software like MS Access or SQL Server.
This is the main component, as this is the program which controls everything. The DBMS software is more like a
wrapper around the physical database, which provides us with an easy-to-use interface to store, access, and update
data. The DBMS software is capable of understanding the Database Access Language and interpreting it into actual
database commands to execute them on the database.
Data
Data is the resource for which the DBMS was designed. The motive behind the creation of the DBMS was to store and
utilise data. The techniques are the rules, concepts, and instructions given to both the people and the software,
along with the data, the group of facts and information held within the database environment.
In a typical database, both the data saved by the user and metadata are stored. Metadata is data about the data.
This is information stored by the DBMS to better understand the data stored in it.
For example: When I store my name in a database, the DBMS may store when the name was stored in the
database, what the size of the name is, and whether it is related to some other data or is independent. All this
information is metadata.
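The difference between user data and metadata can be seen with Python's sqlite3 module. This is a small sketch with an assumed table named person; which metadata a DBMS keeps (column types, sizes, timestamps, and so on) differs from one system to another, and SQLite records only some of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO person (name) VALUES ('Ravi')")

# User data: the values stored by the user.
data = conn.execute("SELECT id, name FROM person").fetchall()

# Metadata: what the DBMS records about the table itself.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(person)")]

print(data)
print(columns)
```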
Users
The people in a database environment include those who administer and use the system.
End User: These days all modern applications, web or mobile, store user data. Applications are
programmed in such a way that they collect user data and store it on
DBMS systems running on their servers. End users are the ones who store, retrieve, update, and delete data.
Database Administrators: The Database Administrator, or DBA, is the one who manages the complete
database management system. The DBA takes care of the security of the DBMS, its availability, managing
license keys, managing user accounts and access, etc.
The DA (Data Administrator) and the DBA (Database Administrator) are both responsible for managing the
database for an organization. They differ from each other in their required skills and responsibilities.
"The person in the organization who controls the data of the database is referred to as the data administrator."
The DA determines what data is to be stored in the database based on the requirements of the organization.
The DA works on the requirements gathering, analysis, and design phases.
The DA does not need to be a technical person, but knowledge of database technology is
beneficial.
The DA is a senior-level person in the organization. In short, the DA is a business-focused person who should
understand database technology.
"The person in the organization who controls the design and the use of the database is referred to as the database
administrator."
The DBA provides the necessary technical support for implementing a database.
The DBA works on the design, development, testing, and operational phases.
The DBA is a technical person with knowledge of database technology.
The DBA does not need to be a business person. In short, the DBA is a technically focused person who should
understand the business well enough to administer the database effectively.
A database administrator's (DBA) primary job is to ensure that data is available, protected from loss and
corruption, and easily accessible as needed. Below are some of the chief responsibilities that make up the
day-to-day work of a DBA. The role of the DBA is not that different.
1. Software installation and Maintenance
A DBA often collaborates on the initial installation and configuration of a new Oracle, SQL Server, or other database.
The system administrator sets up hardware and deploys the operating system for the database server, then the
DBA installs the database software and configures it for use. As updates and patches are required, the DBA
handles this on-going maintenance.
And if a new server is needed, the DBA handles the transfer of data from the existing system to the new platform.
2. Data Extraction, Transformation, and Loading
Known as ETL, data extraction, transformation, and loading refers to efficiently importing large volumes of data
that have been extracted from multiple systems into a data warehouse environment.
This external data is cleaned up and transformed to fit the desired format so that it can be imported into a central
repository.
3. Specialised Data Handling
Today’s databases can be massive and may contain unstructured data types such as images, documents, or sound
and video files. Managing a very large database (VLDB) may require higher-level skills and additional
monitoring and tuning to maintain efficiency.
4. Database Backup and Recovery
DBAs create backup and recovery plans and procedures based on industry best practices, then make sure that the
necessary steps are followed. Backups cost time and money, so the DBA may have to persuade management to
take necessary precautions to preserve data.
System admins or other personnel may actually create the backups, but it is the DBA’s responsibility to make sure
that everything is done on schedule.
In the case of a server failure or other form of data loss, the DBA will use existing backups to restore lost
information to the system. Different types of failures may require different recovery strategies, and the DBA must
be prepared for any eventuality. As technology changes, it is becoming ever more typical for a DBA to back up
databases to the cloud, for example Oracle Cloud for Oracle databases and MS Azure for SQL Server.
5. Security
A DBA needs to know potential weaknesses of the database software and the company’s overall system and work
to minimise risks. No system is one hundred per cent immune to attacks, but implementing best practices can
minimise risks.
In the case of a security breach or irregularity, the DBA can consult audit logs to see who has done what to the
data. Audit trails are also important when working with regulated data.
6. Authentication
Setting up employee access is an important aspect of database security. DBAs control who has access and what
type of access they are allowed. For instance, a user may have permission to see only certain pieces of
information, or they may be denied the ability to make changes to the system.
7. Capacity Planning
The DBA needs to know how large the database currently is and how fast it is growing in order to make
predictions about future needs. Storage refers to how much room the database takes up in server and backup
space. Capacity refers to usage level.
If the company is growing quickly and adding many new users, the DBA will have to create the capacity to
handle the extra workload.
8. Performance Monitoring
Monitoring databases for performance issues is part of the ongoing system maintenance a DBA performs. If
some part of the system is slowing down processing, the DBA may need to make configuration changes to the
software or add additional hardware capacity. Many types of monitoring tools are available, and part of the
DBA’s job is to understand what they need to track to improve the system. 3rd party organisations can be ideal for
outsourcing this aspect, but make sure they offer modern DBA support.
9. Database Tuning
Performance monitoring shows where the database should be tweaked to operate as efficiently as possible. The
physical configuration, the way the database is indexed, and how queries are handled can all have a dramatic
effect on database performance.
With effective monitoring, it is possible to proactively tune a system based on application and usage instead of
waiting until a problem develops.
10. Troubleshooting
DBAs are on call for troubleshooting in case of any problems. Whether they need to quickly restore lost data or
correct an issue to minimise damage, a DBA needs to quickly understand and respond to problems when they
occur.
Relational data model is the primary data model, which is used widely around the world for data storage and
processing. This model is simple and it has all the properties and capabilities required to process data with storage
efficiency. Relational Model (RM) represents the database as a collection of relations. A relation is nothing but a
table of values. Every row in the table represents a collection of related data values. These rows in the table
denote a real-world entity or relationship.
The table name and column names are helpful to interpret the meaning of values in each row. The data are
represented as a set of relations. In the relational model, data are stored as tables. However, the physical storage
of the data is independent of the way the data are logically organized.
1. Attribute: Each column in a table. Attributes are the properties which define a relation, e.g.
Student_Rollno, NAME, etc.
2. Table: In the relational model, relations are saved in table format. A table is stored along with its
entities. A table has two properties: rows and columns. Rows represent records and columns represent
attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: A relation schema represents the name of the relation with its attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: A column represents the set of values for a specific attribute.
8. Relation instance: A relation instance is a finite set of tuples in the RDBMS. Relation instances
never have duplicate tuples.
9. Relation key: One or more attributes that together identify each row are called the relation key.
10. Attribute domain: Every attribute has a pre-defined set of permitted values and a scope, which is known as the
attribute domain.
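Several of these terms can be made concrete with a relation written as a Python set of tuples. This is a minimal sketch; the attribute AGE and the sample rows are assumptions added for illustration.

```python
# A relation modelled as a schema (attribute names) plus a set of tuples.
schema = ("Student_Rollno", "NAME", "AGE")
instance = {
    (1, "Ravi", 25),
    (2, "Ajit", 31),
    (3, "Sita", 19),
}
instance.add((1, "Ravi", 25))  # a duplicate tuple is ignored by the set

degree = len(schema)         # degree: number of attributes
cardinality = len(instance)  # cardinality: number of tuples

# A column is the set of values taken by one attribute.
age_column = {row[schema.index("AGE")] for row in instance}

print(degree, cardinality, sorted(age_column))
```

Using a set mirrors the rule that a relation instance never contains duplicate tuples.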
The term data integrity refers to the accuracy and consistency of data.
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called
Relational Integrity Constraints.
Constraints are a set of rules used to maintain the quality of information.
Integrity constraints ensure that data insertion, updating, and other processes are performed in such
a way that data integrity is not affected.
Thus, integrity constraints are used to guard against accidental damage to the database.
In a relational DBMS, we primarily have four types of integrity constraints, namely:
1. Domain Integrity Constraint
2. Entity Integrity Constraint
3. Key Constraints
4. Referential Integrity Constraint
1. Domain constraints
Domain constraints can be defined as the definition of a valid set of values for an attribute.
The data types of a domain include string, character, integer, time, date, currency, etc. The value of the
attribute must be available in the corresponding domain.
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer.
Every attribute is bound to have a specific range of values. For example, age cannot be less than zero,
and telephone numbers cannot contain a digit outside 0-9.
Example:
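A domain constraint on age can be sketched with a CHECK clause in SQLite, driven from Python. The table layout and the try_insert helper are assumptions made for this illustration; other database systems express domains in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER,"
    " name TEXT,"
    " age INTEGER CHECK (typeof(age) = 'integer' AND age > 0)"
    ")"
)

def try_insert(conn, row):
    # Return True if the row satisfies the domain constraint, False otherwise.
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(conn, (1, "Ravi", 25)))   # age is inside the domain
print(try_insert(conn, (2, "Ajit", -4)))   # negative age is outside the domain
print(try_insert(conn, (3, "Sita", "A")))  # a letter is not an integer
```

The typeof test is included because SQLite would otherwise store a non-numeric string in an INTEGER column without complaint.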
3. Key constraints
A key is an attribute or set of attributes used to identify an entity uniquely within its entity set.
There must be at least one minimal subset of attributes in the relation that can identify a tuple
uniquely.
This minimal subset of attributes is called the key for that relation. If there is more than one such
minimal subset, these are called candidate keys.
An entity set can have multiple keys, out of which one key will be the primary key. A primary key
must contain unique values and cannot contain a null value in the relational table.
Example:
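Key constraints can be sketched in SQLite from Python. In this illustration roll_no is assumed to be the primary key and email a second candidate key; both column choices and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, for historical reasons, permits NULL
# in non-integer PRIMARY KEY columns unless told otherwise.
conn.execute(
    "CREATE TABLE student ("
    " roll_no TEXT NOT NULL PRIMARY KEY,"
    " email TEXT UNIQUE,"
    " name TEXT"
    ")"
)

def try_insert(conn, row):
    # Return True if the row is accepted, False if a key constraint rejects it.
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(conn, ("S1", "ravi@example.com", "Ravi")))    # accepted
print(try_insert(conn, ("S1", "ajit@example.com", "Ajit")))    # duplicate primary key
print(try_insert(conn, (None, "sita@example.com", "Sita")))    # NULL primary key
print(try_insert(conn, ("S2", "ravi@example.com", "Ravi K")))  # duplicate candidate key value
```

Only the first row is stored; each of the other three violates a key constraint and is rejected.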
DATABASE SCHEMA:
The design of the database is called a schema. This tells us about the structural view of the database. It gives us an
overall description of the database. A database schema defines how the data is organised using the schema
diagram. A schema diagram is a diagram which contains entities and the attributes that will define that schema. A
schema diagram only shows us the database design. It does not show the actual data of the database. Schema can
be a single table or it can have more than one table which is related.
The schema represents the relationship between these tables. A relational database schema defines the
design and structure of the relations: it consists of the relation name and the set of attributes (field names or column
names). Every attribute has an associated domain. For example: In the following diagram, we have a
schema that shows the relationship between three tables: Course, Student and Section. The diagram only shows
the design of the database; it doesn’t show the data present in those tables. A schema is only a structural
view (design) of a database, as shown in the diagram below.
Relational query languages run queries on the tables in the database. In a relational database, a
table is known as a relation. Records or rows of the table are referred to as tuples. Columns of the table are also
known as attributes. These names are used interchangeably in relational databases.
Relational Operations:
Given this simple and restricted data structure, it is possible to define some very powerful relational operators
which, from the user's point of view, act 'in parallel' on all entries in a table simultaneously, although their
implementation may require conventional processing.
Codd originally defined eight relational operators.
1. SELECT originally called RESTRICT
2. PROJECT
3. JOIN
4. PRODUCT
5. UNION
6. INTERSECT
7. DIFFERENCE
8. DIVIDE
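Three of the operators listed above (SELECT, PROJECT and JOIN) can be sketched in Python by treating a relation as a list of dictionaries. This is a minimal illustration, not how a DBMS implements them; the students and departments relations are made-up sample data.

```python
def select(relation, predicate):
    """SELECT (RESTRICT): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """PROJECT: keep only the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """JOIN: combine tuples that agree on every attribute the relations share."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                result.append({**l_row, **r_row})
    return result


# Hypothetical sample relations.
students = [
    {"stud_id": 1, "name": "Ram", "dept": "CS"},
    {"stud_id": 2, "name": "Raj", "dept": "Maths"},
    {"stud_id": 3, "name": "Bheem", "dept": "CS"},
]
departments = [
    {"dept": "CS", "building": "A"},
    {"dept": "Maths", "building": "B"},
]

cs_students = select(students, lambda row: row["dept"] == "CS")
dept_names = project(students, ["dept"])
joined = natural_join(students, departments)
```

Each function takes whole relations and returns a new relation, which is what lets the operators be combined. If the two relations share no attribute, natural_join behaves like PRODUCT.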
KEYS in DBMS:
A KEY in DBMS is an attribute or set of attributes which helps you to identify a row (tuple) in a relation (table).
Keys allow you to find the relation between two tables. Keys help you uniquely identify a row in a table by a
combination of one or more columns in that table.
Here are some reasons for using keys in a DBMS:
Keys help you to identify any row of data in a table. In a real-world application, a table could
contain thousands of records. Moreover, the records could be duplicated. Keys ensure that you can
uniquely identify a table record despite these challenges.
Keys allow you to establish and identify relationships between tables.
Keys help you to enforce identity and integrity in the relationship.
There are mainly 4 different types of keys in DBMS, and each key has its own functionality:
CANDIDATE KEY:
A CANDIDATE KEY is a minimal set of attributes that uniquely identifies tuples in a table. A candidate key is a
super key with no redundant attributes. The primary key should be selected from the candidate keys. Every table
must have at least one candidate key. A table can have multiple candidate keys but only a single primary key.
Example: In the given table Stud ID, Roll No, and email attributes are candidate keys which help us to uniquely
identify the student record in the table.
Example:
3 13 RA Natan [email protected]
ALTERNATE KEY:
An ALTERNATE KEY is a column or group of columns in a table that uniquely identifies every row in that table. A
table can have multiple choices for a primary key but only one can be set as the primary key. All the candidate
keys which are not chosen as the primary key are called alternate keys.
Example:
In this table, StudID, Roll No and Email are qualified to become the primary key. But since StudID is the primary
key, Roll No and Email become the alternate keys.
SUPER KEY:
A super key is a set of one or more attributes which uniquely identifies rows in a table. A super key may
have additional attributes that are not needed for unique identification.
Example:
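The difference between a super key and a candidate key can be sketched in Python. The functions below only inspect the rows they are given, so they show which attribute sets happen to be unique in the sample data; in a real design, keys are decided from the meaning of the attributes. The student rows are hypothetical.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows, attributes):
    """Return the minimal super keys, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical sample data.
students = [
    {"stud_id": 1, "roll_no": 11, "name": "Tom", "email": "tom@example.com"},
    {"stud_id": 2, "roll_no": 12, "name": "Nick", "email": "nick@example.com"},
    {"stud_id": 3, "roll_no": 13, "name": "Tom", "email": "tom2@example.com"},
]

found = candidate_keys(students, ["stud_id", "roll_no", "name", "email"])
```

Here (stud_id, name) is a super key because stud_id alone is already unique, but it is not a candidate key because name is not needed for identification.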
FOREIGN KEY:
FOREIGN KEY is a column that creates a relationship between two tables. The purpose of Foreign keys is to
maintain data integrity and allow navigation between two different instances of an entity. It acts as a cross-
reference between two tables as it references the primary key of another table.
Example:
In this example, we have two tables, teacher (TABLE 2) and department (TABLE 1), in a school.
However, there is no way to see which teacher works in which department.
By adding the foreign key Deptcode to the teacher table, we can create a relationship between
the two tables.
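The department and teacher tables can be sketched with Python's built-in sqlite3 module. The column names, department code and teacher names below are assumptions made for the illustration, since the original tables are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled
conn.execute(
    "CREATE TABLE department (dept_code TEXT PRIMARY KEY NOT NULL, dept_name TEXT)"
)
conn.execute(
    """
    CREATE TABLE teacher (
        teacher_id INTEGER PRIMARY KEY,
        name       TEXT,
        dept_code  TEXT REFERENCES department(dept_code)  -- the foreign key
    )
    """
)
conn.execute("INSERT INTO department VALUES ('001', 'Science')")
conn.execute("INSERT INTO teacher VALUES (1, 'David', '001')")


def add_teacher(teacher_id, name, dept_code):
    """Return True if the row is accepted, False if the foreign key is violated."""
    try:
        conn.execute("INSERT INTO teacher VALUES (?, ?, ?)", (teacher_id, name, dept_code))
        return True
    except sqlite3.IntegrityError:
        return False


# Department '999' does not exist, so this row must be rejected.
orphan_accepted = add_teacher(2, "Brandon", "999")

who_works_where = conn.execute(
    "SELECT t.name, d.dept_name FROM teacher t "
    "JOIN department d ON t.dept_code = d.dept_code"
).fetchall()
```

The foreign key does two jobs here: it rejects a teacher row that points to a missing department, and it lets the join answer which teacher works in which department.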
UNIT-II
DATA MODEL
The data model describes the structure of a database. It is a collection of conceptual tools for describing data, data
relationships, data semantics and consistency constraints.
The entity-relationship data model perceives the real world as consisting of basic objects, called entities, and
relationships among these objects. It was developed to facilitate database design by allowing specification of
an enterprise schema which represents the overall logical structure of a database.
Entity Relationship Diagram notation mainly uses three basic symbols, the rectangle, the oval and the
diamond, to represent entities, attributes and relationships. There are some sub-elements
which are based on these main elements. An ER Diagram is a visual representation of data that describes
how data is related to each other using different ERD symbols and notations.
Rectangles: represent entity types
Ellipses: represent attributes
Diamonds: represent relationship types
Lines: link attributes to entity types and entity types to relationship types
Primary key: key attributes are underlined
Double Ellipses: represent multi-valued attributes
ER Diagram Symbols
ENTITY:
An entity is a real-world thing, either living or non-living, that can be distinguished from other things. It is
anything in the enterprise that is to be represented in our database. It may be a physical thing or simply a fact
about the enterprise or an event that happens in the real world.
An entity can be a place, person, object, event or concept about which data is stored in the database.
Every entity must have attributes and a unique key. Every entity is made up of some 'attributes'
which represent that entity.
Examples of entities:
Person: Employee, Student, Patient
Place: Store, Building
Object: Machine, product, and Car
Event: Sale, Registration, Renewal
Concept: Account, Course
Strong Entities
A strong entity is a type of entity which has a key attribute. It can be identified uniquely by the
primary key of the same entity. It is represented with a single-lined rectangle symbol.
Weak Entities
A weak entity is a type of entity which doesn't have a key attribute of its own. It can be identified uniquely only
by considering the primary key of another entity. It is represented with a double-lined rectangle symbol.
ATTRIBUTE
An attribute describes a property of an entity. It is a property of either an entity type or a
relationship type. For example, a lecture might have attributes: time, date, duration, place, etc. An attribute is
represented by an ellipse (oval) in an ER diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, student roll number can uniquely
identify a student from a set of students. Key attribute is represented by Ellipse (oval) same as other attributes
however the text of key attribute is underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as a composite attribute. For example, in the student
entity, the student address is a composite attribute as an address is composed of other attributes such as pin code,
state and country.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is represented with a double-lined
oval in an ER Diagram. For example, a person can have more than one phone number, so the phone number
attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is represented by dashed
oval in an ER Diagram. For example – Person age is a derived attribute as it changes over time and can be derived
from another attribute (Date of birth).
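The four attribute types can be sketched with Python dataclasses. The class and field names are assumptions chosen for this illustration, and the sample values are made up.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: an address is made up of smaller attributes."""
    pin_code: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int            # key attribute: identifies one student
    name: str
    address: Address        # composite attribute
    date_of_birth: date
    phone_numbers: list = field(default_factory=list)  # multivalued attribute

    def age(self, today):
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        birthday_pending = (today.month, today.day) < (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return years - 1 if birthday_pending else years


# Hypothetical sample student.
ravi = Student(
    roll_no=1,
    name="Ravi",
    address=Address("500001", "Telangana", "India"),
    date_of_birth=date(1999, 2, 2),
    phone_numbers=["9866578591", "9866500000"],
)
```

Because age is computed on demand from date_of_birth, it can never disagree with the stored date.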
RELATIONSHIP:
A relationship is an association among two or more entities. A relationship is represented by a
diamond shape in an ER diagram. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
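The four relationship types can be told apart by counting partners on each side. The sketch below classifies a relationship from sample (left, right) pairs; it only describes the data it is given, whereas in a real design the cardinality is a rule decided by the designer. All names in the sample pairs are hypothetical.

```python
def cardinality(pairs):
    """Classify a relationship given as (left_entity, right_entity) pairs."""
    left_partners, right_partners = {}, {}
    for left, right in pairs:
        left_partners.setdefault(left, set()).add(right)
        right_partners.setdefault(right, set()).add(left)
    # Does some left entity relate to several right entities, and vice versa?
    left_many = any(len(p) > 1 for p in left_partners.values())
    right_many = any(len(p) > 1 for p in right_partners.values())
    if left_many and right_many:
        return "many to many"
    if left_many:
        return "one to many"
    if right_many:
        return "many to one"
    return "one to one"


person_passport = [("Ram", "P1"), ("Raj", "P2")]
department_student = [("CS", "Ram"), ("CS", "Raj")]
student_department = [("Ram", "CS"), ("Raj", "CS")]
student_course = [("Ram", "DBMS"), ("Ram", "Java"), ("Raj", "DBMS")]
```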
EER MODEL
Today the complexity of data is increasing, so it becomes more and more difficult to use the
traditional ER model for database modeling. To reduce this complexity of modeling,
improvements or enhancements were made to the existing ER model so that it can handle complex
applications in a better way.
Enhanced entity-relationship diagrams are advanced database diagrams, very similar to regular ER
diagrams, which represent the requirements and complexities of complex databases. EER is a high-level data model
that incorporates the extensions to the original ER model.
1. Generalization
Generalization is the process of extracting the common properties of a set of entities and creating a
generalized entity from them.
It is a bottom-up approach, in which two or more lower-level entities combine to form a higher-level entity.
Generalization is the reverse process of Specialization.
It defines a general entity type from a set of specialized entity types.
It minimizes the difference between the entities by identifying the common features.
For example, Tiger, Lion and Elephant can all be generalized as Animal.
2. Specialization
Specialization is a process that divides a group of entities into sub-groups based on their
characteristics.
It is a top-down approach, in which one higher-level entity can be broken down into two or more lower-level entities.
It maximizes the difference between the members of an entity by identifying the unique characteristics or
attributes of each member.
It defines one or more subclasses for the superclass and also forms the superclass/subclass relationship.
For example, Employee can be specialized as Developer or Tester, based on what role they play in an
Organization.
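The superclass/subclass relationship maps naturally onto class inheritance, so the Employee example can be sketched in Python. The attributes emp_id, language and tool are assumptions added for the illustration.

```python
class Employee:
    """Superclass: attributes common to every employee (generalization)."""

    def __init__(self, emp_id, name):
        self.emp_id = emp_id
        self.name = name


class Developer(Employee):
    """Subclass: adds an attribute that only developers have (specialization)."""

    def __init__(self, emp_id, name, language):
        super().__init__(emp_id, name)
        self.language = language


class Tester(Employee):
    """Subclass: adds an attribute that only testers have (specialization)."""

    def __init__(self, emp_id, name, tool):
        super().__init__(emp_id, name)
        self.tool = tool


# Hypothetical sample data.
staff = [Developer(1, "Ramesh", "Python"), Tester(2, "Mahesh", "Selenium")]
names = [member.name for member in staff]
```

Reading the classes from the bottom up (Developer and Tester share emp_id and name, so they are grouped under Employee) is generalization; reading them from the top down is specialization.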
3. Aggregation
Aggregation is a process that represents a relationship between a whole object and its component parts.
It abstracts a relationship between objects and views the relationship as an object.
It is a process in which a relationship between two entities is treated as a single entity.
For example, the relationship between College and Course acts as an entity in a relationship with Student.
ANOMALIES IN DBMS
Normalization is a process of organizing the data in a database to avoid data redundancy, insertion anomaly,
update anomaly and deletion anomaly. Let’s discuss anomalies first; then we will discuss normal forms with
examples.
There are three types of anomalies that occur when the database is not normalized: insertion,
update and deletion anomalies. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has four
attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for storing
employee’s address and emp_dept for storing the department details in which the employee works. At some point
of time the table looks like this:
emp_id emp_name emp_address emp_dept
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly:
In the above table we have two rows for employee Ramesh as he belongs to two departments of the
company. If we want to update the address of Ramesh then we have to update it in both rows or the data
will become inconsistent. If the correct address gets updated in one row but not in the other, then as
per the database Ramesh would have two different addresses, which is not correct and would lead to
inconsistent data.
Insert anomaly:
Suppose a new employee joins the company, who is under training and currently not assigned to any
department then we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.
Delete anomaly:
Suppose at some point the company closes the department D890. Deleting the rows that
have emp_dept as D890 would also delete the information of employee Mahesh, since this employee is assigned only
to this department.
To overcome these anomalies we need to normalize the data.
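The update and delete anomalies can be reproduced in a few lines of Python. The rows below are hypothetical stand-ins for the employee table (the ids, addresses and the codes D001 and D002 are made up); they only follow the description above, with Ramesh in two departments and Mahesh only in D890.

```python
# Hypothetical rows of the un-normalized employee table.
employee = [
    {"emp_id": 101, "emp_name": "Ramesh", "emp_address": "Delhi", "emp_dept": "D001"},
    {"emp_id": 101, "emp_name": "Ramesh", "emp_address": "Delhi", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Mahesh", "emp_address": "Agra", "emp_dept": "D890"},
]


def addresses_of(rows, emp_id):
    """All distinct addresses recorded for one employee."""
    return {row["emp_address"] for row in rows if row["emp_id"] == emp_id}


# Update anomaly: only one of Ramesh's two rows is changed.
updated = [dict(row) for row in employee]
updated[0]["emp_address"] = "Mumbai"
conflicting = addresses_of(updated, 101)

# Delete anomaly: closing D890 removes every trace of Mahesh.
after_close = [row for row in employee if row["emp_dept"] != "D890"]
mahesh_known = any(row["emp_id"] == 123 for row in after_close)
```

After the partial update the table records two addresses for the same employee, and after department D890 is closed the table no longer knows that Mahesh exists.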
NORMALIZATION:
Normalization is a process of organizing the data in a database to avoid data redundancy, insertion anomaly,
update anomaly and deletion anomaly. Normalization divides a larger table into smaller tables and links them
using relationships. The normal forms are used to reduce redundancy in the database tables. Normalization rules
are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
For example: If you have a column dob to save the dates of birth of a set of people, then you must not
save the 'names' of some of them in that column along with the 'date of birth' of others. It should hold only
'date of birth' for all the records/rows.
stu_rno  stu_Grpno  stu_name  Stu_Fname  Stu_DOB     Stu_Age  Stu_Fee  stu_Mobileno            stu_Email
1        468        Ram       Ramesh     2/2/1999    23       10000/-  9866578591, 9866500000  [email protected]
2        468        Raj       Rao        17/12/1999  23       10000/-  9254863909              [email protected]
1        467        Bheem     Rakesh     30/7/2001   21       12000/-  8796325400              [email protected]
3        466        Ram       Rajesh     25/8/2000   20       15000/-  7776578591, 8886500000  [email protected]
Table: student
Here the candidate key columns of the student table are stu_Mobileno and stu_Email, and the composite primary
key is {stu_rno, stu_Grpno}.
Our student table already satisfies 3 of the 4 rules of the First Normal Form: all column names are unique and
each column holds data of a single type. But of the 4 students in our table, 2 have more than one mobile
number, and both numbers are stored in a single column. As per the 1st Normal Form, each column must contain
atomic values.
stu_rno  stu_Grpno  stu_name  Stu_Fname  Stu_DOB     Stu_Age  Stu_Fee  stu_Email
1        468        Ram       Ramesh     2/2/1999    23       10000/-  [email protected]
2        468        Raj       Rao        17/12/1999  23       10000/-  [email protected]
1        467        Bheem     Rakesh     30/7/2001   21       12000/-  [email protected]
3        466        Ram       Rajesh     25/8/2000   20       15000/-  [email protected]
Table 1: Student_data
stu_Mobileno  stu_Email
9866578591    [email protected]
9866500000    [email protected]
9254863909    [email protected]
8796325400    [email protected]
7776578591    [email protected]
8886500000    [email protected]
Table 2: Student_contact
By breaking the student table into two tables (student_data and student_contact), a few values get
repeated, but the values in the stu_Mobileno column are now atomic for each record/row. Using the First Normal
Form, data redundancy increases, as there will be many columns with the same data in multiple rows, but each row
as a whole will be unique.
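The split into student_data and student_contact can be sketched in Python. The e-mail addresses below are placeholders invented for the sketch, and only a few columns are kept to keep it short.

```python
# Two rows of the student table; stu_Mobileno is not atomic in the first row.
# The e-mail addresses are hypothetical placeholders.
student = [
    {"stu_Email": "ram@example.com", "stu_name": "Ram",
     "stu_Mobileno": "9866578591, 9866500000"},
    {"stu_Email": "raj@example.com", "stu_name": "Raj",
     "stu_Mobileno": "9254863909"},
]


def to_first_normal_form(rows, key, multi):
    """Move the multi-valued column into its own table, one value per row."""
    data, contact = [], []
    for row in rows:
        data.append({k: v for k, v in row.items() if k != multi})
        for value in row[multi].split(","):
            contact.append({multi: value.strip(), key: row[key]})
    return data, contact


student_data, student_contact = to_first_normal_form(
    student, key="stu_Email", multi="stu_Mobileno"
)
```

student_contact now has one row per mobile number, and stu_Email is repeated in it so that each number can still be traced back to its student.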
What is Dependency?
Let's take an example of a Student table with columns stu_rno, stu_Grpno, stu_name, Stu_Fname,
Stu_DOB, Stu_Age, Stu_Fee and stu_Email. In this table, stu_Email is the primary key and will be unique for
every row, hence we can use stu_Email to fetch any row of data from this table. Even where student names or
roll numbers are the same, if we know the stu_Email we can easily fetch the correct record.
Hence we can say a Primary Key for a table is the column or group of columns (composite key) which can
uniquely identify each record in the table.
If we ask for the Stu_Fname of the student with stu_Email ‘[email protected]’, we can get it. Similarly, if we ask
for the name of the student, we will get it. So all we need is stu_Email; every other column depends on it, or can
be fetched using it. This is Dependency, and we also call it Functional Dependency.
The functional dependency is a relationship that exists between two attributes. It typically exists between
the primary key and non-key attribute within a table.
X → Y
The left side of the FD is known as the determinant; the right side of the FD is known as the dependent.
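Whether a functional dependency X → Y holds in a given table can be checked mechanically: rows that agree on X must also agree on Y. The sketch below uses a few columns of the student table; the function name holds is an assumption. A check like this can only show that the sample rows do not violate the dependency, not that the dependency is true in general.

```python
def holds(rows, determinant, dependent):
    """True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # Remember the first Y seen for each X; any different Y breaks the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True


student = [
    {"stu_rno": 1, "stu_Grpno": 468, "stu_name": "Ram", "Stu_Fname": "Ramesh"},
    {"stu_rno": 2, "stu_Grpno": 468, "stu_name": "Raj", "Stu_Fname": "Rao"},
    {"stu_rno": 1, "stu_Grpno": 467, "stu_name": "Bheem", "Stu_Fname": "Rakesh"},
    {"stu_rno": 3, "stu_Grpno": 466, "stu_name": "Ram", "Stu_Fname": "Rajesh"},
]

key_determines_name = holds(student, ["stu_rno", "stu_Grpno"], ["stu_name"])
rno_determines_name = holds(student, ["stu_rno"], ["stu_name"])
name_determines_father = holds(student, ["stu_name"], ["Stu_Fname"])
```

The pair {stu_rno, stu_Grpno} determines stu_name, but stu_rno alone does not, because roll number 1 appears in two different groups.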
In the above table we use stu_Email to find all other data of a student. Also, stu_rno and stu_Grpno
together form a Candidate Key for this table, which can be the Primary Key.
1. For a table to be in the Second Normal form, it should be in the First Normal form and it should not have
Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in the table depends only on a
part of the primary key and not on the complete primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute which is causing partial
dependency, and move it to some other table where it fits in well.
4. Finally we get three tables: student_data, student_contact and Fee_details.
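Removing a partial dependency can be sketched in Python. The sketch assumes that Stu_Fee depends only on stu_Grpno, which is one part of the composite key {stu_rno, stu_Grpno}; the sample rows are consistent with this, but it is an assumption made for the illustration.

```python
student = [
    {"stu_rno": 1, "stu_Grpno": 468, "stu_name": "Ram", "Stu_Fee": 10000},
    {"stu_rno": 2, "stu_Grpno": 468, "stu_name": "Raj", "Stu_Fee": 10000},
    {"stu_rno": 1, "stu_Grpno": 467, "stu_name": "Bheem", "Stu_Fee": 12000},
    {"stu_rno": 3, "stu_Grpno": 466, "stu_name": "Ram", "Stu_Fee": 15000},
]


def remove_partial_dependency(rows, part_of_key, dependent):
    """Move `dependent` into a new table keyed by `part_of_key`."""
    remaining = [{k: v for k, v in row.items() if k != dependent} for row in rows]
    moved = {}
    for row in rows:
        moved[row[part_of_key]] = row[dependent]  # one entry per key value
    moved_rows = [{part_of_key: k, dependent: v} for k, v in moved.items()]
    return remaining, moved_rows


student_data, fee_details = remove_partial_dependency(student, "stu_Grpno", "Stu_Fee")
```

The fee for group 468 is now stored once in fee_details instead of once per student, and student_data keeps stu_Grpno so the two tables can still be joined.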
Here all columns depend on both stu_rno and stu_Grpno. What about the Stu_Age column? Does it depend on
the primary key of our student_data table?
Well, the column Stu_Age depends on Stu_DOB. For example, if Stu_DOB is updated then
Stu_Age changes as well. But Stu_DOB is just another column in the student_data table. It is not a primary key or
even a part of the primary key, and Stu_Age depends on it.
This is Transitive Dependency: a non-prime attribute depends on other non-prime attributes rather
than depending upon the prime attributes or primary key.
Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also known as 3.5 Normal
Form.
Well, in the student_data table stu_rno and stu_Grpno together form the primary key, because
using stu_rno and stu_Grpno we can find all the columns of the student_data table. This table satisfies the 1st
Normal Form because all the values are atomic, column names are unique and all the values stored in a particular
column are of the same domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency. And there is
no Transitive Dependency, hence the table also satisfies the 3rd Normal Form. This table is also in Boyce-Codd
Normal Form.