DBMS Notes
What is Data?
Data is a collection of distinct small units of information. It can take a variety of forms, such as text,
numbers, media, or bytes, and it can be stored on paper, in electronic memory, and so on.
The word 'data' originates from the word 'datum', which means 'a single piece of information'. 'Data' is
the plural of 'datum'.
In computing, data is information that has been translated into a form suitable for efficient movement
and processing. Data is interchangeable.
What is a Database?
A database is an organized collection of data, so that it can be easily accessed and managed.
You can organize data into tables, rows, columns, and index it to make it easier to find relevant
information.
Database designers build a database so that a single set of software programs provides access to the
data for all users.
The main purpose of the database is to operate a large amount of information by storing, retrieving,
and managing data.
Many dynamic websites on the World Wide Web are now backed by databases. For example, a site that
checks the availability of rooms in a hotel is a dynamic website that uses a database.
There are many databases available like MySQL, Sybase, Oracle, MongoDB, Informix, PostgreSQL, SQL
Server, etc.
Modern databases are managed by the database management system (DBMS).
SQL or Structured Query Language is used to operate on the data stored in a database.
SQL is based on relational algebra and tuple relational calculus.
In diagrams, a database is conventionally drawn as a cylinder.
Evolution of Databases
Databases have evolved over more than 50 years, from flat-file systems to relational and
object-relational systems, passing through several generations along the way.
Cloud database
A cloud database lets you store, manage, and retrieve structured and unstructured data via a
cloud platform. The data is accessible over the Internet. Cloud databases are also called database as a
service (DBaaS) because they are offered as a managed service.
Some popular cloud options are: AWS (Amazon Web Services), Snowflake Computing, Oracle Database
Cloud Services, Microsoft SQL Server, and Google Cloud Spanner.
NoSQL Database
A NoSQL database is an approach to database design that can accommodate a wide variety of
data models. NoSQL stands for "not only SQL." It is an alternative to traditional relational databases, in
which data is placed in tables and the data schema is carefully designed before the database is built.
NoSQL databases are useful for large sets of distributed data.
Some examples of NoSQL database systems, with their categories, are:
o MongoDB, CouchDB, Cloudant (document-based)
o Memcached, Redis, Coherence (key-value store)
o HBase, Bigtable, Accumulo (tabular, also called wide-column)
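The difference between the key-value and document categories can be sketched with plain Python
structures. This is only an illustration of the data models, not the API of any of the products listed
above; the keys, field names, and values are made up for the example.

```python
# Key-value model: an opaque value looked up by a single key.
key_value_store = {
    "session:42": "user=ajeet;expires=3600",
}

# Document model: each record is a self-describing nested structure,
# and two documents in the same collection may have different fields.
document_collection = [
    {"_id": 1, "name": "Ajeet", "courses": ["B.Tech"]},
    {"_id": 2, "name": "Aryan", "address": {"city": "Delhi"}},
]

# A key-value lookup needs the exact key.
session = key_value_store["session:42"]

# A document lookup can filter on a field inside the document.
with_address = [doc["name"] for doc in document_collection if "address" in doc]

print(session)
print(with_address)
```

The point of the sketch is that no schema is declared up front: the second document simply carries a
field that the first one lacks.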
Advantages of NoSQL
High Scalability
NoSQL databases can handle very large amounts of data because they scale out. As the data grows, a
NoSQL database scales to handle it efficiently.
High Availability
NoSQL databases support automatic replication. Replication makes the data highly available because, if
a failure occurs, the data can be restored from a replica to the previous consistent state.
Disadvantages of NoSQL
Open source
Many NoSQL databases are open source, and there is no reliable common standard for NoSQL yet.
Management challenge
Data management in NoSQL is much more complicated than in relational databases. NoSQL systems can be
challenging to install and even harder to manage day to day.
GUI is not available
GUI tools for NoSQL databases are not widely available in the market.
Backup
Backup is a major weak point for NoSQL databases. Some databases, like MongoDB, have no powerful
approaches for data backup.
Graph Databases
A graph database is a type of NoSQL database that stores data as a graph of nodes and edges. A node
represents an entity, and each edge represents a relationship between two nodes. Every node in a graph
database has a unique identifier.
Graph databases are well suited to searching for relationships in data because the relationships are
stored explicitly.
Graph databases are very useful when the data contains complex relationships and a dynamic
schema.
They are mostly used in areas such as supply chain management and identifying the source of IP telephony.
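The node-and-edge structure described above can be sketched in a few lines of Python. This is a toy
in-memory model, not a real graph database; the node identifiers, labels, and relationship names are
hypothetical.

```python
# Nodes keyed by a unique identifier; each node carries its own properties.
nodes = {
    "s1": {"label": "Supplier", "name": "Acme"},
    "w1": {"label": "Warehouse", "name": "North"},
    "r1": {"label": "Retailer", "name": "Shop A"},
}

# Each edge connects two nodes and names the relationship between them.
edges = [
    ("s1", "SHIPS_TO", "w1"),
    ("w1", "DELIVERS_TO", "r1"),
]


def neighbours(node_id):
    """Return (relationship, target) pairs for the edges leaving node_id."""
    return [(rel, dst) for src, rel, dst in edges if src == node_id]


def reachable(start):
    """Return every node id reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for _, dst in neighbours(current):
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


print(neighbours("s1"))
print(sorted(reachable("s1")))
```

Following edges from a node is the basic operation that makes relationship queries cheap in this model.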
In a database, even the smallest piece of information is data. For example,
a student is data, a roll number is data, and an address, height, weight, and
marks are all data. In brief, anything recorded about the living and non-living objects
in this world is data.
One of the primary aims of a database is to supply users with an abstract view of data,
hiding certain details of how the data is stored and manipulated. Therefore, the starting
point for the design of a database should be an abstract and general description of the
information needs of the organization that is to be represented in the database. You
will also need an environment in which to store the data and make it work as a database.
A database environment consists of hardware, software, people, techniques, and data. The
hardware means the computers and computer peripherals that are used to manage a
database, and the software means everything from the operating system (OS) to the
application programs, including database management software such as MS Access or SQL
Server. The people are those who administer and use the system. The techniques are the
rules, concepts, and instructions given to both the people and the software. The data is
the group of facts and information held within the database environment.
o A database management system (DBMS) is software used to manage a database. For
example, MySQL and Oracle are very popular database management systems used in many different
applications.
o A DBMS provides an interface for operations such as creating a database, creating tables in it,
storing data, updating data, and a lot more.
o It provides protection and security for the database. When there are multiple users, it also maintains
data consistency.
A DBMS allows users to perform the following tasks:
o Data Definition: Creating, modifying, and removing the definitions that describe the
organization of data in the database.
o Data Update: Inserting, modifying, and deleting the actual data in the
database.
o Data Retrieval: Retrieving data from the database so that applications can use it
for various purposes.
o User Administration: Registering and monitoring users, maintaining data integrity,
enforcing data security, dealing with concurrency control, monitoring performance, and
recovering information corrupted by unexpected failure.
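The first three tasks can be tried out with Python's built-in sqlite3 module. The choice of SQLite and
the student table with its columns are assumptions made for this sketch. User administration is not
shown, because SQLite has no user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create the structure that will hold the data.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Data update: insert, modify, and delete actual rows.
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Ajeet", 24), (2, "Aryan", 20), (3, "Mahesh", 21)],
)
conn.execute("UPDATE student SET age = 25 WHERE id = 1")
conn.execute("DELETE FROM student WHERE id = 3")
conn.commit()

# Data retrieval: read the data back for use by an application.
rows = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
print(rows)
```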
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the information.
o It can provide a clear and logical view of the processes that manipulate data.
o It contains automatic backup and recovery procedures.
o It supports the ACID properties (atomicity, consistency, isolation, durability), which keep data in a
consistent state in case of failure.
o It can reduce the complexity of relationships between data.
o It supports the manipulation and processing of data.
o It provides security of data.
o It lets the database be viewed from different viewpoints according to the requirements of the user.
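As an illustration of the atomicity part of ACID, the sketch below uses Python's sqlite3 module (an
assumed choice of DBMS) with a hypothetical account table. A transfer that would overdraw an account
fails halfway through, and the half that already ran is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(amount):
    """Move amount from alice to bob as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'bob'", (amount,)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'alice'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(500)  # alice cannot cover this, so the CHECK constraint fails
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

Although the credit to bob ran first, it does not survive the failed transfer, so the data stays in its
previous consistent state.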
Advantages of DBMS
o Controls database redundancy: It can control data redundancy because it stores all the data in a
single database, so each item needs to be recorded only once.
o Data sharing: The authorized users of an organization can share the data among multiple users.
o Easy maintenance: The database is easy to maintain because of the centralized nature of the database system.
o Reduced time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems that automatically back up data
against hardware and software failures and restore the data if required.
o Multiple user interfaces: It provides different types of user interfaces, such as graphical user
interfaces and application program interfaces.
Disadvantages of DBMS
o Cost of hardware and software: Running DBMS software requires a high-speed processor and a large
memory size.
o Size: It occupies a large amount of disk space and memory to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a large impact because, in most organizations,
all the data is stored in a single database. If the database is damaged by a power failure or database
corruption, the data may be lost forever.
Relational Database
1970 - Present: This is the era of the relational database and database management. In 1970, the
relational model was proposed by E.F. Codd.
The relational database model has two main terms, instance and schema.
An instance is a table with rows and columns.
A schema specifies the structure, such as the name of the relation and the name and type of each column.
This model uses mathematical concepts such as set theory and predicate logic.
The first internet database application was created in 1995.
During the era of the relational database, many more models were introduced, such as the object-oriented
model and the object-relational model.
The Relational Database Management System (RDBMS) has become the leading data-processing software in
use today. This software represents the second generation of DBMSs and is based on the relational data
model proposed by E. F. Codd in 1970.
In the relational model, all data is logically structured within relations, i.e., tables. Each
relation has a name and is formed from named attributes, or columns, of data. Each tuple, or row, holds one value
per attribute. The greatest strength of the relational model is its simple logical structure. Behind this
simple structure is a sound theoretical foundation that the first generation of DBMSs lacked.
The relational model was designed with the following objectives:
• To allow a high degree of data independence. Application programs must not be affected by
alterations to the internal data representation, particularly by changes to file organizations or access
paths.
• To provide substantial grounds for dealing with data semantics, reliability, and redundancy
problems. In particular, Codd's theory for the relational model introduced the concept of
normalized relations, that is, relations that have no repeating groups. The process of producing them
is called normalization.
• To allow the expansion of set-oriented data manipulation languages.
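To make the idea of a repeating group concrete, here is a minimal Python sketch. The student records
and phone numbers are hypothetical. It shows only the removal of a repeating group, not the full
normalization process.

```python
# Unnormalized records: the "phones" field is a repeating group,
# so one cell holds several values.
unnormalized = [
    {"student_id": 1, "name": "Ajeet", "phones": ["111", "222"]},
    {"student_id": 2, "name": "Aryan", "phones": ["333"]},
]

# Normalized form: one relation for students and one for phone numbers,
# where every cell holds exactly one value.
students = [(r["student_id"], r["name"]) for r in unnormalized]
student_phones = [
    (r["student_id"], phone) for r in unnormalized for phone in r["phones"]
]

print(students)
print(student_phones)
```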
DATABASE SCHEMA
When you talk about a database, you must distinguish between the database schema, which is the
logical blueprint of the database, and the database instance, which is a snapshot of the data in the
database at a given instant in time. The concept of a relation corresponds to the programming-language
notion of a variable, whereas the concept of a relation schema corresponds to the programming-language
notion of a type definition. In other words, a database schema is a skeletal structure that
represents the logical view of the complete database. It describes how the data is organized and how
the relations among the data are associated, and it formulates all the constraints that are to be
applied to the data.
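The distinction between schema and instance can be observed directly. The sketch below assumes SQLite
through Python's sqlite3 module and a hypothetical student table. Inserting a row changes the instance
but leaves the schema untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)")


def schema():
    """Return (column name, declared type) pairs: the blueprint of the table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance():
    """Return the rows currently stored: a snapshot at this moment."""
    return conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()


schema_before = schema()
instance_before = instance()
conn.execute("INSERT INTO student VALUES (1, 'Ajeet', 24)")
schema_after = schema()
instance_after = instance()

print(schema_before == schema_after)
print(instance_before, instance_after)
```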
How it works
In an RDBMS, data is represented as tuples (rows). The relational database is the most commonly
used type of database. It contains several tables, and each table has its own primary key.
Because the data is organized as a collection of tables, it can be accessed easily in an RDBMS.
What is a table/relation?
Everything in a relational database is stored in the form of relations. An RDBMS uses tables
to store data. A table is a collection of related data entries and contains rows and columns to store data.
Each table represents a real-world object, such as a person, place, or event, about which information
is collected. The organized collection of data in relational tables is known as the logical view of the
database.
o Each relation has a unique name by which it is identified in the database.
o A relation does not contain duplicate tuples.
o The tuples of a relation have no specific order.
o All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one value.
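These properties match the behaviour of a mathematical set, which can be sketched with a Python set of
tuples. The attribute order (id, name, age, course) is an assumption made for the example.

```python
# A relation modelled as a set of tuples (id, name, age, course).
student = {
    (1, "Ajeet", 24, "B.Tech"),
    (2, "Aryan", 20, "C.A"),
}

# Adding a tuple that is already present changes nothing: no duplicates.
student.add((1, "Ajeet", 24, "B.Tech"))
size_after_duplicate = len(student)

# Two relations with the same tuples are equal regardless of listing order.
same_relation = student == {
    (2, "Aryan", 20, "C.A"),
    (1, "Ajeet", 24, "B.Tech"),
}

print(size_after_duplicate, same_relation)
```

Note that a table in an SQL database can hold duplicate rows unless a key or uniqueness constraint
forbids it, so the set behaviour shown here describes the formal relational model.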
A table is the simplest example of data stored in RDBMS.
Let's see the example of the student table.
ID   Name     Age   Course
1    Ajeet    24    B.Tech
2    Aryan    20    C.A
3    Mahesh   21    BCA
4    Ratan    22    MCA
5    Vimal    26    BSC
What is a column/attribute?
A column is a vertical entity in the table which contains all information associated with a specific field
in a table. For example, "name" is a column in the above table which contains all information about a
student's name.
Properties of an Attribute:
o Every attribute of a relation must have a name.
o Null values are permitted for the attributes.
o Default values can be specified for an attribute; the default is inserted automatically if no other
value is specified for the attribute.
o Attributes that uniquely identify each tuple of a relation are the primary key.
Name
Ajeet
Aryan
Mahesh
Ratan
Vimal
Degree:
The total number of attributes that comprise a relation is known as the degree of the table.
For example, the student table has 4 attributes, and its degree is 4.
ID   Name     Age   Course
1    Ajeet    24    B.Tech
2    Aryan    20    C.A
3    Mahesh   21    BCA
4    Ratan    22    MCA
5    Vimal    26    BSC
Cardinality:
The total number of tuples at any one time in a relation is known as the table's cardinality. The relation
whose cardinality is 0 is called an empty table.
For example, the student table has 5 rows, and its cardinality is 5.
ID   Name     Age   Course
1    Ajeet    24    B.Tech
2    Aryan    20    C.A
3    Mahesh   21    BCA
4    Ratan    22    MCA
5    Vimal    26    BSC
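Degree and cardinality can be computed for the student table with a short sqlite3 sketch. SQLite is an
assumed choice, and the column names id, name, age, and course are assumptions as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ajeet", 24, "B.Tech"),
        (2, "Aryan", 20, "C.A"),
        (3, "Mahesh", 21, "BCA"),
        (4, "Ratan", 22, "MCA"),
        (5, "Vimal", 26, "BSC"),
    ],
)

# Degree: the number of attributes (columns) in the relation.
degree = len(conn.execute("PRAGMA table_info(student)").fetchall())

# Cardinality: the number of tuples (rows) currently in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(degree, cardinality)
```

Inserting or deleting rows changes the cardinality, while the degree changes only when the table
definition itself is altered.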
Domain:
The domain refers to the possible values each attribute can contain. It can be specified using standard
data types such as integers, floating-point numbers, etc. For example, an attribute named Marital_Status
may be limited to the values married or unmarried.
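One common way to restrict an attribute to its domain is a CHECK constraint. The sketch below assumes
SQLite through Python's sqlite3 module; the person table and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"
    " marital_status TEXT NOT NULL"
    " CHECK (marital_status IN ('married', 'unmarried')))"
)


def try_insert(person_id, status):
    """Return True if the value lies in the attribute's domain, else False."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, status))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, "married")
rejected = try_insert(2, "unknown")
stored = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]

print(accepted, rejected, stored)
```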
NULL Values
A NULL value in a table indicates that the field was left blank when the record was created. It is
different from a field that contains zero or a field that contains spaces.
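The difference between NULL, zero, and an empty string can be checked with a short sqlite3 sketch
(SQLite and the student table are assumptions for the example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Ajeet')")  # age left blank
conn.execute("INSERT INTO student VALUES (2, '', 0)")  # empty string and zero

# NULL comes back as Python's None, distinct from 0 and from ''.
ages = dict(conn.execute("SELECT id, age FROM student"))

# NULL must be tested with IS NULL; a comparison with "= NULL" never matches.
is_null_ids = [r[0] for r in conn.execute("SELECT id FROM student WHERE age IS NULL")]
eq_null_ids = [r[0] for r in conn.execute("SELECT id FROM student WHERE age = NULL")]

print(ages, is_null_ids, eq_null_ids)
```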
Data Integrity
Each RDBMS supports the following categories of data integrity:
Entity integrity: It specifies that there should be no duplicate rows in a table.
Domain integrity: It enforces valid entries for a given column by restricting the type, the format, or the
range of values.
Referential integrity: It specifies that rows which are used by other records cannot be deleted.
User-defined integrity: It enforces specific business rules defined by users. These rules are
different from entity, domain, or referential integrity.
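Three of these categories can be seen being enforced in a small sqlite3 sketch. The department and
employee tables, the age rule, and the violates helper are all hypothetical. Note that SQLite only
enforces foreign keys after PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " age INTEGER CHECK (age >= 18),"
    " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 30, 10)")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: a second row with the same primary key is rejected.
entity = violates("INSERT INTO employee VALUES (1, 40, 10)")
# Domain integrity: a value outside the allowed range is rejected.
domain = violates("INSERT INTO employee VALUES (2, 12, 10)")
# Referential integrity: a row still used by another record cannot be deleted.
referential = violates("DELETE FROM department WHERE dept_id = 10")

print(entity, domain, referential)
```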
Keys
o Keys play an important role in the relational database.
o A key is used to uniquely identify any record or row of data in a table. Keys are also used to establish
and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the
PERSON table, passport_number, license_number, and SSN are keys since they are unique for each person.
Types of keys:
1. Primary key
o It is the key chosen to uniquely identify one and only one instance of an entity. An entity can have
multiple keys, as we saw in the PERSON table. The most suitable of those keys becomes the
primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. We could
instead select License_Number or Passport_Number as the primary key, since they are also
unique.
o For each entity, the choice of primary key depends on the requirements and on the developers.
2. Candidate key
o A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple.
o The primary key is chosen from among the candidate keys. The remaining unique attributes are also
candidate keys, and they are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The remaining unique
attributes, like SSN, Passport_Number, License_Number, etc., are considered candidate keys.
3. Super Key
A super key is a set of attributes that can uniquely identify a tuple. A super key is a superset of a
candidate key.
For example: In the EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names of two
employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also
be a key.
The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
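The relationship between super keys and candidate keys can be sketched in Python. The rows below are
hypothetical sample data. Strictly, a key is a property of the schema rather than of one set of rows, so
a check like this can only show which attribute sets are unique in the sample.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows; two employees share the same name.
ATTRS = ("employee_id", "employee_name", "ssn")
ROWS = [
    (1, "Ravi", "S-100"),
    (2, "Ravi", "S-200"),
    (3, "Meena", "S-300"),
]


def is_super_key(attrs):
    """True if no two rows agree on every attribute in attrs."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(ROWS)


def candidate_keys():
    """Return the minimal super keys: none contains a smaller super key."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            if is_super_key(combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys


print(is_super_key(("employee_name",)))
print(is_super_key(("employee_id", "employee_name")))
print(candidate_keys())
```

Here (employee_id, employee_name) is a super key but not a candidate key, because employee_id alone is
already unique.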
4. Foreign key
o A foreign key is a column of a table that points to the primary key of another table.
o Every employee works in a specific department in a company, and employee and department are two
different entities. So we don't store the department's information in the employee table. Instead, we
link the two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE
table.
o In the EMPLOYEE table, Department_Id is the foreign key, and the two tables are now related.
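The EMPLOYEE and DEPARTMENT link can be sketched with Python's sqlite3 module. SQLite is an assumed
choice, and the sample rows are hypothetical. SQLite leaves foreign key enforcement off by default, so
the sketch turns it on first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " department_id INTEGER REFERENCES department(department_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (100, 'Ravi', 1)")

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (101, 'Meena', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets the two tables be joined back together.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON d.department_id = e.department_id"
).fetchall()

print(orphan_rejected, joined)
```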
5. Alternate key
There may be one or more attributes or combinations of attributes that uniquely identify each tuple in
a relation. These attributes or combinations of attributes are called the candidate keys. One key is
chosen as the primary key from these candidate keys, and the remaining candidate keys, if any, are
termed alternate keys. In other words, the number of alternate keys is the number of
candidate keys minus one (the primary key). An alternate key may or may not exist. If there is only one
candidate key in a relation, the relation has no alternate key.
For example, the employee relation has two attributes, Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No,
acts as the alternate key.
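In SQL, an alternate key is commonly declared with a UNIQUE constraint. The sketch below assumes SQLite
through Python's sqlite3 module, and the PAN values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"  # chosen primary key
    " pan_no TEXT NOT NULL UNIQUE)"      # alternate key: unique, but not primary
)
conn.execute("INSERT INTO employee VALUES (1, 'PAN-0001')")

try:
    conn.execute("INSERT INTO employee VALUES (2, 'PAN-0001')")  # repeats pan_no
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Either key locates the same single row.
by_primary = conn.execute("SELECT * FROM employee WHERE employee_id = 1").fetchall()
by_alternate = conn.execute("SELECT * FROM employee WHERE pan_no = 'PAN-0001'").fetchall()

print(duplicate_rejected, by_primary == by_alternate)
```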
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is
also known as a concatenated key.
For example, in the employee relation, we assume that an employee may be assigned multiple roles, and
an employee may work on multiple projects simultaneously. So the primary key will be composed of all
three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination. These attributes act as a
composite key since the primary key comprises more than one attribute.
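A composite primary key over these three attributes might be declared as follows. The sketch assumes
SQLite through Python's sqlite3 module; the table name assignment and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " emp_id INTEGER NOT NULL,"
    " emp_role TEXT NOT NULL,"
    " proj_id INTEGER NOT NULL,"
    " PRIMARY KEY (emp_id, emp_role, proj_id))"
)

# The same employee may appear many times as long as the full triple differs.
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?, ?)",
    [(1, "developer", 10), (1, "tester", 10), (1, "developer", 20)],
)

try:
    conn.execute("INSERT INTO assignment VALUES (1, 'developer', 10)")  # repeats a triple
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
print(repeat_rejected, row_count)
```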
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys are created when
the primary key is large and complex and has no relationship with many other relations. The data values
of artificial keys are usually numbered in serial order.
For example, the primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the
employee relation. So it would be better to add a new virtual attribute to identify each tuple in the
relation uniquely.
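One way to add such an attribute is an automatically numbered column, sketched here with SQLite through
Python's sqlite3 module. The table name assignment and the column name assignment_id are hypothetical.
The original three attributes are kept unique with a separate constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # artificial key
    " emp_id INTEGER NOT NULL,"
    " emp_role TEXT NOT NULL,"
    " proj_id INTEGER NOT NULL,"
    " UNIQUE (emp_id, emp_role, proj_id))"               # natural key stays unique
)
conn.executemany(
    "INSERT INTO assignment (emp_id, emp_role, proj_id) VALUES (?, ?, ?)",
    [(1, "developer", 10), (1, "tester", 10)],
)

# The database numbers the rows serially, so other tables can refer to one column.
ids = [
    r[0]
    for r in conn.execute("SELECT assignment_id FROM assignment ORDER BY assignment_id")
]
print(ids)
```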
One to Many Relationship: It is used to create a relationship between two tables. Any single row of
the first table can be related to one or more rows of the second table, but each row of the second table
can relate to only one row in the first table. Viewed from the second table, it is also known as a many
to one relationship.
[Figure: representation of a one-to-many relationship]
Relational database                                     File system

A relational database stores and arranges data in       It stores the data as files.
tabular form, as rows and columns.

Data normalization is available in a relational         It does not support normalization.
database.

Values are stored in tables that require a primary      Generally, it stores the data in hierarchical or
key to identify the data in the database.               navigational form.

It is designed to handle a huge collection of data      It is designed to handle a small collection of
and multiple users.                                     data files for a single user.

It uses integrity constraint rules and supports the     It does not follow any integrity constraint rules,
ACID properties.                                        nor does it use any security to protect the data
                                                        from manipulation.

Stored data can be accessed because there is a          There is no relationship between data values or
relationship between the tables and their attributes.   tables stored in files.