DATABASES
INTRODUCTION
A database is an organized collection of related data stored in a way that it can be easily retrieved
and manipulated. A telephone directory, a library catalogue and a class register are examples of
manual or paper-based database systems. A paper-based database requires much paper as the
database becomes larger, making it difficult to manipulate the database. The problems caused
by paper-based systems are solved by the development of computer-based systems.
The capacity of computers to store large amounts of data and their ability to retrieve it quickly
and efficiently make them ideal for creating and using electronic or computerized databases. A
computerized database is a collection of related files stored in digital form. Computerized
databases are created using database software called database management systems (DBMS).
A database management system is software used to create and manage a database. MySQL and
Oracle, for example, are popular database management systems used in many different
applications.
A DBMS provides an interface for operations such as creating a database, creating tables in it,
storing data and updating data.
It also provides protection and security for the database and, where there are multiple users,
maintains data consistency.
SIR MEEK
Characteristics of DBMS
➢ It uses a digital repository established on a server to store and manage the information.
➢ It can provide a clear and logical view of the process that manipulates data.
➢ It contains automatic backup and recovery procedures.
➢ It supports the ACID properties, which keep data in a consistent state in case of failure.
➢ It can simplify complex relationships between data.
➢ It supports the manipulation and processing of data.
➢ It provides security of data.
➢ It can present the database from different viewpoints according to the requirements of the
user.
Advantages of DBMS
➢ Controls data redundancy: It can control data redundancy because all the data is stored in
a single database, so each item needs to be recorded only once.
➢ Data sharing: Authorized users of an organization can share the data among themselves.
➢ Easy maintenance: It is easy to maintain because of the centralized nature of the
database system.
➢ Reduced time: It reduces development time and maintenance needs.
➢ Backup: It provides backup and recovery subsystems which automatically back up data to
guard against hardware and software failures and restore the data if required.
➢ Multiple user interfaces: It provides different types of user interfaces, such as graphical
user interfaces and application program interfaces.
Disadvantages of DBMS
➢ Cost of hardware and software: A fast processor and a large memory are required to run
DBMS software.
➢ Size: It occupies a large amount of disk space and needs a large memory to run efficiently.
➢ Complexity: A database system creates additional complexity and requirements.
➢ Higher impact of failure: A failure has a large impact because, in most organizations, all
the data is stored in a single database. If the database is damaged by a power failure or
by database corruption, the data may be lost forever.
SOME KEY TERMS IN DATABASE MANAGEMENT SYSTEM
A Record is a single row of data in a DBMS. It contains all of the information for a single entity,
such as a customer or a product. In a customer database, each record would represent a single
customer and contain all of the information about that customer. Records are accessed by using
the primary key, which is a unique identifier for each record. So, in our example, the customer ID
would be the primary key, and it would be used to find a specific customer record in the database.
A Field is a column in a table that contains a specific type of data. For example, a customer
database might have fields for the customer name, customer address, customer ID, and so on.
Each field can only contain one type of data, and the field name must be unique within the table.
Fields are used to store data in a DBMS, and they are the building blocks of tables.
An Attribute is a named property of an entity, such as a customer's name. In a table, each
attribute corresponds to a field (column), and the value that a record holds for that attribute is
stored in that field. Different attributes can have different data types. For example, the
customer name might be a text field, while the customer ID would be a number field. Attributes
are used to describe and organize the data stored in a DBMS.
A primary key is a special type of field in a database management system (DBMS) that uniquely
identifies each record in a table. Every table in a DBMS must have a primary key, and no two
records in the table can have the same primary key value. A primary key is often used to create
relationships between tables in the database. For example, in a customer database, the primary
key for the customer table might be the customer ID. This would allow us to link the customer
table to other tables, such as the order table, which might have a foreign key that references the
customer ID.
A composite key is a key that is made up of two or more fields. Composite keys are useful when
there isn't a single field that is unique for every record. For example, in a table that records
the grade each student obtains in each subject, neither the student ID nor the subject is unique
on its own, but the combination of student ID and subject identifies each record uniquely.
A foreign key is a field in one table that references the primary key of another table. For example,
in a customer database, the order table might have a foreign key that references the customer
ID field in the customer table. This allows us to link the two tables and retrieve information from
both tables using the foreign key. Foreign keys are essential for creating relationships between
tables in a DBMS.
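The customer and order example can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and sample values are assumptions, not part of these notes, and SQLite is used simply because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key: unique for each record
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
        item        TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO customer_order VALUES (100, 1, 'Notebook')")


def insert_fails(sql):
    """Return True if the DBMS rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second customer with the same primary key value is rejected.
duplicate_key_rejected = insert_fails("INSERT INTO customer VALUES (1, 'Bob')")
# An order that points at a customer who does not exist is rejected.
orphan_order_rejected = insert_fails("INSERT INTO customer_order VALUES (101, 99, 'Pen')")

# The foreign key lets us retrieve information from both tables at once.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
""").fetchall()
```

Both rejected inserts show the DBMS, rather than the application, protecting the uniqueness of the primary key and the validity of the foreign key.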
MODULE 2: DATABASE DESIGN AND DATA MODELING
TYPES OF DATABASES
CENTRALIZED DATABASE
It is a type of database that stores data in one central database system. It allows users to
access the stored data from different locations through several applications. These
applications include an authentication process so that users access the data securely. An
example of a centralized database is a central library that holds a central database for every
library in a college or university.
DISTRIBUTED DATABASE
Unlike a centralized database system, in distributed systems, data is distributed among different
database systems of an organization. These database systems are connected via communication
links, which help the end-users to access the data easily. Examples of distributed databases
are Apache Cassandra, HBase, Ignite, etc.
We can further divide a distributed database (DDB) system into:
➢ Homogeneous DDB: The database systems run on the same operating system, use the
same application process and use the same hardware devices.
➢ Heterogeneous DDB: The database systems run on different operating systems, under
different application procedures, and use different hardware devices.
FLAT-FILE DATABASE
A flat file database is a single-table database, with separate copies of data in each part of the
business. An example is a phone directory. The problems encountered with flat file databases
are:
➢ Data duplication: data is repeated and hence stored many times. This wastes disk space
and slows down queries.
➢ Maintenance is difficult, as every occurrence of a piece of data needs to be updated if
its value changes.
➢ More manual data entry is required, and therefore there is a greater likelihood of errors
when data is being entered.
The solution to these problems is to divide the data into logical groups and store the data in
multiple tables, then connect (relate) the tables to each other. This results in a relational
database.
Transactions in a relational database are expected to satisfy four properties known as the ACID
properties, where:
A means Atomicity: A transaction either completes in full or has no effect at all. It follows the
'all or nothing' strategy. For example, a transaction will either be committed or will abort.
C means Consistency: A transaction must take the database from one valid state to another, so
that all rules on the data still hold after the operation. For example, when money is transferred
between two accounts, the total of the two balances should be the same before and after the
transaction.
I means Isolation: Several users can access the database at the same time, so concurrent
transactions must be kept isolated from one another. For example, when multiple transactions
occur at the same time, the effects of one transaction should not be visible to the other
transactions until it is committed.
D means Durability: Once an operation completes and the data is committed, the changes
remain permanent.
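Atomicity and consistency can be illustrated with a small transfer between two accounts, again using Python's sqlite3 module. The account table, the names A and B and the amounts are made up for this sketch; the rule that a balance may not go below zero is likewise an assumption chosen to force a failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule: no negative balances
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount from source to target; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "A", "B", 30)        # succeeds: both updates are committed
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)   # the debit breaks the rule, so the credit is undone
after_failed = balances(conn)
```

In the failed transfer the credit to B has already been executed when the debit from A is rejected, yet the rollback removes it, so the balances (and their total) are the same as before the attempt.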
NOSQL DATABASE
NoSQL (non-SQL, or "not only SQL") is a type of database that is used for storing a wide range
of data sets. It is not a relational database, as it stores data not only in tabular form but in
several different ways. It came into existence when the demand for building modern
applications increased, and a wide variety of NoSQL database technologies appeared in
response to these demands. We can further divide NoSQL databases into the following four types:
Key-value storage: It is the simplest type of database storage, where every single item is
stored as a key (or attribute name) together with its value.
Document-oriented database: A type of database used to store data as JSON-like documents. It
helps developers to store data using the same document-model format as is used in the
application code.
Graph databases: These are used for storing vast amounts of data in a graph-like structure.
Most commonly, social networking websites use graph databases.
Wide-column stores: These are similar to the tables of relational databases, except that data
is stored together by column, instead of being stored by row.
CLOUD DATABASE
A cloud database is a type of database in which data is stored in a virtual environment and
which runs on a cloud computing platform. The platform provides users with various cloud
computing services for accessing the database. There are numerous cloud platforms; some
options are:
➢ Amazon Web Services (AWS)
➢ Microsoft Azure
➢ Kamatera
➢ phoenixNAP
➢ ScienceSoft
➢ Google Cloud SQL, etc.
RELATIONAL DATABASE MANAGEMENT SYSTEM
Many widely used database management systems, such as MS SQL Server, IBM DB2, Oracle,
MySQL and Microsoft Access, are relational database management systems.
It is called a Relational Database Management System (RDBMS) because it is based on the
relational model introduced by E.F. Codd.
In an RDBMS, data is represented in terms of tuples (rows). The relational database is the most
commonly used type of database. It contains several tables, and each table has its own primary
key. Because the data is organized as a set of tables, it can be accessed easily in an RDBMS.
Brief History of RDBMS
The following terms are used in an RDBMS:
What is table/Relation?
Everything in a relational database is stored in the form of relations. The RDBMS database uses
tables to store data. A table is a collection of related data entries and contains rows and columns
to store data. Each table represents some real-world objects such as person, place, or event
about which information is collected. The organized collection of data into a relational table is
known as the logical view of the database.
Properties of a Relation:
➢ Each relation has a unique name by which it is identified in the database.
➢ Relation does not contain duplicate tuples.
➢ The tuples of a relation have no specific order.
➢ All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one
value.
A table is the simplest example of data stored in RDBMS.
Let's see the example of the student table.
ID   Name     Age   Course
1    Ajeet    24    B.Tech
2    Aryan    20    C.A
3    Mahesh   21    BCA
4    Ratan    22    MCA
5    Vimal    26    BSC
What is a column/attribute?
A column is a vertical entity in the table which contains all information associated with a specific
field in a table. For example, "name" is a column in the above table which contains all information
about a student's name.
Properties of an Attribute:
➢ Every attribute of a relation must have a name.
➢ Null values are permitted for the attributes.
➢ A default value can be specified for an attribute; it is inserted automatically if no other
value is given for that attribute.
➢ Attributes that uniquely identify each tuple of a relation are the primary key.
Name
Ajeet
Aryan
Mahesh
Ratan
Vimal
Degree:
The total number of attributes that comprise a relation is known as the degree of the table.
For example, the student table has 4 attributes, and its degree is 4.
Cardinality:
The total number of tuples at any one time in a relation is known as the table's cardinality. The
relation whose cardinality is 0 is called an empty table.
For example, the student table has 5 rows, and its cardinality is 5.
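Degree and cardinality can both be read off a table programmatically. The sketch below loads the student rows into an in-memory SQLite table; the column names id, name, age and course are assumed for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ajeet", 24, "B.Tech"),
        (2, "Aryan", 20, "C.A"),
        (3, "Mahesh", 21, "BCA"),
        (4, "Ratan", 22, "MCA"),
        (5, "Vimal", 26, "BSC"),
    ],
)

cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)       # one description entry per column (attribute)
cardinality = len(cursor.fetchall())   # one fetched tuple per row
```

Adding or deleting a row changes the cardinality but not the degree; the degree only changes when the table's structure is altered.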
Domain:
The domain refers to the possible values each attribute can contain. It can be specified using
standard data types such as integers, floating-point numbers, etc. For example, an attribute
entitled Marital_Status may be limited to the values married or unmarried.
NULL Values
A NULL value in a table indicates that the field was left blank when the record was created. It
is different from a value of zero or a field that contains a space.
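The difference between NULL, zero and a space can be checked directly. In this sketch the reading table and its rows are hypothetical; Python's None is stored by sqlite3 as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (label TEXT, value INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [
        ("blank", None, None),  # both fields left blank: stored as NULL
        ("zero", 0, ""),        # a real value of zero and an empty string
        ("space", 7, " "),      # a note that contains a single space
    ],
)

# NULL is tested with IS NULL; it never compares equal to 0 or to a space.
null_values = [r[0] for r in conn.execute("SELECT label FROM reading WHERE value IS NULL")]
zero_values = [r[0] for r in conn.execute("SELECT label FROM reading WHERE value = 0")]
null_notes = [r[0] for r in conn.execute("SELECT label FROM reading WHERE note IS NULL")]
```

Only the row whose fields were left blank is matched by IS NULL; the rows holding zero, an empty string or a space all contain actual values.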
Data Integrity
The following categories of data integrity exist in an RDBMS:
Entity integrity: It specifies that there should be no duplicate rows in a table.
Domain integrity: It enforces valid entries for a given column by restricting the type, the format,
or the range of values.
Referential integrity: It specifies that rows which are referenced by other records cannot be
deleted.
User-defined integrity: It enforces some specific business rules defined by users. These rules are
different from the entity, domain, or referential integrity.
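The first three categories map onto constraints that most relational systems let you declare on a table. The following sketch uses SQLite through Python; the department and employee tables and the age range of 18 to 65 are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,                             -- entity integrity
        age     INTEGER NOT NULL CHECK (age BETWEEN 18 AND 65),  -- domain integrity
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- referential integrity
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 30, 1)")


def rejected(sql):
    """Return True if the statement violates a declared constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    # Same emp_id as an existing row: would create a duplicate entity.
    "entity": rejected("INSERT INTO employee VALUES (10, 40, 1)"),
    # Age outside the permitted range of values.
    "domain": rejected("INSERT INTO employee VALUES (11, 12, 1)"),
    # The department row is still used by an employee record.
    "referential": rejected("DELETE FROM department WHERE dept_id = 1"),
}
```

User-defined integrity covers rules that cannot be expressed this way and are typically enforced by triggers or application code.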
The main differences between DBMS and RDBMS are given below:
1) DBMS: Applications store data as files.
   RDBMS: Applications store data in tabular form.
2) DBMS: Data is generally stored in either a hierarchical form or a navigational form.
   RDBMS: The tables have an identifier called the primary key, and the data values are stored
   in the form of tables.
4) DBMS: Does not apply any security with regard to data manipulation.
   RDBMS: Defines integrity constraints for the purpose of the ACID (Atomicity, Consistency,
   Isolation and Durability) properties.
5) DBMS: Uses the file system to store data, so there is no relation between the tables.
   RDBMS: Data values are stored in the form of tables, so a relationship between these data
   values is stored in the form of a table as well.
6) DBMS: Has to provide some uniform methods to access the stored information.
   RDBMS: Supports a tabular structure of the data and relationships between tables to access
   the stored information.
8) DBMS: Is meant for small organizations and deals with small amounts of data. It supports a
   single user.
   RDBMS: Is designed to handle large amounts of data. It supports multiple users.
9) DBMS: Examples are file systems, XML, etc.
   RDBMS: Examples are MySQL, PostgreSQL, SQL Server, Oracle, etc.
After observing the differences between DBMS and RDBMS, you can say that RDBMS is an
extension of DBMS. Many software products on the market today support both DBMS and
RDBMS features, which means that an RDBMS application is also a DBMS application.
DATABASE NORMALIZATION
Database normalization is the process of organizing the fields and tables of a relational database
to minimize redundancy and dependency. Normalization usually involves dividing large tables
into smaller and less redundant tables and defining relationships between them. Normalization
works through a series of stages known as normal forms. In order to achieve one level of normal
form, each previous level must be met.
Advantages of Normalization
➢ Normalization helps to minimize data redundancy.
➢ Greater overall database organization.
➢ Data consistency within the database.
➢ Much more flexible database design.
➢ Enforces the concept of relational integrity.
Disadvantages of Normalization
➢ You cannot start building the database before knowing what the user needs.
➢ The performance degrades when normalizing the relations to higher normal forms
➢ It is very time-consuming and difficult to normalize relations of a higher degree.
➢ Careless decomposition may lead to a bad database design, leading to serious
problems.
FIRST NORMAL FORM
A relation (table) is in first normal form (1NF) if and only if
✓ Every field contains only atomic (indivisible) values.
✓ It contains a primary key. A primary key is an attribute that identifies each entity in a
unique way.
✓ It contains no multivalued fields or repeating groups. A multivalued field is one that may
take several values for a single record. A repeating group is a set of one or more
multivalued attributes that are related.
For example, consider a student table with the following fields:
StudID   StudName   School   Subject   Grade
Problems:
➢ The table has no primary key.
➢ It contains the multivalued fields “Subject” and “Grade”.
Exercise: The decomposition of the EMPLOYEE table into 1NF has been shown below:
[Table: EMPLOYEE, of which one row survives: 14, John, (7272826385, 9064738238), UP]
Once each multivalued field of the student table is split so that every row holds a single
Subject and a single Grade, a student appears in several rows. StudID is therefore no longer
valid as a primary key, because it no longer identifies each row (record) uniquely.
To solve this problem, we use StudID and Subject together to identify each row uniquely. The
new primary key is the composite key (StudID, Subject).
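The move to 1NF described here can be sketched in plain Python. The two student records are invented for the illustration (the original table's rows are not reproduced in these notes), and the helper names to_1nf and is_key are hypothetical.

```python
# Hypothetical unnormalized records: Subject and Grade each hold several values.
unnormalized = [
    {"StudID": 1, "StudName": "Ann", "School": "North",
     "Subject": ["Maths", "Physics"], "Grade": ["A", "B"]},
    {"StudID": 2, "StudName": "Ben", "School": "South",
     "Subject": ["Maths"], "Grade": ["C"]},
]


def to_1nf(records):
    """Return one row per (student, subject) so that every field is atomic."""
    rows = []
    for rec in records:
        for subject, grade in zip(rec["Subject"], rec["Grade"]):
            rows.append({
                "StudID": rec["StudID"],
                "StudName": rec["StudName"],
                "School": rec["School"],
                "Subject": subject,
                "Grade": grade,
            })
    return rows


def is_key(rows, fields):
    """True if the given fields together identify every row uniquely."""
    values = [tuple(row[f] for f in fields) for row in rows]
    return len(values) == len(set(values))


rows = to_1nf(unnormalized)
studid_alone_is_key = is_key(rows, ["StudID"])
composite_is_key = is_key(rows, ["StudID", "Subject"])
```

With these sample rows StudID repeats for the student who takes two subjects, so only the pair (StudID, Subject) is unique.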
SECOND NORMAL FORM
A relation is in second normal form (2NF) if and only if
➢ It is in 1NF
➢ Every non-key attribute is fully dependent on the primary key. In other words, there
should be no partial dependencies.
To put the data model into 2NF, we have to ensure that every non-key attribute is functionally
dependent on the entire primary key. An attribute B is said to be functionally dependent on
another attribute A if A determines B, written A → B.
With the composite key (StudID, Subject), the dependencies to examine are:
{StudID} → {StudName}
{StudID} → {School}
{StudID} → {Grade}
{Subject} → {StudName}
{Subject} → {School}
{Subject} → {Grade}
StudName is dependent on StudID but it is not dependent on subject, the other part of the key.
School is neither dependent on StudID nor Subject.
Grade is dependent on Subject but to have the grade in a subject you need a StudID. Therefore,
grade is dependent on both Subject and StudID.
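A functional dependency can be tested against sample data: A → B fails as soon as two rows agree on A but differ on B. The sketch below uses made-up rows and a hypothetical helper named determines. Note that sample data can only disprove a dependency; a dependency that happens to hold in a few rows is not thereby proven in general.

```python
def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always give the same rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = row[rhs]
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows of the 1NF student table.
rows = [
    {"StudID": 1, "StudName": "Ann", "Subject": "Maths", "Grade": "A"},
    {"StudID": 1, "StudName": "Ann", "Subject": "Physics", "Grade": "B"},
    {"StudID": 2, "StudName": "Ben", "Subject": "Maths", "Grade": "C"},
]

checks = {
    "StudID -> StudName": determines(rows, ["StudID"], "StudName"),
    "Subject -> StudName": determines(rows, ["Subject"], "StudName"),
    "StudID -> Grade": determines(rows, ["StudID"], "Grade"),
    "Subject -> Grade": determines(rows, ["Subject"], "Grade"),
    "StudID, Subject -> Grade": determines(rows, ["StudID", "Subject"], "Grade"),
}
```

In this data StudName depends on StudID alone (a partial dependency), while Grade is determined only by StudID and Subject together.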
To solve these problems, we create separate tables for the fields that are not functionally
dependent on the entire key. The primary key for these tables is the part of the primary key on
which they are dependent.
We now have,
THIRD NORMAL FORM
A relation is in third normal form (3NF) if and only if
➢ It is in 2NF
➢ There are no transitive dependencies. Transitive dependency is a situation where a
non-key attribute depends on another non-key attribute.
Example:
EMPLOYEE_DETAIL table:
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The
non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super
key (EMP_ID), which violates the rule of third normal form.
That is why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with
EMP_ZIP as its primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
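The 3NF decomposition can be tried out in SQLite. The EMPLOYEE_ZIP rows below are the ones listed above; the EMPLOYEE rows (EMP_ID values 1 to 3) are hypothetical, since that table's contents are not reproduced here. ZIP codes are stored as text so that leading zeros are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE_ZIP (EMP_ZIP TEXT PRIMARY KEY, EMP_STATE TEXT, EMP_CITY TEXT)"
)
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    " EMP_ID INTEGER PRIMARY KEY,"
    " EMP_ZIP TEXT REFERENCES EMPLOYEE_ZIP(EMP_ZIP))"
)
conn.executemany(
    "INSERT INTO EMPLOYEE_ZIP VALUES (?, ?, ?)",
    [
        ("201010", "UP", "Noida"),
        ("02228", "US", "Boston"),
        ("60007", "US", "Chicago"),
        ("06389", "UK", "Norwich"),
        ("462007", "MP", "Bhopal"),
    ],
)
# Hypothetical employees; two of them share the same ZIP code.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [(1, "201010"), (2, "02228"), (3, "201010")],
)

# State and city are stored once per ZIP and recovered with a join when needed.
detail = conn.execute("""
    SELECT e.EMP_ID, e.EMP_ZIP, z.EMP_STATE, z.EMP_CITY
    FROM EMPLOYEE AS e
    JOIN EMPLOYEE_ZIP AS z ON z.EMP_ZIP = e.EMP_ZIP
    ORDER BY e.EMP_ID
""").fetchall()
```

Because the city and state for a ZIP code live in a single row, correcting them requires one update no matter how many employees share that ZIP.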
ENTITY-RELATIONSHIP MODELING
An E-R diagram serves as a schema diagram for the required database. A schema diagram is any
diagram that attempts to show the structure of the data in a database.
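[Figure: An E-R diagram in which the entity set PERSON (attributes Name and DOB) is linked to
the entity set HOUSE (attributes Number and Street) by the relationship OWNS]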
The basic elements of an E-R diagram are entity sets, attributes and relationship types.
Entity Set
An entity is a person, place, concept or thing for which we intend to collect data. For example, a
customer, an employee, a book, an appointment.
A group of entities that share the same properties is an entity set. An entity is therefore a
member or an instance of an entity set. In an E-R diagram, an entity set is represented by a
rectangle. In the above E-R diagram, PERSON and HOUSE are entity sets
Attributes
An attribute is a fact about an entity or a property that describes an entity. For example, a
person’s name, date of birth or gender, a vehicle’s model, color or brand. Attributes store the
actual data we want to keep about each entity within an entity set. An attribute is represented
by an ellipse. In the above E-R diagram, name and date of birth are attributes of the entity set
PERSON while Number and Street are attributes of the entity set HOUSE. An attribute can be
simple, composite, derived, single-valued or multi-valued.
➢ Simple attribute: an attribute whose value is atomic and cannot be divided further.
For example, a person’s phone number is an atomic value of 9 digits.
➢ Composite attribute: an attribute made up of more than one simple attribute. For example,
a person’s complete name may consist of FName, MName and LName.
➢ Derived attribute: an attribute which is not stored physically in the database, but whose
value is derived from other attributes present in the database. For example, a person’s
age can be derived from the date_of_birth.
➢ Single-valued attribute: an attribute that contains one single value. For example, an ID
card number.
➢ Multi-valued attribute: an attribute that can contain more than one value for the same
entity. For example, a person can have more than one phone number or e-mail
address.
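These kinds of attribute can be mirrored in a small Python class. The class, its field names and the sample person are all invented for this sketch; the point is only that the age is computed from the date of birth instead of being stored.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Person:
    f_name: str          # f_name, m_name and l_name form the composite attribute "name"
    m_name: str
    l_name: str
    date_of_birth: date  # simple, single-valued attribute
    phone_numbers: list = field(default_factory=list)  # multi-valued attribute

    @property
    def full_name(self):
        """Composite attribute assembled from its simple parts."""
        return " ".join(p for p in (self.f_name, self.m_name, self.l_name) if p)

    def age_on(self, today):
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


p = Person("Ada", "", "Obi", date(2000, 6, 15), ["670000001", "690000002"])
age_before_birthday = p.age_on(date(2024, 6, 14))
age_on_birthday = p.age_on(date(2024, 6, 15))
```

Storing the age directly would make it go stale every year, which is why it is usually left as a derived attribute.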
Relationship Type
A relationship type is a named association between entities. A person (entity) owns (relationship)
a house (entity); a teacher (entity) teaches (relationship) a subject (entity). Normally, individual
entities have individual relationships of the type between them, but in an E-R diagram this is
generalized to entity sets and relationship types. For example, the entity set PERSON is related
to the entity set HOUSE by the relationship type OWNS. A relationship type is represented by a
diamond.
Degree of a Relationship
The degree of a relationship refers to the number of participating entities in the relationship. It
can be UNARY, BINARY, TERNARY OR N-ARY.
➢ A unary relationship type is one that involves entities from a single entity set. E.g. the
relationship MANAGES between entities within the entity set EMPLOYEE.
[Figure: the unary relationship MANAGES on the entity set EMPLOYEE]
➢ A binary relationship type is a relationship between entities from two different entity sets.
An example is the relationship OWNS in the E-R diagram above.
The cardinality ratio is a ratio of the cardinalities of the entity sets involved in a relationship. It
can be one-to-one (1:1), one-to-many (1:N) or many-to-many (M:N).
In the relationship OWNS between PERSON and HOUSE, a person can own zero or many houses.
Therefore, the cardinality of PERSON in the relationship OWNS is many while the optionality is
zero.
On the other hand, a house is owned by one person. The cardinality of HOUSE in the inverse
relationship IS_OWNED_BY, is one while the optionality is one. The relationship OWNS is
therefore described as a one-to-many (1:N) relationship.
[Figure: PERSON (1,1) --- Owns --- (0,N) HOUSE, a (1:N) relationship]
In the relationship RECEIVES between STUDENT and SLIP, each student receives one and only one
result slip. The cardinality of student is one and the optionality is one. Each result slip is issued to
one and only one student. The cardinality of SLIP is one and the optionality is one. This
relationship is described as one-to-one (1:1).
[Figure: STUDENT (1,1) --- Receives --- (1,1) SLIP, a (1:1) relationship]
In the relationship TEACHES between TEACHER and SUBJECT, a teacher teaches one or many
subjects. The cardinality of TEACHER is many and the optionality is one.
A subject is taught by one or many teachers. The cardinality of SUBJECT in the inverse relationship
IS_TAUGHT_BY, is many and the optionality is one. This relationship is described as many-to-
many (M:N).
[Figure: TEACHER (1,M) --- Teaches --- (1,N) SUBJECT, an (M:N) relationship]
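A many-to-many relationship is usually resolved by introducing a link entity between the two
entity sets. The primary key of the link entity is a composite key that consists of the primary
keys of the entities TEACHER and SUBJECT.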
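A link entity for the TEACHES relationship might look as follows in SQLite. The table and column names, the teacher names and the subject titles are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE teaches (
        teacher_id INTEGER NOT NULL REFERENCES teacher(teacher_id),
        subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
        PRIMARY KEY (teacher_id, subject_id)   -- composite key of the link entity
    )
""")
conn.executemany("INSERT INTO teacher VALUES (?, ?)", [(1, "Ngu"), (2, "Eyong")])
conn.executemany("INSERT INTO subject VALUES (?, ?)", [(10, "Maths"), (20, "Physics")])
# One row per (teacher, subject) pair: teacher 1 teaches both subjects,
# and Maths is taught by both teachers.
conn.executemany("INSERT INTO teaches VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

subjects_of_teacher_1 = [r[0] for r in conn.execute(
    "SELECT s.title FROM teaches AS t JOIN subject AS s ON s.subject_id = t.subject_id "
    "WHERE t.teacher_id = 1 ORDER BY s.title"
)]
teachers_of_maths = [r[0] for r in conn.execute(
    "SELECT te.name FROM teaches AS t JOIN teacher AS te ON te.teacher_id = t.teacher_id "
    "WHERE t.subject_id = 10 ORDER BY te.name"
)]
```

The link table turns one M:N relationship into two 1:N relationships, one from TEACHER and one from SUBJECT, and the composite key stops the same pair from being recorded twice.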
The above representation of an E-R diagram is the Chen Convention. Another way of representing
ER diagrams is the crow’s foot notation that uses three symbols to show cardinality ratios. Here,
a circle means zero, a line means one and a crow’s foot means many. The cardinality is shown
next to the entity type and the optionality (if shown at all) is shown behind it.
Where A and B are entity sets and R is the relationship type.
1) A database will be made to store information about patients in a hospital. On arrival, each
patient’s personal details (name, address, and telephone number) are recorded where
possible, and they are given an admission number. They are then assigned to a particular
ward (Accident and Emergency, Cardiology, Oncology, etc.). In each ward there are a
number of doctors and nurses. A patient will be treated by one doctor and several nurses
over the course of their stay, and each doctor and nurse may be involved with several
patients at any given time.
From the description, draw a corresponding E-R diagram showing all entity sets,
attributes, relationship types and cardinality ratios.