Unit 1 Notes

Uploaded by

pandeyjiiiiii995

Introduction

The introductory lesson explains what folks can expect from this course and who the intended
audience is.

We'll cover the following

• Introduction

• Intended audience

• What to expect?

Introduction

Throughout history, humans have tried to store information in one form or another. The Egyptians
inscribed hieroglyphs on tomb walls, papyrus, and wood as long ago as 3200 B.C. to preserve their
message for posterity. In recent centuries, paper has been used as a medium to store information.
Fast forward to the 20th century, electronic storage and retrieval of information have become
commonplace.

The relational database and its cousin, the Structured Query Language (SQL), have powered
applications and offerings from popular internet companies since the beginning of the internet
revolution. In fact, the most popular online developer destination, Stack Overflow, runs on
Microsoft SQL Server. The advent of Big Data and NoSQL technologies hasn’t put relational
databases or SQL out of fashion. These technologies serve different and complementary use cases,
although their capabilities overlap. Learning and mastering SQL can be a great asset when starting
a tech career because many applications have a relational database running under the hood, and SQL
is the de facto standard for interacting with it. Familiarity with SQL pays rich dividends even in
the Big Data realm, as several software systems such as Hive and Phoenix expose SQL-like interfaces
and syntax to end users.

Relational vs Non-Relational

This lesson talks about the database landscape as it exists today in the industry.

We'll cover the following

• Relational vs Non-Relational
• Data

• Database Management System (DBMS)

• Database

• Relational/SQL Databases

• Non-Relational/NoSQL Databases

• Big Data

Relational vs Non-Relational

There is a plethora of terms and jargon that can be confusing when first starting to read about data
and its storage. We’ll provide clarification on various terms below:

Data

Data (plural of datum) is defined as distinct pieces of information.

Data can exist in several different forms: numbers, text, bytes, Instagram pictures, or YouTube videos.
These represent various types of data that can be stored and transmitted electronically. Note that
data is usually interpreted in a context, e.g., data representing prose in the Hindi language can’t be
interpreted as a picture, and vice versa.

There are broadly three categories of data:

1. Structured Data: has a pre-defined organization that makes it easily searchable and
analyzable. The data is backed by a model that dictates the shape of each field: its type,
its length, and any restrictions on the values it can take.

Data stored in SQL databases is structured. Structured data is usually formatted in a universally
understandable and identifiable manner. In most instances, a schema formally specifies structured
data. Whenever you are working in any variant of SQL, you are almost always dealing with structured
data.

2. Unstructured Data: is characterized by a lack of organization and by the absence of a data
model describing the structure of a single record or the attributes of individual fields
within the record.

Videos, audio, blogs, log files, social-media posts, etc., are all examples of unstructured data. It is
data without any conceptual definition or type.

3. Semi-structured Data: is a cross between structured and unstructured data, and though
there’s no explicit data model or structure definition, one may be implied. Semi-structured
data contains semantic tags, but does not conform to the structure associated with typical
relational databases.

Examples include JSON and XML data. Semi-structured data sits between structured and unstructured
data on the spectrum. Another example to consider is the metadata related to video and audio files,
which themselves fall under unstructured data. Metadata such as creator, last accessed, permissions,
etc., can be considered semi-structured data.
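To make the idea concrete, here is a minimal Python sketch of semi-structured data. The two records and their field names (`title`, `creator`, `tags`, `duration_seconds`) are made up for illustration. Both records are valid JSON, yet they do not share one fixed set of fields.

```python
import json

# Hypothetical records: each is valid JSON, but there is no shared, fixed schema.
raw = """
[
  {"title": "Intro to SQL", "creator": "asha", "tags": ["sql", "basics"]},
  {"title": "Holiday clip", "creator": "ravi", "duration_seconds": 42}
]
"""

records = json.loads(raw)

# The keys act as semantic tags, so fields can be looked up by name,
# but the code must tolerate fields that are missing from some records.
durations = [record.get("duration_seconds") for record in records]
all_keys = sorted({key for record in records for key in record})

print(durations)
print(all_keys)
```

The structure is implied by the tags inside each record rather than enforced by a table definition, which is the main contrast with structured data.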

Semi-structured data contains certain parts that are structured and others that are not. For example,
X-rays and other large images consist largely of unstructured data made up of millions of pixels. It is
impossible to search and query these X-ray images in the same way that a large relational database
can be searched, queried, and analyzed. However, even though the files themselves consist mostly of
pixels, each file has a small section known as metadata that holds details such as author, creation
timestamp, last accessed, etc. This metadata allows for some form of analysis of unstructured data.

This brings us to the question: if all of unstructured data has associated metadata then what is the
difference between semi-structured and unstructured data? In today’s world there’s hardly any truly
unstructured data with no organization or metadata. In fact, the distinction between semi-structured
and unstructured data is sort of a grey area and disputed. However, both are far away from the
structured rigorously organized data living in relational databases. Often what is referred to as
unstructured data such as videos, images, documents, social media postings, files etc is really semi-
structured data. However, for simplicity data is usually divided and described as structured or
unstructured.

In this course, we’ll be exclusively dealing with structured data.

Database Management System (DBMS)

Usually, when we refer to databases such as MySQL and PostgreSQL, we are talking about a
complete system, called the database management system. It is software that allows the user to
create, maintain, and delete multiple individual databases. It provides peripheral services and
interfaces for the end user to interact with the databases.

Database

A database is an organized and structured collection of data, usually stored and retrieved
electronically.

The structure and organization of data helps in efficient retrieval.


There are broadly two types of databases:

1. Relational or SQL Databases

2. Non-Relational or NoSQL Databases

In this course, our focus will be on relational databases.

Relational/SQL Databases

Relational databases consist of data stored as rows in tables. The columns of a table rigidly follow a
defined schema that describes the type and size of the data that a table column can hold. You can
think of a schema as a blueprint of what each record or row in the table should look like. This is why
relational databases only handle structured data. Tables usually have a column as the key, which is
used to uniquely identify each row in a table. A relationship between two tables is defined by a
column or a set of columns that occur in both the tables.
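The ideas in the paragraph above (a schema, a key column, and a relationship through a shared column) can be sketched in a few lines. This is only an illustration: it uses Python's built-in `sqlite3` module instead of MySQL so that it runs without a server, and the `department` and `employee` tables are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,          -- key: uniquely identifies each row
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- shared column
);
INSERT INTO department VALUES (1, 'Engineering'), (2, 'Sales');
INSERT INTO employee VALUES (10, 'Asha', 1), (11, 'Ravi', 2), (12, 'Meera', 1);
""")

# The relationship between the two tables is followed by joining on dept_id.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

print(rows)
```

The `dept_id` column appears in both tables, and that shared column is what lets a query combine an employee row with its department row.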

Relational database management systems (RDBMS) are mature and widely adopted technology.
Popular implementations include Oracle, DB2, Microsoft SQL Server, PostgreSQL, and MySQL.

Non-Relational/NoSQL Databases

The rise of Web 2.0 companies made NoSQL databases popular. As data sets handled by internet
companies grew ever bigger in size, a new approach to designing databases came to the fore. The
strict schema of a relational database was shunned in favor of a schema-less database. NoSQL
databases come in different forms and address different use cases. The spectrum includes key-value
stores (Redis, Amazon DynamoDB), column stores (HBase, Cassandra), document stores (MongoDB,
Couchbase), graph databases (Neo4j), and search engines (Solr, Elasticsearch, Splunk). The primary
distinction between these NoSQL databases and SQL databases is the absence of a rigid schema in
the former. NoSQL databases, together with unstructured and semi-structured data, are usually
discussed under the heading of Big Data.

Big Data

Big Data usually includes data sets with sizes beyond the capability of traditional software tools (e.g.,
SQL technologies) to capture, curate, manage, and process data within a tolerable elapsed time. The
“big” in the term Big Data is a moving target: the threshold rises as software and hardware become
capable of processing higher volumes of data.

MySQL History & Architecture

In this lesson, we describe the overall architecture of MySQL RDBMS.

We'll cover the following

• MySQL history & architecture

• SQL standard and implementations

• Application/Client Layer

• MySQL Server Layer

• Storage Engine Layer

MySQL history & architecture


SQL stands for Structured Query Language but was initially conceptualized as SEQUEL (Structured
English Query Language). The name was changed from SEQUEL to SQL because another company
already held the trademark for SEQUEL. SQL was originally developed by two IBM engineers to
manipulate data stored in an IBM-developed database, which was based in part on the relational
model for databases.

SQL standard and implementations

SQL was accepted as a standard by ANSI in 1986 and ISO in 1987. It has been revised a number of
times. Various companies have created implementations of this standard. The standard can be
thought of as the blueprint that describes the behavior, while the actual implementations
include MySQL, SQL Server, PostgreSQL, etc.

The implementations by various vendors are similar because they all conform to the same standard.
There are some differences in syntax though. Popular implementations include Oracle RDBMS, IBM
DB2, Microsoft SQL Server, Teradata, and MySQL.

SQL is important because many companies have databases based on the relational model that are
powered by SQL related technologies.

MySQL consists of different components that we’ll discuss in this lesson. MySQL is based on a client-
server architecture. Applications such as a website or the MySQL command line client connect and
interact with the databases through the MySQL server and the server responds to queries from
clients. There are three layers that we can divide MySQL into:

1. Application/Client Layer

2. MySQL Server Layer

3. Storage Engine Layer

Application/Client Layer
The application layer is responsible for client connections, authorization, authentication, and
security.

MySQL Server Layer

The server layer is responsible for parsing, analyzing, and optimizing submitted queries. It also
maintains caches and buffers. MySQL provides other built-in functionality, such as recovery,
backup, and partitioning, that is also handled by this layer. Lastly, the SQL interface for
interacting with the database is also part of this layer.

This layer is also referred to as the relational engine. The output of this layer is a query execution
plan that is fed into the storage engine. The storage engine then modifies or retrieves data as per the
plan.

Storage Engine Layer

The storage engine is that part of the DBMS that actually writes and retrieves data from the
underlying physical storage medium. In the case of MySQL, the architecture allows the user to
choose the storage engine from a given set of possibilities. Different storage engines are tailored to
the needs of the specific use-cases they address. For instance, the InnoDB storage engine supports
foreign key constraints and transactions whereas MyISAM is much simpler, lacking those two
features but better suited for single-user applications.

Characteristics of DBMS

DBMS stands for Database Management System. It is a set of computer programs used for the
creation and modification of a database. It is an integrated software package. The DBMS also acts
as an intermediary between the end user and the database, and it provides an environment in which
multiple users can create, access, and manipulate the data in the database.

Characteristics of DBMS

Some well-known characteristics are present in the DBMS (Database Management System). These
are explained below.

1. Real World Entity

o Modeling real-world entities is one of the most important and most easily understood
characteristics of a DBMS (Database Management System). A DBMS is designed so that it can
manage the data of large business organizations and store it securely.

o The database can store information such as the cost of vegetables, milk, bread, etc. In a
DBMS, the entities mirror real-world entities.

o For example, if we want to create a student database, we need a student entity, and each
student's data is stored as a record of that entity.

o The attributes stored in the database should then correspond to real-world properties. The
most commonly used properties in a student database are name, age, gender, roll number, etc.

2. Self-explaining nature

o In a DBMS (Database Management System), the database contains not only the data itself but
also metadata that describes that data.

o Here the term metadata means data about data.

o For example, in a school database, the total number of rows and the table's name are examples
of metadata.

o So the self-explaining nature means that the database describes its own contents. This is
possible because all the data in the database is stored in a structured format.
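As a small illustration of this self-describing nature, the sketch below asks a database about itself. It assumes SQLite (through Python's `sqlite3` module) and a hypothetical `student` table. SQLite keeps its catalog in a table called `sqlite_master`, and MySQL exposes similar metadata through `information_schema`.

```python
import sqlite3

# Assumption: an in-memory SQLite database with a hypothetical student table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student (name, age) VALUES (?, ?)", [("Asha", 14), ("Ravi", 15)])

# Metadata: the names of the tables the database knows about.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Metadata: the columns of one table (index 1 of each row is the column name).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# Metadata in the sense used above: the total number of rows.
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(table_names, column_names, row_count)
```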

3. Atomicity of Operations (Transactions)

o Here, atomicity means that an operation is either performed completely or not performed at
all, i.e., it completes either 0% or 100% of its work.

o A DBMS (Database Management System) provides atomicity as a characteristic. It is one of the
most important and useful characteristics of a DBMS, and the example below shows why.

o For example, every bank has its own database, which contains all the information about its
customers. A money transfer is the most common atomic operation in a bank. Suppose Sona
wants to transfer 1000 rupees to Archita's account. Without atomicity, a problem with
Archita's account partway through the transfer could leave the money deducted from Sona's
account but not credited to Archita's account.

o Because the database guarantees atomicity, such partial transactions do not occur at all:
if the transaction fails, it is rolled back and the money returns to the sender's account.

o A successful transaction therefore depends on the database. If the database works correctly,
the transaction succeeds; if the database fails, every banking service that relies on it is
affected.
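The bank transfer above can be sketched as follows. This is a minimal illustration, not how any particular bank or DBMS implements it: it assumes SQLite through Python's `sqlite3` module, a hypothetical `account` table, and a helper function `transfer` written for this example. Using the connection as a context manager commits the transaction on success and rolls it back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (owner TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Sona", 5000), ("Archita", 1000)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Hypothetical helper: both updates are applied, or neither is."""
    try:
        with conn:  # commit on success, roll back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, sender),
            )
            credited = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, receiver),
            )
            if credited.rowcount != 1:
                raise ValueError("receiver account not found")
        return True
    except (ValueError, sqlite3.Error):
        return False


ok = transfer(conn, "Sona", "Archita", 1000)    # succeeds: 100% applied
failed = transfer(conn, "Sona", "Nobody", 1000)  # fails: 0% applied, debit is undone
final = dict(conn.execute("SELECT owner, balance FROM account"))
print(ok, failed, final)
```

In the failed transfer the debit from Sona's account has already been executed when the problem is detected, and the rollback is what returns the money to the sender.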

4. Concurrent Access without Anomalies

o Here, concurrent access without anomalies means that multiple users can access the database
and fetch or change information at the same time without causing errors.

o For a better understanding, let's take the example of a bank again. Suppose Sonu gives his
ATM card to his sister Archita and asks her to withdraw 5000 rupees from the ATM. At the
same time, Sonu transfers 2000 rupees to his brother Monu. Both operations run at the same
time and both succeed. Initially, Sonu had 10000 rupees in his bank account. After both
transactions, i.e., the transfer and the withdrawal, Sonu's balance shows
10000 - 5000 - 2000 = 3000 rupees. This error-free update of the bank balance is possible
because of the concurrency control provided by the database.

o Thus we see that concurrency control is a valuable feature of the database.
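The arithmetic in Sonu's example can be mimicked with two threads. This is only an analogy: a real DBMS uses its own concurrency control (locking or similar mechanisms), while the sketch below uses a plain Python `threading.Lock` as a stand-in, and the `debit` function is made up for illustration.

```python
import threading

balance = 10000          # Sonu's opening balance
lock = threading.Lock()  # stand-in for the database's concurrency control


def debit(amount):
    """Read-modify-write on the shared balance, one thread at a time."""
    global balance
    with lock:
        current = balance
        balance = current - amount


# The ATM withdrawal and the transfer run as two concurrent operations.
threads = [
    threading.Thread(target=debit, args=(5000,)),
    threading.Thread(target=debit, args=(2000,)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(balance)
```

Without the lock, both threads could read 10000 before either writes, and one of the two debits would be lost. Preventing that kind of lost update is exactly the anomaly the DBMS guards against.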

5. Stores Any Kind of Structured Data

o The database has the ability to store data in a structured format.

o Most websites use only student database examples for ease of understanding, but the
important fact is that a database can store very large amounts of data of many kinds.

o A DBMS can store any type of data that exists in the real world, as long as the data is
organized in a structured way. This is another very important characteristic of a DBMS.

6. Integrity

o Here, integrity means that the data should be correct and consistent. Let's understand this
with an example.

o Suppose a bank named ABC Bank has its own database for storing its customer data. If we look
up the details of an account that does not exist at the bank and the database still returns a
result, the output is incorrect. Similarly, if a customer changes their address but the new
address is not updated in the database, the result is called data inconsistency.

o So the data available in the database should be correct as well as consistent.

o If someone's account has a zero balance and the customer later deposits 6000 rupees, but the
new balance is not updated in the database, it creates a problem for the customer.
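One common way a DBMS protects integrity is by declaring constraints on a table and rejecting rows that violate them. The sketch below is an illustration under assumptions: it uses SQLite through Python's `sqlite3` module, and the `account` table and its rules (a holder is required, a balance cannot be negative) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_no INTEGER PRIMARY KEY,                 -- no duplicate accounts
        holder     TEXT    NOT NULL,                    -- every account has a holder
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    )
""")

# A valid row, followed by the deposit from the example above.
conn.execute("INSERT INTO account VALUES (1, 'Asha', 0)")
conn.execute("UPDATE account SET balance = balance + 6000 WHERE account_no = 1")

# Each of these rows breaks one rule, so the database refuses to store it.
rejected = []
for bad_row in [(2, None, 100), (3, "Ravi", -50), (1, "Duplicate", 10)]:
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:
        rejected.append(bad_row[0])

balance = conn.execute("SELECT balance FROM account WHERE account_no = 1").fetchone()[0]
print(rejected, balance)
```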

7. Ease of Access (The DBMS Queries)


o Before DBMSs came to the market, data was stored using the file and folder system.

o Searching for a student's name was a very difficult task at that time, because every search
operation was done manually in the file and folder system. Once DBMSs became available,
accessing the database became very easy.

o In a DBMS, we can search any kind of stored data by applying a simple search query. It is much
faster than manual searching.

o A DBMS supports the CRUD operations (CRUD means Create, Read, Update, and Delete), with
which we can implement all types of queries on the database.
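The four CRUD operations map directly onto four SQL statements. Here is a minimal sketch, assuming SQLite through Python's `sqlite3` module and a hypothetical `student` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)")

# Create: INSERT adds new rows.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 14), (2, "Ravi", 15), (3, "Meera", 14)],
)

# Read: SELECT searches by value instead of scanning files by hand.
found = conn.execute("SELECT roll_no, age FROM student WHERE name = ?", ("Ravi",)).fetchone()

# Update: UPDATE changes existing rows.
conn.execute("UPDATE student SET age = ? WHERE roll_no = ?", (16, 2))

# Delete: DELETE removes rows.
conn.execute("DELETE FROM student WHERE roll_no = ?", (3,))

remaining = conn.execute("SELECT roll_no, name, age FROM student ORDER BY roll_no").fetchall()
print(found, remaining)
```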

8. SQL and No-SQL Databases

o There are two types of databases (not DBMS): SQL and NoSQL.

o SQL databases store the data in the form of tables, i.e., rows and columns. NoSQL databases
can store data in forms other than a table. For instance, the very popular MongoDB stores
data as JSON-like documents (JSON stands for JavaScript Object Notation).

o The availability of SQL and NoSQL databases allows us to choose the method of storing the
data as well.

o There should not be any debate between SQL and NoSQL databases. The one that a particular
project requires is better for that project, while the other might be better for some other
use.

o This counts as a characteristic of DBMS because database management systems exist for both
kinds of databases. So, we can run queries and operations on SQL as well as NoSQL databases.

9. ACID Properties

o The DBMS follows certain properties to maintain consistency in the database. These
properties are usually termed the ACID properties.

o Although we have already talked about some of these properties, it is important to mention
the ACID properties as a whole.

o ACID stands for Atomicity, Consistency, Isolation, and Durability.

o We have already talked about atomicity and consistency. Atomicity means that a transaction
should be either 0% or 100% completed, and consistency means that every transaction leaves
the database in a valid state, with each change reflected everywhere the data appears.

o Isolation means that multiple transactions can occur independently, without interference
from other transactions.

o Durability means that the changes made by a successful transaction, i.e., a transaction that
has been 100% completed, are permanently recorded in the database.

10. Security

o The database should be accessible to users only in a limited way.

o A user's ability to make changes to a database should be limited, and users must not be given
complete access to the entire database.

o Unauthorized users should not be allowed to access the database.

o Authentication and authorization: The DBMS authenticates each user, that is, it verifies the
user's identity when logging in, and then grants only the rights that he/she has been
authorized to use. These rights determine the extent to which the user can access the
database. For instance, in any organization, the admin can make changes to the organization's
database, since a new employee might have joined or someone might have left. The employees,
however, have access only to their personal profiles and can make changes only to them. They
cannot access the records of any other employee or the database of the organization as a
whole.

Database Architecture:

One of the primary aims of a database is to supply users with an abstract view of data, hiding
certain details of how data is stored and manipulated. Therefore, the starting point for the design
of a database should be an abstract and general description of the information needs of the
organization that is to be represented in the database. You will also need an environment in which
to store the data and make it work as a database. In this chapter, you will learn about the database
environment and its architecture.

What is a Database Environment?

A database environment is a collective system of components that comprises and regulates the
collection, management, and use of data. It consists of software, hardware, people, the techniques
for handling the database, and the data itself.

Here, the hardware in a database environment means the computers and computer peripherals that
are used to manage a database, and the software means everything from the operating system (OS)
to the application programs, including database management software like MS Access or SQL Server.
The people in a database environment are those who administer and use the system. The techniques
are the rules, concepts, and instructions given to both the people and the software. The data is
the group of facts and information held within the database environment.

What is Data Base Architecture?

The architecture of a DBMS depends on its design and can be of the following types:

• Centralized

• Decentralized

• Hierarchical

A DBMS architecture can be seen as either single-tier or multi-tier. An n-tier architecture
splits the entire system into n related but independent modules that can be independently
customized, changed, altered, or replaced.

The architecture of a database system is very much influenced by the primary computer system on
which the database system runs. Database systems can be centralized, or client-server, where one
server machine executes work on behalf of multiple client machines. Database systems can also be
designed to exploit parallel computer architectures. Distributed databases span multiple
geographically separated machines.

Data Model:

The relational model is the theoretical basis of relational databases. It is a way of structuring
data using relations, which are grid-like mathematical structures consisting of columns and rows.
Codd proposed the relational model for IBM, and the idea became so influential that his work became
the basis of relational databases. You are probably already familiar with the physical
representation of a relation in a database, which is known as a table.

In the relational model, all data is logically structured within relations, i.e., tables, as mentioned
above. Each relation has a name and is formed from named attributes or columns of data. Each tuple
or row holds one value per attribute. The greatest strength of the relational model is the simple
logical structure that it forms. Behind this simple structure is a sophisticated theoretical foundation
that was lacking in the first generation of DBMSs.

Objectives of the Relational Model

The relational model's objectives were specified as follows:

• To allow a high degree of data independence. Application programs must not be affected by
alterations to the internal data representation, particularly by changes to file organizations
or access paths.

• To provide substantial grounds for dealing with data semantics, consistency, and redundancy
problems. In particular, Codd's theory for the relational model introduced the concept of
normalized relations, that is, relations that have no repeating groups. The process of
producing them is called normalization.

• To allow the expansion of set-oriented data manipulation languages.

Real-life Structure of a Relational Database

In general, a row in a table signifies a relationship among a group of values. Since a table is a
collection of such relationships, there is a close connection between the concept of a table and the
mathematical concept of a relation, from which the relational data model gets its name. In
mathematical terminology, a tuple is simply a sequence or list of values. A relationship
between n values is represented mathematically by an n-tuple of values, i.e., a tuple with n values,
which corresponds to a row in a table.

Database Schema

When you talk about a database, you must distinguish between the database schema, which is the
logical blueprint of the database, and the database instance, which is a snapshot of the data in the
database at a given instant in time. The concept of a relation corresponds to the programming-
language notion of a variable. In contrast, the concept of a relation schema corresponds to the
programming-language notion of a type definition. In other words, a database schema is a
skeletal structure that represents the logical view of the complete database. It describes how the
data is organized and how the relations among the data are associated, and it formulates all the
constraints that are to be applied to the data.

In general, a relation schema consists of a list of attributes and their corresponding domains.

• Relation: A relation is a table with columns and rows.

• Attribute: An attribute is a named column of a relation.

• Domain: A domain is the set of allowable values for one or more attributes.

• Tuple: A tuple is a row of a relation.

Schema and Instance


“Schema” and “Instance” are key ideas in a database management system (DBMS) that help organize
and manage data. Let’s begin by examining how they differ from one another.

Instances

An instance is the state of an operational database, with its data, at a given time. It is a snapshot
of the database. The instance can be changed by certain CRUD operations, such as the addition and
deletion of data. Note that a search query does not change the instance in any way.

Example:

Suppose our database, named School, has a table called teacher. If the table has 50 records today,
then the current instance of the database has 50 records. If we add another fifty records tomorrow,
tomorrow's instance will have a total of 100 records. Each of these states is called an instance.
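The teacher example can be sketched in code to show that the instance changes while the schema stays the same. This is an illustration only: it assumes SQLite through Python's `sqlite3` module, and the column names and the helper functions `schema` and `instance_size` are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def schema(conn):
    """Column names and declared types: the description of the table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(teacher)")]


def instance_size(conn):
    """Number of rows stored right now: one property of the current instance."""
    return conn.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]


# "Today": 50 records.
conn.executemany("INSERT INTO teacher (name) VALUES (?)",
                 [(f"Teacher {i}",) for i in range(1, 51)])
schema_today, size_today = schema(conn), instance_size(conn)

# "Tomorrow": another 50 records are added.
conn.executemany("INSERT INTO teacher (name) VALUES (?)",
                 [(f"Teacher {i}",) for i in range(51, 101)])
schema_tomorrow, size_tomorrow = schema(conn), instance_size(conn)

print(size_today, size_tomorrow, schema_today == schema_tomorrow)
```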

Schema

A schema is the overall description of the database. The basic structure of how the data will be
stored in the database is called the schema.

A schema is of three types: logical schema, physical schema, and view schema.

Difference Between Schema and Instance

Schema | Instance
It is the overall description of the database. | It is the collection of information stored in a database at a particular moment.
The schema is the same for the whole database. | Data in instances can be changed using addition, deletion, and updation.
Does not change frequently. | Changes frequently.
Defines the basic structure of the database, i.e., how the data will be stored in the database. | It is the set of information stored at a particular time.

DBMS DATA SCHEMA:

A schema can be defined as the design of a database. The overall description of the database is
called the database schema. It can be categorized into three parts. These are:

• Physical Schema

• Logical Schema

• View Schema

A physical schema can be defined as the design of a database at its physical level. In this level, it is
expressed how data is stored in blocks of storage.

A logical schema can be defined as the design of the database at its logical level. Programmers and
the database administrator (DBA) work at this level. Here, data can be described as certain types
of data records that are stored in the form of data structures. However, the internal details (such
as the implementation of those data structures) remain hidden at this level.

View schema can be defined as the design of the database at the view level, which generally describes
end-user interaction with database systems.

For example: Suppose you are storing students' information in a student table. At the physical
level, these records are described as blocks of storage (measured in bytes, gigabytes, terabytes, or
more) on the storage medium, and these details usually remain hidden from the programmers. Then
comes the logical level; here, these records can be described as fields and attributes along with
their data type(s), and their relationships with each other can be logically implemented.
Programmers generally work at this level because they are aware of such details of database systems.
At the view level, a user is able to interact with the system through a GUI and enter details on
the screen. The users are not aware of how the data is stored or what data is stored; such details
are hidden from them.
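One way to picture the view level is with an SQL view that exposes only part of a table. The sketch below assumes SQLite through Python's `sqlite3` module; the `student` table, its columns, and the `student_directory` view are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical level: the full record, with every field and its type.
CREATE TABLE student (
    roll_no        INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    date_of_birth  TEXT NOT NULL,
    guardian_phone TEXT
);
INSERT INTO student VALUES
    (1, 'Asha', '2010-04-12', '555-0101'),
    (2, 'Ravi', '2009-11-30', '555-0102');

-- View level: what one group of end users is allowed to see.
CREATE VIEW student_directory AS
    SELECT roll_no, name FROM student;
""")

cursor = conn.execute("SELECT * FROM student_directory ORDER BY roll_no")
view_columns = [column[0] for column in cursor.description]
directory = cursor.fetchall()
print(view_columns, directory)
```

A user of `student_directory` never sees the date of birth or the phone number, and has no need to know how the rows are laid out on disk at the physical level.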

Detailed Explanation of the Three Layers of Schema

As we have seen, there are three different types of schema in the database, and they are defined
according to the levels of abstraction of the three-level architecture portrayed in the above
figure. At the highest level, there are multiple external schemas (view-level schemas, also called
sub-schemas) that correspond to different views of the data. At the conceptual level, there is the
conceptual schema, or logical schema, which describes all the entities, attributes, and
relationships together with the integrity constraints. At the lowest level of abstraction, there is
the internal schema, or physical schema, which gives a complete description of the internal model,
containing the definitions of stored records, the methods of representation, the data fields, the
storage structures used, etc. Note that there is only one conceptual schema and one internal schema
per database. The DBMS is responsible for mapping between these three types of schema.

The DBMS must also check the schemas for consistency. That is, it must verify that each external
schema is derivable from the conceptual schema, and it must use the information in the conceptual
schema to map between each external schema and the internal schema. This mapping also allows any
differences in entity names, attribute names, attribute order, data types, and so on, to be
resolved. Lastly, each external schema is related to the conceptual schema by the
external/conceptual mapping. This enables the DBMS to map names in the user's view onto the relevant
part of the conceptual schema.

Codd’s Rules in DBMS

This set of rules basically signifies the characteristics and requirements of a relational database
management system (RDBMS). In this article, we will learn about various Codd’s rules.

Rule 1: The Information Rule

All information, whether it is user information or metadata, that is stored in a database must be
entered as a value in a cell of a table. It is said that everything within the database is organized in a
table layout.

Rule 2: The Guaranteed Access Rule

Each data element is guaranteed to be accessible logically with a combination of the table name,
primary key (row value), and attribute name (column value).

Rule 3: Systematic Treatment of NULL Values

Every NULL value in a database must be given a systematic and uniform treatment.
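To see what uniform treatment of NULL looks like in practice, consider the short sketch below. It assumes SQLite through Python's `sqlite3` module and a hypothetical `student` table in which one phone number is unknown. In SQL, a comparison with NULL is never true, so missing values are tested with IS NULL instead of the equals sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "555-0101"), (2, "Ravi", None)],  # Python None is stored as NULL
)

# "phone = NULL" is never true, so this matches no rows at all.
equals_null = conn.execute("SELECT COUNT(*) FROM student WHERE phone = NULL").fetchone()[0]

# IS NULL is the systematic way to find missing values.
is_null = conn.execute("SELECT COUNT(*) FROM student WHERE phone IS NULL").fetchone()[0]

# COUNT(*) counts rows; COUNT(phone) skips rows where phone is NULL.
counted = conn.execute("SELECT COUNT(*), COUNT(phone) FROM student").fetchone()

print(equals_null, is_null, counted)
```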

Rule 4: Active Online Catalog Rule

The database catalog, which contains metadata about the database, must be stored in the same
relational form as ordinary data, so that authorized users can query it with the same query
language they use for regular data.

Rule 5: The Comprehensive Data Sublanguage Rule

A relational system may support several languages, but it must offer at least one language with a
well-defined syntax that comprehensively supports data definition, view definition, data
manipulation (querying and modifying), integrity constraints, authorization, and transaction
boundaries.

Rule 6: The View Updating Rule

All views that are theoretically updatable must also be updatable by the system.

Rule 7: High-level Insert, Update, and Delete

The system must support set-at-a-time insert, update, and delete operations: a single statement must
be able to insert, update, or delete a whole set of rows, not just one row at a time.
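A minimal sketch of a set-at-a-time operation, assuming SQLite through Python's sqlite3 module. The
employee table and the raise_salaries helper are hypothetical names used only for illustration.

```python
import sqlite3

def raise_salaries(conn, dept, bonus):
    # A single UPDATE statement changes every matching row at once.
    cur = conn.execute(
        "UPDATE employee SET salary = salary + ? WHERE dept = ?", (bonus, dept)
    )
    return cur.rowcount  # number of rows the statement changed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "IT", 1000), (2, "IT", 2000), (3, "HR", 1500)])
changed = raise_salaries(conn, "IT", 100)
salaries = conn.execute("SELECT id, salary FROM employee ORDER BY id").fetchall()
```

No loop over individual records is needed in the application; the DBMS applies the change to the
whole set of rows selected by the WHERE clause.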

Rule 8: Physical Data Independence

Application programs and activities should remain unaffected when changes are made to the physical
storage structures or methods.

Rule 9: Logical Data Independence

Application programs and activities should remain unaffected when changes are made to the logical
structure of the data, such as adding or modifying tables.

Rule 10: Integrity Independence

Integrity constraints should be specified separately from application programs and stored in the
catalog. They should be automatically enforced by the database system.
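The sketch below shows a constraint declared once in the schema and enforced by the database itself,
so no application code has to repeat the check. It assumes SQLite through Python's sqlite3 module;
the account table and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance may not be negative" lives in the schema, not in the application.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)

def try_insert(conn, account_id, balance):
    try:
        conn.execute("INSERT INTO account VALUES (?, ?)", (account_id, balance))
        return True
    except sqlite3.IntegrityError:
        # Raised by the DBMS when a CHECK, NOT NULL, or key constraint is violated.
        return False

ok = try_insert(conn, 1, 100)
negative = try_insert(conn, 2, -5)
duplicate = try_insert(conn, 1, 50)
```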

Rule 11: Distribution Independence

The distribution of data across multiple locations should be invisible to users, and the database system
should handle the distribution transparently.

Rule 12: Non-Subversion Rule

If the system provides an interface that gives access to low-level records, that interface must not
be able to subvert the system by bypassing its security and integrity constraints.

Data Independence
o Data independence can be explained using the three-schema architecture.

o Data independence refers to the ability to modify the schema at one level of the database system
without altering the schema at the next higher level.

There are two types of data independence:

1. Logical Data Independence


o Logical data independence refers to the ability to change the conceptual schema without having to
change the external schema.

o Logical data independence is used to separate the external level from the conceptual view.

o If we make any changes to the conceptual view of the data, the user view of the data is not
affected.

o Logical data independence occurs at the user interface level.

2. Physical Data Independence

o Physical data independence can be defined as the capacity to change the internal schema
without having to change the conceptual schema.

o If we make any changes to the storage of the database system server, such as its storage size, the
conceptual structure of the database is not affected.

o Physical data independence is used to separate conceptual levels from the internal levels.

o Physical data independence occurs at the logical interface level.
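Both kinds of data independence can be sketched with a view standing in for the external schema.
This assumes SQLite through Python's sqlite3 module; the table, view, and index names are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")])
# External schema: the application only ever reads this view.
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")

def read_names(conn):
    return conn.execute("SELECT id, name FROM student_names ORDER BY id").fetchall()

before = read_names(conn)
# Physical change: a new index alters how data is accessed, not what it means.
conn.execute("CREATE INDEX idx_student_city ON student (city)")
# Logical change: a new column is added to the conceptual schema.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after = read_names(conn)
```

The application query in read_names is untouched by either change and returns the same rows.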

Fig: Data Independence

Database Languages in DBMS


o A DBMS has appropriate languages and interfaces to express database queries and updates.

o Database languages can be used to read, store and update the data in the database.

Types of Database Languages

1. Data Definition Language (DDL)

o DDL stands for Data Definition Language. It is used to define the database structure, or schema.

o It is used to create schemas, tables, indexes, constraints, etc. in the database.

o Using DDL statements, you can create the skeleton of the database.

o DDL statements record metadata such as the number of tables and schemas, their names, indexes, the
columns in each table, constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.

o Alter: It is used to alter the structure of the database.

o Drop: It is used to delete objects from the database.

o Truncate: It is used to remove all records from a table.

o Rename: It is used to rename an object.

o Comment: It is used to add comments to the data dictionary.

These commands change the database schema, which is why they come under the Data Definition
Language.
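A short sketch of the Create, Alter, Rename, and Drop tasks, assuming SQLite through Python's sqlite3
module. SQLite has no TRUNCATE or COMMENT statement, so those two tasks are not shown; the table
names and the table_columns helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_columns(conn, table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")  # Create
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")                # Alter
conn.execute("ALTER TABLE student RENAME TO learner")                     # Rename
columns_after_alter = table_columns(conn, "learner")
conn.execute("DROP TABLE learner")                                        # Drop
columns_after_drop = table_columns(conn, "learner")
```

Each statement changes the schema rather than the rows stored in it.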

2. Data Manipulation Language (DML)


DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a
database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.

o Insert: It is used to insert data into a table.

o Update: It is used to update existing data within a table.

o Delete: It is used to delete records from a table (all records, or only those that match a
condition).

o Merge: It performs an UPSERT operation, i.e., it inserts or updates rows.

o Call: It is used to call a PL/SQL or Java subprogram (in Oracle).

o Explain Plan: It describes the access path (execution plan) that the database will use for a
statement.

o Lock Table: It controls concurrency.
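The four most common DML tasks can be sketched as follows, assuming SQLite through Python's sqlite3
module; the student table and its rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Insert: add rows to the table.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", 20), (2, "Ravi", 22), (3, "Meena", 21)])
# Update: change existing data in the table.
conn.execute("UPDATE student SET age = 23 WHERE id = 2")
# Delete: the WHERE clause limits which rows are removed.
conn.execute("DELETE FROM student WHERE id = 3")
# Select: retrieve data.
remaining = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
```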

3. Data Control Language (DCL)

o DCL stands for Data Control Language. It is used to control access to the data stored in the
database by granting and revoking privileges.

o The DCL execution is transactional. It also has rollback parameters.

(In Oracle Database, however, the execution of data control language cannot be rolled back.)

Here are some tasks that come under DCL:

o Grant: It is used to give user access privileges to a database.

o Revoke: It is used to take back permissions from the user.

The following privileges can be granted to, or revoked from, a user:

CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.

4. Transaction Control Language (TCL)

TCL is used to manage the changes made by DML statements. DML statements can be grouped into a
logical transaction.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction on the database.

o Rollback: It is used to restore the database to its state as of the last Commit.
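A minimal sketch of Commit and Rollback, assuming SQLite through Python's sqlite3 module, where
these appear as the commit() and rollback() methods of the connection. The account table and the
balance helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()      # Commit: the inserted row is now permanent

conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.rollback()    # Rollback: undo every change made since the last commit

def balance(conn, account_id):
    row = conn.execute("SELECT balance FROM account WHERE id = ?", (account_id,)).fetchone()
    return row[0]

balance_after_rollback = balance(conn, 1)
```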

Centralized and Client Server Architecture for DBMS


Centralized Architecture of DBMS:

Architectures for DBMSs have generally followed trends seen in architectures for larger computer
systems. The primary processing for all system functions, including user application programs, user
interface programs, and all DBMS capabilities, was handled by mainframe computers in earlier
systems. The primary cause of this was that the majority of users accessed such systems using
computer terminals that had little processing power and offered only display capabilities. All
processing was done remotely on the central computer system, and only display data and controls were
sent from it to the display terminals, which were connected to the central computer by a variety of
communications networks.

The majority of users switched from terminals to PCs and workstations as hardware prices decreased.
Initially, database systems used these computers in the same way that they had used display
terminals. As a result, the DBMS itself continued to operate as a centralized DBMS, where all DBMS
functionality, application program execution, and UI processing were done on a single computer.
Client/server DBMS designs emerged as DBMS systems gradually began to take advantage of the user
side's computing capability.

Fig: The physical elements of a centralized architecture

Client-server Architecture of DBMS:

We first discuss client/server architecture in general and then look at how DBMSs use it. The
client/server architecture was designed to handle computing environments in which a large number of
PCs, workstations, file servers, printers, database servers, web servers, email servers, and other
software and hardware are connected via a network.

The aim is to define specialized servers, each with a particular functionality. For instance, a
number of PCs or small workstations can be connected as clients to a file server that manages the
files of the client machines. Another machine can be designated as a print server by connecting it
to various printers; all print requests from clients are then directed to this machine. Web servers
and email servers also fall into the category of specialized servers. Many client machines can use
the resources offered by specialized servers. The client machines give the user the appropriate
interfaces to these servers, as well as local processing power to run local applications. This idea
can be applied to other software, where specialized programs, such as a CAD (computer-aided design)
package, are kept on particular server machines and made available to a variety of clients. Some
machines (such as workstations or PCs with disks that have only client software installed) would be
client sites only.

The idea of client/server architecture presupposes an underlying framework made up of many PCs and
workstations, as well as a smaller number of mainframe computers, connected via LANs and other types
of computer networks. In this framework, a client is typically a user machine that offers local
processing and user interface capabilities. When a client needs access to additional functionality,
such as database access, that is not available on that machine, it connects to a server that offers
it. A server is a system containing both hardware and software that can provide services to client
machines, such as file access, printing, archiving, or database access. In general, some machines
install both client and server software, while others install only client software. It is more
common, however, for client and server software to run on separate machines. Two-tier and three-tier
DBMS architectures were developed on this underlying client/server framework.

Two-Tier Client Server Architecture:

Here, the term "two-tier" refers to the two layers of the architecture: the Client layer and the Data
layer. The client layer contains a number of client computers that can contact the database server.
An API on the client computer, such as JDBC, connects the client to the database server. This is
needed because clients and database servers may be in different physical locations.

Three-Tier Client-Server Architecture:

In this architecture, the Business Logic Layer is an additional layer that serves as a link between
the Client layer and the Data layer. The application programs are processed in the business logic
layer, on the application server itself, unlike in a two-tier architecture, where clients send their
queries straight to the database server.

Types of Databases

There are various types of databases used for storing different varieties of data:

1) Centralized Database

It is the type of database that stores data in a single, centralized database system. It allows users
to access the stored data from different locations through several applications. These applications
include an authentication process so that users can access the data securely. An example of a
centralized database is a central library that holds a central database for every library in a
college/university.

Advantages of Centralized Database

o It decreases the risk in data management, i.e., manipulation of data will not affect the core
data.

o Data consistency is maintained as it manages data in a central repository.

o It provides better data quality, which enables organizations to establish data standards.

o It is less costly because fewer vendors are required to handle the data sets.

Disadvantages of Centralized Database

o The size of the centralized database is large, which increases the response time for fetching
the data.

o It is not easy to update such an extensive database system.

o If the central server fails, the entire database becomes unavailable, and without backups the data
may be lost, which could be a huge loss.

2) Distributed Database

Unlike a centralized database system, in distributed systems, data is distributed among different
database systems of an organization. These database systems are connected via communication links.
Such links help the end-users to access the data easily. Examples of the Distributed database are
Apache Cassandra, HBase, Ignite, etc.

We can further divide a distributed database system into:

o Homogeneous DDB: Those database systems which run on the same operating system, use the same
application process, and use the same hardware devices.

o Heterogeneous DDB: Those database systems which run on different operating systems, under
different application procedures, and use different hardware devices.

Advantages of Distributed Database

o Modular development is possible in a distributed database, i.e., the system can be expanded
by including new computers and connecting them to the distributed system.

o One server failure will not affect the entire data set.

3) Relational Database

This database is based on the relational data model, which stores data in the form of rows (tuples)
and columns (attributes) that together form a table (relation). A relational database uses SQL for
storing, manipulating, and maintaining the data. E.F. Codd proposed the relational model in 1970.
Each table in the database has a key that uniquely identifies each of its rows. Examples of
relational databases are MySQL, Microsoft SQL Server, Oracle, etc.

Properties of Relational Database

Transactions in a relational database have the following four commonly known properties, known as
the ACID properties, where:

A means Atomicity: This ensures that a data operation either completes entirely or has no effect at
all. It follows the 'all or nothing' strategy. For example, a transaction will either be committed
or will abort.

C means Consistency: Any operation performed on the data must take the database from one valid state
to another, so the data is correct both before and after the operation. For example, when money is
transferred between accounts, the total balance before and after the transaction should remain
conserved.

I means Isolation: Several users can access data from the database at the same time, so concurrent
transactions must remain isolated from each other. For example, when multiple transactions occur at
the same time, the effects of one transaction should not be visible to the other transactions until
it commits.

D means Durability: It ensures that once the operation completes and the data is committed, the
changes remain permanent.
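Atomicity can be sketched with a money transfer that either applies both updates or neither. This
assumes SQLite through Python's sqlite3 module; the account table and the transfer helper are
hypothetical, and the CHECK constraint stands in for a consistency rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

first = transfer(conn, 1, 2, 30)
after_first = balances(conn)
second = transfer(conn, 1, 2, 500)   # would make account 1 negative
after_second = balances(conn)
```

In the second call the credit to account 2 succeeds first, but it is undone when the debit violates
the constraint, so the total balance stays conserved.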

4) NoSQL Database

Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets. It is
not a relational database, as it stores data not only in tabular form but in several different ways.
It came into existence when the demand for building modern applications increased. Thus, NoSQL
presented a wide variety of database technologies in response to the demands. We can further divide
NoSQL databases into the following four types:

Key-value storage: It is the simplest type of database storage, where every single item is stored as
a key (or attribute name) together with its value.

Document-oriented Database: A type of database used to store data as JSON-like documents. It helps
developers store data using the same document-model format as used in the application code.

Graph Databases: It is used for storing vast amounts of data in a graph-like structure. Most
commonly, social networking websites use graph databases.

Wide-column stores: It is similar to the data represented in relational databases. Here, data is
stored together in large columns instead of in rows.

Advantages of NoSQL Database

o It enables good productivity in the application development as it is not required to store data
in a structured format.

o It is a better option for managing and handling large data sets.

o It provides high scalability.

o Users can quickly access data from the database through key-value.

5) Cloud Database

A type of database where data is stored in a virtual environment and runs on a cloud computing
platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for
accessing the database. There are numerous cloud platforms; some well-known options are:

o Amazon Web Services (AWS)

o Microsoft Azure

o Kamatera

o phoenixNAP

o ScienceSoft

o Google Cloud SQL, etc.

6) Object-oriented Databases

The type of database that uses the object-based data model approach for storing data in the database
system. The data is represented and stored as objects which are similar to the objects used in the
object-oriented programming language.

7) Hierarchical Databases

It is the type of database that stores data in the form of parent-children relationship nodes. Here, it
organizes data in a tree-like structure.

Data get stored in the form of records that are connected via links. Each child record in the tree will
contain only one parent. On the other hand, each parent record can have multiple child records.

8) Network Databases

It is the database that typically follows the network data model. Here, the representation of data is in
the form of nodes connected via links between them. Unlike the hierarchical database, it allows each
record to have multiple children and parent nodes to form a generalized graph structure.

9) Personal Database

Collecting and storing data on the user's system defines a Personal Database. This database is basically
designed for a single user.

Advantage of Personal Database


o It is simple and easy to handle.

o It occupies less storage space as it is small in size.

10) Operational Database

The type of database which creates and updates data in real time. It is basically designed for
executing and handling the daily data operations of a business. For example, an organization uses
operational databases for managing its day-to-day transactions.

11) Enterprise Database

Large organizations or enterprises use this database for managing a massive amount of data. It helps
organizations to increase and improve their efficiency. Such a database allows simultaneous access to
users.

Advantages of Enterprise Database:

o Multiple processes are supported on the Enterprise database.

o It allows executing parallel queries on the system.


ER (Entity Relationship) Diagram in DBMS
o ER model stands for Entity-Relationship model. It is a high-level data model. This model is used
to define the data elements and relationships for a specified system.

o It develops a conceptual design for the database. It also develops a very simple and easy-to-design
view of data.

o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship
diagram.

For example, suppose we design a school database. In this database, the student will be an entity
with attributes like address, name, id, age, etc. The address can be another entity with attributes
like city, street name, pin code, etc., and there will be a relationship between them.

Components of ER Diagram

1. Entity:

An entity may be any object, class, person or place. In the ER diagram, an entity is represented as
a rectangle.

Consider an organization as an example: manager, product, employee, department, etc. can be taken as
entities.

a. Weak Entity

An entity that depends on another entity is called a weak entity. The weak entity doesn't contain
any key attribute of its own. The weak entity is represented by a double rectangle.

2. Attribute

The attribute is used to describe a property of an entity. An ellipse is used to represent an
attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute

The key attribute is used to represent the main characteristic of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and the ellipses of its component attributes are
connected to that ellipse.

c. Multivalued Attribute

An attribute can have more than one value. Such attributes are known as multivalued attributes. A
double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.

d. Derived Attribute

An attribute that can be derived from another attribute is known as a derived attribute. It is
represented by a dashed ellipse.

For example, a person's age changes over time and can be derived from another attribute like date of
birth.
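The age example can be worked through in a few lines. This is only a sketch; the function name
age_on and its parameters are illustrative.

```python
from datetime import date

def age_on(date_of_birth, today):
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

age_day_before = age_on(date(2000, 6, 15), date(2024, 6, 14))
age_on_birthday = age_on(date(2000, 6, 15), date(2024, 6, 15))
```

Because the value can always be recomputed from the date of birth, it does not need to be stored.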

3. Relationship

A relationship is used to describe the relation between entities. A diamond or rhombus is used to
represent the relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of each entity is associated with the relationship, it is known as a
one-to-one relationship.

For example, a female can marry one male, and a male can marry one female.

b. One-to-many relationship

When only one instance of the entity on the left and more than one instance of the entity on the
right are associated with the relationship, it is known as a one-to-many relationship.

For example, a scientist can make many inventions, but each invention is made by one specific
scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left and only one instance of the entity on the
right are associated with the relationship, it is known as a many-to-one relationship.

For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left and more than one instance of the entity on
the right are associated with the relationship, it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.

Notation of ER diagram

A database can be represented using notations. In an ER diagram, many notations are used to express
the cardinality. These notations are as follows:

Fig: Notations of ER diagram

Cardinality in DBMS (Mapping Constraints)


DBMS

DBMS stands for Database Management System, which is a tool or software used to perform various
operations on a database, such as creating the database, deleting the database, or updating the
current database. To simplify processing and data querying, the most popular types of databases
currently in use typically model their data as rows and columns in a set of tables. The data can then
be handled, updated, regulated, and structured with ease. For writing and querying data, most
databases employ Structured Query Language (SQL).

Cardinality

Cardinality describes how the entities in a relationship set are related to each other, i.e., the
structure of the relationship between them. In a Database Management System, cardinality is a number
that denotes how many times an entity participates with another entity in a relationship set. It is
a very important attribute in representing the structure of a database. For a table, the cardinality
is its number of rows or tuples.

Cardinality Ratio

Cardinality ratio is also called Cardinality Mapping; it represents the mapping of one entity set to
another entity set in a relationship set. We generally take the example of a binary relationship set,
where two entity sets are mapped to each other.

Cardinality is very important in the databases of various businesses. For example, if we want to
track the purchase history of each customer, we can use a one-to-many cardinality between customers
and purchases to find the data of a specific customer. Database managers can use the cardinality
model for a variety of purposes, but corporations often use it to evaluate customer or inventory
data.

There are four types of Cardinality Mapping in Database Management Systems:

1. One to one

2. Many to one

3. One to many

4. Many to many

One to One

One to one cardinality is represented by the symbol 1:1. Here, each entity of one entity set is
related to at most one entity of the other entity set. There are a lot of examples of one-to-one
cardinality in real-life databases.

For example, one student can have only one student id, and one student id can belong to only one
student. So, the relationship mapping between student and student id will be one to one cardinality
mapping.

Another example is the relationship between the director of the school and the school because one
school can have a maximum of one director, and one director can belong to only one school.

Note: In one-to-one cardinality, it is not necessary that every entity in an entity set takes part
in the mapping. Some entities may not participate in the mapping.

Many to One Cardinality:

In many to one cardinality mapping, multiple entities of set 1 can be related to a single entity of
set 2. Put the other way, one entity of set 2 can be related to more than one entity of set 1.

One to one cardinality is a special case of many to one cardinality. It can be represented by M:1.

For example, there are multiple patients in a hospital who are served by a single doctor, so the
relationship between patients and doctors can be represented by Many to one Cardinality.

One to Many Cardinality:

In one-to-many cardinality mapping, a single entity of set 1 can be related to one or more entities
of set 2. Put the other way, more than one entity of set 2 can be related to only one entity of
set 1.

One to one cardinality is a special case of one-to-many cardinality. It can be represented by 1:M.

For Example, in a hospital, there can be various compounders, so the relationship between the
hospital and compounders can be mapped through One-to-many Cardinality.

Many to Many Cardinality:

In many-to-many cardinality mapping, one or more entities of set 1 can be associated with one or more
entities of set 2. In the same way, from the side of set 2, one or more entities can be related to
one or more entities of set 1.

It is represented by M:N or N:M.

One to one, one to many, and many to one cardinalities are special cases of many to many
cardinality.

For example, in a college, multiple students can work on a single project, and a single student can
also work on multiple projects. So, the relationship between the project and the student can be
represented by many to many cardinality.

Appropriate Mapping Cardinality

The appropriate mapping cardinality for a specific relationship set depends on the real-world
situation that the relationship set models.

o If the cardinality is one-to-many or many-to-one, the relationship table can be combined with the
table of the entity on the "many" side.

o If the cardinality is one-to-one, the relationship table can be combined with the table of an
entity that has total participation; if both entities have total participation, the two entity
tables and the relationship can be combined into a single table.

o If the cardinality is many-to-many, the relationship table cannot be combined with either entity
table and must remain a separate table.
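The table layouts that follow from these cardinalities can be sketched as below, reusing the
patient/doctor and student/project examples from above. This assumes SQLite through Python's sqlite3
module; all table names, column names, and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctor  (doctor_id INTEGER PRIMARY KEY, name TEXT);
-- Many-to-one: the relationship is stored as a column on the "many" side.
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT,
                      doctor_id INTEGER REFERENCES doctor(doctor_id));
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT);
-- Many-to-many: the relationship needs a table of its own.
CREATE TABLE works_on (student_id INTEGER REFERENCES student(student_id),
                       project_id INTEGER REFERENCES project(project_id),
                       PRIMARY KEY (student_id, project_id));
INSERT INTO doctor  VALUES (1, 'Dr. Rao');
INSERT INTO patient VALUES (1, 'Kiran', 1), (2, 'Leela', 1);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project VALUES (10, 'Compiler'), (20, 'Robot');
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

def patients_of(conn, doctor_id):
    rows = conn.execute(
        "SELECT name FROM patient WHERE doctor_id = ? ORDER BY name", (doctor_id,))
    return [name for (name,) in rows]

def projects_of(conn, student_id):
    rows = conn.execute(
        "SELECT title FROM project JOIN works_on USING (project_id) "
        "WHERE student_id = ? ORDER BY title", (student_id,))
    return [title for (title,) in rows]

def students_on(conn, project_id):
    rows = conn.execute(
        "SELECT name FROM student JOIN works_on USING (student_id) "
        "WHERE project_id = ? ORDER BY name", (project_id,))
    return [name for (name,) in rows]
```

For instance, projects_of(conn, 1) lists both projects of the first student, while
students_on(conn, 10) lists both students on the same project.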


Keys
o Keys play an important role in the relational database.

o A key is used to uniquely identify any record or row of data in a table. It is also used to
establish and identify relationships between tables.

For example, ID is used as a key in the Student table because it is unique for each student. In the
PERSON table, passport_number, license_number, and SSN are keys since they are unique for each
person.

Types of keys:

1. Primary key

o It is the key chosen to identify one and only one instance of an entity uniquely. An entity can
contain multiple keys, as we saw in the PERSON table. The key which is most suitable from that list
becomes the primary key.

o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we could instead select License_Number or Passport_Number as the primary key, since
they are also unique.

o For each entity, the primary key selection is based on requirements and developers.

2. Candidate key

o A candidate key is an attribute or minimal set of attributes that can uniquely identify a tuple.

o The primary key is chosen from among the candidate keys. The remaining candidate keys are as
strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key. The other unique
attributes, like SSN, Passport_Number, License_Number, etc., are also candidate keys.

3. Super Key

A super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key.

For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two
employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also
be a super key.
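The relation between super keys and candidate keys can be checked mechanically on sample data. This
is only a sketch: a key is really a property of the schema (of every possible set of rows), so
testing one sample can rule a key out but cannot prove it. The employees rows and both function
names are made up.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    # An attribute set is a super key if no two rows agree on all of its attributes.
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    # Candidate keys are the minimal super keys: no smaller key is contained in them.
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_superkey(rows, attrs) and not any(set(k) <= set(attrs) for k in found):
                found.append(attrs)
    return found

employees = [
    {"emp_id": 1, "name": "Asha", "ssn": "A1"},
    {"emp_id": 2, "name": "Asha", "ssn": "B2"},
    {"emp_id": 3, "name": "Ravi", "ssn": "C3"},
]
```

Here (emp_id, name) is a super key but not a candidate key, because emp_id alone is already enough
to identify a row.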

4. Foreign key

o A foreign key is a column of a table that points to the primary key of another table.

o Every employee works in a specific department in a company, and employee and department are two
different entities. So we don't store the department's information in the employee table. Instead,
we link these two tables through the primary key of one table.

o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE
table.

o In the EMPLOYEE table, Department_Id is the foreign key, and the two tables are related.
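The EMPLOYEE and DEPARTMENT link can be sketched as follows, assuming SQLite through Python's sqlite3
module. SQLite checks foreign keys only after PRAGMA foreign_keys = ON; the add_employee helper and
the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " department_id INTEGER REFERENCES department(department_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Sales')")

def add_employee(conn, emp_id, name, department_id):
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, department_id))
        return True
    except sqlite3.IntegrityError:
        # The referenced department does not exist, so the DBMS rejects the row.
        return False

valid = add_employee(conn, 1, "Asha", 1)
invalid = add_employee(conn, 2, "Ravi", 99)
```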

5. Alternate key

There may be one or more attributes or combinations of attributes that uniquely identify each tuple
in a relation. These attributes or combinations of attributes are called the candidate keys. One key
is chosen as the primary key from these candidate keys, and each remaining candidate key, if any
exists, is termed an alternate key. In other words, the total number of alternate keys is the total
number of candidate keys minus one (the primary key). Alternate keys may or may not exist. If there
is only one candidate key in a relation, it does not have an alternate key.

For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No,
acts as the Alternate key.

6. Composite key

Whenever a primary key consists of more than one attribute, it is known as a composite key. This key
is also known as Concatenated Key.

For example, in the employee relation, we assume that an employee may be assigned multiple roles,
and an employee may work on multiple projects simultaneously. So the primary key will be composed of
all three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination. These attributes act as
a composite key since the primary key comprises more than one attribute.

7. Artificial key

Keys created using arbitrarily assigned data are known as artificial keys. These keys are created
when a primary key is large and complex and has no relationship with many other relations. The data
values of the artificial keys are usually numbered in serial order.

For example, the primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the employee
relation. So it would be better to add a new virtual attribute to identify each tuple in the
relation uniquely.
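A sketch of adding an artificial key next to the three-attribute composite key, assuming SQLite
through Python's sqlite3 module. The assignment table, the assignment_id column, and the assign
helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # artificial key, numbered serially
    " emp_id INTEGER,"
    " emp_role TEXT,"
    " proj_id INTEGER,"
    " UNIQUE (emp_id, emp_role, proj_id))"               # the original composite key
)

def assign(conn, emp_id, emp_role, proj_id):
    cur = conn.execute(
        "INSERT INTO assignment (emp_id, emp_role, proj_id) VALUES (?, ?, ?)",
        (emp_id, emp_role, proj_id),
    )
    return cur.lastrowid  # the serial value the DBMS generated

first_id = assign(conn, 7, "tester", 100)
second_id = assign(conn, 7, "developer", 100)
```

Keeping the UNIQUE constraint on the three original attributes preserves the rule that the same
employee, role, and project combination cannot be stored twice.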

Generalization
o Generalization is like a bottom-up approach in which two or more entities of lower level
combine to form a higher level entity if they have some attributes in common.

o In generalization, an entity of a higher level can also combine with the entities of the lower
level to form a further higher level entity.

o Generalization resembles a subclass and superclass system; the only difference is the approach.
Generalization uses the bottom-up approach.

o In generalization, entities are combined to form a more generalized entity, i.e., subclasses are
combined to make a superclass.

For example, the Faculty and Student entities can be generalized to create a higher level entity,
Person.

Specialization
o Specialization is a top-down approach, and it is the opposite of generalization. In
specialization, one higher level entity can be broken down into two or more lower level entities.

o Specialization is used to identify the subset of an entity set that shares some distinguishing
characteristics.

o Normally, the superclass is defined first, the subclasses and their related attributes are defined
next, and relationship sets are then added.

For example: In an employee management system, the EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on the role they play in the company.

Aggregation
In aggregation, the relationship between two entities is treated as a single entity. The
relationship, together with its corresponding entities, is aggregated into a higher-level entity.

For example: the Center entity offering the Course entity acts as a single entity, and that entity is in
a relationship with another entity, Visitor. In the real world, a visitor to a coaching center will
never enquire about the Course alone or about the Center alone; instead, the enquiry will be about
both.
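
One way to sketch this in tables, assuming SQLite through Python's standard sqlite3 module, is to give
the "offers" relationship its own table and let the enquiry refer to a row of that table as a whole.
All table names, column names, and sample rows below are hypothetical; the source only names the
Center, Course, and Visitor entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE CENTER  (CENTER_ID  INTEGER PRIMARY KEY, CENTER_NAME  TEXT);
    CREATE TABLE COURSE  (COURSE_ID  INTEGER PRIMARY KEY, COURSE_NAME  TEXT);
    CREATE TABLE VISITOR (VISITOR_ID INTEGER PRIMARY KEY, VISITOR_NAME TEXT);

    -- The "offers" relationship between CENTER and COURSE.
    CREATE TABLE OFFERS (
        CENTER_ID INTEGER REFERENCES CENTER(CENTER_ID),
        COURSE_ID INTEGER REFERENCES COURSE(COURSE_ID),
        PRIMARY KEY (CENTER_ID, COURSE_ID)
    );

    -- An enquiry points at an OFFERS row as a whole, not at CENTER or COURSE alone.
    CREATE TABLE ENQUIRY (
        VISITOR_ID INTEGER REFERENCES VISITOR(VISITOR_ID),
        CENTER_ID  INTEGER,
        COURSE_ID  INTEGER,
        PRIMARY KEY (VISITOR_ID, CENTER_ID, COURSE_ID),
        FOREIGN KEY (CENTER_ID, COURSE_ID) REFERENCES OFFERS(CENTER_ID, COURSE_ID)
    );

    INSERT INTO CENTER  VALUES (1, 'City Center');
    INSERT INTO COURSE  VALUES (10, 'DBMS'), (20, 'Networks');
    INSERT INTO VISITOR VALUES (7, 'Asha');
    INSERT INTO OFFERS  VALUES (1, 10);
    INSERT INTO ENQUIRY VALUES (7, 1, 10);
""")

# Center 1 does not offer course 20, so an enquiry about that pair should be refused.
try:
    conn.execute("INSERT INTO ENQUIRY VALUES (7, 1, 20)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The composite foreign key is what expresses the aggregation here: an enquiry can exist only for a
Center and Course pair that is actually related through OFFERS.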

Reduction of ER diagram to Table


A database can be represented using ER notations, and these notations can be reduced to a
collection of tables.

In the database, every entity set or relationship set can be represented in tabular form.

The ER diagram is given below:

There are some points for converting the ER diagram to the table:

o Entity type becomes a table.

In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE form individual tables.

o Every single-valued attribute becomes a column of the table.

In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the columns of the STUDENT table.
Similarly, COURSE_NAME and COURSE_ID form the columns of the COURSE table, and so on.

o A key attribute of the entity type is represented by the primary key.

In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attributes
of their respective entities.

o The multivalued attribute is represented by a separate table.


In the STUDENT table, hobby is a multivalued attribute, so it is not possible to represent its multiple
values in a single column of the STUDENT table. Hence we create a table STUD_HOBBY with the columns
STUDENT_ID and HOBBY. Together, the two columns form a composite key.
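
The rule above can be sketched as follows, assuming SQLite through Python's standard sqlite3 module.
The table and column names follow the text; the sample student and hobbies are made up for
illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (
        STUDENT_ID   INTEGER PRIMARY KEY,
        STUDENT_NAME TEXT
    );

    -- One row per (student, hobby) pair instead of several values in one column.
    CREATE TABLE STUD_HOBBY (
        STUDENT_ID INTEGER REFERENCES STUDENT(STUDENT_ID),
        HOBBY      TEXT,
        PRIMARY KEY (STUDENT_ID, HOBBY)   -- composite key over both columns
    );

    INSERT INTO STUDENT VALUES (1, 'Ravi');
    INSERT INTO STUD_HOBBY VALUES (1, 'Chess'), (1, 'Cricket');
""")

hobbies = [
    row[0]
    for row in conn.execute(
        "SELECT HOBBY FROM STUD_HOBBY WHERE STUDENT_ID = ? ORDER BY HOBBY", (1,)
    )
]
print(hobbies)
```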

o A composite attribute is represented by its components.

In the given ER diagram, student address is a composite attribute. It contains CITY, PIN, DOOR#, STREET,
and STATE. In the STUDENT table, each of these components becomes an individual column.

o Derived attributes are not considered in the table.

In the STUDENT table, Age is a derived attribute. It can be calculated at any point in time from
the difference between the current date and the Date of Birth.
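
As a small illustration of why Age need not be stored, the sketch below derives it from a date of
birth on demand. The function name age_in_years and the sample dates are hypothetical.

```python
from datetime import date


def age_in_years(date_of_birth: date, today: date) -> int:
    """Return the number of completed years between date_of_birth and today."""
    # If this year's birthday has not happened yet, one year must be subtracted.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


# The day before the 24th birthday, the derived age is still 23.
print(age_in_years(date(2000, 6, 15), date(2024, 6, 14)))
```

In an application, the second argument would normally be date.today(), so the value stays correct
without any update to the table.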

Using these rules, you can convert the ER diagram to tables and columns and assign the mapping
between the tables. Table structure for the given ER diagram is as below:

Figure: Table structure


As data complexity rises, it is getting harder to apply the conventional ER paradigm to database
modeling. The existing ER model needs to be enhanced so that it can better handle complicated
applications and reduce modeling complexity. For such cases, you may want to opt for an
enhanced entity-relationship (EER) model.

All the components of an ER diagram are included in an EER diagram, with the following additions that
help us create and maintain detailed databases through high-level models and tools.

• Specialization and Generalization

• Aggregation

• Category or Union types

• Subclass and superclass

What is an EER Diagram?

Today, as data creation has expanded, so has the complexity of data. As a result, using the standard
Entity-Relationship paradigm for database modeling is becoming increasingly problematic. To reduce
modeling complexity, the existing Entity-Relationship model needed to be updated and enhanced
so that it could manage modeling and application complexity better. This is where the concept of the
EER model, i.e., the Enhanced ER model, comes into play.

Like regular ER diagrams, enhanced ER models are database diagrams that show the requirements and
complexity of a database, but they are aimed at more complex databases. In short, an EER model is an
ER model extended with a few additional concepts. The kind of diagram used to depict an Enhanced ER
Model is the Enhanced ER Diagram. In addition to the ER concepts, EER diagrams illustrate subclass and
superclass, generalization and specialization, union or category, and aggregation.

The diagram below is a representation of an Enhanced Entity-Relationship model. As you can see, the
EER model expands upon the ER model.

A Brief History of EER Diagrams

Before working on various projects, an IBM engineer in the United Kingdom created a new ER diagram
in the 1970s. That is when the concept took root, grew, and evolved into what is now known as the EER
diagram.

EER models are useful tools for designing databases with high-level models. EER diagrams became
important because they provide all the features that are included in ER diagrams, with the addition
of the following elements:

• Specialization and Generalization

• Aggregation

• Category or Union types

• Subclass and superclass


Features of the EER Model

• EER generates designs that are more accurate to database schemas.

• The diagrammatic style is useful in displaying the EER schema.

• The modeling concepts that are included in the ER model are also included in the EER model.

• The data properties and constraints reflected by the EER model are very precise.

• It incorporates the ideas of specialization and generalization.

• It is used to represent a collection of objects that are a union of distinct types of objects.

Sub Class and Super Class

The link between subclasses and superclasses introduces the idea of inheritance. In an EER diagram,
a circle containing the letter 'd' marks a disjoint relationship between subclasses and their superclass.

Super Class

A superclass is an entity type that is connected to one or more subtypes.

For example: The Shape superclass includes subclasses such as Triangle, Circle, and Square.

Sub Class

A subclass is a collection of objects with special characteristics. The traits and properties of a subclass
are inherited from its superclass. Note also that an entity cannot exist in the database merely by
belonging to a subclass; it must also be a member of the superclass.

For example: Triangle, Circle, and Square are subclasses of the Shape superclass.

Specialization

Specialization is a procedure that defines a set of entities that are divided into subgroups based on
their characteristics. It follows a top-down approach. The superclass, or parent entity, is defined
first and drawn as a rectangle. It is then separated into subclasses, which are comparable entity
types.

Let's take the example of a company that manufactures automobiles and that handles, stores, and
processes a large amount of data. The primary entity for this company is the Vehicle, which is
treated as the superclass. The attributes of the superclass include the type of the vehicle, the color
of the vehicle, the average of the vehicle, etc.

Now, the Vehicle superclass can be further subdivided into various subclasses, for example,
Car, Scooter, and Truck. Each of these subclasses inherits all of the attributes
of the Vehicle superclass. Also, note that a subclass can have its own properties in addition
to the inherited ones. The diagram below is a representation of the scenario given above.
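
One common way to carry such a specialization into tables is to give each subclass its own table that
shares the primary key of the superclass table. The sketch below assumes SQLite through Python's
standard sqlite3 module; the VEHICLE_ID and SEATS columns and the sample rows are hypothetical and do
not come from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Superclass: attributes shared by every vehicle.
    CREATE TABLE VEHICLE (
        VEHICLE_ID INTEGER PRIMARY KEY,   -- hypothetical key column
        TYPE  TEXT,
        COLOR TEXT
    );

    -- Subclass: same key as the superclass row, plus its own attribute.
    CREATE TABLE CAR (
        VEHICLE_ID INTEGER PRIMARY KEY REFERENCES VEHICLE(VEHICLE_ID),
        SEATS INTEGER                     -- hypothetical subclass-specific attribute
    );

    INSERT INTO VEHICLE VALUES (1, 'Car', 'Red'), (2, 'Truck', 'Blue');
    INSERT INTO CAR VALUES (1, 5);
""")

# Joining on the shared key gives the car its inherited attributes plus its own.
car = conn.execute("""
    SELECT v.VEHICLE_ID, v.TYPE, v.COLOR, c.SEATS
    FROM CAR AS c
    JOIN VEHICLE AS v ON v.VEHICLE_ID = c.VEHICLE_ID
""").fetchone()
print(car)
```

The inherited attributes are stored once in VEHICLE, and the join plays the role of inheritance when
a complete Car is needed.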

Generalization

Generalization is the process of extracting the shared characteristics or traits of entities and
compiling them into a superclass. It is the reverse of Specialization: in short, it combines subclasses
into a superclass. However, generalization combines only those entity sets that share the
same features into higher-level entities.

Because generalization is the reverse of specialization, it follows a bottom-up approach, i.e., the
lower-level entities combine to form a higher-level entity.

Let's take the same example of the data handling scenario for a company that manufactures
automobiles. Scooter, Car, and Truck are the three primary entities in the Enhanced ER diagram for
this example. These entities can include attributes like registration number, license period,
insurance number, and so on, and they can be used as subclasses of both the Commercial and Private
vehicle superclasses. The attributes that the subclasses Car, Scooter, and Truck have in common are
placed in their respective superclasses. This process of taking the shared attributes
and working up to the common root is known as Generalization.

Category or Union

A Category or Union represents a single superclass/subclass relationship in which one subclass is
related to two or more superclasses. The participation of the superclasses can be partial or total.
In short, a category or union represents a relationship of the "either/or" type.

Let's consider an example of a car and its owner. The owner can be considered the subclass, and the
superclass can be an individual, a company, or a bank. As shown in the EER model below, the
subclass, i.e., a car owner in the car booking model, can be any one of the superclasses: an individual, a
company, or a bank.
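
A category can be sketched in tables by giving the subclass a surrogate key that each superclass may
refer to. This is one common mapping, not necessarily the one intended by the diagram, and it assumes
SQLite through Python's standard sqlite3 module. All table names, column names, and sample rows are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The category (subclass) gets its own surrogate key.
    CREATE TABLE OWNER (OWNER_ID INTEGER PRIMARY KEY);

    -- Each superclass row may point at the OWNER row it corresponds to (NULL if not an owner).
    CREATE TABLE INDIVIDUAL (NAME TEXT, OWNER_ID INTEGER UNIQUE REFERENCES OWNER(OWNER_ID));
    CREATE TABLE COMPANY    (NAME TEXT, OWNER_ID INTEGER UNIQUE REFERENCES OWNER(OWNER_ID));
    CREATE TABLE BANK       (NAME TEXT, OWNER_ID INTEGER UNIQUE REFERENCES OWNER(OWNER_ID));

    CREATE TABLE CAR (REG_NO TEXT PRIMARY KEY, OWNER_ID INTEGER REFERENCES OWNER(OWNER_ID));

    INSERT INTO OWNER VALUES (1), (2);
    INSERT INTO INDIVIDUAL VALUES ('Meera', 1);
    INSERT INTO BANK VALUES ('Example Bank', 2);
    INSERT INTO CAR VALUES ('CAR-001', 1), ('CAR-002', 2);
""")

# Each car owner resolves to a row in exactly one of the three superclass tables.
owners = conn.execute("""
    SELECT c.REG_NO, o.KIND, o.NAME
    FROM CAR AS c
    JOIN (
        SELECT OWNER_ID, 'individual' AS KIND, NAME FROM INDIVIDUAL
        UNION ALL
        SELECT OWNER_ID, 'company', NAME FROM COMPANY
        UNION ALL
        SELECT OWNER_ID, 'bank', NAME FROM BANK
    ) AS o ON o.OWNER_ID = c.OWNER_ID
    ORDER BY c.REG_NO
""").fetchall()
print(owners)
```

The UNION ALL in the query mirrors the "union" in the name: the set of owners is drawn from the
union of the individual, company, and bank entity sets.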

Aggregation

Aggregation is a high-level data modeling technique that makes use of both Generalization and
Specialization. It is widely used to connect distinct entity types based on a shared relationship. This
idea is commonly used to represent execution elements, operational lines, or functional behavior of a
similar sort in terms of shared properties.

Aggregation is used to simplify the details of a given database by converting ternary relationships to
binary ones. A ternary relationship is a relationship that exists among three
entities. Let's take an example to understand aggregation. In this example, teachers and
students are related to each other via learning, and this relationship between the teachers and
students acts as an entity that is related to the Student.

Conclusion

• Enhanced ER models are advanced database diagrams that, like ordinary ER diagrams, depict
the requirements and complexities of complicated databases.

• All the components of an ER diagram are included in an EER diagram, and there are some
additions, mentioned below, that help us create and maintain detailed databases through
high-level models and tools.

• Additions are Specialization and Generalization, Aggregation, Category or Union types, and
Subclass and superclass.

• EER model in DBMS generates designs that are more accurate to database schemas and the
modeling concepts that are included in the ER model are also included in the EER model.

• Generalization is the process of extracting the common attributes or properties of entities to
sum them up and form a superclass.

• Specialization is a procedure that defines a set of entities that are divided into subgroups based
on their characteristics.

• Aggregation is a high-level data modeling technique that makes use of both Specialization and
Generalization. It is widely used to connect distinct entity types based on a shared relationship.
