Database Systems

A database is an organized collection of data that can be easily accessed and managed. It organizes data into tables, rows, columns, and indexes. Common database types include relational databases like MySQL, distributed databases that distribute data across systems, centralized databases that store all data in one location, cloud databases that store data virtually in the cloud, and NoSQL databases used for large unstructured data. A data warehouse contains historical data from multiple sources to facilitate analysis and decision making.

Uploaded by

Haider Sultan
Database Systems

What is a Database?
A database is an organized collection of data, structured so that it can be
easily accessed and managed.

Databases organize data into tables, rows, columns, and indexes to make it
easier to find relevant information.

• A database is a systematic collection of data.
• Databases support electronic storage and manipulation of data.
• Databases make data management easy.
Database Example
• An online telephone directory uses a database to store people's names,
phone numbers, and other contact details.
• Your electricity service provider uses a database to manage billing,
client-related issues, fault data, etc.
• Many dynamic websites on the World Wide Web are backed by databases.
For example, a website that checks the availability of rooms in a hotel
is a dynamic website that uses a database.
Database Components
Hardware
The hardware consists of physical, electronic devices like computers, I/O devices,
storage devices, etc. This offers the interface between computers and real-world
systems.

Software
This is a set of programs used to manage and control the overall database. This includes
the database software itself, the Operating System, the network software used to share
the data among users, and the application programs for accessing data in the database.

Data
Data is raw, unorganized fact that must be processed to become
meaningful. However simple it is, data is of little use until it is organized.
Generally, data comprises facts, observations, perceptions, numbers, characters,
symbols, images, etc.
Procedure
Procedures are the instructions and rules that help you use the
DBMS. They cover designing and running the database using documented
methods, and they guide the users who operate and manage
it.

Database Access Language
A database access language is used to read data from and write data to the
database: to enter new data, update existing data, or retrieve
required data from the DBMS. The user writes specific commands in a
database access language and submits them to the database.
Types of Databases
• Distributed databases
• Relational databases
• Centralized database
• Cloud databases
• NoSQL databases
• Data warehouses
Distributed databases
A distributed database combines data held in a common database with
information captured by local computers. In this type of database system,
the data is not kept in one place but is distributed across the various
sites of an organization.
• Data is distributed among different database systems of an
organization.
• These database systems are connected via communication links, which
help end users access the data easily.
Examples of distributed databases are Apache Cassandra, HBase,
Ignite, etc.
Distributed databases
• We can further divide distributed database
systems into:
• Homogeneous DDB: database systems that
run on the same operating system, use the same
application procedures, and run on the same
hardware.
• Heterogeneous DDB: database systems that
run on different operating systems, under
different application procedures, and on
different hardware.
Advantages of Distributed Database
• Modular development is possible in a distributed database, i.e., the
system can be expanded by including new computers and connecting
them to the distributed system.
• One server failure will not affect the entire data set.
Relational databases
This type of database defines data relationships in the form of tables. It is also
called a relational DBMS (RDBMS), the most popular DBMS type in the market.
It is based on the relational data model, which stores data as rows (tuples)
and columns (attributes) that together form a table (relation).
• A relational database uses SQL for storing, manipulating, and maintaining
the data.
• E. F. Codd proposed the relational model in 1970.
• Each table has a key that uniquely identifies each of its rows.
Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
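A minimal sketch of these ideas, using Python's built-in sqlite3 module as a stand-in for any relational database. The student table, its columns, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; the "student" table and its columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"   # the key: uniquely identifies each row
    " name TEXT NOT NULL,"
    " city TEXT)"
)
conn.executemany(
    "INSERT INTO student (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Lahore"), (2, "Bilal", "Karachi")],
)
conn.commit()

# Each row comes back as a tuple; each column is an attribute.
rows = conn.execute("SELECT id, name, city FROM student ORDER BY id").fetchall()
print(rows)

# The key rejects a second row that reuses an existing id.
try:
    conn.execute("INSERT INTO student (id, name, city) VALUES (1, 'Dup', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The same SQL statements would work, with minor dialect differences, on the server-based systems named above.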
Properties of Relational Database
Transactions in a relational database have four commonly known properties, called the ACID
properties, where:
1. A means Atomicity: A data operation either completes entirely or has no effect at all.
It follows the 'all or nothing' strategy. For example, a transaction will either be
committed or will abort.
2. C means Consistency: An operation must take the data from one valid state to another.
For example, when money moves between accounts, the balances before and after the
transaction should be correct, i.e., the total should remain conserved.
3. I means Isolation: Many users can access data in the database at the same time,
so concurrent operations must be isolated from one another. For example, when multiple
transactions occur at the same time, the effects of one transaction should not be visible to the other
transactions until it commits.
4. D means Durability: Once an operation completes and its data is committed, the
changes remain permanent.
Centralized database
A centralized database stores data at a single, central location, and users from
different backgrounds can access this data. This type of database stores application
procedures that help users access the data even from a remote location.
Users can access the stored data from different locations
through several applications.
These applications include an authentication process to let users access
data securely.
An example of a centralized database is a central library that holds a
central database of each library in a college/university.
Advantages of Centralized Database
• It reduces the risk in data management, i.e., manipulation of data
will not affect the core data.
• Data consistency is maintained as it manages data in a central repository.
• It provides better data quality, which enables organizations to establish data
standards.
• It is less costly because fewer vendors are required to handle the data sets.
Disadvantages of Centralized Database
• The size of the centralized database is large, which increases the response
time for fetching the data.
• It is not easy to update such an extensive database system.
• If the server fails, the entire data set may be lost, which could be a huge
loss.
Cloud databases
A cloud database stores data in a virtual environment and runs on
a cloud computing platform. It provides users with various cloud computing
services (SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous
cloud platforms; some well-known options are:
• Amazon Web Services (AWS)
• Microsoft Azure
• Kamatera
• phoenixNAP
• ScienceSoft
• Google Cloud SQL, etc.
A cloud database has many advantages, one of which is paying only for the
storage capacity and bandwidth used. It also offers on-demand scalability, along with
high availability.
NoSQL databases
A NoSQL database is used for large sets of distributed data. Some
big data performance problems are not handled effectively by
relational databases, and NoSQL databases are designed to address them. This
type of database is very efficient at analyzing large unstructured data.
NoSQL (Non-SQL, or Not Only SQL) is a type of database used for storing a
wide range of data sets. It is not a relational database, as it stores data
not only in tabular form but in several different ways. It came into
existence when the demand for building modern applications increased.
Thus, NoSQL presented a wide variety of database technologies in
response to the demands.
NoSQL databases
We can further divide NoSQL databases into the following four types:

1. Key-value storage: The simplest type of database storage, in which
every item is stored as a key (or attribute name) together with its value.
2. Document-oriented database: A type of database that stores data as JSON-like
documents. It helps developers store data in the same document-model
format as used in the application code.
3. Graph databases: Used for storing vast amounts of data in a graph-like
structure. Most commonly, social networking websites use graph databases.
4. Wide-column stores: Similar to the data represented in relational databases,
except that data is stored together in large columns instead of in rows.
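The four types can be pictured with plain Python data structures. This is only a sketch of the data shapes, not of any real NoSQL product. The users "u1" and "u2" and all field names are made up for illustration.

```python
# Hypothetical users "u1" and "u2" represented in four NoSQL styles
# using plain Python structures (shapes only, no real database involved).

# 1. Key-value: a value looked up by a single key.
key_value = {"user:u1:name": "Asha", "user:u1:city": "Lahore"}

# 2. Document: one JSON-like document per item; fields may vary per document.
documents = {
    "u1": {"name": "Asha", "city": "Lahore", "tags": ["admin"]},
    "u2": {"name": "Bilal"},  # no city or tags: no fixed structure is required
}

# 3. Graph: nodes plus edges (relationships), here "follows" edges as adjacency lists.
follows = {"u1": ["u2"], "u2": []}

# 4. Wide-column: values grouped per column, keyed by row id.
wide_column = {
    "name": {"u1": "Asha", "u2": "Bilal"},
    "city": {"u1": "Lahore"},
}

name_by_key = key_value["user:u1:name"]
cities = [doc.get("city") for doc in documents.values()]
followers_of_u2 = [src for src, dsts in follows.items() if "u2" in dsts]
all_names = sorted(wide_column["name"].values())
print(name_by_key, cities, followers_of_u2, all_names)
```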
Advantages of NoSQL Database
• It enables good productivity in the application development as it is
not required to store data in a structured format.
• It is a better option for managing and handling large data sets.
• It provides high scalability.
• Users can quickly access data from the database through key-value.
Data warehouses
The purpose of a data warehouse is to provide a single version of the truth for a company's
decision making and forecasting. A data warehouse is an information system that contains
historical and cumulative data from single or multiple sources. The data warehouse
concept simplifies the reporting and analysis process of the organization.

Data Warehousing (DW) is a process for collecting and managing data from varied
sources to provide meaningful business insights.

A data warehouse is typically used to connect and analyze business data from
heterogeneous sources. The data warehouse is the core of the BI system, which is
built for data analysis and reporting.
How Does a Data Warehouse Work?
A Data Warehouse works as a central repository where information arrives from one or more data
sources. Data flows into a data warehouse from the transactional system and other relational
databases.
Data may be:
I. Structured
II. Semi-structured
III. Unstructured data
• The data is processed, transformed, and ingested so that users can access the processed data in
the Data Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data
warehouse merges information coming from different sources into one comprehensive database.
• By merging all of this information in one place, an organization can analyze its customers more
holistically. This helps to ensure that it has considered all the information available. Data
warehousing makes data mining possible. Data mining is looking for patterns in the data that may
lead to higher sales and profits.
What Is a Data Warehouse Used For?
Airline:
In the airline system, it is used for operational purposes such as crew assignment, analysis of route profitability, frequent
flyer program promotions, etc.

Banking:
It is widely used in the banking sector to manage the available resources effectively. Some banks also use it for
market research and for performance analysis of products and operations.

Healthcare:
The healthcare sector also uses data warehouses to strategize and predict outcomes, generate patients' treatment reports,
share data with tie-in insurance companies, medical aid services, etc.

Public sector:
In the public sector, a data warehouse is used for intelligence gathering. It helps government agencies maintain and
analyze tax records and health policy records for every individual.

Investment and Insurance sector:
In this sector, warehouses are primarily used to analyze data
patterns and customer trends and to track market movements.

Telecommunication:
A data warehouse is used in this sector for product promotions, sales
decisions, and distribution decisions.

Hospitality Industry:
This industry uses warehouse services to design and evaluate
its advertising and promotion campaigns, targeting
clients based on their feedback and travel patterns.
Database Vs Data Structures
Database Management System
A database management system (DBMS) is software used to manage
a database. For example, MySQL and Oracle are very popular
commercial databases used in many different applications. A DBMS
provides an interface for various operations such as creating a
database, storing data in it, updating data, creating tables in the
database, and a lot more. It provides protection and security for the
database. In the case of multiple users, it also maintains data
consistency.
DBMS
A DBMS allows users to perform the following tasks:
• Data Definition: creating, modifying, and removing the definitions
that describe the organization of data in the database.
• Data Update: inserting, modifying, and deleting the
actual data in the database.
• Data Retrieval: retrieving data from the database so that it can be
used by applications for various purposes.
• User Administration: registering and monitoring users, maintaining
data integrity, enforcing data security, dealing with concurrency control,
monitoring performance, and recovering information corrupted by
unexpected failure.
DBMS
Characteristics of DBMS
1. It uses a digital repository established on a server to store and manage the
information.
2. It can provide a clear and logical view of the process that manipulates data.
3. It contains automatic backup and recovery procedures.
4. It supports the ACID properties, which keep data in a healthy state in case of failure.
5. It can reduce the complexity of relationships between data.
6. It supports manipulation and processing of data.
7. It provides security for data.
8. It can present the database from different viewpoints according to the requirements of
the user.
ACID Properties in DBMS
A DBMS must keep data intact whenever changes are made to it, because if
the integrity of the data is compromised, the whole data set can become
corrupted. To maintain the integrity of the data, the database management
system defines four properties, known as the ACID properties. The ACID
properties apply to a transaction, which is a group of tasks carried out
as a single unit of work.
Atomicity
Atomicity means that an operation on the data is indivisible: it is either
executed completely or not executed at all. The operation must not stop partway or
execute partially. For a transaction, this means that either all of its operations
take effect or none of them do.
Example: Remo has account A with $30 and wishes to send
$10 to Sheero's account B, which already holds $100. Once the
$10 is transferred, account B should hold $110. Two
operations take place: $10 is debited from account A, and the same
amount is credited to account B. Now suppose the debit executes
successfully but the credit fails. Account A then holds $20,
while account B still holds $100, as it did before.
Atomicity

After $10 has been debited from account A, the amount in account B is
still $100, so the $10 has been lost. It is not an atomic transaction.

In banking systems, losing atomicity in this way is a
huge issue, which is why atomicity is a main focus there.
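The transfer above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a banking implementation. The account table, the transfer helper, and the fail_before_credit switch are hypothetical names introduced here to simulate the failed credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()


def transfer(conn, src, dst, amount, fail_before_credit=False):
    """Debit src and credit dst as one transaction (hypothetical helper)."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_before_credit:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


try:
    transfer(conn, "A", "B", 10, fail_before_credit=True)
except RuntimeError:
    pass
after_failure = balances(conn)   # the debit was rolled back along with the failed credit

transfer(conn, "A", "B", 10)
after_success = balances(conn)
print(after_failure, after_success)
```

Because the debit and the credit run inside one transaction, the failed attempt leaves both balances unchanged instead of losing the $10.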
Consistency
Consistency means that the data must always remain valid.
In a DBMS, the integrity of the data must be maintained, which
means that any change takes the database from one correct state to
another. For transactions, data integrity is
essential so that the database is consistent both before and after the
transaction. The data should always be correct.
Consistency
There are three accounts, A, B, and C, and A makes
a transaction T to B and then to C. Each transfer consists of
two operations, a debit and a credit. Account
A first transfers $50 to account B; the balance of
account A read before this transfer is $300. After
the transfer succeeds, the balance of B
becomes $150. Next, A transfers $20 to account C; this
time, the balance of A that is read is $250 (which is correct, because the
debit of $50 to B has already been applied). The debit
and credit from account A to C also complete
successfully. The transaction completes
successfully and every value is read correctly, so
the data is consistent. If instead both transfers had read
the balance of A as $300, the data would be inconsistent, because
the second read would not reflect the first
debit.
Isolation
The term 'isolation' means separation. In a DBMS, isolation is the property that
operations running concurrently do not affect one another. In effect, the result
should be the same as if one operation began only after the other had
completed. If two operations are performed on two
different data items, they must not affect each other's values. When two or more
transactions occur simultaneously, consistency
must still be maintained. Changes made within a transaction are
not seen by other transactions until those changes are committed.
• Example: If two operations run concurrently on two different accounts,
neither should affect the value of the other account. The values should remain
correct. For instance, if account A makes transactions T1 and T2
to accounts B and C, both execute independently without
affecting each other. This is known as isolation.
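A small sketch of the visibility rule, using two sqlite3 connections to the same file as stand-ins for two concurrent users. The file name bank.db and the account table are assumptions. The behaviour shown is SQLite's default and other systems offer several isolation levels.

```python
import os
import shutil
import sqlite3
import tempfile

# A file-backed database so that two separate connections share the same data.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "bank.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO account VALUES ('B', 100)")
writer.commit()

reader = sqlite3.connect(path)


def read_balance(conn):
    rows = conn.execute("SELECT balance FROM account WHERE name = 'B'").fetchall()
    return rows[0][0]


# The writer changes B inside a transaction that is not yet committed.
writer.execute("UPDATE account SET balance = balance + 10 WHERE name = 'B'")
seen_before_commit = read_balance(reader)   # the other connection still sees the old value

writer.commit()
seen_after_commit = read_balance(reader)    # the change is visible once committed

reader.close()
writer.close()
shutil.rmtree(tmpdir)
print(seen_before_commit, seen_after_commit)
```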
Durability
Durability means permanence. In a DBMS,
durability ensures that once an operation has executed successfully,
its data becomes permanent in the database. The data
must be durable enough that even if the system fails or
crashes, the database survives. If data is nevertheless lost, it is the
responsibility of the recovery manager to restore the
database. To make changes permanent, the COMMIT command must be
used every time we make changes.
Therefore, the ACID properties of a DBMS play a vital role in maintaining
the consistency and availability of data in the database.
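The role of COMMIT can be sketched with sqlite3 and a database file. Closing and reopening the connection stands in for an application restart. It does not simulate a real crash, and the note table and file name are assumptions for illustration.

```python
import os
import shutil
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "durable.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO note (body) VALUES ('committed')")
conn.commit()                                   # made permanent
conn.execute("INSERT INTO note (body) VALUES ('never committed')")
conn.close()                                    # closing without COMMIT discards pending changes

# A fresh connection stands in for a restart of the application.
conn = sqlite3.connect(path)
survivors = [body for (body,) in conn.execute("SELECT body FROM note ORDER BY id")]
conn.close()
shutil.rmtree(tmpdir)
print(survivors)
```

Only the row that was committed is still present after reopening the file.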
Advantages of DBMS
• Controls database redundancy: It can control data redundancy because it stores all
the data in one single database file.
• Data sharing: In a DBMS, the authorized users of an organization can share the data
among multiple users.
• Easy maintenance: It is easy to maintain due to the centralized nature of
the database system.
• Reduced time: It reduces development time and maintenance needs.
• Backup: It provides backup and recovery subsystems, which automatically back up
data to protect against hardware and software failures and restore the data if required.
• Multiple user interfaces: It provides different types of user interfaces, such as graphical
user interfaces and application program interfaces.
Disadvantages of DBMS
• Cost of hardware and software: It requires a high-speed
processor and large memory to run DBMS software.
• Size: It occupies a large amount of disk space and memory to run
efficiently.
• Complexity: A database system creates additional complexity and
requirements.
• Higher impact of failure: A failure has a high impact
because, in most organizations, all the data is stored in a single
database. If the database is damaged by an electrical failure or
database corruption, the data may be lost forever.
Three schema Architecture
• The three schema architecture is also called ANSI/SPARC architecture or three-level architecture.
• This framework is used to describe the structure of a specific database system.
• The three schema architecture is also used to separate the user applications and physical database.
• The three schema architecture contains three levels. It breaks the database down into three different
categories.
• It shows the DBMS architecture.
• Mapping is used to transform requests and responses between the various levels of the architecture.
• Mapping is not good for a small DBMS because it takes more time.
• In External / Conceptual mapping, the request is transformed from the external level to the
conceptual schema.
• In Conceptual / Internal mapping, the DBMS transforms the request from the conceptual to the internal level.
Three schema Architecture
1. Internal Level
• The internal level has an internal schema which describes the physical storage structure of the database.
• The internal schema is also known as a physical schema.
• It uses the physical data model. It defines how the data will be stored in blocks.
• The physical level is used to describe complex low-level data structures in detail.

2. Conceptual Level
• The conceptual schema describes the design of a database at the conceptual level. Conceptual level is also
known as logical level.
• The conceptual schema describes the structure of the whole database.
• The conceptual level describes what data are to be stored in the database and also describes what relationship
exists among those data.
• In the conceptual level, internal details such as an implementation of the data structure are hidden.
• Programmers and database administrators work at this level.
Three schema Architecture
3. External Level
• At the external level, a database contains several schemas, which
are sometimes called subschemas. A subschema describes a
different view of the database.
• An external schema is also known as a view schema.
• Each view schema describes the part of the database that a particular user
group is interested in and hides the rest of the database from that user
group.
• The view schema describes the end user's interaction with database
systems.
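In SQL terms, an external schema is commonly realized with views. The sketch below uses sqlite3. The employee table and the employee_directory view are assumed names, and hiding the salary column is just one example of what a view might hide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full logical table (names are illustrative).
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)", [(1, "Asha", 5000), (2, "Bilal", 4200)]
)
# External level: a view exposing only what one user group needs.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Users of the view see only id and name. The salary column stays hidden from them.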
Data Independence
Data independence can be explained using the three-schema architecture.
Data independence refers to the ability to modify the schema at one level of the
database system without altering the schema at the next higher level.
There are two types of data independence:

1. Logical Data Independence
• Logical data independence refers to the ability to change the conceptual
schema without having to change the external schema.
• Logical data independence is used to separate the external level from the conceptual view.
• If we make any changes in the conceptual view of the data, the user view of the data is
not affected.
• Logical data independence occurs at the user interface level.
Data Independence
2. Physical Data Independence
Physical data independence can be defined as the capacity to change the internal schema without having to
change the conceptual schema. If we make any changes in the storage size of the database system server,
the conceptual structure of the database is not affected. Physical data independence is used to separate the
conceptual level from the internal level. Physical data independence occurs at the logical interface level.
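Both kinds of independence can be sketched with sqlite3. The table, view, and index names below are assumptions. Adding an index stands in for a change at the internal level, and adding a column stands in for a change at the conceptual level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", "Lahore"), (2, "Bilal", "Karachi")],
)
conn.execute("CREATE VIEW staff_names AS SELECT id, name FROM employee")

table_query = "SELECT name FROM employee WHERE city = 'Lahore'"
view_query = "SELECT id, name FROM staff_names ORDER BY id"
table_before = conn.execute(table_query).fetchall()
view_before = conn.execute(view_query).fetchall()

# Physical change: add an index (affects storage and access paths only).
conn.execute("CREATE INDEX idx_employee_city ON employee (city)")
table_after_index = conn.execute(table_query).fetchall()

# Logical change: add a column to the table; the view definition is untouched.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
view_after_alter = conn.execute(view_query).fetchall()
print(table_after_index, view_after_alter)
```

The query against the table returns the same rows before and after the index is added, and the view returns the same rows before and after the new column is added.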
Database Language
A DBMS has appropriate languages and interfaces to express database
queries and updates. Database languages can be used to read, store
and update the data in the database.
Data Definition Language
DDL stands for Data Definition Language. It is used to define the structure, or schema, of a database. It is used to
create schemas, tables, indexes, constraints, etc. in the database. Using DDL statements, you can create
the skeleton of the database. DDL is also used to store metadata such as the
number of tables and schemas, their names, indexes, columns in each table, constraints, etc.
Here are some tasks that come under DDL:
1. Create: It is used to create objects in the database.
2. Alter: It is used to alter the structure of the database.
3. Drop: It is used to delete objects from the database.
4. Truncate: It is used to remove all records from a table.
5. Rename: It is used to rename an object.
6. Comment: It is used to comment on the data dictionary.
These commands update the database schema, which is why they come under Data Definition
Language.
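A short sketch of some of these DDL tasks through sqlite3. The table names are made up, and the two helper functions are hypothetical. SQLite has no TRUNCATE or COMMENT statement, so those tasks are left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")   # Create
conn.execute("ALTER TABLE student ADD COLUMN grade TEXT")                   # Alter
columns_after_alter = column_names(conn, "student")
conn.execute("ALTER TABLE student RENAME TO pupil")                         # Rename
tables_after_rename = table_names(conn)
conn.execute("DROP TABLE pupil")                                            # Drop
tables_after_drop = table_names(conn)
print(columns_after_alter, tables_after_rename, tables_after_drop)
```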
Data Manipulation Language

DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a
database. It handles user requests.
Here are some tasks that come under DML:
1. Select: It is used to retrieve data from a database.
2. Insert: It is used to insert data into a table.
3. Update: It is used to update existing data within a table.
4. Delete: It is used to delete records from a table.
5. Merge: It performs an UPSERT operation, i.e., insert or update.
6. Call: It is used to call a PL/SQL or Java subprogram.
7. Explain Plan: It shows the execution plan the database would use for a statement.
8. Lock Table: It controls concurrency.
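The core DML tasks can be sketched with sqlite3. The student table is an assumption. SQLite has no MERGE statement, so the upsert is written with INSERT ... ON CONFLICT, which assumes SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

conn.execute("INSERT INTO student VALUES (1, 'Asha', 'B')")      # Insert
conn.execute("INSERT INTO student VALUES (2, 'Bilal', 'C')")
conn.execute("UPDATE student SET grade = 'A' WHERE id = 1")      # Update
conn.execute("DELETE FROM student WHERE id = 2")                 # Delete (only matching rows)

# Upsert (insert or update); SQLite spells this INSERT ... ON CONFLICT rather than MERGE.
upsert = (
    "INSERT INTO student (id, name, grade) VALUES (?, ?, ?) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name, grade = excluded.grade"
)
conn.execute(upsert, (1, "Asha", "A+"))    # id 1 exists, so the row is updated
conn.execute(upsert, (3, "Chen", "B"))     # id 3 is new, so a row is inserted

students = conn.execute("SELECT id, name, grade FROM student ORDER BY id").fetchall()  # Select
print(students)
```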
Data Control Language
DCL stands for Data Control Language. It is used to control access to the
data stored in the database. The DCL execution is transactional. It also has rollback
parameters. (But in Oracle database, the execution of data control
language does not have the feature of rolling back.)
Here are some tasks that come under DCL:
• Grant: It is used to give user access privileges to a database.
• Revoke: It is used to take back permissions from the user.
The following privileges can be granted or
revoked: CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and
SELECT.
Transaction Control Language
TCL is used to manage the changes made by DML statements. It allows
DML statements to be grouped into logical transactions.

Here are some tasks that come under TCL:

• Commit: It is used to save the transaction to the database.
• Rollback: It is used to restore the database to its state as of the last
Commit.
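A minimal sketch of Commit and Rollback with sqlite3. The connection's commit() and rollback() methods issue the corresponding SQL commands, and the grade table is an assumed example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (student TEXT PRIMARY KEY, mark INTEGER)")

conn.execute("INSERT INTO grade VALUES ('Asha', 80)")
conn.commit()                                   # Commit: save the transaction

conn.execute("UPDATE grade SET mark = 0 WHERE student = 'Asha'")
conn.execute("INSERT INTO grade VALUES ('Bilal', 70)")
pending = conn.execute("SELECT student, mark FROM grade ORDER BY student").fetchall()

conn.rollback()                                 # Rollback: back to the last Commit
restored = conn.execute("SELECT student, mark FROM grade ORDER BY student").fetchall()
print(pending, restored)
```

Both uncommitted statements are undone together, which is what grouping DML statements into one logical transaction means in practice.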
Keys
Keys play an important role in the relational database.
A key is used to uniquely identify any record or row of data in a table. It is also used to establish and identify
relationships between tables.
• For example: In the STUDENT table, ID is used as a key because it is unique for each student. In the PERSON table,
passport_number, license_number, and SSN are keys since they are unique for each person.
Types of key:
1. Primary key
It is the key chosen to identify one and only one instance of an entity uniquely. An entity can have
multiple keys, as we saw in the PERSON table. The most suitable of those keys becomes the primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. We
could even select License_Number or Passport_Number as the primary key, since they are also unique.
For each entity, the selection of the primary key is based on the requirements and the developers' choice.
2. Candidate key
• A candidate key is an attribute or minimal set of attributes that can uniquely identify a tuple.
• The primary key is chosen from the candidate keys. The remaining candidate keys are as
strong as the primary key.
• For example: In the EMPLOYEE table, ID is best suited for the primary key. The other unique attributes, such as SSN,
Passport_Number, and License_Number, are also candidate keys.
3. Super Key
• A super key is a set of attributes that can uniquely identify a tuple.
Every super key is a superset of a candidate key.
• For example: In the above EMPLOYEE table, for (EMPLOYEE_ID,
EMPLOYEE_NAME), the names of two employees can be the same, but
their EMPLOYEE_ID can't be the same. Hence, this combination can also
be a key.
• The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID,
EMPLOYEE_NAME), etc.
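The difference between super keys and candidate keys can be checked on sample data with a few lines of Python. The rows and the two helper functions below are hypothetical. Note that uniqueness in one sample only suggests a key; real keys are declared by the database designer.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows; two employees share a name.
rows = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Asha", "SSN": "111"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Asha", "SSN": "222"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Bilal", "SSN": "333"},
]


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: super keys that contain no smaller super key."""
    attributes = sorted(rows[0])
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_super_key(rows, attrs) and not any(set(k) <= set(attrs) for k in found):
                found.append(attrs)
    return found


print(is_super_key(rows, ("EMPLOYEE_ID", "EMPLOYEE_NAME")))
print(is_super_key(rows, ("EMPLOYEE_NAME",)))
print(candidate_keys(rows))
```

Here (EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because EMPLOYEE_ID alone already identifies each row.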
4. Foreign key
• A foreign key is a column of a table that points to the
primary key of another table.
• In a company, every employee works in a specific department, and
employee and department are two different entities. So we don't
store the department's information in the employee table. Instead,
we link the two tables through the primary key of one table.
• We add the primary key of the DEPARTMENT table, Department_Id, as
a new attribute in the EMPLOYEE table.
• Now in the EMPLOYEE table, Department_Id is the foreign key, and
the two tables are related.
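The EMPLOYEE and DEPARTMENT link can be sketched with sqlite3. Department_Id comes from the text above. The other column names and the sample rows are assumptions. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (Department_Id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " Department_Id INTEGER REFERENCES department (Department_Id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")       # department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Bilal', 99)")  # there is no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets the two tables be joined.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee AS e"
    " JOIN department AS d ON d.Department_Id = e.Department_Id"
).fetchall()
print(orphan_rejected, joined)
```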
Data Models
A data model is the modeling of the data description, data semantics,
and consistency constraints of the data. It provides the conceptual tools
for describing the design of a database at each level of data abstraction.
The following four data models are used for understanding
the structure of the database:
1) Relational Data Model: This type of model designs the data in the form of rows and columns
within a table. Thus, a relational model uses tables for representing data and in-between
relationships. Tables are also called relations. This model was initially described by Edgar F. Codd, in
1969. The relational data model is the widely used model which is primarily used by commercial data
processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and
relationships among them. These objects are known as entities, and relationship is an association
among these entities. This model was designed by Peter Chen and published in 1976 papers. It was
widely used in database designing. A set of attributes describe the entities. For example,
student_name, student_id describes the 'student' entity. A set of the same type of entities is known
as an 'Entity set', and the set of the same type of relationships is known as 'relationship set'.
3) Object-based Data Model: An extension of the ER model with the notions of functions, encapsulation,
and object identity. This model supports a rich type system that includes structured and
collection types. In the 1980s, various database systems following the object-oriented approach
were developed. Here, an object is the data together with its properties.
4) Semistructured Data Model: This data model differs from the other three data models
explained above. The semistructured data model allows data specifications in which
individual data items of the same type may have different sets of attributes. The Extensible Markup
Language, also known as XML, is widely used for representing semistructured data. Although
XML was initially designed for adding markup information to text documents, it gained
importance because of its application in the exchange of data.
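To make the semistructured idea concrete, the sketch below parses a small XML fragment in which two items of the same type carry different attribute sets. The element names (beyond student_name and student_id) and the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two <student> items of the same type with different sets of attributes.
doc = """
<students>
  <student>
    <student_id>1</student_id>
    <student_name>Asha</student_name>
    <phone>555-0101</phone>
  </student>
  <student>
    <student_id>2</student_id>
    <student_name>Ravi</student_name>
    <email>ravi@example.com</email>
    <hobby>chess</hobby>
  </student>
</students>
""".strip()

root = ET.fromstring(doc)

# Collect the attribute names that each student actually carries.
attribute_sets = [
    sorted(child.tag for child in student)
    for student in root.findall("student")
]

for tags in attribute_sets:
    print(tags)
```

A relational table would force both students into the same columns; here each item simply lists the attributes it has.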
Data model Schema and Instance
• The data which is stored in the database at a particular moment of time is called an instance of the
database.
• The overall design of a database is called the schema.
• A database schema is the skeleton structure of the database. It represents the logical view of the entire
database.
• A schema contains schema objects such as tables, foreign keys, primary keys, views, columns, data types,
stored procedures, etc.
• A database schema can be represented using a visual diagram. That diagram shows the database
objects and their relationships with each other.
• A database schema is designed by the database designers to help programmers whose software will
interact with the database. The process of database creation is called data modeling.
A schema diagram can display only some aspects of a schema, such as the names of record types, data types,
and constraints. Other aspects can't be specified through the schema diagram. For example, the given
figure shows neither the data type of each data item nor the relationships among the various files.
In the database, actual data changes quite frequently. For example, in the given figure, the database changes
whenever we add a new grade or add a student. The data at a particular moment of time is called the instance
of the database.
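The difference between schema and instance can be sketched with sqlite3. The STUDENT table, its columns, and the helper functions schema() and instance() are assumptions made for this example; sqlite_master is SQLite's own catalog table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_Id INTEGER PRIMARY KEY, Student_Name TEXT)")


def schema():
    """The design of the database: the stored CREATE TABLE statements."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


def instance():
    """The data stored in STUDENT at this particular moment."""
    return conn.execute("SELECT * FROM STUDENT ORDER BY Student_Id").fetchall()


schema_before, instance_before = schema(), instance()

# Adding a student changes the instance but not the schema.
conn.execute("INSERT INTO STUDENT VALUES (1, 'Asha')")

schema_after, instance_after = schema(), instance()
```

After the insert, the schema is unchanged while the instance has gone from no rows to one row.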
ER Model
• ER model stands for Entity-Relationship model. It is a high-level data
model. This model is used to define the data elements and relationships for a
specified system.
• It develops a conceptual design for the database. It also gives a
simple, easy-to-design view of the data.
• In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.
For example, suppose we design a school database. In this database, the student
will be an entity with attributes like address, name, id, age, etc. The address
can be another entity with attributes like city, street name, pin code, etc., and
there will be a relationship between them.
Components of ER Diagram
1. Entity:
• An entity may be any object, class, person or place. In the ER diagram,
an entity is represented as a rectangle.
• In an organization, for example, manager, product, employee,
department, etc. can each be taken as an entity.
Weak Entity
• An entity that depends on another entity is called a weak entity. The
weak entity doesn't contain any key attribute of its own. The weak
entity is represented by a double rectangle.
2. Attribute
• An attribute describes a property of an entity. An ellipse is
used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a
student.
a. Key Attribute
The key attribute represents the main characteristic of an entity. It represents a primary key. The key
attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of several other attributes is known as a composite attribute. The composite
attribute is represented by an ellipse, and the ellipses of its component attributes are connected to that ellipse.
c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute.
A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is
represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute such as date of
birth.
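The four kinds of attribute can be sketched as fields of a Python dataclass. This is only an analogy under stated assumptions: the class names, field names, and sample values are invented, and the age calculation is one common way to derive age from a date of birth.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: made up of several component attributes."""
    city: str
    street: str
    pin_code: str


@dataclass
class Student:
    student_id: int                      # key attribute
    name: str                            # simple attribute
    date_of_birth: date                  # stored attribute
    address: Address                     # composite attribute
    phone_numbers: list[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)


student = Student(
    student_id=1,
    name="Asha",
    date_of_birth=date(2000, 6, 15),
    address=Address(city="Pune", street="Main St", pin_code="411001"),
    phone_numbers=["555-0101", "555-0102"],
)
```

Because age is computed on demand, it stays correct as time passes without any update to the stored data.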
3. Relationship
A relationship describes the relation between entities.
A diamond (rhombus) is used to represent a relationship.
Types of relationship are as follows:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
a. One-to-One Relationship
When each instance of one entity is associated with at most one instance of the
other entity, and vice versa, the relationship is known as a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship
When one instance of the entity on the left can be associated with more than one
instance of the entity on the right, the relationship is known as a one-to-many
relationship.
For example, a scientist can invent many inventions, but each invention is made by
one specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left can be associated with a
single instance of the entity on the right, the relationship is known as a
many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many
students.
d. Many-to-many relationship
When more than one instance of the entity on the left can be associated with more
than one instance of the entity on the right, the relationship is known as a
many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many
employees.
Notation of ER diagram
ER Design Issues
Here, we will discuss the basic design issues of an ER database schema
in the following points:
1. Use of Entity Set vs Attributes
2. Use of Entity Set vs. Relationship Sets
3. Use of Binary vs n-ary Relationship Sets
4. Placing Relationship Attributes
1) Use of Entity Set vs Attributes
Whether to use an entity set or an attribute depends on the structure of the
real-world enterprise being modelled and on the semantics associated with
its attributes. A common mistake is to use the primary key of one
entity set as an attribute of another entity set; a relationship should be
used instead. Likewise, the primary key attributes of the participating
entity sets are already implicit in a relationship set, so they should not
be designated again as attributes of that relationship set.
2) Use of Entity Set vs. Relationship Sets
It is not always clear whether an object is best expressed by an entity set
or a relationship set. One guideline is to designate a relationship set to
describe an action that occurs between entities. If an object is best
represented as a relationship set, it is better not to mix it with an
entity set.
3) Use of Binary vs n-ary Relationship Sets
Generally, the relationships described in databases are binary. However, a non-binary
relationship can be represented by several binary relationships. For example, a ternary
relationship 'parent' may relate a child to his father and his mother.
The same information can also be represented by two binary relationships, 'mother' and 'father', each
relating a parent to the child. Thus, it is possible to represent a non-binary relationship by a set of distinct
binary relationships.
4) Placing Relationship Attributes
The cardinality ratio of a relationship can effectively guide the placement of relationship attributes.
It is better to associate the attributes of one-to-one or one-to-many relationship sets with one of the
participating entity sets instead of the relationship set. The decision to place an attribute
on a relationship or on an entity should reflect the characteristics of the real-world enterprise that
is being modelled.
For example, if an attribute is determined by the combination of the participating entity
sets rather than by either entity alone, it must be associated with the
many-to-many relationship set.
• Thus, designing and modelling an ER diagram requires overall knowledge of each part involved.
The basic requirement is to analyse the real-world enterprise and how
each entity or attribute connects with the others.
Mapping Constraints
A mapping constraint is a data constraint that expresses the number of
entities to which another entity can be related via a relationship set.
• It is most useful in describing binary relationship sets, although it
can also help describe relationship sets that involve more than two
entity sets.
• For a binary relationship set R between entity sets E1 and E2, there are four
possible mapping cardinalities. These are as follows:
1. One to one (1:1)
2. One to many (1:M)
3. Many to one (M:1)
4. Many to many (M:M)
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one
entity in E2, and an entity in E2 is associated with at most one entity in
E1.
One-to-many
In one-to-many mapping, an entity in E1 is associated with any number
of entities in E2, and an entity in E2 is associated with at most one
entity in E1.
Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one
entity in E2, and an entity in E2 is associated with any number of
entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any
number of entities in E2, and an entity in E2 is associated with any
number of entities in E1.
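The four definitions above can be checked mechanically on a sample of relationship pairs. The function name mapping_cardinality and the sample data are assumptions for this sketch. Note that it only reports the cardinality that a given set of pairs exhibits; a mapping constraint itself is a rule declared in the design, not something inferred from the data.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a set of (e1, e2) pairs drawn from a binary relationship set."""
    e2_partners = defaultdict(set)  # for each E1 entity, the E2 entities it relates to
    e1_partners = defaultdict(set)  # for each E2 entity, the E1 entities it relates to
    for e1, e2 in pairs:
        e2_partners[e1].add(e2)
        e1_partners[e2].add(e1)

    # Does some E1 entity relate to more than one E2 entity, and vice versa?
    e1_has_many = any(len(s) > 1 for s in e2_partners.values())
    e2_has_many = any(len(s) > 1 for s in e1_partners.values())

    if e1_has_many and e2_has_many:
        return "M:M"
    if e1_has_many:
        return "1:M"
    if e2_has_many:
        return "M:1"
    return "1:1"


# Students (E1) enrolling in courses (E2): many students share one course.
enrolls = [("s1", "c1"), ("s2", "c1"), ("s3", "c2")]
# Employees (E1) working on projects (E2), in both directions.
works_on = [("e1", "p1"), ("e1", "p2"), ("e2", "p1")]

print(mapping_cardinality(enrolls))   # M:1
print(mapping_cardinality(works_on))  # M:M
```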
Convert ER into Table
The following rules convert an ER diagram into tables:
• An entity type becomes a table.
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE form individual tables.
• Each single-valued attribute becomes a column of the table.
In the STUDENT entity, STUDENT_NAME and STUDENT_ID form columns of the STUDENT table. Similarly, COURSE_NAME
and COURSE_ID form columns of the COURSE table, and so on.
• The key attribute of the entity type is represented by the primary key.
In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attributes of their entities.
• A multivalued attribute is represented by a separate table.
In the STUDENT table, hobby is a multivalued attribute, so it is not possible to represent its multiple values in a single column
of the STUDENT table. Hence we create a table STUD_HOBBY with the columns STUDENT_ID and HOBBY. Together, the two
columns form a composite key.
• A composite attribute is represented by its components.
In the given ER diagram, the student address is a composite attribute. It contains CITY, PIN, DOOR#, STREET, and STATE. In the
STUDENT table, each of these components becomes an individual column.
• Derived attributes are not stored in the table.
In the STUDENT table, Age is a derived attribute. It can be calculated at any point in time as the difference
between the current date and the Date of Birth.
• Using these rules, you can convert the ER diagram to tables and columns and assign the mapping between the tables.
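Applied to the STUDENT entity, the rules above might produce tables like the following sqlite3 sketch. The DOB column name, the column types, the sample rows, and the spelling DOOR_NO (used in place of DOOR#, which is not a plain SQL identifier) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Entity type -> table; key attribute -> primary key.
    CREATE TABLE STUDENT (
        STUDENT_ID   INTEGER PRIMARY KEY,
        STUDENT_NAME TEXT NOT NULL,      -- single-valued attribute -> column
        DOB          TEXT NOT NULL,      -- stored; Age is derived, so no AGE column
        DOOR_NO      TEXT,               -- composite address -> one column per component
        STREET       TEXT,
        CITY         TEXT,
        STATE        TEXT,
        PIN          TEXT
    );
    -- Multivalued attribute -> separate table with a composite key.
    CREATE TABLE STUD_HOBBY (
        STUDENT_ID INTEGER NOT NULL REFERENCES STUDENT(STUDENT_ID),
        HOBBY      TEXT NOT NULL,
        PRIMARY KEY (STUDENT_ID, HOBBY)
    );
""")

conn.execute(
    "INSERT INTO STUDENT VALUES (1, 'Asha', '2000-06-15', '12', 'Main St', 'Pune', 'MH', '411001')"
)
conn.executemany("INSERT INTO STUD_HOBBY VALUES (?, ?)", [(1, "chess"), (1, "music")])

# Column names of STUDENT, read from SQLite's table metadata.
student_columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]
hobbies = [
    hobby
    for (hobby,) in conn.execute(
        "SELECT HOBBY FROM STUD_HOBBY WHERE STUDENT_ID = 1 ORDER BY HOBBY"
    )
]
```

The composite key on STUD_HOBBY lets one student have several hobbies while preventing the same hobby from being recorded twice for that student.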
Relationship of higher degree
The degree of relationship can be defined as the number of occurrences in one entity that are associated
with the number of occurrences in another entity.
There are three degrees of relationship:
1. One-to-one (1:1)
2. One-to-many (1:M)
3. Many-to-many (M:N)
One-to-one
• In a one-to-one relationship, one occurrence of an entity relates to only one occurrence in another entity.
• A one-to-one relationship rarely exists in practice.
For example: if an employee is allocated a company car then that car can only be driven by that employee.
Therefore, employee and company car have a one-to-one relationship.
One-to-many
• In a one-to-many relationship, one occurrence in an entity relates to many occurrences in another entity.
For example: An employee works in one department, but a department has many employees.
• Therefore, department and employee have a one-to-many relationship.
Many-to-many
• In a many-to-many relationship, many occurrences in an entity relate to many occurrences in another entity.
• Like the one-to-one relationship, the many-to-many relationship rarely exists in practice.
For example: At the same time, an employee can work on several projects, and a project has a team of many
employees.
• Therefore, employee and project have a many-to-many relationship.
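One common way to store a many-to-many relationship in tables is a third, linking table that holds one row per employee-project pair. The sketch below uses sqlite3; the table name WORKS_ON, all column names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Employee_Id INTEGER PRIMARY KEY, Employee_Name TEXT);
    CREATE TABLE PROJECT  (Project_Id  INTEGER PRIMARY KEY, Project_Name  TEXT);
    -- Linking table: one row per (employee, project) pair.
    CREATE TABLE WORKS_ON (
        Employee_Id INTEGER REFERENCES EMPLOYEE(Employee_Id),
        Project_Id  INTEGER REFERENCES PROJECT(Project_Id),
        PRIMARY KEY (Employee_Id, Project_Id)
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO PROJECT  VALUES (100, 'Billing'), (200, 'Search');
    INSERT INTO WORKS_ON VALUES (1, 100), (1, 200), (2, 100);
""")

# One employee works on several projects...
projects_of_asha = [
    name
    for (name,) in conn.execute("""
        SELECT p.Project_Name FROM PROJECT p
        JOIN WORKS_ON w ON w.Project_Id = p.Project_Id
        WHERE w.Employee_Id = 1 ORDER BY p.Project_Name
    """)
]

# ...and one project has a team of several employees.
team_of_billing = [
    name
    for (name,) in conn.execute("""
        SELECT e.Employee_Name FROM EMPLOYEE e
        JOIN WORKS_ON w ON w.Employee_Id = e.Employee_Id
        WHERE w.Project_Id = 100 ORDER BY e.Employee_Name
    """)
]
```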
Generalization
• Generalization is a bottom-up approach in which two or more lower-level entities combine to form a higher-level
entity if they have some attributes in common.
• In generalization, a higher-level entity can also combine with lower-level entities to form a still
higher-level entity.
• Generalization resembles a subclass and superclass system; the only difference is the approach. Generalization uses the
bottom-up approach.
• In generalization, entities are combined to form a more generalized entity, i.e., subclasses are combined to make a superclass.
For example, the Faculty and Student entities can be generalized to create a higher-level entity, Person.
Specialization
• Specialization is a top-down approach, the opposite of generalization. In specialization, one higher-level entity is
broken down into two or more lower-level entities.
• Specialization is used to identify a subset of an entity set that shares some distinguishing characteristics.
• Normally, the superclass is defined first, the subclasses and their related attributes are defined next, and the relationship sets are
then added.
For example: in an employee management system, the EMPLOYEE entity can be specialized as TESTER or DEVELOPER
based on the role they play in the company.
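The superclass and subclass structure of this example maps naturally onto class inheritance. The sketch below is an analogy only: the attribute names (employee_id, name, test_tool, language) and sample values are invented, and the order of definition follows the top-down approach described above.

```python
class Employee:
    """Superclass, defined first: attributes shared by every employee."""

    def __init__(self, employee_id, name):
        self.employee_id = employee_id
        self.name = name


class Tester(Employee):
    """Subclass: inherits the shared attributes and adds its own."""

    def __init__(self, employee_id, name, test_tool):
        super().__init__(employee_id, name)
        self.test_tool = test_tool   # attribute specific to testers (assumed)


class Developer(Employee):
    """Subclass: inherits the shared attributes and adds its own."""

    def __init__(self, employee_id, name, language):
        super().__init__(employee_id, name)
        self.language = language     # attribute specific to developers (assumed)


staff = [Tester(1, "Asha", "pytest"), Developer(2, "Ravi", "Python")]

# Every specialized entity is still an EMPLOYEE.
all_are_employees = all(isinstance(person, Employee) for person in staff)
```

Read bottom-up, the same hierarchy illustrates generalization: Tester and Developer share employee_id and name, which are factored out into Employee.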
Aggregation
In aggregation, the relationship between two entities is treated as a single entity: a relationship together with
its corresponding entities is aggregated into a higher-level entity.
For example: the Center entity offering the Course entity acts as a single entity that is in a
relationship with another entity, Visitor. In the real world, a visitor to a coaching center will not
enquire about the Course alone or about the Center alone; he will enquire about both.