
DBMS Unit-1

The document provides an overview of Database Management Systems (DBMS), explaining its purpose, characteristics, advantages, and disadvantages. It details the architecture of DBMS, types of data models, and the distinction between database schema and instance, as well as data independence. Additionally, it covers various database languages such as Data Definition Language (DDL), Data Manipulation Language (DML), and Data Control Language (DCL).

Uploaded by tusharbindal22
Copyright © All Rights Reserved

DBMS UNIT-1

What is a Database
A database is a collection of inter-related data that is used to retrieve, insert, and
delete data efficiently. It is also used to organize the data in the form of tables,
schemas, views, reports, etc.

For example, a college database organizes data about the administration, staff,
students, faculty, etc.

Using the database, you can easily retrieve, insert, and delete information.

Database Management System


o A database management system (DBMS) is software that is used to manage a
database. For example, MySQL and Oracle are very popular database management
systems used in many different applications.
o A DBMS provides an interface to perform various operations such as creating a
database, storing and updating data in it, creating tables in the database, and a lot more.
o It provides protection and security to the database. When there are multiple users,
it also maintains data consistency.

Why Learn DBMS?


A modern DBMS has the following characteristics −
• Real-world entity − A modern DBMS is more realistic and uses real-world
entities to design its architecture. It uses their behavior and attributes too. For
example, a school database may use students as an entity and their age as an
attribute.
• Relation-based tables − A DBMS allows entities and the relations among them to
form tables. A user can understand the architecture of a database just by
looking at the table names.
• Isolation of data and application − A database system is entirely different from
its data. A database is an active entity, whereas data is said to be passive: the
database works on it and organizes it. A DBMS also stores metadata, which is
data about data, to ease its own processing.
• Less redundancy − A DBMS follows the rules of normalization, which splits a
relation when any of its attributes has redundant values. Normalization
is a mathematically rigorous process that reduces data redundancy.
• Consistency − Consistency is a state in which every relation in a database
remains consistent. There are methods and techniques that can detect an
attempt to leave the database in an inconsistent state. A DBMS can provide greater
consistency than earlier forms of data storage such as file-processing systems.
• Query Language − A DBMS is equipped with a query language, which makes it
more efficient to retrieve and manipulate data. A user can apply as many
different filtering options as required to retrieve a set of data. This was traditionally
not possible with file-processing systems.

A DBMS allows users to perform the following tasks:

o Data Definition: It is used for the creation, modification, and removal of the definitions
that describe the organization of data in the database.
o Data Updation: It is used for the insertion, modification, and deletion of the
actual data in the database.
o Data Retrieval: It is used to retrieve data from the database, which can then be
used by applications for various purposes.
o User Administration: It is used for registering and monitoring users, maintaining
data integrity, enforcing data security, dealing with concurrency control,
monitoring performance, and recovering information corrupted by unexpected
failures.
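The first three tasks can be tried out with Python's built-in `sqlite3` module. The sketch below assumes an in-memory SQLite database and a hypothetical `student` table; user administration is left out because SQLite, as an embedded library, has no user accounts.

```python
import sqlite3

def run_tasks():
    conn = sqlite3.connect(":memory:")
    # Data definition: describe how the data is organized.
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, "
                 "name TEXT NOT NULL, age INTEGER)")
    # Data updation: insert, modify, and delete the actual data.
    conn.executemany("INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
                     [(1, "Asha", 20), (2, "Ravi", 21), (3, "Meena", 19)])
    conn.execute("UPDATE student SET age = age + 1 WHERE id = 1")
    conn.execute("DELETE FROM student WHERE id = 3")
    conn.commit()
    # Data retrieval: read the data back for use by an application.
    rows = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
    conn.close()
    return rows

print(run_tasks())  # [(1, 'Asha', 21), (2, 'Ravi', 21)]
```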

Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the
information.
o It can provide a clear and logical view of the process that manipulates data.
o A DBMS contains automatic backup and recovery procedures.
o It supports the ACID properties, which keep data in a healthy state in case of
failure.
o It can reduce the complexity of the relationships between data.
o It is used to support manipulation and processing of data.
o It is used to provide security of data.
o It lets users view the database from different viewpoints according to their
requirements.

Advantages of DBMS
o Controls database redundancy: It can control data redundancy because all the
data is stored in a single database instead of being duplicated across separate
files.
o Data sharing: In a DBMS, the authorized users of an organization can share the
data among multiple users.
o Easy maintenance: It is easy to maintain due to the centralized nature
of the database system.
o Reduced time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems, which automatically back up
data to protect against hardware and software failures and restore the data if
required.
o Multiple user interfaces: It provides different types of user interfaces, such as
graphical user interfaces and application program interfaces.

Disadvantages of DBMS
o Cost of Hardware and Software: It requires a high-speed data processor and a
large memory to run DBMS software.
o Size: It occupies a large amount of disk space and needs a large memory to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a high impact because most organizations
store all their data in a single database; if the database is damaged by a power
failure or database corruption, the data may be lost forever.
DBMS vs. File System
There are the following differences between a DBMS and a file system:

o Meaning − DBMS: A DBMS is a collection of data, and the user is not required to
write the procedures for managing it. File system: A file system is also a collection of
data, but the user has to write the procedures for managing the database.
o Sharing of data − DBMS: Due to the centralized approach, data sharing is easy.
File system: Data is distributed across many files, possibly in different formats, so it
isn't easy to share.
o Data abstraction − DBMS: A DBMS gives an abstract view of data that hides the
details. File system: The file system exposes the details of data representation and
storage.
o Security and protection − DBMS: A DBMS provides a good protection mechanism.
File system: It isn't easy to protect a file under the file system.
o Recovery mechanism − DBMS: A DBMS provides a crash recovery mechanism, i.e., it
protects the user from system failure. File system: The file system has no crash
recovery mechanism, i.e., if the system crashes while some data is being entered, the
content of the file may be lost.
o Manipulation techniques − DBMS: A DBMS contains a wide variety of sophisticated
techniques to store and retrieve data. File system: The file system can't store and
retrieve data efficiently.
o Concurrency problems − DBMS: A DBMS handles concurrent access to data using some
form of locking. File system: Concurrent access causes many problems, for example
when one user deletes or updates information in a file that another user is working with.
o Where to use − DBMS: The database approach is used in large systems that
interrelate many files. File system: The file system approach is suited to smaller
systems with fewer interrelated files.
o Cost − DBMS: A database system is expensive to design. File system: The file system
approach is cheaper to design.
o Data redundancy and inconsistency − DBMS: Because the database is centralized,
data redundancy and inconsistency are controlled. File system: Files and application
programs are created by different programmers, so a lot of data is duplicated, which
may lead to inconsistency.
o Structure − DBMS: The database structure is complex to design. File system: The file
system approach has a simple structure.
o Data independence − DBMS: Data independence exists and can be of two types,
logical data independence and physical data independence. File system: There is no
data independence.
o Integrity constraints − DBMS: Integrity constraints are easy to apply. File system:
Integrity constraints are difficult to implement.
o Data models − DBMS: Three types of data models exist: hierarchical, network, and
relational data models. File system: There is no concept of data models.
o Flexibility − DBMS: Changes to the content of stored data are often necessary, and
they are easier to make with a database approach. File system: The system is less
flexible than the DBMS approach.
o Examples − DBMS: Oracle, SQL Server, Sybase, etc. File system: COBOL, C++, etc.


DBMS Architecture
Types of DBMS Architecture

1-Tier Architecture

o In this architecture, the database is directly available to the user. This means the
user works directly on the DBMS.
o Any changes made here are applied directly to the database itself. It doesn't
provide a handy tool for end users.
o The 1-tier architecture is used for developing local applications, where
programmers can communicate directly with the database for a quick response.

2-Tier Architecture

o The 2-tier architecture is the same as a basic client-server architecture. In the two-tier
architecture, applications on the client end can communicate directly with the database
on the server side. For this interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query processing
and transaction management.
o To communicate with the DBMS, the client-side application establishes a connection
with the server side.
Fig: 2-tier Architecture

3-Tier Architecture

o The 3-tier architecture contains another layer, the application server, between the
client and the database server. In this architecture, the client can't communicate
directly with the database server.
o The 3-tier architecture is used for large web applications.

Fig: 3-tier Architecture
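To make the extra layer concrete, here is a toy sketch in Python. Everything runs in one process for simplicity: the hypothetical `AppServer` class plays the middle (application) tier, an in-memory SQLite database stands in for the database server, and the `client` function plays the presentation tier, which never sees SQL or the database connection. In a real deployment each tier would usually be a separate process, often on a separate machine.

```python
import sqlite3

class AppServer:
    """Hypothetical middle tier: owns the database connection and the business rules."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")  # stands in for the database tier
        self._db.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
        self._db.execute("INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi')")
        self._db.commit()

    def get_student_name(self, student_id):
        # Input validation happens in the middle tier, not in the client.
        if not isinstance(student_id, int) or student_id <= 0:
            raise ValueError("student_id must be a positive integer")
        row = self._db.execute(
            "SELECT name FROM student WHERE id = ?", (student_id,)
        ).fetchone()
        return row[0] if row else None

def client(server, student_id):
    # Presentation tier: talks only to the application server's methods.
    name = server.get_student_name(student_id)
    return f"Student {student_id}: {name or 'not found'}"

server = AppServer()
print(client(server, 1))  # Student 1: Asha
```

Because only the middle tier holds the connection, the database could be moved or replaced without changing any client code, which is one reason this layout suits large web applications.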


Three schema Architecture
o The three schema architecture is also called the ANSI/SPARC architecture or the
three-level architecture.
o This framework is used to describe the structure of a specific database system.
o The three schema architecture is also used to separate the user applications from the
physical database.
o Its three levels are the internal (physical) level, the conceptual (logical) level, and the
external (view) level.

System Architecture of DBMS


Data Models
A data model is the modeling of the data description, data semantics, and consistency
constraints of the data. It provides the conceptual tools for describing the design of a
database at each level of data abstraction. The following four data models are used for
understanding the structure of the database:
1) Relational Data Model: This type of model designs the data in the form of rows and
columns within a table. Thus, a relational model uses tables for representing data and
the relationships between them. Tables are also called relations. This model was initially
described by Edgar F. Codd in 1969. The relational data model is the most widely used
model and is primarily used by commercial data-processing applications.

2) Entity-Relationship Data Model: An ER model is the logical representation of data
as objects and the relationships among them. These objects are known as entities, and
a relationship is an association among these entities. This model was designed by Peter
Chen and published in a 1976 paper. It is widely used in database design. A set of
attributes describes the entities. For example, student_name and student_id describe the
'student' entity. A set of entities of the same type is known as an 'entity set', and a set
of relationships of the same type is known as a 'relationship set'.

3) Object-based Data Model: This model extends the ER model with the notions of
functions, encapsulation, and object identity. It supports a rich type system that
includes structured and collection types. In the 1980s, various database systems
following the object-oriented approach were developed. Here, objects are simply data
carrying their own properties.

4) Semistructured Data Model: This type of data model is different from the other
three data models explained above. The semistructured data model allows data
specifications in which individual data items of the same type may have different sets
of attributes. The Extensible Markup Language (XML) is widely used for representing
semistructured data. Although XML was initially designed for adding markup
information to text documents, it has gained importance because of its use in the
exchange of data.
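Here is a small sketch of semistructured data using Python's standard `xml.etree.ElementTree` module. The two hypothetical `student` records are of the same type but carry different sets of child elements (one has an email, the other two phone numbers), and the loader simply collects whatever fields each record has.

```python
import xml.etree.ElementTree as ET

DOC = """
<students>
  <student id="1"><name>Asha</name><email>asha@example.com</email></student>
  <student id="2"><name>Ravi</name><phone>555-0101</phone><phone>555-0102</phone></student>
</students>
"""

def load_students(xml_text):
    root = ET.fromstring(xml_text.strip())
    records = []
    for s in root.findall("student"):
        record = {"id": s.get("id")}
        # Records of the same type may differ in their attribute sets,
        # so collect whichever child elements are present (as lists,
        # since a field such as phone may repeat).
        for child in s:
            record.setdefault(child.tag, []).append(child.text)
        records.append(record)
    return records

for r in load_students(DOC):
    print(r)
```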

Database Schema
A database schema is the skeleton structure that represents the logical view of the
entire database. It defines how the data is organized and how the relations among
them are associated. It formulates all the constraints that are to be applied to the data.
A database schema defines its entities and the relationships among them. It contains
descriptive details of the database, which can be depicted by means of schema
diagrams. It's the database designers who design the schema to help programmers
understand the database and make it useful.

A database schema can be divided broadly into two categories −


• Physical Database Schema − This schema pertains to the actual storage of
data and its form of storage, such as files, indices, etc. It defines how the data will be
stored in secondary storage.
• Logical Database Schema − This schema defines all the logical constraints that
need to be applied on the data stored. It defines tables, views, and integrity
constraints.

Database Instance
It is important to distinguish these two terms. A database schema is the skeleton of the
database. It is designed when the database doesn't exist at all. Once the
database is operational, it is very difficult to make any changes to it. A database
schema does not contain any data or information.
A database instance is the state of an operational database, with its data, at a given time.
It contains a snapshot of the database. Database instances tend to change with time. A
DBMS ensures that every instance (state) is valid by diligently following
all the validations, constraints, and conditions that the database designers have
imposed.
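SQLite keeps each table's schema as its `CREATE` statement in the `sqlite_master` catalog, which makes the schema/instance distinction easy to observe from Python. In this minimal sketch (the `student` table is hypothetical), the schema text stays the same while the instance, i.e. the rows present at a given moment, changes with every insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def schema(conn):
    # The schema: a description of the structure, stored in the catalog.
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
    ).fetchone()[0]

def instance(conn):
    # The instance: the data held in the table at this moment.
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()

schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO student (name) VALUES ('Asha')")
conn.execute("INSERT INTO student (name) VALUES ('Ravi')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)    # True: the schema did not change
print(instance_before, instance_after)  # [] [(1, 'Asha'), (2, 'Ravi')]
```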

Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the ability to modify the schema at one level
of the database system without altering the schema at the next higher level.

There are two types of data independence:

1. Logical Data Independence


o Logical data independence refers to the ability to change the conceptual
schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual
view.
o If we make any changes to the conceptual view of the data, the user view of the data
is not affected.
o Logical data independence occurs at the user interface level.

2. Physical Data Independence


o Physical data independence can be defined as the capacity to change the internal
schema without having to change the conceptual schema.
o If we make changes to storage, such as the storage size of the database server, the
conceptual structure of the database is not affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
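Both kinds of independence can be observed in SQLite from Python. In this sketch, assuming a hypothetical `student` table, a view stands in for an external schema: adding a column to the base table (a conceptual-level change) does not affect the view's query, and creating an index (an internal-level change) alters only the query plan, not the query text or its result. The exact wording of SQLite's `EXPLAIN QUERY PLAN` output differs between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", "CSE"), (2, "Ravi", "ECE"), (3, "Meena", "CSE")])

# External schema: a view that one group of users queries instead of the base table.
conn.execute("CREATE VIEW cse_students AS "
             "SELECT id, name FROM student WHERE dept = 'CSE'")
USER_QUERY = "SELECT id, name FROM cse_students ORDER BY id"
before = conn.execute(USER_QUERY).fetchall()

# Logical change: the conceptual schema gains a column; the view and query are untouched.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical = conn.execute(USER_QUERY).fetchall()

# Physical change: an index changes only how rows are located in storage.
LOOKUP = "SELECT id, name FROM student WHERE name = 'Ravi'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description of a step.
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

lookup_before, plan_before = conn.execute(LOOKUP).fetchall(), plan(LOOKUP)
conn.execute("CREATE INDEX idx_student_name ON student(name)")
lookup_after, plan_after = conn.execute(LOOKUP).fetchall(), plan(LOOKUP)

print(before == after_logical, lookup_before == lookup_after)  # True True
print(plan_before, "->", plan_after)  # e.g. a table scan, then a search using the index
```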

Database Language
o A DBMS has appropriate languages and interfaces to express database queries and
updates.
o Database languages can be used to read, store and update the data in the database.
Types of Database Language

1. Data Definition Language

o DDL stands for Data Definition Language. It is used to define the database structure or
schema.
o It is used to create schemas, tables, indexes, constraints, etc. in the database.
o Using DDL statements, you can create the skeleton of the database.
o DDL is also used to store metadata, such as the number of tables and schemas, their
names, indexes, the columns in each table, constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to add comments to the data dictionary.

These commands change the database schema, which is why they come under the
Data Definition Language.
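A sketch of these commands in SQLite, run from Python's `sqlite3` module. SQLite's dialect differs from Oracle's here: renaming is written `ALTER TABLE ... RENAME TO`, there is no `TRUNCATE` statement (a `DELETE` without a `WHERE` clause plays that role), and there is no `COMMENT` statement. The `course` table is hypothetical.

```python
import sqlite3

def tables(conn):
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

def columns(conn, table):
    # PRAGMA table_info yields one row per column; field 1 is the column name.
    # The table name is interpolated directly, so it must come from trusted code.
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]

def ddl_demo():
    conn = sqlite3.connect(":memory:")
    log = {}
    # CREATE: add a new object (a table) to the schema.
    conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL)")
    # ALTER: change the table's structure by adding a column.
    conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER DEFAULT 4")
    # RENAME: SQLite spells this ALTER TABLE ... RENAME TO.
    conn.execute("ALTER TABLE course RENAME TO subject")
    log["after_alter"] = (tables(conn), columns(conn, "subject"))
    # TRUNCATE substitute: DELETE without WHERE empties the table but keeps its definition.
    conn.execute("INSERT INTO subject (code, title) VALUES ('CS101', 'DBMS')")
    conn.execute("DELETE FROM subject")
    log["after_truncate"] = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]
    # DROP: remove the table definition together with any data.
    conn.execute("DROP TABLE subject")
    log["after_drop"] = tables(conn)
    conn.close()
    return log

print(ddl_demo())
# {'after_alter': (['subject'], ['code', 'title', 'credits']), 'after_truncate': 0, 'after_drop': []}
```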

2. Data Manipulation Language


DML stands for Data Manipulation Language. It is used for accessing and manipulating
data in a database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table (all records, or only those that match
a condition).
o Merge: It performs an UPSERT operation, i.e., an insert or an update.
o Call: It is used to call a PL/SQL or Java subprogram.
o Explain Plan: It describes the access path (execution plan) the database will use for a
statement.
o Lock Table: It controls concurrency.
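The core DML statements, plus an upsert, are sketched below in SQLite from Python. SQLite has no `MERGE`; from version 3.24 it expresses the same insert-or-update idea as `INSERT ... ON CONFLICT ... DO UPDATE`. The `marks` table and its values are hypothetical. Note that `DELETE` with a `WHERE` clause removes only the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (roll_no INTEGER PRIMARY KEY, score INTEGER NOT NULL)")

# INSERT: add new rows.
conn.executemany("INSERT INTO marks VALUES (?, ?)", [(1, 70), (2, 55), (3, 40)])
# UPDATE: change existing rows that match a condition (rolls 2 and 3 here).
conn.execute("UPDATE marks SET score = score + 5 WHERE score < 60")
# DELETE: remove only the rows matching the WHERE clause.
conn.execute("DELETE FROM marks WHERE roll_no = 3")
# UPSERT (the job MERGE does in other systems): update if the key exists, else insert.
conn.executemany(
    "INSERT INTO marks VALUES (?, ?) "
    "ON CONFLICT(roll_no) DO UPDATE SET score = excluded.score",
    [(2, 90), (4, 65)],  # roll 2 exists, so it is updated; roll 4 is new, so it is inserted
)
conn.commit()
# SELECT: retrieve the result.
rows = conn.execute("SELECT roll_no, score FROM marks ORDER BY roll_no").fetchall()
print(rows)  # [(1, 70), (2, 90), (4, 65)]
```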

3. Data Control Language

o DCL stands for Data Control Language. It is used to control access to the data stored in
the database by granting and revoking user rights and permissions.
o In some systems, DCL execution is transactional, so it can be rolled back.

(In Oracle Database, however, DCL statements cannot be rolled back.)

Here are some tasks that come under DCL:

o Grant: It is used to give users access privileges to a database.
o Revoke: It is used to take back permissions from a user.

Privileges that can be granted and revoked include CONNECT, INSERT, USAGE, EXECUTE,
DELETE, UPDATE, and SELECT.
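SQLite, which is convenient for small experiments, has no users and therefore no `GRANT` or `REVOKE`. As an illustration only, the hypothetical `PrivilegeCatalog` class below models the bookkeeping those two statements perform in a multi-user DBMS: a grant adds a (user, object, privilege) entry, a revoke removes it, and the DBMS consults the entries before running a statement. Real systems add roles, grant options, and cascading revokes.

```python
class PrivilegeCatalog:
    """Hypothetical toy model of the privilege records kept by GRANT and REVOKE."""

    def __init__(self):
        self._privs = {}  # (user, table) -> set of privilege names

    def grant(self, privilege, table, user):
        # Models: GRANT <privilege> ON <table> TO <user>
        self._privs.setdefault((user, table), set()).add(privilege.upper())

    def revoke(self, privilege, table, user):
        # Models: REVOKE <privilege> ON <table> FROM <user>
        self._privs.get((user, table), set()).discard(privilege.upper())

    def allowed(self, user, privilege, table):
        # A DBMS performs a check of this kind before executing a statement.
        return privilege.upper() in self._privs.get((user, table), set())

catalog = PrivilegeCatalog()
catalog.grant("select", "student", "clerk")
catalog.grant("update", "student", "clerk")
catalog.revoke("update", "student", "clerk")
print(catalog.allowed("clerk", "SELECT", "student"))  # True
print(catalog.allowed("clerk", "UPDATE", "student"))  # False
```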

4. Transaction Control Language


TCL is used to manage the changes made by DML statements. It allows DML statements
to be grouped into logical transactions.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction permanently in the database.
o Rollback: It is used to undo all changes made since the last commit, restoring the
database to that state.
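A minimal sketch of `COMMIT` and `ROLLBACK` using Python's `sqlite3` module, where `conn.commit()` and `conn.rollback()` issue the corresponding commands. The `account` table is hypothetical. With the module's default settings, a data-changing statement such as `UPDATE` implicitly starts a transaction, so the rollback undoes everything since the last commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # COMMIT: make the insert permanent

conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.rollback()  # ROLLBACK: undo every change since the last commit

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 100
```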

Types of Database Users in DBMS


Database Users
1. Application Programmers – They are the developers who interact with
the database by means of DML queries.
2. Sophisticated Users – They are database developers who write SQL
queries to select, insert, delete, and update data.
3. Specialized Users – These are also sophisticated users, but they write
special database application programs. They are the developers who
develop complex programs according to the requirements.
4. Stand-alone Users – These users have a stand-alone database for
their personal use. Such databases come as ready-made database
packages with menus and graphical interfaces.
5. Naive Users – These are users who use existing applications to
interact with the database.
Database Administrators
A DBA has many responsibilities:

• Installing and upgrading the DBMS servers
• Design and implementation
• Performance tuning
• Migrating database servers
• Backup and recovery
• Security
• Documentation
Types of DBA
There are different kinds of DBAs, depending on their responsibilities.

• Administrative DBA – This DBA is mainly concerned with installing and maintaining DBMS
servers. His prime tasks are installation, backups, recovery, security, replication, memory
management, configuration, and tuning. He is mainly responsible for all administrative tasks of
a database.
• Development DBA – He is responsible for creating queries and procedures for the requirements.
Basically, his task is similar to that of any database developer.
• Database Architect – A database architect is responsible for creating and maintaining the users,
roles, access rights, tables, views, constraints, and indexes. He is mainly responsible for
designing the structure of the database according to the requirements. These structures are
used by developers and development DBAs to code.
• Data Warehouse DBA – This DBA should be able to maintain the data and procedures from various
sources in the data warehouse. These sources can be files, COBOL, or any other programs, so the
data and programs come from different sources. A good DBA should be able to keep the
performance and function levels from these sources at the same pace to make the data
warehouse work.
• Application DBA – He acts as a bridge between the application programs and the database. He
makes sure all the application programs are optimized to interact with the database. He ensures
that all activities, from installing, upgrading, patching, and maintaining to backup, recovery,
and executing the records, work without any issues.
• OLAP DBA – He is responsible for installing and maintaining the database in OLAP systems. He
maintains only OLAP databases.

Transaction Management system


A transaction is a very small unit of a program, and it may contain several low-level tasks.
A transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability, commonly known as the ACID properties, in order to ensure accuracy,
completeness, and data integrity.

ACID Properties
ACID stands for:

1) Atomicity: Atomicity means that a transaction is treated as a single, indivisible unit.
Either all of its operations are executed completely, or none of them are executed at all.
A transaction must not stop partway and leave only some of its operations applied.
2) Consistency: Consistency means that the integrity of the data must always be
maintained. A transaction must take the database from one valid state to another, so
that the database remains consistent before and after the transaction and the data is
always correct.

3) Isolation: The term 'isolation' means separation. When two or more transactions
execute concurrently, they must not interfere with one another; the result should be
the same as if they had run one after another. Changes made by a transaction are not
visible to other transactions until the transaction commits.

4) Durability: Durability ensures permanency. Once a transaction has completed
successfully, its changes become permanent in the database and must survive even if
the system later fails or crashes. If data is lost in such a failure, the recovery manager is
responsible for restoring it to ensure the durability of the database. Changes are made
permanent by committing them with the COMMIT command.

Therefore, the ACID properties play a vital role in maintaining the consistency
and availability of data in the database.
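Atomicity and consistency can be seen together in a classic funds-transfer sketch with Python's `sqlite3` module. The `account` table and the amounts are hypothetical; a `CHECK` constraint encodes the consistency rule that a balance may never go negative. The transfer deliberately credits the destination before debiting the source, so when the debit violates the constraint, the rollback must also undo the credit that had already succeeded.

```python
import sqlite3

def make_bank():
    conn = sqlite3.connect(":memory:")
    # CHECK encodes a consistency rule: balances never go negative.
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed, and the earlier credit to dst was rolled back

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

conn = make_bank()
print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```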
Components of DBMS

In this section, we will look at the common components found in most
DBMS software, including:

• Storage engine
• Query language
• Query processor
• Optimization engine
• Metadata catalog
• Log manager
• Reporting and monitoring tools
• Data utilities

Decision Support System


Decision support systems (DSS) are interactive software-based systems intended to help
managers in decision-making by accessing large volumes of information generated from
various related information systems involved in organizational business processes, such as
office automation system, transaction processing system, etc.

A decision support system helps in decision-making but does not necessarily give a
decision itself. The decision makers compile useful information from raw data, documents,
personal knowledge, and/or business models to identify and solve problems and make
decisions.
Attributes of a DSS
• Adaptability and flexibility
• High level of Interactivity
• Ease of use
• Efficiency and effectiveness
• Complete control by decision-makers
• Ease of development
• Extendibility
• Support for modeling and analysis
• Support for data access
• Standalone, integrated, and Web-based

Components of a DSS
• Database Management System (DBMS) − To solve a problem, the necessary
data may come from internal or external databases.
• Model Management System − It stores and accesses the models that managers
use to make decisions. Such models are used for designing a manufacturing
facility, analyzing the financial health of an organization, forecasting the demand for
a product or service, etc.
• Support Tools − Support tools such as online help, pull-down menus, user
interfaces, graphical analysis, and error-correction mechanisms facilitate user
interaction with the system.

ER model
o ER model stands for Entity-Relationship model. It is a high-level data model. This
model is used to define the data elements and relationships for a specified system.
o It develops a conceptual design for the database. It also provides a simple,
easy-to-design view of the data.
o In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.
1. Entity:
An entity may be any object, class, person, or place. In an ER diagram, an entity is
represented as a rectangle.

a. Weak Entity
• An entity that depends on another entity is called a weak entity. A weak entity
doesn't have a key attribute of its own. A weak entity is represented by a
double rectangle.
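When an ER design is turned into tables, a weak entity is commonly mapped to a table whose primary key combines the owner's key with the weak entity's own partial key. Here is a sketch in SQLite from Python, with hypothetical `employee` and `dependent` tables; `ON DELETE CASCADE` is one common way (an assumption here, not the only option) to express that dependents cannot outlive their owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Weak entity: its key is the owner's key plus a partial key (dep_name).
conn.execute("""
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT NOT NULL,
        relation TEXT,
        PRIMARY KEY (emp_id, dep_name)
    )""")
conn.execute("INSERT INTO employee VALUES (1, 'Asha')")
conn.executemany("INSERT INTO dependent VALUES (?, ?, ?)",
                 [(1, "Kiran", "son"), (1, "Lata", "mother")])

# A dependent cannot be stored without an existing owning employee.
try:
    conn.execute("INSERT INTO dependent VALUES (99, 'Ghost', 'none')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Deleting the owner also deletes the weak entities that depend on it.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(orphan_allowed, remaining)  # False 0
```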

2. Attribute
• An attribute is used to describe a property of an entity. An ellipse is used to
represent an attribute.
• For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute
• The key attribute is used to represent the main characteristic of an entity. It
represents a primary key. The key attribute is represented by an ellipse with the
text underlined.

b. Composite Attribute

• An attribute that is composed of several other attributes is known as a composite
attribute. A composite attribute is represented by an ellipse that is connected to the
ellipses of its component attributes.

c. Multivalued Attribute

• An attribute that can have more than one value is known as a multivalued
attribute. A double oval is used to represent a multivalued attribute.
• For example, a student can have more than one phone number.

d. Derived Attribute

• An attribute that can be derived from other attributes is known as a derived
attribute. It is represented by a dashed ellipse.
• For example, a person's age changes over time and can be derived from another
attribute such as date of birth.
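Because a derived attribute can be computed, a design usually stores only the base attribute (date of birth) and calculates age when it is needed. Here is a minimal Python sketch; the helper name `age_on` is hypothetical, and the reference date is passed in explicitly (in an application it would typically be `date.today()`).

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from a stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(2004, 8, 15), date(2024, 8, 14)))  # 19
print(age_on(date(2004, 8, 15), date(2024, 8, 15)))  # 20
```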

3. Relationship
• A relationship is used to describe the relation between entities. A diamond or
rhombus is used to represent a relationship.

Types of relationship are as follows:

a. One-to-One Relationship

• When one instance of an entity is associated with only one instance of another entity
through the relationship, it is known as a one-to-one relationship.
• For example, a female can marry one male, and a male can marry one
female.

b. One-to-many relationship

• When one instance of the entity on the left is associated with more than one instance
of the entity on the right through the relationship, it is known as a one-to-many
relationship.
• For example, a scientist can invent many inventions, but each invention is made by
only one specific scientist.

c. Many-to-one relationship

• When more than one instance of the entity on the left is associated with only one
instance of the entity on the right through the relationship, it is known as a
many-to-one relationship.
• For example, a student enrolls in only one course, but a course can have many
students.

d. Many-to-many relationship

• When more than one instance of the entity on the left is associated with more than one
instance of the entity on the right through the relationship, it is known as a
many-to-many relationship.
• For example, an employee can be assigned to many projects, and a project can have
many employees.

Notation of ER diagram

Mapping Constraints
o A mapping constraint is a data constraint that expresses the number of entities to which
another entity can be related via a relationship set.
o It is most useful in describing binary relationship sets, although it can also help describe
relationship sets that involve more than two entity sets.
o For a binary relationship set R between entity sets E1 and E2, there are four possible
mapping cardinalities. These are as follows:
1. One to one (1:1)
2. One to many (1:M)
3. Many to one (M:1)
4. Many to many (M:M)

One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and
an entity in E2 is associated with at most one entity in E1.

One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2,
and an entity in E2 is associated with at most one entity in E1.

Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and
an entity in E2 is associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities in
E2, and an entity in E2 is associated with any number of entities in E1.
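One common way to implement these cardinalities in relational tables is sketched below in SQLite from Python, with hypothetical tables: a foreign key on the "many" side gives one-to-many (read from the other side, the same design is many-to-one), a junction table gives many-to-many, and a foreign key declared `UNIQUE` gives one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    -- One-to-many: each invention references exactly one scientist.
    -- NOT NULL also makes the participation of invention total.
    CREATE TABLE scientist (sci_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invention (inv_id INTEGER PRIMARY KEY, title TEXT,
                            sci_id INTEGER NOT NULL REFERENCES scientist(sci_id));

    -- Many-to-many: a junction table holds one row per (employee, project) pair.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE works_on (emp_id INTEGER REFERENCES employee(emp_id),
                           proj_id INTEGER REFERENCES project(proj_id),
                           PRIMARY KEY (emp_id, proj_id));

    -- One-to-one: a UNIQUE foreign key allows at most one passport per person.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE passport (passport_no TEXT PRIMARY KEY,
                           person_id INTEGER UNIQUE REFERENCES person(person_id));

    INSERT INTO scientist VALUES (1, 'S. Rao');
    INSERT INTO invention VALUES (1, 'Device A', 1), (2, 'Device B', 1);
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project VALUES (10, 'Payroll'), (20, 'Portal');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
    INSERT INTO person VALUES (1, 'Meena');
    INSERT INTO passport VALUES ('P100', 1);
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

inventions_of_rao = count("SELECT COUNT(*) FROM invention WHERE sci_id = 1")  # 1:M
projects_of_asha = count("SELECT COUNT(*) FROM works_on WHERE emp_id = 1")    # M:N
staff_on_payroll = count("SELECT COUNT(*) FROM works_on WHERE proj_id = 10")  # M:N
try:
    conn.execute("INSERT INTO passport VALUES ('P200', 1)")  # second passport, same person
    second_passport_ok = True
except sqlite3.IntegrityError:
    second_passport_ok = False  # UNIQUE on person_id enforces 1:1

print(inventions_of_rao, projects_of_asha, staff_on_payroll, second_passport_ok)  # 2 2 2 False
```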

Participation Constraints
• Total Participation − Each entity is involved in the relationship. Total
participation is represented by double lines.
• Partial participation − Not all entities are involved in the relationship. Partial
participation is represented by single lines.
ER Design Issues
Users often misunderstand the elements of an ER diagram and its design process. This
leads to an unnecessarily complex ER diagram and to design issues that do not reflect the
real-world enterprise being modelled.

Here, we will discuss the basic design issues of an ER database schema in the following
points:

1) Use of Entity Set vs attributes


The use of an entity set or an attribute depends on the structure of the real-world
enterprise that is being modelled and the semantics associated with its attributes. A
common mistake is to use the primary key of one entity set as an attribute of another
entity set; a relationship should be used instead. Also, the primary key attributes of the
related entity sets are already implicit in the relationship set, so they should not be
designated again as attributes of the relationship set.

2) Use of Entity Set vs. Relationship Sets


It is not always clear whether an object is best expressed as an entity set or as a relationship set.
As a guideline, a relationship set should be used to describe an action that occurs between
entities. If an object needs to be represented as a relationship set, it is better not to mix it
with an entity set.

3) Use of Binary vs n-ary Relationship Sets

Most relationships described in databases are binary relationships. However, non-binary
(n-ary) relationships can be represented by several binary relationships. For example, a
ternary relationship 'parent' may relate a child to his father and his mother. The same
information can be represented by two binary relationships: 'father' (child to father) and
'mother' (child to mother). Thus, it is possible to represent a non-binary relationship by a
set of distinct binary relationships.
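The 'parent' example can be sketched in plain Python by treating each relationship as a set of tuples. The names and the split into hypothetical 'father' and 'mother' relationships are illustrative assumptions. The reconstruction works here because each child has exactly one father and one mother. In general, replacing an n-ary relationship with binary ones may need more care than this.

```python
# Hypothetical instances of the ternary relationship parent(child, father, mother).
parent = {
    ("Anu", "Vikram", "Sita"),
    ("Raj", "Vikram", "Sita"),
    ("Kiran", "Arjun", "Leela"),
}

# Replace it with two binary relationships.
father = {(child, f) for child, f, _ in parent}  # father(child, father)
mother = {(child, m) for child, _, m in parent}  # mother(child, mother)

# Joining the binary relationships on the child rebuilds the ternary one.
rebuilt = {
    (child, f, m)
    for child, f in father
    for other_child, m in mother
    if child == other_child
}
print(rebuilt == parent)  # True
```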

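4) Placing Relationship Attributes

The cardinality ratio of a relationship set is a useful guide for placing relationship
attributes. It is often better to associate the attributes of a one-to-one or one-to-many
relationship set with a participating entity set instead of the relationship set. For
one-to-one, either entity set works. For one-to-many, use the entity set on the "many"
side. For a many-to-many relationship set, such attributes must remain on the relationship
set.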
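The reasoning behind this rule can be checked in a few lines of Python. In this sketch, a hypothetical one-to-many works_in relationship (one department, many employees) carries a since attribute. Moving since to an entity only works if every entity on that side appears in at most one relationship instance. That holds for the employee side but not for the department side. The function name move_attribute_to_entity is made up for illustration.

```python
# Hypothetical relationship instances: (employee, department, since).
works_in = [
    ("Asha", "Sales", 2019),
    ("Ravi", "Sales", 2021),
    ("Meena", "HR", 2020),
]

def move_attribute_to_entity(relationship, entity_pos, attr_pos):
    """Try to store a relationship attribute on one participating entity.

    Succeeds only if each entity at entity_pos appears in at most one
    relationship instance; otherwise one entity would need several values.
    """
    placed = {}
    for instance in relationship:
        entity, value = instance[entity_pos], instance[attr_pos]
        if entity in placed:
            raise ValueError(f"{entity!r} appears in more than one instance")
        placed[entity] = value
    return placed

# 'Many' side (employee): each employee has one 'since' value.
since_by_employee = move_attribute_to_entity(works_in, 0, 2)
print(since_by_employee)  # {'Asha': 2019, 'Ravi': 2021, 'Meena': 2020}

# 'One' side (department): fails, because Sales has two employees.
try:
    move_attribute_to_entity(works_in, 1, 2)
except ValueError as err:
    print(err)  # 'Sales' appears in more than one instance
```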
Keys
o Keys play an important role in a relational database.
o A key is used to uniquely identify any record or row of data in a table. Keys are also
used to establish and identify relationships between tables.

For example, ID is used as a key in the Student table because it is unique for each
student. In the PERSON table, passport_number, license_number, and SSN are keys since
they are unique for each person.

Types of keys:
1. Super Key
A super key is a set of one or more attributes that can uniquely identify a tuple. A super
key is a superset of a candidate key.

For example, in the above EMPLOYEE table, consider the combination (EMPLOYEE_ID,
EMPLOYEE_NAME). Two employees can have the same name, but they cannot have the same
EMPLOYEE_ID. Hence, this combination can also act as a key.

The super keys would therefore include EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
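The super key test can be written as a short Python function. This sketch runs over hypothetical sample rows. Checking data can only show that a set of attributes is not a super key. A real super key must hold for every possible instance of the relation, not just the rows present.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows (two employees share a name).
employees = [
    {"EMPLOYEE_ID": 101, "EMPLOYEE_NAME": "Asha"},
    {"EMPLOYEE_ID": 102, "EMPLOYEE_NAME": "Ravi"},
    {"EMPLOYEE_ID": 103, "EMPLOYEE_NAME": "Asha"},
]

def is_superkey(rows, attrs):
    """True if no two of these rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

# Test every non-empty combination of attributes.
attributes = list(employees[0])
superkeys = [
    combo
    for size in range(1, len(attributes) + 1)
    for combo in combinations(attributes, size)
    if is_superkey(employees, combo)
]
print(superkeys)  # [('EMPLOYEE_ID',), ('EMPLOYEE_ID', 'EMPLOYEE_NAME')]
```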

2. Candidate key

o A candidate key is a minimal super key. It is an attribute or set of attributes that can
uniquely identify a tuple, and no attribute can be removed from it without losing this
property.
o The primary key is chosen from among the candidate keys. The remaining candidate keys
are as strong as the primary key.
For example, in the EMPLOYEE table, ID is best suited to be the primary key. Other
attributes that are also unique for each employee, such as SSN, Passport_Number, and
License_Number, are also candidate keys.
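A candidate key is a minimal super key. Candidate keys can therefore be found by testing attribute sets from smallest to largest and skipping supersets of keys already found. The sketch assumes hypothetical EMPLOYEE rows. As with super keys, sample data alone cannot prove that a set of attributes is a key.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows.
employees = [
    {"ID": 1, "SSN": "111-11", "Passport_Number": "P100", "Name": "Asha"},
    {"ID": 2, "SSN": "222-22", "Passport_Number": "P200", "Name": "Ravi"},
    {"ID": 3, "SSN": "333-33", "Passport_Number": "P300", "Name": "Asha"},
]

def is_superkey(rows, attrs):
    """True if no two of these rows agree on all attributes in attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))

def candidate_keys(rows):
    attributes = list(rows[0])
    keys = []
    # Smaller sets are tested first, so skipping supersets of keys already
    # found leaves only the minimal super keys.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

print(candidate_keys(employees))  # [('ID',), ('SSN',), ('Passport_Number',)]
```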

3. Primary key

o The primary key is the candidate key chosen to identify one and only one instance of an
entity uniquely. An entity can have multiple keys, as we saw in the PERSON table. The most
suitable key from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee.
We could instead select License_Number or Passport_Number as the primary key, since they
are also unique.
o For each entity, the choice of primary key depends on the requirements and on the
developers.
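In SQL, the chosen key is declared with PRIMARY KEY, and the database then rejects duplicate values. Here is a minimal sketch with Python's sqlite3 module, using a hypothetical employee table with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ID is declared the primary key; Passport_Number could have been chosen instead.
conn.execute("""
CREATE TABLE employee (
    ID              INTEGER PRIMARY KEY,
    Name            TEXT,
    Passport_Number TEXT
)""")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'P100')")

try:
    # A second row with ID = 1 would break uniqueness, so it is rejected.
    conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 'P200')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```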

4. Foreign key
o A foreign key is a column (or set of columns) of a table that refers to the primary key
of another table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we do not store the department's information in
the employee table. Instead, we link the two tables through the primary key of one of
them.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in
the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and it relates the two tables.
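The EMPLOYEE and DEPARTMENT example can be sketched with Python's sqlite3 module. The table contents are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is executed on the connection. Many other database systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    Department_Id INTEGER PRIMARY KEY,
    Name          TEXT
);
CREATE TABLE employee (
    Employee_Id   INTEGER PRIMARY KEY,
    Name          TEXT,
    -- Foreign key: must match an existing department's primary key.
    Department_Id INTEGER REFERENCES department(Department_Id)
);
""")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 'Asha', 1)")  # OK: department 1 exists

try:
    conn.execute("INSERT INTO employee VALUES (11, 'Ravi', 99)")  # no department 99
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# The foreign key lets us join each employee to its department.
row = conn.execute(
    "SELECT e.Name, d.Name FROM employee e "
    "JOIN department d ON e.Department_Id = d.Department_Id").fetchone()
print(dangling_rejected, row)  # True ('Asha', 'Sales')
```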

5. Alternate key
There may be one or more attributes, or combinations of attributes, that uniquely
identify each tuple in a relation. These attributes or combinations of attributes are
called candidate keys. One key is chosen as the primary key from these candidate keys,
and the remaining candidate keys, if any, are termed alternate keys. In other words, the
number of alternate keys is the total number of candidate keys minus one (the primary
key). An alternate key may or may not exist. If there is only one candidate key in a
relation, the relation has no alternate key.

For example, the employee relation has two attributes, Employee_Id and PAN_No, that act
as candidate keys. In this relation, Employee_Id is chosen as the primary key, so the
other candidate key, PAN_No, acts as the alternate key.
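In SQL, an alternate key is often declared with a UNIQUE constraint (plus NOT NULL) alongside the primary key. Here is a minimal sketch with Python's sqlite3 module. The PAN value and names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    Employee_Id INTEGER PRIMARY KEY,   -- candidate key chosen as primary key
    PAN_No      TEXT NOT NULL UNIQUE,  -- remaining candidate key (alternate key)
    Name        TEXT
)""")
conn.execute("INSERT INTO employee VALUES (1, 'ABCDE1234F', 'Asha')")

try:
    # New Employee_Id, but a PAN_No that is already taken.
    conn.execute("INSERT INTO employee VALUES (2, 'ABCDE1234F', 'Ravi')")
    duplicate_pan_rejected = False
except sqlite3.IntegrityError:
    duplicate_pan_rejected = True

# The alternate key still identifies exactly one employee.
name = conn.execute(
    "SELECT Name FROM employee WHERE PAN_No = ?", ("ABCDE1234F",)).fetchone()[0]
print(duplicate_pan_rejected, name)  # True Asha
```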
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite
key. This key is also known as a concatenated key.

For example, in the employee relation, assume that an employee may be assigned multiple
roles and may work on multiple projects simultaneously. The primary key will then be
composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination.
These attributes act as a composite key, since the primary key comprises more than one
attribute.
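A composite key is declared in SQL by listing several columns in one PRIMARY KEY clause. In this sqlite3 sketch, the table name employee_assignment and its rows are hypothetical. The columns are also marked NOT NULL because SQLite, for historical reasons, does not make composite primary key columns NOT NULL automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee_assignment (
    Emp_ID   INTEGER NOT NULL,
    Emp_role TEXT    NOT NULL,
    Proj_ID  INTEGER NOT NULL,
    -- Composite key: only the combination of all three must be unique.
    PRIMARY KEY (Emp_ID, Emp_role, Proj_ID)
)""")
conn.executemany(
    "INSERT INTO employee_assignment VALUES (?, ?, ?)",
    [
        (1, "Developer", 100),
        (1, "Tester", 100),     # same employee and project, different role
        (1, "Developer", 200),  # same employee and role, different project
    ],
)

try:
    # Exact repeat of an existing combination.
    conn.execute("INSERT INTO employee_assignment VALUES (1, 'Developer', 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM employee_assignment").fetchone()[0]
print(duplicate_rejected, count)  # True 3
```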

7. Artificial key
Keys created from arbitrarily assigned data are known as artificial keys. These keys are
created when a primary key is large and complex and has no relationship with many other
relations. The data values of artificial keys are usually numbered in serial order.

For example, in the employee relation, the primary key composed of Emp_ID, Emp_role, and
Proj_ID is large. So it would be better to add a new artificial attribute that uniquely
identifies each tuple in the relation.
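One possible way to apply this is sketched below with sqlite3. A serial Assignment_ID column serves as the artificial primary key. The original three-column combination is kept as a UNIQUE constraint, so duplicates are still rejected. The column name Assignment_ID and the rows are hypothetical. AUTOINCREMENT is SQLite syntax, and other systems use different syntax for serial values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee_assignment (
    -- Artificial key: SQLite assigns serial numbers automatically.
    Assignment_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Emp_ID   INTEGER NOT NULL,
    Emp_role TEXT    NOT NULL,
    Proj_ID  INTEGER NOT NULL,
    -- The former composite key is kept as a UNIQUE constraint.
    UNIQUE (Emp_ID, Emp_role, Proj_ID)
)""")
conn.executemany(
    "INSERT INTO employee_assignment (Emp_ID, Emp_role, Proj_ID) VALUES (?, ?, ?)",
    [(1, "Developer", 100), (1, "Tester", 100), (2, "Developer", 100)],
)
ids = [r[0] for r in conn.execute(
    "SELECT Assignment_ID FROM employee_assignment ORDER BY Assignment_ID")]
print(ids)  # [1, 2, 3]
```

Keeping the UNIQUE constraint means the short artificial key is used for references from other tables, while the real-world rule (no repeated employee, role, and project combination) is still enforced.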

E-R Diagram:
