DBMS Unit-I
Database Languages:
Database languages, also known as query languages or data query languages, are a class
of programming languages that developers use to define and access databases, which are
organized collections of data that users can access electronically. These languages let users
control access to data, define and update data, and search for
information within the database management system (DBMS). A query is a statement requesting
the retrieval of information. The portion of a DML that involves information retrieval is called a
query language.
Data definition language (DDL) creates the framework of the database by specifying the
database schema, which is the structure that represents the organization of data. Its common uses
include the creation and alteration of tables, files, indexes and columns within the database. This
language also allows users to rename or drop the existing database or its components. Here's a
list of DDL statements:
CREATE: Creates a new database or object, such as a table, index or view
ALTER: Changes the structure of the database or object
DROP: Deletes the database or existing objects
RENAME: Renames the database or existing objects
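The four DDL statements above can be tried out with a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. The table and column names (student, learner, tot_cred) are chosen only for illustration, and SQLite writes RENAME as ALTER TABLE ... RENAME TO; other systems may use a different syntax.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# CREATE: define a new table (illustrative names, not from the notes).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of the table by adding a column.
conn.execute("ALTER TABLE student ADD COLUMN tot_cred INTEGER DEFAULT 0")

# RENAME: in SQLite this is spelled ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE student RENAME TO learner")

# Read the column names back from the catalog after the changes.
columns = [row[1] for row in conn.execute("PRAGMA table_info(learner)")]

# DROP: delete the table together with its definition.
conn.execute("DROP TABLE learner")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(columns)      # column names recorded before the drop
print(tables_left)  # tables remaining after the drop
```

Note that DDL changes the schema only; no row of data is inserted or read here.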
Data manipulation language (DML) provides operations that handle user requests, offering a way
to access and manipulate the data that users store within a database. Its common functions
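include inserting, updating, deleting and retrieving data from the database. Here's a list of DML
statements:
SELECT: Retrieves data from one or more tables
INSERT: Adds new rows to a table
UPDATE: Changes the values in existing rows
DELETE: Removes rows from a table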
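As a rough illustration of DML, the sketch below inserts, updates, deletes and then retrieves rows, assuming Python's sqlite3 module. The instructor table and its sample rows are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table is created with DDL first; the rest of the example is DML.
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)

# INSERT: add new rows (hypothetical sample data).
conn.executemany(
    "INSERT INTO instructor (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Ana", 60000.0), (2, "Ben", 72000.0), (3, "Cy", 55000.0)],
)

# UPDATE: change values in existing rows.
conn.execute("UPDATE instructor SET salary = salary + 5000 WHERE id = ?", (1,))

# DELETE: remove rows.
conn.execute("DELETE FROM instructor WHERE id = ?", (3,))

# SELECT: retrieve data; this is the "query" part of the DML.
rows = conn.execute(
    "SELECT id, name, salary FROM instructor ORDER BY id"
).fetchall()
print(rows)
```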
Data control language (DCL) controls access to the data that users store within a database.
Essentially, this language controls the rights and permissions of the database system. It allows
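users to grant or revoke privileges to the database. Here's a list of DCL statements:
GRANT: Gives a user privileges on the database or its objects
REVOKE: Takes back privileges that were granted earlier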
Transaction control language (TCL) manages the transactions within a database. Transactions
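group a set of related tasks into a single, executable unit. All the tasks must succeed for
the transaction to commit; if any task fails, the whole transaction is rolled back. Here's a list of
TCL statements:
COMMIT: Makes the changes of the current transaction permanent
ROLLBACK: Undoes the changes of the current transaction
SAVEPOINT: Marks a point within a transaction to which it can later be rolled back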
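The all-or-nothing behaviour of a transaction can be sketched as follows, assuming Python's sqlite3 module with its default transaction handling. The account table, the CHECK constraint and the transfer function are hypothetical; they are there only to make one step of a transaction fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # overdrafts are rejected
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        # Two related tasks that must succeed or fail together.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.commit()    # COMMIT: make both updates permanent
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # ROLLBACK: undo the credit that was already applied
        return False


ok = transfer(1, 2, 30)       # enough money, so the transaction commits
failed = transfer(1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 had already been executed, yet it does not appear in the final balances because the rollback undid it.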
Relational Databases:
A relational database is based on the relational model and uses a collection of tables to
represent both data and the relationships among those data. It also includes a DML and DDL.
Tables
Each table has multiple columns, and each column has a unique name. The figure accompanying
this section presents a sample relational database comprising two tables.
Database Design
Database systems are designed to manage large bodies of information. These large bodies of
information do not exist in isolation. They are part of the operation of some enterprise whose end
product may be information from the database or may be some device or service for which the
database plays only a supporting role. Database design mainly involves the design of the
database schema. The design of a complete database application environment that meets the
needs of the enterprise being modeled requires attention to a broader set of issues. The overall
design of the database is called the database schema.
Database Design for a University Organization
To illustrate the design process, let us examine how a database for a university
could be designed. The initial specification of user requirements may be based
on interviews with the database users, and on the designer’s own analysis of
the organization. The description that arises from this design phase serves as the
basis for specifying the conceptual structure of the database. Here are the major
characteristics of the university.
• The university is organized into departments. Each department is identified
by a unique name (dept_name), is located in a particular building, and has a
budget.
• Each department has a list of courses it offers. Each course has associated with
it a course_id, title, dept_name, and credits, and may also have associated
prerequisites.
• Instructors are identified by their unique ID. Each instructor has a name, an associated
department (dept_name), and a salary.
• Students are identified by their unique ID. Each student has a name, an associated
major department (dept_name), and tot_cred (total credit hours the student
has earned thus far).
• The university maintains a list of classrooms, specifying the name of the
building, room_number, and room capacity.
• The university maintains a list of all classes (sections) taught. Each section is
identified by a course_id, sec_id, year, and semester, and has associated with it
a semester, year, building, room_number, and time_slot_id (the time slot when the
class meets).
• The department has a list of teaching assignments specifying, for each instructor,
the sections the instructor is teaching.
• The university has a list of all student course registrations, specifying, for
each student, the courses and the associated sections that the student has
taken (registered for).
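Part of the requirements above can be turned into a schema. The following is only a sketch, assuming Python's sqlite3 module: the attribute names follow the list (written with underscores, for example dept_name), while the column types, the choice of primary keys and the decision to cover only five of the relations are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Departments are identified by their unique name.
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        building  TEXT,
        budget    NUMERIC
    );
    -- Each course belongs to a department.
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT,
        dept_name TEXT REFERENCES department(dept_name),
        credits   INTEGER
    );
    -- Instructors and students are identified by their unique ID.
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT REFERENCES department(dept_name),
        salary    NUMERIC
    );
    CREATE TABLE student (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT REFERENCES department(dept_name),
        tot_cred  INTEGER
    );
    -- A classroom is identified by building and room number together.
    CREATE TABLE classroom (
        building    TEXT,
        room_number TEXT,
        capacity    INTEGER,
        PRIMARY KEY (building, room_number)
    );
    """
)

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Sections, teaching assignments and registrations would be added in the same way, each referring back to the tables shown here.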
The Query Processor
The query processor is important because it helps the database system to
simplify and facilitate access to data. The query processor allows database users to obtain good
performance while being able to work at the view level and not be burdened with understanding
the physical-level details of the implementation of the system. It is the job of the database system
to translate updates and queries written in a nonprocedural language, at the logical level, into an
efficient sequence of operations at the physical level.
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give
the same result. The DML compiler also performs query optimization; that is, it picks the lowest cost
evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated
by the DML compiler.
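One way to observe the choice among alternative evaluation plans is to ask a database system for the plan it picked. The sketch below assumes Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the instructor table and the index name idx_instructor_dept are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)"
)

query = "SELECT name FROM instructor WHERE dept_name = 'Physics'"


def plan(sql):
    """Return the plan SQLite chose for sql as one line of text."""
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Without an index the only available plan is to scan the whole table.
plan_without_index = plan(query)

# After an index is created, a cheaper plan exists for the same query.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls; only the plan chosen for evaluating it changes, which is the job the notes assign to the DML compiler.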
Keys
o Keys play an important role in the relational database.
o A key is used to uniquely identify any record or row of data in a table. It is also used to
establish and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In
the PERSON table, passport_number, license_number and SSN are keys since each is unique for
each person.
Types of keys:
1. Primary key
o It is the key chosen to identify one and only one instance of an entity uniquely. An
entity can have multiple keys, as we saw in the PERSON table. The key that is most
suitable from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. We
could instead select License_Number or Passport_Number as the primary key, since they are
also unique, but a table has only one primary key.
o For each entity, the primary key is chosen by the designers according to the requirements.
2. Candidate key
o A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple.
o The primary key is chosen from among the candidate keys. The candidate keys that are not
chosen are as strong as the primary key.
For example: In the EMPLOYEE table, ID is best suited for the primary key. The other unique
attributes, like SSN, Passport_Number and License_Number, are also candidate keys.
3. Super Key
A super key is an attribute set that can uniquely identify a tuple. Every super key is a superset of a
candidate key.
For example: In the above EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names of
two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this
combination is also a super key.
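The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is written in plain Python; the functions is_superkey and candidate_keys and the three employee rows are hypothetical. Keep in mind that a key is a property of the schema, so passing this check on one set of rows only shows that those rows do not violate the key.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys, found by trying smaller attribute sets first."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip any set that contains an already found key: it is not minimal.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found


# Hypothetical sample rows: two employees share a name.
employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Asha", "SSN": "111"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Asha", "SSN": "222"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Ravi", "SSN": "333"},
]

pair_is_superkey = is_superkey(employees, ("EMPLOYEE_ID", "EMPLOYEE_NAME"))
name_is_superkey = is_superkey(employees, ("EMPLOYEE_NAME",))
keys = candidate_keys(employees, ["EMPLOYEE_ID", "EMPLOYEE_NAME", "SSN"])
print(pair_is_superkey, name_is_superkey, keys)
```

The pair (EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because EMPLOYEE_ID alone already identifies each row.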
4. Foreign key
o A foreign key is a column (or set of columns) of a table that points to the primary key of another
table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we should not store the department's information in
the employee table. Instead, we link these two tables through the primary key of one
table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in
the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and it relates the two tables.
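The EMPLOYEE and DEPARTMENT link can be sketched as follows, assuming Python's sqlite3 module. Only Department_Id comes from the notes; the other columns and the sample rows are hypothetical. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

conn.execute(
    "CREATE TABLE department (Department_Id INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " Department_Id INTEGER REFERENCES department(Department_Id))"  # foreign key
)

conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # department 10 exists

try:
    # Department 99 does not exist, so this row would break the link.
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The foreign key lets the two tables be combined again when needed.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee AS e"
    " JOIN department AS d ON e.Department_Id = d.Department_Id"
).fetchall()
print(rejected, joined)
```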
5. Alternate key
There may be one or more attributes or combinations of attributes that uniquely identify each
tuple in a relation. These attributes or combinations of attributes are called the candidate
keys. One of them is chosen as the primary key, and each remaining
candidate key, if any exists, is termed an alternate key. In other words, the number of
alternate keys is the number of candidate keys minus one. An alternate key
may or may not exist: if there is only one candidate key in a relation, the relation has no
alternate key.
For example, an employee relation has two attributes, Employee_Id and PAN_No, that act as
candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate
key, PAN_No, acts as the alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key.
This key is also known as a concatenated key.
For example, in an employee relation, we assume that an employee may be assigned multiple
roles and may work on multiple projects simultaneously. So the primary key will
be composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination.
These attributes act as a composite key since the primary key comprises more than one attribute.
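A composite key of this kind can be declared directly in the table definition. The sketch below assumes Python's sqlite3 module; the table name assignment and the sample values are hypothetical, while the three attribute names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " Emp_ID INTEGER,"
    " Emp_role TEXT,"
    " Proj_ID INTEGER,"
    " PRIMARY KEY (Emp_ID, Emp_role, Proj_ID))"  # composite key
)

# The same employee may hold several roles and work on several projects.
conn.execute("INSERT INTO assignment VALUES (1, 'dev', 100)")
conn.execute("INSERT INTO assignment VALUES (1, 'tester', 100)")
conn.execute("INSERT INTO assignment VALUES (1, 'dev', 200)")

try:
    # Repeating all three values together violates the composite key.
    conn.execute("INSERT INTO assignment VALUES (1, 'dev', 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
print(duplicate_rejected, row_count)
```

No single attribute is unique here; only the combination of the three is.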
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys are
created when a primary key is large and complex and has no relationship with many other
relations. The data values of artificial keys are usually numbered in serial order.
For example, the primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the
employee relation. So it would be better to add a new artificial attribute to identify each tuple in
the relation uniquely.
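An artificial key with serially numbered values might look like the sketch below, which assumes Python's sqlite3 module and SQLite's AUTOINCREMENT keyword. The table name assignment and the column assignment_id are hypothetical; the UNIQUE constraint keeps the original three-attribute combination unique as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # artificial key
    " Emp_ID INTEGER NOT NULL,"
    " Emp_role TEXT NOT NULL,"
    " Proj_ID INTEGER NOT NULL,"
    " UNIQUE (Emp_ID, Emp_role, Proj_ID))"  # the natural key stays unique
)

# The artificial key is not supplied; the system assigns it in serial order.
for triple in [(1, "dev", 100), (1, "tester", 100), (2, "dev", 100)]:
    conn.execute(
        "INSERT INTO assignment (Emp_ID, Emp_role, Proj_ID) VALUES (?, ?, ?)",
        triple,
    )

ids = [
    row[0]
    for row in conn.execute(
        "SELECT assignment_id FROM assignment ORDER BY assignment_id"
    )
]
print(ids)
```

Other tables can now refer to a row through the single column assignment_id instead of repeating all three attributes.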
Relational query languages, such as relational algebra, express queries on the tables in the
database. In the relational model, a table is known as a relation, its records or rows are
referred to as tuples, and its columns are known as attributes. Each pair of
terms is used interchangeably in relational databases.
1. Select Operation:
o The select operation selects the tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
Notation: σ p(r)
Where:
σ is the selection operator
r is the relation
p is the selection predicate, a propositional logic formula which may use connectives like AND, OR and NOT.
The formula may also use relational operators like =, ≠, ≥, <, >, ≤.
Input:
σ BRANCH_NAME="perryride" (LOAN)
Output:
The tuples of the LOAN relation whose BRANCH_NAME is "perryride".
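The select operation can be sketched in plain Python by treating a relation as a list of rows and the predicate p as a function. The select function, the attributes LOAN_NO and AMOUNT, and the three LOAN tuples are hypothetical, since the notes do not list the contents of LOAN.

```python
def select(relation, predicate):
    """σ predicate (relation): keep the tuples for which predicate is true."""
    return [row for row in relation if predicate(row)]


# Hypothetical LOAN tuples, invented for this example.
LOAN = [
    {"BRANCH_NAME": "downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

# σ BRANCH_NAME="perryride" (LOAN)
perryride_loans = select(LOAN, lambda t: t["BRANCH_NAME"] == "perryride")

# Predicates can combine comparisons with AND, OR and NOT.
large_perryride = select(
    LOAN, lambda t: t["BRANCH_NAME"] == "perryride" and t["AMOUNT"] > 1400
)

print([t["LOAN_NO"] for t in perryride_loans])
print([t["LOAN_NO"] for t in large_perryride])
```

Selection keeps whole tuples: every attribute of a chosen row is still present in the result.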
2. Project Operation:
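o This operation shows the list of those attributes that we wish to appear in the result. The rest
of the attributes are eliminated from the table.
o It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Example result of projecting a relation onto the attributes NAME and CITY:
NAME CITY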
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
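Projection can be sketched in plain Python in the same style. The project function, the relation name CUSTOMER and the STREET attribute are hypothetical; the NAME and CITY values are taken from the table above. Because a relation is a set, duplicate rows that appear after attributes are dropped are removed.

```python
def project(relation, attributes):
    """∏ attributes (relation): keep only the named attributes, without duplicates."""
    seen = set()
    result = []
    for row in relation:
        value = tuple(row[a] for a in attributes)
        if value not in seen:  # a relation is a set, so repeats are dropped
            seen.add(value)
            result.append(dict(zip(attributes, value)))
    return result


# Hypothetical relation: STREET is invented, NAME and CITY follow the table above.
CUSTOMER = [
    {"NAME": "Jones", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Hays", "STREET": "Main", "CITY": "Harrison"},
]

name_city = project(CUSTOMER, ["NAME", "CITY"])  # ∏ NAME, CITY (CUSTOMER)
cities = project(CUSTOMER, ["CITY"])             # ∏ CITY (CUSTOMER)
print(name_city)
print(cities)
```

Projecting on CITY alone yields two rows from three, since Harrison appears twice in the input.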
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that
are either in R or S or both in R & S.
o R and S must have the same number of attributes, with compatible domains.
o It eliminates the duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
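The union example can be reproduced with Python sets, which drop duplicates in the same way. This is a sketch in plain Python using the DEPOSITOR and BORROW tuples listed above; the variable names are chosen for the example.

```python
# Tuples copied from the DEPOSITOR and BORROW relations above.
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

# Project both relations on CUSTOMER_NAME so that they have the same attributes.
depositor_names = {name for name, _ in DEPOSITOR}
borrower_names = {name for name, _ in BORROW}

# Union: every customer who has an account, a loan, or both.
all_customers = borrower_names | depositor_names
print(sorted(all_customers))
```

Johnson and Smith each occur twice in the input relations but only once in the result.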
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples
that are in both R & S.
o It is denoted by ∩.
Notation: R ∩ S
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples
that are in R but not in S.
o It is denoted by minus (-).
Notation: R - S
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
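Set intersection and set difference can be checked in plain Python with the customer names from the DEPOSITOR and BORROW relations above. This is only a sketch; the variable names are chosen for the example.

```python
# Customer names projected from the DEPOSITOR and BORROW relations above.
depositor_names = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
borrower_names = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

# Intersection: customers who have both a loan and an account.
both = borrower_names & depositor_names

# Difference: customers who have a loan but no account.
loan_only = borrower_names - depositor_names

# Difference is not symmetric: swapping the operands gives another result.
account_only = depositor_names - borrower_names

print(sorted(both))
print(sorted(loan_only))
print(sorted(account_only))
```

Mayes (a depositor) and Hayes (a borrower) are different names, so neither appears in the intersection.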
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the
other table. It is also known as a cross product.
o It is denoted by X.
Notation: E X D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
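The Cartesian product above can be reproduced with itertools.product from Python's standard library. This is a sketch that uses the EMPLOYEE and DEPARTMENT tuples from the example; the variable name cross is chosen here.

```python
from itertools import product

# Tuples copied from the EMPLOYEE and DEPARTMENT relations above.
EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Pair every EMPLOYEE tuple with every DEPARTMENT tuple and join the pair.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]

for row in cross:
    print(*row)
```

With 3 tuples in each relation the product has 3 × 3 = 9 tuples, most of which pair an employee with a department that is not theirs. A selection on the matching department columns is what turns the product into a meaningful join.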
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename the STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
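Rename changes only the name of a relation, not its contents. The sketch below shows this in plain Python; the Relation type, the rename function and the sample student rows are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A relation as a name, a tuple of attribute names and a set of rows."""
    name: str
    attributes: tuple
    rows: frozenset


def rename(new_name, relation):
    """ρ(new_name, relation): same attributes and tuples under a new name."""
    return relation._replace(name=new_name)


# Hypothetical sample data for the STUDENT relation.
STUDENT = Relation(
    "STUDENT",
    ("ID", "NAME"),
    frozenset({(1, "Asha"), (2, "Ravi")}),
)

STUDENT1 = rename("STUDENT1", STUDENT)  # ρ(STUDENT1, STUDENT)
print(STUDENT1.name, STUDENT1.attributes, sorted(STUDENT1.rows))
```

The original STUDENT relation is left unchanged, which is useful when a relation has to be combined with a copy of itself and the two copies need different names.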