
DBMS Unit-I

Bsc

Uploaded by srigopi1415
Copyright © All Rights Reserved

Introduction

A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way
to store and retrieve database information that is both convenient and efficient.
Database systems are designed to manage large bodies of information. Management of
data involves both defining structures for storage of information and providing mechanisms for
the manipulation of information.
Database-System Applications
Databases are widely used. Here are some representative applications:
• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and
for generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items in
factories, inventories of items in warehouses and stores,
and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking,
generation of recommendation lists, and maintenance of online product
evaluations.
• Banking and Finance
◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds; also for storing real-time market data to enable online trading by
customers and automated trading by the firm.
• Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
Purpose of Database Systems
Database systems arose in response to early methods of computerized management
of commercial data. As an example of such methods, typical of the 1960s, consider part of a
university organization that, among other data, keeps information about all instructors, students,
departments, and course offerings. One way to keep the information on a computer is to store it
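in operating system files. To allow users to manipulate the information, the system has a number
of application programs that manipulate the files, including programs to: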
• Add new students, instructors, and courses
• Register students for courses and generate class rosters
• Assign grades to students, compute grade point averages (GPA), and generate
transcripts
System programmers wrote these application programs to meet the needs of the
university.
New application programs are added to the system as the need arises. For example, suppose that
a university decides to create a new major (say, computer science). As a result, the university
creates a new department and creates new permanent files (or adds information to existing files)
to record information about all the instructors in the department, students in that major, course
offerings, degree requirements, etc.
This typical file-processing system is supported by a conventional operating system. The
system stores permanent records in various files, and it needs different application programs to
extract records from, and add records to, the appropriate files. Keeping organizational
information in a file-processing system has a number of major disadvantages:
Data redundancy and inconsistency. Since different programmers create the files and
application programs over a long period, the various files are likely to have different structures,
and the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places (files). For example, if a student has a double
major (say, music and mathematics), the address and telephone number of that student
may appear in a file that consists of student records of students in the Music department and in a
file that consists of student records of students in the Mathematics department. This redundancy
leads to higher storage and access cost. In addition, it may lead to data inconsistency; that is, the
various copies of the same data may no longer agree. For example, a changed student address may
be updated in the Music department records but not in the Mathematics department records.
Difficulty in accessing data. Suppose that one of the university clerks needs to find out the
names of all students who live within a particular postal-code area. The clerk asks the data-
processing department to generate such a list. Because the designers of the original system did
not anticipate this request, there is no application program on hand to meet it. There is, however,
an application program to generate the list of all students. The university clerk now has two
choices: either obtain the list of all students and extract the needed information manually or ask a
programmer to write the necessary application program. Both alternatives are obviously
unsatisfactory. Suppose that such a program is written, and that, several days later, the same
clerk needs to trim that list to include only those students who have taken at least 60 credit hours.
As expected, a program to generate such a list does not exist. Again, the clerk has the preceding
two options, neither of which is satisfactory.
Data isolation. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints. Suppose the university maintains an account for each department, and
records the balance amount in each account. Suppose also that the university requires that the
account balance of a department may never fall below zero. Developers enforce these constraints
in the system by adding appropriate code in the various application programs. However, when new
constraints are added, it is difficult to change the programs to enforce them. The problem is
compounded when constraints involve several data items from different files.
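In a DBMS, a rule like "a department balance may never fall below zero" can be declared once in the schema instead of being re-coded in every application program. The sketch below uses Python's standard-library sqlite3 module with SQLite as a stand-in DBMS; the account table, its columns and the balance figures are assumptions for illustration.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        dept_name TEXT PRIMARY KEY,
        balance   NUMERIC NOT NULL CHECK (balance >= 0)  -- the integrity constraint
    )
""")
conn.execute("INSERT INTO account VALUES ('Music', 300)")

def withdraw(dept, amount):
    """Try to debit a department; return False if the DBMS rejects the update."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE dept_name = ?",
                (amount, dept),
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

ok_small = withdraw("Music", 100)  # 300 -> 200: allowed
ok_large = withdraw("Music", 500)  # would give -300: rejected by the DBMS
balance = conn.execute(
    "SELECT balance FROM account WHERE dept_name = 'Music'"
).fetchone()[0]
print(ok_small, ok_large, balance)  # True False 200
```

No application code checks the balance here; the DBMS enforces the constraint for every program that updates the table.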
Atomicity problems. A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure. Consider a program to transfer $500 from the account balance of
department A to the account balance of department B. If a system failure occurs during the
execution of the program, it is possible that the $500 was removed from the balance of
department A but was not credited to the balance of department B, resulting in an inconsistent
database state. Clearly, it is essential to database consistency that either both the credit and debit
occur, or that neither occur. That is, the funds transfer must be atomic.
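A DBMS provides atomicity through transactions. A minimal sketch with Python's sqlite3 module, assuming a hypothetical account table for departments A and B; the "system failure" is simulated by raising an exception between the debit and the credit, which is only an approximation of a real crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (dept_name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

def transfer(src, dst, amount, fail_midway=False):
    """Move money from src to dst as one atomic transaction."""
    try:
        with conn:  # the transaction is committed on success, rolled back on exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE dept_name = ?",
                         (amount, src))
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated system failure")
            conn.execute("UPDATE account SET balance = balance + ? WHERE dept_name = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances():
    return dict(conn.execute("SELECT dept_name, balance FROM account"))

transfer("A", "B", 500, fail_midway=True)
after_failure = balances()  # {'A': 1000, 'B': 200}: neither update survived
transfer("A", "B", 500)
after_success = balances()  # {'A': 500, 'B': 700}: both updates applied
print(after_failure, after_success)
```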
Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. For example,
suppose a registration program maintains a count of students registered for a course, in order to
enforce limits on the number of students registered. When a student registers, the program reads
the current count for the course, verifies that the count is not already at the limit, adds one to the
count, and stores the count back in the database. Suppose two students register concurrently,
with the count at (say) 39. The two program executions may both read the value 39, and both
would then write back 40, leading to an incorrect increase of only 1, even though two students
successfully registered for the course and the count should be 41. Furthermore, suppose the
course registration limit was 40; in the above case both students would be able to register, leading
to a violation of the limit of 40 students.
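The interleaving described above can be replayed step by step. This is a toy, deterministic model under stated assumptions: a Python dict stands in for the stored count, and the two "programs" are written out as alternating statements rather than real threads or a real DBMS.

```python
# Toy model: `db` is a plain dict standing in for the stored course count.
# No threads and no real DBMS; the interleaving is written out step by step.
db = {"count": 39}
LIMIT = 40  # course registration limit from the example

def read_count():
    return db["count"]

def write_count(value):
    db["count"] = value

# Interleaved execution: both programs read before either one writes.
seen_by_p1 = read_count()    # program 1 reads 39
seen_by_p2 = read_count()    # program 2 also reads 39
p1_ok = seen_by_p1 < LIMIT   # 39 < 40, so program 1 admits its student
p2_ok = seen_by_p2 < LIMIT   # 39 < 40, so program 2 admits its student too
write_count(seen_by_p1 + 1)  # program 1 writes 40
write_count(seen_by_p2 + 1)  # program 2 writes 40, overwriting program 1's update
interleaved_count = db["count"]
print(interleaved_count, p1_ok and p2_ok)  # 40 True: one update lost, limit exceeded

# Serial execution: each read-check-write finishes before the next one starts.
db["count"] = 39
admitted = 0
for _ in range(2):
    current = read_count()
    if current < LIMIT:
        write_count(current + 1)
        admitted += 1
serial_count = db["count"]
print(serial_count, admitted)  # 40 1: the second student is turned away
```

A DBMS aims to give the serial result without running everything one at a time, using concurrency control such as locking. In SQL the check and the increment can also be written as one statement, for example `UPDATE course SET enrolled = enrolled + 1 WHERE enrolled < 40`, where the table and column names are hypothetical.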
Security problems. Not every user of the database system should be able to access all the data.
For example, in a university, payroll personnel need to see only that part of the database that has
financial information. They do not need access to information about academic records. But, since
application programs are added to the file-processing system in an ad hoc manner, enforcing
such security constraints is difficult.
View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with an
abstract view of the data. That is, the system hides certain details of how the data are stored and
maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database
system users are not computer trained, developers hide the complexity from users through
several levels of abstraction, to simplify users' interactions with the system:
• Physical level. The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex physical-level
structures, the user of the logical level does not need to be aware of this complexity. This is
referred to as physical data independence. Database administrators, who must decide what
information to keep in the database, use the logical level of abstraction.
• View level. The highest level of abstraction describes only part of the entire database. Even
though the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database.
Many users of the database system do not need all this information; instead,
they need to access only a part of the database. The view level of abstraction
exists to simplify their interaction with the system. The system may provide
many views for the same database.
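In SQL systems, one common way to provide the view level is a view defined over the logical-level tables. A minimal sketch with Python's sqlite3 module; the instructor table, the sample rows and the view name instructor_directory are assumptions for illustration. The view hides the salary column from users who do not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full instructor relation (names and data are illustrative).
conn.execute("CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000),
     ("12121", "Wu", "Finance", 90000),
     ("15151", "Mozart", "Music", 40000)],
)
# View level: a directory view that exposes only some of the columns.
conn.execute("CREATE VIEW instructor_directory AS SELECT id, name, dept_name FROM instructor")

# cursor.description lists the columns the view makes visible.
columns = [d[0] for d in conn.execute("SELECT * FROM instructor_directory").description]
rows = conn.execute("SELECT name FROM instructor_directory ORDER BY name").fetchall()
print(columns)  # ['id', 'name', 'dept_name']
print(rows)     # [('Mozart',), ('Srinivasan',), ('Wu',)]
```

Several such views can be defined over the same tables, one per group of users.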

Database Languages:
Database languages, sometimes loosely called query languages or data query languages, are a class
of programming languages that developers use to define and access databases, which are
collections of organized data that users can access electronically. These languages allow users to
complete tasks such as controlling access to data, defining and updating data and searching for
information within the database management system (DBMS). A query is a statement requesting
the retrieval of information. The portion of a DML that involves information retrieval is called a
query language.

1. Data definition language (DDL)

Data definition language (DDL) creates the framework of the database by specifying the
database schema, which is the structure that represents the organization of data. Its common uses
include the creation and alteration of tables, files, indexes and columns within the database. This
language also allows users to rename or drop the existing database or its components. Here's a
list of DDL statements:
• CREATE: Creates a new database or object, such as a table or index
• ALTER: Changes the structure of the database or object
• DROP: Deletes the database or existing objects
• RENAME: Renames the database or existing objects
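The four DDL statements can be tried with Python's sqlite3 module. This is a sketch under SQLite's dialect: SQLite has no standalone RENAME statement, so renaming is done with ALTER TABLE ... RENAME TO. The dept/department table is an invented example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names():
    """List user tables from SQLite's catalog (its data dictionary)."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# CREATE: define a new table.
conn.execute("CREATE TABLE dept (dept_name TEXT PRIMARY KEY, building TEXT)")

# ALTER: change the structure by adding a column.
conn.execute("ALTER TABLE dept ADD COLUMN budget INTEGER")

# RENAME: spelled ALTER TABLE ... RENAME TO in SQLite.
conn.execute("ALTER TABLE dept RENAME TO department")

tables_before_drop = table_names()
cols = [r[1] for r in conn.execute("PRAGMA table_info(department)")]  # r[1] is the column name
print(tables_before_drop, cols)  # ['department'] ['dept_name', 'building', 'budget']

# DROP: delete the table together with its definition.
conn.execute("DROP TABLE department")
tables_after_drop = table_names()
print(tables_after_drop)  # []
```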

2. Data manipulation language (DML)

Data manipulation language (DML) provides operations that handle user requests, offering a way
to access and manipulate the data that users store within a database. Its common functions
include inserting, updating, deleting and retrieving data. Here's a list of DML
statements:
• INSERT: Adds new rows to an existing table
• UPDATE: Changes values in existing rows of a table
• DELETE: Removes rows from a table
• SELECT: Retrieves data from one or more tables
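Here is each DML statement in turn, run through Python's sqlite3 module. The student table and its rows are invented for illustration; the final SELECT is the kind of ad hoc request from earlier (students with at least 60 credit hours) that a DBMS answers without a new application program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, tot_cred INTEGER)")

# INSERT: add rows (sample data is invented).
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("S1", "Zhang", 32), ("S2", "Shankar", 58), ("S3", "Brandt", 80)])

# UPDATE: change values in existing rows.
conn.execute("UPDATE student SET tot_cred = tot_cred + 4 WHERE id = 'S2'")

# DELETE: remove rows.
conn.execute("DELETE FROM student WHERE id = 'S1'")

# SELECT: retrieve the rows that satisfy a condition.
result = conn.execute(
    "SELECT name, tot_cred FROM student WHERE tot_cred >= 60 ORDER BY name"
).fetchall()
print(result)  # [('Brandt', 80), ('Shankar', 62)]
```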

3. Data control language (DCL)

Data control language (DCL) controls access to the data that users store within a database.
Essentially, this language controls the rights and permissions of the database system. It allows
users to grant or revoke privileges to the database. Here's a list of DCL statements:
• GRANT: Gives a user access to the database
• REVOKE: Removes a user's access to the database

4. Transaction control language (TCL)

Transaction control language (TCL) manages the transactions within a database. A transaction
groups a set of related tasks into a single logical unit of work: either all the tasks succeed and
their effects become permanent, or none of them take effect. Here's a list of TCL statements:
• COMMIT: Makes the changes of the current transaction permanent
• ROLLBACK: Undoes the changes of the current transaction, for example when a task fails
• SAVEPOINT: Marks a point within a transaction to which it can later be rolled back
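A sketch of COMMIT, ROLLBACK and SAVEPOINT using Python's sqlite3 module. Passing isolation_level=None stops the driver from opening transactions on its own, so the TCL statements run exactly as written. The enrollment table and course codes are hypothetical.

```python
import sqlite3

# isolation_level=None: the driver issues no implicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO enrollment VALUES ('S1', 'CS-101')")
conn.execute("SAVEPOINT after_first")                           # mark a point in the transaction
conn.execute("INSERT INTO enrollment VALUES ('S1', 'CS-999')")  # suppose this turns out to be wrong
conn.execute("ROLLBACK TO SAVEPOINT after_first")               # undo only the work after the savepoint
conn.execute("COMMIT")                                          # make the remaining work permanent

kept = conn.execute("SELECT course FROM enrollment").fetchall()
print(kept)  # [('CS-101',)]

conn.execute("BEGIN")
conn.execute("DELETE FROM enrollment")
conn.execute("ROLLBACK")                                        # undo the whole transaction
count_after_rollback = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(count_after_rollback)  # 1
```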

Relational Databases:
A relational database is based on the relational model and uses a collection of tables to
represent both data and the relationships among those data. It also includes a DML and DDL.
Tables
Each table has multiple columns, and each column has a unique name. The sample relational
database in the accompanying figure comprises two tables.
Database Design
Database systems are designed to manage large bodies of information. These large bodies of
information do not exist in isolation. They are part of the operation of some enterprise whose end
product may be information from the database or may be some device or service for which the
database plays only a supporting role. Database design mainly involves the design of the
database schema. The design of a complete database application environment that meets the
needs of the enterprise being modeled requires attention to a broader set of issues. The overall
design of the database is called the database schema.
Database Design for a University Organization
To illustrate the design process, let us examine how a database for a university
could be designed. The initial specification of user requirements may be based
on interviews with the database users, and on the designer’s own analysis of
the organization. The description that arises from this design phase serves as the
basis for specifying the conceptual structure of the database. Here are the major
characteristics of the university.
• The university is organized into departments. Each department is identified
by a unique name (dept_name), is located in a particular building, and has a
budget.
• Each department has a list of courses it offers. Each course has associated with
it a course_id, title, dept_name, and credits, and may also have associated
prerequisites.
• Instructors are identified by their unique ID. Each instructor has a name, an associated
department (dept_name), and a salary.
• Students are identified by their unique ID. Each student has a name, an associated
major department (dept_name), and tot_cred (total credit hours the student
earned thus far).
• The university maintains a list of classrooms, specifying the name of the
building, room_number, and room capacity.
• The university maintains a list of all classes (sections) taught. Each section is
identified by a course_id, sec_id, year, and semester, and has associated with it
a semester, year, building, room_number, and time_slot_id (the time slot when the
class meets).
• The department has a list of teaching assignments specifying, for each instructor,
the sections the instructor is teaching.
• The university has a list of all student course registrations, specifying, for
each student, the courses and the associated sections that the student has
taken (registered for).
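The requirements above translate fairly directly into DDL. The sketch below writes part of the university schema in SQL and runs it through Python's sqlite3 module; which relations are included, the column types and the REFERENCES clauses are assumptions for illustration, not the design from these notes. Note how section uses a four-attribute primary key, matching "identified by a course_id, sec_id, year, and semester".

```python
import sqlite3

# Partial, illustrative schema. SQLite enforces REFERENCES only when
# PRAGMA foreign_keys = ON is set; here the clauses just document the links.
SCHEMA = """
CREATE TABLE department (
    dept_name TEXT PRIMARY KEY,
    building  TEXT,
    budget    NUMERIC
);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    title     TEXT,
    dept_name TEXT REFERENCES department(dept_name),
    credits   INTEGER
);
CREATE TABLE instructor (
    ID        TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    dept_name TEXT REFERENCES department(dept_name),
    salary    NUMERIC
);
CREATE TABLE student (
    ID        TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    dept_name TEXT REFERENCES department(dept_name),
    tot_cred  INTEGER DEFAULT 0
);
CREATE TABLE section (
    course_id    TEXT REFERENCES course(course_id),
    sec_id       TEXT,
    semester     TEXT,
    year         INTEGER,
    building     TEXT,
    room_number  TEXT,
    time_slot_id TEXT,
    PRIMARY KEY (course_id, sec_id, semester, year)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['course', 'department', 'instructor', 'section', 'student']
```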
The query processor is important because it helps the database system to
simplify and facilitate access to data. The query processor allows database users to obtain good
performance while being able to work at the view level and not be burdened with understanding
the physical-level details of the implementation of the system. It is the job of the database system
to translate updates and queries written in a nonprocedural language, at the logical level, into an
efficient sequence of operations at the physical level.
The Query Processor
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.

A query can usually be translated into any of a number of alternative evaluation plans that all give
the same result. The DML compiler also performs query optimization; that is, it picks the lowest cost
evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated
by the DML compiler.
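The choice of evaluation plan can be observed in SQLite, whose query planner picks among alternative plans. A sketch with Python's sqlite3 module: EXPLAIN QUERY PLAN reports the chosen plan instead of running the query. The student table and the index name idx_student_dept are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT)")
conn.execute("CREATE INDEX idx_student_dept ON student(dept_name)")

query = "SELECT name FROM student WHERE dept_name = ?"
# Ask the planner which plan it would use; the query itself is not executed.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Music",)).fetchall()
for row in plan:
    print(row[-1])  # last column: human-readable step, e.g. a SEARCH ... USING INDEX line
```

With the index present, the planner can look up matching rows through the index rather than scanning the whole table; the user's SQL stays the same either way.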

Keys
o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data in a table. They are also used to
establish and identify relationships between tables.

For example, ID is used as a key in the Student table because it is unique for each student. In
the PERSON table, passport_number, license_number, SSN are keys since they are unique for
each person.
Types of keys:

1. Primary key

o The primary key is the key chosen to identify each instance of an entity uniquely. An
entity can have multiple keys, as we saw in the PERSON table. The key that is most
suitable among them becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee.
License_Number or Passport_Number could also have been chosen as the primary key, since
they are unique too, but a table has only one primary key.
o For each entity, the choice of primary key depends on the requirements and on the developers.

2. Candidate key

o A candidate key is an attribute or a minimal set of attributes that can uniquely identify a
tuple: no proper subset of a candidate key can do so.
o The primary key is chosen from among the candidate keys. The remaining candidate keys are
as strong as the primary key.

For example: In the EMPLOYEE table, ID is best suited for the primary key. The other attributes
that are unique for each employee, such as SSN, Passport_Number and License_Number, are also
candidate keys.

3. Super Key

Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of
two employees can be the same, but their EMPLOYEE_ID values can't be the same. Hence, this
combination can also be a key.

Super keys would include EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), and so on.
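The difference between super keys and candidate keys can be checked mechanically on sample data. A sketch in plain Python, with invented EMPLOYEE rows: is_superkey tests whether no two rows agree on the chosen attributes, and candidate_keys keeps only the minimal super keys. One caveat: keys are a property of the schema (every legal instance), so a check on one instance can rule an attribute set out but cannot prove it is a key.

```python
from itertools import combinations

# Invented sample rows; attribute names follow the text.
EMPLOYEE = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Smith", "SSN": "111", "PASSPORT_NUMBER": "P9"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Smith", "SSN": "222", "PASSPORT_NUMBER": "P7"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Harry", "SSN": "333", "PASSPORT_NUMBER": "P8"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs (on this instance only)."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Minimal super keys: super keys with no smaller super key inside them."""
    attrs = list(rows[0])
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if is_superkey(rows, combo) and not any(set(k) <= set(combo) for k in found):
                found.append(combo)
    return found

name_only = is_superkey(EMPLOYEE, ("EMPLOYEE_NAME",))               # False: two Smiths
id_and_name = is_superkey(EMPLOYEE, ("EMPLOYEE_ID", "EMPLOYEE_NAME"))  # True, but not minimal
keys = candidate_keys(EMPLOYEE)
print(name_only, id_and_name, keys)
# False True [('EMPLOYEE_ID',), ('SSN',), ('PASSPORT_NUMBER',)]
```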

4. Foreign key

o A foreign key is a column (or set of columns) of one table that refers to the primary key of
another table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we do not store the department's information in
the employee table. Instead, we link these two tables through the primary key of one
table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in
the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and it relates the two tables.
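A DBMS can enforce the foreign key so that every Department_Id in EMPLOYEE matches an existing department. A sketch with Python's sqlite3 module, where the sample rows are invented; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE DEPARTMENT (Department_Id INTEGER PRIMARY KEY, Dept_Name TEXT)")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Emp_Id        INTEGER PRIMARY KEY,
        Emp_Name      TEXT,
        Department_Id INTEGER REFERENCES DEPARTMENT(Department_Id)  -- the foreign key
    )
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Sales')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Smith', 10)")  # OK: department 10 exists

try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (2, 'Harry', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The shared Department_Id column is what relates the two tables.
joined = conn.execute("""
    SELECT e.Emp_Name, d.Dept_Name
    FROM EMPLOYEE e JOIN DEPARTMENT d ON e.Department_Id = d.Department_Id
""").fetchall()
print(rejected, joined)  # True [('Smith', 'Sales')]
```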

5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each
tuple in a relation. These attributes or combinations of the attributes are called the candidate
keys. One key is chosen as the primary key from these candidate keys, and the remaining
candidate keys, if any exist, are termed alternate keys. In other words, the number of
alternate keys is the number of candidate keys minus one (the primary key). An alternate key
may or may not exist. If there is only one candidate key in a relation, it does not have an
alternate key.

For example, an employee relation has two attributes, Employee_Id and PAN_No, that act as
candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate
key, PAN_No, acts as the Alternate key.

6. Composite key

Whenever a primary key consists of more than one attribute, it is known as a composite key.
This key is also known as Concatenated Key.
For example, in an employee relation, assume that an employee may be assigned multiple
roles, and an employee may work on multiple projects simultaneously. So the primary key will
be composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So
these attributes act as a composite key since the primary key comprises more than one attribute.

7. Artificial key

Keys created from arbitrarily assigned data are known as artificial keys. These keys are
created when a primary key is large and complex and has no relationship with many other
relations. The values of an artificial key are usually assigned in serial order.

For example, the primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the
employee relation. So it would be better to add a new artificial attribute (for example, a serial
number) to identify each tuple in the relation uniquely.
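Both ideas can be declared in SQL. A sketch with Python's sqlite3 module: the first table uses the composite key (Emp_ID, Emp_role, Proj_ID); the second replaces it with a serial artificial key. The table names and the assignment_id column are hypothetical, and keeping the old composite key as a UNIQUE alternate key is a design choice assumed here, not something the text requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite key: no single column is unique, but the combination is.
conn.execute("""
    CREATE TABLE emp_assignment (
        Emp_ID   INTEGER,
        Emp_role TEXT,
        Proj_ID  TEXT,
        PRIMARY KEY (Emp_ID, Emp_role, Proj_ID)
    )
""")
conn.execute("INSERT INTO emp_assignment VALUES (1, 'Developer', 'P1')")
conn.execute("INSERT INTO emp_assignment VALUES (1, 'Tester', 'P1')")  # same employee, new role: OK
try:
    conn.execute("INSERT INTO emp_assignment VALUES (1, 'Developer', 'P1')")  # duplicate key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Artificial key: a serial number identifies each tuple instead.
conn.execute("""
    CREATE TABLE emp_assignment2 (
        assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- hypothetical artificial key
        Emp_ID   INTEGER,
        Emp_role TEXT,
        Proj_ID  TEXT,
        UNIQUE (Emp_ID, Emp_role, Proj_ID)  -- old composite key kept as an alternate key
    )
""")
conn.executemany(
    "INSERT INTO emp_assignment2 (Emp_ID, Emp_role, Proj_ID) VALUES (?, ?, ?)",
    [(1, "Developer", "P1"), (1, "Tester", "P1"), (2, "Developer", "P2")],
)
ids = [r[0] for r in conn.execute("SELECT assignment_id FROM emp_assignment2 ORDER BY assignment_id")]
print(duplicate_rejected, ids)  # True [1, 2, 3]
```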

Relational Query Languages


Relational query languages break user requests down into operations on relations and
instruct the DBMS to execute them. A query language is the language through which a user
communicates with the database. Relational query languages can be
procedural or non-procedural.

Procedural Query Language


A procedural query language uses a sequence of queries that instruct the DBMS to
perform operations in a given order to meet the user request. For
example, a get_CGPA procedure would have several queries to get the marks of a student
in each subject, calculate the total marks, and then decide the CGPA based on the
total marks. A procedural query language tells the database what is required from
the database and how to get it. Relational algebra is a
procedural query language.
Non-Procedural Query Language
A non-procedural query can be a single query on one or more tables that gets the result
from the database. For example, getting the name and address of the student with a
particular ID needs a single query on the STUDENT table. Relational calculus is a
non-procedural language: it states what to retrieve from the tables, but not how to
retrieve it.

These query languages operate on the tables in the database. In the
relational model, a table is known as a relation, the records (rows) of a table are
referred to as tuples, and the columns of a table are known as attributes. These
names are used interchangeably for relational databases.

Types of Relational Operations

1. Select Operation:

o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

Notation: σ p(r)

Where:
σ denotes the selection operation
r is the relation
p is a propositional-logic formula (the selection predicate), which may use the connectives AND, OR and NOT.
Its terms can use the relational operators =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME  LOAN_NO  AMOUNT
Downtown     L-17     1000
Redwood      L-23     2000
Perryride    L-15     1500
Downtown     L-14     1500
Mianus       L-13     500
Roundhill    L-11     900
Perryride    L-16     1300

Input:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME  LOAN_NO  AMOUNT
Perryride    L-15     1500
Perryride    L-16     1300
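Selection is easy to model in plain Python. This is an illustrative sketch, not a DBMS API: a relation is assumed to be a list of dicts (one per tuple), and the predicate p is an ordinary Python function.

```python
# The LOAN relation from the example, one dict per tuple.
LOAN = [
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood",   "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus",    "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

def select(predicate, relation):
    """σ predicate (relation): keep only the tuples for which predicate is true."""
    return [t for t in relation if predicate(t)]

perryride = select(lambda t: t["BRANCH_NAME"] == "Perryride", LOAN)
for t in perryride:
    print(t["BRANCH_NAME"], t["LOAN_NO"], t["AMOUNT"])
# Perryride L-15 1500
# Perryride L-16 1300

# Predicates can combine comparisons with AND / OR / NOT:
big_perryride = select(lambda t: t["BRANCH_NAME"] == "Perryride" and t["AMOUNT"] > 1400, LOAN)
```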

2. Project Operation:

o This operation shows the list of those attributes that we wish to appear in the result. The rest
of the attributes are eliminated from the table, and duplicate tuples are removed from the result.
o It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Example: CUSTOMER RELATION

NAME     STREET   CITY
Jones    Main     Harrison
Smith    North    Rye
Hays     Main     Harrison
Curry    North    Rye
Johnson  Alma     Brooklyn
Brooks   Senator  Brooklyn

Input:

∏ NAME, CITY (CUSTOMER)

Output:

NAME     CITY
Jones    Harrison
Smith    Rye
Hays     Harrison
Curry    Rye
Johnson  Brooklyn
Brooks   Brooklyn
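A plain-Python sketch of projection, assuming a relation is a list of dicts. Because relations are sets, the sketch also drops duplicate result tuples, which shows up when projecting CUSTOMER on CITY alone.

```python
# The CUSTOMER relation from the example, one dict per tuple.
CUSTOMER = [
    {"NAME": "Jones",   "STREET": "Main",    "CITY": "Harrison"},
    {"NAME": "Smith",   "STREET": "North",   "CITY": "Rye"},
    {"NAME": "Hays",    "STREET": "Main",    "CITY": "Harrison"},
    {"NAME": "Curry",   "STREET": "North",   "CITY": "Rye"},
    {"NAME": "Johnson", "STREET": "Alma",    "CITY": "Brooklyn"},
    {"NAME": "Brooks",  "STREET": "Senator", "CITY": "Brooklyn"},
]

def project(attrs, relation):
    """∏ attrs (relation): keep only the listed attributes and drop duplicate tuples."""
    result, seen = [], set()
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # relations are sets, so each tuple appears once
            seen.add(row)
            result.append(row)
    return result

name_city = project(["NAME", "CITY"], CUSTOMER)  # six tuples, as in the output above
cities = project(["CITY"], CUSTOMER)
print(cities)  # [('Harrison',), ('Rye',), ('Brooklyn',)]
```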

3. Union Operation:

o Suppose there are two relations R and S. The union operation contains all the tuples that
are either in R or S or in both R and S.
o It eliminates the duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must satisfy the following conditions:

o R and S must have the same number of attributes, and corresponding attributes must have
compatible domains.
o Duplicate tuples are eliminated automatically.

Example:

DEPOSITOR RELATION

CUSTOMER_NAME  ACCOUNT_NO
Johnson        A-101
Smith          A-121
Mayes          A-321
Turner         A-176
Johnson        A-273
Jones          A-472
Lindsay        A-284

BORROW RELATION

CUSTOMER_NAME  LOAN_NO
Jones          L-17
Smith          L-23
Hayes          L-15
Jackson        L-14
Curry          L-93
Smith          L-11
Williams       L-17

Input:

∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes

4. Set Intersection:

o Suppose there are two relations R and S. The set intersection operation contains all tuples
that are in both R and S.
o It is denoted by ∩.

Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Smith
Jones

5. Set Difference:

o Suppose there are two relations R and S. The set difference operation contains all tuples
that are in R but not in S.
o It is denoted by minus (-).

Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
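The three set operations above (union, intersection and difference) map directly onto Python's built-in set operators. A sketch, assuming each relation is a list of tuples; projecting on CUSTOMER_NAME gives one-attribute relations, which are union-compatible, and Python sets drop duplicates just as relations do.

```python
DEPOSITOR = [("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
             ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284")]
BORROW = [("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
          ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17")]

# ∏ CUSTOMER_NAME of each relation, as sets.
borrowers = {name for name, _ in BORROW}
depositors = {name for name, _ in DEPOSITOR}

union = borrowers | depositors         # ∪: in either relation
intersection = borrowers & depositors  # ∩: in both relations
difference = borrowers - depositors    # -: borrowers who are not depositors

print(sorted(union))         # ten distinct names
print(sorted(intersection))  # ['Jones', 'Smith']
print(sorted(difference))    # ['Curry', 'Hayes', 'Jackson', 'Williams']
```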

6. Cartesian product

o The Cartesian product is used to combine each row in one table with each row in the
other table. It is also known as a cross product.
o It is denoted by ×.

Notation: E × D

Example:

EMPLOYEE

EMP_ID  EMP_NAME  EMP_DEPT
1       Smith     A
2       Harry     C
3       John      B

DEPARTMENT

DEPT_NO  DEPT_NAME
A        Marketing
B        Sales
C        Legal

Input:

EMPLOYEE × DEPARTMENT

Output:

EMP_ID  EMP_NAME  EMP_DEPT  DEPT_NO  DEPT_NAME
1       Smith     A         A        Marketing
1       Smith     A         B        Sales
1       Smith     A         C        Legal
2       Harry     C         A        Marketing
2       Harry     C         B        Sales
2       Harry     C         C        Legal
3       John      B         A        Marketing
3       John      B         B        Sales
3       John      B         C        Legal
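A plain-Python sketch of the Cartesian product, assuming relations are lists of tuples; itertools.product pairs every EMPLOYEE tuple with every DEPARTMENT tuple. The last step applies a selection to the product to keep only matching department codes, which is how the product is usually put to work.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

def cartesian(r, s):
    """r × s: concatenate every tuple of r with every tuple of s."""
    return [tr + ts for tr, ts in product(r, s)]

result = cartesian(EMPLOYEE, DEPARTMENT)
print(len(result))  # 9 = 3 × 3
print(result[0])    # (1, 'Smith', 'A', 'A', 'Marketing')

# σ EMP_DEPT = DEPT_NO (EMPLOYEE × DEPARTMENT): each employee with their own department.
matched = [t for t in result if t[2] == t[3]]
```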

7. Rename Operation:

The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename the STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
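A plain-Python sketch of rename, assuming a relation is modeled as a name, an attribute list and a list of tuples. The Relation type, the rename helper and the two STUDENT rows are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    """Hypothetical model of a relation: a name, attribute names and tuples."""
    name: str
    attributes: tuple
    rows: list

def rename(new_name, relation):
    """ρ(new_name, relation): the same tuples under a new relation name."""
    return Relation(new_name, relation.attributes, list(relation.rows))

STUDENT = Relation("STUDENT", ("ID", "NAME"), [(1, "Asha"), (2, "Ravi")])  # invented rows
STUDENT1 = rename("STUDENT1", STUDENT)
print(STUDENT1.name, STUDENT1.rows == STUDENT.rows)  # STUDENT1 True
```

Renaming matters mostly when a relation is combined with itself, for example STUDENT × ρ(STUDENT1, STUDENT), so that the two copies can be told apart.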
