iunit-dbms-notes
Database Management System (Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala
Engineering College)
1. INTRODUCTION TO DATABASE
➢ A database is a collection of data that is related by some aspect. Data is a collection of facts
and figures that can be processed to produce information. Mostly, data represents
recordable facts.
➢ A DBMS consists of a collection of interrelated data and a set of programs to access those
data.
➢ Data aids in producing information, which is based on facts.
➢ A database management system stores data in a way that makes it easier to retrieve and
manipulate, and that helps to produce information.
➢ So a database is a collection of related data that we can use for
o Defining - specifying types of data
o Constructing - storing & populating.
o Manipulating - querying, updating, reporting.
➢ A DBMS is a collection of software programs that allows a user to define data types,
structures and constraints, store data permanently, and modify and delete data.
➢ A DBMS is basically software used to add, modify, delete and select data in a database.
➢ In simpler words, a DBMS is a collection of interrelated data and software programs to
access those data.
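The three uses listed above (defining, constructing, manipulating) can be sketched with Python's built-in `sqlite3` module standing in for a DBMS. The `student` table, its columns and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Defining: specify the types of data (and a simple constraint)
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        marks   INTEGER
    )""")

# Constructing: store the data and populate the table
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ravi", 78), (2, "Priya", 91), (3, "Arun", 65)])

# Manipulating: update a record, then query the data for a report
conn.execute("UPDATE student SET marks = marks + 5 WHERE roll_no = 3")
report = conn.execute(
    "SELECT name, marks FROM student WHERE marks >= 70 ORDER BY marks DESC"
).fetchall()
print(report)  # [('Priya', 91), ('Ravi', 78), ('Arun', 70)]
```

The application never opens or parses a file itself; it only states what it wants, and the DBMS decides how the data is stored and found.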
DISADVANTAGES OF FILE SYSTEMS COMPARED WITH DATABASES
➢ In the early days, file-processing systems were used to store records. Such a system uses
various files for storing the records.
➢ Drawbacks of using file systems to store data:
o Data redundancy and inconsistency
o Multiple file formats, duplication of information in different files
o Difficulty in accessing data
o Need to write a new program to carry out each new task
o Data isolation - multiple files and formats
o Integrity problems- Hard to add new constraints or change existing ones
o Atomicity problem
o Failures may leave the database in an inconsistent state with partial updates carried
out. E.g. a transfer of funds from one account to another should either complete or
not happen at all
o Concurrent access anomalies - concurrent access is needed for performance, but
uncontrolled concurrent updates can leave the data inconsistent
o Security problems
➢ Database systems offer solutions to all the above problems
➢ The typical file processing system is supported by a conventional operating system. The
system stores permanent records in various files, and it needs different application
programs to extract records from, and add records to, the appropriate files.
➢ A file processing system has a number of major disadvantages.
✓ Data redundancy and inconsistency
✓ Difficulty in accessing data
✓ Data isolation - multiple files and formats
✓ Integrity problems
✓ Atomicity of updates
✓ Concurrent access by multiple users
✓ Security problems
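1. Data redundancy and inconsistency: In file processing, every user group maintains its own
files for handling its data processing applications. As a result, the same data may be stored
in several files (redundancy), and the copies may no longer agree after only one of them is
updated (inconsistency).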
2. Difficulty in accessing data: File processing environments do not allow needed data to be
retrieved in a convenient and efficient manner.
3. Data isolation: Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
4. Integrity problems: The data values stored in the database must satisfy certain types of
consistency constraints. Example: The balance of certain types of bank accounts may never fall
below a prescribed amount. Developers enforce these constraints in the system by adding
appropriate code to the various application programs, so adding new constraints or changing
existing ones is difficult.
5. Atomicity problems: Atomic means the transaction must happen in its entirety or not at all. It
is difficult to ensure atomicity in a conventional file processing system. Example: Consider a
program to transfer $50 from account A to account B. If a system failure occurs during the
execution of the program, it is possible that the $50 was removed from account A but was not
credited to account B, resulting in an inconsistent database state.
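A database system solves the $50 transfer problem with transactions. The minimal sketch below, assuming Python's `sqlite3` module as the DBMS and a hypothetical `account` table, simulates a failure between the debit and the credit: because both updates run inside one transaction, the debit is rolled back and no money is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one transaction: all or nothing."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            if fail_midway:
                # Simulated crash after the debit but before the credit
                raise RuntimeError("simulated system failure")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except (RuntimeError, sqlite3.Error):
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, "A", "B", 50, fail_midway=True), balances(conn))
# False {'A': 100, 'B': 200}   (the debit was rolled back)
print(transfer(conn, "A", "B", 50), balances(conn))
# True {'A': 50, 'B': 250}
```

The `CHECK (balance >= 0)` clause also shows the integrity point from item 4: a transfer that would overdraw account A is rejected by the DBMS itself, not by code repeated in every application program.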
6. Concurrent access anomalies: For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. In such an
environment, interaction of concurrent updates is possible and may result in inconsistent data.
To guard against this possibility, the system must maintain some form of supervision. But
supervision is difficult to provide because data may be accessed by many different application
programs that have not been coordinated previously.
Example: When several reservation clerks try to assign a seat on an airline flight, the system
should ensure that each seat can be accessed by only one clerk at a time for assignment to a
passenger.
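One common way a DBMS supervises the seat-assignment case is to make "check that the seat is free" and "assign it" a single statement. The sketch below assumes Python's `sqlite3` and a hypothetical `seat` table; the two clerks run one after the other here, but in most DBMSs the same single conditional `UPDATE` is what prevents two concurrent clerks from both getting the seat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no TEXT PRIMARY KEY, passenger TEXT)")
conn.executemany("INSERT INTO seat (seat_no) VALUES (?)", [("12A",), ("12B",)])
conn.commit()


def assign_seat(conn, seat_no, passenger):
    """Give the seat to `passenger` only if it is still free.

    The free-check and the assignment are one UPDATE statement, so the DBMS
    performs them together; a clerk cannot see the seat as free and then
    overwrite another clerk's assignment.
    """
    with conn:
        cur = conn.execute(
            "UPDATE seat SET passenger = ? WHERE seat_no = ? AND passenger IS NULL",
            (passenger, seat_no))
    return cur.rowcount == 1  # 1 row changed means this clerk got the seat


# Two clerks try to book the same seat
print(assign_seat(conn, "12A", "Kumar"))    # True
print(assign_seat(conn, "12A", "Lakshmi"))  # False: seat already taken
```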
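7. Security problems: Enforcing security constraints in a file processing system is difficult; for
example, it is hard to ensure that each user can access only the part of the data he or she is
authorized to see.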
APPLICATION OF DATABASE
➢ Database Applications
➢ This refers to how data is actually stored in the database, and what data and data structures
the database uses. To describe all this, the database provides users with views of the data,
based on the following concepts:
o Data abstraction
o Instances and schemas
Data abstraction:
➢ Data in a database is stored using very complex data structures, so a user who wants to
access some data would find it very hard to do so if he had to go through these data
structures.
➢ So, to simplify the interaction between the user and the database, the DBMS hides information
that is not of interest to the user; this is called data abstraction. The developer thus hides
complexity from the user and stores an abstract view of the data.
➢ Data abstraction has three levels of abstraction:
Physical level: This is the lowest level of data abstraction, which describes how data is actually
stored in the database. This level describes the data structures and the access paths/indexing used
for accessing files.
Logical level: The next level of abstraction describes what data are stored in the database and
what relationships exist among those data.
View level: At this level, users only interact with the database and the complexity remains hidden
from them. Users see the data, and there may be many views of the same data, such as charts and graphs.
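The three levels can be loosely mapped onto SQL objects. In this sketch, assuming Python's `sqlite3` and a hypothetical `employee` table, the table definition is the logical level, an index is an access path from the physical level (the actual file layout stays hidden inside SQLite), and a view is what one user group sees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data is stored
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, "
             "name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Ravi", "CSE", 50000), (2, "Priya", "ECE", 42000)])

# Physical level (the part SQL exposes): an index is an extra access path.
# Queries are written the same way whether or not it exists.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# View level: this user group sees names and departments, not salaries
conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM employee")

cur = conn.execute("SELECT * FROM staff_directory ORDER BY name")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns, rows)  # ['name', 'dept'] [('Priya', 'ECE'), ('Ravi', 'CSE')]
```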
• Network Model:
• Relational Model:
5. DBMS ARCHITECTURE
• DBMS architecture depends upon how users are connected to the database to get
their requests done.
Database architecture can be seen as single-tier or multi-tier. Logically, database
architecture is of three types: 1-tier architecture, 2-tier architecture and 3-tier
architecture.
1- Tier Architecture
• In this architecture, the database is directly available to the user. It means the user
can sit directly on the DBMS and use it.
• Any changes done here are made directly on the database itself. It doesn't provide
a handy tool for end users.
• The 1-Tier architecture is used for developing local applications, where
programmers can communicate directly with the database for a quick response.
2- Tier Architecture
• The 2-Tier architecture is the same as a basic client-server architecture. In the two-tier
architecture, applications on the client end can directly communicate with the database at the
server side. For this interaction, APIs such as ODBC and JDBC are used.
• The user interfaces and application programs are run on the client side.
• The server side is responsible for providing functionality such as query processing and
transaction management.
• To communicate with the DBMS, client-side application establishes a connection with
the server side.
3- Tier Architecture
• The 3-Tier architecture contains another layer between the client and server. In this
architecture, client can't directly communicate with the server.
• The application on the client-end interacts with an application server which further
communicates with the database system.
• End user has no idea about the existence of the database beyond the application
server. The database also has no idea about any other user beyond the application.
• The 3-Tier architecture is used for large web applications.
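The separation of tiers can be sketched in a few classes. This is only an illustration: the three "tiers" here are Python objects in one process rather than separate machines on a network, and the class names, the `get_balance` operation and the `account` table are all hypothetical. The point is that the client calls a business operation on the application server and never sees SQL or the database connection.

```python
import sqlite3


class DatabaseTier:
    """Database tier: only the application server holds this connection."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE account "
                          "(acc_no TEXT PRIMARY KEY, owner TEXT, balance INTEGER)")
        self.conn.execute("INSERT INTO account VALUES ('ACC1', 'Ravi', 5000)")


class ApplicationServer:
    """Middle tier: offers business operations; SQL never leaves this layer."""

    def __init__(self, db):
        self._db = db

    def get_balance(self, acc_no):
        row = self._db.conn.execute(
            "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)).fetchone()
        if row is None:
            raise KeyError(acc_no)
        return row[0]


class Client:
    """Client tier: knows only the application server, not the database."""

    def __init__(self, server):
        self._server = server

    def show_balance(self, acc_no):
        return f"Balance of {acc_no}: {self._server.get_balance(acc_no)}"


client = Client(ApplicationServer(DatabaseTier()))
print(client.show_balance("ACC1"))  # Balance of ACC1: 5000
```

In a 2-tier design, by contrast, the client itself would hold the connection and send the SQL directly.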
• The three-schema architecture describes a database at the following three levels:
1. The internal level has an internal schema, which describes the physical storage
structure of the database. It uses a physical data model and describes the complete details
of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the
database for users. It hides the details of the physical storage structures and concentrates
on describing entities, data types, relationships, user operations and constraints. Usually a
representational data model is used to describe the conceptual schema.
3. The external or view level includes external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and
hides the rest of the database from that user group. It is represented using a representational
data model.
• The three-schema architecture is used to visualize the schema levels in a database.
• The three schemas are only descriptions of data; the data actually exists only at the
physical level.
6. RELATIONAL DATABASE
➢ A relational database is a database system in which the database is organized and
accessed according to the relationships between data items without the need for any
consideration of physical orientation and relationship.
➢ Relationships between data items are expressed by means of tables.
➢ It is a tool, which can help you store, manage and disseminate information of various
kinds.
➢ It is a collection of objects, such as tables, queries, forms, reports and macros, which are
all stored in a computer program and are inter-related.
Catalog:
✓ A catalog consists of all the information about the various schemas
(external, conceptual and internal) and also all of the corresponding mappings
(external/conceptual, conceptual/internal).
✓ It contains detailed information regarding the various objects that are of interest
to the system itself; e.g., tables, views, indexes, users, integrity rules, security
rules, etc.
✓ In a relational database, the entities of the ERD are represented as tables and their
attributes as the columns of their respective tables in a database schema.
✓ It includes some important terms, such as:
Table: Tables are the basic storage structures of a database where data about
something in the real world is stored. It is also called a relation or an entity.
Row: A row represents the collection of data for a particular entity.
To identify each row uniquely, there should be a unique identifier called
the primary key, which allows no duplicate rows.
A row is also called a record or a tuple.
Column: Columns represent characteristics or attributes of an entity. Each
attribute maps onto a column of a table. Hence, a column is also known as an
attribute.
Relationship: Relationships represent a logical link between two tables. A
relationship is depicted by a foreign key column.
Degree: number of attributes
Cardinality: number of tuples
An attribute of an entity has a particular value. The set of possible values that a
given attribute can have is called its domain.
The data in an RDBMS is stored in database objects called tables.
A field is a column in a table that is designed to maintain specific information
about every record in the table.
A record, also called a row of data, is each individual entry that exists in a
table.
A NULL value in a table is a value in a field that appears to be blank, which
means a field with a NULL value is a field with no value.
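Degree, cardinality and NULL can be checked on a small table. A minimal sketch, assuming Python's `sqlite3` and a hypothetical `student` table in which one phone field is left NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ravi", "555-0101"), (2, "Priya", None), (3, "Arun", "555-0103")])

cur = conn.execute("SELECT * FROM student")
degree = len(cur.description)       # number of attributes (columns)
cardinality = len(cur.fetchall())   # number of tuples (rows)

# Priya's phone field holds NULL: it has no value, so it is found with IS NULL.
no_phone = conn.execute("SELECT name FROM student WHERE phone IS NULL").fetchall()
# Comparing with = NULL matches nothing, because NULL is not equal to any value.
eq_null = conn.execute("SELECT name FROM student WHERE phone = NULL").fetchall()

print(degree, cardinality, no_phone, eq_null)  # 3 3 [('Priya',)] []
```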
Keys:
➢ An attribute or set of attributes whose values uniquely identify each entity in an entity set
is called a key for that entity set.
1. Primary Key:
➢ It is a minimal super key: the candidate key chosen by the database designer.
➢ It is a unique identifier for the table (a column or a column combination with the
property that at any given time no two rows of the table contain the same value
in that column or column combination).
2. Candidate Key:
➢ There may be two or more attributes or combinations of attributes that uniquely identify
an instance of an entity set. These attributes or combinations of attributes are called
candidate keys.
3. Super Key:
➢ If we add additional attributes to a key, the resulting combination would still uniquely
identify an instance of the entity set.
➢ Such augmented keys are called super keys.
4. Foreign Key:
➢ A foreign key is a field (or collection of fields) in one table that uniquely
identifies a row of another table. In simpler words, the foreign key is defined in a second
table, but it refers to the primary key in the first table.
5. Secondary Key:
➢ A secondary key is an attribute or combination of attributes that may not be a candidate
key, but that classifies the entity set on a particular characteristic.
➢ A key consisting of a single attribute is called a simple key.
➢ A key consisting of a combination of attributes is called a composite key.
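The relation between super keys and candidate keys can be explored by brute force. Keys are really a property of the schema (they must hold for every possible instance), so this sketch only reports which attribute sets happen to be unique in one hypothetical sample of `students`; the attribute names and rows are illustrative assumptions.

```python
from itertools import combinations

ATTRIBUTES = ["roll_no", "email", "name", "dept"]
students = [
    {"roll_no": 1, "email": "ravi@example.edu",  "name": "Ravi",  "dept": "CSE"},
    {"roll_no": 2, "email": "ravi2@example.edu", "name": "Ravi",  "dept": "CSE"},
    {"roll_no": 3, "email": "priya@example.edu", "name": "Priya", "dept": "ECE"},
]


def is_unique(rows, attrs):
    """True if no two rows have the same values for all of `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def super_keys(rows, attributes):
    """Every attribute set whose values are unique in this instance."""
    return [set(combo)
            for size in range(1, len(attributes) + 1)
            for combo in combinations(attributes, size)
            if is_unique(rows, combo)]


def candidate_keys(rows, attributes):
    """Minimal super keys: no proper subset is itself a super key."""
    supers = super_keys(rows, attributes)
    return [k for k in supers if not any(other < k for other in supers)]


print(candidate_keys(students, ATTRIBUTES))  # [{'roll_no'}, {'email'}]
```

Adding any attribute to `{'roll_no'}` gives another super key (e.g. `{'roll_no', 'name'}`), and a designer would pick one candidate key, say `roll_no`, as the primary key.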
• Relational integrity constraints in a DBMS are conditions that must hold for a relation to be
valid. These constraints are derived from the rules of the mini-world that the database
represents.
• There are many types of integrity constraints in a DBMS. Constraints in a relational database
management system are mostly divided into three main categories:
Domain Constraints
Key Constraints
Referential Integrity Constraints
Domain Constraints
➢ Attributes have specific values in a real-world scenario. For example, age can only be a
positive integer.
➢ Domain constraints specify that, within each tuple, the value of each attribute must be an
atomic value from the domain of that attribute. Domains are specified as data types, which
include standard data types such as integers, real numbers, characters, Booleans, variable
length strings, etc.
Key Constraints
➢ An attribute that can uniquely identify a tuple in a relation is called the key of the table.
The value of the attribute for different tuples in the relation has to be unique.
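All three categories of constraints can be declared in SQL so that the DBMS rejects violating tuples. A minimal sketch, assuming Python's `sqlite3` with hypothetical `dept` and `student` tables: the domain constraint is written as a `CHECK` clause, the key constraint as a `PRIMARY KEY`, and referential integrity as a foreign key (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`).

```python
import sqlite3


def try_insert(conn, sql, params):
    """Run one INSERT; return None on success or the constraint error message."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when on
conn.execute("CREATE TABLE dept (dept_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,            -- key constraint
        name    TEXT NOT NULL,
        age     INTEGER CHECK (age > 0),        -- domain constraint
        dept_id TEXT REFERENCES dept(dept_id)   -- referential integrity
    )""")
conn.execute("INSERT INTO dept VALUES ('CSE', 'Computer Science')")

SQL = "INSERT INTO student VALUES (?, ?, ?, ?)"
results = {
    "valid":       try_insert(conn, SQL, (1, "Ravi", 19, "CSE")),
    "domain":      try_insert(conn, SQL, (2, "Priya", -5, "CSE")),   # negative age
    "key":         try_insert(conn, SQL, (1, "Arun", 20, "CSE")),    # duplicate roll_no
    "referential": try_insert(conn, SQL, (3, "Meena", 21, "MECH")),  # no such dept
}
for kind, error in results.items():
    print(f"{kind:12} -> {error or 'accepted'}")
```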
8. RELATIONAL ALGEBRA
➢ Relational algebra is a procedural query language.
➢ It gives a step by step process to obtain the result of the query.
➢ It uses operators to perform queries. Each operator takes one or two relation instances as
input and yields a relation instance as output.
1. Select Operation:
• The select operation selects tuples that satisfy a given predicate.
• It is denoted by sigma (σ).
• Notation: σ p(r), where p is the selection predicate and r is the relation.
2. Project Operation:
• This operation shows the list of those attributes that we wish to appear in the
result.
✓ The rest of the attributes are eliminated from the table.
• It is denoted by ∏.
• Notation: ∏ A1, A2, ..., An (r), where A1, ..., An are attribute names of relation r.
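Select and project can be sketched in plain Python by treating a relation as a list of attribute names plus a set of tuples (a set, because relations contain no duplicate tuples). The `Relation` class, function names and `employee` data are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    attributes: tuple   # attribute names, in order
    tuples: frozenset   # set semantics: no duplicate tuples


def select(predicate, r):
    """σ predicate (r): keep the tuples for which predicate(row) is True."""
    kept = frozenset(t for t in r.tuples
                     if predicate(dict(zip(r.attributes, t))))
    return Relation(r.attributes, kept)


def project(attrs, r):
    """∏ attrs (r): keep only the listed attributes; duplicates collapse in the set."""
    idx = [r.attributes.index(a) for a in attrs]
    return Relation(tuple(attrs),
                    frozenset(tuple(t[i] for i in idx) for t in r.tuples))


employee = Relation(("emp_id", "name", "dept", "salary"), frozenset({
    (1, "Ravi", "CSE", 50000),
    (2, "Priya", "ECE", 42000),
    (3, "Arun", "CSE", 38000),
}))

cse = select(lambda row: row["dept"] == "CSE", employee)   # σ dept = "CSE" (employee)
depts = project(["dept"], employee)                        # ∏ dept (employee)
print(sorted(cse.tuples))    # [(1, 'Ravi', 'CSE', 50000), (3, 'Arun', 'CSE', 38000)]
print(sorted(depts.tuples))  # [('CSE',), ('ECE',)]
```

Because each operator returns a `Relation`, operators compose just as in the algebra, e.g. `project(["name"], select(lambda row: row["salary"] > 40000, employee))` for ∏ name (σ salary > 40000 (employee)).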
3. Union Operation:
• Suppose there are two relations R and S with the same number of attributes and compatible
domains.
• The union operation contains all the tuples that are either in R or S or both in R & S.
• It eliminates the duplicate tuples.
• It is denoted by ∪.
• Notation: R ∪ S
4. Set Intersection:
• Suppose there are two relations R and S. The set intersection operation contains all tuples
that are in both R & S.
• It is denoted by ∩.
• Notation: R ∩ S
5. Set Difference:
• The result of set difference operation is tuples, which are present in one relation but
are not in the second relation.
• Notation: r - s
• Finds all the tuples that are present in r but not in s.
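Union, intersection and set difference map directly onto Python set operators once both relations are union-compatible. In this sketch, compatibility is simplified to "same attribute list"; the `Relation` class and the `depositor`/`borrower` sample relations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    attributes: tuple
    tuples: frozenset


def _check_compatible(r, s):
    # Simplified union-compatibility test: identical attribute lists.
    if r.attributes != s.attributes:
        raise ValueError("relations are not union-compatible")


def union(r, s):
    """R ∪ S: tuples in R, in S, or in both (each appears once)."""
    _check_compatible(r, s)
    return Relation(r.attributes, r.tuples | s.tuples)


def intersection(r, s):
    """R ∩ S: tuples present in both R and S."""
    _check_compatible(r, s)
    return Relation(r.attributes, r.tuples & s.tuples)


def difference(r, s):
    """R - S: tuples present in R but not in S."""
    _check_compatible(r, s)
    return Relation(r.attributes, r.tuples - s.tuples)


depositor = Relation(("customer_name",),
                     frozenset({("Johnson",), ("Smith",), ("Hayes",)}))
borrower = Relation(("customer_name",),
                    frozenset({("Smith",), ("Jones",), ("Hayes",)}))

print(sorted(union(depositor, borrower).tuples))
# [('Hayes',), ('Johnson',), ('Jones',), ('Smith',)]
print(sorted(intersection(depositor, borrower).tuples))  # [('Hayes',), ('Smith',)]
print(sorted(difference(depositor, borrower).tuples))    # [('Johnson',)]
```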
• Outer joins are used to include all the tuples from the relations included in join operation
in the resulting relation.
• An outer join is of three types:
1. Left outer join
2. Right outer join
3. Full outer join
• Consider the example:
EMPLOYEE:
FACT_WORKERS:
(ii) Right outer join (⟖)
• In a right outer join, all the tuples from the right relation, say S, are included in the
resulting relation.
• If there are tuples in relation S that are not matched with any tuple in the left
relation R, then the attributes of relation R in the resulting relation become NULL for
those tuples.
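The rule above can be sketched as a natural right outer join in plain Python. The EMPLOYEE and FACT_WORKERS tables are not reproduced in these notes, so the columns (`emp_name`, `city`, `branch`) and rows below are hypothetical; `None` plays the role of NULL.

```python
# Hypothetical sample data standing in for EMPLOYEE (R) and FACT_WORKERS (S)
EMP_ATTRS = ["emp_name", "city"]
employee = [
    {"emp_name": "Ram", "city": "Mumbai"},
    {"emp_name": "Shyam", "city": "Kolkata"},
    {"emp_name": "Ravi", "city": "Delhi"},
]
FW_ATTRS = ["emp_name", "branch"]
fact_workers = [
    {"emp_name": "Ram", "branch": "North"},
    {"emp_name": "Kuber", "branch": "South"},
]


def right_outer_join(r_attrs, r_rows, s_attrs, s_rows):
    """R ⟖ S joined on the attributes both relations share."""
    common = [a for a in r_attrs if a in s_attrs]
    result = []
    for s_row in s_rows:  # every tuple of the right relation S appears in the result
        matches = [r_row for r_row in r_rows
                   if all(r_row[a] == s_row[a] for a in common)]
        if matches:
            for r_row in matches:
                result.append({**r_row, **s_row})  # matched: combine the two tuples
        else:
            padded = {a: None for a in r_attrs}    # unmatched: R's attributes become NULL
            result.append({**padded, **s_row})     # shared attributes keep S's values
    return result


for row in right_outer_join(EMP_ATTRS, employee, FW_ATTRS, fact_workers):
    print(row)
# {'emp_name': 'Ram', 'city': 'Mumbai', 'branch': 'North'}
# {'emp_name': 'Kuber', 'city': None, 'branch': 'South'}
```

Shyam and Ravi have no match in S, so a right outer join drops them; a left outer join would instead keep them with `branch` set to NULL.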
9. SQL FUNDAMENTALS
➢ What is SQL?
• SQL stands for Structured Query Language
• SQL allows you to access a database
• SQL is an ANSI standard computer language
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert new records in a database
LIKE Condition
INSERT INTO
UPDATE
DELETE
ORDER BY
GROUP BY
HAVING
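The statements and clauses listed above (LIKE, INSERT INTO, UPDATE, DELETE, ORDER BY, GROUP BY, HAVING) can be tried together on one small table. A minimal sketch, assuming Python's `sqlite3` as the DBMS and a hypothetical `employee` table; other DBMSs may differ in details such as the case sensitivity of LIKE or integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee "
             "(emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")

# INSERT INTO: add new records
conn.executemany(
    "INSERT INTO employee (emp_id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Ravi", "CSE", 50000), (2, "Priya", "ECE", 42000),
     (3, "Rajesh", "CSE", 38000), (4, "Meena", "MECH", 35000),
     (5, "Arun", "ECE", 30000)])

# UPDATE: 10% raise for CSE (integer arithmetic: 50000 -> 55000, 38000 -> 41800)
conn.execute("UPDATE employee SET salary = salary * 110 / 100 WHERE dept = 'CSE'")

# DELETE: remove one record
conn.execute("DELETE FROM employee WHERE emp_id = 4")

# LIKE: names starting with 'R' (% matches any sequence of characters)
r_names = [name for (name,) in conn.execute(
    "SELECT name FROM employee WHERE name LIKE 'R%' ORDER BY name")]

# ORDER BY: highest salary first
by_salary = conn.execute(
    "SELECT name, salary FROM employee ORDER BY salary DESC").fetchall()

# GROUP BY + HAVING: average salary per department, keeping groups above 40000
dept_avg = conn.execute(
    "SELECT dept, AVG(salary) FROM employee "
    "GROUP BY dept HAVING AVG(salary) > 40000 ORDER BY dept").fetchall()

print(r_names)    # ['Rajesh', 'Ravi']
print(by_salary)  # [('Ravi', 55000), ('Priya', 42000), ('Rajesh', 41800), ('Arun', 30000)]
print(dept_avg)   # [('CSE', 48400.0)]
```

WHERE filters individual rows before grouping, whereas HAVING filters whole groups after GROUP BY, which is why the condition on AVG(salary) goes in HAVING.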
with DB and executing the code in the DB within the high level language.
• High level programming language compilers cannot interpret SQL statements.
• Hence source code files containing embedded SQL statements must be preprocessed
before compiling.
• Thus each SQL statement coded in a high level programming language source code file must
be prefixed with the keywords EXEC SQL and terminated with either a semicolon or the
keyword END_EXEC.
Connection to Database:
• This is the first step while writing a query in high level languages. First connection to the DB
that we are accessing needs to be established.
• This can be done using the keyword CONNECT, but it has to be preceded by EXEC SQL to
indicate that it is an SQL statement.
EXEC SQL CONNECT db_name;
EXEC SQL CONNECT HR_USER; //connects to DB HR_USER
• Once connection is established with DB, we can perform DB transactions.
Host variables
• Database manager cannot work directly with high level programming language
variables.
• Instead, it must use special variables, known as host variables, to move data between
an application and a database.
Since the query needs to be prepared at run time, in addition to the structures discussed for
embedded SQL, dynamic SQL has three more clauses. These are mainly used to build the
query and execute it at run time.
PREPARE
Since dynamic SQL builds a query at run time, as a first step we need to capture all the inputs
from the user. They are stored in a string variable: depending on the inputs received from the
user, the string variable is appended with those inputs and SQL keywords.
These SQL-like string statements are then converted into an SQL query by using the
PREPARE statement.
EXECUTE
This statement is used to compile and execute the SQL statements prepared in DB.
EXEC SQL EXECUTE sql_query;
EXECUTE IMMEDIATE
This statement is used to prepare SQL statement as well as execute the SQL statements in DB. It
performs the task of PREPARE and EXECUTE in a single line.
EXEC SQL EXECUTE IMMEDIATE :sql_stmt;
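The idea of building a query string from user input at run time and then executing it can be sketched outside embedded C. This is an analogy, not EXEC SQL: it assumes Python's `sqlite3`, where `conn.execute` compiles and runs the string in one step (roughly the role of EXECUTE IMMEDIATE). The `employee` table, `ALLOWED_COLUMNS` and the function names are hypothetical. User values go through `?` placeholders rather than being pasted into the string, which guards against SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ravi", "CSE"), (2, "Priya", "ECE"), (3, "Arun", "CSE")])

ALLOWED_COLUMNS = {"name", "dept"}  # columns a user may filter on (hypothetical)


def build_query(filters):
    """Build the SQL text at run time from the user's inputs.

    Column names come from an allow-list; the values are left as ? placeholders
    so they are bound when the statement runs instead of pasted into the string.
    """
    sql = "SELECT emp_id, name, dept FROM employee"
    conditions, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        conditions.append(f"{column} = ?")
        params.append(value)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY emp_id", params


def run_query(conn, filters):
    sql, params = build_query(filters)           # statement text fixed at run time
    return conn.execute(sql, params).fetchall()  # SQLite compiles and runs it here


print(run_query(conn, {"dept": "CSE"}))  # [(1, 'Ravi', 'CSE'), (3, 'Arun', 'CSE')]
```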
Example
#include <stdio.h>
#include <conio.h>
int main(){
EXEC SQL INCLUDE SQLCA;