DBMS Unit 1
1) DBMS Architecture:
o The design of a DBMS depends on its architecture. The basic client/server architecture is
used to manage a large number of PCs, web servers, database servers and other
components that are connected over a network.
o The client/server architecture consists of many PCs and a workstation that are
connected via the network.
o DBMS architecture depends on how users connect to the database to get their
requests served.
Database architecture can be seen as single-tier or multi-tier. Logically, however, database
architecture is of two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the user: the user works
directly on the DBMS itself.
o Any changes made here are applied directly to the database. It doesn't provide a
handy tool for end users.
o The 1-Tier architecture is used for developing local applications, where
programmers can communicate directly with the database for a quick response.
2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server architecture. In the two-tier
architecture, applications on the client end communicate directly with the database on the
server side. APIs such as ODBC and JDBC are used for this interaction.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query processing and
transaction management.
o To communicate with the DBMS, the client-side application establishes a connection with the
server side.
2) Keys:
Types of keys:
1. Primary key
o The primary key is the key chosen to identify one and only one instance of an entity uniquely.
An entity can have multiple keys, as we saw in the PERSON table. The most suitable key
from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee.
We could also select License_Number or Passport_Number as
the primary key, since they are also unique.
o For each entity, the choice of primary key depends on the requirements and on the developers' judgement.
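The uniqueness guarantee of a primary key can be sketched with Python's built-in `sqlite3` module. The `employee` table, its columns, and the sample names are assumptions made up for this illustration; other DBMSs report the violation differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"      # the chosen primary key
    " name TEXT NOT NULL)"
)
conn.execute("INSERT INTO employee (id, name) VALUES (?, ?)", (1, "Asha"))

# A second row with the same id violates the primary key constraint.
try:
    conn.execute("INSERT INTO employee (id, name) VALUES (?, ?)", (1, "Ravi"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_rejected, row_count)
```

The rejected insert leaves the first row in place, so each `id` still identifies exactly one employee.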
2. Candidate key
o A candidate key is an attribute or a minimal set of attributes that can uniquely identify a tuple.
o The primary key is chosen from the candidate keys. The remaining attributes or attribute sets
that also uniquely identify a tuple are candidate keys too; they are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The other unique
attributes, like SSN, Passport_Number and License_Number, are also candidate keys.
3. Super Key
A super key is a set of attributes that can uniquely identify a tuple. Every super key is a superset
of some candidate key; a candidate key is a super key with no redundant attributes.
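The relationship between super keys and candidate keys can be sketched in Python. This is only an illustration on made-up sample rows: a key is really a property of the schema, so a data sample can rule a key out but cannot prove one. The helper names `is_super_key` and `candidate_keys` are hypothetical.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Minimal super keys for this sample of rows, smallest first."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of a key already found is not minimal, so skip it.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_super_key(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical EMPLOYEE tuples, invented for this illustration.
employees = [
    {"id": 1, "passport": "P10", "name": "Asha", "dept": "HR"},
    {"id": 2, "passport": "P20", "name": "Ravi", "dept": "HR"},
    {"id": 3, "passport": "P30", "name": "Asha", "dept": "IT"},
    {"id": 4, "passport": "P40", "name": "Asha", "dept": "HR"},
]
attrs = ["id", "passport", "name", "dept"]

print(is_super_key(employees, ("id", "name")))  # super key, but not minimal
print(candidate_keys(employees, attrs))
```

Here `("id", "name")` is a super key because it contains the candidate key `("id",)`, but it is not itself a candidate key since `name` is redundant.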
4. Foreign key
o A foreign key is a column (or set of columns) of a table that points to the primary key of another table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we should not store the department's information in the
employee table. Instead, we link the two tables through the primary key of one of them.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in
the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and it relates the two tables.
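The EMPLOYEE and DEPARTMENT link described above can be sketched with Python's built-in `sqlite3` module. The column names and sample rows are assumptions for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other DBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " department_id INTEGER REFERENCES department(department_id))"  # foreign key
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # department 10 exists

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets us join each employee to its department.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee e"
    " JOIN department d ON d.department_id = e.department_id"
).fetchall()
print(orphan_rejected, joined)
```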
5. Alternate key
There may be one or more attributes or combinations of attributes that uniquely identify each
tuple in a relation. These attributes or combinations of attributes are called the candidate keys.
One key is chosen as the primary key from these candidate keys, and each remaining candidate
key, if any exists, is termed an alternate key. In other words, the number of alternate keys
is the number of candidate keys minus one (the primary key). An alternate key may or may not
exist: if a relation has only one candidate key, it has no alternate key.
For example, the employee relation has two attributes, Employee_Id and PAN_No, that act as
candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate
key, PAN_No, acts as the alternate key.
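In SQL, an alternate key is commonly declared with a `UNIQUE` constraint. The following is a minimal sketch using Python's built-in `sqlite3` module; the table layout and the PAN values are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"  # chosen primary key
    " pan_no TEXT NOT NULL UNIQUE,"      # alternate key: unique, but not the primary key
    " name TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'PAN001', 'Asha')")

# Reusing a PAN number violates the UNIQUE constraint on the alternate key.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'PAN001', 'Ravi')")
    duplicate_pan_rejected = False
except sqlite3.IntegrityError:
    duplicate_pan_rejected = True

# The alternate key can still be used to look up exactly one row.
match = conn.execute(
    "SELECT employee_id FROM employee WHERE pan_no = ?", ("PAN001",)
).fetchall()
print(duplicate_pan_rejected, match)
```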
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This
key is also known as a concatenated key.
For example, in the employee relation, we assume that an employee may be assigned multiple
roles and may work on multiple projects simultaneously. So the primary key is
composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination. These
attributes act as a composite key since the primary key comprises more than one attribute.
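A composite key like the one above can be sketched with Python's built-in `sqlite3` module. The table name `assignment` and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " emp_id INTEGER NOT NULL,"
    " emp_role TEXT NOT NULL,"
    " proj_id INTEGER NOT NULL,"
    " PRIMARY KEY (emp_id, emp_role, proj_id))"  # composite primary key
)
# The same employee may repeat as long as the full combination differs.
rows = [(1, "dev", 100), (1, "dev", 200), (1, "tester", 100)]
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?)", rows)

# Repeating the whole combination is rejected.
try:
    conn.execute("INSERT INTO assignment VALUES (1, 'dev', 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
print(duplicate_rejected, stored)
```

No single column is unique here; only the three values taken together identify a row.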
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys are
created when a primary key is large and complex and has no relationship with many other
relations. The values of an artificial key are usually numbered in serial order.
For example, the primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the
employee relation. So it would be better to add a new virtual attribute that identifies each tuple in the
relation uniquely.
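One way to sketch an artificial key is a serially numbered column, shown here with Python's built-in `sqlite3` module. The column name `assignment_id` is hypothetical, and keeping the original three attributes under a `UNIQUE` constraint is a design assumption, not something the text prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # artificial key, numbered serially
    " emp_id INTEGER NOT NULL,"
    " emp_role TEXT NOT NULL,"
    " proj_id INTEGER NOT NULL,"
    " UNIQUE (emp_id, emp_role, proj_id))"  # the natural combination stays unique
)
conn.executemany(
    "INSERT INTO assignment (emp_id, emp_role, proj_id) VALUES (?, ?, ?)",
    [(1, "dev", 100), (1, "dev", 200), (2, "tester", 100)],
)

ids = [r[0] for r in conn.execute(
    "SELECT assignment_id FROM assignment ORDER BY assignment_id"
)]
print(ids)
```

Other tables can now refer to a row through the single `assignment_id` value instead of carrying all three attributes.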
3) Dynamic SQL:
Dynamic SQL is a powerful feature in database management systems (DBMS) that allows SQL
statements to be constructed and executed at runtime rather than at compile time. This is useful
for applications that need to build and execute complex queries on the fly, where the exact
structure of the SQL statement is not known until the program is executed. Here are some key
points about dynamic SQL in DBMS:
1. **Construction at Runtime**:
Dynamic SQL statements are created and executed during the runtime of the application. This
contrasts with static SQL, where the SQL statements are fixed and known at compile time.
2. **Flexibility**:
Dynamic SQL provides the flexibility to create complex and variable SQL queries based on
user inputs or other runtime conditions. This makes it ideal for applications where query
parameters or structure may change dynamically.
3. **Execution**:
Dynamic SQL can be executed using various methods depending on the DBMS. Common
approaches include using SQL functions or procedures that support dynamic execution, such as
`EXECUTE IMMEDIATE` in Oracle or `sp_executesql` in SQL Server.
### Advantages
- **Flexibility**: Allows creation of highly flexible applications that can generate SQL queries
based on dynamic conditions.
- **Code Reusability**: Facilitates code reuse by allowing the same code to handle various SQL
query requirements.
- **Adaptability**: Suitable for applications that need to adapt to varying database schemas or
user inputs.
### Disadvantages
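- **Security Risks**: Dynamic SQL can introduce security vulnerabilities, particularly SQL
injection attacks, if user inputs are concatenated into the statement text without proper
sanitization. Passing values as bind parameters, as the examples below do, avoids this.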
- **Performance Overheads**: Can have performance overhead due to the need for query
parsing and optimization at runtime.
- **Complexity**: Can increase the complexity of the application code, making it harder to
maintain and debug.
### Examples
In Oracle, you can use the `EXECUTE IMMEDIATE` statement to execute a dynamically
constructed SQL statement:
```sql
DECLARE
    sql_stmt VARCHAR2(1000);
    emp_id   NUMBER := 1001;
BEGIN
    -- :id is a bind placeholder; its value is supplied by the USING clause
    sql_stmt := 'UPDATE employees SET salary = salary * 1.1 WHERE employee_id = :id';
    EXECUTE IMMEDIATE sql_stmt USING emp_id;
END;
```
In SQL Server, you can use the `sp_executesql` stored procedure to execute a dynamically
constructed SQL statement:
```sql
DECLARE @sql NVARCHAR(1000);
DECLARE @emp_id INT;
SET @emp_id = 1001;
SET @sql = N'UPDATE employees SET salary = salary * 1.1 WHERE employee_id = @id';
EXEC sp_executesql @sql, N'@id INT', @id = @emp_id;
```
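The same idea applies when the statement is built in application code. The sketch below uses Python's built-in `sqlite3` module to assemble a `WHERE` clause at runtime. The `build_query` helper, the `ALLOWED_COLUMNS` whitelist, and the sample data are assumptions for illustration. Column names cannot be bound as parameters, so they are checked against a whitelist, while values are always passed as bind parameters.

```python
import sqlite3

ALLOWED_COLUMNS = {"department", "city"}  # identifiers cannot be bound, so whitelist them

def build_query(filters):
    """Return (sql, params) for the given {column: value} filters."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected column: {column}")
        clauses.append(f"{column} = ?")  # the value goes in as a bind parameter
        params.append(value)
    sql = "SELECT name FROM employees"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY name", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "HR", "Pune"), ("Ravi", "IT", "Pune"), ("Meera", "IT", "Delhi")],
)

# The statement text is only known once the filters arrive at runtime.
sql, params = build_query({"department": "IT", "city": "Pune"})
names = [r[0] for r in conn.execute(sql, params)]

# A hostile value is treated as data, not as SQL text, so it matches nothing.
sql2, params2 = build_query({"department": "IT' OR '1'='1"})
hostile = conn.execute(sql2, params2).fetchall()
print(sql, names, hostile)
```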
4) Relational Algebra:
Relational algebra is a procedural query language. It gives a step by step process to obtain the
result of the query. It uses operators to perform queries.
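1. Select Operation:
o The select operation selects the tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
Notation: σ p(r)
Where:
σ denotes selection, r is the relation, and p is the selection predicate, a propositional
logic formula that may combine conditions with connectives such as AND, OR and NOT.
Input:
σ BRANCH_NAME="perryride" (LOAN)
Output:
The tuples of the LOAN relation whose BRANCH_NAME is "perryride".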
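The select operation can be sketched in Python by treating a relation as a list of dictionaries. The `select` helper and the LOAN tuples below are invented for this illustration; only the relation name, the BRANCH_NAME attribute and the "perryride" value come from the example above.

```python
def select(relation, predicate):
    """sigma_predicate(relation): keep only the tuples that satisfy predicate."""
    return [row for row in relation if predicate(row)]

# Hypothetical LOAN tuples, invented for this illustration.
loan = [
    {"BRANCH_NAME": "downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

result = select(loan, lambda row: row["BRANCH_NAME"] == "perryride")
print([row["LOAN_NO"] for row in result])
```

Selection never changes the attributes of a tuple; it only decides which tuples survive.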
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result. The rest
of the attributes are eliminated from the table.
o It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
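Projection can be sketched in Python in the same style. The `project` helper and the CUSTOMER tuples are assumptions made up for this illustration. Because a relation is a set, tuples that become identical after dropping attributes collapse into one.

```python
def project(relation, attributes):
    """pi_attributes(relation): keep the named attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:  # a relation is a set, so duplicates collapse
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

# Hypothetical CUSTOMER tuples, invented for this illustration.
customer = [
    {"NAME": "Jones", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Jones", "STREET": "Park", "CITY": "Harrison"},
]

result = project(customer, ["NAME", "CITY"])
print(result)
```

The two "Jones" tuples differ only in STREET, so they produce a single tuple once STREET is projected away.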
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that
are in R, in S, or in both.
o It eliminates duplicate tuples. It is denoted by ∪.
o R and S must have the same number of attributes, with compatible domains.
Notation: R ∪ S
Example relations: DEPOSITOR, BORROW
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples
that are in both R and S.
o It is denoted by ∩.
Notation: R ∩ S
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples
that are in R but not in S.
o It is denoted by minus (-).
Notation: R - S
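Union, intersection and difference map directly onto Python's set operators when each relation is stored as a set of tuples. The customer names below are invented for this illustration; both relations have a single attribute so that they are compatible.

```python
# Hypothetical single-attribute relations (customer names), invented for illustration.
depositor = {("Johnson",), ("Smith",), ("Hayes",)}
borrow = {("Jones",), ("Smith",), ("Hayes",)}

union = depositor | borrow         # in DEPOSITOR, in BORROW, or in both
intersection = depositor & borrow  # in both DEPOSITOR and BORROW
difference = depositor - borrow    # in DEPOSITOR but not in BORROW

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Unlike union and intersection, difference is not symmetric: `borrow - depositor` would give the customers who only have a loan.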
6. Cartesian product
o The Cartesian product combines each row in one table with each row in the other
table. It is also known as a cross product.
o It is denoted by X.
Notation: E X D
Example relations: EMPLOYEE, DEPARTMENT
Input:
EMPLOYEE X DEPARTMENT
Output:
Every EMPLOYEE tuple paired with every DEPARTMENT tuple.
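The Cartesian product can be sketched with `itertools.product`. The EMPLOYEE and DEPARTMENT tuples below are invented for this illustration, with the assumed layouts (EMP_ID, EMP_NAME, EMP_DEPT) and (DEPT_NO, DEPT_NAME).

```python
from itertools import product

# Hypothetical tuples: (EMP_ID, EMP_NAME, EMP_DEPT) and (DEPT_NO, DEPT_NAME).
employee = [(1, "Smith", "A"), (2, "Harry", "C")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Pair every employee tuple with every department tuple.
cross = [e + d for e, d in product(employee, department)]

# Keeping only pairs whose department codes agree turns the product into a join.
matching = [row for row in cross if row[2] == row[3]]

print(len(cross))
print(matching)
```

The product has 2 x 3 = 6 tuples; most pairings are meaningless on their own, which is why a product is usually followed by a selection.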
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
5) Data Abstraction in DBMS:
Data abstraction is the process of concealing irrelevant or unwanted data from the end user.
A database system contains intricate data structures and relationships. Developers hide
this complexity so that users can comfortably access the data they need, and only that data.
The main purpose of data abstraction is to hide irrelevant details and provide an abstract view of
the data. Users can then access the data without any hassle, and
the system also works efficiently.
In a DBMS, data abstraction is performed in layers, called levels of data abstraction. The database
management system is designed around these levels.
In DBMS, there are three levels of data abstraction, which are as follows:
The physical or internal level is the lowest level of data abstraction in the database management
system. It defines how data is actually stored in the database and the methods
used to access it. It describes complex data structures in detail, so it is
hard to understand, which is why it is kept hidden from the end user.
Database administrators (DBAs) decide how to arrange the data and where to store it. The DBA
is the person whose role is to manage the data at the physical
or internal level. At this level, a data center securely stores the raw data in detail on hard drives.
The logical or conceptual level is the intermediate level of data abstraction. It describes
what data is stored in the database and what relationships exist among the data items.
It describes the structure of the entire data in the form of tables. The logical or conceptual
level is less complex than the physical level. With the help of the logical level, DBAs
abstract data from the raw data present at the physical level.
The view or external level is the highest level of data abstraction. There are different views at this
level, each defining a part of the overall data of the database. This level is for end-user
interaction; at this level, end users access the data based on their queries.
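The logical and view levels can be sketched with Python's built-in `sqlite3` module: the table definition plays the role of the logical level, and an SQL view plays the role of one external view. The table, the view name `employee_directory`, and the sample rows are assumptions for illustration. The physical level (how SQLite lays out pages on disk) stays hidden throughout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data is stored and how it is structured.
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, department TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", 50000, "HR"), (2, "Ravi", 60000, "IT")],
)

# View level: one group of users sees names and departments, but not salaries.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT name, department FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
columns = [c[0] for c in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```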
6) Data Models
A data model is a way of modeling the data description, data semantics, and consistency constraints
of the data. It provides the conceptual tools for describing the design of a database at each level
of data abstraction. The following four data models are used for understanding the
structure of the database:
1) Relational Data Model: This type of model organizes the data in the form of rows and columns
within a table. Thus, a relational model uses tables for representing data and the relationships
among the data. Tables are also called relations. This model was initially described by Edgar F.
Codd in 1969. The relational data model is the most widely used model and is primarily used by
commercial data processing applications.
4) Semistructured Data Model: This type of data model is different from the other three data
models (explained above). The semistructured data model allows data specifications in
which individual data items of the same type may have different sets of attributes. The Extensible
Markup Language, also known as XML, is widely used for representing semistructured data.
Although XML was initially designed for adding markup information to text documents, it
has gained importance because of its application in the exchange of data.
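The point about items of the same type carrying different attribute sets can be sketched with Python's built-in `xml.etree.ElementTree` module. The XML document below is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Two <person> items of the same type with different sets of fields.
doc = """
<people>
  <person id="1"><name>Asha</name><email>asha@example.com</email></person>
  <person id="2"><name>Ravi</name><phone>555-0100</phone><city>Pune</city></person>
</people>
"""

root = ET.fromstring(doc)
records = [
    {child.tag: child.text for child in person}
    for person in root.findall("person")
]
field_sets = [sorted(record) for record in records]
print(field_sets)
```

A relational table would need a fixed set of columns (with NULLs for missing values); here each item simply carries the fields it has.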