
CENG301 Database Management Systems

Session-6
Asst. Prof. Mustafa YENIAD
[email protected]
Database Administrator
• A person who has central control over the system is called the database administrator (DBA).
The functions of the DBA include:
1. Creating and modifying the conceptual schema definition,
2. Implementing the storage structure and access methods,
3. Modifying the schema and physical organization,
4. Granting authorization for data access,
5. Specifying integrity constraints,
6. Executing immediate recovery procedures in case of failures,
7. Ensuring the physical security of the database,
8. and so on…
Database Languages
• Database languages are crucial for expressing database queries and updates.
• These languages play a major role in reading, updating, and storing data in the database. There are basically five types of database sub-languages: DML, DDL, TCL, DCL, and DQL.
• SQL stands for Structured Query Language and is composed of several areas, each of which has a specific acronym and sub-language:
Database Languages
• DML stands for Data Manipulation Language and covers the INSERT, UPDATE, and DELETE statements, which are used to access and manipulate data in the system.

• Data manipulation involves retrieval of data from the database, insertion of new data into the database, and deletion or modification of existing data.

• The INSERT command is used to insert data into a database.
• The UPDATE command is used to update existing data in the database.
• The DELETE command is used to delete records from a database table.
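As a minimal sketch of these three statements in action (using Python's built-in sqlite3 module; the student table and its rows are invented for the example):

```python
import sqlite3

# In-memory SQLite database; the table and data are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)")

# INSERT adds new rows to the table.
cur.execute("INSERT INTO student (id, name, grade) VALUES (1, 'Ali', 70)")
cur.execute("INSERT INTO student (id, name, grade) VALUES (2, 'Ayse', 85)")

# UPDATE modifies existing rows that satisfy the WHERE predicate.
cur.execute("UPDATE student SET grade = 75 WHERE id = 1")

# DELETE removes rows that satisfy the WHERE predicate.
cur.execute("DELETE FROM student WHERE id = 2")

print(cur.execute("SELECT id, name, grade FROM student").fetchall())
# [(1, 'Ali', 75)]
```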
Database Languages
• DDL stands for Data Definition Language and covers the CREATE, DROP, ALTER, TRUNCATE, COMMENT, and RENAME statements, which are used to define on-disk data structures (SQL objects) rather than the data itself. The conceptual schema is specified by a set of definitions expressed in this language. It deals with descriptions of the database schema and is used to create and modify the structure of database objects.
• These commands are normally not used by a general user, who should access the database via an application. These definitions include all the entity sets, their associated attributes, and their relationships. The result of DDL statements is a set of tables whose descriptions are stored in a special file called the data dictionary.
• CREATE is used to create tables and other objects.
• DROP is used to remove entire objects, such as tables, from the database.
• ALTER is used to change the structure of a database object.
• TRUNCATE is used to delete all records from a table, including the space allotted to them.
• COMMENT is used to add comments to the data dictionary.
• RENAME is used to rename existing database objects.
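A small sketch of DDL at work (again via sqlite3; note that SQLite supports CREATE, ALTER, and DROP but has no TRUNCATE or COMMENT statement, so only that portable subset is shown, with invented table names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE defines a new table (a schema object, not data).
cur.execute("CREATE TABLE book (bookcode TEXT PRIMARY KEY, booktitle TEXT)")

# ALTER changes the structure of an existing table.
cur.execute("ALTER TABLE book ADD COLUMN price REAL")

# The DDL result is recorded in the catalog (SQLite's sqlite_master).
schema = cur.execute("SELECT sql FROM sqlite_master WHERE name = 'book'").fetchone()[0]
print(schema)

# DROP removes the whole table definition together with its data.
cur.execute("DROP TABLE book")
print(cur.execute("SELECT count(*) FROM sqlite_master WHERE name = 'book'").fetchone()[0])
# 0
```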
Database Languages
• TCL stands for Transaction Control Language and includes the COMMIT, ROLLBACK, START TRANSACTION, and SET TRANSACTION commands. As a reminder, each transaction begins with a specific task and ends when all the tasks in the group are successfully completed. If any of the tasks fail, the transaction fails. Therefore, a transaction has only two results: success or failure.

• COMMIT is used to save all the changes made by a transaction to the database.
• ROLLBACK is used to undo the current transaction when an error occurs, restoring the database to its state at the last COMMIT.
• START TRANSACTION (or BEGIN) is used to start a new transaction.
• SET TRANSACTION is used to set parameters for the transaction.
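The commit/rollback pair can be sketched with sqlite3, whose connection objects expose COMMIT and ROLLBACK as methods (the account table and the transfer are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

# A transfer is one transaction: both updates succeed together or not at all.
try:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    conn.commit()       # COMMIT makes both changes permanent
except sqlite3.Error:
    conn.rollback()     # ROLLBACK undoes everything since the last COMMIT

# An uncommitted change can still be rolled back explicitly:
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.rollback()

print(conn.execute("SELECT balance FROM account ORDER BY id").fetchall())
# [(70,), (80,)]
```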
Database Languages
• DCL stands for Data Control Language and covers the GRANT and REVOKE statements.
• This language enables users to grant and revoke authorization on database objects.

• GRANT is used to give users permission to perform various read and write operations on the database system.
• REVOKE is used to take access back from users. It cancels a permission previously given with the GRANT command (withdraws the user's access privileges).
Database Languages
• DQL stands for Data Query Language and is used to retrieve records from a database table using the SELECT command, which gets data from database objects such as tables. This category of command is used to view the required records from a database table.

• Simply put, SELECT is used to find (view) the subset of records that satisfy a given predicate.
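A minimal SELECT-with-predicate sketch (sqlite3 again; the student data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO student VALUES (1, 'Ali', 19), (2, 'Ayse', 22), (3, 'Mehmet', 20);
""")

# SELECT returns exactly the subset of rows satisfying the WHERE predicate.
rows = conn.execute(
    "SELECT name FROM student WHERE age >= 20 ORDER BY name"
).fetchall()
print(rows)  # [('Ayse',), ('Mehmet',)]
```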
Typical DBMS Environment
Typical DBMS components, in a simplified form:
• The top part of the figure refers to the various users of the database environment and
their interfaces. The lower part shows the internal modules of the DBMS responsible for
storage of data and processing of transactions.
• DBA Staff works on defining the database and tuning it by making changes to its
definition using the DDL Statements and other Privileged Commands,
• Casual Users work with Interactive Query interfaces to formulate queries (queries are
parsed and validated for correctness of the query syntax),
• Application Programmers create programs using some host programming languages,
• Parametric Users do data entry work by supplying parameters to predefined
transactions.
• The DDL Compiler processes schema definitions, specified in the DDL, and stores
descriptions of the schemas (meta-data) in the DBMS catalog.
• The System Catalog includes information such as the names and sizes of files, names and
data types of data items, storage details of each file, mapping information among
schemas, and constraints.
• The Query Compiler parses and validates queries for correctness and compiles them.
• The Query Optimizer is concerned with the rearrangement and possible reordering of
operations, elimination of redundancies, and use of efficient search algorithms during
execution. It also generates executable code that performs the necessary operations for
the query and makes calls on the runtime processor.
• The Precompiler extracts DML commands from an application program written in a host
programming language. These commands are sent to the DML Compiler for compilation
into object code for database access. The rest of the program is sent to the host language
compiler.
• The Runtime Database Processor executes the privileged commands, the executable query plans, and the canned transactions with runtime parameters. It works with the
system catalog and may update it with statistics. It also works with the Stored Data Manager, which in turn uses basic operating system services for carrying out low-level
input/output (read/write) operations between the disk and main memory.
• Concurrency Control / Backup and Recovery Subsystems are integrated into the working of the runtime database processor for purposes of transaction management.
DB Models
• DB Data Model: Describes the structure of a database. It is a collection of conceptual tools for describing data, data relationships, and consistency constraints. There are various types of data models, such as:
1. Object-based logical models
a. E-R model
b. Functional model
c. Object-oriented model
d. Semantic model
2. Record-based logical models
a. Hierarchical database model
b. Network model
c. Relational model
3. Physical models
Database Entity-Relationship (E-R) Model
• The Entity-Relationship (E-R) data model perceives the real world as consisting of basic objects, called entities, and relationships among these objects. It was developed to facilitate database design by allowing specification of an enterprise schema that represents the overall logical structure of a database.
• Main features of the E-R model:
• The E-R model is a high-level conceptual model,
• Allows us to describe the data involved in a real-world enterprise in terms of objects and their relationships,
• Widely used to develop an initial design of a database,
• Provides a set of useful concepts that make it convenient for a developer to move from a basic set of information to a detailed description of information that can be easily implemented in a database system,
• Describes data as a collection of entities, relationships, and attributes.

• Briefly, the E-R data model employs three basic notions: entity sets, relationship sets, and attributes.
Database Entity-Relationship (E-R) Model
• Entity Sets: An entity is a "thing" or "object" in the real world that is distinguishable from all other objects.
• For example, each person in an enterprise is an entity.
• An entity has a set of properties and the values for some set of properties may uniquely identify an entity.
• BOOK is an entity, and its properties (called attributes) are bookcode, booktitle, price, etc.
• An entity set is a set of entities of the same type that share the same properties, or attributes.
• The set of all PERSONS who are customers at a given bank, for example, can be defined as the
entity set customer.
• Attributes: Attributes are descriptive properties possessed by each member of an entity set.
• An entity is represented by a set of attributes. For example:
• CUSTOMER is an entity and its attributes are customerid, customername, custaddress, etc.
• Attributes may be in different types:
• Key attribute: An attribute used to distinguish one entity from another in a group of entities (e.g., student_id).
• Composite attribute: An attribute that is made up of several other attributes (e.g., the student address is a composite attribute in the student entity, since an address is made up of other attributes such as street, town, and city).
• Multivalued attribute: An attribute that can hold several values (e.g., mobile_phone_number, since a person can have multiple phone numbers).
• Derived attribute: Doesn't have its own stored value but can be obtained from other attributes (e.g., age is a derived attribute, which can be computed from the current date and the date of birth).
Database Entity-Relationship (E-R) Model
• Relationship Sets:
• A relationship set is a set of relationships of the same type.
• Formally, it is a mathematical relation on n ≥ 2 entity sets.
• If E1, E2, …, En are entity sets, then a relationship set R is a subset of
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En},
where each tuple ri = (e1, e2, …, en) is a relationship instance.
• Consider the relationship set WORKS_FOR between EMPLOYEE and DEPARTMENT, which associates each employee with the department for which the employee works. Each relationship instance in WORKS_FOR associates one EMPLOYEE entity and one DEPARTMENT entity.
• The figure on the left illustrates this example, where each relationship instance ri is shown connected to the EMPLOYEE and DEPARTMENT entities that participate in ri.
• In the miniworld represented by this figure,
• the employees e1, e3, and e6 work for department d1,
• the employees e2 and e4 work for department d2,
• and the employees e5 and e7 work for department d3.
(Figure: Some instances in the WORKS_FOR relationship set, which represents a relationship type WORKS_FOR between EMPLOYEE and DEPARTMENT.)
Database Entity-Relationship (E-R) Model
• Relationship Types:
• Building efficient relations in a database is extremely important; it helps enforce referential integrity, which in turn contributes to database normalization.
• The term relation is sometimes used to refer to a table in a relational database. However, it is more often used to describe the relationships that exist between the tables in a relational database.

• There are 3 main types of relationship in a database:
1. one-to-one (1:1)
A one-to-one relationship occurs when a single instance of one entity is linked to a single instance of another entity.
2. one-to-many (1:n)
A one-to-many relationship occurs when a single instance of one entity is linked to several instances of another entity.
3. many-to-many (m:n)
A many-to-many relationship occurs when more than one instance of one entity is linked to several instances of another entity.
History of the RDBMS - Dr. Codd
• Edgar Frank "Ted" Codd (19 August 1923 - 18 April 2003)
• Invented the relational model for database management systems
• Read: https://history.computer.org/pioneers/codd.html
Normalization
• Relational databases are the backbone of many software
systems. They allow us to store, manage, and retrieve data
in an organized and efficient way. However, as the size and
complexity of our data grow, so do the challenges of
maintaining its integrity and consistency. This is where
database normalization comes in.
• Normalization is the process of designing a database schema in a way that reduces redundancy or duplication of records and eliminates various anomalies (such as insert, update, and delete anomalies).
• Normalization is an important process in designing a good database and eliminating the flaws of a bad database design. A badly designed database has issues in adding and deleting records from a table and leaves the table data in an inconsistent state.
• Briefly, database normalization is the process of organizing data in a database to minimize redundancy and dependency.
• The goal is to ensure that each piece of data is stored in one place and in one format. This improves data integrity, eliminates data duplication, and simplifies data management.
• The major advantages of normalization are:
• It reduces the complexity of big relations,
• It helps to reduce redundancy of records in a relation,
• It helps to eliminate various anomalies in a database,
• It also helps to maintain atomicity in a relation,
• Various normal forms of normalization are used to address different kinds of issues of a relation.
Problems Without Using Database Normalization
• A database that is not normalized correctly can lead to data anomalies. Three types of anomalies can occur: insertion, updation, and deletion.
• These anomalies can be avoided or minimized by designing databases that adhere to the principles of normalization.
Normalization involves organizing data into tables and applying rules to ensure data is stored in a consistent and efficient
manner. By reducing data redundancy and ensuring data integrity, normalization helps to eliminate anomalies and improve the
overall quality of the database.

1. Insertion Anomaly: An Insertion anomaly occurs when you can’t add data to a table without adding unrelated data because the
required fields are missing or because the data is incomplete.
For example, imagine you have a table that contains customer orders. If you don’t normalize the data, you might include the
customer’s address in the same table. It means you can’t add an order for a new customer without their address, even if you don’t
have that information yet.
2. Deletion Anomaly: A deletion anomaly occurs when you delete data from a table and unintentionally delete other data as well.
Specifically, deletion anomaly occurs when a row of data is deleted from a table containing the information required by other tables in
the database. This can result in the loss of information that is necessary for the other tables to function properly.
3. Updation Anomaly: An Updation anomaly occurs when you have to update the same data in multiple places. For example, imagine you
have a table that contains customer orders. If you don’t normalize the data, you might include the customer’s name and address in the
same table. If a customer moves, you must update their address in every order they’ve placed.
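The update anomaly in particular is easy to demonstrate. The following sketch (plain Python, invented data) mimics a denormalized orders table in which the customer's address is repeated on every order row:

```python
# Denormalized orders: the customer's address is repeated in every order row.
orders = [
    {"order_id": 1, "customer": "Ali", "address": "1 Elm St"},
    {"order_id": 2, "customer": "Ali", "address": "1 Elm St"},
    {"order_id": 3, "customer": "Ayse", "address": "9 Oak St"},
]

# If Ali moves, every one of his order rows must be updated ...
for row in orders:
    if row["customer"] == "Ali":
        row["address"] = "2 Pine St"

# ... and missing even one row would leave the data inconsistent.
addresses = {row["address"] for row in orders if row["customer"] == "Ali"}
print(addresses)  # {'2 Pine St'}
```

Normalizing the address into its own customer table would reduce the move to a single update.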
Normalization - 1st Normal Form
• This is the most basic level of normalization. The first normal form helps to eliminate duplicate data and simplify queries. A relation is in 1st Normal Form (1NF) if it holds the following:
1. Each cell must contain an atomic value.
2. No attribute may have multiple values.

• In other words, each table cell should contain only a single value, and each column (field) should have a unique name.
• The following relation named EMPLOYEE is not in 1NF because it has a multivalued attribute:

• The representation of the EMPLOYEE table in 1NF is shown below, where a separate row has been added for each mobile number so that every cell holds an atomic value:
Normalization - 1st Normal Form
• Another example for the 1st normal form:
• Relation STUDENT in Table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE.
• Its decomposition into 1NF is shown in Table 2.
Normalization - 1st Normal Form
• Another example for the 1st normal form:
• Below is a students’ record table that has information about student roll number, student name, student course, and age of the student.
• In the students' record table, you can see that the course column has two values; thus it does not follow the First Normal Form. If you apply the First Normal Form to the above table, you get the table below as a result:

• By applying 1NF, you achieve atomicity: every column holds only atomic values.
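The decomposition step itself can be sketched in a few lines of Python (invented data): each multivalued phone cell is split into separate rows so that every cell becomes atomic:

```python
# Non-1NF data: the phone column holds multiple values in one cell.
students = [
    (101, "Ali",  "555-0101, 555-0102"),
    (102, "Ayse", "555-0201"),
]

# Decompose into 1NF: one row per (student, phone) pair, every cell atomic.
students_1nf = [
    (sid, name, phone.strip())
    for sid, name, phones in students
    for phone in phones.split(",")
]

for row in students_1nf:
    print(row)
```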
Normalization - Keys
Before proceeding with the 2nd NF, let's get familiar with Primary Key, Candidate Key, Super Key, and Foreign Key.
• Primary Key: The primary key is the unique identifier for a table, and it’s used to enforce data integrity by ensuring that each
record in the table is uniquely identified.
• The primary key must be unique (no duplicate values), not null, and can’t be changed or updated once set.
• A table may have several columns (or column combinations) that could serve as a key; these are its candidate keys.
• Only one candidate key can be designated as the primary key.
Primary Key: SupplierID

Primary Key: ProductID


Normalization - Keys
• Candidate Key: A candidate key is a column, or a combination of columns, in a database table that can serve as a unique identifier. A table may have more than one candidate key, and any of them could be chosen as the primary key.
Candidate key: (OrderID, SupplierID)

Candidate key: (ProductID, SupplierID)

Candidate key: (OrderID, ProductID)


Normalization - Keys
• A candidate key is a set of one or more columns that can identify a record uniquely in a table, and you can use each candidate
key as a Primary Key. Now, let’s use an example to understand this better:
• Super Key: A super key is a set of one or more attributes that can uniquely identify a record in a table; a candidate key is a minimal super key, and the primary key is one chosen candidate key:
Normalization - Keys
• Foreign Key: If an attribute can only take values that are present as values of some other attribute, it is a foreign key to the attribute it refers to.
• The relation being referenced is called the referenced relation, and the corresponding attribute is called the referenced attribute.
• The referenced attribute of the referenced relation should be its primary key.

• A foreign key acts as a primary key in one table and as a secondary key in another table.
• It combines two or more relations (tables) at a time.
• Foreign keys act as a cross-reference between the tables.
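A foreign key in action can be sketched with sqlite3 (table names invented; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (10, 'Ali', 1);
""")

# A dept_id with no matching department row violates referential integrity:
try:
    conn.execute("INSERT INTO employee VALUES (11, 'Ayse', 99)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True
```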


Normalization - 2nd Normal Form
• A relation is in second normal form (2NF) if it holds the following properties:
1. It must be in 1NF.
2. Every non-prime attribute must be fully functionally dependent on every candidate key.
• The table should not possess partial dependency. Partial dependency means that a proper subset of a candidate key determines a non-prime attribute.

• In other words, 2NF eliminates redundant data by requiring that each non-key attribute depend on the whole primary key. Where partial dependencies exist, the relation is decomposed into new tables, which establish relationships with their parent tables using foreign keys.
• Let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.

• In the given table on the left, the non-prime attribute TEACHER_AGE depends on TEACHER_ID, which is a proper subset of a candidate key. That is why it violates the rule for 2NF.
• To convert the given table into 2NF, we decompose it into two tables:
Normalization - 2nd Normal Form
• Another example for the 2nd normal form:
• Below is a students' project table:
• In the StudentProject table, we have a partial dependency; let us see how:
• The prime (key) attributes are StudentID and ProjectID.
• A non-prime attribute that is functionally dependent on only part of a candidate key is partially dependent on that key.
• StudentName can be determined by StudentID alone, which makes the relation partially dependent.
• ProjectName can be determined by ProjectID alone, which makes the relation partially dependent.
• Therefore, the <StudentProject> relation violates 2NF and is considered a bad database design.

• To remove the partial dependency and the 2NF violation, decompose the above table:
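The target decomposition can be sketched as follows (sqlite3; names and rows invented). Each non-prime attribute now depends on a whole key, and a linking table carries the many-to-many relationship:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 2NF decomposition: each non-prime attribute depends on a whole key.
CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, project_name TEXT);
CREATE TABLE student_project (
    student_id INTEGER REFERENCES student(student_id),
    project_id INTEGER REFERENCES project(project_id),
    PRIMARY KEY (student_id, project_id)
);
INSERT INTO student VALUES (1, 'Ali'), (2, 'Ayse');
INSERT INTO project VALUES (10, 'Compiler'), (20, 'Database');
INSERT INTO student_project VALUES (1, 10), (2, 10), (2, 20);
""")

# Each name is stored once, no matter how many projects a student joins.
rows = conn.execute("""
    SELECT s.student_name, p.project_name
    FROM student_project sp
    JOIN student s ON s.student_id = sp.student_id
    JOIN project p ON p.project_id = sp.project_id
    ORDER BY s.student_name, p.project_name
""").fetchall()
print(rows)
```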
Normalization - 3rd Normal Form
• A relation is in third normal form (3NF) if it holds the following properties:
1. It must be in 2NF.
2. There shouldn't be any transitive dependency.
3. Every non-key column must be directly dependent on the primary key.

• The 3rd Normal Form ensures the reduction of data duplication. It is also used to achieve data integrity.

• As a reminder, a transitive dependency occurs when a non-key column is determined by another non-key column, which in turn is determined by the primary key. This can cause data redundancy and inconsistency.
Normalization - 3rd Normal Form
• Let's take the example of the Customers table we have been using, and suppose we add a new column CustomerState to track the state where the customer lives:

• Here, we have introduced a transitive dependency between CustomerState and CustomerID through CustomerAddress. That is, CustomerState is determined by CustomerAddress, which is determined by CustomerID.

• This can cause data redundancy and inconsistency: if the CustomerAddress for a customer changes, then we need to update the CustomerState for that customer as well, and forgetting to update one of the fields leads to inconsistency.
• To remove this transitive dependency, we can split the Customer table into two separate tables, one for the customers and one for their addresses.
• So finally, we will now have a total of three tables.

• In conclusion, the Third Normal Form (3NF) helps to remove transitive dependencies and makes the table more consistent and reliable. It is important to note that normalization does not guarantee performance optimization but helps maintain data consistency, which can lead to better performance.
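The split described above can be sketched as follows (sqlite3; columns simplified and data invented). After the decomposition, a change of address is a single UPDATE, so the state can never disagree with the address:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 3NF decomposition: state depends on the address, not on the customer key,
-- so address and state move to their own table.
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT,
    state      TEXT
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    address_id  INTEGER REFERENCES address(address_id)
);
INSERT INTO address VALUES (1, '1 Elm St', 'CA');
INSERT INTO customer VALUES (100, 'Ali', 1), (101, 'Ayse', 1);
""")

# One UPDATE fixes the address (and state) for every customer living there.
conn.execute("UPDATE address SET street = '2 Oak St', state = 'NY' WHERE address_id = 1")
rows = conn.execute("""
    SELECT c.name, a.state FROM customer c JOIN address a USING (address_id)
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ali', 'NY'), ('Ayse', 'NY')]
```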
