Best DBMS
Why is it important?
An Overview of the Database Management
• DBMS stands for Database Management System.
• DBMS is a collection of interrelated data and a set of programs to
access this data in a convenient and efficient way.
• It controls the organization, storage, retrieval, security and integrity of
data in a database.
• The first type of DBMS emerged in the 1960s and 1970s: the
hierarchical DBMS. IBM had the first model, developed on the IBM
System/360; its DBMS was called IMS and was originally written
for the Apollo program. This type of DBMS was based on trees (not
necessarily binary trees), where the data was shaped like a tree and
relationships were limited to parent and child records.
File System vs DBMS
• Data accessibility is easy
• Transaction support
• Concurrency control with Recovery services
• Authorization services
• Data consistency: the value of a data item is the same everywhere it appears.
• Allows multiple users to share a file at the same time
• Protection & Security
Types of Data
• Structured Data/Relational Database examples: Microsoft SQL Server,
Oracle Database, MySQL, PostgreSQL and IBM Db2
• Unstructured or semi-structured Data/NoSQL Database examples: Apache Cassandra,
MongoDB, CouchDB, and Couchbase.
Types of Database Users
• Administrators − Administrators maintain the DBMS and are responsible for
administering the database. They decide how the database is used and by whom.
They create access profiles for users and apply limitations to maintain
isolation and enforce security. Administrators also look after DBMS resources such as
system licenses, required tools, and other software and hardware maintenance.
• Designers − Designers are the group of people who work on the design
of the database. They decide what data should be kept and in what
format. They identify and design the whole set of entities, relations, constraints, and
views.
• End Users − End users are those who actually benefit from having a DBMS. End
users can range from simple viewers who pay attention to the logs or market rates to
sophisticated users such as business analysts.
Database Models
A database model shows the logical structure of a database, including
the relationships and constraints that determine how data can be
stored and accessed.
• Hierarchical Database Model
• Network Database Model
• Relational Database Model
• Object-Oriented Database Model
• Entity-Relationship Database Model
SQL Introduction
• SQL stands for Structured Query Language
• Declarative (non-procedural): a query states what data is wanted, not how to fetch it
• Keywords are not case sensitive
Types of Commands
DDL
• CREATE TABLE EMPLOYEE(Name VARCHAR2(20),
Email VARCHAR2(100), DOB DATE);
• DROP TABLE EMPLOYEE;
• ALTER is used to add a column, remove a column, or change a column's
name, datatype, or datatype length
• ALTER TABLE table_name ADD column_name column_definition;
• TRUNCATE TABLE table_name;
• RENAME old_name TO new_name;
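The DDL statements above can be tried out with a minimal sketch like the following, which runs them through Python's built-in sqlite3 module on an in-memory database. SQLite is an assumption made only to keep the example runnable: it uses VARCHAR instead of Oracle's VARCHAR2, it renames a table with ALTER TABLE ... RENAME TO, and it has no TRUNCATE statement. The Phone column and the STAFF name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a new table.
cur.execute("CREATE TABLE EMPLOYEE (Name VARCHAR(20), Email VARCHAR(100), DOB DATE)")

# ALTER: add a column (Phone is a hypothetical column for illustration).
cur.execute("ALTER TABLE EMPLOYEE ADD COLUMN Phone VARCHAR(15)")

# RENAME: give the table a new name (SQLite spelling of RENAME).
cur.execute("ALTER TABLE EMPLOYEE RENAME TO STAFF")

# List the column names of the renamed table.
columns = [row[1] for row in cur.execute("PRAGMA table_info(STAFF)")]
print(columns)

# DROP: remove the table together with its data.
cur.execute("DROP TABLE STAFF")
tables = cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```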
DML
• INSERT INTO table_name VALUES (value1, value2, value3, .... valueN);
• UPDATE table_name SET column_name1 = value1, ... column_nameN
= valueN [WHERE condition];
• DELETE FROM table_name [WHERE condition];
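As a rough illustration of the three DML statements, the sketch below runs them with Python's sqlite3 module on an in-memory database. The employee table and its values are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")

# INSERT: add rows, either in column order or with the columns named.
cur.execute("INSERT INTO employee VALUES (1, 'A', 10000)")
cur.execute("INSERT INTO employee (id, name, salary) VALUES (2, 'B', 20000)")

# UPDATE: change only the rows matched by WHERE.
cur.execute("UPDATE employee SET salary = 25000 WHERE id = 2")

# DELETE: remove only the rows matched by WHERE.
cur.execute("DELETE FROM employee WHERE id = 1")

rows = cur.execute("SELECT id, name, salary FROM employee").fetchall()
print(rows)
```

Leaving out the WHERE clause in UPDATE or DELETE applies the statement to every row of the table.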
DCL
• GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;
• REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;
Transaction Control Language
• COMMIT;
• ROLLBACK;
• SAVEPOINT SAVEPOINT_NAME;
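A small sketch of how COMMIT, ROLLBACK and SAVEPOINT interact, again using Python's sqlite3 module as an assumed stand-in for a full DBMS. The account table and the savepoint name s1 are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction statements below are sent to SQLite exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER, balance INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO account VALUES (1, 100)")
cur.execute("SAVEPOINT s1")
cur.execute("INSERT INTO account VALUES (2, 200)")
cur.execute("ROLLBACK TO s1")  # undo everything done after savepoint s1
cur.execute("COMMIT")          # make the remaining work permanent

rows = cur.execute("SELECT id, balance FROM account").fetchall()
print(rows)
```

Only the first insert survives: the second one was made after the savepoint and is undone by the partial rollback.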
DQL
• SELECT expressions FROM TABLES WHERE conditions;
• SELECT DISTINCT column_name FROM table_name; (returns each distinct value once)
Order by
• The ORDER BY keyword is used to sort the result-set in ascending or
descending order.
• By default ascending
• SELECT * FROM employee
ORDER BY Salary DESC;
Examples
• Employee table
ID   NAME   SALARY   DEPARTMENT
1    A      10000    IT
2    B      20000    HR
3    C      30000    IT
4    A      40000    SALES
5    D      50000    IT
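The Employee table above can be loaded into an in-memory SQLite database to try ORDER BY and SELECT DISTINCT. This is a minimal sketch using Python's sqlite3 module; the lowercase table and column names are a choice made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER, department TEXT)")
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "A", 10000, "IT"), (2, "B", 20000, "HR"), (3, "C", 30000, "IT"),
     (4, "A", 40000, "SALES"), (5, "D", 50000, "IT")],
)

# ORDER BY ... DESC: highest salary first.
by_salary = cur.execute("SELECT id FROM employee ORDER BY salary DESC").fetchall()

# SELECT DISTINCT: each name once, even though 'A' appears in two rows.
names = cur.execute("SELECT DISTINCT name FROM employee ORDER BY name").fetchall()

print(by_salary)
print(names)
```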
Aggregate Functions
• Sum
• Avg
• Count
• Max
• Min
Group by & Having
• The GROUP BY statement groups rows that have the same values into
summary rows, like "find the number of customers in each country".
• The GROUP BY statement is often used with aggregate functions
(COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or
more columns.
• The HAVING clause was added to SQL because the WHERE keyword
could not be used with aggregate functions.
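The following sketch applies the aggregate functions, GROUP BY and HAVING to the Employee table from the Examples slide, using Python's sqlite3 module. The query shapes are illustrative assumptions, not the only way to write them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER, department TEXT)")
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "A", 10000, "IT"), (2, "B", 20000, "HR"), (3, "C", 30000, "IT"),
     (4, "A", 40000, "SALES"), (5, "D", 50000, "IT")],
)

# Aggregates over the whole table: one summary row.
totals = cur.execute(
    "SELECT SUM(salary), AVG(salary), COUNT(*), MAX(salary), MIN(salary) FROM employee"
).fetchone()

# GROUP BY: one summary row per department.
per_dept = cur.execute(
    "SELECT department, COUNT(*), SUM(salary) FROM employee "
    "GROUP BY department ORDER BY department"
).fetchall()

# HAVING: filter the groups themselves (WHERE cannot test COUNT(*)).
big_depts = cur.execute(
    "SELECT department, COUNT(*) FROM employee "
    "GROUP BY department HAVING COUNT(*) > 1"
).fetchall()

print(totals, per_dept, big_depts)
```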
Operators
• AND, OR, NOT
• SELECT * FROM EMPLOYEE
WHERE DEPARTMENT='IT' OR DEPARTMENT='HR';
• SELECT * FROM EMPLOYEE
WHERE DEPARTMENT='IT' AND NAME='A';
• SELECT * FROM EMPLOYEE
WHERE NOT DEPARTMENT='IT';
• >, <, <=, >=, <>
• Between, Not between
• In, Not In
• SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);
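The operators above can be checked against the Employee table from the Examples slide. The sketch below uses Python's sqlite3 module; the helper name ids is hypothetical and simply returns the matching row ids for a WHERE condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER, department TEXT)")
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "A", 10000, "IT"), (2, "B", 20000, "HR"), (3, "C", 30000, "IT"),
     (4, "A", 40000, "SALES"), (5, "D", 50000, "IT")],
)

def ids(where):
    """Return the ids of the rows that satisfy the given WHERE condition."""
    query = f"SELECT id FROM employee WHERE {where} ORDER BY id"
    return [row[0] for row in cur.execute(query)]

it_or_hr = ids("department = 'IT' OR department = 'HR'")
it_and_a = ids("department = 'IT' AND name = 'A'")
not_it = ids("NOT department = 'IT'")
mid_salary = ids("salary BETWEEN 20000 AND 40000")  # both bounds are included
hr_or_sales = ids("department IN ('HR', 'SALES')")

print(it_or_hr, it_and_a, not_it, mid_salary, hr_or_sales)
```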
Joins and Nested Queries
• Joins were discussed in the relational algebra video; practice questions
will be discussed in the live class.
Command Syntax with example (SQL keywords are not case sensitive)
Create: Create table Student (id int, name varchar(10)); /* the table is named Student and has two
columns: id with datatype integer and name with varchar of size 10 */
Alter (it has many variations):
  Adding a new column -> Alter table Student add age int;
  Removing a column -> Alter table Student drop column age;
  Changing a column datatype -> Alter table Student modify id varchar(10);
  Renaming a column -> Alter table Student rename column id to new_id;
  Changing the size of a datatype -> Alter table Student modify id varchar(25);
Insert: Insert into Student values (1, 'Ram'); /* inserts into the Student table, with the values given
in column order */
Savepoint: Savepoint savepoint_name;
  For example:
  Savepoint S1;
  Savepoint S2;
Rollback: Rollback to savepoint_name;
  For example:
  Rollback to S1;
  Rollback to S2;
Introduction to Constraints in SQL
Constraints are rules applied to the data in the columns of a table.
UNIQUE: Ensures that all the values in the column are unique.
PRIMARY KEY: Ensures that all the values in the column are unique and not null.
FOREIGN KEY: A field that refers to the primary key (or a unique key) of another table and so
identifies a row in that table. It is used for referential integrity.
CHECK: Validates that the values of a column meet a particular condition. For
example, if the age column should contain only values greater than 18, CHECK is used.
DEFAULT: Specifies a default value for the column when no value is supplied by
the user. For example, if the date should default to 01/01/2000, DEFAULT is used.
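The constraints above can be seen in action with a small sketch using Python's sqlite3 module. The student table, its columns and the helper name violates are assumptions made for illustration; the age and date rules mirror the examples in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"            # unique identifier for each row
    " email TEXT UNIQUE,"                 # no two rows may share a value
    " age INTEGER CHECK (age > 18),"      # every value must satisfy the condition
    " joined TEXT DEFAULT '2000-01-01'"   # used when no value is supplied
    ")"
)
cur.execute("INSERT INTO student (id, email, age) VALUES (1, 'ram@example.com', 20)")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        cur.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_id = violates("INSERT INTO student (id, email, age) VALUES (1, 'x@example.com', 30)")
duplicate_email = violates("INSERT INTO student (id, email, age) VALUES (2, 'ram@example.com', 25)")
under_age = violates("INSERT INTO student (id, email, age) VALUES (3, 'sita@example.com', 16)")
default_date = cur.execute("SELECT joined FROM student WHERE id = 1").fetchone()[0]

print(duplicate_id, duplicate_email, under_age, default_date)
```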
RAID level   Description                                           Minimum drives   Space efficiency   Fault tolerance
RAID 1       Mirroring without parity or striping                  2                1/n                n − 1 drive failures
RAID 3       Byte-level striping with dedicated parity             3                1 − 1/n            One drive failure
RAID 4       Block-level striping with dedicated parity            3                1 − 1/n            One drive failure
RAID 5       Block-level striping with distributed parity          3                1 − 1/n            One drive failure
RAID 6       Block-level striping with double distributed parity   4                1 − 2/n            Two drive failures
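The space-efficiency column of the RAID table (1/n for RAID 1, 1 − 1/n for RAID 3, 4 and 5, and 1 − 2/n for RAID 6, with n drives) can be worked through with a short sketch. The function name space_efficiency is hypothetical, and the drives are assumed to be of equal size.

```python
from fractions import Fraction

def space_efficiency(level, n):
    """Fraction of the raw capacity usable for data with n equal-sized drives."""
    if level == 1:
        return Fraction(1, n)      # every drive holds a full copy
    if level in (3, 4, 5):
        return 1 - Fraction(1, n)  # one drive's worth of parity
    if level == 6:
        return 1 - Fraction(2, n)  # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")

raid1 = space_efficiency(1, 2)
raid5 = space_efficiency(5, 4)
raid6 = space_efficiency(6, 4)
print(raid1, raid5, raid6)
```

With four drives, for example, RAID 5 leaves three quarters of the capacity for data while RAID 6 leaves half.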
DBMS Architecture
2-Tier Architecture
The 2-Tier architecture is the same as the basic client-server model. In the two-tier architecture, applications on the
client end communicate directly with the database at the server side. APIs such as ODBC and JDBC are used for
this interaction.
The user interfaces and application programs run on the client side.
The server side is responsible for providing functionality such as query processing and transaction management.
To communicate with the DBMS, the client-side application establishes a connection with the server side.
3-Tier Architecture
The 3-Tier architecture contains another layer between the client and the server. In this architecture, the client
cannot communicate directly with the server.
The application on the client end interacts with an application server, which in turn communicates with the
database system.
The end user has no idea about the existence of the database beyond the application server. The database also has
no idea about any user beyond the application.
The 3-Tier architecture is used for large web applications.
3 Schema Architecture: Data abstraction.
Physical Level: At the physical level, the information about the location of database objects in the data store is
kept. Various users of DBMS are unaware of the locations of these objects. In simple terms, physical level of a
database describes how the data is being stored in secondary storage devices like disks and tapes etc.
Conceptual Level: At the conceptual level, data is represented in the form of database tables. For example, a
STUDENT database may contain STUDENT and COURSE tables, which are visible to users, but users are unaware
of how they are stored. This level defines tables, views, and integrity constraints. Also referred to as the logical
schema, it describes what kind of data is to be stored in the database.
External Level: The external level specifies a view of the data in terms of conceptual level tables. Each external
level view caters to the needs of a particular category of users. For example, the FACULTY of a university are
interested in the course details of students, while STUDENTS are interested in all details related to
academics, accounts, courses and hostels. So, different views can be generated for different users.
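One common way to realise an external-level view is a SQL view defined over the conceptual-level tables. The sketch below uses Python's sqlite3 module; the student table, its columns and the view name faculty_view are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conceptual level: the full table, including account data.
cur.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, course TEXT, fees_due INTEGER)")
cur.execute("INSERT INTO student VALUES (1, 'Ram', 'DBMS', 500)")

# External level: faculty see course details only, not the fees column.
cur.execute("CREATE VIEW faculty_view AS SELECT roll_no, name, course FROM student")

faculty_rows = cur.execute("SELECT * FROM faculty_view").fetchall()
print(faculty_rows)
```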
Data Independence
Data independence means that a change to the schema at one level should not affect the level above it. Two types
of data independence are present in this architecture:
Physical Data Independence: Any change in the physical location of tables and indexes should not affect the
conceptual level or the external view of data. This data independence is easy to achieve and is implemented by
most DBMSs.
Conceptual Data Independence: The conceptual level schema and the external level schema must be
independent. This means a change in the conceptual schema should not affect the external schema, e.g., adding or
deleting attributes of a table should not affect the user's view of the table. This type of independence is
more difficult to achieve than physical data independence because changes in the conceptual schema are often
reflected in the user's view.
Database Schema
A database schema is the skeleton of the database; it represents the logical view of the entire database.
A database schema does not contain any data.
A database schema defines its entities and the relationships among them.
Database Instance
A database instance is a snapshot of the data in the database at a particular moment.
Advantages of DBMS
• Minimized redundancy and data inconsistency
• Simplified Data Access
• Multiple data views
• Data Security
• Concurrent access to data
• Backup and Recovery mechanism
Database model
A Database model defines the logical design and structure of a database and defines how data will
be stored, accessed and updated in a database management system. While the Relational Model is
the most widely used database model, there are other models too:
• Hierarchical Model
• Network Model
• Entity-relationship Model
• Relational Model
Hierarchical Model
This database model organizes data into a tree-like-structure, with a single root, to which all the other data is
linked. The hierarchy starts from the Root data, and expands like a tree, adding child nodes to the parent nodes.
In this model, a child node will only have a single parent node.
This model efficiently describes many real-world relationships like index of a book, recipes etc.
In the hierarchical model, data is organized into a tree-like structure with a one-to-many relationship between two
different types of data. For example, one department can have many courses and many professors, and one course
can have many students.
Network Model
This is an extension of the Hierarchical model. In this model data is organized more like a graph, and a child node
is allowed to have more than one parent node.
Because more relationships are established in this model, the data is more closely related, which can make
accessing it easier and faster. This database model was used to
map many-to-many data relationships.
This was the most widely used database model before the Relational Model was introduced.
Entity-relationship Model
In this database model, an object of interest is modelled as an entity and its characteristics as
attributes.
Different entities are related using relationships.
E-R Models represent these relationships in pictorial form to make them easier for different
stakeholders to understand.
This model is good for designing a database, which can then be turned into tables in the relational model.
For example, if we have to design a School Database, then Student will be an entity with attributes
name, age, address etc. As Address is generally complex, it can be another entity with attributes street name,
pincode, city etc., and there will be a relationship between them.
Relational Model
In this model, data is organized in two-dimensional tables and the relationship is maintained by storing a
common field.
This model was introduced by E. F. Codd in 1970, and since then it has been the most widely used database
model.
The basic structure of data in the relational model is tables. All the information related to a particular type is
stored in rows of that table.
Hence, tables are also known as relations in relational model.
In the coming tutorials we will learn how to design tables, normalize them to reduce data redundancy and how
to use Structured Query language to access data from tables.
Schema
• Schema can be defined as the design of a database. The overall
description of the database is called the database schema.
• You can relate it to something like the type and variable
declarations of a program in a programming
language.
• Subschema is subset of schema which allows the user to view only
their authorized part.
Types
1. Physical Schema: The design of a database at the physical level is called
the physical schema; it describes how the data is stored in blocks of
storage.
2. Logical schema: Logical schema can be defined as the design of
database at logical level. In this level, the programmers as well as the
database administrator (DBA) work.
3. View Schema: View schema can be defined as the design of
database at view level which generally describes end-user interaction
with database systems.
What is an Instance?
• Databases change over time as information is inserted and deleted.
The collection of information stored in the database at a particular
moment is called an instance.
Data Independence/Transparency
• There are two types of data independence: physical and
logical data independence.
• Physical data independence is the ability to modify the
physical schema without causing application programs to
be rewritten.
• Logical data independence is the ability to modify the
logical schema without causing application programs to be
rewritten.
• Logical data independence is more difficult to achieve than
physical data independence.
What is Key
• A DBMS key is an attribute or a set of attributes that helps you to
identify a row (tuple) in a relation (table).
• Simple vs Composite
NAME AGE
Ravi 20
Aman 21
Ravi 20
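The sample rows above can be used to test whether a set of attributes could serve as a key. The sketch below is only a uniqueness check on the rows shown: a duplicate rules a key out, but uniqueness in one sample does not prove that a key holds for all data. The function name is_unique and the ROLL_NO column are hypothetical.

```python
def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

students = [
    {"NAME": "Ravi", "AGE": 20},
    {"NAME": "Aman", "AGE": 21},
    {"NAME": "Ravi", "AGE": 20},
]

name_is_key = is_unique(students, ["NAME"])             # simple key attempt
name_age_is_key = is_unique(students, ["NAME", "AGE"])  # composite key attempt

# Adding a hypothetical ROLL_NO column gives every row a distinct value.
with_roll = [dict(row, ROLL_NO=i) for i, row in enumerate(students, start=1)]
roll_is_key = is_unique(with_roll, ["ROLL_NO"])

print(name_is_key, name_age_is_key, roll_is_key)
```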
Types of Keys
• Candidate Key
• Primary Key
• Alternate Key
• Super Key
• Foreign Key
Candidate Key
• A CANDIDATE KEY is a minimal set of attributes that uniquely identifies tuples in a
table
• The Primary key should be selected from the candidate keys.
• Every table must have at least a single candidate key.
• A table can have multiple candidate keys but only a single primary
key.
Primary Key
• Unique + Not Null
• Only one in a table
• Can be composite
• Q. Can a table exist without a primary key attribute?
Alternate Key
• Alternate keys = candidate keys − primary key (the candidate keys not chosen as the primary key)
Super Key
• A superset of a candidate key
• Must contain a candidate key, plus any other attributes
• Q. R(ABCD) contains four attributes and A is the only candidate key.
Find all super keys.
• Q. R(ABCD) contains four attributes and A and B are the two
candidate keys. Find all super keys.
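The two super key questions can be checked by brute force: list every non-empty subset of the attributes and keep those that contain at least one candidate key. This is a minimal sketch; the function name super_keys is hypothetical.

```python
from itertools import combinations

def super_keys(attributes, candidate_keys):
    """All attribute sets that contain at least one candidate key."""
    result = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(key) <= set(combo) for key in candidate_keys):
                result.append("".join(combo))
    return result

one_key = super_keys("ABCD", ["A"])        # A is the only candidate key
two_keys = super_keys("ABCD", ["A", "B"])  # A and B are both candidate keys

print(len(one_key), one_key)
print(len(two_keys), two_keys)
```

The counts agree with the usual formulas: 2^3 = 8 super keys when A is the only candidate key, and 8 + 8 − 4 = 12 when both A and B are candidate keys.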
Foreign Key
• FOREIGN KEY is a column that creates a relationship between two
tables
• Can have duplicate values
• Maintain Referential Integrity
Q. Can a table contain many foreign keys?
Q. Can a foreign key have NULL values?
Referenced table vs Referencing table
(Insertion & deletion)
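The insertion and deletion rules for the referenced and referencing tables can be sketched with Python's sqlite3 module. The department and employee tables are hypothetical, and the behaviour shown is SQLite's default (no ON DELETE action specified); SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
cur = conn.cursor()

# Referenced (parent) table.
cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
# Referencing (child) table: dept_id is the foreign key.
cur.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)

cur.execute("INSERT INTO department VALUES (10, 'IT')")
cur.execute("INSERT INTO employee VALUES (1, 10)")
cur.execute("INSERT INTO employee VALUES (2, 10)")    # duplicate FK values are allowed
cur.execute("INSERT INTO employee VALUES (3, NULL)")  # a NULL FK is allowed

def rejected(sql):
    """Return True if the statement breaks referential integrity."""
    try:
        cur.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Inserting into the referencing table fails when the parent row is missing.
bad_insert = rejected("INSERT INTO employee VALUES (4, 99)")
# Deleting from the referenced table fails while child rows still point at it.
bad_delete = rejected("DELETE FROM department WHERE dept_id = 10")

print(bad_insert, bad_delete)
```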
Integrity Constraints
• Integrity constraints are a set of rules used to maintain the
quality of information.
• Unique key constraint
• Primary key Constraint
• Referential integrity constraint
• Domain Constraint ( Check constraint)
Questions
• Which of the following is True?
A) All the candidate keys can be called as super keys
B) All the Super keys can be called as candidate keys
Topics to be covered
• Definition with examples
• Notations
• What is Entity
• Attributes and their types
• Relationship and their types
• Minimization
• Generalization
• Specialization
• Aggregation
• Conversion of ER to tables
ER model (Entity-Relationship Model)
1. Strong Entity Set:
Symbols Used:
• A single rectangle is used for representing a strong entity set.
• A diamond symbol is used for representing the relationship that exists between two strong entity sets.
• A single line is used for representing the connection of the strong entity set with the relationship set.
• A double line is used for representing the total participation of an entity set with the relationship set.
• Total participation may or may not exist in the relationship.
2. Weak Entity Set:
• A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify its entities.
• In other words, a primary key does not exist for a weak entity set.
• However, it contains a partial key called a discriminator.
• Discriminator can identify a group of entities from the entity set.
• Discriminator is represented by underlining with a dashed line.
• The combination of discriminator and primary key of the strong entity set makes it possible to uniquely
identify all entities of the weak entity set.
• Thus, this combination serves as a primary key for the weak entity set.
• Clearly, this primary key is not formed by the weak entity set completely.
Symbols Used
• A double rectangle is used for representing a weak entity set.
• A double diamond symbol is used for representing the relationship that exists between the strong and
weak entity sets and this relationship is known as identifying relationship.
• A double line is used for representing the connection of the weak entity set with the relationship set.
• Total participation always exists in the identifying relationship.
In ER diagram, weak entity set is always present in total participation with the identifying relationship set.
For example, if Apartment is a weak entity set that depends on the strong entity set Building, then the
primary key of Apartment = Building number + Door number.
Closure Property
• The set of all attributes that can be functionally determined from
an attribute set is called the closure of that attribute set.
• The closure of attribute set {X} is denoted as {X}+.
Consider a relation R(A , B , C , D , E , F , G ) with the functional dependencies
A → BC, BC → DE, D → F, CF → G
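The closure of an attribute set can be computed by repeatedly applying every functional dependency whose left side is already contained in the result. The sketch below is one straightforward way to do this for the relation above; the function name closure and the string encoding of the dependencies are choices made for the example.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole left side is known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# A -> BC, BC -> DE, D -> F, CF -> G
fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

a_plus = "".join(sorted(closure("A", fds)))
bc_plus = "".join(sorted(closure("BC", fds)))
d_plus = "".join(sorted(closure("D", fds)))
print(a_plus, bc_plus, d_plus)
```

Here {A}+ contains all seven attributes, while {D}+ is only {D, F}.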
Find Candidate Key from Closure
• A minimal set of attributes whose closure contains all the attributes of
the relation is called a candidate key of that
relation.
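Candidate keys can be found by brute force: try attribute sets from smallest to largest and keep those whose closure is the whole relation, skipping any superset of a key already found. This is a sketch that is only practical for small relations; the function names are hypothetical, and the two inputs are the closure example R(A, B, C, D, E, F, G) and the cyclic dependencies A → B, B → C, C → D, D → A.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A superset of a key already found cannot be minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if closure(combo, fds) == set(attributes):
                keys.append("".join(combo))
    return keys

keys_r = candidate_keys("ABCDEFG", [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")])
keys_cycle = candidate_keys("ABCD", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
print(keys_r, keys_cycle)
```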
Minimal Cover/Canonical Cover/Irreducible
• Example: R(W, X, Y, Z) with FDs X → W, WZ → XY, Y → WXZ
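A minimal cover is usually computed in three steps: split every right-hand side into single attributes, remove extraneous attributes from left-hand sides, then remove dependencies implied by the others. The sketch below follows those steps for the example above. The function names are hypothetical, and because a minimal cover is not unique in general, a different processing order may give a different but equivalent result.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right-hand sides so each FD determines one attribute.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: drop extraneous attributes from each left-hand side.
    for i, (lhs, a) in enumerate(cover):
        for attr in sorted(lhs):
            reduced = cover[i][0] - {attr}
            if reduced and a in closure(reduced, cover):
                cover[i] = (reduced, a)
    cover = list(dict.fromkeys(cover))  # remove exact duplicates, keep order
    # Step 3: drop any FD that the remaining FDs already imply.
    for fd in list(cover):
        rest = [f for f in cover if f != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return ["".join(sorted(lhs)) + "->" + a for lhs, a in cover]

# X -> W, WZ -> XY, Y -> WXZ
cover = minimal_cover([("X", "W"), ("WZ", "XY"), ("Y", "WXZ")])
print(cover)
```

For this input the result is X → W, WZ → Y, Y → X and Y → Z, which can also be written as X → W, WZ → Y, Y → XZ.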
What is a Functional Dependency?
Second Normal Form (2NF)
• A relation must be in 1NF and
• No partial dependency should exist in the relation.
• Partial Dependency: If a non-prime attribute can be determined by
a proper part of a candidate key in a relation, it is known as a partial
dependency.
• If the L.H.S. is a proper subset of a candidate key and the R.H.S. is a
non-prime attribute, the FD is a partial dependency.
Partial Dependency &
Prime, Non-Prime attributes
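• Example: R(ABCD) with FDs AB → C, C → D, B → D and candidate key AB.
Prime attributes (part of some candidate key): A, B. Non-prime attributes: C, D.
B → D is a partial dependency, because B is a proper subset of AB and D is non-prime.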
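The partial dependency test can be written directly from the definition: the left side must be a proper subset of some candidate key and the right side must contain a non-prime attribute. A minimal sketch for the R(ABCD) example, with a hypothetical function name:

```python
def partial_dependencies(fds, candidate_keys):
    """FDs whose left side is a proper subset of a candidate key and whose
    right side contains at least one non-prime attribute."""
    # Prime attributes are those that belong to some candidate key.
    prime = set().union(*(set(key) for key in candidate_keys))
    found = []
    for lhs, rhs in fds:
        part_of_key = any(set(lhs) < set(key) for key in candidate_keys)
        has_non_prime = bool(set(rhs) - prime)
        if part_of_key and has_non_prime:
            found.append(f"{lhs}->{rhs}")
    return found

# AB -> C, C -> D, B -> D with candidate key AB
partial = partial_dependencies([("AB", "C"), ("C", "D"), ("B", "D")], ["AB"])
print(partial)
```

Only B → D is reported, so the relation is not in 2NF. C → D is not partial because C is not part of the key.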
Third Normal Form (3NF)
• A relation must be in second normal form (2NF), and no
transitive functional dependency may exist for non-prime attributes in
the relation.
Rollno   State           City
1        Punjab          Chandigarh
2        Haryana         Ambala
3        Punjab          Chandigarh
4        Haryana         Ambala
5        Uttar Pradesh   Ghaziabad
Check whether a table is in 3NF or not
• For every non-trivial FD X → Y, at least one of the following must hold:
• X is a super key (or candidate key), OR Y is a prime attribute, i.e., Y is
part of a candidate key.
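The 3NF test can be sketched as a check on each functional dependency X → Y: it passes if X is a super key or every attribute of Y is prime. The example below encodes the Rollno/State/City table with single letters (R, S, C) and assumes the dependencies Rollno → State and State → City, which are read off the sample rows rather than stated in the text. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def third_nf_violations(attributes, fds, candidate_keys):
    """Non-trivial FDs X -> a where X is not a super key and a is non-prime."""
    prime = set().union(*(set(key) for key in candidate_keys))
    violations = []
    for lhs, rhs in fds:
        lhs_is_super_key = closure(lhs, fds) == set(attributes)
        for a in rhs:
            trivial = a in lhs
            if not trivial and not lhs_is_super_key and a not in prime:
                violations.append(f"{lhs}->{a}")
    return violations

# R = Rollno, S = State, C = City; assumed FDs: R -> S, S -> C; key: R
whole = third_nf_violations("RSC", [("R", "S"), ("S", "C")], ["R"])

# After decomposing into (Rollno, State) and (State, City):
part1 = third_nf_violations("RS", [("R", "S")], ["R"])
part2 = third_nf_violations("SC", [("S", "C")], ["S"])
print(whole, part1, part2)
```

Under these assumptions the original table violates 3NF through the transitive dependency Rollno → State → City, and the two smaller tables do not.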
Boyce-Codd Normal Form (BCNF)
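• A relation is in BCNF if it is in 3NF and, for every non-trivial functional dependency X → Y,
the L.H.S. (X) is a super key of the table.
• Example: R(ABCD) with FDs A → B, B → C, C → D, D → A. Each of A, B, C and D is a
candidate key, so every L.H.S. is a super key and R is in BCNF.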
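The BCNF condition can be checked mechanically: for every non-trivial dependency, the closure of the left side must be the whole relation. The sketch below tests the cyclic example A → B, B → C, C → D, D → A and, for contrast, the 2NF example AB → C, C → D, B → D. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a super key."""
    return [
        f"{lhs}->{rhs}"
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]

cycle = bcnf_violations("ABCD", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
other = bcnf_violations("ABCD", [("AB", "C"), ("C", "D"), ("B", "D")])
print(cycle, other)
```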
Fourth Normal Form
• The relation is in BCNF.
• And no multivalued dependency exists in the relation.
• Multivalued dependency: For a dependency X → Y, if for a single
value of X, multiple values of Y exist, then the relation may have a
multivalued dependency. It is represented by the double arrow
sign (→→).
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF, does not contain any join
dependency, and can be joined back losslessly.
• No spurious tuples
• 5NF is also known as Project-join normal form (PJ/NF).
What is Relational Algebra?
• Relational algebra is a widely used procedural query
language (and a formal one).
• Mathematical expressions
• Building block of SQL
• SQL, by contrast, is a declarative (non-procedural) language
Basic Relational Algebra Operations
• Unary Relational Operations
• SELECT (symbol: σ)
• PROJECT (symbol: π)
• RENAME (symbol: ρ Rho )
• Relational Algebra Operations From Set Theory
• UNION (∪)
• INTERSECTION (∩)
• DIFFERENCE (−)
• CARTESIAN PRODUCT (×)
• Binary Relational Operations or Derived
• JOIN
• DIVISION
Examples
Union, Intersection, Difference
• Both tables must have the same number of attributes.
• Attribute domains need to be compatible.
Cross Product (a table with M tuples × a table with N tuples gives M*N tuples)
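Since a relation is a set of tuples, the set operations and the cross product can be sketched with Python sets. The relations r and s below are hypothetical; both have two attributes with compatible domains, as union, intersection and difference require.

```python
from itertools import product

r = {(1, "A"), (2, "B"), (3, "C")}  # 3 tuples
s = {(2, "B"), (4, "D")}            # 2 tuples

union = r | s          # tuples in r or s (duplicates collapse)
intersection = r & s   # tuples in both r and s
difference = r - s     # tuples in r but not in s

# Cross product: every tuple of r paired with every tuple of s.
cross = {a + b for a, b in product(r, s)}

print(len(union), intersection, difference, len(cross))
```

The cross product has 3 × 2 = 6 tuples, each with four attributes.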
Types of JOIN
• Various forms of join operation are:
• Inner Joins:
EQUI join: A ⋈ (A.column = B.column) B
Natural join: A ⋈ B
• Outer join:
Left Outer Join
Right Outer Join
Full Outer Join
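The join types can be compared on two small tables with Python's sqlite3 module. The student and marks tables are hypothetical. Only the equi join, natural join and left outer join are shown, because right and full outer joins are available only in newer SQLite releases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE marks (id INTEGER, score INTEGER)")
cur.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ram"), (2, "Sita"), (3, "Aman")])
cur.executemany("INSERT INTO marks VALUES (?, ?)", [(1, 80), (2, 90), (4, 70)])

# Equi join: rows are matched with an explicit equality condition.
equi = cur.execute(
    "SELECT s.name, m.score FROM student s JOIN marks m ON s.id = m.id ORDER BY s.id"
).fetchall()

# Natural join: matches on the common column (id), which appears only once.
natural = cur.execute(
    "SELECT * FROM student NATURAL JOIN marks ORDER BY name"
).fetchall()

# Left outer join: unmatched students are kept, with NULL for the score.
left = cur.execute(
    "SELECT s.name, m.score FROM student s LEFT OUTER JOIN marks m ON s.id = m.id ORDER BY s.id"
).fetchall()

print(equi, natural, left)
```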
Division
• When a query asks for "every" or "all", we use division
• Example: find the students who have completed all the tasks
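Division can be sketched as: keep each student who is paired with every task in the divisor. The relation completed(student, task), the task names and the function name divide are hypothetical.

```python
def divide(pairs, divisor):
    """Relational division: every x such that (x, y) is in pairs for all y in divisor."""
    candidates = {x for x, _ in pairs}
    return {x for x in candidates if all((x, y) in pairs for y in divisor)}

completed = {
    ("Ram", "T1"), ("Ram", "T2"),
    ("Sita", "T1"),
    ("Aman", "T1"), ("Aman", "T2"),
}
tasks = {"T1", "T2"}

finished_all = divide(completed, tasks)
print(finished_all)
```

Sita is excluded because she has no (Sita, T2) tuple.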