SQL Database
(Database Management)
Databases
• SQL stands for Structured Query Language
• it was first developed in the 1970s by IBM researchers Raymond
Boyce and Donald Chamberlin
• it is a database language used for constructing statements processed
by a database server
Databases
DATABASE
a database is a set of persistent data used by a given
application and managed by a database management system (DBMS)
THE DATA IN THE MAIN MEMORY (RAM) IS LOST EVERY TIME THE
APPLICATION FINISHES OR WE SWITCH OFF THE COMPUTER !!!
DATABASE LANGUAGE
every database server has a database language with
which we can execute statements – SQL is a database language
File Systems and Databases
There are 2 types of memory:
1.) main memory (RAM): all the data structures considered so far are stored in the main memory, and accessing it is FAST
stack memory and heap memory are located in the main memory as well
2.) external memory (hard drive storage): it can store large amounts and sizes of files, such as file systems or databases, but accessing it is SLOW
[Figure: a B-tree stored in the EXTERNAL MEMORY, with the keys 8, 12 and 20 in the root node]
B-Trees
B-TREE PROPERTIES
1.) every node of the tree structure can contain at most m keys, so it may have at most m+1 children (branching factor)
2.) every node (except the root) is at least half full, so it contains at least m/2 items
3.) if the number N of items in a node drops below m/2 then we merge it with another node, and if N > m then we split the node
4.) all leaf nodes are at the same level (balanced)
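A minimal Python sketch of the properties above, assuming a hand-built tree with at most m = 3 keys per node (so at most 4 children). The names Node, search, leaf_depths and check_key_counts are hypothetical, and the sketch covers only searching and checking the properties; inserting with splits and removing with merges are left out.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]                                        # sorted keys, at most m per node
    children: list[Node] = field(default_factory=list)    # empty for leaf nodes

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(node: Node, key: int) -> bool:
    """Walk down from the root, visiting one node (one disk block) per level."""
    while True:
        i = bisect_left(node.keys, key)          # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.is_leaf:
            return False
        node = node.children[i]                  # child i holds the keys between keys[i-1] and keys[i]


def leaf_depths(node: Node, depth: int = 0) -> set[int]:
    """Collect the depth of every leaf; a balanced B-tree yields exactly one value."""
    if node.is_leaf:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


def check_key_counts(node: Node, m: int, is_root: bool = True) -> bool:
    """At most m keys per node, at least m // 2 keys per non-root node, keys + 1 children."""
    ok = len(node.keys) <= m and (is_root or len(node.keys) >= m // 2)
    if not node.is_leaf:
        ok = ok and len(node.children) == len(node.keys) + 1
    return ok and all(check_key_counts(c, m, False) for c in node.children)


# m = 3: the root holds 3 keys and therefore has 4 children
root = Node([8, 12, 20], [Node([-2]), Node([9, 11]), Node([14, 19]), Node([32])])

print(search(root, 11), search(root, 13))   # True False
print(leaf_depths(root))                    # {1}
```

Searching only touches one node per level, which is why the height of the tree (and not the number of items) decides how many slow disk reads a lookup needs.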
Procedural and Non-Procedural Programming Languages
Programming Languages
PROCEDURAL LANGUAGES
• we can define the methods and variables
• there are conditional statements and loops as well
• we have full control over the application
• procedural languages define the results and the entire mechanism of
an application
• C, C++ or Java are procedural programming languages
Programming Languages
NON-PROCEDURAL LANGUAGES
• we define the query with SQL, but the database engine takes care of the entire mechanism
• the database engine (optimizer) takes the SQL statement and decides what the most efficient execution will be
• some database vendors enable us to write complete scripts: PL/SQL (Oracle) or T-SQL (Microsoft SQL Server)
• MySQL has stored procedures, so it supports procedural commands as well
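To contrast the two styles, here is a small sketch in Python, with the standard-library sqlite3 module standing in for the database engine; the person table and its rows are hypothetical. The SQL statement only states what result we want, while the loop spells out how to compute it step by step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("Kevin", 19), ("Joe", 34), ("Anna", 22), ("Adam", 29)])

# Non-procedural (declarative): we describe the result, the engine chooses how to get it
declarative = [name for (name,) in conn.execute(
    "SELECT name FROM person WHERE age > 20 ORDER BY name")]

# Procedural: we control every step (fetching, filtering, sorting) ourselves
rows = conn.execute("SELECT name, age FROM person").fetchall()
procedural = []
for name, age in rows:
    if age > 20:
        procedural.append(name)
procedural.sort()

print(declarative)   # ['Adam', 'Anna', 'Joe']
print(procedural)    # ['Adam', 'Anna', 'Joe']
```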
Programming Languages
[Figure: an SQL STATEMENT passes through the Query Language Processor, the OPTIMIZER, the DBMS Engine and the Transaction Manager before it reaches the PHYSICAL DATABASE]
Database Management Systems
Database Management Systems
• there are several database management systems available, for example Oracle Database and MySQL
Oracle Database
• the Oracle database uses the so-called PL/SQL language
• it stands for Procedural Language for SQL
• it has the advantages of procedural languages: loops and conditional statements
• it is quite fast and has better performance in general
• it can handle errors and exceptions
MS SQL Database
• Microsoft's SQL Server uses the so-called T-SQL (Transact-SQL) language
• it has a transaction control feature
• it can handle errors and exceptions
• we can define variables as well
MySQL
• MySQL has a huge advantage: it is open-source
• it is suited for small (and medium) websites, such as WordPress sites
• it cannot handle huge datasets efficiently, and transactions are not handled efficiently either
• it has poor performance scaling
NoSQL
• NoSQL refers to non-relational database systems (document stores, key-value stores, wide-column stores and graph databases)
• MongoDB (a document database) and Apache Cassandra (a wide-column store) are NoSQL databases
• it is extremely useful in big data and real-time web applications
• it has a huge advantage: it is scalable
Data Types
Data Types
1.) CHARACTER DATA
char(20): fixed-length character data
• the maximum length for char is 255 characters
• for example, we can store country codes with 3 letters (HUN, USA or GER) efficiently with char(3)
varchar(20): variable-length character data
• the maximum length for varchar is 65 535 bytes
• for example, we can store names efficiently with varchar(30) because the length of names differs
Data Types
2.) TEXT DATA
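If we want to store a huge text then varchar is not the best option possible: use the TEXT types instead (in MySQL, TEXT holds up to 64 KB, MEDIUMTEXT up to 16 MB and LONGTEXT up to 4 GB, so data that exceeds 64 KB needs MEDIUMTEXT or LONGTEXT)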
Foreign Keys
PERSON: this is the child table (or referencing table); the university column stores foreign keys that reference the given institutions in the UNIVERSITY table
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
3   Anna   22   1
4   Adam   29   1
UNIVERSITY: this is the parent table (or referenced table); it has the primary keys (IDs) that are being referenced
ID  NAME
1   MIT
2   Harvard
Foreign Keys
• what happens to the foreign keys when the values of the parent table are updated (ON UPDATE) or removed (ON DELETE)?
• there are several strategies to deal with this problem
Foreign Keys
1.) RESTRICT
This is the simplest solution: it is forbidden to update or remove a row of the parent table while the child table has matching rows (we cannot modify the referenced value)
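A minimal sketch of RESTRICT with Python's sqlite3 module, using the PERSON and UNIVERSITY rows from this section; the lowercase table and column names are assumptions. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite ignores foreign keys unless this is enabled
conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        age INTEGER,
        university INTEGER REFERENCES university(id)
            ON DELETE RESTRICT ON UPDATE RESTRICT
    )""")
conn.executemany("INSERT INTO university VALUES (?, ?)", [(1, "MIT"), (2, "Harvard")])
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)",
                 [(1, "Kevin", 19, None), (2, "Joe", 34, None),
                  (3, "Anna", 22, 1), (4, "Adam", 29, 1)])

try:
    conn.execute("DELETE FROM university WHERE id = 1")   # MIT is still referenced
    deleted = True
except sqlite3.IntegrityError as err:
    deleted = False
    print("rejected:", err)

# Harvard (id 2) is not referenced by anyone, so removing it is allowed
conn.execute("DELETE FROM university WHERE id = 2")
print(conn.execute("SELECT id, name FROM university").fetchall())   # [(1, 'MIT')]
```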
2.) SET NULL
This is another solution to the problem: whenever the entry in the parent table is removed (or updated), the matching values in the child table are set to NULL
"If a row from the parent table has a matching row in the child table, MySQL sets the value to NULL"
Foreign Keys
For example, if we remove MIT (ID 1) from the UNIVERSITY table, the rows of Anna and Adam in the PERSON table reference nothing any more, so their university values are set to NULL:
PERSON (child table) after the removal
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
3   Anna   22   NULL
4   Adam   29   NULL
UNIVERSITY (parent table) after the removal
ID  NAME
2   Harvard
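A minimal sketch of ON DELETE SET NULL with Python's sqlite3 module, replaying the removal of MIT; the lowercase table and column names are assumptions. SQLite behaves like MySQL here, but only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # foreign keys are off by default in SQLite
conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        age INTEGER,
        university INTEGER REFERENCES university(id) ON DELETE SET NULL
    )""")
conn.executemany("INSERT INTO university VALUES (?, ?)", [(1, "MIT"), (2, "Harvard")])
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)",
                 [(1, "Kevin", 19, None), (2, "Joe", 34, None),
                  (3, "Anna", 22, 1), (4, "Adam", 29, 1)])

# Removing MIT sets the matching foreign keys in the child table to NULL (None in Python)
conn.execute("DELETE FROM university WHERE id = 1")

persons = conn.execute("SELECT name, university FROM person ORDER BY id").fetchall()
print(persons)
# [('Kevin', None), ('Joe', None), ('Anna', None), ('Adam', None)]
```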
Foreign Keys
3.) CASCADE
This is another solution to the problem: whenever the entry in the parent table is removed (or updated), the matching rows in the child table are removed (or updated) as well
Foreign Keys
For example, if we update the ID of MIT from 1 to 10 (ON UPDATE CASCADE), the matching foreign keys in the PERSON table are updated to 10 as well:
PERSON (child table)
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
3   Anna   22   10
4   Adam   29   10
UNIVERSITY (parent table)
ID  NAME
10  MIT
2   Harvard
Foreign Keys
If we remove MIT (ID 1) from the UNIVERSITY table (ON DELETE CASCADE), the matching rows of Anna and Adam are removed from the PERSON table as well:
PERSON (child table)
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
UNIVERSITY (parent table)
ID  NAME
2   Harvard
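A minimal sketch of both cascading actions with Python's sqlite3 module, replaying the two examples (update MIT's ID to 10, then remove MIT from a fresh copy); the setup helper and the lowercase names are hypothetical.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create the PERSON/UNIVERSITY example with cascading foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # required for SQLite to enforce foreign keys
    conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("""
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            name TEXT,
            age INTEGER,
            university INTEGER REFERENCES university(id)
                ON UPDATE CASCADE ON DELETE CASCADE
        )""")
    conn.executemany("INSERT INTO university VALUES (?, ?)", [(1, "MIT"), (2, "Harvard")])
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)",
                     [(1, "Kevin", 19, None), (2, "Joe", 34, None),
                      (3, "Anna", 22, 1), (4, "Adam", 29, 1)])
    return conn


# ON UPDATE CASCADE: changing MIT's ID from 1 to 10 updates the child rows as well
conn = setup()
conn.execute("UPDATE university SET id = 10 WHERE id = 1")
after_update = conn.execute("SELECT name, university FROM person ORDER BY id").fetchall()
print(after_update)
# [('Kevin', None), ('Joe', None), ('Anna', 10), ('Adam', 10)]

# ON DELETE CASCADE: removing MIT also removes the persons that referenced it
conn = setup()
conn.execute("DELETE FROM university WHERE id = 1")
after_delete = conn.execute("SELECT name FROM person ORDER BY id").fetchall()
print(after_delete)
# [('Kevin',), ('Joe',)]
```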
Joining Tables
Inner Join
• what if we want to combine the values of multiple database tables?
• this is when we have to use the JOIN keyword
• there are several ways to combine multiple tables
TABLE 1: PERSON, TABLE 2: UNIVERSITY
UNIVERSITY
ID NAME
1 MIT
2 Harvard
3 Cambridge
Inner Join
SELECT PERSON.person_name, UNIVERSITY.university_name FROM PERSON
INNER JOIN UNIVERSITY ON PERSON.university = UNIVERSITY.university_id
RESULT SET
person_name university_name
Ana MIT
Adam Harvard
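A minimal sketch with Python's sqlite3 module that reproduces the result set above. The column names come from the query; the PERSON rows (Kevin and Joe without a university, Ana at MIT, Adam at Harvard) and the person_id column are assumptions, since only the UNIVERSITY table is listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE university (university_id INTEGER PRIMARY KEY, university_name TEXT)")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, university INTEGER)")
conn.executemany("INSERT INTO university VALUES (?, ?)",
                 [(1, "MIT"), (2, "Harvard"), (3, "Cambridge")])
# Hypothetical PERSON rows, chosen to match the result set above
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Kevin", None), (2, "Joe", None), (3, "Ana", 1), (4, "Adam", 2)])

# INNER JOIN keeps only the rows that have a match in both tables
inner = conn.execute("""
    SELECT person.person_name, university.university_name FROM person
    INNER JOIN university ON person.university = university.university_id
    ORDER BY person.person_id""").fetchall()
print(inner)   # [('Ana', 'MIT'), ('Adam', 'Harvard')]
```

Kevin and Joe have no university and nobody studies at Cambridge, so all three disappear from the inner join.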
Left Join
SELECT PERSON.person_name, UNIVERSITY.university_name FROM PERSON
LEFT JOIN UNIVERSITY ON PERSON.university = UNIVERSITY.university_id
RESULT SET
person_name university_name
Kevin NULL
Joe NULL
Ana MIT
Adam Harvard
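A LEFT JOIN keeps every row of the left table (PERSON) and fills in NULL (None in Python) where no university matches. A minimal sketch with Python's sqlite3 module; the PERSON rows and the person_id column are assumptions chosen to reproduce the result set above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE university (university_id INTEGER PRIMARY KEY, university_name TEXT)")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, university INTEGER)")
conn.executemany("INSERT INTO university VALUES (?, ?)",
                 [(1, "MIT"), (2, "Harvard"), (3, "Cambridge")])
# Hypothetical PERSON rows, chosen to match the result set above
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Kevin", None), (2, "Joe", None), (3, "Ana", 1), (4, "Adam", 2)])

# LEFT JOIN: every person appears, with NULL where there is no matching university
left = conn.execute("""
    SELECT person.person_name, university.university_name FROM person
    LEFT JOIN university ON person.university = university.university_id
    ORDER BY person.person_id""").fetchall()
print(left)
# [('Kevin', None), ('Joe', None), ('Ana', 'MIT'), ('Adam', 'Harvard')]
```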
Right Join
SELECT PERSON.person_name, UNIVERSITY.university_name FROM PERSON
RIGHT JOIN UNIVERSITY ON PERSON.university = UNIVERSITY.university_id
RESULT SET
person_name university_name
Ana MIT
Adam Harvard
NULL Cambridge
Right Join
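• what is the difference between right join and left join?
• a LEFT JOIN keeps every row of the left (first) table, while a RIGHT JOIN keeps every row of the right (second) table; missing matches are filled with NULL values
• so left join and right join clauses are equivalent: they can replace each other if we reverse the order of the tables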
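A minimal sketch of the equivalence with Python's sqlite3 module: PERSON RIGHT JOIN UNIVERSITY returns the same rows as UNIVERSITY LEFT JOIN PERSON. The LEFT JOIN form is the main example because SQLite only supports RIGHT JOIN from version 3.39.0; the PERSON rows and the person_id column are assumptions chosen to match the result set above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE university (university_id INTEGER PRIMARY KEY, university_name TEXT)")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, university INTEGER)")
conn.executemany("INSERT INTO university VALUES (?, ?)",
                 [(1, "MIT"), (2, "Harvard"), (3, "Cambridge")])
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Kevin", None), (2, "Joe", None), (3, "Ana", 1), (4, "Adam", 2)])

# PERSON RIGHT JOIN UNIVERSITY written as UNIVERSITY LEFT JOIN PERSON:
# every university is kept, the person is NULL where nobody references it
swapped = conn.execute("""
    SELECT person.person_name, university.university_name FROM university
    LEFT JOIN person ON person.university = university.university_id
    ORDER BY university.university_id""").fetchall()
print(swapped)
# [('Ana', 'MIT'), ('Adam', 'Harvard'), (None, 'Cambridge')]

# Newer SQLite versions also accept RIGHT JOIN directly
right = None
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right = conn.execute("""
        SELECT person.person_name, university.university_name FROM person
        RIGHT JOIN university ON person.university = university.university_id
        ORDER BY university.university_id""").fetchall()
    print(right == swapped)   # True
```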
Subqueries
Subqueries
• a subquery is a standard SQL statement or query contained in
another SQL query
• it is always enclosed within parentheses
Subqueries
There are 2 main types of subqueries:
NON-CORRELATED SUBQUERIES
CORRELATED SUBQUERIES
Non-Correlated Subqueries
Non-correlated subqueries are totally self-contained, which means they do not reference anything from the containing statement
SELECT * FROM city WHERE id <> (SELECT id FROM city WHERE code='HUN'); ERROR !!!
The subquery returns multiple rows and we cannot compare a single value to a set of values. We can check whether a given value can be found within a set of values with the IN and NOT IN keywords:
SELECT * FROM city WHERE id NOT IN (SELECT id FROM city WHERE code='HUN'); GOOD !!!
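A minimal sketch of the NOT IN version with Python's sqlite3 module; the city rows are hypothetical. Note that SQLite, unlike MySQL, does not reject the <> version: it silently compares against the first row of the subquery, so only the NOT IN query is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, code TEXT)")
conn.executemany("INSERT INTO city VALUES (?, ?, ?)",
                 [(1, "Budapest", "HUN"), (2, "Debrecen", "HUN"),
                  (3, "New York", "USA"), (4, "Berlin", "GER")])

# The subquery could run on its own: it does not reference the outer query at all
not_hungarian = conn.execute("""
    SELECT name FROM city
    WHERE id NOT IN (SELECT id FROM city WHERE code = 'HUN')
    ORDER BY id""").fetchall()
print(not_hungarian)   # [('New York',), ('Berlin',)]
```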
Correlated Subqueries
Correlated subqueries are NOT self-contained, which means they do reference something from the containing statement (so the subquery cannot be executed on its own)
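A minimal correlated subquery sketch with Python's sqlite3 module, listing the persons who are older than the average age at their own university; the person table and its rows are hypothetical. The inner query refers to p.university from the outer query, so it is (logically) evaluated again for every outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER, university TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [("Kevin", 19, "MIT"), ("Joe", 34, "MIT"),
                  ("Anna", 22, "Harvard"), ("Adam", 29, "Harvard")])

# p.university comes from the outer query: this is what makes the subquery correlated
older_than_average = conn.execute("""
    SELECT p.name FROM person p
    WHERE p.age > (SELECT AVG(age) FROM person WHERE university = p.university)
    ORDER BY p.name""").fetchall()
print(older_than_average)   # [('Adam',), ('Joe',)]
```

The MIT average is 26.5 and the Harvard average is 25.5, so only Joe and Adam qualify.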
- SIMPLIFY QUERIES -
if the relationships of the database tables are simple
then we can construct simple queries as well
Redundancy
STUDENT
ID NAME AGE UNIVERSITY UNIVERSITY ADDRESS
1 Kevin 19 Harvard Cambridge, MA, USA
2 Joe 34 Harvard Cambridge, MA, USA
3 Ana 22 Harvard Cambridge, MA, USA
4 Adam 29 Harvard Cambridge, MA, USA
UNIVERSITY
ID NAME ADDRESS
1 Harvard Cambridge, MA, USA
Data Normalization
• how to achieve data normalization and how to reduce redundancy?
UNIVERSITY
ID NAME
1 Harvard
2 MIT
Data Normalization: Third Normal Form
• first we should make sure that the second normal form related principles are not violated
• the given database table must not have any transitive dependencies: a non-key column must not depend on another non-key column
STUDENT
ID NAME SUBJECT TEACHER
1 Kevin Java Harvard
2 Ana C Harvard
3 Daniel C++ MIT
4 Adam Python Harvard
if the database is locked then any other users wishing to access (read or update) that data must wait until the lock has been released
1.) locking
2.) "versioning"
Locking
1.) database writers must request (and receive) a write lock from the server to modify the given data, while there may be multiple read locks
ONLY ONE WRITE LOCK IS GIVEN OUT AT A TIME FOR A GIVEN TABLE
database readers must request (and receive) a read lock from the server to query data
For example, MS SQL Server uses locking; the problem is that it can lead to long wait times
Locking
READ LOCK
• multiple sessions can acquire the read lock at the same time
• the session can only read while holding the read lock (no write operations are allowed)
• other sessions that want the write lock have to wait for the read lock to be released
WRITE LOCK
• the session that has the write lock can read and write as well
• other sessions cannot even read from the database table while the write lock is not released
• the read lock is a shared lock, while the write lock is exclusive
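The shared read lock and the exclusive write lock can be sketched in Python with threading.Condition. This is a teaching sketch, not how any particular database server implements locks; the class ReadWriteLock and its method names are hypothetical.

```python
import threading


class ReadWriteLock:
    """Shared read lock and exclusive write lock (a teaching sketch, not production code)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0        # sessions currently holding the read lock
        self._writer = False     # True while one session holds the write lock

    @property
    def readers(self) -> int:
        with self._cond:
            return self._readers

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:              # readers wait while someone writes
                self._cond.wait()
            self._readers += 1               # many readers may hold the lock at once

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()      # a waiting writer may proceed now

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:   # exclusive: nobody else inside
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()          # wake up both readers and writers


lock = ReadWriteLock()
lock.acquire_read()
lock.acquire_read()                          # the read lock is shared: no waiting
print(lock.readers)                          # 2

writer_done = threading.Event()


def writer() -> None:
    lock.acquire_write()                     # blocks until both readers are gone
    writer_done.set()
    lock.release_write()


t = threading.Thread(target=writer)
t.start()
print(writer_done.wait(timeout=0.2))         # False: the writer is still waiting
lock.release_read()
lock.release_read()
t.join()
print(writer_done.is_set())                  # True
```

This simple version favors readers: if new readers keep arriving, a writer may wait for a long time, which is exactly the "long wait times" problem mentioned above.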
Locking
THERE ARE MULTIPLE TYPES OF LOCK:
table lock: only a single user can manipulate a given database table at a given time
• quite a slow approach but easy to implement
• it is a problem if there are long-running queries while data is being modified
examples: Oracle server, MySQL (MySQL uses both techniques)
Storage Engine for MySQL
We have been discussing locking techniques. MySQL supports several storage engines with different locking strategies, and we can specify which one we want to use:
MyISAM: non-transactional engine with table locking
InnoDB: transactional engine with row-level locking
ACID Principles
A C I D
atomicity consistency isolation durability
Atomicity
• the atomicity principle requires that every single transaction is "all or nothing"
• it means that if one part of the transaction fails then the entire transaction fails, and the database state is left unchanged ("rollback")
• so an aborted transaction never leaves partial changes behind
• an atomic system must guarantee atomicity in every situation, even when errors or power failures happen
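A minimal sketch of atomicity with Python's sqlite3 module: a money transfer consists of two UPDATE statements, and using the connection as a context manager commits both or rolls both back. The account table, the CHECK rule that forbids negative balances and the transfer function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Kevin", 100), ("Anna", 50)])
conn.commit()


def transfer(amount: int) -> bool:
    """Move money from Kevin to Anna: both updates happen, or neither does."""
    try:
        with conn:   # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'Anna'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'Kevin'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer(30))    # True
print(transfer(500))   # False: Kevin's balance would become negative
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('Anna', 80), ('Kevin', 70)]
```

In the failed transfer Anna's balance had already been increased, but the rollback undoes that first UPDATE as well.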
Consistency
• consistency property makes sure that any transaction will bring the
database from one valid state to another
• the data inserted into the database must be valid
• the data must satisfy all the defined rules, including constraints, cascades and triggers
Isolation
• the isolation property ensures that the concurrent execution of
transactions results in a system state that would be obtained if
transactions were executed serially
• serially means one after the other
• it is basically concurrency control
• the effects of an incomplete transaction should not even be visible to other transactions
Durability
• the durability property ensures that once a transaction has been
committed it will remain so
• even in the event of power loss or errors
• to defend against power loss, transactions must be recorded in non-volatile memory
Views
Views
• it is a general principle in software engineering that applications should hide the concrete implementation
• this is one of the reasons why SOLID principles and design patterns came to be
• users can use the application but know nothing about the implementation details
• good database design follows the same principle
• we should keep database tables private, and users should access data only through a set of views
Views
• views are SQL statements that are stored in the database
• views do not involve data storage, which means views do not consume disk space (materialized views are an exception)
• we just assign a name to a SELECT query and store this query for others to use
• users can use this view to access data just like querying tables directly
• we can use all the statements: SELECT, GROUP BY, ORDER BY ...
Views
If the APPLICATION accesses the DATABASE TABLES directly, the problem with this approach is that the application may access private information (credit card data etc.)
Views
[Figure: the APPLICATION accesses the DATABASE TABLES only through a VIEW]
Views
• for example, exposing an entire bank account may violate some policies
• we define a view, so the user can access only a portion of the bank account data (for example, the data needed to verify an identity)
• so we can limit access to a given database table with views
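A minimal sketch of the bank account example with Python's sqlite3 module; the bank_account table, the account_public view and the card numbers are made up. The view exposes the owner and the last four digits of the card, which can be enough to verify an identity, but neither the full card number nor the balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE bank_account (
    id INTEGER PRIMARY KEY, owner TEXT, card_number TEXT, balance INTEGER)""")
conn.executemany("INSERT INTO bank_account VALUES (?, ?, ?, ?)",
                 [(1, "Kevin", "4111111111111111", 100),
                  (2, "Anna", "5500000000000004", 250)])

# The view is just a named SELECT: it stores no data of its own
conn.execute("""CREATE VIEW account_public AS
    SELECT id, owner, substr(card_number, -4) AS card_last4
    FROM bank_account""")

rows = conn.execute("SELECT * FROM account_public ORDER BY id").fetchall()
print(rows)      # [(1, 'Kevin', '1111'), (2, 'Anna', '0004')]
columns = [d[0] for d in conn.execute("SELECT * FROM account_public").description]
print(columns)   # ['id', 'owner', 'card_last4']
```

SQLite has no user accounts or GRANT statement, so here the view only shapes the data; in servers such as MySQL or Oracle we would additionally give users SELECT rights on the view but not on the underlying table.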
Views
1.) VIEWS CAN HIDE COMPLEXITY
3.) SUMMARY: we can summarize data from various tables, which can be used to generate reports (for example with JasperReports)
Indexes
Indexing
• when we insert a new item into a database table, the items are not sorted by default
• the new item is always inserted into the next available location
• indexing is a mechanism for finding a specific item fast
Indexing
ID  NAME
1   Adam
6   Kevin
3   Ana
8   Michael
5   Daniel
4   Sofia
• the items of a database table are not necessarily ordered by default
• DATABASE TABLES ARE STORED IN FILES IN THE EXTERNAL MEMORY
• there may be gaps in the table because of the previous remove operations
Indexing
• indexes are special tables in the database that are kept in sorted order, in the form of a tree-like structure
• INDEXES DO NOT CONTAIN ALL THE DATA
• the indexes store just the indexed column (usually the primary key) used to locate a given row in a database table
• basically a pointer to where the given row is physically located
• indexes can speed up SELECT queries and WHERE clauses
• but they can dramatically slow down UPDATE and INSERT statements
• the query optimizer decides whether it is worth using a given index for the query
Indexing
ALTER TABLE table_name ADD INDEX index_name (column_name)
Better approach when the columns can have a small number of values
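The ALTER TABLE ... ADD INDEX statement above is MySQL syntax. In the minimal sketch below, using Python's sqlite3 module, CREATE INDEX plays the same role, and EXPLAIN QUERY PLAN asks the optimizer how it runs a query before and after the index exists. The person table, its generated rows and the index name idx_person_age are hypothetical, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person (name, age) VALUES (?, ?)",
                 [(f"person_{i}", i % 90) for i in range(1000)])

query = "SELECT name FROM person WHERE age = 42"


def plan(sql: str) -> str:
    """Return the optimizer's plan; the last column of each row is the description."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(query)
conn.execute("CREATE INDEX idx_person_age ON person (age)")
after = plan(query)
print(before)   # a full table scan, e.g. SCAN person
print(after)    # an index lookup, e.g. SEARCH person USING INDEX idx_person_age (age=?)
```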
Indexing
• we use a tree-like structure (B-tree) to store and represent the indexes
• this is how we can find items in O(log N): logarithmic running time complexity
• why is it slow to INSERT or UPDATE items?
• because we have to update (and sometimes restructure) the tree-like structure as well
• actually we store key-value pairs
• the index is the key and the value is the pointer to the actual row in the table
[Figure: a search tree with the key 12 in the root, 4 and 20 below it, and 1, 5 and 23 in the leaves]
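A rough sketch of the key-value idea in Python: a sorted list of (key, pointer) pairs searched with the standard bisect module stands in for the B-tree. This is a deliberate simplification (inserting into a Python list costs O(N), while a B-tree insert costs O(log N) plus occasional node splits); the table slots, the names lookup and insert, and the extra row Joe are hypothetical.

```python
from bisect import bisect_left, insort

# The "table": rows stored in insertion order, with gaps left by earlier removals
table = {0: (1, "Adam"), 1: (6, "Kevin"), 2: (3, "Ana"),
         4: (8, "Michael"), 5: (5, "Daniel"), 7: (4, "Sofia")}

# The index: (key, pointer) pairs kept in sorted key order
index = sorted((row[0], slot) for slot, row in table.items())


def lookup(key: int):
    """Binary search on the index (O(log N)), then follow the pointer to the row."""
    i = bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return table[index[i][1]]
    return None


def insert(slot: int, row: tuple) -> None:
    """Every INSERT has to maintain the index as well, keeping it sorted."""
    table[slot] = row
    insort(index, (row[0], slot))


print(lookup(8))   # (8, 'Michael')
print(lookup(2))   # None
insert(3, (2, "Joe"))
print(lookup(2))   # (2, 'Joe')
```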
Disadvantages of Indexing
• do not use indexes unnecessarily
• every time we insert a new item or remove an item, the indexes (data structures) must be updated as well
• so if we have several indexes then the database server has to maintain several tree-like data structures
• and of course indexes use memory as well
• indexes should not be used on small database tables
• columns that are frequently manipulated should not be indexed
Constraints
Constraints
• constraints are restrictions to a given database table column
• NOT NULL constraint
• PRIMARY KEY constraint
• FOREIGN KEY constraint
• UNIQUE KEY constraint
• CHECK constraint
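A minimal sketch with Python's sqlite3 module that violates each constraint above once; the tables, columns and values are hypothetical. SQLite reports every violation as sqlite3.IntegrityError, and it checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute("""CREATE TABLE person (
    id INTEGER PRIMARY KEY,                        -- PRIMARY KEY constraint
    name TEXT NOT NULL,                            -- NOT NULL constraint
    email TEXT UNIQUE,                             -- UNIQUE constraint
    age INTEGER CHECK (age >= 0),                  -- CHECK constraint
    university INTEGER REFERENCES university(id)   -- FOREIGN KEY constraint
)""")
conn.execute("INSERT INTO university VALUES (1, 'MIT')")
conn.execute("INSERT INTO person VALUES (1, 'Kevin', 'kevin@example.com', 19, 1)")

violations = {
    "NOT NULL":    "INSERT INTO person VALUES (2, NULL, 'a@example.com', 20, 1)",
    "PRIMARY KEY": "INSERT INTO person VALUES (1, 'Joe', 'joe@example.com', 34, 1)",
    "UNIQUE":      "INSERT INTO person VALUES (3, 'Anna', 'kevin@example.com', 22, 1)",
    "CHECK":       "INSERT INTO person VALUES (4, 'Adam', 'adam@example.com', -5, 1)",
    "FOREIGN KEY": "INSERT INTO person VALUES (5, 'Ana', 'ana@example.com', 22, 99)",
}

rejected = []
for constraint, sql in violations.items():
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as err:
        rejected.append(constraint)
        print(f"{constraint:12} -> {err}")

print(len(rejected))   # 5
```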
Bitmap Indexing
Bitmap Indexing
the main problem with tree-like structures is that they may get too large: they contain a huge number of items, and this is why they may become a bit slower
Bitmap Indexing
ADS
ID  NAME             FOR CHILDREN  STATUS
1   website_traffic  FALSE         pending
2   udemy_mail       FALSE         approved
3   facebook_ad      TRUE          rejected
4   instagram_ad     FALSE         approved
Bitmaps (one bit per row, rows 1 to 4 from left to right):
TRUE      0 0 1 0
FALSE     1 1 0 1
PENDING   1 0 0 0
REJECTED  0 0 1 0
APPROVED  0 1 0 1
Bitmap Indexing
SELECT name FROM ads WHERE status = 'pending'
we only have to read the PENDING bitmap (1 0 0 0): the only set bit belongs to row 1, so the result is website_traffic
Bitmap Indexing
SELECT name FROM ads WHERE children = 'true' AND status = 'rejected'
we combine the TRUE bitmap (0 0 1 0) and the REJECTED bitmap (0 0 1 0) with a bitwise AND, which gives 0 0 1 0, so the result is row 3: facebook_ad
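A minimal Python sketch of a bitmap index over the ADS table, assuming one Python int per distinct value where bit i stands for row i (so the bits are numbered from the right, while the slides list rows from left to right); the names build_bitmaps and rows_of are hypothetical.

```python
# The ADS table from the example: (id, name, for_children, status)
ads = [
    (1, "website_traffic", False, "pending"),
    (2, "udemy_mail", False, "approved"),
    (3, "facebook_ad", True, "rejected"),
    (4, "instagram_ad", False, "approved"),
]


def build_bitmaps(rows, column):
    """One integer per distinct value; bit i is set if row i has that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps


def rows_of(bitmap, rows):
    """Translate the set bits back into ad names."""
    return [row[1] for i, row in enumerate(rows) if (bitmap >> i) & 1]


children = build_bitmaps(ads, 2)   # keys: False, True
status = build_bitmaps(ads, 3)     # keys: 'pending', 'approved', 'rejected'

# WHERE status = 'pending': just read one bitmap
print(rows_of(status["pending"], ads))                     # ['website_traffic']
# WHERE children = 'true' AND status = 'rejected': bitwise AND of two bitmaps
print(rows_of(children[True] & status["rejected"], ads))   # ['facebook_ad']
# OR conditions work the same way with a bitwise OR
print(rows_of(status["pending"] | status["approved"], ads))
# ['website_traffic', 'udemy_mail', 'instagram_ad']
```

Combining conditions costs a single machine-level AND or OR per word of bits, which is why bitmap indexes suit columns with only a few distinct values.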
Database Data Structures
[Figure: DISK MEMORY (HDD) → MAIN MEMORY (RAM) → PROGRAM]
• the data located on the hard drive disk cannot be processed directly
• it must be brought into the main memory
• in the main memory (RAM) the program can use either the stack memory or the heap memory
• we can manipulate (read or write) only whole blocks (pages)
• the page size usually ranges from 2 to 16 KB
• ACCESSING THE BLOCKS IS SLOW !!!
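A small Python sketch of block-based access: to read one short record, the whole page that contains it is transferred from the file into main memory. The 4 KB page size, the record layout and the name read_record are assumptions for illustration, and records that cross a page boundary are not handled.

```python
from __future__ import annotations

import tempfile

PAGE_SIZE = 4096   # assumed page size; real engines typically use about 2-16 KB


def read_record(f, offset: int, length: int) -> tuple[bytes, int]:
    """Fetch the whole page that holds the record, then cut the record out of it in RAM."""
    page_no = offset // PAGE_SIZE
    f.seek(page_no * PAGE_SIZE)
    page = f.read(PAGE_SIZE)                  # one block transfer: disk -> main memory
    start = offset - page_no * PAGE_SIZE
    return page[start:start + length], len(page)


with tempfile.TemporaryFile() as f:
    # A tiny "table file" of two pages; the second page starts with two 8-byte records
    f.write(b"x" * PAGE_SIZE)
    f.write(b"Adam....Kevin..." + b"." * (PAGE_SIZE - 16))
    record, bytes_read = read_record(f, PAGE_SIZE + 8, 5)

print(record, bytes_read)   # b'Kevin' 4096
```

Reading 5 useful bytes costs a full 4096-byte page, so disk-friendly data structures try to pack as much useful information as possible into every page.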
Database Data Structures
• accessing items in the external memory (HDD) is way slower than manipulating the main memory
• we need totally different data structures
Database Data Structures
[Figure: a B-tree and a B+ tree with the leaf keys 0 1 5 6 8 12 13 17; the leaves of the B+ tree point to the table rows (for example 0 Adam, 1 Daniel, 5 Ana, 6 Kevin, 13 Michael, 17 Sofia)]
Database Data Structures
SELECT * FROM city WHERE id BETWEEN 6 AND 12;
[Figure: a B+ tree with the internal keys 5 and 12 above the leaf keys 0 1 5 6 8 12 13 17]
we search for the lower bound 6 in the tree only once, then we walk along the sorted leaf level until we pass 12, so the result contains the keys 6, 8 and 12
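A simplified Python sketch of this range query, assuming a B+ tree with one internal node (separator keys 5 and 12) above three linked leaves holding the keys 0 1 5 6 8 12 13 17; the row values are placeholders and the names Leaf, Internal, find_leaf and range_query are hypothetical. Only the leaves hold row pointers, and the links between the leaves let us walk from 6 to 12 without going back up the tree.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Leaf:
    keys: list[int]
    rows: list[str]                        # only the leaves point to the table rows
    next_leaf: Optional[Leaf] = None       # leaves are linked in sorted order


@dataclass
class Internal:
    keys: list[int]                        # separator keys, no row pointers
    children: list = field(default_factory=list)


def find_leaf(node, key: int) -> Leaf:
    """Go down from the root to the leaf that may contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_query(root, low: int, high: int) -> list[tuple[int, str]]:
    """Like WHERE id BETWEEN low AND high: one tree search, then a walk along the leaves."""
    leaf = find_leaf(root, low)
    i = bisect_left(leaf.keys, low)
    result = []
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > high:
                return result
            result.append((leaf.keys[i], leaf.rows[i]))
            i += 1
        leaf, i = leaf.next_leaf, 0
    return result


l3 = Leaf([12, 13, 17], ["row 12", "row 13", "row 17"])
l2 = Leaf([5, 6, 8], ["row 5", "row 6", "row 8"], l3)
l1 = Leaf([0, 1], ["row 0", "row 1"], l2)
root = Internal([5, 12], [l1, l2, l3])

print(range_query(root, 6, 12))
# [(6, 'row 6'), (8, 'row 8'), (12, 'row 12')]
```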
Database Data Structures
B TREES: all the nodes have pointers to the database table rows
B+ TREES: just the leaf nodes have pointers to the table rows
we have an application with the user interface components (buttons etc.)
modern applications communicate with the server (and the database) via the internet
[Figure: the application calls a STORED PROCEDURE on the server, which works with TABLE #1, TABLE #2 and TABLE #3]