SQL Database

Uploaded by

Ashok T-rex
Copyright
© © All Rights Reserved
What is SQL?

(Database Management)
Databases
• SQL stands for Structured Query Language
• it was first developed in the 1970s by IBM researchers Raymond
Boyce and Donald Chamberlin
• it is a database language used for constructing statements processed
by a database server
Databases

DATABASE
a database is a set of persistent data used by a given
application and managed by a database management system (DBMS)

persistent data means that it remains in the database permanently

THE DATA IN THE MAIN MEMORY (RAM) IS LOST EVERY TIME THE
APPLICATION FINISHES OR WE SWITCH OFF THE COMPUTER !!!
Databases

DATABASE MANAGEMENT SYSTEM (DATABASE SERVER)
the data present in a given database is managed by a separate
software system – MySQL is a DBMS

DATABASE LANGUAGE
every database server has a database language with
which we can execute statements – SQL is a database language
Databases
APPLICATION -> SQL -> DBMS

the DBMS stores the indexes of the data in a B+ tree

THIS IS WHY INDEXED OPERATIONS
HAVE GUARANTEED O(logN)
RUNNING TIMES !!!
The Relational Model
• SQL is based on an abstract mathematical theory – this is the famous
relational model
• the relational model was first proposed in 1970 at IBM by E. F. Codd
• his relational model provides the theoretical foundation for database languages
• data are stored in tables

SQL is a relational database language


The Relational Model

ID Name Gender Age


1 Kevin Hart male 21
2 Joe Smith male 46
3 Ana Burrows female 27
4 James Whistler male 38
5 Emma Watson female NULL
The Relational Model
• the rows in a table have no specific order – the content of the table is a
set of rows
• there is a single value associated with a given row and a given column
and this is called an atomic value
• unknown values are called NULL values
• the table must satisfy so-called integrity constraints – for example the age
value should be greater than 0
• every single row has a unique primary key or ID
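
A minimal sketch of these rules, assuming Python's built-in sqlite3 module as a stand-in for a real database server. The person table, its columns and the rejected rows are illustrative and not part of the slides.

```python
import sqlite3

# In-memory database, nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id     INTEGER PRIMARY KEY,     -- unique ID of every row
        name   TEXT NOT NULL,
        gender TEXT,
        age    INTEGER CHECK (age > 0)  -- integrity constraint on the age
    )
    """
)
conn.execute("INSERT INTO person VALUES (1, 'Kevin Hart', 'male', 21)")
conn.execute("INSERT INTO person VALUES (5, 'Emma Watson', 'female', NULL)")  # unknown age


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


negative_age_ok = try_insert((6, "Bad Age", "male", -3))      # violates the CHECK constraint
duplicate_id_ok = try_insert((1, "Duplicate", "female", 30))  # violates the primary key
unknown_age = conn.execute("SELECT age FROM person WHERE id = 5").fetchone()[0]
print(negative_age_ok, duplicate_id_ok, unknown_age)
```

The NULL age is accepted because a CHECK constraint only rejects rows for which the condition is false, and a comparison with NULL is unknown rather than false.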
The Relational Model
• redundant data can be stored more efficiently
• we can even eliminate redundancy
• every independent piece of information is in only one place – this is
called normalization
External Storage
(Database Management)
File Systems and Databases

balanced search trees work extremely well
as long as they fit in the
main memory (RAM)

WHAT IF WE WANT TO STORE HUGE DATA (> 1GB)?

File Systems and Databases
There are 2 types of memory:

1.) main memory (RAM): all the data structures considered so far are
stored in the main memory (FAST)

• stack memory and heap memory are located in the
main memory as well

2.) external memory (peripheral memory): hard disk, CD-ROM etc. (SLOW)

• hard drives can store large amounts of data
such as file systems or databases
File Systems and Databases
EXTERNAL MEMORY

a hard disk drive (HDD) is one or more
rigid rapidly rotating platters coated
with magnetic material

CAN RETAIN DATA EVEN
WHEN POWERED OFF !!!

a track is a circular path on the surface
of the HDD on which information
is recorded and read

a block is a subdivision of the
hard disk drive (HDD) storing
512 bytes
File Systems and Databases
DISK MEMORY (HDD) -> MAIN MEMORY (RAM) -> PROGRAM

• the data located on the hard disk drive (HDD) cannot be processed directly
• it must be brought into the main memory
• in the main memory (RAM) we can use
either the stack memory or the heap
memory
• we can only read or write whole
blocks which means at least 512 bytes
• ACCESSING THE BLOCKS IS SLOW !!!
File Systems and Databases

DISK MEMORY (HDD)
organizing the data stored on the hard disk drive (HDD)
efficiently has something to do with database
management systems (DBMS)

MAIN MEMORY (RAM)
storing the data efficiently in the main memory (RAM)
has something to do with data structures
File Systems and Databases
• accessing items on the external memory (HDD) is way slower than
manipulating the main memory
• we need totally different data structures

EXTERNAL MEMORY ACCESS TIME: 12 ms
RAM ACCESS TIME: 0.0001 ms

• so far we have manipulated data present in the main memory but
now we have to fetch the data from the external memory first
File Systems and Databases
[diagram: binary search tree with root 12, children 4 and 20, and leaves 1, 5 and 23]

• recursive approaches work quite well
when using the main memory (RAM)
• doing the same on the external memory
(HDD) is slow because of the access time
• we can only read or write whole
blocks which means at least 512 bytes
• ACCESSING THE BLOCKS IS SLOW !!!
• conclusion: we should minimize the number
of read operations
• this is why B-trees are shallow structures
B-Trees
• B-trees were first constructed in 1971 by Rudolf Bayer and Ed McCreight
• B-trees are self-balancing tree-like data structures
• they support operations such as insertion, deletion, sequential access and
searching in O(logN) time complexity
• the nodes may have more than 2 children and multiple keys
• B-tree data structures are optimized for systems that read and write
large blocks of data
• B-trees are a good example of a data structure for external memory
• commonly used in databases and filesystems
B-Trees
Every node may have multiple children (more than 2) but the running time is still
logarithmic: O(logN)

$\log_b N = \dfrac{\log_a N}{\log_a b}$ and $\log_a b$ is just a constant so it does not matter

• we can change the base of the logarithm and the
running time complexity of the algorithm stays the same

O(c * logN) = c * O(logN) = O(logN)

• that's why the branching factor does not matter
in the running time complexities
B-Trees

8 12 20

-2 9 11 12 14 19 32
B-Trees
B-TREE PROPERTIES

1.) every node of the tree can contain at most m keys – so it may
have at most m+1 children (branching factor)
2.) every node (except the root) is at least half full – so it contains at least m/2 items
3.) if the number of items N in a node is less than m/2 then we merge it
with another node and if N>m then we split the node
4.) all leaf nodes are at the same level (balanced)
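
The branching factor does not change the O(logN) complexity, but it does change the number of levels, and every level may cost one slow block read. A rough sketch, assuming one disk read per level and the 12 ms access time quoted earlier; the item count and the branching factor of 1000 are illustrative.

```python
def levels_needed(n_items: int, branching_factor: int) -> int:
    """Smallest number of levels L such that branching_factor ** L >= n_items."""
    levels = 0
    capacity = 1
    while capacity < n_items:
        capacity *= branching_factor
        levels += 1
    return levels


N = 1_000_000_000        # one billion items (illustrative)
DISK_ACCESS_MS = 12      # external memory access time used in the slides

binary_levels = levels_needed(N, 2)     # balanced binary tree
btree_levels = levels_needed(N, 1000)   # tree with ~1000 children per node

print("binary tree:", binary_levels, "levels,", binary_levels * DISK_ACCESS_MS, "ms")
print("B-tree     :", btree_levels, "levels,", btree_levels * DISK_ACCESS_MS, "ms")
```

Both counts are logarithmic in N, yet the wide tree needs ten times fewer block reads in this setting, which is the practical reason for keeping B-trees shallow.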
Procedural and Non-Procedural Programming Languages
(Database Management)
Programming Languages

PROCEDURAL LANGUAGES
the program code is written as a sequence of commands and instructions
we have to define what to do as well as how to do it
overall efficiency is quite high
C, C++ or Java

DECLARATIVE AND FUNCTIONAL LANGUAGES (NON-PROCEDURAL)
we just have to specify what to do
the semantics is quite easy but not really efficient
SQL or Lisp
Programming Languages
PROCEDURAL LANGUAGES
• we can define the methods and variables
• there are conditional statements and loops as well
• we have full control over the application
• procedural languages define the results and the entire mechanism of
an application
• C, C++ or Java are procedural programming languages
Programming Languages
NON-PROCEDURAL LANGUAGES
• we define the query with SQL but the database engine takes care of the
entire mechanism
• the database engine (optimizer) takes the SQL statement and decides
what the most efficient execution will be
• some database vendors enable us to write complete scripts -
PL/SQL or T-SQL
• MySQL has stored procedures – so procedural commands are available as well
Programming Languages
SQL STATEMENT

Query Language
Processor

OPTIMIZER

DBMS Engine

Transaction Manager

PHYSICAL DATABASE
Database Management
Systems
(Database Management)
Database Management Systems
• there are several database management systems available

Oracle Database

SQL Server (MS SQL)

MySQL
Oracle Database
• the Oracle database uses the so-called PL/SQL language
• it stands for Procedural Language for SQL
• it has the advantage of procedural languages: loops and conditional
statements
• it is quite fast and generally has better performance
• it can handle errors and exceptions
MS SQL Database
• Microsoft’s SQL Server uses the so-called T-SQL language
• it has a transaction control feature
• it can handle errors and exceptions
• we can define variables as well
MySQL
• MySQL has a huge advantage: it is open-source
• it is suited for small (and medium) web applications such as WordPress
websites
• it cannot handle huge datasets as efficiently and transactions are
handled less efficiently
• it has poor performance scaling
NoSQL
• NoSQL refers to non-relational database systems (key-value, document,
wide-column and graph databases)
• MongoDB and Apache Cassandra are NoSQL databases
• it is extremely useful in big data and real-time web applications
• it has a huge advantage: it is scalable
Data Types
(Database Management)
Data Types
1.) CHARACTER DATA

Characters can be stored as fixed-length or variable-length strings

• fixed-length strings always consume the same number of bytes
• with variable-length strings the number of bytes can change

char(20)
the maximum length for char is 255 characters.
for example we can store country codes with 3 letters with char(3)
- HUN, USA or GER -

varchar(20)
the maximum length for varchar is 65 535 bytes.
for example we can store names efficiently with varchar(30) because the length
of names differs
Data Types
2.) TEXT DATA

If we want to store a huge text then using varchar is not the best
option possible – use the text types for data that may exceed 64 kB

TEXT TYPE MAXIMUM NUMBER OF BYTES


tinytext 255
text 65 535
mediumtext 16 777 215
longtext 4 294 967 295
Data Types
3.) NUMERICAL DATA
There are several different numerical data types depending on the range
we need. We can use unsigned types for values greater than or equal to 0

TYPE            SIGNED RANGE                                               UNSIGNED RANGE
tinyint         -128 to 127                                                0 to 255
smallint        -32 768 to 32 767                                          0 to 65 535
mediumint       -8 388 608 to 8 388 607                                    0 to 16 777 215
int or integer  -2 147 483 648 to 2 147 483 647                            0 to 4 294 967 295
bigint          -9 223 372 036 854 775 808 to 9 223 372 036 854 775 807    0 to 18 446 744 073 709 551 615
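
The ranges follow from the storage size of each type. A small sketch, assuming two's complement storage and the MySQL sizes of 1, 2, 3, 4 and 8 bytes; the helper names are illustrative.

```python
# Storage size in bytes of each MySQL integer type.
INTEGER_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def signed_range(n_bytes: int) -> tuple[int, int]:
    """Two's complement range: one bit is used for the sign."""
    bits = 8 * n_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(n_bytes: int) -> tuple[int, int]:
    """All bits are used for the magnitude."""
    return 0, 2 ** (8 * n_bytes) - 1


for name, size in INTEGER_BYTES.items():
    print(name, signed_range(size), unsigned_range(size))
```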
Data Types
3.) NUMERICAL DATA

When using floating point numbers we have FLOAT(p,s), DOUBLE(p,s)
and we can use DECIMAL(p,s) as well

2.71828 (here the precision p is 6 and the scale s is 5)

Float is a single-precision number (represented on 4 bytes) while double is
a double-precision number (represented on 8 bytes)

the precision p is the total number of allowable digits both to the
left and to the right of the decimal point

the scale s is the number of allowable
digits to the right of the decimal point
Data Types
4.) DATE AND TIME

TYPE       DEFAULT FORMAT        RANGE
date       YYYY-MM-DD            1000-01-01 to 9999-12-31
datetime   YYYY-MM-DD HH:MI:SS   1000-01-01 00:00:00 to 9999-12-31 23:59:59
timestamp  YYYY-MM-DD HH:MI:SS   1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC
year       YYYY                  1901 to 2155
time       HHH:MI:SS             -838:59:59 to 838:59:59
Primary Keys – Indexes
(Database Management)
Primary Keys

„A primary key (index) is a column or a set of columns


that uniquely identifies each row in the table”
Primary Keys
• a primary key uniquely identifies every row in a database table
• the primary key must be unique – if the primary key consists of multiple columns
then the combination of the columns must be unique
• we cannot have NULL values in a primary key column – the NOT NULL
constraint is added implicitly
• a table can have only one primary key
• if we do not specify a primary key explicitly then MySQL (InnoDB) uses the first
UNIQUE NOT NULL index or else generates a hidden row ID
Booleans
(Database Management)
Booleans
• BOOLEAN is basically TINYINT(1)
• MySQL assigns integer values to boolean variables
• 0 is FALSE
• 1 (or any other non-zero value) is TRUE
Multiple Tables
(Database Management)
Foreign Keys
• a foreign key is a column (or multiple columns) in a table that links to
a column (or multiple columns) in another table
• a table may have multiple foreign keys
• each foreign key references a primary key of a (possibly different) parent
table
Foreign Keys
PERSON (THIS IS THE CHILD TABLE OR THE REFERENCING TABLE)
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
3   Anna   22   1
4   Adam   29   1

the university column stores foreign keys – these reference
the given institutions in the UNIVERSITY table

UNIVERSITY (THIS IS THE PARENT TABLE OR THE REFERENCED TABLE)
ID  NAME
1   MIT
2   Harvard

the university table has the primary keys (IDs)
that are being referenced
Foreign Keys
• what happens to the foreign keys when the referenced rows of the parent table are
updated (ON UPDATE) or removed (ON DELETE)?
• there are several strategies (referential actions) to deal with this problem
Foreign Keys
1.) RESTRICT

This is the simplest solution – it is forbidden to update or remove a referenced row
in the parent table (we cannot modify the value)

„If a row from the parent table has a matching row
in the child table MySQL rejects updating or
removing rows in the parent table”
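
A minimal sketch of RESTRICT, assuming Python's built-in sqlite3 module instead of MySQL (SQLite needs foreign key enforcement switched on explicitly). The table and column names are illustrative.

```python
import sqlite3

# Autocommit mode so that the PRAGMA takes effect immediately.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        university INTEGER REFERENCES university(id)
                   ON UPDATE RESTRICT ON DELETE RESTRICT
    )
    """
)
conn.executemany("INSERT INTO university VALUES (?, ?)", [(1, "MIT"), (2, "Harvard")])
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [(3, "Anna", 1), (4, "Adam", 1)])

try:
    conn.execute("DELETE FROM university WHERE id = 1")  # MIT is still referenced
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

conn.execute("DELETE FROM university WHERE id = 2")  # Harvard has no matching child rows
remaining = [row[0] for row in conn.execute("SELECT name FROM university")]
print(delete_rejected, remaining)
```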
Foreign Keys
2.) SET NULL

This is another way to deal with the problem – whenever the entry in the
parent table is removed (or updated) the matching values are set to NULL

„If a row from the parent table has a matching row in the
child table MySQL sets the value to be a NULL”
Foreign Keys
BEFORE: the MIT row (ID 1) is referenced by Anna and Adam

PERSON (THIS IS THE CHILD TABLE OR THE REFERENCING TABLE)
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
3   Anna   22   1
4   Adam   29   1

UNIVERSITY (THIS IS THE PARENT TABLE OR THE REFERENCED TABLE)
ID  NAME
1   MIT
2   Harvard

Foreign Keys
AFTER: the MIT row is removed from the UNIVERSITY table and
the matching foreign keys are set to NULL

PERSON
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
3   Anna   22   NULL
4   Adam   29   NULL

UNIVERSITY
ID  NAME
2   Harvard
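
The SET NULL behaviour can be reproduced with a short sketch, again assuming Python's sqlite3 module rather than MySQL; the schema mirrors the PERSON and UNIVERSITY tables of the slides with illustrative lowercase names.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        age        INTEGER,
        university INTEGER REFERENCES university(id) ON DELETE SET NULL
    )
    """
)
conn.executemany("INSERT INTO university VALUES (?, ?)", [(1, "MIT"), (2, "Harvard")])
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, "Kevin", 19, None), (2, "Joe", 34, None), (3, "Anna", 22, 1), (4, "Adam", 29, 1)],
)

# Removing the parent row sets the matching foreign keys to NULL.
conn.execute("DELETE FROM university WHERE id = 1")
people = conn.execute("SELECT name, university FROM person ORDER BY id").fetchall()
print(people)
```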
Foreign Keys
3.) CASCADE

This is another way to deal with the problem – whenever the entry in the
parent table is removed (or updated) the matching rows are removed (or updated) as well

„If a row from the parent table is deleted or updated then
the matching rows in the child table are
automatically removed or updated”
Foreign Keys
ON UPDATE CASCADE: the ID of MIT is changed from 1 to 10 in the parent table

BEFORE
PERSON (THIS IS THE CHILD TABLE OR THE REFERENCING TABLE)
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
3   Anna   22   1
4   Adam   29   1

UNIVERSITY (THIS IS THE PARENT TABLE OR THE REFERENCED TABLE)
ID  NAME
1   MIT
2   Harvard

AFTER (the matching foreign keys are updated automatically)
PERSON
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL
3   Anna   22   10
4   Adam   29   10

UNIVERSITY
ID  NAME
10  MIT
2   Harvard

Foreign Keys
ON DELETE CASCADE: starting again from the BEFORE state the MIT row is removed
from the parent table

AFTER (the matching rows of the child table are removed automatically)
PERSON
ID  NAME   AGE  UNIVERSITY
1   Kevin  19   NULL
2   Joe    34   NULL

UNIVERSITY
ID  NAME
2   Harvard
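
Both CASCADE cases can be tried out with a short sketch, assuming Python's sqlite3 module rather than MySQL. Unlike the slides, the delete here runs after the update, so MIT already has the ID 10 when it is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        age        INTEGER,
        university INTEGER REFERENCES university(id)
                   ON UPDATE CASCADE ON DELETE CASCADE
    )
    """
)
conn.executemany("INSERT INTO university VALUES (?, ?)", [(1, "MIT"), (2, "Harvard")])
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, "Kevin", 19, None), (2, "Joe", 34, None), (3, "Anna", 22, 1), (4, "Adam", 29, 1)],
)

# ON UPDATE CASCADE: the new parent ID is propagated to the child rows.
conn.execute("UPDATE university SET id = 10 WHERE name = 'MIT'")
after_update = conn.execute("SELECT name, university FROM person ORDER BY id").fetchall()

# ON DELETE CASCADE: the child rows that reference the removed parent disappear.
conn.execute("DELETE FROM university WHERE id = 10")
after_delete = conn.execute("SELECT name, university FROM person ORDER BY id").fetchall()

print(after_update)
print(after_delete)
```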
Joining Tables
(Database Management)
Inner Join
• what if we want to combine the values of multiple database tables?
• this is when we have to use the JOIN keyword
• there are several ways to combine multiple tables

1.) INNER JOIN

2.) LEFT JOIN

3.) RIGHT JOIN

4.) FULL JOIN


Inner Join

„Inner join selects records that have matching


values in both database tables”
Inner Join

TABLE 1 TABLE 2

when using the Venn-diagram representation


inner join is the intersection of the
two database tables
Inner Join
PERSON
ID NAME AGE UNIVERSITY
1 Kevin 19 NULL
2 Joe 34 NULL
3 Ana 22 1
4 Adam 29 2

UNIVERSITY
ID NAME
1 MIT
2 Harvard
3 Cambridge
Inner Join
SELECT PERSON.person_name, UNIVERSITY.university_name FROM PERSON
INNER JOIN UNIVERSITY ON PERSON.university = UNIVERSITY.university_id

RESULT SET
person_name university_name
Ana MIT
Adam Harvard
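
The inner join can be run end to end with a small sketch, assuming Python's sqlite3 module instead of MySQL. The column names follow the query (person_name, university, university_id, university_name); person_id is an illustrative addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE university (university_id INTEGER PRIMARY KEY, university_name TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, age INTEGER,
                         university INTEGER REFERENCES university(university_id));
    INSERT INTO university VALUES (1, 'MIT'), (2, 'Harvard'), (3, 'Cambridge');
    INSERT INTO person VALUES (1, 'Kevin', 19, NULL), (2, 'Joe', 34, NULL),
                              (3, 'Ana', 22, 1), (4, 'Adam', 29, 2);
    """
)

# Only rows with a match in both tables survive the inner join.
inner_rows = conn.execute(
    """
    SELECT person.person_name, university.university_name FROM person
    INNER JOIN university ON person.university = university.university_id
    ORDER BY person.person_id
    """
).fetchall()
print(inner_rows)
```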
Left Join
(Database Management)
Left Join

„The LEFT JOIN statement returns all records from


the left table and the matching records from the right table”
Left Join

TABLE 1 TABLE 2

when using the Venn-diagram representation


left join yields all the records in the left database table
with the matching records in the right table
Left Join
PERSON
ID NAME AGE UNIVERSITY
1 Kevin 19 NULL
2 Joe 34 NULL
3 Ana 22 1
4 Adam 29 2

UNIVERSITY
ID NAME
1 MIT
2 Harvard
3 Cambridge
Left Join
SELECT PERSON.person_name, UNIVERSITY.university_name FROM PERSON
LEFT JOIN UNIVERSITY ON PERSON.university = UNIVERSITY.university_id

RESULT SET
person_name university_name
Kevin NULL
Joe NULL
Ana MIT
Adam Harvard
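
The same data with a left join, as a sketch that assumes Python's sqlite3 module instead of MySQL (person_id is an illustrative column). Persons without a university are kept and get NULL (None in Python) for the university name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE university (university_id INTEGER PRIMARY KEY, university_name TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, age INTEGER,
                         university INTEGER REFERENCES university(university_id));
    INSERT INTO university VALUES (1, 'MIT'), (2, 'Harvard'), (3, 'Cambridge');
    INSERT INTO person VALUES (1, 'Kevin', 19, NULL), (2, 'Joe', 34, NULL),
                              (3, 'Ana', 22, 1), (4, 'Adam', 29, 2);
    """
)

# Every row of the left table (person) appears in the result.
left_rows = conn.execute(
    """
    SELECT person.person_name, university.university_name FROM person
    LEFT JOIN university ON person.university = university.university_id
    ORDER BY person.person_id
    """
).fetchall()
print(left_rows)
```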
Right Join
(Database Management)
Right Join

„The RIGHT JOIN statement returns all records from


the right table and the matching records from the left table”
Right Join

TABLE 1 TABLE 2

when using the Venn-diagram representation


right join yields the records in the right database table
with the matching records in the left table
Right Join
PERSON
ID NAME AGE UNIVERSITY
1 Kevin 19 NULL
2 Joe 34 NULL
3 Ana 22 1
4 Adam 29 2

UNIVERSITY
ID NAME
1 MIT
2 Harvard
3 Cambridge
Right Join
SELECT PERSON.person_name, UNIVERSITY.university_name FROM PERSON
RIGHT JOIN UNIVERSITY ON PERSON.university = UNIVERSITY.university_id

RESULT SET
person_name university_name
Ana MIT
Adam Harvard
NULL Cambridge
Right Join
• what is the difference between right join and left join?
• left join and right join clauses are equivalent and they can
replace each other if the table order is reversed
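
A sketch of this equivalence, assuming Python's sqlite3 module instead of MySQL. Only LEFT JOIN is used, because older SQLite versions do not support RIGHT JOIN: swapping the table order gives the result set of the right join shown before. The person_id column is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE university (university_id INTEGER PRIMARY KEY, university_name TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, age INTEGER,
                         university INTEGER REFERENCES university(university_id));
    INSERT INTO university VALUES (1, 'MIT'), (2, 'Harvard'), (3, 'Cambridge');
    INSERT INTO person VALUES (1, 'Kevin', 19, NULL), (2, 'Joe', 34, NULL),
                              (3, 'Ana', 22, 1), (4, 'Adam', 29, 2);
    """
)

# "person RIGHT JOIN university" written as "university LEFT JOIN person".
right_rows = conn.execute(
    """
    SELECT person.person_name, university.university_name FROM university
    LEFT JOIN person ON person.university = university.university_id
    ORDER BY university.university_id
    """
).fetchall()
print(right_rows)
```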
Subqueries
(Database Management)
Subqueries
• a subquery is a standard SQL statement or query contained in
another SQL query
• it is always enclosed within parentheses
Subqueries

SELECT name FROM city WHERE id = (SELECT MAX(id) FROM city);

outer SQL query: SELECT name FROM city WHERE id = (...)
the query or statement that will produce the final result

inner SQL query: (SELECT MAX(id) FROM city) – THIS IS THE SUBQUERY !!!
the query or statement that will produce the subresult
Subqueries
There are 2 main types of subqueries:

NON-CORRELATED SUBQUERIES

CORRELATED SUBQUERIES
Non-Correlated Subqueries
Non-correlated subqueries are totally self-contained which means
they do not reference anything from the containing statement

WITH NON-CORRELATED SUBQUERIES THE INNER QUERY DOES
NOT DEPEND ON OUTER QUERY !!!

SELECT * FROM city WHERE id <> (SELECT MAX(id) FROM city);

select every row from the city table except for the last one
(the one with the largest id)

This is also called a scalar subquery which means the
subquery returns a single row and single column
(for example a single integer ID)
Non-Correlated Subqueries

SELECT * FROM city WHERE id <> (SELECT id FROM city WHERE code='HUN'); ERROR !!!

The subquery returns multiple rows and we cannot compare a single value to
a set of values with = or <>. We can check whether a given value can be found within
a set of values with IN and NOT IN keywords

SELECT * FROM city WHERE id NOT IN (SELECT id FROM city WHERE code='HUN'); GOOD !!!
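
A sketch of the scalar subquery and the NOT IN subquery, assuming Python's sqlite3 module instead of MySQL and a made-up city table with four rows. SQLite does not raise the multi-row error that MySQL reports, so only the working queries are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, code TEXT);
    INSERT INTO city VALUES (1, 'Budapest', 'HUN'), (2, 'Debrecen', 'HUN'),
                            (3, 'Boston', 'USA'), (4, 'Berlin', 'GER');
    """
)

# Scalar subquery: the inner query returns exactly one value (the largest id).
all_but_last = [row[0] for row in conn.execute(
    "SELECT name FROM city WHERE id <> (SELECT MAX(id) FROM city) ORDER BY id"
)]

# Multi-row subquery: the inner query returns a set of ids, so NOT IN is used.
not_hungarian = [row[0] for row in conn.execute(
    "SELECT name FROM city WHERE id NOT IN (SELECT id FROM city WHERE code = 'HUN') ORDER BY id"
)]

print(all_but_last)
print(not_hungarian)
```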
Correlated Subqueries
Correlated subqueries are NOT self-contained which means they do
reference something from the containing statement

WITH CORRELATED SUBQUERIES THE INNER QUERY DOES
DEPEND ON OUTER QUERY !!!

• correlated subqueries are often slower and they
should be avoided if possible
• they are not executed once prior to the execution of
the containing statement
• correlated subqueries are executed once for each
and every candidate row
Correlated Subqueries

SELECT COUNTRY.name FROM COUNTRY WHERE
22 = (SELECT COUNT(*) FROM CITY c WHERE c.country_code=COUNTRY.code);
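
A sketch of the correlated query next to an equivalent JOIN with GROUP BY, assuming Python's sqlite3 module instead of MySQL. The data is made up and the city count is 2 instead of 22 to keep the tables small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE city (name TEXT, country_code TEXT REFERENCES country(code));
    INSERT INTO country VALUES ('HUN', 'Hungary'), ('USA', 'United States'), ('GER', 'Germany');
    INSERT INTO city VALUES ('Budapest', 'HUN'), ('Debrecen', 'HUN'), ('Boston', 'USA'),
                            ('Berlin', 'GER'), ('Munich', 'GER');
    """
)

# Correlated subquery: the inner query refers to country.code of the outer row.
correlated = [row[0] for row in conn.execute(
    """
    SELECT country.name FROM country WHERE
    2 = (SELECT COUNT(*) FROM city c WHERE c.country_code = country.code)
    ORDER BY country.name
    """
)]

# The same question asked with a JOIN and GROUP BY.
joined = [row[0] for row in conn.execute(
    """
    SELECT country.name FROM country
    JOIN city c ON c.country_code = country.code
    GROUP BY country.code, country.name
    HAVING COUNT(*) = 2
    ORDER BY country.name
    """
)]

print(correlated, joined)
```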
Performance

NON-CORRELATED SUBQUERIES
inner query runs on its own – does not depend on the outer query
inner query is executed before the outer query
we can use it without any problems

CORRELATED SUBQUERIES
inner query depends on the outer query
outer query is executed before the inner query
not efficient – we should use JOIN instead
Subqueries and JOIN
• a subquery is a standard SQL statement or query contained in
another SQL query
• many subqueries can be rewritten with JOIN
• usually combining tables with JOIN is more efficient
Data Normalization
(Database Management)
Data Normalization
• what is the motivation behind data normalization?
• database normalization is the process of structuring a database in
order to reduce data redundancy and improve data integrity
• it was first proposed by E. F. Codd while constructing the relational
model for databases
Data Normalization

- MINIMIZE DUPLICATE DATA -
of course there will be duplicate data in the database, we just
want to minimize the amount

- MINIMIZE DATA MODIFICATION -
we may have to modify the same data in multiple locations
because of redundant data

- SIMPLIFY QUERIES -
if the relationships of the database tables are simple
then we can construct simple queries as well
Redundancy
STUDENT
ID NAME AGE UNIVERSITY UNIVERSITY ADDRESS
1 Kevin 19 Harvard Cambridge, MA, USA
2 Joe 34 Harvard Cambridge, MA, USA
3 Ana 22 Harvard Cambridge, MA, USA
4 Adam 29 Harvard Cambridge, MA, USA

this is a typical case of unnecessary data repetition

THAT IS CALLED REDUNDANCY !!!

it can dramatically increase the size of
the database and it is hard to handle (update)
Redundancy
STUDENT
ID NAME AGE UNIVERSITY
1 Kevin 19 1
2 Joe 34 1
3 Ana 22 1
4 Adam 29 1

UNIVERSITY
ID NAME ADDRESS
1 Harvard Cambridge, MA, USA
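
A sketch of the normalized layout, assuming Python's sqlite3 module instead of MySQL and illustrative lowercase table names. The address is stored once, so a single UPDATE is enough, and a JOIN brings the address back next to every student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                          university INTEGER REFERENCES university(id));
    INSERT INTO university VALUES (1, 'Harvard', 'Cambridge, MA, USA');
    INSERT INTO student VALUES (1, 'Kevin', 19, 1), (2, 'Joe', 34, 1),
                               (3, 'Ana', 22, 1), (4, 'Adam', 29, 1);
    """
)

# The address lives in exactly one row, so it is modified in exactly one place.
conn.execute("UPDATE university SET address = 'Cambridge, Massachusetts, USA' WHERE id = 1")

rows = conn.execute(
    """
    SELECT s.name, u.name, u.address FROM student s
    JOIN university u ON s.university = u.id
    ORDER BY s.id
    """
).fetchall()
addresses = {row[2] for row in rows}
print(rows)
```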
Data Normalization
• how to achieve data normalization and how to reduce redundancy?

- FIRST NORMAL FORM (1NF) -


first step in the normalization process

- SECOND NORMAL FORM (2NF) -


second step in the normalization process

- THIRD NORMAL FORM (3NF) -


third step in the normalization process
Data Normalization – First Normal Form
• database design is similar to software architectural design
• so maybe we can apply principles similar to the SOLID principles and
design patterns
• we should design database tables that can be extended easily
• every database table must contain atomic values
• every database table column must contain data of the same type –
for example we should not mix DATE and DATETIME values in one column
• every database table column must have a unique name
• order of the data is not important – the primary keys identify the
items in the table anyway
Data Normalization – First
Normal Form
STUDENT
ID NAME AGE SUBJECT
1 Kevin 19 math
2 Joe 34 physics
3 Ana 22 literature
4 Adam 29 math, physics
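
The SUBJECT value of Adam ("math, physics") is not atomic, so this table violates the first normal form. One possible fix, sketched with Python's sqlite3 module, moves the subjects into a separate table with one row per student and subject; the table name student_subject is an assumption.

```python
import sqlite3

# The rows of the slide, with the non-atomic SUBJECT value of Adam.
students = [
    (1, "Kevin", 19, "math"),
    (2, "Joe", 34, "physics"),
    (3, "Ana", 22, "literature"),
    (4, "Adam", 29, "math, physics"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE student_subject (
        student_id INTEGER REFERENCES student(id),
        subject    TEXT,
        PRIMARY KEY (student_id, subject)
    );
    """
)

for student_id, name, age, subjects in students:
    conn.execute("INSERT INTO student VALUES (?, ?, ?)", (student_id, name, age))
    for subject in subjects.split(","):
        # One atomic value per row.
        conn.execute("INSERT INTO student_subject VALUES (?, ?)", (student_id, subject.strip()))

math_students = [row[0] for row in conn.execute(
    """
    SELECT s.name FROM student s
    JOIN student_subject ss ON ss.student_id = s.id
    WHERE ss.subject = 'math'
    ORDER BY s.id
    """
)]
subject_rows = conn.execute("SELECT COUNT(*) FROM student_subject").fetchone()[0]
print(math_students, subject_rows)
```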
Data Normalization – Second Normal Form
• first we should make sure the first normal form related principles are
not violated
• the given database table must not have any partial dependencies

STUDENT
ID  NAME   AGE
1   Kevin  19
2   Joe    34
3   Ana    22
4   Adam   29

we have the primary key of the STUDENT table (the ID in this case)

THERE ARE FUNCTIONAL DEPENDENCIES !!!

all other columns depend on the primary key exclusively
Data Normalization – Second
Normal Form
STUDENT
ID UNIVERSITY_ID NAME UNIVERSITY_NAME
1 1 Kevin Harvard
2 1 Ana Harvard
3 2 Daniel MIT
4 1 Adam Harvard
Data Normalization – Second
Normal Form
STUDENT
ID UNIVERSITY_ID NAME UNIVERSITY_NAME
1 1 Kevin Harvard
2 1 Ana Harvard
3 2 Daniel MIT
4 1 Adam Harvard

we may define a primary key based on


multiple columns (ID and UNIVERSITY_ID)
the name of the university depends on just one of the components of the primary key

THIS IS CALLED PARTIAL DEPENDENCY !!!

it is an indicator that we should split the database table into multiple tables
Data Normalization – Second
Normal Form
STUDENT
ID NAME UNIVERSITY
1 Kevin 1
2 Ana 1
3 Daniel 2
4 Adam 1

UNIVERSITY
ID NAME
1 Harvard
2 MIT
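The split shown above can be sketched in MySQL as follows. The column types and the foreign key are assumptions, the slides only give the table and column names.

```sql
-- the university name is stored exactly once
CREATE TABLE university (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- the student row only keeps a reference to the university
CREATE TABLE student (
    id         INT PRIMARY KEY,
    name       VARCHAR(50) NOT NULL,
    university INT NOT NULL,
    FOREIGN KEY (university) REFERENCES university (id)
);

INSERT INTO university VALUES (1, 'Harvard'), (2, 'MIT');
INSERT INTO student VALUES
    (1, 'Kevin', 1), (2, 'Ana', 1), (3, 'Daniel', 2), (4, 'Adam', 1);

-- the original wide table can be recovered with a join when needed
SELECT s.id, s.name, u.name AS university_name
FROM student s
JOIN university u ON u.id = s.university;
```

Renaming a university now means updating a single row in the university table.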
Data Normalization – Third
Normal Form
• first we should make sure the second normal form related principles
are not violated
• the given database table must not have any transitive dependencies
Data Normalization – Third
Normal Form
STUDENT
ID NAME SUBJECT TEACHER
1 Kevin Java Harvard
2 Ana C Harvard
3 Daniel C++ MIT
4 Adam Python Harvard

the primary key is the ID of the student

and there is a column that depends on another column – that is not the primary key

THIS IS CALLED TRANSITIVE DEPENDENCY !!!
again the solution is to create separate database tables
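A minimal sketch of such a split in MySQL, assuming that the TEACHER column is determined by the SUBJECT column. The table name subject, the subject_id column and all column types are hypothetical.

```sql
-- the teacher depends on the subject, so it moves into the subject table
CREATE TABLE subject (
    id      INT PRIMARY KEY,
    name    VARCHAR(50) NOT NULL,
    teacher VARCHAR(50) NOT NULL
);

-- every remaining column depends on the student ID only
CREATE TABLE student (
    id         INT PRIMARY KEY,
    name       VARCHAR(50) NOT NULL,
    subject_id INT NOT NULL,
    FOREIGN KEY (subject_id) REFERENCES subject (id)
);
```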


Transactions and Locking
(Database Management)
Transactions
• most SQL statements are independent of each other
• UPDATE, CREATE or SELECT
• but there may be several SQL queries that need to be executed
together – one right after another
• AS A SINGLE LOGICAL UNIT OF WORK
• transactions let us execute multiple SQL statements as such a unit
• the problem is that multiple users can manipulate the same database
table at the same time
• we have to use locking to avoid inconsistencies
Locking
The database server uses locks to handle simultaneous use of data resources

• if the data is locked then any other users wishing to access (update
or read) that data must wait until the lock has been released

There are two strategies to handle concurrent modification:

1.) locking

2.) "versioning"
Locking
1.) database writers must request (and receive) a write lock from the server
to modify data, while there may be multiple read locks at the same time

ONLY ONE WRITE LOCK IS GIVEN OUT AT A TIME FOR A GIVEN TABLE

• database readers must request (and receive) a read lock from the server
to query data

• the advantage is that multiple users can read data simultaneously

• read requests are blocked until the write lock is released

For example: MS SQL Server – the problem is that it can lead to long wait times
Locking

READ LOCK
• multiple sessions can acquire the read lock at the same time
• the session can only read while holding the read lock
(no write operations are allowed)
• other sessions requesting the write lock have to wait until
the read lock is released

WRITE LOCK
• the session that has the write lock can read and write as well
• other sessions can not even read from the database table
while the write lock is not released

read lock is a shared lock while write lock is exclusive
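In MySQL the two lock types can be requested explicitly with LOCK TABLES. The sketch below reuses the city table that appears elsewhere in these slides. The name column is an assumption.

```sql
-- shared lock: this session and other sessions may still read city
LOCK TABLES city READ;
SELECT COUNT(*) FROM city;
-- an UPDATE on city would be rejected here, because the session
-- only holds a READ lock on the table
UNLOCK TABLES;

-- exclusive lock: other sessions have to wait for both reads and writes
LOCK TABLES city WRITE;
UPDATE city SET name = 'New York' WHERE id = 1;  -- name is a hypothetical column
UNLOCK TABLES;
```

With a transactional engine such as InnoDB the server normally takes row level locks on its own, so explicit table locks like these are rarely needed.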
Locking
THERE ARE MULTIPLE TYPES OF LOCK:

• table lock: only a single user can manipulate a given database table at a given time
Quite slow approach but easy to implement

• page lock: only a single user can manipulate a given page at a given time –
a page is a 2-16 KB segment of memory

• row lock: only a single user can manipulate a given row at a given time

FAST BUT HAVE TO STORE SOME EXTRA INFORMATION
Database Versioning
2.) another approach is called versioning that allows multiple users (sessions)
to read and write data at the same time

THE SERVER ENSURES THAT DATA WILL BE CONSISTENT BY KEEPING MULTIPLE VERSIONS OF THE DATA

• there may be multiple versions of the data visible to multiple transactions

• when a transaction modifies a record a new version is written to the database

Oracle server: problem if there are long running queries while data is being modified
MySQL: uses both techniques
Storage Engine for MySQL
We have been discussing locking techniques. MySQL supports several locking
strategies and we can specify which one we want by choosing a storage engine

MyISAM: non-transactional engine with table locking
InnoDB: transactional engine with row-level locking

/* we can check a table's storage engine */
SHOW TABLE STATUS WHERE name = 'city';

/* we can alter a given table's engine */
ALTER TABLE city ENGINE = INNODB;
Transactions
(Database Management)
Transactions
• we can group multiple SQL statements such that either all or none of
the statements succeed
• this is called the atomicity principle
• it is crucial in banking applications
• for example a given bank customer transfers money from a savings
account to the current account
Transactions
/* customer wants to transfer $500 from the saving account */
UPDATE saving_account SET balance = balance - 500 WHERE account_id = 123;
/* customer wants to transfer $500 from the saving account to the current account */
UPDATE current_account SET balance = balance + 500 WHERE account_id = 55;
/* the bank wants to track everything */
INSERT INTO bank_log VALUES('2020-03-12',500,123,55);
what is the problem with these SQL statements?

here we handle every single query independently
but they depend heavily on each other
we should include all of the SQL statements within a transaction

we want to make sure that if one of the statements fails then
we roll back and make no change at all
WE WANT TO AVOID INCONSISTENT STATES !!!
Transaction Control
• COMMIT: in order to save changes
• ROLLBACK: to roll back changes
• SAVEPOINT: we can define points within a transaction to
which we can roll back
• START TRANSACTION: this is how we start a transaction
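The four statements can be combined as in the following sketch, which wraps the money transfer from the earlier slides into one transaction. It assumes the three tables are stored with a transactional engine such as InnoDB. The savepoint name after_transfer is made up for this example.

```sql
START TRANSACTION;

UPDATE saving_account  SET balance = balance - 500 WHERE account_id = 123;
UPDATE current_account SET balance = balance + 500 WHERE account_id = 55;

-- remember the state after both balances were changed
SAVEPOINT after_transfer;

INSERT INTO bank_log VALUES ('2020-03-12', 500, 123, 55);

-- to undo only the work done after the savepoint:
--   ROLLBACK TO SAVEPOINT after_transfer;
-- to undo the whole transaction:
--   ROLLBACK;

-- make all three changes permanent at once
COMMIT;
```

Until COMMIT is executed, none of the three changes is permanent, so a failure in the middle can be undone as a whole.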
ACID Principles
(Database Management)
ACID Principles
• a transaction is a single logical operation on the database data
• for example updating the name of a customer, withdrawing money from
a credit card etc.
• IMPORTANT: a single transaction may involve multiple changes
• banking applications rely heavily on transactions
• when transferring money – balance decreases on one account but
balance increases on another account
• late 1970s: Jim Gray defined the properties that make
transactions reliable (the acronym ACID was coined in 1983 by Härder and Reuter)
ACID Principles

A C I D
atomicity consistency isolation durability
Atomicity
• the atomicity principle requires that every single transaction is "all
or nothing"
• it means that if one part of the transaction fails then the entire
transaction fails and the database state is left unchanged (rollback)
• so it is as if aborted transactions never happened
• an atomic system must guarantee atomicity in every situation – even
when errors or power failures happen
Consistency
• the consistency property makes sure that any transaction will bring the
database from one valid state to another
• the data inserted into the database must be valid according to all defined rules
• including constraints, cascades or triggers...
Isolation
• the isolation property ensures that the concurrent execution of
transactions results in a system state that would be obtained if
transactions were executed serially
• serially means one after the other
• it is basically concurrency control
• the effects of an incomplete transaction might not even be visible to
another transaction
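How strictly transactions are isolated is configurable. A small MySQL sketch is shown below. It reuses the saving_account table from the transaction slides and assumes a recent MySQL version, where the session variable is called transaction_isolation.

```sql
-- show the isolation level of the current session
-- (InnoDB uses REPEATABLE READ by default)
SELECT @@transaction_isolation;

-- ask for the strictest level for the following transactions of this session
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;

START TRANSACTION;
SELECT balance FROM saving_account WHERE account_id = 123;
COMMIT;
```

Stricter levels bring the result closer to serial execution but usually cost more waiting on locks.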
Durability
• the durability property ensures that once a transaction has been
committed it will remain so
• even in the event of power loss or errors
• to defend against power loss transactions must be recorded in
non-volatile memory
Views
(Database Management)
Views
• it is a general principle in software engineering that applications should
hide the concrete implementation
• this is why SOLID principles and design patterns came to be
• users can use the application but know nothing about the
implementation details
• good database design is the same
• we should keep database tables private and let users access data
only through a set of views
Views
• a view is a SQL statement that is stored in the database
• views do not involve data storage which means views do not consume
disk space for the data
• we just assign a name to a select (or query) and store this query
for others to use
• users can use this view to access data just like querying tables directly
• we can use all the usual clauses: SELECT, GROUP BY, ORDER BY ...
Views
[figure: the APPLICATION accesses the DATABASE TABLES directly]

the problem with this approach is that the application may
access private information (credit card data etc.)
Views
[figure: the APPLICATION accesses the DATABASE TABLES only through a VIEW]
Views
• for example exposing an entire bank account may violate some policies
• we define a view and the user can access only a portion of the bank
accounts in order to verify the identity
• so we can limit access to a given database table with views
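A minimal sketch of limiting access with a view in MySQL. The customer table, its columns and the view name customer_contact are hypothetical and only serve as an illustration.

```sql
-- hypothetical base table that contains sensitive data
CREATE TABLE customer (
    id                 INT PRIMARY KEY,
    name               VARCHAR(50) NOT NULL,
    email              VARCHAR(100),
    credit_card_number CHAR(16)
);

-- the view exposes only the columns that are safe to show
CREATE VIEW customer_contact AS
SELECT id, name, email
FROM customer;

-- the view is queried just like a table
SELECT name, email FROM customer_contact WHERE id = 1;
```

If the application account is granted SELECT on customer_contact only, and not on customer, the credit card column stays out of its reach.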
Views
1.) VIEWS CAN HIDE COMPLEXITY

SQL queries can be very complex: joining lots of tables and so on
+ complex calculations in a given query
We can encapsulate this complexity into a view
~ we can use the view as a table later on !!!

2.) SECURITY ISSUES

views can restrict access to data
We can encapsulate the data that a user needs to see

3.) SUMMARY

we can summarize data from various tables which can be used to
generate reports (for example JasperReports)
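As an illustration of the summary use case, the sketch below defines a reporting view over a simplified log table. The table transfer_log and its columns are assumptions made for this example, the slides do not name the columns of the bank log.

```sql
-- hypothetical log table, one row per money transfer
CREATE TABLE transfer_log (
    transfer_date DATE NOT NULL,
    amount        DECIMAL(10, 2) NOT NULL,
    from_account  INT NOT NULL,
    to_account    INT NOT NULL
);

-- the aggregation is written once and hidden behind a name
CREATE VIEW transfers_per_day AS
SELECT transfer_date,
       COUNT(*)    AS transfers,
       SUM(amount) AS total_amount
FROM transfer_log
GROUP BY transfer_date;

-- a report only needs a simple query against the view
SELECT * FROM transfers_per_day ORDER BY transfer_date;
```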
Indexes
(Database Management)
Indexing
• when we insert a new item into a database table – the
items are not sorted by default
• the new item is always inserted at the next available location
• indexing is a mechanism for finding a specific item fast
Indexing
1 Adam
6 Kevin
3 Ana
8 Michael
5 Daniel
4 Sofia

the items of a database table are not necessarily ordered by default

DATABASE TABLES ARE STORED IN FILES IN THE EXTERNAL MEMORY

there may be gaps in the table because of the previous remove operations
Indexing
• indexes are special tables in the database that are kept in a sorted
order – in the form of a tree like structure
• INDEXES DO NOT CONTAIN ALL THE DATA
• the indexes store just the column (usually the primary key) used to
locate a given row in a database table
• plus a pointer to where the given row is physically located
• indexes can speed up SELECT queries and WHERE clauses
• but can dramatically slow down UPDATE and INSERT statements
• the query optimizer decides whether an index is used for a given query
Indexing
ALTER TABLE table_name ADD INDEX index_name (column_name);

ALTER TABLE table_name DROP INDEX index_name;

we can set an index to be unique – not just for performance reasons but also for data integrity

CREATE UNIQUE INDEX index_name ON table_name (column_name);

there are also implicit indexes – that are created automatically by the database servers (such as the primary key)
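A concrete version of these statements, sketched for the city table and its code column that are used in other slides. The index name idx_city_code is made up.

```sql
-- create a (non-unique) index on the column we filter by
ALTER TABLE city ADD INDEX idx_city_code (code);

-- list the indexes of the table, including the implicit primary key index
SHOW INDEX FROM city;

-- ask the optimizer how it would execute the query;
-- the key column of the output names the index it chose, if any
EXPLAIN SELECT * FROM city WHERE code = 'USA';

-- remove the index again
ALTER TABLE city DROP INDEX idx_city_code;
```

Running EXPLAIN before and after adding the index is a simple way to check whether the index is actually used.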
Indexing
THERE ARE MULTIPLE TYPES OF INDEXING:

1.) B-Tree Indexing

The underlying data structure is a tree like structure that can
guarantee O(logN) logarithmic running time complexity

• good for a dataset where the columns may contain many different values

2.) Bitmap Indexing

Better approach when the columns can have a small number of values
Indexing
[figure: search tree with the keys 12, 4, 20, 5, 23 and 1; 12 is the root]

• we use a tree like structure (B-tree) to store
and represent the indexes
• this is how we can find items in O(logN)
logarithmic running time complexity
• why is it slow to INSERT or UPDATE items?
• because we have to update (and rebalance)
the tree like structure
• actually we store key-value pairs
• the indexed value is the key and the value is the
pointer to the actual row in the table
Disadvantages of Indexing
• do not use indexes unnecessarily
• every time we insert a new item or remove an item – the indexes (data
structure) must be updated
• so if we have several indexes then the database server has to maintain
a tree like data structure for each of them
• and of course indexes use memory ...
• indexes should not be used on small database tables
• columns that are frequently manipulated should not be indexed
Constraints
(Database Management)
Constraints
• constraints are restrictions to a given database table column
• NOT NULL constraint
• PRIMARY KEY constraint
• FOREIGN KEY constraint
• UNIQUE KEY constraint
• CHECK constraint
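All five constraints can appear in a single table definition. The sketch below uses hypothetical university and student tables. Note that MySQL enforces CHECK constraints only from version 8.0.16 on, older versions parse them but ignore them.

```sql
CREATE TABLE university (
    id   INT PRIMARY KEY,                  -- PRIMARY KEY constraint
    name VARCHAR(100) NOT NULL UNIQUE      -- NOT NULL and UNIQUE constraints
);

CREATE TABLE student (
    id            INT PRIMARY KEY,
    name          VARCHAR(50) NOT NULL,
    email         VARCHAR(100) UNIQUE,
    age           INT CHECK (age >= 0),    -- CHECK constraint
    university_id INT,
    -- FOREIGN KEY constraint: the value must exist in university.id
    FOREIGN KEY (university_id) REFERENCES university (id)
);
```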
Bitmap Indexing
(Database Management)
Bitmap Indexing
[figure: search tree with the keys 12, 4, 20, 5, 23 and 1; 12 is the root]

the main problem with tree like structures is that they may get too large

which means they contain a huge amount of items and this is why they may
become a bit slower

INSTEAD OF USING B-TREES WE CAN USE OTHER APPROACHES !!!
Bitmap Indexing
ADS
ID NAME FOR CHILDREN STATUS
1 website_traffic FALSE pending
2 udemy_mail FALSE approved
3 facebook_ad TRUE rejected
4 instagram_ad FALSE approved

we should use B-tree indexing when there are a huge number of possible values
for a given column

THE NUMBER OF DISTINCT VALUES IS CALLED CARDINALITY IN MATHEMATICS
Bitmap Indexing

"We should use bitmap indexing when the number of possible values is rather small
(low cardinality) and use B-trees when there is a high cardinality"
the index type that fits each column of the ADS table:

ID: B-tree indexing
NAME: B-tree indexing
FOR CHILDREN: bitmap indexing
STATUS: bitmap indexing
bitmaps with one bit per row of the ADS table (rows 1 to 4 from left to right):

FOR CHILDREN
TRUE 0 0 1 0
FALSE 1 1 0 1

STATUS
PENDING 1 0 0 0
REJECTED 0 0 1 0
APPROVED 0 1 0 1
Bitmap Indexing
SELECT name FROM ads WHERE status = 'pending';

TRUE 0 0 1 0
FALSE 1 1 0 1

PENDING 1 0 0 0
REJECTED 0 0 1 0
APPROVED 0 1 0 1

the PENDING bitmap is 1 0 0 0 so only row 1 (website_traffic) matches
Bitmap Indexing
SELECT name FROM ads WHERE children = 'true' AND status = 'rejected';

TRUE 0 0 1 0
FALSE 1 1 0 1

PENDING 1 0 0 0
REJECTED 0 0 1 0
APPROVED 0 1 0 1

we can use logical operators (AND and OR) to get the results
for the queries

TRUE AND REJECTED = 0 0 1 0 so only row 3 (facebook_ad) matches
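The bit arithmetic itself can be tried out with MySQL bit literals. This is only an illustration of the AND and OR steps, it does not create a bitmap index. The leftmost bit stands for row 1 and the rightmost bit for row 4, which is an assumption of this sketch.

```sql
-- TRUE bitmap AND REJECTED bitmap: rows that are for children and rejected
SELECT LPAD(BIN(b'0010' & b'0010'), 4, '0') AS children_and_rejected;  -- '0010'

-- PENDING bitmap OR REJECTED bitmap: rows that are pending or rejected
SELECT LPAD(BIN(b'1000' | b'0010'), 4, '0') AS pending_or_rejected;    -- '1010'
```

BIN drops leading zeros, so LPAD pads the result back to four bits, one per row of the ADS table.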
Database Data Structures
(Database Management)
Database Data Structures
• databases are stored as simple files on the external memory (HDD)
• the concrete implementation depends on the database engine
• MyISAM creates separate files for each database table
• InnoDB can use just a single shared file (or one file per table,
depending on the configuration)
Database Data Structures
1 Adam
6 Kevin
3 Ana
8 Michael
5 Daniel
4 Sofia

the items of a database table are not necessarily ordered by default

DATABASE TABLES ARE STORED IN FILES IN THE EXTERNAL MEMORY

there may be gaps in the table because of the previous remove operations
Database Data Structures
[figure: DISK MEMORY (HDD), MAIN MEMORY (RAM), PROGRAM]

• the data located on the hard disk drive
can not be processed directly
• it must be brought into the main memory
• in the main memory (RAM) we can use
either the stack memory or the heap
memory
• we can only read or write whole
blocks (pages)
• page size ranges from 2-16 KB
• ACCESSING THE BLOCKS IS SLOW !!!
Database Data Structures
• accessing items on the external memory (HDD) is way slower than
manipulating the main memory
• we need totally different data structures

EXTERNAL MEMORY ACCESS TIME: 12 ms

RAM ACCESS TIME: 0.0001 ms

• so far we have manipulated data present in the main memory but
now we have to fetch the data from the external memory first
Database Data Structures
1 Adam
6 Kevin
3 Ana
8 Michael
5 Daniel
4 Sofia

what happens to the pages in the main memory?

WE HAVE TO CONSIDER THE ITEMS IN THE PAGES ONE BY ONE

so it is a sequential search but it is in the main memory so it is quite fast
Database Data Structures
SELECT * FROM city WHERE code='USA';

• so if we are looking for an item in the database table – we have no
other option but to consider the entries one by one
• this is called linear search with O(N) running time
• can we do better?
• of course if the underlying data is sorted – we can apply binary search
with O(logN) logarithmic running time
• WE SHOULD KEEP THE DATABASE TABLE SORTED ACCORDING
TO THE PRIMARY KEY
B Trees
(Database Management)
Database Data Structures
• we know that balanced tree data structures can guarantee logarithmic
O(logN) running time complexity
• they store the items in a sorted order
• so we can store the indexes of a database table (such as the primary
keys) in B trees
Database Data Structures
[figure: search tree with the keys 12, 4, 20, 8, 23, 1, 9 and 7; 12 is the root]

so we store the primary keys or indexes like this

every node in the tree stores a pointer to a database table's row as well

THE PROBLEM IS THAT WE HAVE TO SWAP PAGES EVERY TIME WE VISIT
A CHILD NODE !!!
Database Data Structures
[figure: index tree with the keys 6, 5 and 8 (6 is the root) next to the table rows below]

1 Adam
6 Kevin
8 Michael
5 Daniel
Database Data Structures

"B trees store multiple keys in the same tree node in order to minimize page swapping"
Database Data Structures

8 12 20

-2 9 11 12 14 19 32

so the nodes contain multiple keys and these will be fetched into the main memory
(and we search for the item we are looking for in the main memory)
Database Data Structures
• there are some disadvantages of using B trees
• we have to store the pointers to the database table rows with every
single key in the tree like structure
• the operations are quite complicated – such as inserting and removing
items from the tree
B+ Trees
(Database Management)
Database Data Structures
• B+ trees are very similar to B trees but there may be duplicate values
• the leaf nodes contain all the values – this is why there are duplicates
• leaf nodes form a linked list which is a crucial feature when dealing with
databases
• we can store more keys in the internal nodes because there is no need to
store the pointers to the rows in the database there
• easier to do insertion and removal operations
Database Data Structures
8

5 12

0 1 5 6 8 12 13 17
the keys in the leaf nodes refer to the rows of the database table:

0 Adam
6 Kevin
5 Ana
17 Sofia
13 Michael
1 Daniel
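Database Data Structures
SELECT * FROM city WHERE id BETWEEN 6 AND 12;

5 12

0 1 5 6 8 12 13 17

we find the leaf with the key 6 and then follow the linked list of
leaf nodes (6, 8, 12) until the key is larger than 12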
Database Data Structures

B TREES
• all the nodes have pointers to database table rows
• there are no duplicate values
• all the operations are quite complicated (and a bit slower)

B+ TREES
• just the leaf nodes have pointers to the table rows
• there is redundancy because of the leaf nodes
• easy and fast operations
Stored Procedures
(Database Management)
Stored Procedures
• SQL is a non-procedural programming language by default
• with stored procedures we can create procedural like scripts that can be
executed at any time
• stored procedures are relatively fast – but not as fast as compiled languages
such as C or C++
• MAIN SPEED GAIN COMES FROM REDUCTION OF NETWORK TRAFFIC
• they are stored on the server
• we can use stored procedures for repetitive tasks
• they do not depend on the high level programming language of the
application – we can call them from PHP, Java or C#
Stored Procedures
• stored procedures can centralize logic that was originally implemented
in applications
• in order to save time and memory, extensive or complex processing that
requires execution of several SQL statements can be saved into stored
procedures and all applications call the procedures
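As a sketch of centralizing such logic, the money transfer from the transaction slides could be moved into a MySQL stored procedure. The procedure name transfer_money and its parameter names are made up for this example. DELIMITER is a command of the mysql command line client, not a SQL statement.

```sql
DELIMITER //

CREATE PROCEDURE transfer_money(
    IN p_from   INT,
    IN p_to     INT,
    IN p_amount DECIMAL(10, 2)
)
BEGIN
    -- on any SQL error undo the partial work and pass the error on to the caller
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    START TRANSACTION;
    UPDATE saving_account  SET balance = balance - p_amount WHERE account_id = p_from;
    UPDATE current_account SET balance = balance + p_amount WHERE account_id = p_to;
    INSERT INTO bank_log VALUES (CURRENT_DATE, p_amount, p_from, p_to);
    COMMIT;
END //

DELIMITER ;

-- the application sends one short statement instead of several queries
CALL transfer_money(123, 55, 500);
```

Only the CALL statement travels over the network, which is where the speed gain mentioned above comes from.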
Stored Procedures
we have an application with the user interface components (buttons) etc.

the server stores the database itself (with the database tables):
TABLE #1, TABLE #2, TABLE #3

modern applications communicate with the server (and the database)
via the internet

THIS IS THE BOTTLENECK !!!
Stored Procedures
[figure: the same application and server, with a STORED PROCEDURE stored on the server next to TABLE #1, TABLE #2 and TABLE #3]
Stored Procedures
Disadvantages
• there are several disadvantages of stored procedures
• stored procedures need memory
• MySQL is not a very effective procedural scripting language – stored
procedures increase the CPU usage as well
• stored procedures are hard to debug
