
Chapter 2

The document discusses relational databases and the SQL language. It describes the relational model, how relations are represented through tables, and the core components of the SQL language including data definition, manipulation, and control instructions.

DATABASES

2. Relational Databases

Chapter 2. Relational Databases

• Relational model
• Relations
• Attributes
• Data domains

• Representing relations through tables


• SQL language
• Lexical conventions
• Expressions, operators, functions
• Data definition instructions: CREATE, ALTER, DROP
• Data manipulation instructions: INSERT, SELECT, UPDATE, DELETE
Relational model

• The relational model was introduced by E. F. Codd at IBM in 1970 and is the most widespread database model in use today
[Figure: bar chart "Database usage (September 2018)", vertical axis from 0 to 1400. Systems shown: Oracle (relational), MySQL (relational), Microsoft SQL Server (relational), PostgreSQL (relational), MongoDB (document store), DB2 (relational), Elasticsearch (search engine), Redis (key-value store), Microsoft Access (relational), Cassandra (wide column store)]
Relational model

• A relational database consists of a finite set of relations, each relation representing an entity type or an association between two or more types (sets) of entities
• A relation is defined by its attributes. The attributes of a relation are the attributes of the entity type or association which that relation represents
• A data domain is a set of atomic values, with a certain meaning, from which the relation attributes can take values:
• $D = \{ d_i \mid i = 1, \dots, n \}$
• $d_i$ is an element of the data domain that satisfies certain constraints
• The values of the domain are atomic (indivisible)
• A special NULL value can belong to any data domain
Relational model

• The relation schema is a description of a relation
• The relation schema $R(A_1, A_2, \dots, A_i, \dots, A_n)$ is defined by the relation name $R$ and the ordered list of its attributes $A_1, A_2, \dots, A_i, \dots, A_n$
• Each attribute $A_i$ is defined on its data domain $D(A_i)$
• Example: STUDENTS (First Name, Last Name, Date of Birth, Faculty, …)
• A relation $R$ defined by a schema $R(A_1, A_2, \dots, A_i, \dots, A_n)$ is a set of n-tuples $t$, each tuple consisting of an ordered list of $n$ values $t = (v_1, v_2, \dots, v_i, \dots, v_n)$, where $1 \le i \le n$ and $v_i$ is the value of the attribute $A_i$, which belongs to its data domain $D(A_i)$
Relational model

• The number of tuples in a relation is called the cardinality of the relation
• Each tuple in a relation is unique (there are no duplicate tuples)
• A relation is represented by a table, which consists of:
• The name of the table, which is identical to the name of the relation it represents
• A set of columns, one for each attribute of the relation
• The table header, which contains the attribute names of the relation
• A set of rows, each row representing a tuple (an entity)
• A set of values for the attributes of each tuple
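The definitions above can be tried out in a few lines of Python. This is only an illustrative sketch, assuming a relation is modelled as a Python set of tuples; the variable names are chosen for the example and the sample rows come from the STUDENTS table used in this chapter.

```python
# A relation modelled as a Python set of tuples (illustrative only).
schema = ("first_name", "last_name", "date_of_birth", "faculty")

students = {
    ("Mihaela", "Andreescu", "1998-06-23", "ETTI"),
    ("Andrei", "Barbu", "1999-10-14", "ETTI"),
    ("Georgiana", "Constantin", "1998-02-23", "ETTI"),
}

# Adding a tuple that is already present leaves the set unchanged,
# which mirrors the rule that a relation has no duplicate tuples.
students.add(("Andrei", "Barbu", "1999-10-14", "ETTI"))

cardinality = len(students)      # number of tuples in the relation
attribute_count = len(schema)    # number of attributes (table columns)
print(cardinality, attribute_count)
```

A real DBMS table only enforces this "no duplicates" rule when a key is declared, which is the subject of the tuple constraints later in the chapter.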
Representing relations through tables

STUDENTS (First Name, Last Name, Date of Birth, Faculty, …)

Table name: STUDENTS (each column corresponds to an attribute)

First Name   Last Name    Date of Birth   Faculty   …      (table header)
Mihaela      Andreescu    1998-06-23      ETTI      …
Andrei       Barbu        1999-10-14      ETTI      …      (rows = tuples)
Georgiana    Constantin   1998-02-23      ETTI      …
Laura        Dumitrescu   1997-11-24      ETTI      …

Each cell holds an attribute value.
Representing relations through tables

• A Database Management System (DBMS) provides tools for designing and displaying databases and tables
SQL language

• SQL (Structured Query Language) was developed at IBM in the early 1970s
Year   Standard   Alias    Features
1986   SQL-86     SQL-87   First formalized by ANSI. Ratified by ISO in 1987
1989   SQL-89              Minor revision that added integrity constraints
1992   SQL-92     SQL2     Major revision
1999   SQL:1999   SQL3     Regular expression matching, recursive queries, triggers, procedural and flow-control statements, object-relational features
2003   SQL:2003            XML-related features, columns with auto-generated values
2006   SQL:2006            SQL in conjunction with XML
2008   SQL:2008            Minor improvements
2011   SQL:2011            Minor improvements
2016   SQL:2016            JSON features
SQL language

• Each RDBMS implements a SQL language dialect, which reduces the portability of applications
• Various SQL implementations may lack some commands provided
in the standard, but there may be extensions specific to each DBMS
• The SQL language uses table representation for relations, which is
simpler and more intuitive (uses the terms table, column, row)
• The SQL language includes:
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Control Language (DCL)
• Transaction Control Language (TCL)
SQL language

• SQL2 is a non-procedural language
• An SQL2 statement specifies what information should be set or obtained, not the procedure by which this is done
• SQL2 does not contain execution flow control instructions (for, while, if, etc.)
• The SQL3 standard provides control instructions and user-defined type creation, and is implemented in object-relational DBMSs
• For database applications, procedural SQL language extensions, libraries and programming interfaces were developed that integrate SQL statements into application programs
SQL language

• A SQL statement is usually delimited by a semicolon (;)
• Each SQL statement contains a command which specifies what action to take, followed by other elements that specify operations, clauses, parameters, etc.
• Example: SELECT * FROM students;
• SQL is case-insensitive, with the exception of case-sensitive (quoted) identifiers
• Some conventions specify that SQL keywords should be written in uppercase, but this is left to the developer's preference
SQL language

• The tokens of a SQL statement:
• keywords: CREATE, INSERT, SELECT, UPDATE, DELETE, FROM, etc.
• identifiers:
• simple - alphanumeric characters and underscore (_): students, first_name, etc.
• delimited (quoted) - can contain any character; standard SQL uses double quotes ("First Name"), while MySQL uses backticks (`First Name`); single quotes delimit string constants, not identifiers
• constants: 1000, 100.5, 'Ionescu', NULL
• special characters: *, ., ;
• whitespace: space, new line, tab
• An instruction can be written on one or more lines, and a line can contain one or more instructions
Expressions and operators in SQL

• Parentheses can be used to impose an order of operations that differs from the default operator precedence
• An SQL expression consists of one or more operands, operators and, where the operator precedence requires it, parentheses
• An operand can be:
• the name of a column - in this case the value stored in that column is used
from one or more rows of the table
• a constant
• the value returned by a function

• Example: SELECT * FROM students WHERE faculty = 'ETTI';


Expressions and operators in SQL

• An SQL operator can be:
• one or more special characters: +, -, *, /, %, <= etc.
• a keyword: AND, OR, NOT, LIKE, etc.
• SQL operators classifications:
• By number of operands:
• Unary: NOT
• Binary: =, <=, +, etc.
• By operation type:
• Arithmetic: +, -, *, etc. (with integer or real numbers); ~, &, |, <<, >> (bit oriented)
• Comparison: <, >, =, <>, !=, <=, >= (arithmetic); IS NULL, IN, LIKE, etc. (SQL)
• Logical: AND, OR, NOT
Expressions and operators in SQL

• Comparison operators (arithmetic and SQL) return a logical value:
• False (0), if the condition is not met
• NULL, if it is not known if the condition is met or not
• True (1), if the condition is met
• Logical operators (AND, OR, NOT):

A       B       A AND B   A OR B
TRUE    TRUE    TRUE      TRUE
TRUE    FALSE   FALSE     TRUE
TRUE    NULL    NULL      TRUE
FALSE   FALSE   FALSE     FALSE
FALSE   NULL    FALSE     NULL
NULL    NULL    NULL      NULL

A       NOT A
TRUE    FALSE
FALSE   TRUE
NULL    NULL
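The truth tables can be checked against a real engine. The sketch below is an illustration only: it assumes SQLite (through Python's standard sqlite3 module) rather than the MySQL dialect used on these slides, and the helper name evaluate is made up for the example. SQLite reports TRUE as 1, FALSE as 0 and NULL as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    """Evaluate a constant SQL expression and return its single value."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]

# Rows of the truth tables that involve NULL, plus a NULL comparison.
results = {
    "NULL AND 0": evaluate("NULL AND 0"),    # FALSE wins over unknown
    "NULL AND 1": evaluate("NULL AND 1"),    # stays unknown
    "NULL OR 1": evaluate("NULL OR 1"),      # TRUE wins over unknown
    "NULL OR 0": evaluate("NULL OR 0"),      # stays unknown
    "NOT NULL": evaluate("NOT NULL"),        # negation of unknown is unknown
    "NULL = NULL": evaluate("NULL = NULL"),  # comparing with NULL is unknown
}
for expression, value in results.items():
    print(f"{expression:12} -> {value}")
```

The last entry is the reason the IS NULL operator exists: an ordinary comparison with NULL never yields TRUE.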
Predefined SQL functions

• In SQL there are two types of functions:
• Predefined functions
• User-defined functions
• Predefined functions can be:
• Aggregate functions
• Scalar functions
• Aggregate functions calculate a result from multiple rows of a table
• Scalar functions receive one or more arguments and return the
calculated value, or NULL in case of an error
• Scalar functions arguments can be constants or attribute values
specified by corresponding column names
SQL data types

• SQL numeric data types:
• Integer types

Type         Bytes   Min. value (signed)   Max. value (signed)   Min. value (unsigned)   Max. value (unsigned)
TINYINT*     1       -128                  127                   0                       255
SMALLINT     2       -32768                32767                 0                       65535
MEDIUMINT*   3       -8388608              8388607               0                       16777215
INT          4       -2147483648           2147483647            0                       4294967295
BIGINT       8       $-2^{63}$             $2^{63} - 1$          0                       $2^{64} - 1$

• * - Types specific only to certain DBMSs (generally to be avoided)


SQL data types

• SQL numeric data types:
• Fixed-point types (exact value)

Type                            Bytes      Precision (p)                    Scale (s)
DECIMAL(p, s) / NUMERIC(p, s)   variable   User defined (max. 65 digits*)   User defined (max. 30 digits*)

• Floating-point types (approximate value)

Type                 Bytes   Characteristics
FLOAT / REAL         4       Used when the precision (p) is between 0 and 23
DOUBLE [PRECISION]   8       Used when the precision (p) is between 24 and 53

• * - The number of maximum digits can vary between DBMSs


SQL data types

• SQL character data types:

Type          Length      Characteristics
CHAR (n)      0 to max*   Used for fixed-length data
VARCHAR (n)   0 to max*   Used for short variable-length data
TEXT**        variable*   Used for large text data

• * - The maximum length varies between DBMSs (good practice is to not exceed 255 characters)
• ** - The TEXT data type, although not part of the official SQL standard, is found in many DBMSs such as MySQL, MS SQL Server or PostgreSQL (Oracle offers the CLOB type for the same purpose)
SQL data types

• SQL date and time data types:

Type         Format                Characteristics
DATE         YYYY-MM-DD            Only valid dates allowed
TIME         HH:MM:SS              Only valid values allowed
TIMESTAMP*   YYYY-MM-DD HH:MM:SS   Converts dates to UTC for storage
DATETIME**   YYYY-MM-DD HH:MM:SS   No date conversion

• * - TIMESTAMP and DATETIME types may not both be present on all DBMSs.
TIMESTAMP values are stored as seconds since (or before in some cases) a
"start date" (such as Unix time – January 1st 1970). On some DBMSs, the range
of TIMESTAMP values is limited
• ** - The DATETIME data type may not be present in all DBMSs
Notation conventions

• For the presentation of SQL and other languages, libraries and interfaces, the following conventions are used:

[ ] (square brackets)       Optional elements of the command
{ } (braces)                Mandatory elements of the command
| (vertical bar)            Separates the alternative elements inside square brackets or braces; only one element from the list will be in the final command
[, … n]                     The preceding element can be repeated n times; repeated elements are separated by a comma
element_1, … , element_n    List of n elements of the same type; repeated items are separated by a comma
element_list                List of elements of the same type separated by commas
SQL statements

• Data definition instructions (DDL) in SQL
• CREATE
• ALTER
• DROP
• These instructions can define, edit or delete all types of entities in a database management system. These entities are:
• DATABASE
• TABLE
• VIEW
• PROCEDURE
• TRIGGER
• USER
Creating and editing databases

• Creating a database
CREATE DATABASE database_name;
• The database name must consist only of alphanumeric characters and underscore (_)
• Changing database characteristics
ALTER DATABASE database_name alter_specification;
alter_specification:
[DEFAULT] CHARACTER SET [=] charset_name
| [DEFAULT] COLLATE [=] collation_name
Using and deleting databases

• Using a database
USE database_name;
• A database can be viewed as a container for its entities: tables,
views, functions, procedures and triggers
• Deleting a database
DROP DATABASE [IF EXISTS] database_name;
• Dropping a database will also delete all the files that the database
system may create during normal operation
Creating tables

• The CREATE TABLE statement has the following syntax:
CREATE TABLE [IF NOT EXISTS] table_name(
col_1 data_domain_1 [column_constraints],
col_2 data_domain_2 [column_constraints],
…
col_n data_domain_n [column_constraints],
[table_constraints]
);
Creating tables

• Example:
CREATE TABLE students(
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
date_of_birth DATE,
faculty VARCHAR(255),
enrolment_year INT,
`group` VARCHAR(255)
);
Editing tables

• The edit operation (ALTER TABLE) allows users to:
• Add or delete attributes
• Change attribute definitions (name, data domain, constraints)
• Add, change or delete table constraints
• Adding columns (attributes)
ALTER TABLE table ADD col domain [constraints]
[FIRST | AFTER column_name];
• Example:
ALTER TABLE students ADD email VARCHAR(255);
ALTER TABLE students ADD id INT NOT NULL FIRST;
Editing tables

• Changing attributes
ALTER TABLE table CHANGE col_old_name col_new_name
data_domain [column_constraints];
• Example:
ALTER TABLE students CHANGE enrolment_year
enrolment_year INT UNSIGNED;
• Adding column constraints
ALTER TABLE table ADD [CONSTRAINT [name]] {
PRIMARY KEY | UNIQUE }({column | column_list});
Editing tables

• Example:
ALTER TABLE students ADD CONSTRAINT unique_email
UNIQUE(email);
ALTER TABLE students ADD UNIQUE(email);
ALTER TABLE students ADD PRIMARY KEY(id);
• Deleting columns or constraints
ALTER TABLE table DROP {[COLUMN] | INDEX | PRIMARY
KEY | FOREIGN KEY} {column_name | key_name} ;
• Example:
ALTER TABLE students DROP INDEX unique_email;
ALTER TABLE students DROP email;
Deleting tables

• Deleting a table
DROP TABLE [IF EXISTS] table_name;
• Example:
DROP TABLE students;
• Dropping a table will delete all the data from the table, as well as
the table structure
• The operation is irreversible
Integrity constraints

• The integrity constraints are the rules that can be added for the
stored data to best correspond with the modelled reality:
• They can be defined when designing the database
• They must be respected by any state of the relation

• The integrity constraints can be classified by the place where they are defined:
• column constraints
• table constraints
• Classification by number of relations involved:
• intra-relation constraints
• inter-relation constraints
Integrity constraints

• Intra-relation constraints - rules that are imposed within a single relation; there are three categories:
• Domain constraints - imposed on the values of the attributes
• Tuple constraints - conditions imposed on the tuples of a relation through its keys (primary or secondary)
• Constraints imposed by data dependencies (functional, multivalued or join dependencies); these are constraints between attribute values in a relation
• Inter-relation constraints - rules that are imposed between two or more relations; they ensure referential integrity (correct association of relations) through foreign keys
Integrity constraints

• Classification in terms of defining and verifying the constraints:
• Inherent
• Implicit
• Explicit

• The inherent constraints are those of the data model itself, which
don’t need to be explicitly defined as they are included in the
database management system
• Example: in the relational model the constraint that the value of
each attribute is atomic (indivisible) is an inherent constraint
Integrity constraints

• Implicit constraints are rules specific to each management system; they are defined by the database engineer, and the management system checks and imposes them automatically
• Each DBMS may have its own implicit constraints, but generally
domain constraints, tuple constraints and referential integrity
constraints are implicit constraints in any DBMS
• Explicit constraints are additional constraints specific to that
database; the designer defines the explicit constraints as well as
their checking procedures (functions, stored procedures, triggers)
• Example: Data dependencies not determined by the keys of
relations
Domain constraints

• Domain constraints:
• NOT NULL constraint
• constraint of default value (DEFAULT)
• check constraint (CHECK)

• The NOT NULL constraint means that the attribute cannot take the NULL value in any tuple of the relation
• The NULL value of an attribute in a tuple means that the value of
that attribute is not known for that tuple
• Examples:
• the birth date of a historical person is not known at all
• the value of an attribute is not known when inserting the tuple, but it will be
known and updated later
Domain constraints

• When creating a table, the NULL option is the default (if nothing is specified), or it can be explicitly specified as NULL or NOT NULL
• The NULL and NOT NULL options are introduced as column constraints in the SQL CREATE TABLE statement.
CREATE TABLE students(
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
…,
`group` VARCHAR(255) NULL
);
Domain constraints

• The default value constraint of an attribute (DEFAULT): if the value of an attribute is not specified when inserting a tuple, then:
• the attribute receives the default value (DEFAULT), if it has been defined
• the attribute receives the NULL value if no default value has been defined, as long as the NULL value is permitted for that attribute
• an error is generated if no default value has been defined and NULL values are not allowed
CREATE TABLE students(
first_name VARCHAR(255) NOT NULL,
…,
country VARCHAR(255) DEFAULT 'Romania'
);
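The three cases listed for a missing value can be observed directly. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of MySQL; the faculty column and the sample names are added here only so that each case has a column to show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students(
        first_name VARCHAR(255) NOT NULL,
        last_name  VARCHAR(255) NOT NULL,
        faculty    VARCHAR(255),
        country    VARCHAR(255) DEFAULT 'Romania'
    )
""")

# Cases 1 and 2: country falls back to its DEFAULT, faculty to NULL.
conn.execute(
    "INSERT INTO students (first_name, last_name) VALUES ('Laura', 'Dumitrescu')"
)
row = conn.execute("SELECT faculty, country FROM students").fetchone()
print(row)

# Case 3: last_name is NOT NULL and has no DEFAULT, so omitting it fails.
try:
    conn.execute("INSERT INTO students (first_name) VALUES ('Andrei')")
    missing_value_rejected = False
except sqlite3.IntegrityError as error:
    missing_value_rejected = True
    print("rejected:", error)
```

The exact error message differs between DBMSs; only the fact that the insert is refused is common to all of them.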
Domain constraints

• Check constraint (CHECK) – verifies the attribute values against a condition that must be TRUE
• It is defined as a table constraint in the CREATE TABLE statement:
[CONSTRAINT constraint_name] CHECK (condition)
• Example:
CREATE TABLE students(
…,
b_day DATE,
CONSTRAINT dob CHECK (YEAR(b_day) < 2001)
);
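A CHECK constraint like the one above can be exercised as follows. This is a sketch under stated assumptions: it runs on SQLite via Python's sqlite3 module, and because SQLite has no YEAR() function the condition is rewritten with strftime('%Y', …); the helper try_insert and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students(
        first_name VARCHAR(255) NOT NULL,
        b_day DATE,
        CONSTRAINT dob CHECK (CAST(strftime('%Y', b_day) AS INTEGER) < 2001)
    )
""")

def try_insert(first_name, b_day):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", (first_name, b_day))
        return True
    except sqlite3.IntegrityError:
        return False

accepted_1998 = try_insert("Mihaela", "1998-06-23")  # year 1998 satisfies the condition
accepted_2005 = try_insert("Andrei", "2005-01-01")   # year 2005 violates the condition
accepted_null = try_insert("Laura", None)            # condition evaluates to NULL
print(accepted_1998, accepted_2005, accepted_null)
```

In SQLite a row is rejected only when the condition evaluates to FALSE, so the row with an unknown (NULL) birthday is accepted; combining CHECK with NOT NULL closes that gap if it is unwanted.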
Tuple constraints (row constraints)

• A relation is defined as a set of tuples. The tuples of a relation must be distinct (there can’t be two or more identical tuples)
• To keep the tuples of a relation distinct, a primary key is used in each relation
• A primary key PK of a relation is an attribute (or a set of attributes) of that relation that has the uniqueness property, that is, each value of the primary key is unique in that relation:
• $T_i[PK] \ne T_j[PK]$ for any $i \ne j$, where $T_i$ and $T_j$ are two different tuples of the relation
• The uniqueness of the primary key is a tuple integrity constraint: each tuple can be precisely identified by the primary key alone
Tuple constraints (row constraints)

• The primary key must meet the following requirements:
• to be irreducible - no proper, non-empty subset of the primary key PK has the uniqueness property
• to be defined (known) for each tuple in the relation; therefore, NULL values are not allowed for any of the attributes of the primary key
• The primary key is an implicit constraint: it is defined by the designer when creating the table, and the DBMS checks it in order to maintain the integrity of the tuples:
• At INSERT, the new tuple must have a unique and defined primary key (no other tuple in the relation has the same primary key value)
• At UPDATE, changing the value of the primary key is generally forbidden (but some DBMSs allow the change, provided that the new key value is unique)
Primary and artificial keys

• It is possible to define either natural primary keys or artificial primary keys, provided that they meet the conditions of uniqueness and irreducibility
• A natural primary key is an attribute (or set of attributes) of the relation that:
• is a property of the entity type (or association) represented by that relation (person, product, etc.)
• has naturally unique values: there are no two tuples with the same value of the primary key, because there are no two entities with the same value of the respective property

• Example:
• Passport number, CNP for Romanian citizens, SSN for US citizens
Primary and artificial keys

• An artificial primary key is an attribute (usually simple) that is not a property of the entity type or association represented by the relation, but is added to the relation schema for uniquely identifying tuples
• The uniqueness of the artificial primary key must be ensured by the designer and the DBMS
• The irreducibility of the artificial primary key is ensured if it is a simple attribute
• Example: STUDENTS(id, first_name, last_name, …)
• id - is an artificial primary key
Primary and artificial keys

• Natural primary keys could also be defined by simple or compound attributes that have the uniqueness property only under certain conditions:
• the simple attribute CNP - only applicable to people in Romania
• the compound attribute (first_name, last_name, dob, address) - too many attributes
• For efficient tuple identification, primary keys with fewer attributes (simple attributes) are preferred, when possible
• DBMSs offer different methods of ensuring the uniqueness of the artificial key value
Primary and artificial keys

• In Microsoft SQL Server, unique values of the primary key are obtained using the IDENTITY parameter, which ensures incrementing the value of the key attribute
• In Oracle systems, artificial keys can be generated using
SEQUENCE objects; a SEQUENCE object generates a unique
number at each call of the NEXTVAL method
• In MySQL, the AUTO_INCREMENT parameter is used to generate
unique keys for primary keys
• In PostgreSQL, the SERIAL data type is used to generate unique
values for the primary key
Defining primary keys

• The primary key can be defined as a column constraint (when it consists of a single attribute) or as a table constraint (for composite primary keys)
CREATE TABLE students (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
…
);
CREATE TABLE assignments (
student_id INT UNSIGNED NOT NULL,
class_id INT UNSIGNED NOT NULL,
CONSTRAINT pk PRIMARY KEY (student_id, class_id)
);
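Both forms of primary key definition can be tested with a short script. This is a sketch only, assuming SQLite via Python's sqlite3 module: SQLite spells the generated key as INTEGER PRIMARY KEY AUTOINCREMENT instead of MySQL's INT UNSIGNED ... AUTO_INCREMENT, and the sample values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students(
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- column constraint, artificial key
        first_name VARCHAR(255) NOT NULL
    );
    CREATE TABLE assignments(
        student_id INTEGER NOT NULL,
        class_id   INTEGER NOT NULL,
        CONSTRAINT pk PRIMARY KEY (student_id, class_id)  -- table constraint
    );
""")

# The artificial key is generated by the DBMS when it is omitted.
conn.execute("INSERT INTO students (first_name) VALUES ('Mihaela')")
conn.execute("INSERT INTO students (first_name) VALUES ('Andrei')")
generated_ids = [row[0] for row in conn.execute("SELECT id FROM students ORDER BY id")]

conn.execute("INSERT INTO assignments VALUES (1, 10)")
conn.execute("INSERT INTO assignments VALUES (1, 11)")  # same student, other class
try:
    conn.execute("INSERT INTO assignments VALUES (1, 10)")  # repeats an existing pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(generated_ids, duplicate_rejected)
```

Only the combination (student_id, class_id) has to be unique in the composite key, which is why the second assignment row is accepted and the third one is refused.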
Inter-relation constraints (foreign keys)

• Consider two relations R1 and R2 with a 1:M association
• A foreign key is a subset FK of the attributes of relation R2 that references a candidate key CK of relation R1 and satisfies the following conditions:
• the FK attributes are defined on domains compatible with those of the CK attributes of relation R1
• the values of the FK attributes in a tuple of R2 are either NULL or identical to the values of the CK attributes of some tuple in the current state of R1
• Two domains are compatible if they have the same data type and the same semantics (it makes sense to compare them)
• In SQL, domain checking is limited to verifying the compatibility of data types; semantic compatibility has to be ensured by the designer
Inter-relation constraints (foreign keys)

• The foreign key represents a referential constraint between two relations:
• Referenced relation (R1) - parent relation
• Referencing relation (R2) - child relation
• The reference is made by value: the value of the foreign key is equal to the value of the referenced candidate key
• The foreign key can be specified in the CREATE TABLE or ALTER TABLE statement
[CONSTRAINT constraint_name] FOREIGN KEY (key_attr)
REFERENCES referenced_relation (candidate_key)
[ON UPDATE {RESTRICT|CASCADE|SET NULL|NO ACTION}]
[ON DELETE {RESTRICT|CASCADE|SET NULL|NO ACTION}];
Inter-relation constraints (foreign keys)

• Foreign keys can be used solely to ensure referential integrity, but they can also specify the action to take when the parent data is changed (updated or deleted)
• RESTRICT – does not allow the parent attribute value to be edited or deleted while rows in the child table reference it
• CASCADE – propagates the same operation from the parent to the child table
• SET NULL – when editing or deleting a parent row, the child values are set to NULL, if the child column specifications allow this. If the child attributes cannot be NULL, an error is raised
• NO ACTION – usually the same as RESTRICT
Inter-relation constraints (foreign keys)

CREATE TABLE assignments(
student_id INT UNSIGNED NOT NULL,
class_id INT UNSIGNED NOT NULL,
PRIMARY KEY (student_id, class_id),
CONSTRAINT fk_students FOREIGN KEY (student_id)
REFERENCES students(id)
ON UPDATE CASCADE
ON DELETE CASCADE,
…
);
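The effect of the CASCADE actions can be followed step by step. The sketch below assumes SQLite via Python's sqlite3 module rather than MySQL; in SQLite foreign keys are enforced only after PRAGMA foreign_keys = ON, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE students(
        id INTEGER PRIMARY KEY,
        first_name VARCHAR(255) NOT NULL
    );
    CREATE TABLE assignments(
        student_id INTEGER NOT NULL,
        class_id   INTEGER NOT NULL,
        PRIMARY KEY (student_id, class_id),
        CONSTRAINT fk_students FOREIGN KEY (student_id)
            REFERENCES students(id)
            ON UPDATE CASCADE
            ON DELETE CASCADE
    );
    INSERT INTO students VALUES (1, 'Mihaela'), (2, 'Andrei');
    INSERT INTO assignments VALUES (1, 10), (1, 11), (2, 10);
""")

# A child row that points at a missing parent is rejected.
try:
    conn.execute("INSERT INTO assignments VALUES (99, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ON UPDATE CASCADE: changing the parent key rewrites the child rows.
conn.execute("UPDATE students SET id = 5 WHERE id = 1")
# ON DELETE CASCADE: deleting the parent removes its child rows.
conn.execute("DELETE FROM students WHERE id = 2")

remaining = conn.execute(
    "SELECT student_id, class_id FROM assignments ORDER BY student_id, class_id"
).fetchall()
print(orphan_rejected, remaining)
```

With RESTRICT in place of CASCADE, the UPDATE and DELETE on the parent table would be refused instead of propagated.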
Indexes

• The execution time of operations on the data in relations depends on how the set of elements (tuples) is represented
• Searching for an item in a set runs faster if the elements of the set are stored in an organized collection, such as a sorted list, a search tree or a hash table
• The time to search for an element in an unordered set of N items is proportional to N – O(N) complexity
• The search time grows linearly with N
• If the items are stored in a balanced binary search tree, the complexity becomes O(log N), which is a significant improvement for large data sets
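The gap between O(N) and O(log N) is easy to measure by counting steps. This is an illustrative sketch only: it uses binary search over a sorted Python list as a stand-in for a balanced search tree (both halve the remaining candidates at every step), and the function names are made up for the example.

```python
def linear_search_steps(items, target):
    """Scan items one by one; return how many elements were inspected."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps

def binary_search_steps(sorted_items, target):
    """Halve the search interval at each step; return the number of probes."""
    low, high, steps = 0, len(sorted_items) - 1, 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_items[middle] == target:
            break
        if sorted_items[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return steps

keys = list(range(1_000_000))   # one million ordered key values
target = 999_999                # worst case for the linear scan
linear_steps = linear_search_steps(keys, target)
binary_steps = binary_search_steps(keys, target)
print(linear_steps, binary_steps)
```

For one million keys the linear scan inspects every element, while the halving search needs at most 20 probes, since $2^{20} > 10^6$.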
Indexes

• SQL allows the definition of indexes to increase performance of lookup (search) operations
Indexes

• It is not mandatory for the tuples of a relation to be ordered
• To speed up search operations (SELECT) that use a single key (a candidate key, primary or secondary), ordered collections are used
• The other operations (INSERT, UPDATE, DELETE) can also run faster with ordered collections, since the affected tuples and duplicate key values are located by a search
• An index is an auxiliary structure which is physically stored in the database, alongside the tables and other entities
• Structures used for indexing: binary search trees, B-Trees (balanced trees), R-Trees (used for spatial access), hash tables (hash indexes), etc.
Indexes

• There are two categories of indexes:
• A primary index, which sorts and locates the tuples in the database files
• Zero or more secondary indexes, which do not modify the physical location of the tuples
• The primary index is defined on the primary key of the relation
• Each element (node) of the primary index contains a tuple of the relation and the elements are ordered by the value of the primary key PK
• Each node of the primary index contains the value of the primary key attribute ($PK_i$), the values of the other attributes ($a_i, b_i, \dots$) and the addresses of the descendants of the node ($L_j, L_k$)
Indexes

• Each element (node) of the primary index contains a relation tuple
• Queries that use the primary index (primary key) perform efficiently, being a search in an ordered collection
• They require at most about log N search steps (N is the total number of tuples of the relation)
• Queries on the value of other attributes are executed much less efficiently, being a search in an unordered set
• At most N steps are required (N is the total number of tuples of the relation)
• To speed up these queries, secondary indexes are defined on the attributes that appear frequently in queries
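Whether a query uses an index can be inspected through the query plan. The following sketch assumes SQLite via Python's sqlite3 module and its EXPLAIN QUERY PLAN statement (MySQL offers EXPLAIN for the same purpose); the index name idx_students_faculty and the helper query_plan are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students(
        id INTEGER PRIMARY KEY,
        first_name VARCHAR(255) NOT NULL,
        faculty VARCHAR(255)
    )
""")

def query_plan(sql):
    """Return SQLite's textual plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the description

# Lookup by primary key: served by the primary index.
by_key = query_plan("SELECT * FROM students WHERE id = 7")
# Lookup by another attribute: every tuple has to be scanned.
before = query_plan("SELECT * FROM students WHERE faculty = 'ETTI'")

# After adding a secondary index, the same query becomes an index search.
conn.execute("CREATE INDEX idx_students_faculty ON students(faculty)")
after = query_plan("SELECT * FROM students WHERE faculty = 'ETTI'")

print(by_key, before, after, sep="\n")
```

The exact wording of the plan depends on the SQLite version, but the first and third plans report a search and the second one a full scan of the table.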
Indexes

• A secondary index on a relation attribute is a structure ordered by the value of that attribute. The secondary index contains:
• the indexed attribute value (which is an order label)
• the address(es) of the tuples containing that value of that attribute
• There are two categories of secondary indexes: unique (UNIQUE) and "normal"
• A UNIQUE secondary index is defined on an attribute A (simple or compound) of the relation that takes unique values (such as a unique key - secondary or alternative)
• The "normal" secondary index (which is not unique and does not have a specific name) is defined on an attribute A that does not have unique values (it is not a unique key)
Indexes

• In SQL, a secondary index can be created in the CREATE TABLE statement as a column or table constraint, or it can be added using the ALTER TABLE statement
• In general, DBMSs automatically add:
• A UNIQUE secondary index for each candidate key (defined by the UNIQUE constraint)
• A normal secondary index for each foreign key; as such, a secondary index helps to quickly find all the tuples associated with a foreign key value
• Secondary indexes have advantages and disadvantages:
• Advantages: they speed up queries that search by the indexed value
• Disadvantages: they take up storage space and must be updated whenever the relation is modified, which takes time
The INSERT statement

• The INSERT statement is used to enter data into tables
INSERT INTO table_name [(col_1, col_2, …, col_n)]
VALUES (val_1, val_2, …, val_n);
• Example:
INSERT INTO students VALUES (1, 'Mike',
'Wazowski', '1997-03-23', 'ETTI', 2015, '441F');
INSERT INTO students (id, first_name, last_name)
VALUES (2, 'James', 'Sullivan');
The INSERT statement

• The column list can be omitted if values are entered for all
attributes of the table
• The order of the values must be the same as the order of the
columns of the table
• The order of the columns comes from the order of the attribute
definitions in the CREATE TABLE instruction, as well as all
subsequent table alterations (ALTER TABLE)
• The order of the columns can be queried using the DESCRIBE
statement
• If values are not specified for all columns, the missing columns
will get a DEFAULT value (if specified) or NULL (if accepted)
The DESCRIBE statement

• If no DEFAULT value is specified and NULL is not allowed, an error is returned
DESCRIBE students;
• Sample output:

Field            Type               Null   Key   Default   Extra
id               int(11)            NO     PRI   NULL
first_name       varchar(255)       NO           NULL
last_name        varchar(255)       NO           NULL
date_of_birth    date               YES          NULL
faculty          varchar(255)       YES          NULL
enrolment_year   int(10) unsigned   YES          NULL
group            varchar(255)       YES          NULL
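The two INSERT forms and the column order can be checked together. This is a sketch under stated assumptions: it uses SQLite via Python's sqlite3 module, where PRAGMA table_info plays the role of MySQL's DESCRIBE and the reserved word group is quoted with double quotes instead of backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students(
        id INTEGER NOT NULL PRIMARY KEY,
        first_name VARCHAR(255) NOT NULL,
        last_name VARCHAR(255) NOT NULL,
        date_of_birth DATE,
        faculty VARCHAR(255),
        enrolment_year INT,
        "group" VARCHAR(255)
    )
""")

# Without a column list, the values follow the table's column order.
conn.execute("""INSERT INTO students VALUES
    (1, 'Mike', 'Wazowski', '1997-03-23', 'ETTI', 2015, '441F')""")
# With a column list, the unlisted columns receive NULL (no DEFAULT is defined).
conn.execute("""INSERT INTO students (id, first_name, last_name)
    VALUES (2, 'James', 'Sullivan')""")

# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
column_order = [row[1] for row in conn.execute("PRAGMA table_info(students)")]
second_row = conn.execute("SELECT * FROM students WHERE id = 2").fetchone()
print(column_order)
print(second_row)
```

Listing the columns explicitly in INSERT keeps a statement valid even if the table later gains columns through ALTER TABLE.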
The SELECT statement

• The SELECT instruction extracts (queries) data from one or more tables, according to specific conditions
SELECT [DISTINCT] { * | column_list | function_list | constant_list }
[FROM table_list]
[WHERE condition] [other_clauses];
• The SELECT statement returns a table with the columns specified
in the columns list/function list/constant list
• The resulting table will contain the rows (tuples) of the cartesian
product of the tables in table_list for which the expression
condition is true
The SELECT statement

• The SELECT statement has the following parts (clauses):
• SELECT – specifies the list of columns/functions/constants of the result table
• FROM – specifies the list of tables from which the result is selected
• WHERE – specifies one or more conditions which will be applied on the result
• Other clauses:
• ORDER BY – allows the ordering of the results
• GROUP BY – allows grouping the results when using aggregate functions
• HAVING – allows adding conditions on computed results
• LIMIT – allows limiting the number of results and pagination
The SELECT statement

• Examples:
SELECT 3, 10, 5 + 7, 'Test';
SELECT * FROM students;
SELECT first_name, last_name FROM students;
SELECT first_name, last_name FROM students WHERE
`group` = '441F';
SELECT COUNT(*) FROM students;
SELECT NOW();
SELECT *, DATE(NOW()) FROM students;
SELECT DISTINCT `group` FROM students;
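Several of these queries can be run against a small table to see the shape of their results. The sketch assumes SQLite via Python's sqlite3 module (double quotes instead of backticks around group) and a reduced students table; the third row is made-up sample data added so that DISTINCT has something to collapse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students(first_name TEXT, last_name TEXT, "group" TEXT);
    INSERT INTO students VALUES
        ('Mike', 'Wazowski', '441F'),
        ('James', 'Sullivan', '441F'),
        ('Randall', 'Boggs', '442F');
""")

# A SELECT without FROM simply evaluates its constant expressions.
constants = conn.execute("SELECT 3, 10, 5 + 7, 'Test'").fetchone()

# Column list plus WHERE condition (ORDER BY added to make the result deterministic).
names_441f = conn.execute(
    """SELECT first_name, last_name FROM students
       WHERE "group" = '441F' ORDER BY first_name"""
).fetchall()

# Aggregate function over all rows, and DISTINCT removing duplicate values.
total = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
groups = conn.execute(
    'SELECT DISTINCT "group" FROM students ORDER BY "group"'
).fetchall()

print(constants, names_441f, total, groups, sep="\n")
```

Every result, including the single COUNT(*) value, comes back as a table of rows, which matches the statement that SELECT always returns a table.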
The SELECT statement

• The FROM clause specifies the tables (or views) from which the
data is queried. The syntax for the FROM clause is:
FROM table_list
• The WHERE clause allows adding one or more conditions (combined using the logical operators AND and OR) which must be met by the resulting data.
WHERE condition [AND |OR condition] […]
• To control the order of the operations, parentheses () may be used
when necessary. The AND operator takes precedence over the OR
operator.
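The precedence rule changes which rows a query returns. The following sketch assumes SQLite via Python's sqlite3 module and a reduced students table with invented rows (including the faculty code 'AC'), used only to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students(first_name TEXT, faculty TEXT, enrolment_year INT);
    INSERT INTO students VALUES
        ('Mike', 'ETTI', 2015),
        ('James', 'ETTI', 2016),
        ('Randall', 'AC', 2015);
""")

# Parsed as: faculty = 'AC' OR (faculty = 'ETTI' AND enrolment_year = 2016)
without_parentheses = conn.execute("""
    SELECT first_name FROM students
    WHERE faculty = 'AC' OR faculty = 'ETTI' AND enrolment_year = 2016
    ORDER BY first_name
""").fetchall()

# Parentheses force the OR to be evaluated first.
with_parentheses = conn.execute("""
    SELECT first_name FROM students
    WHERE (faculty = 'AC' OR faculty = 'ETTI') AND enrolment_year = 2016
    ORDER BY first_name
""").fetchall()

print(without_parentheses)
print(with_parentheses)
```

The first query keeps Randall because the year condition applies only to the ETTI branch; the second applies it to both faculties and returns James alone.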
The SELECT statement

• The ORDER BY clause allows the ordering of the resulting data
by one or more columns or column aliases. The syntax for the
ORDER BY clause is:
ORDER BY col_name|col_alias [ASC|DESC] [, …]
• The GROUP BY clause is used in conjunction with aggregate
functions, in order to group the results by one or more columns.
Without the GROUP BY clause, the aggregate functions used in a
query will compute the result using all the rows from the resulting
table. The syntax for the GROUP BY clause is:
GROUP BY column_1 [, column_2, …]
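The two clauses can be illustrated with a short sketch, assuming Python's built-in sqlite3 module as a stand-in for a MySQL server and a hypothetical students table with five rows.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (first_name TEXT, `group` TEXT);
    INSERT INTO students VALUES
        ('Ana', '441F'), ('Dan', '442G'), ('Ion', '442G'),
        ('Eva', '441F'), ('Radu', '442G');
""")

# ORDER BY with two keys: group descending, then first name ascending.
ordered = conn.execute(
    "SELECT first_name FROM students ORDER BY `group` DESC, first_name ASC"
).fetchall()

# Without GROUP BY, COUNT(*) is computed over all the rows.
total = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

# With GROUP BY, COUNT(*) is computed once per group; the alias is reused in ORDER BY.
per_group = conn.execute(
    "SELECT `group`, COUNT(*) AS members FROM students "
    "GROUP BY `group` ORDER BY members DESC"
).fetchall()

print(ordered)
print(total, per_group)
```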

• The HAVING clause is similar to the WHERE clause, but it allows
adding one or more conditions for the results of aggregate
functions. The syntax for the HAVING clause is:
HAVING condition [AND | OR condition] […]
• The LIMIT clause allows limiting the result set to a specific number
of rows and paginating the results. By default, a SELECT instruction
will return all the rows in the queried tables for which the
conditions specified by the WHERE and HAVING clauses are met.
The syntax for the LIMIT clause is:
LIMIT [start_index,] number_of_rows
• start_index is the zero-based offset of the first returned row
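A minimal sketch of HAVING and LIMIT working together, assuming Python's built-in sqlite3 module as a stand-in for a MySQL server (SQLite accepts the same LIMIT start_index, number_of_rows form). The product_orders rows and the page size of 2 are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_orders (order_id INTEGER, quantity INTEGER, price INTEGER);
    INSERT INTO product_orders VALUES
        (1, 2, 10), (1, 1, 5), (2, 1, 3), (3, 4, 20), (4, 3, 10);
""")

# HAVING filters on the aggregated total, which WHERE cannot see.
query = (
    "SELECT order_id, SUM(quantity * price) AS total FROM product_orders "
    "GROUP BY order_id HAVING SUM(quantity * price) > 10 ORDER BY total DESC"
)

all_rows = conn.execute(query).fetchall()
page_1 = conn.execute(query + " LIMIT 2").fetchall()     # first two rows
page_2 = conn.execute(query + " LIMIT 2, 2").fetchall()  # skip two rows, return up to two

print(all_rows)
print(page_1, page_2)
```

Order 2 has a total of 3 and is dropped by HAVING; the remaining three orders are split across two pages.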

• The full syntax of the SELECT statement, including the clauses
above, is:
SELECT [DISTINCT] {* | column_list |
function_list | constant_list}
[FROM table_list]
[WHERE condition [AND | OR condition] […]]
[GROUP BY column_1 [, column_2, …]]
[HAVING condition [AND | OR condition] […]]
[ORDER BY col_name|col_alias [ASC|DESC] [, …]]
[LIMIT [start_index,] number_of_rows];
Aggregate functions

• Aggregate functions are predefined SQL functions which return a
single value computed from multiple rows
• In SQL2 (SQL-92), the following aggregate functions are defined:

Function  Result
COUNT     Returns the number of rows in the resulting table
SUM       Returns the sum of the values in the column passed as the argument
MIN       Returns the minimum value in the column passed as the argument
MAX       Returns the maximum value in the column passed as the argument
AVG       Returns the average of the values in the column passed as the argument

• Using aggregate functions without grouping clauses (GROUP BY):
SELECT COUNT(*) FROM table_name;
Returns the number of rows in the table
SELECT COUNT(column_name) FROM table_name;
Returns the number of non-NULL values in the column
SELECT SUM(column_name), MIN(column_name),
MAX(column_name), AVG(column_name) FROM table_name;
Returns the sum, minimum value, maximum value and average value of the
values from the specified column

• Using aggregate functions with grouping clauses will generate a
result for each distinct value (or combination of values) of the
column or columns in the GROUP BY clause
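The difference between COUNT(*) and COUNT(column_name), and between ungrouped and grouped aggregates, can be seen in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server and a hypothetical grades table containing one NULL.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grades (student_id INTEGER, grade INTEGER);
    INSERT INTO grades VALUES (1, 8), (1, 10), (2, 6), (2, NULL);
""")

# Without GROUP BY: one result row computed over the whole table.
# COUNT(*) counts rows; COUNT(grade) and the other aggregates skip NULL values.
overall = conn.execute(
    "SELECT COUNT(*), COUNT(grade), SUM(grade), MIN(grade), MAX(grade), AVG(grade) "
    "FROM grades"
).fetchone()

# With GROUP BY: one result row for each distinct student_id.
per_student = conn.execute(
    "SELECT student_id, COUNT(grade), AVG(grade) FROM grades "
    "GROUP BY student_id ORDER BY student_id"
).fetchall()

print(overall)
print(per_student)
```

The table has 4 rows but only 3 non-NULL grades, so the average is 24 / 3 and not 24 / 4.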
Scalar functions

• Scalar functions are predefined SQL functions which receive zero
or more arguments and return a single value, computed from the
arguments when there are any
• Numeric functions:
ABS() COS() LOG10() RAND()
ACOS() COT() LOG2() ROUND()
ASIN() DEGREES() MOD() SIGN()
ATAN() EXP() PI() SIN()
CEIL() FLOOR() POW() SQRT()
CONV() LN() RADIANS() TAN()

• Example:
SELECT CEIL(COUNT(*)/10) FROM products;

• String functions:
ASCII() INSTR() OCT() RIGHT()
CHAR() LEFT() QUOTE() RPAD()
CONCAT() LENGTH() REGEXP_LIKE() RTRIM()
CONCAT_WS() LOCATE() REGEXP_REPLACE() SPACE()
FORMAT() LOWER() REGEXP_SUBSTR() SUBSTR()
FROM_BASE64() LPAD() REPEAT() TO_BASE64()
HEX() LTRIM() REPLACE() TRIM()
INSERT() MID() REVERSE() UPPER()

• Examples:
SELECT CONCAT_WS(' ', first_name, last_name) FROM students;
SELECT SUBSTR(filename, -3) FROM files;
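A few of the numeric and string functions listed above can be tried without any table, as in the sketch below. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server, so it only uses functions that exist in both systems; the literal arguments are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")

# Scalar functions return one value per call and need no FROM clause.
row = conn.execute(
    "SELECT ABS(-7), ROUND(3.14159, 2), UPPER('test'), "
    "LENGTH('database'), SUBSTR('report.pdf', -3)"
).fetchone()

print(row)
```

A negative start position in SUBSTR counts from the end of the string, which is why the last column holds the file extension.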

• Date and time functions:
CONVERT_TZ() DAY() MAKETIME() TIMESTAMP()
CURRENT_DATE() DAYNAME() MINUTE() TO_DAYS()
CURRENT_TIME() DAYOFWEEK() MONTH() TO_SECONDS()
DATE() DAYOFYEAR() MONTHNAME() UTC_DATE()
DATE_ADD() FROM_DAYS() NOW() UTC_TIME()
DATE_FORMAT() GET_FORMAT() STR_TO_DATE() UTC_TIMESTAMP()
DATE_SUB() HOUR() TIME() WEEK()
DATEDIFF() MAKEDATE() TIME_FORMAT() YEAR()

• Examples:
SELECT DATE_FORMAT(dob, '%d.%m.%Y') FROM students;
SELECT DATE_ADD(NOW(), INTERVAL 3 MONTH);
Aliases

• The results of aggregate or scalar functions will appear in the
resulting table with the function expression as the column name
• For presentation purposes, as well as for reusing the result of the
function in other clauses of the query, these columns can receive aliases
SELECT function | column | constant [AS] alias
[FROM table];
• Examples:
SELECT COUNT(*) AS number_of_students FROM students;
SELECT SUM(quantity * price) total FROM product_orders
GROUP BY order_id;
SELECT DATE_ADD(created_at, INTERVAL 6 MONTH) expires_at
FROM subscriptions;
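How an alias becomes the column name of the result, and how it can be reused in another clause, is sketched below. The example assumes Python's built-in sqlite3 module as a stand-in for a MySQL server; the product_orders rows and the number_of_lines alias are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_orders (order_id INTEGER, quantity INTEGER, price INTEGER);
    INSERT INTO product_orders VALUES (1, 2, 10), (1, 1, 5), (2, 4, 20);
""")

# Alias introduced with AS: it becomes the column name of the result table.
cur = conn.execute("SELECT COUNT(*) AS number_of_lines FROM product_orders")
count_column = cur.description[0][0]
count_value = cur.fetchone()[0]

# Alias without AS, reused in the ORDER BY clause.
cur = conn.execute(
    "SELECT order_id, SUM(quantity * price) total FROM product_orders "
    "GROUP BY order_id ORDER BY total DESC"
)
total_columns = [column[0] for column in cur.description]
totals = cur.fetchall()

print(count_column, count_value)
print(total_columns, totals)
```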
The UPDATE statement

• The UPDATE statement allows updating column values for one or
more rows of the table
UPDATE table_name SET col_1 = expr_1 [, … n]
[WHERE condition];
• Example:
UPDATE students SET `group` = '442G' WHERE `group`
IS NULL;
• If the WHERE clause is omitted, the values for the columns will be
changed for all the records in the table
The DELETE statement

• The DELETE statement allows deleting rows from the table
DELETE FROM table_name [WHERE condition];
• Example:
DELETE FROM students WHERE id = 2;
• If the WHERE clause is omitted, all the rows from the table will be
deleted
• The UPDATE and DELETE operations are irreversible once committed
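Because these statements change data in place, one cautious habit is to run a SELECT with the same WHERE condition first and check how many rows it matches. The sketch below illustrates this, assuming Python's built-in sqlite3 module as a stand-in for a MySQL server and a hypothetical students table.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, first_name TEXT, `group` TEXT);
    INSERT INTO students VALUES (1, 'Ana', NULL), (2, 'Dan', '441F'), (3, 'Ion', NULL);
""")

# Preview: how many rows would the UPDATE touch?
preview = conn.execute(
    "SELECT COUNT(*) FROM students WHERE `group` IS NULL"
).fetchone()[0]

# The same condition, now used to change the data.
updated = conn.execute(
    "UPDATE students SET `group` = '442G' WHERE `group` IS NULL"
).rowcount

# DELETE with a condition removes only the matching rows.
deleted = conn.execute("DELETE FROM students WHERE id = 2").rowcount
remaining = conn.execute("SELECT id, `group` FROM students ORDER BY id").fetchall()

# DELETE without a WHERE clause empties the table.
conn.execute("DELETE FROM students")
left_after_full_delete = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

print(preview, updated, deleted)
print(remaining, left_after_full_delete)
```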
