D426 Study Guide
database systems.
Database roles
People interact with databases in a variety of roles:
A database administrator is responsible for securing the database system
against unauthorized users. A database administrator enforces procedures
for user access and database system availability.
Authorization. Many database users should have limited access to specific
tables, columns, or rows of a database. Database systems authorize
individual users to access specific data.
Rules. Database systems ensure data is consistent with structural and
business rules. Ex: When multiple copies of data are stored in different
locations, copies must be synchronized as data is updated. Ex: When a
course number appears in a student registration record, the course must
exist in the course catalog.
The query processor interprets queries, creates a plan to modify the
database or retrieve data, and returns query results to the application. The
query processor performs query optimization to ensure the most efficient
instructions are executed on the data.
The storage manager translates the query processor instructions into low-
level file-system commands that modify or retrieve data. Database sizes
range from megabytes to many terabytes, so the storage manager uses
indexes to quickly locate data.
The transaction manager ensures transactions are properly executed. The
transaction manager prevents conflicts between concurrent transactions. The
transaction manager also restores the database to a consistent state in the
event of a transaction or system failure.
MongoDB MongoDB NoSQL Open source 5
INSERT inserts rows into a table.
SELECT retrieves data from a table.
UPDATE modifies data in a table.
DELETE deletes rows from a table.
The SQL CREATE TABLE statement creates a new table by specifying the table and
column names. Each column is assigned a data type that indicates the format of
column values. Data types can be numeric, textual, or complex. Ex:
INT stores integer values.
DECIMAL stores fractional numeric values.
VARCHAR stores textual values.
DATE stores year, month, and day.
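A minimal sketch of a CREATE TABLE statement using the four data types above. It runs through Python's built-in sqlite3 module so it can be tried without a database server. The Employee table and its column names are hypothetical, and SQLite treats declared types as loose hints rather than enforcing them the way MySQL does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Each column gets a name and a data type.
conn.execute("""
    CREATE TABLE Employee (
        ID       INT,
        Name     VARCHAR(60),
        Salary   DECIMAL(8,2),
        HireDate DATE
    )
""")

# PRAGMA table_info returns one row per column: (cid, name, type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Employee)")]
print(columns)
```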
Analysis
Logical design
Physical design
SQL sublanguages
The SQL language is divided into five sublanguages:
Data Definition Language (DDL) defines the structure of the database.
Data Query Language (DQL) retrieves data from the database.
Data Manipulation Language (DML) manipulates data stored in a
database.
Data Control Language (DCL) controls database user access.
Data Transaction Language (DTL) manages database transactions.
The DROP TABLE statement deletes a table, along with all the table's rows, from a
database.
ALTER TABLE statement
The ALTER TABLE statement adds, deletes, or modifies columns on an existing
table.
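A short sketch of ALTER TABLE and DROP TABLE, again through Python's sqlite3 module. The Department table is hypothetical. Only the ADD COLUMN form is shown, because that form works on every SQLite version; other databases accept more ALTER TABLE clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (Code INT, Name VARCHAR(20))")

# ALTER TABLE adds a column to the existing table.
conn.execute("ALTER TABLE Department ADD COLUMN ManagerID INT")
column_names = [row[1] for row in conn.execute("PRAGMA table_info(Department)")]

# DROP TABLE removes the table and all of its rows.
conn.execute("DROP TABLE Department")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(column_names, remaining)
```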
A data type is a named set of values from which column values are drawn. In
relational databases, most data types fall into one of the following categories:
Integer data types represent positive and negative integers. Several integer
data types exist, varying by the number of bytes allocated for each value.
Common integer data types include INT, implemented as 4 bytes of storage,
and SMALLINT, implemented as 2 bytes.
Category  Example         Data type       Storage  Notes
Integer   34 and -739448  TINYINT         1 byte   Signed range: -128 to 127. Unsigned range: 0 to 255.
                          SMALLINT        2 bytes  Signed range: -32,768 to 32,767. Unsigned range: 0 to 65,535.
                          MEDIUMINT       3 bytes  Signed range: -8,388,608 to 8,388,607. Unsigned range: 0 to 16,777,215.
                          INTEGER or INT  4 bytes  Signed range: -2,147,483,648 to 2,147,483,647. Unsigned range: 0 to 4,294,967,295.
                          BIGINT          8 bytes  Signed range: -2^63 to 2^63 - 1. Unsigned range: 0 to 2^64 - 1.
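The ranges in the integer table follow directly from the storage size: n bytes give 8n bits, so a signed type spans -2^(8n-1) to 2^(8n-1) - 1 and an unsigned type spans 0 to 2^(8n) - 1. The sketch below recomputes them; the helper name integer_ranges is made up for illustration.

```python
def integer_ranges(num_bytes):
    """Return ((signed_min, signed_max), (unsigned_min, unsigned_max))."""
    bits = 8 * num_bytes
    signed = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    unsigned = (0, 2 ** bits - 1)
    return signed, unsigned

storage_bytes = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

for name, size in storage_bytes.items():
    signed, unsigned = integer_ranges(size)
    print(f"{name}: signed {signed[0]:,} to {signed[1]:,}; unsigned 0 to {unsigned[1]:,}")
```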
Type        Operator    Description                                                             Example  Value
Arithmetic  - (unary)   Reverses the sign of one numeric value                                  -(-2)    2
            - (binary)  Subtracts one numeric value from another                                11 - 5   6
            *           Multiplies two numeric values                                           3 * 5    15
            /           Divides one numeric value by another                                    4 / 2    2
            % (modulo)  Divides one numeric value by another and returns the integer remainder  5 % 2    1
            ^           Raises one numeric value to the power of another                        5^2      25
Comparison  !=          Compares two values for inequality                                      1 != 2   TRUE
            <           Compares two values with <                                              2 < 2    FALSE
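Most of the operator examples above can be checked by evaluating them in a SELECT statement. This sketch uses Python's sqlite3 module as a stand-in database. Two caveats apply: SQLite reports TRUE and FALSE as 1 and 0, and the ^ row is left out because SQLite has no exponent operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One expression per operator row, in the same order as the table.
row = conn.execute(
    "SELECT -(-2), 11 - 5, 3 * 5, 4 / 2, 5 % 2, 1 != 2, 2 < 2"
).fetchone()

print(row)
```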
UPDATE statement
The UPDATE statement modifies existing rows in a table. The UPDATE statement uses
the SET clause to specify the new column values. An optional WHERE clause specifies
which rows are updated. Omitting the WHERE clause results in all rows being updated.
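The effect of the WHERE clause can be seen in a small sketch, run through Python's sqlite3 module. The Employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INT, Name VARCHAR(60), Salary INT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Lisa", 50000), (2, "Sam", 40000), (3, "Maria", 60000)],
)

# With WHERE: only the row with ID 2 is changed.
conn.execute("UPDATE Employee SET Salary = 45000 WHERE ID = 2")
after_where = conn.execute("SELECT Salary FROM Employee ORDER BY ID").fetchall()

# Without WHERE: every row is changed.
conn.execute("UPDATE Employee SET Salary = 0")
after_all = conn.execute("SELECT Salary FROM Employee ORDER BY ID").fetchall()

print(after_where, after_all)
```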
DELETE statement
The DELETE statement deletes existing rows in a table. The FROM keyword is followed
by the table name whose rows are to be deleted. An optional WHERE clause specifies
which rows should be deleted. Omitting the WHERE clause results in all rows in the table
being deleted.
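A matching sketch for DELETE, using the same hypothetical Employee table in Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INT, Name VARCHAR(60))")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, "Lisa"), (2, "Sam"), (3, "Maria")],
)

# With WHERE: only the matching row is deleted.
conn.execute("DELETE FROM Employee WHERE ID = 2")
ids_after_where = [row[0] for row in conn.execute("SELECT ID FROM Employee ORDER BY ID")]

# Without WHERE: all rows are deleted, but the table itself remains.
conn.execute("DELETE FROM Employee")
count_after_all = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

print(ids_after_where, count_after_all)
```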
The TRUNCATE statement deletes all rows from a table. TRUNCATE is nearly identical
to a DELETE statement with no WHERE clause except for minor differences that depend
on the database system.
The MERGE statement selects data from one table, called the source, and inserts the
data to another table, called the target.
Primary keys
A primary key is a column, or group of columns, used to identify a row. The primary
key is usually the table's first column and appears on the left of table diagrams, but the
position is not significant to the database.
A simple primary key consists of a single column. A composite primary key
consists of multiple columns.
Auto-increment columns
An auto-increment column is a numeric column that is assigned an automatically
incrementing value when a new row is inserted.
Database users occasionally make the following errors when inserting primary keys:
Inserting values for auto-increment primary keys.
Omitting values for primary keys that are not auto-increment columns.
MySQL allows insertion of a specific value to an auto-increment column. However,
overriding auto-increment for a primary key is usually a mistake.
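A minimal sketch of an auto-increment primary key, using Python's sqlite3 module. SQLite spells the keyword AUTOINCREMENT and requires the column type INTEGER, whereas MySQL uses AUTO_INCREMENT. The Employee table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        ID   INTEGER PRIMARY KEY AUTOINCREMENT,
        Name VARCHAR(60)
    )
""")

# The ID column is omitted, so the database assigns the next value.
conn.execute("INSERT INTO Employee (Name) VALUES ('Lisa')")
conn.execute("INSERT INTO Employee (Name) VALUES ('Sam')")

rows = conn.execute("SELECT ID, Name FROM Employee ORDER BY ID").fetchall()
print(rows)
```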
A foreign key is a column, or group of columns, that refers to a primary key. The data
types of the foreign and primary keys must be the same, but the names may be
different.
FOREIGN KEY constraint
A foreign key constraint is added to a CREATE TABLE statement with the FOREIGN KEY
and REFERENCES keywords. When a foreign key constraint is specified, the database
rejects insert, update, and delete statements that violate referential integrity.
Referential integrity actions
An insert, update, or delete that violates referential integrity can be corrected manually.
However, manual corrections are time-consuming and error-prone. Instead, databases
automatically correct referential integrity violations with any of four actions, specified as
SQL constraints:
RESTRICT rejects an insert, update, or delete that violates referential integrity.
SET NULL sets invalid foreign keys to NULL.
SET DEFAULT sets invalid foreign keys to the foreign key default value.
CASCADE propagates primary key changes to foreign keys.
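The sketch below shows a FOREIGN KEY constraint with two of the actions above, using Python's sqlite3 module. It assumes SQLite behavior: foreign key checks are off by default and must be enabled with PRAGMA foreign_keys = ON. The Department and Employee tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.execute("CREATE TABLE Department (Code INTEGER PRIMARY KEY, Name VARCHAR(20))")
conn.execute("""
    CREATE TABLE Employee (
        ID       INTEGER PRIMARY KEY,
        Name     VARCHAR(60),
        DeptCode INT,
        FOREIGN KEY (DeptCode) REFERENCES Department (Code)
            ON DELETE SET NULL
            ON UPDATE CASCADE
    )
""")
conn.execute("INSERT INTO Department VALUES (10, 'Sales')")
conn.execute("INSERT INTO Employee VALUES (1, 'Lisa', 10)")

# Department 99 does not exist, so this insert violates referential integrity.
try:
    conn.execute("INSERT INTO Employee VALUES (2, 'Sam', 99)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# CASCADE: changing the primary key also changes the matching foreign key.
conn.execute("UPDATE Department SET Code = 20 WHERE Code = 10")
after_update = conn.execute("SELECT DeptCode FROM Employee WHERE ID = 1").fetchone()[0]

# SET NULL: deleting the department sets the foreign key to NULL.
conn.execute("DELETE FROM Department WHERE Code = 20")
after_delete = conn.execute("SELECT DeptCode FROM Employee WHERE ID = 1").fetchone()[0]

print(insert_rejected, after_update, after_delete)
```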
Column and table constraints
A constraint is a rule that governs allowable values in a database. Constraints are
based on relational and business rules, and implemented with special keywords in a
CREATE TABLE statement. The database automatically rejects insert, update, and
delete statements that violate a constraint.
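As an illustration, the sketch below declares NOT NULL, UNIQUE, and CHECK constraints in a CREATE TABLE statement and then attempts three inserts that each break one rule. It uses Python's sqlite3 module; the Employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        ID     INT PRIMARY KEY,
        Name   VARCHAR(60) NOT NULL,
        Email  VARCHAR(60) UNIQUE,
        Salary INT CHECK (Salary >= 0)
    )
""")
conn.execute("INSERT INTO Employee VALUES (1, 'Lisa', 'lisa@example.com', 50000)")

violations = [
    "INSERT INTO Employee VALUES (2, NULL, 'sam@example.com', 40000)",     # NOT NULL
    "INSERT INTO Employee VALUES (3, 'Maria', 'lisa@example.com', 60000)", # UNIQUE
    "INSERT INTO Employee VALUES (4, 'Jiho', 'jiho@example.com', -1)",     # CHECK
]

rejected = []
for statement in violations:
    try:
        conn.execute(statement)
        rejected.append(False)
    except sqlite3.IntegrityError:
        rejected.append(True)

row_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(rejected, row_count)
```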
Adding and dropping constraints
Constraints are added and dropped with the ALTER TABLE TableName statement, followed by
an ADD, DROP, or CHANGE clause.
BETWEEN operator
The BETWEEN operator provides an alternative way to determine if a value is
between two other values. The operator is written value BETWEEN minValue AND
maxValue and is equivalent to value >= minValue AND value <= maxValue.
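The equivalence can be checked with a small sketch in Python's sqlite3 module. The Employee table and its salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name VARCHAR(60), Salary INT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Lisa", 50000), ("Sam", 40000), ("Maria", 60000), ("Jiho", 70000)],
)

# BETWEEN includes both end values.
between_rows = conn.execute(
    "SELECT Name FROM Employee WHERE Salary BETWEEN 50000 AND 60000 ORDER BY Name"
).fetchall()

# The same condition written with >= and <=.
range_rows = conn.execute(
    "SELECT Name FROM Employee WHERE Salary >= 50000 AND Salary <= 60000 ORDER BY Name"
).fetchall()

print(between_rows, range_rows)
```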
LIKE operator
The LIKE operator, when used in a WHERE clause, matches text against a pattern
using the two wildcard characters % and _.
% matches any number of characters. Ex: LIKE 'L%t' matches "Lt", "Lot",
"Lift", and "Lol cat".
_ matches exactly one character. Ex: LIKE 'L_t' matches "Lot" and "Lit" but
not "Lt" and "Loot".
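The two wildcard examples can be reproduced with Python's sqlite3 module. The Word table and Term column are hypothetical. Note that SQLite's LIKE ignores letter case for ASCII text by default, which does not affect these examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Word (Term VARCHAR(20))")
conn.executemany(
    "INSERT INTO Word VALUES (?)",
    [("Lt",), ("Lot",), ("Lit",), ("Lift",), ("Loot",), ("Lol cat",)],
)

# % matches any number of characters, including none.
percent_matches = [
    row[0] for row in conn.execute("SELECT Term FROM Word WHERE Term LIKE 'L%t'")
]

# _ matches exactly one character.
underscore_matches = [
    row[0] for row in conn.execute("SELECT Term FROM Word WHERE Term LIKE 'L_t'")
]

print(sorted(percent_matches), sorted(underscore_matches))
```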
The ORDER BY clause orders selected rows by one or more columns in
ascending (alphabetic or increasing) order. The DESC keyword with the
ORDER BY clause orders rows in descending order.
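A brief sketch of both orderings, using Python's sqlite3 module and a hypothetical Employee table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name VARCHAR(60), Salary INT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Lisa", 50000), ("Sam", 40000), ("Maria", 60000)],
)

# Ascending order is the default.
ascending = [row[0] for row in conn.execute("SELECT Name FROM Employee ORDER BY Salary")]

# DESC reverses the order.
descending = [row[0] for row in conn.execute("SELECT Name FROM Employee ORDER BY Salary DESC")]

print(ascending, descending)
```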
Function   Description                               Example
ABS(n)     Returns the absolute value of n           SELECT ABS(-5); returns 5
LOWER(s)   Returns the lowercase s                   SELECT LOWER('MySQL'); returns 'mysql'
HOUR(t)    Returns the hour, minute, or second from  SELECT HOUR('22:11:45'); returns 22
MINUTE(t)  time t                                    SELECT MINUTE('22:11:45'); returns 11
SECOND(t)                                            SELECT SECOND('22:11:45'); returns 45
Aggregate functions
An aggregate function processes values from a set of rows and returns a
summary value. Common aggregate functions are:
COUNT() counts the number of rows in the set.
MIN() finds the minimum value in the set.
MAX() finds the maximum value in the set.
SUM() sums all the values in the set.
AVG() computes the arithmetic mean of all the values in the set.
Aggregate functions appear in a SELECT clause and process all rows that satisfy the
WHERE clause condition. If a SELECT statement has no WHERE clause, the
aggregate function processes all rows.
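The sketch below runs the five aggregate functions with and without a WHERE clause, using Python's sqlite3 module. The Employee table and salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name VARCHAR(60), Salary INT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Lisa", 50000), ("Sam", 40000), ("Maria", 60000)],
)

# No WHERE clause: the aggregates process all rows.
all_rows = conn.execute(
    "SELECT COUNT(*), MIN(Salary), MAX(Salary), SUM(Salary), AVG(Salary) FROM Employee"
).fetchone()

# With WHERE: only rows that satisfy the condition are processed.
filtered = conn.execute(
    "SELECT COUNT(*), AVG(Salary) FROM Employee WHERE Salary > 45000"
).fetchone()

print(all_rows, filtered)
```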
HAVING clause
The HAVING clause is used with the GROUP BY clause to filter group results. The
optional HAVING clause follows the GROUP BY clause and precedes the optional
ORDER BY clause.
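A sketch of the clause order GROUP BY, HAVING, ORDER BY, using Python's sqlite3 module. The Employee table, department names, and salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name VARCHAR(60), Dept VARCHAR(20), Salary INT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Lisa", "Sales", 50000),
        ("Sam", "Sales", 40000),
        ("Maria", "Engineering", 60000),
        ("Jiho", "Engineering", 80000),
        ("Ana", "Support", 30000),
    ],
)

# HAVING filters whole groups after the aggregate is computed.
groups = conn.execute("""
    SELECT Dept, AVG(Salary)
    FROM Employee
    GROUP BY Dept
    HAVING AVG(Salary) > 40000
    ORDER BY Dept
""").fetchall()

print(groups)
```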
A join is a SELECT statement that combines data from two tables, known as the
left table and right table, into a single result. The tables are combined by
comparing columns from the left and right tables, usually with the = operator. The
columns must have comparable data types.
A column name can be replaced with an alias. The alias follows the column name,
separated by an optional AS keyword.
A join clause determines how a join query handles unmatched rows. Two common
join clauses are:
INNER JOIN selects only matching left and right table rows.
FULL JOIN selects all left and right table rows, regardless of match.
In a FULL JOIN result table, unmatched left table rows appear with NULL values in
right table columns, and vice versa.
LEFT JOIN selects all left table rows, but only matching right table rows.
RIGHT JOIN selects all right table rows, but only matching left table rows.
An outer join is any join that selects unmatched rows, including left, right, and full
joins.
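The difference between matched and unmatched rows can be seen in a short sketch with Python's sqlite3 module. Only INNER JOIN and LEFT JOIN are shown, since RIGHT JOIN and FULL JOIN are missing from older SQLite versions. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (Code INT, DeptName VARCHAR(20))")
conn.execute("CREATE TABLE Employee (Name VARCHAR(60), DeptCode INT)")
conn.executemany("INSERT INTO Department VALUES (?, ?)", [(10, "Sales"), (30, "Marketing")])
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Lisa", 10), ("Sam", 20), ("Maria", None)],
)

# INNER JOIN: only rows with a match in both tables.
inner_rows = conn.execute("""
    SELECT Name, DeptName
    FROM Employee
    INNER JOIN Department ON DeptCode = Code
    ORDER BY Name
""").fetchall()

# LEFT JOIN: every left table row; unmatched rows get NULL on the right.
left_rows = conn.execute("""
    SELECT Name, DeptName
    FROM Employee
    LEFT JOIN Department ON DeptCode = Code
    ORDER BY Name
""").fetchall()

print(inner_rows, left_rows)
```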
The UNION keyword combines the results of two SELECT statements into one table.
Equijoins
An equijoin compares columns of two tables with the = operator. Most joins are
equijoins. A non-equijoin compares columns with an operator other than =, such
as < and >.
Self-joins
A self-join joins a table to itself.
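A typical use is pairing each employee with a manager stored in the same table. The sketch below uses Python's sqlite3 module; the Employee table and ManagerID column are hypothetical, and table aliases E and M distinguish the two copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INT, Name VARCHAR(60), ManagerID INT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Lisa", None), (2, "Sam", 1), (3, "Maria", 1)],
)

# E is the employee copy of the table, M is the manager copy.
pairs = conn.execute("""
    SELECT E.Name, M.Name
    FROM Employee AS E
    INNER JOIN Employee AS M ON E.ManagerID = M.ID
    ORDER BY E.Name
""").fetchall()

print(pairs)
```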
Cross-joins
A cross-join combines two tables without comparing columns. A cross-join uses a
CROSS JOIN clause without an ON clause. As a result, all possible combinations of
rows from both tables appear in the result.
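With two rows in each table, a cross-join yields 2 x 2 = 4 rows. A sketch with Python's sqlite3 module and hypothetical Size and Color tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Size (SizeName VARCHAR(10))")
conn.execute("CREATE TABLE Color (ColorName VARCHAR(10))")
conn.executemany("INSERT INTO Size VALUES (?)", [("Small",), ("Large",)])
conn.executemany("INSERT INTO Color VALUES (?)", [("Red",), ("Blue",)])

# No ON clause: every Size row is paired with every Color row.
combinations = conn.execute("""
    SELECT SizeName, ColorName
    FROM Size
    CROSS JOIN Color
    ORDER BY SizeName, ColorName
""").fetchall()

print(combinations)
```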
A subquery, sometimes called a nested query or inner query, is a query within
another SQL query.
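For example, a subquery can compute a value that the outer query compares against. The sketch below uses Python's sqlite3 module and a hypothetical Employee table to select employees earning more than the average salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name VARCHAR(60), Salary INT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Lisa", 50000), ("Sam", 40000), ("Maria", 60000), ("Jiho", 70000)],
)

# The inner query returns the average (55000); the outer query uses it.
above_average = conn.execute("""
    SELECT Name
    FROM Employee
    WHERE Salary > (SELECT AVG(Salary) FROM Employee)
    ORDER BY Name
""").fetchall()

print(above_average)
```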
An alias is a temporary name assigned to a column or table. The AS keyword
follows a column or table name to create an alias.
In some databases, view data can be stored. A materialized view is a view for
which data is stored at all times. Whenever a base table changes, the corresponding
view tables can also change, so materialized views must be refreshed.
When WITH CHECK OPTION is specified, the database rejects inserts and updates
that do not satisfy the view query WHERE clause. Instead, the database generates
an error message that explains the violation.
An entity-relationship model is a high-level representation of data requirements,
ignoring implementation details.
An entity-relationship model includes three kinds of objects:
An entity is a person, place, product, concept, or activity.
A relationship is a statement about two entities.
An attribute is a descriptive property of an entity.
A relationship is usually a statement about two different entities, but the two
entities may be the same. A reflexive relationship relates an entity to itself.
An entity-relationship diagram, commonly called an ER diagram, is a schematic
picture of entities, relationships, and attributes. Entities are drawn as rectangles.
Types and instances
In entity-relationship modeling, a type is a set:
An entity type is a set of things. Ex: All employees in a company.
A relationship type is a set of related things. Ex: Employee-Manages-
Department is a set of (employee, department) pairs, where the employee
manages the department.
An attribute type is a set of values. Ex: All employee salaries.
Entity, relationship, and attribute types usually become tables, foreign keys, and
columns, respectively.
An instance is an element of a set:
An entity instance is an individual thing. Ex: The employee Sam Snead.
A relationship instance is a statement about entity instances. Ex: "Maria
Rodriguez manages Sales."
An attribute instance is an individual value. Ex: The salary $35,000.
Analysis develops an entity-relationship model, capturing data requirements
while ignoring implementation details.
Logical design converts the entity-relationship model into tables, columns,
and keys for a particular database system.
Physical design adds indexes and specifies how tables are organized on
storage media.
Analysis steps
Step Name
2 Determine cardinality
5 Implement entities
6 Implement relationships
7 Implement attributes
Trivial dependencies
When the columns of A are a subset of the columns of B, A always depends on B.
Ex: FareClass depends on (FlightCode, FareClass). These dependencies are called
trivial.
This redundancy is eliminated with normalization, the last step of logical design.
Normalization eliminates redundancy by decomposing a table into two or more
tables in higher normal form.
Denormalization means intentionally introducing redundancy by merging tables.
Databases commonly support four alternative table structures:
Heap table. In a heap table, no order is imposed on rows. Heap tables
optimize insert operations. Heap tables are particularly fast for bulk
load of many rows, since rows are stored in load order.
Sorted table. In a sorted table, the database designer identifies a sort
column that determines physical row order.
Hash table. In a hash table, rows are assigned to buckets. A bucket is a
block or group of blocks containing rows. The modulo function is a simple
hash function with four steps.
Table cluster. Table clusters, also called multi-tables, interleave rows of
two or more tables in the same storage area.
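The guide does not list the steps of the modulo hash function, so the following is only a rough sketch of the general idea in Python: divide the hash key by the number of buckets and use the remainder as the bucket number. The function name, the bucket count of 4, and the key values are all assumptions for illustration.

```python
def bucket_for(key, bucket_count):
    """Return the bucket number for an integer hash key (remainder of division)."""
    return key % bucket_count

bucket_count = 4
buckets = {number: [] for number in range(bucket_count)}

# Assign each hypothetical employee ID to a bucket.
for employee_id in [2538, 5384, 6381, 8820, 1047]:
    buckets[bucket_for(employee_id, bucket_count)].append(employee_id)

print(buckets)
```

Looking up a key repeats the same calculation, so only one bucket has to be read.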
A table scan is a database operation that reads table blocks directly, without
accessing an index.
An index scan is a database operation that reads index blocks sequentially,
in order to locate the needed table blocks.
Hit ratio, also called filter factor or selectivity, is the percentage of table rows
selected by a query. When a SELECT query is executed
In a binary search, the database repeatedly splits the index in two until it finds the
entry containing the search value:
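One way to sketch this search in Python, assuming the index is a sorted list of (value, block number) pairs. The function name, the pair layout, and the sample entries are assumptions, not details from the guide.

```python
def find_block(index_entries, search_value):
    """Binary search over sorted (value, block) pairs; returns the block or None."""
    low, high = 0, len(index_entries) - 1
    while low <= high:
        middle = (low + high) // 2          # split the remaining range in two
        value, block = index_entries[middle]
        if value == search_value:
            return block
        if value < search_value:
            low = middle + 1                # keep the upper half
        else:
            high = middle - 1               # keep the lower half
    return None

index_entries = [(1047, 3), (2538, 0), (5384, 1), (6381, 2), (8820, 1)]
print(find_block(index_entries, 6381), find_block(index_entries, 9999))
```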
Indexes may also be dense or sparse:
A dense index contains an entry for every table row.
A sparse index contains an entry for every table block.
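The size difference can be sketched in Python. The example assumes a table sorted on the indexed column and stored in three blocks of three rows; the key values are hypothetical, and the sparse index is assumed to hold the first key of each block.

```python
# Each inner list is one table block holding sorted key values.
table_blocks = [
    [101, 105, 109],
    [112, 118, 120],
    [133, 140, 147],
]

# Dense index: one (key, block number) entry per table row.
dense_index = [
    (key, block_number)
    for block_number, block in enumerate(table_blocks)
    for key in block
]

# Sparse index: one entry per table block (here, the block's first key).
sparse_index = [(block[0], block_number) for block_number, block in enumerate(table_blocks)]

print(len(dense_index), len(sparse_index))
```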
Hash index
Bitmap index
Logical index
Function index
In a hash index, index entries are assigned to buckets. A bitmap index is a grid of
bits:
Bitmap indexes contain ones and zeros.
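A rough Python sketch of such a grid, assuming one grid row per table row and one grid column per distinct value of the indexed column. The department values are hypothetical.

```python
# Indexed column values, one per table row.
department_column = ["Sales", "Engineering", "Sales", "Support"]

# One grid column per distinct value, in sorted order.
distinct_values = sorted(set(department_column))

# A bit is 1 when the table row contains that value, else 0.
bitmap = [
    [1 if value == candidate else 0 for candidate in distinct_values]
    for value in department_column
]

# Rows containing 'Sales' are found by scanning a single grid column.
sales_column = distinct_values.index("Sales")
sales_rows = [row_number for row_number, bits in enumerate(bitmap) if bits[sales_column] == 1]

print(distinct_values, bitmap, sales_rows)
```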
A tablespace is a database object that maps one or more tables to a single file.
Tablespaces are created with the CREATE TABLESPACE statement.
Logical design specifies tables, columns, and keys. Physical design specifies
indexes, table structures, and partitions. Physical design affects query performance but
never affects query results. A storage engine or storage manager translates
instructions generated by a query processor into low-level commands that
access data on storage media.