CITS1402 Mid Sem Notes
Relation = table; instance / tuple (formal) / entity = row; attribute = column. An entity set is a collection of similar objects. A table (relation) stores information
about a particular entity set. A table has a name, a header row naming the columns, and zero or more rows. A k-tuple is a mathematical object with k values in a
specific order. Attributes define what type of data is stored. The number of columns is the arity. Two rows are equal if they have the same value in every column.
A relational table has no duplicate rows, but an SQL table can. The structure of a table is largely unchanging.
A DBMS provides efficient, reliable, convenient and safe multi-user storage of and access to massive amounts of persistent data. A table is in first normal form if
each attribute is single-valued.
Benefits of a relational database. Data independence: users don't need to know where or how data is stored. Efficiency: the RDBMS devises a query execution plan
to run each query efficiently. Data integrity: keeps the database in a consistent state by enforcing integrity constraints. Data administration: a multi-user DBMS
gives an organisation fine control over who can access / view different levels of data. Concurrency control: need to stop two people from updating the same data at
the same time. Application development: all big websites are backed by a database.
An anomaly is an error or inconsistency in the database. Insertion anomaly: inserting a row may require irrelevant information. Deletion anomaly: deleting a row risks
losing information. Update anomaly: updating a row risks leaving the database inconsistent (the change would have to be made in every row that stores the value).
Normalisation is splitting information into multiple tables. Redundancy is when the same data is stored in multiple tables. An attribute of one table that refers to an
attribute of a different table is called a foreign key. An attribute is called a key if it is impossible for different rows to have the same value of this attribute. You
cannot tell which attributes are keys by examining the rows that are currently in the table, because a key is a rule about every possible row, not just the current ones.
The aim of data independence and efficiency is that the user only needs to know the logical structure of the data. The relational model is expressive enough for most
data, but constrained enough for effective query optimisation. Internally consistent: the database should never contain incompatible / contradictory data and
preferably not incomplete data. Constraints are specified when the table is defined. Domain constraint: add a CHECK to the column definition to indicate the domain.
Key constraint: specify the PRIMARY KEY. Referential integrity: specify when a column in one table references a column in another table, to ensure all cross-table
references are correct. The basic question in normalisation is whether or not it is worthwhile to split one table into multiple tables.
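The three kinds of constraint can be tried out with a minimal sketch using Python's built-in sqlite3 module. The Unit and Enrolment tables and the helper rejected() are made-up names for illustration, not part of the unit material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.execute("""
    CREATE TABLE Unit (
        code   TEXT PRIMARY KEY,                             -- key constraint
        points INTEGER NOT NULL CHECK (points IN (6, 12))    -- domain constraint
    )""")
conn.execute("""
    CREATE TABLE Enrolment (
        student INTEGER NOT NULL,
        unit    TEXT NOT NULL REFERENCES Unit(code)          -- referential integrity
    )""")
conn.execute("INSERT INTO Unit VALUES ('CITS1402', 6)")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

domain_violation = rejected("INSERT INTO Unit VALUES ('CITS9999', 7)")       # 7 is outside the domain
key_violation = rejected("INSERT INTO Unit VALUES ('CITS1402', 12)")         # duplicate primary key
reference_violation = rejected("INSERT INTO Enrolment VALUES (1, 'NOPE0000')")  # no such unit
```

All three inserts are refused, so the table contents stay consistent without any checking in application code.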
DDL defines the structure of the database (tables, attributes, types). DML manipulates data in the database, including CRUD operations. CREATE TABLE creates a table,
DROP TABLE deletes a table, ALTER TABLE lets you rename a table or column, add a column or drop a column. NOT NULL makes sure a column always has a value (can't be NULL).
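Storage classes: NULL, INTEGER, REAL, TEXT, BLOB. Booleans are stored as integers: 1 for TRUE and 0 for FALSE. If you use a number or text in a Boolean context: any
non-zero number (not just 1) is true; the value 0 is false; text is converted to a number first, so text that does not look like a number (e.g. 'abc') is false.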
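A quick way to check these truthiness rules is to put each value in a CASE WHEN test. This is a small sketch using Python's sqlite3 module; the helper truthy() is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def truthy(expr):
    """Evaluate expr in a Boolean context and report which CASE branch SQLite takes."""
    sql = f"SELECT CASE WHEN {expr} THEN 'true' ELSE 'false' END"
    return conn.execute(sql).fetchone()[0]

# Numbers: anything non-zero is true. Non-numeric text converts to 0, so it is false.
results = {expr: truthy(expr) for expr in ["1", "0", "-7", "2.5", "'abc'"]}

# Comparisons themselves evaluate to the integers 1 and 0.
comparison_values = conn.execute("SELECT 1 = 1, 1 = 2").fetchone()
```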
DDL: CREATE, ALTER, RENAME, DROP. DDL affects structure. DML: SELECT, INSERT, UPDATE, DELETE. DML affects data
If both arguments of an arithmetic operator are integers, the result is also an integer (so 7 / 2 is 3). % is the remainder operator. Strings are compared
alphabetically. IS NULL and IS NOT NULL are used to find NULL values. BETWEEN is a ternary operator and is inclusive of both end values.
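The operator rules above can be checked in one query. A small sketch with Python's sqlite3 module, using literal values only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT 7 / 2,                 -- both integers: integer division
           7 / 2.0,               -- one real argument: real division
           7 % 2,                 -- remainder
           'apple' < 'banana',    -- alphabetical comparison
           5 BETWEEN 1 AND 5,     -- inclusive of the end values
           NULL IS NULL,          -- the correct test for NULL
           NULL = NULL            -- comparing with = gives NULL, not true
""").fetchone()
```

The last column comes back as Python's None, which is why = cannot be used to find NULL values.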
A relation is a set of tuples. A legal expression in relational algebra is built from variables representing relations, set-theoretic operators and relational
operators. Every legal expression in relational algebra determines a relation. The projection operator (pi) corresponds to SELECT (choosing columns). The selection
operator (sigma) corresponds to WHERE (choosing rows). Duplicate rows are removed by the relational operators. In conditions, the upwards-pointing symbol ∧ is AND and
the downwards-pointing symbol ∨ is OR.
A bare column is a column in the SELECT list of an aggregate query that is neither a GROUP BY column nor inside a summary function. NATURAL JOIN (like USING) removes
the duplicate copy of each repeated column.
Steps in designing a database. Requirements analysis: what the database is used for. Conceptual design: identify entities and relationships and construct the ERD.
Logical design: devise a schema based on the ERD. Schema refinement (normalisation). Physical design. Application and security design. A bridge table is the design
pattern used to implement many-to-many relationships.
An ERD is a visual way of representing: the entities represented in the database; the attributes of those entities; the relationships between those entities; certain
constraints on all of the above. An entity set is a collection of objects about which the database must keep information. A composite key is a key involving a
combination of attributes. Use PRIMARY KEY (column1, column2, column3, …) notation for a composite key. A double oval in an ERD is a multi-valued attribute, which can
take on multiple values. It cannot be directly implemented as a column, but it can be given its own table. A subquery used as a table (in the FROM clause) is called a
derived table.
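As a sketch of both ideas, a multi-valued attribute (here a student's phone numbers) can get its own table with a composite primary key. The StudentPhone table and its columns are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StudentPhone (
        sid   INTEGER,
        phone TEXT,
        PRIMARY KEY (sid, phone)   -- composite key: the pair must be unique
    )""")

# The same sid may repeat and the same phone may repeat, but not the same pair.
conn.executemany(
    "INSERT INTO StudentPhone VALUES (?, ?)",
    [(1, "0400 000 001"), (1, "0400 000 002"), (2, "0400 000 001")],
)

try:
    conn.execute("INSERT INTO StudentPhone VALUES (1, '0400 000 001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

phones_of_student_1 = [
    phone for (phone,) in
    conn.execute("SELECT phone FROM StudentPhone WHERE sid = 1 ORDER BY phone")
]
```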
Union compatible means the same number of columns (formally, with matching domains as well). Natural join: every column of R and S with the same name must match (no
condition is needed on the bowtie for this).
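The difference between a natural join and an ordinary join on the same column shows up in the output columns. A minimal sketch with Python's sqlite3 module; tables R and S and their contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE S (id INTEGER, mark INTEGER)")
conn.execute("INSERT INTO R VALUES (1, 'Ann'), (2, 'Bob')")
conn.execute("INSERT INTO S VALUES (1, 70), (3, 55)")

def columns_and_rows(sql):
    """Run a query and return its column names together with its rows."""
    cursor = conn.execute(sql)
    return [d[0] for d in cursor.description], cursor.fetchall()

# NATURAL JOIN matches on the shared column name and keeps one copy of it.
natural_cols, natural_rows = columns_and_rows("SELECT * FROM R NATURAL JOIN S")

# An explicit ON condition keeps both copies of id.
inner_cols, inner_rows = columns_and_rows("SELECT * FROM R JOIN S ON R.id = S.id")
```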
Redundancy is storing the same piece of information in multiple places. A decomposition of a relation R is a collection of two or more relations where the columns of
each relation are a subset of the columns of R and every column of R is found in at least one of the relations. Decomposition and normalisation relate to the
structure of a table, not its contents. For the decomposition to be usable, R1 natural join R2 must equal R.
If there is a NULL in an arithmetic expression, the result is NULL. If you use NOT IN and the subquery contains a NULL, then for an element that doesn't appear in the
subquery the test is NULL (it is unknown whether it is in or not), so the output is empty. I.e. 3 NOT IN (1,2,NULL) is NULL, but 2 NOT IN (1,2,NULL) is false because
2 IN (1,2,NULL) is true. SQLite before version 3.39 supports only LEFT JOIN (no RIGHT / FULL).
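The NULL behaviour is easy to confirm directly. A small sketch with Python's sqlite3 module, where NULL comes back as None; the table T is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

values = conn.execute("""
    SELECT 1 + NULL,                 -- arithmetic with NULL is NULL
           3 NOT IN (1, 2, NULL),    -- unknown, so NULL
           2 NOT IN (1, 2, NULL),    -- definitely in the list, so false
           2 IN (1, 2, NULL)         -- true
""").fetchone()

# The same effect with a subquery: the WHERE test is NULL, so no row is returned.
conn.execute("CREATE TABLE T (x INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (2,), (None,)])
rows = conn.execute("SELECT 3 WHERE 3 NOT IN (SELECT x FROM T)").fetchall()
```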
When values in one table refer to tuples in another, referential integrity means that these references should always refer to a legitimate tuple in the second table.
Put brackets around the column name in a foreign key declaration, e.g. FOREIGN KEY (column) REFERENCES table(column). When updating / deleting, forbidding is the
default: operations causing data integrity violations are not executed. Cascading fixes up a referential integrity violation by doing the same thing to the row in the
child table as happened to the matching row in the parent table. Set to null allows the operation to occur, leaving the database in a controllably inconsistent state.
The options are RESTRICT / NO ACTION (forbid), CASCADE and SET NULL. There is no need to add a RESTRICT constraint, as forbidding already happens by default.
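The three behaviours can be compared side by side. A minimal sketch with Python's sqlite3 module; the Parent and Child* tables are hypothetical, and note that SQLite needs PRAGMA foreign_keys = ON before it enforces any of this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE ChildCascade (pid INTEGER REFERENCES Parent(id) ON DELETE CASCADE)")
conn.execute("CREATE TABLE ChildSetNull (pid INTEGER REFERENCES Parent(id) ON DELETE SET NULL)")
conn.execute("CREATE TABLE ChildDefault (pid INTEGER REFERENCES Parent(id))")  # default: forbid

conn.executemany("INSERT INTO Parent VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO ChildCascade VALUES (1)")
conn.execute("INSERT INTO ChildSetNull VALUES (1)")
conn.execute("INSERT INTO ChildDefault VALUES (2)")

# Deleting parent 1 cascades into one child table and nulls the reference in the other.
conn.execute("DELETE FROM Parent WHERE id = 1")
cascade_rows = conn.execute("SELECT pid FROM ChildCascade").fetchall()
set_null_rows = conn.execute("SELECT pid FROM ChildSetNull").fetchall()

# Deleting parent 2 is forbidden because ChildDefault still refers to it.
try:
    conn.execute("DELETE FROM Parent WHERE id = 2")
    delete_forbidden = False
except sqlite3.IntegrityError:
    delete_forbidden = True
```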
How relationships are implemented depends on the business logic constraining the relationship; this is called a cardinality constraint. Cardinality constraints
restrict how many times an element can occur in a relationship. They are concerned with zero, one, many. A many-to-many relationship must be implemented as a table
in the database. One-and-only-one can be done with a foreign key to the one-and-only-one table (usually not a separate table).
UPDATE table SET column = value WHERE condition. If there is no WHERE, every row is updated. SET can alter many columns, separated by commas. DELETE FROM table WHERE
condition.
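A short sketch of these statements with Python's sqlite3 module; the Staff table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("Ann", "IT", 100), ("Bob", "IT", 200), ("Cat", "HR", 300)],
)

def snapshot():
    """Return the whole table in a stable order."""
    return conn.execute("SELECT name, dept, salary FROM Staff ORDER BY name").fetchall()

# Two columns changed in one SET, only for the rows matching WHERE.
conn.execute("UPDATE Staff SET salary = salary + 10, dept = 'Tech' WHERE dept = 'IT'")
after_update = snapshot()

conn.execute("DELETE FROM Staff WHERE name = 'Cat'")
after_delete = snapshot()

# No WHERE clause: every remaining row is updated.
conn.execute("UPDATE Staff SET salary = 0")
after_update_all = snapshot()
```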
A view is a virtual table defined using a stored query: CREATE VIEW name AS SELECT … SQLite reruns the query that defines the view whenever it is used in a query and
uses the newly computed values. The structure of a database is determined by the requirements summarised in the ERD, schema normalisation and redundancy removal,
security and access control, and cost and efficiency, which can mean the database is not always easy to use. With views, the DBA can define customised tables for
different users so that each user has a table with exactly the required information, the database is still optimised and easy to maintain, and the DBA has
fine-grained control over access to sensitive data. The underlying implementation can be changed: by changing the view definition to match, the end user doesn't need
to change queries.
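A minimal sketch of a view hiding a sensitive column, using Python's sqlite3 module. The Staff table and StaffNames view are invented names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (name TEXT, salary INTEGER)")
conn.execute("INSERT INTO Staff VALUES ('Ann', 100)")

# The view exposes names only; salary stays hidden from users of the view.
conn.execute("CREATE VIEW StaffNames AS SELECT name FROM Staff")
before = conn.execute("SELECT * FROM StaffNames ORDER BY name").fetchall()

# The view stores the query, not the rows, so new base-table rows appear automatically.
conn.execute("INSERT INTO Staff VALUES ('Bob', 200)")
after = conn.execute("SELECT * FROM StaffNames ORDER BY name").fetchall()
```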
Updatable view: updates to the base table can be performed through the view. Materialised view: a view is materialised if the results of the query it represents are
actually stored in the database. A view is updatable if you can use DELETE, INSERT, UPDATE on it. Since its rows are transient (they only appear as the result of a
query), the system works out how to alter the actual table so that when the view is accessed again it has changed accordingly (this may not always be possible). A
view is updatable only if there is a unique way to do this. A materialised view is used for a view with a complex query over infrequently changing data that is
queried often. The results are stored in an actual database table (caching). After DML on a base table, the result of the view query (with the changed table) and the
stored value of the materialised view may differ. They are out of sync until the materialised view is refreshed. Refreshing is rerunning the query and saving the
output. This can be done automatically, on a schedule, or manually (e.g. REFRESH MATERIALIZED VIEW in PostgreSQL). Materialised view is taken to mean auto-refresh by
default. SQLite doesn't implement updatable or materialised views.
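Since SQLite has no materialised views, the out-of-sync and refresh behaviour can only be imitated with an ordinary table. The sketch below does that by hand; Sales and SalesTotal are hypothetical names, and a real DBMS with materialised views would manage the stored copy itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (amount INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?)", [(10,), (20,)])

# Hand-made "materialised view": the query result is stored as real rows.
conn.execute("CREATE TABLE SalesTotal AS SELECT SUM(amount) AS total FROM Sales")

# DML on the base table leaves the stored copy out of sync.
conn.execute("INSERT INTO Sales VALUES (30)")
stale = conn.execute("SELECT total FROM SalesTotal").fetchone()[0]
live = conn.execute("SELECT SUM(amount) FROM Sales").fetchone()[0]

# Refresh: rerun the query and save the output.
conn.execute("DELETE FROM SalesTotal")
conn.execute("INSERT INTO SalesTotal SELECT SUM(amount) FROM Sales")
fresh = conn.execute("SELECT total FROM SalesTotal").fetchone()[0]
```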
Triggers are SQL statements that are run automatically when data in a nominated table changes. Each trigger is defined on one named table. CREATE TRIGGER name
AFTER/BEFORE/INSTEAD OF DML ON table WHEN condition BEGIN statements; END. INSTEAD OF only works on views in SQLite. Both OLD and NEW are available if the DML is
UPDATE (INSERT has only NEW, DELETE has only OLD). CASE WHEN … THEN … ELSE … END (no commas between the branches, works like if / elif / else in Python). Triggers can
validate actions and abort them if they violate a business rule, maintain auxiliary tables to simulate a materialised view, and implement access control. They add a
flexible layer of security and integrity.
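A sketch of two triggers on one table, using Python's sqlite3 module: one validates and aborts, the other maintains an auxiliary table. Account, AuditLog and the trigger names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE AuditLog (id INTEGER, old_balance INTEGER, new_balance INTEGER)")

# BEFORE trigger: abort any update that breaks the business rule.
conn.execute("""
    CREATE TRIGGER no_overdraft BEFORE UPDATE ON Account
    WHEN NEW.balance < 0
    BEGIN
        SELECT RAISE(ABORT, 'balance cannot be negative');
    END""")

# AFTER trigger: record every successful change using OLD and NEW.
conn.execute("""
    CREATE TRIGGER log_change AFTER UPDATE ON Account
    BEGIN
        INSERT INTO AuditLog VALUES (OLD.id, OLD.balance, NEW.balance);
    END""")

conn.execute("INSERT INTO Account VALUES (1, 100)")
conn.execute("UPDATE Account SET balance = 60 WHERE id = 1")

try:
    conn.execute("UPDATE Account SET balance = -5 WHERE id = 1")
    aborted = False
except sqlite3.DatabaseError:
    aborted = True

balance = conn.execute("SELECT balance FROM Account WHERE id = 1").fetchone()[0]
log = conn.execute("SELECT * FROM AuditLog").fetchall()
```

The rejected update leaves both the balance and the log untouched, because the abort undoes the whole statement.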
A probe is the action of finding (on the disk) and reading a row of the table and testing whether it matches the condition. Time taken is determined by the number of
probes. If a table (name, num) with n rows is sorted alphabetically by name, looking up a name from a num takes n/2 probes on average (linear search), while looking up
a num from a name takes about log2(n) (binary search). An index on a particular column is an auxiliary database structure that speeds up searches on that column from
linear to logarithmic. CREATE INDEX name ON table(column). This creates a B-tree (a balanced tree). The index must be updated whenever the data changes (automatic),
so INSERT and DELETE are slower than with no index. An index is good if the data is large, frequently queried and infrequently changing. DROP INDEX name removes it;
VACUUM then recovers the space the index used. An index is automatically created on the primary key, as it is likely to be used in join statements, foreign key
constraints and subqueries. An index can cover multiple columns (C1, C2): a query using both columns is vastly sped up, a query using only C1 is sped up, a query
using only C2 is not sped up. With a composite index, data is organised by C1 and then, within each C1 value, by C2.
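The probe counts and the effect of an index can be sketched as follows. The Phonebook table and index name are made up, and the exact wording of EXPLAIN QUERY PLAN output differs between SQLite versions, so only the general shape (a scan before, an index search after) should be relied on.

```python
import math
import sqlite3

# Probe estimates for a table with a million rows.
n = 1_000_000
linear_probes = n // 2                    # average for an unsorted (linear) search
binary_probes = math.ceil(math.log2(n))   # sorted or indexed (logarithmic) search

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Phonebook (name TEXT, num INTEGER, suburb TEXT)")

def plan(sql):
    """Return SQLite's description of how it would run the query."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM Phonebook WHERE num = 5550123"
plan_before = plan(query)                               # a full table scan
conn.execute("CREATE INDEX idx_num ON Phonebook(num)")
plan_after = plan(query)                                # a search using idx_num
```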
An optional relationship (one that includes 0, i.e. 0-1 or 0-many) can be implemented as a foreign key because a foreign key can be NULL in SQLite. 1-1 or 1-many
(mandatory participation) is also a foreign key. Entity sets become tables, many-to-many relationships become tables, 1-many relationships usually become foreign
keys, and 1-1 relationships become foreign keys.
A decomposition of R into R1 and R2 is a lossless join if, for every legal instance r of R, r equals the natural join of its projections onto R1 and R2. If we can
find one example where R cannot be recovered from R1 and R2, then it is not a lossless join. A key is an attribute or set of attributes with the property that no two
rows in the table can have the same values for those attributes. A key is a minimal key if no proper subset of its attributes is a key. A decomposition is a lossless
join if and only if R1 ∪ R2 = R (union) and R1 ∩ R2 (intersection) is a key for R1 or R2. There is tension between redundancy and efficiency in an RDBMS. A large
number of small tables is good for reducing redundancy, but multi-table joins are difficult to code, resource intensive to run and don't scale well. The theory of
normalisation defines different types and levels of redundancy.
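The "find one counterexample" test can be sketched in plain Python, modelling a relation as a set of rows. The helpers row(), project() and natural_join() and the example relation over attributes A, B, C are all made up for illustration.

```python
def row(**values):
    """A row is a frozenset of (attribute, value) pairs, so rows can live in a set."""
    return frozenset(values.items())

def project(relation, attrs):
    """Relational projection: keep only attrs; duplicates vanish because it is a set."""
    return {frozenset((a, v) for a, v in r if a in attrs) for r in relation}

def natural_join(r1, r2):
    """Combine every pair of rows that agree on all shared attributes."""
    joined = set()
    for row1 in r1:
        for row2 in r2:
            d1, d2 = dict(row1), dict(row2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

r = {row(A=1, B=1, C=1), row(A=2, B=1, C=2)}

# Split on B: B is not a key of either part, and the join invents extra rows.
lossy = natural_join(project(r, {"A", "B"}), project(r, {"B", "C"}))

# Split on A: A is unique here, and the join gives back exactly r.
recovered = natural_join(project(r, {"A", "B"}), project(r, {"A", "C"}))
```

One instance where the join returns extra rows is enough to show a decomposition is lossy; one instance where it works does not prove losslessness, which needs the key condition above.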
If X and Y are two subsets of the attributes of a relation such that no two tuples can be identical on X but different on Y, then X determines Y (X -> Y, a functional
dependency). A relation is in BCNF if for every functional dependency X -> A, either A is an element of X (trivial) or X is a key. If a relation is in BCNF, there is
no redundancy arising from functional dependencies. If R is not in BCNF, there must be a BCNF violation X -> Y. Assume X and Y are non-empty and disjoint (i.e. Y
contains the further attributes determined by X, not the ones in X itself). R can then be decomposed into R1 = R - Y and R2 = XY. For example, in INLR, if L -> R,
then R1 = INL and R2 = LR. If R1 or R2 is not in BCNF, keep decomposing. Problems: complicated joins, and some FDs from the original relation can no longer be
enforced on the decomposed relations.
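One decomposition step can be sketched in Python using attribute closure to test whether X is a key. The notes only give L -> R for INLR; the extra dependency I -> NLR (making I the key) is an assumption added here so the example is complete, and the function names are made up.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def first_violation(relation, fds):
    """Return the first FD X -> Y that is non-trivial and whose X is not a key."""
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        lhs_is_key = closure(lhs, fds) >= relation
        if nontrivial and not lhs_is_key:
            return lhs, rhs
    return None

def decompose(relation, lhs, rhs):
    """Split R on the violation X -> Y into R1 = R - Y and R2 = XY."""
    y = rhs - lhs
    return relation - y, lhs | y

relation = {"I", "N", "L", "R"}
fds = [({"I"}, {"N", "L", "R"}),   # assumed: I is the key
       ({"L"}, {"R"})]             # from the notes: L -> R

violation = first_violation(relation, fds)
r1, r2 = decompose(relation, *violation)
```

A full algorithm would repeat this on R1 and R2 with the dependencies that apply to each, which this sketch does not do.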
Atomicity: transactions are atomic if the system ensures they can't be half done, i.e. either the whole transaction completes or it fails with no effect. Consistency:
if the database is consistent and a transaction occurs, the result must be consistent. Database consequences (e.g. CASCADE) must be fully executed before a statement
is deemed complete. Isolation: each user of the DB should be able to execute a transaction as if they were the only user. In a multi-user database we can't afford to
run transactions one at a time, so locking protocols are used to prevent interference. Durability: once the user is informed of successful completion, the effects are
persistent. The user is shielded from any problems (e.g. a crash) that occur after being notified that the transaction completed successfully. A database with these
properties is always in a valid state and is called ACID compliant. The DBMS must be robust under normal (multiple concurrent users) and unusual (crashes, connection
failure, disk full error, power outage) conditions. The system may interleave statements from two users. A transaction is a sequence of statements that comprises a
meaningful activity in the user's environment. To ensure atomicity, the DBMS rolls back the database if a transaction fails: the DBMS maintains a log of changes, each
statement is first recorded in the log, and only then is the database updated and written to disk. This is called a write-ahead log. A transaction is initiated by
BEGIN TRANSACTION, and all following statements belong to the same transaction until COMMIT, END TRANSACTION or ROLLBACK.
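Atomicity can be seen with a two-step transfer that fails halfway. A minimal sketch with Python's sqlite3 module; the Account table and transfer() are made-up names, and isolation_level=None is used so that the BEGIN / COMMIT / ROLLBACK statements are issued explicitly.

```python
import sqlite3

# isolation_level=None: Python does not open transactions for us.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("ann", 100), ("bob", 0)])

def transfer(amount):
    """Move amount from ann to bob as one transaction; return True if committed."""
    conn.execute("BEGIN TRANSACTION")
    try:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE name = 'bob'", (amount,))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE name = 'ann'", (amount,))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")   # undo the credit to bob as well
        return False

ok = transfer(30)        # both updates succeed
failed = transfer(500)   # the debit breaks the CHECK, so the credit is rolled back too
balances = dict(conn.execute("SELECT name, balance FROM Account"))
```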
NoSQL is a range of DBMSs that are not relational; MongoDB (a document database) is the most popular. In SQL, data is always structured into tables and new data must
fit into that structure. MongoDB data is largely unstructured and new data of any type can be added anywhere, anytime. In an RDBMS, a table creates a strict template:
the table has a fixed schema that determines the number, type and name of the columns; the schema is designed in advance and hard to change; it is normalised to
minimise redundancy and storage requirements. JSON (JavaScript Object Notation) is a text-based key-value format used to store and exchange objects; a single object
within {} is a document. Each document consists of fields and each field is a key-value pair. A value can be a single value or an array. Elements of an array can be
documents, so a document can be arbitrarily complicated. In a document DB, all information about an entity can be stored in a single document. In MongoDB, a database
contains collections, a collection contains documents, and documents contain key-value pairs. Indexes and search algorithms make retrieving data fast. Roughly, a
collection corresponds to a table, a document to a row, and a key-value pair to a column value. Organisation (SQL) is replaced by search (NoSQL). The easiest way to
deal with large datasets / more queries is more servers (scaling out / horizontal scaling). In MongoDB, data can easily be divided amongst as many servers as
required, by dividing collections or even individual documents between servers. If a server has the right document, it has all the information required. Scaling out
is harder in an RDBMS because data is organised in a different format to how it is used. Tables cannot easily be distributed across multiple servers because queries
involve multiple tables.
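The shape of a document can be sketched with Python's built-in json module. This only illustrates the JSON format, not the MongoDB API, and the student record is invented.

```python
import json

# One document holds everything about the entity, including nested
# documents and arrays that would need separate tables in an RDBMS.
document = {
    "name": "Ann",
    "units": ["CITS1402", "CITS1401"],                        # array of single values
    "address": {"city": "Perth", "postcode": "6009"},          # nested document
    "phones": [{"type": "mobile", "number": "0400 000 001"}],  # array of documents
}

text = json.dumps(document)    # text form, ready to store or exchange
restored = json.loads(text)    # back to the same nested structure

first_phone_type = restored["phones"][0]["type"]
```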