Databases Practical Notes

This document discusses several topics related to databases including data modeling, the relational model, SQL, database schemas, keys, foreign keys, and entity relationship diagrams. It provides explanations of concepts like logical and physical data independence, normalization, and database transactions.


Database lecture notes

Storing data is important in many different parts of industry.


These notes cover developing data models, SQL statements and a little bit of database
programming, and reasoning about good versus bad database design.
Databases follow a defined logical structure with specific semantics and serve a specific group of
users. Because the data is stored in a database, we can use a query language. With plain files, you only have a weak
logical structure, no efficient access, limited protection from data loss and no access control.
Data independence and avoidance of duplication are advantages of databases. They give a
logical view: users interact with a simple view while, behind the scenes, we have rapid
access and manipulation, along with different views of the same database.
ANSI/SPARC architecture.
Physical level (internal) → how the data is actually stored.
Logical level (conceptual) → which data is stored and the relations among it.
View level (external) → what the user can see. Different applications are able to have different
views; application programs can hide details of data types and hide information that is not needed. At
the logical level (conceptual schema) we describe the data stored and the relations among it;
the physical level is the byte layout and other practical details.
Logical data independence → the ability to modify the logical schema without breaking existing
applications (adding additional columns, for example). Physical data independence is the ability to
modify the physical schema without changing the logical schema.
Relational model → relational databases. A table is a set of tuples, with no duplicate tuples
and no order on the tuples.
A database schema describes the structure, relations and constraints of a database.
Structured Query Language (SQL).
SQL → a declarative query language. It describes what information is sought, without prescribing how to
retrieve it; you basically just say what you want to know.
Imperative languages → explicit control, implicit logic.
Declarative languages → implicit control, explicit logic.
There are a few other categories that sit between these two.
SQL is a declarative data manipulation language: the user describes the conditions the requested
data is required to fulfil.
Data models and integrity constraints → the relational model is a meta-language for describing data; SQL can
be used for table and constraint definitions. A primary key constraint identifies a certain row; a
foreign key constraint refers to a row in another table. We can put constraints on data types, and add column
constraints and check constraints (e.g. unique, nullability, or a check such as age > 18 and age < 150).
We can put constraints on certain things in a table, such as not null (the cell needs to be filled). SQL can also
be used to create views.
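For example, a table definition combining these constraints might look as follows (the table and column names are just made up for illustration, loosely matching the students/results examples used later in these notes):

```sql
CREATE TABLE students (
    sid   INTEGER PRIMARY KEY,                       -- primary key constraint
    name  VARCHAR(100) NOT NULL,                     -- not null: must be filled
    email VARCHAR(100) UNIQUE,                       -- column constraint: no duplicates
    age   INTEGER CHECK (age > 18 AND age < 150)     -- check constraint
);

CREATE TABLE results (
    sid      INTEGER REFERENCES students (sid),      -- foreign key constraint
    category VARCHAR(20),
    number   INTEGER,
    points   INTEGER
);
```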
Concurrent access and transactions. A transaction is a sequence of operations that performs
a single logical function in a database application. The transaction management ensures the ACID
properties: atomicity, consistency, isolation, durability; fully or not at all, all integrity
constraints hold, multiple users can modify without interfering, and once successful, the modified data is persistent.
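A minimal sketch of a transaction, reusing the made-up results table from above: both updates become permanent together, or not at all.

```sql
START TRANSACTION;
UPDATE results SET points = points - 5 WHERE sid = 1 AND number = 3;
UPDATE results SET points = points + 5 WHERE sid = 2 AND number = 3;
COMMIT;   -- atomicity: after a crash or a ROLLBACK instead of COMMIT, neither update remains
```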
Entity relationship model
Blue boxes → entity sets (objects)
Red diamonds → relationship sets
Each entity has its own attributes, shown with yellow circles.
Entity relationship model → roughly classes/associations.

shorter videos week 1;


relational model example
A unique column makes it easier to identify rows. Address may be null. You can reference another
table by naming the column the same (done with sid in this example).

Database schemas describe the structure.


Some entries must be of a certain data type: strings, numbers, binary data, date and time.
We can also give an attribute a domain (DOM), limiting it to a certain set or range of
values. You can still add constraints on top of this.
A relation schema defines the structure: a finite sequence of attribute names, each with a data type (or domain) Dj.
It can be written as a tuple R(A1: D1, ..., An: Dn).
CREATE TABLE statements are not the easiest to read from human to human. Mentioning the name of the
table and the name of each column, or just giving the header row, makes it a lot easier.

Database states → actual content at a certain moment.


Tuples describe a certain row at a certain moment. A tuple (v1, ..., vn) is an element of dom(D1) × ... × dom(Dn) if each
value is in the corresponding domain and the number of elements in the tuple matches the schema. A database state I for a
database schema S defines for every relation name Ri a finite set of tuples with respect to
the schema. Thus I(Ri) is a relation in the mathematical sense.
A database consists of relations (classes), consisting of tuples (objects), consisting of
attribute values (object properties).

Null values: we can leave table cells empty; ‘null’ is not part of the domain of the
column. It is neither the number 0 nor an empty string, and it is different from all values of any data type.
Nulls can occur when the value doesn’t exist, is not applicable, is not known, or when any value will do and the
information doesn’t matter. Without null values, we would have to split a relation into many more specific
relations and subclasses. Do not use fake values; different users will make up their own strings, just a
dash, etcetera. SQL uses a three-valued logic → true, false, unknown. Any comparison with null gives
back unknown. A = null is a different query from A is null, which is stupid and annoying. SQL
allows you to control whether an attribute value may be null or not (the not null constraint). This leads to simpler
application programs and fewer surprises during query evaluation.
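A small sketch of that pitfall, assuming the made-up students table has a nullable address column:

```sql
-- Returns no rows: address = NULL evaluates to unknown for every row,
-- and a row is only kept when the WHERE condition is true.
SELECT * FROM students WHERE address = NULL;

-- The correct way to find rows whose address is missing:
SELECT * FROM students WHERE address IS NULL;
```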

Integrity constraints;
The task is to model the relevant part of the world; unconstrained tables allow too many meaningless states.
Integrity constraints → conditions that the values in certain cells have to satisfy, making sure that only states
that are realistic in a real-world scenario are allowed.
Not null, key constraints (values may only appear once), foreign key constraints → values in a column must also
appear as key values in another table, and check constraints → values must satisfy a given predicate. They can protect
against data input errors, enforce company standards and prevent inconsistency. Application programming and querying might be made
easier as well with these restrictions.

Keys;
Relational model: a key uniquely identifies the tuples in a relation R. If two tuples agree on all key attributes,
they are the same tuple; distinct tuples must disagree on at least one key attribute.
Key constraints relate values within the same row, not necessarily values in the same column.
If {A, B} is a key, two rows may agree on A or on B, but not on both. Every relation has a
key. If a key α is contained in a set β, then β is weaker, as more states exist that satisfy it. A key is minimal if no proper
subset is a key (you can’t drop anything and have it still be a valid key).
A relation may have more than one minimal key. The primary key cannot be null; all other keys
are called alternate or secondary keys. The primary key is usually a single attribute that is never updated.
Keys are constraints; they refer to all possible states, not just the current/correct one. Constraining the
right thing is important. No good choice → add an extra column with a simple artificial identifier.

Foreign keys;
Basically references to information in other tables. The relational model doesn't provide explicit
relationships, links or pointers; the key attributes are used to reference a tuple.
The same column names might refer to the same columns. Foreign keys are not keys themselves →
they do not uniquely identify rows in the referencing table.
A foreign key constraint ensures that for every tuple t in results where the value is not null, there exists a
tuple u in students such that t.sid = u.sid.
Foreign keys may be null.
Deleting rows that are referenced → rejection, cascade (tuples that reference them are also deleted), or
setting the foreign key to null.
Foreign keys are denoted with an arrow.
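A sketch of how the delete behaviour is declared, reusing the made-up students/results tables (in practice only one of the listed policies would be chosen):

```sql
CREATE TABLE results (
    sid      INTEGER,
    category VARCHAR(20),
    number   INTEGER,
    points   INTEGER,
    FOREIGN KEY (sid) REFERENCES students (sid)
        ON DELETE CASCADE      -- alternatives: ON DELETE RESTRICT (rejection),
                               --               ON DELETE SET NULL
);
```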

Data modelling
Database design phases.
A formal model serves as a measure of correctness. Design needs expertise and flexibility, and the size of
real schemas is enormous.
Phase 1: conceptual → think of what you want to store, the relations and the constraints.
Phase 2: logical → transformation of the conceptual schema into the schema supported by the
database, e.g. the relational model.
Phase 3: physical → design indexes, table distribution, buffer sizes; maximize performance of
the final system.

Entity relationship model.


Three main ingredients → entity sets, attributes, relationship sets.
Rectangles → entity sets
Ellipses → attributes, where a double ellipse is multi-valued and a dashed ellipse is derived.
Diamonds → relationship sets.
Lines → link attributes and relationship sets to entity sets.
Underline → indicates primary key attributes.
An entity is an abstract object. Entities have attributes. Entity set → a collection of similar entities (all
persons, all companies, etc.).
Entities correspond to objects, while entity sets correspond to classes.
An entity set is represented by a set of attributes.
Simple vs composite attributes → a composite attribute is made up of several values of different types.
Attributes can be nested and part of other attributes. Multi-valued attributes are also possible; they hold
more than a single value. Derived attributes can be derived from other attributes that
already exist in the model. A relationship set connects two entity sets. Role indicators can
be useful (they get put on the line). Binary means degree 2; the degree of a relationship set is the number
of entity sets participating in the relationship. A higher-degree relationship can of course always be split up with an extra
relationship entity.
Cardinality limits.
These express different kinds of relations (UML notation) and state the number of entities to which
another entity can be associated via a relationship set. You give an at-least and an at-most number of
entities that can be connected to it:
0..1 → zero or one; 0..* → any number
1..* → one or more (not zero)
1..1 → connected to exactly one; the relationship becomes a (bijective) function.
A 0..1 relation → either connects to at most one, or to none at all.
The arrow notation is a bit odd; it can only express zero/one-to-many and feels flipped around: the number you
can connect to depends on how many are on the other side of the description. Arrows can
express 0 or 1; there is also a notation with circles and arrows.
Total participation → every entity in the entity set participates in at least one relationship in the
relationship set. If not → partial participation: an entity may not participate in the relation at all.

Relationship sets with attributes.


An attribute can also be a property of a relationship set; its value then depends on the pair of
participating entities. Cardinality can affect this kind of design:
depending on the kind of relationship → one-to-many vs many-to-many can result in different options.

Weak entity sets.


Every payment must be associated with a certain loan → a weak entity set is an entity set without a primary key. It depends on
the existence of an identifying entity set. The discriminator is a partial key: it distinguishes the weak
entities only in combination with the identifying entity.

IS-A Inheritance.
An employee, for example, inherits the attributes of the entity sets that are the input of an ISA triangle.
Lower-level entity sets are subgroups of the top entity set. They inherit all attributes, but also the
relationship sets. Hierarchies can be created both top-down and bottom-up.
There can be membership constraints (value-based assignments). The default is user-defined:
manual assignment to subclasses. Disjointness → an entity can belong to at most one subclass; otherwise
overlapping is possible (the base assumption). Completeness → the total specialization constraint: each
superclass entity must belong to a subclass; it must be one of them.

Aggregation
Example relation: works-on.
Sometimes a relationship needs to be connected to another relationship. We do not fix this by connecting
to all entities separately; it is solved by treating the relationship set as an abstract entity, which allows relations
between relations.

Notation summary.
Entity set → rectangle; double rectangle if it is a weak entity set.
Ellipse → attribute; double ellipse → multi-valued; dashed ellipse → derived from others.
Relationship set → diamond, where a double diamond is an identifying relationship set for a
weak entity set. Just watch the video.

Unified modeling language.


Main difference: attributes are written in a box together with the entity set; it is drawn like an all-boxed-out ER model.
Inheritance is drawn with arrows, marked either disjoint or overlapping. Writing 'disjoint' out in words is still
legitimate.

5. Advanced SQL

Basic SQL syntax


select clause + from clause + where clause.
where is a condition the rows must satisfy to be considered; if it is absent, all rows are considered.
Can be easily practiced within the environment. In the from clause we basically declare variables that
range over the tuples of a relation. The semantics can also be understood through pseudocode (nested loops over the tuple variables). Attributes are
referenced in the form R.A; if R is the only tuple variable with attribute A, then A alone suffices.
The ambiguity check is purely syntactic → if the attribute name occurs in more than one table under from, you must qualify it, e.g. R.number =
E.number states that the number in R equals the number in E.
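A minimal example of the shape of such a query (made-up table and values):

```sql
-- s is a tuple variable ranging over the rows of students;
-- s.name and plain name are equivalent here, since only one table is involved.
SELECT s.name
FROM   students s
WHERE  s.sid = 42;
```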

Querying multiple tables.


select A1, A2, ..., An from students, results where C: conceptually you check all 40
combinations of the rows (in the example) against the condition C.
A better evaluation algorithm is of course used, chosen by the query optimiser; the nested foreach algorithm suffices as a mental model, as
long as we get the same outcome. A join condition ensures you get the results you want when you look
through 2 tables; it is an “and” condition in the “where” clause. It is (almost) always an error if there are two tuple
variables which are not linked. Join conditions usually correspond to foreign key
relationships. Logical if you think about it; the same keys hold the same information, just in different
tables. You can draw lines between tables in order to understand what route to take if two pieces of information
aren’t directly connected. Cycles in the connection graph → the selection of join conditions may be more difficult; think about
which path to choose, depending on the context. Do not join more tables than
needed, the optimizer might not see the redundancy.
Practically always, make sure to have join conditions between the tables involved.
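A sketch of such a two-table query with its join condition (same made-up tables as before):

```sql
-- The join condition s.sid = r.sid links the two tuple variables (a foreign key relationship);
-- without it, every combination of a student row and a result row would be returned.
SELECT s.name, r.number, r.points
FROM   students s, results r
WHERE  s.sid = r.sid
  AND  r.category = 'homework';
```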

Self joins:
The same table can be queried more than once; you might have to consider more than one tuple of the same
relation → the homework marks example. You just list the same table again under from, with a new tuple variable, to get a self join.
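For instance, pairs of homework submissions by the same student could be found like this (made-up tables again):

```sql
-- Two tuple variables over the same table; the inequality keeps each pair once
-- and avoids pairing a row with itself.
SELECT r1.sid, r1.number, r2.number
FROM   results r1, results r2
WHERE  r1.sid = r2.sid
  AND  r1.category = 'homework' AND r2.category = 'homework'
  AND  r1.number < r2.number;
```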

Duplicate elimination.
Duplicates have to be explicitly eliminated in SQL. The distinct modifier may be applied to the
select clause to request explicit duplicate row elimination.
Superfluous distinct → the rows are already uniquely determined by the result.
Algorithm to check this:
1. let K be the set of attributes in the select clause
2. if A = c is in the where clause and c is a constant, add A to K
3. if A = B is in the where clause and B is in K, add A to K
4. if K contains a key of a tuple variable X, add all attributes of X to K
5. repeat until stable.
If K contains a key of every tuple variable listed under from, then distinct is superfluous.
Common mistakes: missing join conditions, unnecessary joins, self joins with missing
equality conditions, unexpected duplicates, unnecessary distinct. Sometimes these make queries only slow, not
necessarily wrong.

Outer and inner joins; left and right outer joins are two different options.
natural → compares all columns with the same name.
using(A1, ..., An) → the join predicate is equality on the listed columns, which must appear in both tables.
Join predicates determine when rows match, i.e. when they are combined.
inner [join]: the base form of join; the cartesian product filtered by the join predicate.
left [outer] join: preserves the rows of the left table; right-table attributes become null.
right [outer] join: the same thing but in reverse.
full [outer] join: preserves the rows of both tables.
cross join: cartesian product (all combinations of the rows of the two tables).
A plain join eliminates the tuples without a partner.
A left outer join preserves all tuples in its left argument, filling the attributes of the second table
with null. The cartesian product is just all possible combinations.
count is an SQL aggregation that counts how many times a certain attribute (or row) occurs.
Even with extra conditions, null-filled rows stay in the final table and can cause
confusion. Filter the table before the join operation in order to properly remove
them; filtering the preserved side of an outer join inside the join clause is not possible, and has to be
done beforehand.
Joining with on/using/natural all require slightly different methodology.
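A sketch of the variants, again with the made-up students/results tables:

```sql
-- Three ways to write the same inner join (NATURAL assumes sid is the only shared column name):
SELECT * FROM students s JOIN results r ON s.sid = r.sid;
SELECT * FROM students JOIN results USING (sid);
SELECT * FROM students NATURAL JOIN results;

-- Left outer join: students without any result are preserved, result columns become null.
SELECT s.sid, s.name, r.points
FROM   students s LEFT OUTER JOIN results r ON s.sid = r.sid;
```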

Part 2
Non-monotonic queries.
An example would be finding the students who haven’t submitted any homework. If you
then add a new row (a submission), the answer can contain fewer rows than before.
With the constructs so far we can’t formulate such non-monotonic behaviour; we need “negated existential quantification”, which boils
down to → testing whether a subquery yields a non-empty result.

NOT IN
The attribute value must not appear in the result of the subquery. The subquery is (conceptually) evaluated before
the main query.
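For example, the students without a homework submission (made-up tables):

```sql
SELECT s.sid, s.name
FROM   students s
WHERE  s.sid NOT IN (SELECT r.sid
                     FROM   results r
                     WHERE  r.category = 'homework');
-- caution: if the subquery can return a null, NOT IN returns no rows at all
```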

NOT EXISTS
True if the result of the subquery is empty. The outer query and the subquery are correlated: the
subquery is parameterized by the outer tuple variable. Otherwise it could just be replaced with a constant true or false.
Non-correlated subqueries under not exists are almost always an indication of an error.
exists without negation is true if the subquery result is not empty.
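The same “no homework” query, written with a correlated subquery:

```sql
SELECT s.sid, s.name
FROM   students s
WHERE  NOT EXISTS (SELECT *
                   FROM   results r
                   WHERE  r.sid = s.sid          -- correlation with the outer query
                     AND  r.category = 'homework');
```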

For all
In logic there are existential and universal quantifiers: “there exists” and “for all”. SQL only has
the existential one, but a restricted form of “for all” is available via negation:
for all X (P)  ==  not exists X (not P)
“all cars are red” == “there exists no car that is not red”.
Implication does not exist either; for all X (alpha → beta)
becomes: not exists X (alpha and not beta).
Translating natural language to SQL can be difficult.
The “for all” and implication forms can sometimes be interchanged; they are logically equivalent
to each other in the example at 7:27.
The translation from there is then almost trivial.

Nested Subqueries.
If we want to find everyone who has solved all assignments → we need a loop inside a loop.
We need an outer query with a nested, negated subquery to solve that. You can repeat this over and over, having loops
inside of loops.
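A sketch of that pattern, assuming a made-up exercises table next to students and results:

```sql
-- "Students who solved all exercises" = "students for which there is no exercise
--  without a matching result".
SELECT s.sid, s.name
FROM   students s
WHERE  NOT EXISTS (SELECT *
                   FROM   exercises e
                   WHERE  NOT EXISTS (SELECT *
                                      FROM   results r
                                      WHERE  r.sid = s.sid
                                        AND  r.number = e.number));
```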

All, any, some are allowed keywords.


The subquery must yield a single column. Universal quantification uses all; existential uses any or some.
We do not strictly need them, as we can convert any such condition into an equivalent exists
statement; they just make life a lot easier.
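For example (made-up data; ALL compares against every value returned by the subquery):

```sql
-- Homework results with more points than every homework score of student 42:
SELECT r.sid, r.number, r.points
FROM   results r
WHERE  r.category = 'homework'
  AND  r.points > ALL (SELECT r2.points
                       FROM   results r2
                       WHERE  r2.sid = 42 AND r2.category = 'homework');
```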

Single value subqueries.


Comparison with a single value is allowed without using the keywords all, any or some; the subquery must
then return a single value. Category and number together can form a key, giving the
guarantee that at most one value is returned, which fulfils this requirement. An empty subquery result is
interpreted as null, which can be used to search for null values, although that is bad
manners (as “not exists” is much clearer).

Subqueries under from.


The from clause also accepts nested queries: subqueries can take the place of a
table in the from clause. An “as” clause gives the derived table and its columns names when we work with this.
Aggregations allow us to calculate things such as minima, maxima or averages. A subquery in from
may not be referenced outside the place where it is introduced. A view declaration registers a query under a
given identifier, making it easier to use in other queries. It should be thought of as a
subquery macro.
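A sketch of both constructs (made-up tables and column names):

```sql
-- Derived table under FROM, with AS giving it a name:
SELECT h.sid, h.best
FROM   (SELECT sid, MAX(points) AS best
        FROM   results
        GROUP BY sid) AS h
WHERE  h.best >= 9;

-- The same subquery registered as a view ("subquery macro"):
CREATE VIEW best_points AS
    SELECT sid, MAX(points) AS best FROM results GROUP BY sid;
```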

Aggregation functions.
We go from a set or multiset of values to a single value. The input is the set of values in an entire
column, and the output is a single value, depending on which aggregation we do.
count(*): counts all rows of a result.
Some aggregations are sensitive to duplicates (e.g. count, sum, avg), while others are insensitive (min, max).
Simple aggregations feed the value set of the entire column into an aggregation function;
grouping the rows splits that column into groups first. If we don’t want to count values in a column twice,
make sure to add “distinct”. avg calculates the average of the values of the attribute
mentioned in its brackets. Simple aggregations may not be nested; on a single value they would make
no sense anyway. Aggregations can not be used in a where clause, as that is a condition on a
single row, not a column. If an aggregation function is used without group by, no plain attributes may
appear in the select clause. Null values are filtered out before aggregations are applied, the
exception being count(*), which counts rows. If the input set is empty, the result is null (also if all
attribute values are null). The count of an empty input, however, is zero. Note the null versus zero difference
here!! It's a bit confusing, pay attention.
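For example (made-up results table):

```sql
-- Simple aggregations over a whole column; DISTINCT avoids counting a student twice:
SELECT COUNT(*), COUNT(DISTINCT sid), AVG(points), MAX(points)
FROM   results
WHERE  category = 'homework';
```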
Aggregation with GROUP BY and HAVING
Group by partitions the rows of a table into disjoint groups, based on value equality on the
group by attributes. Aggregation functions are then applied to each group separately. The
groups are formed after evaluation of the from and where clauses. Group by never
produces empty groups. Be specific and include the grouping attributes in your select clause to avoid confusion.
Aggregations may not be used in the where clause, but in the having clause we ARE
allowed to use aggregation functions (conditions on whole groups rather than single rows). Having clauses can make us drop entire groups from
query results.
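For example (made-up tables):

```sql
-- Average homework points per student, keeping only students with at least 3 submissions:
SELECT   sid, AVG(points) AS avg_points
FROM     results
WHERE    category = 'homework'      -- row filter, evaluated before the groups are formed
GROUP BY sid
HAVING   COUNT(*) >= 3;             -- group filter, may use aggregation functions
```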

Aggregations with subqueries.


Aggregation subqueries can be used to nest computations, for example taking the minimum of values that are themselves computed by a query. Aggregation
subqueries can also be used in the select clause.
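A sketch of an aggregation subquery in the select clause (made-up tables):

```sql
-- Correlated aggregation subquery: one count per student row.
SELECT s.sid, s.name,
       (SELECT COUNT(*) FROM results r WHERE r.sid = s.sid) AS n_submissions
FROM   students s;
```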

Case distinctions: union, case, coalesce


Union allows combining the results of two queries. It is necessary when two specializations
are stored in different tables: a “course” table might be built from “graduate_course” and
“undergraduate_courses”. Union is also commonly used for case analysis:
it allows you to combine different queries, each assigning
different values, into a single table. The union operand subqueries must return
the same number of columns and compatible data types.
intersect → (A ∩ B). These operators do not add to the expressivity → they can be rewritten.
Sometimes a conditional expression (case) suffices and is more efficient. A typical application is replacing a
null value X by a value Y → coalesce(X, Y) (every X that is null becomes Y).
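A sketch of all three (the course tables and column names are made up):

```sql
-- Case analysis with UNION; both branches return the same number of compatible columns:
SELECT title, 'graduate' AS level FROM graduate_course
UNION
SELECT title, 'undergraduate'     FROM undergraduate_courses;

-- CASE as a conditional expression, and COALESCE to replace nulls:
SELECT sid,
       CASE WHEN points >= 6 THEN 'pass' ELSE 'fail' END AS verdict,
       COALESCE(points, 0) AS points_or_zero
FROM   results;
```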

Sorting output with ORDER BY


Rows in the output can be sorted with the order by clause:
order by [attribute name] [asc | desc].
If two rows are tied, they are compared on the next attribute.
Order by is not allowed to be applied to a subquery, since it is purely cosmetic.

Functional Dependencies & Database Normalization (week 4)

Introduction
Reasoning about good or bad design for databases.
Functional dependencies → a generalization of keys, the central part of relational database design theory.
They define when a relation is in normal form.
Violation of a normal form → a sign of bad database design;
data is then often stored redundantly.
3NF and BCNF are the forms most often used, with 3NF common in practice.
BCNF requires that all FDs are implied by keys.
Normalization algorithm → construct good relation schemas; the derived tables will automatically be
in BCNF.
First normal form → all table entries are atomic (not lists, sets, records, relations).
The next normal forms are based upon this one.

Functional dependencies
Whenever two rows agree on the attributes on the left-hand side, they must also agree on the attributes on the right-hand side.
A functional dependency A1...An → B1...Bm holds for a relation if and only if
any two tuples that agree on A1...An also agree on B1...Bm. It is similar to a partial key: it determines
uniquely some attributes, but not all of them in general.
A,B → C,D implies A,B → C and A,B → D, but not A → C,D or B → C,D.
Keys vs functional dependencies: keys really are functional dependencies. A key uniquely
determines all attributes of its relation.
Functional dependencies are like partial keys, as they are restricted to the attributes that appear
in the list on the right. We want to turn FDs into keys where possible, as databases can enforce those directly. Pay attention
to which attribute actually uniquely determines other attributes; this can also be a
combination of different attributes.

Implication of functional dependencies.


If A → B and B → C, then A → C as well.
How do we determine whether a set F of FDs implies alpha → beta? One option is the Armstrong axioms:
reflexivity, augmentation, and transitivity.
It is simpler to work with covers → the cover of a set of attributes alpha with respect to F is the set of all attributes A such that
F implies alpha → A (deriving implications directly from the axioms is hard to compute, 9:30).
When asked to give all attributes in the cover: check what the given attributes imply, add those attributes, and recheck all other
FDs until a full pass adds no new attribute.
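A tiny worked example with a made-up FD set: with F = {A → B, B → C}, the cover of {A} starts as {A}; A → B adds B, then B → C adds C, and a further pass adds nothing. So the cover is {A, B, C}, and F indeed implies A → C.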

Canonical sets of functional dependencies


We make the right-hand sides singletons, minimize the left-hand sides using the covers,
then remove implied FDs.
You’re allowed to just split FDs with several attributes on the right-hand side.
We remove implied FDs by looking for dependencies that follow from chains of other FDs; we can check that the result is still
equivalent to the original set at the end. You may drop an FD if you can ‘get it back’
from the remaining ones. Probably easier on paper.

Determining keys
Finding a minimal key is done using the covers. You can get different keys depending on the
algorithm/order: just remove attributes until you can’t remove anything anymore while the rest still determines everything.
To find all minimal keys → start from the attributes that never appear on a right-hand side (they must be in every key). If you can conclude something
more is necessary, test the candidate attributes and see which combinations determine everything. All the minimal
combinations that cover everything together make up all minimal keys.

Determinants
A set of attributes is a determinant of another if the functional dependency holds, the left-hand side is minimal and the
dependency is not trivial (the right-hand side is not a subset of the left). Minimal: if you drop any attribute, it no longer determines the right-hand side.
Basically just work through all the options once again and find what determines everything,
just without restricting attention to keys.

Consequences of bad design


Redundant storage of certain facts, and inserting and updating can become hard.
Redundancy also makes upholding integrity hard, creating a need for additional
constraints. The goal is a design in which all FDs are enforced by key constraints.

Boyce-Codd normal form.


A relation is in BCNF if all its functional dependencies are implied by keys: for every FD, either its left-hand side contains a key, or the FD is trivial
(its right-hand side is a subset of its left-hand side). To check this, first determine the minimal keys,
then verify each FD’s left-hand side against them. If a relation is in BCNF, the key constraints
automatically enforce all FDs, and anomalies do not occur.

3NF
Slightly more general than BCNF. A key attribute is an attribute that appears in some minimal key; this notion is needed
here.
If something is in BCNF, it's also in 3NF. 3NF adds the extra condition “B is a key attribute of R” as an
option: an FD is also fine if its right-hand side is an attribute of a minimal key. Conversion of real-life examples
to letters can be complicated; make sure to think about combinations of attributes forming a
key.

Splitting relations
Splitting relations: make sure the split is lossless, i.e. the original can be reconstructed with a join. Why split? To remove normal form violations.
A split is lossless if the set of shared attributes is a key in at least one of the two parts. We can always
transform a relation into BCNF by lossless splitting (use the violating FD: its left-hand side becomes the shared key). Splitting
sometimes creates the opportunity to store data that applies to some of the columns but not to the
others. Think about the usability of it; it can also waste storage space.
Preservation of the FDs is handy and nice, but it is not a requirement.

Transformation to BCNF
1. compute a canonical set of FDs
2. maximise the right-hand sides of the FDs
3. split off violating FDs one by one. Just practice a lot, the theoretical rules are
confusing.
Example: A → D, B → C, B → D and D → E,
with {A,B} as the minimal key; this is not in BCNF.
Write down everything that appears on a left-hand side and maximise what you can derive from it.
Then split off a relation for each violating FD, with that FD’s left-hand side as the key of the split-off relation; see the sketch below.
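One possible outcome under these assumptions (the result depends on the order in which violating FDs are split off): splitting off D → E gives (D, E); in the remainder, A → D gives (A, D); then B → C gives (B, C); what remains is (A, B), which still carries the original key. Each split is lossless because the shared attribute is a key of the split-off relation.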

Transformation to 3NF preserves all dependencies.


We make a canonical set, merge FDs with the same left-hand side (not necessarily maximizing the
right-hand side), then create a relation for every functional dependency, then remove relations whose
attribute sets are subsets of others. It is way easier to understand what is happening here. The advantage of keeping the key constraints
is why this approach gets preference when building a database.

Multivalued dependencies and fourth normal form.


Multivalued dependencies can exist even when no attribute alone forms a non-trivial
functional dependency. Such a relation is then in BCNF, even if a lot of information is redundant.
Splitting can help solve those redundancies, but you might lose information; the attribute groups need to be
independent of each other. A multivalued dependency means you effectively have a set of values in a column:
whenever two tuples agree on A, one can exchange their B values and the resulting tuples
are in the same table. This expresses independence properly. They always come in pairs and need
to be satisfied for all pairs of tuples.

Normal forms and conceptual designs.


If a good ER schema is transformed into the relational model, the result will satisfy all normal
forms. If this does not happen, there usually is some sort of flaw in the ER design.
In the ER model the solution is the same as in the relational model: we have to split the
entity set! Functional dependencies between attributes of a relationship always violate
BCNF. This is also the case if an attribute of a ternary relationship depends on only two of the participants; it
needs to depend on all of them.

Denormalization.
The process of adding redundant columns in order to improve performance. This avoids the
otherwise required joins. Insertion and deletion anomalies are avoided, although there will be
update anomalies. Denormalization can also be used to create entirely new separate redundant tables (or
aggregations).

Other constraints that can be interesting.


Using keys can give you other constraints. Interrelational constraints between tables can
solve certain problems, for example when something would otherwise be allowed to be two things at the same time.

Concurrency anomalies.
What happens when multiple people access the database at the same time?
What happens if a transaction fails before finishing? Atomicity is needed:
everything finishes or nothing does at all. You can also get an inconsistent database
state if operations run at the same time as, or in the middle of, other processes. Problems
during transactions can cause trouble, even if the transaction undoes all the actions done before.
Lost update, inconsistent read, dirty read and unrepeatable read are the names of the anomalies.
ACID properties: atomicity → fully or not at all; consistency → the database stays in a consistent state; isolation → users can modify
without seeing each other's actions; durability → once committed, changes are persistent regardless of a
crash.

Transactions
A transaction is a list of actions that ends with either commit or abort.
The scheduler is responsible for the execution order of concurrent database accesses.
The order in which two actions of the same transaction appear must be the same as described in the transaction
originally. A schedule is serial if the actions of different transactions are not interleaved: one transaction runs after another. Serializable:
the effect on the database is the same as that of some serial schedule. If it is not, we might get the anomalies
mentioned before. Two actions conflict if they are from different transactions, involve the same data
item, and at least one of the actions is a write. WR/RW/WW conflicts may make a schedule not serializable.
WR → dirty read (t2 reads what t1 wrote before t1 commits)
RW → unrepeatable read (t1 reads y, then t2 writes y)
WW → overwritten / lost update (t1 writes y, then t2 writes y)
We can swap adjacent actions without changing the effect if the actions are non-conflicting.
Conflict equivalent → reachable by swaps of non-conflicting adjacent actions.
Conflict serializable → conflict equivalent to some serial schedule.
Precedence graph → the graph has a node for each transaction and an edge from t1 to t2 if there is a
conflicting pair of actions between them in which t1's action occurs first. A conflict is simply a pair of actions you can't
swap without changing the result. If there are no cycles, an equivalent serial schedule can be made
by a topological sort. Blind writes → don't matter in the end.
So: conflict serializable if there is no cycle; the needed order is then clear, and the serial schedule runs the transactions in that order.
Two-phase locking
Concurrency control checks during runtime to ensure serializability.
This is hard, as you don't know the transactions that will be happening. You need a strategy:
pessimistic / optimistic / multi-version. With locking, transactions must lock objects before using them.
Shared lock / exclusive lock → before reading / before writing. Only one transaction can hold an exclusive lock on an object.
An object can carry both kinds of locks, so when unlocking you can write US(A) or UX(A) to
release them individually. Under two-phase locking, a transaction cannot get new locks once it has released any lock. Pretty easy to
check: if after a U there is an S or X, the protocol is violated. The 2-phase lock protocol alone does not fix the
wife money example → that is solved with an x-lock taken up front.
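SQL does not expose S/X locks directly, but many systems let you take the exclusive lock up front, roughly like this (this is an assumption about the DBMS, and the accounts table and its columns are made up):

```sql
START TRANSACTION;
-- FOR UPDATE takes an exclusive lock on the row before the read-modify-write sequence:
SELECT balance FROM accounts WHERE owner = 'wife' FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE owner = 'wife';
COMMIT;
```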

Deadlock handling in two-phase locking.


If two transactions are waiting for each other to unlock, they are deadlocked. This can be detected with a
waits-for graph: nodes are transactions, and there is an edge t1 → t2 if t1 is blocked by a lock held by t2.
If there are cycles, that is a deadlock, and it gets solved by aborting 1 or more transactions. The
victim selection is hard (always picking the young → starvation; the old → loss of computation). Timeout → a maximum
amount of time for a lock (being held or waited for), after which the transaction is aborted.

Cascading rollbacks.
Problems arise when transactions abort: a transaction cannot commit if it has
read something written by an aborted transaction. Such readers need to abort as well (else they used dirty
reads), and this can cascade onto multiple transactions. A schedule is recoverable if, whenever t2 reads from t1,
the commit of t2 comes after the commit of t1. Cascadeless schedule: delay
every read until the writing transaction has committed, avoiding all dirty reads and hence all cascading rollbacks.
Recoverable is a less drastic requirement than cascadeless.

Strict and preclaiming two-phase locking


Strict 2PL adds the extra rule that all locks are released only when the transaction is completed.
Declaring all locks at the start is preclaiming, although that is not applicable to multi-query
transactions. So: preclaiming is all locks at the start, strict is all unlocks at the end.
Granularity of locking.
At what level are we locking? Concurrency vs overhead. If you lock entire databases, we have low
concurrency but very low overhead. If we only lock rows, we get high concurrency but also high
overhead. If we can use multi-granularity locking, it will be better: we can determine the granularity
for each individual transaction. A row lock will be used for transactions that only edit a single row, a table
lock for things that select all rows of a table or a big part of a table. Intention locks →
intention-shared and intention-exclusive are the two variants. Before a granule g can be locked in a mode, a transaction has to
first obtain the corresponding intention lock on all coarser granularities, and only then the actual lock. Intention
locks thus mark the coarser, surrounding granules as partially locked.

Isolation levels.
Some degree of inconsistency may be acceptable in exchange for increased concurrency and
performance. The levels are:
- read uncommitted
- read committed
- repeatable read
- serializable
They differ in which anomalies they make possible: dirty reads, non-repeatable reads, phantom rows.
Which isolation levels are supported depends on the management system that is being used.
Phantom rows: using multi-granularity locking solves the phantom row problem.
SQL has some statements to help with this: set autocommit on/off, start transaction, commit,
rollback, set transaction isolation level; see the sketch below.
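A sketch of those statements (the exact syntax varies per DBMS, and the query reuses the made-up results table):

```sql
SET autocommit = 0;                               -- or: SET AUTOCOMMIT OFF
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT AVG(points) FROM results WHERE category = 'homework';
-- ... further statements that should see a consistent state ...
COMMIT;                                           -- or ROLLBACK on error
```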

Optimistic concurrency control.


Pessimistic control assumes that transactions will conflict, and locks a lot.
Optimistically we hope for the best and only spend effort if something has actually gone wrong.
Read phase: execute, but do not write back to the database; then validate, and if there are conflicts
we abort, only then do we write.
Backward-oriented and forward-oriented validation are the two variants. In backward validation, a transaction Tk trying to commit
succeeds if every other transaction either finished before Tk started, or there is no overlap between Tk's reads and the writes of those
earlier transactions.
In forward validation we compare Tk to all the currently running transactions with similar rules,
without the "finished first" condition.

Multi-version concurrency control.


We can serialize more schedules if we are able to remember old values written by previous
transactions: multiple versions of the data are stored. Snapshot isolation: remember the
values at the moment a transaction has started. Reading never needs to be locked;
transactions now only conflict if they write the same object. This avoids a few anomalies (dirty
read, unrepeatable read, phantoms), but write skew can occur, which can dodge (violate) constraints.

Optimizing database performance.


For each query in the log we can analyze the average time and the variance, looking at
read-only versus updating queries. How large are the read and write sets of the different queries, and how
do they intersect? Optimization can be done by making those sets smaller, changing the scheduling of
queries to reduce contention, or using a different isolation level for some queries.
