W2 DBMS
WEEK 2
LEC 1: Introduction to Relational Model (Pt.1)
• Attribute
o Attribute Types:
▪ Consider the relation
Students = (Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #,
Department)
▪ The set of allowed values for each attribute is called the domain of
the attribute
• Roll #: Alphanumeric string
• First Name, Last Name: Alpha String
• DoB: Date
• Passport #: String (Letter followed by 7 digits) – nullable
(optional)
• Aadhaar #: 12-digit number
• Department: Alpha String
▪ Attribute values are (normally) required to be atomic; that is,
indivisible
▪ The special value null is a member of every domain. This indicates
that the value is unknown
▪ The null value may cause complications in the definition of many
operations
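One way to see why null complicates operations is to compare it with itself. The following is a minimal sketch using Python's built-in sqlite3 module; the two-column students table and its sample rows are assumptions made for illustration, not part of the lecture. A comparison with null evaluates to unknown rather than true, so a filter written as passport = null matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column cut of the Students relation.
conn.execute("create table students (roll text, passport text)")
conn.executemany(
    "insert into students values (?, ?)",
    [("CS01", "A1234567"), ("CS02", None)],  # None is stored as null
)

# Comparing with null yields unknown (returned to Python as None).
null_equals_null = conn.execute("select null = null").fetchone()[0]

# '= null' never matches a row; 'is null' is the test that finds missing values.
eq_rows = conn.execute("select roll from students where passport = null").fetchall()
is_rows = conn.execute("select roll from students where passport is null").fetchall()
```

Here eq_rows is empty while is_rows contains the row for CS02, which is the kind of surprise the note above refers to.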
• Schema and instance
o Relation Schema and Instance
▪ A1, A2, …, An are attributes
▪ R = (A1, A2, …, An) is a relation schema. Example: instructor = (ID, name,
dept_name, salary)
Formally, given sets D1, D2, …, Dn, a relation r is a subset of D1 x D2 x … x Dn. Thus, a relation is a
set of n-tuples (a1, a2, …, an) where each ai ∈ Di
▪ The current values (relation instance) of a relation are specified by a
table
▪ An element t of r is a tuple, represented by a row in a table
Example: instructor ≡ (String(5) x String x String x Number+), where ID ∈ String(5), name ∈
String, dept_name ∈ String, and salary ∈ Number+
▪ Relations are unordered with unique tuples
• Order of tuples/rows is irrelevant (tuples may be stored in
arbitrary order)
• No two tuples/rows may be identical
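The formal definition can be mirrored with Python sets. This is only a sketch: the two small domains D1 and D2 and the sample tuples are assumed for illustration. A relation is a subset of the Cartesian product of its domains, and because it is a set, row order and repeated rows make no difference.

```python
from itertools import product

# Hypothetical small domains: D1 for ID, D2 for dept_name.
D1 = {"10101", "12121"}
D2 = {"Comp. Sci.", "Finance"}

# A relation r is any subset of D1 x D2; each element is a 2-tuple.
r = {("10101", "Comp. Sci."), ("12121", "Finance")}
is_relation = r <= set(product(D1, D2))

# Same tuples written in a different order, with one repeated:
# the set is still equal to r and still has two elements.
same_rows_other_order = {
    ("12121", "Finance"),
    ("10101", "Comp. Sci."),
    ("10101", "Comp. Sci."),
}
```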
• Keys
o Let K ⊆ R, where R is the set of attributes in the relation
o K is a superkey of R if values for K are sufficient to identify a unique tuple
of each possible relation r(R)
▪ Example: {ID} and {ID, name} are both superkeys of instructor
o Superkey K is a candidate key if K is minimal
▪ Example: {ID} is a candidate key for instructor
o One of the candidate keys is selected to be the primary key
o A surrogate key (or synthetic key) in a database is a unique identifier for
either an entity in the modeled world or an object in the database
▪ Not derived from application data, unlike a natural (or business) key
which is derived from application data
o Example:
▪ Students = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #,
Department
▪ Super Key: Roll #, {Roll #, DoB}
▪ Candidate Keys: Roll #, {First Name, Last Name}, Aadhaar #
• Passport # cannot be a key as it is nullable
▪ Primary Key: Roll #
• Can Aadhaar # be a primary key? It may suffice for unique
identification. But Roll # may have additional useful info
▪ Secondary Key / Alternate Key: {First Name, Last Name}, Aadhaar #
▪ Simple Key: Consists of a single attribute.
▪ Composite Key: {First Name, Last Name}
• Consists of more than one attribute to uniquely identify an
entity occurrence
• One or more of the attributes, which make up the key are not
simple keys in their own right
▪ Foreign Key constraint: Value in one relation must appear in another
• Referencing relation
o Enrolment: Foreign keys – Roll #, Course #
• Referenced relation
o Students, Courses
▪ A compound key consists of more than one attribute to uniquely
identify an entity occurrence
• Each attribute, which makes up the key, is a simple key in its
own right
• {Roll #, Course #}
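The superkey and candidate key definitions above can be explored on a concrete instance. The sketch below uses a small, assumed instance of instructor and two hypothetical helper functions. Note the limitation: a key is a property of every possible relation r(R), so a check on a single instance can only show that a set of attributes is not a key; it cannot prove that it is one.

```python
from itertools import combinations

# Hypothetical instance of instructor(ID, name, dept_name, salary).
ATTRS = ("ID", "name", "dept_name", "salary")
rows = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000),
    ("22222", "Wu", "Physics", 90000),
]

def identifies_uniquely(key, rows, attrs=ATTRS):
    """True if no two rows agree on every attribute in key (superkey test on this instance)."""
    idx = [attrs.index(a) for a in key]
    seen = {tuple(row[i] for i in idx) for row in rows}
    return len(seen) == len(rows)

def is_minimal(key, rows):
    """True if key identifies rows uniquely and no proper subset of it does (candidate key test)."""
    if not identifies_uniquely(key, rows):
        return False
    return not any(
        identifies_uniquely(sub, rows)
        for n in range(1, len(key))
        for sub in combinations(key, n)
    )
```

On this instance {ID} and {ID, name} both identify rows uniquely, but only {ID} is minimal, which matches the instructor example in the notes. {name} fails because two instructors are named Wu.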
o Projection
▪ π A,C (r)
o Union of two relations
▪ r ∪ s
o Set difference of two relations
▪ Relation r, s
▪ r–s
o Set intersection of two relations
▪ Relation r, s
▪ r ∩ s
▪ Note: r ∩ s = r – (r – s)
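The identity r ∩ s = r – (r – s) can be checked directly with Python sets, since relations over the same schema behave like sets of tuples. The two relations below are assumed sample data.

```python
# Hypothetical relations r and s over the same schema (A, B).
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

union = r | s          # r ∪ s
difference = r - s     # r – s
intersection = r & s   # r ∩ s

# Intersection derived from set difference alone.
derived = r - (r - s)
```

Because intersection can be derived this way, it adds convenience rather than expressive power.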
o Joining 2 relations – Cartesian Product
▪ Relation r, s
▪ r x s
o Renaming a table
▪ Allows us to refer to a relation (say E) by more than one name: ρX (E)
returns the expression E under the name X
▪ Relation r
▪ r x ρs (r)
o Composition of operations
▪ Can build expressions using multiple operations
▪ E.g.: σA=C (r x s)
• First compute r x s
• Then apply σA=C to the result
• Natural Join: r ⋈ s
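The operators above compose naturally when each one takes tables in and returns a table. The following is a minimal sketch, not an efficient implementation: rows are modelled as Python dicts, and the function names and sample relations r, s, t are all assumptions for illustration.

```python
from itertools import product

def select(pred, rel):
    """sigma: keep the rows for which pred holds."""
    return [row for row in rel if pred(row)]

def project(attrs, rel):
    """pi: keep only attrs, dropping duplicate rows."""
    out = []
    for row in rel:
        cut = {a: row[a] for a in attrs}
        if cut not in out:
            out.append(cut)
    return out

def cartesian(r, s):
    """r x s; assumes r and s share no attribute names."""
    return [{**a, **b} for a, b in product(r, s)]

def natural_join(r, s):
    """Natural join: pair rows that agree on every common attribute."""
    return [
        {**a, **b}
        for a, b in product(r, s)
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]

r = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
s = [{"C": 1, "D": "p"}, {"C": 3, "D": "q"}]
t = [{"A": 1, "E": "m"}, {"A": 1, "E": "n"}]

# Composition: sigma A=C (r x s)
composed = select(lambda row: row["A"] == row["C"], cartesian(r, s))
# Natural join on the shared attribute A
joined = natural_join(r, t)
```

The requirement that cartesian receives disjoint attribute names is the reason renaming is needed before taking the product of a relation with itself.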
• Aggregate Operators
o Can we compute:
▪ SUM
• E.g.: SUM B (σB>2 (r)), the sum of the B values of the tuples in r with B > 2
▪ AVG
▪ MAX
▪ MIN
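To make the aggregate example concrete, here is a sketch of what SUM, AVG, MAX and MIN over σB>2 (r) would compute. The relation r is assumed sample data. Unlike selection or projection, an aggregate produces a value that need not appear anywhere in the input table.

```python
# Hypothetical relation r(A, B) as a list of rows.
r = [{"A": "a", "B": 1}, {"A": "a", "B": 3}, {"A": "b", "B": 5}]

# sigma B>2 (r), keeping only the B column for aggregation.
selected = [row["B"] for row in r if row["B"] > 2]

total = sum(selected)
average = sum(selected) / len(selected)
largest, smallest = max(selected), min(selected)
```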
• Notes about Relational Languages
o Each query input is a table (or set of tables)
o Each query output is a table
o All data in the output table appears in one of the input tables
o Relational algebra is not Turing complete (a system of data-manipulation
rules is called Turing complete if it can simulate any Turing machine, and so
can recognize or decide other data-manipulation rule sets)
• Summary of Relational Operators
LEC 3: Introduction to SQL (Pt.1)
• History of Query Language
o IBM developed Structured English Query Language (SEQUEL) as part of the
System R project. It was later renamed Structured Query Language (SQL)
o There aren’t any alternatives to SQL for speaking to relational databases
(that is, SQL as a protocol), but there are many alternatives to writing SQL in
the applications
o These alternatives have been implemented in the form of frontends for
working with relational databases. Examples:
▪ SchemeQL and CLSQL, which are probably the most flexible owing to
their Lisp heritage, but which also look a lot more like SQL than other frontends
▪ LINQ (in .Net)
▪ ScalaQL and ScalaQuery (in Scala)
▪ SqlStatement, ActiveRecord and many others in Ruby
▪ HaskellDB
▪ …
o There are several query languages that are derived from or inspired by SQL. Of
these, the most effective is SPARQL (SPARQL Protocol and RDF Query Language)
▪ RDF – Resource Description Framework
▪ Standardized by the W3C as a key technology of the semantic web
▪ Versions:
• SPARQL 1.0
• SPARQL 1.1
▪ Used as the query language for several NoSQL systems – particularly
graph databases that use RDF as their store
• Data Definition Language (DDL)
o SQL DDL allows the specification of info about relations, including:
▪ The Schema for each relation
▪ The Domain of values associated with each attribute
▪ Integrity Constraints
▪ Also,
• The set of Indices to be maintained for each relation
• Security and Authorization information for each relation
• The Physical Storage Structure of each relation on disk
o Domain Types in SQL
▪ char(n) – Fixed length character string, with user-specified length n
▪ varchar(n) – Variable length character strings, with user-specified
maximum length n
▪ int – Integer (a finite subset of the integers that is machine-dependent)
▪ smallint – Small integer (a machine-dependent subset of the
integer domain type)
▪ numeric(p,d) – Fixed point number, with user-specified precision of p
digits, with d digits to the right of the decimal point (e.g., numeric(3,1)
allows 44.5 to be stored exactly, but not 444.5 or 0.32)
▪ real, double precision – Floating point and double-precision floating
point numbers, with machine-dependent precision
▪ float(n) – Floating point number, with user-specified precision of at
least n digits
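The numeric(p,d) rule can be stated as two digit counts: at most d digits after the decimal point and at most p – d digits before it. The sketch below encodes that rule with Python's decimal module. The helper fits_numeric is hypothetical, takes the value as a decimal string, and models the definition only; real database systems may round or raise an error instead.

```python
from decimal import Decimal

def fits_numeric(value, p, d):
    """True if the decimal string value is exactly representable as numeric(p, d)."""
    number = Decimal(value)
    if number == 0:
        return True
    # normalize() strips trailing zeros so "44.50" counts as 44.5
    _sign, digits, exponent = number.normalize().as_tuple()
    frac_digits = max(-exponent, 0)               # digits right of the point
    int_digits = max(len(digits) + exponent, 0)   # digits left of the point
    return frac_digits <= d and int_digits <= p - d
```

With p = 3 and d = 1 this accepts 44.5 and rejects both 444.5 (too many integer digits) and 0.32 (too many fractional digits), matching the example in the notes.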
o Create Table Construct
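▪ An SQL relation is defined using the create table command:
create table r (A1 D1, A2 D2, …, An Dn,
(integrity-constraint1),
…,
(integrity-constraintk));
• r is the name of the relation, each Ai is an attribute name in the
schema of r, and Di is the data type of values in the domain of Ai
▪ Example:
create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2));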
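A create table statement with one integrity constraint can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the primary key (ID) constraint and the sample rows are added for illustration, and SQLite accepts type names such as char(5) without enforcing the declared lengths, so it demonstrates the constraint rather than the domain types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table instructor (
        ID        char(5),
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8,2),
        primary key (ID)
    )
""")
conn.execute("insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")

# The primary key constraint rejects a second row with the same ID.
try:
    conn.execute("insert into instructor values ('10101', 'Wu', 'Finance', 90000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Column names as recorded in the schema, in declaration order.
columns = [row[1] for row in conn.execute("pragma table_info(instructor)")]
```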
o Select Clause
• The select clause lists the attributes desired in the result of a
query
▪ Corresponds to the projection operation of the relational
algebra
• Example: find the names of all instructors:
select name
from instructor
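The query above can be run against a small table using Python's built-in sqlite3 module. The sample rows are assumptions for illustration. One difference from relational-algebra projection is worth seeing: SQL keeps duplicate values in the result unless distinct is requested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor "
    "(ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2))"
)
conn.executemany(
    "insert into instructor values (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci.", 65000),
        ("12121", "Wu", "Finance", 90000),
        ("22222", "Wu", "Physics", 90000),  # second instructor named Wu
    ],
)

# Find the names of all instructors; duplicates are kept.
names = [row[0] for row in conn.execute("select name from instructor")]

# With distinct, the result behaves like a set-based projection.
distinct_names = [row[0] for row in conn.execute("select distinct name from instructor")]
```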