UNIT2
The relational model represents the database as a collection of relations. Informally, each
relation resembles a table of values or, to some extent, a flat file of records. It is called a flat
file because each record has a simple linear or flat structure.
When a relation is thought of as a table of values, each row in the table represents a collection
of related data values. A row represents a fact that typically corresponds to a real-world entity
or relationship. The table name and column names are used to help to interpret the meaning
of the values in each row.
Example: In the STUDENT relation, each row represents facts about a particular student
entity. The column names Name, Student_number, Class, and Major specify how to interpret
the data values in each row, based on the column each value is in. All values in a column are
of the same data type.
In the formal relational model terminology, a row is called a tuple, a column header is called
an attribute, and the table is called a relation. The data type describing the types of values that
can appear in each column is represented by a domain of possible values.
Codd’s rules:
Codd's Rules, also known as Codd's 12 Rules, were formulated by Edgar F. Codd, the father of the
relational model. These rules define what is required from a database management system (DBMS)
to be considered truly relational. Here's a concise overview of Codd's 12 Rules:
These rules serve as a guideline for evaluating the "relational-ness" of a DBMS. It's worth noting
that while these rules are important, few (if any) commercial systems fully comply with all of them.
They remain more of an ideal to strive for rather than a strict set of requirements.
Domains, Attributes, Tuples, and Relations
Domain:
A domain D is a set of atomic values. By atomic we mean that each value in the domain is
indivisible as far as the formal relational model is concerned. A common method of specifying
a domain is to specify a data type from which the data values forming the domain are drawn.
It is also useful to specify the name for the domain, to help in interpreting its values.
Some examples of domains follow:
• Usa_phone_numbers: The set of ten-digit phone numbers valid in the United States.
• Social_security_numbers: The set of valid nine-digit social security numbers.
• Names: The set of character strings that represents the names of persons.
• Employee_ages: Possible ages of employees in a company; each must be an
integer value between 15 and 80.
The preceding are called logical definitions of domains. A data type or format is also
specified for each domain. For example, the data type for the domain Usa_phone_numbers
can be declared as a character string of the form (ddd)ddddddd, where each d is a numeric
(decimal) digit and the first three digits form a valid telephone area code. The data type for
Employee_ages is an integer number between 15 and 80.
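A domain pairs a logical definition with a data type or format, so membership in a domain can be tested mechanically. The following is a minimal sketch in Python, assuming the phone format is exactly the (ddd)ddddddd form given above. The function names are hypothetical, and the check that the first three digits form a valid area code is deliberately left out because it needs an external list of area codes.

```python
import re

# Assumed format from the text: (ddd)ddddddd, ten decimal digits in total.
USA_PHONE_PATTERN = re.compile(r"\(\d{3}\)\d{7}")


def in_usa_phone_numbers(value):
    """Format check only; area-code validity is NOT checked here."""
    return isinstance(value, str) and USA_PHONE_PATTERN.fullmatch(value) is not None


def in_employee_ages(value):
    """Employee_ages: an integer between 15 and 80, inclusive."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 15 <= value <= 80
```

For instance, in_employee_ages(30) holds while in_employee_ages("30") does not: the string is outside the domain even though it looks like a valid age.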
Attribute:
An attribute Ai is the name of a role played by some domain D in the relation schema R.
D is called the domain of Ai and is denoted by dom(Ai).
Tuple:
A tuple is a mapping from attributes to values drawn from the respective domains of those
attributes. Tuples are intended to describe some entity (or relationship between entities) in
the miniworld.
Example: a tuple for a PERSON entity might be
{ Name --> "Smith", Gender --> Male, Age --> 25 }
Relation:
A named set of tuples all of the same form, i.e., having the same set of attributes.
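The two definitions above can be sketched in Python: a tuple as an immutable mapping from attribute names to values, and a relation as a set of such mappings. This is only an illustration. The helper names make_tuple, attributes_of and is_relation are hypothetical, and the second PERSON row is made-up sample data.

```python
def make_tuple(**values):
    """A tuple as an immutable mapping: a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def attributes_of(t):
    """The set of attribute names the tuple maps from."""
    return {name for name, _ in t}


def is_relation(tuples):
    """All tuples of a relation must have the same set of attributes."""
    return len({frozenset(attributes_of(t)) for t in tuples}) <= 1


person = make_tuple(Name="Smith", Gender="Male", Age=25)

# A relation is a SET of tuples: attribute order is irrelevant and
# duplicate tuples collapse into one.
PERSON = {
    person,
    make_tuple(Name="Jones", Gender="Female", Age=31),  # hypothetical row
    make_tuple(Age=25, Name="Smith", Gender="Male"),    # same mapping as `person`
}
```

Because the third row is the same mapping as the first, PERSON holds two tuples, not three.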
Relation schema:
strings, and variable-length strings are also available, as are date, time, timestamp, and
money, or other special data types.
1.1.2 Key Constraints and Constraints on NULL Values
A key is a set of one or more attributes that can uniquely identify each row in a table. A key
not only identifies the rows of a table but also relates two or more tables.
Different Types of Keys:
1) Super Key
2) Candidate Key
3) Primary Key
4) Foreign Key
5) Secondary Key/Alternate Key
6) Unique Key
7) Composite Key
8) Surrogate Key
9) Partial Key
1) Super Key: A super key is an attribute (or a set of attributes) that uniquely identifies a
tuple, i.e., an entity in an entity set.
A super key is a superset of a candidate key, since candidate keys are selected from the
super keys.
Example:
2) Candidate Key: A relation may have one or more candidate keys, and each of them
qualifies to serve as the primary key. Exactly one candidate key is chosen as the primary
key, so each table has only a single primary key. The keys are called candidate keys
because they are the candidates for that choice.
A candidate key can be a single column or a combination of more than one column. A
minimal super key is called a candidate key.
Example:
Above, Student_ID, Student_Enroll and Student_Email are the candidate keys. They
are considered candidate keys since they can uniquely identify the student record.
3) Primary Key: It is an attribute or set of attributes that uniquely identifies an entity
(row) in the entity set (table). The main difference between the primary key and the
other candidate keys is that the primary key cannot contain NULL values. A primary
key must be UNIQUE and NOT NULL.
Example:
4) Foreign Key: A foreign key is a set of attributes in a table that refers to the primary
key of another table. The foreign key links these two tables.
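As an illustration of how a foreign key links two tables, here is a minimal sketch using Python's built-in sqlite3 module. The department and employee tables, their columns and their rows are hypothetical and do not come from these notes. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)  -- the foreign key
    )
""")

conn.execute("INSERT INTO department VALUES (10, 'Research')")
conn.execute("INSERT INTO employee VALUES (1, 'Smith', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Jones', 99)")  # no department 99
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# The foreign key is what lets the two tables be related in a query.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON e.dept_id = d.dept_id"
).fetchall()
```

The second insert is rejected because its dept_id does not match any primary key value in department.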
Example:
5) Secondary Key/Alternate Key: The primary key is the field (or set of fields) chosen
to uniquely identify a record in a table. A secondary key, or alternate key, is an
additional key that can be used in addition to the primary key to locate specific data.
A secondary key is a candidate key that has not been selected as the primary key,
although it qualified for that role. A candidate key is an attribute or set of attributes
that you can consider as a primary key.
Note: A secondary key is not a foreign key.
Example 1:
Above, Student_ID, Student_Enroll and Student_Email are the candidate keys. They
are considered candidate keys since they can uniquely identify the student record.
Select any one of the candidate keys as the primary key. The remaining two keys are then
secondary keys.
If you select Student_ID as the primary key, then Student_Enroll and
Student_Email are secondary keys (the unselected candidates for primary key).
Example 2:
Above, the composite key consists of StudentID and StudentEnrollNo. The table has two
attributes as its primary key.
Therefore, a primary key consisting of two or more attributes is called a composite
key.
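A composite key can be tried out with Python's built-in sqlite3 module. In this minimal sketch, StudentID and StudentEnrollNo are taken from the example above, while the StudentName column and all row values are hypothetical. Uniqueness is enforced on the pair of values, not on each column separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        StudentID       INTEGER NOT NULL,
        StudentEnrollNo INTEGER NOT NULL,
        StudentName     TEXT,                     -- hypothetical extra column
        PRIMARY KEY (StudentID, StudentEnrollNo)  -- composite key
    )
""")

conn.execute("INSERT INTO student VALUES (1, 100, 'Amit')")
conn.execute("INSERT INTO student VALUES (1, 101, 'Bina')")  # StudentID repeats: allowed
conn.execute("INSERT INTO student VALUES (2, 100, 'Chen')")  # EnrollNo repeats: allowed

try:
    conn.execute("INSERT INTO student VALUES (1, 100, 'Dev')")  # the PAIR repeats
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Neither column is unique on its own here. Only the combination (StudentID, StudentEnrollNo) identifies a row.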
8) Surrogate Key: A surrogate key's only purpose is to be a unique identifier in a
database, for example, an incrementing key.
A surrogate key has no actual meaning of its own and is used to represent existence. It
exists only for data analysis.
Example: The surrogate key is the Key column in the <ProductPrice> table.
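An incrementing surrogate key can be sketched with Python's built-in sqlite3 module. The product_price table below is a hypothetical stand-in for a product price table. Its column names and rows are assumptions for illustration and do not reproduce the table referred to above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_price (
        price_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no business meaning
        product   TEXT NOT NULL,
        price     REAL NOT NULL
    )
""")

keys = []
for product, price in [("Pen", 1.50), ("Notebook", 3.25)]:
    # The key column is omitted from the INSERT; the database generates it.
    cursor = conn.execute(
        "INSERT INTO product_price (product, price) VALUES (?, ?)",
        (product, price),
    )
    keys.append(cursor.lastrowid)
```

The generated values say nothing about the product or its price. They only distinguish one row from another.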
Here, using the partial key Emp_no, we cannot identify a tuple uniquely, but we can
select a group of tuples from the table.
Superkey
A superkey SK specifies a uniqueness constraint that no two distinct tuples in any state r of R
can have the same value for SK. Every relation has at least one default superkey: the set of all
its attributes.
Key
A key K of a relation schema R is a superkey of R with the additional property that removing
any attribute A from K leaves a set of attributes K′ that is not a superkey of R anymore.
Hence, a key satisfies two properties:
1. Two distinct tuples in any state of the relation cannot have identical values for
(all) the attributes in the key. This first property also applies to a superkey.
2. It is a minimal superkey, that is, a superkey from which we cannot remove any
attributes and still have the uniqueness constraint in condition 1 hold. This
property is not required of a superkey.
Example: Consider the STUDENT relation
The attribute set {Ssn} is a key of STUDENT because no two student tuples can have
the same value for Ssn.
Any set of attributes that includes Ssn, for example {Ssn, Name, Age}, is a superkey.
The superkey {Ssn, Name, Age} is not a key of STUDENT because removing Name or
Age or both from the set still leaves us with a superkey.
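The two properties can be checked mechanically against one relation state. The sketch below uses hypothetical function names and made-up STUDENT rows. Note the limitation: a key is a property of the schema and must hold in every valid state, so a check on sample data can only refute a proposed key, never prove it.

```python
def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    ordered = sorted(attrs)
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in ordered)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """A key is a superkey that stops being one when any attribute is removed."""
    attrs = set(attrs)
    if not is_superkey(rows, attrs):
        return False
    # Checking single removals suffices: every superset of a superkey
    # is itself a superkey.
    return not any(is_superkey(rows, attrs - {a}) for a in attrs)


# Hypothetical sample state of STUDENT.
STUDENT = [
    {"Ssn": "111", "Name": "Smith", "Age": 19},
    {"Ssn": "222", "Name": "Smith", "Age": 19},
    {"Ssn": "333", "Name": "Brown", "Age": 18},
]
```

On this state, {Ssn} passes both checks, while {Ssn, Name, Age} passes is_superkey but fails is_key, because {Ssn, Age} is still a superkey after Name is removed.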
In general, any superkey formed from a single attribute is also a key. A key with multiple
attributes must require all its attributes together to have the uniqueness property.
Candidate Key
A relation schema may have more than one key. In this case, each of the keys is called a
candidate key.
Example: The CAR relation has two candidate keys: License_number and
Engine_serial_number.
Primary Key
It is common to designate one of the candidate keys as the primary key of the relation. This is
the candidate key whose values are used to identify tuples in the relation. We use the
convention that the attributes that form the primary key of a relation schema are underlined.
Other candidate keys are designated as unique keys and are not underlined.
Another constraint on attributes specifies whether NULL values are or are not permitted.
For example, if every STUDENT tuple must have a valid, non-NULL value for the Name
attribute, then Name of STUDENT is constrained to be NOT NULL.
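The primary key, unique key and NOT NULL constraints can be seen together in a minimal sketch using Python's built-in sqlite3 module. License_number and Engine_serial_number come from the CAR example above. The make and year columns, the row values and the try_insert helper are hypothetical. NOT NULL is written explicitly on the primary key because SQLite, for historical reasons, does not imply it for a TEXT primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE car (
        License_number       TEXT PRIMARY KEY NOT NULL,  -- chosen primary key
        Engine_serial_number TEXT NOT NULL UNIQUE,       -- other candidate key
        make                 TEXT NOT NULL,              -- hypothetical, must be present
        year                 INTEGER                     -- hypothetical, NULL permitted
    )
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO car VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(("TX-ABC-123", "E1001", "Ford", 2020)),  # accepted
    try_insert(("TX-ABC-123", "E1002", "Ford", 2021)),  # duplicate primary key
    try_insert(("CA-XYZ-789", "E1001", "Audi", 2019)),  # duplicate unique key
    try_insert(("NY-JKL-456", "E1003", None, 2018)),    # NULL in a NOT NULL column
    try_insert(("FL-QRS-321", "E1004", "Kia", None)),   # NULL where it is permitted
]
```

Only the first and last rows are stored. The other three each violate exactly one of the constraints described above.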
Relational Algebra & relational calculus:
Relational algebra and relational calculus are formal languages used to manipulate and
query relational databases. They form the theoretical foundation for SQL and other query
languages.
Relational Algebra:
Relational Calculus:
1. Definition: A declarative language that describes what data to retrieve, not how to
retrieve it.
2. Types:
o Tuple Relational Calculus (TRC)
o Domain Relational Calculus (DRC)
3. Key components:
o Variables
o Atomic formulas
o Logical connectives (∧, ∨, ¬)
o Quantifiers (∃ for "exists", ∀ for "for all")
4. Basic structure: {T | P(T)} where T is a tuple variable and P(T) is a predicate
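The basic structure {T | P(T)} reads much like a set comprehension, so a tuple relational calculus query can be sketched in Python. The attribute names below follow the STUDENT relation introduced earlier. The rows are hypothetical sample data, and the mapping to Python is an analogy rather than a formal semantics.

```python
from collections import namedtuple

Student = namedtuple("Student", ["Name", "Student_number", "Class", "Major"])

# Hypothetical sample state of the STUDENT relation.
STUDENT = {
    Student("Smith", 17, 1, "CS"),
    Student("Brown", 8, 2, "CS"),
    Student("Lee", 21, 2, "MATH"),
}

# {T | STUDENT(T) ∧ T.Major = 'CS'}
cs_students = {t for t in STUDENT if t.Major == "CS"}

# {T.Name | STUDENT(T) ∧ ∃S (STUDENT(S) ∧ S.Class = T.Class
#                            ∧ S.Student_number ≠ T.Student_number)}
# The existential quantifier ∃ becomes any(...); ∀ would become all(...).
has_classmate = {
    t.Name
    for t in STUDENT
    if any(
        s.Class == t.Class and s.Student_number != t.Student_number
        for s in STUDENT
    )
}
```

Both queries state only the condition a tuple must satisfy. Nothing in them says in which order the tuples are to be examined.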
Comparison:
1. Expressiveness: Relational algebra and relational calculus are equivalent in power. Any
query that can be expressed in one can be expressed in the other.
2. Usage: Relational algebra is closer to how queries are actually executed, while relational
calculus is more abstract: it states the condition the result must satisfy, in the notation of
predicate logic.
3. Implementation: Most database systems use relational algebra as the basis for query
optimization and execution.
4. Learning curve: Relational algebra is often considered easier to learn initially, while
relational calculus can be more intuitive for complex queries once mastered.
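The equivalence in point 1 can be illustrated on a single query. The sketch below assumes two standard relational algebra operators that are not spelled out in these notes, selection (σ) and projection (π), implemented as hypothetical Python functions over made-up STUDENT rows. One example shows the two styles side by side. It does not prove the general equivalence.

```python
# Hypothetical sample state of the STUDENT relation.
STUDENT = [
    {"Name": "Smith", "Student_number": 17, "Class": 1, "Major": "CS"},
    {"Name": "Brown", "Student_number": 8, "Class": 2, "Major": "CS"},
    {"Name": "Lee", "Student_number": 21, "Class": 2, "Major": "MATH"},
]


def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attrs):
    """π: keep only the listed attributes; duplicates are eliminated."""
    return {tuple(t[a] for a in attrs) for t in relation}


# Algebra: π_Name(σ_Major='CS'(STUDENT)), an explicit sequence of operations.
algebra_result = project(select(STUDENT, lambda t: t["Major"] == "CS"), ["Name"])

# Calculus: {T.Name | STUDENT(T) ∧ T.Major = 'CS'}, a description of the result.
calculus_result = {(t["Name"],) for t in STUDENT if t["Major"] == "CS"}
```

The algebra version fixes an order (select, then project), which is why it maps naturally onto query execution plans. The calculus version leaves that choice open.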