Database Normalization (AS)
Data storage, retrieval and update. The DBMS allows users to store, retrieve and
update information as easily as possible. These users are not necessarily computer
experts and do not need to be aware of the internal structure of the database or how to
set it up. Data from the database can be retrieved by using queries.
Creation and maintenance of the data dictionary. This is a file containing the
details of the structure of the database including details of tables, fields, field types,
field lengths and any other characteristics.
Field Name      | Data Type | Field Size for display | Description                | Example
Employee Number | Integer   | 10                     | Unique ID of each employee | 1645000001
The data dictionary is automatically updated by the database management system when any
changes are made to the database. This is known as an active data dictionary, as it is self-updating.
Passive Data Dictionary
A passive data dictionary is maintained separately from the database whose structure it
describes. This means that if the database is modified, the data dictionary is not
automatically updated, as it is in the case of an active data dictionary.
So, the passive data dictionary has to be updated manually to match the database. This needs
careful handling, or else the database and the data dictionary will fall out of sync.
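SQLite's built-in catalog table, sqlite_master, behaves like an active data dictionary: the DBMS rewrites it itself whenever the structure of the database changes. The minimal sketch below uses Python's standard sqlite3 module as a stand-in DBMS; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database


def table_names():
    """List the tables recorded in SQLite's own catalog (its data dictionary)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


def column_names(table):
    """Column names of a table, read from the catalog via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn.execute("CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, surname TEXT)")
before = table_names()

# Change the structure of the database; nobody edits the catalog by hand.
conn.execute("CREATE TABLE department (dept_number TEXT PRIMARY KEY, dept_name TEXT)")
after = table_names()

print(before)                      # ['employee']
print(after)                       # ['department', 'employee']
print(column_names("department"))  # ['dept_number', 'dept_name']
```

A passive data dictionary would be the equivalent of keeping this list in a separate document that someone must remember to edit after every CREATE or ALTER.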
Managing the facilities for sharing the database. Many databases need a multi-access
facility: two or more people must be able to access the database simultaneously and
update records without causing conflicts, such as one user's changes overwriting another's.
Backup and recovery. Information in the database must not be lost in the event of
system failure.
Security. Setting access rights for users. The DBMS must check user passwords and
give each user the appropriate privileges.
DBMS features
Data dictionary: part of the database, hidden from everyone except the DBA, which
contains metadata (data about tables and attributes, and about how data is organised
in physical storage).
Index table for improving performance: a secondary table that contains attribute
values (which are unique, eg a primary key, or a secondary key that was a candidate
key) together with pointers to the corresponding tuples in the original table.
Searching the index table is quicker than searching the full table.
Security issues: access rights and implementing backup procedures.
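The index table described above can be observed in SQLite with EXPLAIN QUERY PLAN, which reports whether a query scans the whole table or searches an index. This is a minimal sketch assuming a hypothetical employee table whose email column is a candidate key. The exact wording of the plan text varies between SQLite versions, so the comments show it only as an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, "
             "email TEXT NOT NULL, surname TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "j.smith@example.com", "Smith"),
    (2, "o.khan@example.com", "Khan"),
    (3, "m.lee@example.com", "Lee"),
])


def plan(sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT surname FROM employee WHERE email = 'm.lee@example.com'"
without_index = plan(query)  # a full scan, e.g. ['SCAN employee']

# The index holds each (unique) email value with a pointer (rowid) to its row.
conn.execute("CREATE UNIQUE INDEX idx_email ON employee (email)")
with_index = plan(query)     # e.g. ['SEARCH employee USING INDEX idx_email (email=?)']

print(conn.execute(query).fetchall())  # [('Lee',)]
```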
Advantages of DBMS
Data independence
The data and the programs using the data are stored separately. Any changes to the
structure of a database, for example adding a field or a table, will not affect any of
the programs that access the data. In a file-based system, a small change in a file
structure may require a considerable amount of reprogramming of all the programs
that access that file.
Data consistency
When the database is well structured, each data item is stored only once, however many
applications use it. If there were many copies of the same data, such as an employee's
address, there would be a danger of an item being updated in one place and not in
another; if this happened, the data would not be consistent.
No data redundancy
Ease of use
The DBMS provides easy-to-use queries that enable users to obtain instant answers. In a
file-based system, a query would have to be specially written by a programmer.
The DBMS can also validate data as it is entered. For example, a range check could be
set up for the date of birth of a new student in a school to make sure that the student's
age is within appropriate limits. Checks such as parity and checksums should also be
used to make sure that stored data has not become corrupted.
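One way a DBMS can enforce the range check described above is a CHECK constraint on the column. This is a minimal sketch in SQLite; the table, the column names and the allowed date range (2005 to 2015) are assumptions for illustration. Dates are stored as 'yyyy-mm-dd' text so that comparing the strings also compares the dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id    INTEGER PRIMARY KEY,
        surname       TEXT NOT NULL,
        -- hypothetical range check on the date of birth
        date_of_birth TEXT CHECK (date_of_birth BETWEEN '2005-01-01' AND '2015-12-31')
    )""")

conn.execute("INSERT INTO student VALUES (1, 'Smith', '2010-03-14')")  # within range

try:
    conn.execute("INSERT INTO student VALUES (2, 'Jones', '1890-01-01')")  # out of range
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refuses to store the invalid row

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, count)  # True 1
```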
Greater data security
The DBMS will ensure that only authorised users are allowed access to the data.
Different users can have different access privileges, depending on their needs. In a
file-based system using a number of files it is difficult to control access. Relational
databases provide different methods of database security.
Data integrity: the data stored in the database is reasonable and accurate, because it has
not been accidentally or maliciously altered.
Data independence: data and programs are stored separately. Changes to the
structure of the data (for example, by the addition of an extra field in a table) do not
result in all programs having to be rewritten.
Tuple
A row (record) in a relation is referred to as a tuple. A tuple is a set or collection of
atomic values.
Attributes
An attribute is a property or characteristic of an entity. For example, for a person the
attributes can be Name and DateOfBirth. Informal terms used for an attribute are a
column in a table or a field in a data file.
Candidate keys
A candidate key is an attribute, or minimal set of attributes, that uniquely identifies
each tuple (record) of a relation (table).
By definition, every relation has at least one candidate key, because no two tuples in a
relation are identical (in the worst case, the combination of all its attributes is the key).
Many relations have more than one candidate key. If a relation has more than
one candidate key, the one that is chosen to represent the relation is called the
primary key, and the remaining candidate keys are called alternate keys.
Primary key
A primary key is a unique attribute (or attribute combination) of the tuples of the
relation. It is a candidate key that is chosen to represent the relation in the database
and to provide a way to uniquely identify each tuple of the relation.
A database relation always has a primary key. The primary key is the most important
feature of the relational database concept.
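The sketch below shows a primary key and an alternate key side by side in SQLite. It assumes a hypothetical employee table in which both employee_number and email are candidate keys: employee_number is chosen as the primary key, and email is declared UNIQUE so that the DBMS still enforces its uniqueness as an alternate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_number INTEGER PRIMARY KEY,   -- chosen candidate key
        email           TEXT NOT NULL UNIQUE,  -- alternate key
        surname         TEXT
    )""")
conn.execute("INSERT INTO employee VALUES (1, 'j.smith@example.com', 'Smith')")


def try_insert(row):
    """Insert a row and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "stored"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_pk = try_insert((1, "o.khan@example.com", "Khan"))     # same primary key
duplicate_ak = try_insert((2, "j.smith@example.com", "Smythe"))  # same alternate key
new_row = try_insert((3, "m.lee@example.com", "Lee"))
print(duplicate_pk, duplicate_ak, new_row)  # rejected rejected stored
```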
Foreign keys
An attribute in one table that refers to the primary key in another table.
A foreign key is an attribute (or attribute combination) in one relation or table whose
values must match values of the primary key of another relation or table, so that the
two tables are linked.
Referential integrity
When a relationship between two tables is created using a primary key and a foreign
key, the DBMS will prevent a foreign key value from being entered unless the matching
primary key value already exists in the referenced (parent) table.
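Referential integrity can be seen in action with SQLite, which checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection. This is a minimal sketch; the venue and booking tables and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE venue (venue_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE booking (
        booking_id INTEGER PRIMARY KEY,
        venue_id   INTEGER NOT NULL REFERENCES venue (venue_id)
    )""")
conn.execute("INSERT INTO venue VALUES (10, 'Town Hall')")

conn.execute("INSERT INTO booking VALUES (1, 10)")  # venue 10 exists: accepted
try:
    conn.execute("INSERT INTO booking VALUES (2, 99)")  # no venue 99: rejected
    orphan_stored = True
except sqlite3.IntegrityError:
    orphan_stored = False

bookings = conn.execute("SELECT booking_id FROM booking").fetchall()
print(orphan_stored, bookings)  # False [(1,)]
```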
In a database design, a table is written as the table name followed by its attribute
names in brackets, for example ORDER(Num, CustName, City, Country, ProdID, Description).
Entity-relationship modelling
A database can be designed using an entity-relationship (ER) diagram. The ER diagram
is first created and used by the systems analyst and then passed to the database
designer.
Eg. An agency needs a database to handle bookings for bands. Each band
has a number of members. Each booking is for a venue. Each booking
might be for one or more bands.
Step 1: Choose the entities
Choose Booking, Band, Member and Venue (these are the nouns in the statement;
Agency is ignored because there is only one agency).
The relationships between entities can be of these types:
1:1 (one-to-one)
1:M (one-to-many)
M:1 (many-to-one)
M:M (many-to-many)
1. A band can have many members and a member can be in only one band (Member to
Band is M:1).
[ER diagram: Member (M) to Band (1)]
A member must belong to one band, and a band must have many members (M:1).
[ER diagram: Member (M) to Band (1)]
2. A booking must have a venue associated with it, and a venue can have many
bookings or none (Booking to Venue is M:1).
[ER diagram: Booking (M) to Venue (1)]
3. A band can have many bookings or none (for example, a new band), and a booking
can be for one or many bands (Band to Booking is M:M).
[ER diagram: Band (M) to Booking (M)]
The full ER diagram for the agency's bookings can be drawn as shown below.
If a relationship is 1:M, no further refinement is needed: the entity at the many end
needs a foreign key referencing the primary key of the entity at the one end of the
relationship.
An M:M relationship cannot be implemented directly, so it is resolved with a link
entity. In the above example, a Band-Booking link entity can be created to resolve the
M:M relationship between Band and Booking. The logical entity model would contain
the link entity as shown below.
The link entity has two foreign keys: one referencing the primary key of Band and one
referencing the primary key of Booking.
Each entity in the ER diagram becomes a table in the relational database. So, the primary
keys and the foreign keys can be chosen as shown below.
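The band-booking design can be sketched as SQLite tables. All table and column names (band_id, booking_date and so on) are assumptions, because the original diagrams with the chosen keys are not reproduced here. Each M:1 relationship puts a foreign key at the many end, and the band_booking link table resolves the M:M relationship with a compound primary key made of two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE band  (band_id INTEGER PRIMARY KEY, band_name TEXT);
    CREATE TABLE venue (venue_id INTEGER PRIMARY KEY, venue_name TEXT);
    -- Member to Band is M:1, so the foreign key sits in member
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, member_name TEXT,
                         band_id INTEGER NOT NULL REFERENCES band (band_id));
    -- Booking to Venue is M:1
    CREATE TABLE booking (booking_id INTEGER PRIMARY KEY, booking_date TEXT,
                          venue_id INTEGER NOT NULL REFERENCES venue (venue_id));
    -- Link entity resolving the M:M relationship between Band and Booking
    CREATE TABLE band_booking (
        band_id    INTEGER REFERENCES band (band_id),
        booking_id INTEGER REFERENCES booking (booking_id),
        PRIMARY KEY (band_id, booking_id)
    );
    INSERT INTO band VALUES (1, 'The Bytes'), (2, 'Null Set');
    INSERT INTO venue VALUES (1, 'Town Hall');
    INSERT INTO booking VALUES (100, '2024-06-01', 1);
    INSERT INTO band_booking VALUES (1, 100), (2, 100);  -- one booking, two bands
""")

bands_for_booking = conn.execute("""
    SELECT band.band_name
    FROM band_booking JOIN band ON band.band_id = band_booking.band_id
    WHERE band_booking.booking_id = 100
    ORDER BY band.band_name""").fetchall()
print(bands_for_booking)  # [('Null Set',), ('The Bytes',)]
```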
Data are actually stored as bits, or numbers and strings, but it is difficult to work with
data at this level.
Schema: a description of data at some level. Each level has its own schema:
● physical
● conceptual
● external
The physical schema describes details of how data is stored: files, indices, etc. on the
random-access disk system. It also typically describes the record layout of files and type
of files (hash, b-tree, flat).
Problem:
In the relational model, the conceptual schema (description of data) presents data
as a set of tables.
The DBMS maps data access between the conceptual and physical schemas automatically.
Examples:
Information that can be derived from stored data might be viewed as if it were stored.
Applications are written in terms of an external schema. The external view is computed
when accessed; it is not stored. Different external schemas can be provided to different
categories of users. Translation from the external level to the conceptual level is done
automatically by the DBMS at run time. The conceptual schema can be changed without
changing applications.
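In SQL, an external schema is usually provided as a view. This minimal sketch uses a hypothetical employee table. The view hides the salary column and presents derived information (head count per department) that is computed each time the view is queried rather than stored. The conceptual schema is then changed by adding a column, and the query on the view keeps working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: a base table with hypothetical columns.
conn.execute("CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, "
             "surname TEXT, salary INTEGER, depno INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Smith", 24000, 22), (2, "Khan", 31000, 22), (3, "Lee", 28000, 30)])

# External level: a view that hides salary and shows derived data.
conn.execute("""
    CREATE VIEW department_size AS
    SELECT depno, COUNT(*) AS head_count
    FROM employee
    GROUP BY depno""")

before = conn.execute("SELECT * FROM department_size ORDER BY depno").fetchall()

# Change the conceptual schema; the application's query on the view is unchanged.
conn.execute("ALTER TABLE employee ADD COLUMN car_registration TEXT")
conn.execute("INSERT INTO employee VALUES (4, 'Brown', 26000, 30, NULL)")
after = conn.execute("SELECT * FROM department_size ORDER BY depno").fetchall()

print(before)  # [(22, 2), (30, 1)]
print(after)   # [(22, 2), (30, 2)]
```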
Data Model
Schema: description of data at some level (e.g., tables, attributes, constraints, domains)
Data Independence
Database Normalization
While designing a database from an entity-relationship model, the main problem
in that “raw” database is redundancy. Redundancy is storing the same data
item in more than one place. Redundancy creates several problems, such as the
following:
1. Extra storage space: storing the same data in many places takes a large amount
of disk space.
2. Entering the same data more than once during data insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification, etc.
are not done properly. This creates inconsistency and unreliability in the database.
Normalization is a step-by-step process that removes a different kind of redundancy
and anomaly at each step. At each step a specific rule is followed to remove a specific
kind of impurity, giving the database a slim and clean structure.
Process:
Example
ORDER(Num, CustName, City, Country, ProdID, Description)
Step 2: 1NF
Transform a table of unnormalised data into first normal form (1NF)
1. Remove repeating attributes to a new table together with a copy of the key
from the UNF table.
2. The key from the unnormalised table always becomes part of the key of the
new table. A compound key is created. The value for this key must be unique
for each entity occurrence.
Example:
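Table 3.6.2 has repeating groups in the attributes ProdID and Description. We
remove the repeating groups by:
● moving ProdID and Description to a new table, ORDER-PRODUCTS, together with a
copy of the key Num, giving the compound key (Num, ProdID)
● linking the new table to the original table ORDER with a foreign key.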
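The 1NF step can be sketched in Python on a few rows of made-up data, since the original Table 3.6.2 is not reproduced here. The repeating group (a list of (ProdID, Description) pairs) is moved into a new ORDER-PRODUCTS table, with a copy of Num in each row. The function then checks that the compound key (Num, ProdID) is unique. The data and the function name to_1nf are hypothetical.

```python
# Hypothetical unnormalised ORDER data: Products is a repeating group.
unf_orders = [
    {"Num": 1, "CustName": "Ali", "City": "Lahore", "Country": "Pakistan",
     "Products": [("P1", "Pen"), ("P2", "Ruler")]},
    {"Num": 2, "CustName": "Bea", "City": "Paris", "Country": "France",
     "Products": [("P1", "Pen")]},
]


def to_1nf(orders):
    """Split the repeating group into ORDER-PRODUCTS, copying the key Num."""
    order_table = [
        {key: o[key] for key in ("Num", "CustName", "City", "Country")}
        for o in orders
    ]
    order_products = [
        {"Num": o["Num"], "ProdID": prod_id, "Description": description}
        for o in orders
        for prod_id, description in o["Products"]
    ]
    # The compound key (Num, ProdID) must be unique for every row.
    keys = [(row["Num"], row["ProdID"]) for row in order_products]
    if len(keys) != len(set(keys)):
        raise ValueError("compound key (Num, ProdID) is not unique")
    return order_table, order_products


order_table, order_products = to_1nf(unf_orders)
for row in order_products:
    print(row)
```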
2NF
Transform data in first normal form (1NF) into second normal form (2NF)
Rule: A table is in second normal form if it is in first normal form and any partial
dependencies have been removed. That is, every non-key attribute must be fully
dependent on all of the (entire) primary key. Remove any non-key attributes that depend
on only part of the table key to a new table.
Process:
Take each non-key attribute in turn and ask: is this attribute dependent on one part
of the compound key only?
If yes, remove the attribute to a new table with a copy of the part of the key it is
dependent upon. The key it is dependent upon becomes the key in the new
table. Underline the key in this new table.
If no, check it against the other part of the key and repeat the above process.
If the non-key attribute is fully dependent on the compound key or not dependent on
either part of the key, keep the attribute in current table.
Example
In our ORDER-PRODUCTS table, Description depends only on ProdID and not on
Num. Hence the non-key attribute (Description) is not dependent on all of the
primary key. We say that Description is dependent on ProdID, or that ProdID
determines Description (ProdID → Description). We remove this partial dependency by:
● moving Description to a new table with ProdID as its key
● linking the new table to the ORDER-PRODUCTS table with a foreign key.
Tables 3.6.5 and 3.6.6 show the data in second normal form.
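The 2NF step on ORDER-PRODUCTS can be sketched in Python using hypothetical rows. Because ProdID → Description, Description is moved to a new table keyed on ProdID. ProdID stays in ORDER-PRODUCTS both as part of the compound key and as the foreign key to the new table. The function name and data are illustrative only.

```python
# Hypothetical ORDER-PRODUCTS rows in 1NF; the key is (Num, ProdID).
order_products = [
    {"Num": 1, "ProdID": "P1", "Description": "Pen"},
    {"Num": 1, "ProdID": "P2", "Description": "Ruler"},
    {"Num": 2, "ProdID": "P1", "Description": "Pen"},  # "Pen" stored twice
]


def remove_partial_dependency(rows):
    """Move Description (which depends only on ProdID) to a table keyed on ProdID."""
    description_of = {}
    for row in rows:
        known = description_of.setdefault(row["ProdID"], row["Description"])
        if known != row["Description"]:
            raise ValueError(f"ProdID {row['ProdID']} has two descriptions")
    # ProdID stays in ORDER-PRODUCTS as part of the key and as a foreign key.
    link_table = [{"Num": row["Num"], "ProdID": row["ProdID"]} for row in rows]
    product_table = [
        {"ProdID": prod_id, "Description": description}
        for prod_id, description in sorted(description_of.items())
    ]
    return link_table, product_table


order_products_2nf, product_table = remove_partial_dependency(order_products)
print(product_table)  # each description is now stored once
```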
3NF
To transform data in second normal form (2NF) into third normal form (3NF)
Rule: Remove to a new table any non-key attributes that are more dependent on other
non-key attributes than on the primary key. Such a dependency is called a non-key
dependency (transitive dependency).
Third normal form (like second normal form) is concerned with the non-key
attributes. To be in 3NF, there must be no dependencies between any of the non-key
attributes.
Ignore tables with zero or only one non-key attribute (these go straight to 3NF with
no conversion).
Process:
Example:
There is a problem with the original ORDER table: City determines Country
(City → Country). A country can have many cities, so the same country name is
repeated for every city in that country, and Country cannot be made a primary key.
Instead, City becomes the primary key of a new table and determines Country.
So we have two non-key attributes that are dependent on each other. This means that
ORDER is not in 3NF. Tables 3.6.7 and 3.6.8 show the data in third normal form.
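The 3NF step can be sketched in the same way on hypothetical ORDER rows. Because City → Country, Country is moved to a new table keyed on City, and City stays in ORDER as a foreign key. The function checks that each city really maps to a single country before splitting. Data and names are made up for illustration.

```python
# Hypothetical ORDER rows in 2NF; the key is Num.
orders = [
    {"Num": 1, "CustName": "Ali", "City": "Lahore", "Country": "Pakistan"},
    {"Num": 2, "CustName": "Bea", "City": "Paris", "Country": "France"},
    {"Num": 3, "CustName": "Cal", "City": "Lyon", "Country": "France"},
    {"Num": 4, "CustName": "Dee", "City": "Paris", "Country": "France"},
]


def remove_non_key_dependency(rows):
    """Move Country to a table keyed on City; City stays in ORDER as a foreign key."""
    country_of = {}
    for row in rows:
        known = country_of.setdefault(row["City"], row["Country"])
        if known != row["Country"]:
            raise ValueError(f"City {row['City']} maps to two countries")
    order_3nf = [
        {"Num": row["Num"], "CustName": row["CustName"], "City": row["City"]}
        for row in rows
    ]
    city_table = [
        {"City": city, "Country": country}
        for city, country in sorted(country_of.items())
    ]
    return order_3nf, city_table


order_3nf, city_table = remove_non_key_dependency(orders)
print(city_table)  # "France" is now stored once per city, not once per order
```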
Exercise:
Write the unnormalised form and normalise table using first, second and third normal
form.
SQL
Structured Query Language (SQL) is a language used for storing and managing data in a
relational database management system (RDBMS). SQL was one of the first commercial
languages introduced for E. F. Codd's relational model. Today almost all RDBMSs (MySQL,
Oracle, Informix, Sybase, MS Access) use SQL as the standard database language. SQL is
used to perform all types of data operations in an RDBMS.
SQL statements are divided into two major categories: data definition language
(DDL) and data manipulation language (DML). Both of these categories contain far
more statements than we can present here, and each of the statements is far more complex
than we show in this introduction.
Create Table
NOT NULL: This is optional and can be omitted. A column can be defined as NOT
NULL. Use of this option means that the data must include a value for that column. If
a column is defined as NOT NULL, and no value is given when inserting data, then
an error will occur. The row will not be stored.
PRIMARY KEY: This part of the command specifies the column, or columns, that
make up the primary key. In a relational database all tables must have one, and only
one, primary key (PK) defined. Should a table have more than one column that
makes up the primary key then a comma separates them, eg:
FOREIGN KEY: This part of the command specifies the column, or columns, that
make up the foreign key (FK). A foreign key refers to the primary key of a parent
table; it links tables together and enforces referential integrity between them.
Table and column names:
Can include letters and numbers but must begin with a letter.
Cannot include spaces. However, if you wish to represent a space, use an
underscore (_), for example employee_number.
Cannot be a reserved word used in SQL, eg TABLE.
Cannot be the name of a SQL command, eg CREATE.
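The CREATE TABLE options above (NOT NULL, a compound PRIMARY KEY with the columns separated by a comma, and FOREIGN KEY) can be tried in SQLite. This is a minimal sketch with hypothetical department and enrolment tables. Note that SQLite needs PRAGMA foreign_keys = ON before it enforces foreign keys. It also does not imply NOT NULL for a non-integer primary key, so NOT NULL is written out explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_number CHAR(5)     NOT NULL,
        dept_name   VARCHAR(30) NOT NULL,
        PRIMARY KEY (dept_number)
    );
    CREATE TABLE enrolment (
        student_number INTEGER NOT NULL,
        course_code    CHAR(6) NOT NULL,
        dept_number    CHAR(5),
        PRIMARY KEY (student_number, course_code),
        FOREIGN KEY (dept_number) REFERENCES department (dept_number)
    );
""")


def rejected(sql):
    """True if the DBMS refuses the statement because it breaks a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


no_name = rejected("INSERT INTO department VALUES ('D1', NULL)")  # NOT NULL
first = rejected("INSERT INTO department VALUES ('D1', 'Computing')")
enrol = rejected("INSERT INTO enrolment VALUES (1, 'CS101', 'D1')")
same_key = rejected("INSERT INTO enrolment VALUES (1, 'CS101', 'D1')")  # duplicate compound key
bad_dept = rejected("INSERT INTO enrolment VALUES (2, 'CS101', 'D9')")  # no department D9
print(no_name, first, enrol, same_key, bad_dept)  # True False False True True
```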
DROP TABLE
If you totally mess things up and want to start over, you can always get rid of
any object you’ve created with a drop statement. The syntax is different for
tables and constraints.
This command removes the named table from the database. No confirmation
message is given: the table is simply removed. Therefore use the command with care.
For example, to remove the table employee from the database, use the SQL
command:
DROP TABLE employee;
ALTER TABLE
The ALTER TABLE statement may be used, as you have seen, to specify primary and
foreign key constraints, as well as to make other modifications to the table structure.
Key constraints may also be specified in the CREATE TABLE statement.
The first command will add the specified column to the specified table. For example,
it has been decided to allocate a company car to some employees. To add the
registration number of the car to the employee table, use the SQL command:
The specified column is added to the end of the table definition as an optional column
(it can hold NULL).
The second command will remove the specified column from the specified table. For
example, to remove the column car_registration added in the previous step, use the
SQL command:
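Adding the car_registration column can be tried in SQLite, which uses the form ALTER TABLE ... ADD COLUMN. This is a minimal sketch; the VARCHAR(10) type is an assumption, since the original command is not shown. Existing rows receive NULL in the new optional column. Dropping a column with ALTER TABLE ... DROP COLUMN is only available in SQLite 3.35.0 and later, so it is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, surname TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Smith')")

# Assumed type and length for the registration number.
conn.execute("ALTER TABLE employee ADD COLUMN car_registration VARCHAR(10)")

columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
existing = conn.execute(
    "SELECT car_registration FROM employee WHERE employee_number = 1").fetchone()
print(columns)   # ['employee_number', 'surname', 'car_registration']
print(existing)  # (None,): the existing row has NULL in the new column
```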
Renaming Tables
The syntax of this command is:
For example, to rename the table employee_copy to employee_Table, use the SQL
command:
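Renaming syntax differs between DBMSs. SQLite, for example, uses ALTER TABLE ... RENAME TO. This minimal sketch applies it to the employee_copy table from the text; the columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_copy (employee_number INTEGER PRIMARY KEY, surname TEXT)")

# SQLite's form of the rename command.
conn.execute("ALTER TABLE employee_copy RENAME TO employee_Table")

names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(names)  # ['employee_Table']
```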
In this section we will look at the INSERT command in detail. However, the
SELECT, UPDATE and DELETE commands will be discussed in the next section.
Insert Statement
To enter a value for every column in a row, for example, using the student table,
created as follows:
, date_of_birth DATE
, dept_number CHAR(5)
);
Insert Example
To enter all the data for the student Miss Janet Smith into the student table, the SQL
INSERT command could be:
INSERT INTO student
The values entered must be in the same order as the defined order of the columns in
the table.
There must be a value for every column in the table (any unknown values are
entered as the SQL reserved word NULL; see below).
All character strings and dates are enclosed in single quotes ('....').
When entering dates, they are specified as 'dd-mmm-yyyy' where mmm is the first
three characters of the month, ie Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct,
Nov, Dec. For example: '07-Sep-1949'.
If a column is optional (ie it is not specified as NOT NULL), and you do not wish to
enter a value, then use the SQL reserved word NULL when entering its value. For
example:
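The INSERT rules above can be tried in SQLite. This is a minimal sketch, and the student table here is a cut-down assumption: the student_number column and all column types are hypothetical, since only part of the original CREATE TABLE survives. SQLite does not use the 'dd-mmm-yyyy' format, so the date '07-Sep-1949' is written in ISO form as '1949-09-07'. The second row uses NULL for the optional date_of_birth column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_number INTEGER NOT NULL PRIMARY KEY,
        title          VARCHAR(4),
        forename       VARCHAR(20),
        surname        VARCHAR(20) NOT NULL,
        date_of_birth  DATE,
        dept_number    CHAR(5)
    )""")

# One value per column, in column order; strings and dates in single quotes.
conn.execute("INSERT INTO student VALUES (1, 'Miss', 'Janet', 'Smith', '1949-09-07', 'D1')")
# date_of_birth is optional, so NULL is entered when it is unknown.
conn.execute("INSERT INTO student VALUES (2, 'Mr', 'Tom', 'Brown', NULL, 'D1')")

rows = conn.execute(
    "SELECT surname, date_of_birth FROM student ORDER BY student_number").fetchall()
print(rows)  # [('Smith', '1949-09-07'), ('Brown', None)]
```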
WHERE
The WHERE clause sets a condition for a query: the database searches for the rows that
meet the WHERE condition. A typical query will have the following syntax.
A condition has two parts: the left side is the table column and the right side is the
condition to be met, eg WHERE table_column = condition. For
example:
FROM employee
Select Examples
The ORDER BY clause sorts the rows returned by a query on one or more columns, in
ascending (ASC, the default) or descending (DESC) order. It only changes the order in
which the rows are displayed, not how they are stored. A typical query will have the following syntax.
For example, list all employees in the property department. Return the results in
ascending order of forename within ascending order of surname.
SELECT *
FROM employee
WHERE depno = 22
ORDER BY surname, forename;
List all employees in the company. Return the results in descending order of date of
birth within ascending order of department number.
SELECT *
FROM employee
ORDER BY depno, date_of_birth DESC;
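Both ORDER BY requirements can be checked in SQLite on a few rows of made-up employee data. The column names surname, forename and date_of_birth are assumptions; depno = 22 is the property department from the text. The first listed column is the major sort key, and each further column only breaks ties within it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (surname TEXT, forename TEXT, "
             "depno INTEGER, date_of_birth TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    ("Smith", "Janet", 22, "1980-05-01"),
    ("Khan", "Omar", 22, "1975-11-20"),
    ("Smith", "Adam", 22, "1990-02-14"),
    ("Lee", "Mei", 30, "1985-07-09"),
])

# Forename ascending within surname ascending, property department only.
by_name = conn.execute("""
    SELECT surname, forename FROM employee
    WHERE depno = 22
    ORDER BY surname, forename""").fetchall()

# Date of birth descending within department number ascending.
by_dept = conn.execute("""
    SELECT depno, date_of_birth FROM employee
    ORDER BY depno ASC, date_of_birth DESC""").fetchall()

print(by_name)  # [('Khan', 'Omar'), ('Smith', 'Adam'), ('Smith', 'Janet')]
```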
BETWEEN
The BETWEEN ... AND operator selects a range of data between two values. These
values can be numbers, text, or dates. A typical query will have the following syntax.
FROM student
Note: The BETWEEN clause returns all the rows that meet the criteria 'greater than or
equal to' the lower limit and 'less than or equal to' the higher limit. Therefore, in the
above example, staff who have a salary of exactly £15,000 or £25,000 will be
selected, as well as all the salaries in between.
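The inclusive behaviour of BETWEEN can be confirmed in SQLite with made-up salaries placed on and just outside the limits of £15,000 and £25,000. The staff table is hypothetical. The second query writes the same condition as an explicit pair of comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?)", [
    ("Ana", 14999), ("Ben", 15000), ("Cy", 20000), ("Di", 25000), ("Ed", 25001),
])

between = conn.execute(
    "SELECT name FROM staff WHERE salary BETWEEN 15000 AND 25000 ORDER BY salary").fetchall()
# BETWEEN is inclusive, so it is equivalent to these two comparisons:
explicit = conn.execute(
    "SELECT name FROM staff WHERE salary >= 15000 AND salary <= 25000 ORDER BY salary").fetchall()

print(between)  # [('Ben',), ('Cy',), ('Di',)]
```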
LIKE
When dealing with strings, sometimes you do not want to match on exact strings like
='Blue', but instead on partial strings, substrings, or particular patterns. This could
allow you, for instance, to find all cars with a colour starting with 'B'. The LIKE
operator provides this functionality. The LIKE operator is used in place of an '=' sign.
In its basic form (with no special characters) it behaves like '='. For instance, both of
the following statements are equivalent:
The power of LIKE is that it supports two special characters, '%' and '_'. Whenever
there is an '_' character in the pattern, any single character will match. Whenever there
is a '%' character in the pattern, it matches any number of characters (including none).
A typical query will have the following syntax.
SQL supports two wildcard operators in conjunction with the LIKE operator
which are explained in detail in the following table.
The percent sign represents zero, one or multiple characters. The underscore
represents a single number or a character. These symbols can be used in
combinations.
Let us take a real example, consider the CUSTOMERS table having the records
as shown below.
Following is an example, which would display all the records from the
CUSTOMERS table, where the SALARY starts with 200.
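The two wildcards can be tried in SQLite on a small made-up customers table; the rows are hypothetical and are not the CUSTOMERS records referred to above. SQLite converts a numeric salary to text before matching it with LIKE, so '200%' matches 2000, 2001 and 20050. Note that SQLite's LIKE is case-insensitive for ASCII letters by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Ali", 2000), (2, "Bella", 1500), (3, "Carlos", 20050),
    (4, "Dana", 3000), (5, "Bob", 2001),
])


def ids(column, pattern):
    """Ids of rows where column LIKE pattern (column is a trusted constant)."""
    sql = f"SELECT id FROM customers WHERE {column} LIKE ? ORDER BY id"
    return [row_id for (row_id,) in conn.execute(sql, (pattern,))]


salary_starts_200 = ids("salary", "200%")  # '%' matches any number of characters
starts_with_b = ids("name", "B%")
second_letter_a = ids("name", "_a%")       # '_' matches exactly one character
print(salary_starts_200, starts_with_b, second_letter_a)  # [1, 3, 5] [2, 5] [3, 4]
```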
IS NULL
In SQL, a column can hold a special value known as NULL. This must
not be confused with spaces or zero. It means that no value has been assigned to this
column, ie the value is unknown.
To check whether a column holds a null value, the IS NULL clause is used in the
WHERE statement.
For example, to select all students whose date of birth has not yet been entered, a
SQL SELECT command could be:
SELECT title, forename, surname
FROM student
WHERE date_of_birth IS NULL;
Conversely, if you wish to select all students whose date of birth has been entered,
then the IS NOT NULL clause is used:
SELECT title, forename, surname
FROM student
WHERE date_of_birth IS NOT NULL;
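This minimal SQLite sketch, on a hypothetical student table, shows why IS NULL is needed. An empty string is a value, not NULL. A comparison written as = NULL is never true, so it matches no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (forename TEXT, surname TEXT, date_of_birth TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("Janet", "Smith", "1949-09-07"),
    ("Tom", "Brown", None),  # Python None is stored as SQL NULL
    ("Ravi", "Patel", ""),   # an empty string is a value, not NULL
])

is_null = conn.execute(
    "SELECT surname FROM student WHERE date_of_birth IS NULL").fetchall()
is_not_null = conn.execute(
    "SELECT surname FROM student WHERE date_of_birth IS NOT NULL ORDER BY surname").fetchall()
# '= NULL' compares with an unknown value, so it is never true.
equals_null = conn.execute(
    "SELECT surname FROM student WHERE date_of_birth = NULL").fetchall()

print(is_null, is_not_null, equals_null)  # [('Brown',)] [('Patel',), ('Smith',)] []
```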
UPDATE
To change the values in one or more columns you use the SQL UPDATE command.
The WHERE condition is optional but, if omitted, all the rows in the table are
changed. Normally, this is not desired. It is a very common mistake, so take care
when using this command.
For example:
UPDATE employee
Example
Consider the CUSTOMERS table having the following records:
The following query will update the ADDRESS for a customer whose ID
number is 6 in the table.
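The effect of the WHERE condition on UPDATE can be measured in SQLite with the cursor's rowcount, which reports how many rows were changed. This is a minimal sketch; the customer rows and the new address value are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(5, "Ana", "Leeds"), (6, "Ben", "York"), (7, "Cy", "Hull")])

# With WHERE: only the customer whose ID is 6 is changed.
cur = conn.execute("UPDATE customers SET address = 'Sheffield' WHERE id = 6")
changed_with_where = cur.rowcount
addresses = dict(conn.execute("SELECT id, address FROM customers"))

# Without WHERE: every row is changed, which is rarely what is wanted.
cur = conn.execute("UPDATE customers SET address = 'Sheffield'")
changed_without_where = cur.rowcount

print(changed_with_where, changed_without_where)  # 1 3
```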
DELETE
To delete rows from a table you use the SQL DELETE command.
The WHERE condition is optional but, if omitted, all the rows in the table are
deleted. Normally, this is not desired. It is a very common mistake, so take care when
using this command.
If the row being deleted is referenced by a foreign key in another table, then the delete
will fail (unless ON DELETE CASCADE is specified for the foreign key in the other table). For
example:
DELETE FROM Customers WHERE CustomerName='Alfreds Futterkiste';
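Both outcomes described above can be shown in SQLite with foreign keys switched on. This is a minimal sketch with hypothetical tables. Customer 1 is referenced by an order whose foreign key has no cascade, so deleting that customer fails. Customer 2 is referenced only by an invoice declared ON DELETE CASCADE, so deleting customer 2 also removes the invoice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id)
    );
    CREATE TABLE invoices (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id) ON DELETE CASCADE
    );
    INSERT INTO customers VALUES (1, 'Ada Ltd'), (2, 'Bytes plc');
    INSERT INTO orders VALUES (10, 1);
    INSERT INTO invoices VALUES (20, 2);
""")

try:
    conn.execute("DELETE FROM customers WHERE customer_id = 1")  # still referenced
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

conn.execute("DELETE FROM customers WHERE customer_id = 2")  # cascades to invoices
remaining_invoices = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
remaining_customers = [cid for (cid,) in conn.execute("SELECT customer_id FROM customers")]
print(blocked, remaining_invoices, remaining_customers)  # True 0 [1]
```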
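INNER JOIN
The INNER JOIN keyword combines rows from two tables where the join condition is met.
A typical query has the following syntax: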
SELECT column_name(s)
FROM table1
JOIN table2
ON table1.column_name=table2.column_name;
The following SQL statement will return all customers with orders:
Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
Note: The INNER JOIN keyword selects all rows from both tables as long as there
is a match between the columns. If there are rows in the "Customers" table that
do not have matches in "Orders", these customers will NOT be listed.
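The note above can be checked in SQLite with small made-up Customers and Orders tables. Customer 1 has no orders, so the INNER JOIN leaves that customer out of the result. The customer names and order numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Ada Ltd'), (2, 'Bytes plc'), (3, 'Cobol Co');
    INSERT INTO Orders VALUES (100, 2), (101, 3), (102, 3);
""")

rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    INNER JOIN Orders
    ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName, Orders.OrderID""").fetchall()

for name, order_id in rows:
    print(name, order_id)  # 'Ada Ltd' never appears: it has no matching order
```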