Database design, Normalization and SQL

The document discusses relational database design, focusing on creating schemas that minimize redundancy and facilitate easy information retrieval. It outlines design goals, types of anomalies (insert, update, delete), functional dependencies, and the normalization process to achieve third normal form (3NF) to prevent update anomalies. The normalization process is explained step-by-step, emphasizing the importance of identifying primary keys and ensuring all non-key attributes are fully dependent on these keys.

Uploaded by

alexkagwanja094
Copyright © All Rights Reserved

Relational Database Design

The goal of relational database design is to generate a set of relation schemas that allows us to
store information without unnecessary redundancy and to retrieve information easily.
A poorly designed relational database may lead to:
( i ) Repetition of data
( ii ) Inability to represent certain information.

Design Goals
(a) Avoid redundant data.
(b) Ensure that relationships among attributes are represented.
(c) Facilitate the checking of updates for violation of database integrity constraints.

Anomalies in Databases
Database anomalies are inconsistencies, such as unmatched or missing information, caused by
limitations or flaws in the design of a database. Databases are designed to collect data and sort or
present it in specific ways to the end user. Inserting, deleting or updating records can cause
problems that leave the database in an inconsistent state. There are three types of anomalies:
1. Insert Anomalies
2. Update Anomalies
3. Delete Anomalies

1. Insert Anomalies

The inability to insert part of the information into a relational schema because the remaining
part of the information is unavailable is called an insert anomaly.

Example: If a guide has no student registered under him, then we cannot insert the
guide’s information into the project schema.

2. Update Anomalies
Updating a relation schema with redundant data may lead to update anomalies.
Example: If a person changes his address, the updating should be carried out wherever the
copies occur. If it is not updated properly then data inconsistency arises.
3. Delete Anomalies
If the deletion of some information leads to loss of some other information, then we say
there is a deletion anomaly.
Example: If a guide guides one student and if the student discontinues the course then the
information about the guide will be lost.
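
The three anomalies can be illustrated with a small sketch. The relation below is hypothetical: the attribute names (student, guide, guide_room) and all values are assumptions made for illustration, since the notes do not list the columns of the project schema.

```python
# Hypothetical un-normalised "project" relation: one row per student,
# with the guide's details repeated in every row (assumed attributes).
project = [
    {"student": "S1", "guide": "G1", "guide_room": "R101"},
    {"student": "S2", "guide": "G1", "guide_room": "R101"},
    {"student": "S3", "guide": "G2", "guide_room": "R205"},
]


def insert_guide(rows, guide, room):
    """Insert anomaly: a guide without a student needs a row with a missing student."""
    rows.append({"student": None, "guide": guide, "guide_room": room})


def update_room_once(rows, guide, new_room):
    """Update anomaly: changing only the first copy leaves inconsistent data."""
    for row in rows:
        if row["guide"] == guide:
            row["guide_room"] = new_room
            break  # deliberately stop early to mimic an incomplete update


def delete_student(rows, student):
    """Delete anomaly: removing a student may remove the last trace of a guide."""
    return [row for row in rows if row["student"] != student]


def rooms_of(rows, guide):
    """All room values currently stored for one guide."""
    return {row["guide_room"] for row in rows if row["guide"] == guide}


def guides_in(rows):
    """All guides that still appear somewhere in the relation."""
    return {row["guide"] for row in rows}
```

Calling update_room_once(project, "G1", "R999") leaves two different rooms recorded for G1, and delete_student(project, "S3") removes every row that mentions guide G2.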

Redundant Information
Storing the same information several times is called redundancy.
Redundancy leads to
• Increase in size of the database
• Wastage of storage space
• Inconsistencies
Consider the following student table

Functional Dependency
A functional dependency is a constraint between two sets of attributes in a relation of a
database. Functional dependencies describe the relationships among the attributes within a
relation and provide a formal mechanism to express constraints between attributes. If attribute B
functionally depends on attribute A, then for every value of A you will know the respective
value of B.

Notation of Functional Dependency
The notation of functional dependency is A → B.

The meaning of this notation is:


1. “A” determines “B”
2. “B” is functionally dependent on “A”
3. “A” is called determinant
4. “B” is called object of the determinant
Example: Student_ID → GPA, the meaning is the grade point average (GPA) can be determined if
we know the student ID.
A → B does not imply B → A.
For example, in a student relation, if the value of the attribute “GPA” is known, the value of the
attribute “Student_ID” cannot be determined.

A functional dependency A → B is said to be trivial if B ⊆ A.
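
As a minimal sketch, the check below tests whether a dependency holds in one particular set of rows. The function name holds and the sample data are assumptions for illustration. Note that a sample of rows can only refute a dependency; whether it holds in general is a statement about the business rules, not about one snapshot of the data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical student relation (values invented for illustration).
students = [
    {"Student_ID": 1, "GPA": 3.5},
    {"Student_ID": 2, "GPA": 3.5},
    {"Student_ID": 3, "GPA": 2.9},
]
```

Here holds(students, ["Student_ID"], ["GPA"]) is true, while holds(students, ["GPA"], ["Student_ID"]) is false because two students share the same GPA.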

Compound Determinants
If more than one attribute is necessary to determine another attribute in an entity, then such a
determinant is termed a compound (or composite) determinant. For example, the internal marks and the
external marks scored by the student determine the grade of the student in a particular subject.
Internal mark, external mark → grade.
Since more than one attribute is necessary to determine the attribute grade, it is an example of
a compound determinant.

Types of Functional Dependency


There are three types of functional dependency:
(a) Full functional dependency
(b) Partial functional dependency
(c) Transitive functional dependency
(a) Full functional dependency
In a relation R, let X and Y be sets of attributes such that X functionally determines Y. Y is fully
functionally dependent on X if no proper subset of X functionally determines Y.

In the above example marks is fully functionally dependent on student_no and course_no together and
not on subset of {student_no, course_no}. This means marks cannot be determined either by
student_no or course_no alone. It can be determined only using student_no and course_no together.
Hence marks are fully functionally dependent on {student_no, course_no}.

(b) Partial functional dependency

Attribute Y is partially dependent on the attribute set X if it is dependent on a proper subset of X. For
example, course_name and instructor_name are partially dependent on the composite attributes {student_no,
course_no} because course_no alone determines course_name and instructor_name.

(c) Transitive functional dependency

X, Y and Z are three attributes in the relation R. If X → Y and Y → Z, then Z is transitively dependent
on X through Y. For example, grade depends on marks and marks in turn depend on {student_no, course_no};
hence grade depends transitively on {student_no, course_no}.
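
The three kinds of dependency can be explored with an attribute-closure computation. This is a sketch under stated assumptions: the helper names closure and is_partial are invented here, and the dependency list simply encodes the student/course example from the text.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency whenever its whole left-hand side is known.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_partial(key, attr, fds):
    """True if attr is determined by a proper subset of the composite key."""
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            if attr in closure(subset, fds):
                return True
    return False


# Dependencies taken from the student/course example in the text.
fds = [
    ({"student_no", "course_no"}, {"marks"}),
    ({"course_no"}, {"course_name", "instructor_name"}),
    ({"marks"}, {"grade"}),
]
key = {"student_no", "course_no"}
```

With these dependencies, grade appears in closure(key, fds) only because marks is derived first (a transitive dependency), is_partial(key, "course_name", fds) is true, and is_partial(key, "marks", fds) is false, so marks is fully dependent on the key.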

Uses of Functional Dependency

Functional dependencies are used in the following cases:

1. To test relations to see if they are legal under a given set of functional dependencies. If a relation r is
legal under a set F of functional dependencies, we say that r satisfies F.

2. To specify constraints on the set of legal relations. We say that F holds on R if all legal relations on R
satisfy the set of functional dependencies F.

RATIONALISING DATA USING NORMALISATION

Normalization is a formal approach to the analysis of data requirements that relies on there being some
document or other existing record (such as a program file) that lists the items of data that are to be
recorded. This set of data items is then analyzed using this process, which is known as relational data
analysis. The result is a set of relations in what is known as third normal form.
The relational data analysis (or normalisation) process is used to develop a set of theoretical groupings
of attributes. In this process, we determine the identifying attributes (the ‘key attributes’) – those
attributes that contribute to the unique identification of instances of the entity types or object classes –
and the non-identifying attributes (the ‘non-key attributes’) – the remaining attributes – and the
relationships between them. One of the fundamental concepts of the relational model of data is that if
we know the value of the key attributes for an instance of an entity type or object class we are able to
determine the values of the non-key attributes. For example, in our vehicle hire business, if we have a
unique identification number for each of our vehicles (say, the vehicle registration number AB63 MTO)
then we can determine the model (VW Polo S 1.0 3dr manual 2014) and colour (red) of the vehicle. We
cannot do this in reverse – knowing the model of a vehicle does not mean that we can determine the
vehicle registration number.

Relational data analysis is used to minimise the possibility of ‘update anomalies’. These update
anomalies can occur whenever there is duplication of data, which can lead to the possibility of one
instance of the data being changed (or updated) but not another. Update anomalies can occur
whenever data is updated, created or deleted. For example, if a customer’s address is held in more than
one place and they notify us that they are moving, all instances of the customer’s address must be
updated to the new address or some mail could still go to the old address of the customer – this was a
common problem with banks in the United Kingdom until a few years ago.

THE RULES OF NORMALISATION

We use the process of relational data analysis – the development of a set of relations in third normal
form – to develop a logical database design in which there will be no update anomalies when data is
input into the database or later updated. In general terms, this means that for any piece of information
there is only one place in the database where the data representing the information can be stored and
that place is unambiguously recognised. This process of relational data analysis allows us to confirm that
we are associating attributes that are all about one thing (person, product, and so on) or concept
(promotion, order, account, transaction and so on) of interest to the business.

In a series of additional papers published in late 1970 and 1971, Edgar Codd proposed three normal
forms known as first normal form (commonly abbreviated to 1NF or FNF), second normal form (2NF or
SNF) and third normal form (3NF or TNF). These three normal forms represent a process through which
the data design can evolve to what is known as a normalised result. Later developments have added
further normal forms, such as Boyce-Codd normal form (a stricter form of third normal form), fourth
normal form, fifth normal form and sixth normal form, but we are only going to consider the first three
normal forms. For most practical purposes, data needs to be in third normal form for it to be considered
‘normalised’. The term ‘third normal form’ is used to indicate that data has been normalised and has
no particular meaning other than that the checks inherent in the normalisation process have been
applied to the data.

The definitions of these first three normal forms are:

A relation is in first normal form (1NF or FNF) if and only if all underlying domains contain atomic
values only.
A relation is in second normal form (2NF or SNF) if and only if it is in first normal form and every
non-key attribute is fully dependent on the primary key.

A relation is in third normal form (3NF or TNF) if and only if it is in second normal form and every
non-key attribute is non-transitively dependent on the primary key.

THE NORMALISATION PROCESS

The normalisation process is a bottom-up process. We start with a group of data items and then work
through the process until we have a set of relations in third normal form. The data items are normally
found on the forms, reports or input screens which are used in the business.

The figure below shows a paper-based staff record form used by the human resources department of our
vehicle hire business.

Figure 6.2 The staff record form

Studying this form, we can identify a number of data items. These are:

• the name of the employee;
• the staff number of the employee;
• the date of birth of the employee;
• the National Insurance number of the employee;
• the telephone number of the employee;
• the address of the employee;
• any disabilities of the employee;
• the start date of the employment of the employee;
• the number of the branch where the employee is employed;
• the location of the branch where the employee is employed;
• the telephone number of the branch where the employee is employed;
• for each of the emergency contacts of the employee:
    • the name of the emergency contact;
    • the relationship of the emergency contact to the employee;
    • the address of the emergency contact;
    • the telephone number of the emergency contact;
• for each qualification held by the employee:
    • the title of the qualification;
    • the qualification award date of the qualification;
    • the awarding body of the qualification;
    • the renewal date of the qualification.

One of these data items is selected as a unique identifier which will eventually become the primary key
of one of our relations. In this case, an appropriate unique identifier is the staff number of the
employee.

We call this initial collection of data items the ‘un-normalised form’ (often abbreviated to UNF). These
are placed on our ‘normalisation form’.

The accepted convention is that all of the entries on this form are in lower case. Beside each entry is a
level indication – ‘1’ or ‘2’. All entries for which there are groups of data items that can be repeated –
emergency contacts and qualification – are indicated with the level 2. These level 2 data items are
known as repeating groups. All other data items (the ‘top of the form’ data items) are at level 1.

FIRST NORMAL FORM

The first stage in our normalisation process is to move from this un-normalised form to produce a set of
relations that are in first normal form. A relation is in first normal form if and only if all underlying
domains contain atomic values only, that is, if all the values taken by the attributes of that relation are
atomic or scalar values. The attributes are said to be single-valued. This is to comply with the rule that
there must not be any multiple-valued attributes in a relation. To move to first normal form we need to
remove the repeating groups from the initial ‘relation’.

Each of the repeating groups then becomes a relation in its own right. In our example, we have two new
relations: one for the emergency contact details; and one for the qualifications held by the employee.

For each of these new relations we need to identify a primary key. For the employee’s emergency
contacts, the name is sufficient to uniquely identify one contact among the contacts recorded for a
single employee. But we need to uniquely identify an emergency contact among all the emergency
contacts recorded in the company. In any but the smallest company it is possible for two employees to
have an emergency contact called John Davies, and for them to be two different people. The name of
the emergency contact is, therefore, insufficient to uniquely identify an emergency contact within the
company. What we really need to uniquely identify an emergency contact is both the name of the
contact and an indication, such as the staff number, of the employee for whom this person is an
emergency contact. The primary key of this relation is, therefore, the combination of ‘staff number’ and
‘emergency contact name’.

Our first normal form relations are shown in Figure 6.4. These are now relations as they comply with all
the rules we listed earlier. Each of the data items listed is now an attribute.

Both of our new relations have a primary key identified (in this case by underlining the relevant
attributes). Both of these primary keys comprise at least two attributes, the primary key of the main
relation and sufficient extra attributes to uniquely identify each instance of the real-world thing that the
relation represents. As explained above, for the emergency contacts relation, this is the combination of
the employee’s ‘staff number’ attribute and the ‘emergency contact name’ attribute.

For the relation representing the employee’s qualifications there are three elements to the primary key
since three items of data are required to uniquely identify each employee’s qualification. One of these is
the employee’s ‘staff number’ attribute, as before. There is also the ‘qualification title’ attribute. But the
combination of the employee’s staff number and the title of the qualification is insufficient to uniquely
identify a qualification held by an employee. Inspection of the form shows that Rachel Davies holds two
first aid qualifications, one of which is now out of date. So, to uniquely identify a qualification held by an
employee we also need another item of information, the date that the qualification was awarded. Hence
the primary key of the relation representing the qualifications held by employees is the combination of
the ‘staff number’, ‘qualification title’ and ‘qualification award date’ attributes.

When the primary key of a relation appears as an attribute in another relation it is known as a foreign
key in the other relation. In this case, ‘staff number’ is a foreign key in our two new relations.
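
The removal of repeating groups can be sketched as follows. The record layout and the function name to_first_normal_form are assumptions for illustration, and the contact names and dates are placeholders; only the staff number, the employee name and the fact that two first aid qualifications are held come from the example above.

```python
# Un-normalised record: the two lists are the level 2 repeating groups.
unf_record = {
    "staff_number": "UK365",
    "name": "Miss Rachel Sarah Davies",
    "emergency_contacts": [
        {"name": "Contact A", "relationship": "father"},   # placeholder name
        {"name": "Contact B", "relationship": "fiancé"},   # placeholder name
    ],
    "qualifications": [
        {"title": "First Aid", "award_date": "2010-01-01"},  # placeholder dates
        {"title": "First Aid", "award_date": "2013-01-01"},
    ],
}


def to_first_normal_form(record):
    """Split one record with repeating groups into three flat relations."""
    staff_number = record["staff_number"]
    # Level 1 items stay in the main relation.
    employee = {k: v for k, v in record.items() if not isinstance(v, list)}
    # Each repeating group becomes its own relation, carrying the
    # staff number as a foreign key and as part of its primary key.
    contacts = [
        {"staff_number": staff_number, **c} for c in record["emergency_contacts"]
    ]
    qualifications = [
        {"staff_number": staff_number, **q} for q in record["qualifications"]
    ]
    return employee, contacts, qualifications
```

Every attribute in the three resulting relations is single-valued, and the combination of staff number, title and award date distinguishes the two first aid qualifications.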

SECOND NORMAL FORM

The next stage of our relational data analysis process is to move our relations to second normal form
(2NF). For a relation to be in second normal form, it has to be in first normal form and, in addition, it
must meet the condition that every attribute that is not part of the primary key is dependent on the
whole of the primary key. In other words, there must be no ‘part-key dependencies’. More formally, a
relation is in second normal form if and only if it is in first normal form and every non-key attribute is
fully dependent on the primary key.

We saw the concept of dependence earlier. If we have two attributes, ‘staff number’ and ‘name’, and we
know the value of the attribute ‘staff number’, we can determine the value of the attribute ‘name’. For
example, given the staff number UK365, we know that the name of that employee is Miss Rachel Sarah
Davies. We say that ‘name’ is dependent on ‘staff number’.

We achieve second normal form by reviewing every attribute that is not part of the primary key in the
first normal form relations to see if any of those attributes are dependent on one or more parts of a
multiple-part primary key but not the whole primary key. If so, then those attributes need to be
removed.

Our main first normal form relation has ‘staff number’ as its primary key. This is a single-part primary key
(equivalent to a simple identifier) and, therefore, each of the attributes in that relation must be
dependent on the whole of the primary key. This first normal form relation is, therefore, also in second
normal form.

Let us now consider the relation for emergency contacts. This relation has a multiple-part primary key of
two attributes – ‘employee staff number’ and ‘emergency contact name’ – and three non-key attributes
– ‘emergency contact relationship’, ‘emergency contact address’ and ‘emergency contact telephone
number’. If we know the combined values of ‘employee staff number’ and ‘emergency contact name,’
we can determine the values of the non-key attributes. The test is to take each non-key attribute in turn
and see if it is possible to determine its value using only a part of the primary key.

For ‘emergency contact relationship,’ we need to know the values of both ‘employee staff number’ and
‘emergency contact name’ to know its value. Knowing just the value of ‘employee staff number’ is
insufficient. The employee with staff number UK365 has both a father and a fiancé as emergency
contacts. Similarly knowing just the value of ‘emergency contact name’ is insufficient; it is possible that
more than one employee has an emergency contact called Mr John Davies.

The same is true for both ‘emergency contact address’ and ‘emergency contact telephone number’. This
relation for emergency contacts is, therefore, also in second normal form.

Let us now consider the other new relation, that for employee qualifications. This has a multiple-part
primary key of three attributes – ‘employee staff number’, ‘qualification title’ and ‘qualification award
date’ – and two non-key attributes – ‘qualification awarding body’ and ‘qualification renewal date’. We
need to know the values of all three elements of the primary key to determine the value of ‘qualification
renewal date’. The same is not true for ‘qualification awarding body’. The value of this attribute is not
dependent on the value of ‘employee staff number’ or on the value of ‘qualification award date’. The
value for ‘qualification awarding body’ is only dependent on the value of ‘qualification title’ (assuming
that only one body issues any qualification of a given name or title; this may not be true in the real world
but we make that assumption for the purposes of this exercise). To create our second normal form
relations we need, therefore, to remove ‘qualification awarding body’ to another relation. This new
relation also needs ‘qualification title’ as its primary key since we know that the value of ‘qualification
awarding body’ is dependent on ‘qualification title’.

The second normal form relations are now shown in Figure 6.5.
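
A minimal sketch of removing a part-key dependency is given below. The function name split_partial_dependency, the short attribute names and all sample values are assumptions; the sketch also assumes, as the text does, that each qualification title has exactly one awarding body.

```python
# Hypothetical first normal form rows for employee qualifications.
qualification_rows = [
    {"staff_number": "UK365", "title": "First Aid", "award_date": "2010-01-01",
     "awarding_body": "Body A", "renewal_date": "2013-01-01"},
    {"staff_number": "UK365", "title": "First Aid", "award_date": "2013-01-01",
     "awarding_body": "Body A", "renewal_date": "2016-01-01"},
    {"staff_number": "UK400", "title": "Driving Instructor", "award_date": "2012-06-01",
     "awarding_body": "Body B", "renewal_date": "2017-06-01"},
]


def split_partial_dependency(rows, part_key, dependent):
    """Move an attribute that depends on only part of the key into a new relation.

    Assumes the dependency part_key -> dependent really holds in rows.
    """
    lookup = {}
    remainder = []
    for row in rows:
        lookup[row[part_key]] = row[dependent]
        # The original relation keeps everything except the moved attribute.
        remainder.append({k: v for k, v in row.items() if k != dependent})
    new_relation = [{part_key: k, dependent: v} for k, v in lookup.items()]
    return remainder, new_relation
```

Calling split_partial_dependency(qualification_rows, "title", "awarding_body") yields the employee qualification rows without the awarding body, plus a new relation with one row per qualification title.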

THIRD NORMAL FORM

We now need to move to third normal form. For a relation to be in third normal form it has to be in
second normal form and also has to meet the condition that every attribute that is not part of the
primary key is not dependent on an attribute that is also not part of the primary key. In other words,
there are no ‘inter-data dependencies’. More formally, a relation is in third normal form if and only if it
is in second normal form and every non-key attribute is non-transitively dependent on the primary key.

We achieve third normal form by reviewing every attribute that is not part of the primary key in the
second normal form relations to see if any of those attributes are dependent on another attribute that is
not part of the primary key. If so, then those attributes need to be removed.

In the second normal form relation for employees, the value of ‘name’ is dependent on the value of
‘staff number’, the primary key. If I know that the value of the ‘staff number’ attribute is UK365, I can
determine that the value of the ‘name’ attribute is Miss Rachel Sarah Davies. The same is true of the
‘date of birth’, ‘ni number’, ‘telephone number’, ‘address’, ‘disabilities’, ‘start date’ and ‘branch number’
attributes. However, if I know the value of the ‘branch number’ attribute then I can determine the
values of both the ‘branch location’ and the ‘branch telephone number’ attributes. The attributes
‘branch location’ and the ‘branch telephone number’ are not, therefore, directly dependent on the
primary key, the ‘staff number’ attribute. We say that the ‘branch location’ and the ‘branch telephone
number’ are transitively dependent (that is, indirectly dependent) on ‘staff number’ through ‘branch
number’.

The ‘branch location’ and the ‘branch telephone number’ attributes are, therefore, removed from the
relation for employees to form a new relation. This new relation also needs ‘branch number’ as its
primary key since both the ‘branch location’ and the ‘branch telephone number’ are dependent on
‘branch number’.

The attribute ‘branch number’ remains in the employee relation as a foreign key. So that it can be
identified as a foreign key that is not part of the primary key, it is marked with an asterisk (*).
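
The following sketch shows the effect of removing the transitive dependency and checks that no information is lost: joining the two new relations on the foreign key reproduces the original rows. The function names, short attribute names and all values other than UK365 and the employee's name are invented for illustration.

```python
# Hypothetical second normal form employee rows (branch details repeated).
employees_2nf = [
    {"staff_number": "UK365", "name": "Miss Rachel Sarah Davies",
     "branch_number": "B1", "branch_location": "Town A", "branch_telephone": "000-0001"},
    {"staff_number": "UK366", "name": "Employee B",
     "branch_number": "B1", "branch_location": "Town A", "branch_telephone": "000-0001"},
    {"staff_number": "UK400", "name": "Employee C",
     "branch_number": "B2", "branch_location": "Town B", "branch_telephone": "000-0002"},
]

# Attributes that depend on branch_number rather than directly on staff_number.
BRANCH_ATTRS = ("branch_location", "branch_telephone")


def decompose(rows):
    """Split employee rows into an employee relation and a branch relation."""
    branches = {}
    employees = []
    for row in rows:
        branches[row["branch_number"]] = {a: row[a] for a in BRANCH_ATTRS}
        # branch_number stays behind in the employee relation as a foreign key.
        employees.append({k: v for k, v in row.items() if k not in BRANCH_ATTRS})
    return employees, branches


def rejoin(employees, branches):
    """Join the two relations on the foreign key branch_number."""
    return [{**e, **branches[e["branch_number"]]} for e in employees]
```

After decomposition each branch is stored once, so a change to a branch telephone number is made in exactly one place.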

In the second normal form relation for emergency contacts for employees, all the non-key attributes are
directly dependent on the primary key. This relation is, therefore, also in third normal form.

Similarly, in the second normal form relation for the employee qualifications, all the non-key attributes
are directly dependent on the primary key. This relation is, therefore, also in third normal form.

In the second normal form relation for qualifications, there is only one non-key attribute (‘qualification
awarding body’). This means that this relation is automatically in third normal form – with only one non-
key attribute it is impossible to have a transitive dependency.
The second normal form relation for employee qualifications also has only one non-key attribute
(‘qualification renewal date’), so this relation is also automatically in third normal form.

The third normal form relations are now shown in Figure 6.6. The last column of the form is populated
with the selected names for the third normal form relations.

With our relations now in third normal form this is as far as we go with the process of relational data
analysis.

THE THIRD NORMAL FORM DATA MODEL

We can now use these third normal form relations to create a new data model as in Figure 6.7.

Figure 6.7 The third normal form data model
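
As a rough sketch, the third normal form relations described above could be declared as tables using Python's standard sqlite3 module. The table and column names are assumptions (the names selected in Figure 6.6 are not reproduced in these notes), and several employee attributes are omitted for brevity.

```python
import sqlite3

# Assumed table and column names; employee attributes are abbreviated.
SCHEMA = """
CREATE TABLE branch (
    branch_number TEXT PRIMARY KEY,
    branch_location TEXT,
    branch_telephone_number TEXT
);
CREATE TABLE employee (
    staff_number TEXT PRIMARY KEY,
    name TEXT,
    branch_number TEXT REFERENCES branch (branch_number)
);
CREATE TABLE emergency_contact (
    staff_number TEXT REFERENCES employee (staff_number),
    emergency_contact_name TEXT,
    emergency_contact_relationship TEXT,
    PRIMARY KEY (staff_number, emergency_contact_name)
);
CREATE TABLE qualification (
    qualification_title TEXT PRIMARY KEY,
    qualification_awarding_body TEXT
);
CREATE TABLE employee_qualification (
    staff_number TEXT REFERENCES employee (staff_number),
    qualification_title TEXT REFERENCES qualification (qualification_title),
    qualification_award_date TEXT,
    qualification_renewal_date TEXT,
    PRIMARY KEY (staff_number, qualification_title, qualification_award_date)
);
"""


def build_database():
    """Create an in-memory database containing the five relations."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Names of all tables currently defined in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}
```

Each primary key matches the one identified during normalisation, and each foreign key points back to the relation from which the attribute was carried over.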


EXERCISES

The following diagram represents the record of a service call made by a service engineer to a domestic
company to repair an appliance.
Analyze the entries on this form and use relational data analysis to determine the third normal form relations.

Using the third normal form relations discovered in Exercise above, develop a third normal form model to
represent the information re

THE DATA LANDSCAPE

We can consider the ‘data landscape’ as consisting of six categories of data: structured data,
unstructured data, semi-structured data, master data, meta data and, finally, what has become known
as ‘big data’. These six categories, and their relationships to each other, are shown in Figure 11.1.

The simplest of these categories to understand is structured data – the numbers, dates and short
character strings that have traditionally been capable of storage within a database. Most people these
days think of this data being stored in a database organised into tables and columns as shown in Figure
11.2 (although that is not the only way to store structured data).

If that is structured data, what is unstructured data? This is a term that is applied to data that exists in
documents, in drawings, in audio and video clips, and so on. This data is sometimes called multimedia
data and is characteristically very large, with a size for an individual item of data ranging from kilobytes
to gigabytes. Note that while these types of data are often considered to be ‘unstructured’, they may
have well-defined (even standardised) structures – think MP3 for audio and MP4 for video. Another
characteristic of unstructured data is that it is not easily searched using traditional search algorithms.
This data can be stored and managed by database management systems that have kept pace with
developments of the SQL standard. Although it is not easy to manage, unstructured data can provide the
business with valuable information. In most businesses, the quantity of unstructured data far exceeds
the quantity of structured data. It is just that most of the unstructured data is not managed as data – it
just exists within the business.

SQL

• SQL stands for Structured Query Language. It is used for storing and managing data in a
relational database management system (RDBMS).
• It is a standard language for relational database systems. It enables a user to create, read,
update and delete relational databases and tables.
• RDBMSs such as MySQL, Informix, Oracle, MS Access and SQL Server use SQL as
their standard database language.
• SQL allows users to query the database in a number of ways, using English-like
statements.

SQL follows the following rules:

• SQL keywords are not case sensitive. By convention, keywords of SQL are written in
uppercase.
• SQL statements are not tied to text lines. A single SQL statement can be written on
one line or spread over multiple lines.
• Using SQL statements, you can perform most of the actions in a database.
• SQL is based on tuple relational calculus and relational algebra.

Semicolon after SQL Statements?

Some database systems require a semicolon at the end of each SQL statement. Semicolon is the
standard way to separate each SQL statement in database systems that allow more than one SQL
statement to be executed in the same call to the server.
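
A small sketch using Python's standard sqlite3 module shows semicolons separating several statements sent in one call. The vehicle table, its columns and the second registration are assumptions for illustration; other database systems and drivers may treat the final semicolon differently.

```python
import sqlite3

# Three statements in one script, separated by semicolons.
SCRIPT = """
CREATE TABLE vehicle (registration TEXT PRIMARY KEY, colour TEXT);
INSERT INTO vehicle VALUES ('AB63 MTO', 'red');
INSERT INTO vehicle VALUES ('XY00 ZZZ', 'blue');
"""


def run_script(script):
    """Run several semicolon-separated statements in one call, then query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    # A single statement sent on its own needs no trailing semicolon here.
    rows = conn.execute(
        "SELECT registration, colour FROM vehicle ORDER BY registration"
    ).fetchall()
    conn.close()
    return rows
```

In sqlite3, executescript accepts a whole script, whereas execute accepts one statement at a time, which is why the separator matters only in the first case.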

SQL process:
• When an SQL command is executed on any RDBMS, the system determines the
best way to carry out the request, and the SQL engine determines how to interpret the
task.
• Various components are involved in this process. These components include the optimization
engine, query engine, query dispatcher and classic query engine.
• All the non-SQL queries are handled by the classic query engine, but the SQL query engine
won't handle logical files.

Characteristics of SQL

• SQL is easy to learn.
• SQL is used to access data from relational database management systems.
• SQL can execute queries against the database.
• SQL is used to describe the data.
• SQL is used to define the data in the database and manipulate it when needed.
• SQL is used to create and drop the database and table.
• SQL is used to create a view, stored procedure, function in a database.
• SQL allows users to set permissions on tables, procedures, and views.

Advantages of SQL
• High speed - Using SQL queries, the user can quickly and efficiently retrieve a large
number of records from a database.
• No coding needed - In standard SQL, it is very easy to manage the database system. It
doesn't require a substantial amount of code to manage the database system.
• Well-defined standards - SQL databases follow long-established standards defined
by ISO and ANSI.
• Portability - SQL can be used on laptops, PCs, servers and even some mobile phones.
• Interactive language - SQL is a domain-specific language used to communicate with the database.
It can also be used to get answers to complex questions in seconds.
• Multiple data views - Using the SQL language, users can create different views of the
database structure.

SQL Commands
Some of the most important SQL commands include:

• SELECT - extracts data from a database
• UPDATE - updates data in a database
• DELETE - deletes data from a database
• INSERT INTO - inserts new data into a database
• CREATE DATABASE - creates a new database
• ALTER DATABASE - modifies a database
• CREATE TABLE - creates a new table
• ALTER TABLE - modifies a table
• DROP TABLE - deletes a table
• CREATE INDEX - creates an index (search key)
• DROP INDEX - deletes an index

SQL DML and DDL

SQL can be divided into:

• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Control Language (DCL)

Data Definition Language (DDL)


The DDL part of SQL permits database tables to be created or deleted. It also defines indexes
(keys), specifies links between tables, and imposes constraints between tables. The most
important DDL statements in SQL are:

• CREATE DATABASE - creates a new database
• ALTER DATABASE - modifies a database
• CREATE TABLE - creates a new table
• ALTER TABLE - modifies a table
• DROP TABLE - deletes a table
• CREATE INDEX - creates an index (search key)
• DROP INDEX - deletes an index

Data Manipulation Language (DML)


The query and update commands form the DML part of SQL:

• SELECT - extracts data from a database
• UPDATE - updates data in a database
• DELETE - deletes data from a database
• INSERT INTO - inserts new data into a database

Data Control Language (DCL)

The DCL part of SQL controls who may access the data:

• GRANT - gives a privilege to a user
• REVOKE - takes back privileges granted to a user
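
The split between DDL and DML can be tried out without installing a database server. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database, and the students table, its columns, and the sample names are invented for the example. SQLite has no user accounts, so the DCL statements GRANT and REVOKE are not shown.

```python
import sqlite3

# In-memory SQLite database; the table and its contents are made up for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure (a table and an index on it).
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, course TEXT)")
cur.execute("CREATE INDEX idx_students_name ON students (name)")

# DML: insert, update, and delete rows, then query what is left.
cur.execute("INSERT INTO students (name, course) VALUES (?, ?)", ("Amina", "Databases"))
cur.execute("INSERT INTO students (name, course) VALUES (?, ?)", ("Brian", "Networks"))
cur.execute("UPDATE students SET course = ? WHERE name = ?", ("Databases", "Brian"))
cur.execute("DELETE FROM students WHERE name = ?", ("Amina",))
rows = cur.execute("SELECT id, name, course FROM students").fetchall()
print(rows)

# DDL again: remove the index and the table.
cur.execute("DROP INDEX idx_students_name")
cur.execute("DROP TABLE students")
conn.close()
```

The DDL statements change what the database contains structurally, while the DML statements only change or read the rows inside an existing table.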

MS ACCESS

Microsoft Access is a Database Management System (DBMS) from Microsoft that combines the
relational Microsoft Jet Database Engine with a graphical user interface and software development
tools. It is a member of the Microsoft Office suite of applications, included in the professional and higher
editions.

ORACLE

Oracle Database is a relational database management system, also called OracleDB or
simply Oracle. It is produced and marketed by Oracle Corporation. Its development
began in 1977 under Lawrence Ellison and other engineers. It is one of the most
popular relational database engines in the IT market for storing, organizing, and
retrieving data. Oracle Database was the first database designed for enterprise grid
computing and data warehousing. Enterprise grid computing provides a flexible and
cost-effective way to manage information and applications. Oracle uses SQL as the
language for interacting with the database.

MySQL

• MySQL is an open-source SQL database, originally developed by the Swedish company MySQL AB.

• MySQL supports many different platforms, including Microsoft Windows, the major Linux
distributions, UNIX, and Mac OS X.

• MySQL has free and paid versions, depending on its usage (non-commercial/commercial) and
features. MySQL comes with a very fast, multi-threaded, multi-user, and robust SQL database
server.

• MySQL is used by many database-driven web applications, including WordPress, Facebook,
Twitter, YouTube, and Google, to provide fast, reliable storage and retrieval of website and app
data such as user profiles, content, and statistics.

MS SQL Server
MS SQL Server is a relational database management system (RDBMS) developed by Microsoft. Its
basic function is storing and retrieving data as requested by other applications. Those applications
can run either on the same computer or on another computer across a network.

XAMPP

XAMPP is an abbreviation in which X stands for Cross-Platform, A stands for Apache, M stands
for MySQL (the package also ships MariaDB), and the Ps stand for PHP and Perl, respectively. It is an
open-source web server package that bundles the Apache HTTP Server, the MySQL/MariaDB
database, and interpreters for the PHP and Perl programming languages, along with command-line
executables.

How to use XAMPP to Create a MySQL Database?


Opening XAMPP

Go to your system’s XAMPP folder or simply click the XAMPP Icon to open it. The Control Panel is now
visible, and you may use it to start or stop any module.

Starting XAMPP

Select the “Start” option for the Apache and MySQL modules, respectively, and wait until the
Control Panel shows both modules as running.

Accessing Admin

Next, select the MySQL Module and click the “Admin” button. You will immediately be redirected to the
following address in a web browser:
http://localhost/phpmyadmin

Creating a Database

A number of tabs appear, including Databases, SQL, User Accounts, Export, Import, Settings, and so on.

Select the “Databases” tab.

The Create option is visible there.

In the database name field, type the name guestbook and click Create.

Creating a Table

Next, create a table named entries. There are two ways to create a table.

You can use the Create Table feature:

• On the phpMyAdmin screen, choose the guestbook database.
• Select the Structure tab.
• Under Create table, give a table name and the number of columns.
• Click the Go button. This prompts you to input the column information.

Alternatively, use an SQL command to create the table:

• On the phpMyAdmin screen, choose the SQL tab.
• Enter the following code.

USE guestbook;

CREATE TABLE entries (
    guestName VARCHAR(255),
    content VARCHAR(255),
    entryID INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (entryID)
);


If you have already selected the guestbook database (on the left panel), you do not need to include USE
guestbook in your SQL query.

To run the command, simply click the Go button.

Insert Data

There are two ways to insert data into a table.

You can use the Insert feature:

• On the phpMyAdmin screen, pick the guestbook database and then the entries table.
• Select the Insert tab.
• Enter the value for each column for each data record that will be added.
• Click the Go button.

Alternatively, use an SQL command to insert data:

• On the phpMyAdmin screen, pick the guestbook database and then the entries table.
• Select the SQL tab.
• Enter the following code.

INSERT INTO entries (guestName, content) values ("Humpty", "Humpty's here!");

INSERT INTO entries (guestName, content) values ("Dumpty", "Dumpty's here too!");

To run the command, simply click the Go button.

Retrieve Data

There are two ways to retrieve data from a table.

You can use the Browse option:

• On the phpMyAdmin screen, pick the guestbook database and then the entries table.
• Select the Browse tab. This will display all existing records in the table.

Alternatively, use an SQL command to get data:

• On the phpMyAdmin screen, pick the guestbook database and then the entries table.
• Select the SQL tab.
• Enter the following code.

SELECT * FROM entries;

To run the command, simply click the Go button.
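
The same create, insert, and retrieve sequence can also be run from a script. The following is a rough sketch rather than part of the XAMPP workflow: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL, so AUTO_INCREMENT is written in SQLite's spelling (INTEGER PRIMARY KEY AUTOINCREMENT). The table and column names are the ones used above.

```python
import sqlite3

# In-memory SQLite database standing in for the guestbook database.
conn = sqlite3.connect(":memory:")

# SQLite spelling of the MySQL definition: AUTO_INCREMENT becomes
# INTEGER PRIMARY KEY AUTOINCREMENT.
conn.execute(
    "CREATE TABLE entries ("
    "guestName VARCHAR(255), content VARCHAR(255), "
    "entryID INTEGER PRIMARY KEY AUTOINCREMENT)"
)

# Parameter placeholders (?) avoid quoting problems such as the apostrophes here.
conn.executemany(
    "INSERT INTO entries (guestName, content) VALUES (?, ?)",
    [("Humpty", "Humpty's here!"), ("Dumpty", "Dumpty's here too!")],
)

entries = conn.execute("SELECT * FROM entries ORDER BY entryID").fetchall()
for row in entries:
    print(row)
conn.close()
```

The entryID values are assigned by the database, which is why the INSERT statements list only guestName and content.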

Importing SQL File

Create a blank file called friendbook.sql. Paste the following text into the file.

CREATE TABLE friends (
    friendName VARCHAR(255),
    phone VARCHAR(255),
    entryID INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (entryID)
);

INSERT INTO friends (friendName, phone) VALUES ("Humpty", "111-111-1111");
INSERT INTO friends (friendName, phone) VALUES ("Dumpty", "222-222-2222");

• On the phpMyAdmin screen, choose the guestbook database.
• Select the Import tab.
• Choose the .sql file to import.

To run the import, simply click the Go button.

Export SQL File

• On the phpMyAdmin screen, choose the guestbook database.
• Select the Export tab.
• Click the Go button to run the export.

When entering a query in the SQL tab, both Mac and Windows users may press Control+Enter to run
it instead of clicking the Go button.


SQL JOIN
A JOIN clause is used to combine rows from two or more tables, based on a
related column between them.

Let's look at a selection from the "Orders" table:

OrderID | CustomerID | OrderDate
10308   | 2          | 1996-09-18
10309   | 37         | 1996-09-19
10310   | 77         | 1996-09-20

Then, look at a selection from the "Customers" table:

CustomerID | CustomerName                       | ContactName    | Country
1          | Alfreds Futterkiste                | Maria Anders   | Germany
2          | Ana Trujillo Emparedados y helados | Ana Trujillo   | Mexico
3          | Antonio Moreno Taquería            | Antonio Moreno | Mexico

Notice that the "CustomerID" column in the "Orders" table refers to the
"CustomerID" in the "Customers" table. The relationship between the two tables
above is the "CustomerID" column.

Then, we can create the following SQL statement (that contains an INNER
JOIN), that selects records that have matching values in both tables:

Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;

and it will produce something like this:

OrderID | CustomerName                       | OrderDate
10308   | Ana Trujillo Emparedados y helados | 9/18/1996
10365   | Antonio Moreno Taquería            | 11/27/1996
10383   | Around the Horn                    | 12/16/1996
10355   | Around the Horn                    | 11/15/1996
10278   | Berglunds snabbköp                 | 8/12/1996

Different Types of SQL JOINs

• (INNER) JOIN: Returns records that have matching values in both tables
• LEFT (OUTER) JOIN: Returns all records from the left table, and the
matched records from the right table
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the
matched records from the left table
• FULL (OUTER) JOIN: Returns all records from both tables, whether or not
there is a match in the other table

INNER JOIN
The INNER JOIN keyword selects records that have matching values in both
tables.

Note: The INNER JOIN keyword returns only rows with a match in both tables.
Which means that if you have a product with no CategoryID, or with a
CategoryID that is not present in the Categories table, that record would not be
returned in the result.

Syntax
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

Let's look at a selection of the Products table:

ProductID | ProductName   | CategoryID | Price
1         | Chais         | 1          | 18
2         | Chang         | 1          | 19
3         | Aniseed Syrup | 2          | 10

And a selection of the Categories table:

CategoryID | CategoryName | Description
1          | Beverages    | Soft drinks, coffees, teas, beers, and ales
2          | Condiments   | Sweet and savory sauces, relishes, spreads, and seasonings
3          | Confections  | Desserts, candies, and sweet breads

We will join the Products table with the Categories table, by using
the CategoryID field from both tables:

Example
Join Products and Categories with the INNER JOIN keyword:

SELECT ProductID, ProductName, CategoryName
FROM Products
INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID;
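
To see which rows an INNER JOIN leaves out, the query can be run against a small copy of the two tables. This is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. Products 98 and 99 are hypothetical rows added for the example and are not part of the tables above: one has no CategoryID, and the other has a CategoryID that does not exist in Categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT,
                       CategoryID INTEGER, Price REAL);
INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Condiments'), (3, 'Confections');
INSERT INTO Products VALUES (1, 'Chais', 1, 18), (2, 'Chang', 1, 19),
                            (3, 'Aniseed Syrup', 2, 10);
""")
# Hypothetical extra rows: one product without a category and one whose
# CategoryID has no matching row in Categories.
conn.execute("INSERT INTO Products VALUES (98, 'Uncategorised Item', NULL, 5)")
conn.execute("INSERT INTO Products VALUES (99, 'Orphan Item', 42, 7)")

inner_rows = conn.execute(
    "SELECT ProductID, ProductName, CategoryName "
    "FROM Products "
    "INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID "
    "ORDER BY ProductID"
).fetchall()
print(inner_rows)
conn.close()
```

Only the three products with a matching category appear in the result. The category Confections, which has no products, is also absent.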

SQL LEFT JOIN Keyword


The LEFT JOIN keyword returns all records from the left table (table1), and the
matching records from the right table (table2). If there is no match, the columns
from the right side contain NULL.

LEFT JOIN Syntax


SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

Note: In some databases LEFT JOIN is called LEFT OUTER JOIN.

Below is a selection from the "Customers" table:

CustomerID | CustomerName                       | ContactName    | Address                       | City        | PostalCode | Country
1          | Alfreds Futterkiste                | Maria Anders   | Obere Str. 57                 | Berlin      | 12209      | Germany
2          | Ana Trujillo Emparedados y helados | Ana Trujillo   | Avda. de la Constitución 2222 | México D.F. | 05021      | Mexico
3          | Antonio Moreno Taquería            | Antonio Moreno | Mataderos 2312                | México D.F. | 05023      | Mexico

And a selection from the "Orders" table:

OrderID | CustomerID | EmployeeID | OrderDate  | ShipperID
10308   | 2          | 7          | 1996-09-18 | 3
10309   | 37         | 3          | 1996-09-19 | 1
10310   | 77         | 8          | 1996-09-20 | 2

SQL LEFT JOIN Example


The following SQL statement will select all customers, and any orders they
might have:

Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;

Note: The LEFT JOIN keyword returns all records from the left table
(Customers), even if there are no matches in the right table (Orders).
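
Running the LEFT JOIN on the sample rows shows where the NULL values come from. The sketch below is illustrative only: it uses Python's built-in sqlite3 module with an in-memory database, and it loads just the columns the query needs from the "Customers" and "Orders" selections.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Alfreds Futterkiste'),
                             (2, 'Ana Trujillo Emparedados y helados'),
                             (3, 'Antonio Moreno Taquería');
INSERT INTO Orders VALUES (10308, 2, '1996-09-18'), (10309, 37, '1996-09-19'),
                          (10310, 77, '1996-09-20');
""")

left_rows = conn.execute(
    "SELECT Customers.CustomerName, Orders.OrderID "
    "FROM Customers "
    "LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID "
    "ORDER BY Customers.CustomerName"
).fetchall()
for name, order_id in left_rows:
    print(name, order_id)  # order_id is None (SQL NULL) when the customer has no order
conn.close()
```

Every customer appears once, and only customer 2 has an order in this sample. Orders 10309 and 10310 belong to customers that are not in the selection, so they do not appear at all.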

SQL RIGHT JOIN Keyword


The RIGHT JOIN keyword returns all records from the right table (table2), and the
matching records from the left table (table1). If there is no match, the columns
from the left side contain NULL.

RIGHT JOIN Syntax


SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;

Note: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.

Below is a selection from the "Orders" table:

OrderID | CustomerID | EmployeeID | OrderDate  | ShipperID
10308   | 2          | 7          | 1996-09-18 | 3
10309   | 37         | 3          | 1996-09-19 | 1
10310   | 77         | 8          | 1996-09-20 | 2

And a selection from the "Employees" table:

EmployeeID | LastName  | FirstName | BirthDate | Photo
1          | Davolio   | Nancy     | 12/8/1968 | EmpID1.pic
2          | Fuller    | Andrew    | 2/19/1952 | EmpID2.pic
3          | Leverling | Janet     | 8/30/1963 | EmpID3.pic

SQL RIGHT JOIN Example


The following SQL statement will return all employees, and any orders they
might have placed:

Example
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;

Note: The RIGHT JOIN keyword returns all records from the right table
(Employees), even if there are no matches in the left table (Orders).
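
A RIGHT JOIN can always be rewritten as a LEFT JOIN with the two tables swapped, which is useful on engines that lack RIGHT JOIN (older SQLite releases, for example). The sketch below shows that rewrite, not the RIGHT JOIN statement itself. It uses Python's built-in sqlite3 module with an in-memory database, and it orders by EmployeeID rather than OrderID so that the row order does not depend on how NULL values sort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, EmployeeID INTEGER);
INSERT INTO Employees VALUES (1, 'Davolio', 'Nancy'), (2, 'Fuller', 'Andrew'),
                             (3, 'Leverling', 'Janet');
INSERT INTO Orders VALUES (10308, 2, 7), (10309, 37, 3), (10310, 77, 8);
""")

# "Orders RIGHT JOIN Employees" rewritten as "Employees LEFT JOIN Orders".
right_rows = conn.execute(
    "SELECT Orders.OrderID, Employees.LastName, Employees.FirstName "
    "FROM Employees "
    "LEFT JOIN Orders ON Orders.EmployeeID = Employees.EmployeeID "
    "ORDER BY Employees.EmployeeID"
).fetchall()
print(right_rows)
conn.close()
```

All three employees are listed. Only Leverling (EmployeeID 3) has an order in this sample, so the other two rows carry NULL in the OrderID column.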


Exercise
Fill in the missing JOIN clause to select all the records from the Customers table plus
all the matches in the Orders table.

SELECT *
FROM Orders
____________________
ON Orders.CustomerID=Customers.CustomerID;

SQL FULL OUTER JOIN Keyword


The FULL OUTER JOIN keyword returns all records from both the left (table1) and the
right (table2) table. Rows are combined where there is a match, and the missing side is
filled with NULL where there is none.

Tip: FULL OUTER JOIN and FULL JOIN are the same.

FULL OUTER JOIN Syntax


SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;

Note: FULL OUTER JOIN can potentially return very large result-sets!

Below is a selection from the "Customers" table:

CustomerID | CustomerName                       | ContactName    | Address                       | City        | PostalCode | Country
1          | Alfreds Futterkiste                | Maria Anders   | Obere Str. 57                 | Berlin      | 12209      | Germany
2          | Ana Trujillo Emparedados y helados | Ana Trujillo   | Avda. de la Constitución 2222 | México D.F. | 05021      | Mexico
3          | Antonio Moreno Taquería            | Antonio Moreno | Mataderos 2312                | México D.F. | 05023      | Mexico

And a selection from the "Orders" table:

OrderID | CustomerID | EmployeeID | OrderDate  | ShipperID
10308   | 2          | 7          | 1996-09-18 | 3
10309   | 37         | 3          | 1996-09-19 | 1
10310   | 77         | 8          | 1996-09-20 | 2

SQL FULL OUTER JOIN Example


The following SQL statement selects all customers, and all orders:

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;

A selection from the result set may look like this:

CustomerName                       | OrderID
NULL                               | 10309
NULL                               | 10310
Alfreds Futterkiste                | NULL
Ana Trujillo Emparedados y helados | 10308
Antonio Moreno Taquería            | NULL

Note: The FULL OUTER JOIN keyword returns all records from both
tables, whether the other table matches or not. So, if there are rows in
"Customers" that do not have matches in "Orders", or if there are rows in
"Orders" that do not have matches in "Customers", those rows will be listed as
well.
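
MySQL, the database used in the XAMPP walkthrough earlier, does not support the FULL OUTER JOIN keyword. A common workaround, sketched here as one possible approach, is to combine a LEFT JOIN with the unmatched rows of the opposite LEFT JOIN using UNION ALL. The example runs the workaround with Python's built-in sqlite3 module on the sample rows. The final ORDER BY 1, 2 sorts by the first and then the second output column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customers VALUES (1, 'Alfreds Futterkiste'),
                             (2, 'Ana Trujillo Emparedados y helados'),
                             (3, 'Antonio Moreno Taquería');
INSERT INTO Orders VALUES (10308, 2), (10309, 37), (10310, 77);
""")

full_rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    UNION ALL
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Orders
    LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
    WHERE Customers.CustomerID IS NULL
    ORDER BY 1, 2
""").fetchall()
for row in full_rows:
    print(row)
conn.close()
```

The first SELECT supplies every customer with any matching order, and the second adds only the orders that have no customer, so no matched pair is listed twice.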

Relational Algebra

A. Introduction to Relational Algebra


Relational Algebra is a procedural query language that works on relational models. It uses
operators to perform queries on relations (tables) and returns results in the form of new relations.
Relational Algebra provides the theoretical foundation for relational databases and SQL.
Key Concepts:

• Relation: A table with columns and rows.
• Tuple: A single row in a table.
• Attribute: A column in a table.
• Schema: The structure of a table defined by its attributes.

Relational Algebra consists of a set of operations that take one or two relations as input and
produce a new relation as output. These operations include:

• Selection
• Projection
• Union
• Set Difference
• Intersection
• Cartesian Product
• Join Operations

B. Selection (or Restriction)


Selection (denoted by σ) is used to select rows from a relation that satisfy a given predicate.
Syntax:
$\sigma_{condition}(Relation)$
Example:
Assume we have a relation Employee:

EmployeeID | Name    | Age | Department
1          | Alice   | 30  | HR
2          | Bob     | 25  | IT
3          | Charlie | 35  | IT

To select employees from the IT department:
$\sigma_{Department = 'IT'}(Employee)$
Result:

EmployeeID | Name    | Age | Department
2          | Bob     | 25  | IT
3          | Charlie | 35  | IT
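
Selection amounts to filtering the rows of a relation with a predicate. The following is a minimal sketch in plain Python, assuming a relation is represented as a list of dictionaries. The helper name select is an illustrative choice, not a standard function.

```python
# The Employee relation from the example, one dictionary per tuple.
employee = [
    {"EmployeeID": 1, "Name": "Alice", "Age": 30, "Department": "HR"},
    {"EmployeeID": 2, "Name": "Bob", "Age": 25, "Department": "IT"},
    {"EmployeeID": 3, "Name": "Charlie", "Age": 35, "Department": "IT"},
]


def select(relation, predicate):
    """Return the tuples of the relation for which the predicate is true."""
    return [t for t in relation if predicate(t)]


# sigma_{Department = 'IT'}(Employee)
it_staff = select(employee, lambda t: t["Department"] == "IT")
print([t["Name"] for t in it_staff])
```

The result keeps every attribute of the selected tuples. Only the number of rows changes.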

C. Projection
Projection (denoted by π) is used to select specific columns from a relation. Duplicate rows in
the result are removed.
Syntax:
$\pi_{column1, column2, \ldots}(Relation)$
Example:
To select only the Name and Department columns from the Employee relation:
$\pi_{Name, Department}(Employee)$
Result:

Name    | Department
Alice   | HR
Bob     | IT
Charlie | IT
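
Projection keeps the chosen columns and, because a relation is a set, drops any duplicate rows that result. Here is a small sketch in plain Python under the same list-of-dictionaries assumption. The helper name project is hypothetical.

```python
employee = [
    {"EmployeeID": 1, "Name": "Alice", "Age": 30, "Department": "HR"},
    {"EmployeeID": 2, "Name": "Bob", "Age": 25, "Department": "IT"},
    {"EmployeeID": 3, "Name": "Charlie", "Age": 35, "Department": "IT"},
]


def project(relation, attributes):
    """Keep only the given attributes and drop duplicate rows, preserving order."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# pi_{Name, Department}(Employee)
name_dept = project(employee, ["Name", "Department"])
# pi_{Department}(Employee): 'IT' occurs twice in Employee but only once here.
departments = project(employee, ["Department"])
print(name_dept)
print(departments)
```

Projecting on Department alone returns two rows, not three, which is the duplicate removal at work.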

D. Union
Union (denoted by ∪) combines the tuples of two relations and removes duplicates. Both
relations must have the same attributes.
Syntax:
$Relation1 \cup Relation2$
Example:
Assume we have another relation TempEmployee:

EmployeeID | Name  | Age | Department
4          | David | 28  | Marketing
5          | Eva   | 22  | HR

To combine Employee and TempEmployee:
$Employee \cup TempEmployee$
Result:

EmployeeID | Name    | Age | Department
1          | Alice   | 30  | HR
2          | Bob     | 25  | IT
3          | Charlie | 35  | IT
4          | David   | 28  | Marketing
5          | Eva     | 22  | HR

E. Set Difference
Set Difference (denoted by -) returns the tuples that are in one relation but not in the other.
Syntax:
$Relation1 - Relation2$
Example:
To find employees who are not temporary:
$Employee - TempEmployee$
Result:

EmployeeID | Name    | Age | Department
1          | Alice   | 30  | HR
2          | Bob     | 25  | IT
3          | Charlie | 35  | IT

F. Intersection
Intersection (denoted by ∩) returns the tuples that are in both relations.
Syntax:
$Relation1 \cap Relation2$
Example:
To find employees who are both permanent and temporary:
$Employee \cap TempEmployee$
Result: Employee and TempEmployee have no tuples in common, so the result is
an empty set.
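
Union, set difference, and intersection map directly onto Python's built-in set type when each tuple of a relation is stored as a Python tuple. The sketch below assumes that representation and uses the Employee and TempEmployee rows from the examples.

```python
# Each relation is a set of (EmployeeID, Name, Age, Department) tuples.
employee = {
    (1, "Alice", 30, "HR"),
    (2, "Bob", 25, "IT"),
    (3, "Charlie", 35, "IT"),
}
temp_employee = {
    (4, "David", 28, "Marketing"),
    (5, "Eva", 22, "HR"),
}

union = employee | temp_employee         # Employee ∪ TempEmployee
difference = employee - temp_employee    # Employee − TempEmployee
intersection = employee & temp_employee  # Employee ∩ TempEmployee

print(sorted(union))
print(sorted(difference))
print(intersection)
```

Because the two relations share no tuples, the union has five rows, the difference equals Employee, and the intersection is empty.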

G. Cartesian Product
Cartesian Product (denoted by ×) returns a relation that is the combination of every tuple in the
first relation with every tuple in the second relation.
Syntax:
$Relation1 \times Relation2$
Example:
Assume a Department relation:

DeptID | DeptName
1      | HR
2      | IT
3      | Marketing

To combine each employee with each department:
$Employee \times Department$
Result (3 × 3 = 9 tuples, each carrying all attributes of both relations):

EmployeeID | Name    | Age | Department | DeptID | DeptName
1          | Alice   | 30  | HR         | 1      | HR
1          | Alice   | 30  | HR         | 2      | IT
1          | Alice   | 30  | HR         | 3      | Marketing
2          | Bob     | 25  | IT         | 1      | HR
2          | Bob     | 25  | IT         | 2      | IT
2          | Bob     | 25  | IT         | 3      | Marketing
3          | Charlie | 35  | IT         | 1      | HR
3          | Charlie | 35  | IT         | 2      | IT
3          | Charlie | 35  | IT         | 3      | Marketing
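
The Cartesian product pairs every tuple of one relation with every tuple of the other, so its size is the product of the two sizes. Here is a brief sketch using itertools.product from the Python standard library, with relations stored as lists of tuples (an assumption made for the example).

```python
from itertools import product

# Employee: (EmployeeID, Name, Age, Department); Department: (DeptID, DeptName)
employee = [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"), (3, "Charlie", 35, "IT")]
department = [(1, "HR"), (2, "IT"), (3, "Marketing")]

# Employee × Department: concatenate each employee tuple with each department tuple.
cartesian = [e + d for e, d in product(employee, department)]

print(len(cartesian))
for row in cartesian:
    print(row)
```

Each result tuple has six attributes, four from Employee followed by two from Department. Most of the pairs are meaningless on their own, which is why a product is usually followed by a selection, the combination that a join performs.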

H. Join Operations
Join Operations combine related tuples from different relations based on a common attribute.
Types of Joins:

1. Theta Join (θ-Join): Combines tuples based on a general condition.
   o Syntax: $Relation1 \bowtie_{condition} Relation2$
2. Equi-Join: A special case of theta join where the condition is equality.
   o Syntax: $Relation1 \bowtie_{Relation1.attribute = Relation2.attribute} Relation2$
3. Natural Join: Combines tuples based on common attribute names and removes duplicate
   attributes.
   o Syntax: $Relation1 \bowtie Relation2$
4. Outer Join: Also includes tuples that do not have matching tuples in the other relation.
   o Types: Left Outer Join, Right Outer Join, Full Outer Join
   o Syntax: $Relation1 \bowtie_{condition} Relation2$ (with variations of the join symbol
     for left, right, and full outer joins)

Example of Equi-Join:
Assume the Employee and Department relations. To join them on the department name:
$Employee \bowtie_{Employee.Department = Department.DeptName} Department$
Result (an equi-join keeps both join attributes, so Department and DeptName both appear):

EmployeeID | Name    | Age | Department | DeptID | DeptName
1          | Alice   | 30  | HR         | 1      | HR
2          | Bob     | 25  | IT         | 2      | IT
3          | Charlie | 35  | IT         | 2      | IT
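
An equi-join can be read as a Cartesian product followed by a selection on an equality condition. The sketch below spells that out as a nested loop in plain Python. It illustrates the definition and is not how a database engine necessarily evaluates a join.

```python
# Employee: (EmployeeID, Name, Age, Department); Department: (DeptID, DeptName)
employee = [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"), (3, "Charlie", 35, "IT")]
department = [(1, "HR"), (2, "IT"), (3, "Marketing")]

# Keep only the pairs where Employee.Department equals Department.DeptName.
equi_join = [
    e + d
    for e in employee
    for d in department
    if e[3] == d[1]
]

for row in equi_join:
    print(row)
```

Marketing has no employees, so it does not appear in the result. An outer join would be needed to keep it.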
