SQL Joins

A SQL join combines related data from two or more tables. SQL offers several kinds of join: inner, left outer, right outer, full outer, cross, and self joins. Which one to use depends on whether unmatched rows should appear in the result alongside the matched ones.

Uploaded by

sunny11088
Copyright: © All Rights Reserved

join in SQL

A SQL join fetches data from two or more tables and presents it as a single result set.
It combines columns from the tables by using values common to both.
The JOIN keyword is used in SQL queries to join two or more tables. Joining n tables
requires at least (n-1) join conditions. A table can also be joined to
itself, which is known as a self join.

Types of Join
The following are the types of JOIN that we can use in SQL.

Inner
Outer
Left
Right

Cross JOIN or Cartesian Product


This type of JOIN returns the Cartesian product of the rows of the joined tables. The result
combines each row from the first table with each row of
the second table.
Cross JOIN Syntax is,
SELECT column-name-list
FROM table-name1
CROSS JOIN
table-name2;

Example of Cross JOIN


The class table,

ID    NAME
      abhi
      adam
      alex

The class_info table,

ID    Address
      DELHI
      MUMBAI
      CHENNAI

Cross JOIN query will be,

SELECT *
FROM class
CROSS JOIN class_info;

The result table will look like,

ID    NAME    ID    Address
      abhi          DELHI
      adam          DELHI
      alex          DELHI
      abhi          MUMBAI
      adam          MUMBAI
      alex          MUMBAI
      abhi          CHENNAI
      adam          CHENNAI
      alex          CHENNAI
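
The cross join can be tried out with a small script. This is a minimal sketch that uses Python's built-in sqlite3 module rather than any particular server; the ID values are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER, name TEXT);
    CREATE TABLE class_info (id INTEGER, address TEXT);
    -- ID values are assumed for illustration.
    INSERT INTO class VALUES (1, 'abhi'), (2, 'adam'), (3, 'alex');
    INSERT INTO class_info VALUES (1, 'DELHI'), (2, 'MUMBAI'), (3, 'CHENNAI');
""")

# Every row of class is paired with every row of class_info.
rows = conn.execute("SELECT * FROM class CROSS JOIN class_info").fetchall()
print(len(rows))  # 3 rows x 3 rows = 9 combinations
```

The row count of a cross join is always the product of the two table sizes, which is why it is rarely used on large tables without a filter.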

INNER Join or EQUI Join


This is a simple JOIN in which the result contains only the rows that match the equality condition
specified in the query.
Inner Join Syntax is,
SELECT column-name-list
FROM table-name1
INNER JOIN
table-name2
ON table-name1.column-name = table-name2.column-name;

Example of Inner JOIN


The class table,

ID    NAME
      abhi
      adam
      alex
      anu

The class_info table,

ID    Address
      DELHI
      MUMBAI
      CHENNAI

Inner JOIN query will be,

SELECT * FROM class INNER JOIN class_info ON class.id = class_info.id;

The result table will look like,

ID    NAME    ID    Address
      abhi          DELHI
      adam          MUMBAI
      alex          CHENNAI
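
As a rough check of the inner join, the sketch below runs it with Python's sqlite3 module. The ID values are assumed; what matters is that 'anu' has no matching row in class_info and therefore drops out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER, name TEXT);
    CREATE TABLE class_info (id INTEGER, address TEXT);
    -- ID values are assumed for illustration.
    INSERT INTO class VALUES (1, 'abhi'), (2, 'adam'), (3, 'alex'), (4, 'anu');
    INSERT INTO class_info VALUES (1, 'DELHI'), (2, 'MUMBAI'), (3, 'CHENNAI');
""")

rows = conn.execute("""
    SELECT * FROM class
    INNER JOIN class_info ON class.id = class_info.id
    ORDER BY class.id
""").fetchall()
for row in rows:
    print(row)  # only rows whose id exists in both tables
```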

Natural JOIN
Natural Join is a type of Inner join that matches rows on the columns having the same name and the same
datatype in both of the tables being joined.

Natural Join Syntax is,

SELECT *
FROM table-name1
NATURAL JOIN
table-name2;

Example of Natural JOIN


The class table,

ID    NAME
      abhi
      adam
      alex
      anu

The class_info table,

ID    Address
      DELHI
      MUMBAI
      CHENNAI

Natural join query will be,


SELECT * from class NATURAL JOIN class_info;

The result table will look like,

ID    NAME    Address
      abhi    DELHI
      adam    MUMBAI
      alex    CHENNAI

In the above example, both the tables being joined have ID column(same name and same
datatype), hence the records for which value of ID matches in both the tables will be the result of
Natural Join of these two tables.
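
A short sketch of the natural join, again using Python's sqlite3 module with assumed ID values. Note that the shared id column appears only once in the output, unlike the inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER, name TEXT);
    CREATE TABLE class_info (id INTEGER, address TEXT);
    -- ID values are assumed for illustration.
    INSERT INTO class VALUES (1, 'abhi'), (2, 'adam'), (3, 'alex'), (4, 'anu');
    INSERT INTO class_info VALUES (1, 'DELHI'), (2, 'MUMBAI'), (3, 'CHENNAI');
""")

# The join condition is implied by the common column name (id).
rows = conn.execute(
    "SELECT * FROM class NATURAL JOIN class_info ORDER BY name"
).fetchall()
for row in rows:
    print(row)  # three columns per row: id, name, address
```

Because the condition is implicit, adding a second same-named column to either table later would silently change the result, so many teams prefer an explicit ON clause.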

Outer JOIN
Outer Join is based on both matched and unmatched data. Outer Joins subdivide further into,

Left Outer Join
Right Outer Join
Full Outer Join

Left Outer Join


The left outer join returns the matched rows of the two tables, followed by the remaining
rows of the left table with null in the right table's columns.
Left Outer Join syntax is,
SELECT column-name-list
FROM table-name1
LEFT OUTER JOIN
table-name2
ON table-name1.column-name = table-name2.column-name;

Left outer Join Syntax for Oracle is,

SELECT column-name-list
FROM table-name1,
table-name2
WHERE table-name1.column-name = table-name2.column-name(+);

Example of Left Outer Join


The class table,

ID    NAME
      abhi
      adam
      alex
      anu
      ashish

The class_info table,

ID    Address
      DELHI
      MUMBAI
      CHENNAI
      NOIDA
      PANIPAT

Left Outer Join query will be,


SELECT * FROM class LEFT OUTER JOIN class_info ON (class.id=class_info.id);

The result table will look like,

ID    NAME      ID    Address
      abhi            DELHI
      adam            MUMBAI
      alex            CHENNAI
      anu       null  null
      ashish    null  null
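
The left outer join can be reproduced with the sketch below (Python's sqlite3 module). All ID values are assumptions; in particular, NOIDA and PANIPAT are given IDs 7 and 8 only so that they match no row in class. SQL null shows up as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER, name TEXT);
    CREATE TABLE class_info (id INTEGER, address TEXT);
    -- ID values are assumed for illustration.
    INSERT INTO class VALUES
        (1, 'abhi'), (2, 'adam'), (3, 'alex'), (4, 'anu'), (5, 'ashish');
    INSERT INTO class_info VALUES
        (1, 'DELHI'), (2, 'MUMBAI'), (3, 'CHENNAI'), (7, 'NOIDA'), (8, 'PANIPAT');
""")

rows = conn.execute("""
    SELECT * FROM class
    LEFT OUTER JOIN class_info ON class.id = class_info.id
    ORDER BY class.id
""").fetchall()
for row in rows:
    print(row)  # unmatched class rows carry None in the class_info columns
```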

Right Outer Join


The right outer join returns the matched rows of the two tables, followed by the remaining
rows of the right table with null in the left table's columns.
Right Outer Join Syntax is,
SELECT column-name-list
FROM table-name1
RIGHT OUTER JOIN
table-name2
ON table-name1.column-name = table-name2.column-name;

Right outer Join Syntax for Oracle is,

SELECT column-name-list
FROM table-name1,
table-name2
WHERE table-name1.column-name(+) = table-name2.column-name;

Example of Right Outer Join


The class table,

ID    NAME
      abhi
      adam
      alex
      anu
      ashish

The class_info table,

ID    Address
      DELHI
      MUMBAI
      CHENNAI
      NOIDA
      PANIPAT

Right Outer Join query will be,


SELECT * FROM class RIGHT OUTER JOIN class_info on (class.id=class_info.id);

The result table will look like,

ID    NAME      ID    Address
      abhi            DELHI
      adam            MUMBAI
      alex            CHENNAI
null  null            NOIDA
null  null            PANIPAT

Full Outer Join


The full outer join returns the matched rows of the two tables, followed by the remaining rows
of the left table and then those of the right table, with null in the columns that have no match.
Full Outer Join Syntax is,
SELECT column-name-list
FROM table-name1
FULL OUTER JOIN
table-name2
ON table-name1.column-name = table-name2.column-name;

Example of Full outer join is,


The class table,

ID    NAME
      abhi
      adam
      alex
      anu
      ashish

The class_info table,

ID    Address
      DELHI
      MUMBAI
      CHENNAI
      NOIDA
      PANIPAT

Full Outer Join query will be like,

SELECT * FROM class FULL OUTER JOIN class_info on (class.id=class_info.id);

The result table will look like,

ID    NAME      ID    Address
      abhi            DELHI
      adam            MUMBAI
      alex            CHENNAI
      anu       null  null
      ashish    null  null
null  null            NOIDA
null  null            PANIPAT
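
Not every engine accepts FULL OUTER JOIN (MySQL does not, and SQLite only added it in version 3.39). A common workaround, sketched here with Python's sqlite3 module, is to combine a left outer join with the unmatched rows of the other table using UNION ALL. This is a substitute technique, not the query shown above, and the ID values are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER, name TEXT);
    CREATE TABLE class_info (id INTEGER, address TEXT);
    -- ID values are assumed for illustration.
    INSERT INTO class VALUES
        (1, 'abhi'), (2, 'adam'), (3, 'alex'), (4, 'anu'), (5, 'ashish');
    INSERT INTO class_info VALUES
        (1, 'DELHI'), (2, 'MUMBAI'), (3, 'CHENNAI'), (7, 'NOIDA'), (8, 'PANIPAT');
""")

rows = conn.execute("""
    -- Part 1: all class rows, matched or not.
    SELECT c.id, c.name, i.id, i.address
    FROM class c LEFT JOIN class_info i ON c.id = i.id
    UNION ALL
    -- Part 2: class_info rows that matched nothing in class.
    SELECT c.id, c.name, i.id, i.address
    FROM class_info i LEFT JOIN class c ON c.id = i.id
    WHERE c.id IS NULL
""").fetchall()
print(len(rows))  # 5 rows from part 1 plus 2 unmatched rows from part 2
```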

Types of Join in SQL Server

Nirav Prabtani, 20 Jan 2014 CPOL

Types of join in SQL Server for fetching records from multiple tables.

Introduction
In this tip, I am going to explain the types of join.

What is join?

An SQL JOIN clause is used to combine rows from two or more tables, based on a
common field between them.
There are many types of join.
1. Inner Join
   1. Equi-join
   2. Natural Join
2. Outer Join
   1. Left outer Join
   2. Right outer join
   3. Full outer join
3. Cross Join
4. Self Join

Using the Code


Join is very useful for fetching records from multiple tables with reference to a common
column between them.
To understand join with an example, we have to create two tables in a SQL Server database.

1. Employee

create table Employee(
id int identity(1,1) primary key,
Username varchar(50),
FirstName varchar(50),
LastName varchar(50),
DepartID int
)

2. Departments

create table Departments(
id int identity(1,1) primary key,
DepartmentName varchar(50)
)

Now fill the Employee table with demo records.

Fill the Departments table in the same way.

1) Inner Join
The join that displays only the rows that have a match in both the joined tables is known
as inner join.
select e1.Username,e1.FirstName,e1.LastName,e2.DepartmentName
from Employee e1 inner join Departments e2 on e1.DepartID=e2.id

It gives the matched rows from both tables, matching DepartID of the first table against id
of the second table.

Equi-Join
Equi join is a special type of join in which we use only the equality operator. Hence, when
you write a join query using the equality operator, that query is an equi
join.
Equi join has only the (=) operator in the join condition.
Equi join can be an inner join, a left outer join, or a right outer join.
Check the query for equi-join:
SELECT * FROM Employee e1 JOIN Departments e2 ON e1.DepartID = e2.id

2) Outer Join
An outer join returns the matched rows plus the unmatched rows from one or both tables.
We have three types of outer join:
1. Left outer join
2. Right outer join
3. Full outer join
a) Left Outer join
Left join displays all the rows from the first table and the matched rows from the second table.
SELECT * FROM Employee e1 LEFT OUTER JOIN Departments e2
ON e1.DepartID = e2.id

Result:

b) Right outer join


Right outer join displays all the rows of the second table and the matched rows from the first table.
SELECT * FROM Employee e1 RIGHT OUTER JOIN Departments e2
ON e1.DepartID = e2.id

Result:

c) Full outer join

Full outer join returns all the rows from both tables, whether they have been matched or not.
SELECT * FROM Employee e1 FULL OUTER JOIN Departments e2
ON e1.DepartID = e2.id

Result:

3) Cross Join
A cross join produces the Cartesian product of the tables that are involved in the join.
The size of a Cartesian product is the number of rows in the first table multiplied by
the number of rows in the second table.
SELECT * FROM Employee cross join Departments e2

You can also write the query like this:

SELECT * FROM Employee , Departments e2

4) Self Join
Joining a table to itself is called a self join. A self join is used to retrieve records having
some relation or similarity with other records in the same table. Here, we need to use
aliases for the same table to set up a self join on a single table and retrieve the records
satisfying the join condition.
SELECT e1.Username,e1.FirstName,e1.LastName from Employee e1
inner join Employee e2 on e1.id=e2.DepartID

Here, I have retrieved the rows in which id and DepartID of the Employee table
match:

Points of Interest
Here, I have taken one example of a self join: a scenario where the manager's name can be
retrieved by matching managerid against the employee id in the same table.
Here, I have created one table, employees:

If I have to retrieve the manager name from the manager id, it is possible with a self join:
select e1.empName as ManagerName,e2.empName as EmpName
from employees e1 inner join employees e2 on e1.id=e2.managerid

Result:
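
The manager lookup can be tried with a small sketch in Python's sqlite3 module. The table layout follows the column names used in the query (id, empName, managerid); the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id        INTEGER PRIMARY KEY,
        empName   TEXT,
        managerid INTEGER              -- id of another row in this same table
    );
    -- Sample rows are hypothetical.
    INSERT INTO employees VALUES
        (1, 'Asha', NULL), (2, 'Ravi', 1), (3, 'Meena', 1), (4, 'Kiran', 2);
""")

# e1 plays the manager role, e2 the employee role.
rows = conn.execute("""
    SELECT e1.empName AS ManagerName, e2.empName AS EmpName
    FROM employees e1
    INNER JOIN employees e2 ON e1.id = e2.managerid
    ORDER BY e2.id
""").fetchall()
for manager, employee in rows:
    print(manager, "manages", employee)
```

Because this is an inner join, an employee without a manager (managerid is null) does not appear in the EmpName column; a left join from e2 would keep such rows.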


11 important database designing rules which I follow

Shivprasad koirala, 25 Feb 2014 CPOL

This article discusses 11 important database design rules.

Table of Contents

Introduction
Rule 1: What is the nature of the application (OLTP or OLAP)?
Rule 2: Break your data into logical pieces, make life simpler
Rule 3: Do not get overdosed with rule 2
Rule 4: Treat duplicate non-uniform data as your biggest enemy
Rule 5: Watch for data separated by separators
Rule 6: Watch for partial dependencies
Rule 7: Choose derived columns preciously
Rule 8: Do not be hard on avoiding redundancy, if performance is the key
Rule 9: Multidimensional data is a different beast altogether
Rule 10: Centralize name value table design
Rule 11: For unlimited hierarchical data self-reference PK and FK

Courtesy: Image from Motion pictures

Introduction
Before you start reading this article let me confirm to you I
am not a guru in database designing. The below 11 points are
what I have learnt via projects, my own experiences, and my
own reading. I personally think it has helped me a lot when it
comes to DB designing. Any criticism is welcome.

The reason I am writing a full blown article is, when
developers design a database they tend to follow the three
normal forms like a silver bullet. They tend to think
normalization is the only way of designing. Due to this mindset
they sometimes hit road blocks as the project moves ahead.
If you are new to normalization, then click and see 3 normal
forms in action which explains all the three normal forms step
by step.
That said, normalization rules are important guidelines,
but treating them as set in stone is calling for trouble.
Below are my own 11 rules which I keep at the top of
my head while doing DB design.

Rule 1: What is the nature of the application (OLTP or OLAP)?
When you start your database design, the first thing to
analyze is the nature of the application you are designing for:
is it transactional or analytical? You will find many developers
applying normalization rules by default without thinking
about the nature of the application, and then later running into
performance and customization issues. As said, there are two
kinds of applications: transaction based and analytical based.
Let's understand what these types are.
Transactional: In this kind of application, your end user is
more interested in CRUD, i.e., creating, reading, updating,
and deleting records. The official name for such a kind of
database is OLTP.
Analytical: In these kinds of applications your end user is
more interested in analysis, reporting, forecasting, etc. These
kinds of databases have fewer inserts and
updates. The main intention here is to fetch and analyze data
as fast as possible. The official name for such a kind of
database is OLAP.

In other words, if you think inserts, updates, and deletes are
more prominent then go for a normalized table design, else
create a flat denormalized database structure.
Below is a simple diagram which shows how the names and
address in the left hand side are a simple normalized table
and by applying a denormalized structure how we have
created a flat table structure.

Rule 2: Break your data into logical pieces, make life simpler
This rule is actually the first rule from 1st normal form. One of
the signs of violation of this rule is if your queries are using
too many string parsing functions like substring, charindex,
etc., then probably this rule needs to be applied.
For instance you can see the below table which has student
names; if you ever want to query student names having
Koirala and not Harisingh, you can imagine what kind of a
query you will end up with.
So the better approach would be to break this field into
further logical pieces so that we can write clean and optimal
queries.

Rule 3: Do not get overdosed with rule 2
Developers are cute creatures. If you tell them this is the
way, they keep doing it; well, they overdo it, leading to
unwanted consequences. This also applies to rule 2 which we
just talked about above. When you think about decomposing,
pause and ask yourself, is it needed? As said, the
decomposition should be logical.
For instance, you can see the phone number field; it's rare
that you will operate on ISD codes of phone numbers
separately (unless your application demands it). So it would be
a wise decision to just leave it as it is, since splitting it can lead to more
complications.

Rule 4: Treat duplicate non-uniform data as your biggest enemy
Focus on duplicate data and refactor it. My personal worry about
duplicate data is not that it takes hard disk space, but the
confusion it creates.
For instance, in the below diagram, you can see 5th
Standard and Fifth standard mean the same. Now you
can say the data has come into your system due to bad data
entry or poor validation. If you ever want to derive a report,
it would show them as different entities, which is very
confusing from the end user point of view.

One of the solutions would be to move the data into a
different master table altogether and refer to it via foreign
keys. You can see in the below figure how we have created a
new master table called Standards and linked the same
using a simple foreign key.

Rule 5: Watch for data separated by separators
The second rule of 1st normal form says avoid repeating
groups. One of the examples of repeating groups is explained
in the below diagram. If you see the syllabus field closely, in
one field we have too much data stuffed. These kinds of fields
are termed as Repeating groups. If we have to manipulate
this data, the query would be complex, and I also have doubts about
the performance of the queries.

These kinds of columns which have data stuffed with
separators need special attention and a better approach
would be to move those fields to a different table and link
them with keys for better management.

So now let's apply the second rule of 1st normal form: Avoid
repeating groups. You can see in the above figure I have
created a separate syllabus table and then made a many-to-many relationship with the subject table.
With this approach the syllabus field in the main table no
longer repeats and no longer holds data separators.

Rule 6: Watch for partial dependencies

Watch for fields which depend partially on primary keys. For
instance in the above table we can see the primary key is
created on roll number and standard. Now watch the syllabus
field closely. The syllabus field is associated with a standard
and not with a student directly (roll number).

The syllabus is associated with the standard in which the
student is studying and not directly with the student. So if
tomorrow we want to update the syllabus we have to update
it for each student, which is painstaking and not logical. It
makes more sense to move these fields out and associate
them with the Standard table.
You can see how we have moved the syllabus field and
attached it to the Standards table.
This rule is nothing but the 2nd normal form: All non-key columns should
depend on the full primary key and not on just a part of it.

Rule 7: Choose derived columns preciously

If you are working on OLTP applications, getting rid of derived
columns would be a good thought, unless there is some
pressing reason for performance. In case of OLAP where we
do a lot of summations and calculations, these kinds of fields are
necessary to gain performance.

In the above figure you can see how the average field is
dependent on the marks and subject. This is also one form of
redundancy. So for such kinds of fields which are derived from
other fields, give a thought: are they really necessary?
This rule is also termed as the 3rd normal form: No column
should depend on other non-primary key columns. My
personal thought is do not apply this rule blindly, see the
situation; it's not that redundant data is always bad. If the
redundant data is calculated data, see the situation and then
decide if you want to implement the 3rd normal form.

Rule 8: Do not be hard on avoiding redundancy, if performance is the key

Do not make it a strict rule that you will always avoid
redundancy. If there is a pressing need for performance, think
about de-normalization. With normalization, you need to
join many tables; with denormalization, the number of joins
goes down and performance goes up.

Rule 9: Multidimensional data is a different beast altogether
OLAP projects mostly deal with multidimensional data. For
instance you can see the below figure, you would like to get
sales per country, customer, and date. In simple words you
are looking at sales figures which have three intersections of
dimension data.

For such kinds of situations a dimension and fact design is a
better approach. In simple words you can create a simple
central sales fact table which has the sales amount field and
it makes a connection with all dimension tables using a
foreign key relationship.
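
A dimension and fact design might look like the following minimal sketch, written with Python's sqlite3 module. The table names (dim_country, dim_customer, dim_date, fact_sales) and the sample rows are assumptions made for illustration, not taken from the article's figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (hypothetical names and rows).
    CREATE TABLE dim_country  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date     (id INTEGER PRIMARY KEY, day TEXT);
    -- Central fact table: one foreign key per dimension plus the measure.
    CREATE TABLE fact_sales (
        country_id  INTEGER REFERENCES dim_country(id),
        customer_id INTEGER REFERENCES dim_customer(id),
        date_id     INTEGER REFERENCES dim_date(id),
        amount      REAL
    );
    INSERT INTO dim_country  VALUES (1, 'India'), (2, 'Nepal');
    INSERT INTO dim_customer VALUES (1, 'Shiv'), (2, 'Raju');
    INSERT INTO dim_date     VALUES (1, '2014-02-24'), (2, '2014-02-25');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 2, 75.0);
""")

# Sales per country: join the fact table to one dimension and aggregate.
sales_per_country = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_country c ON c.id = f.country_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(sales_per_country)
```

The same fact table answers "sales per customer" or "sales per date" by swapping the joined dimension, which is the point of keeping the measure in one central table.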

Rule 10: Centralize name value table design
Many times I have come across name value tables. A name
value table holds a key and some data associated with
the key. For instance in the below figure you can see we have
a currency table and a country table. If you watch the data
closely they actually only have a key and value.

For such kinds of tables, creating a central table and
differentiating the data by using a type field makes more
sense.
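
One way to read this rule is sketched below with Python's sqlite3 module. The table name lookup, its column names, and the sample rows are all hypothetical; the idea shown is simply one table with a type field instead of separate currency and country tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One central name value table; the type column tells the rows apart.
    CREATE TABLE lookup (
        type  TEXT NOT NULL,          -- e.g. 'currency' or 'country'
        code  TEXT NOT NULL,
        value TEXT NOT NULL,
        PRIMARY KEY (type, code)
    );
    INSERT INTO lookup VALUES
        ('currency', 'INR', 'Indian Rupee'),
        ('currency', 'USD', 'US Dollar'),
        ('country',  'IN',  'India'),
        ('country',  'NP',  'Nepal');
""")

currencies = conn.execute(
    "SELECT code, value FROM lookup WHERE type = 'currency' ORDER BY code"
).fetchall()
print(currencies)
```

A trade-off of this design is that a foreign key cannot by itself restrict a column to rows of one type, so that check has to live elsewhere.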

Rule 11: For unlimited hierarchical data self-reference PK and FK
Many times we come across data with an unlimited parent child
hierarchy. For instance consider a multi-level marketing
scenario where a sales person can have multiple sales people
below them. For such scenarios, a foreign key that references
the primary key of the same table models the hierarchy.
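
A self-referencing key can be sketched as follows with Python's sqlite3 module. The sales_person table and its rows are invented for illustration, and the recursive common table expression used to walk the hierarchy is an addition of this sketch, not something the article prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_person (
        id        INTEGER PRIMARY KEY,
        name      TEXT,
        parent_id INTEGER REFERENCES sales_person(id)  -- NULL at the top level
    );
    -- Hypothetical three-level hierarchy.
    INSERT INTO sales_person VALUES
        (1, 'Top', NULL), (2, 'Mid A', 1), (3, 'Mid B', 1), (4, 'Leaf', 2);
""")

# Walk down from person 1, tracking how deep each person sits.
team = conn.execute("""
    WITH RECURSIVE team(id, name, depth) AS (
        SELECT id, name, 0 FROM sales_person WHERE id = 1
        UNION ALL
        SELECT sp.id, sp.name, team.depth + 1
        FROM sales_person sp
        JOIN team ON sp.parent_id = team.id
    )
    SELECT name, depth FROM team ORDER BY depth, id
""").fetchall()
print(team)
```

The table itself never needs to change when another level is added; only new rows pointing at an existing parent are inserted.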

This article is not meant to say that you should not follow normal
forms; rather, do not follow them blindly. Look at your
project's nature and the type of data you are dealing with
first.

You can also visit my website for step by step videos
on Design Patterns, UML, SharePoint 2010, .NET
Fundamentals, VSTS, SQL Server, MVC, and lots more.

License
This article, along with any associated source code and files,
is licensed under The Code Project Open License (CPOL)


About the Author

Shivprasad koirala
Architect http://www.questpond.com
India

Introduction to database design


This article/tutorial will teach the basics of relational database design and
explain how to make a good database design. It is a rather long text, but
we advise you to read all of it. Designing a database is in fact fairly easy, but
there are a few rules to stick to. It is important to know what these rules
are, but it is more important to know why these rules exist, otherwise you
will tend to make mistakes!
Standardization makes your data model flexible and that makes working
with your data much easier. Please, take the time to learn these rules and
apply them! The database used in this article is designed with our database
design and modeling tool DeZign for Databases.
A good database design starts with a list of the data that you want to
include in your database and what you want to be able to do with the
database later on. This can all be written in your own language, without any
SQL. In this stage you must try not to think in tables or columns, but just
think: "What do I need to know?" Don't take this too lightly, because if you
find out later that you forgot something, usually you need to start all over.
Adding things to your database is mostly a lot of work.

Identifying Entities
The types of information that are saved in the database are called 'entities'.
These entities exist in four kinds: people, things, events, and locations.
Everything you could want to put in a database fits into one of these
categories. If the information you want to include doesn't fit into these
categories, then it is probably not an entity but a property of an entity, an
attribute.
To clarify the information given in this article we'll use an example. Imagine
that you are creating a website for a shop, what kind of information do you
have to deal with? In a shop you sell your products to customers. The
"Shop" is a location; "Sale" is an event; "Products" are things; and
"Customers" are people. These are all entities that need to be included in
your database.
But what other things are happening when selling a product? A customer
comes into the shop, approaches the vendor, asks a question and gets an
answer. "Vendors" also participate, and because vendors are people, we
need a vendors entity.

Figure 1: Entities: types of information.

Identifying Relationships
The next step is to determine the relationships between the entities and to
determine the cardinality of each relationship. The relationship is the
connection between the entities, just like in the real world: what does one
entity do with the other, how do they relate to each other? For example,
customers buy products, products are sold to customers, a sale comprises
products, a sale happens in a shop.
The cardinality shows how many instances on one side of the relationship belong to
how many on the other side. First, you need to state for
each relationship how many instances of one side belong to exactly 1 of the other
side. For example: How many customers belong to 1 sale? How many
sales belong to 1 customer? How many sales take place in 1 shop?
You'll get a list like this: (please note that 'product' represents a type of
product, not an occurrence of a product)

Customers --> Sales; 1 customer can buy something several times
Sales --> Customers; 1 sale is always made by 1 customer at a time
Customers --> Products; 1 customer can buy multiple products
Products --> Customers; 1 product can be purchased by multiple customers
Customers --> Shops; 1 customer can purchase in multiple shops
Shops --> Customers; 1 shop can receive multiple customers
Shops --> Products; in 1 shop there are multiple products
Products --> Shops; 1 product (type) can be sold in multiple shops
Shops --> Sales; in 1 shop multiple sales can be made
Sales --> Shops; 1 sale can only be made in 1 shop at a time
Products --> Sales; 1 product (type) can be purchased in multiple sales
Sales --> Products; 1 sale can consist of multiple products
Did we mention all relationships? There are four entities and each entity
has a relationship with every other entity, so each entity must have three
relationships, and also appear on the left end of the relationship three
times. Above, 12 relationships were mentioned, which is 4*3, so we can
conclude that all relationships were mentioned.
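
The count of 12 can be double-checked by enumerating every ordered pair of the four entities. This is a small sketch in Python; the entity names come from the example above.

```python
from itertools import permutations

entities = ["Customers", "Sales", "Products", "Shops"]

# Each ordered pair (a, b) with a != b is one directed relationship a --> b.
directed = [f"{a} --> {b}" for a, b in permutations(entities, 2)]

print(len(directed))  # 4 entities x 3 partners each = 12
for line in directed:
    print(line)
```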

Now we'll put the data together to find the cardinality of the whole
relationship. In order to do this, we'll draft the cardinalities per relationship.
To make this easy to do, we'll adjust the notation a bit, by noting the
'backward'-relationship the other way around:
Customers --> Sales; 1 customer can buy something several times
Sales --> Customers; 1 sale is always made by 1 customer at a
time
The second relationship we will turn around so it has the same entity order
as the first. Please notice the arrow that now faces the other way!
Customers <-- Sales; 1 sale is always made by 1 customer at a
time
Cardinality exists in four types: one-to-one, one-to-many, many-to-one, and
many-to-many. In a database design this is indicated as: 1:1, 1:N, M:1, and
M:N. To find the right notation, leave each '1' as it is. If there is a 'many' on the
left side, this will be indicated with 'M', if there is a 'many' on the right side it
is indicated with 'N'.
Customers --> Sales; 1 customer can buy something several times;
1:N.
Customers <-- Sales; 1 sale is always made by 1 customer at a
time; 1:1.
The true cardinality can be calculated by taking the biggest value
on the left and on the right, where 'N' or 'M' counts as greater than '1'. In this example, in
both cases there is a '1' on the left side. On the right side, there is an 'N' and
a '1'; the 'N' is the biggest value. The total cardinality is therefore '1:N'. A
customer can make multiple 'sales', but each 'sale' has just one customer.
If we do this for the other relationships too, we'll get:
Customers --> Sales; --> 1:N
Customers --> Products; --> M:N
Customers --> Shops; --> M:N
Sales --> Products; --> M:N
Shops --> Sales; --> 1:N
Shops --> Products; --> M:N
So, we have two '1-to-many' relationships, and four 'many-to-many'
relationships.
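
The "take the biggest value on each side" rule can be written down as a tiny function. This is a sketch in Python; the function name combine and the string notation for its arguments are assumptions of this example. Both arguments must already be written in the same entity order, as described above.

```python
def combine(forward: str, backward: str) -> str:
    """Combine two one-directional cardinalities such as '1:N' and '1:1'.

    'M' (left side) and 'N' (right side) both mean 'many' and outrank '1'.
    """
    f_left, f_right = forward.split(":")
    b_left, b_right = backward.split(":")
    left = "M" if "M" in (f_left, b_left) else "1"
    right = "N" if "N" in (f_right, b_right) else "1"
    return f"{left}:{right}"


# Customers --> Sales is 1:N and Customers <-- Sales is 1:1.
print(combine("1:N", "1:1"))  # 1:N
# Customers --> Products is 1:N and Customers <-- Products is M:1.
print(combine("1:N", "M:1"))  # M:N
```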

Figure 2: Relationships between the entities.

Between the entities there may be a mutual dependency. This means that
the one item cannot exist if the other item does not exist. For example,
there cannot be a sale if there are no customers, and there cannot be a
sale if there are no products.
The relationships Sales --> Customers, and Sales --> Products are
mandatory, but the other way around this is not the case. A customer can
exist without sale, and also a product can exist without sale. This is of
importance for the next step.
Recursive Relationships

Sometimes an entity refers back to itself. For example, think of a work
hierarchy: an employee has a boss, and the boss is an employee too.
The attribute 'boss' of the entity 'employees' refers back to the entity
'employees'.
In an ERD (see next chapter) this type of relationship is a line that goes out
of the entity and returns with a nice loop to the same entity.
Redundant Relationships

Sometimes in your model you will get a 'redundant relationship'. These are
relationships that are already indicated by other relationships, although not
directly.
In the case of our example there is a direct relationship between
customers and products. But there are also relationships from customers to
sales and from sales to products, so indirectly there already is a
relationship between customers and products through sales. The
relationship 'Customers <----> Products' is made twice, and one of them is
therefore redundant. In this case, products are only purchased through a
sale, so the relationship 'Customers <----> Products' can be deleted. The
model will then look like this:

Figure 3: Relationships between the entities.

Solving Many-to-Many Relationships

Many-to-many relationships (M:N) are not directly possible in a database.
What an M:N relationship says is that a number of records from one table
belong to a number of records from another table. Somewhere you need
to save which records these are and the solution is to split the relationship
up into two one-to-many relationships.
This can be done by creating a new entity that is in between the related
entities. In our example, there is a many-to-many relationship between
sales and products. This can be solved by creating a new entity: sales-products. This entity has a many-to-one relationship with Sales, and a
many-to-one relationship with Products. In logical models this is called an
associative entity and in physical database terms this is called a link table
or junction table.
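
A junction table for sales and products could be sketched like this, using Python's sqlite3 module. The table and column names (sales_products, sale_id, product_id) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales    (id INTEGER PRIMARY KEY, sold_on TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: each row links one sale to one product.
    CREATE TABLE sales_products (
        sale_id    INTEGER NOT NULL REFERENCES sales(id),
        product_id INTEGER NOT NULL REFERENCES products(id),
        PRIMARY KEY (sale_id, product_id)
    );
    -- Hypothetical sample data.
    INSERT INTO sales    VALUES (1, '2014-01-20'), (2, '2014-01-21');
    INSERT INTO products VALUES (1, 'pen'), (2, 'notebook');
    INSERT INTO sales_products VALUES (1, 1), (1, 2), (2, 1);
""")

# Two one-to-many hops replace the single many-to-many relationship.
items = conn.execute("""
    SELECT s.id, p.name
    FROM sales s
    JOIN sales_products sp ON sp.sale_id = s.id
    JOIN products p ON p.id = sp.product_id
    ORDER BY s.id, p.id
""").fetchall()
print(items)
```

Sale 1 contains two products and the product 'pen' appears in two sales, so both directions of the many-to-many relationship are covered by plain rows in the junction table.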

Figure 4: Many to many relationship implementation via associative entity.

In the example there are two many-to-many relationships that need to be
solved: 'Products <----> Sales', and 'Products <----> Shops'. For both
situations a new entity needs to be created, but what is that entity?
For the Products <----> Sales relationship, every sale includes one or more
products. The relationship shows the content of the sale. In other words, it
gives details about the sale. So the entity is called 'Sales details'. You could
also name it 'sold products'.
The Products <----> Shops relationship shows which products are available
in which shops, also known as 'stock'. Our model would now look like
this:

Figure 5: Model with link tables Stock and Sales_details.

Identifying Attributes
The data elements that you want to save for each entity are called
'attributes'.
About the products that you sell, you want to know, for example, what the
price is, what the name of the manufacturer is, and what the type number
is. About the customers you know their customer number, their name, and
address. About the shops you know the location code, the name, the
address. Of the sales you know when they happened, in which shop, what
products were sold, and the sum total of the sale. Of the vendor you know
his staff number, name, and address. What will be included precisely is not
of importance yet; it is still only about what you want to save.

Figure 6: Entities with attributes.

Derived Data

Derived data is data that is derived from the other data that you have
already saved. In this case the 'sum total' is a classical case of derived
data. You know exactly what has been sold and what each product costs,
so you can always calculate how much the sum total of the sales is. So
really it is not necessary to save the sum total.
So why is it saved here? Well, because it is a sale, and the price of the
product can vary over time. A product can be priced at 10 euros today and
at 8 euros next month, and for your administration you need to know what it
cost at the time of the sale, and the easiest way to do this is to save it here.
There are a lot of more elegant ways, but they are too profound for this
article.

Presenting Entities and Relationships: Entity Relationship Diagram (ERD)
The Entity Relationship Diagram (ERD) gives a graphical overview of the
database. There are several styles and types of ER Diagrams. A much-used notation is the 'crow's foot' notation, where entities are represented as
rectangles and the relationships between the entities are represented as
lines between the entities. The signs at the end of the lines indicate the
type of relationship. The side of the relationship that is mandatory for the
other to exist is indicated by a dash on the line. Entities that are not
mandatory are indicated by a circle. "Many" is indicated by a
'crow's foot': the relationship line splits up into three lines.
In this article we make use of DeZign for Databases to design and present
our database.
A 1:1 mandatory relationship is represented as follows:

Figure 7: Mandatory one to one relationship.

A 1:N mandatory relationship:

Figure 8: Mandatory one to many relationship.

An M:N mandatory relationship:

Figure 9: Mandatory many to many relationship.

The model of our example will look like this:

Figure 10: Model with relationships.

Assigning Keys
Primary Keys

A primary key (PK) is one or more data attributes that uniquely identify an
entity. A key that consists of two or more attributes is called a composite
key. Every attribute that is part of a primary key must have a value in every
record (it cannot be left empty), and the combination of the values of these
attributes must be unique in the table.
In the example there are a few obvious candidates for the primary key.
Customers all have a customer number, products all have a unique product
number and the sales have a sales number. Each of these values is unique
and each record will contain a value, so these attributes can be a primary
key. Often an integer column is used for the primary key so a record can be
easily found through its number.
Link-entities usually refer to the primary key attributes of the entities that
they link. The primary key of a link-entity is usually a collection of these
reference-attributes. For example, in the Sales_details entity we could use
the combination of the PKs of the sales and products entities as the PK of
Sales_details. In this way we enforce that the same product (type) can only
be used once in the same sale. Multiple items of the same product type in a
sale must be indicated by the quantity.
In the ERD the primary key attributes are indicated by the text 'PK' behind
the name of the attribute. In the example only the entity 'shop' does not
have an obvious candidate for the PK, so we will introduce a new attribute
for that entity: shopnr.
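
How a composite primary key enforces "one row per product per sale" can be sketched as follows. The example uses Python's built-in sqlite3 module, and the column names (salesnr, productnr, quantity) are assumptions chosen to match the text, not a definitive schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical link table: the PK is the combination of both reference attributes.
conn.execute("""
    CREATE TABLE sales_details (
        salesnr   INTEGER NOT NULL,
        productnr INTEGER NOT NULL,
        quantity  INTEGER NOT NULL,
        PRIMARY KEY (salesnr, productnr)
    )
""")

# Product 1 is sold once in sale 100.
conn.execute("INSERT INTO sales_details VALUES (100, 1, 1)")

# Adding the same product to the same sale a second time violates the PK.
try:
    conn.execute("INSERT INTO sales_details VALUES (100, 1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A second item of the same product type is recorded through the quantity.
conn.execute(
    "UPDATE sales_details SET quantity = quantity + 1 "
    "WHERE salesnr = 100 AND productnr = 1"
)
quantity = conn.execute(
    "SELECT quantity FROM sales_details WHERE salesnr = 100 AND productnr = 1"
).fetchone()[0]

print(duplicate_rejected, quantity)
```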
Foreign Keys

The Foreign Key (FK) in an entity is the reference to the primary key of
another entity. In the ERD that attribute will be indicated with 'FK' behind its
name. The foreign key of an entity can also be part of the primary key; in
that case the attribute will be indicated with 'PF' behind its name. This is
usually the case with the link-entities, because you usually link two
instances together only once (within one sale, a given product type appears
only once).
If we put all link-entities, PKs and FKs into the ERD, we get the model as
shown below. Please note that the attribute 'products' is no longer
necessary in 'Sales', because 'sold products' is now included in the
link-table. In the link-table another field was added, 'quantity', that indicates
how many products were sold. The quantity field was also added in the
stock-table, to indicate how many products are still in store.

Figure 11: Primary keys and foreign keys.
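
What a foreign key guards against can be sketched as follows, again with Python's built-in sqlite3 module. The table layout and sample rows are assumptions for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK checks are off by default

# Hypothetical tables: stock links shops and products, so both columns are 'PF'.
conn.executescript("""
    CREATE TABLE products (productnr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shops (shopnr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stock (
        shopnr    INTEGER NOT NULL REFERENCES shops(shopnr),
        productnr INTEGER NOT NULL REFERENCES products(productnr),
        quantity  INTEGER NOT NULL,
        PRIMARY KEY (shopnr, productnr)
    );
    INSERT INTO products VALUES (1, 'Lamp');
    INSERT INTO shops VALUES (10, 'Main street');
""")

# Valid: both the shop and the product exist.
conn.execute("INSERT INTO stock VALUES (10, 1, 25)")

# Invalid: product 999 does not exist, so the FK rejects the row.
try:
    conn.execute("INSERT INTO stock VALUES (10, 999, 5)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

stock_rows = conn.execute("SELECT COUNT(*) FROM stock").fetchone()[0]
print(orphan_rejected, stock_rows)
```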

Defining the Attribute's Data Type


Now it is time to figure out which data types need to be used for the
attributes. There are a lot of different data types. A few are standardized,
but many databases have their own data types that all have their own
advantages. Some databases offer the possibility to define your own data
types, in case the standard types cannot do the things you need.
The data types that most databases know, and that are used most,
are: CHAR, VARCHAR, TEXT, FLOAT, DOUBLE, and INT.
Text:
CHAR(length) - includes text (characters, numbers, punctuation...).
CHAR always saves a fixed number of positions. If you define a
CHAR(10) you can save up to ten positions, but if you only use two
positions the database will still save 10 positions. The remaining
eight positions will be filled with spaces.
VARCHAR(length) - includes text (characters, numbers,
punctuation...). VARCHAR is like CHAR, except that VARCHAR
only takes as much space as necessary.
TEXT - can contain large amounts of text. Depending on the type of
database this can add up to gigabytes.
Numbers:
INT - contains a positive or negative whole number. A lot of
databases have variations of the INT, such as TINYINT, SMALLINT,
MEDIUMINT, BIGINT, INT2, INT4, INT8. These variations differ from
the INT only in the size of the number that fits into it. A regular INT is 4
bytes (INT4) and fits numbers from -2147483648 to +2147483647, or if
you define it as UNSIGNED from 0 to 4294967295. The INT8, or
BIGINT, can hold even bigger numbers, from 0 to
18446744073709551615 when UNSIGNED, but always takes 8 bytes of
disk space, even if there is just a small number in it.
FLOAT, DOUBLE - The same idea as INT, but can also store floating
point numbers. Do note that calculations with these types are not
always exact. For instance, in MySQL (1/3)*3 calculated with floats
results in 0.9999999, not 1.
Other types:
BLOB - for binary data such as files.
INET - for IP addresses. Also usable for netmasks.
For our example the data types are as follows:

Figure 12: Data model displaying data types.
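
The integer ranges and the floating point caveat can be checked with a few lines of Python. This is a sketch under the assumption that integers are stored in two's complement, which is what common databases use. The function names are hypothetical, and the float example uses Python's own floats rather than any particular database.

```python
from decimal import Decimal

def signed_range(num_bytes):
    """Smallest and largest value of a signed two's complement integer."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(num_bytes):
    """Smallest and largest value of an unsigned integer."""
    return 0, 2 ** (8 * num_bytes) - 1

int4_signed = signed_range(4)      # regular INT
int4_unsigned = unsigned_range(4)  # INT UNSIGNED
int8_unsigned = unsigned_range(8)  # BIGINT UNSIGNED

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is slightly off; an exact decimal type does not have this problem.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(int4_signed, int4_unsigned, int8_unsigned)
print(float_sum == 0.3, decimal_sum == Decimal("0.3"))
```

For amounts of money this is a reason to consider an exact type (for example DECIMAL, where the database offers it) or whole numbers of cents instead of FLOAT or DOUBLE.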

Normalization
Normalization makes your data model flexible and reliable. It does generate
some overhead because you usually get more tables, but it enables you to
do many things with your data model without having to adjust it.
Normalization, the First Form

The first form of normalization states that there may be no repeating groups
of columns in an entity. We could have created an entity 'sales' with
attributes for each of the products that were bought. This would look like
this:

Figure 13: Not in 1st normal form.

What is wrong with this is that now only 3 products can be sold. If you
had to sell 4 products, then you would have to start a second sale
or adjust your data model by adding 'product4' attributes. Both solutions are
unwanted. In these cases you should always create a new entity that you
link to the old one via a one-to-many relationship.

Figure 14: In accordance with 1st normal form.
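
With the repeating group moved into its own entity, a sale can hold any number of products without changing the model. A minimal sketch using Python's built-in sqlite3 module; the column names and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one row per sale, one row per product within a sale.
conn.executescript("""
    CREATE TABLE sales (salesnr INTEGER PRIMARY KEY, sale_date TEXT);
    CREATE TABLE sales_details (
        salesnr   INTEGER NOT NULL REFERENCES sales(salesnr),
        productnr INTEGER NOT NULL,
        quantity  INTEGER NOT NULL,
        PRIMARY KEY (salesnr, productnr)
    );
""")

conn.execute("INSERT INTO sales VALUES (100, '2024-01-15')")

# Four different products in one sale: no 'product4' column is needed.
conn.executemany(
    "INSERT INTO sales_details VALUES (100, ?, ?)",
    [(1, 2), (2, 1), (3, 5), (4, 1)],  # (productnr, quantity)
)

products_in_sale = conn.execute(
    "SELECT COUNT(*) FROM sales_details WHERE salesnr = 100"
).fetchone()[0]
items_in_sale = conn.execute(
    "SELECT SUM(quantity) FROM sales_details WHERE salesnr = 100"
).fetchone()[0]

print(products_in_sale, items_in_sale)
```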

Normalization, the Second Form

The second form of normalization states that all attributes of an entity
should be fully dependent on the whole primary key. This means that each
attribute of an entity can only be identified through the whole primary key.
Suppose we had the date in the Sales_details entity:

Figure 15: Not in 2nd normal form.

This entity does not conform to the second normal form, because in
order to look up the date of a sale, you do not have to know what was
sold (productnr); the only thing you need to know is the sales number. This
was solved by splitting up the tables into the sales and the Sales_details
table:

Figure 16: In accordance with 2nd normal form.

Now each attribute of the entities is dependent on the whole PK of the
entity. The date is dependent on the sales number, and the quantity is
dependent on the sales number and the sold product.
Normalization, the Third Form

The third form of normalization states that all attributes need to be directly
dependent on the primary key, and not on other attributes. This seems to
be what the second form of normalization states, but the second form
looks at it from the opposite direction. In the second form of normalization
you identify attributes through the whole PK, whereas in the third form every
attribute needs to be dependent on the PK, and on nothing else.

Figure 17: Not in 3rd normal form.

In this case the price of an individual product is dependent on the ordering
number, and the ordering number is dependent on the product number and
the sales number. This does not conform to the third form of normalization.
Again, splitting up the tables solves this.

Figure 18: In accordance with 3rd normal form.

Normalization, More Forms

There are more normalization forms than the three forms mentioned above,
but those are not of great interest for the average user. These other forms
are highly specialized for certain applications. If you stick to the design
rules and the normalization mentioned in this article, you will create a
design that works great for most applications.
Normalized Data Model

If you apply the normalization rules, you will find that the 'manufacturer' in
the product table should also be a separate table:

Figure 19: Data model in accordance with 1st, 2nd and 3rd normal form.
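
Splitting the manufacturer into its own table means its name is stored once and fetched with a join when needed. A minimal sketch using Python's built-in sqlite3 module; the table and column names and the sample rows are assumptions for illustration, not the exact model from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized tables: products refer to a manufacturer by number.
conn.executescript("""
    CREATE TABLE manufacturers (
        manufacturernr INTEGER PRIMARY KEY,
        name           TEXT NOT NULL
    );
    CREATE TABLE products (
        productnr      INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        manufacturernr INTEGER NOT NULL REFERENCES manufacturers(manufacturernr)
    );
    INSERT INTO manufacturers VALUES (1, 'Acme');
    INSERT INTO products VALUES (1, 'Lamp', 1), (2, 'Desk', 1);
""")

# Renaming the manufacturer touches exactly one row ...
conn.execute("UPDATE manufacturers SET name = 'Acme Lighting' WHERE manufacturernr = 1")

# ... and an inner join shows the new name for every product.
rows = conn.execute("""
    SELECT p.name, m.name
    FROM products p
    INNER JOIN manufacturers m ON m.manufacturernr = p.manufacturernr
    ORDER BY p.productnr
""").fetchall()

print(rows)
```

Had the manufacturer name been stored in every product row, the rename would have required updating each product, with the risk of missing one.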

Glossary
Attributes - detailed data about an entity, such as price, length, name
Cardinality - the relationship between two entities, in figures. For example,
a person can place multiple orders.
Entities - abstract data that you save in a database. For example:
customers, products.
Foreign key (FK) - a referral to the Primary Key of another table. Foreign
Key-columns can only contain values that exist in the Primary Key column
that they refer to.
Key - a key is used to point out records. The most well-known key is the
Primary Key (see Primary Key).

Normalization - A flexible data model needs to follow certain rules. Applying
these rules is called normalizing.
Primary key - one or more columns within a table that together form a
unique combination of values by which each record can be pointed out
separately. For example: customer numbers, or the serial number of a
product.

Resources

Learn
DeZign for Databases: Learn more about DeZign for Databases.
Getting started with DeZign for Databases: Start making a data model
directly.
Display data types in a diagram: Learn how to display data type
and/or domain info in the entity boxes on your diagram.
Get products and technologies
Build your next data model with DeZign for Databases trial software,
available for download directly from Datanamic's download section.
