MSQL Lesson 5 - Database Design and Creation
• design and create a simple database using the first three normal forms.
Instructions
This material will help you build a foundation for this week's topic. You will be required to attempt a quiz
after this lesson.
Terms to Know
Database design:
Organizing data to minimize redundancy and ensure data integrity and consistency.
Normal Forms:
A set of rules that, when applied to data, reduce redundancy and organize the attributes of each entity in
a way that increases data integrity and consistency.
First Normal Form (1NF):
Each cell of the table should only have one single (scalar) value and there should be no repeating columns.
The table should also have a primary key.
Data Definition Language (DDL):
SQL commands that are used to define the database, to create and modify the actual structure of the
database. Examples are CREATE, ALTER and DROP.
Data Manipulation Language (DML):
SQL commands that deal with manipulating data in the database. Examples are INSERT, UPDATE, and
DELETE.
CREATE:
SQL command to create a database or a table.
INSERT:
SQL command to add rows of data to tables.
UPDATE:
SQL command to modify or edit data in tables. Always use a WHERE clause with UPDATE.
DELETE:
SQL command to delete data in tables. Always use a WHERE clause with DELETE.
DROP:
SQL command to delete not only data in the table but the table definition itself. So, the entire table is
completely deleted.
Referential Integrity:
Implies that relationships among data should be enforced to guarantee the relationship between rows in
two tables will remain synchronized during all updates and deletes.
Foreign Key Constraints:
Enforce referential integrity by guaranteeing that changes cannot be made to data in the primary key table if
those changes invalidate the link to data in the foreign key table.
Designing a Database
Database Design
With database design we are trying to organize the data in such a way that redundancy is minimized
without losing any data. Redundancy is when we repeat the same data more than once. For example, each
time a student registers for a class their name and address might be repeated for each class in which they
are registered. Another way you might have repeating data is if you have a list of those same students and
their addresses in a separate file listing parking passes or ticket violations. Say the student's address
changes, then changing the address would mean editing in different locations; not only in multiple places in
registration, but also in a different file at the parking office. There is a greater chance of making a data
entry mistake when you must edit in multiple places. It takes more time and resources as well. Having
multiple copies of information also increases the chances for unauthorized access to data that should be
secure.
Good design also means our data should be accurate and reliable. Setting up constraints on our data is a
good way to ensure it has integrity and is consistent. We learned about what data integrity and data
consistency were earlier and now we will see this in action as we design a database.
For relational database design, reducing redundancy means splitting the data up into separate entities or
tables. But it is not always clear how to do this, especially with large 'real-life' databases. Database design
can be complex. You may not be the person who designs a database (usually that is the database
administrator), but knowing the proper way to design a database will help you understand the structure of
the database you work with.
Relational databases model real-world environments. The designer must analyze the real-world system and
design it into a relational database.
A data model organizes data and standardizes how the data relate to each other. The purpose of the model
is to show the data needed and created by business processes. The data model determines the structure of
data and the relationships and constraints of the data.
Let's look at an example of designing for a system that uses books and their associated information as
output. Here's a flat file showing all the data that is needed for the system:
ISBN           Title               Author                  Author Address                 Publisher        Publisher Address
1-1111-1111-1  Intro to Databases  Sue Smith               123 Street                     Acme Publishing  60 Main Street
2-2222-2222-2  Database Design     John Jones              44 Fourth Ave                  Frontier Inc     25th N 700 E
3-3333-3333-3  Data Retrieval      Sue Smith               123 Street                     Budd Publisher   345 Allen Blvd
4-4444-4444-4  Data Types          Tim Thomas, John Jones  67 E Bird Blvd, 44 Fourth Ave  Acme Publishing  60 Main Street
5-5555-5555-5  Queries for Fun     Mary Miller             32 Launa Dr                    Perfect Inc      309 Wish Way
Each book is listed with the international standard book number (ISBN), title, author, and publisher. For
relational databases, one table in the database will represent one entity in the real world with each row
being one instance of that entity and one column storing an attribute associated with that entity.
In the table above, there are 3 different entities: books, authors and publishers. In the table, we see some
of the authors and publishers are repeated. Remember we want to eliminate redundancy.
In the table above, the first two columns belong to the real world entity of books. The next two columns
belong to the entity of author, and the last two columns belong to the entity of publisher. If we separate
the data into 3 tables, we can eliminate the redundancy and have Sue Smith and Acme Publishing listed
only once.
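Books
ISBN           Title
1-1111-1111-1  Intro to Databases
2-2222-2222-2  Database Design
3-3333-3333-3  Data Retrieval
4-4444-4444-4  Data Types
5-5555-5555-5  Queries for Fun

Authors
Author       Address
Sue Smith    123 Street
John Jones   44 Fourth Ave
Tim Thomas   67 E Bird Blvd
Mary Miller  32 Launa Dr

Publishers
Publisher        Address
Acme Publishing  60 Main Street
Frontier Inc     25th N 700 E
Budd Publisher   345 Allen Blvd
Perfect Inc      309 Wish Way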
Differentiating between entities and the attributes within those entities can be tricky sometimes. For
example, maybe we find that we want to store both the author's home address and work address. Suddenly
the address attributes of the Authors table would become a separate table called Author Address with a
foreign key that relates to the author's primary key. You will ultimately make these decisions based on what
output is needed from the database. For now we will leave the entities as they are.
The book entity already has a great natural key that uniquely identifies each book: the ISBN. But there is a
possibility of two different authors having the same name, or even two publishing companies having the
same name, so we need a unique primary key for each of those tables. Here they have been assigned
surrogate keys for the primary keys. A surrogate key is a key that the system automatically assigns to each
row to make each row unique.
Let's look at the relationships of the entities now, which will help us establish the foreign keys. The
relationship between Publishers and Books is one to many. A book belongs to one publisher and a
publisher can publish many books. The foreign key will be located in the table with the many part of the
relationship. So the primary key of Publishers becomes a foreign key in Books. Notice how Acme Publishing
(001) is listed multiple times as a foreign key in the Books table. It is the publisher of two different
books, 'Intro to Databases' and 'Data Types'.
The Books and Authors tables have a many-to-many relationship; each book can have many authors, and
each author can write many books. Therefore, there will be a linking table between the two tables to
resolve the many-to-many into two one-to-many relationships. The linking table holds two foreign keys,
one for the primary key of each of the other two tables.
So in our linking table we have a composite primary key, meaning two attributes together ensure each row
is unique. Together these two foreign keys make up the primary key of the linking table. Author ID 22
appears more than once and so does ISBN 4-4444-4444-4, but each combination of the two foreign keys
uniquely identifies a row. No two rows have the same pair of values. We can see the many-to-many being
resolved here. Sue is the author of multiple books and the book 'Data Types' has two authors. And it is all
designed properly now for a relational database.
ISBN Author ID
1-1111-1111-1 22
2-2222-2222-2 23
3-3333-3333-3 22
4-4444-4444-4 23
4-4444-4444-4 24
5-5555-5555-5 25
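The design above can be tried out in a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server. The table and column names, the publisher IDs other than Acme's, and the helper functions authors_of and titles_by are assumptions made for illustration. The author IDs follow the linking table above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (
    publisher_id INTEGER PRIMARY KEY,   -- surrogate key
    name         TEXT NOT NULL,
    address      TEXT NOT NULL
);
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,      -- surrogate key
    name      TEXT NOT NULL,
    address   TEXT NOT NULL
);
CREATE TABLE book (
    isbn         TEXT PRIMARY KEY,      -- natural key
    title        TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publisher (publisher_id)
);
CREATE TABLE book_author (              -- linking table
    isbn      TEXT    NOT NULL REFERENCES book (isbn),
    author_id INTEGER NOT NULL REFERENCES author (author_id),
    PRIMARY KEY (isbn, author_id)       -- composite primary key
);
""")

conn.executemany("INSERT INTO publisher VALUES (?, ?, ?)", [
    (1, "Acme Publishing", "60 Main Street"),
    (2, "Frontier Inc", "25th N 700 E"),      # IDs 2-4 are hypothetical
    (3, "Budd Publisher", "345 Allen Blvd"),
    (4, "Perfect Inc", "309 Wish Way"),
])
conn.executemany("INSERT INTO author VALUES (?, ?, ?)", [
    (22, "Sue Smith", "123 Street"),
    (23, "John Jones", "44 Fourth Ave"),
    (24, "Tim Thomas", "67 E Bird Blvd"),
    (25, "Mary Miller", "32 Launa Dr"),
])
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", [
    ("1-1111-1111-1", "Intro to Databases", 1),
    ("2-2222-2222-2", "Database Design", 2),
    ("3-3333-3333-3", "Data Retrieval", 3),
    ("4-4444-4444-4", "Data Types", 1),
    ("5-5555-5555-5", "Queries for Fun", 4),
])
conn.executemany("INSERT INTO book_author VALUES (?, ?)", [
    ("1-1111-1111-1", 22),
    ("2-2222-2222-2", 23),
    ("3-3333-3333-3", 22),
    ("4-4444-4444-4", 23),
    ("4-4444-4444-4", 24),
    ("5-5555-5555-5", 25),
])


def authors_of(title):
    """Names of all authors of a book, found through the linking table."""
    rows = conn.execute("""
        SELECT a.name
        FROM book b
        JOIN book_author ba ON ba.isbn = b.isbn
        JOIN author a ON a.author_id = ba.author_id
        WHERE b.title = ?
        ORDER BY a.name
    """, (title,)).fetchall()
    return [name for (name,) in rows]


def titles_by(author_name):
    """Titles of all books written by an author, found through the linking table."""
    rows = conn.execute("""
        SELECT b.title
        FROM author a
        JOIN book_author ba ON ba.author_id = a.author_id
        JOIN book b ON b.isbn = ba.isbn
        WHERE a.name = ?
        ORDER BY b.title
    """, (author_name,)).fetchall()
    return [title for (title,) in rows]


print(authors_of("Data Types"))
print(titles_by("Sue Smith"))
```

Each author and publisher is stored once, and both directions of the many-to-many relationship are answered by joining through the linking table.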
Separating the data into real-world entities, establishing relationships, and reducing redundancy are all
part of normalizing the data. But there is a standard set of steps to follow to ensure your data is
normalized.
There are 7 normal forms that must be applied in order. Our course covers the first normal form.
First Normal Form (1NF) - The value stored at the intersection of each row and column must be a scalar
value, and the table must not contain any repeating columns. Scalar means a single value: it might belong
to one of a number of fields, but it is itself one value. The table should also have a primary key.
By following normalization rules (or normal forms) we reduce data redundancy. The attributes of each
entity are organized in a way that increases data consistency.
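As a small illustration of the scalar rule, the sketch below takes rows in which the author cell may hold several names and rewrites them so that each cell holds one value. The function name to_first_normal_form and the comma-separated input format are assumptions for this example, not part of the lesson.

```python
def to_first_normal_form(rows):
    """Return one (isbn, author) pair per author so that every cell is scalar."""
    pairs = []
    for isbn, authors in rows:
        # A cell such as "Tim Thomas, John Jones" holds two values; split it up.
        for author in authors.split(","):
            pairs.append((isbn, author.strip()))
    return pairs


# Two rows taken from the flat file; the second has a multi-valued author cell.
flat_rows = [
    ("3-3333-3333-3", "Sue Smith"),
    ("4-4444-4444-4", "Tim Thomas, John Jones"),
]

normalized = to_first_normal_form(flat_rows)
for isbn, author in normalized:
    print(isbn, author)
```

After the split, the pair (ISBN, author) is unique in every row, so the two columns together can serve as the primary key.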
INSERT Statement
Let's add some data to the course table. Here are the column definitions as shown in Workbench:
There are two different ways to add one or more rows of data to your tables with the INSERT statement.
The first way is without a column list. With this method, every column in the table must have a value and it
must be entered in the proper column order as it was defined in the column definitions.
The second way is with a column list. The columns in the list don't have to be in any particular order, but
the values must be in the same order as the columns are listed in the INSERT statement. Here we have left
the same order of columns that is shown in the column definition image but we don't have to keep that
order.
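INSERT INTO table_name
VALUES (value_1, value_2, ...);

INSERT INTO table_name (column_1, column_2, ...)
VALUES (value_1, value_2, ...);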
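The two forms of INSERT can be compared in a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL, and the artist table here is a simplified stand-in: only artist_id, mname, dod and local are named in this lesson, while fname, lname, the data types and the inserted rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artist (
        artist_id INTEGER PRIMARY KEY,
        fname     TEXT NOT NULL,
        mname     TEXT,
        lname     TEXT NOT NULL,
        dod       INTEGER,
        local     TEXT NOT NULL DEFAULT 'n'
    )
""")

# Without a column list: one value for every column, in definition order.
conn.execute("INSERT INTO artist VALUES (1, 'Claude', NULL, 'Monet', 1926, 'n')")

# With a column list: the columns may be in any order, as long as the
# values follow the same order as the listed columns.
conn.execute(
    "INSERT INTO artist (lname, fname, artist_id, local, dod, mname) "
    "VALUES ('van Gogh', 'Vincent', 2, 'n', 1890, NULL)"
)

# Without a column list, leaving out a value is an error.
try:
    conn.execute("INSERT INTO artist VALUES (3, 'Ana', 'Lopez')")
    short_insert = "accepted"
except sqlite3.OperationalError:
    short_insert = "rejected"

rows = conn.execute(
    "SELECT artist_id, fname, lname FROM artist ORDER BY artist_id"
).fetchall()
print(rows)
print(short_insert)
```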
Two columns in the artist table have NULL as the default value if nothing is entered for that column. The
local column in the artist table has a default value of 'n'. That column tells us whether an artist is local or
not. Most artists are not local, so the value of 'n' will be inserted into that column if no other value is
placed there as data is entered for that row. Also notice that the artist_id is set up as auto-increment (AI).
We can use the keyword DEFAULT instead of an actual integer for any auto-incremented field. We can also
use NULL to leave a data value null, or DEFAULT to enter the column's default value.
Since NULL is already the default for mname and dod in our column definitions, we could have also used
DEFAULT for mname and dod instead of NULL.
INSERT INTO table_name
VALUES (DEFAULT, value_2, NULL, ...);
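Default values can be observed in a short sketch. It uses Python's built-in sqlite3 module, which differs from MySQL in one relevant way: SQLite does not accept the DEFAULT keyword inside a VALUES list, so the sketch leaves columns out of the column list to get their defaults. The fname and lname columns and the sample names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artist (
        artist_id INTEGER PRIMARY KEY,      -- assigned automatically when omitted
        fname     TEXT NOT NULL,
        mname     TEXT,                     -- default is NULL
        lname     TEXT NOT NULL,
        dod       INTEGER,                  -- default is NULL
        local     TEXT NOT NULL DEFAULT 'n'
    )
""")

# Omitted columns receive their defaults: a generated artist_id,
# NULL for mname and dod, and 'n' for local.
conn.execute("INSERT INTO artist (fname, lname) VALUES ('Ana', 'Lopez')")

# An explicit NULL leaves the value null; an explicit 'y' overrides the default.
conn.execute(
    "INSERT INTO artist (fname, mname, lname, local) "
    "VALUES ('Ben', NULL, 'Ortiz', 'y')"
)

default_rows = conn.execute(
    "SELECT artist_id, mname, dod, local FROM artist ORDER BY artist_id"
).fetchall()
print(default_rows)
```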
You can also enter data into your tables in the table view in Workbench without using the SQL INSERT
command. When you apply the data you've entered, it will create an INSERT statement for you to run.
When entering a string or string literal this way, you don't have to put quotes around the string value.
UPDATE Statement
UPDATE table_name
SET column_name = expression
WHERE search_condition
If we wanted to change Vincent Van Gogh's first name to Vinny we could use this UPDATE statement.
UPDATE artist
SET fname = 'Vinny'
WHERE artist_id = 2;
DELETE Statement
If we wanted to delete the van Gogh row from our table, we could use this DELETE statement.
DELETE FROM artist
WHERE artist_id = 2;
Don't forget to always use a WHERE clause with UPDATE and DELETE. Otherwise, every row in the table
will be updated or deleted. You might need to turn off Safe Mode in Workbench by going to
'Preferences' > 'SQL Editor' and unchecking 'Safe Updates' near the bottom of the preferences there.
Remember there is no undo button once you run an UPDATE or DELETE statement.
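The effect of the WHERE clause can be seen by counting the rows each statement touches. This is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The fname and lname column names and the rows other than van Gogh's are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artist (
        artist_id INTEGER PRIMARY KEY,
        fname     TEXT NOT NULL,
        lname     TEXT NOT NULL,
        local     TEXT NOT NULL DEFAULT 'n'
    )
""")
conn.executemany("INSERT INTO artist (artist_id, fname, lname) VALUES (?, ?, ?)", [
    (1, "Claude", "Monet"),
    (2, "Vincent", "van Gogh"),
    (3, "Ana", "Lopez"),
])

# With a WHERE clause only the matching row changes.
updated = conn.execute(
    "UPDATE artist SET fname = 'Vinny' WHERE artist_id = 2"
).rowcount
new_name = conn.execute(
    "SELECT fname FROM artist WHERE artist_id = 2"
).fetchone()[0]

# With a WHERE clause only the matching row is deleted.
deleted = conn.execute("DELETE FROM artist WHERE artist_id = 2").rowcount

# Without a WHERE clause every remaining row is changed.
updated_without_where = conn.execute("UPDATE artist SET local = 'y'").rowcount

print(updated, new_name, deleted, updated_without_where)
```

The last statement changes both remaining rows, which is the mistake the WHERE clause protects against.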
Because of the way we create our database this week, we will not need to use the CREATE command:
Workbench does that for us when we forward engineer from our ERD to the database. We also should not
need to use DROP to delete any of our tables or any of our databases. If you need to delete a table or
database from Workbench because you made a mistake or want to start over, you can run a DROP
statement, but be careful not to drop something you don't mean to. You can also right-click the table or
database name in your schema list in Workbench and drop it that way. Be very careful with the
DROP command. There is no undo button.
Referential Integrity and Foreign Key Constraints
Referential integrity implies that relationships among tables should be enforced. This guarantees that
relationships between rows in two tables remain synchronized during all updates and deletes. When we
create a database from our ERD, the foreign keys are mandatory by default. In other words, MySQL
is enforcing the relationships between tables. This is referred to as a foreign key constraint. The constraint
enforces referential integrity by guaranteeing that changes cannot be made to data in the primary key
table if those changes invalidate the link to data in the foreign key table.
This is important to know because, as we insert, update, and delete data in our tables, we cannot
delete a row whose primary key is referenced by a foreign key in another table. We also cannot add a row
whose foreign key value does not already exist as a primary key in the other table.
Also, if an attempt is made to insert a row into a table that uses a foreign key that doesn't exist as a primary
key in another table, the action will fail. If an attempt is made to delete a row in a primary key table or to
update a primary key value, the action will fail when the deleted or updated primary key value corresponds
to a foreign key value of another table. All links to that primary key would have to be deleted first before
you could delete or update that primary key.
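These rules can be tried out in a minimal sketch using Python's built-in sqlite3 module instead of MySQL. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is run, whereas the lesson's MySQL setup enforces them by default. The table names, the helper function attempt, and the ISBN 9-9999-9999-9 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE publisher (
    publisher_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE book (
    isbn         TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publisher (publisher_id)
);
""")
conn.execute("INSERT INTO publisher VALUES (1, 'Acme Publishing')")
conn.execute("INSERT INTO book VALUES ('1-1111-1111-1', 'Intro to Databases', 1)")


def attempt(sql):
    """Run one statement and report whether the foreign key constraint allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    # Foreign key 99 does not exist as a primary key in publisher.
    attempt("INSERT INTO book VALUES ('9-9999-9999-9', 'Orphan Book', 99)"),
    # Publisher 1 is still referenced by a book, so it cannot be deleted...
    attempt("DELETE FROM publisher WHERE publisher_id = 1"),
    # ...and its primary key cannot be changed either.
    attempt("UPDATE publisher SET publisher_id = 5 WHERE publisher_id = 1"),
    # Once the referencing row is gone, the delete succeeds.
    attempt("DELETE FROM book WHERE publisher_id = 1"),
    attempt("DELETE FROM publisher WHERE publisher_id = 1"),
]
print(results)
```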
Submission