Kanenus College: Occupation: Database Administration L-III
UC: Design Database
Database design is a collection of processes that facilitate the design, development,
implementation and maintenance of enterprise data management systems.
You should note that the above says nothing about how the data are processed, but it does state what
the data items are, what attributes they have, what constraints apply and the relationships that hold
between the data items.
Among the many sources of information regarding database requirements, the most common include
interviews, business forms, questionnaires, and existing systems (observation).
Interviews
For all the computing power companies use today, people still make the decisions and make things
happen in a business. Don't underestimate the power of talking and—perhaps more important—
listening to the employees of a firm. These people might not be able to speak in technical database
terms, but then again, that is why you are on the scene—to be the gatherer and translator of
information.
Through interviews, you will be able to learn how information passes through an organization. You
will also be able to learn which specific pieces of data individuals rely on and the decisions they make
based on that data.
CLUE
When conducting interviews, don't limit yourself to just the management personnel and the frontline staff. Each
level of an organization has a piece of the puzzle to contribute. Each is equally important because without all the
pieces of a puzzle, you can never have a complete picture. As a result, without all the pieces, you can never have
a complete understanding of the business and the requirements of the database you are being tasked with
building.
Business Forms
Business forms and documents worth collecting include:
Invoices
Reports
Shipping documents
Timesheets
Customer service surveys
Any other piece of paper somebody in the business uses
In an interview, you might discover that it is critical the database stores information about customers.
Through the analysis and examination of business forms and documents, you discover the specific
pieces of customer information that must be captured. Another example is tracking time. In your
current project, the Time Entry and Billing database, you know you will have to keep track of time at
some level of detail. A typical timesheet might contain some or all of the following information:
Employee name
Date of work
Start time
End time
Task performed
Project
Client
This timesheet example makes it clearer what information the
database must store.
CLUE
Time to stop and catch your breath here. It is important to illustrate how interviews and business
forms work together. It is all a seamless web. Interviews provide the big picture of what information
the database stores. To some extent, interviews also can provide information on the details. However,
an analysis of the business forms and documents a company uses provides the bulk of the detail
information.
CLUE
In many respects, the user-interface components of an existing system can be regarded as business documents
and forms. For example, a physical piece of paper that represents a timesheet might not exist. Rather, employees
might enter their time online via a time-entry screen. As a database developer, you will need to be aware that the
line between disparate sources of information is not black and white. Further, there is no set order in which the
different elements are reviewed. Finally, it might very well be the case that the same sources of information will
need to be revisited. For example, after you have analyzed a business document, you might have to return to the
process of employee interviews to get further clarification of the database requirements.
Remember the mantra of patience? The process of requirements gathering might seem like a never-ending
process. Some believe that systems and databases are never "complete" because the business environment
constantly evolves and changes. At some point, you will acquire enough information to establish an initial
database design. Just because you create that first design does not mean the requirements-gathering process is
complete. Remember, you might have to repeat the process two, three, or perhaps four or more times to achieve
that first database design. As you gain experience, the process will become more familiar to you.
At this point, you have an understanding of where sources of information can be found for
requirements gathering. The question now is how do you put an initial design together. The answer is
in the form of a database model. The process of creating your first database model is the focus of the
next section.
2. Conceptual Design
The purpose of the conceptual design phase is to build a conceptual model based upon the previously
identified requirements, but closer to the final physical model. A commonly used conceptual model is
the entity-relationship model.
Entities and attributes
Entities are basically people, places, or things you want to keep information about. For example, a
library system may have the book, library and borrower entities. Learning to identify what should be
an entity, what should be a number of entities, and what should be an attribute of an entity takes
practice, but there are some good rules of thumb. The following questions can help to identify
whether something is an entity:
Can it vary in number independently of other entities? For example, person height is probably
not an entity, as it cannot vary in number independently of person. It is not fundamental, so it
cannot be an entity in this case.
Is it important enough to warrant the effort of maintaining it? For example, customer may not be
important for a small grocery store and will not be an entity in that case, but it will be
important for a video store, and will be an entity in that case.
Is it its own thing that cannot be separated into subcategories? For example, a car-rental
agency may have different criteria and storage requirements for different kinds of
vehicles. Vehicle may not be an entity, as it can be broken up into car and boat, which are the
entities.
Does it list a type of thing, not an instance? The video game blow-em-up 6 is not an entity,
rather an instance of the game entity.
Does it have many associated facts? If it only contains one attribute, it is unlikely to be an
entity. For example, city may be an entity in some cases, but if it contains only one
attribute, city name, it is more likely to be an attribute of another entity, such as customer.
The following are examples of entities involving a university with possible attributes in parentheses.
An instance of an entity is one particular occurrence of that entity. For example, the student Rudolf
Sono is one instance of the student entity. There will probably be many instances. If there is only one
instance, consider whether the entity is warranted. The top level usually does not warrant an entity.
For example, if the system is being developed for a particular university, university will not be an
entity because the whole system is for that one university. However, if the system was developed to
track legislation at all universities in the country, then university would be a valid entity.
Attributes
An attribute is a property that describes an entity. In table terms, an attribute corresponds to a column, and
each cell (the intersection of a column and a row) stores only one piece of data about the object represented
by the table to which the attribute belongs. For example, the attributes of an invoice might be price, number,
date or paid/unpaid.
Types of attribute
Example: Suppose that, instead of the three attributes stu_LastName, stu_MiddleName and
stu_FirstName, the entity Student had one attribute called stu_Name. The attribute stu_Name would
be considered a composite attribute, since it can be subdivided into the other three
attributes: stu_LastName, stu_MiddleName and stu_FirstName. The rest of the attributes would be considered
simple (single) attributes, since they can't be subdivided into parts.
Derived Attributes
The last category of attribute is the derived attribute, whose value is
calculated from one or more other attributes. A derived attribute need not be stored in the database; it can
instead be calculated by an algorithm when it is needed.
Example: In the entity Student, stu_Age would be considered a derived attribute since it could be
calculated using the student's date of birth with the current date to find their age.
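The stu_Age example can be sketched in a few lines of Python. This is only an illustration: the attribute name stu_DateOfBirth and the sample dates are assumptions, not part of the original Student entity.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from a stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the current year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

# Hypothetical student record: only the date of birth is stored, never the age.
student = {
    "stu_FirstName": "Rudolf",
    "stu_LastName": "Sono",
    "stu_DateOfBirth": date(2000, 6, 15),  # assumed attribute name and value
}

age_before_birthday = age_on(student["stu_DateOfBirth"], date(2024, 6, 14))
age_on_birthday = age_on(student["stu_DateOfBirth"], date(2024, 6, 15))
print(age_before_birthday, age_on_birthday)
```

Because the age is recalculated from the date of birth each time, it can never go out of date, which is the usual reason for not storing a derived attribute.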
Data type
SQL supports the following data types for its column and parameter declarations.
CHARACTER [(length)] or CHAR [(length)]
The CHARACTER data type accepts character strings. The length of the character string should be specified in the data type
declaration; for example, CHARACTER(n) where n represents the desired length of the character string. If no length is
specified during the declaration, the default length is 1.
The minimum length of the CHARACTER data type is 1 and it can have a maximum length up to the table page size.
Character strings that are larger than the page size of the table can be stored as a Character Large Object (CLOB).
Character String Examples:
CHAR(10) or CHARACTER(10)
VARCHAR (length)
The VARCHAR data type accepts character strings, including Unicode, of variable length up to the maximum length
specified in the data type declaration.
A VARCHAR declaration must include a positive integer in parentheses to define the maximum allowable character string
length. For example, VARCHAR(n) can accept any length of character string up to n characters in length. The length
parameter may take any value from 1 to the current table page size.
Examples: VARCHAR(10)
BOOLEAN
The BOOLEAN data type supports the storage of two values: TRUE or FALSE. No parameters are required when declaring
a BOOLEAN data type.
Examples
BOOLEAN
Invalid: 1, 0, Yes, No
SMALLINT
The SMALLINT data type accepts numeric values with an implied scale of zero. It stores any integer value in the
range -2^15 to 2^15 - 1 (-32,768 to 32,767). Attempting to assign values outside this range causes an error.
INTEGER or INT
The INTEGER data type accepts numeric values with an implied scale of zero. It stores any integer value in the range
-2^31 to 2^31 - 1 (-2,147,483,648 to 2,147,483,647). Attempting to assign values outside this range causes an error.
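The limits for SMALLINT and INTEGER follow from the number of bits used for a signed integer (16 and 32 respectively). The helper names below are hypothetical and only illustrate the arithmetic.

```python
def signed_range(bits):
    """Smallest and largest value of a signed integer stored in the given number of bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

SMALLINT_RANGE = signed_range(16)
INTEGER_RANGE = signed_range(32)

def fits(value, value_range):
    """True if value lies inside the inclusive (low, high) range."""
    low, high = value_range
    return low <= value <= high

print(SMALLINT_RANGE)               # limits of a 16-bit signed integer
print(fits(40000, SMALLINT_RANGE))  # 40000 is too large for SMALLINT
print(fits(40000, INTEGER_RANGE))   # but it fits in INTEGER
```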
DECIMAL [(p[,s])]
The DECIMAL data type accepts numeric values, for which you may define a precision and a scale in the data type
declaration. The precision is a positive integer that indicates the total number of digits that the number will contain. The scale is a
non-negative integer that indicates how many of these digits will represent decimal places to the right of the decimal point.
The scale for a DECIMAL cannot be larger than the precision.
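To see how precision and scale interact, the following sketch checks whether a value can be stored exactly in DECIMAL(p, s). The function fits_decimal is a hypothetical helper written for this illustration; a real DBMS may round a value with too many decimal places instead of rejecting it.

```python
from decimal import Decimal

def fits_decimal(value, precision, scale):
    """True if a finite decimal string fits exactly in DECIMAL(precision, scale)."""
    if scale > precision:
        raise ValueError("scale cannot be larger than precision")
    _sign, digits, exponent = Decimal(value).as_tuple()
    fractional_digits = max(0, -exponent)            # digits right of the point
    integer_digits = max(0, len(digits) + exponent)  # digits left of the point
    return fractional_digits <= scale and integer_digits <= precision - scale

# DECIMAL(5,2) leaves three digits before the point and two after it.
print(fits_decimal("123.45", 5, 2))
print(fits_decimal("1234.5", 5, 2))   # four digits before the point
print(fits_decimal("12.345", 5, 2))   # three digits after the point
```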
FLOAT(p), DOUBLE
The FLOAT data type accepts approximate numeric values, for which you may define a precision up to a maximum of 64. If
no precision is specified during the declaration, the default precision is 64. Attempting to assign a value larger than the
declared precision allows will cause an error to be raised.
DATE
The DATE data type accepts date values. No parameters are required when declaring a DATE data type. Date values should
be specified in the form: YYYY-MM-DD. However, PointBase will also accept single-digit entries for month and day
values.
TIME
The TIME data type accepts time values. No parameters are required when declaring a TIME data type. Time values should
be specified in the form: HH:MM:SS. An optional fractional value can be used to represent nanoseconds.
The minutes and seconds values must be two digits. Hour values should be between 0 and 23, minute values should be
between 00 and 59 and second values should be between 00 and 61.999999.
Values assigned to the TIME data type should be enclosed in single quotes, preceded by the case insensitive keyword TIME;
for example, TIME '07:30:00'.
TIMESTAMP
The TIMESTAMP data type accepts timestamp values, which are a combination of a DATE value and a TIME value. No
parameters are required when declaring a TIMESTAMP data type. Timestamp values should be specified in the form:
YYYY-MM-DD HH:MM:SS. There is a space separator between the date and time portions of the timestamp.
All specifications and restrictions noted for the DATE and TIME data types also apply to the TIMESTAMP data type.
Values assigned to the TIMESTAMP data type should be enclosed in single quotes, preceded by the case insensitive
keyword TIMESTAMP; for example, TIMESTAMP '1999-04-04 07:30:00'.
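The three formats can be tried out with Python's datetime module. This is only a sketch of the YYYY-MM-DD, HH:MM:SS and combined layouts; the function names are hypothetical, and Python's parsing rules are not guaranteed to match those of any particular DBMS.

```python
from datetime import datetime

def parse_date(text):
    """Parse a DATE literal body in the form YYYY-MM-DD."""
    return datetime.strptime(text, "%Y-%m-%d").date()

def parse_time(text):
    """Parse a TIME literal body in the form HH:MM:SS."""
    return datetime.strptime(text, "%H:%M:%S").time()

def parse_timestamp(text):
    """Parse a TIMESTAMP literal body: date, one space, then time."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

d = parse_date("1999-04-04")
t = parse_time("07:30:00")
ts = parse_timestamp("1999-04-04 07:30:00")

# A timestamp is the combination of a date value and a time value.
print(ts == datetime.combine(d, t))
```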
Relationships
Entities are related in certain ways. For example, a borrower may belong to a library and can take out books. A
book can be found in a particular library. Understanding what you are storing data about, and how the data
relate, leads you a large part of the way to a physical implementation in the database.
There are a number of possible relationships:
One-to-one (1:1)
This is where for each instance of entity A, there exists one instance of entity B, and vice-versa. If the
relationship is optional, there can exist zero or one instances, and if the relationship is mandatory, there exists
one and only one instance of the associated entity.
One-to-many (1:M)
For each instance of entity A, many instances of entity B can exist, while for each instance of entity B, only one
instance of entity A exists. Again, these can be optional or mandatory relationships.
Many-to-many (M:N)
For each instance of entity A, many instances of entity B can exist, and vice versa. These can be optional or
mandatory relationships.
There are numerous ways of showing these relationships. The image below shows student and course entities. In
this case, each student must have registered for at least one course, but a course does not necessarily have to
have students registered. The student-to-course relationship is mandatory, and the course-to-student relationship
is optional.
The image below shows invoice_line and product entities. Each invoice line must have at least one product (but
no more than one); however, each product can appear on many invoice lines, or none at all. The invoice_line-to-product
relationship is mandatory, while the product-to-invoice_line relationship is optional.
The figure below shows husband and wife entities. In this system (others are of course possible), each husband
must have one and only one wife, and each wife must have one, and only one, husband. Both relationships are
mandatory.
An entity can also have a relationship with itself. Such an entity is called a recursive entity. Take
a person entity. If you're interested in storing data about which people are brothers, you will have an "is brother
to" relationship. In this case, the relationship is an M:N relationship.
Conversely, a weak entity is an entity that cannot exist without another entity. For example, in a school,
the scholar entity is related to the weak entity parent/guardian. Without the scholar, the parent or guardian
cannot exist in the system. Weak entities usually derive their primary key, in part or in totality, from the
associated entity. parent/guardian could take the primary key from the scholar table as part of its primary key
(or the entire key if the system only stored one parent/guardian per scholar).
The term connectivity refers to the relationship classification.
The term cardinality refers to the specific number of instances possible for a relationship. Cardinality limits list
the minimum and maximum possible occurrences of the associated entity. In the husband and wife example, the
cardinality limit is (1,1), and in the case of a student who can take between one and eight courses, the cardinality
limits would be represented as (1,8).
The same applies even if the entity is recursive. The person entity that has an M:N relationship "is brother to"
also needs an intersection entity. You can come up with a good name for the intersection entity in this
case: brother. This entity would contain two fields, one for each person of the brother relationship — in other
words, the primary key of the first brother and the primary key of the other brother.
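The brother intersection entity can be sketched as follows, using SQLite through Python's sqlite3 module. The column names and sample people are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Intersection entity: one row per "is brother to" pair.
CREATE TABLE brother (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    brother_id INTEGER NOT NULL REFERENCES person(person_id),
    PRIMARY KEY (person_id, brother_id),
    CHECK (person_id != brother_id)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Abel"), (2, "Bekele"), (3, "Chala")])
conn.executemany("INSERT INTO brother VALUES (?, ?)", [(1, 2), (1, 3)])

rows = conn.execute("""
    SELECT p.name
    FROM brother AS b
    JOIN person AS p ON p.person_id = b.brother_id
    WHERE b.person_id = ?
    ORDER BY p.name
""", (1,)).fetchall()
brothers_of_abel = [name for (name,) in rows]
print(brothers_of_abel)
```

Both fields of brother are foreign keys to the same person table, and together they form the primary key of the intersection entity.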
ER-diagram example
Database Schema
A database schema is a description of a database. It is the skeleton structure that represents the logical view of
the entire database. It defines how the data is organized and how the relations among them are associated. It
formulates all the constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive detail of the
database, which can be depicted by means of schema diagrams. It’s the database designers who design the
schema to help programmers understand the database and make it useful.
A database schema can be divided broadly into two categories −
Physical Database Schema − This schema pertains to the actual storage of data and its form of
storage like files, indices, etc. It defines how the data will be stored in a secondary storage.
Logical Database Schema − This schema defines all the logical constraints that need to be applied on
the data stored. It defines tables, views, and integrity constraints.
Database Instance
It is important to distinguish these two terms. A database schema is the skeleton of the database. It
is designed before the database exists at all. Once the database is operational, it is very difficult to make
any changes to it. A database schema does not contain any data or information.
A database instance is the state of an operational database, with its data, at a given time. It contains a snapshot of the
database. Database instances tend to change with time. A DBMS ensures that every instance (state) is
valid by diligently following all the validations, constraints, and conditions that the database designers
have imposed.
Normalization
Normalization is a database design technique which organizes tables in a manner that reduces redundancy
and dependency of data.
It divides larger tables into smaller tables and links them using relationships.
1NF (First Normal Form) Rules
Each table cell should contain a single value.
Each record needs to be unique.
1NF Example
A transitive functional dependency exists when changing a non-key column might cause any of the
other non-key columns to change.
Consider Table 1. Changing the non-key column Full Name may change Salutation.
To move our 2NF table into 3NF, we again need to divide our table.
3NF Example
TABLE 1
Table 2
Table 3
We have again divided our tables and created a new table which stores salutations.
There are no transitive functional dependencies, and hence our table is in 3NF.
In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key that references the primary key in
Table 3.
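The split into a separate salutation table can be sketched with SQLite through Python's sqlite3 module. The table names, column names and rows below are assumptions made for the illustration, since the original tables are not reproduced here; only the member and salutation tables are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Hypothetical counterpart of Table 3: one row per salutation.
CREATE TABLE salutation (
    salutation_id INTEGER PRIMARY KEY,
    salutation    TEXT NOT NULL UNIQUE
);
-- Hypothetical counterpart of Table 1: refers to a salutation by its key.
CREATE TABLE member (
    member_id     INTEGER PRIMARY KEY,
    full_name     TEXT NOT NULL,
    salutation_id INTEGER NOT NULL REFERENCES salutation(salutation_id)
);
INSERT INTO salutation VALUES (1, 'Mr.'), (2, 'Ms.');
INSERT INTO member VALUES (1, 'Member One', 1), (2, 'Member Two', 2);
""")

# The salutation text lives in exactly one row, so it is changed in one place.
with conn:
    conn.execute("UPDATE salutation SET salutation = 'Mr' WHERE salutation_id = 1")

members = conn.execute("""
    SELECT m.full_name, s.salutation
    FROM member AS m
    JOIN salutation AS s ON s.salutation_id = m.salutation_id
    ORDER BY m.member_id
""").fetchall()
print(members)
```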
Logical Design
Overview
Once the conceptual design is finalized, it's time to convert this to the logical and physical design. Usually, the
DBMS is chosen at this stage, depending on the requirements and complexity of the data structures. Strictly
speaking, the logical design and the physical design are two separate stages, but are often merged into one.
Each entity will become a database table, and each attribute will become a field of this table. Foreign keys can
be created if the DBMS supports them and the designer decides to implement them. If the relationship is
mandatory, the foreign key must be defined as NOT NULL, and if it's optional, the foreign key can allow nulls.
For example, because of the invoice_line-to-product relationship in the previous example, the product code field
is a foreign key in the invoice_line table. Because the invoice line must contain a product, the field must be
defined as NOT NULL. Normalizing your tables is an important step when designing the database. This process
helps avoid data redundancy and improves your data integrity.
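A mandatory relationship implemented as a NOT NULL foreign key might look like the sketch below, using SQLite through Python's sqlite3 module. The column names and the product code P100 are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE product (
    product_code TEXT PRIMARY KEY,
    description  TEXT NOT NULL
);
CREATE TABLE invoice_line (
    invoice_line_id INTEGER PRIMARY KEY,
    -- Mandatory relationship: every invoice line must name an existing product.
    product_code    TEXT NOT NULL REFERENCES product(product_code),
    quantity        INTEGER NOT NULL
);
INSERT INTO product VALUES ('P100', 'Widget');
""")

def try_insert(product_code):
    """Insert one invoice line and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO invoice_line (product_code, quantity) VALUES (?, 1)",
                (product_code,),
            )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return "rejected: " + str(exc)

print(try_insert("P100"))  # existing product
print(try_insert(None))    # violates NOT NULL
print(try_insert("P999"))  # violates the foreign key
```

For an optional relationship, the same column would simply be declared without NOT NULL, so that an invoice line could be stored with no product.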
Novice database designers usually make a number of common errors. If you've carefully identified entities and
attributes and you've normalized your data, you'll probably avoid these errors.
Primary Key
A primary key uniquely identifies a record in the table. It can't accept null values. In some DBMSs (SQL Server,
for example), the primary key is by default a clustered index, and data in the table is physically organized in
the sequence of the clustered index. A table can have only one primary key.
Foreign Key
A foreign key is a field in the table that references the primary key of another table. A foreign key can accept
null values, and in more than one row. In many DBMSs, a foreign key does not automatically create an index, clustered
or non-clustered, but you can manually create an index on it. A table can have more than one foreign key.
Referential Constraints
Referential Constraints or simply Constraints are simple rules that can be created in SQL databases to
prevent unwanted/incorrect data from being entered into the database. The constraints are normally decided
before the SQL table is created and are created along with the table.
Here is an example. A SQL table stores the marks scored by students in a test and it is known that the scores can
only range from 0 to 100. In such a scenario, a constraint can be created in SQL to enforce that only values
between 0 and 100 can be entered in the marks field and any values outside this range will be rejected.
Most SQL implementations support the following constraints:
NOT NULL - When defined for a table field, this constraint ensures that the value entered cannot be
NULL. For example, a field may be created in a table to store the birth date of students. The birth date
field is mandatory as it is used extensively elsewhere, for example, to determine the eligibility of
students to register for courses or to apply for temporary work permits. Thus, it is obvious that the birth
date field cannot be left with a NULL value and the field has to be created with a NOT NULL
constraint.
UNIQUE - When defined for a table field, this constraint ensures that all the values entered for
this field must be unique. For example, a field may be created to store the social security
number of employees. In real life, all social security numbers are unique and the UNIQUE
constraint is used to enforce this in the database. Once this constraint is used, no duplicate
social security numbers will be accepted and if an attempt is made to enter a record
containing a duplicate number, an error alert will be immediately raised.
CHECK - When defined for a table field, the CHECK constraint ensures that the value entered is within the
permissible range of values. In the example mentioned earlier, the scores for students in a test can only be
between 0 and 100. The CHECK constraint is used to ensure that only valid values within this range are
accepted and anything outside is rejected.
DEFAULT - When defined for a table field, the DEFAULT constraint provides a default value for the
field when no specific value is provided. For example, it is known that most members of a society are
males and a SQL table is created to store member details with sex being one of the contained fields. As
most members are males, during data entry the sex field for male members can all be left blank and the
DEFAULT constraint will automatically substitute it with male. For female members, the value has to
be entered explicitly.
PRIMARY KEY - The PRIMARY KEY constraint applies to the table and not the fields as explained
for the four preceding constraints. PRIMARY KEY can be composed of either single or multiple fields
in the table and this constraint ensures that all records in the table are uniquely identified. An example
using this constraint is, a table has to be created to store the personal details of all employees in an
organization and as the employeeID field uniquely identifies all employees, the field can be used as the
PRIMARY KEY constraint.
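FOREIGN KEY - The FOREIGN KEY constraint is used to enforce referential integrity: every value in the foreign key
field must match an existing value of the referenced key in the related table (or be NULL, if nulls are allowed).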
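Several of the constraints listed above can be tried out together in one table. The sketch below uses SQLite through Python's sqlite3 module; the student table, its columns (including the national_id and status fields) and the sample values are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    student_id  INTEGER PRIMARY KEY,                    -- PRIMARY KEY
    national_id TEXT    NOT NULL UNIQUE,                -- UNIQUE
    birth_date  TEXT    NOT NULL,                       -- NOT NULL
    status      TEXT    NOT NULL DEFAULT 'active',      -- DEFAULT
    mark        INTEGER CHECK (mark BETWEEN 0 AND 100)  -- CHECK
)
""")

INSERT = "INSERT INTO student (national_id, birth_date, mark) VALUES (?, ?, ?)"

def accepted(params):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT, params)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid row": accepted(("ID-001", "2001-01-01", 85)),
    "duplicate national_id": accepted(("ID-001", "2002-02-02", 70)),  # UNIQUE
    "missing birth_date": accepted(("ID-002", None, 70)),             # NOT NULL
    "mark out of range": accepted(("ID-003", "2003-03-03", 101)),     # CHECK
}
# status was not supplied for the valid row, so the DEFAULT value is used.
default_status = conn.execute(
    "SELECT status FROM student WHERE national_id = 'ID-001'"
).fetchone()[0]
print(results, default_status)
```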
Referential Integrity
Referential integrity (RI) is a relational database concept, which states that table relationships must always be
consistent. In other words, any foreign key field must agree with the primary key that is referenced by the
foreign key. Thus, any primary key field changes must be applied to all foreign keys, or not at all. The same
restriction also applies to foreign keys in that any updates (but not necessarily deletions) must be propagated
to the primary parent key.
For example, consider a bank database with two tables:
CUSTOMER_MASTER Table: This holds basic customer/account holder data such as name, social
security number, address and date of birth.
ACCOUNTS_MASTER Table: This stores basic bank account data such as account type, account
creation date, account holder and withdrawal limits.
To uniquely identify each customer/account holder in the CUSTOMER_MASTER table, a primary key column
named CUSTOMER_ID is created.
To identify a customer and bank account relationship in the ACCOUNTS_MASTER table, an existing customer
in the CUSTOMER_MASTER table must be referenced. Thus, the CUSTOMER_ID column – also created in
the ACCOUNTS_MASTER table – is a foreign key. This column is special because its values are not newly
created. Rather, these values must reference existing and identical values in the primary key column of another
table, which is the CUSTOMER_ID column of the CUSTOMER_MASTER table.
Referential integrity means that a CUSTOMER_ID value in the CUSTOMER_MASTER table
may not be changed without also changing the corresponding value in the ACCOUNTS_MASTER table. For example, if
Andrew Smith’s customer ID is changed in the CUSTOMER_MASTER table, this change also must be applied
to the ACCOUNTS_MASTER table, thus allowing Andrew Smith’s account information to link to his customer
ID.
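One way to have the DBMS keep the two tables consistent is a foreign key declared with ON UPDATE CASCADE. The sketch below uses SQLite through Python's sqlite3 module; the column lists are trimmed, and the key values and the choice of ON UPDATE CASCADE are assumptions for the illustration, not the only way to enforce referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer_master (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE accounts_master (
    account_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL
                 REFERENCES customer_master(customer_id) ON UPDATE CASCADE,
    account_type TEXT NOT NULL
);
INSERT INTO customer_master VALUES (1, 'Andrew Smith');
INSERT INTO accounts_master VALUES (10, 1, 'savings');
""")

def orphan_rejected():
    """Try to create an account for a customer that does not exist."""
    try:
        with conn:
            conn.execute("INSERT INTO accounts_master VALUES (11, 99, 'checking')")
        return False
    except sqlite3.IntegrityError:
        return True

rejected = orphan_rejected()

# Changing the customer ID in the parent table is propagated to the child table.
with conn:
    conn.execute("UPDATE customer_master SET customer_id = 2 WHERE customer_id = 1")
linked_id = conn.execute(
    "SELECT customer_id FROM accounts_master WHERE account_id = 10"
).fetchone()[0]
print(rejected, linked_id)
```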
Index Design
Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes
on which the indexing has been done.
Clustered Indexes: Clustered indexes store row data in order. Only a single clustered index can be
created on a database table. This works efficiently only if data is sorted in increasing or decreasing
order or a limit is specified on the columns involved in the table. Such a sequential arrangement of data
on disks reduces block reads.
Non-Clustered Indexes: In non-clustered indexes, data is arranged in a random way, but a logical
ordering is internally specified by the index. Thus, the index order is not the same as the physical
ordering of data.
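The effect of an index on data retrieval can be observed with SQLite through Python's sqlite3 module. In this sketch the employee table and index name are assumptions; SQLite keeps ordinary table rows in the order of the integer primary key, and an index made with CREATE INDEX is a separate structure that plays the role of a non-clustered index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL,
    department  TEXT NOT NULL
);
-- Secondary index: its order is independent of how the rows are stored.
CREATE INDEX idx_employee_last_name ON employee (last_name);
""")
conn.executemany(
    "INSERT INTO employee (last_name, department) VALUES (?, ?)",
    [("Name%04d" % i, "Sales") for i in range(1000)],
)

def query_plan(sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

indexed_plan = query_plan(
    "SELECT * FROM employee WHERE last_name = ?", ("Name0500",))
unindexed_plan = query_plan(
    "SELECT * FROM employee WHERE department = ?", ("Sales",))
print(indexed_plan)    # mentions idx_employee_last_name
print(unindexed_plan)  # a scan of the whole table
```

The exact wording of the plan text differs between SQLite versions, so it is best read for whether the index name appears rather than compared literally.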
Data Dictionary
A data dictionary is a file or a set of files that contains a database's metadata. The data dictionary contains
records about other objects in the database, such as data ownership, data relationships to other objects, and
other data.
The data dictionary is a crucial component of any relational database. Because of its importance, it is invisible
to most database users. Typically, only database administrators interact with the data dictionary.
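Most DBMSs expose their data dictionary through system tables or views. As a small illustration, SQLite keeps its metadata in a table called sqlite_master, which can be queried from Python's sqlite3 module; the student table and index below are assumptions made for the example, and other DBMSs use different catalog names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id    INTEGER PRIMARY KEY,
    stu_LastName  TEXT NOT NULL,
    stu_FirstName TEXT NOT NULL
);
CREATE INDEX idx_student_last_name ON student (stu_LastName);
""")

# Metadata about the objects in the database: type, name and owning table.
objects = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Metadata about the columns of one table: name, type, NOT NULL flag, key flag.
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for (_cid, name, col_type, notnull, _default, pk)
    in conn.execute("PRAGMA table_info(student)")
]
print(objects)
print(columns)
```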
In a relational database, the metadata in the data dictionary includes the following: