Identify Physical Database Requirements LO2


NAME OF INSTITUTION

UNDER

Ethiopian TVET-System

INFORMATION TECHNOLOGY
DATABASE ADMINISTRATION
LEVEL III
LEARNING GUIDE # 6
Unit of Competence : Identify Physical Database Requirements

Module Title : Identifying Physical Database Requirements


LG Code : ICT DBA3 M02 06
TTLM Code : ICT DBA3 TTLM 0817
INFORMATION SHEET 3
LO2: IDENTIFY DATABASE REQUIREMENTS

Understanding user needs


Designing any custom product, whether it’s a database, beach house, or case mod, is largely a translation
process. You need to translate the customers’ needs, wants, and desires from the sometimes-fuzzy ideas
floating around in their heads into a product that meets the customers’ needs. The first step in the translation
process is understanding the user’s requirements. Unless you know what the user needs, you cannot build it.
Designing the best order processing database imaginable won’t do you a bit of good if the customer really
wants a circuit design database or an ostrich race handicapping system. Just as the database design forms the
foundation upon which the rest of the application’s development stands, your understanding of the user’s
needs forms the foundation of the database design. If you don’t know what the user needs, how can you
possibly design it?
This information sheet explains techniques that you can use to learn about the customer’s
needs. It describes methods that you can use to record those needs in a concrete and verifiable way. The
sections that follow describe some of the steps you can take to better understand the customers’ needs. In
some projects, you may not need to follow all of these steps. In other projects, the steps may work best in a
different order. In this information sheet, you learn how to:
· Understand the customers’ needs and motivations.
· Gather and document user requirements.
· Cull requirements from existing practices and information.
· Build use cases to understand the user’s needs and to measure success or failure.
· Anticipate changes and future needs to build the most flexible database possible.

Bring a List of Questions


From the very first day, you should start thinking of questions to ask the customers to get a better idea of the
project’s goals and scope.
The following sections list some questions that you can ask your customers to help understand their needs.
You’ll see many of them described in greater detail later in this information sheet. This list is by no means
complete; the questions that you need to ask will depend to a large extent on the type of project. Use them
only as a starting point. It’s helpful to have something to work from when you start, however. Then you can
wander off in promising directions as the discussions continue.

Functionality
These questions deal with what the system is supposed to accomplish and, to a lesser extent, how. It is
usually best to avoid deciding how the system should do anything until you thoroughly understand what it
should do so you don’t become locked into one idea too early, but it’s still useful to record any impressions
the customers have of how the system should work.
· What should the system do?
· Why are you building this system? What do you hope it will accomplish?
· What should it look like? Sketch out the user interface.
· What response times do you need for different parts of the system? (Typically, interactive response times
should be under five seconds, whereas reports and other offline activities may take longer.)
· What reports are needed?
· Do the end users need to be able to define new reports?
· Who are the players? (See “Learn Who’s Who” later in this information sheet.)
· Do power users and administrators need to be able to define new reports?

Data Needs
These questions help clarify the project’s data needs. Knowing what data is needed will help you start
defining the database’s tables.
· What data is needed for the user interface?
· Where should that data come from? How are those pieces of data related?
· How are these tasks handled today? Where does the data come from?

Data Integrity
These questions deal with data integrity. They help you define some of the integrity constraints that you will
build into the database.
· What values are allowed in which fields?
· Which fields are required? (For example, does a customer record need a phone number? A fax number?
An email address? One of those but not all of them?)
· What are the valid domains (allowed values) for various fields? What phone number formats are allowed?
How long can customer names be? Addresses? Do addresses need extra lines for suite or apartment number?
Do addresses need to handle U.S. ZIP Codes such as 12345? ZIP+4 Codes such as 12345-6789? Canadian
postal codes such as T1A 6G9? Or other countries’ postal codes?
· Which fields should refer to foreign keys? (For example, an address’s State field might need to be in the
States table and a Customer field might need to be in the Customers table. I’ve seen customers with a big
list of standard comments, where a Comments field can take only those values.)
· Should the system validate cities against postal codes? (For example, should it verify that the 10005 ZIP
Code is in New York City, New York? That’s cool but a bit tricky and can involve a lot of data.)
· Do you need a customer record before you can place orders?
· If a customer cancels an account, do you want to delete the corresponding records or just flag them as
inactive?
· What level of system reliability is needed?
· Does the system need 24/7 access?
· How volatile is the data? How often does it need to be backed up?
· How disastrous will it be if the system crashes?
· How quickly do you need to be back up and running?
· How painful will it be if you lose some data during a crash?
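Several of the data integrity questions above map directly onto constraints that a database can enforce. The following is only a minimal sketch using Python's built-in sqlite3 module. The table names, column names, the 40-character name limit, and the rule "at least a phone number or an email address" are all assumptions made up for illustration; the real rules must come from the customers' answers.

```python
import sqlite3

# Hypothetical schema for illustration only; every name and limit is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE States (
    StateCode TEXT PRIMARY KEY                     -- allowed values for State
);
CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL CHECK (length(Name) <= 40),  -- required field, limited length
    Phone  TEXT,
    Email  TEXT,
    State  TEXT NOT NULL REFERENCES States(StateCode), -- must exist in States
    Active INTEGER NOT NULL DEFAULT 1 CHECK (Active IN (0, 1)),
    CHECK (Phone IS NOT NULL OR Email IS NOT NULL)     -- one contact method is enough
);
CREATE TABLE Orders (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId)
);
INSERT INTO States VALUES ('NY');
""")


def try_statement(sql, params=()):
    """Run one statement; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A customer with a phone number and a known state is accepted.
ok = try_statement(
    "INSERT INTO Customers (Name, Phone, State) VALUES (?, ?, ?)",
    ("Ann Archer", "555-0100", "NY"),
)
# An order for a customer that does not exist is rejected.
orphan = try_statement("INSERT INTO Orders (CustomerId) VALUES (?)", (99,))
# Cancelling an account by flagging it inactive instead of deleting it.
flagged = try_statement("UPDATE Customers SET Active = 0 WHERE CustomerId = ?", (1,))
```

Whether a cancelled account is deleted or flagged, and which fields are required, are decisions for the customers; the sketch only shows where each answer would end up in the design.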

Security
These questions focus on the application’s security. The answers to these questions will help you decide
which database product will work best (different products provide different forms of security) and what
architecture to use.
· Does each user need a separate password? (Generally, a good idea.)
· Do different users need access to different pieces of data? (For example, sales clerks might need to access
customer credit card numbers but order fulfillment technicians probably don’t.)
· Does the data need to be encrypted within the database?
· Do you need to provide audit trails recording every action taken and by whom? (For example, you can see
which clerk increased the priority of a customer who was ordering the latest iPod and then ask that clerk
why that happened.)
· What different classes of users will there be?
There are often three classes of users. First, clerks do most of the regular work. They enter orders, print
invoices, and so forth. Second, supervisors can do anything that clerks can and they also perform managerial
tasks. They can view reports, logs, and audit trails; assign clerks to tasks; grant bonuses; and so forth. Third,
super users or key users can do everything. They can reset user passwords, go directly into database tables to
fix problems, change system parameters such as the states that users can pick from dropdowns, and so forth.
There should only be a couple of super users and they should usually log in as supervisors, not as super users,
to prevent accidental catastrophes.
· How many of each class of user will there be? Will only one person need access to the data at a time? Will
there be hundreds or even thousands (as is the case with some Web applications)?
· Is there existing documentation describing the users’ tasks and responsibilities?
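The three classes of users described above, together with the audit trail question, can be pictured with a small sketch. It is written in plain Python rather than with any particular database product's security features, and the action names and the "minimum class required" table are hypothetical. It assumes, as the text says, that each class can do everything the class below it can.

```python
from enum import IntEnum


class UserClass(IntEnum):
    """User classes from the text, ordered so a higher class includes the lower ones."""
    CLERK = 1
    SUPERVISOR = 2
    SUPER_USER = 3


# Hypothetical actions and the lowest class allowed to perform each one.
REQUIRED_CLASS = {
    "enter_order": UserClass.CLERK,
    "print_invoice": UserClass.CLERK,
    "view_audit_trail": UserClass.SUPERVISOR,
    "assign_clerk_to_task": UserClass.SUPERVISOR,
    "reset_password": UserClass.SUPER_USER,
    "edit_tables_directly": UserClass.SUPER_USER,
}

# Audit trail: one entry per attempted action, recording who tried what and the outcome.
audit_trail = []


def perform(user_name, user_class, action):
    """Return True if the user's class permits the action; log the attempt either way."""
    allowed = user_class >= REQUIRED_CLASS[action]
    audit_trail.append((user_name, action, allowed))
    return allowed


clerk_enters_order = perform("clerk_1", UserClass.CLERK, "enter_order")
clerk_reads_audit = perform("clerk_1", UserClass.CLERK, "view_audit_trail")
supervisor_enters_order = perform("supervisor_1", UserClass.SUPERVISOR, "enter_order")
```

In a real project this kind of rule is usually enforced by the database product's own accounts and privileges, which is one reason the answers to these questions influence which product you choose.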
Environment
These questions deal with the project’s surrounding environment. They gather information
about other systems and processes that the project will replace or with which it will interact.
· Does this system enhance or replace an existing system?
· Is there documentation describing the existing system?
· Does the existing system have paper forms that you can study?
· What features in the existing system are required? Which are not?
· What kinds of data does the existing system use? How is it stored? How are different pieces of
data related?
· Is there documentation for the existing system’s data?
· Are there other systems with which this one must interact?
· Exactly how will it interact with them?
· Will the new project send data to existing systems? How?
· Will the new project receive data from existing systems? How?
· Is there documentation for those systems?
· How does your business work? (Try to understand how this project fits into the bigger
picture.)

Meet the Customers


Before you can start any project, you need to know what it is about. Are you building an inventory system, a
supply chain model, or a stock price tracker and predictor? The best way to understand the system you need to
design and build is to interrogate the customers. Learning about the customers’ requirements can be a long
process. It can take days or even weeks of studying existing practices, poring over corporate documentation,
and spying on the customers while they do their daily jobs.
The goal is to give you an absolute and complete understanding of the problem you’re attempting to solve.
You want as few surprises as possible after you’re done researching the problem. Unexpected difficulties and
feature requests are the biggest reasons why software projects finish late, come in over budget, or fail
completely.
The sooner you can identify potential problems and the more completely you can identify the system’s
features, the easier it will be for you to plan for them and the less they will mess up your meticulously crafted
plan. Your initial encounters with the customer give you your first chance to address these issues so they
don’t bite you later. So, when you first start a project, meet the customers. Get to know them and what they
do. Even if the problem you are trying to solve is only a small part of their business, get a feel for the overall
picture. Sometimes you’ll find unexpected connections that may make your job easier or that may lead to
surprising benefits in a completely unrelated area.
When you first meet the customers, it usually doesn’t hurt to warn them that you’re going to be a major pest
for a while. This can also help you figure out who’s who. Those who are committed to the project and are
eager to succeed will take your warning well. Those who are less than dedicated may tip their hands at this
point.

Learn Who’s Who


Ideally a project team works well together, everyone does the best possible job without conflict, and the
project moves along smoothly to create a finished product that meets the customers’ needs. In practice,
however, it doesn’t always work out that way. Everyone has his or her own personal abilities, agenda, and
motivation that don’t always coincide with those of the other team members.
As you get to know the customers (and your team members), it’s important to realize that not everyone
shares the same vision of the product. You need to figure out which customer is the leader, which are team
players, and which have little or no say in specifying the project. The following list describes some of the
roles that customers (and developers) often play in a project. Naturally these cannot categorize everyone, but
they define some characteristics that you should look for.
· Executive Champion. This is the highest-ranking customer driving the project. Often this person doesn’t
participate in the project’s day-to-day trials and tribulations.
· Customer Champion. This person has a thorough understanding of the customers’ needs.
Lesser champions may help define pieces of the project but this is the person you run to when
the others are unclear. For the purposes of this information sheet, this is the most
important person on the project. This person must have enough time and resources to help you
define the project and answer your questions. Ideally this person also has enough influence to
make decisions.
· Customer Representative. A Customer Representative is someone assigned to answer your
questions and help define the project. Often these are people who do the day-to-day work of
your customers’ business. Sometimes they are experts in only parts of the business so you
need more than one to cover all of the issues.
· Stakeholder. This is anyone who has an interest in the project. Some of these fall into other
categories such as Customer Champion or Customer Representative. Others are affected by
the outcome but have no direct say in the design of the system. For example, front-line clerks
rarely get to toss in their opinion when you design a point-of-sales system. Though many of
them have no direct power over the outcome, you should keep them in mind.
· Sidekick/Gopher. This is someone who can help you get the more commonplace resources
you need such as conference rooms, airline tickets, and food. Though this isn’t a glamorous
job, an effective Sidekick can make everything run more smoothly.
· Short-Timer. This is someone who is only going to be around for a short while. This may be
someone who is about to be promoted to a new division, who will retire soon, or who is just
plain fed up and about to walk. A dedicated Short-Timer can be a huge asset, particularly those
who are about to retire and take a lifetime’s worth of experience with them. Others don’t care
all that much whether the project succeeds or fails after they’re gone.
· Critic. This is an important role for avoiding groupthink. Left unchecked, some groups
become irrationally optimistic and can make extremely poor decisions. A Critic keeps the project
realistic... as long as the Critic doesn’t get out of hand. The purpose of the Critic is to maintain a
reality check, not to defeat the entire project.
· Convert. This is someone who originally is against the project but who you convert to your
cause.
· Generic Bad Guy. These range from simple defeatists and layabouts to anyone actively trying to
sabotage the project.
Don’t feel constrained by this list. These are just some of the characters that I’ve encountered and you may
meet others. Identify the main players as quickly as possible so you know who to ask questions and where to
run when the fighting erupts.
Pick the Customers’ Brains
Once you figure out more or less who the movers and shakers are, you can start picking their brains. Sit
down with the Customer Champion and find out what the customers think they need. Find out what they
think the solution should look like. Find out what data they think it should contain, how that data will be
presented, and how different parts of the data are related. Get input from as many Stakeholders as you can.
Always keep in mind, however, that the Customer Champion is the one who understands the customers’
needs thoroughly and has the authority to make the final decisions. Though you should consider everyone’s
opinions, the Customer Champion has the final word.
Depending on the scope of the project, this can take a while. Take your time and make sure the customers
have finished telling you what they think they need.
Walk a Mile in the User’s Shoes
Often following the customers’ day-to-day operations can give you some extremely helpful perspective.
Ideally you could do the customers’ jobs for them for a while to thoroughly learn what’s involved. Unless
your customers are in your industry, however, you probably aren’t qualified to do their jobs. Though you
may not be able to actually do the customers’ jobs, you may be able to sit next to them while they do them.
Warn them that you will probably reduce productivity slightly by asking stupid and annoying questions. Then
ask away. Take notes and learn as much as you can. Sometimes your outsider’s point of view can lead to ideas
that the customers would never have discovered. Draw pictures and diagrams if that helps you visualize what
they’re doing. Pictures can also be very helpful in asking the customers if you have the right idea. If the
customers will let you, print screenshots and even take photographs.
Study Current Operations
After you’ve walked a mile or two in the customers’ shoes, see if there are other ways that you can study the
current operation. Often companies have procedure manuals and documentation that describes the customers’
roles and responsibilities. Look around for any existing databases that the customers use. There are many
different kinds of databases. Don’t just look for relational databases. Look also for note files, filing cabinets,
boxes of index cards, tickler files, and so forth. Generally, snoop around and find out what information is
kept where. Figure out how that information is used and how it relates to other pieces of information. Often
different physical databases contain redundant information and that forms a relationship. For example, a filing cabinet
holding information about customers includes all of the customers’ data. A pile of invoices also includes the
customers’ names, addresses, ID numbers, and other information that is duplicated in the customer files.
Paper orders probably contain the same information. These are the sorts of pieces of data that tie the whole
process together.

Brainstorm
At this point, you should have a decent understanding of the customers’ business and needs. To make sure
the customer hasn’t left anything out, you can hold brainstorming sessions. Bring in as many Stakeholders as
you can and let them run wild. Don’t rule out anything yet. If a stakeholder says the database should record
the color of customers’ shoes when they make a purchase, write it down. Continue brainstorming until
everyone has had their say and it’s clear that no new ideas are appearing. Occasionally extra creative people
look like they’re going to go on forever. Let them go for a while but if it’s clear they really can’t stem the
flood of ideas, split up. Have everyone go off separately and write down anything else relevant that they can
think of. Then come back and dump all of the ideas in a big pile. Try not to let the Customer Champion
suppress the others’ creativity too early. Though the Customer Champion has the final say, the goal right now
is to gather ideas, not to decide which ones are the best. The goal at this point isn’t to accept or eliminate
anything as much as it is to write everything down. You want to be sure that everything relevant is
considered before you start designing. Later, when you’ve started laying out tables and indexes and changes
are more difficult to make, you don’t want someone to step in and say, “Owl voltages! Why didn’t someone
think of owl voltages?” Hopefully you have owl voltages written down somewhere and
crossed out so you can say they were considered and everyone agreed they were not a priority.
Look to the Future
During the brainstorming process, think about future needs. Explicitly ask the customers what they might
like to have in future releases. You may be able to include some of those ideas in the current project, but
even if you can’t it’s nice to know where things are headed. That will help you design your database flexibly
so you can more easily incorporate changes in the future. For example, suppose your customer Paula Marble
runs a plumbing supply shop but thinks someday it might be nice to add a little café and call the whole thing
“Paula’s Plumbing and Pastries.” Think about how this might affect the database and the rest of the project.
Plumbing supplies are generally non-perishable, but pastries must be baked fresh daily and the ingredients
that go into pastries are perishable. You may want to think about using separate inventory tables to hold
information about non-perishable plumbing items that clients can purchase (gaskets, thread tape, pipe
wrenches) and perishable cooking items that the clients won’t buy directly (flour, eggs, raisins). You might
not even track quantity in stock for finished pastries (the clients either see them in the case or not) but you
probably want to be able to record prices for them nonetheless. In that case, you will have entries in an
inventory table that will contain prices but that will never hold quantities. You don’t necessarily need to start
planning the future database just yet, but you can keep these future changes in mind as you study the rest of
the problem.
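One possible shape for the tables discussed above is sketched below with Python's built-in sqlite3 module. It is only one of several reasonable designs, and every table name, column name, price, and date is invented for illustration. Items that clients can buy carry a price, and their quantity is allowed to be missing (NULL) so that finished pastries can be priced without being counted; ingredients sit in a separate table with a use-by date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table of things clients can purchase.
CREATE TABLE SaleItems (
    ItemId          INTEGER PRIMARY KEY,
    Name            TEXT NOT NULL,
    Price           REAL NOT NULL,
    QuantityInStock INTEGER          -- NULL means the quantity is not tracked
);
-- Hypothetical table of perishable ingredients that clients never buy directly.
CREATE TABLE Ingredients (
    IngredientId   INTEGER PRIMARY KEY,
    Name           TEXT NOT NULL,
    QuantityOnHand REAL NOT NULL,
    UseBy          TEXT NOT NULL     -- ISO date, since ingredients are perishable
);
INSERT INTO SaleItems (Name, Price, QuantityInStock) VALUES ('Pipe wrench', 18.50, 12);
INSERT INTO SaleItems (Name, Price, QuantityInStock) VALUES ('Thread tape', 1.25, 200);
INSERT INTO SaleItems (Name, Price, QuantityInStock) VALUES ('Raisin scone', 3.00, NULL);
INSERT INTO Ingredients (Name, QuantityOnHand, UseBy) VALUES ('Flour', 25.0, '2030-01-31');
""")

# Items that have a price but no tracked quantity (the finished pastries).
untracked = [
    name
    for (name,) in conn.execute(
        "SELECT Name FROM SaleItems WHERE QuantityInStock IS NULL ORDER BY Name"
    )
]
```

Allowing the quantity to be missing from the start costs almost nothing now and avoids restructuring the table if the café ever opens.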

Understand the Customers’ Reasoning


Occasionally you’ll come across a customer who thinks he knows something about database design. He may
say that you should use a particular table structure or an object-relational hierarchical data model. Sometimes
these suggestions make perfect sense. Other times you’ll think the customer just stumbled into an endless
swamp of techno-babble. Even if the suggestions seem to make no sense whatsoever, don’t dismiss them out
of hand. Remember that the customer has a different perspective than you do. The customer knows a lot
more than you about his particular business. He may or may not know anything about database design, but
it’s entirely possible that he has a reason for his obscure requests.
For example, suppose you’re trying to design a sales and inventory system for Thor’s Thimbles. The
president and CEO, Thor, says he thinks you need to use a temporal database. You think, “How hard can it be
to sell thimbles?” and ignore him. After you spend a month building a really slick relational database you
discover that old Thor isn’t so naïve after all. It turns out that the company sells hundreds of different models
of thimbles made from such materials as stainless steel, anodized aluminum, gold, and platinum. The value
of the more exotic models changes daily with precious metal prices. Suddenly what you thought was a simple
problem really does have hundreds of variables changing rapidly over time and you realize that you probably
should have built a temporal database.
Even if a customer’s suggestion seems odd, take it seriously. Dig deeper to find out why the customer thinks
that will be useful.
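To make Thor's concern concrete, the sketch below shows the kind of question time-varying data raises: "what did this thimble cost on a given day?" It is not a true temporal database. It only imitates one idea, keeping dated price rows in an ordinary relational table, using Python's built-in sqlite3 module. The table, the model name, and all prices and dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical price history: one row per model per date on which the price changed.
CREATE TABLE ThimblePrices (
    Model     TEXT NOT NULL,
    ValidFrom TEXT NOT NULL,     -- ISO date the price took effect
    Price     REAL NOT NULL,
    PRIMARY KEY (Model, ValidFrom)
);
INSERT INTO ThimblePrices VALUES ('Gold Classic', '2024-01-01', 310.0);
INSERT INTO ThimblePrices VALUES ('Gold Classic', '2024-01-02', 318.5);
INSERT INTO ThimblePrices VALUES ('Gold Classic', '2024-01-05', 305.0);
""")


def price_on(model, day):
    """Return the price in effect for a model on an ISO date, or None if none is recorded."""
    row = conn.execute(
        """
        SELECT Price FROM ThimblePrices
        WHERE Model = ? AND ValidFrom <= ?
        ORDER BY ValidFrom DESC
        LIMIT 1
        """,
        (model, day),
    ).fetchone()
    return row[0] if row else None


price_jan_3 = price_on("Gold Classic", "2024-01-03")   # uses the row dated 2024-01-02
price_before = price_on("Gold Classic", "2023-12-31")  # no price recorded yet
```

If a customer's answers reveal many values that change like this, that is a sign to ask more questions about history and time before settling on a design.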
Learn What the Customers Really Need
Sometimes the customers don’t really understand what they need. They think they do and they almost
certainly understand the symptoms of their problems, but they don’t always make the right cause-and-effect
connections. Sometimes customers think a database or a new computer program will magically increase their
sales and reduce their costs. In fact, a well-designed database will increase consistency, reduce data entry
errors, provide reports, and otherwise help the customers manage their data, but that won’t necessarily
translate into higher profits. As you look over the customers’ operation, keep in mind that their real goals
may not be exactly what they think they are. Their real goals probably include things such as making bigger
profits, making fewer mistakes, and finishing their daily work in time.
Look for the real causes of the customers’ problems and think about ways you can address them. If you can
see a way to improve operations, suggest it.
Prioritize
At this point, you should have a fair understanding of the customers’ business, at least the pieces that are
relevant to your project. You should understand at least roughly which customers will be playing which
roles. At a minimum, you should know who the Customer Champion and Customer Representatives are so
you know who to ask questions. You should also have a big list of desired features. This list will probably
include a lot of things that would be nice to have but that are obviously unrealistic. It may also include things
that are reasonable but that would take too much time for your current project. To narrow the wish list to
manageable scope, sit down with the customers and help them prioritize. You’ll need the Customer
Representatives who understand what is needed so they can make the decisions. Sometimes you may need
the Customer Champion either in the meeting or available for consultation to make the tough calls. Group the
features into three categories.
Priority 1 (or release 1) features are things that absolutely must be in the version of the project that you’re
about to start building. This should be the bare-bones essentials without which the project will be a failure.
Priority 2 (or release 2) features are those that the customers can live without until the first version is in use
and you have time to start working on the next version. If development goes well, you may be able to pull
some of these features into the first release but the customers should not count on it.
Priority 3 (or release 3) features are those that the customers think would be nice but that are less important
than the priority 1 and 2 features. You can ignore them for now.
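The three-way grouping above can be recorded very simply. The sketch below is in Python; the feature names and their priorities are invented examples, and in practice the customers assign them.

```python
from collections import defaultdict

# Hypothetical wish list: (feature, priority agreed with the customers).
wish_list = [
    ("Enter orders", 1),
    ("Print invoices", 1),
    ("User-defined reports", 2),
    ("Record the color of customers' shoes", 3),
]


def group_by_priority(features):
    """Group feature names under priority 1, 2, or 3, rejecting any other value."""
    groups = defaultdict(list)
    for name, priority in features:
        if priority not in (1, 2, 3):
            raise ValueError(f"unknown priority {priority!r} for {name!r}")
        groups[priority].append(name)
    return dict(groups)


groups = group_by_priority(wish_list)
release_1 = groups[1]  # the bare-bones essentials for the first version
```

Keeping the priority 2 and 3 items in the same list, rather than discarding them, preserves a record that they were considered.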
Make Use Cases
A use case is a script that the users can follow to practice solving a particular problem that they will face
while using your finished product. These can range in complexity from the very simple such as logging in or
closing the application, to the extremely complex such as scheduling a fleet of trucks. Depending on how
complete the user interface design is when you are writing the use cases, these may be sketchy or extremely
detailed. They may spell out every keystroke and mouse movement that the user must make or they may
provide vague instructions such as, “The user will use the Order Entry form to place a new order.” When
the project is finished, the customers should review all of the use cases and verify that the finished project
can handle them all. Some of the things that you might specify when writing up use cases include:
· Goals. A summary of what the use case should achieve.
· Summary. An executive overview that your Executive Champion can understand.
· Actors. Who will do what? This includes people, your finished system, other systems, and so forth. Anyone
or anything that will do something.
· Pre- and post-conditions. The conditions that should be true before and after the use case is finished. For
example, a pre-condition to placing a new order might be that the client placing the order already exists.
· Normal Flow. The normal steps that occur during the use case.
· Alternative Flow. Other ways the use case might proceed. For example, when a user tries to look up a
customer, what happens if the customer isn’t there?
· Notes. Just in case there are special considerations that the person following the use case needs to know.
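The items listed above can be kept in a simple record so that every use case is written down the same way and can be ticked off when the customers review the finished project. The sketch below is in Python; the field names follow the list, while the "verified" flag and the sample use case are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UseCase:
    """One use case, with one field for each item a write-up might specify."""
    goal: str
    summary: str
    actors: List[str]
    preconditions: List[str]
    postconditions: List[str]
    normal_flow: List[str]
    alternative_flows: List[str] = field(default_factory=list)
    notes: str = ""
    verified: bool = False  # hypothetical flag set when the customers accept the result


def unverified(use_cases):
    """Return the goals of use cases the finished project has not yet been checked against."""
    return [uc.goal for uc in use_cases if not uc.verified]


new_order = UseCase(
    goal="Create a new order for an existing customer",
    summary="A clerk records an order for a customer already on file.",
    actors=["Clerk", "Order entry system"],
    preconditions=["The customer placing the order already exists"],
    postconditions=["The order is stored and linked to the customer"],
    normal_flow=["Look up the customer", "Enter the order items", "Save the order"],
    alternative_flows=["The customer is not found: create the customer record first"],
)

remaining = unverified([new_order])
```

A list like this also gives a direct measure of progress at the end of the project: the work is not finished while any use case remains unverified.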
Many developers like to draw use case diagrams to show what actors perform what tasks. These seem to
usually work at one of two levels.
A higher-level use case shows which actors perform which tasks. For example, the Student actor enrolls in a
class and takes the class, the Instructor actor teaches the class and assigns grades, and so forth. This type of
use case diagram provides little detail about how the actors accomplish their tasks. It’s useful early on when
you know what you want to do but don’t yet know how the system will do it. Figure 4-1 shows a high-level
use case diagram. Actors are shown as stick figures, tasks are shown in ellipses, and lines connect actors to
tasks. More elaborate use case diagrams use other kinds of arrows, lines, and annotations to provide more
detail.

FIGURE 4-1. High-Level Use Case Diagram

The second kind of use case lists more specific steps that actors take to perform a task, although the steps are
still listed at a fairly high level. Neither of these kinds of use case diagram provides enough detail to use as a
script for testing, although they do list the cases that you must test. Because they are shown at such a high
level, they are great for executive presentations. For more information on use case diagrams, look for books
about UML (Unified Modeling Language), which includes use case diagrams, or search the Web for “use
case diagram.”
Typical use cases might include:
· The user logging in.
· The user logging out.
· Switching users (if the program allows that).
· Creating a new customer record.
· Editing a customer record.
· Marking a customer record as inactive.
· Creating a new order for an existing customer.
· Creating a new order for a new customer.
· Creating an invoice for an order.
· Sending out late payment notices.
· Creating a replacement invoice in case the customer lost one.
· Receiving a payment.
· Defining a new inventory item (when the CEO decides that you should start selling Rogaine for
Dogs).
· Adding new items to inventory (for example, when you restock your fuzzy dice supply).

The list can go on practically forever. A large project can include hundreds of use cases and it may take quite
a while to write them all down and then later verify that the finished project handles them all. In addition to
being measurable, use cases should be as realistic as possible. There’s no point in verifying that the program
can handle a situation that will never occur in real life.

Summary
Building any custom product is largely a translation process whether you’re building a small database, a
gigantic Internet sales system similar to the one used by Amazon, or a really tricked-out snowboard. You
need to translate the half-formed ideas floating around in the minds of your customers into reality. The first
step in the translation process is understanding the customers’ needs. This information sheet explained ways
you can gather information about the customers’ problems, wishes, and desires so you can take the next step
in the process.
In this information sheet you learned how to:
· Try to decide which customers will play which roles.
· Pick the customers’ brains for information.
· Look for documentation about user roles and responsibilities, existing procedures, and
existing data.
· Watch customers at work and study their current operations directly.
· Brainstorm and categorize the results into priority 1, 2, and 3 items.
· Write use cases.
After you’ve achieved a good understanding of the customers’ needs and expectations, you can start turning
them into data models.
K2
Match the following customer roles with their corresponding descriptions.

Customer roles:
__________ 1. Convert
__________ 2. Customer Champion
__________ 3. Customer Representative
__________ 4. Critic
__________ 5. Executive Champion
__________ 6. Generic Bad Guy
__________ 7. Short-Timer
__________ 8. Sidekick/Gopher
__________ 9. Stakeholder

Descriptions:
A. Someone who won’t be around for long. May be helpful or may not care all that much.
B. Answers your questions about the project.
C. Anyone who has an interest in the project.
D. Makes things generally run smoothly. Not glamorous but very useful.
E. Provides a reality check and prevents groupthink.
F. Ranges from annoying naysayer to malicious saboteur/super villain.
G. A user who originally was against your project that you include in the development process to bring them
onto your side.
H. The highest-ranking customer driving the project.
I. Willing to fight super villains.
J. Thoroughly understands the customers’ needs. Has the authority to make decisions that stick.
CHOOSE PART 1
__________ 1. Which of the following does not describe a use case?
a. A script for performing some task.
b. Should describe a realistic operation.
c. Should cover the customer’s entire operation from start to finish.
d. Should be verifiable.
__________ 2. Brainstorming sessions should ideally include:
a. Customer Representatives.
b. A Critic.
c. All interested Stakeholders.
d. All of the above.
__________ 3. If a customer says you should use a hierarchical XML database, you should:
a. Politely say, “Thank you,” and ignore this nugget of wisdom.
b. Ask the customer why he thinks that.
c. Do as the customer says.
d. Study the problem to see if that kind of database makes sense.
__________ 4. During a visit to view the customers’ operation, you see someone repeatedly stamping the
front
of an order with the current date, turning the order over, turning it over again, and stamping
the front with the date again. You should:
a. Ask someone what that’s all about.
b. Suggest that the manager fire this crazy and possibly dangerous employee.
c. Ignore the whole issue and stay focused on your own tasks.
d. Avoid eye contact with this employee at all costs.
_________ 5. Which of the following is not a security issue that you should consider when studying the
project?
a. The number of classes of users the database must support.
b. Whether you need to provide audit trails to record changes to the data.
c. The frequency with which you need to perform backups.
d. Whether the users should have individual passwords.
__________ 6. You are called upon to design a database for a florist shop named “Frank’s Floral
Fantasies.” Frank thinks that he might want to track the medicinal and homeopathic properties of his plants
because he thinks that might improve his sales of echinacea, St. John’s Wort, and other plants. What
priority should this requirement get?
INFORMATION SHEET 4
LO2: IDENTIFY DATABASE REQUIREMENTS

PHYSICAL DATABASE DESIGN AND PERFORMANCE
After studying this information sheet, you should be able to:
· Concisely define key terms.
· Describe the physical database design process, its objectives, and its deliverables.
· Choose storage formats for attributes from a logical data model.
The Physical Database Design Process
To make life a little easier for you, many physical database design decisions are implicit or eliminated when
you choose the database management technologies to use with the information system you are designing.
Because many organizations have standards for operating systems, database management systems, and data
access languages, you must deal only with those choices not implicit in the given technologies. Thus, we will
cover only those decisions that you will make most frequently, as well as other selected decisions that may
be critical for some types of applications, such as online data capture and retrieval.
The primary goal of physical database design is data processing efficiency. Today, with ever-decreasing
costs for computer technology per unit of measure (both speed and space measures), it is typically very
important to design a physical database to minimize the time required by users to interact with the
information system. Thus, we concentrate on how to make processing of physical files and databases
efficient, with less attention on minimizing the use of space.
Designing physical files and databases requires certain information that should have been collected and
produced during prior systems development phases. The information needed for physical file and database
design includes these requirements:
· Normalized relations, including estimates for the range of the number of rows in each table
· Definitions of each attribute, along with physical specifications such as maximum possible length
· Descriptions of where and when data are used in various ways (entered, retrieved, deleted, and
updated, including typical frequencies of these events)
· Expectations or requirements for response time and data security, backup, recovery, retention,
and integrity
· Descriptions of the technologies (database management systems) used for implementing the
database
Physical database design requires several critical decisions that will affect the integrity and performance
of the application system. These key decisions include the following:
· Choosing the storage format (called data type) for each attribute from the logical data model.
The format and associated parameters are chosen to maximize data integrity and to minimize
storage space.
· Giving the database management system guidance regarding how to group attributes from the
logical data model into physical records. You will discover that although the columns of a
relational table as specified in the logical design are a natural definition for the contents of a
physical record, this does not always form the foundation for the most desirable grouping of
attributes.
· Giving the database management system guidance regarding how to arrange similarly
structured records in secondary memory (primarily hard disks), using a structure (called a file
organization) so that individual and groups of records can be stored, retrieved, and updated
rapidly. Consideration must also be given to protecting data and recovering data if errors are
found.
· Selecting structures (including indexes and the overall database architecture) for storing and
connecting files to make retrieving related data more efficient.
· Preparing strategies for handling queries against the database that will optimize performance
and take advantage of the file organizations and indexes that you have specified. Efficient
database structures will be of benefit only if queries and the database management systems
that handle those queries are tuned to intelligently use those structures.

Data Volume and Usage Analysis


As mentioned previously, data volume and frequency-of-use statistics are important inputs to the physical
database design process, particularly in the case of very large-scale database implementations. Thus, you
have to maintain a good understanding of the size and usage patterns of the database throughout its life cycle.
In this section, we discuss data volume and usage analysis as if it were a one-time static activity, but in
practice, you should continuously monitor significant changes in usage and data volumes.
An easy way to show the statistics about data volumes and usage is by adding notation to the EER diagram
that represents the final set of normalized relations from logical database design. Figure 4-1 shows the EER
diagram (without attributes) for a simple inventory database in Pine Valley Furniture Company. This EER
diagram represents the normalized relations constructed during logical database design for the original
conceptual data model of this situation.
FIGURE 4-1 Composite Usage Map (Pine Valley Furniture Company)

Both data volume and access frequencies are shown in Figure 4-1. For example, there are 3,000 PARTs in
this database. The supertype PART has two subtypes, MANUFACTURED (40 percent of all PARTs are
manufactured) and PURCHASED (70 percent are purchased; because some PARTs are of both subtypes, the
percentages sum to more than 100 percent). The analysts at Pine Valley estimate that there are typically 150
SUPPLIERs, and Pine Valley receives, on average, 40 SUPPLIES instances from each SUPPLIER, yielding
a total of 6,000 SUPPLIES. The dashed arrows represent access frequencies. So, for example, across all
applications that use this database, there are on average 20,000 accesses per hour of PART data, and this
yields, based on subtype percentages, 14,000 accesses per hour to PURCHASED PART data. There are an
additional 6,000 direct accesses to PURCHASED PART data. Of this total of 20,000 accesses to
PURCHASED PART, 8,000 accesses then also require SUPPLIES data and of these 8,000 accesses to
SUPPLIES, there are 7,000 subsequent accesses to SUPPLIER data. For online and Web-based applications,
usage maps should show the accesses per second. Several usage maps may be needed to show vastly
different usage patterns for different times of day. Performance will also be affected by network
specifications.
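The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming only the numbers stated in the text; the constant and function names are illustrative and are not part of Figure 4-1.

```python
# Figures quoted in the text for Figure 4-1 (Pine Valley Furniture Company).
# All names below are illustrative assumptions.
MANUFACTURED_PERCENT = 40
PURCHASED_PERCENT = 70
SUPPLIERS = 150
SUPPLIES_PER_SUPPLIER = 40
PART_ACCESSES_PER_HOUR = 20_000
DIRECT_PURCHASED_ACCESSES = 6_000


def usage_summary():
    """Derive the volumes and hourly access counts described in the text."""
    purchased_via_part = PART_ACCESSES_PER_HOUR * PURCHASED_PERCENT // 100
    return {
        # 150 suppliers x 40 SUPPLIES instances each
        "supplies_rows": SUPPLIERS * SUPPLIES_PER_SUPPLIER,
        # Percentages above 100 indicate parts that belong to both subtypes
        "overlap_percent": MANUFACTURED_PERCENT + PURCHASED_PERCENT - 100,
        "purchased_via_part": purchased_via_part,
        "purchased_total": purchased_via_part + DIRECT_PURCHASED_ACCESSES,
        "manufactured_via_part": PART_ACCESSES_PER_HOUR * MANUFACTURED_PERCENT // 100,
    }


for name, value in usage_summary().items():
    print(f"{name}: {value}")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when volumes or access rates change, which fits the advice to monitor usage continuously.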
The volume and frequency statistics are generated during the systems analysis phase of the systems
development process when systems analysts are studying current and proposed data processing and business
activities. The data volume statistics represent the size of the business and should be calculated assuming
business growth over a period of at least several years. The access frequencies are estimated from the timing of
events, transaction volumes, the number of concurrent users, and reporting and querying activities. Because
many databases support ad hoc accesses, and such accesses may change significantly over time, and known
database access can peak and dip over a day, week, or month, the access frequencies tend to be less certain
and less even than the volume statistics. Fortunately, precise numbers are not necessary. What is crucial is the
relative size of the numbers, which will suggest where the greatest attention needs to be given during
physical database design in order to achieve the best possible performance. For example, in Figure 4-1, notice that:
· There are 3,000 PART instances, so if PART has many attributes and some, like description, would be quite
long, then the efficient storage of PART might be important.
· For each of the 4,000 times per hour that SUPPLIES is accessed via SUPPLIER, PURCHASED PART is
also accessed; thus, the diagram would suggest possibly combining these two co-accessed entities into a
database table (or file). This act of combining normalized tables is an example of denormalization.
· There is only a 10 percent overlap between MANUFACTURED and PURCHASED parts, so it might make
sense to have two separate tables for these entities and redundantly store data for those parts that are both
manufactured and purchased; such planned redundancy is okay if purposeful. Further, there are a total of
20,000 accesses an hour of PURCHASED PART data (14,000 from access to PART and 6,000 independent
access of PURCHASED PART) and only 8,000 accesses of MANUFACTURED PART per
hour. Thus, it might make sense to organize tables for MANUFACTURED and PURCHASED PART data
differently due to the significantly different access volumes.
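The denormalization idea mentioned above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names, column names, and sample rows are hypothetical, and SQLite stands in for whatever DBMS the project actually uses.

```python
import sqlite3


def build_denormalized(conn):
    """Create two normalized tables, then a combined (denormalized) copy."""
    conn.executescript("""
        CREATE TABLE purchased_part (
            part_no     INTEGER PRIMARY KEY,
            description TEXT
        );
        CREATE TABLE supplies (
            supplier_id INTEGER,
            part_no     INTEGER REFERENCES purchased_part(part_no),
            unit_price  REAL,
            PRIMARY KEY (supplier_id, part_no)
        );
        INSERT INTO purchased_part VALUES (1, 'Hinge'), (2, 'Drawer pull');
        INSERT INTO supplies VALUES (10, 1, 0.75), (11, 1, 0.80), (10, 2, 1.20);

        -- Denormalized table: the part description is repeated on every
        -- SUPPLIES row, so co-accessed data can be read without a join.
        CREATE TABLE supplies_with_part AS
            SELECT s.supplier_id, s.part_no, s.unit_price, p.description
            FROM supplies AS s
            JOIN purchased_part AS p ON p.part_no = s.part_no;
    """)
    return conn.execute(
        "SELECT supplier_id, part_no, description "
        "FROM supplies_with_part ORDER BY supplier_id, part_no"
    ).fetchall()


rows = build_denormalized(sqlite3.connect(":memory:"))
print(rows)
```

The trade-off is visible in the result: the description 'Hinge' is stored twice, so any change to it must be applied in more than one place.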
It can be helpful for subsequent physical database design steps if you can also explain the nature of the
access for the access paths shown by the dashed lines. For example, it can be helpful to know that of the
20,000 accesses to PART data, 15,000 ask for a part or a set of parts based on the primary key, Part No (e.g.,
access a part with a particular number); the other 5,000 accesses qualify part data for access by the value of
QtyOnHand. (These specifics are not shown in Figure 4-1.) This more precise description can help in
selecting indexes, one of the major topics we discuss later in this chapter. It might also be helpful to know
whether an access results in data creation, retrieval, update, or deletion. Such a refined description of access
frequencies can be handled by additional notation on a diagram such as in Figure 4-1, or by text and tables
kept in other documentation.
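To illustrate why the nature of each access matters when selecting indexes, the sketch below asks SQLite how it would execute a lookup by primary key and a lookup by a non-key column. The table, the column qty_on_hand, and the index name idx_part_qty are assumptions made for this example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3


def query_plans():
    """Return SQLite's plan descriptions for a key lookup and a range lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE part ("
        "part_no INTEGER PRIMARY KEY, description TEXT, qty_on_hand INTEGER)"
    )
    conn.executemany(
        "INSERT INTO part VALUES (?, ?, ?)",
        [(n, f"Part {n}", n % 50) for n in range(1, 3001)],
    )
    # Secondary index supporting accesses that qualify parts by quantity.
    conn.execute("CREATE INDEX idx_part_qty ON part (qty_on_hand)")

    by_key = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM part WHERE part_no = ?", (42,)
    ).fetchall()
    by_qty = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM part WHERE qty_on_hand < ?", (5,)
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return by_key[0][-1], by_qty[0][-1]


key_plan, qty_plan = query_plans()
print(key_plan)
print(qty_plan)
```

If most accesses are by primary key, the secondary index may not be worth its maintenance cost; if many accesses qualify rows by another field, an index on that field becomes a candidate.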

Designing Fields
A field is the smallest unit of application data recognized by system software, such as a programming
language or database management system. A field corresponds to a simple attribute in the logical data model,
and so in the case of a composite attribute, a field represents a single component.
The basic decisions you must make in specifying each field concern the type of data (or storage type) used to
represent values of this field, data integrity controls built into the database, and the mechanisms that the
DBMS uses to handle missing values for the field. Other field specifications, such as display format, also
must be made as part of the total specification of the information system, but we will not be concerned here
with those specifications that are often handled by applications rather than the DBMS.
CHOOSING DATA TYPES
A data type is a detailed coding scheme recognized by system software, such as a DBMS, for representing
organizational data. The bit pattern of the coding scheme is usually transparent to you, but the space to store
data and the speed required to access data are of consequence in physical database design. The specific
DBMS you will use will dictate which choices are available to you. For example, Table 4-1 lists some of the
data types available in the Oracle 11g DBMS, a typical DBMS that uses the SQL data definition and
manipulation language. Additional data types might be available for currency, voice, image, and user-defined
data for some DBMSs.
Table 4-1 Commonly Used Data Types in Oracle 11g
Data Type: Description
VARCHAR2: Variable-length character data with a maximum length of 4,000 characters; you must enter a maximum
field length (e.g., VARCHAR2(30) specifies a field with a maximum length of 30 characters). A value less than
30 characters will consume only the required space.
CHAR: Fixed-length character data with a maximum length of 2,000 characters; default length is 1 character
(e.g., CHAR(5) specifies a field with a fixed length of 5 characters, capable of holding a value from 0 to 5
characters long).
CLOB: Character large object, capable of storing up to 4 gigabytes of one variable-length character data field
(e.g., to hold a medical instruction or a customer comment).
NUMBER: Positive or negative number in the range 10^-130 to 10^126; can specify the precision (total number of
digits to the left and right of the decimal point) and the scale (the number of digits to the right of the
decimal point) (e.g., NUMBER(5) specifies an integer field with a maximum of 5 digits, and NUMBER(5,2)
specifies a field with no more than 5 digits and exactly 2 digits to the right of the decimal point).
INTEGER: Positive or negative integer with up to 38 digits (same as SMALLINT).
DATE: Any date from January 1, 4712 B.C., to December 31, 9999 A.D.; DATE stores the century, year, month,
day, hour, minute, and second.
BLOB: Binary large object, capable of storing up to 4 gigabytes of binary data (e.g., a photograph or sound clip).
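The precision and scale rule described for NUMBER in Table 4-1 can be made concrete with a small check. This is a rough sketch using Python's decimal module, not Oracle itself: the function name fits_number is hypothetical, and the rounding of extra decimal places is an assumption about how such a field would treat its input.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_number(value, precision, scale=0):
    """Return True if value fits a NUMBER(precision, scale) style field.

    The value is first rounded to `scale` decimal places (an assumed
    behavior), then checked against the digits left for the integer part.
    """
    quantum = Decimal(1).scaleb(-scale)  # e.g. 0.01 when scale is 2
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    limit = Decimal(10) ** (precision - scale)  # e.g. 1000 for (5, 2)
    return abs(rounded) < limit


print(fits_number("123.45", 5, 2))  # three integer digits, two decimals
print(fits_number("1234.5", 5, 2))  # four integer digits are too many
print(fits_number("99999", 5))      # five-digit integer in NUMBER(5)
```

A check like this is useful when estimating whether the maximum expected value of an attribute will fit the precision chosen for its field.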
Selecting a data type involves four objectives that will have different relative levels of importance for different applications:
1. Represent all possible values.
2. Improve data integrity.
3. Support all data manipulations.
4. Minimize storage space.
An optimal data type for a field can, in minimal space, represent every possible value (while eliminating
illegal values) for the associated attribute and can support the required data manipulation (e.g., numeric data
types for arithmetic operations and character data types for string manipulation). Any attribute domain
constraints from the conceptual data model are helpful in selecting a good data type for that attribute.
Achieving these four objectives can be subtle. For example, consider a DBMS for which a data type has a
maximum width of 2 bytes. Suppose this data type is sufficient to represent a Quantity Sold field. When
Quantity Sold fields are summed, the sum may require a number larger than 2 bytes. If the DBMS uses the
field’s data type for results of any mathematics on that field, the 2-byte length will not work. Some data types
have special manipulation capabilities; for example, only the DATE data type allows true date arithmetic.
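The 2-byte example above can be demonstrated with a short sketch. It assumes a signed 16-bit integer as the 2-byte type and uses Python's ctypes module to mimic what happens when a sum is forced back into that width; the function name and sample quantities are illustrative only.

```python
import ctypes

INT16_MAX = 2**15 - 1  # largest value a signed 2-byte integer can hold


def sum_as_int16(values):
    """Return the true sum and the value that results if it is stored in 2 bytes."""
    total = sum(values)
    stored = ctypes.c_int16(total).value  # ctypes wraps silently on overflow
    return total, stored


quantities_sold = [20_000, 15_000]  # each value fits in 2 bytes on its own
total, stored = sum_as_int16(quantities_sold)
print(total, stored, total > INT16_MAX)
```

Each individual Quantity Sold value fits, but their sum exceeds the 2-byte limit, so the type chosen for the field must also allow for the results of the manipulations performed on it.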
Coding Techniques
Some attributes have a sparse set of values or are so large that, given data volumes, considerable storage
space will be consumed. A field with a limited number of possible values can be translated into a code that
requires less space. Consider the example of the Product Finish field illustrated in Figure 4-2. Products at
Pine Valley Furniture come in only a limited number of woods: Birch, Maple, and Oak. By creating a code or
translation table, each Product Finish field value can be replaced by a code, a cross-reference to the lookup
table, similar to a foreign key. This will decrease the amount of space for the Product Finish field and hence
for the PRODUCT file. There will be additional space for the PRODUCT FINISH lookup table, and when
the Product Finish field value is needed, an extra access (called a join) to this lookup table will be required. If
the Product Finish field is infrequently used or if the number of distinct Product Finish values is very large,
the relative advantages of coding may outweigh the costs. Note that the
code table would not appear in the conceptual or logical model. The code table is a physical construct to
achieve data processing performance improvements, not a set of data with business value.
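A code lookup table of this kind can be sketched with Python's built-in sqlite3 module. The single-letter codes, table names, and sample products below are assumptions for illustration; only the three finish values (Birch, Maple, Oak) come from the text.

```python
import sqlite3


def finish_report():
    """Store a short finish code per product and join to recover the full value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Lookup (code) table: one row per distinct Product Finish value.
        CREATE TABLE product_finish (
            code        TEXT PRIMARY KEY,
            finish_name TEXT NOT NULL
        );
        -- PRODUCT stores only the short code, similar to a foreign key.
        CREATE TABLE product (
            product_no  INTEGER PRIMARY KEY,
            description TEXT,
            finish_code TEXT REFERENCES product_finish(code)
        );
        INSERT INTO product_finish VALUES ('A', 'Birch'), ('B', 'Maple'), ('C', 'Oak');
        INSERT INTO product VALUES (1, 'Chair', 'C'), (2, 'Desk', 'A'), (3, 'Dresser', 'C');
    """)
    # The extra access (join) is needed whenever the full finish value is wanted.
    return conn.execute("""
        SELECT p.product_no, p.description, f.finish_name
        FROM product AS p
        JOIN product_finish AS f ON f.code = p.finish_code
        ORDER BY p.product_no
    """).fetchall()


print(finish_report())
```

Queries that never display the finish can skip the join entirely, which is why coding pays off most when the field is infrequently used.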
FIGURE 4-2 Example of a Code Lookup Table (Pine Valley Furniture Company)

CONTROLLING DATA INTEGRITY
For many DBMSs, data integrity controls (i.e., controls on the possible values a field can assume)
can be built into the physical structure of the fields and enforced by the DBMS on those
fields. The data type enforces one form of data integrity control because it may limit the type of
data (numeric or character) and the length of a field value. The following are some other typical
integrity controls that a DBMS may support:
· Default value. A default value is the value a field will assume unless a user enters an
explicit value for an instance of that field. Assigning a default value to a field can reduce
data entry time because entry of a value can be skipped. It can also help to reduce data
entry errors for the most common value.
· Range control. A range control limits the set of permissible values a field may assume.
The range may be a numeric lower-to-upper bound or a set of specific values. Range
controls must be used with caution because the limits of the range may change over time.
A combination of range controls and coding led to the year 2000 problem that many
organizations faced, in which a field for year was represented by only the numbers 00 to
99. It is better to implement any range controls through a DBMS because range controls
in applications may be inconsistently enforced. It is also more difficult to find and change
them in applications than in a DBMS.
· Null value control. A null value is defined as an empty value. Each primary key must
have an integrity control that prohibits a null value. Any other required field may also
have a null value control placed on it if that is the policy of the organization. For
example, a university may prohibit adding a course to its database unless that course has
a title as well as a value of the primary key, CourseID. Many fields legitimately may have
a null value, so this control should be used only when truly required by business rules.
· Referential integrity. Referential integrity on a field is a form of range control in which
the value of that field must exist as the value in some field in another row of the same or
(most commonly) a different table. That is, the range of legitimate values comes from the
dynamic contents of a field in a database table, not from some pre-specified set of values.
Note that referential integrity guarantees that only some existing cross-referencing value
is used, not that it is the correct one. A coded field will have referential integrity with the
primary key of the associated lookup table.
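The four controls listed above can be declared directly in a table definition. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names and the 0 to 10,000 quantity range are hypothetical, and SQLite's syntax stands in for whichever DBMS is actually used. Note that SQLite enforces foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3


def make_db():
    """Create a table that declares default, range, null, and referential controls."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript("""
        CREATE TABLE product_finish (code TEXT PRIMARY KEY);
        INSERT INTO product_finish VALUES ('A'), ('B'), ('C');

        CREATE TABLE product (
            product_no  INTEGER PRIMARY KEY,
            description TEXT NOT NULL,                         -- null value control
            quantity    INTEGER NOT NULL DEFAULT 0             -- default value
                        CHECK (quantity BETWEEN 0 AND 10000),  -- range control
            finish_code TEXT REFERENCES product_finish(code)   -- referential integrity
        );
    """)
    return conn


def try_insert(conn, sql, params=()):
    """Run an INSERT and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


conn = make_db()
insert = "INSERT INTO product (product_no, description, quantity, finish_code) VALUES (?, ?, ?, ?)"
print(try_insert(conn, "INSERT INTO product (product_no, description) VALUES (1, 'Chair')"))
print(try_insert(conn, insert, (2, "Desk", -5, "A")))   # outside the range
print(try_insert(conn, insert, (3, None, 10, "A")))     # missing required value
print(try_insert(conn, insert, (4, "Shelf", 10, "Z")))  # no such finish code
```

Because the rules live in the table definition, every application that writes to the table is held to the same controls.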
Handling Missing Data
When a field may be null, simply entering no value may be sufficient. For example, suppose a
customer zip code field is null and a report summarizes total sales by month and zip code. How
should sales to customers with unknown zip codes be handled? Two options for handling or
preventing missing data have already been mentioned: using a default value and not permitting
missing (null) values. Missing data are inevitable. The following are some other possible
methods for handling missing data:
· Substitute an estimate of the missing value. For example, for a missing sales value when
computing monthly product sales, use a formula involving the mean of the existing
monthly sales values for that product indexed by total sales for that month across all
products. Such estimates must be marked so that users know that these are not actual
values.
· Track missing data so that special reports and other system elements cause people to
resolve unknown values quickly. This can be done by setting up a trigger in the database
definition. A trigger is a routine that will automatically execute when some event occurs
or time period passes. One trigger could log the missing entry to a file when a null or
other missing value is stored, and another trigger could run periodically to create a report
of the contents of this log file.
· Perform sensitivity testing so that missing data are ignored unless knowing a value might
significantly change results (e.g., if total monthly sales for a particular salesperson are
almost over a threshold that would make a difference in that person’s compensation).
This is the most complex of the methods mentioned and hence requires the most
sophisticated programming.
Such routines for handling missing data may be written in application programs. All relevant
modern DBMSs now have more sophisticated programming capabilities, such as case expressions,
user-defined functions, and triggers, so that such logic can be available in the database for all
users without application-specific programming.
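The trigger-based tracking approach described above can be sketched in SQLite through Python's sqlite3 module. The table names, trigger name, and sample customers are hypothetical. This sketch logs to a table rather than a file, and it covers only the event-driven trigger; the periodic report would need a separate scheduled job, since SQLite triggers fire on data events, not on elapsed time.

```python
import sqlite3


def log_missing_zip_codes():
    """Insert customers and return the log rows written by the trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            zip_code    TEXT              -- may legitimately be unknown
        );
        CREATE TABLE missing_data_log (
            customer_id INTEGER,
            field_name  TEXT,
            logged_at   TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- Fires automatically whenever a customer is stored without a zip code.
        CREATE TRIGGER log_missing_zip
        AFTER INSERT ON customer
        WHEN NEW.zip_code IS NULL
        BEGIN
            INSERT INTO missing_data_log (customer_id, field_name)
            VALUES (NEW.customer_id, 'zip_code');
        END;

        INSERT INTO customer VALUES (1, 'Customer A', '32601');
        INSERT INTO customer VALUES (2, 'Customer B', NULL);
    """)
    return conn.execute(
        "SELECT customer_id, field_name FROM missing_data_log"
    ).fetchall()


print(log_missing_zip_codes())
```

A report built from missing_data_log gives staff a list of the specific records that still need a value.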

Summary
During physical database design, you, the designer, translate the logical description of data into
the technical specifications for storing and retrieving data. The goal is to create a design for
storing data that will provide adequate performance and ensure database integrity, security, and
recoverability. In physical database design, you consider normalized relations and data volume
estimates, data definitions, data processing requirements and their frequencies, user expectations,
and database technology characteristics to establish the specifications that are used to implement
the database using a database management system.
A field is the smallest unit of application data, corresponding to an attribute in the logical data
model. You must determine the data type, integrity controls, and how to handle missing values
for each field, among other factors. A data type is a detailed coding scheme for representing
organizational data. Data may be coded to reduce storage space. Field integrity control includes
specifying a default value, a range of permissible values, null value permission, and referential
integrity.
