CS409 Notes
Data
Data are the raw bits and pieces of information with no context. If I told you, “45, 32,
41, 75,” you would not have learned anything.
Information
Information is data that has been given meaning through context. In today's world, accurate, relevant, and timely information is the key to good decision making.
By itself, data is not that useful. To be useful, it needs to be given context. Returning
to the example above, if I told you that “45, 32, 41, 75” are the numbers of students
that had registered for upcoming classes, that would be information. By adding the
context – that the numbers represent the count of students registering for specific
classes – I have converted data into information.
Knowledge
Once we have put our data into context, aggregated and analyzed it, we can use it to
make decisions for our organization. We can say that this consumption of information
produces knowledge. This knowledge can be used to make decisions, set policies, and
even spark innovation.
Wisdom
The final step up the information ladder is the step from knowledge to wisdom. We can say that someone has wisdom when they can combine their knowledge and experience to produce a deeper understanding of a topic. It often takes many years to develop wisdom on a particular topic, and it requires patience.
What is a DBMS?
A Database Management System (DBMS) is software for storing and retrieving users' data while applying appropriate security measures. It consists of a group of programs that manipulate the database. The DBMS accepts requests for data from an application and instructs the operating system to provide the specific data. In large systems, a DBMS helps users and other third-party software store and retrieve data.
A DBMS allows users to create their own databases to suit their requirements. The term "DBMS" encompasses the database users and other application programs. It provides an interface between the data and the software application.
Examples of DBMSs
Oracle
IBM DB2
Ingres
Teradata
MS SQL Server
MS Access
MySQL
Forms
Forms are used for entering, modifying, and viewing records. You likely have had to fill out
forms on many occasions, like when visiting a doctor's office, applying for a job, or registering
for school. The reason forms are used so often is that they're an easy way to guide people toward
entering data correctly. When you enter information into a form in Access, the data goes exactly
where the database designer wants it to go in one or more related tables.
Forms make entering data easier. Working with extensive tables can be confusing, and when you
have connected tables, you might need to work with more than one at a time to enter a set of
data. However, with forms it's possible to enter data into multiple tables at once, all in one place.
Database designers can even set restrictions on individual form components to ensure all of the
needed data is entered in the correct format. All in all, forms help keep data consistent and
organized, which is essential for an accurate and powerful database.
Reports
Reports offer you the ability to present your data in print. If you've ever received a computer
printout of a class schedule or a printed invoice of a purchase, you've seen a database report.
Reports are useful because they allow you to present components of your database in an easy-to-
read format. You can even customize a report's appearance to make it visually appealing. Access
offers you the ability to create a report from any table or query.
Queries
Queries are a way of searching for and compiling data from one or more tables. Running a
query is like asking a detailed question of your database. When you build a query in Access, you
are defining specific search conditions to find exactly the data you want.
Queries are far more powerful than the simple searches you might carry out within a table. While
a search would be able to help you find the name of one customer at your business, you could
run a query to find the name and phone number of every customer who's made a purchase within
the past week. A well-designed query can give information you might not be able to find just by
looking through the data in your tables.
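As a hedged sketch of the customer query described above, in SQL (the table and column names CUSTOMER, PURCHASE, Name, Phone, and PurchaseDate are assumptions for illustration, and the exact date arithmetic varies by DBMS):

SELECT c.Name, c.Phone
FROM CUSTOMER c
JOIN PURCHASE p ON p.CustID = c.CustID
WHERE p.PurchaseDate >= CURRENT_DATE - INTERVAL '7' DAY;
-- Returns the name and phone number of every customer
-- who made a purchase within the past week.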
DBMS Languages
A DBMS is a software package that enables users to define, create, maintain, and control access to the database, providing security, integrity, concurrent access, recovery, support for data communication, and related facilities. For these tasks it provides languages, such as a data definition language (DDL) for defining schemas and a data manipulation language (DML) for retrieving and updating data.
What is a database?
A database is a collection of related data. By data, we mean known facts that can be
recorded and that have implicit meaning. For example, consider the names, telephone
numbers, and addresses of the people you know. You may have recorded this data in
an indexed address book or you may have stored it on a hard drive, using a personal
computer and software such as Microsoft Access or Excel. This collection of related
data with an implicit meaning is a database.
Several kinds of users interact with the database, for example, application programmers and end users.
As the name suggests, application programmers are the ones who write application programs that use the database. These application programs are written in programming languages such as COBOL, PL/1 (Programming Language One), Java, or a fourth-generation language, and they are built to meet user requirements. Retrieving information, creating new information, and changing existing information are all done by these application programs. They interact with the DBMS through DML (Data Manipulation Language) calls, and all of these functions are performed by generating requests to the DBMS. Without application programmers, there would be no one on the database team to build these applications.
End users are those who access the database from the terminal end. They use the developed applications, and they don't have any knowledge of the design and working of the database. These are the second class of users, and their main goal is simply to get their tasks done.
Form-processing and report-processing applications are built using tools such as VB or .NET. Query processing can be managed using the vendor's SQL tool or third-party tools.
Teleprocessing
Processing performed within the same physical computer. User terminals are
typically “dumb”, incapable of functioning on their own, and cabled to the central
computer
File-Server
Client-Server (2-tiers)
In relational DBMSs, SQL processing remained on the server side. In such an architecture, the server is often called a query server or transaction server because it provides these two functionalities. Standards such as ODBC provide an application programming interface that allows client-side programs to call the DBMS; a related standard for Java, JDBC, allows Java client programs to access one or more DBMSs through a standard interface.
A more integrated division of labor appeared in some object-oriented DBMSs, where the software modules of the DBMS were divided between client and server. For example, the server level may include the part of the DBMS software responsible for handling data storage on disk pages, local concurrency control and recovery, buffering and caching of disk pages, and other such functions. Meanwhile, the client level may handle the user interface, data dictionary functions, interaction with programming language compilers, global query optimization, concurrency control and recovery across multiple servers, structuring of complex objects from the data in the buffers, and other such functions. In this approach, the client/server interaction is more tightly coupled and is done internally by the DBMS modules, some of which reside on the client and some on the server, rather than by the users/programmers.
The architectures described here are called two-tier architectures because the software components are distributed over two systems: client and server. The advantages of this architecture are its simplicity and seamless compatibility with existing systems. The emergence of the Web changed the roles of clients and servers, leading to the three-tier architecture.
The three-tier architecture adds an intermediate layer between the client and the database server. This intermediate layer or middle tier is called the application server or the Web server, depending on the application. This server plays an intermediary role by running application programs and storing business rules (procedures or constraints) that are used to access data from the database server. It can also improve database security by checking a client's credentials before forwarding a request to the database server. The intermediate server accepts requests from the client, processes the request and sends database queries and commands to the database server, and then acts as a conduit for passing (partially) processed data from the database server to the clients, where it may be processed further and filtered for presentation to users in GUI format. Thus, the user interface, application rules, and data access act as the three tiers. In another common arrangement used by database and other application package vendors, the presentation layer displays information to the user and allows data entry; the business logic layer handles intermediate rules and constraints before data is passed up to the user or down to the DBMS; and the bottom layer includes all data management services. The middle layer can also act as a Web server, which retrieves query results from the database server and formats them into dynamic Web pages that are viewed by the Web browser at the client side.
Other architectures have also been proposed. It is possible to divide the layers between the user and the stored data further into finer components, thereby giving rise to n-tier architectures, where n may be four or five tiers. Typically, the business logic layer is divided into multiple layers. Besides distributing programming and data throughout a network, n-tier applications afford the advantage that any one tier can run on an appropriate processor or operating system platform and can be handled independently.
A conceptual data model is a model that helps to identify the highest-level relationships between the different entities, while a logical data model is a model that describes the data in as much detail as possible, without regard to how the data will be physically implemented in the database.
Query processing is the translation of high-level queries into low-level expressions. It is a stepwise process that spans the physical level of the file system, query optimization, and the actual execution of the query to obtain the result, and it requires the basic concepts of relational algebra and file structure. Query processing refers to the range of activities involved in extracting data from the database, including the translation of queries in high-level database languages into expressions that can be implemented at the physical level of the file system. In studying query processing, we will see how these queries are processed and how they are optimized.
Three-Schema Architecture
1. The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. Usually, a representational data model is used to describe the conceptual schema, often based on a conceptual schema design in a high-level data model.
3. The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. As in the previous level, each external schema is typically implemented using a representational data model, possibly based on an external schema design in a high-level data model.
The three-schema architecture is a convenient tool with which the user can visualize the schema levels in a database system. Most DBMSs do not separate the three levels completely and explicitly, but they support the three-schema architecture to some extent. Some older DBMSs may include physical-level details in the conceptual schema. The three-schema architecture has an important place in database technology development because it clearly separates the users' external level, the database's conceptual level, and the internal storage level for designing a database. It is very much applicable in the design of DBMSs, even today. In most DBMSs that support user views, external schemas are specified in the same data model that describes the conceptual-level information (for example, a relational DBMS like Oracle uses SQL for this). Some DBMSs allow different data models to be used at the conceptual and external levels. An example is Universal Data Base (UDB), a DBMS from IBM, which uses the relational model to describe the conceptual schema but may use an object-oriented model to describe an external schema.
Notice that the three schemas are only descriptions of data; the stored data that
actually exists is at the physical level only. In a DBMS based on the three-schema
architecture, each user group refers to its own external schema. Hence, the DBMS
must transform a request specified on an external schema into a request against the
conceptual schema, and then into a request on the internal schema for processing
over the stored database. If the request is a database retrieval, the data extracted
from the stored database must be reformatted to match the user’s external view. The
processes of transforming requests and results between levels are called mappings.
These mappings may be time-consuming, so some DBMSs, especially those that are meant to support small databases, do not support external views. Even in such systems, however, it is necessary to transform requests between the conceptual and internal levels.
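As a small SQL sketch of an external schema: assuming a conceptual-level table EMPLOYEE(Ssn, Name, Dno, Salary) (an assumed example), a view can expose only part of it to a user group:

CREATE VIEW EMP_PUBLIC AS
SELECT Ssn, Name, Dno
FROM EMPLOYEE;            -- Salary is hidden from this user group

-- A request against the external schema:
SELECT Name FROM EMP_PUBLIC WHERE Dno = 5;
-- is mapped by the DBMS to a request on EMPLOYEE at the conceptual level.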
Data Independence
The three-schema architecture can be used to further explain the concept of data independence, which can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. Two types can be distinguished:
1. Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), to change constraints, or to reduce the database.
2. Physical data independence is the capacity to change the internal schema without having to change the conceptual schema; hence, the external schemas need not be changed as well. Changes to the internal schema may be needed because some physical files were reorganized, for example, by creating additional access structures, to improve the performance of retrieval or update. If the same data as before remains in the database, we should not have to change the conceptual schema.
Generally, physical data independence exists in most databases and file environments in which physical details, such as the exact location of data on disk and hardware details of storage encoding, placement, compression, splitting, or merging of records, are hidden from the user. Applications remain unaware of these details. On the other hand, logical data independence is harder to achieve because it allows structural and constraint changes without affecting application programs, a much stricter requirement. Whenever we have a multiple-level DBMS, its catalog must be expanded to include information on how to map requests and data among the various levels. The DBMS uses additional software to accomplish these mappings by referring to the mapping information in the catalog. Data independence occurs because when the schema is changed at some level, the schema at the next higher level remains unchanged; only the mapping between the two levels is changed. Hence, application programs referring to the higher-level schema need not be changed.
The three-schema architecture can make it easier to achieve true data independence, both physical and logical. However, the two levels of mappings create an overhead during the compilation or execution of a query or program, which is one reason few DBMSs implement the full architecture.
Database schema
A database schema is a blueprint or architecture of how our data will look. It doesn’t hold data
itself, but instead describes the shape of the data and how it might relate to other tables or
models. An entry in our database will be an instance of the database schema. It will contain all of
the properties described in the schema.
Schema types
There are two main database schema types that define different parts of the schema: logical and
physical.
A logical database schema represents how the data is organized in terms of tables. It also
explains how attributes from tables are linked together. Different schemas use a different syntax
to define the logical architecture and constraints.
To create a logical database schema, we use tools to illustrate the relationships between components of the data. This is called entity-relationship modeling (ER modeling). It specifies the relationships between entity types.
The physical database schema represents how data is stored on disk storage. In other words, it
is the actual code that will be used to create the structure of your database. In MongoDB with
mongoose, for instance, this will take the form of a mongoose model. In MySQL, you will use
SQL to construct a database with tables.
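For instance, a minimal sketch of a physical schema in MySQL-style SQL (the customer table and its columns are assumptions for illustration):

CREATE TABLE customer (
  cust_id INT PRIMARY KEY,       -- stored with an index on disk
  name    VARCHAR(100) NOT NULL,
  city    VARCHAR(50)
);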
Schema objects
A schema is a collection of schema objects. Examples of schema objects include tables, views,
sequences, synonyms, indexes, clusters, database links, procedures, and packages. This chapter
explains tables, views, sequences, synonyms, indexes, and clusters.
Schema objects are logical data storage structures. Schema objects do not have a one-to-one
correspondence to physical files on disk that store their information. However, Oracle stores a
schema object logically within a tablespace of the database. The data of each object is physically
contained in one or more of the tablespace's datafiles. For some objects such as tables, indexes,
and clusters, you can specify how much disk space Oracle allocates for the object within the
tablespace's datafiles.
Who is DBA?
Network Administrator
A network administrator is a person designated in an organization whose responsibility
includes maintaining computer infrastructures with emphasis on local area networks
(LANs) up to wide area networks (WANs). Responsibilities may vary between
organizations, but installing new hardware, on-site servers, enforcing licensing
agreements, software-network interactions, as well as network integrity/resilience, are
some of the key areas of focus.
The network administrator coordinates with the DBA on database connections and on other issues such as storage, the operating system, and hardware.
Some sites have one or more network administrators. A network administrator, for
example, administers Oracle networking products, such as Oracle Net Services.
Application Developers
DBA’s Tasks
DBA’s Responsibilities
Installing and upgrading the Oracle Database server and application tools
Allocating system storage and planning future storage requirements for the database
system
Creating primary database storage structures (tablespaces) after application developers
have designed an application
Creating primary objects (tables, views, indexes) once application developers have
designed an application
Modifying the database structure, as necessary, from information given by application
developers
Enrolling users and maintaining system security
Ensuring compliance with Oracle license agreements
Controlling and monitoring user access to the database
Monitoring and optimizing the performance of the database
Planning for backup and recovery of database information
Maintaining archived data on tape
Backing up and restoring the database
Contacting Oracle for technical support
Physical database design is the process of transforming logical data models into
physical data models. An experienced database designer will make a physical database
design in parallel with conceptual data modeling if they know the type of database
technology that will be used.
Purposes
To meet the database designer's expectations for the database, physical database design serves two main purposes for a DBA:
Managing the storage structures of the database or DBMS
Performance and tuning
Factor (A): Analyzing the database queries and transactions
Before undertaking the physical database design, we must have a good idea of the
intended use of the database by defining in a high-level form the queries and transactions
that are expected to run on the database. For each retrieval query, the following
information about the query would be needed:
1. The files that will be accessed by the query.
2. The attributes on which any selection conditions for the query are specified.
3. Whether the selection condition is an equality, inequality, or a range condition.
4. The attributes on which any join conditions or conditions to link multiple
tables or objects for the query are specified.
5. The attributes whose values will be retrieved by the query.
The attributes listed in items 2 and 4 above are candidates for the definition of access
structures, such as indexes, hash keys, or sorting of the file.
For each update operation or update transaction, the following information would be
needed:
1. The files that will be updated.
2. The type of operation on each file (insert, update, or delete).
3. The attributes on which selection conditions for a delete or update are specified.
4. The attributes whose values will be changed by an update operation.
Again, the attributes listed in item 3 are candidates for access structures on the files, because they would be used to locate the records that will be updated or deleted. On the other hand, the attributes listed in item 4 are candidates for avoiding an access structure, since modifying them will require updating the access structures.
Factor (B): Expected frequency of queries and transactions
Besides identifying the characteristics of expected retrieval queries and update
transactions, we must consider their expected rates of invocation. This
frequency information, along with the attribute information collected on each query and
transaction, is used to compile a cumulative list of the expected frequency
of use for all queries and transactions. This is expressed as the expected frequency of
using each attribute in each file as a selection attribute or a join attribute, over all the
queries and transactions. Generally, for large volumes of processing, the informal 80–20
rule can be used: approximately 80 percent of the processing is accounted for by only 20
percent of the queries and transactions. Therefore, in practical situations, it is rarely
necessary to collect exhaustive statistics and invocation
rates on all the queries and transactions; it is sufficient to determine the 20 percent or so
most important ones.
Factor (C): Time constraints of queries & transactions
Some queries and transactions may have stringent performance constraints. For
example, a transaction may have the constraint that it should terminate within 5 seconds
on 95 percent of the occasions when it is invoked, and that it should never take more than
20 seconds. Such timing constraints place further priorities on the attributes that are
candidates for access paths. The selection attributes used by queries and transactions with
time constraints become higher-priority candidates for primary access structures for the
files because the primary access structures are generally the most efficient for locating
records in a file.
Factor (D): Expected frequencies of update operations
A minimum number of access paths should be specified for a file that is frequently updated, because updating the access paths themselves slows down the update operations. For example, if a file that has frequent record insertions has 10 indexes on 10 different attributes, each of these indexes must be updated whenever a new record is inserted. The overhead for updating 10 indexes can slow down the insert operations.
Factor (E): Uniqueness constraints on attributes
Access paths should be specified on all candidate key attributes—or sets of attributes—
that are either the primary key of a file or unique attributes. The existence of an index (or
other access path) makes it sufficient to only search the index when checking this
uniqueness constraint, since all values of the attribute will exist in the leaf nodes of the
index. For example, when inserting a new record, if a key attribute value of the new
record already exists in the index, the insertion of the new record should be rejected,
since it would violate the uniqueness constraint on the attribute.
Once the preceding information is compiled, it is possible to address the physical database design decisions, which consist mainly of deciding on the storage structures and access paths for the database files.
Design Decisions about Indexing.
The attributes whose values are required in equality or range conditions (selection operation) and those that are keys or that participate in join conditions (join operation) require access paths, such as indexes.
The performance of queries largely depends upon what indexes or hashing schemes exist to expedite the processing of selections and joins. On the other hand, during insert, delete, or update operations, the existence of indexes adds to the overhead. This overhead must be justified in terms of the gain in efficiency by expediting queries and transactions. The physical design decisions for indexing fall into the following categories:
1. Whether to index an attribute. The general rules for creating an index on an attribute
are that the attribute must either be a key (unique), or there must be some query that uses
that attribute either in a selection condition (equality or range of values) or in a join
condition. One reason for creating multiple indexes is that some operations can be
processed by just scanning the indexes, without having to access the actual data file.
2. What attribute or attributes to index on. An index can be constructed on a single attribute, or on more than one attribute if it is a composite index. If multiple attributes from one relation are involved together in several queries (for example, (Garment_style_#, Color) in a garment inventory database), a multiattribute (composite) index is warranted. The ordering of attributes within a multiattribute index must correspond to the queries. For instance, the above index assumes that queries would be based on an ordering of colors within a Garment_style_# rather than vice versa.
3. Whether to set up a clustered index. At most, one index per table can be a primary or
clustering index, because this implies that the file be physically
ordered on that attribute. In most RDBMSs, this is specified by the keyword CLUSTER.
(If the attribute is a key, a primary index is created, whereas a clustering index is created
if the attribute is not a key.) If a table requires several indexes, the decision about which
one should be the primary or clustering index depends upon whether keeping the
table ordered on that attribute is needed. Range queries benefit a great deal
from clustering. If several attributes require range queries, relative benefits must be
evaluated before deciding which attribute to cluster on. If a query is to be answered by
doing an index search only (without retrieving data records), the corresponding index
should not be clustered, since the main benefit of clustering is achieved when retrieving
the records themselves. A clustering index may be set up as a multiattribute index if range retrieval by that composite key is useful in report creation (for example, an index on Zip_code, Store_id, and Product_id may be a clustering index for sales data).
4. Whether to use a hash index over a tree index. In general, RDBMSs use B+-trees for indexing. However, ISAM and hash indexes are also provided in some systems (see Chapter 18). B+-trees support both equality and range queries on the attribute used as the search key. Hash indexes work well with equality conditions, particularly during joins to find a matching record(s), but they do not support range queries.
5. Whether to use dynamic hashing for the file. For files that are very volatile, that is, those that grow and shrink continuously, one of the dynamic hashing schemes would be suitable.
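As a sketch of decisions 1 and 2 above, using the garment inventory example (the table name GARMENT_INVENTORY and the SQL-safe column names are assumptions):

-- Composite index ordered to match queries that fix a garment style
-- and then look up colors within that style.
CREATE INDEX idx_style_color
ON GARMENT_INVENTORY (Garment_style_no, Color);

A query such as SELECT ... WHERE Garment_style_no = 17 AND Color = 'blue' can use this index, while a query on Color alone generally cannot, which is why the attribute ordering must correspond to the queries.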
Database Tuning
Database tuning is the process of continuing to revise/adjust the physical database design by monitoring resource utilization as well as internal DBMS processing, to reveal bottlenecks such as contention for the same data or devices.
The initial choice of indexes may have to be revised for the following reasons:
Certain queries may take too long to run for lack of an index.
Certain indexes may not get utilized at all.
Certain indexes may undergo too much updating because the index is on an attribute that
undergoes frequent changes.
Most DBMSs have a command or trace facility, which can be used by the DBA to ask
the system to show how a query was executed—what operations were performed in
what order and what secondary access structures (indexes) were used. By analyzing
these execution plans, it is possible to diagnose the causes of the above problems.
Some indexes may be dropped and some new indexes may be created based on the
tuning analysis.
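For example, in many relational DBMSs (MySQL and PostgreSQL among them) this trace facility takes the form of an EXPLAIN command; the EMPLOYEE query below is an assumed illustration:

EXPLAIN
SELECT Ssn
FROM EMPLOYEE
WHERE Dno = 5;
-- The output describes the chosen execution plan, e.g., whether
-- an index on Dno was used or a full table scan was performed.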
The goal of tuning is to dynamically evaluate the requirements, which sometimes fluctuate seasonally or during different times of the month or week, and to reorganize the indexes and file organizations to yield the best overall performance. Dropping and building new indexes is an overhead that can be justified in terms of performance improvements.
Updating of a table is generally suspended while an index is dropped or created; this loss of
service must be accounted for. Besides dropping or creating indexes and changing from a
nonclustered to a clustered index and vice versa, rebuilding the index may improve performance.
Most RDBMSs use B+-trees for an index. If there are many deletions on the index key, index pages may contain wasted space, which can be reclaimed during a rebuild operation. Similarly, too many insertions may cause overflows in a clustered index that affect performance.
Rebuilding a clustered index amounts to reorganizing the entire table ordered on that key.
The available options for indexing and the way they are defined, created, and reorganized varies
from system to system. As an illustration, consider the sparse and dense indexes. A sparse index
such as a primary index will have one index pointer for each page (disk block) in the data file; a
dense index such as a unique secondary index will have an index pointer for each record. Sybase
provides clustering indexes as sparse indexes in the form of B+-trees, whereas INGRES provides
sparse clustering indexes as ISAM files and dense clustering indexes as B+-trees. In some
versions of Oracle and DB2, the option of setting up a clustering index is limited to a dense
index (with many more index entries), and the DBA has to work with this limitation.
Tuning relies on statistics obtained by monitoring the system, for example:
Storage statistics: data about the allocation of storage into tablespaces, index spaces, and buffer pools.
Query/transaction processing statistics: the times required for different phases of query and transaction processing.
Problems in Tuning
Most of the previously mentioned problems can be solved by the DBA by setting appropriate physical DBMS parameters, changing configurations of devices, changing operating system parameters, and similar activities. The solutions tend to be closely tied to specific systems, and DBAs are typically trained to handle these tuning problems for the specific DBMS.
Tuning the Database Design
If the processing requirements change, the design may need to respond with changes to the conceptual schema, reflected into the logical schema and the physical design. Such changes may be of the following nature:
■ Existing tables may be joined (denormalized) because certain attributes from two or more tables are frequently needed together: this reduces the normalization level from BCNF to 3NF, 2NF, or 1NF.
■ For the given set of tables, there may be alternative design choices, all of which achieve 3NF or BCNF. One normalized design may be replaceded by another.
■ A relation of the form R(K, A, B, C, D, ...), with K as the key, that is in BCNF can be stored in multiple tables that are also in BCNF, for example, R1(K, A, B), R2(K, C, D), R3(K, ...), by replicating the key K in each table. Such a process is known as vertical partitioning. Each table groups sets of attributes that are accessed together. For example, the table EMPLOYEE(Ssn, Name, Phone, Grade, Salary) may be split into two tables: EMP1(Ssn, Name, Phone) and EMP2(Ssn, Grade, Salary). If the original table has a large number of rows (say 100,000) and queries about phone numbers and salary information are totally distinct and occur with very different frequencies, then this separation of tables may work better. A sketch appears after this list.
■ Attribute(s) from one table may be repeated in another even though this creates redundancy and a potential anomaly. For example, a part name may be replicated in tables wherever the Part# appears (as a foreign key), but there may be one master table in which the part name is guaranteed to be up to date.
■ Just as vertical partitioning splits a table vertically, horizontal partitioning takes horizontal slices of a table and stores them as distinct tables. For example, product sales data may be separated into ten tables based on ten product lines. Each table has the same set of columns but a distinct set of products. If a query or transaction applies to all product data, it may have to run against all the tables, and the results may have to be combined.
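A minimal SQL sketch of the vertical partitioning described above (column types are assumptions):

CREATE TABLE EMP1 (
  Ssn   CHAR(9) PRIMARY KEY,
  Name  VARCHAR(60),
  Phone VARCHAR(15)
);

CREATE TABLE EMP2 (
  Ssn    CHAR(9) PRIMARY KEY,
  Grade  INT,
  Salary DECIMAL(10,2)
);

-- The key Ssn is replicated in both tables; joining EMP1 and EMP2
-- on Ssn reassembles the original EMPLOYEE rows when needed.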
Tuning Queries
Some typical instances of situations prompting query tuning include the following:

1. Many query optimizers do not use indexes in the presence of arithmetic expressions, numerical comparisons of attributes of different sizes and precision (such as Aqty = Bqty where Aqty is of type INTEGER and Bqty is of type SMALLINT), NULL comparisons, and substring comparisons.

2. Indexes are often not used for nested queries using IN; for example, the following query:

SELECT Ssn
FROM EMPLOYEE
WHERE Dno IN ( SELECT Dnumber
               FROM DEPARTMENT
               WHERE Mgr_ssn = '333445555' );

may not use the index on Dno in EMPLOYEE, whereas using Dno = Dnumber in the WHERE-clause with a single block query may cause the index to be used.

3. Some DISTINCTs may be redundant and can be avoided without changing the result. A DISTINCT often causes a sort operation and must be avoided as much as possible.

4. Unnecessary use of temporary result tables can be avoided by collapsing multiple queries into a single query, unless the temporary relation is needed for some intermediate processing.

5. In some situations involving the use of correlated queries, temporaries are useful. Consider the following query, which retrieves the highest paid employee in each department:

SELECT Ssn
FROM EMPLOYEE E
WHERE Salary = ( SELECT MAX(Salary)
                 FROM EMPLOYEE AS M
                 WHERE M.Dno = E.Dno );

This has the potential danger of searching all of the inner EMPLOYEE table M for each tuple from the outer EMPLOYEE table E. To make the execution more efficient, the process can be broken into two queries, where the first query just computes the maximum salary in each department:

SELECT MAX(Salary) AS High_salary, Dno INTO TEMP
FROM EMPLOYEE
GROUP BY Dno;

SELECT EMPLOYEE.Ssn
FROM EMPLOYEE, TEMP
WHERE EMPLOYEE.Salary = TEMP.High_salary
AND EMPLOYEE.Dno = TEMP.Dno;

6. If multiple options for a join condition are possible, choose one that uses a clustering index and avoid those that contain string comparisons. For example, it is better to join on a numeric key with a clustering index (such as Ssn) than on a string attribute such as Name.

7. One idiosyncrasy with some query optimizers is that the order of tables in the FROM-clause may affect the join processing. If that is the case, one may have to switch this order so that the smaller of the two relations is scanned and the larger relation is used with an appropriate index.

8. Some query optimizers perform worse on nested queries compared to their equivalent un-nested counterparts. There are four types of nested queries, depending on whether the subquery is correlated and whether it contains aggregates. Of the four types, the first one typically presents no problem, since most query optimizers evaluate the inner query once. However, for a query of the second type, such as the example in item 2, most query optimizers may not use an index on Dno in EMPLOYEE, although the same optimizers may use an index on Dnumber in DEPARTMENT; transforming such a query into an un-nested (single-block) form can therefore help.

9. Finally, many applications are based on views that define the data of interest to those applications. Sometimes, these views become overkill, because a query may be posed directly against a base table, rather than going through a view defined by a join.
Concepts of Keys
A key is a combination of one or more columns that is used to identify rows in a relation
A key in DBMS is an attribute or a set of attributes that help to uniquely identify a tuple
(or row) in a relation (or table). Keys are also used to establish relationships between the
different tables and columns of a relational database. Individual values in a key are called
key values.
A composite key is a key that consists of two or more columns
Super Keys
A super key is a combination of columns that uniquely identifies any row within a
relational database management system (RDBMS) table.
In a real database, we don't need the values of all of those columns to identify a row.
We only need, per our example, the set {EmployeeID}.
o This is a minimal superkey.
o So, EmployeeID is a candidate key.
o EmployeeID can also uniquely identify the tuples.
A candidate key is a key that determines all of the other columns in a relation.
Candidate key columns help in searching, since they contain unique (non-duplicated) values.
Examples
In PRODUCT relation
In ORDER_PROD
ACC#, Fname, Lname, DOB, CNIC#, Addr, City, TelNo, Mobile#, DriveLic#
For NADRA: Citizen (CNIC#, Fname, Lname, FatherName, DOB, OldCNIC#, PAddr, PCity,
TAddr, TCity, TelNo, Mobile#)
Reading Content:
A candidate key is used for searching purposes in the logical and conceptual database design.
cname, address, and city can each be duplicated individually, so they cannot determine a record.
Since {natid} is contained in {natid, cname}, the set {natid, cname} is not minimal and therefore not a candidate key, while {natid} is a candidate key.
Exercises
Answer:
A primary key is a candidate key selected as the primary means of identifying rows in a
relation:
A primary key is a minimal identifier (perhaps 2 or 3 columns together give uniqueness)
that is used to identify tuples uniquely.
This means that no subset of the primary key is sufficient to provide unique identification of
tuples.
We will now discuss how to identify the primary key in different examples.
Example-1:
B# is the primary key. Although BName is unique and not null, it is not short.
Example-4:
In this topic we are going to discuss more examples, because we need to analyze further real-life issues in deciding on a well-formatted and well-organized primary key.
As we all know, different organizations use various kinds of primary keys.
Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed. It is a data structure technique used to quickly locate and access the data in a database. The DBMS automatically creates an index on the primary key.
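For example, a minimal sketch (the STUDENT table and City column are assumptions; the primary key index is created automatically, while secondary indexes are created explicitly):

CREATE INDEX idx_student_city
ON STUDENT (City);   -- speeds up queries that select students by city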
In this topic we discuss different formats and styles of the primary key.
Basics of Indexing:
(Figure.) When indexing is used, data is accessed more quickly than when indexing is not used.
Roll# is issued by the university or college in which the students are studying.
Surrogate Keys
A surrogate key is:
DBMS supplied
Short, numeric, and never changing – an ideal primary key!
Made of artificial values that are meaningless to users
Normally hidden in forms and reports
Example
RENTAL_PROPERTY without surrogate key:
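As a hedged sketch in the document's schema notation (the column names are assumptions for illustration):

RENTAL_PROPERTY(Street, City, State, Zip, Rent)             -- PK: (Street, City, State, Zip), long and changeable
RENTAL_PROPERTY(PropertyID, Street, City, State, Zip, Rent) -- PK: PropertyID, a short DBMS-supplied surrogate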
Exercises
Which attributes are needed?
Example 1: ATMTransaction
Example 2: InsurancePaid
Normally, insurance is paid every year in advance. Therefore, the paid year is a well-defined artificial key, because the standard criterion is that insurance is managed on a yearly basis.
Solving Example-1
Now we must choose whether Invoice# should be handled as a surrogate key or not (for example, when we are buying something from a cash-and-carry store).
A surrogate key is called a factless key, as it is added just for ease of identifying unique records; it contains no relevant fact (or information) that is useful for the table.
It contains a unique value for every record of the table.
Solving Example-2
Let us discuss
o Collect all the necessary information about an event (booked by, client name, payment method, etc.).
Which columns are required?
Reading Material:
Discuss Facebook, blogs, and Twitter: if we want to store their data in a database, how do we decide which ID should be used?
Solving Example-2
How do we decide which columns are required to keep track of installments?
o Columns that we need: paid amount, due amount, balance, paid date, due date, penalty, and status.
Comparisons of keys
Let us discuss
The main difference between a surrogate key and a primary key is that a surrogate key is an artificial column added to serve as the primary key, while a primary key is the minimal set of columns that uniquely identifies each record.
The primary key is the minimal set of attributes which uniquely identifies any row of a
table. The primary key cannot have a NULL value. It cannot have duplicate values.
Unique Key is a key which has a unique value and is used to prevent duplicate values in a
column. A unique key can have a NULL value which is not allowed in a primary key.
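A minimal SQL sketch contrasting the two (the ACCOUNT table and its columns are assumptions, echoing the ACC#/CNIC# example above):

CREATE TABLE ACCOUNT (
  AccNo INT PRIMARY KEY,   -- primary key: unique and NOT NULL
  CNIC  CHAR(13) UNIQUE    -- unique key: unique, but NULL is allowed
);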
1.
a. System date
b. time stamp
c. Random alphanumeric string
Reading Material:
Definition:
A foreign key is an attribute that refers to a primary key of the same or a different relation, to form a link (constraint) between the relations:
Example-1
Example-2
In this example, CustID is the foreign key in the ORDERS table; it refers to the primary key in the CUSTOMER table.
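A minimal SQL sketch of Example-2 (the column types are assumptions):

CREATE TABLE CUSTOMER (
  CustID INT PRIMARY KEY,
  Name   VARCHAR(60)
);

CREATE TABLE ORDERS (
  OrderID   INT PRIMARY KEY,
  OrderDate DATE,
  CustID    INT,
  FOREIGN KEY (CustID) REFERENCES CUSTOMER (CustID)
);

-- Every ORDERS.CustID value must already exist in CUSTOMER.CustID.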
In this topic, we discuss more examples of foreign keys. In the last topic, we covered the characteristics of a foreign key.
Relationship details:
A relationship between two entities of the same entity type is called a recursive relationship. Here the same entity type participates more than once in a relationship type, with a different role for each instance. In other words, although a relationship is ordinarily between occurrences of two different entity types, the same entity type can participate on both sides of a relationship; this is termed a recursive relationship.
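A classic sketch of a recursive relationship is an employee/supervisor link, where the foreign key refers to the primary key of the same table (the names and types are assumptions):

CREATE TABLE EMPLOYEE (
  Ssn       CHAR(9) PRIMARY KEY,
  Name      VARCHAR(60),
  Super_ssn CHAR(9),
  FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE (Ssn)  -- self-reference
);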
A referential integrity constraint is a statement that limits the values of the foreign key to those already existing as primary key values in the corresponding relation.
Integrity Rules
Entity integrity
It specifies that no primary key value can be NULL.
Detail:
The entity integrity constraint states that no primary key value can be NULL. This is because the primary
key value is used to identify individual tuples in a relation. Having NULL values for the primary key
implies that we cannot identify some tuples. For example, if two or more tuples had NULL for their
primary keys, we may not be able to distinguish them if we try to reference them from other relations.
Key constraints and entity integrity constraints are specified on individual relations.
(Reference: Fundamentals of Database Systems, by Ramez Elmasri and Shamkant Navathe, Addison-Wesley, 6th Edition.)
Referential integrity
The database must not contain any unmatched foreign key values.
Detail:
The referential integrity constraint is specified between two relations and is used to maintain the
consistency among tuples in the two relations. Informally, the referential integrity constraint states that a
tuple in one relation that refers to another relation must refer to an existing tuple in that relation.
(Reference: Fundamentals of Database Systems, by Ramez Elmasri and Shamkant Navathe, Addison-Wesley, 6th Edition.)
When we update or delete a tuple in a table, say S, that is referenced by another table, say SP, the DBMS must enforce referential integrity. There are similar choices of referential actions for update and delete:
ON UPDATE CASCADE
ON UPDATE RESTRICT
ON DELETE CASCADE
ON DELETE RESTRICT
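A minimal SQL sketch using the S and SP tables above (the columns are assumptions):

CREATE TABLE S (
  SNo   INT PRIMARY KEY,
  SName VARCHAR(40)
);

CREATE TABLE SP (
  SNo INT,
  PNo INT,
  Qty INT,
  PRIMARY KEY (SNo, PNo),
  FOREIGN KEY (SNo) REFERENCES S (SNo)
      ON UPDATE CASCADE    -- changing S.SNo propagates to SP
      ON DELETE RESTRICT   -- deleting a referenced S row is rejected
);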
Definition of Composite Key
A combination of selected attributes acting together as the primary key (PK) of a relation gives the concept of a composite key. A composite key is an extended form of a primary key: all the characteristics of a PK apply to a composite key, which comprises more than one column.
Basic Example of Composite Keys
During ATM Transaction, amounts can be drawn several times on one ATM Card.
We need to consider the following issues:
How much amount will be withdrawn?
When amount was drawn?
Which machine has been used?
What are the types of transaction?
Card#, Amount, DrawDate, Machine#, and TransType are the attributes for a transaction.
Other Examples of Composite Keys
Can we say Card# will be a primary key? No.
Can DrawDate be a key together with Card#? No. The date includes the time in seconds as well, yet many transactions may be managed on the same date.
Then what to do? Add a surrogate key, TransNo:
ATMTransaction(Card#, TransNo, Amount, DrawDate, Machine#, TransType)
Composite Keys Examples
Preparing Dataset for Composite Key
Card(Card#, CardHolderName, ExpiryDate, IssueDate, Acc#)
ATMTransaction(Card#, TransNo, Amount, DrawDate, Machine#, TransType)
Answer the following questions.
Do we need the most closely related table?
What are the PK and FK?
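As a hedged answer sketch in SQL (the types are assumptions, and # is dropped from identifiers to keep them SQL-safe): the PK of ATMTransaction is the composite (Card#, TransNo), and Card# is also an FK referencing Card:

CREATE TABLE Card (
  CardNo         CHAR(16) PRIMARY KEY,
  CardHolderName VARCHAR(60),
  ExpiryDate     DATE,
  IssueDate      DATE,
  AccNo          INT
);

CREATE TABLE ATMTransaction (
  CardNo    CHAR(16),
  TransNo   INT,
  Amount    DECIMAL(10,2),
  DrawDate  TIMESTAMP,
  MachineNo INT,
  TransType VARCHAR(20),
  PRIMARY KEY (CardNo, TransNo),           -- composite primary key
  FOREIGN KEY (CardNo) REFERENCES Card (CardNo)
);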
Hospital Related Example
PatientVisit(PatID, VisitSNO, Vdate, DocID, Diagnosis,…)
LabTest(PatID, VisitSNO, TestID, Tdate, …)
LabTest has FKs PatID and VisitSNO, referring to the corresponding composite key in PatientVisit.
Answer the following question.
Which lab table needs to be considered?
What about Excluding Composite Key?
Card(Card#, CardHolderName, ExpiryDate, IssueDate, Acc#)
ATMTransaction(ID, Card#, TransNo, Amount, DrawDate, Machine#, TransType)
Answer the following questions.
Do we need the most closely related table?
What is a drawback of using ID as a surrogate key in ATMTransaction?
What are the PK and FK?
Composite Keys Examples
Course Offering Example
Let us assume there are 1,000 courses in a table or list.
Students want to register for courses.
Can they register by looking through all 1,000 courses? No.
Answer the following questions.
What to do?
Offer selected courses?
Course Offering Example
Offering course table
CrsOffer(SemID, CrsID, Sec, InstrName, B#, R#)
What to do for keys: a single-column PK or a composite key?
CrsOffer(SemID, CrsID, Sec, InstrName, B#, R#)
CrsReg(SemID, Roll#, CrsID, Sec, TotMarks, Grade)
Preparing Dataset for Composite Keys
CrsOffer(SemID, CrsID, Sec, InstrName, Building#, Room#)
CrsReg(SemID, Roll#, CrsID, Sec, TotMarks, Grade)
Answer the following questions.
Do we need the most closely related tables?
What are the PK and FK?
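As a hedged answer sketch in SQL (the types are assumptions, with identifiers made SQL-safe): the PK of CrsOffer is the composite (SemID, CrsID, Sec), and CrsReg carries it as a composite FK:

CREATE TABLE CrsOffer (
  SemID      VARCHAR(10),
  CrsID      VARCHAR(10),
  Sec        CHAR(1),
  InstrName  VARCHAR(60),
  BuildingNo INT,
  RoomNo     INT,
  PRIMARY KEY (SemID, CrsID, Sec)          -- composite primary key
);

CREATE TABLE CrsReg (
  SemID    VARCHAR(10),
  RollNo   VARCHAR(12),
  CrsID    VARCHAR(10),
  Sec      CHAR(1),
  TotMarks INT,
  Grade    CHAR(2),
  PRIMARY KEY (SemID, RollNo, CrsID),
  FOREIGN KEY (SemID, CrsID, Sec) REFERENCES CrsOffer (SemID, CrsID, Sec)
);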