Dbms Full Notes
MINISTRY OF EDUCATION
DIPLOMA IN
INFORMATION COMMUNICATION
TECHNOLOGY
Contents
CHAPTER 1: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS
Introduction to Database management system (DBMS)
Advantages of DBMS
Meaning of "database system"
Components of DBMS
Database Access Language
Users (role of key players in database design and development)
Evolution of DBMS
Current Trends
Definition of the database approach
CHAPTER 2: DATABASE ORGANISATION
Meaning of Database Organization
Database Organization – Types/Approaches
CHAPTER 3: PRINCIPLES AND TECHNIQUES OF DATABASE DESIGN
Introduction to Database design Principles
The database life cycle (DBLC)
Basic Design Principles
Types of Database Modeling Techniques
CHAPTER 4: RELATIONAL DATABASE SYSTEM
Introduction to relational database management system
Characteristics of Relational Databases
Relational algebra
Relational Algebra Types and Operations
Set Operations on Relations
Selection (σ)
Projection (Π)
Join
Implementing Set Operations
Expressing Queries in Relational Algebra
Relational calculus
The Tuple Relational Calculus [TRC]
It’s a set of software programs that controls the organization, storage and retrieval of data (fields,
records and files) in a database. It also controls the security and integrity of the database.
Advantages of DBMS
The database management system has a number of advantages compared with the traditional
computer file-based processing approach. The DBA must keep these benefits and
capabilities in mind when designing databases and monitoring the DBMS.
The main advantages of DBMS are described below.
Controlling Data Redundancy - In non-database systems each application program has its own
private files. In this case, duplicate copies of the same data are created in many places. In a
DBMS, all data of an organization is integrated into a single database file. The data is recorded
in only one place in the database and is not duplicated.
Sharing of Data - In a DBMS, data can be shared by authorized users of the organization. The
database administrator manages the data and grants users rights to access it. Many users
can be authorized to access the same piece of information simultaneously. Remote users can
also share the same data. Similarly, the data of the same database can be shared between different
application programs.
Data Consistency - By controlling the data redundancy, the data consistency is obtained. If a data
item appears only once, any update to its value has to be performed only once and the updated
value is immediately available to all users. If the DBMS has controlled redundancy, the database
system enforces consistency.
Creating Forms - A form is a very important object of a DBMS. You can create forms easily and
quickly in a DBMS. Once a form is created, it can be used many times and modified
easily. Created forms are saved along with the database and behave like a software
component. A form provides a user-friendly way to enter data into the database, edit data
and display data from the database. Non-technical users can also perform various operations on
the database through forms without going into the technical details of the database.
Report Writers - Most DBMSs provide report writer tools for creating reports.
Users can create reports easily and quickly. Once a report is created, it can be used many times and
modified easily. Created reports are saved along with the database and behave
like a software component.
Control Over Concurrency - In a computer file-based system, if two users are allowed to access
data simultaneously, it is possible that they will interfere with each other. For example, if both
users attempt to perform update operation on the same record, then one may overwrite the values
recorded by the other. Most database management systems have sub-systems to control the
concurrency so that transactions are always recorded with accuracy.
Backup and Recovery Procedures - In a computer file-based system, the user must back up
data regularly to protect valuable data from damage due to failures of the computer system
or application program. This is a very time-consuming method if the amount of data is large. Most
DBMSs provide 'backup and recovery' sub-systems that automatically create backups of
data and restore data if required.
Data Independence - The separation of data structure of database from the application program
that uses the data is called data independence. In DBMS, you can easily change the structure of
database without modifying the application program.
Components of DBMS
A database management system (DBMS) consists of several components. Each component plays
a very important role in the database management system environment. The major components of
a database management system are:
Software
Hardware
Data
Procedures
Database Access Language
Software
The main component of a DBMS is the software. It is the set of programs used to handle the
database and to control and manage the overall computerized database.
1. The DBMS software itself, which is the most important software component in the overall system.
2. The operating system, including the network software used on a network to share the data of the
database among multiple users.
3. Application programs, developed in programming languages such as C++ or Visual Basic,
that are used to access the database in the database management system. Each program
contains statements that request the DBMS to perform operations on the database. The
operations may include retrieving, updating, deleting data, etc. Application programs
may be run conventionally or from online workstations or terminals.
Hardware
Hardware consists of a set of physical electronic devices such as computers (together with
associated I/O devices like disk drives), storage devices, I/O channels, electromechanical devices
that interface between computers and real-world systems, and so on. It is impossible
to implement the DBMS without hardware devices. In a network, a powerful computer with
high data-processing speed and a storage device with large storage capacity are required as the
database server.
Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process
the data. In DBMS, databases are defined, constructed and then data is stored, updated and
retrieved to and from the databases. The database contains both the actual (or operational) data
and the metadata (data about data or description about data).
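The distinction between operational data and metadata can be illustrated with a minimal Python sketch using the standard-library sqlite3 module. The table student and its columns are hypothetical names chosen for this example; sqlite_master is the catalogue table in which SQLite keeps its metadata.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Amina')")

# Operational data: the rows stored in the table.
data = conn.execute("SELECT id, name FROM student").fetchall()

# Metadata: the description of the table, kept in SQLite's own catalogue.
metadata = conn.execute(
    "SELECT name, type FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(data)      # [(1, 'Amina')]
print(metadata)  # [('student', 'table')]
```

Other DBMSs expose their catalogue differently, so the catalogue query is specific to SQLite.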
Procedures
Procedures refer to the instructions and rules that help to design the database and to use the
DBMS. The users who operate and manage the DBMS require documented procedures on how to use
or run the database management system. These may include.
1. Application Programmers
2. Database Administrators
3. End-Users
Application Programmers
People who write application programs in programming languages (such as Visual Basic,
Java, or C++) to interact with databases are called application programmers.
Database Administrators
A person who is responsible for managing the overall database management system is called
database administrator or simply DBA.
A database developer is an IT professional responsible for working on database technologies. Where
database administrators are more focused on routine maintenance and support for an existing database
setup, database developers tend to focus more on improving databases, expanding their range or
functionality, or otherwise developing submissions for a company's IT architecture.
End-Users
End-users are the people who interact with the database management system to perform
different operations on the database, such as retrieving, updating, inserting and deleting data.
Evolution of DBMS
1. The sixties and seventies: centralized systems
The DBMSs of the sixties and seventies (IBM IMS, Bull IDS, Univac DMS, etc.) were totally centralized
systems, as befitted the operating systems and hardware of those years: one large
enterprise-wide computer and a network of dumb terminals without memory of their own.
The first DBMSs (in the sixties they were not yet called that) were aimed at facilitating the use of large
data sets with complex interrelationships. The archetypal application was the
bill of materials, or parts explosion, typical of the automotive industry, spacecraft construction and
related fields. These systems worked only in batch mode.
When keyboard terminals appeared, connected to the central computer via a telephone line,
large on-line transaction processing (OLTP) applications began to be built. DBMS software was
closely tied to communications and transaction-management software.
Although application programs were written in high-level languages such as COBOL or PL/I, they
also had to use specialized instructions and subroutines to handle the database. This required the
programmer to know many details of the physical design and made programming very complex.
Since the programs were tied to the physical level, they had to be modified continually whenever
changes were made to the design and organization of the database. The basic concern was to
maximize performance: response time and transactions per second.
Application development needed to become easier. The DBMSs of the seventies were
too complex and inflexible, and could only be used by highly qualified personnel.
The emergence of relational DBMSs marked a significant step toward making it easier to program
database applications and ensuring that programs are independent of the physical aspects of the
database.
All these factors led to wider use of DBMSs. The standardization of the SQL language in 1986
triggered a veritable explosion of relational DBMSs.
Personal computers
During the eighties, personal computers appeared and spread very quickly. Single-user software
also appeared for these machines (e.g., dBase and its derivatives, Access), with which it is very easy to
create and use data sets, known as personal databases. Note that calling
these early PC systems DBMSs is a bit forced, since they did not support complex structures or
relationships, nor could they be used on a network to serve many different users simultaneously.
Some, however, have over time turned into real DBMSs.
In the late eighties and early nineties, companies found that their departments had been
buying departmental and personal computers and building their own database applications. The result
was that within a company there were numerous databases and several DBMSs of different types or
from different suppliers. This multiplication of databases and DBMSs was increased by the
wave of mergers.
The need for a global view of the company and for linking different applications that use
different databases, together with the ease of communication between computers that networks
provide, has led to current DBMSs, which allow a program to work with different databases as if
they were a single one. This is known as a distributed database.
This ideal distribution is achieved when the various databases are all supported by one brand of DBMS,
i.e., when there is homogeneity. It is not so simple when the DBMSs are heterogeneous.
Today, thanks largely to the standardization of the SQL language, DBMSs of different brands can
serve one another and work together to provide service to an application program. In general,
however, in heterogeneous cases it is not possible to give the program using them the
appearance of a single database.
In addition to this "imposed" distribution, which arises from wanting to treat separate, pre-existing
databases in an integrated way, a "desired" distribution is also possible: designing a database that is
physically distributed, with parts replicated on different systems. The basic reasons for wanting this
distribution are:
2) Cost. A distributed database can reduce cost. In a centralized system, all users'
computers, which may be spread across different and distant geographical areas, are connected to the
central system via communication lines. The total cost of communications can be reduced by placing
close to each user the data that user needs most often, e.g., on a computer in their office or even on
their personal computer.
The technology commonly used to distribute data is known as the client/server (C/S) environment
(or architecture). All relational DBMSs on the market have been adapted to this
environment.
The idea of C/S is simple. Two different processes, running on one system or on separate systems,
act so that one takes the role of client, or service requester, and the other the role of server, or service
provider.
For example, an application program that a user runs on their PC (which is connected to a
network) requests some data from a database that resides on a UNIX computer which, in turn, runs the
relational DBMS that manages it. The application program is the client and the DBMS is the
server. A client process can request services from multiple servers. A server can receive requests
from many clients. In general, a process A that acts as a client, requesting a service from another
process B, can also act as a server for a service requested by another process C (or even by B, which
in that request would be the client). The client and server can even reside on the same system.
The ease of distributing data is not the only reason, nor even the basic one, for the great
success of C/S environments in the nineties. Perhaps the main reason was the flexibility to
build and grow the company's overall computing configuration, and to
modify it, using very standard and cheap hardware and software.
The success of databases, including on personal computers, has led to the emergence of
fourth-generation languages (4GL): very easy and powerful languages specialized in developing
database-centered applications. They provide many facilities for defining, usually visually,
dialogs for entering, modifying and querying data in C/S environments.
Current Trends
Today, relational DBMSs are being transformed to accommodate three closely related, recently
successful technologies: multimedia, object orientation (OO), and the Internet and the web.
The data types that could be defined in the relational DBMSs of the eighties and nineties are very
limited. The incorporation of multimedia technologies (image and sound) makes it necessary for
relational DBMSs to accept attributes of these types.
However, for some applications the addition of specialized media types is not enough.
They need complex types that the developer can define to suit the application. In short,
abstract data types (ADTs) are needed. The latest DBMSs already incorporate this possibility, opening
up a broad market for predefined ADTs or class libraries.
This brings us to object orientation (OO). The success of OO at the end of the eighties in the
development of basic software, in industrial engineering applications and in the construction of
graphical user interfaces led to its spread during the nineties into virtually all
fields of computing.
In information systems (IS), the adoption of OO has also begun, though timidly for now. The use of
languages such as C++ or Java requires relational DBMSs to adapt to them with appropriate interfaces.
The rapid adoption of the web in information systems means that DBMSs are incorporating resources
for acting as website servers, such as SQL scripts embedded in HTML, SQL embedded in Java, etc.
Note that OO and multimedia data are common in the world of the web.
In recent years a type of database application called the data warehouse has begun to spread,
which is also producing changes in the relational DBMSs on the market.
Over years of working with databases in different applications, companies have accumulated
large amounts of data of all kinds. If these data are analyzed appropriately, they can provide valuable
information.
A large database is therefore maintained with information from all kinds of enterprise applications
(and even from outside). The data in this warehouse are obtained by a more or less
elaborate replication of the data held in the databases used in the company's daily work. These data
warehouses are used exclusively for queries, especially for studies carried out by
financial analysts, market analysts, etc.
DBMSs are currently being adapted to this type of application, incorporating, for example, tools such as:
Self contained nature of database systems (database contains both data and meta-data).
Data Independence: application programs and queries are independent of how data is
actually stored.
Data sharing.
Controlling redundancies and inconsistencies.
Flat
Hierarchical
Relational
Object-oriented
Flat databases
A single kind of record with a fixed number of fields.
Hierarchical databases
can be very easy to answer some questions, but very difficult to answer others
if one-to-many relationship is violated (e.g., a patient can have more than one physician)
then the hierarchy becomes a network
Relational databases
Data are organized as logically independent tables. Features:
'Natural'
Not so strongly biased towards specific questions
Expresses relationships by means of shared data rather than explicit pointers
Theoretical basis: relational algebra, calculus; closure
Operations on tables (Join, Project, Select) to form new tables
Object-oriented databases
Object-oriented analysis is another way to model the world,
involving abstraction, encapsulation, modularity and hierarchy (with inheritance).
Classes are used to group objects which have the same types of data and the same methods.
1. Requirements analysis
2. Logical design
3. Physical design
4. Implementation
5. Monitoring, modification, and maintenance
The first three stages are database-design stages, which are briefly described below.
I. Requirements analysis
Requirements Analysis is the first and most important stage in the Database Life Cycle.
It is the most labor-intensive for the database designer.
This stage involves assessing the informational needs of an organization so that a database can
be designed to meet those needs.
II. Logical design
During the first part of Logical Design, a conceptual model is created based on the needs
assessment performed in stage one. A conceptual model is typically an entity-relationship (ER)
diagram that shows the tables, fields, and primary keys of the database, and how tables are
related (linked) to one another.
The tables sketched in the ER diagram are then normalized. The normalization process resolves
any problems associated with the database design, so that data can be accessed quickly and
efficiently.
Certain database design books consider converting an ER diagram into SQL statements to be the
final task in the logical-design stage. According to such books, implementation is just a matter of
feeding SQL statements into an RDBMS and populating the database with data. The difference is
not especially important.
is the field value always required, i.e., are NULL values denied (NOT NULL constraint in SQL)**
should the field values be unique (PRIMARY KEY constraint, UNIQUE constraint)***
what set of values is allowed (CHECK constraint, FOREIGN KEY constraint)
are there more complex constraints or business rules to be set up (ASSERTION constraint,
triggers)
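The questions above map directly onto SQL constraints. The following is a minimal sketch, assuming SQLite through Python's standard-library sqlite3 module; the department and employee tables and their sample values are hypothetical. The ASSERTION constraint is not shown because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary >= 0),
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'ICT');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "NOT NULL": rejected("INSERT INTO employee VALUES (1, NULL, 100, 1)"),
    "UNIQUE": rejected("INSERT INTO department VALUES (2, 'ICT')"),
    "CHECK": rejected("INSERT INTO employee VALUES (2, 'Otieno', -5, 1)"),
    "FOREIGN KEY": rejected("INSERT INTO employee VALUES (3, 'Wanjiru', 100, 99)"),
    "valid row": rejected("INSERT INTO employee VALUES (4, 'Amina', 100, 1)"),
}
print(results)
```

Each of the first four statements breaks exactly one constraint and is refused; only the last row is accepted.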
* Technically speaking, date and time data types fall into the category of numeric data types.
** Make sure that you understand the difference between an empty string and NULL. With string data
types they can look the same from the user's point of view, but they behave differently in search
conditions. To avoid confusion, some developers advise denying NULLs in fields with a string data type.
*** A table can have only one primary key but many alternate keys, i.e., keys defined with the UNIQUE
constraint. Notice that both constraints can involve a set of fields rather than only one field. Make sure
that you understand the difference between these two:
a) table has two UNIQUE constraints: UNIQUE(a), UNIQUE(b);
b) table has one UNIQUE constraint: UNIQUE(a,b).
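The difference between cases a) and b) can be checked with a small experiment. This is a sketch assuming SQLite via Python's sqlite3 module; the table names separate_keys and combined_key are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE separate_keys (a INTEGER, b INTEGER, UNIQUE (a), UNIQUE (b));
CREATE TABLE combined_key  (a INTEGER, b INTEGER, UNIQUE (a, b));
""")

def try_insert(table, rows):
    """Insert rows one by one; return the rows that the table accepted."""
    accepted = []
    for row in rows:
        try:
            conn.execute(f"INSERT INTO {table} (a, b) VALUES (?, ?)", row)
            accepted.append(row)
        except sqlite3.IntegrityError:
            pass  # the row violated a UNIQUE constraint
    return accepted

rows = [(1, 1), (1, 2), (2, 1), (1, 1)]
separate = try_insert("separate_keys", rows)  # each column must be unique on its own
combined = try_insert("combined_key", rows)   # only the pair (a, b) must be unique

print(separate)  # [(1, 1)]
print(combined)  # [(1, 1), (1, 2), (2, 1)]
```

With two separate constraints a value may not repeat in either column, whereas the combined constraint only rejects a repeated pair.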
Another way to speed up data searches is denormalization, for cases where the query would
otherwise have to join tables. Denormalization simply means joining two or more tables into one table.
The join operation then does not have to be done by the query, which saves time. Again, this means
slowing down data update operations, because denormalized tables contain redundancies:
the same information has to be updated in many places.
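The trade-off described above can be sketched as follows, assuming SQLite via Python's sqlite3 module. All table names, column names and rows are hypothetical and serve only to show one join being replaced by a stored, redundant table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: the department name is stored once.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO department VALUES (1, 'ICT');
INSERT INTO employee VALUES (1, 'Amina', 1), (2, 'Otieno', 1);

-- Denormalized design: the join result is stored as one table.
CREATE TABLE employee_denorm AS
    SELECT e.emp_id, e.name, d.dept_name
    FROM employee e JOIN department d ON e.dept_id = d.dept_id;
""")

# Reading: the denormalized table needs no join to answer the same question.
with_join = conn.execute(
    "SELECT e.name, d.dept_name FROM employee e "
    "JOIN department d ON e.dept_id = d.dept_id ORDER BY e.emp_id"
).fetchall()
without_join = conn.execute(
    "SELECT name, dept_name FROM employee_denorm ORDER BY emp_id"
).fetchall()

# Updating: renaming the department touches one row in the normalized
# design but every matching row in the denormalized one.
normalized_rows = conn.execute(
    "UPDATE department SET dept_name = 'Computing' WHERE dept_id = 1"
).rowcount
denormalized_rows = conn.execute(
    "UPDATE employee_denorm SET dept_name = 'Computing' WHERE dept_name = 'ICT'"
).rowcount

print(with_join == without_join)           # True
print(normalized_rows, denormalized_rows)  # 1 2
```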
Of the models described below, the relational model is the most commonly used for
most database designs. In some special cases, however, other models can be more beneficial.
Relational Model: Founded on mathematical theory, this database model takes information
storage and retrieval to a new level because it offers a way to find and understand different
relationships between the data. By looking at how different variables can change the
relationship between the data, new perspectives can be gained as the information’s
presentation is altered by focusing on different attributes or domains. These models can often
be found within airline reservation systems or bank databases.
Graph Model: The graph model is another model that is gaining popularity. These databases are
based on graph theory and use nodes and edges to represent data. The structure
is somewhat similar to object-oriented applications. Graph databases are generally easier to
scale and usually perform faster for associative data sets.
Hierarchical Model: Much like the common organizational chart used to organize companies,
this database model has the same tree-like appearance and is often used to structure XML
documents. In looking at data efficiency, this is an ideal model where the data contains nested
and sorted information, but it can be inefficient when the data does not have an upward link to
a main data point or subject. This model works well for an employee information management
system in a company that seeks to restrict or assign equipment usage to certain individuals
and/or departments.
Network Model: Using records and sets, this model uses a one-to-many relationship approach
for the data records. Multiple branches are allocated for lower-level structures and branches
that are then connected by multiple nodes, which represent higher-level structures within the
information. This database modeling method provides an efficient way to retrieve information
and organize the data so that it can be looked at multiple ways, providing a means of increasing
business performance and reaction time. This is a viable model for planning road, train, or utility
networks.
The network model where a node can have multiple parent nodes
Dimensional Model: This is an adaptation of the relational model and is often used in
conjunction with it by adding the “dimension” of fact to the data points. Those facts can be used
as measuring sticks for the other data to determine how a size of a group or the timing of a
group impacted upon certain results. This can help a business make more effective strategic
decisions and help them get to know their target audience. These models can be useful to
organizations with sales and profit analysis.
Object Relational Model: These models have created an entirely new type of database, which
combines database design with application program to solve specific technical problems while
leveraging the best of both worlds. To date, object databases still need to be refined to achieve
greater standardization. Real world applications of this model often include technical or
scientific fields, such as engineering and molecular biology.
1) Data in the relational database must be represented in tables, with values in columns within
rows.
2) Data within a column must be accessible by specifying the table name, the column name, and
the value of the primary key of the row.
3) The DBMS must support missing and inapplicable information in a systematic way, distinct
from regular values and independent of data type.
4) The DBMS must support an active on-line catalogue.
5) The DBMS must support at least one language that can be used independently and from
within programs, and supports data definition operations, data manipulation, constraints, and
transaction management.
6) Views must be updatable by the system.
7) The DBMS must support insert, update, and delete operations on sets.
8) The DBMS must support logical data independence.
9) The DBMS must support physical data independence.
10) Integrity constraints must be stored within the catalogue, separate from the application.
11) The DBMS must support distribution independence. The existing application should run
when the existing data is redistributed or when the DBMS is redistributed.
12) If the DBMS provides a low level interface (row at a time), that interface cannot bypass the
integrity constraints.
Relational algebra
Relational algebra is a procedural query language that takes instances of relations as input and
yields instances of relations as output. It uses operators to perform queries. An operator can be
either unary or binary. Operators accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation, and intermediate results are also
considered relations.
Fundamental operations of Relational algebra:
Select
Project
Union
Set difference
Cartesian product
Rename
Notation: σp(r)
Where σ denotes selection, p is the selection predicate and r is the relation. p is a propositional
logic formula which may use connectives such as and, or and not. Its terms may use relational
operators such as =, ≠, ≥, <, >, ≤.
For example:
σsubject="database"(Books)
Output: Selects tuples from Books where subject is 'database'.
σsubject="database" and price=450(Books)
Output: Selects tuples from Books where subject is 'database' and price is 450.
σsubject="database" and price=450 or year>2010(Books)
Output: Selects tuples from Books where subject is 'database' and price is 450, or the publication
year is greater than 2010, that is, published after 2010.
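The selection examples can be sketched in Python by modelling a relation as a list of dictionaries, one per tuple. The sample rows and the helper name select are assumptions made for illustration, and the last predicate assumes that "and" binds more tightly than "or".

```python
# A relation is modelled as a list of dicts (one dict per tuple).
# The rows below are made-up sample data.
books = [
    {"title": "DB Basics", "subject": "database", "price": 450, "year": 2008},
    {"title": "SQL Guide", "subject": "database", "price": 300, "year": 2012},
    {"title": "Networks", "subject": "networking", "price": 450, "year": 2015},
]

def select(relation, predicate):
    """sigma_p(r): keep the tuples of `relation` for which `predicate` is true."""
    return [t for t in relation if predicate(t)]

by_subject = select(books, lambda t: t["subject"] == "database")
by_subject_and_price = select(
    books, lambda t: t["subject"] == "database" and t["price"] == 450
)
by_subject_price_or_year = select(
    books,
    lambda t: (t["subject"] == "database" and t["price"] == 450) or t["year"] > 2010,
)

print([t["title"] for t in by_subject])                # ['DB Basics', 'SQL Guide']
print([t["title"] for t in by_subject_and_price])      # ['DB Basics']
print([t["title"] for t in by_subject_price_or_year])  # ['DB Basics', 'SQL Guide', 'Networks']
```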
Union ( ∪ )
r ∪ s = { t | t ∈ r or t ∈ s }
Notation: r ∪ s
Where r and s are either database relations or relation result set (temporary relation).
For a union operation to be valid, the following conditions must hold:
r, s must have same number of attributes.
Attribute domains must be compatible.
Set Difference ( − )
The result of the set difference operation is the set of tuples that are present in one relation but
not in the second relation.
Notation: r − s
Finds all tuples that are present in r but not in s.
Output : yields a relation as result which shows all books and articles written by tutorialspoint.
Rename operation ( ρ )
The results of relational algebra are also relations, but without any name. The rename operation
allows us to name the output relation. The rename operation is denoted by the small Greek letter rho (ρ).
Notation: ρx(E)
Where the result of expression E is saved with the name x.
Additional operations are:
Set intersection
Assignment
Natural join
Relational algebra is a formal system for manipulating relations (Tables). Operands of this algebra are
relations. Operations of this algebra include the usual set operations (since relations are sets of tuples
[Rows]), and special operations defined for relations.
For the set operations on relations, both operands must have the same scheme, and the result has
that same scheme.
R1 ∪ R2 (union) is the relation containing all tuples that appear in R1, R2, or both.
R1 ∩ R2 (intersection) is the relation containing all tuples that appear in both R1 and R2.
R1 − R2 (set difference) is the relation containing all tuples of R1 that do not appear in R2.
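Because relations are sets of tuples, the three set operations can be tried directly with Python's built-in set type. The two small relations below are illustrative data sharing the scheme (name, room).

```python
# Relations with the same scheme (name, room), modelled as sets of tuples.
r1 = {("coke", 633), ("bass", 633), ("tab", 628)}
r2 = {("tab", 628), ("crush", 628)}

union = r1 | r2          # tuples in R1, R2, or both
intersection = r1 & r2   # tuples in both R1 and R2
difference = r1 - r2     # tuples of R1 that do not appear in R2

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Python does not check that both operands have the same scheme; in this sketch that remains the caller's responsibility.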
Selection (σ)
Selects tuples from a relation whose attributes meet the selection criteria, which is normally
expressed as a predicate.
R2 = select(R1,P)
That is, from R1 we create a new relation R2 containing those tuples from R1 that satisfy (make
true) the predicate P.
A predicate is a boolean expression whose operators are the logical connectives (and, or, not)
and arithmetic comparisons (LT, LE, GT, GE, EQ, NE), and whose operands are either domain
names or domain constants.
select(Workstation,Room=633) =
Projection (Π)
R2 = project(R1,D1,D2,...Dn)
That is, from the tuples in R1 we create a new relation R2 containing only
the domains D1,D2,..Dn.
project(Server,Name,Status) =
Name        Status
==================
diamond     up
emerald     up
graphite    down
ruby        up
frito       up
project(select(User,Status=UG),Name,Status) =
Name        Status
==================
A. Cohn     UG
J. Inka     UG
R. Kemp     UG
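The composition of select and project can be sketched in Python as below. The Room column and the fourth row of the User relation are hypothetical additions (the notes show only the three UG rows), and the helper names are assumptions for illustration.

```python
# Hypothetical User relation; only the three UG rows appear in the notes.
user = [
    {"Name": "A. Cohn", "Status": "UG", "Room": 633},
    {"Name": "J. Inka", "Status": "UG", "Room": 628},
    {"Name": "R. Kemp", "Status": "UG", "Room": 633},
    {"Name": "T. Mwangi", "Status": "staff", "Room": 706},
]

def select(relation, predicate):
    """Keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, *domains):
    """Keep only the named domains, dropping duplicate result tuples."""
    result = []
    for t in relation:
        reduced = {d: t[d] for d in domains}
        if reduced not in result:
            result.append(reduced)
    return result

# project(select(User,Status=UG),Name,Status)
ug = project(select(user, lambda t: t["Status"] == "UG"), "Name", "Status")
statuses = project(user, "Status")  # duplicates collapse to one tuple per status

print(ug)
print(statuses)  # [{'Status': 'UG'}, {'Status': 'staff'}]
```

The inner select runs first, so project only sees the tuples that satisfied the predicate.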
Join
R3 = join(R1,D1,R2,D2)
Given a domain from each relation, join considers all possible pairs of tuples from the two
relations, and if their values for the chosen domains are equal, it adds a tuple to the result
containing all the attributes of both tuples (discarding the duplicate domain D2).
Natural join: If the two relations being joined have exactly one attribute (domain) name in
common, then we assume that the single attribute in common is the one being compared to see if
a new tuple will be inserted in the result.
Assuming that we've augmented the domain names in our lab database so that we use
MachineName, PrinterName, ServerName, and UserName in place of the generic domain
"Name", then
join(Workstation,Printer)
is a natural join on the shared attribute name Room. The result is a relation of all workstation/printer
pairs that are in the same room.
R1 = project(Workstation,Name,Room)
R2 = project(Printer,Name,Room)
R3 = join(R1,R2)
R1                R2                R3
Name     Room     Name     Room     WName    PName  Room
==============    ==============    =====================
coke     633      chaucer  737      coke     uglab  633
bass     633      keats    706      bass     uglab  633
bashful  633      poe      707      bashful  uglab  633
tab      628      dali     737
crush    628      uglab    633
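The R1, R2 and R3 tables can be reproduced with a few lines of Python. This is a minimal sketch that hard-codes the join attribute (Room) and represents each tuple as a (Name, Room) pair; the function name is an assumption made for illustration.

```python
# R1 = project(Workstation,Name,Room) and R2 = project(Printer,Name,Room)
R1 = [("coke", 633), ("bass", 633), ("bashful", 633), ("tab", 628), ("crush", 628)]
R2 = [("chaucer", 737), ("keats", 706), ("poe", 707), ("dali", 737), ("uglab", 633)]

def natural_join_on_room(workstations, printers):
    """Pair every workstation with every printer in the same room."""
    return [
        (wname, pname, wroom)              # Room is kept only once
        for (wname, wroom) in workstations
        for (pname, proom) in printers
        if wroom == proom
    ]

R3 = natural_join_on_room(R1, R2)
for row in R3:
    print(row)
```

Workstations in room 628 do not appear in R3 because no printer is located in that room.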
copy R1 to R3 in O(N)
insert R2 in R3 in O(M)
copy R1 to R3 in O(N)
for each tuple in R2 (which is O(M))
o use index to lookup tuples in R1 with the same index value O(1)
o if R2 tuple equals some such R1 tuple, don't add R2 tuple to R3
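The steps above read like two ways of building a union R3 of R1 and R2, the second one using an index to avoid duplicates. Assuming that reading, the indexed version might be sketched as follows; a Python set stands in for the index, which is an assumption about how the index is built.

```python
def union_with_index(r1, r2):
    """Union of two relations (lists of tuples, each without internal duplicates)."""
    r3 = list(r1)               # copy R1 to R3: O(N)
    index = set(r1)             # hash index over R1: O(1) average lookup
    for t in r2:                # one pass over R2: O(M)
        if t not in index:      # skip R2 tuples that equal some R1 tuple
            r3.append(t)
    return r3

print(union_with_index([(1,), (2,)], [(2,), (3,)]))
```

The total work is then O(N + M) on average, instead of comparing every R2 tuple against every R1 tuple.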
Implementing Projection
sort the result and remove consecutive tuples that are equal
o requires time O(N log N) where N is the size of the original relation
implement the result as a set
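Both ways of removing duplicates after a projection can be sketched in Python. The function names and the positional representation of domains are assumptions made for this example.

```python
def project_sorted(relation, positions):
    """Project, sort, then drop consecutive equal tuples: O(N log N)."""
    projected = sorted(tuple(t[i] for i in positions) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:   # equal tuples are adjacent after sorting
            result.append(t)
    return result

def project_set(relation, positions):
    """Project and let a set discard the duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}

rows = [("coke", 633), ("bass", 633), ("tab", 628)]
print(project_sorted(rows, [1]))
print(project_set(rows, [1]))
```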
Implementing Selection
A nested-loop join on relations R1 (with N domains) and R2 (with M domains) considers all
|R1| × |R2| pairs of tuples.
R3= join(R1,Ai,R2,Bj)
Index Join
An index join exploits the existence of an index for one of the domains used in the join to find
matching tuples more quickly.
R3= join(R1,Ai,R2,Bj)
We could choose to use an index for R2, and reverse the order of the loops.
The decision on which index to use depends on the number of tuples in each relation.
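An index join might look like the sketch below. It assumes a hash index (a Python dictionary) built on the join domain of R1 and a single loop over R2; tuples are plain Python tuples and the domains Ai and Bj are given as positions. None of these choices come from the notes themselves.

```python
from collections import defaultdict

def index_join(r1, i, r2, j):
    """join(R1,Ai,R2,Bj) using a hash index on position i of R1."""
    index = defaultdict(list)
    for t1 in r1:                          # build the index once
        index[t1[i]].append(t1)
    result = []
    for t2 in r2:                          # a single pass over R2
        for t1 in index.get(t2[j], []):    # only the matching R1 tuples
            result.append(t1 + t2[:j] + t2[j + 1:])  # drop the duplicate domain
    return result

R1 = [("coke", 633), ("bass", 633), ("bashful", 633), ("tab", 628), ("crush", 628)]
R2 = [("chaucer", 737), ("keats", 706), ("poe", 707), ("dali", 737), ("uglab", 633)]
print(index_join(R1, 1, R2, 1))
```

Swapping the roles of the two relations (index R2, loop over R1) gives the same set of result tuples.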
Sort Join
If we don't have an index for a domain in the join, we can still improve on the nested-loop join
using sort join.
R3= join(R1,Ai,R2,Bj)
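A sort join can be sketched as a sort followed by a merge. The version below is one possible illustration: it sorts both relations on the join domain and then walks through them in step, handling runs of equal values on both sides. The function and variable names are assumptions.

```python
def sort_join(r1, i, r2, j):
    """join(R1,Ai,R2,Bj) by sorting both relations and merging them."""
    a = sorted(r1, key=lambda t: t[i])
    b = sorted(r2, key=lambda t: t[j])
    result = []
    x = y = 0
    while x < len(a) and y < len(b):
        if a[x][i] < b[y][j]:
            x += 1
        elif a[x][i] > b[y][j]:
            y += 1
        else:
            key = a[x][i]
            y_end = y                       # find the run of equal keys in b
            while y_end < len(b) and b[y_end][j] == key:
                y_end += 1
            while x < len(a) and a[x][i] == key:
                for t2 in b[y:y_end]:
                    result.append(a[x] + t2[:j] + t2[j + 1:])
                x += 1
            y = y_end
    return result

R1 = [("coke", 633), ("bass", 633), ("bashful", 633), ("tab", 628), ("crush", 628)]
R2 = [("chaucer", 737), ("keats", 706), ("poe", 707), ("dali", 737), ("uglab", 633)]
print(sort_join(R1, 1, R2, 1))
```

The two sorts dominate the cost, so the join runs in O(N log N + M log M) plus the size of the output.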
Assumptions
Comparison
Expression simplification is an important query optimization technique, which can affect the
running time of queries by an order of magnitude or more.
Nonassociativity
Commutativity
select(select(R1,P1),P2) = select(select(R1,P2),P1)
Selection pushing
if P contains attributes of R
select(join(R,Ai,S,Bj),P) = join(select(R,P),Ai,S,Bj)
if P contains attributes of S
select(join(R,Ai,S,Bj),P) = join(R,Ai,select(S,P),Bj)
select(R,P) = select(select(R,A),B)
select(R,P) = select(select(R,B),A)
CSG: Course-StudentID-Grade
SNAP: StudentID-Name-Address-Phone
CDH: Course-Day-Hour
CR: Course-Room
We can use a brute-force approach that joins all the data in the relations into a single large
relation, selects those tuples that meet the query criteria, and then isolates the answer field using
projection.
R1 = join(CSG,SNAP)
R2 = join(R1,CDH)
R3 = join(R2,CR)
R4 = select(R3,P)
R5 = project(R4,Room)
project(select(join(join(join(CSG,SNAP),CDH),CR),P),Room)
The selection uses only Name, Day, and Hour attributes (and not Course or Room), so we can
push the selection inside the outermost join.
R1 = join(CSG,SNAP)
R2 = join(R1,CDH)
R3 = select(R2,P)
R4 = join(R3,CR)
R5 = project(R4,Room)
We cannot push selection further, because the predicate involves attributes from both operands
of the next innermost join (R1,CDH).
We can split the selection into two, one based on Name, and the other based on Day-Hour.
R1 = join(CSG,SNAP)
R2 = join(R1,CDH)
R3 = select(R2,Day="Monday" and Hour="Noon")
R4 = select(R3,Name="Amy")
R5 = join(R4,CR)
R6 = project(R5,Room)
Now we can push the first selection inside the join, since it involves only attributes from the
CDH relation.
R1 = join(CSG,SNAP)
R2 = select(CDH,Day="Monday" and Hour="Noon")
R3 = join(R1,R2)
R4 = select(R3,Name="Amy")
R5 = join(R4,CR)
R6 = project(R5,Room)
Similarly, we can push the second selection inside the preceding join, since it involves only
attributes from R1 (i.e., Name).
R1 = join(CSG,SNAP)
R2 = select(CDH,Day="Monday" and Hour="Noon")
R3 = select(R1,Name="Amy")
R4 = join(R2,R3)
R5 = join(R4,CR)
R6 = project(R5,Room)
R1 = select(SNAP,Name="Amy")
R2 = join(CSG,R1)
R3 = select(CDH,Day="Monday" and Hour="Noon")
R4 = join(R2,R3)
R5 = join(R4,CR)
R6 = project(R5,Room)
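The claim that pushing selections does not change the answer can be checked on a small example. The sketch below uses made-up rows for CSG, SNAP, CDH and CR (none of this data comes from the notes) and small helper functions that are assumptions for illustration; it evaluates the brute-force plan and the fully pushed plan and compares the results.

```python
def natural_join(r, s):
    """Natural join of two lists of dicts on all attribute names they share."""
    result = []
    for t in r:
        for u in s:
            shared = set(t) & set(u)
            if all(t[a] == u[a] for a in shared):
                result.append({**t, **u})
    return result

def select(r, pred):
    return [t for t in r if pred(t)]

def project(r, *attrs):
    return {tuple(t[a] for a in attrs) for t in r}

# Hypothetical sample rows.
CSG = [{"Course": "CS101", "StudentID": 1, "Grade": "A"},
       {"Course": "EE200", "StudentID": 2, "Grade": "B"}]
SNAP = [{"StudentID": 1, "Name": "Amy", "Address": "1 First St", "Phone": "555-0001"},
        {"StudentID": 2, "Name": "Bob", "Address": "2 Second St", "Phone": "555-0002"}]
CDH = [{"Course": "CS101", "Day": "Monday", "Hour": "Noon"},
       {"Course": "EE200", "Day": "Monday", "Hour": "Noon"}]
CR = [{"Course": "CS101", "Room": "R101"}, {"Course": "EE200", "Room": "R202"}]

def is_monday_noon(t):
    return t["Day"] == "Monday" and t["Hour"] == "Noon"

# Brute force: join everything, then select, then project.
everything = natural_join(natural_join(natural_join(CSG, SNAP), CDH), CR)
brute = project(select(everything, lambda t: t["Name"] == "Amy" and is_monday_noon(t)), "Room")

# Selections pushed down to SNAP and CDH before any join.
R1 = select(SNAP, lambda t: t["Name"] == "Amy")
R2 = natural_join(CSG, R1)
R3 = select(CDH, is_monday_noon)
R4 = natural_join(R2, R3)
R5 = natural_join(R4, CR)
pushed = project(R5, "Room")

print(brute, pushed)
```

Both plans give the same relation, but the pushed plan joins far fewer tuples because SNAP and CDH are filtered before they take part in any join.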
Projection pushing
To push a projection operation inside a join requires that the result of the projection contain the
attributes used in the join.
project(join(R,Ai,S,Bj),D1,D2,...Dn)
In this case, we know that the domains in the projection will exist in the relation that results from
the join.
we should only project on those domains that exist in each of the two relations
we must ensure that the join domains Ai and Bj exist in the resulting two relations
R1 = project(R,PDR)
R2 = project(S,PDS)
R3 = join(R1,Ai,R2,Bj) = project(join(R,Ai,S,Bj),D1,D2,...Dn)
R1 = select(SNAP,Name="Amy")
R2 = join(CSG,R1)
R3 = select(CDH,Day="Monday" and Hour="Noon")
R4 = join(R2,R3)
R5 = join(R4,CR)
R6 = project(R5,Room)
This approach carries along unnecessary attributes every step of the way.
R1 = select(SNAP,Name="Amy")
R2 = join(CSG,R1)
R3 = select(CDH,Day="Monday" and Hour="Noon")
R4 = join(R2,R3)
R5 = project(CR, Course, Room)
R6 = project(R4, Course)
R7 = join(R5,R6)
R8 = project(R7,Room)
Note that R5 is unnecessary, since the domains in the projection are all the domains of CR.
R1 = select(SNAP,Name="Amy")
R2 = join(CSG,R1)
R3 = select(CDH,Day="Monday" and Hour="Noon")
R4 = join(R2,R3)
R5 = project(R4, Course)
R6 = join(CR,R5)
R7 = project(R6,Room)
We can continue pushing the projection on Course below the join for R4.
R1 = select(SNAP,Name="Amy")
R2 = join(CSG,R1)
R3 = select(CDH,Day="Monday" and Hour="Noon")
R4 = project(R2,Course)
R5 = project(R3,Course)
R6 = join(R4,R5)
R7 = join(CR,R6)
R8 = project(R7,Room)
We can continue pushing the projection on Course for R4 below the join for R2.
R1 = select(SNAP,Name="Amy")
R2 = project(CSG,Course,StudentID)
R3 = project(R1,StudentID)
R4 = join(R2,R3)
R5 = project(R4,Course)
R6 = select(CDH,Day="Monday" and Hour="Noon")
R7 = project(R6,Course)
R8 = join(R5,R7)
R9 = join(CR,R8)
R10 = project(R9,Room)
Relational calculus
An operational methodology, founded on predicate calculus, dealing with descriptive expressions
that are equivalent to the operations of relational algebra. Codd's reduction algorithm can convert
from relational calculus to relational algebra.
Two forms of the relational calculus exist: the tuple calculus and the domain calculus.
Predicate calculus is the system of symbolic logic concerned not only with relations between
propositions as wholes but also with the representation by symbols of individuals and predicates in
propositions, and with quantification over individuals. It is also called functional calculus.
In contrast with relational algebra, relational calculus is a non-procedural query language: it
states what result is wanted but not how to compute it.
Relational calculus exists in two forms:
1. The tuple relational calculus is a nonprocedural language. (The relational algebra was
procedural.)
For Example:
{ T.name | Author(T) AND T.article = 'database' }
Output: returns the 'name' of each tuple in Author whose article is 'database'.
TRC expressions can also be quantified, using the existential (∃) and universal (∀) quantifiers.
For example:
{ R | ∃T ∈ Author (T.article = 'database' AND R.name = T.name) }
Output: the query yields the same result as the previous one.
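The two tuple calculus queries map closely onto Python set comprehensions, which also say what is wanted rather than how to find it. The Author rows below are invented for illustration; the existential quantifier is expressed with any().

```python
# Hypothetical Author relation.
Author = [
    {"name": "Asha", "article": "database"},
    {"name": "Ben", "article": "networks"},
]

# { T.name | Author(T) AND T.article = 'database' }
names = {T["name"] for T in Author if T["article"] == "database"}

# { R | there exists T in Author (T.article = 'database' AND R.name = T.name) }
candidates = {T["name"] for T in Author}
names_exists = {
    r for r in candidates
    if any(T["article"] == "database" and T["name"] == r for T in Author)
}

print(names, names_exists)
```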
Domain variables take on values from an attribute's domain, rather than values for an entire tuple.
A formal query in DRC is expressed as {<x1, x2 , …, xn> | P(x1, x2 , …, xn )} where x1, x2 , …, xn are
domain variables and P(x1, x2 , …, xn) is a formula involving those variables.
As in the TRC, a formula in the Domain Relational Calculus is built up from atoms. An atom in
the Domain Relational Calculus has one of the following forms:
The following rules are used to build up a Domain Relational Calculus formula from atoms:
An atom is a formula.
If P is a formula, then so are ¬P and (P).
If P1 and P2 are formulae, then so are P1 ∧ P2, P1 ∨ P2 and P1 ⇒ P2.
If P(x) is a formula in x, where x is a domain variable, then ∃x (P(x)) and ∀x (P(x)) are also
formulae.
where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example: {< article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject = 'database'}
Output: yields Article, Page and Subject from relation TutorialsPoint where Subject is 'database'.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also
involves relational operators.
The expressive power of tuple relational calculus and domain relational calculus is equivalent to
that of relational algebra.
Example queries
Query 2: Find the name and address of employees who work for department number 1.
Query 3: Find the name of the department that employee John works for.
{ < dn > ∣ ∃ di,o,m ( < di,dn,o,m > ∈ DEPARTMENT ∧ ∃ id,n,b,a,s,d ( < id,n,b,a,s,d > ∈ EMPLOYEE ∧ ( n = 'John' ∧ d = di ) ) ) }
Query 4: Find the SSN and start date of the employees who work for project number P1 or project
number P2.
Query 5: Find the name and relationship of all the dependents of employees who work for the
Human Resource department.
Query 6: Find the names of the employees who have joined every project.
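Query 3 can be mirrored in Python, with one variable per domain instead of one per tuple. The sketch assumes DEPARTMENT tuples have the form <di, dn, o, m> and EMPLOYEE tuples the form <id, n, b, a, s, d>, as in the calculus expression; the rows and the values given to o, m, b, a and s are placeholders invented for this example.

```python
# Hypothetical rows; only di, dn, n and d matter for this query.
DEPARTMENT = [
    (1, "Research", "o1", "m1"),
    (2, "Sales", "o2", "m2"),
]
EMPLOYEE = [
    (10, "John", "b1", "a1", "s1", 1),
    (11, "Mary", "b2", "a2", "s2", 2),
]

# { <dn> | exists di,o,m (<di,dn,o,m> in DEPARTMENT and
#          exists id,n,b,a,s,d (<id,n,b,a,s,d> in EMPLOYEE and n = 'John' and d = di)) }
result = {
    dn
    for (di, dn, o, m) in DEPARTMENT
    if any(n == "John" and d == di for (id_, n, b, a, s, d) in EMPLOYEE)
}
print(result)
```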
In the Domain Relational Calculus, we must be careful about the formulae within "there exists"
and "for all". Consider the expression {x ∣ ∃y(<x,y> ∈ r) ∧ ∃z(¬(<x,z> ∈ r) ∧ P(x,z))}. In the second part of
this expression, ∃z(¬(<x,z> ∈ r) ∧ P(x,z)), we need to consider values for z that do not appear in r.
This set of values is in general infinite. This case does not arise in the Tuple Relational
Calculus because in TRC the universal and existential quantifiers are applied to variables that
already range over a specific relation (in the form ∃t ∈ r (P(t)) and ∀t ∈ r (P(t))).
The Domain Relational Calculus, restricted to safe expressions, is equivalent to relational algebra
in terms of expressive power.
The three schema approach to software engineering uses three levels of ER models that may be
developed.
model must contain enough detail to produce a database and each physical ER model is
technology dependent since each database management system is somewhat different.
The physical model is normally instantiated in the structural metadata of a database
management system as relational database objects such as database tables, database indexes
such as unique key indexes, and database constraints such as a foreign key constraint or a
commonality constraint. The ER model is also normally used to design modifications to the
relational database objects and to maintain the structural metadata of the database.
Data Models
Before you look at specific symbols, it's important to understand the various levels of ERDs.
There are several ways to model entity-relationship diagrams. The most high-level type is a
conceptual data model; the next highest is the logical data model, and the lowest-level (and
therefore most detailed) type is the physical data model. Consult the chart below to see which
elements are covered in each data model.
Conceptual ERDs can be used as the foundation for logical data models. They may also be used
to form commonality relationships between ER models as a basis for data model integration.
Drawing ERD
ENTITIES
Entities are objects or concepts that represent important data. They are typically nouns, e.g.
customer, supervisor, location, or promotion.
Strong entities exist independently from other entity types. They always possess one or more
attributes that uniquely distinguish each occurrence of the entity.
Weak entities depend on some other entity type. They don't possess unique attributes (also
known as a primary key) and have no meaning in the diagram without depending on another
entity. This other entity is known as the owner.
Associative entities are entities that associate the instances of one or more entity types. They
also contain attributes that are unique to the relationship between those entity instances.
RELATIONSHIPS
Relationships are meaningful associations between or among entities. They are usually verbs,
e.g. assign, associate, or track. A relationship provides useful information that could not be
discerned with just the entity types.
Weak relationships, or identifying relationships, are connections that exist between a weak
entity type and its owner.
ATTRIBUTES
Attributes are characteristics of either an entity, a many-to-many relationship, or a one-to-one
relationship.
Multivalued attributes are those that are capable of taking on more than one value.
Derived attributes are attributes whose value can be calculated from related attribute values.
ERD Notation
Examples
CHAPTER 6: NORMALIZATION
Introduction and meaning of Database normalization
Database normalization is the process of organizing the fields and tables of a relational database to
minimize redundancy. Normalization usually involves dividing large tables into smaller (and less
redundant) tables and defining relationships between them.
Normalization of data can also be defined as a process in which the existing tables of a database
are tested for data dependencies between columns and rows. Equivalently, it is a formal technique
for turning preliminary data structures into data structures that are efficient and easy to
maintain.
When a dependency is detected in a table, the table is restructured into multiple
tables (typically two) to eliminate the column dependency. If a data dependency is still
exhibited, the process is repeated until all such dependencies are eliminated. The process of eliminating
data redundancy is based on the theory of functional dependency.
Importance of normalization
It highlights constraints and dependencies in the data and hence aids understanding of the nature
of the data.
Normalization controls data redundancy to reduce storage requirements and standard
maintenance.
Normalization provides unique identification for records in a database.
Each stage of the normalization process eliminates a particular type of undesirable dependency.
Normalization permits simple data retrieval in response to reports and queries.
The third normal form produces a well-designed database which provides a higher degree
of independence.
Normalization helps define efficient data structures.
Normalized data structures are used for file and database design.
Normalization eliminates unnecessary dependency relationships within a database file.
1. Extra storage space: storing the same data in many places takes large amount of disk space.
2. Entering same data more than once during data insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification etc. are not done
properly. This creates inconsistency and unreliability in the database.
To solve this problem, the "raw" database needs to be normalized. This is a step-by-step process
that removes a different kind of redundancy and anomaly at each step. At each step a specific rule
is followed to remove a specific kind of impurity, in order to give the database a slim and clean
look.
In the sample table above, there are multiple occurrences of rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot give us the unique identification
facility for any single row. Further, each primary key points to a variable length record (3 for
E01, 2 for E02 and 4 for E03).
As you can see now, each row contains a unique combination of values. Unlike in UNF, this
relation contains only atomic values, i.e. the rows cannot be further decomposed, so the relation
is now in 1NF.
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and
Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which
is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion can be
moved into another related relation, the table would come to 2NF.
Bank-Id Bank-Name
B01 SBI
B02 UTI
After moving the portion into another relation, we store a smaller amount of data in two relations
without any loss of information. There is also a significant reduction in redundancy.
Such derived dependencies hold well in most of the situations. For example if we have
Roll → Marks
And
Marks → Grade
Then we may safely derive
Roll → Grade.
This third dependency was not originally specified but we have derived it.
The derived dependency is called a transitive dependency when such dependency becomes
improbable. For example we have been given
Roll → City
And
City → STDCode
If we try to derive Roll → STDCode it becomes a transitive dependency, because obviously the
STDCode of a city cannot depend on the roll number issued by a school or college. In such a
case the relation should be broken into two, each containing one of these two dependencies:
Roll → City
And
City → STD code
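A functional dependency can be tested mechanically: X → Y holds if no two rows agree on X but differ on Y. The sketch below checks the two dependencies on a small made-up student table and then splits the table into the two relations described above. The rows, city names, codes and the helper function are all assumptions for illustration.

```python
def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

# Hypothetical data.
students = [
    {"Roll": 1, "City": "Alpha", "STDCode": "011"},
    {"Roll": 2, "City": "Beta", "STDCode": "022"},
    {"Roll": 3, "City": "Alpha", "STDCode": "011"},
]

print(holds(students, "Roll", "City"), holds(students, "City", "STDCode"))

# Decompose into one relation per dependency.
roll_city = {(r["Roll"], r["City"]) for r in students}
city_std = {(r["City"], r["STDCode"]) for r in students}
print(sorted(roll_city))
print(sorted(city_std))
```

After the split, the code for each city is stored once in city_std instead of once per student.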
could be same situation when a 3NF relation may not be in BCNF the following conditions are
found true.
The relation diagram for the above relation is given as the following:
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information
that Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept.
and deleting Head of Dept. from the given relation. The normalized relations are shown in the
following.
Department    Head of Dept.
Physics       Ghosh
Mathematics   Krishnan
Chemistry     Rao
See the dependency diagrams for these new relations.
A multi-valued dependency is a typical kind of dependency in which each and every attribute
within a relation depends upon the other, yet none of them is a unique primary key.
We will illustrate this with an example. Consider a vendor supplying many items to many
projects in an organization. The following are the assumptions:
A multi valued dependency exists here because all the attributes depend upon the other and yet
none of them is a primary key having unique value.
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank
for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned
above. The problem is reduced by expressing this relation as two relations in the Fourth Normal
Form (4NF). A relation is in 4NF if it has no more than one independent multi valued
dependency or one independent multi valued dependency with a functional dependency.
The table can be expressed as the two 4NF relations given below. The fact that vendors are
capable of supplying certain items and the fact that they are assigned to supply some projects are
specified independently in the 4NF relations.
Vendor-Supply    Vendor-Project
V1  I1           V1  P1
V1  I2           V1  P3
V2  I2           V2  P1
V2  I3           V3  P2
V3  I1
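Because the two facts are stored independently, every vendor/item/project combination can be regenerated by joining the two relations on the vendor code. The short sketch below does this with the rows of Vendor-Supply and Vendor-Project; the variable names are assumptions.

```python
vendor_supply = [("V1", "I1"), ("V1", "I2"), ("V2", "I2"), ("V2", "I3"), ("V3", "I1")]
vendor_project = [("V1", "P1"), ("V1", "P3"), ("V2", "P1"), ("V3", "P2")]

# Join on the vendor code: each item a vendor supplies is paired
# with each project the vendor is assigned to.
combined = [
    (vendor, item, project)
    for (vendor, item) in vendor_supply
    for (v, project) in vendor_project
    if v == vendor
]
for row in combined:
    print(row)
```

The two 4NF relations hold nine rows of two columns each, while the joined relation needs seven rows of three columns and repeats each vendor's items once per project.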
Multi-platform Support - Supports all major DBMSs from a single interface. Ability to use the tools
on all supported platforms from a single license.
Code Templates - Eliminates the need to memorize and type SQL syntax.
Context-sensitive DBMS Actions - DBMS actions, such as Extract and Drop, are available directly in the
context menu of the appropriate tokens in the SQL editor.
Developer Features
Context-sensitive DBMS Actions - DBMS actions, such as Extract and Drop, are available directly in the
context menu of the appropriate tokens in the SQL editor.
SQL Debugging - Debug Java, step seamlessly into SQL (i.e. stored procedure) and back
into Java again – true system-wide, round-trip debugging.
Data Governance*
Inline Metadata - Gives users real-time metadata visibility in the SQL IDE, so that they gain
valuable context in SQL query development with awareness of sensitive
data. Examples of metadata attributes are: table descriptions, PII, data
governance policy information, etc.
Centralized Datasource Repository - Provides the functionality for users to work from a common,
centralized datasource repository.
SQL commands can be used not only for searching the database but also for various other
functions: for example, you can create tables, add data to tables, modify data, drop
tables, and set permissions for users.
SQL commands are grouped into four major categories depending on their functionality:
Data Definition Language (DDL) - These SQL commands are used for creating, modifying,
and dropping the structure of database objects. The commands are CREATE, ALTER,
DROP, RENAME, and TRUNCATE.
Data Manipulation Language (DML) - These SQL commands are used for storing,
retrieving, modifying, and deleting data.
These Data Manipulation Language commands are: SELECT, INSERT, UPDATE, and
DELETE.
Transaction Control Language (TCL) - These SQL commands are used for managing
changes affecting the data. These commands are COMMIT, ROLLBACK, and SAVEPOINT.
Data Control Language (DCL) - These SQL commands are used for providing security to
database objects. These commands are GRANT and REVOKE.
Now we want to create a table called "Persons" that contains five columns: P_Id, LastName,
FirstName, Address, and City.
The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and
City columns are of type varchar with a maximum length of 255 characters.
The empty table can be filled with data with the INSERT INTO statement.
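A CREATE TABLE statement matching this description can be tried from Python. The sketch uses Python's built-in sqlite3 module with an in-memory database instead of a MySQL server, which is an assumption made so that the example runs without any installation; the column names and types follow the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary database, discarded on exit
conn.execute("""
    CREATE TABLE Persons (
        P_Id      int,
        LastName  varchar(255),
        FirstName varchar(255),
        Address   varchar(255),
        City      varchar(255)
    )
""")

# List the column names of the new, still empty, table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Persons)")]
print(columns)
```

PRAGMA table_info is specific to SQLite; in MySQL the describe command plays the same role.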
Describe command
The describe command gives you a list of all the data fields used in your database table. In the
example, you can see that the table named test in the sales data database keeps track of four
fields: name, description, num, and date_modified.
+---------------+--------------+------+-----+------------+----------------+
+---------------+--------------+------+-----+------------+----------------+
The first form doesn't specify the column names where the data will be inserted, only their
values:
The second form specifies both the column names and the values to be inserted:
The following SQL statement will add a new row, but only add data in the "P_Id", "LastName"
and the "FirstName" columns:
5 Tjessem Jakob
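Both forms of INSERT INTO can be tried with Python's built-in sqlite3 module (used here in place of MySQL so that the sketch runs without a server). The first row is invented for illustration; the second is the "Tjessem, Jakob" row described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Persons (P_Id int, LastName varchar(255),
                FirstName varchar(255), Address varchar(255), City varchar(255))""")

# First form: values only, listed in the order of the table's columns.
conn.execute("INSERT INTO Persons VALUES (1, 'Doe', 'Jane', 'Example Road 1', 'Sampletown')")

# Second form: column names and values; the unnamed columns stay empty (NULL).
conn.execute("INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (5, 'Tjessem', 'Jakob')")

rows = conn.execute("SELECT * FROM Persons ORDER BY P_Id").fetchall()
for row in rows:
    print(row)
```

The Address and City of the second row come back as None, which is how Python shows a NULL value.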
UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value
Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause specifies which
record or records should be updated. If you omit the WHERE clause, all records will be
updated!
5 Tjessem Jakob
Now we want to update the person "Tjessem, Jakob" in the "Persons" table.
UPDATE Persons
SET Address='Nissestien 67', City='Sandnes'
WHERE LastName='Tjessem' AND FirstName='Jakob'
Be careful when updating records. If we had omitted the WHERE clause in the example above,
like this:
UPDATE Persons
SET Address='Nissestien 67', City='Sandnes'
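The difference between the two UPDATE statements can be seen by counting the rows each one changes. The sketch below uses Python's built-in sqlite3 module instead of MySQL (an assumption made so it runs anywhere) and a two-row Persons table in which the first row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Persons (P_Id int, LastName varchar(255),
                FirstName varchar(255), Address varchar(255), City varchar(255))""")
conn.execute("INSERT INTO Persons VALUES (1, 'Doe', 'Jane', 'Example Road 1', 'Sampletown')")
conn.execute("INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (5, 'Tjessem', 'Jakob')")

# With WHERE: only the matching person is changed.
cur = conn.execute("""UPDATE Persons SET Address='Nissestien 67', City='Sandnes'
                      WHERE LastName='Tjessem' AND FirstName='Jakob'""")
updated_with_where = cur.rowcount

# Without WHERE: every row in the table is overwritten.
cur = conn.execute("UPDATE Persons SET Address='Nissestien 67', City='Sandnes'")
updated_without_where = cur.rowcount

cities = [row[0] for row in conn.execute("SELECT City FROM Persons ORDER BY P_Id")]
print(updated_with_where, updated_without_where, cities)
```

After the second statement, the invented person has also lost the original address and city.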
3. Switch to a database.
6. To delete a db.
7. To delete a table.
9. Returns the columns and column information pertaining to the designated table.
11. Show all records containing the name "Bob" AND the phone number '3444444'.
mysql> SELECT * FROM [table name] WHERE name = "Bob" AND phone_number =
'3444444';
12. Show all records where the name is not "Bob" and the phone number is '3444444', ordered
by the phone_number field.
mysql> SELECT * FROM [table name] WHERE name != "Bob" AND phone_number =
'3444444' order by phone_number;
13. Show all records where the name starts with the letters 'Bob' and the phone number is '3444444'.
mysql> SELECT * FROM [table name] WHERE name like "Bob%" AND phone_number =
'3444444';
14. Show all records where the name starts with the letters 'Bob' and the phone number is '3444444',
limited to records 1 through 5. (In MySQL the first number after limit is an offset that starts at 0.)
mysql> SELECT * FROM [table name] WHERE name like "Bob%" AND phone_number =
'3444444' limit 0,5;
15. Use a regular expression to find records. Use "REGEXP BINARY" to force case-sensitivity.
This finds any record beginning with a.
mysql> alter table [table name] add column [new column name] varchar (20);
mysql> alter table [table name] change [old column name] [new column name] varchar (50);
mysql> create table [table name] (personid int(50) not null auto_increment primary
key,firstname varchar(35),middlename varchar(50),lastname varchar(50) default 'bato');
A database management system (DBMS) is a collection of programs that enables you to store,
modify, and extract information from a database. There are many different types of DBMSs,
ranging from small systems that run on personal computers to huge systems that run on
mainframes. The following are examples of database applications:
Importance of DBMS:
A database management system is important because it manages data efficiently and allows
users to perform multiple tasks with ease.
A database management system stores, organizes and manages a large amount of information
within a single software application. Use of this system increases efficiency of business
operations and reduces overall costs.
Handling multiple types of data. Some of the data that are easily managed with this type of
system include: employee records, student information, payroll, accounting, project
management, inventory and library books. These systems are built to be extremely versatile.
DBMS Functions
There are several functions that a DBMS performs to ensure data integrity and consistency of
data in the database. The ten functions in the DBMS are: data dictionary management, data
storage management, data transformation and presentation, security management, multiuser
access control, backup and recovery management, data integrity management, database access
languages and application programming interfaces, database communication interfaces, and
transaction management.
1. Data Dictionary Management
Data Dictionary is where the DBMS stores definitions of the data elements and their
relationships (metadata). The DBMS uses this function to look up the required data component
structures and relationships. When programs access data in a database they are basically going
through the DBMS. This function removes structural and data dependency and provides the user
with data abstraction. In turn, this makes things a lot easier on the end user. The Data Dictionary
is often hidden from the user and is used by Database Administrators and Programmers.
4. Security Management
This is one of the most important functions in the DBMS. Security management sets rules that
determine which users are allowed to access the database. Users are authenticated with a username
and password, or sometimes through biometric authentication (such as a fingerprint or retina scan),
although these types of authentication tend to be more costly. This function also sets restrictions
on what specific data any user can see or manage.
A – Atomicity: a transaction is an indivisible unit that is either performed in its entirety
or not performed at all. It is the responsibility of recovery management to make
sure this takes place.
C – Consistency: a transaction must take the database from one consistent state to another
consistent state.
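Atomicity and consistency can be observed in a small experiment. The sketch below uses Python's built-in sqlite3 module; the account table, its rows and the rule that a balance may not go negative are assumptions invented for the example. A transfer of 200 from A to B fails halfway, and the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
                    name    TEXT PRIMARY KEY,
                    balance INTEGER CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    with conn:   # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE account SET balance = balance + 200 WHERE name = 'B'")
        # A only has 100, so this step violates the CHECK rule and fails.
        conn.execute("UPDATE account SET balance = balance - 200 WHERE name = 'A'")
except sqlite3.IntegrityError:
    pass         # the credit to B has been undone together with the failed debit

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

Neither half of the transfer survives, so the database stays in the consistent state it was in before the transaction started.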
Concurrency: concurrent access (meaning 'at the same time') to the same database by
multiple users
Security: security rules to determine access rights of users
Backup and recovery: processes to back-up the data regularly and recover data if a
problem occurs
Integrity: database structure and rules improve the integrity of the data
Data descriptions: a data dictionary provides a description of the data
3. Automated management
Automating database management is another emerging trend. This set of techniques and
tools is intended to simplify maintenance, patching, provisioning, updates and upgrades, and even
project workflow. However, the trend may have limited usefulness since database management
frequently needs human intervention.
5. In-memory databases
Within the data warehousing community there are similar questions about columnar versus
row-based relational tables, the rise of in-memory databases, the use of flash or solid-state disks
(which also applies within transaction processing), clustered versus non-clustered solutions and so
on.
6. Big Data
To be clear, big data does not necessarily mean lots of data. What it really refers to is the ability
to process any type of data: what is typically referred to as semi-structured and unstructured data
as well as structured data. Current thinking is that these will typically live alongside conventional
solutions as separate technologies, at least in large organisations, but this will not always be the
case.
Integrating Trends
Projects involving databases should not be viewed and appreciated solely on how they adhere to
these trends. Ideally, each tool or process available should merge in some meaningful way with
existing operations. It is important to look at these trends as items that can coincide: for example,
enhancing security and moving to the cloud can coexist.
Thirty years after the computerization of databases, the Internet has led to exponential
growth within the industry: whether indirectly or directly, everything that compiles data uses a
database. Recent times have been an exceptional period for the production and capture
of a nearly overwhelming amount of data. This has created opportunities for
businesses to gain visibility into their customers and industry, but it has also created many
challenges in database management.
Data Integration from Various Sources – With the advancement of smartphones, new mobile
applications, and the Internet of Things, businesses must be able to have their data adapt
accordingly. These varying types of data and sources cause a typical data center of today to
contain a patchwork of data management technologies. The management techniques have
become more diverse than ever.
Public and Private Data Security – In today’s digital world, security is the most prevalent
concern. Businesses must be able to ensure that every bit of their data remains safe and at
limited risk of exposure from hackers or leaks. Database breaches of highly sensitive information
have destroyed the reputations of businesses. It is up to the manager of the database to
ensure that the data is fully secured at all times.
The Management of Cloud-Based Databases – In recent years, the cloud has become one of the
biggest terms in the tech community. Both businesses and consumers want to be able to access
their data from a cloud database, or from a cloud database provider’s servers, in addition
to the standard on-premises mode of deployment. Cloud computing enables users to allocate
resources effectively, optimize scaling, and achieve high availability. Handling databases that run
both in the cloud and on-premises is yet another challenge for database managers.
The Growth of Structured and Unstructured Data – The amount of data that is being both
created and collected has been growing at an unprecedented rate for years. Those who deal
with analytics may be excited by the promise of insight and business intelligence that comes
from big data, but those who manage databases face the challenges that come along with
managing overall growth and data types from an increasing number of database platforms.
Data Strategy
o What kind of data is important and what kind of performance should be achieved?
What data needs to be protected and what should be analyzed?
o How much historical data must be accumulated? What does this mean for capacity
planning and disk space?
o Can you monetize your data? Which data needs to be aggregated or correlated to
provide the necessary insights into the business?
Database Support
o You must consider that moving to the cloud does not guarantee data backup and
security. This is something that must still be managed with 24/7 monitoring and
coverage.
o Are the right personnel members with the necessary skill sets always available?
Backup Strategy
o Do you have the right kind of backup retention available?
o Have you determined the necessary backup frequency to determine the Recovery Point
Objective (RPO)?
o Have you determined the Recovery Time Objective (RTO) due to high availability
requirements?
Security Strategy
o How will external and internal security be handled? Who can access what?
o What kind of data access policies should be in place?
o How are regulatory requirements handled?
o In the event of a hack, breach, or leak, how will data exposure be handled?