Database Notes New

The document provides information about databases and database management systems. It defines what a database is, provides examples of different types of databases, and describes different database models including relational, hierarchical, and network models. It also discusses the functions of a database management system (DBMS), which includes creating, maintaining, and interfacing with the database.

Uploaded by Jeff jay
Copyright © All Rights Reserved

DATABASE NOTES

UNIT LECTURER: MR KARIUKI EDWARD


EMAIL: [email protected]

DATABASES.

What is a Database?

• It is a collection of information related to a particular subject or purpose.
• A collection of related data or information grouped together under one logical structure.
• A logical collection of related files, organized as a series of tables and treated as one entity.

Examples of databases.

You can create a database for:

• Customers’ details.
• Library records.
• Personal records.
• Flight schedules.
• Employees’ records.
• A music collection.
• An Address book (or Telephone directory), where each person has a Name, Address,
City & Telephone no.

DATABASE CONCEPTS.

Definition & Background.

A Database is a common data pool, maintained to support the various activities taking place
within an organization.

The database contents are manipulated by user programs to yield information.

The database is an organized set of data items that reduces duplication of the stored files.

INTEGRATED FILE SYSTEMS.

These refer to the traditional methods of storing files, e.g., Manual (paper) files & Flat files.
• In Integrated file systems, several independent files are maintained for the different
users’ requirements.
• The Integrated file systems have the problem of data duplication.
• In order to carry out any file processing task(s), all the related files have to be processed.
• Information that draws on several files, such as the overall state of affairs of the
system, may not be available.
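The duplication problem can be illustrated with a minimal Python sketch. It assumes two hypothetical departmental files, `sales_file` and `billing_file` (plain lists of dictionaries), that each keep their own copy of a customer’s address; updating one file leaves the other out of date.

```python
# Hypothetical departmental "files", each a list of records (dictionaries).
# Both departments keep their own copy of the same customer's address.
sales_file = [{"customer_id": 1, "name": "Alice", "address": "1 Old Road"}]
billing_file = [{"customer_id": 1, "name": "Alice", "address": "1 Old Road", "balance": 500}]


def update_address(file, customer_id, new_address):
    """Change the address in ONE file only, as a single department would."""
    for record in file:
        if record["customer_id"] == customer_id:
            record["address"] = new_address


# The sales department records the customer's move...
update_address(sales_file, 1, "2 New Road")

# ...but the billing file still holds the old address, so the duplicated data disagree.
addresses = {sales_file[0]["address"], billing_file[0]["address"]}
print(len(addresses))  # 2
```

In a database the address would be stored once and shared by every user program, so a single update would be enough.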

DATABASE MAINTENANCE.

A Database cannot be created fully at once. Its creation and maintenance are gradual and
continuous procedures. The creation & maintenance of databases is controlled by a set of
programs known as a Database Management System (DBMS).

Through the DBMS, users communicate their requirements to the database using Data
Description Languages (DDLs) & Data Manipulation Languages (DMLs).

In fact, the DBMS provides an interface between the users’ programs and the contents of the
database.

During the creation & subsequent maintenance of the database, the DDLs & DMLs are used to:

• Add new files to the database.
• Incorporate new fields into the existing records in the database.
• Delete the obsolete (outdated) records.
• Carry out adjustments on (or amend) the existing records.
• Expand the database capacity so that it can cater for growth in data volume and for
enhanced application requirements.
• Link up all the data items in the database logically.
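In most modern DBMSs, SQL plays the role of both DDL and DML. The sketch below maps the maintenance tasks listed above onto SQL statements, using Python’s built-in `sqlite3` module and an in-memory database; the `customers` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: add a new "file" (table) to the database.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# DML: enter records.
cur.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "Nairobi"), (2, "Brian", "Mombasa"), (3, "Carol", "Kisumu")],
)

# DDL: incorporate a new field into the existing records (existing rows get NULL).
cur.execute("ALTER TABLE customers ADD COLUMN phone TEXT")

# DML: amend an existing record.
cur.execute("UPDATE customers SET phone = ? WHERE id = ?", ("0700-000000", 1))

# DML: delete an obsolete record.
cur.execute("DELETE FROM customers WHERE id = ?", (3,))
conn.commit()

rows = cur.execute("SELECT id, name, city, phone FROM customers ORDER BY id").fetchall()
print(rows)
# [(1, 'Alice', 'Nairobi', '0700-000000'), (2, 'Brian', 'Mombasa', None)]
```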

Data Dictionary.

All definitions of elements in the system are described in detail in a Data dictionary.

The elements of the system that are defined are: Dataflow, Processes, and Data stores.
If a database administrator wants to know the definition of a data item name or the content of
a particular dataflow, the information should be available in the dictionary.
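Many DBMSs keep a machine-readable data dictionary of their own, usually called the system catalog. It covers only the data-store side of the dictionary described above (not dataflows or processes). As an illustration, SQLite exposes its catalog through the `sqlite_master` table and the `PRAGMA table_info` command; the `employees` table below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
objects = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(objects)  # [('table', 'employees')]

# PRAGMA table_info describes each column (data item) of a table. Each row is:
# (column id, name, declared type, NOT NULL flag, default value, primary-key position)
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in conn.execute("PRAGMA table_info(employees)")
]
for name, col_type, required, is_key in columns:
    print(f"{name:8} {col_type:8} required={required} primary_key={is_key}")
```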

Notes.

• Databases are used for several purposes, e.g., in Accounting, a database is used to maintain
the customer files.
• Database systems are installed & coordinated by a Database Administrator, who has the
overall authority to establish and control data definitions and standards.
• Database storage requires large direct-access storage (e.g., disks) maintained online.
• The database contents should be backed up after every update or maintenance run, so that
they can be restored in case of loss. The backup media to be used is chosen
by the organization.
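As one concrete example of a backup run, Python’s `sqlite3` module provides `Connection.backup()`, which copies a whole SQLite database to another connection. The sketch is illustrative only: the `accounts` table is hypothetical, and both databases are kept in memory so the example is self-contained; a real backup would target a file on the organization’s chosen media.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE accounts (acc_no INTEGER PRIMARY KEY, balance REAL)")
source.execute("INSERT INTO accounts VALUES (101, 2500.0)")
source.commit()

# After the update run, copy the whole database to a backup.
# In practice the target would be a file, e.g. sqlite3.connect("backup.db").
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate loss of the original data...
source.execute("DELETE FROM accounts")
source.commit()

# ...and restore it by copying the backup back over the damaged database.
backup.backup(source)
print(source.execute("SELECT * FROM accounts").fetchall())  # [(101, 2500.0)]
```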

Data Bank.

A Data Bank can be defined as a collection of data, usually for several users, and available to
several organizations.

A Data Bank is, therefore, a collection of databases.

Notes.

The Database is organizational, while a Data Bank is multi-organizational in use.

The Database & the Data Bank have similar construction and purpose. The only difference is
that the term Data Bank is used to describe a larger-capacity base whose contents are mostly
historical references (i.e., the Data Bank forms the basis for data or information that is
usually generated periodically). On the other hand, the contents of the Database are used
frequently to generate information that influences the decisions of the concerned organization.

TYPES OF DATABASE MODELS.

Relational database model.

A Relational database is a set of data organized into tables (called relations) whose items can be related to one another.

The data elements in a Relational database are stored or organized in tables. A Table consists
of rows & columns. Each column represents a Field, while a row represents a Record. The
records are grouped under fields.

~ A Relational database is flexible and easy to understand.

~ A Relational database system has the ability to quickly find & bring together information stored in
separate tables using queries, forms, & reports. This means that a data element in
any one table can be related to any piece of data in another table, as long as both tables share
common data elements.
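Relating tables through a shared data element can be sketched with a SQL join. This uses Python’s `sqlite3` module; the `departments` and `employees` tables and their rows are hypothetical, with `dept_id` as the common data element.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Accounts'), (2, 'Chemistry');
    INSERT INTO employees VALUES (10, 'Tom', 2), (11, 'Mary', 1), (12, 'Ann', 2);
""")

# Both tables share dept_id, so one query can bring related rows together.
query = """
    SELECT e.name, d.dept_name
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.name
"""
result = conn.execute(query).fetchall()
print(result)  # [('Ann', 'Chemistry'), ('Mary', 'Accounts'), ('Tom', 'Chemistry')]
```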

Examples of Relational database systems:

• Microsoft Access.
• FileMaker Pro.

Hierarchical database model.

It is a data structure where the data is organized like a family tree or an organization chart.

In a Hierarchical database, the records are stored in multiple levels. Units further down the
system are subordinate to the ones above.

In other words, the database has branches made up of parent and child records. Each parent
record can have multiple child records, but each child can have only one parent.
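A hierarchical structure can be sketched in Python with a dictionary that maps each record to its parent. Because a dictionary key holds only one value, the one-parent rule is enforced by the data structure itself. The organization chart is hypothetical; real hierarchical DBMSs (e.g., IBM IMS) use their own storage structures.

```python
# Each record maps to its single parent (None for the root). Hypothetical data.
parent_of = {
    "Head Office": None,
    "Finance": "Head Office",
    "Operations": "Head Office",
    "Payroll": "Finance",
    "Audit": "Finance",
    "Transport": "Operations",
}


def children(record):
    """All child records directly below the given parent record."""
    return [r for r, p in parent_of.items() if p == record]


def path_from_root(record):
    """Follow the single chain of parents up to the root, then reverse it."""
    path = []
    while record is not None:
        path.append(record)
        record = parent_of[record]
    return list(reversed(path))


print(children("Finance"))       # ['Payroll', 'Audit']
print(path_from_root("Audit"))   # ['Head Office', 'Finance', 'Audit']
```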

Components of Data hierarchy.

Databases (logical collection of related files).

Files (collection of related records).

Records (collection of related fields).

Fields (Facts, attributes – a set of related characters).


Characters (Alphabets, numbers & special characters or symbols).

Network database model.

A Network database model represents many-to-many relationships between data. It allows a
data element or record to be related to more than one other data element or record. For
example, an employee can be associated with more than one department.
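A network structure can be sketched in Python as two mappings that link records in both directions, so one employee can reach several departments and one department several employees. The data and names are hypothetical; actual network DBMSs (such as those following the CODASYL model) store these links as owner and member record sets.

```python
from collections import defaultdict

# Hypothetical employee-department assignments; each pair is one link.
assignments = [
    ("Tom", "Research"),
    ("Tom", "Sales"),
    ("Mary", "Sales"),
    ("Ann", "Research"),
]

# Record the links in both directions so either record type can be navigated.
departments_of = defaultdict(set)
employees_of = defaultdict(set)
for employee, department in assignments:
    departments_of[employee].add(department)
    employees_of[department].add(employee)

print(sorted(departments_of["Tom"]))   # ['Research', 'Sales']
print(sorted(employees_of["Sales"]))   # ['Mary', 'Tom']
```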

DATABASE MANAGEMENT SYSTEMS (DBMS).

• These are programs used to store & manage files or records containing related
information.
• A collection of programs required to store & retrieve data from a database.
• A DBMS is a tool that allows one to create, maintain, update and store the data within a
database.

A DBMS is complex software that creates, expands & maintains the database; it also
provides the interface between the user and the data in the database.

A DBMS enables the user to create lists of information in a computer, analyse them, add new
information, delete old information, and so on. It allows users to efficiently store information
in an orderly manner for quick retrieval.

A DBMS can also be used as a programming tool to write custom-made programs.

CLASSIFICATION OF DATABASE SOFTWARE.

Database software is generally classified into two categories:

1. PC-based database software (or Personal Information Managers – PIMs).
2. Corporate-based database software.

PC-based database software.

The PC-based database programs are usually designed for individual users or small businesses.
They provide many general features for organizing & analyzing data. For example, they allow
users to create database files, enter data, organize that data in various ways, and also create
reports.

They do not have strict security features or complicated backup & recovery procedures.

Examples of PC-based systems:

* Microsoft Access.
* FoxPro.
* dBASE III Plus.
* Paradox.

Corporate database software.

They are designed for big corporations that handle large amounts of data.

Issues such as security, data integrity (reliability), backup and recovery are taken seriously to
prevent loss of information.

Examples of Corporate-based systems:

* Oracle.
* Informix.
* Ingres.
* Progress.
* Sybase.
* SQL Server.

Common features of database packages.

• Have facilities for Creating databases.
• Have facilities for Updating records or databases.

Using a DBMS, you can define relationships between records & files maintained in a
database. In this case, a transaction in one file of the database can also cause a series of
updates in parts of other tables. Thus, the data is input only once to the database and is made
available to the many files composing it.
• Have facilities for generating Reports.
• Have a Find or Search facility that enables the user to scan through the records in the
database so as to find information he/she needs.
• Allow Sorting that enables the user to organize & arrange the records within the
database.
• Contain Query & Filter facilities that specify the information you want the database to
search or sort.
• Have a data Validating facility.
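Several of these features can be seen together in SQL. The sketch below uses Python’s `sqlite3` module; the `classes` and `students` tables and the age limits are hypothetical. It shows a validation rule (a CHECK constraint), a relationship whose updates cascade from one table to another (a FOREIGN KEY with ON UPDATE CASCADE), and a query that filters and sorts records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE classes (class_code TEXT PRIMARY KEY);
    CREATE TABLE students (
        adm_no     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER CHECK (age BETWEEN 15 AND 99),  -- validation rule
        class_code TEXT REFERENCES classes(class_code) ON UPDATE CASCADE
    );
    INSERT INTO classes VALUES ('CS1'), ('CS2');
    INSERT INTO students VALUES (1, 'Alice', 19, 'CS1'), (2, 'Brian', 22, 'CS2'),
                                (3, 'Carol', 20, 'CS1');
""")

# Validating: a record that breaks the CHECK rule is rejected.
try:
    conn.execute("INSERT INTO students VALUES (4, 'Dan', 7, 'CS1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Updating: renaming a class in one table cascades to the related student records.
conn.execute("UPDATE classes SET class_code = 'CS-A' WHERE class_code = 'CS1'")

# Query/filter and sort: students in class CS-A, youngest first.
rows = conn.execute(
    "SELECT name, age FROM students WHERE class_code = ? ORDER BY age", ("CS-A",)
).fetchall()
print(rejected, rows)  # True [('Alice', 19), ('Carol', 20)]
```

Because the class code is entered once in `classes` and only referenced from `students`, changing it in one place updates every related record.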

FUNCTIONS OF A DATABASE MANAGEMENT SYSTEM.

The DBMS is a set of software with several functions in relation to the database, as listed
below:

1. Creates or constructs the database contents through the Data Description Languages.
2. Interfaces (links) the user to the database contents through Data Manipulation
Languages.
3. Ensures the growth of the database contents through addition of new fields & records
onto the database.
4. Maintains the contents of the database. This involves adding new records or files into the
database, modifying the already existing records & deleting of the outdated records.
5. It helps the user to sort through the records & compile lists based on any criteria he/she
would like to establish.
6. Manages the storage space for the data within the database & keeps track of all the data
in the database.
7. It provides flexible processing methods for the contents of the database.
8. Protects the contents of the database against all sorts of damage or misuse, e.g. illegal
access.
9. Monitors the usage of the database contents to determine which data is rarely used and
which is frequently used, so that frequently used data can be made readily available
whenever the need arises.
10. It maintains a dictionary of the data within the database & manages the data descriptions
in the dictionary.

Note. A Database Management System (DBMS) is used for database:

• Control, and
• Report generation.

ADVANTAGES OF USING A DBMS.

1. Database systems can be used to store data, retrieve and generate reports.
2. It is easy to maintain the data stored within a database.
3. A DBMS is able to handle large amounts of data.
4. Data is stored in an organized format, i.e. under different field names.
5. With modern equipment, data can easily be recorded.
6. Data is quickly & easily accessed or retrieved, as it is properly organized.
7. It helps in linking many database tables and sourcing of data from these tables.
8. It is quite easy to update the data stored within a database.

A database is a collection of files organized as a series of tables and treated as one entity. These
tables serve as an index for defining relationships between records and files maintained in the
database. This makes updating of the data in the related tables very easy.

9. Use of a database tool reduces duplication of the stored files and the reprocessing of the
same data items. There is no need to maintain several independent files for the different
user requirements.
10. It is used to query & display records satisfying a given condition.
11. It is easy to analyse information stored in a database & to prepare summary reports &
charts.
12. It saves costs. This results from the sharing of records, reduced processing times, reduced
use of software and hardware, more efficient use of data processing personnel, and an
overall improvement in the flow of data.
13. Use of Integrated systems is greatly facilitated.
An Integrated system – A total system approach that unifies all the aspects of the
organization. Facilities are shared across the complete organization.

14. A lot of programming time is saved because the DBMS can be used to construct &
process files as well as retrieve data.
15. Information supplied to managers is more valuable, because it is based on a widespread
collection of data (instead of files, which contain only the data needed for one
application).
16. The database also maintains an extensive Inventory Control file. This file gives an account
of all the parts & equipment throughout the maintenance system. It also defines the
status of each part and its location.
17. It enables timely & accurate reporting of data to all the maintenance centres. The same
data is available and distributed to everyone.
18. The database maintains files related to any work assigned to outside service centres.

Many parts are repaired by the vendors from whom they are purchased. A database is used to
maintain data on the parts that have been shipped to vendors and those that are outstanding
from the inventory. Data relating to the guarantees and warranties of individual vendors are
also stored in the database.

DISADVANTAGES OF DATABASES.

1. A Database system is large, very costly & time-consuming to implement.
2. A Database requires the use of a large-scale computer system.
3. The time involved. A project of this type requires a minimum of 1 – 2 years.
4. A large full-time staff is also required to design, program, & support the implementation
of a database.
5. The cost of the database project is a limiting factor for many organizations.

Database-oriented computer systems should not be treated as luxuries; they are undertaken only
when proven economically reasonable.

A database management system stores data in such a way that it becomes easier to retrieve,
manipulate, and produce information.
Characteristics
Traditionally, data was organized in file formats. DBMS was a new concept then, and research was
aimed at overcoming the deficiencies of the traditional style of data management.
A modern DBMS has the following characteristics −
• Real-world entity − A modern DBMS is more realistic and uses real-world entities to
design its architecture. It uses the behavior and attributes too. For example, a school
database may use students as an entity and their age as an attribute.
• Relation-based tables − DBMS allows entities and relations among them to form tables.
A user can understand the architecture of a database just by looking at the table names.
• Isolation of data and application − A database system is entirely different from its data. A
database is an active entity, whereas data is said to be passive, on which the database
works and organizes. DBMS also stores metadata, which is data about data, to ease its
own process.
• Less redundancy − DBMS follows the rules of normalization, which splits a relation when
any of its attributes has redundant values. Normalization is a mathematically rich
and scientific process that reduces data redundancy.
• Consistency − Consistency is a state where every relation in a database remains
consistent. There exist methods and techniques that can detect an attempt to leave the
database in an inconsistent state. A DBMS can provide greater consistency as compared to
earlier forms of data storing applications like file-processing systems.
• Query Language − DBMS is equipped with a query language, which makes it more efficient
to retrieve and manipulate data. A user can apply as many and as different filtering options
as required to retrieve a set of data. This was not possible with traditional file-processing
systems.
• ACID Properties − DBMS follows the concepts of Atomicity, Consistency, Isolation,
and Durability (normally shortened as ACID). These concepts are applied on transactions,
which manipulate data in a database. ACID properties help the database stay healthy in
multi-transactional environments and in case of failure.
• Multiuser and Concurrent Access − DBMS supports a multi-user environment and allows
users to access and manipulate data in parallel. Though there are restrictions on
transactions when users attempt to handle the same data item, users are usually
unaware of them.
• Multiple views − DBMS offers multiple views for different users. A user who is in the Sales
department will have a different view of the database than a person working in the Production
department. This feature enables the users to have a focused view of the database
according to their requirements.
• Security − Features like multiple views offer security to some extent, where users are
unable to access data of other users and departments. DBMS offers methods to impose
constraints while entering data into the database and retrieving the same at a later stage.
DBMS offers many different levels of security features, which enable multiple users to
have different views with different features. For example, a user in the Sales department
cannot see the data that belongs to the Purchase department. Additionally, it can also be
managed how much data of the Sales department should be displayed to the user. Since
data in a DBMS is reached through the DBMS rather than by opening files directly, as in
traditional file systems, it is much harder for miscreants to break the code.
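Two of the characteristics above, ACID transactions and multiple views, can be tried out with Python’s `sqlite3` module. The sketch is illustrative: the `accounts` table, the `transfer` helper and the `sales_accounts` view are hypothetical. The CHECK constraint forbids negative balances, so a transfer that would overdraw an account fails part-way, and atomicity ensures the half-finished update is rolled back. SQLite has no user accounts, so here a view only restricts what a query sees; in a multi-user DBMS such as Oracle or SQL Server, a view would typically be combined with per-user access permissions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (acc_no INTEGER PRIMARY KEY, owner TEXT, dept TEXT,
                           balance REAL CHECK (balance >= 0));
    INSERT INTO accounts VALUES (1, 'Alice', 'Sales', 100.0), (2, 'Brian', 'Purchase', 50.0);
""")


def transfer(conn, src, dst, amount):
    """Move money between accounts as ONE transaction: both updates happen, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 2, 1, 80.0)  # would overdraw account 2, so the credit to account 1 is undone too
balances = conn.execute("SELECT acc_no, balance FROM accounts ORDER BY acc_no").fetchall()
print(ok, balances)  # False [(1, 100.0), (2, 50.0)]

# A view gives the Sales department its own restricted window onto the data.
conn.execute("CREATE VIEW sales_accounts AS SELECT owner, balance FROM accounts WHERE dept = 'Sales'")
sales_view = conn.execute("SELECT * FROM sales_accounts").fetchall()
print(sales_view)  # [('Alice', 100.0)]
```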
Users
A typical DBMS has users with different rights and permissions who use it for different purposes.
Some users retrieve data and some back it up. The users of a DBMS can be broadly categorized
as follows −

• Administrators − Administrators maintain the DBMS and are responsible for
administrating the database. They are responsible for looking after its usage and deciding who
should use it. They create access profiles for users and apply limitations to maintain
isolation and enforce security. Administrators also look after DBMS resources like system
licenses, required tools, and other software and hardware related maintenance.
• Designers − Designers are the group of people who actually work on the designing part of
the database. They keep a close watch on what data should be kept and in what format.
They identify and design the whole set of entities, relations, constraints, and views.
• End Users − End users are those who actually reap the benefits of having a DBMS. End
users can range from simple viewers who pay attention to the logs or market rates to
sophisticated users such as business analysts.

DBMS - Architecture
The design of a DBMS depends on its architecture. It can be centralized or decentralized or
hierarchical. The architecture of a DBMS can be seen as either single tier or multi-tier. An n-tier
architecture divides the whole system into related but independent n modules, which can be
independently modified, altered, changed, or replaced.
In 1-tier architecture, the DBMS is the only entity where the user directly sits on the DBMS and
uses it. Any changes done here will directly be done on the DBMS itself. It does not provide handy
tools for end-users. Database designers and programmers normally prefer to use single-tier
architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS
can be accessed. Programmers use 2-tier architecture where they access the DBMS by means of
an application. Here the application tier is entirely independent of the database in terms of
operation, design, and programming.
3-tier Architecture
A 3-tier architecture separates its tiers from each other based on the complexity of the users and
how they use the data present in the database. It is the most widely used architecture to design
a DBMS.

• Database (Data) Tier − At this tier, the database resides along with its query processing
languages. We also have the relations that define the data and their constraints at this
level.
• Application (Middle) Tier − At this tier reside the application server and the programs that
access the database. For a user, this application tier presents an abstracted view of the
database. End-users are unaware of any existence of the database beyond the application.
At the other end, the database tier is not aware of any other user beyond the application
tier. Hence, the application layer sits in the middle and acts as a mediator between the
end-user and the database.
• User (Presentation) Tier − End-users operate on this tier and they know nothing about
any existence of the database beyond this layer. At this layer, multiple views of the
database can be provided by the application. All views are generated by applications that
reside in the application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
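The separation of tiers can be sketched inside a single Python program. In a real deployment the tiers usually run as separate processes or machines (for example, a browser, an application server and a database server); here they are just a function, a class and another function, and all names are hypothetical. The point is that the presentation code never touches SQL, and the database is only ever reached through the application tier.

```python
import sqlite3


# --- Data tier: the database and its query language ---
def create_data_tier():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (code TEXT PRIMARY KEY, name TEXT, price REAL);
        INSERT INTO products VALUES ('P1', 'Pen', 20.0), ('P2', 'Notebook', 150.0);
    """)
    return conn


# --- Application (middle) tier: the only code that talks to the database ---
class ProductService:
    def __init__(self, conn):
        self._conn = conn

    def price_list(self, max_price):
        rows = self._conn.execute(
            "SELECT name, price FROM products WHERE price <= ? ORDER BY name", (max_price,)
        ).fetchall()
        return [{"name": name, "price": price} for name, price in rows]


# --- Presentation (user) tier: formats whatever the application returns ---
def show_price_list(service, max_price):
    return "\n".join(f"{item['name']}: {item['price']:.2f}" for item in service.price_list(max_price))


service = ProductService(create_data_tier())
print(show_price_list(service, 200))
# Notebook: 150.00
# Pen: 20.00
```

Because each tier depends only on the one next to it, the database could be swapped for another DBMS by changing `create_data_tier` and the SQL inside `ProductService`, without touching the presentation code.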
DBMS - Data Models
Data models define how the logical structure of a database is modeled. Data Models are
fundamental entities to introduce abstraction in a DBMS. Data models define how data is
connected to each other and how they are processed and stored inside the system.
The very first data models were flat data models, where all the data used was kept in the
same plane. Earlier data models were not so scientific, hence they were prone to introduce lots
of duplication and update anomalies.
Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them. While formulating real-world scenario into the database model, the ER Model
creates entity set, relationship set, general attributes and constraints.
ER Model is best used for the conceptual design of a database.
ER Model is based on −
• Entities and their attributes.
• Relationships among entities.
These concepts are explained below.

• Entity − An entity in an ER Model is a real-world entity having properties called attributes.
Every attribute is defined by its set of values called domain. For example, in a school
database, a student is considered as an entity. Student has various attributes like name,
age, class, etc.
• Relationship − The logical association among entities is called relationship. Relationships
are mapped with entities in various ways. Mapping cardinalities define the number of
associations between two entities.
Mapping cardinalities −
o one to one
o one to many
o many to one
o many to many
Relational Model
The most popular data model in DBMS is the Relational Model. It is a more scientific model than
the others. This model is based on first-order predicate logic and defines a table as an n-ary relation.

The main highlights of this model are −

• Data is stored in tables called relations.
• Relations can be normalized.
• In normalized relations, values saved are atomic values.
• Each row in a relation is unique.
• Each column in a relation contains values from the same domain.
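The rule that every row is unique is usually enforced with a primary key. In this sketch (Python’s `sqlite3`, hypothetical `students` table), the DBMS rejects a second row that reuses an existing key. SQLite treats declared column types loosely (type affinity), so it illustrates the uniqueness rule better than the domain rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relation whose rows are identified by the primary key roll_no.
conn.execute("CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)")
conn.execute("INSERT INTO students VALUES (1, 'Alice', 19)")

try:
    conn.execute("INSERT INTO students VALUES (1, 'Bob', 20)")  # same key as an existing row
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(duplicate_accepted)  # False
```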
DBMS - Data Schemas
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relations among them are associated.
It formulates all the constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. It’s the database
designers who design the schema to help programmers understand the database and make it
useful.
A database schema can be divided broadly into two categories −
• Physical Database Schema − This schema pertains to the actual storage of data and its
form of storage like files, indices, etc. It defines how the data will be stored in a secondary
storage.
• Logical Database Schema − This schema defines all the logical constraints that need to be
applied on the data stored. It defines tables, views, and integrity constraints.

Database Instance
It is important that we distinguish these two terms individually. Database schema is the skeleton
of database. It is designed when the database doesn't exist at all. Once the database is
operational, it is very difficult to make any changes to it. A database schema does not contain
any data or information.
A database instance is a state of operational database with data at any given time. It contains a
snapshot of the database. Database instances tend to change with time. A DBMS ensures that its
every instance (state) is in a valid state, by diligently following all the validations, constraints, and
conditions that the database designers have imposed.
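The difference between a schema and an instance can be observed directly. In this sketch using Python’s `sqlite3` (the `books` table is hypothetical), the stored CREATE TABLE statement stands for the schema, and the rows returned by a query stand for the current instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_no TEXT PRIMARY KEY, title TEXT)")  # the schema


def schema(conn):
    """The stored definition (skeleton) of every table: structure only, no data."""
    return [sql for (sql,) in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")]


def instance(conn):
    """The data held at this moment: a snapshot that changes with every update."""
    return conn.execute("SELECT * FROM books ORDER BY book_no").fetchall()


before = (schema(conn), instance(conn))
conn.execute("INSERT INTO books VALUES ('B001', 'Database Notes')")
after = (schema(conn), instance(conn))

print(before[0] == after[0])  # True: the schema did not change
print(before[1], after[1])    # [] [('B001', 'Database Notes')]
```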
ER Diagram in DBMS

What is the ER Model?

The ER (Entity-Relationship) Model is a high-level conceptual data model. It is based on the
notion of real-world entities and the relationships between them.

ER modeling helps you to analyze data requirements systematically to produce a well-designed
database. So, it is considered a best practice to complete ER modeling before implementing
your database.

History of ER models

ER diagrams are a visual tool that helps to represent the ER model. The ER model was proposed by
Peter Chen in 1976 to create a uniform convention that can be used for relational and network
databases. He aimed to use an ER model as a conceptual modeling approach.

What are ER Diagrams?

An entity relationship diagram displays the relationships of entity sets stored in a database. In other
words, ER diagrams help you to explain the logical structure of databases. At
first look, an ER diagram looks very similar to a flowchart. However, an ER diagram includes
many specialized symbols, and their meanings make this model unique.

Sample ER Diagram
Facts about ER Diagram Model:

• The ER model allows you to draw a Database Design
• It is an easy-to-use graphical tool for modeling data
• It is widely used in Database Design
• It is a graphical representation of the logical structure of a Database
• It helps you to identify the entities which exist in a system and the relationships
between those entities

Why use ER Diagrams?

Here are the prime reasons for using an ER Diagram:

• Helps you to define terms related to entity relationship modeling
• Provides a preview of how all your tables should connect, and what fields are going to be on
each table
• Helps to describe entities, attributes, relationships
• ER diagrams are translatable into relational tables, which allows you to build databases
quickly
• ER diagrams can be used by database designers as a blueprint for implementing data in
specific software applications
• The database designer gains a better understanding of the information to be contained
in the database with the help of an ER diagram
• An ERD allows you to communicate the logical structure of the database to users

Components of the ER Diagram

This model is based on three basic concepts:

• Entities
• Attributes
• Relationships

Example

For example, in a University database, we might have entities for Students, Courses, and
Lecturers. The Student entity can have attributes like Rollno, Name, and DeptID. Students might have
relationships with Courses and Lecturers.

WHAT IS AN ENTITY?

A real-world thing, either living or non-living, that is easily recognizable. It
is anything in the enterprise that is to be represented in our database. It may be a physical thing
or simply a fact about the enterprise or an event that happens in the real world.

An entity can be a place, person, object, event or concept about which data is stored in the database.
An entity must have attributes and a unique key. Every entity is
made up of some 'attributes' which represent that entity.

Examples of entities:

• Person: Employee, Student, Patient
• Place: Store, Building
• Object: Machine, Product, Car
• Event: Sale, Registration, Renewal
• Concept: Account, Course
Notation of an Entity

Entity set: Student

An entity set is a group of entities of a similar kind. It may contain entities whose attributes share
similar values. Entities are represented by their properties, which are also called attributes. All
attributes have their separate values. For example, a student entity may have a name, age, and
class as attributes.

Example of Entities:

A university may have some departments. All these departments employ various lecturers and
offer several programs.

Some courses make up each program. Students register in a particular program and enroll in
various courses. A lecturer from the specific department takes each course, and each lecturer
teaches a different group of students.

Relationship

Relationship is nothing but an association among two or more entities. E.g., Tom works in the
Chemistry department.

Entities take part in relationships. We can often identify relationships with verbs or verb
phrases.
For example:

• You are attending this lecture.
• I am giving the lecture.

Just like entities, we can classify relationships according to relationship types:

• A student attends a lecture.
• A lecturer is giving a lecture.

Weak Entities

A weak entity is a type of entity which doesn't have a key attribute of its own. It can be identified
uniquely by considering the primary key of another (owner) entity. For that, weak entity sets need to
have total participation in the relationship that identifies them.

For example, in an ATM system, "Trans No" is a discriminator within a group of transactions.

Let's learn more about a weak entity by comparing it with a Strong Entity:

Strong Entity Set vs Weak Entity Set

• Key: A strong entity set always has a primary key. A weak entity set does not have enough
attributes to build a primary key.
• Symbol: A strong entity set is represented by a rectangle symbol. A weak entity set is
represented by a double rectangle symbol.
• Key notation: A strong entity set contains a primary key, represented by an underline. A weak
entity set contains a partial key, represented by a dashed underline.
• Naming: A strong entity set is called a dominant entity set. A weak entity set is called a
subordinate entity set.
• Identification: In a strong entity set, the primary key is one of its attributes and identifies
its members. In a weak entity set, members are identified by the combination of the primary
key of the strong entity set and the partial key of the weak entity set.
• Relationship symbol: In the ER diagram, the relationship between two strong entity sets is
shown using a diamond symbol. The relationship between a strong and a weak entity set is
shown using a double diamond symbol.
• Connecting line: The line connecting a strong entity set with the relationship is single. The
line connecting the weak entity set to its identifying relationship is double.
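When an ER design is turned into tables, a weak entity set usually becomes a table whose primary key combines the owner’s primary key with the weak entity’s partial key. The sketch follows the ATM example, assuming a hypothetical `accounts` table as the strong entity and `transactions` as the weak entity, with `trans_no` as the discriminator. ON DELETE CASCADE is one common way to express the weak entity’s existence dependency on its owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Strong entity: identified by its own primary key.
    CREATE TABLE accounts (acc_no INTEGER PRIMARY KEY, owner TEXT);

    -- Weak entity: trans_no (the partial key) is unique only within one account,
    -- so the full key combines it with the owner's primary key.
    CREATE TABLE transactions (
        acc_no   INTEGER NOT NULL REFERENCES accounts(acc_no) ON DELETE CASCADE,
        trans_no INTEGER NOT NULL,
        amount   REAL,
        PRIMARY KEY (acc_no, trans_no)
    );
    INSERT INTO accounts VALUES (101, 'Alice'), (102, 'Brian');
    -- Both accounts can have a transaction numbered 1.
    INSERT INTO transactions VALUES (101, 1, -500.0), (101, 2, 1000.0), (102, 1, -200.0);
""")

# A transaction cannot exist without its account: deleting the account removes them.
conn.execute("DELETE FROM accounts WHERE acc_no = 101")
remaining = conn.execute("SELECT acc_no, trans_no FROM transactions").fetchall()
print(remaining)  # [(102, 1)]
```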

Attributes

It is a single-valued property of either an entity-type or a relationship-type.

For example, a lecture might have attributes: time, date, duration, place, etc.

An attribute is represented by an Ellipse


Types of Attributes

• Simple attribute − Simple attributes cannot be divided any further. For example, a student's
contact number. It is also called an atomic value.
• Composite attribute − A composite attribute can be broken down into parts. For example, a
student's full name may be further divided into first name, second name, and last name.
• Derived attribute − This type of attribute is not stored in the physical database. Instead, its
values are derived from other attributes present in the database. For example, age should not
be stored directly; it should be derived from the DOB (date of birth) of that employee.
• Multivalued attribute − A multivalued attribute can have more than one value. For example, a
student can have more than one mobile number, email address, etc.
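The four attribute types can be illustrated in code. This is a minimal sketch using a Python dataclass, and the class and field names are hypothetical. The derived attribute (age) is computed from the stored date of birth and is never stored itself.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class FullName:                       # composite attribute: divisible into parts
    first: str
    second: str
    last: str

@dataclass
class Student:
    contact_number: str               # simple (atomic) attribute
    full_name: FullName               # composite attribute
    dob: date                         # stored attribute used to derive age
    mobile_numbers: list[str] = field(default_factory=list)  # multivalued attribute

    def age(self, on: date) -> int:
        """Derived attribute: computed from dob on a given date, never stored."""
        had_birthday = (on.month, on.day) >= (self.dob.month, self.dob.day)
        return on.year - self.dob.year - (0 if had_birthday else 1)
```

Passing the reference date explicitly keeps `age` deterministic. A real system would usually pass today's date.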

Cardinality

Cardinality defines how many entities of one entity set can be associated with entities of another
entity set through a relationship.

Different types of cardinal relationships are:

• One-to-One Relationships
• One-to-Many Relationships
• Many-to-One Relationships
• Many-to-Many Relationships
1. One-to-One:

One entity from entity set X can be associated with at most one entity of entity set Y and vice
versa.

Example: In the university example later in these notes, each course is taught by a single
professor, and a professor can deliver only one course.

2. One-to-Many:

One entity from entity set X can be associated with multiple entities of entity set Y, but an
entity from entity set Y can be associated with at most one entity of entity set X.

For example, one class consists of multiple students.


3. Many-to-One:

Each entity from entity set X can be associated with at most one entity of entity set Y, while an
entity from entity set Y may be associated with more than one entity from entity set X.

For example, many students belong to the same class.

4. Many-to-Many:

One entity from X can be associated with more than one entity from Y and vice versa.

For example, Students as a group are associated with multiple faculty members, and faculty
members can be associated with multiple students.
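In a relational schema, these cardinalities are commonly realised as follows. This is a minimal sketch, assuming SQLite through Python's sqlite3 module, with hypothetical table names and sample rows. A one-to-many relationship becomes a foreign key on the "many" side (each student stores at most one class). A many-to-many relationship becomes a separate junction table (students and faculty).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One-to-many: one class has many students; each student row stores one class key.
CREATE TABLE class   (class_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT,
                      class_id INTEGER REFERENCES class(class_id));

-- Many-to-many: a junction table pairs students with faculty members.
CREATE TABLE faculty (faculty_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student_faculty (
    student_id INTEGER REFERENCES student(student_id),
    faculty_id INTEGER REFERENCES faculty(faculty_id),
    PRIMARY KEY (student_id, faculty_id)
);

INSERT INTO class   VALUES (1, 'Form 1');
INSERT INTO student VALUES (10, 'Alex', 1), (11, 'Maria', 1);
INSERT INTO faculty VALUES (100, 'Maya'), (101, 'Mira');
INSERT INTO student_faculty VALUES (10, 100), (10, 101), (11, 100);
""")

def students_in_class(class_id):
    rows = conn.execute("SELECT name FROM student WHERE class_id = ? ORDER BY name", (class_id,))
    return [r[0] for r in rows]

def faculty_of_student(student_id):
    rows = conn.execute("""SELECT f.name FROM faculty f
                           JOIN student_faculty sf ON sf.faculty_id = f.faculty_id
                           WHERE sf.student_id = ? ORDER BY f.name""", (student_id,))
    return [r[0] for r in rows]

def students_of_faculty(faculty_id):
    rows = conn.execute("""SELECT s.name FROM student s
                           JOIN student_faculty sf ON sf.student_id = s.student_id
                           WHERE sf.faculty_id = ? ORDER BY s.name""", (faculty_id,))
    return [r[0] for r in rows]
```

A many-to-one relationship is the same foreign-key pattern seen from the other side.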
ER Diagram Notations

An ER diagram is a visual representation of data that describes how data items are related to each other.

• Rectangles: represent entity types
• Ellipses: represent attributes
• Diamonds: represent relationship types
• Lines: link attributes to entity types, and entity types to relationship types
• Primary key: key attributes are underlined
• Double ellipses: represent multivalued attributes

Steps to Create an ERD

Following are the steps to create an ERD.

Let's study them with an example:

In a university, a Student enrolls in Courses. A student must be assigned to at least one Course.
Each course is taught by a single Professor. To maintain instruction quality, a Professor can
deliver only one course.
Step 1) Entity Identification

We have three entities

• Student
• Course
• Professor
Step 2) Relationship Identification

We have the following two relationships

• The student is assigned a course


• Professor delivers a course

Step 3) Cardinality Identification

From the problem statement, we know that:

• A student can be assigned multiple courses


• A Professor can deliver only one course

Step 4) Identify Attributes

You need to study the files, forms, reports, data currently maintained by the organization to
identify attributes. You can also conduct interviews with various stakeholders to identify
entities. Initially, it's important to identify the attributes without mapping them to a particular
entity.
Once you have a list of attributes, you need to map them to the identified entities. Ensure each
attribute is paired with exactly one entity. If you think an attribute should belong to more than
one entity, use a modifier to make it unique.

Once the mapping is done, identify the primary Keys. If a unique key is not readily available,
create one.

Entity Primary Key Attribute

Student Student_ID StudentName

Professor Employee_ID ProfessorName

Course Course_ID CourseName

For the Course entity, attributes could be Duration, Credits, Assignments, etc. For simplicity,
we have considered just one attribute.

Step 5) Create the ERD

A more modern representation of ERD Diagram

Best Practices for Developing Effective ER Diagrams

• Eliminate any redundant entities or relationships


• You need to make sure that all your entities and relationships are properly labeled
• There may be various valid approaches to an ER diagram. You need to make sure that
the ER diagram supports all the data you need to store
• You should assure that each entity only appears a single time in the ER diagram
• Name every relationship, entity, and attribute represented on your diagram
• Never connect relationships to each other
• You should use colors to highlight important portions of the ER diagram

Summary

• The ER model is a high-level data model diagram


• ER diagrams are a visual tool which is helpful to represent the ER model
• Entity relationship diagram displays the relationships of entity set stored in a database
• ER diagrams help you to define terms related to entity relationship modeling
• ER model is based on three basic concepts: Entities, Attributes & Relationships
• An entity can be a place, person, object, event or concept about which data is stored in the
database
• A relationship is an association among two or more entities
• A weak entity is a type of entity that does not have a key attribute of its own
• An attribute is a property of either an entity type or a relationship type
• Cardinality defines how many entities on each side of a relationship can be associated with each
other
• An ER diagram is a visual representation of data that describes how data is related to each
other
• While drawing an ER diagram, you need to make sure all your entities and relationships are
properly labeled

Relational database Design

DBMS – Normalization

Functional Dependency
Functional dependency (FD) is a constraint between two sets of attributes in a relation.
A functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An,
then those two tuples must also have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally
determines Y. The left-hand side attributes determine the values of the attributes on the
right-hand side.
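The definition can be checked directly against a stored relation instance. This is a minimal sketch in Python; the rows are hypothetical sample data. Because an FD is a constraint on every legal instance, a single instance can only show that an FD is violated. It cannot prove that the FD always holds.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one relation instance.

    rows: list of dicts (one per tuple); lhs, rhs: tuples of attribute names.
    X -> Y is satisfied if any two tuples that agree on X also agree on Y.
    """
    seen = {}  # X-values -> the Y-values first seen with them
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:   # same X, different Y: violation
            return False
    return True

# Hypothetical sample rows
rows = [
    {"Stu_ID": 1, "Zip": "00100", "City": "Nairobi"},
    {"Stu_ID": 2, "Zip": "00100", "City": "Nairobi"},
    {"Stu_ID": 3, "Zip": "80100", "City": "Mombasa"},
]
```

Here Zip → City is satisfied. City → Stu_ID is violated, because two students share a city.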
Armstrong's Axioms
If F is a set of functional dependencies, then the closure of F, denoted as F⁺, is the set of all
functional dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when
applied repeatedly, generate the closure of a set of functional dependencies.
• Reflexive rule − If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule − If α → β holds and γ is a set of attributes, then αγ → βγ also holds. That is,
adding the same attributes to both sides of a dependency does not change the basic dependency.
• Transitivity rule − Same as the transitive rule in algebra: if α → β holds and β → γ holds, then
α → γ also holds. In α → β, α is said to functionally determine β.
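In practice, applying the axioms repeatedly is usually done through the attribute closure X⁺, the set of all attributes functionally determined by X. F implies X → Y exactly when Y ⊆ X⁺. This is a minimal Python sketch of that standard algorithm, using single-letter attribute names passed as strings.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds` (a list of (lhs, rhs) pairs).

    Whenever an FD's left side is already inside the closure, its right side is
    added (a consequence of reflexivity, augmentation and transitivity).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F implies X -> Y exactly when Y is a subset of X+."""
    return set(rhs) <= closure(lhs, fds)

F = [("A", "B"), ("B", "C")]   # A -> B, B -> C
```

With F = {A → B, B → C}, `closure("A", F)` is {A, B, C}, so A → C follows (transitivity). AD → BD also follows (augmentation).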
Trivial Functional Dependency
• Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is
called a trivial FD. Trivial FDs always hold.
• Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial
FD.
• Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ, it is said to be a
completely non-trivial FD.
Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any
database administrator. Managing a database with anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not linked to each other properly,
then it could lead to strange situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances get updated properly while
a few others are left with old values. Such instances leave the database in an inconsistent
state.
• Deletion anomalies − We tried to delete a record, but parts of it were left undeleted
because, unknown to us, the data was also saved somewhere else.
• Insert anomalies − We tried to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all
the attributes in a relation must have atomic domains. The values in an atomic domain are
indivisible units.
We re-arrange the relation (table) as below, to convert it to First Normal Form.

Each attribute must contain only a single value from its pre-defined domain.
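The re-arrangement into First Normal Form can be shown as a transformation. This is a minimal Python sketch on hypothetical data: a field holding a comma-separated list (a non-atomic value) is split so that each resulting tuple holds exactly one value.

```python
def to_1nf(rows, attr, sep=","):
    """Split a non-atomic attribute (a `sep`-separated list) into one tuple per value."""
    result = []
    for row in rows:
        for value in row[attr].split(sep):
            atomic_row = dict(row)            # copy the other attribute values
            atomic_row[attr] = value.strip()  # exactly one value per field
            result.append(atomic_row)
    return result

# Hypothetical relation where "Content" is not atomic
rows = [
    {"Course": "Programming", "Content": "Java, C++"},
    {"Course": "Web", "Content": "HTML, PHP, ASP"},
]
flat = to_1nf(rows, "Content")
```

The two original rows become five rows, one per (Course, Content) pair.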
Second Normal Form
Before we learn about the second normal form, we need to understand the following −
• Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime
attribute.
• Non-prime attribute − An attribute that is not a part of any candidate key is said to be a
non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully functionally
dependent on the candidate key. That is, if X → A holds, then there should not be any proper
subset Y of X for which Y → A also holds true.

We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually. But we find that Stu_Name can
be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture, so no partial dependency remains.
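The partial-dependency test can be expressed as a short check. This is a minimal Python sketch that assumes a single candidate key and inspects only the FDs listed, not their full closure. It flags an FD Y → A when Y is a proper subset of the key and A contains a non-prime attribute.

```python
def partial_dependencies(key, fds):
    """Return (Y, A) pairs where Y is a proper subset of `key` and A holds non-prime attributes."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        non_prime = set(rhs) - key          # attributes outside the candidate key
        if lhs < key and non_prime:         # `<` on sets means proper subset
            found.append((lhs, non_prime))
    return found

# Student_Project: key {Stu_ID, Proj_ID}
fds = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]
found = partial_dependencies({"Stu_ID", "Proj_ID"}, fds)
```

Both FDs are reported as partial dependencies. After decomposition, a relation keyed by Stu_ID alone cannot have any, because no proper subset of {Stu_ID} can determine another attribute.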
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal form and the following
must satisfy −

• No non-prime attribute is transitively dependent on any candidate key.


• For any non-trivial functional dependency, X → A, then either −
o X is a superkey or,

o A is prime attribute.

We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute.
We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is
City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows
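The split can also be checked by projecting a relation instance and joining it back. This is a minimal Python sketch on hypothetical rows. It assumes the split into Student_Detail(Stu_ID, Stu_Name, Zip) and ZipCodes(Zip, City), the relations named in the BCNF discussion that follows. After the split, each Zip → City fact is stored once, and joining the two relations on Zip recovers the original tuples.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

def natural_join(r, s):
    """Combine tuples of r and s that agree on all attributes they share."""
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]

# Hypothetical instance with Stu_ID -> Zip -> City
student_detail = [
    {"Stu_ID": 1, "Stu_Name": "Alex",  "Zip": "00100", "City": "Nairobi"},
    {"Stu_ID": 2, "Stu_Name": "Maria", "Zip": "00100", "City": "Nairobi"},
    {"Stu_ID": 3, "Stu_Name": "Maya",  "Zip": "80100", "City": "Mombasa"},
]

students = project(student_detail, ["Stu_ID", "Stu_Name", "Zip"])
zip_codes = project(student_detail, ["Zip", "City"])   # transitive FD moved out
```
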

Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form. BCNF
states that −

• For any non-trivial functional dependency, X → A, X must be a super-key.


In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-
key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
This confirms that both relations are in BCNF.
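The BCNF condition can be tested with attribute closures: X is a superkey of a relation exactly when X⁺ contains all of the relation's attributes. This is a minimal Python sketch, and it checks only the FDs listed for each relation. A complete test would use all FDs projected onto each relation.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs_set, rhs_set))."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs X -> A from `fds` whose X is not a superkey of `relation`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                          # skip trivial FDs
            and not closure(lhs, fds) >= relation]     # X+ must cover every attribute

F = [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})]
```

For the undecomposed relation {Stu_ID, Stu_Name, Zip, City}, Zip → City is reported because Zip is not a superkey. Student_Detail and ZipCodes report no violations.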

DBMS - Joins
We understand the benefits of taking a Cartesian product of two relations, which gives us all the
possible tuples that are paired together. But it might not be feasible for us in certain cases to take
a Cartesian product where we encounter huge relations with thousands of tuples having a
considerably large number of attributes.
Join is a combination of a Cartesian product followed by a selection process. A Join operation
pairs two tuples from different relations, if and only if a given join condition is satisfied.
We will briefly describe various join types in the following sections.
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta condition. The
join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the attributes
don’t have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.

Student

SID Name Std

101 Alex 10

102 Maria 11

Subjects

Class Subject

10 Math

10 English

11 Music

11 Sports

Student_Detail −
STUDENT ⋈(Student.Std = Subject.Class) SUBJECT
Student_detail

SID Name Std Class Subject

101 Alex 10 10 Math

101 Alex 10 10 English


102 Maria 11 11 Music

102 Maria 11 11 Sports
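The definition "Cartesian product followed by a selection" translates almost directly into code. This is a minimal Python sketch with relations as lists of dicts, using the Student and Subjects tables above. As the notes require, the two relations share no attribute names, so the two dicts in each pair can be merged safely.

```python
def theta_join(r, s, theta):
    """R ⋈θ S: every pair from the Cartesian product R × S that satisfies theta."""
    return [{**t, **u} for t in r for u in s if theta(t, u)]

student = [
    {"SID": 101, "Name": "Alex", "Std": 10},
    {"SID": 102, "Name": "Maria", "Std": 11},
]
subjects = [
    {"Class": 10, "Subject": "Math"}, {"Class": 10, "Subject": "English"},
    {"Class": 11, "Subject": "Music"}, {"Class": 11, "Subject": "Sports"},
]

# Equality condition (an equijoin), as in Student_detail above
student_detail = theta_join(student, subjects, lambda t, u: t["Std"] == u["Class"])
# Any comparison operator works, e.g. "<"
lower_std = theta_join(student, subjects, lambda t, u: t["Std"] < u["Class"])
```

`student_detail` reproduces the four rows of Student_detail. The `<` condition pairs Alex (Std 10) with the two class-11 subjects only.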

Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The above
example corresponds to equijoin.
Natural Join (⋈)
Natural join does not take an explicit comparison operator; it implicitly matches attributes that
have the same name. It does not concatenate the way a Cartesian product does. We can perform
a Natural Join only if there is at least one common attribute between the two relations. In
addition, the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of the attributes in both relations
are the same.

Courses

CID Course Dept

CS01 Database CS

ME01 Mechanics ME

EE01 Electronics EE

HoD

Dept Head

CS Alex

ME Maya

EE Mira

Courses ⋈ HoD
Dept CID Course Head

CS CS01 Database Alex

ME ME01 Mechanics Maya

EE EE01 Electronics Mira
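The natural join above can be reproduced in a few lines. This is a minimal Python sketch that finds the shared attribute names (here only Dept) and keeps pairs that agree on all of them. Following the notes, it refuses to join relations that have no common attribute.

```python
def natural_join(r, s):
    """R ⋈ S: combine tuples that agree on every attribute name the relations share."""
    if not r or not s:
        return []
    common = r[0].keys() & s[0].keys()
    if not common:
        # The notes require at least one common attribute for a natural join.
        raise ValueError("relations have no common attribute")
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

courses = [
    {"CID": "CS01", "Course": "Database", "Dept": "CS"},
    {"CID": "ME01", "Course": "Mechanics", "Dept": "ME"},
    {"CID": "EE01", "Course": "Electronics", "Dept": "EE"},
]
hod = [{"Dept": "CS", "Head": "Alex"}, {"Dept": "ME", "Head": "Maya"}, {"Dept": "EE", "Head": "Mira"}]

courses_hod = natural_join(courses, hod)
```

The shared Dept column appears only once in each result tuple, matching the Courses ⋈ HoD table.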

Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only those
tuples with matching attributes and the rest are discarded in the resulting relation. Therefore,
we need to use outer joins to include all the tuples from the participating relations in the resulting
relation. There are three kinds of outer joins − left outer join, right outer join, and full outer join.
Left Outer Join (R ⟕ S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are tuples in
R without any matching tuple in the Right relation S, then the S-attributes of the resulting relation
are made NULL.

Left

A B

100 Database

101 Mechanics

102 Electronics

Right

A B

100 Alex

102 Maya
104 Mira

Courses ⟕ HoD

A B C D

100 Database 100 Alex

101 Mechanics --- ---

102 Electronics 102 Maya

Right Outer Join (R ⟖ S)


All the tuples from the Right relation, S, are included in the resulting relation. If there are tuples
in S without any matching tuple in R, then the R-attributes of resulting relation are made NULL.

Courses ⟖ HoD

A B C D

100 Database 100 Alex

102 Electronics 102 Maya

--- --- 104 Mira

Full Outer Join (R ⟗ S)


All the tuples from both participating relations are included in the resulting relation. If there are
no matching tuples for both relations, their respective unmatched attributes are made NULL.

Courses ⟗ HoD

A B C D

100 Database 100 Alex


101 Mechanics --- ---

102 Electronics 102 Maya

--- --- 104 Mira
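The three outer joins can be reproduced on the Left and Right tables above. This is a minimal Python sketch that uses tuples for rows and `None` for NULL (the "---" cells), with the join condition Left.A = Right.A. It assumes both relations are non-empty, so the padding width can be taken from their first rows.

```python
def left_outer_join(r, s, cond):
    """R ⟕ S: every tuple of R; unmatched ones are padded with NULLs for S."""
    pad = (None,) * len(s[0])            # assumes S is non-empty
    out = []
    for t in r:
        matches = [t + u for u in s if cond(t, u)]
        out.extend(matches or [t + pad])
    return out

def right_outer_join(r, s, cond):
    """R ⟖ S: every tuple of S; unmatched ones get NULLs for R's attributes."""
    pad = (None,) * len(r[0])            # assumes R is non-empty
    out = []
    for u in s:
        matches = [t + u for t in r if cond(t, u)]
        out.extend(matches or [pad + u])
    return out

def full_outer_join(r, s, cond):
    """R ⟗ S: the left outer join plus S-tuples that matched nothing in R."""
    pad = (None,) * len(r[0])
    unmatched_s = [pad + u for u in s if not any(cond(t, u) for t in r)]
    return left_outer_join(r, s, cond) + unmatched_s

def same_a(t, u):
    return t[0] == u[0]                  # join condition: Left.A = Right.A

left = [(100, "Database"), (101, "Mechanics"), (102, "Electronics")]
right = [(100, "Alex"), (102, "Maya"), (104, "Mira")]
```

Each function returns the rows of the matching result table above, in the same order.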

DBMS - File Structure

Related data and information are stored collectively in file formats. A file is a sequence of records
stored in binary format. A disk drive is formatted into several blocks that can store records. File
records are mapped onto those disk blocks.
File Organization
File Organization defines how file records are mapped onto disk blocks. We have four types of
File Organization to organize file records −

Heap File Organization


When a file is created using Heap File Organization, the Operating System allocates memory area
to that file without any further accounting details. File records can be placed anywhere in that
memory area. It is the responsibility of the software to manage the records. Heap File does not
support any ordering, sequencing, or indexing on its own.
Sequential File Organization
Every file record contains a data field (attribute) to uniquely identify that record. In sequential
file organization, records are placed in the file in some sequential order based on the unique key
field or search key. Practically, it is not possible to store all the records sequentially in physical
form.
Hash File Organization
Hash File Organization uses Hash function computation on some fields of the records. The output
of the hash function determines the location of disk block where the records are to be placed.
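The idea behind hash file organization can be shown with an in-memory model. This is a minimal Python sketch in which a list of lists stands in for disk blocks and the class name is hypothetical. A record's block is computed from its key field, so a lookup scans only that one block.

```python
class HashFile:
    """Toy hash file organization: `n_blocks` lists stand in for disk blocks."""

    def __init__(self, n_blocks, key_field):
        self.blocks = [[] for _ in range(n_blocks)]
        self.key_field = key_field

    def block_no(self, key):
        # Python's built-in hash() is a stand-in hash function. It is stable for
        # ints but randomized per process for str, so a real file would need a
        # stable hash function.
        return hash(key) % len(self.blocks)

    def insert(self, record):
        self.blocks[self.block_no(record[self.key_field])].append(record)

    def find(self, key):
        # Only the single block chosen by the hash function is searched.
        block = self.blocks[self.block_no(key)]
        return [r for r in block if r[self.key_field] == key]
```

For example, `HashFile(4, "SID")` spreads records over four blocks by SID.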
Clustered File Organization
Clustered file organization is not considered good for large databases. In this mechanism, related
records from one or more relations are kept in the same disk block, that is, the ordering of records
is not based on primary key or search key.
File Operations
Operations on database files can be broadly classified into two categories −
• Update Operations
• Retrieval Operations
Update operations change the data values by insertion, deletion, or update. Retrieval operations,
on the other hand, do not alter the data but retrieve them after optional conditional filtering. In
both types of operations, selection plays a significant role. Besides creating and deleting a
file, several other operations can be done on files:
• Open − A file can be opened in one of the two modes, read mode or write mode. In read
mode, the operating system does not allow anyone to alter data. In other words, data is
read only. Files opened in read mode can be shared among several entities. Write mode
allows data modification. Files opened in write mode can be read but cannot be shared.
• Locate − Every file has a file pointer, which tells the current position where the data is to
be read or written. This pointer can be adjusted accordingly. Using find (seek) operation,
it can be moved forward or backward.
• Read − By default, when files are opened in read mode, the file pointer points to the
beginning of the file. There are options where the user can tell the operating system where
to locate the file pointer at the time of opening a file. The data immediately following the file
pointer is read.
• Write − User can select to open a file in write mode, which enables them to edit its
contents. It can be deletion, insertion, or modification. The file pointer can be located at
the time of opening or can be dynamically changed if the operating system allows it.
• Close − This is the most important operation from the operating system’s point of view.
When a request to close a file is generated, the operating system
o removes all the locks (if in shared mode),
o saves the data (if altered) to the secondary storage media, and
o releases all the buffers and file handlers associated with the file.
The organization of data inside a file plays a major role here. The process of locating the file
pointer at a desired record inside a file varies depending on whether the records are arranged
sequentially or clustered.
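Most of these operations can be seen on an ordinary OS file. This is a minimal Python sketch that assumes a hypothetical 16-byte fixed-length record format. Fixed-length records let the file pointer jump straight to record n at offset n × 16, and move backward relative to its current position. It does not model the shared/exclusive open modes, which depend on the operating system.

```python
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record length in bytes

def encode(text):
    """Pad or truncate a record to exactly RECORD_SIZE bytes."""
    return text.encode("ascii")[:RECORD_SIZE].ljust(RECORD_SIZE, b" ")

def read_record(f):
    """Read one record at the current file pointer (the pointer advances)."""
    return f.read(RECORD_SIZE).decode("ascii").rstrip()

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "students.dat")
        # Open in write mode; the pointer starts at offset 0.
        with open(path, "wb") as f:
            for rec in ["101 Alex", "102 Maria", "103 Maya"]:
                f.write(encode(rec))
        # Leaving the `with` block closes the file and flushes its buffers.

        with open(path, "rb") as f:                # open in read mode
            f.seek(1 * RECORD_SIZE)                # locate record #1 (0-based) directly
            second = read_record(f)                # pointer is now at byte 32
            f.seek(-2 * RECORD_SIZE, os.SEEK_CUR)  # move the pointer backward
            first = read_record(f)
            position = f.tell()                    # current pointer offset
        return second, first, position
```

`demo()` returns `("102 Maria", "101 Alex", 16)`.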

DBMS – Transaction and Concurrency


A transaction can be defined as a group of tasks. A single task is the minimum processing unit
which cannot be divided further.
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from
A's account to B's account. This very simple and small transaction involves several low-level tasks.
A’s account:
    Open_Account(A)
    Old_Balance = A.balance
    New_Balance = Old_Balance - 500
    A.balance = New_Balance
    Close_Account(A)
B’s account:
    Open_Account(B)
    Old_Balance = B.balance
    New_Balance = Old_Balance + 500
    B.balance = New_Balance
    Close_Account(B)
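The low-level tasks above can be grouped into one database transaction. This is a minimal sketch, assuming SQLite through Python's sqlite3 module. The account table, the starting balances and the CHECK (balance >= 0) rule are hypothetical. B is credited before A is debited on purpose: if the debit fails, the earlier credit is undone as well, which is the atomicity property described next.

```python
import sqlite3

def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) violated by the debit
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))
```

Transferring 500 from A to B succeeds. A second transfer of 900 would overdraw A, so it fails and leaves both balances unchanged.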
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability
− commonly known as ACID properties − in order to ensure accuracy, completeness, and data
integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit, that
is, either all of its operations are executed or none. There must be no state in a database
where a transaction is left partially completed. States should be defined either before the
execution of the transaction or after the execution/abortion/failure of the transaction.
• Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
• Durability − The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be updated
once the system springs back into action.
• Isolation − In a database system where more than one transaction are being executed
simultaneously and in parallel, the property of isolation states that all the transactions will
be carried out and executed as if it is the only transaction in the system. No transaction
will affect the existence of any other transaction.
Serializability
When multiple transactions are being executed by the operating system in a multiprogramming
environment, there are possibilities that instructions of one transaction are interleaved with
some other transaction.
• Schedule − A chronological execution sequence of a transaction is called a schedule. A
schedule can have many transactions in it, each comprising a number of
instructions/tasks.
• Serial Schedule − It is a schedule in which transactions are aligned in such a way that one
transaction is executed first. When the first transaction completes its cycle, then the next
transaction is executed. Transactions are ordered one after the other. This type of
schedule is called a serial schedule, as transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered as a benchmark. The
execution sequence of an instruction in a transaction cannot be changed, but two transactions
can have their instructions executed in a random fashion. This execution does no harm if two
transactions are mutually independent and working on different segments of data; but in case
these two transactions are working on the same data, then the results may vary. This ever-varying
result may bring the database to an inconsistent state.
To resolve this problem, we allow parallel execution of a transaction schedule, if its transactions
are either serializable or have some equivalence relation among them.
Equivalence Schedules
An equivalence schedule can be of the following types −
Result Equivalence
If two schedules produce the same result after execution, they are said to be result equivalent.
They may yield the same result for some value and different results for another set of values.
That's why this equivalence is not generally considered significant.
View Equivalence
Two schedules are view equivalent if the transactions in both schedules perform
similar actions in a similar manner.
For example −
• If T reads the initial data in S1, then it also reads the initial data in S2.
• If T reads the value written by J in S1, then it also reads the value written by J in S2.
• If T performs the final write on the data value in S1, then it also performs the final write
on the data value in S2.
Conflict Equivalence
Two operations are said to be conflicting if they have the following properties −

• Both belong to separate transactions.
• Both access the same data item.
• At least one of them is a "write" operation.
Two schedules having multiple transactions with conflicting operations are said to be conflict
equivalent if and only if −

• Both the schedules contain the same set of Transactions.


• The order of conflicting pairs of operation is maintained in both the schedules.
Note − View equivalent schedules are view serializable and conflict equivalent schedules are
conflict serializable. All conflict serializable schedules are view serializable too.
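A common way to test conflict serializability is the precedence-graph method. The notes do not describe it, but it follows directly from the definitions above. Draw an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj. The schedule is conflict serializable exactly when this graph has no cycle, and a topological order of the graph gives an equivalent serial schedule. This is a minimal Python sketch using the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter, CycleError

def precedence_graph(schedule):
    """schedule: list of (transaction, 'R' or 'W', data_item) in execution order."""
    edges = {t: set() for t, _, _ in schedule}
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            # conflicting pair: different transactions, same item, at least one write
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges[ti].add(tj)            # ti's operation must stay before tj's
    return edges

def serial_order(schedule):
    """An equivalent serial order if the schedule is conflict serializable, else None."""
    graph = precedence_graph(schedule)
    preds = {t: set() for t in graph}        # TopologicalSorter wants node -> predecessors
    for ti, succs in graph.items():
        for tj in succs:
            preds[tj].add(ti)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:                       # a cycle means not conflict serializable
        return None

s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
      ("T2", "W", "A"), ("T1", "R", "B"), ("T2", "R", "B")]
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
```

`s1` is equivalent to running T1 then T2. In `s2`, T1 must precede T2 on item A while T2 must precede T1 on item B, so no serial order exists.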
States of Transactions
A transaction in a database can be in one of the following states −

• Active − In this state, the transaction is being executed. This is the initial state of every
transaction.
• Partially Committed − When a transaction executes its final operation, it is said to be in a
partially committed state.
• Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.
• Aborted − If any of the checks fails and the transaction has reached a failed state, then
the recovery manager rolls back all its write operations on the database to bring the
database back to its original state where it was prior to the execution of the transaction.
Transactions in this state are called aborted. The database recovery module can select one
of the two operations after a transaction aborts −
o Re-start the transaction
o Kill the transaction
• Committed − If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently established on the database system.
Concurrency Control
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories −

• Lock based protocols


• Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which no transaction
can read or write data until it acquires an appropriate lock on it. Locks are of two kinds −
• Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks based on their
uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive
lock. Allowing more than one transaction to write on the same data item would lead the
database into an inconsistent state. Read locks are shared because no data value is being
changed.
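The shared/exclusive rule can be written as a compatibility matrix and used to grant or refuse lock requests. This is a minimal Python sketch, and the `LockTable` class and its method names are hypothetical. A request that returns False means the transaction would have to wait.

```python
# May a lock of mode `requested` be granted while another transaction
# holds a lock of mode `held` on the same item?  Keys are (held, requested).
COMPATIBLE = {
    ("S", "S"): True,    # shared (read) locks can coexist
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,   # an exclusive (write) lock excludes all others
}

class LockTable:
    def __init__(self):
        self.held = {}   # data item -> {transaction: mode}

    def request(self, txn, item, mode):
        """Try to grant `mode` ('S' or 'X') on `item` to `txn`; False means wait."""
        holders = self.held.setdefault(item, {})
        if holders.get(txn) == "X":
            return True                      # already holds the strongest lock
        if all(COMPATIBLE[(held, mode)] for t, held in holders.items() if t != txn):
            holders[txn] = mode              # new lock, or S -> X upgrade when alone
            return True
        return False

    def release(self, txn, item):
        self.held.get(item, {}).pop(txn, None)
```

Two transactions can share a read lock on the same item. A third transaction's write lock is granted only after both readers release.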
There are four types of lock protocols available −
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write'
operation is performed. Transactions may unlock the data item after completing the ‘write’
operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks. Before initiating an execution, the transaction requests the system for all the locks it
needs beforehand. If all the locks are granted, the transaction executes and releases all the locks
when all its operations are over. If all the locks are not granted, the transaction rolls back and
waits until all the locks are granted.
Two-Phase Locking 2PL
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it
only releases the acquired locks.

Two-phase locking has two phases, one is growing, where all the locks are being acquired by the
transaction; and the second phase is shrinking, where the locks held by the transaction are being
released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it. Strict-2PL holds all the locks until the commit point and releases all the locks
at a time.
Unlike basic 2PL, Strict-2PL does not suffer from cascading aborts.
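Both rules can be checked on a single transaction's sequence of lock actions. This is a minimal Python sketch with a hypothetical action format ("lock", "unlock", "commit"). Under 2PL, no lock may be acquired after the first unlock (growing phase, then shrinking phase). Under Strict-2PL as described above, no lock is released before the commit point.

```python
def follows_2pl(ops):
    """ops: ordered (action, item) pairs; action is 'lock', 'unlock' or 'commit'."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True                 # the shrinking phase has started
        elif action == "lock" and shrinking:
            return False                     # acquiring a lock after releasing one
    return True

def follows_strict_2pl(ops):
    """2PL, and additionally every lock is held until the commit point."""
    committed = False
    for action, _item in ops:
        if action == "commit":
            committed = True
        elif action == "unlock" and not committed:
            return False                     # a lock was released before commit
    return follows_2pl(ops)
```

A transaction that unlocks A and then locks B breaks 2PL. One that releases A before committing follows 2PL but not Strict-2PL.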
Timestamp-based Protocols
Another widely used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004 is
two seconds younger and the priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system
know when the last ‘read and write’ operation was performed on the data item.
Timestamp Ordering Protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting
read and write operations. The protocol is responsible for ensuring that each conflicting pair of
tasks is executed according to the timestamp values of the transactions.

• The timestamp of transaction Ti is denoted as TS(Ti).


• Read time-stamp of data-item X is denoted by R-timestamp(X).
• Write time-stamp of data-item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows −
• If a transaction Ti issues a read(X) operation −
o If TS(Ti) < W-timestamp(X)
▪ Operation rejected.
o If TS(Ti) >= W-timestamp(X)
▪ Operation executed.
o R-timestamp(X) is updated to max(R-timestamp(X), TS(Ti)).
• If a transaction Ti issues a write(X) operation −
o If TS(Ti) < R-timestamp(X)
▪ Operation rejected.
o If TS(Ti) < W-timestamp(X)
▪Operation rejected and Ti rolled back.
o Otherwise, operation executed and W-timestamp(X) is set to TS(Ti).
Thomas' Write Rule
Under the basic timestamp ordering protocol, if TS(Ti) < W-timestamp(X), the write operation is
rejected and Ti is rolled back. Thomas' write rule modifies the time-stamp ordering rules to make
the schedule view serializable: instead of rolling Ti back, the obsolete 'write' operation itself is ignored.
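The timestamp-ordering checks, including the Thomas write-rule variant, fit in a small class. This is a minimal Python sketch with a hypothetical class name. It assumes transaction timestamps are positive integers, with 0 meaning "never read/written". A "rejected" operation means Ti is rolled back.

```python
class TimestampOrdering:
    """Basic timestamp-ordering checks; optionally applies Thomas' write rule."""

    def __init__(self, thomas_write_rule=False):
        self.r_ts = {}   # R-timestamp(X): largest TS of any transaction that read X
        self.w_ts = {}   # W-timestamp(X): TS of the transaction that last wrote X
        self.thomas = thomas_write_rule

    def read(self, ts, item):
        if ts < self.w_ts.get(item, 0):
            return "rejected"                # X already overwritten by a younger transaction
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return "executed"

    def write(self, ts, item):
        if ts < self.r_ts.get(item, 0):
            return "rejected"                # a younger transaction already read X
        if ts < self.w_ts.get(item, 0):
            # Obsolete write: basic ordering rolls Ti back; Thomas' rule skips the write.
            return "ignored" if self.thomas else "rejected"
        self.w_ts[item] = ts
        return "executed"
```

With a write at timestamp 5 on X, a read at timestamp 3 is rejected. A later write at timestamp 3 on the same item is rejected under the basic protocol but ignored under Thomas' write rule.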
DBMS - Deadlock
In a multi-process system, deadlock is an unwanted situation that arises in a shared resource
environment, where a process indefinitely waits for a resource that is held by another process.
For example, assume a set of transactions {T0, T1, T2, ...,Tn}. T0 needs a resource X to complete its
task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is waiting
for resource Z, which is held by T0. Thus, all the processes wait for each other to release resources.
In this situation, none of the processes can finish their task. This situation is known as a deadlock.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions
involved in the deadlock are either rolled back or restarted.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the
operations, where transactions are about to execute. The DBMS inspects the operations and
analyzes if they can create a deadlock situation. If it finds that a deadlock situation might occur,
then that transaction is never allowed to be executed.
There are deadlock prevention schemes that use timestamp ordering mechanism of transactions
in order to predetermine a deadlock situation.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item), which is already held with
a conflicting lock by another transaction, then one of the two possibilities may occur −
• If TS(Ti) < TS(Tj) − that is Ti, which is requesting a conflicting lock, is older than Tj − then Ti is
allowed to wait until the data-item is available.
• If TS(Ti) > TS(Tj) − that is Ti is younger than Tj − then Ti dies. Ti is restarted later with a
random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item), which is already held with
conflicting lock by another transaction, one of the two possibilities may occur −
• If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later
with a random delay but with the same timestamp.
• If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an
item held by a younger one, the older transaction forces the younger one to abort and release
the item.
In both the cases, the transaction that enters the system at a later stage is aborted.
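The wound-wait rule can be sketched the same way. As before, this is a hypothetical illustration that assumes numeric start timestamps, where smaller means older. The name `wound_wait` is made up for this example.

```python
def wound_wait(ts_requester: int, ts_holder: int) -> str:
    """Wound-wait: decide what happens when a transaction requests an item
    already locked (in a conflicting mode) by another transaction.

    Timestamps are start times, so a smaller value means an older transaction.
    """
    if ts_requester < ts_holder:
        # Older requester "wounds" the younger holder: the holder is rolled
        # back and restarted later with its original timestamp.
        return "wound holder"
    # Younger requester: it waits until the older holder releases the item.
    return "wait"


# Hypothetical example: T1 started at time 5 and T2 at time 10.
print(wound_wait(5, 10))   # T1 asks for an item held by T2 -> wound holder
print(wound_wait(10, 5))   # T2 asks for an item held by T1 -> wait
```

In both rules, the transaction that gets rolled back is always the one with the larger timestamp. The two schemes differ only in whether the requester or the holder pays that price.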
Deadlock Avoidance
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance
mechanisms can be used to detect a deadlock situation in advance. Methods such as the wait-for
graph are available, but they are suitable only for systems where transactions are
lightweight and there are fewer instances of each resource. In a bulky system, deadlock prevention
techniques may work better.
Wait-for Graph
This is a simple method for tracking whether a deadlock situation may arise. For each transaction
entering the system, a node is created. When a transaction Ti requests a lock on an item,
say X, which is held by some other transaction Tj, a directed edge is created from Ti to Tj. If
Tj releases item X, the edge between them is dropped and Ti locks the data item.
The system maintains this wait-for graph for every transaction waiting for data items held
by others, and keeps checking whether there is any cycle in the graph. A cycle means that the
transactions on it are deadlocked.

When a cycle is found, either of the following two approaches can be used:
• First, do not allow any request for an item that is already locked by another transaction.
This is not always feasible and may cause starvation, where a transaction waits indefinitely
for a data item and can never acquire it.
• The second option is to roll back one of the transactions. It is not always feasible to roll
back the younger transaction, as it may be more important than the older one. With the help of
some selection algorithm, a transaction is chosen to be aborted. This transaction
is known as the victim, and the process is known as victim selection.
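Cycle detection in the wait-for graph, followed by victim selection, can be sketched as follows. This is a sketch under stated assumptions. The graph is a plain dictionary that maps each transaction to the set of transactions it waits for. Cycles are found with a depth-first search. The victim policy, "abort the transaction with the lowest estimated cost", is a hypothetical stand-in for the unspecified selection algorithm, and the cost figures are made up.

```python
from typing import Dict, List, Optional, Set


def find_cycle(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of transactions,
    or None if the graph has no cycle (i.e. no deadlock).

    wait_for[ti] is the set of transactions that ti is waiting for, so each
    member tj stands for a directed edge ti -> tj.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / finished
    colour = {t: WHITE for t in wait_for}
    path: List[str] = []

    def dfs(t: str) -> Optional[List[str]]:
        colour[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = colour.get(u, WHITE)
            if state == GREY:
                # Edge back to a node on the current path: that stretch is a cycle.
                return path[path.index(u):]
            if state == WHITE:
                cycle = dfs(u)
                if cycle:
                    return cycle
        path.pop()
        colour[t] = BLACK
        return None

    for t in list(wait_for):
        if colour[t] == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None


def choose_victim(cycle: List[str], abort_cost: Dict[str, int]) -> str:
    """Hypothetical victim-selection policy: abort the cheapest transaction in the cycle."""
    return min(cycle, key=lambda t: abort_cost[t])


# The deadlock from the example above, assuming T0 is waiting for X (held by T1):
# T0 -> T1 (X), T1 -> T2 (Y), T2 -> T0 (Z).
deadlocked = {"T0": {"T1"}, "T1": {"T2"}, "T2": {"T0"}}
found = find_cycle(deadlocked)
print(found)                                               # ['T0', 'T1', 'T2']
print(choose_victim(found, {"T0": 30, "T1": 5, "T2": 12}))  # T1

# After T1 is rolled back, its node and edges disappear and no cycle remains.
print(find_cycle({"T0": set(), "T2": {"T0"}}))             # None
```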
DBMS - Data Backup
Loss of Volatile Storage
Volatile storage such as RAM holds all the active logs, disk buffers, and related data. In addition, it
holds all the transactions that are currently being executed. What happens if such volatile
storage crashes abruptly? All the logs and active copies of the database held in it are lost.
This makes recovery almost impossible, as everything that is required to recover the data
is gone.
The following techniques may be adopted in case of loss of volatile storage:
• We can have checkpoints at multiple stages so as to save the contents of the database
periodically.
• The state of the active database in volatile memory can be periodically dumped onto
stable storage, which may also contain logs, active transactions and buffer blocks.
• <dump> can be marked on a log file whenever the database contents are dumped from
volatile memory to stable storage.

Recovery
• When the system recovers from a failure, it can restore the latest dump.
• It can maintain a redo-list and an undo-list along with the checkpoints.
• It can recover the system by consulting the undo and redo lists to restore the state of all
transactions up to the last checkpoint.
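The following sketch shows how the redo-list and undo-list might be built from a log. It is a simplified illustration under several assumptions. The log record format is hypothetical. Work committed before the last checkpoint is assumed to be already on stable storage. Aborted transactions and the actual redo/undo of data changes are not handled. Transactions that commit after the last checkpoint go on the redo-list, and those that never commit go on the undo-list.

```python
from typing import List, Set, Tuple

LogRecord = Tuple[str, ...]


def build_recovery_lists(log: List[LogRecord]) -> Tuple[Set[str], Set[str]]:
    """Scan a simplified log and return (redo_list, undo_list).

    Hypothetical record formats, for illustration only:
      ("checkpoint", "T1", "T2", ...)  checkpoint listing the transactions active at that moment
      ("start", "T3")                  transaction T3 began
      ("commit", "T3")                 transaction T3 committed
    Other records (e.g. data updates) are ignored here.
    """
    # Assumption: everything before the most recent checkpoint is already on stable storage.
    last_cp = max((i for i, rec in enumerate(log) if rec[0] == "checkpoint"), default=-1)
    active: Set[str] = set(log[last_cp][1:]) if last_cp >= 0 else set()
    committed: Set[str] = set()

    for rec in log[last_cp + 1:]:
        kind, txn = rec[0], rec[1]
        if kind == "start":
            active.add(txn)
        elif kind == "commit":
            committed.add(txn)
            active.discard(txn)

    # Committed after the checkpoint -> redo; never committed -> undo.
    return committed, active


log = [
    ("start", "T1"), ("commit", "T1"),
    ("start", "T2"),
    ("checkpoint", "T2"),
    ("start", "T3"), ("commit", "T2"),
    ("start", "T4"), ("commit", "T3"),
]
redo_list, undo_list = build_recovery_lists(log)
print(sorted(redo_list), sorted(undo_list))   # ['T2', 'T3'] ['T4']
```

T1 committed before the checkpoint, so it appears in neither list.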
Database Backup & Recovery from Catastrophic Failure
A catastrophic failure is one where a stable, secondary storage device becomes corrupt. When the
storage device fails, all the valuable data stored on it is lost. There are two different strategies
for recovering data from such a catastrophic failure:
• Remote backup: a backup copy of the database is stored at a remote location
from where it can be restored in case of a catastrophe.
• Alternatively, database backups can be taken on magnetic tapes and stored at a safer
place. This backup can later be transferred onto a freshly installed database to bring it to
the point of backup.
Large, mature databases are too bulky to be backed up frequently. In such cases, there are techniques
that restore a database from its logs, so all that needs to be done is to
back up the logs at frequent intervals. The database itself can be backed up once a
week, while the logs, being very small, can be backed up every day or as frequently as possible.
Remote Backup
Remote backup provides a sense of security in case the primary location where the database is
located gets destroyed. Remote backup can be offline, or real-time (online). An offline remote
backup is maintained manually.

Online backup systems work in real time and are lifesavers for database administrators and
investors. In an online backup system, every bit of the real-time data is
backed up simultaneously at two distant places. One of them is directly connected to the system
and the other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses the failure and switches
the user system to the remote storage. Sometimes this is so instant that users do not even
notice the failure.
SQL

SQL is a database computer language designed for the retrieval and management of data in a
relational database. SQL stands for Structured Query Language. This tutorial gives you a
quick start to SQL. It covers most of the topics required for a basic understanding of SQL and for
getting a feel for how it works.

Why Learn SQL?

SQL is Structured Query Language, which is a computer language for storing, manipulating and
retrieving data stored in a relational database.
SQL is the standard language for relational database systems. All the Relational Database
Management Systems (RDBMS), such as MySQL, MS Access, Oracle, Sybase, Informix, Postgres and
SQL Server, use SQL as their standard database language.
However, they use different dialects, such as:
• MS SQL Server uses T-SQL,
• Oracle uses PL/SQL,
• the MS Access version of SQL is called JET SQL (native format), etc.

Applications of SQL

As mentioned before, SQL is one of the most widely used query languages for databases.
Some of its applications are listed here:
• Allows users to access data in relational database management systems.
• Allows users to describe the data.
• Allows users to define the data in a database and manipulate that data.
• Allows SQL to be embedded within other languages using SQL modules, libraries and pre-compilers.
• Allows users to create and drop databases and tables.
• Allows users to create views, stored procedures and functions in a database.
• Allows users to set permissions on tables, procedures and views.
SQL follows a unique set of rules and guidelines called syntax. This tutorial gives you a
quick start with SQL by listing all the basic SQL syntax.
All SQL statements start with a keyword such as SELECT, INSERT, UPDATE, DELETE,
ALTER, DROP, CREATE, USE or SHOW, and all statements end with a semicolon (;).
The most important point to note here is that SQL is case insensitive, which means SELECT
and select have the same meaning in SQL statements. However, MySQL may treat table
names as case sensitive, depending on the operating system. So, if you are working with MySQL,
you need to give table names exactly as they exist in the database.
SQL commands are grouped into four major categories depending on their functionality:
• Data Definition Language (DDL): these SQL commands are used for creating, modifying,
and dropping the structure of database objects. The commands are CREATE, ALTER,
DROP, RENAME, and TRUNCATE.
• Data Manipulation Language (DML): these SQL commands are used for storing,
retrieving, modifying, and deleting data. The Data Manipulation Language commands
are SELECT, INSERT, UPDATE, and DELETE.
• Transaction Control Language (TCL): these SQL commands are used for managing
changes affecting the data. These commands are COMMIT, ROLLBACK, and SAVEPOINT.
• Data Control Language (DCL): these SQL commands are used for providing security to
database objects. These commands are GRANT and REVOKE.
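The short sketch below shows DDL, DML and TCL commands working together. It uses Python's built-in `sqlite3` module with an in-memory database instead of the MySQL server used elsewhere in these notes, so it is illustrative rather than MySQL-specific. The `staff` table is hypothetical. SQLite has no user accounts, so the DCL commands GRANT and REVOKE cannot be shown this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.isolation_level = None          # autocommit mode: BEGIN/COMMIT/ROLLBACK are issued explicitly
cur = conn.cursor()

# DDL: define the structure of a (hypothetical) table.
cur.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML inside a transaction that is kept (TCL: COMMIT).
cur.execute("BEGIN")
cur.execute("INSERT INTO staff (id, name) VALUES (1, 'Ramesh')")
cur.execute("COMMIT")

# DML inside a transaction that is undone (TCL: ROLLBACK).
cur.execute("BEGIN")
cur.execute("INSERT INTO staff (id, name) VALUES (2, 'Khilan')")
cur.execute("ROLLBACK")

rows = cur.execute("SELECT id, name FROM staff").fetchall()
print(rows)   # [(1, 'Ramesh')]: the rolled-back insert is gone
```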
Various Syntax in SQL
All the examples given in this tutorial have been tested with a MySQL server.

SQL SELECT Statement


SELECT column1, column2....columnN
FROM table_name;
SQL DISTINCT Clause
SELECT DISTINCT column1, column2....columnN
FROM table_name;
SQL WHERE Clause
SELECT column1, column2....columnN
FROM table_name
WHERE CONDITION;
SQL AND/OR Clause
SELECT column1, column2....columnN
FROM table_name
WHERE CONDITION-1 {AND|OR} CONDITION-2;
SQL IN Clause
SELECT column1, column2....columnN
FROM table_name
WHERE column_name IN (val-1, val-2,...val-N);
SQL BETWEEN Clause
SELECT column1, column2....columnN
FROM table_name
WHERE column_name BETWEEN val-1 AND val-2;
SQL LIKE Clause
SELECT column1, column2....columnN
FROM table_name
WHERE column_name LIKE { PATTERN };
SQL ORDER BY Clause
SELECT column1, column2....columnN
FROM table_name
WHERE CONDITION
ORDER BY column_name {ASC|DESC};
SQL GROUP BY Clause
SELECT SUM(column_name)
FROM table_name
WHERE CONDITION
GROUP BY column_name;
SQL COUNT Function
SELECT COUNT(column_name)
FROM table_name
WHERE CONDITION;
SQL HAVING Clause
SELECT SUM(column_name)
FROM table_name
WHERE CONDITION
GROUP BY column_name
HAVING (arithmetic function condition);
SQL CREATE TABLE Statement
CREATE TABLE table_name(
column1 datatype,
column2 datatype,
column3 datatype,
.....
columnN datatype,
PRIMARY KEY( one or more columns )
);
SQL DROP TABLE Statement
DROP TABLE table_name;
SQL CREATE INDEX Statement
CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...columnN);
SQL DROP INDEX Statement
ALTER TABLE table_name
DROP INDEX index_name;
SQL DESC Statement
DESC table_name;
SQL TRUNCATE TABLE Statement
TRUNCATE TABLE table_name;
SQL ALTER TABLE Statement
ALTER TABLE table_name {ADD|DROP|MODIFY} column_name {data_type};
SQL ALTER TABLE Statement (Rename)
ALTER TABLE table_name RENAME TO new_table_name;
SQL INSERT INTO Statement
INSERT INTO table_name( column1, column2....columnN)
VALUES ( value1, value2....valueN);
SQL UPDATE Statement
UPDATE table_name
SET column1 = value1, column2 = value2....columnN=valueN
[ WHERE CONDITION ];
SQL DELETE Statement
DELETE FROM table_name
WHERE {CONDITION};
SQL CREATE DATABASE Statement
CREATE DATABASE database_name;
SQL DROP DATABASE Statement
DROP DATABASE database_name;
SQL USE Statement
USE database_name;
SQL COMMIT Statement
COMMIT;
SQL ROLLBACK Statement
ROLLBACK;
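Several of the query clauses above can be tried without a database server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, with the CUSTOMERS data used later in these notes. Two SQLite-specific behaviours are worth knowing. First, `LIKE` ignores the case of ASCII letters by default, so `'K%'` also matches `kaushik`. Second, column types follow SQLite's looser typing rules. The helper `col` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL);
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ramesh', 32, 'Ahmedabad', 2000.00), (2, 'Khilan', 25, 'Delhi', 1500.00),
        (3, 'kaushik', 23, 'Kota', 2000.00), (4, 'Chaitali', 25, 'Mumbai', 6500.00),
        (5, 'Hardik', 27, 'Bhopal', 8500.00), (6, 'Komal', 22, 'MP', 4500.00),
        (7, 'Muffy', 24, 'Indore', 10000.00);
""")


def col(sql: str) -> list:
    """Run a query and return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]


distinct_ages = col("SELECT DISTINCT AGE FROM CUSTOMERS ORDER BY AGE")
and_or = col("SELECT NAME FROM CUSTOMERS WHERE AGE < 26 AND SALARY > 5000 ORDER BY ID")
in_list = col("SELECT NAME FROM CUSTOMERS WHERE ADDRESS IN ('Delhi', 'Kota') ORDER BY ID")
between = col("SELECT ID FROM CUSTOMERS WHERE AGE BETWEEN 23 AND 25 ORDER BY ID")
like = col("SELECT NAME FROM CUSTOMERS WHERE NAME LIKE 'K%' ORDER BY ID")
by_salary = col("SELECT NAME FROM CUSTOMERS ORDER BY SALARY DESC")
total = col("SELECT SUM(SALARY) FROM CUSTOMERS WHERE AGE > 25")
having = conn.execute(
    "SELECT AGE, COUNT(*) FROM CUSTOMERS GROUP BY AGE HAVING COUNT(*) > 1"
).fetchall()

print(and_or)    # ['Chaitali', 'Muffy']
print(between)   # [2, 3, 4, 7]
print(having)    # [(25, 2)]
```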
SQL Data Type is an attribute that specifies the type of data of any object. Each column, variable
and expression has a related data type in SQL. You can use these data types while creating your
tables. You can choose a data type for a table column based on your requirement.

SQL Server offers the categories of data types listed below:
Exact Numeric Data Types
DATA TYPE     FROM                          TO
bigint        -9,223,372,036,854,775,808    9,223,372,036,854,775,807
int           -2,147,483,648                2,147,483,647
smallint      -32,768                       32,767
tinyint       0                             255
bit           0                             1
decimal       -10^38 +1                     10^38 -1
numeric       -10^38 +1                     10^38 -1
money         -922,337,203,685,477.5808     +922,337,203,685,477.5807
smallmoney    -214,748.3648                 +214,748.3647

Approximate Numeric Data Types


DATA TYPE     FROM          TO
float         -1.79E+308    1.79E+308
real          -3.40E+38     3.40E+38

Date and Time Data Types


DATA TYPE        FROM            TO
datetime         Jan 1, 1753     Dec 31, 9999
smalldatetime    Jan 1, 1900     Jun 6, 2079
date             Stores a date like June 30, 1991
time             Stores a time of day like 12:30 P.M.

Note: datetime has an accuracy of 3.33 milliseconds, whereas smalldatetime has an accuracy of 1 minute.

Character Strings Data Types


No.  DATA TYPE       Description
1    char            Maximum length of 8,000 characters (fixed-length non-Unicode characters).
2    varchar         Maximum of 8,000 characters (variable-length non-Unicode data).
3    varchar(max)    Maximum storage of 2^31 - 1 bytes (variable-length non-Unicode data; SQL Server 2005 and later).
4    text            Variable-length non-Unicode data with a maximum length of 2,147,483,647 characters.
Unicode Character Strings Data Types
No.  DATA TYPE       Description
1    nchar           Maximum length of 4,000 characters (fixed-length Unicode).
2    nvarchar        Maximum length of 4,000 characters (variable-length Unicode).
3    nvarchar(max)   Maximum storage of 2^31 - 1 bytes (variable-length Unicode; SQL Server 2005 and later).
4    ntext           Maximum length of 1,073,741,823 characters (variable-length Unicode).
Binary Data Types
No.  DATA TYPE        Description
1    binary           Maximum length of 8,000 bytes (fixed-length binary data).
2    varbinary        Maximum length of 8,000 bytes (variable-length binary data).
3    varbinary(max)   Maximum length of 2^31 - 1 bytes (variable-length binary data; SQL Server 2005 and later).
4    image            Maximum length of 2,147,483,647 bytes (variable-length binary data).
Misc Data Types
No.  DATA TYPE          Description
1    sql_variant        Stores values of various SQL Server-supported data types, except text, ntext, and timestamp.
2    timestamp          Stores a database-wide unique number that gets updated every time a row gets updated.
3    uniqueidentifier   Stores a globally unique identifier (GUID).
4    xml                Stores XML data. You can store xml instances in a column or a variable (SQL Server 2005 and later).
5    cursor             Reference to a cursor object.
6    table              Stores a result set for later processing.

What is an Operator in SQL?


An operator is a reserved word or a character used primarily in an SQL statement's WHERE clause
to perform operations, such as comparisons and arithmetic operations. Operators are used
to specify conditions in an SQL statement and to serve as conjunctions for multiple conditions in a
statement. The main kinds of SQL operators are:

• Arithmetic operators
• Comparison operators
• Logical operators
• Operators used to negate conditions

SQL Arithmetic Operators


Assume 'variable a' holds 10 and 'variable b' holds 20, then:

Operator             Description                                                   Example
+ (Addition)         Adds values on either side of the operator.                   a + b will give 30
- (Subtraction)      Subtracts the right-hand operand from the left-hand operand.  a - b will give -10
* (Multiplication)   Multiplies values on either side of the operator.             a * b will give 200
/ (Division)         Divides the left-hand operand by the right-hand operand.      b / a will give 2
% (Modulus)          Divides the left-hand operand by the right-hand operand       b % a will give 0
                     and returns the remainder.
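The table above can be checked with Python's built-in `sqlite3` module, binding a = 10 and b = 20 as named query parameters. SQLite stands in for the MySQL server here. One dialect difference matters: with two integer operands, SQLite's `/` performs integer division, whereas MySQL's `/` returns a decimal result (7 / 2 gives 3.5000 in MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Named parameters :a and :b hold the values assumed in the table above.
row = conn.execute(
    "SELECT :a + :b, :a - :b, :a * :b, :b / :a, :b % :a",
    {"a": 10, "b": 20},
).fetchone()
print(row)      # (30, -10, 200, 2, 0)

# Integer division and remainder in SQLite.
int_div = conn.execute("SELECT 7 / 2, 7 % 2").fetchone()
print(int_div)  # (3, 1)
```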

SQL Comparison Operators


Assume 'variable a' holds 10 and 'variable b' holds 20, then:

Operator   Description                                                      Example
=          Checks if the values of two operands are equal or not; if yes,   (a = b) is not true.
           then the condition becomes true.
!=         Checks if the values of two operands are equal or not; if the    (a != b) is true.
           values are not equal, then the condition becomes true.
<>         Checks if the values of two operands are equal or not; if the    (a <> b) is true.
           values are not equal, then the condition becomes true.
>          Checks if the value of the left operand is greater than the      (a > b) is not true.
           value of the right operand; if yes, then the condition becomes true.
<          Checks if the value of the left operand is less than the         (a < b) is true.
           value of the right operand; if yes, then the condition becomes true.
>=         Checks if the value of the left operand is greater than or       (a >= b) is not true.
           equal to the value of the right operand; if yes, then the condition becomes true.
<=         Checks if the value of the left operand is less than or equal    (a <= b) is true.
           to the value of the right operand; if yes, then the condition becomes true.
!<         Checks if the value of the left operand is not less than the     (a !< b) is false.
           value of the right operand; if yes, then the condition becomes true.
!>         Checks if the value of the left operand is not greater than the  (a !> b) is true.
           value of the right operand; if yes, then the condition becomes true.

Note: !< and !> are not standard SQL. They are supported by SQL Server (T-SQL) but not by MySQL;
NOT (a < b) and NOT (a > b) express the same conditions portably.
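The comparison results can be reproduced with Python's `sqlite3` module, which returns 1 for true and 0 for false. SQLite, like MySQL, has no `!<` or `!>`, so the sketch uses the portable `NOT (...)` forms instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
params = {"a": 10, "b": 20}

# =, !=, <>, >, <, >=, <= in the same order as the table above (1 = true, 0 = false).
row = conn.execute(
    "SELECT :a = :b, :a != :b, :a <> :b, :a > :b, :a < :b, :a >= :b, :a <= :b",
    params,
).fetchone()
print(row)        # (0, 1, 1, 0, 1, 0, 1)

# Portable equivalents of a !< b and a !> b.
not_forms = conn.execute("SELECT NOT (:a < :b), NOT (:a > :b)", params).fetchone()
print(not_forms)  # (0, 1)
```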

SQL Logical Operators


Here is a list of the logical operators available in SQL.

No.  Operator   Description
1    ALL        The ALL operator is used to compare a value to all values in another value set.
2    AND        The AND operator allows the existence of multiple conditions in an SQL statement's WHERE clause.
3    ANY        The ANY operator is used to compare a value to any applicable value in the list as per the condition.
4    BETWEEN    The BETWEEN operator is used to search for values that are within a range, given the minimum value and the maximum value.
5    EXISTS     The EXISTS operator is used to search for the presence of a row in a specified table that meets a certain criterion.
6    IN         The IN operator is used to compare a value to a list of literal values that have been specified.
7    LIKE       The LIKE operator is used to compare a value to similar values using wildcard operators.
8    NOT        The NOT operator reverses the meaning of the logical operator with which it is used,
                e.g. NOT EXISTS, NOT BETWEEN, NOT IN. This is a negation operator.
9    OR         The OR operator is used to combine multiple conditions in an SQL statement's WHERE clause.
10   IS NULL    The IS NULL operator is used to test whether a value is NULL.
11   UNIQUE     The UNIQUE operator searches every row of a specified table for uniqueness (no duplicates).
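Several of these logical operators are shown below using Python's `sqlite3` module as a stand-in for MySQL. The sketch uses a trimmed copy of the CUSTOMERS table (no ADDRESS column) and a hypothetical ORDERS table with made-up rows, which exists only to illustrate EXISTS. SQLite does not support ALL/ANY comparisons with subqueries or the UNIQUE predicate, so those are not shown. The last query illustrates why IS NULL is needed: `NULL = NULL` yields NULL (Python `None`), not true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME TEXT, AGE INT, SALARY REAL);
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ramesh', 32, 2000.00), (2, 'Khilan', 25, 1500.00),
        (3, 'kaushik', 23, 2000.00), (4, 'Chaitali', 25, 6500.00),
        (5, 'Hardik', 27, 8500.00), (6, 'Komal', 22, 4500.00),
        (7, 'Muffy', 24, 10000.00);
    -- Hypothetical ORDERS table, used only to illustrate EXISTS.
    CREATE TABLE ORDERS (OID INT PRIMARY KEY, CUSTOMER_ID INT, AMOUNT REAL);
    INSERT INTO ORDERS VALUES (100, 3, 3000), (101, 2, 1560), (102, 3, 1500);
""")


def col(sql: str) -> list:
    """Run a query and return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]


# BETWEEN combined with OR
between_or = col("SELECT ID FROM CUSTOMERS "
                 "WHERE AGE BETWEEN 25 AND 27 OR SALARY > 9000 ORDER BY ID")
# NOT negating IN
not_in = col("SELECT ID FROM CUSTOMERS WHERE AGE NOT IN (25, 32) ORDER BY ID")
# EXISTS: customers with at least one order
with_orders = col("SELECT NAME FROM CUSTOMERS c WHERE EXISTS "
                  "(SELECT 1 FROM ORDERS o WHERE o.CUSTOMER_ID = c.ID) ORDER BY c.ID")
# = cannot detect NULL; IS NULL can.
null_checks = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()

print(between_or, not_in, with_orders, null_checks)
# [2, 4, 5, 7] [3, 5, 6, 7] ['Khilan', 'kaushik'] (None, 1)
```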

The SQL CREATE DATABASE statement is used to create a new SQL database.
Syntax
The basic syntax of this CREATE DATABASE statement is as follows −

CREATE DATABASE DatabaseName;

The database name must always be unique within the RDBMS.


Example
If you want to create a new database <testDB>, then the CREATE DATABASE statement would be as
shown below −
SQL> CREATE DATABASE testDB;

Make sure you have the admin privilege before creating any database. Once a database is created,
you can check it in the list of databases as follows −

SQL> SHOW DATABASES;


+--------------------+
| Database           |
+--------------------+
| information_schema |
| AMROOD             |
| TUTORIALSPOINT     |
| mysql              |
| orig               |
| test               |
| testDB             |
+--------------------+
7 rows in set (0.00 sec)
The SQL DROP DATABASE statement is used to drop an existing database in SQL schema.
Syntax
The basic syntax of DROP DATABASE statement is as follows −

DROP DATABASE DatabaseName;

The database name must always be unique within the RDBMS.


Example
If you want to delete an existing database <testDB>, then the DROP DATABASE statement would be
as shown below −
SQL> DROP DATABASE testDB;

NOTE: Be careful before using this operation, because deleting an existing database results
in the loss of all the information stored in it.
Make sure you have the admin privilege before dropping any database. Once a database is dropped,
you can check it in the list of the databases as shown below −
SQL> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| AMROOD             |
| TUTORIALSPOINT     |
| mysql              |
| orig               |
| test               |
+--------------------+
6 rows in set (0.00 sec)

Creating a basic table involves naming the table and defining its columns and each column's data
type.
The SQL CREATE TABLE statement is used to create a new table.
Syntax
The basic syntax of the CREATE TABLE statement is as follows −

CREATE TABLE table_name(
column1 datatype,
column2 datatype,
column3 datatype,
.....
columnN datatype,
PRIMARY KEY( one or more columns )
);

CREATE TABLE is the keyword telling the database system what you want to do. In this case, you
want to create a new table. The unique name or identifier for the table follows the CREATE
TABLE statement.
Then in brackets comes the list defining each column in the table and what sort of data type it
is. The syntax becomes clearer with the following example.
A copy of an existing table can be created using a combination of the CREATE TABLE statement
and the SELECT statement. You can check the complete details at Create Table Using another
Table.
Example
The following code block is an example that creates a CUSTOMERS table with ID as the primary
key. The NOT NULL constraints show that these fields cannot be NULL when records are created
in this table:
SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);

You can verify whether your table has been created successfully by looking at the message displayed by the
SQL server; otherwise, you can use the DESC command as follows:

SQL> DESC CUSTOMERS;


+---------+---------------+------+-----+---------+-------+
| Field   | Type          | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID      | int(11)       | NO   | PRI |         |       |
| NAME    | varchar(20)   | NO   |     |         |       |
| AGE     | int(11)       | NO   |     |         |       |
| ADDRESS | char(25)      | YES  |     | NULL    |       |
| SALARY  | decimal(18,2) | YES  |     | NULL    |       |
+---------+---------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

Now, you have CUSTOMERS table available in your database which you can use to store the required
information related to customers.
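The same CREATE TABLE statement can be run with Python's built-in `sqlite3` module as a stand-in for MySQL. SQLite has no DESC command. The sketch uses `PRAGMA table_info`, which returns one row per column in the form (cid, name, type, notnull, dflt_value, pk). SQLite accepts declared types such as VARCHAR (20) but does not enforce their lengths. It does enforce NOT NULL, as the rejected insert shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMERS(
        ID      INT             NOT NULL,
        NAME    VARCHAR (20)    NOT NULL,
        AGE     INT             NOT NULL,
        ADDRESS CHAR (25),
        SALARY  DECIMAL (18, 2),
        PRIMARY KEY (ID)
    )
""")

# Rough SQLite counterpart of DESC CUSTOMERS.
columns = conn.execute("PRAGMA table_info(CUSTOMERS)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "NULL", "PRI" if pk else "")

# The NOT NULL constraint is enforced: a row without a NAME is rejected.
try:
    conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, NULL, 30)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```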

The SQL DROP TABLE statement is used to remove a table definition and all the data, indexes,
triggers, constraints and permission specifications for that table.
NOTE − You should be very careful while using this command because once a table is deleted then
all the information available in that table will also be lost forever.
Syntax
The basic syntax of this DROP TABLE statement is as follows −

DROP TABLE table_name;


Example
Let us first verify the CUSTOMERS table and then delete it from the database, as shown below:

SQL> DESC CUSTOMERS;
+---------+---------------+------+-----+---------+-------+
| Field   | Type          | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID      | int(11)       | NO   | PRI |         |       |
| NAME    | varchar(20)   | NO   |     |         |       |
| AGE     | int(11)       | NO   |     |         |       |
| ADDRESS | char(25)      | YES  |     | NULL    |       |
| SALARY  | decimal(18,2) | YES  |     | NULL    |       |
+---------+---------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

This means that the CUSTOMERS table is available in the database, so let us now drop it as shown
below.
SQL> DROP TABLE CUSTOMERS;
Query OK, 0 rows affected (0.01 sec)

Now, if you try the DESC command, you will get the following error:
SQL> DESC CUSTOMERS;
ERROR 1146 (42S02): Table 'TEST.CUSTOMERS' doesn't exist

Here, TEST is the database name which we are using for our examples.
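The same behaviour can be observed with Python's `sqlite3` module, which stands in for MySQL here. After DROP TABLE, any query on the table fails. SQLite reports this as an `OperationalError` ("no such table") rather than MySQL's error 1146. The variant `DROP TABLE IF EXISTS`, accepted by both MySQL and SQLite, avoids an error when the table is already gone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INT NOT NULL PRIMARY KEY, NAME VARCHAR(20) NOT NULL)")
conn.execute("DROP TABLE CUSTOMERS")

try:
    conn.execute("SELECT * FROM CUSTOMERS")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)   # no such table: CUSTOMERS

# Safe to run even though the table no longer exists.
conn.execute("DROP TABLE IF EXISTS CUSTOMERS")
```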

The SQL INSERT INTO Statement is used to add new rows of data to a table in the database.

Syntax

There are two basic syntaxes of the INSERT INTO statement, which are shown below.

INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);

Here, column1, column2, column3,...columnN are the names of the columns in the table into
which you want to insert the data.
You do not need to specify the column names in the SQL query if you are adding values for
all the columns of the table. However, make sure the values are in the same order as the
columns in the table.

In that case, the SQL INSERT INTO syntax is as follows:

INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);


Example

The following statements would create six records in the CUSTOMERS table.

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (2, 'Khilan', 25, 'Delhi', 1500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (3, 'kaushik', 23, 'Kota', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (5, 'Hardik', 27, 'Bhopal', 8500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (6, 'Komal', 22, 'MP', 4500.00 );

You can create a record in the CUSTOMERS table by using the second syntax as shown below.

INSERT INTO CUSTOMERS
VALUES (7, 'Muffy', 24, 'Indore', 10000.00 );

All the above statements would produce the following records in the CUSTOMERS table as
shown below.

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
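Both INSERT syntaxes can be exercised from Python's built-in `sqlite3` module, used here in place of the MySQL server. The sketch passes values through `?` placeholders. The database driver then handles quoting, so values are never pasted into the SQL string by hand. This is the usual practice when SQL is embedded in a program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INT NOT NULL PRIMARY KEY, NAME VARCHAR(20) NOT NULL, "
             "AGE INT NOT NULL, ADDRESS CHAR(25), SALARY DECIMAL(18, 2))")

# First syntax: the column list is given explicitly.
rows = [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
]
conn.executemany(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) VALUES (?, ?, ?, ?, ?)", rows)

# Second syntax: no column list, so values must follow the table's column order.
conn.execute("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", (7, "Muffy", 24, "Indore", 10000.00))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
print(count)  # 7
```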
Populate one table using another table

You can populate a table with data through a SELECT statement over another table,
provided the other table has the set of fields required to populate the first table.

Here is the syntax:

INSERT INTO first_table_name [(column1, column2, ... columnN)]
SELECT column1, column2, ...columnN
FROM second_table_name
[WHERE condition];
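A minimal INSERT ... SELECT example follows, run with Python's `sqlite3` module instead of MySQL. The target table HIGH_EARNERS and the salary threshold are hypothetical. They were chosen only to show a second table being filled from selected rows and columns of CUSTOMERS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL);
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ramesh', 32, 'Ahmedabad', 2000.00), (2, 'Khilan', 25, 'Delhi', 1500.00),
        (3, 'kaushik', 23, 'Kota', 2000.00), (4, 'Chaitali', 25, 'Mumbai', 6500.00),
        (5, 'Hardik', 27, 'Bhopal', 8500.00), (6, 'Komal', 22, 'MP', 4500.00),
        (7, 'Muffy', 24, 'Indore', 10000.00);
    -- Hypothetical second table that needs only some of the CUSTOMERS fields.
    CREATE TABLE HIGH_EARNERS (ID INT PRIMARY KEY, NAME TEXT, SALARY REAL);
""")

conn.execute("""
    INSERT INTO HIGH_EARNERS (ID, NAME, SALARY)
    SELECT ID, NAME, SALARY
    FROM CUSTOMERS
    WHERE SALARY > 5000
""")

copied = conn.execute("SELECT ID, NAME FROM HIGH_EARNERS ORDER BY ID").fetchall()
print(copied)  # [(4, 'Chaitali'), (5, 'Hardik'), (7, 'Muffy')]
```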

The SQL SELECT statement is used to fetch the data from a database table which returns this data in
the form of a result table. These result tables are called result-sets.
Syntax
The basic syntax of the SELECT statement is as follows −

SELECT column1, column2, columnN FROM table_name;

Here, column1, column2... are the fields of a table whose values you want to fetch. If you want to
fetch all the fields available in the table, then you can use the following syntax.
SELECT * FROM table_name;
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

The following code is an example, which would fetch the ID, Name and Salary fields of the customers
available in CUSTOMERS table.
SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;

This would produce the following result −

+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  1 | Ramesh   |  2000.00 |
|  2 | Khilan   |  1500.00 |
|  3 | kaushik  |  2000.00 |
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+

If you want to fetch all the fields of the CUSTOMERS table, then you should use the following query.
SQL> SELECT * FROM CUSTOMERS;

This would produce the result as shown below.

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
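The two SELECT forms can also be issued from a program. The sketch below uses Python's `sqlite3` module in place of MySQL. It reads the column names of each result-set from `cursor.description`. ORDER BY ID is added here because SQL does not guarantee the order of rows in a result-set unless one is requested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL);
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ramesh', 32, 'Ahmedabad', 2000.00), (2, 'Khilan', 25, 'Delhi', 1500.00),
        (3, 'kaushik', 23, 'Kota', 2000.00), (4, 'Chaitali', 25, 'Mumbai', 6500.00),
        (5, 'Hardik', 27, 'Bhopal', 8500.00), (6, 'Komal', 22, 'MP', 4500.00),
        (7, 'Muffy', 24, 'Indore', 10000.00);
""")

# Selected columns only.
cur = conn.execute("SELECT ID, NAME, SALARY FROM CUSTOMERS ORDER BY ID")
selected_header = [d[0] for d in cur.description]
selected_rows = cur.fetchall()

# Every column, using *.
cur = conn.execute("SELECT * FROM CUSTOMERS ORDER BY ID")
all_header = [d[0] for d in cur.description]
all_rows = cur.fetchall()

print(selected_header)   # ['ID', 'NAME', 'SALARY']
print(selected_rows[0])  # (1, 'Ramesh', 2000.0)
print(all_header)        # ['ID', 'NAME', 'AGE', 'ADDRESS', 'SALARY']
```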
EMERGING TRENDS
Trends in Database Management
Concepts in database management rarely come and go quickly, because the cost of
shifting between technical approaches is high for producers, managers, and designers.
However, there are several trends in database management, and knowing how to take
advantage of them will benefit your organization. The following are some of the current trends:
1. Databases that bridge SQL/NoSQL
The latest trends in database products are those that don’t simply embrace a single database
structure. Instead, the databases bridge SQL and NoSQL, giving users the best capabilities
offered by both. This includes products that allow users to access a NoSQL database in the
same way as a relational database, for example.
2. Databases in the cloud/Platform as a Service
As developers continue pushing their enterprises to the cloud, organizations are carefully
weighing the trade-offs associated with public versus private. Developers are also determining
how to combine cloud services with existing applications and infrastructure. Providers of cloud
service offer many options to database administrators. Making the move towards the cloud
doesn’t mean changing organizational priorities, but finding products and services that help
your group meet its goals.
3. Automated management
Automating database management is another emerging trend. Such techniques and
tools are intended to simplify maintenance, patching, provisioning, updates and upgrades, and even
project workflow. However, the trend may have limited usefulness, since database management
frequently needs human intervention.
4. An increased focus on security
While not exactly a trend, given the constant focus on data security, the recent retail
database breaches among US-based organizations show clearly how important it is for
database administrators to work hand-in-hand with their IT security colleagues to ensure all
enterprise data remains safe. Any organization that stores data is vulnerable. Database
administrators must also work with the security team to eliminate potential internal
weaknesses that could make data vulnerable. These could include issues related to network
privileges, or even hardware or software misconfigurations that could be misused, resulting in data
leaks.
5. In-memory databases
Within the data warehousing community there are ongoing questions about columnar versus
row-based relational tables, the rise of in-memory databases, the use of flash or solid-state
disks (which also applies to transaction processing), clustered versus non-clustered solutions,
and so on.
6. Big Data
To be clear, big data does not necessarily mean lots of data. What it really refers to is the ability
to process any type of data: what is typically referred to as semi-structured and unstructured
data as well as structured data. Current thinking is that these will typically live alongside
conventional solutions as separate technologies, at least in large organisations, but this will not
always be the case.
Integrating Trends
Projects involving databases should not be judged solely on how closely they adhere
to these trends. Ideally, each tool or process should merge in some meaningful way
with existing operations. It is also important to see these trends as items that can coincide:
for example, can enhanced security and a move to the cloud coexist?
The Top Challenges and Solutions of Database Management
No matter what field you work in, there will be changes over time. As technology becomes
more and more advanced, everyone from doctors to politicians and athletes must learn to use
these changes to their advantage. While other professions have encountered these changes,
few have experienced them on the same level as database administrators.
Thirty years after the computerization of databases, the Internet has led to exponential
growth within the industry: directly or indirectly, everything that compiles data uses a
database. Recent years have seen the production and capture of a nearly overwhelming amount
of data. This has obviously created opportunities
for businesses to gain visibility into their customers and industry, but it has also created many
challenges in database management.
Database Management Problems
• Data Integration from Various Sources: with the advancement of smartphones, new mobile
applications, and the Internet of Things, businesses must be able to adapt their data
accordingly. These varying types of data and sources cause a typical data center today to
contain a patchwork of data management technologies. The management techniques have
become more diverse than ever.
• Public and Private Data Security: in today’s digital world, security is the most prevalent
concern. Businesses must be able to ensure that every bit of their data remains safe and at
limited risk of exposure from hackers or leaks. Database breaches of highly sensitive
information have destroyed the reputations of businesses. It is up to the database manager
to ensure that the data is fully secured at all times.
• The Management of Cloud-Based Databases: in recent years, the cloud has become one of
the biggest terms in the tech community. Both businesses and consumers want to be able to
access their data in the cloud, on a cloud database provider’s servers, in
addition to the standard on-premises mode of deployment. Cloud computing enables users to
allocate resources effectively, optimize scaling, and achieve high availability. Handling
databases that run both in the cloud and on-premises is yet another challenge for database
managers.
• The Growth of Structured and Unstructured Data: the amount of data that is being both
created and collected has been growing at an unprecedented rate for years. Those who deal
with analytics may be excited by the promise of insight and business intelligence that comes
from big data, but those who manage databases face the challenges that come with
managing overall growth and data types from an increasing number of database platforms.
Database Management Solutions
There are four main areas to consider when approaching these database
problems. The following are a few things to consider as solutions:
• Data Strategy
o What kind of data is important and what kind of performance should be achieved?
What data needs to be protected and what should be analyzed?
o How much historical data must be accumulated? What does this mean for capacity
planning and disk space?
o Can you monetize your data? Which data needs to be aggregated or correlated to
provide the necessary insights into the business?
• Database Support
o You must consider that moving to the cloud does not guarantee data backup and
security. This is something that must still be managed with 24/7 monitoring and
coverage.
o Are personnel with the necessary skill sets always available?
• Backup Strategy
o Do you have the right kind of backup retention available?
o Have you determined your Recovery Point Objective (RPO) and the backup frequency
needed to meet it?
o Have you determined the Recovery Time Objective (RTO) that your high availability
requirements demand?
• Security Strategy
o How will external and internal security be handled? Who can access what?
o What kind of data access policies should be in place?
o How are regulatory requirements handled?
o In the event of a hack, breach, or leak, how will data exposure be handled?
