Database Notes New
DATABASES.
What is a Database?
Examples of databases.
DATABASE CONCEPTS.
A Database is a common data pool, maintained to support the various activities taking place
within an organization.
The database is an organized set of data items that reduces duplications of the stored files.
These refer to the traditional methods of storing files, i.e., the use of paper files. E.g., Manual &
Flat files.
• In Integrated file systems, several inter-independent files are maintained for the different
users’ requirements.
• The Integrated file systems have the problems of data duplication.
• In order to carry out any file processing task(s), all the related files have to be processed.
• Some information resulting from several files may not be available, giving the overall
state of affairs of the system.
DATABASE MAINTENANCE.
A Database cannot be created fully at once. Its creation and maintenance are gradual and
continuous. The creation & maintenance of databases are carried out through a set of
programs known as the Database Management System (DBMS).
Through the DBMS, users communicate their requirements to the database using Data
Description Languages (DDL’s) & Data Manipulation Languages (DML’s).
In fact, the DBMS provide an interface between the user’s programs and the contents of the
database.
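As a rough illustration of the two language families, the sketch below uses Python's built-in sqlite3 module as a stand-in DBMS, with SQL statements playing the DDL and DML roles. The customer table, its columns, and the sample names are hypothetical and are not taken from these notes.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (assumption for illustration).
conn = sqlite3.connect(":memory:")

# DDL: describe the structure of the data (hypothetical "customer" table).
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML: add, change, remove and read the data held in that structure.
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Amina')")
conn.execute("INSERT INTO customer (id, name) VALUES (2, 'Brian')")
conn.execute("UPDATE customer SET name = 'Brian K.' WHERE id = 2")
conn.execute("DELETE FROM customer WHERE id = 1")

rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
print(rows)
```

The CREATE statement defines what may be stored, while the INSERT, UPDATE, DELETE and SELECT statements work on the stored contents.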
During the creation & subsequent maintenance of the database, the DDL’s & DML’s are used to:
Data Dictionary.
All definitions of elements in the system are described in detail in a Data dictionary.
The elements of the system that are defined are: Dataflow, Processes, and Data stores.
If a database administrator wants to know the definition of a data item name or the content of
a particular dataflow, the information should be available in the dictionary.
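Many DBMS products keep a built-in catalog that plays a similar role for stored data. As a hedged sketch, SQLite (through Python's sqlite3 module) exposes its catalog via the sqlite_master table and the PRAGMA table_info command; the part table below is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part (part_no INTEGER PRIMARY KEY, description TEXT, location TEXT)"
)

# sqlite_master is SQLite's built-in catalog: one row per table, index, view, etc.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(part)")]

print(tables)
print(columns)
```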
Notes.
• Databases are used for several purposes, e.g., in Accounting – used for maintenance of
the customer files within the base.
• Database systems are installed & coordinated by a Database Administrator, who has the
overall authority to establish and control data definitions and standards.
• Database storage requires large direct-access storage (e.g., a disk) that is maintained
on-line.
• The database contents should be backed up, after every update or maintenance run, to
supplement the database contents in case of loss. The backup media to be used is chosen
by the organization.
Data Bank.
A Data Bank can be defined as a collection of data, usually for several users, and available to
several organizations.
Notes.
The Database & the Data Bank have similar construction and purpose. The only difference is
that, the term Data Bank is used to describe a larger capacity base, whose contents are mostly
of historical references (i.e., the Data Bank forms the basis for data or information that is
usually generated periodically). On the other hand, the contents of the Database are used
frequently to generate information that influences the decisions of the concerned organization.
TYPES OF DATABASE MODELS.
A Relational database is a set of data where all the items are related.
The data elements in a Relational database are stored or organized in tables. A Table consists
of rows & columns. Each column represents a Field, while a row represents a Record. The
records are grouped under fields.
A Relational database system has the ability to quickly find & bring together information stored
in separate tables using queries, forms, & reports. This means that a data element in
any one table can be related to any piece of data in another table as long as both tables share
common data elements.
• Microsoft Access.
• FileMaker Pro.
It is a data structure where the data is organized like a family tree or an organization chart.
In a Hierarchical database, the records are stored in multiple levels. Units further down the
system are subordinate to the ones above.
In other words, the database has branches made up of parent and child records. Each parent
record can have multiple child records, but each child can have only one parent.
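The parent and child rule can be sketched in a few lines of Python. The record names below are hypothetical; storing exactly one parent per record is what enforces the "only one parent" rule in this sketch.

```python
# Each record stores the name of its single parent (None for the root).
parent_of = {
    "Company": None,
    "Sales": "Company",
    "Production": "Company",
    "Sales Team A": "Sales",
    "Sales Team B": "Sales",
}

def children(record):
    """Return the child records of `record`, in insertion order."""
    return [child for child, parent in parent_of.items() if parent == record]

def path_to_root(record):
    """Walk upward through the parents until the root is reached."""
    path = [record]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path

print(children("Sales"))
print(path_to_root("Sales Team B"))
```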
• These are programs used to store & manage files or records containing related
information.
• A collection of programs required to store & retrieve data from a database.
• A DBMS is a tool that allows one to create, maintain, update and store the data within a
database.
A DBMS is complex software that creates, expands & maintains the database, and it also
provides the interface between the user and the data in the database.
A DBMS enables the user to create lists of information in a computer, analyse them, add new
information, delete old information, and so on. It allows users to efficiently store information
in an orderly manner for quick retrieval.
The PC-based database programs are usually designed for individual users or small businesses.
They provide many general features for organizing & analyzing data. For example, they allow
users to create database files, enter data, organize that data in various ways, and also create
reports.
They do not have strict security features or complicated backup & recovery procedures.
They are designed for big corporations that handle large amounts of data.
Issues such as security, data integrity (reliability), backup and recovery are taken seriously to
prevent loss of information.
Using a DBMS, you can define relationships between records & files maintained in a
database. In this case, a transaction in one file of the database can also cause a series of
updates in parts of other tables. Thus, the data is input only once to the database and is made
available to the many files composing it.
• Have facilities for generating Reports.
• Have a Find or Search facility that enables the user to scan through the records in the
database so as to find information he/she needs.
• Allow Sorting that enables the user to organize & arrange the records within the
database.
• Contain Query & Filter facilities that specify the information you want the database to
search or sort.
• Have a data Validating
The DBMS is a set of software programs that perform several functions in relation to the
database, as listed below:
1. Creates or constructs the database contents through the Data Description Languages.
2. Interfaces (links) the user to the database contents through Data Manipulation
Languages.
3. Ensures the growth of the database contents through addition of new fields & records
onto the database.
4. Maintains the contents of the database. This involves adding new records or files into the
database, modifying the already existing records & deleting of the outdated records.
5. It helps the user to sort through the records & compile lists based on any criteria he/she
would like to establish.
6. Manages the storage space for the data within the database & keeps track of all the data
in the database.
7. It provides flexible processing methods for the contents of the database.
8. Protects the contents of the database against all sorts of damage or misuse, e.g. illegal
access.
9. Monitors the usage of the database contents to determine the rarely used data and those
that are frequently used, so that they can be made readily available, whenever need
arises.
10. It maintains a dictionary of the data within the database & manages the data descriptions
in the dictionary.
1. Database systems can be used to store data, retrieve and generate reports.
2. It is easy to maintain the data stored within a database.
3. A DBMS is able to handle large amounts of data.
4. Data is stored in an organized format, i.e. under different fieldnames.
5. With modern equipment, data can easily be recorded.
6. Data is quickly & easily accessed or retrieved, as it is properly organized.
7. It helps in linking many database tables and sourcing of data from these tables.
8. It is quite easy to update the data stored within a database.
A database is a collection of files grouped together by a series of tables as one entity. These
tables serve as an index for defining relationships between records and files maintained in the
database. This makes updating of the data in the related tables very easy.
9. Use of a database tool reduces duplication of the stored files, and the reprocessing of the
same data items. In addition, several independent files are maintained for the different
user requirements.
10. It is used to query & display records satisfying a given condition.
11. It is easy to analyse information stored in a database & to prepare summary reports &
charts.
12. It saves costs. This results from the sharing of records, reduced processing times, reduced
use of software and hardware, more efficient use of data processing personnel, and an
overall improvement in the flow of data.
13. Use of Integrated systems is greatly facilitated.
An Integrated system – A total system approach that unifies all the aspects of the
organization. Facilities are shared across the complete organization.
14. A lot of programming time is saved because the DBMS can be used to construct &
process files as well as retrieve data.
15. Information supplied to managers is more valuable, because it is based on a widespread
collection of data (instead of files, which contain only the data needed for one
application).
16. The database also maintains an extensive Inventory Control file. This file gives an account
of all the parts & equipment throughout the maintenance system. It also defines the
status of each part and its location.
17. It enables timely & accurate reporting of data to all the maintenance centres. The same
data is available and distributed to everyone.
18. The database maintains files related to any work assigned to outside service centres.
Many parts are repaired by the vendors from whom they are purchased. A database is used to
maintain data on the parts that have been shipped to vendors and those that are outstanding
from the inventory. Data relating to the guarantees and warranties of individual vendors are
also stored in the database.
DISADVANTAGES OF DATABASES.
1. A Database system is large, very costly, and takes a lot of time to implement.
2. A Database requires the use of a large-scale computer system.
3. The time involved. A project of this type requires a minimum of 1 – 2 years.
4. A large full-time staff is also required to design, program, & support the implementation
of a database.
5. The cost of the database project is a limiting factor for many organizations.
Database-oriented computer systems are not luxuries, and are undertaken when proven
economically reasonable.
A database management system stores data in such a way that it becomes easier to retrieve,
manipulate, and produce information.
Characteristics
Traditionally, data was organized in file formats. DBMS was a new concept then, and all the
research was done to make it overcome the deficiencies in traditional style of data management.
A modern DBMS has the following characteristics −
• Real-world entity − A modern DBMS is more realistic and uses real-world entities to
design its architecture. It uses the behavior and attributes too. For example, a school
database may use students as an entity and their age as an attribute.
• Relation-based tables − DBMS allows entities and relations among them to form tables.
A user can understand the architecture of a database just by looking at the table names.
• Isolation of data and application − A database system is entirely different than its data. A
database is an active entity, whereas data is said to be passive, on which the database
works and organizes. DBMS also stores metadata, which is data about data, to ease its
own process.
• Less redundancy − DBMS follows the rules of normalization, which splits a relation when
any of its attributes has redundant values. Normalization is a mathematically rich
and scientific process that reduces data redundancy.
• Consistency − Consistency is a state where every relation in a database remains
consistent. There exist methods and techniques that can detect an attempt to leave the
database in an inconsistent state. A DBMS can provide greater consistency as compared to
earlier forms of data storing applications like file-processing systems.
• Query Language − DBMS is equipped with a query language, which makes it more efficient
to retrieve and manipulate data. A user can apply as many different filtering options
as required to retrieve a set of data. This was not possible with traditional
file-processing systems.
• ACID Properties − DBMS follows the concepts of Atomicity, Consistency, Isolation,
and Durability (normally shortened as ACID). These concepts are applied on transactions,
which manipulate data in a database. ACID properties help the database stay healthy in
multi-transactional environments and in case of failure.
• Multiuser and Concurrent Access − DBMS supports a multi-user environment and allows
users to access and manipulate data in parallel. There are restrictions on
transactions when users attempt to handle the same data item, but users are always
unaware of them.
• Multiple views − DBMS offers multiple views for different users. A user who is in the Sales
department will have a different view of the database than a person working in the Production
department. This feature enables the users to have a concentrated view of the database
according to their requirements.
• Security − Features like multiple views offer security to some extent, since users are
unable to access the data of other users and departments. DBMS offers methods to impose
constraints while entering data into the database and retrieving it at a later stage.
DBMS offers many different levels of security features, which enable multiple users to
have different views with different features. For example, a user in the Sales department
cannot see the data that belongs to the Purchase department. Additionally, how much of the
Sales department's data is displayed to that user can also be controlled. Since
a DBMS does not store its data on disk in the same way as a traditional file system, it is
harder for intruders to read the data directly.
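The atomicity part of the ACID bullet above can be sketched with Python's built-in sqlite3 module. The account table, the CHECK constraint, and the transfer function are hypothetical; they only show that a failed transaction leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount):
    """Move `amount` from A to B as one transaction; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(30)        # both updates succeed
failed = transfer(500)   # second update violates the CHECK, so the first is undone too
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer, account B was already credited when the CHECK constraint rejected the debit on A; the rollback removes that credit, so the database returns to its last consistent state.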
Users
A typical DBMS has users with different rights and permissions who use it for different purposes.
Some users retrieve data and some back it up. The users of a DBMS can be broadly categorized
as follows −
DBMS - Architecture
The design of a DBMS depends on its architecture. It can be centralized or decentralized or
hierarchical. The architecture of a DBMS can be seen as either single tier or multi-tier. An n-tier
architecture divides the whole system into related but independent n modules, which can be
independently modified, altered, changed, or replaced.
In 1-tier architecture, the DBMS is the only entity: the user works directly on the DBMS.
Any changes made here are applied directly to the DBMS itself. It does not provide handy
tools for end-users. Database designers and programmers normally prefer to use single-tier
architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS
can be accessed. Programmers use 2-tier architecture where they access the DBMS by means of
an application. Here the application tier is entirely independent of the database in terms of
operation, design, and programming.
3-tier Architecture
A 3-tier architecture separates its tiers from each other based on the complexity of the users and
how they use the data present in the database. It is the most widely used architecture to design
a DBMS.
• Database (Data) Tier − At this tier, the database resides along with its query processing
languages. We also have the relations that define the data and their constraints at this
level.
• Application (Middle) Tier − At this tier reside the application server and the programs that
access the database. For a user, this application tier presents an abstracted view of the
database. End-users are unaware of any existence of the database beyond the application.
At the other end, the database tier is not aware of any other user beyond the application
tier. Hence, the application layer sits in the middle and acts as a mediator between the
end-user and the database.
• User (Presentation) Tier − End-users operate on this tier and they know nothing about
any existence of the database beyond this layer. At this layer, multiple views of the
database can be provided by the application. All views are generated by applications that
reside in the application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
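A very small sketch of the three tiers, using Python with an in-memory SQLite database as the data tier. The student table and the function names are hypothetical; a real deployment would normally run each tier as a separate program or server.

```python
import sqlite3

# --- Database (data) tier: holds the relations and their constraints.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)")
db.execute("INSERT INTO student VALUES (1, 'Tom', 20), (2, 'Maria', 22)")

# --- Application (middle) tier: the only code that talks to the database.
def list_student_names():
    """Return an abstracted view: names only, no SQL or table details exposed."""
    return [row[0] for row in db.execute("SELECT name FROM student ORDER BY id")]

# --- User (presentation) tier: formats what the application tier returns.
def render_names():
    return ", ".join(list_student_names())

print(render_names())
```

Because the presentation function only calls the application function, the table layout could change without the presentation code being touched.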
DBMS - Data Models
Data models define how the logical structure of a database is modeled. Data Models are
fundamental entities to introduce abstraction in a DBMS. Data models define how data is
connected to each other and how they are processed and stored inside the system.
The very first data model could be flat data-models, where all the data used are to be kept in the
same plane. Earlier data models were not so scientific, hence they were prone to introduce lots
of duplication and update anomalies.
Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them. While formulating real-world scenario into the database model, the ER Model
creates entity set, relationship set, general attributes and constraints.
ER Model is best used for the conceptual design of a database.
ER Model is based on −
• Entities and their attributes.
• Relationships among entities.
These concepts are explained below.
Database Instance
It is important that we distinguish these two terms individually. Database schema is the skeleton
of database. It is designed when the database doesn't exist at all. Once the database is
operational, it is very difficult to make any changes to it. A database schema does not contain
any data or information.
A database instance is the state of an operational database with its data at any given time. It
contains a snapshot of the database. Database instances tend to change with time. A DBMS
ensures that every instance (state) is valid, by diligently following all the validations,
constraints, and conditions that the database designers have imposed.
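The difference between a schema and its instances can be sketched with Python's sqlite3 module. The student table below is a hypothetical example: the schema is created once, while each change to the data produces a new instance, and a change that breaks a constraint is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: structure and constraints only, no data yet.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
instance_0 = conn.execute("SELECT * FROM student").fetchall()

# Each change to the stored data yields a new instance (state) of the same schema.
conn.execute("INSERT INTO student VALUES (1, 'Tom')")
instance_1 = conn.execute("SELECT * FROM student").fetchall()

# The DBMS rejects a change that would make the instance invalid.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate')")  # repeats the key
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
instance_2 = conn.execute("SELECT * FROM student").fetchall()

print(instance_0, instance_1, rejected, instance_2)
```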
ER Diagram in DBMS
The ER (Entity-Relationship) Model is a high-level conceptual data model. The
Entity-Relationship model is based on the notion of real-world entities and the relationships
between them.
History of ER models
ER diagrams are a visual tool for representing the ER model. The model was proposed by
Peter Chen in 1976 to create a uniform convention that can be used for relational databases
and networks. He aimed to use the ER model as a conceptual modeling approach.
What is an ER Diagram?
An Entity Relationship diagram displays the relationships of the entity sets stored in a database.
In other words, ER diagrams help you to explain the logical structure of databases. At
first look, an ER diagram looks very similar to a flowchart. However, an ER diagram includes
many specialized symbols, and the meanings of these symbols make this model unique.
Sample ER Diagram
Facts about ER Diagram Model:
• Entities
• Attributes
• Relationships
Example
For example, in a University database, we might have entities for Students, Courses, and
Lecturers. The Student entity can have attributes like Rollno, Name, and DeptID. Students might
have relationships with Courses and Lecturers.
WHAT IS ENTITY?
An entity is a real-world thing, either living or non-living, that is easily recognizable. It
is anything in the enterprise that is to be represented in our database. It may be a physical thing
or simply a fact about the enterprise or an event that happens in the real world.
An entity can be a place, person, object, event, or concept about which data is stored in the
database. Every entity must have attributes and a unique key. Every entity is
made up of some 'attributes' which represent that entity.
Examples of entities:
Entity set:
Student
An entity set is a group of entities of a similar kind. It may contain entities whose attributes
share similar values. Entities are represented by their properties, which are also called
attributes. Every attribute has its own value. For example, a student entity may have name, age,
and class as attributes.
Example of Entities:
A university may have some departments. All these departments employ various lecturers and
offer several programs.
Some courses make up each program. Students register in a particular program and enroll in
various courses. A lecturer from the specific department takes each course, and each lecturer
teaches a different group of students.
Relationship
Relationship is nothing but an association among two or more entities. E.g., Tom works in the
Chemistry department.
Entities take part in relationships. We can often identify relationships with verbs or verb
phrases.
For example:
Weak Entities
A weak entity is a type of entity that does not have a key attribute of its own. It can be
identified uniquely only by considering the primary key of another entity. For that, weak entity
sets need to have total participation in the identifying relationship.
Let's learn more about a weak entity by comparing it with a strong entity.
Strong entity set:
• It always has a primary key.
• The primary key is one of its attributes and helps to identify its members.
• In the ER diagram, the relationship between two strong entity sets is shown by using a
diamond symbol.
• The line connecting the strong entity set with the relationship is single.
Weak entity set:
• It does not have enough attributes to build a primary key.
• Its key is a combination of its own partial key and the primary key of the strong entity set.
• The relationship between one strong and one weak entity set is shown by using a double
diamond symbol.
• The line connecting the weak entity set to the identifying relationship is double.
Attributes
For example, a lecture might have attributes: time, date, duration, place, etc.
Cardinality
Defines the numerical attributes of the relationship between two entities or entity sets.
• One-to-One Relationships
• One-to-Many Relationships
• Many-to-One Relationships
• Many-to-Many Relationships
1.One-to-one:
One entity from entity set X can be associated with at most one entity of entity set Y and vice
versa.
2.One-to-many:
One entity from entity set X can be associated with multiple entities of entity set Y, but an
entity from entity set Y can be associated with at most one entity of entity set X.
Example: One student can register for numerous courses. However, all those courses have a
single line back to that one student.
3.Many-to-one:
More than one entity from entity set X can be associated with at most one entity of entity set Y.
However, an entity from entity set Y may or may not be associated with more than one entity
from entity set X.
4. Many to Many:
One entity from X can be associated with more than one entity from Y and vice versa.
For example, Students as a group are associated with multiple faculty members, and faculty
members can be associated with multiple students.
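The four cardinalities can be told apart by counting partners on each side. The sketch below is a hypothetical helper that classifies a relationship from a list of observed (x, y) pairs; note that it describes the sample data only, whereas the cardinality in an ER model is a design rule.

```python
def cardinality(pairs):
    """Classify a relationship given as (x, y) pairs between entity sets X and Y."""
    ys_per_x, xs_per_y = {}, {}
    for x, y in pairs:
        ys_per_x.setdefault(x, set()).add(y)
        xs_per_y.setdefault(y, set()).add(x)
    # Does any X have several Ys, and does any Y have several Xs?
    x_has_many = any(len(ys) > 1 for ys in ys_per_x.values())
    y_has_many = any(len(xs) > 1 for xs in xs_per_y.values())
    if x_has_many and y_has_many:
        return "many-to-many"
    if x_has_many:
        return "one-to-many"
    if y_has_many:
        return "many-to-one"
    return "one-to-one"

# Students s1 and s2 linked with faculty members f1 and f2 (hypothetical sample).
print(cardinality([("s1", "f1"), ("s1", "f2"), ("s2", "f1")]))
```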
ER- Diagram Notations
An ER diagram is a visual representation of data that describes how data items are related to
each other.
In a university, a Student enrolls in Courses. A student must be assigned to at least one
Course. Each course is taught by a single Professor. To maintain instruction quality, a Professor
can deliver only one course.
Step 1) Entity Identification
• Student
• Course
• Professor
Step 2) Relationship Identification
You need to study the files, forms, reports, data currently maintained by the organization to
identify attributes. You can also conduct interviews with various stakeholders to identify
entities. Initially, it's important to identify the attributes without mapping them to a particular
entity.
Once you have a list of attributes, you need to map them to the identified entities. Ensure each
attribute is paired with exactly one entity. If you think an attribute should belong to more
than one entity, use a modifier to make it unique.
Once the mapping is done, identify the primary Keys. If a unique key is not readily available,
create one.
For Course Entity, attributes could be Duration, Credits, Assignments, etc. For the sake of ease
we have considered just one attribute.
Summary
DBMS – Normalization
Functional Dependency
Functional dependency (FD) is a constraint between two sets of attributes in a relation.
Functional dependency says that if two tuples have the same values for attributes A1, A2,..., An,
then those two tuples must also have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally
determines Y. The left-hand side attributes determine the values of attributes on the right-hand
side.
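Whether X → Y holds in a given set of rows can be checked mechanically. The following is a minimal sketch; the function name fd_holds and the sample rows are hypothetical (the attribute names Stu_ID, Zip and City are borrowed from the Third Normal Form discussion in these notes, the values are made up).

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Same left-hand values must always map to the same right-hand values.
        if seen.setdefault(left, right) != right:
            return False
    return True

rows = [
    {"Stu_ID": 1, "Zip": "Z1", "City": "CityA"},
    {"Stu_ID": 2, "Zip": "Z1", "City": "CityA"},
    {"Stu_ID": 3, "Zip": "Z2", "City": "CityB"},
]

print(fd_holds(rows, ["Zip"], ["City"]))     # Zip determines City in this sample
print(fd_holds(rows, ["City"], ["Stu_ID"]))  # CityA maps to two different Stu_IDs
```

A check like this can only refute a dependency on sample data; whether an FD is a rule of the design is decided by the designer.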
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all
functional dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when
applied repeatedly, generate the closure of a set of functional dependencies.
• Reflexive rule − If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule − If a → b holds and y is an attribute set, then ay → by also holds. That is,
adding attributes to a dependency does not change the basic dependency.
• Transitivity rule − Same as the transitive rule in algebra: if a → b holds and b → c holds, then
a → c also holds. Here a → b means that a functionally determines b.
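Listing all of F+ is rarely practical, so a common shortcut is to compute the closure of an attribute set X instead: X → Y is in F+ exactly when Y is contained in the closure of X. The sketch below is one way to do this; the function name and the sample dependencies (Stu_ID → Zip, Zip → City) are chosen for illustration.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attrs)  # reflexivity: the attributes determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already covers lhs, transitivity lets us add rhs.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

fds = [({"Stu_ID"}, {"Zip"}), ({"Zip"}, {"City"})]
print(sorted(attribute_closure({"Stu_ID"}, fds)))
```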
Trivial Functional Dependency
• Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is
called a trivial FD. Trivial FDs always hold.
• Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial
FD.
• Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ, it is said to be a
completely non-trivial FD.
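The three cases translate directly into set tests. A small sketch with a hypothetical function name, where both sides of the FD are given as Python sets:

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs, where both sides are sets of attribute names."""
    if rhs <= lhs:        # Y is a subset of X
        return "trivial"
    if lhs & rhs:         # Y is not a subset of X, but the two sides overlap
        return "non-trivial"
    return "completely non-trivial"  # X and Y share no attribute

print(classify_fd({"A", "B"}, {"A"}))
print(classify_fd({"A", "B"}, {"B", "C"}))
print(classify_fd({"A"}, {"B"}))
```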
Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any
database administrator. Managing a database with anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not linked to each other properly,
then it could lead to strange situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances get updated properly while
a few others are left with old values. Such instances leave the database in an inconsistent
state.
• Deletion anomalies − We try to delete a record, but parts of it are left undeleted
because, without our being aware of it, the data is also saved somewhere else.
• Insert anomalies − We tried to insert data in a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all
the attributes in a relation must have atomic domains. The values in an atomic domain are
indivisible units.
We re-arrange the relation (table) as below, to convert it to First Normal Form.
Each attribute must contain only a single value from its pre-defined domain.
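As an illustration of this re-arrangement, the sketch below splits a multi-valued field into atomic values. The relation, its attribute names, and its values are hypothetical, since the original table is not reproduced in these notes.

```python
# Hypothetical unnormalized rows: "content" holds several values in one field.
unnormalized = [
    {"course": "Programming", "content": "Java, C++"},
    {"course": "Web", "content": "HTML, PHP, ASP"},
]

def to_first_normal_form(rows, multi_valued, sep=","):
    """Emit one row per atomic value of the `multi_valued` attribute."""
    result = []
    for row in rows:
        for value in row[multi_valued].split(sep):
            atomic = dict(row)                 # copy the other attributes unchanged
            atomic[multi_valued] = value.strip()
            result.append(atomic)
    return result

first_nf = to_first_normal_form(unnormalized, "content")
for row in first_nf:
    print(row)
```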
Second Normal Form
Before we learn about the second normal form, we need to understand the following −
• Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime
attribute.
• Non-prime attribute − An attribute, which is not a part of any candidate key, is said to be a
non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully functionally
dependent on prime key attribute. That is, if X → A holds, then there should not be any proper
subset Y of X, for which Y → A also holds true.
We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually. But we find that Stu_Name can
be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
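Partial dependencies can be spotted by comparing the left side of each FD with the key. The sketch below assumes a relation with a single candidate key; the function name is hypothetical, and the names of the decomposed relations are assumptions because the original picture is not reproduced here.

```python
def partial_dependencies(key, fds):
    """Return FDs where a proper subset of `key` determines a non-key attribute."""
    found = []
    for lhs, rhs in fds:
        non_key = rhs - key
        if lhs < key and non_key:  # "<" on sets means proper subset
            found.append((lhs, rhs))
    return found

# Student_Project with key {Stu_ID, Proj_ID}, as described above.
key = {"Stu_ID", "Proj_ID"}
fds = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]
before = partial_dependencies(key, fds)

# After the split, each FD has a whole key on its left side (assumed relation names).
student_after = partial_dependencies({"Stu_ID"}, [({"Stu_ID"}, {"Stu_Name"})])
project_after = partial_dependencies({"Proj_ID"}, [({"Proj_ID"}, {"Proj_Name"})])

print(before, student_after, project_after)
```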
Third Normal Form
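For a relation to be in Third Normal Form, it must be in Second Normal Form and must satisfy
the following −
• No non-prime attribute is transitively dependent on a prime key attribute.
• For any non-trivial functional dependency X → A, either −
o X is a superkey, or
o A is a prime attribute.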
We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute.
We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is
City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows
−
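One possible decomposition can be sketched in Python. The sample rows and the names of the two resulting relations (student, zip_codes) are hypothetical; the sketch splits along Stu_ID → Zip and Zip → City and then checks that rejoining on Zip restores the original rows.

```python
student_detail = [
    {"Stu_ID": 1, "Zip": "Z1", "City": "CityA"},
    {"Stu_ID": 2, "Zip": "Z1", "City": "CityA"},
    {"Stu_ID": 3, "Zip": "Z2", "City": "CityB"},
]

def project(rows, attrs):
    """Relational projection: keep `attrs` and drop duplicate rows."""
    unique = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in unique:
            unique.append(projected)
    return unique

student = project(student_detail, ["Stu_ID", "Zip"])   # Stu_ID -> Zip
zip_codes = project(student_detail, ["Zip", "City"])   # Zip -> City

# Rejoining on Zip gives back the original rows, so nothing was lost.
city_of = {z["Zip"]: z["City"] for z in zip_codes}
rejoined = [{**s, "City": city_of[s["Zip"]]} for s in student]

print(zip_codes)
print(rejoined == student_detail)
```

Each city is now stored once per Zip value instead of once per student, which removes the transitive dependency.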
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is a stricter extension of Third Normal Form. BCNF
states that, for any non-trivial functional dependency X → A, X must be a superkey.
DBMS - Joins
We understand the benefits of taking a Cartesian product of two relations, which gives us all the
possible tuples that are paired together. But it might not be feasible for us in certain cases to take
a Cartesian product where we encounter huge relations with thousands of tuples having a
considerably large number of attributes.
Join is a combination of a Cartesian product followed by a selection process. A Join operation
pairs two tuples from different relations, if and only if a given join condition is satisfied.
We will briefly describe various join types in the following sections.
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta condition. The
join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the attributes
don’t have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
101 Alex 10
102 Maria 11
Subjects
Class Subject
10 Math
10 English
11 Music
11 Sports
Student_Detail −
STUDENT ⋈Student.Std = Subject.Class SUBJECT
Student_detail
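The join above can be reproduced as a Cartesian product followed by a selection. This is a sketch in Python using the rows of the Student and Subjects tables; the column names SID and Name are assumptions, since only Std, Class and Subject are named in these notes.

```python
from itertools import product

# Column names "SID" and "Name" are assumed; "Std" comes from the join condition.
student = [
    {"SID": 101, "Name": "Alex", "Std": 10},
    {"SID": 102, "Name": "Maria", "Std": 11},
]
subjects = [
    {"Class": 10, "Subject": "Math"},
    {"Class": 10, "Subject": "English"},
    {"Class": 11, "Subject": "Music"},
    {"Class": 11, "Subject": "Sports"},
]

def theta_join(r1, r2, condition):
    """Cartesian product of r1 and r2, keeping only pairs that satisfy `condition`."""
    return [{**a, **b} for a, b in product(r1, r2) if condition(a, b)]

# Student.Std = Subject.Class
student_detail = theta_join(student, subjects, lambda s, sub: s["Std"] == sub["Class"])
for row in student_detail:
    print(row)
```

Merging the two rows with {**a, **b} relies on the relations having no attribute names in common, which matches the condition R1 ∩ R2 = Φ stated above. Any other comparison (for example <, >, ≠) can be passed as the condition.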
Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The above
example corresponds to equijoin.
Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate the way a Cartesian
product does. We can perform a Natural Join only if there is at least one common attribute that
exists between two relations. In addition, the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of attributes in both the relations
are same.
Courses
CS01 Database CS
ME01 Mechanics ME
EE01 Electronics EE
HoD
Dept Head
CS Alex
ME Maya
EE Mira
Courses ⋈ HoD
Dept CID Course Head
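The rows of Courses ⋈ HoD can be worked out with a short Python sketch. The column names CID, Course, Dept and Head are taken from the result header above; the function itself is a hypothetical helper that assumes both relations are non-empty.

```python
courses = [
    {"CID": "CS01", "Course": "Database", "Dept": "CS"},
    {"CID": "ME01", "Course": "Mechanics", "Dept": "ME"},
    {"CID": "EE01", "Course": "Electronics", "Dept": "EE"},
]
hod = [
    {"Dept": "CS", "Head": "Alex"},
    {"Dept": "ME", "Head": "Maya"},
    {"Dept": "EE", "Head": "Mira"},
]

def natural_join(r1, r2):
    """Join rows that agree on every attribute name the two relations share."""
    common = set(r1[0]) & set(r2[0])
    if not common:
        raise ValueError("natural join needs at least one common attribute")
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[c] == b[c] for c in common)
    ]

result = natural_join(courses, hod)
for row in result:
    print(row["Dept"], row["CID"], row["Course"], row["Head"])
```

The shared attribute Dept appears only once in each result row, which is the difference from a plain Cartesian product.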
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only those
tuples with matching attributes and the rest are discarded in the resulting relation. Therefore,
we need to use outer joins to include all the tuples from the participating relations in the resulting
relation. There are three kinds of outer joins − left outer join, right outer join, and full outer join.
Left Outer Join (R ⟕ S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are tuples in
R without any matching tuple in the Right relation S, then the S-attributes of the resulting relation
are made NULL.
Left
A B
100 Database
101 Mechanics
102 Electronics
Right
A B
100 Alex
102 Maya
104 Mira
Courses ⟕ HoD, Courses ⟖ HoD, Courses ⟗ HoD: result tables for the left, right, and full outer
joins, each with columns A, B, C, D (rows not shown).
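The three outer joins can be worked out from the Left and Right tables with a short Python sketch. It assumes the join condition is equality on the first column of each table, and it uses None where the text says NULL; the function name and its parameters are hypothetical.

```python
left = [(100, "Database"), (101, "Mechanics"), (102, "Electronics")]
right = [(100, "Alex"), (102, "Maya"), (104, "Mira")]

def outer_join(r, s, keep_left=True, keep_right=False):
    """Join r and s on their first column; pad unmatched rows with None (NULL)."""
    result = []
    matched_s = set()
    for r_row in r:
        partners = [s_row for s_row in s if s_row[0] == r_row[0]]
        for s_row in partners:
            result.append(r_row + s_row)
            matched_s.add(s_row)
        if not partners and keep_left:
            result.append(r_row + (None,) * len(s[0]))
    if keep_right:
        padding = (None,) * len(r[0])
        result += [padding + s_row for s_row in s if s_row not in matched_s]
    return result

left_outer = outer_join(left, right)
right_outer = outer_join(left, right, keep_left=False, keep_right=True)
full_outer = outer_join(left, right, keep_right=True)

for name, rows in [("left", left_outer), ("right", right_outer), ("full", full_outer)]:
    print(name, rows)
```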
Related data and information are stored collectively in file formats. A file is a sequence of records
stored in binary format. A disk drive is formatted into several blocks that can store records. File
records are mapped onto those disk blocks.
File Organization
File Organization defines how file records are mapped onto disk blocks. We have four types of
File Organization to organize file records −
• Active − In this state, the transaction is being executed. This is the initial state of every
transaction.
• Partially Committed − When a transaction executes its final operation, it is said to be in a
partially committed state.
• Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.
• Aborted − If any of the checks fails and the transaction has reached a failed state, then
the recovery manager rolls back all its write operations on the database to bring the
database back to its original state where it was prior to the execution of the transaction.
Transactions in this state are called aborted. The database recovery module can select one
of the two operations after a transaction aborts −
o Re-start the transaction
o Kill the transaction
• Committed − If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently established on the database system.
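The states listed above can be modelled as a small state machine. The sketch below is illustrative only: the set of allowed transitions (in particular, partially committed to failed) follows the usual textbook diagram and is an assumption, and the names State, ALLOWED and Transaction are hypothetical.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Assumed transitions, following the usual textbook state diagram.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),      # terminal: restart or kill the transaction
    State.COMMITTED: set(),    # terminal: effects are permanent
}


class Transaction:
    def __init__(self):
        # Every transaction starts in the active state.
        self.state = State.ACTIVE

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition: {self.state.value} -> {new_state.value}"
            )
        self.state = new_state


t = Transaction()
t.move_to(State.PARTIALLY_COMMITTED)  # final operation executed
t.move_to(State.COMMITTED)            # all checks passed
print(t.state.value)
```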
Concurrency Control
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is important to control the concurrency of transactions. Concurrency
control protocols ensure atomicity, isolation, and serializability of concurrent
transactions. They can be broadly divided into two categories − lock-based protocols and
timestamp-based protocols.
Two-phase locking (2PL) has two phases: a growing phase, in which the transaction acquires all
the locks it needs, and a shrinking phase, in which the transaction releases the locks it holds.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first
phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not
release a lock right after using it. It holds all the locks until the commit point and then
releases them all at once.
Unlike 2PL, Strict-2PL does not suffer from cascading aborts.
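The difference between the two locking disciplines can be sketched as follows. This is a simplified illustration, not a lock manager: it tracks only one transaction's own locks and ignores conflicts between transactions and the shared/exclusive distinction. The class and method names are hypothetical.

```python
class TwoPhaseTransaction:
    """Tracks the growing and shrinking phases of a single transaction."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # becomes True after the first release

    def lock(self, item):
        if self.shrinking:
            # Acquiring after any release would violate two-phase locking.
            raise RuntimeError("2PL violation: lock requested in shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True


class StrictTwoPhaseTransaction(TwoPhaseTransaction):
    """Strict-2PL: no lock is released before the commit point."""

    def unlock(self, item):
        raise RuntimeError("Strict-2PL: locks are released only at commit")

    def commit(self):
        released = sorted(self.held)
        self.held.clear()  # all locks are released together
        self.shrinking = True
        return released


t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")
t1.unlock("A")            # shrinking phase begins here
try:
    t1.lock("C")
except RuntimeError as err:
    print(err)

t2 = StrictTwoPhaseTransaction("T2")
t2.lock("A")
t2.lock("B")
print(t2.commit())
```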
Timestamp-based Protocols
One of the most commonly used concurrency protocols is the timestamp-based protocol. This
protocol uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 is older than all the transactions
that come after it. For example, a transaction 'y' entering the system at 0004 is two seconds
younger, and priority is given to the older one.
In addition, every data item carries its latest read-timestamp and write-timestamp. This lets the
system know when the last read and write operations were performed on the data item.
Timestamp Ordering Protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting
read and write operations. It is the responsibility of the protocol to ensure that each
conflicting pair of operations is executed according to the timestamp values of the transactions.
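The checks behind timestamp ordering can be sketched as follows. The rules used here are the standard textbook ones for basic timestamp ordering, which these notes do not spell out, so treat them as an assumption; the names DataItem, read and write are hypothetical. A return value of False means the operation is rejected and the transaction would be rolled back.

```python
class DataItem:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest transaction that read the item
        self.write_ts = 0  # timestamp of the youngest transaction that wrote the item


def read(ts, item):
    """Transaction with timestamp ts tries to read the item."""
    if ts < item.write_ts:
        # A younger transaction has already overwritten the value.
        return False
    item.read_ts = max(item.read_ts, ts)
    return True


def write(ts, item):
    """Transaction with timestamp ts tries to write the item."""
    if ts < item.read_ts or ts < item.write_ts:
        # A younger transaction has already read or written the item.
        return False
    item.write_ts = ts
    return True


x = DataItem()
print(write(4, x))  # transaction 4 writes x
print(read(2, x))   # older transaction 2 arrives too late to read
print(read(5, x))   # younger transaction 5 may read
print(write(4, x))  # transaction 4 can no longer write: 5 has read x
```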
Recovery
• When the system recovers from a failure, it can restore the latest dump.
• It can maintain a redo-list and an undo-list as checkpoints.
• It can recover the system by consulting undo-redo lists to restore the state of all
transactions up to the last checkpoint.
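How the undo-list and redo-list might be built from a log can be sketched as below. The log format (pairs of transaction name and action) is hypothetical, and the classification rule is the usual textbook one, assumed here rather than taken from these notes: a transaction with a commit record goes to the redo-list, and a transaction that started but has neither a commit nor an abort record goes to the undo-list.

```python
def build_undo_redo(log):
    """Classify transactions found in the log written since the last checkpoint."""
    started, committed, aborted = [], set(), set()
    for txn, action in log:
        if action == "start":
            started.append(txn)
        elif action == "commit":
            committed.add(txn)
        elif action == "abort":
            aborted.add(txn)
    # Committed transactions are redone; unfinished ones are undone.
    redo_list = [t for t in started if t in committed]
    undo_list = [t for t in started if t not in committed and t not in aborted]
    return undo_list, redo_list


log = [
    ("T1", "start"),
    ("T2", "start"),
    ("T1", "commit"),
    ("T3", "start"),
    ("T3", "abort"),
]
undo_list, redo_list = build_undo_redo(log)
print("undo:", undo_list)
print("redo:", redo_list)
```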
Database Backup & Recovery from Catastrophic Failure
A catastrophic failure is one in which a stable, secondary storage device becomes corrupted. All
the valuable data stored on the device is lost with it. We have two different strategies
to recover data from such a catastrophic failure −
• Remote backup − Here a backup copy of the database is stored at a remote location
from where it can be restored in case of a catastrophe.
• Alternatively, database backups can be taken on magnetic tapes and stored at a safer
place. This backup can later be transferred onto a freshly installed database to bring it to
the point of backup.
Large, mature databases are too bulky to be backed up frequently. In such cases, we have
techniques to restore a database just from its logs. So, all that we need to do is to
take a backup of all the logs at frequent intervals. The database can be backed up once a
week, and the logs, being very small, can be backed up every day or as frequently as possible.
Remote Backup
Remote backup provides a sense of security in case the primary location where the database is
located gets destroyed. Remote backup can be offline, or it can be real-time (online). If it is
offline, it is maintained manually.
Online backup systems work in real time and are lifesavers for database administrators and
investors. An online backup system is a mechanism where every bit of the real-time data is
backed up simultaneously at two distant places. One of them is directly connected to the system
and the other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses the failure and switches
the user system to the remote storage. Sometimes this happens so quickly that the users do not
even notice a failure.
SQL
SQL (Structured Query Language) is a computer language for storing, manipulating and
retrieving data stored in a relational database. This tutorial will give you a quick start to
SQL. It covers most of the topics required for a basic understanding of SQL and for getting a
feel of how it works.
SQL is the standard language for relational database systems. Relational Database
Management Systems (RDBMS) such as MySQL, MS Access, Oracle, Sybase, Informix, Postgres and
SQL Server all use SQL as their standard database language.
They also use different dialects; for example, SQL Server uses T-SQL and Oracle uses PL/SQL.
Applications of SQL
As mentioned before, SQL is one of the most widely used query languages for databases.
Some of its uses are listed here:
• Allows users to access data in relational database management systems.
• Allows users to describe the data.
• Allows users to define the data in a database and manipulate that data.
• Allows SQL to be embedded within other languages using SQL modules, libraries and
pre-compilers.
• Allows users to create and drop databases and tables.
• Allows users to create views, stored procedures and functions in a database.
• Allows users to set permissions on tables, procedures and views.
SQL follows a unique set of rules and guidelines called syntax. This tutorial gives you a
quick start with SQL by listing all the basic SQL syntax.
All SQL statements start with a keyword such as SELECT, INSERT, UPDATE, DELETE,
ALTER, DROP, CREATE, USE or SHOW, and all statements end with a semicolon (;).
The most important point to note here is that SQL keywords are case insensitive, which means
SELECT and select have the same meaning in SQL statements. MySQL, however, can treat table
names as case sensitive, depending on the operating system and server settings. So, if you are
working with MySQL, give table names exactly as they exist in the database.
SQL commands are grouped into four major categories depending on their functionality:
• Data Definition Language (DDL) − These SQL commands are used for creating, modifying,
and dropping the structure of database objects. The commands are CREATE, ALTER,
DROP, RENAME, and TRUNCATE.
• Data Manipulation Language (DML) − These SQL commands are used for storing,
retrieving, modifying, and deleting data. These Data Manipulation Language commands
are: SELECT, INSERT, UPDATE, and DELETE.
• Transaction Control Language (TCL) − These SQL commands are used for managing
changes affecting the data. These commands are COMMIT, ROLLBACK, and SAVEPOINT.
• Data Control Language (DCL) − These SQL commands are used for providing security to
database objects. These commands are GRANT and REVOKE.
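The first three categories can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: it uses an in-memory SQLite database rather than the MySQL server used elsewhere in these notes, the accounts table is hypothetical, and COMMIT and ROLLBACK are issued through the connection's commit() and rollback() methods. DCL is left out because SQLite has no GRANT or REVOKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a database object.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")

# DML: store a row, then make the change permanent (TCL: COMMIT).
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100.0)")
conn.commit()

# DML: modify the row, then discard the change (TCL: ROLLBACK).
conn.execute("UPDATE accounts SET balance = 0 WHERE id = 1")
conn.rollback()

# DML: retrieve the data; the rolled-back update is not visible.
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)
```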
Various Syntax in SQL
All the examples given in this tutorial have been tested with a MySQL server.
SQL Server offers six categories of data types for your use which are listed below −
Exact Numeric Data Types
DATA TYPE   FROM   TO
tinyint     0      255
bit         0      1
Character, Unicode and Binary Data Types
char
nchar
binary
Other Data Types
1 sql_variant − Stores values of various SQL Server-supported data types, except text, ntext,
and timestamp.
2 timestamp − Stores a database-wide unique number that gets updated every time a row gets
updated.
3 uniqueidentifier − Stores a globally unique identifier (GUID).
4 xml − Stores XML data. You can store xml instances in a column or a variable (SQL Server 2005
and later).
5 cursor − Reference to a cursor object.
6 table − Stores a result set for later processing.
• Arithmetic operators
• Comparison operators
• Logical operators
• Operators used to negate conditions
* (Multiplication) Multiplies values on either side of the operator. With a = 10 and b = 20, a * b will give 200
/ (Division) Divides left hand operand by right hand operand. With a = 10 and b = 20, b / a will give 2
2 AND − The AND operator allows the existence of multiple conditions in an SQL statement's
WHERE clause.
3 ANY − The ANY operator is used to compare a value to any applicable value in the list as per
the condition.
4 BETWEEN − The BETWEEN operator is used to search for values that are within a set of values,
given the minimum value and the maximum value.
5 EXISTS − The EXISTS operator is used to search for the presence of a row in a specified table
that meets a certain criterion.
6 IN − The IN operator is used to compare a value to a list of literal values that have been
specified.
7 LIKE − The LIKE operator is used to compare a value to similar values using wildcard
operators.
8 NOT − The NOT operator reverses the meaning of the logical operator with which it is used.
E.g., NOT EXISTS, NOT BETWEEN, NOT IN, etc. This is a negate operator.
9 OR − The OR operator is used to combine multiple conditions in an SQL statement's
WHERE clause.
10 IS NULL − The NULL operator is used to compare a value with a NULL value.
11 UNIQUE − The UNIQUE operator searches every row of a specified table for uniqueness (no
duplicates).
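Several of these operators can be tried out with Python's built-in sqlite3 module. The sketch below is an illustration under stated assumptions: it uses an in-memory SQLite database instead of a MySQL or SQL Server instance, it loads the CUSTOMERS rows that these notes use in their later examples, and the helper function names() is hypothetical. ANY and UNIQUE are left out because SQLite does not support them as operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS ("
    "ID INT PRIMARY KEY, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)


def names(where):
    """Return the NAME column of the rows matching a WHERE condition."""
    sql = "SELECT NAME FROM CUSTOMERS WHERE " + where + " ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]


print(names("AGE BETWEEN 25 AND 27"))          # BETWEEN: inclusive range
print(names("AGE IN (22, 32)"))                # IN: list of literal values
print(names("NAME LIKE 'Ch%'"))                # LIKE: % matches any suffix
print(names("AGE >= 25 AND SALARY >= 6500"))   # AND: both conditions hold
print(names("AGE NOT IN (22, 23, 24, 25)"))    # NOT: negates IN
print(names("ADDRESS IS NULL"))                # IS NULL: no such rows here
```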
The SQL CREATE DATABASE statement is used to create a new SQL database.
Syntax
The basic syntax of this CREATE DATABASE statement is as follows −
CREATE DATABASE DatabaseName;
Make sure you have the admin privilege before creating any database. Once a database is created,
you can check it in the list of databases with the SHOW DATABASES statement.
NOTE − Be careful before dropping a database (DROP DATABASE DatabaseName;), because deleting an
existing database results in the loss of all the information stored in the database.
Make sure you have the admin privilege before dropping any database. Once a database is dropped,
you can check the list of databases as shown below −
SQL> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| AMROOD |
| TUTORIALSPOINT |
| mysql |
| orig |
| test |
+--------------------+
6 rows in set (0.00 sec)
Creating a basic table involves naming the table and defining its columns and each column's data
type.
The SQL CREATE TABLE statement is used to create a new table.
Syntax
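The basic syntax of the CREATE TABLE statement is as follows −
CREATE TABLE table_name(
   column1 datatype,
   column2 datatype,
   columnN datatype,
   PRIMARY KEY( one or more columns )
);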
CREATE TABLE is the keyword telling the database system what you want to do. In this case, you
want to create a new table. The unique name or identifier for the table follows the CREATE
TABLE statement.
Then in brackets comes the list defining each column in the table and what sort of data type it
is. The syntax becomes clearer with the following example.
A copy of an existing table can be created using a combination of the CREATE TABLE statement
and the SELECT statement.
Example
The following code block is an example that creates a CUSTOMERS table with ID as the primary
key. The NOT NULL constraints indicate that these fields cannot be NULL when records are created
in this table −
SQL> CREATE TABLE CUSTOMERS(
   ID      INT            NOT NULL,
   NAME    VARCHAR (20)   NOT NULL,
   AGE     INT            NOT NULL,
   ADDRESS CHAR (25),
   SALARY  DECIMAL (18, 2),
   PRIMARY KEY (ID)
);
You can verify that your table has been created successfully by looking at the message displayed by
the SQL server, or you can use the DESC command as follows −
SQL> DESC CUSTOMERS;
Now, you have CUSTOMERS table available in your database which you can use to store the required
information related to customers.
The SQL DROP TABLE statement is used to remove a table definition and all the data, indexes,
triggers, constraints and permission specifications for that table.
NOTE − You should be very careful while using this command, because once a table is deleted,
all the information in that table is lost forever.
Syntax
The basic syntax of this DROP TABLE statement is as follows −
DROP TABLE table_name;
Assuming the CUSTOMERS table is available in the database, let us now drop it as shown
below.
SQL> DROP TABLE CUSTOMERS;
Query OK, 0 rows affected (0.01 sec)
Now, if you try the DESC command, you will get the following error −
SQL> DESC CUSTOMERS;
ERROR 1146 (42S02): Table 'TEST.CUSTOMERS' doesn't exist
Here, TEST is the database name which we are using for our examples.
The SQL INSERT INTO Statement is used to add new rows of data to a table in the database.
Syntax
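There are two basic syntaxes of the INSERT INTO statement, which are shown below.
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);
Here, column1, column2, column3,...columnN are the names of the columns in the table into
which you want to insert the data.
You do not need to specify the column names in the SQL query if you are adding values for
all the columns of the table, but make sure the order of the values is the same as the order of
the columns in the table. In that case, the syntax is as follows.
INSERT INTO TABLE_NAME VALUES (value1, value2, value3,...valueN);
The following statements would create six records in the CUSTOMERS table.
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00);
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) VALUES (2, 'Khilan', 25, 'Delhi', 1500.00);
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) VALUES (3, 'kaushik', 23, 'Kota', 2000.00);
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00);
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) VALUES (5, 'Hardik', 27, 'Bhopal', 8500.00);
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) VALUES (6, 'Komal', 22, 'MP', 4500.00);
You can create a record in the CUSTOMERS table by using the second syntax as shown below.
INSERT INTO CUSTOMERS VALUES (7, 'Muffy', 24, 'Indore', 10000.00);
All the above statements would produce the following records in the CUSTOMERS table.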
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Populate one table using another table
You can populate a table with a SELECT statement over another table, provided the other table
has the set of fields required to populate the first table.
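Populating one table from another is usually done with an INSERT INTO ... SELECT statement. The sketch below runs one through Python's built-in sqlite3 module. It is an illustration only: it uses an in-memory SQLite database rather than MySQL, loads just the first three CUSTOMERS rows, and the target table CUSTOMERS_BKP and the salary filter are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS ("
    "ID INT PRIMARY KEY, NAME VARCHAR(20) NOT NULL, AGE INT NOT NULL, "
    "ADDRESS CHAR(25), SALARY DECIMAL(18, 2))"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
    ],
)

# Hypothetical target table holding a subset of the CUSTOMERS columns.
conn.execute(
    "CREATE TABLE CUSTOMERS_BKP ("
    "ID INT PRIMARY KEY, NAME VARCHAR(20) NOT NULL, SALARY DECIMAL(18, 2))"
)

# The SELECT supplies the rows; its columns line up with the INSERT column list.
conn.execute(
    "INSERT INTO CUSTOMERS_BKP (ID, NAME, SALARY) "
    "SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY >= 2000"
)

copied = conn.execute("SELECT ID, NAME FROM CUSTOMERS_BKP ORDER BY ID").fetchall()
print(copied)
```

Only the rows that satisfy the WHERE clause are copied, so the second table can hold a filtered subset of the first.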
The SQL SELECT statement is used to fetch the data from a database table which returns this data in
the form of a result table. These result tables are called result-sets.
Syntax
The basic syntax of the SELECT statement is as follows −
SELECT column1, column2, columnN FROM table_name;
Here, column1, column2... are the fields of a table whose values you want to fetch. If you want to
fetch all the fields available in the table, then you can use the following syntax.
SELECT * FROM table_name;
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following code is an example, which would fetch the ID, Name and Salary fields of the customers
available in CUSTOMERS table.
SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;
+----+----------+----------+
| ID | NAME | SALARY |
+----+----------+----------+
| 1 | Ramesh | 2000.00 |
| 2 | Khilan | 1500.00 |
| 3 | kaushik | 2000.00 |
| 4 | Chaitali | 6500.00 |
| 5 | Hardik | 8500.00 |
| 6 | Komal | 4500.00 |
| 7 | Muffy | 10000.00 |
+----+----------+----------+
If you want to fetch all the fields of the CUSTOMERS table, then you should use the following query.
SQL> SELECT * FROM CUSTOMERS;
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
EMERGING TRENDS
Trends in Database Management
Concepts in database management rarely come and go quickly, as the cost of
shifting between technical approaches overwhelms producers, managers, and designers.
However, there are several trends in database management, and knowing how to take
advantage of them will benefit your organization. The following are some of the current trends:
1. Databases that bridge SQL/NoSQL
The latest trend in database products is toward those that don't simply embrace a single
database structure. Instead, these databases bridge SQL and NoSQL, giving users the best
capabilities offered by both. This includes, for example, products that allow users to access a
NoSQL database in the same way as a relational database.
2. Databases in the cloud/Platform as a Service
As developers continue pushing their enterprises to the cloud, organizations are carefully
weighing the trade-offs between public and private clouds. Developers are also determining
how to combine cloud services with existing applications and infrastructure. Cloud service
providers offer many options to database administrators. Making the move towards the cloud
doesn't mean changing organizational priorities, but finding products and services that help
your group meet its goals.
3. Automated management
Automating database management is another emerging trend. These techniques and tools are
intended to simplify maintenance, patching, provisioning, updates and upgrades, and even
project workflow. However, the trend may have limited usefulness, since database management
frequently needs human intervention.
4. An increased focus on security
While not exactly a trend, given the constant focus on data security, recent retail
database breaches among US-based organizations clearly show how important it is for
database administrators to work hand in hand with their IT security colleagues to ensure all
enterprise data remains safe. Any organization that stores data is vulnerable. Database
administrators must also work with the security team to eliminate potential internal
weaknesses that could make data vulnerable. These could include issues related to network
privileges, or hardware or software misconfigurations that could be misused, resulting in data
leaks.
5. In-memory databases
Within the data warehousing community there are open questions about columnar versus
row-based relational tables, the rise of in-memory databases, the use of flash or solid-state
disks (which also applies to transaction processing), clustered versus non-clustered solutions,
and so on.
6. Big Data
To be clear, big data does not necessarily mean lots of data. What it really refers to is the ability
to process any type of data: what is typically referred to as semi-structured and unstructured
data, as well as structured data. Current thinking is that these will typically live alongside
conventional solutions as separate technologies, at least in large organisations, but this will not
always be the case.
Integrating Trends
Projects involving databases should not be judged solely on how closely they adhere
to these trends. Ideally, each available tool or process should merge in some meaningful way
with existing operations. It is important to look at these trends as items that can coincide:
for example, enhancing security and moving to the cloud can coexist.
The Top Challenges and Solutions of Database Management
No matter what field you work in, there will be changes over time. As technology becomes
more and more advanced, everyone from doctors to politicians and athletes must learn to use
these changes to their advantage. While other professions have encountered these changes,
few have experienced them on the same level as database administrators.
Thirty years after the computerization of databases, the Internet has led to exponential
growth within the industry: whether directly or indirectly, everything that compiles data uses a
database. Recent times have been an exceptional period for the production and
capture of a nearly overwhelming amount of data. This has created opportunities
for businesses to gain visibility into their customers and industry, but it has also created many
challenges in database management.
Database Management Problems
• Data Integration from Various Sources − With the advancement of smartphones, new mobile
applications, and the Internet of Things, businesses must be able to have their data adapt
accordingly. These varying types and sources of data cause a typical data center of today to
contain a patchwork of data management technologies. The management techniques have
become more diverse than ever.
• Public and Private Data Security − In today's digital world, security is the most prevalent
concern. Businesses must be able to ensure that every bit of their data remains safe and at
limited risk of exposure from hackers or leaks. Breaches of databases holding highly sensitive
information have destroyed the reputations of businesses. It is up to the manager of the
database to ensure that the data is fully secured at all times.
• The Management of Cloud-Based Databases − In recent years, the Cloud has become one of
the biggest terms in the tech community. Both businesses and consumers want to be able to
access their data from a database in the cloud or on a cloud database provider's servers, in
addition to the standard on-premises mode of deployment. Cloud computing enables users to
effectively allocate resources, optimize scaling, and allow for high availability. Handling
databases that run both in the cloud and on-premises is yet another challenge for database
managers.
• The Growth of Structured and Unstructured Data − The amount of data that is being both
created and collected has been growing at an unprecedented rate for years. Those who deal
with analytics may be excited by the promise of insight and business intelligence that comes
from big data, but those who manage databases face the challenges that come along with
managing overall growth and data types from an increasing number of database platforms.
Database Management Solutions
There are four main areas to think about when approaching these database
problems. The following are a few things to consider as solutions:
• Data Strategy
o What kind of data is important and what kind of performance should be achieved?
What data needs to be protected and what should be analyzed?
o How much historical data must be accumulated? What does this mean for capacity
planning and disk space?
o Can you monetize your data? Which data needs to be aggregated or correlated to
provide the necessary insights into the business?
• Database Support
o You must consider that moving to the cloud does not guarantee data backup and
security. This is something that must still be managed with 24/7 monitoring and
coverage.
o Are the right personnel members with the necessary skill sets always available?
• Backup Strategy
o Do you have the right kind of backup retention available?
o Have you determined the backup frequency needed to meet the Recovery Point
Objective (RPO)?
o Have you determined the Recovery Time Objective (RTO) implied by your high availability
requirements?
• Security Strategy
o How will external and internal security be handled? Who can access what?
o What kind of data access policies should be in place?
o How are regulatory requirements handled?
o In the event of a hack, breach, or leak, how will data exposure be handled?