DDM Soft
INTRODUCTION
Database:
A database is a collection of related data organized so that the data can be easily accessed,
managed and updated. Any piece of information can be data, for example the name of your school
or a record such as "Suresh, 25, Chennai". A database is actually a place where related pieces of
information are stored and on which various operations can be performed.
Database Management System (DBMS): The software that is used to manage a database is
called a Database Management System (DBMS). For example, MySQL, Oracle, etc. are popular
commercial DBMSs used in different applications.
Data Definition: It helps in creation, modification and removal of definitions that define the organization of
data in database.
Data Updation: It helps in insertion, modification and deletion of the actual data in the database.
Data Retrieval: It helps in retrieval of data from the database which can be used by applications for various
purposes.
User Administration: It helps in registering and monitoring users, enforcing data security, monitoring
performance, maintaining data integrity, dealing with concurrency control and recovering information
corrupted by unexpected failure.
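As a small illustration of these functions, the following SQL sketch (the table and column names are assumed, not part of the text above) shows data definition, data updation and data retrieval:
-- Data definition: create the structure of the table
CREATE TABLE student (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(40),
    city    VARCHAR(30)
);
-- Data updation: insert and modify the actual data
INSERT INTO student VALUES (1, 'Suresh', 'Chennai');
UPDATE student SET city = 'Madurai' WHERE roll_no = 1;
-- Data retrieval: read the data back for use by applications
SELECT name, city FROM student WHERE roll_no = 1;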
Database Applications:
DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server architecture is
used to deal with a large number of PCs, web servers, database servers and other
components that are connected with networks.
o The client/server architecture consists of many PCs and a workstation which are connected
via the network.
o DBMS architecture depends upon how users are connected to the database to get their
request done.
Database architecture can be seen as single-tier or multi-tier. Logically, multi-tier database
architecture is of two types: 2-tier architecture and 3-tier architecture.
1- Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can
directly work on the DBMS and use it.
o Any changes done here will directly be done on the database itself. It doesn't provide a
handy tool for end users.
o The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick response.
o 1-Tier Architecture in DBMS is the simplest architecture of a database, in which the client,
server, and database all reside on the same machine. A simple one-tier architecture example
is installing a database on your own system and accessing it to practice SQL
queries. Such an architecture is rarely used in production.
2- Tier Architecture
o The 2-Tier architecture is the same as the basic client-server model. In the two-tier architecture,
applications on the client end can directly communicate with the database at the server side.
For this interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs are run on the client-side.
o The server side is responsible for providing functionality such as query processing and
transaction management.
o To communicate with the DBMS, the client-side application establishes a connection with the
server side.
o A 2-Tier Architecture in DBMS is a database architecture where the presentation layer runs
on a client (PC, mobile, tablet, etc.), and data is stored on a server, which forms the second tier.
Two-tier architecture provides added security to the DBMS as it is not exposed to the end
user directly. It also provides direct and faster communication.
3- Tier Architecture
o The 3-Tier architecture contains another layer between the client and server. In this
architecture, client can't directly communicate with the server.
o The application on the client-end interacts with an application server which further
communicates with the database system.
o End user has no idea about the existence of the database beyond the application server. The
database also has no idea about any other user beyond the application.
o The 3-Tier architecture is used in the case of large web applications.
o The 3-Tier database architecture design is an extension of the 2-tier client-server architecture. A
3-tier architecture has the following layers: the presentation (client) layer, the application
(business logic) layer, and the database (data) layer.
DATABASE ENVIRONMENT
One of the primary aims of a database is to supply users with an abstract view of data, hiding
a certain element of how data is stored and manipulated. Therefore, the starting point for the
design of a database should be an abstract and general description of the information needs of
the organization that is to be represented in the database. And hence you will require an
environment to store data and make it work as a database.
VIEWS OF DATA/DATA ABSTRACTION:
A major purpose of a database system is to provide users with an abstract view of data, i.e., the
system hides certain details of how the data are stored and maintained.
Three Schema Architecture:
Separates the user applications and physical database. Schemas can be defined in three
levels:
(i) Internal Level:
It has an internal schema which describes the physical storage structure of the database.
It describes how the data are actually stored, using a physical data model.
It describes the complete details of data storage and access paths for the database.
(ii) Conceptual Level:
It describes what data are stored and what relationships exist among the data.
It uses a high-level or implementational data model.
It hides the details of physical storage structures and describes
data types, relationships, operations and constraints.
e.g:
typecustomer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : string;
end;
(iii) External or View Level:
It includes a number of external schemas or views.
Each external schema describes the part of the database that a particular user group needs and hides the rest.
It uses a high-level or implementational data model.
A view can hide information that a user does not need or is not authorized to see,
such as an employee's salary.
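As an illustration (a minimal SQL sketch with assumed table and column names), an external view can expose only part of a relation and hide sensitive attributes such as salary:
-- Base relation at the conceptual/logical level (assumed structure)
CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(50),
    dept     VARCHAR(30),
    salary   DECIMAL(10, 2)
);
-- External view: exposes only non-sensitive columns, hiding salary
CREATE VIEW employee_public AS
SELECT emp_id, emp_name, dept
FROM employee;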
In general, a relation schema consists of a list of attributes and their corresponding domains.
Some Common Relational Model Terms
Physical Database Schema − This schema pertains to the actual storage of data and its
form of storage, like files, indices, etc. It defines how the data will be stored in secondary
storage.
Logical Database Schema − This schema defines all the logical constraints that need to be
applied on the data stored. It defines tables, views, and integrity constraints.
A database environment is a collective system of components that comprises and regulates the
collection, management, and use of data; it consists of software, hardware, people,
techniques for handling the database, and the data itself.
Here, the hardware in a database environment means the computers and computer peripherals
that are being used to manage a database, and the software means the whole thing right from
the operating system (OS) to the application programs that include database management
software like M.S. Access or SQL Server. The people in a database environment include
those who administer and use the system. The techniques are the rules, concepts, and
instructions given to both the people and the software, along with the data, which is the group of
facts and information positioned within the database environment.
Every system environment is made up of certain components that help the system to get
organized and managed. Even the database system environment is made up of the following
components:
Users: Users may be of various types, such as the database administrator, system developers and end
users.
Database application: A database application may be personal, departmental, enterprise or
internal.
DBMS: Software that allows users to define, create and manage databases and control access to them,
e.g., MySQL, Oracle, etc.
Database: A collection of logically related data.
1. Hardware
The hardware component of the database system environment includes all the physical
devices that comprise the database system. It includes storage devices, processors, input and
output devices, printers, network devices and many more.
2. Software
The software component of the database environment includes all the software that we require
to access, store and regulate the database, such as operating systems, the DBMS, and application
programs and utilities. The operating system invokes the computer hardware and lets other
software run. The DBMS software controls and regulates the database. The application programs
and utilities access the database and, if required, can even manipulate it.
3. People
The people component includes all the people who are related to the
database. There may be a group of people who access the database just to resolve their
queries, i.e. end users, and there may be people who are involved in designing the database, i.e.
database designers.
There are four distinct types of people that participate in the DBMS environment: data and
database administrators, database designers, application developers, and the end-users.
Some may be involved in designing the applications that provide an interface through which
data entry is possible, i.e. database programmers and analysts, and some may be there to
monitor the database, i.e. the database administrator.
4. Procedures
The procedure component of the database environment consists of the instructions and rules that
regulate and control the use of the database.
5. Data
The data component includes a collection of related data, which are known facts that can be
recorded and that have an implicit meaning in the database.
System Utilities
Database system utilities are the tools that can be used by the database system administrator to
control and manage the database system.
1. Loading Utility
Loading database utility helps in loading the database file into the database. It efficiently
reformats the current format of data files to the format that is required by the destination
database file structure. Some loading programs or tools are specially designed for loading data
from one DBMS to another.
If you provide source database storage description and target database storage description to
these loading tools then it will automatically reformat the data files to target database storage
description.
2. Backup Utility
The backup utility in the database environment helps in creating a backup copy of the entire
database. Generally, the entire data of the database is copied to mass storage and we refer to it
as a backup copy. This backup copy can be used when there is a system failure or storage of
your system is corrupted.
You can always choose incremental backups which only record the changes from the previous
backup. Though the incremental backup requires a more complex algorithm it saves more
space as compared to regular backup.
3. Database Storage Reorganization Utility
Sometimes we need to relocate the set of database files to a different location. The database
storage reorganization utility helps to relocate and organize the database files to a different
location and it also produces a new access path to access the files from its new location.
4. Performance Monitoring Utility
Performance monitoring utility monitors the usage of the database by its user and provides
statistics for the same to the database administrator (DBA). The statistics provided by the
utility helps the DBA to decide whether it is required to reorganize the data files, whether
there is a need to add new indexes or not, whether some indexes to the files must be dropped
to improve the performance of the database system.
There are more utilities in the database environment that help in sorting the database files on
some basis, handling data compression on large databases, monitoring users' access to
the database, and so on.
Database Planning
Database planning covers the management activities that permit the stages of the database system
development life cycle to be realized as efficiently and effectively as possible.
Database planning must be integrated with the overall IS strategy of the organization.
There are three main issues involved in formulating an IS strategy, which are:
Identification of enterprise plans and goals with the subsequent determination of information
systems requirements
Evaluation of current information systems to find out existing strengths and weaknesses
Appraisal of IT opportunities that might yield competitive advantage
Ex.
"The purpose of our HW database system is to maintain the data that is used to
support hotel room rentals."
Once the mission statement is defined, mission objectives are defined; each objective should identify a
particular task that the database must support.
Ex.
To maintain (insert, update, delete) data on the hotels, rooms, guests, and bookings.
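Purely as an illustrative sketch (the table and column names below are assumed, not taken from the case study), such a mission objective might later map onto base tables like these:
CREATE TABLE hotel (
    hotel_no   INT PRIMARY KEY,
    hotel_name VARCHAR(40),
    city       VARCHAR(30)
);
CREATE TABLE room (
    room_no   INT,
    hotel_no  INT REFERENCES hotel(hotel_no),
    room_type VARCHAR(10),
    price     DECIMAL(8, 2),
    PRIMARY KEY (room_no, hotel_no)
);
CREATE TABLE guest (
    guest_no   INT PRIMARY KEY,
    guest_name VARCHAR(40),
    address    VARCHAR(60)
);
CREATE TABLE booking (
    hotel_no  INT REFERENCES hotel(hotel_no),
    guest_no  INT REFERENCES guest(guest_no),
    date_from DATE,
    date_to   DATE,
    room_no   INT,
    PRIMARY KEY (hotel_no, guest_no, date_from)
);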
System Definition
Describes scope and boundaries of database system and the major user views.
User view defines what is required of a database system from the perspective of:
– a particular job role (such as Manager or Supervisor) or
– enterprise application area (such as marketing, personnel, etc.).
Ø Get user requirements - collect and analyze information about the part of organization to be
supported by the database system.
Ø These requirements/features for the new database system are described in documents known as
the requirements specifications.
Ø Many techniques for gathering this information (fact-finding techniques)
Database Design
Database Design: Creating a design for a database that will support the mission statement and
mission objectives.
Ø Data Modeling is in the Database Design Phase.
Ø Building data model requires answering questions about entities, relationships, and attributes.
Three phases of database design:
– Conceptual database design
– Logical database design
– Physical database design.
Conceptual Database Design
Process of constructing a model of the data used, independent of all physical considerations.
Ø Conceptual data model is built using the information in users’ requirements specification.
Ø Ex. ER Diagram
Logical Database Design
Conceptual data model is independent of all physical considerations, a logical model is derived
knowing the underlying data model of the target DBMS.
Ø Ex. relational data model, normalization
The physical design of the database specifies a description of the physical configuration of the
database, such as the tables, file organizations, indexes, security, data types, and other parameters
in the data dictionary.
DBMS Selection
Selection of an appropriate DBMS to support the database system (if none exist).
Ø Undertaken at any time prior to logical design provided sufficient information is available
regarding system requirements.
Ø Check off DBMS features against requirements.
Ø Some DBMS examples include MySQL, Microsoft Access, SQL Server, Oracle
Implementation
Physical realization of the database and application designs.
– Use DDL to create database schemas and empty database files
– Use DDL to create any specified user views.
Data Conversion and Loading
Transferring any existing data into the new database and converting any existing applications to run on
the new database.
Ø Only required when new database system is replacing an old system.
– DBMS normally has utility that loads existing files into new database.
Ø May be possible to convert and use application programs from old system for use by new
system.
Testing
Process of running the database system with the intent of finding errors.
Ø Use carefully planned test strategies and realistic data.
Ø Demonstrates that database and application programs appear to be working according to
requirements.
Operational Maintenance
Process of monitoring and maintaining database system following installation.
Ø Monitoring performance of system.
– if performance falls, may require tuning or reorganization of the database.
Ø Maintaining and upgrading database system (when required).
Incorporating new requirements into database application.
Explanation 2:
The different phases of database development life cycle (DDLC) in the Database Management
System (DBMS) are explained below −
● Requirement analysis.
● Database design.
● Evaluation and selection.
● Logical database design.
● Physical database design.
● Implementation.
● Data loading.
● Testing and performance tuning.
● Operation.
● Maintenance.
Requirement Analysis
In order to collect all this information, a database analyst spends a lot of time within the
business organization talking to people and end users and getting acquainted with the day-to-day
processes.
Database Design
In this phase the database designers decide on the database model that best
suits the organization's requirements. The database designers study the documents
prepared by the analysts in the requirement analysis stage and then start developing a
system model that fulfils the needs.
Implementation
Database implementation needs the formation of special storage related
constructs. These constructs consist of storage groups, table spaces, data files,
tables etc.
Data Loading
Once the database has been created, the data must be loaded into the
database. The data needs to be converted if it is in a different format from that of the
new database.
Operations
In this phase, the database is accessed by the end users and application programs. This stage
includes adding new data, modifying existing data and deleting obsolete data. This
phase provides useful information and helps management to make business decisions.
Maintenance
It is one of the ongoing phases in DDLC.
The major tasks included are database backup and recovery, access management, hardware
maintenance etc.
3.REQUIREMENT COLLECTION
Before we can effectively design a database, we must know and analyze the expectations of the
users and the intended uses of the database in as much detail as possible. This process is
called requirements collection and analysis. To specify the requirements, we first identify the
other parts of the information system that will interact with the database system. These include new
and existing users and applications, whose requirements are then collected and analyzed. Typically,
the following activities are part of this phase:
The major application areas and user groups that will use the database or whose work will be
affected by it are identified. Key individuals and committees within each group are chosen to carry
out subsequent steps of requirements collection and specification.
Existing documentation concerning the applications is studied and analyzed. Other
documentation—policy manuals, forms, reports, and organization charts—is reviewed to determine
whether it has any influence on the requirements collection and specification process.
The current operating environment and planned use of the information is studied. This
includes analysis of the types of transactions and their frequencies as well as of the flow of
information within the system. Geographic characteristics regarding users, origin of transactions,
destination of reports, and so on are studied. The input and output data for the transactions are
specified.
Written responses to sets of questions are sometimes collected from the potential database
users or user groups. These questions involve the users’ priorities and the importance they place on
various applications. Key individuals may be interviewed to help in assessing the worth of
information and in setting up priorities.
Requirement analysis is carried out for the final users, or customers, of the database system by a
team of system analysts or requirement experts. The initial requirements are likely to be informal,
incomplete, inconsistent, and partially incorrect. Therefore, much work needs to be done to
transform these early requirements into a specification of the application that can be used by
developers and testers as the starting point for writing the implementation and test cases. Because
the requirements reflect the initial understanding of a system that does not yet exist, they will
inevitably change. Therefore, it is important to use techniques that help customers converge
quickly on the implementation requirements.
There is evidence that customer participation in the development process increases customer
satisfaction with the delivered system. For this reason, many practitioners use meetings and
workshops involving all stakeholders. One such methodology of refining initial system
requirements is called Joint Application Design (JAD). More recently, techniques have been
developed, such as Contextual Design, which involve the designers becoming immersed in the
workplace in which the application is to be used. To help customer representatives better
understand the proposed system, it is common to walk through workflow or transaction scenarios
or to create a mock-up rapid prototype of the application.
The preceding methods help structure and refine requirements but leave them still in an informal
state. To transform requirements into a better-structured representation, requirements
specification techniques are used. These include object-oriented analysis (OOA), data flow
diagrams (DFDs), and the refinement of application goals. These methods use diagramming
techniques for organizing and presenting information-processing requirements. Additional
documentation in the form of text, tables, charts, and decision requirements usually accompanies
the diagrams. There are techniques that produce a formal specification that can be checked
mathematically for consistency and what-if symbolic analyses. These methods may become
standard in the future for those parts of information systems that serve mission-critical functions
and which therefore must work as planned. The model-based formal specification methods, of
which the Z-notation and methodology is a prominent example, can be thought of as extensions of
the ER model and are therefore the most applicable to information system design.
Some computer-aided techniques—called Upper CASE tools—have been proposed to help check
the consistency and completeness of specifications, which are usually stored in a single repository
and can be displayed and updated as the design progresses. Other tools are used to trace the links
between requirements and other design entities, such as code modules and test cases.
Such traceability databases are especially important in conjunction with enforced change-
management procedures for systems where the requirements change frequently. They are also used
in contractual projects where the development organization must provide documentary evidence to
the customer that all the requirements have been implemented.
The requirements collection and analysis phase can be quite time-consuming, but it is crucial to the
success of the information system. Correcting a requirements error is more expensive than
correcting an error made during implementation because the effects of a requirements error are
usually pervasive, and much more downstream work has to be reimplemented as a result. Not
correcting a significant error means that the system will not satisfy the customer and may not even
be used at all. Requirements gathering and analysis is the subject of entire books.
When are Fact-Finding Techniques Used?
Many situations arise for fact-finding during the database system development life cycle. However,
fact-finding is particularly vital during the early stages of the life cycle, which include the database
planning, system definition, and requirements collection and analysis stages. It is during these early
stages where the database developer captures the necessary facts essential to build the required
database. Fact-finding is also used in the case of database design and the later stages of the
lifecycle but to a lesser extent. It is to be noted that it is important to make a rough estimation of
how much time and effort is required to be spent on fact-finding for a database project.
Fact-Finding Techniques
A database developer commonly uses several fact-finding techniques during a single database
project. There are five widely used fact-finding techniques:
Examining documentation
Interviewing
Observing the enterprise in action
Research
Questionnaires
1. Examining documentation can be helpful when you try to gain some insight as to how the
requirement for a database arose. You may also find that documentation can help to acquire
information on the part of the enterprise associated with the problem. If the problem relates to the
current system, there should be documents associated with that system. By examining the
documents, forms, reports, and files associated with the current system, you can quickly gain some
understanding of the system.
2. Interviewing is the most frequently used, and usually the most useful, fact-finding technique.
We can interview to collect information from people face-to-face. There can be several objectives
for using interviewing, such as finding out facts, verifying facts, clarifying facts, generating
enthusiasm, getting the end-user involved, identifying requirements, and gathering ideas and
opinions. However, the interviewing technique requires good communication skills for dealing
effectively with people who have different values, priorities, opinions, motivations, and
personalities.
3. Observing the enterprise in action: Observation is one of the most successful fact-finding
techniques carried out for understanding a system. Using this technique, it is possible to either
participate in or observe a person performing activities to learn about the system.
4. Research: A useful fact-finding technique is to research the application or the problem that you are
dealing with and want to put within a database. Computer trade journals, reference books, and the
Internet are good sources of information that can make available the vast quantity of information on
how others have solved similar problems/issues plus whether or not any software packages exist to
resolve or even partially solve your current problem.
5. Questionnaires: Another fabulous fact-finding method is to conduct surveys through
questionnaires. Questionnaires are special-purpose documents that allow facts to be gathered from a
large number of people while upholding some control over their responses. When dealing with a
large number of listeners or audience, no other fact-finding technique can tabulate the same facts so
efficiently. There are two types of questions that can be asked in a questionnaire, namely free-
format and fixed-format. Free-format questions offer the respondent greater freedom in providing
answers. Fixed-format questions require specific responses from individuals, and for a given
question, the respondent must choose from the available answers.
Collecting Data
Collecting data is relatively easy, but turning raw information into something useful requires that
you know how to extract precisely what you need. In this module, intermediate to experienced
programmers interested in data analysis will learn techniques for working with data in a business
environment. You will learn how to look at data to discover what it contains, how to capture those
ideas in conceptual models, and then feed your understanding back into the organization through
business plans, metrics dashboards, and other applications. Along the way, you will experiment
with concepts through hands-on exercises at various points in the module.
As mentioned earlier in this course, Requirements Analysis is the most important and most
labor-intensive stage in the DBLC. It is critical for the designer to approach Requirements
Analysis armed with a plan for each task in the process.
Experience is the great teacher when it comes to assessing informational needs, but there is
no substitute for preparation, especially for new designers. Most database designers begin
Requirements Analysis by examining the existing database(s) to establish a framework for
the remaining tasks. Analyzing how an organization stores data about its business objects[1],
and scrutinizing its perception of how it uses stored data (for example, gaining familiarity
with its business rules)[2] provides that framework.
Requirements Analysis
The modeler works with the end users of an organization to determine the data requirements of the
database. Information needed for the requirements analysis can be gathered in several ways:
Review of existing documents: Such documents include existing forms and reports, written
guidelines, job descriptions, and personal narratives. Paper documentation is a good way to become
familiar with the organization or activity you need to model.
Following are some of the tips for making the requirements collection process successful:
● Never assume that you know the customer's requirements. What you usually
think could be quite different from what the customer wants. Therefore, always verify with
the customer when you have an assumption or a doubt.
● Get the end-users involved from the start. Get their support for what you do.
● At the initial levels, define the scope and get the customer's agreement. This helps you to
successfully focus on the scope of features.
● When you are in the process of collecting the requirements, make sure that the
requirements are realistic, specific and measurable.
● Focus on making the requirements document crystal clear. The requirements document is the
only way to get the client and the service provider to an agreement. Therefore, there
should not be any gray areas in this document. If there are gray areas, consider that they could
lead to potential business issues.
● Do not talk about the solution or the technology to the client until all the requirements
are gathered. You are not in a position to promise or indicate anything to the client until
you are clear about the requirements.
● Before moving into any other project phases, get the requirements document signed off
by the client.
● If necessary, create a prototype to visually illustrate the requirements.
Finalizing the requirements of the system to be built forms the backbone for the ultimate success of the
project. It not only includes ascertaining the functions, but also the constraints of the system. The latter part
is very important, as the customer needs to be very clear about the services that are going to be offered by the
system. This avoids conflicts during delivery or intermediate meetings with the client, where the
client assumes that the system provides functions which are actually constraints of the system.
When the requirements of the system are inaccurate, it may lead to the following problems:
1. Delivery schedules may be slipped.
2. Developed system may be rejected by the client leading to the loss of reputation and amount
spent on the project.
3. System developed may be unreliable.
4. Overall cost of the project may exceed the estimates.
There are different ways of finding the system requirements. Two of them are joint application development
and prototyping.
4. DATABASE DESIGN
The main objectives of database design in DBMS are to produce logical and physical design
models of the proposed database system.
The logical model concentrates on the data requirements and the data to be stored independent
of physical considerations. It does not concern itself with how the data will be stored or where
it will be stored physically.
The physical data design model involves translating the logical DB design of the database
onto physical media using hardware resources and software systems such as database
management systems (DBMS).
Requirements analysis
● Planning – This stage of database design is concerned with planning of the
entire Database Development Life Cycle. It takes into consideration the Information
Systems strategy of the organization.
● System definition – This stage defines the scope and boundaries of the proposed
database system.
Database designing
● Logical model – This stage is concerned with developing a database model based on
requirements. The entire design is on paper without any physical implementations or
specific DBMS considerations.
● Physical model – This stage implements the logical model of the database taking into
account the DBMS and physical implementation factors.
Implementation
Testing
– this stage is concerned with the identification of errors in the newly implemented
system. It checks the database against requirement specifications.
The methodology is depicted as a step-by-step guide to the three main phases of database design,
namely: conceptual, logical, and physical design.
Conceptual database design - to build the conceptual representation of the database, which
includes identifying the important entities, relationships, and attributes.
Logical database design - to convert the conceptual representation to the logical structure
of the database, which includes designing the relations.
Physical database design - to decide how the logical structure is to be physically
implemented (as base relations) in the target Database Management System (DBMS).
Database Design
Database design is the process of creating a design that will support the enterprise's mission
statement and mission objectives for the required database system. Two main approaches to the
design of a database are followed. These are:
bottom-up and
top-down
The bottom-up approach starts at the fundamental level of attributes (i.e., properties of entities
and relationships), which through analysis of the associations between attributes, are clustered into
relations that signify types of entities and relationships between entities.
A more appropriate strategy for the design of complex databases is to use the top-down
approach, which starts with the development of data models that holds few high-level entities and
relationships and then apply consecutive top-down refinements to identify lower-level entities,
relationships, and the associated attributes. The top-down approach can be understood better using
the concepts of the Entity-Relationship (ER) model, beginning with the identification of entities
and relationships between the entities, which are of interest to the organization.
A structured approach that uses procedures, techniques, tools, and documentation to support
and facilitate the process of design is called a design methodology.
A design methodology encapsulates various phases, each containing some stages, which guide the
designer in the techniques suitable at each stage of the project. A design methodology also helps
the designer to plan, manage, control, and evaluate database development and managing projects.
Furthermore, it is a planned approach for analyzing and modeling a group of requirements for a
database in a standardized and ordered manner.
Conceptual
Logical and
Physical database design
Conceptual Database Design
In this design methodology, the process of constructing a model of the data is used in an enterprise,
independent of all physical considerations. The conceptual database design phase starts with the
formation of a conceptual data model of the enterprise that is entirely independent of
implementation details such as the target DBMS, use of application programs, programming
languages used, hardware platform, performance issues, or any other physical deliberations.
Several planning strategies are often critical to the success of database design; these factors are
built into the methodology that is presented for database design.
The first step in conceptual database design is to build one (or more) conceptual data model of the
data requirements of the enterprise. A conceptual data model comprises the following elements:
entity types
types of relationship
attributes and the various attribute domains
primary keys and alternate keys
integrity constraints
The conceptual data model is maintained by documentation, including ER diagrams and a data
dictionary, which is produced throughout the development of the model.
you have already come across the basics of what methodologies are and their stages. You have
gathered the basic concept of what conceptual methodology is and how it works within the main
stages of the database system development life cycle.
A local logical data model is used to characterize the data requirements of one or more but not all
user views of a database, and a universal logical data model represents the data requirements for all
user views. The final step of the logical database design phase is to reflect on how well the model
can support possible future developments for the database system.
The objective of the logical database design methodology is to translate the conceptual data model into
a logical data model and then to validate this model to check whether it is structurally correct and
able to support the required transactions.
In this step of the database development life cycle, the main purpose is to translate the conceptual
data model created in conceptual methodology (of the previous chapter) into a logical data model
of the data requirements of the enterprise. This objective can be achieved by following the
activities given below:
The structure of the relational schema is validated using normalization. It then makes sure
that the relations are capable of supporting the transactions given in the users' requirements
specification. You can then check all the important integrity constraints that are captured by
the logical data model. At this stage, the logical data model is validated by the users to ensure that
they consider the model to be a true representation of the data requirements of the enterprise.
The relationship that an entity has with other entities is characterized using the primary key or
foreign key's concept. In deciding where to post the foreign key attribute(s), firstly, you must have
to identify the 'parent' and 'child' entities that are involved in that relationship. The parent entity
refers to the entity that posts a copy of its primary key into the relation that represents the child
entity to act as the foreign key. You can describe how relations are obtained for the following
structures that may occur in a conceptual data model:
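For instance, in a one-to-many relationship the parent entity posts its primary key into the child relation as a foreign key. A minimal SQL sketch of this idea (department and emp are assumed names, not from the text above):
-- Parent entity
CREATE TABLE department (
    dept_no   INT PRIMARY KEY,
    dept_name VARCHAR(30)
);
-- Child entity: receives a copy of the parent's primary key as a foreign key
CREATE TABLE emp (
    emp_no   INT PRIMARY KEY,
    emp_name VARCHAR(40),
    dept_no  INT,
    FOREIGN KEY (dept_no) REFERENCES department(dept_no)
);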
In the previous step, you derived a set of relations representing the conceptual data model
created earlier. Now, in the next step, you have to validate the groupings of attributes in
each relation using the rules of normalization. The purpose of normalization is to ensure that the
set of relations has a minimal and yet sufficient number of attributes necessary to support the
data requirements of the enterprise.
The primary purpose of this step is to validate the logical data model to make certain that the model
supports the required transactions, as given in the users' requirements specification. By using the relations,
the primary key / foreign key links within the relations, the ER diagram, and the data dictionary,
you can attempt to perform the operations manually. If you can resolve all transactions in this way,
you can validate the logical data model against the transactions.
This physical methodology is the third and final phase of the database design methodology. Here,
the designer must decide how to translate the logical database design (i.e., the entities, attributes,
relationships, and constraints) into a physical database design, which can ultimately be
implemented using the target DBMS. As the various parts of physical database design are highly
reliant on the target DBMS, there may be more than one method of implementing any given
portion of the database. Consequently, to do this work appropriately, the designers must be fully
aware of the functionality of the target DBMS. They must recognize the advantages and
disadvantages of each alternative approach for a particular accomplishment. For some systems, the
designer may also need to select a suitable storage space/strategy that can take account of intended
database usage.
It is the process of producing a description of the implementation of the database on secondary storage;
it describes the base relations, file organizations and indexes used to achieve efficient
access to the data, and any associated integrity constraints and security measures.
In designing and presenting a database design methodology, you have to divide the design process
into three main stages or steps, also known as the Database development life cycle. These steps or
stages are:
Conceptual
Logical and
Physical database design
The phase before the physical design is the logical database design, which is largely independent of
implementation details, such as the specific functionality of the target DBMS and application
programs, but is reliant on the target data model. The outcome of this process is a logical data
model that consists of an ER/relation diagram, relational schema, and supporting documents that
depict this model, such as a data dictionary.
Logical database designs are concerned with the "what," and in contrast, physical database design
is concerned with the "how." It requires diverse skills that are often found in different people. In
particular, the physical database designer must know how the computer system hosts the DBMS
and how it operates and must be fully conscious of the working of the target DBMS.
A physical data model typically has the following characteristics:
● It typically illustrates data requirements for a single project or application, sometimes even
a part of an application.
● It may be incorporated into other physical data models by means of a repository of shared
entities.
● It typically includes 10-1000 tables, although these numbers are highly variable depending
on the scope of the data model.
● It contains the relationships between tables that address cardinality and nullability (optionality)
of the relationships.
● It is designed and developed to be reliant on a specific version of a DBMS, storage location of
data, or technology.
● Database columns have data types with accurate precisions and lengths assigned to
them, and columns have nullability (optionality) assigned.
● Tables and columns have specific definitions.
5. What is ER Modeling?
An Entity–relationship model (ER model) describes the structure of a database with the help of a
diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER model is a
design or blueprint of a database that can later be implemented as a database. The main
components of E-R model are: entity set and relationship set.
An ER diagram shows the relationship among entity sets. An entity set is a group of similar entities
and these entities can have attributes. In terms of DBMS, an entity is a table or attribute of a table
in a database, so by showing the relationships among tables and their attributes, an ER diagram shows the
complete logical structure of a database. Let's have a look at a simple ER diagram to understand
this concept.
ENTITY
An entity is an object that exists and is distinguishable from other objects.
Example: specific person, company, event, plant
An entity set is a set of entities of the same type that share the same properties.
Example: set of all persons, companies, trees, holidays
E-R DIAGRAM (ENTITY RELATIONSHIP DIAGRAM)
ER-Diagram is a visual representation of data that describes how data is related to each other.
The E-R model is graphical in nature, thus making it easy to analyze and observe the relationships
between data elements
Most DBMS are based upon E-R model
A simple ER Diagram:
In the following diagram we have two entities Student and College and their relationship. The
relationship between Student and College is many to one as a college can have many students
however a student cannot study in multiple colleges at the same time. Student entity has attributes
such as Stu_Id, Stu_Name & Stu_Addr and College entity has attributes such as Col_ID &
Col_Name.
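One possible relational translation of this diagram is sketched below in SQL (a hedged example; the exact names are assumed, and the foreign-key placement reflects the many-to-one relationship):
CREATE TABLE college (
    col_id   INT PRIMARY KEY,
    col_name VARCHAR(50)
);
CREATE TABLE student (
    stu_id   INT PRIMARY KEY,
    stu_name VARCHAR(40),
    stu_addr VARCHAR(60),
    col_id   INT,                -- many students map to one college
    FOREIGN KEY (col_id) REFERENCES college(col_id)
);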
Here are the geometric shapes and their meaning in an E-R Diagram. We will discuss these terms
in detail in the next section (Components of an ER Diagram) of this guide, so don't worry too much
about these terms now; just go through them once.
1. Entity
2. Attribute
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, student roll number
can uniquely identify a student from a set of students. Key attribute is represented by oval same as
other attributes however the text of key attribute is underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as composite attribute. For example,
In student entity, the student address is a composite attribute as an address is composed of other
attributes such as pin code, state, country.
3. Multivalued attribute:
An attribute that can hold multiple values is known as multivalued attribute. It is represented
with double ovals in an ER Diagram. For example, a person can have more than one phone
number, so the phone number attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is
represented by dashed oval in an ER Diagram. For example – Person age is a derived attribute as it
changes over time and can be derived from another attribute (Date of birth).
E-R diagram with multivalued and derived attributes:
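As a hedged SQL sketch (table and column names assumed), a multivalued attribute such as a phone number is typically moved into a separate table, while a derived attribute such as age is computed when needed rather than stored:
CREATE TABLE person (
    person_id     INT PRIMARY KEY,
    person_name   VARCHAR(40),
    date_of_birth DATE                       -- stored attribute
);
-- Multivalued attribute: one row per phone number
CREATE TABLE person_phone (
    person_id INT REFERENCES person(person_id),
    phone_no  VARCHAR(20),
    PRIMARY KEY (person_id, phone_no)
);
-- Derived attribute: age is computed (approximately) from date_of_birth instead of being stored;
-- exact date arithmetic varies by DBMS
SELECT person_name,
       EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM date_of_birth) AS age
FROM person;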
3. Relationship
When a single instance of an entity is associated with a single instance of another entity, it is
called a one-to-one relationship. For example, a person has only one passport and a passport is given
to one person.
When a single instance of an entity is associated with more than one instance of another entity,
it is called a one-to-many relationship. For example, a customer can place many orders, but an
order cannot be placed by many customers.
When more than one instance of an entity is associated with more than one instance of another
entity, it is called a many-to-many relationship. For example, a student can be assigned to many projects
and a project can be assigned to many students.
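Continuing the earlier sketch, a many-to-many relationship such as the one between Student and Project is usually implemented with a separate junction table whose primary key combines the keys of both sides (the names below are assumed):
CREATE TABLE project (
    proj_id    INT PRIMARY KEY,
    proj_title VARCHAR(50)
);
-- Junction (associative) table: one row per student-project assignment
CREATE TABLE assigned_to (
    stu_id  INT REFERENCES student(stu_id),
    proj_id INT REFERENCES project(proj_id),
    PRIMARY KEY (stu_id, proj_id)
);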
Total participation of an entity set means that each entity in the entity set must participate in at least one
relationship in the relationship set. It is also called mandatory participation. For example, in the
following diagram each college must have at least one associated student. Total participation is
represented using a double line between the entity set and the relationship set.
Partial participation of an entity set means that each entity in the entity set may or may not
participate in a relationship instance in that relationship set. It is also called optional
participation.
Partial participation is represented using a single line between the entity set and the relationship set.
Example: Consider an IT company. There are many employees working for the
company. Let's take the example of the relationship between the employee entity and the role of
software engineer. Every software engineer is an employee, but not every employee is a software
engineer, as there are employees in other roles as well, such as housekeeping, managers, the CEO, etc.
So we can say that the participation of the employee entity set in the software engineer relationship is partial.
Super class and Sub class
A super class is an entity that can be divided into sub groups; for example, a super class Shape has the
sub groups Triangle, Square and Circle.
Sub classes are groups of entities with some unique attributes. A sub class inherits the properties
and attributes of its super class.
Specialization and Generalization
Generalization is the process of creating a generalized entity which contains the common attributes or
properties of the entities being generalized.
It is a bottom-up process; for example, consider three sub-entities Car, Truck and Motorcycle. These
three entities can be generalized into one super class named Vehicle.
Specialization is a process of identifying subsets of an entity that share some distinguishing
characteristics. It is a top-down approach in which one entity is broken down into lower-level entities.
In the above example, the Vehicle entity can be a Car, Truck or Motorcycle.
Generalization Example
Consider two entities, Student and Teacher. These two entities have two common attributes, Name and
Address, so we can make a generalized entity with these common attributes. Let's have a look at the
ER model after generalization.
The ER diagram after generalization:
We have created a new generalized entity Person, and this entity has the common attributes of both
the entities. As you can see in the following ER diagram, after the generalization process the
entities Student and Teacher have only the specialized attributes Grade and Salary respectively, and
their common attributes (Name & Address) are now associated with the new entity Person, which is
in a relationship with both the entities (Student & Teacher).
Note:
1. Generalization uses bottom-up approach where two or more lower level entities combine
together to form a higher level new entity.
2. The new generalized entity can further combine together with lower level entity to create a
further higher level generalized entity.
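One common way to map this generalization to tables, shown here only as an assumed sketch, is a table for the common attributes of Person plus one table per lower-level entity:
CREATE TABLE person (
    person_id INT PRIMARY KEY,
    name      VARCHAR(40),
    address   VARCHAR(60)
);
-- Specialized attribute of Student: grade
CREATE TABLE student (
    person_id INT PRIMARY KEY REFERENCES person(person_id),
    grade     VARCHAR(5)
);
-- Specialized attribute of Teacher: salary
CREATE TABLE teacher (
    person_id INT PRIMARY KEY REFERENCES person(person_id),
    salary    DECIMAL(10, 2)
);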
DBMS Specialization
Specialization is a process in which an entity is divided into sub-entities. You can think of it as a
reverse process of generalization, in generalization two entities combine together to form a new
higher level entity. Specialization is a top-down process.
The idea behind specialization is to find subsets of entities that have a few distinguishing attributes.
For example, consider an entity Employee, which can be further classified into the sub-entities
Technician, Engineer & Accountant because these sub-entities have some distinguishing attributes.
Specialization Example
In the above diagram, we can see that we have a higher-level entity "Employee" which we have
divided into the sub-entities "Technician", "Engineer" & "Accountant". All of these are just employees
of a company; however, their roles are completely different and they have a few different attributes. Just
for the example, I have shown that the Technician handles service requests, the Engineer works on a
project and the Accountant handles the credit & debit details. All three employee types have a
few common attributes, such as name & salary, which we have left associated with the parent entity
"Employee", as shown in the above diagram.
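An alternative mapping for specialization, again only an assumed sketch, keeps a single table with a type discriminator and role-specific columns:
CREATE TABLE employee (
    emp_id       INT PRIMARY KEY,
    name         VARCHAR(40),
    salary       DECIMAL(10, 2),
    emp_type     VARCHAR(12) CHECK (emp_type IN ('TECHNICIAN', 'ENGINEER', 'ACCOUNTANT')),
    service_area VARCHAR(30),    -- used only by technicians
    project_no   INT,            -- used only by engineers
    ledger_no    INT             -- used only by accountants
);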
Category or Union
Relationship of one super or sub class with more than one super class.
Aggregation Example
In the real world, we know that a manager not only manages the employees working under them but
also has to manage the project. In such a scenario, if the entity "Manager" makes a "manages"
relationship with either the "Employee" or the "Project" entity alone, it will not make sense,
because the manager has to manage both. In such cases the relationship of two entities acts as one entity. In
our example, the relationship "Works-On" between "Employee" & "Project" acts as one entity that
has a relationship "Manages" with the entity "Manager".
Advantages of the EER Model
● It is quite simple to develop and maintain. In addition, it is easy to understand and
interpret, technically speaking.
● Everything that is visually represented is easier to understand and maintain, and the same
goes for EER models.
● It has been an efficient tool for database designers. It serves as a communication tool and
helps display the relationships between entities.
● You can always convert the EER model into tables; thus, it can easily be integrated into a
relational model.
Disadvantages of the EER Model
● EER diagrams have many constraints and come with limited features.
Types of UML: The UML diagrams are divided into two parts: Structural UML diagrams and
Behavioral UML diagrams which are listed below:
1. Structural UML diagrams
Class diagram
Package diagram
Object diagram
Component diagram
Composite structure diagram
Deployment diagram
2. Behavioral UML diagrams
Activity diagram
Sequence diagram
Use case diagram
State diagram
Communication diagram
Interaction overview diagram
Timing diagram
Class Diagram
A class consists of its objects, and it may also inherit from other classes. A class diagram is used to
visualize, describe and document various aspects of the system, and also to construct executable
software code.
It shows the attributes, classes, functions, and relationships to give an overview of the software
system. It constitutes class names, attributes, and functions in a separate compartment that helps in
software development. Since it is a collection of classes, interfaces, associations, collaborations,
and constraints, it is termed as a structural diagram.
The main purpose of class diagrams is to build a static view of an application. It is the only
diagram that is widely used for construction, and it can be mapped with object-oriented languages.
It is one of the most popular UML diagrams. A class in a class diagram is divided into three
sections, described below:
o Upper Section: The upper section encompasses the name of the class. A class is a
representation of similar objects that shares the same relationships, attributes, operations,
and semantics. Some of the following rules that should be taken into account while
representing a class are given below:
a. Capitalize the initial letter of the class name.
b. Place the class name in the center of the upper section.
c. A class name must be written in bold format.
d. The name of an abstract class should be written in italics.
o Middle Section: The middle section constitutes the attributes, which describe the quality of
the class. The attributes have the following characteristics:
a. The attributes are written along with their visibility factors, which are public (+), private (-),
protected (#), and package (~).
b. The accessibility of an attribute is illustrated by the visibility factors.
c. A meaningful name should be assigned to each attribute, which will explain its usage
inside the class.
o Lower Section: The lower section contains methods or operations. The methods are
represented in the form of a list, where each method is written in a single line. It
demonstrates how a class interacts with data.
Relationships
In UML, relationships are of three types:
Multiplicity: It defines a specific range of allowable instances of an attribute. If a range is not
specified, one is considered the default multiplicity.
Aggregation: It portrays a whole-part relationship in which the part can exist independently of the
whole. For example, a company encompasses a number of employees, yet even if one employee resigns,
the company still exists.
Composition: The composition is a subset of aggregation. It portrays the dependency between the
parent and its child, which means that if one part is deleted, then the other part is also discarded. It
represents a whole-part relationship.
For example, a contact book consists of multiple contacts, and if you delete the contact book, all the
contacts will be lost.
Abstract Classes
In the abstract class, no objects can be a direct entity of the abstract class. The abstract class can
neither be declared nor be instantiated. It is used to find the functionalities across the classes. The
notation of the abstract class is similar to that of class; the only difference is that the name of the
class is written in italics. Since it does not involve any implementation for a given function, it is
best to use the abstract class with multiple objects.
Let us assume that we have an abstract class named Displacement with a method declared inside it
called drive(). Now, this abstract class method can be implemented by any object, for example, car,
bike, scooter, cycle, etc.
How to draw a Class Diagram?
The class diagram is used most widely to construct software applications. It not only represents a
static view of the system but also all the major aspects of an application. A collection of class
diagrams as a whole represents a system.
Some key points that are needed to keep in mind while drawing a class diagram are given below:
The class diagram is used to represent a static view of the system. It plays an essential role in the
establishment of the component and deployment diagrams. It helps to construct an executable code
to perform forward and backward engineering for any system, or we can say it is mainly used for
construction. It represents the mapping with object-oriented languages that are C++, Java, etc.
Class diagrams can be used for these purposes.
Object Diagram
Object diagrams are dependent on the class diagram as they are derived from the class diagram. An object diagram
represents an instance of a class diagram. The objects help in portraying a static view of an object-
oriented system at a specific instant.
Both the object and class diagram are similar to some extent; the only difference is that the class
diagram provides an abstract view of a system. It helps in visualizing a particular functionality of a
system.
Notation of an Object Diagram
The object diagram serves the same purpose as a class diagram. The class diagram provides
an abstract view that comprises classes and their relationships, whereas the object diagram
represents an instance at a particular point of time.
The object diagram therefore corresponds closely to the concrete (actual) system behaviour. Its main
purpose is to depict a static view of the system at a given instant.
2. Dynamic changes are not included in the class diagram, whereas dynamic changes are captured
in the object diagram.
3. The data values and attributes of an instance are not involved in the class diagram, whereas
the object diagram incorporates the data values and attributes of an entity.
4. The behaviour of objects is described abstractly in the class diagram, whereas the object
diagram shows objects that are instances of a class.
UNIT II RELATIONAL MODEL AND SQL 10
Relational model concepts -- Integrity constraints -- SQL Data manipulation
– SQL Data definition – Views -- SQL programming.
Update Operation
You can see that in the below-given relation table CustomerName= ‘Apple’ is
updated from Inactive to Active.
Delete Operation
To specify deletion, a condition on the attributes of the relation selects the tuple
to be deleted.
Properties of Relations
The name of the relation is distinct from all other relations.
Each cell of the relation contains exactly one atomic (single) value.
Each attribute has a distinct name.
The order of attributes has no significance.
No two tuples in a relation are identical (no duplicate tuples).
The order of tuples has no significance; tuples may be stored in any sequence.
INTEGRITY CONSTRAINTS
Integrity constraints are a set of rules. It is used to maintain the quality of
information.
Integrity constraints ensure that the data insertion, updating, and other
processes have to be performed in such a way that data integrity is not
affected.
Thus, integrity constraint is used to guard against accidental damage to
the database.
Types of Integrity Constraint
1. Domain constraints
A domain constraint specifies the valid set of values for an attribute.
The domain's data type may be string, character, integer, time, date,
currency, etc. The value of the attribute must come from the
corresponding domain.
Example:
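As an illustrative sketch (the table and column names here are not from the text), a domain constraint can be expressed through the column data type together with a CHECK clause:
CREATE TABLE person
(
    pid   INT,
    pname VARCHAR(30),
    age   INT CHECK (age BETWEEN 0 AND 120)  -- values outside this domain are rejected
);
-- An insert such as (1, 'AAA', 'abc') would be rejected,
-- because 'abc' is not in the domain of the INT data type.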
4. Key constraints
Keys are attributes, or sets of attributes, used to identify an entity
uniquely within its entity set.
An entity set can have multiple keys, but one of them is chosen as the
primary key. A primary key value must be unique for every tuple in the
relational table and can never be null.
Example:
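As an illustrative sketch (the table and column names are assumed, not taken from the text), a key constraint can be declared as follows:
CREATE TABLE student
(
    regid  INT PRIMARY KEY,   -- primary key: value must be unique and can never be null
    rollno INT UNIQUE,        -- another candidate key
    sname  VARCHAR(30)
);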
Introduction to DDL
DDL stands for Data Definition Language.
It is a language used for defining and modifying the structure of data rather than the data itself.
It is used to build and modify the structure of your tables and other
objects in the database.
DDL commands are as follows,
1. CREATE
2. DROP
3. ALTER
4. RENAME
5. TRUNCATE
These commands can be used to add, remove or modify tables within a
database.
DDL has pre-defined syntax for describing the data.
1. CREATE COMMAND
CREATE command is used for creating objects in the database.
It creates a new table.
Syntax:
CREATE TABLE <table_name>
( column_name1 datatype,
column_name2 datatype,
.
.
.
column_name_n datatype
);
Example : CREATE command
CREATE TABLE employee
(
empid INT,
ename CHAR(30),
age INT,
city CHAR(25),
phone_no VARCHAR(20)
);
2. DROP COMMAND
DROP command allows to remove entire database objects from the
database.
It removes entire data structure from the database.
It deletes a table, index or view.
Syntax:
DROP TABLE <table_name>;
OR
DROP DATABASE <database_name>;
Example : DROP Command
DROP TABLE employee;
OR
DROP DATABASE employees;
3. ALTER COMMAND
The ALTER command allows you to alter or modify the structure of the
database.
It modifies an existing database object.
Using this command, you can add a new column, drop an existing
column, and even change the data type of a column.
Syntax:
ALTER TABLE <table_name>
ADD <column_name datatype>;
OR
ALTER TABLE <table_name>
MODIFY <column_name new_datatype>;
OR
ALTER TABLE <table_name>
DROP COLUMN <column_name>;
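For instance, the employee table created earlier could be altered as follows (the column names added or changed here are illustrative, and the exact ADD/MODIFY keyword varies slightly between DBMSs):
ALTER TABLE employee
ADD email VARCHAR(50);

ALTER TABLE employee
MODIFY city VARCHAR(50);

ALTER TABLE employee
DROP COLUMN phone_no;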
4. RENAME COMMAND
RENAME command is used to rename an object.
It renames a database table.
Syntax:
RENAME TABLE <old_name> TO <new_name>;
Example:
RENAME TABLE emp TO employee;
5. TRUNCATE COMMAND
TRUNCATE command is used to delete all the rows from the table
permanently.
It removes all the records from a table, including all spaces allocated for
the records.
This command is similar to the DELETE command, but TRUNCATE
does not generate any rollback data, so the operation cannot be undone.
Syntax:
TRUNCATE TABLE <table_name>;
Example:
TRUNCATE TABLE employee;
SQL COMMANDS
SQL commands are instructions. It is used to communicate with the database. It is
also used to perform specific tasks, functions, and queries of data.
SQL can perform various tasks like create a table, add data to tables, drop the table,
modify the table, set permission for users.
c. ALTER: It is used to alter the structure of the database. This change could be
either to modify the characteristics of an existing attribute or probably to add a
new attribute.
Syntax:
To add a new column in the table:
ALTER TABLE table_name ADD column_name COLUMN-definition;
To modify an existing column in the table:
ALTER TABLE table_name MODIFY (column_definitions...);
EXAMPLE
ALTER TABLE STU_DETAILS ADD (ADDRESS VARCHAR2(20));
ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));
d. TRUNCATE: It is used to delete all the rows from the table and free the
space containing the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE EMPLOYEE;
SQL VIEWS
SQL CREATE VIEW Statement
In SQL, a view is a virtual table based on the result-set of an SQL statement.
A view contains rows and columns, just like a real table. The fields in a view
are fields from one or more real tables in the database.
You can add SQL statements and functions to a view and present the data as if
the data were coming from one single table.
A view is created with the CREATE VIEW statement.
CREATE VIEW Syntax
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Note: A view always shows up-to-date data! The database engine recreates the
view, every time a user queries it.
SQL CREATE VIEW Examples
The following SQL creates a view that shows all customers from Brazil:
Example
CREATE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName
FROM Customers
WHERE Country = 'Brazil';
We can query the view above as follows:
Example
SELECT * FROM [Brazil Customers];
The following SQL creates a view that selects every product in the "Products"
table with a price higher than the average price:
Example
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);
We can query the view above as follows:
Example
SELECT * FROM [Products Above Average Price];
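Although not covered above, a view can later be replaced or removed; the statements below are an illustrative sketch using standard SQL (CREATE OR REPLACE VIEW and DROP VIEW):
CREATE OR REPLACE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName, City
FROM Customers
WHERE Country = 'Brazil';

DROP VIEW [Brazil Customers];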
Creating Views
Database views are created using the CREATE VIEW statement. Views can be
created from a single table, multiple tables or another view.
To create a view, a user must have the appropriate system privilege according to
the specific implementation.
The basic CREATE VIEW syntax is as follows −
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in a similar way as
you use them in a normal SQL SELECT query.
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example to create a view from the CUSTOMERS table. This
view would be used to have customer name and age from the CUSTOMERS
table.
SQL > CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in a similar way as you query an
actual table. Following is an example for the same.
SQL > SELECT * FROM CUSTOMERS_VIEW;
This would produce the following result.
+----------+-----+
| name     | age |
+----------+-----+
| Ramesh   |  32 |
| Khilan   |  25 |
| kaushik  |  23 |
| Chaitali |  25 |
| Hardik   |  27 |
| Komal    |  22 |
| Muffy    |  24 |
+----------+-----+
Overview
1. Advantages of Using PL/pgSQL
2. Supported Argument and Result Data Types
PL/pgSQL is a loadable procedural language for the PostgreSQL database
system. The design goals of PL/pgSQL were to create a loadable procedural
language that
can be used to create functions, procedures, and triggers,
adds control structures to the SQL language,
can perform complex computations,
inherits all user-defined types, functions, procedures, and operators,
can be defined to be trusted by the server,
is easy to use.
Functions created with PL/pgSQL can be used anywhere that built-in functions
could be used. For example, it is possible to create complex conditional
computation functions and later use them to define operators or use them in index
expressions.
In PostgreSQL 9.0 and later, PL/pgSQL is installed by default. However it is still
a loadable module, so especially security-conscious administrators could choose
to remove it.
1.1. Advantages of Using PL/pgSQL
SQL is the language PostgreSQL and most other relational databases use as query
language. It's portable and easy to learn. But every SQL statement must be
executed individually by the database server.
That means that your client application must send each query to the database
server, wait for it to be processed, receive and process the results, do some
computation, then send further queries to the server. All this incurs interprocess
communication and will also incur network overhead if your client is on a
different machine than the database server.
With PL/pgSQL you can group a block of computation and a series of queries
inside the database server, thus having the power of a procedural language and
the ease of use of SQL, but with considerable savings of client/server
communication overhead.
Extra round trips between client and server are eliminated
Intermediate results that the client does not need do not have to be
marshaled or transferred between server and client
Multiple rounds of query parsing can be avoided
This can result in a considerable performance increase as compared to an
application that does not use stored functions.
Also, with PL/pgSQL you can use all the data types, operators and functions of
SQL.
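As an illustrative sketch (the accounts table and the function name are assumptions, not part of the text), several statements can be grouped into one server-side function so that the client issues a single call:
CREATE FUNCTION transfer(from_acc integer, to_acc integer, amount numeric)
RETURNS void AS $$
BEGIN
    -- Both updates run inside the server; the client sends only one query.
    UPDATE accounts SET balance = balance - amount WHERE accid = from_acc;
    UPDATE accounts SET balance = balance + amount WHERE accid = to_acc;
END;
$$ LANGUAGE plpgsql;

-- Called from the client as:
-- SELECT transfer(1, 2, 100);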
1.2. Supported Argument and Result Data Types
Functions written in PL/pgSQL can accept as arguments any scalar or array data
type supported by the server, and they can return a result of any of these types.
They can also accept or return any composite type (row type) specified by name.
It is also possible to declare a PL/pgSQL function as accepting record, which
means that any composite type will do as input, or as returning record, which
means that the result is a row type whose columns are determined by specification
in the calling query, as discussed in Section 7.2.1.4.
PL/pgSQL functions can be declared to accept a variable number of arguments
by using the VARIADIC marker. This works exactly the same way as for SQL
functions, as discussed in Section 38.5.6.
PL/pgSQL functions can also be declared to accept and return the polymorphic
types described in Section 38.2.5, thus allowing the actual data types handled by
the function to vary from call to call. Examples appear in Section 43.3.1.
PL/pgSQL functions can also be declared to return a “set” (or table) of any data
type that can be returned as a single instance. Such a function generates its output
by executing RETURN NEXT for each desired element of the result set, or by
using RETURN QUERY to output the result of evaluating a query.
Finally, a PL/pgSQL function can be declared to return void if it has no useful
return value. (Alternatively, it could be written as a procedure in that case.)
PL/pgSQL functions can also be declared with output parameters in place of an
explicit specification of the return type. This does not add any fundamental
capability to the language, but it is often convenient, especially for returning
multiple values. The RETURNS TABLE notation can also be used in place of
RETURNS SETOF.
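A minimal sketch of a set-returning function using the RETURNS TABLE notation and RETURN QUERY (the function name is illustrative):
CREATE FUNCTION squares(n integer)
RETURNS TABLE(value integer, square integer) AS $$
BEGIN
    -- RETURN QUERY appends the whole result of the query to the function's output set.
    RETURN QUERY
        SELECT i, i * i FROM generate_series(1, n) AS g(i);
END;
$$ LANGUAGE plpgsql;

-- Usage: SELECT * FROM squares(5);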
2. Structure of PL/pgSQL
Functions written in PL/pgSQL are defined to the server by executing CREATE
FUNCTION commands. Such a command would normally look like, say,
CREATE FUNCTION somefunc(integer, text) RETURNS integer
AS 'function body text'
LANGUAGE plpgsql;
The function body is simply a string literal so far as CREATE FUNCTION is
concerned. It is often helpful to use dollar quoting (see Section 4.1.2.4) to write
the function body, rather than the normal single quote syntax. Without dollar
quoting, any single quotes or backslashes in the function body must be escaped
by doubling them. Almost all the examples in this chapter use dollar-quoted
literals for their function bodies.
PL/pgSQL is a block-structured language. The complete text of a function body
must be a block. A block is defined as:
[ <<label>> ]
[ DECLARE
declarations ]
BEGIN
statements
END [ label ];
Each declaration and each statement within a block is terminated by a semicolon.
A block that appears within another block must have a semicolon after END, as
shown above; however the final END that concludes a function body does not
require a semicolon.
Tip
A common mistake is to write a semicolon immediately after BEGIN. This is
incorrect and will result in a syntax error.
A label is only needed if you want to identify the block for use in an EXIT
statement, or to qualify the names of the variables declared in the block. If a label
is given after END, it must match the label at the block's beginning.
All key words are case-insensitive. Identifiers are implicitly converted to lower
case unless double-quoted, just as they are in ordinary SQL commands.
Comments work the same way in PL/pgSQL code as in ordinary SQL. A double
dash (--) starts a comment that extends to the end of the line. A /* starts a block
comment that extends to the matching occurrence of */. Block comments nest.
Any statement in the statement section of a block can be a subblock. Subblocks
can be used for logical grouping or to localize variables to a small group of
statements. Variables declared in a subblock mask any similarly-named variables
of outer blocks for the duration of the subblock; but you can access the outer
variables anyway if you qualify their names with their block's label. For example:
CREATE FUNCTION somefunc() RETURNS integer AS $$
<< outerblock >>
DECLARE
quantity integer := 30;
BEGIN
RAISE NOTICE 'Quantity here is %', quantity; -- Prints 30
quantity := 50;
--
-- Create a subblock
--
DECLARE
quantity integer := 80;
BEGIN
RAISE NOTICE 'Quantity here is %', quantity; -- Prints 80
RAISE NOTICE 'Outer quantity here is %', outerblock.quantity; -- Prints 50
END;
RETURN quantity;
END;
$$ LANGUAGE plpgsql;
UNIT - III
Syllabus
ER-to-Relational Mapping – Update anomalies-Functional Dependencies – Inference
rules-minimal cover-properties of relational decomposition- Normalization (upto
BCNF).
ER to Relational Mapping
In this section we will discuss how to map various ER model constructs to Relational
Model construct.
The SQL statement captures the information for above ER diagram as follows -
The SQL statement captures the information for relationship present in above ER
diagram as follows -
o In this approach, a relationship that associates more than one entity is
represented separately using its own table. For example, consider the following ER
diagram. Each Dept has at most one manager, according to the key
constraint on Manages.
Here the constraint is that each department has at most one manager to manage it.
Hence no two tuples can have the same DeptID, and there can be a separate table
named Manages with DeptID as its primary key. The table can be defined using the
following SQL statement.
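A sketch of such a statement is given below; the Employees and Departments table names and the ssn/since attributes are assumptions used only for illustration:
CREATE TABLE Manages
(
    ssn    CHAR(11),
    DeptID INTEGER,
    since  DATE,
    PRIMARY KEY (DeptID),                               -- at most one manager per department
    FOREIGN KEY (ssn) REFERENCES Employees(ssn),
    FOREIGN KEY (DeptID) REFERENCES Departments(DeptID)
);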
Approach 2 :
Method 1 : All the entities in the relationship are mapped to individual tables
InventoryItem(ID , name)
Book(ID,Publisher)
DVD(ID, Manufacturer)
Method 2 : Only subclasses are mapped to tables. The attributes in the superclass
are duplicated in all subclasses. For example -
Book(ID,name,Publisher)
DVD(ID, name,Manufacturer)
Method 3 : Only the superclass is mapped to a table. The attributes in the subclasses
are taken to the superclass. For example -
InventoryItem(ID , name,Publisher,Manufacturer)
This method will introduce null values. When we insert a Book record in the table, the
Manufacturer column value will be null. In the same way, when we insert a DVD record
in the table, the Publisher value will be null.
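A sketch of Method 3 in SQL, where the subclass attributes become nullable columns of the single superclass table:
CREATE TABLE InventoryItem
(
    ID           INT PRIMARY KEY,
    name         VARCHAR(50),
    Publisher    VARCHAR(50),   -- NULL for DVD rows
    Manufacturer VARCHAR(50)    -- NULL for Book rows
);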
RollNumber   Name
1            AAA
2            BBB
3            CCC
4            DDD
5            EEE

RollNumber   Name
1            AAA
2            BBB
3            CCC
1            XXX
2            YYY
In above table for RollNumber 1 we are getting two different names - “AAA” and
“XXX”. Hence here it does not hold the functional dependency.
Solution : For finding the closure of functional dependencies - refer example 2.8.1.
We can identify the candidate keys of the given relation schema with the help of the functional
dependencies. For that purpose we need to compute the closure sets of attributes. Now we
will find the closure sets which can completely identify the relation R(A,B,C,D,E).
Let, (A)+ = {ABCDE}
(B)+ = {BD}
(C)+ = {C}
(D)+ = {D}
(E)+ = {ABCDE}
(CD)+ = {ABCDE}
Clearly, only (A)+, (E)+ and (CD)+ give us {ABCDE}, i.e. the complete relation R. Hence A, E
and CD are the candidate keys.
3.8.2 Canonical Cover or Minimal Cover
Formal Definition : A minimal cover for a set F of FDs is a set G of FDs such that :
1) Every dependency in G is of the form X->A, where A is a single attribute.
2) The closure F+ is equal to the closure G+.
3) If we obtain a set H of dependencies from G by deleting one or more dependencies,
or by deleting attributes from a dependency in G, then F+ ≠ H+.
Concept of Extraneous Attributes
Definition : An attribute of a functional dependency is said to be extraneous if we can
remove it without changing the closure of the set of functional dependencies. Formally,
consider a set F of functional dependencies and a functional dependency X -> A in F;
an attribute Y in X is extraneous in X if F logically implies (F - {X -> A}) ∪ {(X - Y) -> A}.
B->D
B->E
Step 2 : Find the redundant entries and delete them. This can be done as follows -
A->CD
B->AE
This is a minimal cover or Canonical cover of functional dependencies.
3.4 Concept of Redundancy and Anomalies
Definition : Redundancy is a condition created in database in which same piece of
data is held at two different places.
Redundancy is at the root of several problems associated with relational schemas.
Problems caused by redundancy : Following problems can be caused by redundancy-
i) Redundant storage : Some information is stored repeatedly.
ii) Update anomalies : If one copy of such repeated data is updated then inconsistency
is created unless all other copies are similarly updated.
iii) Insertion anomalies : When a new record is inserted, the repeated information gets
added to the relation schema again.
iv) Deletion anomalies : When a particular record is deleted, other important
information associated with that record also gets deleted, and thus we may lose
important information from the schema.
Example : Following example illustrates the above discussed anomalies or redundancy
problems
Consider following Schema in which all possible information about Employee is
stored.
1) Redundant storage : Note that the information about DeptID, DeptName and
DeptLoc is repeated.
2) Update anomalies : In the above table, if we change the DeptLoc from Pune to Chennai in
one tuple, it will result in inconsistency, because other tuples of DeptID 101 still show the
DeptLoc as Pune. Otherwise, we need to update multiple copies of DeptLoc from Pune to
Chennai. Hence this is an update anomaly.
3) Insertion anomalies : For the above table, if we want to add a new tuple, say
(5, EEE, 50000), for DeptID 101, then the repeated information (101, XYZ, Pune) will
occur once again.
4) Deletion anomalies : For the above table, if we delete the record for EmpID 4, then
the information about DeptID 102, DeptName PQR and DeptLoc Mumbai is automatically
deleted, and we may lose track of DeptID 102 altogether. This causes a deletion anomaly.
3.10 Decomposition AU : Dec.-17, Marks 7
Decomposition is the process of breaking down one table into multiple tables.
Formal definition of decomposition is -
A decomposition of relation Schema R consists of replacing the relation Schema by
two relation schema that each contain a subset of attributes of R and together
include all attributes of R by storing projections of the instance.
For example - Consider the following table
Employee_Department table as follows -
Employee Table
Eid Ename Age City Salary
E001 ABC 29 Pune 20000
E002 PQR 30 Pune 30000
E003 LMN 25 Mumbai 5000
E004 XYZ 24 Mumbai 4000
E005 STU 32 Hyderabad 25000
Department Table
Deptid Eid DeptName
D001 E001 Finance
D002 E002 Production
D003 E003 Sales
D004 E004 Marketing
D005 E005 Human Resource
The decomposition is used for eliminating redundancy.
For example : Consider following relation Schema R in which we assume that the
grade determines the salary, the redundancy is caused
Schema R
Hence, the above table can be decomposed into two Schema S and T as follows :
Schema S
Name   eid   deptname    Grade
AAA    121   Accounts    2
AAA    132   Sales       3
BBB    101   Marketing   4
CCC    106   Purchase    2

Schema T
Grade   Salary
2       8000
3       7000
4       7000
Relation R1 = (A,B,C)
A B C
a 1 x
b 2 x
Relation R2 = (C,D,E)
C D E
x p q
x p q
x r s
Step 2 : Now we will join these tables using natural join, i.e. the join based on the
common attribute C. We get R1 ⋈ R2 as
A  B  C  D  E
a  1  x  p  q
a  1  x  r  s
b  2  x  p  q
b  2  x  r  s
Here we get more rows (tuples) than in the original relation R.
Clearly R1 ⋈ R2 ≠ R. Hence it is not a lossless decomposition.
Example 3.10.4 Consider the relation R (A, B, C) for functional dependency set {A -> B and
B -> C} which is decomposed into two relations R1 = (A, C) and R2 = (B, C). Then check if
this decomposition dependency preserving or not.
Solution : This can be solved in following steps :
Step 1 : Check whether F+ = (F1 ∪ F2)+.
Step 2 : We have with us F+ = { A->B, B->C }
Step 3 : Let us find (F1)+ for relation R1 and (F2)+ for relation R2
R1(A,C) :
A->A (trivial)
C->C (trivial)
A->C (nontrivial, since in (F)+ we have A->B->C)
AC->AC (trivial)
A->B holds, but it is not useful as B is not part of R1.
We cannot obtain C->A.
R2(B,C) :
B->B (trivial)
C->C (trivial)
B->C (nontrivial, since B->C is in (F)+)
BC->BC (trivial)
We cannot obtain C->B.
Step 4 : We will eliminate all the trivial relations and useless relations. Hence we can obtain
R1 and R2 as
R1(A,C) : A->C (nontrivial)
R2(B,C) : B->C (nontrivial)
(F1 ∪ F2)+ = {A->C, B->C} ≠ {A->B, B->C}, i.e. (F)+
Thus the condition specified in Step 1, i.e. F+ = (F1 ∪ F2)+, does not hold. Hence it is not a
dependency preserving decomposition.
Example 3.10.5 Let relation R(A,B,C,D) be a relational schema with following functional
dependencies {A->B, B->C,C->D, and D->B}. The decomposition of R into (A,B), (B,C)
and (B,D). Check whether this decomposition is dependency preserving or not.
Solution :
Step 1 : Let (F)+ = {A->B, B->C, C->D,D->B}.
Step 2 : We will find (F1)+, (F2)+, (F3)+ for relations R1(A,B) , R2(B,C) and R3(B,D) as
follows -
Step 3 : We will eliminate all the trivial relations and useless relations. Hence we
can obtain R1 ∪ R2 ∪ R3 as
R1(A,B) : A->B
R2(B,C) : B->C, C->B
R3(B,D) : B->D, D->B
From F1 ∪ F2 ∪ F3 we can derive A->B, B->C, C->D (via C->B and B->D) and D->B, i.e. every
dependency of F. Hence this decomposition is dependency preserving.
As there are multiple values of phone number for sid 1 and 3, the above table is not in
1NF. We can make it in 1NF. The conversion is as follows -
Student_Course
sid sname cid cname
1 AAA 101 C
2 BBB 102 C++
3 CCC 101 C
4 DDD 103 Java
This table is not in 2NF. For converting above table to 2NF we must follow the
following steps -
Step 1 : The above table is in 1NF.
Step 2 : Here sname depends only on sid, and similarly cname depends only on cid. Now
if we delete the record with sid=2, the course C++ will also get deleted automatically.
Thus sid->sname and cid->cname are partial functional dependencies, because {sid,cid}
is essentially the candidate key for the above table. Hence, to bring the above table
to 2NF we must decompose it as follows :
Student
sid   sname   cid
1     AAA     101
2     BBB     102
3     CCC     101
4     DDD     103
Here the candidate key is (sid,cid) and (sid,cid)->sname.
Course
cid   cname
101   C
102   C++
103   Java
Here the candidate key is cid.
RegID   RollNo   Sname
101     1        AAA
102     2        BBB
103     3        CCC
104     4        DDD
Superkeys
{RegID}
{RegID, RollNo}
{RegID,Sname}
{RollNo,Sname}
{RegID, RollNo,Sname}
Candidate Keys
{RegID}
{RollNo}
Zip
zipcode cityname state
11111 Pune Maharashtra
22222 Surat Gujarat
33333 Chennai Tamilnadu
44444 Jaipur Rajasthan
55555 Mumbai Maharashtra
Example 3.11.1 Consider the relation R = {A, B, C, D, E, F, G, H, I, J} and the set of
functional dependencies F = { {A,B} -> C, A -> {D,E}, B -> F, F -> {G,H}, D -> {I,J} }
1. What is the key for R ? Demonstrate it using the inference rules.
2. Decompose R into 2NF, then 3NF relations.
Solution : Let,
A -> DE (given)
A -> D, A -> E (decomposition)
As D -> IJ, A -> IJ (transitivity)
Using the union rule we get
A -> DEIJ
As A -> A (reflexivity)
we get A -> ADEIJ
Using the augmentation rule (augmenting with B) we compute
AB -> ABDEIJ
But AB -> C (given)
AB -> ABCDEIJ
B -> F (given), F -> GH, therefore B -> GH (transitivity)
AB -> AGH is also true
Similarly AB -> AF ∵ B -> F (given)
Thus, now using the union rule,
AB -> ABCDEFGHIJ
Hence AB is a key.
The table can be converted to 2NF as
R1 = (A, B, C)
R2 = (A, D, E, I, J)
R3 = (B, F, G, H)
The above 2NF relations can be converted to 3NF as follows
R1 = (A, B, C)
R2 = (A, D, E)
R3 = (D, I, J)
R4 = (B, F)
R5 = (F, G, H).
University Questions
1. What is database normalization ? Explain the first normal form, second normal form and third
normal form. AU : May-18, Marks 13; Dec.-15, Marks 16
2. What are normal forms. Explain the types of normal form with an example.
AU : Dec.-14, Marks 16
Enrollment Table
sid course Teacher
1 C Ankita
1 Java Poonam
2 C Ankita
3 C++ Supriya
4 C Archana
From above table following observations can be made :
One student can enroll for multiple courses. For example student with sid=1 can
enroll for C as well as Java.
For each course, a teacher is assigned to the student.
There can be multiple teachers teaching one course for example course C can be
taught by both the teachers namely - Ankita and Archana.
The candidate key for the above table can be (sid,course), because using these two
columns we can uniquely determine the teacher.
The above table holds following dependencies
o (sid,course)->Teacher
o Teacher->course
The above table is not in BCNF because of the dependency Teacher->course. Note
that Teacher is not a superkey; in other words, Teacher is a non-prime attribute
while course is a prime attribute, and here a non-prime attribute determines a prime
attribute.
To convert the above table to BCNF we must decompose above table into Student
and Course tables
Student
sid Teacher
1 Ankita
1 Poonam
2 Ankita
3 Supriya
4 Archana
Course
Teacher course
Ankita C
Poonam Java
Ankita C
Supriya C++
Archana C
(AC)+ = {AC} ≠ R
There is no involvement of D on the LHS of the FD rules. Hence D cannot be part of any
candidate key. Thus we obtain two candidate keys, AB and BC. Hence
prime attributes = {A,B,C}
Non-prime attributes = {D}
Step 2 : Now, we will start checking from reverse manner, that means from BCNF,
then 3NF, then 2NF.
Step 3 : For R being in BCNF for X->Y the X should be candidate key or super key.
From above FDs consider C->D in which C is not a candidate key or super key.
Hence given relation is not in BCNF.
Step 4 : For R being in 3NF for X->Y either i) the X should be candidate key or super
key or ii) Y should be prime attribute. (For prime and non prime attributes refer
step 1)
Q.2 Give the limitations of E-R model ? How do you overcome this ? AU : May-07
Ans. : 1) Loss of information content : Some information may be lost or hidden in the ER
model.
2) Limited relationship representation : The ER model can represent only a limited set of
relationships compared to other data models such as the relational model.
3) No representation of data manipulation : It is difficult to show data manipulation
in the ER model.
4) Suited only to high-level design : The ER model is popular for high-level design but
does not capture low-level implementation details.
Q.9 Why certain functional dependencies are called trivial functional dependencies ?
AU : May-06,12
Ans. : A functional dependency FD : X → Y is called trivial if Y is a subset of X.
This kind of dependency is called trivial because it can be derived from common
sense: if one side is a subset of the other, the dependency always holds. The left side is
called the determinant and the right side the dependent.
For example, {A,B} -> B is a trivial functional dependency because B is a subset of
{A,B}; whenever the values of A and B are known, the value of B is obviously
determined.
Q.13 Describe BCNF and describe a relation which is in BCNF. AU : Dec. -02
Ans. : Refer section 3.12.
Q.14 Why 4NF in normal form is more desirable than BCNF ? AU : Dec. -14
Ans. :
4NF is more desirable than BCNF because it reduces the repetition of information.
If we consider a BCNF schema not in 4NF we observe that decomposition into
4NF does not lose information provided that a lossless join decomposition is used,
yet redundancy is reduced.
Q.15 Give an example of a relation schema R and set of dependencies such that R is in
BCNF but not in 4NF. AU : May -12
Ans. : Consider relation R(A,B,C,D) with dependencies
AB -> C
ABC -> D
AC -> B
Here the only key is AB. Thus each functional dependency has a superkey on the left.
But there is also a multivalued dependency whose left-hand side is not a superkey, so the
relation is not in 4NF.
Ans. : Decomposition is the process of breaking down one table into multiple
tables.
The decomposition is used for eliminating redundancy.
TRANSACTION MANAGEMENT
Syllabus
Transaction Concepts - ACID Properties - Schedules - Serializability - Concurrency Control - Need
for Concurrency - Locking Protocols - Two Phase Locking
Contents
4.1 Transaction Concepts .................................................Dec.-14 .................................. Marks 4
4.2 ACID Properties ...........................................................May-14, 18 .................................. Marks 8
4.3 Transaction States .......................................................Dec.-11, May-14,18................... Marks 8
4.4 Schedules
4.5 Serializability ................................................................Dec.-15, May-15,18............... Marks 8
4.6 Transaction Isolation and Atomicity
4.7 Introduction to Concurrency Control
4.8 Need for Concurrency .................................................May-17 ................................. Marks 13
4.9 Locking Protocols.........................................................Dec-15,17, May-16 ............. Marks 16
4.10 Two Phase Locking ..................................................... May-14,18, Dec.-16 ............... Marks
4.11 Two Marks Questions with Answers
Part I : Introduction to Transactions
University Question
1) Atomicity :
This property states that each transaction must be considered as a single unit and
must be completed fully or not completed at all.
No transaction in the database is left half completed.
Database should be in a state either before the transaction execution or after the
transaction execution. It should not be in a state ‘executing’.
For example - in the above-mentioned withdrawal of money transaction, all the five
steps must be completed fully, or none of them should be completed. Suppose the
transaction fails after step 3; then the customer will get the money but the
balance will not be updated accordingly. The state of the database should be either as
before the ATM withdrawal (i.e. customer without the withdrawn money) or as after the ATM
withdrawal (i.e. customer with the money and the account updated). This keeps the
system in a consistent state.
2) Consistency :
The database must remain in consistent state after performing any transaction.
For example : In ATM withdrawal operation, the balance must be updated
appropriately after performing transaction. Thus the database can be in consistent
state.
3) Isolation :
In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each
transaction will be carried out and executed as if it were the only transaction in the
system.
No transaction will affect the existence of any other transaction.
For example : If a bank manager is checking the account balance of particular
customer, then manager should see the balance either before withdrawing the
money or after withdrawing the money. This will make sure that each individual
transaction is completed and any other dependent transaction will get the
consistent data out of it. Any failure to any transaction will not affect other
transaction in this case. Hence it makes all the transactions consistent.
4) Durability :
The database should be strong enough to handle any system failure.
If a transaction's inserts/updates have been committed, the database must retain
them even after a failure.
If there is any failure, the database should be able to recover to a consistent
state.
For example, in the ATM withdrawal example, if a system failure happens after the
customer gets the money, then the system should be strong enough to update the
database with the new balance after it recovers. For that purpose the system
has to keep a log of each transaction and its failure, so that when the system
recovers it knows when it failed and can apply any pending committed change to the
database.
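In SQL the ATM withdrawal can be sketched as a single transaction, so that both changes become permanent together or not at all (the table and column names are assumptions, not from the text):
BEGIN;
UPDATE account SET balance = balance - 500 WHERE accno = 1001;
INSERT INTO withdrawal_log (accno, amount) VALUES (1001, 500);
COMMIT;   -- if any step fails before this point, ROLLBACK restores the state before the withdrawal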
University Questions
1. Explain with an example the properties that must be satisfied by transaction
AU : May-18, Marks 7
Solution :
(1) Read only transaction
T1
Read(A)
Read(B)
Display(A-B)
University Questions
1. During execution, a transaction passes through several states, until it finally commits or aborts.
List all possible sequences of states through which transaction may pass. Explain why each state
transaction may occur ? AU : May-18, Marks 6
4.4 Schedules
Schedule is an order of multiple transactions executing in concurrent environment.
Following figure represents the types of schedules.
Serial Schedule : The schedule in which the transactions execute one after the other is
called serial schedule. It is consistent in nature. For example : Consider following two
transactions T1 and T2
T1 T2
R(A)
W(A)
R(B)
W(B)
R(A)
W(A)
R(B)
W(B)
All the operations of transaction T1 on data items A and then B executes and then in
transaction T2 all the operations on data items A and B execute. The R stands for Read
operation and W stands for write operation.
Non-Serial Schedule : A schedule in which the operations of different transactions are
interleaved with one another. This may lead to conflicts or inconsistency in the resultant
data. For example-
Consider following two transactions,
T1 T2
R(A)
W(A)
R(A)
W(B)
R(A)
W(B)
R(B)
W(B)
The above schedule is non-serial and may result in inconsistency or conflicts
in the data.
4.5 Serializability AU : Dec.-15, May-15, 18, Marks 8
T1                T2                A      B
R(A)                                100    100
A=A-10
W(A)                                90     100
R(B)
B=B+10
W(B)                                90     110
                  R(A)
                  A=A-10
                  W(A)              80     110
In above transactions initially T1 will read the values from database as A=100,
B=100 and modify the values of A and B. But transaction T2 will read the modified
value i.e. 90 and will modify it to 80 and perform write operation. Thus at the end
of transaction T1 value of A will be 90 but at end of transaction T2 value of A will
be 80. Thus conflicts or inconsistency occurs here. This sequence can be converted
to a sequence which may give us consistent result. This process is called
serializability.
Difference between Serial Schedule and Serializable Schedule
Serial Schedule :
No concurrency is allowed in a serial schedule.
If there are two transactions executing at the same time and no interleaving of operations is
permitted, the only possibilities of execution are -
(i) Execute all the operations of transaction T1 in a sequence and then execute all the
operations of transaction T2 in a sequence.
(ii) Execute all the operations of transaction T2 in a sequence and then execute all the
operations of transaction T1 in a sequence.
Serializable Schedule :
Concurrency is allowed in a serializable schedule.
If there are two transactions executing at the same time and interleaving of operations is
allowed, there can be different possible orders of executing the individual operations of the
transactions.
Example of Serial Schedule :
T1 : Read(A), A=A-50, Write(A), Read(B), B=B+100, Write(B)
T2 : Read(A), A=A+10, Write(A)
(T2 starts only after all operations of T1 have finished.)
Example of Serializable Schedule :
T1 : Read(A), A=A-50, Write(A), Read(B), B=B+100, Write(B)
T2 : Read(B), Write(B)
(The operations of T1 and T2 are interleaved, but the overall effect is equivalent to some serial order.)
T1 T2
R(A)
W(A)
R(A)
R(B)
R(B)
W(B)
Solution :
Step 1 : To check whether the schedule is conflict serializable or not we will check from
top to bottom. Thus we will start reading from top to bottom as
T1: R(A) -> T1:W(A) ->T2:R(A) -> T2:R(B) ->T1:R(B)->T1:W(B)
Step 2 : We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them-
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii)At least one of the two operations is a write operation
From the above example, in the top-to-bottom scan we find the conflict
T1:W(A) -> T2:R(A), because
i) the two operations belong to different transactions T1 and T2,
ii) both operations are on the same data item A, and
iii) one of them, W(A), is a write operation.
Step 4 : Draw the edge between conflicting transactions. For example in above given
scenario, the conflict occurs while moving from T1:W(A) to T2:R(A). Hence edge must be
from T1 to T2.
Step 5 : Repeat the step 4 while reading from top to bottom. Finally the precedence
graph will be as follows
Step 6 : Check if any cycle exists in the graph. A cycle is a path along which we can start
from one node and reach the same node again. If a cycle is found, then the schedule is not
conflict serializable. In step 5 we get a graph with a cycle, which means the given schedule is
not conflict serializable.
Example 4.5.2 Check whether following schedule is conflict serializable or not. If it is not
conflict serializable then find the serializability order.
T1 T2 T3
R(A)
R(B)
R(B)
W(B)
W(A)
W(A)
R(A)
W(A)
Solution :
Step 1 : We will read from top to bottom, and build a precedence graph for conflicting
entries :
Step 2 : As there is no cycle in the precedence graph, the given sequence is conflict
serializable. Hence we can convert this non serial schedule to serial schedule. For that
purpose we will follow these steps to find the serializable order.
Step 3 : A serializability order of the transactions can be obtained by finding a linear
order consistent with the partial order of the precedence graph. This process is called
topological sorting.
Step 4 : Find the vertex which has no incoming edge, which is T1. Then find the vertex
having no outgoing edge, which is T2. The vertex in between them is T3. Hence the order will be
T1 – T3 – T2
4.5.2 View Serializability
If a given schedule is found to be view equivalent to some serial schedule, then it is called
as a view serializable schedule.
View Equivalent Schedules : Consider two schedules S1 and S2 consisting of transactions
T1 and T2 respectively. Schedules S1 and S2 are said to be view equivalent if they satisfy
the following three conditions :
o If transaction T1 reads a data item A from the database initially in schedule S1,
then in schedule S2 also, T1 must perform the initial read of the same data item A
from the database. This must hold for all data items; in other words, the
initial reads must be the same for all data items.
o If transaction Ti reads a data item that has been updated by transaction Tj
in schedule S1, then in schedule S2 also, transaction Ti must read the same data
item that has been updated by transaction Tj. In other words, the Write-Read
sequence must be the same.
o The transaction that performs the final write on a data item in schedule S1 must
also perform the final write on that data item in schedule S2. In other words, the
final writes must be the same for all data items.
Steps to check whether the given schedule is view serializable or not
Step 1 : If the schedule is conflict serializable then it is surely view serializable because
conflict serializability is a restricted form of view serializability.
Step 2 : If it is not conflict serializable, then check whether there exists any blind
write operation. A blind write is a write operation performed without reading the value first.
If there is no blind write, then the schedule is not view serializable. If a blind write
exists, the schedule may or may not be view serializable.
Step 3 : Find the view equivalence schedule
Example 4.5.3 Consider the following schedules for checking if these are view
serializable or not.
T1 T2 T3
W(C)
R(A)
W(B)
R(C)
W(B)
W(B)
A  B
0  0
0  1
0  1
A ∨ B = A OR B = F OR T = T. This means the consistency requirement is met.
Consider case T2 -> T1, then
A  B
0  0
1  0
1  0
A ∨ B = A OR B = T OR F = T. This means the consistency requirement is met.
(2) The concurrent execution means interleaving of transactions T1 and T2. It can be
T1 T2
R(A)
R(B)
R(A)
R(B) If B=0 then
If A=0 then A=A+1
B=B+1 W(A)
W(B)
This is a non-serializable schedule.
(3) There is no concurrent execution resulting in a serializable schedule.
Example 4.5.5 Test serializability of the following schedule :
i) r1(x);r3(x);w1(x);r2(x);w3(x) ii) r3(x);r2(x);w3(x);r1(x);w1(x)
Solution : i) r1(x);r3(x);w1(x);r2(x);w3(x)
Here r1 represents the read operation of transaction T1, w3 represents the write operation
of transaction T3, and so on. Hence from the given sequence the schedule for the three
transactions can be represented as follows :
T1 T2 T3
r1(x)
r3(x)
w1(x)
r2(x)
w3(x)
Step 1 : We will use the precedence graph method to check the serializability. As there
are three transactions, three nodes are created for each transaction.
Step 3 : As cycle exists in the above precedence graph, we conclude that it is not
serializable.
ii) r3(x);r2(x);w3(x);r1(x);w1(x)
T1 T2 T3
r3(x)
r2(x)
w3(x)
r1(x)
w1(x)
Step 1 : Read the schedule from top to bottom for pair of operations. For r3(x) we get
w1(x) pair. Hence edge exists from T3 to T1 in precedence graph.
There is a pair from r2(x) : w3(x). Hence edge exists from T2 to T3.
There is a pair from r2(x) : w1(x). Hence edge exists from T2 to T1.
There is a pair from w3(x): r1(x). Hence edge exists from T3 to T1.
Step 2 : The precedence graph will then be as follows –
Step 3 : As there is no cycle in the above graph, the given schedule is serializable.
Step 4 : The serializability order for the consistent schedule is obtained by applying
topological sorting on the precedence graph drawn above. This can be achieved as follows.
Sub-step 1 : Find the node having no incoming edge. We obtain T2 as such a node. Hence
T2 is at the beginning of the serializability sequence. Now delete T2 from the graph and
repeat the process on the remaining graph.
Thus we obtain the sequence of transactions as T2, T3 and T1. Hence the serializability
order is
r2(x);r3(x);w3(x);r1(x);w1(x)
Example 3.5.6 Consider the following schedules. The actions are listed in the order they are
scheduled, and prefixed with the transaction name.
S1 : T1 : R(X), T2 : R(X), T1 : W(Y), T2 : W(Y) T1 : R(Y), T2 : R(Y)
S2 : T3 : W(X), T1 : R(X), T1 : W(Y), T2 : R(Z), T2 : W(Z) T3 : R(Z)
For each of the schedules, answer the following questions :
i) What is the precedence graph for the schedule ?
ii) Is the schedule conflict-serializable ? If so, what are all the conflict equivalent serial
schedules ?
iii) Is the schedule view-serializable ? If so, what are all the view equivalent serial schedules ?
AU : May-15, Marks 2 + 7 + 7
Solution : i) We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them-
o T2 : W(Y), T1 : R(Y)
Hence we will build the precedence graph. Draw the edge between conflicting
transactions. For example in above given scenario, the conflict occurs while moving from
T1:W(Y) to T2:W(Y). Hence edge must be from T1 to T2. Similarly for second conflict, there
will be the edge from T2 to T1
(iii)
o S1 is not view serializable.
T2-T3-T1.
University Questions
1. Explain Conflict serializability and view serializability.
AU : May-18, Marks 6, Dec.-15, Marks 8
T1 T2
R(A)
A=A+50
W(A)
R(A)
A=A-20
W(A)
Commit
Failure
some transaction…
Commit
The above schedule is inconsistent if a failure occurs after the commit of T2.
It is because T2 is a dependent transaction of T1. A transaction is said to be
dependent if it performs a dirty read.
A dirty read is a situation in which one transaction reads a data item immediately
after the write operation of a previous transaction, before that transaction commits.
T1 T2
R(A)
A=A+50
W(A) Dirty read
R(A)
A=A-20
W(A)
Commit
Commit
Now, if the dependent transaction, i.e. T2, is committed first and a failure then occurs,
any changes made (or rolled back) by transaction T1 will not be known to T2. This leads
to a non-recoverable schedule.
To make the schedule recoverable we apply the rule : commit the
independent transaction before any dependent transaction.
In the above example the independent transaction is T1, hence we must commit it before
the dependent transaction, i.e. T2.
R(A)
A=A+50
W(A)
R(A)
A=A-20
W(A)
Commit
Commit
4.6.2 Cascadeless Schedule
Definition : If in a schedule, a transaction is not allowed to read a data item until the
last transaction that has written that data item is committed or aborted, then such a
schedule is known as a cascadeless schedule.
The cascadeless schedule allows only committed Read operation. For example :
T1 T2 T3
R(A)
A=A+50
W(A)
Commit
R(A)
A=A-20
W(A)
Commit
R(A)
W(A)
In the above schedule, because a commit operation occurs before every read performed by
the next transaction, the schedule remains recoverable even if a failure occurs at any
point, and atomicity can be maintained.
For example – Consider the following transactions -
Assume initially the salary is Rs. 1000.
(1) At time t1, the transaction T2 updates the salary to Rs. 1200.
(2) This salary is read at time t2 by transaction T1. Obviously it is Rs. 1200.
(3) But at time t3, the transaction T2 performs a rollback, undoing the change it made
at time t1.
(4) Thus the salary again becomes Rs. 1000. This situation leads to a Dirty Read or
Uncommitted Read, because the read made at time t2 (immediately after the
update of another transaction) becomes a dirty read.
The phantom read problem is a special case of non repeatable read problem.
This is a problem in which one of the transaction makes the changes in the database
system and due to these changes another transaction can not read the data item which
it has read just recently. For example –
(1) At time t1, the transaction T1 reads the value of the salary as Rs. 1000.
(2) At time t2, the transaction T2 reads the value of the same salary as Rs. 1000.
(3) At time t3, the transaction T1 deletes the variable salary.
(4) Now at time t4, when T2 again reads the salary it gets an error. Transaction T2 cannot
identify the reason why it is not getting the salary value which it read just a short
time back.
This problem occurs due to changes in the database and is called phantom read
problem.
University Question
1. Discuss the violations caused by each of the following: dirty read, non repeatable read and phantoms
with suitable example AU : May-17, Marks 13
i) Shared Lock : The shared lock is used for reading data items only. It is denoted by
Lock-S. This is also called as read lock.
ii) Exclusive Lock : The exclusive lock is used for both read and write operations. It is
denoted as Lock-X. This is also called as write lock.
The compatibility matrix is used while working on set of locks. The concurrency
control manager checks the compatibility matrix before granting the lock. If the
two modes of transactions are compatible to each other then only the lock will be
granted.
A set of locks may consist of shared or exclusive locks. The following matrix
represents the compatibility between the lock modes.
      S   X
S     T   F
X     F   F
Here T stands for True and F stands for False. If the concurrency-control manager finds the
compatibility entry to be True, it grants the lock; otherwise the lock is denied.
For example : If the transaction T1 is holding a shared lock in data item A, then the
control manager can grant the shared lock to transaction T2 as compatibility is
True. But it cannot grant the exclusive lock as the compatibility is false. In simple
words if transaction T1 is reading a data item A then same data item A can be read
by another transaction T2 but cannot be written by another transaction.
Similarly, if an exclusive lock (i.e. a lock for both read and write operations) is held on a
data item by some transaction, then no other transaction can acquire a shared or
exclusive lock on it, as the compatibility function denotes F. That means if some
transaction is writing a data item A, then no other transaction can read or write
that data item A.
Hence the rule of thumb is
i) Any number of transactions can hold a shared lock on an item.
ii) But an exclusive lock can be held by only one transaction at a time (see the sketch below).
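The sketch below shows how these two lock modes can be requested explicitly in PostgreSQL-style SQL; the table and column names are illustrative only:
-- Transaction T1 : shared (read) lock on one row
BEGIN;
SELECT balance FROM account WHERE accno = 1001 FOR SHARE;
-- Other transactions may also read (FOR SHARE) this row,
-- but a FOR UPDATE / write on it must wait until T1 ends.
COMMIT;

-- Transaction T2 : exclusive (write) lock on the same row
BEGIN;
SELECT balance FROM account WHERE accno = 1001 FOR UPDATE;
UPDATE account SET balance = balance - 500 WHERE accno = 1001;
COMMIT;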
University Questions
1. State and explain the lock based concurrency control with suitable example
AU : Dec-17, Marks 13, May-16, Marks 16
2. What is Concurrency control ? How is implemented in DBMS ? Illustrate with suitable example.
AU : Dec-15, Marks 8
The two phase locking is a protocol in which there are two phases :
i) Growing phase (Locking phase) : It is a phase in which the transaction may
obtain locks but does not release any lock.
ii) Shrinking phase (Unlocking phase) : It is a phase in which the transaction
may release the locks but does not obtain any new lock.
Lock Point : The last lock position or first unlock position is called lock
point. For example
Lock(A)
Lock(B)
Lock(C)
….
… Lock Point
…
Unlock(A)
Unlock(B)
Unlock(C)
For example –
Consider following transactions
T1 T2
Lock-X(A) Lock-S(B)
Read(A) Read(B)
A=A-50 Unlock-S(B)
Write(A)
Lock-X(B)
Unlock-X(A)
B=B+100 Lock-S(A)
Write(B) Read(A)
Unlock-X(B) Unlock-S(A)
The important rule for two phase locking is - all lock operations precede all
the unlock operations.
In the above transactions, T1 follows two phase locking but transaction T2 does not. In T2,
a shared lock is acquired on data item B, data item B is read, and then the lock is released.
Again a lock is acquired on data item A, data item A is read, and the lock is then released.
Thus we get a lock-unlock-lock-unlock sequence, which is not permitted in two phase locking.
Example 4.10.1 Prove that two phase locking guarantees serializability.
Solution:
o Serializability is mainly an issue of handling write operation. Because any
inconsistency may only be created by write operation.
The serializability using two phase locking can be understood with the help of
following example
Consider two transactions
T1 T2
R(A)
R(A)
R(B)
W(B)
Step 1 : Now we will apply two phase locking. That means we will apply locks in
growing and shrinking phase
T1 T2
Lock-S(A)
R(A)
Lock-S(A)
R(A)
Lock-X(B)
R(B)
W(B)
Unlock-X(B)
Unlock-S(A)
Note that above schedule is serializable as it prevents interference between two
transactions.
The serializability order can be obtained based on the lock point. The lock point is
either last lock operation position or first unlock position in the transaction.
The last lock position is in T1, then it is in T2. Hence the serializability will be T1->T2
The two phase locking protocol leads to two problems – deadlock and cascading
roll back.
(1) Deadlock : The deadlock problem cannot be solved by two phase locking.
Deadlock is a situation in which two or more transactions each hold a lock and wait
for locks currently held by the other transactions.
For example
T1 T2
Lock-X(A) Lock-X(B)
Read(A) Read(B)
A=A-50 B=B+100
Write(A) Write(B)
Delayed, Delayed,
wait for T2 wait for T1
to release to release
Lock on B Lock on A
(2) Cascading rollback : Cascading rollback is a situation in which the failure of one
transaction forces the rollback of other dependent transactions. For example -
T1            T2            T3
Read(A)
Read(B)
C=A+B
Write(C)
              Read(C)
              Write(C)
                            Read(C)
When T1 writes the value of C, only then can T2 read it. And when T2 writes the value of C,
only then can transaction T3 read it. But if transaction T1 fails, then
transactions T2 and T3 automatically fail as well.
The simple two phase locking does not solve the cascading rollback problem. To solve
the problem of cascading Rollback two types of two phase locking mechanisms can be
used.
4.10.1 Types of Two Phase Locking
(1) Strict Two Phase Locking : The strict 2PL protocol is the basic two phase protocol with the
additional requirement that all exclusive-mode locks must be held until the transaction commits.
In other words, all the exclusive locks are released only after the transaction is committed. That
also means that if T1 holds an exclusive lock, T1 will release it only after the
commit operation; only then is another transaction allowed to read or write that item. For example -
Consider two transactions
T1 T2
W(A)
R(A)
If we apply the locks then
T1 T2
Lock-X(A)
W(A)
Commit
Unlock(A)
Lock-S(A)
R(A)
Unlock-S(A)
Thus only after commit operation in T1, we can unlock the exclusive lock. This ensures
the strict serializability.
Thus compared to basic two phase locking protocol, the advantage of strict 2PL
protocol is it ensures strict serializability.
(2) Rigorous Two Phase Locking : This is a stricter two phase locking protocol. Here all
locks (shared as well as exclusive) are to be held until the transaction commits. The
transactions can be serialized in the order in which they commit.
For example - Consider the transaction
T1
R(A)
R(B)
W(B)
If we apply the locks then
T1
Lock-S(A)
R(A)
Lock-X(B)
R(B)
W(B)
Commit
Unlock(A)
Unlock(B)
Thus the above transaction uses rigorous two phase locking mechanism
Example 4.10.2 Consider the following two transactions :
T1:read(A)
Read(B);
If A=0 then B=B+1;
Write(B)
T2:read(B); read(A)
If B=0 then A=A+1
Write(A)
Add lock and unlock instructions to transactions T1 and T2, so that they observe two phase
locking protocol. Can the execution of these transactions result in deadlock ?
AU : Dec.-16, Marks 6
Solution :
T1 T2
Lock-S(A) Lock-S(B)
Read(A) Read(B)
Lock-X(B) Lock-X(A)
Read(B) Read(A)
if A=0 then B=B+1 if B=0 then A=A+1
Write(B) Write(A)
Unlock(A) Unlock(B)
Commit Commit
Unlock(B) Unlock(A)
This lock-unlock instruction sequence helps to satisfy the requirements of (strict) two
phase locking for the given transactions.
The execution of these transactions result in deadlock. Consider following partial
execution scenario which leads to deadlock.
T1 T2
Lock-S(A) Lock-S(B)
Read(A) Read(B)
Lock-X(B) Lock-X(A)
Now it will wait for T2 to Now it will wait for T1 to
release exclusive lock on A release exclusive lock on B
4.10.2 Lock Conversion
Lock conversion is a mechanism in two phase locking mechanism - which allows
conversion of shared lock to exclusive lock or exclusive lock to shared lock.
Method of Conversion :
First Phase :
o can acquire a lock-S on an item
o can acquire a lock-X on an item
o can convert a lock-S to a lock-X (upgrade)
Second Phase :
o can release a lock-S
o can release a lock-X
o can convert a lock-X to a lock-S (downgrade)
University Questions
1. What is concurrency control ? Explain two phase locking protocol with an example
AU : May-18, Marks 7
University Question
Q.1 What is a transaction ?
Ans. : A transaction can be defined as a group of tasks that form a single logical unit.
Q.2 What does time to commit mean ? AU : May-04
Ans. :
The COMMIT command is used to permanently save any transaction to the database.
When we perform read or write operations on the database, those changes
can be undone by a rollback operation. To make the changes permanent, we
should make use of the COMMIT command.
Q.3 What are the various properties of transaction that the database system maintains to
ensure integrity of data. AU : Dec.-04
OR
Q.4 What are ACID properties ? AU : May-05,06,08,13,15,Dec.-07,14,17
Ans. : In a database, each transaction should maintain ACID property to meet the
consistency and integrity of the database. These are
(1) Atomicity (2) Consistency (3) Isolation (4) Durability
Q.5 Give the meaning of the expression ACID transaction. AU : Dec.-08
Ans. : The expression ACID transaction represents the transaction that follows the ACID
Properties.
Q.6 State the atomicity property of a transaction. AU : May-09,13
Ans. : This property states that each transaction must be considered as a single unit and
must be completed fully or not completed at all.
No transaction in the database is left half completed.
Q.7 What is meant by concurrency control ? AU : Dec.-15
Ans. : A mechanism which ensures that simultaneous execution of more than one
transactions does not lead to any database inconsistencies is called concurrency control
mechanism.
Q.8 State the need for concurrency control. AU : Dec.-17
OR
Q.9 Why is it necessary to have control of concurrent execution of transactions ? How is it
made possible ? AU : Dec.-02
Ans. : Serializability is a concept that helps to identify which non-serial schedules are
correct, by checking whether they are equivalent to some serial schedule.
It is tested using the precedence graph technique.
Q.12 What is serializable schedule ? AU : May-17
Ans. : A schedule is serializable if its effect is equivalent to that of some serial
schedule, i.e. a schedule in which the transactions execute one after the other. For example,
consider the following two transactions T1 and T2 executed serially :
T1 T2
R(A)
W(A)
R(B)
W(B)
R(A)
W(A)
R(B)
W(B)
All the operations of transaction T1 on data items A and then B executes and then in
transaction T2 all the operations on data items A and B execute. The R stands for Read
operation and W stands for write operation.
Q.13 When are two schedules conflict equivalent ? AU : Dec.-08
Ans. : Two schedules are conflict equivalent if one can be transformed into the other by a
series of swaps of non-conflicting operations. For example :
T1 T2 T1 T2
Read(A) Read(A)
Write(A) Write(A)
Read(A) Read(B)
Write(A) Write(B)
Read(B) Read(A)
Write(B) Write(A)
Read(B) Read(B)
Write(B) Write(B)
Q.14 What is two phase locking protocol ?
Ans. : The two phase locking is a protocol in which there are two phases :
i) Growing Phase (Locking Phase) : It is a phase in which the transaction may obtain
locks but does not release any lock.
ii) Shrinking Phase (Unlocking Phase) : It is a phase in which the transaction may
release the locks but does not obtain any new lock.
Q.15 What is the difference between shared lock and exclusive lock ? AU : May-18
Ans. :
Shared lock : It is used when the transaction wants to perform only a read operation.
Multiple shared locks can be held on a data item simultaneously. Using a shared lock, a
data item can only be viewed.
Exclusive lock : It is used when the transaction wants to perform both read and write
operations. Only one exclusive lock can be placed on a data item at a time. Using an
exclusive lock, data can be inserted or deleted.
Q.16 What type of lock is needed for insert and delete operations ? AU : May-17
Ans. : An exclusive lock is needed, because insert and delete operations modify the data item.
Q.17 What is strict two phase locking protocol ? State its benefits and disadvantage.
Ans. :
Benefits :
1. It ensures that any data written by an uncommitted transaction is locked in exclusive
mode until the transaction commits, preventing any other transaction from reading that
data.
2. This protocol solves the dirty read problem.
Disadvantage :
1. Concurrency is reduced.
Q.18 What is rigorous two phase locking protocol ? AU : Dec.-13
Ans. : This is a stricter two phase locking protocol. Here all locks are to be held until the
transaction commits.
Q.19 Differentiate strict two phase locking and rigorous two phase locking protocol.
AU : May-16
Ans. :
In the strict two phase locking protocol, all exclusive mode locks must be held until the
transaction commits.
The rigorous two phase locking protocol is stricter than the strict two phase locking
protocol. Here all locks (shared as well as exclusive) must be held until the transaction
commits.
UNIT - V
OBJECT RELATIONAL AND NO-SQL DATABASES
Syllabus
Mapping EER to ODB schema – Object identifier – reference types – rowtypes – UDTs –
Subtypes and supertypes – user-defined routines – Collection types – Object Query
Language; No-SQL: CAP theorem – Document-based: MongoDB data model and CRUD
operations; Column-based: Hbase data model and CRUD operations.
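Object identifier
PostgreSQL, for example, uses object identifiers (OIDs) internally as primary keys for its
system tables, and provides OID alias types such as regclass that accept a table name as
input and look up the matching OID. To examine the pg_attribute rows related to a table
mytable, one could simply write :
SELECT * FROM pg_attribute
WHERE attrelid = 'mytable'::regclass;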
rather than
SELECT * FROM pg_attribute
WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'mytable');
While that doesn't look all that bad by itself, it's still oversimplified. A far more complicated
sub-select would be needed to select the right OID if there are multiple tables
named mytable in different schemas. The regclass input converter handles the table lookup
according to the schema path setting, and so it does the "right thing" automatically. Similarly,
casting a table's OID to regclass is handy for symbolic display of a numeric OID.
Table 8-19. Object Identifier Types
Name    References    Description                  Value Example
oid     any           numeric object identifier    564182
Syntax
record { table | view }%ROWTYPE
Description
record
Specifies an identifier for the record.
table
Specifies an identifier for the table whose column definitions will be used to define
the fields in the record.
view
Specifies an identifier for the view whose column definitions will be used to define
the fields in the record.
%ROWTYPE
Specifies that the record field data types are to be derived from the column data
types that are associated with the identified table or view. Record fields do not
inherit any other column attributes, such as, for example, the nullability attribute.
Example
The following example shows how to use the %ROWTYPE attribute to create a record
(named r_emp) instead of declaring individual variables for the columns in the EMP table.
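A minimal sketch of such a declaration (assuming an EMP table with EMPNO, ENAME and
SAL columns) would be :
DECLARE
    r_emp    emp%ROWTYPE;   -- one field for every column of EMP
BEGIN
    SELECT * INTO r_emp FROM emp WHERE empno = 7369;
    DBMS_OUTPUT.PUT_LINE(r_emp.ename || ' earns ' || r_emp.sal);
END;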
Reference types
For every structured type you create, Db2® automatically creates a companion type. The
companion type is called a reference type and the structured type to which it refers is called a
referenced type. Typed tables can make special use of the reference type. You can also use
reference types in SQL statements in the same way that you use other user-defined types. To
use a reference type in an SQL statement, use REF(type-name), where type-name represents
the referenced type.
Db2 uses the reference type as the type of the object identifier column in typed tables. The
object identifier uniquely identifies a row object in the typed table hierarchy. Db2 also uses
reference types to store references to rows in typed tables. You can use reference types to
refer to each row object in the table.
References are strongly typed. Therefore, you must have a way to use the type in expressions.
When you create the root type of a type hierarchy, you can specify the base type for a
reference with the REF USING clause of the CREATE TYPE statement. The base type for a
reference is called the representation type. If you do not specify the representation type with
the REF USING clause, Db2 uses the default data type of VARCHAR(16) FOR BIT DATA.
The representation type of the root type is inherited by all its subtypes. The REF USING
clause is only valid when you define the root type of a hierarchy. In the examples used
throughout this section, the representation type for the BusinessUnit_t type is INTEGER,
while the representation type for Person_t is VARCHAR(13).
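A sketch of how the root type of such a hierarchy might be declared (illustrative only; the
attribute list is an assumption) :
CREATE TYPE BusinessUnit_t AS
   (Name      VARCHAR(20),
    Headcount INT)
   REF USING INTEGER
   MODE DB2SQL;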
Relationships between objects in typed tables
You can define relationships between objects in one typed table and objects in another table.
You can also define relationships between objects in the same typed table.
Defining semantic relationships with references
Using the WITH OPTIONS clause of CREATE TABLE, you can define that a relationship
exists between a column in one table and the objects in the same or another table.
Referential integrity versus scoped references
Although scoped references do define relationships among objects in tables, they are different
from referential integrity relationships.
ROWTYPE Attribute
The %ROWTYPE attribute provides a record type that represents a row in a database
table. The record can store an entire row of data selected from the table or fetched
from a cursor or cursor variable. Fields in a record and corresponding columns in a
row have the same names and datatypes.
You can use the %ROWTYPE attribute in variable declarations as a datatype specifier.
Variables declared using %ROWTYPE are treated like those declared using a datatype
name. For more information, see "Using the %ROWTYPE Attribute".
Syntax
rowtype_attribute ::=
{cursor_name | cursor_variable_name | table_name} % ROWTYPE
cursor_name
An explicit cursor previously declared within the current scope.
cursor_variable_name
A PL/SQL strongly typed cursor variable, previously declared within the current
scope.
table_name
A database table or view that must be accessible when the declaration is
elaborated.
Usage Notes
There are two ways to assign values to all fields in a record at once :
First, PL/SQL allows aggregate assignment between entire records if their declarations
refer to the same table or cursor.
Second, you can assign a list of column values to a record by using the SELECT INTO or
FETCH statement.
Examples
The following example uses %ROWTYPE to declare two records. The first record
stores an entire row selected from a table. The second record stores a row fetched
from the c1 cursor, which queries a subset of the columns from the table. The
example retrieves a single row from the table and stores it in the record, then
checks the values of some table columns.
DECLARE
emp_rec employees%ROWTYPE;
my_empno employees.employee_id%TYPE := 100;
CURSOR c1 IS
SELECT department_id, department_name, location_id FROM departments;
dept_rec c1%ROWTYPE;
BEGIN
SELECT * INTO emp_rec FROM employees WHERE employee_id = my_empno;
IF (emp_rec.department_id = 20) AND (emp_rec.salary > 2000) THEN
NULL;
END IF;
END;
/
In this case, we will see how the insert command behaves when the value of one or more
fields is not supplied.
For example, suppose we do not pass the value of one of the UDT fields here. How will
Cassandra handle this ?
It will insert the row as usual, but it will take the missing field value as null. Every field
value, except the primary key, that we do not pass at the time of insertion is taken by
Cassandra as null.
CQL query inserting a row without one or more field values of the UDT :
INSERT INTO Registration (Emp_id, Emp_Name, current_address)
VALUES (1003, 'Amit Gupta',
        { h_no : 'D 210', city : 'Bangalore', state : 'MH', pin_code : 12345 });
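For reference, the current_address column used above is based on a user-defined type
roughly along these lines (a sketch with a hypothetical type name; the actual definition
belongs to the earlier part of this example) :
CREATE TYPE current_address_type (
    h_no     text,
    city     text,
    state    text,
    pin_code int
);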
You can create your own procedures, functions and methods in any of the supported
implementation styles for the routine type. Generally the prefix 'user-defined' is not used
when referring to procedures and methods. User-defined functions are also commonly called
UDFs.
User-defined routine definitions are stored in the SYSTOOLS system catalog table schema.
Sourced: user-defined routines can be sourced from the logic of existing built-in
routines.
SQL: user-defined routines can be implemented using only SQL statements.
External: user-defined routines can be implemented using one of a set of supported
programming languages.
When routines are created in a non-SQL programming language, the library or class
built from the code is associated with the routine definition by the value specified in
the EXTERNAL NAME clause. When the routine is invoked the library or class
associated with the routine is run.
User-defined routines can include a variety of SQL statements, but not all SQL statements.
User-defined routines are strongly typed, but type handling and error-handling mechanisms
must be developed or enhanced by routine developers.
In general, user-defined routines perform well, but not as well as built-in routines.
User-defined routines can invoke built-in routines and other user-defined routines
implemented in any of the supported formats. This flexibility allows users to essentially have
the freedom to build a complete library of routine modules that can be re-used.
In general, user-defined routines provide a means for extending the SQL language and for
modularizing logic that will be re-used by multiple queries or database applications where
built-in routines do not exist.
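For instance, a routine implemented in the SQL style can be as simple as the following
sketch (hypothetical function name, using Db2-style syntax) :
CREATE FUNCTION c_to_f (c DOUBLE)
    RETURNS DOUBLE
    LANGUAGE SQL
    CONTAINS SQL
    RETURN (c * 9.0 / 5.0) + 32;
Once created, it can be invoked in queries just like a built-in function, for example
SELECT c_to_f(temp_c) FROM readings (where readings is a hypothetical table).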
PL/SQL - Collections
In this section, we will discuss collections in PL/SQL. A collection is an ordered group of
elements having the same data type. Each element is identified by a unique
subscript that represents its position in the collection.
PL/SQL provides three collection types −
Index-by tables or Associative array
Nested table
Variable-size array or Varray
Oracle documentation provides the following characteristics for each type of
collections −
Collection Type : Associative array (or index-by table)
    Number of Elements : Unbounded
    Subscript Type : String or integer
    Dense or Sparse : Either
    Can Be Object Type Attribute : No
    Where Created : Only in PL/SQL block
Collection Type : Nested table
    Number of Elements : Unbounded
    Subscript Type : Integer
    Dense or Sparse : Starts dense, can become sparse
    Can Be Object Type Attribute : Yes
    Where Created : Either in PL/SQL block or at schema level
Collection Type : Variable-size array (Varray)
    Number of Elements : Bounded
    Subscript Type : Integer
    Dense or Sparse : Always dense
    Can Be Object Type Attribute : Yes
    Where Created : Either in PL/SQL block or at schema level
We have already discussed varray in the chapter 'PL/SQL arrays'. In this chapter,
we will discuss the PL/SQL tables.
Both types of PL/SQL tables, i.e., the index-by tables and the nested tables have the
same structure and their rows are accessed using the subscript notation. However,
these two types of tables differ in one aspect; the nested tables can be stored in a
database column and the index-by tables cannot.
Index-By Table
An index-by table (also called an associative array) is a set of key-value pairs.
Each key is unique and is used to locate the corresponding value. The key can be
either an integer or a string.
An index-by table is created using the following syntax. Here, we are creating
an index-by table named table_name, the keys of which will be of the
subscript_type and associated values will be of the element_type
TYPE type_name IS TABLE OF element_type [NOT NULL] INDEX BY subscript_type;
table_name type_name;
Example
The following example shows how to create a table to store integer values along with
names, and later prints the same list of names.
DECLARE
TYPE salary IS TABLE OF NUMBER INDEX BY VARCHAR2(20);
salary_list salary;
name VARCHAR2(20);
BEGIN
-- adding elements to the table
salary_list('Rajnish') := 62000;
salary_list('Minakshi') := 75000;
salary_list('Martin') := 100000;
salary_list('James') := 78000;
-- printing the contents of the table
name := salary_list.FIRST;
WHILE name IS NOT null LOOP
dbms_output.put_line
('Name = ' || name || ', Salary = ' || TO_CHAR(salary_list(name)));
name := salary_list.NEXT(name);
END LOOP;
END;
/
A second example uses a cursor over the following customers table :
+----+----------+-----+-----------+---------+
| ID | NAME     | AGE | ADDRESS   | SALARY  |
+----+----------+-----+-----------+---------+
|  1 | Ramesh   |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan   |  25 | Delhi     | 1500.00 |
|  3 | kaushik  |  23 | Kota      | 2000.00 |
|  4 | Chaitali |  25 | Mumbai    | 6500.00 |
|  5 | Hardik   |  27 | Bhopal    | 8500.00 |
|  6 | Komal    |  22 | MP        | 4500.00 |
+----+----------+-----+-----------+---------+
DECLARE
CURSOR c_customers is
select name from customers;
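-- (a sketch completing this cursor-based example, assuming the customers table shown
-- above; the fetched names are loaded into an index-by table and then printed)
TYPE c_list IS TABLE OF customers.name%TYPE INDEX BY BINARY_INTEGER;
name_list c_list;
counter INTEGER := 0;
BEGIN
FOR n IN c_customers LOOP
counter := counter + 1;
name_list(counter) := n.name;
dbms_output.put_line('Customer(' || counter || '): ' || name_list(counter));
END LOOP;
END;
/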
Nested Tables
A nested table is like a one-dimensional array with an arbitrary number of elements.
However, a nested table differs from an array in the following aspects −
An array has a declared number of elements, but a nested table does not. The
size of a nested table can increase dynamically.
An array is always dense, i.e., it always has consecutive subscripts. A nested
table is dense initially, but it can become sparse when elements are deleted
from it.
A nested table is created using the following syntax −
TYPE type_name IS TABLE OF element_type [NOT NULL];
table_name type_name;
This declaration is similar to the declaration of an index-by table, but there is
no INDEX BY clause.
A nested table can be stored in a database column. It can further be used for
simplifying SQL operations where you join a single-column table with a larger table.
An associative array cannot be stored in the database.
Example
The following examples illustrate the use of nested table −
DECLARE
TYPE names_table IS TABLE OF VARCHAR2(10);
TYPE grades IS TABLE OF INTEGER;
names names_table;
marks grades;
total integer;
BEGIN
names := names_table('Kavita', 'Pritam', 'Ayan', 'Rishav', 'Aziz');
marks:= grades(98, 97, 78, 87, 92);
total := names.count;
dbms_output.put_line('Total '|| total || ' Students');
FOR i IN 1 .. total LOOP
dbms_output.put_line('Student:'||names(i)||', Marks:' || marks(i));
end loop;
END;
/
When the above code is executed at the SQL prompt, it produces the following result −
Total 5 Students
Student:Kavita, Marks:98
Student:Pritam, Marks:97
Student:Ayan, Marks:78
Student:Rishav, Marks:87
Student:Aziz, Marks:92
Collection Methods
PL/SQL provides the built-in collection methods that make collections easier to use.
The following table lists the methods and their purpose −
1. EXISTS(n) : Returns TRUE if the nth element in a collection exists; otherwise returns
FALSE.
2. COUNT : Returns the number of elements that a collection currently contains.
3. LIMIT : Checks the maximum size of a collection.
4. FIRST : Returns the first (smallest) index number in a collection that uses integer
subscripts.
5. LAST : Returns the last (largest) index number in a collection that uses integer
subscripts.
6. PRIOR(n) : Returns the index number that precedes index n in a collection.
7. NEXT(n) : Returns the index number that succeeds index n.
8. EXTEND : Appends one null element to a collection.
9. EXTEND(n) : Appends n null elements to a collection.
10. EXTEND(n,i) : Appends n copies of the ith element to a collection.
11. TRIM : Removes one element from the end of a collection.
12. TRIM(n) : Removes n elements from the end of a collection.
13. DELETE : Removes all elements from a collection, setting COUNT to 0.
14. DELETE(n) : Removes the nth element from an associative array with a numeric key or
a nested table. If the associative array has a string key, the element corresponding to the
key value is deleted. If n is null, DELETE(n) does nothing.
15. DELETE(m,n) : Removes all elements in the range m..n from an associative array or
nested table. If m is larger than n or if m or n is null, DELETE(m,n) does nothing.
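A short sketch showing a few of these methods applied to a nested table (illustrative
values) :
DECLARE
    TYPE t_names IS TABLE OF VARCHAR2(10);
    names t_names := t_names('A', 'B', 'C');
BEGIN
    dbms_output.put_line(names.COUNT);   -- 3
    names.EXTEND(2);                     -- appends two null elements, so LAST becomes 5
    dbms_output.put_line(names.LAST);    -- 5
    names.TRIM;                          -- removes one element from the end
    names.DELETE(1);                     -- removes the first element
    dbms_output.put_line(names.COUNT);   -- 3
END;
/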
Collection Exceptions
The following are the common collection exceptions and the situations in which they are
raised −
COLLECTION_IS_NULL : You try to operate on an atomically null collection.
NO_DATA_FOUND : A subscript designates an element that was deleted, or a nonexistent
element of an associative array.
SUBSCRIPT_BEYOND_COUNT : A subscript exceeds the number of elements in a collection.
SUBSCRIPT_OUTSIDE_LIMIT : A subscript is outside the allowed range.
VALUE_ERROR : A subscript is null or not convertible to the key type.
Introduction
OQL is the way to access data in an O2 database. OQL is a powerful and easy-to-
use SQL-like query language with special features dealing with complex objects,
values and methods.
Using OQL
We've been able to create classes and write some programs. So far, O2 appears to
be like an object-oriented programming language like C++ instead of a database
system. Probably the main difference is that O2 supports queries. The queries that
you'll be creating will look very similar to that of SQL.
In order to perform queries, you'll need to enter query mode. From within the O2
client, do the command:
query;
This will put you in a sub-shell for queries. From here, you can enter your queries
followed by Ctrl-D. To exit query mode, just hit Ctrl-D without entering a query.
Example Query 1
Give the names of people who are older than 26 years old:
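A sketch of this query, in the style of the later examples (assuming a persistent name
People whose objects have name and age attributes) :
SELECT p.name
FROM p in People
WHERE p.age > 26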
We use the dot notation and path expressions to access components of complex
values.
Let variables t and ta range over objects in extents (persistent names) of Tutors
and TAs (i.e., range over objects in sets Tutors and TAs).
A cascade of dots can be used only when every name along the path denotes a single
object and not a collection.
Example Query 2
Find the names of the students of all tutors:
SELECT s.name
FROM Tutors t, t.students s
Here we notice that the variable t that binds to the first collection of FROM is
used to help us define the second collection s. Because students is a
collection, we use it in the FROM list, like t.students above, if we want to access
attributes of students.
Example Query 3
Give the names of the Tutors which have a salary greater than $300 and have a
student paying more than $30:
SELECT t.name
FROM ( SELECT t FROM Tutors t WHERE t.salary > 300 ) r, r.students s
WHERE s.fee > 30
Example Query 4
Give the names of people who aren't TAs:
SELECT p.name
FROM p in People
WHERE not ( p.name in SELECT t.name FROM t in TAs )
The standard O2C operators for sets are + (union), * (intersection), and -
(difference). In OQL, the operators are written as UNION, INTERSECT
and EXCEPT , respectively.
Example Query 5
Give the names of TAs with the highest salary:
SELECT t.name
FROM t in TAs
WHERE t.salary = max ( select ta.salary from ta in TAs )
GROUP BY
The GROUP BY operator creates a set of tuples with two fields. The first has the
type of the specified GROUP BY attribute. The second field is the set of tuples
that match that attribute. By default, the second field is called PARTITION.
Example Query 6
Give the names of the students and the average fee they pay their Tutors:
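A sketch of the query (the field names sname and partition below match the explanation
that follows) :
SELECT sname, avgFee: avg( SELECT p.s.fee FROM partition p )
FROM Tutors t, t.students s
GROUP BY sname: s.name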
1. Initial collection
We begin from the collection Tutors, but technically it is a bag of tuples of the form
tuple( t: Tutor, s: tuple( name: string, fee: real ) )
where t is a Tutor object and s denotes a student tuple. In general, there are fields for all
of the variable bindings in the FROM clause.
2. Intermediate collection
The GROUP BY attribute s.name maps the tuples of the initial collection to the
value of the name of the student. The intermediate collection is a set of tuples of
type:
tuple( sname: string, partition: set( tuple(t: Tutor, s: tuple( name: string, fee:
real ) ) ) )
For example:
3. Output collection
Consists of student-average fee pairs, one for each tuple in the intermediate
collection. The type of the tuples in the output is :
tuple( sname: string, avgFee: real )
We let p range over all tuples in partition. Each of these tuples contains a Tutor
object and a student tuple. Thus, p.s.fee extracts the fee from one of the
student tuples.
Instead of using query mode, you can incorporate these queries in your O2
programs using the "o2query" command:
run body {
o2 real total_salaries;
o2query( total_salaries, "sum ( SELECT ta->get_salary \
FROM ta in TAs )" );
printf("TAs combined salary: %.2f\n", total_salaries);
};
The first argument for o2query is the variable in which you want to store the
query results. The second argument is a string that contains the query to be
performed. If your query string takes up several lines, be sure to backslash (\)
the carriage returns.
In MongoDB, data has a flexible schema. It is totally different from SQL database
where you had to determine and declare a table's schema before inserting data.
MongoDB collections do not enforce document structure.
The main challenge in data modeling is balancing the need of the application, the
performance characteristics of the database engine, and the data retrieval patterns.
Example
Suppose a client needs a database design for his blog/website; let us see the
differences between the RDBMS and MongoDB schema designs. The website has the
following requirements.
Every post has the unique title, description and url.
Every post can have one or more tags.
Every post has the name of its publisher and total number of likes.
Every post has comments given by users along with their name, message,
date-time and likes.
On each post, there can be zero or more comments.
In an RDBMS schema, the design for the above requirements will need a minimum of
three tables, while in MongoDB the design will have one collection, post, with the following
structure −
{
_id: POST_ID
title: TITLE_OF_POST,
description: POST_DESCRIPTION,
by: POST_BY,
url: URL_OF_POST,
tags: [TAG1, TAG2, TAG3],
likes: TOTAL_LIKES,
comments: [
{
user:'COMMENT_BY',
message: TEXT,
dateCreated: DATE_TIME,
like: LIKES
},
{
user:'COMMENT_BY',
message: TEXT,
dateCreated: DATE_TIME,
like: LIKES
}
]
}
So while showing the data, in RDBMS you need to join three tables and in
MongoDB, data will be shown from one collection only.
insertOne()
To insert a single document, we use the insertOne() method on the chosen collection. If the
create operation is successful, a new document is created. The function will return an
object where "acknowledged" is "true" and "insertedId" is the newly created "ObjectId".
> db.RecordsDB.insertOne({
... name: "Marsh",
... age: "6 years",
... species: "Dog",
... ownerAddress: "380 W. Fir Ave",
... chipped: true
... })
{
"acknowledged" : true,
"insertedId" : ObjectId("5fd989674e6b9ceb8665c57d")
}
insertMany()
It’s possible to insert multiple items at one time by calling the insertMany() method on the
desired collection. In this case, we pass multiple items into our chosen collection
(RecordsDB) and separate them by commas. Within the parentheses, we use brackets to
indicate that we are passing in a list of multiple entries. This is commonly referred to as a
nested method.
db.RecordsDB.insertMany([{
name: "Marsh",
age: "6 years",
species: "Dog",
ownerAddress: "380 W. Fir Ave",
chipped: true},
{name: "Kitana",
age: "4 years",
species: "Cat",
ownerAddress: "521 E. Cortland",
chipped: true}])
find()
In order to get all the documents from a collection, we can simply use the find() method on
our chosen collection. Executing just the find() method with no arguments will return all
records currently in the collection.
db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years",
"species" : "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "3 years",
"species" : "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "8 years",
"species" : "Dog", "ownerAddress" : "900 W. Wood Way", "chipped" : true }
Here we can see that every record has an assigned “ObjectId” mapped to the “_id” key.
If you want to get more specific with a read operation and find a desired subsection of the
records, you can use the previously mentioned filtering criteria to choose what results should
be returned. One of the most common ways of filtering the results is to search by value.
db.RecordsDB.find({"species":"Cat"})
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
findOne()
In order to get one document that satisfies the search criteria, we can simply use
the findOne() method on our chosen collection. If multiple documents satisfy the query, this
method returns the first document according to the natural order which reflects the order of
documents on the disk. If no documents satisfy the search criteria, the function returns null.
The function takes the following form of syntax.
db.{collection}.findOne({query}, {projection})
Let's take the following collection—say, RecordsDB, as an example.
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "8 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years",
"species" : "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "3 years",
"species" : "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "8 years",
"species" : "Dog", "ownerAddress" : "900 W. Wood Way", "chipped" : true }
And, we run the following line of code:
db.RecordsDB.findOne({"age":"8 years"})
We would get the following result:
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "8 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
Notice that even though two documents meet the search criteria, only the first document that
matches the search condition is returned.
Update Operations
Like create operations, update operations operate on a single collection, and they are atomic
at a single document level. An update operation takes filters and criteria to select the
documents you want to update.
You should be careful when updating documents, as updates are permanent and can’t be
rolled back. This applies to delete operations as well.
For MongoDB CRUD, there are three different methods of updating documents:
db.collection.updateOne()
db.collection.updateMany()
db.collection.replaceOne()
updateOne()
We can update a currently existing record and change a single document with an update
operation. To do this, we use the updateOne() method on a chosen collection, which here is
“RecordsDB.” To update a document, we provide the method with two arguments: an update
filter and an update action.
The update filter defines which items we want to update, and the update action defines how
to update those items. We first pass in the update filter. Then, we use the “$set” key and
provide the fields we want to update as a value. This method will update the first record that
matches the provided filter.
db.RecordsDB.updateOne({name: "Marsh"}, {$set:{ownerAddress: "451 W. Coffee St.
A204"}})
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years",
"species" : "Dog", "ownerAddress" : "451 W. Coffee St. A204", "chipped" : true }
updateMany()
updateMany() allows us to update multiple items by passing in a list of items, just as we did
when inserting multiple items. This update operation uses the same syntax for updating a
single document.
db.RecordsDB.updateMany({species:"Dog"}, {$set: {age: "5"}})
{ "acknowledged" : true, "matchedCount" : 3, "modifiedCount" : 3 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" :
"Dog", "ownerAddress" : "451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" :
"Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "5", "species" :
"Dog", "ownerAddress" : "900 W. Wood Way", "chipped" : true }
replaceOne()
The replaceOne() method is used to replace a single document in the specified
collection. replaceOne() replaces the entire document, meaning fields in the old document
not contained in the new will be lost.
db.RecordsDB.replaceOne({name: "Kevin"}, {name: "Maki"})
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" :
"Dog", "ownerAddress" : "451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" :
"Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Maki" }
Delete Operations
Delete operations operate on a single collection, like update and create operations. Delete
operations are also atomic for a single document. You can provide delete operations with
filters and criteria in order to specify which documents you would like to delete from a
collection. The filter options rely on the same syntax that read operations utilize.
MongoDB has two different methods of deleting records from a collection:
db.collection.deleteOne()
db.collection.deleteMany()
deleteOne()
deleteOne() is used to remove a document from a specified collection on the MongoDB
server. A filter criteria is used to specify the item to delete. It deletes the first record that
matches the provided filter.
db.RecordsDB.deleteOne({name:"Maki"})
{ "acknowledged" : true, "deletedCount" : 1 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" :
"Dog", "ownerAddress" : "451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" :
"Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
deleteMany()
deleteMany() is a method used to delete multiple documents from a desired collection with a
single delete operation. A list is passed into the method and the individual items are defined
with filter criteria as in deleteOne().
db.RecordsDB.deleteMany({species:"Dog"})
{ "acknowledged" : true, "deletedCount" : 2 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species"
HBase Data Model
The HBase data model is designed to handle semi-structured data that may differ in
field size and in the type of data and columns. The data model's layout partitions the
data into simpler components and spreads them across the cluster. HBase's data
model consists of several logical components, such as tables, rows, columns, column
families, column qualifiers, cells and versions.
Table:
An HBase table is made up of several rows and columns. Tables in HBase are defined
upfront, at the time of the schema specification.
Row:
An HBase row consists of a row key and one or more columns with values associated
with them. Row keys are uninterpreted bytes. Rows are sorted lexicographically by row
key, with the lowest-order row key appearing first in a table. The design of the row key is
therefore very critical.
Column:
A column in HBase consists of a column family and a column qualifier, which are
separated by a : (colon) character.
Column Family:
Columns in Apache HBase are grouped into column families. A column family
physically colocates a group of columns and their values in order to improve
performance. Every row in a table has the same column families, although a given row
may store nothing in some of its column families.
The same prefix is given to all members of a column family. For example, the columns
courses:history and courses:math are both members of the courses column
family. The colon character (:) separates the column family from the column
qualifier. The column family prefix must be made up of printable characters.
During schema definition time, column families must be declared upfront while
columns are not specified during schema time. They can be conjured on the fly when
the table is up and running. Physically, all members of the column family are stored
on the file system together.
Column Qualifier
The column qualifier is added to a column family to identify a particular column. For a
column family content, for example, the qualifiers could be content:html and
content:pdf, each identifying the particular piece of content stored in a cell. Although
column families are fixed at table creation, column qualifiers are mutable and can
vary significantly from row to row.
Cell:
A cell stores data and is uniquely identified by the combination of row key, column
family and column qualifier. The data stored in a cell is called its value, and it is
always treated as a byte[].
Timestamp:
A timestamp is written alongside each value and acts as the identifier for a given
version of the value. The timestamp reflects the time when the data is written on the
Region Server, but when we put data into a cell, we can assign a different
timestamp value.
Create
Let’s create an HBase table and insert data into the table. As we know, while creating a
table the user needs to define the required column families.
Here we have created two column families for the table ‘employee’. The first column family
is ‘Personal info’ and the second column family is ‘Professional Info’.
1 create 'employee', 'Personal info', 'Professional Info'
2 0 row(s) in 1.4750 seconds
3
4 => Hbase::Table - employee
Upon successful creation of the table, the shell will return 0 rows.
Create a table with Namespace:
A namespace is nothing but a logical grouping of tables. ‘company_empinfo’ is the
namespace id in the below command.
1 create 'company_empinfo:employee', 'Personal info', 'Professional Info'
Create a table with version:
By default, versioning is not enabled in HBase. So users need to specify while creating.
Given below is the syntax for creating an HBase table with versioning enabled.
1 create 'tableName',{NAME=>"CF1",VERSIONS=>5},{NAME=>"CF2",VERSIONS=>5}
2 create 'bankdetails',{NAME=>"address",VERSIONS=>5}
Put:
Put command is used to insert records into HBase.
1 put 'employee', 1, 'Personal info:empId', 10
2 put 'employee', 1, 'Personal info:Name', 'Alex'
3 put 'employee', 1, 'Professional Info:Dept', 'IT'
Here, in the above example, all the puts having row key 1 are considered to be one row in
HBase. To add multiple rows :
1 put 'employee', 2, 'Personal info:empId', 20
2 put 'employee', 2, 'Personal info:Name', 'Bob'
3 put 'employee', 2, 'Professional Info:Dept', 'Sales'
As discussed earlier, the user can add any number of columns as part of the row.
Read
The ‘get’ and ‘scan’ commands are used to read data from HBase. Let's first discuss the
‘get’ operation.
get : The ‘get’ operation returns a single row from the HBase table. Given below is the
syntax for the ‘get’ method.
1 get 'table Name', 'Row Key'
1 hbase(main):022:0> get 'employee', 1
COLUMN                    CELL
 Personal info:Name       timestamp=1504600767520, value=Alex
 Personal info:empId      timestamp=1504600767491, value=10
 Professional Info:Dept   timestamp=1504600767540, value=IT
3 row(s) in 0.0250 seconds
Note : Notice that there is a timestamp attached to each cell. The timestamp of a cell is
updated whenever the cell value is updated. All the old values are still stored, but only the
value with the latest timestamp is displayed as output.
Get all versions of a column
The below command is used to retrieve the different versions of a cell. Here 'VERSIONS =>
3' defines the number of versions to be retrieved.
1 get 'Table Name', 'Row Key', {COLUMN => 'Column Family', VERSIONS => 3}
scan:
‘scan’ command is used to retrieve multiple rows.
Select all:
The below command is an example of a basic search on the entire table.
1 scan 'Table Name'
1 hbase(main):074:> scan 'employee'
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
1 column=Personal info:empId, timestamp=1504606480934, value=15
1 column=Professional Info:Dept, timestamp=1504600767540, value=IT
2 column=Personal info:Name, timestamp=1504600767588, value=Bob
2 column=Personal info:empId, timestamp=1504600767568, value=20
2 column=Professional Info:Dept, timestamp=1504600768266, value=Sales
2 row(s) in 0.0500 seconds
Note: All the Rows are arranged by Row Keys along with columns in each row.
Column Selection:
The below command is used to Scan any particular column.
1 hbase(main):001:>scan 'employee' ,{COLUMNS => 'Personal info:Name' }
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
2 column=Personal info:Name, timestamp=1504600767588, value=Bob
2 row(s) in 0.3660 seconds
Limit Query:
The below command scans a particular column and limits the number of rows returned.
1 hbase(main):002:>scan 'employee' ,{COLUMNS => 'Personal info:Name',LIMIT =>1 }
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
1 row(s) in 0.0270 seconds
Update
To update any record, HBase uses the ‘put’ command. To update any column value, the
user simply puts a new value, and HBase automatically stores the new record with the
latest timestamp.
1 put 'employee', 1, 'Personal info:empId', 30
The old value will not be deleted from the HBase table. Only the updated record with the
latest timestamp will be shown as query output.
To check the old value of any row use below command.
1 get 'Table Name', 'Row Key', {COLUMN => 'Column Family', VERSIONS => 3}
Delete
‘delete‘ command is used to delete individual cells of a record.
The below command is the syntax of delete command in the HBase Shell.
1 delete 'Table Name' ,'Row Key','Column Family:Column'
1 delete 'employee',1, 'Personal info:Name'
Drop Table:
To drop any table in HBase, it is first required to disable the table. The query will return an
error if the user tries to drop the table without disabling it first. Disabling the table removes
its indexes from memory.
The below command is used to disable and drop the table.
1 disable 'employee'
Once the table is disabled, the user can drop using below syntax.
1 drop 'employee'
You can verify whether a table exists using the ‘exists’ command. To enable a table which is
already disabled, just use the ‘enable’ command.