INFORMATION AND
DATABASE
MANAGEMENT
SYSTEMS (IDBMS)
Anukriti Bansal
Learning Objective
• The course is primarily concerned with the capture, digitization, representation,
organization, transformation, and presentation of information; algorithms for
efficient and effective access and updating of stored information; data modeling
and abstraction; and physical file storage techniques.
Course Outcomes
• Understand the different issues involved in the design and implementation of
a database system.
• Apply the modeling concepts and notation of the relational data model.
• Determine database storage structures and access techniques for a given problem.
• Understand the basic working of database management aspects in terms of
transaction processing, concurrency control, and recovery.
Course Contents
• Information Management Concepts
• Introduction to DBMS
• Data Modeling
• Relational Databases
• Transaction Processing, Concurrency Control, and Recovery
• Query Languages
• File structures, Indexing, and Hashing
• Advanced Topics
Textbook
• R. Elmasri and S. Navathe, Fundamentals of Database Systems, Addison-Wesley,
6th ed., 2011
Reference books
• A. Silberschatz, H. Korth, and S. Sudarshan, Database System Concepts, McGraw-Hill.
• R. Ramakrishnan, Database Management Systems, WCB/McGraw-Hill.
• C.J. Date, An Introduction to Database Systems, Pearson, 8th ed.
Lab Evaluation (25%)
• Continuous evaluation (Quiz, Assignment): 5%
• Midterm: 5%
• Final Examination: 5%
• Project: 10%
Theory Evaluation (75%)
• Continuous evaluation (Quiz): 24% (8+8+8)
• Midterm: 21%
• Final Examination: 30%
• Data is a representation of facts, concepts, or instructions in a formalized manner which should be
suitable for communication, interpretation, or processing by humans or machines.
• Data is represented by characters such as alphabets, digits, or special characters.
• Most data is being converted into a digital format
– Driven by user demand
– Facilitated by:
  • Increase in data processing capabilities
  • Lower cost and increased speed of storage
  • Affordable and faster networks
(Figure: video, photos, and books converted into binary digital data.)
Who creates data?
• Individuals
• Businesses
Digital Data
• Data can be categorized as either structured or unstructured
– Structured (20%): databases, spreadsheets
– Unstructured (80%): forms, images, audio, movies
• Over 80% of information is unstructured
(Figure: examples of unstructured data include letters, e-mail attachments, PDFs, X-rays,
checks, manuals, instant messages, images, documents, forms, web pages, contracts,
rich media, invoices, audio, and video.)
Data → Information
• Summarizing the data
• Averaging the data
• Selecting part of the data
• Graphing the data
• Adding context
• Adding value
Organized/Processed form of data is known as information
Definitions:
• data that have been processed so that they are meaningful;
• data that have been processed for a purpose;
• data that have been interpreted and understood by the recipient;
• data that have been processed and on which decisions and actions are based.
(Figure: Data Records, IR, Text, Multimedia.)
• Files
Most sizeable companies have huge stores of electronic files scattered throughout the enterprise (a legacy of
desktop networking). Letters, memos, reports, spreadsheets, database files, presentations, etc.
• Databases
Companies usually maintain a number of databases on several different hardware and software platforms.
• Email
Most employees communicate with email and much of an enterprise’s internal and external business
communication is done via email (and attachments).
• Instant Messaging (IM)
This is becoming the way employees talk to one another in real-time.
• Electronic Publishing
Most companies produce printed material such as catalogs, brochures, flyers, contact sheets, product
specification sheets, newsletters, business reports, etc. Also, an increasing amount of information exists only
in electronic format (e.g. Web pages, PDF documents, Intranets).
“Digital universe – The Information Explosion”
• The 21st century is the information era
• Information is being created at an ever-increasing rate
• Information has become critical for success
We live in an on-command, on-demand world
Example: social networking sites, e-mail, video- and photo-sharing websites,
online shopping, search engines, etc.
Information management is a big challenge
Organizations seek to store, protect, and optimize information
• Categories
• Equations
• Neural networks
• Natural language statements
• Logic statements
• Images
• Information capture is the process of collecting paper documents, forms, and
e-documents, transforming them into accurate, retrievable, digital information,
and delivering the information into business applications and databases for
immediate action.
• Organizations and businesses need to determine the best way of carrying out
data/information capture, as fits their purpose.
• Methods of information capture:
• Manual method
• Automated method
• Optical character recognition (OCR)
• Intelligent character recognition (ICR)
• Optical mark reading (OMR)
• Magnetic ink character recognition (MICR)
• Smart cards
• Web data capture
• Voice recognition
There are many ways to apply the information stored in representations.
• Retrieval
– Finding useful information
• Recognition
– Identifying an instance
• Inference
– Extend stored information to a new situation
Humans gain access to information and data to support their needs by:
• Search
• Link
• Browse
• Navigate
• An organized system for the collection, organization, storage, and communication of
information.
• An Information System is the system of persons, data records and activities that
process the data and information in a given organization, including manual processes
or automated processes.
• Information systems are created to capture, store, and support access of information
representations.
• Information systems include the Web, databases, libraries, archives, and enterprise
content management systems, etc.
• People use information systems so designers need to match content and the system
interface to the user’s needs.
• Computer based information system:
A combination of
Hardware
Software
Infrastructure and
Trained personnel
organized to facilitate
Planning
Control
Coordination and
Decision Making
in an organization.
• Hardware
The devices, such as the monitor, CPU, and keyboard, that work
together to accept, process, and display data and information.
• Software
The computer programs, and the manuals (if any), that allow
the hardware to process data.
• Data
Data are facts that programs use to produce useful information. Data is
stored in files.
• Networks
A connecting system that allows diverse computers to share resources.
• Procedures
Procedures are the policies that govern the operation of a computer system.
• People
Every system needs people if it is to be useful. People are often the most
overlooked element of the system, and probably the component that most
influences the success or failure of information systems. This includes "not only the users, but
those who operate and service the computers, those who maintain the data, and
those who support the network of computers."
• Executive Support System (ESS)
An Executive Support System ("ESS") is designed to help senior management make
strategic decisions.
• Management Information System (MIS)
A management information system (“MIS”) is mainly concerned with internal
sources of information and summarizes it into a series of management reports.
• Decision Support System (DSS)
Decision-support systems ("DSS") are specifically designed to help management make
decisions in situations where there is uncertainty about the possible outcomes of those
decisions.
• Knowledge Management System (KMS)
Knowledge Management Systems ("KMS") exist to help businesses create and share
information.
• Transaction Processing System (TPS)
Transaction Processing Systems ("TPS") are designed to process routine transactions
efficiently and accurately.
• Office Automation System (OAS)
Office Automation Systems ("OAS") are systems that try to improve the productivity of
employees who need to process data and information.
• Information Management:
• Management of information resources.
• Design of information technology components.
• Analysis of information processing procedures.
• Deriving knowledge from the information corpus.
• Information Management System:
A general term for software designed to facilitate/manage the storage,
organization, and retrieval of information.
• Query: A request to access data from a database to manipulate it or
retrieve it.
• Declarative Query
• Ask the system “what to fetch”.
• Used in relational database models
• Navigational Query
• Instruct the system in a sequence of steps how to reach a required record.
• Used in network database models
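The contrast between the two query styles can be sketched in Python. This is a minimal illustration, not a real network DBMS: the `departments` and `employees` records, their `next` pointers, and the sample names are all hypothetical, and SQLite (via the standard `sqlite3` module) stands in for a relational system.

```python
import sqlite3

# Hypothetical network-model data: an owner record points at its first member,
# and each member record points at the next member of the same set.
departments = {"Sales": {"first_employee": 0}}
employees = [
    {"name": "Asha", "dept": "Sales", "next": 1},
    {"name": "Ravi", "dept": "Sales", "next": None},
]

def navigational_lookup(dept_name):
    """Navigational style: spell out each step needed to reach the records."""
    names = []
    position = departments[dept_name]["first_employee"]
    while position is not None:
        record = employees[position]
        names.append(record["name"])
        position = record["next"]  # follow the pointer to the next member
    return names

def declarative_lookup(dept_name):
    """Declarative style: state what is wanted; the system decides how to fetch it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [(e["name"], e["dept"]) for e in employees],
    )
    rows = conn.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY name", (dept_name,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

navigational_result = navigational_lookup("Sales")
declarative_result = declarative_lookup("Sales")
```

Both functions return the same names; only the navigational one depends on how the records are linked together.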
• Reliability
• Scalability
• Efficiency
• Effectiveness
Information and Database
Management Systems
(CSE 220)
Anukriti Bansal
Acknowledgements
Many slides of this presentation are from
“Fundamentals of Database Systems, by
Elmasri, Navathe”.
Data:
• information in raw or unorganized form.
• facts and statistics collected together for
reference or analysis.
Database:
• Is an organized collection of data.
• Collection of related data.
• Collection of schemas, tables, queries,
reports, views, etc.
Databases these days:
• Used daily:
– Facebook
• Posts
• Likes
– Twitter
• Tweets
– Online shopping
• amazon.com
• flipkart.com
Database Management System
(DBMS):
• A collection of programs that enables users to
create and maintain a database.
• A general purpose software system that
facilitates the process of:
– Defining
– Constructing
– Manipulating
– Sharing
databases among various users and applications.
DBMS Functionalities:
• Define a database: in terms of data types,
structures and constraints
– Metadata: data about data
• Construct or Load the Database on a
secondary storage medium
• Manipulating the database: querying,
generating reports, insertions, deletions and
modifications to its content
• Concurrent Processing and Sharing by a set of
users and programs – yet, keeping all data
valid and consistent
Simplified database system
environment:
Source: Ramez Elmasri and Shamkant B. Navathe
DBMS Functionalities:
Other features:
– Protection or Security measures to prevent
unauthorized access
– “Active” processing to take internal actions on
data
– Presentation and Visualization of data
– Maintenance of the database and associated
programs over the lifetime of the database
application
Example of a simple database
Source: Ramez Elmasri and Shamkant B. Navathe
Database approach vs file processing
approach:
• Self describing nature of a database system.
• Insulation between programs and data, and
data abstraction.
• Support of multiple views of data.
• Sharing of data and multiuser transaction
processing.
– Allowing concurrent users for retrieval & updating
– Concurrency control
– Ensuring complete transaction
– OLTP (online transaction processing)
Example of a simplified database
catalog:
Source: Ramez Elmasri and Shamkant B. Navathe
Data Abstraction:
• A data model is used to hide storage details
and present the users with a conceptual view*
of the database.
• Programs refer to the data model constructs
rather than data storage details.
Database Users:
• Users may be divided into:
– Those who actually use and control the database
content, (called “Actors on the Scene”), and
– Those who design and develop the DBMS
software and related tools, and the computer
systems operators (called “Workers Behind the
Scene”).
1. Actors on the scene:
1. Database Administrators
– Responsible for authorizing access to the database,
for coordinating and monitoring its use, acquiring
software and hardware resources, controlling its use
and monitoring efficiency of operations.
2. Database Designers
– Responsible for defining the content, the structure, the
constraints, and functions or transactions against the
database. They must communicate with the end users
and understand their needs.
3. End Users : They use the data for queries,
reports and some of them update the
database content. End-users can be
categorized into:
a. Casual end users: access database occasionally
b. Naive or Parametric end users: they make up a
large section of the end-user population.
– They use previously well-defined functions in the form of
“canned transactions” against the database.
– Users of Mobile Apps mostly fall in this category
– Bank-tellers or reservation clerks are parametric users who do
this activity for an entire shift of operations.
– Social Media Users post and read information from websites
c. Sophisticated end users:
– These include business analysts, scientists, engineers, others
thoroughly familiar with the system capabilities.
– Many use tools in the form of software packages that work
closely with the stored database.
d. Stand-alone users:
– Mostly maintain personal databases using ready-to-use
packaged applications.
– An example is the user of a tax program that creates its own
internal database.
– Another example is a user that maintains a database of
personal photos and videos.
4. System Analysts and Application Programmers (S/W
Engineers): This category currently accounts for a
very large proportion of the IT work force.
a. System Analysts: They understand the user
requirements of naïve and sophisticated users and design
applications including canned transactions to meet those
requirements.
b. Application Programmers: Implement the
specifications developed by analysts and test and debug
them before deployment.
c. Business Analysts: There is an increasing need for such
people who can analyze vast amounts of business data and
real-time data (“Big Data”) for better decision making
related to planning, advertising, marketing etc.
2. Workers behind the scene:
1. DBMS system designers and implementers: Design and
implement DBMS packages in the form of modules and
interfaces and test and debug them. The DBMS must
interface with applications, language compilers, operating
system components, etc.
2. Tool developers: Design and implement software systems
called tools for modeling and designing databases,
performance monitoring, prototyping, test data generation,
user interface creation, simulation, etc., that facilitate building
applications and allow the database to be used effectively.
3. Operators and maintenance personnel: They manage the
actual running and maintenance of the database system
hardware and software environment.
Advantages of using DBMS approach:
• Controlling redundancy
• Restricting unauthorized access
• Providing persistent storage for program objects
• Providing storage structures for efficient query
processing
• Providing backup and recovery
• Providing multiple user interfaces
• Representing complex relationships among data
• Enforcing integrity constraints
• Permitting inferencing and actions using rules
• Additional implications of using database
approach:
– Potential for enforcing standards
– Reduced application development time
– Flexibility
– Availability of up-to-date information
When not to use a DBMS:
Overhead cost of using DBMS because of:
• High initial investment in hardware, software
and training
• The generality that a DBMS provides for
defining and processing data.
• Overhead for providing security, concurrency
control, recovery and integrity functions.
When not to use a DBMS:
Problems may also arise if the database designers and DBA do not
design the database properly, or if the database system
is not implemented properly. A DBMS may be unnecessary when:
• The database and applications are simple, well
defined and not expected to change.
• There are stringent real time requirements for
some programs that may not be met because of
DBMS overhead.
• Multiple-user access to data is not required.
Example of a simple database
Source: Ramez Elmasri and Shamkant B. Navathe
View of Data
• View is a single table, derived from other
tables. These other tables could be base
tables or previously defined views.
• A view does not exist in physical form; it is
considered a virtual table.
In other words:
• A view logically represents a subset of data from
one or more tables.
Why use a view:
• To restrict database access
• To make complex queries easy
• To allow data independence
• To present different views of same data
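A view can be tried out with Python's standard `sqlite3` module. This is only a sketch: the `student` table, the `student_public` view, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid TEXT PRIMARY KEY, name TEXT, age INTEGER, cgpa REAL);
    INSERT INTO student VALUES ('s1', 'Asha', 20, 8.9), ('s2', 'Ravi', 21, 7.4);
    CREATE VIEW student_public AS SELECT sid, name FROM student;
""")

# The view exposes only sid and name, so age and cgpa stay hidden from its users.
view_rows = conn.execute("SELECT * FROM student_public ORDER BY sid").fetchall()

# The view stores no rows of its own: a change to the base table shows up
# the next time the view is queried.
conn.execute("UPDATE student SET name = 'Ravi K' WHERE sid = 's2'")
view_rows_after = conn.execute("SELECT * FROM student_public ORDER BY sid").fetchall()
```

The view restricts access to two columns and still reflects the current contents of the base table, which is what "virtual table" means here.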
Data Abstraction: Once Again??
• A data model is used to hide storage details
and present the users with a conceptual view
of the database.
• Programs refer to the data model constructs
rather than data storage details
Data Abstraction:
• For a system to be usable, it must retrieve
data efficiently.
• Developers hide the complexity from users
through several levels of abstraction.
Levels of Data Abstraction:
1. Physical Level (Internal): The lowest level of
abstraction describes how the data are
actually stored. The physical level describes
complex low level data structures in detail.
2. Logical Level (Conceptual): It describes what
data are stored in the database and what
relationships exist among those data.
3. View Level (External): The highest level of
abstraction, which describes only part of
the entire database.
Data Abstraction:
• View Level (View 1, View 2, …, View n): What data do users and
application programs see?
• Logical Level: What data is stored? Describes data properties such as
data semantics and data relationships.
• Physical Level: How is the data actually stored?
E.g., are we using disks? Which file system?
Database Instances/ Database state:
• The collection of information stored in the
database at a particular moment is called an
instance of a database.
Database Instances/ Database state:
• Database State:
– Refers to the content of a database at a moment
in time.
• Initial Database State:
– Refers to the database state when it is initially
loaded into the system.
• Valid State:
– A state that satisfies the structure and constraints
of the database.
Example of a database state:
Source: Ramez Elmasri and Shamkant B. Navathe
Data Model:
• A collection of concepts, used to describe the
structure* of a database.
• It provides the necessary means to achieve
abstraction.
* Structure of a database means data types,
relationships and constraints that should hold
for the data.
Relational Data Model:
• The central data description construct in this
model is a relation, which can be thought of as
a set of records.
• A description of data in terms of a data model
is called a schema.
Relational Data Model…
• In the relational model, the schema for a
relation specifies its name, the name of each
field (or attribute or column), and the type of
each field.
• Students(sid: string, name: string, login: string,
age: integer, cgpa: real)
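The Students schema can be declared and read back with Python's standard `sqlite3` module. This is a sketch under assumptions: SQLite's `TEXT`, `INTEGER`, and `REAL` stand in for the slide's string, integer, and real types, and the inserted row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, cgpa REAL)"
)

def read_schema():
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, default, pk).
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]

schema_before = read_schema()

# Adding a record changes the instance (the stored data), not the schema.
conn.execute("INSERT INTO Students VALUES ('s1', 'Asha', 'asha@cs', 20, 8.9)")
schema_after = read_schema()
row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
```

The schema lists the relation's field names and types; inserting a record leaves that description untouched.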
Other Data Models
• Relational data model: IBM’s DB2, Informix, Oracle,
Sybase, Microsoft’s Access, FoxBase, Paradox,
Tandem, and Teradata.
• Hierarchical model: IBM’s IMS DBMS
• Network model: IDS and IDMS
• Object-oriented model: Objectstore and Versant
• Object-relational model: Used in the DBMS products
from IBM, Informix, ObjectStore, Oracle, Versant, and
others
Database Schemas:
• The overall design of the database is called the
database schema.
In other words:
• The description of a database is called the
database schema. Includes descriptions of the
database structure, data types, and the
constraints on the database.
Example of a Database Schema:
Source: Ramez Elmasri and Shamkant B. Navathe
Three Schema Architecture:
• Goal is to separate “user application” and
“physical database”.
• Useful in explaining database system
organization
• The three-schema architecture defines DBMS
schemas at three levels:
1. Internal schema
2. Conceptual schema
3. External schema
Conceptual Schema
• Also called the logical schema
• Describes the stored data in terms of data
model of the DBMS
• In relational DBMS, it describes all relations
that are stored in the database
• In our university database, these relations
contain information about entities, such
as students and courses, and about
relationships, such as prerequisites for courses
Internal Schema
• Also called the physical schema.
• It summarizes how the relations described in
the conceptual schema are actually stored on
secondary devices such as disks.
• Auxiliary data structures called indexes are
created to speed up data retrieval operations
• Decisions about the physical schema are
based on an understanding of how the data is
typically accessed.
External Schema
• Allows data access to be customized (and
authorized) at the level of individual users or
groups of users.
• It consists of a collection of one or more views
and relations from the conceptual schema.
Three-Schema Architecture:
• Mappings among schema levels are needed to
transform requests and data.
– Programs refer to an external schema, and are
mapped by the DBMS to the internal schema for
execution.
– Data extracted from the internal DBMS level is
reformatted to match the user’s external view
(e.g. formatting the results of an SQL query for
display in a Web page)
Data Independence:
• Data independence is the capacity to change
the schema at one level of a database system
without any change in the schema at the next
higher level. Only the mapping between the
two levels is changed.
• Two types of Data Independence:
1. Logical Data Independence
2. Physical Data Independence
1. Logical Data Independence:
• It is the capacity to change conceptual
schema without having any change in
external schema.
• We may change conceptual schema to
expand the database, to change the
constraints, or to reduce the database.
• After the conceptual schema undergoes a logical
reorganization, application programs that
refer to the external schema must work as
before.
2. Physical Data Independence:
• It is the capacity to change the internal
schema without having to change the
conceptual schema. Hence the external
schema need not be changed either.
• Change to the internal schema may be needed
because some physical files had to be
reorganized.
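Both kinds of data independence can be observed with Python's standard `sqlite3` module. This is a rough sketch: the `student` table, the `student_names` view (playing the role of an external schema), the added `email` column, and the index name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid TEXT, name TEXT);
    INSERT INTO student VALUES ('s1', 'Asha'), ('s2', 'Ravi');
    CREATE VIEW student_names AS SELECT sid, name FROM student;
""")

# Applications are assumed to use only the view, never the base table.
EXTERNAL_QUERY = "SELECT sid, name FROM student_names WHERE sid = 's2'"

def run_external_query():
    return conn.execute(EXTERNAL_QUERY).fetchall()

def query_plan():
    rows = conn.execute("EXPLAIN QUERY PLAN " + EXTERNAL_QUERY).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column describes each step

baseline = run_external_query()
plan_before = query_plan()

# Logical change: the conceptual schema gains an attribute.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical_change = run_external_query()

# Physical change: an index changes how matching rows are located.
conn.execute("CREATE INDEX idx_student_sid ON student (sid)")
after_physical_change = run_external_query()
plan_after = query_plan()
```

The external query returns the same rows after both changes. Only the access plan reported by SQLite differs once the index exists.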
Database Languages:
1. Data-definition Language (DDL)
2. Data-manipulation Language (DML)
a. Procedural DMLs
b. Declarative DMLs (Non-Procedural DMLs)
1. Data-definition Language (DDL):
• DDL is a set of definitions which specify a
database schema.
• DDL is used by DBA and database designers to
define all schemas.
E.g.: create table ACCOUNT (name char(10),
balance integer)
Executing this DDL statement creates the ACCOUNT table and
updates a special set of tables called the data dictionary or data
directory.
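The effect of the ACCOUNT statement on the data dictionary can be seen in SQLite, used here through Python's standard `sqlite3` module as a stand-in for a full DBMS. SQLite keeps its catalog in a built-in table named `sqlite_master`. Other systems use different catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The DDL statement from the slide.
conn.execute("CREATE TABLE ACCOUNT (name CHAR(10), balance INTEGER)")

# Executing it also recorded metadata about the new table in the catalog.
catalog_rows = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'ACCOUNT'"
).fetchall()
object_type, object_name, definition = catalog_rows[0]
```

Here `definition` holds the text of the table definition that SQLite stored, which is an instance of metadata, or data about data.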
2. Data-manipulation Language (DML):
• DML is a language that enables users to access
or manipulate data as organized by the
appropriate data model.
• Two types of DML:
a. Procedural DMLs require a user to specify what
data are needed and how to get those data.
b. Declarative DMLs (Non-Procedural) require a
user to specify what data are needed without
specifying how to get those data.
Data manipulation covers:
• The retrieval of information stored in the
database.
• The insertion of information into the
database.
• The deletion of information from the
database.
• The modification of information stored in the
database.
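The four kinds of access map onto the SQL statements SELECT, INSERT, DELETE, and UPDATE. The sketch below runs them through Python's standard `sqlite3` module against the ACCOUNT table from the DDL example. The account holders and balances are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (name CHAR(10), balance INTEGER)")

# Insertion
conn.execute("INSERT INTO ACCOUNT VALUES ('Asha', 500)")
conn.execute("INSERT INTO ACCOUNT VALUES ('Ravi', 300)")

# Retrieval
after_insert = conn.execute(
    "SELECT name, balance FROM ACCOUNT ORDER BY name"
).fetchall()

# Modification
conn.execute("UPDATE ACCOUNT SET balance = balance + 100 WHERE name = 'Ravi'")

# Deletion
conn.execute("DELETE FROM ACCOUNT WHERE name = 'Asha'")

final_rows = conn.execute("SELECT name, balance FROM ACCOUNT ORDER BY name").fetchall()
```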
Data-manipulation Language (DML)
• DML is a language that enables users to access or
manipulate data as organized by the appropriate data model.
• The types of access are:
– Retrieval of information stored in the database
– Insertion of new information into the database
– Deletion of information from the database
– Modification of information stored in the database
Data-definition Language (DDL)
• Storage structure and access methods used by the database
system are specified by a set of statements in a special type
of DDL called a data storage and definition language.
• The data values stored in the database must satisfy certain
consistency constraints.
Consistency/Integrity Constraints
• Domain Constraints
– A domain/range of possible values must be associated with every
attribute (for example, integer types, character types, date/time
types).
– Declaring an attribute to be of a particular domain acts as a
constraint on the values that it can take.
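A domain constraint can be approximated in SQLite through Python's standard `sqlite3` module. SQLite is flexibly typed, so this sketch adds an explicit `CHECK` clause to restrict the budget attribute to non-negative integers. The `try_insert` helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        building  TEXT,
        budget    INTEGER CHECK (typeof(budget) = 'integer' AND budget >= 0)
    )
""")

def try_insert(row):
    """Return 'accepted' or 'rejected' depending on whether the row fits the domain."""
    try:
        conn.execute("INSERT INTO department VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

valid_result = try_insert(("Physics", "Watson", 70000))
negative_result = try_insert(("Music", "Packard", -1))
wrong_type_result = try_insert(("History", "Painter", "lots"))
```

Only the first row lies inside the declared domain, so the other two inserts are refused by the DBMS rather than by application code.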
Consistency/Integrity Constraints…
• Referential Integrity
– To ensure that a value that appears in one relation for a given set
of attributes also appears in a certain set of attributes in another
relation.
– For example, the department listed for each course must be one
that actually exists. More precisely, the dept_name value in a
course record must appear in the dept_name attribute of some
record of the department relation.
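The course and department example can be tried with Python's standard `sqlite3` module. This is a sketch with made-up rows. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY);
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT,
        dept_name TEXT REFERENCES department (dept_name)
    );
    INSERT INTO department VALUES ('Comp. Sci.');
""")

# The referenced department exists, so this insert succeeds.
conn.execute("INSERT INTO course VALUES ('CS-101', 'Intro to CS', 'Comp. Sci.')")

# No 'Biology' row exists in department, so this insert violates the constraint.
try:
    conn.execute("INSERT INTO course VALUES ('BIO-101', 'Intro to Biology', 'Biology')")
    violation = None
except sqlite3.IntegrityError as error:
    violation = str(error)

course_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
```

After the run, `violation` holds SQLite's error message and only the valid course has been stored.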
Consistency/Integrity Constraints…
• Assertions
– An assertion is any condition that the database must always
satisfy.
– Domain constraints and referential-integrity constraints are
special forms of assertions.
Consistency/Integrity Constraints…
• Authorization
– Permissions on accessing data in the database
– Read authorization
– Insert authorization
– Update authorization
– Delete authorization
Components of a Database System
1. Storage Manager
– Provides the interface between the low-level data stored in the
database and the application programs and queries submitted to the
system.
– Responsible for storing, retrieving, and updating data in the database.
2. Query Processor
– It allows database users to obtain good performance while being able
to work at the view level without understanding the physical-level
details.
Storage Manager
• The components of storage manager
1. Authorization and integrity manager: tests for the satisfaction of
integrity constraints and checks the authority of users to access data.
2. Transaction manager: ensures that the database remains in a
consistent (correct) state despite system failures, and that concurrent
transaction executions proceed without conflicting.
3. File manager: manages the allocation of space on disk storage and the
data structures used to represent information stored on disk.
4. Buffer manager: responsible for fetching data from disk storage into
main memory and deciding what data to cache in main memory.
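The buffer manager's job of deciding what to keep in main memory can be sketched as a small cache in Python. The least-recently-used (LRU) replacement policy, the `BufferPool` class, and the dictionary standing in for the disk are all assumptions made for illustration. Real systems use more elaborate policies.

```python
from collections import OrderedDict

class BufferPool:
    """Keeps at most `capacity` disk blocks in memory, evicting the least recently used."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # function that fetches a block from disk
        self.frames = OrderedDict()   # block_id -> data, oldest use first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)  # mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used block
        self.disk_reads += 1
        data = self.read_block(block_id)
        self.frames[block_id] = data
        return data

# A dictionary plays the role of disk storage in this sketch.
disk = {1: "block-1", 2: "block-2", 3: "block-3"}
pool = BufferPool(capacity=2, read_block=disk.__getitem__)

pool.get(1)           # miss: read from disk
pool.get(2)           # miss: read from disk
pool.get(1)           # hit: already in memory
pool.get(3)           # miss: evicts block 2, the least recently used
last = pool.get(1)    # hit: block 1 is still cached
```

Five requests cost only three disk reads, because repeated requests for block 1 are served from memory.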
Storage Manager
The storage manager implements several data structures as
part of the physical system implementation:
• Data files, which store the database itself
• Data dictionary, which stores metadata about the structure
of the database
• Indices, which can provide fast access to data items
Query Processor
The query processor components include:
• DDL interpreter
– interprets DDL statements and records the definitions in the data dictionary.
• DML compiler
– translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine
understands.
– performs query optimization
• Query evaluation engine
– executes low-level instructions generated by the DML compiler
Centralized DBMS Architectures
• Combines everything into a single system, including the DBMS
software, hardware, application programs, and user
interface processing software.
• User can still connect through a remote terminal – however,
all processing is done at centralized site.
Centralized DBMS Architectures
Source: Ramez Elmasri and Shamkant B. Navathe
Client Server DBMS Architecture
• Client Module: handles user interaction and provides the
user friendly interfaces such as forms or menu based GUIs
• Server Module: handles data storage, access, search and
other functions.
Three Tier Client-Server Architecture
• Common for Web applications
• Intermediate Layer called Application Server or Web Server:
– Stores the web connectivity software and the business logic part of the application
used to access the corresponding data from the database server
– Acts like a conduit for sending partially processed data between the database server
and the client.
• Three-tier Architecture Can Enhance Security:
– Database server only accessible via middle tier
– Clients cannot directly access database server
– Clients contain user interfaces and Web browsers
– The client is typically a PC or a mobile device connected to the Web
Two-Tier and Three-Tier Client-Server Architecture
Relational Database Concept
• Dr. E. F. Codd proposed the relational model for database systems in
1970.
• It is the basis for the relational database management system (RDBMS)
• The relational model consists of the following:
• Collection of objects or relations
• Set of operators to act on the relations
• Data integrity for accuracy and consistency
Definition of a Relational Database
• A relational database is a collection of relations or two-dimensional
tables.
Structure of Relational Databases
• A relational database consists of a collection of tables, each of which is
assigned a unique name.
The department relation
The instructor relation
The prereq relation
The course relation
Structure of Relational Databases…
• In the relational model
• the term relation is used to refer to a table
• the term tuple is used to refer to a row
• the term attribute refers to a column of a table
• Domain is a set of permitted values for each attribute of a relation
Database Schema and Database Instance
• Database schema is the logical design of the database
• consists of a list of attributes and their corresponding domains
• Database instance is a snapshot of the data in the database at a given
instant in time
department (dept_name, building, budget)
Keys
• Key: A set of attributes whose values uniquely identify a tuple within a
relation
Keys…
• Super Key:
• A set of one or more attributes that, taken collectively, allow us to identify
uniquely a tuple in the relation.
• Example: ID in the instructor table is a super key, name is not
• If K is a superkey, then so is any superset of K.
• For example, the combination of ID and name is a superkey for the relation instructor
• Candidate Key:
• minimal superkeys
• superkeys for which no proper subset is a superkey
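The superkey and candidate-key definitions can be checked against a sample instance in Python. The rows below are hypothetical. Keys are a property of the schema, meaning they must hold for every legal instance, so a check on one instance can only refute a key, never prove it.

```python
from itertools import combinations

# Hypothetical instance of the instructor relation.
instructor = [
    {"ID": "101", "name": "Kim", "dept_name": "Physics"},
    {"ID": "102", "name": "Kim", "dept_name": "Music"},
    {"ID": "103", "name": "Das", "dept_name": "Physics"},
]

def is_superkey(rows, attributes):
    """True if no two rows agree on all of the given attributes."""
    attrs = sorted(attributes)
    projections = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projections)) == len(rows)

def is_candidate_key(rows, attributes):
    """True if the attributes form a superkey and no proper subset does."""
    if not is_superkey(rows, attributes):
        return False
    attrs = sorted(attributes)
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False
    return True

id_is_superkey = is_superkey(instructor, {"ID"})
name_is_superkey = is_superkey(instructor, {"name"})
id_name_is_superkey = is_superkey(instructor, {"ID", "name"})
id_is_candidate = is_candidate_key(instructor, {"ID"})
id_name_is_candidate = is_candidate_key(instructor, {"ID", "name"})
```

As on the slide, ID and the combination of ID and name are both superkeys, but only ID is minimal. The name attribute fails because two instructors share a name.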
Keys…
• Primary key
• Candidate key chosen by the database designer as a principal means of
identifying tuples within a relation
• Foreign Key
• Attribute in one table which is a primary key of another table
• A relation, say r1, may include among its attributes the primary key of another
relation, say r2. This attribute is called a foreign key from r1, referencing r2. The
relation r1 is also called the referencing relation of the foreign key dependency,
and r2 is called the referenced relation of the foreign key.
Structured Query Language
• Acronym: SQL
• Pronounced “Sequel” or “S-Q-L”
• Originally developed by IBM as the SEQUEL language in the 1970s
• Designed to support Edgar Codd’s relational model
• Based on Relational Algebra
Structured Query Language
The SQL language has several parts:
• Data-definition language (DDL): Provides commands for defining relation
schemas, deleting relations, and modifying relation schemas.
• Data-manipulation language (DML): Provides the ability to query
information from the database and to insert tuples into, delete tuples
from, and modify tuples in the database.
• Integrity
• View definition
• Transaction control
• Embedded SQL and Dynamic SQL
• Authorization
Data Definition Language (DDL)
• The SQL DDL allows specification of a set of relations, and information
about each relation, including:
• The schema for each relation.
• The types of values associated with each attribute.
• The integrity constraints.
• The set of indices to be maintained for each relation.
• The security and authorization information for each relation
• The physical storage structure of each relation on disk
Basic Types
• char(n): A fixed-length character string with user-specified length n. The full form, character,
can be used instead.
• varchar(n): A variable-length character string with user-specified maximum length n. The full
form, character varying, is equivalent.
• int: An integer (a finite subset of the integers that is machine dependent). The full form,
integer, is equivalent.
• smallint: A small integer (a machine-dependent subset of the integer type).
• numeric(p, d): A fixed-point number with user-specified precision. The number consists of p
digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus,
numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 nor 0.32 can be stored exactly
in a field of this type.
• real, double precision: Floating-point and double-precision floating-point numbers with
machine-dependent precision.
• float(n): A floating-point number, with precision of at least n digits.
Each type may include a special value called the null value.
Basic Schema Definition
• We define an SQL relation by using the create table command.
• The general form of the create table command is:
Basic Schema Definition: Integrity Constraints
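• primary key (Aj1, Aj2, . . . , Ajm): the attributes Aj1, . . . , Ajm form the primary key of the relation; they must be non-null and unique
• foreign key (Ak1, Ak2, . . . , Akn) references s: the values of these attributes in any tuple must match the primary-key values of some tuple in relation s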
• not null: specifies that the null value is not allowed for that attribute
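The three constraints above can be tried out with a short script. This is a minimal sketch rather than the course's reference schema: it uses Python's built-in sqlite3 module as a stand-in database engine, and the column types, the Biology department row, and the helper name rejected are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
create table department (
    dept_name varchar(20),
    building  varchar(15),
    budget    numeric(12,2),
    primary key (dept_name)
);
create table instructor (
    ID        varchar(5),
    name      varchar(20) not null,
    dept_name varchar(20),
    salary    numeric(8,2),
    primary key (ID),
    foreign key (dept_name) references department (dept_name)
);
""")

# Assumed sample rows: one department and one instructor that refers to it.
conn.execute("insert into department values ('Biology', 'Watson', 90000)")
conn.execute("insert into instructor values ('10211', 'Nidhi', 'Biology', 66000)")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_key = rejected("insert into instructor values ('10211', 'Joy', 'Biology', 50000)")
null_name = rejected("insert into instructor values ('10212', null, 'Biology', 50000)")
bad_dept = rejected("insert into instructor values ('10213', 'Joy', 'Astrology', 50000)")
print(dup_key, null_name, bad_dept)
```

Each of the three inserts breaks one constraint in turn: a repeated primary key, a null in a not null attribute, and a dept_name that no department tuple has.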
The department relation
The course relation
The instructor relation
SQL Statements
• Data Manipulation Language (DML): SELECT, INSERT, UPDATE, DELETE
• Data Definition Language (DDL): CREATE, ALTER, DROP, RENAME, TRUNCATE
• Transaction Control Language (TCL): COMMIT, ROLLBACK, SAVEPOINT
• Data Control Language (DCL): GRANT, REVOKE
About SQL statements
• SQL statements are not case sensitive
• SQL statements can be on one or more lines
• Keywords cannot be abbreviated or split across lines
• Clauses are usually placed on separate lines
• Tabs and indents are used to enhance readability
Insert Statement
• To load data into the relation
insert into instructor
values (10211, 'Nidhi', 'Biology', 66000);
• The values are specified in the order in which the corresponding
attributes are listed in the relation schema
• More about insert statement later
Retrieving Data Using SELECT Statement
• The essential capabilities of SELECT statement are selection, projection,
and joining.
• Project Operation: Displaying specific columns
• Select Operation: Displaying specific rows
• Join Operation: Combines data from two or more tables based on one or
more common columns.
Retrieving Data Using SELECT Statement
• Syntax
SQL> SELECT {*|[DISTINCT] column | expression}
FROM table
[WHERE condition(s)]
• SELECT identifies which columns
• FROM identifies which table
• WHERE identifies which rows
Selecting All Columns
• SQL> SELECT * FROM instructor;
Selecting Specific Columns
• SQL> SELECT name FROM instructor;
Selecting Specific Columns
• SQL> SELECT name, dept_name FROM instructor;
Selecting Specific Columns
select dept_name
from instructor;
Eliminating Duplicate Rows
• Eliminate duplicate rows by using the DISTINCT keyword in the SELECT
clause.
select distinct dept_name
from instructor;
dept_name
Comp. Sci.
Finance
Music
Physics
History
Biology
Elec. Eng.
Arithmetic Expressions
• Create expressions on NUMBER and DATE data by using arithmetic
operators.
Operator Description
+ Add
- Subtract
* Multiply
/ Divide
Arithmetic Expressions: Operator Precedence
* / + -
• Multiplication and division take priority over addition and subtraction.
• Operators of the same priority are evaluated from left to right.
• Parentheses are used to force prioritized evaluation and to clarify
statements.
Arithmetic Expressions
select name, dept_name, salary, salary * 1.1
from instructor;
name dept_name salary salary * 1.1
Srinivasan Comp. Sci. 65000 71500
Wu Finance 90000 99000
Mozart Music 40000 44000
Einstein Physics 95000 104500
El Said History 60000 66000
Gold Physics 87000 95700
Katz Comp. Sci. 75000 82500
Califieri History 62000 68200
Singh Finance 80000 88000
Crick Biology 72000 79200
Brandt Comp. Sci. 92000 101200
Kim Elec. Eng. 80000 88000
• This shows what would result if we gave a 10% raise to each instructor
• Note: It does not result in any change to the instructor relation.
Null Value
• If a row lacks the data value for a particular column, that value is said to
be null, or to contain null.
• A null value is a value that is unavailable, unassigned, unknown, or
inapplicable.
• A null value is not the same as zero or a space. Zero is a number, and a
space is a character.
• Columns of any datatype can contain null values, unless the column was
defined as NOT NULL or as PRIMARY KEY when the column was created.
Defining a Null Value
insert into instructor
values ('111011', 'Joy', 'Comp. Sci.', NULL);
select name, dept_name, salary
from instructor;
name dept_name salary
Srinivasan Comp. Sci. 65000
Wu Finance 90000
Mozart Music 40000
Einstein Physics 95000
El Said History 60000
Gold Physics 87000
Katz Comp. Sci. 75000
Califieri History 62000
Singh Finance 80000
Crick Biology 72000
Brandt Comp. Sci. 92000
Kim Elec. Eng. 80000
Joy Comp. Sci. NULL
Null Values in Arithmetic Expressions
• Arithmetic expressions containing a null value evaluate to null
select name, dept_name, salary, 10000 + salary
from instructor;
name dept_name salary 10000 + salary
Srinivasan Comp. Sci. 65000 75000
Wu Finance 90000 100000
Mozart Music 40000 50000
Einstein Physics 95000 105000
El Said History 60000 70000
Gold Physics 87000 97000
Katz Comp. Sci. 75000 85000
Califieri History 62000 72000
Singh Finance 80000 90000
Crick Biology 72000 82000
Brandt Comp. Sci. 92000 102000
Kim Elec. Eng. 80000 90000
Joy Comp. Sci. NULL NULL
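The null-propagation rule can be checked with a short script. This is a sketch using Python's built-in sqlite3 module; the two-row table and the ID given to Srinivasan are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary numeric)")
conn.executemany(
    "insert into instructor values (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000),
     ("111011", "Joy", "Comp. Sci.", None)],  # Python None is stored as SQL NULL
)

# Adding a constant to a null salary yields null, not 10000.
rows = conn.execute(
    "select name, salary, 10000 + salary from instructor order by name"
).fetchall()
print(rows)
```

The row for Joy comes back with None in both the salary column and the computed column.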
Limiting Rows Using a Selection
DEPT_NO DEPT_NAME LOCATION
10 CSE JAIPUR
20 ECE JODHPUR
30 CCE UDAIPUR
40 MME JAISALMER
50 CCE UDAIPUR
60 MME JAISALMER
Displaying information about the CCE department only:
DEPT_NO DEPT_NAME LOCATION
30 CCE UDAIPUR
50 CCE UDAIPUR
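The selection shown above corresponds to a WHERE clause on DEPT_NAME. A minimal sketch, assuming the six-row dept table from this slide and using Python's built-in sqlite3 module as the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dept (DEPT_NO integer, DEPT_NAME text, LOCATION text)")
conn.executemany("insert into dept values (?, ?, ?)", [
    (10, "CSE", "JAIPUR"), (20, "ECE", "JODHPUR"), (30, "CCE", "UDAIPUR"),
    (40, "MME", "JAISALMER"), (50, "CCE", "UDAIPUR"), (60, "MME", "JAISALMER"),
])

# WHERE keeps only the rows for which the condition is true.
cce = conn.execute(
    "select DEPT_NO, DEPT_NAME, LOCATION from dept "
    "where DEPT_NAME = 'CCE' order by DEPT_NO"
).fetchall()
print(cce)
```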
Using the BETWEEN Operator
• Use the BETWEEN condition to display rows based on a range of values.
SQL> SELECT DEPT_NO, DEPT_NAME, LOCATION
FROM dept
WHERE DEPT_NO BETWEEN 20 AND 50;
DEPT_NO DEPT_NAME LOCATION
20 ECE JODHPUR
30 CCE UDAIPUR
40 MME JAISALMER
50 CCE UDAIPUR
Both lower limit and upper limit values will be included
Using the IN Operator
• Use the IN operator to test for values in a list
SQL> SELECT DEPT_NO, DEPT_NAME, LOCATION
FROM dept
WHERE DEPT_NO IN (20, 40, 50);
DEPT_NO DEPT_NAME LOCATION
20 ECE JODHPUR
40 MME JAISALMER
50 CCE UDAIPUR
Using the LIKE Operator
• Use the LIKE condition to perform wildcard searches of valid search
string values.
• Search conditions can contain either literal characters or numbers:
• % denotes zero or many characters.
• _ denotes one character.
Using the LIKE Operator
SQL> SELECT LOCATION
FROM dept
WHERE LOCATION LIKE 'J%';
• The first character must be J; any substring may follow.
LOCATION
JAIPUR
JODHPUR
JAISALMER
Using the LIKE Operator
• You can combine pattern-matching characters
SQL> SELECT LOCATION
FROM dept
WHERE LOCATION LIKE '_A%';
• The first character can be anything, the second must be A, and any substring may follow.
LOCATION
JAIPUR
JAISALMER
Using the LIKE Operator
• Find all locations where the last character is ‘R’
SQL> SELECT LOCATION
FROM dept
WHERE LOCATION LIKE '%R';
LOCATION
JAIPUR
JODHPUR
UDAIPUR
JAISALMER
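The three patterns can be run side by side. This sketch uses Python's built-in sqlite3 module and assumes a dept table holding only departments 10 to 40, which is the table size that matches the outputs on these slides; the helper name locations is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dept (DEPT_NO integer, DEPT_NAME text, LOCATION text)")
conn.executemany("insert into dept values (?, ?, ?)", [
    (10, "CSE", "JAIPUR"), (20, "ECE", "JODHPUR"),
    (30, "CCE", "UDAIPUR"), (40, "MME", "JAISALMER"),
])

def locations(pattern):
    """Return the LOCATION values matching a LIKE pattern, in DEPT_NO order."""
    rows = conn.execute(
        "select LOCATION from dept where LOCATION like ? order by DEPT_NO",
        (pattern,),
    )
    return [row[0] for row in rows]

starts_with_j = locations("J%")   # J followed by anything
second_is_a = locations("_A%")    # any one character, then A, then anything
ends_with_r = locations("%R")     # anything, ending in R
print(starts_with_j, second_is_a, ends_with_r)
```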
Using the IS NULL Operator
• Test for null values with the IS NULL operator.
SQL> SELECT DEPT_NO, DEPT_NAME
FROM DEPT
WHERE LOCATION IS NULL;
• This displays DEPT_NO and DEPT_NAME for every row in which the value of LOCATION is NULL.
Logical Operators
Operator Meaning
AND Returns TRUE if both component conditions are true
OR Returns TRUE if either component condition is true
NOT Returns TRUE if the following condition is false
Using the AND Operator
• AND requires both conditions to be TRUE.
SQL> SELECT DEPT_NO, DEPT_NAME, LOCATION
FROM dept
WHERE DEPT_NO >= 10
AND DEPT_NAME = 'CSE';
DEPT_NO DEPT_NAME LOCATION
10 CSE JAIPUR
Using the OR Operator
• OR requires either condition to be TRUE.
SQL> SELECT DEPT_NO, DEPT_NAME, LOCATION
FROM dept
WHERE DEPT_NO > 20
OR DEPT_NAME = 'CSE';
DEPT_NO DEPT_NAME LOCATION
10 CSE JAIPUR
30 CCE UDAIPUR
40 MME JAISALMER
50 CCE UDAIPUR
60 MME JAISALMER
Using the NOT Operator
SQL> SELECT DEPT_NO, DEPT_NAME, LOCATION
FROM dept
WHERE LOCATION NOT IN ('JAISALMER', 'JAIPUR');
DEPT_NO DEPT_NAME LOCATION
20 ECE JODHPUR
30 CCE UDAIPUR
50 CCE UDAIPUR
Rules of Precedence
Order Evaluated Operator
1 All comparison operators
2 NOT
3 AND
4 OR
• Override the rules of precedence by using parentheses.
Rules of Precedence
SQL> SELECT DEPT_NO, DEPT_NAME, LOCATION
FROM dept
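WHERE DEPT_NAME = 'MME'
OR DEPT_NAME = 'CCE'
AND DEPT_NO > 10;
• AND is evaluated before OR, so the condition reads: MME, or (CCE and DEPT_NO > 10).
DEPT_NO DEPT_NAME LOCATION
30 CCE UDAIPUR
40 MME JAISALMER
50 CCE UDAIPUR
60 MME JAISALMER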
Rules of Precedence
• Use parentheses to force priority
SQL> SELECT DEPT_NO, DEPT_NAME, LOCATION
FROM dept
WHERE (DEPT_NAME = 'MME'
OR DEPT_NAME = 'CCE')
AND DEPT_NO > 10;
• The parenthesized condition is evaluated first; its result is then combined with the outer AND.
DEPT_NO DEPT_NAME LOCATION
30 CCE UDAIPUR
40 MME JAISALMER
50 CCE UDAIPUR
60 MME JAISALMER
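With the threshold DEPT_NO > 10, every MME and CCE row qualifies either way, so the two queries above return the same rows. The sketch below raises the threshold to 40 (an assumption made only to expose the difference) and runs both forms on the six-row dept table, using Python's built-in sqlite3 module; the helper name dept_nos is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dept (DEPT_NO integer, DEPT_NAME text, LOCATION text)")
conn.executemany("insert into dept values (?, ?, ?)", [
    (10, "CSE", "JAIPUR"), (20, "ECE", "JODHPUR"), (30, "CCE", "UDAIPUR"),
    (40, "MME", "JAISALMER"), (50, "CCE", "UDAIPUR"), (60, "MME", "JAISALMER"),
])

def dept_nos(condition):
    """Return the DEPT_NO values of the rows satisfying the WHERE condition."""
    sql = f"select DEPT_NO from dept where {condition} order by DEPT_NO"
    return [row[0] for row in conn.execute(sql)]

# Without parentheses, AND binds first: MME, or (CCE and DEPT_NO > 40).
no_parens = dept_nos("DEPT_NAME = 'MME' or DEPT_NAME = 'CCE' and DEPT_NO > 40")
# With parentheses: (MME or CCE), and DEPT_NO > 40.
with_parens = dept_nos("(DEPT_NAME = 'MME' or DEPT_NAME = 'CCE') and DEPT_NO > 40")
print(no_parens)    # [40, 50, 60]
print(with_parens)  # [50, 60]
```

Department 40 (MME) appears only in the first result, because there the DEPT_NO test applies to the CCE rows alone.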
Queries on Multiple Relations
The Customers relation
CID Name Age Address
1 Ramesh 32 Ahmedabad
2 Hardik 27 Bhopal
3 Jaya 18 Udaipur
4 Riya 40 Meerut
The Orders relation
OID Date CID Amount
100 2009-10-08 3 3000
102 2009-10-08 3 1500
106 2009-11-20 2 1560
108 2008-05-20 4 2060
SQL> SELECT Customers.CID, Name, Amount, Date
FROM Customers, Orders;
CID Name Amount Date
1 Ramesh 3000 2009-10-08
1 Ramesh 1500 2009-10-08
1 Ramesh 1560 2009-11-20
1 Ramesh 2060 2008-05-20
2 Hardik 3000 2009-10-08
2 Hardik 1500 2009-10-08
2 Hardik 1560 2009-11-20
2 Hardik 2060 2008-05-20
3 Jaya 3000 2009-10-08
3 Jaya 1500 2009-10-08
3 Jaya 1560 2009-11-20
3 Jaya 2060 2008-05-20
4 Riya 3000 2009-10-08
4 Riya 1500 2009-10-08
4 Riya 1560 2009-11-20
4 Riya 2060 2008-05-20
Restricting Rows with WHERE Clause
SQL> SELECT Customers.CID, Name, Amount, Date
FROM Customers, Orders
WHERE Customers.CID=Orders.CID;
CID Name Amount Date
2 Hardik 1560 2009-11-20
3 Jaya 3000 2009-10-08
3 Jaya 1500 2009-10-08
4 Riya 2060 2008-05-20
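The difference between the plain Cartesian product and the restricted query can be reproduced with the Customers and Orders data above. A minimal sketch using Python's built-in sqlite3 module; the column types and the order by clause are assumptions added to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Customers (CID integer, Name text, Age integer, Address text)")
conn.execute("create table Orders (OID integer, Date text, CID integer, Amount integer)")
conn.executemany("insert into Customers values (?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad"), (2, "Hardik", 27, "Bhopal"),
    (3, "Jaya", 18, "Udaipur"), (4, "Riya", 40, "Meerut"),
])
conn.executemany("insert into Orders values (?, ?, ?, ?)", [
    (100, "2009-10-08", 3, 3000), (102, "2009-10-08", 3, 1500),
    (106, "2009-11-20", 2, 1560), (108, "2008-05-20", 4, 2060),
])

# No WHERE clause: every customer is paired with every order (4 x 4 rows).
product_size = conn.execute("select count(*) from Customers, Orders").fetchone()[0]

# The WHERE clause keeps only the pairs that belong together.
joined = conn.execute("""
    select Customers.CID, Name, Amount, Date
    from Customers, Orders
    where Customers.CID = Orders.CID
    order by Customers.CID, Amount desc
""").fetchall()
print(product_size)
print(joined)
```

Ramesh has no orders, so he appears in the product but not in the restricted result.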
The department relation
The instructor relation
• Retrieve the names of all instructors, along with their department names
and department building names.
SELECT name, instructor.dept_name, building
FROM instructor, department
WHERE instructor.dept_name= department.dept_name;
Creating Alias (Renaming Operation)
• Using AS clause
• old-name as new-name
• The as clause can appear in both the select and from clauses
SELECT name as instructor_name, I.dept_name, building
FROM instructor as I, department as D
WHERE I.dept_name= D.dept_name;
• An identifier such as I or D that is used to rename a relation is called
a correlation name in the SQL standard. It is also referred to as a table
alias, a correlation variable, or a tuple variable.
Compare Tuples in the Same Relation
• Find the names of all instructors whose salary is greater than at least
one instructor in the Biology department.
SELECT distinct T.name
FROM instructor as T, instructor as S
WHERE T.salary > S.salary and S.dept_name = 'Biology';
Ordering the Display of Tuples
• Sort rows with the ORDER BY clause
• ASC: ascending order, default
• DESC: descending order
• The ORDER BY clause comes last in the SELECT statement.
SELECT name
FROM instructor
WHERE dept_name = 'Physics'
ORDER BY name DESC;
Ordering the Display of Tuples…
• List the entire instructor relation in descending order of salary. If several
instructors have the same salary, we order them in ascending order by
name
select *
from instructor
order by salary desc, name asc;
The section relation
The section relation
• Find the set of all courses taught in the Fall 2009 semester
select course_id
from section
where semester = 'Fall' and year = 2009;
The section relation
• Find the set of all courses taught in the Spring 2010 semester
select course_id
from section
where semester = 'Spring' and year = 2010;
Set Operations
• Union
• Intersect
• Except
The Union Operation
• To find the set of all courses taught either in Fall 2009 or in Spring 2010,
or both
(select course_id
from section
where semester = 'Fall' and year = 2009)
union
(select course_id
from section
where semester = 'Spring' and year = 2010);
The Union Operation with Duplicates
(select course_id
from section
where semester = 'Fall' and year = 2009)
union all
(select course_id
from section
where semester = 'Spring' and year = 2010);
• Let c1 be the result of the Fall 2009 query and c2 the result of the Spring 2010 query. The number
of duplicate tuples in the result equals the total number of duplicates that appear in c1 and c2 together.
• Example: If 4 sections of ECE-101 were taught in the Fall 2009 semester and 2
sections of ECE-101 were taught in the Spring 2010 semester, then there would be 6 tuples with ECE-101
in the result.
The Intersect Operation
• To find the set of all courses taught in the Fall 2009 as well as in Spring
2010
(select course_id
from section
where semester = 'Fall' and year = 2009)
intersect
(select course_id
from section
where semester = 'Spring' and year = 2010);
The Intersect Operation with Duplicates
(select course_id
from section
where semester = 'Fall' and year = 2009)
intersect all
(select course_id
from section
where semester = 'Spring' and year = 2010);
• Let c1 be the result of the Fall 2009 query and c2 the result of the Spring 2010 query. The number
of duplicate tuples in the result equals the minimum of the number of duplicates in c1 and in c2.
• Example: If 4 sections of ECE-101 were taught in the Fall 2009 semester
and 2 sections of ECE-101 were taught in the Spring 2010 semester, then there would be 2
tuples with ECE-101 in the result.
The Except Operation
• To find the set of all courses taught in the Fall 2009 but not in Spring
2010
(select course_id
from section
where semester = 'Fall' and year = 2009)
except
(select course_id
from section
where semester = 'Spring' and year = 2010);
The Except Operation with Duplicates
(select course_id
from section
where semester = 'Fall' and year = 2009)
except all
(select course_id
from section
where semester = 'Spring' and year = 2010);
• Let c1 be the result of the Fall 2009 query and c2 the result of the Spring 2010 query. The number
of duplicate copies of a tuple in the result equals the number of copies in c1 minus the number of
copies in c2, provided the difference is positive.
• Example: If 4 sections of ECE-101 were taught in the Fall 2009 semester and 2
sections of ECE-101 were taught in the Spring 2010 semester, then there are 2 tuples with ECE-101 in the
result. If, however, there were two or fewer sections of ECE-101 in the Fall 2009 semester and
two sections of ECE-101 in the Spring 2010 semester, there is no tuple with ECE-101 in the result.
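The set operations can be compared on a small section relation. This is a sketch using Python's built-in sqlite3 module; the seven rows and the helper name run are assumptions made for illustration. SQLite does not accept parentheses around the operand queries and does not implement intersect all or except all, so only union, union all, intersect, and except are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id text, semester text, year integer)")
conn.executemany("insert into section values (?, ?, ?)", [
    ("CS-101", "Fall", 2009), ("CS-347", "Fall", 2009), ("PHY-101", "Fall", 2009),
    ("CS-101", "Spring", 2010), ("CS-315", "Spring", 2010),
    ("CS-319", "Spring", 2010), ("CS-319", "Spring", 2010),  # two sections of CS-319
])

fall = "select course_id from section where semester = 'Fall' and year = 2009"
spring = "select course_id from section where semester = 'Spring' and year = 2010"

def run(operator):
    """Combine the two queries with a set operator; return the sorted course IDs."""
    return sorted(row[0] for row in conn.execute(f"{fall} {operator} {spring}"))

union = run("union")            # duplicates removed
union_all = run("union all")    # duplicates kept
intersect = run("intersect")
except_ = run("except")
print(union, union_all, intersect, except_, sep="\n")
```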
Aggregate Functions
• Aggregate functions are functions that take a collection (a set or multiset) of
values as input and return a single value.
• SQL offers five built-in aggregate functions:
• Average: avg
• Minimum: min
• Maximum: max
• Total: sum
• Count: count
• The input to sum and avg must be a collection of numbers, but the other
operators can operate on collections of nonnumeric data types, such as
strings, as well.
Basic Aggregation
• Consider the query “Find the average salary of instructors in the
Computer Science department.” We write this query as
select avg (salary) as avg_salary
from instructor
where dept_name = 'Comp. Sci.';
• The result of this query is a relation with a single attribute, containing a
single tuple with a numerical value corresponding to the average salary
of instructors in the Computer Science department.
• Find the total number of instructors who teach a course in the Spring
2010 semester
select count (distinct ID)
from teaches
where semester = 'Spring' and year = 2010;
• Count the number of tuples in the teaches relation
select count (*)
from teaches;
• SQL does not allow the use of distinct with count (*)
The teaches relation
Aggregation with Grouping: Group By Clause
• Apply the aggregate function to groups of tuples by using the group by
clause
• The attribute or attributes given in the group by clause are used to form
groups
• Tuples with the same value on all attributes in the group by clause are
placed in one group.
Aggregation with Grouping: Group By Clause
• Find the average salary in each department
select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name;
Aggregation with Grouping: Group By Clause
• When an SQL query uses grouping, the only attributes that may appear in
the select clause without being aggregated are those present in the
group by clause. In the query below, ID is neither aggregated nor grouped.
/* erroneous query */
select dept_name, ID, avg (salary) as avg_salary
from instructor
group by dept_name;
The Having Clause
• Used to state a condition that applies to groups rather than to tuples.
• SQL applies predicates in the having clause after groups have been formed, so
aggregate functions may be used
select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name
having avg (salary) > 42000;
• As was the case for the select clause, any attribute that is present in the having
clause without being aggregated must appear in the group by clause, otherwise the
query is treated as erroneous.
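Grouping and the having clause can be checked against the instructor data listed earlier in these slides. A minimal sketch using Python's built-in sqlite3 module; the table is reduced to the three columns the queries need, which is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, dept_name text, salary numeric)")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Srinivasan", "Comp. Sci.", 65000), ("Wu", "Finance", 90000),
    ("Mozart", "Music", 40000), ("Einstein", "Physics", 95000),
    ("El Said", "History", 60000), ("Gold", "Physics", 87000),
    ("Katz", "Comp. Sci.", 75000), ("Califieri", "History", 62000),
    ("Singh", "Finance", 80000), ("Crick", "Biology", 72000),
    ("Brandt", "Comp. Sci.", 92000), ("Kim", "Elec. Eng.", 80000),
])

# One output tuple per department: the tuples sharing a dept_name form a group.
avg_by_dept = dict(conn.execute(
    "select dept_name, avg(salary) from instructor group by dept_name"
))

# having filters whole groups after they are formed.
above_42000 = [row[0] for row in conn.execute("""
    select dept_name, avg(salary) as avg_salary
    from instructor
    group by dept_name
    having avg(salary) > 42000
    order by dept_name
""")]
print(avg_by_dept)
print(above_42000)
```

Music is the only department whose average (40000) falls below the threshold, so it is the only group removed by having. Note that SQLite happens to accept the erroneous query shown earlier and picks an arbitrary ID per group; stricter engines reject it.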
Modification of the Database
• Deletion
• Insertion
• Updates
Modification of the Database: Deletion
delete from r
where P;
• Here P represents a predicate and r represents a relation. The delete
statement first finds all tuples t in r for which P(t) is true, and then
deletes them from r.
• The where clause can be omitted, in which case all tuples in r are
deleted.
• A delete command operates on only one relation. If we want to delete
tuples from several relations, we must use one delete command for
each relation.
Modification of the Database: Deletion Examples
• Delete all tuples from the instructor relation.
delete from instructor;
• Delete all tuples in the instructor relation pertaining to instructors in the Finance
department.
delete from instructor
where dept_name = 'Finance';
• Delete all instructors with a salary between $13,000 and $15,000.
delete from instructor
where salary between 13000 and 15000;
• Delete all tuples in the instructor relation for those instructors associated with a department
located in the Watson building.
delete from instructor
where dept_name in (select dept_name
from department
where building = 'Watson');
• Delete all tuples in the instructor relation for those instructors whose
salary is less than the average salary of all instructors
delete from instructor
where salary< (select avg (salary)
from instructor);
Modification of the Database: Insertion
Modification of the Database: Updates
• To change a value in a tuple without changing all values in the tuple
• Increase the salaries of all instructors by 5 percent
• If a salary increase is to be paid only to instructors with salary of less
than $70,000, we can write
Modification of the Database: Updates
• Give a 5 percent salary raise to instructors whose salary is less than
average
Modification of the Database: Updates
• All instructors with salary over $100,000 receive a 3 percent raise,
whereas all others receive a 5 percent raise. We could write two update
statements:
Modification of the Database: Updates
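The update statements that these slides describe are not preserved in the text. The sketch below is one plausible reconstruction in standard SQL, run through Python's built-in sqlite3 module; the table layout, the helper name salaries_after, and the two "Hypo" rows are assumptions made for illustration. It also shows why the order of the two update statements matters and how a single case expression avoids the problem.

```python
import sqlite3

def salaries_after(*statements):
    """Run the statements on a fresh instructor table; return {name: salary}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (name text, salary real)")
    conn.executemany("insert into instructor values (?, ?)", [
        ("Mozart", 40000), ("Kim", 80000),
        ("Hypo A", 98000), ("Hypo B", 120000),  # hypothetical rows, not in the slides
    ])
    for statement in statements:
        conn.execute(statement)
    return {name: round(salary, 2)
            for name, salary in conn.execute("select name, salary from instructor")}

raise_all = "update instructor set salary = salary * 1.05"
raise_low = "update instructor set salary = salary * 1.05 where salary < 70000"
raise_high_3 = "update instructor set salary = salary * 1.03 where salary > 100000"
raise_rest_5 = "update instructor set salary = salary * 1.05 where salary <= 100000"
raise_case = """update instructor
                set salary = case
                                 when salary <= 100000 then salary * 1.05
                                 else salary * 1.03
                             end"""

everyone = salaries_after(raise_all)
only_low = salaries_after(raise_low)
right_order = salaries_after(raise_high_3, raise_rest_5)  # 3 percent first
wrong_order = salaries_after(raise_rest_5, raise_high_3)  # 5 percent first
single_case = salaries_after(raise_case)
print(right_order["Hypo A"], wrong_order["Hypo A"])
```

In the wrong order, the 98000 salary is first raised past 100000 and then raised again by 3 percent. The case form evaluates each tuple once, so it gives the same result as the correct ordering.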
Nested Subqueries
• SQL provides a mechanism for nesting subqueries.
• A subquery is a select-from-where expression that is nested within another
query.
• The nesting can be done in the following SQL query
select A1, A2, …., An
from r1, r2, …., rn
where P
as follows:
• Ai can be replaced by a subquery that generates a single value
• ri can be replaced by any valid subquery
• P can be replaced with an expression of the form:
B <operation> (subquery)
where B is an attribute and <operation> is a test defined later
Subqueries in the Where Clause
• A common use of subqueries is to perform tests:
• For set membership
• For set comparisons
• For set cardinality
Set Membership
• Find all the courses taught in both the Fall 2009 and Spring 2010
semesters
• We begin by finding all courses taught in Spring 2010, and we write the
subquery
(select course_id
from section
where semester = 'Spring' and year = 2010)
Nested Subqueries…
• Find all the courses taught in both the Fall 2009 and Spring 2010
semesters
• We then need to find those courses that were taught in Fall 2009
and that appear in the set of courses obtained in the subquery
select distinct course_id
from section
where semester = 'Fall' and year = 2009 and
course_id in (select course_id
from section
where semester = 'Spring' and year = 2010);
Set Membership
• Find all the courses taught in Fall 2009 but not in Spring 2010.
select distinct course_id
from section
where semester = 'Fall' and year = 2009 and
course_id not in (select course_id
from section
where semester = 'Spring' and year = 2010);
Set Comparison: “some” Clause
• Find the names of all instructors whose salary is greater than at least
one instructor in the Biology department
select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Biology';
• The phrase “greater than at least one” is represented in SQL by > some
Set Comparison: “some” Clause
select name
from instructor
where salary > some (select salary
from instructor
where dept_name = 'Biology');
• SQL also allows < some, <= some, >= some, = some, and <> some
comparisons.
• As an exercise, verify that = some is identical to in, whereas <> some is not the
same as not in.
Set Comparison: “all” Clause
• The construct > all corresponds to the phrase “greater than all”.
• Find the names of all instructors that have a salary value greater than that of
each instructor in the Biology department.
select name
from instructor
where salary > all (select salary
from instructor
where dept_name = 'Biology');
• SQL also allows < all, <= all, >= all, = all, and <> all comparisons.
• As an exercise, verify that <> all is identical to not in, whereas = all is not the
same as in.
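The meaning of > some and > all can be checked on a small instructor table. This sketch uses Python's built-in sqlite3 module, which does not implement the some and all comparisons, so it relies on two equivalences that hold when the subquery is non-empty and contains no nulls: > some is the same as "greater than the minimum", and > all is the same as "greater than the maximum". The six-row table (including Nidhi as a second Biology instructor) and the helper name names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, dept_name text, salary numeric)")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Srinivasan", "Comp. Sci.", 65000), ("Wu", "Finance", 90000),
    ("Mozart", "Music", 40000), ("Crick", "Biology", 72000),
    ("Nidhi", "Biology", 66000), ("Katz", "Comp. Sci.", 75000),
])

def names(sql):
    """Run a query that returns one column of names; return them sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# "> some": greater than at least one Biology salary, i.e. greater than the minimum.
greater_than_some = names("""
    select name from instructor
    where salary > (select min(salary) from instructor where dept_name = 'Biology')
""")
# The same question written as a self-join, as on the earlier slide.
self_join = names("""
    select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Biology'
""")
# "> all": greater than every Biology salary, i.e. greater than the maximum.
greater_than_all = names("""
    select name from instructor
    where salary > (select max(salary) from instructor where dept_name = 'Biology')
""")
print(greater_than_some, greater_than_all)
```

Crick (72000) exceeds one Biology salary (Nidhi's 66000) but not every Biology salary, so Crick appears only in the first result.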
Use of “exists” Construct
• The exists construct returns the value true if the argument subquery is
nonempty.
• Another way of specifying “Find all courses taught in both the Fall 2009
semester and in the Spring 2010 semester”
select course_id
from section as S
where semester = 'Fall' and year = 2009 and
exists (select *
from section as T
where semester = 'Spring' and year = 2010 and
S.course_id= T.course_id);
• A subquery that uses a correlation name from an outer query is called a
correlated subquery.
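The correlated exists query can be compared with the earlier in formulation on a small section relation. A minimal sketch using Python's built-in sqlite3 module; the six rows and the helper name course_ids are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id text, semester text, year integer)")
conn.executemany("insert into section values (?, ?, ?)", [
    ("CS-101", "Fall", 2009), ("CS-347", "Fall", 2009), ("PHY-101", "Fall", 2009),
    ("CS-101", "Spring", 2010), ("CS-315", "Spring", 2010), ("CS-319", "Spring", 2010),
])

def course_ids(sql):
    """Run a query that returns one column of course IDs; return them sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# The inner query refers to S from the outer query, so it is re-evaluated per outer tuple.
with_exists = course_ids("""
    select course_id
    from section as S
    where semester = 'Fall' and year = 2009 and
          exists (select *
                  from section as T
                  where semester = 'Spring' and year = 2010 and
                        S.course_id = T.course_id)
""")
with_in = course_ids("""
    select distinct course_id
    from section
    where semester = 'Fall' and year = 2009 and
          course_id in (select course_id
                        from section
                        where semester = 'Spring' and year = 2010)
""")
with_not_exists = course_ids("""
    select course_id
    from section as S
    where semester = 'Fall' and year = 2009 and
          not exists (select *
                      from section as T
                      where semester = 'Spring' and year = 2010 and
                            S.course_id = T.course_id)
""")
print(with_exists, with_in, with_not_exists)
```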
Use of “not exists” Construct
• We can test for the nonexistence of tuples in a subquery by using the
not exists construct
• We can write “relation A contains relation B” as “not exists (B except A).”
Use of “not exists” Construct
• Find all students who have taken all courses offered in the Biology department
select distinct S.ID, S.name
from student as S
where not exists ((select course_id
from course
where dept_name = 'Biology')
except
(select T.course_id
from takes as T
where S.ID = T.ID));
Test for the Absence of Duplicate Tuples
• The unique construct tests whether a subquery has any duplicate tuples in its
result
• The unique construct evaluates to “true” if a given subquery contains no
duplicates
• Find all courses that were offered at most once in 2009
select T.course_id
from course as T
where unique (select R.course_id
from section as R
where T.course_id = R.course_id and
R.year = 2009);
Subqueries in the From Clause
• SQL allows a subquery expression to be used in the from clause.
• The key concept applied here is that any select-from-where expression
returns a relation as a result and, therefore, can be inserted into
another select-from-where anywhere that a relation can appear.
Subqueries in the From Clause
• Find the average instructors’ salaries of those departments where the
average salary is greater than $42,000.
select dept_name, avg_salary
from (select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name)
where avg_salary > 42000;
• The parenthesized subquery produces the relation that the outer query selects from.
Subqueries in the From Clause
• Another way of writing the previous query
select dept_name, avg_salary
from (select dept_name, avg (salary)
from instructor
group by dept_name)
as dept_avg(dept_name, avg_salary)
where avg_salary > 42000;
The with Clause
• The with clause provides a way of defining a temporary relation whose
definition is available only to the query in which the with clause occurs
• Find all departments with the maximum budget
with max_budget (value) as
(select max(budget)
from department)
select dept_name, budget
from department, max_budget
where department.budget = max_budget.value;
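The with clause can be exercised on a small department relation. This is a sketch using Python's built-in sqlite3 module; the three department rows are assumptions made for illustration, and the outer query selects dept_name alongside budget so that the departments themselves are visible in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dept_name text, building text, budget numeric)")
conn.executemany("insert into department values (?, ?, ?)", [
    ("Biology", "Watson", 90000),     # assumed sample rows
    ("Finance", "Painter", 120000),
    ("Physics", "Watson", 70000),
])

# max_budget is a temporary one-row relation visible only inside this query.
richest = conn.execute("""
    with max_budget (value) as
        (select max(budget)
         from department)
    select dept_name, budget
    from department, max_budget
    where department.budget = max_budget.value
""").fetchall()
print(richest)
```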
Complex Queries using with Clause
• Find all departments where the total salary is greater than the average of the
total salary at all departments
with dept_total (dept_name, value) as
(select dept_name, sum(salary)
from instructor
group by dept_name),
dept_total_avg(value) as
(select avg(value)
from dept_total)
select dept_name
from dept_total, dept_total_avg
where dept_total.value > dept_total_avg.value;
Scalar Subquery
• Scalar subquery is one which is used where a single value is expected
• List all departments along with the number of instructors in each
department
select dept_name,
(select count(*)
from instructor
where department.dept_name = instructor.dept_name)
as num_instructors
from department;
• Scalar subqueries can occur in select, where, and having clauses
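A small sketch of the scalar subquery above, assuming Python's built-in sqlite3 module and made-up rows. The Biology department has no instructors in this sample, which shows that the subquery still yields a single value (a count of 0) for every department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dept_name text primary key)")
conn.execute("create table instructor (ID text, name text, dept_name text)")
# Illustrative sample rows (assumed, not taken from the slides).
conn.executemany(
    "insert into department values (?)",
    [("Biology",), ("Comp. Sci.",), ("Music",)],
)
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci."),
        ("45565", "Katz", "Comp. Sci."),
        ("15151", "Mozart", "Music"),
    ],
)

# The subquery is evaluated once per department tuple and returns one value.
rows = conn.execute(
    """
    select dept_name,
           (select count(*)
            from instructor
            where department.dept_name = instructor.dept_name)
           as num_instructors
    from department
    order by dept_name
    """
).fetchall()
print(rows)
```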
Structure of Relational Databases
• A relational database consists of a collection of tables, each of which is
assigned a unique name.
[Figures: the department, instructor, prereq, course, and section relations]
Join Expressions
• Join operations take two relations and return another relation as a result.
Natural Join
[Figures: the instructor relation and the teaches relation]
• For all instructors in the university who have taught some course, find
their names and the course ID of all courses they taught.
select name, course_id
from instructor, teaches
where instructor.ID= teaches.ID;
• This query can be written more concisely using the natural-join
operation in SQL as:
select name, course_id
from instructor natural join teaches;
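The equivalence of the two queries can be checked with a short sketch. It assumes Python's built-in sqlite3 module and made-up rows; ID is the only attribute common to both relations, so it is the attribute the natural join matches on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor (ID text, name text, dept_name text, salary numeric)"
)
conn.execute(
    "create table teaches (ID text, course_id text, sec_id text, semester text, year int)"
)
# Illustrative sample rows (assumed). Gold teaches nothing.
conn.executemany(
    "insert into instructor values (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci.", 65000),
        ("15151", "Mozart", "Music", 40000),
        ("33456", "Gold", "Physics", 87000),
    ],
)
conn.executemany(
    "insert into teaches values (?, ?, ?, ?, ?)",
    [
        ("10101", "CS-101", "1", "Fall", 2009),
        ("10101", "CS-315", "1", "Spring", 2010),
        ("15151", "MU-199", "1", "Spring", 2010),
    ],
)

# Cartesian product plus an explicit equality on the shared attribute.
where_rows = conn.execute(
    """
    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
    order by name, course_id
    """
).fetchall()

# The natural join matches on the common attribute ID automatically.
natural_rows = conn.execute(
    """
    select name, course_id
    from instructor natural join teaches
    order by name, course_id
    """
).fetchall()
print(natural_rows)
```

Gold does not appear in either result, because an instructor with no matching teaches tuple is dropped by the join.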
Natural Join
• A from clause in an SQL query can have multiple relations combined using
natural join, as shown here:
select A1, A2, . . . , An
from r1 natural join r2 natural join . . . natural join rm
where P;
Natural Join
[Figures: the course relation and the prereq relation]
select * from course natural join prereq;
Join with Using Clause
• Form of natural join that only requires values to match on specified
attributes
select * from course join prereq using(course_id);
Join with on Clause
• The on condition allows a general predicate over the relations being
joined, written like a where clause predicate.
• Unlike join ... using, the result keeps the course_id attribute of both relations.
select * from course join prereq on course.course_id = prereq.course_id;
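A sketch comparing the result schemas of the using and on forms, assuming Python's built-in sqlite3 module and made-up course and prereq rows. The attribute names are read from the cursor's description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table course (course_id text, title text, dept_name text, credits numeric)"
)
conn.execute("create table prereq (course_id text, prereq_id text)")
# Illustrative sample rows (assumed, not taken from the slides).
conn.executemany(
    "insert into course values (?, ?, ?, ?)",
    [
        ("BIO-301", "Genetics", "Biology", 4),
        ("CS-190", "Game Design", "Comp. Sci.", 4),
        ("CS-315", "Robotics", "Comp. Sci.", 3),
    ],
)
conn.executemany(
    "insert into prereq values (?, ?)",
    [("BIO-301", "BIO-101"), ("CS-190", "CS-101"), ("CS-347", "CS-101")],
)

using_cur = conn.execute("select * from course join prereq using (course_id)")
using_cols = [d[0] for d in using_cur.description]

on_cur = conn.execute(
    "select * from course join prereq on course.course_id = prereq.course_id"
)
on_cols = [d[0] for d in on_cur.description]

# using lists course_id once; on keeps the copy from each relation.
print(using_cols)
print(on_cols)
```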
Outer Join
• An outer join preserves tuples that would be lost in a join by creating
tuples in the result that contain null values.
• The left outer join preserves tuples only in the relation named before (to the left
of) the left outer join operation.
• The right outer join preserves tuples only in the relation named after (to the
right of) the right outer join operation.
• The full outer join preserves tuples in both relations.
Left Outer Join
select * from course natural left outer join prereq;
Right Outer Join
select * from course natural right outer join prereq;
Full Outer Join
select * from course natural full outer join prereq;
• In systems without full outer join (MySQL, for example), take the union of
the left and right outer joins:
select * from course natural left outer join prereq
union
select * from course natural right outer join prereq;
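The three outer joins can be sketched as follows, assuming Python's built-in sqlite3 module and made-up rows. One deliberate assumption: the right outer join is written as a left outer join with the operands swapped, so that the sketch also runs on older SQLite versions that lack right and full outer join. The full outer join is then the union of the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table course (course_id text, title text, dept_name text, credits numeric)"
)
conn.execute("create table prereq (course_id text, prereq_id text)")
# Illustrative sample rows (assumed): CS-315 has no prerequisite,
# and CS-347 appears in prereq but not in course.
conn.executemany(
    "insert into course values (?, ?, ?, ?)",
    [
        ("BIO-301", "Genetics", "Biology", 4),
        ("CS-190", "Game Design", "Comp. Sci.", 4),
        ("CS-315", "Robotics", "Comp. Sci.", 3),
    ],
)
conn.executemany(
    "insert into prereq values (?, ?)",
    [("BIO-301", "BIO-101"), ("CS-190", "CS-101"), ("CS-347", "CS-101")],
)

cols = "course_id, title, dept_name, credits, prereq_id"
left_sql = f"select {cols} from course natural left outer join prereq"
# Right outer join of course with prereq, emulated by swapping the operands.
right_sql = f"select {cols} from prereq natural left outer join course"

left_rows = conn.execute(left_sql + " order by course_id").fetchall()
right_rows = conn.execute(right_sql + " order by course_id").fetchall()
# union removes the matched tuples that both halves produce.
full_rows = conn.execute(
    f"{left_sql} union {right_sql} order by course_id"
).fetchall()

for row in full_rows:
    print(row)
```

The left outer join keeps CS-315 with a null prereq_id, the right outer join keeps CS-347 with null course attributes, and the full outer join keeps both.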
Stored Procedures
• A stored procedure contains a set of SQL commands that can be stored and called again later.
• Count the total number of instructors in a department
delimiter //
create procedure dept_count_proc(in dept varchar(20),
out d_count int)
begin
select count(*) into d_count
from instructor
where dept_name= dept;
end//
delimiter ;
call dept_count_proc('Biology', @count);
select @count;
• The keywords in and out indicate, respectively, parameters that are expected to have values assigned
to them and parameters whose values are set in the procedure in order to return results.
Triggers
• A trigger is a statement that is executed automatically by the system as a
side effect of a modification to the database
• To design a trigger mechanism, we must:
• specify the conditions under which the trigger is to be executed
• specify the actions to be taken when the trigger executes
• The main purpose is to implement complex integrity constraints that
cannot be expressed with the CREATE TABLE or ALTER TABLE commands.
Trigger Timing
• BEFORE
The trigger is activated before the DML operation on the table occurs.
• AFTER
The trigger is activated after the DML operation on the table occurs.
Trigger Syntax
CREATE [OR REPLACE]
TRIGGER trigger_name
BEFORE (or AFTER)
INSERT OR UPDATE [OF COLUMNS] OR DELETE
ON tablename
[FOR EACH ROW [WHEN (condition)]]
BEGIN
...
END;
Triggers: Example
CREATE TABLE account (acct_num INT, amount NUMERIC(10,2));
CREATE TRIGGER ins_sum
BEFORE
INSERT
ON account
FOR EACH ROW
SET @sum = @sum + NEW.amount;
Triggers: Example
SET @sum = 0;
INSERT INTO account
VALUES(137,14.98),(141,1937.50),(97,-100.00);
SELECT @sum AS 'Total amount inserted';
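The ins_sum example uses MySQL user variables. A rough equivalent can be sketched with Python's built-in sqlite3 module, under two assumptions: the trigger body is written in SQLite's dialect (begin ... end), and a hypothetical one-row table named running_total stands in for @sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table account (acct_num int, amount numeric(10,2));

    -- running_total is a hypothetical one-row table standing in for @sum.
    create table running_total (total numeric(12,2));
    insert into running_total values (0);

    -- Row-level trigger: fires once for every tuple inserted into account.
    create trigger ins_sum
    before insert on account
    for each row
    begin
        update running_total set total = total + new.amount;
    end;

    insert into account values (137, 14.98), (141, 1937.50), (97, -100.00);
    """
)

total = conn.execute("select total from running_total").fetchone()[0]
print(round(total, 2))
```

The trigger fires three times, once per inserted tuple, so the total accumulates 14.98 + 1937.50 - 100.00.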
Triggers: Example
delimiter //
CREATE TRIGGER upd_check
BEFORE
UPDATE
ON account
FOR EACH ROW
BEGIN
IF NEW.amount < 0 THEN
SET NEW.amount = 0;
ELSEIF NEW.amount > 100 THEN
SET NEW.amount = 100;
END IF;
END;//
delimiter ;
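The clamping behaviour of upd_check can be approximated in SQLite, with one loudly stated difference: SQLite triggers cannot assign to NEW, so this sketch swaps in an after update trigger that rewrites the stored value. It assumes Python's built-in sqlite3 module and treats acct_num as the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table account (acct_num int primary key, amount numeric(10,2));
    insert into account values (137, 14.98), (141, 1937.50), (97, -100.00);

    -- Emulation (assumption): correct the row after the update
    -- instead of changing NEW.amount before it.
    create trigger upd_check
    after update of amount on account
    for each row
    when new.amount < 0 or new.amount > 100
    begin
        update account
        set amount = case when new.amount < 0 then 0 else 100 end
        where acct_num = new.acct_num;
    end;

    update account set amount = 250 where acct_num = 137;
    update account set amount = -5 where acct_num = 141;
    update account set amount = 42.5 where acct_num = 97;
    """
)

amounts = dict(conn.execute("select acct_num, amount from account"))
print(amounts)
```

Values outside the range 0 to 100 are clamped to the nearest bound; 42.5 is stored unchanged.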
Managing Triggers
• Enable a trigger
ALTER TRIGGER trigger_name ENABLE;
• Disable a trigger
ALTER TRIGGER trigger_name DISABLE;
• Enable or disable all triggers on a table
ALTER TABLE table_name {ENABLE | DISABLE} ALL TRIGGERS;
• Delete a trigger
DROP TRIGGER trigger_name;
• The ENABLE and DISABLE statements are Oracle syntax; MySQL has no ALTER TRIGGER statement.
Data Definition Language Revisited
• Drop table command: removes a relation from an SQL database
drop table r;
How is it different from delete from r; ?
• Alter table command: adds attributes to or drops attributes from a relation
alter table r add A D;
(A is the name of the attribute to add and D is its domain)
alter table r drop A;
alter table r drop primary key;
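One way to explore the question about drop table versus delete from is a short sketch, assuming Python's built-in sqlite3 module and a made-up relation r. It also shows that existing tuples receive null for an attribute added with alter table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r (ID text primary key, name text)")
# Illustrative sample rows (assumed).
conn.executemany("insert into r values (?, ?)", [("1", "a"), ("2", "b")])

# alter table r add A D: existing tuples get null for the new attribute.
conn.execute("alter table r add column tot_cred numeric")
after_alter = conn.execute("select * from r order by ID").fetchall()

# delete from r: all tuples are removed, but the relation still exists.
conn.execute("delete from r")
count_after_delete = conn.execute("select count(*) from r").fetchone()[0]

# drop table r: the relation and its schema are removed as well.
conn.execute("drop table r")
try:
    conn.execute("select count(*) from r")
    exists_after_drop = True
except sqlite3.OperationalError:
    exists_after_drop = False

print(after_alter, count_after_delete, exists_after_drop)
```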
Default Values
create table student
(ID varchar (5),
name varchar (20) not null,
dept_name varchar (20),
tot_cred numeric (3,0) default 0,
primary key (ID));
insert into student(ID, name, dept_name)
values ('12789', 'Newman', 'Comp. Sci.');
• tot_cred is not given in the insert, so it takes the default value 0.
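A minimal sketch of the default value at work, assuming Python's built-in sqlite3 module. The table definition and the insert are the ones from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table student
        (ID varchar(5),
         name varchar(20) not null,
         dept_name varchar(20),
         tot_cred numeric(3,0) default 0,
         primary key (ID))
    """
)

# tot_cred is omitted, so the declared default is stored for it.
conn.execute(
    "insert into student (ID, name, dept_name) values ('12789', 'Newman', 'Comp. Sci.')"
)
row = conn.execute("select ID, name, dept_name, tot_cred from student").fetchone()
print(row)
```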
Integrity Constraints
• Integrity constraints ensure that changes made to the database by
authorized users do not result in a loss of data consistency
• Entity Integrity: Constraints on a single relation
• An instructor name cannot be null
• No two instructors can have the same instructor ID
• The budget of a department must be greater than $0.00
• Referential Integrity: ensures that a value that appears in one relation for
a given set of attributes also appears for a certain set of attributes in
another relation
• Every department name in the course relation must have a matching department
name in the department relation.
Entity Integrity
• Primary Key
• Unique
• Not null
• Check
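The four kinds of entity-integrity constraints can be exercised with a short sketch, assuming Python's built-in sqlite3 module. The helper name rejected and the sample values are hypothetical, introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table instructor (
        ID varchar(5),
        name varchar(20) not null,
        primary key (ID));
    create table department (
        dept_name varchar(20) primary key,
        budget numeric(12,2) check (budget > 0));
    insert into instructor values ('10101', 'Srinivasan');
    """
)


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Primary key: no two instructors may share an ID.
duplicate_id = rejected("insert into instructor values ('10101', 'Katz')")
# Not null: an instructor name cannot be null.
null_name = rejected("insert into instructor values ('12121', null)")
# Check: the budget of a department must be greater than 0.
zero_budget = rejected("insert into department values ('Biology', 0)")
# A tuple that satisfies every constraint is accepted.
valid_row = rejected("insert into department values ('Biology', 90000)")

print(duplicate_id, null_name, zero_budget, valid_row)
```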
Referential Integrity
• Referential Integrity is the mechanism the system provides to maintain
foreign keys
• Referential Integrity means that each foreign-key value must match, in
both value and data type, a value of the related primary key.
• An unmatched non-null foreign key refers to a non-existent object and
is an error
Referential Integrity Rules (Foreign Key Rules)
• How is referential integrity maintained in a database? Some
operations that may cause a violation …
• Insert of PK values – no problem
• Update of PK values – what happens to matching foreign keys?
• Delete of PK values – what happens to matching foreign keys?
• Insert of FK values – disallowed unless matching primary key exists
• Update of FK values – disallowed unless matching primary key exists
• Delete of FK values (FK Values set to NULL) – no problem as long as NULL
values are allowed in the FK
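The foreign-key rules listed above can be checked with a sketch, assuming Python's built-in sqlite3 module (where foreign-key checking must be switched on with a pragma) and a cut-down department and course schema. The helper name rejected and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on.
conn.execute("pragma foreign_keys = on")
conn.execute(
    "create table department (dept_name varchar(20) primary key, budget numeric)"
)
conn.execute(
    """
    create table course (
        course_id varchar(8) primary key,
        title varchar(50),
        dept_name varchar(20),
        foreign key (dept_name) references department (dept_name))
    """
)
conn.execute("insert into department values ('Biology', 90000)")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Insert of FK value: allowed only if a matching primary key exists.
insert_matching = rejected(
    "insert into course values ('BIO-301', 'Genetics', 'Biology')"
)
insert_unmatched = rejected(
    "insert into course values ('PHY-101', 'Physical Principles', 'Physics')"
)
# A null foreign key is allowed because the attribute is nullable.
insert_null = rejected("insert into course values ('GEN-100', 'Seminar', null)")
# Update of FK value: disallowed unless a matching primary key exists.
update_unmatched = rejected(
    "update course set dept_name = 'Physics' where course_id = 'BIO-301'"
)
# Delete of a referenced PK value: rejected when no cascade action is declared.
delete_referenced = rejected("delete from department where dept_name = 'Biology'")

print(insert_matching, insert_unmatched, insert_null, update_unmatched, delete_referenced)
```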
Foreign Key with Cascade
• If a delete of a tuple in department would violate the referential-integrity
constraint from course to department, and the foreign key is declared with
on delete cascade, the system does not reject the delete. Instead, the delete
“cascades” to the course relation, deleting the tuples that refer to the
department that was deleted.
• If there is a chain of foreign-key dependencies across multiple relations, a deletion or
update at one end of the chain can propagate across the entire chain.
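A sketch of cascading deletes and updates, assuming Python's built-in sqlite3 module, a cut-down department and course schema, and made-up rows. The on delete cascade and on update cascade clauses on the foreign key are what turn a rejected modification into a propagated one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on.
conn.execute("pragma foreign_keys = on")
conn.execute("create table department (dept_name varchar(20) primary key)")
conn.execute(
    """
    create table course (
        course_id varchar(8) primary key,
        dept_name varchar(20),
        foreign key (dept_name) references department (dept_name)
            on delete cascade
            on update cascade)
    """
)
# Illustrative sample rows (assumed, not taken from the slides).
conn.executemany(
    "insert into department values (?)", [("Biology",), ("Comp. Sci.",)]
)
conn.executemany(
    "insert into course values (?, ?)",
    [("BIO-301", "Biology"), ("CS-101", "Comp. Sci."), ("CS-315", "Comp. Sci.")],
)

# The delete is not rejected; it cascades to the Biology course.
conn.execute("delete from department where dept_name = 'Biology'")
# The update of the referenced key cascades to the matching foreign keys.
conn.execute(
    "update department set dept_name = 'Computer Science' "
    "where dept_name = 'Comp. Sci.'"
)

remaining = conn.execute(
    "select course_id, dept_name from course order by course_id"
).fetchall()
print(remaining)
```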
Date and Time Type in SQL
• date: A calendar date containing a (four-digit) year, month, and day of
the month.
• time: The time of day, in hours, minutes, and seconds. A variant, time(p),
can be used to specify the number of fractional digits for seconds (the
default being 0). It is also possible to store time-zone information along
with the time by specifying time with time zone.
• timestamp: A combination of date and time. A variant, timestamp(p),
can be used to specify the number of fractional digits for seconds (the
default here being 6). Time-zone information is also stored if with
time zone is specified.
Date and Time Type in SQL
• Date and time values can be specified like this:
date '2001-04-25'
time '09:30:00'
timestamp '2001-04-25 10:29:01.45'
• We can use an expression of the form cast (e as t) to convert a character
string (or string-valued expression) e to the type t, where t is one of
date, time, or timestamp.
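As a client-side analogue only (an assumption for illustration, not SQL's cast), the same three literals can be converted from character strings to typed values with Python's standard datetime module:

```python
from datetime import date, datetime, time

# Parse the three string literals from the slide into typed values.
d = datetime.strptime("2001-04-25", "%Y-%m-%d").date()
t = datetime.strptime("09:30:00", "%H:%M:%S").time()
# %f reads the fractional seconds; ".45" becomes 450000 microseconds.
ts = datetime.strptime("2001-04-25 10:29:01.45", "%Y-%m-%d %H:%M:%S.%f")

print(d, t, ts)
```

Once converted, the values compare and sort as dates and times, not as character strings, which is the point of casting in SQL as well.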