
Unit 1: Introduction: What Is Data?


What is data?

• “Data is the new oil.”


• Today data is everywhere in every field.
• Whether you are a data scientist, marketer, businessman, data analyst, researcher, or in any other profession, you need to work or experiment with raw or structured data.
• Because this data is so valuable, it must be handled and stored properly, without errors.
• When working with data, it is important to know the type of data in order to process it and get the right results.
• There are two types of data: Qualitative and Quantitative data

Content

• What is data?
• Types of data
• Data classification
• Data lifecycle
• Database-System Applications
• Purpose of Database Systems
• View of Data
• Database Languages
• Database Design
• Database Engine
• Database Architecture
• Introduction to data mining and data warehousing

Types of Data

The data is classified into four categories:
• Nominal data.
• Ordinal data.
• Discrete data.
• Continuous data.
Qualitative or Categorical Data
• Qualitative or Categorical Data is data that can’t be measured or counted in the form of numbers. These types of data are sorted by category, not by number. That’s why it is also known as Categorical Data. These data consist of audio, images, symbols, or text. The gender of a person, i.e., male, female, or others, is qualitative data.
• Qualitative data tells about the perception of people. This data helps market researchers understand the customers’ tastes and then design their ideas and strategies accordingly.
• Other examples of qualitative data are:
• What language do you speak
• Favourite holiday destination
• Opinion on something (agree, disagree, or neutral)
• Colours

Quantitative Data
• Quantitative data can be expressed in numerical values, which makes it countable and suitable for statistical data analysis. These kinds of data are also known as Numerical data. It answers questions like “how much,” “how many,” and “how often.” For example, the price of a phone, the computer’s RAM, the height or weight of a person, etc., fall under quantitative data.
• Quantitative data can be used for statistical manipulation. These data can be represented on a wide variety of graphs and charts, such as bar graphs, histograms, scatter plots, boxplots, pie charts, line graphs, etc.
• Examples of Quantitative Data:
• Height or weight of a person or object
• Room Temperature
• Scores and Marks (Ex: 59, 80, 60, etc.)
• Time
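
As a rough illustration of how the four categories relate to the two broad types, the sketch below tags a few of the examples from the lists above. The names `DataType` and `broad_class` and the individual tags are assumptions made for this example, not part of the slides; grouping nominal and ordinal under qualitative, and discrete and continuous under quantitative, is the usual convention.

```python
from enum import Enum


class DataType(Enum):
    NOMINAL = "nominal"        # categories with no inherent order
    ORDINAL = "ordinal"        # categories with a meaningful order
    DISCRETE = "discrete"      # countable numeric values
    CONTINUOUS = "continuous"  # values measured on a continuous scale


def broad_class(kind: DataType) -> str:
    """Map one of the four categories to its broad type."""
    if kind in (DataType.NOMINAL, DataType.ORDINAL):
        return "qualitative"
    return "quantitative"


# Illustrative tagging of examples from the slides (the tags are assumptions).
examples = {
    "language spoken": DataType.NOMINAL,
    "opinion (agree / neutral / disagree)": DataType.ORDINAL,
    "exam score": DataType.DISCRETE,
    "room temperature": DataType.CONTINUOUS,
}

for name, kind in examples.items():
    print(f"{name}: {kind.value} ({broad_class(kind)})")
```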

Classification of Qualitative Data

Classification of Quantitative Data

Data Lifecycle

Database System Applications
• Enterprise Information
• Sales: customers, products, purchases
• Accounting: payments, receipts, assets
• Human Resources: Information about employees, salaries, payroll taxes.
• Manufacturing: management of production, inventory, orders, supply chain.
• Banking and finance
• Customer information, accounts, loans, and banking transactions.
• Credit card transactions
• Finance: sales and purchases of financial instruments (e.g., stocks and bonds); storing real-time market data
• Universities: registration, grades

Database Systems

• DBMS contains information about a particular enterprise
• Collection of interrelated data
• Set of programs to access the data
• An environment that is both convenient and efficient to use
• Database systems are used to manage collections of data that are:
• Highly valuable
• Relatively large
• Accessed by multiple users and applications, often at the same time.
• A modern database system is a complex software system whose task is to manage a large, complex collection of data.
• Databases touch all aspects of our lives

Database System Applications (Cont.)

• Airlines: reservations, schedules
• Telecommunication: records of calls, texts, and data usage, generating monthly bills, maintaining balances on prepaid calling cards
• Web-based services
• Online retailers: order tracking, customized recommendations
• Online advertisements
• Document databases
• Navigation systems: For maintaining the locations of various places of interest along with the exact routes of roads, train systems, buses, etc.
Purpose of Database Systems
In the early days, database applications were built directly on top of file systems, which led to:
• Data redundancy and inconsistency: data is stored in multiple file formats, resulting in duplication of information in different files
• Difficulty in accessing data
• Need to write a new program to carry out each new task
• Data isolation
• Multiple files and formats
• Integrity problems
• Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
• Hard to add new constraints or change existing ones

University Database Example
• In this text we will be using a university database to illustrate all the concepts
• Data consists of information about:
• Students
• Instructors
• Classes
• Application program examples:
• Add new students, instructors, and courses
• Register students for courses, and generate class rosters
• Assign grades to students, compute grade point averages (GPA) and generate transcripts

Purpose of Database Systems (Cont.)

• Atomicity of updates
• Failures may leave database in an inconsistent state with partial updates carried out
• Example: Transfer of funds from one account to another should either complete or not happen at all
• Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
• Ex: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time
• Security problems
• Hard to provide user access to some, but not all, data

View of Data

• A database system is a collection of interrelated data and a set of programs that allow users to access and modify these data.
• A major purpose of a database system is to provide users with an abstract view of the data.
• Data models
• A collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.
• Data abstraction
• Hide the complexity of data structures to represent data in the database from users through several levels of data abstraction.
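
The atomicity requirement on funds transfers (a transfer should either complete or not happen at all) can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions: the `account` table, the account ids and balances, and the `transfer` helper are invented for illustration, and the `balance > 0` check mirrors the integrity constraint mentioned in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the check constraint states the integrity rule
# explicitly instead of burying it in application code.
conn.execute(
    "create table account (id text primary key, "
    "balance integer check (balance > 0))"
)
conn.executemany("insert into account values (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "update account set balance = balance + ? where id = ?",
                (amount, dst),
            )
            conn.execute(
                "update account set balance = balance - ? where id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("select id, balance from account order by id"))


first_ok = transfer(conn, "A", "B", 50)    # succeeds
after_first = balances(conn)
second_ok = transfer(conn, "A", "B", 500)  # debit violates the check
after_second = balances(conn)
print(first_ok, after_first, second_ok, after_second)
```

In the failed transfer the credit to B has already been applied when the debit from A violates the constraint, so the rollback is what prevents a partial update from surviving.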
Data Models
• A collection of tools for describing
• Data
• Data relationships
• Data semantics
• Data constraints
• Relational model
• Entity-Relationship data model (mainly for database design)
• Object-based data models (Object-oriented and Object-relational)
• Semi-structured data model (XML)
• Other older models:
• Network model
• Hierarchical model

Levels of Abstraction
• Physical level: describes how a record (e.g., instructor) is stored.
• Logical level: describes data stored in database, and the relationships among the data.
    type instructor = record
        ID : string;
        name : string;
        dept_name : string;
        salary : integer;
    end;
• View level: application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes.

Relational Model

• All the data is stored in various tables.
• Example of tabular data in the relational model
[Photo caption: Ted Codd, Turing Award 1981]

View of Data

An architecture for a database system
Instances and Schemas
• Similar to types and variables in programming languages
• Logical Schema – the overall logical structure of the database
• Example: The database consists of information about a set of customers and accounts in a bank and the relationship between them
• Analogous to type information of a variable in a program
• Physical schema – the overall physical structure of the database
• Instance – the actual content of the database at a particular point in time
• Analogous to the value of a variable

Database example for reference

Physical Data Independence

• Physical Data Independence – the ability to modify the physical schema without changing the logical schema
• Applications depend on the logical schema
• In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.

Data Definition Language (DDL)

• Specification notation for defining the database schema
Example: create table instructor (
             ID char(5),
             name varchar(20),
             dept_name varchar(20),
             salary numeric(8,2))
• DDL compiler generates a set of table templates stored in a data dictionary
• Data dictionary contains metadata (i.e., data about data)
• Database schema
• Integrity constraints
• Primary key (ID uniquely identifies instructors)
• Authorization
• Who can access what
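
To see the DDL statement and the data dictionary side by side, the sketch below runs the `create table instructor` example in an in-memory SQLite database. SQLite is an assumption chosen only because it ships with Python; `sqlite_master` and `pragma table_info` are SQLite's own catalog facilities and stand in for the data dictionary. The `primary key` clause is added here to show the integrity constraint from the list above, and the inserted rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table instructor (
        ID        char(5) primary key,
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8,2)
    )
    """
)

# The catalog records the schema: this is metadata, i.e. data about data.
stored = conn.execute(
    "select name, sql from sqlite_master where type = 'table'"
).fetchall()
columns = [row[1] for row in conn.execute("pragma table_info(instructor)")]
print(stored[0][0], columns)

# The primary-key constraint is enforced by the system, not by program code.
conn.execute(
    "insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000)"
)
try:
    conn.execute(
        "insert into instructor values ('10101', 'Duplicate', 'Physics', 1)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate ID rejected:", duplicate_rejected)
```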
Data Manipulation Language (DML)
• Language for accessing and updating the data organized by the appropriate data model
• DML also known as query language
• There are basically two types of data-manipulation language
• Procedural DML -- requires a user to specify what data are needed and how to get those data.
• Declarative DML -- requires a user to specify what data are needed without specifying how to get those data.
• Declarative DMLs are usually easier to learn and use than are procedural DMLs.
• Declarative DMLs are also referred to as non-procedural DMLs
• The portion of a DML that involves information retrieval is called a query language.

Database Access from Application Program
• Non-procedural query languages such as SQL are not as powerful as a universal Turing machine.
• SQL does not support actions such as input from users, output to displays, or communication over the network.
• Such computations and actions must be written in a host language, such as C/C++, Java or Python, with embedded SQL queries that access the data in the database.
• Application programs -- are programs that are used to interact with the database in this fashion.
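
The difference between the procedural and declarative styles can be sketched in Python as the host language. The first function spells out how to find the rows; the second only states what is wanted and leaves the "how" to the database system. The function names and sample rows are assumptions made for this example, and SQLite stands in for the database system.

```python
import sqlite3

# Illustrative instructor rows: (ID, name, dept_name, salary).
rows = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("22222", "Einstein", "Physics", 95000),
    ("45565", "Katz", "Comp. Sci.", 75000),
]


def procedural_names(rows, dept):
    """Procedural style: we say how to scan and filter the records."""
    result = []
    for _id, name, dept_name, _salary in rows:
        if dept_name == dept:
            result.append(name)
    return result


def declarative_names(rows, dept):
    """Declarative style: we say what we want; the system decides how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (ID, name, dept_name, salary)")
    conn.executemany("insert into instructor values (?, ?, ?, ?)", rows)
    cur = conn.execute(
        "select name from instructor where dept_name = ? order by ID", (dept,)
    )
    return [r[0] for r in cur]


print(procedural_names(rows, "Comp. Sci."))
print(declarative_names(rows, "Comp. Sci."))
```

The second function is also a small instance of an application program: input, output, and control flow live in the host language, and only the query is SQL.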

SQL Query Language

• SQL query language is nonprocedural. A query takes as input several tables (possibly only one) and always returns a single table.
• Example to find all instructors in Comp. Sci. dept
    select name
    from instructor
    where dept_name = 'Comp. Sci.'
• SQL is NOT a Turing machine equivalent language
• To be able to compute complex functions SQL is usually embedded in some higher-level language
• Application programs generally access databases through one of
• Language extensions to allow embedded SQL
• Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database

Database Design

The process of designing the general structure of the database:
• Logical Design – Deciding on the database schema. Database design requires that we find a “good” collection of relation schemas.
• Business decision – What attributes should we record in the database?
• Computer Science decision – What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
• Physical Design – Deciding on the physical layout of the database
Normalization
• Another method for designing a relational database is to use a process commonly known as normalization
• Problems associated with bad database design:
• Repetition of information
• Inability to represent certain information
• Notice that there are two rows in faculty that contain repeated information about the History department, specifically, that department’s building and budget. The repetition of information in our alternative design is undesirable.
• Repeating information wastes space.

Storage Manager
• A program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
• Interaction with the OS file manager
• Efficient storing, retrieving and updating of data
• The storage manager components include:
• Authorization and integrity manager
• Transaction manager
• File manager
• Buffer manager
• The storage manager implements several data structures as part of the physical system implementation:
• Data files -- store the database itself
• Data dictionary -- stores metadata about the structure of the database, in particular the schema of the database.
• Indices -- can provide fast access to data items. A database index provides pointers to those data items that hold a particular value.
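
The repetition problem described under Normalization can be made concrete with a small sketch. The `faculty` rows below are invented stand-ins for the table the slide refers to (which is not reproduced in this text); the column layout and values are assumptions. Splitting the table into `instructor` and `department` stores each department's building and budget once, and rejoining recovers the original rows.

```python
# Hypothetical combined table:
# (ID, name, salary, dept_name, building, budget)
faculty = [
    ("22222", "Einstein", 95000, "Physics", "Watson", 70000),
    ("32343", "El Said", 60000, "History", "Painter", 50000),
    ("58583", "Califieri", 62000, "History", "Painter", 50000),
]

# Decompose: instructor facts in one relation, department facts in another.
instructor = sorted({(i, n, s, d) for (i, n, s, d, _b, _g) in faculty})
department = sorted({(d, b, g) for (_i, _n, _s, d, b, g) in faculty})

# History's building and budget now appear once instead of twice.
print(len(faculty), "faculty rows ->", len(department), "department rows")

# Rejoining on dept_name reproduces the original table, so nothing was lost.
dept_by_name = {d: (b, g) for (d, b, g) in department}
rejoined = sorted(
    (i, n, s, d) + dept_by_name[d] for (i, n, s, d) in instructor
)
print(rejoined == sorted(faculty))
```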

Database Engine

• A database system is partitioned into modules that deal with each of the responsibilities of the overall system.
• The functional components of a database system can be divided into
• The storage manager,
• The query processor component,
• The transaction management component.

Query Processor

• The query processor components include:
• DDL interpreter -- interprets DDL statements and records the definitions in the data dictionary.
• DML compiler -- translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.
• The DML compiler performs query optimization; that is, it picks the lowest cost evaluation plan from among the various alternatives.
• Query evaluation engine -- executes low-level instructions generated by the DML compiler.
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

Database Architecture
• Centralized databases
• One to a few cores, shared memory
• Client-server
• One server machine executes work on behalf of multiple client machines.
• Parallel databases
• Many core shared memory
• Shared disk
• Shared nothing
• Distributed databases
• Geographical distribution
• Schema/data heterogeneity

Database Architecture (Centralized/Shared-Memory)

Transaction Management
• A transaction is a collection of operations that performs a single logical function in a
database application
• Transaction-management component ensures that the database remains in a consistent
(correct) state despite system failures (e.g., power failures and operating system crashes)
and transaction failures.
• Concurrency-control manager controls the interaction among the concurrent transactions,
to ensure the consistency of the database.
Database Users
Database Applications

Database applications are usually partitioned into two or three parts

• Two-tier architecture -- the application resides at the client machine, where it invokes
database system functionality at the server machine
• Three-tier architecture -- the client machine acts as a front end and does not contain any
direct database calls.
• The client end communicates with an application server, usually through a forms
interface.
• The application server in turn communicates with a database system to access data.

Two-tier and three-tier architectures

Database Administrator


A person who has central control over the system is called a database administrator (DBA).
Functions of a DBA include:

• Schema definition
• Storage structure and access-method definition
• Schema and physical-organization modification
• Granting of authorization for data access
• Routine maintenance
• Periodically backing up the database
• Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required
• Monitoring jobs running on the database
History of Database Systems
• 1950s and early 1960s:
• Data processing using magnetic tapes for storage
• Tapes provided only sequential access
• Punched cards for input
• Late 1960s and 1970s:
• Hard disks allowed direct access to data
• Network and hierarchical data models in widespread use
• Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley (Michael Stonebraker) begins Ingres prototype
• Oracle releases first commercial relational database
• High-performance (for the era) transaction processing

History of Database Systems (Cont.)
• 1980s:
• Research relational prototypes evolve into commercial systems
• SQL becomes industrial standard
• Parallel and distributed database systems
• Wisconsin, IBM, Teradata
• Object-oriented database systems
• 1990s:
• Large decision support and data-mining applications
• Large multi-terabyte data warehouses
• Emergence of Web commerce

History of Database Systems (Cont.)
• 2000s
• Big data storage systems
• Google BigTable, Yahoo PNuts, Amazon,
• “NoSQL” systems.
• Big data analysis: beyond SQL
• Map reduce and friends
• 2010s
• SQL reloaded
• SQL front end to Map Reduce systems
• Massively parallel database systems
• Multi-core main-memory databases

Data Warehouse Vs. Data Mining

• Data warehousing refers to the process of compiling and organizing data into one common database, whereas data mining refers to the process of extracting useful data from the databases. The data mining process depends on the data compiled in the data warehousing phase to recognize meaningful patterns. A data warehouse is created to support management systems.
Data Warehouse
• A Data Warehouse refers to a place where data can be stored for useful mining. It is like a fast computer system with exceptionally large data storage capacity. Data from the organization's various systems is copied to the warehouse, where it can be fetched and conformed to remove errors. Advanced queries can then be made against the data stored in the warehouse.

Data Warehouse (Cont.)
• The important features of a Data Warehouse are given below:
• Subject Oriented: A data warehouse is subject-oriented. It provides useful data about a subject instead of the company's ongoing operations, and these subjects can be customers, suppliers, marketing, product, promotion, etc. A data warehouse usually focuses on modeling and analysis of data that helps the business organization to make data-driven decisions.
• Time-Variant: The data in a data warehouse provides information for a specific period.
• Integrated: A data warehouse is built by joining data from heterogeneous sources, such as relational databases, flat files, etc.
• Non-Volatile: Once data has entered the warehouse, it cannot be changed.

Data Warehouse (Cont.)

• A data warehouse combines data from numerous sources, which helps ensure data quality, accuracy, and consistency.
• A data warehouse boosts system performance by separating analytics processing from transactional databases.
• Data flows into a data warehouse from different databases.
• A data warehouse works by organizing data into a schema that describes the format and types of data.
• Query tools examine the data tables using this schema.
• Data warehouses and databases are both relational data systems, but they are made to serve different purposes.
• A data warehouse is built to store a huge amount of historical data and enables fast queries over all the data, typically using Online Analytical Processing (OLAP).
• A database is made to store current transactions and allow quick access to specific transactions for ongoing business processes, commonly known as Online Transaction Processing (OLTP).

Data Warehouse (Cont.)

Advantages of Data Warehouse:
• More accurate data access
• Improved productivity and performance
• Cost-efficient
• Consistent and quality data
Why Data Mining?

• The Explosive Growth of Data: from terabytes to petabytes
• Data collection and data availability
• Automated data collection tools, database systems, Web, computerized society
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: Remote sensing, bioinformatics, scientific simulation, …
• Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”: data mining, the automated analysis of massive data sets

Knowledge Discovery (KDD) Process

• This is a view from typical database systems and data warehousing communities
• Data mining plays an essential role in the knowledge discovery process
[Figure: KDD pipeline. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

What Is Data Mining?

• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Watch out: Is everything “data mining”?
• Simple search and query processing
• (Deductive) expert systems

Example: A Web Mining Framework

• Web mining usually involves
• Data cleaning
• Data integration from multiple sources
• Warehousing the data
• Data cube construction
• Data selection for data mining
• Data mining
• Presentation of the mining results
• Patterns and knowledge to be used or stored into knowledge-base
Data Mining in Business Intelligence

[Figure: layered pyramid; the potential to support business decisions increases toward the top. Layers from bottom to top, with the typical role in parentheses:]
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
• Data Preprocessing/Integration, Data Warehouses (DBA)
• Data Exploration: Statistical Summary, Querying, and Reporting
• Data Mining: Information Discovery (Data Analyst)
• Data Presentation: Visualization Techniques (Business Analyst)
• Decision Making (End User)

KDD Process: A Typical View from ML and Statistics

[Figure: Input Data → Data Pre-Processing → Data Mining → Post-Processing]
• Data Pre-Processing: Data integration, Normalization, Feature selection, Dimension reduction
• Data Mining: Pattern discovery, Association & correlation, Classification, Clustering, Outlier analysis, …
• Post-Processing: Pattern evaluation, Pattern selection, Pattern interpretation, Pattern visualization
• This is a view from typical machine learning and statistics communities

Example: Mining vs. Data Exploration

• Business intelligence view
• Warehouse, data cube, reporting but not much mining
• Business objects vs. data mining tools
• Supply chain example: tools
• Data presentation
• Exploration

Example: Medical Data Mining

• Health care & medical data mining – often adopted such a view in statistics and machine learning
• Preprocessing of the data (including feature extraction and dimension reduction)
• Classification or/and clustering processes
• Post-processing for presentation
Unit 2: Introduction to Relational Database and SQL

Outline
• Structure of Relational Databases
• Database Schema
• Keys
• Schema Diagrams
• Relational Query Languages
• The Relational Algebra
• Overview of The SQL Query Language
• SQL Data Definition
• Basic Query Structure of SQL Queries
• Additional Basic Operations
• Set Operations
• Null Values
• Aggregate Functions
• Nested Subqueries
• Modification of the Database

Example of an Instructor Relation
[Figure: the instructor relation shown as a table; the attributes are the columns and the tuples are the rows]

Relation Schema and Instance
• A1, A2, …, An are attributes
• R = (A1, A2, …, An) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
• A relation instance r defined over schema R is denoted by r (R).
• The current values of a relation are specified by a table
• An element t of relation r is called a tuple and is represented by a row in a table
Attributes
• The set of allowed values for each attribute is called the domain of the attribute
• Attribute values are (normally) required to be atomic; that is, indivisible
• The special value null is a member of every domain. It indicates that the value is “unknown”
• The null value causes complications in the definition of many operations

Database Schema
• Database schema -- is the logical structure of the database.
• Database instance -- is a snapshot of the data in the database at a given instant in time.
• Example:
• Schema: instructor (ID, name, dept_name, salary)
• Instance:

Relations are Unordered

• Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
• Example: instructor relation with unordered tuples

Keys

• Let K ⊆ R
• K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)
• Example: {ID} and {ID, name} are both superkeys of instructor.
• Superkey K is a candidate key if K is minimal
• Example: {ID} is a candidate key for instructor
• One of the candidate keys is selected to be the primary key.
• Which one?
• Foreign key constraint: Value in one relation must appear in another
• Referencing relation
• Referenced relation
• Example: dept_name in instructor is a foreign key from instructor referencing department
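
The superkey and candidate-key definitions can be checked mechanically against a single relation instance, as in the sketch below. One caveat: a superkey must identify tuples in every possible relation r(R), so a check on one instance can only refute a claim, never prove it. The helper names and sample rows are assumptions made for illustration.

```python
from itertools import combinations

# Illustrative instance of the instructor relation.
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci."},
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics"},
    {"ID": "33456", "name": "Gold", "dept_name": "Physics"},
]


def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """A superkey is a candidate key if no proper subset is also a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_superkey(instructor, ("ID",)))              # unique per row here
print(is_superkey(instructor, ("ID", "name")))       # still unique
print(is_candidate_key(instructor, ("ID", "name")))  # not minimal
print(is_superkey(instructor, ("dept_name",)))       # Physics repeats
```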
Schema Diagram for University Database

Relational Query Languages

• Procedural versus non-procedural, or declarative
• “Pure” languages:
• Relational algebra
• Tuple relational calculus
• Domain relational calculus
• The above 3 pure languages are equivalent in computing power
• We will concentrate in this chapter on relational algebra
• Not Turing-machine equivalent
• Consists of 6 basic operations
Relational Algebra

• A procedural language consisting of a set of operations that take one or two relations as input and produce a new relation as their result.
• Six basic operators
• select: σ
• project: Π
• union: ∪
• set difference: –
• Cartesian product: ×
• rename: ρ
Select Operation

• The select operation selects tuples that satisfy a given predicate.
• Notation: σ_{p}(r)
• p is called the selection predicate
• Example: select those tuples of the instructor relation where the instructor is in the “Physics” department.
• Query
σ_{dept_name = “Physics”}(instructor)
• Result: (table not reproduced)

Project Operation

• A unary operation that returns its argument relation, with certain attributes left out.
• Notation:
Π_{A1, A2, A3, …, Ak}(r)
where A1, A2, …, Ak are attribute names and r is a relation name.
• The result is defined as the relation of k columns obtained by erasing the columns that are not listed
• Duplicate rows removed from result, since relations are sets

Select Operation (Cont.)

• We allow comparisons using
=, ≠, >, ≥, <, ≤
in the selection predicate.
• We can combine several predicates into a larger predicate by using the connectives:
∧ (and), ∨ (or), ¬ (not)
• Example: to find the instructors in Physics with a salary greater than $90,000, we write:
σ_{dept_name = “Physics” ∧ salary > 90,000}(instructor)
• The select predicate may include comparisons between two attributes.
• Example: find all departments whose name is the same as their building name:
σ_{dept_name = building}(department)

Project Operation Example

• Example: eliminate the dept_name attribute of instructor
• Query:
Π_{ID, name, salary}(instructor)
• Result: (table not reproduced)
The instructor X teaches table
Composition of Relational Operations

• The result of a relational-algebra operation is a relation, and therefore relational-algebra operations can be composed together into a relational-algebra expression.
• Consider the query -- Find the names of all instructors in the Physics department.

Π_{name}(σ_{dept_name = “Physics”}(instructor))

• Instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation.
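
The select, project, and composition slides can be mirrored in a few lines of Python. This is a sketch under assumptions: relations are modelled as lists of dicts, the functions `select` and `project` are illustrative helpers rather than a library API, and the sample rows are invented.

```python
# Illustrative instructor relation.
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": "33456", "name": "Gold", "dept_name": "Physics", "salary": 87000},
]


def select(relation, predicate):
    """sigma_p(r): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Pi_attrs(r): keep only the listed attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


# sigma_{dept_name = "Physics" and salary > 90,000}(instructor)
high_paid = select(
    instructor, lambda t: t["dept_name"] == "Physics" and t["salary"] > 90000
)

# Composition: Pi_name(sigma_{dept_name = "Physics"}(instructor))
physics_names = project(
    select(instructor, lambda t: t["dept_name"] == "Physics"), ["name"]
)

# Projection removes duplicates, since relations are sets.
departments = project(instructor, ["dept_name"])

print(high_paid, physics_names, departments)
```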

Cartesian-Product Operation

• The Cartesian-product operation (denoted by ×) allows us to combine information from any two relations.
• Example: the Cartesian product of the relations instructor and teaches is written as:
instructor × teaches
• We construct a tuple of the result out of each possible pair of tuples: one from the instructor relation and one from the teaches relation (see next slide)
• Since the instructor ID appears in both relations we distinguish between these attributes by attaching to the attribute the name of the relation from which the attribute originally came.
• instructor.ID
• teaches.ID

Join Operation

• The Cartesian-Product
instructor × teaches
associates every tuple of instructor with every tuple of teaches.
• Most of the resulting rows have information about instructors who did NOT teach a particular course.
• To get only those tuples of “instructor × teaches” that pertain to instructors and the courses that they taught, we write:
σ_{instructor.id = teaches.id}(instructor × teaches)
• We get only those tuples of “instructor × teaches” that pertain to instructors and the courses that they taught.
• The result of this expression is shown in the next slide
Join Operation (Cont.)
• The table corresponding to:
σ_{instructor.id = teaches.id}(instructor × teaches)

Union Operation
• The union operation allows us to combine two relations
• Notation: r ∪ s
• For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: 2nd column of r deals with the same type of values as does the 2nd column of s)
• Example: to find all courses taught in the Fall 2017 semester, or in the Spring 2018 semester, or in both

Π_{course_id}(σ_{semester = “Fall” ∧ year = 2017}(section)) ∪
Π_{course_id}(σ_{semester = “Spring” ∧ year = 2018}(section))

Join Operation (Cont.)

• The join operation allows us to combine a select operation and a Cartesian-Product operation into a single operation.
• Consider relations r (R) and s (S)
• Let θ be a predicate on attributes in the schema R ∪ S. The join operation r ⋈_θ s is defined as follows:

r ⋈_θ s = σ_θ(r × s)

• Thus

σ_{instructor.id = teaches.id}(instructor × teaches)

• Can equivalently be written as

instructor ⋈_{instructor.id = teaches.id} teaches

Union Operation (Cont.)

• Result of:
Π_{course_id}(σ_{semester = “Fall” ∧ year = 2017}(section)) ∪
Π_{course_id}(σ_{semester = “Spring” ∧ year = 2018}(section))

Set-Intersection Operation

• The set-intersection operation allows us to find tuples that are in both the input relations.
• Notation: r ∩ s
• Assume:
• r, s have the same arity
• attributes of r and s are compatible
• Example: Find the set of all courses taught in both the Fall 2017 and the Spring 2018 semesters.
Π_{course_id}(σ_{semester = “Fall” ∧ year = 2017}(section)) ∩
Π_{course_id}(σ_{semester = “Spring” ∧ year = 2018}(section))
• Result: (table not reproduced)

The Assignment Operation

• It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables.
• The assignment operation is denoted by ← and works like assignment in a programming language.
• Example: Find all instructors in the “Physics” and “Music” departments.
Physics ← σ_{dept_name = “Physics”}(instructor)
Music ← σ_{dept_name = “Music”}(instructor)
Physics ∪ Music
• With the assignment operation, a query can be written as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as the result of the query.
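
Because relations are sets, the union and intersection examples map onto Python's built-in set operators, and the temporary variables play the role of the assignment operation. The `section` rows below are invented for illustration. The sketch also includes the set-difference operator (r – s), which the document introduces under Set Difference Operation.

```python
# Illustrative section relation: (course_id, semester, year).
section = [
    ("CS-101", "Fall", 2017),
    ("CS-347", "Fall", 2017),
    ("PHY-101", "Fall", 2017),
    ("CS-101", "Spring", 2018),
    ("CS-315", "Spring", 2018),
    ("FIN-201", "Spring", 2018),
]

# Assignment to temporary relation variables:
# fall_2017 <- Pi_course_id(sigma_{semester = "Fall" and year = 2017}(section))
fall_2017 = {c for (c, sem, yr) in section if sem == "Fall" and yr == 2017}
spring_2018 = {c for (c, sem, yr) in section if sem == "Spring" and yr == 2018}

either = fall_2017 | spring_2018     # union
both = fall_2017 & spring_2018       # set intersection
fall_only = fall_2017 - spring_2018  # set difference

print(sorted(either), sorted(both), sorted(fall_only))
```

Both operands are one-column sets of course ids, so the same-arity and compatible-domain conditions hold trivially.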

Set Difference Operation

• The set-difference operation allows us to find tuples that are in one relation but are not in another.
• Notation: r – s
• Set differences must be taken between compatible relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible
• Example: to find all courses taught in the Fall 2017 semester, but not in the Spring 2018 semester
Π_{course_id}(σ_{semester = “Fall” ∧ year = 2017}(section)) −
Π_{course_id}(σ_{semester = “Spring” ∧ year = 2018}(section))

The Rename Operation

• The results of relational-algebra expressions do not have a name that we can use to refer to them. The rename operator, ρ, is provided for that purpose
• The expression:
ρ_{x}(E)
returns the result of expression E under the name x
• Another form of the rename operation:
ρ_{x(A1, A2, …, An)}(E)


History
• IBM Sequel language developed as part of System R project at the IBM San Jose Research Laboratory
• Renamed Structured Query Language (SQL)
• ANSI and ISO standard SQL:
  • SQL-86
  • SQL-89
  • SQL-92
  • SQL:1999 (language name became Y2K compliant!)
  • SQL:2003
• Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features.
  • Not all examples here may work on your particular system.

Equivalent Queries
• There is more than one way to write a query in relational algebra.
• Example: Find information about courses taught by instructors in the Physics department with salary greater than 90,000
• Query 1
    σ dept_name=“Physics” ∧ salary > 90,000 (instructor)
• Query 2
    σ dept_name=“Physics” (σ salary > 90,000 (instructor))
• The two queries are not identical; they are, however, equivalent -- they give the same result on any database.
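The two selection queries can be compared on a small instance. The sketch below assumes SQLite (through Python's standard sqlite3 module) as the engine and uses made-up instructor rows. Agreement on one instance only illustrates the equivalence; it does not prove it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary real)")
conn.executemany(
    "insert into instructor values (?, ?, ?, ?)",
    [
        ("22222", "Einstein", "Physics", 95000),
        ("33456", "Gold", "Physics", 87000),
        ("10101", "Srinivasan", "Comp. Sci.", 65000),
        ("83821", "Brandt", "Comp. Sci.", 92000),
    ],
)

# Query 1: one selection with a conjunctive predicate.
q1 = conn.execute(
    "select * from instructor where dept_name = 'Physics' and salary > 90000"
).fetchall()

# Query 2: selection on salary first, then selection on dept_name.
q2 = conn.execute(
    "select * from (select * from instructor where salary > 90000) "
    "where dept_name = 'Physics'"
).fetchall()

print(sorted(q1) == sorted(q2))
```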

Equivalent Queries
• There is more than one way to write a query in relational algebra.
• Example: Find information about courses taught by instructors in the Physics department
• Query 1
    σ dept_name=“Physics” (instructor ⋈ instructor.ID = teaches.ID teaches)
• Query 2
    (σ dept_name=“Physics” (instructor)) ⋈ instructor.ID = teaches.ID teaches
• The two queries are not identical; they are, however, equivalent -- they give the same result on any database.

SQL Parts
• DML -- provides the ability to query information from the database and to insert tuples into, delete tuples from, and modify tuples in the database.
• Integrity -- the DDL includes commands for specifying integrity constraints.
• View definition -- the DDL includes commands for defining views.
• Transaction control -- includes commands for specifying the beginning and ending of transactions.
• Embedded SQL and dynamic SQL -- define how SQL statements can be embedded within general-purpose programming languages.
• Authorization -- includes commands for specifying access rights to relations and views.
Data Definition Language
The SQL data-definition language (DDL) allows the specification of information about relations, including:
• The schema for each relation.
• The type of values associated with each attribute.
• The integrity constraints.
• The set of indices to be maintained for each relation.
• Security and authorization information for each relation.
• The physical storage structure of each relation on disk.

Create Table Construct
• An SQL relation is defined using the create table command:
    create table r
        (A1 D1, A2 D2, ..., An Dn,
        (integrity-constraint1),
        ...,
        (integrity-constraintk))
  • r is the name of the relation
  • each Ai is an attribute name in the schema of relation r
  • Di is the data type of values in the domain of attribute Ai
• Example:
    create table instructor (
        ID char(5),
        name varchar(20),
        dept_name varchar(20),
        salary numeric(8,2))
Domain Types in SQL
• char(n). Fixed length character string, with user-specified length n.
• varchar(n). Variable length character strings, with user-specified maximum length n.
• int. Integer (a finite subset of the integers that is machine-dependent).
• smallint. Small integer (a machine-dependent subset of the integer domain type).
• numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point. (E.g., numeric(3,1) allows 44.5 to be stored exactly, but not 444.5 or 0.32.)
• real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.
• float(n). Floating point number, with user-specified precision of at least n digits.
• More are covered in Chapter 4.

Integrity Constraints in Create Table
• Types of integrity constraints
  • primary key (A1, ..., An)
  • foreign key (Am, ..., An) references r
  • not null
• SQL prevents any update to the database that violates an integrity constraint.
• Example:
    create table instructor (
        ID char(5),
        name varchar(20) not null,
        dept_name varchar(20),
        salary numeric(8,2),
        primary key (ID),
        foreign key (dept_name) references department);
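The effect of the primary key, not null, and foreign key constraints can be observed with a short experiment. This is a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, a minimal one-column department table defined only for this example, and a hypothetical helper `rejected`. SQLite checks foreign keys only after `pragma foreign_keys = on`, and it does not enforce declared lengths such as char(5).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when asked
# Minimal department table, assumed here so the foreign key has a target.
conn.execute("create table department (dept_name varchar(20), primary key (dept_name))")
conn.execute("""
    create table instructor (
        ID        char(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        salary    numeric(8,2),
        primary key (ID),
        foreign key (dept_name) references department)
""")
conn.execute("insert into department values ('Biology')")
conn.execute("insert into instructor values ('10211', 'Smith', 'Biology', 66000)")

def rejected(statement):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.IntegrityError:
        return True

dup_key = rejected("insert into instructor values ('10211', 'Jones', 'Biology', 70000)")
null_name = rejected("insert into instructor values ('10212', null, 'Biology', 70000)")
bad_dept = rejected("insert into instructor values ('10213', 'Lee', 'Astrology', 70000)")
print(dup_key, null_name, bad_dept)
```

Each of the three inserts violates a different constraint (duplicate primary key, null name, unknown department), and only the first, valid tuple remains in instructor.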
And a Few More Relation Definitions
• create table student (
      ID varchar(5),
      name varchar(20) not null,
      dept_name varchar(20),
      tot_cred numeric(3,0),
      primary key (ID),
      foreign key (dept_name) references department);
• create table takes (
      ID varchar(5),
      course_id varchar(8),
      sec_id varchar(8),
      semester varchar(6),
      year numeric(4,0),
      grade varchar(2),
      primary key (ID, course_id, sec_id, semester, year),
      foreign key (ID) references student,
      foreign key (course_id, sec_id, semester, year) references section);

Updates to tables
• Insert
  • insert into instructor values ('10211', 'Smith', 'Biology', 66000);
• Delete
  • Remove all tuples from the student relation
  • delete from student
• Drop Table
  • drop table r
• Alter
  • alter table r add A D
    • where A is the name of the attribute to be added to relation r and D is the domain of A.
    • All existing tuples in the relation are assigned null as the value for the new attribute.
  • alter table r drop A
    • where A is the name of an attribute of relation r
    • Dropping of attributes not supported by many databases.

And more still
• create table course (
      course_id varchar(8),
      title varchar(50),
      dept_name varchar(20),
      credits numeric(2,0),
      primary key (course_id),
      foreign key (dept_name) references department);

Basic Query Structure
• A typical SQL query has the form:
    select A1, A2, ..., An
    from r1, r2, ..., rm
    where P
  • Ai represents an attribute
  • ri represents a relation
  • P is a predicate.
• The result of an SQL query is a relation.
The select Clause
• The select clause lists the attributes desired in the result of a query
  • corresponds to the projection operation of the relational algebra
• Example: find the names of all instructors:
    select name
    from instructor
• NOTE: SQL names are case insensitive (i.e., you may use upper- or lower-case letters.)
  • E.g., Name ≡ NAME ≡ name
  • Some people use upper case wherever we use bold font.

The select Clause (Cont.)
• An asterisk in the select clause denotes “all attributes”
    select *
    from instructor
• An attribute can be a literal with no from clause
    select '437'
  • Result is a table with one column and a single row with value “437”
  • Can give the column a name using:
    select '437' as FOO
• An attribute can be a literal with from clause
    select 'A'
    from instructor
  • Result is a table with one column and N rows (number of tuples in the instructor table), each row with value “A”

The select Clause (Cont.)
• SQL allows duplicates in relations as well as in query results.
• To force the elimination of duplicates, insert the keyword distinct after select.
• Find the department names of all instructors, and remove duplicates
    select distinct dept_name
    from instructor
• The keyword all specifies that duplicates should not be removed.
    select all dept_name
    from instructor

The select Clause (Cont.)
• The select clause can contain arithmetic expressions involving the operations +, –, *, and /, and operating on constants or attributes of tuples.
• The query:
    select ID, name, salary/12
    from instructor
  would return a relation that is the same as the instructor relation, except that the value of the attribute salary is divided by 12.
• Can rename “salary/12” using the as clause:
    select ID, name, salary/12 as monthly_salary
The where Clause
• The where clause specifies conditions that the result must satisfy
  • Corresponds to the selection predicate of the relational algebra.
• To find all instructors in Comp. Sci. dept
    select name
    from instructor
    where dept_name = 'Comp. Sci.'
• SQL allows the use of the logical connectives and, or, and not
• The operands of the logical connectives can be expressions involving the comparison operators <, <=, >, >=, =, and <>.
• Comparisons can be applied to results of arithmetic expressions
• To find all instructors in Comp. Sci. dept with salary > 70000
    select name
    from instructor
    where dept_name = 'Comp. Sci.' and salary > 70000

Examples
• Find the names of all instructors who have taught some course and the course_id
  • select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
• Find the names of all instructors in the Art department who have taught some course and the course_id
  • select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
      and instructor.dept_name = 'Art'

The from Clause
• The from clause lists the relations involved in the query
  • Corresponds to the Cartesian product operation of the relational algebra.
• Find the Cartesian product instructor X teaches
    select *
    from instructor, teaches
  • generates every possible instructor – teaches pair, with all attributes from both relations.
  • For common attributes (e.g., ID), the attributes in the resulting table are renamed using the relation name (e.g., instructor.ID)
• Cartesian product not very useful directly, but useful combined with where-clause condition (selection operation in relational algebra).

The Rename Operation
• SQL allows renaming relations and attributes using the as clause:
    old-name as new-name
• Find the names of all instructors who have a higher salary than some instructor in 'Comp. Sci.'.
  • select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Comp. Sci.'
• Keyword as is optional and may be omitted
    instructor as T ≡ instructor T
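A small experiment shows the Cartesian product, the effect of a where-clause condition on it, and a self join that relies on renaming. This is only a sketch: it assumes SQLite via Python's sqlite3 module, and the three instructor rows and two teaches rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary real)")
conn.execute("create table teaches (ID text, course_id text)")
conn.executemany("insert into instructor values (?, ?, ?, ?)", [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("45565", "Katz", "Comp. Sci.", 75000),
    ("22222", "Einstein", "Physics", 95000),
])
conn.executemany("insert into teaches values (?, ?)", [
    ("10101", "CS-101"), ("22222", "PHY-101"),
])

# Cartesian product: every instructor paired with every teaches tuple (3 * 2 rows).
product = conn.execute("select * from instructor, teaches").fetchall()

# The where clause keeps only the matching pairs.
joined = conn.execute(
    "select name, course_id from instructor, teaches "
    "where instructor.ID = teaches.ID order by name"
).fetchall()

# Self join with renaming: earns more than some Comp. Sci. instructor.
higher = conn.execute(
    "select distinct T.name from instructor as T, instructor as S "
    "where T.salary > S.salary and S.dept_name = 'Comp. Sci.' order by T.name"
).fetchall()
print(len(product), joined, higher)
```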
Self Join Example
• Relation emp-super
• Find the supervisor of “Bob”
• Find the supervisor of the supervisor of “Bob”
• Can you find ALL the supervisors (direct and indirect) of “Bob”?

String Operations (Cont.)
• Patterns are case sensitive.
• Pattern matching examples:
  • 'Intro%' matches any string beginning with “Intro”.
  • '%Comp%' matches any string containing “Comp” as a substring.
  • '_ _ _' matches any string of exactly three characters.
  • '_ _ _ %' matches any string of at least three characters.
• SQL supports a variety of string operations such as
  • concatenation (using “||”)
  • converting from upper to lower case (and vice versa)
  • finding string length, extracting substrings, etc.
String Operations
• SQL includes a string-matching operator for comparisons on character strings. The operator like uses patterns that are described using two special characters:
  • percent ( % ). The % character matches any substring.
  • underscore ( _ ). The _ character matches any character.
• Find the names of all instructors whose name includes the substring “dar”.
    select name
    from instructor
    where name like '%dar%'
• Match the string “100%”
    like '100\%' escape '\'
  where the backslash (\) is declared as the escape character.

Ordering the Display of Tuples
• List in alphabetic order the names of all instructors
    select distinct name
    from instructor
    order by name
• We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
  • Example: order by name desc
• Can sort on multiple attributes
  • Example: order by dept_name, name
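The like patterns, the escape clause, and order by can be exercised together. The sketch below assumes SQLite through Python's sqlite3 module and a throwaway table `t` with invented strings. One caveat on the engine: SQLite's like is case-insensitive for ASCII letters by default, so the sample data avoids relying on case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (s text)")
conn.executemany("insert into t values (?)",
                 [("Sridar",), ("Chandar",), ("Katz",), ("100%",), ("1000",), ("abc",)])

# % matches any substring; the result is listed in descending order.
dar = conn.execute("select s from t where s like '%dar%' order by s desc").fetchall()

# Each _ matches exactly one character: strings of exactly three characters.
three = conn.execute("select s from t where s like '___'").fetchall()

# With backslash declared as escape character, \% matches a literal percent sign.
pct = conn.execute(r"select s from t where s like '100\%' escape '\'").fetchall()
print(dar, three, pct)
```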
Where Clause Predicates
• SQL includes a between comparison operator
• Example: Find the names of all instructors with salary between $90,000 and $100,000 (that is, ≥ $90,000 and ≤ $100,000)
  • select name
    from instructor
    where salary between 90000 and 100000
• Tuple comparison
  • select name, course_id
    from instructor, teaches
    where (instructor.ID, dept_name) = (teaches.ID, 'Biology');

Set Operations (Cont.)
• Set operations union, intersect, and except
  • Each of the above operations automatically eliminates duplicates
• To retain all duplicates use
  • union all,
  • intersect all,
  • except all.

Set Operations
• Find courses that ran in Fall 2017 or in Spring 2018
    (select course_id from section where semester = 'Fall' and year = 2017)
    union
    (select course_id from section where semester = 'Spring' and year = 2018)
• Find courses that ran in Fall 2017 and in Spring 2018
    (select course_id from section where semester = 'Fall' and year = 2017)
    intersect
    (select course_id from section where semester = 'Spring' and year = 2018)
• Find courses that ran in Fall 2017 but not in Spring 2018
    (select course_id from section where semester = 'Fall' and year = 2017)
    except
    (select course_id from section where semester = 'Spring' and year = 2018)

Null Values
• It is possible for tuples to have a null value, denoted by null, for some of their attributes
• null signifies an unknown value or that a value does not exist.
• The result of any arithmetic expression involving null is null
  • Example: 5 + null returns null
• The predicate is null can be used to check for null values.
  • Example: Find all instructors whose salary is null.
    select name
    from instructor
    where salary is null
• The predicate is not null succeeds if the value on which it is applied is not null.
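The three set operations on the section relation can be run side by side. This is a sketch assuming SQLite via Python's sqlite3 module, a cut-down section table with invented rows, and a hypothetical helper `run`. SQLite does not accept parentheses around the operands of a set operation, so the two select statements are written without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id text, semester text, year int)")
conn.executemany("insert into section values (?, ?, ?)", [
    ("CS-101", "Fall", 2017), ("CS-347", "Fall", 2017),
    ("CS-101", "Spring", 2018), ("CS-315", "Spring", 2018),
])
fall = "select course_id from section where semester = 'Fall' and year = 2017"
spring = "select course_id from section where semester = 'Spring' and year = 2018"

def run(op):
    """Combine the two queries with the given set operator and sort the result."""
    return sorted(row[0] for row in conn.execute(f"{fall} {op} {spring}"))

either = run("union")
both = run("intersect")
fall_only = run("except")
print(either, both, fall_only)
```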
Null Values (Cont.)
• SQL treats as unknown the result of any comparison involving a null value (other than predicates is null and is not null).
  • Example: 5 < null or null <> null or null = null
• The predicate in a where clause can involve Boolean operations (and, or, not); thus the definitions of the Boolean operations need to be extended to deal with the value unknown.
  • and: (true and unknown) = unknown,
         (false and unknown) = false,
         (unknown and unknown) = unknown
  • or: (unknown or true) = true,
        (unknown or false) = unknown,
        (unknown or unknown) = unknown
• Result of where clause predicate is treated as false if it evaluates to unknown

Aggregate Functions Examples
• Find the average salary of instructors in the Computer Science department
  • select avg (salary)
    from instructor
    where dept_name = 'Comp. Sci.';
• Find the total number of instructors who teach a course in the Spring 2018 semester
  • select count (distinct ID)
    from teaches
    where semester = 'Spring' and year = 2018;
• Find the number of tuples in the course relation
  • select count (*)
    from course;
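The three-valued logic for null can be checked directly. The sketch below assumes SQLite through Python's sqlite3 module, where true and false appear as 1 and 0 and null/unknown appears as Python's None; the helper `value` and the two sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def value(expr):
    """Evaluate a scalar SQL expression; Python None stands for null/unknown."""
    return conn.execute(f"select {expr}").fetchone()[0]

print(value("5 + null"))     # arithmetic involving null
print(value("null = null"))  # comparison involving null
print(value("1 and null"))   # true and unknown
print(value("0 and null"))   # false and unknown
print(value("null or 1"))    # unknown or true
print(value("null or 0"))    # unknown or false

conn.execute("create table instructor (name text, salary real)")
conn.executemany("insert into instructor values (?, ?)",
                 [("Smith", 66000), ("Jones", None)])
# For Jones the predicate evaluates to unknown, so the row is not returned.
kept = conn.execute("select name from instructor where salary > 0").fetchall()
missing = conn.execute("select name from instructor where salary is null").fetchall()
```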

Aggregate Functions
• These functions operate on the multiset of values of a column of a relation, and return a value
    avg: average value
    min: minimum value
    max: maximum value
    sum: sum of values
    count: number of values

Aggregate Functions – Group By
• Find the average salary of instructors in each department
  • select dept_name, avg (salary) as avg_salary
    from instructor
    group by dept_name;
Aggregation (Cont.)
• Attributes in select clause outside of aggregate functions must appear in group by list
  • /* erroneous query */
    select dept_name, ID, avg (salary)
    from instructor
    group by dept_name;

Nested Subqueries
• SQL provides a mechanism for the nesting of subqueries. A subquery is a select-from-where expression that is nested within another query.
• The nesting can be done in the following SQL query
    select A1, A2, ..., An
    from r1, r2, ..., rm
    where P
  as follows:
  • From clause: ri can be replaced by any valid subquery
  • Where clause: P can be replaced with an expression of the form:
        B <operation> (subquery)
    B is an attribute and <operation> to be defined later.
  • Select clause: Ai can be replaced by a subquery that generates a single value.
Aggregate Functions – Having Clause

• Find the names and average salaries of all departments whose average salary is
greater than 42000

    select dept_name, avg (salary) as avg_salary
    from instructor
    group by dept_name
    having avg (salary) > 42000;
• Note: predicates in the having clause are applied after the formation of groups
whereas predicates in the where clause are applied before forming groups
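The difference between filtering groups with having and filtering tuples with where shows up on a small table. The following sketch assumes SQLite via Python's sqlite3 module; the five instructor rows are invented so that one department has a low earner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, dept_name text, salary real)")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Einstein", "Physics", 95000), ("Gold", "Physics", 87000),
    ("Mozart", "Music", 40000),
    ("Katz", "Comp. Sci.", 75000), ("Lee", "Comp. Sci.", 41000),
])

# having filters groups: departments whose average salary exceeds 42000.
by_having = conn.execute(
    "select dept_name, avg(salary) as avg_salary from instructor "
    "group by dept_name having avg(salary) > 42000 order by dept_name"
).fetchall()

# where filters tuples before grouping: low salaries never reach avg().
by_where = conn.execute(
    "select dept_name, avg(salary) as avg_salary from instructor "
    "where salary > 42000 group by dept_name order by dept_name"
).fetchall()
print(by_having)
print(by_where)
```

Both queries return the same two departments here, but the Comp. Sci. average differs: with having, Lee's salary still counts toward the group average, while with where it is removed before the groups are formed.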
Set Membership
• Find courses offered in Fall 2017 and in Spring 2018
    select distinct course_id
    from section
    where semester = 'Fall' and year = 2017 and
          course_id in (select course_id
                        from section
                        where semester = 'Spring' and year = 2018);

• Find courses offered in Fall 2017 but not in Spring 2018
    select distinct course_id
    from section
    where semester = 'Fall' and year = 2017 and
          course_id not in (select course_id
                            from section
                            where semester = 'Spring' and year = 2018);

Set Membership (Cont.)
• Name all instructors whose name is neither “Mozart” nor “Einstein”
    select distinct name
    from instructor
    where name not in ('Mozart', 'Einstein')
• Find the total number of (distinct) students who have taken course sections taught by the instructor with ID 10101
    select count (distinct ID)
    from takes
    where (course_id, sec_id, semester, year) in
          (select course_id, sec_id, semester, year
           from teaches
           where teaches.ID = '10101');
• Note: Above query can be written in a much simpler manner. The formulation above is simply to illustrate SQL features

Set Comparison – “some” Clause
• Find names of instructors with salary greater than that of some (at least one) instructor in the Biology department.
    select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Biology';
• Same query using > some clause
    select name
    from instructor
    where salary > some (select salary
                         from instructor
                         where dept_name = 'Biology');
Set Comparison – “all” Clause
• Find the names of all instructors whose salary is greater than the salary of all instructors in the Biology department.
    select name
    from instructor
    where salary > all (select salary
                        from instructor
                        where dept_name = 'Biology');

Use of “exists” Clause
• Yet another way of specifying the query “Find all courses taught in both the Fall 2017 semester and in the Spring 2018 semester”
    select course_id
    from section as S
    where semester = 'Fall' and year = 2017 and
          exists (select *
                  from section as T
                  where semester = 'Spring' and year = 2018
                        and S.course_id = T.course_id);
• Correlation name – variable S in the outer query
• Correlated subquery – the inner query

Test for Empty Relations
• The exists construct returns the value true if the argument subquery is nonempty.
  • exists r ⇔ r ≠ Ø
  • not exists r ⇔ r = Ø

Use of “not exists” Clause
• Find all students who have taken all courses offered in the Biology department.
    select distinct S.ID, S.name
    from student as S
    where not exists ( (select course_id
                        from course
                        where dept_name = 'Biology')
                       except
                       (select T.course_id
                        from takes as T
                        where S.ID = T.ID));
  • First nested query lists all courses offered in Biology
  • Second nested query lists all courses a particular student took
• Note that X – Y = Ø ⇔ X ⊆ Y
• Note: Cannot write this query using = all and its variants
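The "not exists ... except" query can be tried on a tiny database. The sketch below assumes SQLite through Python's sqlite3 module, reduced versions of the student, course, and takes tables, and invented rows. Because SQLite does not allow parentheses around the operands of except, the inner queries are written without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student (ID text, name text);
    create table course (course_id text, dept_name text);
    create table takes (ID text, course_id text);
    insert into student values ('1', 'Ann'), ('2', 'Bob');
    insert into course values ('BIO-101', 'Biology'), ('BIO-301', 'Biology'),
                              ('CS-101', 'Comp. Sci.');
    insert into takes values ('1', 'BIO-101'), ('1', 'BIO-301'), ('1', 'CS-101'),
                             ('2', 'BIO-101');
""")

# A student qualifies when (Biology courses) minus (courses the student took)
# is empty, i.e. the Biology courses are a subset of the student's courses.
rows = conn.execute("""
    select distinct S.ID, S.name
    from student as S
    where not exists (select course_id
                      from course
                      where dept_name = 'Biology'
                      except
                      select T.course_id
                      from takes as T
                      where S.ID = T.ID)
""").fetchall()
print(rows)
```

Ann took both Biology courses, so the difference is empty for her; Bob is missing BIO-301 and is excluded.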
Test for Absence of Duplicate Tuples
• The unique construct tests whether a subquery has any duplicate tuples in its result.
• The unique construct evaluates to “true” if a given subquery contains no duplicates.
• Find all courses that were offered at most once in 2017
    select T.course_id
    from course as T
    where unique ( select R.course_id
                   from section as R
                   where T.course_id = R.course_id
                         and R.year = 2017);

Subqueries in the From Clause
• SQL allows a subquery expression to be used in the from clause
• Find the average instructors’ salaries of those departments where the average salary is greater than $42,000.
    select dept_name, avg_salary
    from ( select dept_name, avg (salary) as avg_salary
           from instructor
           group by dept_name)
    where avg_salary > 42000;
• Note that we do not need to use the having clause
• Another way to write above query
    select dept_name, avg_salary
    from ( select dept_name, avg (salary)
           from instructor
           group by dept_name)
         as dept_avg (dept_name, avg_salary)
    where avg_salary > 42000;

With Clause

• The with clause provides a way of defining a temporary relation whose definition is available only to the query in which the with clause occurs.
• Find all departments with the maximum budget
    with max_budget (value) as
         (select max(budget)
          from department)
    select department.dept_name
    from department, max_budget
    where department.budget = max_budget.value;
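The maximum-budget query can be run as written on engines that support the with clause. This sketch assumes SQLite via Python's sqlite3 module and three invented department rows, two of which tie for the maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dept_name text, building text, budget real)")
conn.executemany("insert into department values (?, ?, ?)", [
    ("Biology", "Watson", 90000),
    ("Finance", "Painter", 120000),
    ("Physics", "Watson", 120000),
])

# max_budget is a temporary one-row relation visible only inside this query.
rows = conn.execute("""
    with max_budget (value) as
         (select max(budget)
          from department)
    select department.dept_name
    from department, max_budget
    where department.budget = max_budget.value
    order by department.dept_name
""").fetchall()
print(rows)
```

Both departments that share the maximum budget are returned, which a plain `select max(budget)` alone would not show.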
Complex Queries using With Clause
• Find all departments where the total salary is greater than the average of the total salary at all departments
    with dept_total (dept_name, value) as
         (select dept_name, sum(salary)
          from instructor
          group by dept_name),
         dept_total_avg (value) as
         (select avg(value)
          from dept_total)
    select dept_name
    from dept_total, dept_total_avg
    where dept_total.value > dept_total_avg.value;

Modification of the Database
• Deletion of tuples from a given relation.
• Insertion of new tuples into a given relation
• Updating of values in some tuples in a given relation

Scalar Subquery
• Scalar subquery is one which is used where a single value is expected
• List all departments along with the number of instructors in each department
    select dept_name,
           ( select count(*)
             from instructor
             where department.dept_name = instructor.dept_name)
           as num_instructors
    from department;
• Runtime error if subquery returns more than one result tuple

Deletion
• Delete all instructors
    delete from instructor
• Delete all instructors from the Finance department
    delete from instructor
    where dept_name = 'Finance';
• Delete all tuples in the instructor relation for those instructors associated with a department located in the Watson building.
    delete from instructor
    where dept_name in (select dept_name
                        from department
                        where building = 'Watson');
Deletion (Cont.)
• Delete all instructors whose salary is less than the average salary of instructors
    delete from instructor
    where salary < (select avg (salary)
                    from instructor);
• Problem: as we delete tuples from instructor, the average salary changes
• Solution used in SQL:
  1. First, compute avg (salary) and find all tuples to delete
  2. Next, delete all tuples found above (without recomputing avg or retesting the tuples)

Insertion
• Add a new tuple to course
    insert into course
    values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
• or equivalently
    insert into course (course_id, title, dept_name, credits)
    values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
• Add a new tuple to student with tot_cred set to null
    insert into student
    values ('3003', 'Green', 'Finance', null);

Insertion (Cont.)
• Make each student in the Music department who has earned more than 144 credit hours an instructor in the Music department with a salary of $18,000.
    insert into instructor
    select ID, name, dept_name, 18000
    from student
    where dept_name = 'Music' and tot_cred > 144;
• The select from where statement is evaluated fully before any of its results are inserted into the relation. Otherwise queries like
    insert into table1 select * from table1
  would cause problems.

Updates
• Give a 5% salary raise to all instructors
    update instructor
    set salary = salary * 1.05
• Give a 5% salary raise to those instructors who earn less than 70000
    update instructor
    set salary = salary * 1.05
    where salary < 70000;
• Give a 5% salary raise to instructors whose salary is less than average
    update instructor
    set salary = salary * 1.05
    where salary < (select avg (salary)
                    from instructor);
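The rule that a subquery is evaluated in full before any tuple is deleted or inserted can be observed on a four-row table. This is a sketch assuming SQLite via Python's sqlite3 module; the names and salaries are invented so that recomputing the average would change the outcome.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, salary real)")
conn.executemany("insert into instructor values (?, ?)",
                 [("A", 40000), ("B", 60000), ("C", 80000), ("D", 100000)])

# The average is 70000, so A and B are deleted. If the average were recomputed
# after each deletion, it would rise to 90000 once A and B are gone and C
# would be deleted as well; computing avg(salary) first prevents that.
conn.execute("delete from instructor where salary < (select avg(salary) from instructor)")
remaining = conn.execute("select name from instructor order by name").fetchall()

# Likewise the select is evaluated fully before inserting, so this statement
# doubles the table once instead of feeding on its own output.
conn.execute("insert into instructor select * from instructor")
count = conn.execute("select count(*) from instructor").fetchone()[0]
print(remaining, count)
```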
Updates (Cont.)
• Increase salaries of instructors whose salary is over $100,000 by 3%, and all others by 5%
  • Write two update statements:
        update instructor
        set salary = salary * 1.03
        where salary > 100000;
        update instructor
        set salary = salary * 1.05
        where salary <= 100000;
  • The order is important
  • Can be done better using the case statement (next slide)

Updates with Scalar Subqueries
• Recompute and update tot_cred value for all students
    update student S
    set tot_cred = (select sum(credits)
                    from takes, course
                    where takes.course_id = course.course_id and
                          S.ID = takes.ID and
                          takes.grade <> 'F' and
                          takes.grade is not null);
• Sets tot_cred to null for students who have not taken any course
• Instead of sum(credits), use:
    case
        when sum(credits) is not null then sum(credits)
        else 0
    end

Case Statement for Conditional Updates

• Same query as before but with case statement


update instructor
set salary = case
when salary <= 100000 then salary * 1.05
else salary * 1.03
end
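Why the order of the two update statements matters, and why the case form avoids the issue, can be seen with a salary just below the threshold. The sketch assumes SQLite via Python's sqlite3 module; the helper `salaries` and the three sample rows are illustrative, and salaries are rounded only to keep floating-point noise out of the comparison.

```python
import sqlite3

def salaries(statements):
    """Apply the update statements to a fresh table and return the salaries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (name text, salary real)")
    conn.executemany("insert into instructor values (?, ?)",
                     [("Low", 50000), ("Edge", 98000), ("High", 200000)])
    for stmt in statements:
        conn.execute(stmt)
    rows = conn.execute("select name, round(salary) from instructor order by name")
    return dict(rows.fetchall())

raise_3 = "update instructor set salary = salary * 1.03 where salary > 100000"
raise_5 = "update instructor set salary = salary * 1.05 where salary <= 100000"
with_case = """update instructor set salary = case
                   when salary <= 100000 then salary * 1.05
                   else salary * 1.03 end"""

right_order = salaries([raise_3, raise_5])
wrong_order = salaries([raise_5, raise_3])  # Edge crosses 100000, then gets 3% too
single_case = salaries([with_case])
print(right_order, wrong_order, single_case)
```

With the 5% raise applied first, the instructor earning 98000 moves above 100000 and then receives the 3% raise as well. The case statement tests each tuple once, so it matches the correctly ordered pair of updates.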

Entity-Relationship Model
Design Phases
• The initial phase of database design is to characterize fully the data needs of the prospective database users.
• Next, the designer chooses a data model and, by applying the concepts of the chosen data model, translates these requirements into a conceptual schema of the database.
• A fully developed conceptual schema also indicates the functional requirements of the enterprise. In a “specification of functional requirements”, users describe the kinds of operations (or transactions) that will be performed on the data.

Design Approaches
• Entity Relationship Model
  • Models an enterprise as a collection of entities and relationships
  • Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects
    • Described by a set of attributes
  • Relationship: an association among several entities
  • Represented diagrammatically by an entity-relationship diagram
• Normalization Theory
  • Formalize what designs are bad, and test for them

Design Phases (Cont.)


The process of moving from an abstract data model to the implementation of the database proceeds in two final design phases.

• Logical Design – Deciding on the database schema. Database design requires that we find a “good” collection of relation schemas.
  • Business decision – What attributes should we record in the database?
  • Computer Science decision – What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
• Physical Design – Deciding on the physical layout of the database

Outline of the ER Model
ER model -- Database Modeling
• The ER data model was developed to facilitate database design by allowing specification of an enterprise schema that represents the overall logical structure of a database.
• The ER model is very useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema. Because of this usefulness, many database-design tools draw on concepts from the ER model.
• The ER data model employs three basic concepts:
  • entity sets,
  • relationship sets,
  • attributes.
• The ER model also has an associated diagrammatic representation, the ER diagram, which can express the overall logical structure of a database graphically.

Entity Sets -- instructor and student
instructor (instructor_ID, instructor_name)    student (student_ID, student_name)

Entity Sets
• An entity is an object that exists and is distinguishable from other objects.
  • Example: specific person, company, event, plant
• An entity set is a set of entities of the same type that share the same properties.
  • Example: set of all persons, companies, trees, holidays
• An entity is represented by a set of attributes; i.e., descriptive properties possessed by all members of an entity set.
  • Example:
      instructor = (ID, name, street, city, salary)
      course = (course_id, title, credits)
• A subset of the attributes forms a primary key of the entity set; i.e., uniquely identifying each member of the set.

Relationship Sets
• A relationship is an association among several entities
  Example:
      44553 (Peltier)     advisor             22222 (Einstein)
      student entity      relationship set    instructor entity
• A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets
      {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
  where (e1, e2, …, en) is a relationship
  • Example:
      (44553, 22222) ∈ advisor
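The definition of a relationship set as a set of tuples drawn from the participating entity sets can be written down directly. The following is a minimal sketch in Python; the second student and second instructor are invented, and representing each entity by its primary-key value is an assumption made for the example.

```python
from itertools import product

# Entity sets, keyed by primary key (sample values are illustrative).
student = {"44553": "Peltier", "12345": "Shankar"}
instructor = {"22222": "Einstein", "45565": "Katz"}

# A relationship set is a set of (student, instructor) pairs.
advisor = {("44553", "22222"), ("12345", "45565")}

# Every relationship must draw its members from the participating entity sets,
# so advisor is a subset of the Cartesian product student x instructor.
all_pairs = set(product(student, instructor))
is_valid = advisor <= all_pairs
print(is_valid, ("44553", "22222") in advisor)
```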
Relationship Set advisor

Degree of a Relationship Set
• binary relationship
  • involve two entity sets (or degree two).
  • most relationship sets in a database system are binary.
• Relationships between more than two entity sets are rare. Most relationships are binary. (More on this later.)
  • Example: students work on research projects under the guidance of an instructor.
  • relationship proj_guide is a ternary relationship between instructor, student, and project

Relationship Sets (Cont.)
• An attribute can also be associated with a relationship set.
• For instance, the advisor relationship set between entity sets instructor and student may have the attribute date which tracks when the student started being associated with the advisor

Mapping Cardinality Constraints
• Express the number of entities to which another entity can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the following types:
  • One to one
  • One to many
  • Many to one
  • Many to many
Mapping Cardinalities
One to one    One to many
Note: Some elements in A and B may not be mapped to any elements in the other set

Mapping Cardinalities
Many to one    Many to many
Note: Some elements in A and B may not be mapped to any elements in the other set

Complex Attributes
• Attribute types:
  • Simple and composite attributes.
  • Single-valued and multivalued attributes
    • Example: multivalued attribute: phone_numbers
  • Derived attributes
    • Can be computed from other attributes
    • Example: age, given date_of_birth
• Domain – the set of permitted values for each attribute

Composite Attributes
Redundant Attributes
• Suppose we have entity sets:
  • instructor, with attributes: ID, name, dept_name, salary
  • department, with attributes: dept_name, building, budget
• We model the fact that each instructor has an associated department using a relationship set inst_dept
• The attribute dept_name appears in both entity sets. Since it is the primary key for the entity set department, it replicates information present in the relationship and is therefore redundant in the entity set instructor and needs to be removed.
• BUT: when converting back to tables, in some cases the attribute gets reintroduced, as we will see later.

Weak Entity Sets
• Consider a section entity, which is uniquely identified by a course_id, semester, year, and sec_id.
• Clearly, section entities are related to course entities. Suppose we create a relationship set sec_course between entity sets section and course.
• Note that the information in sec_course is redundant, since section already has an attribute course_id, which identifies the course with which the section is related.
• One option to deal with this redundancy is to get rid of the relationship sec_course; however, by doing so the relationship between section and course becomes implicit in an attribute, which is not desirable.

Weak Entity Sets (Cont.)
• An alternative way to deal with this redundancy is to not store the attribute course_id in the section entity and to only store the remaining attributes sec_id, year, and semester. However, the entity set section then does not have enough attributes to identify a particular section entity uniquely; although each section entity is distinct, sections for different courses may share the same sec_id, year, and semester.
• To deal with this problem, we treat the relationship sec_course as a special relationship that provides extra information, in this case, the course_id, required to identify section entities uniquely.
• The notion of weak entity set formalizes the above intuition. A weak entity set is one whose existence is dependent on another entity, called its identifying entity; instead of associating a primary key with a weak entity, we use the identifying entity, along with extra attributes called the discriminator, to uniquely identify a weak entity. An entity set that is not a weak entity set is termed a strong entity set.

Weak Entity Sets (Cont.)
• Every weak entity must be associated with an identifying entity; that is, the weak entity set is said to be existence dependent on the identifying entity set. The identifying entity set is said to own the weak entity set that it identifies. The relationship associating the weak entity set with the identifying entity set is called the identifying relationship.
• Note that the relational schema we eventually create from the entity set section does have the attribute course_id, for reasons that will become clear later, even though we have dropped the attribute course_id from the entity set section.
E-R Diagrams

Entity Sets
Entities can be represented graphically as follows:
• Rectangles represent entity sets.
• Attributes listed inside entity rectangle
• Underline indicates primary key attributes

Relationship Sets
Diamonds represent relationship sets.

Relationship Sets with Attributes
Roles
• Entity sets of a relationship need not be distinct
  • Each occurrence of an entity set plays a “role” in the relationship
• The labels “course_id” and “prereq_id” are called roles.

Cardinality Constraints
• We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line (–), signifying “many,” between the relationship set and the entity set.
• One-to-one relationship between an instructor and a student:
  • A student is associated with at most one instructor via the relationship advisor
  • A student is associated with at most one department via stud_dept

One-to-Many Relationship
• one-to-many relationship between an instructor and a student
  • an instructor is associated with several (including 0) students via advisor
  • a student is associated with at most one instructor via advisor

Many-to-One Relationships
• In a many-to-one relationship between an instructor and a student,
  • an instructor is associated with at most one student via advisor,
  • and a student is associated with several (including 0) instructors via advisor
Notation for Expressing More Complex Constraints

Many-to-Many Relationship A line may have an associated minimum and maximum cardinality,
shown in the form l..h, where l is the minimum and h the maximum
cardinality
A minimum value of 1 indicates total participation.
• An instructor is associated with several (possibly 0) students via advisor
A maximum value of 1 indicates that the entity participates in
• A student is associated with several (possibly 0) instructors via advisor at most one relationship
A maximum value of * indicates no limit.

Instructor can advise 0 or more students. A student must have


1 advisor; cannot have multiple advisors

Total and Partial Participation Notation to Express Entity with Complex Attributes

Total participation (indicated by double line): every entity in the


entity set participates in at least one relationship in the relationship
set

participation of student in advisor relation is total


 every student must have an associated instructor
Partial participation: some entities may not participate in any
relationship in the relationship set
Example: participation of instructor in advisor is partial
Expressing Weak Entity Sets

• In E-R diagrams, a weak entity set is depicted via a double


rectangle.
• We underline the discriminator of a weak entity set with a
dashed line.
• The relationship set connecting the weak entity set to the
identifying strong entity set is depicted by a double diamond.
• Primary key for section – (course_id, sec_id, semester, year)
Reduction to Relation Schemas

E-R Diagram for a University Enterprise Reduction to Relation Schemas


• Entity sets and relationship sets can be expressed
uniformly as relation schemas that represent the
contents of the database.
• A database which conforms to an E-R diagram can be
represented by a collection of schemas.
• For each entity set and relationship set there is a
unique schema that is assigned the name of the
corresponding entity set or relationship set.
• Each schema has a number of columns (generally
corresponding to attributes), which have unique
names.
Representation of Entity Sets with Composite Attributes

Representing Entity Sets • Composite attributes are flattened out by


creating a separate attribute for each
• A strong entity set reduces to a schema with the same component attribute
attributes
• Example: given entity set instructor with
student(ID, name, tot_cred) composite attribute name with component
attributes first_name and last_name the
schema corresponding to the entity set has
• A weak entity set becomes a table that includes a column two attributes name_first_name and
for the primary key of the identifying strong entity set name_last_name
• Prefix omitted if there is no ambiguity
section ( course_id, sec_id, sem, year ) (name_first_name could be first_name)
• Ignoring multivalued attributes, extended
instructor schema is
• instructor(ID,
first_name, middle_initial, last_name,
street_number, street_name,
apt_number, city, state, zip_code,
date_of_birth)

Representing Relationship Sets Representation of Entity Sets with Multivalued Attributes

• A many-to-many relationship set is represented as a schema • A multivalued attribute M of an entity E is


with attributes for the primary keys of the two participating
entity sets, and any descriptive attributes of the relationship represented by a separate schema EM
set.
• Schema EM has attributes corresponding to the
• Example: schema for relationship set advisor
primary key of E and an attribute corresponding
advisor = (s_id, i_id) to multivalued attribute M
• Example: Multivalued attribute phone_number
of instructor is represented by a schema:
inst_phone= ( ID, phone_number)
• Each value of the multivalued attribute maps to
a separate tuple of the relation on schema EM
• For example, an instructor entity with primary key
22222 and phone numbers 456-7890 and 123-4567
maps to two tuples:
(22222, 456-7890) and (22222, 123-4567)
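
The mapping of a multivalued attribute to its own schema can be sketched with Python's built-in sqlite3 module. This is only an illustration: the instructor name 'Einstein' and the column sizes are assumptions made for the example, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID varchar(5) primary key, name varchar(20));
    -- Schema EM: primary key of E plus one column for the multivalued attribute M
    create table inst_phone (
        ID varchar(5) references instructor,
        phone_number varchar(12),
        primary key (ID, phone_number)
    );
    -- 'Einstein' is a placeholder name (assumption)
    insert into instructor values ('22222', 'Einstein');
    -- each phone number of the entity becomes its own tuple
    insert into inst_phone values ('22222', '456-7890');
    insert into inst_phone values ('22222', '123-4567');
""")

phones = conn.execute(
    "select phone_number from inst_phone where ID = ? order by phone_number",
    ("22222",),
).fetchall()
print(phones)
```

The instructor relation keeps one tuple per entity, while inst_phone grows by one tuple for every additional phone number.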
Redundancy of Schemas
Many-to-one and one-to-many relationship sets that are total on the
Redundancy of Schemas (Cont.)
many-side can be represented by adding an extra attribute to the • The schema corresponding to a relationship set
“many” side, containing the primary key of the “one” side
linking a weak entity set to its identifying strong
Example: Instead of creating a schema for relationship set inst_dept,
add an attribute dept_name to the schema arising from entity set entity set is redundant.
instructor

• Example: The section schema already contains


the attributes that would appear in the
sec_course schema

Redundancy of Schemas (Cont.)


• For one-to-one relationship sets, either
side can be chosen to act as the “many”
side
• That is, an extra attribute can be added to
either of the tables corresponding to the two
entity sets Advanced Topics
• If participation is partial on the “many”
side, replacing a schema by an extra
attribute in the schema corresponding to
the “many” side could result in null values
Non-binary Relationship Sets

Specialization
• Top-down design process; we designate sub-
• Most relationship sets are binary
groupings within an entity set that are distinctive
• There are occasions when it is more from other entities in the set.
convenient to represent relationships as • These sub-groupings become lower-level entity sets
that have attributes or participate in relationships
non-binary. that do not apply to the higher-level entity set.
• E-R Diagram with a Ternary Relationship • Depicted by a triangle component labeled ISA (e.g.,
instructor “is a” person).
• Attribute inheritance – a lower-level entity set
inherits all the attributes and relationship
participation of the higher-level entity set to which
it is linked.

Cardinality Constraints on Ternary Relationship

• We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a
cardinality constraint
• For example, an arrow from proj_guide to instructor indicates each student has at most one
guide for a project
• If there is more than one arrow, there are two ways of defining the meaning.
• For example, a ternary relationship R between A, B and C with arrows to B and C could mean
1. Each A entity is associated with a unique entity from B and C or
2. Each pair of entities from (A, B) is associated with a unique C entity, and each pair
(A, C) is associated with a unique B
• Each alternative has been used in different formalisms
• To avoid confusion we outlaw more than one arrow

Specialization Example
• Overlapping – employee and student
• Disjoint – instructor and secretary
• Total and partial
Representing Specialization via Schemas

• Method 1:
• Form a schema for the higher-level entity
• Form a schema for each lower-level entity set, include primary key of higher-level
entity set and local attributes

schema      attributes
person      ID, name, street, city
student     ID, tot_cred
employee    ID, salary

• Drawback: getting information about an employee requires accessing two relations, the
one corresponding to the low-level schema and the one corresponding to the high-level
schema

Generalization
• A bottom-up design process – combine a number of entity sets that share the same
features into a higher-level entity set.
• Specialization and generalization are simple inversions of each other; they are
represented in an E-R diagram in the same way.
• The terms specialization and generalization are used interchangeably.

Representing Specialization as Schemas (Cont.)

• Method 2:
• Form a schema for each entity set with all local and inherited attributes

schema      attributes
person      ID, name, street, city
student     ID, name, street, city, tot_cred
employee    ID, name, street, city, salary

• Drawback: name, street and city may be stored redundantly for people who are both
students and employees

Design Constraints on a Specialization/Generalization

• Completeness constraint -- specifies whether or not an entity in the higher-level entity
set must belong to at least one of the lower-level entity sets within a generalization.
• total: an entity must belong to one of the lower-level entity sets
• partial: an entity need not belong to one of the lower-level entity sets
• Partial generalization is the default. We can specify total generalization in an ER
diagram by adding the keyword total in the diagram and drawing a dashed line
from the keyword to the corresponding hollow arrow-head to which it applies (for
a total generalization), or to the set of hollow arrow-heads to which it applies (for
an overlapping generalization).
• The student generalization is total: All student entities must be either graduate or
undergraduate. Because the higher-level entity set arrived at through
generalization is generally composed of only those entities in the lower-level entity
sets, the completeness constraint for a generalized higher-level entity set is usually
total
Aggregation
Consider the ternary relationship proj_guide, which we saw earlier
Suppose we want to record evaluations of a student by a guide on a project

Aggregation (Cont.)
• Relationship sets eval_for and proj_guide represent overlapping information
• Every eval_for relationship corresponds to a proj_guide relationship
• However, some proj_guide relationships may not correspond to any eval_for relationships
• So we can’t discard the proj_guide relationship
• Eliminate this redundancy via aggregation
• Treat relationship as an abstract entity
• Allows relationships between relationships
• Abstraction of relationship into new entity

Aggregation (Cont.)
• Eliminate this redundancy via aggregation. Without introducing redundancy, the
following diagram represents:
• A student is guided by a particular instructor on a particular project
• A student, instructor, project combination may have an associated evaluation

Representing Aggregation via Schemas
To represent aggregation, create a schema containing
• Primary key of the aggregated relationship,
• The primary key of the associated entity set
• Any descriptive attributes
In our example:
The schema eval_for is:
eval_for (s_ID, project_id, i_ID, evaluation_id)
The schema proj_guide is redundant.
Entities vs. Relationship sets
• Use of entity sets vs. relationship sets

Possible guideline is to designate a relationship set to describe an action that occurs between entities

Design Issues • Placement of relationship attributes

For example, attribute date as attribute of advisor or as


attribute of student

Entities vs. Attributes Binary Vs. Non-Binary Relationships


• Use of entity sets vs. attributes • Although it is possible to replace any non-binary
(n-ary, for n > 2) relationship set by a number of
distinct binary relationship sets, a n-ary
relationship set shows more clearly that several
entities participate in a single relationship.
• Some relationships that appear to be non-binary
may be better represented using binary
relationships
• For example, a ternary relationship parents, relating a
• Use of phone as an entity allows extra information child to his/her father and mother, is best replaced by
two binary relationships, father and mother
about phone numbers (plus multiple phone • Using two binary relationships allows partial information
numbers) (e.g., only mother being known)
• But there are some relationships that are naturally non-
binary
• Example: proj_guide
Converting Non-Binary Relationships to Binary Form

• In general, any non-binary relationship can be represented using binary relationships by
creating an artificial entity set.
• Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
1. RA, relating E and A
2. RB, relating E and B
3. RC, relating E and C
• Create an identifying attribute for E and add any attributes of R to E
• For each relationship (ai , bi , ci) in R, create
1. a new entity ei in the entity set E
2. add (ei , ai ) to RA
3. add (ei , bi ) to RB
4. add (ei , ci ) to RC

E-R Design Decisions

• The use of an attribute or entity set to represent an object.
• Whether a real-world concept is best expressed by an entity set or a relationship set.
• The use of a ternary relationship versus a pair of binary relationships.
• The use of a strong or weak entity set.
• The use of specialization/generalization – contributes to modularity in the design.
• The use of aggregation – can treat the aggregate entity set as a single unit without
concern for the details of its internal structure.
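
The conversion of a ternary relationship R into an artificial entity set E and three binary relationship sets can be sketched in Python as follows. The function name to_binary, the generated identifiers e1, e2, … and the sample tuples are hypothetical choices made for illustration only.

```python
from itertools import count


def to_binary(R):
    """Replace ternary relationship R by entity set E and binary sets RA, RB, RC."""
    ids = count(1)
    E, RA, RB, RC = [], [], [], []
    for a, b, c in R:
        e = f"e{next(ids)}"   # identifying attribute for the new entity in E
        E.append(e)
        RA.append((e, a))     # relates e to its A entity
        RB.append((e, b))     # relates e to its B entity
        RC.append((e, c))     # relates e to its C entity
    return E, RA, RB, RC


# Illustrative instance: (student, instructor, project) triples
R = [("s1", "i1", "p1"), ("s2", "i1", "p2")]
E, RA, RB, RC = to_binary(R)

# R can be recovered by joining the three binary sets on the artificial entity
recovered = [(dict(RA)[e], dict(RB)[e], dict(RC)[e]) for e in E]
print(E, recovered)
```

Because every new entity appears exactly once in each of RA, RB and RC, the original triples can be reconstructed. Nothing in the binary schema itself enforces that, which is why constraints also have to be translated.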

Converting Non-Binary Relationships (Cont.) Summary of Symbols Used in E-R Notation

• Also need to translate constraints


• Translating all constraints may not be possible
• There may be instances in the translated schema that
cannot correspond to any instance of R
• Exercise: add constraints to the relationships RA, RB and RC
to ensure that a newly created entity corresponds to
exactly one entity in each of entity sets A, B and C
• We can avoid creating an identifying attribute by
making E a weak entity set (described shortly)
identified by the three relationship sets
Symbols Used in E-R Notation (Cont.) Alternative ER Notations
Chen IDEF1X (Crows feet notation)

Alternative ER Notations
• Chen, IDEF1X, …
UML
• UML: Unified Modeling Language
• UML has many components to graphically
model different aspects of an entire software
system
• UML Class Diagrams correspond to E-R Diagram,
but several differences.
ER vs. UML Class Diagrams
UML Class Diagrams (Cont.)
• Binary relationship sets are represented in UML
by just drawing a line connecting the entity
sets. The relationship set name is written
adjacent to the line.
• The role played by an entity set in a relationship
set may also be specified by writing the role
name on the line, adjacent to the entity set.
• The relationship set name may alternatively be
written in a box, along with attributes of the
relationship set, and the box is connected, using
a dotted line, to the line depicting the
relationship set.
*Note reversal of position in cardinality constraint depiction

ER vs. UML Class Diagrams

ER Diagram Notation    Equivalent in UML

End of Chapter 7

*Generalization can use merged or separate arrows independent


of disjoint/overlapping
Joined Relations

• Join operations take two relations and return as a result another relation.
• A join operation is a Cartesian product which requires that tuples in the two relations match
(under some condition). It also specifies the attributes that are present in the result of the join

Unit 3: Intermediate SQL • The join operations are typically used as subquery expressions in the from clause
• Three types of joins:
• Natural join
• Inner join
• Outer join

Outline

• Join Expressions
• Views
• Transactions
• Integrity Constraints
• SQL Data Types and Schemas
• Index Definition in SQL
• Authorization

Natural Join in SQL

• Natural join matches tuples with the same values for all common attributes, and retains only one copy
of each common column.
• List the names of students along with the course ID of the courses that they took
• select name, course_id
from student, takes
where student.ID = takes.ID;
• Same query in SQL with “natural join” construct
• select name, course_id
from student natural join takes;
Natural Join in SQL (Cont.) Takes Relation

• The from clause can have multiple relations combined using natural
join:
select A1, A2, … An
from r1 natural join r2 natural join .. natural join rn
where P ;

Student Relation student natural join takes


Dangers of Natural Join
• Beware of unrelated attributes with same name which get equated incorrectly
• Example -- List the names of students along with the titles of courses that they have taken
• Correct version
select name, title
from student natural join takes, course
where takes.course_id = course.course_id;
• Incorrect version
select name, title
from student natural join takes natural join course;
• This query omits all (student name, course title) pairs where the student takes a course in a
department other than the student's own department.
• The correct version (above), correctly outputs such pairs.

Outer Join Examples
• Relation course
• Relation prereq
• Observe that
course information is missing for CS-347
prereq information is missing for CS-315
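
The natural-join pitfall described above (student and course both carry a dept_name attribute) can be reproduced with a small sqlite3 sketch. The table contents are invented for illustration, and the schemas are cut down to the columns the two queries need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student (ID varchar(5) primary key, name varchar(20), dept_name varchar(20));
    create table course  (course_id varchar(8) primary key, title varchar(50), dept_name varchar(20));
    create table takes   (ID varchar(5), course_id varchar(8));
    -- illustrative rows (assumption): one Comp. Sci. student taking two courses
    insert into student values ('12345', 'Shankar', 'Comp. Sci.');
    insert into course values ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.');
    insert into course values ('BIO-101', 'Intro. to Biology', 'Biology');
    insert into takes values ('12345', 'CS-101');
    insert into takes values ('12345', 'BIO-101');
""")

# Correct version: only course_id is equated between takes and course
correct = conn.execute("""
    select name, title
    from student natural join takes, course
    where takes.course_id = course.course_id
    order by title
""").fetchall()

# Incorrect version: the second natural join also equates dept_name
incorrect = conn.execute("""
    select name, title
    from student natural join takes natural join course
    order by title
""").fetchall()

print(correct)
print(incorrect)
```

The second query silently drops the Biology course because the student's dept_name does not equal the course's dept_name.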

Outer Join

• An extension of the join operation that avoids loss of information.
• Computes the join and then adds tuples from one relation that do not match tuples in the other
relation to the result of the join.
• Uses null values.
• Three forms of outer join:
• left outer join
• right outer join
• full outer join

Left Outer Join

• course natural left outer join prereq
▪ In relational algebra: course ⟕ prereq
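
A left outer join on course and prereq can be tried out with sqlite3. The rows below are assumptions chosen to match the remark that course information is missing for CS-347 and prereq information is missing for CS-315. The relations shown in the original figures may contain more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table course (course_id varchar(8), title varchar(50));
    create table prereq (course_id varchar(8), prereq_id varchar(8));
    -- illustrative rows (assumption)
    insert into course values ('BIO-301', 'Genetics');
    insert into course values ('CS-190', 'Game Design');
    insert into course values ('CS-315', 'Robotics');
    insert into prereq values ('BIO-301', 'BIO-101');
    insert into prereq values ('CS-190', 'CS-101');
    insert into prereq values ('CS-347', 'CS-101');
""")

inner = conn.execute("""
    select course_id, prereq_id
    from course natural join prereq
    order by course_id
""").fetchall()

# The left outer join keeps CS-315 and pads the prereq columns with null
left = conn.execute("""
    select course_id, prereq_id
    from course natural left outer join prereq
    order by course_id
""").fetchall()

print(inner)
print(left)
```

CS-315 appears only in the outer join, with None (SQL null) as its prereq_id. CS-347 is absent from both results because it has no tuple in course, the left-hand relation.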
Right Outer Join Joined Types and Conditions
• course natural right outer join prereq • Join operations take two relations and return as a result another
relation.
• These additional operations are typically used as subquery expressions
in the from clause
• Join condition – defines which tuples in the two relations match.
• Join type – defines how tuples in each relation that do not match any
tuple in the other relation (based on the join condition) are treated.

▪ In relational algebra: course ⟖ prereq

Full Outer Join Joined Relations – Examples


• course natural right outer join prereq
• course natural full outer join prereq

• course full outer join prereq using (course_id)

• In relational algebra: course ⟗ prereq


Joined Relations – Examples
• course inner join prereq on
course.course_id = prereq.course_id
• What is the difference between the above, and a natural join?
• course left outer join prereq on
course.course_id = prereq.course_id

Views
• In some cases, it is not desirable for all users to see the entire logical model (that
is, all the actual relations stored in the database.)
• Consider a person who needs to know an instructor's name and department, but
not the salary. This person should see a relation described, in SQL, by

select ID, name, dept_name
from instructor

• A view provides a mechanism to hide certain data from the view of certain users.
• Any relation that is not part of the conceptual model but is made visible to a user as a
“virtual relation” is called a view.

Joined Relations – Examples View Definition


• course natural right outer join prereq
• A view is defined using the create view statement which has the form

create view v as < query expression >

where <query expression> is any legal SQL expression. The view name is represented by v.
• course full outer join prereq using (course_id) • Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.
• View definition is not the same as creating a new relation by evaluating the query expression
• Rather, a view definition causes the saving of an expression; the expression is substituted into queries
using the view.
View Definition and Use
• A view of instructors without their salary
create view faculty as
select ID, name, dept_name
from instructor
• Find all instructors in the Biology department
select name
from faculty
where dept_name = 'Biology'
• Create a view of department salary totals
create view departments_total_salary(dept_name, total_salary) as
select dept_name, sum (salary)
from instructor
group by dept_name;

Views Defined Using Other Views
• create view physics_fall_2017 as
select course.course_id, sec_id, building, room_number
from course, section
where course.course_id = section.course_id
and course.dept_name = 'Physics'
and section.semester = 'Fall'
and section.year = '2017';
• create view physics_fall_2017_watson as
select course_id, room_number
from physics_fall_2017
where building= 'Watson';
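
The faculty and departments_total_salary views can be exercised with sqlite3 as a quick check. The instructor rows below are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (
        ID varchar(5) primary key,
        name varchar(20),
        dept_name varchar(20),
        salary integer
    );
    -- illustrative rows (assumption)
    insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
    insert into instructor values ('45565', 'Katz', 'Comp. Sci.', 75000);
    insert into instructor values ('76766', 'Crick', 'Biology', 72000);

    create view faculty as
        select ID, name, dept_name
        from instructor;

    create view departments_total_salary(dept_name, total_salary) as
        select dept_name, sum(salary)
        from instructor
        group by dept_name;
""")

# The view hides salary: only three columns are visible through it
cursor = conn.execute("select * from faculty")
faculty_columns = [col[0] for col in cursor.description]

biology = conn.execute(
    "select name from faculty where dept_name = 'Biology'"
).fetchall()
totals = dict(conn.execute("select * from departments_total_salary").fetchall())
print(faculty_columns, biology, totals)
```

Both views are evaluated against the current contents of instructor each time they are queried; nothing is copied when the view is created.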

Views Defined Using Other Views

• One view may be used in the expression defining another view
• A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression
defining v1
• A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or
there is a path of dependencies from v1 to v2
• A view relation v is said to be recursive if it depends on itself.

View Expansion

• Expand the view :
create view physics_fall_2017_watson as
select course_id, room_number
from physics_fall_2017
where building= 'Watson'
• To:
create view physics_fall_2017_watson as
select course_id, room_number
from (select course.course_id, sec_id, building, room_number
from course, section
where course.course_id = section.course_id
and course.dept_name = 'Physics'
and section.semester = 'Fall'
and section.year = '2017')
where building= 'Watson';
View Expansion (Cont.)
• A way to define the meaning of views defined in terms of other views.
• Let view v1 be defined by an expression e1 that may itself contain uses
of view relations.
• View expansion of an expression repeats the following replacement
step:
repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
• As long as the view definitions are not recursive, this loop will
terminate

Update of a View
• Add a new tuple to faculty view which we defined earlier
insert into faculty
values ('30765', 'Green', 'Music');
• This insertion must be represented by the insertion into the instructor relation
• Must have a value for salary.
• Two approaches
1. Reject the insert operation and send an error message to user
2. Insert the tuple
('30765', 'Green', 'Music', null)
into the instructor relation
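
The repeat/until loop for view expansion can be sketched as plain text substitution in Python. This is a toy model, not how a real query processor works: it assumes view names appear as whole words in the query text, that the definitions are not recursive, and it uses simplified view definitions invented for the example.

```python
import re


def expand(expr, views):
    """Replace view names in expr by their definitions until none remain."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, views)) + r")\b")
    while True:
        match = pattern.search(expr)          # find any view relation vi in e1
        if match is None:                     # no more view relations present
            return expr
        definition = views[match.group(1)]    # the expression defining vi
        expr = expr[:match.start()] + "(" + definition + ")" + expr[match.end():]


# Simplified, hypothetical definitions of the two views used in the slides
views = {
    "physics_fall_2017": "select course_id, building from section where year = 2017",
    "physics_fall_2017_watson":
        "select course_id from physics_fall_2017 where building = 'Watson'",
}

expanded = expand("select * from physics_fall_2017_watson", views)
print(expanded)
```

With a recursive definition the loop would never run out of view names to replace, which is why termination depends on the definitions being non-recursive.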

Some Updates Cannot be Translated Uniquely


Materialized Views
• create view instructor_info as
select ID, name, building
• Certain database systems allow view relations to be physically stored. from instructor, department
• Physical copy created when the view is defined. where instructor.dept_name = department.dept_name;
• Such views are called Materialized view: • insert into instructor_info
• If relations used in the query are updated, the materialized view values ('69987', 'White', 'Taylor');
result becomes out of date
• Need to maintain the view, by updating the view whenever the • Issues
underlying relations are updated. • Which department, if multiple departments in Taylor?
• What if no department is in Taylor?
And Some Not at All
• create view history_instructors as
select *
from instructor
where dept_name= 'History';
• What happens if we insert
('25566', 'Brown', 'Biology', 100000)
into history_instructors?

Transactions
• A transaction consists of a sequence of query and/or update statements and
is a “unit” of work
• The SQL standard specifies that a transaction begins implicitly when an SQL
statement is executed.
• The transaction must end with one of the following statements:
• Commit work. The updates performed by the transaction become
permanent in the database.
• Rollback work. All the updates performed by the SQL statements in the
transaction are undone.
• Atomic transaction
• either fully executed or rolled back as if it never occurred
• Isolation from concurrent transactions
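
Commit and rollback can be observed with Python's sqlite3 module, which by default opens a transaction implicitly before the first update statement. The instructor rows and the raise of 1000 are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID varchar(5) primary key, salary integer)")
conn.execute("insert into instructor values ('10101', 65000)")
conn.execute("insert into instructor values ('76766', 72000)")
conn.commit()  # the initial rows become permanent

# A transaction starts implicitly with the first update ...
conn.execute("update instructor set salary = salary + 1000 where ID = '10101'")
conn.execute("update instructor set salary = salary + 1000 where ID = '76766'")
conn.rollback()  # ... and rollback undoes both updates together
after_rollback = conn.execute("select salary from instructor order by ID").fetchall()

# The same updates again, this time made permanent with commit
conn.execute("update instructor set salary = salary + 1000 where ID = '10101'")
conn.execute("update instructor set salary = salary + 1000 where ID = '76766'")
conn.commit()
after_commit = conn.execute("select salary from instructor order by ID").fetchall()

print(after_rollback, after_commit)
```

After the rollback neither salary has changed, which is the "as if it never occurred" behaviour of an atomic transaction.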

View Updates in SQL Integrity Constraints

• Integrity constraints guard against accidental damage to the database, by ensuring that
• Most SQL implementations allow updates only on simple views
authorized changes to the database do not result in a loss of data consistency.
• The from clause has only one database relation.
• A checking account must have a balance greater than $10,000.00
• The select clause contains only attribute names of the relation,
• A salary of a bank employee must be at least $4.00 an hour
and does not have any expressions, aggregates, or distinct
specification. • A customer must have a (non-null) phone number
• Any attribute not listed in the select clause can be set to null
• The query does not have a group by or having clause.
Constraints on a Single Relation Unique Constraints

• not null • unique ( A1, A2, …, Am)


• primary key • The unique specification states that the attributes A1, A2, …, Am form a candidate
key.
• unique
• Candidate keys are permitted to be null (in contrast to primary keys).
• check (P), where P is a predicate

Not Null Constraints

• not null
• Declare name and budget to be not null
name varchar(20) not null
budget numeric(12,2) not null

The check clause

• The check (P) clause specifies a predicate P that must be satisfied by every tuple in a
relation.
• Example: ensure that semester is one of fall, winter, spring or summer
create table section
(course_id varchar (8),
sec_id varchar (8),
semester varchar (6),
year numeric (4,0),
building varchar (15),
room_number varchar (7),
time_slot_id varchar (4),
primary key (course_id, sec_id, semester, year),
check (semester in ('Fall', 'Winter', 'Spring', 'Summer')))
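
The effect of the check clause can be tested with sqlite3: an insert whose semester is outside the allowed list is rejected. The inserted values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table section
        (course_id    varchar (8),
         sec_id       varchar (8),
         semester     varchar (6),
         year         numeric (4,0),
         building     varchar (15),
         room_number  varchar (7),
         time_slot_id varchar (4),
         primary key (course_id, sec_id, semester, year),
         check (semester in ('Fall', 'Winter', 'Spring', 'Summer')))
""")

insert = "insert into section (course_id, sec_id, semester, year) values (?, ?, ?, ?)"
conn.execute(insert, ("CS-101", "1", "Fall", 2017))  # satisfies the predicate

try:
    conn.execute(insert, ("CS-101", "2", "Autumn", 2017))  # violates the predicate
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("select count(*) from section").fetchone()[0]
print(rejected, row_count)
```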
Referential Integrity

• Ensures that a value that appears in one relation for a given set of
attributes also appears for a certain set of attributes in another relation.
• Example: If “Biology” is a department name appearing in one of the
tuples in the instructor relation, then there exists a tuple in the
department relation for “Biology”.
• Let A be a set of attributes. Let R and S be two relations that contain
attributes A and where A is the primary key of S. A is said to be a foreign
key of R if for any values of A appearing in R these values also appear in S.

Cascading Actions in Referential Integrity
• When a referential-integrity constraint is violated, the normal procedure is to reject the
action that caused the violation.
• An alternative, in case of delete or update is to cascade
create table course (
…
dept_name varchar(20),
foreign key (dept_name) references department
on delete cascade
on update cascade,
. . .)
• Instead of cascade we can use :
• set null,
• set default
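
Cascading updates and deletes can be observed in SQLite, which only enforces foreign keys after `pragma foreign_keys = on`. The column lists are shortened and the rows and the new department name 'Life Sci.' are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite leaves enforcement off by default
conn.executescript("""
    create table department (dept_name varchar(20) primary key, building varchar(15));
    create table course (
        course_id varchar(8) primary key,
        title     varchar(50),
        dept_name varchar(20),
        foreign key (dept_name) references department
            on delete cascade
            on update cascade
    );
    -- illustrative rows (assumption)
    insert into department values ('Biology', 'Watson');
    insert into course values ('BIO-101', 'Intro. to Biology', 'Biology');
""")

# Renaming the department cascades to the referencing course tuple
conn.execute("update department set dept_name = 'Life Sci.' where dept_name = 'Biology'")
after_update = conn.execute("select dept_name from course").fetchall()

# Deleting the department cascades and removes the course tuple as well
conn.execute("delete from department where dept_name = 'Life Sci.'")
after_delete = conn.execute("select count(*) from course").fetchone()[0]

print(after_update, after_delete)
```

Without the two cascade clauses, both the update and the delete would be rejected as referential-integrity violations.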

Referential Integrity (Cont.)

• Foreign keys can be specified as part of the SQL create table statement
foreign key (dept_name) references department
• By default, a foreign key references the primary-key attributes of the referenced table.
• SQL allows a list of attributes of the referenced relation to be specified explicitly.
foreign key (dept_name) references department (dept_name)

Integrity Constraint Violation During Transactions

• Consider:
create table person (
ID char(10),
name char(40),
mother char(10),
father char(10),
primary key (ID),
foreign key (father) references person,
foreign key (mother) references person)
• How to insert a tuple without causing constraint violation?
• Insert father and mother of a person before inserting person
• OR, set father and mother to null initially, update after inserting all persons (not possible if
father and mother attributes declared to be not null)
• OR defer constraint checking
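
The third option, deferring constraint checking, can be sketched in SQLite by declaring the foreign keys `deferrable initially deferred`, so that they are checked at commit time rather than per statement. The person IDs and names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")
conn.execute("""
    create table person (
        ID     char(10),
        name   char(40),
        mother char(10),
        father char(10),
        primary key (ID),
        foreign key (father) references person deferrable initially deferred,
        foreign key (mother) references person deferrable initially deferred)
""")

# The child is inserted before its parents; the check waits until commit
conn.execute("insert into person values ('3', 'Child', '1', '2')")
conn.execute("insert into person values ('1', 'Mother', null, null)")
conn.execute("insert into person values ('2', 'Father', null, null)")
conn.commit()
person_count = conn.execute("select count(*) from person").fetchone()[0]

# A tuple whose parents never arrive makes the commit itself fail
conn.execute("insert into person values ('4', 'Orphan', '8', '9')")
try:
    conn.commit()
    commit_failed = False
except sqlite3.IntegrityError:
    commit_failed = True
    conn.rollback()

print(person_count, commit_failed)
```

With the default (immediate) checking, the first insert would already be rejected, because persons '1' and '2' do not exist yet.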
Complex Check Conditions

• The predicate in the check clause can be an arbitrary predicate that can include a
subquery.
check (time_slot_id in (select time_slot_id from time_slot))
The check condition states that the time_slot_id in each tuple in the section relation is
actually the identifier of a time slot in the time_slot relation.
• The condition has to be checked not only when a tuple is inserted or modified in
section, but also when the relation time_slot changes

Built-in Data Types in SQL
• date: Dates, containing a (4 digit) year, month and day
• Example: date '2005-7-27'
• time: Time of day, in hours, minutes and seconds.
• Example: time '09:00:30' time '09:00:30.75'
• timestamp: date plus time of day
• Example: timestamp '2005-7-27 09:00:30.75'
• interval: period of time
• Example: interval '1' day
• Subtracting a date/time/timestamp value from another gives an interval value
• Interval values can be added to date/time/timestamp values

Assertions Large-Object Types

• An assertion is a predicate expressing a condition that we wish the database always to satisfy. • Large objects (photos, videos, CAD files, etc.) are stored as a large object:
• blob: binary large object -- object is a large collection of uninterpreted binary data
• The following constraints, can be expressed using assertions: (whose interpretation is left to an application outside of the database system)
• For each tuple in the student relation, the value of the attribute tot_cred must equal the sum of • clob: character large object -- object is a large collection of character data
credits of courses that the student has completed successfully.
• When a query returns a large object, a pointer is returned rather than the large object itself.
• An instructor cannot teach in two different classrooms in a semester in the same time slot
• An assertion in SQL takes the form:
create assertion <assertion-name> check (<predicate>);
User-Defined Types Index Creation

• Many queries reference only a small proportion of the records in a table.


• create type construct in SQL creates user-defined type • It is inefficient for the system to read every record to find a record with particular
value
create type Dollars as numeric (12,2) final • An index on an attribute of a relation is a data structure that allows the database
system to find those tuples in the relation that have a specified value for that
• Example: attribute efficiently, without scanning through all the tuples of the relation.
create table department • We create an index with the create index command
(dept_name varchar (20),
building varchar (15), create index <name> on <relation-name> (attribute);
budget Dollars);

Domains
• create domain construct in SQL-92 creates user-defined domain types
create domain person_name char(20) not null
• Types and domains are similar. Domains can have constraints, such as not null,
specified on them.
• Example:
create domain degree_level varchar(10)
constraint degree_level_test
check (value in ('Bachelors', 'Masters', 'Doctorate'));

Index Creation Example
• create table student
(ID varchar (5),
name varchar (20) not null,
dept_name varchar (20),
tot_cred numeric (3,0) default 0,
primary key (ID))
• create index studentID_index on student(ID)
• The query:
select *
from student
where ID = '12345'
can be executed by using the index to find the required record, without looking at
all records of student
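
Whether a query uses an index can be checked in SQLite with `explain query plan`. SQLite already maintains an implicit index for a declared primary key, so this sketch indexes the non-key attribute dept_name instead to make the effect visible. The index name student_dept_index and the inserted row are assumptions, and the wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student
        (ID        varchar (5),
         name      varchar (20) not null,
         dept_name varchar (20),
         tot_cred  numeric (3,0) default 0,
         primary key (ID));
    -- hypothetical index on a non-key attribute
    create index student_dept_index on student(dept_name);
    insert into student (ID, name, dept_name) values ('12345', 'Shankar', 'Comp. Sci.');
""")

plan = conn.execute(
    "explain query plan select * from student where dept_name = 'Comp. Sci.'"
).fetchall()
# The human-readable description is the last column of each plan row
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The plan should mention student_dept_index, indicating a lookup through the index rather than a scan of every tuple in student.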
Authorization
• We may assign a user several forms of authorizations on parts of the database.
• Read - allows reading, but not modification of data.
• Insert - allows insertion of new data, but not modification of existing data.
• Update - allows modification, but not deletion of data.
• Delete - allows deletion of data.
• Each of these types of authorizations is called a privilege. We may authorize the user all, none, or a combination of these types of privileges on specified parts of a database, such as a relation or a view.

Authorization (Cont.)
• Forms of authorization to modify the database schema
• Index - allows creation and deletion of indices.
• Resources - allows creation of new relations.
• Alteration - allows addition or deletion of attributes in a relation.
• Drop - allows deletion of relations.

Authorization Specification in SQL
• The grant statement is used to confer authorization
grant <privilege list> on <relation or view> to <user list>
• <user list> is:
• a user-id
• public, which allows all valid users the privilege granted
• A role (more on this later)
• Example:
• grant select on department to Amit, Satoshi
• Granting a privilege on a view does not imply granting any privileges on the underlying relations.
• The grantor of the privilege must already hold the privilege on the specified item (or be the database administrator).

Privileges in SQL
• select: allows read access to relation, or the ability to query using the view
• Example: grant users U1, U2, and U3 select authorization on the instructor relation:
grant select on instructor to U1, U2, U3
• insert: the ability to insert tuples
• update: the ability to update using the SQL update statement
• delete: the ability to delete tuples.
• all privileges: used as a short form for all the allowable privileges
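How a grant is recorded and later checked can be sketched with a toy model. This is not how any particular database engine stores authorizations. The names grants, grant, and is_authorized are hypothetical, and "all privileges" is assumed to cover only the four privileges listed on the slide.

```python
# Toy model: (user, relation) -> set of privileges held on that relation.
ALL_PRIVILEGES = {"select", "insert", "update", "delete"}  # assumption: slide's four only
grants = {}

def grant(privilege_list, relation, user_list):
    """Record: grant <privilege list> on <relation> to <user list>."""
    if privilege_list == "all privileges":
        wanted = set(ALL_PRIVILEGES)
    else:
        wanted = set(privilege_list)
    for user in user_list:
        grants.setdefault((user, relation), set()).update(wanted)

def is_authorized(user, privilege, relation):
    """A user holds a privilege directly or through a grant to public."""
    direct = grants.get((user, relation), set())
    via_public = grants.get(("public", relation), set())
    return privilege in direct or privilege in via_public

# grant select on instructor to U1, U2, U3
grant(["select"], "instructor", ["U1", "U2", "U3"])
# Hypothetical grant to public: every valid user may read department.
grant(["select"], "department", ["public"])

print(is_authorized("U1", "select", "instructor"))    # U1 was granted select
print(is_authorized("U1", "update", "instructor"))    # update was never granted
print(is_authorized("Amit", "select", "department"))  # allowed through public
```

The model leaves out the check that the grantor already holds the privilege; a real system enforces that before recording the grant.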
Revoking Authorization in SQL
• The revoke statement is used to revoke authorization.
revoke <privilege list> on <relation or view> from <user list>
• Example:
revoke select on student from U1, U2, U3
• <privilege list> may be all to revoke all privileges the revokee may hold.
• If <user list> includes public, all users lose the privilege except those granted it explicitly.
• If the same privilege was granted twice to the same user by different grantors, the user may retain the privilege after the revocation.
• All privileges that depend on the privilege being revoked are also revoked.

Roles
• A role is a way to distinguish among various users as far as what these users can access/update in the database.
• To create a role we use:
create role <name>
• Example:
• create role instructor
• Once a role is created we can assign “users” to the role using:
• grant <role> to <users>

Roles Example
• create role instructor;
• grant instructor to Amit;
• Privileges can be granted to roles:
• grant select on takes to instructor;
• Roles can be granted to users, as well as to other roles
• create role teaching_assistant
• grant teaching_assistant to instructor;
• Instructor inherits all privileges of teaching_assistant
• Chain of roles
• create role dean;
• grant instructor to dean;
• grant dean to Satoshi;

Authorization on Views
• create view geo_instructor as
(select *
from instructor
where dept_name = 'Geology');
• grant select on geo_instructor to geo_staff
• Suppose that a geo_staff member issues
• select *
from geo_instructor;
• What if
• geo_staff does not have permissions on instructor?
• Creator of view did not have some permissions on instructor?
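The chain of roles in the Roles Example can be traced with a small model. This is a sketch only: the function names are hypothetical, and the privilege granted to teaching_assistant (select on student) is an assumption added so that inheritance through two levels is visible.

```python
from collections import defaultdict

role_grants = defaultdict(set)  # grantee (user or role) -> roles granted to it
privileges = defaultdict(set)   # grantee (user or role) -> {(privilege, relation)}

def grant_role(role, grantee):
    """Record: grant <role> to <grantee>."""
    role_grants[grantee].add(role)

def grant_privilege(privilege, relation, grantee):
    """Record: grant <privilege> on <relation> to <grantee>."""
    privileges[grantee].add((privilege, relation))

def effective_privileges(grantee):
    """Collect privileges held directly or inherited through any chain of roles."""
    seen, stack, result = set(), [grantee], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in role grants
        seen.add(current)
        result |= privileges[current]
        stack.extend(role_grants[current])
    return result

# Statements from the Roles Example slide.
grant_role("instructor", "Amit")
grant_privilege("select", "takes", "instructor")
grant_role("teaching_assistant", "instructor")
grant_role("instructor", "dean")
grant_role("dean", "Satoshi")
# Assumption for illustration: a privilege held by teaching_assistant.
grant_privilege("select", "student", "teaching_assistant")

print(sorted(effective_privileges("Satoshi")))
print(sorted(effective_privileges("Amit")))
```

Satoshi reaches both privileges through dean, then instructor, then teaching_assistant, which is the inheritance the slide describes.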
Other Authorization Features
• references privilege to create foreign key
• grant references (dept_name) on department to Mariano;
• Why is this required? A foreign key that refers to department restricts later deletions and updates on department, so creating one must itself be authorized.
• transfer of privileges
• grant select on department to Amit with grant option;
• revoke select on department from Amit, Satoshi cascade;
• revoke select on department from Amit, Satoshi restrict;
• And more!
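The difference between cascade and restrict can be illustrated with an authorization graph for a single privilege, where each edge records who granted the privilege to whom. This is a simplified sketch under stated assumptions: the root is a hypothetical "DBA" node, one grant edge is revoked at a time, and the names holders, revoke, and RevokeRestricted are invented for the example.

```python
DBA = "DBA"  # assumption: the database administrator is the root of all grants

class RevokeRestricted(Exception):
    """Raised when a restrict revoke would require cascading revocations."""

def holders(edges):
    """Users reachable from the DBA along grant edges still hold the privilege."""
    reached, frontier = {DBA}, [DBA]
    while frontier:
        user = frontier.pop()
        for grantor, grantee in edges:
            if grantor == user and grantee not in reached:
                reached.add(grantee)
                frontier.append(grantee)
    return reached

def revoke(edges, grantor, grantee, mode="cascade"):
    """Remove one grant edge and return the resulting set of edges."""
    remaining = edges - {(grantor, grantee)}
    reached = holders(remaining)
    # Grants issued by users who no longer hold the privilege depend on the
    # revoked one and cannot survive.
    orphaned = {(g, u) for (g, u) in remaining if g not in reached}
    if orphaned and mode == "restrict":
        raise RevokeRestricted(sorted(orphaned))
    return remaining - orphaned

# Amit received the privilege with grant option and passed it to Satoshi.
chain = {(DBA, "Amit"), ("Amit", "Satoshi")}
after_cascade = revoke(chain, DBA, "Amit", mode="cascade")
print(holders(after_cascade))  # only the DBA remains

# Satoshi was also granted the privilege by U1, so Satoshi keeps it.
twice = {(DBA, "Amit"), (DBA, "U1"), ("Amit", "Satoshi"), ("U1", "Satoshi")}
after_twice = revoke(twice, DBA, "Amit", mode="cascade")
print("Satoshi" in holders(after_twice))
```

Calling revoke(chain, DBA, "Amit", mode="restrict") raises RevokeRestricted in this model, because Satoshi's grant depends on Amit's.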

Thank you
