
PES University

Department of Computer Science and Engineering


UE21CS351A: Database Management System

Unit 4 Notes

Accessing SQL from Python


Why Python with MySQL
✔ Programming in Python is generally more concise and faster to
develop in than many other languages.
✔ Portability and platform independence.
✔ Python supports SQL cursors.
✔ In many programming languages, the application developer must
manage opening and closing database connections to avoid
exceptions and errors. In Python, these connections are handled
by the connector library.
✔ Python supports relational database systems.

MySQL connector
There are many modules that help us connect Python to
MySQL. We are going to use MySQL Connector/Python in this lesson.
• MySQL Connector/Python is a module (library) available in Python
to communicate with MySQL.
Prerequisites
▪ You need root or administrator privileges to perform the
installation process.
▪ Python 2 or 3 should be installed on your system, and it should be
on your system's PATH.
▪ The MySQL server should be installed and running.
▪ The MySQL version should be greater than 4.1.

Installing MySQL Connector/Python

You can install MySQL Connector/Python either via pip or from
source code.
• Installing through pip: Open your command-line
interface and run the following command.
pip install mysql-connector-python

• If you face any issue while installing, pin the version of the
module and then try again.
pip install mysql-connector-python==8.0.11

Steps to Establish a Connection

1. Create a Python file 'database.py' and import the mysql.connector
module.
2. Establish a connection using 'mysql.connector.connect()', which
takes four parameters: host name, user name, password and an
optional database name.
3. Create a cursor object: A cursor is used to execute SQL queries
and fetch results from the database.
4. Execute SQL queries: The cursor object is assigned to a
variable 'c', so we can use this variable to execute SQL queries
and fetch the results from our database. A sketch of how a cursor
object can be used to execute queries is shown after these steps.
5. Close the cursor and the connection: It is essential to close
the cursor and connection when you are done with them to
release resources, free up the database connection and for
other security reasons.
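Below is a minimal sketch of these five steps. The host, user,
password and database names are placeholder values (not part of the
notes); substitute the credentials of your own MySQL server.

# database.py -- minimal connection sketch (example credentials)
import mysql.connector

# Step 2: establish the connection
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="password",
    database="company"      # optional database name
)

# Step 3: create a cursor object
c = conn.cursor()

# Step 4: execute a query and fetch the results
c.execute("SELECT * FROM project")
for row in c.fetchall():
    print(row)

# Step 5: close the cursor and the connection
c.close()
conn.close()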
Frequently used functions for CRUD operations
⮚ cursor.execute(query, values): This function is used to send
an SQL query (e.g., SELECT, INSERT, UPDATE, DELETE) to a
MySQL database from Python. If your query has placeholders
like %s, you provide the actual values in the values parameter,
allowing the query to be executed with dynamic data.
⮚ connection.commit(): After executing an INSERT/UPDATE/
DELETE operation, we need to commit the changes to the
database.
⮚ cursor.fetchone(): Fetches the next row from the result set.
⮚ cursor.fetchall(): Fetches all the rows from the result set
returned by a SELECT query.
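A short sketch of these functions, continuing the connection conn and
cursor c from the sketch above. The table 'worker' and its columns are
illustrative names, not defined in these notes.

# Parameterized INSERT: values are bound to the %s placeholders
c.execute(
    "INSERT INTO worker (ename, hours) VALUES (%s, %s)",
    ("Jacob Peralta", 22.5)
)
conn.commit()          # make the INSERT/UPDATE/DELETE permanent

# SELECT with a placeholder, then fetch rows
c.execute("SELECT ename, hours FROM worker WHERE hours > %s", (20,))
first = c.fetchone()   # next single row, or None if no more rows
rest = c.fetchall()    # all remaining rows as a list of tuples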

Web Interfaces and HyperText Documents


• Data sources: Many Internet applications provide Web
interfaces to access information stored in one or more
databases. These databases are often referred to as data
sources.
• Web interfaces are used to interact with users through web
pages displayed on various devices. Hypertext documents,
typically written in HTML, are used to specify the content and
formatting of these web pages.
• Basic HTML is useful for generating static web pages with fixed
text and other objects.
XML: Extensible Markup Language
• Basic HTML is useful for generating static Web pages, but
dynamic Web pages require more interaction with the user and
data extraction from databases.
• XML is a standard for structuring and exchanging data over the
Web in text files.
• XML documents are self-describing: they provide descriptive
information, such as attribute names, along with the values of
these attributes, in a text file.
• XML can be used to provide information about the structure and
meaning of the data in the Web pages rather than just specifying
how the Webpages are formatted for display on the screen.
Why XML over HTML
• XML is chosen over HTML when the primary goal is to structure
and exchange data, as it provides standardized way to describe
data, making it machine-readable.
• Unlike HTML, which focuses on web page presentation, XML
allows users to define custom data structures using elements
and attributes, providing flexibility tailored to specific data needs.
• XML's semantic tag names convey the meaning of data,
enhancing clarity and understanding, while HTML's predefined
tags primarily describe how content should be displayed.
• XML's standardized structure, extensibility, and support for data
validation through schemas enable reliable data exchange,
making it ideal for interoperability between diverse applications
and systems.
Dynamic web pages and their importance

• Dynamic Web pages: Web pages that provide interactive
features to the user and use the information provided by the
user for selecting specific data from a database for display.
• In these kinds of web pages, the data parts of the page may be
different each time the page is displayed, even though the display
appearance is the same.
• As e-commerce and other Internet applications become
increasingly automated, it is becoming crucial to be able to
exchange Web documents among various computer sites and to
interpret their contents automatically. This need was one of the
reasons that led to the development of XML.
Structured Data
• Structured data can be defined as data that follows a predefined,
organized format, often stored in a relational database.
• Ex: Each record in a relational database table, such as each of
the tables in the COMPANY database, follows the
same format as the other records.
• For structured data, there is a need to carefully design the
database schema using appropriate design techniques in order to
define the database structure.

Semistructured Data
• Semistructured data is a type of data that does not conform to a
rigid, predefined structure like structured data but still
possesses some level of organization or hierarchy.
• Unlike structured data, which is typically stored in relational
databases with well-defined schemas, semistructured data
allows for flexibility in terms of data formats and attributes.
• Semistructured data can be displayed as a directed graph.
• The labels or tags on the directed edges represent the schema
names: the names of attributes, object types and relationships.
• Internal nodes represent individual objects or composite
attributes.
• Leaf nodes represent actual atomic values.
Unstructured Data
• Unstructured data refers to data that does not have a predefined
structure or format.
• It lacks a clear and organized schema, making it more
challenging to analyze and process compared to structured data.
• Ex: audio, text documents, web pages

• HTML text documents are very difficult to interpret automatically
by computer programs because they do not include schema
information about the type of data in the documents.
Differences between Semistructured and Structured Data
• Schema Mixing: In the semistructured model, information about
the structure (like attribute names) is mixed together with the
actual data in the same structure. In contrast, the object model
keeps these separate.
• Schema Flexibility: Semistructured data doesn't require a
predefined blueprint or schema that tells how data should be
organized. It allows for flexibility, while the object model insists
on a predefined schema.
• So, in semistructured data, the structure and data blend together,
and you don't need a fixed plan (schema) in advance. This
makes it self-describing because each data item can have its
own unique attributes.
XML Hierarchical (Tree) Data Model

• XML (Extensible Markup Language) serves as a data model
for structuring information in documents. The basic unit of data in
XML is the XML document.
• XML documents are structured using two main concepts:
elements and attributes. Elements are identified by start tags
enclosed in angle brackets (<...>) and end tags with a slash
(</...>). Attributes provide additional information that describes
elements.
• Schema Definitions: XML allows for the definition of tag names
and their meanings in schema documents. This schema provides
a common understanding of data structures that can be shared
among multiple programs and users.
• Hierarchy: XML documents follow a hierarchical or tree
structure. Complex elements are constructed from other
elements in a hierarchical manner, while simple elements contain
data values.
• Flexibility: XML does not limit the depth of nesting for elements,
allowing for flexible and nested data structures.
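As a small illustration of elements, attributes and nesting, the sketch
below parses a hypothetical project document (the element and
attribute names are invented for illustration) using Python's standard
xml.etree.ElementTree module.

# Sketch: parsing a small, hypothetical XML document in Python
import xml.etree.ElementTree as ET

xml_text = """
<project id="P1">
    <pname>ProductX</pname>
    <plocation>Brooklyn</plocation>
    <worker>
        <ename>Jacob Peralta</ename>
        <hours>22.5</hours>
    </worker>
</project>
"""

root = ET.fromstring(xml_text)          # complex element <project>
print(root.tag, root.attrib["id"])      # element name and attribute value
for child in root:
    print(" ", child.tag, child.text or "")   # simple and complex subelements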

In the tree representation, internal nodes represent complex elements,
whereas leaf nodes represent simple elements. That is why the XML
model is called a tree model or a hierarchical model.
Types of XML Documents

● Data-centric XML documents: These documents have many
small data items that follow a specific structure and hence may
be extracted from a structured database. They are formatted as
XML documents in order to exchange them over the Web. These
usually follow a predefined schema that defines the tag names.
● Document-centric XML documents: These are documents with
large amounts of text, such as news articles or books. There are
few or no structured data elements in these documents.
● Hybrid XML documents: These documents may have parts that
contain structured data and other parts that are predominantly
textual or unstructured. They may or may not have a predefined
schema.

NoSQL systems

● NoSQL databases ("not only SQL") are non-tabular databases
and store data differently than relational tables.
● NoSQL databases come in a variety of types based on their data
model. The main types are document, key-value, wide-column,
and graph.


● NoSQL databases allow developers to store huge amounts of
unstructured data, giving them a lot of flexibility.
● Most NOSQL systems are distributed databases or distributed
storage systems, with a focus on semistructured data storage,
high performance, availability, data replication, and scalability as
opposed to an emphasis on immediate data consistency,
powerful query languages, and structured data storage.
Emergence of NoSQL systems
● NoSQL databases emerged as the cost of storage dramatically
decreased.
● As storage costs rapidly decreased, the amount of data that
applications needed to store and query increased. This data
came in all shapes and sizes — structured, semi-structured, and
polymorphic — and defining the schema in advance became
nearly impossible.

■ Google developed a NOSQL system known as
BigTable (column-based or wide column stores), which is used in
many of Google's applications that require vast amounts of data
storage, such as Gmail, Google Maps, and Website indexing.
Apache HBase is an open source NOSQL system based on similar
concepts.
■ Amazon developed a NOSQL system called DynamoDB
( key-value data stores or key-object data stores) that is available
through Amazon’s cloud services.
■ Facebook developed a NOSQL system called Cassandra, which is
now open source and known as Apache Cassandra. This NOSQL
system uses concepts from both key-value stores and column-based
systems.
■ Other software companies started developing their own solutions and
making them available to users who need these capabilities—for
example, MongoDB and CouchDB, which are classified as
document-based NOSQL systems or document stores.
■ Another category of NOSQL systems is the graph-based NOSQL
systems, or graph databases; these include Neo4j and GraphBase.
NOSQL characteristics related to distributed
databases and distributed systems
1. Scalability: In NOSQL systems, horizontal scalability is
employed while the system is operational, so techniques for
distributing the existing data among new nodes without
interrupting system operation are necessary.
2. Availability, Replication and Eventual Consistency: Many
applications that use NOSQL systems require continuous system
availability. To accomplish this, data is replicated over two or
more nodes in a transparent manner, so that if one node fails,
the data is still available on other nodes. Replication improves
data availability and can also improve read performance,
because read requests can often be serviced from any of the
replicated data nodes.

3. Replication Models: Two major replication models are used in
NOSQL systems: master-slave and master-master replication.
Master-slave replication requires one copy to be the master copy; all
write operations must be applied to the master copy and then
propagated to the slave copies, usually using eventual consistency
(the slave copies will eventually be the same as the master copy).
Master-master replication allows reads and writes at any of the
replicas but may not guarantee that reads at nodes that store different
copies see the same values.
4. Sharding : Sharding of the file records is often employed in NOSQL
systems. This serves to distribute the load of accessing the file
records to multiple nodes. The combination of sharding the file records
and replicating the shards works in tandem to improve load balancing
as well as data availability.

5. High-Performance Data Access: In many NOSQL applications, it
is necessary to find individual records or objects (data items) from
among the millions of data records or objects in a file. To achieve
this, most systems use one of two techniques: hashing or range
partitioning on object keys.
• In hashing, a hash function h(K) is applied to the key K, and the
location of the object with key K is determined by the value of
h(K).
• In range partitioning, the location is determined via a range of
key values; for example, location i would hold the objects whose
key values K are in the range Ki_min ≤ K ≤ Ki_max.

NOSQL characteristics related to data models and query languages
1. Not Requiring a Schema: The flexibility of not requiring a
schema is achieved in many NOSQL systems by allowing
semi-structured, self-describing data .
2. Less Powerful Query Languages: Many applications that use
NOSQL systems may not require a powerful query language
such as SQL, because search (read) queries in these systems
often locate single objects in a single file based on their object
keys. NOSQL systems typically provide a set of functions and
operations as a programming API (application programming
interface), so reading and writing the data objects is
accomplished by calling the appropriate operations by the
programmer.
3. Versioning: Some NOSQL systems provide storage of multiple
versions of the data items, with the timestamps of when the data
version was created.

Categories of NoSQL systems

1. Document-based NOSQL systems: These systems store data in
the form of documents using well-known formats, such as JSON
(JavaScript Object Notation).
Examples - CouchDB, MongoDB
2. NOSQL key-value stores: These systems have a simple data
model based on fast access by the key to the value associated with
the key; the value can be a record or an object or a document or even
have a more complex data structure.
Examples - DynamoDB, Redis
3. Column-based / wide column NOSQL systems: These systems
partition a table by column into column families (a form of vertical
partitioning), where each column family is stored in its own files.
Examples - Cassandra, HBase
4. Graph-based NOSQL systems: Data is represented as graphs,
and related nodes can be found by traversing the edges using path
expressions. Example: neo4j

CAP THEOREM

• The CAP theorem, which was originally introduced as the CAP
principle, can be used to explain some of the competing
requirements in a distributed system with replication.
• The three letters in CAP refer to three desirable properties of
distributed systems with replicated data: consistency (among
replicated copies), availability (of the system for read and write
operations) and partition tolerance (in the face of the nodes in
the system being partitioned by a network fault).
• Consistency means that the nodes will have the same copies of
a replicated data item visible for various transactions.
• Availability means that each read or write request for a data
item will either be processed successfully or will receive a
message that the operation cannot be completed.
• Partition tolerance means that the system can continue
operating if the network connecting the nodes has a fault that
results in two or more partitions, where the nodes in each
partition can only communicate among each other.

The CAP theorem states that:
"It is not possible to guarantee all three of the desirable
properties—consistency, availability, and partition
tolerance—at the same time in a distributed system with
data replication."
• If this is the case, then the distributed system designer would
have to choose two properties out of the three to guarantee.
• It is generally assumed that in many traditional (SQL)
applications, guaranteeing consistency through the ACID
properties is important. On the other hand, in a NOSQL
distributed data store, a weaker consistency level is often
acceptable, and guaranteeing the other two properties
(availability, partition tolerance) is important.
• Hence, weaker consistency levels are often used in NOSQL
systems instead of guaranteeing serializability. In particular, a
form of consistency known as eventual consistency is often
adopted in NOSQL systems.

Document-Based NOSQL Systems

● Document-based NOSQL systems typically store data as
collections of similar documents. These types of systems are
also sometimes known as document stores.
● The individual documents somewhat resemble complex objects
or XML documents, but a major difference between
document-based systems versus object and object-relational
systems and XML is that there is no requirement to specify a
schema—rather, the documents are specified as self-describing
data.
● Although the documents in a collection should be similar, they
can have different data elements (attributes), and new
documents can have new data elements that do not exist in any
of the current documents in the collection.
● Documents can be specified in various formats, such as XML. A
popular language to specify documents in NOSQL systems is
JSON (JavaScript Object Notation).

MongoDB Data Model

• MongoDB was designed as a scalable database - the name
Mongo comes from "humongous" - with performance and easy
data access as core design goals.
• It is a document database, which allows you to store objects
nested to any depth, and the nested data can be queried in an
ad-hoc fashion.
• It enforces no schema, so documents can contain fields or types
that no other document in the collection contains.
• A Mongo document is similar to a relational table row without a
schema, whose values can nest to an arbitrary depth.
• MongoDB stores data as documents.
• A document in MongoDB consists of field-value pairs.
• MongoDB documents are stored in BSON (Binary JSON) format,
which is a variation of JSON with some additional data types and
is more efficient for storage than JSON.
• Documents are organized in a structure called a collection.
• A collection does not have a schema. The structure of the data
fields in documents is chosen based on how documents will be
accessed and used, and the user can choose a normalized
design (similar to normalized relational tuples) or a denormalized
design (similar to XML documents or complex objects).
• To make an analogy, a document can be considered as a row in a
table and a collection can be considered as an entire table.

COLLECTION

The operation createCollection is used to create each collection.
For example, the following command can be used to create a
collection called project to hold PROJECT objects from the
COMPANY database:

db.createCollection("project", { capped: true, size: 1310720, max: 500 })

The first parameter "project" is the name of the collection, which is
followed by an optional document that specifies collection options.
In this example, the collection is capped; this means it has upper
limits on its storage space (size) and number of documents (max).
The capping parameters help the system choose the storage options
for each collection.

• Each document in a collection has a unique ObjectId field, called
_id, which is automatically indexed in the collection unless the
user explicitly requests no index for the _id field.
• The value of ObjectId can be specified by the user, or it can be
system-generated if the user does not specify an _id field for a
particular document.
• System-generated ObjectIds have a specific format, which
combines the timestamp when the object is created (4 bytes, in
an internal MongoDB format), the node (3 bytes), the process id
(2 bytes), and a counter (3 bytes) into a 12-byte Id value.
• User-generated ObjectIds can have any value specified by the
user as long as it uniquely identifies the document. These Ids
are similar to primary keys in relational systems.
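A small sketch of ObjectId generation using the bson package that
ships with the PyMongo driver (assuming PyMongo is installed); the
field values are illustrative.

# Sketch: system-generated vs. user-supplied _id values
from bson import ObjectId

oid = ObjectId()                    # system-generated 12-byte id
print(oid, oid.generation_time)     # hex string and embedded timestamp

doc_auto = {"Pname": "ProductX"}               # _id added automatically on insert
doc_user = {"_id": "P1", "Pname": "ProductX"}  # user-supplied unique _id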

RDBMS vs MongoDB

• RDBMS: It is a relational database.
  MongoDB: It is a non-relational, document-oriented database.
• RDBMS: Not suitable for hierarchical data storage.
  MongoDB: Suitable for hierarchical data storage.
• RDBMS: It is vertically scalable, i.e., scaling by increasing RAM.
  MongoDB: It is horizontally scalable, i.e., we can add more servers.
• RDBMS: It has a predefined schema.
  MongoDB: It has a dynamic schema.
• RDBMS: It centers around ACID properties (Atomicity, Consistency,
  Isolation, and Durability).
  MongoDB: It centers around the CAP theorem (Consistency,
  Availability, and Partition tolerance).
• RDBMS: It is row-based.
  MongoDB: It is document-based.
• RDBMS: It is slower in comparison with MongoDB.
  MongoDB: It is almost 100 times faster than an RDBMS.
• RDBMS: Supports complex joins.
  MongoDB: No support for complex joins.

CRUD Operations in MongoDB

MongoDB has several CRUD operations. CRUD stands for:
• Create
• Read
• Update
• Delete

Creating a collection

MongoDB stores documents in collections. Collections are analogous
to tables in relational databases.

db.createCollection is used to create a collection.
Format: db.createCollection(name, options)
name indicates the name of the collection to be created.
options is a document and is used to specify the configuration of the
collection.

Example: db.createCollection("myCollection") // myCollection is the
collection created

Inserting documents into collections

Documents can be created and inserted into their collections
using the insert operation, whose format is:
db.<collection_name>.insert(<document(s)>)

The parameters of the insert operation can include either a
single document or an array of documents.

Example -
db.project.insert( { _id: "P1", Pname: "ProductX", Plocation: "Brooklyn" } )

db.worker.insert( [ { _id: "W1", Ename: "Jacob Peralta", ProjectId: "P1", Hours: 22.5 },
{ _id: "W2", Ename: "Charles Boyle", ProjectId: "P1", Hours: 30.0 } ] )
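The same collection creation and inserts can be issued from Python
with the PyMongo driver. This is a sketch assuming PyMongo is
installed and a MongoDB server is running locally on the default
port; the database name "company" is an example.

# Sketch: MongoDB CRUD from Python using PyMongo
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["company"]                      # example database name

db.create_collection("project")             # analogous to db.createCollection
db.project.insert_one(
    {"_id": "P1", "Pname": "ProductX", "Plocation": "Brooklyn"}
)
db.worker.insert_many([
    {"_id": "W1", "Ename": "Jacob Peralta", "ProjectId": "P1", "Hours": 22.5},
    {"_id": "W2", "Ename": "Charles Boyle", "ProjectId": "P1", "Hours": 30.0},
])

# Read back the workers on project P1
for doc in db.worker.find({"ProjectId": "P1"}):
    print(doc)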
READ OPERATION

For read queries, the main command is called find, and the format is:
db.<collection_name>.find(<condition>)
General Boolean conditions can be specified as <condition>, and the
documents in the collection for which the condition evaluates to true
are selected for the query result.

Query: Display the first document in the collection

db.employee.findOne()

Query: Display the document of the employee with empid=2

db.employee.find({"empid": 2})

Query: Display the documents related to employees whose age is
less than 30

db.employee.find({"age": {$lt: 30}})
Query: Return the employees born after ‘1995-08-07’ from
employee collection

db.employee.find( { "birth": { $gt: new Date('1995-08-07') } } )

Query: Return all the employees who have Java as a skill

db.employee.find({"skill": "Java"})

Query: Return all the employees skilled in Java with a salary of
75000

db.employee.find({"skill": "Java", "salary": "75000"})

Query: Return all employees either skilled in Java or with a salary of
80000

db.employee.find({$or: [{"skill": "Java"}, {"salary": "80000"}]})

// and -> , or $and: [{condition 1}, {condition 2}]
// or -> $or: [{condition 1}, {condition 2}]

Query: Return documents where DOB is between 1997-09-02 and
2001-11-09

db.employee.find( { dob: { $gt: new Date('1997-09-02'), $lt: new Date('2001-11-09') } } )

Query: Return documents where employees are skilled in "Java"
and either with a salary of 80000 or 95000

db.employee.find({"skill": "Java", $or: [{"salary": "80000"}, {"salary": "95000"}]})

To display only the necessary data from documents in the collection

Format: db.collection_name.find(selection_criteria, fields_required)

db.employee.find({}, {"firstname": 1})

// displays firstname and object_id of all the employees

{} -> all documents, i.e., no selection criteria
1 -> indicates display the field (select firstname)
0 -> indicates do not select the field

db.employee.find({}, {"firstname": 1, "_id": 0})

// displays only the firstname of all the employees

UPDATE OPERATION

The update operation has a condition to select certain documents, and
a $set clause to specify the update. It is also possible to use the
update operation to replace an existing document with another one
but keep the same ObjectId.

Format: db.collection_name.update(selection_criteria, update_value)

Query: Update salaries of all employees with skill "mongodb"

db.employee.update({"skill": "mongodb"}, {$set: {"salary": "1000000"}})

// modifies one document

To update all matching documents, pass the multi option:

db.employee.update({"skill": "mongodb"}, {$set: {"salary": "1000000"}}, {multi: true})

DELETE OPERATION

The delete operation is called remove, and the format is:
db.<collection_name>.remove(<condition>)

db.collection_name.remove({}) // removes all documents from the
collection

Query: Remove all employees skilled in "mongodb"

db.employee.remove({"skill": "mongodb"}) // removes all employees
skilled in mongodb

Replication in MongoDB
• The concept of a replica set is used in MongoDB to create multiple
copies of the same data set on different nodes in the distributed
system.
• It uses a variation of the master-slave approach for replication.
• For example, suppose we want to replicate a particular document
collection C. A replica set will have one primary copy of the
collection C stored in one node N1, and at least one secondary
copy (replica) of C stored at another node N2.
• Additional copies can be stored in nodes N3, N4, etc., as
needed, but the cost of storage and update (write) increases with
the number of replicas.

• The total number of participants in a replica set must be at least
three, so if only one secondary copy is needed, a participant in
the replica set known as an arbiter must run on the third node
N3.
• The arbiter does not hold a replica of the collection but
participates in elections to choose a new primary if the node
storing the current primary copy fails.
• If the total number of members in a replica set is n (one primary
plus i secondaries, for a total of n = i + 1), then n must be an odd
number; if it is not, an arbiter is added to ensure the election
process works correctly if the primary fails.
• In MongoDB replication, all write operations must be applied to
the primary copy and then propagated to the secondaries.
• For read operations, the user can choose the particular read
preference for their application. The default read preference
processes all reads at the primary copy, so all read and write
operations are performed at the primary node.
• In this case, secondary copies mainly make sure that the
system continues operation if the primary fails, and MongoDB
can ensure that every read request gets the latest document
value.
Sharding
• Sharding is a method for distributing or partitioning data across
multiple machines.
• It is useful when no single machine can handle large modern-day
workloads, by allowing you to scale horizontally.
• Horizontal scaling, also known as scale-out, refers to adding
machines to share the data set and load. Horizontal scaling
allows for near-limitless scaling to handle big data and intense
workloads.

Sharding in MongoDB
• When a collection holds a very large number of documents or
requires a large storage space, storing all the documents in one
node can lead to performance problems, particularly if there are
many user operations accessing the documents concurrently
using various CRUD operations.
• Sharding of the documents in the collection—also known as
horizontal partitioning—divides the documents into disjoint
partitions known as shards.
• Sharding allows the system to add more nodes as needed, by a
process known as horizontal scaling of the distributed system,
and to store the shards of the collection on different nodes to
achieve load balancing. Each node will process only those
operations pertaining to the documents in the shard stored at
that node.
• There are two ways to partition a collection into shards in
MongoDB:
➢ Range partitioning
➢ Hash partitioning
Both require that the user specify a particular document field to
be used as the basis for partitioning the documents into shards.
• The partitioning field (shard key) must have two characteristics:
1. It must exist in every document in the collection.
2. It must have an index.
• The values of the shard key are divided into chunks through
range partitioning or hash partitioning, and documents are
partitioned based on the chunks of shard key values.
Range Partitioning

• Range partitioning creates the chunks by specifying a range of
key values.
• For example, if the shard key values ranged from one to ten
million, it is possible to create ten ranges—1 to 1,000,000;
1,000,001 to 2,000,000; ... ; 9,000,001 to 10,000,000—and each
chunk would contain the key values in one range.

Hash Partitioning
• Hash partitioning applies a hash function h(K) to each shard key
K, and the partitioning of keys into chunks is based on the hash
values.
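The sketch below contrasts the two chunking strategies on a shard
key. It is a simplified illustration, not MongoDB's internal algorithm:
the chunk boundaries and the number of chunks are example values.

# Sketch: assigning a shard-key value to a chunk by range vs. by hash
import hashlib

RANGES = [(1, 1_000_000), (1_000_001, 2_000_000), (2_000_001, 3_000_000)]

def range_chunk(key: int) -> int:
    """Return the index of the range chunk that contains the key."""
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= key <= hi:
            return i
    raise ValueError("key outside all ranges")

def hash_chunk(key: int, num_chunks: int = 3) -> int:
    """Hash the key and map h(K) onto one of num_chunks chunks."""
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % num_chunks

print(range_chunk(1_500_000))   # neighbouring keys land in the same chunk
print(hash_chunk(1_500_000))    # hashing spreads neighbouring keys apart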

Range Partitioning vs Hash Partitioning

• In general, if range queries are commonly applied to a collection
(for example, retrieving all documents whose shard key value is
between 200 and 400), then range partitioning is preferred
because each range query will typically be submitted to a single
node that contains all the required documents in one shard.
• If most searches retrieve one document at a time, hash
partitioning may be preferable because it randomizes the
distribution of shard key values into chunks.
• When sharding is used, MongoDB queries are submitted to a
module called the query router, which keeps track of which
nodes contain which shards based on the particular partitioning
method used on the shard keys.
• The query (CRUD operation) will be routed to the nodes that
contain the shards that hold the documents that the query is
requesting. If the system cannot determine which shards hold
the required documents, the query will be submitted to all the
nodes that hold shards of the collection.

• Sharding and replication are used together; sharding focuses on
improving performance via load balancing and horizontal
scalability, whereas replication focuses on ensuring system
availability when certain nodes fail in the distributed system.

NOSQL Key-Value Stores

● Focus on high performance, availability and scalability
by storing the data in "distributed storage systems".
● Simple data model and, in most cases, no query language,
just a set of operations that can be used by the
application programmers.
● Key: A unique identifier associated with a data item and
used to locate this data item rapidly.
● Value: The data item itself, and it can have very different
formats for different key-value storage systems.
● In some cases the value is a string/array of bytes and the
application using the data store interprets the structure; in
other cases a standard data format is allowed (e.g., JSON).
DynamoDB
● Part of Amazon’s AWS/SDK Platforms and used as part of
the Amazon cloud computing services
● The basic data model in DynamoDB uses the concepts of
tables, items, and attributes
● Table: A table in DynamoDB does not have a schema; it
holds a collection of self-describing items.
● Item: Each item will consist of a number of (attribute, value)
pairs, and attribute values can be single-valued or
multivalued. A table will hold a collection of items, and each
item is a self-describing record (or object).
● The table name and a primary key is required when
creating a table. Primary key must exist in all the items in
the table.

Types of Primary Key

● Single Attribute: This attribute is used to build a hash
index on the items in the table. It is called a hash type
primary key. Items aren't ordered on the value of the hash
attribute in storage.
● Pair of Attributes: Called a hash and range type
primary key.
○ The primary key is a pair of attributes (A, B), where A
is used for hashing and B is used for ordering the
records which have the same A value.
○ A table with this type of key can have secondary
indexes defined on its attributes.
○ Ex: if we want to store multiple versions of some type
of items in a table, we could use ItemID as hash and
Date or Timestamp (when the version was created) as
range in a hash and range type primary key.
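The sketch below creates such a table with the AWS SDK for Python
(boto3), assuming boto3 is installed and AWS credentials are
configured; the table and attribute names (ItemVersions, ItemID,
CreatedAt) are illustrative, not from the notes.

# Sketch: a DynamoDB table with a hash-and-range primary key
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="ItemVersions",
    KeySchema=[
        {"AttributeName": "ItemID", "KeyType": "HASH"},      # hash attribute A
        {"AttributeName": "CreatedAt", "KeyType": "RANGE"},  # range attribute B
    ],
    AttributeDefinitions=[
        {"AttributeName": "ItemID", "AttributeType": "S"},
        {"AttributeName": "CreatedAt", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",   # no provisioned throughput needed
)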
Voldemort Key-Value Distributed Data Source

Introduction
● Voldemort is an open source system available through
Apache 2.0 open source licensing rules. It is based on
Amazon's Dynamo design.
● High performance, high scalability, high availability (i.e
replication, sharding, horizontal scalability) are realized
through a technique to distribute the key-value pairs among
the nodes of a distributed cluster known as consistent
hashing.
● Voldemort has been used by LinkedIn for data storage.
Features
1. Simple basic operations:
● Collection of key-value pairs is kept in a store(s).
● 3 operations: get, put, delete
○ s.get(k) : retrieves the value v associated with key k.
○ s.put(k, v) : inserts an item as a key-value pair with key k
and value v.
○ s.delete(k) : deletes the item whose key is k from the store.
● At the basic storage level, both keys and values are arrays of
bytes (strings).

2. High-level formatted data values:
● The values (v) can be specified in JSON format.
● Other formats can be specified if the application provides
conversion(serialization) between user format and storage
format as a serializer class.
● The Serializer class must be provided by the user and will
include operations to convert the user format into a string of
bytes for storage as a value, and to convert back a string (array
of bytes) retrieved via s.get(k) into the user format.
● Voldemort has some built-in serializers for formats other than
JSON.
3. Consistent hashing for distributing (key, value) pairs:
● A variation of the data distribution algorithm known as
consistent hashing is used in Voldemort for data distribution
among the nodes in the distributed cluster of nodes.
● A hash function h(k) is applied to the key k of each (k, v) pair,
and h(k) determines where the item will be stored.
● The method assumes that h(k) is an integer value, usually in the
range 0 to Hmax = 2^n − 1, where n is chosen based on the desired
range for the hash values.

● Consider the range of all possible integer hash values 0 to Hmax
to be evenly distributed on a circle (or ring). The nodes in the
distributed system are then also located on the same ring, and
each node has several locations on the ring.
● An item (k, v) will be stored on the node whose position in the
ring follows the position of h(k) on the ring in a clockwise
direction.
● In Figure 24.2(a), we assume there are three nodes in the
distributed cluster labeled A, B, and C, where node C has a
bigger capacity than nodes A and B. In a typical system, there
will be many more nodes. On the circle, two instances each of A
and B are placed, and three instances of C (because of its higher
capacity), in a pseudorandom manner to cover the circle.
● The h(k) values that fall in the parts of the circle marked as range
1 in Figure 24.2(a) will have their (k, v) items stored in node A
because that is the node whose label follows h(k) on the ring in a
clockwise direction; those in range 2 are stored in node B; and
those in range 3 are stored in node C.
● This scheme allows horizontal scalability because when a new
node is added to the distributed system, it can be added in one
or more locations on the ring depending on the node capacity.
● Only a limited percentage of the (k, v) items will be reassigned to
the new node from the existing nodes based on the consistent
hashing placement algorithm.
● Also, those items assigned to the new node may not all come
from only one of the existing nodes because the new node can
have multiple locations on the ring.
● For example, if a node D is added and it has two placements on
the ring as shown in Figure 24.2(b), then some of the items from
nodes B and C would be moved to node D. The items whose
keys hash to range 4 on the circle (see Figure 24.2(b)) would be
migrated to node D.
● This scheme also allows replication by placing the number of
specified replicas of an item on successive nodes on the ring in a
clockwise direction.
● The sharding is built into the method, and different items in the
store (file) are located on different nodes in the distributed
cluster, which means the items are horizontally partitioned
(sharded) among the nodes in the distributed system.
● When a node fails, its load of data items can be distributed to the
other existing nodes whose labels follow the labels of the failed
node in the ring. And nodes with higher capacity can have more
locations on the ring, as illustrated by node C in Figure 24.2(a),
and thus store more items than smaller-capacity nodes.
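The following is a minimal sketch of consistent hashing in Python. It
only illustrates the placement rule described above (hash the key,
then walk clockwise to the next node position on the ring); the node
names, the number of positions per node and the hash function are
illustrative choices, not Voldemort's actual implementation.

# Sketch: consistent hashing ring with multiple positions per node
import bisect
import hashlib

def h(value: str) -> int:
    """Map a string onto the ring of integer hash values."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

# Node C gets more positions on the ring because of its higher capacity.
capacities = {"A": 2, "B": 2, "C": 3}
ring = sorted(
    (h(f"{node}#{i}"), node)
    for node, count in capacities.items()
    for i in range(count)
)
positions = [pos for pos, _ in ring]

def node_for(key: str) -> str:
    """Store (k, v) on the node whose position follows h(k) clockwise."""
    idx = bisect.bisect_right(positions, h(key)) % len(ring)
    return ring[idx][1]

print(node_for("item-42"), node_for("item-43"))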

4. Consistency and versioning:
● Uses a method similar to the one used by DynamoDB for
consistency in the presence of replicas.
● Concurrent write operations are allowed by different processes.
● Two or more different values can be associated with the same
key at different nodes when items are replicated.
● Consistency is achieved by using a technique known as
versioning and read repair.
● Concurrent writes are allowed and each write is associated
with a vector clock value.
● For a read where different versions of the same value
(associated with the same key) are read from different nodes, if
the system can reconcile to a single final value then it will pass
that value to the read.
● If the system can't reconcile to a single value then more than one
version can be passed back to the application, which does the
reconciliation and passes the reconciled value back to the nodes.
Some other Key-Value Stores
● Oracle key-value store: Oracle has one of the well-known SQL
relational database systems, and Oracle also offers a system
based on the key-value store concept, this system is called the
Oracle NoSQL Database.
● Redis key-value cache and store: Redis differs from the other
systems discussed here because it caches its data in main
memory to further improve performance. It offers master-slave
replication and high availability, and it also offers persistence by
backing up the cache to disk.
● Apache Cassandra: Cassandra is a NOSQL system that is not
easily categorized into one category; it is sometimes listed in the
column-based NOSQL category or in the key-value category.
It offers features from several NOSQL categories and is used by
Facebook as well as many other customers.

NOSQL Graph Databases and Neo4j

Introduction
● Another category of NOSQL systems is known as graph
databases or graph-oriented NOSQL systems.
● Data is represented as a graph - collection of
nodes(vertices) and edges.
● Nodes and edges can be labeled to indicate the types of
entities and relationships they represent.
● Possible to store data associated with both individual
nodes and individual edges.
● Many systems can be categorized as graph databases.
● We will focus our discussion on one particular system,
Neo4j, which is used in many applications.
● Neo4j is an open source system, and it is implemented in
Java.
Neo4j Data Model
● Neo4j organizes data using the concepts of nodes and
relationships.
● Both nodes and relationships can have properties, which
store the data items associated with nodes and
relationships.
● Nodes can have labels (≥0) and the ones which have the
same label are grouped into a collection that identifies a
subset of the nodes in the database graph for querying
purposes.
● Relationships are directed; each relationship has a start
node and end node as well as a relationship type, which
serves a similar role to a node label by identifying similar
relationships that have the same relationship type.
● Properties can be specified via a map pattern, which is
made of one or more “name : value” pairs for example
{Lname : ‘Smith’, Fname : ‘John’, Minit : ‘B’}.
● To create nodes and relationships in Neo4j we use the
Neo4j CREATE command, which is part of the high-level
declarative query language Cypher.
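A small sketch of issuing such CREATE statements from Python with
the official neo4j driver, assuming the driver is installed and a Neo4j
server is reachable with the example URI and credentials shown; the
node properties follow the COMPANY-style examples in these notes.

# Sketch: creating a node and a relationship through the Neo4j driver
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))   # example credentials

with driver.session() as session:
    # CREATE an EMPLOYEE node with a map of properties
    session.run(
        "CREATE (e:EMPLOYEE {Empid: 2, Lname: 'Smith', Fname: 'John', Minit: 'B'})"
    )
    # CREATE a WorksFor relationship between an employee and a department
    session.run(
        "MATCH (e:EMPLOYEE {Empid: 2}), (d:DEPARTMENT {Dno: 5}) "
        "CREATE (e)-[:WorksFor]->(d)"
    )

driver.close()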
Neo4j Data Model
● Nodes in Neo4j correspond to entities
● Node labels correspond to entity types and subclasses
● Relationships correspond to relationship instances
● Properties correspond to attributes

Neo4j vs ER Model

• Neo4j: Relationships are directed.
  ER model: Relationships are not directed.
• Neo4j: A node may have no label.
  ER model: Every entity must belong to an entity type, so a node
  must have a label.
• Neo4j: The graph model is used as a basis for an actual
  high-performance distributed database system.
  ER model: The ER/EER (Enhanced ER) model is mainly used for
  database design.
Labels and properties
● When a node is created, the node label can be specified. It
is also possible to create nodes without any labels.
● In Figure 24.4(a), the node labels are EMPLOYEE,
DEPARTMENT, PROJECT, and LOCATION, the created
nodes correspond to some of the data from the COMPANY
database.
● It is possible that some nodes have multiple labels.
● For example the same node can be labeled as PERSON
and EMPLOYEE and MANAGER by listing all the labels
separated by the colon symbol as follows:
PERSON:EMPLOYEE:MANAGER.
● Having multiple labels is similar to an entity belonging to an
entity type (PERSON) plus some subclasses of PERSON
(namely EMPLOYEE and MANAGER).
Relationships and relationship types
● Figure 24.4(b) shows a few example relationships in Neo4j
based on the COMPANY database
● The → specifies the direction of the relationship, but the
relationship can be traversed in either direction.
● The relationship types (labels) in Figure 24.4(b) are
WorksFor, Manager, LocatedIn, and WorksOn
● Only relationships with the relationship type WorksOn have
properties (Hours) in Figure 24.4(b).
Paths
● A path specifies a traversal of part of the graph.
● It is typically used as part of a query to specify a pattern,
where the query will retrieve from the graph data that
matches the pattern.
● A path is typically specified by a start node, followed by one
or more relationships, leading to one or more end nodes
that satisfy the pattern.
● It is somewhat similar to the concepts of path expressions
in the context of query languages for object databases
(OQL) and XML (XPath and XQuery).
Optional Schema
● A schema is optional in Neo4j.
● Graphs can be created and used without a schema, but in
Neo4j version 2.0, a few schema-related functions were
added.
● The main features related to schema creation involve
creating indexes and constraints based on the labels and
properties.
● For example, it is possible to create the equivalent of a key
constraint on a property of a label, so all nodes in the
collection of nodes associated with the label must have
unique values for that property.

Indexing and node identifiers


● When a node is created, the Neo4j system creates an
internal unique system-defined identifier for each node.
● To retrieve individual nodes using other properties of the
nodes efficiently, the user can create indexes for the
collection of nodes that have a particular label.
● Typically, one or more of the properties of the nodes in
that collection can be indexed.
● For example, Empid can be used to index nodes with the
EMPLOYEE label, Dno to index the nodes with the
DEPARTMENT label, and Pno to index the nodes with the
PROJECT label.
The Cypher Query Language of Neo4j
Introduction
● Neo4j has a high-level query language, Cypher.
● It has declarative commands for creating nodes and
relationships as well as for finding nodes and relationships
based on specifying patterns.
● Deletion and modification of data is also possible in
Cypher.
● A Cypher query is made up of clauses. When a query has
several clauses, the result from one clause can be the input
to the next clause in the query.
Figure 24.4(c) summarizes some of the main clauses that can be part
of a Cypher query. The Cypher language can specify complex queries
and updates on a graph database. We will give a few examples to
illustrate simple Cypher queries in Figure 24.4(d).
● Query 1 in Figure 24.4(d) shows how to use the MATCH and
RETURN clauses in a query, and the query retrieves the
locations for department number 5. Match specifies the pattern
and the query variables (d and loc) and RETURN specifies the
query result to be retrieved by referring to the query variables.
● Query 2 has three variables (e, w, and p), and returns the
projects and hours per week that the employee with Empid = 2
works on.
● Query 3, on the other hand, returns the employees and hours
per week who work on the project with Pno = 2.
● Query 4 illustrates the ORDER BY clause and returns all
employees and the projects they work on, sorted by Ename.
● It is also possible to limit the number of returned results by using
the LIMIT clause as in query 5, which only returns the first 10
answers.
● Query 6 illustrates the use of WITH and aggregation, although
the WITH clause can be used to separate clauses in a query
even if there is no aggregation. Query 6 also illustrates the
WHERE clause to specify additional conditions, and the query
returns the employees who work on more than two projects, as
well as the number of projects each employee works on. It is
also common to return the nodes and relationships themselves
in the query result, rather than the property values of the nodes
as in the previous queries.
● Query 7 is similar to query 5 but returns the nodes and
relationships only, and so the query result can be displayed as a
graph using Neo4j’s visualization tool. It is also possible to add
or remove labels and properties from nodes.
● Query 8 shows how to add more properties to a node by adding
a Job property to an employee node.
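Since Figure 24.4(d) is not reproduced in these notes, the sketch
below gives a plausible Cypher rendering of Query 1 (locations of
department number 5) and of the LIMIT idea from Query 5. The label,
relationship and property names (DEPARTMENT, LocatedIn, Dno, Ename)
are assumptions based on the COMPANY examples above, not the
textbook's exact figure; the strings can be executed with
session.run() as in the earlier driver sketch.

# Hypothetical Cypher texts for two of the queries described above
query1 = (
    "MATCH (d:DEPARTMENT {Dno: 5})-[:LocatedIn]->(loc:LOCATION) "
    "RETURN d.Dno, loc"            # locations for department number 5
)

query5 = (
    "MATCH (e:EMPLOYEE)-[w:WorksOn]->(p:PROJECT) "
    "RETURN e.Ename, p.Pname "
    "LIMIT 10"                     # only the first 10 answers
)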

Neo4j Interfaces and Distributed System Characteristics
● Graph visualization interface: Neo4j has a graph visualization
interface, so that a subset of the nodes and edges in a database
graph can be displayed as a graph. This tool can be used to
visualize query results in a graph representation.
● Master-slave replication: Neo4j can be configured on a cluster
of distributed system nodes (computers), where one node is
designated the master node. The data and indexes are fully
replicated on each node in the cluster. Various ways of
synchronizing the data between master and slave nodes can be
configured in the distributed cluster.
● Caching: A main memory cache can be configured to store the
graph data for improved performance.
● Logical logs: Logs can be maintained to recover from failures.

Query Processing

Introduction

• A language such as SQL is suitable for human use, but it is
ill suited to be the system's internal representation of a query.
• A more useful internal representation is one based on the
extended relational algebra.
• Thus, the first action the system must take in query processing is
to translate a given query into its internal form. This translation
process is similar to the work performed by the parser of a
compiler.
• Query processing refers to the range of activities involved in
extracting data from a database. The activities include translation
of queries in high-level database languages into expressions that
can be used at the physical level of the file system, a variety of
query-optimizing transformations, and actual evaluation of
queries.

Steps in Query Processing

1. Parsing and translation
• Translate the query into its internal form. This is then
translated into relational algebra.
• The parser checks syntax and verifies relations.
2. Optimization
• Construct a query-evaluation plan that minimizes the cost
of query evaluation.
3. Evaluation
• The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.

• Given a query, there are generally a variety of methods for
computing the answer.
• For example, we have seen that, in SQL, a query could be
expressed in several different ways. Each SQL query can itself be
translated into a relational-algebra expression in one of several
ways.
• The relational-algebra representation of a query specifies only
partially how to evaluate a query; there are usually several ways
to evaluate relational-algebra expressions. As an illustration,
consider a query and its possible relational-algebra translations,
sketched below.
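A representative example, assuming the standard instructor/salary
query used in textbook treatments of this step (an assumed
illustration, since the original figure is not included in these notes):

SQL query: select salary from instructor where salary < 75000;

Relational-algebra translations:
σ_salary<75000 (Π_salary (instructor))    or    Π_salary (σ_salary<75000 (instructor))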

Query Evaluation Plan

• To specify fully how to evaluate a query, we need not only to
provide the relational-algebra expression, but also to annotate it
with instructions specifying how to evaluate each operation.
• Annotations may state the algorithm to be used for a specific
operation or the particular index or indices to use. A
relational-algebra operation annotated with instructions on how to
evaluate it is called an evaluation primitive.
• A sequence of primitive operations that can be used to evaluate a
query is a query-execution plan or query-evaluation plan.
• The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.
The figure illustrates an evaluation plan in which a particular index
(denoted in the figure as "index 1") is specified for the selection
operation.

Measures of Query Cost

• There are multiple possible evaluation plans for a query, and it is
important to be able to compare the alternatives in terms of their
(estimated) cost, and choose the best plan.
• To do so, we must estimate the cost of individual operations and
combine them to get the cost of a query-evaluation plan.
• The cost of query evaluation can be measured in terms of a
number of different resources, including disk accesses, CPU
time to execute a query, and, in parallel and distributed database
systems, the cost of communication.
• For large databases resident on magnetic disk, the I/O cost to
access data from disk usually dominates the other costs; thus,
early cost models focused on the I/O cost when estimating the
cost of query operations.
• For simplicity we just use the number of block transfers from
disk and the number of seeks as the cost measures and ignore
CPU costs.
• We use the number of blocks transferred from storage and the
number of seeks, each of which will require a disk seek on
magnetic storage, as two important factors in estimating the cost
of a query-evaluation plan.
• If the disk subsystem takes an average of tT seconds to transfer a
block of data and has an average block-access time (disk seek
time plus rotational latency) of tS seconds, then an operation that
transfers b blocks and performs S random I/O accesses would
take b ∗ tT + S ∗ tS seconds.

• tT – time to transfer one block
• tS – time for one seek
• Assuming for simplicity that write cost is the same as read cost

• The above model of estimating the total disk access time
(including seek and data transfer) is an example of a
resource consumption–based model of query cost.

tT and tS depend on where data is stored.

With 4 KB blocks:

• High end magnetic disk: tS = 4 msec and tT =0.1 msec.

• SSD: tS = 20-90 microsec and tT = 2-10 microsec for 4KB.
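As a worked illustration of the cost formula above (the operation size
is an assumed example): reading b = 100 blocks with S = 10 random
I/O accesses on the high-end magnetic disk figures gives

b ∗ tT + S ∗ tS = 100 ∗ 0.1 ms + 10 ∗ 4 ms = 10 ms + 40 ms = 50 ms,

whereas the same operation on the SSD figures (tT = 10 microsec,
tS = 90 microsec at the slow end) costs about
100 ∗ 0.01 ms + 10 ∗ 0.09 ms ≈ 1.9 ms.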

• The cost estimates do not include the cost of writing the final
result of an operation back to disk. These are taken into account
separately where required.

• The costs of all the algorithms that we consider depend on the
size of the buffer in main memory. In the best case, if the data
fits in the buffer, the data can be read into the buffers, and the
disk does not need to be accessed again.

• Several algorithms can reduce disk I/O by using extra buffer
space.

• In the worst case, we may assume that the buffer can hold only a
few blocks of data.

• The response time for a query-evaluation plan (that is, the
wall-clock time required to execute the plan), assuming no other
activity is going on in the computer, would account for all these
costs, and could be used as a measure of the cost of the plan.
• Unfortunately, the response time of a plan is very hard to estimate
without actually executing the plan, for the following two reasons:
1. The response time depends on the contents of the buffer
when the query begins execution; this information is not
available when the query is optimized and is hard to
account for even if it were available.
2. In a system with multiple disks, the response time depends
on how accesses are distributed among disks, which is
hard to estimate without detailed knowledge of the data
layout on disk.

Selection Operation
• In query processing, the file scan is the lowest-level operator to
access data.

• File scans are search algorithms that locate and retrieve records
that fulfill a selection condition.

• In relational systems, a file scan allows an entire relation to be
read in those cases where the relation is stored in a single,
dedicated file.

Cost estimates for Selection Algorithms


• Linear Search –
• Scan each file block and test all records to see whether
they satisfy the selection condition.
• Cost estimate = br block transfers + 1 seek
• br denotes number of blocks containing records from
relation r
• If selection is on a key attribute, can stop on finding record
• cost = (br /2) block transfers + 1 seek
• Linear search can be applied regardless of
• selection condition or
• ordering of records in the file, or
• availability of indices
• Note: Binary search generally does not make sense since data
is not stored consecutively
• except when there is an index available,
• and binary search requires more seeks than index search
Selection using Indices

• Index scans are search algorithms that use an index.
• The selection condition must be on the search key of the index.
• Clustering index, equality on key –
• Retrieve a single record that satisfies the corresponding
equality condition
• Cost = (hi + 1) * (tT + tS)
• Clustering index, equality on non-key –
• Retrieve multiple records.
• Records will be on consecutive blocks
• b = number of blocks containing matching records
• Cost = hi * (tT + tS) + tS + tT * b
• Secondary index, equality on key –
• Retrieve a single record if the search key is a candidate
key
• Cost = (hi + 1) * (tT + tS)
• Secondary index, equality on non-key –
• Retrieve multiple records if the search key is not a
candidate key
• each of n matching records may be on a different block
• Cost = (hi + n) * (tT + tS)
• Can be very expensive!
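A small worked comparison, using the magnetic-disk figures from
above and assumed values for the index height and relation size
(hi = 3 levels, br = 1,000 blocks, n = 50 matching records):

Linear search:              br ∗ tT + tS = 1,000 ∗ 0.1 ms + 4 ms = 104 ms
Clustering index, key:      (hi + 1) ∗ (tT + tS) = 4 ∗ 4.1 ms = 16.4 ms
Secondary index, non-key:   (hi + n) ∗ (tT + tS) = 53 ∗ 4.1 ms ≈ 217 ms

so a secondary index on a non-key attribute can indeed cost more than
simply scanning the file, as the last bullet warns.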

Selection involving Comparison

• Can implement selections of the form σA≤V (r) or σA≥V (r) by using a
linear file scan or by using indices in the following ways:
• Clustering index, comparison – (relation is sorted on A)
• A clustering ordered index (for example, a clustering B+-tree
index) can be used when the selection condition is a
comparison.
• For σA≥V (r) use the index to find the first tuple ≥ v and scan
the relation sequentially from there
• For σA≤V (r) just scan the relation sequentially till the first
tuple > v; do not use the index
• Secondary index, comparison –
• For σA≥V (r) use the index to find the first index entry ≥ v and
scan the index sequentially from there, to find pointers to records.
• For σA≤V (r) just scan the leaf pages of the index finding pointers
to records, till the first entry > v
• In either case, retrieving the records that are pointed to requires
an I/O per record
• Linear file scan may be cheaper!

Implementation of Complex Selections

• So far, we have considered only simple selection conditions of the


form A op B, where op is an equality or comparison operation. We
now consider more complex selection predicates.

• Conjunction - A conjunctive selection is a selection of the form
σθ1 ∧ θ2 ∧ … ∧ θn(r).

• Disjunction - A disjunctive selection is a selection of the form
σθ1 ∨ θ2 ∨ … ∨ θn(r).

• A disjunctive condition is satisfied by the union of all records


satisfying the individual simple conditions θi.

• Negation - The result of a selection σ¬θ(r) is the set of tuples of r


for which the condition θ evaluates to false.

A7) Conjunctive selection using one index-

• We first determine whether an access path is available for an


attribute in one of the simple conditions.

• If one is, one of the selection algorithms A2 through A6 can


retrieve records satisfying that condition.

• We complete the operation by testing, in the memory buffer,


whether or not each retrieved record satisfies the remaining
simple conditions.

• To reduce the cost, we choose a θi and one of algorithms A1


through A6 for which the combination results in the least cost for
σθi(r).

• The cost of algorithm A7 is given by the cost of the chosen


algorithm.
A8) Conjunctive selection using composite index-

• An appropriate composite index (that is, an index on multiple


attributes) may be available for some conjunctive selections.
• If the selection specifies an equality condition on two or more
attributes, and a composite index exists on these combined
attribute fields, then the index can be searched directly.

• The type of index determines which of algorithms A2, A3, or A4


will be used.

A9) Conjunctive selection by intersection of identifiers-

• Another alternative for implementing conjunctive selection


operations involves the use of record pointers or record
identifiers.

• This algorithm requires indices with record pointers, on the fields


involved in the individual conditions. The algorithm scans each
index for pointers to tuples that satisfy an individual condition.

• The intersection of all the retrieved pointers is the set of pointers


to tuples that satisfy the conjunctive condition.

• The algorithm then uses the pointers to retrieve the actual


records. If indices are not available on all the individual
conditions, then the algorithm tests the retrieved records against
the remaining conditions.

Cost of Conjunctive selection by intersection of identifiers-

• The cost of algorithm A9 is the sum of the costs of the individual


index scans, plus the cost of retrieving the records in the
intersection of the retrieved lists of pointers.

• This cost can be reduced by sorting the list of pointers and


retrieving records in the sorted order.

• Thereby, (1) all pointers to records in a block come together,


hence all selected records in the block can be retrieved using a
single I/O operation, and (2) blocks are read in sorted order,
minimizing disk-arm movement.
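
A minimal in-memory sketch of algorithm A9: each index scan is assumed to return a list of (block, slot) record pointers; intersecting them and sorting the survivors gives the block-ordered retrieval described above. The pointer values are hypothetical.

def conjunctive_selection_by_intersection(pointer_lists):
    # pointer_lists: one list of (block_id, slot) pointers per simple condition
    survivors = set.intersection(*(set(p) for p in pointer_lists))
    return sorted(survivors)      # sorted order groups pointers block by block

idx_on_cond1 = [(5, 0), (7, 2), (9, 1)]            # hypothetical output of one index scan
idx_on_cond2 = [(7, 2), (9, 1), (12, 4)]           # hypothetical output of another
print(conjunctive_selection_by_intersection([idx_on_cond1, idx_on_cond2]))
# [(7, 2), (9, 1)] -- only these records are fetched, in block order
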

A10) Disjunctive selection by union of identifiers-

• If access paths are available on all the conditions of a disjunctive


selection, each index is scanned for pointers to tuples that satisfy
the individual condition.

• The union of all the retrieved pointers yields the set of pointers to
all tuples that satisfy the disjunctive condition. We then use the
pointers to retrieve the actual records.

• However, if even one of the conditions does not have an access


path, we have to perform a linear scan of the relation to find
tuples that satisfy the condition.

• Therefore, if there is even one such condition in the disjunct, the


most efficient access method is a linear scan, with the disjunctive
condition tested on each tuple during the scan.

Sorting
Introduction

• Sorting of data plays an important role in database systems for


two reasons.

• First, SQL queries can specify that the output be sorted.

• Second, and equally important for query processing, several of


the relational operations, such as joins, can be implemented
efficiently if the input relations are first sorted.
• We can sort a relation by building an index on the sort key and
then using that index to read the relation in sorted order.

• However, such a process orders the relation only logically,


through an index, rather than physically.

• Hence, the reading of tuples in the sorted order may lead to a


disk access (disk seek plus block transfer) for each record, which
can be very expensive, since the number of records can be
much larger than the number of blocks.

• For this reason, it may be desirable to order the records


physically.
Sort-Merge Algorithm
• Sorting of relations that do not fit in memory is called external
sorting.
• The most commonly used technique for external sorting is the
external sort–merge algorithm.
• Let M denote the number of blocks in the main memory buffer
available for sorting, that is, the number of disk blocks whose
contents can be buffered in available main memory.

● In the first stage, a number of sorted runs are created; each run
is sorted but contains only some of the records of the relation.
● In the second stage, the runs are merged. Suppose, for now, that
the total number of runs, N, is less than M, so that we can
allocate one block to each run and have space left to hold one
block of output. The merge stage operates as follows: read one
block of each run into its buffer block; repeatedly pick the
smallest tuple (in sort order) among the buffered blocks, move it
to the output buffer, and delete it from its input buffer block;
whenever a run's buffer block becomes empty, read the next block
of that run (if any remain); whenever the output buffer fills,
write it to disk. This continues until every run is exhausted.

● The output of the merge stage is the sorted relation. The output
file is buffered to reduce the number of disk write operations.

● The preceding merge operation is a generalization of the


two-way merge used by the standard in-memory sort–merge
algorithm; it merges N runs, so it is called an N-way merge.

● In general, if the relation is much larger than memory, there may


be M or more runs generated in the first stage, and it is not
possible to allocate a block for each run during the merge stage.

● In this case, the merge operation proceeds in multiple passes.


Since there is enough memory for M−1 input buffer blocks (plus
one block for output), each merge can take M−1 runs as input.

● The initial pass functions in this way: it merges the first M − 1
runs (using the merge procedure just described) to get a single
run for the next pass.

● Then, it merges the next M − 1 runs similarly, and so on, until it


has processed all the initial runs. At this point, the number of
runs has been reduced by a factor of M − 1.

● If this reduced number of runs is still greater than or equal to M,


another pass is made, with the runs created by the first pass as
input.
● Each pass reduces the number of runs by a factor of M − 1.

● The passes repeat as many times as required, until the number


of runs is less than M; a final pass then generates the sorted
output.
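
A compact Python sketch of the two stages, assuming the "disk" runs are just Python lists (a real implementation would write each run to a temporary file); heapq.merge performs the (M − 1)-way merge.

import heapq

def external_sort(records, M, block_size):
    capacity = M * block_size                        # tuples that fit in the sort buffer
    runs = [sorted(records[i:i + capacity])          # stage 1: create sorted runs
            for i in range(0, len(records), capacity)]
    while len(runs) > 1:                             # stage 2: repeated (M - 1)-way merges
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
    return runs[0] if runs else []

print(external_sort([7, 3, 9, 1, 8, 2, 6, 5, 4], M=3, block_size=1))
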

Cost Analysis of External Sort-Merge

● 1 block per run leads to too many seeks during merge


○ Instead use bb buffer blocks per run
○ read/write bb blocks at a time

● Total number of merge passes required: ⌈log⌊M/bb⌋−1(br/M)⌉.


● Block transfers for initial run creation as well as in each
pass is 2br
○ for final pass, we don’t count write cost
○ we ignore final write cost for all operations since the
output of an operation may be sent to the parent
operation without being written to disk

● Thus total number of block transfers for external sorting:


br(2⌈log⌊M/bb⌋−1(br/M)⌉ + 1)
Cost Analysis of External Sort-Merge

● Cost of seeks
○ During run generation: one seek to read each run and
one seek to write each run
○ 2⌈br∕M⌉

● During the merge phase


○ Need 2⌈br/bb⌉ seeks for each merge pass
○ except the final one which does not require a write

● Total number of seeks:


2⌈br/M⌉ + ⌈br/bb⌉(2⌈log⌊M∕bb⌋−1(br∕M)⌉ − 1)
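
Plugging the two formulas above into Python gives a quick calculator. It assumes br > M, so at least one merge pass is needed; the example numbers are illustrative.

import math

def external_sort_cost(b_r, M, b_b):
    fan_in = M // b_b - 1                             # runs merged per pass
    passes = math.ceil(math.log(b_r / M, fan_in))     # ⌈log_fan_in(br / M)⌉
    transfers = b_r * (2 * passes + 1)                # final write not counted
    seeks = 2 * math.ceil(b_r / M) + math.ceil(b_r / b_b) * (2 * passes - 1)
    return transfers, seeks

print(external_sort_cost(b_r=10_000, M=1_000, b_b=100))   # (50000, 320)
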

Join Operation

● In this section, we study several algorithms for computing


the join of relations, and we analyze their respective costs.

● We use the term equi-join to refer to a join of the form r


⋈r.A=s.B s, where A and B are attributes or sets of attributes of
relations r and s, respectively.

● Let’s look at an example as mentioned in the next slide


Example

● Number of records of student: nstudent = 5000.

● Number of blocks of student: bstudent = 100.

● Number of records of takes: ntakes = 10,000.

● Number of blocks of takes: btakes = 400.


Nested-Loop Join

● Figure 15.5 shows a simple algorithm to compute the theta


join, r ⋈θ s, of two relations r and s.

● This algorithm is called the nested-loop join algorithm,


since it basically consists of a pair of nested for loops.

● Relation r is called the outer relation and relation s the


inner relation of the join, since the loop for r encloses the
loop for s.

● The algorithm uses the notation tr ⋅ ts, where tr and ts are


tuples; tr ⋅ ts denotes the tuple constructed by
concatenating the attribute values of tuples tr and ts.
Nested-Loop Join

● Like the linear file-scan algorithm for selection, the


nested-loop join algorithm requires no indices, and it can
be used regardless of what the join condition is.

● Extending the algorithm to compute the natural join is


straightforward, since the natural join can be expressed as
a theta join followed by elimination of repeated attributes by
a projection.

● The only change required is an extra step of deleting


repeated attributes from the tuple tr ⋅ ts, before adding it to
the result.

● The nested-loop join algorithm is expensive, since it


examines every pair of tuples in the two relations.

● Consider the cost of the nested-loop join algorithm. The


number of pairs of tuples to be considered is nr ∗ ns, where
nr denotes the number of tuples in r, and ns denotes the
number of tuples in s.
● For each record in r, we have to perform a complete scan
on s. In the worst case, the buffer can hold only one block
of each relation, and a total of nr ∗ bs + br block transfers
would be required, where br and bs denote the number of
blocks containing tuples of r and s, respectively.
● We need only one seek for each scan on the inner relation s
since it is read sequentially, and a total of br seeks to read r,
leading to a total of nr +br seeks.

● In the best case, there is enough space for both relations to fit
simultaneously in memory, so each block would have to be read
only once; hence, only br + bs block transfers would be required,
along with two seeks.

● If one of the relations fits entirely in main memory, it is beneficial


to use that relation as the inner relation, since the inner relation
would then be read only once.

● Therefore, if s is small enough to fit in main memory, our strategy


requires only a total br + bs block transfers and two seeks—the
same cost as that for the case where both relations fit in
memory.
● Now consider the natural join of student and takes. Assume for
now that we have no indices whatsoever on either relation, and
that we are not willing to create any index.

● We can use the nested loops to compute the join; assume that
student is the outer relation and takes is the inner relation in the
join.

● We will have to examine 5000 ∗ 10,000 = 50 ∗ 10⁶ pairs of


tuples. In the worst case, the number of block transfers is 5000 ∗
400 + 100 = 2,000,100, plus 5000 + 100 = 5100 seeks.

● In the best-case scenario, however, we can read both relations


only once and perform the computation. This computation
requires at most 100 + 400 = 500 block transfers, plus two seeks
—a significant improvement over the worst-case scenario.

● If we had used takes as the relation for the outer loop and
student for the inner loop, the worst-case cost of our final
strategy would have been 10,000 ∗ 100 + 400 = 1,000,400
block transfers, plus 10,400 disk seeks.
● The number of block transfers is significantly less, and
although the number of seeks is higher, the overall cost is
reduced, assuming tS = 4 milliseconds and tT = 0.1
milliseconds.
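
The arithmetic in this example is easy to reproduce; the sketch below estimates the worst-case cost for either choice of outer relation, using the student/takes numbers and the tT = 0.1 ms, tS = 4 ms figures quoted above.

def nested_loop_cost(n_outer, b_outer, b_inner, t_T=0.1, t_S=4.0):
    transfers = n_outer * b_inner + b_outer                   # worst case: one block buffered per relation
    seeks = n_outer + b_outer
    return transfers, seeks, transfers * t_T + seeks * t_S    # estimated time in ms

print(nested_loop_cost(5_000, 100, 400))    # student outer: 2,000,100 transfers, 5,100 seeks
print(nested_loop_cost(10_000, 400, 100))   # takes outer:   1,000,400 transfers, 10,400 seeks
# ~220,410 ms vs ~141,640 ms: the saving in transfers outweighs the extra seeks.
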

Merge Join

● The merge-join algorithm (also called the sort-merge-join


algorithm) can be used to compute natural joins and equi-joins.

● Let r(R) and s(S) be the relations whose natural join is to be


computed, and let R ∩ S denote their common attributes.

● Suppose that both relations are sorted on the attributes R ∩ S.


Then, their join can be computed by a process much like the
merge stage in the merge–sort algorithm.
Merge-Join Algorithm

● Figure 15.7 shows the merge-join algorithm. In the algorithm,


JoinAttrs refers to the attributes in R ∩ S, and tr ⋈ ts, where tr and
ts are tuples that have the same values for JoinAttrs, denotes the
concatenation of the attributes of the tuples, followed by
projecting out repeated attributes.

● The merge-join algorithm associates one pointer with each


relation. These pointers point initially to the first tuple of the
respective relations.

● As the algorithm proceeds, the pointers move through the


relation.

● A group of tuples of one relation with the same value on the join
attributes is read into Ss.

● The algorithm in Figure 15.7 requires that every set of tuples Ss


fit in main memory; we discuss extensions of the algorithm to
avoid this requirement shortly.

● Then, the corresponding tuples (if any) of the other relation are
read in and are processed as they are read.

● Figure 15.8 shows two relations that are sorted on their join
attribute a1.
● It is instructive to go through the steps of the merge-join
algorithm on the relations shown in the figure.

● The merge-join algorithm of Figure 15.7 requires that each set


Ss of all tuples with the same value for the join attributes must fit
in main memory.

● This requirement can usually be met, even if the relation s is


large.

● If there are some join attribute values for which Ss is larger than
available memory, a block nested-loop join can be performed for
such sets Ss, matching them with corresponding blocks of tuples
in r with the same values for the join attributes.

● If either of the input relations r and s is not sorted on the join


attributes, they can be sorted first, and then the merge-join
algorithm can be used.

● The merge-join algorithm can also be easily extended from


natural joins to the more general case of equi-joins.
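
An in-memory sketch of the merge-join idea, with both inputs already sorted on the join attribute and every group Ss of equal-key tuples assumed to fit in memory. Tuples are simple (key, payload) pairs and the data is hypothetical.

def merge_join(r, s):
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            ss = []                                  # the set Ss: all s-tuples with this key
            while j < len(s) and s[j][0] == key:
                ss.append(s[j])
                j += 1
            while i < len(r) and r[i][0] == key:     # pair every matching r-tuple with Ss
                result.extend((key, r[i][1], t[1]) for t in ss)
                i += 1
    return result

student = [(1, 'Ana'), (2, 'Bo'), (4, 'Cy')]         # sorted on the join attribute
takes = [(1, 'CS101'), (1, 'CS202'), (2, 'CS101')]
print(merge_join(student, takes))
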
Cost Analysis of Merge-Join Algorithm

● Once the relations are in sorted order, tuples with the same
value on the join attributes are in consecutive order.

● Thereby, each tuple in the sorted order needs to be read only


once, and, as a result, each block is also read only once.

● Since it makes only a single pass through both files (assuming


all sets Ss fit in memory), the merge-join method is efficient; the
number of block transfers is equal to the sum of the number of
blocks in both files, br + bs.
● Assuming that bb buffer blocks are allocated to each relation, the
number of disk seeks required would be ⌈br∕bb⌉ + ⌈bs∕bb⌉ disk
seeks.

● Since seeks are much more expensive than data transfer, it


makes sense to allocate multiple buffer blocks to each relation,
provided extra memory is available.

● For example, with tT = 0.1 milliseconds per 4-kilobyte block, and


tS = 4 milliseconds, the buffer size is 400 blocks (or 1.6
megabytes), so the seek time would be 4 milliseconds for every
40 milliseconds of transfer time; in other words, seek time would
be just 10 percent of the transfer time.

● If either of the input relations r and s is not sorted on the join


attributes, they must be sorted first; the cost of sorting must then
be added to the above costs.

● If some sets Ss do not fit in memory, the cost would increase


slightly.

● Suppose the merge-join scheme is applied to our example of


student ⋈ takes. The join attribute here is ID.

● Suppose that the relations are already sorted on the join attribute
ID. In this case, the merge join takes a total of 400+100 = 500
block transfers.

● If we assume that in the worst case only one buffer block is


allocated to each input relation (that is, bb = 1), a total of 400 +
100 = 500 seeks would also be required; in reality bb can be set
much higher since we need to buffer blocks for only two
relations, and the seek cost would be significantly less.
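
The numbers above follow directly from the cost formulas; the sketch below also shows how raising bb shrinks the seek count while block transfers stay fixed.

import math

def merge_join_cost(b_r, b_s, b_b):
    transfers = b_r + b_s                              # each block read exactly once
    seeks = math.ceil(b_r / b_b) + math.ceil(b_s / b_b)
    return transfers, seeks

print(merge_join_cost(100, 400, b_b=1))    # (500, 500): one buffer block per relation
print(merge_join_cost(100, 400, b_b=25))   # (500, 20):  more buffering cuts seeks sharply
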

Introduction to Query Optimization in DBMS

Definition: It is an activity conducted by a query optimizer in a DBMS


to select the best available strategy for executing the query.
● Query optimization occurs after query parsing, which is the initial
step in understanding a query.
● During query optimization, various execution plans are
generated to analyze the parsed query.
● The goal is to select the execution plan with the lowest estimated
cost.
● The catalog manager assists the optimizer by supplying the
statistics used to estimate the cost of each plan.
● Ultimately, the optimizer chooses the optimum plan to execute
the query based on cost considerations.
Query Trees and Heuristics for Query Optimization

1. Query Optimization with Heuristic Rules:


a. Optimization techniques use heuristic rules to modify the
internal representation of a query for improved
performance.
b. This internal representation is typically in the form of a
query tree or a query graph data structure.
2. Initial Query Representation:
a. The SQL query goes through a scanner and parser,
generating an initial data structure that
represents the query.
3. Heuristic Optimization:
a. The initial query representation is optimized using heuristic
rules.
b. The goal is to enhance the expected performance of the
query.
4. Optimized Query Representation:
a. After optimization, a new query representation is obtained.
b. This optimized representation guides the query execution
strategy.
5. Query Execution Plan:
a. A query execution plan is generated based on the
optimized query representation.
b. This plan is designed to execute groups of operations
considering the access paths available on the files involved
in the query.
6. Example Heuristic Rule:
a. One common heuristic rule is to apply SELECT and
PROJECT operations before JOIN or other binary
operations.
b. This order is preferred because the size of the result file
from a binary operation (like JOIN) is often a multiplicative
function of the sizes of its input files, whereas SELECT and
PROJECT reduce the size of their input.
Example:
Suppose we have two large tables in a database, one with customer
information (e.g., names, addresses) and another with order details
(e.g., order IDs, products purchased). If we want to find the total sales
for customers in a specific city, we can optimize the query as follows:

● First, apply SELECT to filter out customers in the desired city.


This reduces the size of the customer table significantly.
● Then, apply PROJECT to select only the necessary columns
(e.g., customer IDs and order amounts).
● Finally, perform a JOIN with the order details table to calculate
the total sales for each customer in that city.

By following this heuristic rule, we reduce the amount of data


processed during the JOIN operation, leading to better query
performance.
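
A toy, purely in-memory illustration of this heuristic (table contents, column names and the city value are all hypothetical): filtering and projecting first means the final join step touches only a handful of small tuples.

customers = [{'cid': 1, 'name': 'Ann', 'city': 'Pune'},
             {'cid': 2, 'name': 'Raj', 'city': 'Delhi'}]
orders = [{'cid': 1, 'amount': 500}, {'cid': 2, 'amount': 300}, {'cid': 1, 'amount': 200}]

# SELECT first: keep only customers in the desired city.
in_city = [c for c in customers if c['city'] == 'Pune']
# PROJECT next: keep only the attribute the join needs.
in_city_ids = {c['cid'] for c in in_city}
# JOIN last, over the reduced relation, and aggregate.
total_sales = sum(o['amount'] for o in orders if o['cid'] in in_city_ids)
print(total_sales)   # 700
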

Notation for Query Trees and Query Graphs

1. Query Tree Overview:


a. A query tree is a tree-like data structure.
b. It corresponds to an extended relational algebra
expression, which is commonly used in database queries.
2. Structure of a Query Tree:
a. Query tree leaves represent the input relations (tables) of
the query.
b. Internal nodes represent relational algebra operations (e.g.,
SELECT, PROJECT, JOIN).
3. Execution Process:
a. Executing a query tree involves performing operations at
internal nodes.
b. Operations are executed when their operands (input
relations) become available.
c. After executing an operation, the internal node is replaced
by the resulting relation.
d. Execution order starts at the leaf nodes (input relations)
and proceeds upward to the root node (final operation).
4. Termination:
a. Query tree execution ends when the root node operation is
executed.
b. The result relation, representing the desired query output,
is produced.
Given below is a relational algebra expression for the query tree
above:

This corresponds to the following SQL query:

● In general, many different relational algebra expressions—and


hence many different query trees—can be semantically
equivalent; that is, they can represent the same query and
produce the same results.

● The query parser will typically generate a standard initial query


tree to correspond to an SQL query, without doing any
optimization

● The heuristic query optimizer will transform this initial query tree
into an equivalent final query tree that is efficient to execute.

Let’s look into this with an example:


Consider the following query Q on the database in Figure 5.5: Find the
last names of employees born after 1957 who work on a project
named ‘Aquarius’. This query can be specified in SQL as follows:
1. Initial Query Tree:
a. Figure 19.2(a) depicts the initial query tree for Q.
b. Direct execution of this tree results in a massive Cartesian
product of the entire EMPLOYEE, WORKS_ON, and
PROJECT tables.
c. This intermediate result would contain all possible
combinations of employees, projects, and work
assignments.
d. The execution of such a query would be highly inefficient in
terms of time and resources, especially if the tables are
large.
2. Improved Query Tree (Figure 19.2(b)):
a. This improved tree applies SELECT operations before
proceeding with other operations.
b. The SELECT operations act as filters, reducing the number
of tuples in the Cartesian product.
c. Essentially, it narrows down the dataset early in the
execution, making subsequent operations more efficient by
working with a smaller set of data.
3. Further Improvement (Figure 19.2(c)):
a. In Figure 19.2(c), the EMPLOYEE and PROJECT relations
are switched in the tree.
b. This change is based on the knowledge that Pnumber is a
key attribute of the PROJECT relation, meaning it uniquely
identifies each project.
c. By placing the PROJECT relation first in the query
tree(Figure 19.2(d)), it can be filtered directly by the
SELECT operation to retrieve only the 'Aquarius' project
record.
d. This reduces unnecessary computations, as only one
record from PROJECT is needed, making the query more
efficient.
4. Attribute Reduction (Figure 19.2(e)):
a. Another optimization technique is to include PROJECT (π)
operations early in the query tree.
b. The purpose of these PROJECT operations is to limit the
attributes (columns) included in intermediate relations.
c. By selecting only the necessary attributes as soon as
possible in the query execution process, you reduce the
data size that needs to be processed in subsequent
operations.

We will state some transformation rules that are useful in query


optimization, without proving them:
Heuristic Optimization of Query Trees(example with rules)

Here we revisit the simplification of the query discussed above,
this time applying the transformation rules explicitly to show how
they are used:

1. Cascade SELECT Operations:


a. Utilize Rule 1 to break down SELECT operations with
conjunctive conditions into a cascade.
b. This enhances flexibility in moving SELECT operations to
different branches of the query tree.
2. Move SELECT Operations Down:
a. Apply Rules 2, 4, 6, 10, 13, and 14 to move SELECT
operations down the query tree.
b. Move SELECT operations as far down as allowed by the
attributes involved in the selection condition.
c. If the condition involves attributes from only one table
(selection condition), move the operation to the leaf node
representing that table.
d. If the condition involves attributes from two tables (join
condition), move it after combining the two tables.
3. Optimize Leaf Node Ordering:
a. Apply Rules 5 and 9 to rearrange leaf nodes based on
specific criteria.
b. Place relations with the most restrictive SELECT operations
at the front of the query tree.
c. Ensure the leaf node order avoids generating Cartesian
Product operations.
4. Combine CARTESIAN PRODUCT into JOIN:
a. Utilize Rule 12 to transform a CARTESIAN PRODUCT
followed by a SELECT operation into a JOIN operation if
the condition represents a join condition.
5. Cascade and Optimize PROJECT Operations:
a. Apply Rules 3, 4, 7, and 11 to break down and move lists of
projection attributes down the tree.
b. Create new PROJECT operations as necessary.
c. Keep only the attributes needed in the query result and
subsequent query tree operations

Choice of Query Execution Plans


1. Pipelined Evaluation:
a. Streaming Processing : In pipelined evaluation, data is
processed in a streaming fashion, with each operation
consuming and producing data records one at a time.
There is no need to store intermediate results in temporary
tables.
b. Low Latency: Pipelined evaluation can provide lower
query latency because data flows through the pipeline
immediately. There is no delay due to storing and retrieving
intermediate results.

2. Materialized Evaluation:
a. Intermediate Results: Materialized evaluation creates and
stores intermediate results in temporary tables. These
results can be indexed and optimized, potentially leading to
better query performance.
b. Higher Latency: Due to the need to store and retrieve
intermediate results, materialized evaluation can introduce
additional latency in query execution.
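
Python generators give a convenient toy model of the difference: in the pipelined version each operator pulls one tuple at a time from its child, while the materialized version stores every intermediate result in a list (standing in for a temporary table). The relation and predicate are hypothetical.

rows = [{'id': i, 'amount': i * 60} for i in range(5)]

# Pipelined: operators are generators; tuples stream through without being stored.
def select_amount_gt(rows, threshold):
    for r in rows:
        if r['amount'] > threshold:
            yield r

def project_id(rows):
    for r in rows:
        yield r['id']

print(list(project_id(select_amount_gt(rows, 100))))   # [2, 3, 4]

# Materialized: each operator finishes and stores its full result before the next starts.
temp1 = [r for r in rows if r['amount'] > 100]          # intermediate result held in full
temp2 = [r['id'] for r in temp1]
print(temp2)                                            # [2, 3, 4]
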

Nested Subquery Optimization

1. The initial query involves a nested subquery where it checks if an


employee's department is located in a specific zip code (30332).
2. Evaluating the nested subquery for every employee in the outer
relation would be inefficient.
3. SQL optimizers aim to convert nested subqueries into more
efficient join operations when possible.

Nested Subquery Optimization (after optimization)

1. In this case, the query is unnested, resulting in a join operation


between the EMPLOYEE and DEPARTMENT tables.
2. The join condition is based on the equality of D.Dnumber
(department number) and E.Dno (employee's department
number).
3. It's an equi-join, meaning each employee's department matches
with at most one department in the DEPARTMENT table.
4. This optimization eliminates the need to evaluate the nested
subquery repeatedly for each employee and improves query
efficiency.
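
The two query forms might look as follows. This is a hedged sketch: Fname, Lname, Dno, Dnumber and Dzipcode are assumed column names, not taken from the original figure. They are written here as Python strings that could be passed to cursor.execute().

nested_form = """
SELECT E.Fname, E.Lname
FROM   EMPLOYEE E
WHERE  E.Dno IN (SELECT D.Dnumber
                 FROM   DEPARTMENT D
                 WHERE  D.Dzipcode = 30332)
"""

unnested_form = """
SELECT E.Fname, E.Lname
FROM   EMPLOYEE E JOIN DEPARTMENT D ON E.Dno = D.Dnumber
WHERE  D.Dzipcode = 30332
"""
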
Use of Selectivities in Cost-Based Optimization
Cost-based query optimization is a technique used in database
management systems to choose the most efficient execution strategy
for a given query. Here are the key points to understand about this
approach:

1. Objective: The goal of cost-based query optimization is to


select the execution strategy that minimizes the estimated cost
of running a query. This involves considering various execution
plans, algorithms, and strategies.
2. Accurate Cost Estimates: Accurate cost estimates are crucial
for fair and realistic comparisons of different execution
strategies. These cost estimates are not exact but provide a
relative measure of performance.
3. Limited Strategy Exploration: The optimizer must limit the
number of execution strategies it explores, as evaluating too
many possibilities can be time-consuming. This approach is
better suited for compiled queries where optimization is done at
compile time.
4. Compiled vs. Interpreted Queries: Compiled queries benefit
from a more elaborate optimization process, whereas interpreted
queries, where optimization occurs at runtime, require a less
time-consuming approach to avoid slowing down response
times.

5. Cost-Based Optimization: This approach is commonly known


as cost-based query optimization. It relies on traditional
optimization techniques that search for a solution that minimizes
a cost function.
6. Components of Cost: The cost functions used in query
optimization estimate factors such as space, time, and other
resource requirements.
7. Challenges: Query optimization involves several challenges,
including the cascading application of equivalence rules, the
need for quantitative measures (cost) to evaluate alternatives,
designing search strategies to find the cheapest alternatives, and
considering various access paths, join orders, and methods.
8. Scope of Optimization: The scope of query optimization is
typically within a query block, where various alternatives for table
and index access paths, join orders, and aggregation methods
are considered.
9. Catalog Information: The information needed for cost functions
is typically stored in the database management system catalog.

Let’s talk more in detail about Components of Cost and Catalog


Information

Cost Components for Query Execution


The cost of executing a query includes the following components:

1. Access Cost: Involves transferring data blocks between


secondary disk storage and memory buffers. It depends on
access structures, file organization, and block allocation.
2. Disk Storage Cost: This is the cost of storing intermediate files
generated during query execution on disk.
3. Computation Cost: Includes in-memory operations like
searching, sorting, merging, and computations on field values,
often referred to as CPU cost.
4. Memory Usage Cost: Relates to the number of main memory
buffers required during query execution.
5. Communication Cost: Encompasses shipping the query and its
results between database sites or terminals. In distributed
databases, it includes transferring tables and results among
multiple computers during query evaluation.

Catalog Information Used in Cost Functions


To estimate the costs of different execution strategies, the following
information is needed and may be stored in the DBMS catalog:

1. File Information: For each file, you need to know its size, which
includes the number of records (r), the average record size (R),
the number of file blocks (b), and the blocking factor (bfr).
2. File Organization: Understand the primary file organization,
which can be unordered, ordered by an attribute (with or without
an index), or hashed on a key attribute.
3. Index Information: Keep track of primary, secondary, or
clustering indexes and their indexing attributes. Note the number
of levels (x) in multilevel indexes and the number of first-level
index blocks (bI1) for certain cost functions.
4. Attribute Details: Collect data on the number of distinct values
(NDV) of an attribute in a relation R and the attribute selectivity
(sl), which is the fraction of records satisfying an equality
condition on the attribute. This helps estimate the selection
cardinality (s = sl * r), which is the average number of records
meeting an equality condition (see the sketch after this list).

5. Maintenance: Some parameters, like the number of index


levels, remain relatively stable. However, others, such as the
number of records (r), may change frequently due to insertions
or deletions. The optimizer requires reasonably accurate, though
not necessarily real-time, values of these parameters.
6. Value Distribution: To estimate query result sizes effectively,
having a good estimate of the value distribution is crucial.
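
The selectivity arithmetic from item 4 above in a few lines of Python, under the uniform-distribution assumption that sl = 1/NDV for an equality condition on the attribute; the numbers are illustrative.

import math

def selection_cardinality(r, ndv):
    sl = 1 / ndv                      # selectivity under the uniform-distribution assumption
    s = sl * r                        # expected number of matching records
    return sl, s

def matching_blocks(s, bfr):
    return math.ceil(s / bfr)         # blocks holding the matches if they are clustered

sl, s = selection_cardinality(r=10_000, ndv=50)
print(sl, s, matching_blocks(s, bfr=20))   # 0.02, 200.0 records, 10 blocks
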

Cost Functions for SELECT Operation


Cost functions are provided for selection algorithms S1 to S8, with
S9's cost based on S6. These functions estimate block transfers
between memory and disk, excluding computation and storage costs.
These estimates are valuable for assessing the efficiency of these
algorithms.

Note: In using the above notation in formulas, we have omitted the


relation name or attribute name when it is obvious.

1. S1—Linear Search (Brute Force): In this approach, we search


through all the file blocks to retrieve records that match a
selection condition. If the condition is based on equality, on
average, we search through half of the file blocks before finding
the record. So, if we find a matching record, the cost is roughly
(b/2), where 'b' is the total number of file blocks. If no matching
record is found, the cost is 'b'.
2. S2—Binary Search: Binary search is used when we can exploit
the order of data. It accesses approximately 'log2b' blocks plus
some additional blocks based on the selection condition and
block size. For unique (key) attributes, the cost is simply 'log2b'.

3. S3a—Using a Primary Index: If we have a primary index,


retrieving a single record involves accessing one disk block per
index level plus one more for the actual data. So, the cost is 'x +
1', where 'x' is the number of index levels.
4. S3b—Using a Hash Key: When we use a hash key, we usually
access just one disk block. The cost is approximately '1'.
However, for extendible hashing, it can be '2' disk block
accesses.
5. S4—Using an Ordering Index: If we use an ordering index and
the condition involves comparisons like >, >=, <, or <=, we
estimate the cost as 'x + (b/2)'. Roughly, half of the records
satisfy such conditions. This is a rough estimate and may vary.
6. S5—Using a Clustering Index: With a clustering index and an
equality condition, 's' records will satisfy the condition, and
approximately '(s/bfr)' file blocks will be in the cluster. So, the
cost is 'x + (s/bfr)'.
7. S6—Using a Secondary (B+-tree) Index: For a secondary
index on a unique key, with an equality condition, the cost is 'x +
1'. For a non-unique key with equality, it's 'x + 1 + s' where 's' is
the number of matching records. For range queries, the cost can
be more complex and depends on the distribution of values.

8. S7—Conjunctive Selection: We can use any of the methods


S1 to S6 for each condition in a conjunction. We retrieve records
for one condition and then check in memory if they satisfy the
remaining conditions. Multiple indexes can be used to produce
sets of record pointers that can be intersected in memory.
9. S8—Conjunctive Selection Using Composite Index: Similar
to S3a, S5, or S6a, depending on the type of index used for the
conjunction.
10. S9—Selection Using Bitmap Index: This method is used when
we can reduce the selection to a set of equality conditions. Each
equality condition corresponds to an attribute and value pair. The
cost involves accessing bit vectors for these values, where each
bit vector is 'r' bits or 'r/8' bytes long. If 's' records qualify, 's'
blocks are accessed for the data records.
11. S10—Selection Using Functional Index: This is similar to S6
but involves an index based on a function of multiple attributes. If
the function appears in the SELECT clause, the corresponding
index may be utilized.
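
A small calculator for a few of the cost functions above (block accesses only, using the notation b, x, s, bfr); the printed values use assumed figures.

import math

def s1_linear(b, found_on_key=False):      # S1: brute-force scan
    return b / 2 if found_on_key else b

def s2_binary(b):                          # S2: binary search on a key attribute
    return math.ceil(math.log2(b))

def s3a_primary_index(x):                  # S3a: one block per index level + the data block
    return x + 1

def s5_clustering_index(x, s, bfr):        # S5: x levels + ⌈s / bfr⌉ cluster blocks
    return x + math.ceil(s / bfr)

def s6_secondary_nonkey(x, s):             # S6, non-unique key: x levels + 1 + s data blocks
    return x + 1 + s

print(s1_linear(2000), s2_binary(2000), s3a_primary_index(3),
      s5_clustering_index(2, 100, 10), s6_secondary_nonkey(2, 100))
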
