
UNIT 4 Query Processing and Different types of Databases


UNIT 4

Query Processing in DBMS


Query Processing is the activity performed in extracting data from the database.

Query processing involves several steps for fetching data from the database. The steps are:
1. Parsing and translation

2. Optimization

3. Evaluation

Query processing works as follows:

1. Parsing and Translation: Query processing includes certain activities that must happen before data
can be retrieved. The user writes the query in a high-level database language such as SQL, and the
system translates it into an expression that can be used at the physical level of the file system. After
this, a variety of query-optimizing transformations are applied and the query is actually evaluated.
SQL is well suited to humans because it is readable and understandable, but it is not suitable for the
system's internal representation of a query. Relational algebra is well suited for that internal
representation. The translation step works like a parser: when a user executes a query, the parser
checks the syntax of the query and verifies that the relation names and attribute names used in the
query exist in the database. The parser then builds a tree representation of the query, known as the
'parse tree', and translates it into relational algebra. During this translation, every use of a view
in the query is replaced by the definition of that view.
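The parse-check-translate idea above can be illustrated with a toy translator. This is only a minimal sketch under heavy assumptions: a regular expression stands in for a real SQL parser, no parse tree is built, views are not expanded, and only queries of the form `SELECT attrs FROM relation WHERE attr op number` are accepted. The names `translate`, `QUERY_PATTERN`, and `catalog` are hypothetical.

```python
import re

# Hypothetical toy grammar: SELECT <attrs> FROM <relation> WHERE <attr> <op> <number>
QUERY_PATTERN = re.compile(
    r"select\s+(?P<attrs>[\w\s,]+?)\s+from\s+(?P<rel>\w+)"
    r"\s+where\s+(?P<attr>\w+)\s*(?P<op>>=|<=|<>|=|>|<)\s*(?P<val>\d+)\s*;?\s*$",
    re.IGNORECASE,
)


def translate(sql: str, catalog: dict[str, set[str]]) -> str:
    """Check the syntax and names of a simple query, then return an
    equivalent relational-algebra expression as text."""
    m = QUERY_PATTERN.match(sql.strip())
    if m is None:                                  # syntax check
        raise SyntaxError(f"unsupported or malformed query: {sql!r}")
    relation = m["rel"]
    if relation not in catalog:                    # verify the relation name
        raise NameError(f"unknown relation {relation}")
    attrs = [a.strip() for a in m["attrs"].split(",")]
    for a in attrs + [m["attr"]]:                  # verify the attribute names
        if a not in catalog[relation]:
            raise NameError(f"unknown attribute {a} in {relation}")
    condition = f"{m['attr']}{m['op']}{m['val']}"
    return f"π[{', '.join(attrs)}](σ[{condition}]({relation}))"


catalog = {"Employee": {"emp_name", "salary"}}     # hypothetical system catalog
print(translate("select emp_name from Employee where salary>10000;", catalog))
# π[emp_name](σ[salary>10000](Employee))
```

A real DBMS parser produces a full parse tree and handles joins, views, and nested queries; the sketch only shows the order of the checks described in the text.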

The working of query processing is illustrated in the following diagram:

[Figure: Steps in query processing]

Suppose a user executes a query. As we have learned, there are various methods of extracting data from
the database. In SQL, suppose a user wants to fetch the names of the employees whose salary is greater
than 10000. The following query does this:

select emp_name from Employee where salary>10000;

To make the system understand the user query, it needs to be translated into relational algebra. Either
of the following equivalent expressions represents this query:
o $\pi_{emp\_name}(\sigma_{salary > 10000}(\pi_{emp\_name,\, salary}(Employee)))$

o $\pi_{emp\_name}(\sigma_{salary > 10000}(Employee))$

After translating the given query, the system can execute each relational algebra operation using one of
several different algorithms. This is how query processing begins its work.
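To see that the two relational-algebra forms of this query give the same answer, here is a minimal sketch that implements selection (σ) and projection (π) over a list of dictionaries. The sample `Employee` rows are hypothetical, and both expressions project `emp_name`, as the SQL query does.

```python
# Hypothetical sample data for the Employee relation.
Employee = [
    {"emp_name": "Asha", "salary": 9000},
    {"emp_name": "Ben", "salary": 12000},
    {"emp_name": "Chen", "salary": 15000},
    {"emp_name": "Dev", "salary": 10000},
]


def select(rows, predicate):
    """σ: keep only the tuples that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows, attrs):
    """π: keep only the listed attributes and drop duplicate tuples
    (relational algebra works on sets)."""
    seen, result = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result


def high_salary(row):
    return row["salary"] > 10000


# π_emp_name(σ_salary>10000(Employee))
plan_a = project(select(Employee, high_salary), ["emp_name"])
# π_emp_name(σ_salary>10000(π_emp_name,salary(Employee)))
plan_b = project(select(project(Employee, ["emp_name", "salary"]), high_salary), ["emp_name"])

print(plan_a)            # [{'emp_name': 'Ben'}, {'emp_name': 'Chen'}]
print(plan_a == plan_b)  # True
```

Both expressions are correct; they differ only in the order of operations, which is exactly the freedom the optimizer exploits.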

2. Evaluation: In addition to translating the query into relational algebra, the system must annotate
the translated relational algebra expression with instructions that specify how each operation is to
be evaluated. Thus, after translating the user query, the system builds and executes a query
evaluation plan.

Query Evaluation Plan

• In order to fully evaluate a query, the system needs to construct a query evaluation plan.

• The annotations in the evaluation plan may specify the algorithm to be used for a particular
operation, or the particular index to be used for it.

• A relational algebra operation annotated with instructions on how to evaluate it is called an
Evaluation Primitive. The evaluation primitives carry the instructions needed for evaluating each
operation.

• Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query execution
plan.

• A query execution engine is responsible for generating the output of the given query. It
takes the query execution plan, executes it, and produces the output for the user
query.
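The idea of a plan as a sequence of annotated primitives run by an execution engine can be sketched as follows. This is an illustrative toy under stated assumptions: `Primitive`, `execute`, and the algorithm labels are hypothetical, and each primitive is a Python generator so that rows flow from one operation to the next (pipelining), which is one common way engines work but not the only one.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

Row = dict


@dataclass
class Primitive:
    """A relational-algebra operation annotated with how to evaluate it."""
    operation: str                                   # e.g. "σ salary>10000"
    algorithm: str                                   # annotation (hypothetical label)
    run: Callable[[Iterator[Row]], Iterator[Row]]


def execute(plan: list[Primitive], source: Iterator[Row]) -> list[Row]:
    """Toy execution engine: feed each primitive's output into the next."""
    stream = source
    for step in plan:
        stream = step.run(stream)
    return list(stream)


plan = [
    Primitive("σ salary>10000", "linear scan of Employee",
              lambda rows: (r for r in rows if r["salary"] > 10000)),
    # Projection here keeps duplicates, as SQL does without DISTINCT.
    Primitive("π emp_name", "pipelined projection",
              lambda rows: ({"emp_name": r["emp_name"]} for r in rows)),
]

employee = [{"emp_name": "Ben", "salary": 12000}, {"emp_name": "Dev", "salary": 9000}]
print(execute(plan, iter(employee)))  # [{'emp_name': 'Ben'}]
```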

3. Optimization

• Different evaluation plans for the same query can have very different costs. Because the
system, not the user, is responsible for constructing the evaluation plan, users do not need to
write their queries in the most efficient form.

• A database system therefore tries to generate an efficient query evaluation plan, one that
minimizes its cost. This task performed by the database system is known as Query
Optimization.

• To optimize a query, the query optimizer needs an estimate of the cost of each operation,
because the overall cost depends on factors such as the memory allocated to each operation
and the execution cost of each operation.

Finally, after selecting an evaluation plan, the system evaluates the query and produces the output of
the query.

Estimating Query Cost


In the previous section, we looked at the query processing steps and the evaluation plan. A system can
create multiple evaluation plans for a query, and it should choose the best of them. It does so by
comparing the possible plans in terms of their estimated cost.

For calculating the net estimated cost of any plan, the cost of each operation within a plan should be
determined and combined to get the net estimated cost of the query evaluation plan.
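The rule "sum the estimated cost of each operation, then pick the plan with the lowest total" can be written in a few lines. This is a minimal sketch; the candidate plans and their per-operation costs are hypothetical numbers chosen for illustration, not measurements.

```python
# Hypothetical per-operation cost estimates (in ms) for two candidate plans.
candidate_plans = {
    "index scan + project": {"index lookup": 12.0, "fetch matching blocks": 20.0, "project": 1.0},
    "full scan + filter + project": {"table scan": 80.0, "filter": 2.0, "project": 1.0},
}


def plan_cost(operation_costs: dict[str, float]) -> float:
    """Net estimated cost of a plan = sum of the estimated costs of its operations."""
    return sum(operation_costs.values())


best = min(candidate_plans, key=lambda name: plan_cost(candidate_plans[name]))
for name, ops in candidate_plans.items():
    print(f"{name}: {plan_cost(ops)} ms")
print("chosen:", best)
```

With these numbers the index-based plan totals 33 ms against 83 ms for the full scan, so it is chosen.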

The cost estimation of a query evaluation plan is calculated in terms of various resources that include:

o Number of disk accesses


o Execution time taken by the CPU to execute a query

o Communication costs in distributed or parallel database systems.

To estimate the cost of a query evaluation plan, we use the number of blocks transferred from the disk
and the number of disk seeks. Let $t_S$ be the average block access time (the sum of the disk seek time
and the rotational latency) and $t_T$ the average time to transfer one block of data. If a plan
transfers $b$ blocks and performs $S$ seeks, the time taken is $b \cdot t_T + S \cdot t_S$ seconds. For
example, with a block size of 4 KB and a transfer rate of 40 MB per second, $t_T = 0.1$ ms, and a typical
disk has $t_S = 4$ ms. With these values, we can easily calculate the estimated cost of a given query
evaluation plan.
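The cost formula can be turned into a small calculator. This sketch uses the text's figures (4 KB blocks, 40 MB/s, $t_S = 4$ ms) and assumes 1 MB = 1000 KB, which is what makes $t_T$ come out to 0.1 ms; the plan with 100 block transfers and 10 seeks is hypothetical.

```python
def transfer_time_ms(block_size_kb: float, rate_mb_per_s: float) -> float:
    """t_T in ms. With 1 MB = 1000 KB, KB / (MB/s) is exactly milliseconds."""
    return block_size_kb / rate_mb_per_s


def plan_cost_ms(b: int, S: int, t_T: float, t_S: float) -> float:
    """Estimated time = b * t_T + S * t_S (b block transfers, S seeks)."""
    return b * t_T + S * t_S


t_T = transfer_time_ms(4, 40)   # 0.1 ms per 4 KB block at 40 MB/s
t_S = 4.0                       # average block access time (seek + rotational latency), ms

# Hypothetical plan: 100 block transfers and 10 seeks -> 100*0.1 + 10*4 = 50 ms
print(plan_cost_ms(100, 10, t_T, t_S))
```

Note how the seeks dominate: 10 seeks cost four times as much as 100 sequential block transfers, which is why optimizers favour plans that read data sequentially.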

Generally, cost estimation considers the worst case. The estimates assume that all data must initially
be read from the disk, although some of it may already be present in main memory. This effect is
usually ignored, so the actual cost of execution often comes out lower than the estimated value.

The response time, i.e., the time required to execute the plan, could be used for estimating the cost of
the query evaluation plan. But due to the following reasons, it becomes difficult to calculate the
response time without actually executing the query evaluation plan:

o The response time depends on the contents of the buffer when the query begins its execution,
but this information is not available while the query is being optimized.

o In a system with multiple disks, the response time depends on how accesses are distributed
among the disks, which is difficult to estimate without detailed knowledge of the data layout
on the disks.

o Consequently, instead of minimizing the response time of a query evaluation plan, optimizers
generally try to minimize the total resource consumption of the plan. Thus, a good plan is one
that minimizes the resources used for disk access and any extra resources.

Distributed Database System: Overview


• A distributed database is a database that is not limited to one computer system.
• It is like a database that consists of two or more files located in different computers or sites
either on the same network or on an entirely different network.
• Instead of storing all the data in one database, data is divided and stored at different locations
or sites which do not share any physical component.

Introduction to Distributed Database System in DBMS

• A distributed database is a database system that spans multiple computers or nodes that are
connected by a network.
• Each node in a distributed database can store a portion of the data, and the entire database is
made up of the sum of the data stored on each node.
• In a distributed database, data is stored and processed in a distributed manner, and the system
ensures that the data remains consistent and available to users despite network failures or
other system errors.
• The primary goal of a distributed database is to provide high availability, scalability, and
performance to applications that require access to large amounts of data.
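The statement that each node stores a portion of the data, and the whole database is the sum of those portions, can be illustrated with a small partitioning sketch. The text does not say how data is divided, so hash partitioning is assumed here as one common scheme; the site names and employee rows are hypothetical. `zlib.crc32` is used instead of Python's `hash()` because it gives the same placement on every run.

```python
from collections import defaultdict
from zlib import crc32

NODES = ["site_A", "site_B", "site_C"]   # hypothetical site names


def node_for(key: str) -> str:
    """Pick the node that stores a tuple by hashing its key (hash partitioning)."""
    return NODES[crc32(key.encode()) % len(NODES)]


def distribute(rows: list[dict], key: str) -> dict[str, list[dict]]:
    """Split a relation into fragments, one per node."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[node_for(row[key])].append(row)
    return dict(fragments)


employees = [{"emp_id": f"E{i}", "salary": 9000 + 1000 * i} for i in range(6)]
fragments = distribute(employees, "emp_id")
for node, rows in fragments.items():
    print(node, [r["emp_id"] for r in rows])

# The whole relation is the union of the fragments stored at each node.
whole = [r for rows in fragments.values() for r in rows]
print(sorted(r["emp_id"] for r in whole) == sorted(r["emp_id"] for r in employees))  # True
```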
Need for Distributed Database in DBMS

Let's start with the databases and their types,

• A database is a structured collection of information. The data in a database can be easily
accessed, managed, modified, updated, controlled, and organized.

• Databases can be broadly classified into two types, namely Distributed and Centralized
databases.

The question here is: why do we need a Distributed Database in a DBMS?

Let's assume for a moment that we have only centralized databases.

1. All the data would be inserted into one single database, making it so large that querying even a
single record would take a lot of time.

2. Once a fault occurs, we would no longer be able to serve user requests, because we have only one
database.

3. No scaling would be possible even if we wanted it, and availability would also be lower, which in
turn affects the throughput.

Distributed databases resolve various issues, such as availability, fault tolerance, throughput,
latency, scalability, and many other problems that can arise from using a single machine and a single
database.

That's why we need distributed databases.

Distributed Databases

• A distributed database is a database that is not limited to one computer system. It is like a
database that consists of two or more files located in different computers or sites either on
the same network or on an entirely different network.

• These sites do not share any physical component. Distributed databases are needed when a
particular data in the database needs to be accessed by various users globally. It needs to be
handled in such a way that for a user it always looks like one single database.

• By contrast, a Centralized database consists of a single database file located at one site using
a single network.

• Below is a reference diagram for distributed databases.

[Figure: Reference diagram of a distributed database]

• Though there are many distributed databases to choose from, some examples of distributed
databases include Apache Ignite, Apache Cassandra, Apache HBase, Amazon
SimpleDB, Clusterpoint, and FoundationDB.
Features of Distributed Databases

In general, distributed databases include the following features:

1. Location independence: Data is stored independently at multiple sites and managed by independent
distributed database management systems (DDBMS).

2. Network linking: All distributed databases in a collection are linked by a network and
communicate with each other.

3. Distributed query processing: Distributed query processing is the procedure of answering queries
(which means mainly read operations on large data sets) in a distributed environment.

o Query processing involves the transformation of a high-level query (e.g., formulated in SQL)
into a query execution plan (consisting of lower-level query operators in some variation of
relational algebra) as well as the execution of this plan.

4. Hardware independent: The different sites where data is stored are hardware-independent.
There is no physical contact between the sites of a distributed database; this is often
accomplished through virtualization.

5. Distributed transaction management: A distributed DBMS keeps data consistent across sites
through commit protocols, distributed recovery methods, and distributed concurrency control
techniques, even when transactions fail.
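The text mentions commit protocols without naming one. Two-phase commit (2PC) is the most widely taught, so here is a minimal in-memory sketch of it. It assumes no site crashes, no message loss, no write-ahead logging, and no timeouts, all of which a real implementation must handle; `Participant`, `Vote`, and `two_phase_commit` are hypothetical names.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """A site taking part in a distributed transaction (hypothetical, in-memory)."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> Vote:
        # Phase 1: the site checks whether it is able to commit its changes.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (voting): the coordinator asks every site to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every site voted YES, otherwise abort everywhere.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


print(two_phase_commit([Participant("A"), Participant("B")]))                    # committed
print(two_phase_commit([Participant("A"), Participant("B", can_commit=False)]))  # aborted
```

The key property is atomicity across sites: either every site commits or every site aborts.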

Types of Distributed Database In Dbms

There are two types of distributed databases:

• Homogenous distributed database.

• Heterogeneous distributed database.

Homogenous Distributed Database

• A Homogenous distributed database is a network of identical databases stored on multiple
sites. All the databases store data identically: the operating system, the DDBMS, and the data
structures used are the same at all sites, which makes them easy to manage.

• Below is a diagram of a Homogenous distributed database:

[Figure: Homogenous distributed database]

Heterogeneous Distributed Database

• It is the opposite of a Homogenous distributed database. It uses different schemas, operating
systems, DDBMS, and data models, which makes it difficult to manage.
• In a Heterogeneous distributed database, a particular site can be completely unaware of the
other sites. This limits cooperation in processing user requests, which is why translations are
required to establish communication between sites.

• Below is a diagram of a Heterogeneous distributed database:

[Figure: Heterogeneous distributed database]

Note: Heterogeneous DDBMSs have local users, while Homogenous DDBMSs do not have local users.

Advantages of Distributed Database in Dbms


1. Better Reliability: Distributed databases offer better reliability than centralized databases.
When a failure occurs in a centralized database, the system comes to a complete stop. In a
distributed database, the system keeps functioning when a failure occurs; only performance is
affected, which is usually acceptable.

2. Modular Development: The system can be expanded by adding new computers and local data at a
new site and connecting them to the distributed system without interruption.

3. Lower Communication Cost: Locally storing data reduces communication costs for data
manipulation in distributed databases. In centralized databases, local storage is not possible.

4. Better Response Time: As the data is distributed efficiently in distributed databases, this
provides a better response time when user queries are met locally. While in the case of
centralized databases, all of the queries have to pass through the central machine which
increases response time.

Disadvantages of Distributed Database In Dbms


1. Costly Software: Maintaining a distributed database is costly, because ensuring data
transparency and coordination across multiple sites requires costly software.

2. Large Overhead: Many operations on multiple sites require complex and numerous
calculations, causing a lot of processing overhead.

3. Improper Data Distribution: If data is not properly distributed across different sites, then
responsiveness to user requests is affected. This in turn increases the response time.

Multimedia database
• A multimedia database is a specialized type of database that stores, manages, and retrieves
multimedia data, such as images, videos, audio, and other types of media, alongside
traditional data like text and numbers. These databases are designed to handle complex data
types that require large storage and efficient retrieval mechanisms.
• It is a collection of interrelated multimedia data that includes text, graphics (sketches,
drawings), images, animations, video, audio, etc., and it typically holds vast amounts of
multimedia data from multiple sources.
• The framework that manages different types of multimedia data which can be stored,
delivered and utilized in different ways is known as multimedia database management system.

Contents of a Multimedia Database Management System:

1. Media data – The actual data representing an object.

2. Media format data – Information such as sampling rate, resolution, encoding scheme etc.
about the format of the media data after it goes through the acquisition, processing and
encoding phase.

3. Media keyword data – Keywords describing the generation of the data. It is also
known as content-descriptive data. Example: date, time, and place of recording.

4. Media feature data – Content dependent data such as the distribution of colors, kinds of
texture and different shapes present in data.
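The four kinds of content above can be pictured as fields of one record, with the feature data used for content-based retrieval. This is a minimal sketch under assumptions: `MediaObject`, `histogram_distance`, and `most_similar` are hypothetical names, each colour histogram is a made-up three-bin (red, green, blue) proportion list, and Euclidean distance is just one simple similarity measure.

```python
import math
from dataclasses import dataclass


@dataclass
class MediaObject:
    media_data: bytes               # media data: the actual image/audio/video bytes
    media_format: dict[str, str]    # media format data: resolution, encoding scheme, ...
    keywords: dict[str, str]        # media keyword data: date, time, place, title, ...
    color_histogram: list[float]    # media feature data: distribution of colours


def histogram_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two colour histograms (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query: list[float], collection: list[MediaObject]) -> MediaObject:
    """Content-based retrieval: return the object whose features best match the query."""
    return min(collection, key=lambda m: histogram_distance(query, m.color_histogram))


collection = [
    MediaObject(b"...", {"encoding": "JPEG"}, {"title": "beach"}, [0.1, 0.2, 0.7]),
    MediaObject(b"...", {"encoding": "JPEG"}, {"title": "forest"}, [0.1, 0.8, 0.1]),
]
print(most_similar([0.0, 0.3, 0.7], collection).keywords["title"])  # beach
```

Real systems index such feature vectors with specialised structures rather than scanning every object, but the query-by-content idea is the same.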

Need for Multimedia Databases:

1. Increased Usage of Multimedia Content: With the rise of social media, entertainment,
e-commerce, and other sectors, the volume of multimedia data (images, videos, audio files, etc.)
has surged, necessitating efficient storage and retrieval systems.

2. Complex Data Handling: Traditional databases are not optimized for handling multimedia
content, which has specific storage and retrieval needs such as large file sizes, real-time access,
and complex querying.

3. Integration of Different Media Types: As industries like healthcare, education, and digital
marketing integrate various media types for better user engagement and data analysis,
multimedia databases are required to unify and manage these diverse data sources.

Advantages of Multimedia Databases:

1. Efficient Storage and Retrieval: These databases are specifically designed to store large
multimedia files and ensure efficient indexing and retrieval of content like images, audio, and
video.

2. Enhanced User Experience: By integrating multimedia data, applications provide more
engaging and interactive experiences, enhancing user interaction and accessibility.

3. Support for Rich Data Types: Multimedia databases support a variety of media types, enabling
seamless integration of text, images, sound, and video, which is critical for modern
applications in various sectors.

4. Scalable Solutions: They offer scalable architectures to handle large amounts of multimedia
data, essential in big data environments or applications with large media libraries.

Disadvantages of Multimedia Databases:

1. Complexity in Design and Management: Multimedia databases are more complex to design
and maintain than traditional databases due to the diversity of data types and the need for
specialized indexing and retrieval techniques.

2. High Storage Requirements: Multimedia files, especially high-resolution images, videos, and
audios, consume significant storage space, which can increase costs and require advanced
hardware.
3. Performance Issues: Querying large multimedia data, especially in real-time applications, can
be slower than traditional database queries, especially if the system is not optimized.

4. Limited Standardization: There is no universal standard for multimedia data management,
leading to challenges in compatibility, integration, and interoperability between different
systems and platforms.

Object-oriented databases
Object-oriented databases are a type of database management system that adds database
functionality to object-oriented programming languages, creating more manageable code bases.

Object Database Definition

An object database is managed by an object-oriented database management system (OODBMS). The
database combines object-oriented programming concepts with relational database principles.

• Objects are the basic building block and an instance of a class, where the type is either
built-in or user-defined.

• Classes provide a schema or blueprint for objects, defining the behavior.

• Methods determine the behavior of a class.

• Pointers help access elements of an object database and establish relations between objects.

The main characteristic of objects in OODBMS is the possibility of user-constructed types. An object
created in a project or application is saved into the database as is.

Object-oriented databases directly deal with data as complete objects. All the information comes in
one instantly available object package instead of multiple tables.

In contrast, the basic building blocks of relational databases, such as PostgreSQL or MySQL, are tables
with actions based on logical connections between the table data.

These characteristics make object databases suitable for projects with complex data that require an
object-oriented approach to programming. An object-oriented management system provides
functionality tailored to object-oriented programming, where complex objects are central.
This approach unifies the attributes and behaviors of data into one entity.
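The idea that an object is saved "as is", as one package rather than spread over several tables, can be imitated with Python's standard `pickle` module. This is only an analogy: `ObjectStore`, `Customer`, and `Address` are hypothetical, and a real OODBMS adds querying, transactions, and concurrency control that this sketch lacks.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Address:
    city: str
    pin: str


@dataclass
class Customer:
    name: str
    address: Address        # nested object: no separate table or join needed
    orders: list[str]


class ObjectStore:
    """Hypothetical toy object store: saves whole objects and returns them by object id."""

    def __init__(self):
        self._data: dict[int, bytes] = {}
        self._next_oid = 1

    def save(self, obj) -> int:
        oid = self._next_oid
        self._data[oid] = pickle.dumps(obj)   # the entire object graph, as is
        self._next_oid += 1
        return oid

    def load(self, oid: int):
        return pickle.loads(self._data[oid])


store = ObjectStore()
customer = Customer("Asha", Address("Pune", "411001"), ["ORD-1", "ORD-2"])
oid = store.save(customer)
print(store.load(oid))
```

The loaded object is an equal copy with its nested `Address` and order list intact; in a relational design the same data would typically span three tables joined by keys.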

Object-Oriented Programming Concepts

Object-oriented databases closely relate to object-oriented programming concepts. The four main
ideas of object-oriented programming are:

1. Polymorphism
2. Inheritance
3. Encapsulation
4. Abstraction

These four attributes describe the critical characteristics of object-oriented management systems.

1. Polymorphism: It is the capability of an object to take multiple forms. This ability allows the
same program code to work with different data types. For example, both a car and a bike are able
to brake, but the mechanism is different. In this example, the brake action is polymorphic: the
result changes depending on which vehicle performs it.
2. Inheritance: It creates a hierarchical relationship between related classes while making parts
of code reusable. A new type inherits all the fields and methods of an existing class and can
further extend them. The existing class is the parent class, while the child class extends the
parent.
For example, a parent class called Vehicle will have child classes Car and Bike. Both child
classes inherit information from the parent class and extend the parent class with new
information depending on the vehicle type.
3. Encapsulation: It is the ability to group data and mechanisms into a single object to provide
access protection. Through this process, pieces of information and details of how an object
works are hidden, resulting in data and function security. Classes interact with each other
through methods without the need to know how particular methods work.

As an example, a car has descriptive characteristics and actions. You can change the color of a
car, yet the model or make are examples of properties that cannot change. A
class encapsulates all the car information into one entity, where some elements are
modifiable while some are not.

4. Abstraction: It is the procedure of representing only the essential data features for the
needed functionality. The process selects vital information while unnecessary information
stays hidden. Abstraction helps reduce the complexity of modelled data and allows reusability.

For example, there are different ways for a computer to connect to the network. A web
browser needs an internet connection. However, the connection type is irrelevant. An
established connection to the internet represents an abstraction, whereas the various types
of connections represent different implementations of the abstraction.
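The vehicle examples above can be combined into one small class hierarchy that shows all four concepts. This is a minimal sketch; the class names follow the text, while the make, model, and brake descriptions are hypothetical values.

```python
from abc import ABC, abstractmethod


class Vehicle(ABC):
    """Abstraction: callers only need to know that every vehicle can brake."""

    def __init__(self, make: str, model: str, color: str):
        self._make = make      # encapsulated: exposed read-only below
        self._model = model
        self.color = color     # freely modifiable attribute

    @property
    def make(self) -> str:
        return self._make

    @property
    def model(self) -> str:
        return self._model

    @abstractmethod
    def brake(self) -> str:
        ...


class Car(Vehicle):            # inheritance: reuses Vehicle's fields and methods
    def brake(self) -> str:    # polymorphism: same call, different mechanism
        return "Car: disc brakes on four wheels"


class Bike(Vehicle):
    def brake(self) -> str:
        return "Bike: rim brakes on two wheels"


car = Car("Acme", "Model C", "red")
car.color = "blue"             # allowed: colour can change
for v in [car, Bike("Acme", "Model B", "black")]:
    print(v.brake())
```

Assigning `car.make = "Other"` raises `AttributeError`, because `make` is a read-only property: the class decides which details can change, as in the encapsulation example.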
Short Note on Object-Oriented Database (OODB)
An Object-Oriented Database (OODB) is a database that stores data in the form of objects, like how
data is represented in object-oriented programming (OOP). In an OODB, each object contains both
data (attributes) and methods (functions or procedures) that operate on the data. This approach allows
for more complex data structures to be stored and processed compared to traditional relational
databases.

Need for Object-Oriented Databases:

1. Complex Data Representation: Traditional relational databases store data in tables, which can
be inadequate for representing complex, real-world entities and relationships. OODBs provide
a natural way to store complex data like multimedia files, geographical data, and engineering
models.

2. Seamless Integration with Object-Oriented Programming: OODBs are ideal for applications
that use object-oriented programming languages like Java, C++, or Python. They allow for a
direct mapping between objects in the code and data stored in the database, improving
efficiency and reducing the need for complex data translation.

3. Better Modeling of Real-World Entities: Objects in OODBs can encapsulate both data and
behavior, making them suitable for applications where entities have attributes and methods
(e.g., banking systems, inventory systems).

Advantages of Object-Oriented Databases:

1. Complex Data Representation: OODBs can easily represent complex and hierarchical
relationships between data, making them suitable for applications that deal with complex data
structures (e.g., multimedia, CAD systems).

2. Data and Behavior Encapsulation: In OODBs, data is encapsulated along with the methods
that operate on it, allowing for a more natural and intuitive representation of real-world
objects.

3. Inheritance and Reusability: OODBs support inheritance, a fundamental concept of
object-oriented programming, which allows new classes (objects) to inherit properties and methods
from existing ones. This promotes reusability and reduces redundancy in the database
schema.

4. Improved Maintainability: Since objects in OODBs correspond to real-world entities with
methods, maintaining and modifying the database can be simpler, especially when changes to
the system's functionality need to be reflected in the data.

Disadvantages of Object-Oriented Databases:

1. Complexity and Steep Learning Curve: Designing and managing an OODB is more complex
compared to traditional relational databases. Developers and database administrators need
to be familiar with both OOP concepts and database management.

2. Limited Standardization: Unlike relational databases, which have well-established standards
(like SQL), object-oriented databases lack a universal query language, leading to
inconsistencies across different OODB systems.

3. Performance Issues: Object-oriented databases can suffer from performance problems,
particularly when handling large volumes of data or complex queries, as the object model can
require more processing power and memory.
4. Integration Challenges: Integrating OODBs with existing systems that use relational databases
or other data models can be difficult, as it may require significant restructuring of both the
application and database layers.

Mobile Databases in DBMS


Introduction: There are growing demands on mobile computing to support the technology used by an
increasing number of on-the-go workers. These people need to work as though they are in an office,
but in fact they are operating from remote locations, such as their homes, clients' premises, or
while travelling.

A remote worker may be present at the "office" in the form of a laptop, desktop, PDA (Personal Digital
Assistant), or other Internet-accessing device. Mobile users will soon be able to access any data from
any location at any time because of the rapid development of the mobile network, wireless media, and
satellite communications. Business protocols, practical considerations, security concerns, and expenses
could still limit communication to the point where users cannot stay connected to the internet for as
long as they would like. Mobile databases provide a remedy for some of these limitations.

Mobile Database:

A Mobile Database is a type of database that can be accessed over a mobile network and connected to
a mobile computing device (or wireless network). Here, there is a wireless connection between the
client and the server. In the modern world, Mobile Cloud Computing is expanding quickly and has
enormous potential for the database industry. Mobile databases work with a variety of devices,
including those running iOS and Android. Couchbase Lite, ObjectBox, and other popular databases are
examples of mobile databases.

Mobile Database Environment has the Following Components:

• For storing corporate data and providing corporate applications, a Corporate Database
Server and DBMS are used.

• For storing the mobile data and providing the mobile application, a Remote Database and
server are used.

• There is always a two-way communication link present between the Mobile DBMS and
Corporate DBMS.

Features of Mobile Database:

Mobile databases have the following features:

• They support the growing number of people who use laptops, smartphones, and PDAs while on the go.

• A cache is kept to prevent frequent transactions from being lost due to connection failure.

• Mobile Databases and the main database server are physically independent.

• Mobile Databases are hosted on mobile devices.

• Mobile Databases can communicate with other mobile clients or a centralized database server
from distant locations.

• Because connections may be unreliable or nonexistent, a Mobile Database lets mobile users keep
working without a wireless connection (disconnected operation).
• Information on mobile devices is analyzed and managed using a Mobile Database.

Mobile Database Consists of Three Parties Which are Described Below:

1. Fixed Hosts: With the aid of database servers, it handles transactions and manages data.

2. Mobile Units: These are portable computers that move around a geographical area; they connect
to the network through the base station (cell tower) that covers the area they are in.

3. Base Stations: These two-way radios, which are installed in fixed places, allow communication
between the stationary hosts and the mobile units.

In many instances, a user may utilize a mobile device to log in to any corporate database server and
deal with data there, depending on the specific requirements of mobile applications. While in other
cases, the user can upload data collected at the remote location to the company database or download
it and work with it on a mobile device. The interaction between the corporate and mobile databases
is frequently intermittent; a link is established only occasionally and for a brief period.

Additional Functionalities of a Mobile DBMS Consist of the Following Capabilities:

• It should communicate with the centralized (primary) database through different modes.

• Data should be replicated on mobile devices and centralized DBMS servers.

• It should capture data from the internet.

• Mobile devices should be capable of dealing with that data.

• Mobile devices must be able to analyze the data.

• It should support creating personalized and customized applications.

Limitations:

• Its wireless bandwidth is restricted.

• It is very difficult to make this database theft-proof.

• Mobile devices have limited battery power.

• Wireless communication speed suffers in mobile databases.

• It is less secure.

Short note on Mobile Database:


A mobile database is a database specifically designed to work with mobile devices, such as
smartphones and tablets, which have limited resources like storage, processing power, and network
connectivity. Mobile databases allow apps and services to store, retrieve, and manage data locally on
the device, as well as synchronize with remote databases when network connections are available.

Need for Mobile Databases:

1. Offline Access: Mobile devices often need to function without continuous internet
connectivity. Mobile databases allow users to access and manipulate data offline, syncing with
the central server when connectivity is available.

2. Mobile Application Support: With the rise of mobile apps in industries like e-commerce,
healthcare, and social media, there is a need for local storage of data to provide a responsive
user experience, even when network conditions are poor.
3. Data Synchronization: Mobile databases help ensure that data collected or modified on a
mobile device can be synchronized with central databases when the device reconnects to the
internet, ensuring consistency across platforms.
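Offline access followed by synchronization can be sketched with a local copy, a queue of pending changes, and version numbers to detect conflicts. This is a minimal sketch under assumptions: `Server`, `MobileDB`, and the version-check rule are hypothetical, and conflicts are only reported (their resolution is left to the application), whereas real sync engines offer configurable resolution strategies.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Central (corporate) database: key -> (value, version)."""
    data: dict[str, tuple[str, int]] = field(default_factory=dict)


@dataclass
class MobileDB:
    """Local database on the device plus changes queued while offline."""
    local: dict[str, tuple[str, int]] = field(default_factory=dict)
    pending: dict[str, tuple[str, int]] = field(default_factory=dict)  # key -> (value, base version)

    def write(self, key: str, value: str) -> None:
        # Keep the version the first offline edit was based on.
        if key in self.pending:
            base = self.pending[key][1]
        else:
            base = self.local.get(key, ("", 0))[1]
        self.local[key] = (value, base)      # visible locally at once, even offline
        self.pending[key] = (value, base)

    def sync(self, server: Server) -> list[str]:
        """Push queued changes; return keys that were also changed on the server."""
        conflicts = []
        for key, (value, base) in self.pending.items():
            if server.data.get(key, ("", 0))[1] != base:
                conflicts.append(key)        # modified on both ends while offline
                continue                     # resolution left to the application
            server.data[key] = (value, base + 1)
            self.local[key] = (value, base + 1)
        self.pending.clear()
        return conflicts


server = Server({"order_7": ("pending", 1)})
phone = MobileDB(local=dict(server.data))
phone.write("order_7", "delivered")          # made while offline
phone.write("note_1", "gate code 1234")
server.data["order_7"] = ("cancelled", 2)    # someone else changed it meanwhile
print(phone.sync(server))                    # ['order_7']
print(server.data["note_1"])                 # ('gate code 1234', 1)
```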

Advantages of Mobile Databases:

1. Offline Data Access: One of the key benefits is the ability to work offline. Users can access,
edit, and create data without an internet connection, providing flexibility and uninterrupted
functionality.

2. Faster Data Access: Local storage on mobile devices allows for quicker data retrieval and
manipulation, improving performance compared to relying solely on cloud or remote
databases.

3. Reduced Data Costs: By reducing the need for constant data transmission between the mobile
device and remote servers, mobile databases can help lower network usage costs, which is
especially important for users with limited data plans.

4. Improved User Experience: Mobile databases enable applications to work seamlessly and
provide a faster, more responsive experience by reducing dependence on network speed or
availability.

Disadvantages of Mobile Databases:

1. Limited Storage and Processing Power: Mobile devices have limited storage and processing
capabilities compared to desktop or server systems. This can restrict the size and complexity
of the databases stored on the device.

2. Synchronization Challenges: Syncing data between the mobile database and remote server
can be complex, especially when there are conflicts (e.g., if the same data is modified on both
ends while offline).

3. Data Security Concerns: Storing sensitive data locally on mobile devices can raise security
concerns, as mobile devices are more prone to theft or loss, which could expose private
information.

4. Increased Complexity in Development: Developing mobile apps that incorporate databases
requires additional coding and planning to handle local storage, synchronization, and offline
access, which can increase development time and complexity.
