International Journal of Scientific and Research Publications, Volume 3, Issue 5, May 2013 1

ISSN 2250-3153

Join Query Optimization in Distributed Databases

Pawandeep Kaur*, Jaspreet Kaur Sahiwal**

* M.Tech Student, CSE Department, Lovely Professional University, Phagwara, India

** Assistant Professor, CSE Department, Lovely Professional University, Phagwara, India

Abstract- Query optimization selects the execution plan that gives the best performance for a query. It is more difficult in distributed databases than in centralized databases, because queries in distributed databases are affected by factors such as the method used to insert data into the remote server and the transmission time between servers. The response time of a query depends on the transmission time and the local processing speed.

I. INTRODUCTION

As the volume of data increases day by day, it becomes harder to store all of it on a single site. Data kept on a single site also suffers from problems such as storage limitations and site failure. A distributed database is therefore needed to distribute and store the data on multiple sites. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network [5]. Data distributed over different sites is accessed with the help of queries.

Benefits of using distributed databases are:

i. Improved Performance: Because the data is stored on multiple sites, the load on any one machine decreases, which improves performance.
ii. Localization: The data is kept close to the site where it is needed, so it can be accessed in less time and the data transfer time is reduced.
iii. Availability and Reliability: In distributed database systems, the availability of the data increases because replicas of the data are distributed over different sites. Reliability also increases: if one site fails, the data can be accessed from another site that holds a replica, so the failure of one site does not make the data unavailable.
iv. Reduced communication overhead: Communication overhead is reduced in a distributed environment because a relation containing replicas of the data is available locally at each site.
v. Easier System Expansion: The capacity of a distributed database can be increased easily by adding computers to the network [7]. Because the system can coordinate a number of small machines, it can gain power comparable to that of a supercomputer.

II. QUERY OPTIMIZATION

Query optimization means executing a query in a different way so that it gives the same result but retrieves the data faster. There are alternative ways to perform a query that give the same result; the chosen way should return the result in minimum time. In distributed systems, the cost of a query plan is given by the sum of the transmission cost and the local processing cost [5]. The transmission cost depends on the speed of transferring data from one machine to another, and the local processing cost is measured in terms of CPU cycles and disk I/O. The query optimizer:
• determines the number of alternative plans
• estimates the cost of every plan using a cost model
• selects the plan with the lowest cost

A join query in distributed databases joins data from multiple sites. Optimizing a join query is simpler in centralized databases than in distributed databases. More work has been done on join queries in centralized databases, and more optimization is required in distributed databases.

III. RELATED WORK ON JOIN QUERY OPTIMIZATION IN DISTRIBUTED DATABASES

There are different methods to optimize queries in databases. These methods improve the performance of the query and decrease its cost. The optimizer determines the order in which the operations of a query (e.g. joins, selects, and projects) should be executed. One approach to join query optimization in distributed databases is to calculate the size of the data on the two machines, send the smaller table to the other site, and then perform the join there [5].

Figure 1: Minimize size of transmission data [5]
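The cost model from Section II (total cost = transmission cost + local processing cost) and the send-the-smaller-table strategy from Section III can be illustrated with a minimal sketch. The `Plan` class, the table sizes, and the cost units are hypothetical and chosen only for illustration; they do not come from the paper or from reference [5].

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    kb_shipped: float   # size of the table sent over the network (KB)
    local_cost: float   # CPU + I/O cost at the join site (hypothetical units)


def plan_cost(plan: Plan, cost_per_kb: float) -> float:
    """Total cost = transmission cost + local processing cost."""
    return plan.kb_shipped * cost_per_kb + plan.local_cost


def choose_plan(plans, cost_per_kb):
    """Return the plan with the lowest estimated total cost."""
    return min(plans, key=lambda p: plan_cost(p, cost_per_kb))


# Two candidate plans for joining table R (site 1) with table S (site 2).
# The sizes are made up; S is the smaller table.
size_r_kb, size_s_kb = 8000.0, 500.0
plans = [
    Plan("ship R to site 2", kb_shipped=size_r_kb, local_cost=40.0),
    Plan("ship S to site 1", kb_shipped=size_s_kb, local_cost=40.0),
]
best = choose_plan(plans, cost_per_kb=1.0)
print(best.name, plan_cost(best, 1.0))
```

With equal local processing cost at both sites, the cheaper plan is the one that ships the smaller table. If the local costs differed enough, the same comparison could favour the other plan.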

www.ijsrp.org
Another method for join queries in distributed databases is parallel query processing. Parallel processing doesn't focus on minimizing the quantity of transmitted data but rather on maximizing the number of simultaneous transmissions [7].

Figure 2: Parallel processing of join query

The client sends queries requesting the data from server 1 and server 2. Server 1 then sends the SUPPLY data and server 2 sends the SUPPLIER data to the client. The client inserts the data into its database and performs the join query on the data from the two servers.

If server 1 contains the SUPPLY relation as:
SUPPLY(SUPPLY_NO, FROM_PLACE, TO_PLACE)

and server 2 contains the SUPPLIER relation as:
SUPPLIER(SUPPLY_NO, S_NAME, S_ADDRESS)

and the client wants the join of the SUPPLY and SUPPLIER relations from server 1 and server 2 respectively, it performs the query Q.
Q: SELECT * FROM SUPPLY s, SUPPLIER sr WHERE s.SUPPLY_NO = sr.SUPPLY_NO

In distributed databases, query Q can be divided into three parts:
1. SELECT * FROM SUPPLY
2. SELECT * FROM SUPPLIER
3. SELECT * FROM SUPPLY s, SUPPLIER sr WHERE s.SUPPLY_NO = sr.SUPPLY_NO

Queries 1 and 2 select the data from the two source tables. Because each of these queries runs on the remote machine where its table resides, executing them does not require data transmission. Query 3 is the join query, which cannot be executed until the data on the remote sites has been transferred to the same site.

IV. OBJECTIVES OF JOIN QUERY IN DISTRIBUTED DATABASES

To perform a distributed join query that accesses data from remote sites, the objectives for optimization are:

• Size of transmitted data: This is the amount of data to be transmitted. It should be small so that less time is required for transmission.
• Transmission speed: This depends on the network speed. For wide area networks, transmission speed affects the query more.
• Local processing costs: These consist of CPU cost and I/O cost. Local processing costs vary with the processing speed of the machine.

To increase the performance of a join query, these costs should be low and the operations should be performed efficiently.

V. FACTORS FOR PARALLEL PROCESSING OF JOIN QUERY

Parallel processing of a join query in a distributed database depends on the following factors:

Time for query execution for requesting the data: the time for the client to send the requests to the servers for the data placed at them.

Time for transmitting data: The transmission time increases as the quantity of transmitted data increases. It depends on the network speed between client and server. Because the two transfers run in parallel, the response time to transmit the data is governed by the larger relation:
Max(size(SUPPLY), size(SUPPLIER))

Time for inserting data: the time taken to insert the data from the servers into the client database. Different insertion methods can be used:
• Row-by-row insertion
• Bulk insertion
Bulk insertion is better than row-by-row insertion. Other insertion methods can also be used to optimize the insertion of data.

Time for join execution: the time taken to perform the join query at the client side, joining the tables from server 1 and server 2. Optimization of the join query can be done by using:
• different join orders
• alternative “where” clause that will give the same result
• different join methods
Join execution also depends on the local processing cost of the query, such as CPU cost and I/O cost.

To perform a distributed join query that accesses data from remote sites, costs should be low so that the performance of the join query increases. If a machine wants the result of a join over data that is present at different machines, then the transmission costs and the insertion costs are very important. Inserting the servers' data into the client database takes more time than transmitting the data. Therefore, the important objective is to improve the performance of a join query that accesses data from two different machines by using insertion methods that take less time.
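The two insertion methods named in Section V can be contrasted in a minimal sketch. The paper does not name a DBMS, so Python's built-in `sqlite3` module stands in for the client database here as an assumption, and the rows are made up. The sketch only shows the two calling patterns; how much faster bulk insertion is depends on the DBMS and driver actually used.

```python
import sqlite3

# Hypothetical rows as they might arrive from server 1 (SUPPLY relation).
rows = [(i, f"from_{i}", f"to_{i}") for i in range(1000)]


def make_client_db():
    """Create an in-memory stand-in for the client database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SUPPLY (SUPPLY_NO INTEGER, FROM_PLACE TEXT, TO_PLACE TEXT)"
    )
    return conn


def insert_row_by_row(conn, rows):
    """Row-by-row insertion: one execute() call per row."""
    for row in rows:
        conn.execute("INSERT INTO SUPPLY VALUES (?, ?, ?)", row)
    conn.commit()


def insert_bulk(conn, rows):
    """Bulk insertion: all rows passed to a single executemany() call."""
    conn.executemany("INSERT INTO SUPPLY VALUES (?, ?, ?)", rows)
    conn.commit()


conn_a = make_client_db()
insert_row_by_row(conn_a, rows)
conn_b = make_client_db()
insert_bulk(conn_b, rows)

count_a = conn_a.execute("SELECT COUNT(*) FROM SUPPLY").fetchone()[0]
count_b = conn_b.execute("SELECT COUNT(*) FROM SUPPLY").fetchone()[0]
print(count_a, count_b)
```

Both methods leave the same rows in the table; they differ only in how many separate statement executions the client issues, which is where the insertion time is spent.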

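Section V lists "different join methods" as one way to reduce join execution time. A hash join is one commonly used method, shown here as a minimal sketch that joins SUPPLY and SUPPLIER rows on SUPPLY_NO in client memory, without first inserting them into a database table. The function name `hash_join` and the sample rows are hypothetical; this is an illustration, not the authors' implementation.

```python
from collections import defaultdict


def hash_join(supply_rows, supplier_rows):
    """Join SUPPLY and SUPPLIER tuples on SUPPLY_NO (first field of each)."""
    # Build phase: index the SUPPLIER rows by their join key.
    index = defaultdict(list)
    for sr in supplier_rows:
        index[sr[0]].append(sr)

    # Probe phase: look up each SUPPLY row's key and emit the concatenated
    # rows, mirroring
    #   SELECT * FROM SUPPLY s, SUPPLIER sr WHERE s.SUPPLY_NO = sr.SUPPLY_NO
    result = []
    for s in supply_rows:
        for sr in index.get(s[0], []):
            result.append(s + sr)
    return result


# Made-up rows standing in for the data fetched from server 1 and server 2.
supply = [(1, "Delhi", "Mumbai"), (2, "Pune", "Agra"), (3, "Goa", "Kochi")]
supplier = [(1, "Asha", "Sector 4"), (3, "Ravi", "MG Road"), (4, "Mina", "Park St")]

joined = hash_join(supply, supplier)
for row in joined:
    print(row)
```

Only SUPPLY_NO values present on both sides (1 and 3 in this sample) appear in the result. Each input is scanned once, so the work grows with the combined size of the two inputs plus the number of matching pairs.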
VI. PROPOSED WORK

The first method for the join query is to transfer the data from the servers to the client, insert the data into the client database, and then produce the results by performing the join query at the client site on its own database. The time for this method equals the sum of the time to fetch the data from the server sites, the time to insert the data into the client database, and the join query processing time.

The second method, which is the proposed work, produces the results by performing the join query at the client site without inserting the data into its database. The join query in the proposed work takes the data directly from the servers. The time for the proposed method depends on the time to fetch the data from the servers and the join query execution time.

Both methods will then be compared on their performance. In the proposed method, the insertion time of data into the client database is eliminated. Therefore, the join query will be optimized in distributed databases.

VII. CONCLUSION

This paper presents join query optimization in distributed databases. One method for the join query is to transfer the data from the servers to the client site, insert the data into the client database, and then perform the join query. The proposed method performs the join query at the client site directly after fetching the data from the server sites, without inserting the data into the client database. The proposed method therefore eliminates the insertion time of data into the client database, and in this way optimizes the join query in distributed databases.

REFERENCES

[1] Aljanaby Alaa, Abuelrub Emad, and Odeh Mohammed, “A Survey of Distributed Query Optimization”, The International Arab Journal of Information Technology, Vol. 2, No. 1.
[2] Mullins Craig S., “Distributed Query Optimization”, Technical Support.
[3] Nicoleta Iacob, “Distributed Query Optimization”, PhD Student, University of Piteşti, Issue 4/2010.
[4] Ioannidis Y. E. and Kang Y. C., “Randomized Algorithms for Optimizing Large Join Queries”, in Proceedings of the ACM SIGMOD Conference on Management of Data, Atlantic City, USA, pp. 312-321.
[5] Ghaemi Reza, Amin Milani Fard, Tabatabaee Hamid, and Sadeghizadeh Mahdi, “Evolutionary Query Optimization for Heterogeneous Distributed Database Systems”, World Academy of Science, Engineering and Technology 19.
[6] Mor Jyoti, Kashyap Indu, Rathy R. K., “Analysis of Query Optimization Techniques in Databases”, International Journal of Computer Applications (0975 – 888) Volume 47– No.15.
[7] Jiang Shun, “Optimizing Join Query in Distributed Database”, University of North Carolina, Wilmington.
[8] Valduriez Patrick, “Join Indices”, ACM Transactions on Database Systems, Vol. 12, No. 2.
[9] Ioannidis Y. E. and Kang Y. C., “Left-Deep vs. Bushy Trees: An Analysis of Strategy Spaces and its Implications for Query Optimization”, SIGMOD Conference 1991: 168-177.
[10] Sukheja Deepak, Singh Umesh Kumar (July 2011), “A Novel Approach of Query Optimization for Distributed Database Systems”, IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 4, No 1.

AUTHORS

First Author – Pawandeep Kaur, M.Tech Student, CSE Department, Lovely Professional University, Phagwara, India. email: [email protected]
Second Author – Jaspreet Kaur Sahiwal, Assistant Professor, CSE Department, Lovely Professional University, Phagwara, India. email: [email protected]
