Unit-2_Query Processing in Distributed DBMS
Query processing in a distributed database management system
requires the transmission of data between the computers in a
network. A distribution strategy for a query is the ordering of data
transmissions and local data processing in a database system.
Generally, a query in a distributed DBMS requires data from
multiple sites, and moving that data between sites incurs
communication costs.
Query processing in a distributed DBMS differs from query processing
in a centralized DBMS because of the communication cost of
transferring data over the network.
Suppose a user sends a query to site S1, and the query requires data from
S1 itself and from another site S2. There are three strategies to process
this query:
1. Transfer the data from S2 to S1 and then process the query at S1.
2. Transfer the data from S1 to S2 and then process the query at S2.
3. Transfer the data from both S1 and S2 to a third site S3 and then
process the query at S3.
The choice depends on factors such as the sizes of the relations and of
the result, the communication cost between the sites, and the site at
which the result will be used.
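The comparison between the three strategies can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `transfer_costs`, the byte counts, and the rule that the result must be shipped once more whenever it is produced at a site other than the one that needs it are assumptions made for the example, and the cost is measured purely as bytes transferred.

```python
def transfer_costs(size_s1, size_s2, result_size, result_site):
    """Bytes moved over the network for each of the three strategies.

    size_s1, size_s2: bytes of data the query needs from S1 and S2.
    result_size:      bytes of the query result.
    result_site:      site where the result will be used ("S1", "S2" or "S3").
    """
    def with_result(shipped, processing_site):
        # Assumption: if the result is produced elsewhere, it must be
        # shipped to the site that uses it.
        extra = 0 if processing_site == result_site else result_size
        return shipped + extra

    return {
        "S2 -> S1": with_result(size_s2, "S1"),
        "S1 -> S2": with_result(size_s1, "S2"),
        "S1, S2 -> S3": with_result(size_s1 + size_s2, "S3"),
    }


# Hypothetical sizes, chosen only to illustrate the comparison.
costs = transfer_costs(size_s1=10_000, size_s2=1_500,
                       result_size=4_000, result_site="S1")
best = min(costs, key=costs.get)
print(costs)
print("cheapest strategy:", best)
```

With these made-up numbers, shipping the smaller relation to the site that also uses the result is cheapest. Changing `result_site` or the sizes can change the outcome, which is why the choice has to be made per query.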
Site2: DEPARTMENT (DID, DNAME)
DID - 10 bytes
DNAME - 20 bytes
Total records - 50
Record size - 30 bytes
Example:
1. Find the names of employees and their department names.
Also, find the amount of data transferred to execute this query when
the query is submitted to Site 3.
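The arithmetic behind this kind of question can be sketched as follows. The DEPARTMENT figures (50 records of 30 bytes) come from the notes above. The EMPLOYEE figures are not given here, so the values used below are hypothetical placeholders, and the sketch assumes the simplest strategy, in which both relations are stored at other sites and shipped whole to Site 3.

```python
def relation_size(records: int, record_size: int) -> int:
    """Total size of a relation in bytes."""
    return records * record_size


# DEPARTMENT at Site 2: DID (10 bytes) + DNAME (20 bytes), 50 records.
DEPT_SIZE = relation_size(records=50, record_size=10 + 20)


def ship_both_to_query_site(emp_size: int, dept_size: int = DEPT_SIZE) -> int:
    """Bytes transferred if both relations are sent whole to the query site."""
    return emp_size + dept_size


# Hypothetical EMPLOYEE figures, NOT taken from the notes.
hypothetical_emp_size = relation_size(records=1_000, record_size=60)

print("DEPARTMENT size:", DEPT_SIZE, "bytes")
print("Transfer to Site 3:", ship_both_to_query_site(hypothetical_emp_size), "bytes")
```

Substituting the actual EMPLOYEE record count and record size gives the transfer for this strategy. The other strategies are evaluated the same way, by adding up the relations and intermediate results that cross the network.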
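Let's say that we have two tables R1 and R2 at sites S1 and S2,
respectively. In a semi-join, we forward only the joining column of one
table, say R1, to the site where the other table, say R2, is located.
This column is joined with R2 at that site, and only the matching tuples
of R2 are sent back to be joined with R1, so R2 is reduced before it is
transferred.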
The decision whether to reduce R1 or R2 can be made only after
comparing the benefit of reducing R1 with that of reducing R2.
Thus, semi-join is an efficient way to reduce the amount of data
transferred in distributed query processing.
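A minimal in-memory sketch of the semi-join idea is shown below. It simulates the two sites with plain Python lists. The function name `semi_join`, the `ENAME` column, and the sample rows are hypothetical. The final step, shipping the reduced R2 back to S1 and completing the join there, is the usual way a semi-join is finished and is assumed here.

```python
def semi_join(r1, r2, key):
    """Simulate a semi-join of R1 (at S1) with R2 (at S2) on column `key`."""
    # Step 1 (at S1): project the joining column of R1, without duplicates.
    join_values = {row[key] for row in r1}

    # Step 2 (at S2): after receiving the column, keep only matching R2 rows.
    reduced_r2 = [row for row in r2 if row[key] in join_values]

    # Step 3 (at S1): after receiving the reduced R2, complete the join.
    by_key = {}
    for row in reduced_r2:
        by_key.setdefault(row[key], []).append(row)
    result = [{**a, **b} for a in r1 for b in by_key.get(a[key], [])]

    return join_values, reduced_r2, result


# Hypothetical sample data.
r1 = [
    {"ENAME": "Asha", "DID": "D1"},
    {"ENAME": "Ravi", "DID": "D2"},
    {"ENAME": "Meena", "DID": "D1"},
]
r2 = [
    {"DID": "D1", "DNAME": "Sales"},
    {"DID": "D2", "DNAME": "HR"},
    {"DID": "D3", "DNAME": "Finance"},
    {"DID": "D4", "DNAME": "Legal"},
]

join_values, reduced_r2, result = semi_join(r1, r2, key="DID")
print(len(r2), "rows in R2,", len(reduced_r2), "rows shipped back")
for row in result:
    print(row["ENAME"], row["DNAME"])
```

In this toy data only two of the four R2 rows have to travel back to S1. Whether that saving outweighs the cost of first sending the joining column depends on the sizes involved, which is the comparison described above.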