
CS 186

Spring 2023 Parallel Query Processing

1 Introduction
Up until now, we have assumed that our database is running on a single computer. For modern
applications that deal with millions of requests over terabytes of data, it would be impossible for
one computer to quickly respond to all of those requests. We need to figure out how to run our
database on multiple computers. We will call this parallel query processing because a query
will be run on multiple machines in parallel.

2 Parallel Architectures
How are these machines connected together? The most straightforward option would probably be
to have every CPU share memory and disk. This is called shared memory.

Another option is for each CPU to have its own memory, but all of them share the same disk. This
is called shared disk.

These architectures are easy to reason about, but sharing resources holds back the system. It is
possible to achieve a much higher level of parallelism if all the machines have their own disk and
memory, because no machine has to wait for a shared resource to become available. This architecture
is called shared nothing and will be the architecture we use throughout the rest of the note.

CS 186, Spring 2023, Course Notes 1 Brian DeLeonardis and Jeremy Dong

In shared nothing, the machines communicate with each other solely through the network by passing
messages to each other.

3 Types of Parallelism
The focus of this note will be on intra-query parallelism. Intra-query parallelism attempts to
make one query run as fast as possible by spreading the work over multiple computers. The other
major type of parallelism is inter-query parallelism, which gives each machine different queries to
work on so that the system can achieve high throughput and complete as many queries as possible.
This may sound simple, but doing it correctly is actually quite difficult, and we will address how to
do it when we get to the module on concurrency.

3.1 Types of Intra-query Parallelism

We can further divide intra-query parallelism into two classes: intra-operator and inter-operator.
Intra-operator parallelism makes one operator run as quickly as possible. An example of intra-operator
parallelism is dividing up the data onto several machines and having them sort the data in parallel.
This parallelism makes sorting (one operation) as fast as possible. Inter-operator parallelism
makes a query run as fast as possible by running the operators in parallel. For example, imagine
our query plan looks like this:

[Figure omitted: a query plan with separate branches that sort R and sort S]


One machine can work on sorting R and another machine can sort S at the same time. In
inter-operator parallelism, we parallelize the entire query, not individual operators.

3.2 Types of Inter-operator Parallelism

The last distinction we will make is between the different forms of inter-operator parallelism. The
first type is pipeline parallelism. In pipeline parallelism, each record is passed to the parent
operator as soon as the child operator is done with it. The parent operator can work on a record that
its child has already processed while the child operator is working on a different record. As an
example, consider the query plan:

[Figure omitted: a query plan in which a filter operator feeds a project operator]

In pipeline parallelism, the project and filter can run at the same time: as soon as the filter
finishes a record, the project can operate on it while the filter picks up a new record. The
other type of inter-operator parallelism is bushy tree parallelism, in which different branches of
the tree are run in parallel. The example in Section 3.1, where we sort the two files independently,
is an example of bushy tree parallelism. This is another example of bushy tree parallelism:

[Figure omitted: a bushy query plan whose left branch joins R with S and whose right branch joins T with U]


In bushy tree parallelism, the left branch and right branch can execute at the same time. For
instance, scanning R, S, T, U can all happen at the same time. After that, joining R with S and
joining T with U can happen in parallel as well.
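The bushy plan described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how a real parallel database schedules work: the functions `scan`, `hash_join`, and `bushy_plan`, the toy tables, and the use of an in-memory hash join for every join are all hypothetical, and each thread in a `ThreadPoolExecutor` merely stands in for a separate machine. (In CPython, threads will not speed up CPU-bound work; the sketch only shows which steps are allowed to overlap.)

```python
from concurrent.futures import ThreadPoolExecutor


def scan(table):
    # Hypothetical stand-in for a scan operator: materialize the table's rows.
    return list(table)


def hash_join(left, right, left_key, right_key):
    # Simple in-memory hash join: build on `left`, probe with `right`.
    # Each output row is the left tuple concatenated with the right tuple.
    index = {}
    for row in left:
        index.setdefault(row[left_key], []).append(row)
    return [l_row + r_row for r_row in right for l_row in index.get(r_row[right_key], [])]


def bushy_plan(R, S, T, U):
    # Each worker thread plays the role of a separate machine (an assumption).
    with ThreadPoolExecutor(max_workers=4) as pool:
        # The four scans are independent, so they can all run at once.
        r, s, t, u = pool.map(scan, [R, S, T, U])
        # The two lower joins sit on different branches, so they can also overlap.
        rs_job = pool.submit(hash_join, r, s, 1, 0)  # R.b = S.b
        tu_job = pool.submit(hash_join, t, u, 1, 0)  # T.d = U.d
        rs, tu = rs_job.result(), tu_job.result()
    # The root join needs both branches, so it can only start after they finish.
    return hash_join(rs, tu, 3, 0)  # (R join S).c = T.c


# Toy tables (hypothetical): R(a, b), S(b, c), T(c, d), U(d, e).
R = [(1, "x"), (2, "y")]
S = [("x", 10), ("y", 20)]
T = [(10, "p"), (30, "q")]
U = [("p", True), ("q", False)]
result = bushy_plan(R, S, T, U)
```

Only the row with c = 10 survives the root join, so `result` holds a single joined tuple.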

4 Partitioning
Because each machine has its own disk, we have to decide what data is stored on which
machine. In this class, each data page will be stored on only one machine. In industry this is called
sharding, and it is used to achieve better performance. If each data page appeared on multiple
machines, it would be called replication. This technique is used to achieve better availability (if
one machine goes down, another can handle requests), but it presents a host of other challenges
that we will not study in depth in this class.

To decide what machines get what data we will use a partitioning scheme. A partitioning
scheme is a rule that determines what machine a certain record will end up on. The three we will
study are range partitioning, hash partitioning, and round-robin.

4.1 Range Partitioning

In a range partitioning scheme, each machine gets a certain range of values that it will store (e.g.,
machine 1 will store values 1-5, machine 2 will store values 6-10, and so on).

This scheme is very good for queries that look up a specific key (and especially good for range
queries, compared to the other schemes we’ll talk about) because you only need to request data from
the machines that the values reside on. It’s one of the schemes we use for parallel sorting and
parallel sort merge join.
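A minimal sketch of range partitioning, assuming machines are numbered from 0 and described by a sorted list of upper bounds (the function names `range_partition` and `machines_for_range` are hypothetical). The second function shows why range queries are cheap here: only the machines whose ranges overlap the query need to be contacted.

```python
import bisect


def range_partition(values, upper_bounds):
    """Machine i stores values <= upper_bounds[i] (and above the previous bound);
    values above the last bound go to the final machine."""
    machines = [[] for _ in range(len(upper_bounds) + 1)]
    for v in values:
        machines[bisect.bisect_left(upper_bounds, v)].append(v)
    return machines


def machines_for_range(lo, hi, upper_bounds):
    """Indices of the only machines a query for lo <= v <= hi has to contact."""
    first = bisect.bisect_left(upper_bounds, lo)
    last = bisect.bisect_left(upper_bounds, hi)
    return list(range(first, last + 1))


bounds = [5, 10]  # machine 0: 1-5, machine 1: 6-10, machine 2: above 10
parts = range_partition([1, 4, 6, 10, 11, 3], bounds)
```

With these bounds, a query for values 6 through 9 touches only machine 1.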

4.2 Hash Partitioning

In a hash partitioning scheme, each record is hashed and sent to the machine that corresponds to that
hash value. This means that all like values will be assigned to the same machine (e.g., if value 4 goes
to machine 1, then all of the 4s must go to that machine), but it makes no guarantees about where
close values go.


It will still perform well for key lookup, but not for range queries (because every machine will
likely need to be queried). Hash partitioning is the other scheme of choice for parallel hashing and
parallel hash join.
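The sketch below illustrates both properties of hash partitioning: equal values always land on the same machine, and a point lookup touches exactly one machine. It assumes each machine is modeled as a Python list and that `zlib.crc32` plays the role of the hash function; the helper names and the toy `students` rows are hypothetical.

```python
import zlib


def machine_for(key, n_machines):
    # zlib.crc32 is used because Python's built-in hash() of a str is
    # randomized per process, and every machine must agree on the mapping.
    return zlib.crc32(str(key).encode("utf-8")) % n_machines


def hash_partition(records, key, n_machines):
    machines = [[] for _ in range(n_machines)]
    for rec in records:
        machines[machine_for(key(rec), n_machines)].append(rec)
    return machines


# Toy (sid, name) rows, hash partitioned on name across 5 machines.
students = [(1, "Josh Hug"), (2, "Ada"), (3, "Josh Hug"), (4, "Grace")]
parts = hash_partition(students, key=lambda rec: rec[1], n_machines=5)

# A point lookup on name only needs to visit the one machine the key hashes to.
target = machine_for("Josh Hug", 5)
matches = [rec for rec in parts[target] if rec[1] == "Josh Hug"]
```

A range predicate on name, by contrast, would still have to visit every machine, because nearby names hash to unrelated machines.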

4.3 Round Robin Partitioning

The last scheme we will talk about is called round robin partitioning. In this scheme we go
record by record and assign each record to the next machine. For example the first record will be
assigned to the first machine, the second record will be assigned to the second machine and so on.
When we reach the final machine we will assign the next record to the first machine.

This may sound dumb, but it has a nice property: every machine is guaranteed to get the same
amount of data (to within one record). This scheme will actually achieve maximum parallelization.
The downside, of course, is that every machine will need to be activated for every query.
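Round-robin assignment is simply the record's position modulo the number of machines. A minimal sketch, with a hypothetical helper name and each machine modeled as a list:

```python
def round_robin_partition(records, n_machines):
    machines = [[] for _ in range(n_machines)]
    for i, rec in enumerate(records):
        # Record i goes to machine i mod n, wrapping back to machine 0.
        machines[i % n_machines].append(rec)
    return machines


parts = round_robin_partition(list(range(10)), 3)
```

The machine sizes differ by at most one record, which is the even-distribution property described above.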

5 Network Cost
So far in this class we have only measured performance in terms of IOs. When we have multiple
machines communicating over a network, however, we also need to consider the network cost.
The network cost is how much data we need to send over the network to do an operation. Here,
the network can be thought of as the medium through which machines communicate. It is different from
memory and disk, which is why network cost is counted separately from I/O cost. Network cost is
incurred whenever one machine sends data to another machine. In this class it is usually measured
in KB. One important thing to note is that there is no requirement that entire pages must be sent
over the network (unlike when going from memory to disk), so it is possible for the network cost
to be just 1 record’s worth of data.


6 Partitioning Practice Questions

Assume that we have 5 machines and a 1000 page students(sid, name, gpa) table. Initially, all
of the pages start on one machine. Assume pages are 1KB.

1) How much network cost does it take to round-robin partition the table?

2) How many IOs will it take to execute the following query:

SELECT * FROM students WHERE name = 'Josh Hug';

3) Suppose that instead of round-robin partitioning the table, we hash partitioned it on the name
column. How many IOs would the query from part 2 take?

4) Assume that an IO takes 1ms and the network cost is negligible. How long will the query in part
2 take if the data is round-robin partitioned and if the data is hash partitioned on the name column?

Now assume in the general case that we have n machines and p pages, all pages start on one
machine, and each page is k KB. Express answers in terms of n, p, and k.

5) How much network cost does it take to round-robin partition the table?

6) What is the minimum possible network cost to hash partition the table?

7) What is the maximum possible network cost to hash partition the table?

8) Now assume pages are randomly distributed across machines instead of all starting on one
machine. Let the i-th machine contain $p_i$ pages such that $\sum_{i=1}^{n} p_i = p$. How much network
cost does it take to hash partition the table across all machines in the average case?

7 Partitioning Practice Solutions

1) In round-robin partitioning the data is distributed completely evenly. This means that the
machine the data starts on will be assigned 1/5 of the pages, so 4/5 of the pages will need to move
to different machines. The answer is: 4/5 * 1000 * 1 = 800 KB.

2) When the data is round robin partitioned we have no idea what machine(s) the records we
need will be on. This means we will have to do full scans on all of the machines so we will have to
do a total of 1000 IOs.

3) When the data is hash partitioned on the name column, we know exactly which machine to
go to for this query (we can calculate the hash value of ’Josh Hug’ and find out which machine is
assigned that hash value). This means we only have to do a full scan over 1 machine, for a total
of 200 IOs. Of course, this assumes all records with value ’Josh Hug’ can fit on one machine.

4) Under both partitioning schemes it will take 200ms. Each machine will take the same amount
of time to do a full scan, and all machines can be run at the same time, so for runtime it doesn’t
matter how many machines we need to query.¹

5) We send $\frac{n-1}{n}$ of the pages to other machines, leaving $\frac{1}{n}$ at the starting
machine. Therefore, the network cost is $\frac{pk(n-1)}{n}$.

6) In the minimum-cost case, all tuples hash to the starting machine, so we do not need to send any
tuples to other machines, resulting in a minimum network cost of 0.

7) In the maximum-cost case, all tuples hash to other machines, so we need to send all tuples to
other machines, resulting in a maximum network cost of pk.

8) In the average case, each machine will hash $\frac{n-1}{n}$ of its pages to other machines,
resulting in a network cost of $\frac{n-1}{n}\sum_{i=1}^{n} p_i k = \frac{pk(n-1)}{n}$.

8 Parallel Sorting
Let’s start speeding up some algorithms we already know by parallelizing them. There are two
steps for parallel sorting:

1. Range partition the table

2. Perform local sort on each machine

We range partition the table because then once each machine sorts its data, the entire table is in
sorted order (the data on each machine can be simply concatenated together if needed).

Generally, it will take (1 pass to partition the table across machines) + (number of passes needed
to sort the table) passes to parallel sort a table. With N pages in the table, B buffer pages on each
machine, and m machines, this is $1 + \lceil 1 + \log_{B-1} \lceil N/(mB) \rceil \rceil$ passes in the
best case (when every machine receives N/m pages).
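The two steps, together with the best-case pass count, can be sketched as follows. This assumes the range boundaries are already known, and that N is the table size in pages, B the buffer pages per machine, and m the number of machines; all function names are hypothetical. The local sorts run one after another here, whereas on a real cluster each machine would sort concurrently. `merge_passes` computes the ceiling of the logarithm with integer arithmetic to avoid floating-point rounding.

```python
import bisect


def parallel_sort(values, upper_bounds):
    # Step 1: range partition (machine i holds values <= upper_bounds[i]).
    machines = [[] for _ in range(len(upper_bounds) + 1)]
    for v in values:
        machines[bisect.bisect_left(upper_bounds, v)].append(v)
    # Step 2: each machine sorts its own partition (sequentially in this sketch).
    for partition in machines:
        partition.sort()
    # Because the ranges are ordered, concatenating the partitions gives sorted output.
    return [v for partition in machines for v in partition]


def ceil_div(a, b):
    return -(-a // b)


def merge_passes(runs, fan_in):
    # Integer version of ceil(log_fan_in(runs)).
    passes = 0
    while runs > 1:
        runs = ceil_div(runs, fan_in)
        passes += 1
    return passes


def parallel_sort_passes(N, B, m):
    # 1 partitioning pass + a local external sort of N/m pages with B buffers.
    local_sort = 1 + merge_passes(ceil_div(N, m * B), B - 1)
    return 1 + local_sort


sorted_values = parallel_sort([5, 1, 9, 7, 3, 12, 6], [5, 10])
```

For example, a 400-page table on 4 machines with 30 buffer pages each needs 1 partitioning pass plus 2 local sort passes.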

¹ In reality, there may be some variance in how long each machine takes to complete a scan, and the
likelihood of a "straggler" (i.e., a slow machine) increases with the number of machines involved in a
query, but for simplicity we ignore this detail.

9 Parallel Hashing
Parallel hashing is very similar to parallel sorting. The two steps are:

1. Hash partition the table

2. Perform local hashing on each machine

Hash partitioning the table guarantees that like values will be assigned to the same machine.
It would also be valid to range partition the data (because it has the same guarantee). However,
in practice, range partitioning is a little harder than hash partitioning (how do you come up with
the ranges efficiently?), so we use hash partitioning when possible.
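As one concrete use of parallel hashing, here is a sketch of a hypothetical GROUP BY value, COUNT(*) computed in parallel. It assumes `zlib.crc32` as the partitioning hash and uses a `Counter` as each machine's local hash table. Because hash partitioning sends every copy of a value to a single machine, the local tables never share a key and can be merged without further combining.

```python
import zlib
from collections import Counter


def parallel_hash_count(values, n_machines):
    """Illustrative sketch: returns one local hash table (Counter) per machine."""
    # Step 1: hash partition, so all copies of a value land on one machine.
    machines = [[] for _ in range(n_machines)]
    for v in values:
        machines[zlib.crc32(str(v).encode("utf-8")) % n_machines].append(v)
    # Step 2: each machine builds a local hash table over its own partition.
    return [Counter(partition) for partition in machines]


local_tables = parallel_hash_count(["a", "c", "a", "b", "c", "a"], n_machines=3)

# No key appears in two local tables, so the results can simply be merged.
result = {}
for table in local_tables:
    result.update(table)
```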

10 Parallel Sort Merge Join

The two steps for parallel sort merge join are:

1. Range partition each table using the same ranges on the join column

2. Perform local sort merge join on each machine

We need to use the same ranges to guarantee that all matches appear on the same machine. If we
used different ranges for each table, then it’s possible that a record from table R will appear on a
different machine than a record from table S even if they have the same value for the join column
which will prevent these records from ever getting joined.

The total number of passes for parallel SMJ is (1 pass per table to partition across machines) +
(number of passes needed to sort R) + (number of passes needed to sort S) + (1 final merge pass over
each table). This is $2 + \lceil 1 + \log_{B-1} \lceil R/(mB) \rceil \rceil + \lceil 1 + \log_{B-1} \lceil S/(mB) \rceil \rceil + 2$
passes, where R and S are the number of pages in each table, B is the number of buffer pages on each
machine, and m is the number of machines.
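A sketch of parallel sort merge join under simplifying assumptions: tables are lists of tuples, the join key is extracted by a caller-supplied function, the range boundaries are given, and each machine's local sort is an in-memory `sorted`. All function names are hypothetical. `parallel_smj_passes` mirrors the pass formula above, computing each logarithm with integers.

```python
import bisect


def range_partition_by(rows, key, upper_bounds):
    # Machine i receives rows whose key is <= upper_bounds[i]; the last machine gets the rest.
    machines = [[] for _ in range(len(upper_bounds) + 1)]
    for row in rows:
        machines[bisect.bisect_left(upper_bounds, key(row))].append(row)
    return machines


def sort_merge_join(left, right, lkey, rkey):
    # Local sort merge join on one machine (in-memory sorts for brevity).
    left, right = sorted(left, key=lkey), sorted(right, key=rkey)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        a, b = lkey(left[i]), rkey(right[j])
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the run of right rows with this key, then pair it with
            # every left row that has the same key.
            j_end = j
            while j_end < len(right) and rkey(right[j_end]) == a:
                j_end += 1
            while i < len(left) and lkey(left[i]) == a:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def parallel_smj(R, S, rkey, skey, upper_bounds):
    # Both tables use the SAME ranges, so matching keys meet on the same machine.
    r_parts = range_partition_by(R, rkey, upper_bounds)
    s_parts = range_partition_by(S, skey, upper_bounds)
    out = []
    for r_local, s_local in zip(r_parts, s_parts):
        out.extend(sort_merge_join(r_local, s_local, rkey, skey))
    return out


def ceil_div(a, b):
    return -(-a // b)


def sort_passes(pages, B, m):
    # 1 + ceil(log_{B-1}(ceil(pages / (m * B)))), computed with integers.
    runs, passes = ceil_div(pages, m * B), 1
    while runs > 1:
        runs = ceil_div(runs, B - 1)
        passes += 1
    return passes


def parallel_smj_passes(r_pages, s_pages, B, m):
    # 2 partitioning passes + local sorts of R and S + 2 merge passes.
    return 2 + sort_passes(r_pages, B, m) + sort_passes(s_pages, B, m) + 2


R = [(1, "a"), (7, "b"), (7, "c"), (12, "d")]
S = [(7, "x"), (1, "y"), (12, "z"), (3, "w")]
joined = parallel_smj(R, S, lambda r: r[0], lambda s: s[0], upper_bounds=[5, 10])
```

With the inputs of practice question 1 in Section 15 (100 and 400 pages, 30 buffer pages, 4 machines), `parallel_smj_passes` returns the 7 passes worked out in Section 16.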


11 Parallel Grace Hash Join

The two steps for parallel hash join are:
1. Hash partition each table using the same hash function on the join column

2. Perform local grace hash join on each machine

Similarly to parallel SMJ, we need to use the same hash function for both tables to guarantee that
matching records will be partitioned to the same machines.
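A matching sketch for parallel Grace hash join. As an assumption for brevity, the local join is a single in-memory build and probe, which corresponds to the case in practice solution 2 where R fits in memory on every machine; a full local Grace hash join would partition further when it does not. `zlib.crc32` stands in for the shared hash function, and the helper names are hypothetical.

```python
import zlib


def machine_for(key, n_machines):
    # Shared deterministic hash; both tables MUST use this same function.
    return zlib.crc32(str(key).encode("utf-8")) % n_machines


def hash_partition_by(rows, key, n_machines):
    machines = [[] for _ in range(n_machines)]
    for row in rows:
        machines[machine_for(key(row), n_machines)].append(row)
    return machines


def local_hash_join(build, probe, bkey, pkey):
    # Simplification: one in-memory build + probe (assumes `build` fits in memory),
    # standing in for a full local Grace hash join.
    table = {}
    for row in build:
        table.setdefault(bkey(row), []).append(row)
    return [b + p for p in probe for b in table.get(pkey(p), [])]


def parallel_hash_join(R, S, rkey, skey, n_machines):
    r_parts = hash_partition_by(R, rkey, n_machines)
    s_parts = hash_partition_by(S, skey, n_machines)
    out = []
    for r_local, s_local in zip(r_parts, s_parts):
        out.extend(local_hash_join(r_local, s_local, rkey, skey))
    return out


R = [(1, "a"), (7, "b"), (7, "c"), (12, "d")]
S = [(7, "x"), (1, "y"), (12, "z"), (3, "w")]
joined = parallel_hash_join(R, S, lambda r: r[0], lambda s: s[0], n_machines=4)
```

If R and S were partitioned with different hash functions, the two rows with key 7 could land on different machines and the match would be silently lost.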


12 Broadcast Join
Let’s say we want to join a 1,000,000 page table that is currently round robin partitioned with a
100 page table that is currently all stored on one machine. We could do one of the parallel join
algorithms discussed above, but to do this we would need to either range partition or hash partition
the tables. It will take a ton of network cost to partition the big table.

Instead, we can use a broadcast join. A broadcast join will send the entire small relation to
every machine, and then each machine will perform a local join. The concatenation of the results
from each machine will be the final result of the join. While this algorithm has each machine
operate on more data, we make up for this by sending much less data over the network. When
we’re joining together one really large table and one small table, a broadcast join will normally be
the fastest because of its network cost advantages.

For instance, if relation R is small, we could do a broadcast join by sending R to all machines that
have a partition of S. We perform a local join at each machine and union the results.
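A minimal sketch of a broadcast join, assuming the large table is already split into per-machine lists (for example, round-robin) and that the small table fits in memory on every machine. The function name is hypothetical, and the local join is an in-memory hash join chosen for brevity; any local join algorithm would work.

```python
def broadcast_join(big_partitions, small, big_key, small_key):
    """big_partitions[i] is the part of the large table stored on machine i;
    `small` is the entire small table, which is copied to every machine."""
    results = []
    for partition in big_partitions:
        # Each machine receives its own full copy of the small table...
        local_copy = list(small)
        # ...and runs an ordinary local hash join against its partition.
        index = {}
        for row in local_copy:
            index.setdefault(small_key(row), []).append(row)
        results.append([b + s for b in partition for s in index.get(big_key(b), [])])
    # The final answer is the concatenation (union) of the local results.
    return [row for local in results for row in local]


# Toy data: the large table is already round-robin partitioned over 2 machines.
big_partitions = [[(1, "a"), (3, "c")], [(2, "b"), (1, "d")]]
S = [(1, "X"), (3, "Y")]
result = broadcast_join(big_partitions, S, lambda r: r[0], lambda s: s[0])
```

Note that the large table never moves; only the small table crosses the network, once per machine.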

13 Symmetric Hash Join

The parallel joins we have discussed so far have been about making the join itself run as fast as
possible by distributing the data across multiple machines. The problem with the join algorithms
we have discussed so far is that they are pipeline breakers. This means that the join cannot
produce output until it has processed every record. Why is this a problem? It prevents us from
using pipeline parallelism. The operators above the join cannot work in parallel with it, because the
join spends a long time consuming its input and then produces all of its output at once.

Sort Merge Join is a pipeline breaker because sorting is a pipeline breaker. You cannot produce
the first row of output in sorting until you have seen all the input (how else would you know if your
row is really the smallest?). Hash Join is a pipeline breaker because you cannot probe the hash
table of each partition until you know that hash table has all the data in it. If you tried to probe
one of the hash tables after processing only half the data, you might get only half of the matches
that you should! Let’s try to build a hash-based join that isn’t a pipeline breaker.

Symmetric Hash Join is a join algorithm that is pipeline-friendly: we can start producing output
as soon as we see our first matches. Here are the steps to symmetric hash join:

1. Build two hash tables, one for each table in the join

2. When a record from R arrives, probe the hash table for S for all of the matches. When a
record from S arrives, probe the hash table for R for all of the matches.

3. Whenever a record arrives add it to its corresponding hash table after probing the other hash
table for matches.

This works because every output tuple gets generated exactly once - when the second record in the
match arrives! But we produce output as soon as we see a match so it’s not a pipeline breaker.
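The three steps translate naturally into a Python generator, which makes the pipelining visible: each output pair is yielded as soon as the second record of a match arrives, without waiting for either input to finish. The interleaved `('R', row)` / `('S', row)` input stream and the function name are assumptions for illustration.

```python
def symmetric_hash_join(stream, r_key, s_key):
    """Hypothetical sketch. `stream` yields ('R', row) or ('S', row) in arrival
    order; the generator yields (r_row, s_row) pairs as soon as they match."""
    # Step 1: one hash table per input.
    r_table, s_table = {}, {}
    for side, row in stream:
        if side == "R":
            k = r_key(row)
            # Step 2: probe S's hash table for matches with the new R record.
            for s_row in s_table.get(k, []):
                yield (row, s_row)
            # Step 3: only then add the record to R's own hash table.
            r_table.setdefault(k, []).append(row)
        else:
            k = s_key(row)
            for r_row in r_table.get(k, []):
                yield (r_row, row)
            s_table.setdefault(k, []).append(row)


arrivals = [("R", (1, "a")), ("S", (1, "x")), ("S", (2, "y")),
            ("R", (2, "b")), ("R", (1, "c"))]
matches = symmetric_hash_join(iter(arrivals), lambda r: r[0], lambda s: s[0])
first = next(matches)  # available after only two arrivals
rest = list(matches)
```

Probing before inserting is what guarantees each pair is produced exactly once: the earlier record of a match is already in its table, and the later one finds it on arrival.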

14 Hierarchical Aggregation
The final parallel algorithm we’ll discuss is called hierarchical aggregation. Hierarchical
aggregation is how we parallelize aggregation operations (e.g., SUM, COUNT, AVG). Each aggregation
is implemented differently, so we’ll only discuss two in this section, but the ideas should carry over
to the other operations easily.

To parallelize COUNT, each machine individually counts its records. The machines all send
their counts to the coordinator machine, which then sums them together to figure out the overall
count.


AVG is a little harder to calculate because the average of a bunch of averages isn’t necessarily the
average of the data set. To parallelize AVG each machine must calculate the sum of all the values
and the count. They then send these values to the coordinator machine. The coordinator machine
adds up the sums to calculate the overall sum and then adds up the counts to calculate the overall
count. It then divides the sum by the count to calculate the final average.
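A sketch of hierarchical COUNT and AVG, with each inner list standing in for one machine's partition (all names and data are hypothetical). The last line shows why machines must send (sum, count) pairs for AVG rather than local averages.

```python
def local_count(partition):
    # Runs on each machine: count only the local records.
    return len(partition)


def local_sum_count(partition):
    # Runs on each machine: AVG needs both pieces, not a local average.
    return sum(partition), len(partition)


def coordinator_count(counts):
    return sum(counts)


def coordinator_avg(pairs):
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count


partitions = [[1, 2, 3], [10], [4, 5]]  # one list per machine (toy data)
count = coordinator_count([local_count(p) for p in partitions])     # 6
avg = coordinator_avg([local_sum_count(p) for p in partitions])     # 25 / 6, about 4.17
naive = sum(sum(p) / len(p) for p in partitions) / len(partitions)  # 5.5, not the true average
```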

15 Parallel Algorithms Practice Questions

For 1 and 2, assume we have a 100 page table R that we will join with a 400 page table S. All the
pages start on machine 1, and there are 4 machines in total with 30 buffer pages each.

1) How many passes are needed to do a parallel unoptimized SMJ on the data? For this question, a
pass is defined as a full pass over either table (so if we have to do 1 pass over R and 1 pass
over S, it is 2 total passes).

2) How many passes are needed to do a parallel Grace Hash Join on the data?

3) We want to calculate the max in parallel using hierarchical aggregation. What value should
each machine calculate and what should the coordinator do with those values?

4) If instead we had a 10000 page table R round-robin partitioned across 10 machines and the
same 400 page table S on machine 1, what type of join would be appropriate for an equijoin?

5) If we used a non-uniform hash function which resulted in an uneven hash partition of our
records, how will this affect the total time it takes to run parallel Grace Hash Join?

6) Consider a parallel version of Block Nested Loop Join in which we first range partition tuples
from both tables and then perform BNLJ on each individual machine. Is this parallel algorithm
pipeline friendly?


16 Parallel Algorithms Practice Solutions

1) 2 passes to partition the data (1 for each table). Each machine will then have 25 pages for R
and 100 pages for S. R can be sorted in 1 pass but S requires 2 passes. Then it will take 2 passes
to merge the tables together (1 for each table). This gives us a total of 2 (partition) + 1 (sort R)
+ 2 (sort S) + 2 (merge relations) = 7 passes.

2) Again we will need 2 passes to partition the data across the four machines and each machine
will have 25 pages for R and 100 for S. We don’t need any partitioning passes because R fits in
memory on every machine, so we only need to do build and probe, which will take 2 passes (1 for
each table). This gives us a total of 4 passes.

3) Each machine should calculate the max and the coordinator will take the max of those maxes.

4) A broadcast join since this type of join helps reduce network costs when we have two tables
with drastically different sizes and the larger one is split into many machines.

5) The parallel join algorithm only finishes when the last machine finishes its Grace Hash
Join. This last machine is likely to be the machine that received the most pages from the
partitioning phase, since it has the most data to hash and so may need the most I/Os.

6) No. For each block of the outer relation R, BNLJ must scan through the inner relation S to find
matching tuples, so each machine needs to have received all of its S tuples before it can begin the
join. Therefore, this is a pipeline breaker.

17 Past Exam Problems

• Fa22 Final Question 9

• Sp22 Final Question 9

• Fa21 Final Question 10

• Sp21 Final Question 4b, 8bi

• Fa20 Final Question 6
