
ICS 643: Advanced Parallel Algorithms Fall 2016

Lecture 3 — September 12, 2016


Prof. Nodari Sitchinava Scribe: Kyle Berney

1 Overview

In the last lecture, we started introducing the various models of parallel computation. We first
covered the shared-memory model, where each processor is connected to a common global memory.
There are multiple forms of the shared-memory model, distinguished mainly by how they handle
the situation where 2 or more processors attempt to read or write the same memory location.
The next model we covered was the local-memory model, where each processor has its own private
memory and the processors communicate with one another through an interconnection network. We
then looked at the modular-memory model, where each processor is connected to an interconnection
network, which is in turn connected to k memory modules; all processors can access each of the
k memory modules. Lastly, we looked at mixed models, such as the parallel external memory (PEM)
model and the GPU architecture. We finished up the last lecture by examining different types of
interconnection networks, such as ethernet, ring, and 2D mesh.
In this lecture, we will first introduce the circuit model. Then, we will look at how the circuit
model relates to other parallel models.

2 Circuit Model

The circuit model is composed of gates; each gate has constant fan-in and executes a primitive
operation. Some examples of primitive operations of a circuit are addition, subtraction, multipli-
cation, and boolean operations. Each gate works in parallel and starts its execution once all of its
inputs are ready. Hence, we can combine gates to compute more complex functions [BM04]. The
formal definition of a circuit is as follows:

Definition 1. A circuit for a particular problem is a family of directed acyclic graphs (DAGs), where
each node is a primitive operation and each edge is a dependency between operations [JaJ92].

Therefore, in the circuit model, the runtime is equal to the depth of the circuit, i.e., the length of
the longest directed path in the DAG; and the work is equal to the size of the circuit, i.e., the
number of nodes in the DAG [BM04]. Notice that there are at most kn edges in the DAG, where
n is the number of nodes and k is the maximum fan-in. Thus, if k = O(1), the number of edges
in the DAG is Θ(n).
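As a small illustration of these two measures, the sketch below encodes a toy circuit as a DAG and computes its size (work) and depth (runtime). The gate names and the dictionary encoding are my own illustrative choices, not taken from the notes.

```python
# A toy circuit summing four inputs with a balanced tree of "+" gates.
# Gate names and the dict encoding are illustrative only.
inputs = {"x0", "x1", "x2", "x3"}
circuit = {
    "g1": ["x0", "x1"],   # g1 = x0 + x1
    "g2": ["x2", "x3"],   # g2 = x2 + x3
    "g3": ["g1", "g2"],   # g3 = g1 + g2  (the output)
}

def depth(v):
    """Length of the longest directed path from an input to node v."""
    if v in inputs:
        return 0
    return 1 + max(depth(u) for u in circuit[v])

work = len(circuit)                    # size of the circuit: number of gates
time = max(depth(g) for g in circuit)  # depth of the circuit: longest path
print(work, time)  # → 3 2
```

Since each gate has fan-in 2 here (k = 2), the number of edges (6) is indeed at most kn = 6, matching the observation above.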

2.1 Relation to Parallel Models

Theorem 2. A circuit for a particular problem with time $T(n)$ and work $W(n)$ can be simulated
on a CREW PRAM with $p$ processors in time
$$t(n) \le \frac{W(n)}{p} + T(n).$$

Proof. We will consider the circuit level-by-level; let us define:


$$\mathrm{level}(g_i) = \begin{cases} 1 & \text{if } g_i \text{ is fed by inputs} \\ 1 + \max_{g_j \text{ feeds } g_i} \mathrm{level}(g_j) & \text{otherwise} \end{cases}$$

where $g_i$ is a gate. Let $n_l$ be the number of gates at level $l$. It will take $\lceil n_l/p \rceil$ time to simulate
level $l$ using a $p$-processor CREW PRAM (note: we need concurrent reads in order to handle cases
where two or more gates all use a common input). By definition, we know that the number of levels
in the circuit is $T(n)$, hence we have:
$$t(n) = \sum_{l=1}^{T(n)} \left\lceil \frac{n_l}{p} \right\rceil \le \sum_{l=1}^{T(n)} \left( \frac{n_l}{p} + 1 \right) = \frac{1}{p} \cdot \sum_{l=1}^{T(n)} n_l + T(n).$$
By definition, we have that $\sum_{l=1}^{T(n)} n_l = W(n)$, therefore:
$$t(n) \le \frac{1}{p} \cdot \sum_{l=1}^{T(n)} n_l + T(n) = \frac{W(n)}{p} + T(n).$$

Hence, by Brent’s Theorem, if we can create a circuit to solve a problem, then we can always
simulate the circuit using a $p$-processor CREW PRAM algorithm with $O\left(\frac{W(n)}{p} + T(n)\right)$ runtime
[BM04].
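The level-by-level simulation in the proof above can be sketched directly: compute each gate's level per the definition, count the gates $n_l$ at each level, and sum $\lceil n_l/p \rceil$ over the levels. The circuit below is a made-up example (the gate names are not from the notes), used only to check the simulated time against Brent's bound.

```python
import math

# Illustrative 4-gate circuit: each gate maps to the gates/inputs feeding it.
inputs = {"x0", "x1", "x2", "x3"}
circuit = {
    "g1": ["x0", "x1"],
    "g2": ["x2", "x3"],
    "g3": ["g1", "g2"],
    "g4": ["g3", "x0"],
}

memo = {}
def level(g):
    """level(g) = 1 if g is fed only by inputs, else 1 + max over feeding gates."""
    if g not in memo:
        preds = [h for h in circuit[g] if h not in inputs]
        memo[g] = 1 if not preds else 1 + max(level(h) for h in preds)
    return memo[g]

p = 2                                             # number of PRAM processors
levels = [level(g) for g in circuit]
T = max(levels)                                   # depth T(n)
W = len(circuit)                                  # work W(n) = number of gates
n_l = {l: levels.count(l) for l in set(levels)}   # gates per level
t = sum(math.ceil(n_l[l] / p) for l in n_l)       # simulated time, level by level
assert t <= W / p + T                             # Brent's bound from Theorem 2
print(t, W, T)  # → 3 4 3
```

Here the simulated time (3 steps) is well under the bound $W(n)/p + T(n) = 4/2 + 3 = 5$, since the $+1$ per level from the ceiling is the worst case.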
Lemma 3. Let $A$ be the fastest sequential algorithm for some problem, and let $T_A(n)$ be its runtime.
Then,
$$T_A(n) \le W_c(n)$$
where $W_c(n)$ is the number of nodes in any circuit that solves the same problem.

Proof. We will prove by contradiction. Assume that there exists a circuit with size $W_c(n)$, such
that $W_c(n) < T_A(n)$. Simulating this circuit level-by-level, as in the proof of Theorem 2, with
$p = 1$ processor, each level $l$ takes exactly $\lceil n_l/1 \rceil = n_l$ steps, so the resulting sequential
algorithm runs in time
$$T(n) = \sum_{l} n_l = W_c(n) < T_A(n).$$
We arrived at a contradiction, because $A$ is the fastest sequential algorithm and, consequently, it
must be that $T_A(n) \le T(n)$.

Therefore, we can never have a circuit smaller than the runtime of the fastest sequential algorithm.
We know that given a circuit, we can create a CREW PRAM algorithm by converting each edge into
a memory read and each gate into an operation. Similarly, a p-processor CREW PRAM algorithm
with work W (n) and time T (n) can be converted into a circuit with W (n) gates and depth T (n) by
converting each memory read into an incoming edge, each memory write into an outgoing edge, and
each operation into a gate.

3 Conclusion

In conclusion, we now know that a $p$-processor CREW PRAM algorithm with work $W(n)$
and time $T(n)$ can be converted into a CREW PRAM algorithm with $p' < p$ processors that has
work $W(n)$ and time $O\left(\frac{W(n)}{p'} + T(n)\right)$. In other words, if we design a CREW PRAM algorithm
with $p$ processors, then we can always scale the algorithm down to $p' < p$ processors. Therefore,
when designing parallel algorithms, we want to design for the largest possible $p$.
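The downscaling behavior can be made concrete with a quick numeric sketch. The work and time values below are illustrative, not from the lecture; the point is that the $O(W/p' + T)$ bound degrades gracefully as processors are removed.

```python
# Brent-style downscaling: an algorithm with work W and time T on p processors
# runs on p' < p processors in O(W/p' + T) time. Illustrative numbers only.
W, T = 1_000_000, 20

for p_prime in (1024, 256, 16, 1):
    bound = W / p_prime + T
    print(f"p' = {p_prime:4d}: time = O({bound:.1f})")
```

With $p' = 1$ the bound collapses to $W(n) + T(n)$, i.e., essentially the sequential runtime, which is consistent with Lemma 3.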

References

[BM04] Guy E. Blelloch and Bruce M. Maggs. Parallel Algorithms. 2004.

[JaJ92] Joseph JáJá. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
