HPC Unit 1
1.1 Introduction
For over 40 years, virtually all computers have followed a common machine
model known as the von Neumann computer, named after the Hungarian
mathematician John von Neumann.
A von Neumann computer uses the stored-program concept. The CPU executes
a stored program that specifies a sequence of read and write operations on the
memory.
Basic design:
- Memory is used to store both program instructions and data.
- Program instructions are coded data which tell the computer to do
something.
- Data is simply information to be used by the program.
- A central processing unit (CPU) gets instructions and/or data from memory,
decodes the instructions, and then performs them sequentially.
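To make this fetch-decode-execute cycle concrete, here is a minimal sketch of a toy von Neumann machine in Python (the LOAD/ADD/STORE/HALT instruction set is invented for illustration and is not from the notes). A single memory holds both the program and the data, and the CPU executes one instruction at a time:

```python
# A toy von Neumann machine: one memory holds both program and data.
# The instruction set (LOAD/ADD/STORE/HALT) is hypothetical.
memory = {
    0: ("LOAD", 100),    # program: load the data word at address 100
    1: ("ADD", 101),     #          add the data word at address 101
    2: ("STORE", 102),   #          store the result at address 102
    3: ("HALT", None),
    100: 2, 101: 3, 102: 0,  # data lives in the same memory as the program
}

pc, acc = 0, 0                 # program counter and accumulator
while True:
    op, addr = memory[pc]      # fetch the next instruction
    pc += 1
    if op == "LOAD":           # decode and execute, strictly one at a time
        acc = memory[addr]
    elif op == "ADD":
        acc += memory[addr]
    elif op == "STORE":
        memory[addr] = acc
    elif op == "HALT":
        break

print(memory[102])  # prints 5
```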
What is Parallel Computing?
Traditionally, software has been written for serial computation:
- To be executed by a single computer having a single Central Processing
Unit (CPU);
- Problems are solved by a series of instructions, executed one after the
other by the CPU. Only one instruction may be executed at any moment
in time.
In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem.
The compute resources can include:
- A single computer with multiple processors;
- An arbitrary number of computers connected by a network;
- A combination of both.
The computational problem usually demonstrates characteristics such as the
ability to be:
- Broken apart into discrete pieces of work that can be solved simultaneously;
- Solved in less time with multiple compute resources than with a single
compute resource.
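As an illustration of these characteristics (a minimal Python sketch, with the chunking scheme and process count chosen arbitrarily), a problem broken into discrete, independent pieces can be handed to multiple processes and solved simultaneously:

```python
from multiprocessing import Pool

def piece(chunk):
    # each discrete piece of work can be solved independently
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]   # break the problem apart
    with Pool(processes=4) as pool:           # multiple compute resources
        partials = pool.map(piece, chunks)    # pieces solved simultaneously
    print(sum(partials))
```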
Parallel computing has historically been used to model difficult problems in
science and engineering, for example: electronic circuits, manufacturing
processes, and oil exploration.
Limits to serial computing: both physical and practical reasons pose significant
constraints to simply building ever faster serial computers.
The future: during the past 10 years, the trends indicated by ever-faster
networks, distributed systems, and multi-processor computer architectures
(even at the desktop level) clearly show that parallelism is the future of computing.
It clearly makes no sense to take longer than one minute to predict the weather one
minute from now; otherwise it is cheaper to look out the window.
Today: massively parallel processors (MPPs).
The future: the end of Moore's Law? Nanoscale electronics? Exotic
architectures (quantum, DNA/molecular)?
1.5.1 Method 1
Utilizing temporal parallelism:
Consider that there are 1000 answer books from an exam, with 4 questions in each
answer book. If a single teacher is to correct these answer books, the following
instructions are given:
1. Take an answer book from the pile of answer books.
2. Correct the answer to Q1, namely A1.
3. Repeat step 2 for the answers to Q2, Q3, Q4, namely A2, A3, A4.
4. Add up the marks.
5. Put the answer book in the pile of corrected answer books.
6. Repeat steps 1 to 5 until no answer books are left.
Now ask 4 teachers to sit in a line and correct each answer book in pipeline fashion.
The first teacher corrects the answer to Q1, namely A1, of the first paper and passes
the paper to the second teacher, and so on down the line.
While the first three papers are being corrected, some teachers are idle (the
pipeline is filling up).
Suppose the time taken to correct A1 = time to correct A2 = time to correct A3 = time
to correct A4 = 5 minutes. Then the first answer book takes 20 min.
The total time taken to correct 1000 papers will be 20 + (999 * 5) = 5015 min. This is
about 1/4th of the 20000 min a single teacher would take.
Temporal means pertaining to time.
The method works correctly and efficiently if:
o The jobs are identical.
o Each job can be divided into mutually independent tasks.
o The time taken for each task is the same.
o The number of tasks is small compared to the total number of jobs.
Let the number of jobs = n
Time to do one job = p
Each job is divided into k tasks
Time for each task = p/k
Time to complete n jobs without pipeline processing = np
Time to complete n jobs with pipeline processing by k teachers = p + (n-1)p/k = p(k+n-1)/k
Speedup due to pipeline processing = np / [p(k+n-1)/k] = nk/(k+n-1) = k / [1+(k-1)/n]
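These formulas can be checked numerically. The following short Python sketch (not part of the original notes) reproduces the 5015-minute figure from the example above:

```python
def pipeline_time(n, p, k):
    # the first job takes the full p minutes; after that, one corrected
    # book emerges from the pipeline every p/k minutes
    return p + (n - 1) * p / k

n, p, k = 1000, 20, 4               # 1000 papers, 20 min each, 4 teachers
t_serial = n * p                    # 20000 min with a single teacher
t_pipe = pipeline_time(n, p, k)     # 20 + 999*5 = 5015 min
print(t_pipe, t_serial / t_pipe)    # 5015.0 and speedup ~ 3.99 (close to k)
```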
Problems encountered:
Synchronization:
o All tasks must take identical time; otherwise faster teachers must wait for slower ones.
Bubbles in pipeline:
o If an answer is left blank, the corresponding teacher is idle and a "bubble" (empty slot) moves through the pipeline.
Fault tolerance:
o The method does not tolerate failures; if one teacher leaves, the whole pipeline stalls.
Inter-task communication:
o The time to pass a paper between teachers must be small compared to the correction time.
Scalability:
o The number of teachers cannot be increased beyond the number of tasks in a job.
1.5.2 Method 2
Utilizing Data Parallelism:
Divide the answer books into four piles and give one pile to each teacher.
Each teacher takes 20 min to correct an answer book and corrects 250 papers, so
the time taken for 1000 papers is 250 * 20 = 5000 min, since all four teachers
work simultaneously.
Let the number of jobs = n
Time to do one job = p
Let there be k teachers
Time to distribute the jobs into k piles = kq
Time to complete n jobs by a single teacher = np
Time to complete n jobs by k teachers = kq + np/k
Speedup due to parallel processing = np / (kq + np/k) = knp / (k^2*q + np) = k / [1 + k^2*q/(np)]
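The same kind of numerical check for the data-parallel formulas (a sketch; the notes give no value for the distribution time q, so q = 1 min per pile is assumed here):

```python
def data_parallel_time(n, p, k, q):
    # kq minutes to distribute the k piles, then the k teachers
    # work simultaneously, np/k minutes of grading each
    return k * q + n * p / k

n, p, k, q = 1000, 20, 4, 1         # q = 1 min per pile is an assumed value
t = data_parallel_time(n, p, k, q)  # 4 + 5000 = 5004 min
print(t, n * p / t)                 # speedup ~ 4.0, i.e. k/(1 + k^2*q/(n*p))
```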
Advantages:
No synchronization is needed between teachers.
No bubbles form, as there is no pipeline.
More fault tolerant: if one teacher leaves, only that pile is delayed.
No inter-teacher communication is needed.
Disadvantages:
The assignment of jobs is static; a teacher who finishes early sits idle.
The set of jobs must be partitionable into equal piles in advance.
1.5.3 Method 3
Combined Temporal And Data Parallelism:
Combining methods 1 and 2 gives this method.
Two pipelines of teachers are formed, and each pipeline is given half of the total
number of jobs.
This halves the time taken by a single pipeline and reduces the time to complete
the set of jobs.
It is very efficient for numerical computing, in which a number of long vectors and
large matrices are used as data and can be processed in parallel.
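Plugging the numbers from Method 1 into the pipeline formula (a quick check, not in the original notes): with two pipelines of 4 teachers each handling 500 papers:

```python
n, p, k = 500, 20, 4        # each of the two pipelines gets half the papers
t = p + (n - 1) * p / k     # 20 + 499*5 = 2515 min, roughly half of 5015 min
print(t, (1000 * 20) / t)   # speedup ~ 7.95 with 8 teachers in total
```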
1.5.4 Method 4
Data Parallelism with Dynamic Assignment:
A head examiner gives one answer book to each teacher.
All teachers simultaneously correct their papers.
A teacher who finishes goes to the head examiner for another paper.
If a second teacher finishes at the same time, he queues up in front of the head
examiner.
Advantages:
The work assigned to each teacher is balanced.
No teacher is forced to be idle.
No bubbles form.
The overall time is minimized.
Disadvantages:
Teachers may have to wait in the queue.
The head examiner can become a bottleneck.
The head examiner is idle after handing out the papers.
It is difficult to increase the number of teachers.
If the speedup of a method is directly proportional to the number of teachers, then
the method is said to scale well.
Let the total number of papers = n
Let there be k teachers
Time waited to get a paper = q
Time for each teacher to get, grade and return a paper = q + p
Total time to correct n papers by k teachers = n(q+p)/k
Speedup due to parallel processing = np / [n(q+p)/k] = k / [1 + (q/p)]
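Dynamic assignment is essentially a shared work queue. A minimal sketch with Python threads standing in for teachers (all names and the 4-thread count are illustrative):

```python
import queue
import threading

papers = queue.Queue()
for paper_id in range(1000):
    papers.put(paper_id)                  # the head examiner's input pile

def grade(paper):
    pass                                  # stands in for the 20 min of grading

def teacher():
    while True:
        try:
            paper = papers.get_nowait()   # ask the examiner for the next paper
        except queue.Empty:
            return                        # input pile empty: this teacher stops
        grade(paper)                      # correct all 4 answers of this paper

workers = [threading.Thread(target=teacher) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The single shared queue plays the role of the head examiner: it balances the load automatically, but every teacher must go through it, which is exactly the bottleneck noted above.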
1.5.5 Method 5
Data Parallelism with Quasi-Dynamic Scheduling:
Method 4 can be improved by giving each teacher an unequal set of papers to
correct. Teachers 1, 2, 3, 4 may be given 7, 9, 11, 13 papers respectively; when
they finish, further papers are given. This staggers the job completion times and
reduces the probability of a queue forming at the head examiner. The time to get a
batch of jobs is much smaller compared to the time to actually do the jobs. This
method is in between a purely static and a purely dynamic schedule. The jobs are
coarser grained in the sense that a bunch of jobs is assigned at once, and the
completion time of a batch is longer than if one job were assigned at a time.
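A tiny sketch of why the unequal batches help (batch sizes 7, 9, 11, 13 as in the notes; the 20-minute grading time is carried over from Method 1):

```python
p = 20                                   # minutes to grade one paper
batches = {"T1": 7, "T2": 9, "T3": 11, "T4": 13}
refill_times = {t: size * p for t, size in batches.items()}
print(refill_times)  # {'T1': 140, 'T2': 180, 'T3': 220, 'T4': 260}
# All four refill times differ, so the teachers rarely queue up
# at the head examiner simultaneously.
```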
Table 1.1 Difference between temporal and data parallelism:

TEMPORAL PARALLELISM                      DATA PARALLELISM
Independent tasks are needed              Full jobs are assigned
Tasks take equal time                     Jobs may take different times
Bubbles lead to idling                    No bubbles form
Task assignment is static                 Job assignment may be static,
                                          dynamic or quasi-dynamic
Not tolerant to processor faults          Tolerates processor faults
Efficient with fine-grained tasks         Efficient with coarse-grained tasks
1.5.6 Method 6
Specialist Data Parallelism:
There is a head examiner who dispatches answer papers to the teachers. We
assume that each teacher is a specialist: teacher 1 (T1) grades answer A1, teacher 2
(T2) grades A2, and in general teacher i (Ti) grades the answer Ai to question Qi.
Procedure:
1. Give one answer book each to T1, T2, T3, T4.
2. When a corrected answer paper is returned, check if all questions are graded. If
yes, add up the marks and put the paper in the output pile.
3. If not, check which questions are not yet graded.
4. For each i, if Ai is ungraded, send the paper to teacher Ti if Ti is idle, or to any
other idle teacher Tp.
5. Repeat steps 2, 3 and 4 until no answer paper remains in the input pile.
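A simplified sketch of the routing decision in steps 3 and 4 (the helper name and data structures are invented for illustration):

```python
def assign(ungraded, idle_teachers):
    """Route each ungraded answer Ai to its specialist Ti if idle,
    otherwise to any other idle teacher Tp (steps 3 and 4)."""
    assignments = {}
    for i in sorted(ungraded):
        specialist = f"T{i}"
        if specialist in idle_teachers:           # prefer the specialist
            assignments[i] = specialist
            idle_teachers.remove(specialist)
    for i in sorted(ungraded):
        if i not in assignments and idle_teachers:
            assignments[i] = idle_teachers.pop()  # any idle teacher will do
    return assignments

# A paper has A1 and A3 ungraded; T3 and T4 are idle:
print(assign({1, 3}, {"T3", "T4"}))  # {3: 'T3', 1: 'T4'}
```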
1.5.7 Method 7
Coarse Grained Specialist Temporal Parallelism:
In the previous method, a fast teacher will end up spending a lot of time
inefficiently waiting for other teachers to complete their work. In this method, all
teachers work independently and simultaneously, each at their own pace.
Procedure:
The answer papers are divided into 4 equal piles and put in the in-trays of the teachers.
For teachers Ti (i = 1 to 4) do in parallel: each teacher simultaneously repeats steps
1 to 5 on the papers in his in-tray.
1.5.8 Method 8
Agenda Parallelism:
An answer book is thought of as an agenda of questions to be graded. All teachers
are asked to work on the first item on the agenda, namely grading the answer to the
first question in all papers. The head examiner gives one paper to each teacher and
asks him to grade the answer A1 to Q1. When a teacher finishes, he is given
another paper. This is a data-parallel method with a dynamic schedule and
fine-grained tasks.
There are 16 tasks in this example, some of which have to be carried out in
sequence. A graph showing the precedence relationships among the tasks is given
in the accompanying figure.