Chained Matrix Multiplication
Suppose we want to multiply two matrices that do not have the same number of rows and columns. We can multiply two matrices A1 and A2 only if the number of columns of A1 equals the number of rows of A2. Example: we want to multiply a 2 x 3 matrix by a 3 x 4 matrix. The result is a 2 x 4 matrix, with 4 entries in the top row and 4 in the bottom row. Each entry is the result of 3 multiplications, so the total number of multiplications is 2*3*4 = 24.
We are given a sequence (chain) A1, A2, ..., An of n matrices, and we wish to find the product. The way we parenthesize a chain of matrices can have a dramatic impact on the cost of evaluating the product. The problem is to determine the best way to parenthesize the matrices so as to minimize the number of multiplications.
Example:
A1: 5 x 3, A2: 3 x 4, A3: 4 x 6, A4: 6 x 5. The problem: what is the best order in which to multiply them?
(A1((A2A3)A4)) takes 237 multiplications
(A1(A2(A3A4))) takes 255 multiplications
((A1A2)(A3A4)) takes 280 multiplications
(((A1A2)A3)A4) takes 330 multiplications
((A1(A2A3))A4) takes 312 multiplications
In the case of four matrices there are only five ways to order the multiplications, but with n matrices the number of ways to parenthesize them grows exponentially (on the order of 4^n / n^(3/2)), so we do not want to look at all the possibilities. Dividing the problem into subproblems: we use the principle of optimality, which is said to apply if an optimal solution to an instance of a problem always contains optimal solutions to all of its subinstances. If A1((((A2A3)A4)A5)A6) is the optimal order, then we know that (A2A3)A4 is the optimal order for multiplying A2, A3, A4.
Suppose we have the matrix chain A1 .. An. We divide this into the subproblems A1..Ak and Ak+1..An. The difficulty is that we do not know what k should be, so we find k by looking at the optimal solutions of each of the subproblems. This means trying every value of k.
Data Structure
The 2-D array called N:
1. N[i][j] holds the minimum number of multiplications needed to multiply Ai through Aj.
2. N[i][i] is of course zero, since it is a chain of length one.
3. N[i][j] = min{ N[i][k] + N[k+1][j] + d[i-1]*d[k]*d[j] } where i <= k < j, and matrix Ai has dimensions d[i-1] x d[i].
Another Example
A1: 30 x 35, A2: 35 x 15, A3: 15 x 5, A4: 5 x 10, A5: 10 x 20, A6: 20 x 25

The completed table of N values (computed diagonal by diagonal; entries below the main diagonal are unused):

        j=1       2       3       4       5       6
i=1       0  15,750   7,875   9,375  11,875  15,125
i=2               0   2,625   4,375   7,125  10,500
i=3                       0     750   2,500   5,375
i=4                               0   1,000   3,500
i=5                                       0   5,000
i=6                                               0
Sequential Code
int minMult(int n, int d[], int P[][]) {
    int i, j, k, dia;
    int M[][] = new int[1..n][1..n];        // pseudocode: 1-based 2-D array
    for (i = 1; i <= n; i++)
        M[i][i] = 0;                        // initialize: chains of length one
    for (dia = 1; dia <= n-1; dia++)        // fill one diagonal at a time
        for (i = 1; i <= n-dia; i++) {
            j = i + dia;
            // minimum over all split points k with i <= k <= j-1
            M[i][j] = min over i <= k <= j-1 of ( M[i][k] + M[k+1][j] + d[i-1]*d[k]*d[j] );
            P[i][j] = the k that achieved the minimum;
        }
    return M[1][n];
}
The table of N values for this example (entries below the main diagonal are unused):

        j=1       2       3       4       5       6
i=1       0  15,750   7,875   9,375  11,875  15,125
i=2               0   2,625   4,375   7,125  10,500
i=3                       0     750   2,500   5,375
i=4                               0   1,000   3,500
i=5                                       0   5,000
i=6                                               0
To calculate diagonal 1 we need no data. To calculate diagonal 2 we need the diagonal 1 elements, and so on. So we implement each diagonal calculation in one step, or one processor: step (or processor) 2 needs data from step 1; step (or processor) 3 needs data from step 1 and step 2.
Pipeline Design
We know that the pipeline approach can provide increased speed under the following three types of computation:
1. If more than one instance of the complete problem is to be executed.
2. If a series of data items must be processed, each requiring multiple operations.
3. If information to start the next process can be passed forward before the process has completed all its internal operations.
Looking at the previous table, we can see that step 2 can start after step 1 has calculated its first two elements. In general, each step can start its calculation after the previous step has generated its first two elements.
[Figure: pipeline of processes P1 -> P2 -> P3, one per diagonal; (n-1) messages pass between P1 and P2, and (n-1) + (n-2) messages reach P3.]
In this implementation we divide the problem into n-1 steps (when we have n matrices). In each step we calculate the elements of that step's diagonal; that is, we calculate one diagonal per step. We have one server and n (n > 0) clients. All clients and the server know the step number. Clients request a job from the server, and the server sends a job to that client.
[Figure: centralized work pool — the server holds the work pool and exchanges job/result messages with Client 1, Client 2, ..., Client n.]
Dreams
Suppose that MPI were event driven. What would happen? We could implement our program very simply and efficiently. The volume of message passing would be very low, because we could use on-demand requests: we would request data from any processor only when we need it.
If we go back and look at the pipeline implementation of chained matrix multiplication, we can see that the number of messages passed between processes is very high, and some of them are not necessary. Now suppose that MPI is event driven and write the pipeline program. In this implementation each process calculates one diagonal (process P calculates diagonal P).
void main() {
    // process number P, which should calculate diagonal P
    for (i = 1; i <= n-P; i++) {
        j = P + i;
        N[i][j] = min over i <= k < j of ( GetNij(i,k) + GetNij(k+1,j) + d[i-1]*d[k]*d[j] );
    }
}
int GetNij(int i, int j) {
    if (this process does not have N[i][j]) {
        To = j - i;                  // the process that owns diagonal j-i
        send(request, i, j, To);
        recv(data);
        N[i][j] = data;
    }
    return N[i][j];
}
bool MPIEVENTS(MPIEventType Event) {
    bool Handled;
    switch (Event) {
        case recv:
            MPI_Recv(message, From);
            if (requestdata)
                MPI_Send(data, From);
            Handled = true;
            break;
        default:
            Handled = false;
            break;
    }
    return Handled;
}
Now we write code for real MPI. We want to implement the centralized work pool. In this implementation we have four functions:
1. void Server();
2. void Client();
3. Calculate(i, j, Value[]);
4. map(i, j, im, jm);
#include "stdafx.h"
#include "iostream.h"
#include "stdlib.h"
#include "stdio.h"
#include <mpi.h>

#define n 6          //4
#define request 0
#define value 1
#define CONTINUE 1
#define STOP 0
#define infinit 99999

void Server(MPI_Comm comm, int processors);
void Client(int my_rank, MPI_Comm comm);
int Calculate(int I, int J, int Nv[n*2]);
int map(int i, int j, int I, int J, int Nv[n*2]);
void fill_Nv(int x, int y, int *Nv);

int N[n+1][n+1] = {0};
int d[n+1] = {30,35,15,5,10,20,25};   //{5,3,4,6,5};  //{5,2,3,4,6,7,8};
int main(int argc, char* argv[]) {
    int my_rank;
    int processors;
    MPI_Comm io_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &processors);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_dup(MPI_COMM_WORLD, &io_comm);

    if (processors < 2) {
        if (my_rank == 0)
            cout << "the number of processes should be greater than 1" << endl;
        MPI_Finalize();
        exit(0);
    }

    MPI_Bcast(d, n+1, MPI_INT, 0, MPI_COMM_WORLD);

    if (my_rank == 0)                  // rank 0 is the server ...
        Server(io_comm, processors);
    else                               // ... all other ranks are clients
        Client(my_rank, io_comm);

    MPI_Finalize();
    return 0;
}
void Server(MPI_Comm comm, int processors) {
    int Step = 1;
    int x[3];
    int To = 0;
    int Nv[n*2];
    int Count = 0;
    MPI_Status st;

    while (Step < n) {                          // one pass per diagonal
        for (int i = 0; i < n-Step; i++) {      // one job per element of the diagonal
            MPI_Recv(x, 3, MPI_INT, MPI_ANY_SOURCE, request, comm, &st);
            To = x[0];                          // the requesting client's rank
            x[0] = Step;
            x[1] = i + 1;                       // job: compute N[i+1][Step+1+i]
            x[2] = Step + 1 + i;
            fill_Nv(x[1], x[2], Nv);            // pack the N entries the client needs
            MPI_Send(x, 3, MPI_INT, To, 0, comm);
            Count = (Step-1) * 2;
            MPI_Send(Nv, Count, MPI_INT, To, 1, comm);
            MPI_Recv(x, 3, MPI_INT, MPI_ANY_SOURCE, value, comm, &st);
            N[x[0]][x[1]] = x[2];               // store the computed N[i][j]
        }
        Step++;
    }
    for (int i = 1; i < processors; i++) {      // tell every client to stop
        MPI_Recv(x, 3, MPI_INT, MPI_ANY_SOURCE, request, comm, &st);
        To = x[0];
        x[0] = STOP;
        MPI_Send(x, 3, MPI_INT, To, 0, comm);
    }

    cout << " N is :" << endl << endl;
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= n; j++)
            cout << N[i][j] << " ";
        cout << endl;
    }
    cout << endl << endl << "minimum multiplication count is : " << N[1][n] << endl;
}
void Client(int my_rank, MPI_Comm comm) {
    int x[3];
    int I, J, Step, Count;
    int Val;
    int Nv[n*2] = {0};
    MPI_Status st;

    while (true) {
        x[0] = my_rank;
        MPI_Send(x, 3, MPI_INT, 0, request, comm);      // ask the server for a job
        MPI_Recv(x, 3, MPI_INT, 0, 0, comm, &st);
        if (x[0] == STOP)
            break;
        Step = x[0];
        I = x[1];
        J = x[2];
        Count = (Step-1) * 2;
        MPI_Recv(Nv, Count, MPI_INT, 0, 1, comm, &st);  // the packed N entries
        Val = Calculate(I, J, Nv);
        x[0] = I;  x[1] = J;  x[2] = Val;
        MPI_Send(x, 3, MPI_INT, 0, value, comm);        // return the result
    }
}
int Calculate(int I, int J, int Nv[n*2]) {
    int minval = infinit;
    int val;

    for (int k = I; k < J; k++) {                       // try every split point
        val = map(I, k, I, J, Nv) + map(k+1, J, I, J, Nv) + d[I-1]*d[k]*d[J];
        if (minval > val)
            minval = val;
    }
    return minval;
}
int map(int i, int j, int I, int J, int Nv[n*2]) {
    if (i == j)
        return 0;                                   // chain of length one
    if (j == J)                                     // column part N[i][J], packed after the row part
        return Nv[(J - I - 1) + (i - I - 1)];
    else                                            // row part N[I][j], packed first with j descending
        return Nv[(J - j) - 1];
}
void fill_Nv(int x, int y, int *Nv) {
    int i = x + 1;
    int j = y - 1;
    int k = 0;

    while (j > x) {            // row part: N[x][y-1], N[x][y-2], ..., N[x][x+1]
        Nv[k] = N[x][j];
        k++;
        j--;
    }
    while (i < y) {            // column part: N[x+1][y], N[x+2][y], ..., N[y-1][y]
        Nv[k] = N[i][y];
        k++;
        i++;
    }
}