HW 4
Question 1
Explain the following terms:
- Cannon’s algorithm: Cannon's algorithm is a well-known parallel matrix-multiplication algorithm that distributes the work of computing C = A · B over a 2-dimensional grid of processors. Each matrix is partitioned into square sub-blocks, one per processor; after an initial alignment, the blocks of A are cyclically shifted left and the blocks of B shifted up in each round, and every processor multiplies and accumulates the pair of blocks it currently holds. After all rounds, each processor holds its block of the final product matrix.
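The 2D block partitioning that Cannon's algorithm starts from can be sketched as follows (a minimal illustration, not from the homework; the function name `block_partition` and the example sizes are my own):

```python
# Sketch: 2D block partitioning of an n x n matrix into a sqrt_p x sqrt_p
# grid of (n/sqrt_p) x (n/sqrt_p) sub-blocks, one per processor -- the
# data layout Cannon's algorithm distributes across the grid.

def block_partition(M, sqrt_p):
    """Split square matrix M (list of rows) into a sqrt_p x sqrt_p grid of sub-blocks."""
    n = len(M)
    b = n // sqrt_p  # block side length
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(sqrt_p)]
            for i in range(sqrt_p)]

A = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 example matrix
blocks = block_partition(A, 2)  # 2x2 grid of 2x2 blocks
print(blocks[0][1])  # top-right block: [[2, 3], [6, 7]]
```

Here `blocks[i][j]` is exactly the sub-matrix that processor P_{i,j} would own before the initial alignment step.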
- Blocking and non-blocking send/receive: In a blocking send/receive, the call does not return until the operation is complete — for a send, until the message has been delivered (or the send buffer can safely be reused); for a receive, until the message has arrived. In a non-blocking send/receive, the call returns immediately and the program continues executing; it must later check (e.g., with MPI_Wait or MPI_Test) that the operation has finished before reusing the buffer.
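The blocking/non-blocking distinction can be illustrated outside MPI with Python's `queue` module (an analogy of my own, not MPI itself): `get` waits like a blocking receive, while `get_nowait` returns immediately like a non-blocking one.

```python
# Analogy (not MPI): queue.Queue offers both blocking and non-blocking
# receive-like operations, mirroring the MPI_Recv vs MPI_Irecv distinction.
import queue

q = queue.Queue()

# Non-blocking receive: returns (here, raises) immediately if no message is ready.
try:
    q.get_nowait()
except queue.Empty:
    print("non-blocking get returned immediately: no message yet")

q.put("hello")           # "send" a message
msg = q.get(block=True)  # blocking receive: waits until a message arrives
print(msg)
```

With an empty queue, the blocking `get` would have waited indefinitely — which is exactly the behavior behind the deadlock discussed in Question 2.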
- Single Program Multiple Data: A parallel computing approach where a single program is executed on multiple processors or cores simultaneously, each processing its own set of data.
- Loosely synchronous: A parallel execution style in which processes compute independently for a phase and then synchronize at regular points (for example, at the end of each iteration), rather than being tightly synchronized at every step.
Question 2
Part 1
Deadlock is possible because both programs execute a blocking send on line 3 at the same time. With blocking send/receive, each process stops at line 3 waiting for the other to post the matching receive, but neither has reached its receive call yet. Reversing the send-and-receive order in program A avoids the deadlock: A receives while B sends, and then B receives while A sends, so every send finds a matching receive.
Part 2
No, it is not safe: program A may receive incorrect data when B has not yet sent it, or when B is still in the process of copying the data into buffer a.
Question 3
Part 1
Initialize matrices A and B and MPI size p and rank.
Initially, each P_{i,j} has A_{i,j} and B_{i,j}. Align elements of A and B by reordering them so that A_{i, (i+j) mod √p} and B_{(i+j) mod √p, j} are on P_{i,j}.
Each P_{i,j} computes C_{i,j} = a · b for the first round
(a and b are the local blocks of A and B on P_{i,j})
For k = 1 to √p − 1:
    // Shift matrix a left by one
    MPI_Send(a, P_{i, (j−1) mod √p})
    MPI_Recv(a, P_{i, (j+1) mod √p})
    // Shift matrix b up by one
    MPI_Send(b, P_{(i−1) mod √p, j})
    MPI_Recv(b, P_{(i+1) mod √p, j})
    C_{i,j} += a · b
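The schedule above — initial skew, then repeated left/up shifts with a multiply-accumulate per round — can be checked with a serial stand-in for the MPI ranks (my own single-process sketch; each "send/receive" becomes an index shift):

```python
# Serial simulation of Cannon's block schedule on a sqrt_p x sqrt_p grid.
# Ablk[i][j], Bblk[i][j] are the blocks initially owned by P(i, j).

def matmul(X, Y):
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def madd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def cannon(Ablk, Bblk, sqrt_p):
    # Initial alignment: P(i,j) gets A(i, (i+j) mod sqrt_p) and B((i+j) mod sqrt_p, j)
    a = [[Ablk[i][(i + j) % sqrt_p] for j in range(sqrt_p)] for i in range(sqrt_p)]
    b = [[Bblk[(i + j) % sqrt_p][j] for j in range(sqrt_p)] for i in range(sqrt_p)]
    C = [[matmul(a[i][j], b[i][j]) for j in range(sqrt_p)] for i in range(sqrt_p)]
    for _ in range(sqrt_p - 1):
        # Shift a left by one and b up by one (index shifts stand in for messages)
        a = [[a[i][(j + 1) % sqrt_p] for j in range(sqrt_p)] for i in range(sqrt_p)]
        b = [[b[(i + 1) % sqrt_p][j] for j in range(sqrt_p)] for i in range(sqrt_p)]
        C = [[madd(C[i][j], matmul(a[i][j], b[i][j])) for j in range(sqrt_p)]
             for i in range(sqrt_p)]
    return C

# Check on a 2x2 grid of 1x1 blocks (i.e., scalar "blocks"):
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Ablk = [[[[A[i][j]]] for j in range(2)] for i in range(2)]
Bblk = [[[[B[i][j]]] for j in range(2)] for i in range(2)]
C = cannon(Ablk, Bblk, 2)
Cout = [[C[i][j][0][0] for j in range(2)] for i in range(2)]
print(Cout)  # [[19, 22], [43, 50]] == A * B
```

The simulation reproduces the direct product, confirming the alignment indices and shift directions used in the pseudocode.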
Part 2
Initialize matrices A and B and MPI size p and rank.
Initially, each P_{i,j} has A_{i,j} and B_{i,j}. Align elements of A and B by reordering them so that A_{i, (i+j) mod √p} and B_{(i+j) mod √p, j} are on P_{i,j}.
Each P_{i,j} computes C_{i,j} = a · b for the first round
(a and b are the local blocks of A and B on P_{i,j})
For k = 1 to √p − 1:
    // Send shifted a and b consecutively
    MPI_Isend(a, P_{i, (j−1) mod √p})
    MPI_Isend(b, P_{(i−1) mod √p, j})
    Barrier(all sent)
    // Receive shifted a and b consecutively
    MPI_Irecv(a, P_{i, (j+1) mod √p})
    MPI_Irecv(b, P_{(i+1) mod √p, j})
    Barrier(all received)
    C_{i,j} += a · b
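The Isend/wait pattern used here — start the transfer, do other work, then wait on the handle — can be mimicked with `concurrent.futures` (an analogy of my own, not MPI; `transfer` is a hypothetical stand-in for a message in flight):

```python
# Analogy (not MPI): submit() returns immediately with a handle, like
# MPI_Isend/MPI_Irecv returning a request; result() waits for completion,
# like MPI_Wait on that request.
from concurrent.futures import ThreadPoolExecutor
import time

def transfer(block):
    time.sleep(0.05)   # stand-in for the message being in flight
    return block

with ThreadPoolExecutor() as pool:
    req_a = pool.submit(transfer, "block a")  # returns at once, transfer proceeds
    req_b = pool.submit(transfer, "block b")
    # ... computation could overlap with the transfers here ...
    print(req_a.result(), req_b.result())     # block until both complete
```

This overlap of communication with computation is the main motivation for the non-blocking version of the algorithm.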
Part 3
Data communicated in each iteration is 2(n/√p)² = 2n²/p words per process (each process sends its local a and b blocks, (n/√p)² words each).
We have √p − 1 shift iterations, so in total each process communicates 2(n²/p)(√p − 1) words.
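A quick numeric check of this count (the sizes n = 8, p = 4 are hypothetical, chosen only for illustration):

```python
# Per iteration each process sends its a and b blocks, (n/sqrt_p)^2 words each,
# over sqrt_p - 1 shift iterations.
import math

n, p = 8, 4
sqrt_p = math.isqrt(p)
per_round = 2 * (n // sqrt_p) ** 2   # 2 * (8/2)^2 = 32 words per process per round
total = per_round * (sqrt_p - 1)     # 1 shift round when sqrt_p = 2 -> 32 words
print(per_round, total)
```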