Discussion:
parallelizing an array multiplication problem
(too old to reply)
Kamran Hameed
2008-05-16 17:43:48 UTC
Permalink
folks.. i have a simple C program that calculates the product of two
4*4 matrices. Here is the Code.

#include<stdio.h>
#include<string.h>
void printA(int array[4][4]) {
int i,j;
for( i=0;i<4;i++) {
for(j=0;j<4;j++) {
printf("%*d",6,array[i][j]);
}
printf("\n");
}
}
void mult(int arr1[4][4],int arr2[4][4],int result[4][4]) {

int i,j,k,s;
for(i=0;i<4;i++) {
for( j=0;j<4;j++) {
s=0;
for(k=0;k<4;k++) {
s+=arr1[i][k] * arr2[k][j];
}
result[i][j]=s;
}
}
}
main() {
printf("Serail Matrix Multiplication Program\n");
int arr1[4][4]={{1,2,3,4},{5,6,7,8},{1,2,3,4},{5,6,7,8}}; /*Creating a
4*4
Matrix of Integers*/
/*Creating another 4*4 Matrix of Integers*/
int arr2[4][4]={{11,21,31,41},{1,1,1,1},{1,2,3,4},{5,6,7,8}};
int result[4][4]={0};
printf("Array 1 is ........\n");
printA(arr1);
printf("Array 2 is ........\n");
printA(arr2);
printf("After Multiplication, Result is ........\n");
mult(arr1,arr2,result);
printA(result);
}


Now i want to parallelize it i-e i want the product to be calculated
by different processes but i am a bit lost how to do it?

Any Guide Lines?
Sebastian Hanigk
2008-05-17 06:45:18 UTC
Permalink
Post by Kamran Hameed
folks.. i have a simple C program that calculates the product of two
4*4 matrices. Here is the Code.
[snip]
Now i want to parallelize it i-e i want the product to be calculated
by different processes but i am a bit lost how to do it?
First, your problem is way too small to benefit from
parallelisation. That out of the way, the usual approach is to
slice the matrices into blocks (in your case, you could slice into
1x1-blocks, for example's sake), distribute those blocks over the
available processes and implement a communication-computation strategy.

The CC-strategy is the part where you have to do your work, you could
read up on that e.g. in the ScaLAPACK manual or simply ask Google for
"parallel matrix multiplication".


A few (if a bit old) papers:

Robert van de Geijn \& Jerrell Watts: SUMMA: scalable universal matrix
multiplication algorithm., Concurrency - Practice and Experience 9,
255-274, 9, 1997

Jaeyoung Choi {\it et al.}: PUMMA}: {Parallel Universal Matrix
Multiplication Algorithms} on distributed memory concurrent computers,
Concurrency: Practice and Experience 6, 543--570, 6, 1994

Jaeyoung Choi: A new parallel matrix multiplication algorithm on
distributed-memory concurrent computers, Concurrency: Practice and
Experience 10, 655--670, 10, 1998

Agarwal {\it et al.}: A high-performance matrix-multiplication algorithm
on a distributed-memory parallel computer, using overlapped
communication, IBM J. Res. Dev. 38, 673--681, 38, Riverton, NJ, USA: IBM
Corp., 1994

Agarwal {\it et al.}: A three-dimensional approach to parallel matrix
multiplication, IBM Journal of Research and Development 39, 575--582,
39, 1995

Krishnan \& Nieplocha: {SRUMMA}: A Matrix Multiplication Algorithm
Suitable for Clusters and Scalable Shared Memory Systems, 2005


Shameless plug for works from my former university's group:

Bader {\it et al.}: Parallelisation of Block Recursive Matrix
Multiplication in Prefix Computations, 15, 175--184, Bischof {\it et
al.}: in: Parallel Computing: Architectures, Algorithms and
Applications, 15, IOS Press, 2008

Bader \& Zenger: A Cache Oblivious Algorithm for Matrix Multiplication
Based on Peano’s Space Filling Curve, 3911, 1042--1049, in: Parallel
Processing and Applied Mathematics, 6th International Conference, PPAM
2005, 3911, 2006


Sebastian

Loading...