Matlab parallel programming: parfor issues

If you have ever used OpenMP, you probably know that this specification is extremely easy to use and very helpful in many cases. Now that parallel programming is becoming more and more common, it is definitely one of the ways to go.

MathWorks decided to follow the trend and developed the Parallel Computing Toolbox, which implements a parallel for loop (parfor). Unfortunately, there are some serious issues with it, which I describe below.

0. Simple example

I think no detailed comments are required for this example: we just open a pool of four workers and run the loop in parallel:
matlabpool open 4;
a = zeros(4, 1);
parfor i = (1:4)
  a(i) = timeConsumingFunction(i);
end
matlabpool close;

1. Shared memory

One of the most important issues in parallel programming in general is memory management. When you write parallel code, you usually assume one of two cases:
  • data should be copied to every worker if the workers have physically separate memory (for example, when they run on different computers)
  • data should not be copied (shared memory) if all the workers already have access to it (for example, threads on one processor)
I would assume that Matlab figures this out by itself. It does not. Then I would assume that when you open a local pool of workers, they would share memory:
matlabpool open local 2;
Forget about that too. In my case Matlab physically copied every local variable used in the loop to every worker, because local workers are separate MATLAB processes, each with its own memory. Maybe my settings played a role as well, but either way you have to be careful: the overhead is huge if your workers perform only a small number of operations.
Moral. Be careful about shared memory: it doesn't always work as you expect. It is better to give every worker a larger piece of work to minimize the overhead.
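A minimal sketch of "larger pieces of work" that also limits copying, assuming a made-up input matrix data with one row per chunk and a placeholder computation. Indexing data(k, :) with the loop variable makes data a sliced input variable, so each iteration needs only its own row rather than the whole matrix being sent to every worker.

```matlab
% Hypothetical input: 4 chunks, one per row; sizes are arbitrary.
data = rand(4, 1e6);
nChunks = size(data, 1);
results = zeros(nChunks, 1);

parfor k = 1:nChunks
    % data(k, :) uses the loop variable in one index and ':' in the other,
    % so data is a sliced variable: each iteration gets only row k.
    row = data(k, :);
    % Placeholder for a reasonably large amount of work per iteration.
    results(k) = sum(row .^ 2);
end
```

Referring to the whole of data inside the loop instead (for example sum(data(:))) would turn it into a broadcast variable, which is copied to every worker.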

2. Iterator values

What is wrong with this code?
parfor i = (1:2:9)
Nothing, except that Matlab cannot handle non-consecutive iterator values. Nor can it handle non-increasing or non-integer ones, as the error message tells us:
The range of a parfor statement must be increasing consecutive integers.
Simple, yet I think it is a little strange and non-obvious. And do not forget that the range should be a row vector, not a column vector.
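A simple way around this restriction is to loop over consecutive indices and look up the value you actually need inside the body. The squaring below is only a placeholder computation.

```matlab
vals = 1:2:9;                 % the non-consecutive values we really want
out = zeros(size(vals));

parfor k = 1:numel(vals)      % increasing, consecutive integers: allowed
    v = vals(k);              % vals is sliced: each iteration reads one element
    out(k) = v ^ 2;           % placeholder work on the actual value
end
% out is now [1 9 25 49 81]
```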

3. Sliced variables

OK then, what is wrong with the following code?
a = zeros(3,1);
parfor i = (2:4)
  a(i-1) = 1;
end
I'll tell you: we cannot use a(i-1) here. In the iteration with index i we can only change the i-th element. The idea is that this makes it easy to guarantee that no element is changed by two different workers. This is called slicing, and the array is then a sliced variable. However, I still don't understand why I should slice exactly at i and not at i-1.
You can also change the whole i-th row (or column) of a matrix and the i-th element of a cell array:
a(i) = i;      % the i-th element
a(i, :) = i;   % the whole row
a{i} = i;      % cell array
That imposes some huge design limitations: in some problems you have to change multiple elements in one iteration, or change an element whose index is only linearly related to i. My advice is to save all the results to a temporary cell array and then use another loop (this time an ordinary for loop) to rearrange the results into the required format.
Moral. Think about your design in advance. Use temporary arrays and rearrange the results afterwards in one thread.
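A minimal sketch of the temporary-cell-array pattern. Here each iteration i is assumed to produce two placeholder values that belong at positions 2*i-1 and 2*i of the final vector, which is not a valid sliced output; the serial for loop afterwards puts them in place.

```matlab
n = 4;
tmp = cell(n, 1);                      % preallocated; tmp{i} is a sliced output

parfor i = 1:n
    tmp{i} = [10 * i, 10 * i + 1];     % placeholder results of iteration i
end

% Ordinary serial loop: move each iteration's results to where they belong.
result = zeros(1, 2 * n);
for i = 1:n
    result(2 * i - 1 : 2 * i) = tmp{i};
end
```

For a plain offset like a(i-1), another option is to loop over 1:3 directly and compute the shifted value (i + 1) inside the body.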

4. Forbidden commands

You cannot use some commands inside a parfor loop; for example, saving something to disk is restricted. The reason is again the same: if two different workers save the same file, you will get unpredictable results. Just do not be surprised.
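One way to stay within this restriction, sketched below, is to keep the parfor body free of disk access and write the files afterwards from the client in an ordinary loop. The file names and the magic() computation are placeholders. Another commonly suggested workaround is to call save from a small helper function with one file name per iteration.

```matlab
results = cell(1, 4);
parfor i = 1:4
    results{i} = magic(i + 2);   % placeholder work, no disk access here
end

% Serial part: only the client writes to disk, one file per result.
for i = 1:4
    M = results{i};
    save(fullfile(tempdir, sprintf('result_%d.mat', i)), 'M');
end
```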

5. Initialization

A simple issue that took me at least half an hour to figure out, so I am sharing it with you.
This works perfectly:
a = {};
for i = (1:3)
  a{i} = i;
end
And this fails:
a = {};
parfor i = (1:3)
  a{i} = i;
end
Why? Because the cell array a must not change size while the loop is executing in parallel; it should be initialized in advance instead. This code works fine:
a = cell(3, 1);
parfor i = (1:3)
  a{i} = i;
end

6. More than one multithread job

If you have, say, two processors in one environment and each of them has many cores (a typical cluster scenario), at some point you will probably want to run several programs at once, each of them in parallel. Then everything discussed above simply fails. That happens because, by default, one environment means one pool of workers: if one job opens a pool of 4 workers, the second job will try to access them too, causing conflicts. So instead of the simple:
matlabpool open 4;
You should use the following pool request:

matlabpool(sched, 4);
Here sched is a scheduler object set up to keep its data in a temporary directory that is separate for every job you run, so the pools of different jobs no longer interfere with each other.
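In newer releases matlabpool has been replaced by parcluster and parpool. A sketch of the same idea with that API, assuming the local cluster profile is named 'local' (recent releases call it 'Processes'): each job gets its own JobStorageLocation, a unique folder from tempname, so concurrent sessions do not share scheduler data.

```matlab
storageDir = tempname;                    % unique, job-specific folder name
mkdir(storageDir);

cluster = parcluster('local');            % assumed profile name, see above
cluster.JobStorageLocation = storageDir;  % keep this job's data separate
pool = parpool(cluster, min(4, cluster.NumWorkers));

a = zeros(4, 1);
parfor i = 1:4
    a(i) = i ^ 2;                         % placeholder work
end

delete(pool);                             % shut the pool down
rmdir(storageDir, 's');                   % remove the job storage folder
```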


Matlab is a great tool, but it caused me a lot of pain when it came to parallel programming. Hopefully it will develop further, but for now have a closer look at what is written here. I hope it saves you quite some time working around Matlab's parfor.
