Parallel programming course

I have spent this whole week in the Computer Science and Informatics building. I wonder how “informatics” crept into the English language; I was taught in 2002 that there is no such thing as “informatics”. There’s only Computer Science. The term “informatics” was supposed to be used only by mistake. German has “Informatik”, Polish has “informatyka”; it’s probably those non-native English speakers who just kept using it until even English people started believing that it’s a legitimate English word. A lie told a thousand times… well, what was I… yes, the course.

The main topic was parallel programming: harnessing multiple processors to solve a single, computationally intensive task such as a weather forecast or a car-crash simulation. There’s more to it than that; there are many more problems you can solve, and lots of money you can save, by simulating things instead of doing them for real.

Take the car companies. They need to see how a car will behave when hitting an object of this and this shape, from this and this angle, at this and this speed… experiments of this kind are expensive. You need to actually destroy a vehicle each time. Wouldn’t it be nicer to crash zeros and ones instead of heavy metal? To leave your desktop PC to calculate the impact overnight? Well, it would, except that your PC doesn’t have enough horsepower. Or FLOPS, if you like. It would take it a week.

What really counts is how long you have to wait for the result. If it takes a week to run the forecast for tomorrow’s weather, there’s no point in running it, really. It only makes sense if you can have it in, say, three hours, so you have time to send it to the news.

The problem is, there’s no single processor that can possibly make the forecast for you that fast. Buy the most expensive Intel, AMD or even Cell processor, and you’re still not even close to what you need. The only way to do it is to use many processors in parallel. You need a stable of machines, and you need to make them work together.

And this ain’t easy.

Thing is, your computers won’t know how to work together unless you tell them. Just like your code describes how to solve your problem, it also needs to describe how to split the problem up and collect the results. Maybe in trivial cases the computer can figure it out by itself, but I bet it wouldn’t work for fancier scientific calculations.

Basically, that’s what brought me to the course: how to write parallel code.

In case you didn’t know, when one says something like “basically” or “generally”, it means “not”. In this case, the truth is that I went to the course because of the really nice guys from ICHEC I wanted to meet. And it worked! I had very nice lunch time conversations, allowing me to refresh my diminishing social skills. As a side effect, I learned some MPI and OpenMP.

Both are just specifications, standards, allowing multiple implementations to coexist.

OpenMP is something that Core Duo owners might be interested in, because it allows you to easily parallelize code on machines with shared memory. All you need to do is put some preprocessor directives in your code. If your compiler understands OpenMP, you get parallel code right away. If not, the directives will be ignored, treated as comments.

#pragma omp parallel for
for (i = 0; i < n; i++) { … }

What’s interesting is that GCC 4.2, which has just been released, supports it! It’s the first open-source compiler to support OpenMP.
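
Here’s a small sketch of what a complete program around that loop might look like; the array and its size are just placeholders I made up for illustration. Compile it with gcc -fopenmp and the loop runs across your cores; compile without the flag and it’s an ordinary serial program.

#include <stdio.h>

#define N 1000000

static double a[N];

int main(void)
{
    int i;

    /* An OpenMP-aware compiler splits these iterations among threads;
       any other compiler simply ignores the pragma. */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[%d] = %f\n", N - 1, a[N - 1]);
    return 0;
}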

That’s cool, but what if two cores aren’t enough? You can buy a 16-processor machine, or an even bigger one, but it’s going to cost you a fortune. And I don’t think you can really go beyond 32 processors with shared memory. If you need, say, 1000 processors, it’s better to deploy independent machines with private memory. In other words, just take a bunch of PCs and tie them together. To make them work together, you need MPI.

MPI stands for Message Passing Interface. It doesn’t require any special compiler; it’s a library you link against, and any compiler will do. The library takes care of sending data back and forth. MPI isn’t as nice as OpenMP: it has a much steeper learning curve. You need to manipulate data by hand, slice arrays and do lots of additional work. Well, if you want to run your program on 1000 processors, it has to come at a price.
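
To give a flavour of it, here’s a minimal sketch of my own (not from the course materials): every process asks who it is and how many peers it has, then the ranks get summed up onto process 0. You build it with mpicc and launch it with mpirun, and the library does the actual shuffling of data between machines.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many are running? */

    /* Every process contributes its rank; the sum ends up on rank 0. */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d processes, sum of ranks = %d\n", size, sum);

    MPI_Finalize();
    return 0;
}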

The ICHEC guys did a good job preparing materials and practicals for the course. It wasn’t all lectures; there was some actual coding too. I didn’t write the main practical from the MPI course, I think it was too much for the amount of time given, but I might try implementing it at home and running it on my two laptops. Then, I think, I might add MPI and OpenMP to my CV, in a very small font.

Author: automatthias

You won't believe what a skeptic I am.

One thought on “Parallel programming course”

  1. My company uses PyLinda for harnessing 8-way servers for massive number crunching jobs written in Python. PyLinda can also be used to distribute work to multiple servers, although we haven’t exercised that option yet.
