mr-edd.co.uk :: horsing around with the C++ programming language

Parallel computing is now. Parallel programming isn't

[13th February 2007]

Intel have recently announced that they are working on a processor with 80 cores, which is expected to see the light of day within 5 years.

Many people predicted this would happen, but what I think a lot of people don't yet realise is what a profound effect this will have on the software programming landscape.

If you haven't come across Herb Sutter's classic article The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software by now, I recommend you read it ASAP. I've been closely following Sutter's talks and presentations over the last couple of years and this post is effectively a summary of his very important writings. There's a video of a presentation of his along with some slides available on the Intel research website, which is also worth a look when you have an hour or so to spare.

The long and the short of it is that our software will only see speed improvements on new hardware if we the programmers write our code in an appropriate way. This isn't just another hardware innovation that's orthogonal to programming methodology. As Sutter says, this really is more of a software revolution, even though it's the hardware that is changing.

To take full advantage of the kinds of chips that Intel are planning, we need to code our software to do 80 things at once. Or more accurately to be able to do N things at once, where N is any positive number. The number of cores in a CPU will be a moving target — the primary means of increasing processing power from now on will be the addition of more cores to CPUs.
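To make that concrete, here's a rough sketch of what 'do N things at once' might look like. I'm writing it against the kind of thread interface the committee is considering (modelled on Boost.Thread), so the names std::thread and hardware_concurrency are assumptions rather than something you can compile everywhere today:

    #include <cstddef>
    #include <thread>
    #include <vector>

    // Process the half-open range [begin, end) of some big data set.
    void process_chunk(std::size_t begin, std::size_t end)
    {
        // ... the real work goes here ...
    }

    int main()
    {
        const std::size_t total_items = 1000000;

        // N is whatever today's hardware gives us: 2, 8, 80, ...
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 1; // the call may report "unknown"

        std::vector<std::thread> workers;
        const std::size_t chunk = total_items / n;

        for (unsigned i = 0; i != n; ++i)
        {
            const std::size_t begin = i * chunk;
            const std::size_t end = (i + 1 == n) ? total_items : begin + chunk;
            workers.push_back(std::thread(process_chunk, begin, end));
        }

        for (std::size_t i = 0; i != workers.size(); ++i)
            workers[i].join();
    }

The point isn't the details; it's that nothing in the code assumes a particular value of N.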

But our programming languages just do not provide support for automatically distributing the workload across multiple threads, and neither do our compilers. As Sutter points out, there are a couple of classes of application that lend themselves to multi-core processing, such as servers, databases and image processing software, but in general it's hard to see how to map the work of your average application on to a large number of cores. As he says, most applications aren't Photoshop!

We need languages (or extensions to existing languages) that are able to express what Sutter calls latent parallelism; we need the compiler to be able to identify chunks of code that can be run in parallel, perhaps with a little help. But designing these languages/extensions/compilers isn't exactly a walk in the park.
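To give a flavour of what 'a little help' can look like today, here's a loop annotated for OpenMP (which I'll come back to below). The pragma is a promise from the programmer that the iterations are independent; with OpenMP support enabled the compiler farms them out across threads, and without it the line is simply ignored:

    #include <cmath>
    #include <vector>

    void scale(std::vector<double> &data)
    {
        const long n = static_cast<long>(data.size());

        // The programmer asserts the iterations don't depend on one another;
        // the compiler/runtime decides how many threads to use.
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            data[i] = std::sqrt(data[i]);
    }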

Usually a compiler won't know what a call to a function will do, at least not until link time. And that's in the best-case scenario. Only a handful of compilers can do the kind of global analysis we're talking about at this point in time, so it is perhaps unreasonable to place the burden of implementing such a feature on all C++ compiler vendors, at least not without additional support in the language definition.

What this means is that the average compiler is currently unable to reason about the correctness of the potential parallelisation of a given function call, let alone analyse any potential efficiency gains that would result from doing so. And even if a compiler is capable of global analysis, it still won't be able to deduce anything about the safety and benefits of calls to virtual functions, because these calls by their very nature are 'hooked up' at runtime.
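A contrived illustration (the task type here is made up for the example):

    #include <cstddef>
    #include <vector>

    struct task
    {
        virtual ~task() {}
        virtual void run() = 0; // which override runs is only known at runtime
    };

    // Could each tasks[i]->run() be farmed out to a different core? The
    // compiler has no idea: the concrete run() might touch shared state,
    // perform I/O, or rely on the previous iteration having finished.
    void run_all(const std::vector<task *> &tasks)
    {
        for (std::size_t i = 0; i != tasks.size(); ++i)
            tasks[i]->run();
    }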

Of course, in dynamically typed languages the situation is even worse! The high degree of polymorphism that is their strength in the single-threaded world is a serious hindrance in the multi-threaded universe.

So compilers just can't help us at the moment. And the same goes for the hardware. Instead it is the programmer who has to deduce what can be done on a separate thread to speed up their software. It is also the programmer who must deduce whether it is 'safe' to do a given task in parallel with another (or another 79!). And if you've ever done any multi-threaded programming, you'll know that this kind of deduction ain't easy. The human brain just isn't geared to thinking about concurrent tasks. Threading errors are some of the easiest to make and hardest to debug. Hmm, women are reportedly better at multi-tasking — perhaps we'll be seeing more female programmers after the concurrency revolution!
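If you doubt that, here is about the smallest threading error I can write down, again using the assumed std::thread-style interface from earlier. Two threads hammer an unprotected counter; the lost updates vary from run to run, which is exactly what makes this class of bug so painful to track down:

    #include <iostream>
    #include <thread>

    long counter = 0; // shared and unprotected

    void bump()
    {
        for (int i = 0; i != 100000; ++i)
            ++counter; // read-modify-write: two threads can interleave here
    }

    int main()
    {
        std::thread a(bump);
        std::thread b(bump);
        a.join();
        b.join();

        std::cout << counter << '\n'; // rarely 200000, and different every time
    }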

So new features are needed in our languages (at least for the men!) that communicate information on synchronisation and parallelism primitives, so that code can be optimised effectively and checked for correctness at compile time, before it's too late. That is, we ideally need compiler support for 'thread-correctness', much like the support we already have for const-correctness, and we need it now.
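To labour the analogy (the account type is made up for the example): the compiler will happily reject a write through a const reference before the program ever runs, but it has nothing whatsoever to say about the equivalent threading mistake:

    struct account
    {
        double balance;
    };

    void audit(const account &a)
    {
        // a.balance = 0.0;  // uncommenting this is a *compile-time* error:
        //                   // const-correctness is checked before we ever run
    }

    void transfer(account &from, account &to, double amount)
    {
        // Nothing stops two threads calling this on the same accounts at once,
        // and no compiler today will warn us about it.
        from.balance -= amount;
        to.balance += amount;
    }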

But I for one can't see how this can be done. Currently the C++ standards committee is scrambling to add proper support for threads to the language definition. And not a moment too soon. But even though we'll be able to kick off extra threads in a standardised way, we still won't have any compile-time correctness constructs for concurrency.

This isn't a C++-centric problem, however. None of the real-world languages in use today (that I'm aware of) have the features needed to manually ensure correct multi-threaded programming, let alone parallelise code automatically on the programmer's behalf. Sure, a couple of languages have keywords based around locks and mutexes, but using locks alone in an application with tens or hundreds (or thousands?) of threads is a recipe for thread-starvation unless extreme care is taken from the outset (i.e. in the macroscopic design). There are also extensions such as OpenMP that make it somewhat simpler to run simple tasks in parallel, but it is still entirely on the programmer to ensure runtime correctness.
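For completeness, here's the lock-based fix to the racy counter from earlier, again written against assumed standard-library names in the Boost style (std::mutex and a scoped lock). It's correct, but notice that every thread now queues up on a single mutex; scale that pattern naively to hundreds of threads and you can see where the starvation worries come from:

    #include <mutex>
    #include <thread>

    long counter = 0;
    std::mutex counter_mutex;

    void bump()
    {
        for (int i = 0; i != 100000; ++i)
        {
            std::lock_guard<std::mutex> lock(counter_mutex); // released at end of scope
            ++counter;
        }
    }

    int main()
    {
        std::thread a(bump);
        std::thread b(bump);
        a.join();
        b.join();
    }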

So what will the future hold for our beloved programming languages? And how can we adapt our existing practices and languages to cope with large numbers of cores?

Comments

Florian

[19/03/2007 at 22:31:00]

I think the blackboard model of programming looks very promising and might really take off after the 'concurrency revolution'. This doesn't call for a new compiler or language; it solves a lot of design problems related to concurrent programming, distributes load evenly, and scales well. Don't know of an implementation for C/C++, but there's JavaSpaces, GigaSpaces and IBM's TSpaces. Really interesting.
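A toy sketch of the core idea in C++ (the names are made up for the example, and it assumes Boost-style mutex and condition variable facilities; the real tuple-space systems above add associative matching, distribution and persistence on top of this):

    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <string>

    // A toy 'blackboard': producers post items, any free worker takes one.
    // All the coordination lives in the blackboard, not in the workers.
    class blackboard
    {
    public:
        void put(const std::string &item)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(item);
            ready_.notify_one();
        }

        std::string take()
        {
            std::unique_lock<std::mutex> lock(mutex_);
            while (items_.empty())
                ready_.wait(lock);
            std::string item = items_.front();
            items_.pop_front();
            return item;
        }

    private:
        std::mutex mutex_;
        std::condition_variable ready_;
        std::deque<std::string> items_;
    };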
