mr-edd.co.uk :: horsing around with the C++ programming language

Adding determinism to multithreaded tests

[13th January 2008]

I haven't posted a whole lot recently, but that's not because I've been inactive. I've recently been assembling a small library of code containing a bunch of threaded queue implementations for use in concurrent patterns. However, I have failed to create any good tests for this code, so I've been reluctant to post the libraries here.

One of the reasons that I've struggled to write tests is that writing them for threaded code is hard. There is little deterministic behaviour to rely on in a pre-emptive threading environment, such as the one provided by the current Boost.Threads library that I often use. Thus more often than not, one cannot write unit tests in the conventional sense. Instead one has to write stress tests, where you hurl data at your code in order to exercise the synchronization elements adequately.

This is something I've only learned relatively recently because I'm still really rather new to both writing formal tests and multi-threading. While I think that writing stress tests is necessary, I would also like to be able to write some more conventional tests in an environment where the threading behaviour is more deterministic. So over the last week or two I've veered off course and have started to implement a new threading library to help solve this problem.

A motivating example

One of the classes in my little library of threaded queues looks something like this:

class task_queue
{
    public:
        typedef boost::function<void ()> task;

        task_queue(std::size_t num_workers);
        ~task_queue();

        void append(const task &t);

    private:
        // ...
};

It's pretty simple, having only a single member function to speak of (my kinda code!). The idea is that you create a task queue that spins up num_workers worker threads to churn through the tasks that you add with the append() member function.
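The post doesn't show the queue's internals, but a minimal sketch of how such a class might be put together looks something like the following. This is purely illustrative (it is not coco's or the real task_queue's implementation), and it uses std::thread and friends from modern C++ as a stand-in for Boost.Threads:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative sketch only; the real task_queue's internals aren't shown
// in the post. std::thread stands in for Boost.Threads here.
class task_queue
{
    public:
        typedef std::function<void ()> task;

        explicit task_queue(std::size_t num_workers)
        {
            for (std::size_t i = 0; i != num_workers; ++i)
                workers_.emplace_back([this] { run(); });
        }

        ~task_queue()
        {
            {
                std::lock_guard<std::mutex> lk(mtx_);
                done_ = true;
            }
            cv_.notify_all();
            for (std::thread &w : workers_) w.join();
        }

        void append(const task &t)
        {
            {
                std::lock_guard<std::mutex> lk(mtx_);
                tasks_.push_back(t);  // FIFO: push at the back...
            }
            cv_.notify_one();
        }

    private:
        void run()
        {
            for (;;)
            {
                task t;
                {
                    std::unique_lock<std::mutex> lk(mtx_);
                    cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                    if (tasks_.empty()) return;  // done_ set and no work left
                    t = tasks_.front();          // ...pop from the front
                    tasks_.pop_front();
                }
                t();
            }
        }

        std::mutex mtx_;
        std::condition_variable cv_;
        std::deque<task> tasks_;
        std::vector<std::thread> workers_;
        bool done_ = false;
};
```

Note that the destructor here drains any remaining tasks before joining the workers, which is one of several reasonable policies a real implementation might choose.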

Now, the tasks are supposed to get picked up by the worker threads in FIFO order. This guarantees that each task will be dealt with in finite time, assuming all tasks added eventually terminate. However, it is impossible to test this behaviour when your threads are pre-emptive[1]. Let's look at why this is.

As a first attempt, we might think to try to test FIFO ordering something like this:

bool is_sorted(const std::vector<unsigned> &numbers); // implementation omitted

boost::mutex mtx;
boost::condition finished;
std::vector<unsigned> finished_tasks;

void write_number_to_array(unsigned number, unsigned total_tasks)
{
    boost::mutex::scoped_lock lk(mtx);
    finished_tasks.push_back(number);

    if (finished_tasks.size() == total_tasks)
        finished.notify_one();
}

void test_fifo_ordering()
{
    conc::task_queue tq(10); // uses 10 worker threads

    const unsigned total_tasks = 1000;
    for (unsigned i = 0; i != total_tasks; ++i)
        tq.append(boost::bind(&write_number_to_array, i, total_tasks));

    // Wait for the tasks to finish
    {
        boost::mutex::scoped_lock lk(mtx);
        while (finished_tasks.size() != total_tasks) finished.wait(lk);
    }

    CHECK( is_sorted(finished_tasks) );
}

The idea is that each task is given a number representing the time at which it was created. The task body (the write_number_to_array() function) then appends its number to an array. So in the end we would like to have a sorted array of task numbers because tasks added to the queue earlier should start earlier and thus append their numbers to the array earlier, too.
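For completeness, the omitted is_sorted() could be written along these lines with std::adjacent_find:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// True if the sequence is in non-decreasing order. adjacent_find looks
// for the first neighbouring pair that is *out* of order; if there isn't
// one, the sequence is sorted.
bool is_sorted(const std::vector<unsigned> &numbers)
{
    return std::adjacent_find(numbers.begin(), numbers.end(),
                              std::greater<unsigned>()) == numbers.end();
}
```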

Alas this is not the case when dealing with pre-emptive threads. Consider the case when we add two tasks to a task_queue. The OS thread scheduler could decide to do this:

Time slices diagram

Here, task2 gets to run before task1 and so the array that we'd get at the end of the test wouldn't be sorted, even though the tasks were added in the opposite order. So the test we wrote above would fail, even though the code is behaving properly; the tasks might get pulled out of the queue in FIFO order, but we can't judge that by looking at the order in which their first few instructions are executed.

So long as the OS has control over when context switches occur, this kind of test just isn't going to work. There are a couple of things we could do here. The first is to do some statistical tests to find out if the data is roughly sorted to within a given confidence, as opposed to checking for a binary pass/fail outcome. But the required confidence level may vary wildly from system to system, or even between OS kernel upgrades if tweaks are made to the thread scheduler.
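A rough version of the first idea might count adjacent inversions and pass the test when the fraction falls below some tolerance. The 5% threshold below is a made-up figure purely for illustration; as noted above, a suitable value would vary from system to system:

```cpp
#include <vector>

// Fraction of neighbouring pairs that are out of order: 0.0 for a fully
// sorted sequence, up to 1.0 for a strictly decreasing one.
double inversion_fraction(const std::vector<unsigned> &numbers)
{
    if (numbers.size() < 2) return 0.0;
    std::size_t inversions = 0;
    for (std::size_t i = 1; i != numbers.size(); ++i)
        if (numbers[i - 1] > numbers[i])
            ++inversions;
    return double(inversions) / double(numbers.size() - 1);
}

// The 5% tolerance is an arbitrary illustration, not a recommendation.
bool is_roughly_sorted(const std::vector<unsigned> &numbers)
{
    return inversion_fraction(numbers) < 0.05;
}
```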

The second idea, which I think is just crazy enough to work, is to change the underlying threading system so that we can predict (and even control) when context switches occur.

Introducing coco

In the back of my mind, I remembered that a coroutines library for Boost had been developed for Google Summer of Code. Though it isn't an official Boost library yet, it does show that coroutines are generally possible in C and C++, if you're willing to do a little platform-specific tinkering. It turned out that I couldn't use the Boost.Coroutines library as it doesn't quite do enough to allow me to treat the main thread uniformly with the others, but there was no reason not to use the same underlying system APIs.

Windows provides Fibers and a number of POSIX systems allow you to save and restore the state of a stack using the functions in <ucontext.h>.
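To give a flavour of the POSIX side, here's a minimal ping-pong between the main context and a second stack using <ucontext.h>. The getcontext()/makecontext()/swapcontext() calls are the real POSIX functions; everything else (the event log, the function names) is illustrative and unrelated to coco's actual internals:

```cpp
#include <string>
#include <ucontext.h>
#include <vector>

static ucontext_t main_ctx, coro_ctx;
static char coro_stack[64 * 1024];           // the coroutine's private stack
static std::vector<std::string> events;      // records the interleaving

// Runs on its own stack; swapcontext() saves where we are and resumes
// the other context -- exactly what a cooperative context switch does.
static void coroutine()
{
    events.push_back("coro: first slice");
    swapcontext(&coro_ctx, &main_ctx);       // "yield" back to the main context
    events.push_back("coro: second slice");
}   // falling off the end resumes uc_link (main_ctx)

static void run_demo()
{
    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof coro_stack;
    coro_ctx.uc_link = &main_ctx;            // resumed when coroutine() returns
    makecontext(&coro_ctx, coroutine, 0);

    events.push_back("main: start");
    swapcontext(&main_ctx, &coro_ctx);       // run coroutine until it yields
    events.push_back("main: between slices");
    swapcontext(&main_ctx, &coro_ctx);       // resume coroutine's second slice
    events.push_back("main: done");
}
```

Unlike with pre-emptive threads, the interleaving recorded here is entirely deterministic: every switch happens at an explicit swapcontext() call.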

What I've come up with is coco, a library which has the same interface as Boost.Threads, but implements it in terms of Fibers and the <ucontext.h> functions. It's only really a proof-of-concept at this stage as I've only done the Windows side of things so far, but all the example programs distributed with the latest Boost.Threads work when using the alternative coco headers and library. I believe that the overwhelming majority of correctly written programs using Boost.Threads should work fine with coco, too[2].

To use the library, you shouldn't need to change any existing code. All you have to do is tell the compiler to look at the coco includes directory before boost's and then link against coco's library in addition to the Boost.Threads library. The Boost.Threads library is still needed as my xtime implementation calls into the boost::xtime code.

How it works

With coco, only a single thread runs at any given time and context switches occur only in a handful of places:

  - when a thread tries to lock a mutex that is already locked
  - when a thread waits on a condition variable
  - when a thread sleeps or yields
  - when a thread joins another thread
  - when a thread's function finishes

Thus coco is really a covertly cooperative threading environment.

Given these guarantees, we have something to reason with. We can now work out when context switches will occur. In particular, looking back at the motivating example, we know that a context switch will never occur between the time a task is picked up by a thread and the time it appends its number to the array. This is because locking the mutex will never block as each task unlocks it before another has a chance to make an attempt at acquisition.

Still to do

There are lots of odds and ends remaining but one of the bigger chunks of work is of course to create the code for this to work on POSIX systems. I'd also like to firm up the guarantees that coco provides over pre-emptive threads and add some proper tests in addition to the Boost threads example programs.

Some of the more exotic features of the library go untested, such as boost::barrier and the fancier mutex classes.

Show me the code!

The preliminary code is available for download below.

Included in the zip file is the current coco library code as well as all of the boost examples and some code that illustrates the task_queue example shown in this post.

The code is a little rough around the edges, but available if you'd like to experiment with it nonetheless. I've been successfully using MinGW 3.4.5, MSVC8 and MSVC9 to compile it thus far. slamfiles are included.

Compile like so:

> slam -r variant=release toolset=mingw BOOST_ROOT=/path/to/boost/root compiler_version=34 boost_version=1_34_1

This will compile coco and all the examples. If you're using MSVC8 instead, do:

> slam -r variant=release toolset=msvc8 BOOST_ROOT=/path/to/boost/root compiler_version=80 boost_version=1_34_1

The compiler_version is actually the number that appears in the name of the Boost.Threads static library to link against. If you have a BOOST_ROOT environment variable, you shouldn't need to put it on the command line.

I compiled with MSVC9 by entering the exact same command in a Visual Studio 2008 command prompt. You might get some warnings from the boost headers about the compiler not being officially supported, but it seems to work fine regardless.

It appears that Win32 Fibers don't cope well with certain types of debugging data in the object files, so I've been building it without. I hope to narrow down the problematic compiler switches a little more precisely soon. So I recommend that you leave variant=release as-is on the slam command line.

There are also a couple of limitations that you should be aware of at this stage. Currently coco threads don't mix well with real threads, so if you do use coco, make sure it's only coco that you're using. Avoid layering coco threads on top of native Win32 threads, for example.

Lastly, the implementation of thread_specific_ptr doesn't have exactly the same clean-up ordering as the same facility in Boost.Threads; the only guarantee you get is that thread-local data held by thread_specific_ptrs is cleaned up when the thread exits, unless it's the last living thread, in which case it's cleaned up on program exit.

Enjoy! Please let me know what you think.

Downloads

Footnotes
  1. meaning that the operating system gets to decide when to give each thread some time on a processor, and not the application []
  2. I'll firm up this statement in a future release, but any program that relies on a thread winning a race in order to behave correctly will most likely not work with coco threads []

Comments

Tim Finer

[12/10/2009 at 15:44:00]

Hello Edd,

I know that this is an old post, but this is a fertile subject with a lot of value.

Two things I’ve found invaluable in this very subject that you might not have heard of.

Windows only CHESS: http://research.microsoft.com/en-us/projects/chess/

Everyone else: Valgrind using helgrind: http://valgrind.org/docs/manual/hg-manual.html

Regards,
Tim

Edd

[14/10/2009 at 00:43:00]

Hi Tim! I came across Chess not-so-long-ago but I forgot about it until now. It looks very interesting and I see why you suggested it. Indeed I thought at the time that one might be able to use coco to do a similar thing. I do plan to spend some more time on this in future, so we'll see...

helgrind is entirely new to me. I'll check it out -- thanks!
