
An asynchronous function call mechanism: the final piece

[22nd July 2007]

In this post, I'll be tying up all the loose ends in the implementation of async::call(), a series of function overloads that allow us to make function calls in a new thread and receive the result when we see fit, using futures. Here's a quick example:

std::size_t count_primes_upto(std::size_t max); // some function we'll call in another thread

using namespace async;

layered_cage<std::exception, catch_all> cage;
future<std::size_t> fut = call(cage, &count_primes_upto, 9876543210); // call function in another thread

// do other stuff in the mean time
// ...

std::size_t num_primes = fut.value(); // get the result, blocks until ready  

In the previous two posts I made on this subject, we looked at how to implement the layered_cage<> and future<> template classes respectively. It would probably be a good idea to read those entries before this one, as I'll assume you're familiar with these classes.

In this post I'll describe how the actual overloads of the async::call() function are implemented and some of the intricacies involved. I'll also present a preliminary version of the async library for you to play with.

EDIT 5th August 2007: async now has its own permanent home, here. If you're more interested in the usage of the library, as opposed to the implementation, you should go there instead.

The first overload of async::call()

The first thing that needs to be done is to implement the overload of async::call() that calls a functor with 0 arguments.

You'll recall that previously we saw that if we had a nullary functor[1], we could construct a suitable future from it.

template<typename Invoker, typename Functoid>
future<BLANK>
call(const Invoker &invoker, const Functoid &func)
{
    typedef typename BLANK param_type;

    std::auto_ptr<future_worker<BLANK> > strat(
        new call_future_worker<Invoker, BLANK>(invoker, func)
    );
    return future<BLANK>(strat);
}

So the implementation of this overload is reasonably simple, once we're able to fill in the BLANKs. Also, by using boost::bind(), we can generate the code for the remaining overloads (handling functors that take 1 or more arguments) by binding those arguments to create a nullary functor that we may pass to this initial overload of async::call(). Roughly:

template<typename Invoker, typename Functoid, typename ArgType0>
future<BLANK>
call(const Invoker &invoker, const Functoid &func, const ArgType0 &a0)
{
    return call(invoker, boost::bind(func, a0)); // call the overload that takes a nullary functor
}

// and others for 2 and more arguments...

Evidently, we need to find the return type of func and modify it suitably using a little meta-programming to fill in the BLANKs.

The return type of a functor

Using boost::function_traits<> we are able to get the return type of pretty much any function type, e.g.

typedef function_traits<double (void)>::result_type type1; // type1 is double
typedef function_traits<const char * (int, const std::string &)>::result_type type2; // type2 is const char *
// etc ...

However, for an arbitrary functor, such as an object of a class with an operator() (a.k.a. call operator), it is currently impossible to obtain the return type without something like a typeof extension[2].

That said, good practice dictates that we should add a result_type typedef to all our custom functors. This is what various elements of the standard library do, including std::unary_function, std::binary_function, std::mem_fun and so on. boost::function<>s and the functors returned by boost::bind() also have this inner typedef.
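
For example, a hand-rolled functor that follows this convention might look something like the following (a hypothetical illustration, not part of the library):

#include <cstddef>
#include <string>

// A hypothetical functor exposing a result_type typedef, so that generic code
// can discover what its operator() returns.
struct string_length
{
    typedef std::size_t result_type;

    std::size_t operator()(const std::string &s) const { return s.size(); }
};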

So we can support a large class of incredibly useful functors by relying on the result_type if we aren't dealing with a native function. Therefore the plan is to construct a meta-function that will yield boost::function_traits<T>::result_type if T is a function type, else T::result_type. We can achieve this with partial template specialisation:

template<bool IsFunction, typename Functoid>
struct functor_return_type
{
    typedef typename Functoid::result_type type;
};

template<typename Functoid>
struct functor_return_type<true, Functoid>
{
    typedef typename boost::function_traits<Functoid>::result_type type;
};

// now we can get the return type of a functor as follows:
typedef functor_return_type<boost::is_function<my_functor_type>::value, my_functor_type>::type result_type;

Removing return type decorations

Now that we know what the return type of the functor is, we're closer to filling in the BLANKs. But we're still not quite there yet.

To get the future's template parameter we also need to strip off any reference and const/volatile qualification that exists. Why is this?

Recall that a future<T>'s value() member function returns a const T&. We can't currently have references to references in C++[3], hence we need to strip the reference part off the functor's return type. Secondly, it's too dangerous to allow references to be returned from a child thread as those references might be dangling by the time the future's value() function is called. Hopefully you'll also remember that our future<T> implementation will be storing a copy of whatever the functor returns before letting the parent thread have it. Removing const/volatile qualification makes this a little easier to do.

There may be circumstances where the function really does need to return a reference. In this case we'll simply have to ask that the user provide a wrapper function that returns a boost::reference_wrapper<> and pass that in to async::call(). I'll talk more about boost::reference_wrapper<> a little later on in this post.
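
For example, something along the following lines would do the job (a hypothetical wrapper; widget and find_widget stand in for the user's own code):

#include <boost/ref.hpp>

struct widget { }; // stand-in for the user's own type

widget &find_widget(int id); // hypothetical user function that returns a reference

// Hypothetical wrapper: it returns a copyable boost::reference_wrapper<> rather than
// a raw reference, so the future has something it can safely copy and hand back.
boost::reference_wrapper<widget> find_widget_ref(int id)
{
    return boost::ref(find_widget(id));
}

// the corresponding future is then instantiated with boost::reference_wrapper<widget>:
// future<boost::reference_wrapper<widget> > fut = async::call(cage, &find_widget_ref, 42);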

So we can implement a couple more meta-functions to remove all decoration from the return type of a functor:

// Given a functor type Functoid, future_parameter<Functoid>::type yields the
// appropriate type to instantiate a corresponding future<>
template<typename Functoid>
struct future_parameter
{
    typedef typename boost::remove_pointer<Functoid>::type wo_ptr; // turn a function-pointer in to a function type
    static const bool is_func = boost::is_function<wo_ptr>::value;

    // strip the reference first, then any cv-qualification, so that e.g. const T & becomes plain T
    typedef typename boost::remove_cv
    <
        typename boost::remove_reference
        <
            typename functor_return_type<is_func, wo_ptr>::type
        >
        ::type
    >
    ::type type;
};

// This helper meta-function yields future<T> for some type T that is appropriate to
// handle object returned by a call to an object of type Functoid.
template<typename Functoid>
struct future_type
{
    typedef future<typename future_parameter<Functoid>::type> type;
};
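
As a quick compile-time sanity check (purely illustrative, not part of the library), we can confirm that these meta-functions yield what we'd expect:

#include <string>
#include <boost/static_assert.hpp>
#include <boost/type_traits/is_same.hpp>

// A hypothetical functor following the result_type convention.
struct name_getter
{
    typedef std::string result_type;
    std::string operator()() const;
};

BOOST_STATIC_ASSERT((boost::is_same<future_parameter<double (*)(int)>::type, double>::value));
BOOST_STATIC_ASSERT((boost::is_same<future_parameter<name_getter>::type, std::string>::value));
BOOST_STATIC_ASSERT((boost::is_same<future_type<name_getter>::type, future<std::string> >::value));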

Filling in the blanks

So now we're in a good position to fill in the BLANKs.

template<typename Invoker, typename Functoid>
typename future_type<Functoid>::type
call(const Invoker &invoker, const Functoid &func)
{
    typedef typename future_parameter<Functoid>::type param_type;
    typedef typename future_type<Functoid>::type return_type;

    std::auto_ptr<future_worker<param_type> > strat(
        new call_future_worker<Invoker, param_type>(invoker, func)
    );
    return return_type(strat);
}

And with that we have the most important overload of async::call() implemented. The others are built on top of this. We can now call nullary functors asynchronously. In fact, if the user is familiar with boost::bind() they would be able to use this initial overload of async::call() to effectively handle function calls with near enough any number of arguments. For example:

future<X> fut = async::call(cage, boost::bind(&some_function, arg1, arg2, arg3));

But there are two things I don't like about forcing the user to use boost::bind() in this way:

  1. It makes the library harder to use; the amount of knowledge required to use the library is more than it needs to be if we encourage the use of boost::bind. This library should be usable by as many people as possible.
  2. A naive application of boost::bind() such as the one above produces a surprising amount of copying overhead, as we'll see in a moment. By doing the bind ourselves inside each overload of async::call(), we can reduce the copying overhead significantly.

The remaining overloads; efficient argument binding

Here I'll discuss the async::call() overload that takes two arguments to pass on to the functor at the call site. In the real code I've used the boost preprocessor library to generate a series of overloads for various numbers of arguments. I'll present this once we have examined something more concrete.

When I began implementing these overloads, I started out with code that looked something like this:

template<typename Invoker, typename Functoid, typename ArgType1, typename ArgType2>
typename detail::future_type<Functoid>::type
call(const Invoker &invoker, const Functoid &func, const ArgType1 &a1, const ArgType2 &a2)
{
    // call the overload that takes a nullary functor
    return call(invoker, boost::bind(func, a1, a2));
}

This indeed works perfectly correctly, but to my horror I discovered that each argument (a1 and a2 in this case) was being copied about 30 times during the binding process! This isn't so bad for built-in primitive types such as int, but for big containers of data such as std::vector<std::string> this is a big source of inefficiency.
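
If you want to see this for yourself, a small counting type makes the copies visible. This is a rough sketch of how one might count them, not exact measurement code; the number you see depends on the boost version and on how many times the bound functor object is itself copied on its way to the child thread:

#include <iostream>
#include <boost/bind.hpp>

// A small type that counts how many times it gets copied.
struct copy_counter
{
    static int copies;
    copy_counter() { }
    copy_counter(const copy_counter &) { ++copies; }
};
int copy_counter::copies = 0;

void take(const copy_counter &) { }

int main()
{
    copy_counter c;

    // boost::bind() stores its own copy of 'c'; every time the resulting functor
    // object is itself copied (as happens repeatedly inside async::call()),
    // that stored copy gets copied again.
    boost::bind(&take, c)();

    std::cout << copy_counter::copies << " copies so far\n";
    return 0;
}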

Now I think that by default, a copy of each argument should be made. The default behaviour of the code should be the safest and it's much safer to give the child thread its own copy of each argument to work with, else the functor could be dealing with dangling references at the call site. In a moment I'll show how the library user can subvert this behaviour and make async::call() treat the functor arguments as references if that's what's really needed.

So making copies by default is good and safe, but making 30 copies is over-the-top when 1 copy would suffice. I needed a way to improve on this. After bouncing a couple of ideas off of Kirit Sælensminde in a conversation on the boost users mailing list, a solution came to me.

A boost::reference_wrapper<> object is a wrapper around a C++ reference. It provides a conversion/cast operator that yields the reference that it is wrapping. This means that the following two calls to some function func() are semantically equivalent:

void func(std::string &s);

std::string s = "hello!";
func(s);
func(boost::ref(s)); // boost::ref() creates a reference_wrapper<> object for its argument

But the nifty thing about a reference_wrapper<> is that it's copyable. This means you can use it in conjunction with facilities such as boost::bind(), which normally make copies of bound arguments, to reduce unwanted copying and make sure that the wrapped functor really receives the correct reference at the call site.

However, even though the solution was inspired by reference_wrapper<>, we're not going to use it directly. This is because we actually want a single copy, whereas boost::ref() doesn't make any copies at all.

So the reference_wrapper<>-inspired solution I came up with was to define a class that forces an initial copy of an argument and acts like a reference_wrapper<> thereafter.

template<typename ArgType>
class copy_then_ref
{
    public:
        copy_then_ref(const ArgType &arg) : arg_(new ArgType(arg)) { }
        operator ArgType &() { return *arg_; }

    private:
        boost::shared_ptr<ArgType> arg_;
};

Inside the body of an async::call() overload, we can use copy_then_ref<> as follows.

template<typename Invoker, typename Functoid, typename ArgType1, typename ArgType2>
typename detail::future_type<Functoid>::type
call(const Invoker &invoker, const Functoid &func, const ArgType1 &a1, const ArgType2 &a2)
{
    return call(
        invoker,
        boost::bind(func, copy_then_ref<ArgType1>(a1), copy_then_ref<ArgType2>(a2))
    );
}

When each copy_then_ref<> is created, a copy of its constructor's argument is held internally. When the copy_then_ref<> objects are copied as part of the binding process, they don't copy the object they wrap, as that copy is held via a reference-counting smart pointer[4]. copy_then_ref<> contains a cast/conversion operator much like boost::reference_wrapper<> so that it can be given as an argument to func at the final call site. Now copying arguments is much cheaper for large objects, as we only end up making that single copy[5].
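
To make that behaviour concrete, here's a small hypothetical usage of copy_then_ref<> on its own, outside the binding machinery:

#include <string>
#include <vector>

void demo()
{
    std::vector<std::string> big(1000, "data");

    copy_then_ref<std::vector<std::string> > w1(big); // the single copy of 'big' is made here
    copy_then_ref<std::vector<std::string> > w2 = w1; // no further copy; the shared_ptr is shared

    std::vector<std::string> &ref = w2; // the conversion operator yields a reference to that one copy
    ref.push_back("more data");         // w1 and w2 both see this change
}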

Now, as I mentioned earlier, there are cases where the library user knows best and really does need the functor called in the child thread to have one or more of its arguments reference variables in the parent thread. We'll cover this case by allowing our async::call() overloads to take reference_wrapper<>s.

// Make sure the call to my_func receives the actual references to a1 and a2
async::future<int> fut = async::call(cage, &my_func, boost::ref(a1), boost::ref(a2)); 

We'll make special provisions to allow this kind of thing in our implementation.

But haven't we just swapped one inefficiency for another?

The additional overhead incurred by the dynamic allocation and reference counting is a little over-the-top for cheap-to-copy types such as int, pointer types and indeed boost::reference_wrapper<>s, to name but a few. We need a way to bypass copy_then_ref<> for cheap-to-copy objects and just let them get copied naturally.

So, we'll implement a template function wrap() such that wrap(arg) returns arg if it is cheap to copy or an appropriate copy_then_ref<> otherwise. Our async::call() overload then becomes:

template<typename Invoker, typename Functoid, typename ArgType1, typename ArgType2>
typename detail::future_type<Functoid>::type
call(const Invoker &invoker, const Functoid &func, const ArgType1 &a1, const ArgType2 &a2)
{
    return call(invoker, boost::bind(func, wrap(a1), wrap(a2)));
}

So let's look at how to create this wrap() function. First we need a way of telling if a given type is cheap to copy. This is taken care of by this simple meta-function.

// cheap_to_copy<X>::value is true if X is a primitive type or a boost::reference_wrapper
template<typename ArgType>
struct cheap_to_copy
{
    static const bool value =
        boost::is_reference_wrapper<ArgType>::value || boost::is_scalar<ArgType>::value;
};

Note that by allowing users to define their own specialisations, they can tell async::call() if their own type is cheap to copy. For example:

template<>
struct cheap_to_copy<my_custom_type>
{
    static const bool value = true;
};

Next we need another meta-function that helps us decide what type of object wrap() should return based on its argument. We can do this by employing partial template specialisation again.

// wrapper_type<X, cheap>::type yields a type that is appropriate to return from wrap
// where 'cheap' should be given as cheap_to_copy<X>::value
template<typename ArgType, bool CheapToCopy>
struct wrapper_type
{
    typedef ArgType type;
};

template<typename ArgType>
struct wrapper_type<ArgType, false>
{
    typedef copy_then_ref<ArgType> type;
};

Now we can define wrap() as follows:

template<typename ArgType>
typename wrapper_type<ArgType, cheap_to_copy<ArgType>::value>::type
wrap(ArgType &arg)
{
    return typename wrapper_type<ArgType, cheap_to_copy<ArgType>::value>::type(arg);
}

With that in place, we've finished the implementation of our particular two-argument async::call() overload!

Generating all the overloads automatically

We need a bunch of additional overloads that take different numbers of arguments to pass on to the functor being called. Well, what's really needed is variadic template support in the language. This is likely to be introduced in the next revision of the standard, but we may not have that available (at least not portably) until sometime in 2009 or beyond.

So for the time being we'll make use of the rather natty boost preprocessor library to generate a particular number of overloads. I'm just going to come out with the code that does the job. I'm not yet well enough versed in the preprocessor library to give a good explanation of the code, but with a little research you should be able to work out how I've put it together.

#ifndef ASYNC_MAX_CALL_ARGS
    // Set this to something else if you need more arguments available for async::call()
    #define ASYNC_MAX_CALL_ARGS 5
#endif

#define ASYNC_COMMA_WRAP(z, n, a) , wrap(a ## n)

#define BOOST_PP_LOCAL_MACRO(n)                                                                             \
    template<typename Invoker, typename Functoid, BOOST_PP_ENUM_PARAMS(n, typename ArgType)>                \
    typename future_type<Functoid>::type                                                                    \
    call(const Invoker &invoker, const Functoid &func, BOOST_PP_ENUM_BINARY_PARAMS(n, const ArgType, &a))   \
    {                                                                                                       \
        return call(invoker, boost::bind(func BOOST_PP_REPEAT(n, ASYNC_COMMA_WRAP, a)));                    \
    }

#define BOOST_PP_LOCAL_LIMITS (1, ASYNC_MAX_CALL_ARGS)
#include BOOST_PP_LOCAL_ITERATE()

This generates overloads that pass on 1 to ASYNC_MAX_CALL_ARGS arguments to the functor being called asynchronously. By default ASYNC_MAX_CALL_ARGS is 5, but the user can set it to a different value on their compiler command line if they need overloads with more arguments.
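
To give a rough idea of what comes out the other end, here is (approximately) the overload the preprocessor generates for n = 2, expanded by hand:

// Approximate hand-expansion of the macro above for n == 2; for illustration only.
template<typename Invoker, typename Functoid, typename ArgType0, typename ArgType1>
typename future_type<Functoid>::type
call(const Invoker &invoker, const Functoid &func, const ArgType0 &a0, const ArgType1 &a1)
{
    return call(invoker, boost::bind(func, wrap(a0), wrap(a1)));
}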

Further work

So now the library is in pretty good shape; exception-safe, powerful, flexible, fairly efficient and easy to use (I hope!). At the end of this post there's some code that you can download and try out for yourself.

However I don't consider the library complete just yet. There are a handful of things that I'd like to investigate before making any kind of official release.

  1. Currently a new thread is created for each invocation of async::call(). I'd like to investigate a different system that makes use of a pool of reusable threads. This would avoid the overhead of creating new threads all the time, which is comparatively expensive
  2. I'd like to make some kind of future container where futures can be placed in order to allow them to continue running once the current scope is left
  3. I'm interested in finding a mechanism for safe thread termination
  4. Consider allowing the user to wrap the functor in a boost::ref() to indicate that the return value should be returned in a boost::reference_wrapper<>. The syntax is a little quirky, but it's succinct

Show me the code!

The code is available for download below.

The library is currently a header-only library — you won't need to create any binary libraries in order to use it. You will however need a boost installation and if you use async in your code, you'll need to link against the boost threads library. I've tested it against boost version 1.34 but I expect other recent versions will work, too.

There are a number of illustrative example and test programs to be found in the examples directory of the source archive. So far I've only been able to get these to compile and link with MinGW and MSVC8. Thus far, I've failed to create a universal binary build of boost to test the examples on the Mac. Borland 5.82 and Digital Mars choke on the boost code, so I couldn't test the code with those compilers, either.

To build the examples using slam, use one of the following commands to build with MSVC8 or MinGW, respectively:

> slam toolset=msvc8 compiler_version=80 -r variant=debug
> slam toolset=mingw compiler_version=34 -r variant=debug

Here, the compiler_version is set to the number that appears in the name of the boost library file you would like to link against. You may also want to replace variant=debug with variant=release to create optimised versions of the examples.

Downloads

Footnotes
  1. a functor taking 0 arguments
  2. roll on C++09!
  3. though I understand a change has been voted into the standard that does the obvious thing
  4. In the implementation shown on this page I use boost::shared_ptr<>, but I rolled my own lightweight reference-counting mechanism in the actual code
  5. I experimented with a few alternative techniques to avoid the reference-counting overhead entirely, but I couldn't make them exception safe
