
thread_local: a read/write lock example

[4th May 2010]

On discovering that boost::thread_specific_ptr works in what I would consider a dangerous fashion, I wrote my own thread-local storage facility. It's the first time I've had the chance to use the newer mutex types introduced in recent releases of boost.

I think the code makes quite a nice self-contained example of how to create a so-called read/write lock in terms of the primitives provided by the boost thread library, so I decided to post it here.

Update: the use of the upgradable locking mechanism in this article is incorrect. In fact, I believe it's safe, but only because it's a glorified unique_lock in disguise. The 'proper' way to do the go-from-shared-lock-to-unique-lock dance, as far as I can tell, is simply to release your shared_lock before acquiring a unique_lock. I don't believe there's a nice way of doing this without forfeiting your 'turn', at least not for the code in this article. Here's a discussion I started on the boost mailing list to get a better handle on the situation.
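For what it's worth, a minimal sketch of that release-then-reacquire approach, applied to the get_impl() function shown later in this article (and assuming the same mtx_, data_, map_type and prototype_ members), might look like this. Treat it as an illustration rather than tested code:

T &get_impl()
{
    const boost::thread::id current = boost::this_thread::get_id();

    {
        boost::shared_lock<boost::shared_mutex> readlock(mtx_);

        typename map_type::iterator it = data_.find(current);
        if (it != data_.end())
            return *(it->second);
    } // the shared lock is released here; our 'turn' is forfeit

    std::auto_ptr<T> datum(new T(prototype_));

    boost::unique_lock<boost::shared_mutex> writelock(mtx_);

    // In general, the condition must be re-checked after reacquiring the lock,
    // because the world may have changed while no lock was held. Here only the
    // calling thread ever inserts an entry for its own id, so the check can
    // never fire, but it is the shape of the general pattern.
    typename map_type::iterator it = data_.find(current);
    if (it != data_.end())
        return *(it->second);

    data_[current] = datum.get();
    return *datum.release();
}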

The differences

The first and biggest difference from boost::thread_specific_ptr is that the thread_local class presented here can be used outside of static storage. The lifetime of the data stored by a thread_local for all threads is bounded by the lifetime of the thread_local object itself. The lifetime semantics for objects held by a boost::thread_specific_ptr are somewhat more subtle. I've never really been a fan of them, truth be told.
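For example, something along these lines is fine with this thread_local class (the worker function and the use of boost::thread_group here are made up purely for illustration):

void worker(thread_local<int> &tl)
{
    ++tl.get(); // each thread increments its own copy of the prototype
}

void run()
{
    thread_local<int> tl(42); // automatic storage, not static

    boost::thread_group group;
    for (int i = 0; i != 4; ++i)
        group.create_thread(boost::bind(&worker, boost::ref(tl)));
    group.join_all();

} // every thread's int is destroyed here, along with tl itself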

The second difference is that each thread that accesses a thread_local object initially sees a copy of a prototype object passed to the constructor:

thread_local<int> tli(5);

void print_tli()
{
    std::cout << tli << '\n';
}

int main()
{
    print_tli(); // prints 5
    ++tli;

    boost::thread t(&print_tli); // thread's call to print_tli will also print 5
    t.join();

    return 0;
}

The next difference is that you're able to access all the data items associated with a given thread_local object. This is particularly useful if you're implementing some kind of parallel algorithm where each thread computes part of the output and their data are combined at the end:

thread_local<int> sum;

// Some parallel algorithm where each thread
// computes a partial sum, perhaps.
// ...

thread_local<int>::data_range range(sum); // effectively locks sum
int total_sum = std::accumulate(range.begin(), range.end(), 0);

Lastly, the code also provides a thread_local_reference_wrapper class, which acts much like boost::reference_wrapper, except that the references that are obtained via the wrapper are to thread-local objects. Again, this is useful for writing parallel algorithms where you want to pass a reference to some thread-local data:

void kernel(int &x);

thread_local<int> tlint;

// ref() creates a thread_local_reference_wrapper
parallel_for_each(b, e, boost::bind(&kernel, ref(tlint)));

There are probably some performance differences too, but I didn't do any benchmarking. I personally consider the current boost::thread_specific_ptr implementation to be a broken design rendering performance comparisons pointless.

Read/write locks

So I mentioned at the start that thread_local uses a read/write lock. What is a read/write lock? It's something of a misnomer, actually, and should probably be called a read/write mutex.

Let's say that we have an object that we're going to be reading from frequently but modifying infrequently. Reads and writes can occur in any number of threads. Using a simple mutual exclusion mechanism, such as an exclusive lock on a regular mutex, would cause a lot of unnecessary contention; most threads spend most of their time reading the object, and since concurrent reads are safe, the synchronization provided by the lock isn't required under those conditions. However, some kind of synchronization really is necessary when one of those write operations comes along.

A mutex with read/write locking capabilities provides us with two types of lock which enable a less conservative but equally safe synchronization mechanism. The first is a read lock. If two or more threads hold a read lock, they are all granted access to the shared object (though they must only read from the object — this must be enforced by you, in your code). However, if a thread attempts to take a write lock, it will not be acquired until all read locks are released. Once a write lock is obtained, the acquiring thread has exclusive access to the shared object(s). All attempts to acquire either a read lock or a write lock will block until the existing write lock is released.
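In boost terms that pattern looks roughly like the following sketch (the shared value and the two free functions are invented purely for illustration):

#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

boost::shared_mutex mtx;
int shared_value = 0;

int read_shared_value()
{
    // Any number of threads may hold this lock concurrently
    boost::shared_lock<boost::shared_mutex> readlock(mtx);
    return shared_value;
}

void write_shared_value(int v)
{
    // Blocks until all read locks are released, then excludes everyone else
    boost::unique_lock<boost::shared_mutex> writelock(mtx);
    shared_value = v;
}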

The use of read/write locks in thread_local

The most important implementation detail of the thread_local class is that the data for each thread is stored inside a map:

template<typename T>
class thread_local : public boost::noncopyable
{
    public:
        thread_local(const T &prototype = T());

        // Accessors
        T &get();
        const T &get() const;
        operator T &();
        operator const T &() const;

        // ...

    private:
        const T prototype_;
        mutable std::map<boost::thread::id, T*> data_; 
        // ...
};

Ignoring synchronization for the moment, the basic idea is as follows: when a thread first attempts to obtain a reference to its datum, it looks to see if there's an entry in the data_ object corresponding to the boost::thread::id of the calling thread. If not, a copy of the prototype_ object is added to data_ and a reference to it is returned. On the other hand if there's already an entry in data_ for the calling thread, a reference to the existing data is returned.

Note that each thread that has access to the thread_local object only needs to insert a key/value pair once. So writes are infrequent in the general case. Once a thread has accessed its datum for the first time, it no longer needs to modify the map. It may of course modify the corresponding T object, but the map is entirely oblivious to this. As far as the map is concerned, any access made by a thread beyond the first will be for the purposes of reading the appropriate T* value in the map. So we have a situation that suits a read/write lock rather well.

The implementation

The accessor functions are all implemented in the same way, as a call to get_impl(), shown here:

template<typename T>
class thread_local : public boost::noncopyable
{
    private:
        typedef std::map<boost::thread::id, T*> map_type;

        T &get_impl()
        {
            const boost::thread::id current = boost::this_thread::get_id();

            // Take a 'read lock'. Once all threads have accessed their local data at least once,
            // there will be no further contention.
            boost::upgrade_lock<boost::shared_mutex> readlock(mtx_);

            // Is there already an entry in data_ corresponding to this thread?
            for (typename map_type::iterator b = data_.begin(), e = data_.end(); b != e; ++b)
                if (b->first == current)
                    return *(b->second);

            // There's nothing in data_ for this thread, yet, so we create it
            std::auto_ptr<T> datum(new T(prototype_));

            // To insert the datum, we need to have a 'write lock'. 
            // We can upgrade our existing read lock for this purpose.
            boost::upgrade_to_unique_lock<boost::shared_mutex> writelock(readlock);
            data_[current] = datum.get();

            // The datum has been added without exception, it is now owned by this
            // thread_local object (rather than the auto_ptr)
            return *datum.release();
        }

    private:
        const T prototype_;
        mutable boost::shared_mutex mtx_;
        mutable map_type data_;
};

The comments explain most of what's going on in there. The main thing to note is that boost::shared_mutex is a type of mutex that has distinct read- and write-locking mechanisms, though boost calls these shared locking and unique locking respectively. A read lock is acquired by constructing a boost::upgrade_lock, passing the shared_mutex to the constructor. We could also use a boost::shared_lock in order to acquire a read lock, but an upgrade_lock allows us to upgrade our read lock to a write/unique lock if needed. We'll see this come into play soon.

While the upgrade_lock is held, we look to see if there's an entry in data_ that corresponds to the current thread. If there isn't, we need to create the datum and add it to data_. This addition requires the protection afforded by a unique/write lock.

So a boost::upgrade_to_unique_lock object called writelock is constructed from our existing readlock object. While writelock exists, we have unique access to the invariants protected by the shared_mutex. So we go ahead and add a copy of prototype_ to data_ for the current thread and return a reference to it.

The auto_ptr juggling is for exception safety. Note that by creating the copy of prototype_ before acquisition of the write lock, we minimize the time for which exclusive access is required.

thread_local<>::data_range

To iterate over the data stored by a thread_local for each thread that has accessed it, a thread_local<>::data_range object can be used. The implementation is simple. It merely takes a write lock on construction and releases it on destruction.

class data_range : public boost::noncopyable
{
    public:
        data_range(const thread_local &tl) : tl_(tl), lock_(tl.mtx_) { }

        const_iterator begin() const; 
        const_iterator end() const;

    private:
        const thread_local &tl_;
        boost::unique_lock<boost::shared_mutex> lock_;
};

It is expected that the data only be examined infrequently using this mechanism, usually once all threads have stopped accessing the thread_local object. For this reason, a unique_lock shouldn't introduce any undue contention. Note that an upgrade_lock or shared_lock would not be sufficient here. We don't want any threads modifying their T objects during iteration over the data_.

The implementations of data_range::begin() and data_range::end() construct custom iterator objects that expose references to the underlying T objects. I'll leave you to peruse the source code if you'd like to find out more about those.
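But to give a rough idea, a const_iterator that wraps the map's own iterator and dereferences to the stored T might look something like this sketch (the real code may well differ):

class const_iterator
{
    public:
        typedef std::forward_iterator_tag iterator_category;
        typedef T value_type;
        typedef std::ptrdiff_t difference_type;
        typedef const T *pointer;
        typedef const T &reference;

        explicit const_iterator(typename map_type::const_iterator it) : it_(it) { }

        // Expose the T object rather than the (thread::id, T*) pair
        const T &operator*() const { return *(it_->second); }
        const T *operator->() const { return it_->second; }

        const_iterator &operator++() { ++it_; return *this; }
        const_iterator operator++(int) { const_iterator tmp(*this); ++it_; return tmp; }

        bool operator==(const const_iterator &rhs) const { return it_ == rhs.it_; }
        bool operator!=(const const_iterator &rhs) const { return it_ != rhs.it_; }

    private:
        typename map_type::const_iterator it_;
};

data_range::begin() and data_range::end() would then simply return const_iterator(tl_.data_.begin()) and const_iterator(tl_.data_.end()) respectively.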

The code

Please let me know if you spot any errors. I must admit that I've only given it the lightest of testing (against boost 1.42). The full source code is available below.

Downloads

Comments

Tom

[22/10/2010 at 06:29:48]

Hi Edd,

Do You know what difference/advantages/disadvantages are between

mutable boost::shared_mutex mtx_;
boost::upgrade_lock<boost::shared_mutex> readlock(mtx_);
boost::upgrade_to_unique_lock<boost::shared_mutex> writelock(readlock);

and

boost::shared_mutex mutex_;
boost::unique_lock< boost::shared_mutex > writer_lock(mutex_);
boost::shared_lock< boost::shared_mutex > reader_lock(mutex_);

Edd

[23/10/2010 at 13:11:38]

Hi Tom,

The main difference is that the second set of code will effectively deadlock (assuming it's run on a single thread inside the same scope). There's already a unique_lock acquired on mutex_ when the shared_lock is created, so the shared_lock will block indefinitely waiting for the unique_lock to be released.

The whole point in an upgrade_lock is that it may be upgraded to a lock that has sole access to the associated data/invariants without having to release the upgrade_lock first.

The mutable qualification is there to allow a lock to be taken out on mutex_ in the const member functions of thread_local.

Note that in the code for thread_local::get_impl(), no modification is done on any of the member variables while the upgrade_lock is held on the shared_mutex. This allows multiple reader threads to read the data simultaneously. But as soon as we've found that we need to modify thread_local's members, we have to ensure that the current thread has exclusive access. Hence the use of upgrade_to_unique_lock.

Does that answer your question?

Tom

[25/10/2010 at 07:40:04]

Hi Edd,

>The main difference is that the second set of code will effectively deadlock

I know ;-)

>(assuming it's run on a single thread inside the same scope). There's already
> a unique_lock acquired on mutex_ when the shared_lock is created, so the
>shared_lock will block indefinitely waiting for the unique_lock to be released.

I didn't want you to compare exactly those three lines. I asked about a comparison between the two corresponding pairs: boost::upgrade_lock vs. boost::shared_lock, and boost::upgrade_to_unique_lock vs. boost::unique_lock.
I will be more precise next time.

>This allows multiple reader threads to read the data simultaneously.
>But as soon as we've found that we need to modify thread_local's members,
> we have to ensure that the current thread has exclusive access. Hence
>the use of upgrade_to_unique_lock.

So, my question again: why not use boost::shared_lock and boost::unique_lock?
It works fine for a multiple-readers, one-writer scheme (typical usage below).

class ldBaseEvent {
public:
  typedef boost::unique_lock< boost::shared_mutex > writer_lock;
  typedef boost::shared_lock< boost::shared_mutex > reader_lock; 

  ldBaseEvent (int tp, bool periodic) {
    eventStatus_ = EVENT_STATUS_NONE;
    timePeriod_ = tp;
    periodic_ = periodic;
  };

  virtual ~ldBaseEvent() {
    cancel();
  }

  virtual void doWork() = 0;

  void setStatus(EventStatus status) {
    writer_lock lock(mutex_);
    eventStatus_ = status;
  }

  EventStatus getStatus(){
    reader_lock lock(mutex_);
    return eventStatus_;
  }

  void cancel()
  {
    writer_lock lock(mutex_);
    eventStatus_ = EVENT_CANCELLED;
  }

  bool IsPeriodic()
  {
    return periodic_;
  }

  int getTimePeriod()
  {
    return timePeriod_;
  }

private:
  boost::shared_mutex mutex_;
  EventStatus	eventStatus_;
  int timePeriod_;
  bool periodic_;
};

Edd

[25/10/2010 at 18:04:49]

Hi Tom,

I don't see anywhere in your code where you convert from a reader lock to a writer lock while the reader lock is still held. That's what upgrade_to_unique_lock is all about -- we don't have to forfeit our "turn" in order to change from being a reader to being a writer. In your code, it doesn't look like there's a need for this kind of thing, though.

But I still have the unsettling feeling that I'm missing your point... let me know :)

Tom

[29/10/2010 at 07:56:53]

aloha,

>convert from a reader lock to a writer lock while the reader lock is still held

Here is a point - at last I have understood this...

Thanks Edd for spending time on explanation

Lindley

[03/02/2011 at 17:12:53]

I believe you may have missed a subtlety of upgrade_lock----if you never use a shared_lock on a particular shared_mutex, then upgrade_lock isn't actually getting you concurrent (reader) access to the mutex!

To quote the Boost documentation for the UpgradeLockable concept:
"a single thread may have upgradable ownership at the same time as others have shared ownership."

The problem is, of course, deadlocks. Trying to upgrade two readers into writers at the same time would produce a deadlock, which is why upgrade_mutex exists in the first place. While one thread holds an upgrade_lock, other threads may obtain shared_locks, but *not* additional upgrade_locks!

The best practice is then to use a shared_lock for the majority of your reader code. Use an upgrade_lock in a separate code path that won't be hit until you know that a write lock will soon be necessary in this thread, but while there is still work to be done which doesn't quite require the write lock yet.

Edd

[03/02/2011 at 21:15:58]

Hi Lindley. I'm absolutely willing to accept I've got something wrong here, but I'm a little confused about the points you're making. Could you clarify?

"The problem is, of course, deadlocks. Trying to upgrade two readers into writers at the same time would produce a deadlock".

Could you explain why? I'm assuming by "at the same time", you mean concurrently in different threads and not in the same scope of a single thread. I'm aware that this second meaning would produce deadlock, but I don't do this in the code.

I'd assume if thread 1 became a writer, thread 2 would block if it also tried to become a writer, but only until thread 1 was done with whatever it was writing.

So perhaps you mean "block" rather than "deadlock"?

"... which is why upgrade_mutex exists in the first place. While one thread holds an upgrade_lock, other threads may obtain shared_locks, but *not* additional upgrade_locks!"

I don't see an upgrade_mutex? Possibly just a typo but want to make sure (all the kinds of locks and mutexes make my head spin sometimes).

But assuming you did indeed mean "block" rather than "deadlock" earlier, I think I'm beginning to see your point now.

"The best practice is then to use a shared_lock for the majority of your reader code. Use an upgrade_lock in a separate code path that won't be hit until you know that a write lock will soon be necessary in this thread, but while there is still work to be done which doesn't quite require the write lock yet."

So let's see if I understand this; you're suggesting:

shared_lock readlock(m);
// do reads

// Ah, I need to write!
upgrade_lock intermediate(m);
upgrade_to_unique_lock writelock(intermediate);
// do writes

So I'm left wondering what's the point of there being an upgrade_lock class?

For this particular piece of code, it would make more sense if I could just create an upgrade_to_unique_lock from a shared_lock or shared_mutex. The only reason I'm creating an upgrade_lock is to act as a stepping stone to getting an upgrade_to_unique_lock. Is there some other use of upgrade_lock that I'm missing?

Lindley

[04/02/2011 at 07:09:12]

I was just working through these details myself. This page helped a lot:
http://home.roadrunner.com/~hinnant/mutexes/locking.html

Consider two threads which both own read locks. Both threads desire to upgrade to exclusive (write) locking. Both detect that another thread holds a read lock, and block until it releases----deadlock.

The way around this is to introduce the upgrade_lock. (I did indeed make a typo in calling it an upgrade_mutex before, although, as it happens, the page I linked to above discusses the possibility of making this a separate type from shared_mutex.) An upgrade_lock is a special type of read lock which is capable of upgrading to exclusive ownership. In order to avoid the deadlock situation, only one thread may own an upgrade_lock on a shared_mutex at once! However, other threads may own shared_locks at the same time. Therefore, when the upgrade_lock is upgraded to an exclusive lock, it is guaranteed that no other threads will be trying to do the same thing on the same lock. It may have to block for a while but no deadlock is possible.

An alternative approach would be to use try-to-upgrade semantics which may fail if a potential deadlock would be created. Sometimes this is more appropriate, sometimes not.

As an example of when an upgrade_lock makes sense, consider a folder full of images that we need to access by name. Reading from the disk is slow, so we'd like to cache them in memory; but memory is limited, so we assume that sometimes we'll need to do a read. The pseudocode might look like:

static map<string, image> cache; // somewhere global

shared_lock readlock(mutex);
// check to see if the requested image is in memory
auto iter = cache.find(filename);
if (iter != cache.end())
    return iter->second;
// return it immediately if so, releasing the read lock

// We only reach this point in the code if it's not in memory!
// We don't need to block other queries just yet though, because reading from
// disk is slow. However, we do want to prevent multiple threads from trying to
// read images at once; that would just slow things down!
upgrade_lock upgradelock(std::move(readlock));
// read image from disk into a local object
image tmp(filename);
// Now we're ready to update our data structures, obtain a write lock
upgrade_to_unique_lock writelock(upgradelock);
// move image data into the searchable cache data structure
iter = cache.insert(make_pair(filename, std::move(tmp))).first;
// possibly remove old cache entries

return iter->second;

Note: As of Boost 1.44, there are problems using upgrade_to_unique_lock in combination with C++0x. I believe 1.44 fixes the issue on gcc 4.5.0, but it's still present with VS2010.

Edd

[05/02/2011 at 15:38:17]

"Consider two threads which both own read locks. Both threads desire to upgrade to exclusive (write) locking. Both detect that another thread holds a read lock, and block until it releases----deadlock."

And from the article you linked to:

"When that thread upgrades, all other threads with shared ownership must have given up their shared ownership. Only then can a thread convert from shared ownership to exclusive ownership. If two threads are waiting to upgrade, then there is a deadlock because they are each waiting on the other to relinquish their shared ownership before exclusive ownership is possible."

Lindley, it has now 'clicked'. Thanks for your patient response!

I was somehow working under the illusion that the upgrade mechanism was just a simple lock-hierarchy (read-lock followed by write-lock) without any intermediate unlocking. But of course that wouldn't solve anything as it would allow concurrent reads and writes under some circumstances.

Your folder-of-images example also clearly shows the need for the upgrade_lock, so thanks very much for that too!

I'll do some more reading to make sure I've got things straight and then update the code in this post.

Edited to add:

Hmm, I'm still not getting this. The following program locks up (boost 1.39, Visual C++ 2008):

#include <iostream>
#include <iomanip>
#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

int main()
{
    using namespace boost;

    shared_mutex m;
    shared_lock<shared_mutex> readlock(m);
    upgrade_lock<shared_mutex> uglock(m);
    std::cout << "here" << std::endl;
    upgrade_to_unique_lock<shared_mutex> writelock(uglock); // <-- deadlock?

    return 0;
}

Alexander

[18/02/2011 at 07:48:19]

To me it looks like in your last code snippet, upgrade_to_unique_lock is waiting for the shared_lock/readlock to be released. It somehow makes sense: while there are still readers, there should not be any writer.

Edd

[20/02/2011 at 18:03:45]

I added a paragraph near the start of this article explaining that my proposed use of upgrade_to_unique_lock is bunk. Also see the link to the thread on the boost developers mailing list.

Howard Hinnant

[10/07/2011 at 01:42:50]

Untested code using the upgrade_mutex from http://home.roadrunner.com/~hinnant/mutexes/locking.html shown below (hope I got the markup right).

template<typename T>
class thread_local : public boost::noncopyable
{
    private:
        typedef std::map<boost::thread::id, T*> map_type;

        T &get_impl()
        {
            const boost::thread::id current = boost::this_thread::get_id();

            // Take a 'read lock'. Once all threads have accessed their local data at least once,
            // there will be no further contention.
            ting::shared_lock<ting::upgrade_mutex> readlock(mtx_);

            while (true)
            {
                // Is there already an entry in data_ corresponding to this thread?
                typename map_type::iterator b = data_.find(current);
                if (b == data_.end())
                {
                    // No, there is not.
                    // Try to obtain upgrade ownership
                    ting::upgrade_lock<ting::upgrade_mutex> upgradelock(std::move(readlock));
                    if (upgradelock.owns_lock())
                    {
                        // This is the one thread that was able to get upgrade ownership.
                        // Now wait until all other threads release shared ownership:
                        std::unique_lock<ting::upgrade_mutex> exclusivelock(std::move(upgradelock));
                        // This thread now has exclusive ownership.
                        // Create entry and then notify all sleeping threads that the entry exists.
                        std::auto_ptr<T> datum(new T(prototype_));
                        b = data_.insert(std::make_pair(current, datum.get())).first;
                        datum.release();
                        cv_.notify_all();
                        break;
                    }
                    else
                    {
                        // Some other thread is trying to create an entry for current.
                        //   So sleep until that work is done.
                        cv_.wait(readlock);
                        //   And then try again
                        continue;
                    }
                }
                // Yes, there is an entry
                break;
            }
            // There is now an entry in data_ corresponding to this thread
            return *(b->second);
        }

    private:
        const T prototype_;
        mutable ting::upgrade_mutex mtx_;
        mutable std::condition_variable_any cv_;
        mutable map_type data_;
};

anonymous

[10/07/2011 at 01:46:36]

Think-o in the try-conversion from shared ownership to upgrade ownership, should be:

                    ting::upgrade_lock<ting::upgrade_mutex> upgradelock(std::move(readlock), std::try_to_lock);

(forgot the try_to_lock)

Edd

[14/07/2011 at 21:35:47]

Thanks, Howard! I think that just about wraps it up :)
