mr-edd.co.uk :: horsing around with the C++ programming language

So pointers to members aren't useless after all?

[12th January 2009]

I remember when I was learning C++ and first came across pointers to members, sitting there in puzzlement and wondering what on earth they could be used for. To this day I still don't really know why they're part of the language. Pointers to member functions are handy sometimes, but pointers to data members?

However, there are a couple of uses that I know of. They're both tricks, C++ sleight of hand if you will, that allow you to add a little extra syntactic flair to your code. I suspect that neither was a motivation for including pointers to members in the language, though.

The first one isn't particularly new, but nor is it particularly widely known, despite the fact that it's a pretty sweet hack. The second is something I actually rustled up today. It was one of those times where you find yourself thinking "man, I wish I could write this yucky expression like this instead" and then realize that you actually can.

1. Homogeneous tuple-like objects with indexed access

What the heck does that mean, then?

Let's say you're creating a little colour class[1]. Nothing fancy, just basic storage and access of RGBA components, say.

Something nice and simple like this would often do the trick:

// colour.hpp v1

class colour
{
    public:
        typedef unsigned char component_t;

        // constructor, fills in the members in the obvious way
        colour(component_t r, component_t g, component_t b, component_t a);

        component_t r, g, b, a;
};

But every now and again, you'll find a situation where you want to access the components by an index, rather than by using member access syntax. In other words you'd like to be able to write something like this:

double euclidean_distance(const colour &c1, const colour &c2)
{
    double dist_sq = 0.0;
    for (unsigned i = 0; i != 4; ++i)
    {
        const double delta = c1[i] - c2[i];
        dist_sq += delta * delta;
    }

    return std::sqrt(dist_sq);
}

You can't do this, though, because colour doesn't have an operator[]. We could try adding one, perhaps. But how? Our first thought might be to do something like:

colour::component_t operator[] (unsigned i)
{
    BOOST_STATIC_ASSERT(sizeof(*this) == 4 * sizeof(component_t));
    return (&r)[i];
}

It's a bit ugly and the static assertion may of course fail on some systems. We could also turn the problem on its head and store the components in a member array, rather than 4 individual members. But then we'd have to add r(), g(), b() and a() member functions, incurring a bracket tax to access the members by name. It's not a big deal, you might say. I could live with that too, but there's actually no need. We can have our cake and eat it:

// colour.hpp v2

class colour
{
    public:
        typedef unsigned char component_t;

        // constructor, fills in the members in the obvious way
        colour(component_t r, component_t g, component_t b, component_t a);

        component_t r, g, b, a;

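        // Array-like indexed access
        component_t &operator[] (unsigned i);

        // const overload, so that const colours (as in euclidean_distance) can be indexed too
        component_t operator[] (unsigned i) const;

    private:
        typedef component_t colour::* component_ptr; // pointer to member type
        static const component_ptr rgba[4];
};

// colour.cpp

const colour::component_ptr colour::rgba[4] =
{
    &colour::r, &colour::g, &colour::b, &colour::a
};

colour::component_t &colour::operator[] (unsigned i)
{
    return this->*(rgba[i]);
}

colour::component_t colour::operator[] (unsigned i) const
{
    return this->*(rgba[i]);
}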

If you tend to get a little glazed at the sight of pointer to member syntax, as I do, I'll explain what's going on here. component_ptr is simply a typedef for pointer-to-colour-member-of-type-component_t. A static array of component_ptrs is defined. Because it's static, it takes up no space at all in each colour object. The members of the array are initialized to point to the component_t members of the class.

Now everything's set so that we can get a pointer to member based on an index and dereference it against the this pointer in the operator[].
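To see the whole thing in one place, here's a minimal single-file sketch of the idea. The inline constructor, the bounds-checking asserts and the const overload of operator[] are my own additions for the sake of a self-contained example (the const overload is what lets euclidean_distance take its arguments by const reference); treat it as an illustration rather than production code.

```cpp
#include <cassert>
#include <cmath>

class colour
{
    public:
        typedef unsigned char component_t;

        colour(component_t r_, component_t g_, component_t b_, component_t a_) :
            r(r_), g(g_), b(b_), a(a_)
        {
        }

        component_t r, g, b, a;

        // Indexed access: look up the pointer to member, then apply it to *this
        component_t &operator[] (unsigned i)
        {
            assert(i < 4); // assumption: out-of-range indices are programmer errors
            return this->*(rgba[i]);
        }

        // const overload (an addition), returns the component by value
        component_t operator[] (unsigned i) const
        {
            assert(i < 4);
            return this->*(rgba[i]);
        }

    private:
        typedef component_t colour::* component_ptr; // pointer to member type
        static const component_ptr rgba[4];
};

// One table shared by every colour object; it adds nothing to sizeof(colour)
const colour::component_ptr colour::rgba[4] =
{
    &colour::r, &colour::g, &colour::b, &colour::a
};

double euclidean_distance(const colour &c1, const colour &c2)
{
    double dist_sq = 0.0;
    for (unsigned i = 0; i != 4; ++i)
    {
        const double delta = double(c1[i]) - double(c2[i]);
        dist_sq += delta * delta;
    }

    return std::sqrt(dist_sq);
}
```

Named and indexed access refer to the same storage here, so writing c[1] = 99 changes c.g and vice versa.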

I thought this was pretty neat when I first saw it. In fact, I still think it's pretty neat! You get a tight memory representation and a choice of syntax with which to access the members — use indexed access or named access to the components depending on what's most appropriate for any given situation.

Of course this isn't restricted to colours. It also works for mathematical vector classes where sometimes you'd like to write v.x, v.y or v.z, but other times you'd prefer v[i]. This is the first context in which I came across this trick. I'm sure there are other examples, too.

2. Assignment of 0

As I said earlier, I dreamt this up today. That's not to say that it hasn't been thought of before. But I haven't come across it anywhere. If you've seen it before, let me know where.

Let's say you have a smart pointer. Something like boost::shared_ptr will do for the sake of argument. Currently, to take such a pointer and assign null to it, you have to call the reset() member function[2]:

boost::shared_ptr<int> p(new int(10));
assert(p);

p.reset();
assert(!p);

Now, wouldn't it be nice to be able to write p = 0 rather than p.reset()? To do this, we could overload the assignment operator:

shared_ptr &operator= (T *p)
{
    reset(p);
    return *this;
}

However, this is often frowned upon. It's now easy to assign any old raw pointer to a shared_ptr, whether that raw pointer was dynamically allocated or not.

What we'd really like is a way to be able to handle the assignment of the literal 0 to an object. As it happens this trick is also useful for mathematical vectors where v = 0 means performing an assignment of the 0-vector to v.

Here's how we can achieve the desired effect:

class vec3D
{
    public:
        vec3D(double x, double y, double z);

        // ...

        vec3D &operator= (int vec3D::*) { std::fill(a, a + 3, 0.0); return *this; }

    private:
        double a[3];
};

That additional operator= provides the hook we need. You can now assign the literal 0 to the vector:

vec3D v(0, 1, 2);

v = 0; // calls the operator= we've defined. Nice!

What's more, 0 is the only possible valid argument. There are no data members in vec3D that are of type int, so it's impossible to make a non-null pointer to pass to the operator (ignoring the hideously sick casts you're currently trying to conjure up in your head)! This makes the technique perfectly safe to use.
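Here's a small self-contained sketch you can compile to try this out. The inline constructor and the read-only operator[] are assumptions of mine, added only so that the result of the assignment can be inspected.

```cpp
#include <algorithm>
#include <cassert>

class vec3D
{
    public:
        vec3D(double x, double y, double z)
        {
            a[0] = x;
            a[1] = y;
            a[2] = z;
        }

        // The hook: the literal 0 converts to a null pointer to member.
        // vec3D has no int data members, so no non-null int vec3D::* exists.
        vec3D &operator= (int vec3D::*)
        {
            std::fill(a, a + 3, 0.0);
            return *this;
        }

        // Hypothetical read accessor, here purely for the demonstration
        double operator[] (unsigned i) const
        {
            assert(i < 3);
            return a[i];
        }

    private:
        double a[3];
};
```

With this in place v = 0 zeroes the vector, whereas v = 1 should fail to compile, because 1 is not a null pointer constant and there's no other conversion from int to a pointer to member. The implicitly generated copy assignment is unaffected, so v = w still works for two vec3D objects.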

Footnotes
  1. people across the pond will have to put up with the "u" in "colour" for now, I'm afraid
  2. actually, the trick used to make the conversions-to-bool work for the assertions in this example typically uses a pointer to member, but it's a well known hack so I won't go into it here. C++ streams do a similar thing

Comments

Dave Johansen

[13/01/2009 at 19:45:00]

I guess the first example could also be pulled off by a switch statement in the operator[], but both of those are pretty nifty tricks and definitely worth knowing about.

Edd

[13/01/2009 at 23:48:00]

Yeah you could also use a switch statement for #1. What I neglected to mention however, is that I've seen reports that Microsoft's compiler at least will compile c[0] down to the exact same code as c.r, which is really nice, especially when you want a very quick 3D vector class, for example.

I honestly don't know whether or not a switch would do the same.

Brad

[15/01/2009 at 10:22:00]

Hey, nice blog. I always find your posts highly interesting. Still, in this instance, I think I might be missing something. Why not:

// colour.hpp v1

class colour
{
public:
    typedef unsigned char component_t;

    // constructor, fills in the members in the obvious way
    colour(component_t r, component_t g, component_t b, component_t a);

    union {
        struct {
            component_t r, g, b, a;
        };
        component_t n[4];
    };

    // declaration for the operator[] defined below
    component_t &operator[] (unsigned i);
};

colour::component_t &colour::operator[] (unsigned i)
{
    assert(i < 4);
    return n[i];
}

I can see that trick coming in handy if colour::component_t weren't a POD, though.

Edd

[15/01/2009 at 18:11:00]

Hi Brad!

It certainly would be nice if we could do it that way. However, there are 2 problems. The first is that anonymous structs aren't standard C++. The second is that the memory layout of the individual components is not guaranteed to correspond to the appropriate elements of the array in the union, because the compiler is allowed to insert padding between r, g, b and a.

These problems may not matter depending on the platforms you’re targeting and the compilers you’re using, of course, in which case you can get away with it. But in using the pointer-to-member trick, you don’t have to get away with anything as you’re well within the bounds of the C++ standard.

Jasmeet Bagga

[20/01/2009 at 21:01:00]

"What’s more, 0 is the only possible valid argument. There are no data members in vec3D that are of type int, so it's impossible to make a non-null pointer to pass to the operator (ignoring the hideously sick casts you're currently trying to conjure-up in your head)! This makes the technique perfectly safe to use."

What happens when one adds/has a data member of type int in a class like vec3D?

Darrell Wright

[21/01/2009 at 06:41:00]

I have used them to build a really big state machine simulation. For an assignment years ago I was required to build a traffic simulator. I had a huge multidimensional array of pointers to member functions. The array indices indicated certain sensor data and other variables and based on this it called the correct function. It was very fast and when used very simple to understand.

The tricky part was the ugliness of pointers to member functions and then putting that in an array. Also, this was just before C++ and the standard template library in the 90's. Setting up the array wasn't too difficult as most of the methods were called based on ranges.

I remember being so geek proud of it as it was really really fast at the time and the code that used it was pretty and uncluttered with layer after layer of if/else if statements.

But like anything, should I have done it that way... Maybe. It's like anything, right tool for the job and nothing is necessarily bad or good. That's the beauty of C++ though, isn't it? It doesn't force you to make ideological decisions about how to approach problems, it just does what you tell it in the way you want.
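For anyone who hasn't seen this kind of table-driven dispatch, here's a rough sketch of the general shape. It is not the original simulator: the traffic_light class, its two states, the two sensor readings and the handler names are all made up for illustration.

```cpp
#include <cassert>

class traffic_light
{
    public:
        enum state { red, green, num_states };
        enum sensor { no_car, car_waiting, num_sensors };

        traffic_light() : current(red), changes(0) {}

        // Look up the handler for (current state, sensor reading) and call it
        void step(sensor s)
        {
            (this->*table[current][s])();
        }

        state current;
        unsigned changes; // number of state changes so far

    private:
        typedef void (traffic_light::*handler)(); // pointer to member function type

        void stay() {}
        void go_green() { current = green; ++changes; }
        void go_red() { current = red; ++changes; }

        static const handler table[num_states][num_sensors];
};

const traffic_light::handler
traffic_light::table[traffic_light::num_states][traffic_light::num_sensors] =
{
    // no_car                  car_waiting
    { &traffic_light::stay,   &traffic_light::go_green }, // red
    { &traffic_light::go_red, &traffic_light::stay     }  // green
};
```

The call site is a single indexed lookup with no if/else chain, which is the "pretty and uncluttered" property described above; the cost is the pointer to member function syntax in the table itself.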

Edd

[21/01/2009 at 18:58:00]

@Jasmeet: if you don’t have data members of the appropriate type, then it's impossible to create a non-null pointer-to-member. Since the trick is about assigning 0, it's nice to know that you can't use the hook to assign any other value.

If you really want to be 100% bullet proof though, you can create a private inner class, X, and have a pointer-to-member of type X.
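As a sketch of what that might look like (the name null_tag, the public int member id and the read accessor are all hypothetical, chosen just to show that an int member no longer matters):

```cpp
#include <algorithm>
#include <cassert>

class vec3D
{
    private:
        // Private tag type (the "X" mentioned above). No data member of
        // vec3D has this type, so only a null null_tag vec3D::* can exist.
        struct null_tag {};

    public:
        vec3D(double x, double y, double z, int id_ = 0) : id(id_)
        {
            a[0] = x;
            a[1] = y;
            a[2] = z;
        }

        // Accepts the literal 0; &vec3D::id has type int vec3D::* and
        // does not convert to null_tag vec3D::*
        vec3D &operator= (null_tag vec3D::*)
        {
            std::fill(a, a + 3, 0.0);
            return *this;
        }

        // Hypothetical read accessor, here purely for the demonstration
        double operator[] (unsigned i) const
        {
            assert(i < 3);
            return a[i];
        }

        int id; // an int data member, as in the question above

    private:
        double a[3];
};
```

Here v = 0 still zeroes the components and leaves id alone, while v = &vec3D::id should be rejected by the compiler because the pointer types don't match.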

kentyman

[21/01/2009 at 19:18:00]

Very interesting! I actually had no idea such a feature existed.

My first thought for the colour example was to use unions, though that requires anonymous structs and strict struct packing, as you mentioned above. I tried to think of other ways to solve the problem, and have included them below just to have a nice comparison versus the pointer-to-member approach.

First, the switch statement approach:

class colour
{
public:
    typedef unsigned char component_t;

    colour(component_t r, component_t g, component_t b, component_t a) :
        r(r), g(g), b(b), a(a)
    {
    }

    component_t& operator[] (unsigned i)
    {
        switch (i)
        {
            default:
                assert(false);
            case 0: return r;
            case 1: return g;
            case 2: return b;
            case 3: return a;
        }
    }

    component_t r, g, b, a;
};

As mentioned above, the switch lookup may slow down indexed lookups. But at least it's easy to read and understand.

There's also an approach using references as aliases:

class colour
{
public:
    typedef unsigned char component_t;

    colour(component_t r, component_t g, component_t b, component_t a) :
        r(c[0]), g(c[1]), b(c[2]), a(c[3])
    {
        c[0] = r;
        c[1] = g;
        c[2] = b;
        c[3] = a;
    }

    component_t& operator[] (unsigned i)
    {
        return c[i];
    }

    component_t c[4];
    component_t& r;
    component_t& g;
    component_t& b;
    component_t& a;
};

The problem here is that our 4 extra references greatly increase sizeof(colour).

I thought of one other approach using an enum:

class colour
{
public:
    typedef unsigned char component_t;

    enum
    {
        r, g, b, a
    };

    colour(component_t _r, component_t _g, component_t _b, component_t _a)
    {
        c[r] = _r;
        c[g] = _g;
        c[b] = _b;
        c[a] = _a;
    }

    component_t& operator[] (unsigned i)
    {
        return c[i];
    }

    component_t c[4];
};