mr-edd.co.uk :: horsing around with the C++ programming language

Explicitly using the delete operator is a bad idea

[29th April 2007]

One of the most common misconceptions about C++ is that it makes memory management hard. This point of view tends to be expressed by those who treat C++ as if it were C-with-frills, and who therefore view new/delete and their array counterparts in C++ as equivalents to malloc()/free() in C. Unfortunately, from witnessing discussions on message boards, news feeds and so on, a depressingly large number of these people seem to be either unaware that better alternatives exist, or unwilling to use them.

This doesn't help the reputation of the language at all. Now, don't get me wrong, C++ certainly has its fair share of problems, but memory management isn't one of them; the methods available are far superior to those used in C.

So in this post, I'm going to look at how I think memory and other resources should be managed in C++, by default. We'll see that you should almost never have to manually delete or delete[] dynamically allocated objects and in fact, the use of these operators in everything but the most low-level code could point to problems.

Of course, many others have talked about this before. But I believe that it's worth going over things again, if only to bring balance to a universe populated with a large number of people who are troubled by this aspect of the language.

The problem with C style resource handling in C++

In C, resources are typically acquired by one function and released by another, which must be called manually when we're finished with the resource. Examples include malloc()/free() and fopen()/fclose() from the C standard library. System-level APIs written in C (such as POSIX and the Win32API) tend to have a large number of acquire/release style functions, too.

In this post, each example may focus on a particular resource handling pair such as new/delete, but keep in mind that the underlying principles of the discussion apply equally well to numerous other resource types.

So in C, we typically have a lot of code that looks something like this:

thingy_t handle = acquire_thingy();

if (handle != NULL)
{
    do_something_useful(handle);
}

release_thingy(handle);

This is generally fine in C. In C++, however, exceptions are thrown into the mix and we should expect them to emerge from any function call. If the call to do_something_useful() were to throw an exception in the above example, release_thingy() would never get called. That could be a pretty serious problem, depending on what the resource represents. If it's a mutex lock, for example, the program is probably going to grind to a halt (assuming the exception is caught before termination and appropriately handled).

So, one way of fixing this code is to do the following.

thingy_t handle = acquire_thingy();

if (handle == NULL) return;

try
{
    do_something_useful(handle);
}
catch(...)
{
    release_thingy(handle);
    throw;
}

release_thingy(handle);

Well, that'll take care of cleaning up the resource in the face of an exception. But it's still a pretty crappy solution. Why do I say that?

Burdening maintainers

As is often the case in non-trivial software, this function might change some day in order to have a new feature massaged into it.

So it may be the case that the function will have to return half-way under some circumstances, or you might have to call some other functions that have the potential to throw exceptions. Each of these modifications requires you to release the resource before the flow of execution leaves the function. Furthermore, we may end up using another resource, which means extra care must be taken.

thingy_t handle = acquire_thingy();

if (handle == NULL) return;

try
{
    if (get_thingy_name(handle) != "custom resource 1")
    {
        release_thingy(handle);
        return;
    }

    // connection_t is from some other C API, say
    connection_t conn = open_connection(some_ip); 

    if (conn == NULL)
    {
        release_thingy(handle);
        return;
    }

    try
    {
        send(conn, get_thingy_data(handle));
    }
    catch(...)
    {
        close_connection(conn);
        release_thingy(handle);
        throw;
    }

    close_connection(conn);
}
catch(...)
{
    release_thingy(handle);
    throw;
}

release_thingy(handle);

Now, that's some pretty hideous code. The actual purpose of the code is obscured by all the resource management scaffolding. It would get even worse if we were acquiring and releasing resources in the body of a loop.

Now I hope that any seasoned C++ programmer reading that code is practically screaming at the monitor: USE A DESTRUCTOR TO RELEASE EACH RESOURCE, FOR CRYING OUT LOUD!

Quite right! We could create a couple of utility classes to help us manage the thingy_t and connection_t resources, RAII style. But instead, I'm going to use the scoped<> template class that I introduced in a previous post.

scoped<thingy_t> handle(acquire_thingy(), release_thingy);
if (!handle.get() || get_thingy_name(handle.get()) != "custom resource 1")
    return;

scoped<connection_t> conn(open_connection(some_ip), close_connection);
if (!conn.get())
    return;

send(conn.get(), get_thingy_data(handle.get()));

Much better! It's now obvious what the function does by reading through the code. The function will automatically release the resources when it returns or when an exception is thrown as the destructors of conn and handle are called in either circumstance and they will call the appropriate clean-up functions.

Maintainers can now add extra code willy-nilly and not worry that they might be inadvertently leaking resources if the functions they call throw exceptions, or if they add in any additional return statements.

Of course, all of this applies equally well to new-ing and delete-ing dynamically allocated objects. The C++ standard library provides the std::auto_ptr<> template class for managing dynamically allocated pointers. But auto_ptr<> has some arguably strange copying semantics, so I'm happier encouraging you to look at scoped_ptr<> and/or shared_ptr<> from boost (soon to be found in the standard library).

boost::scoped_ptr<X> p(new X);

// Within this scope we can now treat p like a regular
// pointer-to-X, for the most part.

p->some_member_function(); 

// the dynamically allocated X is deleted in p's
// destructor when the scope is left

Disambiguating ownership semantics in your interfaces

There's another pretty nasty problem with C-style resource management and that is the difficulty in communicating change of ownership when passing resources as function parameters and returning resources from functions.

Consider a function with the following prototype, for example.

X *make_an_X();

If a function returns a naked pointer to a dynamically allocated object, the function imposes a burden on the caller. They are handed a proverbial hot potato. It is now the caller's responsibility to delete (or delete[]) the returned pointer. The interface is easy to use erroneously: the caller may forget to delete the pointer, delete it twice, use the wrong form of delete, or leak the object when an exception bypasses the clean-up code.

There's no need to introduce the possibility of these errors occurring. You might argue that any C++ programmer worth their salt wouldn't do any of these things. Sure, every C++ programmer knows not to do these things, but they're human and these mistakes do get made. Furthermore, all of these errors can be avoided by making a simple change to the interface.

shared_ptr<X> make_an_X();

Instead of a raw pointer, we could return a boost::shared_ptr<>, for example. Now a client won't have to worry about delete-ing the pointer; we've made it impossible to make any of the mistakes mentioned above and thus the client programmer's life is easier as a result.

Indeed when I call a function from someone else's code that returns a naked pointer that needs to be delete-d, I immediately wrap it in shared_ptr<> or similar for peace of mind.

There are also related issues when passing pointers in to functions. Is the ownership transferred in passing the pointer as an argument? Even if the function throws an exception or fails in some other way? Again, there's just too much room for error. Avoid the errors entirely by passing in a shared_ptr<>, or something else that indicates and enforces the desired ownership semantics.

Another scenario is that we might dynamically allocate an object in a constructor and store the generated pointer in a member variable. If this raw pointer is passed to the outside world (as a member function's return value, perhaps) and stored elsewhere, there is potential for the external pointer to dangle i.e. point to a delete-d resource once the object's destructor has been called. This is another dangerous situation that can be trivially avoided in the same way.

Using std::vector instead of new[]-ed arrays

I'm going to give std::vector<> from the C++ standard library some special treatment here because I often see people thinking that new[] and delete[] are preferable.

There are a number of reasons that you should prefer to use std::vector<> or another standard container over a dynamically allocated array. The primary reason is of course that new[] produces a pointer that has all the same associated resource management worries as a pointer generated by new.

But what about efficiency? Surely a hand-written array is more efficient? Actually, no, at least not in optimised builds. There are a number of reasons for this. The first is that any modern compiler can inline the key member functions (such as operator[]) so that element access performs just as well as indexed access into a raw array[1].

Furthermore, a vector<> constructs objects on demand, whereas invoking new[n] creates n objects that must all be default constructed from the word go, even if we need to overwrite them immediately. All those constructor calls are a waste of time. By using placement new, a vector<> is able to construct an object at the appropriate address within the container only when one is inserted.

To see what I mean, suppose we have a function std::string to_string(unsigned) that returns a string representation of the number given as an argument. Here are two code snippets that use this function in the same way. The first snippet uses new[] and delete[] whereas the second employs std::vector<>.

// native array version
using std::string;

string *s1 = new string[100]; // 100 strings default constructed
try
{
    for (unsigned i = 0; i != 100; ++i)
        s1[i] = to_string(i); // 100 copy-assignment operator calls
}
catch(...)
{
    delete[] s1;
    throw;
}

// ...

delete[] s1;

// std::vector version

std::vector<string> s2;
s2.reserve(100); // allocate memory, but don't construct anything in there yet

for (unsigned i = 0; i != 100; ++i)
    s2.push_back(to_string(i)); // 100 copy-constructor calls via placement new

// ...

So assuming that copy construction and copy assignment for std::strings take the same amount of work, the version using the std::vector<> is cheaper by 100 constructor calls. In fact, copy construction should be cheaper than assignment because it creates an object in an untouched region of memory. No extra work is needed to tear down an existing object as is the case in copy-assignment. In addition, because placement new is used in this way, there is no requirement for objects to have a default constructor if they are to be added to a vector<>[2].

And of course, adding elements to a vector is easy and exception safe. In contrast, manually re-allocating memory to resize an existing array is tedious and error prone, especially when you must ensure the array is delete[]-ed consistently in the face of exceptions, which can occur even while copy-assigning over an element of the array; hence the uglifying try/catch blocks in the first code snippet.

The last advantage that a vector<> has over a raw array is that of checked iterators. Many standard library implementations provide iterators for use in debug builds that check whether they have strayed out of bounds[3]. Naked pointers, as used with dynamically allocated arrays, offer no such facility. And in release builds, vector<> iterators are often (always?) a typedef for a naked pointer type, so you get both safety and efficiency, each where needed.

So there's every reason to choose std::vector in place of simple new[]/delete[]. I've never heard a good argument to the contrary[4].

Conclusion

Every time a resource must be managed by hand, it can and will be managed incorrectly at some stage. By letting bullet-proof language constructs manage your resources, you improve both the robustness and readability of your code. Furthermore, it makes it easier for your clients to write additional code on top of yours.

This is why I never explicitly use delete, delete[] or call any resource clean-up function in C++.

Footnotes
  1. And keep in mind that std::vector was added to the C++ standard library over a decade ago, so your benchmarks may be out of date. Perhaps a future post on this particular topic is warranted?
  2. One of the things that really annoys me about the Qt containers is that all types used in them must supply a default constructor, even though this isn't necessary.
  3. Of the implementations I have at the time of writing, MinGW GCC 3.4.5, Apple GCC 4, Borland bcc32 v5.82 and MSVC 8 have libraries with checked-iterator support.
  4. Unless it's a vector<bool>; see Item 6 in Herb Sutter's More Exceptional C++ and/or Item 18 in Scott Meyers' Effective STL. But there are other alternatives such as std::bitset, boost::dynamic_bitset and std::deque<bool>.

Comments

Pierre Phaneuf

[22/06/2008 at 16:17:00]

I agree entirely with this. For a (long) while, I was rather opposed to using exceptions, but this was due to two things: I was using raw pointers, so my code had to be littered with try/catch blocks wherever exceptions were possible, and the other thing was that I saw too much code that used exceptions in wildly inept ways (no, reaching the end of an iterator is not an exception!).

I'd like to point out that shared_ptr should really be called strong_ptr. I have seen people just starting to use smart pointers replace every pointer they saw with shared_ptr, and then they start complaining about circular references and such problems.

Smart pointers should really be called something like "annotated pointers" instead, because the programmer still has to do the thinking! At the very least, one has to think whether a pointer is a strong owner (controlling the lifetime) or a weak one. When you do the deletes explicitly, you don't do delete on every single pointer that goes out of scope, right? Thankfully shared_ptr comes with weak_ptr, which adds in the right semantic. But whenever a (non-reference) shared_ptr is put down, you have to think if it is really a strong reference, or if it is merely a weak one.

By the way, I just found your site recently (through the Google Reader suggestions), and I am finding it quite good.
