What's Your Address?

by Matthew Wilson

April 8, 2005

Summary

This article is an excerpt from Matthew Wilson's recently-published book, Imperfect C++, Addison-Wesley, 2004.

26 What's Your Address?

"Programmers who overload unary operator& should be sentenced to writing libraries that need to operate properly when fed such classes." - Peter Dimov, esteemed Boost member, Boost newsgroup June 2002

Although that's very funny, it's also a pretty strong statement. Why is there such antipathy to the use of this operator?

In this chapter we'll look at some of the problems that doing this can cause. Our solution will be an unexciting one, simply the recommendation that you follow Peter's implicit advice, and forswear any use of this overload for shortsighted gains and avoid much grief down the line.

One thing I should make clear. Any time in this chapter that I refer to operator &() it will be the unary form, which is the address-of operator. The binary form, which is the bitwise AND operator, is an entirely different beast.

class Int
{
  Int operator &(Int const &);  // Bitwise AND operator
  void *operator &();           // Address-of operator
  . . .

26.1 Can't Get The Real Address

Like most C++ operators, you're free to return anything you like from operator &(). This means that you can alter the value, or the type, or both. This can be a powerful aid in rare circumstances, but it can also cause you a world of trouble.

26.1.1 STL Containment

The standard library containers store contained elements via in-place construction. For instance, containers following the Vector model [Aust1999, Muss2001] maintain a block of memory within which each element is stored contiguously. Since they are resizable, there needs to be a mechanism to add and remove elements from this storage, which is provided by the allocators. The Allocator [Aust1999, Muss2001] model includes the construct() and destroy() methods, whose canonical definitions are as follows:

template <typename T>
struct some_allocator
{
  . . .
  void construct(T* p, T const &x)
  {
    new(p) T(x);
  }
  void destroy(T* p)
  {
   p->~T();
  }
  . . .

The construct() method is used by containers to in-place construct elements, as in

template <typename T, . . . >
void list::insert(. . ., T const &x)
{
  . . .
  Node *node = . . .
  get_allocator().construct(&node->value, x);

If the type you're storing in the list overloads operator &() to return a value that cannot be converted to T*, then this line will fail to compile.
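
To see the failure concretely, here is a minimal sketch, assuming C++11 or later. Widget, Node and reassign() are hypothetical names; some_allocator mirrors the canonical construct()/destroy() pair above. Widget's operator &() returns an int, so the raw &node.value cannot be handed to construct(); std::addressof(), which C++11 added to <memory>, recovers the genuine address instead.

```cpp
#include <cassert>
#include <memory>       // std::addressof
#include <new>          // placement new
#include <string>
#include <type_traits>

// Hypothetical element type whose operator& returns an int, which
// cannot be converted to Widget*.
struct Widget
{
    std::string name;
    explicit Widget(std::string const &n) : name(n) {}
    int operator &() const { return 0; } // pathological overload
};

// The canonical construct()/destroy() pair described in the text.
template <typename T>
struct some_allocator
{
    void construct(T *p, T const &x) { ::new (static_cast<void *>(p)) T(x); }
    void destroy(T *p) { p->~T(); }
};

// A hypothetical list node holding one element.
template <typename T>
struct Node
{
    T value;
    explicit Node(T const &v) : value(v) {}
};

// Re-initialises node.value from x, the way a container might.
void reassign(Node<Widget> &node, Widget const &x)
{
    some_allocator<Widget> alloc;

    // &node.value calls the overload and yields an int, so
    // alloc.construct(&node.value, x) would not compile.
    static_assert(std::is_same<decltype(&node.value), int>::value,
                  "operator& hides the real address");

    Widget *p = std::addressof(node.value); // the genuine address
    alloc.destroy(p);
    alloc.construct(p, x);
    assert(node.value.name == x.name);
}
```

The commented-out line is exactly the container code from the text; only the route to the address changes.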

26.1.2 ATL Wrapper Classes & CAdapt

One of the major beefs I have with Microsoft's Active Template Library (ATL)—which, like most frameworks, started out with high ideals—is the heavy overloading of operator &. There are quite a number of classes that overload it, including CComBSTR, CComPtr and CComVariant.

To account for the incompatibility between ATL types and STL containers, the designers of ATL introduced the CAdapt template, which attempts to solve the problem by containing an instance of its parameterising type. It then provides implicit conversion operators and comparison operations to allow it to be used in place of its parameterising type. Because CAdapt<T> does not overload operator &(), it can be used to mask the overload for any T that does.

Listing 26.1

template <typename T>
class CAdapt
{
public:
  CAdapt();
  CAdapt(const T& rSrc);
  CAdapt(const CAdapt& rSrCA);
  CAdapt &operator =(const T& rSrc);
  bool operator <(const T& rSrc) const
  {
    return m_T < rSrc;
  }
  bool operator ==(const T& rSrc) const;
  operator T&()
  {
    return m_T;
  }
  operator const T&() const;
 
  T m_T;
};

Unfortunately, this is just a sticking plaster on a broken arm. As we saw in Chapter 23, templates that inherit from their parameterising type have a good deal of trouble in unambiguously providing access to the requisite constructors of their parent class. The same problem exists for types such as CAdapt, which enhance their parameterising type via containment rather than inheritance. All the constructors of T, except the default and copy constructors, are inaccessible. This clutters your code, reduces the applicability of generic algorithms, and prevents the use of RAII (see Section 3.5).
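
The following sketch, assuming C++11 or later, shows both sides of CAdapt's bargain. Handle and Adapt are hypothetical stand-ins, not ATL code: Adapt masks Handle's operator &(), but Handle's two-argument constructor cannot be reached through it, so a temporary Handle has to be built and then copied in.

```cpp
#include <cassert>
#include <memory>   // std::addressof
#include <string>
#include <type_traits>

// Hypothetical wrapped type with a two-argument constructor and an
// operator& that returns the address of a member, not of the object.
class Handle
{
public:
    Handle() : m_name(), m_flags(0) {}
    Handle(std::string const &name, int flags) : m_name(name), m_flags(flags) {}

    std::string *operator &() { return &m_name; }

    std::string const &name() const { return m_name; }
    int flags() const { return m_flags; }

private:
    std::string m_name;
    int         m_flags;
};

// Adapt<T>: a cut-down, hypothetical analogue of CAdapt (Listing 26.1).
template <typename T>
class Adapt
{
public:
    Adapt() : m_T() {}
    Adapt(T const &rSrc) : m_T(rSrc) {}

    operator T &() { return m_T; }
    operator T const &() const { return m_T; }

    T m_T;
};

// Only the default and copy-from-T constructors get through:
static_assert(std::is_constructible<Handle, std::string, int>::value,
              "Handle can be built from (name, flags)");
static_assert(!std::is_constructible<Adapt<Handle>, std::string, int>::value,
              "but Adapt<Handle> cannot");

void adapt_example()
{
    // Must build (and then copy) a temporary Handle first.
    Adapt<Handle> a(Handle("config", 3));

    Adapt<Handle> *pa = &a;           // built-in &: Adapt has no overload
    assert(pa == std::addressof(a));

    Handle &h = a;                    // implicit conversion to T&
    assert(h.name() == "config" && h.flags() == 3);
}
```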

26.1.3 Getting The Real Address

So is there a way to get at the real address? Since there is no equivalent overloadable operator for eliciting a reference from an object, we can use some dubious reference casting to get our address, along the lines of the following attribute shim (see Chapter 20):

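typedef unsigned char byte_t; // assumed: any byte-sized type will do

template<typename T>
T *get_real_address(T &t)
{
  // byte_t has no overloaded operator &, so the inner & is the built-in one
  return reinterpret_cast<T*>(&reinterpret_cast<byte_t &>(t));
}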

There are other complications, to account for const and/or volatile, but that's the essence of it. The Boost libraries have a nifty addressof() function, which takes account of all the issues.
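
Boost's addressof() was later standardised as std::addressof() in C++11's <memory>. The sketch below (the Sneaky type is hypothetical, and byte_t is assumed to be a typedef for unsigned char) checks that the shim and std::addressof() agree, and that the latter also copes with a const object, which the shim as written does not.

```cpp
#include <cassert>
#include <memory>   // std::addressof (C++11 and later)

typedef unsigned char byte_t; // assumed: any byte-sized type will do

// The attribute shim from the text; handles non-const lvalues only.
template <typename T>
T *get_real_address(T &t)
{
    return reinterpret_cast<T *>(&reinterpret_cast<byte_t &>(t));
}

// Hypothetical type whose operator& lies about where it lives.
struct Sneaky
{
    int value;
    Sneaky *operator &() { return nullptr; }
    Sneaky const *operator &() const { return nullptr; }
};

void address_example()
{
    Sneaky s = { 42 };
    assert(&s == nullptr);                            // the overload
    assert(get_real_address(s) == std::addressof(s)); // both bypass it
    assert(get_real_address(s)->value == 42);

    Sneaky const &cs = s;
    assert(std::addressof(cs) == std::addressof(s));  // const is handled too
}
```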

But the use of reinterpret_cast is cause for some concern. The standard (C++-98: 5.2.10;3) says: "the mapping performed ... is implementation-defined. [Note: it is intended to be unsurprising to those who know the addressing structure of the underlying machine]". Since the result may conceivably not be valid, it's not possible to claim that this technique is truly portable. However, it's also pretty hard to imagine a compiler that would not perform the expected conversion.

We can now sidestep types with pathological operator &() overloads, but only by peppering all our code with calls to the real-address shim. That's ugly, and its correctness is implementation-defined. Do you want to use a standard library with myriad reinterpret_casts?

26.2 What Actions Are Carried Out During Conversion?

Since it's a function like any other, the operator &() overload can do things other than simply return a converted value. This has serious consequences.

Imperfection: Overloading operator&() breaks encapsulation.

That's a bold statement. Let me illustrate why it is so.

As I've mentioned already, ATL has a large number of wrapper classes that overload operator &(). Unfortunately, there are different semantics to their implementations. The types shown in Table 26.1 all have an assertion in the operator method to ensure that the current value is NULL.

Wrapper class          operator &() return type
CComTypeAttr           TYPEATTR**
CComVarDesc            VARDESC**
CComFuncDesc           FUNCDESC**
CComPtr / CComQIPtr    T**
CHeapPtr               T**

Table 26.1

Don't worry about the specifics of the types TYPEATTR, VARDESC and FUNCDESC; they're POD Open type structures (see Section 4.4) used for manipulating COM metadata. The important thing to note is that they have allocated resources associated with them but they do not provide value semantics, which means that they must be managed carefully in order to prevent resource leaks or use of dangling pointers.

The operator is overloaded in the wrapper classes to allow these types to be used with COM API functions that manipulate the underlying types, and thus to be initialised. Of course, it's not an initialisation as we RAII-phile C++ types know and love it, but it is initialisation, because the assertion means that any subsequent attempt to repeat the process will result in an error, in debug mode at least. I'll leave it up to you to decide whether that, in and of itself, is a good way to design wrapper classes, but you can see that you are required to look inside the library to see what is going on. After all, it's using an overloaded operator, not calling a function named get_one_time_content_pointer()[1].

The widely used CComBSTR class, which wraps the COM BSTR type, also overloads operator &() to return BSTR*, but it does not have an assertion. By contra-implication, we assume that this means that it's OK to take the address of a CComBSTR multiple times, and, since the operator is non-const, that we can make multiple modifying manipulations to the encapsulated BSTR without ill-effect. Alas, this is not the case. CComBSTR can be made to leak memory with ease:

void SetBSTR(char const *str, BSTR *pbstr);
CComBSTR  bstr;
SetBSTR("Doctor", &bstr);   // All ok so far
SetBSTR("Proctor", &bstr);  // "Doctor" is now lost forever!
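
The leak can be reproduced without COM. In the sketch below, assuming C++11, StrWrapper, set_str() and the g_live counter are hypothetical stand-ins for CComBSTR, SetBSTR() and the BSTR allocator: operator &() hands out the address of the owned pointer without releasing it, so the second call orphans the first string.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical count of heap strings currently allocated.
static int g_live = 0;

char *alloc_str(char const *s)
{
    char *p = new char[std::strlen(s) + 1];
    std::strcpy(p, s);
    ++g_live;
    return p;
}

void free_str(char *p)
{
    if (p != nullptr)
    {
        delete[] p;
        --g_live;
    }
}

// Plays the role of SetBSTR(): writes a new string through the pointer
// without releasing whatever it previously pointed to.
void set_str(char const *s, char **pp)
{
    *pp = alloc_str(s);
}

// A CComBSTR-like wrapper (hypothetical): operator& hands out the address
// of the owned pointer, with neither an assertion nor a release.
class StrWrapper
{
public:
    StrWrapper() : m_p(nullptr) {}
    ~StrWrapper() { free_str(m_p); }
    StrWrapper(StrWrapper const &) = delete;
    StrWrapper &operator =(StrWrapper const &) = delete;

    char **operator &() { return &m_p; }
    char const *c_str() const { return m_p; }

private:
    char *m_p;
};

// Returns how many strings are still allocated once the wrapper is gone.
int leaked_after_two_sets()
{
    {
        StrWrapper str;
        set_str("Doctor", &str);   // all ok so far
        set_str("Proctor", &str);  // "Doctor" is now unreachable
        assert(std::strcmp(str.c_str(), "Proctor") == 0);
    }
    return g_live;
}
```

After the wrapper is destroyed, one string is still counted as live: that is "Doctor", lost forever.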

We can surmise that the reason CComBSTR does not assert is that it proved too inconvenient. For example, it is not uncommon in COM to see an API function or interface method that takes an array of BSTR. Putting aside the issue of passing arrays of derived types (see Sections 14.5 and 33.4), we might wish to pass our CComBSTR as a one-element array when we have only one string to pass; in that case the CComBSTR already holds a value, and an assertion that it is NULL would fire.

An alternative strategy is to release the encapsulated resource within the operator &() method. This is the approach of another popular Microsoft COM wrapper class, the Visual C++ _com_ptr_t template. The downside of this approach is that the wrapper is subject to premature release on those occasions when you need to pass a pointer to the encapsulated resource to a function that will merely be using it, rather than destroying it or removing it from your wrapper. You may think that you can solve this by declaring const and non-const overloads of operator &(), as in Listing 26.2.

Listing 26.2

template <typename T>
class X
{
  . . .
  T const *operator &() const
  {
    return &m_t;
  }
  T *operator &()
  {
    Release(m_t);
    m_t = T();
    return &m_t;
  }

Unfortunately, this won't help, because the compiler selects the overload appropriate to the const-ness of the instance on which it's to be called, rather than on the use one might be making of the returned value. Even if you pass the address of a non-const X<T> instance to a function that takes T const *, the non-const overload will be called.
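
A quick way to convince yourself is to count which overload runs. This is a cut-down version of Listing 26.2, assuming int as the resource type, with hypothetical counters and a read-only peek() function; zeroing m_t stands in for Release().

```cpp
#include <cassert>

// Hypothetical counters recording which overload ran.
static int g_const_calls    = 0;
static int g_nonconst_calls = 0;

// A cut-down Listing 26.2; zeroing m_t stands in for Release(m_t).
template <typename T>
class X
{
public:
    explicit X(T t) : m_t(t) {}

    T const *operator &() const
    {
        ++g_const_calls;
        return &m_t;
    }
    T *operator &()
    {
        ++g_nonconst_calls;
        m_t = T(); // "release" the current value
        return &m_t;
    }
    T value() const { return m_t; }

private:
    T m_t;
};

// A function that only reads through its pointer.
int peek(int const *p) { return *p; }

void overload_example()
{
    X<int> x(42);
    int seen = peek(&x);   // non-const object: the destructive overload runs
    assert(g_nonconst_calls == 1 && g_const_calls == 0);
    assert(seen == 0 && x.value() == 0);

    X<int> const &cx = x;
    peek(&cx);             // only a const object selects the const overload
    assert(g_const_calls == 1);
}
```

Even though peek() takes int const *, the value 42 is gone before peek() ever sees it.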

To me, all this stuff is so overwhelmingly nasty that I stopped using any such classes a long time ago. Now I like to use explicitly named methods and/or shims to save me from all the uncertainty. For example, I use the sublimely named[2] BStr class to wrap BSTR. It provides the DestructiveAddress() and NonDestructiveAddress() methods, which, though profoundly ugly, don't leave anyone guessing as to what's going on.
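
Here is a sketch of the explicitly named approach, assuming C++11. The method names DestructiveAddress() and NonDestructiveAddress() are the ones described above, but StrHolder, produce() and measure() are hypothetical, and StrHolder wraps a plain heap C string rather than a BSTR; it is not the BStr class itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical owner of a heap C string; it offers explicitly named
// accessors (names as in the text) instead of overloading operator&.
class StrHolder
{
public:
    StrHolder() : m_p(nullptr) {}
    explicit StrHolder(char const *s) : m_p(dup(s)) {}
    ~StrHolder() { delete[] m_p; }
    StrHolder(StrHolder const &) = delete;
    StrHolder &operator =(StrHolder const &) = delete;

    // Frees the current string, then exposes the slot as an "out" parameter.
    char **DestructiveAddress()
    {
        delete[] m_p;
        m_p = nullptr;
        return &m_p;
    }

    // Exposes the slot for reading only; the contents are left intact.
    char const *const *NonDestructiveAddress() const { return &m_p; }

    char const *c_str() const { return m_p; }

    static char *dup(char const *s)
    {
        char *p = new char[std::strlen(s) + 1];
        std::strcpy(p, s);
        return p;
    }

private:
    char *m_p;
};

// Hypothetical API functions: one fills an "out" string, one only reads.
void produce(char **pp) { *pp = StrHolder::dup("Proctor"); }
std::size_t measure(char const *const *pp) { return std::strlen(*pp); }

void named_methods_example()
{
    StrHolder s("Doctor");
    assert(measure(s.NonDestructiveAddress()) == 6);
    assert(std::strcmp(s.c_str(), "Doctor") == 0); // untouched

    produce(s.DestructiveAddress());               // old string freed first
    assert(std::strcmp(s.c_str(), "Proctor") == 0);
}
```

The call site now says which behaviour it wants, and the const-returning accessor cannot be passed to a function expecting to write.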

26.3 What Do We Return?

Another source of abuse in overloading operator &() is in the type it returns. Since we can make it return anything, it's easy to have it return something bad; naturally, this is the case for any operator.

We saw in Chapter 14 some of the problems attendant on passing arrays of inherited types to functions that take pointers to the base type. There's another dimension to that nasty problem when overloading operator &(). Consider the following types:

Listing 26.3

struct THING
{
  int i;
  int j;
};
struct Thing
{
  THING thing;
  int   k;

  THING *operator &()
  {
    return &thing;
  }
  THING const *operator &() const;
};

Now we're in the same position we would be if Thing inherited publicly from THING.

 void func(THING *things, size_t cThings);
 Thing things[10];
 func(&things[0], dimensionof(things)); // Oops!! func() steps by sizeof(THING)
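
The mismatch is purely one of stride. The sketch below, using the types from Listing 26.3, compares where a callee expecting THING elements will look for each element with where each Thing actually begins; assumed_offset() and real_offset() are hypothetical helpers, and the offsetof() check assumes the usual layout with no padding between thing and k.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

struct THING
{
    int i;
    int j;
};

struct Thing
{
    THING thing;
    int   k;

    THING *operator &() { return &thing; }
    THING const *operator &() const { return &thing; }
};

// Byte offset at which a callee that believes it has a THING array
// looks for element n...
std::size_t assumed_offset(std::size_t n) { return n * sizeof(THING); }

// ...and the byte offset at which element n of a Thing array really starts.
std::size_t real_offset(std::size_t n) { return n * sizeof(Thing); }

void stride_example()
{
    Thing things[10] = {};
    THING *p = &things[0];   // compiles: the overload yields a THING*
    assert(p == &things[0].thing);

    // With the usual layout (no padding), the callee's "things[1]" begins
    // on things[0].k, and each later element drifts further out of step.
    assert(assumed_offset(1) == offsetof(Thing, k));
    assert(assumed_offset(5) != real_offset(5));
}
```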

By providing the operator &() overloads for "convenience", we've exposed ourselves to abuse of the Thing type. I'm not going to suggest the application of any of the measures described in Chapter 14 here, because I think overloading operator &() is just a big no-no.

A truly bizarre confluence of factors is the case where the operator is destructive (it releases the resources) and you are passing an array of (even correctly sized) wrapper class instances to a function, as in Listing 26.4.

Listing 26.4

struct ANOTHER
{
  . . .
};
 
void func(ANOTHER *things, size_t cThings);
inline void func(array_proxy<ANOTHER> const &things)
{
  func(things.base(), things.size());
}
 
class Another
{
public:
  ANOTHER *operator &()
  {
    ReleaseAndReset(m_another);
    return &m_another;
  }
private:
  ANOTHER m_another;
};

Let's assume you're on your best behaviour, and are using an array_proxy (see Section 14.5.5) and translator method to ensure that ANOTHER and Another can be used together.

 Another  things[5];
 . . . // Modify things
 func(things); // sizeof(ANOTHER) must == sizeof(Another)

Irrespective of the semantics of func(), calling it will reset things[0] and leave things[1] - things[4] unaffected. This is because the array constructor of array_proxy takes the base address using explicit array subscript syntax (&ar[0]), as all good array manipulation code should, and so invokes the destructive operator on the first element only. If you were to do it manually, you'd still need to apply the operator, unless Another inherited publicly from ANOTHER and you called the two-parameter version of func() and relied on array decay.

If func() does not change the contents of the array passed to it, then this supposedly benign call has the nasty side effect of destroying the first element passed to it. If func() modifies the contents of the array, then things[1] - things[4] are subject to resource leaks, as their contents prior to the call are simply overwritten by func().
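
The following sketch reproduces the scenario, assuming C++11. simple_array_proxy is a hypothetical, pared-down stand-in for STLSoft's array_proxy (not its real definition) that, following the description above, takes the base address with subscript syntax; ReleaseAndReset() and g_released are likewise stand-ins, and func() deliberately does nothing with the elements. Even so, things[0] is released while the other four are untouched.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the C struct of Listing 26.4; a non-zero handle models
// an owned resource.
struct ANOTHER
{
    int handle;
};

// Hypothetical count of releases, and a stand-in for ReleaseAndReset().
static int g_released = 0;

void ReleaseAndReset(ANOTHER &a)
{
    if (a.handle != 0)
    {
        ++g_released;
    }
    a.handle = 0;
}

class Another
{
public:
    explicit Another(int h = 0) { m_another.handle = h; }

    ANOTHER *operator &() // destructive, as in Listing 26.4
    {
        ReleaseAndReset(m_another);
        return &m_another;
    }
    int handle() const { return m_another.handle; }

private:
    ANOTHER m_another;
};

static_assert(sizeof(ANOTHER) == sizeof(Another), "layouts must match");

// Hypothetical minimal array proxy: &ar[0] runs the overload, once.
template <typename T>
class simple_array_proxy
{
public:
    template <typename U, std::size_t N>
    simple_array_proxy(U (&ar)[N]) : m_base(&ar[0]), m_size(N) {}

    T *base() const { return m_base; }
    std::size_t size() const { return m_size; }

private:
    T          *m_base;
    std::size_t m_size;
};

// A "benign" consumer that does not even look at the elements.
std::size_t func(ANOTHER *things, std::size_t cThings) { (void)things; return cThings; }
std::size_t func(simple_array_proxy<ANOTHER> const &things)
{
    return func(things.base(), things.size());
}

void proxy_example()
{
    Another things[5] = { Another(1), Another(2), Another(3), Another(4), Another(5) };
    assert(func(things) == 5);
    assert(things[0].handle() == 0 && g_released == 1); // first one released
    assert(things[1].handle() == 2 && things[4].handle() == 5);
}
```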

26.4 What's Your Address: Coda

I hope I've managed to convince you that Peter was spot on. Overloading operator &() is just far too much trouble. When I consider the amount of coding time, thinking time and debugging time that is expended trying to understand and work with libraries that use it, I struggle to imagine how using it helps the software engineering community[3].

In short, don't do it. In grepping through my source databases at the time of writing, I found eleven uses of it. Of the three that were used in "proper" classes—i.e. those that are not in utility or meta-programming classes—I can probably truly justify only one of them. I removed two immediately[4]. The third I cannot justify, but I'm keeping it for reasons of expediency. For grins, I'll describe this in the following sub-section.

26.4.2 A Sensationalist Backflip!

I'm not going to try to justify this to you; you can make up your own mind whether its utility outweighs the many good reasons against overloading operator &().

The Win32 API defines many non-standard basic structures, oftentimes for closely related types. Further, since many Win32 compilers did not provide 64-bit integers in the early years of the operating system, there are several 64-bit structures that filled in the gap. Two such structures are ULARGE_INTEGER and FILETIME. Their definitions are as follows:

struct FILETIME
{
  uint32_t    dwLowDateTime;
  uint32_t    dwHighDateTime;
};
   
union ULARGE_INTEGER
{
  struct
  {
    uint32_t  LowPart;
    uint32_t  HighPart;
  };
  uint64_t    QuadPart;
 };

Performing arithmetic using the FILETIME structure is tiresome, to say the least. On little-endian systems, the layout is identical to that of ULARGE_INTEGER, so one can cast instances of one type to the other. Hence one can subtract two FILETIME structures by casting them to ULARGE_INTEGER and subtracting the QuadPart members.

FILETIME ft1 = . . .
FILETIME ft2 = . . .
FILETIME ft3;
 
GetFileTime(h1, NULL, NULL, &ft1);
GetFileTime(h2, NULL, NULL, &ft2);
 
// Subtract them - yuck!
reinterpret_cast<ULARGE_INTEGER&>(ft3).QuadPart =
  reinterpret_cast<ULARGE_INTEGER&>(ft1).QuadPart -
  reinterpret_cast<ULARGE_INTEGER&>(ft2).QuadPart;
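
A more portable alternative, sketched below, avoids the casts altogether by assembling and splitting the two 32-bit halves with shifts, which works regardless of byte order or alignment. Microsoft's own FILETIME documentation advises against casting a FILETIME* to ULARGE_INTEGER* because of possible alignment faults on 64-bit Windows. The FILETIME struct here is a stand-in with the same members, and to_uint64(), from_uint64() and subtract() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in with the same members as the Win32 FILETIME (on Windows,
// drop this and include <windows.h>, where the members are DWORDs).
struct FILETIME
{
    std::uint32_t dwLowDateTime;
    std::uint32_t dwHighDateTime;
};

// Hypothetical helpers: assemble and split the two 32-bit halves with
// shifts, so no reinterpret_cast and no reliance on byte order.
inline std::uint64_t to_uint64(FILETIME const &ft)
{
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
}

inline FILETIME from_uint64(std::uint64_t v)
{
    FILETIME ft;
    ft.dwLowDateTime  = static_cast<std::uint32_t>(v & 0xFFFFFFFFu);
    ft.dwHighDateTime = static_cast<std::uint32_t>(v >> 32);
    return ft;
}

inline FILETIME subtract(FILETIME const &a, FILETIME const &b)
{
    return from_uint64(to_uint64(a) - to_uint64(b));
}

void filetime_example()
{
    FILETIME ft1 = { 5u, 2u };          // 2 * 2^32 + 5
    FILETIME ft2 = { 7u, 1u };          // 1 * 2^32 + 7
    FILETIME ft3 = subtract(ft1, ft2);  // 2^32 - 2: needs a borrow
    assert(ft3.dwHighDateTime == 0u && ft3.dwLowDateTime == 0xFFFFFFFEu);
}
```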

This also is pretty tiresome, so I concocted the ULargeInteger class. It supplies various arithmetic operations (see Chapter 29), has a layout compatible with the two structures, and provides an operator &() overload. The operator returns an instance of Address_proxy, whose definition is shown in Listing 26.5:

Listing 26.5

union ULargeInteger
{
private:
  struct Address_proxy
  {
    Address_proxy(void *p)
      : m_p(p)
    {}
    operator LPFILETIME ()
    {
      return reinterpret_cast<LPFILETIME>(m_p);
    }
    operator LPCFILETIME () const;
    operator ULARGE_INTEGER *()
    {
      return reinterpret_cast<ULARGE_INTEGER*>(m_p);
    }
    operator ULARGE_INTEGER const *() const;
  private:
    void  *m_p;
  // Not to be implemented
  private:
    Address_proxy &operator =(Address_proxy const&);
  };
public:
  Address_proxy operator &()
  {
    return Address_proxy(this);
  }
  Address_proxy const operator &() const;
  . . .

It holds a pointer to the ULargeInteger instance for which it acts, and it provides implicit conversions to both FILETIME* and ULARGE_INTEGER*. Since the proxy class is private, and instances of it are only returned from ULargeInteger's address-of operators, it is relatively proof against abuse, though you'd be stuck if you tried to put a ULargeInteger in an STL container. But it considerably eases the burden of using these Win32 structures:

ULargeInteger ft1 = . . .
ULargeInteger ft2 = . . .
 
GetFileTime(h1, NULL, NULL, &ft1);
GetFileTime(h2, NULL, NULL, &ft2);
 
// Subtract them - nice syntax now
ULargeInteger ft3 = ft1 - ft2;
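
The proxy technique itself can be sketched portably, assuming C++11. FileTimeValue is a hypothetical, simplified cousin of ULargeInteger: it stores a FILETIME stand-in directly and its proxy converts only to FILETIME*, which sidesteps the reinterpret_casts; GetFileTimeStub() is a hypothetical stand-in for GetFileTime().

```cpp
#include <cassert>
#include <cstdint>

// Stand-in with the same members as the Win32 FILETIME.
struct FILETIME
{
    std::uint32_t dwLowDateTime;
    std::uint32_t dwHighDateTime;
};

// Hypothetical stand-in for GetFileTime(): writes a given time value.
void GetFileTimeStub(std::uint64_t value, FILETIME *pft)
{
    pft->dwLowDateTime  = static_cast<std::uint32_t>(value);
    pft->dwHighDateTime = static_cast<std::uint32_t>(value >> 32);
}

// A simplified, hypothetical cousin of ULargeInteger: it stores a FILETIME
// and its operator& returns a private proxy that converts to FILETIME*.
class FileTimeValue
{
private:
    struct Address_proxy
    {
        explicit Address_proxy(FILETIME *p) : m_p(p) {}
        operator FILETIME *() const { return m_p; }
    private:
        FILETIME *m_p;
    };

public:
    FileTimeValue() : m_ft() {}

    Address_proxy operator &() { return Address_proxy(&m_ft); }

    std::uint64_t value() const
    {
        return (static_cast<std::uint64_t>(m_ft.dwHighDateTime) << 32) | m_ft.dwLowDateTime;
    }

    friend FileTimeValue operator -(FileTimeValue const &a, FileTimeValue const &b)
    {
        FileTimeValue r;
        std::uint64_t const d = a.value() - b.value();
        r.m_ft.dwLowDateTime  = static_cast<std::uint32_t>(d);
        r.m_ft.dwHighDateTime = static_cast<std::uint32_t>(d >> 32);
        return r;
    }

private:
    FILETIME m_ft;
};

void proxy_address_example()
{
    FileTimeValue ft1;
    FileTimeValue ft2;
    GetFileTimeStub(0x200000005ull, &ft1); // &ft1 -> Address_proxy -> FILETIME*
    GetFileTimeStub(0x100000007ull, &ft2);
    FileTimeValue ft3 = ft1 - ft2;
    assert(ft3.value() == 0xFFFFFFFEull);
}
```

Callers never name Address_proxy; they simply write &ft1 wherever a FILETIME* is expected.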

Notes

[1] Of course, in an ideal world one would only have to read the documentation to understand, and memorably absorb, the fine nuances of the use of libraries such as those mentioned in this section. However, this is anything but the case. Documentation is at least one conceptual step away from the code-face, out-of-date the moment it's written, and difficult to write (either by the author of the code, who knows too much, or by another party, who knows too little). In reality, the code is often the documentation [Glas2003].
[2] This was a definite case of not thinking before coding. The names BSTR and BStr are far too alike, and have caused me no end of bother.
[3] Keeping developers employed in remediation work doesn't count, since they'd be better off working on new projects, as would their employer.
[4] There's another reason to write a book: you get to go through all your own code and learn how much you didn't used to know.

Resources

Matthew Wilson is author of Imperfect C++, which is available on Amazon.com at:
http://www.amazon.com/exec/obidos/ASIN/0321228774/

About the author

Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/.