"Programmers who overload unary operator& should be sentenced to writing libraries that need to operate properly when fed such classes." - Peter Dimov, esteemed Boost member, Boost newsgroup June 2002
Although that's very funny, it's also a pretty strong statement. Why is there such antipathy to the use of this operator?
In this chapter we'll look at some of the problems that doing this can cause. Our solution will be an unexciting one: simply the recommendation that you follow Peter's implicit advice, forswear any use of this overload for shortsighted gains, and so avoid much grief down the line.
One thing I should make clear: any time in this chapter that I refer to operator &(), I mean the unary form, which is the address-of operator. The binary form, which is the bitwise AND operator, is an entirely different beast.
```cpp
class Int
{
  Int operator &(Int const &); // Bitwise operator
  void *operator &();          // Address-of operator
  . . .
};
```
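To make the distinction concrete, here is a small compilable variant of Int. The constructor, the value() accessor and the two helper functions are illustrative additions, not part of the class above. Which overload the compiler picks depends only on arity: a & b selects the bitwise form, &a the address-of form.

```cpp
#include <cassert>
#include <memory> // std::addressof

// Illustrative variant of Int with both forms of operator &.
class Int
{
public:
    explicit Int(int v) : m_v(v) {}

    // Binary form: bitwise AND.
    Int operator &(Int const &rhs) const { return Int(m_v & rhs.m_v); }

    // Unary form: address-of. Returning void* rather than Int* is legal,
    // but it already changes the type of the expression &i.
    void *operator &() { return this; }

    int value() const { return m_v; }

private:
    int m_v;
};

int demo_and(int a, int b)
{
    return (Int(a) & Int(b)).value();   // two operands: bitwise overload
}

bool demo_address()
{
    Int i(7);
    void *p = &i;                       // one operand: address-of overload
    // Int *q = &i;                     // would not compile: &i is a void*
    return p == static_cast<void *>(std::addressof(i));
}
```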
As with most C++ operators, you're free to return anything you like from operator &(). This means that you can alter the value, or the type, or both. This can be a powerful aid in rare circumstances, but it can also cause you a world of trouble.
Standard library allocators provide construct() and destroy() methods, whose canonical definitions are as follows:
```cpp
template <typename T>
struct some_allocator
{
  . . .
  void construct(T* p, T const &x)
  {
    new(p) T(x);
  }
  void destroy(T* p)
  {
    p->~T();
  }
  . . .
};
```
The construct() method is used by containers to construct elements in place, as in
```cpp
template <typename T, . . . >
void list::insert(. . ., T const &x)
{
  . . .
  Node *node = . . .
  get_allocator().construct(&node->value, x);
  . . .
```
If the type you're storing in the list overloads operator &() to return a value that cannot be converted to T*, then this line will fail to compile.
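The failure is easy to reproduce. In the sketch below, Token and Wrapped are hypothetical types, and replace_in_place() stands in for what a container does when it replaces an element. Since C++11 the standard library offers std::addressof (in <memory>), which yields the true address whatever operator & has been overloaded to do. Used in place of the & operator, it lets the canonical construct() and destroy() work again.

```cpp
#include <cassert>
#include <memory>      // std::addressof
#include <new>         // placement new
#include <type_traits> // std::is_same, std::declval

// Hypothetical element type whose unary operator & returns a Token,
// which cannot be converted to Wrapped*.
struct Token { int id; };

class Wrapped
{
public:
    explicit Wrapped(int v) : m_v(v) {}
    Token operator &() const { return Token{m_v}; }
    int value() const { return m_v; }
private:
    int m_v;
};

// Taking the "address" of a Wrapped yields a Token, not a Wrapped*.
static_assert(std::is_same<decltype(&std::declval<Wrapped &>()), Token>::value,
              "unary & has been hijacked");

// Allocator with the canonical construct()/destroy() shown earlier.
template <typename T>
struct some_allocator
{
    void construct(T *p, T const &x) { ::new (static_cast<void *>(p)) T(x); }
    void destroy(T *p) { p->~T(); }
};

// Replace an element in place, much as a container might.
int replace_in_place(int initial, int replacement)
{
    some_allocator<Wrapped> alloc;
    Wrapped w(initial);
    // alloc.destroy(&w);             // would not compile: &w is a Token
    alloc.destroy(std::addressof(w)); // std::addressof bypasses the overload
    alloc.construct(std::addressof(w), Wrapped(replacement));
    return w.value();
}
```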
ATL makes liberal use of operator &. There are quite a number of its classes that overload it, including CComBSTR, CComPtr and CComVariant.
To account for the incompatibility between ATL types and STL containers, the designers of ATL introduced the CAdapt template, which attempts to solve the problem by containing an instance of its parameterising type. It then provides implicit conversion operators and comparison operations to allow it to be used in place of its parameterising type. Because CAdapt<T> does not overload operator &(), it can be used to mask the overload for any T that does.
```cpp
template <typename T>
class CAdapt
{
public:
  CAdapt();
  CAdapt(const T& rSrc);
  CAdapt(const CAdapt& rSrCA);
  CAdapt &operator =(const T& rSrc);
  bool operator <(const T& rSrc) const
  {
    return m_T < rSrc;
  }
  bool operator ==(const T& rSrc) const;
  operator T&()
  {
    return m_T;
  }
  operator const T&() const;

  T m_T;
};
```
Unfortunately, this is just a sticking plaster on a broken arm. As we saw in Chapter 23, templates that inherit from their parameterising type have a great deal of trouble in unambiguously providing access to the requisite constructors of their parent class. The same problem exists for types such as CAdapt, which enhance their parameterising type via containment rather than inheritance. All the constructors of T, except the default and copy constructors, are inaccessible. This clutters your code, reduces the applicability of generic algorithms, and prevents the use of RAII (see Section 3.5).
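Both the masking and the constructor limitation show up in the minimal sketch below. Hijacker is a hypothetical type standing in for an ATL wrapper, and adapt<T> is a cut-down CAdapt-style adaptor written for illustration, not ATL's code. Taking the address of an adapt<T> gives a genuine adapt<T>*. However, Hijacker's two-argument constructor is not forwarded, so every element has to be built as a temporary Hijacker and copied in.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical type with a hijacked operator &, standing in for an ATL wrapper.
class Hijacker
{
public:
    Hijacker() : m_s() {}
    Hijacker(char const *s, std::size_t n) : m_s(s, n) {} // a non-copy constructor
    std::string *operator &() { return &m_s; }            // returns the inner string
    std::string const &str() const { return m_s; }
private:
    std::string m_s;
};

// Minimal CAdapt-style adaptor: contains a T, does not overload operator &,
// and converts implicitly to T&.
template <typename T>
class adapt
{
public:
    adapt() : m_T() {}
    adapt(T const &rhs) : m_T(rhs) {}
    operator T &() { return m_T; }
    operator T const &() const { return m_T; }
    T m_T;
};

bool address_is_real()
{
    adapt<Hijacker> a;
    adapt<Hijacker> *p = &a; // built-in &: adapt has no overload
    return static_cast<void *>(p) == static_cast<void *>(std::addressof(a));
}

std::size_t total_length()
{
    std::vector<adapt<Hijacker> > v;
    // v.emplace_back("abc", 3);     // no adapt(char const *, size_t): T's other
    //                               // constructors are not forwarded
    v.push_back(Hijacker("abc", 3)); // so build a temporary T and copy it in
    v.push_back(Hijacker("de", 2));
    std::size_t n = 0;
    for (std::size_t i = 0; i != v.size(); ++i)
    {
        Hijacker const &h = v[i];    // implicit conversion back to T
        n += h.str().size();
    }
    return n;
}
```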
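One way to get at the true address of such an object is to cast it to a reference to a byte type and take the address of that, which the overload cannot intercept:

```cpp
typedef unsigned char byte_t; // assumed definition: any one-byte type will do

template<typename T>
T *get_real_address(T &t)
{
  return reinterpret_cast<T*>(&reinterpret_cast<byte_t &>(t));
}
```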
There are other complications, to account for const and/or volatile, but that's the essence of it. The Boost libraries have a nifty addressof() function, which takes account of all the issues.
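A cv-aware version of the shim can be sketched as follows. Casting to a reference to const volatile char drops neither qualifier; const_cast then strips them so the built-in & can be applied, and the final cast restores T, including any const or volatile that T carries. This follows the technique long used by Boost's addressof(), which C++11 standardised as std::addressof in <memory>. It still relies on reinterpret_cast, so the portability caveat discussed next applies. Sneaky and shim_matches_std() are hypothetical names used only for the demonstration.

```cpp
#include <cassert>
#include <memory> // std::addressof

// cv-aware "real address" shim in the spirit of get_real_address().
template <typename T>
T *real_address(T &t)
{
    return reinterpret_cast<T *>(
        &const_cast<char &>(
            reinterpret_cast<char const volatile &>(t)));
}

// Hypothetical type whose operator & lies about its address.
struct Sneaky
{
    int x;
    Sneaky *operator &() const { return nullptr; }
};

bool shim_matches_std()
{
    Sneaky s = {1};
    Sneaky const cs = {2};                         // T deduced as Sneaky const
    return &s == nullptr                           // the overload lies...
        && real_address(s) == std::addressof(s)    // ...the shim does not
        && real_address(cs) == std::addressof(cs);
}
```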
But the use of reinterpret_cast is cause for some concern. The standard (C++-98: 5.2.10;3) says: "The mapping performed . . . is implementation-defined. [Note: it is intended to be unsurprising to those who know the addressing structure of the underlying machine]". Since the result may conceivably not be valid, it's not possible to claim that this technique is truly portable. However, it's also pretty hard to imagine a compiler that would not perform the expected conversion.
We can now sidestep types with pathological operator &() overloads, but only by peppering all our code with calls to the real address shim. That's ugly, and its correctness is implementation-defined. Do you want to use a standard library with myriad reinterpret_casts?
Since it's a function like any other, the operator &() overload can do things other than simply return a converted value. This has serious consequences.
operator&() breaks encapsulation.
That's a bold statement. Let me illustrate why it is so.
As I've mentioned already, ATL has a large number of wrapper classes that overload operator &(). Unfortunately, their implementations have differing semantics. The types shown in Table 26.1 all have an assertion in the operator method to ensure that the current value is NULL.
| Wrapper Classes | operator&() Return Type |
|---|---|
| CComTypeAttr | `TYPEATTR**` |
| CComVarDesc | `VARDESC**` |
| CComFuncDesc | `FUNCDESC**` |
| CComPtr / CComQIPtr | `T**` |
| CHeapPtr | `T**` |

Table 26.1
Don't worry about the specifics of the types TYPEATTR, VARDESC and FUNCDESC; they're POD open type structures (see Section 4.4) used for manipulating COM metadata. The important thing to note is that they have allocated resources associated with them but do not provide value semantics, which means that they must be managed carefully in order to prevent resource leaks or the use of dangling pointers.
The operator is overloaded in the wrapper classes to allow these types to be used with COM API functions that manipulate the underlying types, and thereby to be initialised. Of course, it's not initialisation as we RAII-phile C++ types know and love it, but it is initialisation, because the assertion means that any subsequent attempt to repeat the process will result in an error, in debug mode at least. I'll leave it up to you to decide whether that, in and of itself, is a good way to design wrapper classes, but you can see that you are required to look inside the library to see what is going on. After all, it's using an overloaded operator, not calling a function named get_one_time_content_pointer()[1].
The widely used CComBSTR class, which wraps the COM BSTR type, also overloads operator &() to return BSTR*, but it does not have an assertion. By contra-implication, we assume that this means that it's OK to take the address of a CComBSTR multiple times and, since the operator is non-const, that we can make multiple modifying manipulations to the encapsulated BSTR without ill effect. Alas, this is not the case. CComBSTR can be made to leak memory with ease:
```cpp
void SetBSTR(char const *str, BSTR *pbstr);

CComBSTR bstr;

SetBSTR("Doctor", &bstr);  // All ok so far
SetBSTR("Proctor", &bstr); // "Doctor" is now lost forever!
```
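The leak can be observed without COM at all. The sketch below is built from hypothetical stand-ins, not ATL or COM code: new[] plays the role of BSTR allocation, SetStr() has the shape of SetBSTR(), and Str behaves like CComBSTR in that its operator & hands out the raw slot with neither an assertion nor a release. A counter of live strings shows what is left behind once the wrapper has been destroyed.

```cpp
#include <cassert>
#include <cstring>

static int g_live = 0; // strings currently allocated

char *alloc_str(char const *s)
{
    char *p = new char[std::strlen(s) + 1];
    std::strcpy(p, s);
    ++g_live;
    return p;
}

void free_str(char *p)
{
    if (p != nullptr)
    {
        delete [] p;
        --g_live;
    }
}

// Like SetBSTR(): writes a fresh string into *pstr, ignoring what was there.
void SetStr(char const *s, char **pstr)
{
    *pstr = alloc_str(s);
}

// CComBSTR-like wrapper: operator & neither asserts nor releases.
class Str
{
public:
    Str() : m_s(nullptr) {}
    ~Str() { free_str(m_s); }
    Str(Str const &) = delete;
    Str &operator =(Str const &) = delete;

    char **operator &() { return &m_s; }

private:
    char *m_s;
};

// Returns how many strings remain allocated after the wrapper has gone.
int leaked_after_two_sets()
{
    int const before = g_live;
    {
        Str s;
        SetStr("Doctor", &s);  // all ok so far
        SetStr("Proctor", &s); // "Doctor" is now unreachable
    }                          // ~Str() frees only "Proctor"
    return g_live - before;
}
```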
We can surmise that CComBSTR does not assert because doing so proved too inconvenient. For example, it is not uncommon in COM to see an API function or interface method that takes an array of BSTR. Putting aside the issue of passing arrays of derived types (see Sections 14.5 and 33.4), we might wish to use our CComBSTR when we're passing only one string.
An alternative strategy is to release the encapsulated resource within the operator &() method. This is the approach of another popular Microsoft COM wrapper class, the Visual C++ _com_ptr_t template. The downside of this approach is that the wrapper is subject to premature release on those occasions when you need to pass a pointer to the encapsulated resource to a function that will merely use it, rather than destroy it or remove it from your wrapper. You may think that you can solve this by declaring const and non-const overloads of operator &(), as in Listing 26.2.
```cpp
template <typename T>
class X
{
  . . .
  T const *operator &() const
  {
    return &m_t;
  }
  T *operator &()
  {
    Release(m_t);
    m_t = T();
    return &m_t;
  }
  . . .
```
Unfortunately, this won't help, because the compiler selects the overload appropriate to the const-ness of the instance on which it's to be called, rather than on the use one might be making of the returned value. Even if you pass the address of a non-const X<T> instance to a function that takes T const *, the non-const overload will be called.
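The sketch below makes Listing 26.2 concrete for T = int so that the overload selection can be observed. Release(), the release counter and peek() are hypothetical stand-ins; peek() only reads through a pointer to const. Even so, applying & to a non-const X<int> selects the destructive overload, and only an explicit const reference reaches the benign one.

```cpp
#include <cassert>

static int g_releases = 0; // counts releases of non-zero handles

// Hypothetical resource release: counts instead of freeing.
inline void Release(int &h)
{
    if (h != 0)
    {
        ++g_releases;
        h = 0;
    }
}

template <typename T>
class X
{
public:
    explicit X(T t) : m_t(t) {}
    ~X() { Release(m_t); }
    T const *operator &() const { return &m_t; }
    T *operator &() { Release(m_t); m_t = T(); return &m_t; }
private:
    T m_t;
};

// Merely reads through a pointer to const.
inline int peek(int const *p) { return *p; }

int observe_via_nonconst(int initial)
{
    X<int> x(initial);
    return peek(&x);  // non-const overload chosen: value released first
}

int observe_via_const(int initial)
{
    X<int> x(initial);
    X<int> const &cx = x;
    return peek(&cx); // const overload chosen: value intact
}
```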
To me, all this stuff is so overwhelmingly nasty that I stopped using any such classes a long time ago. Now I like to use explicitly named methods and/or shims to save me from all the uncertainty. For example, I use the sublimely named[2] BStr class to wrap BSTR. It provides the DestructiveAddress() and NonDestructiveAddress() methods, which, though profoundly ugly, don't leave anyone guessing as to what's going on.
Another source of abuse in overloading operator &() is the type it returns. Since we can make it return anything, it's easy to have it return something bad; naturally, this is true of any operator.
We saw in Chapter 14 some of the problems attendant on passing arrays of inherited types to functions that take pointers to the base type. Overloading operator &() adds another dimension to that nasty problem. Consider the following types:
```cpp
struct THING
{
  int i;
  int j;
};

struct Thing
{
  THING thing;
  int   k;

  THING       *operator &()
  {
    return &thing;
  }
  THING const *operator &() const;
};
```
Now we're in the same position we would be in if Thing inherited publicly from THING.
```cpp
void func(THING *things, size_t cThings);

Thing things[10];

func(&things[0], dimensionof(things)); // Oops!!
```
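The damage comes from the stride. A function that receives a THING* steps through memory in sizeof(THING) increments, but the array is laid out in sizeof(Thing) increments. The sketch below only compares addresses and never dereferences the mis-strided pointer. It shows that where func() would look for its second element is not where the second Thing's THING actually lives. The helper strides_disagree() is hypothetical.

```cpp
#include <cassert>

struct THING { int i; int j; };

struct Thing
{
    THING thing;
    int   k;
    THING       *operator &()       { return &thing; }
    THING const *operator &() const { return &thing; }
};

// A Thing is bigger than a THING, so the strides cannot match.
static_assert(sizeof(Thing) > sizeof(THING), "Thing adds k to THING");

bool strides_disagree()
{
    Thing things[10] = {};
    THING *p = &things[0]; // compiles: the overload hands out a THING*
    // Where a THING*-based func() would look for element 1...
    char const *assumed = reinterpret_cast<char const *>(p + 1);
    // ...versus where the second Thing's THING really is.
    char const *actual = reinterpret_cast<char const *>(&things[1].thing);
    return assumed != actual;
}
```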
By providing the operator &() overloads for "convenience", we've exposed ourselves to abuse of the Thing type. I'm not going to suggest the application of any of the measures described in Chapter 14 here, because I think overloading operator &() is just a big no-no.
A truly bizarre confluence of factors arises when the operator is destructive (it releases the resources) and you are passing an array of (even correctly sized) wrapper class instances to a function, as in Listing 26.4.
```cpp
struct ANOTHER { . . . };

void func(ANOTHER *things, size_t cThings);

inline void func(array_proxy<ANOTHER> const &things)
{
  func(things.base(), things.size());
}

class Another
{
public:
  ANOTHER *operator &()
  {
    ReleaseAndReset(m_another);
    return &m_another;
  }
private:
  ANOTHER m_another;
};
```
Let's assume you're on your best behaviour, and are using an array_proxy (see Section 14.5.5) and translator method to ensure that ANOTHER and Another can be used together.
```cpp
Another things[5];

. . . // Modify things

func(things); // sizeof(ANOTHER) must == sizeof(Another)
```
Irrespective of the semantics of func(), calling the function will reset things[0] and leave things[1] to things[4] unaffected. This is because the array constructor of array_proxy uses explicit array subscript syntax, as all good array manipulation code should, and so applies the operator to the first element only. If you were to do it manually, you'd still need to apply the operator, unless Another inherited publicly from ANOTHER and you called the two-parameter version of func() and relied on array decay.
If func() does not change the contents of the array passed to it, then this supposedly benign call has the nasty side effect of destroying the first element passed to it. If func() modifies the contents of the array, then things[1] to things[4] are subject to resource leaks, as their contents prior to the call are simply overwritten by func().
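The read-only case can be sketched as follows. Everything here is a hypothetical stand-in: ANOTHER holds a single int "resource", ReleaseAndReset() counts resets, and array_proxy is reduced to the one constructor that matters (the real one in Section 14.5.5 is richer). func() is deliberately read-only and inspects only the first slot, which keeps the sketch free of mixed-type pointer arithmetic. It still finds that slot already reset, purely because the proxy's constructor evaluated &ar[0].

```cpp
#include <cassert>
#include <cstddef>

struct ANOTHER { int resource; };

static int g_resets = 0;
inline void ReleaseAndReset(ANOTHER &a) { ++g_resets; a.resource = 0; }

// Minimal array_proxy: built from a real array via explicit subscripting.
template <typename T>
class array_proxy
{
public:
    template <typename U, std::size_t N>
    array_proxy(U (&ar)[N])
        : m_base(&ar[0]) // unary & is applied to element 0 only
        , m_size(N)
    {
        static_assert(sizeof(U) == sizeof(T), "element sizes must match");
    }
    T *base() const { return m_base; }
    std::size_t size() const { return m_size; }
private:
    T           *m_base;
    std::size_t  m_size;
};

class Another
{
public:
    explicit Another(int r = 1) { m_another.resource = r; }
    ANOTHER *operator &() { ReleaseAndReset(m_another); return &m_another; }
private:
    ANOTHER m_another;
};

// A read-only func(): reports the first resource it is shown.
int func(ANOTHER *things, std::size_t cThings)
{
    return cThings != 0 ? things[0].resource : 0;
}

inline int func(array_proxy<ANOTHER> const &things)
{
    return func(things.base(), things.size());
}

int first_seen()
{
    Another things[5];   // each holds resource == 1
    return func(things); // converts via array_proxy<ANOTHER>
}
```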
I hope I've managed to convince you that Peter was spot on. Overloading operator &() is just far too much trouble. Considering the amount of coding time, thinking time and debugging time that is expended trying to understand and work with libraries that use it, I struggle to imagine how using it helps the software engineering community[3].
In short, don't do it. In grepping through my source databases at the time of writing, I found eleven uses of it. Of the three that were used in "proper" classes (i.e. those that are not utility or meta-programming classes), I can probably truly justify only one of them. I removed two immediately[4]. The third I cannot justify, but I'm keeping it for reasons of expediency. For grins, I'll describe it in the following sub-section.
operator &().
The Win32 API defines many non-standard basic structures, oftentimes for closely related types. Further, since many Win32 compilers did not provide 64-bit integers in the early years of the operating system, there are several 64-bit structures that filled the gap. Two such structures are ULARGE_INTEGER and FILETIME. Their definitions are as follows:
```cpp
struct FILETIME
{
  uint32_t dwLowDateTime;
  uint32_t dwHighDateTime;
};

union ULARGE_INTEGER
{
  struct
  {
    uint32_t LowPart;
    uint32_t HighPart;
  };
  uint64_t QuadPart;
};
```
Performing arithmetic using the FILETIME structure is tiresome, to say the least. On little-endian systems, its layout is identical to that of ULARGE_INTEGER, so instances of one type can be cast to the other. Hence one can subtract two FILETIME structures by casting them to ULARGE_INTEGER and subtracting the QuadPart members.
```cpp
FILETIME ft1 = . . .
FILETIME ft2 = . . .
FILETIME ft3;

GetFileTime(h1, NULL, NULL, &ft1);
GetFileTime(h2, NULL, NULL, &ft2);

// Subtract them - yuck!
reinterpret_cast<ULARGE_INTEGER&>(ft3).QuadPart =
    reinterpret_cast<ULARGE_INTEGER&>(ft1).QuadPart -
    reinterpret_cast<ULARGE_INTEGER&>(ft2).QuadPart;
```
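For comparison, the same subtraction can be done without any casting by combining the two 32-bit halves with shifts. This makes no assumptions about endianness, aliasing or alignment. Microsoft's own FILETIME documentation advises against casting a FILETIME* to ULARGE_INTEGER* because of possible alignment faults on 64-bit Windows. The sketch uses a local stand-in struct with FILETIME's members so that it compiles anywhere; on Windows you would use the real FILETIME. The helper names to_u64(), from_u64() and subtract() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

namespace standin
{
    // Same members as the Win32 FILETIME.
    struct FILETIME
    {
        std::uint32_t dwLowDateTime;
        std::uint32_t dwHighDateTime;
    };

    inline std::uint64_t to_u64(FILETIME const &ft)
    {
        return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
    }

    inline FILETIME from_u64(std::uint64_t v)
    {
        FILETIME ft;
        ft.dwLowDateTime  = static_cast<std::uint32_t>(v & 0xFFFFFFFFu);
        ft.dwHighDateTime = static_cast<std::uint32_t>(v >> 32);
        return ft;
    }

    // ft1 - ft2, with any borrow carried across the two halves.
    inline FILETIME subtract(FILETIME const &ft1, FILETIME const &ft2)
    {
        return from_u64(to_u64(ft1) - to_u64(ft2));
    }
}
```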
This too is pretty tiresome, so I concocted the ULargeInteger class. It supplies various arithmetic operations (see Chapter 29), has a layout compatible with the two structures, and provides an operator &() overload. The operator returns an instance of Address_proxy, whose definition is shown in Listing 26.5:
```cpp
union ULargeInteger
{
private:
  struct Address_proxy
  {
    Address_proxy(void *p)
      : m_p(p)
    {}
    operator LPFILETIME ()
    {
      return static_cast<LPFILETIME>(m_p);
    }
    operator LPCFILETIME () const;
    operator ULARGE_INTEGER *()
    {
      return static_cast<ULARGE_INTEGER*>(m_p);
    }
    operator ULARGE_INTEGER const *() const;
  private:
    void *m_p;
  // Not to be implemented
  private:
    Address_proxy &operator =(Address_proxy const&);
  };
public:
  Address_proxy operator &()
  {
    return Address_proxy(this);
  }
  Address_proxy const operator &() const;
  . . .
```
It holds a pointer to the ULargeInteger instance for which it acts, and it provides implicit conversions to both FILETIME* and ULARGE_INTEGER*. Since the proxy class is private, and instances of it are only returned from ULargeInteger's address-of operators, it is relatively proof against abuse, though you'd be stuck if you tried to put a ULargeInteger in an STL container. But it considerably eases the burden of using these Win32 structures:
```cpp
ULargeInteger ft1 = . . .
ULargeInteger ft2 = . . .

GetFileTime(h1, NULL, NULL, &ft1);
GetFileTime(h2, NULL, NULL, &ft2);

// Subtract them - nice syntax now
ULargeInteger ft3 = ft1 - ft2;
```
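A portable, simplified sketch of the proxy idea follows. It differs from Listing 26.5 in several stated ways. It is a class wrapping a FILETIME stand-in rather than a union, its proxy converts only to FILETIME*, and it avoids reinterpret_cast. get_time() is a hypothetical stand-in for GetFileTime() that writes a given value through an out-pointer. The private Address_proxy can still be used outside the class, because access control applies to naming the type, not to using a value of it.

```cpp
#include <cassert>
#include <cstdint>

namespace standin
{
    struct FILETIME
    {
        std::uint32_t dwLowDateTime;
        std::uint32_t dwHighDateTime;
    };

    // Stand-in for GetFileTime(): writes t through an out-pointer.
    inline void get_time(std::uint64_t t, FILETIME *out)
    {
        out->dwLowDateTime  = static_cast<std::uint32_t>(t);
        out->dwHighDateTime = static_cast<std::uint32_t>(t >> 32);
    }

    class ULargeInteger
    {
    private:
        // Only obtainable via operator &; converts implicitly to FILETIME*.
        class Address_proxy
        {
        public:
            explicit Address_proxy(FILETIME *p) : m_p(p) {}
            operator FILETIME *() const { return m_p; }
        private:
            FILETIME *m_p;
        };
    public:
        ULargeInteger() : m_ft() {}
        Address_proxy operator &() { return Address_proxy(&m_ft); }
        std::uint64_t value() const
        {
            return (static_cast<std::uint64_t>(m_ft.dwHighDateTime) << 32) | m_ft.dwLowDateTime;
        }
        friend ULargeInteger operator -(ULargeInteger const &a, ULargeInteger const &b)
        {
            ULargeInteger r;
            get_time(a.value() - b.value(), &r); // &r goes through the proxy too
            return r;
        }
    private:
        FILETIME m_ft;
    };
}

std::uint64_t elapsed(std::uint64_t t1, std::uint64_t t2)
{
    standin::ULargeInteger ft1, ft2;
    standin::get_time(t1, &ft1); // &ft1 -> Address_proxy -> FILETIME*
    standin::get_time(t2, &ft2);
    standin::ULargeInteger ft3 = ft1 - ft2;
    return ft3.value();
}
```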
[2] BSTR and BStr are far too alike, and have caused me no end of bother.

Matthew Wilson is the author of Imperfect C++, which is available on Amazon.com at:
http://www.amazon.com/exec/obidos/ASIN/0321228774/
Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is the author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/.