October 2007 Archives
2007-10-13
std::vector
std::vector<T> is more intrusive than one
might expect. Besides the default and copy constructor
T must have an assignment operator. "So what?", you
think? Well, look at this:
struct Foo {
Foo() : bar(42) { }
const int bar;
};
//...
std::vector<Foo> v;
Foo foo;
v.push_back(foo);
Compiling this program makes the g++ dump this error
error: non-static const member 'const int Foo::bar', can't use default assignment operatorHmm, right. The default assignment operator does a bitwise copy but the const member is not allowed to be overwritten. So the compiler asks you, what to do.
What could be a meaningfull definition of the assignment operator? What if the member const variable differs from the operant's one? I think there is no general answer to this question. But perhaps we can do something if we know which objects are assigned to each other. Let's illuminate the vector class to find out.
class Illuminate {
//... see the Illuminate post in the C++ section
};
int main()
{
std::vector<Illuminate> v;
Illuminate object;
v.push_back(object);
}
Running this program shows:
default constructor 0 copy constructor 1 <- 0 destructor 0 destructor 1Hmm, operator= is not called, altough the compiler was eager to get one. Why is this?
Let's find out what is going on by diving into GNU's libstdc++ source code. In bits/stl_vector.h we find vector::push_back: (reformatted for the sake of readability)
//stl_vector.h
void push_back(const value_type& __x) {
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
this->_M_impl.construct(this->_M_impl._M_finish, __x);
++this->_M_impl._M_finish;
} else {
_M_insert_aux(end(), __x);
}
}
What this code does, is: if the capacity of the vector was not
reached we let the allocator construct a new object at the right
place (copy constructor and placement new, by default) and increase
the size of the vector. No operator= yet. So let's see what
_M_insert_aux does which is in bits/vector.tcc. To get
the point you must know that this function is also called from
iterator insert(iterator pos, const T& x).
void _M_insert_aux(iterator __position, const _Tp& __x) {
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
this->_M_impl.construct(this->_M_impl._M_finish,
*(this->_M_impl._M_finish - 1));
++this->_M_impl._M_finish;
_Tp __x_copy = __x;
std::copy_backward(__position,
iterator(this->_M_impl._M_finish-2),
iterator(this->_M_impl._M_finish-1));
*__position = __x_copy;
} else {
//... reallocate
}
}
This code again checks if the capacity was reached. If not the last
element is copied to the end and the size of the vector is
increased by one. Then all elements from the position of the new
element to the former end are copied one place to the right.
Finally the new element is placed at it's position by overwriting
the old element with the assignment operator. Here it is.
But if you look at the code flow you see that in our case the first branch of the if statement is never executed as the push_back function calls this method only if the condition is false. This means I must define a operator= with unclear semantics just because it is never used. Oh my noodles!
If I can guarante that the assignment operator is never called at runtime I could implement it like
Foo& operator=(const Foo& rhs) {
abort();
return *this;
}
To find out if this abortion will ever be triggered let's modify
the STL by removing the use of the assignment operator when the
compiler complains and recompile the program. By doing so we find
out that the only places where the assignment operator is used in
our case is the one we found above and the implementation of
std::copy_backward which is called from within the
same branch. If we remove the content of this branch from
_M_insert_aux our example compiles.
This means that our example would work with this unusual assignment operator. But generally this is no solution. As soon as we forget about this topic and extend our code to use other vector functions the assignment operator can be triggered.
Please note that this is no GNU specific problem here. The ISO
C++ standard requires that T has an assignment
operator (see section 23.2.4.3). I just showed on the example of
GNU's STL implementation where this can lead to.
I think this is a design bug of the STL. Instead of using the assigment operator one can destroy the old element and use the copy constructor to place the new one. And don't tell me about efficiency. As no memory allocation or deallocation must be done and the assignment opertator typically does the same work as the copy constructor there is only one destructor call extra. Furthermore C++ is full of traps and inconsistencies to get some doubtful efficiency.
Feel free to send me an email if you disagree or see any mistakes.
2007-10-02
Diamonds
Suppose the following class hierachie (check out
how to draw uml diagrams with graph viz. Ok, there are better
ways to draw UML but for the
moment I am a GraphViz fan
;-))
.
The corresponding code looks like this: (If you don't know what virtual inheritance means read this).
class A { };
class B1 : public virtual A { };
class B2 : public virtual A { };
class C : public B1, public B2 { };
Now suppose A has a non-trivial constructor, for instance like this:
class A {
public:
A(int i) { /* ... */ }
};
Because B1 and B2 may disagree on how A must be initialized, A must be initialized from C. This might look like this:
class C {
C() : A(42) { }
};
In my special case, B1 and B2 are meant to be used within the diamond only. But the compiler does not know this. It could be that another compilation unit defines a class which inherits from B1 without creating a diamond. In this case B1 must initialize A. So B1 needs to call A's constructor. This looks like this:
class A {
public:
A(int i) { /* ... */ }
};
class B1 : public virtual A {
public:
B1() : A(0) { }
};
class B2 : public virtual A {
public:
B2() : A(0) { }
};
class C : public B1, public B2 {
public:
C() : A(42) { }
};
Without going into much detail, in the case I encountered there is no meaningful value B1 or B2 could initialize A with. Not only that I have to define an initializer which is never called, I must find some I-don't-really-mean-it value to not initialize A with. What a mess.
Alternatively you could define a default constructor for A which is never called. Perhaps like this:
class A {
A(int i) { /* ... */ }
A() { assert(false); }
};
which is not really better.
Well, what should I say, this is C++...