Translation of the article “Pimp My Pimpl”, part 1

The first part of the article discusses the classic Pimpl idiom (pointer-to-implementation), shows its advantages, and looks at further idioms that build on it. The second part will focus on how to reduce the disadvantages that inevitably arise when using Pimpl.

Links to the original

This is a translation of the first part of the article from the Heise Developer website. The originals of both parts are here: part 1, part 2.
The translation was made from the English translation available here.

Abstract

Much has been written about the oddly named Pimpl idiom. Heise Developer covers some aspects of this practical construct that go beyond the classic technique.

Classic idiom

Every C++ programmer has probably come across a class definition like this:

    class Class {
        // ...
    private:
        class Private; // forward declaration
        Private *d;    // d-pointer: pointer to the implementation
    };

Here, the data members of class Class have been moved into the nested class Class::Private. Instances of Class contain only the pointer d to a Class::Private object.

To understand why the author of the class went to such lengths to hide things, you need to step back and look at the C++ module system. Unlike many other languages, C++, as a descendant of C, has no built-in support for modules (such support was proposed for C++0x but was not included in the final standard). Instead, the declarations of a module's functions (but usually not their definitions) are placed in header files, which are made available to other modules through the #include preprocessor directive. This gives header files a double role: on the one hand, they are the interface of the module; on the other, they are where implementation details may have to be declared.

In C, this approach worked well: the implementation details of functions were fully encapsulated by separating declaration from definition, and structures could either be only forward-declared (in which case they are private) or defined directly in the header file (in which case they are public). In object-oriented C, the class Class above might look like this:

    struct Class;                           // forward declaration
    typedef struct Class * Class_t;         // -> used in name only
    void Class_new(Class_t *cls);           // Class::Class()
    void Class_release(Class_t cls);        // Class::~Class()
    int Class_f(Class_t cls, double num);   // int Class::f(double)
    // ...

Unfortunately, this does not work in C++. Methods must be declared inside the class, and since classes without methods would be of little use, C++ header files usually contain class definitions. Because a class body, unlike a namespace, cannot be reopened, the header file must contain all declarations (data members and methods):

    class Class {
    public:
        // ... public methods ... ok
    private:
        // ... private data and methods ... implementation details, not wanted here
    };

The problem is obvious: the module interface (the header file) necessarily contains implementation details, which is bad practice. Hence a rather crude trick is used: all implementation details (data members and private methods) are moved into a separate class:

    // --- class.h ---
    class Class {
    public:
        Class();
        ~Class();
        // ... public methods ...
        void f(double n);
    private:
        class Private;
        Private *d;
    };

    // --- class.cpp ---
    #include "class.h"

    class Class::Private {
    public:
        // ... private data and methods ...
        bool canAcceptN(double num) const { return num != 0; }
        double n;
    };

    Class::Class() : d(new Private) {}
    Class::~Class() { delete d; }

    void Class::f(double n) {
        if (d->canAcceptN(n))
            d->n = n;
    }

Since Class::Private is used only in the declaration of a pointer variable, i.e. “in name only” (Lakos) rather than “in size”, a forward declaration is sufficient, just as in plain C. All methods of Class now access the private methods and data members of Class::Private only through the d member.
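The “name only, not size” point can be checked directly. The following single-file sketch merges what would normally be class.h and class.cpp; the value() accessor and the pimplDemo() function are hypothetical additions made only so that the effect is observable, and the static_assert assumes a mainstream platform on which all object pointers have the same size.

```cpp
#include <cassert>

// --- what would normally live in class.h ---
class Class {
public:
    Class();
    ~Class();
    Class(const Class &) = delete;            // copying is left out of this sketch
    Class &operator=(const Class &) = delete;
    void f(double n);
    double value() const;   // hypothetical accessor, added for the demo
private:
    class Private;   // forward declaration: still an incomplete type here
    Private *d;      // allowed: a pointer needs the name, not the size
};

// The layout of Class is already fixed, although Private is not defined yet.
static_assert(sizeof(Class) == sizeof(void *),
              "Class holds exactly one pointer");

// --- what would normally live in class.cpp ---
class Class::Private {
public:
    bool canAcceptN(double num) const { return num != 0; }
    double n = 0.0;
    // members added here later do not change sizeof(Class)
};

Class::Class() : d(new Private) {}
Class::~Class() { delete d; }
void Class::f(double n) { if (d->canAcceptN(n)) d->n = n; }
double Class::value() const { return d->n; }

// Usage sketch: callers only ever see the public interface.
void pimplDemo() {
    Class c;
    c.f(2.5);
    assert(c.value() == 2.5);
    c.f(0.0);                   // rejected by canAcceptN(), value unchanged
    assert(c.value() == 2.5);
}
```

Because the size of Class does not depend on the contents of Class::Private, code compiled against the header keeps working when the private part changes.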

Thus we get the convenience of fully encapsulated modules in C++. Because of the added indirection, these benefits are paid for with an extra memory allocation (new Class::Private), indirect access to data members, and the complete loss of inline methods (at least in the public section). As the second part of the article will show, the semantics of const methods change as well.

Before the second part of this article turns to fixing, or at least alleviating, these shortcomings, let us look at the benefits of the idiom.

Advantages of the Pimpl Idiom

The benefits of using Pimpl are substantial. By encapsulating all implementation details, we get a thin and long-term stable interface (header file). Thin means an easily readable class definition; stable means that binary compatibility is preserved even after significant changes to the implementation.

For example, Nokia's Qt Development Frameworks department (formerly Trolltech) made profound changes to widget rendering at least twice during the development of the Qt 4 class library, without applications using Qt 4 having to be relinked.

Do not underestimate the significant build speed-up that the Pimpl idiom brings, especially in large projects. Builds get faster because header files contain fewer #include directives and because the header files of Pimpl classes change far less often. In “Exceptional C++”, Herb Sutter reports a regular doubling of compilation speed, and John Lakos even claims build times reduced by two orders of magnitude.

Another advantage of using Pimpl: classes with d-pointers lend themselves well to transactional, exception-safe code. For example, a developer can use the copy-and-swap idiom (Sutter, Alexandrescu, “C++ Coding Standards”, Item 56) to create a transactional (all-or-nothing) copy assignment operator:

    class Class {
        // ...
        void swap(Class &other) {
            std::swap(d, other.d);
        }
        Class &operator=(const Class &other) {
            // this may fail, but does not change *this
            Class copy(other);
            // this cannot fail; it commits the change to *this
            swap(copy);
            return *this;
        }

The implementation of the move operations in C++0x is trivial (and, in particular, identical for all Pimpl classes):

        // C++0x move operations:
        Class(Class &&other) : d(other.d) { other.d = 0; }
        Class &operator=(Class &&other) {
            std::swap(d, other.d);
            return *this;
        }
        // ...
    };

In this model, the swap function and the assignment operators can be implemented inline without compromising the encapsulation of the class; developers can put this to good use.
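The fragments above leave out the copy constructor and the private part. A complete single-file sketch might look as follows; the name member, the setName()/name() accessors and copySwapDemo() are hypothetical and exist only to make the behaviour observable. Note that swap, both assignment operators and the move constructor stay inline although Class::Private is only forward-declared at that point.

```cpp
#include <cassert>
#include <string>
#include <utility>

// --- header part: Private is only forward-declared ---
class Class {
public:
    Class();
    Class(const Class &other);   // deep copy, defined in the "cpp part"
    ~Class();

    void swap(Class &other) noexcept { std::swap(d, other.d); }

    Class &operator=(const Class &other) {
        Class copy(other);   // may throw, but leaves *this untouched
        swap(copy);          // cannot throw: commits the new state
        return *this;        // the destructor of copy releases the old state
    }

    Class(Class &&other) noexcept : d(other.d) { other.d = nullptr; }
    Class &operator=(Class &&other) noexcept {
        std::swap(d, other.d);
        return *this;
    }

    // Hypothetical accessors; they must not be called on a moved-from object.
    void setName(const std::string &s);
    std::string name() const;

private:
    class Private;
    Private *d;
};

// --- cpp part ---
class Class::Private {
public:
    std::string name;
};

Class::Class() : d(new Private) {}
Class::Class(const Class &other) : d(new Private(*other.d)) {}
Class::~Class() { delete d; }   // deleting a null d (moved-from) is a no-op
void Class::setName(const std::string &s) { d->name = s; }
std::string Class::name() const { return d->name; }

void copySwapDemo() {
    Class a; a.setName("alpha");
    Class b; b.setName("beta");
    b = a;                          // copy assignment via copy-and-swap
    a.setName("changed");
    assert(b.name() == "alpha");    // b owns its own copy
    Class c(std::move(a));          // a gives up its Private object
    assert(c.name() == "changed");
    b = std::move(c);               // move assignment swaps the d-pointers
    assert(b.name() == "changed");
    assert(c.name() == "alpha");
}
```

In this sketch a moved-from object holds a null d-pointer, so it may only be destroyed or assigned to; whether that is acceptable is a design decision for the concrete class.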

Advanced Composition Methods

The last advantage of Pimpl worth mentioning is that it can reduce the number of additional dynamic memory allocations through direct aggregation of data members. Without Pimpl, members are usually aggregated by pointer in order to decouple classes from one another (in effect, a Pimpl per data member). By applying Pimpl once to the class as a whole, it is no longer necessary to hold private data of complex types by pointer only.

For example, the idiomatic Qt dialog class

    class QLineEdit;
    class QLabel;

    class MyDialog : public QDialog {
        // ...
    private:
        // idiomatic Qt:
        QLabel *m_loginLB;
        QLineEdit *m_loginLE;
        QLabel *m_passwdLB;
        QLineEdit *m_passwdLE;
    };

turns into

    #include <QLabel>
    #include <QLineEdit>

    class MyDialog::Private {
        // ...
        // not idiomatic Qt, but fewer heap allocations:
        QLabel loginLB;
        QLineEdit loginLE;
        QLabel passwdLB;
        QLineEdit passwdLE;
    };

Qt experts may object that the QDialog destructor already destroys its child widgets, so direct aggregation would lead to their being destroyed twice. Indeed, this technique carries the risk of sequencing errors in memory management (double deletion, use after free, etc.), especially if the data members are also children of the class or vice versa. However, the transformation shown is safe in this case, since Qt always allows children to be deleted before their parents.

This approach is especially effective when the data members aggregated in this way are themselves instances of Pimpl classes. This is exactly the case in the last example, where using Pimpl saves four dynamic memory allocations of size sizeof(void*) at the cost of only one additional (larger) allocation. This can lead to more efficient use of the heap, since small allocations typically carry a large overhead in the allocator.
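The allocation arithmetic can be reproduced without Qt. In the sketch below, Widget is a hypothetical stand-in for a Pimpl class such as QLabel, and Counted is hypothetical instrumentation that counts heap allocations through a class-specific operator new; std::unique_ptr replaces Qt's parent-child ownership. None of these names come from the original article.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical instrumentation: counts heap allocations of derived classes.
struct Counted {
    static int allocations;
    static void *operator new(std::size_t n) {
        ++allocations;
        return ::operator new(n);
    }
    static void operator delete(void *p) noexcept { ::operator delete(p); }
};
int Counted::allocations = 0;

// Hypothetical stand-in for a Pimpl'ed widget class.
class Widget : public Counted {
public:
    Widget();
    ~Widget();
    Widget(const Widget &) = delete;
    Widget &operator=(const Widget &) = delete;
private:
    class Private;
    Private *d;
};
class Widget::Private : public Counted {
public:
    int state = 0;
};
Widget::Widget() : d(new Private) {}
Widget::~Widget() { delete d; }

// Aggregation by pointer: every member costs two allocations.
class PointerDialog {
    std::unique_ptr<Widget> m_loginLB{new Widget};
    std::unique_ptr<Widget> m_loginLE{new Widget};
    std::unique_ptr<Widget> m_passwdLB{new Widget};
    std::unique_ptr<Widget> m_passwdLE{new Widget};
};

// Pimpl for the whole class: the members live inside one Private object.
class PimplDialog {
public:
    PimplDialog();
    ~PimplDialog();
    PimplDialog(const PimplDialog &) = delete;
    PimplDialog &operator=(const PimplDialog &) = delete;
private:
    class Private;
    Private *d;
};
class PimplDialog::Private : public Counted {
public:
    Widget loginLB, loginLE, passwdLB, passwdLE;
};
PimplDialog::PimplDialog() : d(new Private) {}
PimplDialog::~PimplDialog() { delete d; }

// Number of counted allocations caused by constructing one Dialog.
template <typename Dialog>
int countAllocations() {
    const int before = Counted::allocations;
    { Dialog dlg; }
    return Counted::allocations - before;
}

void allocationDemo() {
    assert(countAllocations<PointerDialog>() == 8);  // 4 widgets + 4 privates
    assert(countAllocations<PimplDialog>() == 5);    // 1 private + 4 privates
}
```

The four small Widget allocations disappear and one larger allocation for PimplDialog::Private is added, which matches the count given in the text.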

In addition, this approach gives the compiler a much better chance to devirtualize calls to virtual functions, i.e. to remove the double indirection that a virtual call involves. With aggregation by pointer, this requires interprocedural optimization. Whether this yields a net runtime gain despite the extra indirection through the d-pointer has to be checked by profiling the specific classes.

If profiling shows that dynamic memory allocation has become a bottleneck, the “Fast Pimpl” idiom can help (“Exceptional C++”, Item 30). In this variant, instances of the Private class are created not with the global operator new() but with a fast allocator, for example boost::singleton_pool.
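A minimal sketch of the idea, assuming a hand-written free-list pool in place of boost::singleton_pool: FixedPool, g_heapRequests and heapRequestsForChurn() are hypothetical names, the pool is not thread-safe, and it never returns memory to the heap. The point is only that a class-specific operator new on Class::Private redirects `new Private` without touching the header.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

static int g_heapRequests = 0;   // hypothetical instrumentation for the demo

// Hypothetical fixed-size pool: freed blocks go onto a free list and are
// handed out again instead of going back to the global heap.
template <std::size_t BlockSize>
class FixedPool {
    struct Node { Node *next; };
    static Node *&head() { static Node *h = nullptr; return h; }
public:
    static void *allocate() {
        if (Node *n = head()) {          // reuse a previously freed block
            head() = n->next;
            return n;
        }
        ++g_heapRequests;                // fall back to the global heap
        return ::operator new(BlockSize < sizeof(Node) ? sizeof(Node) : BlockSize);
    }
    static void deallocate(void *p) noexcept {
        if (!p) return;
        Node *n = ::new (p) Node{head()};   // link the block into the list
        head() = n;
    }
};

// --- header part: unchanged compared to the classic idiom ---
class Class {
public:
    Class();
    ~Class();
    Class(const Class &) = delete;
    Class &operator=(const Class &) = delete;
    void f(double n);
private:
    class Private;
    Private *d;
};

// --- cpp part: only Private knows about the pool ---
class Class::Private {
public:
    double n = 0.0;
    static void *operator new(std::size_t size) {
        assert(size == sizeof(Private));
        (void)size;
        return FixedPool<sizeof(Private)>::allocate();
    }
    static void operator delete(void *p) noexcept {
        FixedPool<sizeof(Private)>::deallocate(p);
    }
};

Class::Class() : d(new Private) {}   // now served by the pool
Class::~Class() { delete d; }        // returns the block to the pool
void Class::f(double n) { d->n = n; }

// Creates and destroys `rounds` objects; returns the number of heap requests.
int heapRequestsForChurn(int rounds) {
    const int before = g_heapRequests;
    for (int i = 0; i < rounds; ++i) {
        Class c;
        c.f(i + 1.0);
    }
    return g_heapRequests - before;
}
```

In this sketch, repeatedly creating and destroying objects reaches the global heap only for the first block; whether that is a measurable gain over a modern general-purpose allocator should again be settled by profiling.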

Interim conclusions

Pimpl is a well-known C++ idiom that lets the programmer separate a class's interface from its implementation to a degree that C++ does not directly support. Positive side effects of using a d-pointer are faster compilation, easier implementation of transactional semantics, and the option of making the implementation more efficient at runtime through advanced composition methods.

But d-pointers also have drawbacks: besides the need for an additional Private class and its dynamic memory allocation, the changed semantics of const methods and potential sequencing errors in allocation and deallocation are cause for concern.

In the second part of the article, the author will show solutions to some of the problems listed.
These solutions increase the complexity further, so in each case you need to check whether the advantages of the idiom outweigh its disadvantages. When in doubt, make that check for every class in question. As always, there is no universal solution.

What's next?

The second (and last) part of this article will look at the inner workings of Pimpl, expose its problem areas, and extend the idiom with a number of improvements.