Quite often in a database-backed program, values of an integer type (for example, long) are used as entity identifiers. People make mistakes, though, and a programmer may accidentally use the identifier of one kind of entity to address another. Such a bug can go unnoticed for a long time when the identifier ranges of the entities overlap, which happens quite often. Fortunately, languages that allow types to be manipulated, C++ among them, offer a fairly simple solution to this problem.
Formulation of the problem
Suppose our program works with several types of entities. For example, let's take widgets (the Widget class) and gadgets (the Gadget class):
class Widget { public: long id() const;
Besides the high probability of error, using "raw" types as identifiers significantly reduces the readability of the code. Code that contains many types like std::vector<long> or std::map<long, long> is not easy to understand. Using type synonyms:
typedef long WidgetId;
typedef long GadgetId;
allows the programmer to write more expressive code with types like std::map<WidgetId, GadgetId>. But this approach solves only the readability problem. The compiler still does not know that we consider values of the WidgetId and GadgetId types to be incompatible.
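To make the limitation concrete, here is a minimal sketch assuming C++11 or later. The functions widgetExists and mixUp are hypothetical and exist only for illustration; the two typedefs are the ones from the text.

```cpp
#include <cassert>
#include <type_traits>

// The synonyms from the text: both are just other names for long.
typedef long WidgetId;
typedef long GadgetId;

// The compiler sees one and the same type, so this assertion holds.
static_assert(std::is_same<WidgetId, GadgetId>::value,
              "typedef introduces an alias, not a new type");

// Hypothetical lookup function, used only for illustration.
inline bool widgetExists(WidgetId id) { return id > 0; }

// Compiles cleanly even though a gadget identifier is passed
// where a widget identifier is expected.
inline bool mixUp() {
    GadgetId gadget = 42;
    return widgetExists(gadget);
}
```

Nothing in this code is rejected or even flagged, which is exactly the problem the aliases leave unsolved.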
Telling the compiler our intentions
What would a person do to avoid getting confused when handling a multitude of abstract identifiers on paper? A quite reasonable approach, I think, is to add a type tag to each identifier: a prefix or suffix that names the kind of entity being identified. For example, K-12 could mean the twelfth computer registered as a workstation, and P-12 the twelfth registered user.
Fortunately, C++ has a mechanism for attaching tags to types: templates. To solve our problem, we just need to implement a class that is parameterized by the entity type and stores the identifier:
template <typename ModelType, typename ReprType = long>
class IdOf {
public:
    typedef ModelType model_type;
    typedef ReprType repr_type;

    IdOf() : value_() {}
    explicit IdOf(repr_type value) : value_(value) {}

    repr_type value() const { return value_; }

    bool operator==(const IdOf &rhs) const { return value() == rhs.value(); }
    bool operator!=(const IdOf &rhs) const { return value() != rhs.value(); }
    bool operator<(const IdOf &rhs) const { return value() < rhs.value(); }
    bool operator>(const IdOf &rhs) const { return value() > rhs.value(); }

private:
    repr_type value_;
};
Let's apply the new class to our gadgets and widgets:
class Gadget; class Widget; typedef IdOf<Gadget> GadgetId; typedef IdOf<Widget> WidgetId; class Widget { public: WidgetId id() const;
Because of how we defined the IdOf class, code containing logical errors of this kind, such as passing a GadgetId where a WidgetId is expected, will no longer compile.
Operations on identifiers of the same type work correctly. Now that the compiler knows more about our intentions, it will not let us load a gadget by a widget's identifier or put an identifier of the wrong type into a vector.
If we still need to compare identifiers of different types, or compare an identifier with a "raw" value, we can always call the value() method explicitly.
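The behavior described above can be sketched as follows, assuming C++11 or later. The IdOf template is a condensed copy of the one in the text; loadWidget and sameRawValue are hypothetical helpers introduced only for this example. The rejected lines are left as comments, and static assertions confirm that the conversions they would need do not exist.

```cpp
#include <cassert>
#include <type_traits>

// Condensed copy of the IdOf template from the text.
template <typename ModelType, typename ReprType = long>
class IdOf {
public:
    IdOf() : value_() {}
    explicit IdOf(ReprType value) : value_(value) {}
    ReprType value() const { return value_; }
    bool operator==(const IdOf &rhs) const { return value_ == rhs.value_; }
private:
    ReprType value_;
};

class Gadget;
class Widget;
typedef IdOf<Gadget> GadgetId;
typedef IdOf<Widget> WidgetId;

// Hypothetical loader: pretends that only widget number 5 exists.
inline bool loadWidget(WidgetId id) { return id.value() == 5; }

// A GadgetId does not convert to a WidgetId, and a raw long does not
// convert implicitly either, because the constructor is explicit.
static_assert(!std::is_convertible<GadgetId, WidgetId>::value,
              "identifiers of different entities are distinct types");
static_assert(!std::is_convertible<long, WidgetId>::value,
              "raw values must be wrapped explicitly");

inline bool sameRawValue(WidgetId w, GadgetId g) {
    // return w == g;         // would not compile: no matching operator==
    // return loadWidget(g);  // would not compile: wrong identifier type
    return w.value() == g.value();  // the mixing is explicit and visible
}
```

Calling value() keeps the escape hatch available while making every place where identifier kinds are mixed easy to find in a code review.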
Phantom types
It turns out that the trick we just pulled off with identifiers has been known in functional programming for quite some time. Parameterized types that do not use their type parameter in the definition are called phantom types.
For example, in Haskell, a similar technique can be implemented as follows:
newtype IdOf a = IdOf { idValue :: Int } deriving (Ord, Eq, Show, Read)
Wow, just a couple of lines of code! Now add the definitions of our models:
data Widget = Widget { widgetId :: IdOf Widget } deriving (Show, Eq)
data Gadget = Gadget { gadgetId :: IdOf Gadget } deriving (Show, Eq)
and check the desired behavior by creating instances of different types and trying to compare their identifiers:
Prelude> let g = Gadget (IdOf 5)
Prelude> let w = Widget (IdOf 5)
Prelude> widgetId w == gadgetId g

<interactive>:1:15:
    Couldn't match type `Gadget' with `Widget'
    Expected type: IdOf Widget
      Actual type: IdOf Gadget
    In the return type of a call of `gadgetId'
    In the second argument of `(==)', namely `gadgetId g'
    In the expression: widgetId w == gadgetId g
As expected, the compiler (more precisely, the ghci interpreter I used for these experiments) refused to accept a comparison of identifiers of different types. This is exactly what we need.
This technique can also be used to attach currency labels, units of measure, and other information to numeric values, which is useful to both the reader of the program and the compiler.
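As an illustration of the currency case, here is a small C++ sketch in the same spirit as IdOf. All names in it (Usd, Eur, Money, totalUsd) are hypothetical and chosen only for this example; amounts are stored in minor units such as cents.

```cpp
#include <cassert>

// Hypothetical tag types; they are only declared, never defined.
struct Usd;
struct Eur;

// An amount in minor units (for example, cents), tagged with a currency.
template <typename Currency>
class Money {
public:
    explicit Money(long minorUnits) : minorUnits_(minorUnits) {}
    long minorUnits() const { return minorUnits_; }

    // Addition is defined only for amounts that carry the same tag.
    Money operator+(const Money &rhs) const {
        return Money(minorUnits_ + rhs.minorUnits_);
    }

private:
    long minorUnits_;
};

inline long totalUsd() {
    Money<Usd> price(1999);
    Money<Usd> shipping(500);
    // Money<Eur> fee(300);
    // price + fee;  // would not compile: Money<Usd> + Money<Eur>
    return (price + shipping).minorUnits();
}
```

Converting between currencies would then have to be an explicit function taking an exchange rate, so it can never happen by accident.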
Results
A single small class can save us a lot of time that would otherwise be spent hunting for bugs. In addition, this approach does not affect the program's run-time performance or memory consumption when it is compiled with optimizations turned on. The Haskell version does not incur any overhead either.
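The memory part of this claim can be checked at compile time. The sketch below assumes C++11 or later and a mainstream compiler, where a class with a single long member and no virtual functions occupies exactly as much space as the long itself; the IdOf shown is a condensed copy of the one in the text.

```cpp
#include <cassert>
#include <type_traits>

// Condensed copy of the IdOf template from the text.
template <typename ModelType, typename ReprType = long>
class IdOf {
public:
    explicit IdOf(ReprType value) : value_(value) {}
    ReprType value() const { return value_; }
private:
    ReprType value_;
};

class Widget;

// The tag exists only in the type; no extra bytes are stored per object.
static_assert(sizeof(IdOf<Widget>) == sizeof(long),
              "the wrapper is expected to be as small as the raw value");

// The wrapper can be copied like a plain integer.
static_assert(std::is_trivially_copyable<IdOf<Widget> >::value,
              "the wrapper is expected to be trivially copyable");
```

Whether calls to value() are actually inlined away depends on the compiler and the optimization level, so the run-time part of the claim is best confirmed by inspecting the generated code.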
The disadvantage is the need to type (and read) a few more characters and, perhaps, to explain the idea to colleagues, but quite often the benefit of stricter logic checking by the compiler outweighs these costs.
Phantom types are popular in applications that require high reliability, where each additional check performed automatically by the compiler reduces the company's losses. In particular, they are used extensively in OCaml code at Jane Street and in Standard Chartered Bank products written in Haskell (as described by Don Stewart at Google Tech Talk 2015).
The powerful Boost.Units library also deserves a mention: it supports type-safe operations on quantities with different units and deduces the result type automatically.