All non-trivial abstractions, to some degree, are leaky.
Joel Spolsky, The Law of Leaky Abstractions
And sometimes even fairly simple abstractions leak. ;)
The author of this article
Most modern developers know the "Law of Leaky Abstractions" from Joel Spolsky's famous article of the same name. The law says that no matter how good a protocol, a framework, or a set of classes modeling a domain may be, sooner or later we have to go down a level and figure out how the abstraction works. The internals of an abstraction ought to be the abstraction's own business, but that holds only in the most common cases and only as long as everything goes well (*).
Once upon a time, a certain "small" and "soft" company decided: why not abstract away the location of an object and make the fact that an object is local or remote a mere implementation detail? That is how DCOM and its successor, .NET Remoting, appeared; both hid from the developer whether an object was remote or not. Along with them came the "transparent proxies" that let you work with a remote object without even knowing it. Over time, however, it turned out that this information matters a great deal to the developer: a remote object can throw a completely different set of exceptions, and working with it costs incomparably more than working with a local object.
Of course, such "information hiding" can be useful, but in general it complicates the developer's life more than it simplifies it. That is why WCF, the newer technology for building distributed applications, moved away from this practice: the line between local and remote objects is still very thin, but it is there.
There are plenty of similar cases where we need to know not only the visible behavior (the abstraction) but also the internal structure (the implementation). In most programming languages, different kinds of collections are used in very similar ways. Collections can "hide" behind base classes or interfaces (as in .NET) or rely on some other generalization mechanism (as in C++). But even though we can work with different collections almost identically, we cannot fully decouple our classes from specific collection types. Despite the apparent similarity, we need to understand which one fits best at the moment: a vector or a doubly linked list, a hash set or a sorted set. The complexity of the basic operations (searching for an element, inserting into the middle or at the end of the collection) depends on the internal implementation, and knowing these differences is essential.
Let's look at a specific example. We all know that types such as List<T> (or std::vector in C++) are built on top of a plain array. If the collection is full, adding a new element allocates a new internal array, which grows not by one element but by a larger amount (**). Many people know about this behavior, but most of the time we can ignore it: it is the "personal problem" of the List<T> class, and we don't care.
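The growth pattern is easy to observe directly. The sketch below is an illustration, not a specification: CapacityDemo and ObserveCapacities are hypothetical names, and the exact growth policy is an implementation detail of List<T> that may differ between runtimes.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Records every distinct Capacity value seen while adding `count` items.
    public static List<int> ObserveCapacities(int count)
    {
        var list = new List<int>();
        var seen = new List<int> { list.Capacity };
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            // Capacity changes only when the backing array is reallocated.
            if (list.Capacity != seen[seen.Count - 1])
                seen.Add(list.Capacity);
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ObserveCapacities(100)));
    }
}
```

On the implementations that footnote (**) describes, the printed sequence should begin 0, 4, 8, 16; treat the exact numbers as an observation, not a contract.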
But suppose we need to pass a list of enumeration values (enums) through WCF, or simply serialize such a list with the DataContractSerializer or NetDataContractSerializer (***) classes. The enumeration is declared as follows:
    public enum Color
    {
        Green = 1,
        Red,
        Blue
    }
Never mind that this enumeration is not marked with any attributes; that does not bother NetDataContractSerializer. What matters is that the enumeration has no zero value: its values start at 1.
A peculiarity of enum serialization in WCF is that you cannot serialize a value that does not belong to the enumeration.
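    using System.Runtime.Serialization;
    using System.Text;
    using System.Xml;

    public static string Serialize<T>(T obj)
    {
        // NetDataContractSerializer is used here
        // (the alternative named above is DataContractSerializer)
        var serializer = new NetDataContractSerializer();
        var sb = new StringBuilder();
        using (var writer = XmlWriter.Create(sb))
        {
            serializer.WriteObject(writer, obj);
            // Flush so that the buffered XML reaches the StringBuilder
            writer.Flush();
            return sb.ToString();
        }
    }

    // 55 is not a member of the Color enumeration
    Color color = (Color)55;
    Serialize(color);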
When we run this code, we get the following error message: "Enum value '55' is invalid for type 'Color' and cannot be serialized." This behavior is quite logical, because it protects us from passing unknown values between different applications.
Now let's try to pass a collection containing a single element:
    var colors = new List<Color> { Color.Green };
    Serialize(colors);
However, this seemingly harmless code also fails at runtime with the same error; the only difference is that now the serializer cannot cope with an enumeration value of 0. What the... Where did it even get 0 from? We are trying to pass a simple collection with one element, and the value of that element is perfectly valid. However, DataContractSerializer / NetDataContractSerializer, like good old binary serialization, uses reflection to access all fields. As a result, the entire internal representation of the object, held in both public and private fields, is serialized into the output stream.
Since the List<T> class is built on an array, the serializer writes out the entire array, regardless of how many items the list holds. For example, when serializing a collection of two elements:
    var list = new List<int> { 1, 2 };
    string s = Serialize(list);
The output contains not two elements, as we might expect, but four (that is, as many as the Capacity property reports, not Count):
    <ArrayOfint>
      <_items z:Id="2" z:Size="4">
        <int>1</int>
        <int>2</int>
        <int>0</int>
        <int>0</int>
      </_items>
      <_size>2</_size>
      <_version>2</_version>
    </ArrayOfint>
Now the cause of the error that occurs when serializing the list of enumeration values becomes clear. Our Color enumeration has no value equal to 0, and 0 is exactly the value that fills the unused elements of the list's internal array.
This is another example of a "leaky" abstraction, where the internal implementation of even such a simple class as List<T> can prevent us from serializing it properly.
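The root cause can be reproduced without any serializer. The following minimal sketch uses a plain array as a stand-in for the list's internal storage; DefaultEnumDemo and IsValid are hypothetical names introduced only for this illustration.

```csharp
using System;

public enum Color { Green = 1, Red, Blue }

public static class DefaultEnumDemo
{
    // True only when the value is one of the declared enumeration members.
    public static bool IsValid(Color value)
    {
        return Enum.IsDefined(typeof(Color), value);
    }

    public static void Main()
    {
        // Stand-in for the backing array of a list with Capacity 4 and Count 1.
        var slots = new Color[4];
        slots[0] = Color.Green;

        Console.WriteLine((int)slots[1]);     // 0: unused slots hold default(Color)
        Console.WriteLine(IsValid(slots[0])); // True
        Console.WriteLine(IsValid(slots[1])); // False: 0 is not a member of Color
    }
}
```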
Solution to the problem
There are several solutions to this problem, each of which has its own disadvantages.
1. Add a default value
The simplest solution is to add a value of 0 to the enumeration or to change the value of one of the existing members:
    public enum Color
    {
        None = 0,
        Green = 1, // or: Green = 0
        Red,
        Blue
    }
This option is the simplest, but it is not always possible: the enumeration values may correspond to values stored in a database, and adding a dummy value may contradict the application's business logic.
2. Transferring a collection without “empty” elements
Instead of doing something with the enumeration, you can ensure that the collection does not contain such empty elements. You can do this, for example, in this way:
    var li1 = new List<Color> { Color.Green };
    var li2 = new List<Color>(li1);
In this case, li1 will contain three extra empty elements (Count will be 1 and Capacity will be 4), while li2 will not (the internal array of the second list will contain only 1 element).
This option works, but it is very "fragile": the working code is easy to break. One harmless change by a colleague, such as removing the "unnecessary" intermediate collection, and the error is back.
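One way to make the copy harder to delete by accident is to give it a name that states its purpose. The sketch below is only a suggestion: ListCompaction and ToCompactList are hypothetical names, and setting Capacity explicitly is an addition to the approach shown above, not something the original code does.

```csharp
using System;
using System.Collections.Generic;

public enum Color { Green = 1, Red, Blue }

public static class ListCompaction
{
    // Hypothetical helper: copies the items, then pins Capacity to Count
    // so that the backing array has no unused default(T) slots.
    public static List<T> ToCompactList<T>(IEnumerable<T> source)
    {
        var result = new List<T>(source);
        result.Capacity = result.Count; // public setter; reallocates only if needed
        return result;
    }

    public static void Main()
    {
        var li1 = new List<Color> { Color.Green };
        var li2 = ToCompactList(li1);
        Console.WriteLine($"li1: Count={li1.Count}, Capacity={li1.Capacity}");
        Console.WriteLine($"li2: Count={li2.Count}, Capacity={li2.Capacity}");
    }
}
```

Any later Add on the compacted list grows the array again, so a helper like this belongs at the point where the data is handed to the serializer.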
3. Using other collection types in the service interface
Using another data structure, such as an array, or replacing DataContractSerializer with XML serialization, which uses only public members, also solves the problem. Whether that is convenient is up to you.
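As a rough illustration of the array variant: an array's length is always its element count, so there are no spare slots to leak. PaletteMessage and ArrayContractDemo are hypothetical types invented for this sketch; a real service contract would carry its own attributes and naming.

```csharp
using System;
using System.Collections.Generic;

public enum Color { Green = 1, Red, Blue }

// Hypothetical message type that exposes an array instead of List<Color>.
public class PaletteMessage
{
    public Color[] Colors { get; set; }
}

public static class ArrayContractDemo
{
    public static PaletteMessage ToMessage(List<Color> colors)
    {
        // ToArray copies exactly Count elements into a new array.
        return new PaletteMessage { Colors = colors.ToArray() };
    }

    public static void Main()
    {
        var colors = new List<Color> { Color.Green, Color.Blue };
        var message = ToMessage(colors);

        Console.WriteLine(message.Colors.Length); // 2
        foreach (var c in message.Colors)
            Console.WriteLine(Enum.IsDefined(typeof(Color), c)); // True for each item
    }
}
```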
Abstractions leak, period. That is why digging into the internals of libraries is so useful. Even if a library hides all its details perfectly, sooner or later you will hit a situation that you cannot resolve without knowing how it works inside. Go ahead, study the internals, and don't worry that they may change in the future. You may never need the knowledge, but at least it's interesting!
P.S. By the way, think twice before passing value types through WCF in a List<T>. If the collection holds 524 elements, another 500 extra value-type instances will be transferred!
-
(*) Joel is neither the first nor the last author to come up with a good metaphor for this idea. Lee Campbell, for example, once put the same thought in slightly different words: "You must understand at least one level of abstraction below the level at which you code." Details are in a short note: On understanding the right level of abstraction.
(**) Typically, such data structures double their internal array. For example, as elements are added to a List<T>, its capacity changes like this: 0, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ...
(***) The difference between the two main WCF serializers is quite important. NetDataContractSerializer, unlike DataContractSerializer, violates SOA principles by adding CLR type information to the output stream, which breaks the cross-platform nature of the service-oriented paradigm. You can read more in the notes "What is WCF" and "Declarative use of the NetDataContractSerializer".