A new research document has appeared on the OpenJDK website. It describes the idea of introducing a new and improved serialization mechanism into the language to replace the old one.
Serialization has existed in Java since version 1.1, that is, almost since the birth of the language. On the one hand, serialization is a very convenient mechanism: you can quickly and easily make any class serializable by having it implement the java.io.Serializable interface. This simplicity may even have been one of the key reasons why Java became so popular, because it made it possible to write network applications quickly and efficiently.
On the other hand, the way serialization is implemented in Java causes a huge number of problems that increase the cost of maintaining applications, reduce their security, and slow down the evolution of the platform.
What is wrong with serialization in Java? Here are the most serious problems:
- Serialization (and deserialization) bypasses the language's own mechanisms. It ignores field access modifiers (private, protected) and creates objects without calling their constructors, so it also ignores any invariants those constructors enforce. An attacker can exploit this by substituting invalid data in the stream, and deserialization will accept it without complaint.
- The compiler gives no help when you write serializable classes and does not detect errors. For example, there is no static guarantee that all fields of a serializable class are themselves serializable. You can also make a typo in the name of readObject, writeObject, readResolve, and so on, and the misnamed method will simply be ignored during serialization.
- Serialization has no proper versioning mechanism, so it is very difficult to modify serializable classes while keeping them compatible with their old versions.
- Serialization is tightly coupled to its stream encoding and decoding, so switching to an encoding format other than the standard one is very difficult. In addition, the standard format is neither compact, nor efficient, nor human-readable.
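The first problem in the list is easy to reproduce. Below is a minimal sketch, not taken from the research document: LegacyRange and BypassDemo are hypothetical names invented for this illustration. It serializes a valid object, changes one byte in the stream, and reads the object back. The sketch relies on the fact that the standard format writes primitive field values in field-name order, so the value of "lo" occupies the last four bytes of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class BypassDemo {
    // Serializes a valid range, corrupts the stored "lo" value, and reads it back.
    static LegacyRange readTampered() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(new LegacyRange(1, 2));
            }
            byte[] bytes = buf.toByteArray();
            // Field values are written as "hi" then "lo" (sorted by name),
            // so the last byte is the low byte of "lo". Change 1 to 5.
            bytes[bytes.length - 1] = 5;
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return (LegacyRange) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LegacyRange r = readTampered();
        // The constructor check never ran, so lo > hi is possible here.
        System.out.println("lo=" + r.lo + ", hi=" + r.hi);
    }
}

// Hypothetical class for illustration: the constructor enforces lo <= hi.
class LegacyRange implements Serializable {
    private static final long serialVersionUID = 1L;
    final int lo;
    final int hi;

    LegacyRange(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }
}
```

The constructor would reject such a pair of values, but deserialization restores the fields directly and no exception is thrown.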
The fundamental mistake of the existing serialization in Java is that it tries to be too “invisible” to the programmer. A class simply implements java.io.Serializable and gets some implicit magic performed by the virtual machine.
Instead, the programmer should explicitly write the code responsible for constructing and deconstructing objects. These constructs should exist at the language level and should access fields statically, not through reflection.
Another mistake is that serialization tries to do too much. It sets out to serialize any arbitrary object graph (which may contain cycles) and to deserialize it back without breaking its state.
This mistake can be corrected by simplifying the task: serialize not an object graph but a data tree, which has no concept of identity (as in JSON).
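To see what "identity" means here, consider a list that references the same object twice. The sketch below (IdentityDemo is a hypothetical name used only for this illustration) round-trips such a list through the existing serialization and checks whether the two restored elements are still one and the same object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class IdentityDemo {
    // Round-trips a value through the built-in Java serialization.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buf.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if both elements of the restored list are the same object.
    static boolean sharedAfterRoundTrip() {
        int[] shared = {1};
        List<Object> list = new ArrayList<>();
        list.add(shared);
        list.add(shared); // the same array referenced twice
        List<?> copy = (List<?>) roundTrip(list);
        return copy.get(0) == copy.get(1);
    }

    public static void main(String[] args) {
        System.out.println("identity preserved: " + sharedAfterRoundTrip());
    }
}
```

The existing mechanism records the second reference as a back-reference, so the sharing survives the round trip. A tree format such as JSON would encode the list as [[1],[1]] and a reader would produce two independent arrays, which is exactly the simplification being proposed.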
How can we build serialization that fits naturally into the object model, uses constructors when deserializing, is separated from the encoding format, and supports versioning? Annotations come to the rescue, together with a language feature that is not yet part of Java: pattern matching. For example:
    public class Range {
        int lo;
        int hi;

        private Range(int lo, int hi) {
            if (lo > hi)
                throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
            this.lo = lo;
            this.hi = hi;
        }

        @Serializer
        public pattern Range(int lo, int hi) {
            lo = this.lo;
            hi = this.hi;
        }

        @Deserializer
        public static Range make(int lo, int hi) {
            return new Range(lo, hi);
        }
    }
This example declares a Range class that is ready for serialization thanks to two special class members: a serializer and a deserializer, marked with the @Serializer and @Deserializer annotations. The serializer is implemented as a deconstruction pattern, and the deserializer as a static method that calls the constructor. As a result, deserialization always checks the invariant hi >= lo enforced in the constructor.
This approach involves no magic and uses ordinary annotations, so any framework, not just the Java platform itself, can perform serialization. This means the encoding format can be anything at all (binary, XML, JSON, YAML, etc.).
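Since the pattern syntax does not exist in Java yet, the idea can only be approximated today. Here is a rough sketch under stated assumptions: the @Serializer and @Deserializer annotations are declared locally as stand-ins for the proposed ones, an ordinary method returning an array plays the role of the deconstruction pattern, and TinyFramework is a hypothetical toy "framework" that finds both members through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// A toy "framework" that knows nothing about Range except the two annotations.
class TinyFramework {
    // Breaks an object into its components via its @Serializer method.
    static Object[] deconstruct(Object value) {
        for (Method m : value.getClass().getMethods()) {
            if (m.isAnnotationPresent(Serializer.class)) {
                try {
                    return (Object[]) m.invoke(value);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("no @Serializer in " + value.getClass());
    }

    // Rebuilds an object by calling the static @Deserializer method.
    static <T> T reconstruct(Class<T> type, Object... components) {
        for (Method m : type.getMethods()) {
            if (m.isAnnotationPresent(Deserializer.class)) {
                try {
                    return type.cast(m.invoke(null, components));
                } catch (InvocationTargetException e) {
                    // Surface the constructor's own exception, e.g. a failed invariant.
                    if (e.getCause() instanceof RuntimeException) {
                        throw (RuntimeException) e.getCause();
                    }
                    throw new IllegalStateException(e);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("no @Deserializer in " + type);
    }

    public static void main(String[] args) {
        Object[] parts = deconstruct(Range.make(1, 2));
        Range copy = reconstruct(Range.class, parts);
        System.out.println(copy.lo + ".." + copy.hi);
        try {
            reconstruct(Range.class, 5, 2); // invalid data goes through the constructor
        } catch (IllegalArgumentException e) {
            System.out.println("rejected " + e.getMessage());
        }
    }
}

// Hypothetical stand-ins for the proposed annotations (not part of the JDK).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Serializer {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Deserializer {}

class Range {
    final int lo;
    final int hi;

    private Range(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
        this.lo = lo;
        this.hi = hi;
    }

    // Assumption: an ordinary method stands in for the deconstruction pattern.
    @Serializer
    public Object[] components() {
        return new Object[] { lo, hi };
    }

    @Deserializer
    public static Range make(int lo, int hi) {
        return new Range(lo, hi);
    }
}
```

The array of components is the "data tree" for this object. How it is turned into bytes or text is entirely up to the framework, and invalid input is rejected by the same constructor that protects ordinary callers.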
Since serializers and deserializers are ordinary methods, the programmer has more freedom in implementing them. For example, the serialized representation of an object can differ from its in-memory representation: a LinkedList can be serialized not as a chain of nodes but as one contiguous array, which makes the representation simpler, more efficient, and more compact.
Versioning in this approach is implemented through a special version element of the @Serializer and @Deserializer annotations:
    class C {
        int a;
        int b;
        int c;

        @Deserializer(version = 3)
        public C(int a, int b, int c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }

        @Deserializer(version = 2)
        public C(int a, int b) {
            this(a, b, 0);
        }

        @Deserializer(version = 1)
        public C(int a) {
            this(a, 0, 0);
        }

        @Serializer(version = 3)
        public pattern C(int a, int b, int c) {
            a = this.a;
            b = this.b;
            c = this.c;
        }
    }
In this example, one of the three deserializers will be called, depending on the version.
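How a framework might pick the right deserializer can be sketched in today's Java. This is only an illustration under assumptions: the @Deserializer annotation is declared locally as a stand-in for the proposed one, the serializer pattern is omitted because the syntax does not exist yet, and VersionedReader is a hypothetical helper.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;

class VersionedReader {
    // Calls the constructor whose @Deserializer version matches the data version.
    static <T> T read(Class<T> type, int version, Object... components) {
        try {
            for (Constructor<?> ctor : type.getDeclaredConstructors()) {
                Deserializer d = ctor.getAnnotation(Deserializer.class);
                if (d != null && d.version() == version) {
                    return type.cast(ctor.newInstance(components));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no deserializer for version " + version);
    }

    public static void main(String[] args) {
        C fromV1 = read(C.class, 1, 7);       // data written by version 1
        C fromV3 = read(C.class, 3, 1, 2, 3); // data written by version 3
        System.out.println(fromV1.a + " " + fromV1.b + " " + fromV1.c);
        System.out.println(fromV3.a + " " + fromV3.b + " " + fromV3.c);
    }
}

// Hypothetical stand-in for the proposed annotation (not part of the JDK).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.CONSTRUCTOR)
@interface Deserializer {
    int version();
}

class C {
    final int a;
    final int b;
    final int c;

    @Deserializer(version = 3)
    public C(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    @Deserializer(version = 2)
    public C(int a, int b) {
        this(a, b, 0);
    }

    @Deserializer(version = 1)
    public C(int a) {
        this(a, 0, 0);
    }
}
```

Old data keeps working because each older constructor fills the fields it does not know about with defaults and delegates to the current one.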
What if we do not want serializers and deserializers to be available to anyone except for serialization purposes? We can make them private. However, a serialization framework will then be unable to reach them through reflection if the code lives in a module whose package is not open for deep reflective access. For this case, the document proposes one more new language construct: open class members. For example:
    class Foo {
        private final InternalState is;

        public Foo(ExternalState es) {
            this(new InternalState(es));
        }

        @Deserializer
        private open Foo(InternalState is) {
            this.is = is;
        }

        @Serializer
        private open pattern serialize(InternalState is) {
            is = this.is;
        }
    }
Here, serializers and deserializers are marked with the open keyword, which makes them open to setAccessible.
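For comparison, this is what deep reflective access to a private member looks like in current Java. It is a minimal sketch with a simplified, hypothetical Foo that holds a String. Here both classes live in the same module, so setAccessible succeeds. If Foo were in another named module whose package is not opened to the caller, the same call would throw InaccessibleObjectException, and that is the gap the proposed open modifier is meant to close for individual members.

```java
import java.lang.reflect.Constructor;

class DeepReflectionDemo {
    // Creates a Foo through its private constructor, the way a framework would.
    static Foo create(String state) {
        try {
            Constructor<Foo> ctor = Foo.class.getDeclaredConstructor(String.class);
            // Suppresses the access check for this private constructor.
            ctor.setAccessible(true);
            return ctor.newInstance(state);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create("restored").state());
    }
}

// Simplified, hypothetical version of Foo with a private constructor.
class Foo {
    private final String state;

    private Foo(String state) {
        this.state = state;
    }

    String state() {
        return state;
    }
}
```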
Thus, the new approach is fundamentally different from the old one: classes are deliberately designed to be serializable rather than handed over to the platform as they are. This requires extra effort, but it makes serialization more predictable, safer, and independent of the encoding format and the serialization framework.