
Replication Framework • deep copying and generalized comparison of connected object graphs

Greetings, reader!

I would like to introduce the young but promising Replication Framework library for the .NET platform (if there is enough interest in the topic, a Java version may follow in the future). The library is portable and can be used in any project targeting Microsoft .NET or Mono.

The purpose of the library is deep copying of arbitrary objects and arbitrarily complex graphs, their generalized comparison, distortion-free serialization and deserialization, mutation tracking, and state manipulation.


First of all, let us define the terminology and the basic entities.

A snapshot is an instantaneous capture of an object's state, isolated from the source and static for the duration of the program, and therefore protected from accidental mutation. It is like a drawing or a sketch from which you can later recreate a new object [graph] with the previous state, or apply a specific state to an existing one.

You can take snapshots from different angles, that is, interpret the state of objects differently: for example, collect the values of absolutely all properties and fields of an instance, or only the public ones, or, as is often the case, only those members marked with the special DataMember attribute. How a snapshot is taken depends on the ReplicationProfile [replication profile] and, in particular, on its internal MemberProviders [member providers] list.

* By default, if a class has the DataContract or CollectionDataContract attribute, only members marked with the DataMember attribute go into the snapshot; otherwise, all fields and properties of the class are included, public or not.
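The default member-selection rule can be illustrated with plain reflection. This is only a rough sketch of the idea, not the library's code: SelectMemberNames, ContractSample and PlainSample are hypothetical names invented for this example, and the real MemberProvider classes may work quite differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

public static class MemberSelectionSketch
{
    // Hypothetical helper (not part of Replication Framework):
    // returns the names of the members that would go into a snapshot.
    public static List<string> SelectMemberNames(Type type)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        // Does the class opt in to contract-based selection?
        bool hasContract =
            type.IsDefined(typeof(DataContractAttribute), false) ||
            type.IsDefined(typeof(CollectionDataContractAttribute), false);

        IEnumerable<MemberInfo> members = type.GetFields(flags)
            .Cast<MemberInfo>()
            .Concat<MemberInfo>(type.GetProperties(flags));

        return members
            .Where(m => !m.Name.Contains("<")) // skip compiler-generated backing fields
            .Where(m => !hasContract || m.IsDefined(typeof(DataMemberAttribute), false))
            .Select(m => m.Name)
            .ToList();
    }
}

[DataContract]
public class ContractSample
{
    [DataMember] public string Name;
    public string Secret;          // not marked, so it is skipped
}

public class PlainSample
{
    public string Name;
    private int _counter;          // private, but still selected

    public void Touch() { _counter++; }
}
```

Under these assumptions, SelectMemberNames(typeof(ContractSample)) yields only Name, while for PlainSample both Name and the private _counter are selected.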

A small example of using replication profiles
    var snapshot0 = instance0.CreateSnapshot(); /* use default ReplicationProfile */

    var customReplicationProfile = new ReplicationProfile
    {
        MemberProviders = new List<MemberProvider>
        {
            //new MyCustomMemberProvider(), /* you may override and customize MemberProvider class! */
            new CoreMemberProviderForKeyValuePair(),
            //new CoreMemberProvider(BindingFlags.Public | BindingFlags.Instance, Member.CanReadWrite),
            new ContractMemberProvider(BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance, Member.CanReadWrite)
        }
    };

    var snapshot1 = instance1.CreateSnapshot(customReplicationProfile);
    Snapshot.DefaultReplicationProfile = customReplicationProfile;


In general, a snapshot is a JSON-like data structure in which complex composite objects are broken down into primitives and converted into dictionaries, where the key is the name of a member (property or field) and the value is the corresponding primitive (string, int, DateTime, etc.). All collections, including arrays, are a special kind of object that, in addition to the usual properties, has one more implicit property for the enumeration operation (foreach), whose value is equivalent to a JSON array.

Reconstruction is the operation of returning an object graph to its original state on the basis of a snapshot and the already existing cached object instances. Usually, in the course of program execution, objects and the graphs built from them are modified, that is, mutated, but sometimes it is useful to be able to return [roll back] the graph and its objects to a specific state captured earlier.

Reconstruction looks as follows

    var cache = new Dictionary<object, int>();
    var snapshot0 = graph0.CreateSnapshot(cache);
    /* modify 'graph0' in any way */
    var graphX = snapshot0.ReconstructGraph(cache);
    /* graphX is the same reference as graph0; all items of the graph are reverted to the previous state */


* Keep in mind that cached objects are kept from garbage collection, and that during reconstruction all of them return to their original state.
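The idea of restoring state into the same instances, and the reason cached objects stay alive, can be sketched in a few lines. This is a hypothetical illustration and not the library's implementation: StateKeeper and Account are invented names, and only the fields of each captured instance are restored (the contents of a referenced mutable collection are not).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical illustration, not part of Replication Framework.
public sealed class StateKeeper
{
    private const BindingFlags Flags =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    // Strong references: captured instances cannot be garbage-collected
    // for as long as the keeper itself is alive.
    private readonly Dictionary<object, Dictionary<FieldInfo, object>> _states =
        new Dictionary<object, Dictionary<FieldInfo, object>>();

    // Remember the current field values of every given instance.
    public void Capture(params object[] instances)
    {
        foreach (var instance in instances)
        {
            var values = new Dictionary<FieldInfo, object>();
            foreach (var field in instance.GetType().GetFields(Flags))
                values[field] = field.GetValue(instance);
            _states[instance] = values;
        }
    }

    // Write the remembered values back into the very same instances.
    public void Restore()
    {
        foreach (var state in _states)
            foreach (var pair in state.Value)
                pair.Key.SetValue(state.Key, pair.Value);
    }
}

public class Account
{
    public string Owner;
    public decimal Balance;
}
```

After Capture(account), any later changes to account.Owner or account.Balance are undone by Restore(), and the reference to account stays the same throughout.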

Replication is the operation of deep copying an object graph on the basis of a snapshot; the result is a new copy of the graph, isolated from the original.

Replication is done as follows

    var snapshot0 = graph0.CreateSnapshot(cache);
    /* modify 'graph0' in any way */
    var graph1 = snapshot0.ReplicateGraph(cache);
    /* graph1 is a deep copy of the source graph0 */


* Difference between shallow and deep copying
There are two kinds of copying: shallow and deep. Let objects A and B be given, where A holds a reference to B (graph A => B). A shallow copy of A creates an object A' that also refers to B, so we end up with two graphs, A => B and A' => B. They share the common part B, so a change made to B through the first graph is automatically visible through the second, while A and A' themselves remain independent. The most interesting case is a graph with closed (cyclic) references. Let A refer to B and B refer to A (A <=> B). Shallow-copying A into A' yields a rather unusual graph, A' => B <=> A, that is, the original object ends up inside the graph produced by cloning. Deep copying clones every object in the graph: in our case A <=> B becomes A' <=> B', so the two graphs are completely isolated from each other. In some cases shallow copying is sufficient, but not always.
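The A <=> B case is easy to reproduce by hand. The Node class below is a hypothetical example written for this explanation (it does not use Replication Framework): ShallowCopy relies on the standard MemberwiseClone, while DeepCopy is a minimal hand-written clone that keeps a map of already copied nodes so that the cycle terminates.

```csharp
using System;
using System.Collections.Generic;

public class Node
{
    public string Name;
    public Node Link;

    // Shallow copy: the clone shares the same Link target as the original.
    public Node ShallowCopy() => (Node)MemberwiseClone();

    // Deep copy: every reachable node is cloned exactly once.
    public Node DeepCopy() => DeepCopy(new Dictionary<Node, Node>());

    private Node DeepCopy(Dictionary<Node, Node> clones)
    {
        // Already cloned (we came back around a cycle): reuse the clone.
        if (clones.TryGetValue(this, out var existing)) return existing;

        var clone = new Node { Name = Name };
        clones[this] = clone;                 // register before recursing
        clone.Link = Link?.DeepCopy(clones);
        return clone;
    }
}
```

With a.Link = b and b.Link = a, the shallow copy a.ShallowCopy() still points at the original b, whose Link leads back to the original a (A' => B <=> A). The deep copy points at a new B' whose Link leads back to the copy itself (A' <=> B').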


Juxtaposition is a recursive operation that compares a reference (etalon) snapshot of an object with a snapshot of the current sample.

Example of juxtaposing two snapshots

    var snapshot0 = instance0.CreateSnapshot(); /* etalon */
    var snapshot1 = instance1.CreateSnapshot(); /* sample */
    var juxtapositions = snapshot0.Juxtapose(snapshot1).ToList();
    var differences = juxtapositions.Where(j => j.State == Etalon.State.Different);


Comparison of objects is an extensive topic in programming, and juxtaposition is an attempt to generalize it to a whole class of problems. How successful the attempt is, you can judge for yourself.

Suppose we have an instance of some object. At the initial moment we take a snapshot of its state; after some time we can take another snapshot and, by comparing the two, identify all the mutations of interest that have happened to this instance, that is, implement state tracking. It is worth noting that in practice it is not always possible to copy an object or simply to keep track of changes, but it is always easy to take a snapshot of it.

When we have two or more object instances, for example a reference one and several working ones, they may be of the same type or of different types. Their snapshots can be taken at arbitrary points in time and compared in any combination without restrictions. Matching is done by member names (properties and fields).

* Importantly, the result of the juxtaposition operation is IEnumerable<Juxtaposition>, which makes it possible to interrupt the recursive comparison at any point once certain conditions are met, rather than running it to completion, which in turn matters for performance.
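Why a lazy IEnumerable result matters can be shown with a small stand-in. The sketch below does not use the library: Compare is a hypothetical method that compares two flat dictionaries member by member with yield return, and the Visited counter exists only to make the early exit observable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyComparisonSketch
{
    // Counts how many members were actually compared (for demonstration only).
    public static int Visited;

    // Hypothetical stand-in for a juxtaposition: one result per key, produced lazily.
    public static IEnumerable<(string Key, bool Same)> Compare(
        IDictionary<string, object> etalon,
        IDictionary<string, object> sample)
    {
        foreach (var key in etalon.Keys.Union(sample.Keys))
        {
            Visited++;
            etalon.TryGetValue(key, out var left);
            sample.TryGetValue(key, out var right);
            yield return (key, object.Equals(left, right));
        }
    }
}
```

A call such as LazyComparisonSketch.Compare(etalon, sample).Any(r => !r.Same) stops at the first difference, so the members after it are never compared. Materializing the sequence with ToList() would instead walk through everything.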

Let us turn to practice and look at the key points.

Code to generate diagnostic object graph
    using System;
    using System.Collections.Generic;
    using System.Runtime.Serialization;

    namespace Art.Replication.Diagnostics
    {
        [DataContract]
        public class Role
        {
            [DataMember] public string Name;
            public string CodePhrase;
            [DataMember] public DateTime LastOnline = DateTime.Now;
            [DataMember] public Person Person;
        }

        public class Person
        {
            public string FirstName;
            public string LastName;
            public DateTime Birthday;
            public List<Role> Roles = new List<Role>();
        }

        public static class DiagnosticsGraph
        {
            public static Person Create()
            {
                var person0 = new Person
                {
                    FirstName = "Keanu",
                    LastName = "Reeves",
                    Birthday = new DateTime(1964, 9, 2)
                };

                var roleA0 = new Role
                {
                    Name = "Neo",
                    CodePhrase = "The Matrix has you...",
                    LastOnline = DateTime.Now,
                    Person = person0
                };

                var roleB0 = new Role
                {
                    Name = "Thomas Anderson",
                    CodePhrase = "Follow the White Rabbit.",
                    LastOnline = DateTime.Now,
                    Person = person0
                };

                person0.Roles.Add(roleA0);
                person0.Roles.Add(roleB0);

                return person0;
            }
        }
    }


Namespaces that may come in handy
    using Art;
    using Art.Replication;
    using Art.Replication.Replicators;
    using Art.Replication.MemberProviders;
    using Art.Serialization;
    using Art.Serialization.Converters;


Creating a snapshot and serializing it into a string without distortion with default settings
    public static void CreateAndSerializeSnapshot()
    {
        var person0 = DiagnosticsGraph.Create();
        var snapshot0 = person0.CreateSnapshot();
        string rawSnapshot0 = snapshot0.ToString();
        Console.WriteLine(rawSnapshot0);
        Console.ReadKey();
    }


Output (the full structure of the snapshot is clearly visible)

    {
        #Id: 0,
        #Type: "Art.Replication.Diagnostics.Person, Art.Replication.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null",
        FirstName: "Keanu",
        LastName: "Reeves",
        Birthday: "1964-09-02T00:00:00.0000000+03:00"<DateTime>,
        Roles: {
            #Id: 1,
            #Type: "System.Collections.Generic.List`1[[Art.Replication.Diagnostics.Role, Art.Replication.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]], mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089",
            #Set: [
                {
                    #Id: 2,
                    #Type: "Art.Replication.Diagnostics.Role, Art.Replication.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null",
                    Name: "Neo",
                    LastOnline: "2017-06-14T14:42:44.0000575+03:00"<DateTime>,
                    Person: { #Id: 0 }
                },
                {
                    #Id: 3,
                    #Type: "Art.Replication.Diagnostics.Role, Art.Replication.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null",
                    Name: "Thomas Anderson",
                    LastOnline: "2017-06-14T14:42:44.0000575+03:00"<DateTime>,
                    Person: { #Id: 0 }
                }
            ]
        }
    }


• The Role class has the DataContract attribute, so only its fields marked with the DataMember attribute are in the snapshot; CodePhrase, which is not marked, is left out.

• Each object is assigned its own identifier (#Id: 0). If a reference to the object occurs in the object graph more than once, the following construct is substituted instead of replicating the object again.

  Person: { #Id: 0 } 

This protects against replicating the same object instance multiple times and, in the case of cyclic references, against infinite recursion and a StackOverflowException (note: not all serializers can cope with such situations).
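How identifiers stop the recursion can be sketched without the library. The code below is a deliberately simplified, hypothetical converter (ToSnapshot, Actor and Part are invented for this example): it handles public fields only, treats strings and value types as primitives, and ignores collections. Each object is registered under an id before its members are visited, so a repeated or cyclic reference becomes a short stub.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ReferenceTrackingSketch
{
    // Hypothetical simplification, not the library's algorithm.
    public static object ToSnapshot(object value, Dictionary<object, int> ids)
    {
        // Primitives are stored as they are.
        if (value == null || value is string || value.GetType().IsValueType)
            return value;

        // Seen before: emit a reference stub instead of descending again.
        if (ids.TryGetValue(value, out var knownId))
            return new Dictionary<string, object> { ["#Id"] = knownId };

        int id = ids.Count;
        ids[value] = id;   // register before descending, so cycles terminate

        var map = new Dictionary<string, object> { ["#Id"] = id };
        foreach (var field in value.GetType().GetFields(BindingFlags.Instance | BindingFlags.Public))
            map[field.Name] = ToSnapshot(field.GetValue(value), ids);
        return map;
    }
}

public class Actor
{
    public string Name;
    public Part Part;
}

public class Part
{
    public string Title;
    public Actor Actor;
}
```

For an Actor and a Part that refer to each other, the result is a map with #Id 0 containing a nested map with #Id 1, whose Actor entry is just the stub { #Id: 0 }.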

• Complete type information is added to each object under the #Type key.
• Some primitives also carry type information, for example Birthday: "1964-09-02T00:00:00.0000000+03:00"<DateTime>. It is needed to restore (deserialize) the snapshot without distortion.
• The List<Role> collection is serialized as an object, but it has the #Set property, which is used to list the nested objects.

However, do not assume that the library supports serialization and deserialization only in this full format: a more classic JSON can also be produced by adjusting the replication and keep profiles (minor distortions inherent in ordinary serializers are then possible).

Serialization of an object into classic json and its successful deserialization
    public static void UseClassicalJsonSettings()
    {
        Snapshot.DefaultReplicationProfile.AttachId = false;
        Snapshot.DefaultReplicationProfile.AttachType = false;
        Snapshot.DefaultReplicationProfile.SimplifySets = true;
        Snapshot.DefaultReplicationProfile.SimplifyMaps = true;
        Snapshot.DefaultKeepProfile.SimplexConverter.AppendTypeInfo = false;
        Snapshot.DefaultKeepProfile.SimplexConverter.Converters
            .OfType<NumberConverter>().First().AppendSyffixes = false;
    }

    public static void CreateAndSerializeSnapshotToClassicJsonStyle()
    {
        UseClassicalJsonSettings();
        var person0 = DiagnosticsGraph.Create();
        var snapshot0 = person0.CreateSnapshot();
        string rawSnapshot0 = snapshot0.ToString();
        Console.WriteLine(rawSnapshot0);
        var person0A = rawSnapshot0.ParseSnapshot().ReplicateGraph<Person>();
        Console.WriteLine(person0A.FirstName);
        Console.ReadKey();
    }


Classic json
    {
        FirstName: "Keanu",
        LastName: "Reeves",
        Birthday: "1964-09-02T00:00:00.0000000+03:00",
        Roles: [
            {
                Name: "Neo",
                LastOnline: "2017-06-14T18:31:20.0000205+03:00",
                Person: { #Id: 0 }
            },
            {
                Name: "Thomas Anderson",
                LastOnline: "2017-06-14T18:31:20.0000205+03:00",
                Person: { #Id: 0 }
            }
        ]
    }


About saving and restoring state without distortion

Of course, it is always convenient to be able to save the state of objects (graphs) to a string or a byte array for later recovery. As a rule, serialization mechanisms are used for this. They are quite capable, but on closer inspection a specific serializer often turns out to impose a number of hard restrictions on objects: it may require attributes or annotations, special methods, the absence of cyclic references in the graph, a parameterless constructor, or something else.

There are also two less obvious drawbacks common to many serializers. First, as mentioned earlier, if the graph contains several references to the same object instance, some serializers save it repeatedly, so deserialization produces several copies of the same object (the graph is significantly altered). Second, in some cases information about an object's type is lost, which leads to distorted types after deserialization: for example, long turns into int, or Guid into string, or vice versa.

    public class Distortion
    {
        public object[] AnyObjects =
        {
            Guid.NewGuid(),
            Guid.NewGuid().ToString(),
            DateTime.Now,
            DateTime.Now.ToString("O"),
            123,
            123L,
        };
    }

Replication Framework uses its own JSON serializer, which stores metadata about object types and supports multiple and cyclic references in a graph, which makes distortion-free deserialization possible.
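The type-loss problem and the effect of carrying a type tag along with the value can be demonstrated in isolation. The sketch below is a hypothetical illustration: the value<Type> syntax only imitates the <DateTime> suffix seen in the snapshot output, and Plain, Tagged and Parse are invented for this example rather than taken from the library.

```csharp
using System;
using System.Globalization;

public static class TypeTagSketch
{
    // Plain form: the runtime type is lost, 123 and 123L both become "123".
    public static string Plain(object value) =>
        Convert.ToString(value, CultureInfo.InvariantCulture);

    // Tagged form: the type name travels with the value (assumed syntax).
    public static string Tagged(object value) =>
        Plain(value) + "<" + value.GetType().Name + ">";

    // Restores the original type for the few types this sketch knows about.
    public static object Parse(string tagged)
    {
        int open = tagged.LastIndexOf('<');
        string text = tagged.Substring(0, open);
        string type = tagged.Substring(open + 1, tagged.Length - open - 2);

        switch (type)
        {
            case "Int32": return int.Parse(text, CultureInfo.InvariantCulture);
            case "Int64": return long.Parse(text, CultureInfo.InvariantCulture);
            case "Guid": return Guid.Parse(text);
            default: return text;   // String and unknown types stay as text
        }
    }
}
```

Plain(123) and Plain(123L) give the same text, so a reader of the plain form cannot tell int from long. After Parse(Tagged(...)), a long comes back as a long, a Guid as a Guid, and a string holding a Guid remains a string.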

Key Usage Scenarios

Replication:

    public static void Replicate()
    {
        var person0 = DiagnosticsGraph.Create();
        var snapshot0 = person0.CreateSnapshot();
        var person1 = snapshot0.ReplicateGraph<Person>();
        person1.Roles[1].Name = "Agent Smith";
        Console.WriteLine(person0.Roles[1].Name); // old graph value: Thomas Anderson
        Console.WriteLine(person1.Roles[1].Name); // new graph value: Agent Smith
        Console.ReadKey();
    }

Reconstruction:

    public static void Reconstruct()
    {
        var person0 = DiagnosticsGraph.Create();
        var cache = new Dictionary<object, int>();
        var s = person0.CreateSnapshot(cache);
        Console.WriteLine(person0.Roles[1].Name); // old graph value: Thomas Anderson
        Console.WriteLine(person0.FirstName); // old graph value: Keanu

        person0.Roles[1].Name = "Agent Smith";
        person0.FirstName = "Zion";
        person0.Roles.RemoveAt(0);

        var person1 = (Person)s.ReconstructGraph(cache);
        Console.WriteLine(person0.Roles[1].Name); // old graph value: Thomas Anderson
        Console.WriteLine(person1.Roles[1].Name); // old graph value: Thomas Anderson
        Console.WriteLine(person0.FirstName); // old graph value: Keanu
        Console.WriteLine(person1.FirstName); // old graph value: Keanu
        Console.ReadKey();
        // result: person0 and person1 are one and the same reconstructed graph
    }

Juxtaposition:

    public static void Juxtapose()
    {
        // use these settings for less detail in the output
        Snapshot.DefaultReplicationProfile.AttachId = false;
        Snapshot.DefaultReplicationProfile.AttachType = false;
        Snapshot.DefaultReplicationProfile.SimplifySets = true;
        Snapshot.DefaultReplicationProfile.SimplifyMaps = true;

        var person0 = DiagnosticsGraph.Create();
        var person1 = DiagnosticsGraph.Create();
        person0.Roles[1].Name = "Agent Smith";
        person0.FirstName = "Zion";

        var snapshot0 = person0.CreateSnapshot();
        var snapshot1 = person1.CreateSnapshot();
        var results = snapshot0.Juxtapose(snapshot1);
        foreach (var result in results)
        {
            Console.WriteLine(result);
        }

        Console.ReadKey();
    }

    <Different> [this.FirstName] {Zion} {Keanu}
    <Identical> [this.LastName] {Reeves} {Reeves}
    <Identical> [this.Birthday] {9/2/1964 12:00:00 AM} {9/2/1964 12:00:00 AM}
    <Identical> [this.Roles[0].Name] {Neo} {Neo}
    <Identical> [this.Roles[0].LastOnline] {6/14/2017 9:34:33 PM} {6/14/2017 9:34:33 PM}
    <Identical> [this.Roles[0].Person.#Id] {0} {0}
    <Different> [this.Roles[1].Name] {Agent Smith} {Thomas Anderson}
    <Identical> [this.Roles[1].LastOnline] {6/14/2017 9:34:33 PM} {6/14/2017 9:34:33 PM}
    <Identical> [this.Roles[1].Person.#Id] {0} {0}

About performance

Currently, the library performs fairly well, but it is worth understanding that a generalized intermediate snapshot mechanism imposes additional costs in both memory and execution speed in some tasks. In reality, though, the picture is not so clear-cut, since the snapshot mechanism can also yield a gain in a number of scenarios.

Losses:

- higher memory consumption when serializing objects
- serialization and subsequent deserialization approximately 2-2.5 times slower (depends on the serialization settings and the type of test)

Gains:

- copying a graph via a snapshot without serialization and deserialization (primitives need not be converted to a string or a byte array, which speeds things up)
- better memory usage when the state of large objects is stored partially in snapshots instead of copying them completely
* The performance comparison was made against BinaryFormatter, Newtonsoft.Json, and DataContractJsonSerializer.

A few words in conclusion about the Replication Framework

The solution was developed at the small creative studio Makeloft. The project is currently at the preview stage, yet its capabilities are already impressive, even though only the basic functionality has been implemented. A lot of time and effort went into development, so the framework is free only for educational and non-commercial projects.

Currently, a commercial license for use in a single project costs $15 (purchasing a license gives access to the source code and, if necessary, more detailed advice on technical subtleties, for example, how to replicate objects with parameterized constructors). The price will probably rise as the solution develops. If you plan to use the framework on an ongoing basis across a variety of projects, the cost of such a license can be negotiated individually.

You can download the trial version from NuGet; it is functional until September 2017. The project with the code examples from the article can be downloaded from here. If the library leaves a good impression and you decide to use it in one of your solutions, please send a request for a free or paid license to makeman@tut.by, indicating the name and type of the project in which you plan to use the library.

Thank you very much for your attention! Feel free to ask questions and share your wishes!

Source: https://habr.com/ru/post/330294/

