
Universal data converter on the .Net Framework

In this article I would like to share our team's experience of building a universal data converter. At first glance it sounds simple: what is so complicated about it? Take one data type and convert it to another. And if the data is a structure? Not hard either, you might say: you just need to map the fields. Yes, that's all. But when there are several target structures, all of them complex, the data has to be converted on the fly and also enriched, then, as they say, “you have to think”.
The team was given the following task:
Write a converter from one data structure to several other target structures. The storage format of both the source data and the destination data can be completely arbitrary. The conversion must be driven by rules that can be reused and edited. During conversion some data has to be recoded, for example, the string “#ff0000” must become the string “red”.
In addition, as usual, users want to be able to read and edit all integration conversions, i.e. the functionality had to be exposed in the UI with editing support.
So, let's get started. In theory, the input and output formats can be anything (CSV, JSON, etc.). For clarity, we will use XML.


An example source XML (“convert FROM”):


    <Car>
      <Color>#ff0000</Color>
      <Length>5296 mm</Length>
      <Width>1848 mm</Width>
      <Price>31000 USD</Price>
    </Car>

An example destination XML (“convert TO”):


    <Vehicle>
      <Body>
        <Exterior>
          <Color>red</Color>
        </Exterior>
        <Size>
          <MeasureUnit>ft</MeasureUnit>
          <Length>17.3753</Length>
          <Width>6.0630</Width>
        </Size>
      </Body>
      <Msrp>
        <Currency>RUB</Currency>
        <Value>1600000</Value>
      </Msrp>
    </Vehicle>

As you can see, a lot happens along the way: not only does the position of values in the structure change, but so do their types; some values are calculated, and the source does not contain all the data needed to build the final format (enrichment is required). A few examples:


  1. The car color Car.Color is stored in the source as the RGB code “#ff0000”, and in the destination object it must be recoded into its name, “red”, in the Vehicle.Body.Exterior.Color tag;
  2. The car length Car.Length must be split into its components (the numeric value and the unit of measurement) and converted to US feet; the result goes into Vehicle.Body.Size.Length;
  3. The car price Car.Price must also be split into components, converted into rubles at the Central Bank exchange rate on the conversion date, and put into Vehicle.Msrp.
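A minimal sketch of two of these conversions, color recoding and length parsing, is shown below. The class ValueConversions, its lookup tables and its method names are hypothetical and not taken from the team's code. The sketch assumes the source lengths are in millimetres, which is what the destination values correspond to (5296 mm / 304.8 mm per ft ≈ 17.3753 ft).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helpers for two conversions from the list above.
public static class ValueConversions
{
    // Assumed recoding table; in practice it would come from a reference book.
    private static readonly Dictionary<string, string> ColorNames =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["#ff0000"] = "red",
            ["#00ff00"] = "green",
            ["#0000ff"] = "blue",
        };

    // Millimetres per unit; only a few units are covered here.
    private static readonly Dictionary<string, decimal> MillimetresPerUnit =
        new Dictionary<string, decimal>(StringComparer.OrdinalIgnoreCase)
        {
            ["mm"] = 1m,
            ["cm"] = 10m,
            ["m"] = 1000m,
            ["ft"] = 304.8m,
        };

    // Returns the color name, or the original code if it is not in the table.
    public static string RecodeColor(string hex) =>
        ColorNames.TryGetValue(hex.Trim(), out var name) ? name : hex;

    // Splits "5296 mm" into (5296, "mm").
    public static (decimal Value, string Unit) SplitMeasure(string text)
    {
        var parts = text.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2)
            throw new FormatException($"Expected '<value> <unit>', got '{text}'");
        return (decimal.Parse(parts[0], CultureInfo.InvariantCulture), parts[1]);
    }

    // Converts a measurement string to feet, rounded to 4 decimal places.
    public static decimal ToFeet(string text)
    {
        var (value, unit) = SplitMeasure(text);
        if (!MillimetresPerUnit.TryGetValue(unit, out var mmPerUnit))
            throw new NotSupportedException($"Unknown unit '{unit}'");
        return Math.Round(value * mmPerUnit / MillimetresPerUnit["ft"], 4);
    }
}
```

For example, ValueConversions.ToFeet("5296 mm") returns 17.3753 and RecodeColor("#ff0000") returns "red". Using decimal rather than double keeps the unit factors exact.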

Choosing a container for accessing data


We cannot work with XML directly: first, it is just text, and second, there is a requirement not to be tied to any particular format. So it makes sense to work with in-memory container objects that offer a convenient interface for accessing their data and a structured type for referring to its parts.
Ordinary C# classes whose structure exactly matches the stored data are the best fit for this. Creating such a class is much easier if the XML is typed and has an XSD schema: a utility (for example, xsd.exe from the .Net Framework SDK) can generate the class automatically, and it can be used in code without extra work.
The classes for our structures are described below.


Source container class (C#):


    public class Car
    {
        public string Color;
        public string Length;
        public string Width;
        public string Price;
    }

Destination container classes (C#):


    public class Vehicle
    {
        public Body Body;
        public Msrp Msrp;
    }

    public class Body
    {
        public Exterior Exterior;
        public Size Size;
    }

    public class Msrp
    {
        public string Currency;
        public decimal Value;
    }

    public class Exterior
    {
        public string Color;
    }

    public class Size
    {
        public string MeasureUnit;
        public decimal Length;
        public decimal Width;
    }

Loading source data into container


The .Net Framework has ready-made components for XML deserialization (for example, XmlSerializer) that give us an instance of the class automatically filled with the source data.
If the file has a more specific format, writing a custom loading library is not difficult.
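For XML, System.Xml.Serialization.XmlSerializer is one such component. The sketch below assumes the root element of the source document is named Car and that its child elements match the field names; the SourceLoader helper is our own illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Car
{
    public string Color;
    public string Length;
    public string Width;
    public string Price;
}

public static class SourceLoader
{
    // Deserializes an XML document into a container of type T.
    // XmlSerializer maps elements to public fields/properties by name,
    // and by default expects the root element to be named after the type.
    public static T Load<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

A file-based variant would pass a StreamReader instead of a StringReader; the rest is unchanged.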


Access to container data


First, we need a single way to access data in containers of arbitrary structure, i.e. we need access to the container metadata. This is solved with .Net reflection: we can reach any property or field of a class and, knowing the type and location of the data, modify it.
To point at a specific structural element (node), we use an analogue of XPath for XML. For example, to refer to the color node in the source, it is enough to write the string “Car.Color”.
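A minimal sketch of dotted-path access via reflection follows. PathAccessor is a hypothetical helper; it assumes the first path segment names the root type and that containers expose public fields or properties, as the classes above do.

```csharp
using System;
using System.Reflection;

// Trimmed destination containers used as sample data.
public class Exterior { public string Color; }
public class Body { public Exterior Exterior; }
public class Vehicle { public Body Body; }

public static class PathAccessor
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    // Reads the value at a dotted path such as "Vehicle.Body.Exterior.Color".
    // The first segment names the root type and is checked, not navigated.
    public static object GetValue(object root, string path)
    {
        var segments = path.Split('.');
        if (segments[0] != root.GetType().Name)
            throw new ArgumentException($"Path '{path}' does not start at {root.GetType().Name}");

        object current = root;
        for (int i = 1; i < segments.Length && current != null; i++)
            current = ReadMember(current, segments[i]);
        return current; // null if an intermediate node is missing
    }

    // Looks up a public field first, then a public property.
    private static object ReadMember(object target, string name)
    {
        var type = target.GetType();
        var field = type.GetField(name, Flags);
        if (field != null) return field.GetValue(target);
        var property = type.GetProperty(name, Flags);
        if (property != null) return property.GetValue(target);
        throw new MissingMemberException(type.Name, name);
    }
}
```

Returning null for a missing intermediate node, rather than throwing, lets optional parts of a structure be skipped; a stricter variant could throw instead.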


Rules for converting source container data to destination container


So, we have two containers, both with a hierarchical structure. Now we need to learn how to convert the source container into the destination container.
As stated in the task, the conversion must be driven by a set of rules, and the rules must be generic enough to be reused.
In code, the components interact as follows (see the diagram below): the data is deserialized from XML into a .Net object (1-2); then, by accessing the container data (2), the conversion into the destination container is performed according to the list of rules (3) (2-3-4). The rules can also enrich the data (3-3'-3). Once the target container is populated, the data is written out in the final format (4-5).


Scheme 1. Interaction of components inside the converter:



Now let's build a rule-based conversion mechanism. The rules should let us describe any conversion.
Writing a new rule language and then a separate compiler or interpreter for it would clearly be overkill. We decided to use ordinary C# code, which can always be compiled and plugged into the existing functionality. We developed several interfaces and base classes in C#.


The converter itself:


    public interface IConverter
    {
        T Convert<T>(object source, IDictionary<string, ConversionRule> rules)
            where T : class, new();
        ...
    }

where the list of rules is an IDictionary<string, ConversionRule> whose string keys are paths to data in the destination container, for example, “Vehicle.Msrp”.


And the conversion rule:


    public abstract class ConversionRule
    {
        public abstract object GetValue(object source);
        ...
    }

The converter's job is to turn the given source object into a new object of type T according to the list of rules.
When converting the “source” to “destination”, the converter performs the following actions.



The source object is passed to each rule. The rule performs its calculation and returns the resulting value. As the example shows, the conversion rules are not strictly typed: an object goes in, and an object comes out.
Consider an example rule that gets the price of a car.
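The converter loop described above can be sketched as follows. This is an illustration under stated assumptions, not the team's implementation: SimpleConverter and ConstantRule are hypothetical, the first segment of each rule key is taken to be the root type name, missing intermediate objects are created through their parameterless constructors, and the value a rule returns must already match the type of the destination member.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public abstract class ConversionRule
{
    public abstract object GetValue(object source);
}

// Hypothetical rule that always returns the same value.
public class ConstantRule : ConversionRule
{
    private readonly object _value;
    public ConstantRule(object value) { _value = value; }
    public override object GetValue(object source) => _value;
}

// Trimmed destination containers.
public class Exterior { public string Color; }
public class Body { public Exterior Exterior; }
public class Msrp { public string Currency; public decimal Value; }
public class Vehicle { public Body Body; public Msrp Msrp; }

public class SimpleConverter
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    // Applies every rule to the source and stores its result at the rule's path.
    public T Convert<T>(object source, IDictionary<string, ConversionRule> rules)
        where T : class, new()
    {
        var destination = new T();
        foreach (var pair in rules)
            SetValue(destination, pair.Key, pair.Value.GetValue(source));
        return destination;
    }

    // Writes value at a path such as "Vehicle.Body.Exterior.Color".
    // Segment 0 names the root type; missing intermediate objects are created.
    private static void SetValue(object root, string path, object value)
    {
        var segments = path.Split('.');
        object current = root;
        for (int i = 1; i < segments.Length; i++)
        {
            var member = Find(current.GetType(), segments[i]);
            if (i == segments.Length - 1)
            {
                Write(member, current, value);
                return;
            }
            var next = Read(member, current);
            if (next == null)
            {
                next = Activator.CreateInstance(TypeOf(member));
                Write(member, current, next);
            }
            current = next;
        }
    }

    private static MemberInfo Find(Type type, string name) =>
        (MemberInfo)type.GetField(name, Flags) ?? type.GetProperty(name, Flags)
        ?? throw new MissingMemberException(type.Name, name);

    private static Type TypeOf(MemberInfo m) =>
        m is FieldInfo f ? f.FieldType : ((PropertyInfo)m).PropertyType;

    private static object Read(MemberInfo m, object target) =>
        m is FieldInfo f ? f.GetValue(target) : ((PropertyInfo)m).GetValue(target);

    private static void Write(MemberInfo m, object target, object value)
    {
        if (m is FieldInfo f) f.SetValue(target, value);
        else ((PropertyInfo)m).SetValue(target, value);
    }
}
```

Because each rule writes to its own path, the destination is built in one pass over the rule list; intermediate nodes such as Vehicle.Body appear only when some rule targets a path beneath them.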


Below is the settings table for this conversion rule:


Target node in the destination | Conversion rule (class in the assembly) | Rule parameters
Vehicle.Msrp | ConvertStringPriceToMsrp | TargetCurrency = “RUB”, SourcePath = “Car.Price”

Example of a custom conversion rule class:


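    public class ConvertStringPriceToMsrp : ConversionRule
    {
        public string TargetCurrency;  // filled from the rule config, e.g. "RUB"
        public string SourcePath;      // path in the source container, e.g. "Car.Price"

        public override object GetValue(object source)
        {
            var targetObject = new Msrp();
            targetObject.Currency = TargetCurrency;
            // Read the price string from the source, split it and convert it to TargetCurrency
            targetObject.Value = SplitAndCalc(GetField(source, SourcePath), TargetCurrency);
            return targetObject;
        }
        ...
    }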

Before the rule runs, it is initialized: its fields and properties are enumerated via reflection, and TargetCurrency and SourcePath are filled with the values from the config (the set of parameters for this particular rule instance).
When processing this rule, the ConvertStringPriceToMsrp object takes the value of the Car.Price field in the source container, splits the string into its components (amount and currency), and creates the resulting Msrp object, filling in Msrp.Currency = RUB and Msrp.Value = [price in rubles].
As the description shows, the rule still has to query an external data source to get the current ruble/dollar exchange rate. In other words, a conversion rule can connect to any external data source and enrich the data.
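Rule initialization from a config could look roughly like the sketch below. RuleInitializer is hypothetical, the config is assumed to be a simple name-to-string dictionary, and values are converted with Convert.ChangeType, which covers IConvertible types such as string, int and decimal.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

public static class RuleInitializer
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    // Copies config values (member name -> string) into the rule's public fields
    // and writable properties. Keys that match no member are ignored in this sketch.
    public static T Configure<T>(T rule, IDictionary<string, string> config) where T : class
    {
        var type = rule.GetType();
        foreach (var pair in config)
        {
            var field = type.GetField(pair.Key, Flags);
            if (field != null)
            {
                field.SetValue(rule, ConvertTo(pair.Value, field.FieldType));
                continue;
            }
            var property = type.GetProperty(pair.Key, Flags);
            if (property != null && property.CanWrite)
                property.SetValue(rule, ConvertTo(pair.Value, property.PropertyType));
        }
        return rule;
    }

    // Handles IConvertible targets such as string, int and decimal.
    private static object ConvertTo(string text, Type type) =>
        Convert.ChangeType(text, type, CultureInfo.InvariantCulture);
}

// Stand-in for the rule above: only the configurable fields are kept.
public class ConvertStringPriceToMsrp
{
    public string TargetCurrency;
    public string SourcePath;
}
```

Silently ignoring unknown keys keeps old configs working after a rule loses a parameter; a stricter loader could report them instead.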
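The enrichment step can be isolated behind an interface so that the rule does not depend on a particular rate source. In the sketch below, IExchangeRateProvider, FixedRateProvider and this SplitAndCalc signature are all assumptions (the article does not show how SplitAndCalc works), and the fixed rate in the usage note is illustrative, not a real Central Bank rate.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical abstraction over an external rate source (e.g. a central bank feed).
public interface IExchangeRateProvider
{
    // Returns how many units of 'to' one unit of 'from' is worth on 'date'.
    decimal GetRate(string from, string to, DateTime date);
}

// Fixed rates for illustration only; the date is ignored.
public class FixedRateProvider : IExchangeRateProvider
{
    private readonly Dictionary<(string, string), decimal> _rates;
    public FixedRateProvider(Dictionary<(string, string), decimal> rates) { _rates = rates; }

    public decimal GetRate(string from, string to, DateTime date) =>
        from == to ? 1m : _rates[(from, to)];
}

public static class PriceConversion
{
    // Splits "31000 USD" into amount and currency, then converts to targetCurrency.
    public static decimal SplitAndCalc(string price, string targetCurrency,
                                       IExchangeRateProvider rates, DateTime date)
    {
        var parts = price.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2)
            throw new FormatException($"Expected '<amount> <currency>', got '{price}'");
        var amount = decimal.Parse(parts[0], CultureInfo.InvariantCulture);
        return amount * rates.GetRate(parts[1], targetCurrency, date);
    }
}
```

With a made-up rate of 50 RUB per USD, SplitAndCalc("31000 USD", "RUB", provider, date) returns 1550000. In production the provider would be the component that calls the external service, and it is a natural candidate for caching.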


Exporting destination data from the container


The target object is written to XML in the same way: it is serialized with the ready-made .Net Framework library, which neatly writes the class fields and properties into the XML structure.


If the destination format is also specific, you need to write an adapter that saves the destination container in the required format.
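Serialization back to XML mirrors loading. The sketch uses XmlSerializer with XmlWriterSettings for indentation and an empty XmlSerializerNamespaces to drop the default xsi/xsd namespace declarations; DestinationWriter and the trimmed container classes are illustrative.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Trimmed destination containers.
public class Exterior { public string Color; }
public class Body { public Exterior Exterior; }
public class Msrp { public string Currency; public decimal Value; }
public class Vehicle { public Body Body; public Msrp Msrp; }

public static class DestinationWriter
{
    // Serializes a container to indented XML without the XML declaration.
    public static string Save<T>(T container)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };

        // An empty prefix/namespace pair suppresses the default xmlns:xsi/xmlns:xsd attributes.
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty);

        using (var text = new StringWriter())
        using (var writer = XmlWriter.Create(text, settings))
        {
            serializer.Serialize(writer, container, namespaces);
            writer.Flush();
            return text.ToString();
        }
    }
}
```

Fields left null (for example, Body.Size when no rule fills it) are simply omitted from the output.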


Merits and challenges of the current prototype


To load service reference libraries automatically (for data enrichment and for reusable reference books), we introduced the Autofac IoC container. As a result, when converting large volumes of homogeneous data, we eliminated extra I/O load and sped up processing.


Conversion to the destination is done in a single pass, without unnecessary loops.
Thanks to recursion, a node's value can be chosen conditionally (“by choice”). This is very useful for XML when the structure of one tag depends on another (for example, different tags are filled in depending on the product type; we use this heavily when generating XML for the Amazon API).
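One way to read “choice by value” is a composite rule that evaluates a selector on the source and delegates to one of several nested rules, which may themselves be composite. ChoiceRule, ConstantRule and the product-type keys in the usage note are hypothetical; the team's actual mechanism is not shown in the article.

```csharp
using System;
using System.Collections.Generic;

public abstract class ConversionRule
{
    public abstract object GetValue(object source);
}

// Hypothetical rule that always returns the same value.
public class ConstantRule : ConversionRule
{
    private readonly object _value;
    public ConstantRule(object value) { _value = value; }
    public override object GetValue(object source) => _value;
}

// Hypothetical composite rule: a selector computes a key from the source, and the
// matching nested rule (which may itself be a ChoiceRule) produces the node value.
public class ChoiceRule : ConversionRule
{
    private readonly Func<object, string> _selector;
    private readonly IDictionary<string, ConversionRule> _options;
    private readonly ConversionRule _fallback;

    public ChoiceRule(Func<object, string> selector,
                      IDictionary<string, ConversionRule> options,
                      ConversionRule fallback = null)
    {
        _selector = selector ?? throw new ArgumentNullException(nameof(selector));
        _options = options ?? throw new ArgumentNullException(nameof(options));
        _fallback = fallback;
    }

    public override object GetValue(object source)
    {
        var key = _selector(source);
        if (key != null && _options.TryGetValue(key, out var rule))
            return rule.GetValue(source);
        // No match: use the fallback rule, or leave the node empty (null).
        return _fallback?.GetValue(source);
    }
}
```

For example, a selector that reads the product type could route "Shoes" to a rule that builds a shoe-size block and "Toys" to one that builds an age-range block. Since a ChoiceRule is itself a ConversionRule, choices can be nested to any depth.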


At the same time, all work with metadata relies on reflection, so speed is a potential problem on the horizon. It will show up when reflection overhead starts to dominate the fast calculations inside the conversion rules. So far this has not happened. If it does, one idea is to cache destination container types during batch processing.
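The article only mentions caching as an idea. One possible form of it, sketched below, memoizes reflection lookups per (type, member name) in a ConcurrentDictionary, so that batch processing of many containers of the same type searches the metadata only once. MemberCache is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Hypothetical cache of reflection lookups keyed by (type, member name).
public static class MemberCache
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    private static readonly ConcurrentDictionary<(Type, string), MemberInfo> Cache =
        new ConcurrentDictionary<(Type, string), MemberInfo>();

    // Number of distinct (type, name) pairs resolved so far.
    public static int Count => Cache.Count;

    // Resolves the member once per (type, name); later calls hit the dictionary.
    // Under contention the factory may run more than once, which is harmless here.
    public static MemberInfo Find(Type type, string name) =>
        Cache.GetOrAdd((type, name), key =>
            (MemberInfo)key.Item1.GetField(key.Item2, Flags)
            ?? key.Item1.GetProperty(key.Item2, Flags)
            ?? throw new MissingMemberException(key.Item1.Name, key.Item2));

    public static object Read(object target, string name)
    {
        var member = Find(target.GetType(), name);
        return member is FieldInfo field
            ? field.GetValue(target)
            : ((PropertyInfo)member).GetValue(target);
    }
}
```

This removes the lookup cost but not the cost of FieldInfo.GetValue itself; if that ever dominates, compiled accessors (for example, built with System.Linq.Expressions) are the usual next step.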


We moved all rule settings into a web interface so that users can change them quickly. The conversion settings were first stored in XML, but for easier editing we moved them to a database.


With all its advantages and disadvantages, we did get the desired “Universal data converter on the .Net Framework”. It is now actively used in the modules that publish products on Amazon, Walmart and other marketplaces, where constant mapping, conversion and enrichment of data are required.



Source: https://habr.com/ru/post/347288/

