This article is about the dual use of the Expression Trees API: parsing expressions and generating code. Parsing expressions helps build representation structures (which are also the representation structures of a problem-oriented Internal DSL), while code generation allows you to dynamically create efficient functions — sets of instructions defined by those representation structures.
I will demonstrate the dynamic creation of property iterators: serialize, copy, clone, equals. Using serialize as the example, I will show how to optimize serialization (compared to streaming serializers) in the classic situation where prior knowledge is used to improve performance. The idea is that a call to a streaming serializer always loses to a "non-streaming" function that knows exactly which nodes of the tree need to be traversed. At the same time, such a serializer is created not by hand but dynamically, according to predefined traversal rules. The proposed Internal DSL solves the problem of compactly describing the rules for traversing tree structures of objects by their properties/fields (and, more generally, traversing a tree of computations with named nodes). The serializer's benchmark win is modest, but it matters: it gives the approach built around this particular Internal DSL Includes (a dialect of Include / ThenInclude from EF Core), and the application of Internal DSL as a whole, the necessary persuasiveness.
Compare:
```csharp
var p = new Point() { X = -1, Y = 1 };
// which has better performance?
var json1 = JsonConvert.SerializeObject(p);
var json2 = $"{{\"X\":{p.X}, \"Y\":{p.Y}}}";
```
The second method is obviously faster (the nodes are known and "baked into the code"), but it is also more complicated. However, when you get this code as a function (dynamically generated and compiled), the complexity is hidden — it even becomes unclear where reflection ends and run-time code generation begins.
```csharp
var p = new Point() { X = -1, Y = 1 };
// which has better performance?
var json1 = JsonConvert.SerializeObject(p);

var formatter = JsonManager.ComposeFormatter<Point>();
var json2 = formatter(p);
```
Here `JsonManager.ComposeFormatter` is a real tool. The rule by which the structure traversal is generated during serialization is not spelled out, but it reads as "with the default parameters, traverse all first-level fields of custom value types". The same rule set explicitly:
```csharp
var formatter2 = JsonManager.ComposeFormatter<Point>(
    chain => chain
        .Include(e => e.X)
        .Include(e => e.Y)  // DSL Includes
);
```
This is a metadata description written in DSL Includes. The analysis of the pros and cons of describing metadata with a DSL is covered further below; for now, ignoring the form of the metadata record, I want to emphasize that C# provides the ability to assemble and compile the "perfect serializer" using Expression Trees.
The transition from the formatter to the serializer (for now without expression trees):
```csharp
Func<StringBuilder, Point, bool> serializer = ... // defined later

string formatter(Point p)
{
    var stringBuilder = new StringBuilder();
    serializer(stringBuilder, p);
    return stringBuilder.ToString();
}
```
In turn, the serializer is built like this (when written as static code):
```csharp
Expression<Func<StringBuilder, Point, bool>> serializerExpression =
    (sb, p) => SerializeAssociativeArray(sb, p,
        (sb1, t1) => SerializeValueProperty(sb1, t1, "X", o => o.X, SerializeValueToString),
        (sb2, t2) => SerializeValueProperty(sb2, t2, "Y", o => o.Y, SerializeValueToString)
    );
Func<StringBuilder, Point, bool> serializer = serializerExpression.Compile();
```
Why so "functionally", why it is impossible to set the serialization of two fields through a "semicolon"? In short: because this expression can be assigned to a variable of type Expression<Func<StringBuilder, Box, bool>>
, and a "semicolon" is impossible.
Why not simply write `Func<StringBuilder, Point, bool> serializer = (sb, p) => SerializeAssociativeArray(sb, p, ...)`? That is possible, but I am demonstrating not the creation of a delegate but the assembly of an expression tree (here in static code) with its subsequent compilation into a delegate. In practical use the serializerExpression will be assembled quite differently — dynamically (below).
What is important in the solution itself: `SerializeAssociativeArray` accepts an array of `params Func<..> propertySerializers` — one per node that must be traversed. Some of them can be "leaves" handled by `SerializeValueProperty` (which accepts the `SerializeValueToString` formatter), while others can again be `SerializeAssociativeArray` (i.e. branches); this is how the traversal iterator is built.
If Point contained a NextPoint property:
```csharp
Func<StringBuilder, Point, bool> @delegate = (sb, p) => SerializeAssociativeArray(sb, p,
    (sb1, t1) => SerializeValueProperty(sb1, t1, "X", o => o.X, SerializeValueToString),
    (sb2, t2) => SerializeValueProperty(sb2, t2, "Y", o => o.Y, SerializeValueToString),
    (sb3, t3) => SerializeValueProperty(sb3, t3, "NextPoint", o => o.NextPoint,
        (sb4, p4) => SerializeAssociativeArray(sb4, p4,
            (sb5, t5) => SerializeValueProperty(sb5, t5, "X", o => o.X, SerializeValueToString),
            (sb6, t6) => SerializeValueProperty(sb6, t6, "Y", o => o.Y, SerializeValueToString)
        )
    )
);
```
The structure of the three functions `SerializeAssociativeArray`, `SerializeValueProperty`, `SerializeValueToString` is not complicated:
```csharp
public static bool SerializeAssociativeArray<T>(StringBuilder stringBuilder, T t,
    params Func<StringBuilder, T, bool>[] propertySerializers)
{
    var @value = false;
    stringBuilder.Append('{');
    foreach (var propertySerializer in propertySerializers)
    {
        var notEmpty = propertySerializer(stringBuilder, t);
        if (notEmpty)
        {
            if (!@value)
                @value = true;
            stringBuilder.Append(',');
        }
    }
    stringBuilder.Length--;
    if (@value)
        stringBuilder.Append('}');
    return @value;
}

public static bool SerializeValueProperty<T, TProp>(StringBuilder stringBuilder, T t,
    string propertyName, Func<T, TProp> getter, Func<StringBuilder, TProp, bool> serializer)
    where TProp : struct
{
    stringBuilder.Append('"').Append(propertyName).Append('"').Append(':');
    var value = getter(t);
    var notEmpty = serializer(stringBuilder, value);
    if (!notEmpty)
        stringBuilder.Length -= (propertyName.Length + 3);
    return notEmpty;
}

public static bool SerializeValueToString<T>(StringBuilder stringBuilder, T t)
    where T : struct
{
    stringBuilder.Append(t);
    return true;
}
```
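For completeness, a small usage sketch of the pieces above (my addition, using the `serializer` and `formatter` defined earlier):

```csharp
var point = new Point { X = -1, Y = 1 };

var stringBuilder = new StringBuilder();
serializer(stringBuilder, point);
Console.WriteLine(stringBuilder.ToString()); // {"X":-1,"Y":1}

Console.WriteLine(formatter(point));         // same result via the wrapping formatter
```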
Many details are omitted here (support for lists, reference types and nullable). Still, it is clear that the output will really be JSON; the rest are even more typical functions: `SerializeArray`, `SerializeNullable`, `SerializeRef`.
So far this was a static Expression Tree — not dynamic, not yet "eval in C#".
Now see how the Expression Tree is built dynamically, in two steps: first the `Expression<T>` is assembled from nodes, then it is compiled. At first sight such code surprises — nothing seems clear — but you can see how the first few lines put together something like `("sb","t") .. SerializeAssociativeArray..`, and the connection with the source code becomes apparent. Once you learn this kind of notation (combining `Expression.Constant`, `Expression.Parameter`, `Expression.Call`, `Expression.Lambda`, etc.), you can really assemble dynamically any traversal of the nodes, based on metadata. This is "eval in C#".
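Since the original listing is not reproduced here, below is a minimal sketch of my own showing the same two steps — assembling the tree from `Expression.Parameter` / `Expression.Constant` / `Expression.Call` / `Expression.Lambda` and compiling it. The static class name `Serializers` (holding the three helper methods above) and the assumption that `Point` has `int` properties `X` and `Y` are mine, for the sake of the example:

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

static Func<StringBuilder, Point, bool> ComposePointSerializer()
{
    var sb = Expression.Parameter(typeof(StringBuilder), "sb");
    var t = Expression.Parameter(typeof(Point), "t");

    // builds (sb1, t1) => SerializeValueProperty(sb1, t1, name, o => o.<name>, SerializeValueToString)
    // from a property name alone, i.e. purely from metadata
    Expression PropertySerializer(string name)
    {
        var sb1 = Expression.Parameter(typeof(StringBuilder), "sb1");
        var t1 = Expression.Parameter(typeof(Point), "t1");
        var o = Expression.Parameter(typeof(Point), "o");
        var getter = Expression.Lambda<Func<Point, int>>(Expression.Property(o, name), o);
        Func<StringBuilder, int, bool> leaf = Serializers.SerializeValueToString;
        var call = Expression.Call(
            typeof(Serializers).GetMethod("SerializeValueProperty")
                .MakeGenericMethod(typeof(Point), typeof(int)),
            sb1, t1, Expression.Constant(name), getter, Expression.Constant(leaf));
        return Expression.Lambda<Func<StringBuilder, Point, bool>>(call, sb1, t1);
    }

    var body = Expression.Call(
        typeof(Serializers).GetMethod("SerializeAssociativeArray")
            .MakeGenericMethod(typeof(Point)),
        sb, t,
        Expression.NewArrayInit(typeof(Func<StringBuilder, Point, bool>),
            PropertySerializer("X"), PropertySerializer("Y")));

    return Expression.Lambda<Func<StringBuilder, Point, bool>>(body, sb, t).Compile();
}
```

The resulting delegate is equivalent to the statically written `serializer` above, except that `"X"` and `"Y"` came from metadata, not from source code.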
Such code looks like what a decompiler would show — only assembled by a human.
Only the author of the interpreter needs to be drawn into this bead embroidery. All this artwork stays inside the serialization library. The important idea to grasp is that libraries can provide dynamically generated, compiled, efficient functions in C# (and .NET Standard).
However, a streaming serializer will overtake the dynamically generated function if compilation is invoked before every serialization (compilation inside `ComposeFormatter` is a costly operation), but you can save the reference and reuse it:
```csharp
static Func<Point, string> formatter = JsonManager.ComposeFormatter<Point>();

public string Get(Point p)
{
    // which has better performance?
    var json1 = JsonConvert.SerializeObject(p);
    var json2 = formatter(p);
    return json2;
}
```
If, however, you need to build and save a serializer of anonymous types for reuse, then additional infrastructure is needed:
```csharp
static CachedFormatter cachedFormatter = new CachedFormatter();

public string Get(List<Point> list)
{
    // The json formatter will be built only on the first call
    // and assigned to cachedFormatter.Formatter;
    // in all subsequent calls cachedFormatter.Formatter is reused.
    // Since building the formatter is deterministic, it is lock free.
    var json3 = list.Select(e => new { X = e.X, Sum = e.X + e.Y })
                    .ToJson(cachedFormatter, e => e.Sum);
    return json3;
}
```
After that we confidently count the first micro-optimization in our favor and keep accumulating. Joking aside, before turning to what the new serializer can do that others cannot, let me fix the obvious advantage: it will be faster.
The DSL Includes interpreter for serialize (and likewise for the iterators equals, copy, clone — which will also be shown) required the following costs:
1 - costs of the infrastructure for storing references to compiled code.
These costs are not strictly necessary, and neither is the use of Expression Trees with compilation: the interpreter could also build a serializer on reflection and even polish it enough to approach the speed of streaming serializers (by the way, the copy, clone and equals iterators shown at the end of the article are not assembled through expression trees, and nobody polished them — there was no such goal, unlike the goal of "overtaking" ServiceStack and Json.NET within the commonly understood json optimization problem, a necessary condition for presenting a new solution). A minimal reflection-only sketch of such an iterator is shown below.
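For contrast, here is what a reflection-only counterpart might look like (my illustration, not the library's code): first-level properties discovered via GetProperties() on every call, without escaping or nesting.

```csharp
using System;
using System.Linq;
using System.Text;

static string ReflectionFormatter<T>(T value)
{
    var properties = typeof(T).GetProperties().Where(p => p.CanRead).ToArray();
    var sb = new StringBuilder("{");
    for (int i = 0; i < properties.Length; i++)
    {
        if (i > 0) sb.Append(',');
        sb.Append('"').Append(properties[i].Name).Append("\":")
          .Append(properties[i].GetValue(value));
    }
    return sb.Append('}').ToString();
}
```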
2 - you need to keep the leaking abstractions in your head, as well as a related problem: changes in semantics compared to existing solutions.
For example, Point and `IEnumerable<Point>` need two different serializers:
```csharp
var formatter1 = JsonManager.ComposeFormatter<Point>();
var formatter2 = JsonManager.ComposeEnumerableFormatter<Point>();
// but not
// var formatter2 = JsonManager.ComposeEnumerableFormatter<List<Point>>();
```
Or: "Does closure work / closure?". It works, only the node must be given a name (unique):
string DATEFORMAT= "YYYY"; var formatter3 = JsonManager.ComposeFormatter<Record>( chain => chain .Include(i => i.RecordId) .Include(i => i.CreatedAt.ToString(DATEFORMAT) , "CreatedAt"); );
This behavior is dictated by the internal workings of the specific interpreter, `ComposeFormatter`.
Costs of this kind are an inevitable evil. Moreover, it turns out that as the functionality grows and the scope of the Internal DSL expands, the abstraction leaks grow too. This of course depresses the developer of the Internal DSL; here you need to stock up on a philosophical mood.
For a user, abstraction leaks are overcome by knowing the technical details of the Internal DSL (what to expect?) and the richness of functionality of the particular DSL and its interpreters (what do you get in return?). Therefore the answer to the question "is it worth creating and using an Internal DSL?" can only be a story about the functionality of a particular DSL — about all its small conveniences and application possibilities (interpreters) — i.e. a story about overcoming the costs.
Having all this in mind, I return to the effectiveness of a particular DSL Includes.
Much greater efficiency is achieved when the triple (DTO, transformation to DTO, serialization of DTO) is replaced by a single, precisely instructed, generated serialization function. Ultimately the function/data dualism allows you to state "a DTO is such a function" and to set the goal: learn how to define that DTO function.
Serialization can be configured not only by member access expressions (e.g. `e => e.Name`) but generally by any function (`e => e.Name.ToUpper()`, "MyMemberName"), and a formatter can be set for a specific node. Other features that increase flexibility:
Everywhere the same constructions participate — tree traversal, branch, leaf — and all of this can be written with DSL Includes.
Since everyone is familiar with EF Core, the meaning of the following expressions should be clear immediately (this is a kind of subset of xpath).
```csharp
// DSL Includes
Include<User> include1 = chain => chain
    .IncludeAll(e => e.Groups)
    .IncludeAll(e => e.Roles)
        .ThenIncludeAll(e => e.Privileges);

// EF Core syntax
// https://docs.microsoft.com/en-us/ef/core/querying/related-data
var users = context.Users
    .Include(blog => blog.Groups)
    .Include(blog => blog.Roles)
        .ThenInclude(blog => blog.Privileges);
```
Listed here are the nodes "with navigation" — the "branches".
Which "leaf" nodes (fields/properties) are included in a tree specified this way? None. To include the leaves you must either list them explicitly:
```csharp
Include<User> include2 = chain => chain
    .Include(e => e.UserName)               // leaf member
    .IncludeAll(e => e.Groups)
        .ThenInclude(e => e.GroupName)      // leaf member
    .IncludeAll(e => e.Roles)
        .ThenInclude(e => e.RoleName)       // leaf member
    .IncludeAll(e => e.Roles)
        .ThenIncludeAll(e => e.Privileges)
            .ThenInclude(e => e.PrivilegeName); // leaf member
```
Or add dynamically according to the rule, through a specialized interpreter:
```csharp
// Func<ChainNode, MemberInfo> rule = ...
var include2 = IncludeExtensions.AppendLeafs(include1, rule);
```
Here the rule is a function that, given a ChainNode (the internal representation of DSL Includes, more on it later) — i.e. by the type of the expression the node returns — selects the properties (MemberInfo) to participate in serialization: for example, only properties, or only read/write properties, or only those for which a formatter exists; the selection can also go by a list of types, and even the include expression itself can feed the rule (if it lists leaf nodes — that is, by the shape of the tree). A sketch of such a rule follows below.
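As an illustration of what such a rule does (the exact delegate shape the library expects may differ; this is my assumption): given the node's type, select public read/write properties of value or string type.

```csharp
using System;
using System.Linq;
using System.Reflection;

static MemberInfo[] DefaultLeafRule(Type nodeType) =>
    nodeType.GetProperties(BindingFlags.Public | BindingFlags.Instance)
        .Where(p => p.CanRead && p.CanWrite
                    && (p.PropertyType.IsValueType || p.PropertyType == typeof(string)))
        .Cast<MemberInfo>()
        .ToArray();
```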
Or... leave it to the discretion of the interpreter's user, who decides what to do with the nodes. DSL Includes is just a metadata record — how to interpret that record is up to the interpreter. It can interpret the metadata however it wants, even ignore it. Some interpreters perform the action themselves, others build a function ready to perform it (via an Expression Tree, or even Reflection.Emit). A good Internal DSL is designed for universal use and for the existence of many interpreters, each with its own specifics and its own abstraction leaks.
Code using Internal DSL can be very different from what it was before.
Integration with EF Core.
A common task is to "cut off circular references" — to allow only what is specified in the include expression to be serialized:
```csharp
static CachedFormatter cachedFormatter1 = new CachedFormatter();

string GetJson()
{
    using (var dbContext = GetEfCoreContext())
    {
        string json = EfCoreExtensions.ToJsonEf<User>(cachedFormatter1, dbContext,
            chain => chain
                .IncludeAll(e => e.Roles)
                    .ThenIncludeAll(e => e.Privileges));
        return json;
    }
}
```
The `ToJsonEf` interpreter accepts the navigation sequence and uses it for serialization: it selects the leaves by the "default for EF Core" rule (public read/write properties), consults the model to know where a string/json should be inserted as is, and uses the default field formatters (byte[] to string, DateTime in ISO, etc.). It therefore also has to execute the IQueryable itself.
When the result is transformed, the rules change: there is no need to use DSL Includes to specify the navigation (unless the rule is reused), a different interpreter is used, and the configuration happens locally:
```csharp
static CachedFormatter cachedFormatter1 = new CachedFormatter();

string GetJson()
{
    using (var dbContext = GetEfCoreContext())
    {
        var json = dbContext.ParentRecords
            // back to EF Core includes, but .Include(include1) is also possible
            .IncludeAll(e => e.Roles)
                .ThenIncludeAll(e => e.Privileges)
            .Select(e => new
            {
                FieldA = e.FieldA,
                FieldJson = "[1,2,3]",
                Role = e.Roles.First()
            })
            .ToJson(cachedFormatter1,
                chain => chain.Include(e => e.Role),
                LeafRuleManager.DefaultEfCore,
                config: rules => rules
                    .AddRule<string[]>(GetStringArrayFormatter)
                    .SubTree(
                        chain => chain.Include(e => e.FieldJson),
                        stringAsJsonLiteral: true) // json as is
                    .SubTree(
                        chain => chain.Include(e => e.Role),
                        subRules => subRules
                            .AddRule<DateTime>(
                                dateTimeFormat: "YYYMMDD",
                                floatingPointFormat: "N2"
                            )
                    ),
                useToString: false, // no default ToString for unknown leaf type (throw exception)
                dateTimeFormat: "YYMMDD",
                floatingPointFormat: "N2");
        return json;
    }
}
```
Clearly, all these details, all this "by default" behavior, can be memorized only if it is really necessary and/or the interpreter is your own. On the other hand, we again return to the pluses: the DTO is not smeared across the code, it is defined by a single function, and the interpreters are universal. There is less code — that is already good.
A warning is needed: although in ASP prior knowledge would seem to be always available, and a streaming serializer is not an absolute necessity in a web world where even databases return data as json, using DSL Includes in ASP MVC is not the simplest thing. How to combine functional programming with ASP MVC deserves a separate study.
In this article I limit myself to the subtleties of DSL Includes; I will show both the new functionality and the abstraction leaks, so that the "costs and gains" analysis is covered in full.
```csharp
Include<Point> include = chain => chain.Include(e => e.X).Include(e => e.Y);
```
This differs from EF Core Includes, which are built on static extension methods that cannot be assigned to variables and passed as parameters. DSL Includes itself was born from the need to pass an "include" into my implementation of the Repository pattern without degrading type information, which would happen if includes were translated into strings in the standard way.
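A sketch of that motivation (the repository method name here is hypothetical): the include expression is a first-class value that can be stored in a variable and passed around.

```csharp
Include<User> include = chain => chain
    .IncludeAll(e => e.Roles)
        .ThenIncludeAll(e => e.Privileges);

// hypothetical repository API that accepts the include expression as a parameter
IReadOnlyCollection<User> users = userRepository.List(include);
```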
The most fundamental difference is still the purpose. EF Core Includes enables navigation properties (branch nodes); DSL Includes is a record of a computation-tree traversal that assigns a name (path) to the result of each computation.
The internal representation of EF Core Includes is a list of strings obtained from MemberExpression.Member (an expression like `e => User.Name` can only be a MemberExpression, https://msdn.microsoft.com/en-us/library/system.linq.expressions.memberexpression(v=vs.110).aspx), and only the `Name` string is preserved in the internal representation.
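A rough illustration (my sketch, not EF Core's actual source) of what "only the Name string is preserved" means when an include lambda is reduced to a string path:

```csharp
using System;
using System.Linq.Expressions;

static string GetMemberName<T, TProp>(Expression<Func<T, TProp>> include)
{
    var member = (MemberExpression)include.Body; // only e => e.SomeProperty is accepted
    return member.Member.Name;                   // the rest of the expression is discarded
}

// GetMemberName((User e) => e.Name) == "Name"
```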
In DSL Includes, the internal representation consists of the ChainNode and ChainMemberNode classes that store the entire expression (e.g. `e => User.Name`), which can be embedded into an Expression Tree as-is. It follows that DSL Includes supports fields, custom value types and function calls:
Calling functions:
```csharp
Include<User> include = chain => chain
    .Include(i => i.UserName)
    .Include(i => i.Email.ToUpper(), "EAddress");
```
What to do with it depends on the interpreter. CreateFormatter will produce `{"UserName": "John", "EAddress": "JOHN@MAIL.COM"}`.
Calling functions is also useful for defining traversal over nullable structs:
```csharp
Include<StrangePointF> include = chain => chain
    .Include(e => e.NextPoint) // NextPoint is a nullable struct
        .ThenIncluding(e => e.Value.X)
        .ThenInclude(e => e.Value.Y);

// but not this way (abstraction leak)
// Include<StrangePointF> include
//     = chain => chain                     // now this can throw an exception
//         .Include(e => e.NextPoint.Value)
//             .ThenIncluding(e => e.X)
//             .ThenInclude(e => e.Y);
```
DSL Includes also has a shorthand for multi-level traversal: ThenIncluding.
```csharp
Include<User> include = chain => chain
    .Include(i => i.UserName)
    .IncludeAll(i => i.Groups)
        // ING-form — doesn't change the current node
        .ThenIncluding(e => e.GroupName)        // leaf
        .ThenIncluding(e => e.GroupDescription) // leaf
        .ThenInclude(e => e.AdGroup);           // leaf
```
compare with
```csharp
Include<User> include = chain => chain
    .Include(i => i.UserName)
    .IncludeAll(i => i.Groups)
        .ThenInclude(e => e.GroupName)
    .IncludeAll(i => i.Groups)
        .ThenInclude(e => e.GroupDescription)
    .IncludeAll(i => i.Groups)
        .ThenInclude(e => e.AdGroup);
```
And here, too, there is an abstraction leak. If I write navigation in such a form, I need to know how the interpreter that calls QuaryableExtensions works: it translates the Include and ThenInclude calls into string-based Include. This can matter and must be kept in mind (a sketch follows).
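Conceptually, what reaches EF Core in that case are string-based includes (EF Core's `Include(string navigationPropertyPath)` overload), roughly:

```csharp
// what an include like ".IncludeAll(e => e.Roles).ThenIncludeAll(e => e.Privileges)"
// effectively becomes when handed to EF Core as a string path
var users = context.Users
    .Include("Roles.Privileges")
    .ToList();
```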
Algebra of include expressions.
Include expressions can be compared, cloned, merged, and their leaf paths listed:
```csharp
var b1 = InlcudeExtensions.IsEqualTo(include1, include2);
var b2 = InlcudeExtensions.IsSubTreeOf(include1, include2);
var b3 = InlcudeExtensions.IsSuperTreeOf(include1, include2);
```
var include2 = InlcudeExtensions.Clone(include1);
var include3 = InlcudeExtensions.Merge(include1, include2);
```csharp
IReadOnlyCollection<string>   paths1 = InlcudeExtensions.ListLeafXPaths(include);   // as xpaths
IReadOnlyCollection<string[]> paths2 = InlcudeExtensions.ListLeafKeyPaths(include); // as string[]
```
etc.
The good news: there are no abstraction leaks here; a level of pure abstraction has been reached — there is metadata and there is work with metadata.
DSL Includes allows you to reach a new level of abstraction, but the moment you reach it, a need forms to go to the next level: generating the Include expressions themselves.
In this case there is no need to generate the DSL as fluent chains; you just need to create the structures of the internal representation:
```csharp
var root = new ChainNode(typeof(Point));
var child = new ChainPropertyNode(
    typeof(int),
    expression: typeof(Point).CreatePropertyLambda("X"),
    memberName: "X",
    isEnumerable: false,
    parent: root
);
root.Children.Add("X", child);

// or use one of the extension methods, e.g.:
// var child = root.AddChild("X");

Include<Point> include = ChainNodeExtensions.ComposeInclude<Point>(root);
```
The internal representation can also be passed to interpreters directly. Why then have the fluent DSL record of includes at all? The question is somewhat academic; the answer is that in practice a good internal representation (and it evolves too) only emerges together with the development of the DSL — i.e. of a brief, expressive notation convenient in static code. This will come up again closer to the conclusion.
All of the above is also true for interpreters of include expressions that implement iterators copy , clone , equals .
Equals — comparison only over the leaves of the Include expression.
A hidden semantic question: should the order of items in lists be taken into account?
```csharp
Include<User> include = chain => chain
    .Include(e => e.UserId)
    .IncludeAll(e => e.Groups)
        .ThenInclude(e => e.GroupId);

bool b1 = ObjectExtensions.Equals(user1, user2, include);
bool b2 = ObjectExtensions.EqualsAll(userList1, userList2, include);
```
Clone — a pass through the expression's nodes; the properties that match the rule are copied.
```csharp
Include<User> include = chain => chain
    .Include(e => e.UserId)
    .IncludeAll(e => e.Groups)
        .ThenInclude(e => e.GroupId);

var newUser = ObjectExtensions.Clone(user1, include, leafRule1);
var newUserList = ObjectExtensions.CloneAll(userList1, leafRule1);
```
An interpreter could select the leaves from the include expression itself. Why is it done through a separate rule? To keep the semantics similar to ObjectExtensions.Copy.
Copy — a pass through the branch nodes of the expression with identification by the leaf nodes. Properties matching the rule are copied (similar to Clone).
```csharp
Include<User> include = chain => chain.IncludeAll(e => e.Groups);

ObjectExtensions.Copy(user1, user2, include, supportedLeafsRule);
ObjectExtensions.CopyAll(userList1, userList2, include, supportedLeafsRule);
```
Again, an interpreter could select the leaves from the include expression itself; the separate rule is there so that in ObjectExtensions.Copy the branches are defined by the include expression and the leaves by supportedLeafsRule.
Notes on copy/clone: since DSL Includes describes traversal over properties, types with readonly members, such as Tuple<,>, cannot serve as copy targets, while ValueTuple<,> with its writable fields can. These iterators are not built through Expression Trees; they show that the Include DSL is useful on its own, independently of code generation.
Other interpreters along the same lines are conceivable: Detach, FindDifferences, etc.
Another possible direction is generating source code (.cs files) rather than run-time delegates: the same include expressions could feed a Roslyn-based generator, or be used to emit TypeScript DTO declarations. In this article the run-time road — Expression Trees — is taken instead.
A few technical notes on building an Internal DSL on top of Expression Trees. The central operation is LambdaExpression.Compile, which turns an assembled tree into a delegate. A nested lambda can be embedded into a larger expression tree either as a LambdaExpression that is compiled together with the whole tree, or as a ConstantExpression holding an already compiled delegate that the tree merely invokes; choosing between "assemble and compile everything as one tree" and "compose precompiled delegates" is one of the recurring trade-offs of working with Expression Trees. Compilation is relatively expensive, which is another reason to build such delegates once and cache them rather than recreate them per call.
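A small self-contained sketch of the two embedding options just mentioned (using `Expression.Invoke` for brevity; the mechanism inside the library may differ):

```csharp
using System;
using System.Linq.Expressions;

Expression<Func<int, int>> inner = x => x * 2;
var n = Expression.Parameter(typeof(int), "n");

// Option 1: keep the nested lambda as a LambdaExpression inside the outer tree;
// it is compiled together with the whole tree.
var asLambda = Expression.Lambda<Func<int, int>>(Expression.Invoke(inner, n), n).Compile();

// Option 2: compile the inner lambda separately and embed the resulting delegate
// as a ConstantExpression that the outer tree merely invokes.
Func<int, int> compiledInner = inner.Compile();
var asConstant = Expression.Lambda<Func<int, int>>(
    Expression.Invoke(Expression.Constant(compiledInner), n), n).Compile();

Console.WriteLine(asLambda(21));   // 42
Console.WriteLine(asConstant(21)); // 42
```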
The benchmark serializes a test list of 600 objects with about 15 fields each; JSON.NET and ServiceStack discover members at run time via reflection (GetProperties()). dslComposeFormatter is the formatter built in advance by ComposeFormatter for the same structure.
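The article does not show the benchmark class itself; here is a sketch of how such a comparison is typically wired with BenchmarkDotNet (TestRecord, the test data construction, and the exact return type of ComposeEnumerableFormatter are my assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using Newtonsoft.Json;

[MemoryDiagnoser]
public class SerializerBenchmark
{
    // hypothetical test type standing in for the article's 15-field record
    public class TestRecord
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public DateTime CreatedAt { get; set; }
    }

    private List<TestRecord> data;
    private Func<IEnumerable<TestRecord>, string> dslFormatter; // assumed return type

    [GlobalSetup]
    public void Setup()
    {
        data = Enumerable.Range(0, 600)
            .Select(i => new TestRecord { Id = i, Name = "name" + i, CreatedAt = DateTime.UtcNow })
            .ToList();
        dslFormatter = JsonManager.ComposeEnumerableFormatter<TestRecord>(); // built once, reused
    }

    [Benchmark]
    public string dslComposeFormatter() => dslFormatter(data);

    [Benchmark]
    public string JsonNet_Default() => JsonConvert.SerializeObject(data);
}
```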
BenchmarkDotNet =v0.10.14, OS=Windows 10.0.17134
Intel Core i5-2500K CPU 3.30GHz (Sandy Bridge), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=2.1.300
Method | Mean | Error | StdDev | Min | Max | Median | Allocated |
---|---|---|---|---|---|---|---|
dslComposeFormatter | 2.208 ms | 0.0093 ms | 0.0078 ms | 2.193 ms | 2.220 ms | 2.211 ms | 849.47 KB |
JsonNet_Default | 2.902 ms | 0.0160 ms | 0.0150 ms | 2.883 ms | 2.934 ms | 2.899 ms | 658.63 KB |
JsonNet_NullIgnore | 2.944 ms | 0.0089 ms | 0.0079 ms | 2.932 ms | 2.960 ms | 2.942 ms | 564.97 KB |
JsonNet_DateFormatFF | 3.480 ms | 0.0121 ms | 0.0113 ms | 3.458 ms | 3.497 ms | 3.479 ms | 757.41 KB |
JsonNet_DateFormatSS | 3.880 ms | 0.0139 ms | 0.0130 ms | 3.854 ms | 3.899 ms | 3.877 ms | 785.53 KB |
ServiceStack_SerializeToString | 4.225 ms | 0.0120 ms | 0.0106 ms | 4.201 ms | 4.243 ms | 4.226 ms | 805.13 KB |
fake_expressionManuallyConstruted | 54.396 ms | 0.1758 ms | 0.1644 ms | 54.104 ms | 54.629 ms | 54.383 ms | 7401.58 KB |
fake_expressionManuallyConstruted — the expression tree is constructed and compiled on every call (it shows the cost of compilation when the result is not cached).
To sum up: an Internal DSL lets you record metadata compactly, and its interpreters can turn that metadata into efficient run-time code. Expression Trees are available in .NET Standard, and they serve both sides of this story: they let a fluent C# record (a Fluent API) be captured and parsed instead of executed, and they let an interpreter of DSL Includes assemble and compile code at run time. The DSL presented here already has several interpreters — serialize, copy, clone, equals — and others are possible; the same include expression (the same metadata) is reused by all of them. In the serialization case, DSL Includes effectively replaces the DTO: the DTO becomes a generated function that produces json directly. An Internal DSL thus combines the two uses of Expression Trees described at the beginning: parsing the expression as written, and generating and compiling a new expression tree.
DSL Includes and the json ComposeFormatter interpreter are part of the DashboardCodes.Routines package, available on NuGet and GitHub.
Source: https://habr.com/ru/post/419759/