
XmlSerializer: An Assembly Leak Nobody Asked For :)

The gist in brief


Some parts of the .NET Framework, such as XmlSerializer, use dynamic code generation internally. XmlSerializer writes temporary C# source files, compiles them into a temporary assembly, and then loads that assembly into the process. Because this code generation is relatively expensive, XmlSerializer caches the temporary assemblies, one per type. This means that the next time you create an XmlSerializer for class X, no new assembly is generated and the cached one is used instead. However, it is not that simple.

If you call one of the other constructors, XmlSerializer does not put the dynamically created assembly into the cache. Instead, it creates a new temporary assembly every time a new XmlSerializer instance is created!
The application then leaks memory in the form of temporary assemblies, which the garbage collector cannot reclaim.

Tracking down the problem


To begin, a few words about the system our team builds.

The application consists of three parts: a website, a data warehouse, and a business service.
The whole system is built on .NET Framework 3.5.
The website lets users run data validation on a business service that runs on Windows Workflow Foundation. Each workflow needs some data, which it obtains by communicating with the persistence service.
The system is built on the latest technologies, so it is hardly surprising that combining them in various ways sometimes produces surprises.

For example, the application that runs the workflows and generally does all the work with them (a WCF service) started eating up to 2.5 GB of memory under load.

We solved that memory leak. I will write about it a bit later, since I don't have the necessary data at hand right now.

After that fix, the process took up to 500 MB, and sometimes up to 800 MB. We knew perfectly well that this was not the limit: previously the application had stopped working only at around 2 GB. Even so, after a while the application began to run much slower at this size. After some observation we noticed that the C# compiler, csc.exe, was being launched from time to time. In our system it should run only on the first request to a workflow; subsequent requests should reuse the already compiled assembly.

After a bit more thought, we decided to count the assemblies loaded in the process. :)

And here we got a surprise: right after startup, about 100 assemblies were loaded into the main domain, but over time their number grew to 3,000 and later to as many as 5,000.

We wrote a utility that shows the domains of any running .NET application and the assemblies loaded into them. With it we saw that the initial 100 assemblies stayed the same, and the only thing being added, over and over, was some anonymous assembly. Unfortunately, we could not get more detailed information (which types the assembly declared) from another process.

We did not see this many “anonymous” assemblies in our test environment, although a few were there. To get the details, we decided to inject code directly into the application that would report the information we needed, so that we could later get the most complete data on the fly.

In the end, it turned out that the “anonymous” assemblies were the ones XmlSerializer creates for serialization. And they were all identical :)
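As a rough illustration of that kind of in-process diagnostic (not the code we actually used), here is a minimal C# sketch. It assumes the code runs inside the application under investigation. AssemblyReport, TypeNames and FindDuplicates are hypothetical names. Identical generated assemblies declare the same types, so the sketch groups the loaded assemblies by the sorted list of type names they declare. A thousand copies then show up as a single large group, even if each copy has a different assembly name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical in-process diagnostic; assumes it is called from inside the
// application under investigation.
public static class AssemblyReport
{
    // Full names of the types an assembly declares, sorted so that two
    // identical assemblies produce identical arrays.
    public static string[] TypeNames(Assembly assembly)
    {
        Type[] types;
        try
        {
            types = assembly.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            // Keep whatever types could be loaded.
            types = ex.Types.OfType<Type>().ToArray();
        }
        catch (Exception)
        {
            // Some assemblies (e.g. certain dynamic ones) may refuse; skip them.
            types = Type.EmptyTypes;
        }
        return types
            .Select(t => t.FullName ?? t.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToArray();
    }

    // Groups the loaded assemblies by the set of types they declare and
    // returns every group with at least minCopies members, largest first.
    // Key = semicolon-joined type names, Value = number of assemblies.
    public static List<KeyValuePair<string, int>> FindDuplicates(int minCopies)
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Select(a => string.Join(";", TypeNames(a)))
            .Where(signature => signature.Length > 0) // ignore assemblies with no types
            .GroupBy(signature => signature)
            .Where(g => g.Count() >= minCopies)
            .OrderByDescending(g => g.Count())
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }
}
```

Calling AssemblyReport.FindDuplicates(10) from, say, an admin-only diagnostic hook would list every set of types loaded ten or more times. GetTypes() is expensive on large assemblies, so this is meant for occasional diagnostics, not hot paths.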

Can you imagine? One and the same class, compiled 1,000 times. Your app becomes terribly slow and, on top of that, leaks memory...

But wait... this is .NET. It has a GC, after all. Isn't managing memory its job?

The problem itself


Now for the details. XmlSerializer in .NET can cause an assembly leak (and an assembly leak turns into a memory leak). Not always, of course: the class has several constructors.

If you use the regular constructor that takes only a Type, there is no memory leak:

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace XmlSerializerMemoryLeak
{
    class Program
    {
        private static XmlSerializer serial = null;

        static void Main(string[] args)
        {
            for (int index = 0; index < 10000; index++)
            {
                TestClass test = new TestClass();
                test.Id = index;
                test.Date = DateTime.Now;
                StringBuilder builder = new StringBuilder();
                using (StringWriter writer = new StringWriter(builder))
                {
                    // XmlSerializer(Type) reuses the cached generated assembly
                    serial = new XmlSerializer(typeof(TestClass));
                    serial.Serialize(writer, test);
                }
                string xml = builder.ToString();
            }
            Console.WriteLine("Done");
        }
    }

    public class TestClass
    {
        public DateTime Date { get; set; }
        public int Id { get; set; }
    }
}


However, if you use a slightly different constructor, a memory leak is guaranteed:

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace XmlSerializerMemoryLeak
{
    class Program
    {
        private static XmlSerializer serial = null;

        static void Main(string[] args)
        {
            // Wait for Enter, e.g. to attach a profiler before the loop starts
            Console.ReadLine();
            for (int index = 0; index < 100000; index++)
            {
                TestClass test = new TestClass();
                test.Id = index;
                test.Date = DateTime.Now;
                StringBuilder builder = new StringBuilder();
                using (StringWriter writer = new StringWriter(builder))
                {
                    // XmlSerializer(Type, XmlRootAttribute) bypasses the cache:
                    // every call generates and loads a brand-new assembly
                    serial = new XmlSerializer(typeof(TestClass), new XmlRootAttribute("MemoryLeak"));
                    serial.Serialize(writer, test);
                }
                string xml = builder.ToString();
            }
            Console.WriteLine("Done");
        }
    }

    public class TestClass
    {
        public DateTime Date { get; set; }
        public int Id { get; set; }
    }
}


The difference between them is visible in Reflector. The first constructor (like XmlSerializer(Type, String), incidentally) works perfectly: it looks in the serializer cache to see whether a ready-made one already exists, and if not, it compiles one and adds it to the cache.
The second one, however, is terrible: it does not use the cache at all. That's why it compiles a new assembly every time and causes the assembly leak.
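To see the difference without a profiler, one could count the assemblies in the AppDomain before and after creating a batch of serializers with each constructor. The following is a minimal C# sketch of that experiment; ConstructorLeakProbe and ProbeSample are hypothetical names. On .NET Framework, where these observations were made, we would expect the cached constructor to add no assemblies and the XmlRootAttribute one to add roughly one per iteration. Newer runtimes may generate serialization code differently, so treat the numbers as something to measure, not a given.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type used only by this probe.
public class ProbeSample
{
    public int Id { get; set; }
}

// Hypothetical probe: counts how many assemblies the current AppDomain gains
// while creating and using serializers through a given constructor.
public static class ConstructorLeakProbe
{
    public static int CountAssemblies()
    {
        return AppDomain.CurrentDomain.GetAssemblies().Length;
    }

    public static int AssembliesAdded(Func<XmlSerializer> create, int iterations)
    {
        // Warm-up call so one-time loads (System.Xml and friends, the first
        // generated assembly) are not counted as growth.
        Use(create());
        int before = CountAssemblies();
        for (int i = 0; i < iterations; i++)
        {
            Use(create());
        }
        return CountAssemblies() - before;
    }

    private static void Use(XmlSerializer serializer)
    {
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, new ProbeSample { Id = 1 });
        }
    }

    // Returns assembly growth for XmlSerializer(Type) and for
    // XmlSerializer(Type, XmlRootAttribute), in that order.
    public static int[] Run(int iterations)
    {
        int cached = AssembliesAdded(() => new XmlSerializer(typeof(ProbeSample)), iterations);
        int uncached = AssembliesAdded(
            () => new XmlSerializer(typeof(ProbeSample), new XmlRootAttribute("MemoryLeak")),
            iterations);
        return new[] { cached, uncached };
    }
}
```

Printing the two values returned by ConstructorLeakProbe.Run(100) from a console app shows the growth for both constructors side by side.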

Solutions


There are several ways out:
  1. Use the "right" constructors: XmlSerializer(Type) or XmlSerializer(Type, String).
  2. Implement an XmlSerializerCache that always looks in the cache first. In principle, you don't have to implement it yourself; see here.
  3. Avoid serialization altogether. For example, if you already have an application that can serialize (or already does), you can hand the object over to it and get back just the XML.
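Option 2 can be sketched in a few lines. This is a minimal C# sketch and not necessarily what the linked solution does. The XmlSerializerCache name comes from the list above, while the Get method and its parameters are a hypothetical API. It caches serializers created with XmlSerializer(Type, XmlRootAttribute), keyed by the type, root element name and root namespace, so each combination generates its assembly only once. It relies on ConcurrentDictionary and Lazy<T>, which require .NET Framework 4 or later. On .NET 3.5, a plain Dictionary guarded by a lock would serve the same purpose.

```csharp
using System;
using System.Collections.Concurrent;
using System.Xml.Serialization;

// Hypothetical cache for serializers built with XmlSerializer(Type, XmlRootAttribute),
// a constructor that does not reuse its generated assembly on its own.
public static class XmlSerializerCache
{
    private static readonly ConcurrentDictionary<string, Lazy<XmlSerializer>> Cache =
        new ConcurrentDictionary<string, Lazy<XmlSerializer>>();

    public static XmlSerializer Get(Type type, string rootElementName, string rootNamespace = null)
    {
        // The key must include everything that changes the generated code:
        // the type, the root element name and the root namespace.
        string key = type.AssemblyQualifiedName + "|" + rootElementName + "|" + rootNamespace;

        // Lazy<T> makes sure only one serializer is built per key, even under contention.
        Lazy<XmlSerializer> entry = Cache.GetOrAdd(key, _ => new Lazy<XmlSerializer>(() =>
        {
            XmlRootAttribute root = new XmlRootAttribute(rootElementName);
            root.Namespace = rootNamespace;
            return new XmlSerializer(type, root);
        }));
        return entry.Value;
    }
}

// Same shape as the article's TestClass, so the sketch compiles on its own.
public class TestClass
{
    public DateTime Date { get; set; }
    public int Id { get; set; }
}
```

In the second listing above, the leaking line would become serial = XmlSerializerCache.Get(typeof(TestClass), "MemoryLeak");. If two threads race on the same key, GetOrAdd may create two Lazy wrappers, but only the stored one is ever evaluated. That means one serializer, and one generated assembly, per key.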

Which one to use is entirely up to you and depends on the situation. If you have a shared project (utility classes, so to speak), I would advise implementing the cache and then freely using whichever constructors you need, so you don't run into problems later. In a year or two you may well forget which constructor is the right one.

Conclusions


This is not news: the problem is described in MSDN Magazine. It's just unclear why it still hasn't been fixed.

The conclusions are simple. Use serialization very carefully and monitor the state of your application closely. It also pays to have diagnostic methods or services that give you the most accurate possible information about the application.

PS:


On the MSDN forum I was told that they know about this problem and that it is described in the MSDN Magazine article I mentioned above. I will try to find out more.

Source: https://habr.com/ru/post/27342/

