
Microsoft Workflow - antimarketing

Hi, Habr.

I think many of you have heard of Microsoft Workflow. It is promoted quite heavily: there are posts on Habr, there are books in English and in Russian, and Microsoft itself publishes pretty pictures.

The essence of the technology is that programmers create an API, and a business analyst builds the business process on their own, without intermediaries.
For example, a client has asked for a business process like this:
image

And then the business analyst draws it. The programmers only need to implement Accept, Reject, and the other similar blocks. Cool, yeah?
I was not particularly lucky: I had heard about this technology only from books and marketing publications. And, of course, from talks of the form "We looked at MS WF six months ago, we have been using it a little for a month already, all systems normal!". I work on a product that has been using Workflow for 6 years (I myself have worked with it for only a couple of years) and that carries a fairly high load, so I would like to show the main problems and ambushes of this library. I hope this post helps you avoid the rakes we stepped on.


How does it work?

The idea is extremely simple: the programmer creates building blocks, called Activities. Each Activity can have parameters. For example, you can create a RejectActivity with a User property; most often this means that the Reject will happen for that User. Each Activity has, in effect, an external view (that is, how the business analyst sees it) and an implementation. By the way, here we immediately hit ambush #1: both are the same class. That is, our pretty designer has to reference the implementation. But this is solved very simply with IoC, so let's call it just a warm-up.
Once the business analyst has made a design (that is, drawn a bunch of Activities and connected them with arrows), it can be saved as a XAML representation, which the Microsoft Workflow Runtime can load and start executing.
Some Activities can pause execution (for example, Delay Activity). In that case the workflow state is serialized to the database. After a certain time (which we specified ourselves), the workflow wakes up again and moves on. For persistence we need a database or a hand-written persistence service. Everything is cool, right?

Serialization ambush

As you understand from the text above, sometimes the business process has to be paused. A typical example: we are waiting for a response from the user (that is, using the Event Activity). In this case ordinary serialization takes place (XML serialization for Workflow 4.0+, binary for older versions). I think readers have already realized that it is very easy to save too much, or to make a small mistake when saving in release A and loading in release B. A typical example from Workflow 3.0: you subscribed to the Event with a lambda or an anonymous method. As you may know, this creates a new class field, which gets stored in the database. And loading will crash, because deserialization will crash. Hence a big tip: all working code should be moved strictly outside your Activity. All fields should be stored somewhere far away from the workflow. In the Activity we keep the bare minimum, and only the simplest types. Better that the internal design suffers than stability.
In fact, the ambush does not end there. The real fun starts when you need to change the set of fields. For example, suppose we need to add a Reason to the RejectActivity from our example. The code then has to be ready for the fact that an old, already saved RejectActivity does not contain this field. For Workflow 4.0+ you can still change the serialized representation in the database, but for Workflow 3.0 this method is not always suitable: a compressed binary representation is stored, so it cannot be updated quickly.
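The point about an old RejectActivity lacking the new Reason field can be illustrated with a small sketch. This is not Workflow Foundation code: it is a language-neutral illustration in Python, assuming the saved state is a JSON document. The names RejectState and load_reject_state are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RejectState:
    """Minimal persisted state: simple types only (hypothetical shape)."""
    user: str
    reason: Optional[str] = None  # field added in a later release


def load_reject_state(payload: str) -> RejectState:
    """Load state saved by either release; a missing 'reason' becomes None."""
    data = json.loads(payload)
    return RejectState(user=data["user"], reason=data.get("reason"))


# State written by the old release (no 'reason') and by the new one.
old_payload = '{"user": "alice"}'
new_payload = '{"user": "bob", "reason": "duplicate request"}'

old_state = load_reject_state(old_payload)
new_state = load_reject_state(new_payload)
```

The design choice here is that the loader, not the saved data, absorbs the version difference, so instances saved before the release do not need to be rewritten in the database.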

Performance ambush

In fact, Microsoft Workflow has a whole series of performance flaws. The problems affect both a single execution (a number of operations are simply done inefficiently) and load distribution. But first things first.

When to save?

Imagine we have a business process that contains waits of zero length. It does not matter how they came about; what matters is that they are there. Workflow treats any wait as an excellent reason to persist, that is, to serialize the workflow and then load it again (not immediately, but that is not important). Naturally, this hurts response time badly. Hence the advice: use the Delay Activity as rarely as possible, and better still, combine it with an If Activity that checks whether waiting is needed at all. Another unpleasant point: if you said "wait for 5 days", there is no simple way to make Workflow stop waiting once the instance is already in the database. Moreover, if the Workflow Runtime loads your workflow into memory and sees that there is still a long time to wait, then, oddly enough, it will simply leave it in memory. And it will wait there for a long time. Hence another tip: because of these problems, do not use long waits. It is better to use many short ones, or to wake up on an external Event, so that your own external service wakes the process when necessary.
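Both tips (check whether waiting is needed at all, and replace one long wait with many short ones) come down to the same small decision. Below is a minimal sketch of that decision in Python. It is not Workflow Foundation code, and next_wait and its parameters are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta


def next_wait(deadline: datetime, now: datetime,
              max_chunk: timedelta, still_needed: bool) -> timedelta:
    """Return how long to pause before checking again.

    A zero result means "do not wait, continue immediately",
    which is the role of the If check placed before a Delay.
    """
    if not still_needed or now >= deadline:
        return timedelta(0)
    # Never wait longer than one short chunk at a time.
    return min(max_chunk, deadline - now)


start = datetime(2024, 1, 1, 12, 0, 0)
deadline = start + timedelta(days=5)
one_hour = timedelta(hours=1)

first = next_wait(deadline, start, one_hour, still_needed=True)
cancelled = next_wait(deadline, start, one_hour, still_needed=False)
almost_done = next_wait(deadline, deadline - timedelta(minutes=10),
                        one_hour, still_needed=True)
```

With short chunks, a "wait for 5 days" that is no longer needed gets noticed at the next check instead of sitting out the full five days.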

Distributed execution

If you believe those same articles from Microsoft, Workflow is perfectly able to distribute work. That is, you can have several independent servers, each of which takes a few tasks, executes them, takes the next ones, and so on. There is only one drawback: this is fiction. The distributed implementation of Workflow is an extremely strange product. It works according to the following algorithm:
  1. Take from the database ALL unlocked workflows that can be executed right now, and lock them for five minutes
  2. After two minutes: repeat step 1

Yes, the numbers 2 and 5 can be changed. Something else matters: the very first lucky server takes all the work from the database. By the way, two minutes later it will sweep the database clean again, even if it already has plenty to do. If it does not finish some workflow within five minutes, a strange thing happens: it still executes the workflow (makes all the WCF calls and so on) and tries to save it to the database, but fails (there is no lock any more!). As a result, this broken object stays in the memory of that Workflow Runtime forever. The runtime will not let go of it voluntarily until you physically stop the process. The Workflow Runtime will not stop by itself, and it will not be able to do anything with the object. Great implementation.
An even prettier scenario arises if you set the lock not for five minutes but for longer. In that case, after the process stops, those records stay locked. That is, you can no longer simply stop the process, which can have a very negative impact on a production environment.
This problem is solved extremely easily: for correct distributed work you should write the database procedures yourself (that is, implement all the procedures for parallel and distributed work and make your own implementation of WorkflowPersistenceService). By the way, there is one nice feature here: you do not have to use an MS SQL database, so you can experiment with other approaches. In fact, the problem can be solved with simple file shares; it works quickly and correctly, but it is not fashionable.
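The difference between "take ALL unlocked workflows" and a bounded claim can be shown with a small in-memory simulation. This is a sketch in Python, not the real persistence service: Row and claim are hypothetical names, the list stands in for the database table, and a real implementation would have to perform the claim atomically in the database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One persisted workflow instance (hypothetical table row)."""
    workflow_id: int
    owner: Optional[str] = None
    locked_until: float = 0.0


def claim(rows: List[Row], owner: str, now: float,
          lock_seconds: float, limit: Optional[int] = None) -> List[int]:
    """Lock unlocked rows for `owner`; with limit=None, take everything."""
    claimed: List[int] = []
    for row in rows:
        if limit is not None and len(claimed) >= limit:
            break
        if row.locked_until <= now:
            row.owner = owner
            row.locked_until = now + lock_seconds
            claimed.append(row.workflow_id)
    return claimed


# Greedy claim, as described above: the first server takes it all.
greedy_rows = [Row(i) for i in range(10)]
greedy_a = claim(greedy_rows, "server-a", now=0.0, lock_seconds=300.0)
greedy_b = claim(greedy_rows, "server-b", now=0.0, lock_seconds=300.0)

# Bounded claim: each server takes only what it can realistically process.
bounded_rows = [Row(i) for i in range(10)]
bounded_a = claim(bounded_rows, "server-a", now=0.0, lock_seconds=300.0, limit=3)
bounded_b = claim(bounded_rows, "server-b", now=0.0, lock_seconds=300.0, limit=3)
```

In the greedy case the second server is left with nothing to do; with a limit, the remaining rows stay unlocked and available to other servers.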

Successful work under load

In fact, there is none. Of course, Microsoft claims that everything is fine, but they forgot about one small graph: the dependence of total execution time on the number of Activities in memory (and earlier it looked like this). In fact, nothing has changed:

image

This is a quadratic graph. What matters here is not the absolute time values but the dependency: how long everything will take as the complexity of your business process grows. Moreover, in practice Execute time depends on the total number of Activities in memory, and it does not matter whether they belong to one workflow or to several. For example, if you have 10 parallel workflows, more resources are spent on processing each small Activity than if there were a single workflow. Put differently: 10 parallel tasks take longer to process than 10 sequential tasks, and the dependence is nonlinear.
In the previous part I described how the Persistence Service works: it takes everything from the database. In fact, such a trick with 5,000 parallel complex workflows is fatal for the system: it starts working at an extremely low speed, 1-10 Activities per minute (!!!), and that with the processor loaded to almost 100%. The problem is clear, but how do you solve it? Solution: write your own Activity handler, reuse Activities, and emulate your workflows. In effect, you will have to reimplement the core Workflow component that starts and stops Activities. This is needed, first, to prevent a large number of running Activities in a workflow (because everything slows down), and second, to speed up serialization and deserialization (by quickly removing the unneeded ones from memory). Microsoft Workflow never removes a finished Activity. You will have to implement everything so that completed Activities do not stay in memory.
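The idea of "a completed Activity must not stay in memory" can be sketched as a tiny runner that holds references only to steps that have not run yet. This is an illustration in Python under simplifying assumptions (sequential steps, no persistence), not a replacement for the Workflow Runtime. MiniRunner is a hypothetical name.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class MiniRunner:
    """Runs steps in order and forgets each one as soon as it completes."""

    def __init__(self, steps: Iterable[Callable[[], None]]) -> None:
        self._pending: Deque[Callable[[], None]] = deque(steps)
        self.completed_count = 0  # a counter, not a list of finished steps

    @property
    def in_memory(self) -> int:
        """Number of step objects the runner still references."""
        return len(self._pending)

    def run_next(self) -> bool:
        """Run one step; return False when nothing is left."""
        if not self._pending:
            return False
        step = self._pending.popleft()  # the runner drops its reference here
        step()
        self.completed_count += 1
        return True


log: List[str] = []
runner = MiniRunner([
    lambda: log.append("accept"),
    lambda: log.append("notify"),
    lambda: log.append("archive"),
])

runner.run_next()
after_one = runner.in_memory
while runner.run_next():
    pass
```

Because only a counter survives a finished step, the amount of state to serialize shrinks as the process advances instead of growing with it.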

Summary

I have tried to describe some of the problems that await you when working with Microsoft Workflow. In fact, there are a great many nuances, but they are more or less solvable, and I am sure you will manage. In fact, if your task at work is to build your own custom business process, it is better to start with Microsoft Workflow. It will do for a prototype. Moreover, under a light load this whole system can work. The main problems are known: they are listed above, and they are entirely solvable. Besides, it is much better to work with a system where you know what to expect than with one you know only from marketing material. And if the load starts to grow, you can move module by module to your own efficient implementation.

Source: https://habr.com/ru/post/229723/

