
Architectural Template “Macro Shared Transactions for Microservices”



Author: Denis Tsyplakov, Solution Architect, DataArt

Problem statement


One of the problems when building a microservice architecture, and especially when migrating a monolith to microservices, is transactions. Each microservice is responsible for its own group of functions, may manage the data associated with that group, and can serve user requests either autonomously or by calling other microservices. All of this works fine until we need to ensure consistency across data managed by different microservices.
For example, suppose our application runs a large online store. Among other things, it covers three separate, loosely related business areas:

  1. Warehouse - what is stored where, how, and for how long; how many goods of each type are currently in the warehouses, etc.
  2. Shipping goods - packaging, dispatch, delivery tracking, handling complaints about delays, etc.
  3. Customs reporting on the movement of goods when they are sent abroad (frankly, I do not know whether anything special needs to be filed in this case, but I will involve government services in the process to add drama).

Each of these three areas includes many non-overlapping functions and can be represented as several microservices.

There is one problem. Suppose a customer bought a product, and the goods were packed and handed to a courier. Among other things, we need to record that there is now one unit less of the item in the warehouse, note that the delivery process has begun, and, if the goods are being sent, say, to China, take care of the customs paperwork. If the application fails (for example, a node crashes) at the second or third stage of this process, our data becomes inconsistent, and just a few such failures can cause quite unpleasant problems for the business (for example, a visit from customs officials).

In a classical monolithic architecture, this kind of problem is solved simply and elegantly with database transactions. But what if we use microservices? Even if all services use a single database (which is not very elegant, but possible in our case), each works with it from a different process, and we cannot stretch a transaction across processes.

Solutions


The problem has several solutions:

  1. Oddly enough, sometimes the problem can simply be ignored. If we know that a failure occurs no more than once a month and cleaning up the consequences by hand is affordable for the business, we can ignore the problem, however ugly that looks. I do not know whether claims from the customs service can be ignored, but one can imagine circumstances where even that is possible.
  2. Compensation (not monetary compensation to customs, though let's say you paid a fine) - a set of extra steps that complicate the processing sequence but make it possible to detect and repair a process that went off the rails. For example, before starting the operation we record in a special service that we are beginning a shipment, and at the end we mark that everything finished well. Then we periodically check for unfinished operations and, if any exist, look into all three databases and try to bring the data back into a consistent state. This is a perfectly workable method, but it significantly complicates the processing logic, and doing this for every operation is rather painful.
  3. Two-phase transactions - strictly speaking, the XA+ specification, which allows transactions distributed across applications. It is a very heavy mechanism that few people like and, more importantly, few people can configure properly. It is also ideologically a poor fit for lightweight microservices.
  4. In principle, a transaction is a special case of the consensus problem, so you could use one of the many distributed consensus systems (roughly, anything that comes up for the keywords paxos, raft, zookeeper, etcd, consul). But in practice, applied to large volumes of warehouse data, this looks even more complicated than two-phase transactions.
  5. Queues and eventual consistency - we split the task into three asynchronous steps, process the data sequentially, passing it between services from queue to queue, and rely on the delivery-confirmation mechanism. The code is not very complicated, but there are a few things to keep in mind:
    • The queue guarantees "at least once" delivery, i.e. when the same message is re-delivered, the service must handle the situation correctly and not ship the goods twice. This can be done, for example, via a unique order UUID.
    • The data will be slightly inconsistent at any given moment. That is, the goods first disappear from the warehouse, and only then, with a small delay, is an order created to ship them; the customs data is processed later still. In our example this is perfectly normal and causes no problems for the business, but there are cases where such behavior can be quite unpleasant.
    • If, as a result, the very first service must return some data to the user, the chain of calls that finally delivers the data to the user's browser can be quite nontrivial. The main problem is that the browser sends requests synchronously and usually expects a synchronous response. If you process requests asynchronously, you need to build asynchronous delivery of the response to the browser. Classically this is done either through WebSockets or through the browser periodically polling the server for new events. There are mechanisms such as SockJS that simplify some aspects of building this link, but some extra complexity remains.

In most cases, the last option is the most acceptable. It does not complicate request processing much, and although processing takes several times longer, that is usually acceptable for this kind of operation. It also requires a slightly more elaborate data organization to cut off repeated requests, but there is nothing super-complex in that either.
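The bookkeeping that cuts off repeated requests can be sketched as a tiny idempotent consumer that tracks processed order UUIDs. This is only an illustration; the class and method names below are made up, and a real service would persist the seen-UUID set rather than keep it in memory:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a consumer that tolerates "at least once" delivery
// by remembering which order UUIDs it has already handled.
class IdempotentShipmentHandler {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private int shipmentsCreated = 0;

    // Returns true if the message was processed, false if it was a duplicate.
    boolean handle(String orderUuid) {
        if (!processed.add(orderUuid)) {
            return false; // re-delivered message: ignore, do not ship twice
        }
        shipmentsCreated++; // create the shipment exactly once
        return true;
    }

    int shipmentsCreated() { return shipmentsCreated; }
}
```

With this in place, a message delivered twice results in exactly one shipment, which is the whole point of the unique order UUID.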

Schematically, one of the options for processing transactions using queues and eventual consistency might look like this:

  1. The user makes a purchase, and a message about it is sent to a queue (for example, a RabbitMQ cluster or, if we are running on Google Cloud Platform, Pub/Sub). The queue is persistent, guarantees at-least-once delivery, and is transactional, i.e., if the service processing a message suddenly dies, the message is not lost but is re-delivered to a new instance of the service.
  2. The message arrives at the service that marks the goods in the warehouse as being prepared for shipment, which in turn sends the message "The product is ready for shipment" to the queue.
  3. In the next step, the service responsible for shipping receives the readiness message, creates a shipment task, and then sends the message "shipment of goods is scheduled."
  4. The next service, on receiving the message that the shipment is scheduled, starts the customs paperwork process.

In addition, each message received by a service is checked for uniqueness, and if a message with that UUID has already been processed, it is ignored.

Here the database(s) are at any given moment in a slightly inconsistent state: the goods in stock are already marked as being in the process of delivery, but the delivery task itself does not exist yet; it will appear in a second or two. At the same time, we have a 99.999% guarantee (in fact, this number equals the reliability level of the queue service) that the shipping task will appear. For most businesses this is acceptable.
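The four steps above can be modeled in miniature with in-memory queues standing in for RabbitMQ or Pub/Sub. All class and field names are assumptions made for the sketch; a real system would also acknowledge messages and deduplicate by UUID as discussed:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the queue chain: each stage consumes from one queue and
// publishes to the next, so the order travels through the pipeline even
// if each stage runs in a separate process in real life.
class OrderPipeline {
    final BlockingQueue<String> purchased = new LinkedBlockingQueue<>();
    final BlockingQueue<String> readyToShip = new LinkedBlockingQueue<>();
    final BlockingQueue<String> shipmentScheduled = new LinkedBlockingQueue<>();

    // Stage 2: mark the goods in the warehouse, announce readiness for shipment.
    void warehouseStep() {
        String orderId = purchased.poll();
        if (orderId != null) readyToShip.add(orderId); // "ready for shipment"
    }

    // Stage 3: create the shipment task, announce that shipment is scheduled.
    void shippingStep() {
        String orderId = readyToShip.poll();
        if (orderId != null) shipmentScheduled.add(orderId); // "shipment scheduled"
    }
}
```

Between `warehouseStep` and `shippingStep` the data is briefly inconsistent in exactly the sense described above: the warehouse already knows about the order, the shipping service does not yet.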

So what is this article about?


In this article I want to describe another way to solve the transactionality problem in microservice applications. Although microservices work best when each service has its own database, for small and medium-sized systems all the data, as a rule, easily fits into one modern relational database. This is true for almost any internal enterprise system. That is, we often have no hard need to spread data across different physical machines. We can store the data of different microservices in unconnected groups of tables within the same database. This is especially handy if you are splitting an old monolithic application into services and have already divided the code, but the data still lives in one database. However, the problem of separated transactions remains: a transaction is rigidly tied to a network connection and, accordingly, to the process that opened that connection, and our processes are separate. What to do?

Above I described several common ways to solve the problem; now I want to offer one more for the particular case when all the data lives in a single database. I do not recommend actually implementing this method in a real project, but it is curious enough for me to put into an article. And who knows, it might still prove useful in some particular case.

Its essence is very simple. A transaction is associated with a network connection, and the database does not really know who is sitting at the other end of that open connection. It does not care; the main thing is that the right commands arrive on the socket. Usually, of course, the socket belongs exclusively to one process on the client side, but I see at least three ways around this.

1. Changing the database code


For databases whose code we can change, building our own database distribution, we implement a mechanism for transferring transactions between connections at the level of the database code itself. From the client's point of view it works like this:

  1. We start a transaction, make some changes, and it is time to hand the transaction over to the next service.
  2. We tell the DB to give us the transaction's UUID and to wait N seconds. If no other connection presents this UUID during that time, roll the transaction back; if one does, transfer all the data structures associated with the transaction to the new connection and continue working with it.
  3. We pass the UUID to the next service (i.e., to another process, possibly on another VM).
  4. There, we open a connection and tell the DB to continue the transaction with the given UUID.
  5. We continue working with the database within a transaction initiated by another process.

This method is the most lightweight to use, but it requires modifying the database code, which application programmers usually do not do; it takes a lot of specialized skills. Most likely the data would have to be transferred between database processes, and, by and large, there is only one database whose code we can safely change - PostgreSQL. In addition, it works only for self-managed servers; it will not fly with RDS or Cloud SQL.
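The hand-off in steps 2-4 can be modeled as a toy "parking lot" for transactions: a transaction parked under a UUID may be claimed by exactly one later connection, or it expires after its timeout. This is only a model of the intended server-side behavior, written in Java for consistency with the rest of the article; it is not PostgreSQL code, and all names are hypothetical:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the transaction hand-off: park() corresponds to
// "give me the UUID and wait N seconds", claim() corresponds to
// "continue the transaction with the specified UUID".
class TransactionParkingLot {
    private final Map<String, Long> parked = new ConcurrentHashMap<>();

    // Park the current transaction and return its UUID (step 2).
    String park(long ttlMillis) {
        String uuid = UUID.randomUUID().toString();
        parked.put(uuid, System.currentTimeMillis() + ttlMillis);
        return uuid;
    }

    // Claim the transaction from another connection (step 4). Removing the
    // entry guarantees that only one connection can pick the transaction up;
    // an expired or unknown UUID means the transaction was rolled back.
    boolean claim(String uuid) {
        Long deadline = parked.remove(uuid);
        return deadline != null && deadline >= System.currentTimeMillis();
    }
}
```

A real implementation inside the database would also have to move the transaction's data structures to the new connection, which is exactly the hard part this model glosses over.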

Schematically it looks like this:



2. Socket manipulation


The second thing that comes to mind is clever manipulation of the sockets of database connections. We can build a kind of "reverse socket proxy" that merges commands coming from several clients into a single stream of commands to the database.

In fact, such an application is very similar to pgBouncer; only, in addition to the standard functionality, it performs some manipulation of the byte stream from clients and is able to substitute one client for another.

I categorically dislike this method: implementing it requires picking apart the binary packets flowing between the server and the clients, and it involves a lot of systems programming. I include it solely for completeness.

3. Gateway JDBC


We can make a gateway JDBC driver: take the standard JDBC driver for a specific database - let it be PostgreSQL - wrap its classes, and expose HTTP interfaces to all of their external methods (it does not have to be HTTP, but the difference is small). Next, we create another JDBC driver, a facade that redirects all method calls to the JDBC gateway. In effect, we cut the existing driver into two halves and link the halves over the network. We get the following component scheme:



NB: As we can see, all three options are similar; the only difference is at what level we transfer the connection and what tools we use for it.


After that, we teach our driver to perform essentially the same UUID-transaction trick described in method 1.
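The "facade that redirects all method calls" half of the driver can be sketched with a dynamic proxy. In this toy version the transport is an in-process call rather than HTTP, and `MiniStatement` is a made-up stand-in for the real JDBC interfaces, so the sketch stays self-contained:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// A one-method stand-in for java.sql.Statement, to keep the sketch small.
interface MiniStatement {
    int executeUpdate(String sql);
}

// Every call on the facade is turned into a (method name, args) record that
// a real implementation would serialize and send to the JDBC gateway over
// the network; here we just log it and invoke the target in-process.
class GatewayTransport implements InvocationHandler {
    final MiniStatement target;                  // stands in for the remote gateway
    final List<String> wire = new ArrayList<>(); // what would go over HTTP

    GatewayTransport(MiniStatement target) { this.target = target; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Exception {
        wire.add(method.getName() + "(" + args[0] + ")");
        return method.invoke(target, args); // real version: HTTP round-trip
    }

    static MiniStatement facadeFor(GatewayTransport transport) {
        return (MiniStatement) Proxy.newProxyInstance(
                MiniStatement.class.getClassLoader(),
                new Class<?>[] { MiniStatement.class }, transport);
    }
}
```

The attraction of this scheme is that application code keeps calling ordinary JDBC methods, while the gateway on the other side of the "wire" is free to keep the physical connection open and reassign it between clients.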

In Java code, using this method might look like this.

Service A - beginning the transaction


Below is the code of a service that starts a transaction, makes changes to the database, and passes the transaction on to another service for completion. The code works with JDBC classes directly; in 2019, of course, nobody does this, but the code is simplified for the sake of the example.

// Load the facade driver; "postgresqlfacade" is our facade's JDBC subprotocol
Class.forName("org.postgresql.FacadeDriver");
var connection = DriverManager.getConnection(
        "jdbc:postgresqlfacade://hostname:port/dbname", "username", "password");

// Do some work within the transaction
var statement = connection.createStatement();
statement.executeUpdate("insert ...");

/* Now we hand the transaction over. transactionUUID(int) is a pseudo-function
   that is not sent to the database; it is recognized and intercepted by the
   JDBC gateway. The returned ResultSet contains a single Varchar row - the
   transaction UUID. After this call the gateway parks the transaction and
   waits for another connection to claim it by that UUID. The argument (60)
   is the timeout in seconds; if nobody claims the transaction in that time,
   it is rolled back. In real code one would use something like JdbcTemplate
   instead of pulling values out of the ResultSet by hand. */
var rs = statement.executeQuery("select transactionUUID(60)");
String uuid = extractUUIDFromResultSet(rs);

// Pass the UUID to the next service so it can pick up the transaction
remoteServiceProxy.continueProcessing(uuid, otherParams);

// From this connection's point of view the transaction is over;
// close everything and return
closeEverything();
return;

Service B - completion of the transaction


// This code runs when another service calls
// remoteServiceProxy.continueProcessing(...) and passes us the UUID
Class.forName("org.postgresql.FacadeDriver");
var connection = DriverManager.getConnection(
        "jdbc:postgresqlfacade://hostname:port/dbname", "username", "password");

// Attach to the parked transaction. The "continue transaction" command is
// not sent to the database; it is intercepted by the JDBC gateway
var statement = connection.createStatement();
statement.executeUpdate("continue transaction " + uuid);

// From here on we are working inside the transaction started by service A
statement.executeUpdate("update ...");

// Complete the transaction
connection.commit();
return;

Interaction with other components and frameworks


Consider the possible side effects of such an architectural solution.

Connection pools


Since the real connection pool will live inside the JDBC gateway, it is better to turn off connection pools in the services altogether: a pool inside a service would capture and hold a connection that could be used by another service.

Also, after the UUID has been issued and the transaction is waiting to be handed to another process, the connection essentially becomes inoperative: from the frontend JDBC driver's point of view it is auto-closed, but from the gateway's point of view it must be kept and given to no one except whoever arrives with the right UUID.

In other words, dual management of the connection pool - in the JDBC gateway and within each of the services - can result in subtle, unpleasant errors.

JPA


With JPA, I see two possible problems:

  1. Transaction management. On a JPA commit, the engine may think it has saved all the data when in fact it has not. Most likely, manual transaction control plus a flush() before handing over the transaction will solve the problem.
  2. The second-level cache will most likely work incorrectly, but in distributed systems its use is limited in any case.

Spring transactions


The Spring framework's transaction-management mechanism will probably not work, and you will have to manage transactions manually. I am almost sure it can be extended - for example, by writing a custom scope - but to say for sure I would need to study how Spring Transactions works under the hood, and I have not looked there yet.

Advantages and disadvantages


Pros



Cons



A warning


Once again I warn you: do not try to repeat this trick at home or in a real project unless you have a very clear explanation of why you need it and convincing evidence that there is no other way.

That's all, and happy April Fools' Day!

Source: https://habr.com/ru/post/446288/

