Almost all modern software products consist of several services. High response times on inter-service calls often become a source of performance problems. The standard solution to this kind of problem is to pack several inter-service requests into a single one; this is called batching.
If you use batch processing, you may not be satisfied with its results in terms of performance or code clarity. This method is not as simple for the caller as you might think, and for different purposes and in different situations the right decision can vary greatly. Using concrete examples, I will show the pros and cons of several approaches.
Demonstration project
For clarity, we consider an example of one of the services in the application I'm working on.
Explanation of the platform chosen for the examples
The problem of poor performance is quite common and is not tied to any particular language or platform. This article uses Spring + Kotlin code examples to demonstrate the tasks and solutions. Kotlin is equally understandable (or incomprehensible) to Java and C# developers, and the code comes out more compact and readable than in Java. To make things easier for pure Java developers, I will avoid Kotlin's black magic and use only white magic (in the spirit of Lombok). There will be a few extension methods, but they are in fact familiar to all Java programmers as static methods, so this is a bit of sugar that doesn't spoil the taste of the dish.
There is a document approval service. Someone creates a document and submits it for discussion; changes are made during the discussion, and eventually the document gets approved. The approval service itself knows nothing about the documents: it is just a chat of approvers with some small additional functions that we will not discuss here.
So, there are chat rooms (corresponding to documents), each with a predefined set of participants. As in regular chats, messages contain text and files and can be replies and forwards:
data class ChatMessage(
    val id: Long? = null,
    val author: UserReference,
    val message: String,
    val files: List<FileReference>? = null,
    val forwardFrom: ChatMessage? = null,
    val replyTo: ChatMessage? = null
)
File and user references are links to other domains, declared like this:
typealias FileReference = Long
typealias UserReference = Long
User data is stored in Keycloak and obtained via REST. The same applies to files: files and meta information about them live in a separate file storage service.
All calls to these services are heavy requests. This means that the transport overhead of such a request is much greater than the time it takes the third-party service to process it. On our test benches, the typical call time for such services is 100 ms, so we will use these numbers from now on.
We need to make a simple REST controller that returns the last N messages with all the necessary information. That is, we assume the frontend message model is almost the same and all the data needs to be sent. The difference in the frontend model is that files and users have to be presented in a slightly expanded form so that they can be rendered as links:
data class ReferenceUI(
    val ref: String,
    val name: String
)

data class ChatMessageUI(
    val id: Long,
    val author: ReferenceUI,
    val message: String,
    val files: List<ReferenceUI>,
    val replyTo: ChatMessageUI? = null,
    val forwardFrom: ChatMessageUI? = null
)
We need to implement the following:
interface ChatRestApi {
    fun getLast(n: Int): List<ChatMessageUI>
}
The UI suffix marks DTO models for the frontend, that is, what we return via REST.
It may seem surprising that we don't pass any chat identifier, and that neither ChatMessage nor ChatMessageUI contains one. I did this intentionally, to keep the example code uncluttered (the chats are isolated, so we can assume we only have one).
A philosophical digression
Both the ChatMessageUI class and the ChatRestApi.getLast method use the List data type, although it is actually an ordered Set. The JDK handles this poorly: there is no way to declare at the interface level that element order is preserved on insertion and retrieval. So it has become common practice to use List where an ordered Set is needed (there is also LinkedHashSet, but it is not an interface).
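To illustrate the point, here is a tiny sketch (not from the service code): LinkedHashSet does preserve insertion order, but only as a property of the concrete class, not of the Set interface:
// linkedSetOf returns a LinkedHashSet: iteration follows insertion order
val ordered: Set<Long> = linkedSetOf(3L, 1L, 2L)
println(ordered.toList())   // [3, 1, 2]

// hashSetOf returns a HashSet: the Set interface gives no order guarantees
val unordered: Set<Long> = hashSetOf(3L, 1L, 2L)
println(unordered.toList()) // order is unspecified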
An important limitation: we will assume that there are no long chains of replies or forwards. That is, they exist, but their length does not exceed three messages. The whole chain of messages must be sent to the frontend.
The following APIs are available for obtaining data from the external services:
interface ChatMessageRepository {
    fun findLast(n: Int): List<ChatMessage>
}

data class FileHeadRemote(
    val id: FileReference,
    val name: String
)

interface FileRemoteApi {
    fun getHeadById(id: FileReference): FileHeadRemote
    fun getHeadsByIds(id: Set<FileReference>): Set<FileHeadRemote>
    fun getHeadsByIds(id: List<FileReference>): List<FileHeadRemote>
}

data class UserRemote(
    val id: UserReference,
    val name: String
)

interface UserRemoteApi {
    fun getUserById(id: UserReference): UserRemote
    fun getUsersByIds(id: Set<UserReference>): Set<UserRemote>
    fun getUsersByIds(id: List<UserReference>): List<UserRemote>
}
Notice that the external services provide batch processing out of the box, in both variants: via Set (without preserving element order, with unique keys) and via List (duplicates are possible, order is preserved).
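Purely for illustration (a hypothetical call, not part of the service code), the two variants differ in how they treat duplicates and order:
// List variant: duplicates are allowed and the result order follows the argument order
val heads: List<FileHeadRemote> = fileRepository.getHeadsByIds(listOf(7L, 7L, 3L))

// Set variant: keys are unique and no order is guaranteed
val uniqueHeads: Set<FileHeadRemote> = fileRepository.getHeadsByIds(setOf(7L, 3L))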
Simple implementations
Naive implementation
The first naive implementation of our REST controller will look like this in most cases:
class ChatRestController(
    private val messageRepository: ChatMessageRepository,
    private val userRepository: UserRemoteApi,
    private val fileRepository: FileRemoteApi
) : ChatRestApi {

    override fun getLast(n: Int) =
        messageRepository.findLast(n)
            .map { it.toFrontModel() }

    private fun ChatMessage.toFrontModel(): ChatMessageUI =
        ChatMessageUI(
            id = id ?: throw IllegalStateException("$this must be persisted"),
            author = userRepository.getUserById(author).toFrontReference(),
            message = message,
            files = files?.let { files ->
                fileRepository.getHeadsByIds(files)
                    .map { it.toFrontReference() }
            } ?: listOf(),
            forwardFrom = forwardFrom?.toFrontModel(),
            replyTo = replyTo?.toFrontModel()
        )
}

fun UserRemote.toFrontReference() = ReferenceUI("/user/$id", name)

fun FileHeadRemote.toFrontReference() = ReferenceUI("/file/$id", name)
Everything is very clear, and this is a big plus.
We use batch processing and retrieve data from the external services in batches. But what happens to performance?
For each message, one UserRemoteApi call is made to fetch the author field and one FileRemoteApi call to fetch all attached files. That seems to be all. Suppose the forwardFrom and replyTo fields of ChatMessage are obtained in a way that requires no extra calls. But converting them into ChatMessageUI leads to recursion, so the call counters can grow significantly. As noted earlier, let's assume that we don't have deep nesting and a chain is limited to three messages.
As a result, we get from two to six external service calls per message plus one JPA call for the entire batch of messages. The total number of calls varies from 2 * N + 1 to 6 * N + 1. What does this mean in real units? Suppose a page needs 20 messages to render. Getting them will take from 4 s to 10 s. Awful! I would like to fit into 500 ms. And since the frontend dreams of seamless scrolling, the performance requirements for this endpoint can be doubled.
Pros:
1. The code is short and self-documenting (a maintainer's dream).
2. The code is simple, so there is almost no way to shoot yourself in the foot.
3. Batch processing does not look like something alien and fits organically into the logic.
4. Logic changes are easy to make and stay local.

Cons:
Terrible performance, because the batches are very small.
This approach can often be seen in simple services or in prototypes. If the speed of making changes is important, it is hardly worth complicating the system. At the same time, for our very simple service, the performance is terrible, so the scope of applicability of this approach is very narrow.
Naive parallel processing
You can start processing all messages in parallel - this will get rid of the linear growth of time depending on the number of messages. This is not a particularly good path, because it will lead to a large peak load on the external service.
Implementing parallel processing is very simple:
override fun getLast(n: Int) =
    messageRepository.findLast(n).parallelStream()
        .map { it.toFrontModel() }
        .collect(toList())
In the ideal case, parallel processing of messages gives us 300–700 ms, which is much better than the naive implementation, but still not fast enough.
With this approach, the userRepository and fileRepository requests are still executed sequentially within each message, which is not very efficient. To fix this, you have to change the call logic quite a bit. For example, via CompletionStage (aka CompletableFuture):
private fun ChatMessage.toFrontModel(): ChatMessageUI =
    CompletableFuture.supplyAsync {
        userRepository.getUserById(author).toFrontReference()
    }.thenCombine(
        files?.let { files ->
            CompletableFuture.supplyAsync {
                fileRepository.getHeadsByIds(files).map { it.toFrontReference() }
            }
        } ?: CompletableFuture.completedFuture(listOf())
    ) { author, files ->
        ChatMessageUI(
            id = id ?: throw IllegalStateException("$this must be persisted"),
            author = author,
            message = message,
            files = files,
            forwardFrom = forwardFrom?.toFrontModel(),
            replyTo = replyTo?.toFrontModel()
        )
    }.get()!!
You can see that the initially simple mapping code has become less clear. This is because we had to separate the external service calls from the place where their results are used. That is not bad in itself, but the combination of calls does not look very elegant and resembles typical reactive spaghetti.
With coroutines, everything looks more decent:
private fun ChatMessage.toFrontModel(): ChatMessageUI =
    join(
        { userRepository.getUserById(author).toFrontReference() },
        {
            files?.let { files ->
                fileRepository.getHeadsByIds(files).map { it.toFrontReference() }
            } ?: listOf()
        }
    ).let { (author, files) ->
        ChatMessageUI(
            id = id ?: throw IllegalStateException("$this must be persisted"),
            author = author,
            message = message,
            files = files,
            forwardFrom = forwardFrom?.toFrontModel(),
            replyTo = replyTo?.toFrontModel()
        )
    }
Where:
fun <A, B> join(a: () -> A, b: () -> B) =
    runBlocking(IO) {
        awaitAll(async { a() }, async { b() })
    }.let {
        it[0] as A to it[1] as B
    }
Theoretically, using such parallel processing, we obtain 200–400 ms, which is already close to our expectations.
Unfortunately, such perfect parallelization does not happen, and the price is rather cruel: with only a few users working simultaneously, a flurry of requests hits the services, which will not be processed in parallel anyway, so we slide back to our sad 4 s.
My result with this service is 1300–1700 ms to process 20 messages. That is faster than the first implementation, but it still does not solve the problem.
Alternative use of parallel queries
What if the third-party services do not provide batch processing? For example, you can hide the missing batch implementation inside interface methods:
interface UserRemoteApi {
    fun getUserById(id: UserReference): UserRemote

    fun getUsersByIds(id: Set<UserReference>): Set<UserRemote> =
        id.parallelStream()
            .map { getUserById(it) }.collect(toSet())

    fun getUsersByIds(id: List<UserReference>): List<UserRemote> =
        id.parallelStream()
            .map { getUserById(it) }.collect(toList())
}
This makes sense if there is hope that batch processing will appear in future versions.
Pros:
1. Easy implementation of per-message parallel processing.
2. Good scalability.

Cons:
1. The need to separate data retrieval from data processing when requests to different services are executed in parallel.
2. Increased load on third-party services.
As you can see, the scope of applicability is about the same as for the naive approach. Parallel queries make sense if you want to increase your service's performance several times over at the cost of mercilessly exploiting others. In our example, performance increased 2.5 times, which is clearly not enough.
Caching
You can add JPA-style caching for external services, that is, store received objects within a session so that they are not fetched again (including during batch processing). You can build such caches yourself, use Spring with its @Cacheable, or always plug in a ready-made cache like EhCache manually.
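As a sketch of the Spring variant (not part of the demo service; CachedUserClient is a hypothetical wrapper and assumes caching is enabled via @EnableCaching with a configured cache manager):
import org.springframework.cache.annotation.Cacheable
import org.springframework.stereotype.Service

// A thin caching facade over the remote API. Spring proxies it, so the class
// and the cached method must be open for CGLIB (or use the kotlin-spring plugin).
@Service
open class CachedUserClient(private val api: UserRemoteApi) {

    // Repeated requests for the same id are served from the "users" cache
    // instead of going over REST again.
    @Cacheable("users")
    open fun getUserById(id: UserReference): UserRemote = api.getUserById(id)
}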
The common problem is that caches only make sense if there are hits. In our case, hits on the author field are quite likely (say, 50%), but there will be no hits on files at all. This approach will bring some improvement, but it will not change performance radically (and we need a breakthrough).
Cross-session (long-lived) caches require complex invalidation logic. In general, the later you resort to solving performance problems with cross-session caches, the better.
Pros:
1. Caching can be implemented without changing the code.
2. Performance gain of several times (in some cases).

Cons:
1. Possibly worse performance if used improperly.
2. Large memory overhead, especially with long-lived caches.
3. Complicated invalidation, errors in which lead to hard-to-reproduce runtime problems.
Very often, caches are used only to quickly patch up design problems. This does not mean that they do not need to be used. However, you should always treat them with caution and first evaluate the resulting performance gain, and only then make a decision.
In our example, caching gives a performance gain of around 25%. At the same time, caches have plenty of downsides, so I would not use them here.
Results
So, we looked at the naive implementation of a service that uses batch processing, and a few simple ways to speed it up.
The main advantage of all these methods is their simplicity, which has many pleasant consequences.
Their common problem is poor performance, primarily due to small batch size. So if these solutions do not suit you, it is worth considering more radical approaches.
There are two main areas in which you can search for solutions:
- asynchronous work with data (it requires a paradigm shift and is therefore not considered in this article);
- enlarging batches while keeping processing synchronous.
Enlarging batches will greatly reduce the number of external calls while keeping the code synchronous. The next part of the article is devoted to this topic.