The Reactive Manifesto

In recent years, application requirements have changed significantly. Dozens of servers, response times of a few seconds, hours of offline maintenance, gigabytes of data: that is what a large application looked like just a few years ago. Today, applications run on everything from simple mobile phones to clusters of thousands of processors. Users expect millisecond response times and one hundred percent uptime, while data has grown to petabytes.

Initially, this niche was occupied by large, innovative Internet companies such as Google or Twitter, but the same requirements have since emerged in many areas of the industry. Financial and telecommunications companies were the first to adopt new practices to meet them, and now the rest are catching up.

New requirements call for new technologies. Previous solutions relied on managed servers and containers. Scaling was achieved by buying bigger servers and by using multithreading. Adding new servers meant resorting to complex, inefficient and expensive proprietary solutions.
However, progress does not stand still. Application architecture has evolved along with the changing requirements. Applications built on this architecture are called reactive applications. The architecture lets programmers build event-driven, scalable, fault-tolerant and responsive applications: applications that work in real time and deliver good response times on top of a scalable, fault-tolerant stack that is easy to deploy on multicore and cloud architectures. These properties are critical to reactivity.

Reactive applications

The Merriam-Webster dictionary defines reactive as “ready to respond to external events,” which means that components are always active and always ready to receive messages. This definition captures the essence of reactive applications, focusing on systems that:

React to events: the event-driven nature enables the qualities that follow.
React to increasing load: the focus is on scalability, and contention for shared resources is minimized.
React to failures: fault-tolerant systems are built with the ability to recover at every level.
React to users: response time is guaranteed regardless of load.

Each of these characteristics is essential for a reactive application. They all depend on each other, but not like the tiers of a standard multi-tier architecture. On the contrary, they describe properties that apply to the whole technology stack:

[Figure: the four pillars of reactivity]

Next, we take a closer look at each of these four characteristics and see how they relate to each other.

Event-driven

Why it is important

Applications that use an asynchronous model are much more loosely coupled than applications based purely on synchronous calls. The sender and the receiver can be implemented without regard to how events propagate through the system, which lets interfaces focus on the content of the communication. The result is an implementation that is easier to extend, change and maintain, which gives more flexibility and lowers maintenance cost.

Since the recipients of asynchronous messages stay dormant until a message arrives, this approach uses resources efficiently and lets a large number of recipients share a single hardware thread. A non-blocking application can therefore have lower latency and higher throughput than a traditional application built on blocking synchronization and communication primitives. This lowers operating costs, raises processor utilization and makes end users happier.

Key building blocks

In an event-driven application, components interact by sending and receiving messages: discrete pieces of information that describe facts. These messages are sent and received asynchronously and without blocking. Event-driven systems lean towards push models rather than pull or poll models. That is, they push data to their consumers as soon as it becomes available instead of wasting resources on constantly asking or waiting for it.

Asynchronous messaging means that the application is highly concurrent by design and can run unchanged on a multicore architecture. Any CPU core can process any message, which opens up more opportunities for parallelization.
Non-blocking means the ability to keep making progress so that the application stays responsive at all times, even under failure or peak load. To achieve this, none of the resources needed for responsiveness, such as CPU, memory and network, may be monopolized. This leads to lower latency, higher throughput and better scalability.
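The two properties above can be illustrated with a minimal sketch built on Python's standard `asyncio` module. The names `worker`, `run_demo` and the `None` sentinel are assumptions made for this illustration and do not come from the manifesto; a real system would typically spread such workers over several cores or nodes rather than a single event loop.

```python
import asyncio


async def worker(name, inbox, handled):
    """Stay dormant until a message arrives, then handle it."""
    while True:
        message = await inbox.get()   # suspends this coroutine only, not the thread
        if message is None:           # hypothetical sentinel: no more messages
            inbox.task_done()
            break
        handled.append((name, message))
        inbox.task_done()
        await asyncio.sleep(0)        # yield so other workers get a turn


async def run_demo(n_messages=6, n_workers=3):
    inbox = asyncio.Queue()
    handled = []
    workers = [
        asyncio.create_task(worker(f"w{i}", inbox, handled))
        for i in range(n_workers)
    ]
    for i in range(n_messages):
        inbox.put_nowait(f"event-{i}")  # the sender does not wait for a receiver
    for _ in workers:
        inbox.put_nowait(None)          # one stop signal per worker
    await inbox.join()                  # wait until every message is processed
    await asyncio.gather(*workers)
    return handled


if __name__ == "__main__":
    for name, message in asyncio.run(run_demo()):
        print(name, "handled", message)
```

The sender never learns which worker picks up a message, which is the kind of decoupling the text describes: any worker can process any message.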

Traditional server-side architectures rely on shared mutable state and blocking operations on a single thread. Both make the system hard to scale. Shared mutable state requires synchronization, which introduces complexity and non-determinism and makes the code hard to understand and maintain. Putting a thread to sleep ties up a limited resource, and waking it up is expensive.

By decoupling event generation from event processing, we let the platform take care of the details of synchronization and of dispatching events across threads, while we concentrate on higher-level abstractions and business logic. We think about where events come from, where they go and how components interact, instead of fiddling with low-level primitives such as threads or locks.

Event-driven systems provide loose coupling between components and subsystems. As we will see later, this loose coupling is one of the prerequisites for scalability and fault tolerance. Without complex and strong dependencies between components, extending the system takes minimal effort.

When an application requires high performance and good scalability, it is hard to predict where bottlenecks will appear. It is therefore very important that the entire solution be asynchronous and non-blocking. For a typical application, this means that the architecture must be fully event-driven, from user requests arriving through the front end (browser, REST, etc.) and request handling in the web layer down to the services, the cache and the database. If even one of these layers falls short (by making blocking calls to the database, relying on shared mutable state, or invoking expensive synchronous operations), the whole stack stalls and users suffer from higher latency and reduced scalability.

The application must be reactive from top to bottom.

The need to eliminate the weakest link in the chain is well illustrated by Amdahl's law, which according to Wikipedia states:

The speedup of a program from parallelization is limited by the sequential part of the program. For example, if 95% of the computation can be parallelized, the theoretical maximum speedup cannot exceed 20, regardless of the number of processors used.

[Figure: illustration of Amdahl's law]
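The bound quoted above can be checked with a few lines of Python. This is a small sketch of the standard formula S(N) = 1 / ((1 - p) + p / N), where p is the parallelizable fraction and N the number of processors; the function names are chosen for this illustration only.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup on `processors` units when `parallel_fraction` of the work is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def max_speedup(parallel_fraction):
    """Limit of the speedup as the number of processors grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    for n in (1, 8, 64, 4096):
        print(n, "processors:", round(amdahl_speedup(0.95, n), 2))
    print("upper bound:", round(max_speedup(0.95), 2))
```

With 95% of the work parallelized, the speedup approaches but never reaches 20, however many processors are added. This is why a single blocking layer can cap the scalability of the whole stack.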

Scalability

Why it is important

The word scalable is defined by the Merriam-Webster dictionary as “able to easily expand or upgrade.” A scalable application can be expanded to the scale required. This is achieved by making the application elastic, a property that lets the system scale out or in (add or remove nodes) on demand. In addition, the architecture makes it possible to scale up or down (deploy on more or fewer processors) without redesigning or rewriting the application. Elasticity minimizes the cost of operating in the cloud, since we pay only for what we actually use.

Scalability also helps to manage risk: too little hardware can lead to dissatisfied and lost customers, while too much will simply sit idle (together with the staff) and cause unnecessary costs. A more scalable application also reduces the risk of having hardware available that the application cannot use: in the next 10 years we will have processors with hundreds, if not thousands, of hardware threads, and tapping their potential requires scalability at the microscopic level.

Key building blocks

An event-driven system based on asynchronous message passing is the foundation of scalability. The loose coupling and location independence of components and subsystems make it possible to deploy the system across multiple nodes while keeping the same programming model with the same semantics. As nodes are added, the capacity of the system grows. In terms of implementation, there should be no difference between scaling up to more cores and scaling out to more nodes in a cluster or data center. The topology of the application becomes a matter of configuration and/or of adaptive runtime algorithms that monitor the system load. This is what we call location transparency.

It is important to understand that the goal is not to reinvent transparent distributed computing, distributed objects or RPC-style communication. That has been tried before, and it failed. Instead, we should embrace the network by representing it directly in the programming model through asynchronous messaging. True scalability naturally involves distributed computing and with it inter-node communication, which means traversing the network, and the network is inherently unreliable. It is therefore important to make the constraints, trade-offs and failure scenarios explicit in the programming model instead of hiding them behind leaky abstractions that supposedly "simplify" things. It is equally important to equip yourself with tools that offer building blocks for the typical problems of a distributed environment, such as mechanisms for reaching consensus or messaging abstractions with a higher level of reliability.

Fault tolerance

Why it is important

Application downtime is one of the most damaging things that can happen to a business. Usually the service simply stops, leaving a hole in the revenue stream. In the long run it can also lead to unhappy customers and a poor reputation, which hurts the business even more. Surprisingly, fault tolerance requirements are routinely ignored or addressed with ad-hoc techniques. This often means the problem is tackled at the wrong level of granularity, with tools that are too coarse. A common approach is to cluster application servers with failover at runtime. Unfortunately, such off-the-shelf solutions are extremely expensive and also dangerous: they can bring down the entire cluster in a cascade. The reason is that failure management is handled at too coarse a level, when it should be worked out in detail at the level of interactions between smaller components.

In a reactive application, fault tolerance is not left for later but is part of the architecture from the start. Treating failures as first-class constructs in the programming model makes it easier to react to and manage them, which makes the application tolerant to failure and lets the system heal and repair itself at runtime. Traditional exception handling cannot achieve this because it deals with problems at the wrong levels: we either handle an exception right where it occurs or we start a recovery procedure for the whole application.

Key building blocks

To manage failures, we need a way to isolate them so that they do not spread to other healthy components, and a way to observe them from a safe place outside the context in which they can occur. One pattern that comes to mind is the bulkhead: the system is divided into compartments so that if one compartment is flooded (fails), the others are not affected. This prevents the classic problem of cascading failure and lets problems be dealt with in isolation.

[Figure: compartments and bulkheads]

The event-driven model that enables scalability also provides the primitives needed for fault tolerance. Loose coupling in an event-driven model gives us fully isolated components, in which failures are captured together with the necessary details, encapsulated as messages and sent to other components, which in turn inspect the errors and decide how to respond.

This approach creates a system in which:

business logic remains clean, separate from error handling;
failures are modeled explicitly so that partitioning, observation, control, and configuration are set declaratively;
the system can “heal” itself and recover automatically.

It is best if the compartments are organized in a hierarchical manner, like a large corporation, where problems are raised to a level that has enough power to take action.
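A rough sketch of this idea in Python follows. The classes `Failure`, `Worker` and `Supervisor`, the restart counter and the returned status strings are all hypothetical names invented for this illustration; the manifesto does not prescribe an API, and production toolkits implement supervision quite differently. Here a "restart" is only simulated by counting.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """A failure captured as a plain message instead of a propagating exception."""
    component: str
    error: Exception


class Worker:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.restarts = 0

    def handle(self, message):
        return self.handler(message)


class Supervisor:
    def __init__(self, name, max_restarts=2, parent=None):
        self.name = name
        self.max_restarts = max_restarts
        self.parent = parent
        self.workers = {}
        self.escalated = []  # failures nobody below could deal with

    def supervise(self, worker):
        self.workers[worker.name] = worker

    def deliver(self, worker_name, message):
        worker = self.workers[worker_name]
        try:
            return worker.handle(message)
        except Exception as exc:  # the bulkhead: nothing leaks to other workers
            return self.on_failure(Failure(worker_name, exc))

    def on_failure(self, failure):
        worker = self.workers.get(failure.component)
        if worker is not None and worker.restarts < self.max_restarts:
            worker.restarts += 1  # simulated restart of the failed compartment
            return "restarted"
        if self.parent is not None:
            return self.parent.on_failure(failure)  # raise the problem one level up
        self.escalated.append(failure)
        return "escalated"


if __name__ == "__main__":
    root = Supervisor("root")
    team = Supervisor("team", max_restarts=1, parent=root)
    team.supervise(Worker("parser", int))
    team.supervise(Worker("echo", lambda m: m))
    print(team.deliver("parser", "oops"))  # first failure is handled locally
    print(team.deliver("echo", "hi"))      # the sibling is unaffected
    print(team.deliver("parser", "bad"))   # restart budget spent, goes to root
```

Business logic (`int`, the echo lambda) contains no error handling at all; the decision about what to do with a failure lives entirely in the supervising level.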

The strength of this model is that it is purely event-driven: it is based on reactive components and asynchronous events and therefore has location transparency. In practice this means that its semantics do not depend on whether it runs on a local server or in a distributed environment.

Responsiveness

Why it is important

Responsive is defined by the Merriam-Webster dictionary as “responding quickly or responding appropriately.” Note that we use the word in its general sense throughout; it should not be confused with responsive web design and its CSS media queries and progressive enhancement.

Responsive applications are real-time, engaging, feature-rich and collaborative. They keep an open and continuous dialogue with their users through responsiveness and interactivity. This makes users more productive and creates a sense of being connected and equipped to solve problems and complete tasks at any time. One example is Google Docs, which supports real-time collaborative editing so that users see each other's edits as they happen.

Applications must respond to events in a timely manner, even in the presence of failure. If an application does not respond within a reasonable period of time (also called latency), it is effectively unavailable and therefore cannot be considered fault-tolerant.

For some applications, such as those in weapons or medical control systems, failing to stay within hard real-time bounds is tantamount to total system failure. Not all applications have requirements this strict. Still, many applications quickly become useless once they stop meeting their timing requirements; for example, a trading application may lose the current trade if it does not respond in time.

More common applications, such as online shopping retailers, lose profit if response time increases. Users interact more intensively with responsive applications, which leads to large volumes of purchases.

Key building blocks

Reactive applications use observable models, event streams and stateful clients.

Observable models allow other systems to receive events when their state changes. This provides real-time communication between users and systems. For example, when several users work on the same model at the same time, changes can be synchronized between them reactively, which removes the need to lock the model.
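As an illustration of the idea, here is a minimal observable model in Python. The class name `ObservableModel` and its `subscribe`, `set` and `get` methods are assumptions made for this sketch; they do not refer to any particular library.

```python
class ObservableModel:
    """Holds state and pushes a change event to every subscriber."""

    def __init__(self, **fields):
        self._state = dict(fields)
        self._observers = []

    def subscribe(self, callback):
        """Register callback(key, old, new); returns a function that unsubscribes."""
        self._observers.append(callback)
        return lambda: self._observers.remove(callback)

    def get(self, key):
        return self._state[key]

    def set(self, key, value):
        old = self._state.get(key)
        if old == value:
            return  # nothing changed, so nothing to announce
        self._state[key] = value
        for callback in list(self._observers):  # copy: observers may unsubscribe
            callback(key, old, value)


if __name__ == "__main__":
    document = ObservableModel(title="Draft")
    unsubscribe = document.subscribe(
        lambda key, old, new: print(f"{key}: {old!r} -> {new!r}")
    )
    document.set("title", "Final")   # subscribers are notified immediately
    unsubscribe()
    document.set("title", "Later")   # no one is listening any more
```

Subscribers never poll the model; they are told about each change as it happens, which matches the push style described earlier in the text.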

Event streams form the basic abstraction on which such connections are built. By keeping them reactive, we avoid blocking and allow transformations and communication to be asynchronous and non-blocking.

Reactive applications must take the order of their algorithms into account to make sure that the response time to an event does not exceed O(1), or at worst O(log n), regardless of load. A constant factor may be involved, but it must not depend on the number of customers, sessions, products or transactions.

Here are some strategies that will help keep latency independent of the load profile:

Under bursty traffic, reactive applications must amortize the cost of expensive operations, such as I/O or concurrent data exchange, by batching with an understanding of the specifics of the underlying resources.
Queues must be bounded with the flow rate in mind; the queue length for a given response time requirement should be determined with Little's law.
Systems must be monitored continuously and have an adequate safety margin.
In the event of failure, circuit breakers trip and fallback processing strategies are activated.
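Two of these strategies lend themselves to a short sketch. Little's law states that the average number of items in a system equals the arrival rate multiplied by the average time an item spends in it (L = λW), so a queue that must honor a maximum waiting time can be capped at the arrival rate times that time. The circuit breaker below is a deliberately simplified version of the pattern; `max_queue_length`, `CircuitBreaker` and their parameters are hypothetical names chosen for this example.

```python
import math
import time


def max_queue_length(arrival_rate_per_s, max_wait_s):
    """Little's law, L = lambda * W, used as an upper bound for a queue."""
    return math.floor(arrival_rate_per_s * max_wait_s)


class CircuitBreaker:
    """Fails fast after repeated errors, then allows a trial call after a pause."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                return fallback()   # open: do not touch the failing resource
            self.opened_at = None   # half-open: let one trial call through
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip the breaker
            return fallback()
        self.failures = 0           # success closes the circuit again
        return result


if __name__ == "__main__":
    # 100 requests per second with at most 0.5 s of waiting time
    print("queue bound:", max_queue_length(100, 0.5))
```

While the breaker is open, callers get the fallback immediately instead of piling up behind a resource that is already in trouble, which keeps latency bounded during a failure.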

As an example, consider a responsive web application with “rich” clients (browser, mobile application) that give the user a high-quality interactive experience. The application runs logic and keeps state on the client side, where observable models provide a mechanism for updating the user interface in real time when data changes. Technologies such as WebSockets or Server-Sent Events let the user interface connect directly to the event stream, so that the entire system becomes event-driven, from the back-end layer all the way to the client. This allows reactive applications to push events to browsers and mobile applications through asynchronous, non-blocking data transfer while remaining scalable and fault-tolerant.

Now it becomes clear how the four pillars of reactivity (event-driven, scalable, fault-tolerant and responsive) are connected with each other and form a single whole:

[Figure: the four pillars of reactivity]

Conclusion

Reactive applications are a balanced approach to solving current problems in the development of software systems. They are built on an event-driven, message-based foundation and provide the tools needed for scalability and fault tolerance. On top of this, they support rich and responsive user interaction. We expect a rapidly growing number of systems to follow this manifesto in the near future.


Source: https://habr.com/ru/post/195562/
