Under the cut is the story of how we implemented a modular architecture for a CPA (Content Provider Access) platform for a telecom operator on Fuse Fabric, and why we made that choice instead of using the standard J2EE technology stack to build a monolithic application.
What is CPA?
CPA is a platform for organizing interaction with content providers, which offer various services to the subscribers of their partners, the telecom operators.
As the simplest scenario, let's consider the situation when a subscriber wants to know the weather for tomorrow:
- the subscriber sends an SMS message to a short number to receive the weather forecast;
- the SMS message is received by the SMS center of the telecom operator;
- the SMS center passes the message to the CPA platform;
- the CPA platform determines the service associated with the short number, determines its cost, checks that the subscriber's account has enough money, requests the weather forecast from the partner providing the service, debits the subscriber's account, logs the business transaction, and sends the data back to the SMS center for delivery to the subscriber;
- the SMS center sends an SMS with tomorrow's weather to the subscriber;
- the subscriber receives the SMS message.
This scenario illustrates well how different processes can be built from chains of interactions between various modules. In particular, it involves the module for interaction with the SMS center, the service catalog, the partner catalog, the billing integration module, the business operations logging module, the module for interaction with the partner, and so on.
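To make the module boundaries more concrete, below is a minimal sketch of what the APIs along this chain might look like. All names and signatures here (ServiceCatalog, BillingService, and so on) are illustrative assumptions, not the actual interfaces of our platform.

```java
import java.math.BigDecimal;

// Each interface would live in the API package of its own OSGi bundle;
// they are shown together here only for brevity.
interface ServiceCatalog {
    /** Resolves the service id associated with a short number. */
    String findServiceByShortNumber(String shortNumber);
    /** Returns the price of the given service. */
    BigDecimal priceOf(String serviceId);
}

interface BillingService {
    /** Checks that the subscriber's account can cover the given amount. */
    boolean hasSufficientBalance(String msisdn, BigDecimal amount);
    /** Debits the subscriber's account for the given transaction. */
    void charge(String msisdn, BigDecimal amount, String transactionId);
}

interface PartnerGateway {
    /** Requests the content (e.g. a weather forecast) from the partner. */
    String requestContent(String serviceId, String subscriberQuery);
}

interface BusinessLog {
    /** Records the completed business transaction. */
    void logTransaction(String transactionId, String msisdn, String serviceId);
}
```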
Even this simplest scenario consists of more than a dozen operations. At the same time, a telecom operator can have dozens of far more complex scenarios involving a wide variety of systems. For example, a request may come from a USSD center, an IVR center, a web portal, and so on. In addition, it may not be a one-time request but a subscription to a service (including payment in installments), an unsubscription from it, and so on.
Choice of approach
When we started the project to create the CPA system, we chose among several options for the platform architecture. When designing applications of this scale, the following aspects need to be considered first of all:
- This is a priori a high-load system that must handle hundreds of fairly complex business scenarios per second. As the load grows, the system should scale horizontally with ease.
- Such systems are subject to very stringent downtime requirements. Downtime should be minimal even when the system configuration changes, for example when new functionality is added or existing components are modified.
- It is necessary to provide simple centralized management of the system, with metrics that allow its current state to be assessed unambiguously.
- The telecom sector requires a very quick response to changes in the market and the technology pool, so the system architecture must allow components to be replaced rapidly.
- It is extremely important to shorten the time to market as much as possible, so the development and testing cycle for new functionality should be as short as possible.
Given all this, we quickly realized that the traditional monolithic J2EE architecture did not suit us, because it did not provide the required flexibility and did not cover all of the customer's needs. Therefore, we chose a modular architecture based on
OSGi: Red Hat JBoss Fuse in the Fuse Fabric configuration.
Our architecture was to be based on stand-alone modules interacting through well-defined interfaces. Today this is described by the term "microservice architecture", but when we started building the CPA platform the word "microservices" was known to few.
Note that the wrong choice of architecture for systems of this level can lead to literally catastrophic consequences, both during development and operation and at later stages, when the system evolves and its functionality is extended. At the same time, we needed to show the customer, as soon as possible, a prototype proving the viability of the chosen idea. This is where OSGi's support for modularity came to the rescue.
In designing the system, we started from the APIs of the modules it was supposed to consist of, and from BPM processes (business processes described in the standard
BPMN notation). These processes described the business scenarios that the CPA platform executes by calling the API modules. In other words, at the initial stage we had modules (OSGi bundles) that were essentially microservice "stubs" performing primitive operations, and BPM processes describing sequences of calls to OSGi services, as close as possible to the telecom operator's business scenarios, to achieve the desired result.
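As a rough illustration, such a "stub" could be registered in the OSGi container as follows. We use OSGi Declarative Services annotations purely for brevity (in JBoss Fuse the same wiring is often done with Blueprint XML), and the ServiceCatalog interface is the hypothetical one sketched above.

```java
import java.math.BigDecimal;
import org.osgi.service.component.annotations.Component;

// A primitive "stub" implementation of the hypothetical catalog API.
// It lets the BPM processes be wired up and executed end to end
// before the real implementation of the bundle is ready.
@Component(service = ServiceCatalog.class)
public class StubServiceCatalog implements ServiceCatalog {

    @Override
    public String findServiceByShortNumber(String shortNumber) {
        // Always resolves to the same test service, regardless of input.
        return "weather-forecast";
    }

    @Override
    public BigDecimal priceOf(String serviceId) {
        // A fixed test price.
        return BigDecimal.ONE;
    }
}
```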
At the same time, we understood that the OSGi bundles we created would be called both from BPM processes and from other places. For example, the service catalog service is not only called directly from BPM processes, but is also used in back-office systems so that content managers can start, edit, and delete services. The service catalog API is also used to provide the list of services on the telecom operator's website. The billing service was first used only inside the CPA platform, but it is now also available as a REST service, so other systems in the telecom operator's infrastructure use it as well.
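To give an idea of what exposing such a module over REST can look like, here is a hedged sketch using standard JAX-RS annotations (in JBoss Fuse such resources are typically published via Apache CXF); the resource class, paths, and the injected catalog interface are illustrative.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Illustrative JAX-RS facade over the same OSGi catalog API,
// so that other systems of the operator can call it over HTTP.
@Path("/catalog")
public class ServiceCatalogResource {

    private final ServiceCatalog catalog;

    public ServiceCatalogResource(ServiceCatalog catalog) {
        this.catalog = catalog; // injected by Blueprint/DS in the container
    }

    @GET
    @Path("/short-number/{number}")
    @Produces(MediaType.TEXT_PLAIN)
    public String serviceForShortNumber(@PathParam("number") String number) {
        return catalog.findServiceByShortNumber(number);
    }
}
```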
Implementation
When building the architecture as modules with a clearly defined API, we very quickly became convinced of its effectiveness.
Since the OSGi APIs were agreed upon and approved, we were able to parallelize the development of individual OSGi bundles across different development teams. The teams did not affect each other in any way, because each worked with its own code base and subject area. The main thing was not to violate the agreed API. One team could focus on developing the partner catalog, another on the business logging service, and so on. Meanwhile, the architect monitored the integrity of the system against the agreed APIs.
To track API changes easily, we adopted semantic versioning of the OSGi bundles. This means assigning component versions in the X.Y.Z format, where X is the major version, Y is the minor version, and Z is the patch version. Versions are incremented according to the following rules:
- the major version is incremented when incompatible API changes are made;
- the minor version is incremented when new functionality is added that does not break backward compatibility;
- the patch version is incremented when backward-compatible fixes are added that do not affect the API.
Thanks to this, we easily tracked changes to the API services across the entire system and managed them precisely when assembling the services together.
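In OSGi, the exported package version can be declared directly in the code, for example with the standard org.osgi.annotation.versioning.Version annotation in package-info.java (the same can be configured through the build tooling). A minimal sketch with a made-up package name:

```java
// package-info.java of the catalog API bundle.
// Exports the API package with an explicit semantic version; incrementing
// the major part signals an incompatible change to every importing bundle.
@org.osgi.annotation.versioning.Version("1.2.3")
package com.example.cpa.catalog.api;
```

A consumer bundle would then typically import this package with a range such as [1.2,2.0), accepting minor and patch updates but rejecting a new major version.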
When developing complex systems, it is necessary to write tests for the implemented functionality. Since components in the OSGi environment interact by invoking interfaces, unit testing is greatly simplified. If one service in our application calls another service, then at the testing stage we can easily replace the called service with a "stub". Of course, when testing each module, the specifics of its operation must be taken into account.
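For example, a unit test of a hypothetical charging step can substitute the billing interface with a hand-written stub (or a mock from a library such as Mockito) without starting the OSGi container at all. The BillingService interface is the illustrative one sketched earlier:

```java
import static org.junit.Assert.assertTrue;

import java.math.BigDecimal;
import org.junit.Test;

public class ChargingScenarioTest {

    // A hand-written stub of the BillingService API used by the code under test.
    static class AlwaysRichBilling implements BillingService {
        boolean charged;

        @Override
        public boolean hasSufficientBalance(String msisdn, BigDecimal amount) {
            return true; // pretend the subscriber always has enough money
        }

        @Override
        public void charge(String msisdn, BigDecimal amount, String transactionId) {
            charged = true; // just record that the charge was requested
        }
    }

    @Test
    public void chargesSubscriberWhenBalanceIsSufficient() {
        AlwaysRichBilling billing = new AlwaysRichBilling();

        // The logic under test is inlined here for brevity; in the real
        // system it lives in its own OSGi bundle.
        if (billing.hasSufficientBalance("79001234567", BigDecimal.ONE)) {
            billing.charge("79001234567", BigDecimal.ONE, "tx-1");
        }

        assertTrue(billing.charged);
    }
}
```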
We understood that in a high-load system any service, depending on the nature of the load, could become a bottleneck for the entire application. Therefore, for every API call of an OSGi bundle, metrics were collected and exposed via the JMX interface. This made it possible to record the maximum duration of a call to any service method, the average value over a period of time, and the dynamics of the metric. The collected data helped us identify the application's bottlenecks immediately during load testing, and also made it possible to monitor the production system online. So, when performance degraded, we could quickly take measures to stabilize the system. For example, a service can be transferred to another node of the runtime environment or moved to a separate JVM. OSGi allows us to perform all these actions on a running system without affecting the rest of the application's components.
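A minimal sketch of how such per-service metrics can be exposed using the standard javax.management API (the MBean name and set of attributes are assumptions for illustration; Fuse and Camel also expose many metrics out of the box):

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// CallMetricsMBean.java -- the management interface visible in a JMX console.
public interface CallMetricsMBean {
    long getCallCount();
    long getMaxDurationMillis();
}

// CallMetrics.java -- collects timings around every API call of a bundle.
public class CallMetrics implements CallMetricsMBean {

    private final AtomicLong callCount = new AtomicLong();
    private final AtomicLong maxDurationMillis = new AtomicLong();

    /** Records one completed call of a service method. */
    public void record(long durationMillis) {
        callCount.incrementAndGet();
        maxDurationMillis.accumulateAndGet(durationMillis, Math::max);
    }

    @Override
    public long getCallCount() { return callCount.get(); }

    @Override
    public long getMaxDurationMillis() { return maxDurationMillis.get(); }

    /** Registers the MBean so the metrics show up under a per-service name. */
    public static CallMetrics registerFor(String serviceName) throws Exception {
        CallMetrics metrics = new CallMetrics();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(metrics,
                new ObjectName("com.example.cpa:type=CallMetrics,service=" + serviceName));
        return metrics;
    }
}
```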
When adding new functionality to an existing system, we can explicitly track changes in the code base (new bundles or newer versions of existing ones) and optimize the testing process. If we see that a component's version has not changed in the next delivery, there is no need to develop new tests for it or re-test it thoroughly. It is better to focus on the new functionality added to the system.
Of course, no one is immune from errors that may occur in the production environment. But OSGi allows you to quickly make changes to a running system: it is enough to fix the error in the component, increment its patch version, and install it into the OSGi runtime. This is an important advantage when system downtime significantly affects both the quality of the service provided and the company's profit.
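In terms of the OSGi framework API, such a hot replacement boils down to updating the bundle in place. A hedged sketch (in practice this is usually done through the Fuse/Karaf console or Fabric profiles rather than programmatically):

```java
import java.io.FileInputStream;
import java.io.InputStream;
import org.osgi.framework.Bundle;
import org.osgi.framework.BundleContext;

public class HotFix {

    /**
     * Replaces a running bundle with a patched build without restarting
     * the container or touching any other component.
     */
    public static void applyPatch(BundleContext context, long bundleId, String patchedJar)
            throws Exception {
        Bundle bundle = context.getBundle(bundleId);
        try (InputStream in = new FileInputStream(patchedJar)) {
            bundle.update(in); // e.g. the 1.2.4 build that fixes a bug found in 1.2.3
        }
    }
}
```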
It is also worth noting that when an error is detected in a monolithic architecture, unlike a modular one, you have to rebuild the entire application and restart it. And restarting a traditional J2EE application usually takes a long time, which translates into financial losses for the company.
Development department of the Center for software solutions of Jet Infosystems