
The MegaFon Card: technical details

In previous posts we have already discussed the MegaFon Card as a financial product and talked about what it offers the end user. Behind a project like this, of course, lies a huge amount of work by a team of professionals. This time we will tell you more about the technical side of the project and about how the software was designed.



Given the scale of the project, we cannot cover every detail in a single post, so we will start with the backend of the RBS (remote banking service) system. The backend's job is to tie all of the specialized systems together under a single logic, to run a variety of background processes and, most importantly, to give users a convenient and functional personal account.

MegaFon chose TYME as the technical partner for the project. Throughout the work, specialists from both companies collaborated closely with the vendors of banking software, billing, payment systems and other suppliers, assembling the individual functional pieces of the future product into a single whole.
“We had tight deadlines and a lot of work. There were no spare days for long discussions, and no right to make a mistake when choosing technologies and approaches. It's great that together with TYME we managed to deliver such a complex and innovative project.”
Yan Kukhalsky, CEO of MegaLabs

In 2013 we launched the “Terminals” project, and since then we have been continuously improving the solution, integrating it with our services and new service providers and adding new features for customers.

TYME has a lot of experience in the fintech industry; they successfully overcame all the difficulties, and together with our technology partners we launched the MegaFon Card. From here on we hand the MegaFon blog over to the TYME team and let them tell the story of their part of the project.

Details of a large project


After completing each large-scale project, we look back and evaluate the work done. In a very short time we managed to launch an extremely complex system that sits at the very heart of a federal-scale financial product.

A few numbers, by way of example:


The true scope of the work can only be assessed once development ends, because the volume of tasks keeps growing while the work is under way. Looking at it now, it is even a little frightening: how did we ever commit to such a volume of work within such a time frame?



November 2015. The project is at the concept stage. In plain language: all we have is a firmly fixed launch date and a rough statement of requirements from the business.

Carving the marble




The opportunity to study the customer without pestering them with constant questions came from well-built relationships and several years of working together.

Here are some of the principles that really helped us:


Of course, this approach carries an inevitable risk for the developer: substantial resources have to be spent on analysis, and on the developer's own initiative. It is suitable only for projects where you are completely confident in your relationship with the customer and well-versed in the industry itself.

Agile and long-term planning


Much has already been said about the merits of Agile methodologies. We will not repeat it here and will instead concentrate on the points that customers usually find harder to accept.


Since we worked with Scrum, the problem was a standard one: the customer wanted a waterfall-style project plan for the coming year, while detailed planning existed only for the next few sprints, for which the team had committed to the tasks described.

You can often find the following recommendation: if you want a long-term plan while working with Agile, create each sprint or iteration as a task in MS Project. The output will look roughly like this:



This recommendation is so generic that it did not take root with us. A plan of this kind does not show the product milestones that matter to the customer and cannot really be used to discuss the long-term project schedule.

A silver bullet was proposed back in 1986 by the American software engineer Barry Boehm: the spiral model of development. Many IT professionals know what it is, but in practice you usually see one of two extremes: either Agile without any long-term plans, or a waterfall with deadlines and budgets that keep changing.

Artemy Lebedev put it well on this subject in “Kovodstvo”.

Thanks to the spiral model, we solved two problems at once:


The project work was organized like this:






As a result, we arrived at the following mapping between the tasks in the MS Project plan for the customer and the tasks in Jira for the development team.



Jira handled the workflow itself (though any convenient Agile tool would do), while the MS Project plan was used to track the overall status of the work and visualize it for the customer.
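
The article does not show how the two plans were kept in sync, so here is only a minimal sketch of how such a mapping could be automated with the jira Python client. The server URL, credentials and sprint names below are placeholders we invented, not the project's real setup.

```python
# A sketch (not the authors' actual tooling) of pulling per-sprint progress out
# of Jira so it can be copied into a high-level MS Project plan.
from jira import JIRA

jira = JIRA(server="https://jira.example.com", basic_auth=("bot", "secret"))

def sprint_summary(sprint_name: str) -> dict:
    """Return done/total issue counts for one sprint."""
    issues = jira.search_issues(f'sprint = "{sprint_name}"', maxResults=500)
    done = sum(1 for i in issues if i.fields.status.name in ("Done", "Closed"))
    return {"sprint": sprint_name, "total": len(issues), "done": done}

if __name__ == "__main__":
    for name in ("RBS Sprint 14", "RBS Sprint 15"):  # hypothetical sprint names
        s = sprint_summary(name)
        # Each line corresponds to one summary task in the MS Project plan.
        print(f'{s["sprint"]}: {s["done"]}/{s["total"]} issues done')
```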

Macro effect from microservices


The project turned into a set of closely related subprojects. We could afford that because from the very beginning we had chosen an approach that allows this kind of decomposition.

Microservices have been a trend in the development world for quite some time, and the topic generates plenty of discussion at the relevant conferences. Some people fundamentally deny the benefits of building systems this way; others take the exact opposite position and migrate all of their complex systems to a microservice architecture.

Microservice architecture is an approach to development in which, instead of building one large application split into layers (GUI, business logic, database), you create many small, isolated components called microservices. Without going into theoretical details that are easy to find on Habr, I would like to dwell on how this approach turned out to be useful in our project.

Here are the main advantages of this approach (in our opinion):

  1. Each service solves a specific set of tasks and exposes an API through which it is accessed, so responsibility is cleanly isolated within a single service (a minimal sketch of such a service is shown after this list).

  2. A single microservice can easily be replaced with a new version or quickly and safely refactored.

  3. A microservice can be scaled horizontally when it becomes a performance bottleneck. This is a killer feature for systems that have to run 24/7. Scaling goes hand in hand with monitoring the response time of each service: we collect these statistics and decide when to launch additional instances.

  4. Corporate networks require us to operate inside a closed security perimeter, yet part of our platform needs Internet access, while other services are isolated and talk to dozens of external systems within separate subnets. We defined segments that face the public Internet, internal services, and integration services that live in a special zone with the most restricted access. With a monolith we would have had to combine several networks on a single server, something the information-security staff rarely appreciate.
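
To illustrate point 1, here is a minimal sketch of an isolated service with its own narrow API and a health endpoint. The framework (Flask) and the endpoint names are our own assumptions for illustration, not the actual stack of the MegaFon Card backend.

```python
# Minimal illustration of an isolated service with a narrow API.
from flask import Flask, jsonify

app = Flask("card-balance-service")

# In a real system this would query the service's own datastore.
_BALANCES = {"79261234567": 1500.00}

@app.get("/api/v1/balance/<msisdn>")
def balance(msisdn: str):
    """The only business operation this service is responsible for."""
    if msisdn not in _BALANCES:
        return jsonify(error="unknown subscriber"), 404
    return jsonify(msisdn=msisdn, balance=_BALANCES[msisdn])

@app.get("/health")
def health():
    """Used by monitoring and by the rolling-update procedure."""
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(port=8081)
```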

Of course, some difficulties could not be avoided:

  1. The hardest decisions concern the boundaries of a microservice: you constantly have to answer, without mistakes, which of the existing microservices should own a given task. At the start we had to duplicate some logic across several microservices in order to keep them isolated. For a developer this feels a little unusual, since code reuse is normally the goal.

  2. A radically different approach to updating the application. Without automation the administrator's life quickly becomes complicated, because delivering an update to the production environment would require several times more operations within the maintenance window.

Splitting the system into a set of microservices + a small team + Scrum development: that is the recipe that helped us reduce dependencies, make the most of our capabilities and competencies, and develop in several directions at once while minimizing the impact of each service on the rest of the system.

Be in the middle of the action


Our system not only implements the business logic that presents banking data to the client, it is also an integration bus connecting the front end with all the external systems involved.

If our backend fails, the user automatically sees a failure. If any other system fails, we have a chance to mitigate the problem and keep most of the functionality working.
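
The article does not describe the mitigation mechanism itself, so here is only a sketch of the idea: if an adjacent system is down or slow, the response degrades gracefully instead of failing outright. The bonus-points service and its URL are invented names, not real project components.

```python
# Our own illustration of graceful degradation: the card balance still renders
# even if a hypothetical bonus-points service times out.
import requests

BONUS_URL = "https://bonus.internal.example/points"  # hypothetical endpoint

def get_dashboard(msisdn: str, balance: float) -> dict:
    dashboard = {"msisdn": msisdn, "balance": balance, "bonus_points": None}
    try:
        resp = requests.get(BONUS_URL, params={"msisdn": msisdn}, timeout=0.5)
        resp.raise_for_status()
        dashboard["bonus_points"] = resp.json()["points"]
    except requests.RequestException:
        # The external system is unavailable: mark it and return a partial
        # dashboard instead of failing the whole request.
        dashboard["bonus_unavailable"] = True
    return dashboard
```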

This means our system must have reliability and performance characteristics that exceed those of any adjacent platform:


We understood from the outset that our system would become the main hub from which incident diagnosis starts, so we had to have comprehensive information available for any technical investigation.

These inputs dictated additional requirements for the development:


Refresh in 90 seconds


When working with a platform built on microservices, it is important to understand:


At the same time, we could not afford to introduce maintenance windows for updates: in our case even a small amount of downtime is extremely undesirable.

By combining the efforts of the operations team and the developers, we arrived at a solution with all the necessary properties:





There are still points in the upgrade procedure that can be improved, for example updating each service's database and keeping the old and new versions of a service backward compatible at the same time.
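
Since the article does not show the procedure itself, here is only a rough sketch of a zero-downtime rolling update, assuming each service runs as several instances behind a load balancer with a /health endpoint. The hosts, the lb-ctl command and the deploy script are invented for illustration.

```python
# Our own sketch: update instances one at a time, checking health before
# returning each one to the load-balancer rotation.
import subprocess
import time
import requests

INSTANCES = ["card-api-1:8081", "card-api-2:8081"]  # hypothetical hosts

def wait_healthy(instance: str, timeout: int = 90) -> bool:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"http://{instance}/health", timeout=2).ok:
                return True
        except requests.RequestException:
            pass
        time.sleep(2)
    return False

for instance in INSTANCES:
    host = instance.split(":")[0]
    subprocess.run(["lb-ctl", "drain", instance], check=True)        # stop new traffic
    subprocess.run(["ssh", host, "deploy-new-version.sh"], check=True)
    if not wait_healthy(instance):
        raise SystemExit(f"{instance} did not become healthy, aborting rollout")
    subprocess.run(["lb-ctl", "enable", instance], check=True)       # back into rotation
```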

If there is interest, we will gladly publish a separate article about the approaches we use.

Quality philosophy


A service that must never stop and yet keeps evolving is always caught between two fires. Any downtime means enormous financial and reputational losses; add the extremely tight deadlines and the requirements that kept changing during development, and the complexity only grew. The quality department therefore faced an ambitious task: to organize product testing so that it was:





Python was chosen as the basis for the autotests, together with the py.test framework. Python is a fast (in terms of development speed) and powerful language, and py.test, with its excellent fixture and parametrization machinery, is a flexible tool that lets you reuse test code extensively.

Results are aggregated by a TeamCity build server with plugins installed to interpret the py.test results.

The tests themselves are written to be as isolated as possible: the outcome of one test does not depend on the results of the others. If a test needs data in the system database, a fixture puts that data there before the test runs. If a value in the cache can affect the test, another fixture resets the cache first. Building a systematic grid of fixtures took considerable time, but the investment quickly paid off in how fast new tests could be added and, most importantly, in how stable their results were. Full control over the system under test means a minimum of false positives.
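
Here is a minimal sketch of that fixture style. The db_session, cache_client and api_client fixtures, and the payments API they expose, are hypothetical helpers assumed to be defined elsewhere (for example in a conftest.py), not the project's real interfaces.

```python
# A sketch of isolated tests built on a grid of fixtures.
import pytest

@pytest.fixture
def clean_cache(cache_client):
    """Make sure no stale cached value can influence the test."""
    cache_client.flushall()
    yield cache_client

@pytest.fixture
def subscriber(db_session):
    """Seed the system database with the data this test depends on."""
    row = {"msisdn": "79261234567", "balance": 1000}
    db_session.execute(
        "INSERT INTO accounts (msisdn, balance) VALUES (:msisdn, :balance)", row
    )
    db_session.commit()
    yield row
    db_session.execute("DELETE FROM accounts WHERE msisdn = :msisdn", row)
    db_session.commit()

def test_payment_reduces_balance(api_client, subscriber, clean_cache):
    api_client.pay(msisdn=subscriber["msisdn"], amount=100)
    assert api_client.balance(subscriber["msisdn"]) == 900
```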

Integration with the TeamCity build server made it possible to check all the platform's processes with the press of a single button. No preparation is required, so any member of the team can do it, and the test report is displayed in a detailed and clear way in the build server's web interface.

We have not regretted completely abandoning specialized tools for API test automation. Yes, such tools offer a set of functionality right out of the box, but firstly they are not cheap, and secondly we needed more flexibility anyway.

For example, some of our API test cases require receiving the SMS confirmation code for an operation, feeding it back into the test and observing the system's behaviour. This is where coded tests show their strength, even if they are more expensive to develop than, say, assembling test steps in SoapUI.
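
A sketch of that scenario, continuing the fixture style above: the sms_gateway_stub fixture (which captures messages the platform sends in the test environment) and the transfer API calls are hypothetical names we introduce only for illustration.

```python
# Our own illustration of feeding an SMS confirmation code back into a test.
import re

def test_transfer_requires_sms_confirmation(api_client, subscriber, sms_gateway_stub):
    # Start an operation that the platform protects with an SMS code.
    operation = api_client.start_transfer(
        msisdn=subscriber["msisdn"], to_card="4111111111111111", amount=500
    )

    # Pull the code out of the message captured by the stubbed SMS gateway.
    sms_text = sms_gateway_stub.last_message(subscriber["msisdn"])
    code = re.search(r"\b(\d{4,6})\b", sms_text).group(1)

    # Feed the code back and check that the operation completes.
    result = api_client.confirm_transfer(operation_id=operation["id"], code=code)
    assert result["status"] == "completed"
```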

As a result, the process is now structured so that testers use Postman or SoapUI only at the initial verification stage. In the end every process must still be covered by Python autotests committed to the shared repository. That is the law.

No change to the system's functionality ships without testing in general and autotests in particular; a story is simply not considered complete until it is covered by autotests. This approach demands strong self-discipline from the team and real throughput from the testers, but the result is worth it: if the tests on the build machine are green, we are confident in the quality of our system.

The number of functional tests now exceeds 2,500 and continues to grow; a full run takes 25 minutes.

The right choice of tools and full autotest coverage from the early stages of development allowed us to keep implementing new functionality at a high pace throughout the project, stay flexible in the face of changing requirements, and avoid sacrificing product quality.

One more thing


Documentation is an aspect of IT projects that often remains in the shadow of stories about architecture, design and communication with the customer. Yet it is a very important detail: neglect it, and operating and evolving even the most remarkably well-organized system becomes difficult.

We chose an approach in which the documentation evolves along with the development cycle:


This approach gave several advantages at once:




Most importantly, at any moment we can provide the customer with documentation for both the current version of the platform and the planned changes, so that they are ready for the updates.

Thanks to a couple of days spent on general editing and on setting up an export from Confluence, the final document covering all components of the system can now be produced within an hour.
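
The article does not describe the export setup, so here is only a minimal sketch of how such an export could be scripted with the atlassian-python-api client. The Confluence URL, space key and page titles are placeholders, not the project's real documentation structure.

```python
# Our own sketch of assembling one deliverable document from Confluence pages.
from atlassian import Confluence

confluence = Confluence(url="https://confluence.example.com",
                        username="doc-bot", password="secret")

SPACE = "RBS"  # hypothetical space key
COMPONENT_PAGES = ["Card API", "Payments Service", "Integration Bus"]

with open("platform-docs.html", "w", encoding="utf-8") as out:
    for title in COMPONENT_PAGES:
        page = confluence.get_page_by_title(SPACE, title)
        full = confluence.get_page_by_id(page["id"], expand="body.storage")
        # Concatenate the storage-format bodies into a single document.
        out.write(f"<h1>{title}</h1>\n")
        out.write(full["body"]["storage"]["value"])
```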

Instead of a conclusion


We have tried to describe the general principles of how the two companies worked together and to give a look inside the development kitchen.

We would love to hear your tough questions; they could form the basis of new articles!



Thank you, guys, for such a detailed story! This is only the first part covering the technical side of the MegaFon Card; there are more stories to come. Stay tuned.

Source: https://habr.com/ru/post/317740/

