How we tore down an old shack and built a skyscraper in its place

Zurab Bely, a team lead in the Java practice, tells the story of his work on a project for a large company and shares what he learned.

How I settled in...

I joined the project in late summer 2017 as an ordinary developer. I can't say I liked it much at the time: the technologies were old, communication within the team was minimal, and communication with the customer was difficult and unproductive. That is how the project greeted me. At that point I had only one wish: to get out of it as quickly as possible.

Let me tell you a little about the project as a whole. It is the official portal of a large company, with general information, news, promotions and other content. All marketing newsletters contain links to pages of the site, so the load is steady and moderate most of the time but can spike at certain moments. The stability and availability of the web application require special attention: every minute of downtime means large losses for the customer.

The shack that leaned in the wind

At first I mainly studied the technical state of the project, fixed minor bugs and made small improvements. From a technical point of view the application looked awful: a monolithic architecture built around an outdated commercial version of dotCMS, code written in Java 6 when Java 9 was already out, and server-side rendering of the client part with the Velocity framework, which by that time was no longer maintained. Each instance ran in JBoss AS and was routed through Nginx. Memory leaks led to constant node restarts, and the lack of proper caching increased the server load. But the biggest thorn was the changes made to the CMS code itself. They ruled out a painless upgrade to a newer version. A good illustration was the migration from version 3.2 to 3.7, which was just finishing at the time: moving to the next minor version took more than a year. None of the popular solutions, such as the Spring Framework, React.js, a microservice architecture or Docker, were in use.

The deeper I went into the project, the more visible the consequences of this technical state became. The most acute was that the application could not be run locally for development and debugging. The whole team of 8 people worked in a single development environment, where a copy of the production version of the application was deployed. As a result, only one developer could debug code at a time, and deploying updated code blocked the entire team. The low point was a failed sale: millions of messages went out to users by email, SMS and push notification, and tens of thousands of sessions were opened simultaneously. The servers could not cope, and the portal was unavailable most of the time. The application scaled poorly. There was only one way to do it: deploy another copy alongside and balance the load between them with Nginx. And every delivery to production involved a large amount of manual work and took several hours.

Six months after I joined the project, when things had already begun to spin out of control, it was decided to change the situation radically. The transition began. The changes touched every area: team composition, work processes, architecture and the technical side of the application.

We built and built...

First came the personnel changes. Several developers were replaced, and I was made team lead. The move to modern solutions began with bringing people onto the team who had experience with them.

Process changes were more sweeping. By that time, development followed Agile with Scrum: two-week sprints with a delivery at the end of each iteration. In practice, though, this not only failed to speed up the work but actually slowed it down. Daily meetings dragged on for an hour and a half to two hours and produced no results. Planning and grooming turned into arguments, swearing or idle chatter. Something had to be done. At first it was very hard to change anything here: the team had almost lost the customer's trust, especially after the failed sale. Every change had to be justified, discussed and defended at length. Oddly enough, the initiative came from the customer. They brought in a Scrum master to make sure approaches and methodologies were applied correctly, to tune the processes and to get the team into a working rhythm. Although he was engaged for only a few sprints, it helped us lay the foundation properly. The approach to communicating with the customer changed a great deal. We began to discuss process problems more often, retrospectives became more productive, developers gave feedback more willingly, and the customer, for their part, met us halfway and supported the transition in every way.

But I'll be honest: there were quite a few moments when changes within the team were made quietly, and the customer was told only after positive results appeared. Within six months the relationship had turned into comfortable working communication. This was followed by several team-building events, one-day and two-day meetings of the whole development team with the customer's team (marketing specialist, analyst, designer, product owner, content managers and others), and joint trips to the bar. A year later, and to this day, the communication can be called friendly. The atmosphere has become welcoming, relaxed and comfortable. Of course there are still conflicts, but even the happiest family has its quarrels.

No less interesting changes took place during this period in the application code, its architecture and the solutions used. If you are not technically minded, feel free to skip straight to the conclusion. And if you are as lucky as I am, welcome! The whole transition can be divided into several stages. Here is each one in detail.

Stage 1. Identification of critical problem areas.

Everything here was as simple and clear as possible. First of all, we had to get rid of the dependence on a third-party commercial product, cut up the monolith and make local debugging possible. I wanted to separate client and server code, both architecturally and physically. Another problem area was testing. The project had no automated testing at all. This made the transition somewhat harder, since everything had to be checked manually. Given that there had never been any written specifications for the functionality (owing to the specifics of the project), there was a high chance of missing something. Having written down the problem areas, we looked at the list again. It looked like a plan. Time to build a skyscraper!

Stage 2. Updating the code base.

The longest stage. It all started with the move to a service architecture (not to be confused with microservices). The idea was to split the application into several separate services, each handling its own specific task. The services were not meant to be “micro”, but we did not want to put everything in one pot either. Each service was to be a Spring Boot application written in Java SE 8 and run on Tomcat.

The first was the so-called "content service", which became the layer between the future application and the CMS. It became an abstraction on the path to content. The assumption was that all requests we previously made directly to the CMS would go through this service, now over HTTP. This reduced coupling and made it possible later to replace dotCMS with a more modern alternative, or even to drop commercial products entirely and write our own solution tailored to our tasks (looking ahead, I’ll say that this is the path we took).
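
The layering described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `ContentSource`, `LegacyCmsSource` and `ContentService` are hypothetical names, the legacy CMS is replaced by an in-memory map, and the HTTP transport is left out to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical contract: the only way the rest of the application reaches content. */
interface ContentSource {
    Optional<String> findPage(String path);
}

/** Stand-in for the legacy CMS; a real adapter would query dotCMS here. */
class LegacyCmsSource implements ContentSource {
    private final Map<String, String> pages = new HashMap<>();

    LegacyCmsSource() {
        pages.put("/news", "news page from the legacy CMS");
    }

    @Override
    public Optional<String> findPage(String path) {
        return Optional.ofNullable(pages.get(path));
    }
}

/** Facade that callers depend on; the backing source can be swapped without touching them. */
class ContentService {
    private final ContentSource source;

    ContentService(ContentSource source) {
        this.source = source;
    }

    /** Returns the page body, or a simple placeholder when the source has no such page. */
    String render(String path) {
        return source.findPage(path).orElse("404: " + path);
    }
}
```

Because callers only see `ContentService`, replacing `LegacyCmsSource` with another implementation of `ContentSource` requires no changes on their side, which is the decoupling the paragraph describes.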

This immediately laid the groundwork for separating frontend and backend code. We created a front service responsible for serving code written in React. We hooked up npm, configured Node and set up the build: everything as current frontend trends dictate.

In general, functionality was extracted into a service using the following algorithm:

created a new Spring Boot application with all the necessary dependencies and settings;
ported all the basic logic (often rewriting it from scratch and consulting the old code only to make sure no nuance was missed); for the caching service, for example, this meant adding to the cache, reading from it and invalidating it;
wrote all new functionality against the new service from then on;
gradually rewrote the old parts of the application in the new service, in order of importance.

At the start we had just a few services:

Content service. Served as a layer between the application and the CMS.
Cache service. Simple storage on Spring Cache.
AA service. At first it only served information about the authorized user. Everything else remained inside dotCMS.
Front service. Responsible for serving the client code.
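
To make the cache service's basic operations concrete (add, read, invalidate), here is a minimal sketch. It is an assumption-laden illustration: the article says the real service was built on Spring Cache, while this version uses a plain `ConcurrentHashMap` so that it runs without any framework, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory cache exposing the basic operations named in the text. */
class CacheService {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    /** Adds an entry, overwriting any previous value for the key. */
    void put(String key, String value) {
        entries.put(key, value);
    }

    /** Reads an entry; the result is empty when the key is absent. */
    Optional<String> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    /** Invalidates a single entry. */
    void evict(String key) {
        entries.remove(key);
    }

    /** Invalidates the whole cache, e.g. for a manual reset. */
    void clear() {
        entries.clear();
    }
}
```

Keeping the surface this small is what makes such a service easy to port first: callers need only these operations, so the storage behind them can change later.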

Stage 3. Automated tests.

Given all our experience on the project, we concluded that having functional tests greatly simplifies life and any further updates to the application. It was time to introduce them. Unit tests, sad to say, stalled almost immediately. They took a lot of time to write and maintain, and we had very little of it: besides rewriting the code, we had ongoing tasks for new functionality, and bugs surfaced often. We decided to limit ourselves to testing the application's interface with Selenium. On the one hand, this simplified regression testing before deliveries to production; on the other, it let us refactor the server side while monitoring the state of the client side. The team had no test automation engineer, and writing autotests and keeping them up to date takes extra effort. We did not retrain any of the testers; instead, one more person joined the team.

Stage 4. Deployment automation.

Now that we had separate services, the frontend was split from the backend and the main functionality was being covered by automated tests, it was time to open a can of beer and automate all the manual work of deploying and supporting the application locally and on the demo and production servers. Cutting the monolith into pieces and adopting Spring Boot opened up new horizons for us.

Developers could now debug code locally, running only the part of the functionality they needed. Test environments finally began to be used for their intended purpose: they now received more or less debugged code that was ready for initial and acceptance testing. But there was still a lot of manual work eating up our precious time and energy. After studying the problem and going through the options, we settled on the combination of Gradle and TeamCity. Since the project was already built with Gradle, adding another tool made no sense, and the resulting scripts were platform independent: they can be run on any OS, remotely or locally. This let us not only use any CI/CD solution but also switch to a different platform without serious consequences. TeamCity was chosen for its rich built-in functionality, large community, long list of plug-ins for all occasions and integration with the IntelliJ IDEA development environment.

At the moment there are more than 100 optimization scripts and more than 300 tasks in the CI system that run them with different parameters. They cover not only deploying test environments and delivering to production, but also working with logs, managing servers, routing, and handling routine repetitive tasks. Part of the workload was lifted from our shoulders. Content managers could reset the cache themselves. The technical support team could restart individual services and perform first-aid recovery on their own. Developers' sleep became deeper and calmer.

Stage 5. Own CMS.

Once we had abstracted away from the commercial CMS, it became possible to get data from another source without rewriting the application code. The content service decided where each piece of data came from. After searching for and analyzing ready-made solutions, we decided to write our own content management system, since none of them fully met our needs. Writing your own CMS is an endless process: new needs and wishes keep emerging. We chose a few basic features and set off into the wonderful world of development. Getting the first version into production took one and a half person-months. As soon as a piece of functionality is ready in the new CMS, we move the corresponding content over from the old one. All new pages have nothing to do with dotCMS. Over time this allowed us to drop the paid version in favor of the community version, and later to give up third-party products altogether.
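
One way such source selection could work during a gradual migration is sketched below. The article does not describe the actual routing rule, so the "new CMS first, legacy CMS as fallback" policy is an assumption, and `MigratingContentRouter` is a hypothetical name; both content systems are modeled as plain lookup functions.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical router: prefers the in-house CMS and falls back to the legacy one. */
class MigratingContentRouter {
    private final Function<String, Optional<String>> newCms;
    private final Function<String, Optional<String>> legacyCms;

    MigratingContentRouter(Function<String, Optional<String>> newCms,
                           Function<String, Optional<String>> legacyCms) {
        this.newCms = newCms;
        this.legacyCms = legacyCms;
    }

    /** Migrated pages are served by the new CMS; everything else still comes from the old one. */
    Optional<String> findPage(String path) {
        Optional<String> migrated = newCms.apply(path);
        // Java 8 has no Optional.or, so the fallback is written out explicitly.
        return migrated.isPresent() ? migrated : legacyCms.apply(path);
    }
}
```

With a rule like this, moving a page is just a matter of publishing it in the new system: callers keep asking the router and never learn which backend answered.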

Stage 6. Containerization.

Rolling up our trousers, we set off into the world of hipster programming. For my project this stage was the last one in the restructuring, and it continues to this day. The main reason this stage exists at all is scaling. The Docker + Kubernetes + Consul combination not only makes it easier to deploy to different servers and manage each service separately, but also lets you scale the application flexibly: only where it is needed and only for as long as it is needed. I will be able to describe it in more detail once we have fully switched to this solution in production.

...and finally built it. Hooray!

A year has passed since the application update began. It now consists of 26 REST services written with Spring Boot. Each has detailed API documentation in Swagger UI. The client part is written in React.js and is separated from the server. All the main portal pages are covered by tests. We have been through several large sales. The move to modern technology, getting rid of the old code and the stable operation of the servers are a strong motivation for developers. The project has moved from “we do what we're told” to a mode where everyone is interested in success and offers their own ideas for improvement and optimization. The customer's attitude to the team has changed, and a friendly atmosphere has formed.

This is the result of the work of the whole team: every developer and tester, the managers and the customer. It was very hard, nerve-racking and at times close to the edge. But team cohesion, good planning and a clear view of the results made it possible to overcome every difficulty.

Source: https://habr.com/ru/post/457624/
