
Why the SoundCloud team switched to microservices

We have already covered the imgix photo service's data center and the detective story of tracking down SSD problems on the Algolia project; today we look at how the team behind the SoundCloud streaming service moved to microservices.


Photo: Jonathan Gross, CC

Microservices have lately become a real hit, but the company made the move not for technical reasons but for productivity ones. Two teams worked on one of the projects: a monolithic Ruby on Rails application and a single-page JavaScript web application that acted as a client of the public API.
The teams were isolated from each other and even sat in different buildings. Development ran from idea write-ups to mockups, then visual designs, then code, and finally deployment after a short test. As a result, engineers and designers complained of overload, while product managers and partners complained that nothing was ever finished on time.

At that stage of the project, it was critically important to announce partnerships with several prominent companies, but that first required launching a closed beta. Right around then, the team decided to find out where the company's organic growth had led it.

To do that, they built a Value Stream Map, which helped visualize how development actually worked. It turned out that many third-party services and handoffs were involved: the specification was published in Drive, the cards lived on a Trello board, the code changes in Pivotal Tracker, and so on. Total lead time also grew because the second team usually waited for changes to accumulate in a branch and only then rolled them out to production.

All told, it took at least two months for a new feature to ship. To solve the problem, the team decided to adapt the release train technique: instead of waiting for several new features to accumulate, each one is simply deployed as soon as it is ready, every day. The main trouble in this whole scheme was the ping-pong between the front-end and back-end teams: of the 47 days a feature spent "in engineering," only 11 went to actual development, and the rest of the time was wasted on pointless waiting.

An experiment in which a back-end and a front-end developer built features as a pair meant that each engineer started spending more of their time on a specific feature. At the same time, the mandatory code review step (the pull request) was preserved: code still had to be reviewed before it could reach the master branch of the Rails application.

Next, the designer, the product manager, and the front-end developers began working together as well, and the development cycle shrank a little further. All this allowed the team to ship the first release of Next well ahead of the deadline.

Pairing employees of different specialties led to the "feature team" structure that SoundCloud uses to this day. Later, while working on the Ruby on Rails monolith, the team ran into questions about its code review practice. It decided to apply a similar approach: code still had to be seen by a second engineer, and with pair programming that review happened in real time.

An experiment with several pairs revealed problems that forced the team to start all over again. The monolith's codebase was so large and covered so many different concerns that no one could navigate all of it. The company even spawned a meme: "everything is fine until you have to dig into the monolith."

As a result, a lot had to be reconsidered. The backend was a black box that needed to be opened up. What came out was not a monolith but a platform of many components, each with its own owners and its own independent life cycle. On top of that, different modules carried different service-level expectations.
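To make that idea concrete, here is a minimal sketch of what a platform of explicitly owned components might look like when written down. Every component name, team name, and number below is invented for illustration; the article does not describe SoundCloud's actual tooling.

```ruby
# Hypothetical registry: each backend component gets an explicit owner
# and its own service-level expectation. All names and numbers are
# invented for this example.
COMPONENTS = {
  "playback"      => { owner: "listening-team",  availability_slo: 0.9999 },
  "search"        => { owner: "discovery-team",  availability_slo: 0.999  },
  "notifications" => { owner: "engagement-team", availability_slo: 0.99   },
}.freeze

# Look up whom to call when a component misbehaves.
def owner_of(component)
  COMPONENTS.fetch(component).fetch(:owner)
end

puts owner_of("search") # => discovery-team
```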

At the code level, that meant making sure a specific feature could be changed in relative isolation, without touching the code of other components. From the deployment point of view, it meant a feature had to be deployable on its own, which led to groups of servers each responsible for a single feature.
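As a rough illustration of that kind of isolation, the sketch below hides a hypothetical "stats" feature behind a narrow HTTP client. The service name, URL, and endpoint are assumptions made up for the example; the point is only that the rest of the application depends on this small interface, so the feature can change and be deployed on its own group of servers without anyone touching other components.

```ruby
require "net/http"
require "json"
require "uri"

# Minimal sketch: the rest of the application talks to the (hypothetical)
# stats feature only through this narrow client, never through its
# internal code.
class StatsClient
  def initialize(base_url = ENV.fetch("STATS_URL", "http://stats.internal:8080"))
    @base = URI(base_url)
  end

  # Fetch the play count for a track; the endpoint path is invented.
  def plays_for(track_id)
    response = Net::HTTP.get_response(@base.merge("/tracks/#{track_id}/plays"))
    raise "stats service returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)
    JSON.parse(response.body).fetch("plays")
  end
end

# Usage inside the monolith:
#   StatsClient.new.plays_for(42)  # => 1337
```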

The team decided to take a different route and build everything the monetization project needed as services isolated from the monolith. The project involved large-scale updates and a complete rethinking of the subscription model, yet two teams of two engineers each managed to finish it ahead of schedule. The experience was judged a success, and the approach was applied to all new development.
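For flavor, here is roughly what one such service might look like at its smallest. This is a sketch, not SoundCloud's actual code: Sinatra, the endpoints, and the field names are all assumptions for the example.

```ruby
# subscriptions_service.rb - a toy stand-in for one monetization service
# living outside the monolith. Run with: ruby subscriptions_service.rb
require "sinatra"
require "json"

SUBSCRIPTIONS = {} # in-memory store; a real service would use a database

# Create a subscription for a user.
post "/subscriptions" do
  payload = JSON.parse(request.body.read)
  id = SUBSCRIPTIONS.size + 1
  SUBSCRIPTIONS[id] = { "user_id" => payload["user_id"], "plan" => payload["plan"] }
  content_type :json
  status 201
  { "id" => id }.merge(SUBSCRIPTIONS[id]).to_json
end

# Read a subscription back.
get "/subscriptions/:id" do
  sub = SUBSCRIPTIONS[params[:id].to_i] or halt 404
  content_type :json
  { "id" => params[:id].to_i }.merge(sub).to_json
end
```

Because such a service owns its own data and life cycle, the small team behind it can deploy whenever it likes, independent of the monolith's release schedule.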

The new architecture cut the time needed to build new features to values that, while not ideal, were far more acceptable to the company and let it keep pace with competitors in a difficult music market.

One team is still responsible for the core entities, but the architecture is now more stable, which means fewer "fires." As a result, even these engineers can find time for the move to microservices.

Today SoundCloud still has the monolithic codebase, but its importance shrinks every day. It remains critical for many features, yet it no longer even talks to the Internet directly (thanks to a dedicated system in front of it). Perhaps it will never disappear entirely: many features are so small and stable that leaving them as they are may be cheaper, but within a year nothing truly important will remain there.
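The article does not detail that system, but the idea of taking the monolith off the public Internet can be sketched as a small routing layer: requests whose paths belong to extracted services go to those services, and everything else falls through to the monolith. All hostnames and path prefixes below are invented, and a production gateway would also handle non-GET methods, headers, timeouts, and retries.

```ruby
# config.ru - a toy gateway in front of the platform; run with: rackup
require "rack"
require "net/http"
require "uri"

# Invented route table: path prefixes owned by extracted services.
ROUTES = {
  "/subscriptions" => URI("http://subscriptions.internal:8080"),
  "/stats"         => URI("http://stats.internal:8080"),
}.freeze
MONOLITH = URI("http://monolith.internal:3000") # no longer Internet-facing

run lambda { |env|
  request = Rack::Request.new(env)
  backend = ROUTES.find { |prefix, _| request.path.start_with?(prefix) }&.last || MONOLITH
  target = backend.dup
  target.path = request.path
  target.query = request.query_string unless request.query_string.empty?
  response = Net::HTTP.get_response(target) # GET-only to keep the sketch short
  [response.code.to_i, { "content-type" => response["content-type"].to_s }, [response.body.to_s]]
}
```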


Source: https://habr.com/ru/post/266699/

