
High-load application architecture. Scaling distributed systems. Part one

Some time ago Alexey Rybak, deputy head of Badoo's Moscow development office, recorded an episode of the "IT Compote" podcast together with its hosts, titled "Architecture of high-load applications. Scaling distributed systems."

We have now transcribed the podcast, edited it into a readable form, and split it into two parts.

What they talked about in the first part:

- web servers, caching, monitoring, etc.;
- pitfalls when scaling a project;
- scaling databases, and how to do sharding.

Hosts: Hello everyone, you are listening to the 45th episode of the "IT Compote" podcast, and your hosts are Anton Kopylov and Anton Sergeyev.
Today we decided to talk about back ends and web development, more specifically about the architecture of high-load applications and how to scale distributed systems. Our guest, Alexey Rybak, will help us with this. Alexey, hello!

A.R.: Hi!

Hosts: The "My Circle" site says that Alexey is the head of platform development and deputy head of Badoo's Moscow office. Alexey has long been building complex distributed high-load systems using a variety of technologies, including the Pinba monitoring server and other things. Maybe there is something I don't know; Alexey will add to or correct me. In addition, our guest actively speaks at conferences and sits on the organizing committee of HighLoad, a large and influential conference, as well as PHPConf and probably others.

A.R.: I would like to make a small correction. Pinba was made by my colleague Anton Dovgal, and the first version by Andrei Nigmatulin; I mostly suggested ideas and helped design it.
And yes, I really am the deputy head of development at Badoo, and I mainly deal with platform projects and supervise large open-source projects. In general, everything I talk about is done not so much by me as by our guys; we have a fairly large staff of engineers, and many of us have been working together since the Mamba days. So to say that I build such a gigantic system single-handedly would of course be wrong: I am mostly involved in administrative work, and recently I have been leading several development teams.

Hosts: I see. But we were not going to, so to speak, elevate you above other developers and say what a great fellow you are at Badoo. We still have questions about the team; our listeners asked them in the comments.

Let me now outline a short plan for our listeners. Alexey will tell us briefly about himself: how he started, what he has done, where he has ended up, and so on. We'll talk about what Badoo is, in case someone doesn't know the project. We will touch on horizontal scaling and discuss how to scale in general, what technologies exist and what the problems are, looking at the whole thing through the experience of a large service. And we have a very interesting topic today: all sorts of asynchronous tasks in high-load systems (job queues, message queues).

Alexey, you have the floor. Tell us briefly how you came to the world of development, what you do now, and what your plans are for the future.

A.R.: All right. In short, I was not going to become a programmer when I was in school, and I was hardly involved in programming at all. In grades 9-10 I attended university courses on Fortran, but instead of going to those classes we would often go off somewhere to drink beer or wine. So programming didn't really happen for me; I entered the Faculty of Physics at Moscow State University and wanted to study physics. But it turned out there was little point in doing physics in the 90s, all my friends left, and I decided to go into web development. The first thing I did was not really web programming: I was a webmaster. Besides writing some Perl scripts and automating things, I drew banners, cut things up in Photoshop, and so on. Then I worked at different companies, and I was lucky: in 2004 I joined Mamba. This is a very big project that instantly became popular in Russian-speaking countries.

Hosts: Everyone probably knows this dating site, the first big well-known social network. At least, I remember it since the dial-up days of the Internet.

Hosts: Yes, I remember it was very popular to run "affiliate programs" with Mamba, where you attached your domain to the system and got a commission from your users.

A.R.: Mamba did indeed grow on the back of large and small partners. It was a giant step forward and a turning point in my career. It was a very small but very professional team. I partly had to learn from my own mistakes there: we made a huge number of errors whose consequences only showed up years later. But the project survived and is still alive, although most likely everything has been redone by now. Since 2006, or the end of 2005, we have not been involved in that project; it has its own team now, and we started working on the Badoo project.

At Badoo there was a very small team from the very beginning, literally a dozen engineers, including remote employees, system administrators, everyone. Since then the company has grown. I was responsible for all server-side back-end development except the C daemons, that is, for shipping all the main features, essentially for the development of the site. There was really only one "techno-manager": the technical director, who also headed the admins and the C programmers.

Hosts: That's what I wanted to ask about. You just mentioned that part of your back end is written in C. Do you have a lot of C code in your application?

A.R.: Actually, there is quite a lot of C code, but if I may, I will explain a little later why we have such a technology stack.

Hosts: Good. Alexey, so it turns out you already had solid baggage: experience in developing dating services and some ideas about what does and doesn't need to be done there. And that is how you came to Badoo?

A.R.: Yes. We have grown a lot over the past few years, and so I had to step away from hands-on development. But the experience we have accumulated here is quite interesting, and we are converting it into articles and talks. We even have a seminar, my seminar on developing large projects. So this topic is very interesting to me, and I will be happy to talk about what we have learned, what methods we use, and so on.

Regarding the technology stack: we use Linux and a lot of open-source software. Even if you do not develop anything in C or C++ yourself, it often happens that something mysterious goes wrong deep down, and you do not know what the matter is. You need to have the competence within the company to take the source, open it, read it, understand it, and fix it. Starting from a certain project scale and a certain load, such competence is a must. Say you have good system administrators; good system administrators, as a rule, are able to read C code. But if you have C programmers on staff who can read and fix it, that is wonderful. So one of the areas our engineers cover is patching and developing various kinds of open-source software.

Hosts: Uh-huh. So basically, as far as I know, Badoo is written in good old PHP?

A.R.: Yes, it is mainly the PHP language, and the database we use is MySQL, that is, seemingly the simplest technologies. Nevertheless, everything can be done with them, and done well.

Hosts: Yes. And how large is your daily active audience at the moment?

A.R.: Good question. I probably don't remember the exact figures; I could be off by tens of percent. Are you interested in daily or monthly? I think the daily audience is in the region of 10 million. I am talking about authorized users. And if we count those who just come occasionally and look at some profiles, I think it's a full 20 million.

Hosts: And you have a rather fragmented application by now. I mean that it is actively represented on mobile platforms: phones, tablets, the web version... You are not on the Xbox or PlayStation consoles, are you?

A.R.: That is an interesting question. The fact is that at one time development really was done in such a way that it would work, roughly speaking, on a TV, where JavaScript, if present at all, is somehow crippled, and so on. As for what the applications are: Badoo is a hybrid of a social network and a dating site. But we have separate applications that in one way or another just repeat Badoo's functionality, and they can be completely detached. These can be applications on Facebook, or phone applications distributed separately through their own channels, not under the Badoo brand. There is the main Badoo site, and there are applications that are just "voting": attractive or not, good or not, things like that. There really are a lot of applications, and I am afraid there are already dozens of them. But in any case, the back end for all of them is PHP, and the database, if one is used, is MySQL.

Hosts: And that is how a project always begins. If the project turns out to be really successful and interesting to users, the audience begins to grow rapidly. Accordingly, you need to support this whole thing somehow so that the service does not fall over and responds adequately. And here comes the rather banal thing that everyone is talking about these days: horizontal scaling. There are different approaches and technologies here: Amazon or not Amazon, clouds, physical hosting; people use anything: PHP, Python, Ruby, Java, even Erlang (maybe someone out there uses it, God forbid). How did you approach this, what do you use, and why these technologies?

A.R.: It is difficult for me to answer in the most general form, so I will talk about what we use. It seems to me that the "zoo of technologies" in the world is such that many, having chosen this or that technology, are sooner or later forced to reinvent the wheel. But since the community around any given language is large enough, that wheel eventually becomes the community's wheel entirely, and one way or another, whichever language you choose, the scaling problem will most likely get solved.

We learned to scale the application server itself quite easily. To scale the application server, all you need is to keep the application's own state as small as possible, so that request processing can easily move from one application server to another, and to keep the CPU load as low as possible. Scripting languages are not particularly strong at the latter.

The problems we solved here were mainly related not to scaling but to reliability. Since 2005, as far as I remember, we have been patching PHP. Eventually that work became the PHP FastCGI Process Manager, or PHP-FPM, and entered the PHP core in 2008 or 2009. It really is our development: Andrei Nigmatulin did all of it initially, and it was done not so much for scalability as for operational convenience, so that PHP could be restarted gracefully and so on.

Scaling the application servers themselves is a fairly simple thing; you just need to distribute the traffic correctly. The difficulties appear when scaling databases: how do you scale the databases themselves horizontally? With the cache, too, everything is fairly simple, because a cache is easy to spread across servers, and if you add new servers it is fine for this spread-out data to be completely reshuffled from one server to another. If your system cannot survive such a reshuffling of cached data, then most likely something is simply wrong in your architecture, and adding servers or redistributing data among them makes no sense.
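A minimal sketch of such cache spreading, assuming a memcached-style pool (the server names and key scheme here are invented for illustration, not Badoo's actual setup):

```python
import hashlib

# Hypothetical pool of cache servers; names are illustrative.
CACHE_SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]

def cache_server_for(key: str) -> str:
    """Pick the cache server responsible for a key.

    A simple hash-mod scheme: deterministic, so every application
    server agrees on where a key lives. If the server list changes,
    many keys land on different servers and the cache simply
    re-warms, which, as noted above, a healthy architecture must
    tolerate for caches (but not for databases).
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return CACHE_SERVERS[int(digest, 16) % len(CACHE_SERVERS)]
```

Adding a fourth server changes `len(CACHE_SERVERS)` and therefore remaps most keys; for a cache that only costs a temporary drop in hit rate.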

With databases, that does not work. There are several methods of scaling them, but all of them, one way or another, must be built around operational usability. At the moment we have, I don't remember exactly and could be mistaken, about 500 database servers. It is a fairly large installation, it lives in several data centers, and it is managed, in fact, by one person. We have one person on databases, and we did everything to make this optimally comfortable for him.

Hosts: And that one person manages it all with the help of automation: provisioning new nodes, backups, and so on?

A.R.: Yes.

Hosts: Do you have your own tools for this, or do you use any existing solutions?

A.R.: The thing is, there probably are some solutions now that would be worth considering. But such research is expensive; we designed everything in 2005, and at that time there was nothing like that. So it is now more profitable for us to work within the framework of what we already have. That is, I would not recommend that people invent everything from scratch, but we are working within technologies that were laid down back in 2005. And I must say we laid them down pretty well: we moved from one data center to two, and if there is a third data center we will calmly continue to scale further; most likely this stack will not change. The point is this. One of the main mistakes people make when they first start scaling databases is the way they spread data across servers. I call this approach deterministic: by keys, that is, roughly speaking, the simplest example is some function of the key value.

Hosts: Uh-huh.

A.R.: The remainder after division, the first letter of the login: there are all sorts of ways, and none of them work operationally. They are simple and beautiful on paper, but they all fail in support. Because as soon as some node crashes and, instead of ten servers, you need to make do with only eight for a while, you have to temporarily switch off two nodes and move their data somewhere, and then a complete ambush appears, because the formula breaks down. Then the people who actually have to operate the system realize the method needs to be modified somehow and come up with all sorts of tricks, but the result is only semi-deterministic.

The most convenient way is the so-called virtual buckets, virtual baskets, virtual "shards". The point is that an identifier is still mapped to some number, but this number is virtual, and there are many of them. These virtual numbers, or virtual shards, are then mapped by a special configuration file onto physical tables or servers. This solves two problems at once. On the one hand, data routing (where to load data from, where to save it) remains deterministic, in the sense that the function is still easy to compute. On the other hand, if a node fails, the administrator only needs to change the configuration file, restore some data from a backup if there is one, and push it all to production. Almost all projects stop there, but this still does not solve the problem of several data centers and the connectivity problem.
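The virtual-bucket scheme described above can be sketched like this (bucket count, server names, and the map layout are invented for illustration):

```python
import hashlib

N_BUCKETS = 1024  # far more virtual shards than physical servers

# Plays the role of the configuration file: which physical server
# owns each virtual bucket. Server names are hypothetical; in real
# life an administrator edits this map when a node is added or fails.
BUCKET_MAP = {b: "db1" for b in range(0, 512)}
BUCKET_MAP.update({b: "db2" for b in range(512, 1024)})

def bucket_for(user_id: int) -> int:
    """Deterministic half: user id -> virtual bucket. Never changes."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % N_BUCKETS

def server_for(user_id: int) -> str:
    """Configurable half: virtual bucket -> physical server."""
    return BUCKET_MAP[bucket_for(user_id)]
```

To drain a failing node, the administrator copies the data of its buckets elsewhere and reassigns just those entries in `BUCKET_MAP`; the hash function itself never changes, so no stored data is remapped implicitly.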

Say a user moved from one country to another. Since we still have a deterministic part, we can only ever map that user onto one particular shard: conditionally, yes, because the function decides for us. We cannot simply declare that this particular user is moving from one shard to another. And we do have this problem, because we sometimes migrate users' data between data centers so that everything works faster for them. To do that, we had to implement the most complex kind of mapping: we assign each user to a particular shard through, roughly speaking, a separate storage, a very fast storage, and then we can move an individual user between shards.

I have thus probably described the whole range of solutions tied, one way or another, to horizontal database scaling: the simple deterministic one, the one using virtual buckets, and the one where each user is pinned to his own shard.
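The third scheme, pinning each user to a shard through separate fast storage, can be sketched as a small directory service. This is a sketch under the assumption of a dict-backed store; the class and shard names are hypothetical, and in production the store would be a very fast replicated key-value storage as described above:

```python
class ShardDirectory:
    """Explicit per-user shard mapping kept in a separate fast store.

    Unlike a pure formula, this lets a single user be moved between
    shards (or data centers) by rewriting one record.
    """

    def __init__(self, default_shard: str) -> None:
        self._default = default_shard
        self._location: dict[int, str] = {}  # user_id -> shard name

    def shard_for(self, user_id: int) -> str:
        return self._location.get(user_id, self._default)

    def migrate(self, user_id: int, new_shard: str) -> None:
        # In reality the user's data is copied first; only then is
        # the pointer flipped so that traffic follows the data.
        self._location[user_id] = new_shard
```

Moving millions of users of one country to another data center then amounts to copying their data and updating their directory entries, with no change to any hash function.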

Hosts: By the way, a listener named Andrei asks us about this: "Alexey, I wonder how your data storage architecture, the cache and the database, is arranged?" The database part is clear. But how do you determine the need to move a user, that is, why might a given user need to move to another shard? How do you decide?

A.R.: I will give you a very simple case; there is nothing secret here. There are a lot of examples, but this one is glaring and a little ridiculous. We have two data centers, one in America, the other in Europe. When we launched the American data center, we thought about whom we wanted to land there, and it seemed obvious: Americans, Canadians, and people of Latin America, in general everyone who lives in North and South America. Fine! We lived like that for some time and then suddenly realized that Asia was very slow. We ran a series of studies and realized that if we landed certain Asian countries in America rather than in Europe, it would be great. Now imagine: you have a country with several million active users already, and exactly those users need to be moved from one data center to another. That was the "epic" problem; we solved it recently.

Hosts: That's great, and now you have the system itself plus automated tools for this. Through this mapping layer you can flexibly and easily redistribute users whenever you need to. Really great.

A.R.: We laid that in from the start, yes.

[The remainder of the conversation is garbled in this copy; only fragments survive. Judging by them, the discussion went on to cover per-URL response-time monitoring with Pinba, statement-based MySQL replication and backups with mysqldump, serving PHP through nginx and PHP-FPM rather than Apache, running in their own data centers versus Amazon S3/EC2, CDNs and connectivity research, using memcached (rather than Redis) in production, and attitudes toward ORMs.]



Source: https://habr.com/ru/post/185220/

