
The story of my startup: 500,000 users in 5 days on a hundred-dollar server

It seems that everyone in the startup world agrees that the first version of an application should be a minimum viable product (MVP), built without much concern for technical scalability. I have heard many times that the most important thing at this stage is to ship something that works, quickly. As long as the business model holds up as the customer base grows, everything is fine, and spending time and money on a system that can withstand a sudden influx of users is not worth it. All you need to worry about is validating your assumptions, assessing the market, and promoting the business. Scalability can wait. Unfortunately, blind faith in such clichés has often led to spectacular failures. Pokémon GO, and in particular the third-party apps built around it, are a reminder of this.



Jonathan Zarra, the developer of GoChat, will certainly not make this mistake again. He gained a million users in 5 days with a chat app for Pokémon GO fans. According to coverage at the time, he was talking to investors about monetizing and expanding the app. Right after that, GoChat collapsed, losing a lot of users and costing him serious money. A brilliant idea with a disastrous outcome. Update: a few days after writing this article I slightly edited it, since GoChat is still alive on Google Play and already has more than two million users there. Its iOS version is expected to return soon.

Zarra was in a tough spot: he had to pay for the servers needed to support a million active users. He had not expected to attract such an audience. He built the app in the best MVP tradition, postponing scalability concerns until later. In effect, he doomed the project from the start. When the architecture's problems surfaced in full, Zarra hired a programmer on Upwork to fix a long list of performance issues. He said server costs were around $4,000. It is 2016, so it is safe to assume this is not about buying physical hardware: the $4,000 is the monthly or annual cost of renting virtual servers and paying for traffic.
For practically my entire career I have designed and built web platforms for hundreds of millions of active users, and I can say that $4,000 for servers is far too much for a million chat users, even for an MVP. It suggests that the server side of the app is poorly designed. Building an economical yet scalable system for millions of monthly active users is not easy, but it is well within reach: you can choose a software stack that serves a large audience on inexpensive cloud servers. This is worth keeping in mind when selecting components for an MVP.


GoSnaps: 500,000 users in 5 days on a $100-per-month server


GoChat let Pokémon GO players chat with each other, something the official app does not allow. I built GoSnaps, also aimed at Pokémon GO fans. It is a mobile app that lets users share screenshots and images pinned to a location on the map: something like Instagram or Snapchat for Pokémon GO.

On the first day, GoSnaps gained 60,000 users. By the second day there were 160,000, and by the fifth day (when this article was written) there were half a million. By then, users had uploaded about 200,000 pictures. At any given moment, about 1,000 people are using the app simultaneously. I built an image recognition subsystem that automatically checks whether an uploaded image has anything to do with Pokémon GO, plus a service that resizes uploaded images. All of this runs on a single, fairly ordinary Google Cloud server that costs $100 a month. The images themselves are hosted in inexpensive Google Cloud Storage. We are talking about a hundred dollars a month, not thousands, and everything runs smoothly.

GoChat versus GoSnaps


Let us compare GoChat and GoSnaps. Both apps probably handle many requests per second to show chats or images within a particular area of the map. This is a geospatial search in a database (or search engine), performed either within a polygon on the map or around a point defined by latitude and longitude. GoSnaps uses a polygon, and a query runs every time a user moves the map. Such queries put a heavy load on the database, especially when combined with sorting or filtering. GoSnaps has to process hundreds of these searches per second, and GoChat probably does much the same under the hood.
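To make the viewport search concrete, here is a minimal TypeScript sketch (matching the article's Node.js stack) that turns the visible map rectangle into a MongoDB `$geoWithin` filter. It assumes each image document stores a GeoJSON point in a hypothetical `location` field; the type and function names are illustrative, not GoSnaps code. It only builds the query document, so it runs without a database.

```typescript
// Illustrative names only: Viewport, viewportToPolygon, buildViewportFilter
// and the `location` field are assumptions, not GoSnaps code.
type LngLat = [number, number]; // GeoJSON order: [longitude, latitude]

interface Viewport {
  southWest: LngLat;
  northEast: LngLat;
}

// Turns the visible map rectangle into a GeoJSON polygon: one closed ring,
// listed counter-clockwise, whose last point repeats the first.
function viewportToPolygon(v: Viewport): LngLat[][] {
  const [west, south] = v.southWest;
  const [east, north] = v.northEast;
  return [[
    [west, south],
    [east, south],
    [east, north],
    [west, north],
    [west, south],
  ]];
}

// Builds a MongoDB filter selecting documents whose `location` point lies
// inside the viewport polygon.
function buildViewportFilter(v: Viewport): Record<string, unknown> {
  return {
    location: {
      $geoWithin: {
        $geometry: { type: "Polygon", coordinates: viewportToPolygon(v) },
      },
    },
  };
}
```

With the official Node.js driver, the result would be passed to something like `collection.find(filter)`, ideally backed by a `2dsphere` index on `location`. The sketch ignores viewports that cross the antimeridian.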

A distinctive feature of GoChat is that every second it has to fetch many chat messages from the database and deliver them to users. The article about GoChat mentions 600 requests per second for the app as a whole, a mix of map requests and chat messages. Messages are small and can (or even should) be handled over plain sockets. But they arrive frequently and have to be distributed among many users in each chat. That is entirely manageable if the server side is designed correctly. In a poorly designed MVP, though, chat support can become a daunting task.
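For the chat side, the core server-side job is fan-out: every message in a room goes to everyone currently in that room. The following is a minimal in-memory sketch of that idea, not GoChat's actual design; `ChatHub` and its methods are hypothetical, and plain callbacks stand in for real socket connections.

```typescript
// Illustrative in-memory fan-out; ChatHub, ChatMessage and Listener are
// hypothetical names, and callbacks stand in for socket connections.
interface ChatMessage {
  room: string;
  author: string;
  text: string;
}

type Listener = (msg: ChatMessage) => void;

class ChatHub {
  // room id -> listeners currently in that room
  private rooms = new Map<string, Set<Listener>>();

  // Adds a listener to a room and returns a function that removes it again.
  join(room: string, listener: Listener): () => void {
    const existing = this.rooms.get(room);
    const members = existing ?? new Set<Listener>();
    if (!existing) this.rooms.set(room, members);
    members.add(listener);
    return () => {
      members.delete(listener);
      // Drop the room only if it still maps to this (now empty) set.
      if (members.size === 0 && this.rooms.get(room) === members) {
        this.rooms.delete(room);
      }
    };
  }

  // Sends the message to every listener in its room; returns the recipient count.
  publish(msg: ChatMessage): number {
    const members = this.rooms.get(msg.room);
    if (!members) return 0;
    members.forEach((listener) => listener(msg));
    return members.size;
  }
}
```

In production each callback would write to a socket, and with several server processes the rooms would need a shared pub/sub layer; that is beyond this sketch.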

GoSnaps, on the other hand, serves many images and records many likes every second. Images are kept permanently, since even old pictures stay relevant, whereas old GoChat chats are no longer needed. Because the image files are stored in Google Cloud Storage, the volume of image downloads does not worry me as a developer: Google Cloud handles it, and I trust Google's capacity. What does worry me is querying images by their location on the map.

GoSnaps has an image recognition subsystem that looks for patterns in user-uploaded images to check whether they are related to Pokémon GO. The same subsystem resizes the images and sends them to Cloud Storage. These operations are expensive in both CPU load and traffic. They are much heavier than distributing small chat messages, but they happen less often.
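The pixel work of resizing is left to an image library, but choosing the target dimensions is simple arithmetic. A minimal sketch, assuming a hypothetical rule that thumbnails must fit inside a square of `maxSide` pixels without upscaling; the function name and the rule are illustrative, not taken from GoSnaps.

```typescript
// Hypothetical rule: thumbnails must fit in a maxSide x maxSide box,
// keep their aspect ratio, and never be upscaled.
interface Size {
  width: number;
  height: number;
}

function fitWithin(src: Size, maxSide: number): Size {
  if (src.width <= 0 || src.height <= 0 || maxSide <= 0) {
    throw new RangeError("all dimensions must be positive");
  }
  // Scale factor from the longer side, capped at 1 so small images stay as-is.
  const scale = Math.min(1, maxSide / Math.max(src.width, src.height));
  return {
    width: Math.max(1, Math.round(src.width * scale)),
    height: Math.max(1, Math.round(src.height * scale)),
  };
}
```

For example, a 1080x1920 phone screenshot with `maxSide = 640` becomes 360x640, while a 300x200 image is left unchanged.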

All of this leads me to conclude that the two apps are very similar in terms of scaling difficulty. GoChat handles more small messages, while GoSnaps deals with larger images and performs more CPU-intensive server operations. The difficulty is about the same, but the two apps call for somewhat different designs.

How to create a scalable MVP in 24 hours


GoSnaps was built as an MVP, not as a polished business product. It was ready in 24 hours. I took a Node.js hackathon template and used MongoDB without any form of caching. Nothing else is involved: no Redis, no Varnish, no intricate Nginx setups. The iOS app was written in Objective-C, and some of the Apple Maps code was taken from our main app, Unboxd. So how did I make GoSnaps scalable? Simply by not being lazy and not blindly following the harmful MVP dogma.

Suppose I treated building the MVP purely as a race against time, aiming to release a working app as fast as possible without paying attention to the quality of the back end. Where would the images go? Into the database, of course: MongoDB. No extra configuration and next to no code. Very simple, in true MVP spirit. How would I fetch the most-liked images in a given area of the map? Just run an ordinary MongoDB query over the entire collection of stored pictures: one query against one dataset. MVP again. All of this would have killed my app and made it unusable.

Consider the query I would have to run to fetch those images from the database. It would look something like this: “find all images within map area [A, B, C, D], excluding those marked as invalid and those still being processed, sorted by number of likes, by whether they are related to Pokémon GO, and by recency.” On a small dataset such a query works fine. But under serious load, queries like this will bring the whole system down, even if you simplify them to just three conditions and sorts. Why? Because this is not what the database is designed for. Ideally a query should hit only one index at a time, which is impossible with a geospatial query like this one. With few users this design works, but if the app suddenly becomes a hit, it will kill it. That is exactly what happened to GoChat.
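For illustration, the query described above might look like this as MongoDB filter and sort documents. The field names (`location`, `status`, `likes`, `isPokemonGo`, `createdAt`) are hypothetical; the point is the shape: a geo filter, an exclusion filter, and three sort keys over one large collection.

```typescript
// Illustrative only: field names are assumptions. This is the single-collection
// "MVP" query: a geo filter, an exclusion filter and three sort keys at once.
type LngLat = [number, number]; // [longitude, latitude]

function naiveFeedQuery(ring: LngLat[]) {
  const filter = {
    // Images inside the visible map polygon...
    location: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ring] } } },
    // ...that are neither flagged as invalid nor still being processed.
    status: { $nin: ["invalid", "processing"] },
  };
  // Sorted by likes, then Pokémon GO relevance, then recency.
  const sort = { likes: -1, isPokemonGo: -1, createdAt: -1 };
  return { filter, sort };
}
```

Passed to the driver as `collection.find(filter).sort(sort)`, this is the kind of query that works on a small dataset but degrades under load.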

What did I do instead? After the resource-intensive image analysis and resizing, the processed images are uploaded to Google Cloud Storage, so neither my server nor my database has to serve image files to users. The database should take care of data, not pictures. That alone saves a lot on servers.
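A minimal sketch of what the database keeps once the image bytes live in Cloud Storage: metadata and a URL. It assumes publicly readable objects, which Google Cloud Storage serves at `https://storage.googleapis.com/BUCKET/OBJECT`; the record shape, the bucket name, and the `snaps/` object naming are hypothetical.

```typescript
// Hypothetical record shape: the database keeps metadata and a URL,
// never the image bytes themselves.
interface SnapRecord {
  id: string;
  location: { type: "Point"; coordinates: [number, number] }; // [lng, lat]
  imageUrl: string;
  likes: number;
  createdAt: Date;
}

// Public Cloud Storage objects are served at
// https://storage.googleapis.com/BUCKET/OBJECT; each path segment is URL-encoded.
function publicImageUrl(bucket: string, objectName: string): string {
  const path = objectName.split("/").map((segment) => encodeURIComponent(segment)).join("/");
  return `https://storage.googleapis.com/${bucket}/${path}`;
}

// Builds the document stored in MongoDB after the processed image has been
// uploaded under a hypothetical "snaps/<id>.jpg" object name.
function makeSnapRecord(id: string, lng: number, lat: number, bucket: string): SnapRecord {
  return {
    id,
    location: { type: "Point", coordinates: [lng, lat] },
    imageUrl: publicImageUrl(bucket, `snaps/${id}.jpg`),
    likes: 0,
    createdAt: new Date(),
  };
}
```

The client then downloads the image straight from Cloud Storage, so the app server only ever handles this small document.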

On the database side, I split the images into several sets: all images, the most liked, the newest, the newest ones related to Pokémon GO, and so on. When users add new images, like them, or flag them as inappropriate, the code checks which of these sets each image belongs to and updates them accordingly.
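The write-side bookkeeping described above can be sketched as a pure function that decides which prepared sets an image belongs to, plus a diff that tells the caller where to insert or delete after an upload, like, or flag. The set names, the `ImageState` fields, and the like threshold are all assumptions for illustration; the set of all images is left out because every image stays in it.

```typescript
// Hypothetical set names, fields and threshold, for illustration only.
type SetName = "newest" | "topLiked" | "newestPokemon";

interface ImageState {
  likes: number;
  isPokemonGo: boolean; // outcome of the image-recognition step
  flagged: boolean;     // marked as inappropriate by users
  processing: boolean;  // analysis/resizing not finished yet
}

const TOP_LIKED_MIN_LIKES = 10; // assumed cut-off for the "most liked" set

// Which prepared sets should contain an image in this state.
function setsFor(img: ImageState): SetName[] {
  if (img.flagged || img.processing) return []; // hidden from every feed
  const sets: SetName[] = ["newest"];
  if (img.isPokemonGo) sets.push("newestPokemon");
  if (img.likes >= TOP_LIKED_MIN_LIKES) sets.push("topLiked");
  return sets;
}

// After an event (upload, like, flag), compare memberships before and after
// so the caller knows which set collections to insert into or delete from.
function membershipDiff(before: ImageState, after: ImageState): { add: SetName[]; remove: SetName[] } {
  const oldSets = setsFor(before);
  const newSets = setsFor(after);
  return {
    add: newSets.filter((s) => oldSets.indexOf(s) === -1),
    remove: oldSets.filter((s) => newSets.indexOf(s) === -1),
  };
}
```

Keeping this decision in one pure function makes every write path (upload, like, flag) update the sets the same way.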

With this approach, queries run against prepared datasets instead of making complex calls against one huge pile of unstructured records. It is simply a logical partitioning of the data into several simple blocks. Nothing complicated, but it allowed me to run queries on geospatial coordinates with a single sort instead of the complex query described above. Put simply, this approach makes data retrieval as simple as possible.
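On the read side, each prepared set can then be queried with just a geo filter and one sort key. A minimal sketch, assuming each set lives in its own collection with a hypothetical `snaps_<set>` name and a `2dsphere` index on `location`; it builds the query parts rather than running them.

```typescript
// Hypothetical: each prepared set lives in its own collection ("snaps_<set>")
// with a 2dsphere index on `location`; names are illustrative.
type LngLat = [number, number]; // [longitude, latitude]
type FeedSet = "newest" | "topLiked" | "newestPokemon";

// Exactly one sort key per set.
const SORT_FIELD: Record<FeedSet, string> = {
  newest: "createdAt",
  topLiked: "likes",
  newestPokemon: "createdAt",
};

// Builds the parts of a feed query: one geo filter, one sort, one limit.
function feedQuery(set: FeedSet, ring: LngLat[], limit = 50) {
  return {
    collection: `snaps_${set}`,
    filter: {
      location: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ring] } } },
    },
    sort: { [SORT_FIELD[set]]: -1 },
    limit,
  };
}
```

With the Node.js driver, these parts would map onto `db.collection(q.collection).find(q.filter).sort(q.sort).limit(q.limit)`.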

How much time did I spend on these improvements? Two to three hours, no more. Why did I do it from the start? Because that is how I am used to working. I assume my app is going to succeed. I could not live with my app becoming popular and then collapsing under load just because it was poorly designed. I built minimum viable scaling principles into the app. That is the difference between the joy of success and despair, and in my opinion it should be part of the MVP philosophy.

Choosing the right tools for an MVP


If I had built GoSnaps on a slower runtime or a bloated framework, I would have needed more servers. Had I used something like PHP with Symfony, Python with Django, or Ruby on Rails, I would be spending my days speeding up slow components or adding servers. Believe me, I have been there. These languages and frameworks are great for many scenarios, but not for an MVP on a small server budget. This is mainly due to the many layers of code these stacks typically place between the program and the database, and to unnecessarily bloated framework features. All of this puts too much load on the server. Let me give an example of how much this really matters.

As I said, the GoSnaps back end is built on Node.js, a generally fast and efficient platform. I used Mongoose as an ORM to simplify working with MongoDB. I do not consider myself a Mongoose expert, and I know the library has a huge code base, so I had mentally put a big question mark over it. But this was an MVP. Recently, 4 Node.js processes on our server were each using about 90% of the CPU with roughly a thousand simultaneous users, which is unacceptable to me. I figured Mongoose was most likely doing something expensive with the data. As it turned out, all I needed was to enable Mongoose's lean() option to get plain JavaScript objects instead of full Mongoose documents. After this change, the Node.js processes dropped to about 5-10% CPU each. Simple fixes based on knowing what your code actually does are very important; in my case, this one cut the load by about 90%. Now imagine a really heavy library like Symfony with Doctrine. Such a colossus would need a couple of multi-core servers just to run its own code, even though the database is supposed to be the bottleneck, not the application code.

Choosing an economical, fast runtime matters for scalability, unless server costs are no concern for your project. Choosing a language with plenty of useful libraries matters even more, since an MVP usually has to be built as fast as possible. Node.js, Scala, and Go satisfy both conditions: they offer high performance and lots of libraries. Languages like PHP or Java are not necessarily slow in themselves, but they are usually used with large frameworks and libraries that make applications heavy. They are good for clean object-oriented development and well-tested code, but not for building applications that need to scale quickly and cheaply. I do not want to start a holy war here, so let me stress that this is my subjective opinion, not backed by a deep and complete analysis. For example, I love Erlang, but I would never use it for an MVP, so I consider any debate on this point pointless.

My previous startup: Cloud Games


Several years ago I co-founded Cloud Games, a platform for publishing HTML5 games. At the start, we were a B2C gaming site aimed at the MENA region. We worked hard to attract an audience and within a few months reached a million monthly active users (MAU). At the time, I used PHP, Symfony2, Doctrine, and MongoDB in a very simple, economical configuration. I had previously worked at Spil Games, which had 200 million MAU; it used PHP at the time and later switched to Erlang. When Cloud Games reached about 100,000 MAU, our servers became overloaded. The culprit was Doctrine together with MongoDB. My MongoDB configuration, indexes, and queries were fine, but the servers could barely keep up with executing the code. I used PHP's APC cache and so on, without much success.

Since cloudgames.com was a fairly static site, I was able to port this MVP to Node.js with Redis in just a few days, with similar functionality on a different stack. Server load immediately dropped by about 95%. Admittedly, this was due to getting rid of heavy PHP libraries rather than to the choice of runtime or language. But a minimal Node.js setup does far more out of the box than a minimal PHP setup, especially since both MongoDB and the project's front end are 100% JavaScript, just like Node.js. PHP without frameworks and libraries is just one programming language among many.

We needed such a lightweight setup because we were a self-funded, early-stage startup. Today Cloud Games is doing well and is still built on Node.js. We could not have succeeded with technologies that demand large investments, given that Cloud Games went through some tough times as a startup. Designing a cost-effective, scalable architecture was one of the key conditions for its success.

Results: MVP and scalability can coexist


If your app has a chance of exponential growth due to a burst of interest or media coverage, make scalability part of your MVP strategy. Minimum viable product and scalability can coexist. There is nothing sadder than building a successful app and watching it fail because of technical problems. Pokémon GO itself had plenty of problems, but the game was so unique and powerful that it hardly mattered. Small startups cannot afford that luxury. Timing is everything. A million GoChat users and half a million GoSnaps users would probably agree with me.

Source: https://habr.com/ru/post/318048/

