
I am sure that many Habr readers have drooled over descriptions of the technologies and architectures used at young, dynamic and, most importantly, fast-growing companies. Unfortunately, relatively few of our compatriots work at such companies abroad, and those who work at their domestic counterparts are bound by employment contracts or ordinary NDAs that prohibit disclosing the most interesting details. Nevertheless, I personally know many specialists, especially those interested in high-load systems, who do not know where to get this information first-hand.
This problem can be solved in only one way: give the floor to a head of development, or to anyone else who is senior enough and knows the engineering side, and then pull every detail out of them. That is roughly what InfoQ did when it interviewed one of Twitter's engineers, Evan Weaver, about why the company, after developing on Rails for so long, decided to switch to other technologies and what the consequences were.
In this article I will rely entirely on Evan's words to explain the essence of the migration and the benefits gained from the JVM: above all performance and scalability. But as we will see a little later, the decision was also driven by the desire to isolate individual services and to change the product's overall architecture slightly.
The story begins last year, when Twitter announced changes to its backend architecture (the message queue) and its intention to rewrite its storage layer in Scala; in the spring, work began on rewriting the entire search engine. As part of these changes, the MySQL database underlying search was replaced by Lucene. Finally, quite recently, the development team announced the replacement of Ruby on Rails in search: in its place is a Java server they call Blender. The result was a threefold reduction in search query latency.
Overview
One of the first conclusions to draw from Twitter's overall architecture is that many of its developers' decisions are thoroughly pragmatic. For example, the backend uses both MySQL and the distributed database Cassandra. Fewer people know about Twitter's own project, Gizzard, a framework for building distributed data stores on top of MySQL databases. According to Weaver, "it is mainly used for highly structured data (SLA-data) because it is relatively inflexible."
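To make the idea of a sharding layer over several MySQL databases more concrete, here is a minimal sketch of one common approach, range-based routing through a forwarding table. It is not Gizzard's actual API: the class `ShardRouter`, the shard names, and the modulo "hash" are all hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of range-based shard routing; not Gizzard's real API. */
final class ShardRouter {
    // Maps the lower bound of each hash range to the name of the shard that owns it.
    private final TreeMap<Long, String> forwardingTable = new TreeMap<>();

    /** Registers a shard as the owner of all hashes from lowerBound up to the next bound. */
    void addRange(long lowerBound, String shardName) {
        forwardingTable.put(lowerBound, shardName);
    }

    /** Placeholder hash (an assumption): folds the key into the range 0..999. */
    static long hash(long key) {
        return Math.floorMod(key, 1000L);
    }

    /** Returns the shard whose range contains the key's hash. */
    String shardFor(long key) {
        Map.Entry<Long, String> entry = forwardingTable.floorEntry(hash(key));
        if (entry == null) {
            throw new IllegalStateException("no shard covers hash " + hash(key));
        }
        return entry.getValue();
    }
}
```

With ranges registered at 0 and 500, a key that hashes to 42 is routed to the first shard and a key that hashes to 750 to the second. Moving data between databases then only requires changing the table, not the callers.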
All real-time data is served from either Gizzard/MySQL or Cassandra. The architecture also makes heavy use of the Hadoop distributed computing stack for offline computation, while a system built on the key-value store Redis and the aforementioned Gizzard is used online.
The front end and back end communicate through Thrift, originally developed at Facebook (an interface description language, used here for RPC), and through the JSON REST API used by all "official" Twitter clients and the new version of the site.
Languages
The same pragmatic approach shows in the company's choice of programming languages. The first-tier languages are JavaScript, Ruby, Scala and Java. Some existing code in C is still maintained, but new services are not written in it. In general, Evan says that developers with a Ruby background tend to move to Scala, while those who previously worked in C/C++ tend to use Java.
As for the search team, there the tools dictate the choice of language: since Lucene is written in Java, the programmers work primarily in that language.
To let developers choose the language that suits them best, Twitter has invested a lot of effort and money in internal frameworks that encapsulate concerns common to all teams.
Finagle (written in Scala), for example, is a library for creating asynchronous RPC servers and clients in Java, Scala, or any other language on the JVM platform.
While the whole back end is gradually moving to the JVM, the front-end (client) code increasingly relies on JavaScript running in the browser, which reduces the share of Ruby.
Search: From Ruby to Java
The transition from Ruby to the Java-based Blender server was accomplished in two steps. The first was to replace the existing MySQL back end with a real-time inverted index based on Lucene, named Earlybird. Putting it into service doubled memory efficiency and made it easier to add various search filters, which helps to support the rapidly growing demand for search. Twitter engineers have described how this part of the system works in more detail.
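For readers who have not met the term, an inverted index maps each word to the list of documents that contain it, so a search is a lookup rather than a scan. The sketch below shows only that basic idea. It is a toy under my own assumptions, not Lucene's or Earlybird's implementation: the class `TinyInvertedIndex`, the naive tokenizer, and the newest-first ordering are all illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy in-memory inverted index; illustrative only, not how Earlybird is built. */
final class TinyInvertedIndex {
    // term -> ids of the tweets containing it, in insertion order
    private final Map<String, List<Long>> postings = new HashMap<>();

    /** Indexes a tweet as soon as it arrives, so it is searchable immediately. */
    void add(long tweetId, String text) {
        // Naive tokenizer (an assumption): lowercase, split on non-word characters.
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Long> ids = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Skip the id if this tweet already recorded the same term.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != tweetId) {
                ids.add(tweetId);
            }
        }
    }

    /** Returns the ids of tweets containing the term, most recently added first. */
    List<Long> search(String term) {
        List<Long> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return Collections.emptyList();
        }
        List<Long> newestFirst = new ArrayList<>(ids);
        Collections.reverse(newestFirst);
        return newestFirst;
    }
}
```

A lookup touches only the postings list for the requested term, which is why this structure suits search better than scanning rows in a relational table.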
The second step addressed poor front-end performance: the development team built a Java server, Blender. Blender is a Thrift and HTTP service built on Netty, a scalable NIO (New I/O) client/server library written in Java that makes it possible to develop a variety of protocol servers. Netty lets the company build fully asynchronous aggregation services that collect results from several back-end services (such as the real-time index, top tweets, and geo data).
This avoids long I/O queues, uses the CPU more efficiently, and lets current requests be processed faster. In addition, many back-end requests can be processed in parallel, which significantly reduces latency.
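The aggregation pattern described here, one incoming query fanned out to several back ends whose answers are merged once they all arrive, can be sketched with the standard `CompletableFuture` class. This is only an illustration under my own assumptions: the real Blender is built on Netty, and the class `BlenderSketch` and the back-end functions passed to it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical scatter-gather sketch; the real Blender is built on Netty. */
final class BlenderSketch {
    /** Sends the query to every backend at once and merges the results when all are done. */
    static CompletableFuture<List<String>> search(
            String query,
            List<Function<String, CompletableFuture<List<String>>>> backends) {
        List<CompletableFuture<List<String>>> calls = new ArrayList<>();
        for (Function<String, CompletableFuture<List<String>>> backend : backends) {
            // Start every call before waiting on any of them.
            calls.add(backend.apply(query));
        }
        return CompletableFuture
            .allOf(calls.toArray(new CompletableFuture<?>[0]))
            .thenApply(done -> {
                List<String> merged = new ArrayList<>();
                for (CompletableFuture<List<String>> call : calls) {
                    // join() returns immediately here because allOf has completed.
                    merged.addAll(call.join());
                }
                return merged;
            });
    }
}
```

Because the calls overlap, total latency is close to that of the slowest back end rather than the sum of all of them, and no thread sits blocked while the responses are in flight.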
Twitter's search engine is one of the busiest in the world, processing about a billion queries per day. The results of the move to Blender were impressive: 95th-percentile latency fell roughly threefold, from 800 ms to 250 ms, and CPU load on the front-end servers was halved. The system can now serve 10 times more requests per machine than before these technologies were introduced. This in turn reduces the cost of the infrastructure needed to keep the whole architecture performing well.
Although performance and scalability were important reasons to expand the use of the JVM, Evan says that encapsulation was the key motive for this transition, since Twitter's existing architecture is, in general, coping well. The move to the JVM was largely driven by the fact that developers are more productive on it and, as a result, so is the product as a whole.
The combination of Ruby on Rails and MySQL has been very popular among Western startups in recent years. The advantages are clear: developers can quickly try out new, simple ideas and see how they fare in a real market, where supply is dictated by user demand. However, the drawbacks of this combination are just as obvious: scalability and performance problems, and some immaturity of libraries and tools, primarily on the RoR side.
Thanks to Habr user ivaxer for help with precise wording.
Source: InfoQ, via RRW.