Architecture of large projects: Facebook
At Facebook's scale, traditional approaches to running a website either do not work at all or, at a minimum, do not deliver adequate performance. The project's enormous traffic forced Facebook engineers to keep the site fast for nearly half a billion active users. This article describes the software and techniques that make this possible.
Challenges
- Facebook serves about 570 billion page views per month (according to Google Ad Planner).
- More photos are uploaded to Facebook than to all other photo-sharing services combined (including sites like Flickr).
- More than three billion photos are uploaded to Facebook every month.
- Facebook servers serve about 1.2 million photos per second (not counting photos served by Facebook's content delivery network).
- Each month, users share more than 25 billion pieces of content (status updates, comments, and so on).
- As of May 2010, Facebook had more than 30,000 servers online.
Facebook software
In some ways, Facebook still runs on the LAMP stack, but the size of the project required adding many other components and services, as well as changes to the existing ones. For example:
- Facebook still uses PHP, but the scripts are compiled to native machine code before execution, which speeds them up.
- Facebook servers run Linux, but the Linux code has been optimized (mostly in the networking part).
- Facebook uses MySQL, but mostly as a key-value store. Joins and business logic are moved to the script level on the web servers, since it is much easier to optimize there (on the other side of the Memcached layer).
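As a rough illustration of moving a join out of the database, the sketch below performs it in application code using only lookups by key. The two dicts stand in for MySQL tables, and the table layout and the `friends_of` helper are assumptions made for this example, not Facebook's schema.

```python
# Two plain dicts stand in for MySQL tables that are only read by primary key.
# The data and the layout are hypothetical.
users = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
friend_ids = {1: [2], 2: [1]}


def friends_of(user_id):
    """Join friendships with users in application code, one key lookup at a time."""
    return [users[fid]["name"] for fid in friend_ids.get(user_id, []) if fid in users]


names = friends_of(1)
```

Because every access is a lookup by key, each one can be served from a cache before it ever reaches the database.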
The project also includes systems written from scratch. For example, Haystack is a highly scalable object store used for photos, and Scribe is a logging system that works at Facebook scale.
So, first things first.
Memcached
Memcached is one of the best-known pieces of software on the Internet. It is a distributed memory caching system that Facebook uses as a caching layer between the web servers and MySQL (since database access is relatively slow). Over the years, Facebook has made a huge number of modifications to Memcached and the software around it (for example, optimizing the network stack).
Facebook runs thousands of Memcached servers holding tens of terabytes of cached data at any given time. It is probably the world's largest Memcached installation.
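A caching layer between web servers and a database is commonly used in a "cache-aside" fashion: read from the cache first, fall back to the database on a miss, and invalidate the cached copy on writes. The sketch below shows that pattern with plain dicts standing in for Memcached and MySQL. The class and its names are assumptions made for illustration and do not describe Facebook's actual code.

```python
class CacheAsideStore:
    """Read-through, invalidate-on-write cache in front of a slow key-value store."""

    def __init__(self, database):
        self.database = database      # stands in for MySQL used as a key-value store
        self.cache = {}               # stands in for a Memcached cluster
        self.database_reads = 0       # counts how often the slow path is taken

    def get(self, key):
        if key in self.cache:         # cache hit: the database is not touched
            return self.cache[key]
        self.database_reads += 1      # cache miss: fall back to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value   # populate the cache for later readers
        return value

    def set(self, key, value):
        self.database[key] = value    # write to the source of truth first
        self.cache.pop(key, None)     # then invalidate the stale cached copy


store = CacheAsideStore({"user:1": "Alice"})
first = store.get("user:1")    # miss, reads the database
second = store.get("user:1")   # hit, served from the cache
store.set("user:1", "Alicia")  # invalidates the cached value
third = store.get("user:1")    # miss again, sees the new value
```

Invalidating on write rather than updating the cache in place keeps the database as the single source of truth, at the cost of one extra miss after each write.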
HipHop for PHP
PHP is a scripting language, so it is rather slow compared with native machine code running on the server. HipHop converts PHP scripts into C++ source code, which is then compiled for better performance. Since PHP is used almost everywhere at Facebook, this lets the company get more out of fewer servers.
A small team of engineers (initially just three) spent 18 months developing HipHop, and it now runs on the project's servers.
Haystack
Haystack is a high-performance photo storage and retrieval system (strictly speaking, Haystack is an object store, so it can hold any data, not just photos). It carries an enormous workload. More than 20 billion photos have been uploaded to Facebook, and each one is saved in four different resolutions, which gives more than 80 billion stored photos in total.
Haystack must not only store photos but also serve them very quickly. As mentioned earlier, Facebook serves more than 1.2 million photos per second. This number is growing constantly and does not include photos delivered by Facebook's content delivery network.
BigPipe
BigPipe is a dynamic web page delivery system developed by Facebook. It is used to deliver each webpage in sections (called pagelets) to optimize performance.
For example, the chat window, the news feed and other parts of the page are requested separately. They can be fetched in parallel, which improves performance and lets users keep using the site even if some part of it is disabled or broken.
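The idea of independent pagelets can be sketched as follows: each pagelet is rendered in its own worker, results are handed over as soon as they are ready, and a failing pagelet is replaced by a placeholder instead of breaking the page. This is a simplified illustration only. The function names and the placeholder markup are assumptions, and the real BigPipe streams pagelets to the browser rather than collecting them in a dict.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_page(pagelets):
    """Render pagelets in parallel; yield (name, html) as each one finishes."""
    with ThreadPoolExecutor(max_workers=max(1, len(pagelets))) as pool:
        futures = {pool.submit(render): name for name, render in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                yield name, future.result()
            except Exception:
                # A broken pagelet degrades to a placeholder instead of
                # failing the whole page.
                yield name, "<div class='unavailable'></div>"


def news_feed():
    return "<div>news feed</div>"


def chat():
    raise RuntimeError("chat backend is down")


page = dict(render_page({"news_feed": news_feed, "chat": chat}))
```

Here the news feed still renders even though the hypothetical chat backend raises an error.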
Cassandra
Cassandra is a distributed, fault-tolerant data store. It is one of the systems that always comes up in discussions of NoSQL. Cassandra was released as open source and has since become an Apache Software Foundation project. Facebook uses it for Inbox search. It is also used by many other projects, for example Digg, and Pingdom plans to use it as well.
Scribe
Scribe is a flexible logging system that Facebook uses for several purposes at once. It was designed to handle logging across all of Facebook and supports adding new event categories as they appear (Facebook has hundreds of them).
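The category-based design can be illustrated with a tiny in-memory stand-in: every message carries a category, and a category that has not been seen before is simply created on first use. The `CategoryLog` class below is hypothetical and is not Scribe's API.

```python
from collections import defaultdict


class CategoryLog:
    """Collects log messages grouped by category; categories need no registration."""

    def __init__(self):
        self._messages = defaultdict(list)

    def log(self, category, message):
        # An unseen category is created on first use.
        self._messages[category].append(message)

    def categories(self):
        return sorted(self._messages)

    def read(self, category):
        # .get() avoids creating an empty category as a side effect of reading.
        return list(self._messages.get(category, []))


log = CategoryLog()
log.log("page_view", "user=1 path=/home")
log.log("page_view", "user=2 path=/photos")
log.log("chat_error", "timeout")
```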
Hadoop and Hive
Hadoop is an open source implementation of the map-reduce model that makes it possible to run computations over huge amounts of data. Facebook uses it for data analysis (and, as you can imagine, Facebook has plenty of data). Hive was developed at Facebook and makes it possible to query Hadoop with SQL, which makes life easier for non-programmers.
Both Hadoop and Hive are open source and are developed under the umbrella of the Apache Software Foundation. They are used by many other companies, for example Yahoo and Twitter.
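For readers unfamiliar with map-reduce, the following single-process sketch shows the three steps involved: map every record to key-value pairs, group the values by key, and reduce each group to a result. It only illustrates the model and does not use Hadoop. The log record layout and all function names are assumptions.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Reduce each group of values to a single result per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def count_event(record):
    # Hypothetical log record layout: "<event> <user>"
    event, _user = record.split()
    return [(event, 1)]


def total(_key, values):
    return sum(values)


log_lines = ["view alice", "comment bob", "view carol", "view bob"]
counts = reduce_phase(shuffle(map_phase(log_lines, count_event)), total)
```

In Hive, the same aggregation would be expressed as a query along the lines of `SELECT event, COUNT(*) FROM log GROUP BY event`, where the table and column names are again hypothetical.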
Thrift
Facebook uses different programming languages in different parts of the system: PHP for the front end, Erlang for chat, and Java and C++ in several other places. Thrift is a cross-language framework that ties these parts together so that they can communicate with each other. Thrift is developed as an open source project, and support for a number of additional languages has already been added.
Varnish
Varnish is an HTTP accelerator that can act as a load balancer and as a content cache that serves cached content at high speed. Facebook uses Varnish to deliver photos and profile pictures, handling billions of requests per day. Like almost everything Facebook uses, Varnish is open source software.
Something else
We have described some of the software that lets Facebook handle its load. But managing such a large system is a difficult task in itself, so here are a few other things that keep the project running smoothly.
Gradual releases and implicit activation of new features
Facebook uses a system called GateKeeper, which makes it possible to run different versions of the code for different users. This lets Facebook roll out releases gradually, activate certain features only for Facebook employees, and so on.
GateKeeper also lets Facebook activate new features implicitly (a "dark launch") in order to, for example, load-test them and find slow parts of the system. Implicit activation turns a feature on without showing it in the user interface. It is usually done two weeks before the feature is made available to users.
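A feature gate of this kind can be sketched in a few lines. The example below supports a percentage rollout, an employees-only mode, and an implicit ("dark") mode in which the code path runs but the user interface stays hidden. GateKeeper's real design is not described here, so the class, its parameters and the hashing scheme are all assumptions.

```python
import hashlib


class FeatureGate:
    """Decides per user whether a feature's code path runs and whether its UI shows."""

    def __init__(self, rollout_percent=0, employees_only=False, dark=False):
        self.rollout_percent = rollout_percent  # share of users, 0 to 100
        self.employees_only = employees_only
        self.dark = dark                        # run the code path, hide the UI

    def _bucket(self, feature, user_id):
        # Stable hash so that a user stays in the same bucket across requests.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_active(self, feature, user_id, is_employee=False):
        if self.employees_only and not is_employee:
            return False
        return self._bucket(feature, user_id) < self.rollout_percent

    def shows_ui(self, feature, user_id, is_employee=False):
        return self.is_active(feature, user_id, is_employee) and not self.dark


staff_only = FeatureGate(rollout_percent=100, employees_only=True)
dark_launch = FeatureGate(rollout_percent=100, dark=True)
partial = FeatureGate(rollout_percent=10)
enabled = sum(partial.is_active("new_chat", uid) for uid in range(10_000))
```

Hashing the feature name together with the user id keeps each user's experience stable while giving different features independent rollout groups.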
Profiling the live system
Facebook carefully monitors system performance and, interestingly, tracks the performance of every individual PHP function. This is done with XHProf.
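XHProf is a PHP profiler. To show what function-level profiling involves, here is a minimal Python analogue that records the call count and cumulative wall-clock time of each decorated function. It is a sketch of the general idea, not a port of XHProf, and all names are made up for this example.

```python
import functools
import time
from collections import defaultdict

# function name -> [call count, total wall-clock seconds]
PROFILE = defaultdict(lambda: [0, 0.0])


def profiled(func):
    """Record call count and cumulative wall time for the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are counted too.
            stats = PROFILE[func.__name__]
            stats[0] += 1
            stats[1] += time.perf_counter() - start
    return wrapper


@profiled
def render_profile(user_id):
    return f"<h1>user {user_id}</h1>"


for uid in range(3):
    render_profile(uid)

calls, seconds = PROFILE["render_profile"]
```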
Partial disabling of features
If the site starts to slow down, Facebook can disable some of its less important features (there are plenty to choose from) in order to keep the key parts of the system performing well.
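One simple way to model this is a switchboard that turns off the least important features first. The sketch below is hypothetical: the feature names, the priority numbers and the `shed` method are assumptions, and a real system would decide what to disable based on measured load.

```python
class FeatureSwitchboard:
    """Sheds the least important features first when the site is under load."""

    def __init__(self, priorities):
        # priorities: feature name -> importance (higher means more essential)
        self.priorities = dict(priorities)
        self.disabled = set()

    def shed(self, count):
        """Disable up to `count` of the least important features still enabled."""
        enabled = [name for name in self.priorities if name not in self.disabled]
        enabled.sort(key=lambda name: self.priorities[name])
        victims = enabled[:count]
        self.disabled.update(victims)
        return victims

    def restore_all(self):
        self.disabled.clear()

    def is_enabled(self, feature):
        return feature in self.priorities and feature not in self.disabled


# Hypothetical feature names and priorities.
site = FeatureSwitchboard({"news_feed": 10, "chat": 5, "friend_suggestions": 1})
dropped = site.shed(1)
```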
What we have not said
We did not cover Facebook's hardware in this article, although it is, of course, also an important part of scalability. For example, Facebook uses a content delivery network to serve its static content. And, of course, there is also the large data center in Oregon, which will let Facebook scale by adding even more servers.
In addition, a lot of other software is at work beyond what is described here. Still, we hope we have managed to show some of the interesting solutions used in the project.
Facebook loves open source projects
This article would not be complete without describing Facebook's relationship with open source projects, or simply saying that Facebook loves open source.
Facebook not only contributes to projects such as Linux, Memcached, MySQL, Hadoop and others, but also releases its internal developments as open source software, for example HipHop, Cassandra, Thrift and Scribe. Facebook also open-sourced Tornado, a high-performance web framework developed by the team that created FriendFeed (which Facebook acquired in August 2009). A list of the projects in which Facebook is involved can be found on this page.
More problems
Facebook is growing at a tremendous rate. Its user base is growing almost exponentially and is approaching half a billion active users, and who knows where it will be by the end of the year. The site gains almost 100 million users every six months.
Facebook even has a special “growth” team that is constantly trying to expand the project’s audience.
This constant growth means that Facebook will keep running into performance problems as the number of searches, page views, uploaded images and so on increases. But that is part of the service's daily work. Facebook engineers will keep looking for new ways to scale the project (and this is not just a matter of adding more and more servers). For example, Facebook's photo storage system has been rewritten several times as the site has grown.
Let's see what Facebook's engineers will have to face next. It is a safe bet that it will be something very interesting. After all, they work on a project of a scale most engineers can only dream of, a site with more users than most countries have inhabitants. When you are dealing with a project like that, you had better think creatively.
Sources:
Facebook engineers' presentations and the Facebook engineering blog