I'm fascinated by how Facebook operates. It is a very unusual environment, not easily replicated (and its approach would not work for every company, even if they tried). These are notes gathered from conversations with many friends at Facebook about how the company develops and releases software.
More than six months have passed since I collected these observations, and I am sure Facebook has kept refining its development practices in the meantime, so these notes may be a bit out of date. Facebook's developer-driven culture also seems to be getting more and more public attention, so I feel more comfortable releasing these notes now... HUGE thanks to the many people who helped put together this inside view of Facebook! Thanks also to epriest and fryfrog, who sent corrections and edits.
UPD: this translation is not a polished piece of literary writing, so if possible it is better to read the original in English.
Notes:
As of June 2010, the company had nearly 2,000 employees, up from roughly 1,100 ten months earlier. That is almost double the headcount in less than a year!
The two largest teams are engineering and operations, with roughly 400-500 people each. Together they make up almost 50% of the company's staff.
The ratio of managers to engineers is roughly 1:7 or 1:10.
All engineers go through a 4-6 week training program, where they learn the Facebook system by fixing bugs and attending lectures given by senior, long-tenured engineers. Roughly 10% of each training class do not make it through and are counseled out of the company.
After the training program, all engineers get access to the live database (accompanied by the standard "with great power comes great responsibility" lecture and a clear list of "fire-able offenses", for example, disclosing users' private data).
[Edit, thx fryfrog] "There are also very good safeguards in place to keep anyone at the company from doing the terrible things you might imagine an insider has the power to do. If you do have to "become" a user who is asking for support, that fact is logged along with the reason and closely reviewed. Straying from the rules here is not tolerated, period."
Any engineer can modify any part of Facebook's code base and check it in at will.
The culture is very engineering-driven. "Product managers are essentially useless here," to quote one engineer. Engineers can modify specs mid-process, reorder work projects, and inject new ideas at any time.
At the monthly cross-team meetings, engineers are the ones who present progress reports. Product marketing and product management attend these meetings, but if they are too outspoken, leadership hears feedback along the lines of "product spoke too much at the last meeting". The company really wants engineers to publicly own their products and be the main point of contact for what they build.
The allocation of resources for projects is completely voluntary.
A PM lobbies a group of engineers and tries to get them excited about the PM's ideas.
Engineers decide which ideas sound interesting enough to work on.
An engineer then tells their manager: "I'd like to work on these 5 things this week."
That is, managers usually leave engineers' preferences alone, though they may sometimes ask that certain tasks be done first.
Engineers handle the entire feature themselves: front-end JavaScript, back-end database code, and everything in between. If they want help from a designer (designers are in limited supply), they have to get the designer interested enough to take on their project. The same goes for architects. In most cases, though, engineers are expected to handle everything they need themselves.
Arguments about whether a feature idea is worth doing usually get settled by spending a week implementing it and then testing it on a sample of users, for example, 1% of users in Nevada.
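To make the "1% of users in Nevada" idea concrete, here is a minimal sketch of a percentage-based feature gate. It is not Facebook's actual mechanism: the function name `in_test_group`, the `"NV"` region code, and the hash-bucketing scheme are all assumptions made for illustration.

```python
import hashlib


def in_test_group(user_id: int, region: str, feature: str,
                  target_region: str = "NV", percent: float = 1.0) -> bool:
    """Return True if this user falls into the sampled test group.

    Hypothetical helper: names and bucketing scheme are illustrative only.
    """
    if region != target_region:
        return False
    # Hash feature name + user id so that each feature samples a different,
    # but stable, slice of users (the same user always gets the same answer).
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000  # buckets 0..9999, 0.01% granularity
    return bucket < percent * 100
```

Because the bucket is derived from a hash rather than a random draw, a given user stays in or out of the test for the whole week, which keeps the measured sample stable.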
In general, engineers prefer to work on infrastructure, scalability, and "hard problems"; that is where the prestige is. It can be hard to get engineers excited about front-end projects and user interfaces. This is the opposite of what you see at some other consumer companies, where everyone wants to build the things users touch directly, so that you can point at a specific part and say "I built that." At Facebook, back-end work such as news feed algorithms, ad-targeting algorithms, memcache optimization, and so on are the first-class projects that engineers want to work on.
Commits that affect certain high-priority features (for example, the news feed) get code-reviewed before merge. The news feed is important enough that Zuckerberg himself reviews any changes to it, but that is an exceptional case.
[Correction, thx epriest] "There is mandatory code review for all changes (by one or more engineers). I think the article is just saying that Zuck doesn't look at every change personally."
[Correction, thx fryfrog] "All changes are reviewed by at least one person, and the system makes it easy for anyone else to look at and review your code, even if you did not ask them to. Getting unreviewed code in would take deliberately malicious behavior."
Engineers are responsible for testing, bug fixes, and post-launch maintenance of their own work. Some unit-testing and integration-testing frameworks are available, but they are used only sporadically.
[Correction, thx fryfrog] "I would also add that we do have QA, just not an official QA group. Every employee who is in an office or connected via VPN uses a version of the site that includes all the changes queued to go out next. This version is updated frequently and is usually 1-12 hours ahead of what the rest of the world sees. All employees are strongly encouraged to report any bugs they find, and it all works very well."
Re: surprise at the lack of QA or automated unit tests: "Most engineers are capable of writing bug-free code. It's just that at most companies they don't have enough incentive to do so. When there is a QA department, it's easy to just throw the code over to them to find the errors." [Please note that this was a subjective opinion; I included it because of the stark contrast with standard development practice at other companies.]
[Correction, thx epriest] "We have automated testing, including "push-blocking" tests that must pass before the release goes out. We absolutely do not believe that "most engineers are capable of writing bug-free code", much less that this is a reasonable principle to base development on."
Re: surprise at the lack of PM influence/control: product managers have a lot of independence and freedom. The key to being influential is building really good relationships with engineering managers. You need to be technically savvy enough not to suggest stupid ideas. Beyond that, there is no need to ask permission or pass any reviews for a roadmap or backlog. "My product director doesn't even know all the things that are on my roadmap." There are relatively few PMs, but they all feel they are responsible for a really important and personally interesting area of the company.
By default, all commits are packaged in weekly releases (Tuesdays).
With extra effort, changes can go out the same day.
Tuesday releases require every engineer who committed code to that week's release candidate to be on-site.
Before the release begins, engineers must be present in a specific IRC channel for "roll call", or else face a public "shaming".
The operations team launches the release, gradually rolling it out to the servers.
Facebook has about 60,000 servers.
There are 9 concentric levels for rolling out a new release.
[Correction, thx epriest] "The nine rollout levels are not concentric. There are 3 concentric phases (p1 = internal release, p2 = small external release, p3 = full external release). The other six phases are auxiliary tiers such as internal tools, video upload servers, etc."
The smallest level is 6 servers.
For example, every Tuesday release rolls out onto 6 servers (level 1), then the operations team monitors these 6 servers and makes sure that they work correctly before rolling out to the next level.
If the release causes any problems (for example, it starts throwing errors), the rollout is halted. The engineer who committed the offending change is paged to fix the problem. Then the rollout starts over from level 1.
So a release may go through the levels repeatedly: 1-2-3-fix. Back to 1. 1-2-3-4-5-fix. Back to 1. 1-2-3-4-5-6-7-8-9.
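The halt, fix, and restart cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions: `staged_rollout`, `is_healthy`, `fix`, and the `max_restarts` safety limit are hypothetical names, not part of any real Facebook tooling.

```python
from typing import Callable, List


def staged_rollout(levels: int,
                   is_healthy: Callable[[int], bool],
                   fix: Callable[[int], None],
                   max_restarts: int = 10) -> List[str]:
    """Push level by level; on a failure, fix and restart from level 1.

    Returns the sequence of steps taken, e.g. ["1", "2", "fix", "1", ...].
    """
    history: List[str] = []
    restarts = 0
    level = 1
    while level <= levels:
        history.append(str(level))      # the code is now live on this level
        if is_healthy(level):
            level += 1                  # looks fine, widen the rollout
            continue
        if restarts == max_restarts:    # assumed safety limit, not in the notes
            raise RuntimeError("too many restarts; aborting the release")
        fix(level)                      # stand-in for paging the engineer
        history.append("fix")
        restarts += 1
        level = 1                       # start over from the smallest level
    return history
```

With 9 levels and a health check that fails once at level 3 and once at level 5, the returned history reproduces the pattern from the notes: 1-2-3-fix, then 1-2-3-4-5-fix, then 1 through 9.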
The operations team is really well trained, well respected, and very invested in its work. Their server metrics go beyond the usual error, load, and memory-usage reports: they also include user-behavior metrics. For example, if a new release changes the percentage of users engaging with Facebook, the operations team sees this in its metrics and may stop the release to investigate.
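A check of this kind might look like the following sketch. The function name `should_halt` and the 5% default threshold are assumptions chosen for illustration; the notes do not say which metrics or thresholds the operations team actually uses.

```python
def should_halt(baseline_rate: float, current_rate: float,
                max_relative_change: float = 0.05) -> bool:
    """Return True if a user-behavior metric moved more than allowed.

    baseline_rate: share of users engaging before the release (e.g. 0.40)
    current_rate:  the same share measured on servers running the new release
    max_relative_change: assumed tolerance, 5% of the baseline by default
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    relative_change = abs(current_rate - baseline_rate) / baseline_rate
    return relative_change > max_relative_change
```

The check is symmetric on purpose: an unexpected jump in engagement is as much a reason to pause and investigate as a drop.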
During the release process, the operations team uses an IRC-based paging system that can reach individual engineers via Facebook, e-mail, IRC, IM, and SMS when their attention is required. Ignoring messages from the operations team leads to a public "shaming".
Once the code has been rolled out to level 9 and is stable, the weekly release is considered complete.
If a feature is not finished in time for a given weekly release, it is not a big deal (unless there are hard external dependencies); the feature simply ships whenever it is completed.
Getting svn-blamed, being publicly shamed, or slipping projects too often can get an engineer fired. "This is a very high-performance culture." People who are unproductive or not super talented are really putting themselves at risk. Managers will literally take poor performers aside within 6 months of hiring and say, "This just isn't working out, you're not a good fit for this culture." This applies at every level of the company: even C-level and VP-level hires have been quickly dismissed when they were not super productive.
[Correction, thx epriest] "People do not get called out for introducing bugs. They only get called out if they asked for their changes to go out with the release but were not around to support them when something went wrong (and had not found anyone to cover for them)."
[Correction, thx epriest] "Getting blamed (translator's note: svn-blamed) will NOT get you fired. We are extremely forgiving in this respect, and most of the senior engineers have pushed at least one terrible thing, myself included. As far as I know, no one has ever been fired for making mistakes of this kind."
[Correction, thx fryfrog] "I also don't know of anyone who has been fired for mistakes like the ones mentioned in the article. I know people who have accidentally taken down the site. They work hard to fix whatever caused the problem, and everyone learns from it. Public shaming is far more effective than the fear of being fired, in my opinion."
It will be extremely interesting to see how Facebook's development culture evolves over time, and especially whether it can keep scaling as the company grows to thousands of employees. What do you think? Would a "developer-driven culture" work at your company?