Facebook uses MySQL even though it is known not to scale well (or is there some special magic here?). I wanted to ask: why did they choose MySQL? Are JOINs used? And are there plans to switch to another database?
A:
Adam D'Angelo, the former CTO of Facebook, who is now building his startup Quora, responds:
If you partition the data across servers at the application level, MySQL scalability is not such a big problem. As of 2008, Facebook [1] had 1,800 MySQL servers that needed only two administrators. Of course, you cannot JOIN data that lives on different servers, but NoSQL databases do not let you do that either. There is no evidence that Facebook uses Cassandra as its primary store, and it seems the only thing it is used for there is inbox search. [2]
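To make the idea of application-level partitioning concrete, here is a minimal sketch in Python. It is not Facebook's actual scheme: the shard names, the choice of CRC32 as the hash, and the helpers `shard_for` and `group_by_shard` are all assumptions made for illustration.

```python
import zlib

# Hypothetical shard names; a real deployment would keep connection settings here.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(user_id: int) -> str:
    """Map a user ID to one shard using a stable hash (CRC32 is an assumed choice)."""
    key = str(user_id).encode("utf-8")
    return SHARDS[zlib.crc32(key) % len(SHARDS)]


def group_by_shard(user_ids):
    """Group IDs by shard so each server can be queried once.

    There is no cross-server JOIN, so the application has to merge
    the per-shard results itself.
    """
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid), []).append(uid)
    return groups
```

Note that a plain modulo like this remaps most keys whenever the number of shards changes, so a real system would more likely use a lookup table or consistent hashing.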
In fact, distributed databases like Cassandra, MongoDB, and CouchDB [3] are not very scalable or stable. For example, the Twitter team has been trying to move from MySQL to Cassandra for a whole year. Of course, if someone reports having run one of these databases as the primary store on 1,000 machines for a year, I will change my opinion.
It is a bad idea to risk your main database for the sake of a new technology. Losing or corrupting the data would be a disaster, and you may not be able to recover everything. In addition, if you are not a developer of one of these newfangled databases and are one of the few running them in production, you can only pray that the developers fix bugs and scalability problems as they appear.
In fact, you can get very far on a single MySQL server without worrying about application-level partitioning. You can easily scale the server up with more cores and lots of RAM, and do not forget about replication. In addition, if a memcached layer (which scales easily) sits in front of the server, then about the only thing your database has to handle is writes. For storing large objects, you can use S3 or any other distributed hash table. So, as long as you are confident that you can scale the database as it grows, you do not need to burden yourself with making it scale an order of magnitude beyond what you actually need.
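The caching arrangement described above is often called cache-aside. Below is a rough sketch of it, with plain dictionaries standing in for memcached and for a MySQL table so that it runs on its own. The `CachedStore` class and its `db_reads` counter are hypothetical names used only for illustration.

```python
class CachedStore:
    """Cache-aside sketch: reads try the cache first, writes go to the database."""

    def __init__(self):
        self.cache = {}    # stand-in for memcached
        self.db = {}       # stand-in for a MySQL table
        self.db_reads = 0  # counts how often a read reaches the database

    def read(self, key):
        if key in self.cache:
            return self.cache[key]       # cache hit: the database is not touched
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value      # populate the cache for later reads
        return value

    def write(self, key, value):
        self.db[key] = value             # the database stays the source of truth
        self.cache.pop(key, None)        # invalidate the stale cached copy
```

With this in place, repeated reads of the same key reach the database only once until the next write to that key.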
Most problems arise when you try to split data across a large number of servers yourself. But you can put an intermediate layer between the application and the database that handles this partitioning, which is what FriendFeed did. [4]
I believe the relational model is the right way to structure data in most applications where users create content. Schemas keep the data in a well-defined form as you develop new versions of the service; they also serve as documentation and help you avoid a whole class of errors. SQL, in turn, lets you process the data as needed instead of pulling tons of raw information that then has to be processed further in the application. I think all the hype around “NoSQL” will end as soon as someone finally builds a distributed relational database with relaxed semantics.
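As a small illustration of the point about schemas and SQL, here is a sketch that uses Python's built-in `sqlite3` module in place of MySQL so that it runs without a server. The `posts` table, its columns, and the `likes_per_author` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema documents the shape of the data and rejects malformed rows.
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, "
    "author TEXT NOT NULL, "
    "likes INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO posts (author, likes) VALUES (?, ?)",
    [("ann", 3), ("bob", 5), ("ann", 4)],
)


def likes_per_author(connection):
    """Let the database aggregate instead of fetching every row into the application."""
    cursor = connection.execute(
        "SELECT author, SUM(likes) FROM posts GROUP BY author ORDER BY author"
    )
    return cursor.fetchall()
```

The application receives one row per author rather than the raw posts, which is the kind of processing that otherwise has to be written by hand on top of a key-value store.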