Background
We use MySQL to store all of FriendFeed's data. Our database grows with the number of users. We now have more than 250 million records: user entries (posts), comments, and ratings ("likes").
As the database grew, we periodically ran into scalability issues. We solved them in the standard ways: read-only slave servers and memcache to increase read throughput, and sharding to increase write throughput. However, as we grew further, these scaling techniques made it hard to add new functionality.
In particular, changing the database schema or adding an index to an existing table of 10-20 million records locked the server completely for hours. Dropping old indexes also took time, and not dropping them hurt performance, since the database kept maintaining them on every INSERT. There are complex procedures to work around these problems (for example, building the new index on a slave and then swapping the master and the slave), but they are so error-prone and dangerous that they killed any desire to add features that required a schema or index change. And since our databases are heavily sharded, relational MySQL features such as JOIN had never been useful to us anyway. So we decided to look for a solution to these problems outside of relational databases.
There are many projects designed to store data with a flexible schema and build indexes on the fly (for example, CouchDB). However, apparently none of them is used at scale by large sites. In the tests we read about and ran ourselves, none of these projects proved stable or mature enough for our purposes (see this somewhat dated article on CouchDB, for example). And all that time, MySQL just kept working. It did not corrupt data. Replication worked. We already understood its bottlenecks well. We liked MySQL as a storage engine, apart from its relational features.
Having weighed everything, we decided to build a schema-less storage system on top of MySQL rather than switch to an entirely new solution. In this article I will try to describe the main details of the system. We are also curious how other sites have solved these problems, and we hope our work will be useful to other developers.
Introduction
Our datastore stores schema-less data as a set of properties (for example, a JSON object or a Python dictionary). The only required property is id, a 16-byte UUID. Nothing else matters to the datastore; that is the whole point. We "change" the schema simply by storing new properties.
We index the data in these entities and store the indexes in separate MySQL tables. If we want to index three properties of each entity, we get three MySQL tables, one per index. If we no longer need an index, we stop writing to its table and can drop it if we like. If we need a new index, we create a new MySQL table for it and start an asynchronous process that populates the index without interrupting anything else.
As a result, we end up with more tables than before, but adding and removing indexes is easy. We have heavily optimized the process that populates new indexes (which we call "The Cleaner") so that it builds indexes quickly without disrupting the site. We can now add new properties and index them within a day or so rather than a week, and no master/slave swaps or other dangerous operations are required.
Details
In MySQL, our entities are stored in a table that looks like this:
CREATE TABLE entities (
    added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    id BINARY(16) NOT NULL,
    updated TIMESTAMP NOT NULL,
    body MEDIUMBLOB,
    UNIQUE KEY (id),
    KEY (updated)
) ENGINE=InnoDB;
The added_id column exists because InnoDB stores rows physically in primary-key order. AUTO_INCREMENT keys guarantee that new entities are written to disk after old ones, which helps both reads and writes (new entities are accessed far more often than old ones, since FriendFeed pages are sorted in reverse chronological order). The entity body itself is stored as a zlib-compressed, pickled Python dictionary.
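For illustration, here is a minimal sketch of how such a body could be packed and unpacked; the function names are ours, not part of FriendFeed's actual library:

import pickle
import zlib

def serialize_body(entity):
    # Pickle the Python dictionary, then compress it with zlib
    # before storing it in the MEDIUMBLOB body column.
    return zlib.compress(pickle.dumps(entity))

def deserialize_body(blob):
    # Reverse the process when reading a row back from MySQL.
    return pickle.loads(zlib.decompress(blob))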
Indexes are stored in separate tables. To add a new index, we create a table that stores the attributes we want to search on. For example, an entity on FriendFeed looks like this:
{
    "id": "71f0c4d2291844cca2df6f486e96e37c",
    "user_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
    "feed_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
    "title": "We just launched a new backend system for FriendFeed!",
    "link": "http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
    "published": 1235697046,
    "updated": 1235697046,
}
We want to index the user_id attribute so that we can display all the entities a given user has posted. Our index table looks like this:
CREATE TABLE index_user_id (
    user_id BINARY(16) NOT NULL,
    entity_id BINARY(16) NOT NULL UNIQUE,
    PRIMARY KEY (user_id, entity_id)
) ENGINE=InnoDB;
Our datastore library maintains these indexes automatically. To start a datastore instance that stores entities like the one above with the index described above, we write (in Python):
user_id_index = friendfeed.datastore.Index(
    table="index_user_id", properties=["user_id"], shard_on="user_id")
datastore = friendfeed.datastore.DataStore(
    mysql_shards=["127.0.0.1:3306", "127.0.0.1:3307"],
    indexes=[user_id_index])

new_entity = {
    "id": binascii.a2b_hex("71f0c4d2291844cca2df6f486e96e37c"),
    "user_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"),
    "feed_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"),
    "title": u"We just launched a new backend system for FriendFeed!",
    "link": u"http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
    "published": 1235697046,
    "updated": 1235697046,
}
datastore.put(new_entity)
entity = datastore.get(binascii.a2b_hex("71f0c4d2291844cca2df6f486e96e37c"))
entity = user_id_index.get_all(datastore, user_id=binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"))
The Index class looks at the user_id property in every entity and automatically maintains the index in the index_user_id table. Since our database is sharded, the shard_on argument determines which shard the index row is stored on (in this case, entity["user_id"] % num_shards).
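As a rough illustration of that shard_on behavior, here is a sketch that assumes the shard is picked by taking the binary property value modulo the number of shards, as described above; the function is ours, not FriendFeed's actual code:

def shard_for(entity, shard_on, num_shards):
    # Choose the shard that holds this entity's index row, based on the
    # property named by shard_on (e.g. entity["user_id"] % num_shards).
    value = entity[shard_on]
    # Treat the 16-byte binary UUID as a big integer before the modulo.
    return int.from_bytes(value, "big") % num_shards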
To run a query against an index, we use the Index instance (see user_id_index.get_all above). The datastore performs the "join" between the index_user_id tables and the entities table itself: it first queries the index_user_id tables on all shards to get the list of matching entity IDs, and then fetches those entities from the entities table.
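To make the two-step "join" concrete, here is a toy, in-memory sketch of the idea; the real implementation issues SQL against the index table on every shard and then against the entities table, rather than iterating over Python lists:

def get_all_sketch(index_rows_by_shard, entities_by_id, **conditions):
    # index_rows_by_shard: one list of index rows per shard, each row a dict
    # like {"user_id": ..., "entity_id": ...} (a stand-in for index_user_id).
    # entities_by_id: a stand-in for the entities table, keyed by entity id.
    matching_ids = []
    for rows in index_rows_by_shard:  # step 1: scan the index on every shard
        for row in rows:
            if all(row.get(k) == v for k, v in conditions.items()):
                matching_ids.append(row["entity_id"])
    # step 2: fetch the canonical entities by id from the entities table
    return [entities_by_id[i] for i in matching_ids if i in entities_by_id]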
To add a new index, say on the link attribute, we create a new table:
CREATE TABLE index_link (
    link VARCHAR(735) NOT NULL,
    entity_id BINARY(16) NOT NULL UNIQUE,
    PRIMARY KEY (link, entity_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The initialization code that enables the new index becomes:
user_id_index = friendfeed.datastore.Index(
    table="index_user_id", properties=["user_id"], shard_on="user_id")
link_index = friendfeed.datastore.Index(
    table="index_link", properties=["link"], shard_on="link")
datastore = friendfeed.datastore.DataStore(
    mysql_shards=["127.0.0.1:3306", "127.0.0.1:3307"],
    indexes=[user_id_index, link_index])
We can populate the new index asynchronously, even while the site is serving live traffic, with:
./rundatastorecleaner.py --index=index_link
Consistency and atomicity
Because the database is sharded, the index rows for a given entity may live on shards other than the one that holds the entity itself. What happens if a process dies before it has written to all of the index tables?
Some of the more ambitious FriendFeed engineers felt this called for transactions, but we wanted to keep the system as simple as possible. We decided to relax the constraints:
- The set of properties stored in the main entities table is canonical.
- An index may return entities that do not actually match the query.
Accordingly, we write a new entity in the following order (a rough sketch follows the list):
- Write the entity to the entities table, relying on InnoDB's ACID guarantees (Atomicity, Consistency, Isolation, Durability).
- Write the index rows to all the index tables on all shards.
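Here is a minimal, in-memory sketch of that write path, shown only to illustrate the ordering; the table shapes are assumptions of this sketch, not FriendFeed's actual code:

def put_sketch(entities_by_id, index_tables, entity):
    # entities_by_id stands in for the entities table; index_tables is a
    # list of (property_name, rows) pairs standing in for the index tables
    # across all shards (hypothetical shapes, for illustration only).
    # Step 1: save the canonical entity first; in MySQL this single-row
    # INSERT into entities gets InnoDB's ACID guarantees.
    entities_by_id[entity["id"]] = entity
    # Step 2: only then write one row per index. If we crash between the
    # steps, the entity is still canonical and the Cleaner will backfill
    # the missing index rows later.
    for property_name, rows in index_tables:
        if property_name in entity:
            rows.append({property_name: entity[property_name],
                         "entity_id": entity["id"]})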
When we read from the index tables, we know the result may be imprecise (that is, it may contain entities that no longer match the query if step 2 was not completed for some write). To make sure we return only correct entities, we re-filter the results coming from the index tables (the filtering step is sketched after this list):
- Read the entity_id values from all index tables involved in the query.
- Read all entities for the returned IDs.
- Filter out (in Python) all entities that do not match the query conditions.
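A small sketch of that final filtering step, assuming the query conditions are simple property/value pairs:

def filter_matching(entities, **conditions):
    # Step 3: drop any entity whose current properties no longer satisfy
    # the query (for example, because a stale index row pointed at it).
    return [entity for entity in entities
            if all(entity.get(key) == value
                   for key, value in conditions.items())]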
To repair the indexes, we use the "Cleaner" process mentioned earlier. It continuously walks the entities table, writing missing index rows, deleting stale ones, and fixing incorrect ones. It starts with the most recently written entities, so in practice inconsistencies in the indexes are repaired very quickly (within a few seconds).
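Conceptually, the reconciliation for a single index might look like the toy sketch below; the real Cleaner issues SQL against the index tables shard by shard and works from the newest entities backwards, rather than comparing whole tables in memory:

def plan_index_repairs(entities_by_id, index_rows, property_name):
    # Rows the index should contain, derived from the canonical entities.
    expected = {(entity[property_name], entity["id"])
                for entity in entities_by_id.values()
                if property_name in entity}
    # Rows the index actually contains right now.
    present = {(row[property_name], row["entity_id"]) for row in index_rows}
    missing = expected - present   # rows the Cleaner must insert
    stale = present - expected     # rows the Cleaner must delete
    return missing, stale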
Performance
We paid careful attention to primary keys in the new system and are happy with the results. Below is a chart of FriendFeed page latency for the past month (we launched the new backend a few days ago):

Notably, the latency of the system is now stable, even during mid-day peaks. Below is a chart of the past 24 hours:

Compare that with the latency one week ago:

The new system has been much easier to work with. We have already changed indexes several times on the live system, and we are now starting to migrate our largest MySQL tables to it so that we can keep iterating.
How FriendFeed uses MySQL to store schema-less data, by Bret Taylor • February 27, 2009
I also recommend reading the discussion of this article on the popular mysqlperformanceblog.com