Lately the Yii framework has been getting a lot of attention on Habr. It became our choice for a large project. The problem with most large projects, as is well known, is scaling. It is equally well known that you can easily put up hundreds of nginx instances in parallel and balance the load on CPU, memory, disk and even the network channel. With the database, things are much harder.
To deal with this problem in advance, we decided to implement sharding support in Yii. Below the cut I will briefly explain what sharding is and go into detail about:
- How ActiveRecord is organized in Yii
- Implementing sharding on top of it
- Problems that still remain in AR
UPD: moved to the PHP hub, because the availability of a sharding extension can tip the scales when choosing a framework.
Sharding
Sharding is horizontal scaling of data: depending on some condition, a given table row is stored on one of several database nodes.
In the most trivial case, for some social network, the algorithm for picking the node that stores a user's data might look like this:
server_id := user.id % 2
server_connection := server_id == 0 ? 'dsn1' : 'dsn2'
To find the node holding a user's data, we can use either the same algorithm or a system lookup table. The second option lets us move users around freely, up to giving a single user an entire node of their own.
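Both selection strategies can be sketched in PHP roughly as follows. This is only an illustration: the function names, the DSN strings and the shape of the lookup table are invented for the example, and falling back to the modulo rule for users missing from the table is an assumption, not something the original setup prescribes.

```php
<?php
// Hypothetical DSNs for two database nodes (placeholders, not real servers).
const SHARD_DSNS = [
    0 => 'mysql:host=db1;dbname=app',
    1 => 'mysql:host=db2;dbname=app',
];

// Strategy 1: compute the node directly from the user id.
function dsnByModulo(int $userId): string
{
    return SHARD_DSNS[$userId % count(SHARD_DSNS)];
}

// Strategy 2: consult a lookup table first.
// $lookup maps user id => node index; in a real system it would live in a
// system table. Users not listed fall back to the modulo rule (assumption).
function dsnByLookup(int $userId, array $lookup): string
{
    if (isset($lookup[$userId])) {
        return SHARD_DSNS[$lookup[$userId]];
    }
    return dsnByModulo($userId);
}
```

For example, with $lookup = [42 => 1], user 42 is served from the second node even though 42 % 2 would point to the first one. That is exactly the freedom the lookup table buys.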
Along with the user, who is the "entity we shard by", we store everything that belongs to them. Choosing the server for a blog post looks like this:
server_id := post.user.id % 2
server_connection := server_id == 0 ? 'dsn1' : 'dsn2'
Sharding makes scaling almost unlimited, but it also creates plenty of problems. They are described, along with ways of living with sharding in the real world, in an article by Netlog, which covers the specifics of this pattern in considerable detail.
How ActiveRecord is organized in Yii
The core functionality for working with the database in Yii is split between two classes: ActiveRecord and ActiveFinder. The first can work only with the single table that the model belongs to; as soon as it encounters with() or join(), it hands the work over to ActiveFinder. That class, in turn, works only with several tables and ALWAYS builds a query with JOINs.
The main database connection used by AR is defined in the Yii config. In the implementation, the public method getDbConnection() is responsible for it. ActiveRecord defines this method, and ActiveFinder obtains the connection through ActiveRecord.
Implementation of sharding
The finished version of everything described below is available here and is currently being reviewed by the framework developers, who are deciding whether to include it in the core or publish it for everyone.
All we need to implement sharding in the framework is to teach the ORM to pick a connection depending on some condition from the outside world. So we have to let it accept a parameter and make it choose the appropriate connection based on that parameter.
Yii's interface is built around chained calls: Foo::model()->with('...')->find(). So a good way to pass "information from outside" is to add one more method of the same kind, which remembers the value so that a decision can later be made based on it. Meet choose().
For the end user it looks like this: Foo::model()->choose($shard)->find(); The value is stored inside the object. It does not affect other models in any way, nor other instances of the same model. This has two consequences:
1. Encapsulation. Nothing suddenly breaks. :)
2. For every model that needs shards, you have to call choose() by hand. :(
But back to the internals: the stored key is taken into account by the overridden getDbConnection(). If a key is set, the shard selection algorithm is run for it. If there is no key (choose() was not called, or was called without an argument), the standard connection from the config is used. The connection selection algorithm itself lives in a dedicated class, DbConnectionManager. Its implementation is fairly typical except for one operation: working out which DSN corresponds to a given key. That operation is exactly what you are expected to implement yourself, starting from the abstract class included in our implementation.
In total: choose($shard) -> getDbConnection($shard) -> DbConnectionManager::getConnection($shard) -> DbConnectionManager::_findConnection($shard).
That is the whole chain. Basic sharding support is in place. Incidentally, the same mechanism can also be used for automatic switching to a replica and many other nice things. Another advantage of this implementation is full transparency: until choose() is called, nothing changes.
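To make the chain easier to follow, here is a minimal stand-alone sketch of it. It does not use Yii and is not the code from the archive: ShardedRecord and ModuloConnectionManager are hypothetical stand-ins, the "connections" are plain objects that only carry a DSN string, and the manager methods are written as instance methods purely for the sake of the example.

```php
<?php
// Stand-in for the abstract connection manager: subclasses only decide
// which DSN a shard key maps to.
abstract class DbConnectionManager
{
    private $connections = [];

    // Returns a connection for the given shard key, reusing it per DSN.
    public function getConnection($shard)
    {
        $dsn = $this->_findConnection($shard);
        if (!isset($this->connections[$dsn])) {
            // A real implementation would open a database connection here;
            // the sketch just records the DSN.
            $this->connections[$dsn] = (object) ['dsn' => $dsn];
        }
        return $this->connections[$dsn];
    }

    // The one project-specific operation: shard key -> DSN.
    abstract protected function _findConnection($shard);
}

// Hypothetical concrete manager using the modulo rule from the Sharding section.
class ModuloConnectionManager extends DbConnectionManager
{
    protected function _findConnection($shard)
    {
        return $shard % 2 === 0 ? 'dsn1' : 'dsn2';
    }
}

// Stand-in for a sharded model, with no real ORM behind it.
class ShardedRecord
{
    private $shard = null;
    private $manager;

    public function __construct(DbConnectionManager $manager)
    {
        $this->manager = $manager;
    }

    // Remembers the shard key on this instance only and allows chaining.
    public function choose($shard = null)
    {
        $this->shard = $shard;
        return $this;
    }

    // With a key: ask the manager. Without one: the default connection.
    public function getDbConnection()
    {
        if ($this->shard === null) {
            return (object) ['dsn' => 'default'];
        }
        return $this->manager->getConnection($this->shard);
    }
}
```

With $record = new ShardedRecord(new ModuloConnectionManager()), calling $record->getDbConnection() yields the default connection, while $record->choose(7)->getDbConnection() yields the one for 'dsn2'. A second record instance is unaffected by the first one's choose().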
What comes next depends heavily on how exactly you organize your sharding. For our project we implemented one more class, inheriting from ShardedActiveRecord, which saved us from specifying the shard by hand in about 70% of places. Since we know we shard by user, nothing stops us from calling choose() automatically when a user is saved. Or when that user's blog post is saved.
Meanwhile, following the Convention over Configuration approach, it should in theory be possible to develop this into an extension that provides simple basic sharding completely out of the box. Perhaps the next article will be about that.
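A project-level layer of this kind might look roughly like the sketch below. It is not our actual class: BaseShardedRecord is a tiny stand-in for ShardedActiveRecord, UserShardedRecord and ownerId() are hypothetical names, and save() only reports which shard key was in effect instead of touching a database.

```php
<?php
// Minimal stand-in for a sharded base record (not the real ShardedActiveRecord).
class BaseShardedRecord
{
    protected $shard = null;

    public function choose($shard = null)
    {
        $this->shard = $shard;
        return $this;
    }

    // A real record would write to the database; the sketch only returns
    // the shard key that was in effect at save time.
    public function save()
    {
        return $this->shard;
    }
}

// Project-level layer: derives the shard key from the owning user.
abstract class UserShardedRecord extends BaseShardedRecord
{
    // Each model states which user it belongs to.
    abstract protected function ownerId();

    public function save()
    {
        // Only pick a shard automatically if none was chosen by hand.
        if ($this->shard === null) {
            $this->choose($this->ownerId());
        }
        return parent::save();
    }
}

class User extends UserShardedRecord
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }

    protected function ownerId()
    {
        return $this->id;
    }
}

class Post extends UserShardedRecord
{
    public $user;

    public function __construct(User $user)
    {
        $this->user = $user;
    }

    protected function ownerId()
    {
        return $this->user->id;
    }
}
```

Here (new Post(new User(7)))->save() ends up on the shard keyed by 7 without any explicit choose(), while an explicit choose(3) before save() still takes precedence.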
In conclusion, let me point out once more that the implementation described above is available as an archive on the Yii forum, in the topic where it is being discussed. It contains not only the two classes, the abstract ConnectionManager and ShardedActiveRecord, but also a set of unit tests. Those tests show quite clearly what selecting objects looks like when this implementation is used.
Problems that still exist in AR
If you did read the Netlog article on their sharding implementation, linked in the first section, you probably noticed that besides sharding they describe an interesting approach to caching. The caching is done per model.
By default, Yii follows a middle-of-the-road policy on JOINs: it does not issue hundreds of separate queries, but it does not try to cram everything into a single one either. There is a together() method that forces a single query. But there is no "anti-together()" method that would force it to first select the ids of all entities and then issue a separate query for each of them.
If you leave memcached and Sphinx out of the picture, such a method looks useless and insane. But imagine that a search query has returned a bunch of ids, that all entities (literally all of them) are kept in RAM under keys like model_$id, and that before looking up a specific model by id in the database, Yii first tried to find it in the cache.
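The idea can be sketched as follows. This is not a Yii API: findAllByIds() is a hypothetical helper, a plain array stands in for memcached, and a callable stands in for the single-row database query.

```php
<?php
// Loads entities by id, consulting a cache before the database.
// $cache is a plain array standing in for memcached;
// $loadFromDb is a callable standing in for a query that fetches one row.
function findAllByIds(array $ids, array &$cache, callable $loadFromDb): array
{
    $result = [];
    foreach ($ids as $id) {
        $key = 'model_' . $id;
        if (!array_key_exists($key, $cache)) {
            // Cache miss: one small query for this entity only,
            // and the result is remembered for next time.
            $cache[$key] = $loadFromDb($id);
        }
        $result[$id] = $cache[$key];
    }
    return $result;
}
```

If the cache already holds model_1 and the search returned ids 1 and 2, only id 2 reaches the database. With a warm cache, the database is not touched at all.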
Implementing all of this does not look very hard until we run into ActiveFinder and Behaviors. But that is really a subject for a separate article. In the meantime, the author of the framework has promised to think about implementing this himself in 1.1 (after all, he knows his way around AR far better than outside hackers do) ;]