📜 ⬆️ ⬇️

MySQL Sharding on Yii Framework

To begin with, our project is at the initial stage of development, and its launch is planned for the 1st of November. And, in order to immediately cut off all possible criticism with regards to premature optimization, I would say that the team was tasked with developing an application to cope with abrupt load changes (from 1,000 to 50,000, etc.). In this regard, it was decided to build a highly scalable architecture that allows you to easily and quickly increase system performance at the expense of the hardware (on the scale-out principle).




')

Platform

We chose the platform quickly - it was decided to develop a project based on the Yii Framework. I can briefly say that reviews of familiar developers inspired me to study it a year and a half ago, and the choice of yii, as a platform for developing this project, was made thanks to a long google, tests and comparative characteristics (such as www.yiiframework.com/performance )

It was decided not to abandon the use of Yii ActiveRecord as an ORM, because it is very convenient in terms of development, and there is always the possibility of later reworking bottlenecks to use direct queries to the database. At the initial stage, the bottleneck itself seemed to us to be the data storage structure, since the stated requirements hinted to us that a very large stream of queries that one server could not cope with would soon go to MySQL (chosen for data storage).

Replication

Replication, as a solution, disappeared immediately. At the initial stage of development (first year), it would allow us to forget about the problems with the load, but later we would have to solve the newly raised problem of the constantly loaded master (by the way, these problems are solved in very muddy ways such as a proxy for the master, slave as a master for slaves etc.). Yes, and the constant work of admins to maintain relevant data in the databases and pulling logs could lead to disastrous consequences, since nobody canceled the human factor.

Sharding

The choice fell on horizontal scaling (sharding) mysql. I will not go into the details of such a principle of data allocation, I will only say that instances of one entity (rows of one table) are spread across different servers (described in more detail here ). The tables, divided into shards, are selected according to the “most frequently visited” principle, i.e., those to which there are very many requests (no matter what type).

In our case, we selected several basic tables, which we decided to flatten horizontally between the servers. I'll tell you on the example of a table of users, which we expect several million. So, we have a user table (tbUser) spread across multiple servers, and for each requested user we now also need to know which server it is on. There are many solutions to this situation, when it is proposed, for example, to take the remainder of dividing the UID by the number of servers, etc. But in this case there is a problem when adding new servers and you need to write additional user transfer scripts that block the table or something else. Therefore, to determine the server on which the user is stored, we chose a variant of third-party data storage such as UID => SERVER_ID, and we took Redis as the storage itself.

Rediska setup

To interact with Redis, the Rediska library was installed. Integrating it with Yii is easy - you need to download the library, place it, for example, in / protected / vendor / rediska and be sure to put the following in the Rediska.php header:

<?php spl_autoload_unregister(array('YiiBase','autoload')); require_once dirname(__FILE__) . '/Rediska/Autoloader.php'; Rediska_Autoloader::register(); spl_autoload_register(array('YiiBase', 'autoload')); class Rediska extends Rediska_Options 


This solution temporarily disables autoload Yii and turns on Radiskovsky (otherwise, autoloads simply conflict).

In the / protected / components / directory, we created our own component using this library:

 <?php require_once dirname(__FILE__).'/../vendor/rediska/Rediska.php'; class RediskaConnection { public $options = array(); private $_rediska; public function init() { $this->_rediska = new Rediska($this->options); } public function getConnection() { return $this->_rediska; } } 


... and hooked it up in main.php

 'RediskaConnection'=>array( 'class'=>'application.components.RediskaConnection', 'options'=>array( 'servers' => array( 'server1' => array( 'host'=>'192.168.0.131', 'port'=>'6379', 'timeout'=>'3', // in seconds 'readTimeout'=>'3', // in seconds ), ), 'serializerAdapter'=>'json', ), ), 


All Rediska settings are described at the office. Site. After configuring a new component, it is enough to check it with get / set

 Yii::app()->RediskaConnection->getConnection()->set('key', 'value'); echo Yii::app()->RediskaConnection->getConnection()->get('key'); 


Database Preparation

When all the minor troubles with Redis ended, we moved on to creating a couple of MySQL instances with similar databases:

  'shard1' => array( 'class' => 'CDbConnection', 'connectionString' => 'mysql:host=192.168.0.131;dbname=dbshard', 'username' => 'dbuser', 'password' => 'dbpass', 'autoConnect' => false, 'charset' => 'utf8', ), 'shard2' => array( 'class' => 'CDbConnection', 'connectionString' => 'mysql:host=192.168.0.61;dbname=dbshard', 'username' => 'dbuser', 'password' => 'dbpass', 'autoConnect' => false, 'charset' => 'utf8', ), 


Software implementation of sharding

On these bases, we placed only tables that undergo sharding (including tbUser), so as not to get confused later. Now it was necessary to think over the mechanism of data distribution among the received servers. To this end, it was decided to create the CShardedActiveRecord class, which essentially extends CActiveRecord and overrides the methods we need. The principle of operation is as follows: we redefined the beforeSave (), which now determines the server to save the new record (the same remainder after dividing the UID by SERVER_QUANTITY) and saves the calculated value in Redis, and the id record to save is generated using an incremental counter implemented with using Redis increment () (this allows you to have through id numbering in all databases); redefined findByPk (), which now also gets the server number to be sampled. All code is shown below:

 <?php /** * Sharded active record */ class CShardedActiveRecord extends CActiveRecord { /** * Used in find by PK and * @var integer */ private $_pk = null; /** * Used connection * @var CDbConnection */ private $_connection = null; /** * @see db/ar/CActiveRecord#getDbConnection() */ public function getDbConnection() { if (!is_null($this->_connection)) return $this->_connection; if (is_null($this->_pk)) { $serverName = Yii::app()->params->servers['serverNames'][0]; } else { $serverId = $this->getServerId($this->_pk); $serverName =empty(Yii::app()->params->servers['serverNames'][$serverId]) ? Yii::app()->params->servers['serverNames'][0] : Yii::app()->params->servers['serverNames'][$serverId]; } $this->_connection = Yii::app()->{$serverName}; return $this->_connection; } private function removeConnection() { $this->_connection = null; } private function getRedisKey($key) { return $this->tableName() . '_' . $key; } /** * @return server id or false, for null $pk */ private function getServerId($pk) { if (is_null($pk)) return false; $serverId = Yii::app()->RediskaConnection->getConnection()->get($this->getRedisKey($pk)); return $serverId; } public function findByPk($pk, $condition = '', $params = array()) { if (!is_integer($pk)) throw new Exception ('primary key must be integer'); $this->_pk = $pk; $this->removeConnection(); return parent::findByPk($pk, $condition, $params); } /** * Set unique id for new record * @return boolean */ protected function beforeSave() { if (!parent::beforeSave()) return false; if ($this->getIsNewRecord()) { $key = $this->tableName().'_counter'; $this->id = $this->_pk = Yii::app()->RediskaConnection->getConnection()->increment($key); $serverId = $this->id % Yii::app()->params->servers['serverCount']; $result = Yii::app()->RediskaConnection->getConnection()->set($this->getRedisKey($this->id), $serverId); $this->removeConnection(); } return true; } } 


Total

Thus, we got our own ActiveRecord for sharding tables, and now we just had to create models that inherit functionality not from CActiveRecord, but from CShardedActiveRecord. Overridden methods are enough for us, since Most of the samples from the database are PK-based, and the database search is implemented using Sphinx (for which we even wrote our DataProvider so that you can use widgets integrated in Yii that work only with DataProvider). Further, by simple inserts and samples in the table, we tested the work of our component and were satisfied with the result of our work.

Source: https://habr.com/ru/post/129780/


All Articles