📜 ⬆️ ⬇️

Horizontal scaling of a real project database using SQL Azure Federations

Sharding

The issue of scaling applications in the cloud occurs very often. The concept of cloud technology itself implies scaling applications on demand. Any self-respecting cloud provider supports relevant features.

Why do you need horizontal scaling? When the question arises of improving the performance of the application, there are several options. As you know, you can buy a new hardware for the server, add the amount of RAM, etc. This principle is called vertical scaling. However, this method can be quite expensive, long, and it has a limit. You can certainly buy top-end hardware, but it may not pull all the requirements of your application.

The second method, called horizontal scaling, involves expanding the computing resources available to the application by increasing the number of servers or application instances, in the case of PaaS, on which your application is hosted. That is, if earlier your application was located on one server, and at some point it stopped “pulling” the load, you can simply buy a second exactly the same server. Put your application on it, and thus part of the requests to the application will go to the first server, part to the second.
')
This principle and put in horizontal scaling of applications hosted in the "cloud", but instead of real physical servers and we have the concept of a virtual machine. When an instance of a single virtual machine is not enough for your application, you can increase it, thus spreading the load among several virtual machines.

If we consider the capabilities of the cloud platform from Microsoft, then they are quite broad. There are auto-scaling, scaling on request, and all this is available both using the UI, and using the SDK, REST API and PowerShell.

However, if everything is quite simple with scaling an application (PaaS) or virtual machines (IaaS), you specify how many instances you need, so much if your application uses MS SQL databases, several questions arise. Of course, the first thing that comes to mind is to organize a cluster of SQL Server virtual machines. The solution is quite simple and familiar to everyone. And what if the application uses the database as a service (SaaS)? What if we don’t want to tune a SQL Server cluster?

Of course, if we are talking about Windows Azure, then SQL Azure will be used as the SQL database. This database supports horizontal scaling technology (sharding) called SQL Azure Federations. The principle of its operation is very simple: logically independent rows from one table are stored in different databases. The simplest example:



This is the same table whose data is stored in different copies of the database (shards). That is, account data with ID 1 is stored in the first database, with ID 2 in the second, and so on.

SQL Azure limitations

So what does this give us? First, the isolation of data from each other. The data of one account is not related to the data of another. Accordingly, using sharding technology, we can realize not only horizontal scaling of the database, but also a multi-tenant scenario.

The speed of data sampling from the database increases, since the size of the data stored in a particular shard is an order of magnitude smaller than the total volume of the database.

However, any technology has its drawbacks. Sharding is no exception. Let's think what disadvantages can occur when using sharding technology using the example of the database described above?

If we recall the architecture of SQL Azure , as is well known, the service does not support fetching data from several databases simultaneously. That is one database - one connection. And shards are no exception. That is, if we assume the most advanced request for the return of the number of customers in the database must be performed on each shard separately.

Original request:
SELECT Count(*) FROM Account

Sample request for a specific shard:
USE FEDERATION Accounts(AccountId = 4) WITH RESET, FILTERING = OFF
GO
SELECT Count(*) FROM Account

The logic of the summation of the values ​​returned by this query must be placed in the application. That is, as a result of the use of federations, part of the code will “go” to the application, since at the database level some features of a regular SQL Server are limited.

Foreword

Of course, SQL Azure Federations is not a panacea and you can implement your principle of horizontal scaling of databases. Suppose a multi-tenant approach is also a kind of horizontal scaling of the database. Since the data of one user is separated not only “logically” from the data of another user, but also “physically”.

If you need to add a new user, we configure a separate database for it. The question is that in the application logic there should be a “routing” mechanism. That is, the application must know which database it is currently working with.

But back to SQL Azure Federations ...
The very idea of ​​Microsoft is worthy of all praise. It would be nice to get a tool that allows you to easily mastab the database. And also to make auto-scaling, based on the results of certain requests (well, this is fantastic) ...

However, as a rule, before making the final decision, to switch to using SQL Azure Federations, it is necessary to carefully analyze the existing database (and, as a rule, transfer existing databases), or think through the smallest details of the database architecture that the application will use, as well as the logic itself applications to work with this database.

With a theory sort of sorted out. However, as a rule, in practice there is a fairly large number of rakes, which can be stepped on. Therefore, instead of showing how you can easily start using SQL Azure Federations from scratch, we will try to migrate the existing SQL Azure database to federations. Consider the steps that DBA needs to take, as well as problems that may be encountered during the migration phase.

In general, do not switch! Have a good start of the working week!

Source: https://habr.com/ru/post/192984/


All Articles