📜 ⬆️ ⬇️

Multi-tenant Data Architecture

Hi Habr!
image



Once upon a time, there were computers of several floors in size, and several operators worked with some powerful computers. Then this miracle of technology could be reduced to a desktop size and put it in every home. Now again, “cloud computing” is considered an innovation, hmm ... it all comes down to several terminals and one powerful computer data center. Well, let's wait until Google’s data center is reduced to the size of a laptop ...
')

Introduction


After the year 2000, the term acronym SaaS (software as a service) entered the lives of developers and architects, customers and many others related to IT. The essence of SaaS comes down to the fact that the Contractor develops the software, places it on the servers, maintains its performance, security, availability for customers (usually in the plural) and naturally pays for the server and other expenses.
The client pays for the use of services and does not think about the purchase of servers and their maintenance. This approach is cheaper for the Client and pays off for the Contractor (since it solves problems with licensing and piracy, and most importantly one service can be provided to several clients). However, the Client must trust the Contractor, since all his data are far away.
Naturally, such a decision forced to look for compromises in the field of data placement and sharing.
Since telling something without an example is quite problematic, consider the abstract problem.

Task


  1. There are several large companies that need to follow the news to be able to compete;
  2. News - information scattered throughout the internet;
  3. As a service, a system that in a certain way collects the necessary information on the Internet and stores it in a database in order to show it to the client;
  4. After a time (six months or a year), the data will be transferred to the archive (deleted from the main database).


For the Client, the convenience of such a system is obvious - employees of the company enter the system and see all the necessary data for more profitable and convenient work. The executor may not worry about the unauthorized use of the developed system, and most importantly, this system can be simultaneously used for several Clients, since the task is the same, and the database structure will also be identical. And here the decision tree forks, but more on that later.

Total: there are several customers who write and read to the database and know nothing about each other. It is necessary to ensure their isolation from each other and effective use of hardware and software.

Solution: MULTI-TENANT


The first solution that comes to mind to separate the data of different customers will store them in different databases (isolated approach). Alternatively, you can store all the data in one database by entering the additional field TenantID, by which to distinguish them (the general approach). In spite of all the binary nature of logic, there were no more solutions, but more. This fact is very convincingly presented in the picture borrowed from msdn:

image

Different Databases


This solution - the first thing that comes to mind is the simplest implementation of data sharing.

image

With this approach, the code is shared between clients (common UI files and business logic are used), and the data is logically (and possibly physically) separated from each other.

Benefits:


Disadvantages:



As a result, despite the high price, this is the most appropriate solution for customers whose main goal is security and who are willing to pay (for example, banks).

Common database, different schemes


So, we decided to store everything in one database, how to do it without introducing an additional field TenantID? Store information from different customers in different tables. To implement this method will help us schema (schema) . In essence, schemes are “namespaces” that contain certain resources (tables for example) and for which certain permissions can be given.

Benefits:


Disadvantages:


The result, if customers are ready to live next door, then the option is quite effective, because will allow on one machine to settle more customers than in the previous version.

General database, general scheme


The third option is to store everything in a single database and in general tables. To implement this option, a prerequisite will be the introduction of an additional field TenantID (or CustomerID as you like) for sharing information between customers.

Benefits:


Disadvantages:


So, if you have a lot of clients, but few servers (money), and if you do not have all the hard drives in the mirror at the same time, then the approach is just for you.

Harsh reality


Since the first option (different databases) can not afford all, it is most appropriate to use the option with a common database. But on this way we are waiting for a few unexpected surprises.
image
So, choose the cheapest option as a worker. Imagine that our bases do not fall, and hard drives are eternal. And now our system has been working successfully for a year and everyone is happy, gigabytes of data have accumulated, and suddenly everything starts to work slower and slower. The DBMS is not a black box, it must also be configured. The problem is trivial - the data is written in one table, in one file, and therefore on the hard drive fall behind each other. Now let's imagine how data is sampled for a single client: the hard drive head (yes, fans of SSD hard drives forgive me) slowly reads into the buffer everything, and records with the desired TenantID may not appear soon. Now imagine that we need to transfer the data of one client to the archive (by deleting them in our database).
Indices do you say? - I'll tell you dearly. In addition, even with indexes, the data will still lie in different parts of the disk.
There is a very elegant solution available in the enterprise versions of the DBMS called Partitioning (Sectioning), the essence of which can be seen in the figure:

image

With partitoning, the data of one client are arranged sequentially, in a separate place from the rest. For simplicity, we can assume that the data of each client lies on a separate logical drive. This feature is supported at the DBMS level, so there is no need to write drivers and reinvent the wheel.

P.S


This article is not a translation, but of course there are English-speaking sources setting forth the essence of multi-tenant.
Some related links:
www.developer.com/design/article.php/3801931
msdn.microsoft.com/en-us/library/aa479069.aspx
msdn.microsoft.com/en-us/library/aa479086.aspx
Special thanks: DA Surkov and DMinsky (these are different Dmitriy)

Post Postscript


Taking advantage of my official position , I want to congratulate everyone on the upcoming, the very last, most Friday Friday this year!

Source: https://habr.com/ru/post/110979/


All Articles