This text is about how to speed up the sampling of private data in multi-user projects.
Some things considered at the beginning may already be well known to you, but given that questions about the differentiation of access rights are asked regularly, I found it necessary to examine them in more detail.
For a guru, there can be valuable, if not justification, what this method is not good for. If this is understandable to you, go directly to the “binary masks” item, there it contains the most basic.
Many services allow the user to place various data on the network, while the person placing himself, as a rule, needs to limit the circle of people who can view, comment on his records or images. For these purposes, developers create different systems of rights control.
It should be noted that in most services the elements are displayed in the form of a list, and the elements whose reading is not allowed in the list should not be shown. Unlike, for example, the file system, where you can show a file in the list, and check the rights already at the opening stage, a hidden blog entry or “preview” of the photo should not be shown at all, since it already carries some information.
Separation of rights based on access level.
An example of such a separation is the system of secrecy in public institutions, that is, each subject has a certain level of access, and each object has a certain level of secrecy; to read an object, an access level not lower than the level of secrecy is required.
For Internet sites, as a rule, the division into levels is carried out: “acquaintances”, “friends”, “relatives”. In addition, according to the same principle (only for actions), the mechanism of "karma" works on the site
habrahabr.ruThus, the selection is carried out on two tables:
records |
id | int | Record ID |
us_id | int | Owner |
level | int | Level of secrecy |
user_levels - access levels |
us_id | int | User |
cor_id | int | Linked User |
level | int | Access level |
Thus, you can select allowed entries as follows:
SELECT `records`.`id`
FROM `records`, `level`
WHERE `records`.`us_id`=`user_levels`.`us_id`
AND `records`.`level`<=`users`.`level`
AND `user_levels`.`cor_id`=$us_id;
The presented method has one major drawback; we are limited only by comparing the access level, i.e. in this case, the record available to friends will be available to parents, which, you see, is not always convenient.
ACL
ACL is an access list system in which roles (groups) can be allowed or denied some action on an arbitrary object. In this case, one object can inherit from another access rights.
Also belonging to the role can be calculated on the fly in relation to the object.
This is called “virtual groups”; for example, a virtual group can be “owner”, which is calculated as user_id = object.owner_id. Disk quotas can also be implemented as a virtual group: users who have less than N megabytes loaded.
ACL is an incredibly flexible and convenient tool for working with rights. You can read more about ACL in the book "Protected Code", which describes in detail the mechanism of access lists in Windows OS, if you have anything to do with Microsoft or if you want to immediately use this mechanism on your site, then you should refer to the documentation of the zend_acl library.
Despite all the advantages, the ACL has significant drawbacks:
- The documentation for zend_acl contains a fragment about storing lists in which it is written: a programmer can store an access table, where he pleases, for example, by serializing.
In this lies the main hitch, the structure of the access tables is rather nontrivial and it is rather difficult to normalize it for storage, i.e. as a rule, they are stored in a serialized form.
It turns out that in order to find out whether a user can read a record, you must first receive this record, then initiate its ACL and only then check access.
Those. to get a list, or (oeeee) the number of elements, we need to select more records than you need to display, and even from the uncertain boundary conditions.
It is practically impossible to do this at the sampling stage, right in the request.
Yes, many databases allow you to work with serialized structures, and even index them, but the costs of this are still too high. - The flexibility of an ACL is compensated for by the complexity of the setup; practice shows that even developers can make an ACL for their needs with great difficulty, what can we say about end users.
Therefore, I believe that the ACL mechanism is redundant for most needs, so they should be where it is really needed, and for simple cases (just a reading restriction), use simpler tools.
User groups.
So, as a rule, a user has several groups, suppose “friends”, “colleagues” and “relatives”, with which he wants to allow certain records to be read.
Wherein:
- the same user may belong to several groups at the same time (be “friend” and “colleague”, although this is contraindicated)
- A record can be opened for one or several groups (for example, friends and relatives should know about your desire to change jobs, but it’s not necessary for colleagues to know about this).
Thus, we have the following structure:
records |
id | int | Record ID |
us_id | int | Owner |
record_permissions - who can read |
record_id | int | Record ID |
group | int | Group id |
group_members are members of groups |
us_id | int | User |
group | int | Group id |
Thus, the selection of records occurs using the following query:
SELECT DISTINCT `records`.`id` , `records`.`name`
FROM `records` , `record_permissions` , `group_members`
WHERE `records`.`public` =0
AND `records`.`id` = `record_permissions`.`record_id`
AND `record_permissions`.`group_id` = `group_members`.`group_id`
AND `group_members`.`us_id` =$us_id
In principle, the whole picture of rights can be kept in this form, but it is not worth doing.
- It is better to keep the rights to actions in the ACL, in practice you will never use this data.
- The rights to change, creation, deletion change much less frequently, and if you are such a fan of database normalization, store them at least in a separate table so as not to clutter the one you are working with (or view if the DBMS is able)
Despite the simplicity and elegance of the solution, it has significant drawbacks, which will be discussed below, in the same place I will try to show solutions.
Public records.
A diligent reader has already noticed a hole in our reasoning: only the user who is explicitly allowed to read the record can read it.
By default, in most cases, the records are open, and anyone can read them.
You can solve this with the help of a tricky query with LEFT JOIN, but in the end we have a condition on the table being joined, that there is a universal evil.
You can make a virtual group "Everyone", which includes any user and who is allowed to read public records. But this is not very good in relation to resources and it is generally ugly to create meaningless recordings for everyone.
Therefore, we introduce a new property in records: public, this will allow us:
- Pull out without write extras to the system of privileges records for blocks like: “the last on the site”
- Display entries for unregistered users also without using rights verification.
- Add open records by a simple UNION in the query (UNION works extremely fast, it is even recommended to use it instead of OR and IN, for a particularly confusing case)
Binary masks.
Even in the last query, the bad word DISTINCT is used, and the sample is based on three tables, and even with one-to-many relationships, this is a matter of concern.
In order to get something, you need to sacrifice something, we introduce some boundary conditions:
- The user can open records only for their groups.
Very few people may need to open their notes to “Vasya's friends,” and in this case it will be difficult to do at the interface level. - A user may have a limited number of groups.
A person can memorize 5 ± 2 objects, in a large number of groups it is easier to get confused than to understand anything.
Based on this, we reduce the number of tables to two:
records |
id | int | Record |
us_id | int | User |
access | int | Access mask |
public | bool | Public record flag |
user_references- user relationships |
us_id | int | User |
cor_id | int | Linked User |
mask | int | Relationship mask |
access and mask are binary masks that define access for the owner of the record and belong to the user’s groups.
Those. each bit of the mask corresponds to a group, int we have 32 bits, so the number of groups is limited to 32 (several bits can be reserved).
Example.
User Relations :Friends | Family | Colleagues | User | Total |
one | 0 | 0 | Friend | one |
0 | one | 0 | Mama | 2 |
0 | 0 | one | Colleague | four |
one | 0 | one | Friend-colleague | five |
Access to records :Friends | Family | Colleagues | Record | Total |
0 | 0 | 0 | Only me | 0 |
0 | one | 0 | Hello Mom! | 2 |
0 | 0 | one | Working | four |
one | one | 0 | I want to quit! | 3 |
Thus, it is possible to find out if the user has the right to read the record by performing the bitwise "&" above the masks. Bitwise operations are fast because everything is based on them.
Thus, the query looks like this:
SELECT `records`.`id` , `records`.`name`
FROM `records` , `user_references`
WHERE `records`.`public` =0
AND `records`.`us_id` = `user_references`.`us_id`
AND `records`.`access` & `user_references`.`mask`
AND `user_references`.`cor_id` =$us_id
Or taking into account the public field:
SELECT `records`.`id` , `records`.`name`
FROM `records` , `user_references`
WHERE `records`.`public` =0
AND `records`.`us_id` = `user_references`.`us_id`
AND `records`.`access` & `user_references`.`mask`
AND `user_references`.`cor_id` =$us_id
UNION
SELECT `records`.`id` , `records`.`name`
FROM `records`
WHERE `public` =1
Testing.
During testing, the performance of the solution was compared with user groups and solutions with bit masks.
As we see, the gain is very substantial, which justifies the limitations imposed by the latter method.
Separately, I will say that on a real project, an ACL (proprietary) is used, which eats up a fair amount of resources and creates significant difficulties for developers.
Conditions | By mask (s) | By groups (sec) | Speed ​​at times |
MySql 4, 30,000 records, 100 users, 1000 connections, UNION | 0,00215 | 0,0107 | 4.98 |
MySql 4,500,000 records, 10,000 users, 100,000 links | 0,00185 | 0.1346 | 72.76 |
MySql 4,500,000 records, 10,000 users, 100,000 UNION connections | 0,00215 | 0.1383 | 64.33 |
MySql 5,500,000 records, 10,000 users, 100,000 links | 0,00135 | 0.169 | 125.19 |
Sampling time averaged over 4 users created manually. All other users were created automatically; each such user had 10 friends and 10 “for friends” entries.
As can be seen from the results, the sampling rate differs by 2 orders of magnitude, while the solution with the tables is expandable, the sampling rate depends little on the number of records
It should be noted that if the user is in two groups (“friends” and “colleagues”), then this does not affect the sampling rate by mask. And sampling by groups (DISTINCT is involved) takes a few seconds (horror of horror), I did not even use it in the pivot table.
Results
It turns out that to improve the performance of the system of rights, you can create additional easily indexable tools.
I do not call in any way to abandon the ACL or other complex rights demarcation systems, but in some cases use optimized solutions for a specific task.
If you know more effective ways of differentiation of access rights, or are aware of aspects unknown to me, you are welcome in the comments.