
How virus protection came to Mail.Ru Cloud



Hello everyone, my name is Yuri Lazarev, and I am a system administrator at Mail.Ru Cloud. We recently introduced automatic antivirus scanning of all files uploaded to the storage. All content is now scanned by Kaspersky Anti-Virus, whose products already protect Mail.Ru Mail against viruses. We also scanned the files that had been uploaded to the Cloud since its launch last year. Implementing such scanning in a high-load service while keeping the same high speed is a rather difficult task.

As an analogy, compare building a one-story house with building a skyscraper. A one-story house can be built even by someone without deep knowledge or much experience, and it will stand and serve one way or another. A skyscraper is far more complicated: its design has to account for the bearing capacity of the soil, wind loads and many other factors. In the same way, antivirus scanning in a cloud service is organized quite differently from scanning on home computers or even in corporate networks.
If you want to know more about the Cloud's architecture, you can read our previous article on Habr. It explains how a file is saved and uploaded to the Cloud. Here we describe how we manage to scan petabytes of data for viruses in our high-load system without sacrificing the quality of the service or the speed of uploading and checking files.

New Files

The Cloud uses deduplication: a file with given content is stored in only one instance (actually two, since there is a backup copy). If 200 people upload the same file, the storage will not hold 200 identical files. Instead, all of these users share a single stored copy. Why? First, it lets us use disk space more efficiently and, as a result, offer users more free storage. Second, it saves computing power when checking files, because identical content only needs to be scanned once. At the moment, deduplication allows us to reduce the load on the storage by about 15%.
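The idea of deduplication can be illustrated with a tiny content-addressed store. This is only a sketch and not how the Cloud is actually implemented: the class name `DedupStore`, the choice of SHA-256 as the content key and the in-memory dictionaries are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical): one blob per unique content."""

    def __init__(self):
        self.blobs = {}       # content hash -> file bytes, stored only once
        self.user_files = {}  # (user, path) -> content hash

    def upload(self, user, path, data):
        key = hashlib.sha256(data).hexdigest()
        is_new = key not in self.blobs
        if is_new:
            # Only previously unseen content is stored (and would need a scan).
            self.blobs[key] = data
        self.user_files[(user, path)] = key
        return key, is_new


store = DedupStore()
payload = b"same installer bytes"
results = [store.upload(f"user{i}", "/setup.exe", payload) for i in range(200)]
new_blobs = sum(1 for _, is_new in results if is_new)
print(len(store.user_files), "user entries,", len(store.blobs), "stored blob(s),",
      new_blobs, "scan(s) needed")
```

In this sketch 200 uploads of the same bytes produce 200 user entries but a single stored blob, so only one scan would be needed for all of them.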

Each file is scanned several times: as soon as it reaches the Cloud, and again later with updated antivirus databases. After all, there is always a chance that the file is infected with a new virus that the antivirus did not yet know about at the time of upload. So checks run on an ongoing basis. If a file is infected, the service will not let you download it or create a link to it.
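The rescan rule can be sketched roughly as follows. The record fields, the integer database version and both function names are hypothetical, and treating a never-scanned file as not blocked is an assumption of this sketch, not something the service is known to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanRecord:
    infected: bool
    db_version: int    # hypothetical, monotonically increasing database version
    checked_at: float  # Unix timestamp of the check


def needs_rescan(record: Optional[ScanRecord], current_db_version: int) -> bool:
    """A file is due for a scan if it was never checked or was checked with older databases."""
    return record is None or record.db_version < current_db_version


def is_blocked(record: Optional[ScanRecord]) -> bool:
    """Downloads and links are refused for files known to be infected."""
    return record is not None and record.infected


old_clean = ScanRecord(infected=False, db_version=41, checked_at=0.0)
print(needs_rescan(old_clean, current_db_version=42), is_blocked(old_clean))
```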

We check files on separate servers allocated exclusively for this task. We also wrote a utility that scans files through the Kaspersky API. The point is that you cannot simply install the boxed version of the antivirus on some server and tell it to check all the files. If you did, you could forget about high performance altogether. An antivirus product is not a tool designed specifically for cloud systems, so it has to be integrated. And antivirus scanning in a high-load system has to be tightly controlled. Our utility took on this role. It not only determines the order in which files are checked, but also optimizes the load.

Put simply, there is no need to download the entire file from the storage and hand it over for checking. The utility takes the beginning of the file, downloading just a certain piece from the storage. Kaspersky then determines the file type. As a rule, it makes no sense to check the whole body of the file. Depending on the file type, the antivirus SDK chooses a scan strategy. It then sends requests to our utility along the lines of "give me this piece", and the required data is downloaded from the storage. Finally, when the SDK decides that the file has been checked, the file is marked as checked, with the time of the check and the version of the antivirus databases recorded. In this way the control utility significantly shortens the scan time and reduces the load on the network and on the disks themselves.
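The "fetch only the pieces that are asked for" idea can be sketched like this. Nothing here reflects the real Kaspersky SDK or our utility: `LazyRemoteFile`, `toy_scan`, the header and tail sizes and the `b"EVIL"` marker are all invented for illustration, and the scan strategy is a deliberately naive stand-in for the engine's real logic.

```python
class LazyRemoteFile:
    """Fetches only the byte ranges that are requested and counts the traffic."""

    def __init__(self, fetch_range, size):
        self._fetch_range = fetch_range  # callable(offset, length) -> bytes
        self.size = size
        self.bytes_fetched = 0

    def read_range(self, offset, length):
        length = max(0, min(length, self.size - offset))  # clamp to file bounds
        chunk = self._fetch_range(offset, length)
        self.bytes_fetched += len(chunk)
        return chunk


def toy_scan(remote, header_size=64, tail_size=1024):
    """Stand-in for the engine: sniff the type, then ask for more only if needed."""
    header = remote.read_range(0, header_size)
    if header.startswith(b"MZ"):  # looks like a Windows executable
        tail_offset = max(header_size, remote.size - tail_size)
        tail = remote.read_range(tail_offset, tail_size)
        return b"EVIL" in header + tail  # placeholder "signature"
    return False  # for other types the header alone is enough in this toy


storage = b"MZ" + bytes(1_000_000) + b"EVIL"  # pretend this lives on a storage node
remote = LazyRemoteFile(lambda off, n: storage[off:off + n], len(storage))
infected = toy_scan(remote)
print(infected, remote.bytes_fetched, "of", remote.size, "bytes transferred")
```

The point of the sketch is the counter: the verdict is reached after transferring about a kilobyte of a file that is roughly a megabyte long.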

At the moment we have more than 20,000 disks holding the files of Cloud users, and checking never stops. The storage contains all kinds of data, including huge video files. Pulling them out of the storage in full and pushing them across the network would be an extremely wasteful use of resources. Thanks to the mechanism described above, however, we are able to run the antivirus scanning on just a few dozen servers. About 8 million files, or roughly 50 terabytes, are now checked per day. This is not the system's peak performance, and we have also left room for further scaling.

Check queue

So, deduplication reduces the cost of storage and the load on it, and the control utility significantly speeds up antivirus scanning. But this alone would not be enough to process such a large amount of data quickly. We therefore use one more tool: the check queue. It is not just a list with files appended to the end, it is a separate service. The queue lives on a separate server and runs on the Tarantool DBMS. Tarantool is an in-house development by Mail.Ru Group employees, and one of its features is very high performance. That, and not its origin, was the deciding factor in choosing the DBMS.

First of all, the queue receives new files uploaded to the Cloud. They are placed there by the upload service. The queue also receives the files for which the most time has passed since their last check. The second service, which feeds these old files into the queue, has a limit on the maximum number of old files it may add, so that it does not slow down the checking of new ones. Each server runs several processing services at the same time. We are now trying to distribute files of different types and sizes across different queues in order to shorten the scan time for most uploaded files.
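The interplay of the two feeding services can be sketched with an in-memory queue. The real queue runs on Tarantool; this Python class, its method names and the way the cap is tracked are assumptions chosen only to show how a limit on old files keeps room for new uploads.

```python
from collections import deque


class ScanQueue:
    """Toy FIFO check queue with a cap on how many old files may be pending."""

    def __init__(self, max_old_pending):
        self._items = deque()  # (file_id, kind) in processing order
        self._queued = set()   # file ids currently in the queue
        self._old_pending = 0
        self.max_old_pending = max_old_pending

    def add_new(self, file_id):
        """Called by the upload service for every freshly uploaded file."""
        self._items.append((file_id, "new"))
        self._queued.add(file_id)

    def refill_old(self, last_checked):
        """last_checked: dict of file_id -> timestamp of the last scan."""
        room = max(self.max_old_pending - self._old_pending, 0)
        by_age = sorted(last_checked, key=last_checked.get)  # stalest first
        picked = [f for f in by_age if f not in self._queued][:room]
        for file_id in picked:
            self._items.append((file_id, "old"))
            self._queued.add(file_id)
        self._old_pending += len(picked)
        return picked

    def take(self):
        """Called by a processing service; returns None when the queue is empty."""
        if not self._items:
            return None
        file_id, kind = self._items.popleft()
        self._queued.discard(file_id)
        if kind == "old":
            self._old_pending -= 1
        return file_id, kind


queue = ScanQueue(max_old_pending=2)
queue.add_new("upload-1")
added = queue.refill_old({"a": 300, "b": 100, "c": 200})
queue.add_new("upload-2")
order = [queue.take() for _ in range(4)]
print(added, order)
```

With a cap of two, only the two stalest files ("b" and "c") enter the queue, and the second upload waits behind at most two old files rather than behind the whole backlog.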



Old files

Because automatic checking was introduced some time after the launch of Mail.Ru Cloud, about 14 petabytes of unchecked data had accumulated. And, of course, it was not sitting on one machine but was scattered across several data centers. The situation was complicated by the fact that all these servers are in active use, which means they could not be loaded with file-checking tasks. If a server that stores files is busy with analytical tasks that consume the storage hardware's resources, the speed of all operations, including the transfer of files over the network, can drop significantly. In that case the quality of service would degrade.

Checking this amount of data gradually would not have been practical either, as it would have taken too long. So we decided to bring in additional resources and complete the check as quickly as possible. For this purpose we assembled a temporary cluster of 60 servers. It checked all previously uploaded data in about three weeks.
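A rough back-of-the-envelope calculation shows what these figures imply. It assumes decimal units, exactly 21 days and an even load across the 60 servers, none of which the figures above guarantee. The result is the volume of data covered, not the bytes actually transferred, since only the requested pieces of each file are read.

```python
PB = 10**15
TB = 10**12
MB = 10**6

backlog_bytes = 14 * PB  # data uploaded before scanning was introduced
servers = 60             # size of the temporary cluster
days = 21                # "about three weeks", taken as exactly 21 days

per_day = backlog_bytes / days
per_server_day = per_day / servers
per_server_sec = per_server_day / 86_400
ratio_to_steady = per_day / (50 * TB)  # versus roughly 50 TB/day in regular operation

print(f"{per_day / TB:.0f} TB/day for the cluster")
print(f"{per_server_day / TB:.1f} TB/day per server")
print(f"{per_server_sec / MB:.0f} MB/s per server")
print(f"{ratio_to_steady:.1f}x the regular daily volume")
```

Under these assumptions the temporary cluster covered on the order of 667 TB per day, or about 11 TB per server per day, which is roughly 13 times the regular daily volume of about 50 terabytes.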

We also calculated which blocked malware is the most common:



Conclusions and future plans

So, by combining the control utility, the check queue and the Tarantool DBMS, we achieved high-performance antivirus scanning, almost in real time, with relatively modest resources.
The trend is that more and more user information will be stored in the clouds over time. Antivirus scanning is therefore becoming an integral part not only of user devices, but also of the online services where users' data is stored.

We still plan to upgrade the checking mechanism in our Cloud significantly. For example, we plan to introduce a weight for each file, so that the most requested files are checked most often. Large files will go into a separate queue, since scanning them takes a lot of resources and time. Organizing such queues for different files is an interesting task, which we will discuss in one of the following posts.

Source: https://habr.com/ru/post/231785/

