
Machine learning in the practice of administration. QoSmic technology



Recently, news feeds have been flooded with articles about machine learning (ML) and deep learning.

Indeed, over the past few years researchers have made significant progress in this area and, more importantly, society has become ready for the new technologies.
Unfortunately, in exploiting the popularity of machine learning, many authors have focused on applications of no real use to humanity: generating texts and scripts for absurd films, painting pictures in the style of famous artists, and so on. Some of these articles slide into outright panic along the lines of "soon we will all be out of work."

This approach creates a public misconception about machine learning and how it is used for real-world tasks. In practice, ML lets us automate a particular type of mental activity. A machine can make decisions just as a person does, and even faster or more accurately, because it can process more data. However, it is important to understand that an algorithm can perform only one specific task well.

We cannot replace a police officer with a computer, but we can teach a computer to pick out people who resemble a wanted criminal in a crowd. All the power of algorithms evaporates when the available data becomes insufficient or when an event occurs that is unlike anything previously known to the system. A machine is not capable of creativity in an unfamiliar situation.

Machine learning in storage systems


Turning to storage systems, let us ask ourselves: how can ML approaches be applied in the world of hard disks and solid-state drives? Here are some examples of machine learning in storage.

1. Tuning the parameters of the system as a whole and of its individual components


An algorithm that studies accumulated data and monitors device characteristics can change the parameters that govern the behavior of the storage system.

There have been attempts to implement such an "autopilot" for flash drives. Developers of NAND flash devices always have to make trade-offs between capacity, reliability and the maximum number of rewrite cycles. The device's behavior can be shifted along these trade-offs by adjusting the controller through register settings: planar NAND has 50–100 such registers, and 3D NAND more than 1,000.
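At its core, such an "autopilot" is a measure-and-adjust loop over register values. The toy sketch below shows that loop as plain random local search, which is not machine learning in the strict sense and is not NVMdurance's method. The register count, value range and `score` function are all hypothetical; on real hardware, `score` would come from measuring the device under test.

```python
import random

# Hypothetical model: NUM_REGISTERS registers, each holding a value in 0..REG_MAX.
# (Real planar NAND exposes 50-100 such registers, 3D NAND more than 1000.)
NUM_REGISTERS = 8
REG_MAX = 255


def score(registers):
    """Hypothetical stand-in for a measured objective (e.g. endurance vs. reliability).

    It rewards settings close to a target that the search itself does not know.
    """
    target = [40, 200, 128, 90, 10, 250, 60, 175]
    return -sum((r - t) ** 2 for r, t in zip(registers, target))


def tune(iterations=2000, step=16, seed=0):
    """Random local search: nudge one register, keep the change only if the score improves."""
    rng = random.Random(seed)
    best = [rng.randint(0, REG_MAX) for _ in range(NUM_REGISTERS)]
    start_score = best_score = score(best)
    for _ in range(iterations):
        candidate = best[:]
        i = rng.randrange(NUM_REGISTERS)
        candidate[i] = min(REG_MAX, max(0, candidate[i] + rng.randint(-step, step)))
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score, start_score


best, best_score, start_score = tune()
print(best, best_score, start_score)
```

A learned model would replace blind perturbation with predictions of which settings are worth measuring, but the feedback structure stays the same.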

A striking example of using machine learning to tune such registers comes from the startup NVMdurance. For details on how ML can be applied to SSDs, see the whitepaper from Coughlin Associates.

2. Analysis of the behavior of storage systems and predictive analytics


An algorithm can analyze logs and event history to predict potential problems in a storage cluster. For example, Nimble Storage has done a great deal of work toward the "smart data center". You can find the story of their InfoSight product here.
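As a minimal illustration of predictive analytics on event history, the sketch below fits a least-squares line to a daily event count and estimates how many days remain until an alert threshold is crossed. The metric (cumulative reallocated sectors on one disk), the threshold and the data are invented for the example. Products such as InfoSight use far richer models, and this is not their method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def days_until_threshold(daily_counts, threshold):
    """Extrapolate the linear trend of a daily count to the alert threshold.

    Returns the number of days after the last observation, or None if the
    trend is flat or improving.
    """
    days = list(range(len(daily_counts)))
    a, b = fit_line(days, daily_counts)
    if b <= 0:
        return None
    crossing_day = (threshold - a) / b
    return max(0.0, crossing_day - days[-1])


# Hypothetical history: cumulative reallocated sectors on one disk, one value per day.
print(days_until_threshold([2, 4, 6, 8], threshold=20))  # 6.0
```

A linear trend is the simplest possible predictor; its value is that it turns a raw log into an actionable estimate ("replace this disk within a week") before the failure happens.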

The development of mobile technologies and the habits of the new generation of users will lead to infrastructure management tools understanding natural language within 5–7 years. IFTTT-style services, including one from Microsoft, can already handle tasks like "Post a link on Twitter for every update I make on Facebook".

Sooner or later, enterprise systems will be able to report on the state of the infrastructure, allocate the necessary resources or analyze the load in response to a simple request.

One thing is clear: the way enterprises manage their infrastructure will change, and change drastically.

3. QoS (quality of service) at the application level


The third important use of ML is providing QoS for specific applications. The old "one application, one LUN" approach clearly no longer works: we have repeatedly seen different applications working with the same volume from the same host. Many of them are not at all critical for the business, yet they are very resource-hungry.

To solve this problem, we launched the QoSmic project. In essence, the RAIDIX software has taught storage systems to recognize applications and workloads from their characteristic I/O features. We will discuss it in more detail below.

Developing a "smart" QoS technology


The main task of most modern storage systems is to provide storage resources to several client stations (initiators) simultaneously.

Tasks that require storage resources must be divided into those that are critical for the company's business and those that are not. If business-critical tasks fail because applications performing non-critical tasks have captured all the necessary resources, the result can be serious financial losses.

Quite often, the service level for requests from different initiators is configured manually by the system administrator. The administrator can give priority to one or more initiators, and requests from them are then guaranteed a certain bandwidth. However, this way of managing storage cannot deliver a service level with optimal performance and reliability.

At RAIDIX, we have created QoSmic, our own prioritization technology based on identifying running applications on the fly. The QoSmic algorithm recognizes business-critical applications and gives them the highest priority. The priority is removed automatically when a critical application stops working. The client can turn the algorithm on or off.
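The automatic raising and removal of priority described above amounts to simple bookkeeping on top of the recognizer's output. The sketch below is a hypothetical illustration: the `PriorityManager` class, the two priority levels and the application names are ours, and the article does not describe RAIDIX's actual QoS interface.

```python
HIGH, NORMAL = "high", "normal"


class PriorityManager:
    """Hypothetical bookkeeping for automatic, recognition-driven priorities."""

    def __init__(self, critical_apps, enabled=True):
        self.critical_apps = set(critical_apps)
        self.enabled = enabled   # the client can switch the feature on or off
        self.boosted = set()     # critical apps currently holding HIGH priority

    def update(self, running_apps):
        """Process the apps recognized in the latest window; return (raised, removed)."""
        if not self.enabled:
            removed, self.boosted = self.boosted, set()
            return set(), removed
        now_critical = self.critical_apps & set(running_apps)
        raised = now_critical - self.boosted
        removed = self.boosted - now_critical  # critical app stopped: drop its priority
        self.boosted = now_critical
        return raised, removed

    def priority(self, app):
        return HIGH if app in self.boosted else NORMAL


pm = PriorityManager({"SQL database"})
print(pm.update({"SQL database", "browser"}))  # SQL database is raised
print(pm.update({"browser"}))                  # SQL database's priority is removed
```

Because priority follows recognition rather than a fixed initiator list, no administrator action is needed when a critical application starts or stops.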

The storage system with embedded QoSmic technology is shown in Figure 1.



Figure 1. Data storage system with embedded QoSmic technology

QoSmic operating principle


The application identification algorithm consists of two modules, learning and recognition:

  1. Learning module: we "introduce" the storage system to a new application that it should later recognize for QoS or read-ahead purposes.
  2. Recognition module: applications relevant to QoS or read-ahead are identified in real time.

Requests arriving over t seconds (for example, 20 seconds) are collected by the module; this log is then analyzed and I/O signatures are built from it. To identify an application, we only need four characteristics of each request: its length, its type (read or write), its offset (position in the address space) and its arrival time. In learning mode, the signatures are labeled with the application name; in recognition mode, they are passed to the model for identification.
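The article names the four per-request inputs but not how they are turned into a signature. One plausible reading, assumed here, is a fixed-length vector of summary statistics per t-second window. The `Request` type and the chosen statistics (mean request size, read fraction, share of sequential requests, mean inter-arrival time) are hypothetical, not QoSmic's actual signature.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    length: int     # bytes
    is_read: bool   # request type
    offset: int     # byte offset in the volume's address space
    arrival: float  # seconds


def build_signature(window):
    """Summarize one window of requests as a feature vector (hypothetical features)."""
    if len(window) < 2:
        raise ValueError("need at least two requests per window")
    reqs = sorted(window, key=lambda r: r.arrival)
    pairs = list(zip(reqs, reqs[1:]))
    sequential = sum(1 for prev, cur in pairs if cur.offset == prev.offset + prev.length)
    return [
        mean(r.length for r in reqs),                         # mean request size
        sum(r.is_read for r in reqs) / len(reqs),             # read fraction
        sequential / len(pairs),                              # share of sequential successors
        mean(cur.arrival - prev.arrival for prev, cur in pairs),  # mean inter-arrival time
    ]


def split_into_windows(requests, t=20.0):
    """Group requests into consecutive windows of t seconds."""
    windows = {}
    for r in requests:
        windows.setdefault(int(r.arrival // t), []).append(r)
    return [windows[k] for k in sorted(windows)]


# Four back-to-back 4 KB reads, 0.1 s apart: a perfectly sequential read stream.
stream = [Request(4096, True, i * 4096, i * 0.1) for i in range(4)]
print(build_signature(stream))
```

In learning mode each such vector would be stored with the application's name as its label; in recognition mode it would go straight to the classifier.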

The classification algorithm (“Model” in Figure 1) is Random Forest.

This algorithm is widely used in industry because it usually gives good results and is "understandable" to the customer. It also has a number of advantages: high learning speed, non-iterative learning (training completes in a fixed number of operations), scalability (the ability to process large amounts of data), high quality of the resulting models (comparable to neural networks and neural network ensembles) and a small number of tunable parameters.
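QoSmic's own Random Forest implementation and hyperparameters are not described. To show the technique itself, here is a deliberately small, standard-library-only Random Forest: each tree is grown on a bootstrap sample, each split considers a random subset of features, and the forest takes a majority vote. The signatures and labels in the usage example are invented. In practice one would normally use a library class such as scikit-learn's `RandomForestClassifier` instead.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, features):
    """Return the (feature, threshold) among `features` with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(X, y, depth, n_features, rng):
    """Grow a tree; leaves are labels, internal nodes are (feature, threshold, left, right)."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), n_features))
    if split is None:
        return majority(y)
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, n_features, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, n_features, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):  # labels must therefore not be tuples
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=25, max_depth=5, n_features=1, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.n_features, self.seed = n_features, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        k = min(self.n_features, len(X[0]))
        self.trees = []
        for _ in range(self.n_trees):
            sample = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap, with replacement
            self.trees.append(build_tree([X[i] for i in sample], [y[i] for i in sample],
                                         self.max_depth, k, rng))
        return self

    def vote_fractions(self, row):
        """Share of trees voting for each label."""
        votes = Counter(predict_tree(tree, row) for tree in self.trees)
        return {label: count / len(self.trees) for label, count in votes.items()}

    def predict(self, row):
        fractions = self.vote_fractions(row)
        return max(fractions, key=fractions.get)


# Hypothetical signatures: [mean request size in bytes, read fraction].
X = [[1_000_000, 1.0], [950_000, 0.95], [1_050_000, 1.0], [980_000, 0.9],
     [4096, 0.3], [8192, 0.25], [4096, 0.35], [6000, 0.3]]
y = ["Adobe Premiere Pro"] * 4 + ["SQL database"] * 4
model = RandomForest().fit(X, y)
print(model.predict([1_020_000, 0.97]), model.vote_fractions([4096, 0.25]))
```

Training is a fixed amount of work per tree with no iterative optimization, which matches the "non-iterative learning" advantage listed above.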

Then, using the Random Forest algorithm and the model obtained in learning mode, the system identifies the applications running on the client or returns the answer "could not be determined" in the case of:


The recognized application names are then passed to the QoS module, which assigns priorities.
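The article does not say how the model decides to answer "could not be determined". One common approach, assumed here, is to reject a prediction when the share of trees voting for the winning application falls below a confidence threshold. The 0.6 threshold and the `qos_module` callback are hypothetical, as is the choice not to forward rejected windows.

```python
UNDETERMINED = "could not be determined"


def identify(vote_fractions, min_confidence=0.6):
    """Turn per-application vote fractions from the forest into a name or a rejection."""
    if not vote_fractions:
        return UNDETERMINED
    app, share = max(vote_fractions.items(), key=lambda kv: kv[1])
    return app if share >= min_confidence else UNDETERMINED


def dispatch(window_votes, qos_module, min_confidence=0.6):
    """Pass recognized names to the QoS module; rejected windows are not forwarded."""
    names = [
        name
        for name in (identify(votes, min_confidence) for votes in window_votes)
        if name != UNDETERMINED
    ]
    qos_module(names)
    return names


received = []
dispatch([{"SQL database": 0.8, "unimportant": 0.2},
          {"A": 0.4, "B": 0.35, "C": 0.25}], received.extend)
print(received)  # ['SQL database']
```

A higher threshold trades more "could not be determined" answers for fewer misidentifications.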

We are often asked how an administrator can tell that it is time to retrain. Retraining is worthwhile if the answer "could not be determined" appears often or if applications are misidentified. During learning, the algorithm itself can also indicate that the collected signatures are insufficient for accurate identification.
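The first retraining signal above, frequent "could not be determined" answers, is easy to monitor automatically. This is a hypothetical sketch: the `RetrainingMonitor` class, the window of recent answers and the 20% threshold are our assumptions, and tracking misidentifications would additionally require feedback about the true application.

```python
from collections import deque

UNDETERMINED = "could not be determined"


class RetrainingMonitor:
    """Suggest retraining when too many recent answers are 'could not be determined'."""

    def __init__(self, window=100, max_undetermined_rate=0.2):
        self.answers = deque(maxlen=window)  # only the most recent answers are kept
        self.max_rate = max_undetermined_rate

    def record(self, answer):
        self.answers.append(answer)

    def undetermined_rate(self):
        if not self.answers:
            return 0.0
        return sum(a == UNDETERMINED for a in self.answers) / len(self.answers)

    def should_retrain(self):
        # Require a full window so a few early rejections do not trigger retraining.
        full = len(self.answers) == self.answers.maxlen
        return full and self.undetermined_rate() > self.max_rate


monitor = RetrainingMonitor(window=10, max_undetermined_rate=0.2)
for answer in ["SQL database"] * 7 + [UNDETERMINED] * 3:
    monitor.record(answer)
print(monitor.undetermined_rate(), monitor.should_retrain())  # 0.3 True
```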

Functional features of the QoSmic algorithm


The algorithm accurately identifies applications.


The probability of errors of the first kind (an important application classified as unimportant) and of the second kind (one critical application mistaken for another) was negligible; the measured values are given in Table 1.
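Under the definitions above, both error rates can be computed per application from labeled recognition results. This sketch assumes each recognition window yields a (true label, predicted label) pair, and it excludes "could not be determined" answers from both error kinds. That exclusion, the label strings and the sample data are our assumptions, not the article's evaluation procedure.

```python
UNIMPORTANT = "unimportant"
UNDETERMINED = "could not be determined"


def error_rates(pairs, app):
    """Return (first_kind, second_kind) error rates for one critical application.

    pairs: iterable of (true_label, predicted_label), one per recognition window.
    First kind: a window of `app` classified as unimportant.
    Second kind: a window of `app` classified as a different critical application.
    """
    predictions = [pred for true, pred in pairs if true == app]
    if not predictions:
        raise ValueError(f"no windows of {app!r}")
    n = len(predictions)
    first = sum(pred == UNIMPORTANT for pred in predictions) / n
    second = sum(pred not in (app, UNIMPORTANT, UNDETERMINED) for pred in predictions) / n
    return first, second


# Hypothetical results: 100 windows of the SQL workload.
pairs = ([("SQL database", "SQL database")] * 97
         + [("SQL database", UNIMPORTANT)] * 2
         + [("SQL database", "Adobe Premiere Pro")])
first, second = error_rates(pairs, "SQL database")
print(f"{first:.0%} {second:.0%}")  # 2% 1%
```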

This high accuracy comes from the parameters chosen for the I/O signature: they yield a fairly precise statistical profile of each application, which makes it possible to detect a running critical application with high accuracy and distinguish it from a low-priority one.

Table 1. Identification accuracy
Application | Errors of the first kind | Errors of the second kind
Apple Final Cut Pro / X | 0.1% | 0.5%
Adobe Premiere Pro | 0.15% | 0.8%
Autodesk Smoke | 0.12% | 0.7%
Kaspersky Small Business Security (antivirus) | 0.01% | 0.01%
SQL database | 0.005% | 0.01%
Unimportant applications | 0.1% | -

For comparison, we selected three media applications that generate fairly similar sequential traffic. "Unimportant applications" here means browser activity, utilities, simple copying, backups and the like.

The recognition algorithm is not resource-intensive and has practically no effect on system performance.




Figure 2. Box plot of storage latency distribution

Even under maximum-intensity load on a storage system with a low-power processor and inexpensive consumer-grade HDDs, performance is practically the same with QoSmic turned on and off; the end user notices no delays in system operation (see Figure 2 and Table 2).

Table 2. Performance testing with Iometer
Workload | Request size, KB | Reads, % | Decrease in bandwidth
Sequential read | 4, 32, 256, 1024 | 100 | 1.3%
Sequential write | 4, 32, 256, 1024 | 0 | <1%
Sequential read/write | 4, 32, 256, 1024 | 50 | 1.5%
Random read | 4, 32, 256, 1024 | 100 | 1.4%
Random write | 4, 32, 256, 1024 | 0 | <1%
Random read/write | 4, 32, 256, 1024 | 50 | 2%
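The "decrease in bandwidth" column is presumably the relative difference between throughput measured with QoSmic off and on. The formula below is our reading, and the sample figures are illustrative, not the article's raw Iometer data.

```python
def bandwidth_decrease(off_mbps, on_mbps):
    """Relative bandwidth loss, in percent, when QoSmic is turned on."""
    if off_mbps <= 0:
        raise ValueError("baseline bandwidth must be positive")
    return (off_mbps - on_mbps) / off_mbps * 100.0


# Hypothetical figures: 1000 MB/s without QoSmic, 987 MB/s with it.
print(f"{bandwidth_decrease(1000.0, 987.0):.1f}%")  # 1.3%
```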

Conclusion


Summing up, it is worth highlighting the main advantages of the QoSmic technology:


It is also worth noting that the application recognition module can be used for read-ahead (prefetching).

Using machine learning in storage is quite realistic and produces good results. Such technologies are no longer science fiction: they work and are being applied effectively. It is too early to say whether machine learning will replace the system administrator or become their best assistant, but we can say with confidence that these methods will have a significant impact on the industry in the foreseeable future.

Source: https://habr.com/ru/post/338000/

