
An experiment at Yandex: how to identify an attacker using machine learning

Yandex servers store a great deal of information that matters to people, so we have to protect our users' data reliably. In this article we describe our research into distinguishing an account's owner from an attacker, even when both know the account's username and password. We have developed a method based on analyzing users' behavioral characteristics. It uses machine learning to tell the behavior of the real account owner apart from that of an attacker across a number of characteristics.



The analysis relies on mathematical statistics and on data about how Yandex services are used. Behavioral characteristics are not enough to identify a user uniquely, so they cannot replace a password, but they do make it possible to detect a break-in after authorization. A stolen mail password alone will therefore not be enough to impersonate the real owner. We see this as an important step: it offers a different way of looking at security systems on the Internet and helps with hard problems such as determining who currently holds an account and establishing the moment and nature of a break-in.

It is commonly believed that methods for recognizing a person appeared relatively recently, but the history of identification methods actually goes back to the Middle Ages. In China at the turn of the 14th and 15th centuries, people were already aware of fingerprints. Their use was limited, though: merchants signed trade agreements with them. In the late 19th century, the uniqueness of papillary lines became the basis of fingerprinting, whose founder was William Herschel. It was he who put forward the theory that the pattern on the palmar surfaces of a person's hands does not change throughout life.

Herschel's fingerprint card

As information technology developed, various user recognition systems emerged. Most of these methods are designed to control a person's access to some system, but the field of user identification and authentication is in fact much wider.

Scientists around the world are working on the problem of identifying people by various traits. The models and theories range from the most popular ones, which rely on the fingerprints already mentioned, the iris, or the voice, to new and controversial ones, which take into account mouse movements, keystroke dynamics ("keyboard handwriting"), and behavior on websites. Yandex is also actively studying existing models and creating new ones. We are at the very beginning of this path, but we have already had some success, so we want to tell you a little about our experiments.

We are constantly working on algorithms that protect mailboxes from break-ins, spam, and malicious activity that could harm the user. The access control methods that already exist make it harder for attackers to get into a mailbox but, alas, do not solve the problem completely. The weak point is the password, which can be lost, stolen, intercepted, or guessed. For example, a password can be intercepted if you reuse your Yandex password on other services that do not support a secure connection.

We asked ourselves: "Is it possible to distinguish an intruder from the real account owner if both log in with the same password?" It turned out that it is. Our research showed that a mailbox owner's behavior always differs from the way an attacker behaves.

A number of characteristics can be extracted from a user's behavior in Mail: login time, usual location, number of authorizations, devices used, and so on. Some operations are untypical for a particular person, for example deleting read messages, erasing folders, or sending mass mailings. A person may also develop a consistent way of handling different types of messages: reading messages from people, deleting newsletters, ignoring notifications from social networks. In addition, there are habits such as "reads a chain of unread messages from the bottom up" or "logs in and goes first to Mail, then to Disk, and then to News." Such behavior patterns can be computed for many of our services. Together, these factors add up to a user profile. It does not give a complete picture of the user, but it does make it possible to tell a break-in apart from a normal authorization. This approach cannot be effective without machine learning, which determines the set of factors that shape the profile and the thresholds beyond which activity is treated as a break-in.
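To make the idea of behavioral characteristics more concrete, here is a minimal sketch of how one session's activity could be reduced to a handful of numeric features. It is an illustration only, not the system described in this article: the `Event` record, its fields, the feature names, and the "night is before 06:00" cutoff are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One user action (hypothetical record layout)."""
    hour: int      # local hour of the action, 0-23
    country: str   # country the request came from
    device: str    # device identifier
    action: str    # e.g. "read", "delete", "send"


def session_features(events):
    """Reduce a list of Event objects to a flat dict of numeric features."""
    total = len(events)
    actions = Counter(e.action for e in events)

    def share(count):
        # Fraction of all events; 0.0 for an empty session.
        return count / total if total else 0.0

    return {
        "n_events": total,
        "n_devices": len({e.device for e in events}),
        "n_countries": len({e.country for e in events}),
        # Assumed cutoff: actions before 06:00 count as night-time.
        "night_share": share(sum(1 for e in events if e.hour < 6)),
        "delete_share": share(actions["delete"]),
        "send_share": share(actions["send"]),
    }
```

Feature dictionaries like this could then be fed to whatever model decides which factors matter; the choice of features here is purely illustrative.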

The essence of the method is very simple: everyone has habits of their own, from their schedule of work and rest to the places they visit and the number of devices they use. For example, one person always checks mail from home and from work, uses two devices, never deletes read messages, and does not send spam. They use mail during the day and never check it at night. Another person often goes on business trips during the month and periodically reads mail from different countries. These users will have different behavior patterns, from which you can build an individual profile and compare each new mail login against it.
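The comparison of a new login against an individual profile can be sketched as follows. This is a deliberately simple stand-in, assuming a login is described by just three attributes (`hour`, `country`, `device`) and scoring it by the share of attribute values never seen before; the function names are hypothetical, and a real system would use a trained model rather than this counting rule.

```python
from collections import Counter

# Attributes that make up the (hypothetical) login profile.
PROFILE_KEYS = ("hour", "country", "device")


def build_profile(logins):
    """Count how often each attribute value occurred in past logins.

    `logins` is a list of dicts with the keys listed in PROFILE_KEYS.
    """
    return {key: Counter(login[key] for login in logins) for key in PROFILE_KEYS}


def unfamiliarity(profile, login):
    """Share of the login's attributes never seen in the profile (0.0 to 1.0)."""
    unseen = sum(1 for key in PROFILE_KEYS if profile[key][login[key]] == 0)
    return unseen / len(PROFILE_KEYS)


history = [
    {"hour": 9, "country": "RU", "device": "laptop"},
    {"hour": 13, "country": "RU", "device": "phone"},
]
profile = build_profile(history)

# A login matching past habits scores 0.0; one from a new country at an
# unusual hour scores higher and could be flagged for a closer look.
usual = unfamiliarity(profile, {"hour": 9, "country": "RU", "device": "laptop"})
odd = unfamiliarity(profile, {"hour": 3, "country": "BR", "device": "laptop"})
```

A frequent traveler's profile would simply contain more countries, so the same rule would treat a foreign login as familiar for that user and unfamiliar for someone who never travels.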



Here are the profiles of two different people. The red graph shows the profile of an ordinary user whose account has not been compromised: everything is fairly uniform, with no sharp jumps in the parameters. The blue graph shows the behavior of a suspicious account: all the indicators jump sharply, and the pattern of access to the service looks chaotic. This suggests unauthorized access.



This graph shows how a profile changes at the moment of a break-in. In the blue area the indicators are normal, while in the red zone there are already significant fluctuations. The dates on which this happened are also clearly visible, which can greatly simplify pinpointing the break-in.
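One simple way to locate such a change in a series of daily indicator values is to compare each day against a baseline period. The sketch below assumes a single numeric indicator per day, treats the first `baseline_days` values as normal behavior, and flags the first day that deviates by more than `threshold` standard deviations. The function name and default parameters are assumptions for illustration; this is a basic statistical check, not the method used in our experiments.

```python
from statistics import mean, stdev


def first_anomalous_day(values, baseline_days=7, threshold=3.0):
    """Return the index of the first day that deviates strongly from the baseline.

    The first `baseline_days` values are assumed to reflect normal behavior.
    Returns None if no later day deviates by more than `threshold`
    standard deviations from the baseline mean.
    """
    if baseline_days < 2 or len(values) <= baseline_days:
        return None  # not enough data to estimate a baseline and test against it
    baseline = values[:baseline_days]
    mu = mean(baseline)
    sigma = stdev(baseline)
    for day in range(baseline_days, len(values)):
        if abs(values[day] - mu) > threshold * sigma:
            return day
    return None


# Daily number of actions: steady for nine days, then a sharp jump.
daily_actions = [10, 11, 9, 10, 12, 10, 11, 10, 11, 40, 55]
suspect_day = first_anomalous_day(daily_actions)
```

Mapping the returned index back to a calendar date gives a starting point for investigating when the account may have changed hands.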

This approach can protect users against stolen passwords and session cookies, and it makes it possible to detect a break-in even after the attacker has logged in to the account.

We are not yet ready to announce the launch of a fully working break-in detection system. Not all the pieces of the puzzle are in place: it will take time to evaluate these technologies fully and learn to use their advantages. But their effectiveness is already evident: using machine learning in information security systems can greatly increase the security of stored data. So we will continue to work in this direction.

Source: https://habr.com/ru/post/230583/

