
Mathematical secrets of "big data"


Machine learning never ceases to amaze, yet for mathematicians the reason for its success is still not entirely clear.

A couple of years ago, at a dinner party I was invited to, the distinguished differential geometer Eugenio Calabi volunteered to share with me his tongue-in-cheek theory of the difference between pure and applied mathematicians. When pure mathematicians hit a dead end in their research, he explained, they often narrow the problem and try to sidestep the obstacle. Their applied colleagues, facing the same situation, conclude instead that it shows the need for more mathematics, so that more effective tools can be built.
I have always liked this distinction, because it makes clear that applied mathematics will always be able to draw on the new concepts and structures that keep emerging within fundamental mathematics. Today, with the study of "big data" on the agenda (blocks of information too large or too complex to be understood with traditional data-processing methods alone), that trend is more relevant than ever.

The modern mathematical understanding of many of the techniques driving the current big-data revolution is, at best, insufficient. Consider the simplest case, supervised learning, which companies such as Google, Facebook and Apple have used to build voice- or image-recognition technologies whose accuracy should come as close as possible to human performance. Development of such systems begins with assembling a huge number of training samples, millions or billions of images or voice recordings, which are used to train a deep neural network to detect statistical patterns. As in other areas of machine learning, the hope is that computers can process enough data to "learn" the task: the machine is not programmed with a detailed decision procedure; it is designed to follow algorithms that let it focus on the relevant patterns in the data.

In mathematical terms, these supervised-learning systems are given a large set of inputs and the corresponding outputs; the computer's task is to learn a function that will reliably produce the correct output for any new input. To do this, it composes the task from many unknown sigmoid functions. These S-shaped functions resemble a ramp from the street up to the sidewalk: a smooth transition from one level to another, in which the starting level, the height of the step and the width of the transition region are not determined in advance.
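To make the ramp analogy concrete, here is a minimal sketch of such a parameterized sigmoid, built on the standard logistic function; the parameter names (`base`, `height`, `center`, `width`) are illustrative choices, not terminology from the article:

```python
import math

def sigmoid_step(x, base=0.0, height=1.0, center=0.0, width=1.0):
    """A shifted, scaled logistic function: a smooth "step" from
    `base` up to `base + height`, centered at `center`.
    A smaller `width` makes the transition steeper."""
    return base + height / (1.0 + math.exp(-(x - center) / width))

# Far to the left we sit near the base level; far to the right,
# near base + height; the midpoint is exactly halfway up.
print(round(sigmoid_step(-10.0), 3))  # 0.0
print(round(sigmoid_step(10.0), 3))   # 1.0
print(round(sigmoid_step(0.0), 3))    # 0.5
```

During training, it is exactly these free parameters, across many such functions, that get adjusted to fit the data.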

Inputs fed into a first layer of sigmoid functions produce outputs that can be combined before being passed to a second layer of sigmoid functions, and so on, level by level. This web of resulting functions constitutes the "network" in a neural network. A "deep" neural network is one with many layers.
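The layer-by-layer combination described above can be sketched in a few lines. This is a toy forward pass only, with made-up weights standing in for the values that training would actually determine:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of all inputs,
    adds a bias, and passes the result through a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A toy 2-input -> 3-unit -> 1-unit network. The weights here are
# arbitrary; in a real system they would be learned from data.
x = [0.5, -1.2]
h = layer(x, weights=[[1.0, -0.5], [0.3, 0.8], [-1.1, 0.2]],
             biases=[0.1, -0.2, 0.0])
y = layer(h, weights=[[0.7, -0.4, 1.5]], biases=[0.05])
print(y)  # a single value between 0 and 1
```

A "deep" network simply chains many more `layer` calls, each feeding the next.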

A few decades ago, researchers proved that such networks are universal, meaning they can generate all possible functions. Other researchers later proved theoretical results about the unique correspondence between a network and the functions it generates. But those results concerned networks with a vast number of layers and with many nodes within each layer. In practice, neural networks use about 2 to 20 layers.* Because of this limitation, none of the classical results comes close to explaining why neural networks and deep learning work as spectacularly well as they do.

It is the credo of most applied mathematicians that if something mathematical works really well, there must be a good mathematical explanation for it, and we ought to be able to find it. In this case, it may turn out that we do not yet even have the appropriate mathematical framework (or, if one exists, it may have been created within pure mathematics, from which it has not yet spread to other mathematical disciplines).

Another machine-learning technique, unsupervised learning, is used to discover hidden connections in large blocks of data. Suppose, for example, that you are a researcher who wants to learn more about human personality types. You have received a generous grant that lets you give a 500-question personality test to 200,000 participants, with answers varying on a scale from one to ten. You end up with 200,000 data points in 500 virtual "dimensions", one dimension for each question on the test. Taken together, these points form one- and two-dimensional surfaces in the 500-dimensional space, in much the same way that a simple mountain landscape is a two-dimensional surface in three-dimensional space.

As a researcher, you want to identify these one- and two-dimensional surfaces in order to reduce the personality portraits of the 200,000 participants to their essential properties, just as two variables are enough to pinpoint any location in a mountain range. Perhaps the personality test can be captured by a simple function of far fewer than 500 variables. A function of this kind would reveal the hidden structure in the data.
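The article does not name a specific technique here, but principal component analysis is one standard way to check whether 500 measured dimensions hide only a few underlying variables. The sketch below fabricates survey-like data that secretly depends on just two latent traits, then recovers that fact from the singular values; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the survey: 2,000 respondents, 500 questions,
# but the answers secretly depend on only 2 latent traits.
latent = rng.normal(size=(2000, 2))
mixing = rng.normal(size=(2, 500))
answers = latent @ mixing + 0.05 * rng.normal(size=(2000, 500))

# Principal component analysis via the SVD of the centered data.
centered = answers - answers.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance captured by each component.
explained = singular_values**2 / (singular_values**2).sum()
print(explained[:3])  # the first two dominate; the third is tiny
```

The sharp drop after the second value is the signal that two variables, not 500, describe the data.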

Over the past 15 years, researchers have created a number of tools to probe the geometry of these hidden structures. For example, you can build a model of the surface by first zooming in at many different points. At each point, you place a drop of virtual ink on the surface and watch how it spreads. Depending on how the surface bends at each point, the ink flows in some directions but not in others. If you combine all the ink spots, you get a fairly clear picture of what the surface looks like as a whole. With this information, you would have more than just a collection of data points. You would begin to see the connections on the surface, the interesting loops, folds and kinks. And then you would understand how to explore the data you have gathered.
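The "spreading ink" picture corresponds to running a diffusion (random walk) over the data, as in diffusion-map methods. Here is a small sketch under simplified assumptions: points sampled along a one-dimensional loop embedded in 3-D, a Gaussian affinity kernel with an arbitrarily chosen bandwidth `eps`, and a few steps of the resulting random walk standing in for the ink:

```python
import numpy as np

# Sample 200 points along a one-dimensional loop embedded in 3-D.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
points = np.stack([np.cos(t), np.sin(t), 0.1 * np.sin(3 * t)], axis=1)

# Gaussian affinities: a drop of "ink" at one point leaks mostly
# to its nearby neighbours along the loop.
d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
eps = 0.01  # kernel bandwidth, chosen by hand for this toy data
kernel = np.exp(-d2 / eps)
markov = kernel / kernel.sum(axis=1, keepdims=True)  # row-stochastic

# Put all the ink on point 0 and let it spread for a few steps.
ink = np.zeros(200)
ink[0] = 1.0
for _ in range(10):
    ink = ink @ markov

# The ink mass stays concentrated near point 0 *along the curve*,
# revealing the loop's one-dimensional structure: a point 5 steps
# away along the loop gets far more ink than one on the far side.
print(ink[0], ink[5], ink[100])
```

Comparing how ink spreads from every starting point is what lets these methods stitch local pictures into a global map of the surface.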

Methods like these have already led to many interesting and useful discoveries, but additional tools will be needed. Applied mathematicians have plenty of work to do. And in the face of such difficult challenges, they trust that many of their "pure" colleagues will keep an open mind, follow current developments, and help discover connections between the various mathematical structures involved. Or perhaps even build new ones.

* The original version of the article stated that practical neural networks use only two or three layers. Current state-of-the-art systems use more than 10 layers. The winner of the most recent ImageNet Large-Scale Visual Recognition Challenge, Google's image-recognition algorithm, used 22 layers.

Source: https://habr.com/ru/post/272883/
