Andrei Sebrant, Director of Marketing Services at Yandex, told students of the Small ShAD what big data is and about the often unexpected places where it finds application.
Everyone has been hearing about Big Data for years now. But not everyone has an accurate idea of what the concept means, especially people outside IT. The easiest way to explain it to the uninitiated is with a practical example.
Two years ago, the huge retail chain Target began using machine learning in its interactions with customers. Data the company had collected over several years served as the training set. Bank cards and registered discount cards were used to identify specific customers. The algorithms analyzed how and under what conditions customers' preferences changed and made predictions, and on the basis of those predictions customers received all sorts of special offers. In the spring of 2012 a scandal broke out when the father of a twelve-year-old schoolgirl complained that his daughter was being sent booklets with offers for pregnant women. Just as Target was prepared to admit the mistake and apologize to the offended customers, it turned out that the girl really was pregnant, although neither she nor her father had known it at the time of the complaint. The algorithm had caught changes in the customer's behavior that are characteristic of pregnant women.
Signs of big data
- Volume: the data is truly large (although what counts as large depends on the resources available for processing).
- Variety: the data is poorly structured and heterogeneous.
- Velocity: the data has to be processed very quickly (and the results are often needed just as quickly when online services are involved).
The applications can be very diverse. For example, the site ancestry.com is trying to build a family history of all mankind from every type of data available today, from handwritten entries in various record books to DNA analysis. To date, it has collected about five billion profiles of people who lived in a wide range of historical eras, and 45 million family trees describing the connections within families.

The main difficulty in this work is that the data being processed is incomplete and full of inaccuracies, and people have to be identified by attributes that are far from unique: first names, surnames, dates of birth and death, and so on. Standard algorithms cannot cope with such data. Machine learning, however, can take all these inaccuracies into account and produce correct results with high probability.
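To make the matching problem more concrete, here is a minimal sketch of scoring whether two imperfect records describe the same person. It is not how ancestry.com works and it is not machine learning: the weights are set by hand, whereas a trained model would learn them from labeled pairs. The record fields, the weights, the five-year tolerance, and the sample records are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Combine name similarity and birth-year closeness into one score.

    The weights (0.7 / 0.3) and the 5-year tolerance are illustrative
    assumptions, not values from the lecture.
    """
    name_sim = similarity(rec_a["name"], rec_b["name"])
    year_gap = abs(rec_a["birth_year"] - rec_b["birth_year"])
    year_sim = max(0.0, 1.0 - year_gap / 5.0)
    return 0.7 * name_sim + 0.3 * year_sim


# Hypothetical records: a parish book entry with a spelling variant and an
# off-by-one year, a census entry, and an unrelated person.
parish = {"name": "Jon Smyth", "birth_year": 1851}
census = {"name": "John Smith", "birth_year": 1850}
other = {"name": "Mary Jones", "birth_year": 1850}

same_person = match_score(parish, census)
different_person = match_score(parish, other)
print(f"parish vs census: {same_person:.2f}")
print(f"parish vs other:  {different_person:.2f}")
```

The spelling variant and the one-year discrepancy lower the score for the first pair without dropping it to zero, which is the property an exact-match lookup lacks.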
Another example is the eHarmony project. This is a dating site that now has about 40 million registered users. In their profiles, users can specify up to 1,000 different attributes. Every day the system makes about 100 million guesses that two people might suit each other.

These guesses are not built on simply matching the traits and preferences that users specify. For example, it turned out that the relative area a person occupies in their profile photo can affect the likelihood of contact between certain people. It also turned out that food preferences affect how compatible people are. Two vegetarians will find common ground and begin to communicate with a probability of 44%, while two hamburger lovers will fail to start a relationship with a probability of 42%.

The most interesting thing about all this is that when we use machine learning to make decisions, we no longer understand the principles on which those decisions are made. Of course, machine learning cannot be called artificial intelligence in the true sense, because it can only solve the task it was trained for. But the hundreds and thousands of factors that a trained algorithm takes into account may simply never occur to us. Once trained, an algorithm can decide better than any usability specialist which button design to show to a particular user: here a huge amount of data works better than a person's experience and skills. But building a good website from scratch with machine learning is not yet possible.
By watching the lecture to the end, you can get a general idea of how machine learning works. To get to know the topic better, see the lectures on machine learning and computer vision.