Notes in the margins of Big Data Week Moscow

Following up on our previous post with presentations from Big Data Week Moscow, we have collected several statements by Russian and international speakers that stuck with us and seemed worthy of attention.

I wrote these quotes down by ear, so please forgive any inaccuracies in advance. Also, if you consider any of the statements old news (a “bayan”), write in the comments - it would be interesting to know where they originally came from!

1. "Knowledge of the subject area does not particularly help the data scientist in his work"
Mikhail Levin, Chief Data Scientist, Yandex Data Factory

Context: Yandex Data Factory is a high-profile Yandex project that was presented last December at the Le Web conference in Paris. It is being developed as a startup aimed at the international market. Its developers build Big Data products for large businesses. Sberbank, for example, has been named among its Russian pilot customers.

Why it is interesting: Traditionally, data scientists argue that knowledge of the domain determines up to 50% of the success of machine learning. Mikhail Levin, by contrast, relies on a kind of “black box” that looks for correlations between various parameters without taking into account the physical meaning of the values.

2. “The evolution of the Hadoop ecosystem follows the evolution of Linux”
Josep Curto, Data Scientist, Professor, IE Business School, Madrid

Context: IE Business School is among the top 20 business schools in the world. It recently launched a master's program in Big Data and has begun building expertise in this area. Josep Curto is the director of the research company Delfos Research and a data scientist who specializes in applying data analysis methods in various business areas.

Why it is interesting: Comparing Hadoop with Linux seems unexpected at first, but it turns out to be productive. The comparison both suggests the potential scale of Hadoop adoption and refutes predictions of the "death of Hadoop" (for example, in the context of Hadoop vs. Spark). Curto speaks of Hadoop as a paradigm and predicts not the death of this ecosystem but its further development. Incidentally, opposing Hadoop to Spark is not correct; it is more accurate to compare Spark with Hadoop MapReduce.

3. "Beeline made a strategic decision to focus on the development of Big Data for external customers, not for internal optimization tasks"
Alexander Mole, Data Scientist, VimpelCom

Context: VimpelCom (the company that owns the Beeline brand) has been successfully developing its Big Data practice for internal tasks for quite some time. Moreover, VimpelCom has as many as two units that work with big data: the management information department and a dedicated Data Science laboratory. In the fall of 2013, a new CEO, Mikhail Slobodin, joined VimpelCom, and his arrival brought major changes to the telecom operator's strategy.

Why it is interesting: VimpelCom has one of the strongest Big Data teams in Russia (among those that are not part of large Internet companies). It is commonly believed that Big Data helps “traditional” (that is, non-Internet) businesses first of all to increase revenue from their core business: finding new customers, raising the average check, solving security issues and preventing fraud. Beeline is instead moving to a new strategy in which it will make money on data by providing services to external customers (this does not mean handing over subscriber data, a point the company has made clearly several times). The decision is connected with the arrival of the new CEO, Mikhail Slobodin. The Russian telecom market has long passed the stage of explosive growth: it now grows by only a few percent a year, and in the future traditional services will become cheaper and bring less and less profit. Beeline is therefore counting on Big Data as an opportunity to transform the structure of its business.

4. "The conversion of online advertising campaigns can be increased by about 20% if they are tailored to the psychological segmentation of the audience"
Kirill Chistov, Director of Development, Data-Centric Alliance

Context: Data-Centric Alliance (DCA) is a Russian company specializing in Big Data and high-load systems. The company's products are in the field of digital marketing, from programmatic buying of online advertising to technological integration with the databases of client companies.

Why it is interesting: With data on a user's behavior on the Internet at hand, you can target advertising campaigns by location, gender and age. With a little more work from the analyst, you can also learn a lot about a person's intentions and preferences: what they read and watched, where they vacationed, what kind of car they drove. But for many marketers today, this is no longer enough.

DCA is learning to divide audiences into psychological types (rational / irrational, extroverted / introverted, and anxious). “Psycho-typing” is a complex analytical process that requires both machine learning and human effort.

When a brand understands the character of the consumer, it can adapt not only the meaning of the message but also the form of its delivery, which significantly increases conversion. DCA shared a case study from its practice: in the “anti-aging cosmetics” category, targeting women who are anxious about age-related changes (the “anxious” psychological type) increased the flow of targeted visitors to the promotional site 2.5 times, while the cost of a visit for the advertiser fell by 60%.
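The two figures in this case study can be combined into a quick sanity check. The sketch below is only an illustration: it assumes that both numbers refer to the same campaign and are measured against the same baseline, which the talk did not state explicitly, and the function name `relative_spend` is made up for this example.

```python
def relative_spend(visitor_multiplier: float, cost_reduction: float) -> float:
    """Return total campaign spend relative to the baseline campaign.

    visitor_multiplier: how many times more targeted visits were received.
    cost_reduction: fractional drop in cost per visit (0.6 means 60% cheaper).
    Assumption: both figures describe the same campaign and baseline.
    """
    relative_cost_per_visit = 1.0 - cost_reduction
    return visitor_multiplier * relative_cost_per_visit


# Figures quoted in the DCA case study: 2.5x visitors, 60% cheaper visits.
spend = relative_spend(visitor_multiplier=2.5, cost_reduction=0.6)
print(f"Spend relative to baseline: {spend:.2f}x")
```

Under that assumption the result is about 1.00x, which means the advertiser received 2.5 times as many targeted visitors for roughly the same budget.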

Precise targeting of advertising campaigns is becoming increasingly popular. In March, Sberbank bought RuTarget, the developer of the Segmento advertising platform, a service that uses artificial intelligence and big data processing for ultra-precise ad targeting.

5. “The use of Big Data technologies for analyzing social networks has no indisputable business applications and is still more of an R&D task”
Alexey Natekin, Director, Data Mining Labs

Context: Data Mining Labs is engaged in data mining, teaching students, project development, and research in data analysis theory.

Why it is interesting: The ability to use open sources of information is one of the advantages of working with big data. Advertising optimization and credit scoring are often mentioned in connection with social networks, but according to Natekin these cases rather use social "features" for external tasks.

P.S. Big Data Week Moscow was organized by the Laboratory of New Professions and the Digital October Center.

Source: https://habr.com/ru/post/257499/
