Down to the hundredth: top 10 talks of SmartData 2017

SmartData attendees are people who love working with data, so presumably they rated the talks very thoughtfully after last year’s conference.

Based on those ratings, we have compiled a top 10 of the talk videos. And to please the data lovers, for each of the ten talks we list all the accompanying numbers: its place in the top, its exact audience rating, and the number of viewers.
Generally speaking, neighboring positions in the top often differ only slightly in rating. So perhaps one should not attach much importance to “who follows whom”: what matters more is that all these talks received high marks. But on the other hand, how can you not pay attention to the numbers when it is so exciting!
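How little the neighboring positions differ can be checked directly. The sketch below takes the ratings listed in this article and tests whether the intervals of adjacent places intersect. It assumes that each ± value is a symmetric margin around the rating; the article does not say how the margins were computed, and the names `ratings` and `intervals_overlap` are made up for this illustration.

```python
# (place, rating, margin) exactly as listed in this article.
ratings = [
    (1, 4.51, 0.08),
    (2, 4.36, 0.08),
    (3, 4.32, 0.12),
    (4, 4.31, 0.17),
    (5, 4.28, 0.08),
    (6, 4.26, 0.18),
    (7, 4.24, 0.17),
    (8, 4.22, 0.14),
    (9, 4.21, 0.09),
    (10, 4.21, 0.09),
]


def intervals_overlap(a, b):
    """True if [rating - margin, rating + margin] of a and b intersect.

    Assumption: the margin is symmetric around the rating.
    """
    _, rating_a, margin_a = a
    _, rating_b, margin_b = b
    return abs(rating_a - rating_b) <= margin_a + margin_b


# Compare each place with the one right after it.
overlapping_pairs = [
    (a[0], b[0])
    for a, b in zip(ratings, ratings[1:])
    if intervals_overlap(a, b)
]

for first, second in overlapping_pairs:
    print(f"places {first} and {second}: intervals overlap")
```

Under this reading, `overlapping_pairs` holds the neighboring places whose order could plausibly be swapped.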

Neurona: why did we teach the neural network to write poems in the spirit of Kurt Cobain?

Speaker: Ivan Yamshchikov
Place: 1
Rating: 4.51 ± 0.08
Number of viewers: ~200
Talk slides

The keynote from the creator of the “Neural Defense” and Neurona projects was the clear leader of the conference. It is an accessible talk that demands no heavy preparation from the viewer, yet it is not just the hundred-thousandth explanation of “how neural networks work”. The format looks like pure entertainment (what you hear is unlikely to affect your work project right away), but in the long run it can prove not only very interesting but also useful. So it is no wonder that we invited Ivan to speak at the upcoming SmartData 2018.

From click to forecast and back: Data Science pipelines at Odnoklassniki

Speaker: Dmitry Bugaychenko
Place: 2
Rating: 4.36 ± 0.08
Number of viewers: ~140
Talk slides

And here is the opposite. First, this is not a general “what machine learning can give us” but the specifics of “exactly how we implement everything”. And the talk is not about ML itself (news feed personalization is given merely as an example) but about everything around it: “what needs to be done to make all this ML beauty work”. So while Yamshchikov’s talk may interest even a wide audience, this one will interest only those who work with machine learning themselves, but they can take a lot away from it.

CatBoost - the next generation of gradient boosting

Speaker: Anna Veronika Dorogush
Place: 3
Rating: 4.32 ± 0.12
Number of viewers: ~100
Talk slides

If gradient boosting is not your specialty and the topic made you suspect that “this must be nuances for people who are already deep into it”, set your fears aside. The talk is beginner-friendly: it does not dive in headfirst but starts by explaining the basics. And given that over the past year Yandex’s CatBoost library has improved and become more popular, it is useful to have an idea of it even if you don’t need to work with it right now, and the talk can serve as a good introduction.

Back to the future of the modern banking system

Speaker: Vladimir Krasilshchik
Place: 4
Rating: 4.31 ± 0.17
Number of viewers: ~80
Talk slides

What do you do if the quarterly report ends up disagreeing with the monthly one, and the auditors and regulators have questions? Vladimir Krasilshchik explains that the key concept here is bitemporality: there is “when the event happened” and there is “when the system found out about it”, and you need to work with both timelines and show both at once to third-party verifiers. The talk is not limited to this, there is much more: for example, did you expect to hear the phrase “there is no justice, and do not try to create it” at an IT conference?
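The idea of two timelines can be illustrated with a minimal sketch. This is not the speaker’s implementation: the `Fact` record, its field names, the `total` function, and the sample data are all hypothetical and only show how one event recorded late can make a monthly and a quarterly figure disagree.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """A hypothetical bitemporal record: every fact carries two dates."""
    account: str
    amount: int
    valid_on: date      # when the event happened
    recorded_on: date   # when the system found out about it


def total(facts, account, valid_until, known_by):
    """Sum events that happened up to valid_until,
    using only what the system knew by known_by."""
    return sum(
        f.amount
        for f in facts
        if f.account == account
        and f.valid_on <= valid_until
        and f.recorded_on <= known_by
    )


facts = [
    Fact("acc-1", 100, date(2017, 1, 10), date(2017, 1, 10)),
    # Happened in January, but the system learned about it in February.
    Fact("acc-1", 50, date(2017, 1, 25), date(2017, 2, 3)),
]

# January total as it looked when the monthly report was prepared.
monthly = total(facts, "acc-1", date(2017, 1, 31), known_by=date(2017, 1, 31))
# The same January total as seen at the end of the quarter.
quarterly = total(facts, "acc-1", date(2017, 1, 31), known_by=date(2017, 3, 31))

print(monthly, quarterly)
```

Keeping both dates on every record lets the system reproduce either figure on request and explain the difference to a verifier, instead of silently overwriting the earlier number.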

Name is a feature

Speaker: Vitaliy Khudobakhshov
Place: 5
Rating: 4.28 ± 0.08
Number of viewers: ~280
Talk slides

The most paradoxical talk of the conference, one that leaves you scratching the back of your head in bewilderment. On the one hand, it seems completely obvious to any rational person that there is no apparent reason for a person’s name (popular Russian names are meant) to correlate with whether that person is in a relationship. On the other hand, Vitaliy presents data showing the opposite. He has no exact explanation himself, but nobody has offered a really convincing objection either. You can try to look for one yourself.

No data? No problems! CGI Deep Learning

Speaker: Ivan Drokin
Place: 6
Rating: 4.26 ± 0.18
Number of viewers: ~40
Talk slides

As is well known, algorithms alone are not enough for deep learning: you also need data to train on. As a result, a good dataset has become a valuable resource. But what do you do if you don’t have one right now, and you aren’t Google and cannot invest huge resources? It turns out that it is not always necessary to take “real” data from the real world: under certain conditions the data can literally be generated. The talk covers a specific case of this kind.

Deep convolutional networks for object detection and image segmentation

Speaker: Sergey Nikolenko
Place: 7
Rating: 4.24 ± 0.17
Number of viewers: ~80
Talk slides

If you are still far from machine / deep learning in general, the first 20 minutes of this talk may suit you well: there is a thorough introduction to the topic with a historical excursion starting from the 1950s. And if you understand the general picture but not the specific sub-topic of deep convolutional networks, you can skip the introduction and go straight to the second half of the talk, which moves on to convolutional neural networks.

Hadoop high availability: Badoo experience

Speaker: Alexander Krashennikov
Place: 8
Rating: 4.22 ± 0.14
Number of viewers: ~100
Talk slides

It seems that, alongside the concept of “big data”, a concept of “growing data” would also be useful, because growth dictates its own specifics. Badoo once had orders of magnitude less data and one approach to it; then the volumes grew and changes were needed. And you have to keep in mind that tomorrow everything may grow even further, so everything should be built with a margin.

The company became interested in combining “Hadoop” and “realtime” back when people usually wrote “incompatible” between those two words, and now it has shared its experience with Hadoop and with ensuring its high availability. Bonus: a bit of Vasily Lozhkin’s art on the slides.

Segmenting 600 million users in real time every day

Speaker: Artyom Marinov
Place: 9
Rating: 4.21 ± 0.09
Number of viewers: ~120
Talk slides

This project is very different from Badoo: not dating but a DMP (data management platform), where you need to pick out audience segments like “housewives with a car over five years old”. But, first, the scale here is also large (about one hundred thousand events per second). And second, you need to be even more prepared for growth: “among the data sources there are pixel installations; if a superpopular website suddenly installs your pixel, there will be a huge stream that you will have to cope with.” Which technologies are used, and how? The answers are in the talk.

Distributed ML on big data: the experience of building a recommendation system at ivi

Speaker: Boris Shminke
Place: 10
Rating: 4.21 ± 0.09
Number of viewers: ~100
Talk slides

Finally, the last talk is also about “infrastructure, not algorithms” and is also based on the experience of a large product. ivi once started its recommendations with a third-party service that provided recommendations as a service. Then the company outgrew it and began building its own system. The company wrote about this on Habr back in 2014, and from the talk you can learn about the current state of affairs.

If these talks interested you, please note: SmartData 2018 will take place this fall. Some speakers from this top 10 will return with new talks, and there will be completely new names too. The most up-to-date information about the program is always available on the website, where you can also buy tickets. Their price is gradually increasing, so it is worth considering now.

Source: https://habr.com/ru/post/416985/
