
On May 27, another Moscow Data Science Meetup took place at the Mail.Ru Group office. The event brought together representatives of large Russian companies and research organizations, as well as enthusiasts in machine learning, recommender systems, social graph analysis, and related disciplines. The guests shared their experience in solving practical data analysis problems. Below are the video recordings and slides of the three talks given at the meetup.

Dmitry Nosov, Rambler & Co, H2O on Spark: how we drank soda and almost choked
H2O is an interesting and promising machine learning platform. It offers fast processing of large amounts of data, a broad set of algorithms, APIs for several programming languages, and, of course, attractive and detailed reports on the models it builds. H2O is written in Java, so it runs everywhere, including on a Spark cluster. In the talk, the speaker shared his experience of using H2O on Spark and YARN, as well as the reasons for not using H2O in production despite all its strengths.
Video of the speech:
it.mail.ru/video/724

Pavel Filonov, Kaspersky Lab, Deep learning and feature extraction in time series forecasting
The automatic feature selection that takes place when deep networks are trained is seen as a promising tool that can significantly reduce the amount of data preparation work. The talk considers the problem of forecasting time series values and compares two approaches to solving it: one based on manually engineered features and one built on fully automatic processing of raw data.
Video of the speech:
it.mail.ru/video/723

Alexander Dyakonov, VMK MSU, Solution of the Search Results Relevance problem (on the Kaggle platform)
The talk analyzed the task of determining the relevance of search results, which was tackled at last year's "Practical seminar on data analysis (Kaggle)". It described a very simple algorithm that uses no complex text analysis methods, dictionaries, or ensembles of algorithms, and that nevertheless finished in the top ten among more than 1,300 participants.
Video of the speech:
it.mail.ru/video/722

A reminder: we are currently hosting a machine learning competition on the ML Boot Camp platform. Register now: just over two weeks remain until the contest ends, so anyone who wants to can still easily break into the top! :)