
On September 1, Mail.Ru Group and the Open Data Science community will hold the largest Moscow Data Science Major meetup yet. The event consists of five thematic blocks of talks, one ML training session, and a whole hall for networking and meeting new people.
Check out the program and register! Admission is free, subject to approved registration.
Talks at Moscow Data Science Major will run in two tracks. The table contains the schedule grid, and the talk descriptions follow below.
Schedule:

Talk descriptions:
"Speaker Diarization Problem", Gregory Sterling, NeurodataLab LLC
I will briefly cover speech processing as a whole and the speaker diarization task: given a recording of a dialogue, determine who spoke and when. I will go over the history of the problem, why it matters, the cocktail party problem, how it has been tackled, and what makes it difficult. The main part of the talk is devoted to results from 2017-2018, for example the Google paper that describes a solution for video (where the neural network appears to be trying to read lips). I will finish with what can be done when there is no video and only audio is available (a phone conversation, for example), walking through the papers and our own approach.
“Neural network vocoders”, Sergey Dukanov, Mail.Ru Group
We will start with a short overview of modern approaches to speech synthesis, then discuss vocoders, and then focus on one of the most interesting of them, in terms of both theory and practice.
"Pizza a la semi-supervised", Arthur Kuzin, Dbrain
Using product quality control at Dodo Pizza as an example, I will talk about how to work with data when training models. In particular, I will show how to go from bounding boxes to semantic segmentation of objects, and how to train a model and obtain annotations for the whole dataset by labeling only a few samples.
“OCR and TD architecture in the recognition of photographs of printed documents”, Alexey Goncharov, Ilya Zharikov, Philip Nikitin, MIPT Machine Intelligence Laboratory
The talk describes the design of the OCR (character recognition) and TD (text detection, that is, locating the regions that contain text) systems our team uses in projects that recognize photographs of various types of printed documents. We will cover both the architecture and the training of these systems.
“How to do domain adaptation, and ideas to improve its quality”, Renat Bashirov, Samsung AI
The talk distills ideas from a couple of dozen papers. The papers were selected by how useful they are for implementing domain adaptation for images: given one labeled dataset, how to obtain or improve labels on another, similar dataset.
The talk will include:
- many GANs,
- several architectures with a dozen loss functions,
- an explanation of the many different things that can serve as a loss function,
- style transfer,
- applications of domain adaptation to different tasks: classification, segmentation.
You will be able to follow the talk if you understand, for example:
- what a loss function is,
- how backprop works,
- why batchnorm is needed and how it works,
- what the size of the tensor is after global average pooling.
“Product search: organizing the work”, Dmitry Dremov, Analysis of checks
The talk covers the task, the approach to organizing the work, and the results.
“Showcases in the social network: how and what to show”, Sergey Boytsov, Odnoklassniki
We will trace the whole path from the user to the specific item they see in the showcase: data collection, preprocessing, analytical processing, and A/B testing.
“Recommender systems for transport tickets”, Artem Prosvetov and Konstantin Kotochigov, CleverDATA
The talk covers the use of recommender systems in an area that is unusual for them: selling transport tickets. We will discuss which traditional approaches help with this problem, which heuristics perform well, and what we discovered while working on the project.
“Tuning Jupyter Notebook”, Alexander Lifanov, MarketGuard
How to configure Jupyter Notebook for productive and convenient work.
“BigARTM is not just for text”, Maxim Statsenko, Mail.Ru Group
Many people are used to thinking of embeddings as something for text: we build embeddings of words, sentences, and so on. In a sense, topic modeling is an embedding too. In my talk I want to show that, with Python and some ingenuity, topic modeling and embedding approaches can be applied to tasks that involve no text at all, namely clustering users by income sources and by interests.
“PID Controller intro, or How to brew beer with PyData”, Anton Lebedevich
A step-by-step introduction to the most popular automatic controller, using the mashing of malt for beer as the example, with animation and Python code. Besides the basic PID controller, there will be a couple of tricks that improve its performance in real life. Automatic control is often needed in practice, and almost any implementation of it contains PID elements along with their flaws, which you need to be aware of and be able to fix.
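For readers unfamiliar with the idea, here is a minimal sketch of a discrete PID controller in Python. It is not the speaker's code: the `PID` class, the `simulate_mash` function, the gains, and the toy heating model are all assumptions made for illustration. The integral freeze on saturation is one common anti-windup technique and is not necessarily one of the tricks covered in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PID:
    """Illustrative discrete PID controller with optional output limits."""
    kp: float
    ki: float
    kd: float
    out_min: float = float("-inf")
    out_max: float = float("inf")
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured
        # No derivative term on the first call: there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error

        candidate = self.integral + error * dt
        output = self.kp * error + self.ki * candidate + self.kd * derivative
        clamped = min(max(output, self.out_min), self.out_max)
        # Anti-windup (assumed technique): accumulate the integral only
        # while the output is not saturated.
        if output == clamped:
            self.integral = candidate
        return clamped


def simulate_mash(pid: PID, setpoint: float = 65.0, start: float = 20.0,
                  ambient: float = 20.0, steps: int = 600, dt: float = 1.0) -> List[float]:
    """Toy mash tun (hypothetical model): the heater adds up to 0.5 degrees
    per second and heat loss is proportional to the difference from ambient."""
    temp = start
    history = []
    for _ in range(steps):
        power = pid.update(setpoint, temp, dt)  # heater duty cycle in [0, 1]
        temp += (0.5 * power - 0.005 * (temp - ambient)) * dt
        history.append(temp)
    return history


controller = PID(kp=0.2, ki=0.005, kd=0.0, out_min=0.0, out_max=1.0)
temperatures = simulate_mash(controller)
print(f"temperature after {len(temperatures)} s: {temperatures[-1]:.2f}")
```

In this toy setup the heater runs at full power until the temperature approaches the 65-degree setpoint, after which the proportional and integral terms take over and hold it there.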
"Cinema" zone
A networking area. In this hall you can talk with colleagues and other attendees in an informal setting.
To participate you must register. Do not forget your passport or driver's license.
Arrival and registration of participants: 10:00 - 11:00.
Talks begin: 11:00.
Approximate end of the event: 17:00.
Address: Moscow, Aeroport metro station, Leningradsky Prospekt 39, bldg. 79.
Broadcast