Data Festival at the Museum of Moscow: how it went

Hi Habr,

We held the Data Festival at the SMIT exhibition at the Museum of Moscow, which we wrote about here.
This is the first event in a series in which we bring together experts from business, science, and government to talk about data analytics.

Storing and analyzing data used to be the prerogative of a narrow circle of companies and people, but it now affects the lives of almost everyone. That is why we started this series of events, where we tell a wide audience about data and its analysis.

So, what happened at the Festival:

First, Andrei Ustyuzhanin (head of the joint projects of Yandex and CERN) explained how machine learning helps to study dark matter.

Next, Alexey Vorobiev and Kirill Krasnoshchekov (State Unitary Enterprise "NI and PI of the General Plan of Moscow") talked about using Big Data for city planning.

Natalya Kalaitanova (media expert at DCA) talked about how analytics is changing the approach to media placement.

Nikita Kotlyarov from Avito talked about using machine learning to block fraudulent ads on Avito.

Yuri Kashnitsky from the Beeline Data School spoke about the importance of analyzing outliers in data, using the example of identifying very successful Playboy models whose measurements do not fit the classical canons.

Rostislav Yavorsky (Associate Professor of the Department of Data Analysis and Artificial Intelligence of the Faculty of Computer Science of the National Research University Higher School of Economics) talked about the analysis of social networks.

Sergey Marin, from the Beeline Big Data Department and founder of the Beeline Data School, spoke about using Big Data to create a personalized customer experience for each individual client.

All presentations are available here.

Also, as part of the Festival, we ran a hackathon on data analysis. Its theme was predicting communications between subscribers.

Especially for the hackathon, we generated synthetic data, as close to reality as possible, describing the graph of communications between subscribers. The graph had over a million vertices.

After that, we deliberately added noise to this data by removing some of the edges. The task was to restore as many of the removed links as possible without creating many new edges that had never existed.
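The trade-off in this task (restore many removed links without adding many spurious edges) can be expressed as recall versus precision over sets of edges. The sketch below is only an illustration: the metric actually used by the organizers' checker is not stated here, and the function names, the undirected treatment of edges, and the toy edge lists are all assumptions.

```python
def edge_key(a, b):
    """Treat edges as undirected: (1, 2) and (2, 1) are the same link (assumption)."""
    return (a, b) if a <= b else (b, a)


def score_restored_edges(removed, predicted):
    """Return (precision, recall, f1) for a set of predicted edges.

    removed   - edges that were deleted from the graph (ground truth)
    predicted - edges a solution proposes to add back
    """
    truth = {edge_key(a, b) for a, b in removed}
    guess = {edge_key(a, b) for a, b in predicted}
    hits = len(truth & guess)

    # Precision falls when a solution invents edges that never existed.
    precision = hits / len(guess) if guess else 0.0
    # Recall falls when a solution misses edges that were removed.
    recall = hits / len(truth) if truth else 0.0

    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, made up for illustration only.
removed = [(1, 2), (2, 3), (4, 5), (6, 7)]
predicted = [(2, 1), (3, 2), (8, 9)]
precision, recall, f1 = score_restored_edges(removed, predicted)
```

Here two of the three predicted edges are real removed links, and two of the four removed links are recovered, so a solution that guesses more aggressively would raise recall at the cost of precision.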

We did not limit the data to the mere existence of a connection between users: we also added information about the volume and type of communication between them.

Description of file fields:

A - Id of subscriber A,
B - Id of subscriber B,
x_A - operator Id of subscriber A,
x_B - operator Id of subscriber B,
c_AB - the number of calls from A to B,
d_AB - the duration of calls from A to B,
c_BA - the number of calls from B to A,
d_BA - the duration of calls from B to A,
s_AB - the number of SMS from A to B,
s_BA - the number of SMS from B to A
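As a rough illustration of how a file with these fields might be used, the sketch below reads a few rows, builds a neighbor set for each subscriber, and scores unconnected pairs by the number of neighbors they share. This is a generic common-neighbors heuristic, not the organizers' benchmark or any participant's solution. The comma delimiter, the header row, the sample values, and the function names are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Made-up rows for illustration only; the column names follow the field list above.
SAMPLE = """A,B,x_A,x_B,c_AB,d_AB,c_BA,d_BA,s_AB,s_BA
1,2,0,0,5,300,2,120,1,0
1,3,0,1,1,60,0,0,0,0
2,3,0,1,3,200,4,250,0,2
3,4,1,1,2,90,1,30,0,0
2,4,0,1,1,45,0,0,0,0
"""


def load_neighbors(handle):
    """Build an undirected neighbor set for every subscriber Id in the file."""
    neighbors = defaultdict(set)
    for row in csv.DictReader(handle):
        a, b = int(row["A"]), int(row["B"])
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def common_neighbor_candidates(neighbors):
    """Score each currently unconnected pair by its number of shared neighbors."""
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # the pair is already connected
        shared = len(neighbors[a] & neighbors[b])
        if shared:
            scores[(a, b)] = shared
    return scores


neighbors = load_neighbors(io.StringIO(SAMPLE))
candidates = common_neighbor_candidates(neighbors)
```

Enumerating all pairs is quadratic in the number of subscribers, so on a graph with over a million vertices a real solution would need to generate candidates more selectively, for example only among neighbors of neighbors.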

Participants were also given code to help them get familiar with the required solution format and to run their own checks:

Benchmark.ipynb is an example of a simple solution that converts the answer into the special format required to verify the results.
Checker.ipynb is the code used to check the quality of the solution.

During the hackathon, we realized that the proposed task was more interesting and more complicated than we had expected, and we decided not to limit ourselves to the original four hours, giving registered participants until 6:00 pm on Wednesday, December 23. To do this, we quickly moved the hackathon online.

The subsequent format of online interaction was as follows:

A form was created in Google Forms, in which registered participants provided the following information:

First Name and Last Name (or Nickname)
Email
Direct link to the uploaded submission.csv
Comment, in case of questions

The resulting document was visible only to the organizers.

Once a day or more often, we:

Downloaded the solutions and ran them through the checker against the source data
Updated the rating and the participants' results
Answered questions

After 6 pm on Wednesday, we summarized the results and determined the winners. They were:

1st place: Alexander Kukushkin. Prize: Certificate for studying at Beeline Data School
2nd place: Anton Ustinov. Prize: Quest Ticket
3rd place: George Zubrienko. Prize: Headphones

Alexander posted a description of his solution here.

Well done, everyone! We will present all the prizes in the first week of January at the central office of VimpelCom in Moscow.

On the whole, I want to say a huge thank you to all the participants of our Festival, and I hope you liked both the event itself and how it was organized.

This is the first event of its kind, and next year we plan many more. Follow the announcements on Habr and subscribe to the news on the School page.

To round off the year, and in the same spirit of speaking to a wide audience, we went on Komsomolskaya Pravda radio, where we talked about data analytics, trends, and the School of Data. A recording of the broadcast is available here.

Happy upcoming holidays to everyone, and see you in the New Year!

Source: https://habr.com/ru/post/274205/
