Hi, Habr! As promised in our previous post about
Why So Serious Hack, here is the next story in this series. This time we will talk about the hackathon
"Municipal moirs", which was held on April 21-22 by the European University in St. Petersburg.

Introduction
Coincidentally, this hackathon was also unusual and very similar to Why So Serious Hack. This is not surprising, given that the organizers overlapped and the same person was responsible for the technical part of both events. That's why we decided to write about it now, while our memories of the hackathon, and yours of the previous post, are still fresh.
For those who have not read the previous article, let me recap the format of such competitions. Participants do not need to invent and implement a project idea of their own. Instead, they are given data analysis questions that they must answer within 24-48 hours.
In addition to the questions, the organizers provide data on which participants are expected to train models that predict certain target values. Knowing a model's accuracy on a hidden test set makes it possible to test hypotheses about the original data. This helps participants understand which direction to think in and, when the accuracy is good, can confirm their findings.
Model quality can be checked with a testing system. No, not on Kaggle, as many of you probably assumed, but through a Telegram bot! A team's place on the leaderboard does not determine the winner, but it does affect the order in which the teams present: the better the score, the earlier the presentation. Prize places are awarded by the jury based on the depth, quality, originality and thoroughness of the team's answer to the question. But I will cover this in more detail later.
Hackathon organization
Like last time, let's start with a couple of words about how the hackathon was organized.
The event was held in the building of the European University. Unfortunately, there were no designated places to sleep overnight, but since the hackathon lasted only 30 hours, this was not a big problem. There were no proper meals on site either, but the organizers provided a room with tea, coffee, cakes and cookies. None of this sounds like the description of a top hackathon, but given the difficult situation the EUSP is currently in, all of it can be forgiven.
The prize fund was 100,000 rubles, and only one team could receive it.
Case and solution
The team of the Res Publica Center at the EUSP, which organized the event, is studying the quality of municipal government and its dynamics in Russia from 2007 to 2018.
So the hackathon participants had to answer the question of what the career path of the heads of cities and regions depends on. It is assumed that the resignation of a head of a municipality indicates that his work was ineffective, while promotion to a higher level indicates the opposite. In our opinion, this is logical and is how things should be in reality (spoiler: they are not).
To predict officials' career trajectories, it was suggested to use their biographical data along with various indicators from the database of municipalities, for example the general condition of the roads or the number of hospitals.
The participants were given the history of heads of municipalities over an 18-year period. Each record in the dataset described the state of a head's career in a particular year and contained the following fields: year and region of work, municipality, job title, gender and age of the person, level and field of education, current career status and others.
The data was anonymized, although the identities could be recovered if one really wanted to. Doing so was considered a violation of the rules and was punished by disqualification.
The most interesting field here is the career status, since this is the field that had to be predicted. A head's career status is described not by just three values (“appointed to the position”, “works”, “dismissed from the position”), as one might logically assume, but by a much more diverse and detailed set. For example, the head of a municipality could leave office for health reasons or in connection with a criminal case brought against him. There were 13 such categories in total.
A typical example of a head's career trajectory:

Since some readers of the previous post asked for more technical details, we will say a little about them.
First, let me clarify that we had to predict career trajectories not for future years, but for other officials over the same period. In our opinion, this is a much more boring task than predicting the future. However, the rules are set by the organizer.
We settled on a one-vs-rest model, that is, we built a separate classifier for each class. The answer for a sample is the class whose model is most confident that the example belongs to it.
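To make the one-vs-rest idea concrete, here is a minimal pure-Python sketch. It is not our hackathon code: the tiny logistic regression, the two numeric features and the toy points are all assumptions made up for illustration, and any binary classifier that outputs a confidence would do in its place.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_binary(X, y, lr=0.5, epochs=2000):
    """Fit a tiny logistic regression (weights and bias) by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def fit_one_vs_rest(X, labels):
    """Train one binary model per class: this class (1) against all the others (0)."""
    return {
        c: fit_binary(X, [1 if label == c else 0 for label in labels])
        for c in sorted(set(labels))
    }

def predict_one_vs_rest(models, x):
    """Return the class whose binary model is the most confident about x."""
    def confidence(c):
        w, b = models[c]
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(models, key=confidence)

# Invented toy data: two hypothetical numeric features per record.
X = [
    [0.0, 0.0], [0.1, 0.1], [0.2, 0.0],   # "works"
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.2],   # "victory in elections"
    [0.0, 1.0], [0.1, 0.9], [0.2, 1.0],   # "dismissed from the position"
]
labels = (
    ["works"] * 3
    + ["victory in elections"] * 3
    + ["dismissed from the position"] * 3
)
models = fit_one_vs_rest(X, labels)
prediction = predict_one_vs_rest(models, [0.95, 0.1])
```

The per-class models are independent of each other, so each of them can be inspected separately, for example to see which features matter for one particular career status.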
After studying the data a bit, we noticed how the frequency of the “victory in elections” category depends on the year. The picture clearly shows peaks every fifth year. This seems quite logical if most heads are elected for a 5-year term.
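A check like this takes only a few lines. The sketch below is an illustration under assumptions: the record layout (year, career status) and the toy rows are invented and are not the hackathon data.

```python
from collections import Counter

def category_share_by_year(records, category):
    """Share of records with the given career status in each year."""
    totals = Counter(year for year, _ in records)
    hits = Counter(year for year, status in records if status == category)
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Invented toy records (year, career status); not the hackathon data.
records = [
    (2009, "victory in elections"), (2009, "works"),
    (2010, "works"), (2010, "works"),
    (2011, "works"), (2011, "dismissed from the position"),
]
shares = category_share_by_year(records, "victory in elections")
```

Plotting the resulting shares against the year is what makes periodic peaks visible.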

Next, we decided to look at how the categories are distributed across regions. To get the picture below, we first normalized everything by columns and then by rows.

The heatmaps above show that some cells stand out strongly compared to the others. For example, in the Republic of Udmurtia positions are abolished much more often than in other regions, and in the Yaroslavl region heads often move to another job.
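The column-then-row normalization behind such a heatmap can be sketched as follows. The counts are invented, and the assumption that rows are regions and columns are career-status categories is ours for the sake of the example.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (all-zero columns stay zero)."""
    col_sums = [sum(col) for col in zip(*matrix)]
    return [[v / s if s else 0.0 for v, s in zip(row, col_sums)] for row in matrix]

def normalize_rows(matrix):
    """Scale every row so that it sums to 1 (all-zero rows stay zero)."""
    result = []
    for row in matrix:
        s = sum(row)
        result.append([v / s if s else 0.0 for v in row])
    return result

# Invented counts: rows are regions, columns are career-status categories.
counts = [
    [90, 10],  # hypothetical large region
    [10, 10],  # hypothetical small region
]
heatmap = normalize_rows(normalize_columns(counts))
```

Normalizing by columns first removes the effect of some categories being much more common overall, and normalizing by rows afterwards makes regions of different size comparable. In the toy matrix both regions have the same raw count in the second category, yet after both steps the small region clearly stands out in it.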
Because of such patterns, we decided to add them as features, that is, the frequency of each class by region. And it really helped: take a look at the feature importances in our models and you will see that these frequencies play the most important role.
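A rough sketch of how such frequency features could be built is shown below. The function names, region names and rows are hypothetical, and computing the frequencies on the training rows only is our assumption about a sensible way to avoid leaking test labels.

```python
from collections import Counter, defaultdict

def region_class_frequencies(train_rows):
    """Map region -> {status: share of that status among the region's training rows}."""
    counts = defaultdict(Counter)
    for region, status in train_rows:
        counts[region][status] += 1
    return {
        region: {status: n / sum(c.values()) for status, n in c.items()}
        for region, c in counts.items()
    }

def frequency_features(region, freqs, classes):
    """One feature per class; regions unseen in training get zeros."""
    region_freqs = freqs.get(region, {})
    return [region_freqs.get(c, 0.0) for c in classes]

# Invented (region, career status) pairs standing in for the training set.
train_rows = [
    ("region_a", "works"), ("region_a", "works"),
    ("region_a", "position abolished"), ("region_a", "position abolished"),
    ("region_b", "works"), ("region_b", "moved to another job"),
]
classes = ["works", "position abolished", "moved to another job"]
freqs = region_class_frequencies(train_rows)
features_a = frequency_features("region_a", freqs, classes)
features_unseen = frequency_features("region_z", freqs, classes)
```

Each record then gets the frequency vector of its region appended to its other features.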


Here are examples for two classes: reassignment and retirement, respectively.
Another interesting technical point is that the additional database with data on municipalities weighed more than 30 GB, so you could either parse it or upload it to a server with enough RAM to process it.

This database contained a lot of different information about municipalities. However, its use did not help improve the result.
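When a file of that size does not fit in memory, one option is to stream it row by row and keep only what is needed. The sketch below assumes a CSV export with the hypothetical columns municipality, year, indicator and value; the real database format may well have been different.

```python
import csv
import io

def load_indicators(lines, wanted_municipalities, wanted_indicators):
    """Stream rows one at a time and keep only the values we actually need."""
    kept = {}
    for row in csv.DictReader(lines):
        if (row["municipality"] in wanted_municipalities
                and row["indicator"] in wanted_indicators):
            key = (row["municipality"], int(row["year"]), row["indicator"])
            kept[key] = float(row["value"])
    return kept

# A tiny in-memory stand-in for the large file; the column names are assumptions.
sample = io.StringIO(
    "municipality,year,indicator,value\n"
    "m1,2010,hospitals,3\n"
    "m1,2010,schools,12\n"
    "m2,2010,hospitals,1\n"
)
indicators = load_indicators(sample, {"m1"}, {"hospitals"})
```

With a real file, the same function can be given a file object opened with `open(path, newline="")`, so that only the filtered values are ever held in memory.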
As I mentioned above, testing was done through a special Telegram bot. A participant sends it their answers, and the bot returns the value of the evaluation metric and the team's position in the results table. That is, nobody sees the results of the other teams. For example, this is how it looked at this competition:

However, if you really want to, you can cheat a little: submit a result that is not your best and see how closely the team below is breathing down your neck.
I would also like to explain why our F1 score is rather low. The thing is that the classes are strongly imbalanced: some have very many examples, others very few. So if we predict the large classes with good accuracy but do badly on those that occur only a couple of times in the test set, we will not get an impressive metric value.
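The effect is easy to reproduce on a toy example. The sketch below assumes that F1 is averaged over classes with equal weight (macro averaging), which may not be exactly how the bot computed it, and the labels are invented.

```python
def f1_per_class(y_true, y_pred):
    """F1 score of every class, treating that class as the positive one."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Invented imbalanced labels: one huge class and two rare ones.
y_true = ["works"] * 95 + ["dismissed"] * 3 + ["retired"] * 2
# A model that always answers with the majority class.
y_pred = ["works"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Here the accuracy is 0.95, while the macro F1 is only about 0.32, because the two rare classes contribute zeros to the average.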
Someone may start grumbling: why not hold such a competition on Kaggle? Granted, Kaggle is a pretty nice system. However, testing through a bot is less run-of-the-mill, which makes the competition unusual.
Many have probably noticed that only 6 teams took part in the hackathon. This is very sad, because the hackathon was advertised on public pages and in various chats, yet only about 20 participants came. So winning was not much of a challenge, but since we went and gained some experience, why not write about it?
Results
At the end of the hackathon the teams gave their presentations. We presented our work first (slides can be found
here). On the negative side, the jury mentioned that our talk was too technical and that some terms could be understood only from context. Keep this in mind both when preparing slides and when rehearsing a talk.
Right after our talk, we realized one important mistake. Even though the winner is not determined by leaderboard position, this time we for some reason blindly pursued one goal, maximizing the score, and did not spend enough time on composing a full answer to the question: “What should you do to get a promotion?”
The jury also felt that it did not receive a convincing answer, but judging by their
feedback, the other results will be useful too. By the way, this is the first hackathon after which we were asked to open access to the code and briefly describe the solution and the features used. It's nice that the result may help someone researching this area.
As a small and obvious conclusion, friends: never forget the main goal you are pursuing, regardless of the level and scale of the event.
Post written in collaboration with
avgaydashenko .