
AI.Hack St. Petersburg

Hi, Habr! In this post I'll tell you about one of the coolest hackathons with a data science track, held recently in St. Petersburg. Under the cut: a general overview, the cases we solved, and, of course, how both AU teams managed to become winners.



image



Introduction



This is the third post in our hackathon review series (read the first two here: WSSH, MunHack), so I'll say right away that this hackathon was undoubtedly more ambitious and also closer to a hackathon in the classical sense. That is, there were teams solving ML tasks on the provided data as well as participants who came looking for funding for their startups. I will say more about the proposed tracks later.



Organization



The hackathon was held in early March in the new Gazprom Neft office in St. Petersburg. It was also possible to participate online and present your results over Skype.


image




After registration the tracks were presented, and then the teams were formed. Of course, it is always better to come with a team formed beforehand, if only because everyone already knows each other's pace and skills, so you don't risk running into pitfalls.



Ideally, you should agree in advance on the distribution of roles and the project structure. For DS tasks, for example, there is a popular tutorial on structure. We have not used it yet, but we definitely plan to, because every time, a couple of hours after the hackathon starts, the shared repository turns into a dump of undocumented scripts that are very, very hard to make sense of later.



After the tasks were introduced and the teams assembled, the development itself began. This hackathon differed from others in having checkpoints and lectures throughout both days.



Although the checkpoints were more about getting acquainted with the mentors and the jury than about reporting current progress, they turned out to be very helpful in the end. The final presentations were strictly limited to three minutes, but since the jury already had an idea of each team, this worked well for everyone. On the one hand, during the hackathon participants took breaks to talk and picked up new ideas from experts in the field. On the other hand, there were no tedious half-hour presentations at the end in which the speaker suddenly decides to start with their biography. All in all, everything was to the point, for which many thanks to the organizers.



Cases and solutions



The organizers came up with six tracks. There were two nominations for the best solution in the Kaggle competitions from Gazprom Neft and RoboMed. There were also plain cases from Gazprom Neft and the Speech Technology Center. Teams could also present their own project in the field of AI. And finally, if your solution in one of the Kaggle competitions did not achieve the best score, you could think about how to monetize it and win the nomination for the best product solution built on the provided dataset.



As I already mentioned, two teams from the Academic University took part in this hackathon. Our team took on both Kaggle tracks right away, and our classmates tackled one of them, the one from RoboMed.



image



RoboMed asked participants to predict customer churn from gender, age, diagnosis, and anamnesis (the patient's complaints). For some time our team was first on the leaderboard of fifteen teams, but then our classmates overtook us.



If you go into the technical details, the hardest part was working with the anamnesis. Ignoring it altogether was not an option: the other features had many missing values, and the patient's own account, which is exactly what the anamnesis records, has to be taken into account, especially when predicting whether the patient will return to the clinic.



Problems arose mainly because the anamnesis was filled in as free-form text. For example, there were many typos, which we fought with regular expressions. The corrected string then went through stemming (you can think of it as reducing each word to its base form). The resulting string was vectorized using TF-IDF or bag-of-words. But we did not, for example, build such a killer feature as the length of the string. Our classmates thought of it first, and it turned out to be quite important. In addition, the sentiment of the anamnesis (positive / negative) and the risk category of the diagnosed disease were analyzed.
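To make the steps above a bit more concrete, here is a minimal sketch of such a text pipeline in plain Python. It is not the code from the hackathon: the typo patterns, the toy suffix-stripping stemmer, the English sample notes, and all function names are assumptions made up for illustration. In practice you would plug in a real stemmer and a library vectorizer instead.

```python
import math
import re
from collections import Counter

# Hypothetical typo fixes; the patterns actually used are not given in the post.
TYPO_PATTERNS = [
    (re.compile(r"\bhead+ache?s?\b"), "headache"),
    (re.compile(r"\bfevr\b|\bfeaver\b"), "fever"),
]

# Toy suffix list standing in for a real stemmer (an assumption).
SUFFIXES = ("ing", "es", "ed", "s")


def normalize(text):
    """Lowercase, replace non-letters with spaces, fix known typos via regex."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    for pattern, replacement in TYPO_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Normalize a note and stem each whitespace-separated word."""
    return [stem(word) for word in normalize(text).split()]


def tfidf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def build_features(documents):
    """Combine TF-IDF weights with the raw string length of each note."""
    return [
        {**weights, "__length__": len(doc)}
        for weights, doc in zip(tfidf(documents), documents)
    ]


# Made-up sample notes, only to show the call.
notes = ["Severe headdache and fevr", "Headaches for two weeks", "No complaints"]
features = build_features(notes)
```

The `__length__` entry is the string-length feature mentioned above: it is computed on the raw note, before any cleaning, so it also captures how much the patient wrote.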



Tree-based models (CatBoost and LightGBM) and the nearest neighbors method were used for prediction, along with blending, that is, averaging the predictions of several models.
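Blending in this sense is simple enough to show in a few lines. Below is a sketch under the assumption that every model outputs one churn probability per patient; the `blend` function, the optional weights, and the numbers are all made up for illustration and are not taken from our solution.

```python
def blend(predictions, weights=None):
    """Average per-sample probabilities from several models.

    predictions: list of equal-length lists, one list per model.
    weights: optional per-model weights; a plain mean is used by default.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must score the same samples")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Hypothetical probabilities for three patients from three models.
catboost_probs = [0.9, 0.2, 0.6]
lightgbm_probs = [0.8, 0.4, 0.5]
knn_probs = [0.7, 0.3, 0.1]

blended = blend([catboost_probs, lightgbm_probs, knn_probs])
```

With equal weights this is just the mean of the three predictions for each patient; passing `weights` lets a stronger model count for more.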



Unfortunately, I cannot tell you about the task on the Gazprom Neft data, because another part of the team worked on it.



image



Presentation of results



By the end of the competition we had not reached first place in prediction accuracy, so a few hours before the presentations we urgently had to decide how to present the results in the context of a business model. Here the mentors helped again.



Often, when the task is clearly defined, you don't even have to think about where and how the result of your work will be used. Moreover, as a person far from business, I could not think of a way to monetize the resulting algorithm on my own.



It is clear that in the end you want to use the knowledge that a client is about to leave in order to prevent it. So we decided to try to identify the reasons why a client might not want to return. Unexpectedly, the most common reason for churn was that the client had simply been cured. We drew this conclusion in cases where the disease was considered non-chronic. Having identified several more such clusters, we went back to the experts to find out what a hypothetical startup could do with this information.



Another interesting point was that we even estimated how much money a company could save with such an algorithm, which was definitely a new experience for me.



By the way, the resulting slides can be viewed here: Robomed, GPN, the presentation of our classmates, and the rest of the hackathon presentations here.



image



Aftertaste



Now I want to mention one small downside, so that I can then finish on a positive note. It is the food: for some reason, hackathons traditionally have problems with such a basic thing. Apparently the assumption is that you think better on an empty stomach. Yes, there was a coffee point, cookies, and even energy drinks, but what was missing were tasty lunches and breakfasts in steamed-up containers.



image



But the venue had cool sleeping capsules. Personally, I could not fall asleep in one, because the shape of the capsule made it impossible to turn over, but it looked nice, and just lying there for 15 minutes with soft music playing was pleasant.



image



Among the pleasant little things were fairly comfortable workplaces and merch in the form of stickers and T-shirts. There are also plenty of photos to remember it by, because photographers were at the venue almost the whole time. And there are even two overview videos: from the organizers and from Gazprom Neft.



Conclusion



It seems I have never mentioned who organized the hackathon, which is a shame. It was the guys from Sci.Guide, and many thanks to them for it. We had a great time, and for St. Petersburg a hackathon of this level is definitely something to aspire to. In short, we want more!



image



Post written together with Rebryk.

Source: https://habr.com/ru/post/358242/


