
How and why we won the Big Data track at the Urban Tech Challenge

My name is Dmitry, and I want to talk about how our team reached the final of the Urban Tech Challenge hackathon on the Big Data track. I'll say right away that this is not the first hackathon I have taken part in, and not the first where I have won a prize. So in this story I also want to share some general observations and conclusions about the hackathon industry as a whole, and to offer my point of view as a counterweight to the negative reviews that appeared online immediately after the Urban Tech Challenge ended (for example, this one).

So, first some general observations.

1. Surprisingly, quite a few people naively think that a hackathon is something like a sports competition where the best coders win. It is not. I am leaving aside the cases where the organizers themselves do not know what they want (I have seen that too). As a rule, the company that runs a hackathon pursues its own goals. These vary: a technical solution to some problem, a search for new ideas and people, and so on. The goals often determine the format of the event, its schedule, whether it is online or offline, how the tasks are formulated (and whether they are formulated at all), whether there will be a code review, and so on. The teams and their work are judged from this point of view. The teams that best hit what the company needs are the ones that win, and many hit it completely unconsciously and by chance, thinking that they really are taking part in a sports competition. My observations show that, to keep participants motivated, the organizers should create at least the appearance of a sporting environment and equal conditions; otherwise they get a wave of negativity, as in the review mentioned above. But I digress.

2. Hence the next conclusion. The organizers are interested in participants coming to the hackathon with their own groundwork; sometimes they even hold a preliminary remote online stage for this purpose. This gives them stronger solutions at the end. The notion of "your own groundwork" is very relative: any experienced programmer can pull thousands of lines of code from old projects into the very first commit. Does that count as groundwork prepared in advance? In any case, there is a rule, which I expressed in the form of a famous meme:


To win, you have to have something, some kind of competitive advantage: a similar project that you did in the past, knowledge and experience in a particular topic, or ready-made groundwork done before the hackathon began. Yes, this is not sporting. Yes, it may not repay the effort (everyone decides for themselves whether it is worth coding at night for 3 weeks for a prize of 100 thousand split across the whole team, with the risk of not getting it at all). But often this is the only chance to get ahead.

3. Choosing a team. As I have noticed in hackathon chat rooms, many people approach this question rather lightly, although it is the most important decision and will determine your result at the hackathon. In many areas (both in sports and at hackathons) I have seen that strong people tend to team up with strong ones, weak with weak, smart with smart; you get the idea. Roughly the same thing happens in the chat rooms: reasonably strong programmers are snapped up immediately, while people without any skills valuable for a hackathon hang around in the chat for a long time and pick a team on the principle of whoever will take them. Some hackathons assign people to teams at random, and the organizers claim that random teams perform no worse than established ones. But in my observation, motivated people usually find a team themselves, and of those who have to be assigned to one, many simply do not show up.

As for the composition of the team, it is very individual and depends heavily on the task. I could say that the minimum viable team is a designer plus a frontend developer, or a frontend developer plus a backend developer. But I also know of winning teams that consisted only of frontend developers who bolted on a simple backend in node.js or made a mobile application in React Native, and of teams made up only of backend developers who put together a simple layout.

My plan for recruiting a team was as follows: I intended to assemble or join a team of a frontend developer, a backend developer and a designer (I am the frontend developer myself). Fairly quickly I got talking with a Python backend developer and a designer, who accepted the invitation to join us. A little later a business analyst joined us; she had already won a hackathon, and that settled the question of taking her on. After a short meeting we decided to call ourselves U4 (URBAN 4, the urban four), by analogy with the Fantastic Four. We even put the corresponding picture on the avatar of our Telegram channel.

4. Choosing a task. As I said, you should have a competitive advantage, and the task for the hackathon is chosen on that basis. After looking through the list of tasks and assessing their complexity, we settled on two: a catalog of innovative enterprises from the DPIiR and a chatbot from EFKO. The DPIiR task was chosen by our backend developer; I chose the EFKO task, since I had experience writing chatbots with node.js and DialogFlow. The EFKO task also involved ML, and I have some, not very extensive, experience in ML. From the task statement, it seemed to me that it was unlikely to be solvable by means of ML. That feeling grew stronger when I went to the Urban Tech Challenge meetup, where the organizers showed me the EFKO dataset: about 100 photos of product layouts (taken from different angles) and about 20 classes of layout errors. At the same time, the customers wanted a classification accuracy of 90%. In the end I prepared a presentation of a solution without ML, the backend developer prepared a presentation on the catalog, and after finalizing both together we sent them to the Urban Tech Challenge. Already at this stage the level of motivation and contribution of each participant became clear. Our designer did not take part in discussions, answered with delays, and filled in the information about himself in the presentation only at the last moment. In short, there were doubts.

As a result, we were selected for the DPIiR task, and we were not upset about not being selected for EFKO, because that task seemed to us, to put it mildly, strange.

5. Preparing for the hackathon. Once it was finally known that we had made it to the hackathon, we began to prepare our groundwork. I am not urging anyone to start writing code a week before the hackathon begins. But at a minimum you should have a boilerplate ready, so that you can get to work right away without having to set up your tools and without running into bugs in things you decided to try for the first time at the hackathon. I know a story about an Angular developer who came to a hackathon and spent all 2 days setting up the project build, so everything must be prepared in advance. We planned to split the duties as follows: the backend developer writes the crawlers that search the Internet and put all the collected information into the database, and I write the API in node.js, which queries this database and sends the data to the frontend. So I prepared a server boilerplate on express.js and a frontend boilerplate on React in advance. I do not use CRA; I always configure webpack myself, and I know perfectly well what risks this carries (remember the story about the Angular developer). At this point I asked our designer for an interface draft, or at least a mockup, to have an idea of what I would be laying out. In theory he should also have prepared his own groundwork and coordinated it with us, but I never received an answer. In the end I borrowed the design from one of my old projects. Things went even faster that way, since all the styles for that project had already been written. Hence the conclusion: a designer is not always needed on a team))). With this groundwork we came to the hackathon.

6. Work at the hackathon. I saw my team in person for the first time only at the opening of the hackathon in the CDP. We met and discussed the solution and the stages of work on the task. Although after the opening we were supposed to go by bus to Red October, we went home to sleep, having agreed to be on site by 9:00. Why? The organizers apparently wanted to squeeze the maximum out of the participants, so they set up exactly this kind of schedule. In my experience, you can code normally after one sleepless night. After a second one, I am not so sure. A hackathon is a marathon; you need to gauge and plan your strength sensibly. Especially since we had our groundwork.



So, having had a good sleep, at 9:00 we were sitting on the sixth floor of the Dewocracy. Here our designer unexpectedly announced that he did not have a laptop, that he would work from home, and that we would communicate by phone. This was the last straw. So the four of us turned into three, although the team name did not change. Again, this was not a heavy blow for us; I already had a design from an old project.

At first everything went quite smoothly and according to plan. We loaded the organizers' dataset of innovative companies into the database (we had decided to use neo4j). I started on the layout, then took up node.js, and that is where the misfires began. I had never worked with neo4j before. First I looked for a working driver for this database, then I figured out how to write a query, and then I was surprised to find that this database returns entities as an array of node objects and their edges. That is, when I requested an organization by its TIN together with all the data on it, instead of one organization object I got back a long array of objects containing data on this organization and the relations between them. I wrote a mapper that went through the entire array and glued all the objects for the organization into one object. But on real data, with a database of 8 thousand organizations, a query ran extremely slowly, about 20 to 30 seconds. I started thinking about optimization... and then we stopped in time and moved to MongoDB, which took us about 30 minutes. In total, we lost about 5 hours on neo4j.
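
A mapper of this kind might look roughly like the sketch below. The `GraphNode` shape, the `Organization` label and the field names are assumptions made purely for illustration; they are not the real neo4j driver types or the schema used at the hackathon. The idea is only to fold every node returned for one organization into a single flat object.

```typescript
// Hypothetical shape of one returned item: either the organization
// node itself or a node related to it (address, person, ...).
interface GraphNode {
  labels: string[];
  properties: Record<string, unknown>;
}

type Organization = Record<string, unknown>;

// Folds an array of nodes into one object: properties of the
// "Organization" node go to the top level, and every related node is
// appended to an array stored under its lower-cased first label.
function mergeOrganization(nodes: GraphNode[]): Organization {
  const result: Organization = {};
  for (const node of nodes) {
    if (node.labels.includes("Organization")) {
      Object.assign(result, node.properties);
      continue;
    }
    const [firstLabel = "related"] = node.labels;
    const key = firstLabel.toLowerCase();
    const existing = result[key];
    const bucket: unknown[] = Array.isArray(existing) ? existing : [];
    bucket.push(node.properties);
    result[key] = bucket;
  }
  return result;
}

// Made-up sample data, not taken from the real dataset.
const rows: GraphNode[] = [
  { labels: ["Organization"], properties: { tin: "0000000000", name: "Example LLC" } },
  { labels: ["Address"], properties: { city: "Moscow" } },
  { labels: ["Person"], properties: { fullName: "Ivan Ivanov", role: "CEO" } },
];

const organization = mergeOrganization(rows);
console.log(JSON.stringify(organization, null, 2));
```

With a document store such as MongoDB this step is usually unnecessary, because the organization can be stored as one nested document in the first place.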

Remember: never bring a technology you are not familiar with to a hackathon, because there may be surprises. But in general, apart from this failure, everything went according to plan, and by the morning of December 9 we had a fully working application. For the rest of the day we planned to add extra features to it. From then on everything went relatively smoothly for me, but the backend developer had a whole bunch of problems: search engines banned his crawlers, and aggregators of legal entities spammed the top of the search results for every specific company. But he can tell that story better himself. The first extra feature I bolted on was a search for the company's CEO on VKontakte by full name. It took a few hours.

So the company page in our application now showed the CEO's avatar, a link to his VKontakte page and some other data. It was a nice cherry on the cake, although perhaps it was not what secured our victory. Next I wanted to bolt on some analytics. But after going through many options for a long time (there were a lot of nuances with the UI), I settled on the simplest one: aggregating organizations by economic activity code. In the evening, in the last hours, I deployed a stub for displaying innovative products (our application was meant to have a Products and Services section), although the backend for it was not ready. Meanwhile the database was growing by leaps and bounds, the crawlers kept working, and the backend developer was experimenting with NLP to distinguish innovative texts from non-innovative ones))). But it was already time to hand in the final presentation.
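
As a rough illustration of that kind of aggregation, here is a minimal sketch that counts organizations per economic activity code. The `Org` shape and the sample codes are hypothetical, and the real application may well have done this inside the database rather than in application code.

```typescript
// Hypothetical minimal shape of an organization record.
interface Org {
  name: string;
  activityCode: string; // economic activity code; sample values are made up
}

// Counts organizations per activity code and returns the groups sorted
// by descending count (ties are broken by code for a stable order).
function countByActivityCode(orgs: Org[]): Array<{ code: string; count: number }> {
  const counts = new Map<string, number>();
  for (const org of orgs) {
    counts.set(org.activityCode, (counts.get(org.activityCode) ?? 0) + 1);
  }
  return Array.from(counts, ([code, count]) => ({ code, count }))
    .sort((a, b) => b.count - a.count || a.code.localeCompare(b.code));
}

const sample: Org[] = [
  { name: "A", activityCode: "62.01" },
  { name: "B", activityCode: "72.19" },
  { name: "C", activityCode: "62.01" },
];

const stats = countByActivityCode(sample);
console.log(stats);
```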

7. Presentation. From my own experience, you should switch to preparing the presentation about 3 to 4 hours before it is due, especially if it is supposed to include a video: shooting and editing take a lot of time. We were planning a video. We had a dedicated person who handled it and also took care of a number of other organizational matters. Thanks to that, we were not distracted from coding until the very last moment.

8. Pitch. I did not like that the presentations and the final were held on a separate working day (Monday). Most likely the organizers were continuing their policy of squeezing the maximum out of the participants. I had not planned to ask for time off work and wanted to come only to the final, although the rest of my team took the day off. However, my emotional immersion in the hackathon was already so deep that at 8 am I wrote in my team chat (the one at work, not the hackathon team) that I was taking an unpaid day off, and went to the CDP for the pitches. Our task turned out to have attracted a lot of pure data scientists, and this strongly affected the approach to solving it. Many had good data science, but nobody had a working prototype, and many could not get around the bans on their crawlers in search engines. We were the only team with a working prototype, and we knew how to solve the problem. As a result we won the track, although we were very lucky to have chosen the least competitive task. Looking at the pitches on other tracks, we realized that we would not have had a chance there. I also want to say that we were very lucky with the jury: they checked the code meticulously. Judging by the reviews, this was not the case on all tracks.

9. Final. After being summoned to the jury several times for code review, we thought that all the questions were finally settled and went to have dinner at Burger King. There the organizers called us again; we had to quickly pack up our orders and go back.

The organizer showed us which room to go to, and when we walked in we found ourselves at a public speaking training for the winning teams. The guys who were going to perform on stage were thoroughly fired up; everyone came out like a real showman.

And I have to admit that in the final, next to the strongest teams from other tracks, we looked pale; the victory in the government customer nomination deservedly went to the team from the real estate tech track. I think the key factors behind our victory on the track were: the ready-made groundwork, thanks to which we were able to build a prototype quickly; a distinctive feature in the prototype (the search for CEOs in social networks); and the NLP skills of our backend developer, which also interested the jury very much.



In conclusion, the traditional thanks to everyone who supported us, to the jury of our track, to Evgeny Evgrafyev (the author of the task we solved at the hackathon) and, of course, to the organizers of the hackathon. It was probably the largest and coolest hackathon of all those I have taken part in; it only remains to wish the guys to keep the bar this high in the future!

Source: https://habr.com/ru/post/433802/

