NeurIPS is currently considered the top conference in the world of machine learning. Today I will tell you about my experience of participating in NeurIPS competitions: how to compete with the best academics in the world, take a prize place, and publish a paper.
NeurIPS supports bringing machine learning methods into various scientific disciplines. About 10 competition tracks are launched every year to solve real problems from the academic world, and the winners present their new developments and algorithms at the conference. What I am most passionate about is reinforcement learning (RL), so for the second year in a row I have been participating in the RL competitions held for NeurIPS.
Secondly, this conference is a global event: scientists from all over the world gather in one place, and you can talk to any of them.
In addition, the whole conference is filled with the latest scientific achievements and state-of-the-art results, which anyone working in data science really needs to know and follow.
Starting to participate in such competitions is quite simple. If you know enough DL to train a ResNet, that is enough: register and go ahead. There is always a public leaderboard on which you can soberly assess your level compared to other participants. And if something is not clear, there are always channels in Slack / Discord / Gitter, etc., to discuss any questions that arise. If the topic is really “yours”, nothing will stop you from getting the coveted result: in all the competitions I participated in, all the approaches and solutions were studied and implemented right in the course of the competition.
A person's gait is the result of the interaction of muscles, bones, vision, and the inner ear. When the central nervous system is impaired, movement disorders can occur, including gait disturbance, or abasia.
Researchers from the Stanford neuromuscular biomechanics laboratory decided to bring machine learning into this area of treatment, so that they could experiment and test their theories on a virtual skeleton model rather than on living people.
The participants were given a virtual human skeleton (in the OpenSim simulator) with a prosthesis in place of one leg. The task was to teach the skeleton to move in a given direction at a given speed; both the direction and the speed could change during the simulation.
Reinforcement Learning (RL) is an area that deals with decision theory and the search for optimal behavior policies.
Recall how you teach a dog new tricks: you repeat some action, give a treat when the trick is done, and nothing when it is not. The dog has to figure out the strategy of behavior (a “policy” in RL terms) that maximizes the number of treats received.
Formally, we have an agent (the dog) that learns from the history of interactions with the environment (the owner). The environment evaluates the agent's actions and gives it a reward (a treat): the better the behavior, the bigger the reward. Accordingly, the agent's task is to find a policy that maximizes the total reward over the whole time of interaction with the environment.
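To make this loop concrete, here is a minimal sketch of one episode in the competition environment. The ProstheticsEnv interface below comes from the osim-rl package and is written from memory, so details (reset/step signatures, the action_space attribute) may differ between versions; any gym-like environment looks the same.

```python
# A minimal agent-environment interaction loop (sketch, interface from memory).
import numpy as np
from osim.env import ProstheticsEnv  # assumption: osim-rl is installed

env = ProstheticsEnv(visualize=False)
observation = env.reset()

total_reward, done = 0.0, False
while not done:
    # a trained policy would map observation -> muscle activations here;
    # random activations in [0, 1] are used just to show the loop
    action = np.random.uniform(0.0, 1.0, size=env.action_space.shape)
    observation, reward, done, info = env.step(action)
    total_reward += reward

print("episode reward:", total_reward)
```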
Developing this idea further: rule-based solutions are software 1.0, where all the rules are set by the developer; supervised learning is software 2.0, where the system learns from existing examples and finds dependencies in the data; reinforcement learning is a step further, where the system itself learns to explore, experiment, and find the required dependencies in its solutions. The further we go, the closer we get to the way a person learns.
The task looks like a typical reinforcement learning problem with a continuous action space (RL for continuous action spaces). It differs from the usual RL setting in that instead of choosing one specific action (pressing a joystick button), the action itself has to be predicted precisely, and there are infinitely many possibilities.
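For illustration, here is how the two kinds of action spaces look in gym notation; the 19 muscle activations below are an illustrative number, not necessarily the exact dimensionality of the competition environment.

```python
# Discrete vs continuous action spaces (gym notation, illustrative sizes).
from gym.spaces import Discrete, Box
import numpy as np

joystick = Discrete(4)  # pick one of 4 buttons
muscles = Box(low=0.0, high=1.0, shape=(19,), dtype=np.float32)  # 19 muscle activations

print(joystick.sample())  # e.g. 2
print(muscles.sample())   # e.g. [0.13, 0.87, ...] -- infinitely many possibilities
```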
The basic approach to solving it, Deep Deterministic Policy Gradient (DDPG), was invented in 2015, which is a long time ago by DL standards, and the area continues to develop actively in application to robotics and real-world RL. There is plenty to improve: robustness of approaches (so as not to break a real robot), sample efficiency (so as not to collect data from real robots for months), and other RL problems (the exploration vs exploitation trade-off, etc). In this competition we were not given a real robot, just a simulation, but the simulator itself was 2000 times slower than the open-source analogues (on which everyone checks their RL algorithms), which brought the problem of sample efficiency to a new level.
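For reference, below is a condensed sketch of a single DDPG update step in PyTorch. The network sizes, hyperparameters, and replay-buffer interface are placeholder assumptions for illustration, not the solution we used in the competition.

```python
# A condensed sketch of one DDPG update step (actor-critic with target networks).
# Dimensions and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 32, 19, 0.99, 0.005

def mlp(inp, out, final=None):
    layers = [nn.Linear(inp, 256), nn.ReLU(), nn.Linear(256, out)]
    if final is not None:
        layers.append(final)
    return nn.Sequential(*layers)

actor = mlp(obs_dim, act_dim, nn.Tanh())    # deterministic policy mu(s), output in [-1, 1]
critic = mlp(obs_dim + act_dim, 1)          # Q(s, a)
actor_target = mlp(obs_dim, act_dim, nn.Tanh())
critic_target = mlp(obs_dim + act_dim, 1)
actor_target.load_state_dict(actor.state_dict())
critic_target.load_state_dict(critic.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(batch):
    # batch sampled from a replay buffer; r and done have shape [batch, 1]
    s, a, r, s_next, done = batch

    # critic: regress Q(s, a) onto the one-step TD target
    with torch.no_grad():
        q_next = critic_target(torch.cat([s_next, actor_target(s_next)], dim=1))
        target = r + gamma * (1 - done) * q_next
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # actor: maximize Q(s, mu(s)) by gradient ascent
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # slowly track the online networks with the targets (Polyak averaging)
    for net, net_t in [(actor, actor_target), (critic, critic_target)]:
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```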
The competition itself took place in three stages, during which the task and conditions were somewhat modified.
The main quality metric was the total reward over the simulation, which showed how well the skeleton adhered to the given direction and speed over the whole distance.
During the 1st and 2nd stages, each participant's progress was displayed on the leaderboard. The final solution had to be submitted as a Docker image; there were restrictions on running time and resources.
Because the leaderboard is public, nobody shows their best model: they hold it back in order to give out “a little more than usual” in the final round and surprise their rivals.
Last year there was a small incident with the evaluation of solutions in the very first round. At that time, testing went through HTTP interaction with the platform, and the testing conditions leaked: it was possible to find out exactly in which situations the agent was evaluated and to overfit it to those conditions only, which, of course, did not solve the real problem. That is why it was decided to switch the submission system to Docker images launched on the organizers' remote servers. Dbrain uses the same system for scoring competition results for exactly these reasons.
Ideally, your knowledge and skills should be at the same level and complement each other. For example, this year I brought our team onto PyTorch and contributed the initial ideas for a distributed agent training system.
How to find a team? First, you can join the ranks of ODS (OpenDataScience) and look for like-minded people there. Secondly, for RL folks there is a separate Telegram chat, the RL club. Thirdly, you can take the wonderful Practical RL course from ShAD, after which you will definitely pick up a couple of acquaintances.
However, it is worth remembering the “submit first” policy: if you want to team up, first build your own solution, submit it, appear on the leaderboard, and show your level. As practice shows, such teams are much more balanced.
As I already wrote, if the topic is “yours”, nothing will stop you. This means that you do not just like the field, it inspires you: you burn with it, you want to become the best in it.
I met RL 4 years ago, while taking the Berkeley 188x Intro to AI course, and I still do not cease to be amazed at the progress in this area.
The third, but just as important, point: you need to be able to do what you promised, invest in the competition every day, and just... solve it. Every day. No innate talent can compare with the ability to do something, even a little bit, every single day. That is what motivation is needed for. To succeed at this, I advise reading Deep Work and the ternaus AMA.
Another extremely important skill is the ability to distribute your effort and use your free time properly. Combining a full-time job with participating in competitions is not a trivial task. The most important thing in these conditions is not to burn out and to withstand the whole load. To do this you need to manage your time properly, soberly assess your strength, and not forget to rest in time.
At the final stage of a competition there is usually a moment when, in just a week, you need to do not just a lot, but A LOT. For the sake of a better result, you need to be able to force yourself to sit down and make the last dash to the coveted prize.
Why might you have to put in extra work for the benefit of the competition? The answer is quite simple: deadline extensions. In such competitions the organizers often cannot predict everything, so the easiest way out is to give the participants more time. This year the competition was extended 3 times: first by a month, then by a week, and at the very last moment (24 hours before the deadline) by another 2 days. And if during the first two extensions it was enough to simply re-plan the extra time, during the last two days you just had to plow.
Among other things, do not forget about theory: stay aware of what is happening in the field and be able to spot what is relevant. For example, for last year's solution our team started from the following articles:
This year a “couple” more were added:
I also recommend OpenAI's compilation of articles on reinforcement learning and its Mendeley version. And if you are interested in reinforcement learning, join the RL club and RL papers.
The reward averaged over 10 test episodes served as the final evaluation of a solution.
The graph shows the results of testing our agent: in 9 out of 10 episodes our skeleton ran just fine (average reward 9955.66), but one episode... episode 3 did not go its way (reward 9870). It was this mistake that dropped the total score to 9947 (-8 points).
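A quick back-of-the-envelope check of that drop (for simplicity, the nine good episodes are assumed to be exactly equal to their average):

```python
# One weak episode drags the 10-episode mean down by roughly 8 points.
good_episodes = [9955.66] * 9   # the nine "fine" episodes (assumed equal to their average)
bad_episode = 9870.0            # episode 3
final_score = (sum(good_episodes) + bad_episode) / 10
print(final_score)              # ~9947.1, i.e. about 8 points below 9955.66
```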
And finally, do not forget about plain luck. Do not think this is a controversial point; on the contrary, a little luck strongly rewards continuous work on yourself: even if the probability of getting lucky is only 10%, a person who entered the competition 100 times will succeed far more often than someone who tried once and abandoned the idea.
This year there were a couple of changes. First, I no longer simply wanted to participate in this competition; I wanted to win it. Secondly, the team also changed: Alexey Grinchuk, Anton Pechenko, and me. We did not manage to win outright, but we again took 3rd place.
Our solution will be officially presented at NeurIPS, so for now we will limit ourselves to a few details. Building on last year's solution and this year's successes of off-policy reinforcement learning (the articles above), we added a number of our own developments, which we will describe at NeurIPS, and got Distributed Quantile Ensemble Critic, with which we took third place.
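To give a flavor of what a quantile critic is about, below is a generic quantile regression (quantile Huber) loss in the spirit of distributional RL papers such as QR-DQN. It is only an illustration of the idea, not our competition implementation, which we will describe at NeurIPS.

```python
# Generic quantile Huber loss for a distributional critic (illustration only,
# not the competition implementation).
import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """pred_quantiles, target_quantiles: tensors of shape [batch, n_quantiles]."""
    n = pred_quantiles.shape[1]
    # midpoints of the n quantile fractions: (2i + 1) / (2n)
    taus = (torch.arange(n, dtype=pred_quantiles.dtype,
                         device=pred_quantiles.device) + 0.5) / n

    # pairwise TD errors between every target and every predicted quantile
    td = target_quantiles.unsqueeze(1) - pred_quantiles.unsqueeze(2)  # [batch, n_pred, n_tgt]
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # asymmetric weighting pushes each predicted quantile toward its fraction tau
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber).mean()
```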
All of our developments (the distributed training system, algorithms, etc.) will be published and available in Catalyst.RL after NeurIPS.
Our team confidently held 1st place throughout almost the entire competition. However, the big guys had other plans: 2 weeks before the end, two big players entered at once, FireWork (Baidu) and nnaisense (Schmidhuber). And while nothing could be done about the Chinese Google, we managed to fight the Schmidhuber team for second place for quite a long time, losing only by a minimal margin. That seems pretty good for amateurs.
I strongly recommend not trying to explain to the American officer who interviews you that you are going to the conference because you train virtual skeletons to run in simulations. Just say you are going to the conference to give a talk.
Participation in NeurIPS is an experience that is hard to overestimate. Do not be scared by the loud headlines: you just need to pull yourself together and start solving.
And check out Catalyst.RL, really.
Source: https://habr.com/ru/post/430712/