Online courses, for all their convenience and accessibility, are notorious for being unusually easy to abandon, and many students do exactly that. Students drop out for all sorts of reasons: the course is incomprehensible, a deadline was missed, there was no time to earn the points, Fallout 4 came out. Everyone has their own excuse. But excuses are no comfort to us: when a person leaves a course, the world loses a potential developer or data analyst, not to mention the kilowatt-hours and time our hero has already spent.
The hardest part is determining which users are about to run away. Once we know who they are, preventing the losses becomes much easier: “forewarned is forearmed”.
At the end of the article you will learn how analyzing data and solving this problem can get you into a hackathon.
Since the question of whether a user will leave has no clear-cut answer, and no one can predict the real outcome with certainty, machine learning methods come to the rescue.
We usually hear about machine learning being used to predict churn in the context of banks and telecom companies. Educational projects face the same problem.
It would be great to learn to prevent such situations: to anticipate when a student is about to leave a course and, if possible, bring them back with a reminder, advice, a cookie, or something else.
Stepik.org, a major Russian educational platform for online courses, provided us with data for the task of predicting user churn.
More formally, given data about a user and their activity within a course, we want to determine whether they will complete the course. “Complete the course” here means “earn the number of points required to pass”.
Each course is a sequence of steps, the smallest units of a lesson, each of which can be “visited” and “passed”. Some steps, such as theoretical material, are passed as soon as you visit them. Others require completing a task that is checked automatically, and points for the step are awarded only if the answer is correct.
To give more freedom in the analysis and in the search for features that affect whether a user finishes a course, the user data is provided in as much detail as possible: it includes the time at which the user performed each action, such as opening a video, submitting code for checking, or answering test questions.
The traditional way to predict a user's departure is to start by looking for the features that distinguish those who completed the course from those who dropped out.
Examples of such features:
It is usually better to use several features at once, combining them with some rule (for example, a decision tree). Expect that even a combination of several features will not predict user departure perfectly.
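To make the idea of combining features with a rule concrete, here is a minimal sketch of a hand-written decision tree. The three features (`steps_passed`, `days_active`, `days_since_last`) and every threshold are hypothetical, chosen only for illustration; they are not taken from the competition data or from the organizers' solution. In practice the tree would be learned from labeled data rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class UserFeatures:
    # Hypothetical features, invented for this illustration.
    steps_passed: int     # steps the user has passed so far
    days_active: int      # distinct days with any activity
    days_since_last: int  # days since the user's last action


def will_finish(u: UserFeatures) -> bool:
    """Hand-written decision tree; all thresholds are made up."""
    if u.days_since_last > 14:
        # Long silence: predict dropout unless most of the course is done.
        return u.steps_passed >= 80
    if u.steps_passed >= 30:
        # Recently active and already well into the course.
        return True
    # Few steps passed so far: rely on how regularly the user shows up.
    return u.days_active >= 5


users = [
    UserFeatures(steps_passed=90, days_active=12, days_since_last=20),
    UserFeatures(steps_passed=10, days_active=2, days_since_last=1),
    UserFeatures(steps_passed=35, days_active=6, days_since_last=3),
]
predictions = [will_finish(u) for u in users]
```

Each branch looks at one feature at a time, which is what makes such rules easy to read and to check against intuition.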
To compare different approaches, you should use formal quality metrics. One such metric is accuracy: the proportion of cases in which your algorithm correctly guessed whether the user would pass the course. Another is the F1 score, the harmonic mean of precision and recall, the two main characteristics of classification quality. Precision is the proportion of examples marked positive that really are positive. Recall is the proportion of all truly positive examples that were marked positive.
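The metrics above can be computed directly from the counts of true and false positives and negatives. The sketch below uses only the standard definitions; the label vectors are toy data invented for the example, with 1 meaning "the user completed the course".

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Precision: of everything marked positive, how much really is positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything really positive, how much was marked positive.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

When few users complete a course, accuracy alone can look good for a model that always predicts "will not finish", which is why precision, recall and F1 are worth tracking alongside it.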
The competition was launched on the Kaggle.com platform. Besides the noble goal of guiding the lost back onto the true path of enlightenment, it serves as the third part of the qualifying stage for the GoToHack hackathon final. This is a three-day event for university and school students that will be held in December 2016 with the support of RVC.
In addition to the competition, potential hackathon participants were offered two simpler tasks on the same data, which served as the selection tasks for beginners. The first blitz task is purely educational, but the second, like the competition, yields useful observations on how to improve a course.
Students sometimes return to a step several times: either it contained something very interesting, or the material was hard to understand the first time. It is important to find such steps in order to identify the parts of a course that are too difficult, after which a student may stop understanding the material. The second blitz task is precisely to find the steps that students return to most often. More information about the blitz tasks can be found on the selection page.
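One possible way to look for such steps is sketched below. The event format `(user_id, step_id, action, timestamp)` and the definition of a "return" (the same user visiting the same step more than once) are assumptions made for this example; the actual data layout and the exact criterion used in the blitz task may differ, for instance by requiring a minimum time gap between visits.

```python
from collections import Counter

# Hypothetical event log: (user_id, step_id, action, timestamp).
events = [
    (1, 101, "visited", 1000),
    (1, 102, "visited", 1100),
    (1, 101, "visited", 5000),
    (2, 101, "visited", 1200),
    (2, 101, "visited", 9000),
    (2, 103, "visited", 9100),
    (3, 102, "visited", 2000),
    (3, 102, "passed", 2050),
]


def most_returned_steps(events, top_n=3):
    """Rank steps by the number of users who visited them more than once."""
    # Number of visits per (user, step) pair.
    visits = Counter(
        (user, step)
        for user, step, action, _ in events
        if action == "visited"
    )
    # For each step, count the users with at least two visits.
    returning_users = Counter(
        step for (_, step), n in visits.items() if n > 1
    )
    return returning_users.most_common(top_n)


top_steps = most_returned_steps(events)
```

Counting users rather than raw visits keeps a single very persistent student from dominating the ranking.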
Now for the baseline solution to the churn prediction problem. In the simplest case, we just take into account the number of steps the user has passed up to the current moment. Thus a single number is computed for each user and used for the prediction. The baseline is extremely simple and can be understood even by people who are not very familiar with machine learning. The competition itself is available via the secret link. There is still more than a week until the end of the competition, so all of you still have a chance to participate. We hope that the participants will not only pass the selection for the hackathon but also produce results that are genuinely applicable to real data.
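A baseline of this kind might look like the sketch below. The article only says that one number per user, the count of passed steps, is used for the prediction; turning that number into a yes/no answer with a threshold fitted on the training labels is an assumption of this example, as are the event format `(user_id, step_id, action)` and the toy data.

```python
def count_steps_passed(events):
    """Return {user_id: number of distinct steps the user has passed}."""
    passed = {}
    for user, step, action in events:
        if action == "passed":
            passed.setdefault(user, set()).add(step)
    return {user: len(steps) for user, steps in passed.items()}


def best_threshold(counts, labels):
    """Pick the threshold t that maximises training accuracy of 'count >= t'."""
    def accuracy(t):
        hits = sum(
            (counts.get(user, 0) >= t) == bool(finished)
            for user, finished in labels.items()
        )
        return hits / len(labels)

    candidates = sorted(set(counts.values()))
    return max(candidates, key=accuracy)


# Hypothetical training data: events and whether each user finished.
events = [
    (1, 101, "passed"), (1, 102, "passed"), (1, 103, "passed"),
    (2, 101, "passed"),
    (3, 101, "visited"),
    (4, 101, "passed"), (4, 102, "passed"),
]
labels = {1: 1, 2: 0, 3: 0, 4: 1}

counts = count_steps_passed(events)
threshold = best_threshold(counts, labels)
predictions = {user: counts.get(user, 0) >= threshold for user in labels}
```

Users with no passed steps are missing from `counts` and are treated as having passed zero steps.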
This is the second GoToHack data analysis hackathon. The first was held successfully in February 2016, and now it is time to grow: more participants, more substantial prizes, older participants. This time we invite school and university students up to 20 years old to join one of two tracks (those who are older should look at the very end of the article). Beginners will get a master class in data analysis and machine learning, while advanced participants will go straight into battle with the prepared datasets or implement an idea of their own.
Speaking of datasets: the hackathon is dedicated to education and HR, so the partners' tasks will be on those topics. For example, HeadHunter will provide a database of its vacancies and some résumés, while SkyEng will offer time series of user actions and voice recordings of lessons.
In short, we look forward to seeing everyone interested on December 9-11 in Moscow. Hurry up, there is only a week left until the end of the selection. The prizes include not only gadgets but also, for example, training at the GoTo project school or at the summer school on Bayesian methods in deep learning, or participation in the final of the NTI Olympiad, which gives bonuses for university admission.
Applications are accepted until November 27; you can get acquainted with the tasks here.
By the way, if you are over 20 and have real industry experience, we invite you to become a curator or consultant for the teams at our hackathon. Read more here.
Source: https://habr.com/ru/post/315828/