
At the beginning of November, a machine learning and data analysis championship called Telecom Data Cup was launched, organized by Mail.Ru Group and MegaFon.
The competition runs on the already well-known ML Boot Camp platform, where we try to hold most of our data analysis contests. This is the second competition this year and the seventh in the project's history. Previous championships remain open in sandbox mode, so you can practice on past tasks at any time of the day or night.
More than 2,500 users have registered for the current competition, 1,700 people have downloaded the datasets, 7,800 solutions have been uploaded, and the chat community has passed 1,600 participants. The competition ends on December 16, so it's time to join the fight club if you haven't already. We welcome and help everyone. Coffee, or whatever else keeps you going, will help :)
At the end of the article you will find useful links and materials on this and previous competitions. The main goal here is to immerse you in the Telecom Data Cup task, so that you can get involved quickly and enjoy some real research.
Briefly about the task
Those who already know what is going on in the championship can skip to the next section.
We are all tired of intrusive phone and online surveys from “marketers”. Imagine someone calling to ask whether you are watching TV right now, which channel, how many devices are switched on, and what program each one is showing. You just want to hang up (and often we do). Users get irritated and are reluctant to share feedback, which hurts the quality of the services provided. The problem needs a solution.
In this competition you dive into the world of telecommunications: using anonymized user data provided by the telecom operator MegaFon and survey answers collected from real customers, you need to predict whether subscribers are satisfied with the quality of communication.
9443 subscribers were polled. The result of the survey is the satisfaction index for each subscriber, equal to zero (0 - satisfied) or one (1 - not satisfied). It is necessary to identify dissatisfied customers as accurately as possible.
ROC AUC is the metric used to evaluate your solutions. Predictions must be made for 5221 subscribers, in the same order as in the subs_csi_test.csv file. The data can be downloaded from the platform website. Preliminary results are computed on the answers for 2088 subscribers, and final results on the answers for the other 3133 subscribers (a 40/60 split). You can upload at most 5 submissions per day, and the number of solutions you can select is 2.
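To make the metric concrete, here is a minimal sketch of ROC AUC computed from its pairwise definition: the share of (dissatisfied, satisfied) subscriber pairs in which the dissatisfied one (label 1) gets the higher score, with ties counted as one half. The function name and the toy data are illustrative assumptions, not part of the contest materials. In practice a library implementation would be the usual choice.

```python
def roc_auc(labels, scores):
    """ROC AUC from its pairwise definition (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = not satisfied, 0 = satisfied (as in the task statement).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 (positive, negative) pairs are ordered correctly
```

Because only the ordering of the scores matters, a model can submit real-valued scores rather than hard 0/1 answers without any threshold tuning.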
Grails
The task has sparked the community's curiosity. Participants take different approaches. Some generate N models, look them over, stack them again and again, and ... voila, that's it. Others generate features, study the "Information Systems and Technologies" discipline from the lectures posted in the repository, and that seems to work fine too. And some are hoping for a random submission with a lucky seed.
To make the leaderboard look better by the end of the competition, we want to share a few grails for the task.
Grail number 0
Check out the chat and the GitHub lecture repository. There is a lot of useful information there. Many of us have little idea of how a mobile network works. Those who seek will always find! A short presentation describing how a base station (BS) operates and a file with the distribution of features by service have been added to the repository.
In the chat, participants are grilling the organizers. The organizers try to hold back, but it's hard.

Grail number 1
In the provided data, the cell_lac_id field identifies a single cell. Each cell belongs to exactly one communication generation: 2G, 3G, or 4G (LTE). We recommend trying to determine which generation each cell belongs to.
Grail number 2
Each phone has a maximum data transfer technology that it supports: 2G, 3G, or 4G. This information is contained in the INTERNET_TYPE_ID field of the subs_features table. The field is encoded. Think about how you can determine which value of this field corresponds to which technology.
Grail number 3
Note: if a customer has a 4G-capable phone, but the history shows that they often transfer data through 3G or even 2G cells, how might this affect their perception of communication quality?
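One way to turn this observation into a feature is the share of a subscriber's traffic that went through cells of an older generation than the phone supports. The sketch below uses toy data and assumes you have already derived a `cell_generation` for each cell (Grail 1) and a `phone_generation` decoded from INTERNET_TYPE_ID (Grail 2). Those two columns and `subscriber_id` are hypothetical names, not fields of the contest tables.

```python
import pandas as pd

# Hypothetical toy data: traffic per subscriber and cell.
# `cell_generation` is assumed to come from your own solution to Grail 1.
traffic = pd.DataFrame({
    "subscriber_id": [1, 1, 1, 2, 2],
    "cell_generation": [4, 3, 2, 4, 4],
    "SUM_DATA_MB": [100.0, 250.0, 50.0, 300.0, 100.0],
})

# Hypothetical: best technology each phone supports (your Grail 2 decoding).
phones = pd.DataFrame({
    "subscriber_id": [1, 2],
    "phone_generation": [4, 4],
})

merged = traffic.merge(phones, on="subscriber_id", how="left")

# Keep the traffic volume only where the cell is older than the phone allows.
merged["downgraded_mb"] = merged["SUM_DATA_MB"].where(
    merged["cell_generation"] < merged["phone_generation"], 0.0
)

per_sub = merged.groupby("subscriber_id")[["SUM_DATA_MB", "downgraded_mb"]].sum()
# Subscribers with zero total traffic would get NaN here.
per_sub["downgraded_share"] = per_sub["downgraded_mb"] / per_sub["SUM_DATA_MB"]
print(per_sub)
```

In this toy data subscriber 1 sends most traffic over 2G/3G cells despite having a 4G phone, while subscriber 2 stays on 4G the whole time.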
Grail number 4
Customers have cells where they spend time often and regularly (home, work, commute, shop, etc.), and cells where they appear rarely and briefly. Which cells do you think matter more to the customer in terms of quality? How can you identify the important cells?
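A simple starting point, sketched below on toy data, is to count on how many distinct days a subscriber was seen on each cell and what share of their total time that cell accounts for, then keep the top cells per subscriber. The column names `subscriber_id`, `day`, and `minutes`, as well as the `TOP_K` cutoff, are assumptions made for illustration. Only cell_lac_id comes from the task description.

```python
import pandas as pd

# Hypothetical toy activity log: one row per subscriber, cell and day.
usage = pd.DataFrame({
    "subscriber_id": [1, 1, 1, 1, 1, 2, 2],
    "cell_lac_id":   ["A", "A", "A", "B", "C", "A", "D"],
    "day":           ["2018-10-01", "2018-10-02", "2018-10-03",
                      "2018-10-01", "2018-10-02", "2018-10-01", "2018-10-01"],
    "minutes":       [60.0, 50.0, 70.0, 15.0, 5.0, 10.0, 30.0],
})

# Regularity (distinct days) and volume (total minutes) per subscriber and cell.
per_cell = (
    usage.groupby(["subscriber_id", "cell_lac_id"])
    .agg(days_seen=("day", "nunique"), minutes=("minutes", "sum"))
    .reset_index()
)

# Share of the subscriber's total time spent on each cell.
total = per_cell.groupby("subscriber_id")["minutes"].transform("sum")
per_cell["time_share"] = per_cell["minutes"] / total

# Rank cells within each subscriber by time spent (1 = most used).
per_cell["rank"] = per_cell.groupby("subscriber_id")["minutes"].rank(
    ascending=False, method="first"
)

TOP_K = 1  # illustrative cutoff
important = per_cell[per_cell["rank"] <= TOP_K]
print(important)
```

Cell-level quality indicators can then be aggregated over the important cells only, or weighted by `time_share`, instead of averaging over every cell a subscriber ever touched.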
Grail number 5
For Internet traffic, the subs_bs_consumption table contains both the amount of data transferred (SUM_DATA_MB) and the time spent on it (SUM_DATA_MIN). What can this data tell you about the customer's experience on a cell?
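One obvious quantity is the ratio of the two fields, which gives an average transfer rate per subscriber and cell. The sketch below assumes SUM_DATA_MB is in megabytes and SUM_DATA_MIN is in minutes (suggested by the names, but not confirmed by the task description), and uses a hypothetical `subscriber_id` column with toy values.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like subs_bs_consumption; the values are made up.
consumption = pd.DataFrame({
    "subscriber_id": [1, 1, 2],          # hypothetical column name
    "cell_lac_id":   ["A", "B", "A"],
    "SUM_DATA_MB":   [120.0, 3.0, 0.0],
    "SUM_DATA_MIN":  [10.0, 6.0, 0.0],
})

# Avoid division by zero: rows with no recorded time get NaN instead of inf.
minutes = consumption["SUM_DATA_MIN"].replace(0, np.nan)

# Average rate in megabytes per minute.
consumption["mb_per_min"] = consumption["SUM_DATA_MB"] / minutes

# Same rate in megabits per second (8 bits per byte, 60 seconds per minute).
consumption["mbit_per_s"] = consumption["mb_per_min"] * 8 / 60
print(consumption)
```

A low average rate on a cell where the subscriber spends a lot of time may be a stronger signal than the same rate on a cell they visit rarely.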
Grail number 6
The bs_avg_kpi and bs_chnn_kpi tables contain a large number of cell characteristics, both as daily averages and for the hour of greatest load (CHNN), with several months of history. Try to identify groups of cells that are similar to each other in these characteristics. Maybe there are cells that differ sharply from the rest? What happens to customers who often use those cells?
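For the "very different from the total mass" part, a minimal sketch is a robust z-score per KPI (median and median absolute deviation instead of mean and standard deviation), flagging cells where any KPI is far from the typical value. The KPI column names `kpi_a` and `kpi_b`, the toy values, and the threshold of 5 are illustrative assumptions. Grouping similar cells would call for a clustering algorithm instead, which is not shown here.

```python
import pandas as pd

# Toy per-cell KPI averages; real tables have many more columns and rows.
kpi = pd.DataFrame(
    {
        "kpi_a": [0.98, 0.97, 0.99, 0.98, 0.60],
        "kpi_b": [10.0, 11.0, 9.0, 10.5, 10.0],
    },
    index=pd.Index(["A", "B", "C", "D", "E"], name="cell_lac_id"),
)

# Robust location and scale per KPI column.
median = kpi.median()
mad = (kpi - median).abs().median()

# 1.4826 rescales MAD to match the standard deviation for normal data.
# A KPI with MAD equal to zero would need special handling.
robust_z = (kpi - median) / (1.4826 * mad)

THRESHOLD = 5.0  # illustrative cutoff
outliers = robust_z.abs().max(axis=1) > THRESHOLD
outlier_cells = outliers[outliers].index.tolist()
print(outlier_cells)
```

The resulting flag can be joined back to the subscribers' important cells to check whether people who spend time on unusual cells answer the survey differently.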
That concludes the grails from the organizers. We are sure they will help you reach the best score on the private leaderboard. If nothing works, upload a random submission; you never know, you might just land on a T-shirt. All the fun is still ahead. At the end of the championship the leaderboard will be on fire :) Remember the top five!

Schedule
The championship ends on December 16, and the award ceremony will be held on December 22 at the MegaFon office.
Prizes
1st place: 400,000 rubles;
2nd place: 200,000 rubles;
3rd place: 100,000 rubles.
As is tradition, the top 200 will receive T-shirts with the championship symbols.
In addition, there are special nominations:
- For the biggest "whoosh" down on the private leaderboard: a Kingston 120 GB SSD.
- Each participant who finishes in a place that is a multiple of 50 will receive a T-shirt with a sticker from the community pack.
Community
Join our Telegram community. You can always ask questions and get expert advice on Data Science there. The Mail.Ru Group championship community is about networking, a place where it is easy to find like-minded people.
Useful links
- ML Boot Camp I ( Machine Learning Boot Camp - as it was ... )
- ML Boot Camp II ( ML Boot Camp 2016. New to the Top 10 , “Performance Evaluation.” Very simple ... )
- ML Boot Camp III. Binary data ( As we did ML Boot Camp III , the winning decision of the ML Boot Camp I ... contest , ML Boot Camp III: floor care prediction ... )
- ML Boot Camp IV. A problem with a secret ( ML Boot Camp IV. Fourth. Secret. T ... , ML Boot Camp IV. From 1 in public to 35 in p ... , Stabilization and Dirichlet processes in solving ... )
- ML Boot Camp V. Prediction of CVDs ( AgeHack is the first online hackathon to extend ... , ML Boot Camp V, solution history for 3 months ... , Meetup on the results of the ML Boot Camp championship )
- ML Boot Camp VI. The forecast of the audience response to the online survey ( ML Boot Camp VI. The forecast of the response of the auditor ... , History of the first place on the ML Boot Camp VI ).