Final stage: individual and command parts

And this is the schoolchildren, whom he repeatedly
won in football, met in
the GOTO camps and on the
hackathon .
The final stage of the Olympiad consists of two parts: an individual solution of problems in subjects (mathematics, computer science) and a team decision engineering problem. For an individual solution of the problems is given for 2 hours on one subject. Problems in mathematics and computer science are common to parallels grade 9 and 10-11. The solution to each problem gives a certain number of points (see evaluation criteria below). In mathematics for each task you can get from 0 to the specified number of points in accordance with the described criteria.
Informatics points are credited in full for the correct solution of the problem.
Solving computer science tasks involved writing tasks in Python. Participants receive a score for solving problems in the aggregate in all subjects of this profile (mathematics and computer science) - in total from 0 to 24 points.
Maths task 3.1.1v (2 points).A group of psychologists has developed a test, having passed that, each person receives an assessment - the number Q is an indicator of his mental abilities (the more Q, the more abilities). The country’s rating is taken as the arithmetic average of the Q values ​​of all the inhabitants of this country.
The group of citizens of country A emigrated to country B, and the group of citizens of B - to country B. As a result, the ratings of each country were higher than the initial ones. After that, the direction of migration flows changed to the opposite - some of the inhabitants of B moved to B, and some of the inhabitants of B - to A. It turned out that as a result, the ratings of all three countries increased again (compared to those that were after the first move, but before second). (So, in any case, the news agencies of these countries claim.) Could this be (if so, how, if not, why)? (It is assumed that during the considered time Q citizens did not change, no one died or was born.)
Math Problem 3.1.2 (6 points)The administration of the social network VKontakte decided to create a community "All those who have less than half of their friends are in this community." To do this, they need to be included in the user community so that in the end:
- everyone in this community has less than half the friends in it;
- everyone who is not in this community has at least half of the friends in it.
Do they always manage to create such a community?
(It is assumed that users do not themselves join the community, but are distributed by the administration of the social network)
The solution is
here on page 191 .
The task of computer science 3.2.4 "Final certification" (3 points)The end of the year is a hectic time not only for schoolchildren who are preparing for exams, but also for the compilers of the exam tasks. When composing any test, it is necessary to take into account how difficult the task will be for students, and determine how many students will pass the test successfully.
This year it was decided to hold a test exam, inviting 100 students from different schools to solve 5 problems. Each task is scored ai points. The problem is either solved for a full score, or not solved at all, which means points are not awarded for it. Partial decisions are not taken into account. After the exam, the compilers received the results of the students. For each student, the results of testing all tasks are known.
You must calculate how many students will receive at least K points if the exam will pass 1,000,000 students.
Please note that it is impossible to reliably find the probability to solve a specific set of problems, but we will assume that it is possible to reliably estimate the probability to solve one problem.
Input format:The first line contains the number K - the number of points needed to pass the test. The second line of 5 natural numbers - points for the problem. The first number corresponds to the points for the first task, the second - for the second and so on. This is followed by 100 lines. Each line contains 5 numbers indicating whether the corresponding problem is solved by number or not. In the first place, the line indicates whether the first task is solved, if the second one is solved, and so on. If the problem is solved, then the line will contain 1, if not - 0.
Output Format:In a single line print the expected number of people who successfully pass the same test if it is solved by 1,000,000 schoolchildren.
The solution is
here on page 200 .
Command part

Formulation of the problem.
The participants in the command part of the final stage needed to solve a series of tasks on analyzing the graph of social network users: predict the age of users who did not indicate it in their profile; predict the region of residence of the user; guess who of the other users of the social network is familiar to the user.
Participants had to write programs in the Python language. The duration of the command part of the final stage is 3 days (18 astronomical hours in total). Participants had access to the Internet and could use their phones and laptops.
All teams were offered 3 tasks - one for each day. The condition of the problem became known to the participants in the morning of the corresponding day. For each task, two subgraphs of the real Odnoklassniki social network were prepared:
- the participants were presented with a specially prepared, purified and anonymized subgraph;
- The quality check of the solution was carried out automatically on the full graph, in which there were data cleared from the first graph.
For each task, the participants were provided with a working basic solution with low efficiency, and the participants were faced with a choice: to program their own solution from scratch, which could solve the posed problem more qualitatively, or refine the proposed solution. In this case, it was possible to use the basic solution in part, for example, only the data model or only the input data recognizer.
Description of the source dataIn all tasks, the participants were provided with a user graph (links between users) and a file with demographics (anonymized data for each user).
User graphThe graph is saved in a sparse matrix format, where for each link there is information about its type (relative, friend, etc.) in the form of a bit mask. Each row of the matrix corresponds to the friends of one user and has the format:
User_ID1 {(ID_druga1, mask1), (ID_druga2, mask2), ...}
The matrix is ​​partitioned by user ID into 16 files, each of which is compressed with the standard GZip compression protocol.
Pairs in the list of links are sorted by friend ID (ascending). Example of records from the graph:
102416
{(5362439.0), (7321627.0), (7345280.0), (9939258.0), (9976393.0), (11260492.0),
(11924364.0), (16498676.0), (16513827.0), (21716731.0), (21826340.0), (23746537.0),
(23751503.0), (24412936.0), (24423533.0), (30287856.0), (32321147.0), (34243036.0),
(37592142.0), (39485706.0), (41505243.0), (42791620.0), (52012206.0), (52671472.0),
(54652307.0), (57293803.0), (59242794.0), (59252048.0), (62535397.0), (62563866.0),
(62567154.0), (64588902.0)}
102608
{(4167808,32784), (6019974.322), (6152844.16), (9570536.64), (10699806.33),
(13290514.0), (15064491,128), (16432948.512), (24473204.0), (24655822.0),
(25833075,256), (28000951.64), (30834507.20048), (34567533,16), (35766667.0),
(37385121.0), (40123805.512), (43134386.1024), (45439608.0), (45484652.0),
(47562525.0), (52378153.256), (52403136.512), (52493894.1024), (53483990.0),
(54048767.0), (54286279.20048), (57401158.0), (57956631.0), (58183281.0),
(61117236,32), (61898065.0), (61936634.0), (64512205.512), (65014849.0),
(65112662.0), (65259449.0)}
The following bits can be set in the communication mask:
- Love
- Spouse or Spouse
- Parent
- Child
- Brother or sister
- Uncle or aunt
- Relatives
- Close friends
- Colleagues
- Classmates
- Nephew
- Grandfather or grandmother
- Grandchild or granddaughter
- Classmate
- Army Friendship
- Adoptive parent
- Adopted child
- Godfather
- Godson
- Playing sports games together
In addition to the listed bits in the relationship mask, the zero bit may or may not be set. This bit plays a purely technical role and has no physical meaning. As a result, for example, a child type relationship can be encoded with the numbers 16 or 17.
The data were prepared using the Apache Pig big data storage tool and contain two matching files with headers that allow participants to use this tool for preprocessing / filtering data.
User demographicsDemographic data is provided for the same million users as social link information in the attribute list format:
userId create_date birth_date gender ID_country ID_Location loginRegion
Where:
- userId - user ID
- create_date - the date of creation of the user account (the number of milliseconds from 01/01/1970)
- birth_date - user’s birth date (number of days as of 01/01/1970, may be negative!)
- gender - user gender (1 - men, 2 - women)
- ID_country - identifier of the country specified in the profile
- ID_Location - the region / city identifier specified in the profile.
- loginRegion - identifier of the region from which the user is most often authorized in this social network (may be absent!)
Sample data:44053078 1166032023073 3067 1 10414533690 2423601 99
12495764 1177932393270 1138
2 10405172143 188081
25646929 1165304175170 3756 2 10414533690 3953941 22
25646999 1160728984480 3884 2 10414533690 241372 120
12495833 1176909723643 3363 2 10414533690 2724941 11
Demography is partitioned in the same way as the graph, but not compressed (transmitted as open texts). It can also be processed using the standard Apache Pig big data storage tool or any other tool that supports CSV.
Tasks
Task 4.2.1 "Date of birth"The fragment of the social graph presented for analysis includes information about the connections of 100 thousand users who have fallen into the two-step neighborhood of hundreds of randomly selected users. Participants are provided with social network graph files with all links and a demography file, which contains data on users, including age, but not all users are age specified.
For users who are present in the graph, but not present in the demographics, you must set the value of their birth_date attribute (date of birth).
Data is written to the file in the format:(<polzator_id> \ t (tab) <birth_date>)
The calculated results of the participants are taken in the txt file and compared with the complete data by a specially written program that considers the discrepancy between the data of the participants and the real data. The smaller the discrepancy, the higher the team score.
The basic solution of the problem on
page 208 .
Task 4.2.2 “Region”The fragment of the social graph presented for analysis includes information about the connections of 100 thousand users who have fallen into the two-step neighborhood of hundreds of randomly selected users. Participants are provided with social network graph files with all links and a demography file, which contains data on users, including the region, but the region is not specified for all users.
For users who are present in the graph, but not present in the demographics, you must set their ID_Location attribute (region).
The answer is recorded in a text file in the format:
(<user_id> \ t (tab) <ID_Location>)
The calculated results of the participants are taken in the txt file and compared with the complete data by a specially written program that considers the discrepancy between the data of the participants and the real data. The smaller the discrepancy, the higher the team score.
The basic solution of the problem on
page 213 .
Task 4.2.3 “Finding Connections”The fragment of the social graph presented for analysis includes information about the connections of 1 million users, which fell into the two-step neighborhood of hundreds of randomly selected users. Participants are provided with graph and demography files by users. Some of the links in the social box provided are hidden and the task of the participants is to fully and accurately disclose them.
Hiding links affected only users from the original million, the remainder of dividing attributes whose ID by 11 is 7 (id% 11 == 7), about 10% of links for each of these users were hidden. Only the links to the original million were hidden.
In the forecast, it is enough to restore the presence of a connection, its type is not important. The results of the forecast must be presented in a CSV file of the form:
ID_1 ID_1.1 ID_1.2 ID_1.3
ID_2 ID_2.1 ID_2.2
ID ( ), ( , ). :
5111 178542 78754
18807 982346 1346 57243
(Normalized Discounted Cumulative Gain, NDCG), . , , . ,
, . - , 0.
216 .