A study of Unified State Exam (USE), GIA, and olympiad results for Moscow schools. Which schools send graduates to which universities

A month ago, I wrote about our participation in an open data hackathon.

After the hackathon we did not rest on what we had achieved, as usually happens, but kept working. We ended up with data that probably only Ministry of Education staff had access to before: GIA results and olympiad wins for 2014-2015 for 90% of Moscow schools. For 55% of schools we managed to collect Unified State Exam (USE) data for 2015. We also downloaded all the VKontakte accounts of Moscow school students and looked at which universities they list in their profiles after graduation.

Naturally, it was interesting to explore such a dataset. First, the trivial things that people in education probably know well:


For some schools there is also USE data for 2014, so we can try to look at the dynamics over two years:


For some schools we have not only the USE scores but also the number of students who took each subject, so we can look at how popular the subjects are. People in the field most likely know this already:



I thought that the more popular a subject is, the higher its average score would be. The opposite seems to be true:


Now a little about the GIA. I thought that the better a school did on the GIA, the better its USE scores would be two years later. It turned out that this holds only for Russian and mathematics, and partly for social studies. Why that is, who knows?
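One way to check a relationship like this is a per-subject Pearson correlation between schools' average GIA and USE scores. The sketch below is only an illustration: the post does not say how the comparison was actually made, and the sample numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-school averages for one subject (made-up numbers,
# not taken from the dataset described in the post).
gia_avg = [3.9, 4.2, 4.5, 3.6, 4.8]
use_avg = [58.0, 66.0, 71.0, 55.0, 80.0]

r = pearson(gia_avg, use_avg)
print(f"r = {r:.2f}")
```

A value of r close to 1 for a subject would support the "good GIA, good USE" idea, while a value near 0 would match what was observed for most subjects.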



One hypothesis was that subject preferences change: those who took, say, physics in 9th grade do not necessarily take physics in 11th. But for the GIA we also have the number of students taking each subject, and the overall popularity of the subjects matches what we see for the USE:



Maybe it is down to the tasks. If you rank the subjects by average GIA score, the order is not the same as for the USE:



Now about the olympiads. We have the number of winners of the Moscow and All-Russian olympiads in all subjects. It was interesting to check whether olympiad success correlates with a school's average USE score:



Coordinates are known for every school. Yes, a school may have several buildings, but for now we only look at the legal address.



I had an idea that the closer a school is to the center, the better it is. That does not seem to be the case. At least, the average USE score does not depend on proximity to the center:
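"Proximity to the center" can be measured as the great-circle distance from a school's coordinates to a reference point. The sketch below uses the haversine formula. The reference point (roughly the Kremlin) and the sample school coordinates are assumptions for illustration, not values from the post.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
# Assumed reference point: approximate coordinates of the Moscow Kremlin.
CENTER_LAT, CENTER_LON = 55.7520, 37.6175


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def distance_to_center_km(lat, lon):
    """Distance from a school's legal address to the assumed city center."""
    return haversine_km(lat, lon, CENTER_LAT, CENTER_LON)


# Hypothetical school location (made-up coordinates).
print(f"{distance_to_center_km(55.80, 37.50):.1f} km from the center")
```

The resulting distances can then be plotted against, or correlated with, each school's average USE score.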



Some of you are probably wondering by now where the data comes from and why it can be trusted. The GIA and olympiad results were kindly provided by the Ministry of Education, which promised that this data will soon be made publicly available. The USE results by subject are, for some reason, considered a great secret, so we had to collect them manually from school websites. All Moscow schools are hosted on the mskobr.ru portal, and each has a "public report" section. It usually contains a link to a document in which the school principal reports on the past year in free form. Naturally, every school has its own idea of what the report should contain and how it should look:



So we had to forget about fully automatic data collection. We took a great tool for recognizing tables in PDF documents, Tabula. I patched it a little, and the data collection process looked like this:



After ~30 hours, all ~600 documents had been processed. It turned out that USE data could be extracted from only ~55% of them. Often the data in the report is out of date, or the USE results are missing, or instead of the average score there is only, for example, the maximum. We then sent letters to the ~300 schools for which we had managed to get USE scores, asking them to check the data. ~30 schools replied: 2 found errors, 5 sent scores slightly higher than those in their report, and the rest said everything was fine. So there are no big problems with accuracy, but there are problems with completeness: we still need to get scores from somewhere for another ~300 schools.

Then we moved on to VKontakte (VK). The goal was to determine which universities students from each school most often go on to. The first step was to match the official school names with those used on VK. This is not easy, because, for example, we have “School No. 17”, while VK has “Evening School No. 17”, “Music School No. 17 named after L. N. Oborin”, and “Boarding School No. 17”. In addition, VK returns at most 1000 search results. If a school is listed in more than 1000 accounts, which is almost always the case for Moscow schools, you have to come up with something. We split a single query “school number 17” into several: “school number 17 girls from 6 to 14”, “school number 17 boys from 6 to 14”, “school number 17 girls from 15 to 17”, “school number 17 boys from 15 to 17”, and so on. Search queries also seem to be subject to some kind of fuzzy rate limit: after ~50 calls we were banned for ~1 hour. Still, after a couple of days all the accounts had been downloaded. On average there are 1,800 people per school, of whom ~450 list a university.
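The query-splitting trick can be sketched as follows. This is only an illustration of the idea: the parameter names are hypothetical and do not correspond to the real VK search API, no network calls are made, and the third age bucket is an assumption standing in for the "and so on".

```python
from itertools import product

SEXES = ("female", "male")


def split_query(school, age_buckets):
    """Expand one school query into one sub-query per (age bucket, sex).

    The dictionary keys are hypothetical placeholders, not real API fields.
    """
    return [
        {"school": school, "sex": sex, "age_from": lo, "age_to": hi}
        for (lo, hi), sex in product(age_buckets, SEXES)
    ]


def merge_results(batches):
    """Union the result lists of all sub-queries, deduplicating by account id."""
    seen = {}
    for batch in batches:
        for account in batch:
            seen.setdefault(account["id"], account)
    return list(seen.values())


# The first two buckets come from the post; (18, 25) is an assumed example.
queries = split_query("school number 17", [(6, 14), (15, 17), (18, 25)])
for q in queries:
    print(q)
```

If any sub-query still hits the 1000-result cap, its age bucket can be split further. Given the ban after ~50 calls, the sub-queries would also need to be spaced out in time.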


If we use this data as is, then strangely 90% of Moscow school students go to MSU. So the following sophisticated algorithm is applied: discard MSU. Yes, for Lyceum No. 1533, for example, where 50% of graduates go to MSU, this algorithm does not work very well, but other approaches badly reduce coverage for all schools: instead of ~450 people, only ~45 remain, say, and that is not enough to build a distribution by university. Those who studied at the schools in the image, please write whether the histogram is accurate or not:
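The "discard MSU" step followed by a per-school distribution could look roughly like this. It is a minimal sketch that assumes each account has already been reduced to a single university label. The function name and sample labels are made up for illustration.

```python
from collections import Counter


def university_shares(universities, exclude=("MSU",)):
    """Share of each university among profiles, ignoring excluded labels.

    `universities` holds one label per account; None or "" means the
    account does not list a university.
    """
    kept = [u for u in universities if u and u not in exclude]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.most_common()}


# Hypothetical labels for the accounts of one school (made-up data).
profiles = ["MSU", "MSU", "HSE", "MIPT", "HSE", None, ""]
for name, share in university_shares(profiles).items():
    print(f"{name}: {share:.0%}")
```

Dropping MSU outright biases the result for schools whose graduates really do go there, which is the trade-off described above for Lyceum No. 1533.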


For other schools, you can try searching on obr.msk.ru

Source: https://habr.com/ru/post/270675/

