
API for the Russian Public Initiative. Step 1: data collection and analysis

By way of introduction


You all remember the Russian Public Initiative (www.roi.ru), a platform announced by the state, represented by the federal government, for collecting signatures on online petitions. The idea is that if a petition collects 100 thousand votes within 1 year, it will be officially considered by the authorities, and it even has a chance of becoming a bill.

So far 6 petitions have passed this filter (https://www.roi.ru/complete/): two of them actually collected 100 thousand votes each, and 4 collected far fewer votes, but the authorities still managed to react to them.

And although a petition does not guarantee that any decision will be made at all, many people create petitions not only in the hope of a positive decision but also to put the problem on the “media agenda”, in other words, so that the media write about it and the state reacts publicly.
So ROI is, for now, far from the least significant of the state projects, and there is interest in it. At the same time, ROI has a number of drawbacks and problems.

ROI problems



Authorization through ESIA (Gosuslugi)

Much has already been written about this. Authorization did, of course, lead hundreds of thousands of people to register on Gosuslugi in order to vote, but it is still a barrier. Registration is not simple, and so far not all citizens have an account. Online registration tied to a mobile phone number, for example, could have been offered instead.
This is a limitation that we cannot yet overcome.

Open Data and APIs

Many people are interested in ROI not only because of their own petitions but also because of petitions in general. Petitions are interesting material for anyone who wants to understand what worries citizens and which problems affect people the most.
Open data is needed for many tasks:


Collecting data



Before starting to build a full-fledged API for ROI, I began by modeling how information could be collected from it and wrote this short document on Github: the API for ROI

There I sketched out the basic concepts that exist in the system and that can, in theory, be extracted.
Two limitations showed up immediately:
  1. Data on votes for / against is available only to authorized users. Since authorization goes through Gosuslugi (state services), this imposes certain restrictions. Authorization can of course be overcome, but for now we take the straightforward route and collect only the data that has no such restrictions.
  2. The data is split between the petition page and the petition list. The list contains the number of votes for, while on the page, as already noted, vote data is shown only after authorization.


To download the data I wrote a small script that fetches data from the list of petitions and from their pages and then merges them into one common description. MongoDB is used as storage. The script can be downloaded and viewed here: github.com/ivbeg/apiroi/blob/master/scripts/data_extract.py
The script is as simple as possible and will, of course, be thoroughly reworked later so that it updates the petitions regularly and converts them into a single format right away.
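
The merge step can be illustrated with a minimal sketch. This is not the code of data_extract.py: it assumes that each list entry and each page record is a dictionary sharing a petition identifier, and the field names (`id`, `title`, `votes_for`, `description`) are hypothetical, chosen only for illustration. Storage in MongoDB is left out so that the example stays self-contained.

```python
def merge_petitions(list_items, page_items):
    """Combine list records and page records into one description per petition.

    All field names used here are assumptions for illustration only.
    """
    # Index the page records by petition id for quick lookup.
    pages_by_id = {page["id"]: page for page in page_items}
    merged = []
    for item in list_items:
        record = dict(item)  # start from the list entry (it carries the votes)
        # Add page-only fields without overwriting what the list provided.
        for key, value in pages_by_id.get(item["id"], {}).items():
            record.setdefault(key, value)
        merged.append(record)
    return merged


# Hypothetical sample data, not taken from roi.ru.
list_items = [
    {"id": 1, "title": "Example petition", "votes_for": 1200},
    {"id": 2, "title": "Petition without a page record", "votes_for": 15},
]
page_items = [{"id": 1, "description": "Full text of the petition"}]
petitions = merge_petitions(list_items, page_items)
```

A petition whose page could not be fetched simply keeps the fields from the list, which seems a reasonable default for a first dump.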

The data was collected quite quickly; it took just a few hours. I will not go into detail on how parsers are written: this is a very simple case with no surprises.
The resulting data is now available on Github at github.com/ivbeg/apiroi/tree/master/scripts/data/raw and on the open data hub at hubofdata.ru/dataset/roi-dump

So, the data is collected, what next?

Analyzing data


I called this a post about the API because the API is the ultimate goal. While building it, however, we can work out how to make the API as convenient as possible, whether any extra data should be included in it, and whether additional data slices should be derived from what has been collected. After all, an API does not have to be limited to returning data; it can perform many more tasks.

To begin with, let us think about what we can extract from our data that is convenient for visualization. Suppose the API consumers are the media and others who want to present the data visually.
Here are some thoughts on what might be interesting:
1. Estimate the likelihood that an initiative will collect 100 thousand votes.
2. Evaluate the intensity of voting for an initiative.
3. Determine the authors who attract the most votes.
4. Identify the most popular topics.
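
Indicators such as items 3 and 4 come down to grouping petitions and summing their votes. The sketch below is only an illustration of that idea and is not the code of data_process.py; the field names `author`, `topic` and `votes_for` are hypothetical, and the sample records are invented.

```python
from collections import Counter


def votes_by_key(petitions, key):
    """Sum the votes of all petitions that share the same value of `key`.

    Returns (value, total_votes) pairs, largest total first.
    """
    totals = Counter()
    for petition in petitions:
        totals[petition[key]] += petition["votes_for"]
    return totals.most_common()


# Hypothetical sample records; the field names are assumptions.
sample = [
    {"author": "A", "topic": "transport", "votes_for": 500},
    {"author": "B", "topic": "health", "votes_for": 300},
    {"author": "A", "topic": "health", "votes_for": 100},
]
top_authors = votes_by_key(sample, "author")
top_topics = votes_by_key(sample, "topic")
```

The same function serves both indicators, so an API could expose it as one generic "totals by field" slice.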

To start calculating all of this, the data_process.py script was written. It is also on Github, and the indicators above were calculated with it.
The refined subfolder of the data folder contains the results of the preliminary calculations in JSON.

How do we assess the likelihood that an initiative succeeds? Ideally we would have voting statistics by day for the entire lifetime of the initiative, but the situation is not ideal: such detail is available only to the authors of initiatives.
So for now the prediction formula is very simple. We estimate how many people may eventually vote as:
votes + (votes / (probe_date_seconds - start_date_seconds)) * (end_date_seconds - probe_date_seconds)


In other words, the calculation assumes that people will keep voting at the same rate as before, so that votes are spread roughly evenly over time. This is most likely not true, and much depends on the media activity of the initiators, but it gives a first approximation.
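
The formula above can be written as a short Python sketch. The function name `forecast_votes` and the sample dates and vote count are assumptions made for illustration; only the arithmetic follows the formula in the text, with one addition: nothing is extrapolated once the collection period is over.

```python
from datetime import datetime


def forecast_votes(votes, start_date, probe_date, end_date):
    """Linear extrapolation of the vote count to the end of the collection period."""
    elapsed = (probe_date - start_date).total_seconds()
    remaining = (end_date - probe_date).total_seconds()
    if elapsed <= 0:
        raise ValueError("probe_date must be later than start_date")
    rate = votes / elapsed  # average votes per second so far
    # If the collection period has already ended, nothing is added.
    return votes + rate * max(remaining, 0)


# Hypothetical petition: 20000 votes collected in the first 100 days of a 365-day period.
start = datetime(2013, 1, 1)
probe = datetime(2013, 4, 11)
end = datetime(2014, 1, 1)
predicted = forecast_votes(20000, start, probe, end)
```

For this invented petition the forecast comes to about 73 thousand votes, so at the current rate it would fall short of the 100 thousand threshold.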
The first analysis produced the picture shown in the screenshot.


Turns out that:


Or the same picture


From this I conclude that it is useful to include many additional features in the API:


It is a pity that the creators of ROI themselves make no effort to open it up in terms of API and data.
But the first step has been taken: the first data dump exists, and there is an example script for collecting it, so now anyone can build such an API. I will write more about this in subsequent posts.

Source: https://habr.com/ru/post/200682/

