
Open data on public services of the Russian Federation

I am sure that many of you, perhaps all of you, have already come across the public services website.
Whether people consider it good or bad, I can see that there is interest in it.
However, I personally believe that open data is needed for that interest to be fully realized.

And such open data exists. It is not provided by the Ministry of Communications but extracted from the public services website by a special parser, yet it exists.

For example, a month ago this data let me get some interesting figures on the organizations listed on the site and their contacts.

I will quote from that post:
The public services website lists 19989 registered state organizations.

Across all organizations there are 6730 unique email addresses (some structures share the same addresses, so only unique ones are counted). Of these:

- 412 (6%) - filled in incorrectly and fail validation
- 59 (1%) - point to non-existent domains
- 1517 (22.5%) - addresses at free email services such as Mail.ru, Google Mail, Yandex.Mail and Rambler Mail
The breakdown of the free addresses by provider:
- 982 (64.7%) - Mail.ru
- 305 (20.1%) - Yandex.Mail
- 118 (7.8%) - Rambler Mail
- 112 (7.4%) - Google Mail
- 30 (1.97%) - HotMail
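
A breakdown like the one above could be reproduced roughly as follows. This is only a minimal sketch, not the code used for the original post: the validation regex is deliberately simple, the `FREE_PROVIDERS` mapping is an illustrative assumption that covers a single domain per provider, and the function names are hypothetical.

```python
import re
from collections import Counter

# Deliberately simple syntax check; an assumption, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")

# Hypothetical mapping of domains to free mail providers (incomplete on purpose).
FREE_PROVIDERS = {
    "mail.ru": "Mail.ru",
    "yandex.ru": "Yandex.Mail",
    "rambler.ru": "Rambler Mail",
    "gmail.com": "Google Mail",
    "hotmail.com": "HotMail",
}


def classify_emails(addresses):
    """Count invalid and free-provider addresses among the unique emails."""
    # Normalize case and whitespace so duplicates collapse into one entry.
    unique = {a.strip().lower() for a in addresses if a and a.strip()}
    invalid = 0
    providers = Counter()
    for address in unique:
        if not EMAIL_RE.match(address):
            invalid += 1
            continue
        domain = address.rsplit("@", 1)[1]
        provider = FREE_PROVIDERS.get(domain)
        if provider:
            providers[provider] += 1
    return {"unique": len(unique), "invalid": invalid, "free": providers}


def percent(part, total):
    """Share of `part` in `total` as a percentage rounded to one decimal."""
    return round(100.0 * part / total, 1) if total else 0.0
```

Checking for non-existent domains would additionally need a DNS lookup per domain, which this sketch leaves out.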

However, I looked at all this from only one angle, and I am quite sure there are many more problems. For example, many contact phone numbers are completely wrong, a huge number of organizations have no service locations, many organizations are not linked to any services at all, most organizations have no contacts, and so on.
Surely many of you will be able to find interesting data there for visualization and analysis.

The data itself is available in formats suitable for use with MongoDB:
- in JSON format, exported with mongoexport - http://export.opengovdata.ru/raw/gs_json.7z
- in BSON format, dumped with mongodump - http://export.opengovdata.ru/raw/gs_bson.7z
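
The JSON export can also be read without a MongoDB instance. By default mongoexport writes one JSON document per line, so a small reader is enough. The sketch below assumes that layout and that the archive has already been unpacked. The function names are hypothetical, and the file names inside the archive are not known here.

```python
import json


def iter_documents(lines):
    """Yield one dict per non-empty line of mongoexport-style JSON output."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between documents
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line is broken instead of failing anonymously.
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc


def load_collection(path):
    """Read a whole exported collection into a list (fine for small dumps)."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_documents(handle))
```

For example, `load_collection("orgs.json")` would return the organizations as a list of dicts, where `orgs.json` is a hypothetical name for the unpacked export of the orgs collection. Values such as `_id` appear in MongoDB's Extended JSON form, for instance as `{"$oid": "..."}`.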

The dataset is focused more on analyzing organizations than public services, so the main collection there is orgs. There are also several auxiliary collections that were used to compute the statistics on domains, email addresses and so on.
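
Problems such as organizations without contacts or without linked services can be counted directly from the orgs documents. The following is a minimal sketch that works on plain dicts. The field names `emails`, `phones` and `services` are assumptions made for illustration, so check them against the real documents before relying on the numbers.

```python
# Hypothetical field names; the real keys in `orgs` may differ.
CHECKED_FIELDS = ("emails", "phones", "services")


def missing_field_counts(orgs, fields=CHECKED_FIELDS):
    """Count, per field, how many organizations have it absent or empty."""
    missing = {field: 0 for field in fields}
    for org in orgs:
        for field in fields:
            # A missing key, None, an empty list and an empty string all count.
            if not org.get(field):
                missing[field] += 1
    return missing
```

Dividing each count by the total number of organizations gives the share of records affected by each problem.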

The data structure is described below.

Collection orgs - organizations


Collection pages - pages


Collection domains - website domains (derived from the email addresses)


Collection mx_servers - mail servers


Collection emails - email addresses from contacts of organizations


Collection services - government services
The description is still incomplete: services have only names and links to organizations.


For those of you wondering how to work with this data, I suggest looking at the catalog on OpenGovData.ru, which lists data you can try to use to improve or analyze the data on public services.

I can also send the code for retrieving and parsing data from the public services website to anyone who wants it. I will publish it soon in any case, but it is not really ready for publication yet: it has no comments or explanations.

Source: https://habr.com/ru/post/117565/

