Open data on public services of the Russian Federation

I am sure many of you, perhaps all of you, have already come across the public services website.
From what I observe, whatever people think of it, there is interest in it.
But to make full use of that interest, I believe open data is needed.

And such open data exists. It is not provided by the Ministry of Communications but extracted from the public services website by a special parser, yet it exists.

For example, a month ago this data let me compute some interesting figures about the organizations on the site and their contact details.

I will quote from that post:
19989 government organizations are registered on the public services website.

Together, the organizations have 6730 unique email addresses (some addresses are repeated across structures, so only unique ones are counted). Of these:

- 412 (6%) are filled in incorrectly and do not pass validation
- 59 (1%) point to non-existent domains
- 1517 (22.5%) are addresses at free email services such as Mail.ru, Google Mail, Yandex.Mail and Rambler Mail
Breakdown by provider:
- 982 (64.7%) - Mail.ru
- 305 (20.1%) - Yandex.Mail
- 118 (7.8%) - Rambler Mail
- 112 (7.4%) - Google Mail
- 30 (1.97%) - HotMail
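
A breakdown like this can be computed with a few lines of Python. This is only a rough sketch, not the code used for the original analysis: the domain-to-provider mapping below is my own illustrative assumption (the real analysis may have matched more domains per provider), and the input is assumed to be a plain list of address strings.

```python
from collections import Counter

# Hypothetical mapping of domains to free mail providers; the exact
# domains matched in the original analysis are not known.
FREE_PROVIDERS = {
    "mail.ru": "Mail.ru",
    "yandex.ru": "Yandex.Mail",
    "rambler.ru": "Rambler Mail",
    "gmail.com": "Google Mail",
    "hotmail.com": "HotMail",
}


def free_mail_stats(emails):
    """Return (number of unique addresses, Counter of free-mail providers)."""
    # Normalize case and whitespace so duplicates collapse into one entry;
    # strings without "@" are skipped as obviously broken.
    unique = {e.strip().lower() for e in emails if e and "@" in e}
    counts = Counter()
    for address in unique:
        domain = address.rsplit("@", 1)[1]
        provider = FREE_PROVIDERS.get(domain)
        if provider:
            counts[provider] += 1
    return len(unique), counts
```

Dividing each counter value by the total number of free-mail addresses gives shares in the same form as the list above.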

However, I looked at all this from only one angle, and I am quite sure there are many more problems. For example, in many cases the contact phone numbers are plainly wrong, a huge number of organizations have no places of service, many organizations are not linked to any services at all, most organizations have no contacts, and so on.
Many of you will surely find interesting data there for visualization and analysis.

The data itself is available in formats suitable for MongoDB:
- in JSON format, exported with mongoexport - http://export.opengovdata.ru/raw/gs_json.7z
- in BSON format, dumped with mongodump - http://export.opengovdata.ru/raw/gs_bson.7z
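
If you would rather not run MongoDB at all, the JSON export can be read directly. A minimal Python sketch, assuming the archive unpacks to files with one JSON document per line (the default mongoexport layout); the file name orgs.json in the usage comment is a guess, not something I can confirm from the archive.

```python
import json


def read_mongoexport(lines):
    """Yield one dict per non-empty line of mongoexport output."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Usage (file name is an assumption):
# with open("orgs.json", encoding="utf-8") as f:
#     for org in read_mongoexport(f):
#         print(org["name"])
```

If a file turns out to be a single JSON array instead (as produced by mongoexport --jsonArray), calling json.load on the whole file would be the way to go.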

The dataset is geared toward analyzing organizations rather than public services, so the main collection is orgs. There are also several auxiliary collections that were used to compute the statistics on domains, email addresses and so on.

The data structure is described below.

Collection orgs - organizations

_id - unique organization code in the system, MongoDB identifier
key - unique code of the organization on the Gosuslugi website
name - organization name
url - link to the organization on the public services website
level - the organization's level of subordination
parent - code of the parent organization, if any
profile - an array of two-string arrays holding the fields from the organization's profile
childs - child organizations, as a dictionary
childs.num - number of child organizations
childs.list - array of child organization codes
services - dictionary describing the services provided by this organization
services.exists - flag indicating that the organization's services block exists
services.items - an array of service dictionaries with name and url fields
suborgs - dictionary of subordinate organizations
suborgs.exists - flag indicating that the organization's subordinate organizations block exists
suborgs.items - an array of subordinate organization dictionaries with the fields key, name and url
unknown - the "unknown" block of the page, as a dictionary. Present only if there are no other blocks.
unknown.exists - flag indicating that the "unknown" block exists
unknown.items - array, always empty
contacts - dictionary of the organization's contacts
contacts.exists - flag indicating that the organization's contacts block exists
contacts.items - an array of strings with contacts
places - dictionary of places of service
places.exists - flag indicating that the organization's places of service block exists
places.items - an array of strings describing the places of service
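
The exists/items pairs make it easy to check the gaps mentioned earlier (organizations without contacts, places of service or services). A small sketch, assuming each organization is a plain dict shaped as described above; treating a block as missing when it is absent, flagged as not existing, or empty is my own reading of the fields.

```python
def missing_blocks(orgs):
    """Count organizations lacking contacts, places or services."""
    totals = {"contacts": 0, "places": 0, "services": 0}
    for org in orgs:
        for block in totals:
            info = org.get(block) or {}
            # Missing means: no block at all, exists is false,
            # or the items array is empty.
            if not info.get("exists") or not info.get("items"):
                totals[block] += 1
    return totals
```

Passing in the documents of the orgs collection returns a dictionary with one counter per block.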

Collection pages - pages

_id - unique code in the system, MongoDB identifier
url - link to the requested page
rurl - URL of the page after the redirect from the public services website
page - fragment of the page's HTML content

Collection domains - site domains (based on the email address data)

_id - unique code in the system, MongoDB identifier
domain - domain
has_a - flag indicating that the domain has an A record in DNS
a - an array of dictionaries with a name field and a list of results of the A query to DNS
has_mx - flag indicating that the domain has an MX record in DNS
mx - an array of dictionaries with the fields name (server name), l2_dom (second-level domain of the server), priority, and a list of results of the MX query to DNS

Collection mx_servers - mail servers

_id - unique code in the system, MongoDB identifier
domain - mail server domain
l2_dom - second level domain
num_domains - the number of domains using this MX server
domains - an array of domains using this MX server
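
As an example of what this collection allows, here is a sketch that groups mail servers by their second-level domain and counts how many distinct domains each mail provider serves. It assumes the documents are plain dicts with the fields listed above; merging the domains arrays instead of summing num_domains is my choice, so that a domain using several MX servers of one provider is counted once.

```python
from collections import defaultdict


def domains_per_provider(mx_servers):
    """Return (l2_dom, distinct domain count) pairs, largest first."""
    by_provider = defaultdict(set)
    for server in mx_servers:
        by_provider[server["l2_dom"]].update(server.get("domains", []))
    # Sort by count descending, then by name for a stable order.
    return sorted(
        ((provider, len(names)) for provider, names in by_provider.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
```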

Collection emails - email addresses from contacts of organizations

_id - unique code in the system, MongoDB identifier
email - email address
domain - domain of the email address
parsed - flag indicating that the email address has been parsed
valid - flag indicating that the email address is correct
has_a - flag indicating that the domain has an A record in DNS
a - an array of dictionaries with a name field and a list of results of the A query to DNS
has_mx - flag indicating that the domain has an MX record in DNS
mx - an array of dictionaries with the fields name (server name), l2_dom (second-level domain of the server), priority, and a list of results of the MX query to DNS
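
The flags in this collection are enough to sort addresses into rough quality buckets. A sketch under my own assumptions: documents are plain dicts, and a domain is treated as "dead" when it has neither an A nor an MX record, which may differ from how the non-existent domains in the figures above were actually determined.

```python
def email_quality(emails):
    """Split email documents into invalid / dead_domain / ok counts."""
    summary = {"invalid": 0, "dead_domain": 0, "ok": 0}
    for doc in emails:
        if not doc.get("valid"):
            summary["invalid"] += 1
        elif not doc.get("has_a") and not doc.get("has_mx"):
            # Assumption: no A and no MX record means the domain is dead.
            summary["dead_domain"] += 1
        else:
            summary["ok"] += 1
    return summary
```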

Collection services - public services
The description is still incomplete: the services have only names and links to organizations.

_id - unique code in the system, MongoDB identifier
name - name of the public service
url - link to the service on the public services website
num_orgs - number of organizations providing this service
orgs - an array of codes of the organizations providing this service
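
Even with names and organization links only, the services can be ranked by how many organizations provide them. A short sketch, assuming plain dicts with the fields above; it counts the orgs array directly rather than trusting num_orgs, which is simply a cautious choice on my part.

```python
def top_services(services, limit=10):
    """Return (name, number of organizations) for the most common services."""
    ranked = sorted(
        services,
        key=lambda service: len(service.get("orgs", [])),
        reverse=True,
    )
    return [(s["name"], len(s.get("orgs", []))) for s in ranked[:limit]]
```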

For those of you wondering how to work with this data, I suggest looking at the catalog on OpenGovData.ru for datasets that could be used to enrich or analyze the public services data.

I can also send the code for retrieving and parsing the data from the public services website to anyone who wants it. I will publish it openly soon in any case, but for now it is not really ready for publication: it has no comments or explanations.

Source: https://habr.com/ru/post/117565/
