In my experience of talking with developers who have taken part in open data contests, they all say that data is needed at the highest possible level of detail.
For example, not statistics by region, but statistics by municipality. Not a summary of crimes or accidents, but records with addresses and coordinates.
Not just the addresses of institutions with coordinates, but detailed information about each one.
Yet, frankly, there is very little of such detailed data in a convenient form. Take Moscow as an example: even on the Moscow portal
data.mos.ru most of the data is geodata, or geo-referenced data consisting of an address and some other minimal information. Clearly, it is hard to do anything really interesting with it. So let us thank the Government of Moscow for publishing at least this much, and try to work out where to get more interesting data and what to do with it.
Contests and competitions
When asked why this is needed, the answer comes right away: it is impossible to hold any contest or hackathon for developers without enough interesting data. We ran into this at the
Yandex hackathon, the last Apps4Russia contest and many others.
So now, since we are helping to prepare the
API Challenge competition, we have decided to prepare as much useful data as possible. And since the API Challenge is a competition run by the Moscow authorities and focused on Moscow, we are collecting data on Moscow.
To do this, we started going through dozens of government sites, looking for anything that can be used legally and to good effect.
How it happened and continues
First you need to understand where to look for the data. The universal formula comes down to four kinds of sources.
- Official websites of authorities
- Sites of territorial divisions of federal bodies (FSIN, Ministry of Justice, Ministry of Internal Affairs and others)
- Sites of state-owned enterprises and state-regulated monopolies
- Sites of municipalities
The last item is weak for Moscow and applies only to the new territories, but all the others exist and are accessible.
We looked through the sites of all the departments, having found a list of them on
www.mos.ru. There is not a lot of interesting data there, but there is enough. Some of it has already been published on data.mos.ru, while the rest takes substantial effort to extract from PDF documents. For example,
the Mosecomonitoring reports are large PDF documents that cannot be converted into data by hand.
Next come the sites of the territorial offices of federal bodies. Moscow, like every region, hosts offices of a large number of federal bodies, since in our country many functions of government are divided between the federal centre and the regions. In particular, the Ministry of Internal Affairs, the Federal Penitentiary Service, the bailiff service, the Prosecutor's Office and many others are federal. We went through many of their sites: first we found a list of them on the website of the Government of the Russian Federation, then we visited each one and found its Moscow section.
And finally, the data of state-owned enterprises and regulated corporations is the most complicated to use. The point is that natural monopolies are obliged to publish a lot of data under the orders of the FAS and the Federal Customs Service, and this data is purely public domain, with no restrictions on it. These sections of their sites are usually called "Information Disclosure." For the rest of the information on their sites there is no legal clarity; here a city policy regulating its openness is needed. Nevertheless, such data is quite suitable for a developer competition when it has high social value.
What we found
I will list the data right away, with links to the datasets that we extracted and that can be downloaded and used immediately.
We post all the data we collect on our
Hub of open data. It is an open, non-commercial project modelled on
thedatahub.io from the Open Knowledge Foundation. Everything placed on it will always remain open, and the portal lets anyone download all of the data through the CKAN API.
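As a rough illustration of the CKAN API access mentioned above, here is a minimal sketch that lists the downloadable files of one dataset. It assumes the hub exposes the standard CKAN action endpoint `package_show` under `/api/3/action/`; the base URL and the API version the hub actually runs are assumptions, and the function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the hub serves the standard CKAN action API under this base URL.
BASE_URL = "http://hubofdata.ru"


def package_show_url(dataset_id, base_url=BASE_URL):
    """Build the CKAN 'package_show' URL for one dataset."""
    return base_url + "/api/3/action/package_show?" + urlencode({"id": dataset_id})


def resource_links(response):
    """Extract (format, url) pairs from a decoded package_show response."""
    if not response.get("success"):
        raise ValueError("CKAN reported an error")
    return [(r.get("format", ""), r.get("url", ""))
            for r in response["result"].get("resources", [])]


def fetch_resource_links(dataset_id):
    """Download the dataset description and return its file links (needs network)."""
    with urlopen(package_show_url(dataset_id)) as reply:
        return resource_links(json.load(reply))
```

Calling `fetch_resource_links("mosadv")` would then return the list of files for the register of lawyers, provided the endpoint behaves as assumed.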
Register of lawyers
This data is published on the website of the Moscow office of the Ministry of Justice of Russia.
We downloaded it and converted it to JSON, CSV and XLS with normalized fields. The data can now be downloaded here -
http://hubofdata.ru/dataset/mosadv
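Converting to several formats with normalized fields could look roughly like the sketch below. It is not the script we actually ran: it assumes the rows have already been read from the source file into a list of dictionaries, and the normalization rule (lowercase, non-word characters collapsed to underscores) is just one plausible choice.

```python
import csv
import io
import json
import re


def normalize_field(name):
    """Turn a raw column header into a lowercase snake_case field name."""
    name = re.sub(r"\W+", "_", name.strip().lower())
    return name.strip("_")


def normalize_rows(rows):
    """Rename the keys of every row and trim whitespace in string values."""
    cleaned = []
    for row in rows:
        cleaned.append({
            normalize_field(key): value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        })
    return cleaned


def to_csv(rows):
    """Serialize rows to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows to JSON, keeping Cyrillic text readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

Since `\W` is Unicode-aware in Python 3, Cyrillic headers keep their letters and only spaces and punctuation become underscores.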
Register of notaries
The data, again, comes
from the site of the Ministry of Justice.
The story is exactly the same: it was originally an XLS file, we simply downloaded it, processed it in OpenRefine, converted it to JSON and CSV, and posted it here -
http://hubofdata.ru/dataset/mos-notary
Moscow prisons
A very short list of prisons is available on the site of the FSIN office for Moscow -
http://www.77.fsin.su/structure/
With a very simple parser it was turned into the same formats (JSON, CSV, XLS) and posted here -
http://hubofdata.ru/dataset/mos-prisons
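To give an idea of what "a very simple parser" can mean, here is a sketch built on Python's standard `html.parser`. It is not the parser we used, and it assumes each institution appears on the page as an ordinary link; the real markup of the FSIN page may differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_links(html):
    """Return all (text, href) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

The resulting pairs can be filtered by URL prefix and written out as rows in whatever format is needed.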
Contacts of Mosgaz units by street
While the previous three datasets were government data from federal authorities, the next one is the contact data of Mosgaz, a Moscow enterprise that is subject to information disclosure laws and orders.
Mosgaz has a section where you can enter a street and find the contacts of its units. Here it is:
http://www.mos-gaz.ru/services/territory/
Since this section turned out to be fairly simple AJAX code, it took little time to extract all the contacts and all the units, and we posted a large dataset of contacts,
http://hubofdata.ru/dataset/mosgaz-contacts, which contains files mapping streets to districts and files mapping units to districts.
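The general approach to draining an AJAX search form like this one can be sketched as follows. The sketch is deliberately abstract: `fetch` stands for whatever HTTP request the form sends (its URL and parameters are not shown here and would have to be read from the page), and the record fields in the usage note are hypothetical.

```python
def harvest(queries, fetch):
    """Call fetch(query) for every query and merge the records without duplicates.

    fetch is any callable returning a list of flat dicts with hashable values;
    in a real run it would wrap the HTTP request that the page's form sends.
    """
    seen = set()
    merged = []
    for query in queries:
        for record in fetch(query):
            # A record's identity is the full set of its key/value pairs.
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged
```

For example, running `harvest` over a list of street-name prefixes with a `fetch` that returns records such as `{"street": ..., "unit": ...}` yields every distinct record once, even when several prefixes match the same street.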
Addresses of Mosenergo CHP, HPP and GRES plants
The site of Mosenergo, one of Moscow's natural monopolies, lists the addresses of its CHP, HPP and GRES plants -
http://www.mosenergo.ru/catalog/228.aspx
The list is rather small, but useful to anyone interested in such data.
It was easy to parse, and we posted it here -
http://hubofdata.ru/dataset/mosenergo-filials
This data is useful for anyone who decides to build applications about the environmental situation in Moscow and, I will say right away, we have not yet managed to process all of Mosenergo's data. The section “
Statistical report on the 2TP-air form ” holds many public reports in XLS format, one per station, on how much each of them emits. Perhaps someone will be ready to collect them and put them together.
Addresses and characteristics of branches of the Russian Post
Russian Post is not an authority but a state enterprise, and one that is often criticized for the quality of its work. It has data on its branches and publishes it on several of its websites, the main one being
their website.
We pulled the data on their Moscow offices, including coordinates, addresses, postal codes, opening hours and so on. This data could not be packed into CSV in any simple way, so it is available as a single JSON file
http://hubofdata.ru/dataset/ruspost-msk
Noise complaints
On the site of the previously mentioned Mosecomonitoring we found a small but curious dataset of complaints from city residents. The complaints are collected here -
http://www.mosecom.ru/noise/territ/noise_stroy_pl_2013.php
They even include an address, which means they can be plotted on a map if desired.
We pulled this data with a parser as well and uploaded it to the hub -
http://hubofdata.ru/dataset/msk-noise-req
Addresses of non-profit organizations
And here come the largest datasets. In this case we looked at the website of the Ministry of Justice and found that the register of non-profit organizations can be retrieved by region. Here -
http://unro.minjust.ru/NKOs.aspx
In fact, we did this a long time ago, at the beginning of this year, and the data was "gathering dust on the shelf." Now we have converted it into formats that are convenient to work with and posted it on the hub -
http://hubofdata.ru/dataset/mos-nko-2013
Please note that the data is split by type of organization, in case you want to work with religious organizations separately from the rest.
Database of Moscow houses with links to constituencies and construction dates
And finally, the data that may come in handy most of all. On several sites we found detailed data on every house in Moscow. These are sites such as dom.mos.ru, gorod.mos.ru, reformazhkh.ru, mosgorizbirkom.ru and several others.
We did not have time to process all of them and realize the dream of putting all the data on houses into a single database, but we took the first step: we parsed several of the databases and made their further integration possible.
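One possible way to approach that integration is to join records from different sources on a normalized address. The sketch below is only an illustration of the idea, not code from our repository; the `address` field name and the crude normalization rule are assumptions, and real Moscow addresses would need far more careful handling.

```python
import re


def address_key(address):
    """Reduce an address to a crude comparison key: lowercase, no punctuation."""
    return " ".join(re.sub(r"[^\w\s]", " ", address.lower()).split())


def merge_by_address(primary, secondary, field="address"):
    """Attach fields from secondary records to primary records with the same key."""
    index = {address_key(r[field]): r for r in secondary}
    merged = []
    for record in primary:
        match = index.get(address_key(record[field]), {})
        combined = dict(match)
        combined.update(record)  # values from the primary source win on conflict
        merged.append(combined)
    return merged
```

Records without a counterpart in the second source are kept unchanged, so the merge never loses houses from the primary list.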
Now available:
This, of course, is not all. There will be more data, and we will regularly upload it to the hub.
All the script code that we use is posted on GitHub:
https://github.com/infoculture/mosopendata
In summary, some conclusions and suggestions:
- Everything that we are now collecting and parsing for Moscow, we will propose that the officials at DIT disclose officially. I think they will not refuse, since it is already clear where to find the data. At least, that goes for the data under the jurisdiction of the Moscow authorities; for federal data, requests to the federal authorities will take longer.
- You can easily do the same for your favorite region or city and build an open data portal for the city, or upload the data to our hub or anywhere else for public access.
- Participate in contests and competitions, both the one I mentioned above and all those still to come. It is not only a chance to test your skills but also a chance to win a substantial prize.