
Creating your own hub to publish open data

The theme of open government and open data is gaining momentum and popularity in many countries around the world, among both governments and organizations. Quite recently Russia passed a law on open data, which shows growing interest in the topic. In Ukraine, too, the government is moving towards publishing open data. Since the topic is popular, you can make money on it or simply join the trend. In addition, contests, festivals and hackathons for building websites and applications that publish open data are held every year.

Open data is publicly available information presented in machine-readable form: a form that developers can load into databases, analyze, and present far more clearly and understandably than state information systems do.
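To make "machine-readable" concrete, here is a minimal Python sketch of what a developer might do with a dataset published as CSV: parse it and load it into a database for analysis. The dataset, its columns (region, year, schools) and the table name are hypothetical, chosen only for illustration; SQLite stands in for whatever database you actually use.

```python
import csv
import io
import sqlite3

# Hypothetical open dataset published as CSV (columns are illustrative only).
CSV_TEXT = """region,year,schools
North,2012,120
South,2012,95
North,2013,124
"""


def load_csv_into_sqlite(csv_text: str) -> sqlite3.Connection:
    """Parse CSV text and load its rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schools (region TEXT, year INTEGER, schools INTEGER)"
    )
    # DictReader yields one dict per row; named placeholders map keys to columns.
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO schools (region, year, schools) "
        "VALUES (:region, :year, :schools)",
        reader,
    )
    conn.commit()
    return conn


def schools_per_region(conn: sqlite3.Connection) -> dict:
    """Aggregate the loaded data: total number of schools per region."""
    rows = conn.execute(
        "SELECT region, SUM(schools) FROM schools GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    connection = load_csv_into_sqlite(CSV_TEXT)
    print(schools_per_region(connection))
```

Because the columns are declared INTEGER, SQLite converts the numeric CSV strings to integers on insert, so the SUM works without explicit casting.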

I would like to share my personal experience of creating a site for publishing open data. I used the open source platform CKAN. It is up to you whether to take a similar path, use another platform, or write your site from scratch. I hope this article helps you make the right choice.

CKAN is a data management system that makes data accessible by providing tools that simplify publishing, sharing, finding and using it. More than 50 countries, organizations and cities have chosen this platform to publish their data, among them the UK, the USA, the Czech Republic, Australia and Brazil; the full list is impressive. The platform itself is written in Python. A detailed article about it is available in English, and another in Russian.

CKAN installation


The documentation contains detailed installation instructions. In practice, however, not everything works as smoothly as described there: I spent quite a few days figuring things out and getting the platform installed. The developers also offer paid installation, hosting and maintenance of the platform. They used to publish prices on their site, but no longer do. Here, however, we are interested in CKAN as a free platform. You can also fork the project if you wish; one of the most popular forks is the UK government's open data hub.
You are offered two ways to install the platform: from a package or from source. The first way saves a great deal of nervous energy, but it only suits you if you have a supported system. At the moment that is Ubuntu 12.04 (until recently it was 10.04), and that is the system I recommend installing the platform on. If you are confident in your abilities, or you already have a configured system that you do not want to give up, the project wiki will help you. My environment was Ubuntu 12.04 on OpenVZ.

So, the first way is the package installation. It did not work for me, for the reason given above (a mismatch of OS versions). Even so, I can give you a couple of tips. Since this was my first experience administering a virtual server (and administering anything at all), experienced (bearded) admins may find my advice childish, but I hope beginners will find it useful.


!!! Pay attention to the version of the platform you install. CKAN is currently being translated into more than 30 languages, with varying success. The translation is done by volunteers, and each new version is released with a different set of translations. Check the translation status of the version you intend to install. I had to take part in translating the Russian and Ukrainian locales (versions 2.0 to 2.1), because the translations were not ready. Translation is done on the Transifex site. You have a choice: either install the latest version that has a translation, or take part in the translation yourself. Translation status of the Russian locale.

Installing CKAN from Package


1. Install the CKAN Package

Follow the instructions step by step. If there are no errors, carry on; if there are errors, switch to the second method. This rule applies to every step. But first look at what the error actually says: the cause may be your own mistake or the server settings.

2. Install PostgreSQL and Solr

Before installing the database, make sure /dev/null is readable and writable by everyone; otherwise you get the error /dev/null: Permission denied.
The fix is simple: become root and recreate the device:
# rm /dev/null && mknod -m 0666 /dev/null c 1 3
Check it:
# ls -la /dev/null
The permissions should look like this:
crw-rw-rw-
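If you prefer to check the result from a script rather than by eye, the sketch below (Python standard library only) verifies that /dev/null is a character device with major/minor numbers 1 and 3 and is readable and writable by everyone, which is what the mknod command above creates. The function name check_dev_null is hypothetical, and the 1:3 device numbers are Linux-specific.

```python
import os
import stat


def check_dev_null(path: str = "/dev/null") -> bool:
    """Return True if path is the Linux null device (char 1:3) with rw for all."""
    st = os.stat(path)
    # Must be a character device, not a regular file left behind by a broken setup.
    if not stat.S_ISCHR(st.st_mode):
        return False
    # On Linux, /dev/null is character device major 1, minor 3.
    is_null_device = os.major(st.st_rdev) == 1 and os.minor(st.st_rdev) == 3
    # crw-rw-rw- means read and write bits for owner, group and others (0o666).
    world_read_write = (stat.S_IMODE(st.st_mode) & 0o666) == 0o666
    return is_null_device and world_read_write


if __name__ == "__main__":
    print("OK" if check_dev_null() else "/dev/null needs fixing")
```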
After installing PostgreSQL, you must set the locale and text encoding. Install the language pack into the system (Russian; the Ukrainian variant is in parentheses):
apt-get install language-pack-ru-base (apt-get install language-pack-uk-base)
Stop and remove the default cluster (pg_dropcluster deletes the cluster together with all its data, so do this only on a fresh installation):
pg_dropcluster --stop 9.1 main
Then recreate it with the locale you need (note that all databases in the cluster will have the same locale):
pg_createcluster --locale ru_RU.UTF8 9.1 main (pg_createcluster --locale uk_UA.UTF8 9.1 main)
Reboot and check: the databases should now have the locale and encoding we need:
reboot
sudo -u postgres psql -l
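Reading the psql -l table by eye works, but with many databases a small script is handy. This sketch assumes you run psql -l with the -A (unaligned) and -t (tuples only) flags, i.e. sudo -u postgres psql -l -A -t, so each row comes out as |-separated fields starting with Name, Owner, Encoding, Collate, Ctype. The sample text is illustrative, not real output from a server, and the expected locale string must match exactly what your psql prints (use uk_UA.UTF8 for Ukrainian).

```python
# Illustrative output of: sudo -u postgres psql -l -A -t
# Multi-line access privileges spill onto continuation lines.
SAMPLE_OUTPUT = """postgres|postgres|UTF8|ru_RU.UTF8|ru_RU.UTF8|
template0|postgres|UTF8|ru_RU.UTF8|ru_RU.UTF8|=c/postgres
postgres=CTc/postgres
template1|postgres|SQL_ASCII|C|C|=c/postgres
postgres=CTc/postgres
"""


def parse_psql_list(output: str) -> dict:
    """Map database name -> (encoding, collate, ctype) from unaligned psql -l output."""
    databases = {}
    for line in output.splitlines():
        fields = line.split("|")
        # Continuation lines of a multi-line privileges value have too few fields.
        if len(fields) < 5:
            continue
        name, _owner, encoding, collate, ctype = fields[:5]
        databases[name] = (encoding, collate, ctype)
    return databases


def databases_with_wrong_locale(
    databases: dict, encoding: str = "UTF8", locale: str = "ru_RU.UTF8"
) -> list:
    """Return names of databases whose encoding or locale differs from the expected one."""
    return [
        name
        for name, (enc, collate, ctype) in databases.items()
        if (enc, collate, ctype) != (encoding, locale, locale)
    ]


if __name__ == "__main__":
    print(databases_with_wrong_locale(parse_psql_list(SAMPLE_OUTPUT)))
```

An empty list means every database in the cluster uses the expected encoding and locale.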

The developers recommend installing the solr-jetty package. In my experience, however, it does not work. I do not know why; I tried everything, but could not get it running and had to find a workaround. If you cannot get Solr running the standard way, here is the fix.
Set a variable to the Jetty version (the latest at the time of writing):
JETTY_VERSION=7.6.10.v20130312
Download it:
wget download.eclipse.org/jetty/$JETTY_VERSION/dist/jetty-distribution-$JETTY_VERSION.tar.gz
Unpack it:
tar xfz jetty-distribution-$JETTY_VERSION.tar.gz
Download Solr 3.6.2:
wget apache-mirror.telesys.org.ua/lucene/solr/3.6.2/apache-solr-3.6.2.zip
Unpack it:
unzip -q apache-solr-3.6.2.zip
Change into the example directory:
cd apache-solr-3.6.2/example/
Run Solr in the background:
nohup java -jar start.jar &
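Once start.jar is running in the background, you can check that Solr answers before pointing CKAN at it. This sketch assumes the Solr example's default Jetty port 8983 and the /solr/admin/ping handler from the example configuration; adjust the URL if your setup differs.

```python
import http.client
import urllib.request


def solr_is_up(base_url: str = "http://127.0.0.1:8983/solr", timeout: float = 5.0) -> bool:
    """Return True if Solr's ping handler answers with HTTP 200."""
    url = base_url.rstrip("/") + "/admin/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses: connection refused,
        # timeout or an error status all mean Solr is not usable yet.
        return False


if __name__ == "__main__":
    print("Solr is up" if solr_is_up() else "Solr is not responding")
```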

Follow the rest of the manual carefully, and you will soon see a working site.

Now for the second way, if you do not have Ubuntu 12.04.
Once again, I refer you to the wiki page on installing CKAN.

Installing CKAN from Source


1. Install the required packages

We are offered this set of packages:
sudo apt-get install python-dev postgresql libpq-dev python-pip python-virtualenv git-core solr-jetty openjdk-6-jdk
I recommend installing the following set instead (do not forget to run apt-get update first, and to apply the /dev/null fix described above):
sudo aptitude install python-dev postgresql-9.1 libpq-dev python-pip python-virtualenv git-core openjdk-6-jdk curl nginx gcc bcc tcc

3. Set up a PostgreSQL database

Plus the additional locale configuration described above.

5. Set up Solr

As described above.

9. You're done!

The documentation suggests running:
paster serve /etc/ckan/default/development.ini
To run it in the background, I suggest:
nohup paster serve /etc/ckan/default/development.ini &
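To confirm that paster is actually serving CKAN (especially when it runs in the background under nohup), you can query the action API. This sketch assumes the default development.ini port 5000 and the status_show action of the CKAN API v3, which reports the CKAN version; check that your CKAN version provides this action.

```python
import json
import urllib.request


def parse_status(body: str) -> str:
    """Extract the CKAN version from a status_show JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN API reported failure: %r" % payload.get("error"))
    return payload["result"]["ckan_version"]


def ckan_version(site_url: str = "http://127.0.0.1:5000", timeout: float = 5.0) -> str:
    """Call the status_show action on a running CKAN site and return its version."""
    url = site_url.rstrip("/") + "/api/3/action/status_show"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_status(response.read().decode("utf-8"))
```

On the server, print(ckan_version()) should print the installed version; a connection error means paster is not listening on that address and port.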

For testing on a local machine, these steps are enough. But if you want to move your platform to a server, I have one more piece of advice.

Deploying a Source Install


My main piece of advice (many thanks to ibegtin for it) is to use Nginx. This will greatly speed up your site. There is an excellent guide on setting up paster behind Nginx; it really helped me solve the problem of running the platform on a virtual server this way.

In every other respect, just follow the instructions and everything will work out. If you have any questions, you can ask me or write to the developers. You can also subscribe to the newsletter or follow the project's development on Twitter.

Useful resources


CKAN Storage Extension for Google Refine
Integrating CKAN and Drupal

Sites on the CKAN platform


List of sites working on this platform
Directory site running on CKAN that collects data about existing data hubs.
Hub of open data in the Russian Federation
Hub of open data in the Russian Federation on the activities of law enforcement authorities
An international hub running on the CKAN platform. You do not have to create your own hub: you can upload any open data there and use its API or link to that resource. The choice is yours. Good luck!
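Whether you run your own hub or use an existing one, the data is reachable through the CKAN action API. The sketch below builds a package_search request and extracts dataset names from the JSON response. The site URL is a hypothetical placeholder; the response layout (success, result.results[].name) follows the CKAN API v3.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the address of the CKAN hub you use.
SITE_URL = "https://ckan.example.org"


def package_search_url(site_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN API v3 package_search URL for a free-text query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return site_url.rstrip("/") + "/api/3/action/package_search?" + params


def dataset_names(body: str) -> list:
    """Return dataset names from a package_search JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN API reported failure: %r" % payload.get("error"))
    return [dataset["name"] for dataset in payload["result"]["results"]]


def search_datasets(site_url: str, query: str, rows: int = 10, timeout: float = 10.0) -> list:
    """Run package_search against a CKAN site and return matching dataset names."""
    url = package_search_url(site_url, query, rows)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return dataset_names(response.read().decode("utf-8"))
```

For example, search_datasets(SITE_URL, "budget") returns the names of up to ten matching datasets, which you can then open in the browser or fetch with package_show.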

Source: https://habr.com/ru/post/186708/

