
MyDataSpace - data publishing service


Many people have heard about the benefits of open data: in various places it saves public money, helps businesses, and more. Nevertheless, the quality of public data still leaves much to be desired, which greatly hinders progress in this area.


Clearly, there is no point in waiting for government bodies to start publishing data in a ready-to-use form. Besides, open data is not limited to government data.


That is why I would like to present MyDataSpace, a project designed to make open data more accessible to everyone.


MyDataSpace is a data publishing service. Anyone can publish data for free (under a free license or their own), and anyone can access it via the web interface or the API.


The service aims to provide reliable data storage and fast (close to real-time) access to that data from anywhere in the world via an API. MyDataSpace can be compared to the Google or Yandex geo-services, except that Google and Yandex collect the data themselves, whereas on our service the users do. We only provide reliable access to it.


Here are the main features of the service:



Now for more detail on some of these points.


Data import


The web interface for import is based on OpenRefine, a tool for cleaning data sets and performing complex data operations. It has its own expression language, GREL, which resembles the formula languages of Excel and OpenOffice.



OpenRefine is difficult to master, but it is open source and offers almost unlimited possibilities for processing data in different formats: JSON, JSONL, XML. It also lets you import data from ODT, XLS, XLSX and even from Google Docs.


One shortcoming is that OpenRefine loads all the data into memory at once, which limits the size of files that can be imported through it. For large files (over 500 MB) there is an API instead.
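To illustrate the difference from loading everything into memory, here is a minimal sketch of reading a JSONL file in fixed-size batches before handing each batch to an upload call. The function name `iter_batches` and the batch size are made up for illustration, and the upload step is deliberately left out because this post does not describe the import endpoints.

```python
import io
import json
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Yield lists of parsed JSONL records, at most batch_size per list."""
    # Parse lazily and skip blank lines, so only one batch is held in memory.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a large file opened with open(path, encoding="utf-8").
sample = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
batches = list(iter_batches(sample, batch_size=2))
```

With a real file, each batch would be sent to the service as soon as it is produced, so memory use depends on the batch size rather than on the file size.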


Tree data structure


Unlike similar services such as data.world or Firebase, MyDataSpace stores data as a tree. Each data element can be addressed by an absolute path, like a file in a file system. For example, you can look up the price of Bitcoin on the WEX exchange on March 5, 2018 at 14:45 (UTC):


 https://api.mydataspace.net/v1/entities?root=exchange_rates&path=btc_usd/wex/2018-03-05_14-45 
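As a rough sketch of how such an address is composed, the snippet below rebuilds the example URL from a root name and a list of path segments. Only the base URL and the `root` and `path` parameters come from the example above; the helper name `entity_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base URL taken from the example request in this post.
API_BASE = "https://api.mydataspace.net/v1/entities"


def entity_url(root: str, *path_segments: str) -> str:
    """Build the URL of one entity from its root and its path segments."""
    path = "/".join(path_segments)
    # safe="/" keeps the slashes in the path unescaped, as in the example URL.
    query = urlencode({"root": root, "path": path}, safe="/")
    return f"{API_BASE}?{query}"


url = entity_url("exchange_rates", "btc_usd", "wex", "2018-03-05_14-45")
print(url)
```

Treating the path as a sequence of segments mirrors the file-system analogy: moving to a different minute or a different exchange only changes one segment.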

All additional services are part of this tree. Thanks to this, a single API covers reading and changing data, adding tasks, uploading files and creating visualizations.


API


MyDataSpace is not just a data store. It can be used as a backend for a website or a mobile application. The API is designed to give the user as much flexibility as possible when working with data:



Data is stored in multiple MySQL shards and indexed in Elasticsearch. Because Elasticsearch is used only as an index, we can easily migrate to new versions and change mappings without fear of losing data and without stopping the service.
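The idea that an index can be thrown away and rebuilt because it holds no data of its own can be shown with a toy model. This is only an in-memory sketch: plain dictionaries stand in for MySQL and Elasticsearch, and the sample records, field names and the `build_index` helper are invented for illustration, not taken from the service.

```python
from collections import defaultdict
from typing import Dict, Set

# Stand-in for the primary store: path -> record (made-up sample records).
primary: Dict[str, dict] = {
    "btc_usd/wex/2018-03-05_14-45": {"pair": "btc_usd", "exchange": "wex"},
    "btc_usd/wex/2018-03-05_14-46": {"pair": "btc_usd", "exchange": "wex"},
    "eth_usd/wex/2018-03-05_14-45": {"pair": "eth_usd", "exchange": "wex"},
}


def build_index(store: Dict[str, dict], field: str) -> Dict[str, Set[str]]:
    """Derive a value -> paths lookup for one field from the primary store."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, record in store.items():
        if field in record:
            index[record[field]].add(path)
    return dict(index)


# The index is derived entirely from the primary store, so it can be dropped
# and rebuilt at any time, for example keyed by a different field.
by_pair = build_index(primary, "pair")
by_exchange = build_index(primary, "exchange")
```

Rebuilding with a different field plays the role of a mapping change here: the primary records are never touched, only the derived lookup is replaced.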


The API is available in two variants:



Both offer the same capabilities, except that the WebSocket variant comes with an SDK and lets you receive real-time notifications about changes in the data you are interested in. This makes it possible to build, for example:



Data versioning


All open data in government sources is versioned, so our service would be incomplete without the same capability.



Running custom JavaScript on the server


Users can write small JavaScript programs that run on the server on a schedule (once an hour, day, week or month). Such programs are useful for refreshing data from a remote source.


On the server, the program runs in a browser-like sandbox (you can, for example, include jQuery), so you can debug it directly in your browser before sending it to the server for execution.


Since the server runs Node.js 8+, recent JavaScript features (classes, async/await and so on) are available to the user.



This post is already quite long, so to be continued.



Source: https://habr.com/ru/post/350482/

