Data transfer, especially when ongoing and periodically updated, poses many technical, technological, methodological, managerial, and legal issues for suppliers. And while the legal aspects are at least somewhat regulated, and the technical ones depend mostly on available resources (the material and technical base), the managerial (economic, marketing) and especially the methodological aspects lead to very difficult problems that suppliers must solve on their own, and not always successfully.

This publication continues the series on public data. Many of the concepts used in the text were discussed in the previous articles.
Sooner or later, any owner of innumerable deposits of accumulated digital data comes to understand that they possess an invaluable resource and are not using it. And the larger the storage, the louder the talk about the greatness of the digital world, about priceless oil-like data sources, about the big data phenomenon, and about the strong dependence of modern management systems on timely information support.
And the data just sits there.
And so that the data does not gather dust on the shelves as dead weight, its owners initiate and promote internal analytics. The latter, while carrying out quite specific mercantile tasks, can gradually turn from a structure meant to produce knowledge into a unit that prepares heterogeneous, colorful static reports and charts or dynamic dashboards, in no way helping managers make decisions but burdening them with superfluous information of dubious quantity and quality. Hence the growing need for qualified information management and for a dedicated strategy (model) of management analytics.
And still the data accumulates and sits there. When public data enters the arena, the situation for suppliers and recipients becomes seriously complicated. Planning, organizing, coordinating, and controlling a closed internal circuit is easier and more efficient: it is always under full attention and control, and influencing it is straightforward and requires no special approaches.
Public data, being external and poorly controlled, raises rather different questions and requires solving special problems. The supplier cannot fully control the use of published data, just as the recipient cannot meaningfully influence the data before it is disclosed to a wide circle of persons. The competition of suppliers for recipients, and of recipients for data, is a separate story with its own heroes and losers.
In general, it is not so easy to fit open, shared, and delegated data into the business management framework, either from the standpoint of publishing it or from the standpoint of putting it to use. The preparation and delivery of data to an unlimited or conditionally limited circle of users is performed by its owner, or by a third-party supplier acting in the owner's interests. Even when the data owner hands this functionality off, in most situations the owner remains directly interested in the results. Otherwise it turns into a meaningless exercise in "data dumping".
Strategy
Developing a public data strategy is the supplier's best starting point for meaningful and useful results. It should be clearly understood that such a strategy is ideally built as a complement to the strategies for management accounting, knowledge management, and business analytics (modeling).
Of course, one can try to "push" some hypothetically useful data sets into the surrounding network reality without paying special attention to the key questions of strategy. But the effect of such activity is likely to be reaped by someone else, and it is lucky if that someone is not a direct competitor.
It is strategically worth deciding on and planning work in the following areas:
- Defining the purpose of publishing data and the key subject areas in which the supplier searches for new solutions and knowledge. At this stage it is important to tie the work to internal problems and to the business intelligence system.
- Formulating the major subtasks of public data transfer, in line with the goals and subject areas, with a preliminary forecast of the expected results.
- Formalizing the criteria for selecting data for publication, covering content, structural, and format aspects, possibly even in the form of internal closed or public regulations (standards, rules).
- A data publication plan, whether as general principles or down to the level of individual events. It is better if part of this plan is made available to potential recipients.
- Building a system of reverse monitoring of the effect of public data, designed to return to the supplier, through events, communities (communication with experts), and research (network analysis), the knowledge that third-party users manage to obtain.
- A public data supervisor: a separate control and coordination function whose purpose is the overall and case-by-case evaluation of the data publishing process against the supplier's goals. The "supervisor" needs defined benchmarks and the authority to observe actively and intervene not only in the immediate data publishing procedures, but also in the processes and objects within the supplier's organization that absorb, or could absorb, the return effect of new solutions and knowledge (products and services).
- Staffing support for public data, both by assigning the functionality to dedicated positions and by reasonably extending the duties of existing ones. As always, raising the competence of individual employees in the field of public data remains important.
- Tooling support for data publishing, necessitated by the complexity of preparing, verifying, publishing, and monitoring digital data sets. Finding or developing comprehensive software, managerial, and technological tools, and then putting them into practice, is an integral part of effective work in this area of business informatization.
- Technical support for data publication, in terms of assessing and additionally allocating machine resources (storage, computing power) and specialists.
- Legal support for data publication, both at the level of the formal description of a data set and at the level of drafting and issuing a general agreement (list of terms) for public data transfer.
- Marketing support for data publication, aimed at attracting users to the freely distributed digital data sets.
As with many things in management, formulating and refining the strategy is a continuous, iterative process with feedback and convenient benchmarks.
In connection with the topic under consideration, the most misguided way to evaluate the results of implementing a supplier's public data management strategy is to count the number and size of the data sets laid out on the network. The main task of a competent public data management strategy is not at all to execute a data disclosure plan, but to obtain that very "magic" result: others do the work, but on the supplier's data and to the supplier's benefit.
Selection
What data should be published? If a business or other economic entity wants to take part in the global process of public data exchange and asks this question in isolation from the other important issues, then probably no data should be published at all.
The Internet is saturated with all kinds of information every second; additional sets of digital data will not hurt it at all, but neither will they bring tangible benefit by themselves. The problem of choosing data sets for publication should be solved by setting public disclosure goals within the framework of the strategy described above.
Moreover, such a goal should be truly meaningful and logical.
The choice of data sets for publication follows directly from the supplier's need to explore some problem domain intensively, variably, and openly. It is almost a brainstorming mode, but without a fixed circle of participants and amid the turbulence of the global information network. Clearly, the target data sets should match the subject and be of sufficient quality: relevant, up-to-date, complete, objective, and measurable. The structure and format of publication should be chosen with the target audience in mind.
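Some of these quality criteria can be checked mechanically before publication. Below is a minimal sketch, assuming tabular data loaded with pandas; the thresholds and report fields are hypothetical illustrations, not an established methodology:

```python
from datetime import datetime, timezone

import pandas as pd


def basic_quality_report(df: pd.DataFrame, last_updated: datetime) -> dict:
    """Mechanical part of a pre-publication quality check.

    Completeness and freshness are measurable automatically; relevance
    and objectivity still require human judgment.
    """
    total_cells = df.size or 1                    # avoid division by zero
    completeness = 1.0 - df.isna().sum().sum() / total_cells
    age_days = (datetime.now(timezone.utc) - last_updated).days
    return {
        "rows": len(df),
        "completeness": round(float(completeness), 3),  # share of non-empty cells
        "age_days": age_days,                           # freshness of the set
        "fresh_enough": age_days <= 30,                 # hypothetical threshold
    }
```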
Of course, gradually expanding data sets is permissible, in meaning, in structure, and in the data schema. But it is important to realize that such changes erode recipients' confidence in the supplier and, in many cases, force them to change their loading and processing algorithms, and sometimes even to choose other tools.
Sets
Data is published in sets. A set is a convenient concept for a portion of digital data separated out by meaning, structure, or format. In practice, a set can be understood as anything from a separate table in a file to rows of data returned on request through a program interface.
On the other hand, the concept of a data set carries no size restrictions: even an entire relational database can be understood as one.
Each set must be accompanied by metadata and a passport (notice). Here a "passport" means a kind of conditional certifying set of characteristics: basic metadata, selected special metadata in compressed form, and links to the context. Among other things, the passport includes the supplier's assessment, in one form or another, of the data set's quality.
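To make the idea concrete, a passport might look like the sketch below. The field set is purely illustrative (as noted further on, there is no generally accepted standard), and all names and URLs are hypothetical:

```python
import json

# A hypothetical dataset passport: basic metadata, selected special
# metadata in compressed form, context links, and the supplier's own
# quality assessment.
passport = {
    "id": "sales-by-region-2019",
    "title": "Retail sales by region, monthly",
    "publisher": "Example Supplier LLC",
    "license": "CC-BY-4.0",
    "format": "CSV",
    "schema": {"region": "string", "month": "YYYY-MM", "amount": "decimal"},
    "context": ["https://example.org/docs/sales-methodology"],
    "update_frequency": "monthly",
    "quality": {"completeness": 0.98, "assessed_by": "supplier"},
}

print(json.dumps(passport, indent=2, ensure_ascii=False))
```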
At present there are no generally accepted, convenient standards for the full formation and description of public digital data sets. Most likely, such standards or regulations will have to be introduced for each type, or even each subject area, of public data. A number of relevant regulations and recommendations do exist, however.
If an entity seeks maximum long-term effect from disclosing data, it must settle the conceptually and technically difficult question of forming data sets correctly.
Publication
Within the framework of public data transfer, when creating and maintaining a long-term, separate secure communication channel with each recipient is undesirable, direct delivery (publication) of data is possible in one of two ways, both sketched in code after this list:
- statically, by reference: the data set is formed in advance, and a copy is available to the recipient for download at a fixed network address;
- dynamically, on demand: the data set is assembled by a special software service from the stored source data according to parameters specified by the recipient, and is delivered directly as the response to the request.
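The sketch below assumes Python with Flask (any web framework would do) and a hypothetical "sales" data set; it is an illustration of the two schemes, not a production service:

```python
import csv
import io

from flask import Flask, Response, request, send_file

app = Flask(__name__)
ROWS = [{"region": "north", "amount": 100}, {"region": "south", "amount": 200}]

# Pre-form the static copy once (normally done by the publication pipeline).
with open("sales.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["region", "amount"])
    writer.writeheader()
    writer.writerows(ROWS)


@app.get("/datasets/sales.csv")
def static_copy():
    """Static by reference: a ready copy at a fixed network address."""
    return send_file("sales.csv", mimetype="text/csv")


@app.get("/api/sales")
def on_demand():
    """Dynamic on demand: the set is assembled per the recipient's parameters."""
    region = request.args.get("region")
    rows = [r for r in ROWS if region is None or r["region"] == region]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["region", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return Response(buf.getvalue(), mimetype="text/csv")
```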
Each method has its advantages and disadvantages, and they show up at every level of public digital data.
When choosing its preferred method, a supplier, proceeding from its own resources and chosen strategy, has to settle questions such as the following (a configuration sketch fixing such decisions appears after the list):
- the need for an API and its functionality,
- the location of the data: the supplier's own network resource or a third-party one,
- links to third-party context,
- the format of the downloadable files and their size limits,
- fixing relevance through continuous or discrete updates,
- participation in catalogs (on portals), whether by reference or with the actual data, etc.
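Such decisions are worth fixing in one place, for example as a small configuration object. A sketch with purely hypothetical values:

```python
from dataclasses import dataclass, field


@dataclass
class PublicationPolicy:
    """Hypothetical record of a supplier's publication decisions."""

    provide_api: bool = True       # need for an API and its functionality
    hosting: str = "own"           # "own" network resource or "third-party"
    context_links: list[str] = field(default_factory=list)  # third-party context
    file_format: str = "CSV"       # format of downloadable files
    max_file_mb: int = 100         # size limit per file
    update_mode: str = "discrete"  # "continuous" or "discrete" updates
    catalogs: list[str] = field(default_factory=list)  # portal/catalog listings


policy = PublicationPolicy(
    context_links=["https://example.org/methodology"],
    catalogs=["https://example.org/opendata-portal"],
)
```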
In publishing, it is important to be consistent and to focus on maintaining high data quality. Priority therefore belongs to a systematic approach, not to random dumps of arrays of "numbers".
At the heart of systematic data publication lies a well-thought-out and feasible plan. Professional planning of public data transfer, and subsequent execution of the plan, not only helps avoid many operational errors but also gives data recipients a positive impression of the publisher's responsibility and commitment.
Users do not like complex data. And with public data, the complexity usually lies neither in the depth of nested and subordinate sets nor in the number of data elements (table fields).
The main difficulty is incomplete or incorrect metadata describing the underlying data, which, moreover, can change at any unannounced moment. The recipient (expert, analyst) is forced to spend time monitoring and parsing the data and bringing it into a usable state, that is, recovering missing characteristics, or correcting wrong ones, for what are already large arrays of digital data.
Semantic, structural, and format complexity is removed only by metadata. But metadata is itself data, so the same rules and quality indicators apply to it.
And, oddly enough, metadata is in turn accompanied, explicitly or implicitly, by corresponding metadata of the next level.
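Because metadata is itself data, its quality can be checked with the same mechanical discipline as the data it describes. A minimal sketch; the list of required fields is a hypothetical convention:

```python
REQUIRED_FIELDS = {"id", "title", "license", "format", "schema"}  # hypothetical


def metadata_problems(meta: dict) -> list[str]:
    """Apply the same quality rules to metadata as to the data itself."""
    problems = [f"missing field: {name}"
                for name in sorted(REQUIRED_FIELDS - meta.keys())]
    problems += [f"empty field: {key}" for key, value in meta.items()
                 if value in ("", None, [], {})]
    return problems


# An incomplete passport fails the check:
print(metadata_problems({"id": "sales-by-region-2019", "title": ""}))
# ['missing field: format', 'missing field: license', 'missing field: schema',
#  'empty field: title']
```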
At the final stage, when the prepared data sets are ready for distribution, one should not neglect the characteristic legal and administrative features that depend on their type: open, shared, or delegated. The best option is "release" editing and proofreading of the finished data sets, using appropriate tools, of course.
An interesting integral indicator of data publishing professionalism is the time it takes to publish one data set: from the moment the task of extracting the data from the general storage is set, to the moment its availability on the network resource where it is published is checked and confirmed. The absolute value of the indicator gives a general idea of the labor costs involved, and its change over time shows whether the supplier's management of data publishing is improving, at least at the stage of free distribution on the network. A sketch of such a measurement appears after the list below.
The specified indicator comprehensively reflects:
- the general level of data management;
- the level of available information technology;
- the level of financial, economic, and organizational management;
- the level of security and risk-management services.
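Measuring the indicator itself requires nothing exotic: timestamp the moment the extraction task starts and the moment availability on the network resource is confirmed. A minimal sketch; `run_pipeline` and the dataset URL are hypothetical stand-ins for the supplier's actual publication pipeline:

```python
import time
import urllib.request


def time_to_publish(run_pipeline, dataset_url: str, timeout: float = 5.0) -> float:
    """Seconds from starting the extraction task to confirmed availability."""
    start = time.monotonic()
    run_pipeline()  # extract, prepare, and upload the data set
    with urllib.request.urlopen(dataset_url, timeout=timeout) as resp:
        assert resp.status == 200  # confirm the set can actually be fetched
    return time.monotonic() - start


# Usage (hypothetical pipeline):
# elapsed = time_to_publish(pipeline.run, "https://example.org/datasets/sales.csv")
```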
So what is the problem with publishing data?
There should be no problem:
- if you have the data in good working form and format,
- if you have the means to communicate with the digital society,
- if you can assess the data for openness and eligibility for publication, and evaluate the risks of publishing it,
- if you can physically upload it, in one form or another, to a server (the network),
- if you can upload metadata along with it and provide contextual links,
- and, finally, if you understand why all of this is needed.
This indicator is, of course, not the only means of assessment. Specific, controlled measures of success are developed at the strategic design stage and make it possible to monitor the performance of what is by no means a meaningless activity: publishing data.
Feedback
In its simplest form, feedback on public data, from the supplier's standpoint, consists of recipients' (users') responses about the quality of the data, its applicability, and the results obtained.
In the more complex case, feedback means retrieving, studying, and implementing third-party new solutions and knowledge in the existing processes and objects of the business (organization).
Maintaining feedback with recipients is a separate, important component of the overall public data management model. Organizing feedback at a level sufficient to obtain meaningful results is more complicated, more expensive, and more important than simply placing public data sets on network resources.
In the absence of stable links between the supplier and an unlimited number of recipients of public data, it is actually necessary to implement many different mechanisms for mutually beneficial communication.
These can include:
- subscriptions of recipients (users) to notification feeds from the supplier (sketched in code after this list),
- an open pool of data recipients (a kind of club or community of interest),
- real or virtual contests based on public data (hackathons),
- offline or online themed events (seminars, conferences),
- surveys of data recipients (simple or focus surveys),
- training activities (lectures, certification tests), etc.
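The first of these mechanisms, subscription, is simple enough to sketch. Assuming subscribers have registered webhook URLs (a hypothetical setup), the supplier can notify them whenever a new set or version is published:

```python
import json
import urllib.request

SUBSCRIBERS = ["https://example.org/hooks/analyst-1"]  # hypothetical webhook URLs


def notify_subscribers(dataset_id: str, version: str, url: str) -> None:
    """Push a short notification about a newly published data set."""
    payload = json.dumps(
        {"dataset": dataset_id, "version": version, "download": url}
    ).encode()
    for hook in SUBSCRIBERS:
        request = urllib.request.Request(
            hook, data=payload, headers={"Content-Type": "application/json"}
        )
        try:
            urllib.request.urlopen(request, timeout=5)
        except OSError:
            pass  # one dead hook should not block the remaining notifications


notify_subscribers("sales-by-region-2019", "2019-12",
                   "https://example.org/datasets/sales.csv")
```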
But the goal of all such feedback mechanisms is the same: to obtain, in one form or another, the results achieved by those actors who downloaded the public data, processed and analyzed it, found something new and useful, or created new products and services.
Direct feedback methods involve the public data supplier contacting recipients directly.
Indirect feedback methods come down to researching the network's information space and searching for new solutions and knowledge obtained on, or with the help of, the supplier's data but which, for one reason or another, have dropped out of sight.
There is a certain risk of losing sight of a beneficial effect that someone achieves on the supplier's data. But that is not a real problem: the supplier can always recover the lost benefit, especially if it owns the data itself. What matters is learning that something valuable can be done at all with a specific set of public data, and that means getting feedback from those who have already done something, or at least tried.