
The EDI Standard: A Technical Overview

EDI (Electronic Data Interchange) is one of the old, well-established standards. Yet we constantly see EDI presented as a modern standard. Is that so? Should we consider EDI as a base technology for new projects?
Let's look at EDI from a purely technical point of view, setting everything else aside.

EDI Data Format


EDI uses a delimited text format. It works well for flat data structures such as tables, but it is not so good at representing hierarchical data. Nested objects are better serialized using tagged formats such as XML and JSON.
Strangely enough, no definition language (a language for describing documents) was ever created for EDI. So many years have passed since EDI appeared, and so much effort has been spent on it, yet there is still no description language. A description language lets you automate data processing: generation, validation, transformation, serialization, and deserialization. For comparison, to validate XML data we take a schema (XML Schema, xsd), and the parser automatically checks the data against it.
You can do without a schema, but then the document itself has to carry markup. XML and JSON documents can be deserialized without a schema because the data itself contains the tags (names) of the data elements. EDI has tags for segments only, not for elements. Elements are identified by their position within the segment. A universal EDI parser can only parse a document into primitive collections, because the document contains neither names nor types for its data elements.
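To make the difference concrete, here is a minimal Python sketch. The function name `parse_segments` and the JSON field names are my own illustrative assumptions; they come from neither the EDI standard nor any library. The generic parser can only return lists of strings, while the JSON document names its own fields.

```python
import json

def parse_segments(text, seg_term="~", elem_sep="*"):
    """Split raw EDI text into segments, each a plain list of strings."""
    segments = []
    for raw in text.split(seg_term):
        raw = raw.strip()
        if raw:
            segments.append(raw.split(elem_sep))
    return segments

# Two segments in X12 style; the meaning of each position is not in the data.
edi_text = "GS*FA*RECEIVERID*SENDERID*20100325*1113*24712*X*004030~ST*997*1136~"
edi = parse_segments(edi_text)

# A hypothetical JSON rendering of the ST segment: the names travel with the data.
json_text = '{"transaction": {"type": "997", "controlNumber": "1136"}}'
doc = json.loads(json_text)

print(edi[1])                                  # a list of positional strings
print(doc["transaction"]["controlNumber"])     # a value looked up by name
```

Without the specification at hand, nothing in `edi[1]` tells us that the third item is a control number. In the JSON variant that knowledge is part of the document.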

Let's turn to the details.

The EDI standard consists of two main parts: the package format and the document formats.


Package Format


EDI defines packages for sets of documents, for groups of documents, and for the documents (transactions) themselves: Interchange, Group, and Transaction/Document. Each package is delimited by a pair of segments: ISA/IEA, GS/GE, and ST/SE respectively.
Note: For illustration, I use the X12 version of the EDI standard, which is common in North America. Another version of the standard, EDIFACT, is common in Europe and is not fundamentally different from X12.
Here is an example of the opening segments of all three packages: ISA, GS, and ST. The example is taken from here:
ISA*00*          *00*          *ZZ*RECEIVERID     *12*SENDERID       *100325*1113*U*00403*000011436*0*T*>~
GS*FA*RECEIVERID*SENDERID*20100325*1113*24712*X*004030~
ST*997*1136~

What do we see in the first segment?
The last three characters of the ISA segment are the separator characters "*>~": '~' separates segments, '*' separates elements within a segment, and '>' separates sub-elements within an element. By changing these characters, we essentially change the format of packages and documents. In XML and JSON the separator characters are fixed by the standard and cannot be changed. Variable separators are a vestige of the era before Unicode. But even in those days, making the separators changeable was not a good idea. Separators are very important characters. If any character can serve as a separator, this complicates not only the logic for splitting packages into their parts but also the logic for parsing the text inside the elements themselves.
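Here is a rough Python sketch of what a parser has to do before it can read anything else. I assume the usual X12 layout in which ISA is a fixed-width segment of 106 characters, so the separators sit at known positions. The width table and the helper names `build_isa` and `detect_separators` are my own and are not part of any library.

```python
# Fixed widths of ISA01..ISA16 in X12 (stated here as an assumption).
ISA_WIDTHS = [2, 10, 2, 10, 2, 15, 2, 15, 6, 4, 1, 5, 9, 1, 1, 1]

def build_isa(values, elem_sep="*", seg_term="~"):
    """Pad each value to its fixed width and join everything into one ISA segment."""
    padded = [v.ljust(w) for v, w in zip(values, ISA_WIDTHS)]
    return "ISA" + elem_sep + elem_sep.join(padded) + seg_term

def detect_separators(isa):
    """Read the separators from their fixed positions in a 106-character ISA."""
    if len(isa) < 106 or not isa.startswith("ISA"):
        raise ValueError("not a complete ISA segment")
    return {"element": isa[3], "sub_element": isa[104], "segment": isa[105]}

# Values of the sample ISA segment; the last one is the sub-element separator.
values = ["00", "", "00", "", "ZZ", "RECEIVERID", "12", "SENDERID",
          "100325", "1113", "U", "00403", "000011436", "0", "T", ">"]

isa = build_isa(values)
separators = detect_separators(isa)

# The same interchange header written with a different set of separators.
alt_isa = build_isa(values[:15] + [":"], elem_sep="|", seg_term="\n")
alt_separators = detect_separators(alt_isa)

print(separators)
print(alt_separators)
```

Both headers carry the same data, but a parser cannot split either of them until it has sniffed the separators from the header itself.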
Also in the ISA segment, we see elements that define date and time formats. They let us use custom date and time formats within documents. This made sense in the seventies, when saving a few bytes on every date and time mattered. Do we need these elements now, after we have been through the Year 2000 problem and after specialized, very detailed standards for representing time have been created?
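In the sample above, the same moment is written as 100325 in the ISA segment and as 20100325 in the GS segment. Here is a small Python sketch of reading both. I assume the ISA date is YYMMDD, the GS date is CCYYMMDD, and both times are HHMM, which is what the sample values suggest. The function names are my own.

```python
from datetime import datetime

def parse_isa_timestamp(date_yymmdd, time_hhmm):
    """Parse a two-digit-year date plus HHMM time (assumed ISA layout)."""
    return datetime.strptime(date_yymmdd + time_hhmm, "%y%m%d%H%M")

def parse_gs_timestamp(date_ccyymmdd, time_hhmm):
    """Parse a four-digit-year date plus HHMM time (assumed GS layout)."""
    return datetime.strptime(date_ccyymmdd + time_hhmm, "%Y%m%d%H%M")

isa_time = parse_isa_timestamp("100325", "1113")
gs_time = parse_gs_timestamp("20100325", "1113")

print(isa_time.isoformat())
print(gs_time.isoformat())
```

Note that the century of the two-digit year is not in the data at all. It comes from the pivot rule built into `%y`, which is exactly the kind of guesswork the Year 2000 problem was about.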
The ISA segment also contains elements that identify the sender and the receiver. In essence, this is addressing (routing) information. That is, the packaging standard is combined with an addressing standard. Using EDI, we must specify the sender and the receiver inside our data. The ISA segment also contains authorization elements. The idea of placing authorization information inside the messages themselves was once quite progressive, but now it looks naive at best and dangerous at worst. We now understand that authorization information is much more complex than a pair of values. The same can be said about addressing information. Yet the EDI standard encourages us to use these elements.
We also see the acknowledgment request element. That is, the creator of the document sets the acknowledgment strategy directly in the document. Is this a good idea? The same documents can be used in different scenarios. In some of them acknowledgments are handled at the application level; in others, other protocols are used to increase reliability. A reliability policy should not be defined inside the data itself, because reliability is a rather complex topic in data transfer and is determined by many participants in the communication.
The package segments also contain control numbers. They are needed in scenarios where we receive a set of documents, part of the set is lost or corrupted along the way, and we try to recover as much data as possible. This scenario has not been relevant for a long time, since such reliability problems are usually solved at the lower levels of communication protocols. We don't embed communication reliability at the application level, right?
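For illustration, here is a minimal Python sketch of the kind of check that control numbers make possible. I assume the usual X12 convention that the SE segment repeats the control number from ST and carries the number of segments in the transaction, counting ST and SE themselves. The function name and the body of the sample transaction are my own and purely illustrative.

```python
def check_transaction(segments):
    """Return a list of problems found in one ST..SE transaction."""
    st, se = segments[0], segments[-1]
    if st[0] != "ST" or se[0] != "SE":
        return ["transaction is not wrapped in ST/SE"]
    problems = []
    if st[2] != se[2]:
        problems.append("control number mismatch")
    if int(se[1]) != len(segments):
        problems.append("segment count mismatch")
    return problems

# A made-up acknowledgment transaction, already split into elements.
transaction = [
    ["ST", "997", "1136"],
    ["AK1", "PO", "24712"],
    ["AK9", "A", "1", "1", "1"],
    ["SE", "4", "1136"],
]

intact = check_transaction(transaction)

# Simulate a segment lost in transit.
damaged = check_transaction(transaction[:1] + transaction[2:])

print(intact)
print(damaged)
```

The check detects that something was lost, but modern transport protocols already guarantee this below the application level.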
Another element of the ISA segment is the EDI version (Standard Identifier). This is similar to the versioning support familiar to us from serialization standards.
The GS segment contains an element that defines the type of document, for example an order or an invoice. There is nothing really wrong with this, although it is simpler to set the document type within the document itself.

As you can see, almost all the elements in the package segments are either useless or, worse, dangerous if we use them as the standard intends.
Please do not attempt to use data from the package segments for authentication and addressing.
EDI was created at a time when placing this information in the packages was the only option. Now we transmit documents over the Internet and use a large set of standards and protocols for packaging, addressing, authentication, authorization, reliability, encoding, serialization, segmentation, and so on. Protocol-specific information is added and removed along the data path, and this information is independent of the data itself.

Is EDI a Data Format Standard or a Protocol?

EDI tries to be a protocol, which is why we see these elements for addressing, authorization, and acknowledgment requests. I do not know how this information could be mapped onto the layers of the OSI model.
Still, most of the EDI standard is about data formats.

Document Formats

Inside the packages we see the documents themselves. But we will not find a standard for a universal, generalized document. The standard defines numerous formats for various types of documents: for orders, for invoices, for attachment inventories ... Here you will find a small part of the huge list of standardized documents.
EDI follows a well-known myth: "Somewhere there is an ideal format that describes every scenario in the world. We will definitely find this format. We just need to add new scenarios and tweak the old ones."
As a result, the EDI document specifications are overly complex.
Take one example: we need a purchase order for a small local bookstore. We find a suitable standard specification, EDI 850, Purchase Order. At first glance it looks too detailed. We will not buy food, coal, grain, liquid products, hazardous products, or medical preparations. We do not need international addresses. We will not use an express delivery service. The EDI specification describes all of these possible variations, so it contains too many fields that we will never use. It is too complicated for our simple document.
There are many industry (domain) standards that are used as a kind of knowledge repository. But these standards are not used as data transmission standards. (See this article describing the problem of industry standards.)

Loops Inside Documents

The structure of individual documents is quite simple. Documents are made up of a series of segments, which hold the document data.
But it turns out that segments can be combined into groups or into repeated groups, so-called loops. The catch is that these loops are not marked in the document in any way. We can learn that a loop exists only from the specification of that particular document. Segments of the same type (with the same tags) can appear both on their own and inside loops. Writing a parser that recognizes loops (which, I repeat, are not marked in any way in the document) is a rather non-trivial task.
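Here is a rough Python sketch of the problem. The flat segment list looks like a fragment of a purchase order, but the values are made up, and the loop rule passed to `group_loops` (a loop starts at PO1 and may contain PID) is a simplified, hypothetical stand-in for what a real specification would say. The function name is my own.

```python
def group_loops(segments, loop_start, loop_members):
    """Group a flat segment list into loops using rules taken from a specification."""
    result = []
    current = None  # the loop being filled, if any
    for seg in segments:
        tag = seg[0]
        if tag == loop_start:
            current = {"loop": loop_start, "segments": [seg]}
            result.append(current)
        elif current is not None and tag in loop_members:
            current["segments"].append(seg)
        else:
            current = None  # any other segment ends the loop
            result.append(seg)
    return result

# Flat list as a generic parser would return it: no sign of nesting.
flat = [
    ["BEG", "00", "SA", "PO-1"],
    ["PO1", "1", "2", "EA", "9.95"],
    ["PID", "F", "", "", "", "Book A"],
    ["PO1", "2", "1", "EA", "19.50"],
    ["PID", "F", "", "", "", "Book B"],
    ["CTT", "2"],
]

# The loop rule is external knowledge; nothing in `flat` reveals it.
grouped = group_loops(flat, loop_start="PO1", loop_members={"PID"})

for item in grouped:
    print(item)
```

The grouping works only because we passed in the loop rule by hand. A real specification has many loops, nested loops, and segments that may appear both inside and outside loops, so the real logic is far more involved than this sketch.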
XML and JSON have no such problem: hierarchical objects or collections of objects of any nesting level are easily delimited by opening and closing tags or brackets, named or unnamed.
EDI tried to sit on two chairs at once. On the one hand, its document format is similar to CSV and is convenient for tabular data. On the other hand, it tried to describe hierarchical objects, and this attempt was far from convincing. Of course, this is easy to see now, with JSON before our eyes. But let's remember that EDI was made not for transferring tabular data but for transferring documents, whose structure is hierarchical.

A Non-Technical Look at EDI


For the full picture, I will also list some of the non-technical features of EDI:


As you can see, from a technical standpoint the EDI standard is outdated in almost every respect. There are hardly any rational technical reasons for using it now. Despite this, EDI is still widely used.
In the next part we will try to find the reasons for this. Most likely they will be non-technical.

Source: https://habr.com/ru/post/276925/

