The EDI (Electronic Data Interchange) standard belongs to the old, well-established systems. Yet we constantly see EDI presented as a modern standard. Is it? Should we consider EDI as a base technology for new projects?
Let's look at EDI from a technical point of view, discarding everything else.
EDI Data Format
EDI uses a delimited text format. It works well for flat data structures, such as tables. It is not so good for representing hierarchical data structures. Nested objects are better serialized using tagged formats such as XML and JSON.
Strangely, no definition language (document definition) was ever created for EDI. So many years have passed since the advent of EDI and so much effort has been spent on it, yet the description language still does not exist. A description language lets you automate data processing: generation, validation, transformation, serialization and deserialization. For comparison, to validate XML data we take a schema (XML Schema, XSD), and the parser automatically checks the data for compliance with it.
You can do without a schema, but then the document has to carry its own markup. XML and JSON documents can be deserialized without a schema because the data itself contains the tags (names) of the data elements. EDI has tags only for segments, not for elements. Elements are identified by their position within the segment.
A universal EDI parser can only parse a document into primitive collections, because the document contains neither names nor types for its data elements.
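To make the difference concrete, here is a minimal Python sketch. It assumes the separators `*` and `~` from the X12 example used later in this article. The function name `parse_segments` and the JSON sample with its field names are hypothetical, invented only for illustration.

```python
import json

def parse_segments(text, element_sep="*", segment_sep="~"):
    """Split raw EDI text into a list of segments, each a list of strings."""
    segments = []
    for raw in text.split(segment_sep):
        raw = raw.strip()
        if raw:
            segments.append(raw.split(element_sep))
    return segments

edi_text = "GS*FA*RECEIVERID*SENDERID*20100325*1113*24712*X*004030~ST*997*1136~"
parsed = parse_segments(edi_text)
# Only the segment tag (the first item) is a name; the rest is purely positional.
print(parsed[1])

# Hypothetical JSON rendering of the same ST segment: the names travel with the data.
doc = json.loads('{"transaction": {"type": "997", "controlNumber": "1136"}}')
print(doc["transaction"]["controlNumber"])
```

Without the document specification at hand, nothing in `parsed` says that the third item of the ST segment is a control number, whereas the JSON version says so itself.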
Let's turn to the details.
The EDI standard consists of two main parts:
- The envelope (package) format (a mixture of messaging standards)
- The document specifications (formats) (a mixture of industry (domain) standards)
Package format
EDI defines packages for sets of documents, for groups of documents, and for the documents (transactions) themselves: Interchange, Group and Transaction (Document). These packages are delimited by the segment pairs ISA / IEA, GS / GE and ST / SE respectively.
Note: For illustration, I use the EDI X12 version of the standard common in North America. Another version of the standard, EDIFACT, is common in Europe and is not fundamentally different from X12.
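The nesting of these three packages can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function name `check_envelopes` is hypothetical, and the check looks at segment tags alone, ignoring the segment contents.

```python
# Header tag -> trailer tag for the three package levels of X12.
PAIRS = {"ISA": "IEA", "GS": "GE", "ST": "SE"}
LEVELS = ["ISA", "GS", "ST"]          # outermost to innermost
CLOSERS = set(PAIRS.values())

def check_envelopes(tags):
    """Return True if the header/trailer tags are properly paired and nested."""
    stack = []                         # trailers we still expect, innermost last
    for tag in tags:
        if tag in PAIRS:
            # A header may only open at its own nesting depth.
            if LEVELS.index(tag) != len(stack):
                return False
            stack.append(PAIRS[tag])
        elif tag in CLOSERS:
            if not stack or stack.pop() != tag:
                return False
    return not stack

good = ["ISA", "GS", "ST", "AK1", "AK9", "SE", "GE", "IEA"]
bad = ["ISA", "GS", "ST", "GE", "SE", "IEA"]
print(check_envelopes(good), check_envelopes(bad))
```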
Here are the opening segments of all three packages, ISA, GS and ST (the example is taken from here):
ISA*00*          *00*          *ZZ*RECEIVERID     *12*SENDERID       *100325*1113*U*00403*000011436*0*T*>~
GS*FA*RECEIVERID*SENDERID*20100325*1113*24712*X*004030~
ST*997*1136~
What do we see in the first segment?
The last three characters of the ISA segment are the separator characters, "*>~": '~' separates segments, '*' separates elements within a segment, and '>' separates sub-elements within an element. By changing these characters, we essentially change the formats of packages and documents. In XML and JSON the separator characters are fixed by the standard and cannot be changed. Variable separator characters are a relic of the era before Unicode. But even in those days, making separator characters changeable was not a good idea. Separators are very important characters. If any character can serve as a separator, this complicates not only the logic for splitting packages into their parts, but also, and greatly, the logic for parsing the text inside the elements themselves.
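A parser therefore has to discover the separators before it can read anything else. The following Python sketch shows one way to do it, assuming only that the interchange starts with "ISA" and that the ISA segment has 16 elements. The function name `detect_separators` is hypothetical, and the sample is rebuilt from the header shown above.

```python
def detect_separators(interchange):
    """Return (element, sub-element, segment) separators read from the ISA header."""
    if not interchange.startswith("ISA"):
        raise ValueError("interchange must start with an ISA segment")
    element_sep = interchange[3]       # the character right after the "ISA" tag
    pos = 2
    for _ in range(16):                # walk to the 16th element separator
        pos = interchange.index(element_sep, pos + 1)
    # The last ISA element is the sub-element separator itself,
    # and the character after it terminates the segment.
    return element_sep, interchange[pos + 1], interchange[pos + 2]

fields = ["ISA", "00", " " * 10, "00", " " * 10, "ZZ", "RECEIVERID".ljust(15),
          "12", "SENDERID".ljust(15), "100325", "1113", "U", "00403",
          "000011436", "0", "T", ">"]
sample = "*".join(fields) + "~GS*FA*RECEIVERID*SENDERID*20100325*1113*24712*X*004030~"
print(detect_separators(sample))
```

Note that the same routine must work for any other choice of characters, which is exactly the extra burden that fixed delimiters in XML and JSON avoid.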
Also in the ISA segment we see elements that define the formats of dates and times. They let us use custom date and time formats within documents. This made sense in the seventies, when we had to save a few bytes when encoding dates and times. Do we need these elements now, after we have lived through the "year 2000" problem, and after specialized and very detailed standards for representing time have been created?
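As a small illustration, the compact date and time values from the header above ("100325" and "1113" in ISA, "20100325" in GS) can be converted to an ISO 8601 timestamp with Python's standard library. The function name `parse_isa_timestamp` is hypothetical, and the reading of the values as YYMMDD and HHMM is an assumption based on the sample. The two-digit year is expanded according to Python's own `%y` convention, which is itself a guess about what the sender meant.

```python
from datetime import datetime

def parse_isa_timestamp(date_yymmdd, time_hhmm):
    """Combine a six-digit date and a four-digit time into a datetime."""
    return datetime.strptime(date_yymmdd + time_hhmm, "%y%m%d%H%M")

stamp = parse_isa_timestamp("100325", "1113")
# The GS segment of the same sample carries the year with four digits.
gs_stamp = datetime.strptime("20100325" + "1113", "%Y%m%d%H%M")
print(stamp.isoformat(), stamp == gs_stamp)
```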
In the ISA segment we also see elements that identify the sender and the addressee. In essence, this is address (routing) information. In other words, the packaging standard is combined with an addressing standard: when using EDI, we must specify the sender and the addressee inside our data. The ISA segment also contains authorization elements. Placing authorization information inside the messages themselves was once quite a progressive idea, but now it looks naive at best and dangerous at worst. We now understand that authorization information is far more complex than a pair of values. The same can be said about address information. Yet the EDI standard encourages us to use these elements.
We also see the acknowledgment request element. That is, the creator of the document sets the acknowledgment strategy directly in the document. Is this a good idea? We can use documents in different scenarios. In some of them acknowledgments are handled at the application level; in others, other protocols are used to increase reliability. A reliability policy should not be defined inside the data itself, because reliability is a rather complex aspect of data transfer that depends on many participants in the communication.
Inside the package segments we also see control numbers (Control Numbers). They are needed in scenarios where we receive a set of documents, part of the set is lost or corrupted along the way, and we try to recover as much data as possible. This scenario has not been used for a long time, since such reliability problems are usually solved at the lower levels of communication protocols. We don't embed communication reliability at the application level, right?
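For reference, here is a rough Python sketch of such a check: each trailer segment is expected to repeat the control number of its header. The header values come from the sample above; the trailer segments (SE, GE, IEA) and their counts are invented for illustration, the padding of the ISA fields is omitted for brevity, and the function name `control_numbers_match` is hypothetical.

```python
def control_numbers_match(segments):
    """Check that each trailer repeats the control number of its header."""
    # header tag -> (position of its control number, trailer tag, position there)
    rules = {"ISA": (13, "IEA", 2), "GS": (6, "GE", 2), "ST": (2, "SE", 2)}
    trailer_to_header = {trailer: header for header, (_, trailer, _) in rules.items()}
    open_numbers = {}                  # control numbers of headers not yet closed
    for seg in segments:
        tag = seg[0]
        if tag in rules:
            open_numbers[tag] = seg[rules[tag][0]]
        elif tag in trailer_to_header:
            header = trailer_to_header[tag]
            position = rules[header][2]
            if open_numbers.pop(header, None) != seg[position]:
                return False
    return not open_numbers

text = ("ISA*00**00**ZZ*RECEIVERID*12*SENDERID*100325*1113*U*00403*000011436*0*T*>~"
        "GS*FA*RECEIVERID*SENDERID*20100325*1113*24712*X*004030~"
        "ST*997*1136~SE*2*1136~GE*1*24712~IEA*1*000011436~")
segments = [s.split("*") for s in text.split("~") if s]
print(control_numbers_match(segments))
```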
Another element of the ISA segment is the EDI version (Standard Identifier). This is similar to the versioning support familiar to us from serialization standards.
The GS segment contains an element that defines the type of document (Type of Document), for example an order or an invoice. There is nothing very bad about this, although it is simpler to set the document type within the document itself.
As you can see, almost all the elements in the package segments are either useless or, worse, dangerous if we use them in accordance with the standard.
Please do not attempt to use data from packet segments for authentication and addressing.
EDI was created at a time when placing this information in packages was the only option. Now we transmit documents via the Internet and use a large set of standards and protocols for packaging, addressing, authentication, authorization, reliability, encoding, serialization, segmentation, etc., etc. Protocol-specific information is added and removed throughout the data path, and this information is independent of the data itself.
Is EDI a data format standard or protocol?
EDI is trying to be a protocol, which is why we see these elements of addressing, authorization, and acknowledgment requests. I do not know how this information could be mapped onto the OSI layer model.
But still, most of the EDI standard is about data formats.
Document Formats
Inside the packages we see the documents themselves. But we will not find a standard for a universal, generalized document. The standard defines numerous formats for various types of documents: for orders, for invoices, for attachment inventories ...
Here you will find a small part of the huge list of standardized documents.
EDI follows a well-known myth: "Somewhere there is an ideal format that describes every scenario in the world. We will definitely find this format. We just need to add new scenarios and tweak the old ones."
As a result, EDI standard documents (specifications) are overly complex.
Take one example. We need a purchase order for a small local bookstore. We found a suitable standard specification: EDI 850, Purchase Order. At first glance, it looks too detailed. We will not buy food, coal, grain, liquid products, hazardous products or medical preparations. We do not need international addresses. We will not use an express delivery service. The EDI specification describes all of these possible variations, so it contains too many fields that we will never use. It is too complicated for our simple document.
There are many industry (domain) standards that are used as a kind of knowledge repository. But these standards are not used as data transmission standards. (See this article describing the problem of industry standards.)
Loops inside documents
The structure of individual documents is quite simple. Documents are made up of a series of segments, inside which are document data.
But it turns out that segments can be combined into groups or into repeated groups, the so-called loops. The catch is that these loops are not marked in the document in any way. We can learn about the presence of a loop only from the specification of that particular document. Segments of the same type (with the same tags) can appear both on their own and inside loops. Creating a parser that recognizes loops (which, I repeat, are not marked in the document at all) is a rather non-trivial task.
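The following Python sketch shows why the parser cannot work from the document alone: the loop structure has to be supplied from outside. `LOOP_SPEC` is a hypothetical, heavily simplified excerpt of a document specification, the element values in the sample are invented, and the function name `group_loops` is an assumption made for this illustration.

```python
# Hypothetical excerpt of a document specification: the tag that opens a loop
# and the tags that may follow it inside the same loop iteration.
LOOP_SPEC = {"PO1": {"PID"}}

def group_loops(segments, loop_spec):
    """Wrap segments into ("loop", [...]) or ("segment", [...]) entries."""
    grouped = []
    current = None                     # segments of the loop iteration being built
    members = set()                    # tags allowed inside the current loop
    for seg in segments:
        tag = seg[0]
        if tag in loop_spec:           # a loop-opening tag starts a new iteration
            current = [seg]
            members = loop_spec[tag]
            grouped.append(("loop", current))
        elif current is not None and tag in members:
            current.append(seg)
        else:                          # any other tag ends the current loop
            current = None
            grouped.append(("segment", seg))
    return grouped

segments = [
    ["BEG", "00", "NE", "PO123"],
    ["PO1", "1", "2", "EA"],
    ["PID", "F", "", "", "", "Book A"],
    ["PO1", "2", "1", "EA"],
    ["PID", "F", "", "", "", "Book B"],
    ["CTT", "2"],
]
grouped = group_loops(segments, LOOP_SPEC)
print([kind for kind, _ in grouped])
```

Even this toy version handles only one level of nesting. Real specifications nest loops inside loops and reuse the same tags at different levels, which is where the real difficulty starts.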
In XML and JSON, there is no such problem; hierarchical objects or collections of objects of any nesting level are very simply defined using opening and closing tags, named or unnamed.
EDI tried to sit on two chairs at once. On the one hand, its document format is similar to CSV and is convenient for presenting tabular data. On the other hand, it tried to describe hierarchical objects, and this attempt turned out very unconvincing. Of course, this is easy to see now, when we have JSON before our eyes. But let's remember that EDI was made not for transferring tabular data, but for transferring documents whose structure is hierarchical.
A nontechnical look at EDI
To complete the picture, I will also list some of the non-technical features of EDI:
- The EDI standard is not free. This looks rather strange compared to other standards.
- The EDI standard specifications are overly detailed. EDI specifications are so complex that companies must hire experts familiar with a particular specification. These experts communicate using special EDI terms; this is almost an EDI language of its own, unrelated to the business. Look at the EDI agreements between companies. These agreements are full of specific requirements that are defined by the EDI standard but are far from business requirements.
- The EDI standard is not stable. A special committee issues modifications to the EDI standard every six months. Each of these versions introduces new refinements. The development of the standard does not follow the demands of users; it simply follows the schedule. Presumably this happens not because the standard must meet very high requirements, but because the committee needs to show the results of its work.
- EDI was created to save bits and make documents as compact as possible. This requirement still exists, but it hardly applies to transmitting documents. Every child now owns a phone that downloads gigabytes of video. We no longer live in the era of mainframes and teletypes. And it is rather strange to read reports that seriously discuss the saving of resources due to the transition from paper workflow to the use of EDI.
- To save memory, EDI uses codes to represent data wherever possible. As a result, the documents look encrypted, which creates an additional problem of exchanging code tables.
- The EDI standard was created for the transfer of batches of documents due to the fact that communications and computers were expensive and worked slowly. Since then, much has changed, communications and computers have become fast and cheap. Data is now transmitted in small messages or streams, and these small messages are the basis of distributed systems. Document sets are still in use, but not because of slow equipment, but because business processes require it.
- There is no standard for the EDI description language. This means that we cannot create a universal parser for processing EDI documents. Parsers have to contain descriptions of thousands of existing EDI specifications with a huge amount of detail. (For example, Microsoft provides about 7,000 XML schemas for EDI documents as part of BizTalk Server.) Available EDI parsers are expensive. To work with EDI documents, we will most likely have to convert them to XML and use XML Schema together with an XML parser for validating, transforming, serializing, deserializing and creating them. This is what BizTalk Server does.
- Due to the lack of a standard EDI description language, documents are described using ... multi-page instructions. The developers of EDI parsers interpret these instructions differently, and because of this, different EDI parsers are incompatible.
- The EDI standard was created at a time when the development of programs, protocols, and data formats was extremely expensive and lasted a very long time. Creating a standard for a universal format of documents was justified. Now data formats are generated on the fly and our programs, as a rule, do not use any universal standards, but create different formats for specific cases. EDI specifications include as many details as possible to satisfy all users. Modern programs include in the specifications of the transmitted data only those data that are necessary. The number of elements in the EDI specification that are unnecessary in your particular case will always be very large.
- EDI mixes two types of standards: standards for communications and standards for business data formatting. Current trends are exactly the opposite: standards should be independent of each other (orthogonal), which allows you to mix them in any combination.
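The conversion to XML mentioned in the list above can be sketched in a few lines of Python. This is only an illustration, not the format BizTalk Server actually produces: the function name `segments_to_xml`, the root element name and the positional element names such as `ST01` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def segments_to_xml(segments, root_name="Document"):
    """Turn parsed segments (lists of strings) into an XML string."""
    root = ET.Element(root_name)
    for seg in segments:
        tag = seg[0]
        node = ET.SubElement(root, tag)
        # Elements have no names in EDI, so they are named by position: ST01, ST02...
        for position, value in enumerate(seg[1:], start=1):
            child = ET.SubElement(node, f"{tag}{position:02d}")
            child.text = value
    return ET.tostring(root, encoding="unicode")

xml_text = segments_to_xml([["ST", "997", "1136"]])
print(xml_text)
```

Once the data is in this form, an XML Schema written for the particular document type can take over validation, which is the step EDI itself has no standard language for.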
As you can see, the EDI standard is outdated in almost every aspect if we consider it from a technical standpoint. There are hardly any rational technical reasons for using it now. But despite this, EDI is still widely used.
In the next part we will try to find the reasons for this. Most likely they will be non-technical.