1. Introduction
Back in 2001, the W3C published its recommendation for the XML Schema Definition Language (XSD), combining the most popular schema definition languages into one standard. The main goal was a platform-independent standard that all participants in an information exchange could use. By curbing the chaos, XML became the most attractive format for information exchange. Today XML is used very widely in information technology, far beyond simple data exchange.
As XML grew in popularity and breadth of use, both the volume and the structural complexity of the data transmitted in XML increased. The younger and simpler JSON format contributed to this process by taking over the information flows whose message formats had a more or less simple structure. As a result, the XSD schemas that describe XML message structures today are large and complex. Reading such schemas as plain text has become very difficult, so there is a need both for special software and for up-to-date documentation that describes the XML message formats.
In this article I will describe how the problem of documenting XML message formats used for information exchange was solved ...

2. Issues
The best documentation for an XSD schema is the schema itself. This axiom holds until the schema exceeds a certain threshold of complexity, or until you meet someone who cannot or does not want to read XSD. When developing an application that uses XSD schemas, you may also be required to describe the message formats in the technical or accompanying documentation.
If you develop applications with loosely coupled or distributed components and are familiar with SOA (service-oriented architecture) and SOAP (Simple Object Access Protocol), then very soon the moment will come when you need up-to-date documentation more than your customer does.
Therefore, the question “Do you need documentation?” has an unambiguous answer: “Yes”. Sooner or later, everyone who develops software that uses XML will face this.
The next obvious question is what the result of documenting the formats should look like. This is quite difficult to answer, because the different users of the result (architects, developers, analysts, technical writers, administrators, customer representatives and everyone else) have completely different tasks.
This problem is solved in different ways. Some (for example, the developers of oXygen) chose to describe the XSD schema in full. As a result, the description becomes even harder to understand than the schema itself. Others insist that everyone should be able to read XSD schemas and that no documentation is needed, sometimes only because they realize they cannot keep such documentation up to date. As always, the truth lies somewhere in the middle ...
3. The essence of the problem
The process of developing message formats can be represented by the following major steps (see figure below).

Iteration No. 1:
- 1. Determine the scope of data for the information exchange - this step determines which data must be transferred between the participants: the entities, their attributes and their relationships.
- 2. Develop XSD schemas - based on step No. 1, the architect or developer develops XSD schemas that, in addition to the data itself, contain the SOAP message mechanisms needed for transport, security, etc.
- 3. Describe the message formats - this step produces documentation that describes the formats and provides message examples.
- 4. Reconcile - this step checks and agrees the formats within the team developing them. If inaccuracies are found, the development iteration is repeated.
The process of developing message formats is iterative. After the first iteration is complete and comments have been received, the next one starts immediately from step No. 2:
- 2. Develop XSD schemas - the architect makes changes to the XSD schemas; this takes much less time than developing them in the first iteration.
- 3. Describe the message formats - the analyst or technical writer must update the documentation describing the formats.
Here the writer faces a dilemma: make only the changes the architect reported, or review every schema whose file size has changed. Anyone, even the most conscientious worker, will choose the first option, and will be wrong. This option does not work: schemas often contain unannounced changes that the architect or developer forgot to report, and with this approach the description of the formats will inevitably diverge from the schemas. What is the risk? When development begins, the discrepancy will be found; it will bring some chaos and complicate development, to varying degrees, for every team participating in the integration.
Could it be worse? Yes, if the participating teams have different development schedules. One team, at the beginning of the year, implements the sending of messages according to the specification, fills the data incorrectly and safely closes the contract. Another team, in the middle of the year, implements the receiving of these messages and discovers a discrepancy between the required data and its description. This time the chaos is there to stay, and the discrepancy between the formats and their description can cost too much.
What is the solution? Alas, only one option remains: review all the changed schemas every time. This is very hard to accept. A document with an album of formats can run to more than a hundred pages, and proofreading it is hard, painstaking work. Very often the urgency of the implementation puts strong pressure on whoever maintains this document. Not everyone understands why changing the description of a few elements in a few schemas can “cost” a whole working day or more.
Thus, this step becomes the “bottleneck” of development, where everyone decides, to the extent of their responsibility, what is more valuable at the moment: quality or time.
- 4. Reconcile - reconciliation first takes place within the team developing the formats. When internal reconciliation is complete, external reconciliation follows, with all participants of the information exchange.
This “bottleneck” forces a very difficult choice between quality and development time. Choosing is almost impossible, because both are needed at once!
4. Documenting Message Formats
The most obvious way to document formats is by hand. You open the schema and describe it element by element, which takes a lot of working time. If the schema is large, or there are many of them, then within a few days you acquire the red-eyed look of a professional programmer and a persistent aversion to this work. Next comes the realization that such work must surely have been automated long ago, followed by a persistent search for a ready-made tool.
Before looking for an automation tool, it helps to understand how you want to use it and what its output should be.
All the work of documenting message formats fits into the following usage scenarios:
- Documenting the element structure of one or several XSD schemas with a filled-in “documentation” element. This is the simplest option: the document is built from a single source of information (the XSD schemas). These are usually schemas developed within the team as part of ongoing work. Ideally, development follows a schema design agreement that specifies not only that schema elements must be documented, but also with what content and wording.
- Documenting the element structure of one or several XSD schemas with an empty or partially filled “documentation” element. This option is more complicated. These are schemas developed by other teams. Such schemas often arrive “as is”, and we cannot impose any requirements on them. In this case only the structure can be taken from the schema itself, and the element descriptions have to be added by hand.
- Comparing the element structure of different versions of XSD schemas. We have the schemas and their description; now the schemas have changed and we need to update the description or get information about the changes. Schemas can change meaningfully, when elements are added or removed, or purely cosmetically, when an extra space is removed from a comment and the file checksum changes. A separate case is a schema rewritten using a different design pattern: from the data point of view nothing changes, but recognizing the old schema takes either a lot of time or special software. Given that schemas can arrive in batches of hundreds, comparing them by eye is clearly very difficult and resource-intensive work.
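To illustrate the difference between a cosmetic and a meaningful change, here is a minimal Python sketch. It is not the tool described in this article; the function name `structural_fingerprint` and the sample documentation text are assumptions made for the example. The idea is to hash only element names, attributes and nesting, so that edits to annotations, XML comments and whitespace do not change the result, while the plain file checksum does change.

```python
import hashlib
import xml.etree.ElementTree as ET

ANNOTATION = "{http://www.w3.org/2001/XMLSchema}annotation"

def structural_fingerprint(xsd_text):
    """Hash only the parts of a schema that affect the data structure."""
    # The default ElementTree parser already discards XML comments.
    root = ET.fromstring(xsd_text)

    def describe(node):
        # Skip xsd:annotation subtrees so documentation edits do not count.
        children = [describe(c) for c in node if c.tag != ANNOTATION]
        return (node.tag, sorted(node.attrib.items()), children)

    return hashlib.sha256(repr(describe(root)).encode("utf-8")).hexdigest()

# Hypothetical sample data: the documentation text is invented.
OLD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Zip" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>Postal code </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>"""
COSMETIC = OLD.replace("Postal code ", "Postal code")          # extra space removed
MEANINGFUL = OLD.replace('type="xsd:string"', 'type="xsd:int"')  # type changed

print(structural_fingerprint(OLD) == structural_fingerprint(COSMETIC))
print(structural_fingerprint(OLD) == structural_fingerprint(MEANINGFUL))
```

The sketch compares attribute values literally, so a schema that merely renames a namespace prefix (for example `xs:` instead of `xsd:`) would still be reported as changed; a real comparison would have to resolve prefixes first.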
As for the result, over many years of working with schemas and their documentation I have developed my own, hands-on understanding of what a description of message formats should look like. The concept rests on just three points:
- The schema itself is secondary; the data is primary. During development we do not need a description of the schema as such; we need a description of the data that the schema defines. In essence, we need a description of the element formats as they appear in the XML message, or in an XSD schema developed using the Russian Doll design pattern (for more on design patterns, see the XSD Design Patterns article). This is the form in which it is convenient to discuss a schema both during development and much later, during integration or maintenance. It is also what the customer wants to see in the technical documentation.
- The description of the formats should be a simple and clear table that is equally easy to read for professional developers and for those to whom everything related to development is a kind of “magic”. There will always be someone who, while being a crucial source or consumer of information for you, points a finger at the XSD schema and asks: “What is this???”
- Every element must be described exactly once in the album of formats. This means that when an element defined in a separate XSD schema is described, only that element should appear in the table. There is no need to pull in the entire SOAP message format or to expand the types defined in imported schemas. This approach keeps the document from swelling to an indecent size and makes it more readable, and if extra information on an element has to be added, it needs to be added in one place only!
What does the format description look like in a document? In the course of the work, the table describing the element formats of an XSD schema changed more than once, both in its set of columns and in their content, until it settled on the following columns:
- “No.” - the position of the element in the schema, shown as a multilevel list number.
- “Element name and type” - the data identifying the element: its name and its type.
- “Element description” - the business data for the element: its description from a business point of view.
- “Filling rules” - the technical data: rules for filling the element, data format, examples of values, etc.
- “Mult.” - the multiplicity of the element: whether it is mandatory, how many times it may occur, and whether it is part of a choice.
An example of the format description is given below in the “Solution” section ...
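As a rough illustration of how rows for such a table could be derived from a schema, here is a minimal Python sketch. It is not the author's tool: the function names `multiplicity` and `rows`, the documentation text and the `minOccurs="0"` attribute in the sample are assumptions made for the example. It handles only schemas nested in the Russian Doll style with `xsd:sequence`, and it leaves out the filling rules column and `xsd:choice`.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def multiplicity(el):
    """Render minOccurs/maxOccurs as '1', '0..1', '1..n' and so on."""
    lo = el.get("minOccurs", "1")
    hi = el.get("maxOccurs", "1")
    hi = "n" if hi == "unbounded" else hi
    return lo if lo == hi else f"{lo}..{hi}"

def rows(el, number):
    """Yield (number, name, type, description, multiplicity) for el and its nested elements."""
    doc = el.find(f"{XSD}annotation/{XSD}documentation")
    description = (doc.text or "").strip() if doc is not None else ""
    yield (number, el.get("name"), el.get("type", ""), description, multiplicity(el))
    children = el.findall(f"{XSD}complexType/{XSD}sequence/{XSD}element")
    for i, child in enumerate(children, start=1):
        yield from rows(child, f"{number}.{i}")

# Hypothetical, cut-down sample schema; the documentation text is invented.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Customer">
    <xsd:complexType><xsd:sequence>
      <xsd:element name="CustomerId" type="xsd:int">
        <xsd:annotation><xsd:documentation>Customer ID</xsd:documentation></xsd:annotation>
      </xsd:element>
      <xsd:element name="Address" minOccurs="0">
        <xsd:complexType><xsd:sequence>
          <xsd:element name="City" type="xsd:string"/>
        </xsd:sequence></xsd:complexType>
      </xsd:element>
    </xsd:sequence></xsd:complexType>
  </xsd:element>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)
table = [
    row
    for n, top in enumerate(root.findall(f"{XSD}element"), start=1)
    for row in rows(top, str(n))
]
for row in table:
    print(" | ".join(row))
```

Numbering the rows recursively gives the multilevel list from the first column directly, which is why the element position does not have to be stored anywhere else.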
5. Search for a solution
Based on the usage scenarios and the desired result, the main functional requirements for a tool that automates this activity were formed:
- Generate the description of element formats for a selected XSD schema.
- Generate the description of element formats for all XSD schemas in a selected folder and its subfolders.
- Compare the description of element formats for a selected schema (or the schemas in a folder) with its previous version.
- Enrich the descriptions of element formats in the output document with element descriptions specified in a separate file.
- Present the description of element formats in a single form, in the structure of the Russian Doll (“Matryoshka”) pattern, regardless of the pattern used to design the XSD schemas.
Despite the prevalence of XSD and the large amount of software that works with it, I could not find a tool that met these requirements even partially.
However, the problem was more pressing than ever, and so such a tool was created ...
6. Solution
For those interested in the tool itself, I will provide links to it in a comment after the article. Within the article, it is more interesting to look at the result, using the documentation of message formats as an example.
6.1. An example of processing a documented scheme
Here is the description of element formats obtained from an XSD schema with a filled-in “documentation” element.
6.1.1. Source schema
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/Customer"
            xmlns:tns="http://www.example.org/Customer"
            elementFormDefault="qualified">
  <xsd:annotation>
    <xsd:documentation> .</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="Customer">
    <xsd:annotation>
      <xsd:documentation>.</xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerId" type="xsd:int">
          <xsd:annotation>
            <xsd:documentation>ID .</xsd:documentation>
          </xsd:annotation>
        </xsd:element>
        <xsd:element name="FirstName" type="xsd:string">
          <xsd:annotation>
            <xsd:documentation>.</xsd:documentation>
          </xsd:annotation>
        </xsd:element>
        <xsd:element name="LastName" type="xsd:string">
          <xsd:annotation>
            <xsd:documentation>.</xsd:documentation>
          </xsd:annotation>
        </xsd:element>
        <xsd:element name="Address">
          <xsd:annotation>
            <xsd:documentation>.</xsd:documentation>
          </xsd:annotation>
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="StreetAddress" type="xsd:string">
                <xsd:annotation>
                  <xsd:documentation> .</xsd:documentation>
                </xsd:annotation>
              </xsd:element>
              <xsd:element name="City" type="xsd:string">
                <xsd:annotation>
                  <xsd:documentation> .</xsd:documentation>
                </xsd:annotation>
              </xsd:element>
              <xsd:element name="Zip" type="xsd:string">
                <xsd:annotation>
                  <xsd:documentation> . >>> .</xsd:documentation>
                </xsd:annotation>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
6.1.2. The result

6.2. An example of using an external description
Here is the description of element formats obtained from an XSD schema with an empty “documentation” element.
6.2.1. Source schema
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/Customer"
            xmlns:tns="http://www.example.org/Customer"
            elementFormDefault="qualified">
  <xsd:element name="Customer">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerId" type="xsd:int" />
        <xsd:element name="FirstName" type="xsd:string" />
        <xsd:element name="LastName" type="xsd:string" />
        <xsd:element name="Address">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="StreetAddress" type="xsd:string"/>
              <xsd:element name="City" type="xsd:string"/>
              <xsd:element name="Zip" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
6.2.2. External description file data
\matr .
Customer .
CustomerId ID .
FirstName .
LastName .
Address .
StreetAddress .
City .
Zip . >>> .
6.2.3. The result

Please note that the result is completely identical to the result of processing the documented schema!
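The enrichment step can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the function `enrich`, the row layout and the dictionary used as the external description are hypothetical and do not reproduce the tool's actual file format, and the description texts are invented.

```python
def enrich(rows, external):
    """Return rows with empty descriptions filled in from `external`.

    rows     -- list of (number, element_name, description) tuples
    external -- dict mapping element name to description (assumed format)
    """
    return [
        (number, name, description or external.get(name, ""))
        for number, name, description in rows
    ]

# Rows as they might come from a documented schema (descriptions are invented).
documented = [
    ("1", "Customer", "Customer record"),
    ("1.1", "CustomerId", "Customer ID"),
]
# Rows from the same schema without any xsd:documentation.
undocumented = [
    ("1", "Customer", ""),
    ("1.1", "CustomerId", ""),
]
external = {"Customer": "Customer record", "CustomerId": "Customer ID"}

enriched = enrich(undocumented, external)
print(enriched == documented)  # prints True
```

Keying the external descriptions by element name alone is a simplification: if two elements in different places share a name, a key based on the full element path would be needed to tell them apart.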
6.3. An example of comparing two schemes
Here is the description of element formats obtained by comparing different versions of an XSD schema.
6.3.1. Source schema
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/Customer"
            xmlns:tns="http://www.example.org/Customer"
            elementFormDefault="qualified">
  <xsd:element name="Customer">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerId" type="xsd:int" />
        <xsd:element name="FirstName" type="xsd:string" />
        <xsd:element name="LastName" type="xsd:string" />
        <xsd:element name="Address">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="StreetAddress" type="xsd:string"/>
              <xsd:element name="City" type="xsd:string"/>
              <xsd:element name="Zip" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
6.3.2. Previous version of the schema
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/Customer"
            xmlns:tns="http://www.example.org/Customer"
            elementFormDefault="qualified">
  <xsd:element name="Customer">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerId" type="xsd:int" />
        <xsd:element name="FullName" type="xsd:string" />
        <xsd:element name="Address">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="StreetAddress" type="xsd:string"/>
              <xsd:element name="City" type="xsd:string"/>
              <xsd:element name="Country" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
6.3.3. The result

The new elements “FirstName”, “LastName” and “Zip” have all columns in bold. The “Address” element has changed position, so only its first column is highlighted in bold. The deleted elements “FullName” and “Country” are highlighted in gray. The background color of the rows also helps to “read” the changes.
This view makes the differences easy to read, both on screen and in print.
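The classification into added, removed and moved elements can be sketched in Python as follows. This is an assumption-laden illustration, not the tool's implementation: the helpers `positions` and `compare` are hypothetical names, the schemas are compact copies of the two versions shown in this section without the target namespace attributes, and an element counts as moved when its index among its siblings changes.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def positions(el, index=1, parent=""):
    """Map each element path to the element's index among its siblings."""
    path = f"{parent}/{el.get('name')}" if parent else el.get("name")
    result = {path: index}
    children = el.findall(f"{XSD}complexType/{XSD}sequence/{XSD}element")
    for i, child in enumerate(children, start=1):
        result.update(positions(child, i, path))
    return result

def compare(old_xsd, new_xsd):
    """Return sorted lists of added, removed and moved element paths."""
    old = positions(ET.fromstring(old_xsd).find(f"{XSD}element"))
    new = positions(ET.fromstring(new_xsd).find(f"{XSD}element"))
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    moved = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, moved

NEW = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="Customer"><xsd:complexType><xsd:sequence>
  <xsd:element name="CustomerId" type="xsd:int"/>
  <xsd:element name="FirstName" type="xsd:string"/>
  <xsd:element name="LastName" type="xsd:string"/>
  <xsd:element name="Address"><xsd:complexType><xsd:sequence>
   <xsd:element name="StreetAddress" type="xsd:string"/>
   <xsd:element name="City" type="xsd:string"/>
   <xsd:element name="Zip" type="xsd:string"/>
  </xsd:sequence></xsd:complexType></xsd:element>
 </xsd:sequence></xsd:complexType></xsd:element>
</xsd:schema>"""

OLD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="Customer"><xsd:complexType><xsd:sequence>
  <xsd:element name="CustomerId" type="xsd:int"/>
  <xsd:element name="FullName" type="xsd:string"/>
  <xsd:element name="Address"><xsd:complexType><xsd:sequence>
   <xsd:element name="StreetAddress" type="xsd:string"/>
   <xsd:element name="City" type="xsd:string"/>
   <xsd:element name="Country" type="xsd:string"/>
  </xsd:sequence></xsd:complexType></xsd:element>
 </xsd:sequence></xsd:complexType></xsd:element>
</xsd:schema>"""

added, removed, moved = compare(OLD, NEW)
print("added:  ", added)
print("removed:", removed)
print("moved:  ", moved)
```

For these two versions the sketch reports “FirstName”, “LastName” and “Zip” as added, “FullName” and “Country” as removed, and “Address” as moved. Comparing the sibling index rather than the full multilevel number keeps the children of “Address” from being flagged as moved just because their parent shifted.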
7. Summary
Now it takes only a few minutes to produce a new version of the album of formats for several hundred XSD schemas. The output file, a Word document, runs to nearly 1500 pages. Errors in the description have disappeared and, most importantly, so has the problem of the description falling out of date with the schemas. Thus, one of the most time-consuming areas in managing application development has been successfully automated.