How we develop documentation in the open-source Embox project

Good day.

As one of the developers of the open-source Embox project, I have often heard (too often lately) that the project is interesting but cannot be used because it has no documentation. We replied that some documentation does exist, that we are always ready to answer questions, and that in a pinch you can figure things out yourself because the project is open, but none of this convinced anyone. So we had to take on a topic that developers find very unpleasant. Of course, this article is not about documentation being “unpleasant”, but about how we made the documentation development process more comfortable. After all, any more or less large project has issues related to documentation.

For those who are too lazy to read, I will say right away that we ended up writing our documentation in the markdown format. If you are interested in the details, the reasons why we chose markdown, and the pros and cons of this approach, please read on.
I will begin by justifying the relevance of the topic. Embox is not the only project with documentation problems. For example, Google has announced Season of Docs, an analogue of the Google Summer of Code (GSoC) program for technical writers. Kaspersky Lab holds conferences for technical writers. And Parallels publishes articles on how to write documentation. All this suggests that the topic is important and perhaps undeservedly neglected.

The series of articles mentioned above deals with the proper content of documentation, but I want to focus on the process of developing documentation, the technology, if you like. After all, the hallmark of an open-source project is openness as a property of the development process. We wanted to create a process that meets the requirements of our project.

First, a brief look at the history of our documentation process.
The project used to be hosted on googlecode. We had a pretty decent wiki there; many people still remember it, ask where to find it, and ask us to move it to github, where the project lives now (or to some other accessible place). The googlecode wiki really was convenient. It was multilingual and, in my opinion, had more features than the github wiki. But it sank into oblivion together with googlecode itself. The current github wiki does a good job of conveying up-to-date information about the project, but it is rather difficult to build complete documentation on this platform.

Of course, any open-source project needs both online (quickly accessible) documentation, a role played by the wiki, and complete documentation available offline, because a complete document makes it much easier to understand the essence and ideology of the project. In addition, searching offline within a single document is much easier than searching across scattered wiki pages.

Probably the best option I know of is the ARM documentation: it is online, with search across all sections, yet each specific document can also be downloaded as a pdf. Unfortunately, Embox does not yet have ARM's resources, so we had to produce the offline version separately. For this we used Google Docs. It is convenient because it lets you download data in different formats, supports collaborative work, and has built-in version control. We transferred some of the information from the wiki, defined a structure (the goal of the offline version was holistic documentation), and began to develop several documents. But we quickly ran into problems. The information became outdated, the wiki and the offline documentation diverged, and, most importantly, we never managed to create holistic documentation. The structure was there, but since we could not get decent feedback from users, it turned out that only its authors could make sense of the documentation. This is certainly not the fault of the service, but the fact remains. We had to look for a different approach.

Then we tried simply moving the data from the old wiki to the github wiki, but again we quickly ran into problems. Part of the information was outdated, part had never been written, and for part of it we did not know how to present it in a user-friendly form. While looking for a solution, at some point we even considered writing the documentation in TeX in a git repository, but quickly realized that this would be overkill. Still, the idea left its mark.

We decided to formulate what we want from the documentation, or rather from the process of developing it, leaving the content aside:

The documentation should be stored in a text format, since we intended to use git as the version control system
The documentation must have online (wiki) and offline (separate documents, for example, in pdf) versions
Online and offline documentation should be fairly easy to sync
The documentation should consist of sections (chapters) that can be developed or studied separately, but from which holistic documentation can be assembled

None of these points rules out markdown, and markdown was attractive because the wiki already uses it. We could, of course, have used a different format and converted it to markdown. But after drawing up another list of requirements, this time for the kinds of content we need to add (images, text, formatting), we concluded that markdown covers all our current needs. And the very first search for “markdown to pdf” showed that tools for converting markdown into other formats exist and are quite popular.

There are several ways to turn markdown into pdf, but pandoc is definitely the most popular. This utility can convert between a great many text formats. In addition, it is a console tool, so it is not only familiar to us but can also be embedded in scripts that produce documentation in various formats.

Having settled on the utility, we turned to the next small question: how to produce a single document from separate markdown files, rather than a pile of pdf files, one per chapter. Our first impulse was simply to concatenate the files with cat in the right order, but everything turned out to be much simpler: pandoc itself accepts a list of files. This also partially solved the problem of needing different documents with overlapping content. For example, we generate three documents, and all three contain a brief description section. We simply pass that file to pandoc for every document.

We apply a similar principle to the cover pages, which contain the name of the document and so on. We created a title file for each document and put it first in the source list for pandoc.

As you have probably guessed, the list of files also solves the problem of multilingualism: we simply specify the files in the desired language.
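The file-list approach described above can be sketched as a small shell helper. This is only an illustration under assumptions: the directory layout (`<lang>/<chapter>.md`), the document names, and the chapter names are all hypothetical and are not taken from the Embox repository.

```shell
#!/bin/sh
# Hypothetical layout: <lang>/title_<doc>.md plus <lang>/<chapter>.md.
# Document and chapter names are made up for illustration.

# Print the ordered list of source files for one document in one language.
doc_sources() {
    doc=$1
    lang=$2
    case "$doc" in
        # The "overview" chapter is shared by both documents.
        quick_start) chapters="overview quick_start" ;;
        user_manual) chapters="overview build_system drivers" ;;
        *) echo "unknown document: $doc" >&2; return 1 ;;
    esac
    # The title file always goes first, then the chapters in order.
    printf '%s/title_%s.md' "$lang" "$doc"
    for ch in $chapters; do
        printf ' %s/%s.md' "$lang" "$ch"
    done
    printf '\n'
}

doc_sources quick_start en
doc_sources user_manual ru
```

The printed list is what would be passed to pandoc as its input files. Switching the language only changes the directory prefix, and a shared chapter is simply listed in more than one document.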

Let's talk a little more about Russian. When generating pdf, pandoc uses LaTeX as the backend for rendering and so on. By default, Cyrillic is not rendered, and an error about an unknown character is reported. The fix is very simple: specify “babel russian” in the header.

 ---
 ...
 header-includes:
   - \usepackage[russian]{babel}
 ---

It should be noted that it is more correct to specify

 ---
 ...
 header-includes:
   - \usepackage[russian, english]{babel}
 ---

and then to switch languages in the text with LaTeX commands

 \selectlanguage{russian}
 \foreignlanguage{english}{ English text }

 \begin{otherlanguage}{english}
 Text in english
 \end{otherlanguage}

But we wanted to keep the markdown “clean” for a possible transfer to the wiki, and specifying babel alone turned out to be enough for our purposes, so we decided not to complicate things and left it as it is.

Of course, we wanted documents that are not merely formatted somehow but at least look uniform. Here LaTeX helped us again: since pandoc uses LaTeX, you can give it LaTeX templates. This is done with the --template option

 pandoc --template=embox_pandoc.latex ...

Once you have written a template and pass it when generating every document, the documents come out in a more or less uniform style. “Less”, because there are several problems we never managed to solve. One example is formatting a block of code as a single unit. Markdown lets you mark a block of code so that it is not reformatted and, optionally, is syntax-highlighted. But we only managed to get single-line blocks, as we found out by inspecting the generated LaTeX code.

That is, there were cases where the same block was laid out slightly differently in different documents.

Another point concerns Cyrillic, that is, the use of Russian. As mentioned above, specifying babel russian in the header turned out to be enough to use Russian. But we ran into some oddities: for example, bold text was not rendered, and there were other formatting quirks. At first we blamed the fact that Russian is specified in the title file rather than in the template, and began to study the problem. It turned out that, to do things properly, you need not only to use

 \usepackage[russian, english]{babel}

but also to set the font and input encodings

 \usepackage[T1,T2A]{fontenc}
 \usepackage[utf8]{inputenc}

But even after doing this, we could not fix the situation. It turned out that not every font set contains every variant; ours, in particular, lacks bold, italic, and other shapes. Since there was no simple solution, we thought it over and postponed the problem until better times. And since we wanted to use pure markdown (that is, without LaTeX commands), we made a common template for both languages and put the Russian language setting in the title file.
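To illustrate keeping the language setting in the title file rather than in the template, here is a minimal sketch that writes such a file. The helper name `write_title`, the output path, and the title text are assumptions made for this example; `title` and `header-includes` are standard pandoc metadata fields, and the babel line is the one shown earlier in the article.

```shell
#!/bin/sh
# write_title is a hypothetical helper, not part of the Embox scripts.
# It writes a title file whose YAML metadata block selects the babel language.
write_title() {
    path=$1
    title=$2          # must not contain double quotes in this simple sketch
    babel_langs=$3    # e.g. "russian" or "russian, english"
    # In an unquoted here-document "\\" yields a single backslash.
    cat > "$path" <<EOF
---
title: "$title"
header-includes:
  - \\usepackage[$babel_langs]{babel}
---
EOF
}

# Example: generate a Russian title file in a temporary location and show it.
write_title "${TMPDIR:-/tmp}/title_quick_start_ru.md" "Embox quick start" "russian"
cat "${TMPDIR:-/tmp}/title_quick_start_ru.md"
```

With this arrangement the chapter files stay plain markdown, and only the title file differs between the Russian and English builds.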

Just a few words about the interface of the application itself. As I said, console applications are very familiar to us and are easy to wrap in scripts. The interface is simple: you can of course specify many options, but for our purposes a template, a list of input files, and an output file are enough.

 pandoc --template=embox_pandoc.latex <title file> <list of input markdown files> -o <output file>

On the Internet you can read that generating pdf (or another output format) requires choosing an engine explicitly, for example with the option --pdf-engine=xelatex. But pandoc already picks a pdf engine by default (pdflatex) when the output file has the pdf extension, so we did not need this option either.

We wrapped the documentation build scripts in a Makefile to keep things familiar. Now, once the necessary environment is set up, you can get the documentation by simply calling

 make [en][ru]
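What a per-language build step might look like can be sketched in shell. This is not the actual Embox Makefile: the file names, the output naming scheme, and the `DRY_RUN` variable are assumptions for illustration. Only the template name and the pandoc options `--template` and `-o` come from the text above.

```shell
#!/bin/sh
# Sketch of a per-language build, similar in spirit to `make en ru`.
# File names and the DRY_RUN switch are hypothetical.

TEMPLATE=embox_pandoc.latex

# Build one PDF from a title file followed by chapter files.
# With DRY_RUN=1 (the default here) the command is only printed.
build_pdf() {
    out=$1
    shift
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "pandoc --template=$TEMPLATE $* -o $out"
    else
        pandoc --template="$TEMPLATE" "$@" -o "$out"
    fi
}

# Build the quick start guide for each requested language.
build_langs() {
    for lang in "$@"; do
        build_pdf "quick_start_$lang.pdf" \
            "$lang/title_quick_start.md" "$lang/quick_start.md"
    done
}

build_langs en ru
```

Run as is, the script only prints the two pandoc commands. Setting `DRY_RUN=0` would execute them, which requires pandoc, a LaTeX installation, and the listed files to exist.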

In conclusion, a few words about version control. We tried to follow the principle of keeping things simple in everything. Since the simplest solution was a separate repository on github, that is what we did. We hope that using github will improve user feedback: after all, github has issues, where you can discuss shortcomings and propose your own ideas and directions.

We started this process recently, by popular demand! So far we have produced the English and Russian versions of the “quick start”, as well as the first version of the Russian user manual. The process itself seemed more convenient to us, so we are sharing it with the community.

Source: https://habr.com/ru/post/445792/
