How can you prepare documents while avoiding sluggish office suites, using your favorite text editor, separating content from presentation, keeping documents readable and transparent to a VCS, and comparing versions of a text easily?
I recently commented on the release of a new LibreOffice and decided that my thoughts deserved a more coherent write-up.
Imagine that we have received a document in MS Word format by e-mail; we have to fill it in or correct it, and then print or forward it. Most likely this document will be sent around again from time to time, and/or we will need to keep preparing updated texts based on it.
Problem number 1: our GNU/Linux machine, of course, has no MS Office, and OpenOffice and its descendants are terribly slow (especially if we are used to Vim and other lightweight programs).
Problem number 2: the layout of incoming files is very often catastrophic (alignment done with runs of spaces, formatting without styles, etc.), so you practically have to redo the entire document before you can make changes properly.
Problem number 3: the MS Word format is opaque, and OpenDocument is only conditionally transparent. Even an open format cannot be read with simple tools: you need to unzip an archive of files and parse XML. So for version control such documents are effectively opaque.
What to do? The Unix way comes to the rescue, in the form of simple programs that work with plain text.
Tools
- Antiword - a utility for extracting text from MS Word files;
- reStructuredText (reST) - a very simple yet quite powerful semantic markup language for text;
- Docutils tools (rst2latex, rst2html, rst2odt, rst2xml) and rst2pdf - utilities for exporting text from reST to common formats for layout, web and print;
- Bonus: rst2a (online converter with API!)
Workflow
- antiword reads .doc and prints plain text;
- edit the plain text;
- rst2* utilities convert text from reST markup to arbitrary formats.
For example, we have received a document in MS Word format, and we want to quickly fix something in it and keep a text/template for ourselves for the future:
$ antiword estimate.doc > estimate.txt
$ vim estimate.txt
$ rst2pdf estimate.txt -o estimate.pdf
Done: we have a good-looking PDF that can be viewed and printed. By the way, it is convenient to keep the PDF open in, for example, Okular while editing the source. When you export from reST to PDF (and this can be automated), Okular immediately reloads the content without losing the current page. The result is an almost instant preview. You can also print the document right there in Okular.
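The automatic export can be as simple as a polling loop. Below is a minimal sketch, assuming rst2pdf is on the PATH; the function names rebuild_if_stale and watch_and_rebuild are made up for this illustration, and the one-second interval is arbitrary.

```shell
#!/bin/sh
# Rebuild the PDF only when the reST source is newer than the output.
rebuild_if_stale() {
    src=$1
    out=$2
    # -nt ("newer than") is not strictly POSIX, but bash, dash, ksh
    # and zsh all support it.
    if [ ! -e "$out" ] || [ "$src" -nt "$out" ]; then
        rst2pdf "$src" -o "$out"
    fi
}

# Check once a second until interrupted with Ctrl-C.
# If rst2pdf fails, the output stays stale and the build is retried
# on the next pass, so error messages repeat until the source is fixed.
watch_and_rebuild() {
    while true; do
        rebuild_if_stale "$1" "$2"
        sleep 1
    done
}
```

Running `watch_and_rebuild estimate.txt estimate.pdf` in a second terminal should then keep the open PDF current after every save in the editor.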
I usually add a style file as well (the same one more or less fits all documents and can be extended for a specific document). Styles for rst2pdf are written in JSON (see the documentation).
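As an illustration of what such a style file might contain, here is a small sketch that writes one from the shell. The function name write_style, the file name and all the values are assumptions made for this example, and only a few keys are shown; depending on the rst2pdf version, YAML may be the preferred stylesheet format, so check the manual for your release.

```shell
#!/bin/sh
# Write a small rst2pdf stylesheet to the given path.
# All values are illustrative; anything not listed here keeps the
# default from rst2pdf's built-in stylesheet.
write_style() {
    cat > "$1" <<'EOF'
{
  "pageSetup": {
    "size": "A4",
    "margin-top": "2cm",
    "margin-bottom": "2cm",
    "margin-left": "2.5cm",
    "margin-right": "2cm"
  },
  "styles": [
    ["base", {"fontSize": 11}],
    ["heading1", {"fontSize": 18, "spaceAfter": 12}]
  ]
}
EOF
}
```

After `write_style estimate.json`, the stylesheet is passed with the -s option: `rst2pdf estimate.txt -s estimate.json -o estimate.pdf`.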
Results
Problem number 1 is solved: we used cross-platform, lightweight, fast and unobtrusive tools.
Problem number 2 is solved: the nightmarish original layout is discarded right away, and what remains is the text itself, which is easy to tidy up into proper reST. If necessary, the result can be taken further into LaTeX.
Problem number 3 is solved: all documents (and styles) are completely transparent to a VCS and can be read without any special tools, just like program code. So if something has changed in an official document, you will always have readable diffs between any two dates. For incoming documents you can also store the original versions (preferably as antiword output), so that you can diff them and easily carry only the changes over into your clean reST files.
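The handling of incoming versions could look roughly like this. It is only a sketch: the function name import_inbound, the inbound/ directory and the .prev suffix are invented for the example. Under a VCS the commit history would take over the role of the .prev copy.

```shell
#!/bin/sh
# Convert an incoming .doc to plain text, keep it under inbound/, and
# print a unified diff against the previously stored copy, if any.
import_inbound() {
    doc=$1
    txt="inbound/$(basename "$doc" .doc).txt"
    mkdir -p inbound
    # Keep the last stored version around for comparison.
    if [ -f "$txt" ]; then
        mv "$txt" "$txt.prev"
    fi
    antiword "$doc" > "$txt"
    if [ -f "$txt.prev" ]; then
        # diff exits with status 1 when the files differ; that is the
        # expected case here, so it is not treated as a failure.
        diff -u "$txt.prev" "$txt" || true
    fi
}
```

Calling `import_inbound estimate.doc` each time a new version arrives prints only what the sender changed, which can then be carried over into the reST file by hand.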
Notes
- The proposed stack of utilities does not cover every case. The beauty of the Unix way is that you can freely swap components.
- By the way, Antiword "is able to convert documents to plain text, to PostScript, to PDF and XML / DocBook", so in some cases you can even avoid reST.
- This note is written in reST and exported via rst2html. ;-)
- UPD: thanks to ingspree for the correction: the right spelling is reST :)
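For the direct Antiword export mentioned in the notes, the relevant options are -a (PDF), -p (PostScript) and -x db (DocBook XML). A small sketch follows; the wrapper names are invented for this example, and a4 is just one of the accepted paper sizes.

```shell
#!/bin/sh
# Thin wrappers around antiword's non-text output modes.
# Each writes next to the input file, swapping the .doc extension.
doc_to_pdf()     { antiword -a a4 "$1" > "${1%.doc}.pdf"; }
doc_to_ps()      { antiword -p a4 "$1" > "${1%.doc}.ps"; }
doc_to_docbook() { antiword -x db "$1" > "${1%.doc}.xml"; }
```

For instance, `doc_to_pdf estimate.doc` would produce estimate.pdf without going through reST at all, at the cost of having no editable source under version control.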