Because this article is the result of several years of experiments, it is going to be long. But perhaps it will save someone many months of stepping on the rakes described here.
Strictly speaking, this is not even about Django, but about printing regulated documents from Python using template engines.
For those too lazy to read further, I'll say right away: the problem is not completely solved. But a more or less working solution has taken shape.
1. Task
- The user enters data into a web form.
- The server inserts this data into a print form template.
- The server returns the result to the user in a printable format.
2. Limitations
- Forms are either “soft” (precision is not critical, e.g. a contract or an invoice) or “hard” (maximum precision, read by a scanner, e.g. a migrant notification or an application for the simplified tax system (STS), Form 26.2-1).
- Even “soft” forms should print as close as possible to what the author intended (if I set the margins to 1 cm, the user should get a document with margins of exactly 1 cm) and, above all, respect page breaks (see Forms 11001, 21001, etc.).
- Converting the source material should take minimal effort (as a rule it is an .xls or .doc file taken from “Consultant” or “Garant”).
- Since this is a web application, the solution should be responsive and reliable, so working through native Python libraries is highly desirable.
- It should be possible to put the whole setup on rented hosting (ideally GAE).
- The ability to visually edit templates is desirable.
- A quick preview of the template is desirable (and, better still, of the result too).
The first step is choosing the output format. After weighing it from several points of view (cross-platform support, guaranteed results, ease of conversion into it), the choice fell on PDF.
Next: the input formats and how to convert them.
3. Soft forms
ODF
This means the OpenDocument Format: ODS, ODT and the rest.
Everything is very simple here:
- Edit the template in LibreOffice (leaving room for the data).
- Somehow fill in the fields in Django.
- Somehow get a PDF.
Room for the data: either add user-defined fields to the document, or insert {{django}} {{tags_django}} directly into the text. In the first case, filling in these fields from Python is probably possible, but I can hardly imagine how (or rather, everything I can imagine looks extremely convoluted). So we simply place the tags as text.
Filling in the fields is then trivial: we simply feed the template to the Django template engine (poking at the template's insides with Python libraries is left to the Gentoo crowd :-). To avoid unzipping and zipping the documents on every request, they are saved as *.fodX (Flat XML), a single uncompressed XML file. The template is fed in as XML.
There is only one way to get the PDF: LibreOffice, whether through a LibreOffice daemon (libreofficed, found somewhere among the Ubuntu users), unoconv, or a hand-rolled LibreOffice launch in daemon mode. All of these options are roughly equivalent.
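For reference, the simplest variant is a one-shot launch of LibreOffice from Python, roughly like this. It is a sketch that assumes the soffice binary is on PATH; the function names are mine, and this is not the daemon setup mentioned above, so every call pays the full start-up cost.

```python
import subprocess
from pathlib import Path


def soffice_command(source, out_dir, binary="soffice"):
    """Build the command line for a headless conversion to PDF."""
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_pdf(source, out_dir, timeout=120):
    """Run LibreOffice once and return the path of the produced PDF."""
    # The timeout is a guess: a conversion can easily take tens of seconds.
    subprocess.run(soffice_command(source, out_dir), check=True, timeout=timeout)
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

Calling convert_to_pdf("contract.fodt", "/tmp/out") would be expected to leave /tmp/out/contract.pdf behind.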
Advantages
- You can use documents found on the Internet right away (as a rule they come from “Consultant”, in Microsoft Office formats).
- Editing templates is no problem.
- Neither is previewing them.
- Possibly, getting the PDF via Google Docs; I haven't tried it yet. I'm sure it would be fast (and I have no doubt it would be incorrect: try uploading the same Form 21001 from “Consultant” to Google Docs; the form is also on the tax service website).
Disadvantages
- Sometimes, while you are writing a template, LibreOffice spontaneously breaks the tags by inserting things like span lang="en-GB" inside {{..}}. Then you have to restore everything by hand.
- Simply fantastic resource consumption on the server: 100% CPU (one core only, no matter how many there are), hundreds of megabytes of RAM, and up to a minute or more per PDF (Form 21001 takes 50 seconds on a 3.0 GHz Pentium 4). Java, too.
- It pulls in a huge number of packages (Fedora, CentOS).
- It needs at least some X server (Xvfb, for example).
- Some hosting providers will probably let you deploy LibreOffice, but I strongly doubt that nic.ru will, for example. GAE is out of the question.
- No preview of the result.
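The manual tag repair mentioned in the disadvantages can be partly automated. Below is a rough sketch that strips any markup LibreOffice has pushed inside {{ ... }} or {% ... %}; the function name and the sample string are mine. It assumes the inserted markup carries no text of its own.

```python
import re

# A Django variable or block tag, possibly with XML markup inside it.
TAG_RE = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)
# Any XML element tag, opening or closing.
MARKUP_RE = re.compile(r"<[^>]+>")


def repair_tags(xml):
    """Remove XML markup that ended up inside Django template tags."""
    return TAG_RE.sub(lambda m: MARKUP_RE.sub("", m.group(0)), xml)


broken = '<text:p>{{ cust<text:span text:style-name="T1">omer</text:span> }}</text:p>'
repaired = repair_tags(broken)
```

If a span opens inside the braces and closes outside them (or the other way round), this leaves the XML unbalanced, so check the result before feeding it to the template engine.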
Summary
Suitable as a last-resort fallback. But only as a last resort.
HTML
Here, template editing (by hand) and the template engine (Django's) are straightforward. Only one small but crucial question remains: how do we get the PDF? Quickly, with good quality, and with page breaks where they are needed. This is where most of the experiments went.
Numerous experiments with pure-Python HTML renderers (PISA and its ancestors, descendants and forks) led to one important (IMHO) conclusion: for a guaranteed result, use a ready-made HTML engine. As we all know, there are only four decent ones. Of those, only two can be used on Linux: Gecko and WebKit. Gecko can most likely be called from Python, but (a) that requires a running X server (as with LibreOffice) and (b) I did not find a [semi-]finished recipe.
That leaves WebKit:
- PyQt4 > Qt > WebKit > QPrinter (such as this). Native (although it drags a lot along with it) and fast, but it ignores page breaks. It also needs special dances with DPI and ZoomFactor.
- GTK > WebKit > GTK printer (like this). Native and fast, but it also ignores page breaks.
- A specially patched WebKit, wkhtmltopdf, used either as an external binary (the option currently in use) or through a native Python binding (in progress, with some minor problems). Native (with the binding), fast, honors page-break, and the result is guaranteed.
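The external-binary variant boils down to something like the sketch below. It assumes wkhtmltopdf is on PATH; the HTML, the class name and the function names are invented for illustration, and the 10mm margins are just an example value.

```python
import subprocess

# Hypothetical two-page form: each .sheet block starts on its own page.
PAGE_HTML = """<!DOCTYPE html>
<html><head><meta charset="utf-8">
<style>
  .sheet { page-break-after: always; }
  .sheet:last-child { page-break-after: auto; }
</style></head>
<body>
  <div class="sheet">Page 1 of the form</div>
  <div class="sheet">Page 2 of the form</div>
</body></html>"""


def wkhtmltopdf_command(html_path, pdf_path, margin="10mm", binary="wkhtmltopdf"):
    """Build a wkhtmltopdf command line with explicit A4 size and margins."""
    cmd = [binary, "--quiet", "--page-size", "A4"]
    for side in ("top", "right", "bottom", "left"):
        cmd += ["--margin-" + side, margin]
    return cmd + [str(html_path), str(pdf_path)]


def html_to_pdf(html_path, pdf_path):
    """Convert an HTML file to PDF by calling the external binary."""
    subprocess.run(wkhtmltopdf_command(html_path, pdf_path), check=True)
```

Setting the margins on the command line rather than trusting defaults is what keeps "1 cm means 1 cm" under control.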
Advantages
- In theory, visual editing is possible.
- Instant preview (as the same HTML) of both the template and the result.
- Fast conversion to PDF.
- A pure Python API for conversion to PDF (this is the part that is “in progress”).
Disadvantages
- Still, high-quality HTML has to be written by hand.
- Complex forms (such as 21001) have to be written or drawn from scratch, because what is available on the Internet is a dreadful .xls.
- Since a lib/binary compiled for Linux is used, it will not work on the same nic.ru (FreeBSD) without crutches. GAE is still out of the question.
Summary
The main option for “soft” documents. Still, the search continues for a high-quality pure-Python HTML renderer: no Flash, JS or other bells and whistles, but with solid CSS handling.
Maybe
TeX, LaTeX, LyX and DocBook are being considered for the future, but so far they offer no advantages (especially for “almost soft” forms such as the same 21001).
4. Hard forms
Here everything is much sadder, especially since this is exactly where a visual editor is highly desirable.
Moreover, the vast majority (if not all) of the “hard” Russian forms use “squares”: the text is split into individual letters, and each letter goes into its own box (example).
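Whatever the output format, the “squares” first require splitting a value into a fixed number of cells. A small sketch (the function name is mine, and upper-casing the text is my assumption, since such forms are usually filled in block capitals):

```python
def to_cells(text, cell_count, pad=" "):
    """Split text into exactly cell_count one-character cells."""
    chars = list(text.upper())  # assumption: the form wants block capitals
    if len(chars) > cell_count:
        raise ValueError("%r does not fit into %d cells" % (text, cell_count))
    # Unused cells on the right are filled with the pad character.
    return chars + [pad] * (cell_count - len(chars))


surname_cells = to_cells("Ivanov", 8)
```

Each element then goes into its own box, whether that box is a table cell in HTML or a coordinate on the page.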
Let's skip the first things that come to mind (like “stamp the text onto a TIFF”) and go straight to the finalists.
RML
RML, developed by ReportLab (yes, python-reportlab is theirs), is ordinary XML that lets you work wonders with PDF. Since the well-known python-trml2pdf is already RIP (as its developer honestly wrote to me), I had to take trml2pdf and extend it a little, because it does not support many interesting RML features, and my religion forbids me from buying (let alone cracking) the commercial rml2pdf.
Advantages
- Native.
- Fast.
- Flexible.
- There should be no hosting problems (in theory), even on GAE (I haven't tried).
Disadvantages
- Strictly handmade.
- Very annoying syntax when you need to mix precise positioning (“graphics”) with “soft” text (“flowables”) (hence, apparently, the lack of a visual editor).
- No preview of either the template or the result.
Summary
A fallback option for precise forms (especially simple ones).
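To give a feel for the “graphics” versus “flowables” split, here is a rough sketch of an RML document built from Python. The element names follow RML as I understand it, so check them against the RML documentation; I have not verified which of them trml2pdf supports. The coordinates, the file name and the function name are invented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Precisely positioned "graphics" live in pageGraphics,
# while "soft" text flows through the story into the frame.
RML_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<document filename="form.pdf">
  <template pageSize="(21cm, 29.7cm)">
    <pageTemplate id="main">
      <pageGraphics>
        <setFont name="Helvetica" size="10"/>
        <drawString x="2cm" y="27.5cm">{title}</drawString>
        <rect x="2cm" y="26cm" width="0.5cm" height="0.7cm"/>
      </pageGraphics>
      <frame id="body" x1="2cm" y1="2cm" width="17cm" height="23cm"/>
    </pageTemplate>
  </template>
  <stylesheet/>
  <story>
    <para>{body}</para>
  </story>
</document>"""


def build_rml(title, body):
    """Fill the RML skeleton, escaping user data for XML."""
    return RML_TEMPLATE.format(title=escape(title), body=escape(body))


rml = build_rml("Form 26.2-1", "Horns & Hooves LLC")
# Cheap sanity check: the result must at least be well-formed XML.
rml_root = ET.fromstring(rml.encode("utf-8"))
```

The rect here stands for a single “square”; a real form needs one per character cell, which is exactly where the syntax gets tiresome.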
PDF forms
Everything is very simple here: the source is a PDF, and so is the final result.
- Take the original PDF form in the left hand.
- Take XFDF (simple XML), processed by the built-in Django template engine, in the right.
- Merge them (populate) into a new, flattened PDF.
- Hand it to the user.
There is only one problem: step 3.
So far I have not found a native, correctly working Python API for PDF forms (poppler can already do something, but it still needs a lot of work), so the only acceptable option is iText, either through pdftk or through your own wrapper, according to taste.
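The pdftk route can be sketched as follows. Here the XFDF is built with plain string functions instead of a Django template, only to keep the example self-contained; the field names, file names and function names are hypothetical, and pdftk is assumed to be on PATH.

```python
import subprocess
from xml.sax.saxutils import escape, quoteattr


def build_xfdf(fields):
    """Build a minimal XFDF document from a dict of field name -> value."""
    rows = "".join(
        "<field name=%s><value>%s</value></field>" % (quoteattr(name), escape(value))
        for name, value in fields.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">'
        "<fields>%s</fields></xfdf>" % rows
    )


def pdftk_command(form_pdf, xfdf_path, out_pdf, binary="pdftk"):
    """Build the pdftk command that fills the form and flattens it."""
    return [binary, str(form_pdf), "fill_form", str(xfdf_path),
            "output", str(out_pdf), "flatten"]


def fill_form(form_pdf, xfdf_path, out_pdf):
    """Populate the PDF form from an XFDF file on disk."""
    subprocess.run(pdftk_command(form_pdf, xfdf_path, out_pdf), check=True)


xfdf = build_xfdf({"surname": "Ivanov", "company": "Horns & Hooves"})
```

Flattening removes the interactive fields, so the user gets an ordinary PDF that prints the same everywhere.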
Advantages
- Anything can be turned into a PDF form (how is a separate question).
- It can even be edited (likewise).
- Absolutely guaranteed result.
- “Squares” are built into PDF (comb fields).
- Most likely no hosting problems (perhaps not even with GAE); I haven't tried.
Disadvantages
- It calls an external application instead of a Python API.
- Java, too.
Summary
The main option for precisely printed forms.
5. General summary
Where things stand today:
- “Soft” forms: html | webkit, though through the rather heavy, redundant and not very portable wkhtmltox library (and the search continues).
- “Hard” forms: PDF forms, though through a crutch in the shape of an external Java library (and I keep wrestling with poppler).
- ODF and RML as the respective fallback options.
P.S. You can see how it all works here, without ODF and RML, though support for the latter is provided.