Every day, hundreds of operations at the REG.RU registrar, where I work as a programmer, require issuing official documents: contracts, invoices, certificates, and so on, which both the company and its customers need to print. The PDF format is well suited for this; today it has become the de facto standard for exchanging and distributing documents. Its main advantages are cross-platform support, hardware independence, and security. These qualities have made PDF popular with users and one of the most widely used formats.
How can I create PDF documents on the fly from a script?
There are various tools for this. One of them is the LaTeX markup language, which automates many article-preparation tasks: typesetting in several languages, numbering sections and formulas, cross-references, placing illustrations, and much more. But LaTeX has one serious drawback: a very steep learning curve, so mastering it takes a lot of time. Working with tables in LaTeX is also very inconvenient. After spending a lot of time looking for the best solution, I concluded that the easiest approach is to convert a finished HTML page to PDF and send it to the client. Below I review the programs that can perform this conversion.
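Since the plan is to render an HTML page, convert it, and hand the PDF to the client, here is a minimal Python sketch of the "hand it to the client" step as a WSGI application. It assumes two hypothetical callables that are not part of any tool reviewed here: `render_html`, which produces the finished page, and `html_to_pdf`, which wraps whichever converter ends up being chosen.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical hooks: render_html() returns the finished page,
# html_to_pdf(html) wraps whichever converter is used and returns PDF bytes.
HtmlSource = Callable[[], str]
PdfConverter = Callable[[str], bytes]


def make_pdf_app(render_html: HtmlSource, html_to_pdf: PdfConverter,
                 filename: str = "document.pdf"):
    """Return a WSGI app that converts a freshly rendered HTML page to PDF."""

    def app(environ, start_response) -> Iterable[bytes]:
        pdf = html_to_pdf(render_html())
        headers: List[Tuple[str, str]] = [
            ("Content-Type", "application/pdf"),
            ("Content-Length", str(len(pdf))),
            # "attachment" asks the browser to offer a download instead of
            # displaying the document inline.
            ("Content-Disposition", f'attachment; filename="{filename}"'),
        ]
        start_response("200 OK", headers)
        return [pdf]

    return app
```

For local experiments the app can be served with the standard library, e.g. `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. Keeping the converter behind a plain callable makes it easy to swap one tool for another after comparing them.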
Requirements for converters
I focused on the following converter features:
- Easy font configuration
- Page break insertion
- Independence from the X server
- CSS support
Fonts and encoding should be easy to configure; ideally, the converter should detect the encoding and font on its own. To keep related data on one page instead of letting it spill over onto two, users insert page breaks, and I wanted to be able to create those breaks in a simple way, through a CSS property. The converter must not depend on X Windows, because it runs on a web server that is heavily loaded even without X. Of course, you could use Xvfb, but that is not the right solution. For the test, I made two simple HTML pages, both validated. The first page contains a page break set with a CSS property; the second contains a complex table with merged cells.
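To illustrate the page-break and encoding requirements, here is a small Python sketch that generates a test page in the spirit of the first one: every section after the first gets a class with `page-break-before: always` (the CSS 2.1 property; newer CSS also spells it `break-before: page`), and the page declares its encoding with a `meta charset` tag. The function name and markup are hypothetical, not the actual test pages used in this review.

```python
from html import escape
from typing import List

# CSS rule that forces the element onto a new printed page.
PAGE_BREAK_CSS = ".page-break { page-break-before: always; }"


def build_test_page(sections: List[str], title: str = "Test page") -> str:
    """Return an HTML page where every section after the first starts a new page."""
    parts = []
    for i, text in enumerate(sections):
        # Only sections after the first need the break class.
        cls = ' class="page-break"' if i > 0 else ""
        parts.append(f"<div{cls}><p>{escape(text)}</p></div>")
    body = "\n".join(parts)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'  # explicit encoding for the converter
        f"<title>{escape(title)}</title>\n"
        f"<style>{PAGE_BREAK_CSS}</style>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )
```

Writing the result to a file (for example with `Path("test.html").write_text(page, encoding="utf-8")`) gives a page that can be fed to each converter in turn.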
This is how the pages look in the browser:
Overview
wkhtmltopdf. Probably the most popular converter today, and, as it turned out, deservedly so. It is based on the WebKit engine: it takes fonts from the system and can insert page breaks, and it needs library files from the X server to run.
Sample output:
As you can see from the example, wkhtmltopdf did a good job: all blocks are in place, the images are there, and the page break works.
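Calling wkhtmltopdf from a script comes down to running its command line, `wkhtmltopdf [options] <input> <output>`. Below is a minimal Python sketch; the choice of `--page-size` and `--encoding` options (both documented wkhtmltopdf options) and the helper names are my own assumptions, not part of the review.

```python
import shutil
import subprocess
from typing import List, Optional


def wkhtmltopdf_command(src: str, dst: str, page_size: str = "A4",
                        encoding: Optional[str] = "utf-8") -> List[str]:
    """Build the wkhtmltopdf command line for one HTML file or URL."""
    cmd = ["wkhtmltopdf", "--page-size", page_size]
    if encoding:
        # Default text encoding to assume for the input page.
        cmd += ["--encoding", encoding]
    cmd += [src, dst]
    return cmd


def convert_with_wkhtmltopdf(src: str, dst: str) -> None:
    """Run the conversion; raise if wkhtmltopdf is missing or exits with an error."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not installed or not on PATH")
    subprocess.run(wkhtmltopdf_command(src, dst), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.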
webkit2pdf. Similar to wkhtmltopdf, but it needs a running X server. Its results can be seen above.
pisa (xhtml2pdf). The converter is written in Python, so it does not depend on the X server. It can insert page breaks, and fonts are configured in a separate CSS file whose path is passed as a parameter. However, it is very picky: at the slightest error or omission in the HTML code, it crashes.
Sample output:
Very poor: the font is detected correctly, but the markup is broken.
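Because pisa needs explicit font file paths in CSS, that stylesheet can be generated rather than written by hand. Here is a minimal Python sketch that builds `@font-face` rules from a mapping of family names to TrueType files. The family name and the DejaVu path in the test are assumptions (a typical Debian location), and how the file is handed to pisa (for instance a `--css` command-line option or the `default_css` argument of `pisa.CreatePDF`) should be checked against the documentation of the installed version.

```python
from typing import Dict


def font_css(fonts: Dict[str, str], body_family: str) -> str:
    """Build a stylesheet with @font-face rules pointing at TrueType files.

    fonts maps a font-family name to the path of its .ttf file.
    """
    rules = []
    for family, path in fonts.items():
        # Register the font file under the given family name.
        rules.append(
            "@font-face {\n"
            f"  font-family: {family};\n"
            f'  src: url("{path}");\n'
            "}"
        )
    # Make the registered font the default for the document body.
    rules.append(f"body {{ font-family: {body_family}; }}")
    return "\n".join(rules)
```

The result can be saved with `Path("fonts.css").write_text(css, encoding="utf-8")` and reused for every document, so font paths live in one place.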
html2pdf. Easy to use, takes fonts from the system, and can insert page breaks. It uses an old version of the Firefox browser for conversion. However, it needs a running X server, it sometimes shuts down and refuses to work, and it is paid.
Sample output:
Except for the footer on the second page, all blocks and images are in place.
htmldoc. A simple converter without frills.
Sample output:
It does not understand CSS.
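Since htmldoc ignores CSS, fonts are set through command-line options instead. A minimal Python sketch of building such a command follows; `--webpage`, `-t`, `--bodyfont`, `--fontsize`, and `-f` are documented htmldoc options, while the default values and the function name are my own choices for illustration.

```python
from typing import List


def htmldoc_command(src: str, dst: str, body_font: str = "sans",
                    font_size: int = 11) -> List[str]:
    """Build an htmldoc command that renders one HTML file to PDF."""
    return [
        "htmldoc",
        "--webpage",               # unstructured web page: no title page or TOC
        "-t", "pdf",               # output format
        "--bodyfont", body_font,   # fonts come from options, not from CSS
        "--fontsize", str(font_size),
        "-f", dst,                 # output file
        src,
    ]
```

The list can be passed straight to `subprocess.run(..., check=True)`, just like the wkhtmltopdf command.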
html2ps + ps2pdf. Similar to htmldoc in its characteristics.
Sample output:
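This bundle is a two-step pipeline: html2ps writes PostScript to standard output, and ps2pdf reads it from standard input when given `-` as the input name. A minimal Python sketch wiring the two together without a temporary file; the function names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def pipeline_commands(src: str, dst: str) -> Tuple[List[str], List[str]]:
    """Commands for HTML -> PostScript (stdout) -> PDF."""
    html2ps = ["html2ps", src]       # writes PostScript to stdout by default
    ps2pdf = ["ps2pdf", "-", dst]    # "-" reads PostScript from stdin
    return html2ps, ps2pdf


def convert_via_postscript(src: str, dst: str) -> None:
    """Pipe html2ps into ps2pdf and raise if either step fails."""
    first, second = pipeline_commands(src, dst)
    producer = subprocess.Popen(first, stdout=subprocess.PIPE)
    consumer = subprocess.run(second, stdin=producer.stdout)
    # Close our copy of the pipe so html2ps cannot block on a dead reader.
    producer.stdout.close()
    if producer.wait() != 0:
        raise subprocess.CalledProcessError(producer.returncode, first)
    if consumer.returncode != 0:
        raise subprocess.CalledProcessError(consumer.returncode, second)
```

Checking both return codes matters here: a failure in html2ps would otherwise only show up as an empty or truncated PDF.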
prince. A commercial converter, and not a cheap one. It uses system fonts, can insert page breaks, and does not depend on the X server.
Sample output:
Everything has shifted; there are positioning problems.
Results in table form
Name | Font configuration | Page breaks | X server independence | CSS support | Free
wkhtmltopdf | Uses system fonts | + | + | + | +
webkit2pdf | Uses system fonts | + | - | + | +
html2pdf | Uses system fonts | + | - | + | -
htmldoc | Set via parameters | - | + | - | +
pisa (xhtml2pdf) | Font paths specified in a CSS file | + | + | + | +
html2ps + ps2pdf | ? | - | + | - | +
prince | Uses system fonts | + | + | + | -
Conclusions
As it turned out, the free converters handled the task better. If you need to convert a page with a lot of graphics, frames, and JavaScript, it is better to use a WebKit-based converter. For a page with a minimal number of HTML elements, htmldoc will do the job well.
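The rule of thumb above can be turned into a small selection helper. This is only a sketch: deciding whether a page is "heavy" is left to the caller, and falling back to wkhtmltopdf for simple pages when htmldoc is not installed is my own assumption, not a recommendation from the review.

```python
import shutil
from typing import Callable, List, Optional


def pick_converter(heavy_page: bool,
                   which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Pick an installed converter using the rule of thumb from the review.

    heavy_page should be True for pages with many images, frames or JavaScript.
    """
    if heavy_page:
        # Complex pages: only the WebKit-based converter is considered.
        candidates: List[str] = ["wkhtmltopdf"]
    else:
        # Minimal markup: htmldoc first, wkhtmltopdf as a fallback (an assumption).
        candidates = ["htmldoc", "wkhtmltopdf"]
    for name in candidates:
        if which(name) is not None:
            return name
    raise RuntimeError("no suitable HTML-to-PDF converter found on PATH")
```

Injecting `which` keeps the function testable without installing anything; in production the default `shutil.which` looks the tools up on `PATH`.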
Note
An overview of PHP converters can be found here, and a review of online converters can be found here.