
From HTML to PDF - easy! Converter Overview

Every day, the REG.RU registrar, where I work as a programmer, performs hundreds of operations that require issuing official documents. These include contracts, invoices, certificates, and so on, which both companies and customers need to print. PDF suits this purpose well: it has become the de facto standard for exchanging and distributing documentation. Its main advantages are that it is cross-platform, hardware-independent, and secure, which has made it one of the most widely used formats.



How can you create PDF documents on the fly from a script? Various tools exist for this. One of them is the LaTeX markup language, which automates many tasks in preparing articles, including typesetting in several languages, numbering sections and formulas, cross-references, placing illustrations, and much more. But LaTeX has one serious problem: a very steep learning curve, so it takes a long time to learn. Working with tables in LaTeX is also very inconvenient. After spending a lot of time looking for the best solution, I concluded that the easiest way is to convert a finished HTML page to PDF and send it to the client. Below I review the programs that can be used for this conversion.



Requirements for converters


I paid the most attention to the following converter features:

Fonts and encoding should be easy to customize. Ideally, the converter should detect the encoding and font in use.

Page breaks should be supported. Users insert a break so that the data they need stays on one page instead of spilling over onto two. I would like to be able to create breaks in a simple way, through a CSS property.

The converter must not depend on X Windows, because it runs on a web server that is heavily loaded even without X Windows. You can of course use Xvfb, but that is not the right solution.

For the test, I made two simple HTML pages and validated them. The first page contains a break made with a CSS property; the second contains a complex table with merged cells.
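The two test pages themselves are not reproduced in the article, so the following is only a minimal sketch of what such pages might look like. The function names, the text, and the table contents are made up for illustration. Only the two ideas come from the description above: a break made with the CSS `page-break-before` property, and a table with merged cells.

```python
"""Hypothetical stand-ins for the two test pages; not the author's originals."""


def page_with_break() -> str:
    # The second block is pushed to a new page by the CSS page-break property.
    return """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Page break test</title></head>
<body>
  <div>Content for the first page</div>
  <div style="page-break-before: always">Content for the second page</div>
</body>
</html>
"""


def page_with_merged_cells() -> str:
    # colspan merges cells horizontally, rowspan merges them vertically.
    return """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Merged cells test</title></head>
<body>
  <table border="1">
    <tr><th colspan="2">Service</th><th rowspan="2">Total</th></tr>
    <tr><th>Name</th><th>Period</th></tr>
    <tr><td>Domain registration</td><td>1 year</td><td>100</td></tr>
  </table>
</body>
</html>
"""
```

Writing each returned string to a file gives a pair of inputs that can be passed to any of the converters reviewed below.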


This is how the pages look in the browser:

image



image



Overview


wkhtmltopdf. Probably the most popular converter to date and, as it turned out, for good reason. It is based on the WebKit engine, takes fonts from the system, and can make page breaks. To run, it needs library files from the X server.

Sample output:

image image image

As the example shows, wkhtmltopdf did a good job: all the blocks are in place, the pictures are present, and the page break is there.
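For calling wkhtmltopdf from a script, a minimal sketch could look like the following. It assumes the `wkhtmltopdf` binary is on `PATH` and uses only its basic `wkhtmltopdf input.html output.pdf` form; the helper names and the 60-second timeout are illustrative choices, not something the article prescribes.

```python
import shutil
import subprocess
from pathlib import Path


def build_wkhtmltopdf_command(html_path, pdf_path, binary="wkhtmltopdf"):
    # Basic invocation: input HTML file first, output PDF file second.
    return [binary, str(html_path), str(pdf_path)]


def convert_with_wkhtmltopdf(html_path, pdf_path):
    # Fail early with a clear message if the binary is not installed.
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code;
    # the timeout (an assumed value) keeps a stuck conversion from hanging the server.
    subprocess.run(
        build_wkhtmltopdf_command(html_path, pdf_path),
        check=True,
        timeout=60,
    )
    return Path(pdf_path)
```

Passing the command as a list rather than a shell string avoids quoting problems when file names come from user data.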



webkit2pdf. An analogue of wkhtmltopdf, but it needs a running X server. Its results are the same as those shown just above.



pisa (xhtml2pdf). This converter is written in Python, so it does not depend on the X server. It can make page breaks, and fonts are configured in a separate CSS file whose path is passed as a parameter. However, it is very picky: the slightest error or omission in the HTML code makes it crash.

Sample output:

image image

Very poor: the font was identified correctly, but the layout is broken.
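Because pisa is a Python library, it can also be called directly from a script. The sketch below assumes the `xhtml2pdf` package is installed and uses its `pisa.CreatePDF` entry point; the wrapper name `html_to_pdf_bytes` is hypothetical, and font configuration is left out.

```python
import io


def html_to_pdf_bytes(html: str) -> bytes:
    # Imported lazily so this module still loads where xhtml2pdf is missing.
    from xhtml2pdf import pisa

    buffer = io.BytesIO()
    # CreatePDF writes the rendered PDF into the file-like object passed as dest.
    status = pisa.CreatePDF(html, dest=buffer)
    # status.err is non-zero when the conversion reported errors.
    if status.err:
        raise ValueError("xhtml2pdf reported %d error(s)" % status.err)
    return buffer.getvalue()
```

Given how strict the converter is about markup, it makes sense to validate the HTML before passing it in and to treat any raised exception as a failed conversion.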



html2pdf. Easy to use, takes fonts from the system, and can make page breaks. For conversion it uses some old version of the Firefox browser. However, it needs a running X server. In addition, it can shut down and refuse to work. It is a paid product.

Sample output:

image image image

Apart from the footer on the second page, all the blocks and pictures are in place.



htmldoc. A simple, no-frills converter.

Sample output:

image image

It does not understand CSS.



html2ps, ps2pdf. Similar to htmldoc in its characteristics.

Sample output:

image image
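The html2ps and ps2pdf pair works as a pipeline: the first tool writes PostScript to standard output, and the second reads it and produces the PDF. A rough sketch, assuming both binaries are on `PATH` and that `ps2pdf` is given `-` to read from standard input; the function names are illustrative.

```python
import subprocess


def build_pipeline(html_path, pdf_path):
    # html2ps prints PostScript to stdout; "ps2pdf - out.pdf" reads it from stdin.
    return (["html2ps", str(html_path)], ["ps2pdf", "-", str(pdf_path)])


def convert_with_html2ps(html_path, pdf_path):
    first_cmd, second_cmd = build_pipeline(html_path, pdf_path)
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout)
    # Close our copy of the pipe so html2ps gets SIGPIPE if ps2pdf exits early.
    first.stdout.close()
    second_rc = second.wait()
    first_rc = first.wait()
    if first_rc != 0 or second_rc != 0:
        raise RuntimeError(
            "pipeline failed: html2ps=%d, ps2pdf=%d" % (first_rc, second_rc)
        )
```

Connecting the two processes with a pipe avoids writing an intermediate `.ps` file to disk.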



prince. A paid converter, and not a cheap one. It uses system fonts, can make page breaks, and does not depend on the X server.

Sample output:

image image image

The whole layout shifted; there are problems with positioning.



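Results in the form of a table

| Name | Font customization method | Page break support | X server independence | CSS support | Free |
| --- | --- | --- | --- | --- | --- |
| wkhtmltopdf | Uses system fonts | + | + | + | + |
| webkit2pdf | Uses system fonts | + | - | + | + |
| html2pdf | Uses system fonts | + | - | + | - |
| htmldoc | Set through parameters | - | + | - | + |
| pisa (xhtml2pdf) | Font paths must be specified in the CSS file | + | + | + | + |
| html2ps, ps2pdf bundle | ? | - | + | - | + |
| prince | Uses system fonts | + | + | + | - |

Findings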


As it turned out, the free converters handled the conversion tasks better. If you need to convert a page with a lot of graphics, frames, and JavaScript, a WebKit-based converter is the better choice. If the page contains only a minimal set of HTML elements, htmldoc will do the job well.
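The comparison table can also be written down as data and filtered by requirement. The sketch below simply transcribes the plus and minus marks from the table; the `Converter` class and the `select` helper are illustrative names. The table only records declared features, so the result says nothing about output quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Converter:
    name: str
    page_breaks: bool
    x_independent: bool
    css: bool
    free: bool


# Values transcribed from the results table (+ is True, - is False).
CONVERTERS = [
    Converter("wkhtmltopdf", True, True, True, True),
    Converter("webkit2pdf", True, False, True, True),
    Converter("html2pdf", True, False, True, False),
    Converter("htmldoc", False, True, False, True),
    Converter("pisa (xhtml2pdf)", True, True, True, True),
    Converter("html2ps, ps2pdf", False, True, False, True),
    Converter("prince", True, True, True, False),
]


def select(**required):
    # Keep converters whose flags match every requested requirement.
    return [
        c.name
        for c in CONVERTERS
        if all(getattr(c, key) == value for key, value in required.items())
    ]


if __name__ == "__main__":
    print(select(page_breaks=True, x_independent=True, css=True, free=True))
```

With all four requirements set to `True`, only wkhtmltopdf and pisa (xhtml2pdf) remain, and the sample outputs above are what separate those two.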



Note


An overview of PHP converters can be read here, and a review of online converters can be read here.



UPD: Disable your ad blocker if the pictures are not visible.

Source: https://habr.com/ru/post/134505/


