
E-books and their formats: PDF, its history, pros and cons

In our blog, we have already covered the DjVu, FB2 and FB3 formats. Today we turn to PDF, the format that came to embody the dream of a “paperless office”.


/ Flickr / Kim Siever / PD

A brief history of the format


PDF, or Portable Document Format, was created by John Warnock, one of the founders of Adobe, who wanted to simplify printing text and images from a computer. In 1984, Warnock introduced the PostScript page description language.
Wikipedia gives an example of PostScript code that draws the word “Wikipedia” several times around a circle.

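    %!PS-Adobe-1.0
    % By default, the unit of measure is 1 point = 1/72 inch,
    % so redefine it as millimeters.
    72 25.4 div           % 1 mm = 72/25.4 points
    dup                   % duplicate the value on top of the stack
    scale                 % scale both axes by that factor
    100 100 translate     % move the origin to (100 mm, 100 mm)

    /Times-Roman findfont % look up the Times-Roman font
    10 scalefont          % scale it to size 10 (in mm here, not points!)
    setfont               % make it the current font

    0 30 330 {            % loop over angles from 0 to 330 in steps of 30
      gsave               % save the current coordinate transformation
        rotate            % rotate the axes (angle in degrees is taken from the stack)
        15 0 moveto       % move to the point (15 mm, 0 mm)
        (Wikipedia) show  % draw the word in the current font
      grestore            % restore the coordinate transformation
    } for                 % end of the for loop

    showpage              % output the page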

Initially, PostScript was developed as a tool for printing documents, but Warnock later decided that the new language could be used not only to print documents but also to fully “digitize” document workflows.

As part of this vision, Adobe (co-founded by Warnock) created the IPS format (short for Interchange PostScript). Adobe Illustrator, a cross-platform graphics editor for Windows and Mac, was created to work with it.

IPS was first shown at the Seybold conference in San Jose in 1991, but the format kept that name for only two years: in 1993 it was renamed PDF. Acrobat Distiller and Acrobat Reader (later renamed Adobe Reader) appeared at the same time.

At first, PDF was not popular. The reason was the high price of the software: Acrobat Distiller cost $700 for personal use and $2,500 for corporate use, and Acrobat Reader cost another $50. Over time, Adobe lowered its prices, and PDF began to gain popularity.

By the 2000s, Acrobat Reader 4.0 had been downloaded by one hundred million people, and large IT companies such as Microsoft and Apple had begun to use the PDF format.

How PDF works


The basic approach to representing graphics and text in PDF is very similar to the one used in PostScript. Text on the page is handled by so-called text elements, which specify where the characters should be drawn. Wikipedia provides code that prints “Hello world!”:

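    /Courier              % name the desired font
    20 selectfont         % set the size in points and make the font current
    72 500 moveto         % move the current point to coordinates 72, 500
    (Hello world!) show   % draw the text in parentheses
    showpage              % output the page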

Vector graphics in PDF are drawn with paths made of straight line segments and cubic Bézier curves. Shapes built from paths can be filled with color or stroked. Raster images are represented as dictionaries and streams: the dictionary describes the image's properties, and the stream holds its binary data.
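
To make the dictionary-plus-stream idea more concrete, here is a minimal Python sketch that assembles the bytes of a grayscale image object. The key names (/Width, /Height, /ColorSpace, /BitsPerComponent, /Filter, /Length) are standard PDF image dictionary keys; the helper name image_xobject and the 2x2 sample image are assumptions made for illustration, and the result is only a fragment, not a complete PDF file.

```python
import zlib


def image_xobject(width: int, height: int, gray_pixels: bytes) -> bytes:
    """Build the dictionary and stream of a grayscale PDF image object.

    Hypothetical helper for illustration: it returns only the object body,
    not a complete PDF file.
    """
    if len(gray_pixels) != width * height:
        raise ValueError("expected one byte per pixel")
    # The stream holds the binary image data, here Flate (ZIP) compressed.
    data = zlib.compress(gray_pixels)
    # The dictionary describes the image: size, color space, bit depth,
    # and how the stream is encoded.
    dictionary = (
        f"<< /Type /XObject /Subtype /Image "
        f"/Width {width} /Height {height} "
        f"/ColorSpace /DeviceGray /BitsPerComponent 8 "
        f"/Filter /FlateDecode /Length {len(data)} >>"
    ).encode("ascii")
    return dictionary + b"\nstream\n" + data + b"\nendstream"


# A 2x2 image: black, white / white, black.
pixels = bytes([0, 255, 255, 0])
obj = image_xobject(2, 2, pixels)
print(obj.split(b"\nstream\n")[0].decode("ascii"))
```

A viewer reads the dictionary first and only then knows how to decode the bytes between stream and endstream.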

The size of a PDF file depends on image resolution, font parameters, the use of hyperlinks, video, and so on. Before the 2000s, PDF files often ran to many megabytes, because the pages of most documents were stored as JPEG images. To solve this problem, Adobe offered MRC (Mixed Raster Content) compression technology.

MRC splits a scanned page into layers: a background layer, a text layer, and a color mask. Each layer is compressed with its own codec. Text, for example, can be compressed with JBIG2, which groups similar letter shapes and builds a dictionary of them. Each distinct character is then encoded once, and every other occurrence simply refers to the dictionary entry.
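
The dictionary idea can be sketched in a few lines of Python. This is a toy model, not the actual JBIG2 algorithm: glyph bitmaps are represented as tuples of strings, identical bitmaps are stored once, and the page becomes a list of indices into the dictionary. The function name build_symbol_dictionary and the sample glyphs are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]  # one string per bitmap row, '#' = ink, '.' = blank


def build_symbol_dictionary(glyphs: List[Glyph]) -> Tuple[List[Glyph], List[int]]:
    """Store each distinct bitmap once; replace every occurrence by its index."""
    dictionary: List[Glyph] = []
    index_of: Dict[Glyph, int] = {}
    references: List[int] = []
    for glyph in glyphs:
        if glyph not in index_of:
            index_of[glyph] = len(dictionary)
            dictionary.append(glyph)
        references.append(index_of[glyph])
    return dictionary, references


# Two made-up 3x3 glyphs and a "page" that repeats them.
A = ("###", "#.#", "###")
B = ("#..", "#..", "###")
page = [A, B, A, A, B]

dictionary, references = build_symbol_dictionary(page)
print(len(dictionary), references)  # 2 [0, 1, 0, 0, 1]
```

Five glyphs on the page cost two stored bitmaps plus five small indices, which is where the savings on text-heavy scans come from.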

Other content is compressed with JPEG, JPEG2000 or ZIP codecs. They store the image background, the text color, pictures and photos. This approach reduces the size of each page by half or more. ABBYY gives visual examples of PDF compression in its blog on Habr.

Advantages of PDF


One of the main advantages of PDF is that every page looks exactly the way the author intended. The format preserves the original background, fonts and images regardless of the device or operating system. In addition, PDF supports interactive elements, such as hyperlinks for jumping to footnotes. Media files can also be embedded in a document: music, GIFs, and even video clips.

A PDF file can also be made read-only, which helps protect the document's contents from being copied or modified. For additional protection, a password or an electronic signature can be added.

Another advantage of the format is its availability. Adobe Acrobat Reader, a program for reading PDF documents, is now freely available online. PDFs can also be opened on e-readers, most of which support the format out of the box. Many reading apps support it as well, for example FBReader or NEO Reader.

Disadvantages of the format


The fixed layout of PDF, although an advantage, also turns out to be a major drawback. Such files (especially large diagrams and charts, notes, and large-format documents) are hard to read on devices with small screens, such as smartphones or compact e-readers. The page simply does not fit on the screen, or the text is displayed too small.

There are e-readers on the market with 13.3-inch or 10.3-inch displays, which make it comfortable to work with A4 PDF pages. Examples include the ONYX BOOX MAX 2 (which we reviewed in our blog), the ONYX BOOX Note and the ONYX BOOX Gulliver (also reviewed). They let you examine every detail of drawings and illustrations at their original size and suit those who often read technical literature. However, such devices are quite expensive.
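
A rough calculation shows why the diagonal matters. The Python sketch below compares the screen diagonal with the diagonal of an A4 sheet (210 x 297 mm) and estimates how much a full page shrinks when fitted to the screen. It assumes the screen has roughly the same proportions as the page, which is only an approximation; the 6-inch value stands in for a compact reader and is an assumption, as is the 10 pt body text.

```python
import math

A4_WIDTH_MM, A4_HEIGHT_MM = 210.0, 297.0
MM_PER_INCH = 25.4


def a4_scale(screen_diagonal_in: float) -> float:
    """Approximate scale of a full A4 page fitted to a screen of this diagonal."""
    a4_diagonal_in = math.hypot(A4_WIDTH_MM, A4_HEIGHT_MM) / MM_PER_INCH
    # Never report more than 100%: a larger screen shows the page at full size.
    return min(1.0, screen_diagonal_in / a4_diagonal_in)


for diagonal in (6.0, 10.3, 13.3):
    scale = a4_scale(diagonal)
    print(f'{diagonal}" screen: page at {scale:.0%}, 10 pt text looks like {10 * scale:.1f} pt')
```

Under these assumptions a 13.3-inch screen shows an A4 page at a little over 90% of its size, while a 6-inch screen shrinks it to roughly 40%.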

There is another problem with how document pages are displayed, and it is tied to the JBIG2 format. Although the codec can shrink text several times over, it suffers from the “yin” problem (we wrote about it in the DjVu article). When the text is compressed and the dictionary is built, some characters are replaced with similar-looking ones (for example, the Cyrillic letter “и” turns into “н”), which distorts the meaning of the text.
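
The substitution effect can be reproduced with a toy lossy matcher. The Python sketch below is not JBIG2 itself: it assumes glyphs are equal-sized bitmaps and treats two glyphs as the same symbol when they differ in at most max_diff pixels (a hypothetical threshold). The crude 5x5 bitmaps of “и” and “н” are also made up for illustration.

```python
from typing import List, Tuple

Glyph = Tuple[str, ...]  # one string per bitmap row, '#' = ink, '.' = blank


def pixel_difference(a: Glyph, b: Glyph) -> int:
    """Count the positions where two equal-sized bitmaps differ."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def lossy_encode(glyphs: List[Glyph], max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Reuse the first dictionary entry that is 'close enough' to each glyph."""
    dictionary: List[Glyph] = []
    references: List[int] = []
    for glyph in glyphs:
        for index, known in enumerate(dictionary):
            if pixel_difference(glyph, known) <= max_diff:
                references.append(index)
                break
        else:
            # No sufficiently similar entry: store the glyph as a new symbol.
            dictionary.append(glyph)
            references.append(len(dictionary) - 1)
    return dictionary, references


I_GLYPH = ("#...#", "#..##", "#.#.#", "##..#", "#...#")  # rough "и"
N_GLYPH = ("#...#", "#...#", "#####", "#...#", "#...#")  # rough "н"
page = [I_GLYPH, N_GLYPH, I_GLYPH]

print(lossy_encode(page, max_diff=0)[1])  # [0, 1, 0]: both letters survive
print(lossy_encode(page, max_diff=4)[1])  # [0, 0, 0]: "н" is drawn as "и"
```

The two bitmaps differ in only four pixels, so a threshold of four merges them, and the decoded page shows the wrong letter without any visible compression artifact.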

Editing PDF files is also a problem, since it requires special software that is often paid (for example, Acrobat DC). There are free online editing services such as PDF2GO, but they only let you add text or images “on top” of the original file.

Further development


Despite its shortcomings, PDF remains a popular format today. The marketing company HubSpot asked three thousand visitors to its site how they prefer to consume e-books: read them online or download them as PDF. It turned out that 90% of respondents prefer to download a PDF file.

Developers keep adding new features, including ones for reading on portable devices. For example, in early 2018 the Adobe team gave Acrobat DC improved viewing and editing features for mobile devices.

In addition, in August news appeared about a new project, PDF audible. It is meant to combine the capabilities of PDF with the functionality of voice assistants: Alexa, Google Home and Siri. So far only a prototype is ready, but the developers promise to release a working version in the near future.

Adobe is following new trends and intends to make the format more interactive, for example by adding augmented reality features. What this will look like is not yet clear, but the developers promise that the PDF ecosystem will reach a new level of user interaction in the coming years.




Source: https://habr.com/ru/post/435308/

