
Habrahabr PDF for e-book

While hanging out on Habr (and elsewhere), I have often caught myself thinking that articles are much easier to take in on a phone or tablet, when you can read in a comfortable position or away from home: in transit, on business trips, and so on. This write-up of my tinkering with converting Habrahabr pages to PDF for comfortable offline reading on an e-book reader is more of a curious experiment, one that brings together several familiar technologies: PHP, cURL, AJAX, JS, CSS.

So, let's take it in order.

The idea of buying an e-ink reader had been ripening for a long time. The reasons are well known: less eye strain and longer battery life on a single charge. Not being a fan of all-in-one devices, I settled on the Amazon Kindle 6, which, thanks to one popular classifieds board, I got for almost the eBay price, but right away and with the chance to haggle and handle it first. There are plenty of reviews of this model online, but the main point is the manufacturer's conservatism. It is an e-book reader through and through: no MP3 player, apps, games or other extras. Just the popular e-book formats and a simple browser, nothing more. I'll note that my initial bet on the built-in browser was clearly too optimistic. Habr opened in a very small font, and the Habr heading color #b5b5b5 rendered with very low contrast. All in all, reading straight from the browser was a bit of a strain.


Moreover, unlike downloaded books, where one tap turns the page, the browser had to be scrolled the usual "phone" way, by swiping a finger from bottom to top. You don't think twice about such gestures on a phone's tempered glass, but I couldn't bring myself to rub the delicate, vulnerable, slightly rough e-ink display for several hours a day. On top of that, scrolling was blind: the picture was redrawn only when the finger was lifted, and the eye then had to hunt for the place where it had left off.

Then came the idea of writing a kind of "proxy" for convenient surfing and for exporting the most interesting pages to PDF. Fairly quickly I wrote a PHP layer that takes the URL of the target site in a GET parameter and requests that resource via cURL.

First code outline
 $q = trim($_GET['q'] ?? ''); // target URL passed in the GET parameter
 // get_magic_quotes_gpc() was removed in PHP 8, so validate the URL instead of juggling slashes
 if (!filter_var($q, FILTER_VALIDATE_URL)) {
     exit('Bad URL');
 }
 if ($ch = curl_init()) {
     // configure cURL
     curl_setopt($ch, CURLOPT_HEADER, false);
     curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
     curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1);
     curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
     curl_setopt($ch, CURLOPT_URL, $q);
     $p = curl_exec($ch);
     // send the fetched page to the browser
     header("Content-Type: text/html; charset=utf-8");
     echo $p;
 } else {
     echo 'cURL init failed!';
 }


This simple solution was successfully tested on a PC and on the reader. Of course, all the CSS styles and JS scripts were "lost", because this bare-bones code did not account for relative links in the HTML of the page being fetched.

To correct this annoying oversight, I had to replace relative links with absolute ones:
 $p = str_replace('href=\'/', 'href=\'http://m.habrahabr.ru/', $p);
 $p = str_replace('href="/', 'href="http://m.habrahabr.ru/', $p);
 $p = str_replace('src=\'/', 'src=\'http://m.habrahabr.ru/', $p);
 $p = str_replace('src="/', 'src="http://m.habrahabr.ru/', $p);

Note the paired replacements for single and double quotes: the markup of Habr's pages contained both variants.
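The four replacements can also be folded into a single pattern. The following is only a minimal sketch, not the code used in the article: the helper name `absolutize_links` is hypothetical, and the base host is the same one as above. It handles `href` and `src` with either quote style and leaves protocol-relative links (`//host/...`) alone, which the plain `str_replace` version would turn into `http://m.habrahabr.ru//host/...`.

```php
<?php
// Hypothetical helper (an assumption, not from the article): rewrites
// root-relative href/src attribute values to absolute URLs on $base.
function absolutize_links(string $html, string $base): string
{
    $base = rtrim($base, '/');
    // Match href= or src=, an opening quote, then one "/" that is not
    // followed by a second "/" (so "//cdn.example/..." is left untouched).
    $result = preg_replace_callback(
        '~\b(href|src)=(["\'])/(?!/)~i',
        static fn(array $m): string => $m[1] . '=' . $m[2] . $base . '/',
        $html
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$sample = '<a href="/post/1/">x</a><img src=\'/i.png\'>'
        . '<script src="//cdn.example/a.js"></script>';
echo absolutize_links($sample, 'http://m.habrahabr.ru'), "\n";
```

Links that are relative to the current path (`href="page.html"`) are still not handled; for those a real HTML parser would be the safer choice.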

I didn't want to stop there, and I began thinking about how to improve the idea.

The following improvements suggested themselves to make web surfing more usable:
  1. Scale (comfortable reading required doubling the font size);
  2. Color (because of how e-ink renders color, all the "fashionable" colors had to be given more contrast);
  3. Convenient page scrolling.

I decided to make all these corrections by string replacement, similar to the link substitution above.

Implementing CSS Add-ons
 // inject extra styles right before the closing head tag: double the scale, force black text
 $p = str_replace('</head>', '<style>body { zoom:2; } * {color: #000 !important; }</style></head>', $p);


The design itch would not go away. I decided to begin the convenient-scrolling experiments with the simplest possible button, pinned firmly to the lower right corner of the screen. The plan was that a click would call a trivial function that uses scrollBy to move the page down by the desired amount. I won't reproduce the intermediate code I tested. Alas, all those examples, which worked brilliantly on a regular PC, failed in the reader's browser. Every cross-browser technique I knew for merely reading the current scroll position returned undefined or 0 on the device. Worse, even the button itself, explicitly given fixed positioning and anchored to the bottom of the screen, scrolled along with the page on the reader. After wasting half an hour (and an amount of nerves proportional to 0.5 × my hourly rate), I decided to go a different way, which turned out to be quite workable and convenient.

Namely, I wanted a convenient export to PDF, so as not to be distracted by the quirks of the reader's browser and to read in peace, since reading from a file is much more comfortable. Besides, offline reading matters where there is no Wi-Fi.

After surveying the available services for quickly exporting web pages to PDF (selectpdf.com and web2pdfconvert.com), I reassigned my fixed button to this quick export.

Introduction of the button for quick download of the page in PDF
 // styles: scale, contrast, and the fixed button
 $p = str_replace('</head>', '<style>body { zoom: 2; } * {color: #000 !important; } .fixed-buttons {position: fixed; right: 0; bottom: 0; margin-top: 0; background-color:gray; width:40px; height:40px;} </style></head>', $p);
 // the button itself (just before the closing BODY tag)
 $p = str_replace('</body>', '<a class="fixed-buttons" href="http://selectpdf.com/save-as-pdf">PDF</a></body>', $p);


Yes, this solution worked. From the Referer header, selectpdf.com determines which page the user came from, converts that page and returns it with the appropriate MIME header. For the user this means they need not even suspect that selectpdf.com exists: they just click a magic button on my site, and the PDF downloads almost instantly. By the way, "my site" is not quite the right expression, since in this case all the content is kindly provided by Habr. However, I don't plan to make the service publicly available, and I believe that fetching pages with cURL for my personal use causes Habr no trouble.

So we seem to be close to our goal: reading the page in a magically downloaded PDF. But the Kindle had another surprise in store! Its browser allows downloads in only a few formats: MOBI, TXT and a couple more. The restriction is hard to understand, because the reader itself handles most formats, including even RTF and Word's DOC(X).

Well, this is where the abnormal programming kicks in. And the excitement.

I remembered a curious document synchronization feature that Amazon built into the device. Each registered user has an account and a personal Kindle email address; documents emailed to it from whitelisted mailboxes magically appear on the reader. A kind of parody of Dropbox and other cloud storage. Still, it would be silly not to use it. Besides, my tests showed that with this "courier delivery" of documents, a PDF (like any other format the Kindle supports) arrives on the reader successfully.

Now for the second half of the "bridge" we are building, which came together in my head as exactly that, a bridge, while I was going over the possible solutions. It is the PdfByEmail feature offered by one of the services I had looked at earlier. The gist: you email the target URL to web2pdfconvert.com with the keyword Convert in the subject line, and in reply (within 1 to 3 minutes, as the developers promise) you receive an email with the PDF attached. Brilliant! Just what we need, because we can arrange for the reply to go straight to the reader, i.e. to its registered email address. All that remains is to add the conversion service's address, no-reply@web2pdfconvert.com, to the whitelist in your Amazon account.

I had to rewrite the function behind that same fixed button at the bottom of the page. It now fires an AJAX request to another page on my site, passing the URL to be converted. Since I couldn't be sure jQuery would be loaded on every site I wander through, and pulling in my own copy would be heavy-handed, I had to recall the old days of doing AJAX in plain JS, without frameworks.



The JS and PHP pair responsible for sending the email to the conversion service
Frontend:

 // AJAX: create a request object (ActiveX fallbacks for old IE, XMLHttpRequest elsewhere)
 function getXmlHttp() {
   var xmlhttp;
   try {
     xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
   } catch (e) {
     try {
       xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
     } catch (E) {
       xmlhttp = false;
     }
   }
   if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
     xmlhttp = new XMLHttpRequest();
   }
   return xmlhttp;
 }

 // AJAX request: ask the backend to send page q for conversion
 function www_to_pdf(q) {
   var req = getXmlHttp();
   req.onreadystatechange = function() {
     if (req.readyState == 4) {
       if (req.status == 200) {
         if (req.responseText == "ok") {
           alert('Sent for conversion!');
         } else alert(req.responseText);
       } else alert(req.status);
     }
   };
   req.open('GET', 'www_to_pdf.php?q=' + encodeURIComponent(q), true);
   req.send(null); // send the request
 }

Backend:

 $q = trim($_GET['q'] ?? ''); // URL of the page to convert
 // compose the e-mail to the conversion service
 $title = "convert";
 $headers = "From: myamazonlogin<myamazonlogin@kindle.com>\r\n"
          . "Reply-To: myamazonlogin<myamazonlogin@kindle.com>\r\n"
          . "Content-type: text/plain; charset=utf-8\r\n";
 // URL-encode $q so that it survives as a single query parameter
 $link = "http://mysite.ru/proxy.php?exp=1&q=" . urlencode($q);
 $ret = mail('submit@web2pdfconvert.com', $title, $link, $headers);
 // mail() returns a boolean
 echo $ret ? 'ok' : 'mail() failed';

I added the exp=1 parameter to the link so that the page is rendered slightly differently for the PDF converter than for my own viewing.

The converter renders pages too small, so for it I set a triple body scale in CSS. Besides, my fixed button should not be visible in the exported document. That is what the exp=1 flag is for.

Of course, I first tested by sending to my own mailbox, and only then ran the whole chain. My biggest fear was that the PDF service's spam filters would reject my email, because it was sent from a domain that doesn't match the kindle.com domain in the From and Reply-To addresses. But everything went fine and worked the first time. In 2-3 minutes, exactly as the web2pdfconvert developers claim, the file lands on the reader with a NEW label.

To summarize: thanks to the homemade proxy, I can now browse my favorite sites on the reader at a comfortable scale and contrast. And when I stumble on an interesting article, one click sends it for conversion, and I know that in about three minutes it will land on my reader as a PDF. Roughly speaking, you can skim the news feed daily, click the articles that interest you, and then read them offline in comfort. By the way, it turned out that web2pdfconvert preserves links. So even when reading a PDF I can still follow a hyperlink from the document, for example to the comments section or the "related articles" below.

By the way, I had to add one more small hack for spoilers in Habr articles. After all, you can't expand them in a PDF...

The problem was solved by a couple more corrective inserts:
 $exp = !empty($_GET['exp']); // export mode: set by the link sent to the converter
 $p = str_replace('</head>', '<style>body { zoom: ' . ($exp ? 3 : 2) . '; } * {color: #000 !important; } .fixed-buttons {position: fixed; right: 0; bottom: 0; margin-top: 0; background-color:gray; width:40px; height:40px;} .spoiler_text {display:block;} </style><script type="text/javascript" src="http://mysite.ru/scripts/js/www_to_pdf.js"></script></head>', $p);
 // force spoilers open, since they cannot be expanded in a PDF
 $p = str_replace('class="spoiler_text"', 'class="spoiler_text" style="display:block; line-height: 120%;"', $p);

Note the $exp parameter above, which switches the scale between conversion and ordinary viewing.
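The article does not show how the button is kept out of the exported copy, so here is one way the exp flag could drive both differences. Treat it as a sketch under assumptions: the helper name `decorate_page`, the `$targetUrl` argument and the `onclick` wiring to `www_to_pdf()` are all hypothetical, and the button's own CSS is left out for brevity.

```php
<?php
// Hypothetical helper (an assumption, not the article's code): injects the
// view-mode or export-mode tweaks into a fetched page.
function decorate_page(string $html, string $targetUrl, bool $exp): string
{
    $zoom  = $exp ? 3 : 2; // the converter renders smaller, so it gets the larger scale
    $style = '<style>body { zoom: ' . $zoom . '; } * { color: #000 !important; } '
           . '.spoiler_text { display: block; }</style>';
    $html  = str_replace('</head>', $style . '</head>', $html);

    if ($exp) {
        return $html; // export copy: no button, so none ends up in the PDF
    }

    // json_encode() produces a JS string literal; htmlspecialchars() makes it
    // safe to place inside a double-quoted HTML attribute.
    $arg    = htmlspecialchars(json_encode($targetUrl), ENT_QUOTES);
    $button = '<a class="fixed-buttons" href="#" onclick="www_to_pdf(' . $arg
            . '); return false;">PDF</a>';
    return str_replace('</body>', $button . '</body>', $html);
}

$page = '<html><head></head><body><p>text</p></body></html>';
echo decorate_page($page, 'http://m.habrahabr.ru/post/257539/', false), "\n";
echo decorate_page($page, 'http://m.habrahabr.ru/post/257539/', true), "\n";
```

Keeping the branch in one function means the view copy and the export copy cannot drift apart: everything except the scale and the button is shared.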

Finita la commedia! Now reading favorite articles, even through a homemade crutch (so what if it's a crutch), is even more of a pleasure. Of course, there is still plenty to improve. For example, the files the converter sends are all named mysite-ru, regardless of the rest of the link. I think this could be solved by creating subdomains, so that the names at least differ, say supernovost-mysite-ru. I'll have to read up on domain name length limits. Also, the converter sometimes garbles the headings (see below). But these are trifles.
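The subdomain idea could start from something like the sketch below. It is only an illustration under assumptions: the helper name `subdomain_label` is hypothetical, the label is derived from the URL path, and the length cap relies on the DNS rule that a single label is at most 63 characters. Whether the converter would actually name files after the subdomain has not been tested here.

```php
<?php
// Hypothetical helper (an assumption, not from the article): turns the path
// of an article URL into a string usable as one DNS label.
function subdomain_label(string $url, int $maxLen = 63): string
{
    // parse_url() may return null or false; the cast turns both into ''.
    $path  = (string) parse_url($url, PHP_URL_PATH);
    $label = strtolower(trim($path, '/'));
    // Keep letters and digits, collapse everything else into single hyphens.
    $label = preg_replace('~[^a-z0-9]+~', '-', $label) ?? '';
    // Cut to the label limit, then drop any leading or trailing hyphen.
    $label = trim(substr($label, 0, $maxLen), '-');
    return $label !== '' ? $label : 'page';
}

echo subdomain_label('https://habr.com/ru/post/257539/'), "\n";
```

The proxy would then also need a wildcard DNS record and virtual host so that every such subdomain reaches the same script.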

Source: https://habr.com/ru/post/257539/

