
An application for reading Habr offline

Good day!

I have already written about the Data Extracting SDK, which attracted some interest from users. Since then, many tasty features have been added, and I would like to say a little about them. As an example, I implemented a small application, HabraPDFReader, which saves Habr topics as PDF files for offline reading. Along the way, I will cover the finer points of the implementation.

Algorithm


The algorithm is very simple:
1. Choose the number of pages to process.
2. Download the list of the latest topics.
3. Mark the topics you like.
4. Save them to PDF files.
Now in more detail.

Creating the application


This will be a Windows application written in C# for .NET Framework 3.5 in VS2008 SP1.

Its appearance is shown in the picture:

image

We get a list of the latest topics (from the first page):

// Load and parse the first page of new topics.
UriHtmlProcessor proc = new UriHtmlProcessor(new Uri("http://habrahabr.ru/new/page1/"));
proc.Initialize();

// Keep only links to topics: class "topic" and an address ending in a number.
var links = from l in proc.Links
            where l.Class == "topic" && EndsWithInt(l.Href)
            select new ResultItem
            {
                Link = l.Href,
                TopicName = l.Text.ToWindows1251()
            };



The Data Extracting SDK has learned to query the DOM tree, so we can use the full power of LINQ to get the links we need. In our case, we take all links whose class equals "topic" and whose address ends in a number (this filters out links such as #habracut and #comment).
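
The query relies on an EndsWithInt helper whose body is not shown, and it reads only the first listing page. As a rough sketch of how those two pieces might look, the class below is purely illustrative: TopicListHelpers and ListingPageUris are hypothetical names, the trailing-slash handling is a guess at how topic addresses are shaped, and the pageN address pattern is extrapolated from the single URL in the snippet.

```csharp
using System;
using System.Collections.Generic;

public static class TopicListHelpers
{
    // Hypothetical stand-in for the EndsWithInt helper used in the query;
    // the article does not show its body.
    public static bool EndsWithInt(string href)
    {
        if (string.IsNullOrEmpty(href))
        {
            return false;
        }

        // Assumption: a trailing slash is ignored, so ".../79220/" still counts.
        string trimmed = href.TrimEnd('/');
        string lastSegment = trimmed.Substring(trimmed.LastIndexOf('/') + 1);
        if (lastSegment.Length == 0)
        {
            return false;
        }

        // Accept only plain decimal digits; "#habracut" or "79220#comment" fail here.
        foreach (char c in lastSegment)
        {
            if (c < '0' || c > '9')
            {
                return false;
            }
        }
        return true;
    }

    // Assumption: listing pages are numbered from 1 and follow the
    // http://habrahabr.ru/new/pageN/ pattern seen in the snippet.
    public static IEnumerable<Uri> ListingPageUris(int pageCount)
    {
        for (int page = 1; page <= pageCount; page++)
        {
            yield return new Uri("http://habrahabr.ru/new/page" + page + "/");
        }
    }
}
```

Each address from ListingPageUris could be passed to its own UriHtmlProcessor in turn, with the same query applied to every page.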

Converting the linked page to a PDF file


This could easily be implemented with VSTO, but I did not want to depend on MS Office. Instead, I found the Html2Pdf service, which we will use to produce our PDF files.

The code is as follows:

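// topicUrl is a placeholder: the original snippet elides the value appended after "url="
// (from context, the address of the topic to convert).
UriHtmlProcessor proc = new UriHtmlProcessor(new Uri("http://www.htm2pdf.co.uk/?url=" + topicUrl));
proc.Initialize();
// Find the "Download PDF" link on the result page.
var pdf = proc.Links.Where(ll => ll.Text == "Download PDF").FirstOrDefault();

if (pdf != null)
{
    // Resolve the link against the service address.
    Uri uri = pdf.Href.FixUrl(new Uri("http://www.htm2pdf.co.uk/"));
    // outputDir and fileName are placeholders for the Path.Combine arguments elided in the original.
    proc.Download(uri, Path.Combine(outputDir, fileName + "_.pdf"));
}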
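
Two details are easy to trip over when wiring this up: the topic address has to be escaped before it is placed in the query string, and a topic title may contain characters that are not allowed in file names. The sketch below is one possible way to handle both using only the .NET base class library. PdfNaming, BuildRequestUri and ToSafeFileName are hypothetical names, and the "url" query parameter is assumed from the snippet above rather than from any documentation of the service.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfNaming
{
    // Builds the conversion request address. Assumption: the service reads the
    // page address from the "url" query parameter, as the snippet suggests.
    public static Uri BuildRequestUri(string topicUrl)
    {
        // EscapeDataString keeps "?", "&" and "#" in the topic address from
        // being read as part of the outer query string.
        return new Uri("http://www.htm2pdf.co.uk/?url=" + Uri.EscapeDataString(topicUrl));
    }

    // Replaces every character that the current platform forbids in file names
    // with an underscore. The result has no extension.
    public static string ToSafeFileName(string topicName)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        StringBuilder safe = new StringBuilder(topicName.Length);
        foreach (char c in topicName)
        {
            safe.Append(Array.IndexOf(invalid, c) >= 0 ? '_' : c);
        }
        return safe.ToString().Trim();
    }
}
```

The string returned by ToSafeFileName can serve as the file-name part of the Path.Combine call, with the PDF suffix appended as in the snippet.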




This way, we get a set of PDF files for reading Habr offline.

Screenshot after program execution:

image

SDK project address: http://extracting.codeplex.com/ (the old version is still there)

Download HabraPDFReader

I look forward to your comments and suggestions!

Source: https://habr.com/ru/post/79220/

