Making a PDF book from a webcomic in C#, using xkcd as an example

Reading the latest xkcd strip, I glanced at my newly acquired Sony PRS-650 e-reader and immediately thought: I want to read comics on it! xkcd is black and white, and the strips are usually small. A little googling turned up only a collection of pictures on TPB and a bash script that is supposed to produce a PDF. So I decided to do a bit of programming and write a comic grabber in my favorite language, C#.

A console application would have been enough, but for clarity I built a simple WPF interface.

A complete code walkthrough would be redundant, so I will explain only the main points. I recommend opening or downloading the full application code from Google Code right away.

1. Get pictures, titles and alt-text from the site

On xkcd, comics are conveniently located at addresses like xkcd.com/n, where n = 1 ...
My first thought was to scrape what I needed from the page markup, but it turned out that all the information is available as JSON at xkcd.com/{0}/info.0.json, where {0} is the comic number.

For JSON, .NET provides DataContractJsonSerializer.
Create the appropriate DataContract:

    [DataContract]
    public class XkcdComic
    {
        #region Public properties and indexers

        [DataMember] public string img { get; set; }
        [DataMember] public string title { get; set; }
        [DataMember] public string month { get; set; }
        [DataMember] public string num { get; set; }
        [DataMember] public string link { get; set; }
        [DataMember] public string year { get; set; }
        [DataMember] public string news { get; set; }
        [DataMember] public string safe_title { get; set; }
        [DataMember] public string transcript { get; set; }
        [DataMember] public string day { get; set; }
        [DataMember] public string alt { get; set; }

        #endregion
    }

... and use:

    private static XkcdComic GetComic(string url)
    {
        var stream = new WebClient().OpenRead(url);
        if (stream == null) return null;
        var serializer = new DataContractJsonSerializer(typeof (XkcdComic));
        return serializer.ReadObject(stream) as XkcdComic;
    }
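To try the deserialization step without touching the network, the same serializer can be fed a JSON string from memory. The following is a minimal sketch, not the application's code: the names ComicInfoSketch and JsonSketch are hypothetical, the contract is cut down to three fields, and num is declared as int here as an assumption (the article's contract uses string). JSON members that are missing from the contract are simply skipped by the serializer.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical, reduced contract used only for this sketch.
[DataContract]
public class ComicInfoSketch
{
    [DataMember] public int num { get; set; }      // assumption: numeric in the JSON
    [DataMember] public string title { get; set; }
    [DataMember] public string alt { get; set; }
}

public static class JsonSketch
{
    // Deserializes a JSON string by wrapping it in a MemoryStream,
    // because DataContractJsonSerializer reads from streams.
    public static ComicInfoSketch Parse(string json)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            var serializer = new DataContractJsonSerializer(typeof(ComicInfoSketch));
            return (ComicInfoSketch)serializer.ReadObject(stream);
        }
    }
}
```

For example, JsonSketch.Parse("{\"num\": 7, \"title\": \"Sample\", \"alt\": \"Sample alt\"}") yields an object whose num is 7.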

The latest comic is available at xkcd.com/info.0.json, and its num field gives the total number of comics.
All that remains is to download the picture itself, which is simple:

 var imageBytes = WebRequest.Create(comicInfo.img).GetResponse().GetResponseStream().ToBytes();

... where comicInfo is our data from JSON, and ToBytes() is a simple extension method that reads data from a stream into an array.
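The article does not show ToBytes() itself. One possible shape for such a helper is sketched below, assuming .NET 4 or later (for Stream.CopyTo); the class name StreamExtensions is hypothetical and the author's actual implementation may differ.

```csharp
using System.IO;

// Hypothetical container class; only the method name comes from the article.
public static class StreamExtensions
{
    // Reads the stream from its current position to the end
    // and returns exactly the bytes that were read.
    public static byte[] ToBytes(this Stream stream)
    {
        using (var memory = new MemoryStream())
        {
            stream.CopyTo(memory);
            return memory.ToArray(); // copies only the written bytes
        }
    }
}
```

Copying through a MemoryStream avoids having to know the length up front, which network response streams often do not report.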

The Comic class represents a single comic strip. To validate the received image bytes (we could have downloaded the wrong thing, the server could have returned an error, etc.), the class constructor is private and a factory method, Create, returns null if decoding fails. Decoding is done with BitmapImage, which, if successful, also serves as a thumbnail for previewing the result:

    public static Comic Create(byte[] imageBytes)
    {
        try
        {
            // Validate image bytes by trying to create a Thumbnail.
            return new Comic {ImageBytes = imageBytes};
        }
        catch
        {
            // Failure, cannot decode bytes
            return null;
        }
    }

    public byte[] ImageBytes
    {
        get { return _imageBytes; }
        private set
        {
            _imageBytes = value;
            var bmp = new BitmapImage();
            bmp.BeginInit();
            bmp.DecodePixelHeight = 100; // Do not store whole picture
            bmp.StreamSource = new MemoryStream(_imageBytes);
            bmp.EndInit();
            bmp.Freeze();
            Thumbnail = bmp;
        }
    }

Putting it all together, we get a method for downloading a comic strip by its number:

    protected override Comic GetComicByIndex(int index)
    {
        // Download comic JSON
        var comicInfo = GetComic(string.Format(UrlFormatString, index + 1));
        if (comicInfo == null) return null;

        // Download picture
        var imageStream = WebRequest.Create(comicInfo.img).GetResponse().GetResponseStream().ToMemoryStream();
        // ToArray() returns only the bytes written; GetBuffer() may include unused capacity
        var comic = Comic.Create(imageStream.ToArray());
        if (comic == null) return null;

        comic.Description = comicInfo.alt;
        comic.Url = comicInfo.link;
        comic.Index = index + 1;
        comic.Title = comicInfo.title;

        // Auto-rotate for best fit
        var t = comic.Thumbnail;
        if (t.Width > t.Height)
        {
            comic.RotationDegrees = 90;
        }
        return comic;
    }

We now have the number of comics and a method to get a strip by its index.

Parallel downloads

I will use the Task Parallel Library, since I had wanted to try it for a long time but never had a reason. At first glance everything is simple: instead of calling GetComicByIndex(i) directly, we wrap the call in var task = Task.Factory.StartNew(() => GetComicByIndex(i)). We collect all the started tasks in a tasks array and call Task.WaitAll(tasks), after which we read each result from task.Result. But this approach does not let us track progress or show the user the strips that have already been downloaded. To solve this, we use WaitAny and yield return to return the result of each task as soon as it completes:

    public IEnumerable<Comic> GetComics()
    {
        var count = GetCount();
        var tasks = Enumerable.Range(0, count).Select(GetTask).ToList();
        while (tasks.Count > 0) // Iterate until all tasks complete
        {
            var task = tasks.WaitAnyAndPop();
            if (task.Result != null) yield return task.Result;
        }
    }

Here the GetTask method returns a task for GetComicByIndex(i), plus error handling and caching (which are beyond the scope of this article). WaitAnyAndPop is an extension method that waits for one of the tasks to complete, removes it from the list and returns it:

    public static Task<T> WaitAnyAndPop<T>(this List<Task<T>> taskList)
    {
        var array = taskList.ToArray();
        var task = array[Task.WaitAny(array)];
        taskList.Remove(task);
        return task;
    }
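The WaitAny plus yield return pattern can be tried in isolation with simulated work instead of downloads. The sketch below is an illustration under assumptions: the class CompletionOrderSketch and the method InCompletionOrder are hypothetical names, and Thread.Sleep stands in for the network request. The helper repeats the idea of WaitAnyAndPop so that the block is self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CompletionOrderSketch
{
    // Waits for any task to finish, removes it from the list and returns it.
    private static Task<T> WaitAnyAndPop<T>(List<Task<T>> tasks)
    {
        var array = tasks.ToArray();
        var finished = array[Task.WaitAny(array)];
        tasks.Remove(finished);
        return finished;
    }

    // Starts one task per delay and yields each result as soon as its task is done.
    public static IEnumerable<int> InCompletionOrder(IEnumerable<int> delaysMs)
    {
        var tasks = delaysMs
            .Select(d => Task.Factory.StartNew(() =>
            {
                Thread.Sleep(d); // simulated download
                return d;
            }))
            .ToList();

        while (tasks.Count > 0)
            yield return WaitAnyAndPop(tasks).Result;
    }
}
```

Every started task is yielded exactly once, but the order depends on scheduling, so callers should not rely on it. Each call copies the remaining list, so the total cost grows quadratically with the number of tasks; for a few hundred or a few thousand strips that is unlikely to matter next to the network time.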

Now, in the ViewModel code, we can iterate over the result of the GetComics method on a background thread and show the user the strips as they arrive. (I do not cover architecture in this article, but MVVM (Model-View-ViewModel) is the de facto standard for WPF applications, and the code for downloading, exporting and so on is, of course, split into corresponding classes.)

    private Dispatcher _dispatcher; // not readonly: it is assigned in StartGrabbing
    private readonly ObservableCollection<Comic> _comics = new ObservableCollection<Comic>();

    private void StartGrabbing()
    {
        // ObservableCollection modifications should be performed on the UI thread,
        // so remember the UI dispatcher before moving to a background thread
        _dispatcher = Dispatcher.CurrentDispatcher;
        ThreadPool.QueueUserWorkItem(o => DoGrabbing());
    }

    private void DoGrabbing()
    {
        var grabber = new XkcdGrabber();
        foreach (var comic in grabber.GetComics())
        {
            var c = comic;
            _dispatcher.Invoke((Action) (() => Comics.Add(c)), DispatcherPriority.ApplicationIdle);
        }
    }

2. Display comics in WPF

In XAML, all we have to do is bind to our ObservableCollection and prepare a DataTemplate that shows the loading progress and the comics themselves, with the alt text in a ToolTip:

    <ListView ItemsSource="{Binding Comics}"
              ScrollViewer.VerticalScrollBarVisibility="Disabled"
              x:Name="list"
              Margin="5,0,5,0"
              ScrollViewer.HorizontalScrollBarVisibility="Visible"
              Grid.Row="1">
        <ItemsControl.ItemTemplate>
            <DataTemplate>
                <Border BorderBrush="Gray" CornerRadius="5" Padding="5" Margin="5" BorderThickness="1">
                    <StackPanel Orientation="Vertical">
                        <StackPanel Orientation="Horizontal">
                            <TextBlock Text="{Binding Index}" FontWeight="Bold" />
                            <TextBlock Text="{Binding Title}" FontWeight="Bold" Margin="10,0,0,0" />
                        </StackPanel>
                        <Image Source="{Binding Thumbnail}"
                               ToolTip="{Binding Description}"
                               Height="{Binding Thumbnail.PixelHeight}"
                               Width="{Binding Thumbnail.PixelWidth}" />
                    </StackPanel>
                </Border>
            </DataTemplate>
        </ItemsControl.ItemTemplate>
        <ItemsControl.ItemsPanel>
            <ItemsPanelTemplate>
                <StackPanel Orientation="Horizontal" />
            </ItemsPanelTemplate>
        </ItemsControl.ItemsPanel>
    </ListView>

3. Create a PDF book

PDF was chosen because of its popularity and good support on Sony e-readers. For working with PDF in .NET there is a convenient open-source library, iTextSharp (you will need to download it separately to build the project). Everything is fairly straightforward. Omitting exception handling and the adjustment of image size and fonts, we get the following:

    var document = new Document(PageSize.LETTER);
    var wri = PdfWriter.GetInstance(document, new FileStream(fileName, FileMode.Create));
    document.Open();
    foreach (var comic in comics.OrderBy(c => c.Index).ToList())
    {
        var image = Image.GetInstance(new MemoryStream(comic.ImageBytes));
        var title = new Paragraph(comic.Index + ". " + comic.Title, titleFont);
        title.SetAlignment("Center");
        document.Add(title);
        document.Add(image);
        document.Add(new Phrase(comic.Description, altFont));
        document.Add(Chunk.NEXTPAGE);
    }
    document.Close();
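The omitted image size adjustment comes down to picking a scale factor that makes the picture fit the printable area. The sketch below shows only that arithmetic and does not use iTextSharp; the names FitSketch and FitScale are hypothetical, and the choice never to enlarge small images is an assumption rather than something the article states.

```csharp
using System;

public static class FitSketch
{
    // Returns the factor by which an image must be scaled to fit inside a box
    // while keeping its aspect ratio. Assumption: small images are not enlarged.
    public static float FitScale(float imageWidth, float imageHeight,
                                 float boxWidth, float boxHeight)
    {
        var scale = Math.Min(boxWidth / imageWidth, boxHeight / imageHeight);
        return Math.Min(scale, 1f);
    }
}
```

For a box of 612 by 792 units, a 1224 by 792 image gets a factor of 0.5, while a 100 by 100 image keeps a factor of 1.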

Results

The result is an application that, besides exporting to PDF, is also quite convenient for viewing comics:

Webcomic Grabber Screenshot

How the result looks on the e-reader can be seen in the first picture of the article.

What remains beyond the scope of the article

Caching the downloaded data between application launches (done using IsolatedStorage).
Support for other webcomics. For this purpose I extracted an IGrabber interface in advance and moved part of the functionality into TaskParallelGrabber. While writing the article, I added grabbers for WhatTheDuck and Cyanide & Happiness.

Links

Application code (C#): Google Code
Working with PDF on .NET: iTextSharp
Comics: xkcd

UPD:
Thanks to XHunter for uploading the resulting PDF and the compiled program!

UPD2:
I will just leave here a link to a good "response" article that covers in detail the topic of downloading comics using WCF: http://darren-brown.com/?p=37

Source: https://habr.com/ru/post/112965/
