Printing PDF under .NET: the raster approach

In this article I want to talk about printing PDF files under .NET, with an emphasis on printing from different printer trays. This is useful when a document has to be printed on several types of paper that are loaded into different trays.

Out of the box, .NET provides no tools for working with PDF files. There are paid libraries, of course, but even they do not always solve the task at hand.

To print a document "as is", you send the file to the printer as raw data using winspool.drv. This is described in detail on the support site. Keep in mind that with this approach the printer itself must be able to handle the PDF format. If it cannot, you have to convert the PDF to PostScript and send that to the printer as raw data instead. How to convert PDF to PostScript is easy to find with a web search.

If you need to change printer settings while printing, you have to look for another way. PostScript is a device-independent page description language and offers no standard means for fine-grained control of the hardware, for example for selecting trays.
Another way is to render the PDF to raster images and print those. This method is the most expensive in terms of resources, but it is simple to implement and lets you print arbitrary documents with the standard .NET tools. This is the approach we will use.

.NET provides a convenient PrintDocument class for working with printers, in which the printer canvas is represented by a Graphics object. You can draw a bitmap image of a PDF page onto the canvas with DrawImage. For rendering PDF documents to raster there is an excellent free tool, Ghostscript.

Ghostscript is available as a free download from the official Ghostscript site; download and install the appropriate version. To work with it conveniently from .NET there is a wrapper, Ghostscript.NET, which also has to be downloaded. Unpack the archive next to the project. In the archive we are interested in the Ghostscript.NET.dll assembly, which we reference from the application being developed (it is assumed that the project already exists).

The environment is configured and ready to go. To check it, we will write a very small console application that converts the pages of a document to *.jpg files:

    using System;
    using System.Drawing.Imaging;
    using System.IO;
    using System.Linq;
    using Ghostscript.NET;
    using Ghostscript.NET.Rasterizer;

    namespace GhostScript
    {
        class Program
        {
            private static GhostscriptVersionInfo _lastInstalledVersion = null;
            private const int DPI = 200;

            static void Main(string[] args)
            {
                const string outputPath = @"output\";

                if (!args.Any())
                {
                    // No input file given: print a usage line and exit.
                    Console.WriteLine("{0} [*.pdf]", Path.GetFileName(Environment.GetCommandLineArgs()[0]));
                    return;
                }

                var inputPdfPath = args[0];

                _lastInstalledVersion = GhostscriptVersionInfo.GetLastInstalledVersion(
                    GhostscriptLicense.GPL | GhostscriptLicense.AFPL,
                    GhostscriptLicense.GPL);

                using (var rasterizer = new GhostscriptRasterizer())
                {
                    rasterizer.CustomSwitches.Add("-dNOINTERPOLATE");
                    rasterizer.CustomSwitches.Add("-sPAPERSIZE=a4");
                    rasterizer.TextAlphaBits = 4;
                    rasterizer.GraphicsAlphaBits = 4;
                    rasterizer.Open(inputPdfPath, _lastInstalledVersion, false);

                    // Start from an empty output directory.
                    if (Directory.Exists(outputPath))
                    {
                        Directory.Delete(outputPath, true);
                    }
                    Directory.CreateDirectory(outputPath);

                    // Ghostscript page numbers start at 1.
                    for (var pageNumber = 1; pageNumber <= rasterizer.PageCount; pageNumber++)
                    {
                        var outputFileName = string.Format("Page-{0:0000}.jpg", pageNumber);
                        var outputFilePath = Path.Combine(outputPath, outputFileName);
                        using (var img = rasterizer.GetPage(DPI, DPI, pageNumber))
                        {
                            img.Save(outputFilePath, ImageFormat.Jpeg);
                        }
                    }
                }
            }
        }
    }

Working with Ghostscript.NET is quite simple: we create a GhostscriptRasterizer object, which converts document pages into Image objects. We set the parameters, which I chose so that the result matches Acrobat Reader as closely as possible, open the PDF file, and loop over the pages, getting an Image for each one. Each image is saved as a .jpg file in a directory prepared in advance.

Now let's deal with printing using PrintDocument. This class has a simple and intuitive interface, yet it contains everything needed for flexible work with the printer. Seeing once is better than hearing a hundred times, so let's go to the code.

An example of a function for printing an image from a file:

    static void Print(string file)
    {
        using (var pd = new System.Drawing.Printing.PrintDocument())
        {
            pd.PrinterSettings.Duplex = System.Drawing.Printing.Duplex.Simplex;
            pd.PrintPage += (o, e) =>
            {
                // Dispose the image as soon as the page has been drawn.
                using (var img = System.Drawing.Image.FromFile(file))
                {
                    e.Graphics.DrawImage(img, e.Graphics.VisibleClipBounds);
                }
            };
            pd.Print();
        }
    }

The first thing you might need when working with a PrintDocument is to specify a printer. This is done through the PrinterName property:

    pd.PrinterSettings.PrinterName = "Printer name";

If this is not done, the default printer on the system will be used.

The name of the printer can be obtained from the list of available printers:

    var printers = PrinterSettings.InstalledPrinters;
    pd.PrinterSettings.PrinterName = printers[1];

To print multiple pages, the PrintPage handler should use the HasMorePages flag of the PrintPageEventArgs object. The handler is called again for as long as it sets HasMorePages to true.

For example, HasMorePages can be used like this:

 e.HasMorePages = ++index < pages.Count;
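
How this flag drives the handler can be tried out without a printer. The sketch below is a simplified model, not the framework's actual implementation: FakePageArgs and PrintLoopModel are hypothetical stand-ins for PrintPageEventArgs and the internal print loop, and the model assumes that each call receives fresh arguments with HasMorePages set to false.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for PrintPageEventArgs; only the flag of interest.
class FakePageArgs
{
    public bool HasMorePages; // false unless the handler sets it
}

static class PrintLoopModel
{
    // Simplified model of the print loop: call the handler once per page
    // with fresh args and stop when it leaves HasMorePages false.
    // Returns the number of handler calls.
    public static int Run(Action<FakePageArgs> printPage)
    {
        int calls = 0;
        bool more;
        do
        {
            var e = new FakePageArgs();
            printPage(e);
            calls++;
            more = e.HasMorePages;
        } while (more);
        return calls;
    }
}

class Program
{
    static void Main()
    {
        var pages = new List<string> { "page-1", "page-2", "page-3" };
        int index = 0;

        int calls = PrintLoopModel.Run(e =>
        {
            Console.WriteLine("drawing {0}", pages[index]);
            // Same expression as in the article: advance, then ask for more.
            e.HasMorePages = ++index < pages.Count;
        });

        Console.WriteLine("handler calls: {0}", calls); // handler calls: 3
    }
}
```

Note that the handler always runs at least once, so a document with no pages needs to be handled before printing starts.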

To print double-sided documents, you must specify before printing:

 pd.PrinterSettings.Duplex = Duplex.Vertical;

In this case, every 2 consecutive pages will be interpreted as pages of one sheet.

To disable the print progress window, you need to specify a standard print controller:

 pd.PrintController = new StandardPrintController();

Of practical interest is the ability to specify the paper source (tray). Moreover, the tray can be changed on the fly, so that one page comes from one tray and the next from another. This means you can print documents made up of, for example, different types of paper loaded into different trays.

You can specify the tray for the entire print as follows:

 pd.DefaultPageSettings.PaperSource = pd.PrinterSettings.PaperSources[SourceId];

And for the page like this:

 e.PageSettings.PaperSource = pd.PrinterSettings.PaperSources[SourceId];

In the second case, remember that e.PageSettings.PaperSource only affects the next page. That is, there is always a one-page delay: the first page uses pd.DefaultPageSettings.PaperSource, and every subsequent page uses the e.PageSettings.PaperSource that was set while the page before it was being handled.
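
The one-page delay is easy to get wrong, so here is a minimal sketch of the bookkeeping, kept free of printer calls so that it runs anywhere. TrayPlan and its two methods are hypothetical names introduced for illustration, and the integers stand in for indexes into pd.PrinterSettings.PaperSources.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: given the tray wanted for every page (0-based page
// indexes), decides which tray index has to be applied at which moment.
static class TrayPlan
{
    // Tray for the first page. This value would go into
    // pd.DefaultPageSettings.PaperSource before pd.Print() is called.
    public static int FirstPageTray(IReadOnlyList<int> trayPerPage)
    {
        return trayPerPage[0];
    }

    // Tray to assign to e.PageSettings.PaperSource inside the PrintPage
    // handler while page `currentPage` is being drawn. Because the setting
    // only affects the NEXT page, we look one page ahead.
    // Returns null on the last page, when there is nothing left to set.
    public static int? TrayToSetWhilePrinting(IReadOnlyList<int> trayPerPage, int currentPage)
    {
        int next = currentPage + 1;
        return next < trayPerPage.Count ? trayPerPage[next] : (int?)null;
    }
}

class Program
{
    static void Main()
    {
        // Cover from tray 1, two body pages from tray 0, back cover from tray 1.
        var trayPerPage = new List<int> { 1, 0, 0, 1 };

        Console.WriteLine("before Print(): tray {0}", TrayPlan.FirstPageTray(trayPerPage));
        for (int page = 0; page < trayPerPage.Count; page++)
        {
            int? next = TrayPlan.TrayToSetWhilePrinting(trayPerPage, page);
            Console.WriteLine("while drawing page {0}: {1}",
                page + 1,
                next.HasValue ? "set tray " + next.Value : "nothing to set");
        }
    }
}
```

In a real handler the returned index would be used as pd.PrinterSettings.PaperSources[index], exactly as in the two assignments shown above.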

Now, by combining image generation with printing, you can write a simple program that sends *.pdf files to a printer. I will not give the code, because there would be nothing new in it. Besides, the straightforward solution has a significant drawback: rendering the pages consumes a lot of machine resources, so printing takes painfully long. For example, printing a large 5000-page document will take at least 30 minutes, whereas Acrobat Reader would do it in about 10 to 15 minutes. Since most of the time is spent generating images, that is the part we will optimize. It can be accelerated tenfold or more, depending on the hardware, simply by parallelizing the generation of page images. In effect, this speed-up just trades processor resources for time.

For parallelization we will use a thread pool. Each thread will process its own fragment of the document, consisting of N pages. The result goes into a dictionary whose key is the page number and whose value is a MemoryStream. "Why a Stream and not, say, a Bitmap?" the curious reader will ask. It's simple: in the Stream we keep pages compressed in JPEG format, which saves memory, because 5000 pages is a lot. As soon as all pages have been rendered, we send them to the printer. It is easy to tell when processing is over: the number of pages in the original document must match the number of elements in the dictionary.
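
The page-splitting step can be sketched on its own, without Ghostscript. PageRanges.Split is a hypothetical helper, not part of any library. It divides pages 1..pageCount into at most the requested number of contiguous ranges and spreads the remainder over the first ranges, which is one possible alternative to handling a separate tail fragment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for dividing a document between worker threads.
static class PageRanges
{
    // Splits pages 1..pageCount into at most `chunks` contiguous,
    // non-empty (start, end) ranges, both ends inclusive.
    public static List<Tuple<int, int>> Split(int pageCount, int chunks)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException("pageCount");
        if (chunks < 1) throw new ArgumentOutOfRangeException("chunks");

        var ranges = new List<Tuple<int, int>>();
        int step = pageCount / chunks; // base size of every range
        int tail = pageCount % chunks; // first `tail` ranges get one extra page
        int start = 1;

        for (int i = 0; i < chunks && start <= pageCount; i++)
        {
            int size = step + (i < tail ? 1 : 0);
            ranges.Add(Tuple.Create(start, start + size - 1));
            start += size;
        }
        return ranges;
    }
}

class Program
{
    static void Main()
    {
        foreach (var r in PageRanges.Split(10, 3))
        {
            Console.WriteLine("{0}-{1}", r.Item1, r.Item2);
        }
        // Prints:
        // 1-4
        // 5-7
        // 8-10
    }
}
```

With fewer pages than chunks, the helper simply returns one single-page range per page, so no empty work items are queued.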

In C#, the above can be expressed with the following code.

    using System;
    using System.Collections.Generic;
    using System.Drawing.Imaging;
    using System.Drawing.Printing;
    using System.IO;
    using System.Linq;
    using System.Threading;
    using Ghostscript.NET;
    using Ghostscript.NET.Rasterizer;

    namespace GS_Parallel
    {
        class Program
        {
            // Rendered pages, keyed by page number (starting at 1).
            public static Dictionary<int, MemoryStream> PageStore;

            private const int Dpi = 200;
            private const int Quants = 30;
            private const int MaxThreads = 10;

            static void Main(string[] args)
            {
                PageStore = new Dictionary<int, MemoryStream>();

                if (!args.Any())
                {
                    Console.WriteLine("{0} [*.pdf]", Path.GetFileName(Environment.GetCommandLineArgs()[0]));
                    return;
                }

                var inputPdfPath = args[0];
                ThreadPool.SetMaxThreads(MaxThreads, MaxThreads);
                var mainRasterizer = CreateRasterizer(inputPdfPath);

                // Split the document into Quants fragments plus a tail.
                var step = mainRasterizer.PageCount / Quants;
                var tail = mainRasterizer.PageCount % Quants;
                var shift = 0;
                for (var i = 0; i < Quants; i++)
                {
                    var wi = new WorkInfo() { StartPage = shift + 1, EndPage = shift + step, SourcefilePath = inputPdfPath };
                    ThreadPool.QueueUserWorkItem(PdfProcessing, wi);
                    shift += step;
                }
                if (tail > 0)
                {
                    var wi = new WorkInfo() { StartPage = shift + 1, EndPage = shift + tail, SourcefilePath = inputPdfPath };
                    ThreadPool.QueueUserWorkItem(PdfProcessing, wi);
                }

                Console.WriteLine("Start preparation");
                // Wait until every page has been rendered.
                while (true)
                {
                    int ready;
                    lock (PageStore) { ready = PageStore.Count; }
                    if (ready >= mainRasterizer.PageCount) break;
                    Console.WriteLine("{0:000.0}%", ((double)ready) / mainRasterizer.PageCount * 100);
                    Thread.Sleep(100);
                }

                Console.WriteLine("Start printing");
                PrintPages(PageStore);
            }

            static GhostscriptVersionInfo _lastInstalledVersion =
                GhostscriptVersionInfo.GetLastInstalledVersion(
                    GhostscriptLicense.GPL | GhostscriptLicense.AFPL,
                    GhostscriptLicense.GPL);

            static GhostscriptRasterizer CreateRasterizer(string file)
            {
                var rasterizer = new GhostscriptRasterizer();
                rasterizer.CustomSwitches.Add("-dNOINTERPOLATE");
                rasterizer.CustomSwitches.Add("-dCOLORSCREEN=0");
                rasterizer.CustomSwitches.Add("-sPAPERSIZE=a4");
                rasterizer.TextAlphaBits = 4;
                rasterizer.GraphicsAlphaBits = 4;
                rasterizer.Open(file, _lastInstalledVersion, true);
                return rasterizer;
            }

            static void PdfProcessing(object stateInfo)
            {
                var wi = (WorkInfo)stateInfo;
                // Each work item uses its own rasterizer instance.
                using (var rasterizer = CreateRasterizer(wi.SourcefilePath))
                {
                    for (var pageNumber = wi.StartPage; pageNumber <= wi.EndPage; pageNumber++)
                    {
                        using (var img = rasterizer.GetPage(Dpi, Dpi, pageNumber))
                        {
                            var mem = new MemoryStream();
                            img.Save(mem, ImageFormat.Jpeg);
                            mem.Position = 0; // rewind so the stream can be read when printing
                            lock (PageStore)
                            {
                                PageStore[pageNumber] = mem;
                            }
                        }
                    }
                }
            }

            static void PrintPages(IReadOnlyDictionary<int, MemoryStream> pageStore)
            {
                using (var pd = new PrintDocument())
                {
                    pd.PrinterSettings.Duplex = Duplex.Simplex;
                    pd.PrintController = new StandardPrintController();
                    var index = 0;
                    pd.PrintPage += (o, e) =>
                    {
                        var pageStream = pageStore[index + 1];
                        using (var img = System.Drawing.Image.FromStream(pageStream))
                        {
                            e.Graphics.DrawImage(img, e.Graphics.VisibleClipBounds);
                        }
                        index++;
                        e.HasMorePages = index < pageStore.Count;
                        Console.WriteLine("Print {0} of {1}; complete {2:000.0}%",
                            index, pageStore.Count, ((double)index) / pageStore.Count * 100);
                    };
                    pd.Print();
                }
            }
        }

        class WorkInfo
        {
            public int StartPage;
            public int EndPage;
            public string SourcefilePath;
        }
    }

To save memory when saving Bitmap in a MemoryStream, you can increase the compression ratio.

    // Get and configure a JPEG encoder (needs using System.Linq for FirstOrDefault)
    const long quality = 35L;
    var encoders = ImageCodecInfo.GetImageEncoders();
    var jpgEncoder = encoders.FirstOrDefault(codec => codec.FormatID == ImageFormat.Jpeg.Guid);
    var encoderParams = new EncoderParameters(1);
    encoderParams.Param[0] = new EncoderParameter(Encoder.Quality, quality);

    // Save with the JPEG encoder
    img.Save(mem, jpgEncoder, encoderParams);

However, an increase in the compression ratio will slow down processing and degrade the quality of printed documents.
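
To put the memory argument in numbers, the sketch below estimates how much space the pages would take if they were kept uncompressed. It is a rough calculation under stated assumptions: A4 pages (210 x 297 mm), the 200 DPI used in the article, and 3 bytes per pixel, whereas a real Bitmap may well use 4. PageMemory is a hypothetical helper name.

```csharp
using System;

// Hypothetical helper for a back-of-the-envelope memory estimate.
static class PageMemory
{
    // Bytes needed for one uncompressed page of the given size in millimetres.
    public static long UncompressedBytes(double widthMm, double heightMm, int dpi, int bytesPerPixel)
    {
        long widthPx = (long)Math.Round(widthMm / 25.4 * dpi);   // 25.4 mm per inch
        long heightPx = (long)Math.Round(heightMm / 25.4 * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }
}

class Program
{
    static void Main()
    {
        // Assumed: A4 page, 200 DPI, 24-bit colour.
        long perPage = PageMemory.UncompressedBytes(210, 297, 200, 3);
        long total = perPage * 5000;

        Console.WriteLine("one page:   {0:F1} MiB", perPage / (1024.0 * 1024.0));
        Console.WriteLine("5000 pages: {0:F1} GiB", total / (1024.0 * 1024.0 * 1024.0));
    }
}
```

Under these assumptions a single page takes roughly 11 MB and a 5000-page document more than 50 GB, which is why the pages are stored as compressed JPEG streams rather than as bitmaps.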

If the volume of printed data is very large (for example, when automating a small print shop), you can go even further and split the task into two applications: one renders documents, the other prints them. Moreover, the result of rendering a PDF can itself be saved as a PDF. When such prepared PDFs are printed, nothing needs to be rendered again; it is enough to extract the stored images, which is fairly quick.

Cycle of articles:
Raster approach
Vector approach theory
Vector approach practice

Source: https://habr.com/ru/post/279361/
