
How I used Java to get Flash out of PDF

A long time ago, when the grass was greener, I was caught and pestered at length: I had to improve the performance of one wonderful toolchain.

The task as the architect understood it


Given: an insanely large product catalog in the form of many PDFs, a couple of thousand pages each. They have to be delivered to the web as colorful animated presentations.

Attempted solution: players were written in Flash and JavaScript. They are fed the converted catalog and use some separate algorithm to rotate advertising along with it.

Problem: the catalogs change constantly, and converting just one volume of the catalog takes more than an hour (!).

Why is that, and how can it be improved?



How it was done before us


A reasonable question: why make it so complicated? Because, besides desktops, mobile phones have to be served too, and generating a raster for 200 devices and then testing it all is not our way. And then what would we do about zoom?

Therefore: Flash for the desktop in ancient browsers (cheers to corporate IE), and HTML5 for everything else. Images are vector, except for thumbnails: rendering that many SVGs at once would be too much while tablets are still weak.

Open source (as always) to the rescue.

I analyze what is being done there, and discover this:


I run all of it in sequence on a 500-page test PDF. Running time: 1 hour 2 minutes.

Well, there's a fine surprise for you!

Who is to blame?


Obviously, parsing the same PDF four times in a row is not the best choice.

It is no less obvious that ImageMagick is the wrong tool here: three quarters of the time was spent in the convert utility.

It was at that moment that a customer brandishing an iron looked into my room and said: your wisdom is urgently needed! I enter the game.

We accept the general direction as conditionally correct: there is nowhere else to go, the release is due, as always, "just yesterday", and there are no complaints about the player. But we judge the chosen tools to be poorly picked, and turn our attention to Java.

New actors


We take the following gentleman's set:



and begin to combine them.

First of all, we remember Marcus and Boris, and begin:

    public class PdfConv {
        public int startConversion(String pdfFile) { ... }
        public int getPages() { ... }
        public int nextPage(int pageNo, String outputFileName) { ... }
        public int endConversion() { ... }
    }

And we write down how we are going to use it.

    public static void main(String[] argv) throws Exception {
        for (int jjk = 0; jjk < argv.length; ++jjk) {
            PdfConv conv = new PdfConv();
            conv.startConversion(argv[jjk]);
            int k = conv.getPages();
            for (int j = 0; j < k; ++j) {
                // The same page is rendered into every output format in turn.
                conv.nextPage(j, argv[jjk] + "_" + j + ".svg");
                conv.nextPage(j, argv[jjk] + "_" + j + ".png");
                conv.nextPage(j, argv[jjk] + "_" + j + ".swf");
                ...
            }
            // Release the file opened by startConversion.
            conv.endConversion();
        }
    }

All that is left is a trifle: writing all the necessary converters.

First pitfalls


It turns out that the chosen library really is nimble. But it has a few serious flaws:

  1. No SMask support.
  2. No gradients either.
  3. A chronic failure to understand JPEG2000.
  4. Oddities with fonts for arrows and similar symbols.

Googling solves the font problems, and adding JAI to the classpath solves the image format problems.
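Whether the classpath fix took effect can be checked without touching any PDF. The sketch below asks ImageIO whether some installed plugin can read a given format. The format name "jpeg2000" is an assumption about what the JAI Image I/O plugin registers; the class and method names are made up for illustration.

```java
import javax.imageio.ImageIO;

class ImageFormatCheck {
    /** Returns true if some ImageIO plugin on the classpath can read the named format. */
    static boolean canRead(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // "jpeg2000" is assumed to be the name registered by the JAI Image I/O plugin.
        for (String fmt : new String[] {"png", "jpeg", "jpeg2000"}) {
            System.out.println(fmt + " readable: " + canRead(fmt));
        }
    }
}
```

On a stock JDK the last line is expected to report false, and to flip to true once a JPEG 2000 plugin is on the classpath.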

SMask support has to be filed into the PDFRenderer code by hand. It is trivial: we add recognition of the key to the parser, add a command that saves the mask to the context, and change the Shape drawing into drawing with the mask applied. Trivial, but a lot of text.

Gradients are simply ignored: they do not occur in the areas that end up on the slides. For simplicity, I do not show cropping and other processing here.

The first stage is complete: it renders as it should. We implement our API (I have removed error handling):
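The last step, drawing with a mask applied, can be illustrated outside of PDFRenderer. This is not the patch from the article, only a minimal sketch of the idea under the assumption of a luminosity soft mask: each pixel's alpha is multiplied by the brightness of the corresponding mask pixel. The class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;

class SoftMaskSketch {
    /**
     * Returns a copy of content whose alpha channel is scaled by the
     * luminosity of mask (white keeps the pixel, black hides it).
     * Both images are assumed to have the same size.
     */
    static BufferedImage applySoftMask(BufferedImage content, BufferedImage mask) {
        int w = content.getWidth();
        int h = content.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int argb = content.getRGB(x, y);
                int m = mask.getRGB(x, y);
                int r = (m >> 16) & 0xFF;
                int g = (m >> 8) & 0xFF;
                int b = m & 0xFF;
                // Integer approximation of perceived brightness, 0..255.
                int lum = (299 * r + 587 * g + 114 * b) / 1000;
                int alpha = ((argb >>> 24) * lum) / 255;
                out.setRGB(x, y, (alpha << 24) | (argb & 0x00FFFFFF));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage content = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        content.setRGB(0, 0, 0xFFFF0000);
        content.setRGB(1, 0, 0xFFFF0000);
        BufferedImage mask = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        mask.setRGB(0, 0, 0xFFFFFF); // white: fully visible
        mask.setRGB(1, 0, 0x000000); // black: fully hidden
        BufferedImage out = applySoftMask(content, mask);
        System.out.println("alpha left:  " + (out.getRGB(0, 0) >>> 24));
        System.out.println("alpha right: " + (out.getRGB(1, 0) >>> 24));
    }
}
```

A per-pixel loop is the slowest way to do this, but it keeps the arithmetic visible; a real renderer would more likely work on the raster directly.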

    // Fields and methods of PdfConv; PDFFile, PDFPage and PDFRenderer come from the PDFRenderer library.
    private PDFFile pdf = null;
    private FileChannel fic = null;

    public int startConversion(String pdfFile) throws Exception {
        File fix = new File(pdfFile);
        FileInputStream fin = new FileInputStream(fix);
        fic = fin.getChannel();
        // Map the file read-only; it is opened once and reused for every page.
        MappedByteBuffer mbb = fic.map(FileChannel.MapMode.READ_ONLY, 0, fix.length());
        pdf = new PDFFile(mbb);
        return pdf.getNumPages();
    }

    public int getPages() throws Exception {
        return pdf.getNumPages();
    }

    public int endConversion() throws Exception {
        if (fic != null) fic.close();
        pdf = null;
        fic = null;
        return 1;
    }

    public int nextPage(int pageNo, String outputMask) throws Exception {
        // Our API counts pages from 0, the library from 1.
        PDFPage page = pdf.getPage(pageNo + 1);
        if (page == null) return -1;
        Rectangle bounds = page.getBBox().getBounds();
        // The output format is chosen by the file extension.
        DrawingCtx ctx = DrawingCtxBuilder.build(
                outputMask,
                new Dimension(bounds.x + bounds.width, bounds.y + bounds.height));
        PDFRenderer rx = new PDFRenderer(page, ctx.getGraphics(), bounds, null, null);
        rx.go();
        rx.waitForFinish();
        ctx.saveTo(outputMask);
        return 1;
    }

Let's move on to the conversion itself.

We draw abstractions - but do not forget about the Roosevelts


The subject area is simple enough that even Boris would not get confused.

    abstract class DrawingCtx {
        protected Graphics2D g2;
        protected Dimension size;

        DrawingCtx(Dimension size) {
            this.size = size;
        }

        public Graphics2D getGraphics() {
            return g2;
        }

        public abstract void saveTo(String fileName) throws Exception;
    }

    class DrawingCtxBuilder {
        // Picks the drawing context by the extension of the output file.
        public static DrawingCtx build(String fileName, Dimension size) throws Exception {
            String type = fileName.substring(fileName.lastIndexOf('.') + 1).toUpperCase();
            if (type.equals("SVG")) return new SvgDrawingCtx(size);
            else if (type.equals("PNG")) return new ImageDrawingCtx(size);
            else if (type.equals("SWF")) return new SwfDrawingCtx(size);
            ...
            throw new Exception(type + ": unknown converter requested");
        }
    }

We put some meat on the skeleton, and our little cadaver will be ready for work.
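The dispatch by file extension can be tried on its own. The sketch below is a hypothetical variant, not the article's builder: it maps a file name to an enum, uses Locale.ROOT so that upper-casing does not depend on the machine's locale, and also accepts ".svgz", which the SVG context's saveTo checks for but a plain "SVG" comparison would reject.

```java
import java.util.Locale;

class OutputFormats {
    enum Format { SVG, PNG, SWF }

    /** Maps an output file name to a format, based on its extension. */
    static Format fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException(fileName + ": no file extension");
        }
        String ext = fileName.substring(dot + 1).toUpperCase(Locale.ROOT);
        // Compressed SVG is still produced by the SVG converter.
        if (ext.equals("SVGZ")) {
            return Format.SVG;
        }
        try {
            return Format.valueOf(ext);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(ext + ": unknown converter requested");
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"catalog.pdf_3.svg", "catalog.pdf_3.svgz", "catalog.pdf_3.png"}) {
            System.out.println(name + " -> " + fromFileName(name));
        }
    }
}
```

Only the text after the last dot is used, so names like "catalog.pdf_3.svg" from the main loop resolve correctly.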

SVG and Batik


Here, on the whole, everything has been done before us: Apache Batik SVGGen to the rescue. No problems noticed. Had there been gradients, then yes: radial gradients would have had to be either given up or patched into Batik. And that only after gradients themselves had been added to PDFRenderer, of course.

The only subtlety is to set the rendering hints correctly, so that anti-aliasing is not forgotten and the images are not cut into tiles.

    class SvgDrawingCtx extends DrawingCtx {
        private DOMImplementation domImpl;
        private Document doc;
        private SVGGraphics2D svgGenerator;

        SvgDrawingCtx(Dimension size) {
            super(size);
            domImpl = SVG12DOMImplementation.getDOMImplementation();
            doc = domImpl.createDocument(SVGConstants.SVG_NAMESPACE_URI, SVGConstants.SVG_SVG_TAG, null);
            svgGenerator = new SVGGraphics2D(doc);
            svgGenerator.getGeneratorContext().setPrecision(4);
            svgGenerator.getGeneratorContext().setEmbeddedFontsOn(true);
            g2 = svgGenerator;
            // Hints: keep anti-aliasing on and keep images in one piece.
            g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
            g2.setRenderingHint(
                    RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g2.setRenderingHint(
                    RenderingHintsKeyExt.KEY_AVOID_TILE_PAINTING,
                    RenderingHintsKeyExt.VALUE_AVOID_TILE_PAINTING_ON);
        }

        public void saveTo(String fn) throws Exception {
            Element svgRoot = svgGenerator.getRoot();
            OutputStream os = new FileOutputStream(fn);
            // A .svgz target gets gzip compression on the way out.
            if (fn.endsWith(".svgz")) os = new GZIPOutputStream(os);
            svgGenerator.stream(svgRoot, new OutputStreamWriter(os), false /* CSS */, true /* escaped */);
            os.close();
        }
    }


PNG: it could not be simpler


It is so trivial here that I will just show the code.

    class ImageDrawingCtx extends DrawingCtx {
        private BufferedImage bf = null;

        ImageDrawingCtx(Dimension size) {
            super(size);
            // ARGB keeps the page background transparent.
            bf = new BufferedImage(size.width, size.height, BufferedImage.TYPE_INT_ARGB);
            g2 = (Graphics2D) bf.getGraphics();
        }

        public void saveTo(String fn) throws Exception {
            OutputStream os = new FileOutputStream(fn);
            ImageIO.write(bf, "PNG", os);
            os.close();
            g2.dispose();
            bf = null;
        }
    }
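The same draw-then-encode path can be exercised without PDFRenderer or the file system. The sketch below is a stand-alone illustration with hypothetical names: it draws into an ARGB image, encodes it as PNG in memory, and decodes it again to confirm that untouched pixels stay transparent.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class PngRoundTrip {
    /** Paints the left half of a w x h canvas red and returns it encoded as PNG. */
    static byte[] renderPng(int w, int h) throws IOException {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.RED);
            g.fillRect(0, 0, w / 2, h);
        } finally {
            // Finish drawing before the image is encoded.
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is found.
        if (!ImageIO.write(img, "png", out)) {
            throw new IOException("no PNG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] png = renderPng(4, 2);
        BufferedImage back = ImageIO.read(new ByteArrayInputStream(png));
        System.out.println("bytes: " + png.length);
        System.out.println("left alpha:  " + (back.getRGB(0, 0) >>> 24));
        System.out.println("right alpha: " + (back.getRGB(3, 0) >>> 24));
    }
}
```

Checking the return value of ImageIO.write is worth the extra line: it reports a missing writer by returning false rather than by throwing.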


And now for dessert: creating the SWF


The code is also simple (and mostly borrowed from the Flex SDK examples).

    class SwfDrawingCtx extends DrawingCtx {
        SwfDrawingCtx(Dimension size) {
            super(size);
            // First version: the stock Flex SDK class. The MyG2D subclass shown further down replaces it.
            g2 = new SpriteGraphics2D(size.width, size.height);
        }

        public void saveTo(String fn) throws Exception {
            OutputStream os = new FileOutputStream(fn);
            flash.swf.Frame frame1;
            Movie m = new Movie();
            m.version = 7;
            m.bgcolor = new SetBackgroundColor(SwfUtils.colorToInt(255, 255, 255));
            m.framerate = 12;
            frame1 = new flash.swf.Frame();
            // Casting to the base class works for both SpriteGraphics2D and MyG2D.
            DefineSprite tag = ((SpriteGraphics2D) g2).defineSprite("swf-test");
            frame1.controlTags.add(new PlaceObject(tag, 0));
            m.frames = new ArrayList(1);
            m.frames.add(frame1);
            TagEncoder tagEncoder = new TagEncoder();
            MovieEncoder movieEncoder = new MovieEncoder(tagEncoder);
            movieEncoder.export(m);
            tagEncoder.writeTo(os);
            os.close();
            g2.dispose();
            g2 = null;
        }
    }

Nothing foreshadowed trouble, and then suddenly: some of the images did not show up in the SWF.

Investigation in hot pursuit


This is how things stand: the images are there in the SVG, they are there in the PNG, but not in the SWF.

Tracing through Adobe's source code made me think, and I wrote this wrapper to get inside:

    // Pass-through subclass: a place to intercept drawImage calls.
    class MyG2D extends SpriteGraphics2D {
        public MyG2D(int width, int height) {
            super(width, height);
        }

        public MyG2D() {
            super();
        }

        @Override
        public boolean drawImage(Image image, AffineTransform at, ImageObserver obs) {
            //
            return super.drawImage(image, at, obs);
        }
    }


The autopsy revealed that

    at.createTransformedShape(new Rectangle(0, 0, image.getWidth(null), image.getHeight(null))).getBounds()

returns a 1x1 rectangle.

Eureka!

    @Override
    public boolean drawImage(Image image, AffineTransform at, ImageObserver obs) {
        // Combine the current transform of the graphics with the image's own transform.
        AffineTransform good = getTransform();
        good.concatenate(at);
        return super.drawImage(image, good, obs);
    }

solves the problem.
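Why concatenation helps can be reproduced with plain java.awt.geom. The sketch below rests on an assumption the article does not state: that the image's own transform only maps its pixels onto a unit square, while the scaling to page size lives in the graphics' current transform. All names and numbers are made up for illustration.

```java
import java.awt.Rectangle;
import java.awt.geom.AffineTransform;

class ImageTransformSketch {
    /** Bounding box of a w x h image rectangle after applying tx. */
    static Rectangle boundsUnder(AffineTransform tx, int w, int h) {
        return tx.createTransformedShape(new Rectangle(0, 0, w, h)).getBounds();
    }

    /** Current graphics transform combined with the image transform (image transform applied first). */
    static AffineTransform combined(AffineTransform current, AffineTransform imageTx) {
        AffineTransform good = new AffineTransform(current);
        good.concatenate(imageTx);
        return good;
    }

    public static void main(String[] args) {
        int w = 256, h = 128;
        // Assumed image transform: pixels -> unit square.
        AffineTransform imageTx = AffineTransform.getScaleInstance(1.0 / w, 1.0 / h);
        // Assumed current transform: unit square -> 512 x 384 area on the page.
        AffineTransform current = AffineTransform.getScaleInstance(512, 384);

        System.out.println("image transform alone: " + boundsUnder(imageTx, w, h));
        System.out.println("combined transform:    " + boundsUnder(combined(current, imageTx), w, h));
    }
}
```

With the image transform alone the bounds collapse to a single unit, which matches the 1x1 rectangle seen in the debugger; with the combined transform the image gets its real size on the page.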

Results


A test run showed that the first version, stitched together in a hurry, needs less than 10 minutes. That is still a lot, and there is room for thought.

However, the move from kosher C++ to Java sped the procedure up more than six times, and building a new converter from scratch, including getting past a couple of pitfalls, took three days in total.

Now the guys have something to think about, and they can repeat all these steps in C++. They have time: Java copes for now.

Costs: I had to bring along 40 megabytes of JARs and deploy it all in Tomcat, even though the web service was written in PHP.

Source: https://habr.com/ru/post/153633/

