
Integrating Java and 1C through the .Net Framework: the Apache PDFBox example




There is little information online about integrating Java with 1C. Still, there are interesting Java projects that would be worth trying out inside 1C, and Apache PDFBox is one popular example. PDF files are very common, and 1C has no good tools for working with the format. The approach proposed here is to convert the Java library into a .Net assembly with the IKVM.NET utility and then use that assembly through 1C's integration tools.

Apache PDFBox is a Java library for working with PDF documents. It can extract text, print PDFs, merge and split documents, convert pages to images, fill in forms, create PDFs, validate PDF/A files, and integrate with the Lucene search engine. The example uses version 1.8.2.
IKVM.Net is a Java virtual machine for Mono and the .Net Framework. It can convert a Java library into a .Net assembly, which can then be used from .Net code like any other library. IKVM.Net includes many helper assemblies that implement different parts of the Java class library. The example uses version 7.2.4630.5.


Jar to dll



This step assumes that IKVM.Net 7.2.4630.5 is installed on the computer.

Before converting the Jar library into a .Net assembly, install the Java Runtime Environment (JRE) and set the JAVA_HOME environment variable:

JAVA_HOME C:\Progra~1\Java\jre6

Figure: JAVA_HOME environment variable
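A quick way to confirm that JAVA_HOME points at an actual Java installation is to look for the Java launcher under its bin folder. The Java sketch below is a hypothetical helper, not part of IKVM.Net or of the original article, and its class and method names are invented for illustration. It only checks that bin\java.exe (or bin/java on other platforms) exists under the given directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: checks that a JAVA_HOME value points at a Java installation. */
final class JavaHomeCheck {
    private JavaHomeCheck() {}

    /** Returns the Java launcher found under home/bin, if there is one. */
    static Optional<Path> findJavaExecutable(Path home) {
        if (home == null) {
            return Optional.empty();
        }
        // Windows installations ship java.exe; other platforms use plain "java".
        for (String launcher : new String[] {"java.exe", "java"}) {
            Path candidate = home.resolve("bin").resolve(launcher);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Checks the JAVA_HOME variable of the current process environment. */
    static Optional<Path> checkEnvironment() {
        String value = System.getenv("JAVA_HOME");
        return value == null ? Optional.empty() : findJavaExecutable(Paths.get(value));
    }
}
```

With JAVA_HOME set to C:\Progra~1\Java\jre6, JavaHomeCheck.checkEnvironment().isPresent() should be true if that folder contains bin\java.exe. A console window that was already open before the variable was changed still sees the old environment, so run the check from a new one.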

The conversion command is:

ikvmc.exe -out:pdfbox.dll pdfbox-app-1.8.2.jar

The result is the pdfbox.dll assembly, which depends on the following assemblies:

    IKVM.OpenJDK.Beans.dll
    IKVM.OpenJDK.Core.dll
    IKVM.OpenJDK.Jdbc.dll
    IKVM.OpenJDK.Media.dll
    IKVM.OpenJDK.Naming.dll
    IKVM.OpenJDK.Security.dll
    IKVM.OpenJDK.SwingAWT.dll
    IKVM.OpenJDK.Text.dll
    IKVM.OpenJDK.Util.dll
    IKVM.OpenJDK.XML.API.dll
    IKVM.Runtime.dll


This step exposes a drawback of the approach: the assemblies that have to be shipped along with it are large. pdfbox.dll takes about 10 MB, and the helper assemblies take about 18 MB more.
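Because all of these files have to be deployed together, it can be worth checking a target folder before 1C tries to load anything. The following Java sketch is a hypothetical helper (the class and method names are invented here). It reports which of the assemblies listed above are missing from a folder and sums their sizes, which makes the roughly 28 MB footprint easy to measure.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical deployment check for pdfbox.dll and its IKVM dependencies. */
final class AssemblyBundle {
    private AssemblyBundle() {}

    /** pdfbox.dll plus the IKVM 7.2 assemblies it depends on, as listed in the article. */
    static final List<String> REQUIRED = Collections.unmodifiableList(Arrays.asList(
            "pdfbox.dll",
            "IKVM.OpenJDK.Beans.dll",
            "IKVM.OpenJDK.Core.dll",
            "IKVM.OpenJDK.Jdbc.dll",
            "IKVM.OpenJDK.Media.dll",
            "IKVM.OpenJDK.Naming.dll",
            "IKVM.OpenJDK.Security.dll",
            "IKVM.OpenJDK.SwingAWT.dll",
            "IKVM.OpenJDK.Text.dll",
            "IKVM.OpenJDK.Util.dll",
            "IKVM.OpenJDK.XML.API.dll",
            "IKVM.Runtime.dll"));

    /** Names from REQUIRED that are not regular files in dir. */
    static List<String> missing(File dir) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!new File(dir, name).isFile()) {
                result.add(name);
            }
        }
        return result;
    }

    /** Combined size in bytes of the required files; File.length() is 0 for missing files. */
    static long totalBytes(File dir) {
        long total = 0;
        for (String name : REQUIRED) {
            total += new File(dir, name).length();
        }
        return total;
    }
}
```

The 1C loading code also loads IKVM.AWT.WinForms.dll, so that name could be added to REQUIRED if the check should cover everything the 1C side loads.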

Performing basic PDFBox operations in 1C


Inside 1C, the pdfbox.dll assembly converted from Java is run through .Net Bridge.

Load all the required assemblies:
    // AssemblyDir (illustrative name): folder containing the DLLs, with a trailing backslash
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.AWT.WinForms.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.Beans.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.Core.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.Jdbc.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.Media.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.Naming.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.Security.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.SwingAWT.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.Text.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.Util.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.OpenJDK.XML.API.dll");
    net.LoadAssemblyFrom(AssemblyDir + "IKVM.Runtime.dll");
    net.LoadAssemblyFrom(AssemblyDir + "pdfbox.dll");


Open a PDF file:
    // FileName (illustrative name): full path to the source PDF file
    pdf = net.CallStatic("org.apache.pdfbox.pdmodel.PDDocument", "load", FileName);


Extract the text from the PDF:
    stripper = net.New("org.apache.pdfbox.util.PDFTextStripper");
    // PdfText (illustrative name) receives the text of all pages as one string
    PdfText = stripper.getText(pdf);


Split the document into single-page PDFs:
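    splitter = net.New("org.apache.pdfbox.util.Splitter");
    splitter.setSplitAtPage(1); // start a new document after every page
    Parts = splitter.split(pdf).toArray(); // list of PDDocument objects -> array
    // Parts and OutputPrefix are illustrative names; OutputPrefix is the path prefix of the output files
    For Index = 0 To Parts.Length - 1 Do
        Parts.GetValue(Index).save(OutputPrefix + (Index + 1) + ".pdf");
    EndDo;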
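setSplitAtPage(1) makes Splitter start a new document after every page. A larger value puts that many pages into each output document, and the last document holds whatever pages remain. The JDK-only Java sketch below models that grouping and the prefix + page number + ".pdf" naming used in the 1C loop. It does not call PDFBox, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how pages are grouped when splitting every splitAtPage pages. */
final class SplitPlan {
    private SplitPlan() {}

    /** Groups 1-based page numbers 1..pageCount into chunks of splitAtPage pages each. */
    static List<List<Integer>> chunks(int pageCount, int splitAtPage) {
        if (splitAtPage < 1) {
            throw new IllegalArgumentException("splitAtPage must be at least 1");
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int first = 1; first <= pageCount; first += splitAtPage) {
            List<Integer> chunk = new ArrayList<>();
            // The last chunk may be shorter when pageCount is not a multiple of splitAtPage.
            for (int page = first; page < first + splitAtPage && page <= pageCount; page++) {
                chunk.add(page);
            }
            result.add(chunk);
        }
        return result;
    }

    /** File name for the part at 0-based position index: prefix + (index + 1) + ".pdf". */
    static String partFileName(String prefix, int index) {
        return prefix + (index + 1) + ".pdf";
    }
}
```

For example, chunks(5, 2) gives [[1, 2], [3, 4], [5]]. One caveat on the 1C side: implicit number-to-string conversion in 1C inserts digit grouping for values of 1000 and above, so for very long documents wrapping the page number in Format(..., "NG=0") gives cleaner file names.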


Create a new document from the odd-numbered pages of the original PDF:
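    Pages = pdf.getDocumentCatalog().getAllPages(); // 0-based list of pages
    newPdf = net.New("org.apache.pdfbox.pdmodel.PDDocument");
    For Index = 0 To Pages.size() - 1 Do
        // Index is 0-based, so even indexes are pages 1, 3, 5, ...
        If Index % 2 = 1 Then
            Continue;
        EndIf;
        newPdf.addPage(Pages.get(Index));
    EndDo;
    // Pages and NewPdfFileName are illustrative names
    newPdf.save(NewPdfFileName);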
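The page list returned by getAllPages() is 0-based. Skipping every index where the index % 2 = 1 check is true therefore keeps indexes 0, 2, 4, ..., which are pages 1, 3, 5, ... of the original document. The JDK-only Java sketch below (hypothetical names, no PDFBox calls) makes that off-by-one mapping explicit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing which pages the odd-page loop keeps. */
final class OddPages {
    private OddPages() {}

    /** 0-based indexes kept by the loop, which skips every index where index % 2 == 1. */
    static List<Integer> keptIndexes(int pageCount) {
        List<Integer> result = new ArrayList<>();
        for (int index = 0; index < pageCount; index++) {
            if (index % 2 == 1) {
                continue; // odd index = even-numbered page (2, 4, ...)
            }
            result.add(index);
        }
        return result;
    }

    /** Converts 0-based list indexes to the 1-based page numbers a reader sees. */
    static List<Integer> toPageNumbers(List<Integer> indexes) {
        List<Integer> pages = new ArrayList<>(indexes.size());
        for (int index : indexes) {
            pages.add(index + 1);
        }
        return pages;
    }
}
```

For a five-page document, keptIndexes(5) is [0, 2, 4], and toPageNumbers turns it into [1, 3, 5].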


Unsolved problem



Although the basic operations work, converting a page or a whole document into image files remains an unsolved problem.

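    // ImageType (illustrative name): value of the BufferedImage.TYPE_INT_ARGB constant
    ImageType = net.GetStatic("java.awt.image.BufferedImage", "TYPE_INT_ARGB");
    imageWriter = net.New("org.apache.pdfbox.util.PDFImageWriter");
    // writeImage(document, format, password, start page, end page, output prefix, image type, dpi)
    success = imageWriter.writeImage(pdf, "png", "", 1, 3, "document-img", ImageType, 96);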


The code above renders the text incorrectly. The resulting PNG file is shown below: the text is drawn in a very small font in the upper-left corner of the image.

Figure: PDFBox output error
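One way to narrow this down is to check whether the PNG has the pixel size that the requested 96 dpi implies. PDF page sizes are given in points (1/72 inch), so the expected size in pixels is points x dpi / 72. The JDK-only Java sketch below does this comparison. It is a diagnostic aid under that assumption, not a fix, and the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;

/** Hypothetical check: does a rendered page image have the size the dpi implies? */
final class RenderSizeCheck {
    private RenderSizeCheck() {}

    /** PDF sizes are in points (1/72 inch), so pixels = points * dpi / 72, rounded. */
    static int expectedPixels(float points, int dpi) {
        return Math.round(points * dpi / 72f);
    }

    /** True if the image is exactly as large as a widthPt x heightPt page at the given dpi. */
    static boolean matchesPage(BufferedImage image, float widthPt, float heightPt, int dpi) {
        return image.getWidth() == expectedPixels(widthPt, dpi)
                && image.getHeight() == expectedPixels(heightPt, dpi);
    }
}
```

For example, a US Letter page (612 x 792 points) rendered at 96 dpi should produce an 816 x 1056 pixel image. A generated file can be loaded with javax.imageio.ImageIO.read and passed to matchesPage. If the size matches but the text still fills only a small corner, the page content is being drawn at too small a scale, rather than the image itself being too small.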

Archive of materials for the article: Java1C.zip (14.36 MB)

Source: https://habr.com/ru/post/193212/

