
Making a visual web-editor of documents based on LibreOffice, jodconverter and TinyMCE

How I love the office format specifications! A lot of time has passed since the previous article, about generating Excel documents from a template, and the task has changed somewhat. The new task was stated as follows: turn a finished Excel or Word document into a template through a web interface. While a document is being generated, substitute the required values into the template and remove and/or "clone" pieces of it. After generation, the document should be available to the user for visual editing in the browser. The finished document should be saved on the server and be available for download both in its original format (*.doc / *.xls) and as PDF. The layout of the downloaded file should be identical to the template that was uploaded at the very beginning (without any distortion of margins and print areas).
Well, the task is set, so let's solve it!


1. Tools we tried
First we need to decide how to convert the uploaded doc, docx, xls, and xlsx files to HTML and back without spoiling the layout.

Apache POI: a great tool that we have used successfully, but it cannot generate HTML markup from an existing document.
DocX4J: there is a long story with this one. It can do all sorts of nice things that have been written about many times, and initially we wanted to use this particular library.
Disadvantages of DocX4J: it works only with docx and xlsx. That alone is not so bad. The problems begin when you try to convert the HTML back to docx or xlsx: all the document's styles are lost, the fonts come out arbitrary, and so on. We contacted the developer, who confirmed that the problem exists and is partially solved in the paid version, docx4j-web-editor. But the paid version turned out to have bugs of its own. In the end, this library also had to be abandoned.
The solution is to use LibreOffice: let it convert files to HTML and back on the server. All that remains is to connect it to our web application.
To work with LibreOffice we use a small library, jodconverter, which unfortunately has not been updated for a long time but works perfectly well. It connects to LibreOffice over a TCP socket, hands it a file for conversion, and gets the converted file back. All this works much faster and more correctly than the Java libraries listed above. In addition, LibreOffice runs in its own process, which frees the Java application from such a cumbersome task as parsing a document and keeping it in the web application's heap.
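
For jodconverter to connect, a LibreOffice instance must already be listening on that socket. Below is a minimal sketch of how such an instance could be launched from Java. It assumes that the `soffice` executable is on the PATH and that port 2002 is used (the port our wrapper class connects to); the class and method names are hypothetical, and the exact set of flags may need adjusting for a particular installation.

```java
import java.util.Arrays;
import java.util.List;

/** Builds the command line that starts LibreOffice as a headless conversion service. */
final class SofficeLauncher {

    /** Assumed executable name; the real path depends on the installation. */
    static final String EXECUTABLE = "soffice";

    /** Returns the arguments for a LibreOffice instance listening on the given TCP port. */
    static List<String> command(int port) {
        // The --accept string asks LibreOffice to open a UNO (urp) socket on localhost.
        String accept = "--accept=socket,host=127.0.0.1,port=" + port + ";urp;";
        return Arrays.asList(EXECUTABLE, "--headless", "--nologo", "--norestore", accept);
    }

    public static void main(String[] args) throws Exception {
        // Starts LibreOffice and blocks until the process exits.
        Process process = new ProcessBuilder(command(2002)).inheritIO().start();
        System.out.println("LibreOffice exited with code " + process.waitFor());
    }
}
```

Because ProcessBuilder passes each argument as is, the `--accept` value needs no shell quoting here; when typing the same command in a shell, the value has to be quoted because of the semicolons.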

2. Upload the file to the server and make a template out of it

However, jodconverter can only work with files on the server's file system. So the uploaded file has to be handed over from the web application through a temporary file, and the inverse problem has to be solved as well: convert the HTML into a file of the required format and give it to the user.

Below is a small wrapper class for jodconverter, with comments:
Libre.java
package ru.cpro.uchteno.util;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.office.ExternalOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeConnectionProtocol;
import org.artofsolving.jodconverter.office.OfficeManager;

// Wrapper around jodconverter. All five conversions follow the same steps,
// so they share one private convert() method and differ only in file extensions.
public class Libre {

    // TCP port on which the already running LibreOffice instance listens
    private static final int OFFICE_PORT = 2002;

    // Convert doc to html
    public static void doc2html(InputStream is, OutputStream os) {
        convert(is, os, ".doc", ".html");
    }

    // Convert doc to pdf
    public static void doc2pdf(InputStream is, OutputStream os) {
        convert(is, os, ".doc", ".pdf");
    }

    // Convert html to doc
    public static void html2doc(InputStream is, OutputStream os) {
        convert(is, os, ".html", ".doc");
    }

    // Convert html to docx
    public static void html2docx(InputStream is, OutputStream os) {
        convert(is, os, ".html", ".docx");
    }

    // Convert html to pdf
    public static void html2pdf(InputStream is, OutputStream os) {
        convert(is, os, ".html", ".pdf");
    }

    // Shared implementation: spool the input to a temporary file, let LibreOffice
    // convert it, then stream the result to the output. On success both streams are closed.
    private static void convert(InputStream is, OutputStream os, String inExt, String outExt) {
        File inf = null;
        File onf = null;
        try {
            // Temporary file holding the incoming document
            inf = File.createTempFile("doc", inExt);
            // Write the input stream to the temporary file and close both
            try (InputStream in = is; FileOutputStream infos = new FileOutputStream(inf)) {
                copy(in, infos);
            }
            // Temporary file for the conversion result
            onf = File.createTempFile("doc", outExt);

            // Configure jodconverter to attach to LibreOffice over a TCP socket
            ExternalOfficeManagerConfiguration officeConfiguration = new ExternalOfficeManagerConfiguration();
            officeConfiguration.setConnectionProtocol(OfficeConnectionProtocol.SOCKET);
            officeConfiguration.setPortNumber(OFFICE_PORT);
            OfficeManager officeManager = officeConfiguration.buildOfficeManager();
            // Connect to the running instance
            officeManager.start();
            try {
                OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
                // The source and target formats are taken from the file extensions
                converter.convert(inf, onf);
            } finally {
                // Disconnect even if the conversion failed
                officeManager.stop();
            }

            // Send the converted file to the output stream and close both
            try (FileInputStream outfis = new FileInputStream(onf); OutputStream out = os) {
                copy(outfis, out);
            }
        } catch (IOException ex) {
            Logger.getLogger(Libre.class.getName()).log(Level.SEVERE, null, ex);
        } finally {
            // Remove the temporary files
            if (inf != null) {
                inf.delete();
            }
            if (onf != null) {
                onf.delete();
            }
        }
    }

    // Copy everything from in to out through a small buffer
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buff = new byte[1024];
        int n;
        while ((n = in.read(buff)) >= 0) {
            out.write(buff, 0, n);
        }
    }
}
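
The wrapper above copies streams to and from temporary files by hand. As an illustration only, the same spooling step could be sketched with `java.nio.file.Files`; the class and method names below are hypothetical and are not part of Libre.java, and the conversion call itself is left out so that the sketch runs without LibreOffice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Moves data between web-application streams and temporary files on disk. */
final class TempFileSpool {

    /** Writes the whole stream to a new temporary file with the given extension. */
    static Path toTempFile(InputStream in, String extension) throws IOException {
        Path file = Files.createTempFile("doc", extension);
        // createTempFile already created an empty file, so the copy must replace it.
        Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        return file;
    }

    /** Streams the file to the output and deletes the file afterwards. */
    static void drainTo(Path file, OutputStream out) throws IOException {
        try {
            Files.copy(file, out);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] upload = "<html><body>demo</body></html>".getBytes(StandardCharsets.UTF_8);
        Path spooled = toTempFile(new ByteArrayInputStream(upload), ".html");
        // A real application would pass the spooled file to the converter here.
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        drainTo(spooled, response);
        System.out.println(response.size() + " bytes passed through " + spooled.getFileName());
    }
}
```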



3. Working with the template

Once we have HTML, the required operations on it (substituting values, removing or cloning fragments) are fairly easy to perform with Apache Velocity. Everything is done simply by following the description.
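
To show what "substituting values into the HTML" means, here is a deliberately simplified stand-in written with the standard library only. It is not Velocity and does not use its API: it just replaces `$name`-style references with values from a map and, as Velocity does by default, leaves unknown references untouched. The class name and the sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified stand-in for a template engine: replaces $name references in HTML. */
final class PlaceholderFiller {

    // Matches references such as $customer or $total_2.
    private static final Pattern REFERENCE = Pattern.compile("\\$([A-Za-z][A-Za-z0-9_]*)");

    /** Returns the HTML with every known reference replaced by its value. */
    static String fill(String html, Map<String, String> values) {
        Matcher matcher = REFERENCE.matcher(html);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            // Unknown references stay in the output exactly as written.
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("customer", "ACME");
        System.out.println(fill("<td>$customer</td><td>$missing</td>", values));
    }
}
```

Values are inserted verbatim, so in a real template they would have to be HTML-escaped first. Conditions and loops, which are what make removing and cloning fragments possible, are exactly the part a real template engine adds on top of this.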

4. Visual document editing

Visual editors have one peculiarity: they alter the HTML code, and when it is converted back the whole layout of the document can be distorted beyond recognition. After experimenting with different editors, we concluded that TinyMCE is the least "clever" of them: it distorts the markup the least and has little effect on the final result of the reverse conversion.

The optimal editor configuration was found by trial and error:

tinymce.init({
    selector: "textarea",
    theme: "modern",
    fullpage_default_doctype: "<!DOCTYPE xhtml>",
    plugins: [
        "advlist autolink lists link image charmap print preview hr anchor pagebreak",
        "searchreplace wordcount visualblocks visualchars code fullscreen",
        "insertdatetime media nonbreaking save table contextmenu directionality",
        "emoticons template paste textcolor fullpage"
    ],
    toolbar1: "insertfile undo redo | styleselect | bold italic | alignleft aligncenter alignright alignjustify | bullist numlist outdent indent | link image",
    toolbar2: "print preview media | forecolor backcolor emoticons",
    image_advtab: true
});


Each time, before reading the content, do not forget to call tinyMCE.triggerSave(); so that the editor's contents are written back to the textarea in the DOM.

5. Download the finished document

For this we again use the Libre.java class:

Convert HTML to doc: html2doc()
Convert HTML to docx: html2docx()
Convert HTML to PDF: html2pdf()
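
When the converted file is sent to the browser, the response also needs suitable headers so that it is offered as a download. A minimal sketch of choosing them by file extension follows; the class and method names are hypothetical, and wiring the values into a servlet or framework response is left out.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Chooses HTTP response headers for a generated document by its file extension. */
final class DownloadHeaders {

    private static final Map<String, String> CONTENT_TYPES = new HashMap<>();

    static {
        CONTENT_TYPES.put("doc", "application/msword");
        CONTENT_TYPES.put("docx",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        CONTENT_TYPES.put("xls", "application/vnd.ms-excel");
        CONTENT_TYPES.put("xlsx",
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        CONTENT_TYPES.put("pdf", "application/pdf");
    }

    /** Returns the Content-Type value, or a generic binary type for unknown extensions. */
    static String contentType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = (dot < 0) ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = CONTENT_TYPES.get(extension);
        return (type != null) ? type : "application/octet-stream";
    }

    /** Returns a Content-Disposition value; intended for plain ASCII file names only. */
    static String contentDisposition(String fileName) {
        // Double quotes are dropped so that they cannot break the quoted file name.
        return "attachment; filename=\"" + fileName.replace("\"", "") + "\"";
    }

    public static void main(String[] args) {
        System.out.println("Content-Type: " + contentType("report.docx"));
        System.out.println("Content-Disposition: " + contentDisposition("report.docx"));
    }
}
```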

That's all. We will be glad if this article helps someone and cuts down the time spent on trial and error!

Prepared by: akaiser, boiler5.

Source: https://habr.com/ru/post/224795/

