Working with documents in the browser using jDoc

Most of us, I think, regularly use the excellent Google Docs service. At some point I had the idea of writing something similar: not a service, but a library that works with documents in the most popular formats directly in the browser, and offline! And, of course, written in JavaScript and nothing else :) That is how the jDoc library was born. It is still at an early stage of development, at version 0.1.0.

What it can do

At this stage jDoc can read documents in the following formats, preserving formatting, images, links, and so on:

.docx
.txt
.fb2
.odt
.csv
.tsv

I emphasize the word "read" deliberately: editing and creating new documents are planned for the future.

How to use

Include the library in your project:

<script src="js/jDoc.0.1.0.js"></script>

Or its minified version:

 <script src="js/jDoc.0.1.0.min.js"></script>

Then read the file you need:

 // file is a File object
 jDoc.read(file, {
     success: function (result) { },
     error: function (error) { }
 });

result is an object holding the outcome of reading the file. It has two methods:

result.html() - returns a DocumentFragment with the file's structure converted to HTML; convenient for showing the document to the user right away
result.data() - returns the structure of the read document as a plain JavaScript object
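As a rough illustration of the call and the two methods above, here is a minimal sketch that reads a file and shows it on the page. The helper name previewDocument, its parameters, and the element ids in the commented wiring are assumptions made for this example; only jDoc.read, result.html() and the success/error callbacks come from the library as described here.

```javascript
// Sketch only: `previewDocument` is a hypothetical helper, not part of jDoc.
// `jDoc` is passed in explicitly; on a page it would be the global object
// made available by the script tag.
function previewDocument(jDoc, file, container, onError) {
    jDoc.read(file, {
        success: function (result) {
            // result.html() is described as returning a DocumentFragment,
            // so it can be appended to any element directly.
            container.appendChild(result.html());
        },
        error: function (error) {
            // Report the failure only if the caller asked for it.
            if (onError) {
                onError(error);
            }
        }
    });
}

// Possible browser wiring (assumes <input type="file" id="doc"> and
// <div id="preview"> exist on the page; both ids are made up):
// document.getElementById("doc").addEventListener("change", function (event) {
//     previewDocument(jDoc, event.target.files[0],
//         document.getElementById("preview"), console.error);
// });
```

Passing jDoc in as an argument is just a convenience for the sketch: it keeps the helper free of globals and easy to try with a stand-in object.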

Technologies used

To read formats that are really archives (docx, odt), I needed an archive extractor. I was about to write my own, but soon found the excellent zip.js library, which saved a lot of time. It provides a convenient API for working with archives in JavaScript, and although I had to fix a few bugs in zip.js, I was very pleased with the find.

As you have probably guessed, reading files with JavaScript in the browser relies on recent technologies from the HTML5 family: Blob, the File API, and JavaScript typed arrays.
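To give a feel for how these pieces fit together, here is a small sketch that reads the first bytes of a file into a typed array and checks for the ZIP signature (.docx and .odt files are ZIP archives). This is not how jDoc detects formats; the function names looksLikeZip and checkFile are invented for the example.

```javascript
// ZIP local file header signature: the bytes "PK\x03\x04".
var ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Hypothetical helper: takes an ArrayBuffer and reports whether it starts
// with the ZIP local file header signature.
function looksLikeZip(buffer) {
    if (buffer.byteLength < ZIP_SIGNATURE.length) {
        return false;
    }
    // A typed array gives byte-level access to the raw buffer.
    var bytes = new Uint8Array(buffer, 0, ZIP_SIGNATURE.length);
    for (var i = 0; i < ZIP_SIGNATURE.length; i += 1) {
        if (bytes[i] !== ZIP_SIGNATURE[i]) {
            return false;
        }
    }
    return true;
}

// Browser-only part: a File is a Blob, so slice() cuts out the first four
// bytes and FileReader turns them into an ArrayBuffer.
function checkFile(file, callback) {
    var reader = new FileReader();
    reader.onload = function () {
        callback(looksLikeZip(reader.result));
    };
    reader.readAsArrayBuffer(file.slice(0, 4));
}
```

Only the first four bytes are read, so the check stays cheap even for large documents.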

The source code is built into a single file using Grunt.

Peculiarities

For me, working with the Office Open XML format, which .docx files belong to, turned into a nightmare. Compared with OpenDocument (.odt files), the structure of Open XML looks incredibly monstrous.

There were some amusing incidents, too. The manifest of a .docx file contains the tag

 <Pages>

which specifies the number of pages in the document. But for one of the test documents, opening the same file in MS Word gives 2 pages, not the number stated in the tag :) With almost a dozen other tested documents this did not happen.

I also had to struggle with Web Workers: in both Chrome and Firefox, creating more than 20 workers at once either did nothing or crashed the browser. And a single .docx file, for example, is split into 20-25 files for parsing, not counting images and other external resources.
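One common way around a limit like this is to cap how many workers run at once and queue the rest. The sketch below shows the idea in general form; it is not what jDoc does, and createTaskQueue, the limit of 4 and the parser.js worker script are all hypothetical.

```javascript
// Hypothetical helper: runs at most `limit` tasks at the same time.
// A task is a function that receives a `done` callback and calls it
// when its work is finished.
function createTaskQueue(limit) {
    var running = 0;
    var pending = [];

    function start(task) {
        var finished = false;
        running += 1;
        task(function done() {
            if (finished) {
                return; // ignore repeated calls from the same task
            }
            finished = true;
            running -= 1;
            drain();
        });
    }

    // Start queued tasks while there is a free slot.
    function drain() {
        while (running < limit && pending.length > 0) {
            start(pending.shift());
        }
    }

    return {
        push: function (task) {
            pending.push(task);
            drain();
        },
        runningCount: function () { return running; },
        pendingCount: function () { return pending.length; }
    };
}

// Possible browser usage (assumes `parts` is an array of extracted files,
// `handle` processes a parsed part, and "parser.js" is a worker script;
// all three are made up for the example):
// var queue = createTaskQueue(4);
// parts.forEach(function (part) {
//     queue.push(function (done) {
//         var worker = new Worker("parser.js");
//         worker.onmessage = function (event) {
//             handle(event.data);
//             worker.terminate();
//             done();
//         };
//         worker.onerror = function () { worker.terminate(); done(); };
//         worker.postMessage(part);
//     });
// });
```

With a queue like this, the 20-25 parts of a .docx file would be parsed a few at a time instead of spawning a worker for each part at once.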

Browser Support

Unfortunately, I have only been able to test the library in the latest versions of Google Chrome and Mozilla Firefox.

Plans

~~Napoleonic~~ optimistic library development plans:

Customization - a separate page for building the library for specific document formats, something like the jQuery UI build page
Use Web Workers after all, because the technology is very convenient and powerful, although it may not be needed for every file format
Editing and creating new files, with the ability to choose the format in which the file is saved
Support for more file formats

P.S. Although the library is still quite raw, it can already be used for very simple tasks, for example reading and previewing documents on the client before sending them to the server.
This is my first post, so please don't be too harsh. I would welcome constructive suggestions and requests in the comments.
P.P.S. Many thanks to aleks_raiden for hosting a demo.
After selecting a file you should wait a little, because I simply have not had time to add a progress indicator :)

Source: https://habr.com/ru/post/195342/
