I think most of us regularly use the excellent Google Docs service. At some point I had the idea of writing something similar, though not as a service but as a library that works with documents in the most popular formats directly in the browser, even offline. And, of course, written in JavaScript and nothing else :) That is how the jDoc library was born. It is still at an early stage of development, at version 0.1.0.
What it can do
At this stage jDoc can read documents in the following formats, preserving formatting, images, links, and so on:
- .docx
- .txt
- .fb2
- .odt
- .csv
- .tsv
Note the word "read": editing and creating new documents are only planned for now.
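Because only the formats listed above are handled, it can be useful to check a file's extension before passing it to the library. The helper below is a minimal sketch of such a check. The names `SUPPORTED_EXTENSIONS` and `isSupportedFile` are hypothetical and are not part of jDoc.

```javascript
// Hypothetical helper, not part of jDoc.
// The list mirrors the formats named in this post.
var SUPPORTED_EXTENSIONS = ['docx', 'txt', 'fb2', 'odt', 'csv', 'tsv'];

// Returns true when the file name ends with one of the supported extensions.
function isSupportedFile(fileName) {
    var dot = fileName.lastIndexOf('.');
    if (dot === -1) {
        return false; // no extension at all
    }
    var extension = fileName.slice(dot + 1).toLowerCase();
    return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}
```

The check looks only at the name, so it says nothing about whether the file's contents are actually valid.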
How to use
Include the library in your project:
<script src="js/jDoc.0.1.0.js"></script>
Or its minified version:
<script src="js/jDoc.0.1.0.min.js"></script>
Then read the file you need:
// file is a File object
jDoc.read(file, {
    success: function (result) { },
    error: function (error) { }
});
result is an object holding the outcome of reading the file. It has two methods:
- result.html() returns a DocumentFragment with the file's structure converted to HTML, which is convenient for showing the document to the user right away
- result.data() returns the parsed document structure as a plain JavaScript object
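As an illustration of the two steps above, here is a minimal sketch that shows a file chosen by the user inside a page element. It uses only the `jDoc.read` call and the `result.html()` method described in this post. The function name `showPreview` and the element ids `file-input` and `preview` are assumptions made for the example.

```javascript
// Sketch: render a selected file into a container element.
// `reader` is expected to be the jDoc object; it is passed in as a parameter
// so the function does not depend on a global. `showPreview` is a hypothetical name.
function showPreview(reader, file, container, onError) {
    reader.read(file, {
        success: function (result) {
            // result.html() returns a DocumentFragment ready for insertion
            container.appendChild(result.html());
        },
        error: function (error) {
            if (onError) {
                onError(error);
            }
        }
    });
}

// Browser wiring; the element ids are assumptions made for this example.
if (typeof document !== 'undefined' && typeof jDoc !== 'undefined') {
    var input = document.getElementById('file-input');
    var preview = document.getElementById('preview');
    if (input && preview) {
        input.addEventListener('change', function () {
            if (input.files.length > 0) {
                showPreview(jDoc, input.files[0], preview, function (error) {
                    console.error('Could not read the document:', error);
                });
            }
        });
    }
}
```

Passing the library in as `reader` keeps the function easy to try out with a stub object when jDoc itself is not loaded.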
Technologies used
Some of the formats (docx, odt) are archives, so at first it looked like I would have to write my own archive extractor. Fortunately, I soon found the excellent zip.js library, which saved me a lot of time. It provides a convenient API for working with archives in JavaScript, and although I had to fix a few bugs in zip.js, I was very pleased with the find.
As you have probably guessed, reading files with JavaScript in the browser relies on recent technologies that all fall under the HTML5 umbrella, such as:
- Blob
- File API
- JavaScript typed arrays
The source code is compiled into a single file using Grunt.
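To show how Blob, the File API, and typed arrays fit together, here is a small sketch that reads the first four bytes of a file and checks them against the ZIP signature, which is what .docx and .odt files start with since they are archives. This is only an illustration, not how jDoc detects formats. The names `looksLikeZip` and `checkFile` are hypothetical.

```javascript
// ZIP local file header signature: the bytes "PK" followed by 0x03 0x04.
var ZIP_SIGNATURE = [0x50, 0x4B, 0x03, 0x04];

// Returns true when the given Uint8Array starts with the ZIP signature.
function looksLikeZip(bytes) {
    if (bytes.length < ZIP_SIGNATURE.length) {
        return false;
    }
    for (var i = 0; i < ZIP_SIGNATURE.length; i++) {
        if (bytes[i] !== ZIP_SIGNATURE[i]) {
            return false;
        }
    }
    return true;
}

// Browser-only part: read the first four bytes of a File (a kind of Blob)
// and report through the callback whether they match the signature.
function checkFile(file, callback) {
    var fileReader = new FileReader();
    fileReader.onload = function () {
        callback(looksLikeZip(new Uint8Array(fileReader.result)));
    };
    fileReader.onerror = function () {
        callback(false);
    };
    fileReader.readAsArrayBuffer(file.slice(0, 4));
}
```

Slicing the Blob before reading means only four bytes are loaded into memory, however large the document is.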
Peculiarities
For me, working with the Office Open XML format, which .docx files belong to, turned into a nightmare. Compared with OpenDocument (.odt files), the structure of Open XML looks incredibly monstrous.
There were some amusing incidents too. The metadata inside a .docx file includes a <Pages> tag, which records the number of pages in the document. Yet when I opened one such document in MS Word, it showed 2 pages, which did not agree with the tag :) This did not happen with almost a dozen other documents I tested.
I also had to struggle with Web Workers: in both Chrome and Firefox, creating more than 20 workers at once either produced no result or crashed the browser. For comparison, a single .docx file is parsed into 20-25 files, not counting images and other external resources.
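One common way around a limit like this is to queue the work and keep only a few workers alive at a time. The sketch below shows that idea. It is not what jDoc 0.1.0 does, and the names `createTaskQueue`, `makeWorkerTask`, and the worker script `parse-part.js` are all hypothetical.

```javascript
// Hypothetical helper: runs at most `maxActive` tasks at the same time.
// Each task is a function that receives a `done` callback and must call it
// when its work (for example, a Web Worker replying) has finished.
function createTaskQueue(maxActive) {
    var active = 0;
    var pending = [];

    // Starts queued tasks while there is a free slot.
    function next() {
        while (active < maxActive && pending.length > 0) {
            var task = pending.shift();
            active++;
            task(release());
        }
    }

    // Builds a one-shot `done` callback that frees a slot exactly once.
    function release() {
        var called = false;
        return function () {
            if (called) {
                return;
            }
            called = true;
            active--;
            next();
        };
    }

    return {
        push: function (task) { pending.push(task); next(); },
        activeCount: function () { return active; },
        pendingCount: function () { return pending.length; }
    };
}

// Example task: parse one part of a document in a Web Worker.
// 'parse-part.js' is a hypothetical worker script, not a file shipped with jDoc.
function makeWorkerTask(part, onResult) {
    return function (done) {
        var worker = new Worker('parse-part.js');
        worker.onmessage = function (event) {
            worker.terminate();
            onResult(event.data);
            done();
        };
        worker.onerror = function () {
            worker.terminate();
            done();
        };
        worker.postMessage(part);
    };
}
```

With `createTaskQueue(4)`, for example, the 20-25 parts of a .docx file would be processed four at a time, and each worker is terminated as soon as it has replied.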
Browser Support
Unfortunately, I have only been able to test the library in the latest versions of Google Chrome and Mozilla Firefox.
Plans
Optimistic, Napoleonic plans for the library:
- Customization: a separate page for building the library with only the formats you need, something like the jQuery UI build page
- Use Web Workers after all, since the technology is very convenient and powerful, although it may not be needed for every file format
- Editing and creating new files, with the option to choose the format in which the file is saved
- Support for more file formats
P.S. Although the library is still quite raw, it can already be used for very simple tasks, such as reading and previewing documents on the client before sending them to the server.
This is my first post, so please don't judge too harshly. I would be glad to see constructive suggestions and wishes in the comments.
P.P.S. Many thanks to aleks_raiden for hosting a demo. After selecting a file you will have to wait a little, because I simply did not have time to add a progress indicator :)