Ideal document repository

Sometimes you just need to find a file quickly. There are hundreds of thousands of files, and you know neither its name, nor its content, nor its type: nothing. But you roughly know its categories. And you want to track it down quickly, then edit and save it right away.
At present there is NO convenient, cross-platform, open-source file dump with direct access to files.
What follows is not about media libraries or the semantic web (semaweb), but about a simple and convenient system for managing a huge file store with direct access to files.

1. Requirements

~~British scientists~~ Practice has shown that even a small company with a couple of dozen users can accumulate tens (or even hundreds) of thousands of files of very different content and format. And finding a file in this ~~mess~~ is so hard that it is easier to redo the work from scratch.
The search problem is not new (Chekhov is my witness), but it has not been solved yet.
The catch is that here "to find" means finding in human terms, not in the terms of search engines or Explorer. A person does not know the words they are looking for; they know the concepts. And search engines and file managers are bad at concepts (semantics). A normal user is not looking for "\\server\public\Inbox\Contracts\Clients\Horns and Hooves\Contract Horns and Hooves with Chamomile for delivery.doc"; they are looking for "that contract from last month with our favorite client, for two million". And they will find it (if lucky) in "\\server\private\secretary\Outgoing\Romashka LLC\Commercial offers\Hoof\I'm sick of them already.xls".
In a "they don't know what they want" situation (Zoshchenko), you have to give the person a choice (which search engines are trying to do, so far without success). I.e., I don't know the type of the document, so the possible types are shown to me. I don't remember the exact name of the company, so I get a list of companies. And so on. So this is no longer a search but a filter.
So, suppose I have 100,500,000 files (from porn to AutoCAD drawings), and I want to quickly and conveniently:

track down (filter) a file by attributes that are not physically present in the file itself,
open it (not download a copy and open that, but open the file itself),
change it and save it (not upload it back, but simply save it with ^S).
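The "filter, not search" idea can be sketched as faceted navigation: every file carries attribute values stored outside the file, and after each choice the user is shown which values are still available. A minimal Python sketch, assuming hypothetical attributes (`type`, `company`), made-up paths and an in-memory list; a real system would keep this metadata in a database.

```python
from collections import Counter

# Hypothetical metadata kept outside the files themselves.
FILES = [
    {"path": "/store/0001.doc", "type": "contract", "company": "Romashka LLC"},
    {"path": "/store/0002.xls", "type": "offer", "company": "Romashka LLC"},
    {"path": "/store/0003.doc", "type": "contract", "company": "Horns and Hooves"},
    {"path": "/store/0004.dwg", "type": "drawing", "company": "Horns and Hooves"},
]


def apply_filter(files, selected):
    """Keep only files whose attributes match every selected value."""
    return [f for f in files if all(f.get(k) == v for k, v in selected.items())]


def facets(files, attribute):
    """Count the remaining values of one attribute: the 'choices' shown to the user."""
    return Counter(f[attribute] for f in files if attribute in f)


if __name__ == "__main__":
    remaining = apply_filter(FILES, {"type": "contract"})
    # After picking "contract", the user chooses among the companies still present.
    print(facets(remaining, "company"))
```

The user never types a word: each step narrows the set and offers the next list of choices, which is what "give a person a choice" amounts to.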

In total, we need a system that offers:

Works
Cross-platform (Windows, Linux, Mac OS)
Filters
Direct access (open and write)
Integration (integration into the desktop environment, as a consequence of item 4)
Multi-user access
Internet (access from anywhere in the world)
Open source

2. Who is to blame

What we have today:

2.1. FS

This means storing files as a sprawling tree of folders and files on a local or remote file system. This option has one major advantage and one major drawback:

- The complete absence of filters.
+ Built-in cross-platform direct file access.

Of course, the hierarchical structure can, at a stretch, be called a "filter". But only at a stretch. If one person puts the file in "\\server\public\LLC Chamomile\Contract\For Delivery" and another looks for it in "\\server\public\Legal\Outgoing\Chamomile", the hierarchy simply does not match how people think associatively. In fact, this file should be under "Legal" AND "Outgoing" AND "Chamomile" AND "Contract", not in any particular order but all at once. And today's file systems cannot provide that.
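In code, the "AND, in no particular order" requirement means storing a set of tags per file and matching a query by set inclusion instead of by path prefix. A minimal sketch with hypothetical file IDs and tags; the folder lookup is included only for contrast.

```python
# Hypothetical tag index: file id -> set of tags (order does not matter).
TAGS = {
    "contract-42": {"Legal", "Outgoing", "Chamomile", "Contract"},
    "offer-7": {"Outgoing", "Chamomile", "Commercial offer"},
}

# The same contract stored the "folder" way, under one fixed path.
PATHS = {"contract-42": r"\\server\public\LLC Chamomile\Contract\For Delivery"}


def find_by_tags(index, *wanted):
    """Return ids of files carrying ALL wanted tags, given in any order."""
    query = set(wanted)
    return sorted(fid for fid, tags in index.items() if query <= tags)


def find_by_folder(paths, prefix):
    """Folder lookup only succeeds if the prefix matches the one chosen when saving."""
    return sorted(fid for fid, p in paths.items() if p.startswith(prefix))


if __name__ == "__main__":
    print(find_by_tags(TAGS, "Legal", "Outgoing", "Chamomile"))
    print(find_by_tags(TAGS, "Chamomile", "Legal"))
    # The second person's guess at the folder finds nothing.
    print(find_by_folder(PATHS, r"\\server\public\Legal\Outgoing\Chamomile"))
```

Both tag queries reach the same contract regardless of the order in which the concepts come to mind, while the folder lookup depends on guessing the exact path the first person picked.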

2.2. Web

There are thousands of them. They meet all the requirements except two:

- Direct access to arbitrary files (Google Docs and MS Live are fine, but what are people supposed to do with AutoCAD and SmetaWizard?)
- Integration (as a consequence)

I.e., access is only partial.

2.3. All-in-one

IBM Domino, MS SharePoint, MS Exchange. Each of these is a "thing in itself" that tries to overcome the shortcomings of existing technologies by its own means.

+ Works
+ Filters
+ Direct Access
+ Multiplayer
+ Internet
- Non-cross-platform
- Not integrated
- Not open source

2.4. Semantic FS

Nepomuk, WinFS, ReFS, etc. With all due respect, I have never seen them in action, so I set them aside together with other exotica.

3. What to do

In short: ~~hang yourself~~ use a web interface to manage files, and hand out links for direct access to them.
With links there is not much room to maneuver. If cross-platform support is a must, there are exactly three options that need no special tricks: http://, ftp:// and file:// (and Windows already stumbles on the last one). File metadata, meanwhile, can be organized any way you like, from simple tags to full-blown semaweb extravaganzas. It is the links that need careful thought.

3.1. HTTP

Read-only. That is, you can hand out an http:// link, and the file can even be downloaded, opened and edited. But saving it back to the same place will not work, at least not with ^S, from any application on any platform.
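This is easy to reproduce with Python's standard `http.server`: the stock `SimpleHTTPRequestHandler` implements GET and HEAD but has no PUT handler, so an attempt to save back gets `501 Not Implemented`. A minimal local sketch; the file name `contract.txt` is made up, and a server can of course be extended to accept writes, which is the direction WebDAV takes.

```python
import functools
import http.client
import tempfile
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class QuietHandler(SimpleHTTPRequestHandler):
    """Stock static-file handler: has do_GET/do_HEAD, but no do_PUT."""

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def request_status(host, port, method, path):
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)  # PUT is sent with Content-Length: 0
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


def try_get_and_put(root):
    """Serve `root` over plain HTTP, then try to read a file and to write it back."""
    handler = functools.partial(QuietHandler, directory=str(root))
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    try:
        return (request_status(host, port, "GET", "/contract.txt"),
                request_status(host, port, "PUT", "/contract.txt"))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "contract.txt").write_text("draft")
        print(try_get_and_put(Path(tmp)))  # (200, 501): reading works, saving does not
```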

3.2. FTP

One could probably combine the web interface with links to files on FTP somehow. But guaranteeing consistency between the metadata database and the FTP storage is quite hard: these are two completely separate services. Hooking into the internals of an FTP server is very difficult, and writing your own FTP server...
Set that aside.
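The consistency problem is that nothing ties a metadata row to a file on the FTP side: either can change without the other noticing, so some reconciliation job would have to run periodically. A minimal sketch of such a job, assuming a hypothetical SQLite table `files(path, comment)` and a storage directory the checker can read; neither is part of the original setup.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def reconcile(db, storage_root):
    """Compare paths known to the metadata DB with files actually on storage.

    Returns (orphaned_rows, untracked_files): rows whose file is gone,
    and files that have no metadata row.
    """
    in_db = {row[0] for row in db.execute("SELECT path FROM files")}
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        for name in filenames:
            full = Path(dirpath, name)
            on_disk.add(full.relative_to(storage_root).as_posix())
    return sorted(in_db - on_disk), sorted(on_disk - in_db)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        Path(root, "contracts").mkdir()
        Path(root, "contracts", "chamomile.doc").write_text("...")
        Path(root, "uploaded_via_ftp.dwg").write_text("...")  # bypassed the web UI

        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, comment TEXT)")
        db.executemany("INSERT INTO files VALUES (?, ?)", [
            ("contracts/chamomile.doc", "delivery contract"),
            ("contracts/deleted_via_ftp.doc", "file removed behind our back"),
        ])
        print(reconcile(db, root))
```

Such a job can only detect drift after the fact; it cannot prevent it, which is the point of the objection above.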

3.3. file: //

A blatant crutch. I.e., you can somehow map a remote resource onto a local one over some network protocol, and it will even work. But the whole construction looks far too shaky.
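One reason this is a crutch: a file:// link names the path as the client machine sees it, so the same shared file gets a different link on every machine, depending on where (and whether) the share is mounted. A minimal sketch, assuming a hypothetical share `\\server\public` that Linux and Mac clients have mounted at made-up local directories.

```python
from urllib.parse import quote


def split_unc(unc):
    r"""Split '\\server\share\dir\file' into ('server', 'share', ['dir', 'file'])."""
    parts = unc.lstrip("\\").split("\\")
    return parts[0], parts[1], parts[2:]


def windows_file_url(unc):
    """Windows clients address the share directly through the server's host name."""
    host, share, rest = split_unc(unc)
    return "file://" + host + "/" + "/".join(quote(p) for p in [share, *rest])


def mounted_file_url(unc, mount_point):
    """Linux/Mac clients only see the share wherever it happens to be mounted."""
    _host, _share, rest = split_unc(unc)
    local_path = mount_point.rstrip("/") + "/" + "/".join(rest)
    return "file://" + quote(local_path)  # quote() leaves '/' unescaped by default


if __name__ == "__main__":
    doc = r"\\server\public\Horns and Hooves\Contract.doc"
    print(windows_file_url(doc))
    print(mounted_file_url(doc, "/mnt/public"))
    print(mounted_file_url(doc, "/Volumes/public"))
```

Three clients, three different links to one file: a web page would have to know how each client mounted the share before it could hand out a working link.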

3.4. WebDAV

This is where it gets interesting. There is no "standard" WebDAV server as such. But all the common OSes/DEs support WebDAV as clients out of the box, in one way or another. So you can write your own WebDAV handling on the server side (not a web server, just the HTTP request processing) and literally work wonders.
In theory, of course...
Update: after a month of wrestling with Windows XP, I can say that its WebDAV support is a hack that exists in name only.
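To give a sense of what "just the HTTP request processing" involves: when a desktop client opens a WebDAV folder it sends PROPFIND, and the server has to answer with a 207 Multi-Status XML body as defined in RFC 4918. A minimal sketch of building that body with the standard library; the entries are hypothetical, only a few live properties are emitted, and a real server must also honor the Depth header and the requested property list, support LOCK, and so on.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate

DAV_NS = "DAV:"
ET.register_namespace("D", DAV_NS)  # serialize elements as <D:...>


def dav(tag):
    return f"{{{DAV_NS}}}{tag}"


def multistatus(entries):
    """Build a PROPFIND 207 Multi-Status body.

    entries: iterable of (href, size_in_bytes_or_None, mtime_epoch);
    a size of None marks a collection (folder).
    """
    root = ET.Element(dav("multistatus"))
    for href, size, mtime in entries:
        response = ET.SubElement(root, dav("response"))
        ET.SubElement(response, dav("href")).text = href
        propstat = ET.SubElement(response, dav("propstat"))
        prop = ET.SubElement(propstat, dav("prop"))
        restype = ET.SubElement(prop, dav("resourcetype"))
        if size is None:
            ET.SubElement(restype, dav("collection"))
        else:
            ET.SubElement(prop, dav("getcontentlength")).text = str(size)
        # getlastmodified uses the HTTP date format (RFC 1123, GMT).
        ET.SubElement(prop, dav("getlastmodified")).text = formatdate(mtime, usegmt=True)
        ET.SubElement(propstat, dav("status")).text = "HTTP/1.1 200 OK"
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    body = multistatus([
        ("/dav/contracts/", None, 0),
        ("/dav/contracts/chamomile.doc", 24576, 0),
    ])
    print(body.decode("utf-8"))
```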

4. And what does Django have to do with it?

For now, the demo of this idea (web UI + WebDAV) is built on Django (more precisely, on a small stack of Apache, mod_dav, Django and a quick-and-dirty program written specifically for this article). And to make sure the attentive reader reads carefully, the link to the demo is somewhere in the text :-)
In particular, it demonstrates managing files via the web (a "comment" field for files) and direct access to files straight from a web page.
Of course, I wanted to get the full power of WebDAV, but it suddenly turned out that of the thousands of Python WebDAV provider implementations, both of them (wsgidav and pywebdav) are quite hard to integrate into your own web application, if it is possible at all, since they are not WebDAV providers but full servers. I had to read the specs and start reinventing the wheel. There is a lot of spec to read, I am working alone and progress is slow, so anyone who wants to join the development is welcome.
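As an illustration of what "a provider rather than a server" could mean: the file-access part written as an ordinary WSGI callable that a web application could mount under a URL prefix. This is a hypothetical, standard-library-only sketch, not wsgidav, not pywebdav and not the author's code. It handles only OPTIONS, GET and PUT on a directory; without PROPFIND, LOCK and the rest, no desktop client would treat it as a WebDAV share.

```python
from pathlib import Path

ALLOWED = "OPTIONS, GET, PUT"


def make_dav_app(root):
    """Return a WSGI app that reads and writes files under `root` (hypothetical sketch)."""
    root = Path(root).resolve()

    def reply(start_response, status, headers=(), body=b""):
        start_response(status, [("Content-Length", str(len(body))), *headers])
        return [body]

    def resolve(environ):
        rel = environ.get("PATH_INFO", "/").lstrip("/")
        target = (root / rel).resolve()
        # Refuse paths that escape the storage root (e.g. '/../etc/passwd').
        if target != root and root not in target.parents:
            return None
        return target

    def app(environ, start_response):
        method = environ["REQUEST_METHOD"]
        if method == "OPTIONS":
            # A real WebDAV provider would also list PROPFIND, LOCK, ... and send a DAV header.
            return reply(start_response, "200 OK", [("Allow", ALLOWED)])
        target = resolve(environ)
        if target is None:
            return reply(start_response, "403 Forbidden")
        if method == "GET":
            if not target.is_file():
                return reply(start_response, "404 Not Found")
            return reply(start_response, "200 OK",
                         [("Content-Type", "application/octet-stream")],
                         target.read_bytes())
        if method == "PUT":
            # The ^S path: the client saves straight back to the same URL.
            if target.is_dir() or not target.parent.is_dir():
                return reply(start_response, "409 Conflict")  # folder target or missing parent
            existed = target.exists()
            length = int(environ.get("CONTENT_LENGTH") or 0)
            target.write_bytes(environ["wsgi.input"].read(length))
            return reply(start_response, "204 No Content" if existed else "201 Created")
        return reply(start_response, "405 Method Not Allowed", [("Allow", ALLOWED)])

    return app
```

Because it is a plain WSGI callable, it can be tried locally with `wsgiref.simple_server.make_server("", 8000, make_dav_app("/srv/files")).serve_forever()` (the directory is a placeholder), or dispatched to from an existing web application for a URL prefix, which is the kind of integration the full servers make hard.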

5. Comments

Linux

Konqueror with webdav:// works fine. The catch is that only KDE applications or applications adapted to KDE (libreoffice-kde) can use it, and not only when running under KDE. I.e., integration here is only partial (users of LibreCAD, JuffEd and other non-KDE applications are left out in the cold).
Epiphany with dav:// is the same story (s/KDE/GNOME/). Although GNOME handles WebDAV worse than KDE does.
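The per-desktop differences already show up in the link a web page has to generate: KDE (KIO, used by Konqueror) expects webdav:// or webdavs://, GNOME (GVfs) expects dav:// or davs://, and other clients get the plain http(s):// URL. A minimal sketch of such a helper; the desktop keys, the fallback and the example host are assumptions, and how a web page would detect the visitor's desktop is left out.

```python
from urllib.parse import urlsplit, urlunsplit

# Scheme used by each desktop's WebDAV client (KDE KIO, GNOME GVfs).
DAV_SCHEMES = {
    "kde": {"http": "webdav", "https": "webdavs"},
    "gnome": {"http": "dav", "https": "davs"},
}


def desktop_link(url, desktop):
    """Rewrite an http(s) WebDAV URL into the scheme a given desktop understands.

    Unknown desktops (e.g. Windows) get the URL unchanged.
    """
    parts = urlsplit(url)
    scheme = DAV_SCHEMES.get(desktop, {}).get(parts.scheme, parts.scheme)
    return urlunsplit(parts._replace(scheme=scheme))


if __name__ == "__main__":
    url = "https://files.example.com/dav/Contracts/Chamomile.doc"
    for desktop in ("kde", "gnome", "windows"):
        print(desktop, desktop_link(url, desktop))
```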

Mac OS

Not tested

Windows

Since obscene language is frowned upon on Habr, the summary will be short: everything is very bad. You could write a book about the "peculiarities" of Microsoft's view of WebDAV, HTTP and XML.
But when the stars align, something somehow works.
Then again, perhaps it is Apache mod_dav that is not very compatible with Windows ;-)

6. Summary

This whole elegant system of crutches and props works, but not brilliantly, on any client OS. I.e., today's OSes and DEs are not quite ready to fully integrate web applications with the desktop (so that it is transparent and always works).
But life on Mars is still there!

Source: https://habr.com/ru/post/148670/
