Developers often have to deal with files that have a tree structure: XML, JSON, YAML, and all sorts of markup languages like Markdown or Org-mode. While they make our lives easier in general, such files tend to grow uncontrollably, at some point turning from a solution into a problem.
The standard solution to this problem is to split the file into smaller ones. This, of course, works, but is not always convenient.
But there is an alternative, which is described below.
Perhaps I should first describe my problem. I use Emacs and, like many Emacs users, I write almost all of my documents, notes, work diary and task lists in the org-mode markup language. A document in this markup looks like this:
    > cat tests/simple.org
    document section
    * headline 1
    headline section 1
    ** inner headline 1
    some inner section 1
    some inner section 1-2
    ** inner headline 2
    inner section 2
    ** inner headline 3
    *** inner inner headline 1
    * headline 2
    section text 2
One of my documents has grown to several hundred subheadings of various nesting depths, and I regularly run scripts over all these headings in search of various information. I didn't want to split the document into several files, because a single file is still easier to move between machines or feed to a script. But it was absolutely impossible to go on living like this.
And then it occurred to me that it would be great to walk through my file as if it were a set of directories, using, say, the standard Unix cd headline1 or cd .. to move around, and ls -l and cat section to look at the contents.
In other words, I wanted to be able to represent the tree of headings and text sections as an ordinary tree of directories and files. In Unix terms, this wish reads: mount some kind of specialized file system.
Of course, writing a full-fledged file system for Linux is a long, thankless job, and certainly not worth it for such a niche case.
Nowadays, though, hardly anyone does that: ten years ago the FUSE module was included in Linux. It lets you implement a file system as an ordinary user process, to which the kernel routes all system calls that concern the mounted file system.
A lot of different file systems have been written with FUSE, from toy file systems that expose, for example, Wikipedia articles, to quite serious parts of the modern Linux desktop such as GNOME. As a result, FUSE has become a standard component of popular distributions.
What makes working with FUSE even more pleasant is that simple wrappers are now available for high-level languages like Python, Ruby, Java and many others, so your own file system can be written in just two or three hours.
For Python specifically, there are even several wrappers around libfuse (the client part of FUSE), but I liked the fusepy project most of all: its code is very simple and easy to follow, and I needed nothing beyond the examples on GitHub and the source code itself.
A fusepy-based file system comes down to overriding the methods of the fuse.Operations class, each of which corresponds to a system call.
System calls that are not overridden either get reasonable default behavior or return a standard error.
Actually, the specific file format that you want to represent as a tree of directories and files is not that important. In the case of the org-mode markup, I did not like any of the available parsers for Python, so I just wrote my own. The parser walks through the given file and builds a tree that reflects the structure of the document.
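The author's parser is not reproduced in the article. Purely as an illustration, a minimal parser of this kind might be sketched as follows. The names OrgNode and parse_org are hypothetical, and the sketch assumes that only headlines (lines starting with asterisks followed by a space) and plain text matter; real org-mode has many more constructs.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrgNode:
    """Hypothetical parse-tree node: one headline with its text and sub-headlines."""
    title: str                      # headline text ("" for the document root)
    level: int                      # number of leading asterisks (0 for the root)
    text: List[str] = field(default_factory=list)
    children: List["OrgNode"] = field(default_factory=list)


# One or more asterisks, at least one whitespace character, then the title.
HEADLINE = re.compile(r"^(\*+)\s+(.*)$")


def parse_org(lines):
    """Build a tree of headlines from an iterable of org-mode lines."""
    root = OrgNode(title="", level=0)
    stack = [root]                  # path from the root to the open headline
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADLINE.match(line)
        if match is None:
            # Plain text belongs to the most recently opened headline.
            stack[-1].text.append(line)
            continue
        level = len(match.group(1))
        # Close every open headline at the same depth or deeper.
        while stack[-1].level >= level:
            stack.pop()
        node = OrgNode(title=match.group(2).strip(), level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


SAMPLE = """document section
* headline 1
headline section 1
** inner headline 1
some inner section 1
** inner headline 2
* headline 2
section text 2
"""

root = parse_org(SAMPLE.splitlines())
print([child.title for child in root.children])
```

The stack holds the chain of currently open headlines, so a new headline is always attached to the nearest open headline with a smaller level.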
The parse tree of the markup file is then converted into another tree, reflecting the files and directories that the file system user will see.
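The conversion step is not shown in the article either. One way it could work, as a sketch under stated assumptions: every headline becomes a directory, and its text, if any, becomes a file named section inside it. FsNode and build_fs are hypothetical names. The method names find_path and get_attrs and the attributes content and children are taken from how the tree is used in the FUSE code, but their bodies here are guesses. The input only needs title, text and children attributes, so a SimpleNamespace stands in for a parsed headline.

```python
import stat
import time
from types import SimpleNamespace as Headline  # stand-in for a parsed node


class FsNode:
    """A file (content is bytes) or a directory (children maps name -> FsNode)."""

    def __init__(self, content=None):
        self.content = content      # bytes for files, None for directories
        self.children = {}          # used by directories only
        self.ctime = time.time()

    @property
    def is_dir(self):
        return self.content is None

    def find_path(self, path):
        """Resolve an absolute path like '/headline 1/section'; None if absent."""
        node = self
        for part in path.split("/"):
            if not part:            # skip the leading slash and doubled slashes
                continue
            if not node.is_dir or part not in node.children:
                return None
            node = node.children[part]
        return node

    def get_attrs(self):
        """Return read-only stat fields in the dict form that getattr returns."""
        if self.is_dir:
            mode, size, nlink = stat.S_IFDIR | 0o555, 0, 2
        else:
            mode, size, nlink = stat.S_IFREG | 0o444, len(self.content), 1
        return dict(st_mode=mode, st_size=size, st_nlink=nlink,
                    st_ctime=self.ctime, st_mtime=self.ctime, st_atime=self.ctime)


def build_fs(headline):
    """Convert a headline (title, text, children) into a directory node."""
    directory = FsNode()
    if headline.text:
        body = "\n".join(headline.text) + "\n"
        directory.children["section"] = FsNode(body.encode("utf-8"))
    for child in headline.children:
        # Assumption: sibling titles are unique; '/' cannot appear in a file name.
        directory.children[child.title.replace("/", "_")] = build_fs(child)
    return directory


inner = Headline(title="inner headline 1", text=["some inner section 1"], children=[])
top = Headline(title="headline 1", text=[], children=[inner])
fs = build_fs(Headline(title="", text=["document section"], children=[top]))
print(sorted(fs.children))
```

Keeping children as a dictionary keyed by name means that iterating over it yields exactly the entry names a directory listing needs.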
To work with this second tree, it was enough to implement four system calls (open, read, readdir, getattr), each of which took literally a few lines of Python code:
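    from errno import EIO, ENOENT

    from fuse import FuseOSError, Operations


    class FuseOperations(Operations):
        """Read-only view of the file and directory tree built from the document."""

        def __init__(self, tree):
            self.tree = tree
            self.fd = 0

        def open(self, path, flags):
            # Hand out a fresh descriptor number; no per-file state is kept.
            self.fd += 1
            return self.fd

        def read(self, path, size, offset, fh):
            node = self.tree.find_path(path)
            if node is None:
                raise FuseOSError(EIO)
            # Return only the requested window of the file content.
            return node.content[offset:offset + size]

        def readdir(self, path, fh):
            node = self.tree.find_path(path)
            if node is None:
                # A missing path is reported as "no such file or directory".
                raise FuseOSError(ENOENT)
            return ['.', '..'] + list(node.children)

        def getattr(self, path, fh=None):
            node = self.tree.find_path(path)
            if node is None:
                raise FuseOSError(ENOENT)
            return node.get_attrs()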
The final script works like this:
    > mkdir mount
    > python orgfuse.py tests/simple.org mount/

    > tree mount
    mount/
    ├── headline 1
    │   ├── inner headline 1
    │   │   └── section
    │   ├── inner headline 2
    │   │   └── section
    │   ├── inner headline 3
    │   │   └── inner inner headline 1
    │   └── section
    ├── headline 2
    │   └── section
    └── section

    6 directories, 5 files
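The article does not show how orgfuse.py wires everything together. A plausible entry point, assuming fusepy's FUSE class and the two positional arguments seen in the session above, might be sketched like this. The helper names parse_args and mount are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse 'orgfuse.py ORGFILE MOUNTPOINT' style arguments."""
    parser = argparse.ArgumentParser(
        description="Mount an org-mode file as a tree of directories and files")
    parser.add_argument("orgfile", help="path to the org-mode document")
    parser.add_argument("mountpoint", help="existing empty directory to mount on")
    return parser.parse_args(argv)


def mount(operations, mountpoint):
    """Hand an Operations object to fusepy and block until the unmount."""
    # Imported lazily so that argument parsing works without libfuse installed.
    from fuse import FUSE
    # foreground=True keeps the process attached to the terminal;
    # ro=True requests a read-only mount, matching the read-only operations.
    FUSE(operations, mountpoint, foreground=True, ro=True)
```

With a foreground mount like this, the tree command would be run from a second terminal, and the file system can be released with fusermount -u mount.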
This whole miracle takes about two hundred lines of code, or 3-4 hours of my lazy evening work, and copes with my small task remarkably well.
Installation instructions and code, as usual, can be found on GitHub.
If anyone is interested in turning the prototype into something more polished, with the ability to edit files and support for more formats, I will be glad to talk.
Source: https://habr.com/ru/post/315654/