
I would like to devote this article to a review of the APIs that different operating systems provide for monitoring changes in a directory. The article grew out of my work on the change-tracking daemons for the dklab_realsync utility (article on Habr, GitHub repository) and for a project of my own, which I do not want to announce.
Windows ReadDirectoryChangesW
For the Windows operating system, there is a great function, ReadDirectoryChangesW, which returns a set of changes for a directory and takes a flag for working recursively (bWatchSubtree). Tracking changes in a directory is therefore not difficult: the implementation in dklab_realsync takes 80 lines of code, or 3.5 KB. Interestingly, on Windows these events are delivered even over SMB!
However, there are certain pitfalls:
- The change buffer has a finite size; once it is full, the event queue overflows and the events are lost.
- According to the watchdog package documentation, the move event is sent before the changes are visible in the file system.
- The buffer size is limited to 64 KB for network file systems.
Conclusion: ReadDirectoryChangesW makes it easy to learn about all file events, but the event queue may overflow, and then you need to perform a full scan of the file system. Events may also be delivered before the change they describe is visible in the file system.
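The full scan that an overflow forces can be sketched without touching the Windows API at all. The following is a minimal Python sketch, not the dklab_realsync implementation; the function names `snapshot` and `diff_snapshots` are hypothetical. It assumes that comparing file size and modification time is enough to notice a change.

```python
import os


def snapshot(root):
    """Map each file path relative to root to its (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # the file vanished between listing and stat
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(old, new):
    """Return sorted lists of created, deleted and modified paths."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, deleted, modified
```

A daemon would keep the last snapshot in memory and, when the queue overflows, take a new one and treat the difference as the events it missed.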
Mac OS X, FSEvents
Mac OS X also has a convenient and simple API for tracking changes in the file system, called FSEvents. Using this API, the simplest daemon implementation is 50 lines of code, or 1.8 KB. The queue cannot overflow (!), but a full scan may still be required if the fseventsd daemon crashes. Note that before version 10.7 this API does not report changes per file; it only reports the directories in which something has changed. Since events are not discarded but written to a log (the "FSEvents service stores events in a persistent, per-volume database"), recording them at directory granularity saves disk space.
Conclusion: FSEvents on Mac OS X is the most unusual of all these APIs. The queue does not overflow, and you can even receive events from the past. However, before version 10.7 events are only as detailed as the directory, which makes the daemon less efficient for file synchronization.
Linux inotify
In the vanilla Linux kernel there is one way to monitor changes in a directory: inotify. This API has good and detailed documentation, but it has no support for recursive change tracking! inotify also limits the maximum number of objects that can be monitored.
The simplest daemon implementation takes 250 lines of code, or 8 KB; a static build using dietlibc takes about 14 KB. Another unpleasant point is that the application itself must maintain the correspondence between each watch descriptor (in our case it always refers to a directory) and its path. There is a function inotify_add_watch, which takes the path to the monitored directory, but there is no inverse, something like inotify_get_path, that would return the path for a given descriptor. Events, meanwhile, contain only a watch descriptor and the relative path of the changed file within that directory.
Pitfalls of recursive directory tracking via inotify:
- The possibility of queue overflow (the queue length is set in /proc/sys/fs/inotify/max_queued_events)
- A limit on the maximum number of watched objects (set in /proc/sys/fs/inotify/max_user_watches)
- No recursive directory tracking
- The need to handle directory creation separately (for example, mkdir -p a/b/c). You will receive an event saying that the directory "a" has been created, but by the time you attach a watch to it, another directory may already have been created inside it, and you will never receive an event for that one.
- The theoretical possibility of the watch descriptor (wd) overflowing, since it is a 32-bit integer
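One common way to close the directory-creation race is to add the watch first and only then scan the new directory for anything that appeared before the watch existed. The sketch below is in Python and is only an illustration under that assumption. `add_watch` and `on_missed` are hypothetical callbacks supplied by the application (the first would wrap inotify_add_watch), and `watch_new_directory` is a hypothetical name.

```python
import os


def watch_new_directory(path, add_watch, on_missed):
    """Watch a freshly created directory and catch up on what it already holds.

    add_watch(path) registers the directory with the notification API;
    on_missed(path) is called for every entry found by the catch-up scan.
    """
    # Register the watch first, so anything created from now on produces
    # a regular event; then scan for entries created before that.
    add_watch(path)
    try:
        entries = list(os.scandir(path))
    except OSError:
        return  # the directory was removed again in the meantime
    for entry in entries:
        on_missed(entry.path)
        if entry.is_dir(follow_symlinks=False):
            watch_new_directory(entry.path, add_watch, on_missed)
```

An entry created between the add_watch call and the scan may be reported twice, once by the scan and once by a regular event, so the consumer has to tolerate duplicates.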
FreeBSD, Mac OS X, kqueue
FreeBSD and Mac OS X let you track changes using kqueue, which is similar to inotify in its characteristics and also cannot track directories recursively. In addition, kqueue takes open file (directory) descriptors as arguments, so with this API the limit on the number of tracked directories is even stricter.
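Because every watched directory costs one open descriptor, it can be worth checking a tree against the process descriptor limit before watching it. The following Python sketch does only that check and does not call kqueue itself. The function names and the `reserve` parameter are hypothetical, and it assumes one descriptor per directory.

```python
import os
import resource


def count_directories(root):
    """Count root itself plus every directory below it."""
    return sum(1 for _ in os.walk(root))


def fits_in_descriptor_limit(root, reserve=16):
    """Check whether one open descriptor per directory stays under RLIMIT_NOFILE.

    reserve is an allowance for descriptors the process needs for other
    purposes (stdio, sockets, the kqueue itself).
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return count_directories(root) + reserve <= soft
```

If the check fails, the daemon can try raising the soft limit with resource.setrlimit up to the hard limit, or fall back to periodic scanning for part of the tree.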
Summary:
| Mechanism | Queue overflow | Recursive? | Max. objects | Granularity |
|---|---|---|---|---|
| ReadDirectoryChangesW | Yes | Yes | - | file |
| FSEvents | No | Yes | - | file (10.7+) |
| inotify | Yes | No | 8192 | file |
| kqueue | Yes | No | 1024 | file |
As you can see, all of these APIs have their advantages and disadvantages. The least convenient mechanisms are kqueue and inotify, but they are also the most efficient and reliable. Commercial operating systems provide more convenient mechanisms for tracking changes, but those have quirks of their own. I hope you now have a better idea of how hard life is for Dropbox and similar programs, which have to cope with all of this and still implement reliable and efficient data synchronization :).
* Picture taken from www.alexblogger.com/2008_01_01_archive.html