
How I found a bug in GNU Tar

The author of the article is Chris Siebenmann, a Unix system administrator at the University of Toronto.

From time to time, something odd happens in my work that makes me stop and think, even if it is not immediately clear what conclusions to draw. I recently mentioned that we found a bug in GNU Tar, and the story of how that happened is one such case.

We use Amanda and GNU Tar to back up our file servers. For a long time we had a rare, intermittent problem: while backing up the file system that holds /var/mail, tar would go crazy and produce a huge amount of output. Usually it ran on without end and we had to kill the dump; in other cases it did finish, producing a terabyte or more of data that apparently compressed extremely well. The last time I ended up with one of these giant tar files, I examined it and found that part of it was nothing but zero bytes, which tar -t did not like at all; after that stretch, the archive went back to normal.

(This made me curious whether zero bytes turn up naturally in people's mailboxes. It turns out that finding zero bytes in text files is not that simple, and yes, they are there.)
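One way to look for zero bytes is to read the file in binary mode and count them directly. The following is a minimal sketch, not the method I actually used; the function name `count_nul_bytes` and the chunk size are made up for illustration.

```python
def count_nul_bytes(path, chunk_size=1 << 20):
    """Return how many zero (NUL) bytes the file at `path` contains.

    Hypothetical helper for illustration. The file is read in binary mode
    and in chunks, so large mailboxes do not have to fit in memory.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty result means end of file
                break
            total += chunk.count(b"\0")
    return total
```

Reading in binary mode matters here, because text-oriented tools tend to treat zero bytes specially or give up on the file as "binary".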

We recently moved the /var/mail file system to new Linux file servers running Ubuntu 18.04, and so switched to a more recent and more mainstream version of GNU Tar than the one on our OmniOS machines. We hoped this would make the problem go away, but the same thing happened almost immediately. This time GNU Tar was running on an Ubuntu machine, where I know all the available debugging tools well, so I examined the running tar process. Tracing its system calls showed tar making an endless stream of read() calls that each returned 0 bytes:

    read(6, "", 512) = 0
    read(6, "", 512) = 0
    [...]
    read(6, "", 512) = 0
    write(1, "\0\0\0\0\0"..., 10240) = 10240
    read(6, "", 512) = 0
    [...]

lsof said that file descriptor 6 is someone's mailbox.
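On Linux, the same question (which file is behind a given descriptor of a running process) can also be answered through /proc. This is only a sketch of that idea, assuming a Linux system with /proc mounted and sufficient permissions; the helper name `fd_target` is hypothetical.

```python
import os

def fd_target(pid, fd):
    """Return the path that file descriptor `fd` of process `pid` refers to.

    Hypothetical helper, Linux only: /proc/<pid>/fd/<fd> is a symbolic link
    to the open file, so reading the link gives its path.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

For the tar process above, this would be called with tar's process ID and descriptor 6.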

Using apt-get source tar, I fetched the source code and went looking for read() system calls that do not check for end of file. After digging through several levels of indirection, I found an obvious place where that check appears to be missing: the function sparse_dump_region in sparse.c. And then I remembered something.

A few months ago we ran into an NFS problem with Alpine. While working on that bug, I traced the Alpine process and noticed, among other things, that it uses ftruncate() to resize mailboxes; sometimes it extends them, temporarily creating a sparse section of the file until it fills it in, and possibly it sometimes shrinks them too. That seemed to fit the current situation: sparse regions are involved, and shrinking a file with ftruncate() would create a situation where tar hits end of file unexpectedly.

(This even explains why tar sometimes recovers: if new mail later arrives in the mailbox, the file grows back to the expected size and tar no longer runs into an unexpected end of file.)
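The effect of ftruncate() in both directions can be seen with a small experiment. This is a sketch on a temporary file, not a trace of what Alpine does; the function name `grow_then_shrink` and all sizes are invented for illustration.

```python
import os
import tempfile

def grow_then_shrink(initial=b"mail data", grown_size=4096, shrunk_size=4):
    """Extend a file with ftruncate(), then shrink it, and report what a
    reader sees at each stage. Hypothetical demo on a temporary file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, initial)

        # Growing: the new tail has no data written to it and reads as zeros.
        os.ftruncate(fd, grown_size)
        size_after_grow = os.fstat(fd).st_size
        os.lseek(fd, len(initial), os.SEEK_SET)
        tail = os.read(fd, grown_size)

        # Shrinking: a reader positioned at or past the new end gets 0 bytes,
        # which is how read() reports end of file.
        os.ftruncate(fd, shrunk_size)
        size_after_shrink = os.fstat(fd).st_size
        os.lseek(fd, shrunk_size, os.SEEK_SET)
        at_eof = os.read(fd, 512)

        return size_after_grow, tail, size_after_shrink, at_eof
    finally:
        os.close(fd)
        os.unlink(path)
```

A program that recorded the file size before the shrink and keeps reading toward that size will see exactly the kind of zero-length reads shown in the trace.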

I fiddled around with GDB, using the Ubuntu debugging symbols and the tar source code I had fetched, and was able to reproduce the bug, although it turned out somewhat different from my original theory. It turns out that sparse_dump_region does not dump the sparse regions of a file; it dumps the non-sparse ones (well, of course), and it is used for all files, sparse or not, whenever you run tar with the --sparse argument. So the actual bug is that if you run GNU Tar with --sparse and a file shrinks while tar is reading it, tar does not properly handle hitting end of file earlier than expected. If the file grows again, tar recovers.

(Except when the file is sparse only at the end and shrinks only within that region. In that case everything is fine.)
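The failure mode can be modeled with a loop that copies a fixed number of bytes in 512-byte blocks. This is a simplified Python sketch of the general pattern, not GNU Tar's actual code; `dump_region`, `check_eof` and `max_reads` are hypothetical names, and the `max_reads` guard exists only so the demonstration terminates.

```python
import io

BLOCK = 512  # block size matching the 512-byte reads in the trace

def dump_region(src, expected, out, check_eof=True, max_reads=1000):
    """Copy `expected` bytes from `src` to `out` in zero-padded blocks.

    Returns the number of read calls made. With check_eof=False, a source
    that ends early makes no progress: every read returns 0 bytes, a block
    of zeros is written, and the remaining count never decreases.
    """
    left = expected
    reads = 0
    while left > 0:
        if reads >= max_reads:
            # Stand-in for "runs forever" so the sketch can be tested.
            raise RuntimeError("still looping after %d reads" % reads)
        data = src.read(min(BLOCK, left))
        reads += 1
        if not data and check_eof:
            # The missing check: 0 bytes means the file ended early.
            raise EOFError("file shrank: %d bytes missing" % left)
        out.write(data.ljust(BLOCK, b"\0"))  # pad short blocks with zeros
        left -= len(data)
    return reads
```

Calling `dump_region(io.BytesIO(b"x" * 100), 700, io.BytesIO(), check_eof=False)` models a file that shrank from 700 to 100 bytes after its size was recorded: the loop keeps emitting zero blocks until the guard stops it, which matches both the endless read() calls and the zero-filled output.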

It occurs to me that I could have investigated all of this years ago on our OmniOS file servers. OmniOS has ways to trace a program's system calls and equivalents of lsof; I could have found and read the source code of our version of GNU Tar and run it under the OmniOS debugger (GDB is not installed there), and so on. But I did not. Instead, we shrugged and moved on. It took moving the file system to Ubuntu for me to lift a finger and track the problem down.

(It is not just about tools and environments; we also automatically assumed that OmniOS had some old, unsupported version of GNU Tar that was not worth investigating, because the problem had surely been fixed in a newer version.)

PS: As a quick fix, we will probably just stop Amanda from using tar's --sparse option for backups. Mailboxes should not be sparse, and even if one is, we compress the file system backups anyway, so all those zero bytes will compress well.

PPS: I have not tried to report the bug to the GNU Tar developers, because I only found it on Friday and the university is now on its winter break. Feel free to do so before I get to it.

Source: https://habr.com/ru/post/434624/

