On September 14, GitLab support reported a critical problem at one of our customers: GitLab would work fine for a while, and then users would suddenly hit an error. When they tried to clone certain repositories over Git, they saw a cryptic message about a stale file: Stale file handle. The error persisted for a long time and blocked all work until a system administrator manually ran ls in the affected directory.
I had to study the internal mechanisms of Git and the NFS network file system. In the end we found a bug in the Linux NFS v4.0 client, Trond Myklebust wrote a kernel patch for it, and since October 26 that patch has been included in the mainline Linux kernel.
In this post I will describe how we investigated the problem, which directions we considered, and which tools we used to track down the bug. We were inspired by the excellent detective work in Oleg Dashevsky's post “How I have been hunting for a memory leak in Ruby for two weeks”.
This is also a great example of how debugging open source software is a team sport involving many people, companies, and countries. The GitLab motto “everyone can contribute” applies not only to GitLab itself but to other open source projects as well, such as the Linux kernel.
We used NFS on GitLab.com for many years, but eventually stopped using it to access repository data on the application machines, having moved all Git calls to Gitaly. We still support NFS for customers who manage their own GitLab installations, but we had never seen a problem like this customer's before.
The customer gave us some helpful hints:

1. The full error text: fatal: Couldn't read ./packed-refs: Stale file handle.
2. The error seemed to start after they manually ran a git gc command.
3. The error went away after an administrator ran the ls utility in the directory.
4. The error also went away when the git gc process terminated.

Clearly the first two points are related. When you push to a Git branch, Git creates a loose ref: a file, named after the branch, that points to a commit. For example, pushing to master creates a file named refs/heads/master in the repository:
$ cat refs/heads/master
2e33a554576d06d9e71bfd6814ee9ba3a7838963
The git gc command performs several tasks. Among other things, it collects these loose refs and packs them into a single file called packed-refs. This speeds things up a bit, because reading one large file is cheaper than reading many small ones. After running git gc, the packed-refs file might look something like this:
# pack-refs with: peeled fully-peeled sorted
564c3424d6f9175cf5f2d522e10d20d781511bf1 refs/heads/10-8-stable
edb037cbc85225261e8ede5455be4aad771ba3bb refs/heads/11-0-stable
94b9323033693af247128c8648023fe5b53e80f9 refs/heads/11-1-stable
2e33a554576d06d9e71bfd6814ee9ba3a7838963 refs/heads/master
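Git resolves a ref by checking the loose file first and falling back to packed-refs. To make that lookup order concrete, here is a minimal Ruby sketch; resolve_ref is a hypothetical helper written for illustration, not Git's actual implementation.

```ruby
# Hypothetical helper for illustration; not Git's actual code.
# Resolves a ref name to a SHA: loose file first, then packed-refs.
def resolve_ref(git_dir, name)
  loose = File.join(git_dir, name)
  return File.read(loose).strip if File.exist?(loose)

  packed = File.join(git_dir, 'packed-refs')
  return nil unless File.exist?(packed)

  File.foreach(packed) do |line|
    next if line.start_with?('#', '^') # skip the header and peeled-tag lines
    sha, ref = line.split(' ', 2)
    return sha if ref&.strip == name
  end
  nil
end
```

Because the loose file wins, a branch updated after the last git gc still resolves to its newest commit.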
How is the packed-refs file created? To find out, we ran strace git gc in a repository that had a loose ref. Here are the relevant lines:
28705 open("/tmp/libgit2/.git/packed-refs.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
28705 open(".git/packed-refs", O_RDONLY) = 3
28705 open("/tmp/libgit2/.git/packed-refs.new", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 4
28705 rename("/tmp/libgit2/.git/packed-refs.new", "/tmp/libgit2/.git/packed-refs") = 0
28705 unlink("/tmp/libgit2/.git/packed-refs.lock") = 0
The system calls showed that the git gc command:

1. Opens packed-refs.lock. This tells other processes that packed-refs is locked and must not be changed.
2. Reads the existing packed-refs.
3. Opens packed-refs.new and writes the packed refs into it.
4. Renames packed-refs.new to packed-refs.
5. Removes packed-refs.lock.

The key point here is step 4, the rename, where Git puts the new packed-refs file in place. Besides collecting loose refs, git gc also performs a far more resource-intensive task: it searches for and removes unused objects. In large repositories this can take more than an hour.
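The lock-write-rename sequence that git gc uses for packed-refs can be sketched in Ruby as follows; pack_refs is a hypothetical helper mirroring the strace output, with error handling omitted.

```ruby
require 'fileutils'

# Hypothetical sketch of the lock-write-rename pattern seen in the strace
# output; not git gc's actual implementation.
def pack_refs(dir, refs)
  lock_path = File.join(dir, 'packed-refs.lock')
  new_path  = File.join(dir, 'packed-refs.new')

  # O_CREAT|O_EXCL fails if another process already holds the lock.
  File.open(lock_path, File::RDWR | File::CREAT | File::EXCL) do
    File.open(new_path, File::RDWR | File::CREAT | File::EXCL) do |f|
      f.puts '# pack-refs with: peeled fully-peeled sorted'
      refs.sort.each { |name, sha| f.puts "#{sha} #{name}" }
    end
    # The rename is atomic: readers see either the old file or the new one,
    # never a half-written file.
    File.rename(new_path, File.join(dir, 'packed-refs'))
  end
ensure
  FileUtils.rm_f(lock_path)
end
```

It is exactly this rename over a file that may be open elsewhere that NFS handles differently from a local disk.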
That made us wonder: in a large repository, does git gc keep the packed-refs file open for the whole cleanup? We studied the strace logs, ran the lsof utility, and here is what we learned about the git gc process:
As you can see, the packed-refs file is closed at the very end, after the potentially long “Garbage collect objects” phase.
So the next question arose: how does NFS behave when the packed-refs file is open on one node while another node renames it?
“For scientific purposes,” we asked the customer to run an experiment on two different machines (Alice and Bob):
1) In a shared NFS volume, create two files, test1.txt and test2.txt, with different contents so they are easy to tell apart:
alice $ echo "1 - Old file" > /path/to/nfs/test1.txt
alice $ echo "2 - New file" > /path/to/nfs/test2.txt
2) On the Alice machine, open the test1.txt file and keep it open:

alice $ irb
irb(main):001:0> File.open('/path/to/nfs/test1.txt')
3) On the Alice machine, continuously display the contents of test1.txt:
alice $ while true; do cat test1.txt; done
4) Then, on the Bob machine, execute the command:
bob $ mv -f test2.txt test1.txt
The last step reproduces what git gc does with the packed-refs file when it overwrites the existing file.
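For comparison, on a local POSIX filesystem the semantics are unambiguous: a descriptor opened before the rename keeps reading the old inode, while a fresh open sees the new content. A small Ruby sketch of the same Alice-and-Bob scenario, run on local disk rather than NFS:

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  old_path = File.join(dir, 'test1.txt')
  new_path = File.join(dir, 'test2.txt')
  File.write(old_path, "1 - Old file\n")
  File.write(new_path, "2 - New file\n")

  f = File.open(old_path)        # open before the rename, like Alice's irb session
  File.rename(new_path, old_path) # like Bob's `mv -f test2.txt test1.txt`

  f.read              # => "1 - Old file\n" (the old inode, still open)
  File.read(old_path) # => "2 - New file\n" (a fresh open sees the new inode)
  f.close
end
```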
On the customer's machines, the result looked like this:
1 - Old file
1 - Old file
1 - Old file
cat: test1.txt: Stale file handle
There it is! We had reproduced the problem in a controlled way. But in the same experiment against a Linux NFS server the problem did not arise; after the rename, the new content was picked up as expected:
1 - Old file
1 - Old file
1 - Old file
2 - New file <--- RENAME HAPPENED
2 - New file
2 - New file
Where does this difference in behavior come from? It turned out the customer used Isilon NFS storage, which supports only NFS v4.0. When we switched our Linux NFS server connection to v4.0 with the vers=4.0 mount option in /etc/fstab, the test showed yet another result:
1 - Old file
1 - Old file
1 - Old file
1 - Old file <--- RENAME HAPPENED
1 - Old file
1 - Old file
Instead of a Stale file handle error, the Linux NFS v4.0 server showed stale content. It turns out this difference in behavior is allowed by the NFS specification. From RFC 3010:

“A file handle may or may not become stale or expire on a rename. However, server implementers are strongly encouraged to take steps so that file handles do not become stale or expire in this way.”
In other words, NFS servers may choose how to behave when a file is renamed, and the Isilon NFS server was within its rights to return Stale file handle in such cases. We assumed the root cause was the same in both setups, even though the symptoms differed. We suspected cache validation was involved, because running the ls utility in the directory cleared the error. Now we had a reproducible test case, and we turned to the experts: the Linux NFS maintainers.
Once we could reproduce the error step by step, I wrote to the Linux NFS maintainers describing what we had learned. For a week I corresponded with Bruce Fields, the maintainer of the Linux NFS server, who suggested examining the network traffic to pinpoint the bug. He thought the problem might be in the NFS server's delegation handling.
In a nutshell, NFS v4 introduced delegations to speed up file access. A server can delegate read or write access to a client so that the client does not have to keep asking the server whether another client has changed the file. Put simply, a write delegation is like lending someone your notebook and saying: “Write in it for now; I'll take it back when I need it.” The other person does not have to ask for the notebook every time they want to write something down; they have complete freedom until it is taken away. In NFS, asking for the notebook back is called recalling the delegation.
A bug in NFS delegation recall could explain the Stale file handle problem. Remember that in our experiment Alice opened test1.txt, and Bob then replaced it by renaming test2.txt over it. Perhaps the server failed to recall the delegation for test1.txt, leaving behind invalid state. To test this theory, we recorded the NFS traffic with the tcpdump utility and visualized it with Wireshark.
Wireshark is a great open source tool for analyzing network traffic, especially for exploring NFS in action. We recorded the trace using the following command on the NFS server:
tcpdump -s 0 -w /tmp/nfs.pcap port 2049
This command records all NFS traffic, which usually passes over TCP port 2049. Since our experiment succeeded with NFS v4.1 but failed with NFS v4.0, we could compare NFS behavior in the working and broken cases. In Wireshark we saw the following:
In this diagram, you can see that in step 1 Alice opens test1.txt and gets back an NFS file handle with stateid 0x3000. When Bob tries to rename the file, the NFS server tells him to retry by sending an NFS4ERR_DELAY message, while it recalls Alice's delegation via a CB_RECALL message (step 3). Alice returns her delegation (DELEGRETURN, step 4), and Bob sends the RENAME again (step 5). The RENAME completes in both cases, but Alice continues to read the file using the same handle.
The difference shows up in step 6. In NFS v4.0 (the stale-file case), Alice keeps using the same stateid. In NFS v4.1 (the working case), Alice performs additional LOOKUP and OPEN operations, so the server returns a different stateid. In v4.0 she never sends these extra messages. That explains why Alice sees stale content: she is using the old handle.
Why does Alice perform the extra LOOKUP in one case but not the other? The delegation recall had apparently succeeded, so some other problem must have remained, for example a missed cache-invalidation step. To rule out delegations entirely, we disabled them on the NFS server with this command:
echo 0 > /proc/sys/fs/leases-enable
We repeated the experiment, but the problem did not disappear. That convinced us the fault lay neither with the NFS server nor with delegations, and we decided to look at the NFS client in the kernel.
The first question the Linux NFS maintainers asked us was whether the problem still occurred on the latest kernel. It had shown up on CentOS 7.2 and Ubuntu 16.04 with kernels 3.10.0-862.11.6 and 4.4.0-130 respectively, but both kernels lagged behind the latest version, which at the time was 4.19-rc2.
We deployed a fresh Ubuntu 16.04 virtual machine on the Google Cloud Platform (GCP), cloned the latest Linux kernel, and set up a kernel development environment. We created a .config file with menuconfig and checked that:

1. The NFS driver was built as a kernel module (CONFIG_NFSD=m).
2. The right GCP options were set so the kernel would boot properly.

Much as geneticists use fast-breeding fruit flies to watch evolution in real time, the first item let us make quick changes to the NFS module without rebooting the whole kernel. The second guaranteed the kernel would start after installation. Fortunately, the default kernel parameters suited us.
We confirmed that the stale file problem was still present in the latest kernel, and we asked ourselves: where exactly does the error arise, and why does it not occur with NFS v4.1?
To answer these questions, we dug into the NFS source code. We did not have a kernel debugger, so we sprinkled two kinds of calls through the code:

1. pr_info() (see printk) to log messages.
2. dump_stack(), which prints the stack trace of the current function call.

For example, we started by instrumenting the nfs4_file_open() function in fs/nfs/nfs4file.c:
static int nfs4_file_open(struct inode *inode, struct file *filp)
{
        ...
        pr_info("nfs4_file_open start\n");
        dump_stack();
Of course, we could have enabled dprintk messages with Linux dynamic debugging or used rpcdebug, but we wanted to add our own messages so we could verify that our changes were actually taking effect.
After each change, we recompiled the module and reloaded it into the kernel with these commands:
make modules
sudo umount /mnt/nfs-test
sudo rmmod nfsv4
sudo rmmod nfs
sudo insmod fs/nfs/nfs.ko
sudo mount -a
With our instrumented NFS module loaded, we could repeat the experiments and get messages that helped us understand the NFS code. For example, you can see exactly what happens when an application calls open():
Sep 24 20:20:38 test-kernel kernel: [ 1145.233460] Call Trace:
Sep 24 20:20:38 test-kernel kernel: [ 1145.233462] dump_stack+0x8e/0xd5
Sep 24 20:20:38 test-kernel kernel: [ 1145.233480] nfs4_file_open+0x56/0x2a0 [nfsv4]
Sep 24 20:20:38 test-kernel kernel: [ 1145.233488] ? nfs42_clone_file_range+0x1c0/0x1c0 [nfsv4]
Sep 24 20:20:38 test-kernel kernel: [ 1145.233490] do_dentry_open+0x1f6/0x360
Sep 24 20:20:38 test-kernel kernel: [ 1145.233492] vfs_open+0x2f/0x40
Sep 24 20:20:38 test-kernel kernel: [ 1145.233493] path_openat+0x2e8/0x1690
Sep 24 20:20:38 test-kernel kernel: [ 1145.233496] ? mem_cgroup_try_charge+0x8b/0x190
Sep 24 20:20:38 test-kernel kernel: [ 1145.233497] do_filp_open+0x9b/0x110
Sep 24 20:20:38 test-kernel kernel: [ 1145.233499] ? __check_object_size+0xb8/0x1b0
Sep 24 20:20:38 test-kernel kernel: [ 1145.233501] ? __alloc_fd+0x46/0x170
Sep 24 20:20:38 test-kernel kernel: [ 1145.233503] do_sys_open+0x1ba/0x250
Sep 24 20:20:38 test-kernel kernel: [ 1145.233505] ? do_sys_open+0x1ba/0x250
Sep 24 20:20:38 test-kernel kernel: [ 1145.233507] __x64_sys_openat+0x20/0x30
Sep 24 20:20:38 test-kernel kernel: [ 1145.233508] do_syscall_64+0x65/0x130
What are do_dentry_open and vfs_open? Linux has a virtual filesystem (VFS), an abstraction layer that provides a common interface for all file systems. The VFS documentation says:

“VFS implements the open(2), stat(2), chmod(2), and similar system calls. VFS uses the pathname argument passed to them to search the directory entry cache (dentry cache, or dcache). This provides a very fast lookup mechanism that translates a pathname (filename) into a specific dentry. Dentries live in RAM and are never saved to disc: they exist only for performance.”
We noticed that the dentry cache checks mostly happen in fs/nfs/dir.c. We were especially interested in the nfs4_lookup_revalidate() function, and as an experiment we forced it to take the invalidation path early:
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 8bfaa658b2c1..ad479bfeb669 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1159,6 +1159,7 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 	trace_nfs_lookup_revalidate_enter(dir, dentry, flags);
 	error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, fhandle, fattr, label);
 	trace_nfs_lookup_revalidate_exit(dir, dentry, flags, error);
+	goto out_bad;
 	if (error == -ESTALE || error == -ENOENT)
 		goto out_bad;
 	if (error)
And in this experiment, the stale file problem did not occur! We were finally on the trail.
To find out why the problem did not occur in NFS v4.1, we added pr_info() calls to every if block in this function. We experimented with NFS v4.0 and v4.1 and found a condition that is special to v4.1:
if (NFS_SB(dentry->d_sb)->caps & NFS_CAP_ATOMIC_OPEN_V1) {
        goto no_open;
}
What is NFS_CAP_ATOMIC_OPEN_V1? This kernel patch describes it as an NFS v4.1 feature, and the code in fs/nfs/nfs4proc.c confirmed that the capability is present in v4.1 but missing in v4.0:
static const struct nfs4_minor_version_ops nfs_v4_1_minor_ops = {
	.minor_version = 1,
	.init_caps = NFS_CAP_READDIRPLUS
		| NFS_CAP_ATOMIC_OPEN
		| NFS_CAP_POSIX_LOCK
		| NFS_CAP_STATEID_NFSV41
		| NFS_CAP_ATOMIC_OPEN_V1
So that is why the versions behaved differently: in v4.1, the goto no_open triggers more checks in nfs_lookup_revalidate(), while in v4.0 nfs4_lookup_revalidate() returns earlier. And how did we finally solve the problem?
I reported our findings on the NFS mailing list and proposed a primitive patch. A week later, Trond Myklebust sent a series of patches fixing the bug, and also found another related problem in NFS v4.1.
It turned out that the fix for the NFS v4.0 bug sat deeper in the codebase than we had thought. Trond summed it up well in the patch:

“We need to ensure that the inode and dentry are correctly revalidated when an already-open file is opened again. Currently we don't revalidate either in NFSv4.0, because the cached open file is reused. Fix this by only caching open files in the special cases of open recovery and delegation return.”
We verified that this fix solved the stale file problem, and filed bug reports with the Ubuntu and Red Hat teams.
We knew the changes would not reach a stable kernel any time soon, so we added a workaround for this problem in Gitaly. We experimented and verified that calling stat() on the packed-refs file makes the kernel revalidate the renamed file in the dentry cache. For simplicity, we implemented this in Gitaly for all file systems, not just NFS. The check runs only once, before Gitaly opens a repository; for the other files, stat() calls already happen elsewhere.
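In Ruby terms, the workaround amounts to something like the following sketch. The revalidate_packed_refs name is hypothetical, invented for illustration; the real change lives in the Gitaly codebase.

```ruby
# Hypothetical sketch of the Gitaly workaround: a stat() on packed-refs
# forces the kernel to revalidate the dentry before the repository is read.
def revalidate_packed_refs(repo_path)
  File.stat(File.join(repo_path, 'packed-refs'))
rescue Errno::ENOENT
  nil # a repository without packed refs is fine; nothing to revalidate
end
```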
A bug can hide in any corner of the software stack, and sometimes you need to look for it outside your application. If you have useful connections in the open source world, the job gets easier.
Many thanks to Trond Myklebust for fixing the problem, and to Bruce Fields for answering our questions and helping us understand NFS. It is for this responsiveness and professionalism that we value the open source community.
Source: https://habr.com/ru/post/432922/