Every tool tested below ultimately reads the directory through the getdents() system call directly; keep that in mind for the strace output later. First we need a test filesystem. To get a huge image file instantly, we create it sparse: seek far past the end and write a single byte.

#!python
f = open("sparse", "w")
f.seek(1024 * 1024 * 1024 * 200)
f.write("\0")
Many people recommend using the dd utility for this, for example:

dd if=/dev/zero of=disk-image bs=1M count=1M

but that is incomparably slower, and the result, as far as I can tell, is the same.
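For what it's worth, the same trick can be done in C (a sketch of my own, not from the original article): ftruncate() past the end of the file creates a hole without writing anything, so it too returns instantly.

//file: sparse.c -- hypothetical C equivalent of the Python snippet above
#define _FILE_OFFSET_BITS 64   /* so off_t holds 200 GB on 32-bit builds too */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("sparse", O_WRONLY | O_CREAT, 0644);
    /* extends the file with a hole; no blocks are allocated or written */
    ftruncate(fd, 200LL * 1024 * 1024 * 1024);
    close(fd);
    return 0;
}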
mkfs -t ext4 -q sparse   # TODO: a smaller FS would do, but then the -N option must be changed
sudo mount sparse /mnt
mkdir /mnt/test_dir
Unfortunately, I only found out about the -N option of mkfs.ext4 after the experiments. It lets you raise the limit on the number of inodes in the FS without increasing the size of the image file. On the other hand, the default settings are closer to real-world conditions.
#!python
for i in xrange(0, 13107300):
    f = open("/mnt/test_dir/{0}_{0}_{0}_{0}".format(i), "w")
    f.close()
    if i % 10000 == 0:
        print i
By the way, while the first files were created quite quickly, subsequent ones were added more and more slowly: random pauses appeared, and the kernel's memory usage grew. Storing a huge number of files in a flat directory is a bad idea in itself.
$ df -i
/dev/loop0  13107200  13107200  38517  100%  /mnt

$ ls -lh /mnt/
drwxrwxr-x 2 seriy seriy 358M Nov  1 03:11 test_dir
We flush the caches so the runs are comparable:

$ sudo sh -c 'sync && echo 1 > /proc/sys/vm/drop_caches'
$ rm -r /mnt/test_dir/
Under strace we can see that it first calls getdents(), then calls unlinkat() a lot, and so on in a loop. It took 30 MB of RAM and did not grow.

iotop:
7664 be/4 seriy 72.70 M/s  0.00 B/s 0.00% 93.15% rm -r /mnt/test_dir/
5919 be/0 root  80.77 M/s 16.48 M/s 0.00% 80.68% [loop0]
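That strace picture corresponds roughly to the following C sketch (my own illustration, not rm's actual source; error handling omitted, regular files only): read a buffer of entries with getdents64(), unlink each one, repeat.

//file: rm_loop.c -- hypothetical sketch of the getdents64 + unlinkat loop
#define _GNU_SOURCE
#include <dirent.h>        /* struct dirent64 */
#include <fcntl.h>         /* open, O_DIRECTORY */
#include <string.h>
#include <sys/syscall.h>   /* SYS_getdents64 */
#include <unistd.h>

int main(void)
{
    char buf[32768];   /* a batch-sized buffer, similar in spirit to rm's */
    long nread;
    int fd = open("/mnt/test_dir", O_RDONLY | O_DIRECTORY);

    /* one getdents64() fills the buffer with a batch of entries,
       then every entry in the batch is unlinked; repeat until empty */
    while ((nread = syscall(SYS_getdents64, fd, buf, sizeof(buf))) > 0) {
        for (long off = 0; off < nread; ) {
            struct dirent64 *d = (struct dirent64 *)(buf + off);
            if (strcmp(d->d_name, ".") && strcmp(d->d_name, ".."))
                unlinkat(fd, d->d_name, 0);   /* the test dir holds files only */
            off += d->d_reclen;
        }
    }
    close(fd);
    return 0;
}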
Conclusion: using rm -r /mnt/test_dir/ is quite normal.

$ rm /mnt/test_dir/*
^C
It deleted nothing. The asterisk is expanded by the shell itself: the list of matches accumulates in memory and is passed to the rm command only after the whole directory has been read.
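In effect the shell does the equivalent of glob(3) here. A small demo of why that hurts (a hypothetical illustration, not part of the original test):

//file: glob_demo.c -- hypothetical demo of shell-style expansion
#include <glob.h>
#include <stdio.h>

int main(void)
{
    glob_t g;
    /* this one call must read and store all ~13 million names
       before anything could ever be deleted -- gigabytes of RAM */
    if (glob("/mnt/test_dir/*", 0, NULL, &g) == 0)
        printf("%zu names expanded into memory\n", g.gl_pathc);
    globfree(&g);
    return 0;
}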
$ find /mnt/test_dir/ -type f -exec rm -v {} \;
strace shows getdents() being called. The find process grew to 600 MB, and I killed it with ^C. It deleted nothing: find with -exec works the same way as the shell's asterisk, first building the complete list in memory.

$ find /mnt/test_dir/ -type f -delete
^C
Again it deleted nothing.

$ cd /mnt/test_dir/ ; ls -f . | xargs -n 100 rm
Memory usage of the pipeline:
|- ls    212 KB
|- xargs 108 KB
|- rm    130 KB   # the PID of rm changes constantly

iotop (jumps around a lot):
5919 be/0 root 5.87 M/s 6.28 M/s 0.00% 89.15% [loop0]
ls -f behaves more sensibly here than find and does not needlessly accumulate the file list in memory. (ls without options, like find, reads the whole list into memory first — obviously, in order to sort it.) But this method is bad because it constantly re-executes rm, which adds overhead. This suggests a variant: dump the output of ls -f into a file and then delete the directory contents using that list.
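A minimal sketch of that variant (my own, under the assumption that the list was saved beforehand with `ls -f /mnt/test_dir/ > files.list`; the file name is made up, error handling omitted):

//file: unlink_list.c -- hypothetical sketch of the "delete by list" variant
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char name[4096];
    FILE *list = fopen("files.list", "r");   /* assumed pre-generated list */

    chdir("/mnt/test_dir");
    while (fgets(name, sizeof(name), list)) {
        name[strcspn(name, "\n")] = '\0';    /* strip the trailing newline */
        if (strcmp(name, ".") && strcmp(name, ".."))
            unlink(name);
    }
    fclose(list);
    return 0;
}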
to a file and then delete the contents of the directory on this list.$ perl -e 'chdir "/mnt/test_dir/" or die; opendir D, "."; while ($n = readdir D) { unlink $n }'
$ perl -e 'chdir "/mnt/test_dir/" or die; opendir D, "."; while ($n = readdir D) { unlink $n }'
(picked up here). Under strace it calls getdents() once, then unlink() many times, and so on in a loop. It took 380 KB of memory and did not grow.

iotop:
7591 be/4 seriy 13.74 M/s    0.00 B/s 0.00% 98.95% perl -e chdi...
5919 be/0 root  11.18 M/s 1438.88 K/s 0.00% 93.85% [loop0]

Now the same thing in C:
//file: cleandir.c
#include <dirent.h>
#include <string.h>     /* strcmp() was used without this include */
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct dirent *entry;
    DIR *dp;

    chdir("/mnt/test_dir");
    dp = opendir(".");
    while ((entry = readdir(dp)) != NULL) {
        /* skip "." and "..", unlink everything else */
        if (strcmp(entry->d_name, ".") && strcmp(entry->d_name, "..")) {
            unlink(entry->d_name); // maybe unlinkat?
        }
    }
    closedir(dp);
    return 0;
}
$ gcc -o cleandir cleandir.c
$ ./cleandir
Under strace it likewise calls getdents() once, then unlink() many times, and so on in a loop. It took 128 KB of memory and did not grow.

iotop:
7565 be/4 seriy 11.70 M/s    0.00 B/s 0.00% 98.88% ./cleandir
5919 be/0 root  12.97 M/s 1079.23 K/s 0.00% 92.42% [loop0]
Conclusion: readdir is quite fine, as long as you do not accumulate the results in memory but delete the files right away.

You can use the readdir() + unlink() functions to remove directories containing millions of files. Using rm -r /my/dir/ is also perfectly fine, since it acts smarter: it first builds a relatively small list of files in memory by calling readdir() several times, and only then deletes files from that list. This alternates the read and write load more smoothly, which increases the deletion rate. To reduce the load on the system, run the deletion under nice or ionice; or use a scripting language and insert small sleep() calls into the loop; or generate a list of files through ls -l and push it through a slowed-down pipe.
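As a sketch of the sleep-in-the-loop idea (a made-up variation on cleandir.c above, not from the article): pause briefly every thousand deletions to soften the I/O bursts.

//file: cleandir_slow.c -- hypothetical throttled variant of cleandir.c
#include <dirent.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    struct dirent *entry;
    DIR *dp;
    long deleted = 0;

    chdir("/mnt/test_dir");
    dp = opendir(".");
    while ((entry = readdir(dp)) != NULL) {
        if (strcmp(entry->d_name, ".") && strcmp(entry->d_name, "..")) {
            unlink(entry->d_name);
            /* every 1000 files, sleep 10 ms to let other I/O through */
            if (++deleted % 1000 == 0)
                usleep(10000);
        }
    }
    closedir(dp);
    return 0;
}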
Source: https://habr.com/ru/post/157613/