
When an environment variable speeds up a process 40-fold

Today we want to talk about one of the latest updates to the Sherlock system [a high-performance computing cluster at Stanford University - translator's note], which significantly speeds up listing files in directories with a large number of entries.

Unlike our regular articles, this is more of an insider report on the work we regularly do on Sherlock to keep it running as well as possible for our users. We hope to publish more articles like this in the future.

Listing many files takes time


It all started with a support question from a user. He reported that running ls took several minutes in a directory with more than 15,000 entries in $SCRATCH [a directory for temporary files - translator's note].

Thousands of files in a single directory usually cause problems for the file system, and this is definitely not recommended. The user knew this and admitted that it was not good, but mentioned that the listing on his laptop was 1,000 times faster than on Sherlock. Of course, that stung. So we took a deeper look.

Because ls is pretty


We looked at what ls actually does when listing a directory, and why the process takes so long. In most modern distributions, ls defaults to ls --color=auto, because everyone likes colors.
But beautiful colors have their price: for each file, ls must get information about the file type, its permissions, flags, extended attributes, and the like, in order to choose the appropriate color.
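To get a feel for this cost on your own machine, a rough sketch like the one below compares a plain listing with a colored one. It assumes bash and GNU ls. The throwaway directory and the count of 2000 files are arbitrary choices for illustration, and on a local disk the gap may be much smaller than on a network file system.

```shell
#!/usr/bin/env bash
# Rough comparison of plain vs. colored listings (illustrative only).
demo_dir=$(mktemp -d)                 # hypothetical throwaway directory
touch "$demo_dir"/file{1..2000}       # 2000 is an arbitrary count

# Plain listing: file names alone are enough.
time ls --color=never "$demo_dir" > /dev/null

# Colored listing: ls may need per-file metadata to pick a color.
time ls --color=always "$demo_dir" > /dev/null

plain_count=$(ls --color=never "$demo_dir" | wc -l)
echo "entries listed: $plain_count"
rm -rf "$demo_dir"
```

The absolute numbers depend heavily on the file system, so treat them as a relative comparison only.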

One simple solution would be to turn off color in ls altogether, but imagine the users' outrage. Taking away colored output is out of the question; we are not monsters.

So we took a deeper look. ls colors entries according to the LS_COLORS environment variable, which is set by dircolors(1) based on the dir_colors(5) configuration file. Yes, an executable reads a configuration file to produce an environment variable, which is then used by ls (and if you have never heard of door (do) files, don't worry, dir_colors covers them anyway).
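The chain described above can be tried by hand. This is a minimal sketch assuming GNU dircolors is installed. TERM=xterm is set explicitly here as an assumption, because dircolors may emit an empty value when it does not recognize the terminal type.

```shell
#!/usr/bin/env bash
# dircolors prints shell code; eval runs it in the current shell.
generated=$(TERM=xterm dircolors -b)   # built-in database, Bourne-shell syntax
echo "${generated%%=*}"                # name of the variable being assigned
eval "$generated"                      # sets and exports LS_COLORS

# Show the first few type=color entries, one per line.
echo "$LS_COLORS" | tr ':' '\n' | head -n 5
```

Each entry is a two-letter file type code (or a *.ext pattern) followed by the terminal color attributes to use for it.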

Let's dig in


To determine which of the colorization schemes causes a slowdown, we created an experimental environment:

 $ mkdir $SCRATCH/dont
 $ touch $SCRATCH/dont/{1..10000} # don't try this at home!
 $ time ls --color=always $SCRATCH/dont | wc -l
 10000

 real 0m12.758s
 user 0m0.104s
 sys 0m0.699s

12.7 seconds for 10,000 files is not great.
By the way, the --color=always flag is needed here: although ls defaults to ls --color=auto, it detects when its output is not connected to a terminal (for example, when it is piped or redirected) and disables coloring in auto mode. Clever guy.
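The difference between auto and always is easy to check with a pipe. The sketch below assumes GNU ls. It sets LS_COLORS to a single directory color explicitly, so the result does not depend on the terminal type, and counts output lines that contain an escape byte.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
mkdir "$demo_dir/subdir"       # one directory entry to be colored

esc=$(printf '\033')           # the escape byte that starts a color code

# Count output lines containing an escape sequence in each mode.
auto_hits=$(LS_COLORS='di=01;34' ls --color=auto "$demo_dir" | grep -c "$esc")
always_hits=$(LS_COLORS='di=01;34' ls --color=always "$demo_dir" | grep -c "$esc")

echo "auto: $auto_hits colored line(s), always: $always_hits colored line(s)"
rm -rf "$demo_dir"
```

Because the output goes into a pipe, auto mode produces no color codes, while always mode still does.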
So what takes so much time? We looked with strace:

 $ strace -c ls --color=always $SCRATCH/dont | wc -l
 10000
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  44.21    0.186617          19     10000           lstat
  42.60    0.179807          18     10000     10000 getxattr
  12.19    0.051438           5     10000           capget
   0.71    0.003002          38        80           getdents
   0.07    0.000305          10        30           mmap
   0.05    0.000217          12        18           mprotect
   0.03    0.000135          14        10           read
   0.03    0.000123          11        11           open
   0.02    0.000082           6        14           close
 [...]

Wow: 10,000 calls to lstat(), 10,000 calls to getxattr() (which all fail, because in our environment there are no attributes that ls is looking for), and 10,000 calls to capget().

Surely this can be optimized.

Capabilities attributes? Nope


Following the advice in a 10-year-old bug report, we tried disabling the check for the capabilities attribute:

 $ eval $(dircolors -b | sed 's/ca=[^:]*:/ca=:/')
 $ time strace -c ls --color=always $SCRATCH/dont | wc -l
 10000
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  98.95    0.423443          42     10000           lstat
   0.78    0.003353          42        80           getdents
   0.04    0.000188          10        18           mprotect
   0.04    0.000181           6        30           mmap
   0.02    0.000085           9        10           read
   0.02    0.000084          28         3           mremap
   0.02    0.000077           7        11           open
   0.02    0.000066           5        14           close
 [...]
 ------ ----------- ----------- --------- --------- ----------------
 100.00    0.427920                 10221         6 total

 real 0m8.160s
 user 0m0.115s
 sys 0m0.961s

Wow, down to 8 seconds! We got rid of all those expensive getxattr() calls, and the capget() calls disappeared too. Great.
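What the sed expression in that command does is easier to see on a short string. The sample value below is a made-up fragment for illustration, not the real dircolors output.

```shell
#!/usr/bin/env bash
# Hypothetical fragment of an LS_COLORS value with a capability color set.
sample='sg=30;43:ca=30;41:tw=30;42:'

# Replace whatever follows 'ca=' (up to the next colon) with nothing.
result=$(printf '%s\n' "$sample" | sed 's/ca=[^:]*:/ca=:/')
echo "$result"
```

Only the ca entry is emptied; every other entry passes through unchanged.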

But those annoying lstat() calls are still there...

How many colors do you need?


So we looked at LS_COLORS in more detail.

First, we simply unset the variable:

 $ echo $LS_COLORS
 rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
 $ unset LS_COLORS
 $ echo $LS_COLORS

 $ time ls --color=always $SCRATCH/dont | wc -l
 10000

 real 0m13.037s
 user 0m0.077s
 sys 0m1.092s

What!?! Still 13 seconds?

It turns out that when the LS_COLORS environment variable is not defined, or when even one of its <type>=color: elements is missing, ls falls back to its built-in database for the missing elements and still uses colors. So if you want to turn off coloring for a specific file type, you need to override it explicitly, with <type>=: in LS_COLORS or <type> 00 in the DIR_COLORS file.

After a lot of trial and error, we narrowed it down to this:
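The fallback behavior can be observed directly. The sketch below assumes GNU ls, whose built-in directory color is 01;34 (bold blue). The first listing defines only ex and leaves di out on purpose. The second overrides di with an arbitrary color chosen for illustration.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
mkdir "$demo_dir/subdir"
chmod 755 "$demo_dir/subdir"   # plain directory: not sticky, not other-writable

# Only 'ex' is defined here; 'di' is deliberately left out.
partial=$(LS_COLORS='ex=00' ls --color=always "$demo_dir")

# Here 'di' is overridden explicitly (bold red instead of bold blue).
override=$(LS_COLORS='di=01;31' ls --color=always "$demo_dir")

# cat -v makes the escape bytes visible as ^[ for inspection.
printf '%s\n' "$partial"  | cat -v
printf '%s\n' "$override" | cat -v
rm -rf "$demo_dir"
```

In the first listing the directory still carries the built-in 01;34 code even though LS_COLORS never mentions di. That is why unsetting the variable changed nothing.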

 EXEC 00
 SETUID 00
 SETGID 00
 CAPABILITY 00

which translates to

 LS_COLORS='ex=00:su=00:sg=00:ca=00:' 

This means: do not color files based on the capabilities attribute, the setuid/setgid bits, or the executable flag.
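The translation from the configuration keywords to the LS_COLORS string can be reproduced with dircolors itself. This is a sketch assuming GNU dircolors. The temporary file stands in for a real DIR_COLORS file and contains only the four overrides.

```shell
#!/usr/bin/env bash
# Hypothetical minimal configuration file with only the four overrides.
conf=$(mktemp)
cat > "$conf" <<'EOF'
EXEC 00
SETUID 00
SETGID 00
CAPABILITY 00
EOF

# dircolors translates the keywords into their two-letter LS_COLORS codes.
eval "$(dircolors -b "$conf")"
echo "$LS_COLORS"
rm -f "$conf"
```

Every file type not listed in the file keeps the color from ls's built-in defaults.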

Accelerate ls


And when ls performs none of these checks, the lstat() calls disappear, and the picture changes completely:

 $ export LS_COLORS='ex=00:su=00:sg=00:ca=00:'
 $ time strace -c ls --color=always $SCRATCH/dont | wc -l
 10000
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  63.02    0.002865          36        80           getdents
   8.10    0.000368          12        30           mmap
   5.72    0.000260          14        18           mprotect
   3.72    0.000169          15        11           open
   2.79    0.000127          13        10           read
 [...]
 ------ ----------- ----------- --------- --------- ----------------
 100.00    0.004546                   221         6 total

 real 0m0.337s
 user 0m0.032s
 sys 0m0.029s

0.3 seconds to list 10,000 files: a record.

Customize Sherlock


Going from 13 seconds with the default settings to 0.3 seconds with a small LS_COLORS tweak is a 40-fold speedup, at the cost of no longer coloring setuid/setgid files and executables. Not such a big loss.
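For reference, the ratio can be computed from the two "real" timings reported earlier. This is a small sketch using awk for the floating-point division.

```shell
#!/usr/bin/env bash
before=13.037   # seconds, 'real' time with LS_COLORS unset
after=0.337     # seconds, 'real' time with the reduced LS_COLORS

# The shell has no floating-point arithmetic, so delegate to awk.
speedup=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", b / a }')
echo "speedup: ${speedup}x"
```

The result comes out just under 40, which is where the rounded "40-fold" figure comes from.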

Of course, this is now configured on Sherlock for every user.

But if you want the full coloring back, you can simply revert to the default settings:

 $ unset LS_COLORS 

But then, in directories with a large number of files, be sure to make some coffee while ls is running.
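If you only occasionally want the full colors, one possible alternative is to drop LS_COLORS for a single command instead of the whole session. The helper name lsfull below is hypothetical and not something provided on Sherlock. The sketch assumes GNU env, which supports the -u option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one ls with the built-in color defaults,
# leaving the LS_COLORS of the current shell untouched.
lsfull() {
    env -u LS_COLORS ls --color=auto "$@"
}

export LS_COLORS='ex=00:su=00:sg=00:ca=00:'
lsfull /            # this one listing uses the slower default colors
echo "$LS_COLORS"   # still the reduced setting
```

This keeps the fast setting as the default and pays the extra cost only when you ask for it.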

Source: https://habr.com/ru/post/450806/

