The sponge command: a “sponge” for standard input

We all know that when running commands in a shell, we can redirect the standard output of one command to the standard input of another, or write it to a file.

This is covered in detail in the I/O Redirection chapter of the Advanced Bash-Scripting Guide.

In particular, sometimes you need to read a file, process it in some way (for example, keep only the lines that match a regular expression), and then write the result back to the same file. Suppose your file is called “messages.log”, and you want to keep only the lines that begin with the word “Success”, a colon and a space, and remove all the others.
You might assume that the following command would do the job:

grep "^Success:\s" messages.log > messages.log 

But that assumption is wrong: if you run this line, the shell opens messages.log for writing and truncates it before grep even starts reading it.
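The truncation is easy to observe in a scratch directory. The following is only a minimal sketch: the directory and variable names are made up for illustration, and grep's diagnostics are silenced so that only the file sizes are shown.

```shell
# Work in a scratch directory so that no real log is harmed.
demo_dir=$(mktemp -d)
log="$demo_dir/messages.log"
printf 'Success: 123\nError: 123\n' > "$log"

size_before=$(wc -c < "$log")

# The shell opens "$log" for writing (and truncates it) before grep starts.
# grep's exit status and diagnostics are deliberately ignored here.
grep '^Success: ' "$log" > "$log" 2>/dev/null || true

size_after=$(wc -c < "$log")
echo "before: $size_before bytes, after: $size_after bytes"

rm -rf "$demo_dir"
```

Whatever grep itself does, the file is already empty by the time it runs, so the second size is zero.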

Interestingly, once grep does start, it detects that its output is redirected to the same file it is trying to read, and exits immediately with the following message:

grep: input file 'messages.log' is also the output

GNU cat does the same thing (try running cat messages.log > messages.log):

cat: messages.log: input file is output file

This is done by comparing the device and inode numbers of the input file with those of the file that standard output is being written to. You can see the implementation of this approach in src/cat.c.
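A shell-level analogue of this comparison is the -ef operator of test, which bash, dash and most other shells support: it succeeds when two paths refer to the same device and inode. The sketch below is not how cat does it (cat works in C on open file descriptors); same_file and the file names are hypothetical and exist only for the demonstration.

```shell
# Hypothetical helper: succeeds when both paths name the same file,
# that is, when they share a device number and an inode number.
same_file() {
    [ "$1" -ef "$2" ]
}

demo_dir=$(mktemp -d)
printf 'Success: 123\n' > "$demo_dir/messages.log"
ln "$demo_dir/messages.log" "$demo_dir/alias.log"   # hard link: same inode
cp "$demo_dir/messages.log" "$demo_dir/copy.log"    # copy: a new inode

if same_file "$demo_dir/messages.log" "$demo_dir/alias.log"; then
    link_result=same
else
    link_result=different
fi

if same_file "$demo_dir/messages.log" "$demo_dir/copy.log"; then
    copy_result=same
else
    copy_result=different
fi

echo "hard link: $link_result, copy: $copy_result"
rm -rf "$demo_dir"
```

The hard link has a different name but is the same file, while the copy has identical contents but is a different file. This is why the check looks at device and inode rather than at names or contents.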

By the way, BSD cat performs no such check, but in this case it hardly matters: the file has been truncated either way, so there is nothing to read or write, and cat simply exits.

However, take another example:

 cat messages.log >> messages.log 

In this case we do not truncate messages.log; we append the output of cat to the end of the file. If cat checks that the two files are the same and exits, the file stays unchanged and the user sees an error. Without such a check, cat enters an endless loop, reading what it has just appended, and keeps growing the file until the disk runs out of space or the user kills the process.

Now let's think about how we can write the output to the same file we are reading from. The obvious solution is to use a temporary file:

 mv messages.log tmpmessages.log
 grep "^Success:\s" tmpmessages.log > messages.log
 rm tmpmessages.log

It is not very convenient, but it does get the job done.
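A slightly more defensive variant of the temporary-file approach might look like the sketch below. It assumes mktemp is available, and the scratch directory and variable names are made up for the demonstration. The original file is replaced only after the filter has finished, so a failed grep does not leave a half-written log behind.

```shell
demo_dir=$(mktemp -d)
log="$demo_dir/messages.log"
printf 'Success: 123\nError: 123\n' > "$log"

# Create the temporary file next to the target, so that mv is a rename
# within one filesystem rather than a copy.
tmp=$(mktemp "$log.XXXXXX")

# grep exits with 1 when nothing matches and with 2 on a real error;
# only a real error should cancel the replacement.
grep '^Success: ' "$log" > "$tmp"
status=$?
if [ "$status" -le 1 ]; then
    mv "$tmp" "$log"
else
    rm -f "$tmp"
fi

filtered=$(cat "$log")
echo "$filtered"
rm -rf "$demo_dir"
```

One side effect to keep in mind: the new file gets the owner and the restrictive permissions that mktemp assigns, not those of the original.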

Another option is to use sed.

 sed -i -n -e '/^Success:\s/{p}' messages.log 

But this solution is, of course, not very general: selecting lines that match a regular expression is only one of many text-processing tasks. Besides, the syntax here is already noticeably more complicated.

By the way, sed actually uses a temporary file too, as the strace output shows:

 open("messages.log", O_RDONLY) = 3
 ...
 open("./sedWiaEAG", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
 ...
 read(3, "Success: 123\nError: 123\n", 4096) = 24
 write(4, "Success: 123\n", 13) = 13
 read(3, "", 4096) = 0
 ...
 close(3) = 0
 ...
 close(4) = 0
 ...
 rename("./sedWiaEAG", "messages.log") = 0
 close(1) = 0
 close(2) = 0
 exit_group(0) = ?
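If strace is not at hand, the rename can also be observed indirectly: after sed -i the file has a different inode number, because it is in fact a new file. The sketch below assumes GNU sed (BSD sed expects an explicit suffix argument after -i), and the variable names are made up.

```shell
demo_dir=$(mktemp -d)
log="$demo_dir/messages.log"
printf 'Success: 123\nError: 123\n' > "$log"

# ls -i prints the inode number followed by the file name.
inode_before=$(ls -i "$log" | awk '{print $1}')

# Assumes GNU sed; the temporary file is created and renamed by sed itself.
sed -i -n -e '/^Success: /p' "$log"

inode_after=$(ls -i "$log" | awk '{print $1}')
remaining=$(cat "$log")
echo "inode before: $inode_before, after: $inode_after"

rm -rf "$demo_dir"
```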

Obviously, it would be nice to do without intermediate files altogether. And there is a way: the sponge program from moreutils.

sponge reads standard input and writes it to the specified file. Unlike a shell redirect, sponge soaks up all of its input before opening the output file. This makes it possible to build pipelines that read from and write to the same file.
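The idea of soaking up everything first and opening the file afterwards can be imitated with a few lines of shell. The soak function below is a hypothetical stand-in, not the moreutils implementation: it buffers the input in a shell variable, so it is only suitable for small text inputs without NUL bytes.

```shell
# Hypothetical stand-in for sponge, for small text inputs only.
soak() {
    # Read everything until end of input. The trailing "x" protects final
    # newlines, which command substitution would otherwise strip.
    buf=$(cat; printf x)
    # Only now is the target opened (and truncated) for writing.
    printf '%s' "${buf%x}" > "$1"
}

demo_dir=$(mktemp -d)
log="$demo_dir/messages.log"
printf 'Success: 123\nError: 123\n' > "$log"

# grep has finished reading the file by the time soak opens it.
grep '^Success: ' "$log" | soak "$log"

soaked=$(cat "$log")
soaked_size=$(wc -c < "$log")
echo "$soaked"
rm -rf "$demo_dir"
```

The key point is the ordering: end of input on the pipe means the reader upstream is done with the file, so truncating it at that moment is safe.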

So, with sponge, we can drop the shell redirection from our example and instead pass the name of the file we want to write to as an argument to the sponge command. The output of grep reaches sponge through a pipe.

 grep "^Success:\s" messages.log | sponge messages.log 

In principle, this entire post could have been reduced to that single grep and sponge pipeline, but I think it turned out more interesting this way, and perhaps we even touched on some nuances that some readers did not know before.

I wish you all a great Friday!

Source: https://habr.com/ru/post/178141/

