Frequent programming errors in Bash (continued)

I am continuing to introduce the community to my translation of Bash Pitfalls.
Part one.
Initial publication of the translation.

11. cat file | sed s/foo/bar/ > file

You cannot read from a file and write to it in the same pipeline. Depending on how the pipeline is set up, the file may be clobbered (truncated to zero length, or to a size equal to the operating system's pipe buffer), or it may grow without bound until it fills the available disk space or hits a file size limit imposed by the operating system, a quota, and so on.

If you want to change a file in any way other than appending to its end, you must create a temporary file at some point. For example (this code works in all shells):

  sed 's/foo/bar/g' file > tmpfile && mv tmpfile file
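
The temporary-file approach can be wrapped in a small function. This is only a sketch: the name replace_in_file and the demo file are made up for illustration, and it assumes mktemp is available (it is not part of POSIX, though most systems ship it).

```shell
# Hypothetical helper: apply a sed script to a file through a temporary file.
replace_in_file() {
    script=$1
    file=$2
    # Create the temporary file next to the target so mv is a plain rename.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if sed "$script" "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        # Leave the original untouched if sed failed.
        rm -f "$tmp"
        return 1
    fi
}

# Demo on a throwaway file.
demo=$(mktemp "${TMPDIR:-/tmp}/demo.XXXXXX") || exit 1
printf 'foo one\nfoo two\n' > "$demo"
replace_in_file 's/foo/bar/g' "$demo"
result=$(cat "$demo")
rm -f "$demo"
```

Keep in mind that the replaced file takes on the permissions of the temporary file, so a real script may need to restore the original mode.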

The following snippet will only work when using GNU sed 4.x and higher:

  sed -i 's/foo/bar/g' file

Note that this also creates a temporary file and then renames it; it just happens behind the scenes.

The BSD version of sed requires you to specify the extension to add to the backup file. If you are confident in your script, you can pass an empty extension:

  sed -i '' 's/foo/bar/g' file

You can also use perl 5.x, which is probably more common than sed 4.x:

  perl -pi -e 's/foo/bar/g' file

Various aspects of bulk string replacement across many files are discussed in Bash FAQ #21.

12. echo $foo

This relatively innocent-looking command can have unpleasant consequences. Because $foo is not quoted, it is not only split into words; any glob pattern it contains may also be expanded into the names of matching files. As a result, bash programmers sometimes wrongly conclude that their variables hold incorrect values, when the variables are fine and it is the unquoted expansion on the echo command line that misrepresents them.

  MSG="Please enter a file name of the form *.zip"
  echo $MSG

This message is split into words, and any glob patterns, such as *.zip, are expanded. What will your script's users think when they see this:

  Please enter a file name of the form freenfss.zip lw35nfss.zip

Here is another example:

  var=*.zip       # var contains an asterisk, a dot and the word "zip"
  echo "$var"     # prints *.zip
  echo $var       # lists the files whose names end with .zip

In fact, echo cannot be used with complete safety at all. If the variable contains just the two characters -n, echo will treat them as an option rather than as data to print, and will output nothing at all. The only reliable way to print the value of a variable is printf:

  printf "%s\n" "$foo"
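
Both effects can be reproduced in a scratch directory. The sketch below assumes mktemp -d is available; the directory and file names are invented for the demonstration.

```shell
# Scratch directory holding two files that match *.zip.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/globdemo.XXXXXX") || exit 1
touch "$workdir/a.zip" "$workdir/b.zip"

var='*.zip'
# Unquoted: the value is glob-expanded into the matching file names.
unquoted=$(cd "$workdir" && printf '%s\n' $var)
# Quoted: the value is passed through literally.
quoted=$(cd "$workdir" && printf '%s\n' "$var")

# printf treats -n as data because the format string comes first.
foo='-n'
safe=$(printf '%s\n' "$foo")

rm -rf "$workdir"
```

Here unquoted holds the two file names on separate lines, quoted holds the literal *.zip, and safe holds -n.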

13. $foo=bar

No, you cannot create a variable by putting a "$" at the beginning of its name. This is not perl. It is enough to write:

  foo=bar

14. foo = bar

No, you cannot put spaces around the "=" when assigning a value to a variable. This is not C. When you write foo = bar, the shell splits it into three words; the first, foo, is taken as the command name, and the other two as its arguments.

For the same reason, the following expressions are also incorrect:

  foo= bar      # WRONG!
  foo =bar      # WRONG!
  $foo = bar    # ABSOLUTELY WRONG!

  foo=bar       # Correct.
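
A form with a space only after the "=" is wrong for a less obvious reason: the shell reads it as "run the following command with the variable set to an empty string in its environment". A minimal sketch of that behaviour, using sh -c as a stand-in for the command bar:

```shell
foo=original

# "foo= cmd" runs cmd with foo set to the empty string in cmd's environment only.
seen_by_child=$(foo= sh -c 'printf "%s" "[$foo]"')

# The variable in the current shell is left unchanged.
after=$foo
```

The child sees an empty value, so seen_by_child is [], while after is still original.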

15. echo << EOF

Here documents are useful for embedding large blocks of text in a script. When the shell encounters this construct, it feeds the lines up to the specified marker (EOF in this case) to the command's standard input. Unfortunately, echo does not read from stdin.

  # Wrong:
  echo << EOF
  Hello world
  EOF

  # Right:
  cat << EOF
  Hello world
  EOF
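
The difference is easy to check by capturing the output of both commands. In this sketch the variable names are chosen only for illustration:

```shell
name=world

# cat reads the here document from stdin and prints it.
greeting=$(cat <<EOF
Hello $name
EOF
)

# echo never reads stdin, so the here document is silently discarded.
ignored=$(echo <<EOF
Hello $name
EOF
)
```

After this runs, greeting contains the text and ignored is empty.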

16. su -c 'some command'

On Linux, this syntax is correct and will not cause errors. The problem is that on some systems (like FreeBSD or Solaris), the -c argument to the su command has a completely different purpose. In particular, in FreeBSD, the -c switch specifies a class whose restrictions are applied when executing a command, and the shell arguments must appear after the target user name. If the username is missing, the -c option will refer to the su command, not the new shell. Therefore, it is recommended to always specify the name of the target user, regardless of the system (who knows which platforms your scripts will run on ...):

  su root -c 'some command' # Correct.

17. cd /foo; bar

If you do not check the exit status of cd, then when it fails, bar may run in a directory other than the one the developer intended. This can be a disaster if bar is something like rm *.

Therefore, you should always check the return code of the “cd” command. The easiest way:

  cd /foo && bar

If cd is followed by more than one command, you can write this:

  cd /foo || exit 1
  bar
  baz
  bat ... # Many more commands.

cd will report a directory change error with a message to stderr like bash: cd: /foo: No such file or directory . If you want to display your error message in stdout, you should use command grouping:

  cd /net || { echo "Can't read /net. Make sure you've logged in to the Samba network, and try again."; exit 1; }
  do_stuff
  more_stuff

Pay attention to the space between { and echo , as well as the semicolon before the closing } .
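
When the same pattern is needed in many places, the grouping can be moved into a helper. The sketch below uses a hypothetical function named die, which is not part of bash, and sends the message to stderr rather than stdout. The subshell exists only so that the demonstration does not terminate the calling shell.

```shell
# Hypothetical helper: print a message to stderr and abort with status 1.
die() {
    printf '%s\n' "$*" >&2
    exit 1
}

status=0
# cd fails, die exits the subshell, and the echo is never reached.
output=$( (cd /no/such/directory 2>/dev/null || die "Can't enter /no/such/directory"; echo "not reached") 2>&1 ) || status=$?
```

In a real script the call would simply read cd /net || die "Can't read /net".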

Some people put set -e at the beginning of their scripts so that the script aborts after any command that returns a non-zero status. This trick should be used with great care: many common commands return a non-zero status merely as a warning, and such results need not be treated as fatal errors.

By the way, if you work a lot with directories in a bash script, reread man bash in places related to the pushd , popd and dirs commands. Perhaps all your code stuffed with cd and pwd is simply not needed :).

Let's get back to the subject. Compare this snippet:

  find ... -type d | while read subdir; do
      cd "$subdir" && whatever && ... && cd -
  done

with this:

  find ... -type d | while read subdir; do
      (cd "$subdir" && whatever && ...)
  done

Forcing a subshell makes cd and the commands after it run inside that subshell; in the next iteration of the loop we are back in the original directory, whether the cd succeeded or failed. There is no need to change back manually.

In addition, the first of these two examples contains another error: if one of the whatever commands fails, we may never return to the original directory. To fix this without a subshell, you would have to do something like cd "$ORIGINAL_DIR" at the end of each iteration, which adds a little more clutter to your scripts.
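
The claim that a subshell leaves the caller's working directory alone is easy to verify. In this sketch ls stands in for the whatever commands, and /no/such/directory is assumed not to exist:

```shell
start=$PWD

# Successful cd inside a subshell: the parent's directory is unaffected.
( cd / && ls >/dev/null )
after_success=$PWD

# Failed cd inside a subshell: ls is skipped and the parent is still unaffected.
skipped=no
( cd /no/such/directory 2>/dev/null && ls ) || skipped=yes
after_failure=$PWD
```

In both cases $PWD in the calling shell is the same as before the subshell ran.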

18. [ bar == "$foo" ]

The == operator is not a valid argument to the [ command. Use = instead, or replace [ with the [[ keyword:

  [ bar = "$foo" ] && echo yes
  [[ bar == $foo ]] && echo yes

19. for i in {1..10}; do ./something &; done

Do not put a semicolon ";" right after &. Just delete this extra character:

  for i in {1..10}; do ./something & done

The & character is itself a command terminator, just like ";" and a newline. You cannot put one directly after another.
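
In a multi-line loop the same rule applies: the & ends the command and the newline simply follows it. The sketch below uses touch as a stand-in for ./something and assumes mktemp -d is available; wait is added so the script does not continue before the background jobs finish.

```shell
outdir=$(mktemp -d "${TMPDIR:-/tmp}/jobs.XXXXXX") || exit 1

for i in 1 2 3; do
    # The & terminates the command; no semicolon is needed after it.
    touch "$outdir/job$i" &
done

# Block until every background job has finished.
wait

# Count the files the jobs created.
set -- "$outdir"/job*
count=$#
rm -rf "$outdir"
```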

20. cmd1 && cmd2 || cmd3

Many prefer to use && and || as an abbreviation for if ... then ... else ... fi . In some cases, it is absolutely safe:

  [[ -s $errorlog ]] && echo "Uh oh, there were some errors." || echo "Successful."

However, in the general case this construct is not a full equivalent of if ... fi, because cmd2, the command after &&, also produces an exit status, and if that status is non-zero, the command following || is executed too. A simple example that leaves many people baffled:

  i=0
  true && ((i++)) || ((i--))
  echo $i # prints 0

What happened here? One would expect i to end up as 1, but at the end of the script it is 0. In other words, both i++ and i-- are executed, one after the other. The exit status of the ((i++)) command is derived from the C-style value of the expression in parentheses. That value is 0 (the value of i before the increment), and in C an integer expression equal to 0 is treated as false. So ((i++)) with i equal to 0 returns exit status 1 (false), and ((i--)) is executed as well.

This would not have happened with the pre-increment operator, because in that case ((++i)) evaluates to 1 and its exit status is true:

  i=0
  true && ((++i)) || ((--i))
  echo $i # prints 1

But we just got lucky: the code works purely by coincidence. You cannot rely on x && y || z if there is the slightest chance that y will return false (the last snippet fails if i starts at -1 instead of 0).

If you need security, or you doubt the mechanisms that make your code work, or you did not understand anything in the previous paragraphs, do not be lazy and write if ... fi in your scripts:

  i=0
  if true; then
      ((i++))
  else
      ((i--))
  fi
  echo $i # prints 1

This applies to the Bourne shell as well:

  # Both command blocks are executed:
  $ true && { echo true; false; } || { echo false; true; }
  true
  false
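
The same effect can be traced without arithmetic by recording which branches run. The helper step below is hypothetical: it appends its first argument to a log and returns the status given as its second argument.

```shell
log=''
# Hypothetical helper: record that it ran, then return the requested status.
step() {
    log="$log$1 "
    return "$2"
}

# The "then" branch fails, so the "else" branch runs as well.
true && step then 1 || step else 0
chain_log=$log

# With if ... fi only the selected branch runs, whatever its exit status.
log=''
if true; then
    step then 1
else
    step else 0
fi
if_log=$log
```

After this runs, chain_log records both branches while if_log records only the first.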

21. Concerning UTF-8 and BOM (Byte-Order Mark, byte order mark)

In general: on Unix, UTF-8 encoded text does not use byte order marks. The encoding of a text is determined by the locale, the MIME type of the file, or some other metadata. Although a BOM does not harm a UTF-8 document as far as human readability is concerned, it can cause problems when such files are interpreted automatically as scripts, source code, configuration files, and so on. Files that start with a BOM should be regarded as just as foreign as files with DOS line endings.

In shell scripts: “Where UTF-8 is used transparently in 8-bit environments, the use of a BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" at the beginning of Unix shell scripts.” http://unicode.org/faq/utf_bom.html#bom5
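
One way to check a script for a BOM is to look at its first three bytes, which for UTF-8 are EF BB BF. The function name has_bom and the sample files below are invented for this sketch, and head -c is assumed to be available (it is widespread but not required by POSIX).

```shell
# Hypothetical helper: succeed if the file starts with the UTF-8 BOM (EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

bomdir=$(mktemp -d "${TMPDIR:-/tmp}/bom.XXXXXX") || exit 1
# \357\273\277 are the octal escapes for the three BOM bytes.
printf '\357\273\277#!/bin/sh\n' > "$bomdir/with_bom.sh"
printf '#!/bin/sh\n' > "$bomdir/plain.sh"

if has_bom "$bomdir/with_bom.sh"; then with_bom=yes; else with_bom=no; fi
if has_bom "$bomdir/plain.sh"; then plain=yes; else plain=no; fi

rm -rf "$bomdir"
```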

Source: https://habr.com/ru/post/47915/
