Bash scripts: start
Bash scripts, part 2: loops
Bash scripts, part 3: command line options and keys
Bash scripts, part 4: input and output
Bash scripts, part 5: signals, background tasks, script management
Bash scripts, part 6: functions and library development
Bash scripts, part 7: sed and text processing
Bash scripts, part 8: awk data processing language
Bash scripts, part 9: regular expressions
Bash scripts, part 10: practical examples
Bash scripts, part 11: expect and automate interactive utilities
An awk call in its general form looks like this:

$ awk options program file

Awk accepts the following options:

- -F fs - specify the field separator character used in the records.
- -f file - the name of the file from which to read the awk script.
- -v var=value - declare a variable and set its initial value, which awk will use.
- -mf N - the maximum number of fields to process in the data file.
- -mr N - the maximum record size in the data file.
- -W keyword - set the compatibility mode or the warning level.
The program part contains the awk script itself, written by the programmer: commands for reading the data, processing it, and outputting the results. Here is the simplest example:

$ awk '{print "Welcome to awk command tutorial"}'

No input file is given here, so awk expects data from STDIN. If you type something and press Enter, awk processes the entered line using the script specified at startup. Awk processes text from the input stream line by line; in this it resembles sed. In our case awk does nothing with the data itself: in response to each new line it receives, it simply prints the text given in the print command. To stop the program, press CTRL + D, which signals the end of input.
Awk automatically assigns a variable to each data field it finds in a record:

- $0 - represents the entire line of text (the record).
- $1 - the first field.
- $2 - the second field.
- $n - the n-th field.

Here is an example that processes a file named myfile:

$ awk '{print $1}' myfile
This command uses the $1 variable, which gives access to the first field of each line, and prints it on the screen. Whitespace, the default separator, is not always suitable, so awk has the -F key, which lets you specify the separator needed for processing a particular file:

$ awk -F: '{print $1}' /etc/passwd

This command prints the first field of each line of /etc/passwd. Since colons are used as delimiters in this file, that is the character passed to awk after the -F key. Fields can also be modified from a script:

$ echo "My name is Tom" | awk '{$4="Adam"; print $0}'
Here the first command in the script assigns a new value to $4, and the second prints the entire modified line. Larger awk scripts are more convenient to keep in separate files and pass to awk with the -f key. Prepare a file called testfile, in which we write the following:

{print $1 " has a home directory at " $6}

Now call awk, telling it to use this file as the source of commands for processing /etc/passwd:

$ awk -F: -f testfile /etc/passwd

When this command runs, awk prints, for each line, the user name that falls into the $1 variable and the home directory that falls into $6. Notice that the script file is passed with the -f key, and the field separator, a colon in our case, with the -F key.

Awk scripts can also declare and use variables. For example, let's move the output string into a variable:

{
text = " has a home directory at "
print $1 text $6
}

Here the string is stored in the text variable, which is then used in the print command. If you reproduce the previous example with this code written to testfile, the output will be the same.

Awk also lets you run commands before the main data processing begins. This is done with the BEGIN keyword.
The commands that follow BEGIN are executed before data processing begins. In its simplest form it looks like this:

$ awk 'BEGIN {print "Hello World!"}'

Let's combine it with data processing:

$ awk 'BEGIN {print "The File Contents:"} {print $0}' myfile

When this command runs, awk first executes the BEGIN block, after which the data is processed. Be careful with single quotes when using such constructs on the command line. Notice that both the BEGIN block and the stream processing commands form a single string from awk's point of view. The first single quote delimiting this string is before BEGIN. The second is after the closing brace of the data processing command.
The END keyword allows you to specify commands that will be executed after data processing ends:

$ awk 'BEGIN {print "The File Contents:"} {print $0} END {print "End of File"}' myfile

After printing the contents of the file, awk executes the commands of the END block. This is a useful feature; with its help, for example, you can build a report footer. Now let's write a script with the following content and save it in a file called myscript:

BEGIN {
print "The latest list of users and shells"
print " UserName \t HomePath"
print "-------- \t -------"
FS=":"
}

{
print $1 " \t " $6
}

END {
print "The end"
}

Here, in the BEGIN block, the heading of the tabular report is created. In the same section we specify the field separator. After the file has been processed, thanks to the END block, the system informs us that the work is done. Run the script:

$ awk -f myscript /etc/passwd
Above we used the variables $1, $2, $3, which extract field values, and we worked with a few other variables. In fact, there are quite a lot of them. Here are some of the most commonly used:

- FIELDWIDTHS - a space-separated list of numbers that defines the exact width of each data field, taking field separators into account.
- FS - a variable you already know that specifies the input field separator character.
- RS - a variable that specifies the record separator character.
- OFS - the field separator for the output of an awk script.
- ORS - the record separator for the output of an awk script.
By default, the OFS variable is set to a space. It can be set as needed for output purposes:

$ awk 'BEGIN{FS=":"; OFS="-"} {print $1,$6,$7}' /etc/passwd
The FIELDWIDTHS variable allows records to be read without using a field separator character. In some data, fields are not delimited at all but laid out in columns of fixed width. In such cases, you set the FIELDWIDTHS variable so that its contents match the layout of the data. Once FIELDWIDTHS is set, awk ignores the FS variable and finds the data fields according to the width information given in FIELDWIDTHS. Suppose there is a file testfile containing the following data:

1235.9652147.91
927-8.365217.27
36257.8157492.5
$ awk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}' testfile
Here awk splits the input according to the widths set in FIELDWIDTHS; as a result, the numbers and other characters in the lines are broken up according to the specified field widths.

The RS and ORS variables define how records are processed. By default, RS and ORS are set to the newline character. This means that awk treats each new line of text as a new record and outputs each record on a new line. Sometimes, though, the fields of one logical record are spread over several lines. Suppose there is a file named addresses with the following contents:
Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754
If you try to read this data with FS and RS left at their default values, awk will treat each line as a separate record and split the fields on spaces. This is not what we need in this case. To solve the problem, FS must be set to the newline character. This tells awk that each line of the data stream is a separate field. In addition, RS must be set to an empty string. Notice that in the file, the data blocks for different people are separated by a blank line. As a result, awk will treat the blank lines as record separators. Here is how to do all this:

$ awk 'BEGIN{FS="\n"; RS=""} {print $1,$3}' addresses
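ORS, the output record separator, is listed above but not demonstrated; as a minimal sketch (the input words are made up), setting it joins the output records with a string of your choice instead of a newline:

```shell
# Print each input record followed by ", " instead of a newline.
printf 'red\ngreen\nblue\n' | awk 'BEGIN{ORS=", "} {print $0}'
# prints "red, green, blue, " with a trailing separator and no final newline
```

Note that awk appends ORS after every record, including the last one, so a trailing separator remains.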
Here are some more built-in variables:

- ARGC - the number of command line arguments.
- ARGV - an array of the command line arguments.
- ARGIND - the index in the ARGV array of the file currently being processed.
- ENVIRON - an associative array of environment variables and their values.
- ERRNO - the system error code that may arise when reading or closing input files.
- FILENAME - the name of the input data file.
- FNR - the number of the current record in the current data file.
- IGNORECASE - if this variable is set to a non-zero value, case is ignored during processing.
- NF - the total number of data fields in the current record.
- NR - the total number of records processed.
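Most of these variables appear in the examples below; FILENAME does not, so here is a small sketch (the file name /tmp/demo.txt and its contents are arbitrary):

```shell
# Create a throwaway data file, then prefix each record with the name
# of the file it came from, which awk exposes in FILENAME.
printf 'alpha\nbeta\n' > /tmp/demo.txt
awk '{print FILENAME ": " $0}' /tmp/demo.txt
# prints:
# /tmp/demo.txt: alpha
# /tmp/demo.txt: beta
```

This becomes genuinely useful when several files are passed at once, since FILENAME changes as awk moves from file to file.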
The ARGC and ARGV variables let you work with command line arguments. Note that the script passed to awk does not end up in the ARGV array of arguments. Let's write the following script:

$ awk 'BEGIN{print ARGC,ARGV[1]}' myfile

After running it, we learn that there are two command line arguments, and that ARGV[1] holds the name of the file being processed. The array element with index 0 in this case contains "awk".
The ENVIRON variable is an associative array with environment variables. Let's try it out:

$ awk '
BEGIN{
print ENVIRON["HOME"]
print ENVIRON["PATH"]
}'

Environment variables can also be used in awk scripts without ENVIRON. You can do this, for example, as follows:

$ echo | awk -v home=$HOME '{print "My home is " home}'
The NF variable allows you to access the last data field in a record without knowing its exact position:

$ awk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd

This variable holds the numeric index of the last field in the record, so it is accessed by placing a $ sign in front of NF.
The FNR and NR variables, though they may seem similar, are actually different. The FNR variable stores the number of records processed in the current file, while NR stores the total number of records processed overall. Consider a couple of examples, passing the same file to awk twice:

$ awk 'BEGIN{FS=","}{print $1,"FNR="FNR}' myfile myfile

To awk, passing the same file twice is no different from passing two separate files. Notice that FNR is reset at the beginning of each file.
Let's see how the NR variable behaves in this situation:

$ awk '
BEGIN {FS=","}
{print $1,"FNR="FNR,"NR="NR}
END{print "There were",NR,"records processed"}' myfile myfile

FNR, as in the previous example, is reset at the beginning of each file, but NR keeps its value when moving to the next file.

Besides the built-in variables, awk lets you declare and use your own. For example:

$ awk '
BEGIN{
test="This is a test"
print test
}'
Awk supports the if-then-else conditional found in many programming languages. The single-line version of the statement is the if keyword, followed by the expression to check in parentheses, and then the command to execute if the expression is true. Suppose a file testfile contains the following:

10
15
6
33
45

Let's write a script that prints the numbers from this file that are greater than 20:

$ awk '{if ($1 > 20) print $1}' testfile
If you need to execute several commands in the if block, enclose them in braces:

$ awk '{
if ($1 > 20)
{
x = $1 * 2
print x
}
}' testfile

The awk if statement also supports an else block:

$ awk '{
if ($1 > 20)
{
x = $1 * 2
print x
} else
{
x = $1 / 2
print x
}}' testfile

The else branch can also be part of a single-line conditional statement, with only one command per branch. In this case, after the if branch, immediately before else, you must put a semicolon:

$ awk '{if ($1 > 20) print $1 * 2; else print $1 / 2}' testfile
The while loop lets you iterate over sets of data, checking a condition that stops the loop. Suppose there is a file myfile whose records we want to process in a loop:

124 127 130
112 142 135
175 158 245

Let's write a script that finds the average of the three fields of each record:

$ awk '{
total = 0
i = 1
while (i < 4)
{
total += $i
i++
}
avg = total / 3
print "Average:",avg
}' myfile
The while loop walks over the fields of each record, accumulating their sum in the total variable and incrementing the counter variable i by 1 on each pass. When i reaches 4, the condition at the top of the loop becomes false and the loop ends, after which the remaining commands execute: calculating the average of the numeric fields of the current record and printing the result.

In while loops you can use the break and continue commands. The first lets you exit the loop early and move on to the commands that follow it. The second lets you skip the rest of the current iteration and move on to the next one. Here is how the break command works:

$ awk '{
total = 0
i = 1
while (i < 4)
{
total += $i
if (i == 2)
break
i++
}
avg = total / 2
print "The average of the first two elements is:",avg
}' myfile
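The continue command was mentioned but not demonstrated; as a sketch (the input record is made up), this sums the first and third fields while skipping the second:

```shell
# Sum fields 1 and 3 of each record; continue skips the rest of the
# iteration when i == 2, so the second field is never added.
printf '10 20 30\n' | awk '{
total = 0
i = 1
while (i < 4)
{
if (i == 2) { i++; continue }
total += $i
i++
}
print "Sum without the second field:", total
}'
# prints: Sum without the second field: 40
```

Note that the counter must still be incremented before continue, otherwise the loop would test the same i forever.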
Awk also supports for loops. The same average calculation can be written with for:

$ awk '{
total = 0
for (i = 1; i < 4; i++)
{
total += $i
}
avg = total / 3
print "Average:",avg
}' myfile

Since the loop itself manages the counter, we do not need, as in the case of while, to increment it ourselves.
The printf command in awk allows you to output formatted data. It lets you customize the appearance of the output using templates, which may contain plain text and format specifiers. A format specifier defines how a value should be rendered and has the following form:

%[modifier]control-letter

Here are some of the control letters:

- c - treats the number passed to it as an ASCII character code and outputs that character.
- d - outputs a decimal integer.
- i - the same as d.
- e - outputs a number in scientific (exponential) notation.
- f - outputs a floating point number.
- g - outputs a number in either scientific notation or floating point format, whichever is shorter.
- o - outputs the octal representation of a number.
- s - outputs a text string.
Here is how to format output with printf:

$ awk 'BEGIN{
x = 100 * 100
printf "The result is: %e\n", x
}'

Here the result of the multiplication is printed in scientific notation thanks to the %e specifier.
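The modifier part of the template controls field width and alignment; a small sketch (the names and scores are made up):

```shell
# %-10s left-aligns a string in a 10-character column;
# %5d right-aligns an integer in a 5-character column.
awk 'BEGIN{
printf "%-10s %5s\n", "Name", "Score"
printf "%-10s %5d\n", "Tom", 92
printf "%-10s %5d\n", "Adam", 7
}'
```

The fixed widths make the values line up in columns regardless of how long each name or number is, which is handy for tabular reports like the myscript example above.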
Awk has a set of built-in mathematical functions. Here are some of them:

- cos(x) - the cosine of x (x is given in radians).
- sin(x) - the sine of x.
- exp(x) - the exponential function of x.
- int(x) - the integer part of x, truncated toward zero.
- log(x) - the natural logarithm of x.
- rand() - a random floating point number between 0 and 1.
- sqrt(x) - the square root of x.

Here is an example of using one of these functions:

$ awk 'BEGIN{x=exp(5); print x}'
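The other functions work the same way; as a quick sketch, here are sqrt, int, and rand in action (the seed value 42 is arbitrary):

```shell
awk 'BEGIN{
print sqrt(16)       # prints 4
print int(7.9)       # prints 7
srand(42)            # seed the generator so the run is repeatable
r = rand()
if (r >= 0 && r < 1)
  print "rand() is in [0, 1)"
}'
```

Note that rand() alone returns the same sequence on every run unless you call srand() first; pairing it with srand() and int() is the usual way to get random integers in a range.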
In addition to mathematical functions, awk has string functions. For example, toupper converts a string to upper case:

$ awk 'BEGIN{x = "likegeeks"; print toupper(x)}'
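toupper is not the only one; as a sketch, two more standard string functions applied to the same sample string:

```shell
# length() returns the number of characters in a string;
# substr() extracts a portion of it (start position, count).
awk 'BEGIN{
x = "likegeeks"
print length(x)        # prints 9
print substr(x, 1, 4)  # prints: like
}'
```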
Awk also allows you to define your own functions. For example, let's declare a function that prints a user name and home directory, and call it for each record of /etc/passwd:

$ awk '
function myprint()
{
printf "The user %s has home path at %s\n", $1,$6
}
BEGIN{FS=":"}
{
myprint()
}' /etc/passwd

Here the script declares the myprint function, then calls it while processing the data.

Source: https://habr.com/ru/post/327754/