Linux basics from the founder of Gentoo. Part 2 (4/5): Text Processing and Redirects

In this part you will learn about many interesting and useful commands for working with text in Linux. It also covers the basics of working with I/O streams in bash.

Navigating Linux basics from the founder of Gentoo:

Part I
BASH: Basics of Navigation (Intro)
File and Directory Management
Links, and deleting files and directories
Glob substitutions (summary and links)

Part II
Regular Expressions (Intro)
Directory purposes, file search
Process management
Text processing and redirection
Kernel modules (summary and links)

Text processing

Redirection revisited

Earlier in this series of tutorials, we saw an example of using the > operator to redirect the output of a command to a file, as shown below:

$ echo "firstfile" > copyme

In addition to redirecting output to a file, we can take advantage of a powerful shell feature called pipes. Using a pipe, we can pass the output of one command to the input of another. Consider the following example:

$ echo "hi there" | wc
1 2 9

The | character connects the output of the command on its left to the input of the command on its right. In the example above, the echo command prints “hi there” followed by a line feed. This output would normally appear in the terminal, but the pipe redirects it to the wc command, which reports the number of lines, words, and characters.

A pipe example

Here is another simple example:

$ ls -s | sort -n

In this case, ls -s would normally print a listing of the current directory to the terminal, with the size in front of each file name. Instead, we pass the output to sort -n, which sorts it numerically. This is very convenient for finding the files that take up the most space in a directory.
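To see why the -n option matters, here is a small sketch that sorts the same numbers with and without it. The input values are made up for illustration, and LC_ALL=C is set only to make the text ordering predictable.

```shell
# Made-up sizes: plain sort compares them as text, sort -n as numbers.
text_order=$(printf '10\n9\n100\n' | LC_ALL=C sort)
numeric_order=$(printf '10\n9\n100\n' | LC_ALL=C sort -n)

echo "Text order:"
echo "$text_order"
echo "Numeric order:"
echo "$numeric_order"
```

Without -n, 100 sorts before 9 because the comparison is done character by character, which is rarely what you want for file sizes.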

The following examples are more complex and demonstrate the power and convenience of pipes. They use commands we have not covered yet, but don't let that distract you. Instead, concentrate on understanding how pipes work and how you can use them in your daily work with Linux.

The decompression pipe

To decompress and unpack a file, you could do the following:

$ bzip2 -d linux-2.4.16.tar.bz2
$ tar xvf linux-2.4.16.tar

The downside of this method is that it creates an intermediate, uncompressed file on disk. Since tar can read data directly from its standard input (instead of a named file), we can get the same end result using a pipe:

$ bzip2 -dc linux-2.4.16.tar.bz2 | tar xvf -

Woohoo! The compressed tarball was unpacked, and we did not need an intermediate file.

A longer pipe

Here is another pipeline example:

$ cat myfile.txt | sort | uniq | wc -l

We use cat to send the contents of myfile.txt to the sort command. When sort receives the input, it sorts the lines alphabetically and passes them on to uniq. uniq removes duplicate lines (it only detects duplicates that are adjacent, which is why its input must be sorted) and sends the result to wc -l. We saw the wc command earlier, but without any options. With the -l option, it prints only the number of lines and omits the word and character counts. As a result, this pipeline prints the number of unique lines in a text file.
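The pipeline can be tried on a throwaway file, as in the sketch below. The file contents and the temporary directory are assumptions made for the example; the second count shows what happens if the sort step is left out.

```shell
# Create a small sample file in a temporary directory (contents are made up).
workdir=$(mktemp -d)
printf 'pear\napple\npear\nbanana\napple\n' > "$workdir/myfile.txt"

# Count unique lines: sort first so that duplicates end up next to each other.
unique_count=$(cat "$workdir/myfile.txt" | sort | uniq | wc -l)

# Without sort, uniq only collapses duplicates that are already adjacent.
unsorted_count=$(cat "$workdir/myfile.txt" | uniq | wc -l)

echo "unique lines:" $unique_count
echo "without sort:" $unsorted_count
rm -r "$workdir"
```

The sample file has three distinct lines, but none of its duplicates are adjacent, so skipping sort leaves all five lines in place.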

Try creating a couple of test files in your text editor. Run this pipeline on them and look at the results you get.

The text processing whirlwind begins!

We will now take a whirlwind tour of the standard Linux text processing commands. Since we are covering many programs, we do not have room for examples of each one. Instead, we encourage you to read the man page of each command (by typing man echo, for example) and to spend some time playing with each command and its options. As a rule, these commands print their results to the terminal and do not modify any files directly. After this quick tour, we will take a deeper look at I/O redirection. So yes, there is light at the end of the tunnel. :)

echo prints its arguments to the terminal. Use the -e option if you want to include escape sequences in the output; for example, echo -e 'foo\nfoo' prints foo, moves to a new line, and then prints foo again. Use the -n option to keep echo from adding a newline to the end of the output, as it does by default.

cat prints the contents of the specified file to the terminal. It is handy as the first command in a pipe, for example, cat foo.txt | blah.

sort prints the contents of the file named on the command line in alphabetical order. Naturally, sort can also take its input from a pipe. Type man sort to see the options that control how sorting is done.

uniq takes an already sorted file or data stream (via a pipe) and removes duplicate lines.

wc prints the number of lines, words, and bytes in the specified file or in the input stream (from a pipe). Type man wc to learn how to fine-tune the output.

head prints the first ten lines of a file or stream. Use the -n option to specify how many lines should be displayed.

tail prints the last ten lines of a file or stream. Use the -n option to specify how many lines should be displayed.
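A quick sketch of the -n option for both commands follows. It assumes the seq utility is available to generate the numbers 1 to 20; seq is not covered in this series and is used here only as a convenient source of input.

```shell
# Generate the numbers 1 to 20, one per line, then keep only part of them.
first_three=$(seq 1 20 | head -n 3)
last_two=$(seq 1 20 | tail -n 2)

echo "$first_three"
echo "$last_two"
```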

tac is like cat, but prints the lines in reverse order; in other words, the last line is printed first.

expand converts tabs in the input to spaces. The -t option sets the tab stop width.

unexpand converts spaces in the input to tabs. The -t option sets the tab stop width.

cut extracts fields, separated by a specified delimiter character, from each line of an input file or stream. (Try echo 'abc def ghi jkl' | cut -d ' ' -f2,2. Translator's note.)
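As a slightly larger illustration, the sketch below pulls fields out of a colon-separated record. The record is invented for the example and only resembles the shape of a line in /etc/passwd.

```shell
# A made-up record with colon-separated fields.
record='alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'

# -d sets the field delimiter, -f selects fields by number.
login=$(echo "$record" | cut -d ':' -f1)
home_and_shell=$(echo "$record" | cut -d ':' -f6,7)

echo "$login"
echo "$home_and_shell"
```

When several fields are selected, cut prints them joined by the same delimiter it used to split the line.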

nl adds a line number to each line of input. Handy for printouts.

pr breaks a file into pages and numbers them; usually used for printing.

tr is a character translation tool; it maps specific characters in the input stream to specified characters in the output stream.
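Two small tr mappings, sketched on made-up input strings, show the idea of translating one set of characters into another:

```shell
# Map lowercase letters to uppercase.
upper=$(echo "hello world" | tr 'a-z' 'A-Z')

# Map every space to an underscore.
underscored=$(echo "hello big world" | tr ' ' '_')

echo "$upper"
echo "$underscored"
```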

sed is a powerful stream-oriented text editor. You can learn more about sed from the sed guides on the Funtoo website.

If you are planning to take the LPI exam, be sure to read the first two articles in that series.

awk is a handy language for line-by-line parsing and processing of an input stream according to specified patterns. To learn more about awk, read the awk series of guides on the Funtoo website.

od displays the input stream in octal, hexadecimal, and other formats.

split breaks large files into several smaller, more manageable pieces.

fmt rewraps long lines of text. It is less useful today because this feature is built into most text editors, but it is still a good command to know.

paste takes two or more files as input, joins them line by line, and prints the result. It can be handy for creating tables or columns of text.

join is similar to paste, but it combines two files on a common field (by default, the first field of each line).
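The difference between the two is easiest to see side by side. The sketch below uses two invented files whose lines share an ID in the first field and are already sorted by it, which join expects.

```shell
workdir=$(mktemp -d)
# Two made-up files that share an ID in the first field, already sorted by it.
printf '1 apple\n2 banana\n3 cherry\n' > "$workdir/names.txt"
printf '1 red\n2 yellow\n3 dark-red\n' > "$workdir/colors.txt"

# paste glues line N of each file together, separated by a tab.
pasted=$(paste "$workdir/names.txt" "$workdir/colors.txt")

# join matches lines whose first field is equal and merges the remaining fields.
joined=$(join "$workdir/names.txt" "$workdir/colors.txt")

echo "$pasted"
echo "$joined"
rm -r "$workdir"
```

paste keeps both copies of the ID because it knows nothing about fields, while join prints the shared field only once.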

tee copies its standard input both to a file and to the screen. This is useful when you want to keep a log of something and also watch the progress on the screen.
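A minimal sketch of tee in the middle of a pipe: the data it reads is saved to a file and also passed along to the next command. The file name log.txt and the input lines are assumptions made for the example.

```shell
workdir=$(mktemp -d)

# tee writes its input to log.txt and also passes it on down the pipe.
line_count=$(printf 'step 1\nstep 2\n' | tee "$workdir/log.txt" | wc -l)
logged=$(cat "$workdir/log.txt")

echo "lines seen after tee:" $line_count
echo "saved in the log:"
echo "$logged"
rm -r "$workdir"
```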

The whirlwind is over! Redirection

Similar to using > on the command line, you can use < to redirect a file into a command as its input. For many commands, you can simply specify the file name on the command line instead. Unfortunately, some programs only work with standard input.
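The sketch below contrasts the two cases on a made-up file: sort accepts either a file name or redirected input, while tr takes no file name argument and can only be fed through standard input.

```shell
workdir=$(mktemp -d)
printf 'banana\napple\n' > "$workdir/fruit.txt"

# sort accepts a file name, so both forms give the same result.
by_name=$(sort "$workdir/fruit.txt")
by_redirect=$(sort < "$workdir/fruit.txt")

# tr takes no file name argument and reads only standard input.
shouted=$(tr 'a-z' 'A-Z' < "$workdir/fruit.txt")

echo "$by_redirect"
echo "$shouted"
rm -r "$workdir"
```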

Bash and other shells support the concept of a “herefile” (more commonly known as a here document). This lets you supply input to a command as a series of lines, followed by a marker line that signals the end of the input. This is easiest to show with an example:

$ sort <<END
apple
cranberry
banana
END
apple
banana
cranberry

In the example above, we enter the words apple, cranberry and banana, followed by "END" to indicate the end of the input. The sort program then returns our words in alphabetical order.

Using ">>"

You might expect >> to be somewhat similar to <<, but it is not. It simply appends output to a file rather than overwriting it each time, as > does. Example:

$ echo Hi > myfile
$ echo there. > myfile
$ cat myfile
there.

Oops! We lost the “Hi” part! This is what we meant:

$ echo Hi > myfile
$ echo there. >> myfile
$ cat myfile
Hi
there.

That's better!

Thanks to Dmitry Minsky (Dmitry.Minsky@gmail.com) for the translation.

To be continued...

About the authors

Daniel Robbins

Daniel Robbins is the founder of the Gentoo community and the creator of the Gentoo Linux operating system. Daniel lives in New Mexico with his wife, Mary, and two energetic daughters. He is also the founder and head of Funtoo, and has written many technical articles for IBM developerWorks, Intel Developer Services, and the C/C++ Users Journal.

Chris Houser

Chris Houser has been a UNIX proponent since 1994, when he joined the administration team at Taylor University (Indiana, USA), where he earned a bachelor's degree in computer science and mathematics. Since then, he has worked in many areas, including web applications, video editing, UNIX drivers, and cryptography. He currently works at Sentry Data Systems. Chris has also contributed to many free software projects, such as Gentoo Linux and Clojure, and co-authored The Joy of Clojure.

Aron Griffis

Aron Griffis lives in Boston, where he has spent the last decade working for Hewlett-Packard on projects such as UNIX network drivers for Tru64, Linux security certification, Xen and KVM virtualization, and, most recently, the HP ePrint platform. In his spare time, Aron likes to ponder programming problems while riding his bike, juggling bits, or cheering on the Boston Red Sox baseball team.

Source: https://habr.com/ru/post/105926/
