Linux basics from the founder of Gentoo. Part 2 (1/5): Regular Expressions

Foreword

About this tutorial

Welcome to Administration Basics, the second of four tutorials designed to prepare you for exam 101 of the Linux Professional Institute. In this part, we will look at how to use regular expressions to search files for text patterns. Next, you will get acquainted with the Filesystem Hierarchy Standard (FHS), and we will show you how to locate files on your system. After that, you will learn how to take full control of Linux processes: running them in the background, listing them, detaching them from the terminal, and much more. Then comes a quick introduction to pipelines, redirection, and text processing commands. Finally, we will introduce you to Linux kernel modules.

This part of the tutorial (Part 2) is ideal for those who already have a good basic knowledge of bash and want a solid introduction to basic Linux administration tasks. If you are new to Linux, we recommend that you first complete Part 1 of this series. For some, much of this material will be new; more experienced Linux users may find it a great way to round out their basic administration skills.

If you studied the first edition of this tutorial for some reason other than preparing for the LPI exam, you probably don't need to reread this edition. However, if you are planning to take the exam, we strongly recommend reading this revised version of the tutorial.

Regular expressions

What is a "regular expression"?

A regular expression (abbreviated “regexp” or “regex”) is a special syntax used to describe text patterns. On Linux systems, regular expressions are widely used to find text that matches a pattern, as well as for search-and-replace operations on text streams.

Comparison with globbing

As we begin looking at regular expressions, you may notice that their syntax looks very similar to the filename globbing syntax we covered in Part 1. Don't be fooled, though: the similarity is only superficial. Regular expressions and glob patterns, even when they look alike, are fundamentally different things.

Simple substring

With that caution out of the way, let's look at the most basic kind of regular expression, the simple substring. For this we will use grep, a command that scans the contents of a file for a given regular expression. grep prints every line that matches the regular expression and ignores the rest:

 $ grep bash /etc/passwd 
 operator:x:11:0:operator:/root:/bin/bash
 root:x:0:0::/root:/bin/bash
 ftp:x:40:1::/home/ftp:/bin/bash

Above, the first argument to grep is the regex; the second is the file name. grep read each line of /etc/passwd and applied the simple substring regex “bash” to it, looking for a match. If a match was found, grep printed the entire line; otherwise, the line was ignored.
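To experiment without depending on your system's real /etc/passwd, here is a minimal sketch that runs the same search against a temporary file. It assumes a POSIX shell with grep and mktemp available. The sample lines are hypothetical: the three lines above plus one line that does not mention bash.

```shell
# Create a scratch directory that is removed when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample data: the three lines shown above, plus one line
# that does not contain "bash" so we can watch grep skip it.
cat > "$tmp/passwd" <<'EOF'
operator:x:11:0:operator:/root:/bin/bash
root:x:0:0::/root:/bin/bash
ftp:x:40:1::/home/ftp:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
EOF

# Print every line containing the substring "bash"; the sync line is skipped.
grep bash "$tmp/passwd"

# -c prints the number of matching lines instead of the lines themselves.
grep -c bash "$tmp/passwd"
```

The heredoc delimiter is quoted ('EOF') so the shell copies the sample lines verbatim.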

Understanding simple substrings

In general, if you are looking for a substring, you can simply type it literally, without any “special” characters. You only need to take special care if your substring contains ., *, [, ] or \ (or +, when using extended regular expressions with grep -E): these characters must be escaped with a backslash, and the substring should be enclosed in quotes. Here are some examples of regular expressions that search for a simple substring:

/tmp (searches for the literal string /tmp)
"\[box\]" (searches for the literal string [box])
"\*funny\*" (searches for the literal string *funny*)
"ld\.so" (searches for the literal string ld.so)

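The following sketch feeds a few hypothetical lines to grep on standard input to show why escaping matters: an unescaped . or [ ] acts as a metacharacter, while the escaped version matches only the literal text. The variable name samples is an assumption for illustration.

```shell
# Hypothetical test lines, fed to grep on standard input.
samples='/tmp/file
[box]
a box
*funny*
ld.so
ldXso'

# Escaped: matches only the literal text "ld.so".
printf '%s\n' "$samples" | grep "ld\.so"

# Unescaped: "." matches any character, so "ldXso" matches as well.
printf '%s\n' "$samples" | grep "ld.so"

# Escaped brackets match the literal string "[box]" ...
printf '%s\n' "$samples" | grep "\[box\]"

# ... while unescaped [box] is a bracket expression matching any single
# b, o or x, so every line containing one of those letters matches.
printf '%s\n' "$samples" | grep "[box]"

# Escaped asterisks match literal "*" characters.
printf '%s\n' "$samples" | grep "\*funny\*"
```

Inside double quotes the shell leaves \[, \* and \. untouched, so grep receives the backslashes exactly as written.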
Metacharacters

By using metacharacters, regular expressions can perform much more complex searches than the examples we have looked at so far. One of these metacharacters is “.” (a period), which matches any single character:

 $ grep dev.sda /etc/fstab 
 /dev/sda3 / reiserfs noatime,ro 1 1
 /dev/sda1 /boot reiserfs noauto,noatime,notail 1 2
 /dev/sda2 swap swap sw 0 0
 #/dev/sda4 /mnt/extra reiserfs noatime,rw 1 1

In this example, the text dev.sda does not appear literally in any of the lines in /etc/fstab. However, grep is not searching for the literal string dev.sda, but for the pattern dev.sda. Remember that “.” matches any single character. As you can see, the “.” metacharacter works the same way as the “?” metacharacter in glob expansion.
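Here is a minimal sketch of the “.” metacharacter. It uses hypothetical fstab lines copied from the output above, plus a proc line that contains no device name. The second command shows that “.” always consumes exactly one character.

```shell
# Hypothetical fstab lines, copied from the output above, plus a
# "proc" line that contains no device name.
fstab='/dev/sda3 / reiserfs noatime,ro 1 1
/dev/sda1 /boot reiserfs noauto,noatime,notail 1 2
/dev/sda2 swap swap sw 0 0
#/dev/sda4 /mnt/extra reiserfs noatime,rw 1 1
proc /proc proc defaults 0 0'

# "." matches the "/" between "dev" and "sda", so all four device lines match.
printf '%s\n' "$fstab" | grep dev.sda

# "." matches exactly one character: "/" and "-" both work, but
# "devsda" (no character between "dev" and "sda") does not match.
printf '%s\n' dev/sda dev-sda devsda | grep dev.sda
```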

Using []

If we want to match a character more specifically than “.” does, we can use [ and ] (square brackets) to specify the subset of characters to match:

 $ grep dev.sda[12] /etc/fstab 
 /dev/sda1 /boot reiserfs noauto,noatime,notail 1 2
 /dev/sda2 swap swap sw 0 0

As you may have noticed, this particular construct works identically to the “[]” construct in glob filename expansion. Again, this is one of the tricky things about learning regular expressions: the syntax is similar, but not identical, to glob syntax, which can be confusing.
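To see the regex and glob versions of [12] side by side, here is a small sketch. It runs grep on a list of hypothetical device names, then tests the same names against a glob pattern in a shell case statement, which applies glob matching to the whole word.

```shell
# Hypothetical device names matching the sample fstab above.
devices='/dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4'

# Regex: [12] matches exactly one character, either "1" or "2".
# The pattern is quoted so the shell cannot treat it as a filename glob.
# $devices is left unquoted on purpose so it splits into separate words.
printf '%s\n' $devices | grep 'dev.sda[12]'

# Glob: the same bracket expression inside a case pattern.
for dev in $devices; do
  case $dev in
    /dev/sda[12]) echo "glob matches $dev" ;;
  esac
done
```

Here both commands select /dev/sda1 and /dev/sda2.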

Using [^]

You can invert the meaning of the square brackets by placing ^ immediately after the [. The brackets then match any character that is NOT listed inside them. Again, note that we use [^] with regular expressions, but [!] with globs:

 $ grep dev.sda[^12] /etc/fstab 
 /dev/sda3 / reiserfs noatime,ro 1 1
 #/dev/sda4 /mnt/extra reiserfs noatime,rw 1 1
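Here is the same comparison for negated brackets, using hypothetical device names that follow the sample fstab shown earlier. Note that [^12] must still match exactly one character, so a bare /dev/sda with nothing after “sda” is not printed. The glob side uses [!12] in a case pattern.

```shell
# Hypothetical device names, plus a bare "/dev/sda" with nothing after it.
devices='/dev/sda /dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4'

# Regex: [^12] matches one character that is NOT "1" or "2".
# It still has to match a character, so the bare "/dev/sda" is not printed.
printf '%s\n' $devices | grep 'dev.sda[^12]'

# Glob: the negated bracket expression is written [!12] instead.
for dev in $devices; do
  case $dev in
    /dev/sda[!12]) echo "glob matches $dev" ;;
  esac
done
```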

Different syntax

It is very important to note that the syntax inside square brackets is fundamentally different from the rest of the regular expression. For example, if you put a “.” inside square brackets, the brackets will match a literal “.”, just as they matched 1 and 2 in the example above. In contrast, a “.” outside square brackets is interpreted as a metacharacter unless it is preceded by a backslash (\). We can take advantage of this to print the lines in /etc/fstab that contain the literal string dev.sda:

$ grep dev[.]sda /etc/fstab

Also, we could type:

$ grep "dev\.sda" /etc/fstab

Neither of these regular expressions is likely to match any line in your /etc/fstab file.
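A quick way to see the difference between “.” outside and inside square brackets is to run all three patterns over the same hypothetical lines, only one of which contains the literal text dev.sda:

```shell
# Hypothetical lines: only the second contains a literal "dev.sda".
samples='/dev/sda1 /boot
dev.sda
devXsda'

# Outside brackets, "." matches any character: all three lines match.
printf '%s\n' "$samples" | grep 'dev.sda'

# Inside brackets, "." is an ordinary character: only "dev.sda" matches.
printf '%s\n' "$samples" | grep 'dev[.]sda'

# A backslash-escaped dot has the same effect as [.].
printf '%s\n' "$samples" | grep 'dev\.sda'
```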

The * metacharacter

Some metacharacters don't match anything on their own; instead, they modify the meaning of the preceding character. One of these is * (asterisk), which matches zero or more repetitions of the preceding character. Note that this means * has a different meaning in regular expressions than in globbing. Here are a few examples; pay special attention to the cases where regular expression matching differs from glob matching:

ab*c matches “abbbbc”, but not “abqc” (as a glob, the pattern matches both strings. Can you see why?)
ab*c matches “abc”, but not “abbqbbc” (again, as a glob, the pattern matches both strings)
ab*c matches “ac”, but not “cba” (as a glob, the pattern matches neither “ac” nor “cba”)
b[cq]*e matches “bqe” and “be” (as a glob, it matches “bqe” but not “be”)
b[cq]*e matches “bccqqe”, but not “bccc” (as a glob, it matches the first but not the second)
b[cq]*e matches “bqqcce”, but not “cqe” (the same is true for a glob)
b[cq]*e matches “bbbeee” (but a glob does not)
.* matches any string (as a glob, it matches only strings beginning with “.”)
foo.* matches any substring beginning with “foo” (as a glob, it matches only strings beginning with the four characters “foo.”)

So, to recap: the string “ac” matches the regular expression “ab*c” because the asterisk also allows the preceding expression (b) to appear zero times. Again, note that the * metacharacter in regular expressions means something completely different from * in globs.
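The examples above can be checked mechanically. The sketch below defines three hypothetical helper functions. regex_match asks grep whether the pattern occurs anywhere in a string. glob_match asks a shell case statement whether the whole string matches the pattern as a glob. check prints both answers side by side. grep looks for a match anywhere in the line, while a glob must match the entire string, and that difference explains results like “bbbeee”.

```shell
# Hypothetical helper: succeeds if grep finds regex $1 anywhere in string $2.
regex_match() { printf '%s\n' "$2" | grep -q -- "$1"; }

# Hypothetical helper: succeeds if the whole string $2 matches glob $1.
# $1 is deliberately unquoted in the case pattern so that *, ? and [ ]
# act as glob characters instead of being compared literally.
glob_match() { case $2 in $1) return 0 ;; esac; return 1; }

# Hypothetical helper: print the pattern, the string, and both results.
check() {
  if regex_match "$1" "$2"; then r=yes; else r=no; fi
  if glob_match "$1" "$2"; then g=yes; else g=no; fi
  printf '%-8s %-8s regex:%-3s glob:%s\n' "$1" "$2" "$r" "$g"
}

check 'ab*c'    abbbbc   # regex: yes, glob: yes
check 'ab*c'    abqc     # regex: no,  glob: yes
check 'ab*c'    ac       # regex: yes, glob: no
check 'b[cq]*e' be       # regex: yes, glob: no
check 'b[cq]*e' bbbeee   # regex: yes ("be" is a substring), glob: no
check '.*'      hello    # regex: yes, glob: no (must start with ".")
check 'foo.*'   foo.txt  # regex: yes, glob: yes
check 'foo.*'   foobar   # regex: yes, glob: no (must start with "foo.")
```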

Start and end of line

The last metacharacters we will look at in detail are ^ and $, which match the beginning and end of a line, respectively. Putting ^ at the beginning of your regex “anchors” the pattern to the start of the line. In the following example, we use the regular expression ^#, which matches any line beginning with the # character:

$ grep ^# /etc/fstab
# /etc/fstab: static file system information.
#
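To see what the ^ anchor changes, this sketch runs an unanchored and an anchored search over the same hypothetical lines, one of which contains # somewhere other than the first column:

```shell
# Hypothetical lines: two comments, one commented-out device, one
# ordinary entry, and one line with a "#" that is not at the start.
samples='# /etc/fstab: static file system information.
#
#/dev/sda4 /mnt/extra reiserfs noatime,rw 1 1
/dev/sda3 / reiserfs noatime,ro 1 1
this line has a # in the middle'

# Unanchored: "#" anywhere in the line, so four lines match.
printf '%s\n' "$samples" | grep '#'

# Anchored with ^: only the three lines that start with "#" match.
printf '%s\n' "$samples" | grep '^#'
```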

Full-line regular expressions

^ and $ can be combined to match an entire line. For example, the following regular expression matches lines that begin with the # character and end with “.”, with any number of characters in between:

$ grep '^#.*\.$' /etc/fstab
# /etc/fstab: static file system information.

In the example above, we enclosed the regular expression in single quotes so that the shell does not interpret any of its characters (such as $, * and \) and passes it to grep exactly as typed. Without the quotes, for instance, the shell would strip the backslash from \. before grep ever saw it.
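Here is a minimal sketch of the full-line pattern, run against a few hypothetical lines. Only a line that both starts with # and ends with a period is printed. For comparison, the second command uses only the $ anchor.

```shell
# Hypothetical lines to test the pattern against.
samples='# /etc/fstab: static file system information.
#
# no trailing period
not a comment.'

# ^# anchors the start, \.$ requires a literal "." as the last character,
# and .* allows anything (including nothing) in between. The lone "#"
# line fails because \. still needs a character after the "#".
printf '%s\n' "$samples" | grep '^#.*\.$'

# For comparison, $ alone: every line whose last character is a period.
printf '%s\n' "$samples" | grep '\.$'
```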

Continued ...

About the authors

Daniel Robbins

Daniel Robbins is the founder of the Gentoo community and the creator of the Gentoo Linux operating system. Daniel lives in New Mexico with his wife, Mary, and two energetic daughters. He is also the founder and lead of Funtoo, and has written many technical articles for IBM developerWorks, Intel Developer Services and the C/C++ Users Journal.

Chris Houser

Chris Houser has been a UNIX proponent since 1994, when he joined the administration team at Taylor University (Indiana, USA), where he earned a bachelor's degree in computer science and mathematics. Since then, he has worked in many areas, including web applications, video editing, UNIX drivers, and cryptographic security. He currently works at Sentry Data Systems. Chris has also contributed to many free software projects, such as Gentoo Linux and Clojure, and co-authored The Joy of Clojure.

Aron Griffis

Aron Griffis lives in Boston, where he has spent the last decade working at Hewlett-Packard on projects such as UNIX network drivers for Tru64, Linux security certification, Xen and KVM virtualization and, most recently, the HP ePrint platform. In his spare time, Aron likes to ponder programming problems while riding his bike, juggling bits, or cheering on the Boston Red Sox.

Source: https://habr.com/ru/post/102442/
