
About replacements in Vim using regular expressions

Hi, Habr! It's no secret that good old Vim is very well suited to a wide range of tasks. I would like to talk a little about one of the components that make our favorite editor as powerful as it is: its toolkit for replacements using regular expressions. I plan to build my story around how I solved a few specific problems, supplementing it with some basic background information.


On the one hand, all of this is covered in great detail in the built-in help, available via :help usr_27.txt, which is where everything discussed here was learned. On the other hand, when I needed to solve the problems described, I spent considerable time on them. This lets me hope that my text will still be useful. I should say up front that I am far from being a programmer, so my terminology may seem strange or ridiculous; please forgive me for this.

One day I needed to remove all tags from an HTML file. After a little thought, I decided that I simply needed to replace everything enclosed in triangular brackets with nothing, i.e. make a replacement of the form

<'anything'> -> '' (nothing).

Search and replace in Vim is done with the :substitute command, though it is more convenient to use its abbreviation, :s. The general syntax of this command looks roughly like this:

:{limits}s/{what to replace}/{with what to replace}/{options}

The {limits} element specifies the region in which the replacement should take place. If it is omitted, the search and replacement are performed only in the line where the cursor is located. To replace throughout the entire file, use the '%' symbol. To search and replace in the region starting at line l1 and ending at line l2, {limits} must have the form 'l1,l2'; for example, ':14,17s/' will search and replace in lines 14 through 17. Two lines deserve special mention: the line with the cursor, whose number is denoted by a dot, and the last line, whose number is denoted by a dollar sign. Thus, to search from the current line to the end of the file, use the command ':.,$s/'.
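As a rough illustration of how {limits} restricts a substitution, here is a small Python sketch that imitates a line range. It is not how Vim works internally; the helper name substitute_in_range and the sample lines are invented for this example.

```python
import re

def substitute_in_range(lines, first, last, pattern, replacement):
    """Replace every match on lines first..last (1-based, inclusive).

    Hypothetical helper: a loose imitation of :{first},{last}s/.../.../g
    """
    result = list(lines)
    for index in range(first - 1, last):
        result[index] = re.sub(pattern, replacement, result[index])
    return result

text = ["foo 1", "foo 2", "foo 3", "foo 4"]

# Loosely like :2,3s/foo/bar/g
ranged = substitute_in_range(text, 2, 3, r"foo", "bar")

# Loosely like :.,$s/foo/bar/g with the cursor assumed to be on line 3
cursor_line = 3
to_end = substitute_in_range(text, cursor_line, len(text), r"foo", "bar")

# Loosely like :%s/foo/bar/g (the whole file)
whole_file = substitute_in_range(text, 1, len(text), r"foo", "bar")

print(ranged)
print(to_end)
print(whole_file)
```

Lines outside the range are copied through unchanged, which is the behaviour the range forms above describe.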

The command as a whole searches, within the specified limits, for a sequence that matches the “what to replace” element and replaces it with a sequence of characters built according to the rules of the “with what to replace” element, taking into account the options given after the last slash.

The first command I tried for my problem was the following:

:%s/<.*>//g

Everything before the first slash is the search-and-replace command applied to the entire file. Between the first and second slashes is the sequence that Vim will look for. Let us look at it more closely.

First comes the opening triangular bracket, which Vim matches literally. A dot denotes any character, and an asterisk denotes any number of occurrences of the preceding item, from zero to infinity. Thus, the sequence '.*' means any sequence of any characters. Finally, there is a closing triangular bracket. Yes, I apologize if the term “triangular brackets” offends those who remember that these are the “less than” and “greater than” signs (:

Between the second and third slashes is the sequence of characters that will be substituted in place of each sequence that meets the specified criteria. We just want mass removal, so there is nothing there.

The g flag at the end of the command makes the replacement apply to every match in a line. Without it, Vim would replace only the first match in each of the lines in {limits}. Other useful options are 'n', which only counts the matches without replacing anything (this helps to check whether the search criteria actually match what you want), and 'c', which asks for confirmation before each replacement.
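The difference between replacing only the first match in a line and replacing all of them can be sketched in Python as follows. This is only an analogy: the count argument of re.sub stands in for the absence of the g flag, counting with re.findall loosely mirrors 'n', and there is no counterpart of 'c' here. The sample line is made up.

```python
import re

line = "a-b-c"

# Loosely like :s/-/+/   (only the first match in the line)
first_only = re.sub(r"-", "+", line, count=1)

# Loosely like :s/-/+/g  (every match in the line)
every_match = re.sub(r"-", "+", line)

# Loosely like :s/-//gn  (count the matches, change nothing)
match_count = len(re.findall(r"-", line))

print(first_only)
print(every_match)
print(match_count)
```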

So, the described command searches for a sequence consisting of any characters enclosed in triangular brackets, and Vim simply removes every such sequence. Unfortunately, this command does not work properly, because “any characters” between the triangular brackets includes other triangular brackets, and the asterisk matches as many characters as possible. Therefore, if a line contains several pairs of triangular brackets, Vim will select a sequence that begins with the first opening bracket and ends with the last closing one.
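The same greedy behaviour can be observed with Python's re module, whose '.*' is also greedy. The sketch below is only an illustration on a made-up line, not a Vim session.

```python
import re

line = "<p>Hello, <b>Habr</b>!</p>"

# The greedy .* runs from the first '<' to the last '>' in the line
greedy_match = re.search(r"<.*>", line).group(0)

# Loosely like :s/<.*>//g applied to this line
greedy_result = re.sub(r"<.*>", "", line)

print(repr(greedy_match))
print(repr(greedy_result))
```

Here the single match covers the whole line, so the text between the tags is deleted together with the tags.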

The conclusion suggests itself: we need to look for any character between the triangular brackets except the closing triangular bracket. Vim has a suitable construct for this. If, when describing the desired sequence, you enclose a set of characters in square brackets, Vim will match any one character from that set. For example, the pattern '[a-z]' matches any lowercase Latin letter. If the first character inside the square brackets is the caret '^', Vim will match anything except what is inside the brackets. In our case, the expression

[^>]

will match anything other than the closing triangular bracket. It should be added that a pair of square brackets matches only one character. That is, the pattern just written is satisfied by any single character except the closing triangular bracket. For it to match as many characters as you like, you need to add an asterisk after it. As a result, the required command takes the form

:%s/<[^>]*>//g
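For comparison, here is the same negated character class tried out in Python on a few made-up lines. The pattern text happens to be identical in both tools; the variable names and the sample HTML are assumptions for this sketch.

```python
import re

html_lines = [
    "<p>Hello, <b>Habr</b>!</p>",
    "<a href=\"#top\">Up</a> and <i>down</i>",
]

# Loosely like :%s/<[^>]*>//g, applied line by line
stripped = [re.sub(r"<[^>]*>", "", line) for line in html_lines]

for line in stripped:
    print(line)
```

Each match now stops at the nearest closing bracket, so only the tags disappear and the text between them survives.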

Compare how this problem would be solved in, say, Notepad and in Vim. In Notepad, I would first replace the most popular tags with nothing en masse (for example, I would start with the 'p' tag), and then I would search for triangular brackets and delete them together with their contents. Processing a really large file would take me a lot of time. Here, everything is done with a single command, as simple as that.

Now for another task. At work I have to use Wolfram Mathematica, which produces a lot of ASCII output that, in turn, needs to be processed for readability. For example, this program denotes the absolute value of an expression with the word 'Abs' and encloses the expression in square brackets. I like to read mathematical text after running it through LaTeX, where the absolute value is quite naturally marked with vertical bars. So I need to make the following replacement throughout the file:

Abs['expression'] -> |'expression'|

If we simply needed to remove all occurrences of the word 'Abs', it would be quite simple and similar to the previous task, but here we also need to keep the 'expression', which is different each time. What to do? Grouping comes to the rescue. If, when describing the sequence to search for, you enclose any part of it in \( \), Vim stores the matched text in memory under the corresponding number (the first group is number one, the second is number two) and lets you recall it later with \x, where x is the number under which the text was stored.

Thus, the necessary command will look something like this:

:%s/Abs\[\([^\]]*\)\]/|\1|/g

Note that, to be matched literally, the square brackets are preceded by backslashes, since they are special characters. In general, any special character that must take part in the search with its literal meaning is preceded by a backslash: \^, \* and so on. The backslash itself is also preceded by a backslash: to search for the sequence '\cos', you must enter '\\cos'.
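A comparable substitution can be sketched with Python's re module. One difference to keep in mind: in Python a group is written with plain parentheses ( ), whereas the Vim command above uses \( \). The sample expression is invented, and the sketch makes no claim about what Mathematica actually prints.

```python
import re

expr = "Abs[x - y] + 2*Abs[z]"

# Loosely like :%s/Abs\[\([^\]]*\)\]/|\1|/g
# \[ and \] match literal brackets; ( ) captures the expression as group 1
converted = re.sub(r"Abs\[([^\]]*)\]", r"|\1|", expr)

# A literal backslash is escaped with another backslash, as in '\\cos'
has_cos = re.search(r"\\cos", r"\cos(x)") is not None

print(converted)
print(has_cos)
```

Like the Vim command, this pattern stops at the first closing square bracket, so an expression that itself contains square brackets would need a different approach.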

Finally, the last task I would like to write about. The same Mathematica operates with a set of quantities, each denoted by a capital Latin letter with a one-digit numeric index. In the ASCII output, the letter and the digit simply follow one another, for example 'U1'. For LaTeX to treat them as a letter with an index, the index must be preceded by an underscore '_'. So the task is to make a replacement of the form

'letter''digit' -> 'letter'_'digit'

The most trivial solution is to go through all the combinations, if there are not many of them: first replace 'U1' -> 'U_1', then 'U2' -> 'U_2', and so on. Clearly, this is not our method. Recall that there are square brackets: to find one capital Latin letter, simply use the pattern '[A-Z]'. But that is not all. Vim has a special abbreviation for this pattern: '\u' (from 'uppercase'). For digits, there is '\d' (from 'digit'). More about such constructs can be found via :help pattern.txt. Using these abbreviations, the replacement command looks like this:

:%s/\(\u\)\(\d\)/\1_\2/g

Here grouping with \( \) is used again: during the search it stores the found letter and digit in memory under the corresponding numbers, and they are then retrieved by those numbers: '\1' recalls the letter, and '\2' recalls the digit.
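The two-group replacement can also be tried in Python. Note that Python's re module has no '\u' class for uppercase letters, so this sketch falls back to '[A-Z]'; '\d' exists in both. The sample expression is made up.

```python
import re

expr = "U1 + V2*W3"

# Loosely like :%s/\(\u\)\(\d\)/\1_\2/g
# group 1 is the capital letter, group 2 is the digit
indexed = re.sub(r"([A-Z])(\d)", r"\1_\2", expr)

print(indexed)
```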

These three simple tasks, it seems to me, demonstrate Vim's search-and-replace capabilities very well. I believe that if I had to solve any of them with a text editor like Notepad or, say, Notepad++, the time spent on the solution would substantially exceed the time it took me in Vim (:

Source: https://habr.com/ru/post/119059/

