
Regular expressions in WinEdt: searching for formulas with unused numbers

After a closer reading of the manual for WinEdt (an editor intended almost exclusively for writing LaTeX documents), I discovered additional features of its search/replace tool. To activate the "smart" search, tick the Regular Expressions check box in the Find or Find and Replace dialog; the search string then effectively becomes a command line with which you can work wonders. Almost anything can be done to the text this way, although the result is sometimes rather contorted (so for serious tasks, writing dedicated macros looks more appropriate).

A joke about a gynecologist
A gynecologist applies for a job at a car repair shop. He is asked to take an engine apart and put it back together. He does so and asks how his work was. They answer: “Not bad on the whole, but this is the first time we have seen it all done through the exhaust pipe.”

Here is an example. Suppose we need to find all unused \label labels, that is, those that no \ref in the text ever points to ("all labels that are never referred to", as the English-language manual puts it). An extra label, harmless in itself, may signal in particular that the formulas of the LaTeX document are “over-numbered” (that is, some formulas carry numbers that are never used). If the text is long and has many numbered relations, such labels are almost inevitable (you once referred to an equation, then changed the text and removed the reference, and probably forgot to remove the number from the equation). Finding the “extra” labels by hand then becomes (again because of the volume of material) overly cumbersome and, above all, dull mechanical work that simply cries out for a rational alternative.

So, let's solve the problem with WinEdt's “smart” search (version 5.3 should certainly suffice). First of all, note that WinEdt reserves memory cells (registers) named %!0, ..., %!9 for the user's needs. Keep in mind that this memory is volatile: it is reset each time WinEdt is restarted. We will use it to save the arguments of all \ref commands as one long string: press Ctrl+F, make sure the Regular Expressions check box is ticked in the dialog that opens, and enter the following text in the search field:
\\ref\(\{*\}\)\X{\"|GetTag(0,0);LetReg(1,"%!1%!0");|} 



Some explanations (which partly reveal the meaning of the line above). When Regular Expressions is on, some characters (for example, \, { and }) acquire a special meaning; to use them literally (that is, as the characters themselves), precede them with a backslash (for example, \\, \{ and \}). There are exceptions, though: parentheses by themselves are not special characters (so they stand literally for parentheses), but in the combinations \( and \) they do have a special meaning. Text enclosed between \( and \) becomes a so-called tag (a tagged expression) and can be used later: you refer to it (for example, in the same search string to detect a repeated fragment, or in the "replace with" string) with the \0 command (zero is the default number of a tagged fragment). If several parts need to be tagged separately, use constructions of the form:

 \(0 -  \), …, \(9 -  \) 

and the commands \0, ..., \9 to refer to the corresponding parts.
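For readers more at home with mainstream regex engines, the idea of tagging a fragment and referring back to it can be illustrated with a rough Python analogy. This is only a sketch: Python's re module is not WinEdt's engine, it writes groups as ( ... ) rather than \( ... \), and its first group is referenced as \1 rather than \0. The sample text is made up for the illustration.

```python
import re

# Hypothetical sample text containing an accidentally doubled word.
text = "this is is a test"

# (\w+) tags a word; \1 refers back to the tagged text in the same pattern,
# so the pattern matches a word immediately followed by itself.
doubled = re.search(r"\b(\w+) \1\b", text)
print(doubled.group(1))  # is

# The same tag can be used in the replacement string to keep one copy.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", text)
print(fixed)  # this is a test
```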

And what is the asterisk * between \{ and \} at the beginning of the entered text? It is a wildcard that stands for an arbitrary sequence of characters (possibly empty) within a single line (note that, starting with WinEdt 5.3, the combination ** matches arbitrary text that may include line breaks).

Thus, the character set:

 \\ref\(\{*\}\) 

that is, the first part of the expression under consideration, specifies a search for any combination of the form \ref{arbitrary text}. When such a combination is found, macros are run, as indicated by the second part of the expression, which starts with \X (without a macro-launch command, WinEdt simply jumps to the match and highlights it in the document). The macro-launch command can also begin with \x (case matters!), as well as with \Xx or \xX. The point is that, depending on the outcome of the macros, WinEdt can either jump to the found fragment (in our case \ref{arbitrary text}) or ignore it (as if it did not match the search string) and move on to the next match. Which of the two it chooses is determined by the case of the “x-command” and by the value of WinEdt's boolean variable IFOK (true by default), which some macros can change. With \X, WinEdt follows IFOK: if IFOK is true, it jumps to the found fragment; if it is false, it ignores the fragment. With \x, the reaction to IFOK is exactly the opposite, and with \Xx or \xX WinEdt shows the found text regardless of IFOK.
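The rule just described amounts to a small truth table, which can be written out as a Python sketch. The function name stops_at_match is hypothetical and exists only for this illustration; it models the behaviour as stated in the paragraph above and is not part of WinEdt.

```python
def stops_at_match(command: str, ifok: bool) -> bool:
    """Return True if the search shows the found fragment, False if it skips it."""
    if command == r"\X":
        return ifok            # \X follows the value of IFOK
    if command == r"\x":
        return not ifok        # \x reacts in exactly the opposite way
    if command in (r"\Xx", r"\xX"):
        return True            # the match is shown regardless of IFOK
    raise ValueError(f"unknown command: {command!r}")

# Print the whole table: command, IFOK, whether the match is shown.
for cmd in (r"\X", r"\x", r"\Xx", r"\xX"):
    for ifok in (True, False):
        print(cmd, ifok, stops_at_match(cmd, ifok))
```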

Let us consider in more detail the second part of the line being analyzed, that is, the command:

 \X{\"|GetTag(0,0);LetReg(1,"%!1%!0");|} 

It runs two macros: GetTag(0,0) and LetReg(1,"%!1%!0"). The macro GetTag(n,m) writes the contents of the n-th tag (in our case tag 0, that is, the argument of the \ref command together with its enclosing curly braces) into the m-th register, that is, into the memory cell named %!m (in our case %!0). The macro LetReg(k,"string") writes its second argument (without the enclosing quotes) into the k-th register. So in our case LetReg rewrites register 1 (initially empty) with its previous contents followed by the contents of register 0, i.e. the brace-enclosed argument of the \ref command that WinEdt has just found. Thus, to collect in the cell %!1 the arguments of all the \ref commands found in the text, you can type in the search string:

 \\ref\(\{*\}\)\X{\"|GetTag(0,0);LetReg(1,"%!1%!0");|} 

and run the search through the entire document. This is relatively easy and quick: after the first match, find all subsequent ones by pressing and holding F3 (for a document with many hundreds of numbered relations, I had to hold F3 for no more than 30 seconds). There is an alternative, though: use WinEdt's replace tool. Press Ctrl+R and enter in the search field:

 \\ref\(\{*\}\)\X{\"|GetTag(0,0);LetReg(1,"%!1%!0");|} 

in the line "replace with":

 \ref\(\{*\}\) 

when asked to confirm the replacement, choose All and let it run (again, do not forget to tick the Regular Expressions check box).
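To make the effect of this collection step concrete, here is a minimal Python sketch of what ends up in register %!1. It is an emulation under stated assumptions, not WinEdt code: the function name collect_refs and the sample text are hypothetical, and the pattern [^{}\n]* is only an approximation of WinEdt's single-line wildcard * that additionally refuses nested braces.

```python
import re

def collect_refs(tex: str) -> str:
    """Concatenate every ref argument, braces included, into one string."""
    register_1 = ""                              # %!1 starts out empty
    for match in re.finditer(r"\\ref(\{[^{}\n]*\})", tex):
        register_0 = match.group(1)              # like GetTag(0,0)
        register_1 = register_1 + register_0     # like LetReg(1,"%!1%!0")
    return register_1

# Hypothetical sample text with two references and one label.
sample = r"see \ref{h1} and \ref{h2}; \label{h}"
print(collect_refs(sample))  # {h1}{h2}
```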

The preparatory work is complete. Unused labels are now found by running a search with the expression:

 \\label{\{\(*\)\}}\x{FindInString("%!1","\0");} 

in the search field (recall the connection between \x and IFOK!). A search with the argument:

 \\label{\{\(*\)\}}\X{FindInString("%!1","\0");} 

answers the opposite question, showing only those labels that appear in the argument of at least one \ref command. Note that the outer curly braces in the expression \\label{\(\{*\}\)} are syntactically superfluous; without them, however, the WinEdt search generally gives an incorrect result. There is no rational explanation for this; it just has to be remembered (the English-language manual merely says that it is important to use {\{\(*\)\}} because \{\(*\)\} will not work here!).
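Outside WinEdt, the same two-pass idea (first gather all reference arguments, then test each label against them) can be sketched in a few lines of Python. This is an illustration under assumptions, not the author's method: the function name unused_labels and the sample labels eq:a and eq:b are hypothetical, names are compared together with their braces, and only plain \ref commands are considered, as in the article.

```python
import re

def unused_labels(tex: str) -> list[str]:
    """Return label names that no ref command in the text refers to."""
    # Pass 1: all ref arguments, braces included, as one long string.
    refs = "".join(re.findall(r"\\ref(\{[^{}\n]*\})", tex))
    # Pass 2: keep every label whose braced name does not occur in that string.
    labels = re.findall(r"\\label(\{[^{}\n]*\})", tex)
    return [braced[1:-1] for braced in labels if braced not in refs]

# Hypothetical sample: eq:a is referenced, eq:b is not.
sample = (
    r"\begin{equation}\label{eq:a} x = 1 \end{equation} "
    r"By \ref{eq:a} we get \begin{equation}\label{eq:b} y = 2 \end{equation}"
)
print(unused_labels(sample))  # ['eq:b']
```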

An anecdote about a Georgian school
A teacher in a Russian-language lesson at a Georgian school: “Children, remember: the words for salt, beans, and noodles are written with a soft sign, and the words for fork, bun, and plate are written without one. This cannot be explained; you just have to remember it!”

Note also that enclosing the arguments of the \ref commands in curly braces when writing them to the registers %!0 and %!1 is not strictly required; it is highly advisable, however, since it avoids errors in cases like the following:

 … \label{h1} … \label{h2} … \label{1h} … \label{h3} … \ref{h1} … \ref{h2} … \label{h} 

(if instead of the construct \\ref\(\{*\}\) we use \\ref\{\(*\)\}, leaving { and } out of the tagged fragment, the reference arguments form the string h1h2, and searching in it falsely reports the labels h and 1h as used). This does not protect us from every possible error, however, since the arguments of labels and references may themselves contain (properly paired, of course) curly braces (for example, \label{h{1}}). The easiest way to rule out such problems entirely is not to use curly braces in label names; if you have already produced a huge document with an enormous number of references whose names contain such braces, you probably cannot do without a dedicated macro.
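The difference between the two ways of storing the reference arguments is easy to check with plain substring tests. The following Python sketch uses the label names from the example above; the variable names are made up for the illustration.

```python
# Reference arguments from the example, stored without and with braces.
refs_without_braces = "h1" + "h2"      # "h1h2"
refs_with_braces = "{h1}" + "{h2}"     # "{h1}{h2}"
labels = ["h1", "h2", "1h", "h3", "h"]

# Without braces, "h" and "1h" are substrings of "h1h2",
# so they are wrongly treated as used and only h3 is reported.
naive = [name for name in labels if name not in refs_without_braces]

# With braces, a label counts as used only if "{name}" occurs as a whole.
braced = [name for name in labels if "{" + name + "}" not in refs_with_braces]

print(naive)   # ['h3']
print(braced)  # ['1h', 'h3', 'h']
```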

So, the method described here makes it possible (with the caveat above) to detect every case where an environment that generates a number (for example, equation) contains an unused \label. But an “extra” number can also appear when such an environment contains no \label at all. Fortunately, with WinEdt's advanced search it is easy to find (Search for field):

 \\begin\{equation\}\(**\)\end\{equation\}\x{FindInString("\0","\label");} 

and even fix (Replace with field):

 \begin\{equation\*\}\0\end\{equation\*\} 

all such cases (for definiteness, the equation environment mentioned above is used).
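The same find-and-fix step can be sketched outside the editor. The Python fragment below is an assumption-laden illustration rather than a translation of the WinEdt expressions: the function name star_unlabeled and the sample text are hypothetical, and the non-greedy .*? with re.DOTALL stands in for WinEdt's multi-line wildcard **.

```python
import re

# Match one equation environment; the body may span several lines.
EQUATION = re.compile(r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)

def star_unlabeled(tex: str) -> str:
    """Turn every equation environment without a label into equation*."""
    def fix(match: re.Match) -> str:
        body = match.group(1)
        if r"\label" in body:
            return match.group(0)      # labelled: leave untouched
        return r"\begin{equation*}" + body + r"\end{equation*}"
    return EQUATION.sub(fix, tex)

# Hypothetical sample: the first equation has no label, the second has one.
sample = (
    "\\begin{equation}\n a = b\n\\end{equation}\n"
    "\\begin{equation}\\label{eq:c}\n c = d\n\\end{equation}\n"
)
print(star_unlabeled(sample))
```

Note that this only removes numbers from equations with no \label at all; equations whose label is never referenced still have to be found with the label search described earlier.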

Source: https://habr.com/ru/post/192356/

