Using Regular Expressions in Ruby

Regular expressions are a salvation from every misfortune for some developers and a nightmare for others. Objectively speaking, they are a powerful tool that requires great care when applied. Regular expressions (regexes, regexps) in Ruby are based on Perl 5 syntax, so they look familiar to anyone who has used Perl, Python or PHP. But Ruby, as usual, takes its own approach to each part of the language, which makes this tool simpler to use and more powerful. This short article covers the features of regular expressions in Ruby and their use with various operators and methods.

In Ruby, everything is an object

First of all, a regular expression in Ruby is an object of the Regexp class. Accordingly, it can be created by calling Regexp.new, or by combining existing patterns with Regexp.union:

 r1 = Regexp.new "a"
 r2 = Regexp.new "b"
 ru = Regexp.union r1, r2

The combined expression matches any string that matches at least one of the merged patterns.
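The merge described above can be checked with a minimal sketch. One detail worth knowing: when Regexp.union receives plain strings instead of Regexp objects, it escapes them, so metacharacters such as "." are matched literally. The sample strings are illustrative, and match? assumes Ruby 2.4 or later.

```ruby
r1 = Regexp.new("a")
r2 = Regexp.new("b")
ru = Regexp.union(r1, r2)

# The union matches wherever either alternative matches.
p("cab" =~ ru)  # => 1 (the "a" at index 1 is found first)
p("xyz" =~ ru)  # => nil

# Plain strings passed to Regexp.union are escaped, so "." and "+" are literal.
literal = Regexp.union("a.c", "x+y")
p literal.source         # => "a\\.c|x\\+y"
p literal.match?("abc")  # => false, the dot no longer matches any character
p literal.match?("a.c")  # => true
```

This escaping makes Regexp.union a convenient way to build a pattern from a list of user-supplied words without worrying about special characters.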


The =~ operator, which matches a string against a regular expression, returns the index of the first match or nil, but in many cases we also need other information about the match. As in Perl, you can use special variables such as $~, $' and $&. The variables $1, $2, ..., which correspond to groups, are easy enough to remember, but how people manage to use the rest has always been a mystery to me. So Ruby, of course, offers another approach: the Regexp.last_match method:

 "abcde" =~ /(b)(c)(d)/
 Regexp.last_match[0]          # "bcd"
 Regexp.last_match[1]          # "b"
 Regexp.last_match[2]          # "c"
 Regexp.last_match[3]          # "d"
 Regexp.last_match.pre_match   # "a"
 Regexp.last_match.post_match  # "e"
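Regexp.last_match returns a MatchData object, the same information that $~ holds. A short sketch of the MatchData methods that tend to be most useful; it uses String#match instead of =~ so the result lives in an ordinary variable. The methods captures and begin are standard MatchData methods not mentioned in the text above.

```ruby
m = "abcde".match(/(b)(c)(d)/)

p m[0]          # => "bcd" (the whole match)
p m.captures    # => ["b", "c", "d"]
p m.pre_match   # => "a"
p m.post_match  # => "e"
p m.begin(0)    # => 1 (offset where the match starts)

# After =~, the same data is reachable through $~ and the group variables.
"abcde" =~ /(b)(c)(d)/
p $~[0]  # => "bcd"
p $1     # => "b"
```

Keeping the MatchData in a local variable avoids surprises: $~ is overwritten by the next successful match in the same method.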

Named groups

Starting with version 1.9, Ruby supports named groups:

 "a reverse b".gsub(/(?<first>\w+) reverse (?<second>\w+)/, '\k<second> \k<first>') # "b a"

The same example demonstrates backreferences, although these are nothing new: they exist in all modern PCRE-compatible engines.

\k<group_name> is essentially an ordinary backreference applied to a named group: it matches exactly the text that the group captured.

\g<group_name> re-applies the subpattern of a previously defined named group (a so-called subexpression call) rather than repeating the text it captured. The difference is easiest to show by example:

 "1 1" =~ /(?<first>\d+) \k<first>/ # 0
 "1 2" =~ /(?<first>\d+) \k<first>/ # nil
 "1 a" =~ /(?<first>\d+) \k<first>/ # nil
 "1 1" =~ /(?<first>\d+) \g<first>/ # 0
 "1 2" =~ /(?<first>\d+) \g<first>/ # 0
 "1 a" =~ /(?<first>\d+) \g<first>/ # nil

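Because \g<name> re-runs the group's pattern instead of repeating its text, a group may even call itself. A common illustration, sketched here rather than offered as a robust parser, is checking that parentheses are balanced. The constant BALANCED and the helper balanced? are hypothetical names; match? assumes Ruby 2.4 or later.

```ruby
# paren matches "(", then any mix of non-parenthesis characters and nested
# paren groups (via the recursive call \g<paren>), then ")".
BALANCED = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/

def balanced?(str)
  BALANCED.match?(str)
end

p balanced?("(a(b)c)")  # => true
p balanced?("(a(b c)")  # => false
p balanced?("()")       # => true

# \k<first>, by contrast, only accepts the exact text captured earlier.
p("12 12" =~ /(?<first>\d+) \k<first>/)  # => 0
p("12 34" =~ /(?<first>\d+) \k<first>/)  # => nil
```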
You can also get the text captured by a named group from the MatchData object:

 Regexp.last_match[:first]
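A short sketch of the usual ways to read named captures; the date format and group names are only illustrative, and named_captures assumes Ruby 2.4 or later. Note the special case at the end: when a regex literal with named groups stands on the left of =~, Ruby assigns each group to a local variable of the same name.

```ruby
m = "released 2012-09".match(/(?<year>\d{4})-(?<month>\d{2})/)

p m[:year]          # => "2012" (symbol key)
p m["month"]        # => "09" (a string key works too)
p m.names           # => ["year", "month"]
p m.named_captures  # a Hash mapping each group name to its captured text

# Regex literal on the LEFT of =~: named groups become local variables.
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "released 2012-09"
  p [year, month]   # => ["2012", "09"]
end
```

The local-variable shortcut only works with a literal on the left side; a Regexp stored in a variable, or one on the right of =~, does not create variables.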

Other ways to check for a match

Besides the traditional =~, Ruby offers other ways to check whether a string matches a regular expression. In particular, there is the match method, which is especially convenient because it can be called on either a String or a Regexp. But that's not all. You can get the matching substring with ordinary indexing:

 "abcde"[/bc?f?/] # "bc"

as well as with the slice method:

 "abcde".slice(/bc?f?/) # "bc"

There is also one more, seemingly not very logical, way:

 /bc?f?/ === "abcde" # true

Hardly anyone writes this directly, but this remarkable property of Ruby has a practical application, described below.
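A compact sketch comparing these forms on one sample string. The second argument to String#[] (a group number or group name) is a standard option not mentioned above, and match? (Ruby 2.4+) is included as the variant that returns only true or false without setting $~.

```ruby
s = "abcde"

p s.match(/c(d)/)[1]        # => "d" (String#match)
p(/c(d)/.match(s)[1])       # => "d" (Regexp#match, same result)
p s[/bc?f?/]                # => "bc" (String#[] returns the matched text)
p s[/b(c)/, 1]              # => "c" (or a numbered group)
p s[/(?<tail>d.)/, "tail"]  # => "de" (or a named group)
p s.slice(/x/)              # => nil (no match)
p(/bc?f?/ === s)            # => true (Regexp#===)
p s.match?(/e\z/)           # => true, without creating MatchData
```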

Using regular expressions in various methods

One of the most useful, though not very common, uses of regular expressions in Ruby is in a case statement. Example:

 str = 'september'
 case str
 when /june|july|august/ then puts "it's summer"
 when /september|october|november/ then puts "it's autumn"
 end

The point is that case compares values using the === operator shown above, which makes regular expressions concise and elegant to use in such cases.
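Since when calls pattern === value, each branch can also read the match it has just made through $~ or $1. A sketch under that assumption; the method name describe and the input formats are hypothetical.

```ruby
def describe(input)
  case input
  when /\A(\d+)\z/        then "integer #{$1}"
  when /\A(\d+)\.(\d+)\z/ then "decimal with #{$2.length} fractional digits"
  when /june|july|august/ then "summer"
  else "unknown"
  end
end

p describe("42")        # => "integer 42"
p describe("3.14")      # => "decimal with 2 fractional digits"
p describe("july 4th")  # => "summer"
p describe("hello")     # => "unknown"
```

Branches are tried top to bottom, so the more specific anchored patterns come first.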

Regular expressions can also be passed to the split method. An example from ruby-doc:

 "1, 2.34,56, 7".split(%r{,\s*}) # ["1", "2.34", "56", "7"]

One way to get a list of words from a string with this method:

 "one two three".split(/\W+/) # ["one", "two", "three"]
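Two details of split with a regular expression are worth knowing, sketched below on illustrative strings: capturing groups in the separator pattern are kept in the result, and a separator at the very start of the string produces a leading empty string, while trailing empty strings are dropped.

```ruby
# A capturing group makes split keep the separators themselves.
p "1, 2.34,56".split(/(,\s*)/)  # => ["1", ", ", "2.34", ",", "56"]

# A leading separator yields an empty first element...
p " one two".split(/\W+/)       # => ["", "one", "two"]
# ...while trailing separators leave no empty elements behind.
p "one two  ".split(/\W+/)      # => ["one", "two"]

# An optional limit caps the number of fields.
p "a-b-c-d".split(/-/, 2)       # => ["a", "b-c-d"]
```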

Since \w matches only ASCII word characters, to work with Cyrillic strings use the Unicode-aware POSIX class [[:word:]]:

 ",      ".split(/[^[:word:]]+/) # ["", "", "", "", "", "", ""] (Ruby 1.9 and later)

To break a string into parts, the scan method is sometimes much more convenient. The previous example using scan:

 ",      ".scan(/[[:word:]]+/) # ["", "", "", "", "", "", ""] (Ruby 1.9 and later)

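The practical difference from split shows up with a leading separator, and scan also changes shape when the pattern contains groups. A sketch; the Cyrillic sample text is our own, and the source file is assumed to be UTF-8 (the default since Ruby 2.0).

```ruby
# encoding: utf-8
text = ", раз два три"

# split leaves an empty string for the leading separator; scan does not.
p text.split(/[^[:word:]]+/)  # => ["", "раз", "два", "три"]
p text.scan(/[[:word:]]+/)    # => ["раз", "два", "три"]

# With capturing groups, scan returns one array of captures per match.
p "a=1, b=2".scan(/(\w)=(\d)/)  # => [["a", "1"], ["b", "2"]]
```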
The sub method, which replaces the first occurrence of a substring, also accepts a Regexp object:

 "today is september 25".sub(/\w+mber/, 'july') # "today is july 25"

Regular expressions can be used in the same way with the sub!, gsub and gsub! methods.
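A sketch of the replacement forms these methods accept, on illustrative strings: numbered group references in the replacement string (single-quoted so the backslashes survive), a block, and a hash lookup. Note also that the bang versions return nil when nothing was replaced, so their result should not be chained blindly.

```ruby
# Numbered group references in the replacement string.
p "2012-09-25".sub(/(\d+)-(\d+)-(\d+)/, '\3.\2.\1')  # => "25.09.2012"

# A block receives each matched substring; its value is the replacement.
p("a1b22".gsub(/\d+/) { |d| (d.to_i * 2).to_s })     # => "a2b44"

# A hash maps each matched text to its replacement.
p "cat hat".gsub(/[ch]/, "c" => "b", "h" => "m")     # => "bat mat"

# Bang versions modify the receiver and return nil if nothing matched.
s = "abc".dup
p s.sub!(/x/, "y")   # => nil
p s.gsub!(/b/, "B")  # => "aBc"
p s                  # => "aBc"
```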

The partition method, which splits a string into three parts, can also take a regular expression as the separator:

 "12:35".partition(/[:\.,]/) # ["12", ":", "35"]

Similarly, you can use regular expressions in the rpartition method.
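A short sketch contrasting the two methods on a sample string with several separators, including what happens when the pattern does not match at all.

```ruby
p "a:b:c".partition(/:/)   # => ["a", ":", "b:c"] (splits at the first match)
p "a:b:c".rpartition(/:/)  # => ["a:b", ":", "c"] (splits at the last match)

# With no match, the whole string lands at one end of the result.
p "1235".partition(/:/)    # => ["1235", "", ""]
p "1235".rpartition(/:/)   # => ["", "", "1235"]
```

Because the result always has three elements, it can be destructured directly, e.g. hours, _, minutes = "12:35".partition(/:/).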

The index and rindex methods also accept regular expressions; they return the index of the first and the last match in the string, respectively.
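A minimal sketch on an illustrative string; the optional second argument sets where the search starts (for rindex, the position to search backward from).

```ruby
s = "abcabc"

p s.index(/b/)      # => 1 (first match)
p s.rindex(/b/)     # => 4 (last match)
p s.index(/b/, 2)   # => 4 (search forward from offset 2)
p s.rindex(/b/, 3)  # => 1 (search backward from offset 3)
p s.index(/z/)      # => nil (no match)
```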

Further reading

1. Jeffrey Friedl, Mastering Regular Expressions

2. David Flanagan, Yukihiro Matsumoto, The Ruby Programming Language

3. Ruby-doc: class Regexp

Source: https://habr.com/ru/post/156395/
