Basics of regular expressions in JavaScript

If you occasionally glance at regular expressions but still cannot bring yourself to learn them because it all seems incredibly difficult, you are not alone. To anyone who does not know what regular expressions are or how they work, they look like complete nonsense.

But, in fact, regular expressions are a powerful tool that can help you save a lot of time. In this article we will look at the basics of regular expressions in JavaScript.

Creating regular expressions in JS

In JavaScript, a regular expression is an object that describes a pattern used to search for combinations of characters in strings.

There are two ways to create regular expressions.

The first is to use regular expression literals. With this approach, the regular expression pattern is enclosed in slashes. It looks like this:

var regexLiteral = /cat/;

The second involves the RegExp object's constructor, which is passed a string from which it creates a regular expression:

 var regexConstructor = new RegExp("cat");

In both of the examples above, the same pattern is created: the character c, followed by the character a, followed by the character t.

Which way of creating regular expressions should you choose? The rule of thumb is this: if the regular expression stays the same throughout the program, use a literal. If the regular expression is dynamic, that is, it may change while the program runs, use the RegExp constructor.
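As a rough illustration of the dynamic case, the sketch below builds a pattern from a value that is only known at run time. The function name containsWord and its parameters are hypothetical, and the sketch assumes the word contains no characters that have a special meaning in regular expressions.

```javascript
// Hypothetical helper: the pattern depends on a run-time value,
// so a literal cannot be used and the RegExp constructor is needed.
// Assumption: `word` contains no special characters such as + or ?.
function containsWord(text, word) {
  const pattern = new RegExp(word);
  return pattern.test(text);
}

console.log(containsWord("the cat says meow", "cat")); // true
console.log(containsWord("the dog says bark", "cat")); // false
```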

Regular expression methods

Above, you may have noticed that regular expressions in JS are objects. Objects, as you know, have methods, and regular expressions are no exception.

One of the main regular expression methods is .test(), which returns a boolean value:

 RegExp.prototype.test()

Namely, this method returns true if the string contains a match for the specified regular expression pattern. If no match is found, it returns false.

Consider the following example. We have two strings and one regular expression. We can use a regular expression to check whether a given text pattern occurs in strings:

 const str1 = "the cat says meow";
 const str2 = "the dog says bark";
 const hasCat = /cat/;
 hasCat.test(str1); // true
 hasCat.test(str2); // false

As expected, when we check the first string, str1, for the character sequence cat, we get true. The second string, str2, does not contain cat, so the .test() method returns false.

Basic Regular Expression Constructs

Fortunately (or unfortunately, depending on your point of view), learning regular expressions mostly comes down to memorizing the basic constructs that denote characters and groups of characters.

Here is a short list of basic regular expression constructs. If you are serious about studying them, set aside about 20 minutes and learn these constructs.

▍ Symbols

. (dot) - matches any single character except line breaks.
* - matches the preceding expression repeated 0 or more times.
+ - matches the preceding expression repeated 1 or more times.
? - matches the preceding expression repeated 0 or 1 times.
^ - matches the beginning of the string.
$ - matches the end of the string.
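The constructs listed above can be tried one at a time with .test(). The following is only a small sketch; the object name checks and the sample strings are made up for illustration.

```javascript
// Each entry pairs one construct from the list with a small input.
const checks = {
  dot: /c.t/.test("cut"),        // true: . matches any single character
  star: /ca*t/.test("ct"),       // true: a* allows zero occurrences of a
  plus: /ca+t/.test("ct"),       // false: a+ requires at least one a
  optional: /cats?/.test("cat"), // true: s? makes the s optional
  start: /^cat/.test("the cat"), // false: cat is not at the beginning
  end: /cat$/.test("the cat"),   // true: cat is at the end
};

console.log(checks);
```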

▍ Character groups

\d - matches any single digit.
\w - matches any single word character: a Latin letter, a digit, or an underscore.
[XYZ] - a character set. Matches any single character from the set given in the brackets. Character ranges can be specified in the same way, for example, [A-Z].
[XYZ]+ - matches a character from the brackets, repeated one or more times.
[^A-Z] - inside a character set, the ^ symbol is used as a negation sign. In this example, the pattern matches any single character that is not an uppercase Latin letter.
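A quick way to get a feel for these character groups is to run them against short strings. The sketch below does that; the name samples and the inputs are illustrative, and it also uses the String method .match(), which this article does not otherwise cover, to show the matched text.

```javascript
const samples = {
  digit: /\d/.test("room 7"),               // true: 7 is a digit
  wordChar: /\w/.test("?!"),                // false: no letter, digit, or underscore
  set: /[XYZ]/.test("oXo"),                 // true: X is in the set
  range: /[A-Z]/.test("lowercase"),         // false: no uppercase letter
  repeatedSet: "aXYZZb".match(/[XYZ]+/)[0], // "XYZZ": the whole run is matched
  negated: /[^A-Z]/.test("ABC"),            // false: every character is uppercase
};

console.log(samples);
```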

▍ Flags

Regular expressions support several optional flags. They can be used together or separately and are placed after the closing slash. A regular expression with a flag looks like this: /[A-Z]/g. We will consider only two flags here:

g - global search by string.
i - case-insensitive search.
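To see what the two flags change, it helps to compare the same pattern with and without them. This sketch uses the String method .match(), which is not covered in the article; the variable names are illustrative.

```javascript
const text = "Cat, cat, CAT";

// Without flags only the exact, lowercase "cat" matches.
const exactOnly = text.match(/cat/g);  // ["cat"]

// g collects every match, i ignores the letter case.
const anyCase = text.match(/cat/gi);   // ["Cat", "cat", "CAT"]

// i on its own: a single case-insensitive check.
const hasCat = /cat/i.test("CAT");     // true

console.log(exactOnly, anyCase, hasCat);
```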

▍ Additional constructs

(x) - capturing parentheses. This expression matches x and remembers the match, so we can use it later.
(?:x) - non-capturing parentheses. The expression matches x but does not remember the match.
x(?=y) - a lookahead. Matches x only if it is followed by y.
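The difference between these three constructs shows up in what a match returns. The sketch below relies on the String method .match() (an addition that the article itself does not discuss); the variable names and inputs are made up.

```javascript
// Capturing parentheses: the remembered part appears at index 1.
const capturing = "cat".match(/c(a)t/);
console.log(capturing[0], capturing[1]); // cat a

// Non-capturing parentheses: nothing is remembered besides the full match.
const nonCapturing = "cat".match(/c(?:a)t/);
console.log(nonCapturing.length); // 1

// Lookahead: "cat" matches only where "nip" follows, and "nip" is not part of the match.
const lookahead = "catnip".match(/cat(?=nip)/);
console.log(lookahead[0]); // cat
console.log(/cat(?=nip)/.test("cat")); // false
```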

More sophisticated regular expression examples

Before we move on to the practice projects, let us look more closely at how to apply what we have just covered.

First, check the string for the presence of any digits. In order to do this, we can use the \d pattern. Take a look at the code below. It returns true in cases where there is at least one digit in the string under study.

 console.log(/\d/.test('12-34')); // true

As you can see, the code returns true - this is not surprising, since there are four numeric characters in the string under study.

But what if we need to check a string for a particular sequence of digits? In that case, you can repeat the \d pattern several times. For example, to match the string 11, you can use \d\d, which describes any two consecutive digits. Take a look at this code:

 console.log(/\d-\d-\d-\d/.test('1-2-3-4')); // true
 console.log(/\d-\d-\d-\d/.test('1-23-4')); // false

As you can see, here we check whether the string contains single digits separated by hyphens. The first string matches this pattern, and the second does not.

What if it does not matter how many digits come before or after the hyphen, as long as there is at least one? In this situation, you can use the + sign to indicate that the \d pattern may occur one or more times. Here's what it looks like:

 console.log(/\d+-\d+/.test('12-34')); // true
 console.log(/\d+-\d+/.test('1-234')); // true
 console.log(/\d+-\d+/.test('-34')); // false

To make life easier, we can use parentheses to group expressions. Let's say we want to check whether a string contains something that resembles a cat's meow. To do this, you can use the following construct:

 console.log(/me+(ow)+w/.test('meeeeowowoww')); // true

It worked. Now let's look at this expression in more detail. In fact, a lot of interesting things happen here.

So, here is the regular expression.

 /me+(ow)+w/

m - matches a single letter m.
e+ - matches the letter e, repeated one or more times.
(ow)+ - matches the combination ow, repeated one or more times.
w - matches a single letter w.

As a result, this expression interprets the string as follows:

 'm' + 'eeee' + 'owowow' + 'w'

As you can see, when a quantifier such as + comes immediately after an expression enclosed in parentheses, it applies to everything inside the parentheses.
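The effect of the parentheses can be checked by comparing the grouped pattern with an ungrouped one. This is only an illustrative sketch; the variable names are hypothetical, and .match() is a String method the article does not cover.

```javascript
const grouped = /me+(ow)+w/;  // + applies to the whole group "ow"
const ungrouped = /me+ow+w/;  // + applies only to the single letter w

console.log(grouped.test("meowoww"));   // true: "ow" may repeat
console.log(ungrouped.test("meowoww")); // false: only w may repeat
console.log(grouped.test("meow"));      // false: a final w after the group is required

// The group remembers its last repetition.
const parts = "meeeeowowoww".match(grouped);
console.log(parts[0]); // meeeeowowoww
console.log(parts[1]); // ow
```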

Here is another example, this time using the ? operator. The question mark indicates that the character preceding it is optional.

Take a look at this:

 console.log(/cats? says?/i.test('the Cat says meow')); // true
 console.log(/cats? says?/i.test('the Cats say meow')); // true

As you can see, each of the expressions returns true. This is because we made the s characters at the end of cat and say optional. You may also notice the i flag at the end of the regular expression. Thanks to it, letter case is ignored when the strings are analyzed. That is why the regular expression matches both cat and Cat.

Escaping special characters

Regular expression literals are enclosed in slashes. In addition, some characters, such as + and ?, have a special meaning. If you need to search strings for these special characters themselves, they must be escaped with a backslash. Here's what it looks like:

 var slash = /\//;
 var qmark = /\?/;
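Escaping works slightly differently when a pattern is passed to the RegExp constructor as a string, because the backslash also has to be escaped inside the string literal. The sketch below compares both forms; the variable names and sample strings are illustrative.

```javascript
// Literal form: one backslash escapes the question mark.
const literalQmark = /\?/;

// Constructor form: the backslash itself is doubled inside the string.
const constructedQmark = new RegExp("\\?");

console.log(literalQmark.test("ready?"));     // true
console.log(constructedQmark.test("ready?")); // true
console.log(literalQmark.test("ready"));      // false

// Several escaped characters in one pattern: a dollar sign and a dot.
const price = /\$\d+\.\d\d/;
console.log(price.test("total: $12.50")); // true
console.log(price.test("total: 1250"));   // false
```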

In addition, it is important to note that you can use different regular expressions to search for the same string structures. Here are a couple of examples:

\d is the same as [0-9]. Each of these expressions matches any single digit.
\w is the same as [A-Za-z0-9_]. Both match any single Latin letter, digit, or underscore in the string.

Project №1: adding spaces to camelCase strings

Now it's time to put this knowledge into practice. In our first project, we are going to write a function that accepts a string in camelCase and adds spaces between the individual words it is made of. Using the finished function, which we will call removeCc, looks like this:

 removeCc('camelCase') // =>  'camel Case'

For starters, you need to write a skeleton of a function that accepts a string and returns a new string:

 function removeCc(str) {
   // return the new string
 }

Now we just need to put into the return statement of this function a regular expression construct that processes the input. To do this, you first need to find all capital letters in the string, using a character range together with the global search flag.

 /[A-Z]/g

This regular expression will respond to the letter C in the string camelCase . How to add a space before this letter C ?

To do this, we need capturing parentheses. In regular expressions, capturing parentheses are used to find matches and remember them. This allows us to use the stored values when we need them. Here's how to work with capturing parentheses:

 // capturing parentheses
 /([A-Z])/
 // reference to the captured value
 $1

Here you can see that we use $1 to refer to the captured value. Note that if the expression contains two sets of capturing parentheses, you can use $1 and $2 to refer to the captured values in the order they appear from left to right. Capturing parentheses can be used as many times as a particular situation requires.

Note that we are not obliged to capture the value in parentheses. We can simply not use it, or we can use non-capturing parentheses of the form (?:x). In that case, x is matched but the match is not remembered.

Let's return to our project. The String object has a method that works with capturing parentheses: .replace(). To use it, we search the string for any capital letters. The second argument of the method, the replacement value, will be the stored value:

 function removeCc(str) {
   return str.replace(/([A-Z])/g, '$1');
 }

We are close to the solution, but the goal has not been reached yet. Take another look at the code. Here we capture capital letters and then replace them with the same letters, whereas we need spaces in front of them. This is easy to do: just add a space before the $1 reference. Then every capital letter in the string that the function returns is preceded by a space. The result is the following:

 function removeCc(str) {
   return str.replace(/([A-Z])/g, ' $1');
 }
 removeCc('camelCase') // 'camel Case'
 removeCc('helloWorldItIsMe') // 'hello World It Is Me'
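The $1 and $2 references described earlier can also be combined in one replacement string. The following sketch is not part of the project; the sample strings and variable names are made up to show the left-to-right numbering and the effect of a non-capturing group.

```javascript
// Two capturing groups: $1 is the first word, $2 is the second.
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");
console.log(swapped); // Smith, John

// The non-capturing group is skipped in the numbering,
// so $1 refers to the digits after the hyphen.
const month = "2024-05".replace(/(?:\d+)-(\d+)/, "$1");
console.log(month); // 05
```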

Project №2: removing capital letters from a string

We will continue bringing camelCase strings to a normal form. So far this problem has been solved only partially. Namely, we are not happy that the resulting string contains too many capital letters.

Now we will remove the extra capital letters from the string by replacing them with lowercase letters. Before reading further, think about this problem and try to find a solution. If you do not succeed, don't be discouraged: the solution, although short, is not entirely trivial.

So, the first thing we need is to select all capital letters in the string. This uses the same construct as in the previous example:

 /[A-Z]/g

Here we will use the .replace() method you already know, but this time we will need something new when calling it. Here is an outline of what we need. The question marks stand for this new, as yet unknown, code:

 function lowerCase(str) {
   return str.replace(/[A-Z]/g, ???);
 }

The .replace() method is remarkable in that it accepts a function as its second argument. This function is called after a match is found, and the string it returns replaces what the regular expression found.

If we also use the global search flag, the function is called for every match of the pattern in the string. With this in mind, we can use the .toLowerCase() method of the String object to convert each matched letter to the desired form. Here is what the solution to our problem looks like:

 function lowerCase(str) {
   return str.replace(/[A-Z]/g, u => u.toLowerCase());
 }
 lowerCase('camel Case') // 'camel case'
 lowerCase('hello World It Is Me') // 'hello world it is me'
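To make the callback mechanism more visible, the sketch below records each call of the replacement function. It relies on the fact that, for a pattern without capturing groups, the function receives the matched text first and the position of the match second. The names calls and wrapped are illustrative.

```javascript
const calls = [];

// The function runs once per match because of the g flag.
const wrapped = "a1b22".replace(/\d+/g, (match, offset) => {
  calls.push({ match, offset });
  return "<" + match + ">"; // the returned string replaces the match
});

console.log(wrapped); // a<1>b<22>
console.log(calls);   // [ { match: '1', offset: 1 }, { match: '22', offset: 3 } ]
```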

Project №3: converting the first letter of a string to upper case

This will be our last practice project, in which we are going to capitalize the first letter of the string being processed. This is what we expect from the new function:

 capitalize('camel case') // =>     'Camel case'

Here, as before, we will use the .replace() method. However, this time we need to find only the very first character of the string. To do this, we use the ^ symbol. Recall one of the examples above:

 console.log(/cat/.test('the cat says meow')); // true

If you add the ^ symbol to the beginning of the pattern, this construct no longer returns true. This happens because the word cat is not at the beginning of the string:

 console.log(/^cat/.test('the cat says meow')); // false

We need the special character ^ to apply to any lowercase letter at the beginning of a string. Therefore, we put it right before the [a-z] construct. As a result, the regular expression matches only the first letter of the string, and only if it is lowercase:

 /^[a-z]/

Note that we do not use the global search flag here, since we need to find only one match of the pattern. Now all that is left to do is convert the found character to upper case. This can be done using the string method .toUpperCase():

 function capitalize(str) {
   return str.replace(/^[a-z]/, u => u.toUpperCase());
 }
 capitalize('camel case') // 'Camel case'
 capitalize('hello world it is me') // 'Hello world it is me'

Combining the previously created functions

Now we have everything we need to turn camelCase strings into strings in which the words are separated by spaces, the first word begins with a capital letter, and all the other words are written in lowercase. Here is what using the functions we have just created together looks like:

 function removeCc(str) {
   return str.replace(/([A-Z])/g, ' $1');
 }
 function lowerCase(str) {
   return str.replace(/[A-Z]/g, u => u.toLowerCase());
 }
 function capitalize(str) {
   return str.replace(/^[a-z]/, u => u.toUpperCase());
 }
 capitalize(lowerCase(removeCc('camelCaseIsFun'))); // "Camel case is fun"

Results

As you can see, although regular expressions look very unusual to the uninitiated, they are quite possible to master. The best way to learn them is practice. We suggest you try the following: based on the three functions we created, write one that converts a camelCase string passed to it into an ordinary sentence and adds a period after its last word.
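For readers who want something to compare their attempt with, here is one possible sketch of such a function. The name toSentence is hypothetical, and the sketch assumes the input starts with a lowercase letter, as camelCase strings do; other solutions are just as valid.

```javascript
// One possible solution, chaining the three replacements from the projects.
// Assumption: `str` is camelCase and starts with a lowercase letter.
function toSentence(str) {
  const spaced = str.replace(/([A-Z])/g, " $1");                  // project 1
  const lowered = spaced.replace(/[A-Z]/g, (u) => u.toLowerCase()); // project 2
  const capitalized = lowered.replace(/^[a-z]/, (u) => u.toUpperCase()); // project 3
  return capitalized + ".";
}

console.log(toSentence("camelCaseIsFun")); // Camel case is fun.
```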

Dear readers! If you managed to write the function just described, we suggest sharing its code in the comments. And if you already know regular expressions, please tell us whether that knowledge has helped you in real projects.

Source: https://habr.com/ru/post/343798/
