
A regular expression is a pattern of characters that defines a search rule. Among other things, it lets you check whether one or more strings conform to a predefined pattern or standard.
A simple and vivid example of using regular expressions, in Java and elsewhere, is validating the data a user enters when registering on a site. This applies first of all to the e-mail address, since it always has to follow certain formatting rules.
So let us look at one particular use of regular expressions: a simple Java application that checks an email address entered by the user.
In Java, all the classes that deal with regular expressions live in the java.util.regex package. We need two of them: Pattern and Matcher.
The first class, as its name suggests, describes the pattern that the entered data (in our case, the e-mail address) must match. The second applies that pattern to the data itself.
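import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("");   // the template
        Matcher matcher = pattern.matcher("");   // the string to check
        boolean matches = matcher.matches();     // true on a full match
    }
}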
A Pattern object is obtained from the static compile() method, because the Pattern class has no public constructors. The method takes a string, which is our template. The Pattern class also provides a matcher() method, which takes another string as its parameter: the one we want to check against the pattern. This method creates an instance of the Matcher class.
The Matcher class, in turn, has a matches() method that returns true if the entire string matches the pattern and false if it does not. The result is stored in the boolean variable matches.
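To make the behaviour of matches() concrete, here is a minimal sketch. The class and helper names (MatchesDemo, matchesWhole, containsMatch) are made up for illustration; only Pattern.compile(), matcher(), matches() and find() come from java.util.regex. It shows that matches() requires the whole string to fit the pattern, whereas find() is satisfied by any fitting substring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesDemo {
    // Returns true only when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        return matcher.matches();
    }

    // Returns true when any part of the input matches the regex.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[a-z]+", "abc"));     // true
        System.out.println(matchesWhole("[a-z]+", "abc123"));  // false
        System.out.println(containsMatch("[a-z]+", "abc123")); // true
    }
}
```

For input validation, matches() is usually the one to reach for, since a partially valid string should not pass.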
Before moving on to the most important part, the template itself and its syntax, we need to agree on some requirements for an email address. For the purposes of this article, an email address should:
- Consist of two parts, separated by the “@” symbol.
- Have a left part made of English letters or digits, optionally with periods and dashes; at least one letter or digit must follow each period or dash.
- Have a right part that contains at least one dot, with two to four letters after the last one.
- Begin both parts with a letter or digit.
We start with the left part. It contains one or more alphanumeric characters (in practice their number is limited, of course, but for clarity let us imagine a potentially endless email). In regular expression syntax this is written as follows:
Pattern pattern = Pattern.compile("[A-Za-z0-9]{1,}");
The characters in square brackets define the set of allowed characters, given here as ranges.
In curly braces we specify how many times the preceding element may occur. The minimum (one) goes to the left of the comma and the maximum to the right. Omitting the maximum, as in our case, means there is no upper limit. The braces can also hold a single number for a strictly fixed count, or be left out altogether; in the latter case the character must occur exactly once.
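The three forms of repetition described above can be tried out with a short sketch. The class QuantifierDemo and its check() helper are hypothetical names; Pattern.matches(regex, input) is the standard shortcut for compiling a pattern and matching the whole input against it.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: full-string match of input against regex.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // {1,}: one or more characters from the class.
        System.out.println(check("[A-Za-z0-9]{1,}", "user42")); // true
        System.out.println(check("[A-Za-z0-9]{1,}", ""));       // false
        // {3}: exactly three characters.
        System.out.println(check("[A-Za-z0-9]{3}", "abc"));     // true
        System.out.println(check("[A-Za-z0-9]{3}", "abcd"));    // false
        // No braces: exactly one character.
        System.out.println(check("[A-Za-z0-9]", "a"));          // true
        System.out.println(check("[A-Za-z0-9]", "ab"));         // false
    }
}
```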
Next, the pattern may contain a dash. “May” means that the symbol occurs either once or not at all, therefore:
Pattern pattern = Pattern.compile("[A-Za-z0-9]{1,}[\\-]{0,1}");
If a dash is present then, as we agreed, at least one letter or digit must follow it, i.e. the initial pattern is repeated. The dash is written as [\\-]:
Pattern pattern = Pattern.compile("[A-Za-z0-9]{1,}[\\-]{0,1}[A-Za-z0-9]{1,}");
The string may also contain a period ([\\.]), which again must be followed by an alphanumeric character:
Pattern pattern = Pattern.compile("[A-Za-z0-9]{1,}[\\-]{0,1}[A-Za-z0-9]{1,}[\\.]{0,1}[A-Za-z0-9]{1,}");
Since this pattern for the left part may repeat, we write it like this:
Pattern pattern = Pattern.compile("([A-Za-z0-9]{1,}[\\-]{0,1}[A-Za-z0-9]{1,}[\\.]{0,1}[A-Za-z0-9]{1,})+");
The plus sign after the closing parenthesis means that the whole group may repeat one or more times.
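The effect of the parentheses is easier to see on a tiny pattern than on the full one. The sketch below uses a made-up class GroupRepeatDemo with a hypothetical check() helper, and contrasts a plus sign applied to a single character with a plus sign applied to a group.

```java
import java.util.regex.Pattern;

class GroupRepeatDemo {
    // Hypothetical helper: full-string match of input against regex.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Without a group, + applies only to the preceding element, "b".
        System.out.println(check("ab+", "abbb"));     // true
        System.out.println(check("ab+", "ababab"));   // false
        // With a group, + repeats the whole sequence "ab".
        System.out.println(check("(ab)+", "ababab")); // true
        System.out.println(check("(ab)+", "abbb"));   // false
    }
}
```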
Since the left part is separated from the right by the @ sign, we state that this symbol must follow the left part:
Pattern pattern = Pattern.compile("([A-Za-z0-9]{1,}[\\-]{0,1}[A-Za-z0-9]{1,}[\\.]{0,1}[A-Za-z0-9]{1,})+@");
The right part of the pattern should contain the same set of characters, one or more times, followed by a mandatory dot. As before, the pattern before the dot can repeat:
Pattern pattern = Pattern.compile("([A-Za-z0-9]{1,}[\\-]{0,1}[A-Za-z0-9]{1,}[\\.]{0,1}[A-Za-z0-9]{1,})+@([A-Za-z0-9]{1,}[\\-]{0,1}[A-Za-z0-9]{1,}[\\.]{0,1}[A-Za-z0-9]{1,})+[\\.]{1}");
The pattern ends with letters again, this time from two to four of them:
Pattern pattern = Pattern.compile("([A-Za-z0-9]{1,}[\\-]{0,1}[A-Za-z0-9]{1,}[\\.]{0,1}[A-Za-z0-9]{1,})+@([A-Za-z0-9]{1,}[\\-]{0,1}[A-Za-z0-9]{1,}[\\.]{0,1}[A-Za-z0-9]{1,})+[\\.]{1}[a-z]{2,4}");
That is the whole pattern. Not exactly small, is it? Fortunately, there is a way to shorten it and make it easier to read.
To begin with, the presence of a dash or a dot can be expressed in one place. Instead of listing dots ([\\.]) and dashes ([\\-]) separately, we can put both in a single character class: [\\.-]. Using it, we can reduce the pattern to the following:
Pattern pattern = Pattern.compile("([A-Za-z0-9]{1,}[\\.-]{0,1}[A-Za-z0-9]{1,})+@([A-Za-z0-9]{1,}[\\.-]{0,1}[A-Za-z0-9]{1,})+[\\.]{1}[a-z]{2,4}");
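As a quick check of the combined character class, the sketch below (class and method names are hypothetical) confirms that [\\.-] accepts a single dot or a single dash and nothing else. It also shows that inside square brackets the dot is already literal, so the unescaped form [.-] behaves the same way; the article keeps the escaped form.

```java
import java.util.regex.Pattern;

class SeparatorClassDemo {
    // The class as written in the article: an escaped dot and a trailing dash.
    static boolean isSeparator(String s) {
        return Pattern.matches("[\\.-]", s);
    }

    // The same class without the escape; inside [] a dot is a literal dot.
    static boolean isSeparatorUnescaped(String s) {
        return Pattern.matches("[.-]", s);
    }

    public static void main(String[] args) {
        System.out.println(isSeparator("."));           // true
        System.out.println(isSeparator("-"));           // true
        System.out.println(isSeparator("a"));           // false
        System.out.println(isSeparatorUnescaped("."));  // true
        System.out.println(isSeparatorUnescaped("a"));  // false
    }
}
```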
There is also a predefined character class that stands for any letter or digit: \\w. It also matches the underscore, so it is slightly broader than [A-Za-z0-9], but here we use it in place of that class:
Pattern pattern = Pattern.compile("(\\w{1,}[\\.-]{0,1}\\w{1,})+@(\\w{1,}[\\.-]{0,1}\\w{1,})+[\\.]{1}[a-z]{2,4}");
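One detail worth checking when swapping [A-Za-z0-9] for \\w is that the two are not identical: by default \\w in Java also accepts the underscore. A small sketch, with the hypothetical names WordCharDemo and check(), illustrates the difference.

```java
import java.util.regex.Pattern;

class WordCharDemo {
    // Hypothetical helper: full-string match of input against regex.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Both forms accept plain letters and digits.
        System.out.println(check("\\w+", "user42"));              // true
        System.out.println(check("[A-Za-z0-9]+", "user42"));      // true
        // Only \w accepts the underscore.
        System.out.println(check("\\w+", "user_name"));           // true
        System.out.println(check("[A-Za-z0-9]+", "user_name"));   // false
        // Neither accepts a dash.
        System.out.println(check("\\w+", "user-name"));           // false
    }
}
```

Whether underscores should be allowed is a design decision; if they should not be, the explicit class [A-Za-z0-9] is the safer choice.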
Since the plus sign means one or more occurrences, just like {1,}, the above can be shortened further to:
Pattern pattern = Pattern.compile("(\\w+[\\.-]{0,1}\\w+)+@(\\w+[\\.-]{0,1}\\w+)+[\\.]{1}[a-z]{2,4}");
In addition, “at most once”, i.e. {0,1}, can be written with the ? symbol:
Pattern pattern = Pattern.compile("(\\w+[\\.-]?\\w+)+@(\\w+[\\.-]?\\w+)+[\\.]{1}[a-z]{2,4}");
There is also a symbol meaning that something occurs any number of times, including zero, i.e. {0,}. It is written as *. In the final version, we have the following:
Pattern pattern = Pattern.compile("\\w+([\\.-]?\\w+)*@\\w+([\\.-]?\\w+)*\\.\\w{2,4}");
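The shorthand quantifiers introduced in the last few steps can be compared with their curly-brace forms directly. The sketch below is only an illustration on a handful of inputs, not a proof; ShorthandDemo, agree() and allAgree() are hypothetical names.

```java
import java.util.regex.Pattern;

class ShorthandDemo {
    // True when both regexes give the same full-match result for the input.
    static boolean agree(String longForm, String shortForm, String input) {
        return Pattern.matches(longForm, input) == Pattern.matches(shortForm, input);
    }

    // Compares each brace form with its shorthand on a few sample inputs.
    static boolean allAgree() {
        String[] inputs = {"", "a", "aa", "aaa", "b"};
        for (String in : inputs) {
            if (!agree("a{1,}", "a+", in)) return false;  // one or more
            if (!agree("a{0,1}", "a?", in)) return false; // zero or one
            if (!agree("a{0,}", "a*", in)) return false;  // zero or more
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree()); // true
    }
}
```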
This pattern is much more compact than what we had before. All that remains is to plug the finished pattern into the skeleton of our application:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\w+([\\.-]?\\w+)*@\\w+([\\.-]?\\w+)*\\.\\w{2,4}");
        Matcher matcher = pattern.matcher("");   // put the address to check here
        boolean matches = matcher.matches();
    }
}
How exactly you use this boolean variable is a matter of taste and requirements. The key pieces are in place, and further refinement is entirely up to you.
I hope this explanation was easy to follow. Now go ahead and try it yourself.
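As one possible way of using the result, the sketch below wraps the final pattern in a small helper and runs it over a few sample addresses. The class name EmailCheck, the method isValidEmail() and the sample addresses are assumptions made for illustration; the pattern itself is the one built above.

```java
import java.util.regex.Pattern;

class EmailCheck {
    // The final pattern from the article, compiled once and reused.
    private static final Pattern EMAIL =
            Pattern.compile("\\w+([\\.-]?\\w+)*@\\w+([\\.-]?\\w+)*\\.\\w{2,4}");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean isValidEmail(String input) {
        return input != null && EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.smith@example.com",   // accepted
            "user@mail.example.org",    // accepted
            "user@example",             // rejected: no dot in the right part
            "user..name@example.com",   // rejected: two dots in a row
            "user@example.museum"       // rejected: more than four characters after the last dot
        };
        for (String s : samples) {
            if (isValidEmail(s)) {
                System.out.println(s + " looks valid");
            } else {
                System.out.println(s + " was rejected");
            }
        }
    }
}
```

The last sample shows a limitation of the two-to-four rule agreed on earlier: real addresses with longer top-level domains are rejected, so the upper bound may need to be relaxed for production use.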