
Email validation

This article discusses email validation using regular expressions. All regexps are applied with the i modifier, i.e. matching is case-insensitive.

Preparation


Before writing a validator, you need to know what an email address consists of. I think everyone knows that it is "username@hostname". It is best to split the regexp into two logical parts: hostname validation and username validation. Let's start with the larger one.

Hostname validation


First, what does a hostname consist of?

The hostname consists of several components separated by dots, each no longer than 63 characters, followed by a suffix (a first-level domain). Components, in turn, consist of Latin letters, digits and hyphens, and a hyphen cannot be at the beginning or end of a component. Suffixes are a limited list of first-level domains (I found the list on the IANA website). To simplify the expression, we write country domains as [a-z][a-z] (any two characters from a to z; case is ignored because of the i modifier). We also will not handle non-Latin characters until they are officially introduced for public use. As a result, we get an expression that checks the suffix (the construction (foo|bar) matches either foo or bar, that is, it acts as "or"):
(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|[a-z][a-z])
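To see how the suffix expression behaves, here is a minimal sketch in Python. The choice of Python, the names SUFFIX and is_known_suffix, and the sample strings are assumptions made for this illustration; the article itself does not prescribe a language. The i modifier corresponds to re.IGNORECASE.

```python
import re

# Suffix pattern from the article, with country codes written as [a-z][a-z].
SUFFIX = (
    r"(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|"
    r"museum|name|net|org|pro|tel|travel|[a-z][a-z])"
)


def is_known_suffix(text: str) -> bool:
    """Return True if the whole string is one of the listed suffixes."""
    # re.IGNORECASE plays the role of the i modifier.
    return re.fullmatch(SUFFIX, text, re.IGNORECASE) is not None


if __name__ == "__main__":
    for sample in ("com", "RU", "museum", "c0m", "xyz"):
        print(sample, is_known_suffix(sample))
```

Note that a three-letter string such as xyz is rejected, because it is neither in the list nor a two-letter country code.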

For components, the expression is more complicated:

([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?\.)

Let's break the expression down:

[a-z0-9] # first character: a letter or a digit
([-a-z0-9]{0,61}[a-z0-9])? # optional part
\. # the dot that ends the component


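Consider the optional part:

# letters, digits and hyphens;
# {0,61} means from 0 to 61 repetitions
[-a-z0-9]{0,61}
# 61 rather than 63, because the first and last characters are matched separately
[a-z0-9]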
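The length and hyphen rules can be checked with a short Python sketch. The names COMPONENT and is_component are hypothetical, and the pattern is used here without the trailing dot so that a single component can be tested on its own.

```python
import re

# One hostname component, without the trailing "\." from the article.
COMPONENT = r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?"


def is_component(label: str) -> bool:
    """Return True if the whole string is a valid component."""
    return re.fullmatch(COMPONENT, label, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(is_component("a"))        # a single character is enough
    print(is_component("a-b"))      # hyphen inside is allowed
    print(is_component("-ab"))      # hyphen at the start is not
    print(is_component("ab-"))      # hyphen at the end is not
    print(is_component("a" * 63))   # 1 + 61 + 1 characters at most
    print(is_component("a" * 64))   # one character too long
```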


As a result, we get the expression that checks the hostname:

([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?\.)*(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|[a-z][a-z])

Note that the components are optional, since some first-level domains are themselves served by hosts.
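A minimal Python sketch of the combined hostname check follows. The names HOSTNAME and is_hostname and the sample hostnames are assumptions for illustration only.

```python
import re

SUFFIX = (
    r"(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|"
    r"museum|name|net|org|pro|tel|travel|[a-z][a-z])"
)
# Zero or more dot-terminated components, then the suffix.
HOSTNAME = re.compile(
    r"([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?\.)*" + SUFFIX,
    re.IGNORECASE,
)


def is_hostname(text: str) -> bool:
    """Return True if the whole string matches the hostname expression."""
    return HOSTNAME.fullmatch(text) is not None


if __name__ == "__main__":
    for host in ("example.com", "mail.example.co.uk", "com",
                 "-bad.com", "example..com", "example.xyz"):
        print(host, is_hostname(host))
```

Because the component group is repeated with *, a bare suffix such as com is accepted on its own, which matches the remark above.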

Username validation


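The username may contain Latin letters, digits, the characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~, and dots, provided that a dot is not the first or last character and is not repeated.

Here is the expression right away: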

[-a-z0-9!#$%&'*+/=?^_`{|}~]+(\.[-a-z0-9!#$%&'*+/=?^_`{|}~]+)*

In fact, it is simple: one or more characters from [-a-z0-9!#$%&'*+/=?^_`{|}~] , then zero or more groups of the form \.[-a-z0-9!#$%&'*+/=?^_`{|}~]+ .
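The dot rules follow from that structure, as this small Python sketch shows. USERNAME and is_username are hypothetical names, and the sample usernames are made up for the example.

```python
import re

# Username pattern from the article: runs of allowed characters,
# joined by single dots.
USERNAME = re.compile(
    r"[-a-z0-9!#$%&'*+/=?^_`{|}~]+(\.[-a-z0-9!#$%&'*+/=?^_`{|}~]+)*",
    re.IGNORECASE,
)


def is_username(text: str) -> bool:
    """Return True if the whole string matches the username expression."""
    return USERNAME.fullmatch(text) is not None


if __name__ == "__main__":
    for name in ("john.doe", "o'reilly+tag", "john..doe",
                 ".john", "john.", "john doe"):
        print(repr(name), is_username(name))
```

A dot can only appear between two non-empty runs, so a leading, trailing or doubled dot makes the match fail.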

Result


The regexp for validating an email address:

^[-a-z0-9!#$%&'*+/=?^_`{|}~]+(\.[-a-z0-9!#$%&'*+/=?^_`{|}~]+)*@([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?\.)*(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|[a-z][a-z])$

This expression can be optimized a little (I think optimization will get a separate article):

^[-a-z0-9!#$%&'*+/=?^_`{|}~]+(?:\.[-a-z0-9!#$%&'*+/=?^_`{|}~]+)*@(?:[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?\.)*(?:aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|[a-z][a-z])$
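As a usage illustration, the optimized expression could be wrapped in a small Python helper like the one below. This is only a sketch: EMAIL_RE and is_valid_email are hypothetical names, and it is not the verification function mentioned in the postscript.

```python
import re

# The optimized expression from the article, split across lines for
# readability and with country codes written as [a-z][a-z].
EMAIL_RE = re.compile(
    r"^[-a-z0-9!#$%&'*+/=?^_`{|}~]+(?:\.[-a-z0-9!#$%&'*+/=?^_`{|}~]+)*"
    r"@(?:[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?\.)*"
    r"(?:aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|"
    r"museum|name|net|org|pro|tel|travel|[a-z][a-z])$",
    re.IGNORECASE,  # the i modifier
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the article's expression."""
    # fullmatch is used because in Python "$" alone would also accept
    # a single trailing newline.
    return EMAIL_RE.fullmatch(address) is not None


if __name__ == "__main__":
    for address in ("user@example.com", "User.Name+tag@Mail.Example.ORG",
                    "user@com", ".user@example.com", "user@-example.com",
                    "user@example.c"):
        print(address, is_valid_email(address))
```

Compiling the pattern once and reusing it avoids re-parsing the long expression on every call.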

Bonus


Consider a regexp that was given as an example in the comments to the introductory topic:

^(\S+)@([a-z0-9-]+)(\.)([a-z]{2,4})(\.?)([a-z]{0,4})+$

What are the main problems here?
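One way to explore the question is to run the expression against a few addresses. The Python sketch below is an illustration only: the sample addresses and the names BONUS_RE and bonus_accepts are assumptions chosen for this example, and the character classes are written as [a-z].

```python
import re

# The expression quoted from the comments, applied case-insensitively.
BONUS_RE = re.compile(
    r"^(\S+)@([a-z0-9-]+)(\.)([a-z]{2,4})(\.?)([a-z]{0,4})+$",
    re.IGNORECASE,
)


def bonus_accepts(address: str) -> bool:
    """Return True if the quoted expression accepts the address."""
    return BONUS_RE.fullmatch(address) is not None


if __name__ == "__main__":
    # \S+ allows any non-space characters, including a second "@".
    print(bonus_accepts("a@b@c.com"))
    # The hostname part may consist of a hyphen alone.
    print(bonus_accepts("user@-.com"))
    # The suffix is any run of letters, not a known first-level domain.
    print(bonus_accepts("user@example.abcdefgh"))
    # A subdomain followed by a label longer than 4 letters is rejected.
    print(bonus_accepts("user@mail.example.com"))
```

On these samples the first three addresses are accepted and the last one, a perfectly ordinary address, is rejected.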


P.S. Write about bugs and wishes, and I will definitely fix them.
P.P.S. If there is interest, I can post the email validation function that I use.

Source: https://habr.com/ru/post/55820/

