
Exact check of email address by regular expression

Regular expressions are one of the most convenient ways to check e-mail addresses. Recently I needed to verify addresses as accurately as possible. The check was required in a system for the automatic distribution of spam questionnaires, where each address list was loaded automatically as one large file, and as many obviously invalid addresses as possible had to be excluded.
The problem was that none of the e-mail validation patterns I found on the Internet, MSDN, and other sources met the requirements. Turning to the primary sources, RFC 2821 and RFC 2822, I found out how addresses should be correctly formed.


E-mail address = local part @ domain part

Local part


Symbols allowed in local part:


Characters NOT allowed in local part:


Characters that are undesirable in the local part but may still be present (whether a given server accepts them has to be tested).

They should not be used in addresses because many of them are UNIX shell special characters.

Domain part



The domain part may be an IP address, an IP address with a port, or a literal name made up of labels separated by dots. Each label contains only lowercase and uppercase Latin letters, digits, and the dash symbol ('-'). There is one restriction: a dash cannot appear at the beginning or at the end of a label (nothing is said about two dashes in a row). Labels cannot be empty either, so domain..com is invalid.
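The rules for the literal (name) form can be sketched in Python without a regular expression. This is only an illustration under stated assumptions: the function name is_valid_domain_name is hypothetical, digits are allowed in labels (as in the character classes of the expression given in this article), and at least two labels are required. The IP address forms are not treated specially here.

```python
import string

# Characters permitted inside a label. Allowing digits is an assumption
# taken from the character classes of the article's regular expression.
ALLOWED = set(string.ascii_letters + string.digits + "-")


def is_valid_domain_name(domain: str) -> bool:
    """Check a dotted domain name against the label rules described above."""
    labels = domain.split(".")
    if len(labels) < 2:
        return False  # assumption: require at least "name.tld"
    for label in labels:
        if not label:
            return False  # empty label, as in "domain..com"
        if label[0] == "-" or label[-1] == "-":
            return False  # a dash may not start or end a label
        if not set(label) <= ALLOWED:
            return False  # some character is outside the allowed set
    return True


if __name__ == "__main__":
    for name in ("my-domain.com", "domain..com", "-domain.com", "domain-.com"):
        print(name, is_valid_domain_name(name))
```

Splitting on dots first makes the empty-label case explicit, which is exactly what rules out domain..com.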

As a result, by modifying one of the patterns found on the Internet, I arrived at the following:

^[a-zA-Z0-9_'+*/^&=?~{}\-](\.?[a-zA-Z0-9_'+*/^&=?~{}\-])*\@((\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\:\d{1,3})?)|(((([a-zA-Z0-9][a-zA-Z0-9\-]+[a-zA-Z0-9])|([a-zA-Z0-9]{1,2}))[\.]{1})+([a-zA-Z]{2,6})))$
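A minimal sketch of how the expression might be used from Python. The pattern is a transcription of the expression above, reading the separator between local-part characters as an optional dot (\.?) and the quantifiers as {1,2} and {2,6}. The function name is_valid_email and the sample addresses are illustrative only.

```python
import re

# Transcription of the article's expression. Assumption: the separator in
# the local part is an optional dot, and the quantifiers are {1,2} and {2,6}.
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9_'+*/^&=?~{}\-](\.?[a-zA-Z0-9_'+*/^&=?~{}\-])*"
    r"\@((\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\:\d{1,3})?)"
    r"|(((([a-zA-Z0-9][a-zA-Z0-9\-]+[a-zA-Z0-9])|([a-zA-Z0-9]{1,2}))[\.]{1})+"
    r"([a-zA-Z]{2,6})))$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the whole string matches the pattern."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return EMAIL_RE.fullmatch(address) is not None


if __name__ == "__main__":
    samples = [
        "user@example.com",
        "first.last@mail.example.org",
        "user@192.168.0.1:25",
        "user@domain..com",
        "user..name@example.com",
        "user@-example.com",
    ]
    for sample in samples:
        print(sample, is_valid_email(sample))
```

Note that the IP branch only counts digits, so an address such as user@999.999.999.999 would pass, and the last domain label is limited to 2 to 6 letters.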

References:
RFC 2821: www.remote.org/jochen/rfc/rfc821.txt
RFC 2822: www.remote.org/jochen/rfc/rfc822.txt

List of valid / invalid characters: www.remote.org/jochen/mail/info/chars.html

If the regular expression is incomplete or incorrect in some cases, suggestions and comments are welcome.
I would like to emphasize that this is a special case that called for strict and accurate verification (as requested by the client). In other cases, you need not bother :)

Source: https://habr.com/ru/post/74206/

