As everyone knows, regular expressions are one of the most convenient ways to check e-mail addresses. Recently I had to verify addresses as accurately as possible. The check was needed in a system for the automatic distribution of
spam questionnaires, where each list of addresses was uploaded automatically as one large file. As many obviously invalid addresses as possible had to be excluded.
The problem was that none of the e-mail validation patterns I found on the Internet, on MSDN and in other sources met the verification requirements. Turning to the primary sources, RFC 2821 and RFC 2822, I found out how addresses are supposed to be formed.
E-mail address = local part @ domain part
Local part
Symbols allowed in local part:
- +
- -
- . (except for local..part, two dots in a row; .localPart, a dot at the beginning; and localPart., a dot at the end)
- 0-9
- A-Z, a-z
- _
Characters NOT allowed in local part
- !
- "
- #
- $
- %
- (
- )
- ,
- :
- ;
- <
- >
- [
- \
- ]
- '
- |
- SPACE, DEL, Control chars
Characters that are best avoided in the local part but may still occur (whether the receiving server accepts them has to be tested).
They are discouraged in addresses because many of them are UNIX shell special characters.
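The local-part rules above can be sketched as a small check. This is only an illustration: the function name is hypothetical, the underscore is an assumption taken from the final pattern in this article, and the "undesirable but possible" characters are deliberately left out.

```python
import re

# Characters treated as allowed here: letters, digits, '+', '-', '.' and
# '_' (the underscore is an assumption based on the final pattern).
_ALLOWED = re.compile(r"[A-Za-z0-9+\-_.]+")


def is_valid_local_part(local: str) -> bool:
    """Return True if `local` uses only allowed characters and its dots are well placed."""
    if not _ALLOWED.fullmatch(local):
        return False  # empty, or contains a character outside the allowed set
    if local.startswith(".") or local.endswith("."):
        return False  # .localPart and localPart. are invalid
    return ".." not in local  # local..part is invalid
```

Checking the dot placement separately from the character set keeps each rule readable; a single regular expression can express the same thing, at the cost of being harder to follow.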
Domain part
- may be an IP address, an IP address with a port, or a literal name made up of parts separated by dots, where each part contains only lowercase and uppercase Latin letters and the dash symbol ('-'). There is a restriction: a dash cannot appear at the beginning or at the end of a part; nothing is said about two dashes in a row. Accordingly, the expression domain..com is invalid.
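A rough sketch of the domain-part rules might look like this. The function name is hypothetical, and several choices are assumptions rather than something stated above: digits are accepted inside name parts, at least two parts are required, the port length is not restricted, and IP octets are not range-checked.

```python
import re

# IPv4-style literal with an optional port, e.g. 192.168.0.1 or 192.168.0.1:25.
# Assumption: octets are not checked against the 0-255 range.
_IP = re.compile(r"\d{1,3}(\.\d{1,3}){3}(:\d+)?")

# One dot-separated part: must not begin or end with a dash.
# Assumption: digits are accepted alongside letters.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_valid_domain_part(domain: str) -> bool:
    """Return True for an IP literal (optionally with a port) or a dotted name."""
    if _IP.fullmatch(domain):
        return True
    labels = domain.split(".")
    # An empty part means a leading, trailing or doubled dot (domain..com).
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)
```

Splitting on dots first makes the domain..com case fall out naturally, because the doubled dot produces an empty part that can never match.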
As a result, by modifying one of the patterns found on the Internet, I arrived at:
^[a-zA-Z0-9_'+*/^&=?~{}\-](\.?[a-zA-Z0-9_'+*/^&=?~{}\-])*\@((\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\:\d{1,3})?)|(((([a-zA-Z0-9][a-zA-Z0-9\-]+[a-zA-Z0-9])|([a-zA-Z0-9]{1,2}))[\.]{1})+([a-zA-Z]{2,6})))$
References:
RFC 2821:
www.remote.org/jochen/rfc/rfc821.txt
RFC 2822:
www.remote.org/jochen/rfc/rfc822.txt
List of valid / invalid characters:
www.remote.org/jochen/mail/info/chars.html
If the regular expression is incomplete or incorrect in some cases, suggestions and comments are welcome.
I would like to emphasize that this is a special case that called for strict and accurate verification (as requested by the client). In other cases, you need not bother :)
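For illustration, here is a minimal sketch of how the pattern from this article might be applied to an uploaded list. The pattern is a whitespace-free reading of the expression given above. The function name and the one-address-per-line layout are assumptions.

```python
import re

# The article's pattern, split across lines only for readability.
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9_'+*/^&=?~{}\-](\.?[a-zA-Z0-9_'+*/^&=?~{}\-])*\@"
    r"((\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\:\d{1,3})?)|"
    r"(((([a-zA-Z0-9][a-zA-Z0-9\-]+[a-zA-Z0-9])|([a-zA-Z0-9]{1,2}))[\.]{1})+"
    r"([a-zA-Z]{2,6})))$"
)


def filter_addresses(lines):
    """Split raw lines into (valid, rejected) lists using EMAIL_RE."""
    valid, rejected = [], []
    for line in lines:
        address = line.strip()  # drop the newline and surrounding spaces
        if not address:
            continue  # skip blank lines
        (valid if EMAIL_RE.match(address) else rejected).append(address)
    return valid, rejected


# Hypothetical input: in practice the lines would come from the uploaded file.
valid, rejected = filter_addresses(
    ["john.doe@example.com\n", "a..b@example.com\n", "\n", "user@192.168.0.1:25\n"]
)
```

Keeping the rejected addresses in a separate list, instead of silently dropping them, makes it easier to review what the pattern filtered out.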