
The 100% correct way to validate email addresses

Congratulations. Starting today, you will never again waste time choosing the optimal regular expression for validating email addresses. And you will never again reject an address that, to your surprise, turned out to be valid.

The trick is to first define what we mean by the word "valid."

We are developers, technical people, so the most logical approach is to check for compliance with the official spec. Here are some examples of valid email addresses that meet it:

en.wikipedia.org/wiki/Email_address#Valid_email_addresses

Everything you know is wrong


Checking addresses with a regex largely ignores reality. Instead, I suggest we ask two questions:

  1. Does the user understand that this field requires an email address?
  2. Did the user enter their address in the field correctly?

If you have a clearly laid-out form with a field labeled "email", and the user typed an '@' somewhere in that field, it is safe to assume that the answer to the first question is yes. That part is easy.
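The first check can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the article; the function name looks_like_email_attempt is made up for the example.

```python
def looks_like_email_attempt(value: str) -> bool:
    """Return True if the input contains an '@' anywhere.

    This only answers question 1 (did the user mean to type an email
    address?). It deliberately says nothing about whether the address
    is correct or even well-formed.
    """
    return "@" in value
```

Anything stricter than this starts rejecting addresses that really exist, which is the very problem the check is meant to avoid.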

Next, we want to verify that the user entered their email address correctly.

Impossible.

It is important that you agree with me on this point: it is impossible.

I know what you are thinking. "But it will help, right?" Well, yes, in the same way that shutting the refrigerator door quickly saves energy and helps fight global warming. Of course it helps, if we want to become slaves to the word "help." But most people will agree that you have a promising career in a straitjacket if you rattle the canned food around in a rush to save the polar bears.

Investigate the issue


Imagine my email address is davidgilbertson@example.com. That is 27 keystrokes, and each one is a chance for a typo. An error in any character definitely produces an incorrect address, but that address will not necessarily be invalid.

[epiphany]

Even if the sun were shining brightly through the window and a particularly wild sneeze overcame me (I suffer from photic sneeze reflex, a poorly studied autosomal dominant condition [translator's note]) and I mistakenly typed #!$%&'*+-/=?^_`{}|~@example.com, I would still pass most checks for a "valid" email address. (Even worse is when the system says an address is invalid when it actually exists! Out of interest, I just wrote to #!$%&'*+-/=?^_`{}|~@example.com, and the woman who answered said it infuriates her when her email address is rejected as invalid. She also regrets buying the example.com domain but is not going to give up, like that guy who bought milk.com. We got chatting, and it turns out she lives only a couple of blocks away from me and also collects vintage cameras; next week we are going to play golf. I think she might be the one. Anyway, I digress, and I need to get back to writing the article.)

So what is the probability that any single typo will make the address invalid? Let's build a statistical model! Take the letter 'g', for example. If I mistype it, I will most likely hit another visible key without Shift held down (the model accounts for the unshifted keys). Of all the keys that can be pressed on a physical keyboard, only six characters can invalidate the address, and only in some cases: [ ] \ ; , and space. That is 6 out of 48, a probability of 12.5%.

But a typo on a neighboring key is more likely, for example hitting 'h' instead of 'g'. So from a list of 117 million existing email addresses, I calculated the frequency of each character, noted which keys sit next to it on the keyboard, and derived the final probability that a typo would produce an invalid address. (I know that hacking LinkedIn just to make a point about email validation was a bit much, but any opinion should be backed by real data.)

For example, 'e' is considered a low-risk character, because every key around it leaves the email address valid. But 'p' sits within a finger's width of '[' and ';'! So although 'p' is a rarer character than 'e', it carries a greater risk of invalidating the address when the finger misses.
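As a toy illustration of the neighbor reasoning (not the author's model, which is not shown), here is a sketch in Python. The neighbor map is a hand-written assumption about a US QWERTY layout, and the six risky characters are taken to be [ ] \ ; , and space, as the article lists them further on.

```python
# The six characters the article treats as potentially invalidating.
RISKY = set("[]\\;, ")

# Assumption: unshifted neighbors on a US QWERTY keyboard, written by hand.
NEIGHBORS = {
    "e": ["w", "r", "s", "d", "3", "4"],
    "p": ["o", "[", "l", ";", "0", "-"],
}


def risky_share(key: str) -> float:
    """Fraction of a key's neighbors that are risky characters."""
    neighbors = NEIGHBORS[key]
    risky = [n for n in neighbors if n in RISKY]
    return len(risky) / len(neighbors)


# Baseline from the article: 6 risky characters out of 48 unshifted keys.
baseline = len(RISKY) / 48
```

Under these assumptions risky_share("e") is 0 and risky_share("p") is one third, against a baseline of 0.125 for a uniformly random key.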

I also factored in the relative dexterity of each finger. We all know that the little finger is the clumsy cousin of the finger family, so the model accounts for that.


Graphic representation of the model showing the typo zone around P, taking into account the shortcomings of the little finger

Now suppose Silky (Fox) is sitting on the Shift key and I press the wrong key. Here again I risk hitting one of the six dangerous keys: [ ] \ ; , and space. And again, they invalidate the address only under certain conditions. It is more likely that the held Shift key will affect letters on only one side of the @ sign, so the letter 'l' on either side is considered especially dangerous.

All of the above applies to a single typo, but if I make a second typo, there is a chance that the address will become valid again (for example, if you add \ after \). Of course, the model takes all of this into account.

Needless to say, I made the same effort in calculating the model for on-screen keyboards.

Remember also that a typo in the @ symbol itself is caught at the very first stage, when we check for the presence of @ as a sign that the user intended to enter an email address.

I also built some common sense into the model. It is well known that people with aol.com addresses are clumsy typists. People named Deryl tend to press every key with their index fingers, as if afraid that each button will burn their hand. People with the letter 'z' in their name use mechanical keyboards and rarely make mistakes. Well-known facts of life.

I also took into account the fact that any key pressed before the @ in the address is ignored, and that 'f' and 'h' are in many ways the same letter, if you think about it carefully.

Result


So, having taken all of these factors into account, I ran the 117 million addresses through the model. And the probability that an incorrect email address will be caught by an address validation routine is ...

0.000000000000000000000000000000000000625%

I'm afraid I don't have time to type out the algorithm, which absolutely definitely exists and is incontestably flawless, so you will have to take my word for it that this number is not made up in any way.

Summary


It makes no sense to try to find out whether an address is "valid". The user is far more likely to enter an incorrect but valid address than an invalid one.

So you are better off spending your time on literally anything other than checking the validity of email addresses.

The 100% correct way


  • Send users an activation email. (The bullet point is just for effect.)
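A minimal sketch of what an activation flow might look like in Python, using only the standard library. Everything named here is an assumption made for illustration: the in-memory pending and confirmed stores, the noreply@example.com sender, and the https://example.com/activate URL. A real application would persist tokens with an expiry time and hand the message to an SMTP client.

```python
import secrets
from email.message import EmailMessage

# Hypothetical in-memory stores; a real app would use a database.
pending = {}       # token -> address awaiting confirmation
confirmed = set()  # addresses whose owners clicked the link


def start_signup(address: str) -> EmailMessage:
    """Create a one-time token and build the activation email."""
    token = secrets.token_urlsafe(32)
    pending[token] = address

    msg = EmailMessage()
    msg["To"] = address
    msg["From"] = "noreply@example.com"  # assumed sender address
    msg["Subject"] = "Activate your account"
    msg.set_content(
        "Click to activate:\n"
        f"https://example.com/activate?token={token}\n"  # assumed URL
    )
    # In a real setup, send it with smtplib.SMTP(...).send_message(msg).
    return msg


def activate(token: str) -> bool:
    """Confirm the address tied to a token; each token works only once."""
    address = pending.pop(token, None)
    if address is None:
        return False
    confirmed.add(address)
    return True
```

The address is treated as confirmed only after its owner opens the link, which is the one check that shows the user typed it correctly.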

I have published a follow-up to this article, mainly about how to help users avoid entering the wrong email address. With real code! Go on. Read it.



If this article strikes you as pointless and silly, and you want more of the same, check out my podcast "David Reads Wikipedia". It is exactly what you think it is.

Source: https://habr.com/ru/post/320572/

