
Analysis of the accounts of one (un)reliable email service

Statistics

Start


I have a hobby: collecting databases leaked onto the Internet and other "junk". One day I decided to dig through RGhost in search of something "tasty". File hosts often carry instructions for connecting to one thing or another, with logins and passwords included. This time I stumbled upon a 700 MB text file, "dump.txt", neatly filled with login-password pairs. At first I was disappointed: "These are probably just the recently leaked Yandex, Mail and Google accounts that everyone already has, gathered into one file," I thought. But no... After studying the find a bit, I was stunned.

For those still on dial-up: shortly before that, the information security community had been rocked by the news of 1,260,614 addresses leaked from Yandex on September 5, 4,425,522 addresses from Mail.ru on September 8, and 4,661,763 addresses from Gmail.com on September 10. That makes 10,347,899 addresses from 3 services. All of them were apparently harvested by a botnet and caused a sensation: only the lazy did not write an article about it at the time.

And here in front of me lay a file with 25,929,527 login-password pairs from a single Russian mail service... and the pairs worked! That shocked me. 25 million accounts, some with very complex passwords. This means they had all been stored in plain (or poorly obfuscated) form in the database of a very large email provider. After all, hashes of passwords such as "18101547481590210" or "y4_F37TRf-2U-k" could hardly have been cracked. Friends, please do not use the same password for different services. As practice shows, the owners of even large sites very often do not care enough about their users and store passwords in the clear.

How did I know the credentials came from this particular email provider, you ask? From a little analysis, of course, but first things first.

Where did the leak come from?


I think any information security specialist who has compiled their own brute-force dictionaries will answer this question quickly. Build a frequency list of the passwords, and the Top 100 is sure to contain the name of the service. Sad but true: people still use the name of the service as their password. For example, the Top 100 leaked Adobe passwords include "adobe123", "adobe1" and "adobeadobe". The last one looks like frustration: apparently the registration form would not accept plain "adobe". In our case, I found the @ SERVICENAME @ I was looking for in 59th place of the Top 100.
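The frequency-list trick can be sketched roughly as follows. This is only an illustration, not the tooling actually used: it assumes the dump holds one "login:password" pair per line (the real file layout is not described), and "examplemail" is a made-up stand-in for the service name.

```python
from collections import Counter


def password_frequencies(lines, sep=":"):
    """Count passwords in an iterable of 'login<sep>password' lines.

    The separator is an assumption about the dump format.
    Lines without a separator or with an empty password are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        # Split on the first separator only, so passwords may contain it.
        _login, found, password = line.partition(sep)
        if found and password:
            counts[password] += 1
    return counts


def rank_of(counts, needle):
    """1-based rank of the first password containing `needle`
    (case-insensitive) in the frequency list, or None if absent."""
    needle = needle.lower()
    for rank, (password, _count) in enumerate(counts.most_common(), start=1):
        if needle in password.lower():
            return rank
    return None


# Tiny made-up sample; "examplemail" plays the role of the service name.
sample = [
    "a@x.ru:123456",
    "b@x.ru:qwerty",
    "c@x.ru:123456",
    "d@x.ru:examplemail",
    "e@x.ru:qwerty",
    "f@x.ru:123456",
    "broken line without separator",
]
counts = password_frequencies(sample)
print(counts.most_common(3))
print(rank_of(counts, "examplemail"))
```

For a 700 MB file you would pass the open file object itself as `lines`, so the dump is streamed rather than loaded into memory.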

Top Passwords


With such a database at hand, I decided to compile my own list of the most frequently used passwords for more effective brute forcing later. Clearly, such a list works best, other things being equal, against Russian-language email services, but it is useful all the same. Having compiled the frequency list, I faced a question: how many top passwords are optimal for the most efficient brute force? To answer it, I plotted a histogram, starting with the Top 1000 passwords:



We can see that beyond the Top 200 the number of uses per password is insignificant.

Bar chart for top 200 passwords:



Again, beyond the Top 50 the frequency of use stays low. Bar chart for the Top 50 passwords:



Next, I compared the effectiveness of the Top 50 list against the Top 100. With the Top 50 passwords we guess an account's password with a probability of 6.98%, and with the Top 100 the probability is 8.72%. So doubling the brute-force time raises the success rate by only 1.74 percentage points. It is subjective, but the Top 50 list seemed sufficient to me.
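The comparison above boils down to the share of accounts covered by the N most frequent passwords. Here is a minimal sketch with made-up counts (the `coverage` helper and the sample numbers are illustrative assumptions); the last lines only redo the arithmetic on the 6.98% and 8.72% figures quoted in the text.

```python
from collections import Counter


def coverage(counts, n):
    """Share of all accounts whose password is among the n most frequent."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _password, count in counts.most_common(n))
    return top / total


# Made-up frequency list, just to exercise the function.
counts = Counter({"123456": 50, "qwerty": 30, "111111": 10,
                  "zxcvbn": 6, "hunter2": 4})
for n in (1, 2, 4):
    print(n, f"{coverage(counts, n):.0%}")

# Arithmetic on the article's figures: yield per tried password
# for the first 50 entries versus entries 51 to 100.
first_50 = 6.98 / 50
next_50 = (8.72 - 6.98) / 50
ratio = round(first_50 / next_50, 1)
print(ratio)
```

On the article's numbers, each password in the second fifty yields roughly four times less than one in the first fifty, which is the reasoning behind stopping at the Top 50.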

Years


Then I wondered how current this database is. When did it start being filled, and when was it leaked? Here is my own answer. I assumed I could get this information from the logins and passwords themselves. Using a regular expression (19** OR 20**), I selected the matching passwords and logins and built a histogram:



What can be said from the chart? Evidently, 1987 is the most common year of birth among the service's users. You can also see a correlation with demographic statistics:


Take a look at the period from 1999 to 2020:



Assuming that users often put their registration date in the login, we see that logins containing 2007 are the most numerous, those with 2008 slightly fewer, and those with 2009 far fewer. Given the yearly growth of the audience, we can conclude that the database stopped being filled (i.e. it was leaked) in mid-2008. We can also assume it started being filled around 1999.
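The year extraction described in this section can be sketched like this. The exact regular expression that was used is not given beyond "19** OR 20**", so both patterns below are assumptions: the loose one matches 19xx/20xx anywhere, the strict one additionally refuses matches embedded in a longer run of digits.

```python
import re
from collections import Counter

# Loose: any 19xx or 20xx, even inside a longer number.
LOOSE = re.compile(r"(?:19|20)\d\d")
# Strict: the four digits must not be part of a longer digit run.
STRICT = re.compile(r"(?<!\d)(?:19|20)\d\d(?!\d)")


def year_histogram(strings, pattern=LOOSE):
    """Count every year-like match in an iterable of logins or passwords."""
    counts = Counter()
    for s in strings:
        counts.update(int(year) for year in pattern.findall(s))
    return counts


# Made-up sample strings.
sample = ["masha1987", "ivan_2007", "19411945", "sergey2008",
          "qwerty", "olga1987"]
hist = year_histogram(sample)
strict_hist = year_histogram(sample, STRICT)
for year, count in sorted(hist.items()):
    print(year, count)
```

The loose pattern counts "19411945" as both 1941 and 1945 but will also pick up accidental matches inside phone numbers and the like; the strict pattern avoids those at the cost of missing concatenated years.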

Endings


It is known that users like to "strengthen" their passwords with endings such as "123" or "1". I decided to verify this once and for all.

Bar graph on top 50 password endings:



Empirically (and as the histogram shows), it makes sense to check only the endings "1" and "123", and only under special conditions. For example, when we can say with high probability that the user's password has an ending. Or when you have already tried, say, the Top 10,000 passwords without success, you can then check the Top 100 passwords with "1" and "123" appended. When compiling a dictionary of passwords with endings, also keep in mind that endings are most often added to dictionary passwords, e.g. "password123" or "qwerty1", and rarely to something like "19411945123". In the latter case it is not always clear whether this is an "ending" or part of the password's "root".

To make it concrete: 1% of people use passwords ending in "123" or "1", and only 5% use an ending from the Top 50 endings list. In general, additionally checking a password with endings is almost always less effective than simply checking the next password by frequency.
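One way to count endings is sketched below. How an "ending" was actually defined for the chart is not stated, so this sketch assumes it is a trailing run of digits that follows a non-digit "root"; under that assumption all-digit passwords such as "19411945123" are treated as having no ending, which sidesteps the root-versus-ending ambiguity mentioned above.

```python
import re
from collections import Counter

# Root must end with a non-digit; the ending is the trailing digit run.
ENDING_RE = re.compile(r"(.*\D)(\d+)")


def split_ending(password):
    """Return (root, ending) or None if there is no digit ending."""
    match = ENDING_RE.fullmatch(password)
    return (match.group(1), match.group(2)) if match else None


def ending_stats(passwords):
    """Return (Counter of endings, total number of passwords seen)."""
    endings = Counter()
    total = 0
    for password in passwords:
        total += 1
        parts = split_ending(password)
        if parts:
            endings[parts[1]] += 1
    return endings, total


# Made-up sample passwords.
sample = ["password123", "qwerty1", "19411945123", "marina",
          "admin123", "x1"]
endings, total = ending_stats(sample)
for ending, count in endings.most_common():
    print(repr(ending), count, f"{count / total:.0%}")
```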

Interesting Facts


1. The effectiveness of a frequency dictionary. Interestingly, just 0.00054% of the passwords (50 of them) crack 6.98% of the accounts! Impressive, isn't it?

2. Names. I was very surprised that the Top 100 passwords included the names marina (60th), nikita (86th) and natasha (98th). Why these names in particular?! Either Marinas, Nikitas and Natashas care less about their information security, or everybody loves Marinas, Nikitas and Natashas. It remains a mystery to me.

3. Weak passwords. The Top 50 contained the passwords "1" and "123"; apparently passwords of any length were allowed at registration back then. They should not, of course, be added to a brute-force list for modern services: such passwords are now banned almost everywhere.

4. 1941-1945. The histogram of passwords containing dates has 3 peaks: 1937, 1941 and 1945.



For Russians, the dates of the Great Victory are very close to the heart, so they put them in their passwords. I was glad for the country and for the patriotism of Russian users. Well done!

5. What happened in 1937? A well-known date? Not to me. I racked my brains trying to find out what event happened that year that is so important that, for a Russian, it stands alongside the years of the Great Patriotic War. Google did not help either. What was it? I had decided it was simply a convenient combination to type on the NumPad, because similar peaks showed up for 8246, 2846 and the like. But in the comments, Anton-K and bobermaniac reminded me that this was the year of the Great Terror. Solzhenitsyn wrote about it in great detail in The Gulag Archipelago. I recommend reading it. The terror was not limited to 1937-1938, but in that period it was exceptional.

6. About the Marinas. At the request of StopDesign and ilrandir, I am adding detailed statistics on accounts named "marina":
Total accounts: 25929527
With "marin" (casing-insensitive): 93547
Of these (out of 93547):
1593 (1.7 %): "marina"
95 (0.1 %): "Marina"
42 (0.04 %): "MARINA"
7 (0.007 %): "marina" in shuffled casing ("mArINA", "MaRiNa", "MARINa")
658 (0.7 %): /marina\d+/ ("marina43")
43 (0.05 %): /Marina\d+/ ("Marina51236")
18 (0.02 %): /MARINA\d+/ ("MARINA8734")
651 (0.7 %): containing "marina" ("marinaiii", "17marina77", "Hrayr & Marina")
9314: "marin", "marina"


Conclusion


After this incident, it became obvious to me that the information that leaks into the public is only a small part of everything interesting. Far more interesting and large-scale things happen out of public view. It is quite possible that right now your account data is being siphoned off by a pot-bellied guy from Orenburg, sitting in his underpants behind a monitor and five VPNs, and you will only suspect it when your mail service insistently asks you to change your password. And that is if you are lucky.

Bonus


And finally, here are the Top 100 passwords from the Russian mail service, current, as it turned out, as of 2008 (@ SERVICENAME @ is the name of the service itself):

TOP 100 passwords
123456
666666
654321
000000
555555
7777777
123321
123123
12345678
1234567890
777777
123
111111
121212
12345
112233
123456789
159753
987654321
123654
999999
222222
gfhjkm
1234567
qazwsx
qwerty
987654
333333
1234
1111111
asdasd
131313
zxcvbn
789456
159357
1
888888
147258
asdasd123
111
asdfgh
11111111
111222
777
zxcvbnm
qwertyuiop
098765
1111
1q2w3e
0987654321
88888888
7654321
147852
123789
444444
ghbdtn
123qwe
12344321
@ SERVICENAME @
marina
010203
qwertyu
5555555
1111111111
666
147258369
123123123
101010
135790
252525
789456123
password
samsung
55555
1q2w3e4r
232323
qqqqqq
555
1986
1985
1984
1234554321
1987
qweasd
666999
nikita
159951
qazwsxedc
1983
456123
87654321
134679
999999999
142536
212121
11111
1982
natasha
11223344
124578

Source: https://habr.com/ru/post/257881/

