
Black datamining archeology: what could be more effective than a dictionary attack?

For those too lazy to read further, here is the answer right away: the “password equals login” attack. According to the statistics, a password equal to the login occurs more often than even the most common dictionary password. The rest of the article presents some statistics on this topic, along with the story where it all began.




This study was prompted by a story that happened back in 2000 to a certain young man. He was no hacker, but he wanted to break into one particular person's mailbox. Her login ended in two digits, something like masha86@mail.com. When the obvious passwords failed, he guessed that the password might have the form mashaDD, where DD stands for two arbitrary digits. This attack takes at most 100 attempts; the password matched on roughly the twentieth try, and the mailbox was cracked. The things one does in youth out of jealousy and for the sake of love...


So in today's study I decided to check how often passwords are either equal to their logins or are a small modification of them.

To begin with, in addition to the 6 million mail account passwords, I added to the study a database of 3.5 million passwords from a single non-mail site. These are recent entries (May 2015) and include quite a few invalid passwords. Statistics for this site were computed separately.

Password is equal to login
The number of entries where the password equals the login is about 87 thousand for the mail passwords and about 50 thousand for the site. Is that a lot or a little? For comparison, here are the two most common passwords (1st and 2nd place in the distribution). For convenience, values are also given in thousandths (‰) of the total number of passwords.

| Mail passwords | Count | ‰ | Passwords from the site | Count | ‰ |
|---|---|---|---|---|---|
| Password equals login | 86908 | 14.3 | Password equals login | 49327 | 14.0 |
| Top 1: "123456" | 82830 | 13.6 | Top 1: "qwerty" | 33322 | 9.5 |
| Top 2: "qwerty" | 53144 | 8.7 | Top 2: "123456" | 21775 | 6.2 |
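The thousandths (‰) in the table are simply the count divided by the size of the database, times 1000. A minimal R sketch, assuming the site database size of 3,520,000 entries reported by `nrow(DATA)` in the script at the end of the article; the helper `per_mille` and the toy logins are hypothetical, for illustration only:

```r
# Hypothetical helper: share of entries in thousandths (per mille)
per_mille <- function(n, total) 1000 * n / total

# Site database: 3,520,000 entries (nrow(DATA) in the author's script)
site_total  <- 3520000
site_counts <- c(login_equals_password = 49327, qwerty = 33322, "123456" = 21775)
round(per_mille(site_counts, site_total), 1)

# Counting "password equals login" on a toy data frame (made-up rows)
toy <- data.frame(
  login  = c("masha86", "ivan", "petr", "olga"),
  passwd = c("masha86", "qwerty", "petr", "olga1"),
  stringsAsFactors = FALSE
)
n_equal <- sum(toy$login == toy$passwd)  # number of rows where they match
per_mille(n_equal, nrow(toy))
```

Rounded to one decimal, the site values reproduce the 14.0, 9.5 and 6.2 in the table.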


Overlap
Next, let's see how often the password is a small modification of the login. Such cases are less frequent, but this is offset by the low complexity of the attack.

| Type of attack | Complexity (attempts) | Mail: count | Mail: ‰ | Site: count | Site: ‰ |
|---|---|---|---|---|---|
| Password and login differ only in the last character | ~70 | 18350 | 3.02 | 20869 (!) | 5.93 |
| Differ in the last two characters, which are digits | 100 | 1702 | 0.28 | 1226 | 0.35 |
| One character appended to the login | ~100 | 5508 | 0.90 | 1930 | 0.55 |
| Two characters appended | ~10,000 | 5087 | 0.84 | 3269 | 0.93 |
| Four characters appended, digits only | 100 to 10,000 | 7267 | 1.19 | 3252 | 0.92 |
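The complexity column is the number of candidate passwords one has to try for a given login. A minimal R sketch of candidate generators for these modification types; the function names are hypothetical, and the alphabet (Latin letters plus digits, 62 symbols) is an assumption, whereas the table's estimates (~70, ~100, ~10,000) imply a somewhat larger character set:

```r
# Assumed alphabet: Latin letters and digits (62 symbols)
alphabet <- c(letters, LETTERS, as.character(0:9))
digits   <- as.character(0:9)

# Last character of the login replaced by any other symbol
replace_last <- function(login, chars = alphabet) {
  stem <- substr(login, 1, nchar(login) - 1)
  last <- substr(login, nchar(login), nchar(login))
  paste0(stem, setdiff(chars, last))
}

# Last two characters replaced by two digits (the "mashaDD" attack)
replace_last_two_digits <- function(login) {
  stem <- substr(login, 1, nchar(login) - 2)
  paste0(stem, sprintf("%02d", 0:99))
}

# k symbols appended to the login: every combination of k characters
append_chars <- function(login, k, chars = alphabet) {
  grid <- expand.grid(rep(list(chars), k), stringsAsFactors = FALSE)
  paste0(login, do.call(paste0, grid))
}

length(replace_last("masha86"))             # 61 candidates
length(replace_last_two_digits("masha86"))  # 100 candidates
length(append_chars("masha86", 4, digits))  # 10,000 candidates
```

For the story at the start of the article, `replace_last_two_digits("masha86")` returns the 100 candidates masha00 through masha99; appending two arbitrary symbols already grows to 62² = 3844 candidates with this alphabet.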


In terms of frequency, all these cases fall into the top 50 most common passwords:

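Top 50 mail passwords

| Password | Count |
|---|---|
| 123456 | 82830 |
| qwerty | 53144 |
| 123456789 | 23286 |
| 111111 | 13831 |
| qwertyuiop | 12399 |
| qwe123 | 9021 |
| 1234567890 | 8364 |
| 1234567 | 7452 |
| 12345 | 6420 |
| password | 6410 |
| 12345678 | 6374 |
| 123321 | 6170 |
| 7777777 | 5861 |
| 123123 | 5533 |
| 0 | 4977 |
| 666666 | 4197 |
| 1qaz2wsx | 4181 |
| qazwsx | 4143 |
| 1q2w3e4r | 3982 |
| 654321 | 3760 |
| 555555 | 3539 |
| 123qwe | 2973 |
| 1q2w3e4r5t | 2967 |
| zxcvbnm | 2832 |
| qweqwe | 2816 |
| gfhjkm | 2806 |
| 1q2w3e | 2748 |
| klaster | 2695 |
| 112233 | 2565 |
| 121212 | 2445 |
| 987654321 | 2371 |
| 159753 | 2338 |
| 777777 | 2204 |
| qwer1234 | 2015 |
| 1234qwer | 1999 |
| qwerty123 | 1846 |
| 1234 | 1801 |
| asdfgh | 1779 |
| abc123 | 1722 |
| 123654 | 1568 |
| 222222 | 1557 |
| iloveyou | 1508 |
| 987654321 | 1432 |
| samsung | 1427 |
| zxcvbn | 1422 |
| ghbdtn | 1313 |
| 88888888 | 1311 |
| marina | 1284 |
| 131313 | 1268 |
| asdfghjkl | 1243 |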


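Top 50 passwords from the site

| Password | Count |
|---|---|
| qwerty | 33322 |
| 123456 | 21775 |
| (empty password) | 20002 |
| UsdopaA (bots) | 16016 |
| 123456789 | 8298 |
| 1234567890 | 4117 |
| qwertyuiop | 2247 |
| 123321 | 2235 |
| 1234567 | 2214 |
| 1q2w3e4r5t | 2142 |
| 111111 | 2004 |
| 1q2w3e4r | 1682 |
| 123qwe | 1554 |
| 123123 | 1364 |
| qazwsx | 1319 |
| 1q2w3e | 1256 |
| qazwsxedc | 1196 |
| qwe123 | 1186 |
| qweasdzxc | 1126 |
| 9379992 | 1020 |
| 0 | 1018 |
| 4815162342 | 1015 |
| iloveyou | 991 |
| 12345678 | 979 |
| 666666 | 977 |
| zxcvbnm | 957 |
| asdfgh | 930 |
| Jskasgfdfjg | 923 |
| gfhjkm | 914 |
| qwertyuiop[] | 904 |
| 1234qwer | 899 |
| 1q2w3e4r5t6y | 890 |
| qwerty123 | 839 |
| nastya | 799 |
| 555555 | 770 |
| 987654321 | 755 |
| ghbdtn | 746 |
| 12345qwert | 740 |
| 159753 | 737 |
| loveyou | 735 |
| 1234554321 | 716 |
| 7777777 | 711 |
| 1qaz2wsx | 708 |
| 123123123 | 679 |
| samsung | 670 |
| 123qweasdzxc | 662 |
| adidas | 642 |
| asdfghjkl | 641 |
| 789456123 | 636 |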
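The claim that every modification type lands in the top 50 can be checked directly: a count n would take the place just below all top-50 passwords that occur more often than n. A minimal R sketch using the mail counts from the list above and the mail counts for each attack type from the two tables above; the helper `rank_in_top` is hypothetical:

```r
# Occurrence counts of the top 50 mail passwords, in the order listed above
mail_top50 <- c(82830, 53144, 23286, 13831, 12399, 9021, 8364, 7452, 6420, 6410,
                6374, 6170, 5861, 5533, 4977, 4197, 4181, 4143, 3982, 3760,
                3539, 2973, 2967, 2832, 2816, 2806, 2748, 2695, 2565, 2445,
                2371, 2338, 2204, 2015, 1999, 1846, 1801, 1779, 1722, 1568,
                1557, 1508, 1432, 1427, 1422, 1313, 1311, 1284, 1268, 1243)

# Rank a count would take if it were a single password in this list
rank_in_top <- function(n, top) {
  vapply(n, function(v) sum(top > v) + 1L, integer(1))
}

# Mail counts for each attack type, from the tables above
attack_counts <- c(equal = 86908, last_char = 18350, two_digits = 1702,
                   append_one = 5508, append_two = 5087, append_four_digits = 7267)
rank_in_top(attack_counts, mail_top50)
```

With these numbers the ranks come out as 1, 4, 40, 15, 15 and 9, so even the rarest case (two trailing digits, 1702 entries) would sit at about 40th place.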




Conclusion
Many mail portals today (but very few ordinary sites and forums) do not let you set a password equal to the login. However, in all this time I have come across only one site that would not accept a password formed by adding one character to the login; it responded with “your password is very similar to the login”. On today's Internet such a check is the exception rather than the rule.

Meanwhile, guessing a password as a small modification of the login succeeds fairly often and requires only a small number of candidates.
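The trade-off between frequency and number of candidates can be made concrete by dividing each attack's share of the mail database (‰) by its approximate number of attempts, both taken from the tables above. A rough R sketch; the attempt counts are the article's own estimates, and taking 10,000 for the four-digit case (its upper bound) is an assumption:

```r
# Share of mail accounts (per mille) and approximate attempts per login,
# from the tables above; 10,000 is the upper bound for four appended digits.
attacks <- data.frame(
  attack   = c("password = login", 'top-1 "123456"', "last character differs",
               "two trailing digits", "one character appended",
               "two characters appended", "four digits appended"),
  share    = c(14.3, 13.6, 3.02, 0.28, 0.90, 0.84, 1.19),
  attempts = c(1, 1, 70, 100, 100, 10000, 10000),
  stringsAsFactors = FALSE
)

# Expected yield of a single guess, in per mille
attacks$per_attempt <- attacks$share / attacks$attempts
attacks[order(-attacks$per_attempt), ]
```

Apart from the exact match, each modification yields well under 0.1‰ per attempt, far less than a top dictionary guess.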

Of course, in terms of overall (integral) complexity, a dictionary attack is somewhat more advantageous. On the other hand, security systems, even on large portals, do not take login-modification attacks into account.
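A check like the one on that single site is easy to sketch. The version below is an assumption, not that site's actual rule: it flags a password whose edit distance to the login (base R's `adist`, a generalized Levenshtein distance) is at most 2, or that starts with the login, which also catches appended years such as masha861990. The function name `too_similar` and the threshold are hypothetical:

```r
# Hypothetical policy check: TRUE means "password is too similar to the login".
# utils::adist computes the generalized Levenshtein distance between strings.
too_similar <- function(login, passwd, max_dist = 2) {
  login  <- tolower(login)
  passwd <- tolower(passwd)
  edit_dist <- mapply(function(l, p) drop(adist(l, p)), login, passwd,
                      USE.NAMES = FALSE)
  edit_dist <= max_dist | startsWith(passwd, login)
}

too_similar(rep("masha86", 4),
            c("Masha86", "masha87", "masha861990", "Tr0ub4dor&3"))
```

The first three candidates are rejected (equal ignoring case, one substitution, login plus four digits), while the unrelated password passes.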

R code, for anyone interested:

```r
# Load the cleaned site database
DATA <- readRDS(file = "ClearData.rds")

login  <- DATA$login
passwd <- DATA$passwd
nl <- nchar(login)
np <- nchar(passwd)

# Total number of entries: 3520000
nrow(DATA)

# Password equals login: 49327
sum(login == passwd, na.rm = TRUE)

# One character appended to the login: 1930
sum(login == substr(passwd, 1, np - 1), na.rm = TRUE)

# Two characters appended: 3269
sum(login == substr(passwd, 1, np - 2), na.rm = TRUE)

# Four digits appended: 3252
sum(login == substr(passwd, 1, np - 4) &
    grepl("\\d{4}", substr(passwd, np - 3, np)), na.rm = TRUE)

# Login and password differ only in the last character: 20869
sum(substr(login, 1, nl - 1) == substr(passwd, 1, np - 1) &
    login != passwd, na.rm = TRUE)

# Differ in exactly the last two characters (1477);
# of these, with two digits at the end of the password: 1226
sum(substr(login, 1, nl - 2) == substr(passwd, 1, np - 2) &
    login != passwd &
    substr(login, 1, nl - 1) != substr(passwd, 1, np - 1) &
    grepl("\\d\\d", substr(passwd, np - 1, np)), na.rm = TRUE)

# Password frequency table
library(dplyr)
PASS_SUM <- DATA[, c("passwd", "count")] %>%
  group_by(passwd) %>%
  summarise(count = sum(count)) %>%
  arrange(desc(count))

# Passwords that occur exactly once: 2132935
nrow(filter(PASS_SUM, count == 1))

# Passwords that occur more than 64 times: 887
nrow(filter(PASS_SUM, count > 64))

# Save the 100 most common passwords
PASS_100 <- head(PASS_SUM, 100)
write.csv(PASS_100, file = "SpPassSum100.csv", row.names = FALSE)
```



Previous issue: Black datamining archeology

In the next issue: a comparison of the mail password database with the password database of a non-mail website. How useful were mail passwords leaked in 2014?

Source: https://habr.com/ru/post/261331/

