Account hacking and Unicode characters

Spotify's technical blog published an interesting write-up about hijacking accounts on the service by exploiting how it canonicalizes user-entered data. The attack was made possible by something Spotify's engineers are proud of: fully Unicode usernames. For example, a user can have a snowman as an account name if they wish. From the very beginning, however, this feature caused some trouble.

A few years ago, on Good Friday, someone posted a message on the support forum claiming that any account on the service could be hijacked. A company representative asked for a demonstration on his own account, and within a few minutes he was sent his new password and a new playlist had been created on the account. This immediately drew the attention of several employees, who ended up spending Easter trying to close the hole. Because of the nature of the vulnerability, new account registration was temporarily disabled.

The attacker proceeded as follows. To take over an account named, say, bigbird, he registered an account named ᴮᴵᴳᴮᴵᴿᴰ (in Python, this string is u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30'). He then requested a password-reset link for the new account, and the new password he set through that link was applied to the bigbird account.

The root cause was username canonicalization, which handled forbidden and equivalent characters incorrectly. Obviously, spaces are not allowed in a username, and BigBird and bigbird are the same login. The first case is about handling forbidden characters; the second is about treating certain characters as equivalent to each other. Both are achieved by canonicalizing the username.
If only Latin letters are allowed (a-z, A-Z), then

 canonical_username = username.lower()

Thus BigBird, Bigbird, bigbird and any other case variant map to the same login. Here BigBird is called the exact username and bigbird the canonical username. When an account is created, its canonical username must not already be taken.
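The ASCII-only scheme can be sketched as follows. This is a minimal illustration, not Spotify's code: `canonicalize`, `register` and the in-memory `accounts` dict are hypothetical names. Characters outside a-z/A-Z are rejected as forbidden, case variants are treated as equivalent, and registration fails if the canonical name is already taken.

```python
import re

# Hypothetical in-memory store: canonical username -> exact username.
accounts = {}

# Only Latin letters are allowed; spaces, digits and other characters are forbidden.
_ALLOWED = re.compile(r"[A-Za-z]+")


def canonicalize(username):
    """Reject forbidden characters, then map case variants to one canonical form."""
    if not _ALLOWED.fullmatch(username):
        raise ValueError("forbidden characters in username: %r" % username)
    return username.lower()


def register(username):
    """Register an exact username only if its canonical form is still free."""
    canonical = canonicalize(username)
    if canonical in accounts:
        raise ValueError("username %r is taken (as %r)" % (username, accounts[canonical]))
    accounts[canonical] = username
    return canonical


register("BigBird")      # stores exact name "BigBird" under canonical "bigbird"
try:
    register("bigbird")  # same canonical name, so registration is refused
except ValueError as err:
    print(err)
```

Keeping the exact name as the stored value means the user still sees "BigBird" as typed, while every lookup goes through the canonical key.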

Converting to lower case is idempotent: applying it to the same string once or several times gives the same result:

 x.lower() == x.lower().lower()

Once Unicode characters are allowed, all sorts of problems begin. For example, it is hard to tell Ω (U+03A9) from Ω (U+2126) by eye, even though the first is the Greek capital letter omega and the second is the ohm sign, a unit symbol; in Unicode they are different characters. Clearly, simply converting to lower case cannot cover all such equivalences. Fortunately, the developers did not have to design their own canonicalization scheme: the Twisted framework already had suitable functions, written for XMPP.
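The effect can be observed with Python 3's standard `unicodedata` module. This is only an illustration, not part of Spotify's or Twisted's code. It shows that omega and the ohm sign are distinct code points that compatibility normalization (NFKC) folds together. Lowercasing happens to unify this particular pair, since Unicode lowercases both to ω (U+03C9), but it does not help with fullwidth Latin letters, which stay fullwidth after `lower()`.

```python
import unicodedata

omega = "\u03a9"  # GREEK CAPITAL LETTER OMEGA
ohm = "\u2126"    # OHM SIGN

print(unicodedata.name(omega), "|", unicodedata.name(ohm))
print(omega == ohm)                                 # False: different code points
print(unicodedata.normalize("NFKC", ohm) == omega)  # True: NFKC maps the ohm sign to omega
print(ohm.lower() == omega.lower())                 # True: both lowercase to U+03C9

# "Bigbird" spelled with fullwidth Latin letters (U+FF22, U+FF49, ...).
fullwidth = "\uff22\uff49\uff47\uff42\uff49\uff52\uff44"
print(fullwidth.lower() == "bigbird")                                 # False
print(unicodedata.normalize("NFKC", fullwidth).lower() == "bigbird")  # True
```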

 from twisted.words.protocols.jabber.xmpp_stringprep import nodeprep

 def canonical_username(name):
     return nodeprep.prepare(name)

The specification promises idempotency. So what went wrong? Let's see what happens when you enter ᴮᴵᴳᴮᴵᴿᴰ.

 >>> canonical_username(u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30')
 u'BIGBIRD'
 >>> canonical_username(canonical_username(u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30'))
 u'bigbird'
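The same behaviour can be reproduced without Twisted. The sketch below is a deliberately simplified, hypothetical stand-in for nodeprep named `mixed_version_prep`: it keeps only the mapping step (RFC 3454 tables B.1 and B.2, taken from Python 3's `stringprep` module, whose tables are frozen at Unicode 3.2) and the NFKC step, and skips the prohibited-character and bidirectional checks. The assumption being illustrated is that NFKC runs against Python's newer Unicode database while the case-folding table only knows Unicode 3.2.

```python
import stringprep
import unicodedata


def mixed_version_prep(name):
    """Hypothetical, simplified nodeprep: RFC 3454 mapping, then NFKC.

    The B.1/B.2 tables in `stringprep` only know Unicode 3.2, but the NFKC
    step below uses the newer Unicode database that ships with Python.
    """
    # Mapping: drop characters "commonly mapped to nothing" (table B.1)
    # and case-fold the rest (table B.2). Characters unknown to Unicode 3.2,
    # such as the modifier letters in the spoofed name, pass through unchanged.
    mapped = "".join(
        stringprep.map_table_b2(ch) for ch in name if not stringprep.in_table_b1(ch)
    )
    # Normalization with the *current* database: modifier letter capital B
    # decomposes to a plain "B", but nothing lowercases it afterwards.
    return unicodedata.normalize("NFKC", mapped)


spoof = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # ᴮᴵᴳᴮᴵᴿᴰ
once = mixed_version_prep(spoof)
twice = mixed_version_prep(once)
print(once, twice)  # BIGBIRD bigbird
```

Uppercase letters only appear after normalization, so a second pass is needed to case-fold them, which is exactly why the function is not idempotent on these inputs.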

As you can see, the function is not idempotent for these characters. According to the official documentation, only Unicode 3.2 characters were taken into account, and Unicode 3.2 does not include any of the characters in ᴮᴵᴳᴮᴵᴿᴰ. At registration, the canonicalization function was applied once, producing the login BIGBIRD; this was accepted because that canonical name did not collide with the existing bigbird account. When the e-mail with the password-reset link was sent, ᴮᴵᴳᴮᴵᴿᴰ was again canonicalized once, so the user received a reset link for BIGBIRD. But when the link was followed, the canonicalization function was applied a second time, which reset the password of the bigbird account rather than BIGBIRD.
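To make the sequence of events concrete, here is a toy model of the flow. Everything in it is hypothetical (`accounts`, `register`, `send_reset_link`, `redeem_reset_link`, plaintext passwords) and is not Spotify's code; `canonical_username` is a simplified stand-in with the same mixed-Unicode-version flaw, built from Python's `stringprep` and `unicodedata` modules. The key point is that registration and link generation canonicalize once, while redeeming the link canonicalizes a second time.

```python
import stringprep
import unicodedata


def canonical_username(name):
    # Hypothetical stand-in with the same flaw: Unicode 3.2 case-folding
    # tables, followed by NFKC from Python's newer Unicode database.
    mapped = "".join(stringprep.map_table_b2(c) for c in name if not stringprep.in_table_b1(c))
    return unicodedata.normalize("NFKC", mapped)


# Toy storage: canonical username -> password (plaintext only for illustration).
accounts = {"bigbird": "victim-password"}


def register(name, password):
    canonical = canonical_username(name)  # first canonicalization
    if canonical in accounts:
        raise ValueError("username taken")
    accounts[canonical] = password
    return canonical


def send_reset_link(name):
    # The link carries the name after a single canonicalization.
    return {"user": canonical_username(name)}


def redeem_reset_link(link, new_password):
    # Canonicalizing again here is the bug: "BIGBIRD" becomes "bigbird".
    accounts[canonical_username(link["user"])] = new_password


spoof = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # ᴮᴵᴳᴮᴵᴿᴰ
register(spoof, "attacker-password")  # stored as "BIGBIRD", no collision
link = send_reset_link(spoof)         # reset link for "BIGBIRD"
redeem_reset_link(link, "pwned")      # overwrites the password of "bigbird"
print(accounts)
```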

The vulnerability was first patched by requiring that X == canonical_username(X). Later, a wrapper function was added that essentially performs the same canonicalization but refuses registration if old_prepare(old_prepare(name)) != old_prepare(name). The underlying problem in Twisted was fixed in version 11.0.0. As it turned out, the bug only appeared after Python was upgraded from 2.4 to 2.5, because of changes in the standard library: Python 2.5 moved its unicodedata module from Unicode 3.2 to Unicode 4.1.
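Both mitigations can be expressed as simple guards. The sketch below is an assumption about their general shape, not Spotify's actual code: `old_prepare` is a hypothetical stand-in with the same mixed-Unicode-version flaw as the original canonicalizer, `is_canonical` implements the X == canonical_username(X) check, and `checked_prepare` canonicalizes as before but refuses names whose canonical form is not stable under a second pass.

```python
import stringprep
import unicodedata


def old_prepare(name):
    # Hypothetical stand-in for the flawed canonicalizer (Unicode 3.2 mapping
    # tables combined with the newer NFKC data shipped with Python).
    mapped = "".join(stringprep.map_table_b2(c) for c in name if not stringprep.in_table_b1(c))
    return unicodedata.normalize("NFKC", mapped)


def is_canonical(x):
    """First fix: accept a value only if it is already in canonical form."""
    return x == old_prepare(x)


def checked_prepare(name):
    """Later fix: canonicalize as before, but refuse non-idempotent inputs."""
    prepared = old_prepare(name)
    if old_prepare(prepared) != prepared:
        raise ValueError("username %r does not canonicalize stably" % name)
    return prepared


spoof = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # ᴮᴵᴳᴮᴵᴿᴰ
print(checked_prepare("BigBird"))        # bigbird
print(is_canonical(old_prepare(spoof)))  # False: "BIGBIRD" is not canonical
try:
    checked_prepare(spoof)
except ValueError as err:
    print("rejected:", err)
```

A stricter alternative, closer to RFC 3454 itself, is to keep the whole pipeline on one Unicode version: normalize with `unicodedata.ucd_3_2_0.normalize` and reject code points that were unassigned in Unicode 3.2 (`stringprep.in_table_a1`), rather than patching around the mismatch.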

Cases like this once again underline the need to validate user input and to keep communication between users and staff constructive rather than hostile. The people who found the vulnerability were rewarded with several months of free Premium subscription, and this was neither the first nor the last problem caused by the quirks of character handling. Also keep in mind that updates do not always fix bugs; sometimes they introduce new ones.

Source: https://habr.com/ru/post/183878/
