
Be careful with copy-paste: fingerprinting text with unprintable characters

Don't feel like reading? See the demo.

Zero-width characters are non-printing control characters that most applications do not display. For example, I have inserted ten zero-width spaces into this sentence; did you notice? (Hint: paste the sentence into Diff Checker to see where the characters are!) These characters can be used as unique fingerprints in text to identify users.


Of course, one could be hiding right here, and you would never guess.

Why?


Well, the original reason is not very interesting. A few years ago, my team and I competed in various video game tournaments. The team had a private page for important announcements, among other things. But eventually those announcements started being reposted elsewhere, mocking the team and revealing confidential information and team tactics.

The site's security seemed solid, so we assumed an insider was responsible: someone who logged in with a valid username and password, then simply copied the announcement and posted it elsewhere. So I wrote a script that invisibly stamps each announcement with the username of whoever is viewing it.

After a recent post by Zach Aysan, it became clear that people are interested in the topic of zero-width characters. So I decided to publish this method here, along with an interactive demo for everyone. The code samples have been updated to modern JavaScript, but the overall logic is the same.

How?


The exact steps and logic are described below, but in short: the username string is converted to binary, and the binary string is then converted to a series of zero-width characters, one per bit. That invisible string is quietly inserted into the text. If the text is later published on another site, the zero-width characters can be extracted and the process reversed to find out the username of whoever did the copy-paste!

Fingerprinting Text


1. Get the logged-in user's username and convert it to binary

Here we simply convert each letter of the username to its binary equivalent.

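// Left-pad a binary string with zeros to 8 digits.
const zeroPad = num => '00000000'.slice(String(num).length) + num;

// Convert each character of the username to its binary code, space-separated.
const textToBinary = username => (
  username.split('').map(char => zeroPad(char.charCodeAt(0).toString(2))).join(' ')
);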

2. Take the binary username and convert it to zero-width characters

The following script iterates over the binary string and converts every 1 to a zero-width space and every 0 to a zero-width non-joiner. After each letter has been converted, it inserts a zero-width joiner and moves on to the next letter.

const binaryToZeroWidth = binary => (
  binary.split('').map((binaryNum) => {
    const num = parseInt(binaryNum, 10);
    if (num === 1) {
      return '\u200B'; // zero-width space
    } else if (num === 0) {
      return '\u200C'; // zero-width non-joiner
    }
    return '\u200D'; // zero-width joiner (letter delimiter)
  }).join('\uFEFF') // zero-width no-break space
);
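Because the output of this step is invisible, it can help to print it as escape sequences. The sketch below is an illustration rather than part of the original script: `showCodeUnits` is a hypothetical helper, and the encoder is restated compactly on the assumption that the separator is the zero-width no-break space (U+FEFF) named in the code comment.

```javascript
// Assumed code points, taken from the comments in the article's code.
const ZWSP = '\u200B';   // zero-width space, stands for 1
const ZWNJ = '\u200C';   // zero-width non-joiner, stands for 0
const ZWJ = '\u200D';    // zero-width joiner, stands for the space between letters
const ZWNBSP = '\uFEFF'; // zero-width no-break space, separator (assumption)

// Compact restatement of the encoder from this step.
const binaryToZeroWidth = binary => (
  binary.split('').map((bit) => {
    if (bit === '1') return ZWSP;
    if (bit === '0') return ZWNJ;
    return ZWJ;
  }).join(ZWNBSP)
);

// Hypothetical helper: render every UTF-16 code unit as a \uXXXX escape.
const showCodeUnits = text => (
  text.split('')
    .map(ch => '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'))
    .join('')
);

const encoded = binaryToZeroWidth('10 1');
console.log(showCodeUnits(encoded));
// prints \u200B\uFEFF\u200C\uFEFF\u200D\uFEFF\u200B
```

Each input character becomes one zero-width character, with a separator between every pair, so an input of length n yields 2n - 1 code units.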

3. Insert the zero-width "username" into the confidential text

Here we simply insert the block of zero-width characters into the confidential text.
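The article does not say where in the text the block goes, so the following is only a sketch under one assumption: the fingerprint is spliced in right after the first word, so that it travels with the opening of the text. `embedFingerprint` is a hypothetical name, and the fingerprint here is a short stand-in rather than a real encoded username.

```javascript
// Hypothetical helper: splice the invisible fingerprint in after the first word.
// If the text has no space, the fingerprint is appended at the end.
const embedFingerprint = (text, fingerprint) => {
  const firstSpace = text.indexOf(' ');
  const position = firstSpace === -1 ? text.length : firstSpace;
  return text.slice(0, position) + fingerprint + text.slice(position);
};

const announcement = 'Practice moved to Friday';
const fingerprint = '\u200B\uFEFF\u200C'; // stand-in for an encoded username
const tagged = embedFingerprint(announcement, fingerprint);

// The tagged text looks identical on screen but is longer than the original.
console.log(tagged.length - announcement.length);
```

Any position works for decoding, since extraction ignores the visible text. Placing the block inside the text rather than at the very end makes it less likely to be dropped by a partial selection.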

Extracting the username from fingerprinted text


The same steps in reverse order.

1. Extract the zero-width "username" from the confidential text

Strip the confidential text from the string, leaving only the zero-width characters.
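The original post shows no code for this step. One way to do it, sketched here with a hypothetical `extractZeroWidth` helper, is a regular expression that deletes everything except the four zero-width code points mentioned in this article (U+200B, U+200C, U+200D, U+FEFF).

```javascript
// Hypothetical helper: keep only the zero-width characters used by the scheme.
const extractZeroWidth = text => text.replace(/[^\u200B\u200C\u200D\uFEFF]/g, '');

const sample = 'Practice\u200B\uFEFF\u200C moved to Friday';
const hidden = extractZeroWidth(sample);

console.log(hidden.length); // number of invisible code units found in the sample
```

This assumes the visible text contains no zero-width characters of its own. Text that legitimately uses them (some scripts rely on joiners) would contaminate the extracted string.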

2. Convert the zero-width "username" back to binary

Here we split the string into its individual zero-width characters, accounting for the separators added during encoding. Walk through the characters and emit 1 or 0 for each one to rebuild the binary string. If a character is neither, you have reached the letter delimiter (the zero-width joiner), which completes the binary form of one letter: add a single space to the string and move on to the next letter.

const zeroWidthToBinary = string => (
  // Drop the zero-width no-break space separators, then map character by character.
  string.replace(/\uFEFF/g, '').split('').map((char) => {
    if (char === '\u200B') { // zero-width space
      return '1';
    } else if (char === '\u200C') { // zero-width non-joiner
      return '0';
    }
    return ' '; // letter delimiter: add single space
  }).join('')
);

3. Convert the username from binary back to text

Finally, we parse the binary string and convert each run of 1s and 0s into the corresponding character.

const binaryToText = string => (
  string.split(' ').map(num => String.fromCharCode(parseInt(num, 2))).join('')
);
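To check that the two halves really are inverses, here is a compact round trip. It is a sketch, not the author's code: `encode` and `decode` are hypothetical names that chain the steps above, `padStart` replaces the `zeroPad` helper, and the separator is assumed to be U+FEFF.

```javascript
const ZWSP = '\u200B';   // stands for 1
const ZWNJ = '\u200C';   // stands for 0
const ZWJ = '\u200D';    // stands for the space between letters
const ZWNBSP = '\uFEFF'; // separator between emitted characters (assumption)

// username -> space-separated 8-bit binary -> zero-width string
const encode = username => (
  username.split('')
    .map(ch => ch.charCodeAt(0).toString(2).padStart(8, '0'))
    .join(' ')
    .split('')
    .map(bit => (bit === '1' ? ZWSP : bit === '0' ? ZWNJ : ZWJ))
    .join(ZWNBSP)
);

// tagged text -> zero-width string -> binary -> username
const decode = taggedText => (
  taggedText
    .replace(/[^\u200B\u200C\u200D\uFEFF]/g, '') // drop the visible text
    .split(ZWNBSP)
    .map(ch => (ch === ZWSP ? '1' : ch === ZWNJ ? '0' : ' '))
    .join('')
    .split(' ')
    .map(bits => String.fromCharCode(parseInt(bits, 2)))
    .join('')
);

const tagged = 'Scrim' + encode('alice') + ' at 8pm';
console.log(decode(tagged)); // prints alice
```

The sketch does no validation: text without a fingerprint decodes to garbage rather than an empty string, so a real tool would want to check that the extracted string is non-empty first.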

Conclusion


Companies are paying more attention than ever to information leaks and to finding insiders. This is just one of many tricks they can use. Depending on your line of work, it may be vital to understand the risks of copying text. Very few applications display zero-width characters. For example, you might expect your terminal to try to display them (mine doesn't!).

As for the private announcement board, the plan worked as intended. Soon after the script was deployed, a new announcement was posted. Within a few hours, the text had been shared elsewhere with the invisible string attached. The culprit's username was successfully identified, and they were banned: happy ending!

Of course, this method has its caveats. For example, a user who knows about the script could, in theory, alter the zero-width characters to frame someone else. So it is better to insert a unique secret ID instead of a username.
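A minimal sketch of the secret-ID idea, assuming a Node.js server. The names `issueFingerprintId`, `lookupUser`, and `idToUser` are hypothetical, and the in-memory map stands in for whatever server-side storage a real site would use. The random ID would then be encoded in place of the username.

```javascript
const crypto = require('crypto');

// Hypothetical registry mapping secret IDs to usernames.
// Kept in memory here; a real site would persist it on the server.
const idToUser = new Map();

// Issue a random 16-hex-digit ID that reveals nothing about the user.
const issueFingerprintId = (username) => {
  const id = crypto.randomBytes(8).toString('hex');
  idToUser.set(id, username);
  return id;
};

// Resolve an ID recovered from leaked text back to a username.
const lookupUser = id => idToUser.get(id);

const id = issueFingerprintId('alice');
console.log(lookupUser(id)); // prints alice
```

Since the IDs are random and the mapping never leaves the server, a user who understands the encoding still cannot produce another user's valid ID except by guessing.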

To play with the script, run the demo or see the source code.

Source: https://habr.com/ru/post/352950/

