Don't want to read? See the demo.
Zero-width characters are non-printing control characters that most applications do not display. For example, I have inserted about ten zero-width spaces into this sentence; did you notice? (Hint: paste the sentence into Diff Checker to see where the characters are!) These characters can be used as a unique text fingerprint to identify users.
Of course, they can be here too. And you would never guess.
What for?
Well, the original reason is not very exciting. A few years ago, I was on a team that competed in various video games. The team had a private page for important announcements, among other things. Eventually, though, these announcements began to be reposted elsewhere, mocking the team and revealing confidential information and team tactics.
The site's security seemed solid, so we assumed an insider was at work: someone who logged in with a valid username and password, then simply copied the announcement and posted it elsewhere. So I wrote a script that invisibly embeds, in every announcement, the username of the user viewing it.
After Zach Aysan's recent post, it became clear that people are interested in the topic of non-printing characters. So I decided to publish this method here, along with an interactive demo for everyone. The code samples have been updated for modern JavaScript, but the overall logic is the same.
How?
The exact steps and logic are described below, but in short: the username string is converted to binary, and the binary string is then converted to a series of zero-width characters, one per bit. That invisible string is then quietly inserted into the text. If the text is later published on another site, the string of zero-width characters can be extracted and the process reversed to find out the username of whoever copied and pasted it!
Fingerprinting Text
1. Get the username of the logged-in user and convert it to binary
Here we simply convert each character of the username to its binary equivalent.
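// Left-pad a binary string with zeros to a width of 8 bits.
const zeroPad = num => '00000000'.slice(String(num).length) + num;

// Convert each character to its 8-bit binary code, separated by spaces.
const textToBinary = username => (
  username.split('').map(char =>
    zeroPad(char.charCodeAt(0).toString(2))).join(' ')
);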
2. Take the binary username and convert it to zero-width characters
The following script iterates over the binary string and converts each 1 to a zero-width space and each 0 to a zero-width non-joiner. After each letter has been converted, a zero-width joiner is inserted before moving on to the next letter.
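const binaryToZeroWidth = binary => (
  binary.split('').map((binaryNum) => {
    const num = parseInt(binaryNum, 10);
    if (num === 1) {
      return '\u200B'; // zero-width space
    } else if (num === 0) {
      return '\u200C'; // zero-width non-joiner
    }
    return '\u200D'; // zero-width joiner (letter delimiter)
  }).join('')
);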
3. Insert the zero-width "username" into the confidential text
Here we simply insert the block of zero-width characters into the confidential text.
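The insertion step can be sketched roughly as follows. The helper name `embedFingerprint` and the choice to place the fingerprint right after the first word are assumptions made for illustration; the article does not say where in the text the characters were placed.

```javascript
// Hypothetical helper: place an invisible fingerprint inside visible text.
// Inserting after the first word (rather than at the very start or end)
// is an arbitrary choice made for this sketch.
const embedFingerprint = (text, fingerprint) => {
  const firstSpace = text.indexOf(' ');
  // Fall back to appending when the text contains no space.
  const position = firstSpace === -1 ? text.length : firstSpace;
  return text.slice(0, position) + fingerprint + text.slice(position);
};

// Example fingerprint: zero-width space, zero-width non-joiner, zero-width space.
const sampleFingerprint = '\u200B\u200C\u200B';
const tagged = embedFingerprint('Scrim tonight at 8pm', sampleFingerprint);

console.log(tagged);        // looks identical to the original text in most applications
console.log(tagged.length); // 23: 20 visible characters plus 3 invisible ones
```

Any position works for decoding, since extraction keeps only the zero-width characters and ignores where they sat.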
Extract username from tagged text
The same steps in reverse order.
1. Extract the zero-width "username" from the confidential text
Remove the visible confidential text from the string, leaving only the zero-width characters.
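One possible way to do this is a regular expression that keeps only the three characters used for encoding. The helper name `extractZeroWidth` is hypothetical, and the sketch assumes the confidential text does not itself contain legitimate zero-width characters (for example, inside emoji sequences), which would otherwise be mixed into the result.

```javascript
// Hypothetical helper: keep only the three zero-width characters used for
// encoding (U+200B, U+200C, U+200D) and discard everything else.
const extractZeroWidth = text => (text.match(/[\u200B\u200C\u200D]/g) || []).join('');

const leaked = 'Scrim\u200B\u200C\u200B tonight at 8pm';
const hidden = extractZeroWidth(leaked);

console.log(hidden.length); // 3
```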
2. Convert the zero-width "username" back to binary
Here we split the string into its individual zero-width characters, including the inter-letter delimiters added during encoding. Then we walk over the characters and return a 1 or a 0 for each one to rebuild the binary string. If a character matches neither, we have hit the letter delimiter (the zero-width joiner), which marks the end of that letter's bits: add a single space to the string and move on to the next letter.
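const zeroWidthToBinary = string => (
  string.split('').map((char) => {
    if (char === '\u200B') { // zero-width space
      return '1';
    } else if (char === '\u200C') { // zero-width non-joiner
      return '0';
    }
    return ' '; // letter delimiter: add a single space
  }).join('')
);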
3. Convert the username from binary back to text
Finally, we parse the binary string and convert each series of 1s and 0s into the corresponding character.
const binaryToText = string => (
  string.split(' ').map(num => String.fromCharCode(parseInt(num, 2))).join('')
);
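To see the whole round trip in one place, here is a minimal self-contained sketch. It follows the mapping described in the steps above (1 as zero-width space, 0 as zero-width non-joiner, zero-width joiner between letters); the username `alice`, the sample announcement, and the choice to append the fingerprint at the end are all assumptions for the example.

```javascript
const ZERO_WIDTH_SPACE = '\u200B';      // encodes bit 1
const ZERO_WIDTH_NON_JOINER = '\u200C'; // encodes bit 0
const ZERO_WIDTH_JOINER = '\u200D';     // separates letters

const zeroPad = num => '00000000'.slice(String(num).length) + num;
const textToBinary = text => (
  text.split('').map(char => zeroPad(char.charCodeAt(0).toString(2))).join(' ')
);
const binaryToZeroWidth = binary => (
  binary.split('').map((bit) => {
    if (bit === '1') return ZERO_WIDTH_SPACE;
    if (bit === '0') return ZERO_WIDTH_NON_JOINER;
    return ZERO_WIDTH_JOINER;
  }).join('')
);
const zeroWidthToBinary = string => (
  string.split('').map((char) => {
    if (char === ZERO_WIDTH_SPACE) return '1';
    if (char === ZERO_WIDTH_NON_JOINER) return '0';
    return ' ';
  }).join('')
);
const binaryToText = binary => (
  binary.split(' ').map(num => String.fromCharCode(parseInt(num, 2))).join('')
);

// Tag: encode the (hypothetical) username and append it to the announcement.
const announcement = 'Scrim tonight at 8pm';
const tagged = announcement + binaryToZeroWidth(textToBinary('alice'));

// Recover: strip the visible text, then decode what remains.
const hidden = tagged.replace(/[^\u200B\u200C\u200D]/g, '');
const recovered = binaryToText(zeroWidthToBinary(hidden));

console.log(recovered); // alice
```

Note that the 8-bit padding means this sketch is only reliable for usernames whose characters have code points below 256.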
Conclusion
Companies are paying more attention than ever to information leaks and to finding insiders. This is just one of many tricks that can be used.
Depending on your line of work, it may be vital to understand the risks associated with copying text. Very few applications display non-printing characters. For example, you might assume that your terminal would try to display them (mine doesn't!).
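If you want to check text before pasting it somewhere, a small detector along these lines may help. The helper names and the list of characters are assumptions for this sketch; the list is not exhaustive, and stripping zero-width joiners can break legitimate uses such as emoji sequences or scripts that rely on them.

```javascript
// Characters that normally render with zero width. This list is an
// assumption for the sketch and is not exhaustive.
const ZERO_WIDTH_PATTERN = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Report where each suspicious character sits and which code point it is.
const findZeroWidth = text => (
  [...text.matchAll(ZERO_WIDTH_PATTERN)].map(match => ({
    index: match.index,
    codePoint: 'U+' + match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
  }))
);

// Remove them before pasting the text anywhere else.
const stripZeroWidth = text => text.replace(ZERO_WIDTH_PATTERN, '');

const copied = 'Top\u200B secret\u200D plan';
console.log(findZeroWidth(copied));
console.log(stripZeroWidth(copied)); // Top secret plan
```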
Going back to the private announcement board: the plan worked as intended. Soon after the script was introduced, a new announcement was posted. Within a few hours, the text had been reposted elsewhere with the invisible string attached. The culprit's username was successfully identified, and they were banned: happy ending!
Of course, this method comes with caveats. For example, if a user knows about the script, they could in theory replace the zero-width characters in order to frame someone else. So it is better to insert a unique secret ID instead of a username.
To play with the script, try the demo or see the source code.
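One way the secret-ID idea might look is sketched below, assuming a Node.js server. The helper names, the in-memory `Map`, and the 8-hex-digit ID length are all choices made for illustration, not part of the original script. The ID, rather than the username, is what would then be encoded into zero-width characters.

```javascript
const crypto = require('crypto');

// Server-side lookup table from opaque ID to username. In this sketch it is an
// in-memory Map; a real deployment would presumably persist it somewhere.
const idToUser = new Map();

// Hypothetical helper: issue a random 8-hex-digit ID for a user and remember it.
const issueFingerprintId = (username) => {
  const id = crypto.randomBytes(4).toString('hex');
  idToUser.set(id, username);
  return id;
};

// Hypothetical helper: resolve an ID recovered from leaked text.
const lookupUser = id => idToUser.get(id) || null;

const id = issueFingerprintId('alice');
console.log(id.length);      // 8
console.log(lookupUser(id)); // alice
```

Because a user never learns anyone else's ID, swapping in a different valid fingerprint becomes much harder than typing another person's username.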