I. What is the problem
Many computer technologies that were first developed in the world of analytic languages run into additional difficulties when they are transferred to a community that speaks a synthetic language.
For example, morphology-aware search in English and in Russian texts requires different levels of complexity. The branching of Russian inflection has long been the subject of popular jokes about the torment of foreigners who study Russian grammar with all its rules and exceptions.
One example of how technology stumbles over the difference between languages can be seen in English and Russian blogs and social networks. As long as tags are kept in a separate block (as on Habrahabr or LJ), there are no problems: in both languages the initial forms of words are used, sometimes the plural (and even that, in English, is a remnant of its former synthetic structure). But as soon as tags get into the text itself, the difference sharpens. Sometimes it even seems that Twitter hashtags, for example, are becoming a powerful factor in making Russian more analytic. Every now and then you come across phrases like:
We #husband in a restaurant.
Starting tomorrow, in #Moscow.
Returned from #sea.

The result is a very strange feeling, a kind of linguistic dizziness and a split.
People who do not want to put up with such glaring unnaturalness solve the problem in different ways.

Some put the tags at the end of the tweet, as if in a separate block.

Returned. #sea #vacation

Some move the tags to the front, as if additionally isolating the topic of the sentence that follows.

#Moscow. From tomorrow.

Some turn every word form into a tag, although this significantly reduces the potential of tag search, since morphology support has not yet been bolted onto it.

My husband and I are in a restaurant.

Some attach the endings with a hyphen, a double or single quote, or even a space. Such an insertion stops the automatic conversion of a word into a tag at exactly that character. But this solution makes the text harder to read, even though such protruding punctuation "offal" is still easier to perceive than a sudden lack of declension.

We #husband-eat in the restaurant.
We are #husband in a restaurant.
My husband and I eat in a restaurant.

This solution is good only for words that have the so-called zero ending in the initial form, that is, words whose initial form coincides with the stem to which the endings of the other cases are added. For other words, such a construction makes no sense, because nobody will search by a tag that is not a full-fledged word. Compare:

Streets #Kiev-a. Evening #Kiev th. We are in Kiev.
Streets #Moscow s. Evening #Moscow-oh. We are in Moscow.

Still, there are enough words whose stem coincides with the initial form to make it worth finding a readable implementation of such tagging.
II. Two possible solutions and their features
In fact, character sets contain two suitable characters that both interrupt tagging and leave no visible gap between the base tag and the ending: the soft hyphen (abbreviated "shy") and the zero-width space (abbreviated "zwsp").
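To see why an invisible separator helps, here is a minimal sketch in JavaScript. The tag pattern is an assumption made for illustration (a "#" followed by letters, digits, or underscores); it is not the actual parser of Twitter or Facebook, and the helper names makeHybridTag and extractTags are hypothetical.

```javascript
// Hypothetical helpers; the tag pattern is a simplified assumption,
// not the real parser used by Twitter or Facebook.
const SHY = '\u00ad';   // soft hyphen
const ZWSP = '\u200b';  // zero-width space

// Join a searchable base form and a case ending with an invisible separator.
function makeHybridTag(base, ending, separator = SHY) {
  return '#' + base + separator + ending;
}

// Simplified tagger: '#' followed by letters, digits, or underscores.
function extractTags(text) {
  return text.match(/#[\p{L}\p{N}_]+/gu) || [];
}

// "Улицы Киева" (genitive): the reader sees the full word form,
// while the tagger stops at the invisible separator.
const tweet = 'Улицы ' + makeHybridTag('Киев', 'а') + '.';
console.log(extractTags(tweet)); // [ '#Киев' ]
```

Under this assumption the tag that is found is the base form, which is what people actually search for, while the visible text keeps its case ending.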
Their use allows you to create, for example, such hashtags on Twitter:

Or such hashtags on Facebook (the previous variant will not work there, because Facebook filters out repeated hashtags and tags only the first occurrence):

The two characters have a number of similarities.
1. In most cases, they are invisible inside a word.
2. They mark the place where part of a word may be moved to the next line when the whole word no longer fits.
3. They do not let parts of the word be pulled apart when the text is justified (at least in the latest versions of the major browsers).
4. You can detect both characters in a word in the following ways:
a. when you move the caret from character to character with the arrow keys, the caret "slips" once at the position of the invisible character, as if it did not respond to the key press;
b. when you add spaces before the word, sooner or later, on reaching the end of the line, the word "breaks" into two parts at the position of the inserted character;
c. some text editors (often those aimed at programmers) display invisible characters. For example, the zero-width space is visible in the well-known jsfiddle.net service, while the soft hyphen is not displayed there. You can also observe how text with such hidden characters behaves by moving the frame borders in jsfiddle.net/k37ssezj (in the first block a zero-width space is inserted after each "word", in the second a soft hyphen).
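Besides the manual checks listed above, the characters can also be located with a few lines of script. This is only a sketch; the names findInvisible and reveal are hypothetical, and only the two characters discussed here are handled.

```javascript
// Report where soft hyphens and zero-width spaces hide in a string.
const INVISIBLE_NAMES = { '\u00ad': 'shy', '\u200b': 'zwsp' };

function findInvisible(text) {
  const found = [];
  for (let i = 0; i < text.length; i++) {
    const name = INVISIBLE_NAMES[text[i]];
    if (name) found.push({ index: i, name: name });
  }
  return found;
}

// Replace the hidden characters with visible markers for debugging.
function reveal(text) {
  return text.replace(/[\u00ad\u200b]/g, function (ch) {
    return '[' + INVISIBLE_NAMES[ch] + ']';
  });
}

console.log(findInvisible('ab\u00adcd\u200bef'));
console.log(reveal('ab\u00adcd\u200bef')); // ab[shy]cd[zwsp]ef
```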
But there are also differences between the two characters, and all of them are in favor of the soft hyphen:
1. The soft hyphen is a more traditional and older character in the computer world: it sits in the initial part of Unicode, is present in more fonts, and is easier to type from the keyboard. The zero-width space is located much later in the Unicode table, may be present in fewer fonts, and is harder to enter from the keyboard.
2. When part of a word is moved to a new line, the hyphen is more logical and marks the unity of the word more explicitly.
Some of the differences depend on the application.
1. The characters may affect in-page search differently (the soft hyphen does not interfere with search in Chrome and Firefox, but does in IE; the zero-width space does not interfere with search in Chrome, but does in Firefox and IE).
2. When you double-click a word containing an invisible character, sometimes only the part of the word under the cursor is selected and sometimes the whole word (with a soft hyphen, the selection is split only in hybrid Twitter tags in Chrome; with a zero-width space, it is split in all words in Chrome and IE).
3. Keep in mind an IE11 bug: in rich input fields (the so-called Rich Editor, which shows formatting in real time in the style of WYSIWYG editors; such fields are created with the element.contentEditable and document.designMode properties), pasting from the clipboard sometimes does not work. In this case, you need to switch in the developer console from Edge mode to a compatibility mode with a lower browser version (starting with IE10). For example, this problem shows up when you try to paste something into the text of a note (Note) on Facebook.
Finally, sites may react differently to the insertion of these characters.
1. Facebook is more hostile to the zero-width space. It deletes the character almost immediately on entry, and certainly does not keep it when the post is published (to all appearances, this does not trouble anyone very much). The soft hyphen is preserved, including in Facebook hashtags, which are not yet popular, but sometimes for some reason you have to move the caret back and forth across the word to make the site "see" the character in the word; otherwise it may not be saved either (with manual input this problem appears less often than when a script inserts the character into the text programmatically; more on that later).
2. When you try to insert a character on Twitter with a script, keep in mind this annoyingly old bug in Firefox. You have to either use other insertion methods or set the security.csp.enable key in about:config to false, which would probably be too radical a way to solve typographical problems.
III. Ways of implementation
1. Manual input. If you rarely type rare characters from the keyboard, you may find it useful to read this small article. It describes two methods of entering characters: by decimal code (this works reliably only for the initial block of Unicode and some characters of common encodings) and by hexadecimal code. To simplify the registry edit mentioned in the article, you can save the following text to a file with the .reg extension in Unicode encoding, open it, and agree to add the data to the registry.
EnableHexNumpad.reg
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Control Panel\Input Method]
"EnableHexNumpad"="1"
So, for manual input we need to know the code of each character in two numeral systems:
Soft hyphen: 0173 and 00ad (respectively, Alt + '0173' and Alt + '+00ad').
Zero-width space: 8203 and 200b (respectively, Alt + '8203' and Alt + '+200b').
(By the way, it is interesting that the zero-width space is not included in the JavaScript whitespace class '\s': the enumeration of one of the groups of spaces is interrupted just before it.)
What to keep in mind:
a. Decimal input of characters with large codes works very rarely. Sometimes nothing is inserted, sometimes something completely unexpected is inserted.
b. When you enter hexadecimal codes that contain letters, application keyboard shortcuts often fire and the character input is interrupted. Sometimes this depends on the current keyboard layout, sometimes not.
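The codes listed above and the remark about '\s' can be checked in any JavaScript console. The snippet below is a small sketch; the variable names are chosen for illustration only.

```javascript
const shy = '\u00ad';
const zwsp = '\u200b';

// Decimal and hexadecimal codes, as used for Alt-code input.
console.log(shy.codePointAt(0), shy.codePointAt(0).toString(16));   // 173 ad
console.log(zwsp.codePointAt(0), zwsp.codePointAt(0).toString(16)); // 8203 200b

// \s covers U+2000..U+200A but stops right before U+200B.
console.log(/\s/.test('\u200a')); // true (hair space)
console.log(/\s/.test(zwsp));     // false
console.log(/\s/.test(shy));      // false

// One consequence: trim() does not remove a zero-width space.
console.log(('word' + zwsp).trim().length); // 5
```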
2. Pasting from the clipboard. You can save these two characters somewhere in order to copy them from there and paste them with Ctrl+C/Ctrl+V, although storing and copying invisible characters is a little harder than with visible ones: you have to either save each one in a separate file, or know where they are and select them with the keyboard (
Shift+
).
In Windows, you can use the familiar utility by selecting the poorest font, setting the options you want, entering the hexadecimal number and pressing the buttons:

3. Input using a script (bookmarklet). A little further on, two programs are presented that are identical in everything except the character code and the name of its variable. They check three possible situations in sequence:
a. The input focus is in a simple single-line or multi-line text field. The usual properties and methods for such cases are used.
b. The input focus is in a field with advanced features of the Rich Editor type (both variants are taken into account: an element of the current window/document, and an editor placed in a separate iframe with its own window/document). The document.execCommand() method is used.
c. The focus is not in a text field, or the required method of working with a Rich Editor is not implemented in the browser (in IE11 the insertText command is not supported, but it will be implemented in Edge). In this case, a window pops up with a text field in which the required character is already selected (although it remains invisible). All that is left is to press Ctrl+C, close the window (you can press Enter or Esc), then place the cursor at the right position in the text and press Ctrl+V. You can consider this option a convenient implementation of the previous method of copying and pasting from a file or utility.
Soft hyphen:
javascript: (function(d, e, shy, s1, s2, v, sy, sx) { if (e.type == 'textarea' || e.type == 'text') { s1 = e.selectionStart; s2 = e.selectionEnd; sy = e.scrollTop; sx = e.scrollLeft; v = e.value; e.value = v.substring(0, s1) + shy + v.substring(s2); e.selectionStart = e.selectionEnd = ++s1; e.scrollTop = sy; e.scrollLeft = sx; e.focus(); } else if ((e.isContentEditable || d.designMode == 'on') && d.queryCommandSupported('insertText') || (d = e.contentDocument) && (d.activeElement.isContentEditable || d.designMode == 'on') && d.queryCommandSupported('insertText')) { d.execCommand('insertText', false, shy); } else { prompt('Copy and paste in the text:', shy); } })(document, document.activeElement, '\u00ad')
Zero-width space:
javascript: (function(d, e, zwsp, s1, s2, v, sy, sx) { if (e.type == 'textarea' || e.type == 'text') { s1 = e.selectionStart; s2 = e.selectionEnd; sy = e.scrollTop; sx = e.scrollLeft; v = e.value; e.value = v.substring(0, s1) + zwsp + v.substring(s2); e.selectionStart = e.selectionEnd = ++s1; e.scrollTop = sy; e.scrollLeft = sx; e.focus(); } else if ((e.isContentEditable || d.designMode == 'on') && d.queryCommandSupported('insertText') || (d = e.contentDocument) && (d.activeElement.isContentEditable || d.designMode == 'on') && d.queryCommandSupported('insertText')) { d.execCommand('insertText', false, zwsp); } else { prompt('Copy and paste in the text:', zwsp); } })(document, document.activeElement, '\u200b')
In Chrome or Firefox, you can create a new bookmark for an arbitrary page and then paste the code (from the very first letter of javascript: to the last bracket after '\u00ad' or '\u200b', inclusive) into the address field, changing the name of the bookmark as needed.
In Chrome, you can simply select the code and drag the selected text straight onto the bookmarks bar, then change the name to something more readable.
In IE, both of these methods are impossible (but the method described here, creating a file in the Favorites folder, is possible).
Finally, in all browsers you can drag the links from this page to the bookmarks (in IE11 you will have to confirm saving the bookmarklet).
IV. Features of application in different browsers and on different sites
Here is a small table with the results of testing the latest versions of three browsers on Windows 7 SP1. A minus or a plus shows the current final result, which depends on the sum of circumstances and may change as browsers or websites develop. More general circumstances are given in the notes to the row and column headers, more specific ones in the notes to the cells. A Facebook post and a Facebook note are treated as separate cases because the Rich Editor is implemented differently in them: in the first case as an element of the page, in the second as a whole embedded frame with its own window/document.

(full-size picture)
Notes to the table:
1. Hashtags in Facebook notes do not work (for now?), whether with invisible characters or without them.
2. When the decimal code of the zero-width space is entered, the symbol ♂ appears in all browsers instead of the expected character (its code is 9794 in decimal and 2642 in hexadecimal).
3. On Facebook, the zero-width space is removed by the site immediately after insertion or after the post is saved.
4. Instead of inserting a soft hyphen, keyboard shortcuts are triggered.
5. In this case, you can prevent the keyboard shortcut from firing in Firefox by additionally pressing the Win key.
6. The problem depends on the keyboard layout: either a keyboard shortcut is triggered, or nothing happens at all.
7. On Facebook, soft hyphens inserted programmatically are not always "picked up" by the page immediately and so are not always saved when the text is published; sometimes you need to move the cursor inside the word with the arrow keys. Perhaps there are other ways to "refresh" the insertion. Sometimes the insertion is not recognized when the post is first created, but is recognized when an already saved post is edited.
8. In IE11, in fields of the Rich Editor type, only the fallback bookmarklet method works: copying and pasting the character through the clipboard.
(P.S. A check in a virtual machine confirms that insertText is implemented in MS Edge version 11.00.10240.16397 of 07.22.2015 (according to the file version), also known as 20.10240.16384.0 (according to the information in the settings): document.queryCommandSupported('insertText') returns true.)
9. In IE11, the Facebook post field becomes a regular field, without Rich Editor features.

Following the visual results of the tests, the columns with the most versatile (cross-browser and cross-site) methods are highlighted in the table.
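A plausible explanation for note 2, offered here as an assumption rather than something the tests confirm: decimal Alt codes typed without a leading zero are commonly reduced modulo 256 and looked up in the OEM code page, where position 11 is drawn as ♂. The arithmetic can be sketched as follows (variable names are illustrative).

```javascript
// Assumption: Alt + decimal code without a leading zero is taken modulo 256
// and interpreted in the OEM code page, where position 11 is shown as ♂.
const zwspDecimal = 0x200b;              // 8203
const reducedCode = zwspDecimal % 256;   // 8203 = 32 * 256 + 11
const maleSign = '\u2642';               // ♂

console.log(reducedCode);                // 11
console.log(maleSign.codePointAt(0));    // 9794, the decimal code from note 2
```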
Thanks for your attention.