📜 ⬆️ ⬇️

Ghosts in Unicode

In 1978, the Ministry of Economy, Trade and Industry of Japan established the encoding, which would later be called JIS X 0208. It is still the basis of all Japanese encodings. But after the release of the JIS standard, people noticed something strange: some of the added characters had no obvious sources. No one could say what they mean or how to pronounce them. No one was sure where they came from. These characters are now known as ghosts ( 幽 霊 文字 ).


Be careful what you write. via NDL

For a long time, the ghost symbols remained inexplicable and almost forgotten by curiosity, but in 1997, an investigation into their origin began. Although for all characters in the JIS standard, the source must be specified, but even if such a record is present, it is not very specific: the document is usually just the document from which the character was received.

You might think that the name will facilitate the search for the origin of characters. But it is important to clarify that one of the most common “sources” for ghosts was the “Overview of National Administrative Regions” (国土 行政区 画 総 覧), a complete list of all Japanese toponyms. As I initially suggested, you can present a kind of atlas, a small book with several hundred pages. But no, the latest edition of the handbook is a seven-volume book, each of which has about nine hundred pages. Imagine searching for a single character without a link to a page.
')
Despite the difficulties, the investigation of ghost characters has made the search for the origin of characters a success - mostly. The researchers interviewed catalogers who participated in the creation of the standard, and found that some characters were accidentally invented as errors in the cataloging process. For example, 妛 is an error that occurred while trying to write “山 over 女”. This phrase is found in the name of a certain place and, thus, was suitable for inclusion in the JIS standard, but since it was still impossible to print the whole composite character, the catalogers printed 山 and 女 separately, cut them out and put them on a piece of paper. But when copying the junction of two small pieces of paper looked like a dash - and it was mistakenly added to the symbol. Valid character ( ) added to JIS and Unicode much later - and it still does not appear on most sites.


Basic ghost characters: 妛 挧 暃 椦 槞 蟐 袮 閠 駲 墸 壥 彁

As a result, for only one symbol, neither the exact source nor any historical precedent was found: it is a symbol. The most likely explanation is that it was created as a misreading of the 彊 symbol, but no specific incident could be found.

After the general adoption of the JIS standard, all these characters got into Unicode, which already had its own separate set of ghost characters since the unification of CJK.

To sum up: in 1978, as a result of a series of small errors, several characters appeared from nowhere. Errors went unnoticed long enough for the characters to take root in the standard, so now these ghosts, at least theoretically, have penetrated every computer on the planet, hiding in the dark corners of the tables of characters.

It is likely that they will remain with humanity forever. Ψ

Reference materials / links:

Source: https://habr.com/ru/post/418717/


All Articles