Localization of Unity games in Hindi

Publishing my game in the Play Market, I localized it into only two languages: English and Russian. Survey articles on the game also posted, respectively, only in Russian and English-speaking forums. Statistics of the first month allowed to determine the list of countries in which the game began to actively swing. Therefore, it was decided to add localization to a number of languages to consolidate the success. Among other interest in the game showed and India.

For help in localizing the game in Hindi, I turned to my old Indian friend, Dr. Rudra Narayanu Pandey, who kindly agreed to help me with this. I provided Rudra with a table of play phrases, he quickly translated everything into Hindi, and I, without any ulterior motive, just as quickly copied the translation into a JSON localization file. And, pleased, I sent Rudra bild to be checked. Oh, how naive I was then!

Rudra only condescendingly smiled at my naivety and said that the text contained a huge amount of inaccuracies that were not in his translation. According to Rudra's description, the text in the game looked as if, by analogy, in Russian, instead of “E” there would stand “YO”, and instead of “C” - “TS”. But for me, completely uninitiated in the subtleties of Hindi, everything looked the same: what is the translation in the MS Word document, what is the text in the game. But, having carefully examined the text of the text, I really noticed the difference: as if in some places the letters were rearranged in places, and in some places they looked different. Neither my friend nor I were privy to the mysteries of fonts, and they could only guess about the reasons for this behavior of the text.
')

Finding a solution

After some wandering across the vast expanses of the Internet, the reason was found out. The problem was that Unity does not support working with GSUB and GPOS font tables. The first, GSUB (Glyph Substitution Table), is responsible for replacing a sequence of characters with one ligature , and the second, GPOS (Glyph Positioning Table), is responsible for adjusting the relative position of characters.

If for the Russian text as an example of the work of GSUB, I immediately recall only replacing three points in a row by one ellipsis in MS Word, then the Hindi text is almost half made up of ligatures.

For example, this is the language self-name:

There are 2 ligatures in this word:

and

And this is how this word is displayed in Unity:

Unity simply draws the characters in the order in which they appear in the word, without changing them according to GSUB rules.

So, we found out the reason, but there was no unambiguous solution to the issue. Basically, on the forums, everyone recognized the fact that there was a problem and something had to be done with it. We managed to find some useful ideas on this topic.

Idea one

Taken here . Before displaying the text, programmatically change the position of the vowing mark

(and) and the letter preceding it.

Yes, this decision makes the text in Unity more similar to Hindi. But it affects only a small part of the rules of the GSUB, moreover, it does not allow to adjust the width of the “cap” of the sign depending on the width of the “covered” letter. One could write such replacements for all GSUB rules, but there is one more problem: Unity does not import font characters if they do not have Unicode numbers. The standard Hindi font in Windows is Mangal - all ligatures in it do not have Unicode numbers, so Unity does not see them. Consequently, using Mangal, programmatically make all replacements fail.

Before swapping:

After permutation:

It should be:

Second idea

From this discussion, starting with comment # 20. To translate text into Hindi, use a text editor written in Unity, and a font that includes all possible ligatures, but with Unicode numbers.

The guys have already created such an editor - you can download and try it. But, unfortunately, he was very uncomfortable at work. The lower part of the editor screen contains a huge virtual keyboard with all the ligatures from this font, on which the translator must manually find the necessary characters and insert them into the text with a mouse, which is extremely inconvenient and long, there are more than a thousand of them there!

Idea three

From the same discussion, comment # 38. Use the Chanakya font and text converter for this font.

This font contains the Hindi alphabet, as well as a small number (several dozen) of the most commonly used ligatures that have Unicode numbers, and, therefore, can be imported and displayed in Unity. The web converter converts the Unicode text into the encoding of the Chanakya font with the simultaneous replacement of sequences of characters forming ligatures with characters from the font with images of these ligatures.

Having tested this method, I was convinced of its efficiency.

The localization process in this way looks like this:

1) We translate to Hindi in any convenient Unicode editor, for example, MS Word.
2) Convert Unicode text from an MS Word document to the encoding of the Chanakya font using a web converter.
3) The resulting sequence of characters is stored in the localization file.
4) In Unity, this sequence is displayed in Chanakya, resulting in a visually beautiful and correct Hindi.

But not everything is as simple as we would like. This method turned out to be a large number of pitfalls, which had to be dealt with.

First , the huge disadvantage of this method is that the Chanakya font does not contain standard Latin characters, not to mention Cyrillic. Because of this, it is impossible to jointly display in Hindi text and text in any other language in one text field.

Secondly , the font does not contain all standard punctuation marks, and those that contain are not under their Unicode numbers. If there are any punctuation marks in the source text, they will be replaced by the converter with unwanted Hindi characters. Therefore, for the correct result, you need to remove all punctuation marks in the source text before converting, and add Unicode characters to the result, the places of which have the required punctuation marks in Chanakya.

For example, in the source text there are quotes. Before converting, we delete them, and in the resulting text we add the symbols and to the places in the opening and closing quotes, respectively, since instead of these characters in Chanakya there are quotes (single characters only).

Thirdly , since the Chanakya font contains a limited number of ligatures, some non-convertible sequences are possible - this is evident from the fact that on the converter page in the result field, the text contains Latin characters instead of Hindi characters. In this case, you should ask the translator to find synonyms for the words that have not been converted.

Illustration of the localization process:

Translation in Hindi:

Translation prepared for conversion - punctuation marks removed:

Unicode converted text:

Converted text with added quotes and a dot:

Converted text displayed in Unity using the Chanakya font:

So, through the dances with a tambourine I localized the game. In my game, the language switches "on the fly", so I wrote a wrapper on the text field, which, when switching to Hindi, changes the Unicode font to Chanakya and back, respectively, when switching to another language. Also, the wrapper increases the font size of the text field, because Chanakya characters are small.

When preparing screenshots for the Play Market and adding captions to them, it turned out that Photoshop, like Unity, also does not know how to work with GPOS and GSUB tables. This method helped here. To do this, it was necessary to install the Chanakya font into the system, convert the text of the signatures with a web converter, and in Photoshop display the resulting text with the Chanakya font.

My friend Rudra was very pleased with the result. He proudly told Facebook about our teamwork, recalling the long-forgotten slogan "Hindi Russi bhai bhai". The reward was a smooth but steady increase in the number of downloads and a large number of fives with comments about localization.

However, my inner perfectionist was not satisfied. The above disadvantages did not allow me to recognize this method as convenient for further use in my games, especially to recommend it to a wide audience of Unity developers.

Universal localization method for Unity games in Hindi.

It was then that another idea was born, namely, from idea number two, take the Siddhanta font containing the full set of ligatures with Unicode numbers, and write a text converter in C # that emulates the action of GSUB and GPOS.

The main obstacle in the implementation of this idea was that I did not have a complete list of sequences of characters displayed in the form of ligatures. On the Internet, I could not find such a list. I had to independently study the issue and collect information from disparate sources. As a result, if I did not start reading in Hindi, then, as they say in the classics, at least I began to see “blondes, brunettes, redheads” ...

I also wrote to the author of the Siddhanta font, Mikhail Boyarin, who allowed me to use my font and advised me on some issues. I express my gratitude to Michael for help. And the amount of work done by Michael in creating such a comprehensive font is amazing and respectful!

The result of my research was the HindiCorrector script, which contains the ligatures character string correspondence table, and the Siddhanta Unity font.

Download the project here: HindiCorrector .

The format of the GSUB table entries from the HindiCorrector script is as follows:

{source = "\ u091F \ u094D \ u091F", dest = "\ uF5A2 \ uF61F"}

The characters in the source fields are mostly simple letters, a sign “ chalant” and signs of a vowel , which the translator enters from the keyboard in a text editor. The characters in dest fields are finite ligatures, which, unlike the Mangal font, have Unicode numbers in the Siddhanta font, so they can be imported and displayed in Unity.

The script method Correct replaces in the source text all occurrences of source with dest.

In addition to replacing sequences of characters with ligatures, the script also solves the problem of adjusting the width of the “caps” of vowing marks.

(and) and

(ii) In the Siddhanta font there are lines of these characters with "caps" of different widths. The script selects the character with the desired width depending on the letter in front of this character.

Similarly, the rest of the vowel marks are modified if the letters are very wide, and the vowel marks do not coincide with the vertical line of the letters.

This localization method does not require any manual modification of the source text in Hindi. Everything makes a script.

If you need to "train" Hindi another font used in the game, simply copy the character ranges 0900-097F and F000-F633 from the Siddhanta Unity font to the desired font.

The font of Siddhanta Unity differs from the original font of Mikhail Boyarin in that all the ligatures that are not participating in the GSUB table of the HindiCorrector script are removed from the second one. This has significantly reduced the size of the final font, which is extremely important for mobile development. The range of characters F000-F01B has also been added, in which some existing characters with corrected positions are placed.

My GSUB table does not claim to be complete, but it is definitely more than what is implemented in the Chanakya font from the third idea. If you need to add a new ligature that is not in my GSUB table, then you need to perform the following steps:

1) find the required ligature in the original Siddhanta font and copy it under your Unicode number into the Siddhanta Unity font;
2) add to the GSUB table of the HindiCorrector script an entry about replacing a sequence of characters with this ligature.

There is another important point: Unity does not know how to recognize the Hindi language installed on the phone in the standard way - the Application.systemLanguage method returns the Unknown value, which makes all efforts to localize half meaningless, because not all players will see the options for choosing a language in the settings. The solution described here came to the rescue - request the system language directly from the JAVA environment.

In conclusion, I thank my friend Dr. Rudra Narayan Panday for the fruitful collaboration, interesting communication and new knowledge. "Hindi Russi bhai bhai"!

Source: https://habr.com/ru/post/323594/

All Articles