
You have just delivered a new site, and everyone is delighted. The design is fresh, the code is flawless, and you are ready to launch. Then someone asks: “Does it work in Japanese?”
You break out in a cold sweat: you have no idea. The site works in English, and you planned to handle other languages later. Now you have to rewrite the whole engine to support them. The launch date slips, and you spend the next two months fixing bugs, only to discover that you missed a good half of them.
Localization prepares your engine to work in any language, and it is much easier if you plan for it from the very beginning.
The localization company Alconost has translated a dozen simple rules that will let you launch safely anywhere in the world.
1. Move all strings to resources
The first step in localization is to move every user-visible string out of the code and into resource files. This includes headings, product names, error messages, image captions, and any other text the user might see.
In most resource files, each string is given a name, which lets you supply a different translation for it in each language. Many programming languages use configuration files like this:
name = Username
Or .pot files like this:
msgid "Username"
msgstr "Nom d'utilisateur"
Or XLIFF files like this:
<trans-unit id="1">
<source xml:lang="en">Username</source>
<target xml:lang="fr">Nom d'utilisateur</target>
</trans-unit>
A library then loads these files and uses the combination of language and country (the locale) to pick the correct string.
After all strings have been moved to external resource files, you can send them to translators and receive translations from them in separate files for each locale supported by your application.
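The lookup that such a library performs can be sketched in a few lines of JavaScript. This is only an illustration, not any particular library's API: the `resources` table, the `getString` helper, and the fallback order (full locale, then bare language, then a default locale) are all assumptions made for the example.

```javascript
// Hypothetical in-memory resource tables, one per locale.
// A real application would load these from the translated resource files.
const resources = {
  en: { name: 'Username', color: 'Color' },
  'en-GB': { color: 'Colour' }, // regional override for a single string
  fr: { name: "Nom d'utilisateur", color: 'Couleur' }
};

const DEFAULT_LOCALE = 'en'; // assumed fallback locale

// Look a string up by name, trying the full locale first ("en-GB"),
// then the bare language ("en"), then the default locale.
function getString(locale, name) {
  const language = locale.split('-')[0];
  for (const candidate of [locale, language, DEFAULT_LOCALE]) {
    const table = resources[candidate];
    if (table && Object.prototype.hasOwnProperty.call(table, name)) {
      return table[name];
    }
  }
  return name; // last resort: show the key so the gap is easy to spot
}

console.log(getString('en-GB', 'color')); // Colour
console.log(getString('en-GB', 'name'));  // Username
console.log(getString('fr-CA', 'name'));  // Nom d'utilisateur
```

Returning the key when nothing matches is a design choice for the sketch: a missing translation then shows up on screen instead of failing silently.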
2. Never concatenate strings.
Appending one string to another almost always causes a localization bug. This is easy to see with a modifier such as color.
Suppose your stationery store sells items such as pencils, pens, and sheets of paper. Shoppers choose an item and then its color. In the shopping cart you show them entries such as “red pencil” or “blue pen” using a function like this:
function getDescription() {
  var color = getColor();
  var item = getItem();
  return color + " " + item;
}
This code works well for English, where the color comes first (“red pencil”), but it fails for French, where “red pencil” is “crayon rouge” and “blue pen” is “stylo à encre bleue”. In French, adjectives follow the nouns they describe. The getDescription function can never support such languages with simple string concatenation.
The solution is to use parameterized strings that let each language set the order of the item and the color. Define a resource string like this:
itemDescription = {0} {1}
It may not look like much, but this string is what makes the translation possible. We can use it in the new getDescription function like this:
function getDescription() {
  var color = getColor();
  var item = getItem();
  return getLocalizedString('itemDescription', color, item);
}
Now your translators can easily change the word order, for example:
itemDescription = {1} {0}
The getLocalizedString function takes the name of the resource string (itemDescription) plus additional parameters (the color and the item) and substitutes them into the resource string. Most programming languages provide a function similar to getLocalizedString. (The one notable exception is JavaScript, but more on that later.)
The same technique works for strings that embed a value in a sentence:
invalidUser = The username {0} is already taken. Please choose another one.
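Since JavaScript has no such function built in, here is a minimal sketch of what getLocalizedString might look like. The `strings` table, the `currentLocale` variable, and the fallback to English are assumptions for illustration; the article does not prescribe an implementation.

```javascript
// Hypothetical resource strings for two locales.
const strings = {
  en: {
    itemDescription: '{0} {1}',
    invalidUser: 'The username {0} is already taken. Please choose another one.'
  },
  fr: { itemDescription: '{1} {0}' }
};

let currentLocale = 'en'; // assumed to be set once per request or page

// Look up the named resource string and replace each {n} placeholder
// with the n-th extra argument. Unknown placeholders are left untouched.
function getLocalizedString(name, ...args) {
  const table = strings[currentLocale] || strings.en;
  const template = table[name] !== undefined ? table[name] : strings.en[name];
  if (template === undefined) {
    return name; // no such resource: return the key itself
  }
  return template.replace(/\{(\d+)\}/g, (match, index) => {
    const position = Number(index);
    return position < args.length ? String(args[position]) : match;
  });
}

console.log(getLocalizedString('itemDescription', 'red', 'pencil')); // red pencil

currentLocale = 'fr';
console.log(getLocalizedString('itemDescription', 'rouge', 'crayon')); // crayon rouge
```

The calling code is identical for both locales; only the resource string decides the word order.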
3. Put all punctuation in resource strings.
It is tempting to leave punctuation out of a string so that you can reuse it, say, in a field label where a colon follows and in a prompt where it does not. But this is just another form of string concatenation.
Here, for example, is a simple login form written in PHP for WordPress:
<form>
  <p>Username: <input type="text" name="username"></p>
  <p>Password: <input type="text" name="password"></p>
</form>
We want the form to work in other languages, so let's pull the strings out for localization. WordPress makes this easy with the __ function (that is, two underscores in a row):
<form>
  <p><?php echo(__('Username', 'my-plugin')) ?>: <input type="text" name="username"></p>
  <p><?php echo(__('Password', 'my-plugin')) ?>: <input type="text" name="password"></p>
</form>
See the bug? This is string concatenation again. The colon after the label is not localized. The error shows up in languages such as French, where a colon must always be set off with a space on both sides. Punctuation is part of the string and belongs in the resource file.
<form>
  <p><?php echo(__('Username:', 'my-plugin')) ?> <input type="text" name="username"></p>
  <p><?php echo(__('Password:', 'my-plugin')) ?> <input type="text" name="password"></p>
</form>
Now the form can use “Username:” for English and “Nom d'utilisateur :” for French.
4. A first name is not always first.
My name is Zack Grossbart. Zack is my first name and Grossbart is my last name. Everyone in my family is a Grossbart, but I am the only Zack.
In English-speaking countries the first name comes first and the last name second. Most Asian countries do the opposite, and some cultures use only a single name.
The cellist Yo-Yo Ma is a member of the Ma family. In Chinese, he writes his family name first: Ma Yo-Yo (馬友友).
It gets even more complicated, because many people who move from Asian countries to English-speaking ones reverse the order of their names to match local custom. In short, you cannot make any assumptions.
You must provide a way to customize how names are displayed; you cannot assume that the first name always comes first or that the last name always comes last.
WordPress handles this quite well by asking how you want your name displayed (the profile has fields for first name, last name, nickname, and the display name shown publicly).
It would be even better if WordPress also supported middle names and let you define a format per locale, so that you could show your name one way in English and another way in Chinese. But nothing is perfect.
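A per-locale name format could look something like the sketch below. The `namePatterns` table, the `formatName` helper, and the field names (`given`, `family`, `displayName`) are hypothetical and chosen only for illustration; this is not how WordPress stores or renders names.

```javascript
// Hypothetical display patterns per locale. A pattern only sets the default;
// an explicit displayName chosen by the user always wins.
const namePatterns = {
  'en-US': '{given} {family}',
  'zh-TW': '{family}{given}'
};

const DEFAULT_PATTERN = '{given} {family}'; // assumption for unlisted locales

function formatName(person, locale) {
  if (person.displayName) {
    return person.displayName; // the user's own preference
  }
  const pattern = namePatterns[locale] || DEFAULT_PATTERN;
  return pattern
    .replace('{given}', () => person.given || '')
    .replace('{family}', () => person.family || '')
    .trim(); // tidy up when someone has only one name
}

console.log(formatName({ given: 'Yo-Yo', family: 'Ma' }, 'en-US')); // Yo-Yo Ma
console.log(formatName({ given: '友友', family: '馬' }, 'zh-TW'));   // 馬友友
console.log(formatName({ given: 'Zack', family: 'Grossbart', displayName: 'zgrossbart' }, 'en-US')); // zgrossbart
```

The replacement values are passed as functions so that a name containing a `$` character is inserted literally.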
5. Never hard-code date, time, or currency formats
The world has never agreed on how to display dates and times. Some people put the month first (6/21/2012), others the day (21/6/2012). Some use 24-hour time (14:00), others 12-hour time (2:00 PM). Taiwan uses translated equivalents of AM and PM and puts them first (上午 2:00).
Your best option is to store all dates and times in a standard format, such as ISO 8601 or UNIX time, and to use a library such as Date.js or Moment.js to format them for a specific locale. These libraries also handle time zones, so you can store all dates and times on the server in one common form (such as UTC) and convert them to each user's time zone in the browser.
Dates and times get just as tricky when you display calendars and date pickers. In the US the week starts on Sunday, in the UK on Monday, and in the Maldives on Friday.
The jQuery UI date picker contains over 50 localized files to support different calendar formats around the world.
The same is true for currencies and other number formats. Some countries use a comma as the separator in numbers, others a period. Always use a library with localized files for each locale you need to support.
This topic is covered well in the discussion of best practices for daylight saving time and time zones on Stack Overflow.
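As a rough illustration of storing one standard value and formatting it per locale, the sketch below uses the Intl API that is built into current JavaScript engines rather than the libraries named above. The helper names are invented for the example, and the exact output depends on the locale data shipped with the runtime.

```javascript
// Store the instant in a standard form: an ISO 8601 string in UTC.
const stored = '2012-06-21T14:00:00Z';
const instant = new Date(stored);

// Format the date part for a given locale (kept in UTC for a stable example).
function formatDate(date, locale) {
  return new Intl.DateTimeFormat(locale, { timeZone: 'UTC' }).format(date);
}

// Format the time of day; the locale decides 12-hour or 24-hour display.
function formatTime(date, locale) {
  return new Intl.DateTimeFormat(locale, {
    hour: 'numeric',
    minute: '2-digit',
    timeZone: 'UTC'
  }).format(date);
}

// Format a plain number; the locale decides the separators.
function formatAmount(value, locale) {
  return new Intl.NumberFormat(locale).format(value);
}

console.log(formatDate(instant, 'en-US'));      // 6/21/2012
console.log(formatDate(instant, 'en-GB'));      // day first
console.log(formatTime(instant, 'en-US'));      // 12-hour clock
console.log(formatTime(instant, 'zh-TW'));      // translated AM/PM marker first
console.log(formatAmount(1234567.89, 'en-US')); // 1,234,567.89
console.log(formatAmount(1234567.89, 'de-DE')); // 1.234.567,89
```

Nothing in the calling code knows what a date looks like; the format comes entirely from the locale passed in.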
6. Almost always use UTF-8
The history of computer character encodings is long, but the most important thing to remember is that UTF-8 is the right choice 99% of the time. The only case where UTF-8 does not fit is when you work mainly with Asian languages and cannot do without UTF-16.
Encoding mismatches are a common problem in web applications. If the browser and the server use different encodings, characters come out garbled and the application fills up with little squares and question marks.
Many programming languages store files in the system's default encoding. But it does not matter that your server runs in English if all your users view the site in Chinese. UTF-8 solves this by standardizing the encoding on both the browser and the server.
Set UTF-8 at the beginning of all your HTML pages:
<head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
And specify UTF-8 in the HTTP Content-Type header:
Content-Type: text/html; charset=utf-8
The JSON specification requires that all JSON documents use Unicode, with UTF-8 as the default, so make sure you use UTF-8 whenever you read or write data.
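What happens when the two sides disagree is easy to reproduce. The sketch below encodes a string as UTF-8 and then decodes the same bytes twice, once correctly and once as windows-1252. It assumes a runtime whose TextDecoder supports the windows-1252 label, as browsers and standard Node.js builds do.

```javascript
const original = 'café';

// TextEncoder always produces UTF-8 bytes.
const bytes = new TextEncoder().encode(original);
console.log(bytes.length); // 5, because "é" takes two bytes in UTF-8

// Sender and receiver agree on UTF-8: the text survives.
const roundTrip = new TextDecoder('utf-8').decode(bytes);
console.log(roundTrip); // café

// Receiver wrongly assumes a legacy single-byte encoding: the text is garbled.
const garbled = new TextDecoder('windows-1252').decode(bytes);
console.log(garbled); // cafÃ©
```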
7. Leave room for strings to grow and shrink.
Translation changes the length of strings.
“Repeat password” is almost twice as long in German as in English (210 pixels in English, 380 in German). If there is not enough room, your strings will run into the controls. WordPress handles this by leaving extra space after each label in case it grows.
This approach works for languages whose strings are about the same length, but in languages with long words, such as German and Finnish, the labels will overlap the controls unless you leave enough room. If you add more room, however, the labels in compact languages such as Chinese end up too far from their controls, which makes the form harder to use.
Many form designers give labels plenty of room and right-align them, or place them above the controls.
Placing labels above the controls works for short forms, but it makes long forms very tall.
There is no perfect way to make your application work in every language; many form designers combine these approaches. Short labels such as “Username” or “Role” change little in translation and need only a little extra room. Longer strings change a lot and need much more room in width, in height, or in both.
WordPress, for example, leaves some extra room for the “Biographical Information” label but places the longer description below the field so that it has space to grow when translated.
8. Always use the full locale.
The full locale includes both the language and the country code, and it covers alternate spellings, date formats, and other differences between two countries that share a language.
Always translate for the full locale rather than just the language, so that it is clear whether lunch was potato pancakes or ordinary pancakes, and whether 100 Russian rubles is a lot or a little in another country.
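One way to enforce this rule in code is to reject any locale that lacks a country. The sketch below uses the built-in Intl.Locale class to split an identifier into its parts; the `parseFullLocale` and `shortDate` helpers are hypothetical names, and accepting an underscore separator (as in xh_ZA) is an assumption made for convenience.

```javascript
// Split a locale identifier into language and country, and refuse
// identifiers that carry only a language.
function parseFullLocale(tag) {
  const locale = new Intl.Locale(tag.replace('_', '-'));
  if (!locale.region) {
    throw new RangeError('Locale "' + tag + '" has no country code');
  }
  return { language: locale.language, region: locale.region };
}

// The same language formats the same day differently in two countries.
const midsummer = new Date(Date.UTC(2012, 5, 21));
function shortDate(tag) {
  return new Intl.DateTimeFormat(tag, { timeZone: 'UTC' }).format(midsummer);
}

console.log(parseFullLocale('en-GB')); // { language: 'en', region: 'GB' }
console.log(shortDate('en-US'));       // 6/21/2012
console.log(shortDate('en-GB'));       // 21/06/2012
```

Calling `parseFullLocale('en')` throws, which turns a missing country code into an error during development instead of a wrong format in production.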
9. Never trust the browser to select a locale.
Localization is much harder with browsers and JavaScript, because they report a different locale depending on who is asking.
JavaScript has a property that reports the current language: navigator.language (older versions of Internet Explorer call it navigator.userLanguage). Every browser exposes it under one of these names, but it is usually useless. If I install Firefox in English, the property reports English. I can then go into my preferences and change my preferred languages. Firefox lets me select several languages in order of preference: American English, any other English, and Japanese.
Selecting multiple locales lets servers find the best match between the languages I know and the languages they support. Firefox takes these locales and sends them to the server in an HTTP header like this:
Accept-Language: en-us,en;q=0.7,ja;q=0.3
Firefox even uses the quality factor (the part with q=) to indicate how strongly I prefer one locale over another.
This means the server should serve content in English if it can, in Japanese if it cannot, and in some other language if it supports neither. However, even after I set my preferred languages in Firefox, the JavaScript language property still reports English, and only English. Other browsers are not much better. The end result can be that the server thinks I prefer Japanese while JavaScript thinks I want to read English.
JavaScript has never solved this problem, and instead of one standard localization library it has dozens of competing ones. The best way out is to embed the locale that the server chose when processing the request into the page's JavaScript. You can then use that locale whenever you format strings, dates, and numbers from JavaScript.
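On a server that happens to run JavaScript, the negotiation could be sketched as follows. The function names, the `window.APP_LOCALE` variable, and the exact-match rule against a lowercase list of supported locales are all assumptions for the example; a production server would more likely rely on its framework's content negotiation.

```javascript
// Parse an Accept-Language header into locale tags ordered by preference.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1; // no q means 1
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q };
    })
    .filter((entry) => entry.tag !== '' && entry.q > 0)
    .sort((a, b) => b.q - a.q) // sort is stable, so ties keep header order
    .map((entry) => entry.tag);
}

// Pick the first preferred locale that the site supports.
function chooseLocale(header, supported, fallback) {
  const match = parseAcceptLanguage(header).find((tag) => supported.includes(tag));
  return match || fallback;
}

// Embed the server's choice in the page so client code uses the same locale.
function localeScriptTag(locale) {
  return '<script>window.APP_LOCALE = ' + JSON.stringify(locale) + ';</script>';
}

const header = 'en-us,en;q=0.7,ja;q=0.3';
console.log(parseAcceptLanguage(header));                  // [ 'en-us', 'en', 'ja' ]
console.log(chooseLocale(header, ['ja', 'de'], 'de'));     // ja
console.log(localeScriptTag(chooseLocale(header, ['ja', 'de'], 'de')));
```

Client-side code would then read the embedded value instead of asking the browser for its language.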
10. Support both left-to-right and right-to-left languages
Most languages are written from left to right, but Arabic, Hebrew, and many others are written from right to left. The html element has an attribute called dir that declares whether the page reads ltr (left to right) or rtl (right to left).
<html dir="rtl">
CSS has a direction property as well:
input { direction: rtl; }
Setting the direction makes the page work for standard HTML markup, but it does not switch CSS float values from left to right, and absolutely positioned elements stay where they are. More complex layouts need a separate style sheet.
A simple way to determine the direction of the current language is to include a direction string in the resource strings.
direction = rtl
Now you can use this string to load a different style sheet for the current locale.
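A minimal sketch of that last step follows. The `directionStrings` table, the helper names, and the style sheet file names are hypothetical; the document object is passed in as a parameter so that the selection logic can also run outside a browser.

```javascript
// Hypothetical resource strings: each locale declares its own direction.
const directionStrings = {
  'en-US': { direction: 'ltr' },
  'he-IL': { direction: 'rtl' },
  'ar-EG': { direction: 'rtl' }
};

// Read the direction resource string, defaulting to left-to-right.
function getDirection(locale) {
  const table = directionStrings[locale];
  return table && table.direction === 'rtl' ? 'rtl' : 'ltr';
}

// Hypothetical style sheet names, one per direction.
function stylesheetFor(locale) {
  return getDirection(locale) === 'rtl' ? 'layout-rtl.css' : 'layout-ltr.css';
}

// In a browser: set the dir attribute and attach the matching style sheet.
function applyDirection(locale, doc) {
  doc.documentElement.setAttribute('dir', getDirection(locale));
  const link = doc.createElement('link');
  link.rel = 'stylesheet';
  link.href = stylesheetFor(locale);
  doc.head.appendChild(link);
}

console.log(stylesheetFor('he-IL')); // layout-rtl.css
console.log(stylesheetFor('en-US')); // layout-ltr.css
```

In a page, the call would be `applyDirection(locale, document)` with the locale the server selected.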
11. Never sort in the browser
JavaScript provides a sort function that arranges strings in alphabetical order. It works by comparing the strings character by character to decide that a comes before b and y comes before z. That is why it puts 40 before 5.
The browser knows that y comes before z by using a large mapping table for each character. However, the browser includes these tables only for the current locale. This means it cannot properly sort a list of Japanese names when running in an English locale; it sorts them by Unicode value, which is wrong.
The problem shows up in languages such as Polish and Vietnamese, which make heavy use of diacritical marks. The browser can tell that a comes before b, but it does not know how to order accented variants of a such as ã.
Only the server can sort strings correctly. Make sure that the server has the code maps for every language you support, that you send lists to the browser already sorted, and that you go back to the server whenever you need to re-sort them. Also make sure that the server takes the locale into account when sorting (including right-to-left locales).
12. Test early and often.
Most teams do not worry about localization until it is too late. A big customer in Asia complains that the site does not work, and everyone scrambles to fix 100 small localization bugs that nobody had thought about. Following the rules in this article will help you avoid many of these problems, but you still have to test, and translations are usually not ready until the end of the project.
I used to translate my projects into Pig Latin, but that did not test Asian characters, and most browsers do not support it. Now I test localization with Xhosa (xh_ZA). All browsers support Xhosa, and it is Nelson Mandela's native language, but nobody has ever asked me to support it in a product.
I do not speak Xhosa, so I create a new translation file and add xh to the beginning and end of every string. This makes it easy to see whether I missed a string in the code. I also throw in a few Japanese kanji characters to check the encoding, and I end up with a gibberish string that tests all of my translation concerns.
Creating the test translation file is easy. Just save a new copy of the configuration file with xh_ZA in the file name and change
name = Username
to:
name = xh吳清源Username吳清源xh
The resulting gibberish tests whether I have moved every string to resources, whether I am using the right locale, whether my forms cope with longer strings, and whether the encoding is correct. Then I quickly scan the application for any text without xh and fix the bugs before they become urgent problems.
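Generating the test translation can be automated. The sketch below wraps every value of a resource table in the xh marker and the kanji characters shown above, and a second helper lists the visible strings that lack the marker. Both function names are hypothetical, as is the idea of collecting the visible strings into an array.

```javascript
const MARKER = 'xh';
const KANJI = '吳清源'; // non-ASCII characters to exercise the encoding

// Build a test translation: every value is wrapped in marker + kanji.
function pseudoLocalize(resources) {
  const result = {};
  for (const [name, value] of Object.entries(resources)) {
    result[name] = MARKER + KANJI + value + KANJI + MARKER;
  }
  return result;
}

// Return the visible strings that did not come from the test translation,
// which means they are still hard-coded somewhere.
function findUnlocalized(visibleStrings) {
  return visibleStrings.filter(
    (text) => !(text.startsWith(MARKER) && text.endsWith(MARKER))
  );
}

const testStrings = pseudoLocalize({ name: 'Username' });
console.log(testStrings.name); // xh吳清源Username吳清源xh
console.log(findUnlocalized([testStrings.name, 'Password:'])); // [ 'Password:' ]
```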
Get localization right ahead of time, and you will save yourself a lot of trouble later.
About the translator
This article was translated by Alconost.