
JavaScript internationalization API: implementation in Firefox

What is internationalization?


Internationalization (i18n for short: i, then 18 more letters, then n) is a way of building applications so that they can be easily adapted for audiences speaking different languages. It is very easy to get this wrong by assuming that all your users come from the same place and speak the same language, especially when you do not even realize you are making that assumption.

    function formatDate(d) {
      // Everyone writes dates as month/date/year. Right?
      var month = d.getMonth() + 1;
      var date = d.getDate();
      var year = d.getFullYear();
      return month + "/" + date + "/" + year;
    }

    function formatMoney(amount) {
      // All money is dollars with two fractional digits. Right?
      return "$" + amount.toFixed(2);
    }

    function sortNames(names) {
      function sortAlphabetically(a, b) {
        var left = a.toLowerCase(), right = b.toLowerCase();
        if (left > right) return 1;
        if (left === right) return 0;
        return -1;
      }

      // Names always sort alphabetically. Right?
      names.sort(sortAlphabetically);
    }


Historically, i18n support in JavaScript has been poor. Locale-aware formatting relied on the various toLocaleString() methods. The resulting strings contained whatever details the implementation chose to provide, with no way to pick and choose (do you need the weekday in the date? is the year irrelevant?). Even when all the right details were included, the format could be wrong: a decimal where a percentage was wanted, and so on. And the locale could not be selected.

For sorting, there was almost no useful locale-sensitive text comparison (collation). localeCompare() existed, but its interface was awkward and poorly suited to use with sort. It, too, offered no choice of locale.

These limitations are so severe that serious applications send data to the server to perform locale-sensitive operations there and then fetch the result. A round trip to the server just to format a monetary amount. Madness.

A new i18n API for JS


The new ECMAScript Internationalization API greatly extends JavaScript's i18n capabilities. It provides all the bells and whistles for formatting dates and numbers and for sorting text. The locale can be selected, and for speed the selection can be done once rather than before every operation.
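Choosing the locale once and reusing the result might look like the following sketch. The helper getNumberFormat and its cache are illustrative assumptions, not part of the Intl API; only Intl.NumberFormat itself comes from the API. The sketch uses console.log for output.

```javascript
// Hypothetical helper: build each formatter once and reuse it afterwards.
// Only Intl.NumberFormat comes from the API; the cache is our own invention.
var formatterCache = new Map();

function getNumberFormat(locale, options) {
  // JSON.stringify gives a simple cache key for small option objects.
  var key = locale + "|" + JSON.stringify(options || {});
  if (!formatterCache.has(key)) {
    formatterCache.set(key, new Intl.NumberFormat(locale, options));
  }
  return formatterCache.get(key);
}

// The locale and options are resolved once, here.
var usd = getNumberFormat("en-US", { style: "currency", currency: "USD" });

// Every later call reuses the same formatter object.
var prices = [5.259, 1314.25, 3.17].map(function (p) {
  return usd.format(p);
});
console.log(prices.join(" | "));
```

Whether a cache like this pays off depends on how often the same locale and options recur; for a handful of calls, constructing the formatter once and keeping it in a variable is enough.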

That said, the API is not a panacea; it is best-effort only. The exact output format is not specified. An implementation may support only exotic languages, or may simply ignore the formatting options. Most implementations will support many locales, but there are no guarantees.

Firefox's implementation depends on the International Components for Unicode (ICU) library, which in turn depends on the Unicode Common Locale Data Repository (CLDR). Most of the implementation on top of ICU, however, is itself written in JavaScript.

Intl interface


The i18n API lives on the global Intl object. It contains three constructors: Intl.Collator, Intl.DateTimeFormat and Intl.NumberFormat. Each is used like this:

    var ctor = "Collator"; // or one of the other two
    var instance = new Intl[ctor](locales, options);


locales is a string specifying a single language tag, or an array-like object containing several language tags. Language tags are strings such as en (English), de-AT (Austrian German) or zh-Hant-TW (Chinese as used in Taiwan, written in the traditional script). Tags can also include a Unicode extension of the form -u-key1-value1-key2-value2..., where each key is an extension key. The different constructors interpret these keys differently.

options is an object whose properties determine formatting and sorting.
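Language tags have to be well formed before they can be used at all: a malformed tag makes the Intl functions throw a RangeError. A minimal sketch of a check follows; the helper name isWellFormedTag is an assumption made for illustration. Note that being well formed says nothing about whether the locale is actually supported.

```javascript
// Hypothetical helper: report whether a string is a structurally valid
// language tag. supportedLocalesOf throws a RangeError for malformed tags.
function isWellFormedTag(tag) {
  try {
    Intl.Collator.supportedLocalesOf(tag);
    return true;
  } catch (e) {
    if (e instanceof RangeError) {
      return false;
    }
    throw e; // anything else is unexpected, so pass it on
  }
}

console.log(isWellFormedTag("de-AT"));                       // true
console.log(isWellFormedTag("zh-Hant-TW"));                  // true
console.log(isWellFormedTag("th-TH-u-ca-chinese-nu-thai"));  // true
console.log(isWellFormedTag("en_US"));  // false: subtags are joined with "-"
```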

Firefox supports more than 400 locales for sorting and more than 600 for formatting, so the locale you need is very likely among them.

Intl does not guarantee specific behavior. If the requested locale is not supported, Intl makes a best effort to handle the request anyway. Even if the locale is supported, its behavior is not rigidly specified. Never assume that a particular set of options corresponds to a specific format. The output may vary from browser to browser or from version to version. Even the individual components of a format are not specified: a short weekday name might be “S”, “Sa” or “Sat”.

Date and time formatting


Settings

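weekday, era: "narrow", "short" or "long". (era refers to divisions of a calendar system that are typically longer than a year: BC/AD, the reign of a Japanese emperor, and so on.)
month: "2-digit", "numeric", "narrow", "short" or "long"
year, day, hour, minute, second: "2-digit" or "numeric"
timeZoneName: "short" or "long"
timeZone: "UTC" formats with respect to UTC. Values such as "CEST" and "America/New_York" do not have to be supported, and they do not currently work in Firefox.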


The exact format is not specified, but the intent is that “narrow”, “short” and “long” produce output of corresponding length: “S” or “Sa”, “Sat” and “Saturday”. The output may be ambiguous: Saturday and Sunday can both come out as “S” in the narrow form. “2-digit” and “numeric” give two-digit or full-length numbers: “70” and “1970”.
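The three widths can be compared side by side with a short sketch. The helper name weekdayForms is an assumption for illustration; the exact strings are implementation-defined, so the comment only shows the kind of output to expect.

```javascript
// July 17, 2014 00:00:00 UTC, which was a Thursday.
var thursday = new Date(Date.UTC(2014, 6, 17));

// Format the same weekday at each of the three widths.
function weekdayForms(locale) {
  return ["narrow", "short", "long"].map(function (width) {
    return new Intl.DateTimeFormat(locale, {
      weekday: width,
      timeZone: "UTC"
    }).format(thursday);
  });
}

// Something like: T, Thu, Thursday
console.log(weekdayForms("en-US").join(", "));
```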

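There is also a special setting:
hour12: whether to use 12-hour or 24-hour time. The default depends on the locale. Details such as whether midnight is written as 0 or 12, and whether leading zeros are present, also depend on the locale.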
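A small sketch of the hour12 setting, assuming the en-US locale and a fixed UTC time so that the result does not depend on the machine's time zone. The helper name formatTime is hypothetical.

```javascript
// 17:00 UTC on July 17, 2014.
var fivePm = new Date(Date.UTC(2014, 6, 17, 17, 0));

function formatTime(hour12) {
  return new Intl.DateTimeFormat("en-US", {
    hour: "numeric",
    minute: "2-digit",
    hour12: hour12,
    timeZone: "UTC"
  }).format(fivePm);
}

console.log(formatTime(true));   // 12-hour clock, e.g. 5:00 PM
console.log(formatTime(false));  // 24-hour clock, e.g. 17:00
```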


There are also two special properties: localeMatcher (“lookup” or “best fit”) and formatMatcher (“basic” or “best fit”), both defaulting to “best fit”. They control how the locale and the format are selected. They are rarely needed and can usually be ignored.

Locale related settings

DateTimeFormat also allows formatting with customized calendar and numbering systems. These details are specified in the Unicode extension of the language tag.

For example, Thai as used in Thailand has the language tag th-TH. The Unicode extension has the form -u-key1-value1-key2-value2... The key for the calendar system is ca, and the key for the numbering system is nu. The Thai numbering system has the value thai, and the Chinese calendar has the value chinese. So to format dates this way, we append both to the end of the language tag: th-TH-u-ca-chinese-nu-thai.
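Assembling such a tag is plain string concatenation, as the following sketch shows. The helper withUnicodeExtension is hypothetical and assumes the base tag does not already carry a -u- extension. resolvedOptions() then reports what the implementation actually selected, which may differ from the request if a value is unsupported.

```javascript
// Hypothetical helper: append a Unicode extension to a base language tag.
// Assumes baseTag has no -u- extension of its own.
function withUnicodeExtension(baseTag, settings) {
  var parts = Object.keys(settings).map(function (key) {
    return key + "-" + settings[key];
  });
  return parts.length ? baseTag + "-u-" + parts.join("-") : baseTag;
}

var thaiTag = withUnicodeExtension("th-TH", { ca: "chinese", nu: "thai" });
console.log(thaiTag); // th-TH-u-ca-chinese-nu-thai

// Ask the formatter which calendar and numbering system it settled on.
var resolved = new Intl.DateTimeFormat(thaiTag).resolvedOptions();
console.log(resolved.calendar, resolved.numberingSystem);
```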

Read the documentation for details.

Examples

After creating a DateTimeFormat object, use its format() function. This is a bound function, so it does not have to be called on the object directly. Pass it a timestamp or a Date object.

    var msPerDay = 24 * 60 * 60 * 1000;

    // July 17, 2014 00:00:00 UTC.
    var july172014 = new Date(msPerDay * (44 * 365 + 11 + 197));
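Because format is bound to its formatter, it can be extracted and passed around as an ordinary callback. A minimal sketch, with variable names chosen only for illustration:

```javascript
var dayFormat = new Intl.DateTimeFormat("en-US", {
  year: "numeric", month: "short", day: "numeric", timeZone: "UTC"
});

// Extract the bound function; no .bind(dayFormat) is needed.
var formatDay = dayFormat.format;

var moments = [
  Date.UTC(2014, 6, 17),            // a timestamp in milliseconds
  new Date(Date.UTC(2014, 11, 31))  // a Date object
];

// map also passes an index and the array, but format only looks at
// its first argument, so it can be used as the callback directly.
var labels = moments.map(formatDay);
console.log(labels.join("; "));
```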


Let's format the date for American English. We include a two-digit month, day and year, two-digit hours and minutes, and a short time zone name.

    var options = {
      year: "2-digit", month: "2-digit", day: "2-digit",
      hour: "2-digit", minute: "2-digit",
      timeZoneName: "short"
    };
    var americanDateTime = new Intl.DateTimeFormat("en-US", options).format;

    // No timeZone is given, so the local zone is used; in US Pacific time:
    print(americanDateTime(july172014)); // 07/16/14, 5:00 PM PDT


Now let's do the same for Portuguese, preferring the Brazilian variety and falling back to that of Portugal. We'll make the format longer, with a full year and a spelled-out month, and use the UTC zone.

    var options = {
      year: "numeric", month: "long", day: "numeric",
      hour: "2-digit", minute: "2-digit",
      timeZoneName: "short", timeZone: "UTC"
    };
    var portugueseTime = new Intl.DateTimeFormat(["pt-BR", "pt-PT"], options);

    // 17 de julho de 2014 00:00 GMT
    print(portugueseTime.format(july172014));


How about a compact Swiss train schedule in UTC? We list the official languages from most to least widely spoken:

    var swissLocales = ["de-CH", "fr-CH", "it-CH", "rm-CH"];
    var options = {
      weekday: "short",
      hour: "numeric", minute: "numeric",
      timeZone: "UTC", timeZoneName: "short"
    };
    var swissTime = new Intl.DateTimeFormat(swissLocales, options).format;

    print(swissTime(july172014)); // Do. 00:00 GMT


Let's try to display the date in Japanese, using the Japanese calendar with the year and era:

    var jpYearEra = new Intl.DateTimeFormat("ja-JP-u-ca-japanese",
                                            { year: "numeric", era: "long" });

    print(jpYearEra.format(july172014)); // 平成26年


And now a long date for Thailand, using Thai numerals and the Chinese calendar:

    var options = { year: "numeric", month: "long", day: "numeric" };
    var thaiDate = new Intl.DateTimeFormat("th-TH-u-nu-thai-ca-chinese", options);

    print(thaiDate.format(july172014)); // ๒๐ 6 ๓๑


Formatting numbers


Settings

The basic settings for formatting numbers are as follows:

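style: "currency", "percent" or "decimal" (the default), to format a value of that kind.
currency: a three-letter currency code such as USD or CHF. Required if style is "currency", otherwise meaningless.
currencyDisplay: "code", "symbol" or "name", defaulting to "symbol". "code" uses the three-letter currency code, "symbol" uses a currency symbol such as $ or £, and "name" uses a spelled-out name of the currency.
minimumIntegerDigits: an integer from 1 to 21 (inclusive), defaulting to 1. The result is padded with leading zeros until the integer part has at least this many digits.
minimumFractionDigits, maximumFractionDigits: integers from 0 to 20 (inclusive). The result has at least minimumFractionDigits and at most maximumFractionDigits fractional digits. The default minimum depends on the currency (usually 2, rarely 0 or 3) if style is "currency", and is 0 otherwise. The default maximum is 0 for percentages, 3 for decimals, and depends on the currency for currencies.
minimumSignificantDigits, maximumSignificantDigits: integers from 1 to 21 (inclusive). If present, they override the integer and fraction digit settings above and control the number of significant digits in the result.
useGrouping: a boolean, defaulting to true. Determines whether the result contains grouping separators (for example, "," as the thousands separator in English).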
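The digit settings above can be tried out in a few lines. This sketch assumes the en-US locale, and the helper name fmt is hypothetical. The outputs in the comments are what ICU-based implementations typically produce; as noted earlier, the API itself does not guarantee exact strings.

```javascript
function fmt(options, value) {
  return new Intl.NumberFormat("en-US", options).format(value);
}

// Exactly two fractional digits, with grouping on by default.
console.log(fmt({ minimumFractionDigits: 2, maximumFractionDigits: 2 }, 1234.5)); // e.g. 1,234.50

// The same digits with grouping separators switched off.
console.log(fmt({ minimumFractionDigits: 2, useGrouping: false }, 1234.5)); // e.g. 1234.50

// Pad the integer part to at least two digits.
console.log(fmt({ minimumIntegerDigits: 2 }, 3)); // e.g. 03

// Significant digits take precedence over the fraction-digit settings.
console.log(fmt({ maximumSignificantDigits: 3 }, 1234.5678)); // e.g. 1,230

// Percent style: the default maximumFractionDigits is 0.
console.log(fmt({ style: "percent" }, 0.438)); // e.g. 44%
```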


Locale settings

NumberFormat supports custom numbering systems through the nu key, just as DateTimeFormat does. For example, the language tag for Chinese as used in China is zh-CN. The value for the Han decimal numbering system is hanidec. To format numbers in this system, we append the Unicode extension to the language tag: zh-CN-u-nu-hanidec.

For a full description of the features, see the documentation.

Examples

To begin with, let's format currency for Chinese using Han decimal numerals. Select the “currency” style, then use the currency code for Chinese renminbi (yuan), with default grouping and the usual number of fractional digits.

    var hanDecimalRMBInChina =
      new Intl.NumberFormat("zh-CN-u-nu-hanidec",
                            { style: "currency", currency: "CNY" });

    print(hanDecimalRMBInChina.format(1314.25)); // ¥ 一,三一四.二五


Now let's format a gasoline price the way it is written in the USA, with three fractional digits:

    var gasPrice = new Intl.NumberFormat("en-US", {
      style: "currency", currency: "USD",
      minimumFractionDigits: 3
    });

    print(gasPrice.format(5.259)); // $5.259


Let's try a percentage in Arabic as used in Egypt, with at least two fractional digits. The displayed order of the characters may differ in a right-to-left context.

    var arabicPercent = new Intl.NumberFormat("ar-EG", {
      style: "percent",
      minimumFractionDigits: 2
    }).format;

    print(arabicPercent(0.438)); // ٤٣٫٨٠٪


Now Persian as used in Afghanistan, with at least two integer digits and no more than two fractional digits.

    var persianDecimal = new Intl.NumberFormat("fa-AF", {
      minimumIntegerDigits: 2,
      maximumFractionDigits: 2
    });

    print(persianDecimal.format(3.1416)); // ۰۳٫۱۴


Finally, let's format an amount of Bahraini dinars in Arabic as used in Bahrain. Unusually, the dinar is divided into thousandths, so the result has three fractional digits.

    var bahrainiDinars = new Intl.NumberFormat("ar-BH", {
      style: "currency", currency: "BHD"
    });

    print(bahrainiDinars.format(3.17)); // د.ب.‏ ٣٫١٧٠


Sorting


Settings

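usage: "sort" or "search" (defaulting to "sort").
sensitivity: "base", "accent", "case" or "variant". Controls how sensitive the collator is to characters that share a base letter but differ in accents, diacritics or case. Base letters depend on the locale: “a” and “ä” have the same base letter in German but are different letters in Swedish. "base" considers only the base letter (so in German “a”, “A” and “ä” are the same). "accent" considers the base letter and accents but ignores case (“a” and “A” are the same, but “ä” differs from both). "case" considers the base letter and case but ignores accents (“a” and “ä” are the same, but “A” differs from both). Finally, "variant" considers everything. For "sort" the default is "variant"; otherwise it depends on the locale.
numeric: a boolean determining whether whole numbers embedded in strings are taken into account when sorting. For example, numeric sorting might give "F-4 Phantom II", "F-14 Tomcat", "F-35 Lightning II"; non-numeric sorting might give "F-14 Tomcat", "F-35 Lightning II", "F-4 Phantom II".
caseFirst: "upper", "lower" or "false" (the default). Determines how case is treated when sorting: "upper" puts uppercase letters first ("B", "a", "c"), "lower" puts lowercase letters first ("a", "c", "B"), and "false" ignores case ("a", "B", "c").
ignorePunctuation: a boolean, false by default, determining whether embedded punctuation is ignored in comparisons (so that, for example, "biweekly" and "bi-weekly" compare as equal).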
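The sensitivity levels and the numeric setting can be checked with a short sketch. It assumes the German locale de for the letter comparisons and en-US for the numeric sort; the helper name equalUnder is hypothetical.

```javascript
// Do x and y compare as equal under the given sensitivity level?
function equalUnder(sensitivity, x, y) {
  var collator = new Intl.Collator("de", { sensitivity: sensitivity });
  return collator.compare(x, y) === 0;
}

["base", "accent", "case", "variant"].forEach(function (level) {
  console.log(level,
              "a=A:", equalUnder(level, "a", "A"),
              "a=ä:", equalUnder(level, "a", "ä"));
});

// Numeric collation compares embedded digit runs as numbers.
var planes = ["F-35 Lightning II", "F-4 Phantom II", "F-14 Tomcat"];
var numericOrder = planes.slice().sort(
  new Intl.Collator("en-US", { numeric: true }).compare);
console.log(numericOrder.join(", "));
```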


Locale settings

The collation key in the Unicode extension is co. It selects the kind of collation: phone book (phonebk), dictionary (dict), and others.

Additionally, the kn and kf keys can duplicate the numeric and caseFirst properties of the options object. Their support is not guaranteed, though, so it is better to use the options properties instead.

Examples

Collator objects have a compare function. It takes arguments x and y and returns a number less than zero if x sorts before y, 0 if x and y compare as equal, and a number greater than zero if x sorts after y. Like format, compare is a bound function, so it can be passed directly to sort.

Let's try sorting some German surnames. German as used in Germany has two different sort orders: phone book and dictionary. Phone book order is based on pronunciation, as if “ä”, “ö” and so on were expanded to “ae”, “oe”, etc.

    var names = ["Hochberg", "Hönigswald", "Holzman"];

    var germanPhonebook = new Intl.Collator("de-DE-u-co-phonebk");

    // as if sorting ["Hochberg", "Hoenigswald", "Holzman"]:
    //   Hochberg, Hönigswald, Holzman
    print(names.sort(germanPhonebook.compare).join(", "));
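To see what compare changes, here is a sketch that sorts the same words with the default sort and with a German collator. The word list is made up for illustration.

```javascript
var words = ["zebra", "Äpfel", "apple", "Zug"];

// The default sort compares UTF-16 code units: uppercase ASCII comes
// before lowercase, and "Ä" comes after every ASCII letter.
var byCodeUnit = words.slice().sort();

// A collator's compare function can be handed straight to sort.
var byGermanRules = words.slice().sort(new Intl.Collator("de").compare);

console.log(byCodeUnit.join(", "));     // Zug, apple, zebra, Äpfel
console.log(byGermanRules.join(", "));  // e.g. Äpfel, apple, zebra, Zug
```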


Some German words take an umlaut in their inflected forms, so in dictionaries it makes sense to sort ignoring umlauts (except when words differ only in umlauts, for example schon before schön).

    var germanDictionary = new Intl.Collator("de-DE-u-co-dict");

    // as if sorting ["Hochberg", "Honigswald", "Holzman"]:
    //   Hochberg, Holzman, Hönigswald
    print(names.sort(germanDictionary.compare).join(", "));


Let's sort a list of Firefox versions written with various typos (different capitalization, random accents and diacritics, extra hyphens), according to the rules of American English. We want the version numbers respected, so we sort numerically: numbers are compared by value, not character by character.

    var firefoxen = [
      "FireFøx 3.6", "Fire-fox 1.0", "Firefox 29", "FÍrefox 3.5", "Fírefox 18"
    ];
    var usVersion = new Intl.Collator("en-US", {
      sensitivity: "base",
      numeric: true,
      ignorePunctuation: true
    });

    // Fire-fox 1.0, FÍrefox 3.5, FireFøx 3.6, Fírefox 18, Firefox 29
    print(firefoxen.sort(usVersion.compare).join(", "));


Finally, let's search for strings while ignoring case and accents.

    var decoratedBrowsers = [
      "A\u0362maya",  // A͢maya
      "CH\u035Brôme", // CH͛rôme
      "FirefÓx",
      "sAfàri",
      "o\u0323pERA",  // ọpERA
      "I\u0352E",     // I͒E
    ];

    var fuzzySearch = new Intl.Collator("en-US", {
      usage: "search", sensitivity: "base"
    });

    function findBrowser(browser) {
      function cmp(other) {
        return fuzzySearch.compare(browser, other) === 0;
      }
      return cmp;
    }

    print(decoratedBrowsers.findIndex(findBrowser("Firêfox"))); // 2
    print(decoratedBrowsers.findIndex(findBrowser("Safåri")));  // 3
    print(decoratedBrowsers.findIndex(findBrowser("Ãmaya")));   // 0
    print(decoratedBrowsers.findIndex(findBrowser("Øpera")));   // 4
    print(decoratedBrowsers.findIndex(findBrowser("Chromè")));  // 1
    print(decoratedBrowsers.findIndex(findBrowser("IË")));      // 5


Additional Information


It can be useful to determine whether certain operations are supported for specific locales, or whether a locale is supported at all. For this, each constructor has a supportedLocalesOf() function, and each prototype has a resolvedOptions() function.

    var navajoLocales = Intl.Collator.supportedLocalesOf(["nv"], { usage: "sort" });
    print(navajoLocales.length > 0
          ? "Navajo collation supported"
          : "Navajo collation not supported");

    var germanFakeRegion = new Intl.DateTimeFormat("de-XX", { timeZone: "UTC" });
    var usedOptions = germanFakeRegion.resolvedOptions();
    print(usedOptions.locale);   // de
    print(usedOptions.timeZone); // UTC
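These two functions combine naturally into a fallback strategy. The following is only a sketch: the helper pickCollationLocale and the choice of "en" as the fallback tag are assumptions made for illustration.

```javascript
// Hypothetical helper: return the first requested locale that the
// implementation supports for collation, or a fallback tag otherwise.
function pickCollationLocale(requested, fallback) {
  var supported = Intl.Collator.supportedLocalesOf(requested);
  return supported.length > 0 ? supported[0] : fallback;
}

var chosen = pickCollationLocale(["de-CH", "fr-CH", "it-CH"], "en");
var swissCollator = new Intl.Collator(chosen);

// The resolved locale may be more general than the requested tag.
console.log(chosen, "->", swissCollator.resolvedOptions().locale);
```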


Legacy behavior


In ES5, the toLocaleString and localeCompare functions had no particular semantics, accepted no options and were, in effect, useless. Their behavior has therefore been redefined in terms of Intl operations. If the exact locale-related behavior of your program is not particularly important to you, you can keep using the old functions. Otherwise, it is recommended to use the Intl primitives directly.
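The relationship between the old functions and the Intl primitives can be sketched as follows, assuming an implementation that supports the Intl API and the en-US and de locales.

```javascript
var moneyOptions = { style: "currency", currency: "USD" };

// Old entry point: locale and options are passed on every call.
var viaLegacy = (1234.5).toLocaleString("en-US", moneyOptions);

// Intl primitive: resolve the locale once, then format as often as needed.
var moneyFormat = new Intl.NumberFormat("en-US", moneyOptions);
var viaIntl = moneyFormat.format(1234.5);

// Both routes go through the same Intl operations.
console.log(viaLegacy === viaIntl); // true

// localeCompare likewise accepts locales and options.
console.log("a".localeCompare("ä", "de", { sensitivity: "base" })); // 0
```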

Conclusion


Internationalization is a fascinating topic whose complexity is bounded only by the varied nature of human communication. The Internationalization API addresses a small but necessary part of that complexity and makes it easier to write locale-aware web applications.

Source: https://habr.com/ru/post/253913/

