
These entertaining regional settings

Today we will talk about regional settings. But first, a little puzzle: what will the code below print? (The code is in C#, but the problem is quite common, so you can imagine some other language in its place.)

    Console.WriteLine((-42).ToString() == "-42");
    Console.WriteLine(double.NaN.ToString() == "NaN");
    Console.WriteLine(int.Parse("-42") == -42);
    Console.WriteLine(1.1.ToString().Contains("?") == false);
    Console.WriteLine(new DateTime(2014, 1, 1).ToString().Contains("2014"));
    Console.WriteLine("i".ToUpper() == "I" || "I".ToLower() == "i");

How many true values did you get? If your answer is greater than zero, it won't hurt to learn more about regional settings, because the correct answer is "it depends." Unfortunately, many programmers do not even realize that these settings can differ from one environment to another. And they are too lazy to specify InvariantCulture throughout their code, so their excellent applications behave very strangely once they reach users in other countries. The bugs vary a lot, but they are often related to formatting and parsing strings, which are very common tasks for many programmers. This article gives a short selection of some important points that are affected by regional settings.
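To make "it depends" concrete, here is a minimal sketch (assuming .NET Core 3.0 or later; the class name CultureSafeQuiz is hypothetical) of the same six checks written so that none of them reads CultureInfo.CurrentCulture: every conversion gets an explicit CultureInfo.InvariantCulture, and case conversion uses the invariant methods.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: the six checks from the opening puzzle, rewritten so that
// none of them depends on CultureInfo.CurrentCulture.
public static class CultureSafeQuiz
{
    public static bool[] Run()
    {
        CultureInfo inv = CultureInfo.InvariantCulture;
        return new[]
        {
            (-42).ToString(inv) == "-42",                                  // explicit culture for formatting
            double.NaN.ToString(inv) == "NaN",                             // invariant NaNSymbol is "NaN"
            int.Parse("-42", inv) == -42,                                  // explicit culture for parsing
            !1.1.ToString(inv).Contains("?"),                              // invariant decimal separator is "."
            new DateTime(2014, 1, 1).ToString(inv).Contains("2014"),      // invariant culture uses the Gregorian calendar
            "i".ToUpperInvariant() == "I" || "I".ToLowerInvariant() == "i" // culture-independent casing
        };
    }
}
```

Calling CultureSafeQuiz.Run() should return six true values even when CurrentCulture has been replaced by a deliberately odd culture, while the original one-liners would not.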

CultureInfoExplorer screenshot
A little theory: in .NET, all information about a specific language and its regional settings is available through the CultureInfo class. If you have not worked with cultures before, this post is a good introduction. A curious programmer who enjoys studying the many existing regional settings can get tired of inspecting every CultureInfo by hand. Personally, I got tired at some point. So I wrote a small WPF application, CultureInfoExplorer (link to GitHub, binaries), shown in the screenshot above. I hope some readers will find this program useful: you can learn a lot about the various regional settings with it. Now let's move on to the examples.

Numbers


To represent numbers we have NumberFormatInfo (available through CultureInfo.NumberFormat). It covers not only ordinary numbers but also percentages and currency amounts. Note that values can be positive or negative: if you work on localization or globalization, pay attention to both cases. I strongly recommend at least skimming the documentation to see which properties are available.

One of the properties that most often causes trouble is NumberDecimalSeparator. It determines which character separates the integer part of a number from the fractional part when the number is formatted. A typical bug: a programmer joins an array of fractional numbers into a single string, separated by commas, and later tries to parse the string back into an array. If NumberDecimalSeparator is a period, everything works. Say the programmer had the en-US culture, everything worked, and he released his product. Then a user with the ru-RU culture downloads it, and the trouble begins: that user's NumberDecimalSeparator is a comma, so an array with the elements 1.2 and 3.4 turns into the string "1,2,3,4" during the join, and parsing it back is problematic. Personally, I feel even sadder when a programmer who runs into this problem does not fix it properly, by specifying the right NumberFormatInfo when formatting, but starts hacking around it by replacing periods with commas or commas with periods. Keep in mind that NumberDecimalSeparator can, in principle, be anything. For example, in the fa-IR (Persian) culture it is a forward slash ('/').
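Here is a sketch of the join-and-split scenario, assuming .NET Core 3.0 or later (DoubleListCodec and its methods are hypothetical names). JoinCurrent reproduces the bug by formatting with the current culture; Join and Split fix it the way the paragraph recommends, by passing one explicit culture (here InvariantCulture) to both formatting and parsing instead of swapping periods and commas by hand.

```csharp
using System.Globalization;
using System.Linq;

// Hypothetical helper for the "join doubles with commas, split them later" scenario.
public static class DoubleListCodec
{
    // Buggy: formats with the current culture. Under a culture whose decimal
    // separator is a comma, { 1.2, 3.4 } becomes "1,2,3,4" and cannot be split back.
    public static string JoinCurrent(double[] values) =>
        string.Join(",", values.Select(v => v.ToString(CultureInfo.CurrentCulture)));

    // Fixed: format with one explicit culture...
    public static string Join(double[] values) =>
        string.Join(",", values.Select(v => v.ToString("R", CultureInfo.InvariantCulture)));

    // ...and parse with the same culture.
    public static double[] Split(string text) =>
        text.Split(',')
            .Select(part => double.Parse(part, NumberStyles.Float, CultureInfo.InvariantCulture))
            .ToArray();
}
```

The "R" (round-trip) format makes the intent explicit that Split must get back the exact double values; on .NET Core 3.0 and later the default ToString() is already round-trippable.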

There are similar properties for percentages and currency: PercentDecimalSeparator and CurrencyDecimalSeparator. These three values do not have to match. For example, in the Kazakh culture (kk-KZ), NumberDecimalSeparator and PercentDecimalSeparator are commas, while CurrencyDecimalSeparator is a minus sign (exactly the same character used to denote negative numbers).
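A sketch showing that each of the three separators is picked up by a different standard format specifier ("N", "P", "C"). To avoid depending on the operating system's culture data, it clones InvariantCulture and sets the separators explicitly, using the kk-KZ values quoted above as a model; SeparatorDemo is a hypothetical name.

```csharp
using System.Globalization;

// Hypothetical demo: a culture cloned from InvariantCulture whose three decimal
// separators deliberately differ (modeled on the kk-KZ values quoted above).
public static class SeparatorDemo
{
    public static CultureInfo MakeCulture()
    {
        var c = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        c.NumberFormat.NumberDecimalSeparator = ",";
        c.NumberFormat.PercentDecimalSeparator = ",";
        c.NumberFormat.CurrencyDecimalSeparator = "-";
        return c;
    }

    // "N" uses NumberDecimalSeparator.
    public static string AsNumber(double v, CultureInfo c) => v.ToString("N2", c);

    // "P" multiplies by 100 and uses PercentDecimalSeparator.
    public static string AsPercent(double v, CultureInfo c) => v.ToString("P1", c);

    // "C" uses CurrencyDecimalSeparator (and the culture's CurrencySymbol).
    public static string AsCurrency(double v, CultureInfo c) => v.ToString("C2", c);
}
```

With this culture, 1.5 formats as 1,50 with "N2", 0.25 as 25,0 % with "P1", and 1.5 as ¤1-50 with "C2" (¤ is the invariant culture's generic currency symbol).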

Some people believe that an integer converted to a string consists only of digits. But those digits can be split into groups. The group sizes are controlled by the NumberGroupSizes property, and the separator by NumberGroupSeparator (percentages and currencies have analogous properties, which again do not have to match). Groups can have different sizes: for example, in many cultures (as-IN, bn-BD, gu-IN, hi-IN, etc.) NumberGroupSizes is {3, 2}, so the number 1234567 formatted with "N0" in the as-IN culture looks like "12,34,567". A regular space (\u0020) can be used as the group separator (for example, in af-ZA and lt-LT), but when you see one, do not rush to add yet another hack to your parsing and formatting code: more often a non-breaking space (\u00A0) is used instead of the usual space (as in our native ru-RU).
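A sketch of variable group sizes, again on a clone of InvariantCulture so the result does not depend on installed culture data (GroupingDemo is a hypothetical name). Grouping only appears with a format such as "N0"; a plain ToString() on an integer never inserts group separators.

```csharp
using System.Globalization;

// Hypothetical demo of NumberGroupSizes = { 3, 2 }: the group nearest the decimal
// point has 3 digits, every further group has 2.
public static class GroupingDemo
{
    public static CultureInfo MakeIndianStyle()
    {
        var c = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        c.NumberFormat.NumberGroupSizes = new[] { 3, 2 };
        return c;
    }

    // Grouping only appears with formats such as "N"; "N0" drops the fractional part.
    public static string Format(long value, CultureInfo c) => value.ToString("N0", c);

    // AllowThousands accepts group separators; group sizes are not validated.
    public static long Parse(string text, CultureInfo c) =>
        long.Parse(text, NumberStyles.AllowThousands, c);
}
```

Parsing is more lenient than formatting: with NumberStyles.AllowThousands, "12,34,567" parses back to 1234567 even though it does not follow the invariant culture's groups of three.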

The signs for negative and positive numbers are also part of the culture: NegativeSign and PositiveSign. Fortunately, in all the cultures available at the time of writing they are the ordinary minus and plus, but you should not rely on that: the environment can be overridden, and these properties can be set to anything. More interesting than the signs themselves are the patterns for formatting positive and negative values. For example, the format of a negative number is determined by NumberNegativePattern, which has five possible values:

    0: (n)
    1: -n
    2: - n
    3: n-
    4: n -

For example, in the ti-ET culture (Tigrinya (Ethiopia)), the value -5 formatted with the "N" format specifier appears as (5). With percentages and currencies (PercentNegativePattern, PercentPositivePattern, CurrencyNegativePattern, CurrencyPositivePattern) it is even more fun. For example, CurrencyNegativePattern has as many as sixteen possible values:

    0: ($n)
    1: -$n
    2: $-n
    3: $n-
    4: (n$)
    5: -n$
    6: n-$
    7: n$-
    8: -n $
    9: -$ n
    10: n $-
    11: $ n-
    12: $ -n
    13: n- $
    14: ($ n)
    15: (n $)
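A sketch that tries both pattern properties on a clone of InvariantCulture (NegativePatternDemo is a hypothetical name; CurrencySymbol is set to "$" so the output lines up with the list above). One detail worth knowing: NumberNegativePattern applies to the "N" format specifier, while the default ToString() ("G") only prepends NegativeSign, so FormatPlain keeps printing -5 whatever the pattern.

```csharp
using System.Globalization;

// Hypothetical demo: format negative values under NumberNegativePattern and
// CurrencyNegativePattern, on a culture cloned from InvariantCulture.
public static class NegativePatternDemo
{
    static CultureInfo NewCulture() => (CultureInfo)CultureInfo.InvariantCulture.Clone();

    // "N" honors NumberNegativePattern (valid values 0..4).
    public static string FormatNumber(int pattern)
    {
        var c = NewCulture();
        c.NumberFormat.NumberNegativePattern = pattern;
        return (-5).ToString("N0", c);
    }

    // The default "G" format only prepends NegativeSign and ignores the pattern.
    public static string FormatPlain(int pattern)
    {
        var c = NewCulture();
        c.NumberFormat.NumberNegativePattern = pattern;
        return (-5).ToString(c);
    }

    // "C" honors CurrencyNegativePattern; "$" matches the symbol used in the list above.
    public static string FormatCurrency(int pattern)
    {
        var c = NewCulture();
        c.NumberFormat.CurrencySymbol = "$";
        c.NumberFormat.CurrencyNegativePattern = pattern;
        return (-5m).ToString("C0", c);
    }
}
```

For instance, FormatNumber(0) gives (5), as in the ti-ET example, and FormatCurrency(8) gives -5 $.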

There are also properties for special symbols and special numeric values: PercentSymbol, PerMilleSymbol, NaNSymbol, NegativeInfinitySymbol, PositiveInfinitySymbol. I once saw a real project in which a double was formatted into a string (in the user's current culture, of course) and then compared, as a string, with "-Infinity". Depending on that current culture, NegativeInfinitySymbol can take a wide variety of values:

 '- ', '-- អនន្ត', '(-) முடிவிலி', '-∞', '-Anfeidredd', '-Anfin', '-begalybė', '-beskonačnost', 'Éigríoch dhiúltach', '-ifedh', '-INF', '-Infini', '-infinit', '-Infinit', '-Infinito', '-Infinitu', '-infinity', 'Infinity-', '-Infinity', 'miinuslõpmatus', 'mínusz végtelen', '-nekonečno', '-neskončnost', '-nieskończoność', '-njekónčne', '-njeskóńcnje', '-onendlech', '-Sonsuz', '-tükeniksizlik', '-unendlich', '-Unendlich', '-Άπειρο', '-', ' ', '-უსასრულობა', 'אינסוף שלילי', '-لا نهاية', 'منهای بی نهایت', 'مەنپىي چەكسىزلىك', '-අනන්තය', 'ᠰᠦᠬᠡᠷᠬᠦ ᠬᠢᠵᠠᠭᠠᠷᠭᠦᠢ ᠶᠡᠬᠡ', 'མོ་གྲངས་ཚད་མེད་ཆུང་བ།', 'ߘߊ߲߬ߒߕߊ߲߫-', 'ꀄꊭꌐꀋꉆ', '負無窮大', '负无穷大' 
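A sketch contrasting the string comparison from that project with a check on the value itself (InfinityCheck is a hypothetical name). double.IsNegativeInfinity, double.IsPositiveInfinity and double.IsNaN do not depend on any culture.

```csharp
// Hypothetical helpers contrasting a culture-dependent check with a value check.
public static class InfinityCheck
{
    // Fragile: the result depends on CurrentCulture's NegativeInfinitySymbol.
    public static bool IsMinusInfinityByString(double d) => d.ToString() == "-Infinity";

    // Robust: looks at the value itself, not at its text.
    public static bool IsMinusInfinity(double d) => double.IsNegativeInfinity(d);
}
```

Under any culture whose NegativeInfinitySymbol is, say, "-∞", the first method silently returns false for negative infinity; the second one does not care.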

We have looked at examples of various useful properties. Now let's play around a bit: we will slightly modify the Russian culture so that the new version spoils our life in the example from the beginning of the post:

    using System;
    using System.Globalization;
    using System.Threading;

    var myCulture = (CultureInfo)new CultureInfo("ru-RU").Clone();
    myCulture.NumberFormat.NegativeSign = "!";
    myCulture.NumberFormat.PositiveSign = "-";
    myCulture.NumberFormat.PositiveInfinitySymbol = "+Inf";
    myCulture.NumberFormat.NaNSymbol = "Not a number";
    myCulture.NumberFormat.NumberDecimalSeparator = "?";
    Thread.CurrentThread.CurrentCulture = myCulture;

    Console.WriteLine(-42);              // !42
    Console.WriteLine(double.NaN);       // Not a number
    Console.WriteLine(int.Parse("-42")); // 42 ("-" is now the positive sign)
    Console.WriteLine(1.1);              // 1?1

Someone may say to me: "Why consider such examples at all? No programmer will ever write anything like that!" And I will answer: "Well, hardly anyone will, of course." Things get sad when you distribute a library and one of its users decides to play with the culture. Maybe he just likes to have fun, or maybe he is writing an application for some exotic culture (say, a dead or fictional language). It does not matter. What matters is that your library starts behaving strangely in an environment unfamiliar to it. So do not assume that NegativeSign and PositiveSign never change. It is better to simply specify the culture you need and live happily.
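For library code, specifying the culture also covers string interpolation, which is easy to forget: an interpolated string formats its values with CultureInfo.CurrentCulture. A minimal sketch, assuming .NET Core 3.0 or later (LibraryFormatting is a hypothetical name), that uses FormattableString.Invariant to pin interpolation to InvariantCulture:

```csharp
using System;
using System.Globalization;

// Hypothetical library helpers: what the caller's CurrentCulture can and cannot reach.
public static class LibraryFormatting
{
    // Culture-sensitive: interpolation formats {value} with CultureInfo.CurrentCulture.
    public static string DescribeCurrent(double value) => $"value={value}";

    // Culture-insensitive: FormattableString.Invariant formats with InvariantCulture.
    public static string DescribeInvariant(double value) => FormattableString.Invariant($"value={value}");

    // Parsing: always pass an explicit IFormatProvider.
    public static double ParseInvariant(string text) =>
        double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
}
```

If a caller sets CurrentCulture to a culture like the modified one above (NegativeSign "!", NumberDecimalSeparator "?"), DescribeCurrent(-1.5) produces value=!1?5, while DescribeInvariant(-1.5) still produces value=-1.5.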

I also recommend that everyone read Jon Skeet's recent post The BobbyTables culture. In short: Jon Skeet rails at people who do not pass values to SQL queries as parameters, even when those values are numbers and dates. Then Jon takes a couple of queries

    "SELECT * FROM Foo WHERE BarDate > '" + DateTime.Today + "'"
    "SELECT * FROM Foo WHERE BarValue = " + (-10)

and defines a wonderful culture:

    CultureInfo bobby = (CultureInfo) CultureInfo.InvariantCulture.Clone();
    bobby.DateTimeFormat.ShortDatePattern = @"yyyy-MM-dd'' OR ' '=''";
    bobby.DateTimeFormat.LongTimePattern = "";
    bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";

With a flick of the wrist, the queries turn into:

    SELECT * FROM Foo WHERE BarDate > '2014-08-08' OR ' '=' '
    SELECT * FROM Foo WHERE BarValue = 1 OR 1=1 OR 1=10

Well, I think no further explanation is needed.
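To see the numeric half of the trick without a database, here is a sketch (assuming .NET Core 3.0 or later; BobbyDemo and its members are hypothetical names) that rebuilds only the NegativeSign override from the culture above and concatenates a formatted number into SQL text. It demonstrates the problem, not the fix: the fix Jon Skeet argues for is passing values as parameters (for example, through ADO.NET's DbCommand.Parameters), which would need a database provider and is not shown here.

```csharp
using System.Globalization;

// Hypothetical demo of the numeric half of the "BobbyTables" culture.
public static class BobbyDemo
{
    public static CultureInfo MakeBobby()
    {
        var bobby = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
        return bobby;
    }

    // DON'T do this in real code: the formatted value becomes part of the SQL text.
    public static string BuildQuery(int value, CultureInfo culture) =>
        "SELECT * FROM Foo WHERE BarValue = " + value.ToString(culture);
}
```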

Date and time


Dates and times are especially tricky. For dates we have the DateTimeFormatInfo class (the CultureInfo.DateTimeFormat property), which has a Calendar. A culture has a main calendar (CultureInfo.Calendar) and a list of calendars available for use (CultureInfo.OptionalCalendars). There is a large set of standard calendars: ChineseLunisolarCalendar, EastAsianLunisolarCalendar, GregorianCalendar, HebrewCalendar, HijriCalendar, JapaneseCalendar, JapaneseLunisolarCalendar, JulianCalendar, KoreanCalendar, KoreanLunisolarCalendar, PersianCalendar, TaiwanCalendar, TaiwanLunisolarCalendar, ThaiBuddhistCalendar and UmAlQuraCalendar. Their logic, let me tell you, varies enormously. We will not go into detail here: there is plenty of information on the topic on the Internet, and it could fill a whole series of posts. The rules for formatting dates and times are even more fun than those for numbers: lots of patterns for different date formatting options, native names of months and days of the week, AM/PM designators, separators, and so on. For example, December 31, 2014 can be represented (dateTime.ToString("d")) in the following formats:

 09/03/36 10/3/1436 12/31/2014 1436/3/10 2014.12.31. 2014/12/31 2014-12-31 31. 12. 2014 31.12.14 31.12.14 ý. 31.12.2014 31.12.2014 . 31.12.2014. 31/12/14 31/12/2014 31/12/2557 31-12-14 31-12-2014 31- 14 31--14 

And these are only the default values (without optional calendars). But even here you can see the variety of chronologies: for some it is the year 1436, for others 2557 (this is why the penultimate line of the example at the beginning of the article can print False). If you work with dates, think about whether they should always be shown in the same format or adapt to the user and display the date in a more familiar form. As for parsing dates, I'd rather not even start.
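A minimal sketch of both points from this section, assuming .NET Core 3.0 or later (DateDemo and its members are hypothetical names). The first pair of methods computes the year of December 31, 2014 with two calendar classes; the Thai Buddhist calendar counts 2557, which is where 31/12/2557 in the list comes from. The second pair shows one common way to sidestep culture-dependent parsing for dates that are stored or exchanged: a fixed format string plus InvariantCulture on both the writing and the reading side. How to show dates to users remains a separate decision.

```csharp
using System;
using System.Globalization;

// Hypothetical demo: calendars, and a culture-independent date round trip.
public static class DateDemo
{
    static readonly DateTime NewYearsEve2014 = new DateTime(2014, 12, 31);

    // The same day, counted by two calendar classes (no culture involved).
    public static int GregorianYear() => new GregorianCalendar().GetYear(NewYearsEve2014);
    public static int ThaiBuddhistYear() => new ThaiBuddhistCalendar().GetYear(NewYearsEve2014);

    // Storage/exchange: one fixed pattern and InvariantCulture on both sides.
    const string WireFormat = "yyyy-MM-dd";

    public static string Serialize(DateTime d) => d.ToString(WireFormat, CultureInfo.InvariantCulture);

    public static DateTime Deserialize(string text) =>
        DateTime.ParseExact(text, WireFormat, CultureInfo.InvariantCulture, DateTimeStyles.None);
}
```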

The Turkey Test



There is a classic 2008 post called Does Your Code Pass The Turkey Test?. I will not retell it in detail; it is better to read the original yourself. In short, the Turkey Test is: switch the current culture to tr-TR (Turkish (Turkey)) and run your application. Does everything still work? This culture has plenty of surprises with dates, numbers and strings. Going back to our first example: in this culture "i".ToUpper() is not equal to "I", and "I".ToLower() is not equal to "i" (if you want to learn more about upper and lower case, I highly recommend this post and this SO answer about UTF-8; it is simply great). At the end of the post there is a wonderful example in which a string of Arabic-Indic digits, "٤٦٠٣٨", matches the regular expression \d{5}.
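A sketch of two of these traps, assuming .NET Core 3.0 or later (TurkeyTestDemo and its members are hypothetical names). Case conversion that must not vary by culture, such as for identifiers or keywords, can use ToUpperInvariant or an OrdinalIgnoreCase comparison; with tr-TR culture data available, UpperWith("i", new CultureInfo("tr-TR")) returns "İ" (U+0130) instead of "I". For regular expressions, .NET's \d matches any Unicode decimal digit, and RegexOptions.ECMAScript restricts it to [0-9].

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helpers for two Turkey Test traps: casing and \d.
public static class TurkeyTestDemo
{
    // Culture-sensitive: with tr-TR, "i" becomes "İ" (U+0130), not "I".
    public static string UpperWith(string s, CultureInfo culture) => s.ToUpper(culture);

    // Culture-insensitive casing for identifiers, keywords, protocol tokens, etc.
    public static string UpperInvariant(string s) => s.ToUpperInvariant();

    public static bool SameIdentifier(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // .NET's \d matches any Unicode decimal digit (category Nd).
    public static bool IsFiveDigits(string s) => Regex.IsMatch(s, @"^\d{5}$");

    // RegexOptions.ECMAScript restricts \d to [0-9].
    public static bool IsFiveAsciiDigits(string s) =>
        Regex.IsMatch(s, @"^\d{5}$", RegexOptions.ECMAScript);
}
```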

Instead of a conclusion


The science of regional settings is complex. In this post I by no means claim to give complete information about everything they can affect. There are many more interesting things related to internationalization (I think right-to-left text alone deserves a separate post, and not just one). I just wanted to show some interesting examples of how CultureInfo.CurrentCulture can affect your application. I hope this material will be useful to someone, at least for broadening their general knowledge. The general moral is this: if you don't want to think about the many different cultures in the world, use CultureInfo.InvariantCulture everywhere (or another culture that suits you); in most cases you will be able to sleep peacefully. And if you do want to think about it, it is worth studying this area more thoroughly. A good book that can help: .NET Internationalization: The Developer's Guide to Building Global Windows and Web Applications.

Any additional facts about how CultureInfo can affect the behavior of various functions are welcome. I think many of you have fascinating stories of your own.

Source: https://habr.com/ru/post/237209/

