
Long story about date localization without a year in PHP

Let's start with a simple task: output a localized date consisting of the day, the full month name in the locale's language, and the full year. Nowadays this is really simple. PHP has its own i18n extension, intl, which has been bundled with the core since version 5.3. This extension provides the IntlDateFormatter class, which in turn has several predefined formats. We'll use its LONG format.

    <?php
    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        $formatter = new IntlDateFormatter(
            $locale,
            IntlDateFormatter::LONG,
            IntlDateFormatter::NONE,
            'Europe/Moscow'
        );
        echo $formatter->format(1455111783), PHP_EOL;
    }

Result :

    February 10, 2016
    10 февраля 2016 г.
    10 de febrero de 2016
    ۱۰ فوریهٔ ۲۰۱۶ م.

(Farsi is an RTL language, so the last line reads right to left.)

So far so good. Now let's change the requirements slightly: "output a localized date consisting of the day and the full month name in the locale's language". That is, we don't want to display the year.

The desired result looks like this:
    February 10
    10 февраля
    10 de febrero
    ۱۰ فوریهٔ م.

Actually, now the task is not as simple as it seems.

Honestly, this entire post will be solely about this task.

Wait, wait, but why do you even need it?


Some may wonder why such a format is needed at all. It's pretty obvious if you work with a system that displays a feed of events with timestamps. If not, let's just take a look at Twitter.

twitter date formats

If the tweet was published recently, Twitter shows a correctly formatted time interval. If the tweet was published long ago, you see a properly formatted date. Moreover, if the tweet was published this year, the date is shown without the year. And that's great, it's part of good UX.

The idea is that users quickly understand when it happened. The extra data is noise.
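The "drop the year for this year's dates" rule can be sketched as follows. This is only an illustration: `needsYear` is a hypothetical helper name, and both arguments are assumed to be already converted to the viewer's time zone.

```php
<?php
// Hypothetical helper: does this date need its year displayed?
// Both objects are expected to be in the viewer's time zone already.
function needsYear(DateTimeInterface $date, DateTimeInterface $now): bool
{
    return $date->format('Y') !== $now->format('Y');
}

$tz = new DateTimeZone('Europe/Moscow');
$now = new DateTimeImmutable('2016-03-01 12:00:00', $tz);
$thisYear = (new DateTimeImmutable('@1455111783'))->setTimezone($tz); // February 10, 2016
$lastYear = new DateTimeImmutable('2015-12-31 23:59:59', $tz);

var_dump(needsYear($thisYear, $now)); // bool(false)
var_dump(needsYear($lastYear, $now)); // bool(true)
```

Deciding whether to show the year is the easy part; the rest of the story is about how to format the date once the year has to go.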

Now you know why. Let's return to our task.

OK, then there must be a separate format for this, like LONG , but without a year, right?


Not quite. IntlDateFormatter has only four standard formats: FULL, LONG, MEDIUM and SHORT, and each of them includes the year.

    <?php
    $formats = [
        IntlDateFormatter::FULL,
        IntlDateFormatter::LONG,
        IntlDateFormatter::MEDIUM,
        IntlDateFormatter::SHORT,
    ];

    foreach ($formats as $format) {
        $formatter = new IntlDateFormatter(
            'en_US',
            $format,
            IntlDateFormatter::NONE,
            'Europe/Moscow'
        );
        echo $formatter->format(1455111783), PHP_EOL;
    }

Result :

    Wednesday, February 10, 2016
    February 10, 2016
    Feb 10, 2016
    2/10/16

If you think about it a little, it becomes clear that it is impossible to define constants in advance for every custom format a developer, designer, or manager might ever come up with.

Ha, I just came up with a simple solution!


Really? Let me guess: you just want to cut the year out of what you get after formatting with LONG, right? I'm willing to bet that's it. Without examples it can be hard to see what the problem is, so let's just take a look.

As a reminder, here is what we have.

    February 10, 2016
    10 февраля 2016 г.
    10 de febrero de 2016
    ۱۰ فوریهٔ ۲۰۱۶ م.

Cut the year.

    February 10,
    10 февраля г.
    10 de febrero de
    ۱۰ فوریهٔ م.

Do you see all those leftover artifacts, such as the ",", the "г.", the "de", and surely something else in the last line that I can't even identify?

So no, this is not even close to a solution. By the way, there is nothing shameful about it: it was my first "quick fix" too.
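For the record, the naive fix might look like the sketch below. The input strings are the LONG-format outputs shown earlier, hard-coded so the example does not depend on the installed ICU version; `stripYearNaively` is a hypothetical helper, not part of any library.

```php
<?php
// LONG-format outputs copied from the article (ICU 56.1), not regenerated here.
$formatted = [
    'en_US' => 'February 10, 2016',
    'ru_RU' => '10 февраля 2016 г.',
    'es_ES' => '10 de febrero de 2016',
];

// The naive "quick fix": delete the year digits and trim the edges.
function stripYearNaively(string $date, string $year): string
{
    return trim(str_replace($year, '', $date));
}

foreach ($formatted as $locale => $date) {
    echo $locale, ': ', stripYearNaively($date, '2016'), PHP_EOL;
}
// en_US: February 10,
// ru_RU: 10 февраля  г.
// es_ES: 10 de febrero de
```

Note that for fa_IR the year is written with Persian digits, so a search for "2016" would not even find it.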

Well, there are patterns, let's use patterns!


Yes, IntlDateFormatter actually works only with patterns internally (the format constants are simply converted to the corresponding pattern), and you can specify your own when creating a formatter.

A pattern consists of predefined sequences of letters.

Let's see... It looks like we need the "d MMMM" pattern.

    <?php
    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        $formatter = new IntlDateFormatter(
            $locale,
            IntlDateFormatter::NONE,
            IntlDateFormatter::NONE,
            'Europe/Moscow',
            null,
            "d MMMM"
        );
        echo $formatter->format(1455111783), PHP_EOL;
    }

Result :

    10 February
    10 февраля
    10 febrero
    ۱۰ فوریهٔ

Looks great! But wait, what's this? Oh...

As a reminder, we want:

    February 10
    10 февраля
    10 de febrero
    ۱۰ فوریهٔ م.

No, folks, that's just one match out of four. That's because with the pattern you specified not only the parts of the date (day and month) but also their order and the separators. It's as if you said: "the day number must come first, then the full month name, with a space between them", for every locale. That is complete nonsense.

The truth is that a locale is not only a language, it is also a date formatting pattern. And a pattern defines not only what to include in the result but also in what order to arrange it.

    <?php
    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        $formatter = new IntlDateFormatter(
            $locale,
            IntlDateFormatter::LONG,
            IntlDateFormatter::NONE,
            'Europe/Moscow'
        );
        echo $formatter->getPattern(), PHP_EOL;
    }

Result :

    MMMM d, y
    d MMMM y 'г'.
    d 'de' MMMM 'de' y
    d MMMM y G

See, they are all different.

But there must be a ready solution! It can not be that it is not! Or...?


*sigh* Yes, I thought the same. How is it even possible that PHP, a mature language with a developed ecosystem and one of the strongest communities, lacks such basic functionality? Sad but true: it is possible.

At least one very important thing is missing from intl: the DateTimePatternGenerator class from ICU. It exists precisely to solve our little puzzle and all similar ones.

Wait, wait, wait, what's the ICU?


ICU stands for "International Components for Unicode".

Quotes from the ICU website.

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

...

Formatting: Format numbers, dates, times and currency amounts according to the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc. This data also comes from the Common Locale Data Repository.

In short, it's a really cool set of libraries. The intl extension itself does no magic; it is essentially a proxy to these libraries.

    $ php -i | grep intl -A5
    intl
    Internationalization support => enabled
    version => 1.1.0
    ICU version => 56.1
    ICU Data version => 56.1

To use IntlDateFormatter, ICU must be installed on your system (or PHP must be built with ICU in the first place). Different versions of ICU produce different formatting results.

    $ dpkg -S icu
    libicu52:amd64: /usr/lib/x86_64-linux-gnu/libicule.so.52.1
    libicu52:amd64: /usr/lib/x86_64-linux-gnu/libicule.so.52
    libicu52:amd64: /usr/lib/x86_64-linux-gnu/libicutest.so.52
    ...

(The system has version 52.1 installed, while PHP, as seen above, was built against 56.1. This is normal.)
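Because the output depends on the ICU version, it can be handy to check that version from PHP itself. A minimal sketch, relying on the `INTL_ICU_VERSION` and `INTL_ICU_DATA_VERSION` constants that the intl extension defines; the `$icuMajor` variable is only an illustration:

```php
<?php
// Print the ICU version that this PHP build is actually using.
if (extension_loaded('intl')) {
    echo 'ICU version:      ', INTL_ICU_VERSION, PHP_EOL;
    echo 'ICU data version: ', INTL_ICU_DATA_VERSION, PHP_EOL;
    // "56.1" becomes 56; useful for guarding version-specific expectations.
    $icuMajor = (int) INTL_ICU_VERSION;
} else {
    echo 'The intl extension is not loaded', PHP_EOL;
    $icuMajor = null;
}
```

Logging this value next to formatted dates makes it easier to explain why two servers render the same timestamp differently.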

Got it. You mentioned some DateTimePatternGenerator, tell me more


Exactly. DateTimePatternGenerator is, in my opinion, the most magical part of ICU when it comes to formatting dates and times.

Another quote from the ICU website:

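This class provides flexible generation of date format patterns, like "yy-MM-dd". The user can build up the generator by adding successive patterns. Once that is done, a query can be made using a "skeleton", which is a pattern which just includes the desired fields and lengths. The generator will return the "best fit" pattern corresponding to that skeleton.

The main method people will use is getBestPattern(String skeleton), since normally this class is pre-built with data from a particular locale. However, generators can be built directly from other data as well.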

This is exactly what we need! We feed a so-called "skeleton" (the parts of the date that the result should include) to the getBestPattern method, and it returns the most appropriate pattern. We already know what to do with a pattern: pass it to IntlDateFormatter, and that's it!

Here is how it could work.

    $skeleton = "MMMMd";

    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        $pgen = new IntlDateTimePatternGenerator($locale);
        $pattern = $pgen->getBestPattern($skeleton);

        $formatter = new IntlDateFormatter(
            $locale,
            IntlDateFormatter::NONE,
            IntlDateFormatter::NONE,
            'Europe/Moscow',
            null,
            $pattern
        );
        echo $formatter->format(1455111783), PHP_EOL;
    }

Result (this is probably what it would be):

    February 10
    10 февраля
    10 de febrero
    ۱۰ فوریهٔ م.

Woo-hoo! Okay, okay, I simply copied the "desired result" block. In truth, I don't know what the output would be, but I hope it would look like this.
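For readers on newer PHP: intl gained an IntlDatePatternGenerator class in PHP 8.1, with a getBestPattern method. The sketch below assumes that class is available and degrades to null otherwise; `formatDayMonth` is a hypothetical wrapper name, and no output is listed because it depends on the ICU version.

```php
<?php
// Returns the day-and-month string for a locale, or null when the needed
// classes are unavailable (intl missing, or PHP older than 8.1).
function formatDayMonth(string $locale, int $timestamp, string $timezone): ?string
{
    if (!class_exists('IntlDatePatternGenerator') || !class_exists('IntlDateFormatter')) {
        return null;
    }

    // Ask ICU for the locale's best pattern for "full month name + day".
    $generator = new IntlDatePatternGenerator($locale);
    $pattern = $generator->getBestPattern('MMMMd');
    if ($pattern === false) {
        return null;
    }

    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::NONE,
        IntlDateFormatter::NONE,
        $timezone,
        null,
        $pattern
    );
    $result = $formatter->format($timestamp);

    return $result === false ? null : $result;
}

foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
    echo $locale, ': ', formatDayMonth($locale, 1455111783, 'Europe/Moscow') ?? 'n/a', PHP_EOL;
}
```

On older PHP versions the function simply returns null, which is the situation the rest of this article deals with.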

So what if I want to use the custom format right now?


In the end I arrived at the second obvious solution: generate a config containing every custom pattern for every locale your project supports, and repeat this whenever a new locale is added.

Here is a simple snippet.

    <?php
    // ...
    foreach ($locales as $locale) {
        $pattern = <<<CONFIG
        '%s' => [
            'medium_no_year' => "%s", // %s
            'long_no_year' => "%s", // %s
        ],

    CONFIG;

        $mediumF = new IntlDateFormatter($locale, IntlDateFormatter::MEDIUM, IntlDateFormatter::NONE);
        $longF = new IntlDateFormatter($locale, IntlDateFormatter::LONG, IntlDateFormatter::NONE);

        printf(
            $pattern,
            $locale,
            $mediumF->getPattern(),
            $mediumF->format(1455111783),
            $longF->getPattern(),
            $longF->format(1455111783)
        );
    }

In the end, you will have something like

    'en_US' => [
        'medium_no_year' => "MMM d, y", // Feb 10, 2016
        'long_no_year' => "MMMM d, y", // February 10, 2016
    ],
    'ru_RU' => [
        'medium_no_year' => "d MMM y 'г'.", // 10 февр. 2016 г.
        'long_no_year' => "d MMMM y 'г'.", // 10 февраля 2016 г.
    ],
    'es_ES' => [
        'medium_no_year' => "d MMM y", // 10 feb. 2016
        'long_no_year' => "d 'de' MMMM 'de' y", // 10 de febrero de 2016
    ],
    'fa_IR' => [
        'medium_no_year' => "d MMM y G", // ۱۰ فوریهٔ ۲۰۱۶ م.
        'long_no_year' => "d MMMM y G", // ۱۰ فوریهٔ ۲۰۱۶ م.
    ],

Then you need to manually walk through all these lines and remove from them the part responsible for displaying the year.

    'en_US' => [
        'medium_no_year' => "MMM d", // Feb 10
        'long_no_year' => "MMMM d", // February 10
    ],
    'ru_RU' => [
        'medium_no_year' => "d MMM", // 10 февр.
        'long_no_year' => "d MMMM", // 10 февраля
    ],
    'es_ES' => [
        'medium_no_year' => "d MMM", // 10 feb.
        'long_no_year' => "d 'de' MMMM", // 10 de febrero
    ],
    'fa_IR' => [
        'medium_no_year' => "d MMM", // ۱۰ فوریهٔ م.
        'long_no_year' => "d MMMM", // ۱۰ فوریهٔ م.
    ],

When you have dozens of locales, this becomes exhausting work, JUST TRUST ME (quiet sobbing is heard).
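Once such a config exists, using it could look roughly like this. It is a sketch under assumptions: the array holds only two of the hand-edited entries, `findPattern` and `formatWithConfig` are hypothetical helper names, and what to do when a locale is missing is left to the caller.

```php
<?php
// Hand-edited patterns, as in the config above (only two locales shown).
$patterns = [
    'en_US' => ['long_no_year' => 'MMMM d'],
    'es_ES' => ['long_no_year' => "d 'de' MMMM"],
];

// Look up the pattern for a locale; null means "not configured yet".
function findPattern(array $patterns, string $locale, string $key): ?string
{
    return $patterns[$locale][$key] ?? null;
}

// Format a timestamp with the configured pattern, or return null so that
// the caller can fall back, for example to the standard LONG format.
function formatWithConfig(array $patterns, string $locale, string $key, int $timestamp): ?string
{
    $pattern = findPattern($patterns, $locale, $key);
    if ($pattern === null || !class_exists('IntlDateFormatter')) {
        return null;
    }

    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::NONE,
        IntlDateFormatter::NONE,
        'Europe/Moscow',
        null,
        $pattern
    );
    $result = $formatter->format($timestamp);

    return $result === false ? null : $result;
}

echo formatWithConfig($patterns, 'es_ES', 'long_no_year', 1455111783) ?? 'n/a', PHP_EOL;
```

Returning null for an unknown locale makes the gap visible instead of silently applying another locale's field order.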

And that's not all. If you want to display such a date together with the time, you will have to maintain twice as many patterns, because you can't simply glue together a separately formatted date and time. Should the time go at the end of the date, at the beginning, or somewhere in the middle?

    <?php
    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        $pattern = <<<CONFIG
        '%s' => [
            'medium_no_year-short' => "%s", // %s
            'long_no_year-short' => "%s", // %s
        ],

    CONFIG;

        $mediumF = new IntlDateFormatter($locale, IntlDateFormatter::MEDIUM, IntlDateFormatter::SHORT);
        $longF = new IntlDateFormatter($locale, IntlDateFormatter::LONG, IntlDateFormatter::SHORT);

        printf(
            $pattern,
            $locale,
            $mediumF->getPattern(),
            $mediumF->format(1455111783),
            $longF->getPattern(),
            $longF->format(1455111783)
        );
    }

Result :

    'en_US' => [
        'medium_no_year-short' => "MMM d, y, h:mm a", // Feb 10, 2016, 2:43 PM
        'long_no_year-short' => "MMMM d, y 'at' h:mm a", // February 10, 2016 at 2:43 PM
    ],
    'ru_RU' => [
        'medium_no_year-short' => "d MMM y 'г'., H:mm", // 10 февр. 2016 г., 14:43
        'long_no_year-short' => "d MMMM y 'г'., H:mm", // 10 февраля 2016 г., 14:43
    ],
    'es_ES' => [
        'medium_no_year-short' => "d MMM y H:mm", // 10 feb. 2016 14:43
        'long_no_year-short' => "d 'de' MMMM 'de' y, H:mm", // 10 de febrero de 2016, 14:43
    ],
    'fa_IR' => [
        'medium_no_year-short' => "d MMM y G،‏ H:mm", // ۱۰ فوریهٔ ۲۰۱۶ م.،‏ ۱۴:۴۳
        'long_no_year-short' => "d MMMM y G، ساعت H:mm", // ۱۰ فوریهٔ ۲۰۱۶ م.، ساعت ۱۴:۴۳
    ],

Remove the part responsible for the year, again, yes.

    'en_US' => [
        'medium_no_year-short' => "MMM d, h:mm a", // Feb 10, 2:43 PM
        'long_no_year-short' => "MMMM d, 'at' h:mm a", // February 10, at 2:43 PM
    ],
    'ru_RU' => [
        'medium_no_year-short' => "d MMM, H:mm", // 10 февр., 14:43
        'long_no_year-short' => "d MMMM, H:mm", // 10 февраля, 14:43
    ],
    'es_ES' => [
        'medium_no_year-short' => "d MMM H:mm", // 10 feb. 14:43
        'long_no_year-short' => "d 'de' MMMM, H:mm", // 10 de febrero, 14:43
    ],
    'fa_IR' => [
        'medium_no_year-short' => "d MMM،‏ H:mm", // ۱۰ فوریهٔ،‏ ۱۴:۴۳
        'long_no_year-short' => "d MMMM، ساعت H:mm", // ۱۰ فوریهٔ، ساعت ۱۴:۴۳
    ],

Just look at it: even the comma in Farsi is not our ordinary comma but ،. You simply cannot guess how to assemble the result from a separately formatted date and time.

But how does everyone else in the PHP world do this?


It's hard to believe, but they either don't bother with date localization at all, or do it wrong.

I looked into the source of several CMS, made in PHP.

I didn't look at frameworks, because I don't think this is a framework's job. Although there might be some very basic support... Well, maybe. Actually, I did check: I looked at Yii2, and they simply recommend using plain intl. So let's look at CMSs instead.

Drupal


On the very first search I came across an amazing ticket with an intriguing title: "Date intl support is broken, remove it". Say what?! And it's not a joke, they really did it.

intl was removed

In general, they solve the date formatting problem in roughly the same way (a pack of custom configs), but before the patch (see the screenshot) there was an intl key just for the localized pattern. And now they simply don't care.

Also, if I understood correctly (I'm not an active Drupal user), after installing the CMS each user has to enter all these patterns by hand, for each locale. Maybe there is something I don't know about, but that's how it looks.

This is how it is done in 8.1.

date formats 8.1

But in version 9.x

date formats 9.x

(It seems that they have not had time to cut intl keys from this branch)

In general this is not so bad, but as a CMS user I don't want to study every culture to find out which date format it uses. All that work has already been done in the CLDR. Of course, sometimes I need custom patterns, but all I'm willing to do is specify which parts of the date and time I want to see in the result ("only day and month, please" or "month and time without seconds, please").

WordPress


My relationship with WordPress is about the same: I'm not an active user, so I used GitHub search. It seems the main function of interest here is date_i18n from functions.php (by the way, guys, wtf? A functions file with 5.2k lines of code? Seriously? It's 2016 already).

wordpress date_i18n

I honestly spent half an hour trying to understand how it works. But ... but ... yes, you just look at it.

wordpress date_i18n contents

Holy shit... In short, this definitely does not look like correctly implemented date localization, if only because of date_format. It seems they try to localize the names of months and days of the week, nothing more. Experts, correct me if I'm wrong.

Joomla!


I have never touched Joomla at all, so the approach is the same.

The result: almost the same as in Drupal, a pack of predefined formats that need to be set for each locale.

en-GB from default installation

en-GB Joomla config

See these letters? They tell us that Joomla uses date_format (like many others) and does not use intl. Is that justified? In my opinion, no. intl has been bundled with the core since version 5.3, PHP 7 is already out, and 5.5 is the minimum requirement for most modern code. I completely understand when frameworks avoid third-party extensions or features from the latest language versions because they believe their code should run even on a toaster, and they already have many users running toasters with old versions. But this is clearly not such a case. Call it "legacy" or "technical debt" or something else, and let's move on.

ModX Revolution


transport.core.system_settings.php



get.class.php



modifier.date_format.php



strftime sounds a little better than date_format (its formats give a false sense of correct localization) or than nothing at all when it comes to localization, but this function does not do what we need either.

Magento2


Magento is not a CMS in the generally accepted sense, it is an e-commerce platform, but it is widely known, and it even has a separate framework of its own. So why not.

And here I must say that this is the only code base in my review where date localization is done almost correctly! It is the only code where I found IntlDateFormatter in use; moreover, it serves as the basis of the formatting component.



But their code is not perfect. We look in Timezone.php .



I could not find a place where they format a date without the year. But I have a feeling they make the same mistake we made a little earlier, trying to replace the year when formatting "dates with a long year". Or not? I'm no regex guru (although I am familiar with lookahead and lookbehind), so I'd better just run the code from getDateFormatWithLongYear and see what happens.

    <?php
    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        $dateFormat = (new \IntlDateFormatter(
            $locale,
            \IntlDateFormatter::SHORT,
            \IntlDateFormatter::NONE
        ))->getPattern();

        $formatWithLongYear = preg_replace(
            '/(?<!y)yy(?!y)/',
            'Y',
            $dateFormat
        );

        $formatter = new \IntlDateFormatter(
            $locale,
            \IntlDateFormatter::NONE,
            \IntlDateFormatter::NONE,
            null,
            null,
            $formatWithLongYear
        );
        echo $formatter->format(1455111783), PHP_EOL;
    }

Result :

    2/10/2016
    10.02.2016
    10/2/2016
    ۲۰۱۶/۲/۱۰ م.

Looks like success! Well, sometimes this trick works. What matters more is that they had to resort to such contortions at all. This is yet another confirmation that PHP lacks that very pattern generator.
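To see what that regular expression does in isolation, here is a small sketch that applies it to a few sample patterns (the patterns are illustrative, and `toLongYear` is a hypothetical wrapper name). One detail worth keeping in mind: in ICU pattern syntax an uppercase "Y" means the week-based year rather than the calendar year, so the result can differ from "y" for dates close to New Year.

```php
<?php
// Same regular expression as in the snippet above: match "yy" that is
// neither preceded nor followed by another "y", and replace it with "Y".
function toLongYear(string $pattern): string
{
    return preg_replace('/(?<!y)yy(?!y)/', 'Y', $pattern);
}

echo toLongYear('M/d/yy'), PHP_EOL;     // M/d/Y
echo toLongYear('dd.MM.y'), PHP_EOL;    // dd.MM.y (a single "y" is left alone)
echo toLongYear('yyyy-MM-dd'), PHP_EOL; // yyyy-MM-dd (four "y" are left alone)
```

So the lookbehind and lookahead only guarantee that exactly two "y" letters are replaced; they say nothing about whether the replacement letter is the right one for every date.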

And getDateTimeFormat is an obvious mistake: it concatenates the date and time patterns. No, dudes, the order of the date and time is not the same in all locales, as we have already seen above.
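The difference is easy to inspect by putting a glued pattern next to the combined one that ICU itself provides. A rough sketch: `comparePatterns` is a hypothetical helper, a single space is assumed as the glue, and no output is listed because the patterns depend on the ICU version.

```php
<?php
// Compare a naive "date pattern + space + time pattern" with the pattern
// ICU builds when asked for date and time together.
function comparePatterns(string $locale): ?array
{
    if (!class_exists('IntlDateFormatter')) {
        return null; // intl is not available
    }

    $date = new IntlDateFormatter($locale, IntlDateFormatter::LONG, IntlDateFormatter::NONE);
    $time = new IntlDateFormatter($locale, IntlDateFormatter::NONE, IntlDateFormatter::SHORT);
    $both = new IntlDateFormatter($locale, IntlDateFormatter::LONG, IntlDateFormatter::SHORT);

    return [
        'concatenated' => $date->getPattern() . ' ' . $time->getPattern(),
        'combined'     => $both->getPattern(),
    ];
}

foreach (['en_US', 'fa_IR'] as $locale) {
    print_r(comparePatterns($locale));
}
```

In the patterns shown earlier, en_US joins the parts with 'at' and fa_IR with "، ساعت", which a fixed glue string cannot reproduce.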

You can also look here . Anyway, great job, Magento!

Do you want to say that you are the first to notice the problem?


Not at all. The HHVM guys ported this generator long ago: https://github.com/facebook/hhvm/commit/bc84daf7816e4cd268da59d535dcadfc6cf01085 . Respect!
There is also a request in the PHP bug tracker: https://bugs.php.net/bug.php?id=70377 "Please add DateTimePatternGenerator to intl". By the way, vote for it if this problem affects you too. Maybe votes there actually mean something. ~By the way, there is no CSRF protection there~.

Anyone can criticize. Why not do it yourself?


Well, after several rather pitiful attempts to draw the community's attention to the problem and to ask some of the developers to add the missing pieces to intl, I did it myself: https://github.com/ksimka/intl_dtpg . It is an extension that essentially implements a single operation (as a function or a class method): "find the most appropriate pattern for this set of date and time parts". It is that same DateTimePatternGenerator::getBestPattern (the class has plenty of other interesting things, but so far this is the only method I critically miss).

But the problem is that I don't really know either C++ or C. So use it at your own risk (once we've rolled it out, we'll report whether everything is fine). The extension is written in C++ on top of PHPCPP using the "examples + Google + Stack Overflow" method, so any improvements are extremely welcome.

In general, there are surely people from the PHP core dev team here: someone please port these missing parts to intl already, or at least pester the PHP developers with requests. You can say that I made you do it, threatening to force you into Go training.

That's all. Thank you for reading to the end!

Source: https://habr.com/ru/post/278673/

