
“Never say never”, or working with time zones correctly

This article is about the problems that await a programmer who works with time zones. In theory everything looks good, simple and understandable, but life is complicated, and in practice completely unexpected situations come up.

TL;DR: Working with time zones is pain and humiliation. Never work with time zones!

Everyone around you says that as soon as you receive a time from a user you must convert it to UTC, work with it only in UTC, and store it strictly in UTC. At first glance the advice looks reasonable, and following it does make life easier... unless your program does complex work with dates. Recording the date and time a user registered on the site? Saving the time a message was sent or an order was created in an online store? Writing a timestamped message to a log? Use UTC and everything will be fine; you do not even need to read any further. Any current moment can be safely converted to UTC and the problem forgotten. But what if we need to work with times in the future? Or in the past? For example, if we are writing a calendar service, or a service for delayed message delivery?

UTC is not a panacea

Let me explain with an example. Suppose we have built that delayed message service. A visitor to our site can create a reminder for any moment (in the future, of course), delivered by email or SMS. The site is extremely simple: the user sets a date and time, enters the reminder text and a delivery channel (email address or phone number); we save this data to the database, then periodically select the reminders that are due and send the messages. That's it: profit and the respect of grateful users!

No, that is not it. Following the advice to keep everything in UTC, we convert the date and time received from the user to UTC and store the result. Suppose a user from Moscow visits our site on March 2, 2014 and creates a reminder for 09:00 on November 3, 2014. We store the value “2014-11-03 05:00:00”, because on March 2, 2014 the known offset of the Europe/Moscow time zone for November 3, 2014 was UTC+4.

Do you understand what I'm getting at?

Yes: on July 21, 2014 the State Duma of the Russian Federation adopted a bill abolishing summer time. Under this law, from October 26, 2014 the offset of the Europe/Moscow time zone became UTC+3 instead of UTC+4 (daylight saving time was also canceled, but that is beside the point here). So if we send the notification on November 3 at 05:00 UTC, the user receives it at 08:00 Moscow time, and I am sure he will be puzzled, because he asked for it at exactly nine in the morning.
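
The arithmetic of this failure can be replayed in a few lines. This is only a sketch: the two fixed offsets below are stand-ins (an assumption made for illustration) for the Europe/Moscow rules before and after the law, whereas a real service would look the offsets up in a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the Europe/Moscow rules
# before and after the 2014 law.
MSK_OLD_RULES = timezone(timedelta(hours=4))  # what was known in March 2014
MSK_NEW_RULES = timezone(timedelta(hours=3))  # in force since October 26, 2014

requested_local = datetime(2014, 11, 3, 9, 0)  # 09:00, as entered by the user

# March 2014: convert to UTC with the rules known at the time and store it.
stored_utc = requested_local.replace(tzinfo=MSK_OLD_RULES).astimezone(timezone.utc)

# November 2014: the stored instant as seen on a Moscow clock under the new rules.
delivered_local = stored_utc.astimezone(MSK_NEW_RULES)

print(stored_utc.strftime("%Y-%m-%d %H:%M"))       # 2014-11-03 05:00
print(delivered_local.strftime("%Y-%m-%d %H:%M"))  # 2014-11-03 08:00
```

The stored UTC value never changed; what changed is the mapping between it and the wall clock the user cares about.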

The conclusion is simple: storing time in UTC is fine, but only for events in the present and the recent past, that is, for dates whose time zone rules will no longer change. Storing future dates in UTC is dangerous, because nobody knows which laws which governments will pass, or what will happen to time zones in ten years, five years, or even one.

On the other hand, if you store only the user's local time and time zone in the database, working with that data becomes almost impossible. Let us return to the notification service, where two users have each created a notification. The first user, from Moscow, asked us to send him an SMS on December 15, 2014 at 3:00 PM (we store his local time “2014-12-15 15:00:00” and his time zone “Europe/Moscow”). The second user, from New York, asked us to send him an email on December 15, 2014 at 7:00 PM (we store his local time “2014-12-15 19:00:00” and his time zone “America/New_York”). So far so good: we have recorded the local time at which each user wants to receive his notification, and he will receive it at exactly that time even if the government of either country changes the time zone (offset, daylight saving time, anything).

The problems begin when you write the script that selects the notifications to send. If all dates were stored in UTC, everything would be simple: every minute we select the messages that are due:
SELECT * FROM reminders WHERE remind_time < NOW(); 

This assumes that “SELECT NOW();” returns the time in UTC. But we stored the user's local time and time zone, so what do we do? Suffer :-) A local time cannot be compared with “NOW()” in UTC directly: Moscow is 3 hours ahead of UTC (by the time the UTC clock shows the stored value, the message is already late), while New York is 5 hours behind (when the UTC clock shows the stored value, it is still too early to send).
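
To make the mismatch concrete, here is a small sketch of both checks. The `OFFSETS` table with fixed December 2014 offsets and the helper names `naive_is_due` and `is_due` are assumptions made for illustration; they are not part of any real service.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed December 2014 offsets instead of real tz database lookups.
OFFSETS = {
    "Europe/Moscow": timezone(timedelta(hours=3)),
    "America/New_York": timezone(timedelta(hours=-5)),
}

def naive_is_due(local_time, now_utc):
    """Broken check: compares wall-clock time with UTC as if they were the same."""
    return local_time <= now_utc.replace(tzinfo=None)

def is_due(local_time, tz_name, now_utc):
    """Correct check: every row needs its own conversion before the comparison."""
    return local_time.replace(tzinfo=OFFSETS[tz_name]) <= now_utc

moscow = datetime(2014, 12, 15, 15, 0)    # really due at 12:00 UTC
new_york = datetime(2014, 12, 15, 19, 0)  # really due at 00:00 UTC on December 16

at_13_utc = datetime(2014, 12, 15, 13, 0, tzinfo=timezone.utc)
at_20_utc = datetime(2014, 12, 15, 20, 0, tzinfo=timezone.utc)

# Moscow: already due at 13:00 UTC, but the naive check still says "not yet".
moscow_late = is_due(moscow, "Europe/Moscow", at_13_utc) and not naive_is_due(moscow, at_13_utc)
# New York: the naive check fires at 20:00 UTC, four hours too early.
new_york_early = naive_is_due(new_york, at_20_utc) and not is_due(new_york, "America/New_York", at_20_utc)

print(moscow_late, new_york_early)  # True True
```

The correct check needs a per-row conversion, which is exactly what makes a plain indexed comparison in SQL impossible here.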

Of course, you can come up with many ways to select the due notifications from the database, but on a service with any real load all of them will cause performance problems, and besides, we want to do things properly, without hacks, right?

What are the options? There are many, but I see only one that is more or less acceptable: store three values in the database: the time in UTC (for selecting by this field), the user's local time, and his time zone. Yes, we will be storing redundant data, but I do not know of a single high-load service that does not resort to denormalization. In the real world this is normal. What do we gain? When time zone rules change, a special script can go through the records in the affected time zones and update the UTC time wherever it has changed. In my humble opinion, this is a good compromise.
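
A minimal sketch of this three-value record and of the “special script” might look as follows. The names `Reminder`, `to_utc` and `refresh_utc` are hypothetical, and the `get_zone` parameter exists only so that the demo can swap in fixed offsets; by default the standard-library `zoneinfo` module (Python 3.9+) would resolve the zone name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

@dataclass
class Reminder:
    local_time: datetime  # naive wall-clock time the user asked for
    tz_name: str          # IANA zone name, e.g. "Europe/Moscow"
    utc_time: datetime    # derived value, used only for selecting due rows

def to_utc(local_time, tz_name, get_zone=ZoneInfo):
    """Resolve a wall-clock time to UTC with the zone rules available right now."""
    return local_time.replace(tzinfo=get_zone(tz_name)).astimezone(timezone.utc)

def refresh_utc(reminders, changed_zones, get_zone=ZoneInfo):
    """After a tz database update, recompute utc_time for the affected zones."""
    updated = 0
    for reminder in reminders:
        if reminder.tz_name not in changed_zones:
            continue
        fresh = to_utc(reminder.local_time, reminder.tz_name, get_zone)
        if fresh != reminder.utc_time:
            reminder.utc_time = fresh
            updated += 1
    return updated

# Demo: fixed offsets stand in (as an assumption) for the old and new Moscow rules.
def old_rules(name):
    return timezone(timedelta(hours=4))

def new_rules(name):
    return timezone(timedelta(hours=3))

local = datetime(2014, 11, 3, 9, 0)
reminder = Reminder(local, "Europe/Moscow", to_utc(local, "Europe/Moscow", old_rules))
updated = refresh_utc([reminder], {"Europe/Moscow"}, new_rules)
print(updated, reminder.utc_time.strftime("%H:%M"))  # 1 06:00
```

The local time and zone name remain the source of truth; the UTC column is treated as a cache that can be rebuilt whenever the rules change.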

It is even worse than it seems

That seems to be everything, right? No, we have only just started :-) A government can not only change the rules of existing time zones, but also introduce new zones and discard old ones. For example, for residents of the Russian city of Chita (and not only there, but that is beside the point), a new time zone “Asia/Chita”, which had not existed before, was introduced on October 26, 2014 in place of “Asia/Yakutsk”. The UTC offset in the former time zone (Asia/Yakutsk) is +09:00, and in the new one (Asia/Chita) it is +08:00. The problem is that we store only the user's time and time zone in the database, not his geographic location. For records with the Asia/Yakutsk time zone we have no way of knowing whether the user is in Chita or in Yakutsk, so we cannot reliably determine when to send him the message. Checkmate! Do not forget to suffer, friends.

If you can determine the user's geographic location, then on his next visit you can detect that he is in a region whose time zone has changed (Chita in the case above), ask him to confirm the correct time zone, and offer to update the time zone for all of his events (recalculating the UTC time for each one). Even here there may be pitfalls and nuances that are beyond the scope of this article. By the way, this is partly why the Mail.Ru Calendar settings ask the user to choose a geographic location (a city) rather than a time zone, as other services do :-) And even so, to be honest, problems still come up from time to time.
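
If a city is stored next to each event, finding the records that need attention after such a split could be sketched like this. The `CITY_TO_ZONE` table, the event dictionaries and the `zones_to_fix` helper are all hypothetical and serve only to illustrate the idea.

```python
# Hypothetical mapping maintained by the service; the entries are illustrative.
CITY_TO_ZONE = {
    "Yakutsk": "Asia/Yakutsk",
    "Chita": "Asia/Chita",  # split off from Asia/Yakutsk in October 2014
}

def zones_to_fix(events):
    """Return (event, new_zone) pairs whose stored zone no longer matches the city."""
    fixes = []
    for event in events:
        expected = CITY_TO_ZONE.get(event.get("city"))
        if expected is not None and expected != event["tz_name"]:
            fixes.append((event, expected))
    return fixes

events = [
    {"id": 1, "city": "Chita", "tz_name": "Asia/Yakutsk"},
    {"id": 2, "city": "Yakutsk", "tz_name": "Asia/Yakutsk"},
    {"id": 3, "city": None, "tz_name": "Asia/Yakutsk"},  # location unknown: cannot decide
]

fixes = zones_to_fix(events)
print([(event["id"], zone) for event, zone in fixes])  # [(1, 'Asia/Chita')]
```

The third record shows the dead end described above: without a location, nothing can be decided automatically, and the user has to be asked.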

Storing time in the past is not so simple either. If that past is relatively recent (say, the twenty-first century), storing time in UTC should cause no problems (although nobody can guarantee that, of course). If we are talking about the twentieth century or (oh, horror) earlier times, problems are guaranteed. To begin with, for many periods of the last century the information about clock changes keeps being revised to this day. For example, the tzdata time zone database update 2014g, dated August 30, 2014, adjusted a number of USSR time zones by a few seconds or minutes for dates before 1926. Someone simply noticed the discrepancy and notified the tzdata maintainers. Or here is another example, closer to our time: the tzdata update 2014a, dated March 9, 2014, changed the date of Ukraine's transition from Moscow time to Eastern European time: the transition took place not on January 1, 1992 (as the database had previously recorded), but on July 1, 1990.

The time zone database is updated several times a year: new time zones appear around the world, existing rules change, information about the past is corrected. Something is always changing, and these changes have to be taken into account all the time.

So how should time be stored?

So how should time be stored in the database? Ideally, not at all, but if you really have to, here are my personal recommendations (I will be glad to hear criticism or suggestions):
  1. If you need to store the time of an event that has just happened, that is, the current time of some action, store it in UTC. Examples are log entries, the time a user registered, placed an order, or sent an email.
  2. If the time is not tied to a user or his time zone, store it in UTC. An example is the time of the next solar eclipse.
  3. If you need to store a time in the past or in the future, store the user's local time together with his time zone. Better still, to be safe, store the user's geographic location as well. If you need to select records by this time, also store the time in UTC, and update it whenever the time zone information changes.
  4. If you need to know the exact time on any date for a given geographic location (for example, for astronomical calculations), store the user's exact coordinates rather than his time zone. But if you are facing a task like that, you already know how to do it right.

The first option covers 99% of use cases and may well be all you need. Still, you should clearly understand why you are choosing one option over another.

Working with time

Storage is more or less sorted out. But you will often hear the same kind of advice about processing: “always work with time in UTC”. The implication is that as soon as you get a time from the user you should convert it to UTC and work only with the UTC value. Sounds logical, doesn't it?

Not quite. At least not in every case, and here is a specific example.

Let's return to our delayed message service. Everything is going well, the service is growing, the users are happy, but they ask for repeating notifications. And the repetitions are not only simple ones (“every day”, “every other day”, “every month”), but also fairly complex ones (“every week on Tuesdays”, “every month on the last Friday”, and so on). To avoid reinventing the wheel, we look at existing solutions. There is such a thing as “recurring events”, with a special format for describing repetition rules. It does not cover every possible case (for example, you cannot express “two days on, two days off”), but it covers most of them. Examples of this format can be found in the description of the RRULE property in the iCalendar specification and in the documentation for the rrule object of the python-dateutil module for Python.

We take the python-dateutil module and use it in our code. Everything should be fine, but users complain, and investigating the complaints leads to rather unexpected results.

One kind of recurring event repeats by day of the week. We can describe an event that repeats, for example, at 12:00 every week on Tuesdays and Fridays. Here is how that looks in real code:
    >>> import datetime
    >>> from dateutil import rrule
    >>> list(rrule.rrule(rrule.WEEKLY, count=4, byweekday=(rrule.TU, rrule.FR),
    ...                  dtstart=datetime.datetime(2014, 11, 3, 12, 0)))
    [datetime.datetime(2014, 11, 4, 12, 0),
     datetime.datetime(2014, 11, 7, 12, 0),
     datetime.datetime(2014, 11, 11, 12, 0),
     datetime.datetime(2014, 11, 14, 12, 0)]

So far all is well. Now imagine that a user from Moscow creates a recurring event that takes place at one in the morning. As soon as we receive the time “2014-11-03 01:00:00” from him we, following the advice of smart people, immediately convert it to UTC (the conversion itself does not matter here; all we need to know is that we effectively subtract three hours), and get the following UTC time: datetime.datetime(2014, 11, 2, 22, 0). So far, so good. Let's compute the repetitions for this time:
    >>> list(rrule.rrule(rrule.WEEKLY, count=4, byweekday=(rrule.TU, rrule.FR),
    ...                  dtstart=datetime.datetime(2014, 11, 2, 22, 0)))
    [datetime.datetime(2014, 11, 4, 22, 0),
     datetime.datetime(2014, 11, 7, 22, 0),
     datetime.datetime(2014, 11, 11, 22, 0),
     datetime.datetime(2014, 11, 14, 22, 0)]

Something went wrong. If we convert these values back to the user's local time (adding three hours to each), we see that the repetitions have shifted: the event still repeats at one in the morning, but now on Wednesdays and Saturdays. This is not a bug in python-dateutil; the code worked correctly. The rule “Tuesdays and Fridays” was applied to UTC dates, while the user meant Tuesdays and Fridays on his own calendar. The mistake is ours: in this particular case we had to work with the user's local time.
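
The order of operations that avoids the shift can be sketched with the standard library alone: expand the rule on the user's own calendar first, and convert each concrete occurrence to UTC only afterwards. The `weekly_on` helper is a hand-rolled, simplified stand-in for rrule, and the fixed UTC+3 offset for Moscow is an assumption; both are there only to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Moscow stays at a fixed UTC+3 for the whole period.
MSK = timezone(timedelta(hours=3))

def weekly_on(start_local, weekdays, count):
    """First `count` wall-clock datetimes at or after start_local that fall on
    the given weekdays (Monday is 0), keeping the time of day unchanged."""
    found = []
    current = start_local
    while len(found) < count:
        if current.weekday() in weekdays:
            found.append(current)
        current += timedelta(days=1)
    return found

TUESDAY, FRIDAY = 1, 4

# Step 1: expand the rule on the user's own calendar (naive local time).
local_occurrences = weekly_on(datetime(2014, 11, 3, 1, 0), {TUESDAY, FRIDAY}, 4)

# Step 2: only now convert each concrete occurrence to UTC for storage.
utc_occurrences = [
    moment.replace(tzinfo=MSK).astimezone(timezone.utc)
    for moment in local_occurrences
]

for local, utc in zip(local_occurrences, utc_occurrences):
    print(local.strftime("%Y-%m-%d %H:%M"), "local ->", utc.strftime("%Y-%m-%d %H:%M"), "UTC")
```

In UTC these occurrences land on Mondays and Thursdays at 22:00, which looks odd in the database but is exactly one in the morning on Tuesdays and Fridays for the user.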

By the way, many calendar services have this bug; for example, the iCal program in OS X computes repetitions completely wrong in certain cases.

Do not forget to suffer

The conclusion is simple and completely banal: never listen to categorical statements that tell you never to do something. Always think through all the possible options, study the architecture of the project carefully, write good tests and keep them up to date.

And yes, working with time zones is pain and suffering. If there is even the slightest chance to avoid working with them, take it; you will not regret it. Finally, here are a couple of examples of real programs getting it wrong:

Python 2.7.6
    ➜ date
    Sunday, November 9, 2014, 22:44:32 (MSK)
    ➜ python -c "import datetime; print datetime.datetime.now()"
    2014-11-09 22:44:33.310904
    ➜ python -c "import datetime; print datetime.datetime.utcnow()"
    2014-11-09 19:44:34.405287

So far so good. Now look at this:
    ➜ date +%z
    +0300
    ➜ python -c "import time; print time.timezone/3600"
    -4

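WAT? The system reports an offset of +03:00, while time.timezone (the offset in seconds west of UTC) corresponds to UTC+4. Supposedly this is not a bug but a feature, but that does not make anyone's life easier. What is the point of code that can break at any moment (and does)?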
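
One hedged alternative to a process-wide constant is to ask for the offset of a specific instant. The sketch below is for Python 3 (the session above is Python 2.7), uses only the standard library, and does not claim to explain or fix the discrepancy on that particular machine.

```python
import time
from datetime import datetime, timezone

# Offset of the local zone for one specific instant (here: now), taken from the
# system's rules for that instant rather than from a single module-level value.
now_local = datetime.now(timezone.utc).astimezone()
offset = now_local.utcoffset()

print("offset for this instant:", offset)
# time.timezone is in seconds west of UTC, so the sign is flipped for comparison.
print("time.timezone implies:", -time.timezone / 3600, "hours east of UTC")
```

When the two lines disagree, the per-instant value is the one that matches what the clock on the wall shows for that moment.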

Firefox 33.0.3
    new Date(2015, 0, 6)
    "Tue Jan 06 2015 00:00:00 GMT+0300 (Russia TZ 2 Standard Time)"
    new Date(2015, 0, 7)
    "Tue Jan 06 2015 23:00:00 GMT+0300 (Russia TZ 2 Standard Time)"
    new Date(2015, 0, 8)
    "Thu Jan 08 2015 00:00:00 GMT+0400 (Russia TZ 2 Daylight Time)"

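WAT? Midnight on January 7 turns into 23:00 on January 6, and from January 8 the offset becomes GMT+0400. I understand that this issue has been raised many times already, but that does not make it any easier to live with.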

All in all: do not forget to suffer :-)

And how do you work with dates, time and time zones?

Vladimir Rudnykh,
Technical Director of the Mail.Ru Calendar.

Source: https://habr.com/ru/post/242645/

