How to work with timestamps in PostgreSQL?

The topic of working with timestamps in PostgreSQL is poorly disclosed in Russian-language specialized publications on the Internet and is a frequent source of problems in the work of programmers. I bring to your attention the translation of material from Hubert Lubaczewski, the author of the popular foreign blog depesz.com . I hope the article will be useful for you!

From time to time in the IRC or in the mailing lists, someone asks questions that show a deep misunderstanding (or lack of understanding) of timestamps, especially those that take into account time zones. Since I’ve come across this before, let me tell you what timestamps are, how to work with them, and what are the most common problems you might encounter.

We have two types of data that we can use:
')

timestamp
timestamp with time zone (or timestamptz)

The timestamp type contains only the date and time, no other information. On the one hand, it may seem like a good idea (after all, in everyday life we only use the date and time, don’t we?), But on the other, it’s a complete horror.

Let's imagine that you have a time stamp “2014-04-04 20:00:00". What does it tell you? Unfortunately, not much. It all depends on what point of the planet we are talking about. Eight pm 4 April is a different point in time in Los Angeles, Chicago, London, Warsaw or Moscow. This is the problem of time zones.

Of course, you might think: “I will always be in the same time zone, I do not need any problems with the support of different time zones. In my time zone, the date and time will be quite enough to mark any point in time, because this is exactly what we do in “real life”.

But is it really?

Imagine that you have the label '2013-10-27 02:00:00', and you know that your application is tied to Polish time. In this case, you are no longer lucky, because it may be 2 am CST or an hour more, in the usual Central European time. All because of the seasonal shift hours.

I believe that using timestamps without a time zone is almost always a bug and needs to be fixed. There are even more problems if the entries in your application come from different time zones (for example, a scheduler application).

So the most obvious solution is to use timestamps with time zones (timestamptz).

First, it will not take more disk space:

$ select typname, typlen from pg_type where typname ~ '^timestamp'; typname | typlen -------------+-------- timestamp | 8 timestamptz | 8 (2 rows)

How does this work? The label must know the time zone, so why it does not require more space?

The fact is that she does not know the time zone. Inside, all values in the timestamptz columns are in UTC (Coordinated Universal Time) format.

UTC has nice features: it has no offset (it itself is the starting point from which the offset of other time zones is considered), and it has no difference between summer and winter time. So any timestamp in UTC format is always guaranteed to indicate only one point in time.

But if I always use UTC, how do I know the time in the time zone I need?

Every time, when it comes to the values of timestamptz, if the time zone is not specified, PostgreSQL uses a pre-configured time. And you can configure it in different ways:

timezone parameter in postgresql.conf
alter database ... set timezone = '...'
alter user ... set timezone = '...'
SET timezone = '...'

The first parameter is used to indicate in which time zone your server is located. In other words, the default time zone that will be used in the absence of other changes.

The following two methods change the default for the selected database and user.

The latter method can be used if you want your database connection to work with different settings.

Take into account how the output now () changes:

 $ select now(); now ------------------------------- 2014-04-04 20:32:59.390583+02 (1 row) $ set timezone = 'America/New_York'; SET $ select now(); now ------------------------------- 2014-04-04 14:33:06.442768-04 (1 row)

So every time you view or change the timestamptz values, PostgreSQL converts them to / from UTC.

This means that the values can be easily compared (they are all in the same time zone, there are no changes in daylight saving time, so comparison is always possible).

And now let's imagine that you have the value '2014-04-04 20:00:00'. And you know that this is the time in Los Angeles, but you want to save it in your database, which operates in a different time zone. You can check that the current offset is -7 hours, and use the value like this:

 $ select '2014-04-04 20:00:00-07'::timestamptz; timestamptz ------------------------ 2014-04-05 05:00:00+02

What happened? Why is not shown 8 pm?

The reason is simple - I inserted a timestamp in a request in some time zone. Inside, the label was converted to UTC, and then, again converted (perhaps even without UTC, I'm not sure) to my usual time zone, which is:

 $ show timezone; TimeZone ---------- Poland (1 row)

If I had set the time zone of Los Angeles, the result of the query would be:

 $ set timezone = 'America/Los_Angeles'; SET $ select '2014-04-04 20:00:00-07'::timestamptz; timestamptz ------------------------ 2014-04-04 20:00:00-07 (1 row)

It is important to understand that the displayed value always takes into account the setting of the time zone.

There is another way to get 20:00 in Los Angeles:

 $ set timezone = 'Poland'; SET $ select '2014-04-04 20:00:00'::timestamp at time zone 'America/Los_Angeles'; timezone ------------------------ 2014-04-05 05:00:00+02 (1 row)

It is very important to add “:: timestamp" after the value, otherwise we will get something strange:

 $ set timezone = 'Poland'; SET $ select '2014-04-04 20:00:00' at time zone 'America/Los_Angeles'; timezone --------------------- 2014-04-04 11:00:00 (1 row)

What happened here? Where did 11:00 come from?

Value in quotes (2014-04-04 20:00:00) is interpreted as timestamptz, which means 8 pm in my time zone:

 select '2014-04-04 20:00:00'::timestamptz; timestamptz ------------------------ 2014-04-04 20:00:00+02 (1 row)

And only after transferring the value to my time zone, the PG reads “at time zone ...", which is used to display the time in the selected time zone.

Thus, the timestamp at time zone gives the timestamptz value, which indicates the time when the local time in the selected time zone was as specified in the command.

And timestamptz at time zone gives the timestamp value, which shows what time it was in the selected time zone at the specified time.

This sounds a bit confusing, so let me give examples:

 select '2014-04-04 20:00:00'::timestamptz at time zone 'UTC'; timezone --------------------- 2014-04-04 18:00:00 (1 row) select '2014-04-04 20:00:00'::timestamp at time zone 'UTC'; timezone ------------------------ 2014-04-04 22:00:00+02 (1 row)

Interestingly, we can use this to transfer time from one time zone to another, even if Pg is not in any of them.

Suppose we want to know what time it is in Los Angeles when it is 8 am in Moscow. My local time is as follows:

 $ show timezone; TimeZone ---------- Poland (1 row)

There is little benefit from it.

First we need to define a point in time (in the format of timestamptz), which shows 8 am in Moscow:

 $ select '2014-04-04 08:00:00'::timestamp at time zone 'Europe/Moscow'; timezone ------------------------ 2014-04-04 06:00:00+02 (1 row)

This tells me that it corresponds to 6 am in my time zone. But we want to know the time in Los Angeles. I could write '2014-04-04 06: 00: 00 + 02' in the time zone 'LA', but you can do it differently:

 $ select ('2014-04-04 08:00:00'::timestamp at time zone 'Europe/Moscow') at time zone 'America/Los_Angeles'; timezone --------------------- 2014-04-03 21:00:00 (1 row)

Since the expression 'timestamp at time zone ..' is the same as timestamptz, we can use “at time zone" again to translate it back to the timestamp (without specifying a time zone) related to some another place.

I hope now everything is clear to you. I myself for a long time tried to understand this question, and finally understood everything :)

All of this has one interesting side effect: it’s not so easy to add indexes to functions that work with timestamptz. For example, you cannot create an index that will be used to get the day of the week:

 $ create table test (i timestamptz); CREATE TABLE $ create index q on test (to_char(i, 'Day')); ERROR: functions in index expression must be marked IMMUTABLE

As shown in the example above, the reason is very simple - the same point in time can refer to different days of the week depending on the time zone. And since to_char () uses the current time zone, it can produce different values for the same source data, depending on the time zone settings in the system:

 $ set timezone = 'Europe/Warsaw'; SET $ insert into test (i) values ('2014-04-04 06:00:00'); INSERT 0 1 $ select i, to_char(i, 'Day') from test; i | to_char ------------------------+----------- 2014-04-04 06:00:00+02 | Friday (1 row) $ set timezone = 'Europe/Moscow'; SET $ select i, to_char(i, 'Day') from test; i | to_char ------------------------+----------- 2014-04-04 08:00:00+04 | Friday (1 row) $ set timezone = 'America/Los_Angeles'; SET $ select i, to_char(i, 'Day') from test; i | to_char ------------------------+----------- 2014-04-03 21:00:00-07 | Thursday (1 row)

The same point in time, but different days. These may be different months or even different years, depending on where it was.

The timestamp (without a time zone) here "shows" a strong side - since it does not indicate the time zone, you can safely use it to extract information.

But we know how to translate timestamptz into a timestamp. You just need to specify her time zone. Therefore, we can try to do this:

 create index q on test (to_char(i at time zone 'Poland', 'Day'));

But, unfortunately, nothing comes out. The fact is that to_char is too versatile. You can use to_char like this:

 $ select to_char(now(), 'TMMonth'); to_char --------- April (1 row) $ set lc_time = 'pl_PL.UTF-8'; SET $ select to_char(now(), 'TMMonth'); to_char ---------- Kwiecień (1 row)

This time we get other results not because of the time zone, but because of locale.

The correct solution to the problem of indexing is to write your own function, which will call to_char in an absolutely constant "environment" and then index it. Like this:

 create function day_from_ts(timestamptz) returns text as $$ select to_char( $1 at time zone 'Poland', 'Day' ); $$ language sql immutable; CREATE FUNCTION

And now we can use it to index:

 create index q on test (day_from_ts( i )); CREATE INDEX

This is safe because the function itself causes the time zone to take the value “Poland”, and it calls to_char in such a way as to ignore the value of the locale (in other words, there is no TM prefix in the to_char format).

Naturally, to use this index, all queries must also use this function:

 select * from test where day_from_ts(i) = 'Friday';

Another important thing when working with time zones is getting Unix time, or the so-called epoch. In general, it is simple:

 $ select extract(epoch from now()); date_part ------------------ 1396638868.57491 (1 row)

Interestingly, it does not depend on the time zone:

 $ begin; BEGIN $ show timezone; TimeZone ---------- Poland (1 row) $ select now(), extract(epoch from now()); now | date_part -------------------------------+------------------ 2014-04-04 21:15:27.834775+02 | 1396638927.83477 (1 row) $ set timezone = 'America/Los_Angeles'; SET $ select now(), extract(epoch from now()); now | date_part -------------------------------+------------------ 2014-04-04 12:15:27.834775-07 | 1396638927.83477 (1 row) $ commit; COMMIT

The reason not known to all is that Unix time is always accepted in the UTC time zone. This means that when you extract an epoch from the timestamp timestamp, the PG assumes that it is in UTC. From which the following potential problems arise:

 $ select now(), extract(epoch from now()); now | date_part -------------------------------+----------------- 2014-04-04 21:19:01.456205+02 | 1396639141.4562 (1 row) $ select extract(epoch from '2014-04-04 21:19:01.456205'::timestamp); date_part ----------------- 1396646341.4562 (1 row)

In the first case, Pg gets a “point in time”, which is converted internally to UTC (and when it is displayed, it is converted to my time zone, +2).

In the second case, the timestamp is in my time zone, but it is assumed that this is UTC (without conversion!), And the epoch is taken from the value '2014-04-04 21: 19: 01.456205 UTC', not '2014-04-04 21: 19: 01.456205 + 02 '.

Wisely.

In short, try to avoid timestamp and use timestamptz.

The last thing I would like to say is not a bug or a potential problem, but rather a functionality that many people don’t know about.

As you have seen, PostgreSQL uses the timestamp (and timestamptz) up to microseconds. Many people insist that accuracy be only up to a second, although I personally do not like it.

Both the timestamp and timestamptz (and other time related data) can have extra precision (“precision”).

Let me give you a simple example:

 $ select now(), now()::timestamptz(0), now()::timestamptz(1); now | now | now -------------------------------+------------------------+-------------------------- 2014-04-04 21:23:42.322315+02 | 2014-04-04 21:23:42+02 | 2014-04-04 21:23:42.3+02 (1 row)

Of course, you can use it in the tables:

 $ create table test (i timestamptz(0)); CREATE TABLE $ insert into test(i) values (now()); INSERT 0 1 $ select * from test; i ------------------------ 2014-04-04 21:24:16+02 (1 row)

Fine! You do not need to change “now ()" or anything else, just add precision to the data type and it will correct everything.

I mentioned that I do not like it. The reason is simple - in any sufficiently loaded system, the second is too low level of accuracy. Moreover, the storage of data up to a microsecond does not cost me anything, but it can be useful. On the other hand, if the data is up to microseconds, how can I get the values to be displayed without fractions of a second?

It's simple: I use (in SELECT queries) to_char (), or date_trunc functions, or even cast to the timestamptz (0) type:

 $ select now(), to_char(now(), 'YYYY-MM-DD HH24:MI:SS TZ'), date_trunc('second', now()), now()::timestamptz(0); now | to_char | date_trunc | now -------------------------------+--------------------------+------------------------+------------------------ 2014-04-04 21:28:20.827763+02 | 2014-04-04 21:28:20 CEST | 2014-04-04 21:28:20+02 | 2014-04-04 21:28:21+02 (1 row)

For more details on how to work with timestamps, we are going to talk at the PG Day'16 Russia conference in July 2016! Prepare your questions, we will try to answer them.

You, for sure, will also find for yourself an article written by colleagues from the Mail.ru Group. It clearly shows an example when only timestampt is not enough to solve the problem.

Source: https://habr.com/ru/post/273177/

All Articles

How to work with timestamps in PostgreSQL?

More articles: