Reducing the amount of PostgreSQL data on disk

Usually when compiling data structures and tables, no one bothers with the order of the columns. Actually, what's the point? If necessary, you can change the order of the columns in the SELECT, so why bother? So, there is something to worry about, since the order of the columns can significantly affect the size of the table. Yes, the size of the table may depend on the order of the columns, even if the data are the same.

How is this explained? There is such a thing as the alignment of the CPU data, and the low-level size of the data structure depends on it. The conscious choice of the order of the columns makes it possible to optimize the size of the data. Do not believe? Let's try:

test=# CREATE TABLE t_test ( i1 int, i2 int, i3 int, v1 varchar(100), v2 varchar(100), v3 varchar(100) ); CREATE TABLE

In this example, 6 columns. 3 integer columns one after another and 3 varchar columns, also one after another. Add 10 million lines to this table:

 test=# INSERT INTO t_test SELECT 10, 20, 30, 'abcd', 'abcd', 'abcd' FROM generate_series(1, 10000000); INSERT 0 10000000

The total size of the table is 574 MBytes:
')

 test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 574 MB (1 row)

Let's try to change the location of these columns. In the following example, after the varchar column there is an integer column. This is repeated three times:

 test=# CREATE TABLE t_test ( v1 varchar(100), i1 int, v2 varchar(100), i2 int, v3 varchar(100), i3 int ); CREATE TABLE

Now add 10 million lines ...

 test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30 FROM generate_series(1, 10000000); INSERT 0 10000000

... and the table will increase significantly:

 test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 651 MB (1 row)

The data in the table has not changed - they are simply selected to illustrate this effect. If I wrote “abc” instead of “abcd”, we would not see the difference in size, but a string of 4 characters no longer fits in a smaller buffer.

Conclusion

An important conclusion that can be drawn from this experiment is that packing the same types of data next to each other definitely makes sense. In addition, I found that it makes sense to pack integer columns at the beginning of tables. Often, this may add a couple more percent to performance, simply because the data structure is slightly smaller than if it is not done.

From the translator:

Article author: Hans-Jürgen Schönig. The original is available by reference .

Source: https://habr.com/ru/post/319222/

All Articles

Reducing the amount of PostgreSQL data on disk

Conclusion

More articles: