
FreeBSD + PostgreSQL: database server tuning

Hi, Habr community!

My article will probably not be of interest to experienced system administrators and may look like copy-paste. But I address it to those who, like me, are really just developers and suddenly had to administer a server as well while dealing with a high-load database. And to spare you the googling, I will try to gather in one place the basic techniques for speeding up a database server that I successfully managed to put into practice.

The inputs for my task are as follows: a dual-processor (Intel Xeon) machine, 8 hard drives of 500 GB each and 12 GB of RAM, plus full access to all of this, including physical access. The task: set up a fast database server based on FreeBSD and PostgreSQL.

1. RAID


Proper partitioning of the available hard drives into RAID arrays will be needed for a PostgreSQL feature called tablespacing (see below). I split my 8 drives into pairs and organized them like this: two pairs went into RAID1 and two pairs into RAID0 (in general, at least 6 drives are needed for our purposes: combine two pairs into RAID1 and leave the other 2 as they are). If there are more hard drives, you can come up with something more reliable, such as RAID5, RAID10, etc., though there is a chance it will run somewhat slower. I will not go into detail on how to set up the arrays, since I am not strong in hardware; I can only say that I did not touch any controllers, because on this server a utility that lets you do it all loads right after the BIOS.

2. Installing the OS and database server, and building a custom kernel


First we simply install FreeBSD onto the first RAID1. I installed the FreeBSD 8.2-RELEASE amd64 distribution with the full set of components. The 64-bit version is needed so that the system can “see” all of the RAM.

Now the most interesting part: why do we need to build our own kernel, and which parameters should be changed? This is needed so that the PostgreSQL server is allowed to use as many resources as a high load requires. So, which database parameters interest us? In Alexey Vasiliev's book “Working with PostgreSQL. Configuration and scaling”, the following parameters (in postgresql.conf) are recommended for heavily loaded databases:

These options really suit us, except for two:

1) Maximum number of connections

Depends on the specific situation. In my case a cron script connects to the database and inserts data, and I decided that 256 connections should be enough.
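
In postgresql.conf this amounts to a single line:

max_connections = 256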

But the default FreeBSD configuration will not allow that many connections. If you set this value and try to start the PostgreSQL daemon, it will fail to start. The corresponding system limits have to be raised, and for that we build our own kernel. Take the default kernel configuration GENERIC, make a copy named KERNEL_MAX_PERF, and edit KERNEL_MAX_PERF as follows: increase the number of semaphores by adding the following to the default options:
options SEMMNI=512
options SEMMNS=1024
options SEMUME=64
options SEMMNU=512

(these values are for max_connections = 256).

2) The maximum amount of RAM that PostgreSQL may take (this is important for large queries). The shared_buffers parameter in postgresql.conf is responsible for this. There are various recommendations for this value; I came to the conclusion that if the server is dedicated to the database, you can give it almost the entire amount of RAM minus what the system needs for its own purposes. I allotted 8 GB out of 12. For the system to let us set the value we need for shared_buffers, the SHMMAXPGS option has to be changed in the kernel; its value is calculated by the formula:

SHMMAXPGS = shared_buffers / PAGE_SIZE

in my case shared_buffers = 8 GB and PAGE_SIZE = 4 KB (on i386/amd64), so
SHMMAXPGS = 8 * 1024 * 1024 / 4 = 2097152. Now we can also write down the SHMMAX parameter (it is calculated dynamically in the kernel). So we write in the kernel config:
options SHMMAXPGS=2097152
options SHMMAX="(SHMMAXPGS*PAGE_SIZE + 1)"

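The matching setting on the PostgreSQL side is then a single line in postgresql.conf (in a standard installation the file lives in the data directory, e.g. /usr/local/pgsql/data/postgresql.conf):

shared_buffers = 8GB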

It remains to compile the kernel with the KERNEL_MAX_PERF config. The kernel compilation procedure itself is simple; here I refer you to the official manual.
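
For reference, assuming the standard source tree layout for amd64 (the copy of GENERIC being the one edited above), the build boils down to:

#cp /usr/src/sys/amd64/conf/GENERIC /usr/src/sys/amd64/conf/KERNEL_MAX_PERF
#cd /usr/src
#make buildkernel KERNCONF=KERNEL_MAX_PERF
#make installkernel KERNCONF=KERNEL_MAX_PERF

After rebooting into the new kernel, sysctl lets you check that the limits are really in place:

#sysctl kern.ipc.semmni kern.ipc.semmns kern.ipc.shmmax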

We boot the OS with our kernel and install the latest version of PostgreSQL (I had version 9.0.4). First start PostgreSQL with the default config as a sanity check. If everything is OK, change the parameters in postgresql.conf to those specified above and restart PostgreSQL. It started, so we move on.

Note: if for some reason you could not build the kernel with these parameters, you can set them in /etc/sysctl.conf:
kern.ipc.shmall=2097152
kern.ipc.shmmax=8589938688
kern.ipc.semmap=256

and start up with the default kernel GENERIC.
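
The two shared memory values can also be applied to the running system right away, without waiting for a reboot:

#sysctl kern.ipc.shmall=2097152
#sysctl kern.ipc.shmmax=8589938688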

3. Tablespacing


Tablespacing is PostgreSQL's ability to specify locations in the file system where the files representing database objects will be stored. Simply put, if we spread tables, indexes and the write-ahead log across different disks, reads and writes will be faster than if all of this sat on a single disk.

This is where our RAID arrays come in. Recall that we have four volumes: two RAID1 and two RAID0. The OS and PostgreSQL are installed on the first RAID1. On the second RAID1 we will store the tables of our database; suppose it is mounted as /disk1. On the first RAID0 we will keep the indexes; let it be mounted in the file system as /disk2. The write-ahead log goes to the second RAID0, which we assume is mounted as /disk3.

The following steps should be taken:
  1. create folders for tables, indexes and log:
    #mkdir -p /disk1/postgresql/tables
    #mkdir -p /disk2/postgresql/ind
    #mkdir -p /disk3/postgresql/log
  2. make postgres the owner of these folders and take away all permissions from everyone else (recall that postgres is the user that is created when PostgreSQL is installed in the standard way according to the official manual):
    #chown -R postgres /disk1/postgresql/tables /disk2/postgresql/ind /disk3/postgresql/log
    #chmod -R go-rwx /disk1/postgresql/tables /disk2/postgresql/ind /disk3/postgresql/log

  3. log into the psql client as the postgres user and create two tablespaces:
    CREATE TABLESPACE space_table LOCATION '/disk1/postgresql/tables';
    CREATE TABLESPACE space_index LOCATION '/disk2/postgresql/ind';

  4. if the owner of your database is not postgres but, for example, myuser, then you must grant the user myuser permissions on the created tablespaces (this can also be done in the client):
    GRANT CREATE ON TABLESPACE space_table TO myuser;
    GRANT CREATE ON TABLESPACE space_index TO myuser;

  5. now, as myuser, you can change the tablespace for tables and indexes:
    ALTER TABLE mytable SET TABLESPACE space_table;
    ALTER INDEX myindex SET TABLESPACE space_index;

  6. stop the postgres daemon, move the folder with the log and make a symbolic link to it:
    #/usr/local/bin/rc.d/postgres.sh stop
    #mv /usr/local/pgsql/data/pg_xlog /disk3/postgresql/log
    #cd /usr/local/pgsql/data
    #ln -s /disk3/postgresql/log/pg_xlog

    We start postgres:
    #/usr/local/bin/rc.d/postgres.sh start

    If everything is done correctly, the daemon should start.
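
New objects can also be placed straight into the right tablespace at creation time instead of being moved afterwards, and psql's \db command lists the tablespaces with their locations. A couple of illustrative statements (the table and column names here are made up):

    CREATE TABLE mydata ( id serial, payload text ) TABLESPACE space_table;
    CREATE INDEX mydata_payload_idx ON mydata ( payload ) TABLESPACE space_index;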


4. Partitioning


Partitioning is the logical splitting of one large table into small physical pieces. It can significantly speed up query execution if the table is really large.

My situation is fairly typical: a cron script collects statistics for a certain measurement, and users can view these statistics through a web interface. About 10 million rows are inserted into the table per week. If everything is written into a single table, you will be cursed: it will all work terribly slowly.

Let's try to break this table into pieces, taking time as the partitioning criterion. Since the statistics are always viewed for a specific time period, the database server will then not have to comb through the entire large table for each query, only the few small ones that fall into the selected period.

Unfortunately, PostgreSQL does not implement partitioning at the database level, so it has to be done by hand using the table inheritance feature.

So, we have a table measure_data_master into which we write our measurements. A period of one week, for example, suits us. Let's go:
  1. do NOT add any CHECK (integrity) constraints and do NOT create indexes on the master table measure_data_master
  2. in the postgresql.conf config edit option:
    constraint_exclusion = on

  3. create descendant tables of the form:
    CREATE TABLE measure_data_y2011m06d06 (
        CHECK ( measure_time >= DATE '2011-06-06' AND measure_time < DATE '2011-06-13' )
    ) INHERITS ( measure_data_master ) ;

  4. create indexes for descendant tables:
    CREATE INDEX measure_data_y2011m06d06_key ON measure_data_y2011m06d06 ( measure_time ) ;

  5. when a new row is inserted, it must end up in the correct child table. To do this, create a trigger function:
    CREATE OR REPLACE FUNCTION measure_insert_trigger ( )
    RETURNS TRIGGER AS $$
    BEGIN
    IF ( NEW.measure_time >= DATE '2011-06-06' AND
         NEW.measure_time < DATE '2011-06-13' ) THEN
        INSERT INTO measure_data_y2011m06d06 VALUES ( NEW.* ) ;
    ELSIF ( NEW.measure_time >= DATE '2011-06-13' AND
         NEW.measure_time < DATE '2011-06-20' ) THEN
        INSERT INTO measure_data_y2011m06d13 VALUES ( NEW.* ) ;
    .....................................
    ELSIF ( NEW.measure_time >= DATE '2011-12-19' AND
         NEW.measure_time < DATE '2011-12-26' ) THEN
        INSERT INTO measure_data_y2011m12d19 VALUES ( NEW.* ) ;
    ELSE
        RAISE EXCEPTION 'Date out of range. Fix the measure_insert_trigger() function!' ;
    END IF ;
    RETURN NULL ;
    END;
    $$
    LANGUAGE plpgsql;

  6. well, the trigger itself, which will call the function:
    CREATE TRIGGER insert_measure_trigger
    BEFORE INSERT ON measure_data_master
    FOR EACH ROW EXECUTE PROCEDURE measure_insert_trigger ( ) ;
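
To check that the whole construction pays off, look at a query plan: with constraint_exclusion = on, a query bounded to one week should touch only the matching child table (plus the empty master table) rather than all of them. For example, something along these lines:

    EXPLAIN SELECT count(*) FROM measure_data_master
        WHERE measure_time >= DATE '2011-06-06' AND measure_time < DATE '2011-06-13';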


Of course, writing such large queries by hand is inconvenient, so I wrote a PHP script that creates the tables and everything they need for a whole year ahead.
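
If you would rather stay inside the database, the same generation can be sketched in plpgsql instead of PHP; this is only an illustration of the idea, not the script I actually used, and it creates a year of weekly child tables together with their indexes:

    DO $$
    DECLARE
        week_start date := DATE '2011-06-06';  -- first partition boundary
        d date;
        part text;
    BEGIN
        FOR i IN 0..51 LOOP
            d := week_start + i * 7;
            part := 'measure_data_y' || to_char(d, 'YYYY') ||
                    'm' || to_char(d, 'MM') || 'd' || to_char(d, 'DD');
            -- child table with its CHECK constraint, inheriting from the master
            EXECUTE 'CREATE TABLE ' || part ||
                    ' ( CHECK ( measure_time >= DATE ''' || to_char(d, 'YYYY-MM-DD') ||
                    ''' AND measure_time < DATE ''' || to_char(d + 7, 'YYYY-MM-DD') ||
                    ''' ) ) INHERITS ( measure_data_master )';
            -- index on the partitioning column
            EXECUTE 'CREATE INDEX ' || part || '_key ON ' || part || ' ( measure_time )';
        END LOOP;
    END;
    $$;

The trigger function still has to be kept in sync with the set of child tables, which is exactly why generating everything from one script is convenient.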

That is probably everything I wanted to tell. If you share your own experience in this area, I will be very grateful.

Source: https://habr.com/ru/post/120932/

