
Failover Evolution in PostgreSQL: Replication Phase

We continue our series of translations of posts by Gulcin Yildirim, a developer at 2ndQuadrant, on PostgreSQL resiliency. Today we bring you the second post in the series.

Gulcin will be at PG Day'17 to answer participants' questions in person and to talk in more detail not only about replication in PG, but also about automating Postgres upgrades in the cloud, and more. Prepare your questions!


PostgreSQL is a terrific project that is evolving with amazing speed. In this series of articles, we will focus on the evolution of fault tolerance capabilities in PostgreSQL throughout its versions. This is the second article in the series, in which we will talk about replication and its significance for the resiliency and reliability of Postgres.

If you want to follow the evolution process from the very beginning, I recommend that you read the first article in the series: The Evolution of Failover in PostgreSQL.

Replication in PostgreSQL


Database replication is the term we use to describe the technology for maintaining a copy of a dataset on a remote system. Keeping a reliable copy of a running system is one of the most important tasks of redundancy, because we all want our copies to be stable and easy to maintain and use.

Let's look at the basic architecture. As a rule, individual database servers are called nodes. The entire group of database servers involved in replication is known as a cluster. The database server that allows the user to make changes is called the master or primary, and can be described as a source of changes. A database server that allows read-only access is called a Hot Standby (the term is explained in detail in the Standby Modes section).

A key aspect of replication is that changes are made on the master and then propagated to other nodes. In some cases, a node may forward data changes to further nodes, a process known as cascading or relay. Thus, the master is a sending node, but a sending node is not necessarily the master. Replication is often categorized by whether more than one master node is allowed; if so, we speak of multimaster replication.

Let's take a look at how PostgreSQL handled replication throughout its existence and what the current state of fault tolerance is in terms of replication.

Replication history in PostgreSQL


In the early days (roughly 2000–2005), Postgres only supported fault tolerance/recovery of a single node, which was achieved mainly with the help of WAL, the transaction log. Fault tolerance is partially provided by MVCC (multi-version concurrency control), but that is mostly an optimization.

WAL (write-ahead logging) was and remains the main fault tolerance method in PostgreSQL. Its essence is that you have WAL files into which you write everything, and in the event of a system failure, you can replay them and restore everything. This was enough for single-node architectures, while replication is considered the best solution for achieving fault tolerance across multiple nodes.

The Postgres community was long convinced that replication did not belong inside Postgres and could be handled by external tools, which is how tools like Slony and Londiste appeared. (We'll talk about trigger-based replication solutions in the next articles in this series.)

Eventually it became clear that the resilience of a single server was not enough, and more and more people began to demand proper fault tolerance across hardware, as well as a switchover method built into Postgres. That is when physical replication appeared, at the time known as physical streaming replication.

We will go through all the replication methods in this article, but first let's follow the chronology of the main replication milestones across PostgreSQL releases:

PostgreSQL 8.2 (2006): warm standby via log shipping
PostgreSQL 9.0 (2010): hot standby and streaming replication
PostgreSQL 9.1 (2011): synchronous replication
PostgreSQL 9.2 (2012): cascading replication
PostgreSQL 9.4 (2014): replication slots and logical decoding
PostgreSQL 9.6 (2016): multiple synchronous standby servers


Physical replication


PostgreSQL solved the key replication problem the same way most relational databases do: it took the WAL and made it possible to ship it over the network. These WAL files are then applied by a separate Postgres instance running in read-only mode.

The read-only standby instance simply applies the changes from the WAL, so the only write operations come from that same WAL journal. This, in general, is how streaming replication works. Initially, replication meant shipping whole files (log shipping), but later it became streaming.

With log shipping, we sent entire files using the archive_command setting. The logic was quite simple: you send a whole archived WAL file, 16 MB in size, register it somewhere, apply it, request the next one, apply it, and so on. Later this was turned into streaming over the network, thanks to the use of the libpq protocol in PostgreSQL 9.0.
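As a minimal sketch of this file-based approach (the host name and paths here are illustrative, not from the original post), the primary archives each completed segment and the standby restores and replays them:

    # postgresql.conf on the primary: archive each completed 16 MB WAL segment
    archive_mode = on
    # %p expands to the segment's path, %f to its file name
    archive_command = 'rsync -a %p standby:/var/lib/postgresql/wal_archive/%f'

    # recovery.conf on the standby: replay segments as they arrive
    standby_mode = 'on'
    restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'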

Today's replication is more commonly known as physical streaming replication, since we are transferring a stream of physical changes from one node to another. This means that when we insert a row into a table, change records are generated for the insert and for all the index entries.

When we VACUUM a table, change records are also generated.

In addition, physical streaming replication records all changes at the byte/block level, so it is almost impossible to do anything on the replica other than replay them.



Figure 1 shows how physical replication works with just two nodes. The client runs queries against the primary node; the changes are written to the transaction log (WAL) and copied over the network to the WAL on the standby node. The recovery process on the standby then reads the changes from the WAL and applies them to the data files, just as in crash recovery. If the standby is in hot standby mode, clients can run read-only queries while this happens.

Note: Physical replication simply sends WAL files across the network from the primary node to the standby. The files can be sent by different protocols, for example scp, rsync, or ftp. The difference between physical replication and physical streaming replication is that streaming replication uses an internal protocol to send the WAL records (the walsender and walreceiver processes).
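As a quick way to observe the streaming side in practice, the primary exposes one row per connected walsender in the pg_stat_replication view (column names as of 9.6):

    -- run on the primary: one row per connected standby (walsender)
    SELECT client_addr, state, sent_location, replay_location
    FROM pg_stat_replication;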

Standby Modes


Multiple nodes provide high availability, which is why modern architectures usually include standby nodes. There are different standby node modes (warm and hot standby). Below, we explain the main differences between the standby modes and also consider the case of the multimaster architecture.

Warm standby

A warm standby can be activated immediately, but cannot perform useful work until activated. If we continuously feed the sequence of WAL files to another machine onto which a base backup of the same database has been restored, we have a warm standby system: at any moment we can activate the second machine, and it will have an almost up-to-date copy of the database. Warm standby does not allow read-only queries, as Figure 2 clearly shows.

Recovery in warm standby mode is so fast that the standby server usually becomes fully available within moments of activation. That is why this is called a warm standby configuration, and it offers high availability.

Hot standby

Hot standby is the term used to describe the ability to connect to the server and run read-only queries while the server is in recovery or standby mode. This is useful both for replication purposes and for restoring a backup to a desired state with great precision.



The term "hot standby" also refers to the ability of the server to move from recovery to normal operation while users continue to run queries and/or keep their connections open. In Figure 3, you can see that this standby mode allows read-only queries.
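As a minimal sketch, enabling this in 9.6 takes a single setting on the standby (the primary must already run with a sufficient wal_level, as discussed below):

    # postgresql.conf on the standby
    hot_standby = on    # allow read-only queries during recovery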

Multimaster

All nodes can perform read/write tasks. (We will look at multimaster architecture in the next articles in this series.)

The wal_level Parameter

There is a relationship between the value of the wal_level parameter in the postgresql.conf file and what that setting is suitable for. The table below shows this relationship for PostgreSQL 9.6.

    wal_level   Suitable for
    minimal     Crash recovery
    replica     Physical replication, file-based archiving
    logical     Logical replication

A quick note: the wal_level parameter determines how much information is written to the WAL. The default value is minimal, which logs only the information needed to recover from a crash or an immediate shutdown. replica adds the logging required for WAL archiving, as well as the information needed to run read-only queries on a standby server. Finally, logical adds the information needed to support logical decoding. Each level includes the information logged at all lower levels.

In versions prior to 9.6, this parameter also allowed the values archive and hot_standby. These are still accepted, but are mapped to replica.
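For example, preparing a 9.6 primary for physical streaming replication might look like the sketch below (the max_wal_senders value is an arbitrary illustration; changing wal_level requires a restart):

    # postgresql.conf on the primary
    wal_level = replica      # minimal is the 9.6 default; logical also enables logical decoding
    max_wal_senders = 5      # allow up to 5 concurrent walsender connections

Once the server has restarted, you can confirm the active value with SHOW wal_level;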

Failover and Switchover


When replicating with a single primary node, if that node dies, one of the standbys must take its place (promotion). Otherwise, we will not be able to accept new write operations. Thus, "primary" and "standby" are simply roles that can be assigned to any node at a particular point in time. A controlled transfer of the primary role to another node is a procedure called switchover.

If the primary node fails and does not recover, the more drastic role change is known as failover. The two procedures are similar in many respects, but it is convenient to use different terms for each event. (Knowing the concepts of failover and switchover will help us make sense of the timeline questions in the next article.)
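A sketch of the mechanics behind promotion in 9.6 (the data directory and trigger file paths are illustrative):

    # run on the standby host to promote it to primary
    pg_ctl promote -D /var/lib/postgresql/9.6/main

    # alternatively, recovery.conf may name a trigger file; creating that
    # file makes the standby exit recovery and start accepting writes
    # trigger_file = '/tmp/postgresql.trigger'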

Conclusion


In this article, we discussed replication in PostgreSQL and its importance for fault tolerance and reliability. We looked at physical streaming replication and the standby modes in PostgreSQL, and also touched on failover and switchover. We will continue with timelines in PostgreSQL in the next article.

In addition to Gulcin's talk, we are preparing a series of talks for you on replication and the challenges of building resilient data storage systems, from speakers such as Sveta Smirnova, Colin Charles, Nikolai Macievsky, and many others. Vote for the topics most relevant to you; we will take all wishes into account when drawing up the final program of the conference!

Source: https://habr.com/ru/post/326440/

