In this article I will explain how to set up database replication for PostgreSQL. For the experiments, we will use the CentOS 5.3 Linux distribution, although this is not critical. we will use PostgreSQL 8.4.7 and SymmetricDS-2.2.2.
What is replication?
In fact, it is a mechanism for automatically synchronizing the contents of databases running on different servers. As a result of replication, these databases contain absolutely identical data. This is necessary, for example, in order to ensure the fault tolerance of the system (in case of the first database server crashes, the second one comes into operation), or to load balance the different servers can serve different clients.
For replication, you need at least two database servers, so we prepare two identical servers with a PostgreSQL database on each. The first will have an IP address of 10.0.2.20, the second will have 10.0.2.21, both will have a gateway 10.0.2.2.
You can get by with a virtual machine, such as VirtualBox, create two virtual servers in it and run them on your own computer.
')
In the above commands, the first character will be a # or $ sign, these signs mean that the command runs as root or as a regular user, respectively.
So, what actions need to be taken:
Start setting
Enable the postgresql service, if it is not already enabled:
# chkconfig --level 3 postgresql on
# service postgresql start
Now we need to create a database. Create it "in the forehead" does not work:
# createdb mytest
psql: FATAL: user "root" does not exist
here you need to create a role in the postgres for the Linux user. Create a superuser in the post-login with the login βsaβ and a Linux user with the same name, the password will be the same, βsaβ.
# adduser -m sa
# su - postgres
$ createuser -d -s -P sa
Now you can create the βtestβ table by initiating a command from under the user βsaβ:
# su - sa
$ createdb mytest
now you can use the psql utility:
psql mytest
Exit this utility by typing two characters
\q
Installing SymmetricDS
Well, now it's time to get down to the most interesting part of the work. Before continuing, make sure that we simultaneously have two servers with a working postgreSQL database server on each.
Servers must ping each other because SymmetricDS uses the HTTP protocol for synchronization. Ports 8080 and 9090 on servers should not be blocked by a firewall.
OK. SymmetricDS requires a java interpreter and JRE 1.6 itself, so install them on the servers, if this has not already been done:
Download the jre-6u24-linux installer from java.com, install it (just run this file),
set up a symbolic link (if it was not created for some reason):
# ln -s /usr/java/latest/bin/java /usr/bin/java
and write the classpath on the command line - without them, the SymmetricDS will fail with an error that the class was not found.
$ CLASSPATH=/usr/java/latest/lib; export CLASSPATH
In order for sym (the main executable of the Symmetric DS) to work, you also need to specify the database server so that it listens on the external network interface (and not just on localhost) and that it launches Unix users to it. This is done on each of our two servers:
1) uncomment this line in /var/lib/pgsql/data/postgresql.conf:
listen_addresses = '*'
2) specify the trust method for all connections in /var/lib/pgsql/data/pg_hba.conf:
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
3) To support session variables, add the line to the /var/lib/pgsql/data/postgresql.conf file:
custom_variable_classes = 'symmetric'
4) restart the database server:
# service postgresql restart
Check that it works, you can:
$ psql mytest -U sa -h localhost
This error message should not be: psql: FATAL: Ident authentication failed for user "sa".
Instead, psql should open as usual.
Now we download SymmetricDS-2.2.2 itself and unpack it on both servers:
$ unzip symmetric-ds-2.2.2-server.zip
$ cd symmetric-ds-2.2.2/samples
Configure SymmetricDS
We edit the root.properties and client.properties files that lie in this samples directory. They need to uncomment the lines for postgresql, comment out or delete other options, and specify the username and password to the database (we have this sa).
root.properties is needed only on the first node, client.properties - only on the second.
Let's start in order.
Setting up SymmetricDS on the first server
Form the following root.properties file:
root.properties
# The class name for the JDBC Driver
db.driver=org.postgresql.Driver
# The JDBC URL used to connect to the database
db.url=jdbc:postgresql://localhost/mytest
# The user to login as who can create and update tables
db.user=sa
# The password for the user to login as
db.password=sa
registration.url=http://10.0.2.20:8080/sync
sync.url=http://10.0.2.20:8080/sync
# Do not change these for running the demo
group.id=corp
external.id=00000
# Don't muddy the waters with purge logging
job.purge.period.time.ms=7200000
Create the initial database on the root node:
$ psql postgres -c "CREATE database mytest;"
For the purposes of this simple example, we will have in the database one table t2, which will be synchronized (replicated) in both directions. Create this table.
$ ../bin/sym -p root.properties --run-ddl create_sample.xml
create_sample.xml file is used here
<? xml version = "1.0"?>
<! DOCTYPE database SYSTEM "http://db.apache.org/torque/dtd/database.dtd">
<database name = "mytest">
<table name = "t2">
<column name = "id" type = "INTEGER" required = "true" primaryKey = "true" autoIncrement = "true" />
<column name = "mydata" type = "VARCHAR" size = "64" required = "false" />
<column name = "intval" type = "DECIMAL" size = "10.2" required = "false" />
</ table>
</ database>
You also need to create plpgsql handlers in the database - run this script:
$ ./create_func.sh
create_func.sh
#!/bin/sh
psql mytest -U sa -h localhost -c "CREATE FUNCTION plpgsql_call_handler() RETURNS language_handler AS '\$libdir/plpgsql' LANGUAGE C;"
psql mytest -U sa -h localhost -c "CREATE FUNCTION plpgsql_validator(oid) RETURNS void AS '\$libdir/plpgsql' LANGUAGE C;"
psql mytest -U sa -h localhost -c "CREATE TRUSTED PROCEDURAL LANGUAGE plpgsql HANDLER plpgsql_call_handler VALIDATOR plpgsql_validator;"
Now you can initialize the SymmetricDS system on the root node:
$ ../bin/sym -p root.properties --auto-create
The output will be something like this:
[sa@CENTOS1 samples]$ ../bin/sym -p root.properties --auto-create
Log output will be written to ../logs/symmetric.log
SymmetricLauncher - Option: name=properties, value={root.properties}
SymmetricLauncher - Option: name=auto-create, value={}
PlatformFactory - The name/version pair returned for the database, PostgreSQL8,
was not mapped to a known database platform. Defaulting to using just the database type of PostgreSql
PostgreSqlDbDialect - The DbDialect being used is org.jumpmind.symmetric.db.postgresql.PostgreSqlDbDialect
ConfigurationService - Initializing SymmetricDS database.
PostgreSqlDbDialect - There are SymmetricDS tables missing. They will be auto created.
PostgreSqlDbDialect - Starting auto update of SymmetricDS tables.
PostgreSqlDbDialect - Just installed sym_triggers_disabled
PostgreSqlDbDialect - Just installed sym_node_disabled
PostgreSqlDbDialect - Just installed sym_fn_sym_largeobject
ConfigurationService - Auto-configuring config channel.
ConfigurationService - Auto-configuring reload channel.
ConfigurationService - Done initializing SymmetricDS database.
Everything worked out. This script created as many as 26 tables for your work. These tables start with sym_ and you can see their list by entering the command
$ psql mytest -c "select tablename from pg_tables where tablename like 'sym%';"
Further. Configure symmetricDS by populating the newly created tables.
$ ../bin/sym -p root.properties --run-sql insert_sample.sql
The data to fill out are:
insert_sample.sql
insert into t2(mydata, intval) values('sdsdsdsds', 102);
--
-- Nodes
--
insert into sym_node_group (node_group_id, description)
values ('corp', 'Central Office');
insert into sym_node_group (node_group_id, description)
values ('store', 'Store');
insert into sym_node_group_link (source_node_group_id, target_node_group_id, data_event_action)
values ('store', 'corp', 'P');
insert into sym_node_group_link (source_node_group_id, target_node_group_id, data_event_action)
values ('corp', 'store', 'W');
insert into sym_node (node_id, node_group_id, external_id, sync_enabled)
values ('00000', 'corp', '00000', 1);
insert into sym_node_identity values ('00000');
--
-- Channels
--
insert into sym_channel
(channel_id, processing_order, max_batch_size, enabled, description)
values('channel_t2', 1, 100000, 1, 't2 data from register and back office');
--
-- Triggers
--
insert into sym_trigger
(trigger_id,source_table_name,channel_id,last_update_time,create_time)
values('trigger_t2','t2','channel_t2',current_timestamp,current_timestamp);
-- Example of a "dead" trigger, which is used to only sync the table during initial load
insert into sym_trigger
(trigger_id,source_table_name,channel_id, sync_on_insert, sync_on_update, sync_on_delete, last_update_time,create_time)
values('t2_dead','t2','channel_t2',0,0,0,current_timestamp,current_timestamp);
--
-- Routers
--
-- In this example, both routers pass everything all the time.
insert into sym_router
(router_id,source_node_group_id,target_node_group_id,create_time,last_update_time)
values('corp_store_identity', 'corp', 'store', current_timestamp, current_timestamp);
insert into sym_router
(router_id,source_node_group_id,target_node_group_id,create_time,last_update_time)
values('store_corp_identity', 'store', 'corp', current_timestamp, current_timestamp);
--
-- Trigger Router Links
--
insert into sym_trigger_router
(trigger_id,router_id,initial_load_order,last_update_time,create_time)
values('trigger_t2','corp_store_identity',100,current_timestamp,current_timestamp);
insert into sym_trigger_router
(trigger_id,router_id,initial_load_order,last_update_time,create_time)
values('trigger_t2','store_corp_identity', 200, current_timestamp, current_timestamp);
-- Example of a "dead" trigger, which is used to only sync the table during initial load
insert into sym_trigger_router
(trigger_id,router_id,initial_load_order,last_update_time,create_time)
values('trouter_dead','corp_store_identity', 300, current_timestamp, current_timestamp);
What does the above code mean?
In essence, we create a configuration for SymmetricDS β triggers for changing values ββin table fields. Each time the source_table_name table is changed (in our case, this is table t2), which is listed in the SymmetricDS configuration in the sym_trigger table, the SymmetricDS code is triggered. It determines which routes are associated with this particular trigger (see sym_trigger_router). If only one route is specified, for example, from the first server to the second, then the changed data on the second server will not be delivered to the first. In our case there are two channels, in both directions: from the first server to the second and from the second to the first (see sym_router).
We start the replication server itself on the first server. This command will not give up control, it will work and dump logs on the screen.
$ ../bin/sym -p root.properties --port 8080 --server
We allow registration on the first server (we start from another terminal window)
$ ../bin/sym -p root.properties --open-registration "store,1"
Everything. Setting up the replication server on the root node is complete.
Setting up SymmetricDS on the second server
Go to the second server and make the settings there. Create such a file in the symmetric-ds-2.2.2 / samples directory:
client.properties
# The class name for the JDBC Driver
db.driver=org.postgresql.Driver
# The JDBC URL used to connect to the database
db.url=jdbc:postgresql://localhost/mytest
# The user to login as who can create and update tables
db.user=sa
# The password for the user to login as
db.password=sa
# The HTTP URL of the root node to contact for registration
registration.url=http://10.0.2.20:8080/sync
# Do not change these for running the demo
group.id=store
external.id=1
job.routing.period.time.ms=2000
# This is how often the push job will be run.
job.push.period.time.ms=5000
# This is how often the pull job will be run.
job.pull.period.time.ms=5000
We create the same database on the second server:
$ psql postgres -c "CREATE database mytest;"
and create the same triggers and functions in it:
$ ../bin/sym -p client.properties --run-ddl create_sample.xml
$ ./create_func.sh
Run the replication process on the second server.
$ ../bin/sym -p client.properties --port 9090 --server
At this stage, our databases have not yet been replicated, replication has not begun. Let's rate it. go to the first server and type:
$ psql mytest -c "select * from t2;"
This command will bring us the contents of the table on the root node, there will be one record created by the insert_sample.sql script, which we run only on the root node.
Go to the second server, type the same command there. The table will be empty.
Now the most basic thing: we throw an initial set of data on the client, entering this command from the first server, and thus we start replication!
$ ../bin/sym -p root.properties --reload-node 1
In a couple of seconds we will have the same contents on table t2 on the second server! Hooray!
If something went wrong with you, for example, you entered the wrong IP address of the server for registration, then the easiest thing to do is to kill our database (along with all 26 SymmetricDS tables) and start everything from the beginning. To do this, call the command on both servers:
$ psql postgres -c "drop databse mytest;"
and start again from this line (look for it above :)
$ psql postgres -c "CREATE database mytest;"
Real load simulation
Let's write a perl script to upload information to a table on the root server.
To simulate the actual load, it will produce one hundred UPDATE per second, in a loop.
test.pl
#! / usr / bin / perl
use DBI;
my $ dbh = DBI-> connect ("DBI: Pg: dbname = mytest", "sa", "sa");
for (my $ idval = 5000; $ idval <9000; $ idval ++)
{
$ dbh-> do ("insert into t2 (id, mydata, intval) values ββ($ idval, 'some data', $ idval)");
}
my $ newval = 3434;
my $ interval = 0;
for (my $ j = 0; $ j <100000; $ j ++)
{
for (my $ idval = 5000; $ idval <9000; $ idval ++)
{
$ newval ++;
$ interval ++;
$ dbh-> do ("UPDATE t2 set intval = $ newval where id = $ idval");
printf ("UPD [% 04d] id =% d val =% d \ r", $ j, $ idval, $ newval);
if ($ interval == 100)
{
$ interval = 0;
`ping localhost -w 1> / dev / null 2> & 1`;
}
}
}
$ dbh-> disconnect;
print "\ nOK \ n";
By running this script on root node, the table will be constantly changing. On the second server, we will be able to see how the changes are sent there, with some delay, for example, like this:
$ psql mytest -c "select * from t2 where id=5000;"
The intval value for this table row will lag behind what is printed by the script running on the first server.
Problems under high load
In this example, we, in fact, did not take into account the problems of high load on the database server. When replicating, SymmetricDS keeps a record of all data sent to the remote database in its sym_data table. This table contains a separate entry for each insert / update that runs on the root database. Thus, with high traffic (say, hundreds of inserts per second), the size of the sym_data table starts to grow. The growth of the table leads to an increase in replication overhead.
To solve this problem, you need to configure the Purge Service in Symmetric DS. The key parameters of this service that need to be specified in the root.properties file are:
start.purge.job
job.purge.period.time.ms (how often Purge Job will run)
purge.retention.minutes (how long the history of changes already sent to the second database will be kept)
The last two parameters directly affect the maximum number of entries in the sym_data table: it will be equal to purge.retention.minutes * insert / update number per minute.
The PurgeService should automatically start along with other services, such as Push and Poll (which provide data synchronization in the databases). But, if for some reason the Purge Service does not start, this leads to an excessive growth of the service tables SymmetricDS, so you need to start the Purge Service manually, preferably during periods of low load on the database server:
../bin/sym βp root.properties βX
Successes.