πŸ“œ ⬆️ ⬇️

Replication from Percona Server for MySQL to PostgreSQL using the pg_chameleon tool

Replication is one of the well-known functions that allow you to create an identical copy of the database. It is supported in virtually any relational database management system (RDBMS). The ability to replicate provides significant benefits, especially high availability and load sharing. But what if replication is required between two databases (DB) with different structure, such as MySQL and PostgreSQL? Is it possible to continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is the replication tool pg_chameleon .


image


For continuous replication of changes, pg_chameleon uses the MySQL replication library, which allows you to get logical copies of rows from a MySQL database that are converted to a jsonb object. The pl / pgsql function in Postgres decodes the jsonb object and reproduces the changes in the Postgres database. To configure this type of replication, the binlog_format variable for the MySQL database must be ROW (string).


A few things to know before setting up this tool:


  1. Tables that need to be replicated must have a primary key.
  2. The tool works on PostgreSQL versions above 9.5 and MySQL systems above 5.5
  3. To set up such replication, the binlog_format variable must be set to ROW.
  4. Python version must be higher than 3.3

When replication starts, pg_chameleon receives data from MySQL in CSV format with grouping into groups of a certain length in order to avoid memory overload. This data is reset to Postgres with the COPY command (copy). If the copy fails, the INSERT (paste) command is executed, which can slow down the process. If the INSERT command fails, the string is lost.


To replicate changes from MySQL, pg_chameleon simulates the behavior of a replica (slave) of MySQL. This creates a schema in Postgres, performs initial data loading, connects to the MySQL replication protocol, copies of rows are stored in the Postgres table. At the same time, the corresponding Postgres functions provide decoding of strings and making changes. This is similar to storing transfer logs in Postgres tables and applying them to the Postgres schema. Creating a Postgres database schema using any data description languages ​​is not required. For tables specified during replication setup, the pg_chameleon tool does this automatically. If you need to convert any types in a specific way, you can specify this in the configuration file.


The following is an exercise that can be experimented with. Use the suggested options if it fully meets your requirements. We carried out such tests on CentOS Linux OS version 7.4.


Environment preparation


Configure Percona Server for MySQL


Install MySQL version 5.7 and add the appropriate parameters for replication.


In this exercise, I installed Percona Server for MySQL version 5.7 using the YUM repository.


yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm yum install Percona-Server-server-57 echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers usermod -s /bin/bash mysql sudo su - mysql 

Pg_chameleon requires configuration of the following parameters in the my.cnf file (file with MySQL server parameters). You can add the following parameters to the /etc/my.cnf file


 binlog_format= ROW binlog_row_image=FULL log-bin = mysql-bin server-id = 1 

Now, after including the above parameters in the my.cnf file, start the MySQL server.


 $ service mysql start 

Obtain a temporary password for the root account from the mysqld.log file and reset the root password using the mysqladmin command.


 $ grep "temporary" /var/log/mysqld.log $ mysqladmin -u root -p password 'Secret123!' 

Now connect to your own MySQL database instance and create a sample schema / tables. I also created an emp table for validation.


 $ wget http://downloads.mysql.com/docs/sakila-db.tar.gz $ tar -xzf sakila-db.tar.gz $ mysql -uroot -pSecret123! < sakila-db/sakila-schema.sql $ mysql -uroot -pSecret123! < sakila-db/sakila-data.sql $ mysql -uroot -pSecret123! sakila -e "create table emp (id int PRIMARY KEY, first_name varchar(20), last_name varchar(20))" 

Create a user to configure replication using the pg_chameleon tool and grant it permissions by following the steps below.


 $ mysql -uroot -p create user 'usr_replica'@'%' identified by 'Secret123!'; GRANT ALL ON sakila.* TO 'usr_replica'@'%'; GRANT RELOAD, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'usr_replica'@'%'; FLUSH PRIVILEGES; 

When creating a user on the MySQL server ('usr_replica' @ '%'), it may be necessary to replace the β€œ%” symbol with the corresponding IP address or host name of the server that pg_chameleon is running on.


Configure PostgreSQL


Install PostgreSQL and run a copy of the database.


To install PostgreSQL version 10.x, follow these steps.


 yum install https://yum.postgresql.org/10/redhat/rhel-7.4-x86_64/pgdg-centos10-10-2.noarch.rpm yum install postgresql10* su - postgres $/usr/pgsql-10/bin/initdb $ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start 

As you can see from the following logs, you need to create a user in PostgreSQL, with which pg_chameleon will be able to write the changed data to PostgreSQL. Also create a target database.


 postgres=# CREATE USER usr_replica WITH ENCRYPTED PASSWORD 'secret'; CREATE ROLE postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica; CREATE DATABASE 

Steps for installing and configuring replication using pg_chameleon


Step 1. In this exercise, I installed the Python language interpreter version 3.6 and pg_chameleon version 2.0.8 by performing the following steps. If you already have the necessary version of the Python interpreter installed, you can skip the steps to install it. We can create a virtual environment if the OS does not include Python version 3.x by default.


 yum install gcc openssl-devel bzip2-devel wget cd /usr/src wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz tar xzf Python-3.6.6.tgz cd Python-3.6.6 ./configure --enable-optimizations make altinstall python3.6 -m venv venv source venv/bin/activate pip install pip --upgrade pip install pg_chameleon 

Step 2. This tool requires a configuration file that stores information about the source and target servers, and a folder for storing logs. To make the pg_chameleon tool create a configuration file template and corresponding folders, use the following command.


 $ chameleon set_configuration_files 

When executing this command, the following results are displayed. They show that this command has created several folders and files in the place from which you started it.


 creating directory /var/lib/pgsql/.pg_chameleon creating directory /var/lib/pgsql/.pg_chameleon/configuration/ creating directory /var/lib/pgsql/.pg_chameleon/logs/ creating directory /var/lib/pgsql/.pg_chameleon/pid/ copying configuration example in /var/lib/pgsql/.pg_chameleon/configuration//config-example.yml 

Copy the sample configuration file to another file, say, default.yml


 $ cd .pg_chameleon/configuration/ $ cp config-example.yml default.yml 

This is what my default.yml file looks like after including all the necessary parameters. In this file, you can optionally specify data type conversion, tables that should be skipped during replication, and data manipulation language events that should be ignored for the selected list of tables.


 --- #global settings pid_dir: '~/.pg_chameleon/pid/' log_dir: '~/.pg_chameleon/logs/' log_dest: file log_level: info log_days_keep: 10 rollbar_key: '' rollbar_env: '' # type_override allows the user to override the default type conversion into a different one. type_override: "tinyint(1)": override_to: boolean override_tables: - "*" #postgres destination connection pg_conn: host: "localhost" port: "5432" user: "usr_replica" password: "secret" database: "db_replica" charset: "utf8" sources: mysql: db_conn: host: "localhost" port: "3306" user: "usr_replica" password: "Secret123!" charset: 'utf8' connect_timeout: 10 schema_mappings: sakila: sch_sakila limit_tables: # - delphis_mediterranea.foo skip_tables: # - delphis_mediterranea.bar grant_select_to: - usr_readonly lock_timeout: "120s" my_server_id: 100 replica_batch_size: 10000 replay_max_rows: 10000 batch_retention: '1 day' copy_max_memory: "300M" copy_mode: 'file' out_dir: /tmp sleep_loop: 1 on_error_replay: continue on_error_read: continue auto_maintenance: "disabled" gtid_enable: No type: mysql skip_events: insert: # - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo delete: # - delphis_mediterranea #skips deletes on schema delphis_mediterranea update: 

Step 3. Create a replica (target database) using the command:


 $ chameleon create_replica_schema --debug 

The above command creates the schema and nine tables in the PostgreSQL database specified in the .pg_chameleon / configuration / default.yml file. These tables are required to manage replication from the source database to the target database. The same can be seen in the next journal.


 db_replica=# \dn List of schemas Name | Owner ---------------+------------- public | postgres sch_chameleon | target_user (2 rows) db_replica=# \dt sch_chameleon.t_* List of relations Schema | Name | Type | Owner ---------------+------------------+-------+------------- sch_chameleon | t_batch_events | table | target_user sch_chameleon | t_discarded_rows | table | target_user sch_chameleon | t_error_log | table | target_user sch_chameleon | t_last_received | table | target_user sch_chameleon | t_last_replayed | table | target_user sch_chameleon | t_log_replica | table | target_user sch_chameleon | t_replica_batch | table | target_user sch_chameleon | t_replica_tables | table | target_user sch_chameleon | t_sources | table | target_user (9 rows) 

Step 4. Add the source database data to pg_chameleon using the following command. Specify the name of the source database as specified in the configuration file. In this example, the source database name is mysql, and the target is Postgres database, defined as pg_conn.


 $ chameleon add_source --config default --source mysql --debug 

After executing this command, you will see that the source database data is added to the t_sources table.


 db_replica=# select * from sch_chameleon.t_sources; -[ RECORD 1 ]-------+---------------------------------------------- i_id_source | 1 t_source | mysql jsb_schema_mappings | {"sakila": "sch_sakila"} enm_status | ready t_binlog_name | i_binlog_position | b_consistent | t b_paused | f b_maintenance | f ts_last_maintenance | enm_source_type | mysql v_log_table | {t_log_replica_mysql_1,t_log_replica_mysql_2} $ chameleon show_status --config default Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay ----------- ------------- ------ -------- ------------ ---------- ----------- ------------ ------------- 1 mysql mysql ready Yes N/AN/A 

Step 5. Initialize the replica (target database) using the following command. Specify the source database from which changes are replicated to the PostgreSQL database.


 $ chameleon init_replica --config default --source mysql --debug 

Initialization includes the following tasks on the MySQL server (source).


  1. Clear the table cache and set a read-only lock
  2. Get the coordinates of the source database
  3. Copy the data
  4. Unlock

The above command automatically creates a schema of the target Postgres database.
In the default.yml file, we mentioned the following schema mappings (schema_mappings).


 schema_mappings: sakila: sch_sakila 

A new scott scheme is now created in the target database db_replica.


 db_replica=# \dn List of schemas Name | Owner ---------------+------------- public | postgres sch_chameleon | usr_replica sch_sakila | usr_replica (3 rows) 

Step 6. Now start replication using the following command.


 $ chameleon start_replica --config default --source mysql 

Step 7: Check replication status and errors using the following commands.


 $ chameleon show_status --config default $ chameleon show_errors 

Here is the state of replication:


 $ chameleon show_status --source mysql Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay ----------- ------------- ------ -------- ------------ ---------- ----------- ------------ ------------- 1 mysql mysql running No N/AN/A == Schema mappings == Origin schema Destination schema --------------- -------------------- sakila sch_sakila == Replica status == --------------------- --- Tables not replicated 0 Tables replicated 17 All tables 17 Last maintenance N/A Next maintenance N/A Replayed rows Replayed DDL Skipped rows 

Now you can see that the changes are constantly replicated from the MySQL database to the PostgreSQL database.


Step 8. To verify, you can insert an entry into the MySQL database table that we created to verify replication in the Postgres database.


 $ mysql -u root -pSecret123! -e "INSERT INTO sakila.emp VALUES (1,'avinash','vallarapu')" mysql: [Warning] Using a password on the command line interface can be insecure. $ psql -d db_replica -c "select * from sch_sakila.emp" id | first_name | last_name ----+------------+----------- 1 | avinash | vallarapu (1 row) 

The above log shows that the record inserted into the MySQL table was replicated to the table in the Postgres database.


You can also add multiple source databases for replication to a Postgres target database.


Link


From this documentation you can get information about the many additional features available in pg_chameleon.


')

Source: https://habr.com/ru/post/425705/


All Articles