Ruby on Rails + legacy_migrations: one-way data synchronization between two projects

This article describes a solution to one non-trivial task: automatic one-way synchronization of data between the databases of two projects, using Ruby on Rails, the legacy_migrations gem, and a reasonably skilled pair of hands.

Initial situation

There is a high-load project that was developed over three years in several stages without serious refactoring, so the code has become bloated and the technologies it uses are noticeably outdated. It was decided to rewrite the project from scratch using current technologies.

Old project:

Rails 2.3.4 (later updated to 2.3.12 + dependency control via bundler)
MySQL 5
Sphinx, Delayed Job, AR sendmail, Interlock + memcached for caching, plus other routine components

New project:

Rails 3.1.0.rc5 (for now)
PostgreSQL 8.4 (possibly 9 later)
It is too early to talk about the details, but the plan includes Solr, Redis, and Resque

The main difficulty was synchronizing the database content while keeping the freedom to move away from the old database architecture in the new project. Many options were considered, and the one finally chosen made it possible to build an automatic content synchronization system that preserves record ids and timestamps, adds new records, and updates existing ones. At the same time, it does not require the new schema to be a strict copy of the existing database.

The legacy_migrations gem

While searching for a semi-ready solution, I found (not without help from the ror2ru group) a convenient gem written back in the Rails 2.3.x days. It lets you transfer the contents of arbitrary attributes of one model to another and perform arbitrary operations on them along the way. It was a good start, but testing revealed significant shortcomings:

record ids were not preserved (records were inserted sequentially, starting from id = 1)
record timestamps (created_at, updated_at) were not preserved
re-running the rake task duplicated the data in the table under new ids
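All three shortcomings come down to one requirement: the transfer must behave as an upsert keyed on the old record's id. The plain-Ruby sketch below shows that behaviour under simplifying assumptions: records are hashes, the target table is an in-memory Hash keyed by id, and `sync_records` is a hypothetical helper, not the code from the fork.

```ruby
require 'time'

# Hypothetical helper (not the fork's code): one-way sync of source rows into
# an in-memory "table" (a Hash of id => row; rows are Hashes with symbol keys).
# Ids and timestamps are copied verbatim, so re-running the sync updates the
# existing rows instead of inserting duplicates under new ids.
def sync_records(source_rows, target, attributes)
  source_rows.each do |row|
    copied = attributes.each_with_object({}) { |attr, h| h[attr] = row[attr] }
    copied[:id] = row[:id]                 # keep the original id
    copied[:created_at] = row[:created_at] # keep the original timestamps
    copied[:updated_at] = row[:updated_at]
    # Insert when the id is new, otherwise merge into the existing row.
    target[row[:id]] = (target[row[:id]] || {}).merge(copied)
  end
  target
end

old_rows = [
  { id: 17, title: 'First', created_at: Time.parse('2008-05-01 10:00:00 UTC'),
    updated_at: Time.parse('2009-01-02 12:00:00 UTC') },
  { id: 42, title: 'Second', created_at: Time.parse('2010-03-04 08:30:00 UTC'),
    updated_at: Time.parse('2010-03-04 08:30:00 UTC') }
]

new_table = {}
sync_records(old_rows, new_table, [:title])
sync_records(old_rows, new_table, [:title]) # second run: nothing is duplicated
```

After both runs `new_table` still holds exactly the ids 17 and 42 with their original timestamps, and an edited title in the source simply overwrites the target row on the next run.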

It is worth explaining why the ids must be kept. An alternative would be to add an attribute such as old_id and use it to remap references to the new ids. But the task is not just to create a new project; it is to replace the old project with the new one, which implies, at the very least, that all URLs stay the same. To understand how important this is, just catch an SEO specialist on the street and tell them you want to change the URLs of a live project. An unequivocal reaction will follow, in forms ranging from fainting to psychosis :)
To fix these shortcomings, I made a fork of the gem; the changes can be reviewed in that fork's repository.
Note that transferring data through ActiveRecord models has one major drawback: poor performance. In my case, however, the database was relatively small (about 400 MB in total) and the server powerful enough that this was no reason to abandon the approach.

Transfer process

First of all, I was lucky: when the old project was moved (with Rails upgraded from 2.3.4 to 2.3.12 along the way), the databases of both projects ended up on the same server, which is ideal for periodic synchronization.

Installing the necessary gems

First, make sure the adapters for both DBMSs are listed in the Gemfile:

    gem 'mysql2'
    gem 'pg'

There are two ways to install legacy_migrations: use the fork, which already contains the necessary changes, or use the original gem unpacked locally so that you can patch it by hand. The corresponding Gemfile lines are, respectively (use only one of them):

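    # option 1: the fork with the changes already applied
    gem 'legacy_migrations', :git => 'git://github.com/Antiarchitect/legacy_migrations.git'

    # option 2: the original gem unpacked into vendor/gems for manual patching
    gem 'legacy_migrations', :path => 'vendor/gems/legacy_migrations-0.3.7'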

For the second option, the gem first has to appear at that path, so run the following command in the root of your application; after that you can modify the code yourself:

 gem unpack legacy_migrations --target vendor/gems

How it works

legacy_migrations works as follows: the project must contain both the models we read the data from and the models into which we mirror those records. The config/database.yml of the new project therefore looks something like this:

    production:
      adapter: postgresql
      encoding: utf8
      database: newapp_production
      username: postgres
      password: somecomplicatedpassword

    legacy:
      adapter: mysql2
      encoding: utf8
      database: oldapp_production
      username: root
      password: anothercomplicatedpassword

Here legacy is the configuration for the old project's database (the name can be anything).
Next, look through the models of the old project and pick an unused prefix to avoid confusion later. In my case it was "Old". Then create the app/models/old directory and put an abstract class there from which all the other legacy models will inherit. Example app/models/old/old_base.rb:
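A model selects its connection from this file by the name of a top-level group. The stdlib sketch below parses an abridged copy of the configuration above (passwords omitted) and picks a group by name; `connection_spec` is a hypothetical helper that only mirrors this lookup, not ActiveRecord's own resolution code.

```ruby
require 'yaml'

DATABASE_YML = <<~YAML
  production:
    adapter: postgresql
    encoding: utf8
    database: newapp_production
    username: postgres
  legacy:
    adapter: mysql2
    encoding: utf8
    database: oldapp_production
    username: root
YAML

# Hypothetical helper: return the settings group with the given name,
# the way a named connection is picked out of config/database.yml.
def connection_spec(yaml_text, name)
  config = YAML.safe_load(yaml_text)
  config.fetch(name.to_s) { raise KeyError, "no '#{name}' group in database.yml" }
end

legacy = connection_spec(DATABASE_YML, 'legacy')
puts legacy['adapter']   # mysql2
puts legacy['database']  # oldapp_production
```

A typo in the group name fails loudly here, which is also what you want from the real configuration: a misspelled name should not silently fall back to the production database.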

    class OldBase < ActiveRecord::Base
      self.abstract_class = true
      establish_connection 'legacy'
    end

Here the argument 'legacy' must match the name of the settings group for the old database in config/database.yml. As a result, all models inherited from OldBase (rather than directly from ActiveRecord::Base) know which database to connect to. Here is an example of one such model:

    class OldNewsDoc < OldBase
      set_table_name 'news_docs'
    end

Since our class names now carry a prefix that the table names lack, the table name must be specified explicitly.
For the classes in app/models/old to be autoloaded, add this path in config/application.rb:
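Both points, the inherited connection and the explicit table name, can be illustrated with a toy model in plain Ruby. It is not ActiveRecord: `ToyBase`, `connection_name`, the 'primary' default, and the naive pluralization (just appending "s") are simplifications assumed for this illustration, and `OldNewsRubric` is a hypothetical model. A connection set on the abstract parent is found by every subclass, while the default table name is derived from the class name, so the prefix leaks into it.

```ruby
# Toy model (not ActiveRecord): class-level settings inherited by subclasses.
class ToyBase
  class << self
    def establish_connection(connection)
      @connection_name = connection
    end

    # Walk up the class hierarchy until some class has a connection set.
    def connection_name
      @connection_name ||
        (superclass.respond_to?(:connection_name) ? superclass.connection_name : 'primary')
    end

    def set_table_name(table)
      @table_name = table
    end

    # Default: snake_case class name plus a naive "s" plural (simplified).
    def table_name
      @table_name || name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase + 's'
    end
  end
end

class OldBase < ToyBase
  establish_connection 'legacy'
end

class OldNewsDoc < OldBase    # table name set explicitly
  set_table_name 'news_docs'
end

class OldNewsRubric < OldBase # table name left to the default
end

puts OldNewsDoc.connection_name # legacy
puts OldNewsDoc.table_name      # news_docs
puts OldNewsRubric.table_name   # old_news_rubrics
```

Without the explicit name, the prefixed model would look for an old_news_rubrics table that does not exist in the old database.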

    module NewApp
      class Application < Rails::Application
        ...
        config.autoload_paths += %W(#{config.root}/app/models/old)
        ...
      end
    end
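The extra autoload path is needed because Rails finds a missing constant by converting its name to a file name and searching the autoload paths for that file. The sketch below is a simplified stand-in for that naming convention (the real logic in ActiveSupport also handles namespaces and more); `underscore` here is a reduced version and `autoload_file_for` is a hypothetical helper.

```ruby
# Simplified Rails naming convention: CamelCase constant -> snake_case file.
def underscore(const_name)
  const_name.gsub(/([A-Z\d]+)([A-Z][a-z])/, '\1_\2')
            .gsub(/([a-z\d])([A-Z])/, '\1_\2')
            .downcase
end

# Hypothetical helper: the first autoload path that contains the expected file.
def autoload_file_for(const_name, autoload_paths, existing_files)
  file = "#{underscore(const_name)}.rb"
  autoload_paths.map { |dir| File.join(dir, file) }
                .find { |path| existing_files.include?(path) }
end

paths = ['app/models', 'app/models/old']
files = ['app/models/news_doc.rb',
         'app/models/old/old_base.rb',
         'app/models/old/old_news_doc.rb']

puts autoload_file_for('OldNewsDoc', paths, files)         # app/models/old/old_news_doc.rb
p autoload_file_for('OldNewsDoc', ['app/models'], files)   # nil: not found without the extra path
```

This is also why each legacy model should live in a file named after its class, e.g. OldNewsDoc in app/models/old/old_news_doc.rb.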

After that, all that remains is to create a rake task, for example lib/tasks/legacy.rake:

    require 'legacy_migrations'

    namespace :legacy do
      namespace :transfer do
        desc 'Transfers News Docs from onru to onru2'
        task :news_docs => :environment do
          transfer_from OldNewsDoc, :to => NewsDoc do
            from :id, :to => :id
            from :updated_at, :to => :updated_at
            from :created_at, :to => :created_at
            from :news_rubric_id, :to => :news_rubric_id
            from :title, :to => :title
            from :annotation, :to => :annotation
            from :text, :to => :text
          end
        end
      end
    end
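Each `from :attr, :to => :attr` line in the task above declares one column mapping. To show what such a block amounts to, here is a tiny plain-Ruby imitation of the DSL, assuming source records are hashes. `TransferMap` and `transfer_rows` are hypothetical names, the renamed `:body` column is an invented example, and the real gem does considerably more (it works with ActiveRecord models and lets you transform values).

```ruby
# Toy imitation of a `transfer_from` mapping block (not legacy_migrations itself).
class TransferMap
  attr_reader :mappings

  def initialize(&block)
    @mappings = []
    instance_eval(&block) # evaluate the `from ...` calls against this object
  end

  # Record one mapping: source attribute -> target attribute.
  def from(source_attr, options)
    @mappings << [source_attr, options.fetch(:to)]
  end

  # Build the target attribute hash for one source record.
  def apply(record)
    @mappings.each_with_object({}) { |(src, dst), out| out[dst] = record[src] }
  end
end

def transfer_rows(rows, &block)
  map = TransferMap.new(&block)
  rows.map { |row| map.apply(row) }
end

rows = [{ id: 7, title: 'Hello', text: 'Body', legacy_flag: true }]
result = transfer_rows(rows) do
  from :id, :to => :id
  from :title, :to => :title
  from :text, :to => :body # hypothetical renamed column in the new schema
end
p result.first # only :id, :title and :body; :legacy_flag is not copied
```

Because only the listed attributes are copied, the new schema is free to rename or drop columns, which is exactly the freedom the migration needed.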

That's all: the task can now be run like this:

 bundle exec rake legacy:transfer:news_docs RAILS_ENV=production

For more about what legacy_migrations can do, see the post by the gem's author.

Process automation

There are many options here: the rake task already exists, and how it gets launched is a secondary matter. Still, I would like to suggest the approach to periodically running project tasks that I like best.

The whenever gem

whenever is a very convenient gem for running application tasks automatically via cron. It integrates easily with Capistrano, lets you adapt how the basic job types (a rake task, a runner script, or a shell command) are launched to a specific production environment, and lets you define your own job types.
To use it, add whenever to the Gemfile:

 gem 'whenever', :require => false

then run the following command from the application root:

 wheneverize .

and put something like the following into the newly created config/schedule.rb:

    job_type :rake, "rvm use ree && cd :path && RAILS_ENV=:environment bundle exec rake :task :output"

    if environment == 'production'
      every :day, :at => '2am' do
        rake "legacy:transfer:news_docs"
      end
    end

I redefined the rake job type for my environment: I use a custom RVM installation with REE as the Ruby interpreter (I will switch to 1.9.2 as soon as Rails 3.1 becomes stable; for now there are some problems), and I also use Bundler, so every binary or script has to be run through bundle exec.
whenever integrates into Capistrano just as easily and naturally (deploy.rb):
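The job_type template is a string with placeholders (:path, :environment, :task, :output) that whenever fills in for each job. The sketch below imitates that substitution in plain Ruby so you can check what command a custom template produces. `expand_job` and the sample values, including the deploy path, are hypothetical, and the crontab line the real gem writes may differ in wrapping and escaping. The daily 2 a.m. schedule corresponds to the standard cron expression `0 2 * * *`.

```ruby
TEMPLATE = 'rvm use ree && cd :path && RAILS_ENV=:environment bundle exec rake :task :output'

# Replace each :placeholder with its value; unknown placeholders are left as-is.
def expand_job(template, values)
  expanded = template.gsub(/:(\w+)/) do
    key = Regexp.last_match(1).to_sym
    values.key?(key) ? values[key].to_s : Regexp.last_match(0)
  end
  expanded.squeeze(' ').strip # tidy up the gap left by an empty :output
end

command = expand_job(TEMPLATE,
                     path: '/var/www/newapp/current', # hypothetical deploy path
                     environment: 'production',
                     task: 'legacy:transfer:news_docs',
                     output: '')                      # no log redirection
puts "0 2 * * * #{command}"
```

Expanding the template this way before deploying is a cheap check that the rvm, cd, and bundle exec parts line up with the server's environment.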

    require 'whenever/capistrano'
    ...
    # we use Bundler, so whenever itself must be run through bundle exec
    set :whenever_command, "bundle exec whenever"

After the deployment you can admire the beautiful and neat lines in the crontab:

 crontab -l

P.S. I hope this article will be useful to anyone facing a similar problem of transferring data from one project to another.

Andrey Voronkov, Evrone.com.

Source: https://habr.com/ru/post/126001/
