πŸ“œ ⬆️ ⬇️

Inheriting tables in Postgresql with Ruby on Rails

Migrating to Postgres Inheritance


What is it and why?


Suppose you have a large news publication that has many different types of materials.


Each type of material has its own model: Topics::Article , Topics::Online , Topics::NewsItem and so on. They will have the same majority of fields, such as title, cover, text, authors. The difference is only in a few specific fields, unique for each type of topic.


Therefore, you do not want to lay them out on separate tables. In addition to the reluctance to create almost completely repetitive tables, there may be several other reasons for this. The need for complex samples with different combinations of these types, UNION waterfalls and polymorphism of connecting models as well.


Under the cat, the experience of organizing similar models within Postgresql, with the result in the form of migration to the inheritance of tables. Shooting a silver bullet in the foot is also present, but without it.


Single table inheritance


The first thing that comes to mind is the classic Single Table Inheritance. In Rails, it automatically turns on if you inherit one model from another.


By creating one common Topics table and adding a type field to it, you can store all classes within a single table.


 class Topic < ActiveRecord::Base end class Topics::Article < Topic end class Topics::NewsItem < Topic end ... 

Common logic (for example, posting material) goes to Topic. Specific to inherited classes.


This simple and run-through scheme has problems with scaling. What if the types of topics are already more than fifteen and each has from two to ten unique fields?


At this point in the DB, purists frown as if from a severe toothache - imagining a table whose lines are always filled by no more than 15-20%.


STI + Jsonb


STI + Jsonb


Jsonb


In Postgresql c 9.4 it is possible to create jsonb type fields. How can this help us?
By adding a data field of this type to topics, we can store all of our additional fields in json keys.


Connect in Rails can be this way:


 class Topics::Online < Topic store_accessor :data, :start_at, :live end 

Now you can do this:


 online = Topics::Online.new(live: true) 

 online.live # => true 

Or directly contact json:


 online['data']['live'] # => true 

Jsonb problems


The enjoyment of success achieved is quickly overshadowed by crutches.


Type conversion


In addition to exotic things like arrays and objects (hashes), jsonb suggests using only Number, String and Boolean for all fields.
For other types of fields, you will have to write additional methods. And if you prefer sugar, then for these fields too.


Predicates:


 def live? live == true end 

More complicated case for datetime:


 def start_at return Time.zone.parse(super) if super.is_a?(String) end 

Here you need to parse the string in time and not break. Because you need to break at the stage of saving data.


Validation of input values


There is no type validation at the database level; you can easily and naturally save this topic:


 online.live = 'Elvis' online.start_at = 'Presley' 

Existing gems, such as activerecord-typedstore, partially solve the problem of string parsing, but do not cope with checking input values ​​at all. Everything needs to be closed from the application by custom validations.


Cumbersome requests


Boolean field:


 scope live, -> { where(live: true) } 

 scope live, -> { where("(params->>'live')::bool = ?", true) } 

And now the dates:


 scope :by_range, ->(start_date, end_date) { where(date: start_date..end_date) } 

 scope :by_range, lambda { |start_date, end_date| where( "to_date(params->>'date', 'YYYY-MM-DD') BETWEEN ? AND ?", start_date, end_date ) } 

In addition to the general monstrosity, this query will also be slower to work, due to the forced use of to_date Postgres.


Uniq


Postgres does not know how to do the usual DISTINCT (.uniq) for records with jsonb, you need to do this:


 .select('DISTINCT ON (tags.id) tags.*') 

No defaults


You have to use different constructions in before_initialize instead of the usual null: false, default: false in the migration.


Connections


Use rail has_many and belongs_to will not work. Need to write something of their own.


At this stage, jsonb received a black mark, and the matter did not reach the trolleybus from a loaf of bread.


Migrating to Postgres Inheritance


Inheritance of tables appeared in Postgres for a long time (most likely it will not be necessary to update the version) and close to the concept of class inheritance.
Only not classes, but tables, and not in Ruby, but in Postgres.


You still have a table of topics, but you are expanding it not through a set of additional fields in this very table, but through additional tables containing only fields unique to each class.


The easiest way to show an example:


 CREATE TABLE topics ( headline text, author_id int ); CREATE TABLE topics_onlines ( status char(2) ) INHERITS (topics); 

By creating topics_onlines we can work with it as with a regular table, which will have all three fields:


 class Topics::Online < Topic # headline, author_id, status end 

It is simple, beautiful and does not require massive rewriting of the code.


Postgres Inheritance + Rails


 SELECT c.tableoid, c.headline, c.author_id FROM topics c 

  tableoid | headline | author_id ----------+-----------+---------- 139793 | Las Vegas | 2174 139793 | Mariposa | 1953 139798 | Madison | 845 

Parent and child tables are linked inside a Postgre via tableoid. We will not see any joins by tableoid in explain, it all works inside Postgres.


From the application, topics_onlines will look like the most common table without inheritance, containing all fields from topics and online-specific fields from topics_onlines .


And this means that from the side of the rail it is only necessary to write a migration creating tables.
And that table inheritance can be used with any framework.


Migration from STI to PGI


To take advantage of all this joy you need to write a migration.


First we need a standard wrapper for sql migration in Rails:


 class CreateInheritanceTablesForTopics < ActiveRecord::Migration def change reversible do |dir| dir.up do ... end end end 

The code is further inserted in place of the dot. In order not to throw a code sheet at once, I will show the migration by chunks.


Trigger for checking for uniqueness on all topic tables


Create, but for now, never use the trigger in Postgresql:


 CREATE OR REPLACE FUNCTION check_for_topic_dups() RETURNS trigger AS $func$ BEGIN PERFORM 1 FROM topics where NEW.id=id; IF FOUND THEN RAISE unique_violation USING MESSAGE = 'Duplicate ID: ' || NEW.id; RETURN NULL; END IF; RETURN NEW; END; $func$ LANGUAGE plpgsql; 

A trigger causes an error if a topic with such an id already exists. This is insurance in case something went wrong.


The most important limitation of PGI is that all child tables do not have indexes and parent table constraints. That is, in this regard, everything really feels like different physical tables.


For our conditions, different types of topics cannot repeat the ID, so this trigger was added. It is optional and needed as insurance.


Creating tables


 Topic.descendants.each do |topic_type| sql = <<-SQL CREATE TABLE #{topic_type.pgi_table_name} ( CHECK (type='#{topic_type}') ) INHERITS (topics); CREATE RULE redirect_insert_to_#{topic_type.table_name} AS ON INSERT TO topics WHERE (type='#{topic_type}') DO INSTEAD INSERT INTO #{topic_type.table_name} VALUES (NEW.*); CREATE TRIGGER check_uniquiness_#{topic_type.table_name} BEFORE INSERT ON #{topic_type.table_name} FOR EACH ROW EXECUTE PROCEDURE check_for_topic_dups(); SQL execute(sql) add_index topic_type.table_name, :id end 


Of course, the migration can be put on a strict diet, and all checks can be pulled out into the Rails themselves.


Add native fields to inherited tables


You can add fields to the new tables using standard migrations:


 Class PopulateTopicsTablesWithFields < ActiveRecord::Migration def change add_column :topics_onlines, :start_at, :datetime add_column :topics_news, :main, :boolean, null: false, default: false end end 

If you are not ready to completely get rid of STI, then the necessary table is written in the topic classes:


 class Topics::Online < Topic self.table_name = :topics_online end 

It remains only to change the type of the scheme to sql:


 # config/application.rb config.active_record.schema_format = :sql 

And everything is ready.


Speed ​​- PGI vs jsonb


Speed ​​- PGI vs jsonb


The final stage would be interesting to evaluate the performance. Since all this was started for the sake of ease of development, not so much time was devoted to testing the PGI speed, but some conclusions can be made.


After the migration, two versions of the application were raised, PGI and the old one with jsonb.
More than 5_000_000 topics in each database.


The number of all topics


The most synthetic example:


PGI:


 Topics::Topic.count (8591.6ms) SELECT COUNT(*) FROM "topics" => 5316226 

Jsonb:


 Topics::Topic.count (8580.1ms) SELECT COUNT(*) FROM "topics" => 5316226 

Do not be surprised at the strange number, the topics were created until the place on the ssd ended.


Number of topics of one type


PGI:


 Gazeta::Topics::Sport::Online.count * (219.5ms) SELECT COUNT(*) FROM "topics_sport_onlines" WHERE "topics_sport_onlines"."type" IN ('Gazeta::Topics::Sport::Online') => 1000000 

Jsonb:


 Gazeta::Topics::Sport::Online.count * (419.0ms) SELECT COUNT(*) FROM "topics" WHERE "topics"."type" IN ('Gazeta::Topics::Sport::Online') => 1000000 

Query on boolean field


The index is not used due to high selectivity.


PGI:


 Gazeta::Topics::Sport::Online.megauho.explain * Gazeta::Topics::Sport::Online Load (1376.2ms) SELECT "topics_sport_onlines".* FROM "topics_sport_onlines" WHERE "topics_sport_onlines"."type" IN ('Gazeta::Topics::Sport::Online') AND "topics_sport_onlines"."megauho" = $1 [["megauho", "t"]] 

Jsonb:


 Gazeta::Topics::Sport::Online.megauho.explain * Gazeta::Topics::Sport::Online Load (5819.6ms) SELECT "topics".* FROM "topics" WHERE "topics"."type" IN ('Gazeta::Topics::Sport::Online') AND ((topics.params->>'megauho')::bool = 't') 

The difference is significant.


Query on boolean field with limit


At least already intersects with the real world.


PGI:


 Gazeta::Topics::Sport::Online.megauho.limit(1000).explain * Gazeta::Topics::Sport::Online Load (9.1ms) SELECT "topics_sport_onlines".* FROM "topics_sport_onlines" WHERE "topics_sport_onlines"."type" IN ('Gazeta::Topics::Sport::Online') AND "topics_sport_onlines"."megauho" = $1 LIMIT 1000 [["megauho", "t"]] 

Jsonb:


 Gazeta::Topics::Sport::Online.megauho.limit(1000).explain * Gazeta::Topics::Sport::Online Load (23.7ms) SELECT "topics".* FROM "topics" WHERE "topics"."type" IN ('Gazeta::Topics::Sport::Online') AND ((topics.params->>'megauho')::bool = 't') LIMIT 1000 

There is a difference.




PGI for low selectivity query


Search by index, returns 123 entries out of a million, Index Scan.


PGI:


 Gazeta::Topics::Sport::Online.megauho.megauho_by_date('2015-12-26').explain * Gazeta::Topics::Sport::Online Load (6.0ms) SELECT "topics_sport_onlines".* FROM "topics_sport_onlines" WHERE "topics_sport_onlines"."type" IN ('Gazeta::Topics::Sport::Online') AND "topics_sport_onlines"."megauho" = $1 AND (topics_sport_onlines.date = '2015-12-26') [["megauho", "t"]] QUERY PLAN ---------- * Index Scan using index_type_megauho_date on topics_sport_onlines (cost=0.42..42.12 rows=20 width=682) Index Cond: (((type)::text = 'Gazeta::Topics::Sport::Online'::text) AND (megauho = true) AND ((date)::text = '2015-12-26'::text)) Filter: megauho (3 rows) 

Jsonb:


 Gazeta::Topics::Sport::Online.megauho.megauho_by_date('2015-12-26').explain * Gazeta::Topics::Sport::Online Load (7.7ms) SELECT "topics".* FROM "topics" WHERE "topics"."type" IN ('Gazeta::Topics::Sport::Online') AND ((topics.params->>'megauho')::bool = 't') AND (topics.params->>'date' = '2015-12-26') QUERY PLAN ---------- * Index Scan using index_type_megauho_date on topics (cost=0.56..217.61 rows=27 width=948) Index Cond: (((type)::text = 'Gazeta::Topics::Sport::Online'::text) AND ((params ->> 'date'::text) = '2015-12-26'::text)) Filter: ((params ->> 'megauho'::text))::boolean (3 rows) 


Properly preparing jsonb


For the task of fully expanding jsonb models can be extremely toxic. Of course, PGI also has its limitations, but for our tasks they were overcome at one time during the migration.


Still, jsonb can be useful for weakly structured data from an external source. These fields do not even need to be defined via the store_accessor , you can simply save them as they are, and then a separate class Builder will collect something useful from them.


For us, sports broadcasts taken from the external api became such data.


')

Source: https://habr.com/ru/post/282676/


All Articles