ActiveRecord a little about rakes, Relations and indices

I want to tell you about sore: about working with AR in general and with Relation in particular; warn against standard garden products that can easily ruin your life and make the code slow and voracious. The narration will be based on Rails 3.2 and ActiveRecord of the same spill. In Rails 4, of course, a lot of new and useful things, but you still need to switch to it, and the foundation is the same anyway.

This material, for the most part, is intended for beginners, because the author is very painful to look at extracting the contents of entire tables into memory in the form of ActiveRecord objects and other shooting of limbs when using AR . Developers who know Zen, the topic is unlikely to benefit, they can only help, adding to it with their examples and edifications.

How many times have they told the world ...

If you started working with Relation (and with any ActiveRecord object in general), then you need to clearly represent one thing: at what point we “embody” the sample, that is, at what point we stop constructing the SQL query. In other words: when data is sampled and we go to in-memory processing. Why is it important? Yes, because it is awkward:

Product.all.find{|p| p.id == 42}

It can hang the server, pick up all the RAM and do many more dirty tricks. And the same, but in other words:
')

 Product.find(42)

will work quickly and without consequences. This way, find and find is not at all the same thing! Why? Yes, because in the first case, we said Product.all and shot ourselves, because it means extracting the entire contents of the products table and building an AR object for each row, creating an array of them, and then looking for it, which is the class method Array (generally speaking, find from Enumerable , but these are details). In the second case, everything is much better: find is the AR method and is intended for searching by pk . That is, we generate a request

 SELECT * FROM products WHERE products.id = 42;

We execute it, we receive one line and all.

What is good and what is bad

Now, having understood why working with AR is a big responsibility, let's figure out how not to shoot yourself in the foot. This is quite simple: you need to use the methods that AR gives us. Here they are: where, select, pluck, includes, joins, scoped, unscoped, find_each and a few more, which can be found in the documentation or in the next hub . But the better not to use the list will be very difficult and, at the same time, very simple: it is undesirable to use everything else, since almost all the remaining variety of methods turns Relation into an Array with all the ensuing consequences.

Simple recipes

Now, I will give a few standard and not very designs that make life easier, but which are very often forgotten. But before asking the reader a question: remember the function has_many. Think about what parameters you know and what are you actively using? List them in your mind, count ... and now the question: do you know how many of them really are?

Answer

24 pieces in Rails3 and 12 in Rails4. The 12pcs difference is made up of methods like where, group , etc., as well as methods for working with pure SQL, which in Rails4 are passed in a block, not in a hash.

Why did I ask this? Yes, to very roughly assess your level and say that if most of the options you know, then the following is unlikely to bring you new knowledge. This assessment is very conditional, therefore, dear reader, do not be angry much, if it seemed to you to be ridiculous / untenable / strange / etc (underline the necessary).

Recipe number one

So now let's go in order. About update_attributes and update_attribute everyone knows (or not all?). The first is to massively update fields with calling validations and callbacks. Nothing of interest. The second one skips all validations, starts callbacks, but can update the value of only one selected field (someone prefer save (validate: false) ). But about update_column and update_all for some reason are often forgotten. This method skips both validations and callbacks and writes directly to the database without any preliminary caresses.

Recipe number two

The comments reminded about the wonderful method of touch . They also often forget about him and write something like

 @product.updated_at = DateTime.now @product.save

 @product.update_attribute(:updated_at, DateTime.now)

Although, for good, for such purposes it is easier to do this:

 @product.touch(:updated_at) #

In addition, touch has its own after_touch callback , as well as the option : touch is present in the belongs_to method.

How to iterate correctly

The hub already talked about find_each, but I can't help but mention it again, because

 product.documents.map{…}

and they are isomorphic, there are a little more than everywhere. The problem with ordinary iterators applied to Relation is only one: they pull everything from the database at once. And this is terrible. In contrast, find_each , by default, drags 1000 pieces at a time and it's just great!

UPD: as already noted in the comments, all methods that are not uniquely projected onto raw-sql are delegated to to_a because of what the entire query is retrieved into memory and working with it is no longer on the DB side, but on the Ruby side.

Tip about default_scope

Wrap the contents of default_scope in a block. Example:

 default_scope where(nullified: false) # ! default_scope { where(nullified: false) } #

What is the difference? The first option is executed right at the server start and if the nullified field was not found in the database, then the server will not take off. The same applies to migrations - they will not pass due to the lack of a field, which, most likely, we just want to add. In the second case, due to the fact that Ruby is lazy, the block will be executed only at the moment of accessing the model and the migration will be performed normally.

Has_many through

Another common patient is

 product.documents.collect(&:lines).flatten

here the product has many documents that have many lines. It often happens that you want to get all the lines of all documents related to the product. And in this case, create the above construction. In this case, you can recall the through option for reports and do the following for the product:

 has_many :lines, through: documents

and then execute

 product.lines

It turns out and clearer and more efficient.

Little about JOIN

Continuing on the topic of joins, let's remember about includes . What is special about it? Yes that is LEFT JOIN . Quite often I see that the left / right join is written explicitly.

 joins("LEFT OUTER JOIN wikis ON wiki_pages.wiki_id=wikis.id")

This of course also works, but pure SQL in RoR was always not in high esteem.

Also, without departing from the cash register, it is necessary to remind about the difference of values in joins and where when used together. Suppose we have a users table, and various entities, for example, products have an author_id field and an author relational report, which has a users table.

 has_one :author, class: 'User', foreign_key: 'author_id' #  ,

The following code for this case will not work

 products.joins(:author).where(author: {id: 42})

Why? Because joins indicates the name of the relation that joins, and where the condition is imposed on the table and you need to say

 where(users: {id: 42})

You can avoid this by explicitly specifying 'AS author' in the join, but this will again be pure SQL.

Next, look at joins from a different angle. Whatever we do not join, in the end we get the objects of the class with which it all began:

 Product.joins(:documents, :files, :etc).first

In this case, we get the product regardless of the number of joins. Some people are saddened by this behavior, since they would like to get fields from the tables. And they start doing the same query from the other side: take documents, join them with products, write pure SQL for communication with other entities, generally invent the bicycle when the correct and logical code was written at the very beginning. Therefore, I recall the very basis:

 Product.joins(:documents, :files, :etc).where(...).pluck('documents.type')

Here we get an array with the desired field from the database. Pros: minimum requests, no AR objects are created. Minuses: in Rails 3 pluck takes only 1 (one) parameter and this

 pluck('documents.type', 'files.filename', 'files.path')

can only be done in Rails 4.

Build reports

We now turn to the consideration of working with the build- nd relation. In general, everything is quite simple:

 product.documencts.build(type: 'article', etc: 'etc').lines.build(content: '...')

After calling product.save , we will save all associations along with validations, preference and courtesans. In all this joyful action there is one nuance: all this is good when the product is not readonly and / or there are no other restrictions on preservation. In such cases, many are satisfied with the garden, similar to the garden with joins in the example above. That is, create a document , bind it to the product and build the lines for the document. It turns out that the default behavior, which is usually tied to error handling, does not work. Therefore, in the appendage, all this is immediately surrounded with crutches, forwarding errors and it turns out pretty disgusting. What to do in this case? We need to remember about autosave and understand how it works. Without going into details I will say that it works on callback . Therefore, there is a way to keep the relation for the above described product:

 product.autosave_associated_records_for_documents

In this case, the document will be saved, its callbacks will be called to save the lines, etc.

Some words about indexes

Lastly, I need to say about the indexes, because many people have beaten their heads against hard objects because of problems on the basis of indexes. Immediately I apologize for interfering with ActiveRecord and the possibilities of the database, but according to my personal conviction: it is impossible to work well with AR, not realizing what is happening at this moment on the side of the database.

Problem one

For some reason, many people believe that the order for Relation does not depend on which column we sort by. A variation on this misconception is the lack of understanding of the difference between order Relation and order Array . Because of this, you can meet the default_scope with an order for the VARCHAR field and questions in the spirit: “Why is it that your page loads so slowly? There are only a couple of records retrieved from the database! ”. The problem here is that defaulting sorting is damn expensive if we don't have an index on this column. By default, AR sorts by pk . This happens when we do.

 Products.first

But pk has an index almost always and there are no problems. But when we say that it will do order (: name) for any access to the model, problems begin.
For reference : if you explain “on the fingers”, then when sorting by the indexed column, the real sorting does not occur, it is already present in the database and the data is immediately sent in the correct order.

Problem two

Composite indexes. Not everyone knows about them and even fewer people know why they are needed. In short, a composite index is an index based on two or more DB fields. Where can it come in handy? Two frequent places of use:

polymorphic associations
intermediate table of relations "many to many".

About polymorphic links have been told here . For them, very often, it is convenient to create a composite index. Here is a slightly updated example from off-line :

 class CreatePictures < ActiveRecord::Migration def change create_table :pictures do |t| t.string :name t.integer :imageable_id t.string :imageable_type t.timestamps end add_index :pictures, [:imageable_id, :imageable_type] #     end end

Here are a few words about the difference between the ordinary and composite index. Further, I will not go into details, because the topic is for a separate hub. Besides, before me everything was painted.
Now about the intermediate table of relations. All known HBTM . Here, in some cases, it is appropriate to hang the composite index on assemblies_parts (see the HBTM link). But we must remember that the sequence of fields in the composite index is known. Details here .

Problem three

"Indices are needed everywhere!" It occurs not so often, but causes terrible brakes of everything and everything. It must be remembered that the index is not a panacea and guaranteed x10-x100 to speed, but a tool that needs to be used in the right places, and not to wave it over your head and shove it into each hole. Here you can read about the types of indexes, and here you can find out why they are generally needed.

Behind this all

Thank you for reading to the end. Write about typos and inaccuracies in HP, I will be glad to fix it. I would also be happy if you share your experience and what you need to remember and what is better to use in different situations during development.

Source: https://habr.com/ru/post/242225/

All Articles