... or how to work with collections correctly.
I want to tell you about the mistakes I have seen on almost every Magento project that had performance problems. Working with Magento, I sometimes have to audit other people's code, so I would like to share experience that will help you improve the performance of your sites and avoid these mistakes in the future.
This article is about Magento 1.*, but what is described also applies to Magento 2.*.
In almost every project where there are performance problems, you can come across something like this:
Wrong:

    $temp = array();
    $collection = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect('*');
    foreach ($collection as $product) {
        $product = $product->load($product->getId());
        $temp[] = $product->getSku();
    }

instead of

Right:

    $temp = array();
    $collection = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect('sku');
    foreach ($collection as $product) {
        $temp[] = $product->getSku();
    }

The reasons for this are very simple:
- The required attributes are missing after the collection is loaded.
- That is how the "programmers" on the Internet do it.
- Extra attributes are loaded on the principle that "it can't hurt".
To understand what is wrong here and what we can do about performance, I suggest concentrating on how collections work:
- EAV / Flat tables
- Cache
- Proper work with collections
And, of course, the conclusions.
EAV / Flat tables
EAV (Entity-Attribute-Value) is a data storage approach in which the entity an attribute belongs to, the attribute itself and its value are kept in separate tables.
In Magento, EAV entities include: products, categories, customers, and customer addresses. The attributes themselves are stored in the eav_attribute table.
Magento has 5 attribute value types: text, varchar, int, decimal and datetime. There is 1 more type, static; it differs from the other 5 in that its value is stored in the same table as the entity.
The attribute table records the type of each attribute, that is, which value table it lives in, so Magento knows where to write it and where to read it from.
Storing values this way makes attribute sets easy to configure (each entity can have its own attributes or lack them entirely), and adding a new attribute is just another row in the database. Adding a new value of 1 attribute for another store is a new row in the value table of that attribute.
How it is stored in the database

Entity:
Product - catalog_product_entity,
Category - catalog_category_entity,
Customer - customer_entity,
Customer address - customer_address_entity
Attribute:
eav_attribute
catalog_eav_attribute
customer_eav_attribute
Value:
*_text
*_varchar
*_int
*_decimal
*_datetime
Flat is the approach we are all used to: everything lies in one place, and no additional tables are needed to get a product with all its attributes. It takes no extra work: SELECT * FROM table WHERE id = some_id, and that's it.
Of the EAV entities, only categories and products can use the Flat representation.
How it is stored in the database

Product:
catalog_product_flat_1 // *_N store_view
Category:
catalog_category_flat_1 // *_N store_view
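To make the difference between the two layouts concrete, here is a toy sketch in plain PHP arrays. It is not Magento code: the table contents, the attribute ids and the helper name buildFlatRow are made up for illustration. It only shows how the values of one product are scattered across several EAV value tables, and what a single flat row assembled from them looks like.

```php
<?php
// Toy model of EAV storage: NOT real Magento tables, just arrays shaped like them.

// "eav_attribute": attribute id => code and type (the type names the value table)
$eavAttribute = array(
    71 => array('code' => 'name',   'type' => 'varchar'),
    75 => array('code' => 'price',  'type' => 'decimal'),
    96 => array('code' => 'status', 'type' => 'int'),
);

// "catalog_product_entity": static attributes live in the entity row itself
$entity = array('entity_id' => 22, 'sku' => 'demo-sku');

// One value table per type; each row is (entity_id, attribute_id, value)
$valueTables = array(
    'varchar' => array(array(22, 71, 'Demo product')),
    'decimal' => array(array(22, 75, '9.99')),
    'int'     => array(array(22, 96, 1)),
);

// Hypothetical helper: collect all values of one entity into a single flat row.
function buildFlatRow(array $entity, array $eavAttribute, array $valueTables)
{
    $row = $entity; // start from the static columns
    foreach ($eavAttribute as $attributeId => $attribute) {
        // The attribute type tells us which value table to look in.
        foreach ($valueTables[$attribute['type']] as $valueRow) {
            list($entityId, $attrId, $value) = $valueRow;
            if ($entityId === $entity['entity_id'] && $attrId === $attributeId) {
                $row[$attribute['code']] = $value;
            }
        }
    }
    return $row;
}

$flatRow = buildFlatRow($entity, $eavAttribute, $valueTables);
print_r($flatRow);
```

In the EAV layout every extra attribute type means one more table to visit; in the flat layout the same data is a single row that can be read in one go.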
To include an attribute in the Flat table and to enable the use of Flat tables in general, do the following.
In the admin panel under Catalog > Attributes > Manage Attributes: Magento will add an attribute to the Flat table if the attribute has 1 of the following values.
In the admin panel under System > Configuration > Catalog: Magento will use Flat tables for the entities listed below.

Note the following facts:
- Flat tables are used ONLY on category pages, in the product list of a grouped product, and in general wherever a collection is used. They are not used on the product page, in the admin panel, or when the load method is called on a model.
- After Flat tables are enabled, a reindex is required; otherwise Magento will continue to use only EAV tables.
- After Flat tables are enabled, Magento still keeps using EAV, but it also starts copying changes to the Flat table whenever changes are saved.
Why is all this necessary, and why not use the Flat approach everywhere? Look at the summary of pros and cons.

EAV:
+ A more flexible system than Flat
+ When a new attribute is added, there is no need to reindex the data
+ A virtually unlimited number of attributes
+ All attributes are always available
+ Static attributes (sku, created_at, updated_at) are always present in the result, even if they were not requested
- Fatal error: Call to a member function getBackend() when selecting / filtering by a non-existent attribute
- Performance
Flat:
+ Performance
+ Only existing attributes that have been added to the Flat table can be used for selecting / filtering
- A limit on the row size (up to 65,535 bytes, i.e. 85 columns of VARCHAR(255)) and on the number of columns (InnoDB up to 1000, some engines up to 4096)
- Used only when working with collections (EAV is always used when loading a model)
- The result differs from that of an EAV query (static attributes are not included)
- After activation a reindex is required, otherwise EAV tables will be used
- When a new attribute is added, the Flat tables must be reindexed
Cache
Of course, some of you will ask why we need to work out how to speed up database queries, or how collections work at all, if the cache will save us and everything will be cached anyway. My short answer: the cache will not save you. None of the caches available in Magento caches collections automatically, and none of them helps in the custom controllers and models you use, for example, when importing data or calculating something. Besides, before anything ends up in the cache, you first have to produce it somehow, and quickly, to show it to the user.
Types of caches in Magento 1.*:

- Configuration - caches configuration files
- Layout - caches layout files
- Block HTML output - caches phtml templates. By default it is used on the frontend only for the top menu and the footer.
- Translations - caches the csv translation files
- Collections data - caches collections that use the ->initCache(...) method. By default, only the core_store, core_store_group and core_website collections are cached during initialization.
- EAV types and attributes - is supposed to cache EAV attributes, but does not. It is used in 1 method that has never been called since Magento CE 1.4.
- Web services cache - caches api.xml files
- Page Cache (FPC) - caches the whole HTML, but only for CMS, Category and Product pages. It is ignored for the https protocol, for the GET parameter ?no_cache=1, and for the NO_CACHE cookie.
- DDL Cache (hidden) - caches DESCRIBE calls to the database; used in write operations
... and none of them caches collections automatically.
Proper work with collections
To show more clearly why some things should be done differently from what many are used to, I decided to run performance tests of the different approaches. Let's start with the test bench. For testing, I used:

Test bench:
OS X 10.10
3.1 GHz Intel Core i5 (4 cores)
8 GB RAM
Magento configuration:
Magento EE 1.14.0
MySQL 5.5.38
PHP 5.6.2
Content:
3 Categories
2000 Products
2000 CMS pages
Process:
For the tests, an extension with 1 controller and 1 action was created. Each test was run 5 times, and then the average time was calculated. All results are shown in seconds.
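The averaging described above can be sketched as a small standalone helper. This is only an illustration in plain PHP: the function name averageTime and the stand-in workload are assumptions of mine, not part of the test extension used for the article.

```php
<?php
// Hypothetical helper: run a callable several times and return the mean
// wall-clock time in seconds.
function averageTime($callback, $runs = 5)
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        call_user_func($callback);
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}

// Stand-in workload. In a real test this closure would contain the
// collection code being measured.
$workload = function () {
    $temp = array();
    for ($i = 0; $i < 2000; $i++) {
        $temp[] = 'sku-' . $i;
    }
    return $temp;
};

$seconds = averageTime($workload);
printf("%.6f\n", $seconds);
```

Averaging over several runs smooths out one-off effects such as a cold disk or opcode cache on the first request.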
    class Test_Test_IndexController extends Mage_Core_Controller_Front_Action
    {
        public function indexAction()
        {
            $temp = array();
            $start = microtime(true);

            // Init values
            // Loop start
            //     $temp[] = $product->getSku();
            // Loop end
            // ... or some other code snippet under test

            $stop = microtime(true);
            echo $stop - $start;
        }
    }

Pseudo code

Tests
- EAV / Flat with and without model reload
- Collection caching
- Proper use of count() and getSize()
- Proper use of getFirstItem() and setPage(1,1)
EAV / Flat with and without model reload
Looping over the collection, with the models reloaded (load) inside the loop:

    $temp = array();
    $collection = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect(...);
    foreach ($collection as $product) {
        $product = $product->load($product->getId());
        $temp[] = $product->getSku();
    }

Looping over the collection, without loading the models inside the loop:

    $temp = array();
    $collection = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect(...);
    foreach ($collection as $product) {
        $temp[] = $product->getSku();
    }

3 kinds of data selection:
- addAttributeToSelect('*'); // all attributes
- addAttributeToSelect('sku'); // 1 static attribute
- addAttributeToSelect('name'); // 1 standard attribute
Results
As you have probably noticed, the time without reloading the models is several times lower than with reloading. The time is lower still when Flat tables are enabled (that is, there are no unnecessary joins and unions) and only the necessary attributes are selected.
In the first case, we perform the load with a bunch of joins ... and then do it all over again for each model, 2000 times.
In the second case, we select a static attribute (it is in the same table as the product itself), so Magento does not need to make any joins. Therefore the time is lower.
In the third case, Magento needs to join one more table, the one where this attribute is stored.
With Flat tables the picture is the same, and the second and third cases give identical times, because both attributes are in 1 table.
I think the numbers speak for themselves.
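The cost of reloading can also be seen without a stopwatch, simply by counting queries. The sketch below is a toy model in plain PHP, not Magento code: the FakeDb class and its methods are made up, and it assumes a single query per load() call, whereas a real EAV load issues several.

```php
<?php
// Toy stand-in for a database that counts the queries it receives.
class FakeDb
{
    public $queries = 0;
    private $rows = array();

    public function __construct($productCount)
    {
        for ($id = 1; $id <= $productCount; $id++) {
            $this->rows[$id] = array('entity_id' => $id, 'sku' => 'sku-' . $id);
        }
    }

    // One query that returns every product (roughly what loading a collection does).
    public function selectAll()
    {
        $this->queries++;
        return $this->rows;
    }

    // One query per product (a simplified view of calling load() on a model).
    public function selectById($id)
    {
        $this->queries++;
        return $this->rows[$id];
    }
}

// Variant 1: reload every product inside the loop.
$db = new FakeDb(2000);
$skus = array();
foreach ($db->selectAll() as $row) {
    $product = $db->selectById($row['entity_id']);
    $skus[] = $product['sku'];
}
$queriesWithReload = $db->queries;

// Variant 2: use the data the collection already returned.
$db = new FakeDb(2000);
$skus = array();
foreach ($db->selectAll() as $row) {
    $skus[] = $row['sku'];
}
$queriesWithoutReload = $db->queries;

echo $queriesWithReload, ' queries vs ', $queriesWithoutReload, "\n";
```

Both variants end up with the same list of SKUs; the only difference is how many round trips to the database it took to build it.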
Collection caching
Without cache:

    $collection = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect('*');

Using the initCache method:

    $collection = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect('*')
        ->initCache(Mage::app()->getCache(), 'our_data', array('SOME_TAGS'));

Custom caching implementation:

    $cache = Mage::app()->getCache();
    $collection = $cache->load('our_data');
    if (!$collection) {
        // Cache miss: load the items once and store the serialized result
        $collection = Mage::getModel('catalog/product')->getCollection()
            ->addAttributeToSelect('*')
            ->getItems();
        $cache->save(
            serialize($collection),
            'our_data',
            array(Mage_Core_Model_Resource_Db_Collection_Abstract::CACHE_TAG)
        );
    } else {
        // Cache hit: restore the array of items
        $collection = unserialize($collection);
    }
Let's compare a selection without a cache, one using the method that Magento offers, and one using a home-made workaround that I have never seen anywhere else, built on the cache methods used for models. Please note that in all the tests, after running the query, I loaded the data and converted the collection to an array of objects.
Results
Without a cache there are no surprises: everything is as usual.
With Magento's own cache, however, I was surprised to see that the time had actually increased. For EAV, this kind of caching is a pointless exercise anyway, because an EAV collection first loads the entities from the product table (this is the part that is cached), and then selects the attribute values with a separate query and fills the objects. With Flat, everything is fetched from 1 table. Even so, more time is spent working with the cache than querying the database (I tested both the file system and redis; the difference is in the 4th decimal place, that is, on 2k entities there is none). The essence of the initCache method is this: the collection first assembles all its data (pagination, filters, events and so on), builds a hash from the SQL query and looks it up in the cache; if something is found there, it is unserialized, and then all the events and subsequent methods are run. This is the slowest part of the whole process, and it is here that the cache loses to a plain database query. On the other hand, it does not send a request to the database, which makes it less of a concern.
The last example is the cache I knocked together myself, where we cache the final result of the collection and bypass all the events and the reloading of attributes. It works for both EAV and Flat collections.
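The keying idea described above (hash the SQL text, look the hash up, unserialize on a hit) can be sketched in plain PHP. This is a simplified illustration, not the actual initCache implementation: the function name fetchWithQueryCache, the array used as a cache backend and the fake query runner are all assumptions.

```php
<?php
// Hypothetical sketch: cache query results under a hash of the SQL text.
function fetchWithQueryCache($sql, array &$cache, &$dbQueries, $runQuery)
{
    $key = md5($sql);
    if (isset($cache[$key])) {
        // Cache hit: no database round trip, but we still pay for unserialize().
        return unserialize($cache[$key]);
    }
    // Cache miss: run the query once and store the serialized rows.
    $dbQueries++;
    $rows = call_user_func($runQuery, $sql);
    $cache[$key] = serialize($rows);
    return $rows;
}

$cache = array();   // stand-in for a real cache backend
$dbQueries = 0;     // how many times the "database" was actually asked

// Fake query runner that returns 2000 rows regardless of the SQL it is given.
$runQuery = function ($sql) {
    $rows = array();
    for ($id = 1; $id <= 2000; $id++) {
        $rows[] = array('entity_id' => $id, 'sku' => 'sku-' . $id);
    }
    return $rows;
};

$sql = 'SELECT * FROM catalog_product_flat_1';
$first = fetchWithQueryCache($sql, $cache, $dbQueries, $runQuery);
$second = fetchWithQueryCache($sql, $cache, $dbQueries, $runQuery);

echo $dbQueries, "\n";
```

The second call skips the database entirely, yet it still has to unserialize 2000 rows, which is why a cache hit is not automatically faster than a simple query.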
Proper use of count() and getSize()
getSize():

    $size = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect('*')
        ->getSize();

count():

    $size = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect('*')
        ->count();
Results
The difference between the methods is as follows.
count() loads all the objects of the collection, then counts them with the usual count and returns the number to us.
getSize() does not load the collection; instead it issues 1 more query to the database, with no limits, no ordering and no list of selected attributes, only COUNT(*).
An example of when to use each method:
If you need to know whether there are any records in the database, or how many there are, use getSize(). If you need the loaded collection in any case, or it is already loaded, use count(): it returns the number of elements loaded into the collection.
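The difference can be modelled with a toy collection in plain PHP. This is not the Magento class: ToyCollection and its rowsFetched counter are made up for illustration, and getSize() here simply counts an in-memory array where the real method would run a COUNT(*) query.

```php
<?php
// Toy collection that records how many rows it had to turn into objects.
class ToyCollection
{
    public $rowsFetched = 0;
    private $table;
    private $items = null;

    public function __construct(array $table)
    {
        $this->table = $table;
    }

    // Like getSize(): ask the storage for a number only, load nothing.
    public function getSize()
    {
        return count($this->table);
    }

    // Load every row into an object, once.
    public function load()
    {
        if ($this->items === null) {
            $this->items = array();
            foreach ($this->table as $row) {
                $this->items[] = (object) $row;
                $this->rowsFetched++;
            }
        }
        return $this;
    }

    // Like count(): load the whole collection, then count the loaded items.
    public function count()
    {
        $this->load();
        return count($this->items);
    }
}

$table = array();
for ($id = 1; $id <= 2000; $id++) {
    $table[] = array('entity_id' => $id, 'sku' => 'sku-' . $id);
}

$collection = new ToyCollection($table);
$size = $collection->getSize();
$fetchedAfterGetSize = $collection->rowsFetched;

$count = $collection->count();
$fetchedAfterCount = $collection->rowsFetched;

echo $fetchedAfterGetSize, ' vs ', $fetchedAfterCount, "\n";
```

Both calls return the same number; the difference is that count() had to build every object to get it.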
Proper use of getFirstItem() and setPage(1,1)
getFirstItem():

    $product = Mage::getModel('catalog/product')->getCollection()
        ->getFirstItem();

setPage(1,1):

    $product = Mage::getModel('catalog/product')->getCollection()
        ->setPage(1, 1)
        ->getFirstItem();

load():

    $product = Mage::getModel('catalog/product')->load(22);
Results
The problem with getFirstItem() is that it loads the entire collection and then simply returns the first item of a foreach; if there is none, it returns an empty object.
setPage() (which is simply $this->setCurPage($pageNum)->setPageSize($pageSize)) limits the selection to exactly 1 record, which, as you can see, significantly speeds up loading the result.
Even load() is faster than getFirstItem(), but note that load() was slower than selecting one item from the collection. This is because load() always works with EAV tables.
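The effect of the limit can be shown with the same kind of toy model. Again, this is plain PHP and not the Magento class: ToyCollection is made up, and array_slice stands in for the LIMIT that a real collection would add to its SQL.

```php
<?php
// Toy collection that records how many rows it had to turn into objects.
class ToyCollection
{
    public $rowsFetched = 0;
    private $table;
    private $items = null;
    private $page = null;
    private $pageSize = null;

    public function __construct(array $table)
    {
        $this->table = $table;
    }

    // Like setPage(): remember the page number and the page size.
    public function setPage($page, $pageSize)
    {
        $this->page = $page;
        $this->pageSize = $pageSize;
        return $this;
    }

    // Load the rows, applying the page limit if one was set.
    public function load()
    {
        if ($this->items === null) {
            $rows = $this->table;
            if ($this->pageSize !== null) {
                $offset = ($this->page - 1) * $this->pageSize;
                $rows = array_slice($rows, $offset, $this->pageSize);
            }
            $this->items = array();
            foreach ($rows as $row) {
                $this->items[] = (object) $row;
                $this->rowsFetched++;
            }
        }
        return $this;
    }

    // Like getFirstItem(): load, then return the first item or an empty object.
    public function getFirstItem()
    {
        $this->load();
        return count($this->items) > 0 ? $this->items[0] : new stdClass();
    }
}

$table = array();
for ($id = 1; $id <= 2000; $id++) {
    $table[] = array('entity_id' => $id, 'sku' => 'sku-' . $id);
}

$all = new ToyCollection($table);
$firstFromAll = $all->getFirstItem();

$limited = new ToyCollection($table);
$firstFromLimited = $limited->setPage(1, 1)->getFirstItem();

echo $all->rowsFetched, ' vs ', $limited->rowsFetched, "\n";
```

Both calls return the same first item, but without the page limit the whole table is loaded just to throw all but one row away.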
Conclusions
Summarizing everything written above, I want to advise everyone working with Magento:
- Never call the load method again on objects obtained from a collection.
- Load only the attributes you need.
- If it suits the project, use Flat tables.
- Use count() to count the items of an already loaded collection, and getSize() to get the total number of records.
- Do not use the getFirstItem() method without setPage(1,1) or similar methods.