Let me say up front that this is not a tutorial and not a proclamation of a new paradigm :) It is simply a decision I have arrived at, and I would like to put it up for a broad and honest discussion.
Now to the point. Imagine an ORM written in PHP that describes a Posts model, which has many-to-many relations, through junction tables, with other models: Comments, Tags, and Categories. The question is: what is the better way to fetch the associated data, all at once or with lazy loading?
In the database community, the prevailing view is that it is better to fetch the data in a single query with a bunch of joins: the DBMS is smart, it will figure out how to execute it quickly, and the fewer queries hit the database, the better. In my practice, however, there have been cases on high-load projects with large tables where several simple queries ran faster than one big query with several joins.
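To make the comparison concrete, here is a rough sketch of the two variants using plain PDO. All table and column names (posts, comments, tags and the post_comment / post_tag junction tables) are assumptions of mine for illustration, not part of any particular ORM, and $pdo / $postId are taken as given.

```php
<?php
// Sketch only: $pdo is an existing PDO connection, $postId an existing post id.
// Table and column names are assumed for illustration.

// Variant 1: one big query with several joins. Joining two independent
// many-to-many relations multiplies rows (a Cartesian product per post),
// so a lot of duplicated post data comes back.
$big = $pdo->prepare(
    'SELECT p.*, c.body AS comment_body, t.name AS tag_name
     FROM posts p
     LEFT JOIN post_comment pc ON pc.post_id = p.id
     LEFT JOIN comments c      ON c.id = pc.comment_id
     LEFT JOIN post_tag pt     ON pt.post_id = p.id
     LEFT JOIN tags t          ON t.id = pt.tag_id
     WHERE p.id = :id'
);
$big->execute(['id' => $postId]);
$rows = $big->fetchAll(PDO::FETCH_ASSOC);

// Variant 2: several simple queries, each touching one relation;
// each result is small, reusable, and easy to cache separately.
$post = $pdo->prepare('SELECT * FROM posts WHERE id = :id');
$post->execute(['id' => $postId]);

$comments = $pdo->prepare(
    'SELECT c.* FROM comments c
     JOIN post_comment pc ON pc.comment_id = c.id
     WHERE pc.post_id = :id'
);
$comments->execute(['id' => $postId]);

$tags = $pdo->prepare(
    'SELECT t.* FROM tags t
     JOIN post_tag pt ON pt.tag_id = t.id
     WHERE pt.post_id = :id'
);
$tags->execute(['id' => $postId]);
```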
From the ORM side, loading everything in one query is not the best option either: almost always it fetches extra data that is not needed in that particular place (or that cannot even be excluded up front, so it has to be filtered out of the result set afterwards), or you end up with methods like findWithComments, findWithCategoriesAndTags, findWithAllRelations, with inevitable duplication.
So we have three ways to load relations (model methods):
- A single find($id) method that always loads all the data in one query.
- Several methods: find($id), findWithComments($id), findWithTagsAndCategories($id), and so on.
- A find($id) method that loads only the current model, plus explicit relation-loading methods getComments(), getTags(), and so on, where these methods work the same way on a single object and on a collection of objects (similar to the Composite pattern).
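Here is a minimal sketch of what the third variant's interface could look like (assuming PHP 8 and PDO; the class, table and method names are hypothetical). The collection-aware getComments() issues a single IN (...) query instead of one query per post, which is what makes the Composite-like behaviour cheap:

```php
<?php
// Hypothetical interface sketch for the third variant.

final class Post
{
    public function __construct(public int $id, public string $title) {}
}

final class PostRepository
{
    public function __construct(private PDO $pdo) {}

    // Loads only the posts table, no relations.
    public function find(int $id): ?Post
    {
        $stmt = $this->pdo->prepare('SELECT id, title FROM posts WHERE id = :id');
        $stmt->execute(['id' => $id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row ? new Post((int) $row['id'], $row['title']) : null;
    }

    /**
     * Works the same way for one post or a collection of posts:
     * a single IN (...) query instead of N queries.
     *
     * @param Post[] $posts
     * @return array<int, array> comment rows grouped by post id
     */
    public function getComments(array $posts): array
    {
        if ($posts === []) {
            return [];
        }
        $ids = array_map(fn (Post $p) => $p->id, $posts);
        $in  = implode(',', array_fill(0, count($ids), '?'));
        $stmt = $this->pdo->prepare(
            "SELECT pc.post_id, c.*
             FROM comments c
             JOIN post_comment pc ON pc.comment_id = c.id
             WHERE pc.post_id IN ($in)"
        );
        $stmt->execute($ids);

        $grouped = [];
        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $grouped[(int) $row['post_id']][] = $row;
        }
        return $grouped;
    }
}
```

With this shape, getComments([$post]) and getComments($listOfPosts) go through exactly the same code path, which avoids the usual N+1 problem of lazy loading.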
If we are talking about designing a general-purpose ORM, the last option seems to me the most correct one, and the ORM should not even allow any other approach. Let me go over the benefits.
- DB-level caching. Small chunks of data are more likely to be served from the cache (thanks to their reuse across different queries) than large one-off queries.
- Application-level caching. As a rule, data from different tables goes stale at different rates; with the first approach you have to tune the cache to the data that goes stale fastest. With the third approach you can set a separate cache lifetime for each model, and you get more flexible, per-model control over cache invalidation (by event or manually); see the sketch after this list.
- As data volumes or the number of requests grow, we have a ready-made architecture for vertical sharding of the database.
- In each case, we can load only the data we need.
- A ready-made architecture for mixed use of SQL and NoSQL: for example, we can move some models to MongoDB or Redis by rewriting a predictable number of methods.
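On the application-level caching point, here is a sketch of what per-model cache settings might look like; the TTL values, keys, and the PSR-16 style cache object ($cache->get()/$cache->set()) are my assumptions, not a concrete library.

```php
<?php
// Assumed per-model cache configuration: each model gets its own TTL
// and can be invalidated independently (by event or manually).
$cacheTtl = [
    'posts'      => 300,   // seconds; also invalidated on post update events
    'comments'   => 30,    // goes stale quickly, short TTL
    'tags'       => 3600,  // changes rarely
    'categories' => 86400, // almost static
];

// Cache-through relation load; $loadTags is whatever actually hits the DB.
function cachedTags($cache, array $cacheTtl, int $postId, callable $loadTags): array
{
    $key  = "tags.post.$postId";
    $tags = $cache->get($key);          // PSR-16 style: null on miss
    if ($tags === null) {
        $tags = $loadTags($postId);
        $cache->set($key, $tags, $cacheTtl['tags']);
    }
    return $tags;
}
```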
The weak link of such strict model isolation is flexible search by several criteria at once, for example, finding posts by tags and categories.
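One way I can imagine keeping the isolation and still searching by several criteria is to resolve the matching post ids from the junction tables first and only then load the posts through the usual single-model path. A sketch, with the same assumed table names and a hypothetical findMany() batch loader:

```php
<?php
// Sketch: find ids of posts that have a given tag AND a given category,
// then load the posts themselves through the normal single-model path.
$stmt = $pdo->prepare(
    'SELECT pt.post_id
     FROM post_tag pt
     JOIN post_category pc ON pc.post_id = pt.post_id
     WHERE pt.tag_id = :tag AND pc.category_id = :category'
);
$stmt->execute(['tag' => $tagId, 'category' => $categoryId]);
$postIds = $stmt->fetchAll(PDO::FETCH_COLUMN);

// findMany() is a hypothetical batch version of find($id).
$posts = $repository->findMany($postIds);
```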
So, I invite you to the discussion: what other drawbacks do you see, has anyone adopted (or abandoned) such an approach, and would you use such an ORM in your projects?