📜 ⬆️ ⬇️

ORM or object-relational projector

Today we bring to your attention an excerpt from the book by Sergey Tarasov “Defragmentation of the brain. Soft-building from the inside ” , which is being prepared for release in our publishing house.

Hiding the database or how to cross a hedgehog with a snake


Having mentioned one of the largest pillars of modern soft-building - the world of OOP, one cannot ignore the other - the world of relational databases. I deliberately inserted the adjective “relational” in relation to all major DBMSs, although in the 1970s such a generalization would be inappropriate.

Nevertheless, it was precisely relational DBMS that in the 1980s managed to free programmers from knowing unnecessary details of organizing physical data storage, isolating themselves from them by logical level structures and the standardized SQL language to access information. It also turned out that most of the data formats used by the programs fit well on the model of two-dimensional tables and the relationships between them. These two factors predetermined the success of relational DBMSs, and as a rewarding prize the community received a rigorous mathematical theory in the basis of the technology.
')

Unlike the relational world, the PLO developed by practical engineers rather spontaneously, based on the needs of the programming community, and therefore had no strict theory under it. Attempts to bring one such to the PLO retroactively failed. The maximum result was achieved by the authors declared by the UML standard, which, however, is still mainly used as illustrative code for pictures. But a bad standard is better than none.

Both relational and object models refer to the logical level of software system design. They are orthogonal and in essence represent two views on the same essence. This means that you can implement the same system while remaining in the framework of only one relational-procedural approach, or by following exclusively the OOP.
In practice, a situation has arisen where programs are written mainly using OOP, whereas data is stored in relational databases. Without touching on the question of the feasibility of such a crossover “hedgehog with a snake”, we take the situation as a given. From which it follows the necessity of mapping (projection) of objects on relational structures and back.

In view of the absence of the formal theoretical base under the PLO, this task is unsolvable in the general case, but is feasible in particular. The component of the software system that implements the mapping is called ORM (Object-Relational Mapping) or the object-relational projector - ORP. A full ORM can be a very non-trivial component, exceeding the complexity of the rest of the system. Therefore, although many developers have successfully used their own private implementations, in the industry over the past 10 years, several widely used frameworks have emerged that perform the projection task as well.

An overview of object-relational projection tools was not planned within the framework of the book. They are already enough in the network, including my little one, made back in 2005, but not very outdated. Therefore, the following examples will mainly concern the NHibernate framework.

There is a very important point in the technology of displaying objects on an RDBMS, the understanding of which largely determines the success of your project. I have repeatedly heard the opinion of programmers that, for a domain layer, the projector-generated SQL is the equivalent of translating a high-level language into an assembler of a target processor. This opinion is not just deeply erroneous, but at a rapid pace leads the team to the creation of difficult-to-maintain systems with inherent and practically unrecoverable performance problems.

Simply put, as soon as you think about SQL as a kind of assembler in relation to the OOP language you use, you immediately get into a very bad story.

SQL is a high-level declarative specialized language of the fourth generation, in contrast to the same Java or C #, still belonging to the third generation of imperative languages. A single SQL statement for three dozen lines that performs something more complicated than sampling by key will require more than C # lines to achieve the same result at times, if not an order of magnitude.

This situation leads ORM developers to create their own SQL-like language for manipulating objects and already translate it into a sequel (HQL - Hibernate Query Language - SQL-like query language used in Hibernate / NHibernate). Or use SQL directly with dynamic conversion of the result into a collection of objects.

Otherwise, the application programmer is doomed to retrieve from the database and then process large data arrays directly in his application. Similarly, the table data was processed in the absence of embedded SQL by developers on early versions of Clipper in the late 80s. There it was called "navigation processing." I think the term is appropriate here.

In the era of mass transition from Clipper-like programs and file-server technologies to client-server RDBMS, many applications and their developers continued to use the navigation approach. Applications worked slowly, often blocking work in a multi-user environment. Because for effective work with RDBMS, it is necessary to use approaches oriented to processing sets on the server, assuming that the developer has the skills to work with declarative languages.

However, having received the ORM, the programmer often returns to the navigation approaches of processing arrays of data outside the RDBMS only with the difference that now this array, I hope, does not represent the contents of the whole table.

Why? The lack of knowledge of RDBMS is attempted to be replaced by additional levels of abstractions. In fact, the opposite happens: the levels of abstraction do not hide the details of the object storage layer from the programmer, but, on the contrary, its incompetence in the field of databases from the DBMS. Until some time.

Despite the thick layer of abstractions provided by an ORM like Hibernate, it is almost impossible to make an application work effectively with RDBMS without the knowledge of the relevant principles of the orthogonal world and the SQL language.

But attempts continue. Some are still developing projectors for their internal needs, often very light. Others are looking for simplification and access to noSQL. But the winners are still having “base” competencies programmers and consultants who earn on those who do not possess them.

How ORM is commonly used


On soft-building presentations, they often draw beautiful schemes for the separation of presentation layers, business logic and stored data. The blue dream of a novice programmer is to use only one medium and language to develop all layers and forget about the need for knowledge of relational DBMS, reducing their purpose to some kind of “intelligent file system.” The word SQL causes negative associations associated with something ancient, not to mention triggers or stored procedures. Good people appear on the horizon, with books of different gurus about domain-oriented development under the arm, telling the newcomers something like: “Guys, relational DBMS is a relic of a protracted era 30 years ago. Now everything is built on the PLO. And there is a wonderful thing - ORM. Start using it and forget about the heavy legacy of the past! ”

The guys accept the offer. Further evolution of the development of the system is approximately as follows.
Initially, the ORM framework is selected for display. Already at this stage, it turns out that the situation is bad with theory and standards. It would be good to be on the alert, but a presentation showing how in 10 minutes you can create the basis of an application such as a contacts notebook is fascinating. Resolved!

We begin to implement the domain model. Add classes, properties, relationships. We generate database structure or connect to existing one. We build an interface for managing objects of type CRUD. Everything is quite simple. At least it seems quite comparable with the manipulations of the DataSet. Those who know about them, of course, because not everyone suspects about the existence of tabular forms of data life in the application outside of the DBGrid display grids.

As soon as the developers have implemented the CRUD-logic, the main action begins. Using the sequel directly is now difficult. Without touching the display strategies and the problems of portability of the application between the DBMS, in fact, each SQL query with connections, rising into the domain, is accompanied by a specific projection of the table result on the class created on this occasion. Therefore, you have to use your own query language ORM. Non-standard, without debugging and profiling tools. If it is, the language is generally available in this ORM. To support the appropriate integration environment. NET allows you to use LINQ, which allows you to catch some errors at the compilation stage.

Compare the expressiveness of the language with a simple example, which I will leave without comment.

SQL
SELECT * FROM task_queue WHERE id_task IN (2, 3, 15) AND id_task_origin = 10 


NHibernate HQL
 IList<TaskQueue> queues = session .CreateQuery("from TaskQueue where Task.Id in (2, 3, 15) and TaskOrigin.Id = 10") .List<TaskQueue>(); 


NHibernate without HQL with criteria
 IList<TaskQueue> queues = session.CreateCriteria() .Add(Expression.In("Task.Id", someTasks.ToArray())) .Add(Expression.Eq("TaskOrigin.Id", 10)) .List<TaskQueue>(); 

LINQ (NHibernate)
 IList<TaskQueue> queues = session .Query<TaskQueue>() .Where(q => someTasks.Contains(q.Task.Id) && q.TaskOrigin.Id == 10).ToList(); 


Suddenly it turns out that its own query language generates far from optimal SQL. When the database is relatively small, hundreds of thousands of records in the longest tables, and the queries are not too complex, even a non-optimal sequel will not cause obvious problems in many cases. The user will wait a little.

However, requests like “select employees whose salary during the last year did not exceed the average for the previous year” already cause problems at the level of the embedded language. Then the developers often go the only possible way: we select a collection of objects and, in cycles, we filter and calculate, calling the methods of related objects. Or use the same LINQ over the selected array. The number of intermediate short SQL queries to the DBMS during such processing of collections can amount to tens of thousands.

Source: https://habr.com/ru/post/165327/


All Articles