Fetching data with an ORM is easy! Or is it?

Introduction

Virtually every information system interacts with external data stores in one way or another. In most cases the store is a relational database, and an ORM framework is often used to work with the data. An ORM eliminates most routine operations and offers a small set of additional abstractions in return.

Martin Fowler published an interesting article on the subject. One of its key thoughts: “ORMs help us solve a large number of tasks in enterprise applications ... This tool is not pretty, but the problems it deals with are not pretty either. I think ORMs deserve more respect and more understanding.”

We use an ORM very intensively in the CUBA framework, and since CUBA is used in various projects around the world, we know firsthand about the problems and limitations of this technology. Many topics could be discussed in connection with ORMs, but we will focus on one of them: the choice between lazy and eager fetching. We will look at different approaches to this problem with illustrations from the JPA API and Spring, and then explain how (and why) an ORM is used in CUBA and what we are doing to improve data handling in our framework.

Fetching data: lazy or not?

If your data model contains only one entity, you will most likely not notice any problems when working with an ORM. Let's look at a small example. Suppose we have a User entity with two attributes, id and name:

    @Entity
    public class User {
        @Id
        @GeneratedValue
        private int id;

        private String name;

        // Getters and setters here
    }

To pull an instance of this entity from the database, we only need to call one method of the EntityManager object:

    EntityManager em = entityManagerFactory.createEntityManager();
    User user = em.find(User.class, id);

Things get a little more interesting when a one-to-many relationship appears:

    @Entity
    public class User {
        @Id
        @GeneratedValue
        private int id;

        private String name;

        @OneToMany
        private List<Address> addresses;

        // Getters and setters here
    }

If we need to load a user instance from the database, a question arises: “Should we fetch the addresses too?” And the “right” answer is: “It depends.” In some cases we need the addresses, in others we do not. An ORM usually provides two ways to fetch dependent records: lazy and eager. By default, most ORMs fetch collections lazily. But if we write this code:

    EntityManager em = entityManagerFactory.createEntityManager();
    User user = em.find(User.class, 1);
    em.close();
    System.out.println(user.getAddresses().get(0));

... then, with Hibernate for example, we get a LazyInitializationException (the infamous “LazyInit”), which terribly confuses newcomers who have just started working with an ORM. This is the moment when you have to start the story about what attached and detached entity instances are, and what sessions and transactions are.

So the entity must be attached to a session for dependent data to be fetched. Fine, let's not close the transaction right away, and life immediately becomes easier. But then another problem arises: transactions get longer, which increases the risk of deadlocks. Make transactions shorter? That is possible, but if you create a great many small transactions, you end up like the bear in the Russian tale of Komar Komarovich, who was defeated by a swarm of mosquitoes. The same happens to the database: if the number of small transactions grows significantly, performance problems appear.

As already mentioned, when we load a user the addresses may or may not be required, so depending on the business logic we must either fetch the collection or not. We need to add new conditions to the code ... Hmm, it is all getting complicated.
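The failure described above can be reproduced without any ORM at all. The following is a toy model only, not how Hibernate or EclipseLink implement lazy loading; ToySession, LazyList and LazyLoadingDemo are hypothetical names invented for this sketch. It shows why a collection that was never touched while the entity was attached cannot be read after the session is closed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Toy stand-in for an ORM session; it only tracks whether it is still open. */
class ToySession {
    private boolean open = true;

    boolean isOpen() { return open; }

    void close() { open = false; }
}

/** Toy lazy collection: the loader runs on first access, and only while the session is open. */
class LazyList<T> {
    private final ToySession session;
    private final Supplier<List<T>> loader;
    private List<T> loaded; // stays null until the first successful access

    LazyList(ToySession session, Supplier<List<T>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<T> get() {
        if (loaded == null) {
            if (!session.isOpen()) {
                // Roughly the situation an ORM reports as a lazy initialization error.
                throw new IllegalStateException("cannot load lazily: session is closed");
            }
            loaded = loader.get(); // a real ORM would run a SELECT here
        }
        return loaded;
    }
}

class LazyLoadingDemo {
    /** Returns true if reading the collection after close() fails. */
    static boolean failsAfterClose(boolean touchBeforeClose) {
        ToySession session = new ToySession();
        LazyList<String> addresses =
                new LazyList<>(session, () -> Arrays.asList("Main St. 1", "Side St. 2"));
        if (touchBeforeClose) {
            addresses.get(); // initializes the collection while still "attached"
        }
        session.close();
        try {
            addresses.get();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("untouched collection fails: " + failsAfterClose(false));
        System.out.println("initialized collection fails: " + failsAfterClose(true));
    }
}
```

Touching the collection before closing the session is exactly the kind of extra condition that has to be scattered through business code when everything is lazy.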

So what if we try the other fetch type?

    @Entity
    public class User {
        @Id
        @GeneratedValue
        private int id;

        private String name;

        @OneToMany(fetch = FetchType.EAGER)
        private List<Address> addresses;

        // Getters and setters here
    }

Well ... I can't say it helps much. Yes, we get rid of the hated lazy initialization exception and no longer need to check whether the entity is attached to a session. But now we may have performance problems, because we do not always need the addresses, yet we still load these objects into the server's memory.
Any more ideas?

Spring JDBC

Some developers get so tired of ORMs that they switch to alternative frameworks, for example Spring JDBC, which converts relational data into objects in a “semi-automatic” mode. The developer writes a query for each case where a particular set of attributes is needed (or reuses the same code where the same data structures are needed).

This gives us more flexibility. For example, you can select only one attribute without creating the corresponding entity object:

    String name = this.jdbcTemplate.queryForObject(
            "select name from t_user where id = ?",
            new Object[]{1L},
            String.class);

Or load an object in the usual way:

    User user = this.jdbcTemplate.queryForObject(
            "select id, name from t_user where id = ?",
            new Object[]{1L},
            new RowMapper<User>() {
                @Override
                public User mapRow(ResultSet rs, int rowNum) throws SQLException {
                    User user = new User();
                    user.setName(rs.getString("name"));
                    user.setId(rs.getInt("id"));
                    return user;
                }
            });

You can also load the list of addresses for a user; you just need to write a little more code and construct the SQL query carefully to avoid the n+1 queries problem.
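One common way to avoid n+1 queries is to run a single join and fold the flat rows into objects in application code. The sketch below shows only that folding step in plain Java, with the rows supplied as an in-memory list instead of a ResultSet. The t_address table, its user_id column, and the UserAddressRow, UserWithAddresses and JoinRowGrouper classes are assumptions made for illustration. With Spring JDBC, a loop like this would typically live in a ResultSetExtractor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * One row of a hypothetical join such as:
 *   select u.id, u.name, a.street
 *   from t_user u left join t_address a on a.user_id = u.id
 */
class UserAddressRow {
    final int userId;
    final String userName;
    final String street; // null when the user has no address

    UserAddressRow(int userId, String userName, String street) {
        this.userId = userId;
        this.userName = userName;
        this.street = street;
    }
}

/** Result object: one per user, with all of that user's streets collected. */
class UserWithAddresses {
    final int id;
    final String name;
    final List<String> streets = new ArrayList<>();

    UserWithAddresses(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class JoinRowGrouper {
    /** Folds flat join rows into one object per user, keeping first-seen order. */
    static List<UserWithAddresses> group(List<UserAddressRow> rows) {
        Map<Integer, UserWithAddresses> byId = new LinkedHashMap<>();
        for (UserAddressRow row : rows) {
            UserWithAddresses user = byId.computeIfAbsent(
                    row.userId, id -> new UserWithAddresses(id, row.userName));
            if (row.street != null) {
                user.streets.add(row.street);
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<UserAddressRow> rows = Arrays.asList(
                new UserAddressRow(1, "Alice", "Main St. 1"),
                new UserAddressRow(1, "Alice", "Side St. 2"),
                new UserAddressRow(2, "Bob", null));
        for (UserWithAddresses user : group(rows)) {
            System.out.println(user.name + ": " + user.streets);
        }
    }
}
```

The database is queried once regardless of the number of users; the price is that the grouping logic is now ours to write and maintain.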

Sooo, complicated again. Yes, we control all the queries and how data is mapped to objects, but we need to write more code, learn SQL, and know how queries are executed in the database. Personally, I think that knowing SQL is a mandatory skill for an application programmer, but not everyone agrees, and I am not going to argue. After all, knowing x86 assembly instructions is optional these days too. Let's think instead about how to make life easier for programmers.

JPA EntityGraph

Let's take a step back and think about what we actually need. It seems we just need to specify which attributes are really needed in each particular case. Well, let's do that! JPA 2.1 introduced a new API, the entity graph (EntityGraph). The idea is very simple: we use annotations to describe what will be fetched from the database. Here is an example:

    @Entity
    @NamedEntityGraphs({
        @NamedEntityGraph(name = "user-only-entity-graph"),
        @NamedEntityGraph(name = "user-addresses-entity-graph",
            attributeNodes = {@NamedAttributeNode("addresses")})
    })
    public class User {
        @Id
        @GeneratedValue
        private int id;

        private String name;

        @OneToMany(fetch = FetchType.LAZY)
        private Set<Address> addresses;

        // Getters and setters here
    }

Two graphs are described for this entity: user-only-entity-graph does not fetch the addresses attribute (which is marked as lazy), while the second graph tells the ORM to fetch it. If we mark addresses as eager, the graph will be ignored and the addresses will be fetched anyway.

So, in JPA 2.1 you can fetch data like this:

    EntityManager em = entityManagerFactory.createEntityManager();
    EntityGraph<?> graph = em.getEntityGraph("user-addresses-entity-graph");
    Map<String, Object> properties = Map.of("javax.persistence.fetchgraph", graph);
    User user = em.find(User.class, 1, properties);
    em.close();

This approach greatly simplifies the work: you do not need to think separately about lazy attributes and transaction length. As a bonus, the graph is applied at the SQL query level, so “extra” data is never loaded into the Java application. But there is one small problem: you cannot tell from the entity itself which attributes were fetched and which were not. There is an API for checking this, the PersistenceUtil interface:

    PersistenceUtil pu = entityManagerFactory.getPersistenceUnitUtil();
    System.out.println("User.addresses loaded: " + pu.isLoaded(user, "addresses"));

But this is rather tedious, and not everyone is willing to write such checks. Can we simplify things and simply not expose attributes that were not fetched?

Spring projections

There is a great feature in Spring Data called “projections” (not the same thing as projections in Hibernate). If you need only some attributes of an entity, you declare an interface with the required attributes, and Spring fetches “instances” of this interface from the database. As an example, consider the following interface:

 interface NamesOnly { String getName(); }

You can now define a Spring Data JPA repository that fetches User entities as follows:

    interface UserRepository extends CrudRepository<User, Integer> {
        Collection<NamesOnly> findByName(String name);
    }

In this case, the findByName method returns a collection of objects that expose only the attributes defined in the interface! By the same principle you can fetch dependent entities, i.e. load a master-detail relationship in one go. Moreover, Spring generates the “correct” SQL in most cases, i.e. only the attributes described in the projection are fetched from the database; this is very similar to how entity graphs work.
This is a very powerful API: when defining the interfaces you can use SpEL expressions, use classes with some built-in logic instead of interfaces, and much more; the documentation describes everything in great detail.
The only problem with projections is that internally they are implemented as key-value pairs, i.e. they are read-only. This means that even if we define a setter method on a projection, we cannot save the changes either through the CRUD repository or through the EntityManager. So projections are DTOs that can be converted back into an entity and saved only if you write your own code for it.
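To get a feel for what “an interface backed by key-value pairs” means, here is a minimal sketch built on the JDK's java.lang.reflect.Proxy. It is not Spring Data's actual implementation; MapBackedProjection and NameProjection are hypothetical names, and the getter-to-key naming rule is an assumption of this sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Same shape as the NamesOnly projection, renamed to keep the sketch standalone. */
interface NameProjection {
    String getName();
}

class MapBackedProjection {
    /** Creates a proxy whose getters read from an immutable copy of the given row. */
    @SuppressWarnings("unchecked")
    static <T> T of(Class<T> type, Map<String, Object> row) {
        Map<String, Object> snapshot = Collections.unmodifiableMap(new HashMap<>(row));
        InvocationHandler handler = (proxy, method, args) -> {
            String methodName = method.getName();
            boolean isGetter = methodName.length() > 3
                    && methodName.startsWith("get")
                    && method.getParameterCount() == 0;
            if (isGetter) {
                // getName -> "name"
                String key = Character.toLowerCase(methodName.charAt(3)) + methodName.substring(4);
                return snapshot.get(key);
            }
            // Anything that is not a getter has nowhere to write to in this sketch.
            throw new UnsupportedOperationException(
                    methodName + " is not available on a read-only projection");
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", "Alice");
        NameProjection projection = of(NameProjection.class, row);
        System.out.println(projection.getName());
    }
}
```

The proxy holds a snapshot of values and nothing else: there is no entity behind it, so there is nothing the persistence layer could track or save.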

How data is selected in CUBA

From the very beginning of CUBA development, we tried to optimize the part of the code that works with the database. In CUBA we use EclipseLink as the basis for the data access API. What is good about EclipseLink is that it has supported partial loading of entities from the start, and this was the decisive factor in choosing it over Hibernate. In EclipseLink you could specify the attributes to load long before the JPA 2.1 standard appeared. CUBA has its own way of describing an entity graph, called CUBA Views. CUBA Views are a rather advanced API: you can inherit views from one another and combine them, applying them to both master and detail entities. Another motivation for creating CUBA Views was that we wanted to use short transactions, so that we could work with detached entities in the web user interface.
In CUBA, views are described in an XML file, as in the example below:

    <view class="com.sample.User" extends="_minimal" name="user-minimal-view">
        <property name="name"/>
        <property name="addresses" view="address-street-only-view"/>
    </view>

This view fetches the User entity with its local name attribute, and also fetches the addresses, applying the address-street-only-view view to them. All of this happens (note!) at the SQL query level. Once a view is defined, you can use it to load data with the DataManager class:

    List<User> users = dataManager.load(User.class).view("user-minimal-view").list();

This approach works well and is economical with network traffic, since unused attributes are simply not transferred from the database to the application. But, as with JPA, there is a problem: you cannot tell which attributes of an entity were loaded. CUBA has its own exception for this, “IllegalStateException: Cannot get unfetched attribute [...] from detached object”, which, like LazyInit, has probably been encountered by everyone who writes code with our framework. As in JPA, there are ways to check which attributes were loaded and which were not, but, again, writing such checks is tedious, painstaking work that frustrates developers. We need to come up with something else, so as not to burden people with work that, in theory, machines can do.

Concept - CUBA View Interfaces

But what if we try to combine entity graphs and projections? We decided to try and developed entity view interfaces, which follow the approach of Spring projections. These interfaces are translated into CUBA views at application startup and can be used with DataManager. The idea is simple: we describe an interface (or a set of interfaces) that represents the entity graph.

    interface UserMinimalView extends BaseEntityView<User, Integer> {
        String getName();
        void setName(String val);
        List<AddressStreetOnly> getAddresses();

        interface AddressStreetOnly extends BaseEntityView<Address, Integer> {
            String getStreet();
            void setStreet(String street);
        }
    }

Note that for specific cases you can declare nested interfaces, like AddressStreetOnly in the example above, so as not to “pollute” the public API of your application.

While the CUBA application starts (most of the startup is the initialization of the Spring context), we programmatically create CUBA views and put them into the internal bean repository in the context.
Now we need to slightly change the implementation of the DataManager class so that it accepts view interfaces, and then entities can be loaded as follows:

 List<UserMinimalView> users = dataManager.load(UserMinimalView.class).list();

Under the hood, a proxy object is generated that implements the interface and wraps the entity instance selected from the database (in much the same way as in Hibernate). And, when the developer requests an attribute value, the proxy delegates the method call to the “real” instance of the entity.
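The delegation idea can be sketched with the JDK's dynamic proxies. This is an illustration under simplifying assumptions, not the code of the CUBA module: UserEntity, UserNameView and EntityViewProxy are hypothetical names, and the view methods are simply matched to entity methods by name and parameter types.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Simplified entity; "email" stands for an attribute that the view does not expose. */
class UserEntity {
    private String name;
    private String email;

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public String getEmail() { return email; }

    public void setEmail(String email) { this.email = email; }
}

/** Hypothetical view interface: only the name attribute is visible and writable. */
interface UserNameView {
    String getName();

    void setName(String name);
}

class EntityViewProxy {
    /** Wraps an entity in a proxy that implements the view and delegates every call to the entity. */
    @SuppressWarnings("unchecked")
    static <V> V wrap(Object entity, Class<V> viewType) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Find the public entity method with the same name and parameter types.
            Method target = entity.getClass().getMethod(method.getName(), method.getParameterTypes());
            try {
                return target.invoke(entity, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the entity method itself threw
            }
        };
        return (V) Proxy.newProxyInstance(viewType.getClassLoader(), new Class<?>[] {viewType}, handler);
    }

    public static void main(String[] args) {
        UserEntity entity = new UserEntity();
        entity.setName("Alice");
        entity.setEmail("alice@example.com");

        UserNameView view = wrap(entity, UserNameView.class);
        view.setName("Bob"); // the change lands on the wrapped entity
        System.out.println(entity.getName());
    }
}
```

Because the proxy wraps a real entity instead of a snapshot of values, changes made through the view are changes to the entity, and getEmail() is simply not part of the type the caller sees.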

In developing this concept, we tried to kill two birds with one stone:

Data that is not described in the interface is not loaded into the application, which saves server resources.
The developer can use only the attributes that are accessible through the interface (and therefore fetched from the database), which rules out the UnfetchedAttribute exceptions described above.

Unlike Spring projections, we wrap entities in proxy objects; in addition, every view interface inherits the standard CUBA interface, Entity. This means that the attributes of an entity view can be changed and the changes can then be saved to the database using the standard CUBA data API.
And, by the way, there is a third bird: you can make attributes read-only by defining an interface with getter methods only. This way the modification rules are set at the level of the entity's API.
In addition, you can perform local operations on detached entities using the available attributes, for example converting the name string, as in the example below:

    @MetaProperty
    default String getNameLowercase() {
        return getName().toLowerCase();
    }

Note that computed attributes can be removed from the entity class model and transferred to interfaces that apply to specific business logic.

Another interesting feature is interface inheritance. You can create several views with different sets of attributes and then combine them. For example, you can create one interface for the User entity with the name and email attributes, and another with name and addresses. Now, if you need to fetch name, email and addresses, you do not have to copy these attributes into a third interface; you just inherit from the first two. And yes, instances of the third interface can be passed to methods that take a parameter of either parent interface type; the OOP rules are the same for everyone.
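In plain Java the combination looks like the sketch below. The BaseEntityView parent is omitted to keep the example standalone, and UserContactView, UserAddressesView and ViewInheritanceDemo are hypothetical names; in the real module the instances would come from DataManager, not from a hand-written anonymous class.

```java
import java.util.Arrays;
import java.util.List;

/** View with name and email. */
interface UserContactView {
    String getName();

    String getEmail();
}

/** View with name and addresses. */
interface UserAddressesView {
    String getName();

    List<String> getAddresses();
}

/** Combined view; getName() is declared identically in both parents, so there is no conflict. */
interface UserFullView extends UserContactView, UserAddressesView {
}

class ViewInheritanceDemo {
    /** Accepts any view that exposes name and email, including the combined one. */
    static String formatContact(UserContactView user) {
        return user.getName() + " <" + user.getEmail() + ">";
    }

    /** Hand-written implementation, used only to make the sketch runnable. */
    static UserFullView sampleUser() {
        return new UserFullView() {
            @Override
            public String getName() { return "Alice"; }

            @Override
            public String getEmail() { return "alice@example.com"; }

            @Override
            public List<String> getAddresses() { return Arrays.asList("Main St. 1"); }
        };
    }

    public static void main(String[] args) {
        System.out.println(formatContact(sampleUser()));
    }
}
```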

Conversion between views is also implemented: every interface has a reload() method, to which you pass the target view class as a parameter:

 UserFullView userFull = userMinimal.reload(UserFullView.class);

UserFullView may contain additional attributes, so the entity is reloaded from the database if necessary. This process is deferred: the call to the database is made only on the first access to the entity's attributes. This slightly slows down the first access, but the approach was chosen deliberately. If an entity instance is used in the “web” module, which contains the UI and custom REST controllers, this module can be deployed on a separate server. A forced reload of the entity would therefore create additional network traffic: a call to the core module and then to the database. By postponing the reload until it is actually needed, we save traffic and reduce the number of queries to the database.
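The deferral itself is a small pattern that can be shown in isolation. The sketch below is not the module's code; DeferredValue and DeferredReloadDemo are hypothetical names, and the counter merely simulates round trips to the database.

```java
import java.util.function.Supplier;

/** Holds a loader and runs it at most once, on the first call to get(). */
class DeferredValue<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    DeferredValue(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (!loaded) {
            value = loader.get(); // stands in for the round trip to the database
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}

class DeferredReloadDemo {
    static int queries = 0; // counts simulated database calls

    /** Plays the role of reload(): it returns immediately and loads nothing yet. */
    static DeferredValue<String> reload() {
        return new DeferredValue<>(() -> {
            queries++;
            return "Alice, alice@example.com";
        });
    }

    public static void main(String[] args) {
        DeferredValue<String> full = reload();
        System.out.println("queries after reload(): " + queries);
        full.get();
        full.get();
        System.out.println("queries after two reads: " + queries);
    }
}
```

If the reloaded view is never read, no query is issued at all, which is the saving the deferred design is after.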

The concept is implemented as a module for CUBA; a usage example can be downloaded from GitHub.

Conclusion

It seems that in the near future we will still be using ORMs on a massive scale in enterprise applications, simply because we need something that turns relational data into objects. Of course, complex, unique, heavily loaded applications will get their own specific solutions, but ORM frameworks will apparently live as long as relational databases do.
In CUBA we try to simplify working with the ORM as much as possible, and in upcoming versions we will introduce new features for working with data. Whether these will be view interfaces or something else is hard to say right now, but I am sure of one thing: we will keep simplifying data access in future versions of the framework.

Source: https://habr.com/ru/post/451986/
