Conditional indexing. Optimizing the full-text search process

In this article I want to talk about the integration of Apache Lucene and Hibernate Search. More precisely, about one Hibernate Search mechanism that can noticeably improve the performance of a project that uses full-text search.

For anyone who has worked with these technologies, it is no secret that full-text search requires indexing. In other words, when records are added to or modified in the database, the index entries that full-text search relies on must be added or updated as well. Apache Lucene is responsible for this process. But how do we tell Lucene that a given entity needs to be indexed?

@Entity
@Indexed
public class SomeEntity {
    @Id
    @GeneratedValue
    private Integer id;

    @Field
    private String indexedField;

    private String unindexedField;

    // getters and setters
}

In the class above, the @Indexed annotation says that the entity is indexed by Lucene. The @Field annotation marks the fields that will be indexed. Since @Field is placed only on indexedField, full-text search is possible only on that field.

Note. For Lucene to work properly, other settings are needed besides these annotations. But since this article is not about setting up Lucene as a whole, only about optimizing the indexing process, we will omit those details.

Now let's look at an example of indexing an entity. Suppose we have a classified ads site. Here is our entity:

@Entity
public class Ad {
    @Id
    @GeneratedValue
    private Integer id;

    private String text;

    private AdStatus status;

    // getters and setters
}

We want to give our users full-text search across all ads on the site. To do this, we add the appropriate annotations:

@Entity
@Indexed
public class Ad {
    @Id
    @GeneratedValue
    private Integer id;

    @Field
    private String text;

    private AdStatus status;

    // getters and setters
}

Now it's time to mention that an ad can have one of the following statuses: DRAFT, ACTIVE, ARCHIVE. After a brief reflection, we decide that search results should show users only ads in the ACTIVE status. Consider two ways to solve this problem. The first is the brute-force one: add the @Field annotation to the status field, and on every search add a predicate that restricts the status to ACTIVE. The disadvantages of this solution are a noticeable drop in performance when there are many ads in the ARCHIVE and DRAFT statuses, and needless indexing of entities that are never searched.
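To make the cost of the first option concrete, here is a minimal plain-Java sketch. It does not use Hibernate Search or Lucene at all: the Entry class, the list standing in for the index, and the sample ads are all hypothetical. It only illustrates that the index holds entries for every status while each query still has to filter by status.

```java
import java.util.ArrayList;
import java.util.List;

class QueryTimeFilterSketch {
    enum AdStatus { DRAFT, ACTIVE, ARCHIVE }

    // Hypothetical stand-in for one index entry: ad id, indexed text, indexed status.
    static final class Entry {
        final int id;
        final String text;
        final AdStatus status;

        Entry(int id, String text, AdStatus status) {
            this.id = id;
            this.text = text;
            this.status = status;
        }
    }

    // Toy index for the first option: ads in every status are stored.
    static List<Entry> buildIndex() {
        List<Entry> index = new ArrayList<>();
        index.add(new Entry(1, "red bike, almost new", AdStatus.DRAFT));
        index.add(new Entry(2, "mountain bike for sale", AdStatus.ACTIVE));
        index.add(new Entry(3, "old bike, sold", AdStatus.ARCHIVE));
        index.add(new Entry(4, "sofa in good condition", AdStatus.ACTIVE));
        return index;
    }

    // Every search has to pair the text match with a status predicate.
    static List<Integer> search(List<Entry> index, String term) {
        List<Integer> hits = new ArrayList<>();
        for (Entry e : index) {
            if (e.status == AdStatus.ACTIVE && e.text.contains(term)) {
                hits.add(e.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Entry> index = buildIndex();
        System.out.println("entries stored: " + index.size());
        System.out.println("hits for 'bike': " + search(index, "bike"));
    }
}
```

In this toy data set half of the stored entries can never appear in a result, yet they are still written to the index and still examined by the status predicate on every query.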

Another solution immediately comes to mind: do not index ads in any status except ACTIVE, and delete the existing index entries for them. A mechanism called interceptors will help us here. First, let's state the task: when an entity changes, indexing should depend on the ad's new status. Now we proceed to the implementation. Create an AdIndexInterceptor class that implements the EntityIndexingInterceptor interface:

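import org.hibernate.search.indexes.interceptor.EntityIndexingInterceptor;
import org.hibernate.search.indexes.interceptor.IndexingOverride;

public class AdIndexInterceptor implements EntityIndexingInterceptor<Ad> {

    // New ad: index it only if it is already ACTIVE.
    @Override
    public IndexingOverride onAdd(Ad entity) {
        if (entity.getStatus() == AdStatus.ACTIVE) {
            return IndexingOverride.APPLY_DEFAULT;
        }
        return IndexingOverride.SKIP;
    }

    // Changed ad: refresh the entry while ACTIVE, otherwise drop it from the index.
    @Override
    public IndexingOverride onUpdate(Ad entity) {
        if (entity.getStatus() == AdStatus.ACTIVE) {
            return IndexingOverride.UPDATE;
        }
        return IndexingOverride.REMOVE;
    }

    // Deleted ad: keep the default behavior.
    @Override
    public IndexingOverride onDelete(Ad entity) {
        return IndexingOverride.APPLY_DEFAULT;
    }

    // Collection change: treat it the same way as a regular update.
    @Override
    public IndexingOverride onCollectionUpdate(Ad entity) {
        return onUpdate(entity);
    }
}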

As seen above, the class must implement four methods, which are called when a record is added, when it is edited, when it is deleted, and when a collection belonging to the entity is updated, respectively. Each of these methods must return one of the values of the IndexingOverride enum. The enum has four values in total. Here is what happens when each of them is returned:

APPLY_DEFAULT - indexing proceeds exactly as it would without an interceptor.
SKIP - no indexing takes place.
UPDATE - the existing index entry is updated.
REMOVE - the existing index entry is deleted and no new one is created.
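The effect of the four values can be traced with a toy model. The sketch below is plain Java under stated assumptions, not Hibernate Search code: the IndexingOverride enum here is a local stand-in that mirrors the names listed above, Event is a hypothetical enum for the triggering change, and a Map plays the role of the index. The real library works with Lucene documents rather than a map.

```java
import java.util.Map;
import java.util.TreeMap;

class IndexingOverrideSketch {
    // Local stand-in mirroring the four values listed above; not the library enum.
    enum IndexingOverride { APPLY_DEFAULT, SKIP, UPDATE, REMOVE }

    // Hypothetical description of the entity change that triggered indexing.
    enum Event { ADDED, UPDATED, DELETED }

    // Applies one decision to the toy index (ad id -> indexed text).
    static void apply(Map<Integer, String> index, Event event,
                      IndexingOverride override, int id, String text) {
        switch (override) {
            case SKIP:
                // Leave the index untouched.
                break;
            case UPDATE:
                // Replace the entry, creating it if it is missing.
                index.put(id, text);
                break;
            case REMOVE:
                // Drop the entry and do not write a new one.
                index.remove(id);
                break;
            case APPLY_DEFAULT:
                // Behave as if there were no interceptor.
                if (event == Event.DELETED) {
                    index.remove(id);
                } else {
                    index.put(id, text);
                }
                break;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> index = new TreeMap<>();

        // An ad is created as a draft: the interceptor above answers SKIP.
        apply(index, Event.ADDED, IndexingOverride.SKIP, 1, "mountain bike");
        System.out.println("after draft:   " + index);

        // The ad is published: the interceptor answers UPDATE.
        apply(index, Event.UPDATED, IndexingOverride.UPDATE, 1, "mountain bike");
        System.out.println("after publish: " + index);

        // The ad is archived: the interceptor answers REMOVE.
        apply(index, Event.UPDATED, IndexingOverride.REMOVE, 1, "mountain bike");
        System.out.println("after archive: " + index);
    }
}
```

Following one ad from draft to published to archived, the toy index contains it only while it is published, which is the behavior the second solution is after.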

Now back to the entity class. So that the appropriate interceptor methods are called before indexing, we add the interceptor attribute to the @Indexed annotation on the entity:

@Entity
@Indexed(interceptor = AdIndexInterceptor.class)
public class Ad {
    @Id
    @GeneratedValue
    private Integer id;

    @Field
    private String text;

    private AdStatus status;

    // getters and setters
}

All that remains is to document the use of this interceptor properly, so that the indexing behavior does not come as a surprise to your teammates.
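One lightweight way to document the behavior is a small decision table that can be printed or checked in a test. The sketch below is an assumption about how such a table might look, not part of Hibernate Search: the two static methods repeat the onAdd and onUpdate rules of the interceptor above using local stand-in enums, so the example runs without the library on the classpath.

```java
class AdInterceptorDecisionTable {
    // Local stand-ins so the sketch runs without the entity or the library.
    enum AdStatus { DRAFT, ACTIVE, ARCHIVE }
    enum IndexingOverride { APPLY_DEFAULT, SKIP, UPDATE, REMOVE }

    // Same rule as onAdd in the interceptor: only ACTIVE ads get indexed.
    static IndexingOverride onAdd(AdStatus status) {
        return status == AdStatus.ACTIVE
                ? IndexingOverride.APPLY_DEFAULT
                : IndexingOverride.SKIP;
    }

    // Same rule as onUpdate: ACTIVE ads are refreshed, all others are removed.
    static IndexingOverride onUpdate(AdStatus status) {
        return status == AdStatus.ACTIVE
                ? IndexingOverride.UPDATE
                : IndexingOverride.REMOVE;
    }

    public static void main(String[] args) {
        // Print one row per status, ready to paste into project documentation.
        for (AdStatus status : AdStatus.values()) {
            System.out.println(status + ": onAdd=" + onAdd(status)
                    + ", onUpdate=" + onUpdate(status));
        }
    }
}
```

Keeping such a table next to the interceptor, or asserting it in a unit test, makes it harder for the documented behavior and the code to drift apart.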

P.S. The official documentation notes that this feature is experimental and that its behavior may change depending on user feedback.

Link to official documentation.

Source: https://habr.com/ru/post/247897/
