📜 ⬆️ ⬇️

Optimization of LIKE expression when using Sqlite in iOS application

Recently, I was faced with the task of optimizing a query to Sqlite in my iOS application.
The task was as follows. There was a list of documents (PDF files), a dictionary (a list of words), and a link between documents and words from a dictionary (the entry of words into a document). It was necessary to implement a search and display a list of documents in which there is an entered word.

The structure of the base was as follows:

CREATE TABLE document ( id Int PRIMARY KEY NOT NULL, root_id Int, name Varchar(100), active Tinyint ); CREATE INDEX IDX_documentId ON document (id); CREATE INDEX IDX_documentName ON document (name); CREATE TABLE dictionary ( id Int PRIMARY KEY NOT NULL, word Varchar(100) NOT NULL ); CREATE INDEX IDX_dictionaryId ON dictionary (id); CREATE UNIQUE INDEX IDX_dictionaryWord ON dictionary (word ASC); CREATE TABLE document_index ( id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, document_id Int NOT NULL, word_id Int NOT NULL, FOREIGN KEY(document_id) REFERENCES document(id), FOREIGN KEY(word_id) REFERENCES dictionary(id) ); CREATE INDEX IDX_documentIndexId ON document_index (id); CREATE INDEX IDX_documentIndexDocId ON document_index (document_id); CREATE INDEX IDX_documentIndexWordId ON document_index (word_id); 

A query to retrieve all documents in selected sections that have the specified word:

 SELECT document.id, document.name FROM document INNER JOIN document_index on document_index.document_id=document.id INNER JOIN dictionary on dictionary.id=document_index.word_id WHERE dictionary.word LIKE @pQuery AND document.active = 1 AND document.root_id in (@pRoot1, @pRoot2, @pRoot3, @pRoot4, @pRoot5, @pRoot6, @pRoot7) LIMIT @First, @Count 

With the size of dicrionary ~ = 400K, document ~ = 1K and document_index ~ = 500K entries, the query ran for about 30 seconds on my iPad 2, which was unacceptable for my application.
')
In the process of finding a solution to speed up the execution of the query, I found out that in Sqlite3, when using the LIKE expression, the index is not taken into account, which was not at all good, considering my number of entries. I couldn’t refuse to use LIKE and replace it with a comparison operation with equality, since I needed to search by substring. Then I came across this article , it was proposed to replace the expression LIKE with the operation> = and <+ characters yaz (zzz if the word is in English) in the second expression.

 /*  */ dictionary.word LIKE '%' /*  */ dictionary.word >= '' AND dictionary.word < '' 


With this optimization, the index is taken into account when searching, and all the words starting with 'abakan' are found. The execution time is only 0.5 seconds! Of course there are limitations, it is impossible to implement a search for a substring from the beginning of the string, i.e. I found words starting with any characters and ending in 'abakan', but in my particular case it was not necessary.

Source: https://habr.com/ru/post/165663/


All Articles