In this article I want to share my experience of building an extension directly into the SQLite source code. Everything was done on Ubuntu 11.10.
Problem
The fts3 module in SQLite ships with a simple stemmer that implements Porter's stemming algorithm, but it has no implementation for Russian words. That is, MATCH cannot find entries containing the inflected forms of a Russian word (for example, the various forms of the word for 'hotel').
Preparing to compile
What is needed
- the sqlite3 source from the repository;
- our stemmer written in C (see below);
- optionally, the readline library (libreadline), if you want command history in the console client.
From here on it is assumed that the sqlite3 sources are in $HOME/SQLite.
Stemmer code
The Russian characters are encoded in UTF-8.
The stemmer uses the built-in Porter stemmer for Latin words and implements a similar algorithm for Russian words.
Initially the code was written in C++ and loaded as an SQLite extension. I modified it so that it compiles with a C compiler, so it is far from beautiful or rigorous. Here's what I got:
fts3_porter_ext.c
Put our stemmer in $HOME/SQLite/ext/fts3/fts3_porter_ext.c
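As a rough illustration of the idea described above (Porter for Latin tokens, a similar suffix-stripping approach for Russian ones), here is a toy sketch. It is not the code from fts3_porter_ext.c: the function names `has_cyrillic` and `strip_russian_ending`, the short list of endings, and the minimum stem length are all assumptions made for this example, and a real Russian stemmer applies many more rules.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: keep at least two Cyrillic letters
 * (4 bytes in UTF-8) in the stem. */
#define MIN_STEM_BYTES 4

/* Returns 1 if the UTF-8 token contains a lead byte 0xD0 or 0xD1,
 * which is how every Russian letter starts in UTF-8. A token without
 * such bytes could be handed to the Latin (Porter) stemmer instead. */
static int has_cyrillic(const char *z, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)z[i];
        if (c == 0xD0 || c == 0xD1) return 1;
    }
    return 0;
}

/* Returns the length in bytes of the token after removing one ending
 * from a small, hypothetical list (longest endings are tried first).
 * If nothing matches, or the stem would be too short, n is returned
 * unchanged. The source file must be saved as UTF-8. */
static size_t strip_russian_ending(const char *z, size_t n) {
    static const char *const endings[] = {
        "ами", "ями",
        "ов", "ев", "ей", "ах", "ях", "ам", "ям", "ом", "ем",
        "а", "я", "ы", "и", "е", "у", "ь"
    };
    size_t i;
    for (i = 0; i < sizeof(endings) / sizeof(endings[0]); i++) {
        size_t len = strlen(endings[i]);
        if (n >= len + MIN_STEM_BYTES &&
            memcmp(z + n - len, endings[i], len) == 0) {
            return n - len;
        }
    }
    return n;
}
```

With these toy rules, the UTF-8 strings 'отели', 'отель' and 'отелями' all reduce to the same 8-byte stem 'отел', which is the effect that lets MATCH find different forms of a word.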
Edit files
Makefile.in
Edit $HOME/SQLite/Makefile.in.
- Add our stemmer, fts3_porter_ext.lo, to the LIBOBJS0 variable.
- Add $(TOP)/ext/fts3/fts3_porter_ext.c to the SRC variable.
- Write the build rule for fts3_porter_ext.lo:
fts3_porter_ext.lo: $(TOP)/ext/fts3/fts3_porter_ext.c $(HDR) $(EXTHDR)
	$(LTCOMPILE) -DSQLITE_CORE -c $(TOP)/ext/fts3/fts3_porter_ext.c
fts3.c
Edit $HOME/SQLite/ext/fts3/fts3.c.
After the line
void sqlite3Fts3PorterTokenizerModule(sqlite3_tokenizer_module const**ppModule);
add the line
void sqlite3Fts3PorterTokenizerModule1(sqlite3_tokenizer_module const**ppModule);
After the line
sqlite3Fts3PorterTokenizerModule(&pPorter);
add the initialization of our module
const sqlite3_tokenizer_module *pPorter1 = 0;
sqlite3Fts3PorterTokenizerModule1(&pPorter1);
Finally, after
|| sqlite3Fts3HashInsert(pHash, "porter", 7, (void *)pPorter)
add our module to the hash of built-in tokenizers
|| sqlite3Fts3HashInsert(pHash, "russian", 8, (void *)pPorter1)
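One detail that is easy to get wrong when adding another tokenizer name: judging by the two calls above, the third argument looks like the key length including the terminating NUL byte ("porter" has 6 characters and is passed with 7, "russian" has 7 and is passed with 8). A minimal sketch of that arithmetic follows; `key_len_with_nul` is a hypothetical helper for illustration and is not part of SQLite.

```c
#include <string.h>

/* Hypothetical helper: length of a tokenizer name counted the way the
 * calls above appear to count it, i.e. including the trailing '\0'. */
static int key_len_with_nul(const char *zName) {
    return (int)strlen(zName) + 1;
}
```

For a string literal the same value is available at compile time as sizeof("russian").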
mkfts3amal.tcl
Edit $HOME/SQLite/ext/fts3/mkfts3amal.tcl.
After the line
fts3_tokenizer1.c
add
fts3_porter_ext.c
mksqlite3c.tcl
Edit $HOME/SQLite/tool/mksqlite3c.tcl.
After the line
fts3_tokenizer1.c
add
fts3_porter_ext.c
Compilation
Run the following (it is better to replace --prefix=$HOME with something more sensible; this will be the installation path):
cd $HOME/SQLite && mkdir build && cd build && ../configure --prefix=$HOME CFLAGS='-DSQLITE_SOUNDEX -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS' && make
Now check that our stemmer made it into sqlite3.c:
grep fts3_porter_ext.c sqlite3.c
The output should look something like this:
/************** Begin file fts3_porter_ext.c *********************************/
/************** End of fts3_porter_ext.c *************************************/
Now install sqlite3 on the computer:
sudo make install
Usage
When creating fts3 tables, you need to specify our stemmer as the tokenizer, for example:
CREATE VIRTUAL TABLE tag_fti USING fts3(name, tokenize=russian);
Now MATCH queries on the tag_fti table will use our stemmer.
Summary
We got two files, sqlite3.c and sqlite3.h, which can be included in our projects.
There is no need to load extension modules.
We also got a console client that correctly handles queries against the fts3 tables our applications create. The reverse is also true: tables created by the console client will be handled correctly by our applications.
I will be glad if this article is useful to someone.
Upd: corrected links