📜 ⬆️ ⬇️

SQLite and full UNICODE

Probably many people know the embed SQLite database. SQLite fully supports UTF-8 and UTF-16 encodings. But there is one caveat, for string and text fields whose characters extend beyond the ASCII table, case-insensitive LIKE and ORDER BY do not work.
For example:
 sqlite> SELECT s "LIKE s";
 0
while
 sqlite> SELECT "s" LIKE "S";
 one
Let's see how to fix it.

A long google leads us to the ICU extension for SQLite. As can be seen from the readme, the ICU extension replaces the functions upper () and lower (), which are responsible for converting the case of characters. In addition, this extension adds the implementation of the REGEXP operator, for selecting text fields by a regular expression (SQLite at the language level supports the REGEXP operator, but the function that implements it is supplied without implementation, with the expectation of a custom implementation).

To get started with ICU extensions, you first need to compile it into a dynamic library.
This will require the unicode support library icu, and the extension ICU code itself:
For Mac OS (assuming macports is installed):
 $ sudo port install icu
 $ wget http://www.sqlite.org/cvstrac/getfile?f=sqlite/ext/icu/icu.c
 $ gcc -dynamiclib icu.c -o libsqliteicu.dylib `icu-config --cppflags`` icu-config --ldflags`
Or Debian:
 $ sudo apt-get install libicu-dev
 $ wget http://www.sqlite.org/cvstrac/getfile?f=sqlite/ext/icu/icu.c
 $ gcc -shared icu.c -o libsqliteicu.so `icu-config --cppflags`` icu-config --ldflags`

Now run the SQLite3 CLI, and enjoy the result :)
 $ sqlite3
 load the extension
 sqlite> .load libsqliteicu.dylib
 we set Russian collation
 sqlite> SELECT icu_load_collation ('en_US', 'RUSSIAN');
 sqlite> SELECT s "LIKE s";
 one

But in the end, everything is not so simple. To load an extension through the API and not through the CLI, you need to call the function sqlite3_enable_load_extension . If the SQLite driver for your language has a wrapper for this function, or you write in C / C ++, then everything is fine. But the ruby ​​for ruby ​​ruby-sqlite3, this function does not support ...

The first thought is to add this function to the driver :) But, the option was found simpler. It turns out that the ICU extension can be embedded in SQLite.
 $ wget http://www.sqlite.org/sqlite-amalgamation-3.6.13.tar.gz
 $ tar xzfv sqlite *
 $ cd sqlite *
 $ CFLAGS = '- Os -DSQLITE_ENABLE_ICU' CPPFLAGS = `icu-config --cppflags` LDFLAGS =` icu-config --ldflags` ./configure
 $ make && sudo make install

After that, reinstall the adapter for Ruby (to rebuild), and rejoice in solving the problem :)
')
But, of course, not without its drawbacks. Because of its dependence on ICU, the library will weigh several megabytes, which is not very good for an embed database. If sqlite is used for the site (as in my case), then this should not worry.
The solution may be to create your own collation using sqlite3_create_collation .

UPD: For those who use the Ruby Sequel library. As I said above, ICU extension also adds a REGEXP statement. But, if in Sequel try to execute a query with regular expressions, like so
 p DB [: artists] .filter (: name => /^a.*/i).all
then throws an exception that SQLite does not support regular expressions.
To get around this, I wrote a small monkey patch .

Source: https://habr.com/ru/post/57915/


All Articles