SQLite is a great embedded database (part 3)

The first part is introductory .
The second part is a quick start .

The third part - the subtleties and features.

')
This part is a hodgepodge of all sorts of SQLite features. I gathered here (in my opinion) the most important topics, without understanding of which it is impossible to comprehend SQLite Nirvana.

Since, again, there is a lot of information, the format of the article will be as follows: a small intro into an interesting topic and a link to the home site, where the details are. The site, alas, in English.

Using SQLite in multi-threaded applications

SQLite can be compiled in a single-threaded version (compilation parameter SQLITE_THREADSAFE = 0 ).

In this embodiment, it cannot be simultaneously used from several threads, since the synchronization code is completely absent. What for? For breakneck speed.

You can check whether multithreading is possible via the sqlite3_threadsafe () call: if returned 0, then this is a single-threaded SQLite.

By default, SQLite is compiled with thread support (sqlite3.dll).

There are two ways to use multithreaded SQLite: serialized and multi-thread.

Serialized (you must specify the SQLITE_OPEN_FULLMUTEX flag when opening a connection). In this mode, threads can arbitrarily pull calls to SQLite, no restrictions. But all calls block each other and are processed strictly sequentially.

Multi-thread ( SQLITE_OPEN_NOMUTEX ). In this mode, you cannot use the same connection from several threads simultaneously (but simultaneous use of different connections by different threads is allowed). This mode is usually used.

To learn more

Data format

SQLite database can store (text) data in UTF-8 or UTF-16.

A set of API calls consists of calls that receive UTF-8 ( sqlite3_XXX ) and calls that receive UTF-16 ( sqlite3_XXX16 ).

If the data type of the interface and connection does not match, then the conversion is performed on the fly.

Always use UTF-8.

UNICODE support

By default, there is no support. You need to create your collation (comparison method) in sqlite3_create_collation .
And define your built-in functions like (), upper (), lower () via www.sqlite.org/c3ref/create_function.html .

There is such a project “International Components for Unicode” , ICU.

And some collect SQLite DLL already with it .

How to use ICU in SQLite.

Came across article on Habré .

Data Types and Value Comparison

As already mentioned, SQLIte allows you to write any value to any column.

The value inside the database can belong to one of the following types of storage ( storage class ):
Null
INTEGER (takes 1,2,3,4,6 or 8 bytes),
REAL (floating point number, 8 bytes in IEEE format),
TEXT (a string in the database data format, usually UTF-8),
BLOB (binary data stored "as is").

Sorting order of values of different types:
- NULL least (including other NULL );
- INTEGER and REAL are less than any TEXT and BLOB , they are compared arithmetically to each other;
- TEXT is less than any BLOB , compared to each other on the basis of their collation;
- BLOBs are compared with each other via memcmp () .

SQLite performs implicit type conversions on the fly in several places:
- when a value is entered into a column (the column type specifies a recommendation for conversion);
- when comparing the values between themselves.

A column can have the following casting recommendations : TEXT , NUMERIC , INTEGER , REAL , NONE .

BLOB and NULL values are always entered in any column “as is”.

In the TEXT column, the TEXT values are entered “as is”, the values INTEGER and REAL become strings.
In the column NUMERIC , INTEGER, the numbers are written “as is”, and the lines become numbers if _mog_ (that is, the inverse “lossless” transformation is allowed).
For the REAL column, the rules are similar to INTEGER ( NUMERIC ); The difference is that all numbers are in floating point format.
In the NONE column, the values are entered “as is” (this type is used by default, unless otherwise specified).

When comparing values of different types with each other, additional type conversion may be performed.

When comparing a number with a string, if the string can be converted to a “lossless” number, it becomes a number.

To learn more

I will note here that in SQLite in a unique index there can be any number of NULL values (Oracle agrees with this and MS SQL disagrees).

Database in memory

If in the call to sqlite3_open () to transfer the file name as ": memory:" , then SQLite will create a connection to the new (clean) database in memory .

This connection is absolutely indistinguishable from the connection to the database in the file according to the logic of use: the same set of SQL commands is available.

~~Alas, it is not possible to open two connections to the same database in memory.~~

UPD: Already, it turns out, you can open two connections to one database in memory.

rc = sqlite3_open("file:memdb1?mode=memory&cache=shared", &db);

 ATTACH DATABASE 'file:memdb1?mode=memory&cache=shared' AS aux1;

To learn more.

Joining simultaneously to several databases

To open a connection to the database, use the sqlite3_open () call.

At any time, we can attach another 10 databases to the open connection via the ATTACH DATABASE SQL command .

 sqlite3_open('foo.sqlite3', &db); //       "foo.sqlite3" sqlite3_exec(&db, "ATTACH 'bar.sqlite3' AS bar", ... ); //  "bar.sqlite3"

Now all the database tables in the db1.sqlite3 file have become transparently available in our connection.

To resolve name conflicts, use the name of the attachment (the main base is called “main” ):

 SELECT * FROM main.my_table UNION SELECT * FROM bar.my_table

Nothing prevents a new database from being added to the database and used for caching, etc.

 sqlite3_open('foo.sqlite3', &db); //       "foo.sqlite3" sqlite3_exec(&db, "ATTACH ':memory:' AS mem", ... ); //

To learn more

This is a very useful feature. The joining databases must have the same data format as the main database, otherwise it is an error.

Temporary database

Pass an empty string instead of the file name in sqlite3_open () and a temporary database will be created in the file on disk. Moreover, after closing the connection to the database, it will be deleted from the disk.

DB tweaks through the PRAGMA command

The PRAGMA SQL command is used to specify various settings for the connection or for the database itself:

  PRAGMA name; //     name PRAGMA name = value; //   name  value

Connection setup (obviously) should be carried out immediately after opening and before using it.

A full description of all parameters is here .

I will dwell on the most important things.

 PRAGMA page_size = bytes; //   ;   -       ,       (  4096) PRAGMA cache_size = -kibibytes; //      ,     2000   PRAGMA encoding = "UTF-8"; //   ,   UTF-8 PRAGMA foreign_keys = 1; //   foreign keys,   -  PRAGMA journal_mode = DELETE | TRUNCATE | PERSIST | MEMORY | WAL | OFF; //   , .  PRAGMA synchronous = 0 | OFF | 1 | NORMAL | 2 | FULL; //   , .

Transaction log and commit

Here we come to the topic, the mastery of which immediately takes you to the third level of the master of SQLite.

SQLite carefully maintains the integrity of data in the database ( ACID ), implementing the mechanism for changing data through transactions .

Briefly about transactions: the transaction is either fully rolled out, or completely rolled back. Intermediate states can not be.

If you are not using transactions explicitly ( BEGIN; ...; COMMIT; ), then an implicit transaction is always created . It starts before the execution of the command and is committed immediately after.

From here, by the way, and complaints about the "slowness" of SQLite. SQLite can insert up to 50 thousand records per second, but it cannot record transactions more than ~ 50 per second.

That is why it is not possible to insert records quickly using an implicit transaction.

With the default settings, SQLite ensures the integrity of the database even when the power is turned off during operation.

Such amazing behavior is achieved by logging (a special file) and an ingenious mechanism for synchronizing changes on disk.

A brief update of the data in the database works like this:

- before any modification of the database, SQLite saves the modified pages from the database in a separate file (log), that is, simply copies them there;
- making sure that a copy of the pages is created, SQLite begins to change the database;
- making sure that all changes in the database "reached the disk" and the database became complete, SQLite erases the log.

The atomicity of the transaction mechanism is described in detail here .

If SQLite opens a connection to the database and sees that the log is already there, he realizes that the database is in an incomplete state and automatically rolls back the last transaction.

That is, the database recovery mechanism after failures, in fact, is built into SQLite and works seamlessly for the user.

By default, the log is kept in DELETE mode.

 PRAGMA journal_mode = DELETE

This means that the log file is deleted after the completion of the transaction. The fact that there is a log file in this mode means for SQLite that the transaction was not completed, the database needs to be restored. The log file has the name of the database file to which "-journal" is added.

In TRUNCATE mode, the log file is truncated to zero (on some systems, this works faster than deleting the file).

In PERSIST mode , the beginning of the log file is filled with zeros (at the same time its size does not change and it can take up a lot of space).

In MEMORY mode, the log file is kept in memory and it works quickly, but it does not guarantee recovery of the database in case of failures (there is no copy of the data on the disk).

And you can completely disable the journal ( PRAGMA journal_mode = OFF ). In this situation, the rollback of transactions stops working (the ROLLBACK command) and the base is likely to deteriorate if the program is terminated abnormally.

For a database in memory, the log mode can only be either MEMORY or OFF .

Let's go back a bit. How does SQLite “convince” that the base will always be complete?

We know that modern systems use ingenious caching to improve performance and can postpone writing to disk.

Suppose SQLite has completed writing to the database and wants to erase the log file to mark the fact of a committed transaction.

What if the file is erased before the database is updated?

If during this period the power is turned off, then the log will no longer exist, and the database will not be complete - data loss!

In short, the ingenious mechanism for committing changes must rely on some guarantees from the disk system and the OS.

PRAGMA synchronous sets the degree of SQLite paranoia to that account.

The OFF (or 0) mode means: SQLite assumes that the data is fixed on the disk immediately after it was transferred to the OS (that is, immediately after calling the corresponding OS API).

This means that integrity is guaranteed when the application crashes (as the OS continues to operate), but not when the OS crashes or when the power is turned off.

NORMAL synchronization mode (or 1) ensures integrity during OS crashes and almost all power cuts. There is a non-zero chance that if you lose power at the most inopportune moment, the base will deteriorate. This is a kind of medium compromise in performance and reliability.

FULL mode ensures integrity always and everywhere and in case of any accidents. But it works, of course, more slowly, since waiting periods are made in certain places. And this is the default mode.

So, only the topic of a WAL -type journal remained uncovered .

WAL log mode

By default, the database logging mode is always “returned” to DELETE . Suppose we opened a connection to the database and set the PERSIST mode. Changed the data, closed the connection.

There is a log file on the disk (the beginning of which is filled with zeros).

Open the connection to the database again. If you do not set the log mode in this connection, it will again work in DELETE . As soon as we update the data, the transaction commit mechanism will erase the log file.

WAL logging mode works differently - it is “permanent.” As soon as we transferred the base to WAL mode, it will remain in this mode until it is explicitly changed the mode of the journal to another.

So why is it needed?

Initially, SQLite was designed as a built-in database. The architecture of separation of simultaneous access to data was arranged primitively: at the same time several connections can read a database, but only one connection can record at a time. This, at a minimum, means that the writing connection is waiting for the “release” of the database from the readers. When you try to write to the "busy" database, the application receives an error SQLITE_BUSY (not to be confused with SQLITE_LOCKED !). This sharing mechanism is achieved through the file-blocking API (which do not work well on network drives, so SQLite is not recommended; learn more )

In the WAL ( Write-Ahead Logging ) mode, the “readers” of the database and the “writers” in the database no longer interfere with each other, that is, data modification is allowed while reading simultaneously. In short, this is a step towards large and serious DBMS, in which everything is. It is also argued that SQLite in WAL is faster.

But there are also disadvantages:
- some additional nishtyaki from OS are required (unix and Windows have these nishtyaki);
- DB occupies several files (files “XXX-wal” and “XXX-shm” );
- does not work on large transactions (conditionally, if the transaction is more than 50 MB);
- it is impossible to open such a database in “read only” mode;
- an additional checkpoint operation occurs.

In fact, in WAL mode, the database data is divided between the database and the log file. The checkpoint operation transfers data to the database. By default, this is done automatically if the log has occupied 1000 pages of the database.
That is, fast COMMITs are coming and suddenly some COMMIT has thought and started making a checkpoint. If this behavior is undesirable, you can do checkpoint manually (when everything is calm), you can do it in a separate process.

Limits

Despite its small size, SQLite in reality does not impose serious restrictions on the size of fields, tables or databases.

By default, a BLOB or string value can occupy 1 GB and this is the same size limit for one record (you can raise to 2 ^ 31 - 1, the parameter SQLITE_MAX_LENGTH).

Number of columns: 2000 (can be raised to 32767, SQLITE_MAX_COLUMN).

SQL statement size: 1 MB (1073741824 bytes, SQLITE_MAX_SQL_LENGTH).

Simultaneous join: 64 tables.

Attach databases to the connection: 10 (up to 62, SQLITE_MAX_ATTACHED)

The maximum number of pages in the database: 1073741823 (up to 2147483646, SQLITE_MAX_PAGE_COUNT).

If you set the page size to 65636 bytes, then the maximum database size will be approximately 14 Terabytes.

The maximum number of records in the table: 2 ^ 64 - 1, but in practice, of course, the size limit will come earlier.

Learn more sqlite.org/limits.html

UDP: SQLite optimization links: 1 2 android-1 android-2

To be continued

Source: https://habr.com/ru/post/149635/

All Articles