
Tuning SQLite for PHP


This article evaluates the use of SQLite in PHP projects, discusses how it behaves under concurrent access and how to speed it up. "For dessert" — a small comparison with other DBMSs (MySQL, PostgreSQL, MS SQL, MongoDB).

Start


Not long ago I wanted to build a promo-code service for one of my projects. I did not hesitate over the platform: since I already have a server with a deployed PHP environment, the service itself could safely be written in PHP (especially since I know it). As the DBMS I wanted to try SQLite (even though MySQL and PostgreSQL are already deployed on the server). The reasons: it is easier to transfer to the production environment (just copy the folder) and easier to deploy on a bare server (nothing but PHP is needed). I also wanted to simplify backups (I planned to use disk snapshots and simply copy the resulting slice).

But before using it, I had to check the applicability and performance of this solution.

Hardware


Testing took place on two machines: a local workstation (tested on both an SSD and an HDD) and a cloud server at Selectel running Debian 6.

Preliminary estimate


A simple script was written for the initial performance evaluation:
<?php
// Initialize the database
function initDB() {
    $guid = '';
    // Generate a unique name for the database file
    if (function_exists('com_create_guid') === true) {
        $guid = trim(com_create_guid(), '{}');
    } else {
        $guid = sprintf('%04X%04X-%04X-%04X-%04X-%04X%04X%04X',
            mt_rand(0, 65535), mt_rand(0, 65535), mt_rand(0, 65535),
            mt_rand(16384, 20479), mt_rand(32768, 49151),
            mt_rand(0, 65535), mt_rand(0, 65535), mt_rand(0, 65535));
    }
    // Database file name (unique per run)
    $file = "_$guid.db";
    // Open (and create) the database
    $db = new SQLite3($file);
    // Create the test table
    $db->exec('CREATE TABLE foo (i INT, i2 INT)');
    return $db;
}

// Run the whole test
function testDB($db, $count) {
    for ($i = 0; $i < $count; $i++)
        testIteration($db, $i);
}

// One test iteration
function testIteration($db, $iteration) {
    $db->exec('INSERT INTO foo VALUES ('.$iteration.', 1)');
}

// ---------- Main ----------
// Number of iterations
$COUNT = 1000.0;
if (isset($_REQUEST['COUNT']))
    $COUNT = $_REQUEST['COUNT'];
// Initialize the database
$db = initDB();
// Start the timer
$start = microtime(true);
// Run the test
testDB($db, $COUNT);
// Measure elapsed time
$time = microtime(true) - $start;
// Report the results
echo "$COUNT inserts per " . number_format($time, 2) . "sec. "
   . number_format($COUNT / $time, 2) . " operation per second";
?>

We sequentially insert 1000 records into the table, measure the elapsed time, and calculate the number of operations per second.
For a rough estimate, nothing more is needed.
Note that I deliberately did not use transactions here: each operation runs in its own implicit transaction. This mode is closer to the real use of the database under concurrent access.
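For contrast, here is a quick sketch (my addition, not part of the original benchmark) of the same 1000 inserts wrapped in a single explicit transaction; on a disk-backed database this avoids one sync per row, which is exactly why the per-operation autocommit numbers below are so low on an HDD. The temp-file path is a throwaway of my own.

```php
<?php
// Batch 1000 inserts in one transaction on a throwaway database file.
$base = tempnam(sys_get_temp_dir(), 'sq');
$file = $base . '.db';
$db = new SQLite3($file);
$db->exec('CREATE TABLE foo (i INT, i2 INT)');

$db->exec('BEGIN');                      // one transaction for the whole batch
for ($i = 0; $i < 1000; $i++) {
    $db->exec("INSERT INTO foo VALUES ($i, 1)");
}
$db->exec('COMMIT');                     // a single sync instead of 1000

$count = $db->querySingle('SELECT count(*) FROM foo');
echo "$count rows inserted\n";           // 1000 rows inserted
$db->close();
unlink($file);
unlink($base);
```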

And now the results:
The SSD gave an estimate of 365 operations per second. Not bad.
The HDD managed a total of 13 operations. Sad. But what will the server say?
And the server was a pleasant surprise: 210 operations.
Overall, performance on the server is quite decent, so testing of PHP with SQLite can continue. Just note right away that the server must use a fast storage subsystem (SSD).
In our case, as I said, a cloud at Selectel is used; the SSD there acts as a cache between the disks and the cloud machines (at least as I understood it from the stories of the well-known amarao).

Multithreaded mode


Now let us try to answer the question of whether the database can work under concurrent access.
A wonderful series of articles about SQLite ( habrahabr.ru/post/149356 ) tells us that for multi-threaded mode you need to enable the WAL journal. Add a line to the database initialization (right after opening):
 $db->exec('PRAGMA journal_mode=WAL;'); 
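A quick sanity check (my addition, not in the original article) that the pragma actually took effect: `PRAGMA journal_mode` without a value returns the active mode, and WAL only engages on a file-backed database. The temp-file path here is a throwaway of my own.

```php
<?php
// Verify that WAL engaged on a throwaway file-backed database.
$base = tempnam(sys_get_temp_dir(), 'sq');
$file = $base . '.db';
$db = new SQLite3($file);
$db->exec('PRAGMA journal_mode=WAL;');
$mode = $db->querySingle('PRAGMA journal_mode');
echo $mode . "\n";                       // "wal" on a file-backed database
$db->close();
unlink($file);
unlink($base);
```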

And immediately run the test:
SSD: 2000 transactions per second. The figure grew more than 5 times!
HDD: 42 operations. A 3x increase.
Server: 1000 operations. Almost a 5x increase.

Very unexpected results: a simple change of journal mode gives a 5x speedup.
The next step is to rewrite the script for multi-threaded testing.
We will achieve concurrency by calling the PHP script many times in several threads (closer to how sites really work). For this we use the ab utility (Apache Benchmark):
 ab -n 1000 -c 10 "http://localhost:81/test/benc_sqlite.php" 

The resulting script text:
<?php
// Open an existing database
function openDB() {
    // Shared database file
    $file = "_TEMP.db";
    // Open (and create) the database
    $db = new SQLite3($file);
    // Wait up to 5 s on a locked database
    $db->busyTimeout(5000);
    $db->exec('PRAGMA journal_mode=WAL;');
    return $db;
}

// Initialize the database
function initDB() {
    $db = openDB();
    // Create the test table
    $db->exec('CREATE TABLE foo (i INT, i2 INT)');
    return $db;
}

// Run the test
function testDB($db, $count) {
    return testIteration($db, mt_rand(0, 1000));
}

// One test iteration
function testIteration($db, $iteration) {
    return $db->exec('INSERT INTO foo VALUES ('.$iteration.', 1)');
}

// ---------- Main ----------
if (isset($_REQUEST['init']))
    $db = initDB();
else
    $db = openDB();
// Run the test
if (testDB($db, 1)) echo 'OK'; else echo 'FAIL';
?>

The insertion-speed measurement was removed from the script; ab will report it for us. Initialization is now done only once, because all requests work with a single database.
Additionally, I set
 $db->busyTimeout(5000); 

This is necessary so that, on a concurrent write attempt, the process waits its turn for some time (I set 5 seconds) instead of immediately failing with SQLITE_BUSY.
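To see what busyTimeout actually buys us, here is a small two-connection sketch (my addition, not from the article): connection A takes the write lock with BEGIN IMMEDIATE, and connection B, instead of failing instantly with SQLITE_BUSY, keeps retrying for the configured time before giving up. The temp-file path is a throwaway of my own.

```php
<?php
// Two connections to the same throwaway database file.
$base = tempnam(sys_get_temp_dir(), 'sq');
$file = $base . '.db';
$a = new SQLite3($file);
$a->exec('CREATE TABLE foo (i INT)');
$b = new SQLite3($file);
$b->busyTimeout(200);                 // B waits up to 200 ms for the lock

$a->exec('BEGIN IMMEDIATE');          // A takes the write lock
$start = microtime(true);
$ok = @$b->exec('BEGIN IMMEDIATE');   // B retries for ~200 ms, then gives up
$waited = microtime(true) - $start;
$a->exec('COMMIT');

echo $ok ? "acquired\n" : "busy after " . round($waited * 1000) . " ms\n";
$a->close();
$b->close();
unlink($file);
unlink($base);
```

With the 5-second timeout from the article, most short writes simply succeed a little later instead of erroring out.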

Results:
SSD: 970 operations per second. More than enough.
HDD: 35 operations.
Server: 90 operations when tested from my local machine and 210 when running ab on the server itself. A very big difference from the SSD run; apparently it comes from differences in system configuration (Apache on the server is stripped down to the bare minimum).
It should also be noted that the local-to-server figures fluctuate a lot: the cloud nature of the server has an effect, and the server is almost fifteen hundred kilometers away from me.
In any case, even if we take the lower figure of 90 operations per second, that is more than enough in my opinion. For comparison, Habr averages about 30 page views per second (with strong peaks during the day, of course, but we still have a lot of growing to do before reaching Habr's scale).

From here on, the SSD and HDD figures are taken from the local test, and the server figures from a remote machine.
Server and SSD testing is done by running 1000 requests in 10 threads; for the HDD, 100 requests in 10 threads.

Complication


Let me remind you that the test consisted of a single simple INSERT. It would be nice to have a service that performs one operation on the database, but that is very far from reality. Let us complicate it to three queries (1 SELECT, 1 UPDATE and 1 INSERT).
The final testIteration function:
function testIteration($db, $iteration) {
    $rez = $db->querySingle('SELECT count(*) FROM foo WHERE i='.$iteration);
    if ($rez > 0) {
        $db->exec('UPDATE foo SET i2 = i2+1 WHERE i='.$iteration);
    }
    return $db->exec('INSERT INTO foo VALUES ('.$iteration.', 1)');
}

Obviously the SELECT could be avoided, but this is a test, not an optimization contest.

Results:
SSD: starting from 420 requests. The figure drops from run to run.
HDD: starting from 20 requests (and falling).
Server: starting from 65 requests (and falling).

Database Optimization


I did not like the performance drop, so let us try to fix it — specifically, by adding an index. The search uses only one field, so we index it.

In the initialization, add the line
 $db->exec('CREATE INDEX "idx_foo" ON "foo" ("i");'); 
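To confirm the planner actually uses the new index, EXPLAIN QUERY PLAN can be asked. This check is my addition, not part of the original article, and the exact wording of the plan varies between SQLite versions.

```php
<?php
// Check the query plan for the test SELECT.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE foo (i INT, i2 INT)');
$db->exec('CREATE INDEX "idx_foo" ON "foo" ("i");');

$res = $db->query('EXPLAIN QUERY PLAN SELECT count(*) FROM foo WHERE i=5');
$row = $res->fetchArray(SQLITE3_ASSOC);
// e.g. "SEARCH foo USING COVERING INDEX idx_foo (i=?)" (wording varies)
echo $row['detail'] . "\n";
$db->close();
```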

Results:
SSD: 671 requests. Not bad.
HDD: from 25 down to 12 requests. I do not know what the problem is, but the result is not representative. Perhaps it is because the server and SSD are tested with 1000 requests in 10 threads while the HDD gets only 100 requests in 10 threads — an order of magnitude less data — but such a spread should not have mattered.
Server: 60 requests.

Well, these results please me. But we now have two operations that modify the database — where is my transaction? Let us add it:
function testIteration($db, $iteration) {
    $rez = $db->querySingle('SELECT count(*) FROM foo WHERE i='.$iteration);
    $db->exec('BEGIN;');
    if ($rez > 0) {
        $db->exec('UPDATE foo SET i2 = i2+1 WHERE i='.$iteration);
    }
    $rez = $db->exec('INSERT INTO foo VALUES ('.$iteration.', 1)');
    $db->exec('COMMIT;');
    return $rez;
}

And the results again:
SSD: from 720 requests.
HDD: from 30 requests — and this time the figure is quite stable.
Server: from 70 requests.

Eliminating database errors


Besides executing requests, Apache Benchmark monitors their success: it takes the length of the first response, and any response longer or shorter than it is counted as an error. Interestingly, during long test runs such errors began to appear — only a dozen or so per 1000 requests, but still.
The logs showed the message "SQLite3::exec(): database is locked". Strange: WAL mode, transactions, and still "database is locked". The solution was found in the SQLite mailing list: http://www.mail-archive.com/sqlite-users@sqlite.org/msg05525.html
In multi-threaded mode you must use not "BEGIN" but "BEGIN IMMEDIATE" or "BEGIN EXCLUSIVE" — transaction forms intended for concurrent writers. "BEGIN EXCLUSIVE" still occasionally gave me this problem, but "BEGIN IMMEDIATE" solved it completely.
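One way to package this rule (a sketch of my own, not the article's code; the helper name is made up) is a wrapper that always opens the transaction with BEGIN IMMEDIATE, so the write lock is taken up front and a busy database surfaces as a clean false instead of a mid-transaction failure:

```php
<?php
// Run $work inside a BEGIN IMMEDIATE transaction; returns false if the
// write lock could not be obtained within the busy timeout.
function withWriteTransaction(SQLite3 $db, callable $work) {
    if (!@$db->exec('BEGIN IMMEDIATE')) {
        return false;                 // lock not acquired within busyTimeout
    }
    $result = $work($db);
    $db->exec('COMMIT');
    return $result;
}

$db = new SQLite3(':memory:');
$db->busyTimeout(5000);
$db->exec('CREATE TABLE foo (i INT, i2 INT)');

$ok = withWriteTransaction($db, function ($db) {
    return $db->exec('INSERT INTO foo VALUES (1, 1)');
});
echo $ok ? "OK\n" : "FAIL\n";         // prints "OK"
```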

And now the final tests:
SSD: from 700 requests.
HDD: from 30 requests, and the figure is quite stable.
Server: from 53 requests. Strange — I expected about 70. But this is a cloud; there is no point talking about stability.

By the way, the request rate drops somewhat over time. On a clean database the SSD gives 1000 requests; on a "dirty" one it falls to 350, by which point about 50,000 records have accumulated in the database.

It should be added that if the single-threaded test is complicated in the same way as the multi-threaded one (1 SELECT, 1 UPDATE and 1 INSERT + transaction + index), the SSD result is 1500 iterations per second — so there is room to grow. At the same time, as the database grows, single-threaded performance degrades about twice as slowly as in multi-threaded mode.
Repeated tests did not reveal any significant advantage of SQLite's single-threaded mode over the multi-threaded one.

Overall the test showed quite good results. Granted, they are in places contradictory, debatable and under-explored; but for me personally they fully demonstrated that the PHP + SQLite combination is workable.

Comparison with other DBMS


We have looked at SQLite. But how do other DBMSs behave under the same conditions?
This test runs only on the local machine and only on the SSD, because the local machine has more DBMSs installed than the server, and this mode is closer to how my server works.
On hand were: SQLite (v3.7.7.1), PostgreSQL (v9.2.1), MS SQL Server (v11.0.3000.0 (2012, Developer Edition)), MySQL (v5.5.28), MongoDB (v2.4.4).
PHP used the standard drivers for SQLite and MongoDB; PostgreSQL, MySQL and MS SQL worked through PDO.
Additionally, I changed the test slightly: 10,000 requests are executed in 10 threads. For purity of the experiment, the database was empty each time.
Database initialization is done manually and includes creating the table and the index on the field.
The final script for SQLite:
<?php
// Open an existing database
function openDB() {
    // Shared database file
    $file = "_TEMP.db";
    // Open (and create) the database
    $db = new SQLite3($file);
    // Wait up to 5 s on a locked database
    $db->busyTimeout(5000);
    $db->exec('PRAGMA journal_mode=WAL;');
    return $db;
}

// Initialize the database
function initDB() {
    $db = openDB();
    // Create the test table and index
    $db->exec('CREATE TABLE foo (i INT, i2 INT)');
    $db->exec('CREATE INDEX "idx_foo" ON "foo" ("i");');
    return $db;
}

// Run the test
function testDB($db, $count) {
    return testIteration($db, mt_rand(0, 1000));
}

// One test iteration: 1 SELECT, 1 UPDATE, 1 INSERT in a transaction
function testIteration($db, $iteration) {
    $rez = $db->querySingle('SELECT count(*) FROM foo WHERE i='.$iteration);
    $db->exec('BEGIN IMMEDIATE;');
    if ($rez > 0) {
        $db->exec('UPDATE foo SET i2 = i2+1 WHERE i='.$iteration);
    }
    $rez = $db->exec('INSERT INTO foo VALUES ('.$iteration.', 1)');
    $db->exec('COMMIT;');
    return $rez;
}

// ---------- Main ----------
if (isset($_REQUEST['init']))
    $db = initDB();
else
    $db = openDB();
if (testDB($db, 1)) echo 'OK'; else echo 'FAIL';
?>

Script for PDO (change the connection string for PostgreSQL, MySQL or MS SQL):
<?php
function openDB() {
    // MySQL
    $db = new PDO('mysql:host=127.0.0.1;dbname=test', 'root', '');
    // PostgreSQL
    //$db = new PDO("pgsql:host=127.0.0.1;dbname=test", "postgres", "");
    // MS SQL
    //$db = new PDO("sqlsrv:Database=test;Server=127.0.0.1\SQL2012", "sa", "");
    return $db;
}

// Run the test
function testDB($db, $count) {
    return testIteration($db, mt_rand(0, 1000));
}

// One test iteration
function testIteration($db, $iteration) {
    $rez = $db->query('SELECT count(*) FROM foo WHERE i='.$iteration);
    $rez2 = $rez->fetchAll();
    $rez2 = $rez2[0][0];
    $rez->closeCursor();
    if (!$db->beginTransaction()) return false;
    if ($rez2 > 0) {
        $db->exec('UPDATE foo SET i2 = i2+1 WHERE i='.$iteration);
    }
    $rez = $db->exec('INSERT INTO foo VALUES ('.$iteration.', 1)');
    $db->commit();
    return $rez;
}

// ---------- Main ----------
$db = openDB();
if (testDB($db, 1)) echo 'OK'; else echo 'FAIL';
?>

Script for MongoDB
<?php
function openDB() {
    $conn = new Mongo('localhost');
    $db = $conn->test;
    return $db;
}

// Run the test
function testDB($db, $count) {
    return testIteration($db, mt_rand(0, 1000));
}

// One test iteration
function testIteration($db, $iteration) {
    $foo = $db->foo;
    $q = array('i' => $iteration);
    $ch = $foo->find($q);
    if ($ch->count() > 0) {
        $foo->update(array('i' => $iteration), array('$inc' => array("i2" => 1)));
    }
    $rez = $foo->insert(array('i' => $iteration, 'i2' => 1));
    return $rez;
}

// ---------- Main ----------
$db = openDB();
if (testDB($db, 1)) echo 'OK'; else echo 'FAIL';
?>

Results:
SQLite: 993 queries per second. Quite an expected result.
PostgreSQL: 178–202 queries per second. Rather little; apparently the problem is in the default configuration.
MS SQL: 1400 queries per second. Very good.
MySQL: 1490 queries per second. So much for default settings :). It is worth noting that initially it showed 9 requests per second; replacing localhost with 127.0.0.1 in the connection string fixed it — as far as I know, because IPv6 is active on the system.
MongoDB: 3173 requests per second. A remarkable result, although I expected more from NoSQL.

Frankly, the comparison is not entirely fair: the test ran on default settings, and I lack the administrative skills to tune each DBMS to parity.
MongoDB is focused first of all on speed, SQLite on reliability, and PostgreSQL in its default configuration on minimal resource consumption. Perhaps that is the reason for its poor showing.

UPD: I was told as a "big secret" that PostgreSQL does not like reconnections. So I ran single-threaded tests for each DBMS.
The test measures time in single-threaded mode: 10,000 operations on a clean database, each operation being 1 SELECT, 1 UPDATE and 1 INSERT in a transaction. Speed is measured inside the script (as in the first test), excluding the connection time to the database.
The results are as follows:
PostgreSQL: 1403. A 7x increase, which confirms PostgreSQL's dislike of reconnections.
PostgreSQL (fsync = off): 1766. A quarter faster than the default mode. In multi-threaded mode this gain did not appear.
SQLite: 938. Strange — earlier it was 1500; probably there was a mistake in that test.
MS SQL: 1993. There is growth, but not dramatic — about 40% (1.4 times).
MySQL: 2514. Growth of 1.7 times.
MongoDB: 7603. Growth of almost 2.5 times.


The results are interesting enough that I decided to also compare the DBMSs on the server. Let me remind you that the server runs Debian 6, with SQLite (v3.7.3), PostgreSQL (v9.0.4) and MySQL (v5.1.66) installed, all in default configuration. ab was run locally on the server: 10,000 requests in 10 threads.
Results:
SQLite: 174 queries per second.
PostgreSQL: 104 queries per second.
MySQL: 167 queries per second.

As we can see, there is no such large gap as on the local machine. But do not forget that the server uses an SSD (albeit not in "pure" form, but as part of the cloud storage system).

Total


In the end we can say that the PHP + SQLite combination is quite workable for low-load projects. It is enough to follow a few simple rules:
  1. Use the WAL journaling mode.
  2. Do not forget to set a busy timeout.
  3. Combine writes into transactions — and not plain ones, but "BEGIN IMMEDIATE".

If the project cannot be classified as small, an SSD for the database may well help.

As for high load — the people who deal with it know what to do without my advice.

Comparing SQLite with other DBMSs on default settings, the performance is comparable — at least when using an SSD.
At the same time, the ease of setup and administration (there is nothing to configure or administer) and the ease of moving to another server (a simple copy-paste) give SQLite a certain advantage, first of all for developers. For SQLite to work with PHP, nothing but PHP itself is needed.

Source: https://habr.com/ru/post/204438/

