
PHP and various types of NoSQL



Various NoSQL databases have been gaining popularity lately. This article began as a study of the features of the graph database Neo4j. But while gathering material, I wanted to systematize what is known about NoSQL solutions in general and graph databases in particular.
For this small study I picked out, for detailed consideration, DBMSs that are successfully used in the Web domain. And since “PHP” is in the tags, I chose only DBMSs that can already be used with this language.



The article turned out to be long, so for ease of navigation I suggest using the table of contents:
  1. NoSQL Types
  2. Key-value stores

  3. Bigtable stores

  4. Graph stores

  5. Document Stores

  6. Some conclusions


NoSQL Types


All NoSQL DBMSs fall into several categories:
  1. Key-value stores
  2. Column family (Bigtable) stores
  3. Graph stores
  4. Document stores


The figure below shows schematically the volumes of data used and the complexity of this data in these types of NoSQL.


In each section I tried to arrange the DBMSs in order of increasing functionality. The ordering may be somewhat subjective.

Some databases combine several categories, for example OrientDB. According to its official description, it is both a graph and a document-oriented database. Sometimes it is even classed as a key-value store and a column family store. It is covered in more detail later, in its own OrientDB section.

Let us look at each category in turn.


Key-value stores


Key-value stores are the area in which NoSQL solutions most clearly show their advantage over SQL.
Many consider this area the most in demand in both the short and the long term.
Michael Widenius, the original author of the open-source MySQL database, is one of them.
Key-value stores are very popular and are developing quickly and well, apparently because there are so many of them and competition is strong. Most of the NoSQL databases I studied while writing this article are key-value stores.

There is an article on Habr about key-value stores for PHP, with which I do not fully agree. The selection of stores presented in it (Voldemort, Scalaris, MemcacheDB, ThruDB, CouchDB) no longer seems relevant to me now that almost five years have passed since it was published. Moreover, the CouchDB described there is not a key-value store at all but a document-oriented DBMS (see the section on document-oriented DBMSs).


MemcacheDB

Description : the same memcached, only with a BerkeleyDB backend.
Performance : the developers published test results showing an average single-threaded performance of 18,868 w/s (write operations per second) and 44,444 r/s (read operations per second). The tests ran on a Dell 2950III server, which even in its weakest configuration is far from a weak machine.
Installation : everything is built from source. In PHP, the usual Memcached extension from PECL is used.
License : BSD-like license, free for commercial and non-commercial projects.


Redis

Description : There is an introductory article on Habr with benchmarks and links. There are transactions (about them) and replication. Version 3.0 is on the way; it will introduce Redis Cluster and is expected to be significantly faster. There is a nice interactive tutorial.
Performance : ~110,000 w/s and ~81,000 r/s on mid-range hardware.
Installation : it is recommended to build both Redis and the PHP client from source. There are quite a few clients (list); I personally recommend phpredis for its good documentation and its support for all (or almost all) existing Redis functionality.
License : BSD license: everything is free, but if something breaks, there are no claims against the developers.


Tarantool

Description : An in-memory store. It is positioned against Redis and, according to its developers, is faster because all data is kept in memory. There is a built-in queue mechanism. There are good Habr articles describing the main features.
Installation : on Ubuntu it is installed with apt-get and a little magic (see the official page); the PHP client is built from source (github).
Performance : on par with Redis, though the test results are contradictory: Tarantool is faster than Redis in its developer's tests and on par with Redis in an ordinary user's tests.
License : Simplified BSD, free of charge.


Riak

Description : A database with a strong focus on fault tolerance and distribution. The focus is so strong that the vendor recommends allocating at least five servers to Riak in order to evaluate its capabilities. At first glance it is a key-value store, but it also offers search across all fields, secondary keys, and MapReduce. No transactions. There is a detailed and thorough Habr article.
Installation : many options, including packages for Debian / Ubuntu. For PHP there is a PECL package as well as the official PHP client.
Performance : it is not the top priority, but there are mentions of 2,500 operations per second.
License : Apache 2 License for the open-source version, which is free; the commercial Riak Enterprise version starts at $2,800 per year for one copy.


Aerospike

Description : Scalable storage for huge amounts of data with minimal latency. Transactions are on by default, and ACID support has a dedicated page. Secondary indexes appeared in version 3. The number of proprietary scaling, replication, and clustering technologies is impressive (link). I remember this system as a powerful, industrial-grade Memcached.
Installation : Aerospike is installed from the distribution package; the official PHP client exists only for Aerospike 2 and is built from source.
Performance : the declared speed is 180,000 to 400,000 operations per second with latency measured in microseconds (source).
License :



FoundationDB

Description : Positioned as a comprehensive solution that is as simple as possible to install and configure. Easy scalability and easy management are the keywords that catch the eye. Users are offered "uncompromised ACID transactions". Different data models can be used: key/value, document, and even SQL. This DBMS seemed especially interesting to me once I read about its performance.
Performance : 3,750,000 r/s*. *Random reads of records from RAM (cache). The performance section of the official website has many interesting tests, the “slowest” of which shows ~235,000 operations per second (a 50/50 mix of reads and writes). Read latency is under 2 ms and commit latency under 15 ms. The results were obtained on a cluster of 24 machines, each with 16 GB RAM and 2x200 GB SSD; the test database held 2 million key-value records, and all operations were transactional with the maximum isolation level and triple replication.
Installation : everything is simple here: a DEB package for Ubuntu and a PEAR package for PHP.
License :


Some interesting projects were left out of this list because they lack PHP support. Voldemort, Scalaris, and ThruDB were also left out, because of poor performance or poor documentation, and because nothing has changed for the better since 2009.




Column Family (Bigtable) stores / Scalable distributed storage


The stores presented in this section are mostly based on the design of the original Google Bigtable.
Their main feature is working with data volumes measured in terabytes.
Instant access speed matters less here; the emphasis is on distribution, fault tolerance, and the ability to process huge amounts of information.


HBase

Description : An open-source Apache project based on the original Google Bigtable design. Developed as part of the Hadoop project. Facebook itself uses it as the basis of its messaging service. In HBase, selection is done on a single indexed field. ACID support is partial: transactions do exist, but they are not supported in the most obvious way.
Installation : PHP connects through Thrift; installation and use are well described in this Habr article.
Performance : field tests with an unusual way of measuring performance: on a cluster of 7 servers (16 GB RAM, 8-core CPU, HDD), operations were run against a table with 3 billion records. 300 read/write processes ran simultaneously, and the time per operation was measured. The average write time was 10 ms and the average read time 18 ms.
License : Apache License 2.0, free to use for any purpose.


Hypertable

Description : An interesting project, similar to HBase. It is slightly faster and has a much more familiar query syntax, HQL. Query example:
select * from QueryLogByUserID where row =^ '003269359' AND "2008-11-13 05:00:00" <= TIMESTAMP < "2008-11-13 06:00:00" 

There are no transactions, which is stated clearly in the first lines of the documentation on the official website.
Installation : PHP connects through Thrift and the official ThriftClient (github).
Performance : there are several graphs on the official site. As mentioned above, performance is similar to that of HBase.
License : GNU General Public License Version 3, free to use for any purpose. 24/7 support is available at additional cost.
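
To make the HQL query in the description more concrete: `row =^ '003269359'` selects rows whose key starts with that prefix, and the TIMESTAMP clause keeps only cells from a one-hour window. Below is a minimal plain-PHP sketch of the same filter over an in-memory array. The sample rows and the function name `scanByPrefixAndTime` are hypothetical, and this is only an illustration of the selection logic, not of how Hypertable executes a scan.

```php
<?php
// Hypothetical sample rows: a row key plus the timestamp of the stored cell.
$rows = [
    ['row' => '003269359-a', 'ts' => '2008-11-13 04:59:59'],
    ['row' => '003269359-b', 'ts' => '2008-11-13 05:30:00'],
    ['row' => '003269360-a', 'ts' => '2008-11-13 05:30:00'],
    ['row' => '003269359-c', 'ts' => '2008-11-13 06:00:00'],
];

/** Return row keys that start with $prefix and whose timestamp is in [$from, $to). */
function scanByPrefixAndTime(array $rows, string $prefix, string $from, string $to): array
{
    $hits = [];
    foreach ($rows as $row) {
        $hasPrefix = strncmp($row['row'], $prefix, strlen($prefix)) === 0;
        // 'Y-m-d H:i:s' strings sort chronologically, so strcmp is enough here.
        $inWindow = strcmp($row['ts'], $from) >= 0 && strcmp($row['ts'], $to) < 0;
        if ($hasPrefix && $inWindow) {
            $hits[] = $row['row'];
        }
    }
    return $hits;
}

$hits = scanByPrefixAndTime($rows, '003269359', '2008-11-13 05:00:00', '2008-11-13 06:00:00');
print_r($hits); // only '003269359-b' satisfies both conditions
```

A real Bigtable-style store keeps rows sorted by key, so a prefix scan reads one contiguous range instead of checking every row as this sketch does.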


Cassandra

Description : Distributed storage, originally developed at Facebook and later handed over to Apache. Unlike the stores above, Cassandra is a decentralized distributed hash table (DHT) based on Amazon's Dynamo. It has a query language, CQL, which is very similar to SQL with some limitations. You can build queries that select several columns and add secondary indexes. Version 2.0 has "transactions" that work on the compare-and-swap principle.

A transactional request is an ordinary INSERT or UPDATE with an IF condition appended; it is applied only when the condition holds.

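The compare-and-swap principle itself can be sketched in plain PHP, independently of Cassandra: a write is applied only if the current value still equals the value the writer expects. This is not CQL and not a Cassandra client call; the function name `compareAndSwap` and the in-memory array are hypothetical stand-ins, and a real cluster performs the check atomically on the server side, which this sketch does not attempt to show.

```php
<?php
/**
 * Hypothetical helper: set $store[$key] to $new only if it currently equals $expected.
 * Returns true when the swap was applied, false otherwise.
 */
function compareAndSwap(array &$store, string $key, $expected, $new): bool
{
    $current = $store[$key] ?? null;
    if ($current !== $expected) {
        return false; // the value was changed by someone else first
    }
    $store[$key] = $new;
    return true;
}

// Made-up record used only for illustration.
$store = ['user:42:email' => 'old@example.com'];

// Both writers believe the current value is still the old address.
$first  = compareAndSwap($store, 'user:42:email', 'old@example.com', 'new@example.com');
$second = compareAndSwap($store, 'user:42:email', 'old@example.com', 'other@example.com');

var_dump($first);  // bool(true): the expected value matched, the write went through
var_dump($second); // bool(false): the value had already changed, the write was rejected
echo $store['user:42:email'], PHP_EOL; // new@example.com
```

The second writer has to re-read the value and decide whether to retry, which is why the article puts "transactions" in quotes: nothing is rolled back, a conflicting write simply does not happen.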

Installation : There are several ways to connect PHP and Cassandra (the same Thrift, Cassandra-PHP-Client-Library, cassandra-pdo). The last option seemed the most pleasant to me.
Performance : there are good comparative tests with graphs; according to them, on 8 servers with a 50/50 mix of reads and writes, Cassandra performs about 9,000 operations per second. HBase manages about 2,500 under the same conditions.
License : Apache License 2.0, free to use for any purpose.

There are other Bigtable-like solutions, for example Stratosphere, HPCC, Cloudera, and Cloudata. They are not reviewed in detail for various reasons, such as lack of PHP support, low adoption, or poor documentation.




Graph Stores / Graph DBMS



This article was started for their sake. I recently discovered graph NoSQL databases as a new way of structuring stored data and was very pleased, because in a number of projects I had to implement basic graph DBMS functionality with rather complicated MySQL queries.

In a graph DBMS, the structure of the stored data may look like this:

If you add all films to a graph DBMS and link each film to the actors who appear in it, queries over those relationships become easy to write.
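
A minimal sketch of the film and actor graph in plain PHP, assuming made-up titles and names and a hypothetical `coActors` helper. It stores the edges as adjacency lists and finds everyone who shared a film with a given actor, which is the kind of relationship query a graph DBMS is built to answer.

```php
<?php
// Hypothetical data: film => list of actors appearing in it.
$actsIn = [
    'Film A' => ['Alice', 'Bob'],
    'Film B' => ['Bob', 'Carol'],
    'Film C' => ['Dave'],
];

// Build the reverse adjacency list: actor => list of films.
$filmsOf = [];
foreach ($actsIn as $film => $actors) {
    foreach ($actors as $actor) {
        $filmsOf[$actor][] = $film;
    }
}

/** Return the sorted names of actors who share at least one film with $actor. */
function coActors(string $actor, array $filmsOf, array $actsIn): array
{
    $found = [];
    foreach ($filmsOf[$actor] ?? [] as $film) {
        foreach ($actsIn[$film] as $other) {
            if ($other !== $actor) {
                $found[$other] = true; // array keys deduplicate repeated names
            }
        }
    }
    $names = array_keys($found);
    sort($names);
    return $names;
}

print_r(coActors('Bob', $filmsOf, $actsIn)); // Alice and Carol
```

In a relational schema the same question needs a self-join through a linking table, and every extra hop adds another join; in a graph store it stays a traversal along edges.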



Neo4j

Description : the most successful and most widely used project among graph DBMSs. It fully supports ACID, is easy to install, and scales effortlessly. It already has a well-developed community, so answers to most questions can be found quickly. Its use together with PHP is described in this article.
Installation : installed from its own repository; the Neo4jPHP client is used for PHP.
Performance : given the specific nature of graph queries, it seemed odd to me to quote raw read/write speeds. It handles complex selections and does so many times faster than relational DBMSs.
License :



In this section I described only one DBMS; its most interesting competitor, OrientDB, is covered below. As it turned out, there are not that many graph databases for the Web, and for PHP in particular.
There is also Titan, which uses HBase, BerkeleyDB, or Cassandra as its back end. There is little information about it, and even fewer ways to make it work with PHP.
FlockDB from Twitter is also worth remembering; it can be connected to PHP through a Thrift client. But again, because so little information is available about this DBMS, it is difficult to form a complete and objective opinion of it.




Document Stores / Document Storage


This section covers document-oriented stores, DBMSs for hierarchical data structures. These stores are versatile: they offer high read/write speeds, are flexible about the format of stored data, work easily with unstructured data, and provide ample room for scaling.
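
As a minimal illustration of what "hierarchical" and "flexible" mean here, the sketch below keeps two differently shaped documents in one collection and filters on a nested field. The collection, the field names, and the `findByPath` helper are hypothetical; real document stores provide their own query languages and indexes for this.

```php
<?php
// Hypothetical collection: documents need not share the same set of fields.
$articles = [
    ['title' => 'PHP and NoSQL', 'tags' => ['php', 'nosql'], 'author' => ['name' => 'anna', 'karma' => 12]],
    ['title' => 'Graphs', 'author' => ['name' => 'boris']], // no tags, no karma
];

/** Return documents whose value at a dot-separated path equals $value. */
function findByPath(array $docs, string $path, $value): array
{
    $matches = [];
    foreach ($docs as $doc) {
        $node = $doc;
        foreach (explode('.', $path) as $key) {
            if (!is_array($node) || !array_key_exists($key, $node)) {
                continue 2; // the path is missing in this document: skip it
            }
            $node = $node[$key];
        }
        if ($node === $value) {
            $matches[] = $doc;
        }
    }
    return $matches;
}

$hits = findByPath($articles, 'author.name', 'boris');
echo json_encode($hits[0]), PHP_EOL; // {"title":"Graphs","author":{"name":"boris"}}
```

A document that lacks the requested path is simply skipped, with no schema migration needed, which is the flexibility the paragraph above refers to.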


MongoDB

Description : Perhaps the most popular document-oriented NoSQL DBMS. Data is stored in JSON / BSON format. Good scaling, replication, indexes, MapReduce. Transactions are implemented as compare-and-swap.
Installation : MongoDB from the repository, the PHP client from PECL.
Performance : the comparative tests mentioned earlier in the article also include MongoDB results.
License : GNU AGPL, open source and free to use.


CouchDB

Description : An Apache project, similar to MongoDB in many ways. It differs in having no locking during read operations and in a more complicated sharding technology.
Installation : CouchDB from the repository; for the PHP client there are several options (PHPillow, PHP Object Freezer, PHP-on-Couch, an extension from PECL).
Performance : according to the results of one test, it is noticeably slower than MongoDB.
License : Apache 2.0, free to use.

There are many more projects in this area, but they seemed very much alike to me. Then again, perhaps I simply did not study them deeply enough.


OrientDB

Description : a document-oriented and, at the same time, graph DBMS.

As a document-oriented DBMS, its closest competitor is MongoDB. A separate page is devoted to this comparison.
The main advantages of OrientDB:

The query language deserves a separate mention. Compare how identical update queries look:

As a graph DBMS, its main competitor is Neo4j. I must say that mastering the graph capabilities of OrientDB is much harder than mastering those of Neo4j. A first impression can be had from this article.
Installation : installation takes some work; here is a fully working manual, and this library is recommended as the PHP client.
Performance : 150,000 w/s is promised; there is also a comparison of graph DBMSs.
License :



Some conclusions


While writing this article I found a lot of interesting and useful information, and I am glad to share it with Habr readers.

I really liked FoundationDB, Neo4j, and OrientDB. I would like to devote a separate article to each of them.

In conclusion, I would like to share a fun picture that helps you quickly choose a NoSQL solution for your project. I saw the picture in 4dmonster's comments, for which I thank him.
image

Source: https://habr.com/ru/post/214647/

