Building real-time web applications with RethinkDB

From the translator: I only recently discovered this rather interesting database and then came across a fresh article about it. There is almost nothing about RethinkDB on Habr, so I decided to translate it. Welcome under the cut!

The RethinkDB database simplifies the development of web applications designed for real-time updates.

RethinkDB is an open source database for real-time applications. It has a built-in change notification system that continuously pushes updates to your application. Instead of constantly polling for new data, you let the database send you the latest changes itself. The ability to "subscribe" to streaming updates can greatly simplify the architecture of your application and the handling of clients that maintain a persistent connection to your server side.
RethinkDB is a schemaless JSON document store that also supports some features of relational databases. It also supports clustering, which makes it very easy to scale. You can configure sharding and replication via the built-in web interface. The latest version of RethinkDB also includes automatic failover for clusters with three or more servers. (Translator's note: this means the database can keep working if a server crashes.)

The RethinkDB query language, ReQL, is embedded natively in the programming language in which you write your application. If, for example, you code in Python, you write database queries using ordinary Python syntax. Each query is composed of functions that the developer chains together to describe the required operation precisely.

A few words about ReQL
RethinkDB contains tables that store ordinary JSON documents. The JSON objects themselves can be deeply nested. Each document in RethinkDB has a primary key, the “id” property, whose value is unique within its table. By referring to the primary key in a query, you can fetch a specific document.

Writing ReQL queries in an application is quite similar to using a SQL query builder API. Below is a simple ReQL query, written in JavaScript, that counts the unique last names in the users table:

r.table("users").pluck("last_name").distinct().count()

In a ReQL query, each function in the chain operates on the output of the previous function. To be precise, this query is executed as follows:

table accesses the specified table in the database.
pluck extracts a specific property (or several properties) from each record.
distinct removes duplicate values, leaving only unique ones.
count counts the resulting items and returns their number.
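
To make the data flow concrete, here is a rough plain-JavaScript analogue of the same chain, run on an in-memory array rather than on the database. The sample users array and the helper functions are made up for illustration; ReQL itself performs these steps on the server.

```javascript
// Hypothetical sample data standing in for the "users" table.
const users = [
  { id: 1, first_name: "Bilbo", last_name: "Baggins" },
  { id: 2, first_name: "Frodo", last_name: "Baggins" },
  { id: 3, first_name: "Sam", last_name: "Gamgee" }
];

// pluck: keep only the listed properties of each document.
function pluck(docs, ...fields) {
  return docs.map(doc => {
    const picked = {};
    for (const field of fields) {
      if (field in doc) picked[field] = doc[field];
    }
    return picked;
  });
}

// distinct: drop duplicates (compared here by their JSON representation).
function distinct(docs) {
  const seen = new Set();
  return docs.filter(doc => {
    const key = JSON.stringify(doc);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// count: the number of items that remain after the previous steps.
const uniqueLastNames = distinct(pluck(users, "last_name")).length;
```

Each helper consumes the output of the previous one, which is the same shape as the chained ReQL query.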

Traditional CRUD operations are also simple. ReQL includes the insert function, which can be used to add new JSON documents to the table:

 r.table("fellowship").insert([
   { name: "Frodo", species: "hobbit" },
   { name: "Sam", species: "hobbit" },
   { name: "Merry", species: "hobbit" },
   { name: "Pippin", species: "hobbit" },
   { name: "Gandalf", species: "istar" },
   { name: "Legolas", species: "elf" },
   { name: "Gimli", species: "dwarf" },
   { name: "Aragorn", species: "human" },
   { name: "Boromir", species: "human" }
 ])

The filter function retrieves documents that match certain parameters:

 r.table("fellowship").filter({species: "hobbit"})

You can append functions such as update or delete to a chain to operate on the documents returned by filter:

 r.table("fellowship").filter({species: "hobbit"}).update({species: "halfling"})

ReQL includes more than 100 functions that can be combined to achieve the desired result. There are functions for control flow, document manipulation, aggregation, writing data, and so on. There are also functions tailored to common operations on strings, numbers, timestamps, and geospatial coordinates.

There is even an http command that you can use to fetch data from third-party web APIs. The following example shows how to use http to get posts from Reddit:

 r.http("http://www.reddit.com/r/aww.json")("data")("children")("data")
   .orderBy(r.desc("score"))
   .limit(5)
   .pluck("score", "title", "url")

After the posts are retrieved, they are sorted by score, and selected properties of the top five posts are returned. Used to its full potential, ReQL lets developers perform really complex data manipulations.

How ReQL Works

The RethinkDB client libraries (hereinafter “drivers”) are responsible for integrating ReQL into the programming language in which the application is developed. Drivers implement functions for every kind of query the database supports. ReQL expressions are evaluated into structured objects that resemble an abstract syntax tree. To execute a query, the driver serializes these query objects into RethinkDB's JSON wire protocol format and sends them to the database in that form.
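
The idea of a chain that builds a tree and is only serialized at the end can be sketched with a toy query builder. This is a simplified illustration, not the real driver: the Term class, the toy object, and the string term names are all hypothetical, and the actual wire protocol uses its own encoding.

```javascript
// Toy query builder: every chained call wraps the previous term in a new one.
// Term names are plain strings here; this is NOT the real wire format.
class Term {
  constructor(type, args) {
    this.type = type;
    this.args = args;
  }
  // Each method returns a new term whose first argument is the current term.
  filter(predicate) { return new Term("FILTER", [this, predicate]); }
  count() { return new Term("COUNT", [this]); }
  // Turn the tree into nested arrays that JSON.stringify can handle.
  build() {
    return [this.type, this.args.map(a => (a instanceof Term ? a.build() : a))];
  }
}

// Hypothetical entry point, playing the role that `r` plays in the driver.
const toy = { table: name => new Term("TABLE", [name]) };

// Nothing is executed while chaining; we only describe the operation.
const query = toy.table("fellowship").filter({ species: "hobbit" }).count();

// Serialization happens once, when the query is about to be sent.
const serialized = JSON.stringify(query.build());
```

Because chaining only builds a description, a query can be stored in a variable, passed around, and extended before it is ever sent to the server.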

The run function, which ends the chain, translates the query, executes it on the server, and returns the result. As a rule, you pass a server connection to this function so that it can perform the operation. In the official drivers, connections are managed manually: you need to create a connection and close it once the operation is complete.

The following example shows how to run a RethinkDB query from Node.js with the ReQL driver for JavaScript installed. The query retrieves all halflings from the fellowship table and prints them to the console:

 var r = require("rethinkdb");
 r.connect().then(function(conn) {
   return r.table("fellowship")
     .filter({species: "halfling"})
     .run(conn)
     // Read the whole cursor before the connection is closed.
     .then(function(cursor) { return cursor.toArray(); })
     .finally(function() { conn.close(); });
 })
 .then(function(output) {
   console.log("Query output:", output);
 });

The rethinkdb module provides access to the RethinkDB driver. You use it to compose queries and send them to the database. The example above uses promises for asynchronous flow control, but the driver also supports regular callbacks.

The connect method establishes a connection, which the run function then uses to execute the query. The query itself returns a cursor, which is something like an open window onto the contents of the database. Cursors support lazy fetching and offer efficient ways to iterate over large amounts of data. In the example above, I simply converted the contents of the cursor into an array, since the result is relatively small.

Although ReQL queries are written in your application as ordinary code, they are executed on the database server, which returns the results. The integration is so seamless that beginners are often unsure where in the code the boundary lies between their application and the database.

ReQL's chaining and its integration into the host language greatly increase the opportunities for code reuse and for factoring out common operations. Since queries are written in the language of the application, encapsulating query subexpressions in variables and functions is simple and convenient. For example, this JavaScript function generalizes pagination, returning a ReQL expression with the specified values already applied:

 function paginate(table, index, limit, last) {
   return (!last ? table : table
     .between(last, null, {leftBound: "open", index: index}))
     .orderBy({index: index}).limit(limit)
 }
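
The logic behind paginate can be tried out without a database. The sketch below mimics it on a plain array, assuming numeric index values; the rows data and the paginateArray helper are invented for this illustration and are not part of the driver.

```javascript
// In-memory stand-in for a table with an index on "score" (illustrative data).
const rows = [
  { id: "a", score: 10 }, { id: "b", score: 20 }, { id: "c", score: 30 },
  { id: "d", score: 40 }, { id: "e", score: 50 }
];

// Same idea as paginate(): start strictly after the last seen index value
// (an "open" left bound), order by the index, and take `limit` rows.
function paginateArray(docs, index, limit, last) {
  return docs
    .filter(doc => last === undefined || doc[index] > last)
    .sort((x, y) => x[index] - y[index])
    .slice(0, limit);
}

// First page: no `last` value yet.
const firstPage = paginateArray(rows, "score", 2);

// Next page: continue after the index value of the last row we received.
const lastSeen = firstPage[firstPage.length - 1].score;
const secondPage = paginateArray(rows, "score", 2, lastSeen);
```

Passing the last seen index value instead of a page number means each page starts where the previous one ended, even if rows are added in between.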

Another notable advantage of ReQL is that it is well protected against the conventional injection attacks familiar from SQL. You can safely include external data in your queries without resorting to risky string concatenation.
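
The difference between the two approaches can be shown in a few lines. This is only a sketch: the nested array below is a simplified stand-in for the structured query a driver would serialize, not the real wire format, and the input string is an invented example.

```javascript
// Untrusted input, e.g. taken from a form field (illustrative value).
const userInput = "hobbit' OR '1'='1";

// Risky: concatenation splices the input into the query text itself,
// so the quote characters change the meaning of the statement.
const sqlText = "SELECT * FROM fellowship WHERE species = '" + userInput + "'";

// ReQL-style: the input is just a value inside a structured query object.
// Simplified stand-in for what a driver would serialize; not the real format.
const structuredQuery = ["FILTER", [["TABLE", ["fellowship"]], { species: userInput }]];
const wireText = JSON.stringify(structuredQuery);

// After a round trip the value is still one opaque string, never query syntax.
const roundTripped = JSON.parse(wireText)[1][1].species;
```

In the first case the attacker's quotes become part of the statement; in the second the whole input stays a single string value that is only ever compared against the species field.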

Many of the more advanced features of ReQL, such as secondary indexes, table joins, and anonymous functions, are beyond the scope of this article. If you wish, you can read about them on the ReQL API documentation page.

Creating real-time web applications using changefeeds

RethinkDB has a built-in change notification system, which greatly simplifies the development of real-time applications. If you append the changes function to the end of a chain, the query returns a continuous stream that reflects every change to the query's result as it happens. Such streams are called changefeeds.

Conventional database queries are well suited to the traditional request/response web model. However, continuously polling the server is impractical for real-time applications that use a persistent connection to the server or stream data. Changefeeds offer an alternative to polling: the updated results are continuously pushed to the application.

You can attach a changefeed directly to a table to track any changes to its contents. You can also use changefeeds with more complex queries to receive updates for only the data you need. For example, you can attach a changefeed to a query that uses the orderBy and limit functions to create a live high-score table for a multiplayer game:

 r.table("players").orderBy({index: r.desc("score")}).limit(5).changes()

The players are sorted by score and the top five are returned. Whenever anything changes in this top five, the changefeed sends you the updated data. Even if a player who was not originally in the top five scores enough points to displace one of them, the changefeed reports this and delivers all the data needed to update the list.

A changefeed sends not only the new value of a record but also the previous one, so you can compare the two. If a record is deleted, its new value is null. Likewise, for a newly added record the old value is null. You can also chain further operations after changes if the incoming data needs additional processing.
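
As a rough sketch of how an application might consume such notifications, the function below applies one change to a local copy of a leaderboard. It assumes each change carries the previous and the new value in fields named old_val and new_val and that every document has an id; the applyChange helper and the sample data are hypothetical.

```javascript
// Apply one change document to a local copy of the result set.
// Assumes each change carries `old_val` and `new_val`, and each document has an `id`.
function applyChange(list, change) {
  const { old_val: oldVal, new_val: newVal } = change;
  // Remove the previous version, if there was one.
  const rest = oldVal ? list.filter(doc => doc.id !== oldVal.id) : list.slice();
  // Add the new version, unless the document was deleted (new value is null).
  if (newVal) rest.push(newVal);
  // Keep the list ordered by score, highest first.
  return rest.sort((a, b) => b.score - a.score);
}

let leaderboard = [
  { id: 1, name: "Frodo", score: 50 },
  { id: 2, name: "Sam", score: 40 }
];

// Update: Sam overtakes Frodo.
leaderboard = applyChange(leaderboard, {
  old_val: { id: 2, name: "Sam", score: 40 },
  new_val: { id: 2, name: "Sam", score: 60 }
});

// Deletion: Frodo leaves, so the new value is null.
leaderboard = applyChange(leaderboard, {
  old_val: { id: 1, name: "Frodo", score: 50 },
  new_val: null
});
```

The same function covers insertions as well: when the old value is null, nothing is removed and the new document is simply added.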

When you run a query with the changes command, it returns a cursor that stays open indefinitely (remember the window?). The cursor delivers new changes as they become available. The example below shows how to consume updates from a changefeed in Node.js:

 r.connect().then(function(conn) {
   return r.table("data").changes().run(conn);
 })
 .then(function(cursor) {
   // The callback fires once for every change that arrives.
   cursor.each(function(err, item) {
     if (err) { console.error(err); return; }
     console.log(item);
   });
 });

The changefeed cursor works in the background, so your application is not blocked. In natively asynchronous environments such as Node.js, no extra measures are needed for it to work correctly. In other languages, you will probably need an asynchronous framework or have to manage threads manually. The official RethinkDB drivers for Python and Ruby support popular, widely used frameworks such as Tornado and EventMachine.

At the moment, the changes command works with the get, between, filter, map, orderBy, min, and max functions. Support for other types of queries is planned.

When building a real-time web application with RethinkDB, you can use WebSockets to broadcast updates to the front end. Libraries such as Socket.io are easy to use and simplify this process.

Changefeeds are especially useful for applications designed to scale horizontally. When you spread the load across multiple instances of your application, you usually have to resort to additional mechanisms, such as message queues or an in-memory database, to propagate updates to all servers. RethinkDB moves this functionality into the database layer, flattening your application's architecture and eliminating the need for additional infrastructure. Each instance of the application connects directly to the database to receive new changes. As updates become available, each server relays them to the appropriate WebSocket clients.
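
The relay step performed by each application instance can be sketched without a database or real sockets. In the example below, a Node.js EventEmitter stands in for the changefeed cursor and simple objects stand in for WebSocket connections; the feed, makeClient, and clients names are invented for this illustration.

```javascript
const { EventEmitter } = require("events");

// Stand-in for a changefeed cursor: emits a "change" event per update.
// In a real application this role would be played by the driver's cursor.
const feed = new EventEmitter();

// Stand-ins for WebSocket connections; `send` just records the message.
function makeClient() {
  const received = [];
  return { received, send: message => received.push(message) };
}
const clients = new Set([makeClient(), makeClient()]);

// Each application instance relays every change to its own connected clients.
feed.on("change", change => {
  const payload = JSON.stringify(change);
  for (const client of clients) client.send(payload);
});

// Simulate the database pushing one update.
feed.emit("change", { old_val: null, new_val: { id: 1, text: "hello" } });
```

Every instance runs the same small loop against its own set of clients, so no separate message queue is needed to keep the instances in sync.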

In addition to real-time applications, changefeeds can greatly simplify the implementation of mobile push notifications and similar functionality. Changefeeds provide an event-driven model of interaction with the database, and this model proves useful in many cases.

Scaling and managing a RethinkDB cluster

RethinkDB is a distributed database designed for clustering and easy scaling. To add a new server to the cluster, simply launch it from the command line with the --join option and specify the address of an existing server. Once you have a cluster with several servers, you can configure sharding and replication individually for each table. Any settings and features that work on a single instance of the database work in exactly the same way on a cluster.

The RethinkDB server also includes an administrative web interface, which you can open directly in the browser. Using this interface, you can easily manage and monitor the cluster. You can even configure sharding and replication with a few clicks.

RethinkDB also lets you configure the cluster through ReQL, which is ideal for fine-tuning and automation. ReQL includes a simple reconfigure function that can be chained to a table to set its sharding settings. The cluster also exposes most of the internal information about its state and settings through a set of special system tables. You can query these system tables to change settings or to collect information for monitoring. Almost all the functionality offered by the web interface is built on the ReQL API.

You can even use changefeeds together with the ReQL monitoring API to get a stream of data about the server. For example, you could build your own monitoring tool that attaches a changefeed to the system table with statistics and streams the data in real time to plot the read/write load.

RethinkDB 2.1, recently released, has built-in support for automatic failover. The new functionality improves cluster availability and reduces the risk of downtime when a database server fails. If the primary server fails, the remaining secondary servers “elect” a new primary, which fills this role until the failed server comes back or is removed from the cluster.
Hardware failures and network interruptions no longer affect data availability as long as a majority of the servers are online.
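
The majority condition is simple arithmetic, sketched below with a hypothetical hasMajority helper. This only illustrates the "most servers are online" rule from the text; the actual election logic inside RethinkDB is more involved.

```javascript
// True when strictly more than half of the servers are reachable.
function hasMajority(totalServers, onlineServers) {
  return onlineServers > totalServers / 2;
}

// Three servers, one down: two of three is still a majority.
const threeNodeOneDown = hasMajority(3, 2);

// Three servers, two down: one of three is not a majority.
const threeNodeTwoDown = hasMajority(3, 1);

// Two servers, one down: exactly half is not a majority either.
const twoNodeOneDown = hasMajority(2, 1);
```

The two-server case suggests why the feature is described for clusters with three or more servers: with only two, the loss of either one leaves no majority.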

Installing RethinkDB

RethinkDB runs on Linux and Mac OS X. The Windows version is under active development and is not yet available for download. The RethinkDB documentation describes the installation process in detail. We provide APT and Yum repositories for Linux users, as well as an installer for OS X. You can also install RethinkDB using Docker or compile it from the source code on GitHub. Our 10-minute guide will help you get started.

Original: link

Source: https://habr.com/ru/post/266085/
