
Mastering Tarantool 1.6



Evgeny Shadrin (Sberbank Digital Ventures)


If you follow the news of the last few years, you can see that new NoSQL solutions and releases appear almost every two weeks. Of course, many of them do not survive: they lose the competition and disappear. Still, the NoSQL world is replenished with new solutions very often.

At this conference there are people who have never used NoSQL in their lives, and people who have used NoSQL in their projects and companies for more than five years. Some even contribute to open-source projects. There are only a few of them, but they are here too.

My name is Evgeny. I work in Sberbank Digital Ventures, a small division that implements innovative products and solutions, i.e. we build IT prototypes at the intersection of new technologies.
In this talk I want to tell you about my experience of using a NoSQL solution as a user, so I would like to start with a brief tour of the theory.

What is NoSQL?



I read the acronym as "not only SQL". These are solutions whose data models differ from the relational one and which are designed to solve particular problems, for example ease of scaling. NoSQL solutions often do not require you to define rigid schemas, entities, and a lot of configuration in order to store and work with data, so it is easy to scale: to deploy large clusters with many nodes, and to add and remove nodes. NoSQL solutions are also often highly specialized: a development team does not necessarily try to build a large universal product, but tries to solve one specific problem. This specialization makes it possible to achieve very high performance on specific tasks, and for such tasks NoSQL solutions can be convenient and easy to use.



Here I have listed the most popular databases by category. You have heard of the key-value stores Redis and Riak, which use the key-value model to store data. The document-oriented database MongoDB is quite common and well known. The document-oriented model is slightly more complex than the key-value model and lets you store large hierarchical data. Next come the wide-column (tabular) databases, for example Apache HBase, which can work with large amounts of distributed data. OrientDB stands apart: it is a multi-model database, but I listed it as an example of a database that supports the graph model. The graph model has an advantage: it is very convenient for tracing connections between data, which is useful in projects that resemble social networks.

How do you choose the right solution from all this variety? I use the following principles:




And here is a brief list of the cases that NoSQL solves.



I came across most of them personally.


There are, of course, many more cases, but I have listed the ones I have come across. In general, at Sberbank Digital Ventures I develop real-time systems whose main purpose is to receive information from a server, save it, process it, understand what it is, and give the server the right answer.

For example, I receive information about a user who is browsing somewhere in the depths of the Internet. I take all the data I could collect, analyze it, and segment the user, i.e. assign him to one category or another: say, this is a 25-year-old man who is interested in cars, or this is an 18-year-old girl who wants to go to university and is looking for where to pay a bribe for admission.



To solve my problem I use the NoSQL database Tarantool. Below I will explain why I use it and how it helps to solve the problems I face.

On the slide is a quote from the main page of the developers' site; this is how they position the project: "A NoSQL database running in a Lua application server". In other words, the developers themselves position Tarantool as a project that consists of two parts: a non-relational database and a Lua application server, i.e. an application server that uses the Lua language.

By the way, note that the logos of most modern NoSQL databases use gray and red. It must be the trend =)

Next I will show code samples. If you have the opportunity, you can follow the link to try.tarantool.org, an interactive service where you can start using Tarantool right away: it is an interactive Tarantool instance allocated for you on the developers' servers. Some of you may want to type in the code samples right there.

What makes Tarantool stand out from the large stack of NoSQL technologies?



Tarantool keeps all its data in memory, so access to it is fast. Keeping data in memory does not mean that it is unsafe and that we can lose data. Tarantool has persistence mechanisms: logs and binary snapshots of the state. The two work together: we have points at which the data was saved, and a record of the actions performed on the data before or after those points. With this information we can always arrive at the desired state.

There was a time when keeping data in memory meant that the memory ran out very quickly. It still runs out, but the amount of available RAM keeps growing, so databases of this kind find an ever wider range of applications. Tarantool uses a document-oriented data model, i.e. it stores all data in an abstraction called a document. A document has fields, and Tarantool works with them.

One of the features of Tarantool as a database is secondary indexes. They let you look up data by fields other than the primary key, which makes working with data faster and more interesting.
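As a rough illustration of what a secondary index gives you, here is a minimal sketch in Tarantool 1.6 syntax. The visitors space, its tuple layout {id, ip, city}, the by_city index name, and the sample data are all assumptions made up for this example; none of them come from the talk.

```lua
-- Sketch only: run with `tarantool this_file.lua` (Tarantool 1.6 syntax).
box.cfg{}  -- start the database with default settings

-- Hypothetical space with tuples of the form {id, ip, city}.
if not box.space.visitors then
    local s = box.schema.space.create('visitors')
    -- Primary index: unique, over the numeric id in field 1.
    s:create_index('primary', { type = 'tree', parts = {1, 'NUM'} })
    -- Secondary index: non-unique, over the string in field 3.
    s:create_index('by_city', { type = 'tree', unique = false, parts = {3, 'STR'} })
end

local visitors = box.space.visitors
visitors:replace({1, '10.0.0.1', 'Moscow'})
visitors:replace({2, '10.0.0.2', 'Kazan'})
visitors:replace({3, '10.0.0.3', 'Moscow'})

-- Look up tuples by city through the secondary index.
local rows = visitors.index.by_city:select({'Moscow'})
for _, t in ipairs(rows) do
    print(t[1], t[2], t[3])
end
```

Without the by_city index, the same question would have to be answered by walking the whole space and comparing the third field by hand.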

I have not used transactions in my projects yet, but Tarantool supports full-fledged transactions. As far as I know, some companies, such as Mail.Ru Group and Avito, use them successfully in their own cases. Tarantool also has a lightweight thread model, or green threads. This is a multi-threaded model in which threads are created not at the operating-system level but inside the application itself, which lets you implement asynchronous logic and event-driven models.

Tarantool also works with the network and with files: it has its own HTTP server and its own libraries for saving and opening files. This also came in handy when solving my problems.

Tarantool is a Lua application server, and Lua is the language embedded in Tarantool. Here is a very artificial script, never used in practice, that shows what Lua looks like:
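The transaction support mentioned above might be used roughly as follows. This is a sketch under assumptions, not something from the talk: the accounts space, its {id, balance} layout, and the transfer function are hypothetical, and the calls used are box.begin, box.commit, and box.rollback from Tarantool 1.6.

```lua
-- Sketch only: run with `tarantool this_file.lua` (Tarantool 1.6 syntax).
box.cfg{}

-- Hypothetical space with tuples of the form {id, balance}.
if not box.space.accounts then
    local s = box.schema.space.create('accounts')
    s:create_index('primary', { type = 'tree', parts = {1, 'NUM'} })
end
local accounts = box.space.accounts
accounts:replace({1, 100})
accounts:replace({2, 0})

-- Move `amount` between two accounts; either both updates apply or neither.
local function transfer(from, to, amount)
    box.begin()
    local ok, err = pcall(function()
        local src = accounts:get({from})
        if src == nil or src[2] < amount then
            error('insufficient funds')
        end
        accounts:update({from}, {{'-', 2, amount}})
        -- If the destination is missing, the debit above must be undone.
        if accounts:get({to}) == nil then
            error('no such account')
        end
        accounts:update({to}, {{'+', 2, amount}})
    end)
    if ok then
        box.commit()
    else
        box.rollback()
    end
    return ok, err
end

local first_ok = transfer(1, 2, 30)    -- both accounts exist
local second_ok = transfer(1, 99, 10)  -- account 99 does not exist
print(first_ok, second_ok, accounts:get({1})[2], accounts:get({2})[2])
```

The second call debits account 1 and then fails, so the rollback is what keeps the balance of account 1 unchanged by it.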

    #!/usr/bin/tarantool
    -- This is lua script
    function hw(a, b)
        print(a.hello..b.world)
    end

    b = {}
    a = { hello = 'Hello ' }
    b['world'] = 'world!'
    hw(a, b)

The Lua language was developed in Brazil, at a Catholic university; it grew out of the language SOL, which was designed for working with data and databases. Here we can see that this is not just a script but an executable script: the hash sign and exclamation mark on the first line (the shebang) tell the system what should run this script and how. If we enter tarantool script.lua in the console, we get "Hello world!" on the screen. The script defines a function that works with two objects, and below it we initialize the objects themselves.

The main data structure of Lua is the table. The objects a and b are tables, and I deliberately initialized them in different ways to show that Lua is very flexible and syntactically pleasant. Tables can contain other data, for example other tables, which in turn can contain further tables. Out of inexperience I sometimes ended up with very deep nesting. Functions can also be stored inside a table. In general, a function object can be handled much like a table; the language has mechanisms for this.

Next I will show a more practical script that you can refine and use in your own production. It solves a small problem, and solves it quite trivially: it counts the number of unique visitors to a page.
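To make the point about tables more concrete, here is a small plain-Lua sketch; it does not need Tarantool. The user table and its fields are invented for illustration.

```lua
-- Two equivalent ways to fill a table, as in the hello-world script.
local a = { hello = 'Hello ' }   -- constructor syntax
local b = {}
b['world'] = 'world!'            -- index syntax; same as b.world = 'world!'

-- Tables can be nested and can hold functions (hypothetical data).
local user = {
    name = 'guest',
    visits = { 3, 5, 8 },            -- array-like part, indices start at 1
    address = { city = 'Moscow' },   -- a table inside a table
    greet = function(self)
        return 'Hi, ' .. self.name
    end,
}

-- The colon call passes the table itself as the first argument.
local greeting = user:greet()

-- Sum the array-like part.
local total = 0
for _, n in ipairs(user.visits) do
    total = total + n
end

print(a.hello .. b.world)
print(greeting, user.address.city, #user.visits, total)
```

The same structure serves as a record (name, address), an array (visits), and an object with a method (greet), which is why the table is enough for most needs.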

    #!/usr/bin/tarantool
    -- Tarantool init script
    local log = require('log')
    local console = require('console')
    local server = require('http.server')

    local HOST = 'localhost'
    local PORT = 8008

    box.cfg {
        log_level = 5,
        slab_alloc_arena = 1,
    }
    console.listen('127.0.0.1:33013')

    if not box.space.users then
        s = box.schema.space.create('users')
        s:create_index('primary', {type = 'tree', parts = {1, 'NUM'}})
    end

    function handler(self)
        local id = self:cookie('tarantool_id')
        local ip = self.peer.host
        local data = ''
        log.info('Users id = %s', id)
        if not id then
            data = 'Welcome to tarantool server!'
            box.space.users:auto_increment({ip})
            id = box.space.users:len()
            return self:render({ text = data }):
                setcookie({ name = 'tarantool_id', value = id, expires = '+1y' })
        else
            local count = box.space.users:len()
            data = 'You id is ' .. id .. '. We have ' .. count .. ' users'
            return self:render({ text = data })
        end
    end

    httpd = server.new(HOST, PORT)
    httpd:route({ path = '/' }, handler)
    httpd:start()

This is an executable Lua script that runs in Tarantool and performs the sequence of actions it contains.

Let us briefly go through its blocks and then analyze the details.

At the top I load the packages I need through the require mechanism: log, console, and the HTTP server. Then I define the constants I use.

Next comes the configuration of the Tarantool database through box.cfg, where I set the two parameters I need. I start the console and create the entities of our database with box.schema.space.create('users'), i.e. I create the users space. I will say more about all of this a bit later.

The second part of the script works with the Tarantool HTTP server: I describe the handler function (the request handler), and below it I create a server, add a route, and start the server.

From the user's side, the result of running this script looks like this:



The user opens, for example, localhost and sees a welcome message. From then on, when he refreshes the page, he already has the cookie with the id assigned to him, and he sees the number of unique users who have visited this page.

This small script solves a task I need, and it answers the question of why we use the Lua language.

Lua is quite simple. There are whole series of articles like "Lua in 15 minutes" or "Lua in 30 minutes". You really need very little time to get to know this interesting language: in a couple of hours you will learn all its features.

It is very convenient that the main data structure is the table. This lets you work with all other data in a uniform way.

By itself, the standard Lua interpreter is not very fast, but there is a compatible implementation, LuaJIT, which does JIT compilation and is much faster. It is what makes Lua a very performant language.

There is a library called luafun that lets you program in Lua in a functional style. Thanks to LuaJIT this library is very fast. You can google it, read about it, and look at performance reviews; it is very interesting.

Lua is also a very embeddable language with excellent C integration. Because C procedures can be run inside Lua, and Lua can be run inside C, Lua is very widely used in game development. Quite a lot of extensions, quests, and various game mechanics and logic in the famous game World of Warcraft have been written, and are still being written, in Lua.

Tarantool is a full-fledged Lua interpreter, i.e. if you just run Tarantool, you can simply start working with Lua.



Tarantool can be run in two modes:


Let us examine the init.lua startup script from the example above in more detail.



Work with the database begins with its configuration through box.cfg: box is a package that contains cfg, and through it I can set parameters. The name means just that, a box. This package is directly responsible for the database. You can simply run Tarantool, execute procedures and functions, print messages, and so on, but the database will not start until you call box.cfg. In this case I set two parameters. First, I set the level of the logs to be printed: the fifth level, DEBUG. I also set a very important parameter, slab_alloc_arena: the amount of RAM that Tarantool allocates for storing data. In this case, "1" means 1 GB.

Also, the box package contains many other auxiliary things and tools, such as:



If you type box.cfg in the Tarantool interpreter after setting the parameters, you get an object that describes all the parameters: not only the ones I specified, but also the ones set by default.

Here we see the slab_alloc_arena I set (1 GB of allocation space) and log_level, which is five (DEBUG). We also see an important parameter, snapshot_count: the number of snapshots that Tarantool will keep. In this case it keeps the last 6 snapshots. The interval between snapshots is also set here, with the snapshot_period parameter; its default value is 3600 seconds, i.e. Tarantool takes a snapshot once an hour. You can choose the level of safety you need and take a snapshot every minute or even every second, but that will consume a lot of resources. The snap_dir and wal_dir parameters define where snapshots and logs are stored, respectively.
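The parameters discussed above could be written out explicitly like this. It is only a configuration sketch for Tarantool 1.6: the values repeat the ones named in the talk, and the directory values are placeholders (here the current directory), not the speaker's real paths.

```lua
-- Configuration sketch for Tarantool 1.6; values follow the talk.
box.cfg {
    log_level        = 5,     -- DEBUG
    slab_alloc_arena = 1,     -- memory for data, in gigabytes
    snapshot_period  = 3600,  -- seconds between automatic snapshots
    snapshot_count   = 6,     -- how many snapshots to keep
    snap_dir         = '.',   -- placeholder: where snapshots are written
    wal_dir          = '.',   -- placeholder: where the logs are written
}

-- After the call, box.cfg can be read back as a table of effective settings.
print(box.cfg.snapshot_period, box.cfg.snapshot_count)
```

Setting the values explicitly, rather than relying on defaults, keeps the snapshot policy visible in the startup script.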



Here is an example of the box.info package. It shows information about Tarantool: for example, if Tarantool is running as a daemon, you can find out its pid, its version (1.6.5 is the current one), its uptime, and its status.

After you have configured the database, you can proceed to create the entities themselves, the data inside Tarantool.
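From a script, the fields mentioned above can be read directly. A minimal sketch, assuming Tarantool 1.6 and default settings:

```lua
-- Sketch only: run with `tarantool this_file.lua`.
box.cfg{}

local info = box.info
print('pid:    ', info.pid)
print('version:', info.version)
print('uptime: ', info.uptime)   -- seconds since start
print('status: ', info.status)
```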



Here is a picture from the documentation that shows the Tarantool data model: data is stored in spaces, and each space contains tuples (the records) and the indexes you define, primary or secondary.

Having finished the configuration, we proceed to populate the database, i.e. I need a space in which I will store information about users.



You may notice that I placed this code in a conditional construct. This was done for a specific purpose: if for some reason your Tarantool was stopped and you start it again with saved snapshots and logs, it restores the state it had before it stopped, i.e. it takes the snapshot and replays the actions from the xlog. On such a restart it will not let you create the users space, because the space already exists, and you do not need to create it again anyway, so we often insert this check to avoid an error. If the space does not exist yet, we create the space and an index for it. In this case, primary is a primary index of the tree type over a single numeric field.

Later in the script I need to add new user records. You can do this with the standard insert operation, passing the key and the value, but in this simple script it is more convenient to use auto_increment: when a user comes in, the key is assigned automatically, one greater than the largest key currently in the space. If I want to know the number of records, I can use the standard len() method. As you can see, the syntax is very simple and straightforward.
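The operations described above can be tried in isolation. This is a minimal sketch in Tarantool 1.6 syntax that reuses the users space from the talk; the IP addresses and the key 1000 are made up. It also shows one variation that is not in the original script: taking the new id from the tuple that auto_increment returns instead of from len().

```lua
-- Sketch only: run with `tarantool this_file.lua` (Tarantool 1.6 syntax).
box.cfg{}

-- Same guard as in the talk: create the space only if it does not exist yet.
if not box.space.users then
    local s = box.schema.space.create('users')
    s:create_index('primary', { type = 'tree', parts = {1, 'NUM'} })
end
local users = box.space.users

-- Explicit insert: the caller chooses the key (hypothetical key 1000).
if users:get({1000}) == nil then
    users:insert({1000, '10.0.0.7'})
end

-- auto_increment: the key is generated; the stored tuple is returned.
local tuple = users:auto_increment({'127.0.0.1'})
local id = tuple[1]

local fetched = users:get({id})
print(id, fetched[2], users:len())
```

Reading the id from the returned tuple stays correct even if records are deleted later, when the largest key and the number of records no longer coincide.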



As mentioned above, Tarantool is not just a database but a full-featured Lua application server. By this the developers probably mean that you can write any modules and packages in Lua and thereby implement the logic you need. You do not have to reinvent one big wheel: you can invent a few small ones, if you really need them or if other solutions do not have them.

You can see all of this in the repositories on GitHub. The basic modules that are used in one way or another are http and queue. For example, try.tarantool.org is written entirely in Tarantool: it uses Tarantool as the storage and Tarantool as the server. Tarantool also supports LuaRocks, a package manager that works with its own repository and makes it very convenient to install packages. Installation takes a single command.



Packages. Packages need to be loaded.

A package is any other Lua script that implements some logic. By loading the package you get access to the functions, data, and variables from that file. In this example I use the require mechanism to load two packages, console and log.

I start the console on localhost, on port 33013. With the log package I can write to the log. The console here is the administrative, or remote management, console, which lets you monitor the state of Tarantool. It is easy to use: if your console is running, standard Unix tools such as telnet and rlwrap will do. telnet is needed to connect to the port, and rlwrap gives you convenient command entry and command history.

You can connect to your running Tarantool and look at information from box.info or box.stat.



The package I use, and one that is often needed, is http. It is still a limited HTTP server, but it supports many of the necessary mechanisms. In this case I loaded the package, created the server, added a route, and started it. Then, in the handler function, I returned the response as text and set the user's cookie: name tarantool_id, value id. I also set the expiration time, i.e. when the cookie is removed; here cookies are stored for a year.



The basic mechanisms of the http package are enough to implement minimal logic: there is a fairly complete server, and there is a client. The package works with cookies and supports Lua as an embedded language for variables inside templates, i.e. we can write small Lua routines inside HTML.

    #!/usr/bin/tarantool
    -- Tarantool init script
    local log = require('log')
    local console = require('console')
    local server = require('http.server')

    local HOST = 'localhost'
    local PORT = 8008

    box.cfg {
        log_level = 5,
        slab_alloc_arena = 1,
    }
    console.listen('127.0.0.1:33013')

    if not box.space.users then
        s = box.schema.space.create('users')
        s:create_index('primary', {type = 'tree', parts = {1, 'NUM'}})
    end

I have tried to cover the main points of this script, so it should be clearer now. To consolidate, let us walk through it once more. We have an executable Tarantool script with a comment. Next we load packages through require. We have two constants, HOST and PORT. Next comes the configuration of Tarantool via box.cfg, where I set two parameters: log_level (the logging level) and slab_alloc_arena (the space for allocations).

I start the admin console that I will use. Then, if the space I need does not exist, I create the users space with box.schema.space.create and create an index for it.

    function handler(self)
        local id = self:cookie('tarantool_id')
        local ip = self.peer.host
        local data = ''
        log.info('Users id = %s', id)
        if not id then
            data = 'Welcome to tarantool server!'
            box.space.users:auto_increment({ip})
            id = box.space.users:len()
            return self:render({ text = data }):
                setcookie({ name = 'tarantool_id', value = id, expires = '+1y' })
        else
            local count = box.space.users:len()
            data = 'You id is ' .. id .. '. We have ' .. count .. ' users'
            return self:render({ text = data })
        end
    end

    httpd = server.new(HOST, PORT)
    httpd:route({ path = '/' }, handler)
    httpd:start()

In the handler function I read the cookie sent by the user who has visited my page. I look at his IP and write to the log. If there is no id in tarantool_id, I insert this user's IP into the database with auto_increment, look up the id, return the welcome text, and set the cookie value to id. Otherwise, I count the number of records in the space and return the number of unique visitors to the user. Finally, once the function is described, I start the server itself and work with it.

This is a simple example, but thanks to modules and the extensibility that the Lua language provides, it can be extended again and again, and after some time brought to a state that is usable in real projects.



Tarantool has a lot of different packages. There is a package for working with JSON, the fiber package (I will talk about it in more detail below), yaml, and the digest cryptographic library (it contains the basic mechanisms you need). There is a package of non-blocking sockets, so you can work with the network yourself and implement protocols. There is support for MessagePack, and there is the fio (file input/output) library for working with files. There is also an interesting mechanism, net.box, which lets Tarantool talk over a binary protocol, for example to another Tarantool; this turns out to be very fast and convenient. In addition, net.box.sql is implemented for working with relational SQL databases.
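As a quick taste of two of the packages listed above, here is a sketch that round-trips a Lua table through JSON and MessagePack using the json and msgpack modules bundled with Tarantool. The visitor table is invented for the example.

```lua
-- Sketch only: run with `tarantool this_file.lua`; no box.cfg is needed.
local json = require('json')
local msgpack = require('msgpack')

-- Hypothetical record to serialize.
local visitor = { id = 42, ip = '127.0.0.1', tags = { 'cars', 'travel' } }

-- JSON: table -> string -> table
local text = json.encode(visitor)
local back = json.decode(text)

-- MessagePack: table -> binary string -> table
local packed = msgpack.encode(visitor)
local unpacked = msgpack.decode(packed)

print(text)
print(back.id, back.tags[2], unpacked.ip, #packed)
```

JSON is convenient for talking to HTTP clients, while MessagePack is the compact binary format that Tarantool itself uses for tuples.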



Fibers are lightweight threads that follow the green-thread model. The main difference from standard threads is that they are created and run inside Tarantool, so they are cheap to create and switch between quickly. They can be useful if you are implementing an asynchronous model or need to run a daemon that does something in parallel with the main logic.

The basic principles of working with a fiber: a fiber has to be created with fiber.create, which returns a fiber_object; a fiber can be put to sleep with fiber.sleep; and through the fiber_object you can always stop the fiber and finish working with it.

The fiber.time function is very convenient: it takes the time from the event loop, which keeps track of it, so it can always give us the value we need.
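The life cycle described above (create, sleep, stop) fits in a few lines. This is a minimal sketch for Tarantool 1.6; the worker function, the ticks counter, and the timings are assumptions chosen for the demo.

```lua
-- Sketch only: run with `tarantool this_file.lua`.
local fiber = require('fiber')

local ticks = 0

-- Hypothetical background job: count a tick, then yield for `period` seconds.
local function worker(period)
    while true do
        ticks = ticks + 1
        fiber.sleep(period)   -- gives control back to other fibers
    end
end

local started = fiber.time()            -- current time from the event loop
local f = fiber.create(worker, 0.01)    -- starts running immediately

fiber.sleep(0.1)   -- the main fiber sleeps; the worker keeps ticking
f:cancel()         -- ask the worker to stop
fiber.sleep(0.01)  -- give it a chance to finish

print(ticks, f:status(), fiber.time() - started)
```

While the main fiber sleeps, the worker wakes up several times, which is the "daemon in parallel with the main logic" case.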

The very popular expirationd library is built on top of the fiber library. It can remove records from the database by some criterion, usually time: for example, everything that has been stored for, say, a month can be removed and cleaned up.

We could talk about Tarantool for a long time, and we do not know everything about it either. I am not sure even the developers themselves know everything about it. You can always read the documentation on tarantool.org; it has recently changed and become more readable.
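To show the idea behind time-based expiration, here is a hand-rolled sketch. It deliberately does not use the expirationd API; it is a simplified stand-in built from fiber and box calls, and the sessions space, its {id, created_at} layout, the sweep function, and the tiny TTL are all assumptions for the demo.

```lua
-- Sketch only: run with `tarantool this_file.lua` (Tarantool 1.6 syntax).
local fiber = require('fiber')
box.cfg{}

-- Hypothetical space with tuples of the form {id, created_at}.
if not box.space.sessions then
    local s = box.schema.space.create('sessions')
    s:create_index('primary', { type = 'tree', parts = {1, 'NUM'} })
end
local sessions = box.space.sessions
local TTL = 0.05  -- seconds; tiny so that the demo finishes quickly

-- Delete every tuple older than TTL and return how many were removed.
local function sweep()
    local now = fiber.time()
    local expired = {}
    for _, t in sessions:pairs() do
        if now - t[2] > TTL then
            table.insert(expired, t[1])
        end
    end
    for _, id in ipairs(expired) do
        sessions:delete({id})
    end
    return #expired
end

-- Background fiber that sweeps periodically, in parallel with the main logic.
local cleaner = fiber.create(function()
    while true do
        sweep()
        fiber.sleep(0.02)
    end
end)

sessions:replace({1, fiber.time()})
fiber.sleep(0.2)     -- long enough for the record to expire and be swept
cleaner:cancel()
print(sessions:get({1}))
```

A real project would more likely take the ready-made library; the sketch only shows why fibers are a natural fit for this kind of background cleanup.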



Tarantool supports most Unix-like systems and the developers have their own Buildbot. We always keep an eye on new packages, since we work on Red Hat Enterprise Linux. The Tarantool developers also officially maintain the Tarantool package in Debian.

And a very important point that I like: with Tarantool you can talk to the developers. When I had questions, I found the developers on Skype. Kostya Osipov, the lead developer of Tarantool, gave a short talk about queues at this conference. For developers, especially beginners, it is very important to be able to ask for advice and learn first-hand how to do this or that. You should be prepared for the fact that the people who develop open-source software are quite peculiar, and so is the community. Perhaps this picture says more than I could:



But at the same time, communication in the community can be a very interesting experience that will allow you to grow yourself and make your projects a little better.

In the end I would like to summarize the report.



Each NoSQL solution has its own scope of application. It is often very difficult to say which database is better or worse, or which is faster: they really do often solve different problems.

The development tool is very important: a well-chosen tool lets you develop quickly and easily and avoid a lot of problems. But do not forget that the idea and the goal are more important; after all, the task of every developer is to solve a problem, implement his own ideas, and make this world a little bit better.

I hope I have shown you that Tarantool is not complicated at all and that you can try using it too. Thank you for your attention.

This report is a transcript of one of the best talks at HighLoad ++ Junior, a training conference for developers of high-load systems.

We also use some of these materials in HighLoad.Guide, an online training course on developing high-load systems: a series of specially selected letters, articles, materials, and videos. Our textbook already contains more than 30 unique materials. Join us!

Well, the main news is that we have begun preparations for the spring festival " Russian Internet Technologies ", which includes eight conferences, including HighLoad ++ Junior .

Source: https://habr.com/ru/post/319968/

