Berkeley DB STL Interface

Hi, Habr. Not long ago one of my projects needed an embedded database that stores data as key-value pairs, supports transactions, and can optionally encrypt the data. After a brief search I came across Berkeley DB. Besides the features I needed, it provides an STL-compatible interface that lets you work with the database almost as you would with an ordinary STL container. That interface is the subject of this article.

Berkeley DB

Berkeley DB is an embedded, scalable, high-performance, open source database. It is free to use in open source projects, but its license places significant restrictions on proprietary ones. Supported features:

transactions
write-ahead logging for crash recovery
AES data encryption
replication
indexes
synchronization primitives for multi-threaded applications
"one writer, many readers" access policy
caching

And many others.

When the system is initialized, you specify which subsystems to use. This avoids spending resources on transactions, logging, or locking when they are not needed.

Several storage structures (access methods) are available:

Btree - an implementation of a sorted balanced tree.
Hash - an implementation of linear hashing.
Heap - stores data in a heap file that is logically divided into pages. Each record is identified by a page and an offset within it. The storage is organized so that deleting a record does not require compaction, which makes it usable when physical disk space is limited.
Queue - a queue that stores fixed-length records with a logical record number as the key. It is designed for fast insertion at the tail and supports a special operation that removes and returns the record at the head of the queue in a single call.
Recno - stores both fixed-length and variable-length records with a logical record number as the key. Provides access to an item by its index.

To avoid ambiguity, a few terms used in the Berkeley DB documentation need to be defined.

Database - a key-value data store. The closest analogue of a Berkeley DB database in other DBMSs is a table.

Database environment - a wrapper around one or more databases. It defines settings shared by all of them, such as the cache size, file storage paths, and the use and configuration of the locking, transaction, and logging subsystems.

In a typical use case, you create and configure an environment and then create one or more databases in it.

STL interface

Berkeley DB is a library written in C. It has bindings for languages such as Perl, Java, PHP, and others. The C++ interface is a wrapper over the C code that adds objects and inheritance. To make database access look like operations on STL containers, there is an STL interface built as a layer on top of the C++ one. The layers therefore stack as follows: the C library at the bottom, the C++ interface above it, and the STL interface on top.

For example, the STL interface lets you retrieve an item from the database by key (for Btree or Hash) or by index (for Recno), just as with std::map or std::vector, find an element with the standard algorithm std::find_if, or iterate over the whole database with a foreach-style loop. All classes and functions of the Berkeley DB STL interface live in the dbstl namespace; for brevity, dbstl below also refers to the STL interface itself.

Installation

Berkeley DB supports most platforms: Linux, Windows, Android, Apple iOS, and others.

On Ubuntu 18.04, install the following packages:

libdb5.3-stl-dev
libdb5.3++-dev

To build from source on Linux, you need autoconf and libtool installed. The latest version of the source code is available from the Berkeley DB download page.

For example, I downloaded the archive for version 18.1.32, db-18.1.32.zip. Unpack the archive and go to the source folder:

 unzip db-18.1.32.zip
 cd db-18.1.32

Next, move to the build_unix directory and run the build and installation:

 cd build_unix
 ../dist/configure --enable-stl --prefix=/home/user/libraries/berkeley-db
 make
 make install

Adding to a CMake project

The examples below come from the BerkeleyDBSamples project.

The structure of the project is as follows:

 +-- CMakeLists.txt
 +-- sample-usage
 |   +-- CMakeLists.txt
 |   +-- sample-map-usage.cpp
 |
 +-- submodules
 |   +-- cmake
 |   |   +-- FindBerkeleyDB

The root CMakeLists.txt describes the general project settings. The example sources are in sample-usage. sample-usage/CMakeLists.txt finds the libraries and defines how the examples are built.

FindBerkeleyDB is used to connect the library to the CMake project. It is added as a git submodule in submodules/cmake. The build may require BerkeleyDB_ROOT_DIR to be set. For example, for the library installed from source above, pass the CMake flag -DBerkeleyDB_ROOT_DIR=/home/user/libraries/berkeley-db.

In the root CMakeLists.txt, add the path to the FindBerkeleyDB module to CMAKE_MODULE_PATH:

 list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/submodules/cmake/FindBerkeleyDB")

After that, sample-usage/CMakeLists.txt looks up the library in the standard way:

 find_package(BerkeleyDB REQUIRED)

Next, add the executable and link it against the Oracle::BerkeleyDB target:

 add_executable(sample-map-usage "sample-map-usage.cpp")
 target_link_libraries(sample-map-usage PRIVATE Oracle::BerkeleyDB ${CMAKE_THREAD_LIBS_INIT} stdc++fs)

Practical example

To demonstrate dbstl, let's walk through a simple example from sample-map-usage.cpp. The application works with dbstl::db_map in a single-threaded program. The container is similar to std::map and stores data as key/value pairs. Either Btree or Hash can serve as the underlying database structure. Unlike std::map, for dbstl::db_map<std::string, TestElement> the actual value type is dbstl::ElementRef<TestElement>. This is the type returned, for example, by dbstl::db_map<std::string, TestElement>::operator[]. It defines the methods that store a TestElement object in the database; one of them is operator=.

In the example, work with the database proceeds as follows:

the application calls Berkeley DB data access methods;
these methods go to the cache to read or write;
if necessary, the file on disk is accessed directly.

[Figure: data flow from the application through the data access methods and the cache to the file on disk]

To keep the example simple, it does no exception handling, although some dbstl container methods may throw exceptions when errors occur.

Code walkthrough

To work with Berkeley DB, you need to include two header files:

 #include <db_cxx.h>
 #include <dbstl_map.h>

The first adds the primitives of the C++ interface; the second defines the classes and functions for working with the database as an associative container, along with many utility functions. The STL interface lives in the dbstl namespace.

The Btree structure is used for storage, the key is std::string, and the value is the user-defined structure TestElement:

 struct TestElement {
     std::string id;
     std::string name;
 };

In main, we initialize the library by calling dbstl::dbstl_startup(). This call must come before the first use of any STL interface primitive.

After that, we initialize and open the database environment in the directory specified by the ENV_FOLDER variable:

 auto penv = dbstl::open_env(ENV_FOLDER, 0u, DB_INIT_MPOOL | DB_CREATE);

The DB_INIT_MPOOL flag initializes the caching subsystem, and DB_CREATE creates all the necessary environment files. The same call registers the environment object with the resource manager, which is responsible for closing all registered objects (it also tracks database objects, cursors, transactions, and so on) and freeing dynamic memory. If you already have a database environment object and only need to register it with the resource manager, use the dbstl::register_db_env function.

The database is opened in a similar way:

 auto db = dbstl::open_db(penv, "sample-map-usage.db", DB_BTREE, DB_CREATE, 0u);

The data is written to disk in the file sample-map-usage.db, which is created in the ENV_FOLDER directory if it does not exist (thanks to the DB_CREATE flag). A tree is used for storage (the DB_BTREE parameter).

In Berkeley DB, keys and values are stored as arrays of bytes. To use a user-defined type (in our case, TestElement), you must provide functions for:

getting the number of bytes needed to store an object;
marshaling an object into an array of bytes;
unmarshaling an object from an array of bytes.

In the example, this functionality is provided by static methods of the TestMarshaller class. It lays out TestElement objects in memory as follows:

the length of the id field is copied to the beginning of the buffer;
the contents of the id field follow immediately after it;
then the length of the name field is copied;
finally, the contents of the name field are placed.

The TestMarshaller functions are:

TestMarshaller::restore - fills a TestElement object with data from the buffer.
TestMarshaller::size - returns the size of the buffer required to store the given object.
TestMarshaller::store - saves the object into the buffer.
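The layout and the three functions described above can be sketched roughly as follows. This is not the code from the sample project: the 32-bit length prefix, the private put/get helpers, and the roundTrip function are assumptions made for illustration, and the parameter order of size, store, and restore is only my understanding of what the dbstl registration functions expect.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct TestElement {
    std::string id;
    std::string name;
};

// Illustrative marshaller. Buffer layout:
// [id length][id bytes][name length][name bytes]
struct TestMarshaller {
    using Len = std::uint32_t;  // assumed width of each length prefix

    // Number of bytes needed to store the element.
    static std::uint32_t size(const TestElement& e) {
        return static_cast<std::uint32_t>(
            2 * sizeof(Len) + e.id.size() + e.name.size());
    }

    // Writes the element into dest, which must hold at least size(e) bytes.
    static void store(void* dest, const TestElement& e) {
        char* p = static_cast<char*>(dest);
        p = put(p, e.id);
        put(p, e.name);
    }

    // Rebuilds the element from a buffer previously filled by store().
    static void restore(TestElement& e, const void* src) {
        const char* p = static_cast<const char*>(src);
        p = get(p, e.id);
        get(p, e.name);
    }

private:
    // Writes a length-prefixed string and returns the next write position.
    static char* put(char* p, const std::string& s) {
        const Len n = static_cast<Len>(s.size());
        std::memcpy(p, &n, sizeof n);
        std::memcpy(p + sizeof n, s.data(), n);
        return p + sizeof n + n;
    }

    // Reads a length-prefixed string and returns the next read position.
    static const char* get(const char* p, std::string& s) {
        Len n = 0;
        std::memcpy(&n, p, sizeof n);
        s.assign(p + sizeof n, n);
        return p + sizeof n + n;
    }
};

// Hypothetical helper: marshals an element into a temporary buffer
// and unmarshals it again, without touching any database.
inline TestElement roundTrip(const TestElement& in) {
    std::vector<char> buffer(TestMarshaller::size(in));
    TestMarshaller::store(buffer.data(), in);
    TestElement out;
    TestMarshaller::restore(out, buffer.data());
    return out;
}
```

The lengths are written in the machine's native byte order, which is enough for a file read back on the same machine; a portable format would need a fixed byte order.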

To register the marshaling and unmarshaling functions, use dbstl::DbstlElemTraits:

 dbstl::DbstlElemTraits<TestElement>::instance()->set_size_function(&TestMarshaller::size);
 dbstl::DbstlElemTraits<TestElement>::instance()->set_copy_function(&TestMarshaller::store);
 dbstl::DbstlElemTraits<TestElement>::instance()->set_restore_function(&TestMarshaller::restore);

Initialize the container:

 dbstl::db_map<std::string, TestElement> elementsMap(db, penv);

This is how elements are copied from a std::map into the newly created container:

 std::copy(
     std::cbegin(inputValues),
     std::cend(inputValues),
     std::inserter(elementsMap, elementsMap.begin())
 );

And this is how the contents of the database can be printed to standard output:

 std::transform(
     elementsMap.begin(dbstl::ReadModifyWriteOption::no_read_modify_write(), true),
     elementsMap.end(),
     std::ostream_iterator<std::string>(std::cout, "\n"),
     [](const auto data) -> std::string {
         return data.first + "=> { id: " + data.second.id + ", name: " + data.second.name + "}";
     });

The call to begin in the example above looks a bit unusual: elementsMap.begin(dbstl::ReadModifyWriteOption::no_read_modify_write(), true).
This form is used to get a read-only iterator. dbstl does not define a cbegin method; instead, begin takes a readonly parameter (the second one). You can also get a read-only iterator through a const reference to the container. Such an iterator permits only reads and throws an exception on an attempt to write.

Why does the code above use a read-only iterator? First, it only reads through the iterator. Second, the documentation states that a read-only iterator performs better than the regular one.

Adding a new key/value pair or, if the key already exists, updating its value is as easy as with std::map:

 elementsMap["added key 1"] = {"added id 1", "added name 1"};

As mentioned above, the expression elementsMap["added key 1"] returns a wrapper class with an overloaded operator=; the subsequent call to that operator is what actually saves the object in the database.
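The idea of a wrapper whose assignment writes through to storage can be shown with a toy model. This is not how dbstl::ElementRef is implemented; ToyStore, ToyRef, and ToyMap are hypothetical names, and a plain std::map stands in for the database.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy stand-in for a database: an in-memory map of strings.
using ToyStore = std::map<std::string, std::string>;

// Toy analogue of an element wrapper: it remembers which key it refers to
// and writes to the store when a value is assigned to it.
class ToyRef {
public:
    ToyRef(ToyStore& store, std::string key)
        : store_(store), key_(std::move(key)) {}

    // Assignment is the "save" operation.
    ToyRef& operator=(const std::string& value) {
        store_[key_] = value;
        return *this;
    }

    // Reading converts the wrapper to the stored value (empty if absent).
    operator std::string() const {
        auto it = store_.find(key_);
        return it == store_.end() ? std::string{} : it->second;
    }

private:
    ToyStore& store_;
    std::string key_;
};

// Toy container: operator[] returns the wrapper instead of a plain reference.
class ToyMap {
public:
    explicit ToyMap(ToyStore& store) : store_(store) {}

    ToyRef operator[](const std::string& key) {
        return ToyRef(store_, key);
    }

private:
    ToyStore& store_;
};
```

With this model, toyMap["key"] = "value" inserts or overwrites the entry in the underlying store, which mirrors the insert-or-update behaviour described for elementsMap above.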

To insert an item into the container, use insert:

 auto [iter, res] = elementsMap.insert(
     std::make_pair(std::string("added key 2"), TestElement{"added id 2", "added name 2"})
 );

The call to elementsMap.insert returns a std::pair<iterator, bool>. If the object cannot be inserted, the success flag is false. Otherwise the flag is true and the iterator points to the inserted object.
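Since the article presents db_map::insert as behaving like its std::map counterpart, the contract can be tried out on a plain std::map without a database. The sketch below uses only the standard library; insertIfAbsent is a hypothetical helper, and whether dbstl matches std::map in every corner case is an assumption.

```cpp
#include <map>
#include <string>
#include <utility>

struct TestElement {
    std::string id;
    std::string name;
};

// Hypothetical helper: returns true if the pair was inserted and
// false if an element with this key was already present.
inline bool insertIfAbsent(std::map<std::string, TestElement>& container,
                           const std::string& key,
                           const TestElement& value) {
    // For std::map, iter points at the element with this key in both cases;
    // an existing value is left untouched when inserted is false.
    auto [iter, inserted] = container.insert(std::make_pair(key, value));
    (void)iter;  // not needed here, kept to show the shape of the result
    return inserted;
}
```

Unlike operator[], a failed insert does not overwrite the existing value, so it is the right call when an existing entry must be preserved.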

Another way to look up a value by key is the dbstl::db_map::find method, which is similar to std::map::find:

 auto findIter = elementsMap.find("test key 1");

Through the resulting iterator you can access the key, findIter->first, and the fields of the TestElement, findIter->second.id and findIter->second.name. To extract the key/value pair, use the dereference operator: auto iterPair = *findIter;.

When the dereference operator (*) or the member access operator (->) is applied to the iterator, the database is accessed and the data is read from it. Any previously read data is overwritten, even if it has been modified. This means that in the example below the changes made through the iterator are discarded, and the value stored in the database is printed to the console.

 findIter->second.id = "skipped id";
 findIter->second.name = "skipped name";
 std::cout << "Found elem for key " << "test key 1"
           << ": id: " << findIter->second.id
           << ", name: " << findIter->second.name << std::endl;

To avoid this, obtain the wrapper of the stored object from the iterator by calling findIter->second and save it in a variable. Then make all changes through this wrapper and write the result to the database by calling the wrapper's _DB_STL_StoreElement method:

 auto ref = findIter->second;
 ref.id = "new test id 1";
 ref.name = "new test name 1";
 ref._DB_STL_StoreElement();
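Why edits made directly through the iterator are lost, while edits on a saved copy survive, can be illustrated with a toy cursor that reloads its cached element on every access. This is a simplified model of the behaviour described above, not dbstl's implementation; Element, ToyDb, and ToyCursor are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>

struct Element {
    std::string id;
    std::string name;
};

// Toy stand-in for the database.
using ToyDb = std::map<std::string, Element>;

// Toy cursor positioned on one existing key.
class ToyCursor {
public:
    ToyCursor(ToyDb& db, std::string key)
        : db_(db), key_(std::move(key)) {}

    // Every access reloads the element from the store,
    // overwriting whatever was changed in the cache before.
    Element* operator->() {
        cache_ = db_.at(key_);
        return &cache_;
    }

    // Returns a detached copy that can be edited safely.
    Element load() const { return db_.at(key_); }

    // Explicit write-back, the analogue of storing the wrapper.
    void store(const Element& element) { db_[key_] = element; }

private:
    ToyDb& db_;
    std::string key_;
    Element cache_;
};
```

In this model, cursor->id = "x" modifies only the cache, and the next cursor->id reloads the stored value, whereas load, edit, store makes the change persistent.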

Replacing the data entirely is even easier: get the wrapper with findIter->second and assign the required TestElement object to it, as in the example:

 if (auto findIter = elementsMap.find("test key 2"); findIter != elementsMap.end()) {
     findIter->second = {"new test id 2", "new test name 2"};
 }

Before the end of the program, you must call dbstl::dbstl_exit(); to close and delete all registered objects in the resource manager.

Conclusion

This article gave a brief overview of the basic features of dbstl containers, using dbstl::db_map in a simple single-threaded program as the example. It is only a short introduction: features such as transactions, locking, resource management, exception handling, and execution in a multithreaded environment are not covered here.

I did not set out to describe the methods and their parameters in detail; for that, it is better to consult the documentation for the C++ interface and for the STL interface.

Source: https://habr.com/ru/post/459862/
