Distributed C++ applications with minimal effort

In this post I want to talk about Ignite C++, the C++ API of the Apache Ignite distributed database, and about its features.

Apache Ignite has been written about on Habr more than once, so some of you probably already have an idea of what it is and why you might need it.

A quick overview of Apache Ignite for those who are not familiar with it.

I will not go into detail about how Apache Ignite came about and how it differs from classic databases. All these questions have already been covered here, here and here.

So, Apache Ignite is essentially a fast distributed database optimized for working in RAM. Ignite grew out of a data grid (In-Memory Data Grid) and, until recently, was positioned as a very fast distributed cache that lives entirely in memory and is built on a distributed hash table. That is why, in addition to storing data, it offers many convenient features for processing that data quickly in a distributed way: Map-Reduce, atomic data operations, full ACID transactions, SQL queries over the data, so-called Continuous Queries, which let you monitor changes to certain data, and more.
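To make the "distributed hash table" idea a bit more concrete, here is a minimal sketch of how keys can be spread across nodes: each key is hashed into a fixed number of partitions, and each partition is assigned to a node with rendezvous (highest-random-weight) hashing. Ignite's default affinity function is also rendezvous-based, but the partition count, the hash functions and the names `PartitionOf` and `OwnerOf` below are assumptions made for illustration, not Ignite's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Assumed number of partitions, chosen for illustration only.
constexpr std::uint32_t kPartitions = 1024;

// Map a key to a partition by hashing it (std::hash is a stand-in hash).
std::uint32_t PartitionOf(const std::string& key) {
    return static_cast<std::uint32_t>(std::hash<std::string>{}(key) % kPartitions);
}

// SplitMix64 finalizer, used to score (partition, node) pairs.
std::uint64_t Mix(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Rendezvous hashing: the node with the highest score for a partition owns it.
// Removing a node only moves the partitions that node owned.
// Precondition: nodes is not empty. Returns an index into nodes.
std::size_t OwnerOf(std::uint32_t partition, const std::vector<std::string>& nodes) {
    std::size_t best = 0;
    std::uint64_t bestScore = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        std::uint64_t nodeHash = std::hash<std::string>{}(nodes[i]);
        std::uint64_t score = Mix(nodeHash ^ Mix(partition));
        if (i == 0 || score > bestScore) {
            best = i;
            bestScore = score;
        }
    }
    return best;
}
```

Because a partition's owner depends only on the node names and the partition number, removing a node that does not own a partition leaves that partition where it is; this is what keeps rebalancing cheap when the cluster topology changes.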

Recently, however, the platform added support for persistent storage on disk. With that, Apache Ignite gained all the advantages of a full-fledged object-oriented database while keeping the convenience, rich tooling, flexibility and speed of a data grid.

Some theory

An important detail for understanding how to work with Apache Ignite is that it is written in Java. You might ask: “Why should I care what language the database is written in, if I talk to it through SQL anyway?” There is some truth in that. If you want to use Ignite only as a database, you can simply take the ODBC or JDBC driver that ships with Ignite, start as many server nodes as you need with the bundled ignite.sh script, configure them with flexible configs, and not worry much about the language: you can work with Ignite from PHP or Go just as well.

The native Ignite interface, however, offers much more than SQL. To start with the simplest things: fast atomic operations on objects in the database, distributed synchronization primitives, and distributed computing on the data local to each cluster node, so that you do not need to drag hundreds of megabytes of data to a client for calculations. As you can guess, this part of the API does not work through SQL; it is exposed in specific general-purpose programming languages.
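The idea behind computing on local data is to ship a small function to the nodes that own the data and bring back only the partial results. The sketch below simulates this in a single process: each hypothetical `Node` holds its own slice of the data, a map step runs against one node's local data only, and a reduce step combines the small per-node results. It illustrates the pattern only; `Node`, `MapReduce` and `CountMinors` are made-up names, and Ignite's compute API is not used here.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a cluster node holding a local slice of the data.
struct Node {
    std::vector<std::int32_t> localAges;
};

// Map step: runs "on" one node and reads only that node's local data.
// Reduce step: combines the per-node partial results on the caller side.
template <typename R>
R MapReduce(const std::vector<Node>& cluster,
            const std::function<R(const Node&)>& map,
            const std::function<R(R, R)>& reduce,
            R init) {
    R acc = init;
    for (const Node& node : cluster) {
        acc = reduce(acc, map(node));  // only the partial result "leaves" the node
    }
    return acc;
}

// Example job: count people younger than 18 across the whole cluster.
std::int64_t CountMinors(const std::vector<Node>& cluster) {
    return MapReduce<std::int64_t>(
        cluster,
        [](const Node& n) {
            std::int64_t count = 0;
            for (std::int32_t age : n.localAges) {
                if (age < 18) ++count;
            }
            return count;
        },
        [](std::int64_t a, std::int64_t b) { return a + b; },
        0);
}
```

In a real cluster the map step would execute on the remote nodes, so the network carries one number per node instead of every record.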

Naturally, since Ignite is written in Java, the most complete API is implemented in Java. Besides Java, however, there are API versions for C#/.NET and C++. These are so-called "fat" clients: in fact, an Ignite node running in a JVM that is launched from C++ or C# and communicated with through JNI. This kind of node is needed, among other things, so that the cluster can run distributed computations written in the corresponding languages, C++ and C#.

There is also an open protocol for so-called "thin" clients. These are lightweight libraries in various programming languages that communicate with the cluster over TCP/IP. They take up much less memory, start almost instantly and do not require a JVM on the machine, but they have somewhat higher latency and a less rich API than fat clients. Today there are thin clients for Java, C# and Node.js, and clients for C++, PHP, Python 3 and Go are under active development.
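Since thin clients talk to the cluster over a plain binary protocol on top of TCP, a thin client library is largely a message encoder and decoder. As an illustration, here is a sketch that builds a handshake request the way I understand the Ignite binary client protocol describes it: a little-endian int32 message length, handshake code 1, the protocol version as three int16 values, and client code 2. The layout is reproduced from memory, so check it against the protocol specification before relying on it; socket handling is omitted, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Append little-endian integers to a byte buffer.
void PutInt8(std::vector<std::uint8_t>& buf, std::uint8_t v) { buf.push_back(v); }

void PutInt16(std::vector<std::uint8_t>& buf, std::int16_t v) {
    std::uint16_t u = static_cast<std::uint16_t>(v);
    buf.push_back(static_cast<std::uint8_t>(u & 0xFF));
    buf.push_back(static_cast<std::uint8_t>(u >> 8));
}

void PutInt32(std::vector<std::uint8_t>& buf, std::int32_t v) {
    std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int shift = 0; shift < 32; shift += 8) {
        buf.push_back(static_cast<std::uint8_t>((u >> shift) & 0xFF));
    }
}

// Assumed handshake layout (verify against the protocol spec):
// [int32 length][int8 handshake code = 1][int16 major][int16 minor][int16 patch][int8 client code = 2]
std::vector<std::uint8_t> HandshakeRequest(std::int16_t major, std::int16_t minor,
                                           std::int16_t patch) {
    std::vector<std::uint8_t> body;
    PutInt8(body, 1);   // handshake operation code
    PutInt16(body, major);
    PutInt16(body, minor);
    PutInt16(body, patch);
    PutInt8(body, 2);   // client code for thin clients

    std::vector<std::uint8_t> msg;
    PutInt32(msg, static_cast<std::int32_t>(body.size()));  // length excludes itself
    msg.insert(msg.end(), body.begin(), body.end());
    return msg;
}
```

The resulting 12 bytes would be written to the socket as-is, after which the client reads the server's handshake response.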

In this post I will look at the API of the fat Ignite client for C++, since it currently provides the most complete API.

Getting started

I will not dwell on installing and configuring the platform itself: the process is routine, not very interesting, and well described, for example, in the official documentation. Let's go straight to the code.

Since Apache Ignite is a distributed platform, the first thing to do is start at least one node. This is done very simply with the ignite::Ignition class:

    #include <iostream>
    #include <ignite/ignition.h>

    using namespace ignite;

    int main()
    {
        IgniteConfiguration cfg;

        // Start a node with the default configuration.
        Ignite node = Ignition::Start(cfg);

        std::cout << "Node started. Press 'Enter' to stop" << std::endl;
        std::cin.get();

        // Stop all nodes started in this process (false: do not cancel running jobs).
        Ignition::StopAll(false);

        std::cout << "Node stopped" << std::endl;

        return 0;
    }

Congratulations: you have launched your first Apache Ignite node in C++ with the default settings. The Ignite class, in turn, is the main entry point to the entire cluster API.

Work with data

The main component of Ignite C++ for working with data is the cache, ignite::cache::Cache<K,V>. The cache provides a basic set of methods for working with data. Since Cache is essentially an interface to a distributed hash table, its basic methods resemble those of ordinary containers such as std::map or std::unordered_map.

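    #include <string>
    #include <cassert>
    #include <cstdint>
    #include <iostream>

    #include <ignite/ignition.h>

    using namespace ignite;

    struct Person
    {
        int32_t age;
        std::string firstName;
        std::string lastName;
    };

    // Needed for assert(p1 == p2) below.
    bool operator==(const Person& a, const Person& b)
    {
        return a.age == b.age && a.firstName == b.firstName && a.lastName == b.lastName;
    }

    // Needed for printing a Person with std::cout.
    std::ostream& operator<<(std::ostream& os, const Person& p)
    {
        return os << p.firstName << " " << p.lastName << ", " << p.age;
    }

    //... (the BinaryType<Person> specialization, see below)

    int main()
    {
        IgniteConfiguration cfg;
        Ignite node = Ignition::Start(cfg);

        cache::Cache<int32_t, Person> personCache =
            node.CreateCache<int32_t, Person>("PersonCache");

        Person p1 = { 35, "John", "Smith" };
        personCache.Put(42, p1);

        Person p2 = personCache.Get(42);
        std::cout << p2 << std::endl;

        assert(p1 == p2);

        return 0;
    }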

Looks pretty simple, right? In fact, things get somewhat more complicated once we look more closely at the limitations of C++.

C++ integration challenges

As I mentioned, Apache Ignite is written entirely in Java, a powerful OOP-driven language. Naturally, many features of that language, such as runtime reflection, were actively used to implement Apache Ignite components, for example to serialize and deserialize objects for storage on disk and transfer over the network.

Unlike Java, C++ has no such powerful reflection. In fact, unfortunately, it has no reflection at all yet. In particular, there is no way to find out the list and types of an object's fields, which would allow automatically generating the code needed to serialize and deserialize objects of user-defined types. Therefore, the only option is to ask the user to explicitly provide the necessary metadata about the user type and how to work with it.
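The usual C++ workaround is a traits template: a primary template that is declared but never defined, plus a specialization per user type that spells out the fields by hand. The sketch below shows just that mechanism with hypothetical names (`FieldTraits`, `FieldVisitor`, `Describe`); it is not Ignite code, only an illustration of supplying by hand what reflection would otherwise provide.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Person {
    std::int32_t age;
    std::string firstName;
    std::string lastName;
};

// Hypothetical visitor that records field names in the order they are reported.
struct FieldVisitor {
    std::vector<std::string> names;
    void Field(const char* name) { names.push_back(name); }
};

// Primary template is declared but not defined: using a type without
// metadata fails to compile, which is the C++ stand-in for "no metadata found".
template <typename T>
struct FieldTraits;

// User-provided metadata for Person, written by hand instead of reflected.
template <>
struct FieldTraits<Person> {
    static const char* TypeName() { return "Person"; }
    static void Fields(FieldVisitor& v) {
        v.Field("age");
        v.Field("firstName");
        v.Field("lastName");
    }
};

// Generic code works for any T that has a FieldTraits<T> specialization.
template <typename T>
std::vector<std::string> Describe() {
    FieldVisitor v;
    FieldTraits<T>::Fields(v);
    return v.names;
}
```

Forgetting the specialization for a type is caught at compile time rather than at runtime, which is one reason this style is common for serialization libraries in C++.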

In Ignite C++, this is implemented through specialization of the ignite::binary::BinaryType<T> template. This approach is used in both "fat" and "thin" clients. For the Person class shown above, such a specialization might look like this:

    namespace ignite
    {
        namespace binary
        {
            template<>
            struct BinaryType<Person>
            {
                static int32_t GetTypeId()
                {
                    return GetBinaryStringHashCode("Person");
                }

                static void GetTypeName(std::string& name)
                {
                    name = "Person";
                }

                static int32_t GetFieldId(const char* name)
                {
                    return GetBinaryStringHashCode(name);
                }

                // A plain struct value is never NULL.
                static bool IsNull(const Person& obj)
                {
                    return false;
                }

                static void GetNull(Person& dst)
                {
                    dst = Person();
                }

                static void Write(BinaryWriter& writer, const Person& obj)
                {
                    writer.WriteInt32("age", obj.age);
                    writer.WriteString("firstName", obj.firstName);
                    writer.WriteString("lastName", obj.lastName);
                }

                static void Read(BinaryReader& reader, Person& dst)
                {
                    dst.age = reader.ReadInt32("age");
                    dst.firstName = reader.ReadString("firstName");
                    dst.lastName = reader.ReadString("lastName");
                }
            };
        } // namespace binary
    } // namespace ignite

As you can see, besides the serialization and deserialization methods BinaryType<Person>::Write and BinaryType<Person>::Read, there are several other methods here. They tell the platform how to work with custom C++ types from other languages, in particular Java. Let's take a closer look at each method:

GetTypeName() - Returns the type name. The type name must be the same on all platforms where the type is used. If you use the type only in Ignite C++, the name can be anything.
GetTypeId() - Returns a cross-platform unique identifier for the type. For the type to work correctly across platforms, this ID must be computed the same way everywhere. By default, GetBinaryStringHashCode(TypeName) returns the same type ID as all other platforms do, so this implementation lets you work with the type correctly from other platforms as well.
GetFieldId() - Returns a unique identifier for a field name. Again, for correct cross-platform behavior, use GetBinaryStringHashCode();
IsNull() - Checks whether an instance of the class should be treated as a NULL value. Used to serialize NULL values correctly. Not very useful for instances of the class itself, but it can be extremely convenient if the user wants to work with smart pointers and defines a specialization, for example, for BinaryType< std::unique_ptr<Person> >.
GetNull() - Called when deserializing a NULL value. Everything said about IsNull() also applies to GetNull().
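To see why hashing the name yields the same ID on every platform, here is a sketch of a name-based ID function. It assumes, based on my understanding of Ignite's default ID mapper, that the ID is Java's String.hashCode() computed over the lower-cased name, that is s[0]*31^(n-1) + ... + s[n-1] with 32-bit wrap-around. `NameHashCode` is a hypothetical name and handles ASCII only, so compare it with the real GetBinaryStringHashCode before relying on exact values.

```cpp
#include <cstdint>

// Java-style String.hashCode() over the lower-cased ASCII name (assumed scheme).
// Arithmetic is done in uint32_t so that overflow wraps around as in Java;
// signed overflow would be undefined behaviour in C++.
std::int32_t NameHashCode(const char* name) {
    std::uint32_t hash = 0;
    for (const char* p = name; *p != '\0'; ++p) {
        char c = *p;
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');  // case-insensitive IDs (assumed)
        }
        hash = 31u * hash + static_cast<unsigned char>(c);
    }
    return static_cast<std::int32_t>(hash);
}
```

Since the formula depends only on the characters of the name, a Java node and a C++ node that agree on "Person" and "firstName" also agree on the corresponding type and field IDs.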

SQL

If we draw an analogy with classical databases, a cache is a database schema named after the cache, containing a single table named after the stored type. Besides these per-cache schemas, there is a common schema called PUBLIC, in which you can create and drop any number of tables using standard DDL commands such as CREATE TABLE, DROP TABLE and so on. It is the PUBLIC schema that users usually connect to via ODBC/JDBC when they want to use Ignite simply as a distributed database.

Ignite supports full-fledged SQL queries, including DML and DDL. There is no support for SQL transactions yet, but the community is actively working on an MVCC implementation that will make transactions possible, and as far as I know, major changes have recently been merged into master.

To work with cache data through SQL, you must explicitly specify in the cache configuration which fields of the object will be used in SQL queries. The configuration is written in an XML file, and the path to this file is passed when the node is started:

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:util="http://www.springframework.org/schema/util"
           xsi:schemaLocation="
               http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans.xsd
               http://www.springframework.org/schema/util
               http://www.springframework.org/schema/util/spring-util.xsd">
        <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
            <property name="cacheConfiguration">
                <list>
                    <bean class="org.apache.ignite.configuration.CacheConfiguration">
                        <property name="name" value="PersonCache"/>
                        <property name="queryEntities">
                            <list>
                                <bean class="org.apache.ignite.cache.QueryEntity">
                                    <property name="keyType" value="java.lang.Integer"/>
                                    <property name="valueType" value="Person"/>
                                    <property name="fields">
                                        <map>
                                            <entry key="age" value="java.lang.Integer"/>
                                            <entry key="firstName" value="java.lang.String"/>
                                            <entry key="lastName" value="java.lang.String"/>
                                        </map>
                                    </property>
                                </bean>
                            </list>
                        </property>
                    </bean>
                </list>
            </property>
        </bean>
    </beans>

The config is parsed by the Java engine, so the field types must be specified as Java types. Once the configuration file is created, start the node, get the cache instance, and you can start using SQL:

    //... (includes, Person and BinaryType<Person> as above)

    int main()
    {
        IgniteConfiguration cfg;
        cfg.springCfgPath = "config.xml";

        Ignite node = Ignition::Start(cfg);

        cache::Cache<int32_t, Person> personCache =
            node.GetCache<int32_t, Person>("PersonCache");

        personCache.Put(1, Person{ 35, "John", "Smith" });
        personCache.Put(2, Person{ 31, "Jane", "Doe" });
        personCache.Put(3, Person{ 12, "Harry", "Potter" });
        personCache.Put(4, Person{ 12, "Ronald", "Weasley" });

        // "?" is a positional parameter bound with AddArgument().
        cache::query::SqlFieldsQuery qry(
            "select firstName, lastName from Person where age = ?");
        qry.AddArgument<int32_t>(12);

        cache::query::QueryFieldsCursor cursor = personCache.Query(qry);

        while (cursor.HasNext())
        {
            cache::query::QueryFieldsRow row = cursor.GetNext();

            // Fields come back in the order of the select list.
            std::cout << row.GetNext<std::string>() << ", ";
            std::cout << row.GetNext<std::string>() << std::endl;
        }

        return 0;
    }

In the same way, you can run INSERT, UPDATE, CREATE TABLE and other statements. Cross-cache queries are, of course, supported as well. In that case, however, the cache name must be given in the query in double quotes as the schema name. For example, instead of

 select * from Person inner join Profession

you should write

 select * from "PersonCache".Person inner join "ProfessionCache".Profession
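Since the schema is just the cache name in double quotes, building such queries in C++ is plain string work. The helper below is a hypothetical sketch (`QualifiedTable` is not part of Ignite C++). The quotes matter because, as far as I know, Ignite's SQL engine upper-cases unquoted identifiers, so an unquoted PersonCache would not match the mixed-case cache name.

```cpp
#include <string>

// Hypothetical helper: builds "<cache>".<Type>, i.e. the cache name
// in double quotes as the schema, followed by the table (type) name.
std::string QualifiedTable(const std::string& cacheName, const std::string& typeName) {
    return "\"" + cacheName + "\"." + typeName;
}
```

For example, concatenating "select * from " with QualifiedTable("PersonCache", "Person"), " inner join " and QualifiedTable("ProfessionCache", "Profession") reproduces the query above.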

And so on

Apache Ignite offers a great many features, and of course it is impossible to cover them all in one post. The C++ API is under active development, so there will be even more interesting things soon. I may write a few more posts analyzing some of these features in more detail.

P.S. I have been an Apache Ignite committer since 2017 and actively develop the C++ API for this product. If you know C++, Java or .NET reasonably well and would like to take part in developing an open-source product with an active, friendly community, we always have a couple of interesting tasks for you.

Source: https://habr.com/ru/post/420623/
