Apache Cassandra + Apache Ignite: how to combine the best of both

Apache Cassandra is one of the most popular open source distributed NoSQL disk-based databases. It is used in key parts of the infrastructure by giants like Netflix, eBay, and Expedia, and has gained popularity for its speed, its ability to scale linearly across thousands of nodes, and its “best-in-class” replication between data centers.

Apache Ignite is an in-memory computing platform: a distributed in-memory data store with real-time distributed computing on top of it, supporting JCache, SQL99, ACID transactions, and basic machine learning algebra.

Apache Cassandra is a classic solution in its field. As with any specialized solution, its advantages come from a number of trade-offs, many of which stem from the limitations of disk storage. Cassandra is optimized to work with disk as fast as possible, at the expense of everything else. Examples of such trade-offs: no ACID transactions, no SQL support, and no arbitrary transactional or analytical operations unless the data has been shaped for them in advance. These trade-offs, in turn, regularly cause difficulties for users: they lead to misuse of the product and a negative experience, or force teams to split data across different types of storage, fragmenting the infrastructure and complicating data storage logic in applications.
A possible solution is to use Cassandra together with Apache Ignite. This preserves the key advantages of Cassandra while compensating for its disadvantages through the symbiosis of the two systems.

How? Read on and see sample code.

Limitations of Cassandra

First, I want to briefly go through the main limitations of Cassandra that we will be working around:

Throughput and response time are bounded by the characteristics of the hard disk or solid-state drive;
The storage structure, optimized for sequential writes and reads, is poorly suited to classical relational operations. Data cannot be normalized and efficiently combined with JOINs, and operations such as GROUP BY and ORDER BY are significantly restricted;
As a consequence of the previous point, there is no SQL support, only its more limited variation, CQL;
No ACID transactions.

You may object that I want to use Cassandra for something it was not designed for, and I fully agree. My goal is to show that if you solve these problems, the “purpose” of Cassandra can be significantly expanded. Combine a man and a horse, and you get a rider who can do a completely different set of things than a man and a horse separately.

How can you get around these limitations?

I would say the classic option is data fragmentation: part of the data lives in Cassandra, and part in other systems that provide the necessary guarantees.

The obvious disadvantage of this approach is increased complexity of application development and support (and therefore, potentially, worse speed and quality). It is much easier to knock on one door than to combine disparate information from various sources at the application or microservice layer. Also, degradation of either system can have significant negative consequences, forcing the infrastructure team to chase two hares at once.
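To make the cost of combining data at the application layer concrete, here is a minimal sketch of a join the application has to perform itself when two entities live in different stores. The Category and Good classes and the AppLevelJoin name are hypothetical and used only for illustration; in a real system each list would come from a separate client call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AppLevelJoin {
    /** Hypothetical row fetched from the first store. */
    static final class Category {
        final long id;
        final String name;
        Category(long id, String name) { this.id = id; this.name = name; }
    }

    /** Hypothetical row fetched from the second store. */
    static final class Good {
        final long id;
        final long categoryId;
        final String name;
        Good(long id, long categoryId, String name) {
            this.id = id;
            this.categoryId = categoryId;
            this.name = name;
        }
    }

    /** Hash join in application code: index one side by key, then probe it with the other. */
    static List<String> join(List<Category> categories, List<Good> goods) {
        Map<Long, Category> byId = new HashMap<>();
        for (Category c : categories) {
            byId.put(c.id, c);
        }
        List<String> result = new ArrayList<>();
        for (Good g : goods) {
            Category c = byId.get(g.categoryId);
            if (c != null) { // inner join: goods without a matching category are dropped
                result.add(g.name + " -> " + c.name);
            }
        }
        return result;
    }
}
```

Every such join, together with its error handling and consistency concerns, becomes application code that a single SQL-capable store would otherwise handle.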

Apache Ignite

Another way is to put another system on top of Cassandra and split responsibility between them. I believe Apache Ignite is the perfect candidate for this scheme:

The performance limit imposed by the disk disappears: Apache Ignite works with RAM, the fastest storage generally available today. RAM is also getting cheaper so quickly that a server pool can hold a sufficient amount of it (Apache Ignite is a distributed system, like Cassandra);
Full support for classic SQL99, including JOINs, GROUP BY, ORDER BY, as well as INSERT, UPDATE, DELETE, MERGE, and so on, lets you normalize data and makes analytics easier; combined with in-memory performance, it opens up the potential of HTAP, real-time analytics on operational data;
Support for the JDBC and ODBC standards makes it easy to integrate with existing tools such as Tableau and with frameworks like Hibernate or Spring Data;
ACID transaction support, plus flexible and fine-grained settings for fault tolerance and data redundancy;
Distributed computing, stream processing, machine learning: you can easily implement many new business scenarios that pay off.

In this scheme, Apache Ignite sits on top of Apache Cassandra, which plays the role of a persistent, non-volatile storage layer. Although upcoming versions of Apache Ignite will include their own Persistence solution with support for extending memory with disk, lazy start, and pass-through SQL, Cassandra can still be interesting in this role thanks to its maturity after many years of development, its wide adoption, and the separation of responsibility: not putting all your eggs in one basket where you do not have to.

The Apache Ignite cluster loads all or part of the data from Apache Cassandra that you need to query (for example, everything except archive data), and then works in write-through mode: it serves API and SQL read requests on its own, and duplicates write requests to Cassandra, synchronously or asynchronously, so that they are safely persisted to disk.
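The read-through and write-through behavior described above can be illustrated with a small, library-free sketch. This is not how Apache Ignite is implemented; BackingStore, MapBackingStore, and WriteThroughCache are hypothetical names, and the backing store here is just a map standing in for Cassandra.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the persistent store (Cassandra in this article). */
interface BackingStore<K, V> {
    V load(K key);
    void write(K key, V value);
}

/** Trivial store that keeps rows in a map and counts the calls it receives. */
final class MapBackingStore<K, V> implements BackingStore<K, V> {
    final Map<K, V> rows = new HashMap<>();
    int loads;
    int writes;

    @Override public V load(K key) { loads++; return rows.get(key); }
    @Override public void write(K key, V value) { writes++; rows.put(key, value); }
}

/** In-memory layer: reads are served from memory, writes are duplicated synchronously. */
final class WriteThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final BackingStore<K, V> store;

    WriteThroughCache(BackingStore<K, V> store) { this.store = store; }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            // Read-through: on a miss, fetch from the store and keep the result in memory.
            value = store.load(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    void put(K key, V value) {
        // Write-through: persist to the store first, then update the in-memory copy.
        store.write(key, value);
        memory.put(key, value);
    }
}
```

After the first read of a key, later reads never touch the store, while every put reaches it. An asynchronous (write-behind) variant would queue the store.write call instead of running it inline.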

The data can then be analyzed in real time: you can connect visualization tools such as Tableau, run distributed machine learning algorithms, and build data marts.

How about an example?

Next, I will give an example of a simple "synthetic" integration of Apache Cassandra and Apache Ignite, to show how this works and that it is not difficult at all, even if it requires a certain share of the boilerplate code.

First, I will create the necessary tables in Cassandra and fill them with data, then initialize the Java project and write the DTO classes, and then show the main part - configuring Apache Ignite to work with Cassandra.

I will use macOS Sierra, Cassandra 3.10, and Apache Ignite 2.0. On Linux, the commands should be similar.

Cassandra: tables and data

To get started, download the Cassandra distribution into the ~/Downloads directory, either by following the download link or using curl / wget.

Next, go to the directory and unpack it:

    $ cd ~/Downloads
    $ tar xzvf apache-cassandra-3.10-bin.tar.gz
    $ cd apache-cassandra-3.10

Run Cassandra with the default settings; this is enough for testing.

 $ bin/cassandra

Next, start the Cassandra interactive shell and create the test data structures. We use an ordinary surrogate id as the key; for Cassandra tables it often makes sense to choose keys that are more meaningful for subsequent data retrieval, but we keep the example simple:

    $ cd ~/Downloads/apache-cassandra-3.10
    $ bin/cqlsh

    CREATE KEYSPACE IgniteTest WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

    USE IgniteTest;

    CREATE TABLE catalog_category (id bigint primary key, parentId bigint, name text, description text);
    CREATE TABLE catalog_good (id bigint primary key, categoryId bigint, name text, description text, price bigint, oldPrice bigint);

    INSERT INTO catalog_category (id, parentId, name, description) VALUES (1, NULL, 'Appliances', '     !');
    INSERT INTO catalog_category (id, parentId, name, description) VALUES (2, 1, 'Refrigerators', '  !');
    INSERT INTO catalog_category (id, parentId, name, description) VALUES (3, 1, 'Washing machines', ' !');

    INSERT INTO catalog_good (id, categoryId, name, description, price, oldPrice) VALUES (1, 2, 'Fridge Buzzword', '  2027!', 1000, NULL);
    INSERT INTO catalog_good (id, categoryId, name, description, price, oldPrice) VALUES (2, 2, 'Fridge Foobar', '  !', 300, 900);
    INSERT INTO catalog_good (id, categoryId, name, description, price, oldPrice) VALUES (3, 2, 'Fridge Barbaz', '   !', 500000, 300000);
    INSERT INTO catalog_good (id, categoryId, name, description, price, oldPrice) VALUES (4, 3, 'Machine Habr#', ', , !', 10000, NULL);

Check that all data is recorded correctly:

    cqlsh:ignitetest> SELECT * FROM catalog_category;

     id | description | name             | parentId
    ----+-------------+------------------+----------
     1  | !           | Appliances       | null
     2  | !           | Refrigerators    | 1
     3  | !           | Washing machines | 1

    (3 rows)
    cqlsh:ignitetest> SELECT * FROM catalog_good;

     id | categoryId | description | name            | oldPrice | price
    ----+------------+-------------+-----------------+----------+--------
     1  | 2          | 2027!       | Fridge Buzzword | null     | 1000
     2  | 2          | !           | Fridge Foobar   | 900      | 300
     4  | 3          | , , !       | Machine Habr#   | null     | 10000
     3  | 2          | !           | Fridge Barbaz   | 300000   | 500000

    (4 rows)

Java project initialization

There are two ways to work with Ignite: you can download the distribution from ignite.apache.org and add the Jar files with your own classes and an XML configuration to it, or use Ignite as a dependency in a Java project. In this article I will use the second option.

Let's create a new project. I will use Maven, as a classic tool that is familiar, in my opinion, to the widest audience.

Declare the following dependencies:

ignite-cassandra-store for integration with Cassandra;
ignite-spring to load the XML configuration in the Spring Context format. It transitively pulls in part of Spring; alternatively, you can leave this package out and create the necessary classes yourself (first of all, IgniteConfiguration).

ignite-core, which contains the main Apache Ignite classes, will also be pulled in transitively.

    <dependencies>
        <dependency>
            <groupId>org.apache.ignite</groupId>
            <artifactId>ignite-spring</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.ignite</groupId>
            <artifactId>ignite-cassandra-store</artifactId>
            <version>2.0.0</version>
        </dependency>
    </dependencies>

Next, you need to create DTO classes that will represent Cassandra tables in the Java world:

    // Each class goes into its own file in the package referenced by the Ignite configuration.
    package com.gridgain.test.model;

    import org.apache.ignite.cache.query.annotations.QuerySqlField;

    public class CatalogCategory {
        @QuerySqlField private long id;
        @QuerySqlField private Long parentId;
        @QuerySqlField private String name;
        @QuerySqlField private String description;

        // public getters and setters
    }

    public class CatalogGood {
        @QuerySqlField private long id;
        @QuerySqlField private long categoryId;
        @QuerySqlField private String name;
        @QuerySqlField private String description;
        @QuerySqlField private long price;
        @QuerySqlField private long oldPrice;

        // public getters and setters
    }

... or in Kotlin:

    import org.apache.ignite.cache.query.annotations.QuerySqlField

    data class CatalogCategory(@QuerySqlField var id: Long,
                               @QuerySqlField var parentId: Long?,
                               @QuerySqlField var name: String?,
                               @QuerySqlField var description: String?) {
        constructor() : this(0, null, null, null)
    }

    data class CatalogGood(@QuerySqlField var id: Long,
                           @QuerySqlField var categoryId: Long,
                           @QuerySqlField var name: String?,
                           @QuerySqlField var description: String?,
                           @QuerySqlField var price: Long,
                           @QuerySqlField var oldPrice: Long) {
        constructor() : this(0, 0, null, null, 0, 0)
    }

We put the @QuerySqlField annotation on the fields that will participate in SQL queries. A field without this annotation cannot be selected or used as a filter in SQL.

You can also configure indexes and full-text indexes in more detail, which is beyond the scope of this example. More information about configuring SQL in Apache Ignite can be found in the corresponding section of the documentation.

Apache Ignite configuration

Create the configuration in src/main/resources, in the file apacheignite-cassandra.xml (the name is arbitrary). I will give the complete configuration first, which is quite long, and then go through it in parts:

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
                               http://www.springframework.org/schema/beans/spring-beans.xsd">

        <bean class="org.apache.ignite.cache.store.cassandra.datasource.DataSource" name="cassandra">
            <property name="contactPoints" value="127.0.0.1"/>
        </bean>

        <bean class="org.apache.ignite.configuration.IgniteConfiguration">
            <property name="cacheConfiguration">
                <list>
                    <bean class="org.apache.ignite.configuration.CacheConfiguration">
                        <property name="name" value="CatalogCategory"/>
                        <property name="readThrough" value="true"/>
                        <property name="writeThrough" value="true"/>
                        <property name="sqlSchema" value="catalog_category"/>
                        <property name="indexedTypes">
                            <list>
                                <value type="java.lang.Class">java.lang.Long</value>
                                <value type="java.lang.Class">com.gridgain.test.model.CatalogCategory</value>
                            </list>
                        </property>
                        <property name="cacheStoreFactory">
                            <bean class="org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory">
                                <property name="dataSource" ref="cassandra"/>
                                <property name="persistenceSettings">
                                    <bean class="org.apache.ignite.cache.store.cassandra.persistence.KeyValuePersistenceSettings">
                                        <constructor-arg type="java.lang.String"><value><![CDATA[
    <persistence keyspace="IgniteTest" table="catalog_category">
        <keyPersistence class="java.lang.Long" strategy="PRIMITIVE" column="id"/>
        <valuePersistence class="com.gridgain.test.model.CatalogCategory" strategy="POJO"/>
    </persistence>]]></value></constructor-arg>
                                    </bean>
                                </property>
                            </bean>
                        </property>
                    </bean>

                    <bean class="org.apache.ignite.configuration.CacheConfiguration">
                        <property name="name" value="CatalogGood"/>
                        <property name="readThrough" value="true"/>
                        <property name="writeThrough" value="true"/>
                        <property name="sqlSchema" value="catalog_good"/>
                        <property name="indexedTypes">
                            <list>
                                <value type="java.lang.Class">java.lang.Long</value>
                                <value type="java.lang.Class">com.gridgain.test.model.CatalogGood</value>
                            </list>
                        </property>
                        <property name="cacheStoreFactory">
                            <bean class="org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory">
                                <property name="dataSource" ref="cassandra"/>
                                <property name="persistenceSettings">
                                    <bean class="org.apache.ignite.cache.store.cassandra.persistence.KeyValuePersistenceSettings">
                                        <constructor-arg type="java.lang.String"><value><![CDATA[
    <persistence keyspace="IgniteTest" table="catalog_good">
        <keyPersistence class="java.lang.Long" strategy="PRIMITIVE" column="id"/>
        <valuePersistence class="com.gridgain.test.model.CatalogGood" strategy="POJO"/>
    </persistence>]]></value></constructor-arg>
                                    </bean>
                                </property>
                            </bean>
                        </property>
                    </bean>
                </list>
            </property>
        </bean>
    </beans>

The configuration format is Spring Beans.

The configuration can be divided into two sections: the definition of a DataSource to establish communication with Cassandra and the definition of Apache Ignite settings, which in this example are reduced to specifying working caches that fully correspond to the tables in Cassandra.

The first part of the configuration is concise:

    <bean class="org.apache.ignite.cache.store.cassandra.datasource.DataSource" name="cassandra">
        <property name="contactPoints" value="127.0.0.1"/>
    </bean>

Here we define the Cassandra data source and specify the addresses to try when establishing a connection.

Next is the Apache Ignite configuration. As part of this test, there will be a minimum deviation from the default settings, so we only override the cacheConfiguration property, which will contain a list of caches running on the cluster:

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="cacheConfiguration">
            <list>
                ...
            </list>
        </property>
    </bean>

The first cache represents the catalog_category table:

    <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="CatalogCategory"/>
        ...
    </bean>

In it, we enable read-through and write-through modes (whatever we write to the cache is automatically duplicated to Cassandra), specify that SQL will use the catalog_category schema, and list the types that will be stored in this cache and made accessible via SQL. Types are always specified in key-value pairs, so the number of list items must always be even.

    <property name="readThrough" value="true"/>
    <property name="writeThrough" value="true"/>
    <property name="sqlSchema" value="catalog_category"/>
    <property name="indexedTypes">
        <list>
            <value type="java.lang.Class">java.lang.Long</value>
            <value type="java.lang.Class">com.gridgain.test.model.CatalogCategory</value>
        </list>
    </property>

Finally, we set up the link to Cassandra, which has two main parts. First, we reference the previously created DataSource, cassandra. Second, we need to specify how Cassandra tables map to Ignite key-value entries. This is done through the persistenceSettings property. It is better to reference an external XML file with the mapping configuration there, but for simplicity we embed this XML directly into the Spring configuration as a CDATA element:

    <property name="cacheStoreFactory">
        <bean class="org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory">
            <property name="dataSource" ref="cassandra"/>
            <property name="persistenceSettings">
                <bean class="org.apache.ignite.cache.store.cassandra.persistence.KeyValuePersistenceSettings">
                    <constructor-arg type="java.lang.String"><value><![CDATA[
    <persistence keyspace="IgniteTest" table="catalog_category">
        <keyPersistence class="java.lang.Long" strategy="PRIMITIVE" column="id"/>
        <valuePersistence class="com.gridgain.test.model.CatalogCategory" strategy="POJO"/>
    </persistence>]]></value></constructor-arg>
                </bean>
            </property>
        </bean>
    </property>

The mapping configuration looks intuitive enough:

    <persistence keyspace="IgniteTest" table="catalog_category">
        <keyPersistence class="java.lang.Long" strategy="PRIMITIVE" column="id"/>
        <valuePersistence class="com.gridgain.test.model.CatalogCategory" strategy="POJO"/>
    </persistence>

At the top level (the persistence tag), we specify the keyspace (IgniteTest in this case) and the table (catalog_category) to map. Then we state that the Ignite cache key is of type Long, which is a primitive type and corresponds to the id column in the Cassandra table. The value is the CatalogCategory class, which is built via reflection (strategy="POJO") from the columns of the Cassandra table.
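To give a feel for what a reflection-based POJO mapping involves, here is a minimal, library-free sketch. It is not the code Apache Ignite uses; PojoMappingSketch, its nested Category class, and fromRow are hypothetical, and the rule "field name maps to the lower-cased column name" is an assumption based on Cassandra storing unquoted identifiers in lower case.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Locale;
import java.util.Map;

final class PojoMappingSketch {
    /** Reduced, hypothetical version of the article's CatalogCategory value class. */
    static final class Category {
        Long parentId;
        String name;
        String description;
    }

    /**
     * Builds an object of the given type from a row represented as column name -> value.
     * Assumption: each field maps to the column with the same name in lower case.
     */
    static <T> T fromRow(Class<T> type, Map<String, Object> row) {
        try {
            T pojo = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                    continue; // only instance fields take part in the mapping
                }
                Object value = row.get(field.getName().toLowerCase(Locale.ROOT));
                if (value != null) {
                    field.setAccessible(true);
                    field.set(pojo, value);
                }
            }
            return pojo;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
    }
}
```

Columns missing from the row (or holding null) simply leave the field at its default value, which is why a no-argument constructor is needed on the value class.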

More detailed mapping settings, which are beyond the scope of this example, are described in the corresponding section of the documentation.

The configuration of the second cache containing product data is similar.

Launch

To start, create the com.gridgain.test.Starter class:

    package com.gridgain.test;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class Starter {
        public static void main(String... args) throws Exception {
            final Ignite ignite = Ignition.start("apacheignite-cassandra.xml");

            ignite.cache("CatalogCategory").loadCache(null);
            ignite.cache("CatalogGood").loadCache(null);
        }
    }

Here we use the Ignition.start(...) instruction to start the Apache Ignite node, specifying the apacheignite-cassandra.xml file on the classpath as the configuration source.

SQL

To execute SQL queries, you can use any client that supports JDBC, for example the one built into IntelliJ IDEA, or SquirrelSQL. In the latter case, you will need to add the Apache Ignite driver, which is in the ignite-core Jar file and can be downloaded as part of the distribution.

Create a new connection using a URL like jdbc:ignite://localhost/CatalogGood, where localhost is the address of one of the Apache Ignite nodes and CatalogGood is the cache that queries go to by default.
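The same connection can also be opened from plain Java through the standard java.sql API. The sketch below assumes that ignite-core is on the classpath, that the driver class is org.apache.ignite.IgniteJdbcDriver (an assumption for Ignite 2.0, check it against your version), and that the node from the Starter class is running locally. IgniteJdbcSketch and jdbcUrl are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class IgniteJdbcSketch {
    /** Builds the URL form used in this article: jdbc:ignite://host/defaultCache. */
    static String jdbcUrl(String host, String cacheName) {
        return "jdbc:ignite://" + host + "/" + cacheName;
    }

    public static void main(String[] args) throws SQLException, ClassNotFoundException {
        // Assumption: driver class name as shipped in ignite-core for Ignite 2.0.
        Class.forName("org.apache.ignite.IgniteJdbcDriver");

        String sql = "SELECT name, price FROM catalog_good.CatalogGood ORDER BY price";
        try (Connection conn = DriverManager.getConnection(jdbcUrl("localhost", "CatalogGood"));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // Print each product with its price, cheapest first.
                System.out.println(rs.getString("name") + "\t" + rs.getLong("price"));
            }
        }
    }
}
```

The table is addressed as schema.Type (catalog_good.CatalogGood), matching the sqlSchema and value type from the cache configuration.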

A couple of examples of possible SQL queries:

    SELECT cg.name goodName, cg.price goodPrice, cc.name category, pcc.name parentCategory
    FROM catalog_category.CatalogCategory cc
    JOIN catalog_category.CatalogCategory pcc ON cc.parentId = pcc.id
    JOIN catalog_good.CatalogGood cg ON cg.categoryId = cc.id;

goodName	goodPrice	category	parentCategory
Fridge Buzzword	1000	Refrigerators	Appliances
Fridge Foobar	300	Refrigerators	Appliances
Fridge Barbaz	500000	Refrigerators	Appliances
Machine Habr#	10000	Washing machines	Appliances

    SELECT cc.name, AVG(cg.price) avgPrice
    FROM catalog_category.CatalogCategory cc
    JOIN catalog_good.CatalogGood cg ON cg.categoryId = cc.id
    WHERE cg.price <= 100000
    GROUP BY cc.id;

name	avgPrice
Refrigerators	650
Washing machines	10000
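As a sanity check, the averages above can be reproduced in plain Java from the four products inserted earlier (category 2 is Refrigerators, category 3 is Washing machines). The AvgPriceCheck class is a hypothetical helper that only mirrors the WHERE, GROUP BY, and AVG steps of the query; it does not touch Ignite.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class AvgPriceCheck {
    /** Only the columns the query needs: category and price. */
    static final class Good {
        final long categoryId;
        final long price;
        Good(long categoryId, long price) {
            this.categoryId = categoryId;
            this.price = price;
        }
    }

    /** The (categoryId, price) pairs from the INSERT statements in this article. */
    static List<Good> sampleGoods() {
        return Arrays.asList(
                new Good(2, 1000), new Good(2, 300), new Good(2, 500000), new Good(3, 10000));
    }

    /** Mirrors: WHERE price <= maxPrice GROUP BY category, with AVG(price) per group. */
    static Map<Long, Double> avgPriceByCategory(List<Good> goods, long maxPrice) {
        return goods.stream()
                .filter(g -> g.price <= maxPrice)
                .collect(Collectors.groupingBy(g -> g.categoryId, TreeMap::new,
                        Collectors.averagingLong(g -> g.price)));
    }
}
```

With a limit of 100000, the 500000 product is filtered out, so category 2 averages (1000 + 300) / 2 = 650 and category 3 keeps its single price of 10000, matching the query result.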

Conclusion

This simple example shows how, with Apache Ignite on top of Apache Cassandra, you can get a distributed SQL engine with ACID transactions and RAM speed.

When you encounter Apache Cassandra in an existing infrastructure or in a greenfield project, remember this article and that Cassandra works well together with Ignite. Or try writing a project right now, for example from the world of the Internet of Things, that uses the strengths of both Ignite and Cassandra.

Source: https://habr.com/ru/post/329736/
