
Apache Ignite 2.4 Release - Distributed Database and Caching Platform

On March 12, 2018, four months after the previous version, Apache Ignite 2.4 was released. This release is notable for a number of innovations: Java 9 support, numerous SQL optimizations and improvements, neural network support, a new approach to building the topology when working with disk storage, and much more.

The Apache Ignite Database and Caching Platform is a platform for distributed data storage (optimized for active RAM usage), as well as for distributed computing in near real time.

Ignite is used where you need to quickly process large data streams that are too much for centralized systems to handle.
Examples of use: a fast distributed cache; a layer aggregating data from disparate services (for example, for a Customer 360 view); a horizontally scalable primary store (NoSQL or SQL) for operational data; a computing platform; and so on.

Below, we look at the main innovations in Ignite 2.4.

Baseline topology


If you have used Apache Ignite with its own disk storage, you have probably run into questions of cluster activation, behavior on topology changes, and rebalancing of persistent data.

Baseline Topology solves these problems by fixing the set of nodes that hold data on disk; this set determines when the cluster can be activated, how it behaves when the topology changes, and when rebalancing is triggered.
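As a rough sketch (not taken from the release itself), this is how the baseline can be fixed from the Java API once all persistence-enabled nodes have joined; the configuration file name here is illustrative.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class BaselineExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("ignite-persistence.xml")) {
            // A cluster with native persistence starts deactivated; activate it explicitly.
            ignite.cluster().active(true);

            // Fix the current set of server nodes as the baseline topology.
            ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
        }
    }
}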

Baseline Topology is such an important change in Ignite that we will publish a separate article on this feature in the near future.

Thin clients


Thin clients can now be built on top of Ignite's own binary protocol.

Previously, clients for .NET and C++ started a full-fledged JVM with Ignite in order to communicate with the cluster. This provided easy and cheap access to the platform's extensive functionality, but such clients were heavyweight.

The new thin clients are standalone and do not need a JVM. This significantly reduces resource consumption and improves performance, and it is now much easier and cheaper for the community to build new clients for many different languages, for example Python.

In version 2.4, a thin client for .NET appeared.

var cfg = new IgniteClientConfiguration { Host = "127.0.0.1" };

using (IIgniteClient igniteClient = Ignition.StartClient(cfg))
{
    ICacheClient<int, Organization> cache = igniteClient.GetCache<int, Organization>(CacheName);

    Organization org = new Organization(
        "GridGain",
        new Address("69–71", 191119),
        new Email("rusales@gridgain.com"),
        OrganizationType.Private,
        DateTime.Now
    );

    // Put the organization into the cache.
    cache.Put(1, org);

    // Read it back from the cache.
    Organization orgFromCache = cache.Get(1);
}

Data load optimization


Apache Ignite 2.4 adds tools to optimize the initial load and load large amounts of data.

Now you can temporarily disable the WAL (write-ahead log) for individual tables at runtime. This allows data to be loaded with minimal disk I/O overhead, which has a positive effect on throughput.

When the WAL is turned back on, a checkpoint of the current in-memory data is immediately written to disk to ensure data integrity.

You can disable WAL using SQL:

-- Disable WAL for the table (for example, for the duration of a bulk load).
ALTER TABLE my_table NOLOGGING;

-- Re-enable WAL when the load is finished; a checkpoint will be performed.
ALTER TABLE my_table LOGGING;

or via API:

ignite.cluster().isWalEnabled(cacheName); // Check whether WAL is enabled for the cache.
ignite.cluster().enableWal(cacheName);    // Enable WAL.
ignite.cluster().disableWal(cacheName);   // Disable WAL.
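To make the whole flow concrete, here is a rough sketch of a bulk load with the WAL disabled via the Java API; the cache name, key count, and configuration file are made up for illustration.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class WalBulkLoad {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("ignite-persistence.xml")) {
            String cacheName = "my_table_cache"; // hypothetical cache backing the table

            // Turn WAL off for the duration of the initial load.
            ignite.cluster().disableWal(cacheName);

            // Stream the data in; with WAL disabled the disk I/O overhead is minimal.
            try (IgniteDataStreamer<Integer, String> streamer = ignite.dataStreamer(cacheName)) {
                for (int i = 0; i < 1_000_000; i++)
                    streamer.addData(i, "value-" + i);
            }

            // Re-enable WAL; a checkpoint of the loaded data is written to disk.
            ignite.cluster().enableWal(cacheName);
        }
    }
}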

Java 9


Ignite 2.4 adds Java 9 to existing Java 8 support.

Expanded .NET Support


I have often heard the question: “When will Ignite for .NET support .NET Core?” I am pleased to announce that, starting with version 2.4, Ignite.NET gets support for .NET Core. Moreover, Mono is supported as well.

Thanks to this, you can build cross-platform applications on .NET, expanding the reach of Ignite into the Linux and Mac worlds.

In a separate article, we will cover the .NET innovations in more detail: the thin client and support for .NET Core and Mono.

Numerous optimizations and SQL enhancements


Ignite 2.4 brings many changes that speed up SQL. These include multithreaded index creation, optimization of object deserialization and primary-key lookups, SQL batching support on the cluster side, and much more.

In the DDL area, you can now set DEFAULT values for columns in tables created via CREATE TABLE, configure how values are inlined into index trees, and perform DROP COLUMN.

An example of creating an index with new attributes:

-- INLINE_SIZE: how many bytes of the indexed values to inline into index tree pages.
-- PARALLEL: number of threads used to build the index.
CREATE INDEX fast_city_idx ON sales (country, city) INLINE_SIZE 60 PARALLEL 8;
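The new DDL features can be exercised from any SQL entry point. As an illustrative sketch (table and column names are made up), here is how DEFAULT values and DROP COLUMN could be tried from the Java SQL API:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DdlExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Any cache can serve as the entry point for DDL statements.
            IgniteCache<Object, Object> cache = ignite.getOrCreateCache("ddl");

            // Column-level DEFAULT values in CREATE TABLE (new in 2.4).
            cache.query(new SqlFieldsQuery(
                "CREATE TABLE city (id LONG PRIMARY KEY, name VARCHAR, population INT DEFAULT 0)")).getAll();

            // Dropping a column (new in 2.4).
            cache.query(new SqlFieldsQuery("ALTER TABLE city DROP COLUMN population")).getAll();
        }
    }
}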

Neural Networks and Other Machine Learning Improvements


Version 2.4 brings neural networks to Apache Ignite.

Their key advantage is the high performance of model training and execution. Thanks to distributed training of neural networks and collocation of the computation with the data on cluster nodes, there is no need for ETL or for lengthy transfers of data to external systems that clog the network.

// Number of samples in the training set.
int samplesCnt = 100000;

// Generate points of the sin^2 function on the interval [0; pi/2].
IgniteSupplier<Double> pointsGen = () -> (Math.random() + 1) / 2 * (Math.PI / 2);
IgniteDoubleFunction<Double> f = x -> Math.sin(x) * Math.sin(x);

IgniteCache<Integer, LabeledVector<Vector, Vector>> cache = LabeledVectorsCache.createNew(ignite);
String cacheName = cache.getName();

// Fill the cache using IgniteDataStreamer.
try (IgniteDataStreamer<Integer, LabeledVector<Vector, Vector>> streamer = ignite.dataStreamer(cacheName)) {
    streamer.perNodeBufferSize(10000);

    for (int i = 0; i < samplesCnt; i++) {
        double x = pointsGen.get();
        double y = f.apply(x);

        streamer.addData(i, new LabeledVector<>(
            new DenseLocalOnHeapVector(new double[] {x}),
            new DenseLocalOnHeapVector(new double[] {y})));
    }
}

// Define the trainer.
MLPGroupUpdateTrainer<RPropParameterUpdate> trainer = MLPGroupUpdateTrainer.getDefault(ignite).
    withSyncPeriod(3).
    withTolerance(0.0001).
    withMaxGlobalSteps(100).
    withUpdateStrategy(UpdateStrategies.RProp());

// Define the network architecture.
MLPArchitecture conf = new MLPArchitecture(1).
    withAddedLayer(10, true, Activators.SIGMOID).
    withAddedLayer(1, true, Activators.SIGMOID);

MLPGroupUpdateTrainerCacheInput trainerInput = new MLPGroupUpdateTrainerCacheInput(
    conf, new RandomInitializer(new Random()), 6, cache, 1000);

// Train the model.
MultilayerPerceptron mlp = trainer.train(trainerInput);

// Build a test set.
int testCnt = 1000;

Matrix test = new DenseLocalOnHeapMatrix(1, testCnt);

for (int i = 0; i < testCnt; i++)
    test.setColumn(i, new double[] {pointsGen.get()});

Vector predicted = mlp.apply(test).getRow(0);
Vector actual = test.copy().map(f).getRow(0);

// Compare the predicted and actual values.
Tracer.showAscii(predicted);
Tracer.showAscii(actual);

System.out.println("MSE: " + (predicted.minus(actual).kNorm(2) / predicted.size()));

Other


In addition to the changes listed above, the release includes a number of other improvements.

Source: https://habr.com/ru/post/351098/

