What Apache Ignite / GridGain is for, with .NET and C# examples

The names GridGain and Apache Ignite have been coming up on the Internet a lot lately. However, judging by the comments (for example, here), few people understand what kind of product this is and how it is meant to be used.

In this article I will try to explain in plain language, with code examples, what Apache Ignite can do.


Ignite vs GridGain

A bit of background: GridGain released the first version of its product of the same name in 2007. In 2014, GridGain donated most of the code to the Apache Software Foundation, which gave rise to the Apache Ignite project. GridGain provides paid support and additional functionality in the form of a plugin.

It is important to understand: Apache Ignite is not owned by GridGain and is free software under the Apache 2.0 license.

The difference from "normal" open-source projects (hosted on GitHub, for example) is that nobody can "change their mind": close the source, change the license, and so on.

Ignite is owned by the Apache Software Foundation.

Ignite.NET

Ignite is written in Java and also provides APIs for .NET and C++. This article covers the .NET API, which offers roughly 90% of the Java API's functionality, plus a few extras of its own (LINQ).

What is it and why?

The simplest and most comprehensive answer: it is a database. At its core is a key-value store; roughly speaking, a ConcurrentDictionary whose data is spread across several machines.

It supports distributed transactions, SQL with indexes, Lucene full-text search, map-reduce calculations, and much more. But first things first.

How to start?

In one line! This is the easiest database to install and use that I know of.

The bad news: you need Java Runtime 7+ installed. The good news: from this point on, you can forget about Java.

Let's create a simple Console Application in Visual Studio, install the Apache.Ignite NuGet package, and add one line, Ignition.Start();, to the Main method. Done, you can run it. After a couple of seconds, "Topology snapshot [ver = 1, servers = 1" will appear in the console. Run the program a second time and you will see "Topology snapshot [ver = 2, servers = 2" in both consoles. A distributed database is now running on two nodes.
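For reference, the whole program described above might look like the following minimal sketch. It assumes the Apache.Ignite NuGet package is installed and a Java runtime is available; the node-count printout and the Console.ReadLine() call are my additions to keep the node alive and make the topology visible.

```csharp
using System;
using Apache.Ignite.Core;

class Program
{
    static void Main()
    {
        // Starts a node with default settings; disposing the instance stops the node.
        using (IIgnite ignite = Ignition.Start())
        {
            // Number of nodes this node currently sees in the cluster.
            Console.WriteLine("Nodes in cluster: " + ignite.GetCluster().GetNodes().Count);

            // Keep the node running until Enter is pressed (illustrative addition).
            Console.ReadLine();
        }
    }
}
```

Starting a second copy of the program while the first is still waiting should make both consoles report the new topology.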

Two Ignite nodes

A note on terminology

A node is the result of an Ignition.Start() call. You can run multiple nodes on the same machine or even in the same process.
A cluster is a set of nodes connected to each other. Nodes see or do not see each other depending on the configuration, so it is possible to run several separate clusters even within a single process.
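As a rough illustration of several nodes in one process, here is a sketch that starts two nodes side by side. It assumes Ignite 2.x, where each node in a process is distinguished by the IgniteInstanceName property (1.x releases call it GridName); the names "node-1" and "node-2" are made up for the example.

```csharp
using System;
using Apache.Ignite.Core;

class Program
{
    static void Main()
    {
        // Nodes sharing a process need distinct instance names (hypothetical names here).
        var cfg1 = new IgniteConfiguration { IgniteInstanceName = "node-1" };
        var cfg2 = new IgniteConfiguration { IgniteInstanceName = "node-2" };

        using (IIgnite first = Ignition.Start(cfg1))
        using (IIgnite second = Ignition.Start(cfg2))
        {
            // With default discovery settings both nodes are expected to join the same cluster.
            Console.WriteLine("Nodes seen by the first node: " + first.GetCluster().GetNodes().Count);
        }
    }
}
```

Making the two nodes form separate clusters instead would require giving them different discovery settings, which is not shown here.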

Work with data

Well, we have started the database; now it would be nice to create tables and fill them with data.

A table in Apache Ignite is a cache, ICache<K, V>. You work directly with user-defined data types, so this is a database and an ORM in one package (although it is also possible to work with data directly, without mapping to objects).

    using System;
    using Apache.Ignite.Core;
    using Apache.Ignite.Core.Cache;

    class Car
    {
        public string Model { get; set; }
        public int Power { get; set; }

        public override string ToString() => $"Model: {Model}, Power: {Power} hp";
    }

    static void Main()
    {
        using (var ignite = Ignition.Start())
        {
            // Get the "cars" cache, creating it if it does not exist yet.
            ICache<int, Car> cache = ignite.GetOrCreateCache<int, Car>("cars");

            cache[1] = new Car {Model = "Pagani Zonda R", Power = 740};

            foreach (ICacheEntry<int, Car> entry in cache)
                Console.WriteLine(entry);
        }
    }

As you can see, basic work with the cache is no different from the familiar Dictionary<,>. The data is immediately available on all nodes of the cluster.

This part of the functionality can be compared with Redis .

SQL

Data can be added and queried via SQL. To do this, you must explicitly mark which fields of the object take part in queries (the [QuerySqlField] attribute) and specify the key and value types in the cache configuration:

    class Car
    {
        [QuerySqlField] public string Model { get; set; }
        [QuerySqlField] public int Power { get; set; }
    }
    ...
    // Configure the cache for SQL:
    var queryEntity = new QueryEntity(typeof(int), typeof(Car));
    var cacheConfig = new CacheConfiguration("cars", queryEntity);
    ICache<int, Car> cache = ignite.GetOrCreateCache<int, Car>(cacheConfig);

    // Insert data (_key is a predefined field):
    var insertQuery = new SqlFieldsQuery("INSERT INTO Car (_key, Model, Power) VALUES " +
                                         "(1, 'Ariel Atom', 350), " +
                                         "(2, 'Reliant Robin', 39)");
    cache.QueryFields(insertQuery).GetAll();

    // Select data:
    var selQuery = new SqlQuery(typeof(Car), "SELECT * FROM Car ORDER BY Power");
    foreach (ICacheEntry<int, Car> entry in cache.Query(selQuery))
        Console.WriteLine(entry);

These two approaches, key-value and SQL, can be mixed to taste. It is easier and faster to get or insert a single value through cache[key], and to conditionally update a set of values via SQL.
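To make the mixing concrete, here is a sketch that inserts entries by key, updates several of them with one SQL statement, and reads a single entry back by key. It reuses the cache setup and the QueryFields call from the example above; the UPDATE statement and the threshold of 100 are illustrative choices of mine.

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;
using Apache.Ignite.Core.Cache.Configuration;
using Apache.Ignite.Core.Cache.Query;

class Car
{
    [QuerySqlField] public string Model { get; set; }
    [QuerySqlField] public int Power { get; set; }
}

class Program
{
    static void Main()
    {
        using (IIgnite ignite = Ignition.Start())
        {
            var cacheConfig = new CacheConfiguration("cars", new QueryEntity(typeof(int), typeof(Car)));
            ICache<int, Car> cache = ignite.GetOrCreateCache<int, Car>(cacheConfig);

            // Key-value API: insert single entries by key.
            cache[1] = new Car { Model = "Ariel Atom", Power = 350 };
            cache[2] = new Car { Model = "Reliant Robin", Power = 39 };

            // SQL: conditionally update every matching entry in one statement.
            // The '?' placeholder is filled from the argument list (100 here).
            var update = new SqlFieldsQuery("UPDATE Car SET Power = Power + 10 WHERE Power < ?", 100);
            cache.QueryFields(update).GetAll();

            // Key-value API again: read one entry back by key.
            Console.WriteLine(cache[2].Power);
        }
    }
}
```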

LINQ

The query from the example above can be rewritten in LINQ (you need the Apache.Ignite.Linq NuGet package):

    var linqSelect = cache.AsCacheQueryable().OrderBy(c => c.Value.Power);

    foreach (ICacheEntry<int, Car> entry in linqSelect)
        Console.WriteLine(entry);

This query will be translated to SQL, which can be seen by casting linqSelect to the ICacheQueryable type.

Pay attention to AsCacheQueryable(): this is important! If we forget this call, the distributed SQL query turns into LINQ to Objects, which loads all the data onto the local node, and that is usually not what we want.
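The difference between the two forms can be sketched as follows. The first query goes through AsCacheQueryable() and can be cast to ICacheQueryable to inspect the generated SQL; the second calls Where directly on the cache and therefore runs as ordinary LINQ to Objects. The Power > 100 filter is an arbitrary example condition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;
using Apache.Ignite.Core.Cache.Configuration;
using Apache.Ignite.Linq;

class Car
{
    [QuerySqlField] public string Model { get; set; }
    [QuerySqlField] public int Power { get; set; }
}

class Program
{
    static void Main()
    {
        using (IIgnite ignite = Ignition.Start())
        {
            var cacheConfig = new CacheConfiguration("cars", new QueryEntity(typeof(int), typeof(Car)));
            ICache<int, Car> cache = ignite.GetOrCreateCache<int, Car>(cacheConfig);
            cache[1] = new Car { Model = "Ariel Atom", Power = 350 };
            cache[2] = new Car { Model = "Reliant Robin", Power = 39 };

            // Distributed: the expression tree is translated to SQL and executed on the cluster.
            IQueryable<ICacheEntry<int, Car>> distributed =
                cache.AsCacheQueryable().Where(e => e.Value.Power > 100);

            // Inspect the SQL that the LINQ provider generated.
            Console.WriteLine(((ICacheQueryable) distributed).GetFieldsQuery().Sql);

            // Local: without AsCacheQueryable() this is Enumerable.Where over the whole cache,
            // so every entry is pulled to this node and filtered here.
            IEnumerable<ICacheEntry<int, Car>> local = cache.Where(e => e.Value.Power > 100);

            Console.WriteLine(distributed.Count() + " / " + local.Count());
        }
    }
}
```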

How does it work?

By default, caches in Ignite work in Partitioned mode, in which data is evenly distributed between nodes. The SQL query is sent to each node and executed there; the results are then aggregated on the calling node. Each node processes only its own part of the data, in parallel with the others. By adding more nodes to the cluster, we can increase both performance and the amount of stored data.

To ensure fault tolerance, you can specify one or more backup copies, that is, the number of additional nodes that store a copy of each data item.

In some cases it makes sense to use Replicated mode, where each node stores a complete copy of all data.
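The idea can be modeled in a few lines of plain C#, without Ignite at all. This is a toy model only: the "nodes" are local dictionaries, keys are assigned by a simple modulo (Ignite's real partition mapping is more involved), and all names are invented for the illustration. Each "node" filters its own share of the data in parallel, and the caller merges and sorts the partial results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ScatterGatherSketch
{
    // Spread key-value pairs over nodeCount "nodes" (toy rule: key modulo node count).
    public static Dictionary<int, int>[] Distribute(IEnumerable<KeyValuePair<int, int>> data, int nodeCount)
    {
        var nodes = Enumerable.Range(0, nodeCount).Select(_ => new Dictionary<int, int>()).ToArray();

        foreach (var pair in data)
            nodes[(pair.Key & int.MaxValue) % nodeCount][pair.Key] = pair.Value;

        return nodes;
    }

    // "SELECT value WHERE value > minValue ORDER BY value" executed scatter-gather style.
    public static List<int> Query(Dictionary<int, int>[] nodes, int minValue)
    {
        // Scatter: every node filters only its own part of the data, in parallel.
        List<List<int>> partials = nodes
            .AsParallel()
            .Select(node => node.Values.Where(v => v > minValue).ToList())
            .ToList();

        // Gather: the calling side merges the partial results and applies the final ordering.
        return partials.SelectMany(p => p).OrderBy(v => v).ToList();
    }

    static void Main()
    {
        var data = Enumerable.Range(1, 6).Select(k => new KeyValuePair<int, int>(k, k * 100));
        var nodes = Distribute(data, 3);

        Console.WriteLine(string.Join(", ", Query(nodes, 250)));
    }
}
```

With more "nodes", each one holds and scans a smaller share of the data, which is the scaling effect described above.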
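Both settings live in the cache configuration. The following sketch shows one partitioned cache with a single backup copy and one replicated cache; the cache name "countries" and the choice of one backup are assumptions made for the example.

```csharp
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache.Configuration;

class Program
{
    static void Main()
    {
        using (IIgnite ignite = Ignition.Start())
        {
            // Partitioned: data is split between nodes, and each entry has one extra copy.
            var carsConfig = new CacheConfiguration("cars")
            {
                CacheMode = CacheMode.Partitioned,
                Backups = 1
            };

            // Replicated: every node keeps a full copy of the data (hypothetical cache name).
            var countriesConfig = new CacheConfiguration("countries")
            {
                CacheMode = CacheMode.Replicated
            };

            ignite.GetOrCreateCache<int, string>(carsConfig);
            ignite.GetOrCreateCache<int, string>(countriesConfig);
        }
    }
}
```

Replicated mode tends to suit small, rarely changing reference data, since every update has to reach every node.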

Map-Reduce, Locks, Atomics ...

Suppose we need to translate a huge text, or recognize a large number of images. Such tasks can be easily parallelized between several servers:

    class Translator : IComputeFunc<string, string>
    {
        public string Invoke(string text) => TranslateText(text);
    }
    ...
    IEnumerable<string> pages = GetTextPages();
    ICollection<string> translated = ignite.GetCompute().Apply(new Translator(), pages);

You can synchronize different processes between nodes using distributed locks. The functionality is similar to lock {} / Monitor, the only difference being that the lock applies to the entire cluster:

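    // Explicit locks require a transactional cache.
    var cacheConfig = new CacheConfiguration("foo") { AtomicityMode = CacheAtomicityMode.Transactional };
    var cache = ignite.GetOrCreateCache<int, int>(cacheConfig);

    using (ICacheLock lck = cache.Lock(1))
    {
        lck.Enter();
        try
        {
            // critical section
        }
        finally
        {
            lck.Exit();
        }
    }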

Familiar with the Interlocked class? Ignite provides similar non-blocking functionality, except that the atomic operations span the entire cluster.

    var atomic = ignite.GetAtomicLong(name: "myVal", initialValue: 1, create: true);
    atomic.Increment();
    atomic.CompareExchange(10, 20);
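For comparison, here is the same sequence of operations written against the local Interlocked class. It is only an analogy that runs in a single process, assuming (as the naming suggests) that the cluster-wide CompareExchange takes the new value first and the comparand second, just like its local counterpart.

```csharp
using System;
using System.Threading;

static class InterlockedAnalogy
{
    public static long Run()
    {
        long value = 1;                        // same initial value as "myVal" above

        Interlocked.Increment(ref value);      // value becomes 2

        // Store 10 only if the current value equals 20; it is 2, so nothing changes.
        Interlocked.CompareExchange(ref value, 10, 20);

        return value;
    }

    static void Main()
    {
        Console.WriteLine(Run());              // prints 2
    }
}
```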

This group of features can be compared with Akka .

Conclusion

I work at GridGain, but I am writing this post as an Apache Ignite contributor. I think the product deserves attention, especially in the .NET world, where Big Data and distributed computing get little coverage. Few such projects properly support .NET, and even fewer are written in it.

Ignite.NET is really easy to try; it even runs in LINQPad (code samples for LINQPad are included in the NuGet package!). There are plenty of ways to use it. There is integration with ASP.NET (output cache, session state cache) and with Entity Framework (second-level cache). It can be used as a platform for (micro)services. In any project that needs more than one server, Ignite can make life easier in one way or another.

Yes, there are other projects that have one Ignite feature or another, but there is no other project where all of this is integrated into a single product.