Distributed data structures [part 1, overview]

Long ago, when ~~the trees~~ the computers were large and processors were single-core, every application ran in a single thread and had no synchronization problems.

Modern applications strive to use all available resources, in particular, all available CPUs.

Unfortunately, standard data structures cannot be used safely for multi-threaded processing, so Java 5 introduced thread-safe data structures, that is, structures that work correctly when used from several threads simultaneously. They are located in the java.util.concurrent package.

About Vector ...

In fact, thread-safe but inefficient data structures, such as Vector and Hashtable, appeared back in Java 1.0.
They are no longer recommended for use.

However, in spite of all the technological power incorporated in the java.util.concurrent package, information processing by thread-safe collections is possible only within one computer, and this raises the problem of scalability.

But what if you need to process information about 100 million customers in real time,
when the dataset takes up 100 TB and you have to perform more than 100 thousand operations every second?
That is hardly possible even on the most powerful modern hardware, and if it is possible, just imagine what it would cost!

It is much cheaper to achieve the same computing power by combining many ordinary computers into a cluster.

That leaves only the question of how the computers interact. Ideally, this is done with familiar tools whose API resembles the thread-safe collections from the java.util.concurrent package and which give the same guarantees, not on a single computer but across the entire cluster.

Such capabilities and guarantees can be provided by distributed data structures.

Let us look at some distributed data structures that make it fairly easy to write a distributed multi-threaded algorithm.

Disclaimer

The implementations of distributed data structures used in the following examples are part of the Apache Ignite distributed cache functionality.

AtomicReference and AtomicLong

IgniteAtomicReference provides compare-and-set semantics.

Suppose there are two computers connected by a common network.

Run Apache Ignite on both (after adding the Ignite libraries to the project):

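    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteAtomicReference;
    import org.apache.ignite.Ignition;

    // Obtain the Ignite node (this call assumes a node has already been started in this JVM).
    // A node discovers the other nodes and either forms a cluster with them
    // or joins an existing one.
    Ignite ignite = Ignition.ignite();

    // Create, or get if it was created earlier, an IgniteAtomicReference
    // with the initial value "someVal".
    IgniteAtomicReference<String> ref = ignite.atomicReference("refName", "someVal", true);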

On both computers, try to change the stored value:

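    // Try to change the stored value.
    boolean res = ref.compareAndSet("someVal", "someNewVal");

    // On the computer whose call reaches Ignite first, res is true.
    // On the other computer res is false, because the stored value
    // is no longer "someVal".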

Restore the original value:

    // Change the value back.
    ref.compareAndSet("someNewVal", "someVal");
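
The "only one caller wins" behaviour described above can be tried without a cluster. The following is a minimal single-JVM sketch that uses java.util.concurrent.atomic.AtomicReference as a stand-in for IgniteAtomicReference, with threads playing the role of computers. The class name LocalCasRace and the method race are made up for this illustration, and the sketch says nothing about how Ignite implements the operation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: a local analogue, not Ignite code.
class LocalCasRace {
    /** Lets `contenders` threads attempt the same compare-and-set; returns how many succeeded. */
    static int race(int contenders) {
        AtomicReference<String> ref = new AtomicReference<>("someVal");
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[contenders];

        for (int i = 0; i < contenders; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // line all contenders up before the race
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Succeeds only while the stored value is still "someVal".
                if (ref.compareAndSet("someVal", "someNewVal")) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }

        start.countDown(); // release all contenders at once
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("Successful compareAndSet calls: " + race(2)); // prints 1
    }
}
```

Note that the local AtomicReference compares by reference identity; the sketch works because both string literals are interned.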

IgniteAtomicLong extends the semantics of IgniteAtomicReference by adding atomic increment / decrement operations:

    // Create, or get if it was created earlier, an IgniteAtomicLong.
    final IgniteAtomicLong atomicLong = ignite.atomicLong("atomicName", 0, true);

    // Increment the value and print the result.
    System.out.println("Incremented value: " + atomicLong.incrementAndGet());
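
To see why an atomic increment matters, here is a small local sketch with java.util.concurrent.atomic.AtomicLong, again only as a single-JVM analogue of IgniteAtomicLong. Several threads increment the same counter and no update is lost. The names LocalAtomicLongDemo and countConcurrently are invented for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class: a local analogue, not Ignite code.
class LocalAtomicLongDemo {
    /** Increments one shared counter from several threads and returns the final value. */
    static long countConcurrently(int threads, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong(0);
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }

        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("Final value: " + countConcurrently(4, 10_000)); // prints 40000
    }
}
```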

Detailed documentation: https://apacheignite.readme.io/docs/atomic-types

Github examples

AtomicSequence

IgniteAtomicSequence lets you obtain a unique identifier, and uniqueness is guaranteed across the entire cluster.

IgniteAtomicSequence is faster than IgniteAtomicLong because, instead of synchronizing globally each time an identifier is requested, it reserves a whole range of values at once and then hands out identifiers from that range.

    // Create, or get if it was created earlier, an IgniteAtomicSequence.
    final IgniteAtomicSequence seq = ignite.atomicSequence("seqName", 0, true);

    // Get 20 unique identifiers.
    for (int i = 0; i < 20; i++) {
        long currentValue = seq.get();
        long newValue = seq.incrementAndGet();

        ...
    }
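
The range-reservation idea can be sketched in plain Java. This is only a model under stated assumptions, not Ignite's implementation: GlobalCounter stands in for the cluster-wide counter (each reserve call represents one expensive global synchronization), LocalSequence stands in for the per-node sequence, and all names and the batch size are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of range reservation; not Ignite code.
class RangeSequenceSketch {
    /** Stand-in for the cluster-wide counter. */
    static final class GlobalCounter {
        private final AtomicLong next = new AtomicLong();
        private final AtomicLong reservations = new AtomicLong();

        /** Reserves the range [start, start + size) and returns start. */
        long reserve(int size) {
            reservations.incrementAndGet();
            return next.getAndAdd(size);
        }

        long reservationCount() {
            return reservations.get();
        }
    }

    /** Stand-in for one node: hands out ids from a locally cached range. */
    static final class LocalSequence {
        private final GlobalCounter global;
        private final int batchSize;
        private long nextId;     // next id to hand out
        private long upperBound; // exclusive end of the reserved range

        LocalSequence(GlobalCounter global, int batchSize) {
            this.global = global;
            this.batchSize = batchSize;
        }

        synchronized long nextId() {
            if (nextId == upperBound) { // local range exhausted (or never reserved)
                nextId = global.reserve(batchSize);
                upperBound = nextId + batchSize;
            }
            return nextId++;
        }
    }

    /** Draws ids from several simulated nodes in turn and collects them. */
    static Set<Long> drawIds(GlobalCounter global, int nodes, int idsPerNode, int batchSize) {
        LocalSequence[] sequences = new LocalSequence[nodes];
        for (int n = 0; n < nodes; n++) {
            sequences[n] = new LocalSequence(global, batchSize);
        }
        Set<Long> ids = new HashSet<>();
        for (int i = 0; i < idsPerNode; i++) {
            for (LocalSequence seq : sequences) {
                ids.add(seq.nextId());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        GlobalCounter global = new GlobalCounter();
        Set<Long> ids = drawIds(global, 2, 20, 10);
        System.out.println("Unique ids: " + ids.size());                        // prints 40
        System.out.println("Global reservations: " + global.reservationCount()); // prints 4
    }
}
```

In this model 40 identifiers cost only 4 global operations instead of 40. The price is that identifiers are unique but not ordered across nodes, since each node works through its own range.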

Detailed documentation: https://apacheignite.readme.io/docs/id-generator
Sample on github - IgniteAtomicSequenceExample

CountDownLatch

IgniteCountDownLatch allows you to synchronize threads on different computers within the same cluster.

Run the following code on 10 computers of the same cluster:

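    // Create, or get if it was created earlier, an IgniteCountDownLatch
    // with an initial count of 10.
    IgniteCountDownLatch latch = ignite.countDownLatch("latchName", 10, false, true);

    // Decrement the counter.
    latch.countDown();

    // Wait until countDown() has been called 10 times.
    latch.await();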

As a result, every latch.await() call is guaranteed to return only after all ten latch.countDown() calls have been executed.
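
The same guarantee can be observed locally with java.util.concurrent.CountDownLatch, with ten threads taking the place of ten computers. This is a single-JVM sketch only; LocalLatchDemo and run are hypothetical names. Each thread records the latch count it sees right after await() returns, and that count is always zero.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class: a local analogue, not Ignite code.
class LocalLatchDemo {
    /** Each party counts down, awaits, then records the remaining count it observes. */
    static long[] run(int parties) {
        CountDownLatch latch = new CountDownLatch(parties);
        long[] remainingAfterAwait = new long[parties];
        Thread[] threads = new Thread[parties];

        for (int i = 0; i < parties; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                latch.countDown();
                try {
                    latch.await(); // returns only once the count has reached zero
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    remainingAfterAwait[id] = -1; // marks an interrupted party
                    return;
                }
                remainingAfterAwait[id] = latch.getCount();
            });
            threads[i].start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return remainingAfterAwait;
    }

    public static void main(String[] args) {
        // Prints ten zeros: nobody passed await() before the last countDown().
        System.out.println(Arrays.toString(run(10)));
    }
}
```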

Detailed documentation: https://apacheignite.readme.io/docs/countdownlatch
Sample on github - IgniteCountDownLatchExample

Semaphore

IgniteSemaphore allows you to limit the number of simultaneous actions within a single cluster.

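    // Create, or get if it was created earlier, an IgniteSemaphore
    // with 20 permits.
    IgniteSemaphore semaphore = ignite.semaphore("semName", 20, true, true);

    // Acquire a permit.
    semaphore.acquire();

    try {
        // Code whose concurrency must be limited.
    }
    finally {
        // Release the permit.
        semaphore.release();
    }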

It is guaranteed that no more than 20 threads across the whole cluster execute the code inside the try block at the same time.
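
A local sketch with java.util.concurrent.Semaphore shows how such a limit can be checked. It is a single-JVM analogue, not Ignite code; the names LocalSemaphoreDemo and maxConcurrent, the permit count of 3, and the 5 ms pause are arbitrary choices for the illustration. The method reports the highest number of threads that were ever inside the guarded section together.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: a local analogue, not Ignite code.
class LocalSemaphoreDemo {
    /** Runs `workers` threads through a guarded section; returns the peak concurrency observed. */
    static int maxConcurrent(int permits, int workers) {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();  // threads currently in the section
        AtomicInteger maxSeen = new AtomicInteger(); // highest value of `inside` so far
        Thread[] threads = new Thread[workers];

        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    semaphore.acquire();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // no permit was taken, so nothing to release
                }
                try {
                    int now = inside.incrementAndGet();
                    maxSeen.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // simulate some work inside the section
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inside.decrementAndGet();
                    semaphore.release();
                }
            });
            threads[i].start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        // The printed value never exceeds the number of permits (3 here).
        System.out.println("Peak concurrency: " + maxConcurrent(3, 12));
    }
}
```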

Detailed documentation: https://apacheignite.readme.io/docs/distributed-semaphore
Sample on github - IgniteSemaphoreExample

BlockingQueue

IgniteQueue provides the same features as BlockingQueue, but across the whole cluster.

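    // Queue settings; the defaults are used here.
    CollectionConfiguration colCfg = new CollectionConfiguration();

    // Create, or get if it was created earlier, an IgniteQueue.
    // A capacity of 0 means the queue is unbounded.
    IgniteQueue<String> queue = ignite.queue("queueName", 0, colCfg);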

Let's try to take an item from the queue:

    // Take an element from the queue.
    queue.take();

Execution blocks at queue.take() until an element is added to the queue somewhere in the same cluster:

    // Add an element to the queue.
    queue.put("data");
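
The blocking hand-off can be reproduced locally with java.util.concurrent.LinkedBlockingQueue, where a consumer thread and a producer thread stand in for two computers. This is a single-JVM sketch; LocalQueueHandOff and handOff are hypothetical names, and the 50 ms delay only makes it likely that the consumer is already waiting when the element arrives.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: a local analogue, not Ignite code.
class LocalQueueHandOff {
    /** Starts a consumer blocked in take(), then puts `payload`; returns what the consumer received. */
    static String handOff(String payload) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicReference<String> received = new AtomicReference<>();

        Thread consumer = new Thread(() -> {
            try {
                received.set(queue.take()); // blocks until an element is available
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            Thread.sleep(50);   // give the consumer time to block in take()
            queue.put(payload); // wakes the consumer up
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return received.get();
    }

    public static void main(String[] args) {
        System.out.println("Consumer received: " + handOff("data")); // prints data
    }
}
```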

Detailed documentation: https://apacheignite.readme.io/docs/queue-and-set
Example on github - IgniteQueueExample

Instead of a conclusion

Since this article turned out to be a pure overview, and many readers are surely wondering how all of this works under the hood, the next article will look at the implementation details of each of the distributed data structures described here.

Source: https://habr.com/ru/post/328086/
