
How to save on spot EC2 instances with Scylla

Spot instances can save you a lot of money. But what if you run stateful services, such as NoSQL databases? The main difficulty is that each node in the cluster must retain certain parameters: its IP address, its data, and other configuration. In this post, we will talk about Scylla, an open source NoSQL database, and how it can be run on EC2 spot instances without interruption, using Spotinst's predictive technology together with its stateful (state-preservation) functionality.




What is Scylla?


Scylla is an open source distributed NoSQL database. It was designed to be compatible with Apache Cassandra while providing much higher throughput and lower latency, and it supports the same protocols and file formats as Apache Cassandra. However, Scylla is written entirely in C++, not in Java like Apache Cassandra. In addition, Scylla is built on the Seastar framework, an asynchronous library that replaces execution threads, shared memory, memory-mapped files, and other classic Linux programming techniques. Scylla also has its own disk I/O scheduler, which helps improve performance.
Tests conducted both by ScyllaDB engineers and by third-party companies have demonstrated that Scylla achieves roughly 10 times the throughput of Apache Cassandra.



How Scylla replicates data between nodes


Scylla provides always-on availability. Automatic failover, along with replication across multiple nodes and data centers, provides fault tolerance.

Scylla, like Cassandra, uses the gossip protocol to exchange metadata, identify the nodes in the cluster, and determine whether they are alive. There is no single point of failure: since there is no central registry of node status, the nodes must exchange this information with each other.
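As a rough illustration of the idea (a toy sketch, not Scylla's actual implementation), gossip can be thought of as nodes merging versioned state maps, with the newer entry for each peer winning:

```python
def merge_gossip(local, remote):
    """Merge a peer's state map into ours; the higher version wins.
    Keys are node names, values are (version, status) pairs."""
    for node, (version, status) in remote.items():
        if node not in local or local[node][0] < version:
            local[node] = (version, status)
    return local

# Node A's view, and the state it receives from a peer:
a = {"node1": (3, "UP"), "node2": (1, "UP")}
b = {"node2": (2, "DOWN"), "node3": (1, "UP")}

merged = merge_gossip(dict(a), b)
# node2 is updated to the newer (version 2) entry; node3 is newly learned.
```

Repeated pairwise exchanges like this are what let every node converge on a shared view of cluster membership without any central registry.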


How to run Scylla on Spotinst


When creating a new Scylla cluster, you might hesitate to use spot instances because of their unstable behavior: a spot instance can be reclaimed with only two minutes' notice. That is why Elastigroup is the standard choice for such an environment.

Elastigroup delivers close to 100% availability on the spot market. Choosing the right bid for the right spot and analyzing pricing history in real time helps select the spot instances with the lowest price and the longest expected lifetime. Changes in the spot market are predicted up to 15 minutes ahead, which allows an instance to be replaced without interrupting the workload.

Now about preserving state. Elastigroup can persist data volumes. For every EBS volume attached to the instance, snapshots are taken continuously while it runs, and after a replacement the latest snapshot is used to restore the volume block by block.



In order for your machine to keep working after a failure, you need to take care of a few things:

  1. Private IP address. Make sure the replacement machine gets the same private IP address, so that the gossip protocol can keep communicating with it.
  2. Volume. The node must be attached to the same storage and have the same volume as before; otherwise the service will be unavailable.
  3. The config file. scylla.yaml is located by default at /etc/scylla/scylla.yaml. It must be edited so that each node knows its own configuration. The key parameters to configure are covered in the sections below.
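As a quick reference, the parameters in question look like this (the values here are placeholders; the actual per-node values are shown in the examples later in this post):

```yaml
# /etc/scylla/scylla.yaml - key parameters (placeholder values)
cluster_name: 'ScyllaDB_Cluster'   # must be identical on every node
seeds: "192.168.1.1,192.168.1.2"   # seed nodes used to bootstrap gossip
endpoint_snitch: Ec2Snitch         # maps EC2 regions/AZs to DCs/racks
listen_address: "192.168.1.1"      # this node's private IP
rpc_address: "192.168.1.1"         # address clients (CQL) connect to
```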


Selection of racks


To increase the availability of your data, it is recommended to distribute nodes across Availability Zones (AZs). You can do this by setting endpoint_snitch to Ec2Snitch in scylla.yaml and configuring cassandra-rackdc.properties accordingly.

Suppose you have a cluster in the us-east-1 region. If node 1 is in us-east-1a and node 2 is in us-east-1b, Scylla will treat them as being in two different racks within the same data center: node 1 in rack 1a and node 2 in rack 1b.
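This mapping can be sketched as follows (a simplified illustration; the real snitch reads the AZ from EC2 instance metadata rather than parsing a string you pass in):

```python
def az_to_dc_rack(az):
    """Split an AZ name the way Ec2Snitch does: the region part becomes
    the data center, the trailing "<number><letter>" becomes the rack."""
    dc, _, rack = az.rpartition("-")  # "us-east-1a" -> ("us-east", "1a")
    return dc, rack

print(az_to_dc_rack("us-east-1a"))  # ('us-east', '1a')
print(az_to_dc_rack("us-east-1b"))  # ('us-east', '1b')
```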

Now we will show how to deploy a cluster of six nodes across two data centers. Each data center consists of three nodes, two of which are seed nodes. The IP addresses look like this:

US-DC1

Node# Private IP
Node1 192.168.1.1 (seed)
Node2 192.168.1.2 (seed)
Node3 192.168.1.3

US-DC2

Node# Private IP
Node4 192.168.1.4 (seed)
Node5 192.168.1.5 (seed)
Node6 192.168.1.6

On each Scylla node, you need to edit the scylla.yaml file. Here is another example for one node in each data center:

Data center 1 (US-DC1), node 192.168.1.1:

cluster_name: 'ScyllaDB_Cluster'
seeds: "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"
endpoint_snitch: Ec2Snitch
rpc_address: "192.168.1.1"
listen_address: "192.168.1.1"

Data center 2 (US-DC2), node 192.168.1.4:

cluster_name: 'ScyllaDB_Cluster'
seeds: "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"
endpoint_snitch: Ec2Snitch
rpc_address: "192.168.1.4"
listen_address: "192.168.1.4"

On each Scylla node, you need to edit the cassandra-rackdc.properties file, specifying the appropriate information about the rack and data center:

Nodes 1-3

dc=us-east-1a
rack=RACK1

Nodes 4-6

dc=us-east-1b
rack=RACK2
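Since the per-node files differ only in a few values, generating them can be scripted. Below is a hypothetical helper (not an official Scylla tool) that renders the two files from the example above:

```python
# Renders the per-node config files shown in the example above: every node
# shares the cluster name and seeds list, but listens on its own private IP.
SEEDS = "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"

def render_scylla_yaml(node_ip):
    """Return the scylla.yaml fragment for one node."""
    return (
        "cluster_name: 'ScyllaDB_Cluster'\n"
        f'seeds: "{SEEDS}"\n'
        "endpoint_snitch: Ec2Snitch\n"
        f'rpc_address: "{node_ip}"\n'
        f'listen_address: "{node_ip}"\n'
    )

def render_rackdc_properties(dc, rack):
    """Return the cassandra-rackdc.properties contents for one node."""
    return f"dc={dc}\nrack={rack}\n"

cfg = render_scylla_yaml("192.168.1.4")
props = render_rackdc_properties("us-east-1b", "RACK2")
```

In practice you would write these strings out to /etc/scylla/scylla.yaml and the rackdc properties file as part of the instance's bootstrap script.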

Spotinst Console setup


When configuring the Elastigroup, it is important to enable persistence: this is needed to preserve the data and network configuration when an instance is replaced after a spot interruption. Open the Compute tab, go to the Stateful section, and check the options as shown in the screenshot below.



We also recommend running "nodetool drain" in the shutdown script, which flushes memtables to disk and stops the node from accepting new connections. See the shutdown script section for details.
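A minimal sketch of such a shutdown hook in Python (the drain command itself comes from the post; the wrapper, including the dry_run flag, is illustrative):

```python
import shlex
import subprocess

# "nodetool drain" flushes memtables and stops accepting new connections.
DRAIN_CMD = shlex.split("nodetool drain")

def shutdown(dry_run=True):
    """Drain the node before the spot instance is reclaimed.
    With dry_run=True it only reports what it would execute."""
    if dry_run:
        print("would run:", " ".join(DRAIN_CMD))
    else:
        subprocess.run(DRAIN_CMD, check=True)
    return DRAIN_CMD
```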

How does this work?


In the animation below, you can see a Scylla cluster of three instances. All nodes run on spot instances, with state preservation configured.

When one of the instances goes down, the state-preservation function creates a replacement instance with the same private IP and the same root/data volumes. And, as you can see below, the instance rejoins the cluster.



So, with Scylla and Spotinst you can maintain performance while reducing costs.

If you want to see and test the solution, you can contact us through the form on the website, in the comments to this post, by email at ru@globaldots.com, or by phone at +7-495-762-45-85.

Source: https://habr.com/ru/post/345480/

