
Gobetween Exec discovery + Elasticsearch. L4 balancing with Data Node Discovery

Why is all this necessary?


Everyone who has run an Elasticsearch cluster under heavy load (especially for logging, or as a primary database) has faced problems with consistency and scalability. When the load on Elasticsearch needs to be parallelized, static solutions of the NGINX + Elasticsearch kind are usually applied. This does parallelize the load, but is not very flexible: nodes can drop out of the cluster, and a simple healthcheck will report that everything is fine while in fact the node is overloaded or already excluded from the cluster. In any case, it would be good to have first-hand data on the state of the cluster instead of settling for simple checks.
So let's get down to building the balancing.


How we are going to do it


In this case we will use the CAT nodes API, which is part of the very powerful CAT API, a human-readable diagnostics tool for the Elasticsearch cluster.
We will use only Gobetween and the built-in Elasticsearch mechanisms to balance write/read load across the CRUD (data) nodes, with an arbitrary number and state of nodes in the cluster.
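For reference, a raw CAT nodes call looks like this (localhost is an assumption here, and the exact columns vary between Elasticsearch versions):

    # ?v adds a header row; specific columns can be requested with ?h=...
    curl -sS 'http://localhost:9200/_cat/nodes?v'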




Prerequisites


We will need:


Elasticsearch client node:


It is needed specifically for gobetween. In a multi-master configuration it will route requests to the correct master node (the active master), and will thus act as our router inside the cluster, so that our data node discovery works correctly.


In elasticsearch.yml (the standard Elasticsearch config file) on the client node we write:


    node.master: false
    node.data: false

The rest of the settings are identical to those of the other nodes in your cluster.
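To check that the client node has joined the cluster as a coordinating-only node, a quick query like the following can help (the r/m column aliases and their exact values are version-dependent, so treat this as a sanity check rather than part of the setup):

    # r = node.role, m = master; a client node should carry neither the data nor the master role
    curl -sS 'http://IP_OF_YOUR_CLIENT_NODE:9200/_cat/nodes?v&h=ip,r,m'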


Script for discovery:


Now we will create a script that will request the API and return the list of nodes to us.
Let's call it discovery_elasticsearch.sh:


    #!/bin/bash
    # Query the CAT nodes API through the client node, drop the header row,
    # strip spaces, and append the Elasticsearch port to each IP.
    curl -sS -XGET 'http://IP_OF_YOUR_CLIENT_NODE:9200/_cat/nodes?v&h=ip,r=d' \
        | sed '1d' | tr -d ' ' | sed 's/$/:9200/'

The output of the script will look something like this:


    10.0.0.51:9200
    10.0.0.55:9200
    10.0.0.53:9200
    10.0.0.52:9200
    10.0.0.54:9200
    ...

In this case the script does not return a weight for each node, so the balancer automatically assigns every node the same weight: 1.


Now everything is ready, and you can start setting up the balancer itself.


Setting up the balancer


After installation, it is time to set up balancing using exec discovery and the weight balancing algorithm (with all weights equal, it behaves like round robin).
This example is deliberately simple and serves to show what this kind of balancing can do. You can extend the script to generate weights for each node dynamically, based on its load (CPU, IO, etc.); see the sketch below.
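As an illustration of that idea (this sketch is not from the original setup), weights could be derived from the heap pressure reported by the same CAT API. The heap.percent column and the weight=N backend line syntax are assumptions to verify against your Elasticsearch and gobetween versions:

    #!/bin/bash
    # Hypothetical weighted variant of discovery_elasticsearch.sh:
    # nodes under higher heap pressure get a proportionally lower weight.
    curl -sS 'http://IP_OF_YOUR_CLIENT_NODE:9200/_cat/nodes?h=ip,heap.percent' \
        | awk '{ w = 100 - $2; if (w < 1) w = 1; printf "%s:9200 weight=%d\n", $1, w }'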


The configuration of our balancer will look like:


    [logging]
    level = "warn"     # "debug" | "info" | "warn" | "error"
    output = "stdout"  # "stdout" | "stderr" | "/path/to/gobetween.log"

    [defaults]
    max_connections = 0
    client_idle_timeout = "0"
    backend_idle_timeout = "0"
    backend_connection_timeout = "0"

    [servers.sample3]
    bind = "100.100.1.5:9200"
    protocol = "tcp"
    balance = "weight"

    [servers.sample3.discovery]
    kind = "exec"
    exec_command = ["/etc/gobetween/discovery_elasticsearch.sh"]
    interval = "1m"
    timeout = "10s"

    [servers.sample3.healthcheck]
    kind = "ping"
    interval = "20s"
    timeout = "2s"
    fails = 3
    passes = 3
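Once the balancer is running with this configuration, a quick way to see traffic being spread is to hit the bind address repeatedly: the Elasticsearch root endpoint reports the name of the node that served the request, so it should change between calls (address taken from the config above):

    # Run a few times; the "name" field in the response should rotate across data nodes.
    curl -sS 'http://100.100.1.5:9200/'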

So, we now have a skeleton solution for building flexible balancing in an Elasticsearch cluster.
In "combat" conditions we use a more complex configuration, with weights determined dynamically.
In our production we also use a more elaborate healthcheck script, tied to the monitoring API and to Elasticsearch itself.
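For illustration only (our production script is not published here), a minimal healthcheck in the same spirit could ask the client node whether a backend is still listed in the cluster. This assumes gobetween's exec healthcheck kind, which passes the backend host and port as arguments to the script and compares its stdout against configured positive/negative values; check the gobetween docs for the exact contract in your version:

    #!/bin/bash
    # Hypothetical healthcheck_elasticsearch.sh <host> <port>
    # Prints "1" if the node still appears in the cluster, "0" otherwise.
    HOST="$1"
    if curl -sS 'http://IP_OF_YOUR_CLIENT_NODE:9200/_cat/nodes?h=ip' \
            | tr -d ' ' | grep -qx "$HOST"; then
        echo "1"
    else
        echo "0"
    fi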



Upcoming articles: JSON discovery, Windows Docker service discovery and balancing, Windows Swarm service discovery and balancing, and balancing mixed (Windows + Linux) Docker environments.



Source: https://habr.com/ru/post/304096/

