
Dynamic Apache NiFi cluster creation

Apache NiFi is a convenient platform for processing a wide variety of data in real time, with data flows built visually. The purpose of this article is to describe the ways of creating an Apache NiFi cluster.

Fig. 1. Apache NiFi GUI.

Features:

→ Read more here

Apache NiFi cluster configuration


To run an Apache NiFi cluster, you can use either the embedded or an external Apache ZooKeeper; this is configured in conf/nifi.properties. We will use the embedded one.
Fig. 2. Apache NiFi cluster diagram.

To configure an Apache NiFi cluster, we need at least 3 nodes to provide a quorum. ZooKeeper is generally recommended to run on 3 or 5 nodes: fewer than 3 nodes give less tolerance to failures, while more than 5 nodes usually generate more network traffic than necessary. For all three instances, the general cluster properties can be left at their default values. Note, however, that if you change any of these parameters, they must be identical on every node of the future cluster.
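The quorum rule behind these numbers is simple arithmetic: an ensemble of n servers needs a strict majority of floor(n/2) + 1 servers to operate, so it survives floor((n - 1)/2) failures. The small Java sketch below (class and method names are illustrative) prints both values for n = 1..6.

```java
// Majority-quorum arithmetic for a ZooKeeper ensemble of n servers (illustrative sketch).
final class Quorum {
    // Smallest number of servers that forms a strict majority.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // How many servers may fail while a majority is still available.
    static int toleratedFailures(int n) {
        return (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("n=%d majority=%d tolerated failures=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

For n = 4 it reports the same single tolerated failure as for n = 3, which is why odd ensemble sizes such as 3 or 5 are preferred.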

For a minimal Apache NiFi cluster configuration, perform the following steps on each node of the future cluster:

  1. set the required parameters in nifi.properties
  2. list the cluster servers in zookeeper.properties
  3. set the ZooKeeper id of the local host
  4. specify the ZooKeeper connection string in state-management.xml

Let's go through each step in more detail.
1. Set the following in nifi.properties:
nifi.cluster.is.node=true
nifi.cluster.node.address=<local-ip>
nifi.cluster.node.protocol.port=3030
nifi.state.management.embedded.zookeeper.start=true
nifi.remote.input.host=<local-ip>
nifi.web.http.host=<local-ip>
nifi.zookeeper.connect.string=<connect-string>

<connect-string> is a comma-separated list of ZooKeeper servers in host:port form.
For example: nifi01:2181,nifi02:2181,nifi03:2181
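When provisioning several nodes, generating these entries helps avoid typos such as a mistyped port. The sketch below is a hypothetical helper, not part of NiFi: it builds the step 1 properties for one node, assuming the embedded ZooKeeper listens on its default client port 2181 on every host. The IP address in main is only an example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the cluster-related nifi.properties entries for one node.
final class NifiClusterProps {
    // Joins hosts into a ZooKeeper connect string, e.g. "nifi01:2181,nifi02:2181".
    static String connectString(List<String> zkHosts, int clientPort) {
        return zkHosts.stream()
                .map(h -> h + ":" + clientPort)
                .collect(Collectors.joining(","));
    }

    // Keeps insertion order so the output matches the order used in the article.
    static Map<String, String> forNode(String localIp, List<String> zkHosts) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("nifi.cluster.is.node", "true");
        p.put("nifi.cluster.node.address", localIp);
        p.put("nifi.cluster.node.protocol.port", "3030");
        p.put("nifi.state.management.embedded.zookeeper.start", "true");
        p.put("nifi.remote.input.host", localIp);
        p.put("nifi.web.http.host", localIp);
        p.put("nifi.zookeeper.connect.string", connectString(zkHosts, 2181));
        return p;
    }

    public static void main(String[] args) {
        forNode("192.168.1.11", List.of("nifi01", "nifi02", "nifi03"))
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```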

2. List the cluster servers in zookeeper.properties:
server.1=<nifi-node1-ip>:2888:3888
server.2=<nifi-node2-ip>:2888:3888
server.3=<nifi-node3-ip>:2888:3888
initLimit=5
syncLimit=2
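In each server.N line, the first port (2888) is used by followers to connect to the leader and the second (3888) for leader election; initLimit and syncLimit are measured in ticks of ZooKeeper's tickTime. Below is a minimal sketch of a hypothetical generator for these lines, assuming the nodes are numbered in list order starting at 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: renders the server.N entries of zookeeper.properties.
final class ZkServers {
    static List<String> serverLines(List<String> nodeIps) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < nodeIps.size(); i++) {
            // N starts at 1; 2888 = follower-to-leader port, 3888 = leader-election port.
            lines.add("server." + (i + 1) + "=" + nodeIps.get(i) + ":2888:3888");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> ips = List.of("192.168.1.11", "192.168.1.12", "192.168.1.13");
        serverLines(ips).forEach(System.out::println);
        System.out.println("initLimit=5");
        System.out.println("syncLimit=2");
    }
}
```

Whatever order you choose, use the same list on every node, since each node's id must match its position here.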


3. If the local node is part of the ZooKeeper cluster, write its id to the ./state/zookeeper/myid file. The id is the number N of that node's server.N entry in zookeeper.properties.
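A sketch of step 3, assuming the id is the node's 1-based position in the same list used for the server.N lines and that the path is relative to the NiFi home directory. MyId and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Sketch: writes ZooKeeper's myid file for the local NiFi node.
final class MyId {
    // The id is the node's 1-based position in the ensemble list, i.e. the N of its server.N line.
    static int idFor(String localIp, List<String> nodeIps) {
        int idx = nodeIps.indexOf(localIp);
        if (idx < 0) {
            throw new IllegalArgumentException(localIp + " is not part of the ZooKeeper ensemble");
        }
        return idx + 1;
    }

    // Creates <nifiHome>/state/zookeeper/myid containing only the id.
    static Path writeMyId(Path nifiHome, int id) throws IOException {
        Path dir = nifiHome.resolve("state").resolve("zookeeper");
        Files.createDirectories(dir);
        Path file = dir.resolve("myid");
        Files.write(file, String.valueOf(id).getBytes(StandardCharsets.US_ASCII));
        return file;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: MyId <nifi-home> <local-ip> <ip1,ip2,ip3>");
            return;
        }
        List<String> ips = Arrays.asList(args[2].split(","));
        Path file = writeMyId(Paths.get(args[0]), idFor(args[1], ips));
        System.out.println("wrote " + file);
    }
}
```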
4. Set the ZooKeeper connection string (the same <connect-string> as in nifi.properties) in the "Connect String" property of the cluster provider in the state-management.xml file.
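Step 4 can be scripted as well. The sketch below uses the JDK's DOM and XPath APIs to fill in the Connect String property of the cluster provider whose id is zk-provider, as in NiFi's default state-management.xml; if your file uses a different provider id, adjust the XPath expression. The class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: sets the ZooKeeper "Connect String" in NiFi's state-management.xml.
final class StateManagementXml {
    // Assumes the default provider id "zk-provider".
    private static final String XPATH =
            "/stateManagement/cluster-provider[id='zk-provider']/property[@name='Connect String']";

    static String setConnectString(String xml, String connectString) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element prop = (Element) XPathFactory.newInstance().newXPath()
                .evaluate(XPATH, doc, XPathConstants.NODE);
        if (prop == null) {
            throw new IllegalStateException("zk-provider 'Connect String' property not found");
        }
        prop.setTextContent(connectString);

        // Serialize the modified DOM back to a string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: StateManagementXml <state-management.xml> <connect-string>");
            return;
        }
        Path file = Paths.get(args[0]);
        String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, setConnectString(xml, args[1]).getBytes(StandardCharsets.UTF_8));
    }
}
```

For example: java StateManagementXml conf/state-management.xml nifi01:2181,nifi02:2181,nifi03:2181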

To start Apache NiFi on each node, just run the command:

 bin/nifi.sh start 

The order in which Apache NiFi is started on the nodes does not matter. You can monitor cluster startup in the logs/nifi-app.log file.

Run a local cluster in a virtual environment


To experiment with the cluster, we need to be able to run an Apache NiFi cluster locally in a virtual environment. For this, HashiCorp Vagrant and Oracle VM VirtualBox were used; the vagrant-vbguest and vagrant-hostmanager plugins must also be installed. To speed up and simplify startup, dedicated Vagrant provisioning scripts were written that start an Apache NiFi cluster in a virtual environment with a single command:

 vagrant up 

Five to seven minutes after launch, the user interface will be available in the browser at localhost:8080/. You can also open VirtualBox and check that three virtual machines, nifi01, nifi02 and nifi03, are running.
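Instead of refreshing the browser, you can poll the UI until it answers. A minimal sketch using only the JDK, assuming the UI is reachable at http://localhost:8080/ as described above (NiFi normally serves it under /nifi, so adjust the URL if needed); the 10-minute limit and class name are arbitrary choices.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.function.BooleanSupplier;

// Sketch: waits until the NiFi web UI responds over HTTP.
final class WaitForNifi {
    // True if an HTTP GET to url returns a status below 400.
    static boolean isUp(String url) {
        try {
            HttpURLConnection c = (HttpURLConnection) URI.create(url).toURL().openConnection();
            c.setConnectTimeout(2000);
            c.setReadTimeout(2000);
            int code = c.getResponseCode();
            c.disconnect();
            return code < 400;
        } catch (IOException e) {
            return false; // not listening yet, or connection refused
        }
    }

    // Calls probe up to maxAttempts times, sleeping sleepMs between failed attempts.
    static boolean waitUntil(BooleanSupplier probe, int maxAttempts, long sleepMs)
            throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            if (probe.getAsBoolean()) {
                return true;
            }
            Thread.sleep(sleepMs);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // 60 attempts x 10 s = up to 10 minutes, above the 5-7 minutes mentioned in the text.
        boolean up = waitUntil(() -> isUp("http://localhost:8080/"), 60, 10_000);
        System.out.println(up ? "NiFi UI is up" : "NiFi UI did not come up in time");
    }
}
```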

The Vagrant provisioning scripts for running the NiFi cluster are available on GitHub.

Dynamic cluster formation


In some situations, a newly connected device must find a cluster on the network by itself and join it. For this purpose, an "agent" program was written that searches for devices on the network and, when it finds a cluster (checked via the Apache NiFi REST API), joins it. The source code of this program is available on GitHub.
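The REST check can be sketched as follows. This is not the agent's code: it assumes the NiFi 1.x endpoint GET /nifi-api/flow/cluster/summary, whose clusterSummary object contains clustered and connectedNodeCount fields, and it extracts them with regular expressions only to stay dependency-free (a real implementation would use a JSON library). Verify the endpoint against the REST API documentation for your NiFi version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: asks a NiFi instance whether it is part of a cluster (assumed NiFi 1.x endpoint).
final class ClusterProbe {
    private static final Pattern CLUSTERED =
            Pattern.compile("\"clustered\"\\s*:\\s*(true|false)");
    private static final Pattern CONNECTED =
            Pattern.compile("\"connectedNodeCount\"\\s*:\\s*(\\d+)");

    // True if the cluster summary JSON reports "clustered": true.
    static boolean isClustered(String json) {
        Matcher m = CLUSTERED.matcher(json);
        return m.find() && Boolean.parseBoolean(m.group(1));
    }

    // Number of connected nodes, or 0 if the field is absent.
    static int connectedNodes(String json) {
        Matcher m = CONNECTED.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }

    // GET http://<host>:<port>/nifi-api/flow/cluster/summary; throws IOException on errors.
    static String fetchSummary(String host, int port) throws IOException {
        URI uri = URI.create("http://" + host + ":" + port + "/nifi-api/flow/cluster/summary");
        HttpURLConnection c = (HttpURLConnection) uri.toURL().openConnection();
        c.setConnectTimeout(2000);
        c.setReadTimeout(5000);
        try (InputStream in = c.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            c.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        String json = fetchSummary(host, 8080);
        System.out.println("clustered=" + isClustered(json)
                + ", connected nodes=" + connectedNodes(json));
    }
}
```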

An example of running an agent:

 java -cp cluster-joiner-0.0.1-jar-with-dependencies.jar ru.itis.suc.NodeAgent /home/user/nifi/nifi-1.2.0 8085 

where the arguments are the path to the NiFi installation and the port the agent listens on in case a new cluster has to be created.
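The listening port lets devices that have not found a cluster gather before forming a new one. The agent's actual protocol is not described in this article, so the following is a hypothetical handshake: each peer connects, sends its IP address as one line, and receives OK; the listener stops after collecting the requested number of peers. The default port 8085 is taken from the example command.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical handshake: each peer sends one line with its IP; the agent answers "OK".
final class PeerListener {
    static List<String> collectPeers(ServerSocket server, int wanted) throws IOException {
        List<String> peers = new ArrayList<>();
        while (peers.size() < wanted) {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String ip = in.readLine();
                if (ip != null && !ip.isBlank()) {
                    peers.add(ip.trim());
                    out.println("OK");
                }
            }
        }
        return peers;
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8085;
        try (ServerSocket server = new ServerSocket(port)) {
            // Two peers plus this node give the three nodes needed for a ZooKeeper quorum.
            System.out.println("peers ready: " + collectPeers(server, 2));
        }
    }
}
```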

After launch, the agent searches the local network for a cluster and joins it. If no cluster is found, it tries to create a new one, provided that two more devices are ready to become part of it.
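The decision described above reduces to a small function. A sketch with hypothetical names, assuming a cluster needs this node plus two peers (the three-node minimum from the cluster configuration section):

```java
import java.util.List;
import java.util.Optional;

// Sketch of the agent's join-or-create decision; all names are hypothetical.
final class JoinDecision {
    enum Action { JOIN_EXISTING, CREATE_NEW, WAIT }

    static final int MIN_CLUSTER_SIZE = 3; // this node + 2 peers

    static Action decide(Optional<String> existingCluster, List<String> readyPeers) {
        if (existingCluster.isPresent()) {
            return Action.JOIN_EXISTING; // a cluster answered on the network
        }
        if (readyPeers.size() + 1 >= MIN_CLUSTER_SIZE) {
            return Action.CREATE_NEW; // enough devices to form a quorum
        }
        return Action.WAIT; // not enough devices yet; search again later
    }

    public static void main(String[] args) {
        System.out.println(decide(Optional.empty(), List.of("192.168.1.12", "192.168.1.13")));
    }
}
```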


Fig. 3. Apache NiFi GUI running in a cluster.


Fig. 4. List of cluster nodes.

Conclusion


This work was an experiment to test whether an Apache NiFi cluster can be created automatically on a local network.

The search and connection algorithms used are admittedly primitive; the goal was only to verify that this is possible.

Source: https://habr.com/ru/post/331444/

