Recently I had to solve the problem of monitoring several dozen servers (hardly a rare task, I suspect). The requirements can be described by several rules:
- You need to ping the server periodically
- Sometimes you need to perform an action on the server that the user requested (for example, executing a command via ssh)
- Server actions come in several types, and each action has its own priority
- The tasks from the points above cannot run simultaneously for the same server
- Tasks may fail, for example because the connection to the server is lost; in that case you need to wait until the connection is restored and try the scheduled task again
The first solution that comes to mind for most people is to start a thread for each server and do that server's work there. This is not bad, but what if the set of servers changes while monitoring is running? Starting and terminating threads in the middle of monitoring is somehow inelegant. And what if there are a thousand servers? You can probably run a thousand threads, but why do it when each thread is idle most of the time, waiting for its next ping?
You can look at this problem from the other side and present it as the classic "producer-consumer" task. We have producers who produce tasks (ping, ssh command) and we have consumers who perform these tasks. Of course, there can be more than one producer and more than one consumer. Solving our "producer-consumer" task in Java is not just easy but very easy with the PriorityQueue and ExecutorService classes.
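To make the idea concrete, here is a minimal sketch of a prioritized producer-consumer setup. It is an illustration under assumptions, not the code discussed in this article: PrioritizedJob and ProducerConsumerSketch are hypothetical names, a lower priority number is assumed to mean "run sooner", and the thread-safe PriorityBlockingQueue is swapped in for the plain PriorityQueue, which is not safe to share between threads without external locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical job type: a lower priority value is assumed to mean "run sooner".
class PrioritizedJob implements Comparable<PrioritizedJob> {
    final int priority;
    final String name;

    PrioritizedJob(int priority, String name) {
        this.priority = priority;
        this.name = name;
    }

    @Override
    public int compareTo(PrioritizedJob other) {
        return Integer.compare(priority, other.priority);
    }
}

class ProducerConsumerSketch {
    // The "producer" side fills the queue up front; a fixed pool of consumers drains it.
    // Returns the job names in the order in which they were taken from the queue.
    static List<String> runAll(List<PrioritizedJob> jobs, int consumers) {
        PriorityBlockingQueue<PrioritizedJob> queue = new PriorityBlockingQueue<>(jobs);
        List<String> done = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.submit(() -> {
                PrioritizedJob job;
                // poll() returns null when the queue is empty, which ends this consumer.
                while ((job = queue.poll()) != null) {
                    done.add(job.name); // a real consumer would ping or run ssh here
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return done;
    }
}
```

With a single consumer the jobs come out strictly in priority order; with several consumers the jobs are still taken from the queue by priority, but they may finish in a different order.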
Let's start, as usual, with a unit test:
    @Test
    public void testOffer() {
        PollServerQueue xq = new PollServerQueue();
        xq.addTask(new MyTask(1, 11));
        xq.addTask(new MyTask(2, 12));
        xq.addTask(new MyTask(1, 13));

        MyTask t1 = (MyTask)xq.poll();
        assertEquals(1, t1.getServerId());
        assertEquals(11, t1.getTaskId());

        MyTask t2 = (MyTask)xq.poll();
        assertEquals(2, t2.getServerId());
        assertEquals(12, t2.getTaskId());

        MyTask t3 = (MyTask)xq.poll();
        assertEquals(null, t3);

        xq.FinishTask(1);

        MyTask t5 = (MyTask)xq.poll();
        assertEquals(1, t5.getServerId());
        assertEquals(13, t5.getTaskId());
    }
In this unit test, we add three tasks of our own type MyTask to the queue (the first constructor argument is the serverId, the second is the taskId). The poll method retrieves a task from the queue. If no task can be retrieved (for example, the queue is empty, or the only tasks left belong to servers that are already running a task), the poll method returns null. The code also shows that completing the task for serverId = 1 makes the next task for that server available for extraction from the queue.
Hooray! The unit test is written, so now we can write the code. We will need:
- A data structure (HashMap) for storing the task currently being executed for each server (currentTasks)
- A data structure (HashMap) for storing tasks queued for execution, with a separate queue for each server (waitingTasks)
- A data structure (PriorityQueue) for polling servers in turn (peekOrder). Each successive poll() call should give us a task for a different server. In short, a structure like a revolver, except that the bullets stay in the drum after each shot
- A structure (HashSet) for storing and quickly looking up the server identifiers that are in the revolver (servers), so that you do not have to scan the revolver from the first element to the last each time
- A simple object to synchronize on (syncObject)
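The structures listed above might fit together roughly as follows. This is only a sketch written from the description in this article, not the code from the repository: ServerTask, SimpleTask and PollServerQueueSketch are hypothetical names, finishTask follows Java naming (the unit test above calls it FinishTask), and an ArrayDeque stands in for the revolver instead of a PriorityQueue, on the assumption that a plain rotation is enough when per-server priorities are left out.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical minimal task contract: the queue only needs the server id.
interface ServerTask {
    int getServerId();
}

// Hypothetical task type mirroring MyTask(serverId, taskId) from the unit test.
class SimpleTask implements ServerTask {
    private final int serverId;
    private final int taskId;

    SimpleTask(int serverId, int taskId) {
        this.serverId = serverId;
        this.taskId = taskId;
    }

    public int getServerId() { return serverId; }
    public int getTaskId() { return taskId; }
}

class PollServerQueueSketch {
    private final Map<Integer, ServerTask> currentTasks = new HashMap<>();
    private final Map<Integer, Queue<ServerTask>> waitingTasks = new HashMap<>();
    private final Queue<Integer> peekOrder = new ArrayDeque<>(); // the "revolver"
    private final Set<Integer> servers = new HashSet<>();        // ids already in the revolver
    private final Object syncObject = new Object();

    public void addTask(ServerTask task) {
        synchronized (syncObject) {
            int id = task.getServerId();
            waitingTasks.computeIfAbsent(id, k -> new ArrayDeque<>()).add(task);
            if (servers.add(id)) {   // true only the first time we see this server
                peekOrder.add(id);
            }
        }
    }

    // Returns the next runnable task, or null if every server is busy or has nothing queued.
    public ServerTask poll() {
        synchronized (syncObject) {
            int n = peekOrder.size();
            for (int i = 0; i < n; i++) {
                Integer id = peekOrder.remove();
                peekOrder.add(id);   // the bullet stays in the drum, moved to the back
                Queue<ServerTask> queue = waitingTasks.get(id);
                if (currentTasks.containsKey(id) || queue == null || queue.isEmpty()) {
                    continue;        // server is busy or has no waiting tasks
                }
                ServerTask task = queue.remove();
                currentTasks.put(id, task);
                return task;
            }
            return null;
        }
    }

    // Marks the server as free so that its next waiting task can be polled.
    public void finishTask(int serverId) {
        synchronized (syncObject) {
            currentTasks.remove(serverId);
        }
    }
}
```

Because each poll() rotates the revolver by at least one position, consecutive calls tend to hand out tasks for different servers, and a server with a task in progress is simply skipped until finishTask is called for it.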
Now the procedure for extracting a task from the queue becomes simple and short. And although the code turned out to be compact, I don't see the point of publishing it here; instead I will send you to
https://github.com/get-a-clue/PollServerQueueExample
Disclaimer: the GitHub code is not complete. In particular, it lacks the ability to set priorities for tasks within each server's queue, a mechanism for handling errors and returning failed tasks to the queue, and the pinging code itself. As the saying goes, less code, better sleep. :)
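For readers who want to fill in the two gaps named in the disclaimer, per-server priorities and returning failed tasks to the queue, one possible direction is sketched below. Everything here is an assumption made for illustration: PrioritizedServerTask and ServerBacklog are hypothetical names, a lower number is assumed to mean higher priority, and "failure handling" is reduced to putting the task back so that it is retried on a later poll.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical task type: lower priority number is assumed to mean "more urgent".
class PrioritizedServerTask {
    final int serverId;
    final int priority;
    final String action;

    PrioritizedServerTask(int serverId, int priority, String action) {
        this.serverId = serverId;
        this.priority = priority;
        this.action = action;
    }
}

// Hypothetical per-server backlog that could replace a plain FIFO queue of waiting tasks.
class ServerBacklog {
    private final PriorityQueue<PrioritizedServerTask> pending =
            new PriorityQueue<>(Comparator.comparingInt((PrioritizedServerTask t) -> t.priority));

    void add(PrioritizedServerTask task) {
        pending.add(task);
    }

    // Returns the most urgent waiting task, or null if nothing is waiting.
    PrioritizedServerTask next() {
        return pending.poll();
    }

    // Called when a task failed (for example, the server was unreachable):
    // the task goes back into the backlog and keeps its original priority.
    void requeue(PrioritizedServerTask failed) {
        pending.add(failed);
    }
}
```

Note that PriorityQueue itself is not thread-safe, so a backlog like this would still have to be accessed under the same lock as the rest of the queue state.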