
DevOps: send metrics and sleep well



Suddenly, the phone rings in the middle of the night and we learn that our application is down. We have two hours to bring it back to life...


Yes, we regularly sent logs to Elasticsearch, inspired by the ideas from the article "Publishing logs to Elasticsearch - life without regular expressions and without logstash". But to investigate the true cause of the application's crash we clearly lack data: the JMX beans of the jvm, the JMX of the database connection pool, the number of file descriptors opened by our process, and so on. How could we have forgotten about them!? Too late to do anything about it now...
Of course, we could take process data from Nagios, Zabbix, collectd or proprietary monitoring systems, but it would be more convenient to have all of this data in Elasticsearch at once. That would let us visualize it in Kibana and receive alerts based on a single source of events from multiple processes on different nodes in the network. Elasticsearch lets us index data, run full-text searches, and scale storage horizontally by adding new search server processes.

Okay, this time we will dig through the application logs, examine the metric history in our wonderful proprietary monitoring system, and come up with a more or less plausible explanation for the application's crash. Next time, we will try to collect more information about the operation of our application in a form that is easier to analyze, and find out the true cause from the application's metrics and events.

I will share with you a recipe for sleeping better at night:


Preparing Elasticsearch and Kibana


Let's start with where we will collect the metrics: launch an Elasticsearch (ES) server. For example, Elasticsearch in its simplest configuration, which immediately installs a default template for all new indices so that you can search in Kibana with the ready-made Logstash dashboard. It runs with mvn package. Server nodes find each other using multicast UDP packets and cluster by matching clusterName. Clearly, not having to list unicast addresses simplifies life during development, but you should not set up a cluster like this in production!
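In production you would instead disable multicast discovery and list the node addresses explicitly. A minimal elasticsearch.yml sketch for an Elasticsearch 1.x node (the host names are placeholders, not from this article):

# elasticsearch.yml - a sketch for production; not needed for this example
cluster.name: elasticsearchServer
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["es-node-1:9300", "es-node-2:9300"]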
For this example, you can run Elasticsearch using the groovy script ...
java -jar groovy-grape-aether-2.4.5.1.jar elasticsearch-server.groovy

Script elasticsearch-server.groovy:
@Grab(group='org.elasticsearch', module='elasticsearch', version='1.1.1')
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;
import java.io.InputStream;
import java.net.URL;
import java.util.concurrent.TimeUnit;

int durationInSeconds = args.length == 0 ? 3600 : Integer.parseInt(this.args[0]);
String template;
InputStream templateStream = new URL("https://raw.githubusercontent.com/logstash-plugins/logstash-output-elasticsearch/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json").openStream()
try {
    template = new String(sun.misc.IOUtils.readFully(templateStream, -1, true));
} finally {
    templateStream.close()
}

Node elasticsearchServer = NodeBuilder.nodeBuilder().settings(
        ImmutableSettings.settingsBuilder().put("http.cors.enabled", "true"))
        .clusterName("elasticsearchServer").data(true).build();
Node node = elasticsearchServer.start();
node.client().admin().indices().preparePutTemplate("logstash").setSource(template).get();
System.out.println("ES STARTED");
Thread.sleep(TimeUnit.SECONDS.toMillis(durationInSeconds));


We now have storage for the metrics. Next we connect to it with the Kibana web application and see nothing, since nobody has sent any metrics there yet. You can view the status of the Elasticsearch cluster and its connected clients with elasticsearch-HQ, but there are many administration consoles to choose from and the choice is a matter of taste.

A few words about the Kibana configuration for this example. In the file kibana-3.1.3/config.js you need to set the property elasticsearch: "http://127.0.0.1:9200". After that the web interface will be able to connect to the elasticsearch server started on the same host by the elasticsearch-server.groovy script or by the mvn package command.
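For reference, the relevant fragment of kibana-3.1.3/config.js ends up looking roughly like this (the other properties are stock Kibana 3 defaults and are shown only for context):

// kibana-3.1.3/config.js (fragment)
return new Settings({
    // point the web interface at our Elasticsearch node
    elasticsearch: "http://127.0.0.1:9200",
    default_route: '/dashboard/file/default.json',
    kibana_index: "kibana-int"
});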



Sending metrics from the jvm to Elasticsearch


All the magic will be done by the AspectJ-Scripting agent. The agent has been modified so that it can easily be injected into a process via the Attach API and so that it receives its configuration through the agent parameter. The agent can now be used with a configuration that contains no aspects at all, and it can describe periodically executed tasks.

AspectJ-Scripting will let us, without recompiling or repackaging the application, send metrics from all application processes to Elasticsearch using just the jar file with the agent, aspectj-scripting-1.3-agent.jar, a configuration file on the file system (or on a web server), and one additional -javaagent parameter at JVM startup.

In the java agent configuration we will use com.github.igor-suhorukov:jvm-metrics:1.2, and the libraries "under the hood" of it do the actual collection work.


So, everything that does the work we need — collecting and sending the metrics — is in the file log.metrics.xml:
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <configuration> <globalContext> <artifacts> <artifact>com.github.igor-suhorukov:jvm-metrics:1.2</artifact> <classRefs> <variable>SigarCollect</variable><className>org.github.suhorukov.SigarCollect</className> </classRefs> <classRefs> <variable>JmxCollect</variable><className>org.github.suhorukov.JmxCollect</className> </classRefs> </artifacts> <artifacts> <artifact>org.elasticsearch:elasticsearch:1.1.1</artifact> <classRefs> <variable>NodeBuilder</variable><className>org.elasticsearch.node.NodeBuilder</className> </classRefs> </artifacts> <init> <expression> SigarCollect sigar = new SigarCollect(); JmxCollect jmxCollect = new JmxCollect(); reportHost = java.net.InetAddress.getLocalHost().getHostName(); pid = java.lang.management.ManagementFactory.getRuntimeMXBean().getName().split("@")[0]; Thread.currentThread().setContextClassLoader(NodeBuilder.class.getClassLoader()); client = NodeBuilder.nodeBuilder().clusterName("elasticsearchServer").data(false).client(true).build().start().client(); additionalRecords = new java.util.HashMap(); additionalRecords.put("Host", reportHost); additionalRecords.put("Pid", pid); </expression> </init> <timerTasks> <delay>500</delay> <period>2000</period> <jobExpression> import java.text.SimpleDateFormat; import java.util.TimeZone; logstashFormat = new SimpleDateFormat("yyyy.MM.dd"); logstashFormat.setTimeZone(TimeZone.getTimeZone("UTC")); timestamp = new java.util.Date(); index = "logstash-" + logstashFormat.format(timestamp); jmxInfo = jmxCollect.getJsonJmxInfo("java.lang:type=Memory", timestamp, additionalRecords); nativeInfo = sigar.getJsonProcessInfo(additionalRecords); client.index(client.prepareIndex(index, "logs").setSource(jmxInfo).request()).actionGet(); client.index(client.prepareIndex(index, "logs").setSource(nativeInfo).request()).actionGet(); </jobExpression> </timerTasks> </globalContext> </configuration> 


In this configuration we create instances of the SigarCollect and JmxCollect classes, which are loaded from the com.github.igor-suhorukov:jvm-metrics:1.2 library when the agent starts.

SigarCollect collects data about the process and the operating system, much of which is not available via JMX, and converts the metrics to JSON. JmxCollect queries data from the JMX beans of the virtual machine and the application and also converts it into a JSON document. We limited the information from JmxCollect with the "java.lang:type=Memory" filter, but you will certainly want more information from the JMX beans.
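To get a feel for what these collectors return, here is a small sketch that can be run with the same groovy-grape-aether jar; the classes and method signatures are the ones used in log.metrics.xml, the rest is an assumption:

@Grab(group='com.github.igor-suhorukov', module='jvm-metrics', version='1.2')
import org.github.suhorukov.SigarCollect
import org.github.suhorukov.JmxCollect

// the same extra fields the agent attaches to every document
def additionalRecords = new java.util.HashMap()
additionalRecords.put("Host", java.net.InetAddress.getLocalHost().getHostName())

def sigar = new SigarCollect()
def jmxCollect = new JmxCollect()

// the same calls the timer task in log.metrics.xml makes every 2 seconds
println jmxCollect.getJsonJmxInfo("java.lang:type=Memory", new java.util.Date(), additionalRecords)
println sigar.getJsonProcessInfo(additionalRecords)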

We add to additionalRecords the reportHost — the name of the host the application runs on — and the pid, the process identifier.

In client we keep the Elasticsearch client that joins the servers of the cluster named "elasticsearchServer"; we will use it to send the metrics.

Our metric-sending task runs every 2 seconds and writes the metrics into an index named after the current date, for example logstash-2015.11.25. In this task, the metrics from JmxCollect and SigarCollect are returned as JSON documents in a format that Kibana can visualize. The metrics are then sent to the cluster and indexed using the Elasticsearch client.

As you can see, the configuration is written in the MVEL language, whose Java-like syntax is quite familiar to developers.

In the script that starts our application, add the parameter -javaagent:/?PATH?/aspectj-scripting-1.3-agent.jar=config:file:/?PATH?/log.metrics.xml
We start all the application processes on all nodes and watch the metrics flow into the Elasticsearch cluster.
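Putting it together, a launch might look roughly like this (the paths and the application jar are placeholders); the curl calls are just a quick way to check that the daily logstash index is filling up:

java -javaagent:/opt/agent/aspectj-scripting-1.3-agent.jar=config:file:/opt/agent/log.metrics.xml \
     -jar my-application.jar

# a few seconds later the logstash-YYYY.MM.DD index should appear and start growing
curl 'http://127.0.0.1:9200/_cat/indices?v'
curl 'http://127.0.0.1:9200/logstash-*/_search?pretty&size=1'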



But what about the application logs?


We already send metrics to Elasticsearch, but other information from the application is still missing. With the approach from the article "Publishing logs in Elasticsearch - life without regular expressions and without logstash", there is no need to parse log files, and when the logging format changes or new messages appear, you do not have to maintain a large set of regular expressions. In this approach, I suggest intercepting calls to the logger's error, warn, info, debug and trace methods and sending the data straight to elasticsearch.
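The referenced article does this through the aspectj-scripting configuration itself; purely as an illustration of the idea, an annotation-style AspectJ aspect performing the same interception could look roughly like this (the slf4j logger interface and the body of the advice are assumptions, not the article's code):

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

@Aspect
public class LogToElasticsearchAspect {

    // intercept error/warn/info/debug/trace calls on the application's loggers
    @Before("call(* org.slf4j.Logger.error(..)) || call(* org.slf4j.Logger.warn(..)) || " +
            "call(* org.slf4j.Logger.info(..)) || call(* org.slf4j.Logger.debug(..)) || " +
            "call(* org.slf4j.Logger.trace(..))")
    public void sendLogEvent(JoinPoint joinPoint) {
        String level = joinPoint.getSignature().getName();
        Object[] args = joinPoint.getArgs();
        String message = args.length > 0 ? String.valueOf(args[0]) : "";
        // here the level, message and arguments would be turned into a JSON document
        // and indexed with the same Elasticsearch client as in log.metrics.xml
    }
}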

How do we deal with already running jvm processes, and where does groovy come in?


A Groovy script will let us attach the agent even to an application already running in a jvm, specifying only the process identifier and the path to the configuration. You do not even need the jvm's tools.jar to embed the agent into a running process via the Attach API. This is possible thanks to the com.github.igor-suhorukov:attach-vm:1.0 library, based on classes from JMockit/OpenJDK.

To run this script, we need the specially prepared groovy distribution groovy-grape-aether-2.4.5.1.jar, whose capabilities I talked about recently.
java -jar groovy-grape-aether-2.4.5.1.jar attachjvm.groovy JVM_PID config:file:?PATH?/log.metrics.xml

The attachjvm.groovy script downloads aspectj-scripting:jar:agent from the maven repository and, using the Attach API, connects to the jvm with process id JVM_PID, loading the aspectj-scripting agent with the log.metrics.xml configuration into that jvm:
@Grab(group='com.github.igor-suhorukov', module='attach-vm', version='1.0')
import com.github.igorsuhorukov.jvmattachapi.VirtualMachineUtils;
import com.sun.tools.attach.VirtualMachine;
import com.github.igorsuhorukov.smreed.dropship.MavenClassLoader;

def aspectJScriptingFile = MavenClassLoader.forMavenCoordinates("com.github.igor-suhorukov:aspectj-scripting:jar:agent:1.3").getURLs().getAt(0).getFile()
println aspectJScriptingFile

def processId = this.args.getAt(0) //CliBuilder
def configPath = this.args.getAt(1)

VirtualMachine virtualMachine = VirtualMachineUtils.connectToVirtualMachine(processId)
try {
    virtualMachine.loadAgent(aspectJScriptingFile, configPath)
} finally {
    virtualMachine.detach()
}

If we need to diagnose the system or change the logging level, we can embed a cross-platform ssh server into the java application — even into a process that is already running:
java -jar groovy-grape-aether-2.4.5.1.jar attachjvm.groovy JVM_PID config:file:?PATH?/CRaSHub_ssh.xml

with the following CRaSHub_ssh.xml configuration:
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <configuration> <globalContext> <artifacts> <artifact>org.crashub:crash.connectors.ssh:1.3.1</artifact> <classRefs> <variable>Bootstrap</variable> <className>org.crsh.standalone.Bootstrap</className> </classRefs> <classRefs> <variable>Builder</variable> <className>org.crsh.vfs.FS$Builder</className> </classRefs> <classRefs> <variable>ClassPathMountFactory</variable> <className>org.crsh.vfs.spi.url.ClassPathMountFactory</className> </classRefs> <classRefs> <variable>FileMountFactory</variable> <className>org.crsh.vfs.spi.file.FileMountFactory</className> </classRefs> </artifacts> <init> <expression> import java.util.concurrent.TimeUnit; otherCmd = new FileMountFactory(new java.io.File(System.getProperty("user.dir"))); classLoader = otherCmd.getClass().getClassLoader(); classpathDriver = new ClassPathMountFactory(classLoader); cmdFS = new Builder().register("classpath", classpathDriver).register("file", otherCmd).mount("classpath:/crash/commands/").build(); confFS = new Builder().register("classpath", classpathDriver).mount("classpath:/crash/").build(); bootstrap = new Bootstrap(classLoader, confFS, cmdFS); config = new java.util.Properties(); config.put("crash.ssh.port", "2000"); config.put("crash.ssh.auth_timeout", "300000"); config.put("crash.ssh.idle_timeout", "300000"); config.put("crash.auth", "simple"); config.put("crash.auth.simple.username", "admin"); config.put("crash.auth.simple.password", "admin"); bootstrap.setConfig(config); bootstrap.bootstrap(); Thread.sleep(TimeUnit.MINUTES.toMillis(30)); bootstrap.shutdown(); </expression> </init> </globalContext> </configuration> 




Conclusions


To sleep well, you need to pay no less attention to the tasks of operating the application than to developing it.

Now we have a working Elasticsearch cluster whose index data we can analyze in Kibana. Metrics from JMX beans, jvm process metrics and operating system metrics arrive in Elasticsearch from our applications, and we already know how to send the application logs there too. We set up notifications and, when they fire, quickly find the cause of problems in our application and promptly fix them. On top of that, we can embed a cross-platform ssh server into the java application and diagnose even a running application without restarting it, including switching its logging level with CRaSH.

I wish your application stable operation, and I wish you a quick search for the causes of problems, if they do arise!

Source: https://habr.com/ru/post/269793/

