
Streaming and analyzing Java application logs in MS Azure using Log4j and Stream Analytics

In this article I will show several working solutions to the problem of shipping logs from Java applications to MS Azure and analyzing them there. We will cover both Windows and Linux virtual machines, located either in the cloud or on-premise. On the Java side we will use log4j2 as the logging subsystem.


For log analysis we will use Azure Stream Analytics.



To follow this article, it helps to have basic knowledge of log4j2 and a few Azure services, namely Stream Analytics, Event Hubs and Blob storage.
If you want to refresh that knowledge, here are the links:
Apache Log4j 2
Azure Stream Analytics Documentation
Azure Event Hubs
Azure Storage


Why Java?


Perhaps the idea of hosting Java applications on Azure VMs may seem strange to some, but according to the IaaS 2011-2026 market analysis report for Germany from Colorbridge GmbH, Azure IaaS usage is only slightly behind AWS. So if you set prejudice aside, the question is quite reasonable, and for some readers even topical.


Azure has supported Java for a long time, especially at the PaaS level: there is a Java SDK, Java applications can be hosted in Web Apps, and much of the SaaS software popular in the Java and open-source world is available. But at the IaaS level (that is, abstracting away from the software itself) the specifics of working with Java are not well covered, and they do exist, at least when it comes to logging. Let's try to fix that.


Why log4j?


There are several logging frameworks for Java. In this article we will use log4j2.



What I tested


As a test application I took the most ordinary Spring Boot starter app with the spring-boot-starter-log4j2 module, but in the then-latest version 2.0.0.M5, because one of the scenarios requires the latest version of log4j.


pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>log4j2-demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>log4j2-demo</name>
    <description>Demo project for Spring Boot</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.0.M5</version>
        <relativePath/>
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <repositories>
        <repository>
            <id>sboot</id>
            <name>your custom repo</name>
            <url>https://repo.spring.io/libs-milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>2.0.0.M5</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <version>2.0.0.M5</version>
        </dependency>
        <!-- Exclude Spring Boot's Default Logging -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-starter-logging</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- Add Log4j2 Dependency -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-log4j2</artifactId>
            <version>2.0.0.M5</version>
        </dependency>
    </dependencies>

    <pluginRepositories>
        <pluginRepository>
            <id>sbootplug</id>
            <url>https://repo.spring.io/libs-milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
    </pluginRepositories>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <version>2.0.0.M5</version>
            </plugin>
        </plugins>
    </build>
</project>
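The application code itself is not reproduced here, so below is a minimal sketch of what the demo app might look like (the package, class and endpoint names are my own invention). It simply emits log events through the log4j2 API so that there is something to stream.

Log4j2DemoApplication.java
 package com.example.log4j2demo;

 import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;
 import org.springframework.boot.SpringApplication;
 import org.springframework.boot.autoconfigure.SpringBootApplication;
 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.RestController;

 @SpringBootApplication
 @RestController
 public class Log4j2DemoApplication {

     // log4j2 is used automatically once spring-boot-starter-logging is excluded in the pom
     private static final Logger log = LogManager.getLogger(Log4j2DemoApplication.class);

     @GetMapping("/hello")
     public String hello() {
         log.info("hello endpoint was called");   // INFO event, counted by the "activity" query below
         return "hello";
     }

     @GetMapping("/fail")
     public String fail() {
         log.error("something went wrong");       // ERROR event, routed by the "errors" query below
         return "fail";
     }

     public static void main(String[] args) {
         SpringApplication.run(Log4j2DemoApplication.class, args);
     }
 }

Hitting /hello and /fail a few times gives the Stream Analytics queries further down both regular activity and error events to work with.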

To test whether streaming logging can be added to an already finished application, I also experimented on a Java Minecraft server :)


What's next?


Below are several scenarios, each with its implementation method and its inherent limitations.


The point of every scenario is to organize a pipeline that moves log events from the Java application into Azure, where a Stream Analytics job can query them.



Scenario 1, Universal



Features: Windows or Linux OS, VM in Azure or on-premise
Limitations: requires a version of log4j2 that provides the HTTP appender


Work algorithm: the log4j2 HTTP appender POSTs each log event as JSON to the Event Hub REST endpoint; a Stream Analytics job reads the event hub as its input and routes query results to outputs (for example, Power BI).



Implementation details:


Log4j setup
An HTTP appender must be defined in the log4j2 configuration:


 <Http name="Http" url='https://<event hubs namespace>.servicebus.windows.net/<event hub>/messages?timeout=60&amp;api-version=2014-01'>
     <Property name="Authorization" value="SharedAccessSignature sr=xxx&amp;sig=yyy&amp;se=zzz&amp;skn=<event hub policy>" />
     <Property name="Content-Type" value="application/atom+xml;type=entry;charset=utf-8" />
     <Property name="Host" value="<event hubs namespace>.servicebus.windows.net" />
     <JsonLayout properties="true"/>
 </Http>

A little more about the Authorization header.
To work with the Event Hubs REST API, you need to authorize with a so-called SAS token. It is essentially a signed hash of the resource URI and the token's lifetime.


To build the SAS token, besides the Event Hubs namespace and the event hub name, you will also need the name and key of an event hub policy that has the right to send messages. All of this information is available on portal.azure.com.


Microsoft provides code samples for generating the Authorization header in various languages, including Java.


I use an HTML snippet that I found on the Internet and slightly modified; it generates an Authorization header for an event hub with a lifetime of one year.
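For reference, here is a minimal Java sketch of such a generator, following the algorithm from the Microsoft samples (HMAC-SHA256 over the URL-encoded resource URI and the expiry timestamp). The class and method names are my own, and the placeholder values must be replaced with your namespace, event hub and policy.

SasTokenGenerator.java
 import javax.crypto.Mac;
 import javax.crypto.spec.SecretKeySpec;
 import java.net.URLEncoder;
 import java.nio.charset.StandardCharsets;
 import java.util.Base64;

 public class SasTokenGenerator {

     // Builds the value of the Authorization header for the Event Hubs REST API.
     public static String createSasToken(String resourceUri, String policyName,
                                         String policyKey, long ttlSeconds) throws Exception {
         long expiry = (System.currentTimeMillis() / 1000) + ttlSeconds;
         String encodedUri = URLEncoder.encode(resourceUri, StandardCharsets.UTF_8.name());
         String stringToSign = encodedUri + "\n" + expiry;

         // HMAC-SHA256 over "<url-encoded uri>\n<expiry>" with the policy key
         Mac hmac = Mac.getInstance("HmacSHA256");
         hmac.init(new SecretKeySpec(policyKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
         String signature = URLEncoder.encode(
                 Base64.getEncoder().encodeToString(hmac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8))),
                 StandardCharsets.UTF_8.name());

         return "SharedAccessSignature sr=" + encodedUri
                 + "&sig=" + signature
                 + "&se=" + expiry
                 + "&skn=" + policyName;
     }

     public static void main(String[] args) throws Exception {
         // one-year token, like the one produced by the HTML snippet mentioned above
         System.out.println(createSasToken(
                 "https://<event hubs namespace>.servicebus.windows.net/<event hub>",
                 "<event hub policy>", "<policy key>", 365L * 24 * 60 * 60));
     }
 }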


Stream Analytics Setup
In the Stream Analytics job, configure an Event Hub input with the parameters:
Event serialization format = JSON, Encoding = UTF-8, Event compression type = None
The data pushed by the log4j HTTP appender is then directly accessible with a plain select * from <input> (this will not always be the case).



To show at least a little of the power of Stream Analytics, take a look at the following query:
 WITH errors as (
     SELECT *
     FROM javahub
     WHERE level='FATAL' OR level='ERROR'
 ),
 activity as (
     SELECT System.TimeStamp AS WindowEnd, level, COUNT(*)
     FROM javahub
     GROUP BY TumblingWindow(second, 10), level
 )
 select * into pbierrors from errors;
 select * into pbiactivity from activity;

With this query we split the stream in two: all FATAL and ERROR events go to the pbierrors output, while per-level event counts over 10-second tumbling windows go to pbiactivity.



A couple of mouse clicks in Power BI and we can monitor not only application errors but also the overall activity.


Scenario 2, Windows only, low-budget



Features: no Event Hub needed, less traffic (logs are compressed before transmission), no need to bother with SAS tokens.
Limitations: VM must be in Azure. Logs arrive with a slight delay.


Work algorithm: log4j writes rolling log files to a local directory; the Azure Diagnostics extension periodically uploads the compressed files from that directory to Blob storage; a Stream Analytics job reads the Blob container as its input.



Implementation details


Log4j setup
Be sure to pay attention that log4j writes to a rolling file appender whose rotated files are GZIP-compressed, that the layout is CSV with a semicolon delimiter and a header row (the Stream Analytics input below expects exactly this), and that the log directory matches the path monitored by the diagnostics extension.



Configure the Azure Monitoring & Diagnostics Extension for VM



 az vm extension set --name IaaSDiagnostics \
     --publisher "Microsoft.Azure.Diagnostics" \
     --resource-group <group name> \
     --vm-name <vm name> \
     --protected-settings "privateSettings.json" \
     --settings "publicSettings.json" \
     --version "1.11.1.0"

publicSettings.json
 { "WadCfg": { "DiagnosticMonitorConfiguration": { "overallQuotaInMB": 10000, "DiagnosticInfrastructureLogs": { "scheduledTransferLogLevelFilter": "Error" }, "Directories": { "scheduledTransferPeriod": "PT1M", "DataSources": [ { "containerName": "<blob container name in your storage account>", "Absolute": { "path": "C:\\<folder>\\<to monitor>", "expandEnvironment": false } } ] } } }, "StorageAccount": "<your storage account name>", "StorageType": "Table" } 

privateSettings.json
 { "storageAccountName": "<your storage account name>", "storageAccountKey": "<storage account access key (use portal to obtain it)>" } 

Stream Analytics Setup
The input used is Blob storage with parameters:
PathPattern - the path to the archived log files in Blob storage. If you created a directory hierarchy (as in the example above), it must be reflected here as well.


Example: WAD/be7f1c92-2841-4ea1-b9d8-ec83c211b8ea/IaaS/_minesrv/{date}/{time}/
DateFormat must match the %d format used in log4j
Event serialization format = CSV, Delimiter = semicolon, Encoding = UTF-8, Event compression type = GZIP


Data access in Stream Analytics queries is again straightforward:
select * from <input> returns a table with TS, LEVEL and MESSAGE fields (according to the header defined in log4j).

Here is one more, slightly more complex query:
 WITH SessionInfo AS (
     SELECT TS, 'START' as EVENT,
            SUBSTRING(MESSAGE, 0, REGEXMATCH(MESSAGE, '[ ]*joined the game')) as PLAYER
     FROM logslob TIMESTAMP BY TS
     WHERE REGEXMATCH(MESSAGE, 'joined the game') > 0
     UNION
     SELECT TS, 'END' as EVENT,
            SUBSTRING(MESSAGE, 0, REGEXMATCH(MESSAGE, '[ ]*left the game')) as PLAYER
     FROM logslob TIMESTAMP BY TS
     WHERE REGEXMATCH(MESSAGE, 'left the game') > 0
 ),
 RawLogs AS (
     SELECT TS, LEVEL, MESSAGE FROM logslob TIMESTAMP BY TS
 )
 SELECT * INTO sbq from SessionInfo;
 SELECT * INTO pbi from RawLogs;

Here we send all logs to the RawLogs output, while SessionInfo contains separate records about a player session starting and ending, with the player's name, for subsequent notifications.


Scenario 3, Linux only



Features: minimal log4j configuration (and minimal requirements for the log4j version) - just write to a file. New records in the file are processed without delay.
Limitations: VM must be in Azure, the file must be written in JSON, data access from the Stream Analytics job is more complex.


Work algorithm: log4j writes each log event as a single JSON line to a local file; the Linux Diagnostics extension tails the file and forwards new lines to an event hub through its EventHub sink; a Stream Analytics job reads the event hub as its input and unpacks the JSON from the event's properties.



Implementation details


Log4j setup
Logs must be written in JSON format, strictly one JSON object per line of the log file.
Fortunately, this is simple to configure with log4j:


 <File name="FileLog" fileName="app.log"> <JsonLayout properties="true" compact="true" eventEol="true"/> </File> 

Configure the Azure Linux Diagnostics Extension for VM



 az vm extension set --name LinuxDiagnostic \
     --publisher "Microsoft.Azure.Diagnostics" \
     --resource-group <group name> \
     --vm-name <vm name> \
     --protected-settings "linux_privateSettings.json" \
     --settings "linux_publicSettings.json" \
     --version "3.0.109"


linux_publicSettings.json
 {
   "StorageAccount": "<your storage account>",
   "sampleRateInSeconds": 15,
   "ladCfg": {
     "diagnosticMonitorConfiguration": {
       "metrics": {
         "metricAggregation": [
           { "scheduledTransferPeriod": "PT1H" },
           { "scheduledTransferPeriod": "PT1M" }
         ],
         "resourceId": "/subscriptions/<subscription id>/resourceGroups/<resource group>/providers/Microsoft.Compute/virtualMachines/<vm name>"
       }
     }
   },
   "fileLogs": [
     {
       "file": "/<path>/<to>/<log file>",
       "sinks": "LinuxEH"
     }
   ]
 }

linux_privateSettings.json
 { "storageAccountName" : "<your storage account>", "storageAccountSasToken": "<sas token for storage account - generate it on the portal>", "sinksConfig": { "sink": [ { "name": "LinuxEH", "type": "EventHub", "sasURL": "https://<event hub namespace>.servicebus.windows.net/<event hub>?sr=xxxxxx&sig=yyyy&se=zzzz&skn=<policy name>" } ] } } 

Stream Analytics Setup
In the Stream Analytics job, configure an Event Hub input with the parameters:
Event serialization format = JSON, Encoding = UTF-8, Event compression type = None
This time, however, access to the logged data is not so easy. If we just execute select * from <input>, we will see something like this:


i.e. the data we need is hidden inside the JSON object stored in the PROPERTIES field.
But in Stream Analytics this problem can be solved elegantly (yes, I really like this thing :)), for example with this query:


 with events as (
     select UDF.to_json(properties.MSG) as obj from ehtest
 )
 select obj.* from events

where UDF.to_json is a user-defined function we wrote that converts a string into a JSON object; in the simplest case it just wraps JSON.parse (yes, you can also write functions there, in JavaScript ...).



As a result, we get easy access to log data.



Finally


I hope this article will be useful to someone.
It has already been of great benefit to me, because only by implementing practical cases can one really understand the capabilities and maturity of a particular technology.
If I have missed any scenarios, please write about it in the comments.



Source: https://habr.com/ru/post/341660/

