
Collecting, parsing, and shipping logs with Logstash

Greetings.

It so happened that I have had to spend a lot of time with logs: taking part in developing rules and policies for collecting, storing, and using logs, as well as analyzing various incidents and detecting anomalies. During the day our programs, services, and servers generate a VERY large number of logs, and the need to dig through them keeps growing.
I have worked with commercial log-management products such as ArcSight, RSA Envision, and Q1 Labs. These products have both advantages and disadvantages, but this article is not about them.
It is about Logstash.

What is Logstash? Why is it needed? What does it do?

Logstash is a tool for collecting, filtering, and normalizing logs. It is a free and open-source application, licensed under the Apache License 2.0.
My first acquaintance with LS (Logstash) happened more than a year ago, and since then we have become very close. I like its idea and its capabilities. For me, Logstash is like a meat grinder: it doesn't matter what goes in, because after a few simple manipulations the output is always neatly and clearly structured information.

The format of the Logstash configuration file is simple and straightforward. It consists of three parts:

 input { ... }
 filter { ... }
 output { ... }

There can be any number of input, filter, and output blocks; it all depends on your needs and on your hardware.
Logstash ignores blank lines and lines beginning with #, so commenting the configuration files causes no problems.
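For illustration, a minimal commented configuration might look like this (the file path and type name here are purely hypothetical):

```
# this whole line is ignored by Logstash
input {
  file {
    type => "demo"                   # inline comments work as well
    path => [ "/var/log/demo.log" ]  # hypothetical path
  }
}
```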

1. INPUT

This block is the entry point for logs: it defines the channels through which logs get into Logstash.
In this article I will introduce the basic input types I use: file, tcp, and udp.
1.1 file
Configuration example, for working with local log files:
 input {
   file {
     type => "some_access_log"
     path => [ "/var/log/vs01/*.log", "/var/log/vs02/*.log" ]
     exclude => [ "*.gz", "*.zip", "*.rar" ]
     start_position => "end"
     stat_interval => 1
     discover_interval => 30
   }
 }


Line by line description of settings:
 type => "some_access_log" 
the type / description of the log. When using multiple input blocks, it is convenient to label them differently so they can be told apart in subsequent filter or output actions.

 path => [ "/var/log/vs01/*.log", "/var/log/vs02/*.log" ] 
specifies the paths to the log files to be processed. A path must be absolute (/path/to/logs/), not relative (../../some/other/path/).

 exclude => [ "*.gz", "*.zip", "*.rar" ] 
excludes files with appropriate extensions from processing.

 start_position => "end" 
wait for new messages at the end of the file. When processing logs that already exist, set this to "beginning"; the files will then be read line by line from the start.

 stat_interval => 1 
how often (in seconds) to check files for changes. Larger values reduce the frequency of system calls but increase the delay before new lines are picked up.

 discover_interval => 30 
time (in seconds) after which the list of processed files specified in the path will be updated.

1.2 tcp
Configuration example for working with remote service logs:
 input {
   tcp {
     type => "webserver_prod"
     data_timeout => 10
     mode => "server"
     host => "192.168.3.12"
     port => 3337
   }
 }


Line by line description of settings:
 type => "webserver_prod" 
type / description of the log.

 data_timeout => 10 
the time (in seconds) after which an inactive tcp connection is closed. A value of -1 keeps the connection open indefinitely.

 mode => "server"
 host => "192.168.3.12"
 port => 3337
in this case Logstash acts as the server and listens on 192.168.3.12:3337. With mode => "client", Logstash connects to the remote ip:port to collect logs.
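For comparison, a client-mode input might be sketched as follows (the address and port are taken from the example above and are purely illustrative):

```
input {
  tcp {
    type => "webserver_prod"
    mode => "client"          # connect out instead of listening
    host => "192.168.3.12"    # remote side that serves the logs
    port => 3337
  }
}
```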

1.3 udp
For udp, the settings are similar to tcp:
 input {
   udp {
     type => "webserver_prod"
     buffer_size => 4096
     host => "192.168.3.12"
     port => 3337
   }
 }

2. FILTER

This block configures the main log manipulations: splitting by key = value, removing unnecessary fields, replacing existing values, and applying geoip or DNS lookups to IP addresses and host names.

At first glance, the use of filters may seem complicated and illogical, but this is not entirely true.
2.1 grok
An example of a configuration file for basic log normalization:
 filter {
   grok {
     type => "some_access_log"
     patterns_dir => "/path/to/patterns/"
     pattern => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
   }
 }


Line by line description of settings:
 type => "some_access_log" 
the type / description of the log. Specify here the type that was registered in the input block; only events of that type will be processed.

 patterns_dir => "/path/to/patterns/" 
the path to the directory containing log-processing patterns. Logstash loads every file in the specified folder, so stray files there are undesirable.

 pattern => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" 
specifies the pattern used to parse the logs. A pattern can be written inline in the configuration file or loaded from a pattern file. To avoid confusion, I create a separate pattern file for each log format.

More about patterns
Using the grok filter you can structure most logs written in a fixed format: syslog, apache, nginx, mysql, and so on.
Logstash ships with over 120 ready-made regular-expression patterns, so writing filters for most logs should cause neither fear nor confusion.

The pattern format is relatively simple: NAME PATTERN, i.e. the pattern name and its regular expression, one pair per line. Example:
 NUMBER \d+
 WORD \b\w+\b
 USERID [a-zA-Z0-9_-]+

Any previously defined pattern can be reused:
 USER %{USERID} 

Patterns can also be combined:
 CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
 WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
 MAC (?:%{CISCOMAC}|%{WINDOWSMAC})


Suppose we have the following log format:
55.3.244.1 GET /index.html 15824 0.043

Fortunately, the ready-made patterns already include the regular expressions we need, so there is no need to reinvent a wheeled vehicle driven by human muscle power through foot pedals or hand levers (I mean the bicycle, if that wasn't clear).
For this log example it is enough to write the pattern "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}", after which all logs in this format acquire a definite logical structure.
After processing, our line will look like this:
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043


A list of ready-made grok patterns can be found here.



2.2 mutate
An example of a configuration file for changing / deleting entries from logs:
 filter {
   mutate {
     type => "apache_access"
     remove => [ "client" ]
     rename => [ "HOSTORIP", "client_ip" ]
     gsub => [ "message", "\\/", "_" ]
     add_field => [ "sample1", "from %{clientip}" ]
   }
 }

Line by line description of settings:
 type => "apache_access" 
the type / description of the log. Specifies the log type (set in the input block) to which this processing applies.

 remove => [ "client" ] 
removes all data in the field named client. Several field names may be listed.

 rename => [ "HOSTORIP", "client_ip" ] 
renames the HOSTORIP field to client_ip.

 gsub => [ "message", "\\/", "_" ] 
replaces every "/" with "_" in the message field.

 add_field => [ "sample1", "from %{clientip}" ] 
adds a new field sample1 with the value "from %{clientip}". Variable names may be used.

2.3 date
Sample configuration file:
 filter {
   date {
     type => "apache_access"
     match => [ "timestamp", "MMM dd HH:mm:ss" ]
   }
 }

Line by line description of settings:
 type => "apache_access" 
the type / description of the log. Specifies the log type (set in the input block) to which this processing applies.

 match => [ "timestamp", "MMM dd HH:mm:ss" ] 
the timestamp field and its format. An important setting for later sorting or selecting logs. If the time in the logs is given as a unix timestamp (as with squid), use match => [ "timestamp", "UNIX" ].
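As a sketch, a filter for squid-style unix timestamps could look like this (the type name is illustrative):

```
filter {
  date {
    type => "squid_access"            # hypothetical type from an input block
    match => [ "timestamp", "UNIX" ]  # time given in epoch seconds
  }
}
```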

2.4 kv
An example of a configuration file for processing logs in the format key = value:
 filter {
   kv {
     type => "custom_log"
     value_split => "=:"
     fields => ["reminder"]
     field_split => "\t?&"
   }
 }

Line by line description of settings:
 type => "custom_log" 
the type / description of the log. Specifies the log type (set in the input block) to which this processing applies.

 value_split => "=:" 
use the "=" and ":" characters to split keys from values.

 fields => ["reminder"] 
the names of the fields in which to look for key=value pairs. By default, the entire log line is split.

 field_split => "\t?&" 
use the "\t", "?" and "&" characters to separate pairs (\t is the tab character).
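To illustrate, with these settings a hypothetical event whose reminder field contains the line below would be split roughly as follows:

```
# reminder field (pairs separated by "&", values after "=" or ":"):
#   src=10.0.0.1&dst:10.0.0.2
# after the kv filter the event gains the fields:
#   src => "10.0.0.1"
#   dst => "10.0.0.2"
```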

2.5 multiline
An example of a configuration file for “gluing” multi-line logs (for example, Java stack trace):
 filter {
   multiline {
     type => "java_log"
     pattern => "^\s"
     what => "previous"
   }
 }

Line by line description of settings:
 type => "java_log" 
the type / description of the log. Specifies the log type (set in the input block) to which this processing applies.

 pattern => "^\s" 
the regular expression to match; here, a line that begins with whitespace.

 what => "previous" 
when the pattern matches, the line is appended to the previous line.
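As an illustration, with this configuration a (hypothetical) Java stack trace would be glued into a single event:

```
# original lines:
#   java.lang.NullPointerException
#           at com.example.Foo.bar(Foo.java:42)
#           at com.example.Main.main(Main.java:10)
# the indented "at ..." lines match ^\s, so each is appended
# to the previous line and the whole trace becomes one event
```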



3. OUTPUT

The name of this block speaks for itself: it defines the settings for outgoing messages. As with the previous blocks, any number of output sub-blocks can be specified, depending on your needs.
3.1 stdout
An example of a configuration file for outputting logs to standard output:
 output {
   stdout {
     type => "custom_log"
     message => "IP - %{clientip}. Full message: %{@message}. End of line."
   }
 }


 type => "custom_log" 
type / description of the log.

 message => "IP - %{clientip}. Full message: %{@message}. End of line." 
indicates the format of the outgoing message. It is acceptable to use variables after grok-filtering.

3.2 file
An example of a configuration file for writing logs to a file:
 output {
   file {
     type => "custom_log"
     flush_interval => 5
     gzip => true
     path => "/var/log/custom/%{clientip}/%{type}"
     message_format => "ip: %{clientip} request:%{requri}"
   }
 }


 type => "custom_log" 
type / description of the log.

 flush_interval => 5 
the interval (in seconds) at which outgoing messages are flushed to the file. With a value of 0, every message is written immediately.

 gzip=> true 
the outgoing message file will be compressed with gzip.

 path => "/var/log/custom/%{clientip}/%{type}" 
the path and file name where outgoing messages are saved. Variables may be used: in this example a separate folder is created for each unique IP address, and messages are written to a file named after the %{type} variable.

 message_format => "ip: %{clientip} request:%{requri}" 
outgoing message format.

3.3 elasticsearch
An example of a configuration file for writing logs to the Elasticsearch database:
 output {
   elasticsearch {
     type => "custom_log"
     cluster => "es_logs"
     embedded => false
     host => "192.168.1.1"
     port => "19300"
     index => "logs-%{+YYYY.MM.dd}"
   }
 }


 type => "custom_log" 
type / description of the log.

 cluster => "es_logs" 
the name of the cluster specified in cluster.name in the Elasticsearch configuration file.

 embedded => false 
specifies whether to use the embedded Elasticsearch instance or an external one.

 port => "19300" 
the Elasticsearch transport port.

 host => "192.168.1.1" 
the Elasticsearch IP address.

 index => "logs-%{+YYYY.MM.dd}" 
index name where logs will be recorded.

3.4 email
This plugin can be used for alerting. The downside is that any change to the notifications (as, in principle, with any other settings) requires restarting Logstash, although the developer says this may become unnecessary in the future.
Sample configuration file:
 output {
   email {
     type => "custom_log"
     from => "logstash@domain.com"
     to => "admin1@domain.com"
     cc => "admin2@domain.com"
     subject => "Found '%{matchName}' Alert on %{@source_host}"
     body => "Here is the event line %{@message}"
     htmlbody => "<h2>%{matchName}</h2><br/><br/><h3>Full Event</h3><br/><br/><div align='center'>%{@message}</div>"
     via => "sendmail"
     options => [ "smtpIporHost", "smtp.gmail.com",
                  "port", "587",
                  "domain", "yourDomain",
                  "userName", "yourSMTPUsername",
                  "password", "PASS",
                  "starttls", "true",
                  "authenticationType", "plain",
                  "debug", "true" ]
     match => [ "response errors", "response,501,,or,response,301",
                "multiple response errors", "response,501,,and,response,301" ]
   }
 }


 type => "custom_log" 
type / description of the log.

 from => "logstash@domain.com"
 to => "admin1@domain.com"
 cc => "admin2@domain.com"
if you have had the strength to read this far, you can work out the meaning of these three settings yourself :)

 subject => "Found '%{matchName}' Alert on %{@source_host}" 
the subject line of the notification email. Variables may be used; for example, %{matchName} is the name of the matched condition from the match setting.

 body => "Here is the event line %{@message}"
 htmlbody => "<h2>%{matchName}</h2><br/><br/><h3>Full Event</h3><br/><br/><div align='center'>%{@message}</div>"
body of the letter.

 via => "sendmail" 
the way the letter is sent; one of two options: smtp or sendmail.

 options => ... 
standard mail settings.

 match => [ "response errors", "response,501,,or,response,301", "multiple response errors", "response,501,,and,response,301" ] 
"response errors" is the name of the alert (stored in the %{matchName} variable). "response,501,,or,response,301" is the trigger criterion: in this example the alert fires if the response field contains the value 501 or 301. The second entry uses AND logic, i.e. both conditions must be met.



4. Summary


Create a habr.conf file:
 input {
   tcp {
     type => "habr"
     port => "11111"
   }
 }
 filter {
   mutate {
     type => "habr"
     add_field => [ "habra_field", "Hello Habr" ]
   }
 }
 output {
   stdout {
     type => "habr"
     message => "%{habra_field}: %{@message}"
   }
 }


Launch Logstash:
java -jar logstash-1.1.9-monolithic.jar agent -f ./habr.conf

Verify that Logstash is running:
# netstat -nat | grep 11111
If port 11111 is listed, Logstash is ready to receive logs.

In the new terminal window we write:
echo "Logs are cool!" | nc localhost 11111

See the result in the window where Logstash is running. If a secret message appeared there, then everything works.

P.S. The latest version of Logstash can be downloaded here.

Thanks for your attention,

Source: https://habr.com/ru/post/165059/

