Logging with rsyslog, file names in tags, multi-line messages and fault tolerance

Task

Transfer log files to a central server:

If the server is unavailable, messages must not be lost: they accumulate on the client and are sent once the server is back on the network.
Multi-line messages must be transmitted correctly.
When new log files appear, reconfiguring the client must be enough; no changes to the server configuration should be required.
It must be possible to transfer all log files whose names match a pattern, with the server storing each one separately in a file of the same name.

Conditions: the infrastructure uses only Linux servers.

Choosing software

Why do I need a syslog server when there are Elastic Beats, Logstash, systemd-journal-remote and plenty of other shiny new technologies?

It is the standard for logging on POSIX-compatible systems.
Some software, haproxy for example, can log only to syslog, so you will not get rid of syslog completely anyway.
Network hardware uses it.
It is harder to set up than the alternatives, but richer in capabilities.
For example, Elastic Filebeat still cannot use inotify.
It needs little memory. After some tuning it can be used on embedded systems.
It lets you modify a message before saving or forwarding it.
This sounds like a strange requirement, but it is sometimes needed. For example, PCI DSS section 3.4 requires card numbers to be masked or encrypted if they are saved to disk. The subtlety is that if someone types a card number into the search box or the feedback form, you violate the standard as soon as the request lands in a log.

Observation: users try to enter their card number into any input field on the page, and are eager to report it to support together with the CVV.

Message format and legacy

TLDR: everything is bad

Syslog appeared in the 80s and quickly became the logging standard for Unix-like systems and network equipment. There was no formal standard; everyone wrote their software to be compatible with what already existed. In 2001 the IETF documented the state of affairs in RFC 3164 (status "informational"). Because implementations differ so much, the document states, among other things, that the payload of any IP packet sent to UDP port 514 must be treated as a syslog message. An attempt to standardize the format followed in RFC 3195, but it was unsuccessful: there is not a single live implementation of it at the moment. In 2009 RFC 5424 was adopted, defining structured messages, but hardly anyone uses it.

Here you can read what rsyslog author Rainer Gerhards thinks about all this. In practice, everyone still implements syslog in their own way, and the job of interpreting all this diversity falls to the syslog server. For example, rsyslog includes a special module to parse the format used by Cisco IOS, and for the worst cases, starting with version 5, you can define your own parsers.

On the wire, a syslog message looks like this:

<PRI>TIMESTAMP HOST TAG MSG

PRI is the priority, calculated as facility * 8 + severity.
- Facility (category) takes values from 0 to 23, which correspond to categories of system services: 0 - kernel, 2 - mail, 7 - news. The last eight, local0 to local7, are reserved for services that do not fall into the predefined categories. Full list.
- Severity takes values from 0 (emergency, the highest) to 7 (debug, the lowest). Full list.
TIMESTAMP is the time, usually in the form "Feb 6 18:45:01". Many implementations can also write it in ISO 8601 form, "2017-02-06T18:45:01.519832+03:00", which is more precise and includes the time zone.
HOST is the name of the host that generated the message.
TAG contains the name of the program that generated the message: no more than 32 alphanumeric characters, although in practice many implementations allow more. Any non-alphanumeric character ends the TAG and starts the MSG; a colon is the usual choice. Sometimes the TAG is followed by the process id in square brackets. Since [ and ] are not alphanumeric, the process id together with the brackets should formally be treated as part of the message. In practice, though, implementations treat it as part of the tag and take the message to be everything after the ": " characters.
MSG is the message itself. Because it is uncertain where the tag ends and the message begins, a space may get added to its start. It cannot contain newline characters: they are frame delimiters and would start a new message. Ways to send a multi-line message anyway:
- escaping: the receiver gets text with #012 in place of line breaks
- octet-counted TCP framing, as defined in RFC 5425 for syslog over TLS. Non-standard; only some implementations support it.
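
The priority arithmetic and the two ways of carrying line breaks described above can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example, and frame_octet_counted assumes the plain "length, space, message" layout of octet counting rather than reproducing any particular implementation.

```python
from typing import Tuple


def encode_pri(facility: int, severity: int) -> int:
    """Combine facility (0..23) and severity (0..7) into the PRI value."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return facility * 8 + severity


def decode_pri(pri: int) -> Tuple[int, int]:
    """Split a PRI value back into (facility, severity)."""
    return divmod(pri, 8)


def escape_newlines(message: str) -> str:
    """Escaping: line breaks travel as the literal text #012."""
    return message.replace("\n", "#012")


def frame_octet_counted(message: str) -> bytes:
    """Octet counting: the byte length, a space, then the raw message.

    The receiver reads exactly that many bytes, so the message
    may contain line breaks without ending the frame.
    """
    payload = message.encode("utf-8")
    return str(len(payload)).encode("ascii") + b" " + payload
```

For instance, local0 (16) with severity info (6) gives a PRI of 134, so such a message starts with <134>.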

Alternative to syslog protocol: RELP

If messages are sent between hosts running rsyslog, you can use RELP (Reliable Event Logging Protocol) instead of plain TCP syslog. It was created for rsyslog and is now supported by some other systems; Logstash and Graylog, in particular, understand it. It uses TCP as its transport and can optionally encrypt messages with TLS. It is more reliable than plain TCP syslog: messages are not lost when the connection breaks. It also solves the problem with multi-line messages.

Rsyslog configuration

Unlike syslog-ng, the other common alternative, rsyslog is compatible with the config format of the historical syslogd:

    auth,authpriv.*          /var/log/auth.log
    *.*;auth,authpriv.none   /var/log/syslog
    *.*                      @syslog.example.net

Since rsyslog can do far more than its predecessor, the config format was extended with additional directives that start with the $ sign:

    $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
    $WorkDirectory /var/spool/rsyslog
    $IncludeConfig /etc/rsyslog.d/*.conf

Starting with version 6, a C-like format called RainerScript appeared, which lets you define complex message-processing rules.

Since all this was added gradually and with compatibility with old configs in mind, a few unpleasant quirks resulted:

some plugins (I have not come across any yet) may not support the new RainerScript style and still need the old directives
settings made via old directives do not always work as expected with the new format:
- if the omfile module is invoked in the old format:
  auth,authpriv.* /var/log/auth.log
  then the owner and permissions of the resulting file are governed by the old directives $FileOwner, $FileGroup, $FileCreateMode. But if it is invoked as action(type="omfile" ...), those directives are ignored, and you have to set the action parameters or specify them when the module is loaded
- directives like $ActionQueueXXX configure only the queue of the first action after them; then the values are reset
semicolons are forbidden in some places and mandatory in others (the latter is rarer)

To avoid stumbling over these subtleties (yes, they are described in the documentation, but who reads all of it?), follow two simple rules:

for small, simple configs use the old format:
:programname, startswith, "haproxy" /var/log/haproxy.log
for complex message processing and fine-tuning of Actions always use RainerScript, and do not touch legacy directives like $DoSomething

More about the config format here.

Message handling

All messages come from an Input (there can be many of them) and are processed by the RuleSet attached to it. If none is attached explicitly, messages go to the default RuleSet. All message-processing directives that are not placed in separate RuleSet blocks belong to it. In particular, all directives in the traditional config format belong to it:
local7.* /var/log/myapp/my.log
A list of parsers is attached to an Input to parse messages. If none is specified explicitly, the list of parsers for the traditional syslog format is used.
A parser extracts properties from the message. The most used are:
- $msg - the message
- $rawmsg - the entire message before parser processing
- $fromhost, $fromhost-ip - DNS name and IP address of the sending host
- $syslogfacility, $syslogfacility-text - facility in numeric and text form
- $syslogseverity, $syslogseverity-text - the same for severity
- $timereported - the time from the message
- $syslogtag - the TAG field
- $programname - the TAG field with the process id cut off: named[12345] -> named
- the entire list can be found here
A RuleSet contains a list of rules; a rule consists of a filter and one or more Actions attached to it.
Filters are logical expressions over message properties. More about filters
Rules are applied one after another to a message that entered the RuleSet; processing does not stop at the first rule that matches.
To stop processing a message, use the special discard action: stop, or ~ in the legacy format.
Templates are often used inside an Action. They let you generate data for the Action from message properties, for example the format of a message sent over the network or the name of the file to write to. More about templates
Typically an Action uses an output module ("om...") or a message modification module ("mm..."). Here are some of them:
- omfile - output to a file
- omfwd - forwarding over the network via UDP or TCP
- omrelp - forwarding over the network via RELP
- ommysql, ompgsql, omoracle - writing to a database
- omelasticsearch - writing to Elasticsearch
- omamqp1 - forwarding via the AMQP 1.0 protocol
- the entire list of output modules

→ Learn more about the order of processing messages

Configuration examples

Write all messages of the auth and authpriv facilities to the /var/log/auth.log file and continue processing them:

    # legacy
    auth,authpriv.* /var/log/auth.log

    # new format (RainerScript)
    if ( $syslogfacility-text == "auth" or $syslogfacility-text == "authpriv" ) then {
        action(type="omfile" file="/var/log/auth.log")
    }

Write all messages whose program name starts with "haproxy" to the /var/log/haproxy.log file, without flushing the buffer to disk after each message, and stop further processing:

    # legacy (the minus before the file name disables syncing after each write)
    :programname, startswith, "haproxy" -/var/log/haproxy.log
    & ~

    # new format (RainerScript)
    if ( $programname startswith "haproxy" ) then {
        action(type="omfile" file="/var/log/haproxy.log" flushOnTXEnd="off")
        stop
    }

    # the two formats can be mixed
    if $programname startswith "haproxy" then -/var/log/haproxy.log
    &~

To check the config, run rsyslogd -N 1. More configuration examples: one, two.

Client: sending logs with saving file name

We will carry the file names in the TAG field. I would like the names to include directories, so that the files do not all end up in a single flat directory: haproxy/error.log. If the log is not read from a file but arrives as messages that an application sends to syslog, the application may refuse to put the / character into TAG, since that does not conform to the standard. So we encode slashes as double underscores and decode them back on the log server.
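
The encoding itself is a plain string substitution. A minimal Python sketch of the round trip follows; the function names are hypothetical, and it assumes that log names never contain a double underscore of their own, since such a name would not decode back to the original.

```python
def path_to_tag(path: str) -> str:
    """Client side: make a relative log path safe to carry in TAG."""
    return path.replace("/", "__")


def tag_to_path(tag: str) -> str:
    """Server side: restore the relative path from the received tag."""
    return tag.replace("__", "/")
```

With these, "haproxy/error.log" travels as "haproxy__error.log" and is restored to "haproxy/error.log" on the server.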

Create a template for sending logs over the network. We want to send messages with tags longer than 32 characters (our log names are long) and to transmit a timestamp that is more precise than the standard one and includes the time zone. In addition, the local variable $.suffix is appended to the log file name; why will become clear later. Local variables in RainerScript start with a dot. An undefined variable expands to an empty string.

    template(name="LongTagForwardFormat" type="string"
             string="<%PRI%>%TIMESTAMP:::date-rfc3339% %HOSTNAME% %syslogtag%%$.suffix%%msg:::sp-if-no-1st-sp%%msg%")

Now create a RuleSet that will be used to send logs over the network. It can be attached to an Input that reads files, or called like a function. Yes, rsyslog lets you call one RuleSet from another. To use RELP, you must first load the corresponding module.

    # http://www.rsyslog.com/doc/relp.html
    module(load="omrelp")

    ruleset(name="sendToLogserver") {
        action(type="omrelp" Target="syslog.example.net" Port="20514" Template="LongTagForwardFormat")
    }

Now create an Input that reads the log file and attach this RuleSet to it.

 input(type="imfile" File="/var/log/myapp/my.log" Tag="myapp/my.log" Ruleset="sendToLogserver")

Note that for each file it reads, rsyslog creates state files in its working directory (set by the $WorkDirectory directive). If rsyslog cannot create files there, the entire log file will be re-transmitted after rsyslog restarts.

If an application writes to the common syslog with a certain tag, and we want both to save those messages to a file and to send them over the network:

    # Template to output only message
    template(name="OnlyMsg" type="string" string="%msg:::drop-last-lf%\n")

    if( $syslogtag == 'nginx__access:') then {
        # write to file
        action(type="omfile" file="/var/log/nginx/access" template="OnlyMsg")
        # forward over network
        call sendToLogserver
        stop
    }

The final stop is needed to end processing of these messages; otherwise they would also fall into the common syslog. By the way, if an application can use a unix socket other than the standard /dev/log for syslog (nginx and haproxy can), you can use the imuxsock module to create a separate Input for that socket and attach the desired RuleSet to it, without having to pick the messages out of the general flow by tag.

Reading log files specified via wildcard

Interlude

Programmer: I cannot find the somevendor.log logs for the beginning of last month on the log server. Take a look, please.
Devops: Uh... do we even write such logs? You should have warned us. Anyway, logrotate has wiped everything older than a week, so if we did not save it, it is gone.
Programmer: (indignant outburst)

If an application writes many different logs, and new ones appear from time to time, updating the configs every time is inconvenient. I want to automate it. The imfile module can read files specified by a wildcard and save the file path in the message metadata. True, it stores the full path, and we need only the last component, which we have to extract. By the way, this is where the $.suffix variable comes in handy.

    input(type="imfile"
          File="/srv/myapp/logs/*.log"
          Tag="myapp__"
          Ruleset="myapp_logs"
          addMetadata="on")

    ruleset(name="myapp_logs") {
        # http://www.rsyslog.com/doc/v8-stable/rainerscript/functions.html
        # re_extract(expr, re, match, submatch, no-found)
        set $.suffix = re_extract($!metadata!filename, "(.*)/([^/]*)", 0, 2, "all.log");
        call sendToLogserver
    }
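
To see what the re_extract call in this config is meant to produce, here is a rough Python equivalent. It is only a sketch: log_suffix is a made-up name, and Python's re module stands in for the POSIX regular expressions that rsyslog uses, which behave the same way for this particular pattern.

```python
import re


def log_suffix(filename: str, default: str = "all.log") -> str:
    """Return the last path component, or the default if nothing matches.

    Mirrors re_extract(filename, "(.*)/([^/]*)", 0, 2, "all.log"):
    the first match is taken and its second submatch is returned.
    """
    match = re.search(r"(.*)/([^/]*)", filename)
    if match is None:
        return default
    return match.group(2)


def full_tag(tag: str, filename: str) -> str:
    """Tag as it goes over the network: the Input's Tag plus the suffix."""
    return tag + log_suffix(filename)
```

For the file /srv/myapp/logs/payments.log (a hypothetical name) the suffix is payments.log, so the transmitted tag becomes myapp__payments.log.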

Wildcards are supported only in the inotify mode of imfile (this is the default mode). Starting with version 8.25.0, wildcards are supported both in the file name and in the path: /var/log/*/*.log.

Multiline messages

For log files that contain multi-line messages, the imfile module offers three options:

readMode=1 - messages are separated by an empty line
readMode=2 - a new message starts at the beginning of a line, and continuation lines are indented. Stack traces often look like this
startmsg.regex - the start of a new message is detected by a regexp (POSIX Extended)

The first two options have problems in inotify mode, and if needed the third one easily replaces them with a suitable regexp. Reading multi-line logs has one subtlety. The marker of a new message is usually at its beginning, so we cannot be sure that the program has finished writing the previous message until the next one has started. Because of this, the last message might never be transmitted. To avoid that, we set readTimeout, after which the message is considered complete and is sent.

    input(type="imfile"
          File="/var/log/mysql/mysql-slow.log"
          # http://blog.gerhards.net/2013/09/imfile-multi-line-messages.html
          startmsg.regex="^# Time: [0-9]{6}"
          readTimeout="2"
          # no need to escape new line for RELP
          escapeLF="off"
          Tag="mysql__slow.log"
          Ruleset="sendToLogserver")
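
The grouping that startmsg.regex performs can be illustrated with a short Python sketch. It is not how imfile is implemented: group_messages is a hypothetical helper, Python's re replaces POSIX Extended regexps (this pattern is valid in both), and the final flush at the end of the input plays the role of readTimeout.

```python
import re
from typing import Iterable, List


def group_messages(lines: Iterable[str], start_regex: str) -> List[str]:
    """Join physical lines into logical messages.

    A line matching start_regex opens a new message; every other
    line is appended to the message currently being collected.
    """
    start = re.compile(start_regex)
    messages: List[str] = []
    current: List[str] = []
    for line in lines:
        if start.search(line) and current:
            # The next message has begun, so the previous one is complete.
            messages.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        # Nothing follows the last message; emit it anyway,
        # as readTimeout would after the configured delay.
        messages.append("\n".join(current))
    return messages
```

Fed the lines of a slow query log with the pattern "^# Time: [0-9]{6}", it returns one string per query entry, line breaks included.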

Server

On the server we need to accept the transferred logs and sort them into directories according to the IP of the sending host and the time of receipt: /srv/log/192.168.0.1/2017-02-06/myapp/my.log. Templates can also be used to set the log file name depending on the content of the message. The $.logpath variable has to be set inside the RuleSet before the template is used.

    template(name="RemoteLogSavePath" type="list") {
        constant(value="/srv/log/")
        property(name="fromhost-ip")
        constant(value="/")
        property(name="timegenerated" dateFormat="year")
        constant(value="-")
        property(name="timegenerated" dateFormat="month")
        constant(value="-")
        property(name="timegenerated" dateFormat="day")
        constant(value="/")
        property(name="$.logpath")
    }

Load the required modules and turn off $EscapeControlCharactersOnReceive; otherwise all line breaks in the received logs will be replaced with \n

    # Accept RELP messages from network
    module(load="imrelp")
    input(type="imrelp" port="20514" ruleset="RemoteLogProcess")

    # Default parameters for file output. Old-style global settings are not working with new-style actions
    module(load="builtin:omfile" FileOwner="syslog" FileGroup="adm" dirOwner="syslog"
           dirGroup="adm" FileCreateMode="0640" DirCreateMode="0755")

    # Module to remove 1st space from message
    module(load="mmrm1stspace")

    # http://www.rsyslog.com/doc/v8-stable/configuration/input_directives/rsconf1_escapecontrolcharactersonreceive.html
    # Print received LF as-it-is, not like '\n'. For multi-line messages
    # Default: on
    $EscapeControlCharactersOnReceive off

Now we create a RuleSet that parses the incoming logs and sorts them into folders. Services that log only via syslog expect it to record the message time. So logs arriving with the standard facilities are saved in syslog format, while for logs arriving with facilities local0-local7 we extract the log name from the TAG field and write only the message itself, without the other syslog fields. The problem with the space glued to the start of the message remains with RELP too, because it arises at the message parsing stage; we strip that space.

To increase performance, we write asynchronously, asyncWriting="on", and with a large buffer, ioBufferSize=64k. We do not flush the buffer after each received message, flushOnTXEnd="off", but we do flush it every second so that logs appear on the log server reasonably quickly: flushInterval="1".

    ruleset(name="RemoteLogProcess") {
        # For facilities local0-7 set log filename from $programname field: replace __ with /
        # Message has arbitrary format, syslog fields are not used
        if ( $syslogfacility >= 16 ) then {
            # Remove 1st space from message. Syslog protocol legacy
            action(type="mmrm1stspace")

            set $.logpath = replace($programname, "__", "/");
            action(type="omfile" dynaFileCacheSize="1024" dynaFile="RemoteLogSavePath" template="OnlyMsg"
                   flushOnTXEnd="off" asyncWriting="on" flushInterval="1" ioBufferSize="64k")

        # Logs with filename defined from facility
        # Message has syslog format, syslog fields are used
        } else {
            if (($syslogfacility == 0)) then {
                set $.logpath = "kern";
            } else if (($syslogfacility == 4) or ($syslogfacility == 10)) then {
                set $.logpath = "auth";
            } else if (($syslogfacility == 9) or ($syslogfacility == 15)) then {
                set $.logpath = "cron";
            } else {
                set $.logpath = "syslog";
            }
            # Built-in template RSYSLOG_FileFormat: High-precision timestamps and timezone information
            action(type="omfile" dynaFileCacheSize="1024" dynaFile="RemoteLogSavePath" template="RSYSLOG_FileFormat"
                   flushOnTXEnd="off" asyncWriting="on" flushInterval="1" ioBufferSize="64k")
        }
    } # ruleset
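
The routing decision made by this RuleSet, together with the RemoteLogSavePath template, can be restated as a small Python sketch. It only mirrors the logic for illustration; log_path and save_path are hypothetical names, and a date object stands in for the timegenerated property.

```python
from datetime import date


def log_path(facility: int, programname: str) -> str:
    """Pick the relative log path the way the RuleSet does."""
    if facility >= 16:
        # local0-local7: the file name was packed into the tag by the client
        return programname.replace("__", "/")
    if facility == 0:
        return "kern"
    if facility in (4, 10):
        return "auth"
    if facility in (9, 15):
        return "cron"
    return "syslog"


def save_path(fromhost_ip: str, received: date, facility: int, programname: str) -> str:
    """Assemble /srv/log/<sender IP>/<YYYY-MM-DD>/<log path>."""
    return "/srv/log/{}/{}/{}".format(
        fromhost_ip, received.isoformat(), log_path(facility, programname)
    )
```

A message from 192.168.0.1 with facility local0 and program name myapp__my.log, received on 2017-02-06, ends up in /srv/log/192.168.0.1/2017-02-06/myapp/my.log.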

Reliable message delivery. Queues

(Image from the blog k-max.name)

The execution of some Actions, such as sending logs over the network or writing to a database, can occasionally slow down or stall. To avoid losing messages and holding up the Actions that follow, you can use queues. Every Action always has a message queue associated with it; by default it is a zero-size Direct Queue. There is also a main queue for messages arriving from all Inputs, and it can be configured as well.

Queue types: disk, in-memory, and the most interesting one, combined: Disk-Assisted Memory Queues. Such a queue works in memory and starts using the disk when the in-memory queue fills up or when unsent messages must survive a service restart. Messages start being written to disk when the number of messages in the queue reaches queue.highwatermark, and stop being written to disk when it drops to queue.lowwatermark. For unsent messages to be saved to disk across a service restart, you must set queue.saveonshutdown="on".

If sending logs over the network or writing to the database fails, the Action is suspended. rsyslog tries to resume the Action at intervals that grow with each attempt. For sending to restart soon after the problem is fixed, set action.resumeRetryCount="-1" (unlimited) and a shorter suspension interval: action.resumeInterval="10". Read more about Action parameters.

RuleSet on the client with the queue will look like this:

    ruleset(name="sendToLogserver") {
        # Queue: http://www.rsyslog.com/doc/v8-stable/concepts/queues.html#disk-assisted-memory-queues
        # Disk-Assisted Memory Queue: queue.type="LinkedList" + queue.filename
        # queue.size - max elements in memory
        # queue.highwatermark - when to start saving to disk
        # queue.lowwatermark - when to stop saving to disk
        # queue.saveonshutdown - save on disk between rsyslog shutdown
        # action.resumeRetryCount - number of retries for action, -1 = eternal
        # action.resumeInterval - interval to suspend action if destination can not be connected
        # After each 10 retries, the interval is extended: (numRetries / 10 + 1) * Action.ResumeInterval
        action(type="omrelp" Target="syslog.example.net" Port="20514" Template="LongTagForwardFormat"
               queue.type="LinkedList" queue.size="10000" queue.filename="q_sendToLogserver"
               queue.highwatermark="9000" queue.lowwatermark="50" queue.maxdiskspace="500m"
               queue.saveonshutdown="on" action.resumeRetryCount="-1"
               action.reportSuspension="on" action.reportSuspensionContinuation="on"
               action.resumeInterval="10")
    }
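
The formula in the last comment of this config is easy to tabulate. The sketch below simply evaluates it in Python; resume_interval is a made-up name, and treating the division as integer division is an assumption based on the comment's "after each 10 retries" wording.

```python
def resume_interval(num_retries: int, base_interval: int = 10) -> int:
    """Seconds to wait before the next resume attempt.

    Evaluates (numRetries / 10 + 1) * Action.ResumeInterval,
    assuming the division is an integer one.
    """
    return (num_retries // 10 + 1) * base_interval


if __name__ == "__main__":
    for retries in (0, 9, 10, 25, 100):
        print(retries, resume_interval(retries))
```

Under that assumption and with action.resumeInterval="10", the first ten attempts are 10 seconds apart, the next ten 20 seconds apart, and so on, so a server that has been down for a long time is picked up again with a correspondingly longer delay.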

Now you can safely reboot the log server: messages are held in the queue and will be transmitted once the server is available again.

ATTENTION: when messages from the queue are sent after the network is restored, their relative order may be disturbed (thanks to zystem for the comment). The author of rsyslog replied that this is expected behavior; you can read more here: http://www.gerhards.net/download/LinuxKongress2010rsyslog.pdf (section 7, "Concurrency-related Optimizations"). In short: trying to preserve strict message order while a queue is processed by multiple threads hurt performance because of the thread locking it required, and the notion of a strict message sequence may be meaningless for some types of transport and for multi-threaded message generators and receivers.

Fault tolerance

You can configure an Action to run only if the previous Action is suspended: description. This makes failover configurations possible. Some Actions use transactions to increase performance. In that case success or failure becomes known only once the transaction completes, when the messages have already been processed. This can lead to some messages being lost without the failover Action being invoked. To prevent this, set the parameter queue.dequeuebatchsize="1" (default 16), which may reduce performance.

    ruleset(name="sendToLogserver") {
        action(type="omrelp" Target="syslog1.example.net" Port="20514" Template="LongTagForwardFormat")
        action(type="omrelp" Target="syslog2.example.net" Port="20514" Template="LongTagForwardFormat"
               action.execOnlyWhenPreviousIsSuspended="on" queue.dequeuebatchsize="1")
    }

I have not tried this feature in production yet.

Interaction with logrotate

Logs that rsyslog writes

They are rotated in the usual way, by renaming: smth.log is renamed to smth.log.1 and a new smth.log is created. In the post-rotate action you need to send SIGHUP to the rsyslogd process. Note: on SIGHUP rsyslog closes its output files and reopens them when the next message is written.

    /var/log/someapp/*.log {
        weekly
        missingok
        rotate 5
        create 0644 syslog adm
        sharedscripts
        postrotate
            test -s /run/rsyslogd.pid && kill -HUP $(cat /run/rsyslogd.pid)
            # postrotate script should always return 0
            true
        endscript
    }

Logs that rsyslog reads

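If the application reopens its log file on rotation (on SIGHUP or by some other means), nothing special is needed: rsyslog notices that the inode has changed and starts reading the new file.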

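If logrotate uses copytruncate, smth.log is copied to smth.log.1 and then truncated. Starting with version 8.16.0, imfile has the reopenOnTruncate option (default "off"; set it to "on"), which makes rsyslog reopen the file when it has been truncated (the inode is unchanged, but the file has become smaller). The documentation calls the option "experimental". Before 8.16.0, with copytruncate you have to send SIGHUP to rsyslogd in the post-rotate action.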

Note: on Debian/Ubuntu, logrotate is run by cron once a day; see /etc/cron.daily/logrotate.

Summary

All of the above assumes rsyslog v8. For Ubuntu it is available from the ppa adiscon/v8-stable; packages for CentOS/RHEL are available as well.

UPD: added the note about message order, thanks to zystem.

UPD2: added the section on interaction with logrotate.

Source: https://habr.com/ru/post/321262/
