
Collecting and analyzing logs with Fluentd


Every system administrator has to deal with collecting and analyzing logs in their daily work. The collected logs need to be stored: they may be required for a variety of purposes, from debugging programs and investigating incidents to assisting technical support. In addition, it must be possible to search through the entire body of collected data.


Organizing the collection and analysis of logs is not as simple as it might seem at first glance. To begin with, you have to aggregate the logs of different systems that may have nothing in common with each other. It is also highly desirable to tie the collected data to a single timeline in order to trace the connections between events. Finally, implementing search over the logs is a separate and difficult problem.
Over the past few years, interesting software tools have appeared that solve the problems described above. Services that store and process logs online are becoming increasingly popular: Splunk, Loggly, Papertrail, Logentries, and others. Their undeniable advantages are a user-friendly interface and a low cost of use (their basic free plans provide quite decent capabilities). However, when working with large volumes of logs they often fail to cope with the tasks assigned to them, and using them for large amounts of data is frequently unprofitable from a purely financial point of view.

A much more attractive option is to deploy a standalone solution. We started thinking about this when we faced the need to collect and analyze the logs of our cloud storage.

We began looking for a suitable tool and settled on Fluentd, an interesting solution with rather broad functionality about which there are almost no detailed publications in Russian. This article describes the capabilities of Fluentd in detail.

General information



Fluentd was developed in 2011 by Sadayuki Furuhashi, co-founder of Treasure Data (the company is one of the project's sponsors). It is written in Ruby. Fluentd is actively developed and improved (see the repository on GitHub, where new commits appear every few days).

Fluentd users include such famous companies as Nintendo, Amazon, Slideshare, and others.
Fluentd collects logs from various sources and sends them to other applications for further processing. Schematically, the process of collecting and analyzing logs using Fluentd can be represented as follows:

(diagram: collecting and analyzing logs with Fluentd)

The main advantages of Fluentd are the following:



Fluentd is distributed free of charge under the Apache 2.0 license. The project is documented in sufficient detail: the official website and blog offer a lot of useful educational material.

Installation



In this article, we describe the installation procedure for Ubuntu OS 14.04. Installation instructions for other operating systems can be found here.

Fluentd is installed and initially configured as the td-agent package (the stable distribution of Fluentd maintained by Treasure Data). Run the following commands:

 $ wget http://packages.treasuredata.com/2/ubuntu/trusty/pool/contrib/t/td-agent/td-agent_2.0.4-0_amd64.deb

 $ sudo dpkg -i td-agent_2.0.4-0_amd64.deb


When installation is complete, run Fluentd:

 $ /etc/init.d/td-agent restart
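
To make sure the agent is working, you can send a test event over HTTP and check the agent's own log. This sketch assumes the default td-agent.conf shipped with the package, which opens an HTTP source on port 8888 and writes events tagged debug.** to the agent's log:

 # check that the td-agent service is running
 $ /etc/init.d/td-agent status

 # send a test event (tag debug.test) to the built-in HTTP source
 $ curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test

 # the event should appear at the end of the agent's own log
 $ tail -n 1 /var/log/td-agent/td-agent.log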


Configuration



The principle of operation of Fluentd is as follows: it collects data from various sources, checks it against certain criteria, and then sends it to the specified destinations for storage and further processing. Visually, all of this can be represented by the following scheme:

(diagram: Fluentd data processing scheme)

The settings for Fluentd (what data to collect and from where, what criteria it must meet, and where to send it) are written in the configuration file /etc/td-agent/td-agent.conf, which is built from the following blocks: source, match, include, and system.



Consider the structure and content of these blocks in more detail.

Source: where to get the data



The source block describes where data comes from. Fluentd can receive data from a wide variety of sources: application logs written in various programming languages (Python, PHP, Ruby, Scala, Go, Perl, Java), database logs, logs from various hardware devices, data from monitoring utilities, and so on. A full list of possible data sources is available here. Specialized plugins are used to connect sources.
The standard plugins include http (used to receive HTTP messages) and forward (used to receive TCP packets). You can use both of these plugins at the same time.
Example:

 # Receive events on port 24224/tcp
 <source>
   type forward
   port 24224
 </source>

 # http://this.host:9880/myapp.access?json={"event":"data"}
 <source>
   type http
   port 9880
 </source>


As can be seen from the above example, the type of the plugin is specified in the type directive, and the port number is specified in the port directive.
The number of data sources is unlimited. Each data source is described in a separate <source> block.
All events received from sources are sent to the message router. Each event has three attributes: tag, time and record. Based on the tag attribute, a decision is made about where the events should be redirected (more on this later in this article). The time attribute indicates the time (this is done automatically), and the record attribute specifies the data in JSON format.
Here is an example of the event description:

 # generated by http://this.host:9880/myapp.access?json={"event":"data"}
 tag: myapp.access
 time: (current time)
 record: {"event": "data"}


Match: what to do with the data



The match section specifies which events should be selected for further processing and what should be done with them. Specialized plugins are used for this as well.

The standard output plugins include file and forward:
 # We receive events from port 24224
 <source>
   type forward
   port 24224
 </source>

 # http://this.host:9880/myapp.access?json={"event":"data"}
 <source>
   type http
   port 9880
 </source>

 # Take events tagged "myapp.access"
 # and save them in the file /var/log/fluent/access.%Y-%m-%d
 # data can be broken into chunks using the time_slice_format option.

 <match myapp.access>
   type file
   path /var/log/fluent/access
 </match>


The above fragment indicates that all events tagged myapp.access should be saved to a file whose path is specified in the path directive. Note that events whose tags contain additional parts beyond myapp.access (for example, myapp.access.log) will not match this pattern and will not be written to the file.
Briefly consider the syntax features of the match directive:
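
As a hedged sketch (the tag names and paths are purely illustrative), the most common patterns look like this:

 # * matches exactly one tag part: myapp.access, but not myapp.access.log
 <match myapp.*>
   type file
   path /var/log/fluent/myapp
 </match>

 # ** matches zero or more tag parts: myapp, myapp.access, myapp.access.log
 <match myapp.**>
   type stdout
 </match>

 # {a,b} matches any of the listed patterns; several patterns
 # can also be listed separated by spaces
 <match {apache,nginx}.access>
   type stdout
 </match>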



Fluentd checks events against tag patterns in the order in which the match blocks appear in the configuration file: more specific patterns must come first, followed by more general ones. If this rule is violated, Fluentd will not work correctly. Thus, a fragment of the form

 <match **>
   type blackhole_plugin
 </match>

 <match myapp.access>
   type file
   path /var/log/fluent/access
 </match>



contains an error: the most general pattern comes first (<match **> matches events with any tag), and the more specific one only afterwards. As a result of this mistake, events tagged myapp.access will never reach the file output at all. For everything to work as it should, the fragment should look like this:

 <match myapp.access>
   type file
   path /var/log/fluent/access
 </match>

 <match **>
   type blackhole_plugin
 </match>


Include: merge configuration files



Directives from other configuration files can be imported and merged. This is done with the include directive:

 include config.d/*.conf


In this directive, you can specify the path to one or more files with a mask or URL:

 # absolute file path
 include /path/to/config.conf

 # you can also specify a relative path
 include extra.conf

 # mask
 include config.d/*.conf

 # http
 include http://example.com/fluent.conf


System: additional settings



In the system block you can set additional options: for example, the logging level (for more details, see here), or enable and disable suppression of repeated identical records in the log, etc.
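
A minimal sketch of such a block (the parameter names follow the Fluentd documentation; the values are examples):

 <system>
   # verbosity of Fluentd's own log: trace, debug, info, warn, error, fatal
   log_level error
   # collapse repeated identical messages in Fluentd's own log
   suppress_repeated_stanza true
 </system>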

Supported data types



Each plugin for Fluentd has a specific set of parameters. Each parameter in turn is associated with a specific data type.
Here is a list of data types supported in Fluentd:
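
As an illustration, here is a hedged sketch of an http source whose parameters use different types (an integer, a string, a size with a k/m/g suffix, and a time interval with an s/m/h/d suffix); the values themselves are examples:

 <source>
   # type and bind are strings, port is an integer
   type http
   port 9880
   bind 0.0.0.0
   # body_size_limit is a size: a number with a k / m / g suffix
   body_size_limit 32m
   # keepalive_timeout is a time: a number with an s / m / h / d suffix
   keepalive_timeout 10s
 </source>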



Fluentd Plugins: Expanding Features



Fluentd uses five types of plugins: input plugins, output plugins, buffering plugins, formatting plugins, and parsing plugins.

Input plugins



Input plugins are used to retrieve data from external sources. Typically, such a plugin creates a thread and a listening socket. A plugin can also be configured to poll an external source for data at a certain interval.
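
As an illustration, here is a hedged sketch using the widely used tail input plugin, which follows a log file much like tail -f (the paths and tag are examples):

 <source>
   type tail
   # the file to follow and the position file that remembers
   # how far it has been read between restarts
   path /var/log/nginx/access.log
   pos_file /var/log/td-agent/nginx-access.pos
   tag nginx.access
   # the built-in nginx format parses standard access-log lines
   format nginx
 </source>
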
Input plugins include:


Output plugins



Output plugins are divided into three groups:



Non-buffering plugins include:
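
For example, here is a hedged sketch using the non-buffered copy plugin, which duplicates every event to each of its <store> blocks (the tag and path are examples):

 <match myapp.access>
   type copy
   <store>
     type file
     path /var/log/fluent/access
   </store>
   <store>
     type stdout
   </store>
 </match>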



Buffered output plugins include:


Buffering plugins



Buffering plugins serve as helpers for output plugins that use buffers. This group of plugins includes:
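
A hedged sketch of a buffered forward output backed by the buf_file plugin: chunks are accumulated on disk, so data queued for sending survives a restart of td-agent (the paths, tag, and server address are examples):

 <match myapp.**>
   type forward
   # keep buffer chunks on disk instead of in memory
   buffer_type file
   buffer_path /var/log/td-agent/buffer/myapp
   # maximum size of a single chunk and how often chunks are flushed
   buffer_chunk_limit 8m
   flush_interval 10s
   <server>
     host 192.168.0.10
     port 24224
   </server>
 </match>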

More details on how buffering plugins work can be found here .

Formatting plugins



With the help of formatting plugins, you can change the format of the data obtained from the logs. Standard plugins in this group include:
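
For example, the file output writes records in its own tab-separated format by default; with a formatter this can be changed, in a hedged sketch, to one JSON document per line (the tag and path are examples):

 <match myapp.access>
   type file
   path /var/log/fluent/access
   # use the json formatting plugin instead of the default out_file format
   format json
 </match>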



More details about formatting plugins can be found here .

Parsing plugins



These plugins are used to parse specific input data formats in cases where this cannot be done using standard tools. Detailed information about parsing plugins can be found here .
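
As a hedged sketch, a custom log format can be parsed by passing a regular expression to the tail plugin; named captures become fields of the record, and the time field is interpreted according to time_format (the paths, tag, and log layout are examples):

 <source>
   type tail
   path /var/log/myapp/app.log
   pos_file /var/log/td-agent/myapp-app.pos
   tag myapp.custom
   # e.g. "2015-02-17T12:00:00 WARN something happened"
   format /^(?<time>[^ ]+) (?<level>[^ ]+) (?<message>.*)$/
   time_format %Y-%m-%dT%H:%M:%S
 </source>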

Naturally, not all Fluentd plugins can be described within the scope of an overview article: that is a topic for a separate publication. A complete list of the plugins that exist to date can be found on this page.

General options for all plugins



The following parameters are also specified for all plugins:



The official Fluentd website offers ready-made configuration files adapted to different usage scenarios (see here).

Data output and integration with other solutions



Data collected by Fluentd can be sent for storage and further processing to databases (MySQL, PostgreSQL, CouchBase, CouchDB, MongoDB, OpenTSDB, InfluxDB; see also the article about integrating Fluentd with Hadoop), cloud services (AWS, Google BigQuery), and search tools (Elasticsearch, Splunk, Loggly).
The Fluentd + Elasticsearch + Kibana bundle is often used to provide search and visualization of the data (for detailed installation and configuration instructions see, for example, here).
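
On the Fluentd side such a bundle might look like the following hedged sketch; it requires the third-party fluent-plugin-elasticsearch plugin (typically installed with td-agent-gem), and the host and tag are examples:

 # $ sudo td-agent-gem install fluent-plugin-elasticsearch
 <match myapp.**>
   type elasticsearch
   host localhost
   port 9200
   # logstash_format creates daily indices that Kibana picks up by default
   logstash_format true
   flush_interval 10s
 </match>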

A full list of services and tools to which Fluentd can send data is available here. The official website also contains instructions for using Fluentd in conjunction with other solutions.

Conclusion



In this article, we have presented an overview of the capabilities of the Fluentd log collector, which we use to solve our own tasks. If you are interested and would like to try Fluentd in practice, we have prepared an Ansible role that helps simplify installation and deployment.

Many of you have probably faced the problem of collecting and analyzing logs. It would be interesting to know what tools you use to solve it and why you chose them. And if you use Fluentd, share your experience in the comments.
If for one reason or another you cannot leave comments here - welcome to our blog .

Source: https://habr.com/ru/post/250969/

