
Graylog2 incremental setup



In the first article of this series, I explained how and why we chose the open-source Graylog2 for the centralized collection and viewing of logs in the company. This time I will share how we deployed Graylog in production and what problems we encountered.

Let me remind you that the cluster will be located at a hosting provider, logs will be collected from all over the world via TCP, and the average volume of logs will be about 1.2 TB/day under normal conditions.
We currently use CentOS 7 and Graylog 2.2, so all configurations and options will be described exclusively for these versions (in Graylog 2.2 and Graylog 2.3, a number of options are different).

Placement planning


According to our calculations, we need 6 servers. Each server has 2 network interfaces: the first is a 100 Mbit link to the outside world, the second is a 1 Gbit private network. The web interface is served on the external interface, and on some of the nodes HAProxy will listen on it as well, but more on that later. The private 1 Gbit network is used for the rest of the inter-node communication.

In total, we have 6 HP DL380p Gen8 servers, 2x Intel Octa-Core Xeon E5-2650, 64 GB RAM, 12x4TB SATA. This is the standard host configuration. We split the disks like this: 1 disk for the system, MongoDB, and the Graylog journal; the rest go into RAID 0 for Elasticsearch storage. Since replication happens at the level of Elasticsearch itself, we do not really need any other RAID levels.

Servers are distributed as follows:


Schematically, it looks like this:



Settings:


(We will not dwell on setting up HAProxy and keepalived, as this is beyond the scope of this article.)

Initial setup


The initial setup of Graylog2 is quite simple and trivial, so I strongly advise everyone to follow the official instructions:


They contain a lot of useful information that will help later in understanding the principles of configuration and tuning. I never had any problems during the initial setup, so let's move on to the configuration files.

In the Graylog server.conf, at the first stage, we specified:

# Specify our time zone

root_timezone = Europe/Moscow 

# Since there are not that many hosts, we list all the Elasticsearch hosts here

 elasticsearch_discovery_zen_ping_unicast_hosts = 
 elasticsearch_discovery_zen_ping_multicast_enabled = false 

# We allow starting the search from wildcard, because we all understand what it is and what it threatens

 allow_leading_wildcard_searches = true 

# This refers more to tuning, but at the first stage we specified a ring_size equal to half of the L2 cache of the processor.
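
# For reference, the option itself looks like this. The value below is the Graylog default (it must be a power of two); our own value followed the rule above:

 ring_size = 65536 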

# And specify the data for sending email from Graylog (insert your own data in the blank fields).

#Email transport

 transport_email_enabled = true
 transport_email_hostname = smtp.gmail.com
 transport_email_port = 465
 transport_email_use_auth = true
 transport_email_use_tls = false
 transport_email_use_ssl = true
 transport_email_auth_username = 
 transport_email_auth_password = 
 transport_email_subject_prefix = [graylog]
 transport_email_from_email = 
 transport_email_web_interface_url = 

Next you need to set the Elasticsearch heap size in the /etc/sysconfig/elasticsearch file (31 GB is recommended in the docs):

 ES_HEAP_SIZE=31g 

At the initial stage we did not tune anything else and for a while did not run into any problems. So let's proceed directly to launching and configuring Graylog itself.

Storage and collection of logs, access rights


It's time to set up our Graylog and start receiving data. The first thing we need to do is decide how we will receive the logs. We settled on GELF TCP: it allows collectors to be configured via the web interface (I will show this a little below).

We set up our first input. In the System / Inputs web interface at the top left, select GELF TCP and then Launch new input:



A window opens:




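Once the input shows up as running, it is easy to check that it really accepts messages. A minimal smoke test from any shell, assuming the input listens on the default GELF TCP port 12201 and graylog.example.com stands in for your node or balancer (GELF messages over TCP are null-byte-delimited JSON):

 printf '{"version":"1.1","host":"test-host","short_message":"hello graylog"}\0' | nc graylog.example.com 12201

If everything is fine, the message appears in the search within a few seconds.
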
Now we have our first input, which will receive messages. Let's move on to setting up log storage. We need to decide how many logs we will store and how.

We divided everything into projects and logically related services within projects, and then divided them by how long the logs need to be kept. We store some of the logs for 14 days, and some for 140.

Data is stored in Graylog indices. Indices, in turn, are divided into shards, which come in primary and replica flavors. By default, data is written to the primary shards and replicated to the replicas. We replicate only the important indices. Large indices have 2 primary shards and one replica, which lets them survive the failure of 2 nodes without losing data.

Let's create an index set that will have 2 shards and 1 replica and will keep its logs for 14 days.

We go to System / Indices and click Create index set there:




The following settings are responsible for how many logs are kept, and from their names it is clear that retention can be based on the number of messages, on time, or on the index size. We want to keep 14 days.
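
For example, one way to get 14 days of retention with daily indices (the values here are an illustration, not a prescription):

 Index shards:           2
 Index replicas:         1
 Rotation strategy:      Index Time
 Rotation period:        P1D   (one new index per day, ISO 8601 notation)
 Retention strategy:     Delete Index
 Max number of indices:  14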


Now we need to create a so-called stream. Graylog grants access rights at the level of these streams. The idea is this: in the stream we specify which index set to write data to and under which conditions. Streams live under the Streams menu. Setup takes place in 2 stages.

1. Creating a stream.




2. Then Manage Rules.

Everything is simple there: we add the rules by which logs will end up in this stream.
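
For example, a rule that routes everything from a particular group of hosts into the stream could look like this (the pattern is just a placeholder):

 Field:  source
 Type:   match regular expression
 Value:  ^web-.*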

Now we have an input that accepts logs, an index set that stores them, and a stream, which essentially gathers many logs into one place.

Next, we set up sending logs to Graylog itself.

Agent configuration


The agent configuration path is described here . It all works in the following way: Graylog Collector Sidecar is installed on the client and manages the log collector backend (in our case, for both Linux and Windows, this is NXLog).
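
On the client side the sidecar is driven by its collector_sidecar.yml. For reference, a trimmed version of it, with the server URL and tags as placeholders and the rest close to the stock defaults:

 server_url: http://graylog.example.com:9000/api/
 update_interval: 10
 tls_skip_verify: false
 send_status: true
 collector_id: file:/etc/graylog/collector-sidecar/collector-id
 tags:
     - linux
     - my-project
 backends:
     - name: nxlog
       enabled: true
       binary_path: /usr/bin/nxlog
       configuration_path: /etc/graylog/collector-sidecar/generated/nxlog.conf

The tags listed here are what tie the client to the configurations we create below.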

Let's prepare the log collection rules in System / Collectors / Manage Configurations. Create a configuration and open it, then go straight to the NXLog tab. We see 3 fields: Output, Configure NXLog Inputs, and Define NXLog Snippets. These are all pieces of NXLog configs that the collector will pull down to the end nodes. From here we will manage the fields and their values, as well as the files we will be monitoring, and so on.



Let's start with the tags. We enter the tags by which the client will understand which configuration it needs to pick up.

In the Output field there is one configuration:




 Exec $Hostname = $collector_node_id;
 Exec if ( $collector_node_id =~ /^(\w+)\.(\w+)\.(\w+)\.(\w+)\.(\w+)/ ) \
      {                     \
          $name = $1;       \
          $datacenter = $2; \
          $region = $3;     \
          $platform = $5;   \
      };
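
For context, these Exec lines sit inside the GELF output block that the sidecar renders into nxlog.conf; schematically it looks roughly like this (the host and port are placeholders for our balancer address and the GELF TCP input port):

 <Extension gelf>
     Module  xm_gelf
 </Extension>

 <Output gelf-tcp>
     Module      om_tcp
     Host        graylog.example.com
     Port        12201
     OutputType  GELF_TCP
     # ... the Exec lines shown above go here ...
 </Output>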

That was the general configuration: where to send each log and how to label it. Next, in the Configure NXLog Inputs field, we specify which files to monitor.




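The generated input section for a watched file comes out roughly like this (the path is a placeholder for one of our project logs):

 <Input my-project-app>
     Module        im_file
     File          '/var/log/my-project/app.log'
     PollInterval  1
     SavePos       True
     ReadFromLast  True
     Exec          $FileName = file_name();
 </Input>
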
We usually do not touch Define NXLog Snippets.

At this point we consider the default configuration complete; you can add more files, fields, and so on there.

Let's proceed to installing the agent. In general, it is very well described at the link above, so we will not dwell on rolling out agents manually here and will move straight to automation. We do it with Ansible.

Under Linux there is nothing complicated, but on Windows automatic installation is a problem, so we simply unpack the archive and generate unique UUIDs on the Ansible side. The Ansible roles:


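The roles themselves are not listed here, but the Windows part boils down to tasks roughly like these (the paths, the role layout, and the template name are hypothetical; win_unzip and win_template are stock Ansible Windows modules, and to_uuid is the stock Jinja2 filter used to derive a per-host UUID):

 # tasks/main.yml of a hypothetical graylog-sidecar-windows role
 - name: Unpack the collector sidecar
   win_unzip:
     src: 'C:\temp\collector_sidecar_windows.zip'
     dest: 'C:\Program Files\graylog\collector-sidecar'

 - name: Render collector_sidecar.yml with a unique node id
   win_template:
     src: collector_sidecar.yml.j2
     dest: 'C:\Program Files\graylog\collector-sidecar\collector_sidecar.yml'
   vars:
     # deterministic UUID derived from the hostname
     node_id: "{{ inventory_hostname | to_uuid }}"
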
At this point the setup can be considered complete, and the first logs will begin to appear in the system.

I would also like to talk about how we tuned the system to fit our needs, but since the article has already turned out to be quite long, that will be the subject of a follow-up.

Source: https://habr.com/ru/post/341274/

