
ELK+R as log storage, part 2: installation and configuration

Continuing my experiments with storing logs in ELK+R, here is a kind of "manual" for installation and basic configuration.

Articles that helped a LOT:
Collecting, parsing and shipping logs with Logstash - the fundamentals
Collecting and analyzing logs with Lumberjack + Logstash + Elasticsearch + RabbitMQ - a good example of real-world use

Thanks to the authors!
So, we will deploy the following architecture:
Device => HAProxy => Logstash-Listener => RabbitMQ => Logstash-Filter => Elasticsearch-Balancer => Elasticsearch DATA / MASTER


I did this on DevStack (which made life much easier); your tooling can be anything.
Ubuntu is used as the example OS.
NTP is configured on all nodes.
The default-jre-headless package is installed on the Logstash and Elasticsearch nodes.

ES installation



For all ES nodes:

Add a repository
 wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
 echo "deb http://packages.elastic.co/elasticsearch/1.6/debian stable main" | sudo tee -a /etc/apt/sources.list


Install the elasticsearch package
 sudo apt-get update && sudo apt-get install elasticsearch 


Next, set up configs.

For Data node:
 cluster.name: CLUSTER_NAME
 node.name: NAME
 node.data: true
 node.master: false
 index.number_of_shards: 2
 index.number_of_replicas: 2
 http.enabled: false
 path.data: /PATH/TO/DATA/DIR


For Master node:
 cluster.name: CLUSTER_NAME
 node.name: NAME
 node.data: false
 node.master: true
 index.number_of_shards: 2
 index.number_of_replicas: 2
 http.enabled: false


For Balancer node(s):
 cluster.name: CLUSTER_NAME
 node.name: NAME
 node.master: false
 node.data: false
 index.number_of_shards: 2
 index.number_of_replicas: 2
 http.port: 9200


CLUSTER_NAME - the name of the cluster; in most cases this setting alone is enough for the nodes to find each other.
NAME - I used the hostname, but any unique name will do.
/PATH/TO/DATA/DIR - the directory where data will be stored; in my case /usr/share/elasticsearch/data.

Bring the nodes up one by one.


I recommend doing the following.
Create a template file logstash-template.json:
 {
   "template": "*",
   "settings": {
     "number_of_shards": 2,
     "number_of_replicas": 1
   }
 }


and run on ES-Balancer
 curl -XPUT 'http://localhost:9200/_template/template_logstash/' -d @logstash-template.json 


This is the default template for all indices: each new index will be created with 2 shards and one replica, which suits a configuration with two Data nodes.
Ideally:
number_of_shards = number of cores
number_of_replicas = number of data nodes (or a multiple of it) - 1
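As a sanity check, the sizing rule of thumb above can be sketched like this (the helper name and the example values of 4 cores and 2 data nodes are my own, purely for illustration):

```python
# Hypothetical sizing helper for the rule of thumb above.
def index_settings(cores, data_nodes):
    return {
        "number_of_shards": cores,             # shards = number of cores
        "number_of_replicas": data_nodes - 1,  # replicas = data nodes - 1
    }

# With 4 cores per data node and 2 data nodes:
print(index_settings(4, 2))
```

With two data nodes this reproduces the 2-shards/1-replica template shown earlier (assuming 2-core nodes).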

The mobz/elasticsearch-head plugin is also very helpful for administration. On the ES-Balancer:
 /usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head 


Do not forget to restart the elasticsearch service after changing configs.
That's it! Now we know where the logs will be stored.

RabbitMQ installation



Next, install rabbitmq-server:
 apt-get update && apt-get install rabbitmq-server -y
 rabbitmq-plugins enable rabbitmq_management && service rabbitmq-server restart
 rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic"}'


You can create a dedicated user for Logstash, but it is not required.

Install Logstash



For all nodes:
 wget -qO - https://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
 echo "deb http://packages.elasticsearch.org/logstash/1.5/debian stable main" | sudo tee -a /etc/apt/sources.list
 sudo apt-get update && sudo apt-get install logstash


Now the configs, which live in /etc/logstash/conf.d.

For Filter:
config.conf
 input {
   rabbitmq {
     auto_delete => false
     durable => true
     exchange => "logstash-exchange"
     exclusive => false
     host => '1.1.1.1'
     key => "logstash-routing-key"
     password => "guest"
     prefetch_count => 4
     queue => "logstash-filter"
     type => "logstash-filter-input"
     user => "guest"
     threads => 3
     arguments => {"x-ha-policy" => "all"}
   }
 }

 # Filter all data
 filter {
   if [type] == "apache-access" {
     grok {
       match => [ "message", "%{COMBINEDAPACHELOG}" ]
       add_field => [ "severity", "normal" ]
     }
   } else if [type] == "eventlog" {
     mutate {
       update => { "host" => "None" }
       add_field => { "severity" => "None" }
     }
   }
   if [type] =~ "apache" {
     mutate {
       add_field => { "Message" => "%{message}" }
     }
   }
   if [type] == "eventlog" {
     mutate { gsub => [ 'message', '[\[\]\\]', "" ] }
     mutate { gsub => [ 'Message', ':', "-" ] }
     mutate { gsub => [ 'Message', '"', "'" ] }
     json { source => "message" }
     mutate {
       update => { "host" => "%{Hostname}" }
       update => { "severity" => "%{Severity}" }
     }
   }
   if [_jsonparsefailure] {
     mutate {
       update => { "host" => "Unknown" }
       update => { "severity" => "Unknown" }
     }
   }
 }

 # Output to elasticsearch balancer
 output {
   elasticsearch {
     cluster => "CLUSTER_NAME"
     protocol => "http"
     host => ['2.2.2.2']
     index => "logstash-%{+YYYY.MM.dd}"
   }
 }
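The %{COMBINEDAPACHELOG} grok pattern splits each Apache access-log line into named fields. A rough, heavily simplified Python analogue of what it extracts (the regex and sample line are mine, not grok's actual pattern):

```python
import re

# Simplified stand-in for grok's %{COMBINEDAPACHELOG}; group names echo
# the field names grok produces, but this is only an illustration.
COMBINED = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) \S+" (?P<response>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

line = ('1.2.3.4 - - [25/Jun/2015:10:00:00 +0000] '
        '"GET /index.html HTTP/1.1" 200 512 "-" "curl/7.38"')
m = COMBINED.match(line)
print(m.group("clientip"), m.group("verb"), m.group("response"))
```

Each named group becomes a separate field on the event, which is what makes the logs searchable by client IP, status code, and so on in Kibana.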


1.1.1.1 - the IP address of the entry point into RabbitMQ; in our case this is simply the IP of the RabbitMQ host itself.
2.2.2.2 - the IP address of the ES-Balancer, used to specify explicitly which balancer to talk to.
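The index setting "logstash-%{+YYYY.MM.dd}" in the output means one index per day, named after the event timestamp. In Python terms (the helper name is mine):

```python
from datetime import datetime

def daily_index(ts):
    # Mirrors index => "logstash-%{+YYYY.MM.dd}" in the elasticsearch
    # output: events are routed into a per-day index
    return ts.strftime("logstash-%Y.%m.%d")

print(daily_index(datetime(2015, 6, 25)))
```

Per-day indices make retention trivial: dropping old logs is just deleting old indices.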

The dirty hack for eventlog is needed to somehow cope with the endless stream of assorted quotes and slashes that break all the "beauty".
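In Python terms, the first gsub in that chain does roughly the following (the sample event and function name are made up for illustration):

```python
import json
import re

def strip_breakers(message):
    # Mirrors mutate { gsub => [ 'message', '[\[\]\\]', "" ] }:
    # drop square brackets and backslashes before the json filter runs
    return re.sub(r'[\[\]\\]', '', message)

raw = '{"Hostname": "WIN-01", "Severity": "ERROR", "Message": "service [Spooler] stopped"}'
event = json.loads(strip_breakers(raw))
print(event["Hostname"], event["Severity"])
```

Without the stripping step, stray brackets and backslashes inside Windows event messages routinely trip the json filter, which is exactly what the _jsonparsefailure branch then catches.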


For Listener:
config.conf
 input {
   # Logstash to Logstash
   lumberjack {
     codec => "json"
     port => 3333
     ssl_certificate => "/etc/logstash/ls.crt"
     ssl_key => "/etc/logstash/ls.key"
   }
   # Nxlog to Logstash
   tcp {
     type => "eventlog"
     port => 3515
     codec => "json"
   }
 }

 output {
   rabbitmq {
     durable => true
     exchange => "logstash-exchange"
     key => "logstash-routing-key"
     exchange_type => "direct"
     host => '1.1.1.1'
     password => "guest"
     persistent => true
     user => "guest"
   }
   #file { path => "/var/log/raw/%{host}/%{+YYYY-MM-dd}.log" }
 }




Install HAProxy



 apt-get update && apt-get install haproxy -y 


Edit /etc/haproxy/haproxy.cfg:
haproxy.cfg
 global
     log /dev/log local0
     log /dev/log local1 notice
     chroot /var/lib/haproxy
     user haproxy
     group haproxy
     daemon

 defaults
     log global
     mode http
     option tcplog
     option dontlognull
     contimeout 5000
     clitimeout 50000
     srvtimeout 50000

 listen to_listeners_event_log :3515
     mode tcp
     balance roundrobin
     option tcpka
     server one 1.1.1.1:3515 check inter 5000
     server two 2.2.2.2:3515 check inter 5000
     …
     server last nnnn:3515 check inter 5000

 listen to_listeners_lumberjack :3333
     mode tcp
     balance roundrobin
     option tcpka
     server one 1.1.1.1:3333 check inter 5000
     server two 2.2.2.2:3333 check inter 5000
     …
     server last nnnn:3333 check inter 5000

 listen stats :1936
     stats show-legends
     stats refresh 5s
     stats hide-version
     stats uri /



If it does not start, change ENABLED=0 to ENABLED=1 in /etc/default/haproxy.

Statistics are available on port 1936.

NXLOG installation



NXLOG has a Community Edition; download it from the official site.
Config as follows:
nxlog.conf
 #define ROOT C:\Program Files\nxlog
 define ROOT C:\Program Files (x86)\nxlog

 Moduledir %ROOT%\modules
 CacheDir %ROOT%\data
 Pidfile %ROOT%\data\nxlog.pid
 SpoolDir %ROOT%\data
 LogFile %ROOT%\data\nxlog.log

 <Extension json>
     Module xm_json
 </Extension>

 # Windows Event Log
 <Input eventlog>
     # Uncomment im_msvistalog for Windows Vista/2008 and later
     Module im_msvistalog
     Query <QueryList> \
         <Query Id='1'> \
             <Select Path="Application">*[System[(Level=1 or Level=2 or Level=3 or Level=4)]]</Select> \
             <Select Path='Security'>*</Select> \
             <Select Path="System">*[System[(Level=1 or Level=2 or Level=3 or Level=4)]]</Select> \
         </Query> \
     </QueryList>
     Exec $Message = replace($Message, '"', "'");
     # Uncomment im_mseventlog for Windows XP/2000/2003
     # Module im_mseventlog
     Exec to_json();
 </Input>

 <Output out>
     Module om_tcp
     # IP of the HAProxy in front of the Listeners
     Host 192.168.0.14
     # Port of the Listeners
     Port 3515
 </Output>

 <Route 1>
     Path eventlog => out
 </Route>
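The Exec line with replace() rewrites double quotes to single quotes before to_json(), so the quotes inside Windows messages cannot break the JSON envelope. In Python terms (sample text and function name are made up):

```python
def quote_fix(message):
    # Mirrors: Exec $Message = replace($Message, '"', "'");
    return message.replace('"', "'")

print(quote_fix('Service "Spooler" entered the stopped state'))
```

This pairs with the gsub hack on the Filter side: the client strips what it can, and the filter cleans up the rest.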



I deliberately gave the proxy a more familiar-looking IP, to make it clear that this is the only address the client should ever see, apart from Kibana.

Restart the nxlog service.

If everything was done correctly, you should already see some activity on all hosts.

Kibana installation


 cd /tmp
 wget https://download.elastic.co/kibana/kibana/kibana-4.1.0-linux-x64.tar.gz
 tar xvfC kibana-4.1.0-linux-x64.tar.gz /usr/local
 rm -vf /usr/local/kibana
 ln -s /usr/local/kibana-4.1.0-linux-x64 /usr/local/kibana
 cd /usr/local/kibana


Edit the config /usr/local/kibana/config/kibana.yml:
 …
 port: 80
 …
 elasticsearch_url: 'http://IP_TO_ES_BALANCER:9200'


Well, run:
 ./bin/kibana -q & 


Open the external IP of Kibana in a browser; it will walk you through a few intuitive setup steps.

Done. Now you can lean back in your chair and build 1-10 dashboards to watch the logs (by the way, they can be arranged nicely in some CMS).
Of course, to make the whole system easier to manage, it is better to automate the process somehow. In our case that is Chef + OpenStack: I specified which recipe to run at startup and went off to drink tea/coffee/lemonade.
And this is far from everything that can and should be done, but the base platform is already in place.

Minimum setup:
1 HAProxy
2 Listener
1 RabbitMQ
2 Filter
1 ES-Balancer
2 ES-Master
2 ES-Data

If RabbitMQ is put into a cluster, then all horizontal scaling comes down to adding Listener, Filter, and ES-Data nodes. With automation in place, this takes 5-15 minutes (bureaucracy not included).
I managed to deploy all of this from scratch with Chef on a test stand in about an hour: 5 minutes creating the hosts, 5 minutes writing the script that runs the recipes on them, 5 minutes reading news on Habr, and 10 minutes checking that everything works correctly.

I hope this article will help someone.

Source: https://habr.com/ru/post/261197/

