Greetings.
Not so long ago, I faced the task of going through some old Apache logs. I needed to filter by several IP addresses and find anomalies and SQL injection attempts. There were not that many logs, about a million lines, and everything could easily have been done with the standard grep-awk-uniq-wc toolkit.
Since I have been using the Logstash-Elasticsearch-Kibana stack for more than a year to analyze and view all kinds of logs, I decided to use it in this situation as well.
A brief description of the main components of the system.
Logstash is a free, open-source Java program for collecting and normalizing logs. It can receive logs either from local files or via TCP/UDP ports. At the time of writing, there are 26 different input plugins. There is even an input for collecting messages from Twitter or IRC.
Elasticsearch is a free, open-source search engine based on Apache Lucene. It is fast, highly configurable and highly scalable.
Kibana is a web interface written in Ruby for displaying data from Elasticsearch. Simple to set up, but with many functions: search, graphs, a streaming view.
1. Elasticsearch
1.1 Download Elasticsearch (size 16MB):
It is important to note that Logstash version 1.1.9 requires Elasticsearch version 0.20.2 exactly.
# wget download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.2.tar.gz
1.2 Unpack the file:
# tar -zxf elasticsearch-0.20.2.tar.gz
Those who like to watch the file names scroll by can add the "v" flag :)
1.3 By and large, Elasticsearch can be run with its factory settings, but I still change a few parameters.
Open the settings file in your favorite text editor:
# vi elasticsearch-0.20.2/config/elasticsearch.yml
My list of changes for a standalone setup:
cluster.name: logs
index.number_of_replicas: 0
path.data: /elasticsearch/elasticsearch-0.20.2/data
path.work: /elasticsearch/elasticsearch-0.20.2/work
path.logs: /elasticsearch/elasticsearch-0.20.2/logs
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
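Keep in mind that bootstrap.mlockall only takes effect if the process is allowed to lock memory; on Linux this usually means raising the memlock limit in the same shell before starting the server:
# ulimit -l unlimited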
Before running Elasticsearch, make sure that the directories listed in path.data, path.work and path.logs exist.
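For example, with the paths above, all three can be created in one go:
# mkdir -p /elasticsearch/elasticsearch-0.20.2/{data,work,logs}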
1.4 First run Elasticsearch in foreground mode to make sure the server works correctly:
# ./bin/elasticsearch -f
If we see a line like this, then the server has started.
[2013-01-11 11:51:35,160][INFO ][node ] [Virgo] {0.20.2}[17620]: started
1.5 To run Elasticsearch in background (daemon) mode, just remove the "-f" flag:
# ./bin/elasticsearch
If your server now has TCP ports 9200 and 9300 in LISTEN state, Elasticsearch is ready for operation.
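You can check with netstat:
# netstat -nat |grep -E '9200|9300'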
2. Logstash
2.1 Download the latest version of Logstash, 1.1.9 (size 60MB):
# wget logstash.objects.dreamhost.com/release/logstash-1.1.9-flatjar.jar
2.2 Create a configuration file (apache.conf) to accept the archived apache logs, normalize them and add them to the Elasticsearch database:
input {
  tcp {
    type => "apache-access"
    port => 3338
  }
}
filter {
  grok {
    type => "apache-access"
    pattern => "%{COMBINEDAPACHELOG}"
  }
  date {
    type => "apache-access"
    timestamp => "dd/MMM/yyyy:HH:mm:ss Z"
  }
}
output {
  elasticsearch {
    embedded => false
    cluster => logs
    host => "172.28.2.2"
    index => "apache-%{+YYYY.MM}"
    type => "apache-access"
    max_inflight_requests => 500
  }
}
Brief description of some parameters:
port => 3338
In our case, Logstash will listen on TCP port 3338. We will send the archived apache logs to it with netcat.
cluster => logs
Specify the name of the cluster that we set via cluster.name in the Elasticsearch settings.
host => "172.28.2.2"
The IP address on which Elasticsearch is running.
index => "apache-%{+YYYY.MM}"
In my case there are not that many apache logs per day, around 40,000 lines, so a monthly index is enough. At 500,000 or more lines per day, it makes more sense to create a daily index:
"apache-%{+YYYY.MM.dd}"
2.3 Starting Logstash
# java -Xmx64m -jar logstash-1.1.9-flatjar.jar agent -f ./apache.conf
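Note that in agent mode Logstash stays in the foreground; one simple option for keeping it running after you disconnect is nohup:
# nohup java -Xmx64m -jar logstash-1.1.9-flatjar.jar agent -f ./apache.conf &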
Verify that Logstash is running:
# netstat -nat |grep 3338
If port 3338 is present, then Logstash is ready to receive logs.
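Before feeding in the whole archive, you can send a single test line in combined log format (the line below is made up purely for illustration):
# echo '127.0.0.1 - - [11/Jan/2013:11:51:35 +0400] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"' |nc 127.0.0.1 3338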
2.4 Starting to send the old apache logs to Logstash
# gunzip -c archived.apache.log.gz |nc 127.0.0.1 3338
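If there are several archived files, they can be fed in one after another; assuming your archives follow a similar naming scheme, something like this works:
# for f in archived.apache.log.*.gz; do gunzip -c "$f" |nc 127.0.0.1 3338; done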
How quickly all the logs are loaded depends on many factors: CPU, RAM, and the number of log lines.
In my case, 600 thousand log lines were fully loaded into the database in about 4 minutes, so your mileage may vary.
2.5 While the import is in progress, you can check whether the data is making it into the Elasticsearch database.
To do this, open elasticsearch_ip:9200/_status?pretty=true in the browser. If you find lines like this:
"index" : "apache-2011.09"
it means everything works as required.
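The same check can be done from the console (assuming you are on the Elasticsearch host itself):
# curl -s 'localhost:9200/_status?pretty=true' |grep '"index"'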
3. Kibana
3.1 Install Kibana:
git clone --branch=kibana-ruby github.com/rashidkpc/Kibana.git
cd Kibana
gem install bundler
bundle install
If you are behind a proxy server, specify it before running the "git clone ..." command:
git config --global http.proxy proxy.domain.com:3128
3.2 Kibana configuration
# vi KibanaConfig.rb
Settings that may require changes (note that Smart_index_step is given in seconds: 2592000 is 30 days, matching the monthly index pattern):
Elasticsearch = "localhost:9200"
KibanaPort = 5601
KibanaHost = '172.28.2.2'
Smart_index_pattern = 'apache-%Y.%m'
Smart_index_step = 2592000
3.3 Starting Kibana
# ruby kibana.rb
After a successful launch, a similar text will appear on the screen:
== Sinatra/1.3.3 has taken the stage on 5601 for development with backup from Thin
>> Thin web server (v1.5.0 codename Knife)
>> Maximum connections set to 1024
>> Listening on 172.28.21.21:5601, CTRL+C to stop
3.4 Starting to view logs
Open http://172.28.21.21:5601 in the browser and you get a convenient, fast interface for viewing the old Apache logs.
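Back to the original task: the COMBINEDAPACHELOG pattern extracts each request into a request field, so to hunt for SQL injection attempts you can type a Lucene-style query into the search box; the exact phrase below is just an illustration:
request:"union select"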
For those who want to see what Kibana + Logstash + Elasticsearch looks like, there is a demo page:
http://demo.logstash.net
Thank you for your attention.