Greetings.
Not so long ago, I faced the task of going through some old Apache logs. I needed to filter by several IP addresses and find anomalies and SQL injection attempts. There were not that many logs, about a million lines, and everything could easily have been done with the standard grep-awk-uniq-wc toolkit.
Since I have been using the Logstash-Elasticsearch-Kibana stack for more than a year to analyze and view all kinds of logs, I decided to use it in this situation as well.
A brief description of the main components of the system.
Logstash is a free, open-source Java program for collecting and normalizing logs. It can receive logs either from local files or via TCP/UDP ports. At the time of writing, there are 26 different input plugins. There is even an input for collecting messages from Twitter or IRC.
Elasticsearch is a free, open-source search engine based on Apache Lucene. It is fast, highly configurable and highly scalable.
Kibana is a web interface written in Ruby for displaying data from Elasticsearch. Simple to set up, but with many functions: search, graphs, a streaming view.
1. Elasticsearch
1.1 Download Elasticsearch (size 16MB):
It is important to note that Logstash version 1.1.9 requires Elasticsearch version 0.20.2 exactly.
# wget download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.2.tar.gz
1.2 Unpack the file:
# tar -zxf elasticsearch-0.20.2.tar.gz
Those who like to watch the file names scroll by can add the "v" flag :)
1.3 By and large, Elasticsearch can be run with its factory settings, but I still change a few parameters.
Open the settings file in your favorite text editor:
# vi elasticsearch-0.20.2/config/elasticsearch.yml
My list of changes for a standalone setup:
cluster.name: logs
index.number_of_replicas: 0
path.data: /elasticsearch/elasticsearch-0.20.2/data
path.work: /elasticsearch/elasticsearch-0.20.2/work
path.logs: /elasticsearch/elasticsearch-0.20.2/logs
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
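Keep in mind that bootstrap.mlockall only takes effect if the process is allowed to lock memory; on Linux this usually means raising the memlock limit in the same shell before starting the server:
# ulimit -l unlimited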
Before running Elasticsearch, make sure that the directories listed in path.data, path.work and path.logs exist.
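For example, with the paths above, all three can be created in one go:
# mkdir -p /elasticsearch/elasticsearch-0.20.2/{data,work,logs}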
1.4 First run Elasticsearch in foreground mode to make sure the server works correctly:
# ./bin/elasticsearch -f
If we see a line like this, then the server has started.
[2013-01-11 11:51:35,160][INFO ][node ] [Virgo] {0.20.2}[17620]: started
1.5 To run Elasticsearch in background (daemon) mode, just remove the "-f" flag:
# ./bin/elasticsearch
If your server now has TCP ports 9200 and 9300 in LISTEN state, Elasticsearch is ready for operation.
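You can check with netstat:
# netstat -nat |grep -E '9200|9300'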
2. Logstash
2.1 Download the latest version of Logstash, 1.1.9 (size 60MB):
# wget logstash.objects.dreamhost.com/release/logstash-1.1.9-flatjar.jar
2.2 Create a configuration file (apache.conf) to accept the archived apache logs, normalize them and add them to the Elasticsearch database:
input {
  tcp {
    type => "apache-access"
    port => 3338
  }
}
filter {
  grok {
    type => "apache-access"
    pattern => "%{COMBINEDAPACHELOG}"
  }
  date {
    type => "apache-access"
    timestamp => "dd/MMM/yyyy:HH:mm:ss Z"
  }
}
output {
  elasticsearch {
    embedded => false
    cluster => logs
    host => "172.28.2.2"
    index => "apache-%{+YYYY.MM}"
    type => "apache-access"
    max_inflight_requests => 500
  }
}
Brief description of some parameters:
port => 3338
In our case, Logstash will listen on TCP port 3338. We will send the archived apache logs to it with netcat.
cluster => logs
Specify the name of the cluster that we set via cluster.name in the Elasticsearch settings.
host => "172.28.2.2"
The IP address on which Elasticsearch is running.
index => "apache-%{+YYYY.MM}"
In my case there are not that many apache logs per day, around 40,000 lines, so a monthly index is enough. At 500,000 or more lines per day, it makes more sense to create a daily index:
"apache-%{+YYYY.MM.dd}"
2.3 Starting Logstash
# java -Xmx64m -jar logstash-1.1.9-flatjar.jar agent -f ./apache.conf
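Note that in agent mode Logstash stays in the foreground; one simple option for keeping it running after you disconnect is nohup:
# nohup java -Xmx64m -jar logstash-1.1.9-flatjar.jar agent -f ./apache.conf &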
Verify that Logstash is running:
# netstat -nat |grep 3338
If port 3338 is present, then Logstash is ready to receive logs.
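Before feeding in the whole archive, you can send a single test line in combined log format (the line below is made up purely for illustration):
# echo '127.0.0.1 - - [11/Jan/2013:11:51:35 +0400] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"' |nc 127.0.0.1 3338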
2.4 Starting to send the old apache logs to Logstash
# gunzip -c archived.apache.log.gz |nc 127.0.0.1 3338
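If there are several archived files, they can be fed in one after another; assuming your archives follow a similar naming scheme, something like this works:
# for f in archived.apache.log.*.gz; do gunzip -c "$f" |nc 127.0.0.1 3338; done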
How quickly all the logs are loaded depends on many factors: CPU, RAM, and the number of log lines.
In my case, 600 thousand log lines were fully loaded into the database in about 4 minutes, so your mileage may vary.
2.5 While the import is in progress, you can check whether the data is making it into the Elasticsearch database.
To do this, open elasticsearch_ip:9200/_status?pretty=true in the browser. If you find lines like this:
"index" : "apache-2011.09"
it means everything works as required.
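The same check can be done from the console (assuming you are on the Elasticsearch host itself):
# curl -s 'localhost:9200/_status?pretty=true' |grep '"index"'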
3. Kibana
3.1 Install Kibana:
git clone --branch=kibana-ruby github.com/rashidkpc/Kibana.git
cd Kibana
gem install bundler
bundle install
If you are behind a proxy server, specify it before running the "git clone ..." command:
git config --global http.proxy proxy.domain.com:3128
3.2 Kibana configuration
# vi KibanaConfig.rb
Settings that may require changes (note that Smart_index_step is given in seconds: 2592000 is 30 days, matching the monthly index pattern):
Elasticsearch = "localhost:9200"
KibanaPort = 5601
KibanaHost = '172.28.2.2'
Smart_index_pattern = 'apache-%Y.%m'
Smart_index_step = 2592000
3.3 Starting Kibana
# ruby kibana.rb
After a successful launch, a similar text will appear on the screen:
== Sinatra/1.3.3 has taken the stage on 5601 for development with backup from Thin
>> Thin web server (v1.5.0 codename Knife)
>> Maximum connections set to 1024
>> Listening on 172.28.21.21:5601, CTRL+C to stop
3.4 Starting to view logs
Open http://172.28.21.21:5601 in the browser and you get a convenient, fast interface for viewing the old Apache logs.
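Back to the original task: the COMBINEDAPACHELOG pattern extracts each request into a request field, so to hunt for SQL injection attempts you can type a Lucene-style query into the search box; the exact phrase below is just an illustration:
request:"union select"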
For those who want to see what Kibana + Logstash + Elasticsearch looks like, there is a demo page:
http://demo.logstash.net
Thank you for your attention.