I invite knowledgeable people to share their ways of building command-line pipelines on Unix-like systems. Maybe it will grow into a reference of sorts :-)
I will start with a few of the most basic building blocks, useful for processing web server logs.
Inventory
So, as we all know:
tail - prints the specified number of lines from the end of a file; with the -f switch it also prints new lines in real time as they are appended to the file
cat - just one of the ways to print the contents of one or more files
cut - selects a fragment of each line by its index, where the line is split into fragments by a specified delimiter character
sort - sorts lines in the required order according to the chosen rules
uniq - removes consecutive identical lines; with the -c switch it prefixes each line with the number of times it was repeated
egrep - selects lines from a file or stream that match given conditions, including extended regular expressions
xargs - runs a command with arguments read from standard input (for the rest, man xargs)
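As a quick illustration of how these tools combine, here is a minimal sketch on made-up input. The sample lines, the temporary file and the variable names are all invented for the demo; only the tools themselves come from the list above.

```shell
# Hypothetical sample data: two space-separated fields per line.
demo=$(mktemp)
printf '%s\n' 'b x' 'a y' 'b z' 'a w' 'b v' > "$demo"

# cut: take the first space-separated field of every line.
first_fields=$(cut -d ' ' -f1 "$demo")

# sort + uniq -c: group identical lines and count them;
# sort -rn then puts the largest count first.
counts=$(printf '%s\n' "$first_fields" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

# xargs: turn lines of standard input into command arguments.
# -n 1 runs the command once per input item.
echoed=$(printf '%s\n' "$first_fields" | sort -u | xargs -n 1 echo item:)
printf '%s\n' "$echoed"

rm -f "$demo"
```

Here "b" occurs three times and "a" twice, so "b" comes first in the counted list. Note that uniq only collapses adjacent duplicates, which is why sort runs before it.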
What you can do with it
Here is a typical entry from a web server's access_log in the standard (combined) format:
10.10.0.1 - - [11/Aug/2008:02:40:15 +0400] "GET / HTTP/1.0" 403 529 "https://referring.site.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.3.1.0)"
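To make the field numbers passed to cut less mysterious, here is a small sketch that splits the sample entry on single spaces. The line below is the sample entry in its usual compact form; the variable names are chosen just for this demo.

```shell
# The sample log entry, stored in a variable for the demo.
line='10.10.0.1 - - [11/Aug/2008:02:40:15 +0400] "GET / HTTP/1.0" 403 529 "https://referring.site.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.3.1.0)"'

# When the line is split on single spaces, field 1 is the client
# address, field 7 the requested path, field 9 the HTTP status code.
client=$(printf '%s\n' "$line" | cut -d ' ' -f1)
req_path=$(printf '%s\n' "$line" | cut -d ' ' -f7)
http_status=$(printf '%s\n' "$line" | cut -d ' ' -f9)

printf 'client=%s path=%s status=%s\n' "$client" "$req_path" "$http_status"
# prints: client=10.10.0.1 path=/ status=403
```

The timestamp takes up two fields (4 and 5) because it contains a space, and the request method with its opening quote is field 6, which is why the path ends up in field 7.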
To watch what is happening on the site, whether requests are coming in and which ones:
tail -f access_log
See only the requested documents:
tail -f access_log | cut -d ' ' -f7
See all unique requested documents:
cat access_log | cut -d ' ' -f7 | sort | uniq
To quickly find out who is making the most requests, count the requests per address and sort the counts in descending order:
cat access_log | cut -d ' ' -f1 | sort | uniq -c | sort -rn
A funny example from real practice. A small botnet is attacking the site and gets the requested URL wrong: the URL ends with two slashes instead of one, for example '/rss/tag/CSS//'. With a small set of console programs, blocking it is quite trivial (this is far from the most effective way, since each bot still gets to make one request before it is blocked). It is assumed that the ipfw firewall has a table 1, and that every address in it is denied access to the resource. So:
cat ./access_log | egrep 'GET [^ ]+//' | cut -d ' ' -f1 | xargs -n 1 ipfw table 1 add
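Before pointing a pipeline like this at a real firewall, it may be worth doing a dry run. The sketch below is one possible variant, not the original command: the log entries and addresses are invented (taken from the 192.0.2.0/24 documentation range), grep -E stands in for egrep, the pattern ends with a space so that it only matches a path that really ends in "//", sort -u removes repeated addresses, and echo prints the ipfw commands instead of executing them.

```shell
# Hypothetical log with invented addresses and requests.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [11/Aug/2008:02:40:15 +0400] "GET /rss/tag/CSS// HTTP/1.0" 404 529 "-" "bot"
192.0.2.11 - - [11/Aug/2008:02:40:16 +0400] "GET /rss/tag/CSS/ HTTP/1.0" 200 1024 "http://example.com//x" "browser"
192.0.2.12 - - [11/Aug/2008:02:40:17 +0400] "GET /rss/tag/HTML// HTTP/1.0" 404 529 "-" "bot"
192.0.2.10 - - [11/Aug/2008:02:40:18 +0400] "GET /rss/tag/CSS// HTTP/1.0" 404 529 "-" "bot"
EOF

# grep -E: keep requests whose path ends in "//" (the trailing space
#          in the pattern ties the match to the end of the path).
# cut:     keep only the client address.
# sort -u: drop duplicate addresses.
# xargs -n 1 echo: print one command per address instead of running it.
planned=$(grep -E 'GET [^ ]+// ' "$log" | cut -d ' ' -f1 | sort -u |
  xargs -n 1 echo ipfw table 1 add)
printf '%s\n' "$planned"

rm -f "$log"
```

On this sample the output is one "ipfw table 1 add" line each for 192.0.2.10 and 192.0.2.12; the ordinary visitor at 192.0.2.11 is left alone even though its referrer contains "//". Once the printed commands look right, removing the word echo makes xargs run them for real.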