
Command-Line Pipelines

I invite knowledgeable readers to share their ways of building command-line pipelines on Unix-like systems. Maybe it will turn into a reference of sorts :-)

I will start with some of the most basic combinations, useful for processing web server logs.



Inventory


So, as we all know, the tail program prints the specified number of lines from the end of a file; in addition, with the -f switch it prints new lines in real time as they are appended to the file.
cat - simply one way to print the contents of one or several files
cut - selects a fragment (field) of each line by its index, where the line is split into fields by a specified delimiter character
sort - sorts lines in the required order according to the given rules
uniq - collapses consecutive identical lines; with the -c switch it prefixes each line with the number of times it was repeated
egrep - selects lines from a file or stream that match given conditions, including extended regular expressions (equivalent to grep -E)
xargs - man xargs
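
Since the list leaves xargs to the man page, here is a minimal sketch of what it does, with echo standing in for a real command (the word "add" and the sample numbers are made up for illustration). By default xargs packs the input items into one command line; with -n 1 it runs the command once per item.

```shell
# By default all input items are appended to a single command line.
all_at_once=$(printf '1\n2\n3\n' | xargs echo add)

# With -n 1 the command runs once per input item.
one_by_one=$(printf '1\n2\n3\n' | xargs -n 1 echo add)

echo "$all_at_once"
echo "$one_by_one"
```

The second form is usually what you want when the target command accepts only one argument per call.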

What you can do with it


This is a typical access_log entry of some web server, in the standard format:
10.10.0.1 - - [11/Aug/2008:02:40:15 +0400] "GET / HTTP/1.0" 403 529 "https://referring.site.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.3.1.0)"

To watch what is happening on the site, whether requests are coming in and which ones:
tail -f access_log

To see only the requested documents (splitting each line on spaces, the path is field 7):
tail -f access_log | cut -d ' ' -f7
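
To see why field 7 is the document, here is a small sketch that splits one sample line on spaces. The line is modelled on the log entry shown earlier, with a hypothetical /index.html path; the variable names are only for illustration.

```shell
# A shortened access_log entry (sample data for illustration).
line='10.10.0.1 - - [11/Aug/2008:02:40:15 +0400] "GET /index.html HTTP/1.0" 403 529'

# Splitting on single spaces: field 1 is the client address,
# field 6 is the opening quote plus the method, field 7 is the requested path.
client=$(printf '%s\n' "$line" | cut -d ' ' -f1)
doc=$(printf '%s\n' "$line" | cut -d ' ' -f7)

echo "$client requested $doc"
```

Note that the field numbers depend on the log format; a server configured with a different format may need a different index.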

To list every distinct document that was requested:
cat access_log | cut -d ' ' -f7 | sort | uniq

If you want to quickly find out who made the most requests, count the requests per client address and sort the counts numerically in descending order:
cat access_log | cut -d ' ' -f1 | sort | uniq -c | sort -rn
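
A minimal sketch of the same counting idea on made-up data (the addresses below are invented). It sorts with -rn so the counts are compared as numbers, and uses awk only to strip the padding that uniq -c puts in front of each count.

```shell
# Invented client addresses, one per request, as cut -f1 would produce them.
clients=$(printf '%s\n' 10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.2 10.0.0.1)

# sort groups identical lines, uniq -c counts each group,
# sort -rn orders the groups by count, largest first.
ranking=$(printf '%s\n' "$clients" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

echo "$ranking"
```

On a large log, appending head to the pipeline keeps only the busiest clients.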

A funny example from real practice. A small botnet is attacking the site, and it makes a mistake in the requested URL: there are two slashes at the end of the URL instead of one, for example '/rss/tag/CSS//'. With a small set of console programs it is quite trivial to block the bots (this is far from the most effective way, since each bot still gets to make one request). We assume the ipfw firewall has a table 1, and every address in it is denied access to the site. So:
cat ./access_log | egrep 'GET [^ ]+//' | cut -d ' ' -f1 | xargs -n1 ipfw table 1 add
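
Before touching the firewall it may be worth doing a dry run. The sketch below, on invented log lines, prints the ipfw commands instead of executing them by putting echo in front. The sort -u step is an addition here, so that an address is not repeated if a bot shows up more than once.

```shell
# Invented log lines: two malformed requests from one bot and one normal request.
log='10.0.0.5 - - [11/Aug/2008:02:40:15 +0400] "GET /rss/tag/CSS// HTTP/1.0" 404 0
10.0.0.9 - - [11/Aug/2008:02:40:16 +0400] "GET /rss/tag/CSS/ HTTP/1.0" 200 512
10.0.0.5 - - [11/Aug/2008:02:40:17 +0400] "GET /rss/tag/CSS// HTTP/1.0" 404 0'

# echo makes xargs print each command rather than run it;
# remove echo (and run as root) to actually modify the table.
commands=$(printf '%s\n' "$log" | grep -E 'GET [^ ]+//' | cut -d ' ' -f1 | sort -u | xargs -n 1 echo ipfw table 1 add)

echo "$commands"
```

Only the address that sent the double-slash requests ends up in the printed command; the normal visitor is left alone.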

Source: https://habr.com/ru/post/37105/

