Since similar topics have been appearing on the site frequently of late, I am publishing a task that we have long given to applicants for the PHP developer position at our company.
The task: you are given a web server access log, Apache or nginx (ideally the solution handles both), recorded while the server was under a DDoS attack. Write a script that identifies the addresses of the attacking hosts as quickly and, as far as possible, as accurately as it can, so that they can be blocked.
It is assumed that blocking Google for the duration of the emergency is not a big problem. It is easy to take a list of search engine addresses and exclude them.
Naturally, the accuracy and speed of detection for a given set of parameters will vary with the strength of the attack (request frequency, number of hosts involved).
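Excluding search engines could look roughly like the sketch below. It is an illustration only: the function names `ipInCidr` and `isWhitelisted` are hypothetical, the ranges in `$whitelist` are documentation placeholders rather than real crawler networks, and the bit arithmetic assumes IPv4 addresses on a 64-bit PHP build.

```php
<?php
// Hypothetical whitelist. These are RFC 5737 documentation ranges,
// NOT real search-engine networks; substitute a published crawler list.
$whitelist = array('192.0.2.0/24', '198.51.100.0/24');

// Returns true if the IPv4 address $ip lies inside the CIDR block $cidr.
// Assumes 64-bit integers, so ip2long() results are non-negative.
function ipInCidr($ip, $cidr)
{
    list($net, $bits) = explode('/', $cidr);
    $ipLong = ip2long($ip);
    $netLong = ip2long($net);
    if ($ipLong === false || $netLong === false) {
        return false; // not a valid IPv4 address
    }
    $bits = (int)$bits;
    // Build a 32-bit mask with the top $bits bits set.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

// Returns true if $ip matches any block in the whitelist.
function isWhitelisted($ip, array $whitelist)
{
    foreach ($whitelist as $cidr) {
        if (ipInCidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}
```

In a detection script, a check of this kind would run just before an address is reported or passed to the firewall, so whitelisted addresses are never blocked.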
I am attaching a solution that I wrote myself when the task first came up, at a moment when it was particularly urgent. The solution is simple and crude but effective (80 megabytes of log in 20 seconds). It was this script that helped protect one of our projects during an attack in which 1,500 hosts were hammering the server.
Over time it became clear that during an attack it is more correct to analyze the traffic itself: packet contents and headers (a more thorough and laborious approach; those who deal with attacks more often do exactly that). Nevertheless, experience has shown that solving this task gives a definite idea of the level of a specialist who claims to be an experienced developer and an old hand at web programming.
Applicants were given between one and two hours for the solution (depending on how their thinking went); the most recent solution took 1 hour 20 minutes. In total, roughly 10% of applicants solved the task.
From now on we will use a different task.
About the source code: it can read the log either as a stream or from a file and, if needed, adds blocking rules to the firewall. Avoiding regular expressions gives a significant performance gain. Tested on files up to 500 megabytes.
Of course, all of this could have been written somewhat shorter and more elegantly in Perl, but I wanted to produce a sample solution in PHP.
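#!/usr/local/bin/php
<?php
// The log file name comes from the first argument; the default is access.log.
if (!empty($argv[1])) $fname = $argv[1]; else $fname = 'access.log';
$fh = fopen($fname, 'r');
# $fh = fopen('php://stdin', 'r'); // alternative: read the log as a stream
$timeLimit = 1;   // maximum gap between two requests, in seconds
$countLimit = 50; // number of such requests before an address is reported
$status = array();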
while ($string = fgets($fh)) {
    // The client IP is everything up to the first space.
    $ip = substr($string, 0, strpos($string, ' '));
    if (!empty($status[$ip]['blocked'])) continue;
    // The timestamp sits between '[' and ']'.
    $st = strpos($string, '[') + 1;
    $time = strtotime(substr($string, $st, strpos($string, ']', $st + 1) - $st));
    // The request line sits between the first pair of double quotes.
    $st = strpos($string, '"') + 1;
    $req = substr($string, $st, strpos($string, '"', $st + 1) - $st);
    // The document is the second space-separated field of the request line.
    $st = strpos($req, " ") + 1;
    $doc = substr($req, $st, strpos($req, " ", $st) - $st);
    // Length of the trailing extension including the dot; 0 if there is none.
    $dot = strrpos($doc, ".");
    $dot = $dot ? strlen($doc) - $dot : 0;
    // Skip requests that look like files with a short extension
    // (up to five characters including the dot).
    if (!$dot || $dot > 5) {
        if (empty($status[$ip])) $status[$ip] = array('count' => 0);
        // Requests no more than $timeLimit seconds apart count toward the limit
        // (note that the counter is never reset between bursts).
        if (!empty($status[$ip]['time']) && $time - $status[$ip]['time'] <= $timeLimit) {
            $status[$ip]['count']++;
            if ($status[$ip]['count'] >= $countLimit) {
                echo "$ip: $doc\n";
                #system("ipfw table 1 add $ip");
                #echo "$ip\n";
                $status[$ip]['blocked'] = 1;
            }
        }
        $status[$ip]['time'] = $time;
        $status[$ip]['doc'] = $doc;
    }
}
?>
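
To make the regex-free parsing easier to check in isolation, the same idea can be wrapped in a function and run against a single line. This is only a sketch: the name `parseLogLine` is hypothetical, the sample line is invented, and it assumes the common/combined log format in which the timestamp is in square brackets and the request line is the first quoted field after it.

```php
<?php
// Hypothetical helper: extracts the IP, timestamp and requested document
// from one access-log line using only strpos/substr, no regular expressions.
// Returns null when the line does not have the expected shape.
function parseLogLine($line)
{
    $sp = strpos($line, ' ');
    $lb = strpos($line, '[');
    if ($sp === false || $lb === false) return null;
    $rb = strpos($line, ']', $lb + 1);
    if ($rb === false) return null;
    $q1 = strpos($line, '"', $rb);
    if ($q1 === false) return null;
    $q2 = strpos($line, '"', $q1 + 1);
    if ($q2 === false) return null;

    $ip = substr($line, 0, $sp);
    // strtotime() understands the log timestamp format directly.
    $time = strtotime(substr($line, $lb + 1, $rb - $lb - 1));
    // Request line looks like: METHOD SP document SP protocol
    $req = substr($line, $q1 + 1, $q2 - $q1 - 1);
    $parts = explode(' ', $req);
    $doc = isset($parts[1]) ? $parts[1] : '';

    return array('ip' => $ip, 'time' => $time, 'doc' => $doc);
}

// Invented sample line (the address is from a documentation range).
$sample = '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.php?id=1 HTTP/1.0" 200 2326';
$parsed = parseLogLine($sample);
echo $parsed['ip'], ' ', $parsed['doc'], "\n";
```

Unlike the script above, this variant checks every `strpos` result, so a truncated or malformed line yields `null` instead of a garbage address.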