
Parsing the Apache access.log

One day I needed a script that analyzes the access.log of the Apache web server. I had no time to write it myself and hoped to find something ready-made, but I found nothing suitable and had to write it after all. So I decided to share the experience.

Strictly speaking, I will describe a general algorithm for this task, which anyone can then tailor to their own needs.



To begin with, let's recall what the access.log file looks like:



193.34.12.132 - - [20/Oct/2011:12:46:08 +0400] "GET /scripts/fancyzoom.min.js HTTP/1.1" 200 4435 "http://kropus.amarox.ru/" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

193.34.12.132 - - [20/Oct/2011:12:46:08 +0400] "GET /bitrix/js/main/core/css/core_window.css?1318570950 HTTP/1.1" 200 44471 "http://kropus.amarox.ru/" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

193.34.12.132 - - [20/Oct/2011:12:46:08 +0400] "GET /bitrix/templates/kropus/components/bitrix/menu/kropus/script.js?1315557673 HTTP/1.1" 200 469 "http://kropus.amarox.ru/" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"





As you probably know, the log format can be changed in the Apache settings. The lines above use the default format. Each line contains the client's IP address, the date and time, the request method, the requested path with its query string, the protocol, the response status code, the response size in bytes, the referer, and the user agent (browser and OS).


In short, the algorithm processes each line of the file separately. Parsing a line produces an array in which each element holds one field: the address, the response size, the protocol, and so on. What happens to this data next depends on the task at hand: it can be written to a database, displayed in the browser right away, or used to collect statistics. The same steps are repeated for every line, or for as many lines as needed.



Now in more detail.

To parse the lines, it is convenient to put them all into one array, where each element is one line of the file. The file function does exactly that.



$file_array = file(' ');





But this method is not always appropriate: a log file may contain many thousands of lines, and the script may hang or fail with an error, for example by running out of memory. So if you know that the file is large, or may become large, it is better to use a loop that takes the next line of the file, parses it, and processes the data. That way there is no need to build a huge array with all the lines at once.



Here is a sketch of a function that returns the next line

($fp is the file handle obtained earlier from the fopen function)



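function get_log_string($fp)
{
    // $fp is passed in explicitly: a variable defined outside the function is not visible inside it
    if (feof($fp))
    {
        return false;
    }
    $line = '';
    // Read one byte at a time until a newline or the end of the file
    while (!feof($fp))
    {
        $char = fread($fp, 1);
        if ($char === false || $char === "\n")
        {
            break;
        }
        $line .= $char;
    }
    // Nothing was read and the file has ended: there are no more lines
    if ($line === '' && feof($fp))
    {
        return false;
    }
    return rtrim($line, "\r");
}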





Thus, we read one byte at a time until we reach the end of the line. If the end of the file has been reached, the function returns false.



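This function needs to be called in a loop.

// Compare with false explicitly so that an empty line does not end the loop
while (($string = get_log_string($fp)) !== false)
{
    // Parse the resulting string $string
    // and perform the necessary actions with it
}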
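As an alternative to reading byte by byte, PHP's built-in fgets function returns a whole line per call. The sketch below swaps it in for the hand-written reader; the helper name read_log_lines is hypothetical, and an in-memory stream stands in for a real log file so that the example runs on its own.

```php
<?php
// Hypothetical helper: yields the non-empty lines of an already opened
// file handle one at a time, so the whole file is never held in memory.
function read_log_lines($fp): Generator
{
    // fgets() returns false at the end of the file, which ends the loop
    while (($line = fgets($fp)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line !== '') {
            yield $line;
        }
    }
}

// Demo on an in-memory stream (an assumption made for this example);
// in practice $fp would come from fopen() on the real access.log.
$fp = fopen('php://memory', 'w+');
fwrite($fp, "first line\n\nsecond line\n");
rewind($fp);

$lines = [];
foreach (read_log_lines($fp) as $line) {
    $lines[] = $line;
}
fclose($fp);

print_r($lines);
```

Because the helper is a generator, the calling code keeps the same shape as the loop above: one line is fetched, handled, and discarded before the next one is read.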



Now let's move on to parsing the line itself.

I did this with the preg_match function and a regular expression. First, define the pattern:



$pattern = '/(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.*?) (\S+)" (\S+) (\S+) (".*?") (".*?")/';





Then apply it to the line:



preg_match($pattern, $line, $result);





$pattern - our pattern

$line - the line to parse (the string obtained in the loop above)

$result - the array into which the results are written
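As a quick check, the pattern can be run against the first sample line from the top of the article. This is only a minimal sketch: the pattern is written here as a single-quoted string, and the check on the return value of preg_match is an addition that the original snippet does not show.

```php
<?php
$pattern = '/(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.*?) (\S+)" (\S+) (\S+) (".*?") (".*?")/';

// First sample line from the top of the article
$line = '193.34.12.132 - - [20/Oct/2011:12:46:08 +0400] "GET /scripts/fancyzoom.min.js HTTP/1.1" 200 4435 "http://kropus.amarox.ru/" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"';

// preg_match() returns 1 on a match, 0 on no match and false on error
if (preg_match($pattern, $line, $result) === 1) {
    echo $result[1], "\n";   // 193.34.12.132
    echo $result[4], "\n";   // 20/Oct/2011
    echo $result[5], "\n";   // 12:46:08
    echo $result[8], "\n";   // /scripts/fancyzoom.min.js
    echo $result[10], "\n";  // 200
    echo $result[12], "\n";  // "http://kropus.amarox.ru/" (quotes included)
} else {
    echo "Line does not match the expected format\n";
}
```

Element 0 of $result holds the whole matched text, which is why the captured fields start at index 1.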



For convenience, the result can be copied into an array with readable keys.



// Index 0 of $result holds the whole matched line, so the fields start at 1
$formatted['ip'] = $result[1];
$formatted['identity'] = $result[2];
$formatted['user'] = $result[3];
$formatted['date'] = $result[4];
$formatted['time'] = $result[5];
$formatted['timezone'] = $result[6];
$formatted['method'] = $result[7];
$formatted['path'] = $result[8];
$formatted['protocol'] = $result[9];
$formatted['status'] = $result[10];
$formatted['bytes'] = $result[11];
// The last two values keep their surrounding double quotes
$formatted['referer'] = $result[12];
$formatted['agent'] = $result[13];
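The date, time and timezone come out as three separate strings, and the referer is still wrapped in quotes. One possible way to tidy them up is sketched below; the $fields array is a hypothetical stand-in holding the values parsed from the first sample line, and the use of DateTimeImmutable is a suggestion that goes beyond the original script.

```php
<?php
// Hypothetical stand-in for the values parsed from the first sample line
$fields = [
    'date'     => '20/Oct/2011',
    'time'     => '12:46:08',
    'timezone' => '+0400',
    'referer'  => '"http://kropus.amarox.ru/"',
];

// Join the three pieces back together and parse them in one go:
// d = day, M = short month name, Y = year, H:i:s = time, O = UTC offset
$dt = DateTimeImmutable::createFromFormat(
    'd/M/Y:H:i:s O',
    $fields['date'] . ':' . $fields['time'] . ' ' . $fields['timezone']
);

// Strip the double quotes captured by the pattern
$referer = trim($fields['referer'], '"');

if ($dt === false) {
    echo "Could not parse the date\n";
} else {
    echo $dt->format('c'), "\n";   // 2011-10-20T12:46:08+04:00
}
echo $referer, "\n";               // http://kropus.amarox.ru/
```

A date object of this kind is easier to compare, sort and store than the raw strings.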





That's all. We now have an array of data extracted from the log line, and you can do with it whatever your particular case requires.
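To illustrate the "collect statistics" case, here is a minimal sketch that puts the pieces together on the three sample lines from the top of the article. The function name summarize_log and the shape of its result are assumptions made for this example, not part of the original script.

```php
<?php
$pattern = '/(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.*?) (\S+)" (\S+) (\S+) (".*?") (".*?")/';

// Hypothetical helper: counts requests, sums the response sizes and
// groups the requests by status code.
function summarize_log(iterable $lines, string $pattern): array
{
    $stats = ['requests' => 0, 'bytes' => 0, 'by_status' => []];
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m) !== 1) {
            continue; // skip lines that do not match the expected format
        }
        $stats['requests']++;
        // Apache writes "-" instead of a number when no bytes were sent
        $stats['bytes'] += ($m[11] === '-') ? 0 : (int) $m[11];
        $stats['by_status'][$m[10]] = ($stats['by_status'][$m[10]] ?? 0) + 1;
    }
    return $stats;
}

// The three sample lines from the top of the article
$lines = [
    '193.34.12.132 - - [20/Oct/2011:12:46:08 +0400] "GET /scripts/fancyzoom.min.js HTTP/1.1" 200 4435 "http://kropus.amarox.ru/" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"',
    '193.34.12.132 - - [20/Oct/2011:12:46:08 +0400] "GET /bitrix/js/main/core/css/core_window.css?1318570950 HTTP/1.1" 200 44471 "http://kropus.amarox.ru/" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"',
    '193.34.12.132 - - [20/Oct/2011:12:46:08 +0400] "GET /bitrix/templates/kropus/components/bitrix/menu/kropus/script.js?1315557673 HTTP/1.1" 200 469 "http://kropus.amarox.ru/" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"',
];

$stats = summarize_log($lines, $pattern);
print_r($stats);
```

For these three lines the summary reports 3 requests, 49375 bytes in total, and a single status code, 200. Since the helper accepts any iterable, it can be fed a line-by-line reader instead of an array when the file is large.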

Source: https://habr.com/ru/post/131093/


