LogParser - a familiar look at unusual things

When I was once again using LogParser and wanted to tap into other people's experience with it, I typed its name into the search on Habr. The result: "Surprisingly, the search returned no results." That really is surprising, given how interesting a tool has been overlooked. Time to fill that gap. So, meet LogParser: a small but damn useful utility for SQL lovers.

From the name alone it is hard to guess what the tool has to do with SQL. The truth is, it is as much a LogParser as it is a ChartGenerator, in the sense that it handles both jobs with equal success. Overall I would describe it as an SQL processor for heterogeneous data. The general idea is this: it takes data in some format and converts it into a tabular view (in fact, parsing as such happens only at this stage, and not always). It then runs an SQL query over that tabular data, builds a result table, and saves it, again in some format. In short, the chain looks like: prepare input data -> SQL processing -> generate output data. Or, as illustrated in the documentation:
[Diagram from the documentation: input formats -> SQL engine -> output formats]
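In command-line terms, that chain maps onto the general shape of a LogParser invocation (this is a schematic, with placeholders in angle brackets, not a literal command):

LogParser.exe -i:<InputFormat> -o:<OutputFormat> "SELECT <fields> INTO <output target> FROM <input source>"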
It is time, perhaps, to move from theory to practice, since practice is far more illustrative. As a teaser, let's start with this example:

X:\>LogParser.exe -i:FS -o:CSV "SELECT TOP 100 HASHMD5_FILE(Path) AS Hash, COUNT(*) AS FileCount, AVG(Size) AS FileSize INTO Duplicates.csv FROM X:\Folder\*.* GROUP BY Hash HAVING FileCount > 1 AND FileSize > 0 ORDER BY FileSize DESC"

Statistics:
-----------
Elements processed: 10
Elements output: 2
Execution time: 0.06 seconds


I think many of you have already guessed what kind of magic happened here. In plain, familiar SQL we select... files from the folder X:\Folder and group them by MD5 hash in order to identify duplicates by content. Naturally, we discard the cases where the count of such files is 1 (i.e. there are no identical ones). We also order the duplicates found by descending size and output only the top 100 largest. To check that the result is correct, look into the Duplicates.csv file. There you will find something in this spirit:
Hash,FileCount,FileSize
7EF9FDF4B8BDD5B0BBFFFA14B3DAD42D,2,5321
5113280B17161B4E2BEC7388E617CE91,2,854


The first value is the MD5 hash of the duplicates found, the second is their count, and the third is their size. Now let's decompose the code of the example according to the concept described earlier. The input is determined by an input format provider plus an address whose form depends on the chosen provider. In our case the provider is specified with the -i:FS option (the file system), and the concrete data (the folder X:\Folder) is addressed in the FROM clause of our SQL query. A query against the registry, say for the HKLM\Software branch, would look like this:

LogParser.exe -i:REG "SELECT * FROM \HKLM\Software"

Here the provider is REG and the address is \HKLM\Software.
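Incidentally, once Duplicates.csv gives you a hash, you can ask LogParser for the actual files behind it. A small sketch, reusing the first hash from the sample output above:

LogParser.exe -i:FS "SELECT Path FROM X:\Folder\*.* WHERE HASHMD5_FILE(Path) = '7EF9FDF4B8BDD5B0BBFFFA14B3DAD42D'"

Note that this re-hashes every file in the folder, so on a large tree it is not free.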

Out of the box, LogParser offers us the following groups of input format providers:

IIS Log File Input Formats

Generic Text File Input Formats

System Information Input Formats

Special-purpose Input Formats
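For a taste of the System Information formats, here is a sketch that queries the Windows event log via the EVT provider (the WHERE filter on EventType = 1, i.e. errors, is my own illustration, not from the article):

LogParser.exe -i:EVT "SELECT TOP 10 TimeGenerated, SourceName, Message FROM System WHERE EventType = 1 ORDER BY TimeGenerated DESC"

With no -o option given, the result is simply printed to the console.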

That is already quite a lot. And considering that you can also create providers of your own, it is simply wonderful. Later I will return to this topic and show how to build them "like grown-ups" from compiled assemblies, as well as on the fly with plain scripts.

The output format is defined in a similar way. With the -o:CSV option we said that we are interested in the provider for CSV files, and in the INTO clause of our SQL query we addressed the target file to which the result will be saved. By analogy with the input providers, here are the output formats available out of the box:

Generic Text File Output Formats

Special-purpose Output Formats
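Just to show how interchangeable these are, here is a sketch that re-saves the Duplicates.csv produced earlier as XML, using the stock CSV input and XML output providers:

LogParser.exe -i:CSV -o:XML "SELECT * INTO Duplicates.xml FROM Duplicates.csv"

This, by the way, hints at LogParser's second calling as a format converter.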


Let's try another warm-up example with entirely different input and output providers: the frequently encountered task of analyzing web server logs and charting the top referring sites.

c:\Program Files\Log Parser 2.2>LogParser.exe -i:W3C -o:CHART "SELECT TOP 10 DISTINCT EXTRACT_TOKEN(EXTRACT_TOKEN(cs(Referer), 1, '://'), 0, '/') AS Domain, COUNT(*) AS Hits INTO Hits.gif FROM X:\ex0909.log WHERE cs(Referer) IS NOT NULL AND Domain NOT LIKE '%disele%' GROUP BY Domain ORDER BY Hits DESC" -chartType:Column3D -chartTitle:"Top 10 referrers" -groupSize:800x600

Statistics:
-----------
Elements processed: 1410477
Elements output: 10
Execution time: 16.92 seconds


As you can see, LogParser chewed through almost a million and a half records in under 17 seconds, with non-trivial conditions, on an old Pentium D 2.8, which in my opinion is not a bad result at all. And the main result is in the picture:

[Chart: "Top 10 referrers", 3D column chart, 800x600]
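One caveat: as far as I know, the CHART output format depends on the Microsoft Office Web Components being installed. If they are not available, the same query can simply be pointed at the CSV provider instead (a sketch):

LogParser.exe -i:W3C -o:CSV "SELECT TOP 10 DISTINCT EXTRACT_TOKEN(EXTRACT_TOKEN(cs(Referer), 1, '://'), 0, '/') AS Domain, COUNT(*) AS Hits INTO Hits.csv FROM X:\ex0909.log WHERE cs(Referer) IS NOT NULL AND Domain NOT LIKE '%disele%' GROUP BY Domain ORDER BY Hits DESC"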

It seems to me that this is a terrific tool for database folks :), because it lets us apply SQL magic in places where we often wanted to but couldn't. On that thesis I will pause my narration for now. First, one could talk about LogParser for a long time, and it will not fit into a single post. Second, I hope this material is enough to at least mildly intrigue my colleagues in the trade and to find out how far they share my enthusiasm for this utility. As soon as there is interest in a continuation, I will write it, because plenty of topics remain untouched: the conversion mode; producing output directly as well as via LogParser templates and XSLT; the incremental mode for working with large volumes of data; creating your own input formats, with examples in C# and JavaScript; using LogParser from applications; multi-input and output into multiple files; the wealth of subtleties in LogParser's SQL dialect, including its vast set of functions; the various command-line options; the quirks of the individual providers and their configuration; and, of course, more examples of every sort :)

Source: https://habr.com/ru/post/85758/

