In an administrator's work there are questions whose solution keeps getting postponed because they seem insignificant, but sometimes they surprise you with an unexpectedly simple answer. I hasten to share one such question and its simple answer (the files live on Windows, but the solution is on Linux, so the bias is more towards Linux).
The task was to walk through all text files in all subdirectories and display the lines that match a regular expression. (Clearly, neither Explorer nor Windows Commander will help here.)
Circumstances:
There are many logs in text files. They mostly contain the registry hives of Firefox, Flash Player, Office, etc. in JSON format. The scripts that produce them were written in JavaScript + WMI and assigned through Active Directory to run at computer startup and user logon. Here are some of the registry keys that were of primary interest:
HKLM\Software\Macromedia\FlashPlayer
HKLM\Software\Macromedia\FlashPlayerActiveX
HKLM\Software\Macromedia\FlashPlayerPlugin
HKLM\Software\Microsoft\Windows\CurrentVersion\Uninstall
HKLM\Software\Mozilla.org
HKLM\Software\Mozilla
HKLM\Software\MozillaPlugins
The logs were written to text files named according to the following pattern:
\\serverlogs\logs$\[date]\[computer name]\[registry hive path without forbidden special characters].txt
An example of such a file name is "\\serverlogs\logs$\regToFile.ANSI\2011-09-13\regToFile-[12-143057][2011-09-03]\[HKCU][SOFTWARE][Macromedia][FlashPlayer].txt". An example of its contents:
[
{"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer", "type": "folder"},
{"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer", "type": "REG_SZ", "name": "CurrentVersion", "value": "9,0,45,0"},
{"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "folder"},
{"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "REG_DWORD", "name": "6.0", "value": 88},
{"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "REG_DWORD", "name": "7.0", "value": 65},
{"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "REG_DWORD", "name": "8.0", "value": 33},
{"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "REG_DWORD", "name": "9.0", "value": 45}
]
There are more than a hundred machines in the domain, and the number of files grew quickly. With such a set of logs, I sometimes want to build, on the fly, an overview of the file contents in the following form:

But it turned out that when the log files are scattered across subdirectories, the (Windows) find command cannot be run on them: it does not search subdirectories. So I mounted the network directory with the logs in Ubuntu:
sudo mount -t cifs -o user=<domain\\username>,password=<domain_password>,iocharset=utf8 //serverlogs/logs$/ /media/serverlogs/
At first, Linux did not bring success either: on its own, find only locates files and does not search their contents. But Linux is good in that its console is made for administrators, even though the interface is not friendly at all. The man page says that find has the -exec option, and it is a superb option. It would seem that we only need to pass the grep command to it and we get the coveted result... But a slight disappointment awaited us here! The log files were written in UNICODE, that is, UTF-16, which is what Windows means by "Unicode" (maybe my architectural mistake?), and grep does not understand UTF-16 (it does understand UTF-8). Developing the idea further: there is the iconv command, which can convert encodings on the fly, and this is where that ability came in handy. Adding a pipeline, we get a command of this form:
time find /media/serverlogs/regToFile.ANSI/ -name "*.txt" -exec iconv -f UNICODE -t UTF-8 {} \; | grep 'Macromedia\\\\FlashPlayer.*CurrentVersion'
A little explanation:
[time] - displays the time spent executing the command.
[find /media/serverlogs/regToFile.ANSI/ -name "*.txt"] - finds all *.txt files in /media/serverlogs/regToFile.ANSI/ and its subdirectories.
[-exec iconv -f UNICODE -t UTF-8 {} \;] - converts the contents of each found file (one at a time) from UNICODE to UTF-8; {} is replaced with the file name.
[| grep 'Macromedia\\\\FlashPlayer.*CurrentVersion'] - finds the lines matching Macromedia\\\\FlashPlayer.*CurrentVersion in the converted text. The backslashes are doubled twice: JSON stores each backslash of a registry path as \\, and in a regular expression each literal backslash must itself be escaped as \\.
The desired result is achieved and looks like the picture above. I suspect I am not the only one who has had this problem; if this comes in handy for someone, I will be glad.
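The pipeline can be tried without the network share. The following is a minimal sketch, assuming GNU find, GNU grep and glibc iconv, and assuming that the "UNICODE" logs are UTF-16LE with a leading FF FE byte-order mark (how Windows typically writes "Unicode" text files). The directory and file names are hypothetical stand-ins for the real log tree. The sketch shows that a plain grep finds nothing in the UTF-16 files, while the find -exec iconv version finds both lines.

```shell
#!/bin/sh
# Sketch: reproduce "find -exec iconv | grep" on a throwaway tree.
# Assumption: logs are UTF-16LE with an FF FE byte-order mark.
demo=$(mktemp -d)                       # hypothetical stand-in for /media/serverlogs
line='{"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer", "type": "REG_SZ", "name": "CurrentVersion", "value": "9,0,45,0"}'

for pc in pc-01 pc-02; do
  mkdir -p "$demo/2011-09-13/$pc"
  # BOM (octal 377 376 = FF FE), then the JSON line encoded as UTF-16LE
  { printf '\377\376'; printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-16LE; } \
    > "$demo/2011-09-13/$pc/[HKLM][SOFTWARE][Macromedia][FlashPlayer].txt"
done

# Four backslashes in the shell string = a regex matching two literal backslashes
pattern='Macromedia\\\\FlashPlayer.*CurrentVersion'

# Plain grep sees a NUL byte after every ASCII character, so nothing matches.
plain=$(cat "$demo"/*/*/*.txt | grep -c "$pattern" || true)

# Convert each file to UTF-8 first, then grep the combined stream.
found=$(find "$demo" -name "*.txt" -exec iconv -f UNICODE -t UTF-8 {} \; | grep "$pattern")
matches=$(printf '%s\n' "$found" | grep -c CurrentVersion)

echo "plain grep: $plain matching lines; with iconv: $matches"
printf '%s\n' "$found"
rm -rf "$demo"
```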
PS
After going through the comments, man grep (the -r option) and the documentation for "Scripting.FileSystemObject" and OpenAsTextStream(), I concluded that the problem was "hidden" in the OpenAsTextStream() method itself. It has a format parameter: with -1 (TristateTrue) the file is opened in Unicode mode, and with 0 (TristateFalse) in ASCII mode. Strictly speaking, ASCII mode writes in the system ANSI code page rather than UTF-8, but for plain ASCII text such as registry paths and version numbers the bytes are identical to UTF-8. I had -1, and that was the root of the problem. I set it to 0, and the files started working with grep -r on Linux and findstr on Windows. It is strange, of course, that these tools do not understand UNICODE. Still, if I want to do something with a found line before displaying it on the screen, I will use find -exec.
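Since the root cause was the mode each file was written in, it can help to list the logs that are still in Unicode mode before relying on grep -r or findstr. A minimal sketch, assuming that Unicode-mode (TristateTrue) files start with the UTF-16LE byte-order mark FF FE and that ASCII-mode (TristateFalse) files do not; the demo tree and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: list *.txt files that begin with the UTF-16LE BOM (FF FE).
# Assumption: TristateTrue logs carry this BOM, TristateFalse logs do not.
demo=$(mktemp -d)                      # hypothetical log root
mkdir -p "$demo/pc-01"
json='{"name": "CurrentVersion", "value": "9,0,45,0"}'
printf '%s\n' "$json" > "$demo/pc-01/ascii-mode.txt"
{ printf '\377\376'; printf '%s\n' "$json" | iconv -f UTF-8 -t UTF-16LE; } \
  > "$demo/pc-01/unicode-mode.txt"

utf16_files=$(
  find "$demo" -name '*.txt' | while IFS= read -r f; do
    # od prints the first two bytes in hex, e.g. " ff fe"; tr drops spaces/newline
    if [ "$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')" = "fffe" ]; then
      printf '%s\n' "$f"
    fi
  done
)

echo "Files still written in Unicode mode:"
printf '%s\n' "$utf16_files"
rm -rf "$demo"
```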
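In short, to write the logs and then display the found lines:
JavaScript -> file.OpenAsTextStream(ForAppending, TristateFalse); where file is a File object obtained through "Scripting.FileSystemObject" (TristateFalse = 0, ASCII mode instead of Unicode, which is what makes the logs readable by grep and findstr)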
Windows:
cd <rootPath>
findstr /s "text" *.txt
Linux:
grep -r "text" <rootPath>
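One difference between the two commands: findstr only looks at *.txt files, while grep -r searches every file under <rootPath>. GNU grep's --include option restricts the search the same way. A minimal sketch on a hypothetical demo tree, assuming GNU grep and log files that are already ASCII/UTF-8:

```shell
#!/bin/sh
# Sketch: the grep -r counterpart of `findstr /s "text" *.txt`.
# Assumes GNU grep (for --include) and ASCII/UTF-8 log files.
demo=$(mktemp -d)                      # hypothetical stand-in for <rootPath>
mkdir -p "$demo/2011-09-13/pc-01"
printf '%s\n' '{"path": "HKLM\\SOFTWARE\\Mozilla", "type": "folder"}' \
  > "$demo/2011-09-13/pc-01/[HKLM][SOFTWARE][Mozilla].txt"
printf '%s\n' 'Mozilla mentioned in a file that is not a log' \
  > "$demo/2011-09-13/pc-01/notes.log"

# -r: recurse into subdirectories, -n: prefix line numbers,
# --include: only search files whose names match *.txt
result=$(grep -rn --include='*.txt' 'Mozilla' "$demo")

printf '%s\n' "$result"
rm -rf "$demo"
```

Without --include, the same grep would also report the line from notes.log.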
Continuing the topic of searching, I converted the UNICODE log files to UTF-8 (in the Linux bash console):
time find /media/serverlogs/ -name "*.txt" -exec iconv -f UNICODE -t UTF-8 {} -o {}.utf8 \; -exec echo {} \;
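The conversion command can also be checked on a throwaway tree. A minimal sketch, assuming GNU find (it substitutes {} even inside a longer argument such as {}.utf8, which POSIX only guarantees when {} stands alone) and glibc iconv with its -o option; the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: convert every UTF-16 ("UNICODE") *.txt log to a UTF-8 copy
# named <file>.utf8 and print the name of each converted file.
# Assumes GNU find and glibc iconv; the tree below is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/2011-09-13/pc-01"
src="$demo/2011-09-13/pc-01/[HKLM][SOFTWARE][Mozilla].txt"
{ printf '\377\376'              # UTF-16LE byte-order mark
  printf '%s\n' '{"path": "HKLM\\SOFTWARE\\Mozilla", "type": "folder"}' |
    iconv -f UTF-8 -t UTF-16LE
} > "$src"

# The second -exec (echo) runs only when the first one (iconv) succeeded.
printed=$(find "$demo" -name "*.txt" -exec iconv -f UNICODE -t UTF-8 {} -o {}.utf8 \; -exec echo {} \;)
converted=$(cat "$src.utf8")

echo "converted: $printed"
echo "content:   $converted"
rm -rf "$demo"
```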
Note that to print the name of each converted file to the console, you have to use the -exec option twice; the second -exec runs only if the first one succeeded. Combining commands with && inside a single -exec will fail: -exec runs exactly one command, and find starts it directly rather than through a shell, so && is never interpreted.
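If both steps really are wanted inside one -exec, the usual workaround is to give find a single command that is itself a shell (sh -c) and put the && inside its script. A minimal sketch, assuming glibc iconv; the demo tree is hypothetical. Here the shell builds the output name "$1.utf8", so the command does not depend on find substituting {} inside a longer argument.

```shell
#!/bin/sh
# Sketch: run "convert && report" as one -exec by wrapping it in sh -c.
# find passes each file name as $1; the word "sh" fills $0 (used in error messages).
demo=$(mktemp -d)                      # hypothetical log root
for pc in pc-01 pc-02; do
  mkdir -p "$demo/$pc"
  { printf '\377\376'                  # UTF-16LE byte-order mark
    printf '%s\n' "{\"computer\": \"$pc\"}" | iconv -f UTF-8 -t UTF-16LE
  } > "$demo/$pc/log.txt"
done

report=$(find "$demo" -name '*.txt' -exec sh -c \
  'iconv -f UNICODE -t UTF-8 -o "$1.utf8" "$1" && echo "converted: $1"' sh {} \;)

printf '%s\n' "$report"
rm -rf "$demo"
```

This costs one extra shell process per file, which is negligible next to the iconv run itself.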