
Processing large archived files on a Mac (and elsewhere)

I once had to process a log file. The task itself is trivial, and I use Perl for this kind of thing on both Linux and Windows. This time, though, it all happened on a Mac, the file sat inside an archive, and it was big: about 20 GB unpacked.
What is the usual solution?

If the file were small, you could simply extract it from the archive and feed it to the script. That is not the case here, and it would be a pity to waste the disk space. The standard solution is to unpack the file to STDOUT and have the handler read it straight from STDIN through an unnamed pipe (the "|" symbol). No sooner said than done. The standard Mac unpacker, unzip, has an option for this, -p:
 unzip -p data.zip log.txt | ./process.pl > results.txt

Here process.pl is the log handler.
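A pipeline like this reports only the exit status of its last command by default, so an unpacker that fails can leave an empty result without any visible error. Below is a minimal sketch of a guard that checks whether the result file actually has content. The `unpack` and `handler` functions are hypothetical stand-ins (for `unzip -p data.zip log.txt` and `process.pl`) so that the sketch runs without a real archive, and the log lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-ins so the sketch runs without an archive:
#   unpack  plays the role of `unzip -p data.zip log.txt`
#   handler plays the role of `process.pl` (here it just keeps ERROR lines)
unpack() {
    if [ "$1" = "broken" ]; then
        return 1    # simulate an unpacker that fails and prints nothing
    fi
    printf 'INFO start\nERROR disk full\nINFO stop\n'
}
handler() { sed -n '/ERROR/p'; }

# Run the pipeline, then report whether the result file has any content.
# The pipeline itself exits with the handler's status (0 in both cases),
# so the size check is what reveals the failed unpacker.
run_and_check() {
    unpack "$1" | handler > "$2"
    if [ -s "$2" ]; then
        echo "ok"
    else
        echo "empty"
    fi
}

good=$(run_and_check working result_good.txt)
bad=$(run_and_check broken result_bad.txt)
echo "good=$good bad=$bad"
```

With the real commands, the same `[ -s results.txt ]` test after the pipeline would flag an empty result right away.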
After testing on small files, everything was debugged and I moved on to the real file. But here I was in for a surprise. The file was processed instantly, but the result was empty. It turned out that files larger than 4 GB are not unpacked. And that on a 64-bit OS. After some googling, it turned out that yes, this problem exists. Reportedly, such a file can even be packed but not unpacked.
Some of the suggested programs were good, for example The Unarchiver (http://wakaba.c3.cx/s/apps/unarchiver.html), but they had only a graphical interface; well, of course, this is a Mac. Fortunately, there was another utility from the same author, unar (http://code.google.com/p/theunarchiver/downloads/list), which works from the command line. All very nice, but it can only unpack the file to disk, and only under its original name.
So what to do? I had already decided to look for something else, but I remembered just in time about named pipes, which let you create a pseudo-file on disk that acts as a pipe: one program writes to it, another reads from it, and both believe they are working with a real file. So the plan of action was as follows:

1. Create a named pipe with the same name as the file inside the archive:
 mkfifo log.txt 

2. Start the handler that will read the data from it. Append the & character so that it runs in the background; otherwise it will wait for data and hold the terminal until it has finished processing:
 ./process.pl <log.txt >results.txt & 

3. Now start the unpacking:
 ./unar -D -f data.zip log.txt 

The -D option tells unar not to create a containing directory.
The -f option makes it overwrite a file with the same name if one already exists instead of stopping.

4. When the work is done, delete the named pipe:
 unlink log.txt 
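The four steps above can be put together in one small script. This is only a sketch under stated assumptions: `sort` is a hypothetical stand-in for ./process.pl and `printf` is a hypothetical stand-in for the unar call, so that the script runs without the real archive. The added `wait` makes the script pause until the background reader has finished before the pipe is removed.

```shell
#!/bin/sh
# 1. Create the named pipe under the name the unpacker will write to.
mkfifo log.txt

# 2. Start the reader in the background; it blocks until a writer opens the pipe.
#    `sort` is a hypothetical stand-in for ./process.pl.
sort < log.txt > results.txt &
reader_pid=$!

# 3. Start the writer. `printf` is a hypothetical stand-in for
#    `./unar -D -f data.zip log.txt` from the steps above.
printf 'b\na\nc\n' > log.txt

# Wait for the background reader to drain the pipe and exit.
wait "$reader_pid"

# 4. Remove the named pipe; the results stay in results.txt.
rm log.txt

cat results.txt
```

In a real run, steps 2 and 3 would be the ./process.pl and ./unar commands shown in the list, and nothing else needs to change.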

That's it, everything works. Naturally, all of the above applies to ordinary Linux as well.


Source: https://habr.com/ru/post/251007/

