Removing metadata from a document seems like a trivial task. Information security experts (the paranoid kind) have repeated the same recommendation a thousand times: “Be sure to remove excess metadata from documents before publishing,” and explained why this may be necessary (example). There are plenty of instructions on the Internet for doing this with various image and document formats, yet surprisingly little has been written about a format as common as PDF.
I ran a small experiment and, based on the results, put together a small toolchain of free utilities, which is what I want to share here.
The first thing I tried was deleting the data with Adobe Acrobat itself, following the appropriate instructions. It works, but the result can hardly be called satisfactory: first, it is like using a sledgehammer to crack a nut, and second, for some reason the output file grew by almost an order of magnitude.
Then, among the heap of crapware, I found the wonderful Windows utility BeCyPDFMetaEdit. It copes confidently with PDF 1.6 and lower, but for newer revisions of the format the result is not guaranteed.
The ultimate solution, as usual, came from the world of *nix and the open source community: a bundle of the ExifTool, QPDF, and Xpdf utilities, each of which is also available for Windows. Since their licenses do not prohibit free redistribution of unmodified copies, I took the liberty of collecting them in a single archive (Windows x64) together with a script and basic usage instructions. In short: unpack the archive, put the PDF file to be cleaned into the resulting folder, and drag it onto DEMETA.bat. The script will run, and your file will come out pristine.
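For readers who would rather see the moving parts than trust a batch file, here is a rough sketch of how the three tools can be chained. It is an assumption about what such a pipeline might look like, not the actual contents of DEMETA.bat. The function names `build_commands` and `strip_metadata` are made up for illustration, and the sketch expects `exiftool`, `qpdf`, and `pdfinfo` to be on the PATH. The reason for two steps is that ExifTool edits a PDF by appending an update, so the old metadata remains recoverable until QPDF rewrites the file.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_commands(scratch, dst):
    """Return the argv lists for the three stages, in order (hypothetical helper)."""
    return [
        # 1. ExifTool blanks all writable metadata in the scratch copy.
        #    For PDFs this is an appended update, so the old data is still inside.
        ["exiftool", "-all=", "-overwrite_original", str(scratch)],
        # 2. QPDF rewrites the file and drops objects that are no longer referenced,
        #    which is what makes the removal permanent.
        ["qpdf", "--linearize", str(scratch), str(dst)],
        # 3. pdfinfo (Xpdf) prints whatever Info and XMP metadata is left,
        #    for a manual check of the result.
        ["pdfinfo", "-meta", str(dst)],
    ]


def strip_metadata(src, dst):
    """Clean a copy of src and write the result to dst (hypothetical helper)."""
    src, dst = Path(src), Path(dst)
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / src.name
        shutil.copy2(src, scratch)  # work on a copy, never on the original
        for argv in build_commands(scratch, dst):
            # check=True stops the chain on any non-zero exit code;
            # note that qpdf also exits non-zero when it only reports warnings.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        strip_metadata(sys.argv[1], sys.argv[2])
    else:
        print("usage: demeta.py input.pdf output.pdf")
```

Saved as, say, `demeta.py`, it would be run as `python demeta.py input.pdf clean.pdf`. The output of the last step is worth reading: ideally it shows no author, creator, or producer values you did not expect to see.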
Sources of Inspiration and Links to the Software Used