Prehistory
Hello! My name is Zhenya and I am a programmer. Nothing special. For now.
Today I would like to share a story with a happy ending. It convinced me that even if you don't consider yourself an above-average programmer and mostly solve trivial tasks, programming as a process can still be very exciting!
In front of me was one of the most typical tasks: output data to a text file. The file format had to be one that opens on any average desktop machine. I was sure the task was simple. But this week, fate decided to teach me a lesson...
Story
It all started with choosing a format and a library to make life easier. The format was docx, and the library was OpenTBS. It seemed to have everything I needed: the right file format and support for templates. But as three days of work showed, if this thing can handle nested arrays at all, it does so in such a non-obvious way that you might need to join some sect to comprehend it. I decided to follow the path of least resistance: I shared my thoughts about the library with my monitor and started looking for another one.
Next came a simple tool called odtPHP, whose site does not seem to be working at the moment. As you have already guessed, along with the library I had to change the format too: it became odt. If you really want to, you can even be glad about that: open format and all that.
This is where the fun began. The document is generated from a template. If you make the template in LibreOffice (which is what I had at hand first), it opens in Word only after a recovery prompt. And if you create the template in Word? Then it opens without any recovery. But after minimal edits to the template, odtPHP threw an error saying that the variable was not found in the template. Furious yelling and poking a finger at the variable's name in the template did not help. Strange. It turned out that the variable was apparently written in the template, yet odtPHP could not find it with its regular expression.
I began to suspect that a space, a dash, or some other character might be represented differently in the source of a file made in Word. Since I knew that odt, like docx, is just zipped XML, I decided to dig into the issue, and for greater certainty I also created the same odt template via Google Drive. After unzipping the odt files from the different "creators", the following picture emerged:
It became obvious that a format is a format, but each program has its own view of it. As you can guess, the entire content of the document is stored in content.xml. I open it. I look for my variables. And, lo and behold, I'm on the right track! This is how my variable looks in Word (in the variant created in Word):
And this is how this fragment looks in content.xml:
And the regular expression was as simple as possible:
$reg = '@\[!--\sBEGIN\s' . $string . '\s--\](.*)\[!--.+END\s' . $string . '\s--\]@smU';
I never managed to find out what <text:s/> is. You might think it is a space, but why did it end up exactly here, while other places have just a plain space? Unclear. I could not predict where Word would place these magic spaces. So Word writes a space as a plain space in some places and as <text:s/> in others, which breaks the search for a variable with such a minimal pattern.
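For what it's worth, the OpenDocument specification defines <text:s/> as an element that stands for a space character, with an optional text:c attribute giving the number of spaces. Why Word uses it for some single spaces is not something this sketch explains. Below is a minimal illustration of the mismatch and one possible workaround, not a description of how odtPHP itself works: the function names findBlock and normalizeOdfSpaces are hypothetical, the two XML fragments are invented for the example, and the preg_quote call is my addition. Only the regular expression is taken from the text above.

```php
<?php
// Hypothetical helpers; only the regular expression comes from the article.

/** Replace ODF <text:s/> elements with the literal spaces they stand for. */
function normalizeOdfSpaces(string $xml): string
{
    $result = preg_replace_callback(
        '@<text:s(?:\s+text:c="(\d+)")?\s*/>@',
        static function (array $m): string {
            // text:c holds the number of spaces; without it the element means one space.
            $count = (isset($m[1]) && $m[1] !== '') ? (int) $m[1] : 1;
            return str_repeat(' ', $count);
        },
        $xml
    );
    return $result ?? $xml; // preg_replace_callback returns null on error
}

/** Return the text between the BEGIN/END markers of a block, or null if not found. */
function findBlock(string $xml, string $name): ?string
{
    $string = preg_quote($name, '@'); // assumption: escape the name before embedding it
    $reg = '@\[!--\sBEGIN\s' . $string . '\s--\](.*)\[!--.+END\s' . $string . '\s--\]@smU';
    return preg_match($reg, $xml, $m) === 1 ? $m[1] : null;
}

// Invented fragments for illustration, not copied from a real content.xml.
$plain    = '<text:p>[!-- BEGIN row --]Hello[!-- END row --]</text:p>';
$fromWord = '<text:p>[!--<text:s/>BEGIN row --]Hello[!-- END row<text:s/>--]</text:p>';

var_dump(findBlock($plain, 'row'));                        // string(5) "Hello"
var_dump(findBlock($fromWord, 'row'));                     // NULL: \s does not match <text:s/>
var_dump(findBlock(normalizeOdfSpaces($fromWord), 'row')); // string(5) "Hello"
```

Replacing the element with plain spaces is probably safest on a copy of the XML used only for locating markers, since runs of literal spaces may be collapsed when the document is rendered.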
This is how a trivial task turned into a fascinating journey through the structure of the odt format!
Moral
No matter what level of knowledge you have, and however low your own opinion of your skills, don't be afraid to dig deeper! Unless, of course, you are an excavator operator standing next to a "No digging" sign.
P.S. It would be very interesting to read stories like this in the Prehistory-Story-Moral format from cool developers, which could later be retold for educational purposes.