Greetings, venerable geeks! In my student years I slid downhill and became a nerd, or, as they say in Russian, a "botanist". And a real one at that: the kind who rips up "grasses and lichens" and then sits over heavy identification keys. Although botany never became my main occupation, I still got my share of botanical (or rather geobotanical) work. Under the cut: a look at geobotanical work from an IT specialist's perspective, told through the experience of developing some hacky software. Please note that the phrases "geobotanical work" and "an IT person's view" appearing in the same sentence mean that the following text is contraindicated for pregnant women, minors, and people with an unstable psyche or cardiovascular disease.

“Who is Tyler Durden?”
People often ask me who geobotanists are. I answer: geobotanists are specialists in plant communities. Unlike, say, taxonomists, who travel from one dusty herbarium to another, geobotanists collect their material exclusively in the field, making geobotanical descriptions: special tables that record which plants grow on a sample plot and in what quantities. From these tables you can later estimate soil fertility and moisture, the degree of anthropogenic disturbance, the forage value of the site, and many other interesting things. And, as you might guess, these descriptions are still made the way they were back "when dinosaurs were small": with a paper form and graph paper. The most advanced use Excel.
"And now, Hunchback!"
To paraphrase Zheglov: "A geobotanist who copies data from a paper form into Excel is drawing his pay for nothing." And Excel is not to blame at all. I have no complaints about the program: it is a wonderful thing, ubiquitous, and has a handy portable version. Even the fact that the xls format is not a GOST standard not only bothers nobody, it is unknown to most specialists. The main problem with transferring paper forms into spreadsheet editors is that the work is pointless.
What is wrong with Excel tables? Here is an analogy with books. Five years ago I started actively collecting scanned PDF and DjVu books. Over the years the collection grew to hundreds of gigabytes, and it has perhaps only one flaw: at some point I stopped using these books entirely, because searching my own library took longer than finding the same information online. E-book formats are fine for reading fiction on an e-reader, but for storing technical literature only the web will do. Scanning the "Flora of the USSR" will not make it more in demand than the botanical sections of Wikipedia.
The same goes for geobotanical descriptions. While working on a project on reconstructed circumboreal vegetation, I assembled a small electronic phytocenotheca (a collection of geobotanical descriptions) of a few hundred descriptions, in a naive attempt to process it. It should be noted that the descriptions in phytocenothecas were made in different natural zones, by different authors, at different times, and with different methods. Such descriptions are fundamentally impossible to standardize and compare, unless we are talking about a rough qualitative comparison ("lichens here, mosses there, and berries over yonder"). Even building a single database from these descriptions is painful, and therefore the wrong approach.
"We must understand the depth of our depths!"
Yes, I am grumbling again about the lack of standards and the absurdity of modern geobotany. It was Leonty Ramensky who could afford to collect tens of thousands of descriptions and process them by hand. Today that is impossible: nobody needs such work, even though technical means could genuinely raise productivity. So if we want to work with large phytocenothecas, everyone's data has to be merged into a single database. But for that, descriptions must at least be made to uniform standards, not however each author sees fit.
Yes, I understand that "scientific" field work today is very often nothing more than a paid trip. That is why nobody raises the question of uniform standards and methods. That is also why nobody questions whether descriptions need to be published in the journal "Vegetation of Russia" at all (publishing description tables in a PAPER journal has long deserved the order of "Honorary Old Fogey"). Still, it is the 21st century, and geobotanists keep filling in paper forms of their own design.
“This is not you!”
I made my first attempt to streamline work with "raw" descriptions three years ago, while developing PhytoSoft (written in Borland C++ Builder 6). The goal then was to make data entry from the field form easier and faster, for subsequent analysis with L.G. Ramensky's ecological scales (remember how I said above that geobotanical descriptions can tell you about soil fertility and moisture? That is exactly the "method of ecological scales"). I managed to get the program into working order, but on an extremely low budget it never got past alpha testing and was later released to the public with all its cockroaches (bugs).
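For readers outside botany, the core idea of the ecological-scale method can be illustrated roughly like this: each species is assigned an interval on a scale (moisture, say), and the site is placed where the intervals of the species growing on it overlap the most. The sketch below is a deliberate simplification, assuming plain inclusive (low, high) intervals per species; the real scales also distinguish abundance classes, and the species names and numbers here are invented, not taken from Ramensky's tables.

```python
from collections import Counter


def site_position(species_ranges, scale_max):
    """Return the scale points where the most species intervals overlap.

    species_ranges maps a species name to an inclusive (low, high) interval
    on a hypothetical ecological scale running from 1 to scale_max.
    The result may contain several separate bands if there is a tie.
    """
    counts = Counter()
    for low, high in species_ranges.values():
        # Clip each interval to the scale and count every point it covers.
        for point in range(max(low, 1), min(high, scale_max) + 1):
            counts[point] += 1
    if not counts:
        return []
    best = max(counts.values())
    return sorted(point for point, n in counts.items() if n == best)


# Invented intervals on a 1..120 moisture scale, for illustration only.
ranges = {
    "Species A": (40, 70),
    "Species B": (55, 90),
    "Species C": (60, 80),
}
band = site_position(ranges, scale_max=120)
print(band[0], band[-1])  # 60 70
```

Here all three intervals overlap on points 60 to 70, so that band is taken as the site's position on the scale.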

Now I understand that the PhytoSoft concept contained several terrible strategic mistakes. And it is not even that the code is (censored) and was written by someone whose hands grow from the same place as his legs. The very idea of simplifying data entry from the paper form is fundamentally wrong.
Whenever I demonstrated the first successfully compiled version, I was asked whether descriptions could be imported from Excel into PhytoSoft. Of course I had provided for this, but the import mechanism was ugly, and I always tried to dodge the question, although it was and remains one of the most important ones. Even if a miracle program for geobotanists appeared tomorrow, what would we do with the descriptions already entered into spreadsheets? As I said above, it is almost impossible to bring descriptions by different authors to a single template: either each description will have its own structure, or the common structure will be extremely complex and there will always be some description that does not fit it. This called for a fundamentally different way of organizing description data than the rows and columns we are used to.
The *.gbo format used in PhytoSoft was not suited to this task. I never properly formalized its specification, but the main problem is that it, too, was built on those same "rows and columns". Simply put, a *.gbo file is a huge table, a thousand rows high and several hundred columns wide. Each description occupies one row and is split into logical elements placed in different cells. For example, the fifth cell of the first row holds the author of the first description, the sixth cell of the first row holds its date, and so on; the fifth cell of the second row holds the author of the second description, the sixth cell of the second row holds its date, and so on. The logic of the format is very simple, but external files had to be painfully reworked before import (imagine: your summary table of descriptions has to be rearranged so that species names occupy columns 50 to 100 and their projective cover values occupy columns 101 to 151). The problem arose because, instead of developing a program for a format, I developed a format for a program.
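To make the rigidity of such a fixed-position layout concrete, here is a minimal sketch of reading one row of this kind. Only the column positions come from the examples above (author in cell 5, date in cell 6, species names in 50 to 100, cover in 101 to 151); the pairing of the i-th name column with the i-th cover column, the function names, and the sample values are assumptions for illustration.

```python
# Column positions taken from the examples in the text (1-based, as in a
# spreadsheet). Everything else about the layout is an assumption.
AUTHOR_COL = 5
DATE_COL = 6
SPECIES_COLS = range(50, 101)  # species names: columns 50..100
COVER_COLS = range(101, 152)   # projective cover: columns 101..151


def cell(row, col):
    """Return the value of 1-based column `col`, or "" if the row is shorter."""
    return row[col - 1] if col <= len(row) else ""


def parse_gbo_row(row):
    """Turn one fixed-layout row (a list of strings) into a description dict."""
    species = []
    # Assumption: the i-th name column pairs with the i-th cover column.
    for name_col, cover_col in zip(SPECIES_COLS, COVER_COLS):
        name = cell(row, name_col).strip()
        if name:
            species.append((name, cell(row, cover_col).strip()))
    return {
        "author": cell(row, AUTHOR_COL),
        "date": cell(row, DATE_COL),
        "species": species,
    }


# A hypothetical row: every value must sit in exactly the right column.
row = [""] * 151
row[AUTHOR_COL - 1] = "author_1"
row[DATE_COL - 1] = "2015-07-12"
row[50 - 1] = "Betula pendula"
row[101 - 1] = "30"
print(parse_gbo_row(row))
```

The parser itself is trivial; the cost lies elsewhere. Any table that does not already put each value at exactly these positions has to be rearranged by hand before it can be read at all.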
"This is not a tub, but a real Japanese furake!"
Maybe I am reinventing the wheel, but I learned the hard way that software and its file format are separate things (as far as development goes). The file format should be designed first, and independently of when and by whom software for it will be written. Only after that does it make sense to write a program. Along the way you will have to solve many problems that would never come up if you designed the format just for yourself, but in return the risk that the format turns out to have critical flaws goes down.
Here is a concrete example. If you develop the format as a separate project from the start, you will surely take into account that it should be importable into GIS. Then, once you start writing the program, you will be forced to deal with GIS compatibility, even if your program has nothing to do with geographic information systems. But the format will not have to be abandoned, which would inevitably happen once it turned out to be useless for ArcGIS, QGIS, or some other program.
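As one illustration of what GIS compatibility could mean in practice, a description that carries plot coordinates can be exported as GeoJSON (RFC 7946), a format that QGIS, for example, opens directly. This is a minimal sketch assuming a hypothetical description dict with `lat` and `lon` keys in WGS 84 decimal degrees; all other keys are simply passed through as feature properties.

```python
import json


def description_to_geojson(description):
    """Convert one description dict into a GeoJSON Point Feature.

    Assumes hypothetical keys 'lat' and 'lon' (WGS 84 decimal degrees);
    every other key becomes a feature property.
    """
    props = {k: v for k, v in description.items() if k not in ("lat", "lon")}
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {
            "type": "Point",
            "coordinates": [description["lon"], description["lat"]],
        },
        "properties": props,
    }


def to_feature_collection(descriptions):
    """Wrap several descriptions into a single GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [description_to_geojson(d) for d in descriptions],
    }


# Hypothetical description with invented values.
d = {
    "id": "d1",
    "lat": 61.5,
    "lon": 34.2,
    "species": [{"name": "Betula pendula", "cover": 30}],
}
print(json.dumps(to_feature_collection([d]), indent=2))
```

The longitude-first order is the detail most often gotten wrong; a format designed with GIS export in mind would store coordinates unambiguously from the start.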
The file format should make conversion to and from other formats as easy as possible.
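A cheap way to check this requirement is a round trip: export to some other format, import back, and compare. Below is a minimal sketch assuming a hypothetical in-memory description (id, author, date, list of species with cover) and a "long" CSV layout with one row per species occurrence, using only Python's standard `csv` module. Cover values are kept as strings so the round trip is exact.

```python
import csv
import io

# Hypothetical long layout: one CSV row per (description, species) pair.
FIELDS = ["description_id", "author", "date", "species", "cover"]


def to_long_csv(descriptions):
    """Export descriptions to long-format CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for d in descriptions:
        for name, cover in d["species"]:
            writer.writerow({
                "description_id": d["id"], "author": d["author"],
                "date": d["date"], "species": name, "cover": cover,
            })
    return buf.getvalue()


def from_long_csv(text):
    """Rebuild descriptions from long-format CSV text, preserving order."""
    descriptions = {}
    for row in csv.DictReader(io.StringIO(text)):
        d = descriptions.setdefault(row["description_id"], {
            "id": row["description_id"], "author": row["author"],
            "date": row["date"], "species": [],
        })
        d["species"].append((row["species"], row["cover"]))
    return list(descriptions.values())


data = [{
    "id": "d1", "author": "author_1", "date": "2015-07-12",
    "species": [("Betula pendula", "30"), ("Vaccinium myrtillus", "15")],
}]
assert from_long_csv(to_long_csv(data)) == data
```

The long layout is easy to pivot into a species-by-description table in any spreadsheet, at the price of repeating header fields on every row. Note that in this sketch a description with no species at all would not survive the round trip.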
By building PhytoSoft around the idea of simplifying the digitization of paper forms, I was wrong. There should be no paper forms at all; field data should be ready for processing immediately. But since every dead-end path ends sooner or later, the concept quickly became obsolete and gave rise to a new problem. On the one hand, it became clear that a geobotanist must enter data into the program as soon as it is collected in the field. On the other hand, this means he will use neither a desktop computer nor a laptop. Tablets? Their price is as indecent as their battery life is short. Most importantly, I remembered the state in which field forms come back: soaked, with blood stains from squashed mosquitoes. As a user, I would not want my (and not only my) habit of thoughtlessly tossing my field clipboard (in Russian, the same word as "tablet") down next to me to destroy a piece of electronics one day. So the tablet was out. That left the smartphone. In terms of use it is the ideal option: it costs much less, you can take it with you in any weather, and it is much less likely to get damaged. But it also means that all of PhytoSoft's interface code can be thrown away. Field conditions call for something fundamentally different from a pile of selection windows.
"I'll be back"
For a long time the question of a new interface seemed like a dead end, and some details are still unclear. As the format takes shape, it becomes obvious that the interface should be as simple and compact as possible. As far as I can imagine, it should be something like a command line. As for the format, I consider it most reasonable to abandon the tabular organization of data in favor of an HTML-like markup language. On the one hand, this makes it easy to restore the tabular structure of descriptions; on the other, descriptions already entered in spreadsheet editors are much easier to bring to the standard by adding the appropriate tags.
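The tags of the future format are not defined yet, so here is only a sketch of the idea under invented assumptions: a hypothetical `<description>` element with attributes such as `id`, containing `<sp cover="...">` elements with species names. It uses the standard library's tolerant `html.parser.HTMLParser` to read the markup and then rebuilds the familiar species-by-description table from it.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect hypothetical <description> and <sp cover="..."> tags."""

    def __init__(self):
        super().__init__()
        self.descriptions = []  # [{"attrs": {...}, "species": {name: cover}}]
        self._sp_attrs = None   # attributes of the <sp> tag being read
        self._sp_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "description":
            self.descriptions.append({"attrs": attrs, "species": {}})
        elif tag == "sp" and self.descriptions:
            self._sp_attrs = attrs
            self._sp_text = []

    def handle_data(self, data):
        if self._sp_attrs is not None:
            self._sp_text.append(data)

    def handle_endtag(self, tag):
        if tag == "sp" and self._sp_attrs is not None:
            # Normalize whitespace inside the species name.
            name = " ".join("".join(self._sp_text).split())
            self.descriptions[-1]["species"][name] = self._sp_attrs.get("cover", "")
            self._sp_attrs = None


def to_table(descriptions):
    """Rebuild a table: one row per species, one column per description."""
    ids = [d["attrs"].get("id", str(i + 1)) for i, d in enumerate(descriptions)]
    species = sorted({name for d in descriptions for name in d["species"]})
    rows = [[name] + [d["species"].get(name, "") for d in descriptions]
            for name in species]
    return [["species"] + ids] + rows


text = """
<description id="d1" author="author_1" date="2015-07-12">
  <sp cover="30">Betula pendula</sp>
  <sp cover="15">Vaccinium myrtillus</sp>
</description>
<description id="d2">
  <sp cover="5">Betula pendula</sp>
</description>
"""
parser = DescriptionParser()
parser.feed(text)
parser.close()
for line in to_table(parser.descriptions):
    print(line)
```

Because each value is labeled by its tag rather than by its column position, descriptions with different sets of fields can live in one file, and an existing spreadsheet column of species names can be brought to the format by wrapping each cell in `<sp>` tags.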
With that, my story is done. I will gratefully accept any criticism and advice, for, as one ogre liked to say: "One head is good, but two are better."