Why do we write and store code in text files?

Simple text formats are a wonderful thing. No, seriously. Take the humble txt file: pure beauty! Every device, down to a clothes iron, has a text editor, so you can open the file, read it, and write to it. Any self-respecting programming language gives you tools for working with text files out of the box. Or take the early network protocols: SMTP, POP3, HTTP 1.0. They are such bliss that tears well up in my eyes. You can take telnet, connect to a server, type commands, and read the answers! Writing clients, servers, sniffers, and proxies is a pleasure.

But time does not stand still, and the programmer's convenience, unfortunately (or fortunately!), has long ceased to be the main criterion for choosing a technology. Beautiful text files needed graphics, pictures, and links, and so we got the pdf and doc formats. Network protocols fared even worse: for some reason people wanted traffic compression, encryption, multiplexing, and other scary things. So now we have HTTP/2, which you cannot really work with over telnet. Even dear old REST is not safe: assorted villains like Google keep trying to replace it with some gRPC. And I have not even mentioned how few plain text files get saved nowadays: for some reason everyone uses some database, completely unreadable in a text editor, but one that ~~by some kind of magic~~ lets you structure, index, and search data efficiently, and so on.
And here we come to the topic I would like to discuss. As you can see, data storage formats, protocols, and the like have come a long way in search of their optimal form. Yet we programmers continue to write the code of our programs in text files, just as our grandfathers did 50 years ago (or is it more?). Whatever OS we use, whatever trendy language we choose, however cool our IDE is, the result of writing code is letters in a text file. In theory that is completely unnecessary, because the computer needs our letters like a hare needs a brake light. It needs machine instructions to execute; it is simply easier for us to describe them with letters. But a "loop" or a "function" remains what it is no matter what you call it or where you keep it. "A rose smells like a rose, whether you call it a rose or not."

Yes, we are used to writing code in text files. You open a file and, well, you seem to see the code of your program. Some part of it, rather. You see it, after a fashion. And you understand it. Or think you understand it. But you also open, see, and understand a post on Facebook, and Facebook hardly stores that post in a text file.

Let's look at the problems that storing code as a set of text lines creates.

All this is terribly inconvenient for our tools to process.
All the code we write will later be read by a whole bunch of programs: the IDE, its plugins, the compiler, the debugger, static analysis tools, version control, the build server, and so on. All of them just adore text formats (actually, no). Each of them will read the text, parse it (so you can already see the duplicated work of dozens of development teams for every programming language), and waste time doing so. And sometimes they will make mistakes, or make no mistakes and still understand your code slightly differently. It is no secret that an IDE and one of its plugins can offer different auto-completion and different refactorings for the same code. How nice it would be if a single tool turned the letters you wrote into some AST-like representation once (perhaps an extended one, with metadata) and saved it all in a database, which we will call the "code base of the project". Every other tool would then query this database through well-defined protocols and get stable, standardized, predictable results. The closest thing to this ideal that I know of today is a Google project called Kythe. It is still far from ideal, but the direction is right.
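
To make the idea of a shared "code base of the project" a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions only: the `functions` table, the helpers `build_code_base`, `complete`, and `locate`, and the sample source are all invented for this example and say nothing about how Kythe actually works. One step parses the text once; two pretend tools then ask the database instead of re-parsing the text.

```python
import ast
import sqlite3

# Sample source, invented for this sketch.
SOURCE = '''
def add(a, b):
    return a + b

def scale(values, factor, offset, clamp):
    return [v * factor + offset for v in values]
'''

def build_code_base(source):
    """Parse the text once and store one row per function (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE functions (name TEXT, line INTEGER, param_count INTEGER)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            db.execute(
                "INSERT INTO functions VALUES (?, ?, ?)",
                (node.name, node.lineno, len(node.args.args)),
            )
    db.commit()
    return db

def complete(db, prefix):
    """Pretend 'auto-completion' tool: function names starting with prefix."""
    rows = db.execute(
        "SELECT name FROM functions WHERE name LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return [name for (name,) in rows]

def locate(db, name):
    """Pretend 'go to definition' tool: line number of a function, or None."""
    row = db.execute("SELECT line FROM functions WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

db = build_code_base(SOURCE)
print(complete(db, "s"))   # names that start with "s"
print(locate(db, "add"))   # line where "add" is defined
```

Both tools read the same rows, so they cannot disagree about what the code contains. A real system would of course need far richer data than three columns.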

All this is terribly inconvenient for people to read and write.
“What could be clearer than text?!” you ask. Plenty of things. Text is a good way to present information, but not the only one, and not always the best. Look at how interest in books has dropped. That is because we take in many other forms of information more vividly and more quickly: audio and video, for example. Or infographics. Or even text, as long as it is not a solid wall but structured, sorted, grouped, and filtered. Yes, we can make our code look like that. If we set ourselves that goal, we sit down and do it by hand. And we do, because we have to; there are no other options. But how cool it would be to get all of that for free, automatically. To be able to see the code the way you want to see it. To write a SELECT over the code and see only the artifacts that match the query. Or to display the data flow between two nodes as a diagram. Or as an animation. To search for a function not by name but by a recollection like “it had more than three parameters and it changed the second one”. Or by the fact that it had three loops. Or that “it did something with time”. Do we have such tools today? No, we don't! Sit down and write regular expressions like a fool.
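
A search "by recollection" of the kind described above can be sketched with Python's standard `ast` module. This is a rough illustration, not a real tool: the sample functions and the helper `reassigns_second_param` are invented here, and "changed the second parameter" is interpreted narrowly as "assigns a new value to that name"; in-place mutation such as `size.append(...)` would not be detected.

```python
import ast

# Sample functions, invented for this sketch.
SOURCE = '''
def resize(image, size, mode, quality):
    size = (size[0] // 2, size[1] // 2)
    return image, size, mode, quality

def log(message, level, sink, prefix):
    sink.write(prefix + level + message)

def area(w, h):
    return w * h
'''

def reassigns_second_param(func):
    """True if func has more than three parameters and assigns to the second one."""
    params = [a.arg for a in func.args.args]
    if len(params) <= 3:
        return False
    second = params[1]
    return any(
        isinstance(n, ast.Name) and n.id == second and isinstance(n.ctx, ast.Store)
        for n in ast.walk(func)
    )

matches = [
    node.name
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.FunctionDef) and reassigns_second_param(node)
]
print(matches)
```

Only `resize` matches: `log` has four parameters but never reassigns the second one, and `area` has too few parameters. A regular expression over the text would have a hard time expressing the same condition.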

Writing code is even worse than reading it. Some programmers learn touch typing, some puzzle over Vim hotkeys, some poke the mouse at the menus of their favorite IDE. The fact that we have to do any of this is a horror! Creating algorithms is an art, it is magic, it is an act of creation. And yet we pick up our stone axes and set off to build a starship with them. Today we have nothing but these axes. Every day new ones appear, with a substantially different handle shape and a different way of attaching the stone, and not one of them brings us an iota closer to the tools a starship should be built with.

Code formatting styles
Do you remember the "tabs versus spaces" holy wars on Habr? Those were some battles. "Code formatting style" is a concept that for some reason exists, although in principle it should not. Programmers choose a style for their project, try to follow it (and spend time on that), and sometimes even reproach each other when someone inadvertently breaks it. What kind of nonsense is that? Imagine an ideal world where code is stored in a binary file as a coherent set of entities representing classes, methods, loops, variables, and conditions. In that world every programmer can configure their own viewer to draw the code with tabs, with spaces, or with kitten icons. All the "which line does the curly brace go on" and "do we need a space after the comma" debates would simply be a thing of the past. On saving to the database, code would be converted from the style of a given programmer into plain code with no formatting at all. Another programmer would then load it into their IDE in whatever form they find convenient.
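
A small Python sketch shows why formatting is only a view onto the structure. Assumptions: the two sample snippets are invented, and `ast.unparse` (Python 3.9+) stands in for a "viewer"; it renders in one fixed style, whereas the configurable per-programmer viewer imagined above does not exist here.

```python
import ast

# The same function typed in two different styles (invented samples).
TABS_STYLE = "def area(w,h):\n\treturn w*h\n"
SPACES_STYLE = "def area( w, h ):\n    return w * h\n"

tree_a = ast.parse(TABS_STYLE)
tree_b = ast.parse(SPACES_STYLE)

# ast.dump omits line and column positions by default,
# so only the structure of the code is compared.
same_structure = ast.dump(tree_a) == ast.dump(tree_b)
print(same_structure)

# A stand-in "viewer": render the stored structure back to text.
rendered = ast.unparse(tree_a)
print(rendered)
```

The two snippets differ as text but are identical as structure, so a store that kept only the tree would have nothing to argue about.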

Metadata in the code
A post recently appeared on Habr arguing that annotations have no place in Java code. Of course they have no place there, because they are not code. They are metadata. We write them in the code because there is nowhere else to put them, except perhaps a separate XML file next to it, which is even worse. Annotations in Java, decorators in Python, attributes in C#, method comments in any language, and so on should all be metadata. We should have convenient tools for searching this metadata, switching it on and off quickly, and refactoring it. Of course, it should still be possible to display it "the way it is now, in the code". But "stitching" it into the code is like getting a tattoo on your chest on the spot: too drastic, with consequences that are too big.

Aspect-oriented programming
Everyone who reads about aspect-oriented programming for the first time is struck by the beauty of the idea. I like it even more than functional programming. But wait, where is it in the real world? Nowhere. Because aspect-oriented programming simply does not map onto code written as a sequence of text lines. Attempts to write anything in the aspect style either lead us into the wilds of reflection or require massive changes that run through the entire code base and cannot be automated.

Now imagine that our code is stored in an efficient structured format, and that we have a tool for working with it. Need to log all input parameters of all methods of all classes? Easy: one command to change the attributes of all methods (something like "SET logging = true WHERE item = 'method'") plus one line of code that defines the log format. It can be removed just as easily, in two clicks. What else do we need to do in every function? Profiling? Authentication and access checks? Contract checks? Why not, when all our classes and methods are right there and adding all of this DOES NOT CHANGE THEIR CODE. We add it, test it, possibly remove it, or perhaps keep several sets of aspects (one for the debugging environment, another for production).
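
For comparison, here is roughly the closest a text-based language gets to this today, sketched in Python. It is not the stored-attribute approach imagined above: it wraps methods at run time, which is exactly the "wilds of reflection" route. The names `with_logging`, `apply_aspect`, `call_log`, and the `Account` class are all invented for the illustration.

```python
import functools

call_log = []  # stands in for a real log sink

def with_logging(func):
    """Logging aspect: record the call, then run the original function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # args[0] is `self`, so it is left out of the log entry.
        call_log.append((func.__qualname__, args[1:], kwargs))
        return func(*args, **kwargs)
    return wrapper

def apply_aspect(cls, aspect):
    """Wrap every non-dunder method defined directly on cls.

    Simplification: static and class methods are not handled.
    """
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("__"):
            setattr(cls, name, aspect(attr))
    return cls

class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

# One call switches logging on; the body of deposit() is not edited.
apply_aspect(Account, with_logging)

acct = Account(10)
result = acct.deposit(5)
print(result)
print(call_log)
```

The method bodies stay untouched, but the aspect lives in yet another piece of text rather than in a queryable attribute of the method, so switching it on and off still means editing and re-running code.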

Large code base
Google recently said that they have 100 million lines of code (probably more by now). Probably nobody else has that much, but 1-2 million lines for a large enterprise project is already commonplace. So tell me, please, how do you navigate such a code base? How well do your favorite grep and find cope? What about features like Find References? What about refactoring tools? Text is a slow and clumsy thing. Anyone who wants to work with it quickly is forced to build some kind of index. These indexes take a long time to build, they live separately from the code itself, and every tool reinvents them. We have created a problem for ourselves and are now heroically trying to solve it. At this point I am reminded of a recent talk on structured logging that describes a similar problem: we invented writing logs in text formats, only to then invent separate programs like Logstash that parse those text formats in order to extract the information we previously wrote into them (and, of course, not everything gets written, not everything gets extracted, contexts are lost, and things drift out of sync at every step). An interesting talk in general, have a look. So for logs we have already grasped the scale of the trouble, but for the code itself, it turns out, we have not.

Long compile time
Pick any programming language and try to guess which feature its users are asking for. As in that joke about "four barrels and a sky full of parrots", you can shoot at random and still hit "it would be nice to speed up compilation / interpretation"; a fresh example is the requests sent to Yandex (why not to Sportloto?) to improve C++. It turns out that the compiler or interpreter does nothing while you are furiously hammering the keyboard (which you do most of the time), and only occasionally, woken from its lethargic sleep by a build command or a request to run a script, does it grab memory and all the CPU cores to do its work. And, just like a student who has done nothing all semester, it suddenly discovers that it has to strain and spend a lot of time and effort. So why, I ask, did you not do anything earlier? If the format for storing code involved translating the written program text into some intermediate form (something slightly more than a stream of tokens, but slightly less than finished binary code), compilation would be much faster.

Lack of cross-language development tools
Because every programming language, every framework, and every OS is a monastery with its own rules, we have absolutely no tools for conveniently coordinating work on different parts of a system. Sure, there are all sorts of IDEs. But show me a debugger that lets you walk through a process that is completely typical today: "a Python script launched a C++ program, which called a Go microservice over the network, which went to the database, where it pulled a couple of snapshots, after which everything returned to the Python script". And the debugger should give me everything: breakpoints at any level from the script down to the storage, the call stack, the whole environment. It should let me see that a parameter was passed here on the command line, serialized there into a REST request, and then came back in such and such a way. Is there such a thing? There is not. Because code is just letters. One kind of letters here, another kind there. One compiler compiled this part, another compiled that part. And connecting them somehow is difficult. But if the code over there were code (with functions, parameters, loops, conditions) and the code over here were code too, it would be possible to build bridges between them and then walk across those bridges comfortably.

So, how much longer?

Source: https://habr.com/ru/post/302146/
