
About fundamental mistakes in the design of programming languages

I once came across an article claiming that the most expensive mistake in the design of programming languages was the decision to mark the end of a string in C with a NUL byte. One of the translations of that article is on Habr (although I think I read a different one). The article surprised me a little. First, it reads as if, back when every bit of memory was being counted, it had been possible to spend another 2-4 bytes per string on storing its size. Second, this decision has no particularly catastrophic consequences for the programmer. I can think of two mistakes it invites: allocating the wrong amount of memory for the string (forgetting the room for the NUL) and writing the string incorrectly (forgetting the NUL itself). Compilers warn about the first mistake, and using library functions helps to avoid the second. That is all the trouble there is.

A much bigger problem from the time C (and later C++) was designed seems to me to lie elsewhere: the for statement. For all its seeming harmlessness, it is a treasure trove of potential errors and problems.

Let's remember its classic application:
for (int i = 0; i < vec.size(); i++)
{...}

What could possibly go wrong here?

1. for ( int i = 0; i < vec.size(); i++)

Although the example with int usually appears on the first pages of textbooks, using int here is usually incorrect. Mostly we iterate over arrays, vectors and lists. That is, first, we need an unsigned type, and second, we need a type that matches the maximum size of the collection in use. In other words, the correct thing to write would be

 std::vector<int>::size_type 

Tell me, how often have you written that? Exactly. It looks so scary that few people have the willpower to write it everywhere. As a result, we have millions of incorrectly written loops. What is this, if not a mistake in the design of the programming language?
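
For comparison, here is a minimal sketch of a loop indexed with the container's own size_type. The function names sum_with_size_type and sum_with_size_t are made up for this illustration, and the remark about std::size_t describes common implementations rather than a guarantee.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums a vector using the index type the container
// itself declares, which is unsigned and wide enough for any size() value.
int sum_with_size_type(const std::vector<int>& vec) {
    int total = 0;
    for (std::vector<int>::size_type i = 0; i < vec.size(); i++) {
        total += vec[i];
    }
    return total;
}

// A shorter spelling often seen in practice. On common implementations
// std::vector<int>::size_type is the same type as std::size_t, but that
// is an assumption, not something this sketch checks.
int sum_with_size_t(const std::vector<int>& vec) {
    int total = 0;
    for (std::size_t i = 0; i < vec.size(); i++) {
        total += vec[i];
    }
    return total;
}
```

Both functions behave the same on the inputs one meets in practice; the difference is only in how much has to be typed to get the index type right.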

2. for (int i = 0; i < vec.size(); i++)
All programmers are taught to name variables properly. For names like “a, b, temp, var, val, abra_kadabra”, teachers rap students on the knuckles in class, and senior colleagues do the same to juniors. However, there is an exception: "Well, if it is a loop counter, you can just call it i or j." Brr. Stop! So variables must be given proper names in all cases... except these ones, where for some reason no clear name is required and a single cryptic letter will do? Why did this happen? It happened because if the programmer named the variable “currentRowIndex”, it would have to be written three times in the for loop:

 for (int currentRowIndex = 0; currentRowIndex < vec.size(); currentRowIndex++) 

As a result, the length of the line grows from 37 to 79 characters, which is inconvenient both to read and to write. So we write i. That leads to j in the nested for loop, to Wikipedia recommending the variable k for the third level of loop nesting in the Floyd-Warshall algorithm, and so on. Besides making the code plainly hard to read, this invites copy-paste errors. Just try writing a matrix multiplication and getting it right the first time, without once mixing up the variables i and j, each of which means a column in one place of the code and a row in another.

We live with this because of the poor design of the for loop.
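
To make the i/j confusion concrete, here is a sketch of a matrix multiplication whose counters are named after what they index. The names Matrix, multiply, row, col and term are chosen for this illustration only, and the function assumes non-empty rectangular matrices with matching shapes.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Multiplies a (rows x inner) matrix by an (inner x cols) matrix.
// Assumption: both matrices are non-empty, rectangular, and compatible.
Matrix multiply(const Matrix& left, const Matrix& right) {
    const std::size_t rows = left.size();
    const std::size_t inner = right.size();
    const std::size_t cols = right[0].size();
    Matrix result(rows, std::vector<int>(cols, 0));
    for (std::size_t row = 0; row < rows; row++) {
        for (std::size_t col = 0; col < cols; col++) {
            for (std::size_t term = 0; term < inner; term++) {
                // With named counters, left[row][term] and right[term][col]
                // are harder to transpose by accident than a[i][k] and b[k][j].
                result[row][col] += left[row][term] * right[term][col];
            }
        }
    }
    return result;
}
```

Even with these short names, each counter still appears three times in its loop header, which is exactly the cost described above.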

3. for (int i = 0 ; i < vec.size(); i++)
The trouble with the for loop is that, as a rule, we need to start iterating from element zero. Except when we need to start from the first, the second, a previously found one, the last, a cached one, and so on. The programmer's practiced hand habitually writes the copy-pasted = 0, and then it takes debugging and some choice words to change that habitual = 0 to the right value. You may say that for is not to blame here and that this is the programmer's carelessness? I disagree. Ask the same programmer to write the same code with do/while or while, and they will get it right the first time. In that case there is no well-worn template before their eyes: every do/while or while loop is fairly unique, so each time the programmer thinks about where the loop starts and by what criterion it stops. The design of the for loop makes this thinking seem optional, which is why it is almost always skipped.

4. for ( int i = 0 ; i < vec.size(); i++)
A convenient feature of the for loop is that the variable i is created in the scope of the loop and destroyed on leaving it. This is generally good: it sometimes saves memory or works nicely with RAII. But it does not work at all when we need to find something in the loop and stop. We can stop, but to return the index of the found element we need either an additional variable or a declaration of i before the loop. The extra variable is an unjustified expense in the cases where nothing is found. Declaring i before the loop breaks the harmony of the code: the first for section stays empty, which makes the reader study the code above, trying to work out whether this is a mistake or intended.

Perhaps this looks like nitpicking, but to me the for loop lacks a way to return the index value on an early stop. It could be some kind of post-block (like an else for a while loop) in which the last value of the counter would be available. Or a function in the spirit of GetLastError() that would return the value the variable i had when break was called.
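
The two workarounds described in this point can be sketched as follows. The function names are invented for the example, and returning vec.size() for "not found" is a convention picked here, not something the for statement provides.

```cpp
#include <cstddef>
#include <vector>

// Workaround 1: an extra variable that outlives the loop.
// Returns vec.size() when nothing matches (a convention of this sketch).
std::size_t find_with_extra_variable(const std::vector<int>& vec, int wanted) {
    std::size_t found = vec.size();
    for (std::size_t i = 0; i < vec.size(); i++) {
        if (vec[i] == wanted) {
            found = i;
            break;
        }
    }
    return found;
}

// Workaround 2: declare the counter before the loop, which leaves the
// first section of the for statement empty.
std::size_t find_with_outer_counter(const std::vector<int>& vec, int wanted) {
    std::size_t i = 0;
    for (; i < vec.size(); i++) {
        if (vec[i] == wanted) {
            break;
        }
    }
    return i;  // equals vec.size() when the loop ran to the end
}
```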

5. for (int i = 0; i < vec.size() ; i++)
Checking the condition in the second block of the for statement does not look logical, because on every iteration of the loop (except the first) the counter increment (the third block) runs first and the condition check (the second block) runs after it. The condition check sits in the second block to emphasize that on the first iteration it runs immediately after the counter i is initialized; only with this explanation does everything look more or less logical. As a result, we have a loop whose syntax is centered on its first iteration and poorly reflects what happens on all subsequent ones (of which there are usually many more). Such is the design of the for statement.
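
The execution order is easy to check with a small tracing sketch. The functions trace_for and trace_while and the letter codes are invented for this illustration: I marks initialization, C the condition check, B the body, and S the step (the increment).

```cpp
#include <string>

// Records the order in which the parts of a for statement run.
std::string trace_for(int count) {
    std::string trace;
    for (int i = (trace += 'I', 0); (trace += 'C', i < count); (trace += 'S', i++)) {
        trace += 'B';
    }
    return trace;
}

// The same loop written with while, with each part placed where it
// actually executes.
std::string trace_while(int count) {
    std::string trace;
    trace += 'I';
    int i = 0;
    while (true) {
        trace += 'C';
        if (!(i < count)) {
            break;
        }
        trace += 'B';
        trace += 'S';
        i++;
    }
    return trace;
}
```

For count equal to 2 both functions produce "ICBSCBSC": after the first check, every remaining check comes right after a step, just as described above.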

6. for (int i = 0; i < vec.size() ; i++)
"Less than"? Or "less than or equal"? Or "not equal"? Up to ".size()" or up to ".size() - 1"? Yes, the answers to these questions are easy to find, but why, tell me, should these questions come up at all? And in those rare cases when you need to write a non-standard variant, how do you let your fellow programmers know that this is not a mistake and is exactly what you meant to write?

7. for (int i = 0; i < vec .size(); i++)
This is the only place where we tell the loop which collection it is actually going to traverse. And even then we mention it only in terms of its size: this many steps need to be taken, so to speak. Meanwhile, inside the loop itself we can happily walk over the vector vec2, which, by the law of meanness, will of course have exactly the same length in the debug build and a different one in release, so we will find this bug much later than we should have.
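
A small sketch of this trap, with a made-up function sum_wrong_collection. It uses at() instead of operator[] purely so that the mismatch shows up as an exception in the example; with operator[] the out-of-range read would be undefined behavior and might go unnoticed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// The loop bound comes from vec, but the body reads vec2.
// Nothing in the for statement ties the two together.
int sum_wrong_collection(const std::vector<int>& vec, const std::vector<int>& vec2) {
    int total = 0;
    for (std::size_t i = 0; i < vec.size(); i++) {
        total += vec2.at(i);  // throws std::out_of_range if vec2 is shorter
    }
    return total;
}
```

While the two vectors happen to have the same length, the function quietly returns the sum of the wrong collection; the error only surfaces once vec2 becomes shorter.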

8. for (int i = 0; i < vec .size() ; i++)
The names people invent for the number of elements in a collection! Yes, the STL with its size() is fairly consistent, but other libraries use length(), count(), number() and totalSize(), all in various CamelCase and under_score spellings. As a result, to use the concept of “collection size” we have to give the for loop knowledge of how this particular collection is implemented. And when you swap the collection for another one, you rewrite all the for loops.
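
Within the standard library itself, the naming question can be partly sidestepped. The sketch below assumes a C++17 compiler and uses std::size from the iterator header; the wrapper count_elements is a hypothetical name. It does nothing for third-party collections that expose only length() or count().

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: asks for the element count through std::size (C++17),
// which accepts built-in arrays and any type with a size() member.
template <typename Collection>
std::size_t count_elements(const Collection& collection) {
    return std::size(collection);
}
```

With this, a loop bound written as count_elements(c) stays the same whether c is a built-in array, a std::vector or a std::string.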

9. for (int i = 0; i < vec.size(); i++ )
Here, of course, we have the eternal holy war over the prefix and postfix forms of the increment. If you want to quarrel with a colleague and spend half a day recalling the language standard and studying how modern compilers optimize the code, welcome to the good old "++i vs i++" thread. There are many places (Habr is one of them) where you can talk about it, but did the occasion for it really have to be the third block of a for statement that is used thousands of times in every single project?

10. for (;;)
Here we have another classic argument: “This is the most efficient way to write an infinite loop!” versus “It looks disgusting, while (true) is much more expressive.” More holy wars for the god of holy wars!

11. for (int i = 0; i++; i < vec.size() )
This code compiles. Some compilers issue a warning, but none reports an error. The swapped second and third blocks do not catch the eye, because everything written there is familiar: an increment, a condition check. The for statement is like a hardware connector that accepts the plug either way round, but works in only one orientation and burns out in the other.
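
What the swapped version actually does can be seen in a short sketch (function names invented for the example; a plain int limit stands in for vec.size()). The condition is now i++, whose value on the first check is 0, that is, false, so the body never runs.

```cpp
// Counts iterations when the second and third blocks are swapped.
// A compiler may warn that the result of "i < limit" is unused.
int iterations_with_swapped_blocks(int limit) {
    int iterations = 0;
    for (int i = 0; i++; i < limit) {  // condition: i++, step: i < limit
        iterations++;
    }
    return iterations;
}

// The intended loop, for comparison.
int iterations_with_correct_blocks(int limit) {
    int iterations = 0;
    for (int i = 0; i < limit; i++) {
        iterations++;
    }
    return iterations;
}
```

For a limit of 5 the correct loop runs five times and the swapped one runs zero times, without any error from the compiler.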

A significant part of the subsequent evolution of programming languages looks like an attempt to fix for. Higher-level languages (and later C++) introduced a for_each construct. Standard libraries gained algorithms for searching and modifying collections. C++ introduced type deduction with the auto keyword, mainly to get rid of the need to write the unwieldy
 std::vector<int>::iterator 
in every loop. Functional languages proposed replacing loops with recursion. Dynamic languages proposed dropping the type declaration in the first block. Everyone tried to fix the situation somehow, and yet a better design could have been made right from the start.
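
As an illustration of those later fixes in C++ itself, here is a sketch using the range-based for (C++11), std::find and auto. The function names and the -1 "not found" convention are choices made for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Range-based for: no counter, no bound, no increment to get wrong.
int sum_all(const std::vector<int>& vec) {
    int total = 0;
    for (int value : vec) {
        total += value;
    }
    return total;
}

// std::find with auto: the search loop is left to the library and the
// iterator type to the compiler. Returns -1 when nothing matches
// (a convention of this sketch).
long position_of(const std::vector<int>& vec, int wanted) {
    auto it = std::find(vec.begin(), vec.end(), wanted);
    if (it == vec.end()) {
        return -1;
    }
    return static_cast<long>(std::distance(vec.begin(), it));
}
```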

Source: https://habr.com/ru/post/310338/

