
On the question of identifiers

We are quietly developing a language here. Besides a huge number of syntactic and semantic questions, we also have to settle what could be called interface questions: how sexy the code looks, how quickly a person gets into writing it, and so on. One of these is the question of which characters the programmer may use in identifiers, and whether identifiers should be case sensitive. The question is non-trivial, and here is why:


A short interscriptum :). I have almost always written in the this_is_the_variable style, and if I had not seen the Plan9 code I would have had no questions: we would simply have made identifiers "like in C". But I did read Plan9, and I was sharply puzzled to find that the Plan9 source code is much easier for me to understand than the Linux source code. And this is despite the fact that in Plan9 names usually look like this: wrblock , lzput , hufftabinit , quotefmtinstall , while in Linux they look like this: spin_lock_irqsave , rt_mutex_adjust_prio_chain , dma_chan_busy , seq_puts . Why is that? Trying to explain it to myself produced a few thoughts that, I dare to hope, will be useful to someone.

As you know, there are several popular lexical variable naming schemes:
this_is_the_var
thisIsTheVar
thisisthevar

Which of them is best for reading code is an open question. The standard point of view is that this_is_the_var is the best option, because you can immediately make out the words the identifier is composed of. But whether that is good or bad is a moot point. Because…

First, should we try to express the meaning of an identifier by describing the process it abstracts? For example, everyone knows that printf is printf , and no one really thinks about what it actually is: print_values_with_formatting_on_standard_output . Likewise, everyone knows that stdout is stdout . Does it make sense to put the meaning of an identifier into its name, or is the program text easier to read and write when the meaning of the identifier follows from the text itself? And if the latter is true, do long names actually get in the way of understanding the text? After all, with this_is_the_variable the programmer has to work on two levels: evaluate the meaning of the phrase that makes up the identifier, and evaluate how the identifier itself relates to the rest of the program. As examples:

 while ((current_character = getc (stdin)) != EOF)
 {
	 do_something ();
 }

and
 while ((c = getc (stdin)) != EOF)
 {
	 do_something ();
 }


The examples are simple, but in the first case you must first read current_character , understand that it is the current character, then connect this understanding with how getc works, and then, every time current_character appears in the text, rebuild that mental construction in your head (this is by no means a scientific fact, just my non-expert hypothesis). In the second example this does not happen. The meaning of c is "hieroglyphic": it is put into the identifier not by appeal to an external language but right here, in the text of the program (in its image, one could say, hence "hieroglyphic"). Is that helpful? I personally do not know, but it is quite likely worth thinking about (?).

Secondly, and this complements the previous point, identifiers in the this_is_the_variable style simply knock the brain out of perceiving the identifier as a whole. Browsing GitHub, for example, I sometimes spend a relatively long time reading a line syllable by syllable, trying to figure out where the variable declaration begins. Or compare lpfnWndProc with window_event_handling_procedure_ptr : which of the two is perceived as a whole?

Thirdly, long identifiers that describe something in detail physically enlarge the area of text that has to be analyzed in order to understand what has been written.

All this leads to the question: is it necessary to allow underscores in identifiers at all, thereby encouraging programmers to be verbose and to write long multi-word names?
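To make the question concrete, here is a minimal sketch of what an identifier rule without underscores could look like. The function name is_identifier and the exact rule (a letter followed by letters or digits) are my assumptions for illustration, not the rule actually adopted in the language.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rule: an identifier is a letter followed by letters or
   digits. Underscores are deliberately rejected. */
bool is_identifier(const char *name)
{
    assert(name != NULL);

    /* The first character must be a letter; this also rejects "". */
    if (!isalpha((unsigned char)name[0]))
        return false;

    /* Every following character must be a letter or a digit. */
    for (size_t i = 1; name[i] != '\0'; i++) {
        if (!isalnum((unsigned char)name[i]))
            return false;
    }
    return true;
}
```

Under this rule wrblock and lpfnWndProc are accepted, while spin_lock_irqsave is rejected.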

Another question: should identifiers be case sensitive? The generally accepted answer is yes, they should. But here, too, there is room for doubt. Case insensitivity gives programmers more freedom in how they work with each other's code: one finds it more comfortable to write lpfnWndProc , another lpfnwndproc , and a third uses different spellings to mark different kinds of loops, for example foR for a traversal of a list and FOR for an iterative search for a numerical solution.

A small digression: numbers are, after all, algorithms, and this view is much more natural than numbers as coordinates or numbers as values.

While going through various disputes and arguments on this topic, I came across the remark that case-insensitive identifiers without underscores are a bad idea, because there is a huge number of libraries written in C (or assembler) for which case sensitivity and underscores matter. But in our language it will be possible to create identifiers like these: ID. “ ID.' ? !' ID.' ? !' , and they can be used to talk to libraries in C and in any other case-sensitive language. So is it worth giving ordinary variables underscores and case sensitivity? And would the absence of these features not lead to better code, and to a better understanding of it later?

That is how the text turned out. Thank you for your attention.

P.S. Here is what Rob Pike, one of the authors of Plan9, wrote about the rules of programming in C: www.lysator.liu.se/c/pikestyle.html

Source: https://habr.com/ru/post/74473/

