Everything you need to know about the semicolon

Automatic semicolon insertion is one of the most controversial features of JavaScript, and a lot of misunderstanding has accumulated around it.

Some programmers put a semicolon at the end of every statement; others use them only where strictly necessary. Most fall somewhere in between, and some add extra semicolons purely for stylistic reasons.

Even if you always end each statement with a semicolon, some constructs are parsed in non-obvious ways. Whatever your preferences about semicolons, you need to know these parsing rules to use the language professionally. Once you remember the few simple rules below, you will understand how any program is parsed and will be an expert on semicolons in JavaScript.

Where semicolons are allowed

In the formal grammar given in the ECMAScript specification, semicolons appear at the end of every kind of statement in which they can occur. Here is the do-while statement:

do Statement while (Expression);

Semicolons also appear in the grammar at the end of var statements, expression statements (such as “4 + 4;” or “f();”), continue, return, break, and throw statements, and debugger statements.

An empty statement is a single semicolon, and it is a valid statement in JavaScript. For this reason “;;;” is a valid program: it is parsed as three empty statements and does nothing three times.

Empty statements are sometimes useful, at least syntactically. For example, an infinite loop can be written as “while (1);”, where the semicolon is parsed as an empty statement that makes the while statement syntactically valid. Without the semicolon, the while statement would be incomplete, because a statement is required after the loop condition.
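A minimal runnable sketch of these points, using a made-up counter variable (countdown does not come from the text) and a loop that terminates instead of spinning forever:

```javascript
// The lone ";" after the condition is an empty statement that serves
// as the loop body; all the work happens in the condition itself.
let countdown = 3;
while (countdown-- > 0);

// Three more empty statements: valid, and they do nothing.
;;;

console.log(countdown); // -1
```

The loop condition is evaluated four times (for 3, 2, 1, and 0), and each evaluation decrements the counter, which is why it ends at -1.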

Finally, semicolons appear in for loops of the form “for (Expression; Expression; Expression) Statement” and, of course, can occur inside string and regular expression literals.

Where the semicolon can be skipped

In the formal grammar of the ECMAScript specification, semicolons appear as described above. However, the specification also provides rules describing how actual parsing differs from the formal grammar. The rules are expressed in terms of imaginary semicolons inserted into the input stream, but this is only a specification device; in practice, parsers do not need to generate fake semicolons and can instead treat semicolons as optional at certain places in the grammar (see, for example, a parsing expression grammar for ECMAScript, in particular the Statement, EOS, EOSnoLB, and SnoLB rules). Wherever the specification says that a semicolon is inserted, it simply means that the current statement ends.

The rules of automatic semicolon insertion are given in section 7.9 of ECMA-262.

The section gives three rules and two exceptions to them.

The rules are as follows:

First, when the parser encounters a token that the grammar does not allow, a semicolon is inserted before it if (a) a line break separates it from the previous token, or (b) the token is a closing brace. Second, when the end of the input is reached and the program cannot otherwise be parsed as complete, a semicolon is inserted at the end. Third, when a “restricted production” is encountered and a line terminator appears where the grammar says “[no LineTerminator here]”, a semicolon is inserted there.

In other words, these rules say that a statement can end without a semicolon (a) before a closing brace, (b) at the end of the program, or (c) before a line break when the next token cannot continue the statement; in addition, there are certain places in the grammar where a line break ends the statement unconditionally. What this means in practice is discussed below.

There are two exceptions: a semicolon is never inserted in the header of a loop of the form “for (Expression; Expression; Expression) Statement”, and a semicolon is never inserted if it would be parsed as an empty statement.

What does all this mean to us?

First of all, a semicolon is optional only at the end of a line, before a closing brace, and at the end of the program. Moreover, a semicolon is not implied at the end of a line if the first token of the next line can be parsed as part of the same statement.

“42; "hello!"” is a valid program, as is “42\n"hello!"” (where “\n” represents an actual line break), but “42 "hello!"” is not: a line break triggers semicolon insertion, but a space does not. “if (x) {y()}” is also valid. Here “y()” is an expression statement that could end with a semicolon, but because the next token is a closing brace, the semicolon is optional even though there is no line break.
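These claims can be checked mechanically. The sketch below assumes an engine that permits new Function(), and uses a hypothetical helper named parses (not part of any library) that compiles a snippet as a function body without running it. The last two snippets exercise the two exceptions stated earlier.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
// new Function() compiles the code but does not execute it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const results = {
  explicitSemicolon: parses('42; "hello!"'),            // true
  lineBreak: parses('42\n"hello!"'),                    // true: line break, semicolon inserted
  spaceOnly: parses('42 "hello!"'),                     // false: a space inserts nothing
  beforeClosingBrace: parses('if (x) { y() }'),         // true: "}" allows the omission
  forHeader: parses('for (a = 0\n a < 3\n a++) ;'),     // false: never inserted in a for header
  emptyStatement: parses('if (a > b)\nelse c = d'),     // false: would create an empty statement
};

console.log(results);
```

Because the snippets are only compiled, undeclared names such as x and y cause no errors here.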

Both exceptions, for loops and the empty statement, can be illustrated together:

  for (node = getNode();
       node.parent;
       node = node.parent);

The loop repeatedly moves to the parent node until it reaches a node that has no parent. All of this happens in the loop header, so nothing is left for the loop body. The for syntax still requires a statement, however, so we supply an empty statement. Even though all three semicolons in this example are at the ends of lines, all three are required, because semicolons are never inserted in for loop headers or to create an empty statement.
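To try the loop out, here is a sketch with a hypothetical three-node chain standing in for whatever getNode() would return in real code; the node objects and their names are assumptions made for illustration.

```javascript
// Hypothetical data: leaf -> middle -> root, where root has no parent.
const root = { name: 'root', parent: null };
const middle = { name: 'middle', parent: root };
const leaf = { name: 'leaf', parent: middle };

// Stand-in for the getNode() of the example above.
function getNode() {
  return leaf;
}

let node;
for (node = getNode();
     node.parent;
     node = node.parent);

console.log(node.name); // root
```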

Restricted productions

Restricted productions are those in which a line break cannot appear at a particular position: if a line break appears there, the program is not parsed that way, although it may still be parsed some other way.

There are five restricted productions in the grammar: the postfix operators ++ and --, and the continue, break, return, and throw statements. The break and continue statements can take an optional identifier naming the labelled loop they apply to. When this feature is used, the identifier must be on the same line as the keyword. This is a valid program:

 var c, i, l, quitchars
 quitchars = ['q', 'Q']
 charloop: while (c = getc()) {
     for (i = 0; i < quitchars.length; i++) {
         if (c == quitchars[i]) break charloop
     }
     /* ... code for other characters ... */
 }

If getc() returns a character from the input stream, the program reads it, checks whether it is a quit character, and if so exits the loop. The labelled break statement is needed to break out of the outer loop, not just the inner one. The same program, differing only by a line break, does not give the same result:

 var c, i, l, quitchars
 quitchars = ['q', 'Q']
 charloop: while (c = getc()) {
     for (i = 0; i < quitchars.length; i++) {
         if (c == quitchars[i]) break
             charloop
     }
     /* ... code for other characters ... */
 }

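In this case, the charloop token is not part of the break statement. Because break is a restricted production, the line break at this position ends the statement. The charloop token is parsed as a separate expression statement that simply references a variable named charloop, and the break statement exits only the inner loop, not the outer one as intended. Note that this stray statement follows the if statement rather than belonging to it, so it is evaluated whenever the condition is false, and it throws a ReferenceError if no such variable exists.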
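The difference can be observed at run time. The sketch below is an adaptation, not the code above: it iterates over a string instead of calling getc(), counts the characters consumed, and declares a placeholder variable named charloop so that the stray expression statement does not throw. All function names are made up for illustration.

```javascript
const quitchars = ['q', 'Q'];

// "break charloop" on one line: leaves the outer loop at the first quit character.
function countWithLabel(input) {
  let consumed = 0;
  charloop: for (const c of input) {
    consumed++;
    for (let i = 0; i < quitchars.length; i++) {
      if (c == quitchars[i]) break charloop;
    }
  }
  return consumed;
}

// Line break after "break": the statement ends there and leaves only the
// inner loop. The next line is a separate expression statement that reads
// a variable named charloop, declared here only so that it does not throw.
function countWithSplitBreak(input) {
  const charloop = null; // hypothetical placeholder variable
  let consumed = 0;
  charloop: for (const c of input) {
    consumed++;
    for (let i = 0; i < quitchars.length; i++) {
      if (c == quitchars[i]) break
          charloop
    }
  }
  return consumed;
}

console.log(countWithLabel('abqcd'));      // 3
console.log(countWithSplitBreak('abqcd')); // 5
```

Labels and variables live in separate namespaces, which is why the same name can be used for both in the second function.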

Examples of the remaining four restricted productions:

 // PostfixExpression:
 // LeftHandSideExpression [no LineTerminator here] ++
 // LeftHandSideExpression [no LineTerminator here] --
 var i = 1;
 i
 ++;

This is a parse error; it is not parsed as “i++”. A line terminator cannot come between an operand and the postfix increment or decrement operator, so a “++” or “--” at the start of a line is never parsed as part of the previous line.

 i
 ++
 j

This is not an error; it is parsed as “i; ++j”. The prefix increment and decrement operators are not restricted productions, so a line break can occur between the “++” or “--” and the expression it modifies.
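A short runnable check of this parse, with both variables given an initial value of 1 (the starting values are an assumption made for the demonstration):

```javascript
let i = 1;
let j = 1;

// No semicolons on the next three lines: the parser reads them as "i; ++j;".
i
++
j

console.log(i, j); // 1 2
```

Only j changes, which confirms that the “++” attached itself to the following line rather than the preceding one.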

 // ReturnStatement: return [no LineTerminator here] Expression(opt) ;
 return
   {i: i, j: j}

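This is parsed as an empty return statement followed by a separate statement that control never reaches, so the function would return undefined. (Strictly speaking, the second line is then read as a block rather than an object literal, and with two properties, as here, it is itself a syntax error; a single property such as {i: i} would parse without complaint.) The following versions parse as intended: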

 return {
   i: i, j: j}
 
 return (
   {i: i, j: j})
 
 return {i: i
        , j: j}
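The effect on the return value can be seen in a small sketch. The function names are made up, and the object has a single property so that the split version still parses (its second line is then a block containing the labelled statement “i: i”):

```javascript
function splitReturn(i) {
  return
    { i: i } // a block with a labelled statement, never reached
}

function joinedReturn(i) {
  return {
    i: i }
}

console.log(splitReturn(1));  // undefined
console.log(joinedReturn(1)); // { i: 1 }
```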

Note that a return statement may contain line breaks within the expression, just not between the return token and the start of the expression. If you deliberately omit semicolons, the restricted return production is convenient, because it lets you write an empty return without accidentally returning the expression on the following line:

 function initialize(a) {
   // if a is already initialized, return
   if (a.initialized) return
   a.initialized = true
   /* ... initialize a ... */
 }

The continue and throw statements are similar to break and return:

 continue innerloop // right
 
 continue
     innerloop;  // wrong
 
 // ThrowStatement: throw [no LineTerminator here] Expression ;
 throw // parse error
   new MyComplexError(a, b, c, more, args);
 // Unlike return, break, and continue,
 // the expression after throw is required,
 // so the above does not parse at all.
 throw new MyComplexError(a, b, c, more, args);  // right
 throw new MyComplexError(
     a, b, c, more, args);  // also right
 // Any variation with throw and new on the same line is correct.
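The contrast between throw and continue can be checked without running the statements. This sketch assumes an engine that permits new Function() and uses a hypothetical helper named parses, which only compiles its argument as a function body:

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// A line break right after "throw" is a syntax error.
const throwSplit = parses('throw\n  new Error("x")');
// A line break inside the expression is fine.
const throwJoined = parses('throw new Error(\n  "x")');
// A split "continue" still parses, but as "continue;" followed by "innerloop;".
const continueSplit = parses('innerloop: while (true) { continue\n innerloop }');

console.log(throwSplit, throwJoined, continueSplit); // false true true
```

The split continue is arguably the more dangerous case, because nothing reports the change in meaning.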

Indentation is not significant when parsing ECMAScript programs, but the presence or absence of line breaks is. Thus any tool that processes JavaScript source code can strip leading whitespace from lines (except inside string literals!) without affecting the semantics of the program, but line breaks cannot be arbitrarily removed or replaced with spaces or semicolons. Minifiers that change the semantics of valid programs are bad, bad minifiers, and the only way to build a correct one is on top of a complete and correct parser.

Line breaks after return, break, continue, and throw, and before ++ and --, affect parsing. Because these are the only restricted productions, spaces and line breaks can be used freely everywhere else to improve the readability of a program. In particular, logical, arithmetic, and string concatenation operators, the ternary (conditional) operator, member access with a dot or brackets, function calls, while loops, for loops, switch statements, and other control structures can all be written with line breaks wherever you like.

The specification reads as follows:

Practical advice for ECMAScript programmers: a postfix “++” or “--” operator should appear on the same line as its operand. An expression in a return or throw statement should start on the same line as the return or throw token. An identifier in a break or continue statement should be on the same line as the break or continue token.

The most common mistake programmers make with restricted productions is putting the return value on the line after the return token, especially when returning a large object or array literal or a multi-line string. Mistakes with the postfix operators and with break, continue, and throw are rare in practice, because splitting lines in those places looks unnatural to most programmers.

The last subtlety of semicolon insertion follows from the first rule, which requires an offending token before a semicolon is inserted. If you omit optional semicolons, remember that some semicolons are not optional and cannot be left out. This same rule is what allows statements to span several lines:

 return obj.method('abc')
           .method('xyz')
           .method('pqr')
 
 return "long string\n"
      + "stretched\n"
      + "a few"
 
 totalArea = rect_a.height * rect_a.width
           + rect_b.height * rect_b.width
           + circ.radius * circ.radius * Math.PI

The rule looks only at the first token of the line. If that token can be parsed as part of the statement, the statement continues (even if parsing fails further on). If the first token cannot continue the statement, a new statement begins (this is where the specification says that a semicolon is inserted).

The potential for errors arises when two statements A and B are each valid on their own, but the first token of B can also be read as a continuation of A. In such cases, if there is no semicolon, the parser does not treat B as a separate statement, and either reports an error or parses the program in an unexpected way. So if semicolons are omitted, the programmer must check, for any statements A and B separated by a line break, whether B begins with a token that can be attached to the end of A.

Most statements in JavaScript begin with an identifier, and most of the rest begin with a keyword such as “var”, “function”, or “if”. For any statement B that begins with an identifier or a keyword, and likewise for any line beginning with a string literal, there is no valid statement A that B's first token could continue (the proof of this from the language grammar is left as an exercise for the reader).

 A
 function f(x) { return x * x }
 
 // for any statement A with no trailing semicolon,
 // all of these examples parse correctly
 
 A
 f(7)
 
 A
 "a string".length

Unfortunately, there are five tokens that can either begin a statement or continue a complete one: “(”, “[”, “/”, “+”, and “-”. In practice, it is the first two that cause problems.

This means that a line break cannot always take the place of a semicolon between statements.

The specification gives an example:

                    a = b + c
                    (d + e).print()

Automatic semicolon insertion does not transform this, because the parenthesized expression can be parsed as the argument list of a function call:

                    a = b + c(d + e).print()

The specification suggests that when an assignment statement must begin with a left parenthesis, it is a good idea to put an explicit semicolon at the end of the preceding statement. A more robust alternative is to put the semicolon at the beginning of the line, immediately before the token that risks introducing the ambiguity:

                    a = b + c
                    ;(d + e).print()
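Both variants can be run side by side. In this sketch the function names are made up, the operands are plain numbers passed as arguments, and toString() stands in for print(), which numbers do not have:

```javascript
// Without a semicolon, the second line continues the first:
// a = b + c(d + e).toString(), and c is not a function.
function callAfterAssignment(b, c, d, e) {
  let a;
  try {
    a = b + c
    (d + e).toString();
  } catch (err) {
    return err.name;
  }
  return a;
}

// A leading semicolon keeps the two statements apart.
function guardedCallAfterAssignment(b, c, d, e) {
  let a;
  a = b + c
  ;(d + e).toString();
  return a;
}

console.log(callAfterAssignment(1, 2, 3, 4));        // TypeError
console.log(guardedCallAfterAssignment(1, 2, 3, 4)); // 3
```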

Statements that begin with a parenthesis or a square bracket are uncommon, but they do occur.

Examples with square brackets are the more common, since “functional” operations such as map, filter, and forEach are most often used on arrays. It is often convenient to write an array literal and call forEach on it for its side effects:

 [['January', 'Jan']
 , ['February', 'Feb']
 , ['March', 'Mar']
 , ['April', 'Apr']
 , ['May', 'May']
 , ['June', 'Jun']
 , ['July', 'Jul']
 , ['August', 'Aug']
 , ['September', 'Sep']
 , ['October', 'Oct']
 , ['November', 'Nov']
 , ['December', 'Dec']
 ].forEach(function (a) { print("The abbreviation of " + a[0] + " is " + a[1] + ".") })
 
 // (a second, separate example)
 ['/script.js'
 , '/style1.css'
 , '/style2.css'
 , '/page1.html'
 ].forEach(function (uri) {
    log('Looking up and caching ' + uri)
    fetch_and_cache(uri)})

When array literals are used in assignments or passed to functions, they are not at the beginning of a statement, so a leading square bracket is uncommon, but it does occur.
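A sketch of what goes wrong when such a line follows a statement with no semicolon. The function and variable names are made up; in the first function, the array literal is read as a property access on the preceding value:

```javascript
// Parsed as: const count = 0['a', 'b'].forEach(...)
// 0['b'] is undefined, so calling forEach on it throws.
function arrayAfterAssignment() {
  const seen = [];
  try {
    const count = 0
    ['a', 'b'].forEach(function (x) { seen.push(x); });
  } catch (err) {
    return err.name;
  }
  return seen;
}

// A leading semicolon makes the array literal start a new statement.
function guardedArrayAfterAssignment() {
  const seen = [];
  const count = 0
  ;['a', 'b'].forEach(function (x) { seen.push(x); });
  return seen;
}

console.log(arrayAfterAssignment());        // TypeError
console.log(guardedArrayAfterAssignment()); // [ 'a', 'b' ]
```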

The last problematic token is the forward slash, and its behavior is quite unintuitive. Take a look:

 var i, s
 s = "here is a string"
 i = 0
 /[a-z]/g.exec(s)

In lines 1-3 we set up the variables, and on the fourth line we appear to write the regex literal “/[a-z]/g”, which matches a lowercase letter globally, and then apply this regex to the string using the exec method. Since the return value of exec() is not used, the code is not particularly useful, but we would expect it at least to compile. However, the slash not only starts regex literals but is also the division operator. This means that the leading slash on line 4 is parsed as a continuation of the assignment statement on the previous line. The lines are parsed as “i equals 0 divided by [a-z] divided by g.exec(s)”.
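To make the division visible, the sketch below changes the example in two labeled ways: it assigns 8 instead of 0, and it declares hypothetical variables a, z, and g so that the accidental expression evaluates instead of throwing a ReferenceError. The function names are also made up.

```javascript
// Parsed as: i = 8 / [a - z] / g.exec(s)
function slashAfterAssignment(s) {
  const a = 6, z = 2;           // hypothetical values: [a - z] is [4]
  const g = { exec: () => 2 };  // hypothetical stand-in with an exec method
  let i;
  i = 8
  /[a-z]/g.exec(s);
  return i;
}

// With a leading semicolon the slash starts a regex literal.
function guardedSlashAfterAssignment(s) {
  let i;
  i = 8
  ;/[a-z]/g.exec(s);
  return i;
}

console.log(slashAfterAssignment('here is a string'));        // 1
console.log(guardedSlashAfterAssignment('here is a string')); // 8
```

In the first function, 8 divided by the one-element array [4] gives 2 (the array is converted to the number 4), and 2 divided by the stand-in's return value of 2 gives 1.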

In practice this problem almost never arises, since there is rarely a reason to begin a statement with a regex literal. In the example above, the value of the exec() call would normally be passed to a function or assigned to a variable, and in either case the line would not begin with a slash. A possible exception is, again, the forEach method, which can usefully be called on the value returned by exec().

The “+” and “-” operators can be used as unary operators, to convert a value to the Number type and, in the case of “-”, to negate it. When one of them begins a line and the preceding semicolon is omitted, it can be read as the corresponding binary operator, continuing the previous statement. But this is rarely a problem, since a leading unary operator is even less common than a leading regex literal (and, besides, it does not look like a complete statement). As with regex literals, a programmer who wanted to convert a value to a number would go on to use that value, assigning it to a variable or passing it to a function, and in either case the unary operator would not be at the beginning of the line:

 var x, y, z
 x = +y;     // useful
 y = -y;     // useful
 print(-y);  // useful
 +z;         // useless

In all such cases, if you omit semicolons, it is safe practice to begin such lines with a semicolon placed directly before the opening parenthesis or bracket. The same advice applies to the less likely cases of lines starting with “+”, “-”, or a slash. That way, even if semicolons are not used everywhere, the line is protected from being parsed incorrectly, no matter how the previous line may change.

Misconceptions

Many novice JavaScript programmers are advised to put semicolons everywhere, and they believe that if they do not rely on the semicolon insertion rules, this feature of the language can be ignored. That is not the case, because of the restricted productions described above, especially the return statement. And once they learn about restricted productions, they may begin to fear line breaks and avoid them even where they would improve readability. It is best to master the rules of semicolon insertion so that you can read any code and write your own in the clearest way.

Another misconception is that bugs in browsers' JavaScript engines make it more reliable to put semicolons everywhere, and that doing so improves compatibility. This is simply not true. All existing browsers implement the specification correctly with respect to semicolon insertion, and any bugs that may have existed are long lost in the mists of the early history of the web. There is no reason to worry about browser compatibility: all browsers implement these rules as described above.

Conclusion

Should you use semicolons? That is up to you. The choice should simply be an informed one, not one based on vague fears about unknown syntactic traps or nonexistent browser bugs. If you remember these rules, you are equipped to make the right choice, and you will find it easy to read any JavaScript code.

If you decide to omit semicolons, I advise you to put them before the opening parenthesis or square bracket of any statement that begins with one, and before any statement that begins with “/”, “+”, or “-”, should you ever happen to write one.

Whether or not you use semicolons, remember the restricted productions (return, break, continue, throw, and the postfix increment and decrement operators), and feel free to break lines anywhere else for the convenience and readability of your code.

Source: https://habr.com/ru/post/111563/
