In lieu of an introduction
The author of the article, Alan Keefer [1], is the chief architect of Guidewire Software [2], which develops software for the insurance industry. While still a senior developer, he worked on the Gosu [3] language. In particular, Alan dealt with compiling the language to Java bytecode.
This article was written in 2009 and focuses on the details of how try/catch/finally is implemented for JVM version 1.6. To read it, you need a basic knowledge of Java syntax and an understanding of what bytecode is for; listings of it appear below. The end of the article also contains several examples similar to the tricky questions on the SCJP exam.
JVM internals
One of the things we are currently working on, for a variety of reasons, is compiling our in-house language to Java bytecode. (For the record, I can't say when it will be done. Not even approximately. Not even whether it will make it into a future release.) The fun part is studying the internals of the JVM, and also hunting down all the nasty sharp corners of our own language. But most of the “fun” and the sharp corners come from statements such as try/catch/finally. So this time I won't wade into philosophy or Agile. Instead, I'll dive deep into the JVM, to places most people don't need (or don't want) to go.
If you had asked me two weeks ago about finally blocks, I would have assumed that the JVM handles them natively: it is a basic part of the language, so it should be built in, shouldn't it? Imagine my surprise when I learned that it is not. In fact, the finally block is simply inlined everywhere control leaves the try block or an associated catch block. Those blocks are also wrapped in a “catch (Throwable)” handler that rethrows the exception after the finally code finishes. All that remains is to adjust the exception table so that the inlined finally blocks are excluded from the ranges the handlers cover. How about that? (A small caveat: before JVM 1.6, it seems that subroutines were used for finally instead of full inlining. But here we are talking about version 1.6, to which everything above applies.)
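As a rough source-level picture of that transformation, the sketch below writes out by hand what the compiler does for a try/finally. It is only an illustration: the compiler works on bytecode rather than on Java source, and the class and method names (FinallyInlining, work, cleanup) are invented for this example. The rethrow in inlinedByHand relies on the precise-rethrow rule of Java 7 and later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these names come from the article.
class FinallyInlining {
    static final List<String> log = new ArrayList<>();

    static void work(boolean fail) {
        log.add("work");
        if (fail) {
            throw new IllegalStateException("boom");
        }
    }

    static void cleanup() {
        log.add("cleanup");
    }

    // What you write.
    static void withFinally(boolean fail) {
        try {
            work(fail);
        } finally {
            cleanup();
        }
    }

    // Roughly what the compiler emits, expressed as source.
    static void inlinedByHand(boolean fail) {
        try {
            work(fail);
        } catch (Throwable t) {
            cleanup();   // copy of the finally body for the exceptional path
            throw t;     // rethrow once the finally code has run
        }
        cleanup();       // copy of the finally body for the normal path
    }

    public static void main(String[] args) {
        withFinally(false);
        inlinedByHand(false);
        try { withFinally(true); } catch (IllegalStateException e) { log.add("caught"); }
        try { inlinedByHand(true); } catch (IllegalStateException e) { log.add("caught"); }
        System.out.println(log); // both variants log the same sequence of steps
    }
}
```

Note that both copies of cleanup() sit outside the try range in inlinedByHand, so an exception thrown by the cleanup code is not caught by the catch-all handler, which mirrors the exception-table adjustment described above.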
To see whether this approach makes sense, let's step back a little and look at how the JVM handles exceptions. Exception handling is built into the JVM in the form of try/catch block declarations attached to each method. All you have to do is say “between point A and point B, any exception of type E must be handled by the code at point C”. You can have as many such declarations as you need. When an exception is thrown inside the method, the JVM finds the matching catch block based on the exception's type.
A simple try/catch example
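The lookup just described can be modeled in a few lines of Java. This is a toy model, not the JVM's real data structures: the class ExceptionTableModel, its fields, and the numeric code positions are all made up for illustration. It assumes what the JVM specification describes: each entry covers a range with an inclusive start and an exclusive end, an entry without a type matches any exception, and entries are searched in order with the first match winning.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a per-method exception table (hypothetical names throughout).
class ExceptionTableModel {
    static final class Entry {
        final int startPc;    // "point A", inclusive
        final int endPc;      // "point B", exclusive
        final int handlerPc;  // "point C"
        final Class<? extends Throwable> catchType; // null means "any type"

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
        entries.add(new Entry(startPc, endPc, handlerPc, catchType));
    }

    // Returns the handler position, or -1 if the exception leaves the method.
    int findHandler(int pc, Throwable thrown) {
        for (Entry e : entries) {
            boolean inRange = pc >= e.startPc && pc < e.endPc;
            boolean typeMatches = e.catchType == null || e.catchType.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.handlerPc; // first matching entry wins
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ExceptionTableModel table = new ExceptionTableModel();
        table.add(0, 10, 20, RuntimeException.class);
        table.add(0, 10, 30, Exception.class);
        System.out.println(table.findHandler(5, new IllegalArgumentException())); // prints 20
        System.out.println(table.findHandler(5, new java.io.IOException()));      // prints 30
        System.out.println(table.findHandler(15, new RuntimeException()));        // prints -1
    }
}
```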
Consider a simple example:
    public void simpleTryCatch() {
        try {
            callSomeMethod();
        } catch (RuntimeException e) {
            handleException(e);
        }
    }
For this method you end up with the bytecode below. (I am using the format produced by the ASM plugin for Eclipse, an invaluable tool for studying how the JVM works. I find code in this format quite easy to read. “L0” and so on are code labels.)
    public simpleTryCatch()V
      TRYCATCHBLOCK L0 L1 L2 java/lang/RuntimeException
     L0
      ALOAD 0
      INVOKEVIRTUAL test/SimpleTryCatch.callSomeMethod()V
     L1
      GOTO L3
     L2
      ASTORE 1
      ALOAD 0
      ALOAD 1
      INVOKEVIRTUAL test/SimpleTryCatch.handleException(Ljava/lang/RuntimeException;)V
     L3
      RETURN
So, the exception-table entry covers the whole try block (but not the GOTO instruction at the end) and says that a RuntimeException transfers control to L2. If the try block completes normally, we jump over the catch block and continue. If the RuntimeException handler is invoked, the exception is on top of the stack, and we store it in a local variable. Then we load “this” and the exception, in that order, to call handleException. After that, execution falls through to the remaining code. If there were an additional catch block, however, we would jump over it.
A try/catch/finally example
Now let's add a finally block and an additional catch block and see what happens to the bytecode. Take the following completely contrived example:
    public void tryCatchFinally(boolean arg) {
        try {
            callSomeMethod();
            if (arg) {
                return;
            }
            callSomeMethod();
        } catch (RuntimeException e) {
            handleException(e);
        } catch (Exception e) {
            return;
        } finally {
            callFinallyMethod();
        }
    }
In this case we get much less readable bytecode:
    public tryCatchFinally(Z)V
      TRYCATCHBLOCK L0 L1 L2 java/lang/RuntimeException
      TRYCATCHBLOCK L3 L4 L2 java/lang/RuntimeException
      TRYCATCHBLOCK L0 L1 L5 java/lang/Exception
      TRYCATCHBLOCK L3 L4 L5 java/lang/Exception
      TRYCATCHBLOCK L0 L1 L6
      TRYCATCHBLOCK L3 L7 L6
      TRYCATCHBLOCK L5 L8 L6
     L0
      ALOAD 0
      INVOKEVIRTUAL test/SimpleTryCatch.callSomeMethod()V
     L9
      ILOAD 1
      IFEQ L3
     L1
      ALOAD 0
      INVOKEVIRTUAL test/SimpleTryCatch.callFinallyMethod()V
     L10
      RETURN
     L3
      ALOAD 0
      INVOKEVIRTUAL test/SimpleTryCatch.callSomeMethod()V
     L4
      GOTO L11
     L2
      ASTORE 2
     L12
      ALOAD 0
      ALOAD 2
      INVOKEVIRTUAL test/SimpleTryCatch.handleException(Ljava/lang/RuntimeException;)V
     L7
      ALOAD 0
      INVOKEVIRTUAL test/SimpleTryCatch.callFinallyMethod()V
      GOTO L13
     L5
      ASTORE 2
     L8
      ALOAD 0
      INVOKEVIRTUAL test/SimpleTryCatch.callFinallyMethod()V
      RETURN
     L6
      ASTORE 3
      ALOAD 0
      INVOKEVIRTUAL test/SimpleTryCatch.callFinallyMethod()V
      ALOAD 3
      ATHROW
     L11
      ALOAD 0
      INVOKEVIRTUAL test/SimpleTryCatch.callFinallyMethod()V
     L13
      RETURN
So what is going on here? (Note that labels are numbered in the order the compiler creates them, not in the order they appear in the code.) The first thing to notice is that both exception handlers are now split into two ranges: from L0 to L1 and from L3 to L4. That is because a copy of the finally block was inserted between L1 and L3 for the return statement.
Because exceptions thrown from a finally block must not be handled by catch blocks attached to the same try statement, that range was removed from the exception table. The entries without an exception type belong to the finally block. They must handle exceptions of any type thrown from the try block or from the catch blocks, and they must skip every inlined copy of the finally block, so that the finally code never catches exceptions it throws itself. There are three such entries because, in addition to the copy of finally inlined inside the try block, the catch (Exception) block also contains a return statement.
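The rule that a finally block's own exceptions bypass the sibling catch blocks can be observed directly from Java source. The following is a minimal sketch with invented names (FinallyThrows, run): the inner finally throws an IllegalStateException, which is a RuntimeException, yet the catch (RuntimeException) on the same try never sees it.

```java
// Hypothetical demo class; the names are illustrative only.
class FinallyThrows {
    @SuppressWarnings("finally")
    static String run() {
        try {
            try {
                return "try completed";
            } catch (RuntimeException e) {
                // Sibling catch: not reached for the exception thrown by finally.
                return "sibling catch";
            } finally {
                throw new IllegalStateException("from finally");
            }
        } catch (IllegalStateException e) {
            // Only an enclosing try statement can handle it.
            return "outer catch: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "outer catch: from finally"
    }
}
```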
You may also be surprised to see that the finally block appears in the code five times. The first inlined copy, for the return statement in the try block, sits between L1 and L3. The second is a bit more subtle: it is inlined at the end of the first catch block (at L7), which then jumps over the remaining finally code to L13. (Personally, I think it should have jumped to the final copy instead of inlining yet another one.) The third copy appears between L8 and L6, before the return statement in the second catch block. The fourth copy, between L6 and L11, handles the exceptional case: the finally block must be guaranteed to run when an unhandled exception is thrown in the try block or in any catch block. Note that the exception is stored in a local variable, the finally code is called, and then the exception is loaded again and rethrown. The last copy, at L11, is where control arrives from the normal end of the try block.
If we had nested try/catch or try/finally blocks, things would get even stranger. A return statement in an inner try block requires the finally blocks of both the inner and the outer try to be inlined before it. The exception table must be set up so that an exception thrown by the inner finally is caught by the outer catch and finally blocks, while an exception thrown by the outer finally is caught by no one. By now you are probably trying to picture how much state the compiler has to carry around in order to know what to inline where and how to fill in the exception table.
I would be curious to know how the JVM's creators came to the decision to push the finally statement into the compiler instead of building it into the virtual machine. Doing this work in the compiler obviously simplifies the virtual machine considerably, but it makes life somewhat harder for people like us who are building another language for the JVM.
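The nested behavior can be checked from Java source without reading any bytecode. The sketch below uses invented names (NestedFinally, events) and records the order in which blocks run: a return from the inner try runs the inner finally and then the outer one, and an exception thrown by the inner finally is handled by the outer catch and outer finally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class NestedFinally {
    static final List<String> events = new ArrayList<>();

    static String returnFromInner() {
        try {
            try {
                return "inner";
            } finally {
                events.add("inner finally");
            }
        } finally {
            events.add("outer finally");
        }
    }

    @SuppressWarnings("finally")
    static void innerFinallyThrows() {
        try {
            try {
                events.add("inner try");
            } finally {
                throw new IllegalStateException("from inner finally");
            }
        } catch (IllegalStateException e) {
            events.add("outer catch");
        } finally {
            events.add("outer finally");
        }
    }

    public static void main(String[] args) {
        System.out.println(returnFromInner() + " " + events);
        // prints "inner [inner finally, outer finally]"
        events.clear();
        innerFinallyThrows();
        System.out.println(events);
        // prints "[inner try, outer catch, outer finally]"
    }
}
```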
Non-standard examples
Knowing how the compiler implements this makes some unusual cases easier to understand. For example:
    try {
        return "foo";
    } finally {
        return "bar";
    }
The result will be “bar”, because the finally block is inlined before the return instruction, which means the return from the finally block executes first and the return from the try block never executes at all.
    String value = "foo";
    try {
        return value;
    } finally {
        value = "bar";
    }
The result will be “foo”, because the value for the return statement is evaluated and saved before the finally code runs, after which it is restored and returned. (My example does not show this, but it is exactly what you will see if you look at the bytecode.) So changing “value” in the finally block has no effect on what is returned. And finally, something like:
    while (true) {
        try {
            return "foo";
        } finally {
            break;
        }
    }
    return "bar";
The result will be “bar”. This surprised even me, but it all makes sense once you know that a break statement is just a GOTO in bytecode. That is, when the finally block is inlined as part of the inner return statement, the GOTO executes before the RETURN instruction and exits the loop. (The same goes for a continue statement inside a finally block.)
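To try the three snippets above yourself, they can be wrapped in methods as in the sketch below. The class and method names (FinallyPuzzles and so on) are invented for this purpose; the bodies are the snippets from the text unchanged. The compiler may warn that the finally clause cannot complete normally, which is why the annotation is there.

```java
// Hypothetical wrapper class for the three examples; names are illustrative.
class FinallyPuzzles {
    @SuppressWarnings("finally")
    static String returnInFinally() {
        try {
            return "foo";
        } finally {
            return "bar"; // runs before the try block's return can complete
        }
    }

    static String assignInFinally() {
        String value = "foo";
        try {
            return value; // the value to return is fixed at this point
        } finally {
            value = "bar"; // too late to change the returned value
        }
    }

    @SuppressWarnings("finally")
    static String breakInFinally() {
        while (true) {
            try {
                return "foo";
            } finally {
                break; // leaves the loop and discards the pending return
            }
        }
        return "bar";
    }

    public static void main(String[] args) {
        System.out.println(returnInFinally()); // prints "bar"
        System.out.println(assignInFinally()); // prints "foo"
        System.out.println(breakInFinally());  // prints "bar"
    }
}
```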
Conclusion
For our part, we decided to prohibit return, break, and continue statements inside finally blocks because of their unclear semantics, as C# does. (And I feel we are in good company with those who made the same decision.)
If anyone found this article useful, I plan to write a few more notes about other interesting things we ran into while generating bytecode. They will cover both the JVM itself and various mismatches between our language and Java, in areas such as closures, enhancements (extension functions), and generics.
Glossary
routine = sub-routine
statement = statement
substitution = inlining
exception = exception
external function = enhancement = extension function = mixin
Links
[1] devblog.guidewire.com/author/akeefer
[2] www.guidewire.com
[3] gosu-lang.org