The principles in this note apply to almost any programming language and execution environment, but the emphasis is on the JVM. We consider two main approaches to program modification:
- manipulating the executable code of the program after compilation or while the code is being loaded;
- changing the source code before compilation.

The metaphor behind the image in this note: the program is the main building, and the result of the transforming program's work is an auxiliary structure.
Why modify?
We begin by asking why one program should modify another program.
Metaprogramming helps reduce the amount of boilerplate code in a project and concentrate on the main thing, improve code readability and solve problems that are difficult to solve in another way.
The simplest example in Java is JavaBeans with their get/set methods for accessing class fields. Other examples are generating builders for a class and the automatic implementation of equals/hashCode in the IDE.
The next example is logging and automatic transaction management. Everyone gets used to this when working with the Spring Framework and rarely thinks about how it is implemented. But even Spring made configuration and framework initialization difficult for beginners, which led to the emergence of the "magic" Spring Boot and Spring Roo. That is a separate topic, though, so let us return to program modification.
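To make the logging example less abstract, here is a minimal sketch of adding logging around method calls at runtime with a JDK dynamic proxy. It only illustrates the general idea; the `Greeter` interface and the `LoggingProxyDemo` class are hypothetical, and a real framework such as Spring does considerably more than this.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface, used only for this illustration.
interface Greeter {
    String greet(String name);
}

class LoggingProxyDemo {
    // Collected log lines; a real project would use a logging framework.
    static final List<String> LOG = new ArrayList<>();

    // Wraps an implementation of an interface in a proxy that logs every call.
    @SuppressWarnings("unchecked")
    static <T> T withLogging(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("enter " + method.getName());
            try {
                return method.invoke(target, args);
            } finally {
                LOG.add("exit " + method.getName());
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        Greeter plain = name -> "Hello, " + name;
        Greeter logged = withLogging(plain, Greeter.class);
        System.out.println(logged.greet("JVM"));
        System.out.println(LOG);
    }
}
```

The calling code keeps using the `Greeter` interface and does not know that the logging was added, which is exactly why such "magic" is easy to forget about.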
Code obfuscation and minification, instrumenting profilers, and the world of DevOps are the third example where a program has to be modified. I have probably missed other important examples; you can add them in the comments.
So, we have settled why a program should be modified; now let us consider how it is usually done.
Modification of executable program instructions
Modifying the program's bytecode is the most common approach. It can be done either right after compilation, before the jar is built, or when a class is loaded. In the first case it is a plugin for the project's build system; in the second, a special class loader, a Java agent, or the JVM's hotswap mechanism. The approach with agents and class loaders closely resembles self-modifying machine code and polymorphic viruses.
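A rough sketch of the agent variant, using the standard `java.lang.instrument` API. The class name `CountingAgent` and the package filter `com/example/` are hypothetical, and the transformer only counts classes instead of rewriting them; a real agent would return modified bytes.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a Java agent. For real use the agent class should be public,
// live in its own file, and be named in the Premain-Class manifest attribute.
class CountingAgent {
    static final AtomicInteger SEEN = new AtomicInteger();

    static class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // className is in internal form, e.g. "com/example/Foo".
            if (className != null && className.startsWith("com/example/")) {
                SEEN.incrementAndGet();
                // A real agent would rewrite classfileBuffer here
                // (for example with ASM) and return the new bytes.
            }
            return null; // null tells the JVM to keep the class unchanged
        }
    }

    // Called by the JVM before main() when started with -javaagent:agent.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer());
    }

    // Exercises the transformer directly, without a running agent.
    public static void main(String[] args) {
        new CountingTransformer().transform(null, "com/example/Foo", null, null, new byte[0]);
        System.out.println("classes seen: " + SEEN.get());
    }
}
```

Returning `null` from `transform` is the documented way to say "no change", so a transformer like this is safe to register even for classes it is not interested in.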
The bytecode file that the JVM loads has the structure described in the official class file format documentation. The javap tool lets you view the bytecode of a compiled class.
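To get a feel for the class file structure, here is a small sketch that reads only the fixed header of a class file: the magic number followed by the minor and major version. The `ClassHeader` class is a hypothetical helper written for this note, not part of any tool mentioned here.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads the first eight bytes of a class file:
// u4 magic, u2 minor_version, u2 major_version.
class ClassHeader {
    final int magic;
    final int minor;
    final int major;

    ClassHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    static ClassHeader read(Class<?> type) throws IOException {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new ClassHeader(magic, minor, major);
        }
    }

    public static void main(String[] args) throws IOException {
        ClassHeader header = read(String.class);
        System.out.printf("magic=%08X major=%d minor=%d%n",
                header.magic, header.major, header.minor);
    }
}
```

Everything after these eight bytes (constant pool, fields, methods, attributes) is variable-length, which is one reason why editing bytecode by hand quickly becomes tedious.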
Example from the official documentation. Class source:

    import java.awt.*;
    import java.applet.*;

    public class DocFooter extends Applet {
        String date;
        String email;

        public void init() {
            resize(500,100);
            date = getParameter("LAST_UPDATED");
            email = getParameter("EMAIL");
        }

        public void paint(Graphics g) {
            g.drawString(date + " by ",100, 15);
            g.drawString(email,290,15);
        }
    }
The javap console output for the bytecode of this class:

    Compiled from "DocFooter.java"
    public class DocFooter extends java.applet.Applet {
      java.lang.String date;
      java.lang.String email;
      public DocFooter();
        Code:
           0: aload_0
           1: invokespecial #1
But modifying bytecode by hand only makes sense for learning purposes, or if you are a ninja who loves to create difficulties and then overcome them. The Java Virtual Machine Specification answers most of the questions that arise at this stage.
In production development, working with bytecode is simplified by the ASM, javassist, BCEL, and CGLIB libraries. After parsing the bytecode, ASM, for example, lets the programmer work with it both through the Tree API and through an event model based on the visitor pattern. Besides analysis, it also allows modification: adding new instructions, fields, methods, and so on. There are other libraries for working with bytecode, but they are used less often.
An example of using the ASM API for bytecode analysis:

    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Set;

    import org.objectweb.asm.ClassReader;
    import org.objectweb.asm.MethodVisitor;
    import org.objectweb.asm.Opcodes;
    import org.objectweb.asm.tree.AbstractInsnNode;
    import org.objectweb.asm.tree.ClassNode;
    import org.objectweb.asm.tree.IincInsnNode;
    import org.objectweb.asm.tree.MethodNode;
    import org.objectweb.asm.tree.VarInsnNode;
    import org.objectweb.asm.tree.analysis.Analyzer;
    import org.objectweb.asm.tree.analysis.BasicValue;
    import org.objectweb.asm.tree.analysis.BasicVerifier;
    import org.objectweb.asm.tree.analysis.Frame;
    import org.objectweb.asm.tree.analysis.SourceInterpreter;
    import org.objectweb.asm.tree.analysis.SourceValue;
    import org.objectweb.asm.util.TraceMethodVisitor;
    import org.objectweb.asm.util.Textifier;

    public class Analysis implements Opcodes {

        public static void main(final String[] args) throws Exception {
            ClassReader cr = new ClassReader("Analysis");
            ClassNode cn = new ClassNode();
            cr.accept(cn, ClassReader.SKIP_DEBUG);

            List<MethodNode> methods = cn.methods;
            for (int i = 0; i < methods.size(); ++i) {
                MethodNode method = methods.get(i);
                if (method.instructions.size() > 0) {
                    if (!analyze(cn, method)) {
                        Analyzer<?> a = new Analyzer<BasicValue>(
                                new BasicVerifier());
                        try {
                            a.analyze(cn.name, method);
                        } catch (Exception ignored) {
                        }
                        final Frame<?>[] frames = a.getFrames();

                        Textifier t = new Textifier() {
                            @Override
                            public void visitMaxs(final int maxStack,
                                    final int maxLocals) {
                                for (int i = 0; i < text.size(); ++i) {
                                    StringBuilder s = new StringBuilder(
                                            frames[i] == null ? "null"
                                                    : frames[i].toString());
                                    while (s.length() < Math.max(20, maxStack
                                            + maxLocals + 1)) {
                                        s.append(' ');
                                    }
                                    System.err.print(Integer.toString(i + 1000)
                                            .substring(1)
                                            + " " + s + " : " + text.get(i));
                                }
                                System.err.println();
                            }
                        };
                        MethodVisitor mv = new TraceMethodVisitor(t);
                        for (int j = 0; j < method.instructions.size(); ++j) {
                            Object insn = method.instructions.get(j);
                            ((AbstractInsnNode) insn).accept(mv);
                        }
                        mv.visitMaxs(0, 0);
                    }
                }
            }
        }

        public static boolean analyze(final ClassNode c, final MethodNode m)
                throws Exception {
            Analyzer<SourceValue> a = new Analyzer<SourceValue>(
                    new SourceInterpreter());
            Frame<SourceValue>[] frames = a.analyze(c.name, m);
The aspect-oriented approach can be considered a high-level method of program modification. In the AspectJ implementations, whether as an agent, a class loader, or a build plugin, all the "magic" of AOP comes down to manipulating class bytecode. But what the programmer sees during development differs from how the bytecode is modified "under the hood" with the same ASM and BCEL. If you are curious what AspectJ actually adds to your application's classes, you can enable dumping of the modified classes and dig into that code up to your elbows, for example with Java Decompiler.
In AspectJ, the developer defines actions as classes annotated as aspects and specifies the points in the program (pointcuts) at which they should be invoked. The syntax for pointcut expressions is also quite high-level. For the programmer, this way of modifying bytecode is easier to use.
This is shown and explained in more detail, with examples, in a series of publications on Habr.

Program transformation through bytecode modification has its strengths and weaknesses:
| Pros | Cons |
|---|---|
| the approach works without the program's source code | analysis and modification are complex for non-trivial transformations |
| compiler independence | information available only in the source code is missing |
Transformation of AST source code, metaprogramming
The theory and practice of source code transformation have long been used in metaprogramming, in Prolog and Lisp, and in the macros and preprocessors of programming languages.
With this approach, another program transforms or supplements the source code before compilation, and the result is then compiled. It is more convenient to work not with the program text itself but with the abstract syntax tree (AST) built from it.
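As a small illustration of working with an AST instead of text, the sketch below parses a source string with the JDK's own compiler and walks the tree to collect method names. It assumes a full JDK (the `jdk.compiler` module) is available; the `AstMethodLister` class and its method names are hypothetical.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class AstMethodLister {
    // Wraps a string as an in-memory source file.
    static JavaFileObject source(String className, String code) {
        URI uri = URI.create("string:///" + className + ".java");
        return new SimpleJavaFileObject(uri, JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses the source (no type checking) and returns declared method names.
    static List<String> methodNames(String className, String code) throws IOException {
        // getSystemJavaCompiler() returns null on a runtime without a compiler.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        List<String> names = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethod(MethodTree node, Void unused) {
                    names.add(node.getName().toString());
                    return super.visitMethod(node, unused);
                }
            }.scan(unit, null);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String code = "class Sample { int a() { return 1; } void b() {} }";
        System.out.println(methodNames("Sample", code));
    }
}
```

The same visitor could just as well look for annotations, field declarations, or specific call expressions, which is the kind of information that is harder to recover from bytecode.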
Again, the convenience of metaprogramming depends on how well the programming language itself supports it. There is a joke:
In Lisp, if you want to do aspect-oriented programming, you just write a bunch of macros and you're there. In Java, you need Gregor Kiczales to go out and start a new company, and spend months and years trying to get it to work.
Peter Norvig
That is why on the JVM things are a bit more complicated, although reflection is part of the language and the platform can load and execute bytecode dynamically. Two technologies that rely on code generation come to mind: the JPA static metamodel generator and JAXB code generation. Another example is Project Lombok, which automatically implements what was previously generated by the IDE or written by hand and then maintained by developers.
Annotations of Project Lombok:
val
Finally! Hassle-free final local variables.
@NonNull
or: How I learned to stop worrying and love the NullPointerException.
@Cleanup
Automatic resource management: Call your close() methods safely with no hassle.
@Getter / @Setter
Never write public int getFoo() {return foo;} again.
@ToString
No need to start a debugger to see your fields: Just let lombok generate a toString for you!
@EqualsAndHashCode
Equality made easy: Generates hashCode and equals implementations from your fields.
@NoArgsConstructor, @RequiredArgsConstructor and @AllArgsConstructor
Constructors made to order: Generates constructors that take no arguments, one argument per final / non-null field, or one argument for every field.
@Data
All together now: A shortcut for @ToString, @EqualsAndHashCode, @Getter on all fields, and @Setter on all non-final fields, and @RequiredArgsConstructor!
@Value
Immutable classes made very easy.
@Builder
... and Bob's your uncle: No-hassle fancy-pants APIs for object creation!
@SneakyThrows
To boldly throw checked exceptions where no one has thrown them before!
@Synchronized
synchronized done right: Don't expose your locks.
@Getter(lazy=true)
Laziness is a virtue!
@Log
Captain's Log, stardate 24435.7: "What was that line again?"
Lombok implements all of this by modifying the AST of the user's program and generating code.
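For comparison, here is roughly the kind of boilerplate that such annotations save you from writing. This is a hand-written approximation for a hypothetical `Point` class, not Lombok's actual generated output, which differs in its details.

```java
import java.util.Objects;

// What a class annotated with something like @Getter, @EqualsAndHashCode
// and @ToString would otherwise need to spell out by hand.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() {
        return x;
    }

    int getY() {
        return y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    @Override
    public String toString() {
        return "Point(x=" + x + ", y=" + y + ")";
    }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2));
    }
}
```

Every new field means touching the constructor, a getter, equals, hashCode and toString, which is exactly the maintenance burden that code generation removes.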
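Similar functionality, with its own limitations, is also available in Java itself for annotations processed at compile time: the Annotation Processing Tool. The main limitation is that a standard annotation processor can generate new files but is not meant to modify existing classes.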
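A minimal sketch of a standard annotation processor, assuming a full JDK. The `DeprecatedCollector` class is hypothetical: it only collects the names of elements annotated with `@Deprecated`, while a real processor would typically generate new source files through `processingEnv.getFiler()`.

```java
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

@SupportedAnnotationTypes("java.lang.Deprecated")
class DeprecatedCollector extends AbstractProcessor {
    // Simple names of all elements found with @Deprecated.
    final List<String> found = new ArrayList<>();

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations,
                           RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                found.add(element.getSimpleName().toString());
            }
        }
        return false; // do not claim the annotation exclusively
    }

    // Runs the processor over an in-memory source without producing class files.
    static List<String> run(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject src = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        DeprecatedCollector processor = new DeprecatedCollector();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, List.of("-proc:only"), null, List.of(src));
        task.setProcessors(List.of(processor));
        task.call();
        return processor.found;
    }

    public static void main(String[] args) {
        String code = "class Sample { @Deprecated void old() {} void fresh() {} }";
        System.out.println(run("Sample", code));
    }
}
```

In a real build the processor would be discovered from the processor path instead of being passed to `setProcessors` by hand.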
When it comes to parsing Java source code, nothing does it better than javac and the Eclipse Java compiler. There are alternatives, such as Spoon and JTransformer, but I have no desire even to check how fully they support the specification and complex classes.
Staying on the JVM, source code transformation in Groovy is part of the language itself, and the Scala language has similar capabilities.
So, modifying the source code of a program also has its strengths and weaknesses:
| Pros | Cons |
|---|---|
| more information than in bytecode | a compilation or interpretation stage (memory, time) |
| capabilities like refactoring in an IDE | requires the source code and a way to find it automatically for a given class / jar |
Transformation of AST code and runtime recompilation
The most hardcore part of this article is some thoughts on recompilation at runtime.
Modifying and compiling the AST of Java code at runtime may be necessary when workarounds are required: the project is a complete coprolite (fossilized legacy code), or creating and maintaining dozens of forks of different versions is too laborious, or its management and developers consider it ideal and let no one modify it, yet the source code is available in the enterprise Maven repository. And this approach is needed only when the task is impossible or inconvenient to solve with the two classes of program transformation described above.
Compiling is relatively simple. The JavaCompiler API lets you compile a program from source code at runtime through an implementation-independent interface. While studying the manifest and source code of the Eclipse ECJ compiler, I discovered that it supports the JavaCompiler API as well.
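A minimal sketch of compiling and loading a class at runtime through the JavaCompiler API. It assumes a full JDK, writes the class files to a temporary directory, and uses hypothetical names (`RuntimeCompileDemo`, `GeneratedGreeter`); error handling and class loader cleanup are left out.

```java
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class RuntimeCompileDemo {
    // Compiles one source string and loads the resulting class.
    static Class<?> compileAndLoad(String className, String code) throws Exception {
        // Returns null on a runtime that ships without a compiler.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        Path outDir = Files.createTempDirectory("runtime-classes");

        JavaFileObject src = new SimpleJavaFileObject(
                URI.create("string:///" + className.replace('.', '/') + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };

        // "-d" sends the generated class files to the temporary directory.
        boolean ok = compiler.getTask(null, null, null,
                List.of("-d", outDir.toString()), null, List.of(src)).call();
        if (!ok) {
            throw new IllegalStateException("compilation failed");
        }

        URLClassLoader loader = new URLClassLoader(new URL[] {outDir.toUri().toURL()});
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        String code = "public class GeneratedGreeter {"
                + " public static String greet() { return \"hi\"; } }";
        Class<?> greeter = compileAndLoad("GeneratedGreeter", code);
        System.out.println(greeter.getMethod("greet").invoke(null));
    }
}
```

Because the code goes through `javax.tools` only, the compiler implementation behind `JavaCompiler` could in principle be swapped without touching the rest.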
But for analyzing the program text there is still no public, universal API for working with the AST. That is, you have to work with either com.sun.source.tree.* or org.eclipse.jdt.core.dom.*.
The task of finding the source code of a class is easy to solve if the project was published to a Maven repository together with its sources artifact and the jar with classes contains a pom.properties or pom.xml file, or if there is some dictionary that maps the name or hash of an artifact to the source code of the corresponding jar file, plus a way to fetch these sources while the program is running.
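The lookup through pom.properties can be sketched as follows: read the Maven coordinates and derive the path of the sources artifact in the standard repository layout. The `SourcesLocator` class and the sample coordinates are hypothetical; downloading the jar and locating pom.properties inside `META-INF/maven/` are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class SourcesLocator {
    // Builds the repository-relative path of the "-sources" jar
    // from the groupId / artifactId / version found in pom.properties.
    static String sourcesJarPath(Properties pom) {
        String groupId = pom.getProperty("groupId");
        String artifactId = pom.getProperty("artifactId");
        String version = pom.getProperty("version");
        if (groupId == null || artifactId == null || version == null) {
            throw new IllegalArgumentException("incomplete Maven coordinates");
        }
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + "-sources.jar";
    }

    public static void main(String[] args) throws IOException {
        // Sample content; in a real jar this file lives under
        // META-INF/maven/<groupId>/<artifactId>/pom.properties.
        String sample = "groupId=org.example\nartifactId=demo\nversion=1.2.3\n";
        Properties pom = new Properties();
        pom.load(new StringReader(sample));
        System.out.println(sourcesJarPath(pom));
    }
}
```

Appending this path to the base URL of the enterprise repository gives the location from which the sources could be fetched at runtime.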
Pros: more information is available for the transformation than bytecode contains; the approach does not require rebuilding the project and is almost as convenient as applying the AspectJ agent; and it covers transformations that are impossible or very laborious to perform at the bytecode level.
The cons are the same as in the previous approach: memory, time, the requirement for the program's source code, and the need for a way to find it for a given class.
Examples of the above, as ECJ + Maven code, will follow in the coming months; the task chosen is quite a practical one. Have you run into anything similar? Which tasks from your practice could be solved elegantly only by transforming and recompiling Java code at runtime?
By the way, the capabilities of the TinyCC compiler and its size show that this approach is also feasible for C programs.

In this note we looked at several approaches to program modification, with their strengths and weaknesses. Modifying executable code is more common, but not every task can be solved without the program's source code and its subsequent transformation.