Program modification: which is better to change, the executable code or the program's AST?

The principles in this note apply to almost any programming language and runtime, but the emphasis is on the JVM. We consider two main approaches to program modification:

manipulating the program's executable code after compilation or while the code is being loaded;
changing the source code before compilation.

The metaphor behind the image in this note: the program is the main building, and the result of the transforming program is an auxiliary structure.

Why modify?

We begin by asking why one program should modify another. Metaprogramming helps reduce the amount of boilerplate code in a project and concentrate on what matters, improves code readability, and solves problems that are difficult to solve any other way.
The simplest example in Java is JavaBeans with get/set methods for accessing class fields; other examples are generating builders for a class and the automatic implementation of equals/hashCode in the IDE.

The next example is logging and automatic transaction management. Everyone is used to these when working with the Spring Framework and rarely thinks about how they are implemented. But even Spring made configuring and initializing the framework difficult for beginners, which led to the emergence of the "magic" Spring Boot / Spring Roo. That is a separate topic, though; let us return to program modification.

Code obfuscation and minification, instrumenting profilers, and the DevOps world are a third example where a program needs to be modified. I have probably missed other important examples; feel free to add them in the comments.

So, we have settled why programs are modified; now let us consider how it is usually done.

Modification of executable program instructions

You can modify the program's bytecode, and this is the most common way. It can be done either right after compilation but before the jar is built, or when the class is loaded. In the first case it is a plugin for the project's build system; in the second, a special class loader, a Java agent, or the JVM's hotswap mechanism. The approach with agents and class loaders closely resembles self-modifying machine code and polymorphic viruses.
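As a rough illustration of the load-time variant, here is a minimal sketch of a Java agent built on the standard java.lang.instrument API. The class names LoggingAgent and LoggingTransformer and the package prefix com/example/ are assumptions made for this example; the transformer only logs classes and returns the bytecode unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent. In a real agent jar this class would be public and
// named in the Premain-Class attribute of the jar manifest.
final class LoggingAgent {

    // The JVM calls premain before main() when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer("com/example/"));
    }

    static final class LoggingTransformer implements ClassFileTransformer {
        private final String prefix; // package prefix in internal form (assumed value)
        int seen = 0;                // number of matching classes seen so far

        LoggingTransformer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // className uses the internal form, e.g. "com/example/Foo"
            if (className != null && className.startsWith(prefix)) {
                seen++;
                System.err.println("loading " + className
                        + " (" + classfileBuffer.length + " bytes)");
                // A real agent would rewrite classfileBuffer here, for example
                // with a bytecode library, and return the new byte array.
            }
            return null; // null tells the JVM that the class is left unchanged
        }
    }
}
```

Returning null instead of a copy of the buffer is the documented way to say "no transformation", so a sketch like this adds little overhead for classes it does not care about.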

The class file that the JVM loads has the structure described in the official class file format documentation.
The javap tool lets you view the bytecode of a compiled class.
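To give a feel for that structure, the following sketch reads only the first fields of a class file: the magic number, the minor and major version, and the constant pool count. The class name ClassFileHeader is invented for this example, and everything after the header is ignored.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper that decodes the fixed-size start of a class file.
final class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int magic, int minor, int major, int cpCount) {
        this.magic = magic;
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();              // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;   // u2 minor_version
        int major = buf.getShort() & 0xFFFF;   // u2 major_version (52 is Java 8)
        int cpCount = buf.getShort() & 0xFFFF; // u2 constant_pool_count
        return new ClassFileHeader(magic, minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        // Read the header of java.lang.Object from the running JDK
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            ClassFileHeader h = parse(in.readAllBytes());
            System.out.printf("magic=%08X major=%d minor=%d constant_pool_count=%d%n",
                    h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
        }
    }
}
```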

Example from the official documentation

Class source:

import java.awt.*;
import java.applet.*;

public class DocFooter extends Applet {
    String date;
    String email;

    public void init() {
        resize(500,100);
        date = getParameter("LAST_UPDATED");
        email = getParameter("EMAIL");
    }

    public void paint(Graphics g) {
        g.drawString(date + " by ",100, 15);
        g.drawString(email,290,15);
    }
}

The javap output to the console for the bytecode of this class:

Compiled from "DocFooter.java"
public class DocFooter extends java.applet.Applet {
  java.lang.String date;

  java.lang.String email;

  public DocFooter();
    Code:
       0: aload_0
       1: invokespecial #1    // Method java/applet/Applet."<init>":()V
       4: return

  public void init();
    Code:
       0: aload_0
       1: sipush        500
       4: bipush        100
       6: invokevirtual #2    // Method resize:(II)V
       9: aload_0
      10: aload_0
      11: ldc           #3    // String LAST_UPDATED
      13: invokevirtual #4    // Method getParameter:(Ljava/lang/String;)Ljava/lang/String;
      16: putfield      #5    // Field date:Ljava/lang/String;
      19: aload_0
      20: aload_0
      21: ldc           #6    // String EMAIL
      23: invokevirtual #4    // Method getParameter:(Ljava/lang/String;)Ljava/lang/String;
      26: putfield      #7    // Field email:Ljava/lang/String;
      29: return

  public void paint(java.awt.Graphics);
    Code:
       0: aload_1
       1: new           #8    // class java/lang/StringBuilder
       4: dup
       5: invokespecial #9    // Method java/lang/StringBuilder."<init>":()V
       8: aload_0
       9: getfield      #5    // Field date:Ljava/lang/String;
      12: invokevirtual #10   // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      15: ldc           #11   // String by
      17: invokevirtual #10   // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      20: invokevirtual #12   // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      23: bipush        100
      25: bipush        15
      27: invokevirtual #13   // Method java/awt/Graphics.drawString:(Ljava/lang/String;II)V
      30: aload_1
      31: aload_0
      32: getfield      #7    // Field email:Ljava/lang/String;
      35: sipush        290
      38: bipush        15
      40: invokevirtual #13   // Method java/awt/Graphics.drawString:(Ljava/lang/String;II)V
      43: return
}

But modifying bytecode by hand only makes sense for learning purposes, or if you are a ninja who loves to create difficulties and then overcome them. The Java Virtual Machine Specification answers most questions at this stage.

In industrial programming, work with bytecode is simplified by the ASM, Javassist, BCEL, and CGLIB libraries. After parsing the bytecode, ASM, for example, lets the programmer work with it either through the Tree API or in an event-based model using the visitor pattern. Besides analysis, it also supports modification: adding new instructions, fields, methods, and so on. Other bytecode libraries exist, but they are used less often.

An example of using the ASM API for bytecode analysis

/***
 * ASM examples: examples showing how ASM can be used
 * Copyright (c) 2000-2011 INRIA, France Telecom
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the copyright holders nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.tree.AbstractInsnNode;
import org.objectweb.asm.tree.ClassNode;
import org.objectweb.asm.tree.IincInsnNode;
import org.objectweb.asm.tree.MethodNode;
import org.objectweb.asm.tree.VarInsnNode;
import org.objectweb.asm.tree.analysis.Analyzer;
import org.objectweb.asm.tree.analysis.BasicValue;
import org.objectweb.asm.tree.analysis.BasicVerifier;
import org.objectweb.asm.tree.analysis.Frame;
import org.objectweb.asm.tree.analysis.SourceInterpreter;
import org.objectweb.asm.tree.analysis.SourceValue;
import org.objectweb.asm.util.TraceMethodVisitor;
import org.objectweb.asm.util.Textifier;

/**
 * @author Eric Bruneton
 */
public class Analysis implements Opcodes {

    public static void main(final String[] args) throws Exception {
        ClassReader cr = new ClassReader("Analysis");
        ClassNode cn = new ClassNode();
        cr.accept(cn, ClassReader.SKIP_DEBUG);

        List<MethodNode> methods = cn.methods;
        for (int i = 0; i < methods.size(); ++i) {
            MethodNode method = methods.get(i);
            if (method.instructions.size() > 0) {
                if (!analyze(cn, method)) {
                    Analyzer<?> a = new Analyzer<BasicValue>(
                            new BasicVerifier());
                    try {
                        a.analyze(cn.name, method);
                    } catch (Exception ignored) {
                    }
                    final Frame<?>[] frames = a.getFrames();

                    Textifier t = new Textifier() {
                        @Override
                        public void visitMaxs(final int maxStack,
                                final int maxLocals) {
                            for (int i = 0; i < text.size(); ++i) {
                                StringBuilder s = new StringBuilder(
                                        frames[i] == null ? "null"
                                                : frames[i].toString());
                                while (s.length() < Math.max(20,
                                        maxStack + maxLocals + 1)) {
                                    s.append(' ');
                                }
                                System.err.print(Integer.toString(i + 1000)
                                        .substring(1)
                                        + " " + s + " : " + text.get(i));
                            }
                            System.err.println();
                        }
                    };
                    MethodVisitor mv = new TraceMethodVisitor(t);
                    for (int j = 0; j < method.instructions.size(); ++j) {
                        Object insn = method.instructions.get(j);
                        ((AbstractInsnNode) insn).accept(mv);
                    }
                    mv.visitMaxs(0, 0);
                }
            }
        }
    }

    /*
     * Detects unused xSTORE instructions, ie xSTORE instructions without at
     * least one xLOAD corresponding instruction in their successor instructions
     * (in the control flow graph).
     */
    public static boolean analyze(final ClassNode c, final MethodNode m)
            throws Exception {
        Analyzer<SourceValue> a = new Analyzer<SourceValue>(
                new SourceInterpreter());
        Frame<SourceValue>[] frames = a.analyze(c.name, m);

        // for each xLOAD instruction, we find the xSTORE instructions that can
        // produce the value loaded by this instruction, and we put them in
        // 'stores'
        Set<AbstractInsnNode> stores = new HashSet<AbstractInsnNode>();
        for (int i = 0; i < m.instructions.size(); ++i) {
            AbstractInsnNode insn = m.instructions.get(i);
            int opcode = insn.getOpcode();
            if ((opcode >= ILOAD && opcode <= ALOAD) || opcode == IINC) {
                int var = opcode == IINC ? ((IincInsnNode) insn).var
                        : ((VarInsnNode) insn).var;
                Frame<SourceValue> f = frames[i];
                if (f != null) {
                    Set<AbstractInsnNode> s = f.getLocal(var).insns;
                    Iterator<AbstractInsnNode> j = s.iterator();
                    while (j.hasNext()) {
                        insn = j.next();
                        if (insn instanceof VarInsnNode) {
                            stores.add(insn);
                        }
                    }
                }
            }
        }

        // we then find all the xSTORE instructions that are not in 'stores'
        boolean ok = true;
        for (int i = 0; i < m.instructions.size(); ++i) {
            AbstractInsnNode insn = m.instructions.get(i);
            int opcode = insn.getOpcode();
            if (opcode >= ISTORE && opcode <= ASTORE) {
                if (!stores.contains(insn)) {
                    ok = false;
                    System.err.println("method " + m.name + ", instruction "
                            + i + ": useless store instruction");
                }
            }
        }
        return ok;
    }

    /*
     * Test for the above method, with three useless xSTORE instructions.
     */
    public int test(int i, int j) {
        i = i + 1; // ok, because i can be read after this point

        if (j == 0) {
            j = 1; // useless
        } else {
            try {
                j = j - 1; // ok, because j can be accessed in the catch
                int k = 0;
                if (i > 0) {
                    k = i - 1;
                }
                return k;
            } catch (Exception e) { // useless ASTORE (e is never used)
                j = j + 1; // useless
            }
        }

        return 0;
    }
}

The aspect-oriented approach can be considered a high-level method of program modification. In AspectJ implementations at the level of an agent, class loader, or build plugin, all the “magic” of AOP turns into manipulation of class bytecode. But what the programmer sees during development differs from how the bytecode is modified “under the hood” using the same ASM and BCEL. If you are curious what AspectJ actually adds to your application's classes, you can enable dumping of the modified classes and dig into that code up to your elbows, for example with Java Decompiler.

In AspectJ, a developer defines actions as classes annotated as aspects and specifies the points in the program (pointcuts) at which they should be invoked. The syntax of pointcut expressions is also quite high-level. This approach to bytecode modification is easier for a programmer to use.

This is shown and explained in more detail, with examples, in a series of publications on Habr.

Program transformation by modifying bytecode has its strengths and weaknesses:

Pros	Cons
works even when the program's source code is unavailable	analysis and modification become complex for non-trivial transformations
independent of the compiler	information that exists only in the source code is missing

Transformation of AST source code, metaprogramming

The theory and practice of source code transformation have long been used in metaprogramming: Prolog, Lisp, macros, and language preprocessors.

With this approach, the program's source code is transformed or supplemented by another program before compilation and then compiled. It is more convenient to work not with the program text itself but with the abstract syntax tree (AST) built from it.

Again, the convenience of metaprogramming depends on how well the programming language itself supports it. There is a joke:

In Lisp, if you want to do aspect-oriented programming, you just write a bunch of macros and you're there. In Java, you need Gregor Kiczales to start a new company and spend months and years trying to make it all work.

Peter Norvig

So on the JVM things are a bit more complicated, although reflection is part of the language and the platform can dynamically load and execute bytecode. Two technologies that use code generation come to mind: the JPA static metamodel generator and JAXB code generation. Another example is Project Lombok, which automatically implements what used to be generated by the IDE or written and maintained by hand.

Project Lombok annotations

val
Finally! Hassle-free final local variables.
@NonNull
or: How I learned to stop worrying and love the NullPointerException.
@Cleanup
Automatic resource management: Call your close() methods safely with no hassle.
@Getter / @Setter
Never write public int getFoo() {return foo;} again.
@ToString
No need to start a debugger to see your fields: Just let lombok generate a toString for you!
@EqualsAndHashCode
Equality made easy: Generates hashCode and equals implementations from the fields of your object.
@NoArgsConstructor, @RequiredArgsConstructor and @AllArgsConstructor
Constructors made to order: Generates constructors that take no arguments, one argument per final / non-null field, or one argument for every field.
@Data
All together now: A shortcut for @ToString, @EqualsAndHashCode, @Getter on all fields, and @Setter on all non-final fields, and @RequiredArgsConstructor!
@Value
Immutable classes made very easy.
@Builder
... and Bob's your uncle: No-hassle fancy-pants APIs for object creation!
@SneakyThrows
To boldly throw checked exceptions where no one has thrown them before!
@Synchronized
synchronized done right: Don't expose your locks.
@Getter(lazy=true)
Laziness is a virtue!
@Log
Captain's Log, stardate 24435.7: "What was that line again?"

Lombok implements these by modifying the AST of the user's program and generating code.

Similar functionality, with its own limitations, is also available in standard Java for annotations handled at compile time: the Annotation Processing Tool.
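A minimal sketch of such a compile-time processor, using only the standard javax.annotation.processing and javax.tools APIs, might look like this. The class name DeprecatedLister and the helper method run are invented for the example. The processor only inspects elements annotated with @Deprecated; through the public API a processor can report messages and generate new files, but it is not meant to rewrite existing classes, which is one of the limitations mentioned above.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical processor: collects every element marked @Deprecated.
@SupportedAnnotationTypes("java.lang.Deprecated")
final class DeprecatedLister extends AbstractProcessor {
    final List<String> found = new ArrayList<>();

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations,
                           RoundEnvironment round) {
        for (Element e : round.getElementsAnnotatedWith(Deprecated.class)) {
            found.add(e.getKind() + " " + e.getSimpleName());
            processingEnv.getMessager().printMessage(
                    Diagnostic.Kind.NOTE, "deprecated element: " + e, e);
        }
        return false; // do not claim the annotation, other processors may see it
    }

    // Runs the processor over one in-memory source file (helper for the example).
    static List<String> run(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        DeprecatedLister processor = new DeprecatedLister();
        // -proc:only runs annotation processing without producing class files
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, List.of("-proc:only"), null, List.of(file));
        task.setProcessors(List.of(processor));
        task.call();
        return processor.found;
    }
}
```

For instance, DeprecatedLister.run("Old", "@Deprecated class Old { @Deprecated void m() {} }") collects one entry for the class and one for the method.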

When it comes to parsing Java source code, nothing does it better than javac and the Eclipse Java compiler. There are alternatives, such as Spoon and JTransformer, but I have no desire even to check how fully they support the specification and complex classes.

On the JVM more broadly, transformation of a program's source code is part of the Groovy language itself, and Scala has similar capabilities.

So, modifying the program's source code also has its weaknesses and strengths:

Pros	Cons
more information is available than in bytecode	an extra compilation or interpretation stage (memory, time)
capabilities comparable to refactoring in an IDE	requires the source code and a way to find it automatically for a given class / jar

Transformation of AST code and runtime recompilation

The most hardcore part of this article is some thoughts on recompilation at runtime. Modifying and compiling the AST of Java code at runtime may be necessary when workarounds are unavoidable: the project is a complete coprolite, or maintaining dozens of forks of different versions is too laborious, or its management and developers consider it ideal and let no one modify it, while the source code sits in the enterprise Maven repository. And this approach is needed only if the task is impossible or inconvenient to solve with the two classes of program transformation described earlier.

Compilation is relatively simple. The JavaCompiler API lets you compile a program from source code at runtime through an implementation-independent interface. While studying the manifest and source code of the Eclipse compiler (ECJ), I found that it also supports the JavaCompiler API.
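A minimal sketch of runtime compilation through the JavaCompiler API could look like the following. The class name RuntimeCompile, the generated class GeneratedHello, and its method twice are assumptions made for this example; the source text is kept in memory, the class files go to a temporary directory, and the result is loaded with a URLClassLoader.

```java
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: compiles one class from a string and loads it.
final class RuntimeCompile {

    static Class<?> compile(String className, String source) throws Exception {
        // Returns null when the runtime ships without a compiler
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler available");
        }
        Path out = Files.createTempDirectory("runtime-compile");
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className.replace('.', '/') + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        boolean ok = compiler.getTask(null, null, null,
                List.of("-d", out.toString()), null, List.of(file)).call();
        if (!ok) {
            throw new IllegalStateException("compilation failed for " + className);
        }
        // The loader is left open on purpose: the returned class depends on it
        URLClassLoader loader = new URLClassLoader(new URL[] {out.toUri().toURL()});
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        Class<?> c = compile("GeneratedHello",
                "public class GeneratedHello {"
                + " public static int twice(int x) { return 2 * x; } }");
        Object result = c.getMethod("twice", int.class).invoke(null, 21);
        System.out.println(result); // prints 42
    }
}
```

Because the code only talks to javax.tools, another compiler that implements the same API could in principle be plugged in instead of the system one.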

But for analyzing the program text there is still no public, universal API for working with the AST. That is, you have to work either with com.sun.source.tree.* or with org.eclipse.jdt.core.dom.*
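For the javac side, a sketch of reading the AST through com.sun.source might look like this. It assumes the code runs on a JDK whose system compiler is javac, so that the compilation task can be cast to JavacTask; the class name MethodLister is invented for the example. It only parses the source and collects the names of declared methods, without modifying the tree.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: lists the methods declared in a piece of Java source.
final class MethodLister {

    static List<String> methodNames(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        // Assumption: the system compiler is javac, so the task is a JavacTask
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(file));
        List<String> names = new ArrayList<>();
        try {
            // parse() builds the syntax trees without attributing or generating code
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree node, Void unused) {
                        names.add(node.getName().toString());
                        return super.visitMethod(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(methodNames("A",
                "class A { void foo() {} int bar() { return 1; } }"));
    }
}
```

The Eclipse route through org.eclipse.jdt.core.dom follows the same parse-then-visit idea but needs the JDT libraries on the classpath, so it is not shown here.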

Finding the source code of a class is easy if the project was published to a Maven repository together with its source artifact and the jar with the classes contains a pom.properties or pom.xml file. Otherwise you need some dictionary that maps the artifact's name or hash to the sources of the corresponding jar file, plus a way to fetch those sources while the program is running.
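The first case can be sketched as follows, assuming the jar was built by Maven and therefore carries a META-INF/maven/<groupId>/<artifactId>/pom.properties entry with groupId, artifactId, and version keys. The class name SourcesLocator is invented for the example. The path it builds follows the standard Maven repository layout for release versions; snapshot versions may use timestamped file names in remote repositories, and actually downloading the file is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: maps a Maven-built jar to the path of its sources jar.
final class SourcesLocator {

    // Returns the first pom.properties found in the jar, or null if there is none.
    static Properties readPomProperties(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (name.startsWith("META-INF/maven/")
                        && name.endsWith("/pom.properties")) {
                    Properties props = new Properties();
                    try (InputStream in = jar.getInputStream(entry)) {
                        props.load(in);
                    }
                    return props;
                }
            }
        }
        return null;
    }

    // Repository-relative path of the "-sources" artifact in the standard layout.
    static String sourcesJarPath(Properties pom) {
        String groupId = pom.getProperty("groupId");
        String artifactId = pom.getProperty("artifactId");
        String version = pom.getProperty("version");
        if (groupId == null || artifactId == null || version == null) {
            throw new IllegalArgumentException("incomplete Maven coordinates");
        }
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + "-sources.jar";
    }
}
```

Appending the returned path to the base URL of the enterprise repository gives the location from which the sources could be fetched at runtime.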

Pros: more information is available for the transformation than the bytecode contains, and applying it does not require rebuilding the project. It is almost as convenient as using the AspectJ agent, and it covers cases where a bytecode transformation is impossible or very laborious.

The disadvantages are the same as in the previous approach: memory, time, and the need for the program's source code and a way to find it for a given class.

Examples of the above, as code using ECJ + Maven, will follow in the coming months; the task chosen is quite a practical one. Have you run into something similar? Which tasks from your practice could be solved elegantly only by transforming and recompiling Java code at runtime?

By the way, the capabilities and size of the TinyCC compiler show that this approach is also feasible for C programs.

In this note we looked at several approaches to program modification and at their strengths and weaknesses. Modifying the executable code is more common, but not every task can be solved without the program's source code and its subsequent transformation.

Source: https://habr.com/ru/post/269037/
