
Parsing Java programs with Java programs

We covered the theory in the publication "Program modification: which is better to change, the executable code or the program's AST?". Now let's move on to practice using the Eclipse Java compiler API.



A Java program that digests a Java program begins by working with an abstract syntax tree (AST)...

Before transforming a program, it would be good to learn how to work with its intermediate representation in the computer's memory. With this, let's begin.
I will repeat the conclusion from my previous publication: there is no public, universal API for analyzing Java source code and working with a program's abstract syntax tree. You have to work with either com.sun.source.tree.* or org.eclipse.jdt.core.dom.*

For the example in this article I chose the Eclipse Java compiler (ejc) and its AST model, org.eclipse.jdt.core.dom.*

I’ll give a few reasons in favor of ejc:


The example program I wrote for working with a Java program's AST walks through all classes in a jar file and analyzes calls to the methods of the logging classes that interest us: org.slf4j.Logger, org.apache.commons.logging.Log and org.springframework.boot.cli.util.Log.

Finding the source text for a class is easy if the project was published to a Maven repository together with an artifact of type sources, and the jar with the classes contains a pom.properties or pom.xml file. To extract this information at run time we use the MavenCoordHelper class from the io.fabric8.insight:insight-log4j artifact and MavenClassLoader, a class loader that works with a Maven repository, from the com.github.smreed:dropship artifact.

MavenCoordHelper finds the groupId:artifactId:version coordinates for a given class from the pom.properties file in that class's jar file:

public static String getMavenSourcesId(String className) {
    String mavenCoordinates = io.fabric8.insight.log.log4j.MavenCoordHelper.getMavenCoordinates(className);
    if (mavenCoordinates == null) return null;
    DefaultArtifact artifact = new DefaultArtifact(mavenCoordinates);
    return String.format("%s:%s:%s:sources:%s", artifact.getGroupId(),
            artifact.getArtifactId(), artifact.getExtension(), artifact.getVersion());
}
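To make the coordinate rewriting concrete, here is a minimal standard-library sketch of the same idea. It does not use MavenCoordHelper or DefaultArtifact. The PomCoordinates class and its methods are hypothetical names introduced only for illustration, and the sketch assumes the artifact extension is always jar.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper (not part of the article's code): pom.properties text to Maven coordinates. */
final class PomCoordinates {

    /** Reads groupId, artifactId and version from pom.properties text; returns null if any is missing. */
    static String fromPomProperties(String pomPropertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(pomPropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String groupId = props.getProperty("groupId");
        String artifactId = props.getProperty("artifactId");
        String version = props.getProperty("version");
        if (groupId == null || artifactId == null || version == null) {
            return null;
        }
        return groupId + ":" + artifactId + ":" + version;
    }

    /** Rewrites groupId:artifactId:version as groupId:artifactId:jar:sources:version (jar is assumed). */
    static String toSourcesId(String coordinates) {
        String[] parts = coordinates.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected groupId:artifactId:version, got " + coordinates);
        }
        return String.format("%s:%s:jar:sources:%s", parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        String pomProperties = "groupId=com.googlecode.log4jdbc\nartifactId=log4jdbc\nversion=1.2\n";
        // prints com.googlecode.log4jdbc:log4jdbc:jar:sources:1.2
        System.out.println(toSourcesId(fromPomProperties(pomProperties)));
    }
}
```

The sources classifier is what tells the repository to return the jar with .java files instead of the compiled classes.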


MavenClassLoader loads the source text at these coordinates for analysis and builds a classpath (including transitive dependencies) for resolving types in the program. Downloading from the Maven repository:

public static LoadingCache<String, URLClassLoader> createMavenClassloaderCache() {
    return CacheBuilder.newBuilder()
            .maximumSize(MAX_CACHE_SIZE)
            .build(new CacheLoader<String, URLClassLoader>() {
                @Override
                public URLClassLoader load(String mavenId) throws Exception {
                    return com.github.smreed.dropship.MavenClassLoader.forMavenCoordinates(mavenId);
                }
            });
}
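The cache matters because every class in one jar maps to the same sources coordinates, so the expensive resolution should happen once per coordinate rather than once per class. The sketch below shows that load-once behaviour with a plain ConcurrentHashMap as a stand-in for Guava's LoadingCache. LoaderCache is a hypothetical name, and unlike the Guava cache it has no size limit or eviction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a loading cache: computes a value once per key, no eviction. */
final class LoaderCache<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    LoaderCache(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for the coordinates, invoking the loader only on the first request. */
    V get(String mavenId) {
        return cache.computeIfAbsent(mavenId, id -> {
            loads.incrementAndGet();
            return loader.apply(id);
        });
    }

    /** Number of times the loader actually ran. */
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // The string value stands in for the URLClassLoader the real cache would hold.
        LoaderCache<String> cache = new LoaderCache<>(id -> "class loader for " + id);
        cache.get("com.googlecode.log4jdbc:log4jdbc:jar:sources:1.2");
        cache.get("com.googlecode.log4jdbc:log4jdbc:jar:sources:1.2");
        System.out.println(cache.loadCount()); // prints 1
    }
}
```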


The initialization of the EJC compiler itself and working with AST are quite simple:
package com.github.igorsuhorukov.java.ast;

import com.google.common.cache.LoadingCache;
import org.eclipse.jdt.core.JavaCore;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.CompilationUnit;

import java.net.URLClassLoader;
import java.util.Set;

import static com.github.igorsuhorukov.java.ast.ParserUtils.*;

public class Parser {

    public static final String[] SOURCE_PATH = new String[]{System.getProperty("java.io.tmpdir")};
    public static final String[] SOURCE_ENCODING = new String[]{"UTF-8"};

    public static void main(String[] args) throws Exception {
        if (args.length != 1) throw new IllegalArgumentException("Class name should be specified");

        String file = getJarFileByClass(Class.forName(args[0]));
        Set<String> classes = getClasses(file);
        LoadingCache<String, URLClassLoader> classLoaderCache = createMavenClassloaderCache();

        for (final String currentClassName : classes) {
            String mavenSourcesId = getMavenSourcesId(currentClassName);
            if (mavenSourcesId == null)
                throw new IllegalArgumentException("Maven group:artifact:version not found for class " + currentClassName);

            URLClassLoader urlClassLoader = classLoaderCache.get(mavenSourcesId);

            ASTParser parser = ASTParser.newParser(AST.JLS8);
            parser.setResolveBindings(true);
            parser.setKind(ASTParser.K_COMPILATION_UNIT);
            parser.setCompilerOptions(JavaCore.getOptions());
            parser.setEnvironment(prepareClasspath(urlClassLoader), SOURCE_PATH, SOURCE_ENCODING, true);
            parser.setUnitName(currentClassName + ".java");

            String sourceText = getClassSourceCode(currentClassName, urlClassLoader);
            if (sourceText == null) continue;

            parser.setSource(sourceText.toCharArray());
            CompilationUnit cu = (CompilationUnit) parser.createAST(null);
            cu.accept(new LoggingVisitor(cu, currentClassName));
        }
    }
}

When creating the parser, we specify that the source code conforms to the Java 8 language specification:
ASTParser parser = ASTParser.newParser(AST.JLS8);

We also specify that after parsing, identifier types must be resolved against the classpath we passed to the compiler:
parser.setResolveBindings(true);

The source code of the class is passed to the parser by calling:
parser.setSource(sourceText.toCharArray());

We build the AST of this class:
CompilationUnit cu = (CompilationUnit) parser.createAST(null);

And we receive events while the AST is traversed, using our visitor class:
cu.accept(new LoggingVisitor(cu, currentClassName));


We extend the ASTVisitor class, override its public boolean visit(MethodInvocation node) method, and pass an instance to the tree built by ejc. In this handler we check that the call is to a method of one of the classes we are interested in, and then analyze the arguments of the call.

As the AST, which also carries type information, is traversed, our visit method is called. In it we get information about the location of the token in the source file, its parameters, expressions and so on.
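The traversal contract can be illustrated without JDT on the classpath. The following is a simplified sketch in which Visitor and Call are hypothetical stand-ins for ASTVisitor and MethodInvocation: the tree calls visit for each node, and returning true tells it to descend into the node's children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MiniVisitorDemo {

    /** Hypothetical stand-in for ASTVisitor: return true to descend into the node's children. */
    static class Visitor {
        boolean visit(Call node) {
            return true;
        }
    }

    /** Hypothetical stand-in for a MethodInvocation node: a method name plus nested calls as arguments. */
    static final class Call {
        final String name;
        final List<Call> arguments;

        Call(String name, Call... arguments) {
            this.name = name;
            this.arguments = Arrays.asList(arguments);
        }

        /** Calls the visitor for this node, then for the arguments if the visitor asked to descend. */
        void accept(Visitor visitor) {
            if (visitor.visit(this)) {
                for (Call argument : arguments) {
                    argument.accept(visitor);
                }
            }
        }
    }

    /** Collects method names in the order the visitor sees them (pre-order). */
    static List<String> collectCallNames(Call root) {
        List<String> names = new ArrayList<>();
        root.accept(new Visitor() {
            @Override
            boolean visit(Call node) {
                names.add(node.name);
                return true;
            }
        });
        return names;
    }

    public static void main(String[] args) {
        // Models logger.info(String.format(getDebugInfo()))
        Call tree = new Call("info", new Call("format", new Call("getDebugInfo")));
        System.out.println(collectCallNames(tree)); // prints [info, format, getDebugInfo]
    }
}
```

Because the handler returns true, calls nested inside the arguments of a logger call are visited as well.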

The core of the analysis, which handles the logger method calls of interest in the analyzed program, is encapsulated in LoggingVisitor:
LoggingVisitor.java
package com.github.igorsuhorukov.java.ast;

import org.eclipse.jdt.core.dom.*;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LoggingVisitor extends ASTVisitor {

    final static Set<String> LOGGER_CLASS = new HashSet<String>() {{
        add("org.slf4j.Logger");
        add("org.apache.commons.logging.Log");
        add("org.springframework.boot.cli.util.Log");
    }};

    final static Set<String> LOGGER_METHOD = new HashSet<String>() {{
        add("fatal");
        add("error");
        add("warn");
        add("info");
        add("debug");
        add("trace");
    }};

    public static final String LITERAL = "Literal";
    public static final String FORMAT_METHOD = "format";

    private final CompilationUnit cu;
    private final String currentClassName;

    public LoggingVisitor(CompilationUnit cu, String currentClassName) {
        this.cu = cu;
        this.currentClassName = currentClassName;
    }

    @Override
    public boolean visit(MethodInvocation node) {
        if (LOGGER_METHOD.contains(node.getName().getIdentifier())) {
            ITypeBinding objType = node.getExpression() != null ? node.getExpression().resolveTypeBinding() : null;
            if (objType != null && LOGGER_CLASS.contains(objType.getBinaryName())) {
                int lineNumber = cu.getLineNumber(node.getStartPosition());

                boolean isFormat = false;
                boolean isConcat = false;
                boolean isLiteral1 = false;
                boolean isLiteral2 = false;
                boolean isMethod = false;
                boolean withException = false;

                for (int i = 0; i < node.arguments().size(); i++) {
                    ASTNode innerNode = (ASTNode) node.arguments().get(i);

                    // The last argument may be an exception passed to the logger.
                    if (i == node.arguments().size() - 1) {
                        if (innerNode instanceof SimpleName && ((SimpleName) innerNode).resolveTypeBinding() != null) {
                            ITypeBinding typeBinding = ((SimpleName) innerNode).resolveTypeBinding();
                            // As written, this loop is entered only when the type is java.lang.Object itself,
                            // so withException is never set and a trailing exception argument is classified
                            // like any other SimpleName (compare the console output below).
                            while (typeBinding != null && Object.class.getName().equals(typeBinding.getBinaryName())) {
                                if (Throwable.class.getName().equals(typeBinding.getBinaryName())) {
                                    withException = true;
                                    break;
                                }
                                typeBinding = typeBinding.getSuperclass();
                            }
                            if (withException) continue;
                        }
                    }

                    if (innerNode instanceof MethodInvocation) {
                        MethodInvocation methodInvocation = (MethodInvocation) innerNode;
                        if (FORMAT_METHOD.equals(methodInvocation.getName().getIdentifier())
                                && methodInvocation.getExpression() != null
                                && methodInvocation.getExpression().resolveTypeBinding() != null
                                && String.class.getName().equals(
                                        methodInvocation.getExpression().resolveTypeBinding().getBinaryName())) {
                            isFormat = true;
                        } else {
                            isMethod = true;
                        }
                    } else if (innerNode instanceof InfixExpression) {
                        InfixExpression infixExpression = (InfixExpression) innerNode;
                        if (InfixExpression.Operator.PLUS.equals(infixExpression.getOperator())) {
                            List<Object> expressions = new ArrayList<>();
                            expressions.add(infixExpression.getLeftOperand());
                            expressions.add(infixExpression.getRightOperand());
                            expressions.addAll(infixExpression.extendedOperands());

                            long stringLiteralCount = expressions.stream()
                                    .filter(item -> item instanceof StringLiteral).count();
                            // Count the operands that are NOT literals of any kind.
                            long notLiteralCount = expressions.stream()
                                    .filter(item -> !item.getClass().getName().contains(LITERAL)).count();
                            if (notLiteralCount > 0 && stringLiteralCount > 0) {
                                isConcat = true;
                            }
                        }
                    } else if (innerNode instanceof Expression && innerNode.getClass().getName().contains(LITERAL)) {
                        isLiteral1 = true;
                    } else if (innerNode instanceof SimpleName || innerNode instanceof QualifiedName
                            || innerNode instanceof ConditionalExpression || innerNode instanceof ThisExpression
                            || innerNode instanceof ParenthesizedExpression || innerNode instanceof PrefixExpression
                            || innerNode instanceof PostfixExpression || innerNode instanceof ArrayCreation
                            || innerNode instanceof ArrayAccess || innerNode instanceof FieldAccess
                            || innerNode instanceof ClassInstanceCreation) {
                        isLiteral2 = true;
                    }
                }

                String type = loggerInvocationType(node, isFormat, isConcat, isLiteral1 || isLiteral2, isMethod);
                System.out.println(currentClassName + ":" + lineNumber + "\t\t\t" + node + "\t\ttype " + type);
            }
        }
        return true;
    }

    private String loggerInvocationType(MethodInvocation node, boolean isFormat, boolean isConcat,
                                        boolean isLiteral, boolean isMethod) {
        if (!isConcat && !isFormat && isLiteral) {
            return "literal";
        } else {
            if (isFormat && isConcat) {
                return "format concat";
            } else if (isFormat && !isLiteral) {
                return "format";
            } else if (isConcat && !isLiteral) {
                return "concat";
            } else {
                if (isConcat || isFormat || isLiteral) {
                    if (node.arguments().size() == 1) {
                        return "single argument";
                    } else {
                        return "mixed logging";
                    }
                }
            }
            if (isMethod) {
                return "method";
            }
        }
        return "unknown";
    }
}
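The check on the last argument is meant to recognise an exception by walking up the superclass chain until java.lang.Throwable is found. Here is a minimal sketch of such a walk using reflection instead of JDT bindings. Class.getSuperclass plays the role of ITypeBinding.getSuperclass, and ThrowableCheck is a hypothetical name.

```java
/** Hypothetical helper: decides whether a type is an exception by walking its superclass chain. */
final class ThrowableCheck {

    /** Climbs from the given type towards java.lang.Object, stopping early if Throwable is reached. */
    static boolean isThrowable(Class<?> type) {
        Class<?> current = type;
        while (current != null && current != Object.class) {
            if (current == Throwable.class) {
                return true;
            }
            current = current.getSuperclass();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isThrowable(IllegalStateException.class)); // prints true
        System.out.println(isThrowable(String.class));                // prints false
    }
}
```

The loop continues while the current type is not java.lang.Object, which is the condition needed for the walk to reach Throwable at all.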


The dependencies of the analyzer program required for compilation and operation are described in
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <parent>
        <groupId>org.sonatype.oss</groupId>
        <artifactId>oss-parent</artifactId>
        <version>7</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.github.igor-suhorukov</groupId>
    <artifactId>java-ast</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <insight.version>1.2.0.redhat-133</insight.version>
    </properties>
    <dependencies>
        <!-- EJC -->
        <dependency>
            <groupId>org.eclipse.tycho</groupId>
            <artifactId>org.eclipse.jdt.core</artifactId>
            <version>3.11.0.v20150602-1242</version>
        </dependency>
        <dependency>
            <groupId>org.eclipse.core</groupId>
            <artifactId>runtime</artifactId>
            <version>3.9.100-v20131218-1515</version>
        </dependency>
        <dependency>
            <groupId>org.eclipse.birt.runtime</groupId>
            <artifactId>org.eclipse.core.resources</artifactId>
            <version>3.8.101.v20130717-0806</version>
        </dependency>
        <!-- MAVEN -->
        <dependency>
            <groupId>io.fabric8.insight</groupId>
            <artifactId>insight-log4j</artifactId>
            <version>${insight.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>*</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>io.fabric8.insight</groupId>
            <artifactId>insight-log-core</artifactId>
            <version>${insight.version}</version>
        </dependency>
        <dependency>
            <groupId>io.fabric8</groupId>
            <artifactId>common-util</artifactId>
            <version>${insight.version}</version>
        </dependency>
        <dependency>
            <groupId>com.github.igor-suhorukov</groupId>
            <artifactId>aspectj-scripting</artifactId>
            <version>1.0</version>
            <classifier>agent</classifier>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>19.0-rc2</version>
        </dependency>
        <!-- Dependency to analyze -->
        <dependency>
            <groupId>com.googlecode.log4jdbc</groupId>
            <artifactId>log4jdbc</artifactId>
            <version>1.2</version>
        </dependency>
    </dependencies>
</project>

Part of the "street magic" that helps with parsing is hidden in the ParserUtils class, which is built on the third-party libraries discussed above.

ParserUtils.java
package com.github.igorsuhorukov.java.ast;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.io.CharStreams;
import org.sonatype.aether.util.artifact.DefaultArtifact;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.security.CodeSource;
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class ParserUtils {

    public static final int MAX_CACHE_SIZE = 1000;

    /** Names of all top-level classes in the jar (nested classes, whose names contain '$', are skipped). */
    public static Set<String> getClasses(String file) throws IOException {
        try (JarFile jarFile = new JarFile(file)) {
            return Collections.list(jarFile.entries()).stream()
                    .filter(entry -> entry.getName().endsWith(".class") && !entry.getName().contains("$"))
                    .map(entry -> entry.getName().replace(".class", "").replace('/', '.'))
                    .collect(Collectors.toSet());
        }
    }

    /** Coordinates of the sources artifact for the jar that contains the class, or null if unknown. */
    public static String getMavenSourcesId(String className) {
        String mavenCoordinates = io.fabric8.insight.log.log4j.MavenCoordHelper.getMavenCoordinates(className);
        if (mavenCoordinates == null) return null;
        DefaultArtifact artifact = new DefaultArtifact(mavenCoordinates);
        return String.format("%s:%s:%s:sources:%s", artifact.getGroupId(),
                artifact.getArtifactId(), artifact.getExtension(), artifact.getVersion());
    }

    /** Cache of class loaders keyed by Maven coordinates, so each artifact is resolved only once. */
    public static LoadingCache<String, URLClassLoader> createMavenClassloaderCache() {
        return CacheBuilder.newBuilder()
                .maximumSize(MAX_CACHE_SIZE)
                .build(new CacheLoader<String, URLClassLoader>() {
                    @Override
                    public URLClassLoader load(String mavenId) throws Exception {
                        return com.github.smreed.dropship.MavenClassLoader.forMavenCoordinates(mavenId);
                    }
                });
    }

    /** Classpath entries for the parser, taken from the URLs of the class loader. */
    public static String[] prepareClasspath(URLClassLoader urlClassLoader) {
        return Arrays.stream(urlClassLoader.getURLs())
                .map(URL::getFile)
                .toArray(String[]::new);
    }

    /** Location of the jar file the class was loaded from, or null if it cannot be determined. */
    public static String getJarFileByClass(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        String file = null;
        if (source != null) {
            URL locationURL = source.getLocation();
            if ("file".equals(locationURL.getProtocol())) {
                file = locationURL.getPath();
            } else {
                file = locationURL.toString();
            }
        }
        return file;
    }

    /** Source text of the class read from the sources artifact, or null if the .java file is absent. */
    static String getClassSourceCode(String className, URLClassLoader urlClassLoader) throws IOException {
        String sourceText = null;
        try (InputStream javaSource = urlClassLoader.getResourceAsStream(className.replace(".", "/") + ".java")) {
            if (javaSource != null) {
                try (InputStreamReader sourceReader = new InputStreamReader(javaSource, StandardCharsets.UTF_8)) {
                    sourceText = CharStreams.toString(sourceReader);
                }
            }
        }
        return sourceText;
    }
}
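The mapping from jar entries to class names in getClasses can be tried in isolation. The sketch below applies the same filter to a plain list of entry names. EntryNames is a hypothetical name, and the entry list is made up for illustration rather than read from the real log4jdbc jar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical helper: turns jar entry names into names of top-level classes. */
final class EntryNames {

    /** Keeps .class entries without '$' (no nested classes) and converts paths to dotted names. */
    static Set<String> toTopLevelClassNames(List<String> entryNames) {
        return entryNames.stream()
                .filter(name -> name.endsWith(".class") && !name.contains("$"))
                .map(name -> name.substring(0, name.length() - ".class".length()).replace('/', '.'))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // Illustrative entry names, not the actual contents of a jar.
        List<String> entries = Arrays.asList(
                "META-INF/MANIFEST.MF",
                "net/sf/log4jdbc/ConnectionSpy.class",
                "net/sf/log4jdbc/ConnectionSpy$1.class");
        System.out.println(toTopLevelClassNames(entries)); // prints [net.sf.log4jdbc.ConnectionSpy]
    }
}
```

Nested classes are skipped because their source lives in the file of the enclosing top-level class, which is analyzed anyway.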

We run com.github.igorsuhorukov.java.ast.Parser, passing the class name net.sf.log4jdbc.ConnectionSpy as the parameter for analysis.

We get console output that shows which kinds of parameters are passed to the logger methods:
Application console
[Dropship WARN] No dropship.properties found! Using .dropship-prefixed system properties (-D)
[Dropship INFO] Collecting maven metadata.
[Dropship INFO] Resolving dependencies.
[Dropship INFO] Building classpath for com.googlecode.log4jdbc: log4jdbc: jar: sources: 1.2 from 2 URLs.
net.sf.log4jdbc.Slf4jSpyLogDelegator: 104 jdbcLogger.error (header, e) type literal
net.sf.log4jdbc.Slf4jSpyLogDelegator: 105 sqlOnlyLogger.error (header, e) type literal
net.sf.log4jdbc.Slf4jSpyLogDelegator: 106 sqlTimingLogger.error (header, e) type literal
net.sf.log4jdbc.Slf4jSpyLogDelegator: 111 jdbcLogger.error (header + "" + sql, e) type mixed logging
net.sf.log4jdbc.Slf4jSpyLogDelegator: 116 sqlOnlyLogger.error (getDebugInfo () + nl + spyNo + "." + sql, e) type mixed logging
net.sf.log4jdbc.Slf4jSpyLogDelegator: 120 sqlOnlyLogger.error (header + "" + sql, e) type mixed logging
net.sf.log4jdbc.Slf4jSpyLogDelegator: 126 sqlTimingLogger.error (getDebugInfo () + nl + spyNo + "." + sql + "{FAILED after" + execTime + "msec}", e) type mixed logging
net.sf.log4jdbc.Slf4jSpyLogDelegator: 130 sqlTimingLogger.error (header + "FAILED!" + sql + "{FAILED after" + execTime + "msec}", e) type mixed logging
net.sf.log4jdbc.Slf4jSpyLogDelegator: 158 logger.debug (header + "" + getDebugInfo ()) type concat
net.sf.log4jdbc.Slf4jSpyLogDelegator: 162 logger.info (header) type literal
net.sf.log4jdbc.Slf4jSpyLogDelegator: 221 sqlOnlyLogger.debug (getDebugInfo () + nl + spy.getConnectionNumber () + "." + processSql (sql)) type concat
net.sf.log4jdbc.Slf4jSpyLogDelegator: 226 sqlOnlyLogger.info (processSql (sql)) type method
net.sf.log4jdbc.Slf4jSpyLogDelegator: 352 sqlTimingLogger.error (buildSqlTimingDump (spy, execTime, methodCall, sql, sqlTimingLogger.isDebugEnabled ())) type method
net.sf.log4jdbc.Slf4jSpyLogDelegator: 360 sqlTimingLogger.warn (buildSqlTimingDump (spy, execTime, methodCall, sql, sqlTimingLogger.isDebugEnabled ())) type method
net.sf.log4jdbc.Slf4jSpyLogDelegator: 365 sqlTimingLogger.debug (buildSqlTimingDump (spy, execTime, methodCall, sql, true)) type method
net.sf.log4jdbc.Slf4jSpyLogDelegator: 370 sqlTimingLogger.info (buildSqlTimingDump (spy, execTime, methodCall, sql, false)) type method
net.sf.log4jdbc.Slf4jSpyLogDelegator: 519 debugLogger.debug (msg) type literal
net.sf.log4jdbc.Slf4jSpyLogDelegator: 531 connectionLogger.info (spy.getConnectionNumber () + ". Connection opened" + getDebugInfo ()) type concat
net.sf.log4jdbc.Slf4jSpyLogDelegator: 533 connectionLogger.debug (ConnectionSpy.getOpenConnectionsDump ()) type method
net.sf.log4jdbc.Slf4jSpyLogDelegator: 537 connectionLogger.info (spy.getConnectionNumber () + ". Connection opened") type concat
net.sf.log4jdbc.Slf4jSpyLogDelegator: 550 connectionLogger.info (spy.getConnectionNumber () + ". Connection closed" + getDebugInfo ()) type concat
net.sf.log4jdbc.Slf4jSpyLogDelegator: 552 connectionLogger.debug (ConnectionSpy.getOpenConnectionsDump ()) type method
net.sf.log4jdbc.Slf4jSpyLogDelegator: 556 connectionLogger.info (spy.getConnectionNumber () + ". Connection closed") type concat



For example, when the info method is called with a concatenation of the spy.getConnectionNumber() call, the string ". Connection opened" and the getDebugInfo() call, we get a message that this is a concat:
net.sf.log4jdbc.Slf4jSpyLogDelegator: 531 connectionLogger.info (spy.getConnectionNumber () + ". Connection opened" + getDebugInfo ()) type concat

After that we could transform the source text to replace the concatenation in the arguments of this call with a call that takes the pattern "{}. Connection opened {}" and the parameters spy.getConnectionNumber() and getDebugInfo(). This more machine-readable call, and the information in it, could then be sent straight to Elasticsearch, which I mentioned in the article “Publishing logs to Elasticsearch - life without regular expressions and without logstash”.
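The core of such a transformation can be sketched without JDT. In the sketch below each operand of the concatenation is modelled by a hypothetical Operand class rather than a real AST node. String literals go into the pattern as text, and every other expression becomes a {} placeholder plus a separate argument. Literals that already contain {} would need escaping, which this sketch does not handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConcatToPattern {

    /** One operand of a string concatenation (hypothetical model, not a JDT type). */
    static final class Operand {
        final String text;
        final boolean literal;

        private Operand(String text, boolean literal) {
            this.text = text;
            this.literal = literal;
        }

        static Operand literal(String value) {
            return new Operand(value, true);
        }

        static Operand expression(String source) {
            return new Operand(source, false);
        }
    }

    /** The "{}" pattern plus the expressions that fill its placeholders, in order. */
    static final class Result {
        final String pattern;
        final List<String> arguments;

        Result(String pattern, List<String> arguments) {
            this.pattern = pattern;
            this.arguments = arguments;
        }
    }

    /** Literals are copied into the pattern; every other operand becomes a placeholder and an argument. */
    static Result convert(List<Operand> operands) {
        StringBuilder pattern = new StringBuilder();
        List<String> arguments = new ArrayList<>();
        for (Operand operand : operands) {
            if (operand.literal) {
                pattern.append(operand.text);
            } else {
                pattern.append("{}");
                arguments.add(operand.text);
            }
        }
        return new Result(pattern.toString(), arguments);
    }

    public static void main(String[] args) {
        Result result = convert(Arrays.asList(
                Operand.expression("spy.getConnectionNumber()"),
                Operand.literal(". Connection opened "),
                Operand.expression("getDebugInfo()")));
        System.out.println(result.pattern);   // prints {}. Connection opened {}
        System.out.println(result.arguments); // prints [spy.getConnectionNumber(), getDebugInfo()]
    }
}
```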

As you can see, parsing and analyzing Java programs is easy to implement in Java code using the ejc compiler, and it is just as easy to programmatically obtain the sources of the classes of interest from a Maven repository. The example from the article is available on GitHub.

Ahead of us are a Java agent and modification and compilation at run time, a task bigger and harder than just digesting an AST...


See you soon!

Source: https://habr.com/ru/post/269129/

