Finding Java bytecode vulnerabilities: what to do with the results?

Solar inCode can detect vulnerabilities in Java bytecode. But showing the bytecode statement that contains the vulnerability is not enough. How to show the vulnerability in the source code, which is not?

In practice, when using the vulnerability scan tool, one of three actions must be applied to each vulnerability found:

eliminate;
take risks;
prove that this is not a vulnerability (false positive).

All three actions require some vulnerability analysis (for example, you need to understand how to eliminate it and what risks it carries).

How to conduct such an analysis, if we find vulnerabilities in Java bytecode, but we don’t have source code?
')
The main method of searching for vulnerabilities when creating inCode was static analysis - search for vulnerabilities without executing code. With static analysis you need:

build a code model (intermediate representation);
Supplement the model with information about data using static analysis algorithms (data flow analysis, control flow - dataflow analysis, taint analysis);
apply the rules of searching for vulnerabilities (the rules say where vulnerabilities are in the code model, in terms of this model and the information with which it is supplemented).

Finding vulnerabilities in the intermediate view, you need to map them in terms of source code for further analysis.

The first applications in which Solar inCode looked for vulnerabilities were Android and Java applications. The search for vulnerabilities in executable files is quite popular:

under the terms of the contract, the customer may not transmit the source code;
even if the source code was transferred, the executable code that does not match the transferred source code may go to the battle stand (or on Google Play);
developers use third-party components without source code, such code also needs to be controlled.

Therefore, for Android and Java applications, we chose Java bytecode as an intermediate representation for static analysis. After compiling the source code of the mobile application into Java bytecode, the Dalvik compiler combines the class files and recompiles the code into the bytecode for Dalvik, obtaining the executable dex file. The executable file, along with the resources and the configuration file, is packaged in an apk package that is distributed through Google Play. There are tools for processing and converting apk packages: unpacking, decrypting resources and configuration files, translating Dalvik code into Java bytecode ( apktool , dex2jar ).

Bytecode can also be obtained from the source code by compiling (this is how we do when analyzing Java and Scala source code). Thus, Java bytecode is well suited as a single internal representation when analyzing the source and executable Java code and Android applications (in fact, you can also analyze all languages compiled into Java bytecode).

Java bytecode can be decompiled, while getting sufficiently high quality code. There are many decompilers for Java ( JD , fernflower , Procyon ). We did not use the restored Java code as an intermediate representation, since any means of decompiling make mistakes, which can affect the quality of the vulnerability search.

So, we found Java bytecode vulnerabilities (we will write in the following articles about how this is done). What to do with the results?

We must show them in terms of the “source” code, more precisely, the recovered high-level code. By vulnerability here we mean a set of positions of instructions in the bytecode that define a vulnerability (an unsafe method call, a set of instructions through which the unsafe data stream passes). Thus, each instruction in bytecode we must match the line number in the recovered code. In the class file (the bytecode file corresponding to the class in the source code) there is the LineNumberTable attribute, which stores the display of bytecode positions to the line numbers in the source code. Thus, to display vulnerabilities in terms of the Java language, it is necessary for the bytecode to contain the LineNumberTable attribute.

When analyzing bytecode (including that obtained from the apk-file), the LineNumberTable attribute may not appear. It can be deleted during compilation or during reverse translation from apk. Although it is not so important - the LineNumberTable removed from the bytecode corresponded to the source code that the developer wrote, and not to the restored “source” code. This means that it is necessary to restore the LineNumberTable attribute in the analyzed bytecode, which will indicate the restored code.

The basic algorithm is based on the construction of an abstract syntax tree of the decompiled code (AST) using Java bytecode and the output of the source code during the AST bypass process. In the process of bypassing the AST (depth to depth), information about the correspondence of the line numbers of the code being restored and the positions of instructions in the bytecode methods is also saved.

At each node of the tree, we know the position of the instruction in the bytecode relative to the beginning of the current method and the line number in the file of the code being restored. Therefore, the crawl also preserves the method boundaries in the restored code for the subsequent filtering of the pairs “position in the bytecode of the method” - “line number in the file”.

Anonymous classes are handled separately, since they spawn methods nested within each other. For this, at the end of the traversal, an analysis of the nesting of the intervals of the method positions in the source code takes place.

Formalization of a restored mapping and recovery algorithm

In practice, we often get projects containing both source code (then we can get bytecode with a table of line numbers by compiling) and bytecode (various third-party components, libraries, and so on). To analyze such projects, inCode implemented a combined preprocessing of a Java project. It consists of the following steps:

all class files, source code files for java and scala, jar / war bytecode files are detected in the project;
depending on the configuration of the project scan that the user specifies, class files from jar / war files are included in the list of class files (most often this means that the project is analyzed along with the libraries);
from class files and source code files we get the full class names, using class names we associate source code files with bytecode files;
Bytecode files for which no source code files were found are decompiled with the procedure for recovering line number information described above.

This preprocessing takes into account anonymous and nested classes — a single source code file can correspond to several bytecode files.

As a result, for each bytecode file we have a file with Java code (either recovered or source code) and a table of line numbers connecting it with this file.

Using the procedures and algorithms described in the article, any vulnerability found in a Java or Android application can be displayed by inCode on the source code, regardless of whether it was submitted for analysis.

A similar approach is used when analyzing binary files of iOS applications, but everything is much more complicated there: the task of decompiling the binary code of the ARM architecture is much less explored. Further publications will be devoted to this topic.

Source: https://habr.com/ru/post/312056/

All Articles

Finding Java bytecode vulnerabilities: what to do with the results?

More articles: