Working with JavaScript abstract syntax trees

Why parse your code? For example, to catch a forgotten console.log before a commit. And what if you need to change a function signature in hundreds of places across the codebase? Can regular expressions handle that? This article shows what abstract syntax trees make possible for a developer.

Below the cut: the video and a text transcript of the talk by Kirill Cherkashin (z6Dabrata) at the HolyJS 2018 Piter conference.

About the author
Kirill was born in Moscow, now lives in New York and works at Firebase. He teaches Angular, not only at Google but around the world. He organizes AngularNYC, the largest Angular meetup in the world (as well as VueNYC and ReactNYC). In his free time he enjoys programming, tango, books and pleasant conversations.

Hacksaw or wood?

Let's start with an example. Say you debugged a program, pushed your changes to git and went quietly to sleep. In the morning it turns out that your colleagues have pulled your changes and, because you forgot to remove the debug output to the console the day before, it now clutters their output. Many people have run into this problem.

There are tools such as ESLint that fix this, but for educational purposes let's try to find a solution on our own.
Which tool should we use to remove all console.log() calls from the code?
The choice is between regular expressions and abstract syntax trees (AST). Let's first try regular expressions and write a function findConsoleLog. It takes the program code as its argument and returns true if console.log() is found somewhere in the program text.

    function findConsoleLog(code) {
      return !!code.match(/console.log/);
    }

I wrote 17 tests, trying to come up with various ways to break our function. This list is far from complete.

The simplest test passes.
But what if some function name merely contains "console.log"?

    function findConsoleLog(code) {
      return !!code.match(/\bconsole.log/);
    }

We added \b, which requires console.log to start at a word boundary.
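The effect of the word boundary can be checked with a small sketch. The helper names `naive` and `bounded` are made up for this illustration and do not come from the talk.

```javascript
// Illustrative helpers: the first two regex attempts from the text.
const naive = (code) => /console.log/.test(code);
const bounded = (code) => /\bconsole.log/.test(code);

// The naive version fires on any identifier that merely ends in "console".
const falsePositive = naive('myconsole.log(1)');     // true
// The word boundary rejects that case...
const fixedByBoundary = bounded('myconsole.log(1)'); // false
// ...but a commented-out call is still reported.
const foundInComment = bounded('// console.log(1)'); // true

console.log({ falsePositive, fixedByBoundary, foundInComment });
```

The last case is the one the next step has to deal with.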

Only two tests pass so far. What if console.log appears in a comment and does not need to be removed?

Let's rewrite the function so that it ignores comments.

    function findConsoleLog(code) {
      return !!code
        // strip line comments (every occurrence, replaced with an empty string)
        .replace(/\/\/.*/g, '')
        .match(/\bconsole.log/);
    }

We also ignore "console.log" inside strings:

    function findConsoleLog(code) {
      return !!code
        // strip line comments and single-quoted strings
        .replace(/\/\/.*|'.*'/g, '')
        .match(/\bconsole.log/);
    }

Do not forget that whitespace and other characters can still keep some tests from passing.

Although the approach turned out to be far from simple, all 17 tests can be passed with regular expressions. The resulting solution looks like this:

    function findConsoleLog(code) {
      return !!code
        // strip line comments, strings, template literals and block comments
        .replace(/\/\/.*|'.*?[^\\]'|".*?"|`[\s\S]*`|\/\*[\s\S]*\*\//g, '')
        .match(/\bconsole\s*\.log\(/);
    }

The problem is that this code still does not cover every possible case, and it is rather hard to maintain.
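Two such uncovered cases are easy to reproduce. The sketch below uses a variant of the final regex; the global flag, the empty replacement string and the escaped dot are assumptions of this sketch, and `findConsoleLogRegex` is a name chosen here to keep it apart from the function in the text.

```javascript
// Variant of the regex approach (assumed details: `g` flag, '' replacement, escaped dot).
function findConsoleLogRegex(code) {
  return /\bconsole\s*\.log\(/.test(
    code.replace(/\/\/.*|'.*?[^\\]'|".*?"|`[\s\S]*`|\/\*[\s\S]*\*\//g, '')
  );
}

// An ordinary call is found.
const plainCall = findConsoleLogRegex("console.log('holy');");         // true
// A space before the parenthesis is valid JavaScript, but the call is missed.
const spaceBeforeParen = findConsoleLogRegex("console.log ('holy');"); // false
// A `console` property of another object is reported as well.
const otherObject = findConsoleLogRegex("app.console.log('holy');");   // true

console.log({ plainCall, spaceBeforeParen, otherObject });
```

Each of these could be patched with yet another tweak to the pattern, which is exactly the maintenance burden described above.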

Let's see how to solve this problem with an AST.

How are trees grown?

An abstract syntax tree is what a parser produces from your application's code. The demonstration uses the @babel/parser parser (formerly published as babylon, the name that appears in the code below).
As an example, take the line console.log('holy') and run it through the parser.

    import { parse } from 'babylon';
    parse("console.log('holy')");

The result is a JSON structure of about 300 lines. We drop the lines with service information: the part that interests us is the body section, and the meta-information does not. About 100 lines remain. Compared with the structure a browser generates for a single body variable (about 300 lines), that is not much.

Let's look at a few examples of how different constructs in code are represented in the syntax tree.

An expression consisting of a number contains a NumericLiteral node, a numeric literal.

The already familiar console.log is a member expression: an object that has a property.

If log is called as a function, the description looks like this: there is a call expression with arguments (numeric literals), and its callee is named log.

Literals come in several kinds: numbers, strings, regular expressions, booleans and null.
Let's go back to the console.log call.

This is a CallExpression that contains a MemberExpression. From it we can see that the console object has a property called log.
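Written out by hand, the interesting part of that tree looks roughly like the object below. This is a trimmed sketch in the shape Babel's parser produces: position data and other metadata are left out, and only the statement node is shown.

```javascript
// Trimmed-down Babel-style AST for: console.log('holy')
// Position data (start, end, loc) and other metadata are omitted.
const statement = {
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'console' },
      property: { type: 'Identifier', name: 'log' },
      computed: false,
    },
    arguments: [{ type: 'StringLiteral', value: 'holy' }],
  },
};

// Reading the tree is ordinary property access.
const { callee } = statement.expression;
const calleeText = `${callee.object.name}.${callee.property.name}`;
console.log(calleeText); // console.log
```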

Traversing the AST

Now let's work with this structure in code. To traverse the tree we will use the babel-traverse library.

The same 17 tests apply. Parsing the program into a syntax tree and searching it for occurrences of "console.log" gives the following code:

    function traverseConsoleLog(code, {babylon, babelTraverse, types, log}) {
      const ast = babylon.parse(code);
      let hasConsoleLog = false;
      babelTraverse(ast, {
        MemberExpression(path) {
          if (
            path.node.property.type === 'Identifier' &&
            path.node.property.name === 'log' &&
            path.node.object.type === 'Identifier' &&
            path.node.object.name === 'console' &&
            path.parent.type === 'CallExpression' &&
            path.parentKey === 'callee'
          ) {
            hasConsoleLog = true;
          }
        }
      });
      return hasConsoleLog;
    }

Let's go through what is written here. const ast = babylon.parse(code); parses the code into a syntax tree and stores it in the ast variable. We then hand this tree to the babel-traverse library for processing. In it we look for nodes and properties with the matching names inside call expressions, and set the hasConsoleLog variable to true once the required combination of nodes and names is found.

We can move around the tree, reach a node's parents and descendants, inspect its arguments and properties, and look at the names and types of those properties. This is very convenient.
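To make the idea of a visitor less magical, here is a minimal sketch of a tree walk over a hand-written AST. The `walk` helper is hypothetical and far simpler than babel-traverse (it passes the node, its parent and the key instead of a path object), and the tiny tree stands in for parser output for `console.log(1); foo.log(2);`.

```javascript
// Hypothetical mini-traverser: calls visitor[node.type](node, parent, key)
// for every node of a plain-object AST. This is not the babel-traverse API.
function walk(node, visitor, parent = null, key = null) {
  if (!node || typeof node.type !== 'string') return;
  const visit = visitor[node.type];
  if (visit) visit(node, parent, key);
  for (const [childKey, child] of Object.entries(node)) {
    if (Array.isArray(child)) {
      child.forEach((item) => walk(item, visitor, node, childKey));
    } else if (child && typeof child === 'object') {
      walk(child, visitor, node, childKey);
    }
  }
}

// Hand-built stand-in for the tree of: console.log(1); foo.log(2);
const id = (name) => ({ type: 'Identifier', name });
const callStatement = (objectName, propertyName, value) => ({
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: { type: 'MemberExpression', object: id(objectName), property: id(propertyName) },
    arguments: [{ type: 'NumericLiteral', value }],
  },
});
const program = {
  type: 'Program',
  body: [callStatement('console', 'log', 1), callStatement('foo', 'log', 2)],
};

// The same checks as in the text, expressed against node/parent/key.
const found = [];
walk(program, {
  MemberExpression(node, parent, key) {
    if (
      node.object.type === 'Identifier' && node.object.name === 'console' &&
      node.property.type === 'Identifier' && node.property.name === 'log' &&
      parent.type === 'CallExpression' && key === 'callee'
    ) {
      found.push(parent);
    }
  },
});

console.log(found.length); // 1
```

Only the first statement matches, because the second call is made on `foo` rather than `console`.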

There is one unpleasant detail that is easy to fix with the babel-types library. A typo in a type name, for example path.parent.type === 'callExpression' instead of path.parent.type === 'CallExpression', silently breaks the search. With babel-types you can write instead:

    // Before
    path.node.property.type === 'Identifier'
    path.node.property.name === 'log'

    // With babel-types
    import {isIdentifier} from 'babel-types';
    // a typo in the helper name isIdentifier throws an error instead of silently never matching
    isIdentifier(path.node.property, {name: 'log'})

Let's rewrite the previous code using babel-types:

    function traverseConsoleLogSolved2(code, {babylon, babelTraverse, types}) {
      const ast = babylon.parse(code);
      let hasConsoleLog = false;
      babelTraverse(ast, {
        MemberExpression(path) {
          if (
            types.isIdentifier(path.node.object, { name: 'console'}) &&
            types.isIdentifier(path.node.property, { name: 'log'}) &&
            types.isCallExpression(path.parent) &&
            path.parentKey === 'callee'
          ) {
            hasConsoleLog = true;
          }
        }
      });
      return hasConsoleLog;
    }

Transforming the AST with babel-traverse

To save effort, we want console.log removed from the code right away, rather than merely being told that it is there.

Since we need to delete not the MemberExpression itself but its parent, we replace hasConsoleLog = true; with path.parentPath.remove(); .

The removeConsoleLog function still returns a boolean value. We replace its return value with the code that babel-generator produces, like this:
hasConsoleLog => babelGenerator(ast).code

babel-generator takes the modified abstract syntax tree as a parameter and returns an object whose code property holds the generated code, now without console.log. By the way, if we also want a source map, we can enable the generator's sourceMaps option; the map is then returned on the same object.
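The point about removing the enclosing node rather than the MemberExpression can be illustrated on a hand-written tree. This is only a sketch of the idea: it does not use babel-traverse or babel-generator, it handles top-level statements only, and the helper names (`isNamed`, `isConsoleLogStatement`, `removeConsoleLogStatements`) are invented for the example.

```javascript
// Hypothetical helper: an Identifier node with the given name.
const isNamed = (node, name) => node.type === 'Identifier' && node.name === name;

// True for an ExpressionStatement that consists solely of a console.log(...) call.
function isConsoleLogStatement(stmt) {
  const call = stmt.type === 'ExpressionStatement' ? stmt.expression : null;
  return Boolean(
    call &&
    call.type === 'CallExpression' &&
    call.callee.type === 'MemberExpression' &&
    isNamed(call.callee.object, 'console') &&
    isNamed(call.callee.property, 'log')
  );
}

// Returns a new Program node without top-level console.log statements.
// Dropping only the MemberExpression would leave a broken call node behind,
// so the whole statement that wraps the call is removed.
function removeConsoleLogStatements(program) {
  return { ...program, body: program.body.filter((stmt) => !isConsoleLogStatement(stmt)) };
}

// Hand-built stand-in for the tree of: console.log('holy'); debugger;
const program = {
  type: 'Program',
  body: [
    {
      type: 'ExpressionStatement',
      expression: {
        type: 'CallExpression',
        callee: {
          type: 'MemberExpression',
          object: { type: 'Identifier', name: 'console' },
          property: { type: 'Identifier', name: 'log' },
        },
        arguments: [{ type: 'StringLiteral', value: 'holy' }],
      },
    },
    { type: 'DebuggerStatement' },
  ],
};

const cleaned = removeConsoleLogStatements(program);
console.log(cleaned.body.map((stmt) => stmt.type)); // [ 'DebuggerStatement' ]
```

A real transform would hand the modified tree to a code generator to get source text back.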

What if you need to find a debugger statement?

This time we will use AST Explorer for the task. A debugger statement has its own node type, DebuggerStatement. We do not need to inspect the whole structure: since this is a special kind of node, finding the DebuggerStatement is enough. We will write a plugin for ESLint (in AST Explorer).

AST Explorer is laid out so that you write code on the left and get the finished AST on the right. You can choose the format in which to view it: JSON or a tree.

Since we are using ESLint, it does all the work of finding the files for us and hands us each file so that we can look for the debugger statement in it. This tool uses a different AST parser. In fact, there are several flavors of AST for JavaScript. It is somewhat reminiscent of the old days, when different browsers implemented the specification in different ways. So, here is our debugger search:

    export default function(context) {
      return {
        // Unlike the console.log example, the handler receives the node itself, not a path
        DebuggerStatement(node) {
          // Tell ESLint that a debugger statement was found; node tells it where
          context.report(node, 'LOL Debugger!!!');
        }
      }
    }

Let's check that the plugin works.

The debugger statement can be removed from the code in a similar way.
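One way to do that is to give the rule a fix. The sketch below uses ESLint's object-style rule format with a `meta` section and a `create` function; the rule name `noDebuggerRule` and the message text are assumptions of this example, and in a real plugin the object would be exported from the rule's module.

```javascript
// Sketch of a fixable rule in ESLint's object format.
const noDebuggerRule = {
  meta: {
    type: 'problem',
    fixable: 'code', // required for rules that provide a fix
    schema: [],
  },
  create(context) {
    return {
      DebuggerStatement(node) {
        context.report({
          node,
          message: 'Unexpected debugger statement.',
          // Applied only when ESLint is run with --fix.
          fix: (fixer) => fixer.remove(node),
        });
      },
    };
  },
};
```

Removing statements automatically can change program behavior in edge cases, so it is worth reviewing what the fix deletes before committing.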

What else are ASTs useful for?

I personally use ASTs to simplify working with Angular and other front-end frameworks. With the push of a button you can add an import, extend a class, or add an interface, a method, a decorator and more. Everything above dealt with JavaScript, but TypeScript has its own AST as well; the only differences are the names of the node types and the structure. In the same AST Explorer you can select TypeScript as the language.

To sum up:

We get more control over the code, easier refactoring, and codemods. For example, before a commit you can press a single key and format all the code according to the guidelines. Codemods automatically bring code in line with the required version of a framework.
Fewer arguments about code style.
You can build playful projects, for example one that automatically gives programmers feedback on the code they write.
A better understanding of JavaScript.

Some useful Babel links

All Babel transformations use this API: plugins and presets.
Part of the process of adding new functionality to ECMAScript is writing a Babel plugin, so that people can try out the new functionality. If you follow the link, you can see that AST capabilities are used inside as well, for example in logical-assignment-operator.
Babel Generator loses formatting when it generates code. This is partly a good thing: if the whole development team uses the tool, code generated from the AST looks the same for everyone. But if you want to preserve your formatting, you can use one of these tools: Recast or Babel CodeMod.
Awesome Babel collects a wealth of information about Babel.
Babel is an open source project maintained by a team of volunteers, and you can help. There are three ways to do so: help with the code, give financial support at opencollective.com/babel, or support Henry Zhu, one of the key Babel contributors, on Patreon.

Bonus

How else can you find our console.log in the code? Use your IDE! Use the "find and replace" tool, having first chosen where to search.
IntelliJ IDEA also has a "structural search" tool that helps you find the right places in the code; incidentally, it uses an AST.

On November 24-25, Kirill will speak at HolyJS in Moscow with the talk "JavaScript *LOVES* binary data": we will go down to the level of binary data, dig into binary files using *.gif files as an example, and look at serialization frameworks such as Protobuf and Thrift. After the talk you can chat with Kirill and discuss any questions in the discussion area.

Source: https://habr.com/ru/post/428628/
