Minifikation programming language

Once I took part in a demo contest (programs that generate an audio-visual series, the main feature of which is an extremely small size - dozens or even kibibyte units).

In the course of a general discussion, someone suggested a non-standard demo idea for the world: to write a program in a scripting language. The fact is that all demos are compressed by the packer to reduce the size (and unpacked when executed). And the text is compressed much better than the binary code. If the interpreter is of very small size, this can be a significant advantage.

Because of my experience in the frontend, I immediately had the idea to additionally minify the code - to remove spaces and optional elements, to reduce the length of identifiers. After all, compression saves all the information, and many syntax elements are not necessary.

But even so, most of the existing languages are not designed for this optimization - obviously, they have many elements that are necessary for an individual to understand, and not a machine. And what if you develop a language specifically designed for minification?

In the end, I did not participate in that competition. However, this idea did not leave me. After all, it can be useful for more practical purposes than a demo - in the world of frontend the volume of client scripts is still extremely important, if you can reduce it, this decision may be justified, at least in some cases.

I decided to conduct an experiment - to make a prototype of the language and see what came of it.

Key features

I put the following key features into my language: dynamic typing, functional paradigm, no separators, and classes of identifiers.

I chose dynamic typing, because for static it is required to describe types, which takes precious space.

Functional paradigm - because it is more expressive (allows a smaller amount of code to express more algorithms).

I’ll dwell on the other two.

No separators

Consider the following code:

add(2, multiple(2, 2))

The number of arguments each function expects (the arity of the function) is known in advance. Brackets and commas are only needed for readability. Remove them:

 add 2 multiple 2 2

You can do the same with operators, simply considering them as functions:

 + 2 * 2 2

At the same time, operators do not need to have priority - the procedure is defined by the order of recording, since to call a function, you first need to calculate all its parameters. So the expression above will return the value 6. And to get 8, you will need to write the code like this:

 * 2 + 2 2

ID Classes

Since my operators are normal functions, I decided to extend this and simply allow any punctuation characters to be used in identifiers.

And then I got the idea to divide all identifiers into two classes: alphabetic and punctuation.

The fact is that in any language, identifiers must be separated by something - either by other elements of the syntax, or by whitespace. Otherwise there will be an ambiguity: xy is an identifier xy or two identifiers x and y ?

However, having two non-intersecting classes of identifiers, I can relax this requirement: identifiers of different classes can be written together, there will be no ambiguity.

Moreover, it is not for nothing that my first grade is only alphabetic - the numbers do not include it. This allows you to write numbers together with any class of identifiers.

For example, take the following expression:

 foo($(5))

In my language it can be written as follows:

 foo$5

Problem solving

During the development of the language, several interesting problems arose, which I did not expect, but still managed to solve: calling functions without arguments and building the code structure (AST).

Calling functions without arguments

Since I do not have a special syntax for calling a function, as in other languages, functions without arguments must be called by simply specifying their name:

 fn answer() 42 ; answer

So it worked, but only at the same level of nesting. What if such a function returns another one the same?

 fn answer() fn() 42 ; ; answer

Then the result is a closure, not an internal value. And how to cause this closure? After all, we have already indicated his name.

I had to use the approach from the Clojure language - trampoline: any values after their calculation fall into a special cycle, which cyclically causes closures that do not require arguments, until the result is something else. Thus the result of the second example above will also be 42, as in the first.

Building a code structure

To be able to carry out the minification, it is necessary to be able to determine the structure of the code without executing it.

And when we know the number of arguments of all functions, it is easy. For example, the code:

 add 2 multiple 2 2

It has the following structure:

However, as soon as I begin to return the closure, ambiguity appears:

 fn add(xy) + xy ; fn increase(x) + x 1 ; fn foo(n) if == n 2 add increase ; foo x ...

How many arguments should I pass to the result of the foo function call? This can only be determined during the execution of the code, but not at the stage of its analysis. And this makes the implementation of minification impossible.

To solve this problem, I extended the typing to semi-static: types need to be specified only for functions, while the role of the type is to indicate only the required number of arguments both for the function itself and for its result, if that is the closure.

 fn make_adder(bias):2 fn(xy) + bias + xy ; ; make_adder 1 2 2

The definition of the make_adder function explicitly states that its result is a closure and expects two parameters. Therefore, it is easy to build the structure of the last call:

General features

The language has the following types: nil , floating-point numbers, lists, hash tables, and closures. Strings are based on lists. Logical values are missing - certain values of other types are considered false, and all others are considered true.

The language has a set of built-in functions: basic math functions and operations (including for bit arithmetic), functions for working with lists and hash tables, basic input-output.

The language supports modularity through dynamic loading of source code files. Caching is supported, the vendor directory and the search in paths specified through a special environment variable.

The language has a small standard library containing a module for working with lists in a functional style and a module for unit testing.

Benchmark

To assess the possibilities of minification, I decided to compare my language with JavaScript. For this, I wrote the same program on both.

I chose the Virtual DOM tree comparison algorithm as the task. Based on these articles:

However, when comparing them, the real DOM immediately changes, but I just generated a list of the required changes, indicating the node address to which they relate.

JavaScript version

 function make_node(type, properties, ...children) { return { 'type': type, 'properties': properties || {}, 'children': children, } } function make_difference(path, action, ...parameters) { return { 'path': path, 'action': action, 'parameters': parameters, } } function is_different(node_1, node_2) { return typeof node_1 !== typeof node_2 || (typeof node_1 === 'object' ? node_1['type'] !== node_2['type'] : node_1 !== node_2) } function compare_property(path, name, old_value, new_value) { let difference if (!new_value) { difference = make_difference(path, 'remove_property', name) } else if (!old_value || new_value !== old_value) { difference = make_difference(path, 'set_property', name, new_value) } return difference } function compare_properties(path, old_properties, new_properties) { const properties = Object.assign({}, old_properties, new_properties) return Object.keys(properties) .map(name => compare_property(path, name, old_properties[name], new_properties[name])) .filter(difference => difference) } function compare_nodes(old_node, new_node, index=0, path=[]) { let differences = [] if (!old_node) { differences.push(make_difference(path, 'create', new_node)) } else if (!new_node) { differences.push(make_difference(path, 'remove', index)) } else if (is_different(old_node, new_node)) { differences.push(make_difference(path, 'replace', index, new_node)) } else if (old_node['type']) { const child_path = [...path, old_node['type']] const properties_differences = compare_properties(child_path, old_node['properties'], new_node['properties']) differences.push(...properties_differences) const maximal_children_length = Math.max(old_node['children'].length, new_node['children'].length) for (let i = 0; i < maximal_children_length; i++) { const child_differences = compare_nodes(old_node['children'][i], new_node['children'][i], i, child_path) differences.push(...child_differences) } } return differences } module['exports'] = { 'make_node': make_node, 'compare_nodes': compare_nodes, }

Version in my language

 let map:2 load "std/list/map"; let filter:2 load "std/list/filter"; let zip_longest:3 load "std/list/zip_longest"; let reduce:3 load "std/list/reduce"; fn make_node(kind properties children) #"type" kind #"properties" properties #"children" children {}; fn make_difference(path action parameters) #"path" path #"action" action #"parameters" parameters {}; fn is_different(node_i node_ii) || != type node_i type node_ii fn() if == "hash" type node_i fn() != ."type"node_i ."type"node_ii; fn() != node_i node_ii; ; ; fn compare_property(path name old_value new_value) if !new_value fn() make_difference path "remove_property" ,name[]; fn() if || !old_value != new_value old_value fn() make_difference path "set_property" ,name,new_value[]; fn() nil; ; ; fn compare_properties(path old_properties new_properties) let properties + old_properties new_properties; let differences map keys properties fn(name) compare_property path name .name old_properties .name new_properties ;; filter differences fn(difference) difference ; ; fn compare_nodes(old_node new_node index path) if !old_node fn() , make_difference path "create" ,new_node[] []; fn() if !new_node fn() , make_difference path "remove" ,index[] []; fn() if is_different old_node new_node fn() , make_difference path "replace" ,index,new_node[] []; fn() if == "hash" type old_node fn() let child_path + path , ."type"old_node []; let properties_differences compare_properties child_path ."properties"old_node ."properties"new_node ; let children_pairs zip_longest ."children"old_node ."children"new_node fn(node_i node_ii) ,node_i,node_ii[] ; ; let children_differences let result reduce {} children_pairs fn(result children_pair) let index ?? ."index"result 0; let differences compare_nodes .0 children_pair .1 children_pair index child_path ; #"differences" + ?? ."differences"result [] differences #"index" ++ index {}; ; ?? ."differences"result [] ; + properties_differences children_differences ; fn() []; ; ; ; ; #"make_node" fn(kind properties children) make_node kind properties children ; #"compare_nodes" fn(old_node new_node index path) compare_nodes old_node new_node index path ; {}

I minified the JavaScript version using the Google Closure Compiler ( JavaScript version ), the version in my language - manually.

Results:

Parameter	Javascript	My language
Volume of the full version	2398 B	2827 B
The volume of the minified version	794 B	872 B
Volume savings	66.89%	69.16%

Results

In order for my idea to make sense, it was necessary to exceed JavaScript in compression several times (after all, a place for the interpreter itself is needed). And the result was even greater.

Thus the experiment ended in failure. The ideas that I laid the foundation for the language did not bring the expected benefits.

Repository

The interpreter source code (implemented in Python), the standard library and examples, as well as the documentation are available in the repository under the MIT license.

Source: https://habr.com/ru/post/351772/

All Articles