Emacs Lisp Alternative

Have you ever looked for an alternative to Emacs Lisp? Let's try to add another programming language to Emacs.

In this article:

The potential benefits of being able to extend Emacs in Go;
How Go and Emacs Lisp could interact;
Some details of the implementation of the described transcompiler;

The article may be of interest to Emacs users, as well as to those who care about the countless implementations of countless programming languages.

At the very end of the article there is a link to a work-in-progress project that converts Go to Emacs Lisp.

Choosing Emacs Go

Like any other programming language, Emacs Lisp has a number of "flaws", which we will politely call "design tradeoffs". It is rather difficult to call certain properties of a programming language objectively "better" or "worse", because there will almost always be defenders of the opposite position. As users of programming languages, we can choose the language whose compromises are easier for us to accept given our tasks or personal preferences. The key point is the choice.

Suppose we chose Go. How would we use Go to interact with the editor?

Your options:

1. Use Emacs modules to run Go functions. Inspiration can be drawn from the go-emacs project.
2. Find (or write) a Go interpreter, embed it in Emacs by patching it or via the same C modules, and then call eval from the editor.
3. Translate Go to Emacs Lisp bytecode.

There may be more ways, but none of them comes closer to "native" Lisp than (3): at the execution level, the translated code runs on the same virtual machine as regular Emacs Lisp.

This, in turn, means that:

Emacs Lisp code will be able to call translated Go code;
FFI comes for free: calling a function already defined in Emacs from Go is as efficient as possible;
Converted packages are easy to distribute (they are in Emacs's native format);

If this is your first time hearing about Emacs bytecode, read the article by Chris Wellons.

Why Go?

In place of Go, there could potentially be any other programming language.

There are several reasons that make this choice more reasonable. The main ones are:

Language compiler inside the standard library;
Laconic specification;
Modest runtime;
Tooling;

There are also properties that could be arguments in favor of this choice, but for me personally they mattered less:

Go is a fairly popular language with C-like syntax (this is no Scheme);
Static typing;

Language compiler inside the standard library

The go/* packages make it much easier to write Go tools.

There is no need to write a parser, a typechecker and the other delights of a compiler frontend. In about 20 lines of code we can get the AST and type information for a whole package.

The documentation is mostly good, and for go/types it is, in my opinion, exemplary.

Initially this was a killer argument for me. The task seemed 90% solved thanks to this secret weapon: "all that remains is to convert the AST to Emacs bytecode".
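The following sketch shows roughly what those ~20 lines look like, using only the standard go/parser, go/types and go/importer packages. The helper name loadPackage and the sample source are illustrative, not taken from goism.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/importer"
	"go/parser"
	"go/token"
	"go/types"
)

// loadPackage parses one source file and type-checks it as a package,
// returning the package, its AST and the type information go/types recorded.
func loadPackage(filename, src string) (*types.Package, *ast.File, *types.Info, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, nil, nil, err
	}
	info := &types.Info{
		Types: make(map[ast.Expr]types.TypeAndValue), // type (and constant value) of expressions
		Defs:  make(map[*ast.Ident]types.Object),     // identifier -> object it declares
		Uses:  make(map[*ast.Ident]types.Object),     // identifier -> object it refers to
	}
	// The importer is consulted only when the source imports other packages.
	conf := types.Config{Importer: importer.Default()}
	pkg, err := conf.Check(file.Name.Name, fset, []*ast.File{file}, info)
	if err != nil {
		return nil, nil, nil, err
	}
	return pkg, file, info, nil
}

func main() {
	src := `package example

var Answer = 6 * 7

func Add(a, b int) int { return a + b }
`
	pkg, file, info, err := loadPackage("example.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println("declarations:", len(file.Decls))
	for _, name := range pkg.Scope().Names() { // Names() is sorted
		fmt.Printf("%s: %s\n", name, pkg.Scope().Lookup(name).Type())
	}
	fmt.Println("typed expressions:", len(info.Types))
}
```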

A fly in the ointment

In practice, certain nuances caused difficulties.

First of all, the API is tangled, and similar entities are duplicated across packages, sometimes even under the same name. Often the same thing can be done through either go/ast or go/types, and it is not uncommon to end up mixing entities from both packages (yes, including those with the same name).

Working with imports and declarations was surprisingly inconvenient (oh, that ast.GenDecl).

Many of the solutions to these problems look like dirty hacks. A detailed description of these hacks is perhaps material for a separate article (especially since I have not checked how much has been written on this topic on the Internet; it has probably all been chewed over more than once).

Laconic specification

Creating an implementation that conforms to the specification more or less (~80%) is quite a feasible task for one person. The Go specification is easy to read; it can be mastered in an evening.

Specification features:

Some points allow more than one interpretation. That is the price of brevity;
In addition to the specification, there is also Effective Go. Without it, blank spots remain in the specification;

Modest runtime

The more features in the language that are implemented in the runtime library, the more difficult it will be to transcompile.

If we throw goroutines and channels overboard, at least temporarily, a compact core remains that can be fully implemented in terms of Emacs Lisp without loss of performance.

Tooling

It is damn nice when many familiar features work in several of your favorite editors, and uniformly at that.

For Go, many of the features that are usually reinvented separately for each IDE are implemented as standalone utilities. The simplest example, known to every Go developer, is gofmt. This is largely made possible by the go/types package mentioned above, go/build and other packages from the go/* group.

Sublime Text, Emacs, Visual Studio Code: choose any of them, install the plugin(s), and enjoy refactoring with gorename, plenty of linters and automatic imports. And auto-completion... exceeds company-elisp in many respects.

Refactoring and maintaining an Emacs Lisp project beyond 1000 lines of code already feels uncomfortable to me personally. Programming Emacs in Go is much more convenient than in Emacs Lisp.

What does Emacs Go look like

Let's fantasize about what Go might look like for Emacs. How comfortable and functional would it be?

Bridge types

Before talking directly about calling Lisp functions, we need to think about the bridge that connects two programming languages running on the same computational model.

With primitive types like int, float64, string and others, everything is more or less simple: both Go and Emacs Lisp have these types.

Of interest are slices, symbols (symbol) and fixed-width unsigned integer types (uintX).

Slices are implemented in the runtime (for example, in Emacs Lisp itself);
Symbols are represented as an opaque type;
Unsigned arithmetic with deterministic overflow is emulated;

The type that can express "an object of arbitrary type" returned by an Emacs Lisp function is called lisp.Object. Its definition is discussed below, in the lisp.Object section.
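For the unsigned types, one plausible emulation (an assumption, not necessarily what goism does) is to keep uintN values as ordinary Lisp integers and truncate every result with a bit mask, which in Emacs Lisp would be a logand. The Go model below uses int64 in place of Lisp integers, so its results can be compared with Go's native uint8 and uint32 arithmetic.

```go
package main

import "fmt"

// Hypothetical helpers: Lisp integers are modeled as int64, and every uintN
// operation is followed by a mask (logand in Emacs Lisp) to emulate wraparound.
const (
	maskUint8  int64 = 0xFF
	maskUint16 int64 = 0xFFFF
	maskUint32 int64 = 0xFFFFFFFF
)

// wrap keeps only the low bits selected by mask; for negative x this relies
// on two's complement, so 0 - 1 becomes the maximum value of the type.
func wrap(x, mask int64) int64 {
	return x & mask
}

func addUint8(a, b int64) int64  { return wrap(a+b, maskUint8) }
func subUint8(a, b int64) int64  { return wrap(a-b, maskUint8) }
func addUint16(a, b int64) int64 { return wrap(a+b, maskUint16) }

// mulUint32 stays correct even if a*b overflows int64: the overflow only
// affects high bits, and the mask keeps the low 32.
func mulUint32(a, b int64) int64 { return wrap(a*b, maskUint32) }

func main() {
	fmt.Println(addUint8(250, 10))        // 4
	fmt.Println(subUint8(0, 1))           // 255
	fmt.Println(addUint16(0xFFFF, 2))     // 1
	fmt.Println(mulUint32(0xFFFFFFFF, 2)) // 4294967294
}
```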

Go slices

By analogy: in terms of their "interface", Go slices are like std::vector from C++, but with the ability to take a full-fledged subslice without copying the elements.

Let's start with the intuitive representation {data, len, cap}.

data will be a vector, len and cap will be numbers. To store the attributes, we choose an improper list, which saves some memory by not having the final nil:

(cons data (cons len cap))

Why a list, not a vector?

In short: the choice between a list and a vector is not particularly critical here, so one could just as well take a vector.

A disassembler (or a table of opcodes) gives a more detailed answer. Access to lists of 2-3 items is very efficient: car and cdr are single opcodes, while aref also needs its index pushed onto the stack. The closer an item is to the head of the list, the more noticeable the difference. The data attribute is used most often, so it comes first in the list.

With N = 4, one can assume that the list starts losing in efficiency when reading the last element, but the other three attributes are still accessed more efficiently. So even for objects with four attributes, I tend to believe that a list is a better structure than a vector.

Disclaimer: all of this holds for the Emacs virtual machine and its instruction set. Do not take it out of context.

The slice-get/slice-set operations will be very efficient. They are the same aref/aset, with one additional car instruction to extract the data attribute.

But what happens when we need a subslice?

In C, you could make data a pointer and shift it to the right place; addressing would stay the same, 0-based. In our case this is not possible, so we also have to store the offset:

(cons data (cons offset (cons len cap)))

Every slice-get/slice-set now has to add the offset to the index.

Compare the bytecode for the slice-get operation.

    ;; Plain vector
    <vector>    ;; [vector]
    <index>     ;; [vector index]
    aref        ;; [elem]

    ;; Slice without offset (no subslice support)
    <slice>     ;; [slice]
    car         ;; [data]
    <index>     ;; [data index]
    aref        ;; [elem]

    ;; Slice with subslice support
    <slice>     ;; [slice]
    dup         ;; [slice slice] (1)
    car         ;; [slice data]
    stack-ref 1 ;; [slice data slice]
    cdr         ;; [slice data slice.cdr]
    car         ;; [slice data offset]
    <index>     ;; [slice data offset index]
    plus        ;; [slice data real-index]
    aref        ;; [slice elem]
    stack-set 1 ;; [elem] (2)

    ;; (1) <slice> is needed twice (for data and for offset),
    ;;     so it is duplicated instead of being evaluated again.
    ;; (2) Replaces the no longer needed slice with the result,
    ;;     leaving only elem on the stack.

The <X> notation marks expressions that can be arbitrarily complex (from a plain stack-ref to a call with many arguments). The state of the data stack is shown to the right of the code.
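To make the offset-based layout concrete, here is a Go model of it, with a []any standing in for the Lisp vector. The goSlice type and the helper names are hypothetical; the point is only that subslicing shares data and adjusts offset, length and capacity, while every access pays one extra addition.

```go
package main

import "fmt"

// goSlice models the runtime layout (data offset len . cap); data stands in
// for the Lisp vector and is shared by all subslices.
type goSlice struct {
	data     []any
	offset   int
	length   int
	capacity int
}

// makeSlice is the model of make([]T, n).
func makeSlice(n int) *goSlice {
	return &goSlice{data: make([]any, n), length: n, capacity: n}
}

func checkIndex(s *goSlice, i int) {
	if i < 0 || i >= s.length {
		panic(fmt.Sprintf("index out of range [%d] with length %d", i, s.length))
	}
}

// sliceGet is one aref on data, with the offset added to the index first
// (the extra "plus" in the bytecode above).
func sliceGet(s *goSlice, i int) any {
	checkIndex(s, i)
	return s.data[s.offset+i]
}

func sliceSet(s *goSlice, i int, v any) {
	checkIndex(s, i)
	s.data[s.offset+i] = v
}

// subslice models s[low:high]: no elements are copied, only the offset,
// length and capacity change.
func subslice(s *goSlice, low, high int) *goSlice {
	if low < 0 || high < low || high > s.capacity {
		panic(fmt.Sprintf("slice bounds out of range [%d:%d]", low, high))
	}
	return &goSlice{data: s.data, offset: s.offset + low, length: high - low, capacity: s.capacity - low}
}

func main() {
	s := makeSlice(5)
	for i := 0; i < 5; i++ {
		sliceSet(s, i, i*10)
	}
	sub := subslice(s, 1, 4) // shares s.data, offset 1
	sliceSet(sub, 0, "changed")
	fmt.Println(sliceGet(s, 1), sliceGet(sub, 2), sub.length, sub.capacity) // changed 30 3 4
}
```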

Opaque types

There are types that we do not want to (or cannot) express as Go structures. These include lisp.Object, lisp.Symbol and lisp.Number.

For us, the main purpose of an opaque type is to prohibit arbitrary creation of objects through literals. Interface types with an unexported method do a great job here.

    type Symbol interface {
        symbol()
    }

    type Object interface {
        object()
        // ...
    }

    // Intern returns the canonical symbol with specified name.
    func Intern(name string) Symbol

The Intern function is treated specially by the compiler; in other words, it is an intrinsic.

Now we can be sure that these special types have exactly the API we want to give them, and not whatever else Go's rules would allow.
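A single-file sketch of the opaque-type pattern together with an interning table (named after Emacs's obarray). In the real design Symbol would live in the lisp package and Intern would be a compiler intrinsic; here symbolImpl and obarray are hypothetical stand-ins that let the code run, and a single file cannot demonstrate the package boundary that makes the type opaque. Interning is what makes plain == on symbols behave like eq in Lisp.

```go
package main

import "fmt"

// Symbol is opaque: outside its package, the unexported method prevents
// other types from implementing it.
type Symbol interface {
	symbol()
}

// symbolImpl is a hypothetical concrete representation for this sketch only.
type symbolImpl struct{ name string }

func (*symbolImpl) symbol() {}

// obarray maps names to their canonical symbols.
var obarray = map[string]*symbolImpl{}

// Intern returns the canonical symbol with the specified name, creating it
// on first use, so equal names always yield the identical pointer.
func Intern(name string) Symbol {
	if sym, ok := obarray[name]; ok {
		return sym
	}
	sym := &symbolImpl{name: name}
	obarray[name] = sym
	return sym
}

func main() {
	a, b := Intern("foo"), Intern("foo")
	fmt.Println(a == b)             // true: same canonical symbol
	fmt.Println(a == Intern("bar")) // false
}
```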

lisp.Object

If lisp.Object represents "any value", then why don't we use interface{} ?

Recall what interface{} is in Go: a structure that stores the dynamic type of an object plus the object itself, the "data".

This is not quite what I would like, because for Emacs this representation of "anything" is inefficient. lisp.Object is needed to store unboxed Emacs Lisp values, which can be passed to Lisp functions and received from them as results without conversion.

To get a value of a particular type out of lisp.Object, you can add extra methods to its interface.

    type Object interface {
        object()

        Int() int
        Float() float64
        String() string
        // ... etc.

        // Type checks:
        IsInt() bool
        // Access without panic:
        GetInt() (val int, ok bool) // "comma, ok"-style
        // ... and so on for the other types.
    }

Each call generates a type check. If lisp.Object holds a value of a type other than the requested one, the call must panic. Something like the reflect.Value API, isn't it?
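A runnable model of that accessor API, trimmed to the int methods. The value wrapper is a hypothetical stand-in: a real lisp.Object would hold the unboxed Lisp value directly, which is exactly why interface{} is avoided there.

```go
package main

import "fmt"

// Object is a cut-down model of lisp.Object; Float, String and the rest
// would follow the same pattern as the int accessors.
type Object interface {
	object()
	Int() int
	IsInt() bool
	GetInt() (val int, ok bool)
}

// value is a hypothetical stand-in for an unboxed Lisp value, used only so
// that the sketch runs as ordinary Go.
type value struct{ v any }

func (value) object() {}

// IsInt reports whether the stored value is an integer.
func (x value) IsInt() bool {
	_, ok := x.v.(int)
	return ok
}

// Int panics on a type mismatch, in the spirit of reflect.Value.Int.
func (x value) Int() int {
	n, ok := x.v.(int)
	if !ok {
		panic(fmt.Sprintf("lisp.Object: want int, have %T", x.v))
	}
	return n
}

// GetInt is the "comma, ok" variant: no panic, just ok == false.
func (x value) GetInt() (val int, ok bool) {
	val, ok = x.v.(int)
	return val, ok
}

func main() {
	var n Object = value{v: 42}
	var s Object = value{v: "hello"}
	fmt.Println(n.IsInt(), n.Int()) // true 42
	if _, ok := s.GetInt(); !ok {
		fmt.Println("s does not hold an int")
	}
}
```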

Emacs Lisp from Go

If the function's signature is unknown, the only option left is to accept a variable number of arguments of arbitrary type and return lisp.Object.

    pair := lisp.Call("cons", 1, "2")
    a, b := lisp.Call("car", pair), lisp.Call("cdr", pair)
    lisp.Call("insert", "Hello, Emacs!")
    sum := lisp.Call("+", 1, 2).Int()

Manually annotated functions can be called more conveniently:

    part := "c"
    lisp.Insert("Hello, Emacs!")     // returns nothing (void)
    s := lisp.Concat("a", "b", part) // returns string, takes ...string
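Both call styles can be mocked in plain Go to show how an annotated wrapper relates to the generic lisp.Call. In this sketch, lispFuncs is a hypothetical table that imitates two Emacs primitives so the code runs without Emacs; the []string to []any conversion in Concat is an artifact of the mock, since a transcompiler would presumably emit a direct call instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Object is a minimal stand-in for lisp.Object in this mock.
type Object struct{ v any }

func (o Object) Int() int       { return o.v.(int) }
func (o Object) String() string { return o.v.(string) }

// lispFuncs imitates a few Emacs primitives (hypothetical, for the mock only).
var lispFuncs = map[string]func(args ...any) any{
	"+": func(args ...any) any {
		sum := 0
		for _, a := range args {
			sum += a.(int)
		}
		return sum
	},
	"concat": func(args ...any) any {
		var b strings.Builder
		for _, a := range args {
			b.WriteString(a.(string))
		}
		return b.String()
	},
}

// Call is the untyped entry point: any Lisp function, any arguments.
func Call(name string, args ...any) Object {
	fn, ok := lispFuncs[name]
	if !ok {
		panic("void-function: " + name)
	}
	return Object{v: fn(args...)}
}

// Concat is what an annotated wrapper could look like: a typed Go signature
// on top of a plain call of the Lisp function.
func Concat(parts ...string) string {
	args := make([]any, len(parts))
	for i, p := range parts {
		args[i] = p
	}
	return Call("concat", args...).String()
}

func main() {
	sum := Call("+", 1, 2).Int()
	s := Concat("a", "b", "c")
	fmt.Println(sum, s) // 3 abc
}
```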

FFI DSL

The DSL for annotating functions can be written with macros.

    ;; Declare the Lisp functions exposed through the FFI.
    (ffi-declare
     (concat Concat (:string &parts) :string)
     (message Message (:string format :any &args) :string)
     (insert Insert (:any &args) :void)
     (+ IntAdd (:int &xs) :int))
    ;; Each entry: Lisp name, Go name, parameters, result type.

Such a macro should expand into Go function signatures. A comment directive must be left to preserve the information about which Lisp function should be called.

    // IntAdd - ... <documentation of + from Emacs>
    //$GO-ffi:+
    func IntAdd(xs ...int) int
    // ...

Documentation can be pulled from Emacs using the documentation function. We get functions with known arity without losing valuable docstrings.
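The expansion step itself is simple string generation. The sketch below (type and function names are hypothetical; the :float keyword is an assumption, since only :int, :string, :any and :void appear in the DSL above) turns one declaration into the commented Go signature, placing the docstring first when one is available.

```go
package main

import (
	"fmt"
	"strings"
)

// ffiParam and ffiDecl mirror one ffi-declare entry, e.g. (+ IntAdd (:int &xs) :int).
type ffiParam struct {
	Name     string
	Type     string // DSL keyword such as ":int"
	Variadic bool   // true for &-prefixed rest parameters like &xs
}

type ffiDecl struct {
	LispName string
	GoName   string
	Params   []ffiParam
	Result   string // ":void" means no result
	Doc      string // docstring fetched from Emacs, may be empty
}

// goTypes maps DSL keywords to Go types; :any becomes lisp.Object.
var goTypes = map[string]string{
	":int":    "int",
	":float":  "float64",
	":string": "string",
	":any":    "lisp.Object",
}

// render emits the doc comment, the //$GO-ffi: directive naming the Lisp
// function, and the Go signature without a body.
func render(d ffiDecl) string {
	var b strings.Builder
	if d.Doc != "" {
		fmt.Fprintf(&b, "// %s - %s\n", d.GoName, d.Doc)
	}
	fmt.Fprintf(&b, "//$GO-ffi:%s\n", d.LispName)
	params := make([]string, len(d.Params))
	for i, p := range d.Params {
		typ := goTypes[p.Type]
		if p.Variadic {
			typ = "..." + typ
		}
		params[i] = p.Name + " " + typ
	}
	fmt.Fprintf(&b, "func %s(%s)", d.GoName, strings.Join(params, ", "))
	if d.Result != ":void" {
		b.WriteString(" " + goTypes[d.Result])
	}
	return b.String()
}

func main() {
	fmt.Println(render(ffiDecl{
		LispName: "+",
		GoName:   "IntAdd",
		Params:   []ffiParam{{Name: "xs", Type: ":int", Variadic: true}},
		Result:   ":int",
		Doc:      "(docstring of + from Emacs)",
	}))
}
```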

Go from Emacs Lisp

The result of transcompilation is an Emacs Lisp package in which all symbols from Go have transformed names.

An identifier mapping scheme can be, for example:

    package "foo"     func "f"       => "$GO-foo.f"
    package "foo/bar" func "f"       => "$GO-foo/bar.f"
    package "foo"     func (typ) "m" => "$GO-foo.typ.m"
    package "foo"     var "v"        => "$GO-foo.v"

Accordingly, to call a function or use a variable, you need to know which Go package it belongs to (and its name, of course). The $GO prefix avoids conflicts with names already defined in Emacs.
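The mapping scheme above fits in one function. The name lispName and the empty-string convention for non-methods are assumptions of this sketch.

```go
package main

import "fmt"

// lispName builds the Emacs Lisp name for a package-level Go identifier.
// recv is the receiver type name for methods and "" for functions and vars.
func lispName(pkgPath, recv, name string) string {
	if recv != "" {
		return "$GO-" + pkgPath + "." + recv + "." + name
	}
	return "$GO-" + pkgPath + "." + name
}

func main() {
	fmt.Println(lispName("foo", "", "f"))     // $GO-foo.f
	fmt.Println(lispName("foo/bar", "", "f")) // $GO-foo/bar.f
	fmt.Println(lispName("foo", "typ", "m"))  // $GO-foo.typ.m
	fmt.Println(lispName("foo", "", "v"))     // $GO-foo.v
}
```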

Transcompilation subtleties

Bytecode or lapcode?

As the output format, you can choose among three options:

Emacs Lisp code (source-to-source compilation)
Bytecode
Lapcode (Lisp Assembly Program)

The first option loses out to the other two: Emacs Lisp has no return statement, so Go's return cannot be translated efficiently (an early exit has to be emulated, for example with catch/throw), and goto, which Go has, is also harder to implement.

The second and third options are almost equivalent in their capabilities.

Bytecode is the analogue of machine code, the lowest level;
Lapcode is an assembly language for a virtual machine with a stack architecture;

The Emacs compiler can optimize at the level of source code and at the level of the lapcode representation.

If we choose lapcode, we can additionally apply the low-level optimizations implemented by the Emacs developers.

Disadvantages of lapcode

The Lisp assembly program is the internal format (IR) of the Emacs compiler. There is even less documentation on it than on bytecode.

Writing in this "assembler" by hand is almost impossible because of the peculiarities of the optimizer, which can break your code.

I did not find an exact description of the instruction format. Trial and error helps here, as does reading the source code of the Emacs Lisp compiler (you will need nerves of steel).

Generated Code Performance

Go running inside the Emacs VM cannot be faster than Emacs Lisp.

Or can it?

Emacs Lisp has dynamic scoping for variables. If you look at "emacs/lisp/emacs-lisp/byte-opt.el", you will find many references to this feature of the language; because of it, some optimizations are either impossible or much more difficult.

There are no constants in Emacs Lisp: names declared with defconst are no more immutable than those defined with defvar. In Go, constants are substituted at the point of use, which makes it possible to fold more constant expressions.

Optimizing Go code is easier, so we can expect performance at least on par with ordinary Emacs Lisp code. Potentially, it could even be faster.
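go/types already performs this folding, so a transcompiler can read the final value instead of computing it. A sketch, assuming the value is taken from types.Info.Types (foldedInit is a hypothetical helper): for a constant expression, the recorded TypeAndValue carries its value, and for anything else Value is nil.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// foldedInit type-checks src and returns the constant value go/types
// computed for the initializer of package-level variable varName, or ""
// if that initializer is not a constant expression.
func foldedInit(src, varName string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "c.go", src, 0)
	if err != nil {
		return "", err
	}
	info := &types.Info{Types: make(map[ast.Expr]types.TypeAndValue)}
	if _, err := (&types.Config{}).Check("c", fset, []*ast.File{file}, info); err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.VAR {
			continue
		}
		for _, spec := range gen.Specs {
			vs := spec.(*ast.ValueSpec)
			for i, ident := range vs.Names {
				if ident.Name != varName || i >= len(vs.Values) {
					continue
				}
				if tv := info.Types[vs.Values[i]]; tv.Value != nil {
					return tv.Value.String(), nil
				}
			}
		}
	}
	return "", nil
}

func main() {
	src := `package c

const base = 20
const answer = base*2 + 2

var v = answer + 1
`
	val, err := foldedInit(src, "v")
	if err != nil {
		panic(err)
	}
	fmt.Println(val) // 43
}
```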

Implementation difficulties

Even without goroutines, there are Go features that have no obvious and/or optimal implementation within the Emacs VM.

The most interesting difficulty is pointers.

In the context of this task, we can distinguish two categories of values in Emacs Lisp:

Reference types (string, vector, list/cons)
Value types (integer and float)

For reference types, the problem is easier to solve.

Taking the address of an int or float variable requires handling more edge cases.

We should also remember that the == operator is defined for pointers, so the proposed solution must respect the identity of object addresses.

If we wrap the number in a cons when its address is taken, we break the semantics: the original variable will not change when the data stored in the cons changes.

If all numbers are created in boxed form (inside a cons) from the start, the number of allocations grows greatly.

Unboxing will require an additional car instruction every time a value is read.

Implementing pointers through cons has a significant flaw: if every &x creates a new cons, then &x != &x, because (eq (cons x nil) (cons x nil)) is always false.

Correct emulation of pointer semantics is an open question. I will be glad to hear your ideas on how to implement it.
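The trade-off can be modeled in Go with a struct standing in for a cons cell (cons and addrOfNaive are hypothetical names). Boxing on every &x breaks both identity and write-through; boxing the variable from the start fixes both, at the cost of extra allocations and an extra car on each read.

```go
package main

import "fmt"

// cons stands in for an Emacs cons cell used as a box for a number.
type cons struct {
	car int
	cdr any // always nil here, as in (cons x nil)
}

// addrOfNaive boxes the current value anew on every &x, the Go analogue of
// evaluating (cons x nil) each time.
func addrOfNaive(x int) *cons {
	return &cons{car: x}
}

func main() {
	x := 5
	p, q := addrOfNaive(x), addrOfNaive(x)
	fmt.Println(p == q) // false: &x != &x
	p.car = 10
	fmt.Println(x) // 5: writing through the "pointer" does not reach x

	// Boxing x from the start: every &x is the same cell and writes are
	// visible, but each read of x now needs an extra car.
	bx := &cons{car: 5}
	p2, q2 := bx, bx
	p2.car = 10
	fmt.Println(p2 == q2, bx.car) // true 10
}
```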

goism: Go inside Emacs

The goism project is a tool that produces near-optimal Emacs Lisp bytecode from Go packages.

The runtime library was originally written in Lisp, but recently it was completely rewritten in Go, which goism translates to lapcode.

emacs/rt is currently one of the largest packages written with goism.

At the moment, goism is not particularly friendly to the end user: some manual work is needed to build and configure it properly (the quick start guide should simplify this).

Why is the article written right now, and not when a more stable version is released? The answer is simple: there is no guarantee that such a version will ever be ready, and it could be refined for a very, very long time.

I would like to know whether this idea seems interesting and useful to members of the Habr community.

Source: https://habr.com/ru/post/331134/
