I'm very happy to announce that I have finished my first compiler for a programming language!
Malcc is an incremental AOT Lisp compiler written in C. I'll briefly describe its many years of development and what I learned along the way. Alternative article title: "How to write a compiler in ten years or less."
(There is a TL;DR at the end if you don't care about the backstory.)
Compiler demo
tim ~/pp/malcc master 0 → ./malcc
Mal [malcc]
user> (println "hello world")
hello world
nil
user> (+ 1 2)
3
user> (def! fib2 (fn* (n) (let* (f (fn* (n1 n2 c) (if (= c n) n2 (f n2 (+ n1 n2) (+ c 1))))) (f 0 1 1))))
<lambda>
user> (fib2 25)
75025
user> ^D%
tim ~/pp/malcc master 0 → ./malcc examples/hello.mal
hello world
tim ~/pp/malcc master 0 → ./malcc --compile examples/hello.mal hello
gcc -g -I ./tinycc -I . -o hello hello.c ./reader.c ./printer.c ./hashmap.c ./types.c ./util.c ./env.c ./core.c ./tinycc/libtcc.a -ledit -lgc -lpcre -ldl
tim ~/pp/malcc master 0 → ./hello
hello world
tim ~/pp/malcc master 0 →
Successful failures
For almost ten years I dreamed of writing a compiler. I have always been fascinated by how programming languages work, especially compilers. But I thought of a compiler as dark magic, and I believed that building one from scratch was impossible for a mere mortal like me.
Still, I kept trying and learned a lot along the way!
First, the interpreter
In 2011, I began work on a simple interpreter for a made-up language called Airball. You can judge from the name how little confidence I had that it would work. It was a fairly simple Ruby program that parsed the code and walked the abstract syntax tree (AST). Once the interpreter actually worked, I renamed it Lydia and rewrote it in C to make it faster.

I remember the Lydia syntax seemed very clever to me! I still enjoy its simplicity.
Although Lydia was far from being a real compiler, it inspired me to keep experimenting. However, I was still tormented by questions about how to make a compiler work: what should I compile to? Do I need to learn assembly?
Second, the bytecode compiler and interpreter
As the next step, in 2014 I started working on scheme-vm, a virtual machine for Scheme written in Ruby. I thought that a virtual machine with its own stack and bytecode would be a transitional stage between an AST-walking interpreter and a full compiler. And since Scheme is formally defined, I would not have to invent anything.
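For readers who have not seen one, here is a minimal sketch in C of what "a virtual machine with its own stack and bytecode" means. The opcodes and the vm_run function are hypothetical names made up for this illustration; they are not taken from scheme-vm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration; not scheme-vm's instruction set. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run a bytecode program and return the value left on top of the stack. */
long vm_run(const long *code)
{
    long stack[64];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:                 /* the operand follows the opcode */
            assert(sp < 64);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                  /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                  /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```

In this sketch, the expression (* (+ 1 2) 4) would be compiled to the sequence OP_PUSH 1, OP_PUSH 2, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT, and vm_run would return 12 for it.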
I tinkered with scheme-vm for more than three years and learned a lot about compilation. In the end, I realized that I could not finish the project. The code had become a real mess, and there was no end in sight. Without a mentor or experience, I felt like I was wandering in the dark. As it turned out, a language specification is not the same thing as a guide. Lesson learned!
By the end of 2017, I had shelved scheme-vm in search of something better.
Meeting with Mal

Sometime in 2018, I came across Mal, a Clojure-like Lisp interpreter.
Mal was created by Joel Martin as a learning tool. Since then, more than 75 implementations in different languages have been developed! When I looked at these implementations, I realized how helpful they would be: if I got stuck, I could look for hints in the Ruby or Python version. Finally, someone was speaking my language!
I also figured that if I could write an interpreter for Mal, I could repeat the same steps and make a compiler for Mal.
Mal interpreter in Rust
First, I started developing an interpreter by following the step-by-step guide. At the time I was actively learning Rust (I'll save that for another article), so I wrote my own Mal implementation in Rust: mal-rust. See more about this experiment here.
It was an absolute pleasure! I don't know how to thank or praise Joel enough for creating such an excellent guide to Mal. Each step is described in detail, and there are flowcharts, pseudocode and tests! Everything a developer needs to create a programming language from start to finish.
By the end of the tutorial, I was able to run the Mal implementation of Mal, written in Mal, on top of my Rust implementation (two levels deep, whoa). When it worked for the first time, I jumped out of my chair with excitement!
Mal compiler in C
As soon as I had proved that mal-rust was viable, I immediately began researching how to write a compiler. Compile to assembly? Could I compile directly to machine code?
I saw an x86 assembler written in Ruby. It intrigued me, but the thought of working with assembly put me off.
At some point, I came across this comment on Hacker News, where the Tiny C Compiler was described as a "compilation backend". It seemed like a great idea!
TinyCC has a test file showing how to use libtcc to compile C code from within a C program. That was my starting point for "hello world".
Going back to Mal's step-by-step tutorial and dusting off my knowledge of C, I was able to write the Mal compiler in a couple of months of free evenings and weekends. It was a pleasure.

If you are used to test-driven development, you will appreciate having a test suite ready in advance. The tests lead you to a working implementation.
I can't say much about the process except to repeat: the guide to Mal is a real treasure. At every step I knew exactly what to do!
Difficulties
Looking back, here are some of the difficulties in writing the Mal compiler where I had to do some tinkering:
- Macros must be compiled on the fly and be ready to run at compile time. This is a bit mind-bending.
- It is necessary to provide an “environment” (a tree of hashes / associative arrays / dictionaries with variables and their values) both for the compiler code and for the final code of the compiled program. This allows macros to be defined at compile time.
- Since the environment is available at compile time, Malcc initially caught undefined-variable errors (access to a variable that was not defined) at compile time, and in a couple of places this violated the test suite's expectations. In the end, to pass the tests, I turned this feature off. It would be great to add it back as an optional compiler flag, because it can catch many errors early.
- I generated the C code by writing into three strings:
  - top: top-level code; functions go here
  - decl: declaration and initialization of variables used in the body
  - body: the body, where the main work is performed
- I spent a whole day wondering whether I should write my own garbage collector, but decided to leave that exercise for later. The Boehm-Demers-Weiser garbage collection library is easy to plug in and is available on many platforms.
- It is important to review the code that your compiler writes. Whenever the DEBUG environment variable is set, the compiler prints the generated C code, so you can look it over for errors.
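The three-string layout and the DEBUG dump from the list above can be sketched roughly as follows. This is only an illustration under assumptions: Buf, Gen, buf_append and gen_finish are hypothetical names, and wrapping decl and body in main is a simplification; Malcc's actual helpers and output layout may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; Malcc's real helpers may differ. */
typedef struct { char *data; size_t len; } Buf;

static void buf_append(Buf *b, const char *s)
{
    assert(b && s);
    size_t n = strlen(s);
    char *p = realloc(b->data, b->len + n + 1);
    if (!p) { perror("realloc"); exit(1); }
    memcpy(p + b->len, s, n + 1);   /* copy including the terminator */
    b->data = p;
    b->len += n;
}

/* The three sections of generated code: each can grow independently,
   so a function or a declaration can be added while the body is
   already half written. */
typedef struct { Buf top, decl, body; } Gen;

/* Stitch the sections into one C source string; the caller frees it. */
char *gen_finish(const Gen *g)
{
    Buf out = {0};
    buf_append(&out, g->top.data ? g->top.data : "");
    buf_append(&out, "int main(void) {\n");
    buf_append(&out, g->decl.data ? g->decl.data : "");
    buf_append(&out, g->body.data ? g->body.data : "");
    buf_append(&out, "  return 0;\n}\n");
    if (getenv("DEBUG"))            /* dump the generated C for review */
        fputs(out.data, stderr);
    return out.data;
}
```

The point of keeping three buffers is ordering: C wants declarations before use, but a compiler often discovers that it needs a new variable or helper function only in the middle of emitting the body.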
What I would do differently
- Writing C code while trying to keep the indentation intact was not easy; I wouldn't mind automating it. I suspect that some compilers emit ugly code and then a separate library "prettifies" it before output. This needs looking into!
- Appending to strings while generating code is a bit messy. I could consider building an AST and then converting it to the final string of C code. That should tidy up the code and bring some order.
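As a rough idea of what the AST approach could look like, here is a minimal sketch in C. It is not how Malcc works today; Node, Out and emit_expr are hypothetical names, and the tree covers only numbers, addition and multiplication.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression node; a real compiler needs many more kinds. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                    /* used by NODE_NUM */
    const struct Node *lhs, *rhs;  /* used by NODE_ADD and NODE_MUL */
} Node;

/* Fixed-size output buffer, kept simple for the sketch. */
typedef struct { char *buf; size_t cap, len; } Out;

static void put(Out *o, const char *s)
{
    size_t n = strlen(s);
    assert(o->len + n < o->cap);   /* fail loudly instead of growing */
    memcpy(o->buf + o->len, s, n + 1);
    o->len += n;
}

/* Walk the tree once and write it out as a C expression. */
void emit_expr(const Node *n, Out *o)
{
    char num[32];
    switch (n->kind) {
    case NODE_NUM:
        snprintf(num, sizeof num, "%ld", n->value);
        put(o, num);
        break;
    case NODE_ADD:
    case NODE_MUL:
        put(o, "(");
        emit_expr(n->lhs, o);
        put(o, n->kind == NODE_ADD ? " + " : " * ");
        emit_expr(n->rhs, o);
        put(o, ")");
        break;
    }
}
```

With this shape, all string handling lives in one place (emit_expr), and the rest of the compiler only builds and rearranges nodes. A tree for (* (+ 1 2) 4) would come out as the C expression ((1 + 2) * 4).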
Now, some advice
I like that the compiler took almost a decade. No, really. Each step along the way is a pleasant memory, as I gradually became a better programmer.
But this does not mean that I am "finished." There are still hundreds of techniques and tools I need to learn before I feel like a real compiler writer. But I can confidently say: "I did it."
Here is the whole process in condensed form, how to make your own Lisp compiler:
- Choose a language you are comfortable with. You don't want to learn a new language and how to write another new language at the same time.
- Follow the Mal guide to write an interpreter.
- Rejoice!
- Follow the guide again, but instead of executing the code, write code that executes the code. (This is not just "refactoring" the existing interpreter. You need to start from scratch, although copy-paste is not forbidden.)
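The difference in the last step, "executing the code" versus "writing code that executes the code", can be shown with a deliberately tiny sketch in C. Both functions handle the same form, (+ a b ...), and both names (interpret_add, compile_add) are hypothetical. The generated variable name result is also just an assumption for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Interpreter: evaluate (+ a b ...) right now and return the answer. */
long interpret_add(const long *args, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += args[i];
    return sum;
}

/* Append text to a NUL-terminated buffer; returns -1 if it does not fit. */
static int append(char *out, size_t cap, const char *text)
{
    size_t used = strlen(out), n = strlen(text);
    if (used + n + 1 > cap) return -1;
    memcpy(out + used, text, n + 1);
    return 0;
}

/* Compiler: do not add anything; emit C source that will add later. */
int compile_add(const long *args, size_t n, char *out, size_t cap)
{
    char term[32];
    assert(out && cap > 0);
    out[0] = '\0';
    if (append(out, cap, "long result = 0") != 0) return -1;
    for (size_t i = 0; i < n; i++) {
        snprintf(term, sizeof term, " + %ld", args[i]);
        if (append(out, cap, term) != 0) return -1;
    }
    return append(out, cap, ";");
}
```

The two functions have the same loop and the same structure, which is why following the guide a second time works: each step of the interpreter has a compiling twin that produces text instead of a value.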
I believe that this method can be used with any programming language that is compiled into an executable file. For example, you can:
- Write a Mal interpreter in Go.
- Modify your code to:
  - build a string of Go code and write it to a file;
  - compile the resulting file with go build.
Ideally, you would drive the Go compiler as a library, but this is also a way to make a compiler!
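The same write-a-file-then-build loop, sketched in C rather than Go to match the rest of this article. The function names are hypothetical, and the sketch assumes a compiler is reachable on the PATH under the name cc. It also assumes file names without spaces or shell metacharacters, since the command goes through the shell.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write generated source text to path; returns 0 on success. */
int write_source(const char *path, const char *source)
{
    assert(path && source);
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    size_t n = strlen(source);
    int ok = fwrite(source, 1, n, f) == n;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Build the shell command that compiles src into exe.
   Assumption: the compiler is called "cc" on this machine. */
int build_command(char *cmd, size_t cap, const char *src, const char *exe)
{
    int w = snprintf(cmd, cap, "cc -o %s %s", exe, src);
    return (w < 0 || (size_t)w >= cap) ? -1 : 0;
}

/* Write the source, then hand it to the system C compiler. */
int compile_to_executable(const char *source, const char *src, const char *exe)
{
    char cmd[512];
    if (write_source(src, source) != 0) return -1;
    if (build_command(cmd, sizeof cmd, src, exe) != 0) return -1;
    return system(cmd) == 0 ? 0 : -1;
}
```

Calling compile_to_executable with a generated program, "hello.c" and "hello" would write hello.c and run cc -o hello hello.c. Shelling out like this is the crude version; linking the compiler in as a library avoids the temporary file and the shell.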
With Mal and your own ingenuity, you can do all this. If I could do it, so can you!