LISP interpreter on pure C

I love C for its simplicity and efficiency. However, it cannot be called flexible and extensible. There is another simple language with unprecedented flexibility and extensibility, but losing C in resource efficiency. I mean LISP. Both languages were used for system programming and have a long and glorious history.

For quite a long time I have been pondering the idea of uniting the approaches of both these languages. Its essence lies in the implementation of a programming language based on LISP, which solves the same tasks as C: ensuring a high degree of control over the equipment (including low-level memory access). In practice, this will be a LISP macro system that generates a binary code. The possibilities of LISP for preprocessing source code, it seems to me, will provide unprecedented flexibility in comparison with C preprocessor or C ++ templates, while maintaining the original simplicity of the language. This will give the opportunity on the basis of such DSL to build on new extensions that increase the speed and convenience of development. In particular, the LISP system itself can be implemented in this language.

Writing a compiler requires a code generator, and ultimately an assembler. Therefore, practical research should begin with the implementation of the assembler (for a subset of instructions of the target processor). I was interested in minimizing any dependencies on specific technologies, programming languages and operating system. Therefore, I decided to implement from scratch on C the simplest interpreter of an improvised LISP dialect, and also write to it a system of macro extensions, which make it convenient to code on a subset of the x86 assembler. The culmination of my efforts should be the resulting boot image that displays "Hello world!" In real processor mode.
')
At the moment, I have implemented a working interpreter (file int.c, about 900 lines of C code), as well as a set of basic functions and macros (file lib.l, about 100 lines of LISP code). Those who are interested in the principles of LISP-code execution, as well as the details of the implementation of the interpreter, please under the cat.

The basic unit of LISP computing is a dotted pair. In a classic McCarthy lisp, the point pair and the symbol are the only two data types. In practical implementations, this set has to be expanded at least by numbers. In addition, strings and arrays are also added to the base types (the first is a variation of the second). In the pursuit of simplification, there is the temptation to view strings as a list of numbers, but I deliberately abandoned this idea as a dramatically restricting language in the real world. I decided to use double as a container for numbers.

So, we have the following basic data types: dotted pair, character, number, string (pascal style, since this will allow storing arbitrary binary data in an unchanged form). Since I am working on an interpreter (and not on a compiler), I could restrict myself to this set (functions and macros can be represented by ordinary s-expressions), but for convenience of implementation 4 additional types were added: function, macro, built-in function and built-in macro. So, we have the following structure for the s-expression:

struct l_env; typedef struct s_expr *(*built_in) (struct s_expr*, struct l_env*, struct file_pos*); struct s_expr { enum { DOTTED_PAIR, STRING, SYMBOL, NUMBER, FUNCTION, MACRO, BUILT_IN_FUNCTION, BUILT_IN_MACRO } type; union { struct { struct s_expr *first, *rest; } pair; struct { char *ptr; size_t size; } string; struct { struct s_expr *expr; struct l_env *env; } function; char *symbol; double number; built_in built_in; } u; }; struct l_env { char *symbol; struct s_expr *expr; struct l_env *next; };

This structure is not optimal in terms of saving resources and performance, but I did not set myself the goal of building an effective implementation. First of all, the simplicity and conciseness of the code was important. I even had to give up memory management: all memory is allocated without being freed. In fact, for my practical problem this solution is valid: the interpreter will not work for a long time: its task is only to translate the code into binary form.

As you can see from the above code, the function (and macro) refers to the l_env structure. It is a basic element of the lexical environment, stored in a list. Of course, this is inefficient, since it involves sequential access to the characters. But this is a very simple and convenient structure for supporting local variables: they are added to the head of the list, while global ones are added to the tail. It is very easy to get rid of local variables (when exiting a function or from a let block), simply ignoring the front part of this list. The function’s own lexical environment allows the implementation of closures.

On the basis of the above structure of the s-expression, it is easy to build a function to calculate it:

 struct s_expr *eval_s_expr (struct s_expr *expr, struct l_env *env, struct file_pos *pos) { struct s_expr *first, *in = expr; struct l_env *benv; trace_put("%s -> ...", in, NULL, env); if (expr) if (expr->type == SYMBOL) if (find_symbol(expr->u.symbol, &env)) expr = env->expr; else error(UNBOUND_SYMBOL_MSG, pos, expr->u.symbol); else if (expr->type == DOTTED_PAIR) { first = eval_s_expr(expr->u.pair.first, env, pos); if (!first || first->type == DOTTED_PAIR || first->type == SYMBOL || first->type == STRING || first->type == NUMBER) error(NON_FUNC_MACRO_MSG, pos, s_expr_string(first, env)); expr = first->type == FUNCTION || first->type == BUILT_IN_FUNCTION ? map_eval(expr->u.pair.rest, env, pos) : expr->u.pair.rest; if (first->type == FUNCTION || first->type == MACRO) { assert(first->u.function.expr->type == DOTTED_PAIR); benv = apply_args(first->u.function.expr->u.pair.first, expr, first->u.function.env, pos); expr = eval_list(first->u.function.expr->u.pair.rest, benv, pos); if (first->type == MACRO) { trace_put("%s ~> %s", in, expr, env); expr = eval_s_expr(expr, env, pos); } } else expr = first->u.built_in(expr, env, pos); } trace_put("%s -> %s", in, expr, env); return expr; }

If the computable expression is a symbol, we simply look for its value in the current lexical environment (find_symbol). If the function call: we first calculate the actual parameters using the current lexical environment (map_eval), then bind them to the symbols of formal parameters (apply_args) already in the lexical environment of the function itself. Then we sequentially calculate the elements of the body based on the resulting lexical environment, returning the value of the last expression (eval_list). To call a macro, the calculation order is somewhat different. Actual parameters are not calculated, but transmitted in unchanged form. In addition, the resulting macro expression (macro substitution) is subjected to additional calculation. Numbers, strings, functions, and macros are computed into themselves.

The full text of the int.c file

 #include <assert.h> #include <ctype.h> #include <float.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #define LINE_COMMENT_CHAR ';' #define BLOCK_COMMENT_CHAR1 ';' #define BLOCK_COMMENT_CHAR2 '|' #define LIST_OPEN_BRACE_CHAR '(' #define LIST_CLOSE_BRACE_CHAR ')' #define LIST_DOT_CHAR '.' #define STRING_DELIMITER_CHAR '"' #define STRING_ESCAPE_CHAR '\\' #define NUMBER_PREFIX_CHAR '$' #define NUMBER_FORMAT_HEX_CHAR 'h' #define NUMBER_FORMAT_OCT_CHAR 'o' #define NIL_SYMBOL_STR "_" #define TRUE_SYMBOL_STR "t" #define TRACE_SYMBOL_STR "trace" #define CAR_SYMBOL_STR "@" #define CDR_SYMBOL_STR "%" #define CONS_SYMBOL_STR "^" #define IF_SYMBOL_STR "?" #define LAMBDA_SYMBOL_STR "!" #define MACRO_SYMBOL_STR "#" #define SETQ_SYMBOL_STR "=" #define QUOTE_SYMBOL_STR "'" #define PLUS_SYMBOL_STR "+" #define GREATER_SYMBOL_STR ">" #define FUNCTION_STR_FORMAT "<!%s>" #define MACRO_STR_FORMAT "<#%s>" #define OUT_OF_MEMORY_MSG "out of memory" #define UNEXPECTED_EOF_MSG "unexpected end of file" #define BAD_SYNTAX_MSG "bad syntax" #define NON_FUNC_MACRO_MSG "expression %s is neither a function nor a macro" #define NON_NONEMPTY_LIST_MSG "expression %s is not a nonempty list" #define NON_LIST_MSG "expression %s is not a proper list" #define UNBOUND_SYMBOL_MSG "unbound symbol %s" #define BAD_FORMAL_ARGS_MSG "bad formal arguments %s" #define BAD_ACTUAL_ARGS_MSG "bad actual arguments %s" #define STRING_OVERFLOW_MSG "string size overflow" #define NUMBER_LENGTH_MAX 32 #define SYMBOL_LENGTH_MAX 32 #define STRING_LENGTH_MAX 256 #define S_EXPR_LENGTH_MAX 1024 struct file_pos { char *filename; int line, chr; }; struct l_env; typedef struct s_expr *(*built_in) (struct s_expr*, struct l_env*, struct file_pos*); struct s_expr { enum { DOTTED_PAIR, STRING, SYMBOL, NUMBER, FUNCTION, MACRO, BUILT_IN_FUNCTION, BUILT_IN_MACRO } type; union { struct { struct s_expr *first, *rest; } pair; struct { char *ptr; size_t size; } string; struct { struct s_expr *expr; struct l_env *env; } function; char *symbol; double number; built_in built_in; } u; }; void error(char *message, struct file_pos *pos, char *expr) { if (pos) printf("Error at %s:%d:%d: ", pos->filename, pos->line, pos->chr); else printf("Error: "); if (expr) printf(message, expr); else printf("%s", message); puts(""); exit(1); } void *alloc_mem(size_t size) { void *ptr = malloc(size); if (!ptr) error(OUT_OF_MEMORY_MSG, NULL, NULL); return ptr; } struct s_expr *true_ () { static struct s_expr *expr = NULL; if (!expr) { expr = alloc_mem(sizeof(*expr)); expr->type = SYMBOL; expr->u.symbol = TRUE_SYMBOL_STR; } return expr; } int get_char(FILE *file, struct file_pos *pos) { int chr = getc(file); if (chr == '\n') pos->line++, pos->chr = 1; else if (chr != EOF) pos->chr++; return chr; } int next_char(FILE *file) { int chr = getc(file); ungetc(chr, file); return chr; } int get_significant_char (FILE *file, struct file_pos *pos) { enum { NO_COMMENT, LINE_COMMENT, BLOCK_COMMENT } state = NO_COMMENT; int chr; while (1) { chr = get_char(file, pos); if (state == NO_COMMENT) { if (chr == BLOCK_COMMENT_CHAR1 && next_char(file) == BLOCK_COMMENT_CHAR2) { get_char(file, pos); state = BLOCK_COMMENT; continue; } if (chr == LINE_COMMENT_CHAR) state = LINE_COMMENT; else if (chr != ' ' && chr != '\t' && chr != '\r' && chr != '\n') return chr; } else if (state == BLOCK_COMMENT) { if (chr == BLOCK_COMMENT_CHAR2 && next_char(file) == BLOCK_COMMENT_CHAR1) { get_char(file, pos); state = NO_COMMENT; } else if (chr == EOF) error(UNEXPECTED_EOF_MSG, pos, NULL); } else if (state == LINE_COMMENT) { if (chr == '\n') state = NO_COMMENT; else if (chr == EOF) return EOF; } } } struct s_expr *parse_s_expr (FILE*, struct file_pos*); struct s_expr *parse_list (FILE *file, struct file_pos *pos) { struct s_expr *expr, *rest; int chr; chr = get_significant_char(file, pos); if (chr == LIST_CLOSE_BRACE_CHAR) return NULL; ungetc(chr, file); pos->chr--; expr = alloc_mem(sizeof(*expr)); expr->type = DOTTED_PAIR; expr->u.pair.first = parse_s_expr(file, pos); rest = expr; while (1) { chr = get_significant_char(file, pos); if (chr == LIST_DOT_CHAR) { rest->u.pair.rest = parse_s_expr(file, pos); if (get_significant_char(file, pos) != LIST_CLOSE_BRACE_CHAR) error(BAD_SYNTAX_MSG, pos, NULL); break; } else if (chr == LIST_CLOSE_BRACE_CHAR) { rest->u.pair.rest = NULL; break; } else if (chr == EOF) error(UNEXPECTED_EOF_MSG, pos, NULL); else { ungetc(chr, file); pos->chr--; rest->u.pair.rest = alloc_mem(sizeof(*expr)); rest->u.pair.rest->type = DOTTED_PAIR; rest->u.pair.rest->u.pair.first = parse_s_expr(file, pos); rest = rest->u.pair.rest; } } return expr; } void read_escape_seq (FILE *file, struct file_pos *pos, char *buf) { /* TODO: add support for escape sequences */ } struct s_expr *parse_string (FILE *file, struct file_pos *pos) { char buf[STRING_LENGTH_MAX]; struct s_expr *expr; int chr, i = 0; while (i < STRING_LENGTH_MAX) { chr = get_char(file, pos); if (chr == STRING_ESCAPE_CHAR) read_escape_seq(file, pos, buf); else if (chr == STRING_DELIMITER_CHAR) break; else if (chr == EOF) error(UNEXPECTED_EOF_MSG, pos, NULL); else buf[i++] = chr; } expr = alloc_mem(sizeof(*expr)); expr->type = STRING; expr->u.string.ptr = i ? alloc_mem(i) : NULL; memcpy(expr->u.string.ptr, buf, i); expr->u.string.size = i; return expr; } void read_double (FILE *file, struct file_pos *pos, char *buf) { int chr, i = 0, point = -1; chr = next_char(file); if (chr == '+' || chr == '-') { get_char(file, pos); buf[i++] = chr; } while (i < NUMBER_LENGTH_MAX && isdigit(next_char(file))) buf[i++] = get_char(file, pos); if (i < NUMBER_LENGTH_MAX && next_char(file) == '.') buf[point = i++] = get_char(file, pos); while (i < NUMBER_LENGTH_MAX && isdigit(next_char(file))) buf[i++] = get_char(file, pos); chr = next_char(file); if (i < NUMBER_LENGTH_MAX && (chr == 'e' || chr == 'E') && i > point + 1) { get_char(file, pos); buf[i++] = chr; chr = next_char(file); if (i < NUMBER_LENGTH_MAX && (chr == '+' || chr == '-')) { get_char(file, pos); buf[i++] = chr; } while (i < NUMBER_LENGTH_MAX && isdigit(next_char(file))) buf[i++] = get_char(file, pos); } if (i && i < NUMBER_LENGTH_MAX) buf[i] = 0; else error(BAD_SYNTAX_MSG, pos, NULL); } void read_int (FILE *file, struct file_pos *pos, int base, char *buf) { int chr, i = 0; assert(base == 8 || base == 16); for (; i < NUMBER_LENGTH_MAX; get_char(file, pos)) { chr = next_char(file); if ((base == 16 && isxdigit(chr)) || (chr >= '0' && chr <= '7')) buf[i++] = chr; else break; } if (i && i < NUMBER_LENGTH_MAX) buf[i] = 0; else error(BAD_SYNTAX_MSG, pos, NULL); } struct s_expr *parse_number (FILE *file, struct file_pos *pos) { char buf[NUMBER_LENGTH_MAX + 1]; struct s_expr *expr; int inum; expr = alloc_mem(sizeof(*expr)); expr->type = NUMBER; switch (next_char(file)) { case NUMBER_FORMAT_HEX_CHAR: get_char(file, pos); read_int(file, pos, 16, buf); sscanf(buf, "%x", &inum); expr->u.number = inum; break; case NUMBER_FORMAT_OCT_CHAR: get_char(file, pos); read_int(file, pos, 8, buf); sscanf(buf, "%o", &inum); expr->u.number = inum; break; default: read_double(file, pos, buf); sscanf(buf, "%lf", &expr->u.number); break; } return expr; } struct s_expr *parse_symbol (FILE *file, struct file_pos *pos) { char buf[NUMBER_LENGTH_MAX + 1]; struct s_expr *expr; int chr, chr2, i = 0; for (; i < NUMBER_LENGTH_MAX; get_char(file, pos)) { chr = next_char(file); if (chr == BLOCK_COMMENT_CHAR1) { get_char(file, pos); chr2 = next_char(file); ungetc(chr2, file); pos->chr--; if (chr2 == BLOCK_COMMENT_CHAR2) break; } if (chr >= '!' && chr <= '~' && chr != LINE_COMMENT_CHAR && chr != LIST_OPEN_BRACE_CHAR && chr != LIST_CLOSE_BRACE_CHAR && chr != LIST_DOT_CHAR && chr != STRING_DELIMITER_CHAR && chr != NUMBER_PREFIX_CHAR) buf[i++] = chr; else break; } if (i && i < SYMBOL_LENGTH_MAX) buf[i] = 0; else error(BAD_SYNTAX_MSG, pos, NULL); if(!strcmp(buf, NIL_SYMBOL_STR)) return NULL; if(!strcmp(buf, TRUE_SYMBOL_STR)) return true_(); expr = alloc_mem(sizeof(*expr)); expr->type = SYMBOL; expr->u.symbol = alloc_mem(i + 1); strcpy(expr->u.symbol, buf); return expr; } struct s_expr *parse_s_expr (FILE *file, struct file_pos *pos) { struct s_expr *expr; int chr; chr = get_significant_char(file, pos); switch (chr) { case EOF: return NULL; case LIST_OPEN_BRACE_CHAR: expr = parse_list(file, pos); break; case STRING_DELIMITER_CHAR: expr = parse_string(file, pos); break; case NUMBER_PREFIX_CHAR: expr = parse_number(file, pos); break; default: ungetc(chr, file); pos->chr--; expr = parse_symbol(file, pos); break; } return expr; } struct l_env { char *symbol; struct s_expr *expr; struct l_env *next; }; static int do_trace = 0; char *s_expr_string (struct s_expr*, struct l_env*); void trace_put (char *format, struct s_expr *expr1, struct s_expr *expr2, struct l_env *env) { if (do_trace) { printf("Trace: "); printf(format, s_expr_string(expr1, env), s_expr_string(expr2, env)); puts(""); } } struct l_env *add_symbol (char *symbol, struct s_expr *expr, struct l_env *env, int append) { struct l_env *new_env; new_env = alloc_mem(sizeof(*new_env)); new_env->symbol = symbol, new_env->expr = expr; if (append) env->next = new_env, new_env->next = NULL; else new_env->next = env; return new_env; } struct l_env * add_built_in (int macro, char *symbol, built_in bi, struct l_env *env) { struct s_expr *expr = alloc_mem(sizeof(*expr)); expr->type = macro ? BUILT_IN_MACRO : BUILT_IN_FUNCTION; expr->u.built_in = bi; return add_symbol(symbol, expr, env, 0); } int find_symbol (char *symbol, struct l_env **env) { struct l_env *next = *env; for (; next; *env = next, next = next->next) if (!strcmp(symbol, next->symbol)) { *env = next; return 1; } return 0; } char *str_cat (char *dest, size_t dest_size, char *src) { if (strlen(src) > dest_size - 1 - strlen(dest)) error(STRING_OVERFLOW_MSG, NULL, NULL); return strcat(dest, src); } char *list_string (struct s_expr *list, struct l_env *env) { char buf[S_EXPR_LENGTH_MAX + 1] = { LIST_OPEN_BRACE_CHAR, 0 }; char psep[] = { ' ', LIST_DOT_CHAR, ' ', 0 }; char cbrc[] = { LIST_CLOSE_BRACE_CHAR, 0 }; for (; list && list->type == DOTTED_PAIR; list = list->u.pair.rest) { if (buf[1]) str_cat(buf, S_EXPR_LENGTH_MAX + 1, " "); str_cat(buf, S_EXPR_LENGTH_MAX + 1, s_expr_string(list->u.pair.first, env)); } if (list) str_cat(str_cat(buf, S_EXPR_LENGTH_MAX + 1, psep), S_EXPR_LENGTH_MAX + 1, s_expr_string(list, env)); str_cat(buf, S_EXPR_LENGTH_MAX + 1, cbrc); return strcpy(alloc_mem(strlen(buf) + 1), buf); } char *string_string (char *ptr, size_t size) { char *str = alloc_mem(size + 3); str[0] = str[size + 1] = '"'; memcpy(str + 1, ptr, size); str[size + 2] = 0; return str; } char *number_string (double number) { char *str = alloc_mem(NUMBER_LENGTH_MAX + 2); str[0] = NUMBER_PREFIX_CHAR; sprintf(str + 1, "%g", number); return str; } char *function_string (struct s_expr *expr, int macro, struct l_env *env) { char *str; for (; env; env = env->next) if (env->expr == expr) break; str = alloc_mem((macro ? sizeof(MACRO_STR_FORMAT) : sizeof(FUNCTION_STR_FORMAT)) + (env ? strlen(env->symbol) : 0) - 1); sprintf(str, macro ? MACRO_STR_FORMAT : FUNCTION_STR_FORMAT, env ? env->symbol : ""); return str; } char *s_expr_string (struct s_expr *expr, struct l_env *env) { if (!expr) return NIL_SYMBOL_STR; switch (expr->type) { case DOTTED_PAIR: return list_string(expr, env); case STRING: return string_string(expr->u.string.ptr, expr->u.string.size); case SYMBOL: return expr->u.symbol; case NUMBER: return number_string(expr->u.number); case FUNCTION: case BUILT_IN_FUNCTION: return function_string(expr, 0, env); case MACRO: case BUILT_IN_MACRO: return function_string(expr, 1, env); default: assert(0); return NULL; } } int proper_listp (struct s_expr *expr) { while (expr && expr->type == DOTTED_PAIR) expr = expr->u.pair.rest; return expr == NULL; } struct s_expr *search_symbol(struct s_expr *list, char *symbol) { for (; list && list->type == DOTTED_PAIR; list = list->u.pair.rest) { assert(list->u.pair.first->type == SYMBOL); if (!strcmp(list->u.pair.first->u.symbol, symbol)) return list; } return NULL; } void check_fargs (struct s_expr *fargs, struct l_env *env, struct file_pos *pos) { struct s_expr *rest = fargs; if (rest && rest->type == DOTTED_PAIR && !rest->u.pair.first && rest->u.pair.rest->type == SYMBOL) return; for (; rest && rest->type == DOTTED_PAIR; rest = rest->u.pair.rest) if (!rest->u.pair.first || rest->u.pair.first->type != SYMBOL || search_symbol(fargs, rest->u.pair.first->u.symbol) != rest) error(BAD_FORMAL_ARGS_MSG, pos, s_expr_string(fargs, env)); if (rest && (rest->type != SYMBOL || search_symbol(fargs, rest->u.symbol))) error(BAD_FORMAL_ARGS_MSG, pos, s_expr_string(fargs, env)); } void check_aargs (struct s_expr *args, int count, int va, struct l_env *env, struct file_pos *pos) { struct s_expr *rest = args; for (; count && rest && rest->type == DOTTED_PAIR; count--) rest = rest->u.pair.rest; if (count || (!va && rest) || !proper_listp(rest)) error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env)); } struct s_expr *eval_list (struct s_expr*, struct l_env*, struct file_pos*); struct s_expr *eval_s_expr (struct s_expr*, struct l_env*, struct file_pos*); #define ARG1(args) args->u.pair.first #define ARG2(args) args->u.pair.rest->u.pair.first #define ARG3(args) args->u.pair.rest->u.pair.rest->u.pair.first struct s_expr *trace (struct s_expr *args, struct l_env *env, struct file_pos *pos) { struct s_expr *expr; do_trace = 1; expr = eval_list(args, env, pos); do_trace = 0; return expr; } struct s_expr *quote (struct s_expr *args, struct l_env *env, struct file_pos *pos) { check_aargs(args, 1, 0, env, pos); return ARG1(args); } struct s_expr *car (struct s_expr *args, struct l_env *env, struct file_pos *pos) { check_aargs(args, 1, 0, env, pos); if (ARG1(args) && ARG1(args)->type != DOTTED_PAIR) error(NON_LIST_MSG, pos, s_expr_string(ARG1(args), env)); return ARG1(args) ? ARG1(args)->u.pair.first : NULL; } struct s_expr *cdr (struct s_expr *args, struct l_env *env, struct file_pos *pos) { check_aargs(args, 1, 0, env, pos); if (ARG1(args) && ARG1(args)->type != DOTTED_PAIR) error(NON_LIST_MSG, pos, s_expr_string(ARG1(args), env)); return ARG1(args) ? ARG1(args)->u.pair.rest : NULL; } struct s_expr *cons (struct s_expr *args, struct l_env *env, struct file_pos *pos) { struct s_expr *expr; check_aargs(args, 2, 0, env, pos); expr = alloc_mem(sizeof(*expr)); expr->type = DOTTED_PAIR; expr->u.pair.first = ARG1(args); expr->u.pair.rest = ARG2(args); return expr; } struct s_expr *if_ (struct s_expr *args, struct l_env *env, struct file_pos *pos) { check_aargs(args, 3, 0, env, pos); return eval_s_expr(ARG1(args), env, pos) ? eval_s_expr(ARG2(args), env, pos) : eval_s_expr(ARG3(args), env, pos); } struct s_expr *function (struct s_expr *args, struct l_env *env, struct file_pos *pos, int macro) { struct s_expr *expr; check_aargs(args, 1, 1, env, pos); check_fargs(ARG1(args), env, pos); expr = alloc_mem(sizeof(*expr)); expr->type = macro ? MACRO : FUNCTION; expr->u.function.expr = args; expr->u.function.env = env; return expr; } struct s_expr *lambda (struct s_expr *args, struct l_env *env, struct file_pos *pos) { return function(args, env, pos, 0); } struct s_expr *macro (struct s_expr *args, struct l_env *env, struct file_pos *pos) { return function(args, env, pos, 1); } struct s_expr *setq (struct s_expr *args, struct l_env *env, struct file_pos *pos) { struct s_expr *rest = args, *expr = NULL; struct l_env *senv; while (rest && rest->type == DOTTED_PAIR) { if (ARG1(rest) && ARG1(rest)->type == SYMBOL && rest->u.pair.rest && rest->u.pair.rest->type == DOTTED_PAIR) { expr = eval_s_expr(ARG2(rest), env, pos), senv = env; if (find_symbol(ARG1(rest)->u.symbol, &senv)) { trace_put("%s => %s [assign]", expr, ARG1(rest), env); senv->expr = expr; } else { trace_put("%s => %s [global]", expr, ARG1(rest), env); add_symbol(ARG1(rest)->u.symbol, expr, senv, 1); } } else error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env)); rest = rest->u.pair.rest->u.pair.rest; } if (rest) error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env)); return expr; } struct s_expr *plus (struct s_expr *args, struct l_env *env, struct file_pos *pos) { struct s_expr *rest = args; double sum = 0; while (rest && rest->type == DOTTED_PAIR && ARG1(rest)->type == NUMBER) sum += ARG1(rest)->u.number, rest = rest->u.pair.rest; if (rest) error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env)); rest = alloc_mem(sizeof(*rest)); rest->type = NUMBER; rest->u.number = sum; return rest; } struct s_expr *greater (struct s_expr *args, struct l_env *env, struct file_pos *pos) { struct s_expr *rest = args, *num; double prev = DBL_MAX; while (rest && rest->type == DOTTED_PAIR) { num = eval_s_expr(ARG1(rest), env, pos); if (!num || num->type != NUMBER) error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env)); if (prev - num->u.number < DBL_EPSILON) return NULL; prev = num->u.number, rest = rest->u.pair.rest; } if (rest) error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env)); return true_(); } struct l_env *create_env () { struct l_env *env = NULL; env = add_built_in(1, TRACE_SYMBOL_STR, trace, env); env = add_built_in(1, QUOTE_SYMBOL_STR, quote, env); env = add_built_in(0, CAR_SYMBOL_STR, car, env); env = add_built_in(0, CDR_SYMBOL_STR, cdr, env); env = add_built_in(0, CONS_SYMBOL_STR, cons, env); env = add_built_in(1, IF_SYMBOL_STR, if_, env); env = add_built_in(1, LAMBDA_SYMBOL_STR, lambda, env); env = add_built_in(1, MACRO_SYMBOL_STR, macro, env); env = add_built_in(1, SETQ_SYMBOL_STR, setq, env); env = add_built_in(0, PLUS_SYMBOL_STR, plus, env); env = add_built_in(1, GREATER_SYMBOL_STR, greater, env); return env; } struct s_expr *map_eval (struct s_expr *list, struct l_env *env, struct file_pos *pos) { struct s_expr *expr = NULL, *rest; while (list) { if (list->type != DOTTED_PAIR) error(NON_LIST_MSG, pos, s_expr_string(list, env)); if (expr) { rest->u.pair.rest = alloc_mem(sizeof(*expr)); rest = rest->u.pair.rest; } else expr = rest = alloc_mem(sizeof(*expr)); rest->type = DOTTED_PAIR; rest->u.pair.first = eval_s_expr(list->u.pair.first, env, pos); list = list->u.pair.rest; } if (expr) rest->u.pair.rest = NULL; return expr; } struct l_env *apply_args (struct s_expr *fargs, struct s_expr *aargs, struct l_env *env, struct file_pos *pos) { struct s_expr *rest = aargs; if (!fargs || fargs->u.pair.first) while (fargs && fargs->type == DOTTED_PAIR) { if (!rest || rest->type != DOTTED_PAIR) error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(aargs, env)); assert(fargs->u.pair.first->type == SYMBOL); trace_put("%s => %s [local]", rest->u.pair.first, fargs->u.pair.first, env); env = add_symbol(fargs->u.pair.first->u.symbol, rest->u.pair.first, env, 0); fargs = fargs->u.pair.rest, rest = rest->u.pair.rest; } else fargs = fargs->u.pair.rest; if (fargs) { assert(fargs->type == SYMBOL); if (rest && !proper_listp(rest)) error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(aargs, env)); trace_put("%s => %s [local]", rest, fargs, env); env = add_symbol(fargs->u.symbol, rest, env, 0); } else if (rest) error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(aargs, env)); return env; } struct s_expr *eval_list (struct s_expr *list, struct l_env *env, struct file_pos *pos) { struct s_expr *expr = NULL, *rest = list; for (; rest && rest->type == DOTTED_PAIR; rest = rest->u.pair.rest) expr = eval_s_expr(rest->u.pair.first, env, pos); if (rest) error(NON_LIST_MSG, pos, s_expr_string(list, env)); return expr; } struct s_expr *eval_s_expr (struct s_expr *expr, struct l_env *env, struct file_pos *pos) { struct s_expr *first, *in = expr; struct l_env *benv; trace_put("%s -> ...", in, NULL, env); if (expr) if (expr->type == SYMBOL) if (find_symbol(expr->u.symbol, &env)) expr = env->expr; else error(UNBOUND_SYMBOL_MSG, pos, expr->u.symbol); else if (expr->type == DOTTED_PAIR) { first = eval_s_expr(expr->u.pair.first, env, pos); if (!first || first->type == DOTTED_PAIR || first->type == SYMBOL || first->type == STRING || first->type == NUMBER) error(NON_FUNC_MACRO_MSG, pos, s_expr_string(first, env)); expr = first->type == FUNCTION || first->type == BUILT_IN_FUNCTION ? map_eval(expr->u.pair.rest, env, pos) : expr->u.pair.rest; if (first->type == FUNCTION || first->type == MACRO) { assert(first->u.function.expr->type == DOTTED_PAIR); benv = apply_args(first->u.function.expr->u.pair.first, expr, first->u.function.env, pos); expr = eval_list(first->u.function.expr->u.pair.rest, benv, pos); if (first->type == MACRO) { trace_put("%s ~> %s", in, expr, env); expr = eval_s_expr(expr, env, pos); } } else expr = first->u.built_in(expr, env, pos); } trace_put("%s -> %s", in, expr, env); return expr; } struct s_expr *eval_file (char *filename, struct l_env *env) { struct file_pos pos, prev_pos; struct s_expr *expr; FILE *file; int chr; file = fopen(filename, "r"); if (!file) { printf("Failed to open file '%s'\n", filename); exit(1); } pos.filename = filename, pos.line = pos.chr = 1; expr = NULL; while (1) { chr = get_significant_char(file, &pos); if (chr == EOF) break; ungetc(chr, file); pos.chr--, prev_pos = pos; expr = eval_s_expr(parse_s_expr(file, &pos), env, &prev_pos); } fclose(file); return expr; } int main (int argc, char *argv[]) { struct l_env *env; if (argc != 2) { puts("Usage: int source"); exit(1); } env = create_env(); puts(s_expr_string(eval_file(argv[1], env), env)); return 0; }

I decided to introduce more concise names for basic and arbitrary functions and macros. In the classic LISP (and, especially, in Common Lisp), the verbosity of the basic primitives strains me a little. On the one hand, I did not want to complicate the parser, because quote and backquote syntax is not supported by it, only the bracket notation. On the other hand, he sought to compensate for excessive stabilities by the wide use of special characters for conciseness. To some, this will seem like a very controversial decision.

I tried to select names according to their associative line:

_ - replaces nil
! - replaces lambda
# - likewise ! but declares a nameless macro
? - replaces if with the obligatory third parameter
^ - replace cons
@ - replaces car
% - replaces cdr
= - replace setq

Accordingly, the names of the derived functions and macros largely became derived from the base names:

!! - replaces defun
## - replaces defmacro
^^ - replace list
@% - replaces cadr
%% - replaces cddr
: - replaces let for one variable
:: - replaces let without redundant brackets
& - replaces and
| - replaces or

Now consider the derived definitions. We first define the basic abbreviations:

 (= @% (! (list) (@ (% list)))) ; cadr (= %% (! (list) (% (% list)))) ; cddr (= ^^ (! (_ . elts) elts)) ; list (= ## (# (name fargs . body) ; defmacro (^^ = name (^ # (^ fargs body))))) (## !! (name fargs . body) ; defun (^^ = name (^ ! (^ fargs body))))

Pay attention to the dot notation of the list of formal arguments. The character after the point captures the remaining actual parameters. The case when all arguments are optional is described by a special notation: (_. Rest-args). Next, we define a classic map and two paired list splits:

 (!! map (func list) (? list (^ (func (@ list)) (map func (% list))) _)) (!! pairs1 (list) ; (abcd) -> ((ab) (bc) (cd)) (? (% list) (^ (^^ (@ list) (@% list)) (pairs1 (% list))) _)) (!! pairs2 (list) ; (abcd) -> ((ab) (cd)) (? list (^ (^^ (@ list) (@% list)) (pairs2 (%% list))) _))

We define two options for let:

 (## : (name value . body) ; simplified let (^^ (^ ! (^ (^^ name) body)) value)) (## :: (vars . body) ; let without redundant braces (= vars (pairs2 vars)) (^ (^ ! (^ (map @ vars) body)) (map @% vars)))

Classic reverse and left convolution:

 (!! reverse (list) (: reverse+ _ (!! reverse+ (list rlist) (? list (reverse+ (% list) (^ (@ list) rlist)) rlist)) (reverse+ list _))) (!! fold (list func last) ; (fold (' (ab)) fl) <=> (fa (fbl)) (? list (func (@ list) (fold (% list) func last)) last))

Now if-based logical operators:

 (= t (' t)) ; true constant (!! ~ (bool) (? bool _ t)) ; not (## & (_ . bools) ; and (: and (! (bool1 bool2) (^^ ? bool1 (^^ ? bool2 t _) _)) (fold bools and t))) (## | (_ . bools) ; or (: or (! (bool1 bool2) (^^ ? bool1 t (^^ ? bool2 t _))) (fold bools or _)))

And, finally, comparison operators based on embedded> (greater):

 (: defcmp (! (cmp) (# (_ . nums) (: cmp+ (! (pair bool) (^^ & (cmp (@ pair) (@% pair)) bool)) (fold (pairs1 nums) cmp+ t)))) (= == (defcmp (! (num1 num2) (^^ & (^^ ~ (^^ > num1 num2)) (^^ ~ (^^ > num2 num1)))))) (= >= (defcmp (! (num1 num2) (^^ ~ (^^ > num2 num1)))))) (## < (_ . nums) (^ > (reverse nums))) (## <= (_ . nums) (^ >= (reverse nums)))

Notice that the last block of definitions explicitly uses a closure.

Full lib.l test

 ;| Formal argument list notation: ([{arg1 [arg2 [arg3 ...]] | _} [. args]]) Number notation: ${double | ooctal | hhex} ; $4 $-2.2e3 $o376 $h7EF Built-in symbols: _ ; nil Built-in functions: @ (list) ; car % (list) ; cdr ^ (first rest) ; cons + (_ . nums) Built-in macros: trace (_ . body) ' (expr) ? (cond texpr fexpr) ; if with mandatory fexpr ! (args . body) ; lambda # (args . body) ; creates anonymous macro > (_ . nums) |; (= @% (! (list) (@ (% list)))) ; cadr (= %% (! (list) (% (% list)))) ; cddr (= ^^ (! (_ . elts) elts)) ; list (= ## (# (name fargs . body) ; defmacro (^^ = name (^ # (^ fargs body))))) (## !! (name fargs . body) ; defun (^^ = name (^ ! (^ fargs body)))) (!! map (func list) (? list (^ (func (@ list)) (map func (% list))) _)) (!! pairs1 (list) ; (abcd) -> ((ab) (bc) (cd)) (? (% list) (^ (^^ (@ list) (@% list)) (pairs1 (% list))) _)) (!! pairs2 (list) ; (abcd) -> ((ab) (cd)) (? list (^ (^^ (@ list) (@% list)) (pairs2 (%% list))) _)) (## : (name value . body) ; simplified let (^^ (^ ! (^ (^^ name) body)) value)) (## :: (vars . body) ; let without redundant braces (= vars (pairs2 vars)) (^ (^ ! (^ (map @ vars) body)) (map @% vars))) (!! reverse (list) (: reverse+ _ (!! reverse+ (list rlist) (? list (reverse+ (% list) (^ (@ list) rlist)) rlist)) (reverse+ list _))) (!! fold (list func last) ; (fold (' (ab)) fl) <=> (fa (fbl)) (? list (func (@ list) (fold (% list) func last)) last)) (= t (' t)) ; true constant (!! ~ (bool) (? bool _ t)) ; not (## & (_ . bools) ; and (: and (! (bool1 bool2) (^^ ? bool1 (^^ ? bool2 t _) _)) (fold bools and t))) (## | (_ . bools) ; or (: or (! (bool1 bool2) (^^ ? bool1 t (^^ ? bool2 t _))) (fold bools or _))) (: defcmp (! (cmp) (# (_ . nums) (: cmp+ (! (pair bool) (^^ & (cmp (@ pair) (@% pair)) bool)) (fold (pairs1 nums) cmp+ t)))) (= == (defcmp (! (num1 num2) (^^ & (^^ ~ (^^ > num1 num2)) (^^ ~ (^^ > num2 num1)))))) (= >= (defcmp (! (num1 num2) (^^ ~ (^^ > num2 num1)))))) (## < (_ . nums) (^ > (reverse nums))) (## <= (_ . nums) (^ >= (reverse nums)))

So, the interpreter and most of the primitives are ready to write DSL assembler. I will try…

Source: https://habr.com/ru/post/150805/

All Articles

LISP interpreter on pure C

More articles: