Since June 2009 I have been developing a C interpreter. (I already mentioned it
in the article about function calls.)
Many constructs are already implemented: loops, selection statements, expression evaluation, function calls (both user-declared and standard), includes, and more.
The program is named CPrompt, by analogy with a command-line prompt (D:>, root@comp#, ...).
A little about how the interpreter works
Running the interpreter on a source file is easy:
$ cprompt /path/to/file.c
Interpretation is divided into 3 stages:
1) Preprocessing
2) Building the execution tree
3) Calling main()
The first stage, preprocessing, is important but boring: it handles includes, defines, comments, and so on.
The output is clean code, without directives or superfluous text.
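To give a feel for one of the preprocessing tasks, here is a minimal sketch of comment removal. It is not taken from the CPrompt sources: the function name strip_comments and the state machine are assumptions made for illustration, and it ignores directives, line continuations, and trigraphs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not CPrompt's actual code: removes // and /* */
// comments while leaving string and character literals untouched.
std::string strip_comments(const std::string& src) {
    enum State { Code, LineComment, BlockComment, Literal };
    State state = Code;
    char quote = 0;  // the quote character that opened the current literal
    std::string out;

    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (state) {
        case Code:
            if (c == '/' && next == '/') {
                state = LineComment;
                ++i;  // skip the second '/'
            } else if (c == '/' && next == '*') {
                state = BlockComment;
                ++i;  // skip the '*'
            } else {
                if (c == '"' || c == '\'') {
                    state = Literal;
                    quote = c;
                }
                out += c;
            }
            break;
        case LineComment:
            if (c == '\n') {  // the comment ends, the line break stays
                state = Code;
                out += c;
            }
            break;
        case BlockComment:
            if (c == '*' && next == '/') {
                state = Code;
                ++i;         // skip the closing '/'
                out += ' ';  // a comment counts as one space
            }
            break;
        case Literal:
            out += c;
            if (c == '\\' && next != '\0') {
                out += next;  // copy the escaped character verbatim
                ++i;
            } else if (c == quote) {
                state = Code;
            }
            break;
        }
    }
    return out;
}
```

For example, strip_comments("a/*c*/b") yields "a b", while a // sequence inside a string literal is left alone.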
The second stage is building the execution tree. The clean code is turned into a graph (a tree) whose root is an APPLICATION object holding information about the file being run. Its subtrees hold the functions, and each token becomes an element of the tree. Every element has its own "type": a number that indicates its purpose.
Functions come in 2 kinds: user-declared and "external". The latter are, for example, standard functions from libc or functions from other shared libraries.
The output is a tree like this:
...
Tree was built:
t0; APPLICATION ;;;
| t0; FUNCTIONS ;;;
| | t14; double; cos ;;
| | t14; double; floor ;;
| | t14; int; isdigit ;;
...
| | t2; double; round ;;
| | | t5; floor (value + 0.5) ;;;
| | t14; int; printf ;;
| | t2; int; main ;;
| | | t1; int; l; round (7.2);
...
(This is part of the log that the interpreter prints.)
t0, t2, t14 are the element types:
0 - no type
1 - expression (assignment counts as one of the operators, along with +, -, *, ..., but with a different precedence)
2 - function
5 - "return"
14 - "external function"
and others, for different actions.
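The idea of typed tree elements can be sketched roughly as follows. The type numbers come from the list above, but everything else is an assumption made for illustration: the enumerator names, the Node structure, and the dump function are hypothetical, and the output format is simplified compared with the real log.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Type codes as listed in the article; the enumerator names are made up.
enum NodeType {
    kNone = 0,
    kExpression = 1,
    kFunction = 2,
    kReturn = 5,
    kExternalFunction = 14
};

// Hypothetical tree element: a numeric type, some text, and child elements.
struct Node {
    NodeType type;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;

    Node(NodeType t, std::string s) : type(t), text(std::move(s)) {}

    // Appends a child element and returns a pointer to it.
    Node* add(NodeType t, std::string s) {
        children.emplace_back(new Node(t, std::move(s)));
        return children.back().get();
    }
};

// Renders the tree one element per line, indenting each level with "| ".
std::string dump(const Node& node, int depth = 0) {
    std::string out;
    for (int i = 0; i < depth; ++i) out += "| ";
    out += "t" + std::to_string(static_cast<int>(node.type)) + "; " +
           node.text + "\n";
    for (const auto& child : node.children) out += dump(*child, depth + 1);
    return out;
}
```

Building an APPLICATION root with a FUNCTIONS child and adding a kFunction element "double; round" with a kReturn child gives a dump shaped like the log fragment above.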
Function arguments are stored in separate structures, so they do not appear in the visible part of the tree; each function's branch holds pointers to them.
All program execution is logged in great detail: if you pass the --dbg option, the interpreter writes a lot of information to standard output.
If you look at the log, you will notice that execution is divided into 5 steps, not 3 as I said at the beginning. The two extra steps are parsing of the source text: once before preprocessing, on the raw text, and once after, on the processed text.
The only extension beyond the standard that I allowed myself is the "outside" construct, which, since static linking is impossible, allows declaring functions from external libraries. For now, only functions from libraries that the interpreter itself was compiled with can be declared, but importing from shared libraries is planned.
For example,
outside cdecl: double cos (double x);
outside cdecl: double floor (double x);
where cdecl is the calling convention.
More on this in the article on function calls.
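One plausible way to resolve such declarations while only linked-in libraries are supported is a table from names to function pointers. The sketch below is a guess, not CPrompt's actual mechanism: builtin_table, call_outside, and the wrapper functions are hypothetical names, only the double f(double) signature is covered, and the calling convention is not modeled at all.

```cpp
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical: this sketch handles only functions of type double f(double).
using UnaryMath = double (*)(double);

// Thin wrappers around library functions linked into the binary.
double call_cos(double x) { return std::cos(x); }
double call_floor(double x) { return std::floor(x); }

// Table of the functions that an "outside" declaration may refer to.
const std::map<std::string, UnaryMath>& builtin_table() {
    static const std::map<std::string, UnaryMath> table = {
        {"cos", call_cos},
        {"floor", call_floor},
    };
    return table;
}

// Looks up a name declared with "outside" and calls it;
// throws if the function was not linked into the interpreter.
double call_outside(const std::string& name, double arg) {
    auto it = builtin_table().find(name);
    if (it == builtin_table().end())
        throw std::invalid_argument("unknown outside function: " + name);
    return it->second(arg);
}

// Mirrors the user-declared round from the log: floor(value + 0.5).
double interpreted_round(double value) {
    return call_outside("floor", value + 0.5);
}
```

With this table, interpreted_round(7.2) goes through the external floor and returns 7.0, while an unknown name is rejected instead of being called.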
The product is written in C++.
View or download the source code (Google Code).
I hope that a working, usable release will appear soon, as every day I get closer to the standard. The ultimate goal is full C99 support.
Example
#include <cmath>
int main(int argc, char* argv[])
{
    int l = round(7.2);
}
For this code, the log looks
like this.
There is a lot of noise in it, but you can make sense of it if you wish.
What uses do you see for a C interpreter? Where could this product be applied?
UPD: Moved to Programming Languages.
UPD2: Added a makefile to the repository.
To build:
svn checkout cprompt.googlecode.com/svn/trunk/ cprompt
cd cprompt
make
and run ./cprompt /path/to/file