This is the first article in a series about 161eForth v0.5b; the series concludes here: habr.com/ru/post/452572
The eForth translator is now available on the domestic Electronics MK-161 calculator! On May 17, version v0.5b passed my own tests as well as the five author's tests TEST-TEST4. I have achieved what can be done alone, but I think this is only half the story. It is time to introduce the new tool to the community by opening the 161eForth code for public testing. I have a list of things to improve and places where stability needs work. Your suggestions and comments will be taken into account when finishing the work and releasing version 1.0.
Porting the latest version of eForth to the domestic platform meant overcoming two obstacles: the relatively low speed of an 8-bit machine that is programmed in its own input language, and the modest amount of available binary memory (see 2.4.1), only 4096 bytes. 161eForth reuses ready-made solutions originally prepared for Callisto, a next-generation input language for Russian PMKs: the technique of implementing a Forth machine on top of a decimal ALU and a "Harvard" architecture, the console drivers and the layout of the alphanumeric keyboard, and a software terminal built on them that works over the RS-232 serial port. In addition to the Electronics MK-161 and the 161eForth distribution, you may need a home-made keyboard overlay on which the keys are labeled with the letters of the Russian and English alphabets. The letters are arranged alphabetically, row by row, from left to right and from top to bottom.
Dr. Chen-Hanson Ting, author of the modern versions of eForth, emphasizes in his book [1] the importance of understanding the two components of Forth: the internal (“address”) interpreter, which allows the hardware to execute threaded Forth code, and the external (“text”) interpreter, which is responsible for the dialogue with a person.
In two articles I will discuss in detail the most radical solutions used to implement each of these two interpreters on the Electronics. Studying these solutions may be helpful and inspiring for anyone porting eForth to other devices with limited memory and performance. A basic familiarity with programmable microcalculators (PMK) and Forth will help in following the articles. I will explain the difficult points that are unique to the Electronics MK and to the eForth translator.
To begin with, eForth words are divided into general-purpose words and system words. Letter case matters. The names of ordinary words are defined in uppercase, and the names of system words in lowercase. I also named my own additions to eForth in lowercase. The author of eForth suggests conducting the main dialogue in CAPS mode and switching temporarily to lowercase (the FP key combination) when you need a system word.
In this article, all words are written in capital letters so that they stand out from the text. In several early eForth implementations, the headers of the system words were removed, so the WORDS command did not list them. This simplified the appearance of eForth and spared the attention of those using Forth for the first time. In 161eForth, the headers of these words are kept, primarily because of the SEE decompiler for colon words (see video No. 3 at the end of the article), which could not show the names of system words if their headers were removed.
To keep the article orderly and make it useful as a reference, I had to use several terms before defining them. These terms should be familiar to Forth and PMK specialists. Newcomers will sometimes have to look ahead to later sections (I have put references in the right places) or reread the article a couple of times.
161eForth itself is posted here, together with the source code, a drawing of the keyboard overlay, and the reference file words.txt describing all the implemented words: http://the-hacker.ru/2019/161eforth0.5b.zip
I also posted 5 short videos on YouTube that illustrate 161eForth at work, for those who do not have an MK-161. You can watch the entire playlist on YouTube. The first video is below; the remaining 4 are at the end of the article.
1. eForth and its implementation
eForth was developed as a modern replacement for the well-known FIG-Forth translator. For the port to the MK-161, I chose the 32-bit version 5.2 of the 86eForth translator with indirect threaded code, written in 2016 in MASM assembler for the Windows operating system. This version is described in detail in the third edition of the book “eForth and Zen” [1]. I advise those who know English to find and study this book; it is very useful for understanding 161eForth.
In a personal letter, the author confirmed that 86eForth502.asm from this book is the latest version of eForth. On the Internet you can find a lot of English-language information on this and previous versions of eForth.
The development of eForth followed the scientific path that Professor Wirth taught using the example of his programming language Oberon. Each new version of eForth was a simplification of the previous one. Everything the language can do without is removed. What remains is a carefully thought-out set of strong, expressive language constructs, whose power has been tested in more than 40 eForth implementations for various platforms. And now on a calculator!
Although eForth is a minimalist dialect of Forth, it does not aim to win the race for the tiniest Forth. The set of words it offers is quite practical and can easily be extended by the programmer in whatever direction the task requires.
The first version of eForth was released in 1990 in MASM assembler for 8086 processors and worked under MS-DOS. It contained 31 machine-dependent core words and 191 high-level words. The idea was simple - you translate only 31 words to your assembler, and immediately get eForth on your computer.
This approach was criticized on the Internet, because minimizing the number of assembler words led to speed that was extremely low for embedded systems. Already in the second version of eForth, as many words as possible were implemented in assembler, which corrected the bias and made eForth not only easily portable but also a practical programming system.
For several years, Bill Muench, the original author of eForth, and his colleague Dr. Chen-Hanson Ting published their eForth releases in parallel. Each version had its own characteristics. Other programmers also contributed eForth variants for different platforms.
Version 5.2, released in 2016, contains 71 code words and 110 colon words. A quarter of a century of searching for the ideal led to a significant decrease in the total number of words. At the same time, for performance reasons, the share of words implemented at a low level has increased.
The proposed 161eForth enjoys the generous fruits of this progress but does not claim to advance the main line of development. My implementation provides the programmer with all the tools present in version 5.2. Where the MK-161 architecture makes the implementation of some 86eForth words impossible or meaningless, I do not simply discard them; I give programmers a full replacement taken from the ANSI/ISO standard [4]. Those who seek minimalism can throw out the “extra” words themselves, because by tradition 161eForth comes with source code.
When implementing eForth, I followed the author's design even where I disagree with it. For example, in my opinion a FOR NEXT loop with an initial value of n should execute exactly n times. Over time, Chuck Moore, the author of the Forth and colorForth languages, came to the same conclusion. Unfortunately, eForth uses an outdated convention and executes such a loop n + 1 times, with the counter running from n down to 0. I did not correct this and several other shortcomings, preferring to keep 161eForth compatible with implementations for other platforms.
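The loop convention described above can be modeled in a few lines. This is a minimal Python sketch and not eForth code; the function name for_next_counters is invented for the illustration, and it assumes a non-negative n.

```python
def for_next_counters(n):
    """Return the counter values seen by the body of an eForth-style FOR ... NEXT.

    Hypothetical helper: it mimics the convention in which the counter
    runs from n down to 0 inclusive, so the body is entered n + 1 times.
    """
    seen = []
    counter = n
    while counter >= 0:
        seen.append(counter)  # the loop body runs with the current counter
        counter -= 1          # NEXT decrements; the loop ends after counter 0
    return seen
```

With n = 3 the model visits the counters 3, 2, 1 and 0, that is, four passes through the body, whereas the convention preferred in the text would make three.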
Since 161eForth is the first practical on-board programming system for the Electronics MK-161 other than the factory language, I traced the long history of eForth and brought back a few words that were useful on other platforms and may be in demand now.
For example, the new-old variable 'BOOT contains the token (see 3.1) of the word that is executed first after the environment is initialized but before the dialogue begins. By default, 'BOOT contains the token of TLOAD, which interprets the code from the “text area” (see 2.4.2). This lets the programmer customize eForth without recompiling the environment, which cannot yet be done on board the Electronics.
The priority tasks of the implementation were saving binary memory (see 2.4.1) and increasing speed. Solving them led to a sharp decrease in the number of high-level words, whose code occupies this precious memory, and to an increase in the number of fast kernel words implemented in cheap program memory (see 2.4.3).
As a result, 161eForth contains 129 code words and 78 high-level words and occupies 1,816 bytes of the MK-161 binary memory, that is, less than half of it. This gives hope that its high-level part can be metacompiled right on board the Electronics.
The source code of eForth for the MK-161 is divided into two large parts. The kernel, written in the MK-161 command system, is contained in the file eForth0.mkl. The high-level words are defined in the SP-Forth language and are located in the file eForth.f.
The distribution also includes the reference file words.txt, which documents every 161eForth word with its stack notation and a brief one-line explanation.
1.1 Kernel source eForth0.mkl
The eForth kernel contains executable code running in the MK-161 program memory (see 2.4.3), which is compiled on the computer into the eForth0.mkp file by standard means, for example, the proprietary MKL2MKP compiler.
The kernel source text in the eForth0.mkl file is written in Latin mnemonics. For example, the IP E command for reading register E (also known as R14) is written in these mnemonics as RME. Although unusual for owners of Soviet PMKs, Latin mnemonics are convenient to type on a computer keyboard. Indeed, it is easier to type the strange FX^2 than the Fx² familiar from childhood.
The eForth0.mkp file is a kernel template. In addition to the code of the primitives, it contains the kernel header and the name table tblNames, which eForth.f translates into decimal registers during compilation (see 2.4.4). The final kernel eForth.mkp (see 2.4.3) is created from eForth0.mkp, so eForth0.mkl must be compiled first.
1.2 Source code for high level words eForth.f
The file eForth.f is fed to the excellent domestic compiler SP-Forth [5]. The file contains the definitions of all high-level words. In time, they could be defined in eForth itself and, possibly, compiled right on board the Electronics MK-161.
During compilation, eForth.f reads the kernel template eForth0.mkp and uses it to create three files in the current directory for subsequent loading into the MK-161: eForth.mkp, eForth.mkd and eForth.mkb. It is eForth.mkb that contains the bodies of the high-level words, while their headers are placed in eForth.mkd.
The fourth file, eForth.mkt, is written by hand in eForth and can be edited on board the MK-161 using the built-in text editor. I will discuss each of these four files in more detail below (see 2.4).
2. Electronics MK-161
The manufacturer, based in Novosibirsk, refers to the MK-161 by a vintage acronym that was used for the very first calculators in the USSR. The MK-161 command system inherits the command system of the Soviet calculators Electronics B3-34 and Electronics MK-61. This means that programs written for the Soviet calculators will run on the MK-161 unchanged or with minor changes.
The reverse is not true. eForth will not run on the Soviet PMKs, because it uses many resources that first appeared in the MK-152/161 and were absent in earlier models of the series.
Let us consider the features of the MK-161 input language and architecture that influenced 161eForth (hereinafter simply eForth) and gave this implementation of eForth its Russian accent.
The first of these features is the “high byte at the lower address” convention, which the MK-161 follows consistently. For example, the number 1000 = 3 × 256 + 232 is written in two consecutive bytes as 3 and 232.
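The byte order in this example can be checked with a short Python sketch. The helper names cell_to_bytes and bytes_to_cell are hypothetical and exist only for this illustration.

```python
def cell_to_bytes(value):
    """Split a 16-bit value into [high, low]; the high byte goes to the lower address."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    high, low = divmod(value, 256)
    return [high, low]


def bytes_to_cell(high, low):
    """Rebuild the 16-bit value from two consecutive bytes."""
    return high * 256 + low
```

Calling cell_to_bytes(1000) gives the two bytes 3 and 232 from the text, and bytes_to_cell(3, 232) restores 1000.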
2.1 Indirect Addressing
Anyone who has programmed Soviet PMKs has come across indirect addressing. With direct addressing, we explicitly specify the number of the register we are accessing. For example, R IP 44 reads the contents of register 44. The R key, which appeared in the MK-152, is used to access registers numbered 15 and above; these registers were absent in the Soviet PMKs.
With indirect addressing, the number of the required register is not known in advance; it is held in another register. For example, if register 8 contains the number 44, the command K IP 8 reads the contents of register 44 (R44).
The K and R keys can be combined. For example, the command RK BP 20 transfers control (GOTO in Latin mnemonics) to the address stored in R20.
A peculiarity that turned out to be important for the internal interpreter of eForth is the pre-increment/pre-decrement of registers under indirect addressing. This feature is inherited from the Soviet PMKs.
For example, the indirect read commands K IP 0, K IP 1, K IP 2 and K IP 3 decrease the contents of registers 0, 1, 2 or 3 by one before accessing the desired register. The commands K IP 4, K IP 5 and K IP 6 increase the contents of registers 4, 5 or 6 by one before reading.
Such “modification” of the address register makes it possible to process whole groups of registers in a loop. It is similar to ++R and --R in C. The number of the index register matters: it determines whether the register is incremented (registers 4-6) or decremented (registers 0-3) under indirect addressing.
The architecture of 161eForth was affected by the fact that registers 4-6 are incremented before the access (pre-increment). As a result, the interpreter pointer (IP), located in R6, always points to the last byte of threaded code that was read. In 86eForth, IP always points to the next byte, not yet read.
This is also true for the return stack pointer (RP) stored in register 2. R2 always points to the top of the return stack.
A useful feature of the MK-161 is that the register is not incremented or decremented if indirect addressing is performed with the new R key. For example, RKIP02 reads the number from the top of the return stack without changing the pointer. This is a ready-made implementation of the Forth word R@. It follows from the above that the value read is one less than the address of the next token, which will be executed after returning from the colon word.
When you develop or study words that interact closely with the internal interpreter of eForth, be sure to fully understand this subtle point about pre-increment.
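The rules of this section can be condensed into a toy model. The Python sketch below is not MK-161 code: the class IndirectRegisters and its method names are invented, and it models only what the text states, namely that index registers 0-3 are decremented and 4-6 incremented before the access, and that the R-prefixed form leaves the index register alone.

```python
class IndirectRegisters:
    """Toy model of indirect register reads (hypothetical, for illustration only)."""

    def __init__(self, contents):
        self.reg = dict(contents)  # register number -> stored value

    def k_ip(self, index):
        """Model of K IP <index>: modify the index register, then read through it."""
        if 0 <= index <= 3:
            self.reg[index] -= 1   # registers 0-3 are decremented first
        elif 4 <= index <= 6:
            self.reg[index] += 1   # registers 4-6 are incremented first
        return self.reg[self.reg[index]]

    def rk_ip(self, index):
        """Model of the R-prefixed form: read through the register, no modification."""
        return self.reg[self.reg[index]]
```

For instance, with R6 holding 1000 and the bytes 12 and 34 stored in registers 1001 and 1002, two calls of k_ip(6) return 12 and then 34 and leave R6 at 1002, the address of the last byte read. A call of rk_ip(2) returns the top of the return stack and leaves R2 unchanged.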
2.2 Ordered and associative tables
MK-161 tables are located in program memory (see 2.4.3). They first appeared in the Novosibirsk Electronics MK and are completely unfamiliar to experts on the Soviet PMKs. The address of the table in use is always stored in register 9042, but the two kinds of table are accessed differently.
An ordered table is an array of unsigned 16-bit integers. eForth contains one such table, tblTokens, with the addresses of the primitives (see 3.1.1), that is, Forth words written in the MK-161 command system. The address interpreter (see 3.2) uses tblTokens to execute threaded code quickly, so eForth tries to keep the address of this table in R9042 at all times.
To access an ordered table, write the number of the desired element to R9210. The number n in register X is replaced by the value of table element n; counting starts from zero.
Associative tables (“search by value”) are used actively by eForth, above all by the primitive (FIND), which searches for a word by its name. The associative table tblCHPUT is also used when displaying characters on the screen, to handle line feeds and other control codes.
To search for the element n in the associative table, you must write n in R9212. The number n in the register X (the manual calls it “the index”) will be replaced by the 16-bit value recorded in the table immediately after its “index” n.
This fast, if unsophisticated, search function is implemented in assembler in the MK-161 firmware, and it helped eForth achieve acceptable performance when recognizing word names and compiling programs. Of course, this required designing rather elaborate name recognition tables tailored to this function. We will discuss them in more detail in the second article.
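The two kinds of lookup described in this section can be sketched in Python. This is a model only: the real tables live in program memory and are reached through R9042, R9210 and R9212. The flat "index, value, index, value" layout follows the description above, but the function names, the sample numbers, and the result returned for a missing index (None) are assumptions.

```python
def ordered_lookup(table, n):
    """Ordered table: element number n, counting from zero."""
    return table[n]


def associative_lookup(table, index):
    """Associative table: return the value stored right after the matching index.

    The table is modeled as a flat list [index0, value0, index1, value1, ...].
    Returning None on a miss is an assumption of this sketch.
    """
    for pos in range(0, len(table) - 1, 2):
        if table[pos] == index:
            return table[pos + 1]
    return None


# Made-up sample data, not taken from eForth.
sample_ordered = [300, 312, 340]
sample_associative = [10, 500, 13, 520, 8, 540]
```

Here ordered_lookup(sample_ordered, 1) picks the second element, while associative_lookup(sample_associative, 13) scans the pairs and returns the value that follows the index 13.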
2.3 Interrupts and console
The Electronics MK lets its owners write programs in the input language that react to certain events, such as a key being pressed or released, or a timer expiring.
eForth actively uses this interrupt system both for keyboard input and displaying a flashing cursor when requesting such input, and for input / output via the universal serial port (RS-232).
Letters entered from the keyboard are placed in the bufKbd queue as the keys are pressed. This is very convenient and saves time on slow systems. Alphabet and case switches are handled by the KeyPress interrupt and do not take up space in the queue. Holding a key down triggers auto-repeat.
When the 8-letter queue is full and eForth is not yet ready to process the input (a very rare situation), the MK-161 emits a disgruntled beep. Of course, I would prefer not to implement all this natural keyboard behavior in the translator but to get it out of the box as a service of the MK-161 firmware. But, as the saying goes, we make do with what we have.
After startup, all eForth output is directed to the MK-161 graphic screen. A character is output by a relatively simple subroutine, ChPut. The only difficulty is the implementation of the BS (backspace) control code. The MK-161 uses a proportional font, so the positions of the characters already output have to be remembered in a special buffer, tblBS, from which they are later retrieved when the BS code is output.
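Why a proportional font forces the positions to be remembered can be seen in a small model. The Python sketch below is not the ChPut subroutine: the class name, the character widths and the default width are invented, and only the idea of a stack of saved positions (the role played by tblBS) comes from the text.

```python
class ProportionalScreen:
    """Toy model of backspace handling with a proportional font (hypothetical)."""

    def __init__(self, widths, default_width=6):
        self.widths = widths              # character -> width in pixels (made up)
        self.default_width = default_width
        self.x = 0                        # current horizontal position
        self.saved_positions = []         # plays the role of tblBS

    def put(self, ch):
        if ch == "\b":
            # Backspace: return to where the previous character started.
            if self.saved_positions:
                self.x = self.saved_positions.pop()
            return
        self.saved_positions.append(self.x)  # remember where this character starts
        self.x += self.widths.get(ch, self.default_width)
```

Because every character may have a different width, the cursor cannot simply step back by a fixed amount; popping the saved position restores it exactly.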
During the dialogue, the user can redirect all input and output to the RS-232 serial port with the word IO>, which makes it possible to program the MK-161 from a familiar computer keyboard or from another MK-161. The word CON> returns control to the calculator console.
2.4 Memory and installation of eForth on the MK-161
The memory of the Electronics MK-161 consists of separately addressable program memory and register data memory. The register memory, in turn, is heterogeneous and is divided into three large areas.
Registers numbered from 0 to 999 store "decimal numbers". These are ordinary registers, as in the Electronics B3-34 and other calculators, except that they can store 12 rather than 8 decimal digits of the mantissa.
Registers with numbers from 1000 to 8167 store integers from 0 to 255. The last 3 KB of this area, with addresses from 5096 to 8167, are called the text area.
Registers with numbers from 9000 to 9999 are called function registers. This service area of the address space resembles the I/O ports of a microprocessor. Writing to and reading from these addresses provides access to I/O devices, the interrupt system, and so on.
To install eForth on the Electronics MK-161, it is enough to transfer four files to the calculator, for example with the manufacturer's MK.EXE program:
Write eForth.mkp to the program memory, starting from page 0. Version 0.5b takes 74 pages.
Write eForth.mkd to decimal data memory
Write eForth.mkb to binary data memory
Write eForth.mkt to text memory
After transferring them to the calculator, I recommend that you immediately save these four files in a separate directory of the built-in "electronic disk". Since they share the same name, you can then load eForth all at once, as a “package”.
2.4.1 Binary ("byte") memory MK-161: eForth.mkb
The Electronics MK registers numbered from 1000 to 5095 store numbers from 0 to 255. This area of the calculator's register memory is called binary. Two consecutive binary registers can be accessed from eForth as one 16-bit “cell”, and (as everywhere else on the MK-161) the upper 8 bits are in the register with the lower number.
eForth uses this tiny “binary memory” as its main memory. The words ! and @, HERE and ALLOT work with it, and only from here does the address interpreter execute threaded code (see 3.2). It holds the eForth variables, the text input buffer (TIB), the dictionary, and the tblBS stack of saved positions used to implement backspace.
4096 bytes is very modest by modern standards. Therefore, enormous effort went into moving everything possible to other areas of memory.
2.4.2 Text Area: eForth.mkt
Immediately after the binary memory comes the text area, the registers numbered from 5096 to 8167. Technically these are the same byte registers, but the ability to write them to disk and read them back as a separate file makes this area special.
To work with this "text", eForth provides the word TLOAD. It feeds the entire area to the text interpreter as a single string 3072 characters long.
There is no agreement on how to break the text into lines. The editor built into the Electronics MK insists on a line length of 24 characters. Callisto uses the Forth convention, in which a line contains 64 characters. eForth leaves the choice to the user by treating all the text as one long line. You can use the built-in MK-161 editor, or you can write your own editor compatible with Callisto.
Here is the initial content of eForth.mkt, divided into three lines for convenience:
: hi ." , %user%!" CR ;
' hi 'boot !
hi \
The first line defines a new word, hi, which greets the user. The second line takes the token of this word (see 3.1) and stores it in the variable 'BOOT (see 1). From now on, the text area is no longer compiled each time eForth starts; instead, the already compiled greeting is executed.
The last line starts the word hi, displaying the greeting on the screen. The word \ ends the interpretation of the text, returning control to the console.
To compile an arbitrary text file, exit to the calculator with the BYE command, go to the main menu and load the desired file in DOS mode. You can also transfer the mkt file from a computer. The C/P key will return you to eForth, after which the TLOAD command can compile the file loaded into the text area.
2.4.3 Program Memory: eForth.mkp
The MK-161 program memory is an isolated address space. It also stores bytes, but they are read-only. The program memory contains 10,000 “steps”, which turned out to be more than eForth needs. More than a quarter of the program memory remains free, which leaves good room for further development of the translator.
“Code words” can be implemented only in program memory. The name recognition tables and all known text strings are also kept here, which saves binary memory.
Some words, such as C@, COUNT, and TYPE, can address program memory if the address is not a positive number. For example, the phrase 0 C@ reads the “step” (byte) at address 0 of program memory.
2.4.4 Decimal memory: eForth.mkd
The Electronics MK registers numbered from 0 to 999 are called decimal and hold the numbers used for ordinary calculations on the calculator: 12 decimal digits of mantissa and 2 decimal digits of exponent. Forth is designed to work with integers up to 4 bytes long, so such a resource is clearly more than eForth needs.
Decimal memory is used to save precious binary memory. It holds the data stack and the return stack. It also stores the headers of words, both user-defined and built-in, one register per header. This approach makes it possible to redefine even words that have standard names.
Keeping the stack in decimal memory leads to a number of features characteristic of Forth on the MK-161. First, the range of values of a stack element is huge; it can hold 32-bit integers. "Double integers" are therefore no longer needed on the MK-161, although for the sake of compatibility I have implemented the corresponding eForth words. A "double integer" is represented on the MK-161 as two stack elements containing numbers from 0 to 65535, which together encode a single signed 32-bit integer in two's complement. The high-order 16 bits of such a number are placed on top, that is, at the lowest address.
The bitwise logical operations AND, OR, XOR and NOT treat their arguments as 16-bit integers. A result from 32768 to 65535 is converted to a negative number from -32768 to -1. In eForth, false is encoded as zero and true as minus one. Any value other than zero is also treated as true.
The second feature of the 161eForth data stack is that it contains signed numbers. When the word @ reads the number 65535 from a 16-bit "cell", it is automatically converted to -1. A special "unsigned" word, U@, is provided for reading 65535 as is, with a plus sign.
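The conversions described here are plain 16-bit and 32-bit two's complement arithmetic. The following Python sketch models them; the function names are hypothetical and the code is not taken from 161eForth.

```python
def to_signed16(value):
    """Map 0..65535 onto -32768..32767, as the text describes for @ and the logic words."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def forth_and(a, b):
    """Bitwise AND on 16-bit arguments with a signed result."""
    return to_signed16((a & 0xFFFF) & (b & 0xFFFF))


def forth_not(a):
    """Bitwise NOT on a 16-bit argument with a signed result."""
    return to_signed16(~a & 0xFFFF)


def double_to_int(high, low):
    """Combine two 16-bit stack elements into one signed 32-bit integer."""
    value = (high << 16) | low
    return value - (1 << 32) if value >= (1 << 31) else value
```

In this model to_signed16(65535) yields -1, forth_not(0) yields the "true" flag -1, and double_to_int(65535, 65535) also yields -1.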
I should mention that, for speed, the two top elements of the data stack are kept not in decimal memory but directly in the X and Y registers.
eForth does not use the fact that decimal registers can contain fractional and floating-point numbers. The eForth virtual machine uses these registers to store signed 12-digit decimal integers. Decimal registers are accessed with the words C@ and C!, the same words that work with any single register.
3. Internal interpreter
The eForth kernel is a program written in the MK-161 input language. Its first command, BP MAIN, transfers control to the MAIN code, which first determines the circumstances of the restart. If an invalid token caused it, the MK-161 beeps. On the first start, as well as after the MK-161 is switched on, the screen is cleared. Next, MAIN calls the Init subroutine to initialize the interrupt system and everything the MK-161 console drivers need.
Once the data and return stacks are initialized, the low-level part of the startup is complete. Then something incredible for a machine with Harvard architecture happens: eForth starts executing "threaded code" from byte memory. The honor of being first belongs to the word whose header address is recorded in R43. This is usually the word COLD.
How are high-level words structured? Every such word consists of two parts, a body and a header. The header is stored in a decimal register. It helps the external interpreter and the decompiler find the name and body of the word. The header also contains a “lexicon” field, a set of flags that help the external interpreter process the found word correctly. For the internal interpreter, the body of the high-level word is far more important; it is located in binary memory and stored in the dictionary. The internal interpreter can even execute words that have no header.
The body of a high-level word begins with the code field byte, which contains the address of the handler for that word. Four handlers for high-level words are written in the MK-161 input language and start on the first page of program memory. We will analyze them all (see 3.3), but the main one is called DOLST and is located at address 02, immediately after the BP MAIN command discussed above. This handler executes Forth words defined with a colon.
After the code field byte comes a parameter field of arbitrary length. In colon words, the parameter field contains threaded code: a sequence of 16-bit tokens, each of which represents one action assigned to it.
First we will look at tokens in more detail. Then we will study the internal interpreter INEXT, which makes the transition from one token to the execution of the next. The author of eForth calls INEXT a primitive handler. We will conclude this tour of the internal interpreter with an analysis of all four handlers for high-level words.
3.1 Tokens
A token represents a word in threaded code and on the stack, allowing it to be executed quickly. In principle a token is a pointer to the body of the word, but the harsh architecture of the MK-161 has made its own adjustments to this simple idea. Let us go through all the types of tokens, starting with the tokens of primitives.
3.1.1 Primitive Token
All words included in the eForth distribution are numbered from 0 to 206. The numbering is continuous and covers both the primitives and the high-level words. This makes it easy to recover the name of a word from its number. The names are stored in program memory, and a reference to the desired name is easy to find through the header table.
The number of a primitive is its token. Like any token, it occupies two bytes in threaded code. The first byte is zero. The second contains the number of the primitive. The tblTokens table makes it possible to quickly find the address of the primitive's code from this number. The address of tblTokens is kept permanently in R9042 (see 2.2), so everything needed to execute a primitive is always at hand.
The word XT> allows you to find out the address of the primitive code by its number (token). Since the primitive code is always located in the program memory, the resulting address is always negative (see 2.4.3).
3.1.2 High-level word token
A high-level word may have its own number and the standard name associated with it, or it may be completely new, created by the user. In all cases, the token of a high-level word is the address of its code field (see 3), that is, a number from 1000 to 5095.
In threaded code, the token of a high-level word is recorded in a very unusual way. The first byte is the number of hundreds (a number from 10 to 50), and the second is the remainder of dividing the token by 100 (a number from 0 to 99).
For example, token 1234 is represented by the two bytes 12 and 34. This token, like any other, is compiled by the word COMPILE, which is taken from the ANSI standard (the comma is part of the name). The words XT! and XT@ write and read the tokens of high-level words in threaded code. They are also used to access addresses (see 3.1.4), and the word XT@ can also read the token of a primitive.
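The two token formats described so far (a primitive as zero plus its number, a high-level word as hundreds plus remainder) can be modeled as follows. This is a Python sketch with invented function names, not the code of COMPILE, XT! or XT@; accepting any single-byte number as a primitive is a simplification of the model.

```python
def encode_token(token):
    """Return the two bytes that represent a token in threaded code."""
    if 0 <= token <= 255:
        return [0, token]              # primitive: zero byte, then its number
    if 1000 <= token <= 5095:
        hundreds, remainder = divmod(token, 100)
        return [hundreds, remainder]   # high-level word: hundreds, then remainder
    raise ValueError("not a valid token in this model")


def decode_token(first, second):
    """Inverse of encode_token."""
    if first == 0:
        return second                  # number of a primitive
    return first * 100 + second        # address of a code field
```

In this model encode_token(1234) gives the bytes 12 and 34 from the example, and decode_token(12, 34) restores 1234. Since the first byte of a high-level token is never less than 10, a zero first byte reliably marks a primitive.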
3.1.3 Integer Literals
Integer literals are a kind of primitive token. They are unusual enough to be considered separately.
In threaded code, the DOLIT and DOLITM tokens occupy four bytes. The first two bytes contain the primitive token already discussed, that is, 0 and the number of the primitive. The next two bytes contain the integer that the literal will put on the data stack when executed.
DOLITM differs in that it changes the sign of the number before putting it on the stack. It is intended for implementing negative numbers.
3.1.4 Address Literals
Like integer literals, the three address literals BRANCH, ?BRANCH and DONXT each occupy 4 bytes in threaded code. The first 2 bytes contain the primitive token, and the last two contain the jump address.
The address is written in the same format as the token of a high-level word (see 3.1.2): the first byte holds the number of hundreds, and the second the remainder of dividing the address by 100. Recall that, because of the pre-increment (see 2.1), the jump address is not the address of the desired token but a number that is one less.
The DONXT token helps implement the FOR-NEXT counted loop (see 1). The unconditional jump BRANCH is needed to implement the infinite BEGIN-AGAIN loop. The conditional jump ?BRANCH transfers control if there is zero (“false”) on top of the data stack. It is used to implement the conditional IF-THEN construct and the exits from the indefinite loops BEGIN-UNTIL and BEGIN-WHILE-REPEAT.
3.1.5 String Literals
String literals are a kind of high-level word token. In threaded code, the token of a string literal is followed by a byte with the length of the string and then by the string itself, from the first byte to the last.
eForth has three string literals: $"|, ."| and abort"|. They are defined in the eForth.mkl file as the tokens STRQP, DOTQP and ABORQ, respectively. The main work of the literal is done for them by the word do$, the DOSTR token.
To keep the article to a reasonable size, I cannot dwell on this interesting topic, but it is good to know that eForth has them.
3.2 Address Interpreter
It is time to look at the token interpreter, whose address is always held in register 9. Most primitives finish their work with the command K BP 9, which transfers control to the INEXT label.
First of all, the address interpreter reads the first byte of the next token with the KIP6 command. If it is zero, the token is a primitive and is processed by the code under the label NPrime.
The NData label marks the processing of a VCA token. The first byte is multiplied by 100 by the command VP 2, after which KIP6 + adds the second byte of the token to the result (see 3.1.2). The token that has been read is stored by the P7 command in the working register WP (R7).
We know that the VCA token is the address of its code field, which contains the address of the handler. The KIP7 P8 commands read the code field byte into R8, and the KBP8 command transfers control to the VCA handler. The handler knows that R7 holds a number that is one less than the address of the parameter field of the word being processed.
The F⟳ commands with code 25 "clean up" the stack. The point is that eForth keeps the two top items of the data stack directly in the X and Y registers of the MK-161 stack. This decision speeds up the work, but it requires care so that this important data is not lost.
It remains to understand how the address interpreter executes primitives.
NPrime: F⟳ KIP6 RRP9210 P8 F⟳ KBP8
The KIP6 command reads the second byte of the primitive token. The RRP9210 P8 commands read the address of this primitive from the tblTokens table (see 2.2 and 3.1.1), and KBP8 passes control to the primitive.
As above, F⟳ removes the excess from the stack, restoring the contents of the X and Y registers.
The eForth address interpreter is so tiny that it is duplicated several times in program memory. The main copy is the one reached by the command K BP 9, which ends most of the primitives.
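The decoding done by the address interpreter can be summarised in a short Python model. It is a sketch rather than a transcription: memory is a dictionary, `tbl_tokens` is a hypothetical stand-in for the tblTokens table, and the function only reports where control would go instead of jumping there.

```python
def fetch_next(memory, ip, tbl_tokens):
    """Decode one token and return (new_ip, kind, target, wp).

    kind is "primitive" or "word", target is the code address that receives
    control, and wp is the value left in R7 (None for primitives).
    """
    ip += 1                          # KIP6 increments R6 before reading
    first = memory[ip]
    ip += 1
    second = memory[ip]
    if first == 0:                   # NPrime: look the primitive up
        return ip, "primitive", tbl_tokens[second], None
    wp = first * 100 + second        # NData: address of the code field
    handler = memory[wp]             # the code field byte selects the handler
    return ip, "word", handler, wp
```

Keeping the decision to a single test of the first byte is what makes the interpreter small enough to duplicate.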
As an exercise, I recommend studying the implementation of the word EXECUTE, placed after the EXECU label. It is a variant of INEXT that takes the token from the data stack instead of reading it from threaded code.
3.3 Handlers of VCA
Four types of VCA have four different handlers: DOLST, DOVAR, DOCON and DOCONM. We have already seen above that the address interpreter, before calling the handler, leaves in R7 the address of the code field of the word being processed.
eForth.f finds the addresses of these handlers by reading the kernel header from the eForth0.mkp file. This lets it compile the VCA correctly for the "Electronics MK-161", placing the result in the file eForth.mkb.
3.3.1 Colon Words: DOLST and EXIT
The next important question after INEXT is what the internal interpreter does when it encounters the token of a word defined with a colon. The code field of such a word contains the number 2, so INEXT transfers control to the DOLST handler, which does the work needed to begin interpreting the new list of tokens.
DOLST: IP6 KP2 F⟳ IP7 P6 F⟳ INEXT:
Register 2, as we have already discussed (see 2.1), contains the return stack pointer RP. The IP6 KP2 commands write the value of R6, the interpretation pointer (IP), to the return stack. Later this will make it possible to recall the position in the old list of tokens where INEXT came across the colon word. Then IP7 P6 moves IP to the beginning of the new list.
The INEXT code is placed immediately after the DOLST code and executes the first word of the new list of tokens. As everywhere else, the F⟳ commands help preserve the top two items of the data stack.
Colon words usually end with the EXITT token, which performs the reverse of DOLST: it retrieves the old IP value from the return stack and returns to interpreting the old list of tokens.
EXITT: RCCP02 P6 Cx 1 IP2 + P2 F⟳ INEXT:
The RCCP02 P6 commands read the old IP value from the top of the return stack (see 2.1). After that, the Cx 1 IP2 + P2 commands correct the RP value by increasing it by one. The F⟳ command restores the stack, after which INEXT executes the next word from the old list of tokens.
Of course, INEXT cannot follow both DOLST and EXITT at the same time. To get around this, I used an old trick from Soviet times. You can learn it too by examining the corresponding lines of the eForth0.mkl file.
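The interplay of DOLST and EXITT can be modelled as follows. The class and its field names are invented for this sketch, and it assumes that KP2 decrements R2 before storing, which matches the increment of RP that EXITT performs.

```python
class Machine:
    """Only the registers that DOLST and EXITT touch."""

    def __init__(self, ip, rp):
        self.ip = ip          # R6, interpretation pointer
        self.rp = rp          # R2, return stack pointer
        self.wp = None        # R7, set by the address interpreter
        self.rstack = {}      # return stack cells, keyed by address

    def dolst(self):
        self.rp -= 1                      # assumed pre-decrement of R2 by KP2
        self.rstack[self.rp] = self.ip    # remember the old token list
        self.ip = self.wp                 # continue in the new token list

    def exitt(self):
        self.ip = self.rstack[self.rp]    # restore the old IP
        self.rp += 1                      # drop the return stack cell
```

After `dolst()` followed by `exitt()`, both IP and RP are back at their starting values, which is what allows colon words to nest.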
3.3.2 DOVAR, Variable and Array Handler
Words generated by the words CREATE and VARIABLE use the same DOVAR handler. This handler puts on the stack the address of a variable placed in the parameter field that comes right after the byte of the code field. Variables VARIABLE occupy 2 bytes, and arrays created with CREATE contain as many bytes as the programmer wants.
DOVAR: ⇔ KP3 Cx 1 IP7 + KBP9
The ⇔ KP3 commands save the contents of the Y register in the data stack. At the same time, the number from the top of the stack moves into RY, freeing RX for a new value. After the Cx 1 IP7 + commands, this new value on top of the stack is the address of the parameter field of the word being executed. KBP9 transfers control to INEXT without any tricks, moving on to the next word.
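The effect of DOVAR on a stack whose top two items live in registers can be sketched in Python. Here `x` and `y` stand for RX and RY, and the list `deep` is a hypothetical stand-in for the part of the data stack kept in memory.

```python
def dovar(x, y, deep, wp):
    """Return (x, y, deep) after DOVAR runs for the word with code field wp.

    x and y cache the top two data stack items; deep holds the rest.
    """
    deep = deep + [y]     # swap, KP3: spill the old second item to memory
    y = x                 # the old top becomes the second item
    x = wp + 1            # Cx 1 IP7 +: address of the parameter field
    return x, y, deep
```

A stack of 1, 5, 7 (top on the right) and a code field at 1234 therefore becomes 1, 5, 7, 1235.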
3.3.3 Constant Handlers: DOCON and DOCONM
Unlike DOVAR, the constant handler accesses the parameter field of its word itself. DOCON reads a 16-bit constant value from it. This value is always positive.
DOCON: ⇔ KP3 ⇔ IP7 P5 Cx 256 KIP5 × KIP5 + KBP9
The ⇔ KP3 ⇔ commands save RY in the data stack, but this time the old top of the data stack returns to RX. The IP7 P5 commands push it back into RY, at the same time preparing the pointer register R5 for reading the value of the constant. Then Cx 256 replaces the garbage in the X register with the number 256.
The KIP5 × KIP5 + commands read the constant from the parameter field onto the top of the data stack, that is, into RX. As we remember, on the MK-161 the first byte is always the most significant one. It is multiplied by 256, after which the low byte of the constant is added to the product. All the work is done, and KBP9 transfers control to the next word.
DOCONM works in the same way, except that the sign of the constant is reversed after reading. Negative constants are implemented on the MK-161 by a separate handler for the sake of speed:
DOCONM: ⇔ KP3 ⇔ IP7 P5 Cx 256 KIP5 × KIP5 + /-/ KBP9
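The arithmetic of both constant handlers fits in one small Python function. The `negate` flag is an illustrative way to cover DOCONM in the same sketch, and the dictionary-based memory is an assumption of the example.

```python
def docon(memory, wp, negate=False):
    """Return the constant stored in the two bytes after the code field at wp."""
    high = memory[wp + 1]         # first KIP5: most significant byte
    low = memory[wp + 2]          # second KIP5: least significant byte
    value = high * 256 + low
    return -value if negate else value
```

With the bytes 1 and 44 in the parameter field, DOCON yields 1 × 256 + 44 = 300 and DOCONM yields -300.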
We have now fully worked out how eForth on the "Electronics MK-161" executes its code from the data area, and even touched briefly on the deeper topic of string literals (see 3.1.5).
In the second article of the cycle, I will talk about the external "text" interpreter of 161eForth and analyze the structure of the header tables and name recognition. That part of the translator demanded much more radical solutions from me. Against that background, what we have examined above is good old traditional Forth.
Happy programming in Forth!
Literature
1. Dr. Chen-Hanson Ting. eForth and Zen - 3rd Edition, 2017. Available on Amazon Kindle.
2. Baranov S.N., Nozdrunov N.R. The Forth Language and Its Implementation. - L.: Mashinostroenie, Leningrad Branch, 1988.
3. Semenov Yu.A. Programming in Forth. - M.: Radio i Svyaz, 1991.