eForth for MK-161: Data Structures

This article concludes a series about eForth on a programmable calculator. Start here.

Commands of the input language of the Elektronika MK-161 occupy only half of the eForth0.mkl file. The other half consists of tables, which were no easier to develop than the algorithmic part of the translator. Let's see how these tables are used.

Professor Wirth teaches that “programming in the small” consists of developing two equally important components — algorithms and data structures.

We have already met one eForth data structure: the body of a VCA (high-level word), located in byte memory. Four processors (DOVAR, DOCON, DOCONM and DOLST) interpret the parameter fields of "their" VCA in different ways:

 .DB DOVAR ;    .DB … ;
 .DB DOCON ;    .DW _ ;
 .DB DOCONM ;   .DW _ ;
 .DB DOLST ;    .DW 1, 2,… EXITT ;

The next, relatively simple data structure belongs to the "standard messages" of TYPE. All eForth messages are numbered and moved to cheap program memory. If TYPE is asked to print a single character, its code may be the number of such a message, from 0 to 7.

 ; TYPE
 .BASE tblTYPE: .DBB str7, str6, str5, str4, str3, str2, str1, str0

In the extended language of the MK, the pseudo-command .BASE sets the “base” for the .DBB command, which places, one after another, the byte offsets of str7, str6 and so on relative to the tblTYPE base label. Adding a number from 0 to 7 to the address of the table, we can read an offset from it. Adding that offset to tblTYPE, we get the address of the desired string.

The first byte of a string contains its length. eForth makes extensive use of such “counted strings”.
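The table lookup described above can be modelled in a few lines of Python. This is only a sketch of the idea, not the MK-161 code: the function names, the message texts and the memory layout (offset table immediately followed by the strings) are assumptions made for illustration, and the order of entries in the real tblTYPE is not modelled.

```python
def build_message_table(messages):
    """Lay out an offset table followed by counted strings.

    Entry i holds the byte offset of message i from the table base,
    in the spirit of .BASE/.DBB. The layout itself is an assumption.
    """
    table_size = len(messages)
    body = bytearray()
    offsets = []
    for text in messages:
        offsets.append(table_size + len(body))
        encoded = text.encode("ascii")
        body.append(len(encoded))  # count byte of the counted string
        body += encoded
    if offsets and max(offsets) > 255:
        raise ValueError("offset does not fit in one byte")
    return bytes(offsets) + bytes(body)


def read_message(memory, base, number):
    """Read message `number` from the table that starts at address `base`."""
    offset = memory[base + number]   # table address + number -> offset
    address = base + offset          # base + offset -> address of the string
    length = memory[address]         # first byte is the length
    return memory[address + 1:address + 1 + length].decode("ascii")


# Hypothetical messages placed at an arbitrary base address of 3.
memory = b"\xff\xff\xff" + build_message_table(["ok", "error", "redef"])
second_message = read_message(memory, 3, 1)
```

One byte per offset is enough as long as the whole group of strings stays within 255 bytes of the base label, which is why a base-relative table is cheaper than a table of full addresses.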

We have also met the tblTokens table, which lists the code addresses of all 208 embedded words. If a word is not a primitive, the table contains 0. Jumping to address 0 makes eForth reboot, with a beep.

The tblNames table was also mentioned; it refers to the names of the same 208 words. These names are stored as counted strings in the same “rubber” program memory. The tblNames table itself is not available while eForth is running, but the information it contains is not lost. During compilation, eForth.f transfers the name addresses to a more convenient data structure stored in decimal memory (see 2).

I have also described tblCHPUT, an associative table of control codes used when printing characters on the calculator screen. Seven more tables, from tblKeyNum to tblKeyRusF, translate the code of a key pressed in the various keyboard modes into an 8-bit character code. The address of the subroutine responsible for the active keyboard mode is held in the decimal register ptrKbdInt.

As a result, only one data structure in the eForth0.mkl file remains unexamined: the name recognition tables. Let's leave them for dessert (see 5), after the main course of two header tables stored in decimal memory. First, let us arm ourselves with the tools for working with the contents of these headers.

1. Working with headers: HEAD! and HEAD@

 HEAD! ( xt nfa r -- )       Write a header built from xt and nfa into register r.
 HEAD@ ( r -- xt nfa lex )   Read register r and split its header into xt, nfa and the lexicon.

One decimal register of the MK-161 can hold 12 decimal digits. eForth uses such a register to hold three small numbers, each from 0 to 9999. I named the three “fields” that store these numbers A, B and C: AAAABBBBCCCC. The sign of the decimal number applies only to field A.

The primitive HEAD@ takes a register number and splits the number stored there into fields, while HEAD! assembles the fields into one long number and writes the resulting "monster" into the specified register. But there are nuances.

In a word's “decimal header”, field A holds the address of its name (nfa). If this address is negative, the name is stored in program memory. Field B holds the word's token (xt). Field C is called the "lexicon": it holds the IMMEDIATE bit and a flag marking the word as compile-only.

HEAD@ splits the header into parts. The lexicon field C ends up on top of the stack, with the name field A beneath it. Field B, which usually holds the token, lies at the very bottom.

HEAD! clears the C field.
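The packing of three fields into one 12-digit register can be sketched in Python as follows. The function names and the dictionary standing in for decimal memory are assumptions for illustration. The behaviour modelled (sign carried by field A, field C cleared by HEAD!, result order xt, nfa, lex) follows the description above.

```python
FIELD = 10_000  # each of the fields A, B, C holds a number from 0 to 9999


def head_store(registers, xt, nfa, r):
    """Model of HEAD! ( xt nfa r -- ): A = nfa, B = xt, C is cleared."""
    if not (0 <= xt < FIELD and abs(nfa) < FIELD):
        raise ValueError("field out of range")
    sign = -1 if nfa < 0 else 1
    # AAAABBBBCCCC with CCCC = 0000; the sign of the register belongs to A.
    registers[r] = sign * (abs(nfa) * FIELD * FIELD + xt * FIELD)


def head_fetch(registers, r):
    """Model of HEAD@ ( r -- xt nfa lex ), returned bottom to top."""
    value = registers[r]
    sign = -1 if value < 0 else 1
    a, rest = divmod(abs(value), FIELD * FIELD)
    b, c = divmod(rest, FIELD)
    return b, sign * a, c


registers = {}
head_store(registers, 1234, -56, 44)   # name kept in program memory
packed = registers[44]
fields = head_fetch(registers, 44)
```

A negative nfa survives the round trip because only the magnitude is split into digits, and the sign is reattached to field A afterwards.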

2. Embedded word headers

The headers of the 208 embedded words (numbered 0 to 207) are stored in order, starting at R44. Field A always contains a negative number, since the names of these words are always kept in program memory.

Fields B and C are editable. The user can therefore redefine embedded words and mark the ones that need it as IMMEDIATE (see 4).

3. User headers

Working with only 208 predefined names saves byte memory but is terribly boring. So I developed another data structure, in which the imagination in choosing a name is limited only by a length of 32 characters. The structure consists of 32 lists, each responsible for user words of one particular length. Each of the 32 lists has its own header. The lists themselves “move around” in decimal memory, but their headers always stay in R301 ... R332.

Sorting words by name length is an important highlight of 161eForth. It greatly reduces the number of comparisons needed to find a word by its name, which speeds up compilation. Who needs hash functions when the length of every name is known?

For simplicity, a list header has the same structure of fields A, B and C as a word header, but the fields serve different purposes. Field A holds the number of the list's first register. Field B holds the number of registers allocated to the list. Field C holds the number of words whose headers are already in the list.

At startup all C fields are zero: every list is empty. The B fields are 2: each list initially gets a pair of registers. The A fields point to blocks of 2 registers each, starting at R333.

Each list contains word headers, which we have already examined (see 1). The differences are that here the name address (nfa) is positive and points to a counted string, traditionally stored in front of the body of the VCA (high-level word), and that the token in field B is the address of the code field (cfa), which follows this name in binary memory. The only exception: if the word has been defined before, field A points to the old name. Why store the string again? Binary memory is expensive.

When all registers of a list are filled (B = C), the word PUBLISH provides 5 more free slots, spreading the data structure apart at the right place and correcting the links (A) in the list headers.
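The bookkeeping of the 32 lists can be modelled as below. This is a sketch under stated assumptions, not the author's code: the class and method names are invented, the lists are assumed to lie one after another in order of name length, and growing a list is assumed to shift all later lists up by 5 registers. The starting values (A from R333 in blocks of 2, B = 2, C = 0) and the growth step of 5 are taken from the text.

```python
FIRST_BLOCK = 333  # the first list block starts at R333
LISTS = 32         # one list per name length, 1..32
GROW = 5           # registers added when a list is full


class UserLists:
    """Toy model of the list headers (kept in R301..R332) and their lists."""

    def __init__(self):
        # One [A, B, C] header per name length: first register, size, used.
        self.headers = [[FIRST_BLOCK + 2 * i, 2, 0] for i in range(LISTS)]
        # Decimal memory from R333 upward; None marks a free register.
        self.registers = [None] * (2 * LISTS)

    def add(self, name, word_header):
        """Append a word header to the list for names of this length."""
        index = len(name) - 1
        if not 0 <= index < LISTS:
            raise ValueError("name length must be 1..32")
        if self.headers[index][2] == self.headers[index][1]:  # B = C: full
            self._grow(index)
        start, _, used = self.headers[index]
        self.registers[start - FIRST_BLOCK + used] = word_header
        self.headers[index][2] = used + 1

    def _grow(self, index):
        """Open GROW free registers after the list and fix the later links."""
        start, size, _ = self.headers[index]
        end = start - FIRST_BLOCK + size
        self.registers[end:end] = [None] * GROW
        self.headers[index][1] = size + GROW
        for later in self.headers[index + 1:]:
            later[0] += GROW  # correct field A of every list that moved


lists = UserLists()
for name in ("FOO", "BAR", "BAZ"):  # hypothetical three-letter user words
    lists.add(name, name)
```

After the third three-letter word, the list for length 3 occupies 7 registers with 3 in use, and every list for longer names starts 5 registers later than before.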

4. Publishing a new word: WORK and PUBLISH

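 LAST ( -- a )        Variable: where the header goes (zero for a new word).
 WORK ( -- a )        Variable in which the new header is built.
 PUBLISH ( -- )       Complete the definition by copying the header from WORK.
 $,n ( nfa -- )       Build a header for the word whose name is at nfa.
 ?UNIQUE ( a -- a )   Warn if a word with this name already exists.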

The data structure developed for the MK-161 to store word headers turned out to be practical and easy to integrate into eForth. When CREATE, CONSTANT or : creates a new word, it calls the system word $,n to build a header for the word with the given name. $,n calls ?UNIQUE to check whether we are creating a new word or redefining an old one.

If a word with the same name already exists, ?UNIQUE warns the user about it. At the same time, the address of the header to be redefined is stored in the LAST system variable. For a new word, LAST is set to zero.

In either case, $,n builds the new header in the WORK variable, a decimal register that holds all 12 digits of the header. If the name was not found, it is compiled into the dictionary in front of the code field, as in 86eForth and many other Forths. On the MK-161 we managed to do without the “link field”; this also saves binary memory.

The PUBLISH primitive completes the definition of a word. When a colon definition is compiled, PUBLISH is called from ; so the SMUDGE bit is not needed. Where the header from WORK is copied to is determined by the LAST variable. If LAST is zero, a new header is created in the appropriate list (see 3). Is that list full? Then PUBLISH adds 5 more registers to it, four of them in reserve.

After PUBLISH, the LAST variable always points to the header of the most recently defined word. This lets IMMEDIATE do its job of changing the lexicon field.
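The interplay of WORK, LAST and PUBLISH can be illustrated with a toy model. Everything here is a simplification made for illustration: the class, its method names and the flat register counter are hypothetical, the header is reduced to a (name, xt) pair, and the per-length lists from section 3 are left out. The sketch only shows why a word stays invisible until PUBLISH runs, which is the reason no SMUDGE bit is needed.

```python
class HeaderStore:
    """Toy model of WORK, LAST and PUBLISH (not the MK-161 code)."""

    def __init__(self):
        self.headers = {}     # register number -> (name, xt)
        self.work = None      # header under construction (WORK)
        self.last = 0         # LAST: 0 means "a new word"
        self.next_free = 333  # hypothetical: next free header register

    def find(self, name):
        """Stand-in for (FIND): register of a published name, or 0."""
        for register, (known, _) in self.headers.items():
            if known == name:
                return register
        return 0

    def begin(self, name, xt):
        """Model of $,n with its ?UNIQUE check; True means a redefinition."""
        self.last = self.find(name)
        self.work = (name, xt)
        return self.last != 0

    def publish(self):
        """Model of PUBLISH: copy WORK to the place chosen by LAST."""
        if self.last == 0:
            self.last = self.next_free
            self.next_free += 1
        self.headers[self.last] = self.work


store = HeaderStore()
store.begin("SQUARE", 100)
visible_before = store.find("SQUARE")  # still 0: the header is only in WORK
store.publish()
visible_after = store.find("SQUARE")
```

Redefining a word goes through the same path, except that LAST already points at the old header, so PUBLISH overwrites it in place instead of taking a new register.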

5. (FIND) and name recognition tables

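 (FIND) ( a -- r T | a F )           Look up the name at a; return the header register r and true, or a and false.
 FIND ( a -- a F | xt 1 | xt -1 )    The ANS Forth search word; the flag 1 means the word is IMMEDIATE.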

Looking up a word by its name is handled by the primitive (FIND). First it searches among the embedded words, whose names are known in advance, then it checks the user word list for the given name length (see 3). The name recognition tables greatly speed up that first step. Here is how they work.

To begin, (FIND) looks up in the tblLen array the address of the main associative table, in which the known names of the given length are “prepared”. In this table (FIND) looks for the first character of the name. In most cases the length and the first letter are enough to find the register number of the header of the word being sought.

Sometimes several words of the same length start with the same letter. Then, instead of a register number, (FIND) comes across the address of the next associative table (the value read is 300 or more), and the search continues with the second letter. And so on, until the word is found or it is established that no such word exists.

Of course, once the leading letters match, (FIND) checks the name as a whole. Still, the recognition tables made eForth fast. This spring I invested a lot of time in them, and now they save time on every search. The “keys” in them are even sorted alphabetically. A pity the MK-161 firmware does not care about that.

For the sake of compatibility, I implemented the word FIND from ANS Forth [4], which delegates the “dirty work” to the primitive (FIND). The word ?UNIQUE, discussed above, also looks up its argument through (FIND).
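The recognition tables behave like a small trie keyed first by name length and then by successive letters. The Python model below is a sketch under assumptions: the function names are invented, table "addresses" are simply consecutive integers from 300 upward, and the sample words and register numbers are made up. Only the rule that a value of 300 or more denotes another table, and the final whole-name check, come from the text.

```python
TABLE_BASE = 300  # per the text, a value of 300 or more is a table address


def build_tables(names_by_register):
    """Return (tbl_len, tables) for a {register: name} map of unique names."""
    tables = {}   # table address -> {character: register or table address}
    tbl_len = {}  # name length -> address of the first-letter table

    def new_table():
        address = TABLE_BASE + len(tables)
        tables[address] = {}
        return address

    def fill(address, entries, position):
        groups = {}
        for register, name in entries:
            groups.setdefault(name[position], []).append((register, name))
        for char in sorted(groups):       # keys in alphabetical order
            group = groups[char]
            if len(group) == 1:
                tables[address][char] = group[0][0]
            else:                         # shared prefix: go one letter deeper
                child = new_table()
                tables[address][char] = child
                fill(child, group, position + 1)

    by_length = {}
    for register, name in names_by_register.items():
        if register >= TABLE_BASE:
            raise ValueError("register numbers must stay below 300")
        by_length.setdefault(len(name), []).append((register, name))
    for length, entries in by_length.items():
        tbl_len[length] = new_table()
        fill(tbl_len[length], entries, 0)
    return tbl_len, tables


def find(name, tbl_len, tables, names_by_register):
    """Model of the embedded-word part of (FIND): register number or None."""
    value = tbl_len.get(len(name))
    position = 0
    while value is not None and value >= TABLE_BASE:
        if position >= len(name):
            return None
        value = tables[value].get(name[position])
        position += 1
    # A letter match is only a candidate: compare the whole name.
    if value is not None and names_by_register[value] == name:
        return value
    return None


names = {44: "DUP", 45: "DROP", 46: "ROT", 47: "DOES"}  # hypothetical layout
tbl_len, tables = build_tables(names)
found = find("DROP", tbl_len, tables, names)
```

In this example DROP and DOES share both length and first letter, so the first-letter table for length 4 points to a second table instead of a register, exactly the case described above.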

6. External interpreter

The book [1] contains an exhaustive description of eForth, including the external "text" interpreter. It is this interpreter that executes or compiles source code in the Forth language. Differences from the text interpreters of other Forth dialects ([2], [3]) have appeared over the past decades, but they are few.

Below is a block diagram of a text interpreter taken from [1]. Be careful: this “interpreter” has a compilation mode! The word $COMPILE is responsible for compiling Forth text into “threaded code”, whose execution we discussed in detail in the first article. When $INTERPRET runs instead, the entered words are executed immediately: this is interpretation mode. EVAL “evaluates” the whole entered line, calling one of these two words for each word in it.

After the flowchart, the author explains what some of the blocks do. Here is a translation of that explanation. Block names usually correspond to eForth word names. The word NAME? is absent from my implementation; the fast (FIND) successfully replaces it (see 5).

MAIN	Configure the Forth virtual "engine"
COLD	Initialize system variables
ABORT	Reset the data stack. Error handler
QUIT	Reset the return stack and enter the interpreter loop
QUERY	Accept text input from the terminal
EVAL	Evaluate or interpret a text string
PARSE	Extract a word from the entered text
$INTERPRET	Interpret a word
$COMPILE	Compile a word
NAME?	Search for a word in the dictionary
NUMBER?	Convert a text string into an integer
EXECUTE	Execute a word
IMMED?	Is this word an immediate command?
LITERAL	Compile an integer literal
COMPILE	Compile a token
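How these blocks fit together can be shown with a deliberately tiny model of a Forth text interpreter. It is a sketch, not eForth: the class, its four built-in words and the use of Python callables in place of threaded code are all assumptions made for illustration. The comments name the block from the table that each step corresponds to.

```python
class MiniForth:
    """Tiny model of the text interpreter with its two modes."""

    def __init__(self):
        self.stack = []
        self.compiling = False
        self.tokens = iter(())
        self.name, self.body = None, []
        # name -> (action, immediate flag)
        self.words = {
            "+": (self._add, False),
            "DUP": (self._dup, False),
            ":": (self._colon, False),
            ";": (self._semicolon, True),
        }

    def eval(self, line):                       # EVAL
        self.tokens = iter(line.split())        # PARSE, greatly simplified
        for token in self.tokens:
            if self.compiling:
                self._compile(token)            # $COMPILE
            else:
                self._interpret(token)          # $INTERPRET

    def _interpret(self, token):
        entry = self.words.get(token.upper())   # NAME? or (FIND)
        if entry:
            entry[0]()                          # EXECUTE
        else:
            self.stack.append(self._number(token))

    def _compile(self, token):
        entry = self.words.get(token.upper())
        if entry:
            action, immediate = entry
            if immediate:                       # IMMED?
                action()
            else:
                self.body.append(action)        # COMPILE
        else:
            value = self._number(token)
            self.body.append(lambda: self.stack.append(value))  # LITERAL

    def _number(self, token):                   # NUMBER?
        try:
            return int(token)
        except ValueError:
            self.stack.clear()                  # ABORT resets the data stack
            self.compiling = False
            raise LookupError(token + " ?")

    def _add(self):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)

    def _dup(self):
        self.stack.append(self.stack[-1])

    def _colon(self):
        self.name, self.body = next(self.tokens).upper(), []
        self.compiling = True

    def _semicolon(self):
        body = self.body

        def run():
            for action in body:
                action()

        self.words[self.name] = (run, False)    # the word appears only now
        self.compiling = False


forth = MiniForth()
forth.eval(": DOUBLE DUP + ;")
forth.eval("21 DOUBLE")
```

Note that ; is the only immediate word here: it runs even in compilation mode, which is how the interpreter gets back to interpretation mode.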

The book also gives the source text of every eForth word in the Windows version, with brief explanations. What the MK-161 version is like, I have already told you. The source code of my implementation is in the archive: the-hacker.ru/2019/161eforth0.5b.zip

Finally, I will mention that the word (PARSE) is implemented in the MK-161 language, whereas under Windows it is a VCA (high-level word). Debugging took a week, but it doubled the compilation speed. (PARSE) does all the “dirty work” for PARSE of isolating individual words from the input text stream.

My additions to the external interpreter, besides the usual QUIT loop, are two words: the already mentioned TLOAD and FILE, taken from older versions. The word FILE directs I/O to the console but reads the lines for interpretation from the RS-232 port. After each line has been processed successfully, a character with code 11 is sent to the port. A file downloaded from the computer must end with the word QUIT.
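What "isolating a word from the input stream" involves can be sketched in Python. This is not the MK-161 primitive: the function name, its arguments and its return value are assumptions, and the treatment of blanks (skip leading ones, accept any control character as a blank) follows common eForth practice rather than anything stated in this article.

```python
def parse_word(buffer, start, delimiter=" "):
    """Return (word, next_start) for the next word in buffer.

    With a blank delimiter, leading blanks are skipped and any character
    not above a space ends the word; otherwise only the exact delimiter
    ends it and nothing is skipped.
    """
    if delimiter == " ":
        def is_delimiter(char):
            return char <= " "
    else:
        def is_delimiter(char):
            return char == delimiter

    position = start
    if delimiter == " ":
        while position < len(buffer) and is_delimiter(buffer[position]):
            position += 1                  # skip leading blanks
    begin = position
    while position < len(buffer) and not is_delimiter(buffer[position]):
        position += 1                      # scan to the end of the word
    word = buffer[begin:position]
    if position < len(buffer):
        position += 1                      # step over the closing delimiter
    return word, position


line = "  DUP  +"
first, after_first = parse_word(line, 0)
second, after_second = parse_word(line, after_first)
```

An empty word at the end of the line is the signal that the text interpreter has nothing left to process.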

I have not debugged the word FILE yet. If anyone needs it, share your impressions.

This concludes the review of the hard spots of 161eForth, but Forth is an incredibly flexible tool that every owner customizes. Even when you have thoroughly understood everything, someone somewhere on the planet will come up with another trick that can surprise you.

I will quote the closing words of eForth's author from [1]:

For 26 years, I rewrote eForth many, many times. In each rewrite, I tried to make it simpler and clearer. Now, in 86eForth v5.2, I think I’ve got it right, and I am therefore very happy.

As Einstein said:
Everything should be made as simple as possible, but not simpler.

Making 86eForth v5.2 any simpler would perhaps break it or make it useless as a programming tool.

Literature

[1] Dr. Chen-Hanson Ting. eForth and Zen, 3rd Edition, 2017. Available on Amazon Kindle.
[2] Baranov S.N., Nozdrunov N.R. The Forth Language and Its Implementations. Leningrad: Mashinostroenie, Leningrad Branch, 1988.
[3] Semenov Yu.A. Programming in Forth. Moscow: Radio i Svyaz, 1991.
[4] ANS Forth standard, X3.215-1994. Translation.
[5] SP-Forth documentation.
[6] The website of Offete Enterprises (Dr. Chen-Hanson Ting), with 86eForth v5.2; in English.
[7] Mikhail Pukhov's story "True Truth" with the program "Lunolet-1", the source of my lead image and of my love for Soviet calculators.

Source: https://habr.com/ru/post/452572/
