I am well aware that this article is about 20 years late. The Spectrum has not been manufactured since 1992, yet the army of fans of this platform is not shrinking from year to year. So this article may be useful to anyone researching programs written for the ZX Spectrum.
0. Introduction
Sinclair BASIC was born in 1979 and was originally developed for the ZX80. Its successful implementation and small size allowed it to migrate almost unchanged, first to the ZX Spectrum 16/48K and then to the more advanced models of the ZX Spectrum line. This article is devoted to the anti-debugging techniques widely used in programs written in BASIC.
First of all, it is worth explaining that the firmware in the Spectrum ROM contains no additional tools to help with debugging. Studying a BASIC program therefore boiled down to loading it from cassette, pressing the BREAK key (or CAPS SHIFT + SPACE) and methodically examining the source code with the LIST command. At times that worked, but studying most programs requires at least a minimal toolkit. Today we have emulators with a built-in disassembler, debugger and memory editor (for example, EmuZWin).
1. Sinclair BASIC outside and inside
Not many programs were written entirely in BASIC. For what it is worth, when I had a ZX Spectrum and five cassettes of games in 1994, only one of about a hundred games was written in BASIC. But almost all the others had a small loader that loaded the machine code blocks and transferred control to them, and that loader was written in BASIC. Naturally, there were quite a few varieties of such loaders. We will analyze some of them below.
The four-kilobyte size of the interpreter inevitably imposed a number of restrictions, and the developers had to solve many technical difficulties during design and implementation. The first thing to be given up was the parser. The creators of BASIC proposed a very elegant solution instead: bytecode. Moreover, this bytecode is generated on the fly. This saves CPU time both when a command is entered and when the program is executed.
Each keyword in BASIC has its own code in the character set. The table itself can be viewed on Wikipedia. For us, it is important to know that each keyword occupies exactly 1 byte in memory.
Also pay attention to the five-byte format for representing numbers. Each five-byte record is preceded by the marker byte 0x0e. Integers from -65535 to 65535 are encoded with the following mask: 0x00 0x00 LSB MSB 0x00. As you can see, the sign of the integer does not appear in this mask; in a program line it comes from the minus character in the text in front of the number. For our purposes, though, positive integers are enough.
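The mask above can be turned into a small host-side helper. This is a minimal Python sketch, not part of any Spectrum tool: the function names `encode_small_int` and `decode_small_int` are made up for illustration, and only the non-negative integer form described in the text is covered.

```python
NUMBER_MARKER = 0x0E  # byte that precedes every embedded five-byte number


def encode_small_int(value: int) -> bytes:
    """Return the marker plus the five-byte form of an integer in 0..65535."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("this sketch only covers 0..65535")
    lsb, msb = value & 0xFF, value >> 8
    return bytes([NUMBER_MARKER, 0x00, 0x00, lsb, msb, 0x00])


def decode_small_int(record: bytes) -> int:
    """Inverse of encode_small_int; expects the marker plus five bytes."""
    if len(record) != 6 or record[0] != NUMBER_MARKER:
        raise ValueError("not a marker followed by five bytes")
    if record[1] != 0x00 or record[2] != 0x00 or record[5] != 0x00:
        raise ValueError("not the non-negative integer form")
    return record[3] | (record[4] << 8)


print(encode_small_int(32768).hex(" "))  # 0e 00 00 00 80 00
```

Floating-point values use the same five bytes differently and are deliberately left out here.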
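Each line of a BASIC program has the following format:
2 bytes - line number (big endian)
2 bytes - length of the line (little endian, counting the text and the terminator)
n bytes - text of the line
1 byte - 0x0d, end-of-line marker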
A program line contains data both for display on the screen and for calculation. For example, the line
10 LET a=32768
has the following bytecode representation:
0x00 0x0a - line number 10
0x0f 0x00 - line length 15
0xf1 - LET
0x61 0x3d 0x33 0x32 0x37 0x36 0x38 - a=32768
0x0e 0x00 0x00 0x00 0x80 0x00 - 32768
0x0d - end of line
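To check a dump by hand, the line layout can be walked with a few lines of Python. This is only a sketch under the assumptions stated in the text (big-endian line number, little-endian length that counts the terminating 0x0d); `parse_line` is a hypothetical helper name, and the sample bytes are those of `10 LET a=32768`.

```python
def parse_line(raw: bytes) -> tuple[int, bytes]:
    """Split one stored line into (line_number, body_without_terminator)."""
    line_number = (raw[0] << 8) | raw[1]   # big endian
    length = raw[2] | (raw[3] << 8)        # little endian, counts the 0x0d
    body = raw[4:4 + length]
    if length == 0 or len(body) != length or body[-1] != 0x0D:
        raise ValueError("truncated line or missing 0x0d terminator")
    return line_number, body[:-1]


EXAMPLE = bytes([
    0x00, 0x0A,                                # line number 10
    0x0F, 0x00,                                # line length 15
    0xF1,                                      # LET
    0x61, 0x3D, 0x33, 0x32, 0x37, 0x36, 0x38,  # a=32768 as ASCII
    0x0E, 0x00, 0x00, 0x00, 0x80, 0x00,        # marker + hidden value 32768
    0x0D,                                      # end of line
])

number, body = parse_line(EXAMPLE)
print(number, len(body))  # 10 14
```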
And now the fun begins. What can you do once you know the internals of Sinclair BASIC?
2. Line number zero
Line number 0 cannot be entered from the keyboard, no matter how hard we try. If such a line is present, it shows up in the listing but cannot be edited. However, there is a system variable called PROG in the documentation (on the ZX Spectrum 48K it is located at 0x5c53, or 23635), which stores the address of the BASIC program loaded in memory. A very simple manipulation at that address changes the line number. For example:
10 LET addr=PEEK(23635) + (PEEK(23636) * 256)
20 POKE addr, 0
30 POKE addr + 1, 0
After this program is run, the first line in its listing has the number 0.
3. Unordered listing
Building on the previous example, you can shuffle line numbers as you like. Nothing prevents line 10 from being followed by line 1, then by line 600, and so on. The program is executed in the order in which the lines are stored in memory.
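Both tricks, a zero line number and shuffled numbering, are easy to spot from outside the machine. The following Python sketch assumes `program` holds only the bytes of the program area (no variables after it) and uses the line layout described earlier; `line_numbers`, `find_oddities` and `make_line` are illustrative names, not an existing tool.

```python
def line_numbers(program: bytes) -> list[int]:
    """Return line numbers in the order the lines are stored in memory."""
    numbers, pos = [], 0
    while pos + 4 <= len(program):
        numbers.append((program[pos] << 8) | program[pos + 1])  # big endian
        length = program[pos + 2] | (program[pos + 3] << 8)     # little endian
        pos += 4 + length                                       # skip to next line
    return numbers


def find_oddities(numbers: list[int]) -> list[tuple[int, str]]:
    """Return (position, reason) for zero or out-of-order line numbers."""
    found = []
    for i, number in enumerate(numbers):
        if number == 0:
            found.append((i, "line number 0"))
        elif i > 0 and number <= numbers[i - 1]:
            found.append((i, f"line {number} stored after line {numbers[i - 1]}"))
    return found


def make_line(number: int, body: bytes) -> bytes:
    """Build a stored line for the demo: header, body, 0x0d terminator."""
    text = body + b"\x0d"
    header = bytes([number >> 8, number & 0xFF, len(text) & 0xFF, len(text) >> 8])
    return header + text


REM = b"\xea"  # token code of REM
demo = make_line(0, REM) + make_line(10, REM) + make_line(1, REM) + make_line(600, REM)
print(line_numbers(demo))
print(find_oddities(line_numbers(demo)))
```

A normally entered program always has strictly increasing numbers, so any finding here means the line headers were changed after entry.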
4. Running machine code brazenly
There is a reserved word, USR. It lets you call a machine code subroutine in memory. It is a function, so it cannot be used as a statement on its own, but there are a great many workarounds. For example:
RANDOMIZE USR addr
PRINT USR addr
LET a=USR addr
And you can load the machine code using the keywords READ, POKE and DATA. Here is one such loader found in the wild:
1 REM FANTASY WORLD DIZZY
10 CLEAR 24319
20 FOR j=24576 TO 24594
30 READ a
40 POKE j,a
50 NEXT j
60 RANDOMIZE USR 24576
70 DATA 17,0,1,221,33,198,92,62,255,55,205,86,5,212,0,0,195,198,92
In this fragment, CLEAR limits the memory area used by BASIC to address 24319, so the memory above it can be used freely. The loop in lines 20 to 50 reads the data from line 70 and writes it starting at address 24576, and then control is transferred there.
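To see what the 19 bytes in the DATA line do, they can be decoded on the host. The Python sketch below is a deliberately tiny table-driven disassembler that knows only the Z80 opcodes occurring in this loader; the table and the `disassemble` name are illustrative, and the interpretation in the note after the code is my own reading rather than something stated by the loader's author.

```python
# opcode bytes -> (mnemonic template, operand size); only what this loader uses
OPCODES = {
    (0x11,): ("LD DE,{:#06x}", 2),
    (0xDD, 0x21): ("LD IX,{:#06x}", 2),
    (0x3E,): ("LD A,{:#04x}", 1),
    (0x37,): ("SCF", 0),
    (0xCD,): ("CALL {:#06x}", 2),
    (0xD4,): ("CALL NC,{:#06x}", 2),
    (0xC3,): ("JP {:#06x}", 2),
}


def disassemble(code: bytes) -> list[str]:
    """Decode a byte string using the small opcode table above."""
    out, pos = [], 0
    while pos < len(code):
        for prefix, (template, size) in OPCODES.items():
            if code[pos:pos + len(prefix)] == bytes(prefix):
                pos += len(prefix)
                operand = int.from_bytes(code[pos:pos + size], "little")
                out.append(template.format(operand))
                pos += size
                break
        else:
            raise ValueError(f"opcode {code[pos]:#04x} is not in this table")
    return out


DATA = bytes([17, 0, 1, 221, 33, 198, 92, 62, 255, 55, 205, 86, 5,
              212, 0, 0, 195, 198, 92])
for line in disassemble(DATA):
    print(line)
```

As far as I know, 0x0556 is the tape-loading routine of the 48K ROM, so the fragment appears to load a 256-byte block to 0x5cc6, reset the machine through address 0 if loading fails, and otherwise jump to the loaded block.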
5. Hiding machine code in comments
In fact, nothing prevents you from keeping machine code in a comment. Of course, you cannot type it in from the keyboard. But you can first reserve enough space in a line with the REM keyword and then write the machine code into it with the POKE command. I came across this technique together with the “unordered listing”. Here is how it looked:
28725 REM [3 spaces, machine code]
20 CLEAR 24499: BORDER 0: PAPER 0: INK 0: CLS: RANDOMIZE USR 23875
This example is taken from the Dizzy 3.5 loader, converted to a file for running in an emulator. I do not know whether this code was on the original disk or was added during conversion, but the example is still fairly typical. Every BASIC program has an entry point (the number of the line from which execution starts). In this case it was hard-coded: 28725. In principle, this differs little from the previous case, except that the machine code is already in memory and control can simply be transferred to it.
Where that address came from is much more interesting. To make a universal loader, you would take the value of the PROG variable, add 8 (2 bytes for the line number, 2 for the line length, 1 for REM, 3 for the spaces) and jump to the resulting address. But here we know the conditions under which the program will be launched. It is loaded from a floppy disk under TR-DOS, so a disk interface is present. That means PROG points to address 23867 rather than 23755 (there is a good document on TR-DOS variables), and 23867 + 8 = 23875.
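The address arithmetic can be reproduced against a memory image. This Python sketch assumes a flat 64K snapshot in which PROG sits at 0x5c53 in little-endian order, and that the REM line is the first line of the program; `read_prog` and `usr_address` are hypothetical helper names.

```python
PROG_SYSVAR = 0x5C53                # address of the PROG system variable
REM_PAYLOAD_OFFSET = 2 + 2 + 1 + 3  # line number, length, REM token, three spaces


def read_prog(memory: bytearray) -> int:
    """Read the two-byte little-endian PROG value from a 64K memory image."""
    return memory[PROG_SYSVAR] | (memory[PROG_SYSVAR + 1] << 8)


def usr_address(prog: int) -> int:
    """Address of the first machine code byte inside the leading REM line."""
    return prog + REM_PAYLOAD_OFFSET


memory = bytearray(0x10000)  # empty image standing in for a real snapshot
memory[PROG_SYSVAR:PROG_SYSVAR + 2] = (23867).to_bytes(2, "little")  # TR-DOS case
print(usr_address(read_prog(memory)))  # 23875
print(usr_address(23755))              # 23763, plain 48K without a disk interface
```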
6. “Liverpool” is written, but “Manchester” is read
Remember the five-byte format for storing numbers that we looked at above? This technique exploits the double representation of a number in all its glory. The numbers are converted to the five-byte format and added to the bytecode by the BASIC interpreter itself, but we can always change either the number shown in the listing or its stored value. There is a whole class of loaders that first load machine code into memory (for example, as in section 4 or 5) and then appear to commit suicide in the most straightforward way:
RANDOMIZE USR 0
(in case you do not know, under normal conditions this command has much the same effect as restarting the computer).
But if you look carefully at the bytecode, you will find something like
0xf9 0xc0 0x30 - RANDOMIZE USR 0
0x0e 0x00 0x00 0x43 0x5d 0x00 - number marker, 23875
This technique works well against an inattentive researcher. It is even more effective if the conspicuous zero is replaced with a more conventional-looking address that holds more or less meaningful machine code.
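A mismatch between the digits shown in the listing and the hidden five-byte value can be found mechanically. The Python sketch below scans the body of one line for the 0x0e marker and compares the ASCII digits in front of it with the stored value. It handles only plain non-negative integers, and a 0x0e byte inside a string or a REM would confuse it; `find_mismatches` is an illustrative name.

```python
def find_mismatches(body: bytes) -> list[tuple[str, int]]:
    """Return (shown_text, hidden_value) pairs that disagree in one line body."""
    found, pos = [], 0
    while pos < len(body):
        if body[pos] != 0x0E or pos + 6 > len(body):
            pos += 1
            continue
        record = body[pos + 1:pos + 6]
        start = pos
        while start > 0 and 0x30 <= body[start - 1] <= 0x39:  # ASCII digits
            start -= 1
        shown = body[start:pos].decode("ascii")
        is_integer_form = record[0] == 0 and record[1] == 0 and record[4] == 0
        if shown and is_integer_form:
            hidden = record[2] | (record[3] << 8)
            if int(shown) != hidden:
                found.append((shown, hidden))
        pos += 6  # skip the marker and its five bytes
    return found


tricky = bytes([0xF9, 0xC0, 0x30,                      # RANDOMIZE USR 0
                0x0E, 0x00, 0x00, 0x43, 0x5D, 0x00])   # hidden value 23875
honest = bytes([0xF1]) + b"a=32768" + bytes([0x0E, 0x00, 0x00, 0x00, 0x80, 0x00])
print(find_mismatches(tricky))  # [('0', 23875)]
print(find_mismatches(honest))  # []
```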
7. Conclusion
Of course, this is not all that can be said about such tricks. Many ways of hindering code analysis can be devised (and probably already have been), and only some of them are described in this article. More information on anti-debugging techniques (not only in BASIC, but for the ZX Spectrum as a whole) can be found in the guide How to Hack on the ZX Spectrum. And do not be too lazy to look, from time to time, at what you are feeding your emulator; a utility from the site http://www.zxmodules.de/ will help you with that. I assure you, many interesting discoveries await.
UPD: Moved to the "World 8 bit" blog.