
Some of you may remember a notable series of Game Boy Advance games released in 2004. Their light-gray cartridges with simple labels stood out from the usual dark-gray cartridges with multi-colored labels. They contained games ported from the original Nintendo Entertainment System. These games, known in the USA as the Classic NES Series, are interesting for several reasons.
They are especially interesting from the standpoint of GBA emulation. Game Boy Advance games are often quite buggy, and the platform itself has many safeguards against crashes. To run these games, emulators therefore have to be bug-compatible with the original hardware. The developers of the Classic NES Series went a step further, however, and tried to keep the games from running in emulators at all.

If you tried to play one of these games in an older emulator, you probably saw the Game Pak Error screen. As it turns out, these games rely on tricks and undefined behavior that complicate emulation. This appears to have been a deliberate attempt at copy protection. In the interest of emulation accuracy, I painstakingly researched, implemented, and documented all of the unusual things these games do.
Trick 1: Memory Mirroring
The first trick involves the Game Boy Advance memory map. The GBA has a “flat” (non-segmented) address space. However, the top eight bits of an address tell the bus which device the access is meant for: 00 is the BIOS, 02 is main RAM, 03 is on-chip RAM, and so on. Since only the top 8 bits select a device, and most devices have a very limited address space (less than 16 mebibytes), the bits between those top 8 bits and the low bits that address memory within the device have no defined purpose.
For example, main RAM is 256 kibibytes, which takes 18 address bits. Valid addresses in this region therefore run from 02000000 to 0203FFFF, and everything from 02040000 to 02FFFFFF is unmapped. On a conventional ARM device, accessing an invalid address raises a data abort exception. The GBA does not support data aborts, however, so what happens instead is particularly interesting. Since the top 8 bits select the device and the low 18 bits address memory within it, 6 unused bits remain in the middle. These unused bits are simply ignored. This means that an access above the valid range of main RAM has its upper bits masked off, so it lands on a valid address. In some emulators, this property is called memory "mirroring".
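The masking described above can be sketched in a few lines of Python. This is only an illustration, not mGBA's actual code: the names `split_address` and `main_ram_offset` are made up for this example, and only main RAM (region 02, 256 KiB) is modeled.

```python
MAIN_RAM_REGION = 0x02
MAIN_RAM_SIZE = 256 * 1024  # 0x40000 bytes, i.e. 18 address bits


def split_address(address):
    """Split a 32-bit address into (device selector, 24-bit remainder)."""
    return (address >> 24) & 0xFF, address & 0x00FFFFFF


def main_ram_offset(address):
    """Map any address in region 02 onto the 256 KiB of real main RAM."""
    region, remainder = split_address(address)
    if region != MAIN_RAM_REGION:
        raise ValueError("not a main RAM address")
    # The 6 bits between the selector and the 18 offset bits are ignored.
    return remainder & (MAIN_RAM_SIZE - 1)


for address in (0x02000010, 0x02040010, 0x02FC0010):
    print(f"{address:08X} -> offset {main_ram_offset(address):05X}")
```

All three addresses resolve to the same offset, which is why a jump to a "mirror" address ends up in the code that was copied to the start of main RAM.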
The Classic NES Series does nothing fancy with mirrored memory: it copies code to main RAM and then jumps to one of these mirror addresses. This confuses some emulators, but because of the way mGBA implements memory regions, it never caused problems there. In any case, this is the easiest of the tricks the Classic NES Series uses.
Trick 2: code in VRAM
What the games do next is much more interesting: they start copying data into video memory (VRAM), which is perfectly normal in itself, but then they jump into that data. In other words, they copy code into RAM that is normally reserved for graphics and execute it there. When I first saw this, I assumed I had made a serious mistake somewhere. A jump to an invalid address is a typical symptom of an emulator bug; it usually happens when something goes wrong while copying code or jump addresses around in memory. But after digging deeper, I realized that if I let the game execute code from video memory, nothing crashed and it ran in a relatively stable way. It was clear that this was an anti-emulation tactic. Using video memory for something it was clearly never intended for confused me at first, and once I allowed the games to do it, I ran into other problems.
Trick 3: STM in DMA registers
The next trick is a very unusual use of the STM instructions. STM (short for "store multiple") is a class of instructions that stores several CPU registers to consecutive memory locations. There are four variants: decrement after, decrement before, increment after, and increment before. "Decrement after" describes the layout in memory: the values are stored one at a time, and the address is decreased by the word size after each word is stored. The operation looks like writing to memory in descending order. If you store the values A, B, and C, they end up in memory as C B A, so you would expect the instruction to start with A, since that is at the base address, and work backwards.
However, as Martin Korth has documented, the processor actually computes in advance where the final register will go and writes the values in the same order as the incrementing variants. So although C B A ends up in memory, C is written first. An emulator has to work out in advance how many registers will be stored, which can slow it down. In general, the order of memory writes may seem unimportant, especially on a single-core processor, where the writes can be considered atomic. For main RAM that may well be true. (Since the store is a single instruction, DMA (direct memory access) cannot take over from the CPU halfway through it.) However, the Classic NES Series games do something devious here: they set up a DMA transfer with a single instruction.
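To make the write order concrete, here is a small Python sketch of how the four STM variants can be reduced to a start address followed by ascending writes. It assumes the usual ARM rule that the lowest-numbered register in the list goes to the lowest address; the function name `stm_writes` and the mode strings are invented for this example.

```python
def stm_writes(mode, base, register_mask):
    """Return [(address, register)] in the order the writes are issued."""
    registers = [r for r in range(16) if register_mask & (1 << r)]
    count = len(registers)  # Hamming weight of the register mask
    start = {
        "IA": base,                  # increment after
        "IB": base + 4,              # increment before
        "DA": base - 4 * count + 4,  # decrement after
        "DB": base - 4 * count,      # decrement before
    }[mode]
    # Whatever the variant, writes go from the lowest address upwards.
    return [(start + 4 * i, reg) for i, reg in enumerate(registers)]


for address, reg in stm_writes("DA", 0x03001008, 0b1110):
    print(f"{address:08X} <- r{reg}")
```

For the decrementing variants the start address depends on how many registers are in the list, which is the precomputation mentioned above; the word at the base address itself is written last.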
DMA transfers are used to copy memory efficiently from one region to another, often from the Game Pak to main RAM, or from main RAM to the sound FIFO buffers. The memory-mapped I/O region contains three consecutive registers per DMA channel (there are four channels in total), which are written to configure a transfer. Normally a game sets the source and destination addresses with two separate 32-bit writes and then starts the transfer by writing the word count and the DMA control bits, using either one final 32-bit write or two 16-bit writes, one to each half of the control register.
But the Classic NES Series games are much more clever: since these three registers are laid out consecutively in memory, they use the STMIA and STMDA instructions to write all three values at once. STMIA is the simple case: write the first register, increment, write the next register, increment, write the control register, increment. STMDA is a little different: because it decrements, a naive emulator may write the control bits first, at the base address, which starts the DMA transfer with the wrong parameters. Although A, B, and C end up stored as C B A and the base address holds A, A must be written last. I had to take the Hamming weight of the register list and compute the initial write offset from it so that the writes happen in the right order. Once these operations matched what the hardware does, the transfers started working properly.
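The following Python sketch shows why the order matters. It is a deliberately simplified model under several assumptions: `DmaChannel` and `stmda` are invented names, the register addresses are those of DMA channel 3, and the channel is assumed to capture its source and destination at the moment the enable bit (bit 31 of the 32-bit control word) is written.

```python
DMA3SAD = 0x040000D4  # source address register
DMA3DAD = 0x040000D8  # destination address register
DMA3CNT = 0x040000DC  # word count (low half) and control bits (high half)
ENABLE = 1 << 31


class DmaChannel:
    """Toy DMA channel that records what it would have transferred."""

    def __init__(self):
        self.source = 0
        self.dest = 0
        self.started_with = None

    def write32(self, address, value):
        if address == DMA3SAD:
            self.source = value
        elif address == DMA3DAD:
            self.dest = value
        elif address == DMA3CNT and value & ENABLE:
            # Assumption: the transfer uses the addresses set so far.
            self.started_with = (self.source, self.dest)


def stmda(channel, base, values, hardware_order=True):
    """Store `values` (lowest register first) so the last one lands at `base`."""
    lowest = base - 4 * (len(values) - 1)
    writes = [(lowest + 4 * i, v) for i, v in enumerate(values)]
    if not hardware_order:
        writes.reverse()  # naive emulator: start at base and walk downwards
    for address, value in writes:
        channel.write32(address, value)


values = [0x08001000, 0x02000000, ENABLE | 16]
for hardware_order in (True, False):
    channel = DmaChannel()
    stmda(channel, DMA3CNT, values, hardware_order)
    print(hardware_order, [hex(x) for x in channel.started_with])
```

With the hardware order, the control word is written last and the transfer sees the intended addresses; with the naive descending order, the transfer starts before the source and destination have been set.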
Trick 4: hiding save types
But the tricks don't end there. The next one is less elaborate, and some other games use it too. A Game Boy Advance cartridge can contain one of several save mechanisms. Some games save to NVRAM (SRAM), which is byte-addressable; it lives in memory region 0E and can be written like ordinary memory. Other cartridges put flash memory in the same region and use a standard protocol to write bytes to the flash or to erase areas for reprogramming. The third type is EEPROM, which sits at the top of the Game Pak memory area (region 0D). It uses a bit-level protocol in which DMA transfers send a series of bits to the EEPROM to program it.
However, each game can have only one save type, and the cartridge header does not say which one it uses. Some emulators, including mGBA, try to detect the save type by waiting until the game tries to interact with one of them. But some games, including the Classic NES Series, trick emulators by deliberately accessing the wrong type first. For example, all of these games use EEPROM but pretend to use SRAM. If they detect that their write to SRAM succeeded, they show the Game Pak Error screen mentioned above. This trick is fairly easy to work around: the emulator checks the game code in advance, and if it sees a code associated with a Classic NES Series game, it forces the save type to EEPROM.
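A rough Python sketch of such an override might look like the following. The game code is read from its standard place in the cartridge header (4 bytes at offset 0xAC). Everything else is an assumption made for illustration: the function names are invented, the sample codes are placeholders, and the rule that Classic NES Series codes start with the letter F is a stand-in for whatever list of codes a real emulator consults.

```python
GAME_CODE_OFFSET = 0xAC  # 4-byte game code in the cartridge header


def game_code(rom):
    """Read the 4-character game code from a ROM image."""
    raw = rom[GAME_CODE_OFFSET:GAME_CODE_OFFSET + 4]
    return raw.decode("ascii", errors="replace")


def pick_save_type(rom, first_access):
    """Choose a save type, given the type the game touched first."""
    # ASSUMPTION: codes starting with "F" mark Classic NES Series games.
    if game_code(rom).startswith("F"):
        return "EEPROM"   # forced override, ignoring the decoy access
    return first_access   # ordinary autodetection


def fake_rom(code):
    """Build a blank ROM header carrying only the given game code."""
    rom = bytearray(0xC0)
    rom[GAME_CODE_OFFSET:GAME_CODE_OFFSET + 4] = code
    return bytes(rom)


print(pick_save_type(fake_rom(b"FXXE"), "SRAM"))
print(pick_save_type(fake_rom(b"AXXE"), "SRAM"))
```

The first call ignores the decoy SRAM access and reports EEPROM; the second falls back to plain autodetection.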
Trick 5: prefetch violation
The next trick turned out to be the hardest one for me. It took several days of research, after which I had to make some fairly low-level changes to the core emulation loop. Processors handle instructions in a multi-stage process called a pipeline. Each stage performs a separate task, so that one part of the CPU is busy while another is working on its own stage. The pipeline is designed so that once an instruction finishes one stage and moves on to the next, the following instruction can immediately take over the stage that was just vacated. The Game Boy Advance's ARM7TDMI processor has three stages that matter for accurate emulation: fetch, decode, and execute. In the fetch stage, a request for the instruction is sent out on the memory bus. The instruction is then passed to the decode stage, where the processor works out what the instruction is. Finally, the processor executes the instruction. A naive interpreter combines all three stages, either for speed or simply out of ignorance of how processors work. Until recently, mGBA combined the decode and execute stages.
However, studying the code of the Classic NES Series games led to an important discovery: the game modifies instructions very close to the one being executed. Here is assembly code disassembled from video memory in the Classic NES Series version of Metroid.
06000260: E3A01000 mov r1, #0
06000264: E28FE008 add lr, pc, #8
06000268: E51F0010 ldr r0, [$06000260]
0600026C: E58E0000 str r0, [lr, #0]
06000270: E3A010FF mov r1, #255
06000274: E3A010FF mov r1, #255
The code is fairly simple. It puts 0 in r1, loads the word at 06000260 into r0, and stores it to 06000274. Then it puts 255 in r1, and finally... well, actually, I lied a little. Notice that the last instruction in this block sits at the very address that was written two instructions earlier. The value written there is an instruction that puts 0 in r1 instead of 255. So what does this code do? The answer depends on the length of the pipeline.
The key to understanding this block is that once an instruction has entered the pipeline, a change to the memory it was fetched from no longer affects it. This resembles a cache coherency problem, but an even more extreme case. It means that if the pipeline is long enough, the instruction already in the pipeline when the store happens is the original one, which puts 255 in r1. If the pipeline is too short, the newly written instruction is fetched instead and r1 ends up as 0. As it turned out, the games refuse to boot if they find 0 in r1 and boot normally if they find 255. Once I realized this, I had to lengthen the emulated pipeline in mGBA by modeling a stage between fetch and execute. In the real ARM7TDMI pipeline, the decode stage sits between those two. I had not read the specification carefully enough, however, and had not realized that this stage needed to be modeled separately. After adding that extra pipeline stage to the interpreter, the Classic NES Series games suddenly started working!
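The effect can be reproduced with a toy model in Python. This is a sketch under simplifying assumptions, not how mGBA is structured: the function `run` is invented for this example, it decodes only the four instruction forms that appear in the listing above, and the pipeline is reduced to a queue of instructions fetched ahead of the one being executed.

```python
from collections import deque

BASE = 0x06000260
WORDS = [0xE3A01000, 0xE28FE008, 0xE51F0010,
         0xE58E0000, 0xE3A010FF, 0xE3A010FF]


def run(prefetch_ahead):
    """Run the snippet and return r1.

    prefetch_ahead is the number of instructions already fetched beyond
    the one executing (2 models a fetch/decode/execute pipeline).
    """
    mem = {BASE + 4 * i: w for i, w in enumerate(WORDS)}
    end = BASE + 4 * len(WORDS)
    regs = [0] * 16
    queue = deque()
    fetch_addr = BASE
    while True:
        # Top up the pipeline before executing the oldest instruction.
        while len(queue) <= prefetch_ahead and fetch_addr < end:
            queue.append((fetch_addr, mem[fetch_addr]))
            fetch_addr += 4
        if not queue:
            return regs[1]
        addr, word = queue.popleft()
        pc = addr + 8  # in ARM state, pc reads as the instruction address + 8
        kind = word & 0xFFF00000
        rn, rd = (word >> 16) & 0xF, (word >> 12) & 0xF
        if kind == 0xE3A00000:                # mov rd, #imm8
            regs[rd] = word & 0xFF
        elif kind == 0xE2800000 and rn == 15:  # add rd, pc, #imm8
            regs[rd] = pc + (word & 0xFF)
        elif kind == 0xE5100000 and rn == 15:  # ldr rd, [pc, #-imm12]
            regs[rd] = mem[pc - (word & 0xFFF)]
        elif kind == 0xE5800000:               # str rd, [rn, #imm12]
            mem[regs[rn] + (word & 0xFFF)] = regs[rd]
        else:
            raise ValueError(f"unhandled instruction {word:08X}")


for depth in (0, 1, 2):
    print(f"prefetch_ahead={depth}: r1 = {run(depth)}")
```

With two instructions prefetched, the original instruction at 06000274 is already in the queue when the store executes, so r1 ends up as 255. With a shorter queue, the overwritten instruction is fetched instead and r1 ends up as 0.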

Trick 6: irregular writes to the audio FIFO
However, there was one catch: although the games were playable, the audio was hopelessly garbled. Fixing it took a bit of debugging. This trick also turned out to be unique to the Classic NES Series, and because the specifications did not fully cover it, I had implemented the behavior slightly wrong. The Game Boy Advance has six audio channels: four procedurally generated channels (enhanced versions of the original Game Boy's channels) and two PCM channels. As far as I can tell, the Classic NES Series games use only one channel, one of the PCM channels. Each PCM channel is fed by a small internal FIFO buffer that triggers a DMA transfer when it reaches a certain point. Games usually write 32 bits at a time to the I/O register associated with each channel. Since PCM samples are only 8 bits wide, a 32-bit write actually contains four samples. But the Classic NES Series games are a bit different: they write only 16 bits at a time, to half of the register rather than the whole thing. Since I had assumed that games would only ever write 32 bits at a time, the emulator queued the two samples the game had written plus two empty samples on every write. This trivial oversight completely garbled the audio. After a simple fix, the games worked normally.
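The difference between the two behaviors can be sketched in Python as follows. This is an illustration rather than mGBA's FIFO code: the function names are invented, and the sketch assumes that samples are queued lowest byte first.

```python
from collections import deque


def push_assuming_32_bits(fifo, value, width_bits):
    """Buggy model: treats every write as four samples, whatever its width."""
    for i in range(4):
        fifo.append((value >> (8 * i)) & 0xFF)


def push_by_width(fifo, value, width_bits):
    """Fixed model: queues one sample per byte actually written."""
    for i in range(width_bits // 8):
        fifo.append((value >> (8 * i)) & 0xFF)


def play(push, writes):
    """Feed a list of (value, width_bits) writes through a push function."""
    fifo = deque()
    for value, width_bits in writes:
        push(fifo, value, width_bits)
    return list(fifo)


writes = [(0x2211, 16), (0x4433, 16)]  # two 16-bit writes, four samples
print([hex(s) for s in play(push_assuming_32_bits, writes)])
print([hex(s) for s in play(push_by_width, writes)])
```

The buggy version interleaves two zero samples after every pair of real ones, which is enough to ruin the audio; the fixed version queues only the four samples that were written.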
Success, but why all the trouble?
I don't know why Nintendo went to all this trouble for what are ordinary ports of NES games. Full-featured NES emulators had existed for a long time; good ones appeared as early as 1997. Although this was the first time these games could be played on a portable console, Game Boy Advance emulation on other portable devices had already existed for several years. Moreover, although the problems I described did block emulation in some projects, this kind of protection was used only in the Classic NES Series. I don't understand why they put so much effort into thwarting emulator developers in this particular case, but for me it meant several long evenings of stepping through the game code function by function until things went wrong.
With all of these problems fixed, the games now boot and run at 100%. Most of the fixes went into version 0.1.0, but some of the larger changes were held back. The games will be fully supported in version 0.2.0 once it is released, and you can already play them in the nightly builds.