Methods for modifying machine code: "selection" vs. "Genetic Engineering"
About the myths about GMOs have already been written here . In addition to this, I would like to share an excellent analogy with the analysis of real examples from the field of reverse engineering and the modification of machine code, which will be understood and closely related to programmers.
Machine Code Mutations
As an example, let's take the NES prefix (known as Dendy here), which uses the 6502 processor. The command system is very simple - the opcode is always represented by one byte, and each of the 256 does at least do something. No "protection" against the fool is not provided, and almost any random set of bytes will be executed without resistance from the processor. Thus, we can take a ROM of a game, fix random bits in it (we will call it “mutations”) - and after launching we see funny glitches in various unexpected places, but in general the game will most likely work. It seems that on YouTube there is a whole genre of this video. The machine code obtained in this way is probably not very correct, but in most cases the processor will be able to execute it and do something. ')
As it turned out, such a technique is used not only for fun (and playing familiar games with unexpected glitches is quite funny), but also for getting specific modifications to yourself: they make a large number of “mutants” and look for the one in which the desired effect manifests itself. Just as in modern methods of selection, when the embryos of organisms are exposed to mutagens (which leads to random changes in the genetic code), and then from those that could grow, those that have the desired trait are selected. The organisms thus obtained receive, in the appendix, a mass of other undesirable mutations. Get rid of them by gradual crossing with a normal species, seeking to obtain a more or less sane organism with the desired trait and a minimum of other mutations that were noticeable. The same can be done with machine code.
"Genetic engineering" on machine code
As a direct analogy, reverse engineering is suggested here. We study the machine (binary) code, understand how it works (in whole or only some of its parts), after which we make the changes that will most likely lead us to the target effect.
Programmers have a huge number of tools for analyzing machine code (all kinds of disassemblers, debuggers, and the like). Despite the fact that the machine code itself is not intended for human reading, it was invented by people and documented, so creating tools to transform it into a more understandable form (assembly listing) was not a problem. Also, almost any program under investigation was most likely also written by man, so what makes its code usually looks logical and understandable.
It is clear that geneticists are much more difficult. There is no official documentation that could shed light on any questions that may arise. The genetic code was “written” by chance and selected according to the principle “whether it could survive, adapt and give offspring”, so surely it is far from logical and optimal, rather even any obfuscation of program code we have invented will seem just childish prattle. However, this does not mean that the genetic code cannot be studied and tried to change it.
Prehistory
For me, Dendy has always been more than just a prefix. I not only played it, but also spent considerable time inside it with a soldering iron in my hands for some simple modifications. On the way somewhere I often thought about how these games are created and how it works inside. Surely, many of you, too, once asked such questions, such is the nature of future IT-people.
Years have passed. With a certain periodicity I immersed myself in an emu-topic, learning everything new on thematic sites, but I did not dare to plunge into the study of 6502 assembler and NES architecture. Internal conflict is rational and irrational. I have long convinced myself that I did not need to spend time on it, but ... I lost my temper. Looking at what interesting things do emu-scene enthusiasts do, I took up my old idea with a bright thought: “I will be able too!”.
Surely many of you remember the “9999-in-1” cartridges, which were remarkable for the beautiful menu with a romantic plot by the sea, flying seagulls and pleasant music (Unchained Melody). I disassembled this menu and made a tribute demo of Unchained Nostalgia based on it.
The main idea is that there is no list of games, and the slides switch to the music. In the free sky, I drew meaningful clouds and stars; implemented automatic slide switching in sync with music and the ability to manually switch; added auto-correction of playback speed and pitch for systems other than Dendy (after all, this menu was made specifically for Chinese family names, and they differed from the original NES, which is why the original menu works too fast on NES NTSC, and on NES PAL it turns out to be too low sound); I could not resist the temptation to add a number of curious Easter eggs. That is, the approach used does not limit the imagination in any way and allows, in principle, to make any changes for some finite time.
However, similar ideas come to mind to different people.
After the release of my demo, I discovered that someone also tried to implement the same idea. The author indicated that he used “corruptor”, which is usually used to create corrupt versions of games with various glitches.
If someone had previously told me that such a “random” method is actually used to modify the machine code with the desired effects, I certainly would not have believed it. After all, this requires a lot of time and luck. Although on the other hand there is no need to study and understand the machine code. Since I already had a “live” example of such a hack in my hands, nothing remained but to investigate his work.
Slides switch automatically, about every 2 seconds (which is too fast). There is no manual control. When you change the slide for some reason, the screen flickers twice. The main thing is that it works! But how?
Traces of surgery
In the cartridge, the program code and graphics tiles are stored separately. The code is in PRG ROM, the tiles are in CHR ROM. The second can be easily changed using standard tools, for this reason, for Dendy there are a huge number of graphical hacks of famous games, where the code is left completely unchanged.
Compare the CHR ROM of the original menu and the hack under study: On the face of the traces of direct "surgical" intervention. For some reason, tiles of numbers, dots and a small gull were cut. Let's try to return the original CHR ROM and see what was hidden in this way.
Now it has become a little clearer. The input was broken in the program code, because of which the menu thinks that the “down” button is always pressed. Also the output of game titles was broken. Apparently, it did not work to break the output of the numbers of games and the little gull (this is the cursor pointing to the selected game), so they had to be hidden by cutting out the corresponding tiles in CHR ROM.
... and some magic!
Digging deeper. Using the disassembler, let's see what exactly the bytes mutated in the PRG ROM do.
This is the NMI interrupt handler, which is called when the next frame is drawn. Apparently, the address of the called function was corrected. The original function was engaged in reading the state of all buttons and writing their values ​​into one byte in the form of a bit mask. The new call executes only the last instruction in this function, which saves the result from the Y register to a variable with the mask of the currently pressed buttons. Thus, the garbage value of the Y register is used, which remains after the execution of the previous code. In most cases, there is a trash value of 0x07, which in binary form looks like 00000111 and corresponds to the Down, Left, and Right buttons simultaneously pressed (the main code ignores Left and Right, processing the Down button) . However, immediately after the slide is switched to Y, the trash value 0xFF remains, which means that all the buttons were pressed simultaneously, which eventually caused the debugging code, which in the original menu (on Left + Start + B) displays the revision number that is accompanied by disabling PPU for one frame. This causes double screen flicker.
Original Code
Hack code
LDA #0 STA $2000 STA $2001 LDA #$22 STA $2006 LDA #$AC STA $2006
LDA #0 STA $2000 STA $2001 SBC #$22 STA $2006 LDA #$AC STA $2006
The output code of the revision number on Left + Start + B is broken here, so even though this debugging code is executed every time due to the problem described above, we still don’t see the revision numbers on the screen. It turned out in a funny way: the instruction that set the value of 0x22 into register A was replaced with an instruction that subtracts the number 0x22 from register A. Since the register A is always 0 before this, the number 0xDE is obtained. The original code set for PPU the beginning of the output at 0x22AC in the video memory, which is located within the first screen. The new code, it turns out, sets the address 0xDEAC, which is far beyond the limits of the available video memory (the maximum allowed address is $ 3FFF), and as a result, the inscription is displayed “to nowhere”.
Original Code
Hack code
ASL A TAX LDA $EAB3,X STA $0A LDA $EAB4,X STA $0B LDY #0 loc_C380: LDA ( $0A ),Y BEQ loc_C38B STA $2007 INY JMP loc_C380
ASL A TAX LDA $EAB3,X STA $FA LDA $EAB4,X STA $0B LDY #0 loc_C380: LDA ( $FA ),Y BEQ loc_C38B STA $2007 INY JMP loc_C380
Here we had a broken code that displays the names of games. When writing a pointer to the output string, the low byte of the 16-bit pointer is written at 0xFA instead of 0x0A, and the high byte is written as it should - at address 0x0B. I was lucky that 0xFA was not used, and it did not break anything further. Next, the program reads the line pointed to by a pair of bytes 0xFA and 0xFB (0xFB is always 0). By a happy coincidence, all the pointers thus obtained point to a memory filled with zeros, so the lines with the name of the games are not output.
findings
Have you been able to feel how everything is “famously twisted” in the modifications created in such a “random” way? One failure props another failure, and somehow it generally gives a result that is close to the desired one. At the same time, there are simply incredible dependencies between random and, at first glance, unconnected parts of the code, and with the slightest further modifications (even correct ones themselves), everything can collapse. Full reverse engineering is much more reliable. The same can be said about genetic engineering, in comparison with traditional selection.