[NES] Writing level editor for Prince of Persia. Chapter one. Acquaintance

Chapter One , Chapter Two , Chapter Three , Chapter Four , Chapter Five , Epilogue

Disclaimer

In childhood, like many born in the 80s, I had the prefix Dendy. A clone of the Japanese FamiCom, presented to us by the good Chinese, and distributed by the notorious Steepler, painted the childhood of many from the generation of the 80s in bright colors. Time after time, passing my favorite games far and wide, finding all possible secrets (and, often, without books with loud headlines in the spirit of “Secrets and Passages of 100,500 +1 games”, the value of which tends to zero), I wanted to play them more and more, but with new levels, new secrets and new features.

Some games provided a built-in level editor (for example, Battle City, Fire'n'Ice aka Solomon's Key 2), but most of them, of course, did not provide such an opportunity. I wanted to play (naturally!) In the new Super Mario Bros (oh, how I loved and hated the Chinese who released the 99 ... 9 in 1 cartridge, which had levels A-1, B-1, ... Z-1, which is impossible had to go through, or which were duplicates of the original levels, but with modified textures), Duck Tales 1, 2, and many others.
')
With the advent of the computer and the possibility of emulating games on it, the light at the end of the tunnel of my dream dawned on me, and hope began to sneak in me. Various editors of games started appearing in which it is enough to specify a ROM file and you can not only see the whole game with all its secrets and pitfalls, but also add something of your own.

Meanwhile, the editors for some games were almost on the first links in Google, while others were hidden away or somewhere far away (but they were still), or were completely absent. Having found editors for most of my favorite games, I could not find an editor for a Persian prince. Yes, there are editors for the DOS version, there are for SNES, but my native, NES version, was deprived of such a treasure.

The various kinds of resources on NES and its emulation were not very willing to give in to my understanding and I remained somewhere at the level of noob.

And, once, at an hour of unprecedentedly hot sunset, in Moscow, I pulled the belt, opened the HEX-editor and the emulator with a debugger, and began to study the mysterious for me set of bytes contained in the ROM.

I began ... However, before I begin to tell further, I warned:
It is not recommended to climb under the cat professionals. There are considered the "zhivoderskie" beginner methods! Your nervous system may be irreversibly disrupted! I warned.

Understand the essence

Looking at the set of bytes in the HEX-editor and the set of instructions of the 6502 processor unknown to me at that time, I was sad. This whole picture did not clarify anything. But, eyes are afraid, and hands do. We begin to study what is.

NES header

This is the first thing that lies on the surface. Let's start with it.
The header is 16 bytes in size, where, in addition to the signature, there is technical information about the ROM file. The ROM file itself is a collection of data and technical information in a single file. Let's look inside the title.
4E 45 53 1A 08 00 21 00 00 00 00 00 00 00 00 00
4 bytes: Signature NES->;
1 byte: The number of 16 kB PRG-ROM banks;
1 byte: Number of 8 kB CHR-ROM banks;
2 bytes: Flags;
1 byte: Number of 8 kB PRG-R A M banks;
2 bytes: Flags;
The remaining tail is zeros.

The flags store specific information about working with video memory, the presence of a “battery” in the original piece of hardware, mapper and other technical information of little interest (I will not describe it, because in this particular case it was not needed).

From the header, we find out that we have Mapper # 2 (the byte with the mapper number is made up of the halves of the sixth and seventh bytes) and 8 16 kB of PRG-ROM banks. Everything else is missing.

It follows from the documentation that the PRG-ROM banks are the actual code + arbitrary data that will be used directly by the CPU. CHR-ROM banks are data for a video subsystem (PPU). The CPU cannot directly access them.
How do we get the image if we do not have a single CHR-ROM? Very simple: in PRG, we store data that is copied all the time into PPU memory. Uncomfortable? Yes. But we have, in a sense, more power over this data.

Next, we find out that Mapper # 2 actually implies several different mappers, identical in functionality but differentiated by structure: UNROM and UOROM, combined by name in UxROM. The difference is only in the number of PRG-ROM banks: in the first 8 banks, in the second - 16. For any of them, the last bank is fixed at the end of the RAM ($ C000- $ FFFF), and the remaining 7 (or 15 for the second case) can be switched into the memory area of $ 8000- $ BFFF.
This is all technical information which, at the moment, does not tell us anything sensible.

Stocking up on canned food

So, we can split the ROM file into 9 components: Header and 8 banks. There is no sense to edit the title, because information for the emulator program is stored there, and we don’t know how to edit banks. First, the banks do not have any strict structure (as, for example, in PE-format, where the code and resources lie in their sections) and the data can be mixed chaotically with the code. Maybe we are not lucky at all and the data for building levels are formed by executable code.

But, for now let's estimate the options, how could we get what we need from whole pieces of binary porridge:

The most difficult, but also the most universal: sequential reverse code debugger. Using this method, we will definitely get to what we need, on the way getting bonuses in the form of additional information about how the code is constructed. The disadvantages are obvious: we will spend a lot of time on the reverse, but this is still half the trouble, because we still don’t know anything, which means we will spend 90% of the time learning the assembler and various programming tricks on it. Those. Efficiency will be about 10%. Few;
The study of RAM, and then debugging at the breakpoints of the memory access (consider below). This method is already better. We have relatively few memory for study: out of 64 kB of available memory, half of us go to banks from a ROM file, half of this half is either reserved or used by IO ports. And finally, the remaining half still beats in half. One half is the IO ports for PPU (there are not many of them, but they mirror all of this half), and the second is divided into four parts of 2 KB. The first part is the actual RAM used by the code, and the remaining three are the mirrors of the first part. Thus, we have 2 kB of memory left to study, which we can easily study with money. The efficiency of the method is higher, because we will have before our eyes live data that we can change right at the time of execution and immediately look at the result;
Study read data. At the time of going to another level or going to another room, we study the data that was read from the ROM file. As we remember, in Prince of Persia, each level is divided into rooms between which it runs;
“Artisan” method - “Brutfors”: changing one byte one by one, screenshot of the result, and why study a bunch of screenshots.

We stock up on the necessary amount of material and proceed to study it. We will study in the reverse order: from simple to study (just look at the screenshots) to complex (we study the debugger listings).

Armed with tools:

Favorite yap for writing helper utilities. Maybe whatever. I used C ++ as part of VS2010;
Emulator with debugger. I used versions from different branches of the FCEUX emulator. FCEUX and FCEUXDSP. The first has a simple, clumsy debugger that can do very simple things. The second one has very powerful tools for debugging and studying memory, but unfortunately it often fell, so in simple cases I resorted to the first one;
Any hex editor. I used WinHEX, which allows, without saving the file, to launch it for execution (we edit any byte and press Ctrl + E).
For reference, you can use IDAPro with the NES bootloader. You can use the loader from cah4e3. But do not expect miracles from her. banks in the process of changing dynamically change, and the correct code will be only from the last bank.

Before starting, let's take a look at what UxROM is:
Generally speaking, a mapper is an algorithm from the point of view of an emulator, and from an iron point of view is a controller that handles bank switching in memory.
First: Why does he do it?
Second: How does he do it?

The first question is simple: the RAM, which the processor can address, is only 64 kB, and not only the code and data, but also the I / O ports, as well as a bunch of other things (like nonvolatile memory, where intermediate results of some games). Found a simple solution: as the code is executed, not all the data in memory is required for operation, so they can be simply deleted, and in their place put more important data at a given time. Next to the ROM-memory on the cartridge they put a controller, which, on command, displays the desired piece in the address memory. Depending on the complexity of the game, these controllers differed in their fillings. The Chinese picked it up on time and came up with a lot of different (adequate and not-so) mappers. Because of this, we have a large number of multiplays.

In the second question we will consider only the work of the mapper UxROM, since the rest are not interesting to us now. Mapper takes the last pot (# 07 or # 0F), puts it at $ C000- $ FFFF and no longer touches. All other banks are switched on as needed after the recording (in general) at any address from the bank $ C000- $ FFFF space (# 00- # 06 or # 00- # 0E). But this is done correctly as follows: at the address $ FFF0 + N, N is recorded, where N is the bank number, and as a result, we see the contents of the desired bank at the addresses $ 8000- $ BFFF.

We chop the bank with an ax. Method number 4.

For this, a small utility was written that changed one byte (simple increment: Byte ++), saved it in a separate file, then started the resulting ROM in the emulator, executed the screenshot and closed the emulator.
It would be reasonable to reduce the number of screenshots, because to study over 130.000 screenshots even quickly would be difficult.

Since we only have PRG-ROM banks, some of them probably also have tiles that are not interesting for us. We will try to exclude them.
We take any tile editor, open a ROM file in it. I used Tile Layer Pro - it's quite an ancient program, but it knows its work. It looks something like this (in the screenshot, the tiles from the Final Fantasy game). The status bar of the program window indicates the offset of each tile. We can scroll through the data window to the point where the tiles obviously end and the garbage data starts in terms of graphics. Scrolling through, we find out that the first two banks are graphics. We miss them and 6 banks remain. Already "only" 96 kB. Difficult, but still easier.

Well. How can we find the data we need in this way? Very simple: skimming through the screenshots, we will see that in some of them, we consistently change the blocks in the rooms. The room consists of 10x3 blocks, respectively, in 30 screenshots in a row, we must (but are not required to!) Adjacent blocks will be changed to some other blocks: for example, a “concrete” block may change to a column or something else.

We start the brute-force utility somewhere aside, and we will start the study of the data read from the ROM.

We cut the cover of the can with a cleaver. Method number 3.

This method is similar to the previous one, but we significantly reduce the amount of data to be studied. How?
In FCEUXDSP there is a tool that saves read data to a neighboring file. Between pressing the Start and Pause buttons, the read data is placed in the file exactly in the place where it is stored in the original. Thus, we can open the data commit dialog, click Start, run from one room to another in the game, click Pause, and, like in the previous paragraph, examine the recorded data. This data will be significantly less. In fact, the code itself will show us something to pay attention to. And going through a hundred or two bytes of labor will not be even manually.

With this, I suggest to pause, go to the kitchen, make coffee and open the documentation for instructions on the processor 6502.
Before you use the method number 2, it would not hurt to get acquainted with the enemy.

We assemble a microscope to examine the contents. The devil is not so terrible as he is painted.

The processor 6502 has only 56 documented instructions , so a cup of coffee is enough to at least briefly familiarize yourself with them.
Since it will be difficult to figure out the code right away, I have come up with a simple C-like language for learning, into which the assembly sheet can be easily translated.

First, let's highlight a few points from the documentation:

Addressing can be direct register <-> memory (IMM): LDA $ 00; STA $ 00;
Direct with participation of index registers: register <-> memory + INDEX: LDA $ 0000, X; STA $ 0000, Y;
Indirect with participation of index registers: register <-> pointer + INDEX: LDA ($ 00), X; STA ($ 00), Y;

In indirect addressing, the processor extracts a 16-bit pointer from two cells (within the first page of $ 00- $ FF memory), adds the value of the index register to it, and then works with the cell at the address thus obtained.

From here we make the following pseudo code conventions:

Variables are denoted as $ XXXX (where XXXX is its address);
We denote direct addressing as #XXXX [Y] (where XXXX is the address from which we are addressing). Analog: * ((char *) (XXXX + Y)) = A;
We denote indirect addressing as $ XXXX [Y] (a complete analogy with arrays in C);
All procedures we have the same form: char sub_XXXX () {}. Since the NES does not have any agreements on the transfer of arguments, we will not have any arguments. Any data, as a rule, is transmitted either through registers or through variables.
Registers have original names (A, X, Y).
Eight-bit numbers will be written as #XX in HEX

Take the simple bank switch code:

 $F2D3:84 41 STY $0041 = #$00 $F2D5:A8 TAY $F2D6:8D D1 06 STA $06D1 = #$06 $F2D9:99 F0 FF STA $FFF0,Y @ $FFF0 = #$00 $F2DC:A4 41 LDY $0041 = #$00 $F2DE:60 RTS ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Let's try to translate line by line:

 char switch_bank() //       A { $0041 = Y; //  Y Y = A; $06D1 = A; //     #FFF0[Y] = A; //       $FFF0+N  N ( N -   ) Y = $0041; //   Y return A; }

Now let's take the code more complicated:

 ;;         PPU $F302:20 D3 F2 JSR $F2D3 $F305:8E 06 20 STX $2006 = #$41 $F308:A9 00 LDA #$00 $F30A:8D 06 20 STA $2006 = #$41 $F30D:A2 10 LDX #$10 $F30F:A0 00 LDY #$00 $F311:B1 17 LDA ($17),Y @ $020E = #$03 $F313:8D 07 20 STA $2007 = #$00 $F316:C8 INY $F317:D0 F8 BNE $F311 $F319:E6 18 INC $0018 = #$02 $F31B:CA DEX $F31C:D0 F1 BNE $F30F $F31E:4C 10 D0 JMP $D010 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

The first stage of translation:

 char sub_F302() { sub_F2D3(); // switch bank. Bank counter in A register // $2006 – PPU Address register // $2007 – PPU data write //  $2006     // ( ,  ) // 2007       //  PPU.     PPU //    1. $2006 = X; //    $2006 = #00 //    X = #10; label_F30F: Y = #00; label_F311: //   $0017:$0018   // ,  ,     PPU $2007 = $0017[Y]; Y++; if ( Y != #00 ) goto label_F311; $0018++; X--; if ( X != #00 ) goto label_F30F; return sub_F2D3(#05); //  5-  }

And the last stage of the translation:

 void WriteDataIntoPPU(char Bank, char PPULine) { switch_bank(Bank); // switch bank. PPUADDRESS = PPULine; //    PPUADDRESS = #00; //    for(X = #10; X > 0; X--) { for(Y = #00; Y <= #FF; Y++) { PPUDATA = $Tiles[Y]; } $Tiles += #100; //     } return switch_bank5(); }

In two simple steps, we rewrote a hard-to-read (for a beginner) assembly listing in clear code. I did not consider the procedure switch_bank5, there is a banal code for assigning register A to the number # 05, and then calling the bank switching procedure sub_F2D3. For the development of automatism in translating the code into a readable two or three procedures I have had, then everything becomes much easier. After I had accumulated about a dozen 5-7 kB of text files, it was simply not necessary to translate the code - everything began to happen by itself in the head.

Go to the bouquet and candy period

In the second chapter we will get acquainted with the last two ways and more deeply penetrate the mysterious world of NES. I want to say that in the end we will be able to find the desired data by combining the first three methods. The fourth we will reject behind obvious minuses.

Assuming the appearance of the questions “Why did I describe it?” I will answer right away: when researching any subject, all the methods that can give a result are good. In our case, this method can be useful as the study of the black box by poking it with needles without getting into the jungle of code: some point will give its result. This method has obvious advantages of brutfors. One way or another, we stumble upon something.

Source: https://habr.com/ru/post/187876/

All Articles