Reverse engineering game Lost Vikings

After an interesting reverse development of the game engine Comprehend (see Recomprehend ), I selected a new project for reverse engineering of the game under DOS. Over the years, different people have reversed many of the old popular games and published specifications and tools for them. For example, the site shikadi.net has a lot of information about the games I played in my childhood.

I found that Blizzard’s The Lost Vikings reverse engineering (then called Silicon and Synapse) did not seem to have made any serious attempts to reverse-engineer. The game was released in 1993, at the sunset of the DOS era, and I really liked it in my youth. The Lost Vikings is a platforming puzzle game in which the player controls three Vikings, each with their own skills. Vikings need to combine their strengths to solve puzzles and complete levels with different themes: spacecraft, prehistoric world, Ancient Egypt. The image below shows the first level of the game (source: Strategy Wiki ):

')
It seemed that this game will be quite easy to disassemble. The levels are based on tile maps and contain simple puzzles: buttons that turn on and off objects, mobile boxes and a crane that lifts objects. And in fact, most of the reverse development project was fairly straightforward. The game has one batch data file containing compressed file blocks. The blocks encode various game resources, such as sprites, maps, sounds, etc. I have written several utilities that you can use to view game resources: The Lost Vikings Tools .

Virtual machine

An interesting aspect of the engine is that the objects in the game use a template system of classes. For each world, there is a block in the data file that defines a set of object classes. For example, in the first level shown above, both emergency exit doors are of type class 0x4f. Reverse engineering of the object class code led to the following function:

While skipping the alternative path to the address 0x142a2, this code uses si as an index in the structure of the array of object classes (each class template consists of 0x15 bytes). The word at offset 0x3 is the structure used as the address in the ES segment. The ES segment contains all the data of an object class template block. Then the code goes into a loop at address 0x142a6. This cycle receives the next byte from the ES segment and uses it as an index of the function table. At each step of the loop, bx contains the address in the ES segment, and si contains the current opcode (opcode).

It is interesting to note that the unconditional transition is used in the loop, and therefore it never ends. At least with normal program execution. It seems that each object class has some kind of program associated with it based on opcodes, which is interpreted by this function. Initially, I just explored some of the functions in the call table. Here’s what the first one looks like in IDA graph mode:

Analysis of the IDA stack here fails, and for good reason. pop ax as the first command of a function is a rather strange behavior. The x86 call command loads the command pointer (ip) onto the stack, and the ret command retrieves it to return to the caller. The pop command here actually drops the return address. This means that the next ret command will go back two frames of the stack instead of one. This opcode performs an exit from an infinite interpreter loop.

To understand how the opcode is actually implemented, we need to switch to linear IDA mode:

I commented on the function names with their respective opcode numbers so that this part of the code would be clearer. Code reuse is quite clever here. The easiest way to start is to study opcode 0x03. Recall that in the processing opcodes cycle there is data of object class blocks loaded into the ES segment, and bx is used as a pointer to the virtual machine command. Thus, opcode 0x03 is an unconditional branch instruction. It loads the word at the address of the current command pointer, sets the command pointer to it, and then returns to the opcode processing cycle.

Working in the reverse order, opcode 0x05 skips the next word operand and stores the next address in an array. The reverse of other functions indicates that word_28522 is the index of the current instance of an object, that is, this opcode stores one address for each object at the level. Then it restores the value of bx and passes in the code to the opcode 0x03 (transition). Therefore, this opcode seems to be a very limited call command with only one stack level.

Opcode 0x00 saves the current function pointer to the global array. Note that it differs from opcode 0x05, which stores the address after the operand of the call address. Then it goes to the opcode call handler. However, the command will not be executed due to the first pop command. Instead, the calling loop will exit. The address stored by this command is used as an alternative path to the function call loop, which I did not consider above. Here the opcode 0x00 is used as something like a coroutine exit. It saves the current position in the program and exits the opcode processing cycle, allowing the game engine to perform other tasks: update the graphics, retrieve user input data, etc., returning to where the virtual machine program exited. This allows the virtual machine program to perform complex tasks without idle the rest of the game engine.

Disassembling Opcodes

At this stage, I already had a general idea of how the opcode handler of the virtual machine works. Having studied the table of function calls, I found about 215 implemented opcodes there, so instead of developing them directly in reverse, I started writing a simple script to decompile the program for the desired class of objects. Thus, I could focus only on opcodes called by objects on the first level.

At this point, I basically tried to determine how many operands opcodes have, and what their main tasks are. If the opcode handler was short, I usually tried to completely figure out what it was doing. If it was more than a few blocks of code, then I just tried to find out how many operands it has, and whether it performs a conditional transition or a call. I did this because sometimes the purpose of an opcode becomes apparent from its surrounding context in a disassembled program, which is often easier to understand than doing code reversal.

Here is an example of a simple opcode:

This opcode takes an operand-word and stores it in a global variable. Reversing other functions indicates that this global variable is a temporary storage or general purpose register. There is only one such register in a virtual machine, but it also has the following opcode:

This opcode saves the current value of the general-purpose register to an address in the data segment (DS) specified by the operand-word. Programs in the game data file use this opcode to write to certain parts of the game data, such as Viking inventory slots. Several DS offsets are also used as additional time values when they need more than one. For example, offsets 0x0206 and 0x0208 are often used as temporal x and y coordinates of an object.

This opcode and its opposite opcode 0x53, which reads from DS to the general register, are interesting in that they can read and write to DS any game data, and not just those that are designated by the creators of the original game. This gives us interesting prospects for expanding the original game.

More complex is the opcode 0x14:

At first glance, its reversal should not cause problems. It receives a byte operand and calls a couple of subfunctions. It then receives another byte operand and calls the same subfunctions again. But if you look at the first subfunction (sub_15473), then the case becomes a bit more complicated:

The analysis of the IDA stack fails again, and it shows four exits from this block, which I cut off because all of them are incorrect. What actually happens is that the first byte operand is passed to this function in ax, which is then used as the index of the jump table. Although the AND operation is performed on ax for 7, in fact there are only four real values in the transition table. Looking at the type 0 in the transition table, we will see the following:

This is where the immediate operand word is loaded. This function fragment executes the retn, which is returned from the top-level opcode handler 0x14. It is easy to see why the IDA has problems with the correct disassembly of this code.

Other entries in the jump table load the values in different ways, and then each of them retn again. Type 1 takes a byte operand used as the index of the object field. The object has fields for values such as x and y offsets, flags, etc. Type 2 receives the operand word and loads the value from DS (as opcode 0x53 from the example above). Type 3 receives a byte operand and loads the target field to perform actions, such as collision detection.

Let's go back to the top-level opcode handler 0x14. Here the next is called sub_15470. It is only three bytes ahead of the first sub-function called. And in this there is again a smart code optimization. This subfunction simply shifts ax to the right by 3, and then proceeds to the code that we just reversed above. So, the first byte operand of the opcode 0x14 is used to encode how the next two arguments are obtained. Reversing the processing of the second byte operand shows that it works in the same way.

After receiving all its operands (of variable length), the opcode handler 0x14 calls sub_13809. I will not insert the code here, because the function is quite large and causes many other subfunctions, each of which is also voluminous. This is one of those cases where I did not bother reversing the actions of a function, because later they will become clear from the context.

CISC virtual machine

The commands in The Lost Vikings virtual machine are variable length. In the case of the opcode 0x14 and other similar opcodes, determining the length requires decoding some operands. The presence of variable length commands means that the program must be disassembled iteratively, at least determining the number of operands for each new opcode in the program. If all opcodes had a fixed length, then it would be possible to skip unknown opcodes.

There are also commands that perform standard operations used by many programs. For example, many programs perform summation or subtraction from the current position of an object based on its direction. This can be encoded as several commands, but opcode 0x91 receives a single operand word as an offset in DS, and then adds or subtracts from it the current value of the temporary register based on the current horizontal direction of the object (flag 0x0040). The following pseudocode demonstrates how opcode 0x90 works.

if (this.flags & 0x0040) DS[operand] -= var; else DS[operand] += var;

Due to this, much less bytes are required for encoding, especially since this operation is performed by several programs. In the days of DOS, when disk space and computing power were in short supply, this made a lot of sense. For a modern reverse developer, this simply adds a few new opcodes that need to be sorted out.

Disassembler

I wrote a simple Python script that can disassemble virtual machine programs into something that vaguely resembles C. It decodes program instructions linearly, keeping the addresses of jumps and calls. When he stumbles upon a function that stops or redirects the execution of a code, for example, a return or an unconditional branch, he stops and checks other unvisited addresses in the program.

In the programs of the virtual machine, they used optimization tricks similar to those used in the binary code of the game. For example, some exit paths from subprocedures are reused in virtual machine programs by switching to them from other subprocedures. The disassembler copes with this by decoding the commands twice, so two sub procedures can have code with the same addresses.

The disassembler can trim access to the temporary register to make the final code more readable. For example, in the virtual machine programs the following sequences are often used:

 var = 0x1000; obj->field_10 = var;

The disassembler reorganizes them as follows:

 obj->field_10 = 0x1000;

The disassembler does not attempt to rebuild the blocks of code in such constructions as if or while, so the resulting programs look like spaghetti code. Some large programs themselves can be made full-fledged projects for reverse engineering.

Full program

I started with a program for a turret with a cannon in the upper left room of the first level (object class 0x04). It seemed to me that it should be a fairly simple program. A turret with a gun shoots shells at a certain interval, and the Vikings cannot destroy it. The following shows the program obtained after processing my disassembler. There is a comment above each opcode that shows the address in square brackets, an opcode in quotation marks, followed by operands.

The tower program looks like this:

 main_374d: { // [374d] (19) 37bc this.field_21 = 0x37bc; this.field_22 = 0x0001; // [3750] (97) 00 01 if (0x0001 & (1 << 0)) var = 0x0001; // [3753] (00) this.save_state(); // [3754] (9c) 18 08 if (!(var)) this.db_flags &= ~(1 << 12); label_3757: // [3757] (2f) vm_func_2f(); // [3758] (00) this.save_state(); // [3759] (01) nop(); // [375a] (51) 0000 // [375d] (8c) 30 3798 if (this.field_30 != 0x0000) call sub_3798; // [3761] (52) 1c // [3763] (77) 0000 3757 if (this.update_time != 0x0000) jump label_3757; // [3768] (19) 37c8 this.field_21 = 0x37c8; this.field_22 = 0x0001; // [376b] (52) 1e // [376d] (57) 0206 g_tmp_a = this.xoff; // [3770] (51) 0004 // [3773] (91) 0206 if (this.flags & 0x0040) g_tmp_a -= 0x0004; else g_tmp_a += 0x0004; // [3776] (52) 20 // [3778] (57) 0208 g_tmp_b = this.yoff; // [377b] (51) fff8 // [377e] (5a) 0208 g_tmp_b += 0xfff8; // [3781] (51) 001e // [3784] (56) 1c this.update_time = 0x001e; // [3786] (02) 3030 vm_func_02(0x3030); // [3789] (14) 12 0206 0208 00 0000 0000 05 new.x = g_tmp_a; new.y = g_tmp_b; new.unknown_a = 0x0000; new.unknown_b = 0x0000; new.unknown_c &= 0x801; new.type = 0x05; spawn_obj(new); // [3795] (03) 3757 jump label_3757; } sub_3798: { // [3798] (51) 000a // [379b] (73) 30 37a5 if (this.field_30 == 0x000a) jump label_37a5; // [379f] (51) 0000 // [37a2] (56) 30 this.field_30 = 0x0000; // [37a4] (06) return; label_37a5: // [37a5] (52) 32 // [37a7] (69) 0a 37ba if (this.argument < this.field_32) jump label_37ba; // [37ab] (52) 32 // [37ad] (5c) 0a this.argument -= this.field_32; // [37af] (51) 0000 // [37b2] (56) 30 this.field_30 = 0x0000; // [37b4] (51) 0000 // [37b7] (56) 32 this.field_32 = 0x0000; // [37b9] (06) return; label_37ba: // [37ba] (0e) vm_func_0e(); // [37bb] (10) this.update(); }

Here you can see the following:

The function this.save_state () is the opcode 0x00. It is used as a result, allowing the program to save the place in which it was located and return to the game engine code.
My disassembler assigns names to some fields of an object. In this program, you can see the names xoff, yoff, flags, and update_time.
Names are also given to known DS offsets. g_tmp_a and g_tmp_b are offsets DS 0x0206 and 0x0208.
It uses opcode 0x14. I gave him the name spawn_obj (), because from the context of its application it is clear that it is used to create a new bullet object with each shot of the tower.
Opcode 0x19 assigns values to the fields 0x21 and 0x22 of the object. It is noteworthy that the field 0x21 is assigned a value resembling the address of the program. Reading the code and experimenting, I found out that this address is actually an offset for commands in a small secondary virtual machine that controls the display of graphics objects. Until I reversed this code.

Refactoring a program helps to understand what is happening in it:

 main_374d: { /*   */ this.set_graphics_prog(0x37bc); if (0x0001 & (1 << 0)) var = 0x0001; this.save_state(); if (!(var)) this.db_flags &= ~(1 << 12); label_3757: /* *   .     *     . */ while (1) { vm_func_2f(); this.save_state(); /* *    .  this.argument *       * . ,     *    . */ if (this.field_30 != 0x0000) { if (this.field_30 != 0x000a) { this.field_30 = 0x0000; } else { if (this.argument < this.field_32) { vm_func_0e(); this.update(); } else { this.argument -= this.field_32; this.field_30 = 0x0000; this.field_32 = 0x0000; } } } /*    update_time  ,    */ if (this.update_time == 0x0000) { this.set_graphics_prog(0x37c8); /*    */ this.update_time = 0x001e; vm_func_02(0x3030); /* *          *        . */ if (this.flags & OBJ_FLAG_FLIP_HORIZONTAL) new.x -= 4; else new.x += 4; new->y = this.yoff - 8; new->unknown_a = 0x0000; new->unknown_b = 0x0000; new->unknown_c &= 0x801; new->type_d = 0x05; spawn_obj(new); } } }

It is worth noting that the program still has parts and opcodes, the purpose of which I did not define, but the behavior of the tower is quite understandable. The object's update_time field is used to encode the shooting speed of the turret. The program itself does not reduce this value (at least in the obvious way). Perhaps this task is performed by one of the unknown opcodes or some part of the main game engine.

Hacking game

Of course, the opportunity to study the source programs used by the game is interesting in itself. But the most interesting thing about the way a virtual machine implements an object’s behavior is that we can change it or create new programs. It would be much simpler if you write a compiler (I will tell about it below), but for now we can only change the values in the hex editor manually.

I repeat - I chose to modify the program of the tower with a gun, because it is quite simple. At first I decided to change the type of object that the tower shoots. Success was variable. Attempts to implement the shooting of some objects, for example, a green alien, led to the "throwing out" or hang the game. The choice of other objects led to the fact that the tower did not shoot at all. I managed to get her to shoot fire arrows (in a normal game this is a bonus that one of the Vikings can use). It is also quite simple to increase the rate of fire of a cannon by decreasing the value of the update_time field.

The second modification I made became a bit more ambitious: I made the tower move instead of shooting. Opcode (0x14) to create a bullet is quite long:

 14 12 0206 0208 00 0000 0000 05

The virtual machine provides the ability to use the single-byte nop (0x01) command (which is quite noble on its part), which can be used to replace this command. The commands above have already set DS [0x0206] to the current position along the x axis of the tower, plus or minus 4, depending on its direction. So I just need to add commands to assign this value to the current position x. This requires only two commands:

 53 0206 // var = DS[0x0206]; 56 1e // this.x = var;

Thanks to this change, the tower can move, but nothing stops it. I tried to implement collision detection and rotation back, but I could not achieve proper operation. Deeper reverse engineering is required.

I collected a small video of my modifications:

I also laid out my disassembler in open access:

https://github.com/RyanMallon/TheLostVikingsTools/blob/master/dissassembler/lv_vm_disasm.py

Recompilation of the game

In the previous section, we talked about the reverse engineering of the virtual machine used to implement objects in Lost Vikings.

I used to test some of the opcode empirically, by manually patching the original data blocks in a hex editor. This approach worked, but was very monotonous and did not scale well for experiments with large programs or with programs containing loops and transitions. Above, I suggested that creating a simple language and a compiler would be useful for further developing the virtual machine. Therefore, I wrote it.

Building a compiler

To help reverse engineer a virtual machine and ease the creation of new programs, I decided to write a compiler. In order not to complicate things, I chose the single-pass method of recursive descent.

The design of the compiler is mainly borrowed from compilers such as PL / 0, which are often studied at university courses on compilers (see PL / 0 ). I will not go into the details of writing the basic compiler, because the Internet has enough information about it. In essence, the compiler gets the source program, performs lexical analysis to create a stream of tokens, and then parses the program, generating code.

Lost vikings c

I created a very simple C-like language for implementing object programs on it. One advantage of using a C-like language is that you can use an existing C preprocessor to support the definition of constants and macros. The language does not allow to define variables, the programmer is limited to the fields of objects and global variables defined by the virtual machine. Built-in functions provide support for generating opcodes for the operations of obtaining results and creating objects. The language is slightly different from C, just to preserve the simplicity of the lexical and parser.

Again, take for example the object of the turret with a cannon from the first level, which we modified above. You can implement an arrow-firing tower on Lost Vikings C as follows:

 #include "vikings.h" #define timer field_30 #define DELAY_TIME 40 function main { call set_gfx_prog(0x37bc); this.timer = 0; while (this.timer != 0xffff) { call update_obj(); call yield(); if (this.timer != 0) { this.timer = this.timer - 1; } else { //      //  . if (this.flags & OBJ_FLAG_FLIP_HORIZ) { g_tmp_a = this.x - 12; } else { g_tmp_a = this.x + 12; } g_tmp_b = this.y - 12; //    call set_gfx_prog(0x37c8); //   call spawn_obj(g_tmp_a, g_tmp_b, 0, 0, 7); //     this.timer = DELAY_TIME; } } }

Each object needs a main loop, which should never be terminated. The main loop must call the yield () function to allow the output to execute the virtual machine's processing loop. Otherwise, the game engine will freeze. Calling update_obj () does what its name implies.

The use of many fields of objects refers to a specific object. In our case, I used the field_30 field as a timer to control the speed of the tower. When the timer reaches zero, the tower shoots and then resets the timer.

The only thing left is to generate code for this program.

Code generation

The Lost Vikings virtual machine is not an ideal environment for a simple compiler. PL / 0 compilers usually generate code for an abstract stack machine, so a simple type expression is:

 this.x = this.y + 1;

will generate the following code:

 push this.y ;  this.y   push 1 ;  1   add ;    ,     pop this.x ;    this.x

However, The Lost Vikings virtual machine does not have a stack. It has a simple temporary register, object fields, and global variables. The program shown above is translated into a Lost Vikings virtual machine as follows:

 52 20 ; var = this.y 56 1e ; this.x = var 51 0001 ; var = 0x0001 59 1e ; this.y += var

It is much harder for it to generate code when compiling from a general purpose language. You may need to generate an intermediate representation in the first pass, and then in the second pass, change the order of commands, etc. to generate the final code.

Abstract machine

I came up with the following solution: creating an abstract stack machine for which you can comfortably compile over the Lost Vikings virtual machine. This is possible thanks to opcodes for loading and storing global variables. These opcodes receive as their operands offset in the data segment (DS) of the game. This allows opcodes to be used to load and store any arbitrary address in DS.

The compiler generates code by placing a fake stack on top of the (0xf004) data segment. The two addresses at the bottom of the fake stack are reserved for the special register: 0xf000 is the zero register, and 0xf002 is the flag register used for comparison. The above expression can now be compiled like this:

 52 20 ; var = this.x 57 f004 ; ds[f004] = var 51 0001 ; var = 0x0001 57 f006 ; ds[f006] = var 53 f006 ; var = ds[f006] 5a f004 ; ds[f004] += var 53 f004 ; var = ds[f004] 56 1e ; this.y = var

Despite not very high performance, this code is very easy to generate. My goal is to create a compiler for testing small programs to determine what opcodes or their combinations do. So far I am not very concerned about efficiency, in terms of both time and code size.

The second problem in creating a language for the original virtual machine is that many opcodes perform conditional function calls. For example, opcode 0x1a checks if the current object collides with a Viking, and if so, it executes a call command. It is desirable that the programmer can decide for himself whether to use a call or a transition (for example, a construction with if).

I implemented this by generating code for a standard helper function that sets a flag and performs a return. When generating a code for an opcode that performs a conditional call, the flag register is first reset. If the opcode makes a call, the flag register is set. You can then check the flag to make the transition call. For example, the following code:

 if (call collided_with_viking(0x01)) { this.x = 1; }

generates the following code:

 [c009] 51 0000 ; var = 0x0000 [c00c] 57 f002 ; ds[f002] = var,    [c00f] 1a 01 c0a5 ; if (collided_with_viking(0x01)) call c0a5 [c013] 53 f002 ; var = ds[f002] [c016] 74 f000 c026 ; if (var == ds[f000]) goto c026 [c01b] 51 0001 ; var = 0x0001 [c01e] 57 f004 ; ds[f004] = var [c021] 53 f004 ; var = ds[f004] [c024] 56 1e ; this.field_x = var [c026] ... ;     if ;     [c0a5] 51 0001 ; var = 0x0001 [c0a8] 57 f002 ; ds[f002] = var,    [c0ab] 06 ; return

Zero register is used here to compare it with the flag value. At the beginning of each program, the compiler generates commands to reset the zero register.

Software patching

Virtual machine programs for each game world are packed in a separate block in the data file. The compiler uses a simple approach: it adds the generated code to the end of the block, and then modifies the program title of the object so that it points to a patch in the program.

All this works in theory, but in fact there is one small problem. The game stores the virtual machine programs in an additional segment (ES) in the space specified by the DOS memory allocation API. The memory allocation function looks like this:

The memory allocation call for the virtual machine looks like this:

That is, 0xc000 bytes (48 KB) are allocated for programs. The problem is that the block for the programs of the world of the spacecraft already occupies 48972 bytes, that is, at the end of it only 180 bytes remain for the patch in the new program. Not so much when working with a compiler that generates very redundant code.

I discovered this problem when I tried to compile a large program. The game began to behave erratically, it either hung, or the cannon tower disappeared randomly. Such errors are quite difficult to track down, because this may be a bug in my compiler or generated code, a problem of misunderstanding of the principle of operation of some opcode, or a bug / restriction of the original game.

A quick solution is to patch the binary code of the game to expand the size of the memory allocated for the program. Increasing the allocated size to 0xd000 bytes (52 KB) does not cause problems and gives enough space for experiments with simple programs. I added commands for this to my compiler.

Empirical reversal

The purpose of the compiler was to simplify the reverse development of a virtual machine. To test a new opcode, you need to add an entry to the function dictionary in the code generator class and create a command generation function for the opcode. For example, one of the first experiments was to work with 0x41, which is used in several disassembled programs, such as objects of buttons / icons with a question mark.

His dictionary entry in the compiler initially looked like this:

 "vm_func_41" : (False, 4, emit_builtin_vm_func_41),

A tuple indicates whether a function will return a value, the number of arguments passed to it, and a code generator function. Functions are marked as returning values if the opcode performs a conditional call or transition.

After studying the opcode in IDA, I knew that opcode 0x41 is executed unconditionally and takes four arguments. His arguments use variable type encoding, which I described at the beginning of the article. It generates the following function:

 def emit_builtin_vm_func_41(self, reg_list): operands = self.pack_args(reg_list, 4) self.emit(0x41, *operands)

The pack_args helper function provides a package for the variable type arguments.

Now you can call it in a short test program. For testing, I use the collided_with_viking function (opcode 0x1a), so the opcode being tested runs only when the viking touches the tower. Note that the function collided_with_viking receives a single byte operand, the purpose of which I do not know, but the value 0x01 is quite suitable for it. I also use as a one-time trigger the field field_32 of the object, so that the opcode I test works only at the first touch of a viking object.

My test program looks like this:

 function main { call set_gfx_prog(0x37bc); this.field_32 = 0; while (this.field_32 != 0xffff) { call update_obj(); call yield(); if (call collided_with_viking(0x01)) { if (this.field_32 == 0) { call vm_func_41(1, 1, 1, 1); this.field_32 = 1; } } } }

As a result, this graphic “glitch”

appears in the game: It seems that the opcode is used to display the dialog box, but it is not clear why this graphic error occurs. If you look at some disassembled programs from the original game, you can see that opcode 0x41 is usually followed by unknown opcodes 0xcb and 0x42, none of which receive my arguments. When adding firmware to these opcodes in the compiler and restarting the game, a dialog box is displayed waiting for the button to be pressed, and then the window closes. So, opcode 0x41 shows a dialog box, opcode 0xcb waits for a button, and opcode 0x42 deletes the window.

Further experiments with opcode 0x41 show that its first argument is the index of the dialog line. Strings are stored in a binary game file, which is inconvenient for modding. The second argument is still unknown, and the third and fourth control the x and y offset relative to the object next to which the dialog box opens.

A good example of why this experimental reversal is useful can be the opcode 0xcb, discussed in IDA:

Although this function is small in size, and its purpose is clear (it changes global variables), it is not at all obvious what effect it has on the game engine. Studying cross-references for each of the global variables in IDA does not help much either. Each of these global variables is used in many places, and for none of them the purpose becomes immediately apparent. I could spend a lot of time in IDA, trying to figure out that changing these global variables tells the game engine to wait for a button press in the next game cycle. Empirical testing made it clear very quickly. The disadvantage is that I still do not know the exact purpose of each global variable in the function.

Interesting fields

One of the interesting features that I found when experimenting with the compiler is how objects can refer to each other. From the initial analysis, I already knew that there was support for references in the code, but the compiler helped me quickly understand the details.

The 0x3c field of each object identifies the current target object. Some opcodes, for example collided_with_viking (0x1a), automatically assign a value to this field, but you can also specify it manually. Vikings are always objects 0 (Baleog), 1 (Eric) and 2 (Olaf). After assignment of the target object, it can be managed through the target fields. Each opcode has its own options for changing the field of the current object (target field). For example, opcode 0x59 adds a temporary register to the field in the current object, and opcode 0x5b adds a temporary register to the field in the target object.

Other objects can control the fields in the Viking objects to control their behavior. For example, a 0x32 field for Vikings is the amount of expected damage. An object can inflict damage on a viking by adding value to this field. The Vikings themselves are partially implemented by programs in a virtual machine. The next time you start a viking program, it checks the expected damage field and deals the corresponding damage.

Of interest are also 0x12 and 0x14, which are the speeds for all objects in x and y. Before writing the compiler, I thought that the objects are moved either by adding / subtracting x and y offsets from their fields (0x1e and 0x20) or, possibly, using an opcode similar to the function. However, the game uses a couple of speed fields. Some objects, such as Vikings, automatically change their speed. Therefore, for example, if an object assigns a speed to the Vikings that raises them up, then in subsequent cycles of the game, the Viking accordingly changes its own speed in order to fall down again.

It seems that in the Viking objects there were not enough fields for storing everything, so each of the four Viking inventory slots is a global variable. An object can check whether the corresponding global variable is equal to zero, and give the viking object by directly assigning the value to the global slot variable of the inventory.

Advanced hacking game

To experiment with a virtual machine and demonstrate its flexibility, I wrote a slightly more complicated program for a tower with a gun on the first level. The tower checks which of the Vikings has touched it, and reacts differently. If Eric touches the tower, she gives him items. If Olaf touches, he pushes off into the air. When it comes to Baleog, the tower changes direction. Here’s what it looks like in action:

This level of flexibility in the game, created in the early 90s, is quite impressive. I added the source code of this program to the compiler's folder.

Work for the future

There are still quite a few opcodes, fields and global variables, the purpose of which I do not understand. The compiler lacks some functions, for example, while it supports only comparison operators == and! =, And it lacks most bitwise operators. It is a bit cumbersome to work, because it requires a separate unpacking of the original program block from the game data file, patching and repacking the data file.

All tools are public domain (public domain), their open source code is posted on github: https://github.com/RyanMallon/TheLostVikingsTools .

Old School Blizzard: Sprites, Maps and Palettes

I was engaged in reverse engineering of two early Blizzard games under DOS, The Lost Vikings and Blackthorne. Above, I wrote about the virtual machine used by The Lost Vikings to implement game objects. Blizzard has posted The Lost Vikings and Blackthorne freely available on Battle.net . It is worth noting that Blizzard created its early games, including The Lost Vikings, under the name of Silicon and Synapse. Blackthorne is the first game released under the brand Blizzard.

In this section, I will talk about how in two games the sprite formats and the tile engine are implemented. These two games have very similar engines, and some improvements have been made to Blackthorne after The Lost Vikings. In the future, I will call the game engine of both games "Blizzard engine."

Games store all their data in a single batch data file called DATA.DAT. A batch file is an archive that contains blocks of sprites, levels, sounds, etc. Most batch file blocks are compressed by a kind of LZSS compression algorithm .

I created some simple utilities that you can use to view sprites, levels, etc. from The Lost Vikings and Blackthorne. These utilities can be downloaded from GitHub (link above). In this section, I added code samples. All screenshots in the section are made using these utilities.

 ./sprite_view DATA.DAT -fraw -s -u -w344 -p 0x17b 0x17c ./level_view --blackthorne DATA.DAT 1 -h0x6e

Graphic mode

Both games use the popular VGA Mode X mode. Mode X is a 256-color planar graphics mode with a resolution of 320 × 240. "Planar" means that instead of a linear arrangement of pixels, as in VGA Mode 13h with a resolution of 320 × 200, they are divided into a set of planes. Planar graphics were originally designed to speed up graphics processing. It allows several memory chips to store separate planes and transfer them in parallel. This article on Shikadi.net has a good explanation of the work of plane graphics.

Mode X uses four planes. The first plane stores the pixels 0, 4, 8, etc. The second plane stores pixels 1, 5, 9, etc. Therefore, an 8 × 2 sprite is not linearly stored as pixels:

 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

Plane Mode Mode X stores them like this:

 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15

Basically, I did the reverse engineering of sprite formats, studying the data in blocks of a batch file, not the rendering code in IDA. I have a very rudimentary understanding of programming under DOS and VGA, and the rendering code for old games usually contains a lot of tricks for optimizing and cleverly using assembler that are hard to understand (for me).

Raw Sprites Format

The first format of sprites used in games is simply raw coded planar data. "Raw" sprites can have any width and height, divisible by four. Drawing raw sprites is simple:

 for (plane = 0; plane < 4; plane++) { for (i = 0; i < (sprite_width * sprite_height) / 4; i++) { offset = (i * 4) + plane; y = offset / sprite_width; x = offset % sprite_width; pixel = *sprite++; dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel; } }

Additionally, you can select the index of the color used as transparent. Raw sprites of the same width and height always have the same amount of data, but they do not use space very efficiently. This is especially true for the Blizzard game engine, because sprites use only sixteen colors (these are always values from 0x0 to 0xf), so four bytes per pixel are wasted. Sixteen colors per sprite are used because it allows you to apply clever tricks with a palette, which I will explain below.

Unpacked sprites / sprites with mask

The second format of sprites allows you to use transparency without sacrificing the color index, but the price is a more complex rendering algorithm and a slightly larger data size. In the utilities I developed, I called this format “unpacked”, but perhaps it would be better to call it “with a mask”.

In this sprite format, each set of eight pixels is preceded by a mask byte indicating which pixels to draw. Transparent pixels are still encoded with a value of 0x0, but are skipped when drawing. This allows you to use the color index 0x0 as an additional color.

The algorithm for rendering such sprites is as follows:

 for (plane = 0; plane < 4; plane++) { x = plane; y = 0; for (i = 0; i < (sprite_width * sprite_height) / 4 / 8; i++) { mask = *sprite++; for (bit = 7; bit >= 0; bit--) { pixel = *sprite++; if (mask & (1 << bit)) dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel; x += 4; if (x >= sprite_width) { y++; x = plane; } } } }

Various The Lost Vikings collected items are 16 × 16 unpacked sprites / sprites. They can be viewed using:

 ./sprite_view DATA.DAT -l2 -funpacked -w16 -h16 0x12f

Packed Sprites 32 × 32

The latest sprite format is used only in The Lost Vikings and is optimized for rendering 32 × 32 sprites. As in the unpacked format, each set of pixels is preceded by a mask byte, defining the pixels to be drawn. However, the packaged format does not store transparent pixels and packs two pixels into each byte, because only sixteen colors are used. If the number of pixels drawn is odd, then the last nibble is zero.

This format uses space more efficiently than the previous two, but it has a more complex rendering algorithm and variable length of sprites. Batch file blocks containing packed-format sprites begin with a header of 16-bit values defining the offset of each sprite.

The algorithm for rendering packed sprites is as follows:

 for (plane = 0; plane < 4; plane++) { for (y = 0; y < 32; y++) { num_pixels = 0; mask = *sprite++; for (bit = 7; bit >= 0; bit--) { if (mask & (1 << bit)) { pixel = *sprite; if (num_pixels & 1) sprite++; else pixel >>= 4; pixel &= 0xf; x = ((7 - bit) * 4) + plane; dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel; num_pixels++; } } if (num_pixels & 1) sprite++; } }

In the process of writing a post, I found the following sprite in the second-level title of The Lost Vikings. It uses a packaged 32 × 32 format, but I don’t remember seeing it in the game. Maybe someone knows, maybe this is a secret or unused resource? It can be viewed as follows:

 ./sprite_view DATA.DAT -l2 -fpacked32 0xec -b0x10

Why use multiple sprite formats?

I have no exact answer to this question. I am by no means a specialist in the intricacies of programming for optimizing VGA. Perhaps different formats are used to increase the rendering speed of sprites, which should have a different refresh rate. Tiles for cards always have a "raw" format. The raw format is also used for some interface sprites. Sprites in the packaged format 32 × 32 are the Vikings and many enemies. The unpacked format is used for some poorly animated enemies like a cannon tower and other objects, such as switches and elevators.

It appears that Blackthorne does not use the packaged 32 × 32 format at all. Perhaps because many Blackthorne sprites are larger than 32 × 32. Most sprites with detailed animations use a “raw” format. Blackthorne has no levels with scrolling (as in the original Prince of Persia), which speeds up rendering.

I did not understand yet another question: how does the Blizzard engine understand what format and size of sprites is used to draw each object? In the headings of the levels there is a partial relevant information, but it is not enough to correctly draw all the objects. I suspect that these parts of the rendering information are controlled by the virtual machine.

Sprite layout

As I mentioned above, Blackthorne uses larger sprites than The Lost Vikings. Although you can write an algorithm for rendering raw sprites of arbitrary size, a different approach is used in the Blizzard engine. Large sprites are rendered by combining several small sprites in a consistent pattern. For example, the main character of Blackthorne uses 32 × 48 sprites consisting of two 16 × 16 sprites for the head and one sprite 32 × 32 for the body:

It is worth noting that the background color used here is actually the color with the index 0 for Blackthorne sprites, which is considered to be transparent in the game. The whole set of Blackthorne character sprites can be viewed as:

 ./sprite_view --blackthorne DATA.DAT -fraw -w32 -h48 -l2 -b0x80 0x42

Perhaps this approach was chosen because the Blizzard developers have already written optimized rendering cycles for 16 × 16 and 32 × 32 sprites, and it was faster to render large sprites as a set of small ones, and not as one large one.

Levels from tile cards

Both in The Lost Vikings and Blackthorne, the levels are created from tile cards with 16 × 16 tiles (although it seems to me that in both games the authors did their best to ensure that such a feeling did not occur). Sprites used for tiles are actually 8 × 8 in size. Each tile contains a structure, which I call a “stub”. It defines how to create a set of 8 × 8 sprites from a 16 × 16 tile. Each part of the component can be turned horizontally / vertically, which allows the use of component tiles in different places.

For example, in the first-tier set of the first-level tiles of The Lost Vikings there are several tiles, such as stairs, which have mirrored versions, as well as many tiles, which several times use the same corner fragment, for example, blue riveted panels.

You can view this tile set like this:

 ./tileset_view DATA.DAT 1

The batch file contains for each tile set a block describing the blanks. Each blank has a length of 8 bytes, and each tile is encoded as a 16-bit value. For each 16-bit value, the sprite index is encoded as a 10-bit value, and the remaining 6 bits are used as flags. The flags determine the vertical and / or horizontal mirroring of the sprite, as well as where this component of the tile is located: in front or in the background. Encoding the front / background bit in the workpiece allowed the Blizzard engine to use a common tile map for rendering both layers

Blackthone expands the use of blanks in the engine in two ways. First, the three unused bits of flags are applied as a color base. The Lost Vikings binds to all tiles of the card only 16 colors. Thanks to the addition of the base color bits, Blackthorne can use 128 colors for tiles, but every single component sprite is still limited to 16 colors.

Secondly, a background map layer was added to Blackthorne. This allows the game to have more detailed backgrounds and the sky, translucent through the voids in the front tiles. For simplicity, I will call the secondary background layer a sky layer, and the main tile map as the foreground and background layers. The sky layer does not use bits of the front / background flags, so you can consider it as a single layer.

The figure below shows how the tile map for the first level of Blackthorne is assembled:

From left to right, top to bottom: sky / background, front tiles, background tiles, all layers.

You can view the tile map as follows (note that level is number 3 because levels are 1 and 2 - this is the initial animation screensaver and training level):

 ./level_view --blackthorne DATA.DAT 3

Here you can see a couple of interesting points:

Some parts of the sky layer are completely covered by the front layers. Perhaps it happened because of the development process or the Blizzard tools.
, — . , . , , . - /.
. , . , Blackthorne , . . , . , Blizzard , , The Lost Vikings.

Everything is a level

Blizzard took another smart step: everything in the game - intermediate screens, animated screensavers, menus - are levels. Without a doubt, it saved a lot of development time. Since all the hard work of implementing the code for the levels of the game with animation and moving objects has already been done, it is logical to use it again to control the screensavers and menus. I suppose, in the game engine, the virtual machine is again used to implement animation in the levels of animated screensavers. The only notable exception was The Lost Vikings' initial screen from the beginning of this section, which is encoded as a big raw sprite without compression.

For example, when you start a game, several initial screens are displayed. Two of them are encoded as a single level (of two rooms). The left part of the image below shows a set of tiles for the level, and on the right - the level itself. The game first shows the silhouette of the character (which glows, but more on that later). The Blackthorne logo is displayed above the main menu.

I don’t know exactly why we need these two copies of Blackthorne logo tiles. Notice that they have slightly different colors. For example, in the upper set the TM icon is black and is inside the logo, and in the lower one is white and is outside. Perhaps this set of tiles is once again used for another screen at the end of the game?

A set of tiles and the level of this screen can be viewed as:

 ./tileset_view --blackthorne DATA.DAT 1 -c 0x6e ./level_view --blackthorne DATA.DAT 1 -h0x6e

Palette control

I mentioned above that sixteen colors are used in each sprite, and the values are always encoded as 0x0 - 0xf. This allows each level to assign its own palette to reuse sprites in different levels.

There are several worlds in The Lost Vikings, including the cosmic, prehistoric and insane candy worlds. Each has its own color scheme and enemies. Player-controlled Vikings use only two sets of 16-color palettes (Olaf and Baleog both use the same green-yellow color scheme, while Eric has a red-blue scheme). The interface uses another set of 16 colors, and another one is used for items such as keys and health replenishment that exist in all worlds. Due to this, a decent part of the 256-color palette remains, which is determined by the level. Most levels load a basic 128-color palette, and then a set of eight 16-color palettes.

Individual level palettes allow you to reuse sprites with a different color scheme. It was a popular trick in the era of 8-bit color and limited disk space. As you know, in Mortal Kombat games there are several palettes that change the look of ninja characters. The image below shows the same dinosaur sprites with palette settings of different levels:

You can view two different versions of the dinosaur sprite like this:

 ./sprite_view DATA.DAT -fpacked32 -b0xc0 -l5 0xf8 ./sprite_view DATA.DAT -fpacked32 -b0xc0 -l10 0xf8

Notice that the sprite number viewer (-l) and offset base offset (-b) arguments are passed to the sprite viewer. The level number is used to determine which palette blocks to load when analyzing a level header block. I determined the index of the base palette experimentally. Again, in this part of the Blizzard engine, I did not understand completely. Again I suspect that the index of the base palette is indicated by commands in the program of virtual machine objects.

Palette animations

Another popular trick from the DOS era is animation of graphics by changing the colors of the palette. Animation of the object by redrawing pixels is quite expensive, especially for large parts that need to be updated frequently, for example, for background waterfalls in Blackthorne. Instead of changing the pixels themselves, it is much more economical to change the colors of the palette. At the same time, all pixels of the corresponding color are instantly updated. This technique is mainly useful for simple loop animations, such as waterfalls and flashing lights in the cosmic levels of The Lost Vikings. As mentioned earlier, palette animations are used in Blackthorne to create a silhouette glow effect on the initial screen.

The Blizzard engine implements palette animations in the level header. Each palette animator has an 8-bit speed value and two 8-bit color index values. If the two indices are not equal, then the animation cycles between the two indices. This format is well suited for animating objects moving in a pattern.

If the two indices are equal, then the animation of the palette is used for one color and the level header indicates a list of 16-bit values for the animation. Each of the 16-bit values encodes the color value in the RGB-555 format (5 bits per color, that is, one bit is wasted). The usual VGA palette and Blizzard engine are capable of displaying 6 bits per color. Palette animations lose the least significant byte by shifting each color value one space to the left. This palette animation format is useful for animating something like pulsating light.

You can click in the “A” level viewer to view the palette animations.

Game over

That's all for now:

 ./level_view DATA.DAT 48 ./level_view --blackthorne DATA.DAT 23

You can read me on Twitter: @ryiron .

Source: https://habr.com/ru/post/329448/

All Articles