Researching game resource formats, using the game Dr. Riptide as an example

I once set myself the task of porting this game to more modern platforms. The game is, understandably, far from open source (back in 1994 the developers charged as much as 25 bucks for it), so all the game resources had to be either redrawn or pulled out of the game's single archive. I did the latter.

The game archive, RIPTIDE.DAT, is a binary file in its own format. Strictly speaking, it is not an archive but a so-called pseudo-archive: the files are stored in a single container without compression, together with a primitive file system that says how to reach each file inside the container.
If we open this file in any hex editor, we see that the file records come at the very beginning of the container, followed by the binary data itself. So we need to find out how many files the container holds and what a record looks like. The first thing to notice is that the records have a fixed length: the distance from the start of the file name in one record to the start of the name in the next is the same for all records and equals 0x19 (25) bytes.

However, the file name in the first record is offset slightly from the beginning of the file, so let's look at what comes before it. Since compilers usually work with standard data sizes of 1 (byte), 2 (word), 4 (dword) and 8 (qword) bytes, we mentally split the data into blocks of roughly these sizes. Two dwords, 0x00001A60 and 0x00013EEC, and the word 0x010E draw our attention: each is smaller than the file itself, so each could be either the offset of some data or its size. Treated as offsets, 0x00013EEC and 0x010E are of no interest: the first points into the middle of some binary data, the second into the middle of the file records. But 0x00001A60 points exactly at the binary data that starts right after the last file record.

Since this field belongs to a file record, we look at the same field in the next record. To get there, we add the record length found above, 0x19, to the field's own offset: 0x0000000A + 0x19 = 0x00000023. The dword at this offset is 0x0001594C, which is also within the file size. Note that 0x00013EEC from the first file record is smaller than this number. Let's check: 0x00001A60 + 0x00013EEC = 0x0001594C. Checking the same thing on other records, we conclude that this field contains the size of the file stored in the container.
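The check described above (offset plus size of one record equals the offset of the next) can be sketched in C. This is only an illustration under stated assumptions: values are little-endian, the first record starts at file offset 2, the size is the first dword of a record and the offset is the dword at +8. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { RECORD_SIZE = 0x19, FIRST_RECORD = 2 };

/* Read a little-endian 32-bit value (assumption: x86 byte order). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the data of every record ends exactly where the data of
   the next record begins, 0 otherwise (or if the buffer is too short). */
static int records_are_contiguous(const uint8_t *dir, size_t dir_len,
                                  unsigned count)
{
    if (dir_len < FIRST_RECORD + (size_t)count * RECORD_SIZE)
        return 0;
    for (unsigned i = 0; i + 1 < count; i++) {
        const uint8_t *rec  = dir + FIRST_RECORD + (size_t)i * RECORD_SIZE;
        const uint8_t *next = rec + RECORD_SIZE;
        uint32_t size   = read_le32(rec);      /* assumed: dword at +0 */
        uint32_t offset = read_le32(rec + 8);  /* assumed: dword at +8 */
        if (offset + size != read_le32(next + 8))
            return 0;
    }
    return 1;
}
```

With the two values from the first record, 0x00001A60 + 0x00013EEC must equal the 0x0001594C read at file offset 0x23.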

In principle, this is enough to extract all the files from the container, but let's see what the remaining fields are for. The numbers between the size and the offset are much larger than the size of the pseudo-archive itself, so they can be neither an address nor a size. The first thing that comes to mind is a checksum, since it would be logical to verify the data in case the file is damaged. In this case, however, the developers did something else: these fields hold the files' timestamps. Why they were needed remains a mystery. For the first file record, 0x1CEF2292 decodes as 15 July 1994, 04:20:36 in DOS format.
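For reference, here is a small sketch of how such a value unpacks, assuming the usual packed DOS layout with the date in the high word and the time in the low word. The type and function names are made up for the illustration.

```c
#include <stdint.h>

typedef struct {
    unsigned year, month, day, hour, minute, second;
} DosDateTime;

/* Unpack a 32-bit DOS timestamp: date in the high 16 bits, time in the low 16. */
static DosDateTime decode_dos_timestamp(uint32_t stamp)
{
    unsigned d = (stamp >> 16) & 0xFFFFu;
    unsigned t = stamp & 0xFFFFu;
    DosDateTime r;
    r.year   = 1980u + (d >> 9);     /* 7 bits: years since 1980     */
    r.month  = (d >> 5) & 0x0Fu;     /* 4 bits                       */
    r.day    = d & 0x1Fu;            /* 5 bits                       */
    r.hour   = t >> 11;              /* 5 bits                       */
    r.minute = (t >> 5) & 0x3Fu;     /* 6 bits                       */
    r.second = (t & 0x1Fu) * 2u;     /* 5 bits, 2-second resolution  */
    return r;
}
```

Calling decode_dos_timestamp(0x1CEF2292) gives 1994-07-15 04:20:36, which matches the value from the first file record.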
The only value still unexplained is the word 0x010E at the very beginning of the file. The most logical assumption is that it holds the number of files in the container, and this is easy to check. Take the offset of the first file's data from the first file record, 0x00001A60, subtract the 2 bytes of the word itself, and divide by the record length of 0x19 bytes: (0x1A60 - 2) / 0x19 = 0x010E, exactly as required.
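As a result, the layout can be written as C structures. They have to be packed to 1 byte, otherwise the compiler pads a record beyond 0x19 bytes:

 #include <stdint.h>

 #pragma pack(push, 1)    /* no padding: sizeof(FILE_ITEM) must be 0x19 */
 typedef struct _FILE_ITEM {
     uint32_t Size;       /* size of the file data in bytes */
     uint32_t TimeStamp;  /* DOS date and time */
     uint32_t Offset;     /* offset of the file data from the start of the container */
     char     Name[13];   /* file name */
 } FILE_ITEM, *PFILE_ITEM;

 typedef struct _HEADER {
     uint16_t  Count;     /* number of file records */
     FILE_ITEM Files[];   /* Count records follow */
 } HEADER, *PHEADER;
 #pragma pack(pop)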
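If you would rather not depend on structure packing, the same directory can be decoded field by field from a byte buffer. The sketch below assumes little-endian values and the field order of the structures above. Entry, dat_count and dat_read_entry are illustrative names, not anything taken from the game.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 2, RECORD_SIZE = 25, NAME_SIZE = 13 };

typedef struct {
    uint32_t size, timestamp, offset;
    char     name[NAME_SIZE + 1];   /* extra byte: always NUL-terminated here */
} Entry;

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Number of records, read from the first word (needs at least 2 bytes). */
static unsigned dat_count(const uint8_t *dat)
{
    return (unsigned)dat[0] | ((unsigned)dat[1] << 8);
}

/* Decode record `index` into *out.
   Returns 0 on success, -1 if the index or the buffer length is invalid. */
static int dat_read_entry(const uint8_t *dat, size_t len,
                          unsigned index, Entry *out)
{
    if (len < HEADER_SIZE || index >= dat_count(dat))
        return -1;
    size_t pos = HEADER_SIZE + (size_t)index * RECORD_SIZE;
    if (len < pos + RECORD_SIZE)
        return -1;
    out->size      = le32(dat + pos);
    out->timestamp = le32(dat + pos + 4);
    out->offset    = le32(dat + pos + 8);
    memcpy(out->name, dat + pos + 12, NAME_SIZE);
    out->name[NAME_SIZE] = '\0';
    return 0;
}
```

To extract a file, copy out->size bytes starting at out->offset from the same buffer, after checking that the range lies inside it.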

After unpacking, we get 270 files with the extensions CMF, L, M, PCS, PCX, TXT and VOC.
TXT and PCX are common formats and need no analysis. A quick search shows that CMF and VOC, which are sound files, need none either. That leaves L, M and PCS. Honestly, I never found out what PCS is used for, and there was no need to.

Judging by the file names, we can assume that the L files contain graphics and the M files describe the game's level maps.

L format

When analyzing graphics, we rely on the fact that any format must store the image dimensions somewhere, along with the pixel data itself. For animation there is at least a frame count as well, and possibly the time intervals between frames.
Again, we open a file (preferably several) in the hex editor. What immediately catches the eye is that for all graphics the game displays statically the first byte is 0x01, while for animated graphics it is greater than one. So we assume that this byte is the number of frames in the file. Next come two bytes, after which in most cases there are zeros. Suppose these are the width and the height. Let's check: multiplying the first by the second gives exactly the length of the file minus those three bytes at the beginning.
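This guess is easy to test on all the static images at once. Here is a minimal sketch, assuming one byte per pixel and one byte per dimension. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `len` bytes match the guessed single-frame layout:
   a frame count of 1, two dimension bytes, then one byte per pixel. */
static int l_is_single_frame(const uint8_t *file, size_t len)
{
    if (len < 3 || file[0] != 1)
        return 0;
    return len == 3 + (size_t)file[1] * file[2];
}
```

Since the check only multiplies the two bytes, it does not depend on which of them is the width and which is the height.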
Since only one byte is used per pixel, colors are given as palette indices, and at most 256 colors can be used at once, which fully matches the graphics modes of that time. In 256-color modes, a palette of the following kind is used:

In files with more than one frame, the dimensions of each subsequent frame immediately follow the pixel data of the previous one.

In this figure it is easy to find the offset of the second frame. We take the offset of the first frame, 0x00000001, add the 2 bytes that hold the dimensions, and then 0x0C * 0x10 bytes for the pixel data. The result is exactly 0x000000C3.
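The same arithmetic generalizes to any frame: skip the count byte, then hop over each frame's two dimension bytes and its pixels. This is a sketch under the layout assumed so far, and l_frame_offset is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the two dimension bytes of frame `index`,
   or 0 if there is no such frame or the data is truncated. */
static size_t l_frame_offset(const uint8_t *file, size_t len, unsigned index)
{
    if (len < 1 || index >= file[0])
        return 0;
    size_t pos = 1;                 /* first frame follows the count byte */
    for (unsigned i = 0; ; i++) {
        if (len < pos + 2)
            return 0;
        size_t pixels = (size_t)file[pos] * file[pos + 1];
        if (len < pos + 2 + pixels)
            return 0;
        if (i == index)
            return pos;
        pos += 2 + pixels;          /* hop to the next frame's dimensions */
    }
}
```

For a first frame of 0x0C by 0x10 pixels, the second frame is found at 1 + 2 + 0xC0 = 0xC3, the value computed above.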

Notably, to save space, the frames within one file may differ in size. The value 0 is used for the transparent color.

M format

The formats so far were simple and we roughly knew what to look for. From here on we have to rely almost entirely on intuition.

Again, we open several M files in the hex editor at once and try to spot similar regions.
After a brief analysis, several main blocks can be identified in the file:

4 bytes at the very beginning of the file
a large data block interspersed with zeros
a data block that is always 0x8000 bytes long, with repeating runs of data but almost no zero bytes
a small data block interspersed with zeros at the end of the file
4 bytes at the very end of the file

The first thing I noticed was that the third block is 0x8000 bytes long in every file. The second was that the second block looks like an array of dwords, and its length is a multiple of the values stored in the first four bytes of the file. It was logical to assume that those four bytes are two words giving the dimensions of a two-dimensional array of dwords, followed by the array itself.
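Under that assumption, reading a cell of the array is straightforward. The sketch below additionally assumes little-endian values, the width stored in the first word and row-major order. None of this is confirmed by the text, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { M_HEADER_SIZE = 4 };

static unsigned le16(const uint8_t *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Size in bytes of the dword array implied by the two header words.
   The caller must pass at least M_HEADER_SIZE bytes. */
static size_t m_grid_bytes(const uint8_t *file)
{
    return (size_t)le16(file) * le16(file + 2) * 4;
}

/* Store the dword of cell (x, y) in *out.
   Returns 0 on success, -1 if the cell is outside the grid or the buffer. */
static int m_cell(const uint8_t *file, size_t len,
                  unsigned x, unsigned y, uint32_t *out)
{
    if (len < M_HEADER_SIZE)
        return -1;
    unsigned width = le16(file), height = le16(file + 2);
    if (x >= width || y >= height)
        return -1;
    size_t pos = M_HEADER_SIZE + ((size_t)y * width + x) * 4;
    if (len < pos + 4)
        return -1;
    *out = le32(file + pos);
    return 0;
}
```

If the rendered picture comes out skewed or transposed, swapping the two header words is the first thing to try.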
We start looking at the values in this array. The low byte of a dword is non-zero in most cases, but the higher the byte, the less often non-zero values occur.
I decided to render this two-dimensional array as an image in which the second byte of each dword determines whether the point is filled or not.

The result was a picture that roughly resembles the silhouettes of the maps, and at first I thought that this byte described a map for checking collisions with walls.
Then I decided to show in color the pixels whose higher bytes in the dword are non-zero: red for the third byte, yellow for the fourth.

The picture starts to become clearer. These values describe static and dynamic game objects, which have separate graphics in the L files.

It became clear that the first byte of the dword stores the index of the image in a tile map that should be drawn at this position. But since no tile maps were stored among the graphics files, I tried displaying the 0x8000-byte data block as an image. Since I did not know the image width, I first got a long strip 1 pixel thick. As I gradually reduced the image width, silhouettes of some map images began to appear. At a width of 8 pixels I got a clear image, distinctly divided into square tiles: 8 pixels wide and 4096 pixels high.
Some fragments are shown in the picture below. The byte value was used for each RGB component, so the image came out in shades of gray.
By the way, most of what you see in the pictures are screenshots of rendered HTML pages with huge tables whose cells were each painted in their own color. The parsing of the binary files itself was done in PHP. It's not that I'm a pervert; I was just too lazy to look into graphics libraries.

If we divide the tile map's height of 4096 pixels by 8, we get 512 images, not 256. So what I took for a wall-collision mask turned out to be part of the same image index. This way the developers killed two birds with one stone: the lower 256 images are for objects that cannot be swum through, the upper 256 for those that can. And the index takes up 2 bytes, not one.
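Putting these observations together, a map cell can be turned into tile pixels roughly as follows. This sketch assumes the index is the low two bytes of the dword and that the 64 bytes of each 8x8 tile are stored one after another, which is what an 8-pixel-wide strip implies. All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    TILE_SIDE  = 8,
    TILE_BYTES = TILE_SIDE * TILE_SIDE,   /* 64 palette indices per tile */
    TILE_COUNT = 0x8000 / TILE_BYTES      /* 512 tiles in the block      */
};

/* Tile index: the low two bytes of a map dword. */
static unsigned tile_index(uint32_t cell)
{
    return cell & 0xFFFFu;
}

/* Indices 0..255 are obstacles, the rest can be swum through. */
static int tile_is_solid(unsigned index)
{
    return index < 256;
}

/* Pointer to the 64 bytes of a tile inside the 0x8000-byte block,
   or NULL if the index is out of range. */
static const uint8_t *tile_pixels(const uint8_t *tiles, unsigned index)
{
    return index < TILE_COUNT ? tiles + (size_t)index * TILE_BYTES : NULL;
}
```

Pixel (x, y) of a tile is then tile_pixels(tiles, index)[y * TILE_SIDE + x], to be looked up in the palette.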

A map rendered with its tiles overlaid now looks like this:

Today the whole map easily fits on one screen, but back when graphics had a resolution of 320x200 pixels, the background in the game scrolled smoothly.

The purpose of the last two blocks in the format could not be worked out by intuition alone.

Source: https://habr.com/ru/post/154781/
