
In the last two articles I talked about the peculiarities of the data formats used by the sound subsystems of modern games. So as not to bore readers, I will move on to a slightly different topic. Whichever engine a game uses, it has to store its resources somewhere and extract them at the right moment. Sometimes the resources in an archive have both an identifier and a readable file name. But there are quite a few engines where files have no names at all, only hashes. How, then, can anything in the resources be made sense of?
Let's look at this using the rather rare Bitsquid engine as an example. It is simple and compact, but it nevertheless has all the features a modern game needs. Last year Bitsquid, together with its developer, was bought by Autodesk, which now plans to combine it with Maya into its own game engine and promises that the result will be something incredible.
So that anyone can follow the process themselves, let's use the demo version of The Showdown Effect, which is also quite small (about 250 MB). In the content folder, where all the resources are evidently located, we find a dozen or so files with such wonderful names:
038bbacc4ce89296
0d42c15e8f2b473f
171e8b0d2241eb79
406c3644bd95237a
44bcc04093e5c506
680514e023d37cd5
71eec7a172194fe5
9229959b09a3b4be
9e13b2414b41b842
a6db0de7cf227dfe
a9956e471d528263
ac5c2f0670e5d674
b5af853949550001
These must be package (archive) files containing the resources. Let's open one of them and see what's inside. Throughout the entire file there are no tables, no text, no meaningful numbers, just a continuous jumble of bytes:

This usually means that all the data is encrypted and/or compressed. In this case, almost at the very beginning, the byte pair 78 9C (highlighted in green) clearly indicates zlib-compressed data. Let's try to unpack the file "manually". For this we use offzip, a utility that simply tries to decompress every byte sequence in a file as if it were zip or zlib data, no matter how many such sequences there are or in what order they appear.
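What offzip does can be approximated in a few lines of Python. The sketch below is not offzip's actual algorithm, only the same idea: find every 78 9C pair (the zlib header produced at the default compression level), try to inflate from that position, and report the streams that decode cleanly. Streams written at other levels start with 78 01, 78 5E or 78 DA, which this sketch ignores; the function names are illustrative.

```python
import zlib

ZLIB_HEADER = b"\x78\x9c"  # zlib header for the default compression level


def _try_inflate(data: bytes, pos: int, step: int = 1 << 16):
    """Try to inflate a zlib stream starting at `pos`.

    Returns (compressed_length, unpacked_bytes), or None if the bytes at `pos`
    are not a complete, valid zlib stream. Input is fed in `step`-sized pieces
    so that a false positive does not copy the rest of a large file at once."""
    d = zlib.decompressobj()
    out = bytearray()
    i = pos
    try:
        while not d.eof and i < len(data):
            piece = data[i:i + step]
            out += d.decompress(piece)
            i += len(piece)
    except zlib.error:
        return None
    if not d.eof:
        return None  # ran off the end of the data without a stream terminator
    # unused_data holds only the bytes of the last piece past the stream end
    return (i - pos) - len(d.unused_data), bytes(out)


def find_zlib_streams(data: bytes):
    """Yield (offset, compressed_length, unpacked) for each zlib stream found."""
    pos = data.find(ZLIB_HEADER)
    while pos != -1:
        result = _try_inflate(data, pos)
        if result is None:
            pos = data.find(ZLIB_HEADER, pos + 1)
        else:
            size, unpacked = result
            yield pos, size, unpacked
            pos = data.find(ZLIB_HEADER, pos + size)
```

For example, iterating over `find_zlib_streams(open("9e13b2414b41b842", "rb").read())` and printing `f"{off:#010x} {size} -> {len(out)}"` for each stream gives a listing similar to the first columns of offzip's output.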
Run the following command:

offzip -a 9e13b2414b41b842 unp 0

The -a option tells offzip to look for all zlib-compressed segments in the file rather than treat the whole file as a single compressed block. "unp" is the output folder (you must create it first), and "0" is the starting offset, that is, the search starts from the very beginning of the file.
We get the following:
+------------+-----+----------------------------+----------------------+
| hex_offset | ... | zip -> unzip size / offset | spaces before | info |
+------------+-----+----------------------------+----------------------+
  0x00000010 24803 -> 65536 / 0x000060f3 _ 16 8:7:28:0:1:441d52d8
  0x000060f7 21186 -> 65536 / 0x0000b3b9 _ 4 8:7:28:0:1:74fe0bf1
  0x0000b3bd 16694 -> 65536 / 0x0000f4f3 _ 4 8:7:28:0:1:4bdbbd7f
  0x0000f4f7 17028 -> 65536 / 0x0001377b _ 4 8:7:28:0:1:4cae9920
  0x0001377f 16200 -> 65536 / 0x000176c7 _ 4 8:7:28:0:1:aa6b718e
  0x000176cb 14445 -> 65536 / 0x0001af38 _ 4 8:7:28:0:1:e190c104
  [skipped ...]
  0x04ec0fb4 17108 -> 65536 / 0x04ec5288 _ 4 8:7:28:0:1:952f8201
  0x04ec528c 17139 -> 65536 / 0x04ec957f _ 4 8:7:28:0:1:373c403f
  0x04ec9583 22442 -> 65536 / 0x04eced2d _ 4 8:7:28:0:1:8e95fe5c
  0x04eced31 4215 -> 65536 / 0x04ecfda8 _ 4 8:7:28:0:1:93e0ac5a

- 1483 valid compressed streams found
- 0x04d7e61c -> 0x05cb0000 bytes covering the 98% of the file
As you can see, 98% of the file was unpacked into a bunch of 64 KB segments. Analyzing their contents shows that they form a single whole: one large file that was simply cut into 64 KB pieces, each of which was then compressed separately with zlib. In principle it could have been the other way around, with each source resource compressed separately and then all of them glued into one large file. But in our case it is a single file, so we can unpack it with the following command:
offzip -a -1 9e13b2414b41b842 unp 0

The -1 option means that all the unpacked segments found are joined into one file. As a result we get the unpacked file, which again has to be studied. After poking at it for a while, you find that it contains Lua scripts, sounds and textures, now uncompressed but all lumped together.
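Because the streams here sit back to back, the -1 result can also be reproduced by walking them in order instead of scanning: inflate one stream, note where it ends, skip the 4 bytes that offzip's "spaces before" column shows between consecutive streams, and continue. What those 4 bytes mean is not decoded here. A minimal sketch with illustrative names:

```python
import zlib

ZLIB_HEADER = b"\x78\x9c"


def inflate_at(data: bytes, pos: int, step: int = 1 << 16):
    """Inflate one zlib stream at `pos`; return (unpacked, end_offset)."""
    d = zlib.decompressobj()
    parts = []
    i = pos
    while not d.eof:
        piece = data[i:i + step]
        if not piece:
            raise ValueError(f"truncated zlib stream at {pos:#x}")
        parts.append(d.decompress(piece))
        i += len(piece)
    # bytes left over in the last piece lie after the end of the stream
    return b"".join(parts), i - len(d.unused_data)


def inflate_chain(data: bytes, first: int, gap: int = 4) -> bytes:
    """Join back-to-back zlib streams separated by `gap` unexplained bytes.

    Stops at the first position that does not start with the 78 9C header."""
    out = []
    pos = first
    while data[pos:pos + 2] == ZLIB_HEADER:
        chunk, end = inflate_at(data, pos)
        out.append(chunk)
        pos = end + gap
    return b"".join(out)
```

For this package the first stream starts at 0x10, so the call would be `inflate_chain(data, 0x10)`.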

Our task is to split the file into separate resources, and preferably to learn their names as well. Let's go to the beginning of the file. First there is something incomprehensible, then a lot of zeros, and then what looks like the start of a table. Its rows appear to be 16 bytes long; interestingly, the right half of each row is always different, while the left half contains repeated numbers (highlighted in green). Note also that the file's own name sometimes appears in one of the rows.

Further, it turns out that the last row of the table is, for some reason, the same as the first. In addition, looking at several files, the first number in each seems to be exactly the number of rows in the table (minus one). Putting all this together, we can conclude that this is a table of resource names in hashed form, with the resource type and its name stored separately. The last row is no longer part of the table but the start of the information about the first resource: its hash comes first, followed presumably by its size, other parameters and the file itself. To verify this, let's search the file for the remaining numbers from the table, and, sure enough, they are all there, in the same order as in the table.
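Reading this table programmatically is straightforward once its layout is guessed. The sketch assumes, following the observations above, that the row count is the first 32-bit number in the file, that each row holds two 64-bit hashes (type on the left, name on the right), and that both are little-endian, as is usual on x86; if the hashes computed later do not match, big-endian (`>`) is the first thing to try. `table_offset` is a hypothetical parameter, since the table's exact position has to be found by eye.

```python
import struct


def read_hash_table(data: bytes, table_offset: int):
    """Return the (type_hash, name_hash) rows of the table in an unpacked package.

    Assumptions: the row count is the little-endian u32 at offset 0, and each
    row is two little-endian u64 hashes, left = type, right = name."""
    (count,) = struct.unpack_from("<I", data, 0)
    return [
        struct.unpack_from("<QQ", data, table_offset + 16 * i)
        for i in range(count)
    ]
```

As a sanity check, the 16 bytes right after the last row should repeat the first row, since that is where the first resource entry starts.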
Now it remains to parse the record format and try to guess the names behind the hashes. It may be that the game accesses resources only by hash and no original names are left in it at all, in which case we will not find them. Fortunately, they can most often be found, guessed, or computed from the code or scripts. Speaking of scripts: we have already seen that Lua is used here, which means that the extension for such files is most likely "lua". The type of hash can be identified by known constants in the code. For example, FNV uses the number 0x811C9DC5. If the developers use their own algorithm, it is usually something simple, like addition with a shift, but finding it in the code is not so easy.
I was about to search for 0x811C9DC5, but decided to google first, and it turned out that a Bitsquid developer had once written on his blog about the advantages of the murmur64 hash. Like any hash, murmur comes in several versions, but the 64-bit one is exactly 8 bytes, as in our table. The source code can be found here. Let's compile it and compute the hash of the string "lua". We do not know what the seed is, so for now let's try zero.
We get: murmur64("lua") = A14E8DFA2CD117E2
This number occurs often in our file! Congratulations, we now know how the game computes its hashes. If the seed were not zero, we would again have to search for it or debug the code. It might be a constant or the length of the string, but in general it can be anything, for example the first character combined with the length of the string.
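The C source linked above is not reproduced here; instead, here is a Python port of MurmurHash64A, the 64-bit variant from Austin Appleby's public-domain MurmurHash2 code. That this is exactly the variant Bitsquid uses, with seed 0, is an assumption; if it holds, `murmur64a(b"lua")` should reproduce the A14E8DFA2CD117E2 found above. Python integers are unbounded, so every multiplication is masked back to 64 bits to mimic C's `uint64_t` wraparound.

```python
MASK64 = (1 << 64) - 1
M = 0xC6A4A7935BD1E995  # multiplier constant of MurmurHash64A
R = 47                  # shift constant of MurmurHash64A


def murmur64a(key: bytes, seed: int = 0) -> int:
    """Python port of MurmurHash64A, reading 8-byte blocks little-endian."""
    h = (seed ^ (len(key) * M)) & MASK64
    end = len(key) - len(key) % 8
    for i in range(0, end, 8):
        k = int.from_bytes(key[i:i + 8], "little")
        k = (k * M) & MASK64
        k ^= k >> R
        k = (k * M) & MASK64
        h ^= k
        h = (h * M) & MASK64
    tail = key[end:]
    if tail:
        # the C switch over the 1..7 remaining bytes equals a little-endian read
        h ^= int.from_bytes(tail, "little")
        h = (h * M) & MASK64
    h ^= h >> R
    h = (h * M) & MASK64
    h ^= h >> R
    return h
```

To compare with the values in the text, print the result as 16 uppercase hex digits: `f"{murmur64a(b'lua'):016X}"`.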
So we know one of the extensions; do we now have to guess all the others one by one? Sometimes that is what it comes to. But let's try to find a list of them somewhere in plain text. It may be in one of the Lua scripts, or directly in the executable, as in this case:

In the middle I have highlighted the strings that are exactly the resource types. But where does this list start, and where does it end? That can be determined experimentally.
Let's try, for example, murmur64("unit") = E0A48D0BE9A7453F
And indeed, there is such a hash. It seems an obvious name, but guessing it on the first try would not have been so easy. And sound banks are called "timpani_bank"; I would never have guessed that.
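Finding such a list needs nothing fancy: a `strings`-style scan of the executable, filtered down to short lowercase identifiers, yields candidate extensions, and each candidate is then hashed and looked up in the package tables. The hashing step is left out of this sketch, and the filter pattern is only an assumption about what extension names look like.

```python
import re


def printable_strings(blob: bytes, min_len: int = 4):
    """Return runs of at least `min_len` printable ASCII characters,
    in the spirit of the Unix `strings` utility."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


def extension_candidates(strings):
    """Keep only short lowercase identifiers such as "lua" or "timpani_bank"."""
    return [s for s in strings if re.fullmatch(r"[a-z][a-z0-9_]{1,31}", s)]
```

Since "lua" has only three letters, the scan should use `min_len=3`; the price is more noise, which the hash lookup then filters out.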
So now we know all the resource types (file extensions), but how do we find out the resource names? They may be in the resources themselves or in the code. For example, look at the .ini file that lies next to the archives:
boot_package = "resource_packages/boot"
boot_script = "scripts/boot/boot"
pdxigs = {
    game_name = "Showdown"
    game_version = "1.0.0"
    server_url = "http://xxxxxxxx.xxxxxxxxxxxxx.com/xxxx"
}
steam = {
    notification_position = "bottom-left"
}
timpani = "content/sounds/shoot"
This is the first clue: the boot package is called "resource_packages/boot". Its hash is 9E13B2414B41B842, and it is in our list. This package contains the boot script along with other files:
"scripts/boot/boot" = BBF3D6DD1B2AC672
The script contains references to other scripts, for example "scripts/boot/boot_common". That one, in turn, contains many strings, including
"resource_packages/base_game_resources" = 0D42C15E8F2B473F
This is apparently the name of the package containing the game's main resources. Let's check: there really is such a file. In this way you can, in theory, recover all the hashes. Naturally, this is not done by hand; programs or scripts are written for it, because an average game has tens or even hundreds of thousands of resources. The process sometimes takes a long time, and even then a number of files often remain unnamed. However, most of the names can usually be recovered, and the result is a list that is used when unpacking resources and modding the game.
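The iterative search can be automated along these lines: collect quoted, path-like strings from every config file and script recovered so far, hash them, keep the ones that match hashes in the package tables, open the newly named scripts, and repeat until nothing new turns up. Note that, judging by the examples above, paths are hashed without the extension, since the type has its own hash. In this sketch the allowed character set is an assumption and `hash_fn` is a parameter (murmur64 with seed 0 in this game); the names are illustrative.

```python
import re
from typing import Callable, Dict, Iterable, Set

# A quoted string made only of characters typical for resource paths.
QUOTED_PATH = re.compile(r'"([A-Za-z0-9_./-]+)"')


def candidate_names(text: str) -> Set[str]:
    """Collect quoted path-like strings from a script or config file."""
    return set(QUOTED_PATH.findall(text))


def build_name_table(names: Iterable[str],
                     hash_fn: Callable[[bytes], int]) -> Dict[int, str]:
    """Map hash -> name for every candidate string."""
    return {hash_fn(name.encode("utf-8")): name for name in names}
```

Candidates whose hashes appear in no package table are simply noise; the rest become the name list used when unpacking.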
So, suppose we have found all the names, and now our resources have meaningful names and extensions. Let's go back to the file format.

After the table, that is, the list of hashes (highlighted in yellow), the individual entries for all the resources begin. As we have already found out, the first row (highlighted in green) is the type and name of the resource. Here 82645835E6B73232 = "config"; we do not yet know the right-hand part (the name). Let's try to guess what comes next. There appear to be a few 32-bit numbers: first a one, then two zeros, then another number (highlighted in pink) that looks like a size, and one more zero. We do not know what they mean, but apart from the size they are exactly the same for all files. Then the actual content of the resource begins. Let's check its length: adding the size, 0x45C, to the offset at which it begins, 0x518, gives 0x974.

Indeed, here is the next hash: 9EFE0A916AAE7880, which is "font"; after it everything is the same as in the first entry, and the length of the font is 1838. Next comes the font itself. It starts with a long series of floating-point numbers, which are also usually easy to spot with the naked eye. For example, 42000000 is 32, 41800000 is 16, and of course 3F800000, the floating-point number most often found in game files, is 1.
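Such guesses are easy to check in code. The values above are written most significant byte first, whereas a hex dump of a little-endian file shows 3F800000 as the bytes 00 00 80 3F; the second helper decodes bytes in that on-disk order. A minimal sketch:

```python
import struct


def hex_to_float(word: str) -> float:
    """Decode 8 hex digits, written most significant byte first, as an IEEE 754 single."""
    return struct.unpack(">f", bytes.fromhex(word))[0]


def floats_le(raw: bytes):
    """Decode a run of little-endian bytes, as seen in a hex dump, into floats."""
    n = len(raw) // 4
    return list(struct.unpack("<%df" % n, raw[:4 * n]))
```

For example, `hex_to_float("42000000")` returns 32.0 and `hex_to_float("3F800000")` returns 1.0.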
It seems we have worked out the format of the records in the archive. The only thing each resource appears to have is its size. It is strange that there is no offset, but that happens too. The remaining numbers are zeros; perhaps they mean something, but we do not know what. Just in case, let's check the last entry by adding the length of the last resource to the address where it begins. We get 4ECFDA0, which is exactly the total length of the file. There seems to be nothing else after it, so we can start writing the unpacking program. It will read the package file and split it into resources. If a hash is in our list, the file is saved under its proper name; if not, the hash itself is used as the name.
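The unpacking program described here can be sketched as follows, under the layout guessed above: two 64-bit hashes, five 32-bit fields (1, 0, 0, size, 0), and then the data. Little-endian byte order and the assumption that the size counts only the bytes after this 36-byte header are assumptions, and the function names are illustrative. The walker deliberately stops with an error when the field after the hashes is not 1, rather than guessing.

```python
import struct
from typing import Dict, Iterator, Tuple

# Guessed entry header: type hash, name hash, then five u32 fields observed as
# 1, 0, 0, size, 0. Little-endian byte order is an assumption.
ENTRY = struct.Struct("<QQ5I")


def iter_resources(data: bytes, offset: int) -> Iterator[Tuple[int, int, bytes]]:
    """Walk entries from `offset` (the row right after the hash table) to EOF.

    Assumes `size` counts only the resource bytes after the 36-byte header."""
    while offset < len(data):
        type_hash, name_hash, one, _z1, _z2, size, _z3 = ENTRY.unpack_from(data, offset)
        if one != 1:
            raise ValueError(f"entry at {offset:#x}: expected 1, got {one}")
        start = offset + ENTRY.size
        end = start + size
        if end > len(data):
            raise ValueError(f"entry at {offset:#x} runs past the end of the file")
        yield type_hash, name_hash, data[start:end]
        offset = end


def output_name(type_hash: int, name_hash: int, names: Dict[int, str]) -> str:
    """Use the recovered name and extension when known, else the hex hash."""
    name = names.get(name_hash, f"{name_hash:016x}")
    ext = names.get(type_hash, f"{type_hash:016x}")
    return f"{name}.{ext}"
```

Here `names` is a hash-to-string dictionary covering every type and path recovered so far; each yielded resource would be written to a file named `output_name(type_hash, name_hash, names)`.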
We run the program, and it successfully unpacks a bunch of files from our package. Let's check their contents. The textures turn out to be valid DDS files, the sounds play as regular OGGs, and the rest of the files (for example, unit models), although they are in some special format, also look plausible.
Inspired by this success, we start unpacking all the other files. And here an Unhandled Exception awaits us. Almost all the files are unpacked, except for a few. That is usual: among thousands of files there will always be one or two packed in some unusual way or with additional parameters. Let's see what is wrong with them. It turns out that the one after the hash was not there by accident: in this file it is not one but seven. Moreover, when unpacking other games made on the same engine, it turned out that the zeros there are not always zeros either. But that's not all. There is one file whose structure seems to be completely broken: when we try to read the next resource from it, the rest of the file turns out to be too short for a resource of that length.
Let's look at the record of this particular resource. Everything seems fine: the hash, then a one, two zeros, then the size. How can that be? Maybe the file was not unpacked properly? Let's try again. Perhaps, among the hundreds of lines of offzip output flashing by, we missed something?
  0x0286c4e2 65196 -> 65536 / 0x0287c38e _ 4 8:7:28:0:1:340433a1
  0x0287c392 65242 -> 65536 / 0x0288c26c _ 4 8:7:28:0:1:27dce3e7
  0x0288c270 65415 -> 65536 / 0x0289c1f7 _ 4 8:7:28:0:1:b9bd6cd0
. 0x028cac59 ....................
zlib Z_DATA_ERROR
or uses a different windowBits value (-z). Try to use -z -15
  0x028cc207 65533 -> 65536 / 0x028dc204 _ 196624 8:7:28:0:1:54aa0921
  0x028dc208 65513 -> 65536 / 0x028ec1f1 _ 4 8:7:28:0:1:c4b3abd4
  0x028fc1f9 65533 -> 65536 / 0x0290c1f6 _ 65544 8:7:28:0:1:890356ae
  0x0290c1fa 65534 -> 65536 / 0x0291c1f8 _ 4 8:7:28:0:1:934a442c
  0x0292c200 65496 -> 65536 / 0x0293c1d8 _ 65544 8:7:28:0:1:c21356fb
  0x0293c1dc 65521 -> 65536 / 0x0294c1cd _ 4 8:7:28:0:1:5bf3ea59
  0x0294c1d1 65514 -> 65536 / 0x0295c1bb _ 4 8:7:28:0:1:d7c697a0
Yes, there is indeed some problem: this file contains segments that were not unpacked. By the way, what kind of resource is this? Let's look at the contents of the file at offset 0x028cac59, or slightly earlier.

And it is exactly a "timpani_bank". Of course! An Ogg stream can be such a chaotic sequence of bits that zlib simply cannot compress it, so in this 92 MB file several 64 KB segments turned out to be even larger than 64 KB after compression. The developers apparently, and reasonably, decided that there was no point in compressing such segments and stored them in the archive as is. That is why offzip could not find the telltale bytes 78 9C there and simply skipped those segments when unpacking.
The file structure has no flag to distinguish compressed segments from uncompressed ones, so the game apparently proceeds simply: if a segment is smaller than 64 KB, it is compressed; if it is exactly 64 KB, it is not. However, it is not quite that simple. There have been cases (in another game, on a different engine) where a segment was exactly 64 KB long after compression, and then there is no way to tell whether it is compressed or not. Although the probability of such a coincidence is very small, it has to be taken into account too.
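Handling this means reading the chunks in order instead of scanning for 78 9C. The sketch assumes that the 4 bytes offzip reported before each stream are the chunk's little-endian length. The "spaces before" values next to the error are consistent with that: 65544 = 65536 + 8 is one stored 64 KB chunk plus two 4-byte prefixes, and 196624 = 3 * 65536 + 16 is three stored chunks plus four prefixes. It also assumes the first prefix sits at offset 12 (hypothetical, inferred from the first stream starting at 0x10). Chunks shorter than 64 KB are inflated; a chunk of exactly 64 KB is treated as stored unless it inflates cleanly to exactly 64 KB, which is only a heuristic for the ambiguous case described above. The names are illustrative.

```python
import struct
import zlib

CHUNK = 0x10000  # each chunk unpacks to 64 KB


def _decode_full_chunk(payload: bytes) -> bytes:
    """A 64 KB chunk is normally stored raw, but a compressed chunk can also be
    exactly 64 KB long. Heuristic: accept it as zlib only if it inflates to
    exactly one full chunk with nothing left over."""
    d = zlib.decompressobj()
    try:
        out = d.decompress(payload)
    except zlib.error:
        return payload
    if d.eof and not d.unused_data and len(out) == CHUNK:
        return out
    return payload


def read_chunks(data: bytes, offset: int) -> bytes:
    """Decode consecutive [u32 length][payload] chunks up to the end of `data`."""
    out = []
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        payload = data[offset + 4:offset + 4 + length]
        if len(payload) != length:
            raise ValueError(f"chunk at {offset:#x} is truncated")
        if length == CHUNK:
            out.append(_decode_full_chunk(payload))
        else:
            out.append(zlib.decompress(payload))
        offset += 4 + length
    return b"".join(out)
```

If the length hypothesis is wrong, the first `zlib.decompress` call will fail immediately, which makes it cheap to test on a real package.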
So, by gradually comparing and analyzing files, you can usually work out a data format without resorting to debugging the code. I will not go into detail about how to determine what the seven stands for and what the zeros that are not always zeros are for, because a single example cannot cover every feature of a format. If other games turn out to use them, we will figure it out then.