Parsing formats: sound package in the Unreal Engine

Last time I talked about the study of the connection between sound and text in the UE3 engine. Today I will consider a simpler case in more detail - the analysis of the Bioshock Infinite sound package, from which the full localization of this game into Russian began.

Analysis of resources is like solving a rebus or a crossword puzzle: we start from one side, and gradually recognizing the meaning of all new fields, we solve everything else. At first the resource structure of the game was like a dark forest for us. But gradually we learned about them more and more, one thing was pulling for another, and as a result, almost all game resources were studied and modified: sound, textures, 3D models. An example of an inscription changed in the form of a model together with a texture, lighting and reflection maps, luminous lights, highlights and shadow can be seen in the picture.
')
But it all began with a simple at first glance task: to find where the sound is in the game, and how it can be replaced.

Glancing through the contents of the game folder, we find that there is a large file in the audio folder. Pck is a package that most likely contains all the sounds (phrases) from the game, you need to parse its format, extract the sounds, replace everything (or some from them), and collect the file back.

Opening it in the Hex-editor and squander back and forth, we can assume that at the beginning there is a header, then a table with data for each file, and then all these files, glued one after the other. In order not to get confused, we will further call a large file (the whole package) “file”, and the small files of which it consists of are “sounds”. Now let's see more:

...

Let's start with the title. In it, we first see the AKRK code (a sign of the file type, or as some call it, the magic word), then some numbers, and the inscriptions “english (us)”, “sfx”. We do not know the meaning of these figures, but we don’t need them, as it will become clear in the future. These are the various properties of the package, the size of the header, the size of its individual parts, etc. Therefore, since we are going to change the contents of the package (sound files), without changing anything else, we can just leave the title as it was. It is often the total length of the file, and then, of course, would have to change it, but in this case it is not. Otherwise, we would immediately notice the presence of this number (0xF49668E) in the title.

We now turn to the table. It can always be recognized by its repetitive structure, while “scrolling” the file, it characteristically “overflows”. Even without knowing its contents, you can usually determine the number of elements in a line, but we do not know exactly where it begins, and how many elements are in it (how many sounds are in a package). However, after analyzing the content, we can determine this.

The table usually contains the number of elements, as well as the address (offset) and the length for each of them. The sound files themselves, as you can see, have standard RIFF headers, so you can immediately find out that the first sound starts at 0x2ED58 (green) and has a length of 0x981C (yellow), the second one - at 0x38574, and has a length of 0x20B3, etc. . You can also find out how many there are, simply by counting the number of these RIFF headers: 9587 (0x2573). We see this number in the file at 0x54 (highlighted in red).

If the sound format was non-standard, then it would be more difficult to determine, not to mention that you would have to write a program to convert to the desired format, or even a special codec, as was done for Dead Space, but more on that later. .

Now we can take the total length of the packet header along with the table and divide by the number of sounds. The first sound begins with the address 0x2ED58 (191832). 191832/9587 = 20.01 The number is not integer, because we do not know the length of the header. Thus, each sound in the table has 20 bytes, i.e. five 32-bit words, and the total length of the table = 9587 * 20 = 191740. And yet, we still do not know where the table begins, because it may be somewhere in the middle, and after it there may be some something parameters. We only know the length of the table, and the remaining 191832-191740 = 92 bytes remain on the header. 191740 = 0x2ECFC - we see a similar number (Ox2ED00) in the header at 0x14, for some reason it is less by 4).

We are now engaged in the contents of the table. We know the length and address of the first sound - 0x981C and 0x2ED58, and we find them at 0x60. Suppose the table starts from this point. Then we have one, then some number, and then again one. Then come the length and address of the second sound, etc. to the end of the table:

  0000981C 0002ED58 00000001 00068064 00000001
 000020B3 00038574 00000001 0007532 00000001
 00003587 0003A627 00000001 0008A458 00000001
 .....
 00013D1B 0F482973 00000001 00000000

It can be seen that the last sound just goes to the end of the file (its length is 0x13D1B plus the start address 0xF482973 just gives the total length of the package file 0xF49668E), but this is where the table ends, the first sound with the RIFF header goes (highlighted in pink). So our assumption was wrong, and the table starts earlier, immediately after the number of sounds. Move back to the address 0x58 and get the following:

  00023E36 00000001 0000981C 0002ED58 00000001
 00068064 00000001 000020B3 00038574 00000001
 0007532 00000001 00003587 0003A627 00000001
 .....
 3FFE9BEF 00000001 00013D1B 0F482973 00000001
 00000000

That is, in the table there are 9587 lines, for each sound: some number, then one, then the address and length of the sound, and at the end one. After the table zero, apparently a sign that the table is over, or for some reason, we do not know, so just leave it as it is. This zero is four bytes, adding which to 0x2ECFC we just get the total length of the table, which is written at the beginning of the file at address 0x14.

We could have gone the other way, for example, by finding where the address of the first (0x64) and second (0x78) sound is recorded in the table, immediately calculate the size of the recording of one sound - 20 bytes. Then divide the approximate size of the table by 20, get an approximate number of records, and then make sure that it is actually recorded in the file at 0x54. Or in another way: just try to find the number of sounds at the beginning of the file, since there are obviously several thousand sounds in the game, the only suitable number is 9587 - all other numbers are either too small or too large. Then divide the header size by that number, and so on. In general, wherever we began to dig, as a result, we must parse the heading and the table of sounds, determine the number of elements, the size and structure of its recordings.

Now we can write a simple program that will try to "cut" the package into separate sounds. By doing this and converting them from the wwise format to the usual ogg, we can listen to them all. But now it would be nice to make sure that we can shove them back. Run the game. At the very beginning, right after the screensaver, the screen turns dark and the female voice says something. Let's find out in which file this phrase is located, replace it with ours and encode it with wwise, which we downloaded from the official website (the package is free for non-commercial use). For now, let's not rebuild the package, but paste the phrase directly into the same place where it was. We start the game - and it really works. Well, then we continue to research.

Let's return to the table structure. The 3 and 4 numbers are, as we have already determined, the length and the address. What do the others mean? In fact, we do not need this either. You can check that the 2 and 5 numbers are always single numbers, and the first number is different for each sound, that is, it will be enough for us to remember these numbers, and when assembling a new file, put them into the table as they were. If the game does not work, then you can understand what it is and why they are needed.

Later, in the course of working with voice acting, we found out what those numbers were. It turned out that the game has 2 packets of sounds - one with a voice, the other with sound effects. The last number in the string is the type of sound (0-effect, 1-voice). The meaning of another one became clear when we looked at the contents of the same package for XBOX:

It can be seen that instead of the ones here 0x800 everywhere. Looking at how the sounds are located, one can guess that this is the amount of alignment in bytes. In the XBOX package, the sounds inside the file are aligned at 2,048 bytes. That is, if the length of the sound is not a multiple of 2048, it achieves zeros. Once I saw how the principle of this “doboi” was explained as much as a few paragraphs, but it seems to me the easiest way to show it clearly. The contents of the sounds highlighted in green, and between them - the zeros, this is the same kind that continues to the nearest border in 2048 bytes.

Now only the first number in the rows of the table remains unknown. It looks like a unique value, monotonously increasing in the range from 0 to 2 ³⁰ . These are unique identifiers of sounds, a hash by which the game finds them in a package. Even the formula by which it is calculated is known.

Thus, in order to change the sounds in the game, we need to write a program that will parse the file package into separate sound files, remember the identifiers, and then assemble it back, making the exact same heading, new table, where there will be new addresses and the lengths of the sounds, and then cobbled together all the sounds in a row. Where to memorize identifiers - everyone does this as you like. You can text, XML, or even sign the number at the end of the file name. This is how the list will look like:

  146998 vo_fndr_female_02_taunt_creepy_20943
 426084 vo_vox_male_02_gammaReact_close_20235
 509234 vo_vox_male_06_emotion_scream_21209
 566360 vo_vox_female_04_taunt_22534
 612724 vo_vox_male_10_reload_21286
 971110 vo_vox_male_06_death_21181

When assembling it is necessary to save the exact same file order. In principle, you can not wonder why, acting simply from the usual principle - to do it the way it was. But in fact, the search for sound in the package occurs using the old, but still irreplaceable method of half division - for this they are sorted in ascending order. In our example, for a maximum of 14 operations, you can find the desired sound from all 9587, instead of going through them one by one. If we mix up the order, the algorithm will stop working, and some sounds will simply not be found, and will not play when they are needed.

Suppose we wrote a program that performs all the necessary operations. Given the fact that we know the file structure, this is easy to do. And then comes the exciting moment of the first launch of the game with the reassembled sound. The screensavers pass, the screen darkens, the music subsides, and after a short pause, a Russian voice sounds in complete silence. Fine, so we did everything right. In addition, we were lucky: it looks like this .pck file turned out to be self-sufficient - the necessary information about the sounds is stored only in it, so nothing broke from the fact that we changed the length and content of the sounds. You can start voice acting - to prepare the material and give the actors.

Could be worse. In some games (including the UE3 engine), references to the displacement of sounds inside the package are in other files, and in the worst case in many places. That is, if some sound is used in 10 places in the game, then there will be 10 links, and then we would have 2 options: either look for all these places and replace the links, or make sounds in such a way that their length is exactly as in the original. And it is difficult and sometimes impossible. But, as mentioned earlier, we were lucky, the sounds were replaced at an arbitrary length, and the game started.

A few months later, when the work was already in full swing, and quite a lot of roles were voiced, when testing, we were surprised to find that one of the characters spoke one phrase in English, although we replaced all the phrases, and in another place another character is completely silent, and just opens his mouth. That is, everything works except for a few sounds. Here's an example of such a scene:

Here, Robert and Rosalind offer the player to throw a coin, and guess what falls, an eagle or tails. Up to this point, everything went fine, the entire dialogue has already been voiced, processed and inserted into the game. And so, Robert calmly and calmly, as if it should be so, begins: “Heads ...”, and Rosalind, not surprised at all, blurts out: “... or tails?”

How so? Why didn't these phrases work and why differently? I must say that for UE3 this is a common situation, it is full of similar surprises. To fix the problem, we had to parse a few more files of other formats, but this is another story.

Source: https://habr.com/ru/post/258397/

All Articles

Parsing formats: sound package in the Unreal Engine

More articles: