Parsing formats: 3D models from the inside

Another article about parsing formats, short and entertaining, for a summer Sunday evening. This time it is about 3D models. The principles of data storage are the same for any model, but the file formats are very diverse. Even within the same engine, developers tend to modify everything and add something of their own, because commercial licenses let them change the engine code, and they usually take advantage of that.

The bulk of any model file consists of several large tables describing the vertices, how they are connected, and how textures are stretched over them. Let's start with the vertices. A simple list of x, y, z coordinates might look like this:

Since the coordinates are most often stored as 32-bit floating-point numbers, they are easy to recognize inside the file: every fourth byte falls in the 0x40-0x45 range, or 0xC0-0xC5 for negative numbers. Other values occur too, of course, but these are the most common. This is because the coordinates of a 3D model span only a few orders of magnitude, and the sign and most of the exponent are stored in the high byte.
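The high-byte heuristic can be turned into a rough scanner. The sketch below is only an illustration: it assumes little-endian floats aligned to 4 bytes from the start of the buffer, and the function names and the `min_floats` threshold are made up for this example. As noted above, coordinates between -2 and 2 have a different high byte and will break a run.

```python
import struct

def looks_like_coordinate(word: bytes) -> bool:
    """True if the high byte of a little-endian float32 is 0x40-0x45 or 0xC0-0xC5."""
    high = word[3]  # little-endian: the high byte is stored last
    return 0x40 <= high <= 0x45 or 0xC0 <= high <= 0xC5

def find_float_runs(data: bytes, min_floats: int = 6):
    """Return (byte_offset, float_count) for each run of matching 4-byte words."""
    runs = []
    start = None
    n_words = len(data) // 4
    for i in range(n_words + 1):
        # The extra iteration at i == n_words closes a run that reaches the end.
        ok = i < n_words and looks_like_coordinate(data[4 * i:4 * i + 4])
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_floats:
                runs.append((start * 4, i - start))
            start = None
    return runs

# Synthetic buffer: 8 bytes of padding, six floats, 4 bytes of padding.
sample = (b"\x00" * 8
          + struct.pack("<6f", 2.5, -3.0, 10.0, 100.0, -7.25, 4.0)
          + b"\x00" * 4)
runs = find_float_runs(sample)
```

A long run found this way is only a candidate vertex table; it still has to be checked by eye in a hex editor.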

Next, we need a table that specifies the order in which the vertices are connected into triangles. Most often it looks like this:

These are 16-bit vertex numbers, in groups of three. Since models usually have no more than a few hundred vertices, these numbers are small, and such a table is also easy to spot by eye. In this example, one of the triangles is highlighted; it consists of the vertices numbered 50, 51 and 52.

The third is a table of texture coordinates, which binds the vertices to the flat texture that is to be stretched over them.
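Reading such a table is straightforward once its offset and length are known. This is a minimal sketch, assuming little-endian unsigned 16-bit indices; `read_triangles` and `indices_fit` are hypothetical helper names, and the offset and counts would come from your own inspection of the file.

```python
import struct

def read_triangles(data: bytes, offset: int, triangle_count: int):
    """Decode triangle_count triples of little-endian uint16 vertex indices."""
    flat = struct.unpack_from(f"<{triangle_count * 3}H", data, offset)
    return [flat[i:i + 3] for i in range(0, len(flat), 3)]

def indices_fit(triangles, vertex_count: int) -> bool:
    """Sanity check: every index must refer to an existing vertex."""
    return all(index < vertex_count for tri in triangles for index in tri)

# Synthetic table with two triangles, the first one being 50, 51, 52.
blob = struct.pack("<6H", 50, 51, 52, 52, 51, 53)
triangles = read_triangles(blob, 0, 2)
```

If `indices_fit` fails for the vertex count you found, either the offset or the assumed index width is probably wrong.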

The x and y coordinates within the texture are normalized to the range from 0 to 1, and for a texture of 2048x2048 or 4096x4096 high precision is pointless. They are therefore most often stored as half-precision (16-bit) floating-point numbers. Their high byte is usually slightly above 0x30, occasionally reaching 0x40 or a little more. Here the texture coordinates are highlighted in red and orange, and the light-map coordinates in green and light green.
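Half-precision values can be decoded with the `e` format of Python's standard `struct` module. The sketch below assumes little-endian storage and two values per vertex; `read_uvs` is a name invented for this example, and a real file may interleave light-map coordinates or other data between the pairs.

```python
import struct

def read_uvs(data: bytes, offset: int, count: int):
    """Decode count (u, v) pairs of little-endian half-precision floats."""
    flat = struct.unpack_from(f"<{count * 2}e", data, offset)
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]

# 0x3800 = 0.5, 0x3400 = 0.25, 0x3C00 = 1.0, 0x0000 = 0.0 (high byte stored second)
blob = bytes.fromhex("0038" "0034" "003c" "0000")
uvs = read_uvs(blob, 0, 2)
```

The sample bytes also show where the typical high bytes come from: 0.125 encodes as 0x3000 and 1.0 as 0x3C00, so most coordinates inside the texture land between those two.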

Once these tables are found, you can count their elements, find where that count is stored, and work out how the whole structure is described. But besides these tables, the file is sure to contain many small numbers whose meaning is unclear. For example, in the middle of a file with a human model there are three floating-point values of 1.0 (highlighted in green):
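Looking for the stored element count can be partly automated. This sketch simply searches for the count encoded as a little-endian 32-bit or 16-bit integer; that encoding is an assumption, `find_count_candidates` is a hypothetical name, and some formats store a byte size or an index count instead of the element count.

```python
import struct

def find_count_candidates(data: bytes, count: int):
    """Return (offset, width_in_bytes) wherever count appears as a little-endian integer."""
    hits = []
    for fmt, width in (("<I", 4), ("<H", 2)):
        if count >= 1 << (8 * width):
            continue  # the value does not fit into this width
        needle = struct.pack(fmt, count)
        pos = data.find(needle)
        while pos != -1:
            hits.append((pos, width))
            pos = data.find(needle, pos + 1)
    return hits

# Synthetic header: filler bytes, the count 300 as uint32, more filler.
header = b"\xAA" * 4 + struct.pack("<I", 300) + b"\xBB" * 4
hits = find_count_candidates(header, 300)
```

Small counts will produce many false hits, so the candidates closest to the start of the table, or in the file header, are the ones worth checking first.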

How do we find out what they mean? Simply change them and see what happens. We write 1.5 in their place and run the game.
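The same experiment can be scripted instead of done by hand in a hex editor. This is a sketch under the assumption that the values are little-endian 32-bit floats; `patch_floats` is a hypothetical helper, and the offset would be the one you found by inspecting the file. It works on a copy of the data, so the original file stays untouched.

```python
import struct

def patch_floats(data: bytes, offset: int, values) -> bytes:
    """Return a copy of data with little-endian float32 values written at offset."""
    patched = bytearray(data)
    struct.pack_into(f"<{len(values)}f", patched, offset, *values)
    return bytes(patched)

# Synthetic file: three 1.0 values surrounded by other data.
original = b"HEAD" + struct.pack("<3f", 1.0, 1.0, 1.0) + b"TAIL"
offset = original.find(struct.pack("<3f", 1.0, 1.0, 1.0))
patched = patch_floats(original, offset, [1.5, 1.5, 1.5])
```

Because the patch overwrites bytes in place, the file length and all other offsets stay the same, which keeps the rest of the structure valid for the game.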

Well, of course, this is the scale. An interesting effect arises because the animation is stored separately in other files and is tied to the coordinates in the models, which is why such bizarre freaks appear. The man in the square had a child sitting on his shoulders; now the child is inside his chest. The man in front of the barrier who is making strange movements is actually applauding.

Now let's try to determine which number is the width and which is the height. We change only one number, the first, reducing it by a factor of 10.

By experimenting in this way, you can determine the meaning of the remaining numbers. If changing some of them has no visible effect, just leave them alone. Maybe we will never know why they are needed. And if one day they make themselves known, then we will understand.

This is one of the cases where analyzing the file format directly is easier and faster than studying the exe file, where you can get lost in the wilds of code that feeds all this data to the video subsystem.

Source: https://habr.com/ru/post/263009/
