
After a considerable delay, I am publishing the next article in this series.
For your reference:
In this article, I will address the remaining file formats:
- BLOB is an obsolete data format that serves as a container for binary data. It contained the basic parameters (IP addresses of servers, the CDR record and much more);
- CDR (Content Description Record) is a binary file containing data about applications and their cache files. Currently not used;
- VDF is a binary / text file containing a set of data and having a structure depending on the specific application. Designed to replace BLOB and CDR;
- PAK - a legacy of Quake 1, previously used in Half-Life 1 and no longer in use;
- VPK - a newer format for game archives inside the games themselves, actively used at the moment. A detailed description of the file is available on the official resource. This article describes only the first version of the format.
This article is for reference only: it contains relatively little up-to-date information and almost no algorithm examples, since everything can be viewed in the previously mentioned repository.
BLOB (Binary Large OBject)
In previous versions of the Steam client, this format was used by a single file - ClientRegistry.blob.
It has a clear tree-like structure and is read recursively until the child elements are exhausted. There is no separate file header: the file starts directly with a root node that has at least 1 descendant.
The format is somewhat non-linear, as I will indicate below.
Node Header
Each node has 2 headers - the header of the node itself and the header of the node data.
Node header format:
struct TBLOBNodeHeader { UINT16 Magic; UINT32 Size; UINT32 SlackSize; };
Magic - field describing the type of node. Possible values:
- 0x5001 - simple node with child nodes;
- 0x4301 - compressed node: its data must be inflated (deflate) and the headers must then be read again from the resulting data (here it is, the non-linearity!);
- Other values (usually 0x0000) - a named node containing descendants.
Size - the actual size of the data stored in the node (does not include headers);
SlackSize is the size of the data block recorded for alignment in the file.
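As a minimal sketch of reading this header from a byte buffer: the code below assumes that the fields are stored little-endian and without padding (the struct above only matches the file layout when packed to 1 byte, which gives 10 bytes). The names BlobNodeHeader, ReadLE and ReadBlobNodeHeader are illustrative and do not come from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Field layout taken from the article; all names here are illustrative.
struct BlobNodeHeader {
    uint16_t magic;      // node type: 0x5001, 0x4301, or another value for a named node
    uint32_t size;       // per the article: size of the data stored in the node
    uint32_t slackSize;  // size of the alignment block
};

// On-disk size: 2 + 4 + 4 bytes (assumes no padding between fields).
constexpr size_t kBlobNodeHeaderSize = 10;

// Reads an n-byte little-endian unsigned integer (n <= 4).
inline uint32_t ReadLE(const unsigned char* p, size_t n) {
    uint32_t v = 0;
    for (size_t i = 0; i < n; ++i) v |= static_cast<uint32_t>(p[i]) << (8 * i);
    return v;
}

// Fills `out` from the start of `buf`; returns false if the buffer is too short.
inline bool ReadBlobNodeHeader(const unsigned char* buf, size_t len, BlobNodeHeader& out) {
    if (len < kBlobNodeHeaderSize) return false;
    out.magic = static_cast<uint16_t>(ReadLE(buf, 2));
    out.size = ReadLE(buf + 2, 4);
    out.slackSize = ReadLE(buf + 6, 4);
    return true;
}
```

Reading field by field avoids depending on compiler-specific packing pragmas and on the alignment of the source pointer.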
Compressed data header
If the node was compressed, then the header of the node is followed by the header of the compressed data:
struct TBLOBCompressedDataHeader { UINT32 UncompressedSize; UINT32 unknown1; UINT16 unknown2; };
UncompressedSize - the size of the "raw" data for which you will need to allocate memory;
unknown1, unknown2 - purpose unknown; always equal to 1 and do not affect parsing.
As stated above, the node header must be read again from the data obtained after calling uncompress from ZLib.
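A small sketch of locating the compressed payload before handing it to ZLib. It assumes little-endian fields and 10-byte packed headers; the names CompressedPayload and LocateCompressedPayload are hypothetical. The actual call to uncompress is left out so that the sketch has no external dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint16_t kCompressedMagic = 0x4301;
constexpr size_t kNodeHeaderSize = 10;        // UINT16 + UINT32 + UINT32, assumed packed
constexpr size_t kCompressedHeaderSize = 10;  // UINT32 + UINT32 + UINT16, assumed packed

struct CompressedPayload {
    size_t offset;              // where the compressed stream starts inside the buffer
    uint32_t uncompressedSize;  // how much memory to allocate for the raw data
};

// Reads an n-byte little-endian unsigned integer (n <= 4).
inline uint32_t ReadLE(const unsigned char* p, size_t n) {
    uint32_t v = 0;
    for (size_t i = 0; i < n; ++i) v |= static_cast<uint32_t>(p[i]) << (8 * i);
    return v;
}

// Returns true and fills `out` if `buf` starts with a compressed node.
inline bool LocateCompressedPayload(const unsigned char* buf, size_t len,
                                    CompressedPayload& out) {
    if (len < kNodeHeaderSize + kCompressedHeaderSize) return false;
    if (ReadLE(buf, 2) != kCompressedMagic) return false;
    // UncompressedSize is the first field of the compressed data header.
    out.uncompressedSize = ReadLE(buf + kNodeHeaderSize, 4);
    out.offset = kNodeHeaderSize + kCompressedHeaderSize;
    return true;
}
```

The bytes starting at `offset` would then be passed to uncompress together with a destination buffer of `uncompressedSize` bytes, after which the node header is read again from that buffer.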
Parsing data
After reading the node header and, if necessary, unpacking its contents, the most fun part begins - reading the node contents. The algorithm was optimized as much as possible, which makes it hard to follow after so much time has passed.
Parsing depends on the TBLOBNodeHeader.Magic field: if it is 0x5001, then we immediately read the descendant nodes.
Otherwise, we read the TBLOBDataHeader header:
struct TBLOBDataHeader { UINT16 NameLen; UINT32 DataLen; };
Following this header is the node name, followed by the data.
In the data, the header of the descendant node is immediately read, and depending on the type of node, branching occurs:
- If 0x5001 or 0x4301 - read the new node;
- Otherwise, save as simple data.
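The branching above can be sketched as follows. This is only an illustration under the assumptions of little-endian fields and a 6-byte packed data header; NamedNode and ParseNamedNode are hypothetical names, and the sketch only classifies the data instead of recursing into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr uint16_t kNodeMagic = 0x5001;
constexpr uint16_t kCompressedMagic = 0x4301;
constexpr size_t kDataHeaderSize = 6;  // UINT16 NameLen + UINT32 DataLen, assumed packed

struct NamedNode {
    std::string name;
    const unsigned char* data = nullptr;  // points into the input buffer
    uint32_t dataLen = 0;
    bool holdsNode = false;  // true if the data starts with a node magic
};

// Reads an n-byte little-endian unsigned integer (n <= 4).
inline uint32_t ReadLE(const unsigned char* p, size_t n) {
    uint32_t v = 0;
    for (size_t i = 0; i < n; ++i) v |= static_cast<uint32_t>(p[i]) << (8 * i);
    return v;
}

// Parses data header, name and data; returns false if the buffer is too short.
inline bool ParseNamedNode(const unsigned char* buf, size_t len, NamedNode& out) {
    if (len < kDataHeaderSize) return false;
    const uint32_t nameLen = ReadLE(buf, 2);
    const uint32_t dataLen = ReadLE(buf + 2, 4);
    size_t left = len - kDataHeaderSize;
    if (left < nameLen) return false;
    left -= nameLen;
    if (left < dataLen) return false;
    out.name.assign(reinterpret_cast<const char*>(buf + kDataHeaderSize), nameLen);
    out.data = buf + kDataHeaderSize + nameLen;
    out.dataLen = dataLen;
    // Peek at the first two bytes of the data to decide how to treat it.
    const uint32_t magic = dataLen >= 2 ? ReadLE(out.data, 2) : 0;
    out.holdsNode = (magic == kNodeMagic || magic == kCompressedMagic);
    return true;
}
```

When `holdsNode` is true, a full parser would continue by reading a node header at `data`; otherwise the `dataLen` bytes are stored as simple data.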
Parsing data (C++)
void CBLOBNode::DeserializeFromMem(char *mem)
{
    TBLOBNodeHeader *NodeHeader = (TBLOBNodeHeader*)mem;
    TBLOBDataHeader *DataHeader = (TBLOBDataHeader*)mem;
    char *data = NULL;

    // Compressed node: inflate the payload, then re-read the headers from the result.
    if (NodeHeader->Magic == NODE_COMPRESSED_MAGIC)
    {
        mem += sizeof(TBLOBNodeHeader);
        TBLOBCompressedDataHeader *CompressedHeader = (TBLOBCompressedDataHeader*)mem;
        mem += sizeof(TBLOBCompressedDataHeader);
        UINT32 compSize = NodeHeader->Size;
        // uLongf can be wider than UINT32, so pass a real uLongf instead of casting a pointer.
        uLongf uncompSize = CompressedHeader->UncompressedSize;
        data = new char[uncompSize];
        if (uncompress((Bytef*)data, &uncompSize, (Bytef*)mem, compSize) != Z_OK)
        {
            delete[] data; // do not leak the buffer on failure
            return;
        }
        mem = data;
        NodeHeader = (TBLOBNodeHeader*)mem;
        DataHeader = (TBLOBDataHeader*)mem;
    }

    if (NodeHeader->Magic == NODE_MAGIC)
    {
        // Simple node: read the children one after another.
        fIsData = false;
        fDataSize = NodeHeader->Size;
        fSlackSize = NodeHeader->SlackSize;
        fChildrensCount = GetChildrensCount(mem);
        fChildrens = new CBLOBNode*[fChildrensCount];
        mem += sizeof(TBLOBNodeHeader);
        for (UINT i = 0; i < fChildrensCount; i++)
        {
            fChildrens[i] = new CBLOBNode();
            fChildrens[i]->DeserializeFromMem(mem);
            // Skip over the child that was just parsed.
            NodeHeader = (TBLOBNodeHeader*)mem;
            DataHeader = (TBLOBDataHeader*)mem;
            if ((NodeHeader->Magic == NODE_MAGIC) || (NodeHeader->Magic == NODE_COMPRESSED_MAGIC))
                mem += NodeHeader->Size + NodeHeader->SlackSize;
            else
                mem += sizeof(TBLOBDataHeader) + DataHeader->DataLen + DataHeader->NameLen;
        }
    }
    else
    {
        // Named node: data header, then the name, then the data.
        fIsData = true;
        fNameLen = DataHeader->NameLen;
        fDataSize = DataHeader->DataLen;
        mem += sizeof(TBLOBDataHeader);
        fName = new char[fNameLen+1];
        memcpy(fName, mem, fNameLen);
        fName[fNameLen] = '\x00';
        mem += fNameLen;
        UINT16 node;
        memcpy(&node, mem, 2);
        if ((node == NODE_MAGIC) || (node == NODE_COMPRESSED_MAGIC))
        {
            // The data is itself a node: parse it in place.
            DeserializeFromMem(mem);
            fData = NULL;
        }
        else
        {
            fData = new char[fDataSize];
            memcpy(fData, mem, fDataSize);
        }
    }

    if (data != NULL)
        delete[] data; // allocated with new[]
}
Delphi
procedure TBLOBNode.DeserializeFromMem(Mem: pByte); var NodeHeader: pBLOBNodeHeader; DataHeader: pBLOBDataHeader; CompressedHeader: TBLOBCompressedDataHeader; compSize, uncompSize: uint32; Data: Pointer; ChildrensCount, i: integer;
CDR (Content Description Record)
It is stored in a BLOB container and has several main descendants in the root node, whose positions are fixed (the same is true of their descendants):
- 0 - file version (number, 16 bits);
- 1 - application records;
- 2 - description of application packages;
- 3, 4 - purpose unknown, so they are simply ignored;
- 5 - public application keys;
- 6 - encrypted private keys.
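For readability, the fixed indices of the root node can be given symbolic names. The enumeration below is only a sketch: the identifiers are invented for illustration and are not taken from the Steam client or from the repository.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Indices of the root-node descendants of a CDR, as listed in the article.
// All identifiers are hypothetical.
enum class CdrRootField : uint32_t {
    FileVersion = 0,           // 16-bit number
    ApplicationRecords = 1,
    PackageRecords = 2,
    Unknown3 = 3,              // purpose unknown, ignored
    Unknown4 = 4,              // purpose unknown, ignored
    PublicKeys = 5,
    EncryptedPrivateKeys = 6,
};

// Returns a short description of a root field, or "unknown" for other indices.
inline const char* CdrRootFieldName(uint32_t index) {
    switch (static_cast<CdrRootField>(index)) {
        case CdrRootField::FileVersion:          return "file version";
        case CdrRootField::ApplicationRecords:   return "application records";
        case CdrRootField::PackageRecords:       return "application packages";
        case CdrRootField::PublicKeys:           return "public application keys";
        case CdrRootField::EncryptedPrivateKeys: return "encrypted private keys";
        default:                                 return "unknown";
    }
}
```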
What follows are many very boring and long enumerations; you can skip them. The purpose of the fields is still unclear.
Application records
Fields (also BLOB nodes, by index):
- 1 - application ID;
- 2 - the name of the application;
- 3 - Application directory;
- 4 - Minimum cache file size;
- 5 - Maximum cache file size;
- 6 - Contains a list of launch parameters;
- 7 - Contains a list of application icons;
- 8 - ID of the application that must be run on the first start;
- 9 - flag Is Bandwidth Greedy;
- 10 - List of application versions;
- 11 - ID of the current version of the application;
- 12 - List of application cache files;
- 13 - Test version number;
- 14 - Additional fields in the form of a list of name-value pairs;
- 15 - test version password;
- 16 - ID of the test version;
- 17 - The original game folder;
- 18 - SkipMFPOverwrite Flag;
- 19 - UseFilesystemDvr flag.
Launch parameters:
- 1 - Description;
- 2 - Command line parameters;
- 3 - Icon number;
- 4 - Flag, responsible for the lack of a shortcut on the desktop;
- 5 - Flag, responsible for the lack of a shortcut in the Start menu;
- 6 - Flag Long Running Unattended.
Application Versions:
- 1 - Description of the version;
- 2 - Version number;
- 3 - Flag marking this version of the application as unavailable;
- 4 - List of launch parameter IDs for this version;
- 5 - Decryption key for content;
- 6 - Flag indicating the presence of the decryption key;
- 7 - IsRebased flag;
- 8 - Flag IsLongVersionRoll.
Application cache files:
- 1 - cache file ID;
- 2 - The name of the cache file to be mounted;
- 3 - Flag marking the cache file as optional.
Application Package Description
- 1 - Package ID;
- 2 - Package name;
- 3 - Package type;
- 4 - Price in cents;
- 5 - Period in minutes;
- 6 - List of application IDs of this package;
- 7 - ID of the application being launched (WTF?);
- 8 - OnSubscribeRunLaunchOptionIndex flag;
- 9 - RateLimitRecord list;
- 10 - Discounts list;
- 11 - Pre-order flag;
- 12 - Flag indicating that the buyer's physical address is required;
- 13 - Domestic price in cents;
- 14 - International price in cents;
- 15 - Type of key required;
- 16 - Flag indicating that this package is only for cybercafes;
- 17 - Some game code;
- 18 - Description of this code;
- 19 - Package unavailable flag;
- 20 - Flag indicating that the game disc is required;
- 21 - Code of the region in which this game is available;
- 22 - Flag indicating that the package is available in version 3;
- 23 - Additional fields in the form of a list of name-value pairs.
VDF
The client's settings are stored in files of this format, and in current versions so is the information about applications. A file can be either binary or text.
Like BLOB, it has a tree structure.
Consider a binary file. There are several types of files that differ in structure and headers, but the format of the nodes is the same.
Each node begins with a byte describing the type of the node, followed by a NULL-terminated string with the name of the node.
Types of nodes:
- 0 - contains only subnodes;
- 1 - string data;
- 2 - integer;
- 3 - floating-point number;
- 4 - pointer (to what??);
- 5 - Unicode string;
- 6 - color;
- 7 - 64-bit integer;
- 8 - marker of the end of the list of nodes.
When a list of descendant nodes is read, nodes are read until a node of type 8 is encountered.
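The recursive reading of such a tree can be sketched as below. The article does not give the payload sizes, so the sketch assumes little-endian data, 4 bytes for types 2, 3, 4 and 6, 8 bytes for type 7, and no name after the end marker (type 8). Type 5 is not handled and is treated as an error. VdfNode, Cursor and ReadVdfList are illustrative names; C++17 is assumed for the self-referencing vector member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

struct VdfNode {
    uint8_t type = 0;
    std::string name;
    std::string text;               // type 1
    uint64_t number = 0;            // types 2, 4, 6, 7 (raw value)
    float real = 0.0f;              // type 3
    std::vector<VdfNode> children;  // type 0
};

struct Cursor {
    const unsigned char* p;
    size_t left;
};

inline bool ReadBytes(Cursor& c, void* dst, size_t n) {
    if (c.left < n) return false;
    std::memcpy(dst, c.p, n);
    c.p += n;
    c.left -= n;
    return true;
}

// Reads a NULL-terminated string and consumes the terminator.
inline bool ReadCString(Cursor& c, std::string& out) {
    const void* end = std::memchr(c.p, 0, c.left);
    if (end == nullptr) return false;
    const size_t n = static_cast<size_t>(static_cast<const unsigned char*>(end) - c.p);
    out.assign(reinterpret_cast<const char*>(c.p), n);
    c.p += n + 1;
    c.left -= n + 1;
    return true;
}

// Reads an n-byte little-endian unsigned integer (n <= 8).
inline bool ReadLE(Cursor& c, size_t n, uint64_t& v) {
    unsigned char b[8];
    if (n > 8 || !ReadBytes(c, b, n)) return false;
    v = 0;
    for (size_t i = 0; i < n; ++i) v |= static_cast<uint64_t>(b[i]) << (8 * i);
    return true;
}

// Reads sibling nodes into `out` until the end marker (type 8) is found.
inline bool ReadVdfList(Cursor& c, std::vector<VdfNode>& out) {
    for (;;) {
        uint8_t type = 0;
        if (!ReadBytes(c, &type, 1)) return false;
        if (type == 8) return true;  // end of this list level
        VdfNode node;
        node.type = type;
        if (!ReadCString(c, node.name)) return false;
        switch (type) {
            case 0:
                if (!ReadVdfList(c, node.children)) return false;
                break;
            case 1:
                if (!ReadCString(c, node.text)) return false;
                break;
            case 2: case 4: case 6:  // assumed to be 4 bytes each
                if (!ReadLE(c, 4, node.number)) return false;
                break;
            case 3: {                // assumed to be a 32-bit float
                uint64_t bits = 0;
                if (!ReadLE(c, 4, bits)) return false;
                const uint32_t b32 = static_cast<uint32_t>(bits);
                std::memcpy(&node.real, &b32, sizeof(b32));
                break;
            }
            case 7:
                if (!ReadLE(c, 8, node.number)) return false;
                break;
            default:                 // type 5 and unknown types are not handled
                return false;
        }
        out.push_back(std::move(node));
    }
}
```

The recursion depth follows the nesting of the file, so a parser for untrusted input would also want a depth limit.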
Consider the basic binary files that use the binary version of the VDF format.
appcache / appinfo.vdf
First comes a header with the following content:
struct TVDFHeader { uint8_t version1; uint16_t type; uint8_t version2; uint32_t version3; };
The version1 and version2 fields were previously considered part of the signature, but they have changed over time: they used to be 0x24 and 0x06, and are now 0x26 and 0x07, respectively.
The type field is a signature and contains 0x4456 ('DV').
The version3 field always contains 0x00000001.
The header is followed by a list with information about applications, each element of which has its own header:
struct TVDFAppHeader { uint32_t AppID; uint32_t DataSize; };
This header is followed by a list of entries, each consisting of a 1-byte end-of-list marker (0x00 at the end of the list) and a VDF tree element.
appcache / packageinfo.vdf
The header is similar to the previous one; only the first 3 fields differ:
- version1 and version2 previously contained 0x25 and 0x06, now - 0x27 and 0x06;
- type - 0x5556 ('UV').
Following the header is a list of nodes describing application packages. Before each element of the list is a 4-byte number, which is equal to 0xFFFFFFFF, if the end of the list is reached.
Sample text VDF file .
PAK
The legacy archive format used in the first versions of Half-Life 1. No compression, it's just a container for files.
File Header:
struct TPAKHeader { char Sign[4]; uint32_t DirectoryOffset; uint32_t DirectoryLength; };
Sign - the signature contains 'PACK'.
DirectoryOffset - offset of the beginning of the list of items.
DirectoryLength - the size of the list of items.
At the specified offset is an array of headers for the elements contained in the archive:
struct TPAKDirectoryItem { char ItemName[56]; uint32_t ItemOffset; uint32_t ItemLength; };
I think there is no need to describe anything, everything is clear.
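For completeness, a minimal sketch of listing the contents of a PAK archive that has already been loaded into memory. It assumes little-endian fields and that the number of items is DirectoryLength divided by the 64-byte item size; PakEntry and ListPak are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct PakEntry {
    std::string name;
    uint32_t offset;
    uint32_t length;
};

constexpr size_t kPakHeaderSize = 12;  // Sign[4] + DirectoryOffset + DirectoryLength
constexpr size_t kPakItemSize = 64;    // ItemName[56] + ItemOffset + ItemLength
constexpr size_t kPakNameSize = 56;

inline uint32_t ReadU32LE(const unsigned char* p) {
    return static_cast<uint32_t>(p[0]) | static_cast<uint32_t>(p[1]) << 8 |
           static_cast<uint32_t>(p[2]) << 16 | static_cast<uint32_t>(p[3]) << 24;
}

// Fills `out` with the directory items; returns false on a malformed archive.
inline bool ListPak(const std::vector<unsigned char>& file, std::vector<PakEntry>& out) {
    if (file.size() < kPakHeaderSize) return false;
    if (std::memcmp(file.data(), "PACK", 4) != 0) return false;
    const uint64_t dirOffset = ReadU32LE(file.data() + 4);
    const uint64_t dirLength = ReadU32LE(file.data() + 8);
    if (dirOffset + dirLength > file.size()) return false;
    out.clear();
    for (uint64_t pos = 0; pos + kPakItemSize <= dirLength; pos += kPakItemSize) {
        const unsigned char* item = file.data() + dirOffset + pos;
        // The name may fill all 56 bytes without a terminator.
        const void* nul = std::memchr(item, 0, kPakNameSize);
        const size_t nameLen =
            nul ? static_cast<size_t>(static_cast<const unsigned char*>(nul) - item)
                : kPakNameSize;
        PakEntry e;
        e.name.assign(reinterpret_cast<const char*>(item), nameLen);
        e.offset = ReadU32LE(item + kPakNameSize);
        e.length = ReadU32LE(item + kPakNameSize + 4);
        if (static_cast<uint64_t>(e.offset) + e.length > file.size()) return false;
        out.push_back(e);
    }
    return true;
}
```

The data of an item is then simply the `length` bytes starting at `offset` from the beginning of the file.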
VPK
The format of the game file archives, presented as a set of files, one of which contains a description of the location of the files, while the others contain the files themselves. The root file has a name of the form "<archive name>_dir.vpk", and the rest - "<archive name>_<archive number>.vpk".
Consider the structure of the root file, starting with the following header:
struct TVPKHeader { uint32_t Signature; uint32_t PaksCount; uint32_t DirSize; };
Signature - always contains 0x55aa1234.
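PaksCount - the number of archives with the contents of files (note that Valve's published description of the format lists this field as the format version, which equals 1 for the first version);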
DirSize - the size of the data with the meta-information about the files.
The header is followed by a hierarchical list of items, grouped by file extension and then by path.
That is, first there is a NULL-terminated string with a file extension, then a NULL-terminated string with a path that contains such files, followed by a NULL-terminated string with the file name (without extension) and the information about the file. The end of each list level is marked by an empty string.
Pseudo-structure example (string part only):
bsp
    hl2/maps
        map1
        map2
        map3
wav
    sound/amb
        amb1
        amb2
    sound/voice
        voice1
        voice2
File Information Format:
struct TVPKDirectoryEntry { uint32_t CRC; uint16_t PreloadBytes; uint16_t ArchiveIndex; uint32_t EntryOffset; uint32_t EntryLength; uint16_t Dummy1; };
CRC - file checksum;
PreloadBytes - the size of the data from the beginning of the file that is stored in the root file right after this structure;
ArchiveIndex - the number of the archive that contains the file data;
EntryOffset - data offset inside the archive;
EntryLength - data size.
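Putting the header, the three-level string list and the entry structure together, walking the root file can be sketched like this. The sketch assumes little-endian fields, an 18-byte packed entry, and that PreloadBytes bytes of data follow each entry; it ignores the second and third header fields and simply joins path, name and extension with "/" and ".". VpkFile, Reader and ListVpk are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct VpkFile {
    std::string path;  // "<path>/<name>.<extension>"
    uint32_t crc = 0;
    uint16_t preloadBytes = 0;
    uint16_t archiveIndex = 0;
    uint32_t offset = 0;
    uint32_t length = 0;
};

struct Reader {
    const unsigned char* p;
    size_t left;
};

// Reads a NULL-terminated string and consumes the terminator.
inline bool ReadString(Reader& r, std::string& s) {
    const void* end = std::memchr(r.p, 0, r.left);
    if (end == nullptr) return false;
    const size_t n = static_cast<size_t>(static_cast<const unsigned char*>(end) - r.p);
    s.assign(reinterpret_cast<const char*>(r.p), n);
    r.p += n + 1;
    r.left -= n + 1;
    return true;
}

// Reads an n-byte little-endian unsigned integer (n <= 4).
inline bool ReadLE(Reader& r, size_t n, uint32_t& v) {
    if (n > 4 || r.left < n) return false;
    v = 0;
    for (size_t i = 0; i < n; ++i) v |= static_cast<uint32_t>(r.p[i]) << (8 * i);
    r.p += n;
    r.left -= n;
    return true;
}

// Walks the directory of a root (_dir) file held in memory.
inline bool ListVpk(const unsigned char* file, size_t len, std::vector<VpkFile>& out) {
    Reader r{file, len};
    uint32_t signature = 0, second = 0, dirSize = 0;
    if (!ReadLE(r, 4, signature) || !ReadLE(r, 4, second) || !ReadLE(r, 4, dirSize))
        return false;
    if (signature != 0x55aa1234u) return false;
    std::string ext, path, name;
    for (;;) {
        if (!ReadString(r, ext)) return false;
        if (ext.empty()) return true;    // end of the extension level
        for (;;) {
            if (!ReadString(r, path)) return false;
            if (path.empty()) break;     // end of the path level
            for (;;) {
                if (!ReadString(r, name)) return false;
                if (name.empty()) break; // end of the file-name level
                VpkFile f;
                uint32_t preload = 0, archive = 0, dummy = 0;
                if (!ReadLE(r, 4, f.crc) || !ReadLE(r, 2, preload) ||
                    !ReadLE(r, 2, archive) || !ReadLE(r, 4, f.offset) ||
                    !ReadLE(r, 4, f.length) || !ReadLE(r, 2, dummy))
                    return false;
                if (r.left < preload) return false;
                r.p += preload;          // skip the preloaded data
                r.left -= preload;
                f.preloadBytes = static_cast<uint16_t>(preload);
                f.archiveIndex = static_cast<uint16_t>(archive);
                f.path = path + "/" + name + "." + ext;
                out.push_back(f);
            }
        }
    }
}
```

A stricter reader could also use DirSize to bound the walk, so that parsing never runs past the directory data.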
Conclusion
This is the complete description of all the Steam file formats that I worked out myself or with the help of materials from the cs.rin.ru forum (yes, that is exactly where the most ardent English-speaking no-Steam activists still hang out). Only after finishing this article did I realize that it could easily have been folded into the previous one: the volume would not have grown much, and there would not be such a small stub hanging on its own...
Oh well. In the next article I will describe how Steam works with all of its servers (root, authentication, content, etc.). It will cover the already outdated SteamNetwork2 protocol (the 3rd version, based on HTTPS, is currently in use).