
CPIO under the microscope

CPIO is a fairly old (1990) but very convenient archive format. It is rather simple, and perhaps for that reason it has become widely used. For example, the format is used by RPM, by the Linux kernel's initramfs, and by the "pax" installer archives from Apple.

This archive allows you to collect any number of files, directories and other objects of the file system (symbolic links, etc.) into a single stream of bytes.

Let's look at the format of this archive using an example.

Each file system object in such an archive consists of a header with basic metadata, followed by the full path to the object and the object's contents. The header contains a set of integer values that largely follow the fields of the stat(2) structure on *nix systems. The end of the archive is marked by a special entry (laid out like the rest) named 'TRAILER!!!'.
File format.

At present the most widespread variant is the old (binary) CPIO entry format, and that is the one described here.

An entry header has the following structure:

    struct header_old_cpio {
        unsigned short c_magic;
        unsigned short c_dev;
        unsigned short c_ino;
        unsigned short c_mode;
        unsigned short c_uid;
        unsigned short c_gid;
        unsigned short c_nlink;
        unsigned short c_rdev;
        unsigned short c_mtime[2];
        unsigned short c_namesize;
        unsigned short c_filesize[2];
    };

Here it is assumed that the unsigned short type is 16 bits.
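The 16-bit assumption can be made explicit with fixed-width types. The following is only a sketch: the struct name header_old_cpio_fixed is made up for illustration, and the compile-time check relies on C11's _Static_assert.

```c
#include <stdint.h>

/* Same layout as header_old_cpio, but with fixed-width fields.
 * The name header_old_cpio_fixed is hypothetical, used only here. */
struct header_old_cpio_fixed {
    uint16_t c_magic;
    uint16_t c_dev;
    uint16_t c_ino;
    uint16_t c_mode;
    uint16_t c_uid;
    uint16_t c_gid;
    uint16_t c_nlink;
    uint16_t c_rdev;
    uint16_t c_mtime[2];
    uint16_t c_namesize;
    uint16_t c_filesize[2];
};

/* Thirteen 16-bit fields, so the header occupies 26 bytes in the archive. */
_Static_assert(sizeof(struct header_old_cpio_fixed) == 26,
               "old cpio header must be 26 bytes");
```

With this declaration a build fails early on a platform where the layout would not match the archive, instead of silently misreading the fields.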

c_magic
The integer value 070707 (octal), or 0x71c7 (hexadecimal). It is used to determine the byte order of the archive (little-endian vs. big-endian).
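As a rough illustration of how the magic number reveals the byte order, a reader can compare the first two bytes of a header against both possible layouts of 0x71c7. The enum and the function cpio_detect_order are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order of the 16-bit header fields, judged from the magic number. */
enum cpio_order { CPIO_ORDER_UNKNOWN, CPIO_ORDER_LITTLE, CPIO_ORDER_BIG };

/* Hypothetical helper: inspect the first two bytes of an entry header. */
enum cpio_order cpio_detect_order(const uint8_t *p)
{
    assert(p != NULL);
    if (p[0] == 0xC7 && p[1] == 0x71)
        return CPIO_ORDER_LITTLE;   /* 0x71c7 stored low byte first */
    if (p[0] == 0x71 && p[1] == 0xC7)
        return CPIO_ORDER_BIG;      /* 0x71c7 stored high byte first */
    return CPIO_ORDER_UNKNOWN;      /* not an old-format binary header */
}
```

Checking the raw bytes rather than loading them into an unsigned short keeps the test independent of the byte order of the machine doing the reading.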

c_dev, c_ino
The device and inode numbers from the disk. They match the values in the stat structure. If the inode number is greater than 65535, the high-order bits are lost.

c_mode
The field determines both the access permissions and the type of the object:
0170000  Mask for the file type bits.
0140000  Socket.
0120000  Symbolic link. The body of a symbolic link entry contains the path to the file it refers to.
0100000  Regular file.
0060000  Block special device.
0040000  Directory.
0020000  Character special device.
0010000  Named pipe (FIFO).
0004000  SUID.
0002000  SGID.
0001000  Sticky bit.
0000777  The lower 9 bits define the access permissions of the object.
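The table translates almost directly into code. The sketch below uses two hypothetical helpers, cpio_type_name and cpio_permissions, and assumes c_mode has already been read in the correct byte order.

```c
#include <stdint.h>

#define CPIO_TYPE_MASK 0170000u  /* file type bits */
#define CPIO_PERM_MASK 0000777u  /* rwxrwxrwx */

/* Hypothetical helper: name the object type encoded in c_mode. */
const char *cpio_type_name(uint16_t mode)
{
    switch (mode & CPIO_TYPE_MASK) {
    case 0140000: return "socket";
    case 0120000: return "symbolic link";
    case 0100000: return "regular file";
    case 0060000: return "block device";
    case 0040000: return "directory";
    case 0020000: return "character device";
    case 0010000: return "FIFO";
    default:      return "unknown";
    }
}

/* Hypothetical helper: extract the nine permission bits. */
unsigned cpio_permissions(uint16_t mode)
{
    return mode & CPIO_PERM_MASK;
}
```

For a mode of 0x41fd (0040775), cpio_type_name reports a directory and cpio_permissions yields 0775.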

c_uid, c_gid
The user ID and group ID of the file's owner.

c_nlink
The number of links to this file. For directories, the value of this field is always at least two.

c_rdev
Only for character and block special devices. The field contains
the associated device number. For all other file types, the value of
this field must be zero.

c_mtime
The time the file was last modified, as the number of seconds
since the start of the UNIX epoch. The 32-bit integer is stored as an array of two
16-bit integers: the most significant half first, then the least significant half.
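Reassembling the 32-bit value is a shift and an OR. This is a minimal sketch; the function name cpio_join32 is made up, and the two halves are assumed to have been read in the correct byte order already.

```c
#include <stdint.h>

/* Join two 16-bit halves into a 32-bit value:
 * element 0 holds the most significant 16 bits, element 1 the least. */
uint32_t cpio_join32(const uint16_t halves[2])
{
    return ((uint32_t)halves[0] << 16) | (uint32_t)halves[1];
}
```

For example, the halves 0x4e8c and 0x3109 combine to 1317810441 seconds.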

c_namesize
The length of the full path string, including the terminating NUL byte.

c_filesize
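The file size in bytes. Like c_mtime, the 32-bit value is stored as two 16-bit integers, most significant half first.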

The full path to the object is placed immediately after the header. If the length of the path string is odd (not a multiple of two), one more NUL byte is appended. The contents of the file follow. If the size of the contents is odd, it is likewise padded with a zero byte.
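The layout rule can be sketched as a size calculation, assuming the name and the body are each padded to an even length (which is what the sample dump in this article shows). The helper names and the CPIO_HEADER_SIZE constant are invented for this example.

```c
#include <stdint.h>

#define CPIO_HEADER_SIZE 26u  /* thirteen 16-bit fields */

/* Round a length up to the next even number. */
uint32_t cpio_pad_even(uint32_t n)
{
    return n + (n & 1u);
}

/* Hypothetical helper: total bytes occupied by one entry
 * (header + padded name + padded body). */
uint32_t cpio_entry_size(uint16_t namesize, uint32_t filesize)
{
    return CPIO_HEADER_SIZE + cpio_pad_even(namesize) + cpio_pad_even(filesize);
}
```

Adding the result to the offset of the current header gives the offset of the next one, which is how a reader steps through the archive until it reaches the trailer entry.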

Sample archive.

Now let's get out the microscope. Mine will be Bless. I can't say I really like this hex editor, but I have forgotten the name of the one I do like.

Create a simple directory:

 cpio_test
 |
 + test.txt
 |
 + testl.txt


Here testl.txt is a symbolic link to the test.txt file.
Contents of the test.txt file:
 Simple example of cpio usage. 

Then create an archive:
 $ find cpio_test | cpio -ov > example.cpio 

and open the resulting archive in your favorite hex editor.

My archive looks like this:
0000 | C7 71 09 08 9A 34 FD 41 F4 01 F4 01 02 00 00 00 | .q...4.A........
0010 | 8C 4E 09 31 0A 00 00 00 00 00 63 70 69 6F 5F 74 | .N.1......cpio_t
0020 | 65 73 74 00 C7 71 09 08 A2 34 B4 81 F4 01 F4 01 | est..q...4......
0030 | 01 00 00 00 8C 4E 09 31 13 00 00 00 1E 00 63 70 | .....N.1......cp
0040 | 69 6F 5F 74 65 73 74 2F 74 65 73 74 2E 74 78 74 | io_test/test.txt
0050 | 00 00 53 69 6D 70 6C 65 20 65 78 61 6D 70 6C 65 | ..Simple example
0060 | 20 6F 66 20 63 70 69 6F 20 75 73 61 67 65 2E 0A |  of cpio usage..
0070 | C7 71 09 08 9C 34 FF A1 F4 01 F4 01 01 00 00 00 | .q...4..........
0080 | 8C 4E 1A 2F 14 00 00 00 08 00 63 70 69 6F 5F 74 | .N./......cpio_t
0090 | 65 73 74 2F 74 65 73 74 6C 2E 74 78 74 00 74 65 | est/testl.txt.te
00A0 | 73 74 2E 74 78 74 C7 71 00 00 00 00 00 00 00 00 | st.txt.q........
00B0 | 00 00 01 00 00 00 00 00 00 00 0B 00 00 00 00 00 | ................
00C0 | 54 52 41 49 4C 45 52 21 21 21 00 00 00 00 00 00 | TRAILER!!!......


Well, let's make sense of it.

0x71c7 = 070707 - the start of the header. We can already tell that the archive was written in little-endian byte order.
0x0809 - c_dev, the number of the device on which the file resides.
0x349a - c_ino, the inode number. In this case the high-order bits were in fact lost.
0x41fd = 0040775 - c_mode. That is, the header describes a directory with 0775 permissions.
0x01f4 = 500 - c_uid.
0x01f4 = 500 - c_gid.
0x0002 - c_nlink. Every directory has at least two links (. and ..).
0x0000 - c_rdev.
0x4e8c and 0x3109 - the high and low 16-bit halves of the 32-bit modification time: 0x4e8c3109 = 1317810441.
0x000a - c_namesize, the length of the directory name including the terminating NUL.
0x00000000 - c_filesize. A directory has no body.
Next comes the name of the directory.
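The same walk can be done in code. The sketch below decodes the first 26 bytes of the dump, assuming a little-endian archive as the magic number indicates. The struct cpio_header, the helpers rd16le and rd32, and the function cpio_parse_header are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded header; the two-halves fields are already joined. */
struct cpio_header {
    uint16_t magic, dev, ino, mode, uid, gid, nlink, rdev;
    uint32_t mtime;
    uint16_t namesize;
    uint32_t filesize;
};

/* Read a 16-bit little-endian value. */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit value stored as two 16-bit halves, high half first. */
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)rd16le(p) << 16) | rd16le(p + 2);
}

/* Hypothetical helper: decode 26 raw header bytes.
 * Returns 0 on success, -1 if the magic number does not match. */
int cpio_parse_header(const uint8_t raw[26], struct cpio_header *h)
{
    assert(raw != NULL && h != NULL);
    h->magic = rd16le(raw);
    if (h->magic != 070707)
        return -1;
    h->dev      = rd16le(raw + 2);
    h->ino      = rd16le(raw + 4);
    h->mode     = rd16le(raw + 6);
    h->uid      = rd16le(raw + 8);
    h->gid      = rd16le(raw + 10);
    h->nlink    = rd16le(raw + 12);
    h->rdev     = rd16le(raw + 14);
    h->mtime    = rd32(raw + 16);
    h->namesize = rd16le(raw + 20);
    h->filesize = rd32(raw + 22);
    return 0;
}

/* First 26 bytes of the sample dump (the cpio_test directory entry). */
const uint8_t cpio_sample_dir[26] = {
    0xC7, 0x71, 0x09, 0x08, 0x9A, 0x34, 0xFD, 0x41,
    0xF4, 0x01, 0xF4, 0x01, 0x02, 0x00, 0x00, 0x00,
    0x8C, 0x4E, 0x09, 0x31, 0x0A, 0x00, 0x00, 0x00,
    0x00, 0x00
};
```

Calling cpio_parse_header(cpio_sample_dir, &h) fills h with the values listed above, including a mode of 040775, a modification time of 1317810441 and a name length of 10.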

The header of the next entry follows immediately. We will not go through it in detail and will only note the differences:
c_mode: 0x81b4 = 0100664 - a regular file with 0664 permissions.
c_filesize: 0x0000001e - the size of the file contents (30 bytes).
c_namesize: 0x0013 = 19 - an odd length, so one padding NUL follows the name.
Otherwise the entry is laid out just like the directory entry.

Next comes the symbolic link. The content of a symbolic link is the name of the file to which it points. Otherwise, both the metadata header and the file path are similar to the structures for a regular file.

That is how simply a CPIO archive is put together. In the future I would like to examine the gzip file format in the same manner. In particular, the initramfs used by the Linux kernel is created with the cpio + gzip combination.

I hope the article will be useful.

Related Links:
CPIO Utility Description
CPIO format description

Source: https://habr.com/ru/post/130092/

