CPIO is a rather old (1990) but very convenient archive format. It is quite simple and, perhaps for that reason, has become widespread. For example, it is used by RPM, by the Linux kernel's initramfs, and by Apple's "pax" installer archives.
This archive allows you to collect any number of files, directories and other objects of the file system (symbolic links, etc.) into a single stream of bytes.
Let's take a look at examples of the format of this archive.
Each file system object in such an archive consists of a header with basic metadata, followed by the full path to the object and the contents of the object. The header contains a set of integer values that largely follow the fields of the stat(2) structure of a file on *nix systems. The end of the archive is marked with a special entry (structured like the rest) with the name 'TRAILER!!!'.
File format.
At the moment, the most common CPIO entry format is the old binary one, so that is the format described here.
The header of each entry has the following structure:
struct header_old_cpio {
    unsigned short c_magic;
    unsigned short c_dev;
    unsigned short c_ino;
    unsigned short c_mode;
    unsigned short c_uid;
    unsigned short c_gid;
    unsigned short c_nlink;
    unsigned short c_rdev;
    unsigned short c_mtime[2];
    unsigned short c_namesize;
    unsigned short c_filesize[2];
};
Here it is assumed that the unsigned short type is 16 bits.
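The 16-bit assumption can be made explicit with the exact-width types from <stdint.h>. The sketch below is only an illustration: the names header_old_cpio_fixed and OLD_CPIO_HEADER_SIZE are made up here, and the size check assumes the compiler inserts no padding between uint16_t members, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdint.h>

/* Same layout as header_old_cpio, but with the 16-bit width spelled out.
   The struct name is hypothetical and used only for this illustration. */
struct header_old_cpio_fixed {
    uint16_t c_magic;
    uint16_t c_dev;
    uint16_t c_ino;
    uint16_t c_mode;
    uint16_t c_uid;
    uint16_t c_gid;
    uint16_t c_nlink;
    uint16_t c_rdev;
    uint16_t c_mtime[2];
    uint16_t c_namesize;
    uint16_t c_filesize[2];
};

/* 13 sixteen-bit words = 26 bytes in the archive. */
enum { OLD_CPIO_HEADER_SIZE = 26 };

/* Fails at compile time (C11) if the compiler inserted padding. */
_Static_assert(sizeof(struct header_old_cpio_fixed) == OLD_CPIO_HEADER_SIZE,
               "unexpected padding in header_old_cpio_fixed");
```

With such a definition, a header occupies exactly 26 bytes regardless of how wide unsigned short happens to be on the host.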
c_magic - The integer value 070707 (octal), or 0x71c7 (hexadecimal). Used to determine the byte order (little-endian vs big-endian).
c_dev, c_ino - The device and inode numbers from the disk. They match the values in the stat structure. If the inode number is greater than 65535, the high-order bits are lost.
c_mode - This field determines both the access rights and the type of the object:
0170000 | Mask for the file type bits |
0140000 | Socket |
0120000 | Symbolic link. For symbolic links, the body of the entry contains the path to the file the link refers to. |
0100000 | Regular file |
0060000 | Block special device |
0040000 | Directory |
0020000 | Character special device |
0010000 | Named pipe (FIFO) |
0004000 | SUID |
0002000 | SGID |
0001000 | Sticky bit |
0000777 | The lower 9 bits define the access rights to the object. |
c_uid, c_gid - The numeric user and group IDs of the file's owner.
c_nlink - The number of links to this file. For directories, the value of this field is always at least two.
c_rdev - Only for character and block special devices. The field contains the associated device number. For all other file types, the value of this field must be zero.
c_mtime - The time the file was last modified, as the number of seconds since the beginning of the UNIX epoch. The 32-bit integer is written as an array of two 16-bit integers: first the most significant 16 bits, then the least significant ones.
c_namesize - The length of the full path to the file, in bytes, including the terminating NUL.
c_filesize - The file size in bytes, stored as two 16-bit integers in the same way as c_mtime.
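The field descriptions above translate into a few small helpers. The following is a minimal sketch, assuming the header words have already been read into 16-bit values in host byte order; the function names are invented for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper names, used only for this illustration. */

/* Join a two-element field (c_mtime or c_filesize):
   element 0 holds the most significant 16 bits. */
static uint32_t cpio_join32(const uint16_t halves[2])
{
    return ((uint32_t)halves[0] << 16) | halves[1];
}

/* Map the file type bits of c_mode to a one-letter tag, ls-style. */
static char cpio_type_char(uint16_t mode)
{
    switch (mode & 0170000) {
    case 0140000: return 's';  /* socket */
    case 0120000: return 'l';  /* symbolic link */
    case 0100000: return '-';  /* regular file */
    case 0060000: return 'b';  /* block special device */
    case 0040000: return 'd';  /* directory */
    case 0020000: return 'c';  /* character special device */
    case 0010000: return 'p';  /* named pipe (FIFO) */
    default:      return '?';  /* unknown type */
    }
}

/* Extract the nine permission bits of c_mode. */
static unsigned cpio_permissions(uint16_t mode)
{
    return mode & 0000777;
}
```

For example, cpio_type_char(0040775) gives 'd' and cpio_permissions(0040775) gives 0775, while joining the halves 0x0001 and 0x0002 gives 0x00010002.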
Immediately after the header comes the full path to the object. If the length of the path string is not a multiple of two, one more NUL byte is added to the end. Then come the contents of the file. If the size of the contents is not a multiple of two, they are padded with a zero byte as well.
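The layout of one entry can be computed from c_namesize and c_filesize alone. The sketch below assumes the old binary format, where the header is 26 bytes and the name and the contents are each padded to an even length; the function names are made up for the example.

```c
#include <stdint.h>

enum { OLD_CPIO_HEADER_SIZE = 26 };  /* 13 sixteen-bit words */

/* Round n up to the next even number (hypothetical helper). */
static uint32_t cpio_pad_even(uint32_t n)
{
    return n + (n & 1u);
}

/* Offset of the contents, relative to the start of the entry's header. */
static uint32_t cpio_body_offset(uint16_t namesize)
{
    return OLD_CPIO_HEADER_SIZE + cpio_pad_even(namesize);
}

/* Total number of bytes the entry occupies, i.e. the distance
   from this header to the header of the next entry. */
static uint32_t cpio_entry_size(uint16_t namesize, uint32_t filesize)
{
    return cpio_body_offset(namesize) + cpio_pad_even(filesize);
}
```

For a name of 19 bytes (including the NUL) and 30 bytes of contents this gives 26 + 20 + 30 = 76 bytes.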
Sample archive.
Now let's get out the microscope. I'll use Bless as the microscope. I can't say I really like this hex editor, but I have forgotten the name of the one I do like.
Create a simple directory:
cpio_test
 |
 + test.txt
 |
 + testl.txt
Here testl.txt is a symbolic link to the test.txt file.
Contents of the test.txt file:
Simple example of cpio usage.
Then create an archive:
$ find cpio_test | cpio -ov > example.cpio
and open the resulting archive in your favorite hex editor.
My archive looks like this:
0000 | C7 71 09 08 9A 34 FD 41 F4 01 F4 01 02 00 00 00 | .q...4.A........
0010 | 8C 4E 09 31 0A 00 00 00 00 00 63 70 69 6F 5F 74 | .N.1......cpio_t
0020 | 65 73 74 00 C7 71 09 08 A2 34 B4 81 F4 01 F4 01 | est..q...4......
0030 | 01 00 00 00 8C 4E 09 31 13 00 00 00 1E 00 63 70 | .....N.1......cp
0040 | 69 6F 5F 74 65 73 74 2F 74 65 73 74 2E 74 78 74 | io_test/test.txt
0050 | 00 00 53 69 6D 70 6C 65 20 65 78 61 6D 70 6C 65 | ..Simple example
0060 | 20 6F 66 20 63 70 69 6F 20 75 73 61 67 65 2E 0A |  of cpio usage..
0070 | C7 71 09 08 9C 34 FF A1 F4 01 F4 01 01 00 00 00 | .q...4..........
0080 | 8C 4E 1A 2F 14 00 00 00 08 00 63 70 69 6F 5F 74 | .N./......cpio_t
0090 | 65 73 74 2F 74 65 73 74 6C 2E 74 78 74 00 74 65 | est/testl.txt.te
00A0 | 73 74 2E 74 78 74 C7 71 00 00 00 00 00 00 00 00 | st.txt.q........
00B0 | 00 00 01 00 00 00 00 00 00 00 0B 00 00 00 00 00 | ................
00C0 | 54 52 41 49 4C 45 52 21 21 21 00 00 00 00 00 00 | TRAILER!!!......
Well, let's work through it.
0x71c7 = 070707 - the beginning of the header. Since the bytes are stored as C7 71, we can already say that the archive was created with little-endian byte order.
0x0809 - this is c_dev, the number of the device on which the file is located.
0x349a - c_ino, the inode number. In this case the high-order bits were simply lost.
0x41fd = 0040775 - c_mode. That is, the header describes a directory with 0775 access rights.
0x01f4 = 500 - c_uid.
0x01f4 = 500 - c_gid.
0x0002 - c_nlink. Each directory has at least two links (. and ..).
0x0000 - c_rdev.
0x4e8c and 0x3109 are the high and low halves of the 32-bit file modification time, c_mtime. 0x4e8c3109 = 1317810441.
0x000a - c_namesize, the length of the directory name including the terminating NUL.
0x00000000 - c_filesize: the directory has no body.
Next comes the name of the directory.
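The manual decoding above can be reproduced in code. The sketch below copies the first 26 bytes of the dump and reads them as little-endian 16-bit words; the struct and function names are invented for the illustration, and a big-endian archive is deliberately not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded header fields (hypothetical struct, for illustration only). */
struct cpio_entry_info {
    uint16_t dev, ino, mode, uid, gid, nlink, rdev, namesize;
    uint32_t mtime, filesize;
};

/* Read the 16-bit word number i of a little-endian header. */
static uint16_t word_le(const unsigned char *p, size_t i)
{
    return (uint16_t)(p[2 * i] | (p[2 * i + 1] << 8));
}

/* Decode a 26-byte old-format header. Returns 0 on success,
   -1 if the magic number is not 070707 in little-endian order. */
static int cpio_decode_header(const unsigned char *p,
                              struct cpio_entry_info *out)
{
    if (word_le(p, 0) != 070707)
        return -1;
    out->dev      = word_le(p, 1);
    out->ino      = word_le(p, 2);
    out->mode     = word_le(p, 3);
    out->uid      = word_le(p, 4);
    out->gid      = word_le(p, 5);
    out->nlink    = word_le(p, 6);
    out->rdev     = word_le(p, 7);
    /* Two-word fields: most significant word first. */
    out->mtime    = ((uint32_t)word_le(p, 8) << 16) | word_le(p, 9);
    out->namesize = word_le(p, 10);
    out->filesize = ((uint32_t)word_le(p, 11) << 16) | word_le(p, 12);
    return 0;
}

/* First 26 bytes of the dump above: the header of cpio_test. */
static const unsigned char first_header[26] = {
    0xC7, 0x71, 0x09, 0x08, 0x9A, 0x34, 0xFD, 0x41, 0xF4, 0x01, 0xF4, 0x01,
    0x02, 0x00, 0x00, 0x00, 0x8C, 0x4E, 0x09, 0x31, 0x0A, 0x00, 0x00, 0x00,
    0x00, 0x00
};
```

Calling cpio_decode_header(first_header, &info) fills in mode 0040775, uid and gid 500, nlink 2, namesize 10, filesize 0 and a modification time of 1317810441.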
The header of the next entry follows immediately. We will not dwell on it in detail, just note some differences:
c_mode: 0x81b4 = 0100664 - indicates that this is a regular file with 0664 permissions.
0x0000001e - the size of the file contents.
Otherwise the entry does not differ from the directory's.
Next comes the symbolic link. The content of a symbolic link is the name of the file to which it points. Otherwise, both the metadata header and the file path are similar to the structures for a regular file.
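Putting the pieces together, the whole archive can be walked entry by entry until the TRAILER!!! record. This is a minimal sketch for a little-endian old-format archive held in memory; cpio_list, struct cpio_entry and the helpers are hypothetical names, and nothing here is taken from the cpio sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OLD_CPIO_HEADER_SIZE = 26 };   /* 13 sixteen-bit words */

/* One archive member (hypothetical struct, for illustration only).
   The pointers refer to the archive buffer itself; nothing is copied. */
struct cpio_entry {
    const char *name;
    const unsigned char *data;
    uint32_t filesize;
    uint16_t mode;
};

/* Read the 16-bit word number i of a little-endian header. */
static uint16_t word_le(const unsigned char *p, size_t i)
{
    return (uint16_t)(p[2 * i] | (p[2 * i + 1] << 8));
}

/* Round n up to an even number. */
static size_t pad_even(size_t n)
{
    return n + (n & 1u);
}

/* Stores up to max_entries members in out and returns how many members
   precede TRAILER!!!, or -1 if the data is malformed or truncated. */
static int cpio_list(const unsigned char *buf, size_t len,
                     struct cpio_entry *out, int max_entries)
{
    size_t pos = 0;
    int count = 0;

    while (pos + OLD_CPIO_HEADER_SIZE <= len) {
        const unsigned char *h = buf + pos;
        size_t namesize, name_off, body_off;
        uint32_t filesize;
        const char *name;

        if (word_le(h, 0) != 070707)
            return -1;                          /* bad magic */
        namesize = word_le(h, 10);
        filesize = ((uint32_t)word_le(h, 11) << 16) | word_le(h, 12);
        name_off = pos + OLD_CPIO_HEADER_SIZE;
        body_off = name_off + pad_even(namesize);

        if (namesize == 0 || name_off + namesize > len)
            return -1;                          /* name does not fit */
        name = (const char *)(buf + name_off);
        if (name[namesize - 1] != '\0')
            return -1;                          /* name not NUL-terminated */
        if (strcmp(name, "TRAILER!!!") == 0)
            return count;                       /* end-of-archive marker */
        if (body_off > len || filesize > len - body_off)
            return -1;                          /* contents do not fit */

        if (count < max_entries) {
            out[count].name = name;
            out[count].data = buf + body_off;
            out[count].filesize = filesize;
            out[count].mode = word_le(h, 3);
        }
        count++;
        pos = body_off + pad_even(filesize);    /* skip to the next header */
    }
    return -1;                                  /* no trailer found */
}
```

Applied to the 208 bytes of the dump above, cpio_list reports three entries: cpio_test, cpio_test/test.txt and cpio_test/testl.txt, the last one with the 8-byte body "test.txt".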
That is how simply a CPIO archive is put together. In the future, I would like to look at the format of files created by gzip in a similar manner. In particular, the initramfs used by the Linux kernel is created using the cpio + gzip combination.
I hope the article will be useful.
Related Links:
CPIO utility description
CPIO format description