Reflecting on software hell and other troubles with modern operating systems, I came across an interesting article about bytecode and the unexpected places where it is used. I think the article echoes those thoughts, so I decided to translate it and post it here. This is my first translation on Habr, so please do not judge it too harshly. If you have comments on the translation, corrections, and so on, please send me a private message.
What is the most used bytecode in the world? Java (JVM bytecode)? .NET (CLI)? Flash (AVM1 / AVM2)? No. There are several bytecodes that you use every day simply by turning on a computer, a tablet, or even a phone. You do not even need to launch an application or visit a web page.
ACPI
Perhaps the most widely used of all is ACPI bytecode. The ACPI (Advanced Configuration and Power Interface) specification is a giant document of nearly 1,000 pages. And yes, operating systems are expected to follow the specification and implement it. All of it. The part related to bytecode can be found in Chapter 20, "ACPI Machine Language", which describes a register virtual machine with a typical set of commands such as Add, Subtract, Multiply, Divide, and comparison operations, as well as such wonderful commands as ToHexString and Mid (essentially a substring operation). As you read further, you will encounter a full object model, a specification of system properties, and an asynchronous notification mechanism that is triggered when any of the system properties change.
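To make the idea of such a command set more concrete, here is a minimal sketch of a toy register machine that supports a few operations named after the commands mentioned above. The tuple-based instruction format, the `$`-prefixed register names, the `Store` operation, and the exact semantics of `ToHexString` and `Mid` are all assumptions made for illustration; this is not the real AML encoding and not how ACPICA works.

```python
# Toy register machine. Instruction format (hypothetical, not real AML):
#   (opcode, destination_register, *operands)
# Operands that are strings starting with "$" are read from registers;
# everything else is treated as a literal value.

def run(program):
    """Execute the program and return the final register contents."""
    regs = {}

    def value(operand):
        if isinstance(operand, str) and operand.startswith("$"):
            return regs[operand]
        return operand

    for opcode, dest, *operands in program:
        args = [value(o) for o in operands]
        if opcode == "Store":
            regs[dest] = args[0]
        elif opcode == "Add":
            regs[dest] = args[0] + args[1]
        elif opcode == "Subtract":
            regs[dest] = args[0] - args[1]
        elif opcode == "Multiply":
            regs[dest] = args[0] * args[1]
        elif opcode == "Divide":
            regs[dest] = args[0] // args[1]  # integer division
        elif opcode == "ToHexString":
            # Assumed semantics: uppercase hex digits without a prefix.
            regs[dest] = format(args[0], "X")
        elif opcode == "Mid":
            # Assumed semantics: substring of `length` characters from `index`.
            source, index, length = args
            regs[dest] = source[index:index + length]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return regs


program = [
    ("Store", "$a", 6),
    ("Multiply", "$b", "$a", 7),
    ("Add", "$c", "$b", 213),
    ("ToHexString", "$hex", "$c"),
    ("Mid", "$mid", "ACPICA", 0, 4),
]
result = run(program)
print(result["$hex"], result["$mid"])
```

Even this toy version needs typed values (integers and strings) and named storage, which hints at why a complete implementation with an object model and notifications grows so large.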
Of course, most devices must follow the ACPI specification completely, so the specification is fully implemented at the kernel level of operating systems; see, for example, the implementation in Linux. All of this code is executed at an early stage of booting the operating system. The complexity of implementing and executing it all is comparable to a full implementation of JavaScript and its runtime environment. Because the ACPI specification is so complex, Intel created a platform-independent implementation of it, ACPICA, and it is this implementation that is used in the Linux and BSD kernels (including Mac OS), as well as in systems such as ReactOS and HaikuOS. I do not know whether Windows uses this implementation, but since Microsoft's name appears in the specification, I think their implementation was created long before ACPICA.
Fonts
Let's continue. Do you want a graphical bootloader? Simply to display a font in the OpenType format (and only OpenType fonts with CFF glyphs; the complexity of the OpenType format is a topic for a separate conversation), you need to parse data in the Type 2 Glyph format, which also involves executing specialized bytecodes for building glyphs. This bytecode is even more interesting: it is a real stack-based interpreter, and it even has a "random" command that lets you build glyphs whose shape varies randomly at render time. I cannot think of a single useful application for this feature, but it is implemented in FreeType as well, so I can only assume that it is actually used somewhere. The interpreter of this bytecode is known for its stack overflow vulnerability, which made it possible to jailbreak an iPhone with a specially crafted PDF file.
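The stack-based style described above can be sketched in a few lines. The interpreter below is a heavily simplified illustration: it works on a list of Python tokens rather than the binary encoding of a real font, it supports only four operators (`rmoveto`, `rlineto`, `random`, `endchar`), and the function name `run_charstring` is hypothetical. Numbers are pushed onto a stack, and each operator consumes what is on the stack.

```python
import random


def run_charstring(tokens, rng=None):
    """Interpret a token list and return the outline as a list of subpaths.

    Each subpath is a list of (x, y) points. This is a toy model of a
    stack-based glyph language, not a real Type 2 implementation.
    """
    rng = rng or random.Random()
    stack = []
    x, y = 0, 0
    paths = []
    for token in tokens:
        if isinstance(token, (int, float)):
            stack.append(token)  # operands are simply pushed
        elif token == "random":
            # Push a pseudo-random number; the toy uses the range (0, 1].
            stack.append(1.0 - rng.random())
        elif token == "rmoveto":
            # Relative move: pops dy, then dx, and starts a new subpath.
            dy = stack.pop()
            dx = stack.pop()
            x, y = x + dx, y + dy
            paths.append([(x, y)])
            stack.clear()
        elif token == "rlineto":
            # Relative lines: consumes every (dx, dy) pair on the stack.
            if not paths or len(stack) % 2 != 0:
                raise ValueError("rlineto needs a current point and dx/dy pairs")
            for i in range(0, len(stack), 2):
                x, y = x + stack[i], y + stack[i + 1]
                paths[-1].append((x, y))
            stack.clear()
        elif token == "endchar":
            break
        else:
            raise ValueError(f"unknown operator: {token}")
    return paths


outline = run_charstring([10, 20, "rmoveto", 5, 0, 0, 5, "rlineto", "endchar"])
print(outline)
```

Note that the sketch checks the stack before consuming operands in `rlineto`; an interpreter that trusts the font data and skips such checks is exactly where stack-related vulnerabilities come from.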
The glyph description language is a simplified version of the PostScript language. PostScript implements a full Turing-complete stack-based virtual machine built on ideas from the Forth language. The drawbacks of this system (infinite loops, and the fact that the whole document has to be interpreted even if only one page needs to be displayed, all because of the complexity of the document's internal state) were the main reason the PDF format was created: it is based on PostScript, but has no global document state and does not allow operations in arbitrary order. In this model it is, for example, fairly easy to check which image was added to a document without doing any extra work.
And of course, because fonts are complex, the OpenType specification is complex too: it also incorporates the entire TrueType specification, which describes a bytecode model for rendering fonts so that they look the same at any screen resolution. I will not go into the weeds of these specifications. Here is the implementation in FreeType. There is little of interest going on there, but it seems that a vulnerability was once found there too.
For you to see this article on the screen, thousands of these small programs are executed to produce the correct glyph shapes for every font in use.
Packet filtering
Next, if you want to capture network packets with tcpdump or libpcap (or with tools built on them, such as Wireshark), you will be using the Berkeley Packet Filter, which is based on a register bytecode. Performance was critical for people debugging networks, so a simple JIT compiler was also implemented in the Linux kernel.
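As a rough illustration of what such a register bytecode looks like, here is a minimal sketch of an interpreter for a classic-BPF-style filter. It is an assumption-laden toy: instructions are symbolic Python tuples `(op, k, jt, jf)` rather than the binary format used by the kernel, only three instructions (`ldh`, `jeq`, `ret`) are supported, and the function name `run_filter` is hypothetical. Real BPF also has an index register and scratch memory, which are omitted here.

```python
import struct


def run_filter(program, packet):
    """Run a toy filter; return how many bytes to accept (0 means drop)."""
    a = 0   # accumulator register
    pc = 0  # program counter
    while pc < len(program):
        op, k, jt, jf = program[pc]
        pc += 1
        if op == "ldh":
            # Load a 16-bit big-endian half-word from absolute offset k.
            if k + 2 > len(packet):
                return 0  # toy behaviour: reject on out-of-bounds load
            a = struct.unpack_from("!H", packet, k)[0]
        elif op == "jeq":
            # Conditional jump: skip jt instructions if equal, jf otherwise.
            pc += jt if a == k else jf
        elif op == "ret":
            return k
        else:
            raise ValueError(f"unknown instruction: {op}")
    return 0


# Filter that accepts only IPv4 frames: the EtherType field sits at
# offset 12 of an Ethernet header, and 0x0800 identifies IPv4.
ACCEPT_IPV4 = [
    ("ldh", 12, 0, 0),
    ("jeq", 0x0800, 0, 1),
    ("ret", 65535, 0, 0),  # accept up to 65535 bytes
    ("ret", 0, 0, 0),      # drop
]

ipv4_frame = bytes(12) + b"\x08\x00" + b"payload"
arp_frame = bytes(12) + b"\x08\x06" + b"payload"
print(run_filter(ACCEPT_IPV4, ipv4_frame), run_filter(ACCEPT_IPV4, arp_frame))
```

Because jumps only move forward and every program ends in a return, a filter of this shape always terminates, which is part of what makes it reasonable to run inside a kernel.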
For those interested in history: an earlier implementation of BPF was part of the code involved in the SCO lawsuits against Linux, although BPF actually came from BSD 4.3 and was simply copied into the Linux kernel. BPF was later replaced by a newer implementation known as Linux Socket Filter.
Well, why all this?
Bytecode is very tempting as a universal and flexible solution, but it does not eliminate complexity and does not solve many potential problems, such as overall device security (a full iPhone jailbreak because of a stack overflow) and insanely complex requirements for implementing the specifications, so complex that there is currently only one implementation of the ACPI mechanism, which is used in most of today's operating systems.
The four examples reviewed also show something interesting: the very different situations in which bytecode can be used. In the case of ACPI, it is an interesting look at what seems to me an ugly implementation of an originally declarative specification that is now in complete disarray. The font languages Type 2 Glyph and TrueType Hinting are basic stack interpreters that show their PostScript heritage. And BPF is a register-based interpreter with a rather strange register language that allows only fairly simple operations.
We should also note that all of the implementations above have had security problems, because implementing a bytecode interpreter is not such an easy task. And finally, a question for hackers: do you know of any other esoteric bytecode specifications? And a question for the authors of specifications: do you really need such flexibility?
From the translator
Interesting links from the comments on the original article:
UEFI bytecode (see section 20)
SQLite bytecode
RAR bytecode