Unexpected Turing fullness everywhere

A catalog of software constructs, languages and APIs that are unexpectedly Turing complete; implications for safety and reliability. Application: how many computers in your computer?

Any rather complex C or Fortran program contains a rewritten, unspecified, buggy, and slow implementation of half of Common Lisp . - The tenth rule of Greenspan

Turing completeness (TC) is a property of the system, with some simple representation of input and output, to implement any computable function.

Turing completeness is a fundamental concept in computer science. It helps to answer many key questions, for example, why it is impossible to create an ideal anti-virus program. But at the same time, it is amazingly common . It would seem that a computer system is difficult to achieve such universality to perform any program, but it turns out the opposite: it is difficult to write a useful system that does not immediately turn into full Turing. It turns out that even a small control over the input data and converting them into a result, as a rule, allows you to create a turing-complete system. It can be fun, useful (although not usually ), harmful or extremely unsafe, and a real gift for a hacker (see about “language safety” , which studies methods of hacking “strange machines” ¹ ). Amazing examples of such behavior remind us that Turing completeness lurks everywhere, and protecting the system is extremely difficult.

“Too powerful” programming languages can also trigger unpleasant DoS attacks. Afl fazzer found such a roff in OpenBSD that it is capable of generating an infinite loop , abusing some string substitution rules.
')
Probably, these unexpected examples of turing-complete systems are best viewed as a subset of the “discovered” or “found” esoteric programming languages . So an extraordinarily minimalist in its essence FRACTRAN is not considered ² , as well as the specially obfuscated Malbolge language (where writing a trivial program will take years), because it is a specially designed esoteric PL. Also, “Life” is not part of our subset, because the questions about turing completeness appeared immediately after its release, and its recognition by Turing was not a surprise. And given the complexity of networks with routing and packet switching, it is not surprising that you can build a cellular automaton or program logic circuits on these networks, and planning / validating air tickets is not only NP-hard and even EXPSPACE-hard, but completely unsolvable (due to complex airline regulations).

Many configurations, special languages, tools, or complex games, as it turns out, violate the rule of least power and “accidentally become Turing-complete” , like MediaWiki templates , sed or multiple regexp / find-replace commands in an editor. In general, any form of replacing strings or templating, or compiling on the fly with high probability is a turing-complete system itself or when repeated, as they often support lambda calculus or rewriting of terms of a language or a label, for example, esoteric languages " /// " or thue .

XSLT , Infinite Minesweeper , Dwarf Fortress ³ , Starcraft, Minecraft , Ant , Transport Tycoon , C ++ templates and Java generalizations , DNA calculations and so on - all of these are Turing-complete systems, and this is also not surprising. Many games support scripts to simplify development and custom mods. Therefore, to make the game turing-complete elementary: it is enough to include the syntax to call more famous languages, such as Perl.

Turing completeness may simply be a little-known part of the standard format. Probably, in our time many people do not know that TrueType and many fonts are PostScript programs on stack machines, similar to ELF metadata and DWARF debugging information . Or that some music formats go beyond MIDI , support scripts, and need interpretation. If you know about the turing of the fullness of fonts, it is no longer surprising that the Turing documents are full of TeX, which naturally causes many serious and interesting security vulnerabilities in fonts and media, such as BLEND or Linux exploits SNES and NES . In other formats like PDF, just a terrible amount of vulnerabilities ⁴ . Again, outstanding achievements such as the creation of a small Turing machine from Lego or Domino ⁵ cubes are not considered, since we have long known how mechanical computers work.

On the other hand, the direction of computer security research called “weird machines” often reveals truly astounding turing-complete systems. Moreover, in different people they cause surprise in varying degrees: one seems unusual that others are not surprised.

Peano arithmetic : the addition and multiplication of natural numbers is sufficient for Turing completeness. In contrast, the Presburger arithmetic does not have multiplication and, therefore, is not Turing complete.
Van tiles : multicolored squares, the placement of which is determined by the rule that the adjacent sides of two tiles must be of the same color (historically understandable to Van, but the system surprised me, and probably many other people).
x86 frauds:
- MMU shuffles RAM to make programming easier. If the program correctly assigns addresses in memory in a special way, then it will be able to perform arbitrary calculations on the MMU using page-faults exceptions ( comments ; scientific work ) without running the code at all. The MMU exception mechanism turns into a computer with one instruction .
- mov is a Turing complete system : movable at first glance, mov instruction that transfers data between the CPU and RAM, allows you to implement a computer with a single instruction on the TTA trigger-transport architecture . On this computer, you can play Doom (as a bonus: and on the instructions xor too).
- "X86 is a turing-complete set without registers."
“Return-in- libc attacks”: software libraries provide pre-packaged functions, each of which is designed to do one useful thing. Of the calls to these functions, you can create a turing-complete "language" that can bypass security mechanisms, because the attacker does not run his own recognizable code. For many other examples, see “Geometry of innocent flesh on bone: return-into-libc without function calls (on x86)” and “On the expressiveness of return-into-lib attacks” .
Pokemon Yellow : " Pokemon Yellow 's full control hack" describes a memory damage attack that allows you to create arbitrary programs in the Game Boy assembler by walking back and forth and buying items in the game. There are similar achievements by speed-run fans (high-speed passing), but I usually ignore them as “unclean”: for example, you can turn Super Mario World on SNES into an arbitrary game like Snake or Pong , but new programs need to be loaded into additional equipment . In my opinion, this does not allow calling Super Mario World “unexpectedly” a turing-complete system and differs from other examples in this article. For example, you can exit from Super Game Boy to SNES and to arbitrary code, such as IRC . This is a controversial difference.
- A similar memory corruption problem occurs in printf from POSIX, in the %n option, as in other C library functions ( Karlini et al., 2015 ). Hence the « printbf -Brainfuck in printf .
- The StarCraft community exploited a buffer overflow in the game to implement complex maps, defense games, Mario games, and level editors for it. Hacking emulation to protect mods in upgraded versions of SC caused Blizzard a lot of problems .
Braid game is turing-complete
Musical notation with instructions for transferring subsequent notes becomes the esoteric language of Choon .
The heart muscle cells (cardiomyocyte) interact in such a way that they allow programming through logic gates, therefore, they represent a turing-complete system (perhaps this is not too surprising, because cellular automata are created on a biological example)
One category of strange machines is considered not quite turing-complete, since the user must click a mechanical switch or make the only possible choice for the system to go to the next step. In this case, the user does not introduce any logic and does not perform calculations, therefore this category does not fully satisfy the definition of turing-complete systems:
- Magic: the Gathering : this is a turing-complete system , based on the assumption that players mechanically agree to the proposed option, but otherwise all actions are subject to the rules of the game
- CSS is designed as a declarative markup language to customize the visual appearance of HTML pages, but on CSS declarations you can encode an elementary cellular automaton Rule 110 , which changes state by mechanical mouse clicks in the browser
- Microsoft PowerPoint animations (excluding macros, VBScript, etc.) with special connections can implement a Turing machine ( Wildenhain, 2017 : video ; PPT ) if the user clicks on active animation triggers

The following systems may be accidentally turing-complete:

CSS without mouse clicks
SVG: PostScript is a TC design, but what about the more modern SVG vector graphic format, which is written in XML, that is, in the language of documents, which is (usually) not turing-complete? It seems that in combination with XSLT it may still be, but I have not found any evidence or demonstration of this in the usual context of a web browser. The SVG standard is great and sometimes horrifying: an unsuccessful version of the SVG 1.2 standard tried to add the ability to open network sockets to SVG images.
Unicode : Nicholas Seriot suggests that Unicode bidirectional algorithms (designed to display right-to-left scripts, such as Arabic or Hebrew) can be quite complex to support the tag system through case- sensitive rules (for example, Turkish)

application

How many computers are there in your computer?

Some get bogged down in disputes about weird machines or about how “big” an AI agent will become: one such, two, ten or millions will be created. It doesn't matter, since this is just an organizational question. In fact, the inputs and outputs of the system are important: how efficient is the system as a whole, and what resources does it consume? No one cares if Google runs on 50 supercomputers, 50,000 mainframes, 5 million servers, 50 million embedded / mobile processors, or a combination of all of the above . It doesn't matter that Google uses a variety of chips: from self-made “tensor processors” to unique silicon processors (Intel implements them on chips for Xeon processors for a number of major customers), FPGA, GPU, CPU to even more exotic equipment like D-Wave quantum computers. It is only important that it remain competitive and can provide services for a reasonable fee. In the end, today, a supercomputer usually looks like a large number of servers in racks with a huge amount of GPUs and unusually high-speed InfiniBand connections. That is, the supercomputer is not so different from the data center, as you might think. Any of the listed equipment can support numerous strange machines, depending on its internal dynamics and connectivity.

Similarly, any AI system can be implemented as a single giant neural network or a multitude of individual neural networks operating asynchronously, or as a heterogeneous set of microservices, or as a “society of reason”, and so on. All this is not particularly important. In terms of complexity or risk, it’s not so important how the system is organized as long as it works. The system can be seen on many levels, each of which is equally invalid in and of itself, but useful for different purposes in a common system.

Here is an example of a badly defined question: how many computers are in your pockets and on the table now? How many computers are there in your “computer”? Think only one? Let's take a closer look.

It's not just about the CPU: nowadays, transistors and processor cores are so cheap that now it often makes sense to allocate separate cores for real-time tasks, for increasing performance, for security, to avoid loading the main OS, for compatibility with old architecture or existing software package. Just because a DSP or core is faster to program than to create a specialized ASIC, or because this is the simplest possible solution. In addition, many of these components can be used as computational elements, even if they are not designed or hide this functionality altogether.

So:

In an ordinary Intel processor, billions of transistors perform many tasks:
- Each of the 2–8 main processor cores is capable of operating independently, turning on and off as needed, has its own cache (larger than RAM in most computers until recently), and should be considered as an independent computer.
- The CPU as a whole is reprogrammed through microcode, for example, to eliminate chip design errors, and flaunts increasingly opaque objects, such as the Intel Management Engine (with JVM for programming ; Ruan 2014 and SGX ) or the Platform Security Processor (PSP) from AMD, or Android TEE . These hardware modules, as a rule, are full-fledged computers in their own right, operate independently of the host and can interfere with its operation.
- Any FPU can become a turing-complete system through encoding in a floating point operation in the spirit of FRACTRAN.
The MMU can be programmed into a strange page-fault machine, as mentioned above.
DSP blocks, custom chips. Probably, ASIC for video formats like h.264 will not be turing-complete systems (despite the support for complex deltas and compression methods that can allow for something like Van tiles). But Apple's A9 mobile SoC goes far beyond the usual dual-core ARM processor with an integrated GPU. Like Intel or AMD desktop chips, it includes the Secure Enclave secure environment (physically dedicated processor cores), but also contains an image co-processor, voice recognition co-processor (partly to support Siri), and apparently several other cores. These ASICs sometimes exist for AI tasks and, apparently, specialize in matrix multiplications for neural networks, and since the recurrent neural networks are Turing-complete, then ... you understand. Motorola, Qualcomm and other companies also rushed to expand their SoC.
Motherboard BIOS and / or control chips with network access.
- Mark Ermolov notes:
  
  "It's amazing how many disparate processor cores are integrated into the Intel Silvermont Moorefield SoC (ANN): x86, ARC, LMT, 8051, Audio DSP, each on its own firmware and with support for the JTAG interface
These control or debug chips can “accidentally” remain activated on after sales devices, like the ARM built into the Via C3 CPU .
There are several hundreds or thousands of simple cores in a GPU, each of which works very well with neural networks or performs general-purpose calculations (albeit slower than a processor).
Tape drives, hard drives, flash drives, and SSDs typically run on ARM processors to run built-in utilities for tasks such as hiding bad sectors from the operating system. They can be hacked. But ARM processors are used in most embedded applications, so ARM likes to boast that “a modern smartphone contains from 8 to 14 ARM processors, one of which is an application processor (running Android or iOS), and the other is a processor for a frequency band stack (baseband stack) . ”
Network chips perform independent processing for DMA (thanks to this independence, functions like Wake-on-LAN for netboot work ).
smartphones: in addition to all the mentioned blocks, there is also an independent baseband processor that runs under its own real-time OS for processing communications with cell towers / GPS / etc. Or even more than one, if you use virtualization like L4 . Backdoors have already been detected in baseband processors, in addition to other vulnerabilities.
SIM cards for smartphones are much more than just memory cards that record your subscriber data. These are smart cards that can independently launch Java Card applications (probably NFC chips, too). It is like a JVM in IME. Naturally, SIM cards can be hacked and used for surveillance, etc.
Devices connected via USB or to the motherboard can be equipped with their own processors. For example, WiFi adapters, keyboards, mice, and others. Theoretically, most of them are isolated from direct interference with the host through DMA and IOMMU, but the devil is in the details ...
random strange chips like the MacBook Touch running WatchOS .
...

So in a regular smartphone or desktop computer there will be from fifteen to several thousand computers in the sense of turing-full devices. Each of them can be programmed, it has enough power to run many programs and can be used by an attacker to monitor, exfiltrate, or attack the rest of the system.

There is nothing unusual in the historical context, because even the very first mainframes usually included several computers where the main computer performs batch processing, and auxiliary computers provide high-speed I / O operations that would otherwise interfere with the main machine with their own interrupts.

In practice, apart from the information security community (since all these computers are insecure and therefore useful for the NSA and virus writers), all other users do not care that insanely complex systems are hidden under the hood of our computers, which are more accurately regarded as a colorful menagerie from hundreds computers, awkwardly connected with each other (it is not clear, “network is a computer” or “computer is a network” ...?). The user perceives and uses this as one computer.

1. The active area of research is the creation of languages and systems that are carefully designed and are not guaranteed to be turing-complete (for example, totally functional programming ). Why make so much effort to create a language in which it is impossible to write many programs? The fact is that Turing completeness is closely related to Godel's incompleteness theorem and Rice's theorem.. Therefore, if we allow TC, then we lose all possible provability properties. On the contrary, different useful things are easily proved in a non-Turing language: for example, that a program is complete, type safe or not, that it can be easily converted into a logical theorem, that it consumes a limited amount of resources, that the implementation of the protocol is true or equivalent to another implementation. It is easy to prove that there are no side effects and that the program can be converted into a logically equivalent, but faster option (this is especially important for declarative languages like SQL, where the ability of the optimizer to convert queries is the key to acceptable performance. Although, of course, you can do amazing things with SQL, such asgradient descent for machine learning models , and some SQL extensions make it turing-complete anyway, allowing you to either encode a looping system of tags , or modelDSL , or call PL / SQL , etc.

Here is some literature about strange machines:

"Programming exploits: from buffer overflows to strange machines and the theory of computation," Bratus et al., 2011
"The problem of stopping in the security of the network stack" , Sassamen et al., 2011
“The Strange Page-Fault Machine: Computing Lessons Without Instructions,” Bangert et al., 2013
“Strange cars in ELF: focus on undervalued metadata” , Shapiro et al 2013
“Interrupt-oriented programming of bugdors: a minimalist approach to the introduction of bugdors into embedded systems firmware” , Tan et al., 2014
“Strange machines in evidence” , Vaneg, 2014
"Frame alignment signals - return to portable shellcode" , Bosman & Bos, 2014

↑

2. Although linear neural networks exploit floating-point mode with rounding to zero to encode potentially turing-complete behavior (for RNN), but this is not noticeable in normal operation, which is also a random turing-complete behavior and a good example of a secure language. ↑

3. Dwarf Fortress gives clockwork, so Turing completeness is not surprising. But water is implemented as a simple cellular automaton, so there are even more ways to get turing completeness! Now the game wiki names four potential ways to create logic gates: fluids, clockwork mechanisms, mine carts, and creature / animal logic gates with doors and pressure sensors. ↑

4. The full specification of PDF is solely inflated. For example, in a simple PDF viewer that supports enough PDF specifications, like Google Chrome’s browser, you can play Breakout (because PDF includes its own strange subset of JavaScript). The official Adobe PDF viewer supports up to 3D CAD functionality. ↑

5. See the domino logic gates on the Think Math and the 4-bit adder domino demo . ↑

Source: https://habr.com/ru/post/429602/

All Articles

Unexpected Turing fullness everywhere

see also

Links

application

How many computers are there in your computer?

More articles: