Transponder DST40: principle of operation, the history of the appearance and hacking, as well as some practice on brute force

Frame from the movie 'Beware of the car'

Long ago, back in the nineties of the last century, the automotive market, which is gaining momentum, was in dire need of the emergence of serious anti-theft systems (hereinafter referred to as immobilizers). In those times, there were no special obstacles for auto theftrs to prevent the engine from starting with a mechanical copy of a key or even completely without a key — simply by closing the wires. Needed immobilizers that can significantly complicate the process of starting the engine and further carjacking a car without a native ignition key.

It was then that the idea of creating a compact radio module (hereinafter referred to as a transponder), embedded directly into the ignition key of the car, came to light. In the car was installed immobilizer, communicating with the transponder over the air. The immobilizer sent a request to the transponder, and the transponder answered with a certain code, without which the immobilizer did not allow to start the engine. However, at first transponders were still quite primitive, relatively easily cloned devices. It was enough to have a radio interceptor and a bright head on the shoulders to understand the exchange algorithm and simulate the transponder response. A radical change in the immobilizer's communication algorithm with the transponder was required.

Today I will tell you about the history of the emergence and subsequent hacking of one of these algorithms, as well as tell you about the practical intricacies of the brute-force secret encryption process.
')
Further in the text, all the pictures will be clickable, so that if desired they can be considered in detail.

Part One: Indian Joe

So, demand creates supply: gradually, systems began to emerge on the market that used encryption in the process of transmitting data over the air. These systems, in fact, carried out the process of wireless identification of the key holder. In this case, the secret key stored in the transponder was not broadcast in any form, but was used for cryptographic “signing” of the request received from the immobilizer. One of such systems was developed by engineers of Texas Instruments corporation. The transponder developed by them was called Digital Signature Transponder (abbreviated DST).

The DST transponder turned out to be very small, which made it possible to integrate it into various compact tokens without any problems: for example, in car ignition keys or key fobs for keys. In the above photo, in the handle near the blade, you can see the hole closed by a plug, through which the transponder is placed inside the key. And the use of a hashing scheme in its design made the radio sniffing process completely useless (for the time being, but about that later), because so different data blocks were transmitted over the air that it was logical to trace at least some dependence in them .

The transponder consists of the following main components:

Antenna and transceiver (hereinafter referred to as a transceiver): designed to power up the transponder and to communicate with the base station via radio.
Encryption scheme: designed to hash the request received from the base station.
Non-volatile rewritable memory: designed to store the encryption key and some additional parameters (for example, the transponder serial number and manufacturer's ID).

A transponder is also programmed over the radio channel - without a direct connection to the programmer. You can overwrite almost any information, but only if the write protection bits are not set. Recorded transponder is easy to link to the base station without the help of any additional devices (the prerequisite is to match the manufacturer's ID in the transponder and in the base station).

The algorithm of the entire wireless authentication system is:

The base station transmits the unmodulated radio signal of such power and duration that it is enough to power the entire electronic transponder circuit that is in the range of the transceiver of the base station.
The base station, using a random number generator, generates a 40-bit request, remembers it and broadcasts it using the amplitude modulation of the radio signal.
The transceiver still emits a non-modulated radio signal for some time - so that the transponder has enough power to perform computational operations.
The transponder, having received a request from the base station, performs its DST40 hashing using a 40-bit encryption key stored in its non-volatile memory: the result is the so-called "signature".
The transponder transmits the digital signature and service information to the base station.
The base station performs hashing of the request just transmitted to the transponder using the same encryption key that is stored in it: if the result matches the one received from the transponder, then the authentication is considered successful. If this base station is used as a car immobilizer, then after successful authentication it transmits the engine start resolution to the car’s central computer.

It is not entirely clear what the considerations of development engineers were based on, but the fact remains: the length of the encryption key used in the hashing process is only 40 bits (by the way, why the algorithm was called DST40). Such a length of the encryption key, from the modern point of view, is completely inadequate to ensure any decent security. But at the time, apparently, development engineers did not seem so. In addition, the DST40 algorithm is purely proprietary and is disclosed to manufacturers only under strict non-disclosure subscriptions.

In the future, manufacturers' confidence in the security of the transponder continued to grow, as the algorithm remained unbroken for a long time.

As a result, DST40 transponders have become extremely popular. They were adopted by a number of large automobile corporations (for example, Toyota, Ford, Lincoln, Nissan, and others). Many millions of cars whose immobilizers use DST40 transponders gradually flooded not only the US markets, but also the markets of other countries actively importing cars from the USA. Moreover, these transponders also began to be used in the SpeedPass contactless payment system , which is rapidly conquering various fast foods, supermarkets, restaurants and gas stations in the United States.

Part Two: And the sound of thunder!

Everyone has long known that Indian Joe remains elusive only until no one needs him. So in this story, the DST40 algorithm remained unbreakable only until it was taken by young, energetic guys.

It happened in 2004. By that time, Texas Instruments 'engineers' confidence in the robustness of the DST40 algorithm had become so great that they were bursting with pride and the desire to share their achievements with anyone. And they decided to send one of the employees of the German division of the company - Dr. Ulrich Kaiser - to the fourth conference on AES with a small overview report on DST40. This report was the beginning of the end of the Indian Joe.

The fact is that this conference was attended by an expert in cryptography, a lecturer at the University of Information Security, Johns Hopkins (USA), Professor Evi Rubin ( Aviel D. Rubin ). One glance at the general scheme of the algorithm was enough for him to see in it serious holes in security. This is how the scheme looked like:

Despite the fact that the scheme was very general and it lacked many subtle details, the experienced cryptographer’s keen eye immediately caught on several vulnerabilities: first, it was clear that for each clock cycle the encryption key and request / response registers were minimally modified - just one bit. Secondly, the presence of a “weak” encryption key consisting of only zeros was obvious - in the process of hashing it will remain zero until the very end. This opened up the possibility of conducting various cryptanalytic experiments over the transponder, which could reveal its internal structure. And, thirdly, the key length was only 40 bits, which by the standards of 2004 was absolutely not enough to withstand the brute force performed with the help of hardware.

Of course, Evie understood that a large company, which occupies a significant segment in the production of such devices, cannot be convinced of the weakness and vulnerability of the algorithm simply with words. It was then that he came up with the idea of hacking the DST40 into practice - which would be the most irrefutable argument. First of all, he decided to assemble a team of several university students. He chose the most energetic and capable guys, whom he suggested doing this business: to delve into the algorithm, and at the same time to improve theoretical knowledge and practical skills on cryptography and cryptanalysis. So the team came into being, which included (in the above photo in order from left to right) Adam Stubblefield , Aviel D. Rubin , Stephen C. Bono and Matthew Green .

The next step was to purchase TI Series 2000 developer kit - LF RFID from Texas Instruments. This set included a receiver-transmitter for communicating with transponders and several transponders, which, however, were completely useless, since they did not perform encryption using the DST40 algorithm. So the guys needed the transponders separately.

Among other things, by the way, this developer’s kit also included software that allowed for encryption using the DST40 algorithm. However, the guys, as they later oathly assured all those present at the USENIX symposium, did not disassemble and debug the program code to get to the pale pink body of the proprietary algorithm, since this was prohibited by the license agreement.

Instead, they decided to use the "predictor" or "black box" method for hacking. In simple terms, they began to conduct various experiments, writing different encryption keys to transponders, as well as transferring various requests to them and studying the results of hashing obtained from them.

From the Kaiser scheme, it was clear that the basis of the encryption scheme is the Feistel network widely used in other encryption algorithms on logical elements with fixed truth tables. To completely break the algorithm, the guys needed to solve three problems:

Check if the Kaiser scheme is valid.
Determine the paths of the signal distribution from each of the bits of the key registers and the query to the logic elements.
Calculate the truth tables of all logical elements.

I will not go deep into the description of the experiments: whoever is interested, can familiarize themselves with them in this document presented to the public at the 14th USENIX Safety Symposium held in Baltimore in 2005.

I can only say that as a result, the guys managed to successfully solve all three tasks and restore the complete wiring diagram of the functional blocks included in the DST40 encryption module, including the truth tables of these blocks. Moreover, it should be noted that the actual functional scheme was not fully consistent with the Kaiser scheme. There were several differences:

After the start of hashing, 200 clock pulses, rather than 400, are sent to the circuit.
The request register for each clock cycle is shifted not by one, but by two bits at once.
The logical element designated on the Kaiser circuit “F21” has not one, but two outputs, which, before getting into the leftmost (according to the scheme) two bits of the query register, XOR are with the two rightmost bits from the same register.
To calculate the next left (according to the scheme) bit of the encryption key, the wrong bits are used that are shown in the diagram.

Also, the guys managed to figure out the encryption key modification algorithm: the key changes every third clock cycle, starting from the second.

Moreover, it turned out that the resulting hash is not fully transmitted by the transponder to the air - not all 40 bits, but only 24 of them. The consequence of this is the emergence of a large number of false results in the further selection of all possible combinations of encryption keys. However, this did not become too big a problem - it was enough just to check the found key again, but with another request / answer pair. If the second check also gave a match, then the key was considered found.

Then the guys developed a hardware key brutforser based on the FPGA XILINX board on board, which cost them just under $ 200. On the chip of this FPGA, they managed to place 32 hashing cores synchronously operating at 100 MHz. Each of the cores went through its own range of encryption keys. Ideally, one such board would go through the entire range of keys in approximately 19 hours of operation: (2 ⁴⁰ x200) / (100x10 ⁶ x32x3600) = 19.09 hours. But in reality, part of the time was spent on overhead - receiving commands from the computer. Therefore, a complete bust took almost 21 hours. To speed up the search process, 15 more such cards were purchased. In each of them, they programmed 32 of the same cores, combined the cards with each other into one network and received as a result a cluster of 512 parallel cores. In this case, each core had to perform a maximum of 2 40/512 = 2 ³¹ complete hashing cycles. This cluster coped with the task in less than an hour and a half.

The first guinea-pig was the ignition key from the Ford Escape SUV 2005 model, equipped with just such a transponder. With the help of a developer kit, two random queries were transmitted to the key and two corresponding responses were received. These two pairs of requests / responses were the original data submitted to the bruteforcer before the brute force. Less than an hour after the start of the search, the secret key was successfully found.

The next step was the manufacture of this transponder simulator, which could be used to start this car. The basis was a compact personal computer with a transceiver board installed in it and an external antenna connected to this board. To provide autonomous power supply for all the hardware, a UPS was used with an additional battery pack connected to it. A program was launched on the computer that listened to the broadcast through the transceiver in anticipation of a request from the immobilizer. Upon receipt of such a request, the program performed its hashing and transmitted the result back to the air. To start the car engine used a mechanical copy of the ignition key, which does not contain a transponder.

In the video below, Adam and Matthew demonstrate the process of starting the engine using a simulator:

Then, using this simulator, they successfully purchased gasoline at a gas station with the SpeedPass payment system:

After successfully conducting all these experiments, it was decided to publish the information obtained. At the time, Evie was on the board of directors of the USENIX association — therefore, it was a logical decision to publish this information at the next USENIX safety symposium.

However, in order to prevent the rapid collapse of payment systems and not to give thieves a handy means for easy hacking of immobilizers, the guys did not publish all the information. For example, the final functional hashing scheme was not published. The only thing that they provided as a confirmation of the veracity of their words are formulas that describe the key hashing algorithm and the truth table of the functional elements that make up the Feistel network. This was perfectly enough for engineers from Texas Instruments to realize the complete fiasco of the DST40 algorithm.

Part Three: Would you take advantage of it?

It was 2009. On the ground, with a crash, a growing wave of crisis rolled. Two people, let's call them, conditionally, Steve and John, were actively looking for options for obtaining additional income. The sensational story of the breaking of DST40 transponders gave them the idea that you could make a little money on this business. The idea was to offer the installation of remote engine start systems to car owners equipped with similar immobilizers. At that time, similar systems already existed, but all of them required sacrificing one key from the car: it had to be placed in the cabin in the immediate vicinity of the remote start device. It is clear that this deprived the immobilizer of any meaning whatsoever and forced the car owner to install a separate alarm system in the car. In this case, the car owners were offered a system devoid of these shortcomings: it was assumed that it would simulate the ignition key itself, and it would do so only at the moment of receiving the command to start the engine remotely.

Steve and John got down to business, studied the document published at the USENIX Symposium, and understood the hashing mechanism. As a result of their work, such a scheme emerged, which fully complies with the formulas published on page 14 of the above document:

Then they made two designs:

the first of these was the DST40 transponder programmer. It allowed you to read open information from transponders, send a request to the transponder and receive a hash result from it - just as a car immobilizer does, and also allowed you to write open information to a transponder along with an encryption key. With the help of the programmer, two pairs of requests / responses for the ignition key from the 2005 Toyota Toyota Camry were received.

The second design was a brute forcer based on the Xilinx Spartan 3E FPGA chip. Brute Forcer allowed the search method to find the encryption key stored in the transponder. To do this, the input data to the brute forcer was supplied with initial data in the form of two combinations of requests / responses and an encryption key, from which it was necessary to start the search. The brute forcer worked at a frequency of 135 megahertz, heaped 32 hashing cores on an FPGA chip, and performed a complete enumeration of all combinations in about 14 hours.

The result was, frankly, not encouraging: to wait for several hours when the brute forcer did his job was not a very pleasant experience. Therefore, Steve and John turned their attention to another data transfer area - between the central computer of Toyota and the immobilizer unit. After conducting a small survey and several tests, it turned out that this site, although it has some concealment of the transmitted data, is so primitive that it was not difficult to understand it and implement its device in the rupture of this path. While the device was in standby mode, it simply transmitted data from the computer to the immobilizer and back through itself. If it received a command to the remote engine plant, it disconnected the immobilizer and began to communicate with the computer on its own - simulating positive responses from the immobilizer. And the most important thing in the whole process was that the immobilizer response depended only and only on the computer request. The ignition key is simply not needed.

The programmer and bruteforcer remained unclaimed.

Part Four: Purely Sports Interest

A few more years passed after the events described above, and here once I got a DE0-Nano-SoC board in my hands. Its heart is the Altera Cyclone V SE 5CSEMA4U23C6N chip. It contains the ARM Cortex-A9 dual-core HPS processor (Hard Processor System) and FPGA with 15094 adaptive logic modules (ALM). Included with the board, the manufacturer provides the Linux OS deployed on a microSD memory card. This makes it easy to implement the user interface - without spending a lot of time on it. After mastering this device, I remembered that story about hacking the DST40 transponder and there was a purely sports interest - how many hashes can be squeezed out per second with DST40 brute force using such a device?

Having a scheme drawn by Steve and John, as well as two pairs of requests / responses they received from Toyota's key, I got down to business. At first, a simple way was chosen to extensively increase the number of hashing cores, of which as many as 128 pieces fit on the FPGA chip:

This diagram shows the following components:

HPS processor - executes dst40 program. It requests the initial data from the user, loads them into the registers of the control unit, starts / stops the search process, and displays information about the current state on the screen.
Control block. Its task is to prepare source data for all hashing kernels, control their work, check the results and transfer them to HPS.
Hash kernels. Each core forms a source key of two parts: a 7-bit fixed high part that matches the kernel sequence number (00H for the core number 0, 7FH for the core number 127) and a 33-bit low part variable obtained from the control unit. The cores perform hashing of the request using the DST40 algorithm, at the end of this process they compare the hash result with the response and transmit the result of the comparison to the control unit.

An approximate algorithm for the operation of the entire device is as follows:

The program prompts the user for initial data: two requests sent to the transponder and two corresponding answers received from the transponder, as well as an encryption key from which to start the search.
The program loads the first request / response and key into the control unit and starts the search process. After that, it waits when the control unit informs it about the detection of a match or the exhaustion of the enumerated keys.
The control unit sets the request / response and the key to the inputs of the source data of all cores and gives them the command to start hashing.
Each core performs hashing of the request. It will take 200 cycles. At the end of the work, each core compares the hash result with the response and sends the result of the comparison to the control unit.
The control unit evaluates the performance of all the cores: if no matches are found, then increment the key and proceed to step 3 of the algorithm.
- , , .
/ .
3 5 . — .
— , . . , .

The hardware part of the design has earned at a clock frequency of 200 megahertz. As a result, search speed was 128x200x10 ⁶ /200 = 128 million hashes per second. The device performed a complete enumeration of all options in 2 hours and 24 minutes. This, of course, was a very good result, but still not so good - to dwell on it.

Next, I will describe a few steps taken to speed up the iteration process. So…

Step one

Let's start with the optimization algorithm. Take another look at the above hashing scheme. We see that the hash result is read from the lower 24 bits of the request / response register. The upper 16 bits are not used. A natural question arises: why perform the last 8 cycles, if their result is then thrown away? Answer: absolutely no reason. It is enough to submit 192 clocks to the circuit, and then pick up the hash result from the upper 24 bits of the register. So let's do it: it will give us a completely free four-percent increase in speed.

Step two

Let's look at the truth tables of the logical elements given at the very end of the document.

Carefully look at the truth table of the function Fe

It is easy to see that the lines are duplicated in pairs. This means that the low input bit does not affect the result and can be discarded without any damage whatsoever. No sooner said than done.

Now the truth table has acquired this kind

This does not give an obvious increase in speed. However, as is well known, any reduction in the number of combinatorial logic in synchronous circuits has a positive effect on the possibility of increasing the clock frequency.

As a result, such a kernel scheme was born, in which the changes described above were taken into account (also in it the bits in the registers are renumbered so that the score goes from scratch - so more familiar):

Step Three

Programmers who are trying to squeeze the maximum speed out of a looped piece of a program have such a way of optimization as “unrolling loops”. It consists in the fact that the loop counter is reduced N times, and the sequence of commands executed in this cycle is repeated N after each other. This reduces redundancy introduced by the maintenance teams of the cycle counters.

Let us try and use this method: we will turn the entire 192-cycle cycle into one continuous line. The implementation of this option is quite possible, since the work of each cycle depends only on the results of the previous cycle and nothing else:

In this scheme, each logic block with the name "CYCLE N" includes the entire Feistel network used in the DST40 algorithm. It is clear that the length of logical circuits will become abnormally large and the clocking speed will have to be significantly reduced. However, such a scheme will produce a result for each measure, and not for each 192nd measure, as it was originally - it is worth a try!

We will implement such a scheme and test: as expected, the clock frequency had to be reduced to 2 megahertz, and the logic turned out so much that 8 cores could barely fit on the crystal. 16 million hashes per second is completely frivolous!

Throw this idea in a landfill? In no case!There is another trump card that can now be pulled out of the sleeve. It is called the conveyor. I believe many of the readers are aware of the pipelines. And if you do not know, I recommend reading about them in a wonderful article by Ivan Shevchuk, aka ishevchuk , “ A few words about pipelines in FPGA ”.

So, let's cut the logical chain into 192 links, at the junctions between which we put two forty-bit registers:

Compile.This construction worked at the same frequency as the original construction of 128 cores - 200 megahertz. But now new input data is received at its input for each measure. The result is also now removed from the circuit outputs for each clock cycle (starting from 192 clock cycles). The acceleration was 192 times! Cheers cheers!However, not everything is as rosy as we would like. The core circuit is so swollen that only one core is placed on the crystal. The resulting brute force rate was 200 million hashes per second.

Let's not despair - look for a compromise. Let's take a closer look at the resulting circuit again. It is striking that the whole chain of 192 links consists of 64 identical blocks of 3 links: in the first link of the block the key register does not change, in the second it is shifted by 1 bit, and in the third it does not change again. Let's try to change the scheme: remove the registers, cutting these blocks into three parts. Thus, the number of chain links will be reduced to 64, and the length of the logical circuits of each link will be tripled. The result will be a need to lower the clock frequency, but at the same time the core size will have to be significantly reduced.

We implement such a scheme and as a result we can place four such cores on a crystal. The TimeQuest analyzer allowed running this scheme on 125 megahertz. But since there were four cores and the scheme gives four results on each clock (starting from the 64th), the total search speed was 4x125x10 ⁶ = 500 million hashes per second. Already very well!

Step Four

Well, the final touch is overclocking! Where do without him? 125 megahertz, obtained at the previous step - this is the frequency at which the TimeQuest analyzer is not yet swearing. But Cyclone V chips have a very decent "safety margin" in speed. We take advantage of this and will raise the clock frequency of the circuit until it starts to make mistakes - skip past the right combinations of the original data. To evaluate the correctness of the scheme, the program in the HPS processor was replaced with a test one: in each cycle it formed random pairs of keys / requests, calculated the answer, loaded all this stuff into the pipeline and started the scheme. If after 64 cycles the circuit did not report a successful match, the test was considered not passed - the circuit frequency had to be lowered. In this way, the limiting frequency was found at which the circuit remained operable:170 megahertz At 175 megahertz, the scheme started to fail. At 170 megahertz brute force speed was 4x170x10⁶ = 680 million hashes per second. With an exhaustive search of all possible options for the keys, the device coped in less than 27 minutes.

Below is a video demonstrating the use of this brute forcer in practice:

Part Five: The Final

The total efficiency of the DE0-Nano-SoC based brute forcer exceeded the efficiency of the 512-nuclear cluster built by the Evi team by about 90 times (the design turned out 30 times cheaper and three times faster) - which is not surprising at the current stage.

If anyone wants to "dig" in the source code, then this can be done here . Also in the bin directory is the compiled firmware for uploading to FPGA (clock frequency is limited to 150 MHz for reliability) and the compiled program to run on HPS under Linux.

So let me leave! All health and good luck! Thanks for attention!

Source: https://habr.com/ru/post/275745/

All Articles