MIT course "Computer Systems Security". Lecture 16: "Attacks through the side channel", part 3

Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems". Nikolai Zeldovich, James Mykens. year 2014

Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.

Lecture 1: "Introduction: threat models" Part 1 / Part 2 / Part 3
Lecture 2: "Control of hacker attacks" Part 1 / Part 2 / Part 3
Lecture 3: "Buffer overflow: exploits and protection" Part 1 / Part 2 / Part 3
Lecture 4: "Separation of privileges" Part 1 / Part 2 / Part 3
Lecture 5: "Where Security Errors Come From" Part 1 / Part 2
Lecture 6: "Opportunities" Part 1 / Part 2 / Part 3
Lecture 7: "Sandbox Native Client" Part 1 / Part 2 / Part 3
Lecture 8: "Model of network security" Part 1 / Part 2 / Part 3
Lecture 9: "Web Application Security" Part 1 / Part 2 / Part 3
Lecture 10: "Symbolic execution" Part 1 / Part 2 / Part 3
Lecture 11: "Ur / Web programming language" Part 1 / Part 2 / Part 3
Lecture 12: "Network Security" Part 1 / Part 2 / Part 3
Lecture 13: "Network Protocols" Part 1 / Part 2 / Part 3
Lecture 14: "SSL and HTTPS" Part 1 / Part 2 / Part 3
Lecture 15: "Medical Software" Part 1 / Part 2 / Part 3
Lecture 16: "Attacks through the side channel" Part 1 / Part 2 / Part 3

Audience: Do you use the Karatsuba method?
')
Professor: Yes, this is a clever multiplication method that does not require four stages of computation. Is the Karatsuba method taught in the .601 course, or how is it designated today?

Audience: 042.

Prof: 042, excellent. Yes, this is a very good method. It is used by almost every cryptographic library. For those of you who are not graduates of our institute - I say that, because we have graduate students here - I will write about the Karatsuba method on the blackboard. Here you need to calculate three values:

a ₁ b ₁
(a ₁ - a ₀ ) (b ₁ - b ₀ )
a ₀ b ₀

Thus, you make 3 multiplications instead of four, and it turns out that you can recover this value a ₁ b ₀ + a ₀ b ₁ of these three multiplication results.

The special way to do this is this ... let me write it in a different form.

So, we will have:

(2 ⁶⁴ + 2 ³² ) (a ₁ b ₁ ) +
(2 ³² ) (- (a ₁ - a ₀ ) (b ₁ - b ₀ )
(2 ³² + 1) (a ₀ b ₀ )

This is not very clear, but if you work through the details, then ultimately convince yourself that this value in these 3 lines is equivalent to the value of ab, but at the same time reduces the computation by one multiplication. And the way we apply this to more voluminous multiplications is that you continue to go down recursively. So, if you have 512 bit values, you can break them into 256-bit multiplication. You do three 256-bit multiplications, recursively using Karatsuba's method each time. In the end, your calculations are reduced to machine size and can be processed by a single machine instruction.

So where is the timings attack here? How do these guys use Karatsuba multiplication? It turns out that OpenSSL is worried about two types of multiplication that you may have to do.
The first is the multiplication of two large numbers of approximately the same size. This happens many times when we perform modular exponentiation, because all the values that we multiply will be approximately 512 bits in size. Therefore, when we multiply c by y or square it, we multiply two things of about the same size. In this case, the Karatsuba method makes a lot of sense, because it reduces the size of the numbers squared by about 1.58 times, which greatly speeds up the calculation process.
The second type of multiplication is when OpenSSL multiplies two numbers, the size of which differs significantly from each other: one is very large and the other is very small. In this case, you could also use the Karatsuba method, but it will run slower than primitive multiplication. Suppose you multiply a number of 512 bits by a 64-bit number, you will have to raise each bit of the first number to 64 degrees, and as a result you will get a 2n slowdown instead of acceleration n / 1.58. Therefore, these guys using OpenSSL, tried to get smarter, and this is where the problems started.

They decided that they would dynamically switch between the effective Karatsuba method and the primary school multiplication method. Their heuristics were as follows. If the two numbers that you multiply consist of the same number of machine words, or at least have the same number of bits as 32-bit units, then the Karatsuba method is used. If two numbers are very different in size from each other, squaring or direct simple, normal multiplication is performed.

In this case, you can track how the switch to another multiplication method. Since the moment of switching does not pass without a trace, it will be noticeable after it, whether it takes much more time to multiply now or much less than before. This circumstance was used by researchers to organize an attack using the method of determining timings.

I think that I’ve finished telling you about all the strange tricks that people use to put RSA into practice. Now let's try to put them together and use against a whole web server to figure out how you can “pinch” the bits of interest to us from the input network packet.

If you remember from the HTTPS lecture, the web server has a secret key. He uses this secret key to prove that he is the correct owner of all certificates via HTTPS or TLS. This is due to the fact that the clients send some randomly selected bits, these bits are encrypted using the server's public key, and the server decrypts the message using the TLS protocol. And if the message is verified, then these random bits are used to establish a communication session.

But in our case this message will not be checked, because it will be created in a special way, and when it turns out that the additional bits do not match, the server will return an error as soon as it finishes decrypting our message.

That's what we're going to do here. The server - you can assume that it is Apache with SSL open - will receive a message from the client, which will count the encrypted text with or some hypothetical encrypted text that the client has created. The first thing we do with ciphertext is decrypt it using the formula c → (c ^d mod n) = m.

If you remember the first optimization, we are going to apply the Chinese remainder theorem and divide our text into two parts: one to calculate mod p, the other mod q, and then combine the results. First, we take c and present it in two quantities: the first is called c ₀ , it will be equal to mod q, and the second we will denote c ₁ , and it will be equal to c mod p. Then we do the same to calculate c for d mod p and c for d mod q.

Next we are going to switch to the Montgomery representation, because this will make our multiplications very fast. So the next thing SSL is going to do with your number is to calculate c ₀ ', which will be equal to c ₀ R mod q and do the same here, for c1, I will not record it because it looks the same .

Now that we are in the form of Montgomery, we can finally produce our multiplications, and here we will use the “sliding window” technique. As soon as we get c ₀ ', we will complete this simple construction of c ₀ ' to the power d by mod q. And here, since we calculate this value for d, we will use “sliding windows” for the exponent bits d, and also apply the Karatsuba method or the usual multiplication depending on what the size of our operands is.

So, if it turns out that this value is c ₀ 'and the previously obtained squaring result is the same size, we use the Karatsuba method. If c ₀ 'is very small and the previous result of multiplication is large, then we will square it and multiply it in the usual way. Here we use "sliding windows" and the Karatsuba method instead of normal multiplication.

Also at this stage, additional reductions appear. Because with each multiplication, the additional abbreviations will be proportional to what we are raising to the power modulo q, that is, to the value (c ₀ ') ^d . Here, with a simple connection of the formula, the probability of additional reductions will be proportional to the value of c ₀ 'mod q divided by 2R. It is in this place that a bit appears that affects timing.

In fact, there are two possible effects: the use of the Karatsuba method instead of normal multiplication and the appearance of additional cuts that you are going to make.

In a second you will see how it can be used. Now, when you got this result for mod q and are going to get a similar result for mod p, you can finally recombine these two parts above and below and use the CRT, the Chinese theorem on residuals.

And what you end up with from CRT ... sorry, I think we first need to convert it back from the Montgomery form. Therefore, before recombination, we transform the upper part into the expression (c ₀ ') ^d / R mod q and return our value cd mod q. In the lower part, we respectively obtain cd mod p.

You can now use CRT to get the value of c ^d mod n. Sorry for the small font, I did not have enough boards. Approximately the same thing we have here below for with ₁ , and we can finally get our result, that is, the message m.

Thus, the server takes an incoming packet, which receives it, runs it through the whole pipeline, runs the two parts of this pipeline, and ends with the decoded message m, equal to cd mod m. Then he is going to check the padding padding of this message. In the case of our specific attack, we created with in such a way that in fact this addition will not coincide. We chose the value of c for such heuristics that do not encrypt the real message with the correct padding padding.

Thus, the addition will not withstand the checks, the server will need to write an error, send an error message to the client and terminate the connection. So, we are going to measure the time it takes the server to pass our message through this pipeline. Have questions about the message processing process of the server and the integration of all these optimizations?

Audience: in my opinion, there is an error with the index of the value of c.

Professor: yes, you are right, I finish the index 0, here it should be c ₀ ^d mod q.

Audience: when you divide by R mod q, aren't there any assumptions about how much q you should add to further reduce low bits by zeros?

Professor: yes, you are right, at this final stage (c ₀ ') ^d / R mod q there may be additional abbreviations. So we need to do this division of R in the right way, and probably should do the same thing as when doing the Montgomery cut here when we divide by R to convert the value back. Since at the beginning of the calculations it is not clear how much q we should add, we use the selection method, destroy low zeros, then do mod q again, and possibly an additional abbreviation. You say absolutely true, in this case it is exactly the same division by R mod q as for each step of the Montgomery multiplication.

So how to use it? How can an attacker unravel the server's secret key by measuring the time of operations? These guys have a plan that is based on guessing one bit of the private key at a time. We can assume that the secret key is an encrypted exponent d, because you know e and you know n, it is a public key. The only thing you do not know is d.

In fact, in this attack they do not directly guess the value of d, since it is rather difficult. Instead, they consider the value of q or p, no matter which of these two quantities. As soon as you guess what value is p or q, you can calculate n = pq. Then, if you know the values of p and q, you can calculate the function φ, which we talked about earlier. This will allow you to get the value of d from the value of e. Therefore, this factorization of the n value is extremely important; it must be kept secret to ensure RSA security.

So actually, these guys intended to guess the value of q, analyzing the timing of this pipeline. What are they doing for this? They carefully select the initial value of c and measure its passage through the server pipeline.

In particular, there are two parts to this attack, and you must take some initial steps to guess the first few bits. Then, as soon as you have the first few bits, you can guess the next bit. So let me not tell in detail how they guess the first couple of bits, because it's actually much more interesting to see how they guess the next bit. If we have time, we will return to how the guessing of several initial bits in a lecture article is described.

So let's say that you have an assumption g about which bits are in the meaning of this q. Let this quantity consist of such bits: g = g ₀ g ₁ g ₂ ... and so on. Rather, it is not even g, but the real bits of q, so let me rewrite it in this form: g = q ₀ q ₁ q ₂ .... We believe that these q - high bits, and we are trying to guess the bits lower and lower. Suppose we know the value of q up to the qj bit, and then all the zeros follow. You do not know what the other bits are.

These guys tried to inject this guess g into this place of our pipeline: (c0 ') d mod q. Because this is the place where two types of optimization are used: the Karatsuba method instead of the usual multiplication and a different number of additional reductions depending on the value of c ₀ '. Actually, they tried to introduce two different guesses to this place of the pipeline: the first one, which looks like g = q ₀ q ₁ q ₂ ... qj 000 ... 0000 and the second, which they called g _high , which consists of the same high bits, but instead of all zeros at the end there is a one that means a high bit, followed by zeros again:

g = q ₀ q ₁ q ₂ ... qj 100 ... 0000.

How does this help these guys understand what's going on? There are two ways to do this. Suppose that our guess g is equal to the value of c ₀ '. We can assume that these g and g _high correspond to the value c ₀ 'given on the left board. In fact, it is quite simple to do this, because c ₀ 'is quite easily calculated the opposite way from the encrypted input value c ₀ , you simply multiply it by R.

Therefore, in order to guess the value (c ₀ ') ^d , they simply need to take their guess, their guess g, and first divide it by R, that is, divide by 512 mod something there. Then they are going to enter it back, the server will multiply it by R and continue the process given in our pipeline scheme.

So, suppose we managed to put our particular selected integer value in the right place. So, what is the computation time from c ₀ 'to d mod q?

There are two possible options where q fits into this picture. It may be that q is between these two values of g and g _high , because the next bit q is 0. Thus, this value — the first 0 after qj — will be less than q, but this value — 1 after q — will be greater than q. This happens if the next bit q is 0, or it is possible that q is above both of these values if the next bit q is 1.

Now we can tell what the decryption time of these two values will be if q lies between them, or if q is located above both of them.

Let's consider the situation where q is located above. In this case, everything is pretty much the same. Since both these values are less than q, then the value of these things mod q will be about the same. They are somewhat different due to this extra bit, but still more or less the same size. And the number of extra reductions will probably also not differ much, because it is proportional to the value of c0 'mod q. And for both the values of g and g _high , smaller than q, they are all about the same. Neither of them will exceed q and will not cause a large number of additional reductions, because when q is greater than both of these guesses, the number of calculations by the Karatsuba method relative to the number of ordinary calculations will remain the same. In terms of this relationship, the server will handle both g and g _{high in the} same way. Therefore, the server is going to make about the same additional reductions for both of these values. Thus, if you see that the server spends the same time to answer these guesses, then you should probably assume that, in the value of g _{high, there is} indeed 1 in this place.

On the other hand, if q is located between these two values, then there are two possible things that can cause switching and changing timings. One of the things is that since g _{high is} slightly larger than q, the number of additional cuts will be proportional to c ₀ 'mod q, which is very small, because c ₀ ' is q plus some bits in this additional sequence of 100 bits ... 00 Thus, the number of additional reductions will be more noticeable and everything will start to happen faster.
Another possible thing that can happen is that the server will probably decide: “oh, now it’s time to do normal multiplication instead of Karatsub’s method!”. Perhaps, for this value of g, the value of c ₀ 'will have the same number of bits as q, and if it turns out that g _{high is} greater than q, then g _high mod q will potentially have less bits. And if it crosses a certain border, the server will suddenly switch to normal multiplication. In this case, we will observe the reverse process - everything will start to flow more slowly, because the usual multiplication takes more time than the multiplication of Karatsuba.

The number of additional cuts is proportional to c ₀ 'mod q. If c ₀ which is this g _high value, is slightly larger than q, then this is a tiny difference that will not be comparable to the difference between g and q. In this case, there will also be a change in timings, which you can try to measure. So actually, there are a few interesting things here, when these effects actually work in different directions. So, if you set a 32-bit switching boundary between Karatsuba’s method and normal multiplication, then decoding this message will take much longer.

On the other hand, if the 32-bit border is not applied here, perhaps the effect of additional cuts will tell you what is happening here. Thus, in fact, you should monitor the various effects. 32 , , - . , , 32, , , , .

, , , . , q 1, , q 0, , g _high q , , .

. , . , . , , 1-2 . , , Ethernet.
, . . , 7 . , , -, , ?

: ?

: , , , , , . , .

, , 7 , , 7 . 7 g, 7 g + 1, g + 2 7 g + 400. g, g, , 7 400, ?

: , , ?

: , , , — (c ₀ ') ^d . . , , , mod p. , , . , g, 1, 2, 3, , .

, , – 100…00. , mod p, , mod p . « », , (c ₀ ') ^d , . It's clear?

: , ?

: - , q. , q, , , .

: c ₀ '?

: c ₀ ', c, c ₀ ' R mod n.

, «» , c ₀ = mod q, c ₀ = ((c ₀ ' R ^-1 ) mod n) mod q. R, R. c ₀ ', (c ₀ ') ^d mod q. , , , , R. , R = 2 ⁵¹² . , .

: mod p ?

:that is, you do not know what p is, but you want to define it at random? If you succeed, you will do a great job! Well, next week we will begin to discuss other issues.

Full version of the course is available here .

Thank you for staying with us. Do you like our articles? Want to see more interesting materials? Support us by placing an order or recommending to friends, 30% discount for Habr users on a unique analogue of the entry-level servers that we invented for you: The whole truth about VPS (KVM) E5-2650 v4 (6 Cores) 10GB DDR4 240GB SSD 1Gbps from $ 20 or how to share the server? (Options are available with RAID1 and RAID10, up to 24 cores and up to 40GB DDR4).

VPS (KVM) E5-2650 v4 (6 Cores) 10GB DDR4 240GB SSD 1Gbps until December for free if you pay for a period of six months, you can order here .

Dell R730xd 2 times cheaper? Only we have 2 x Intel Dodeca-Core Xeon E5-2650v4 128GB DDR4 6x480GB SSD 1Gbps 100 TV from $ 249 in the Netherlands and the USA! Read about How to build an infrastructure building. class c using servers Dell R730xd E5-2650 v4 worth 9000 euros for a penny?

Source: https://habr.com/ru/post/429394/

All Articles

MIT course "Computer Systems Security". Lecture 16: "Attacks through the side channel", part 3

Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems". Nikolai Zeldovich, James Mykens. year 2014

More articles: