MIT course "Computer Systems Security". Lecture 16: "Attacks through the side channel", part 1
Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems". Nikolai Zeldovich, James Mykens. year 2014
Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.
Okay guys, let's get started. So today we will talk about attacks through side channels, a common problem common to all types of systems. In a broad sense, attacks through a side channel arise in a situation where you do not think that some information can reveal information about your system. ')
Typically, you have several components that establish communication between the user and the server. At the same time you think that you know about everything that passes through this wire connecting two subscribers, know about all the bits that the user and server exchange with each other, and it is safe. But it often happens that some information is still disclosed by the user or the server. The example described in the materials of today's lecture describes a situation where the synchronization of messages between the user and the server reveals some additional information that you wouldn’t know about, just by watching the flow of bits between these subscribers.
In fact, there is a much wider class of side channels that you may worry about. People learned about the appearance of side channels back in the 40s, when they found out that if you start typing characters on a teletype, then the teletype electronics will begin to emit radio frequency radiation. In this case, you can simply position the oscilloscope next to it and observe how the frequency of the radio signal changes, depending on which character the teletype operator is typing. Thus, radiofrequency radiation is a classic example of a side channel that makes you worry.
There are many other examples seen by people, such as power consumption. This is another side channel you can worry about, as your computer will use different amounts of energy depending on what it calculates.
An example of a side channel is sound, thanks to which information leaks are also possible. For example, people listen to the printer and, based on the sounds they make, can tell which characters it prints. This is especially easy to do for dot-matrix dot-matrix printers that make very annoying sounds during printing.
And in general, this is something to think about. At a lecture on Monday, Kevin also mentioned some interesting side channels he deals with in his research. But we are going to look at the specific side channel that David Brumley and Dan Bone studied. In their work, published about 10 years ago, they write that they were able to extract the Apache web server cryptographic key by measuring the timing of different responses to various hostile client input packets. In this particular case, they "hunted" for a cryptographic key. In fact, many attacks through a side channel are aimed at cryptographic keys in part because it is rather difficult to obtain a large amount of data through such a channel. And cryptographic keys are one of the situations when a small amount of bits are output, that is, during an attack, attackers can extract about 200 - 256 bits of information. But based only on these 200 bits, they are able to crack the cryptographic key of this web server.
If you are trying to infiltrate into a database full of social security numbers, you will need to “merge” a lot of information from it. This is why most attacks through side channels focus on obtaining small secrets, such as cryptographic keys or passwords. But in general, this applies to many other situations.
The lecture materials described another cool thing - this is that all this can be done over the network. You probably realized that they had to do a lot of careful work in order to catch these smallest differences in temporal information. Each request that they sent to the server was different from another request for 1-2 microseconds, which is a very short amount of time. Therefore, you must be very careful, because our networks do not allow us to catch such a temporary difference and determine that the server processed your request for 1-2 microseconds longer than it should have been.
As a result, it was not clear whether such an attack could be organized in a very “noisy” network, since useful information should be separated from the noise level. These guys were among the first to show that you can really do this on an Ethernet network with a server at one end of the network and a client at the other end. They proved that it is indeed possible to measure these differences in part by averaging, in part with the help of other tricks. The plan for the rest of this lecture is this - first we will dive into the details of the RSA public key cryptographic algorithm that these guys used. We will not evaluate it in terms of security, but we will look at how this is implemented, because it is the implementation of the algorithm that is critical for using a side channel.
Bramley and Bonaire carefully examined the various implementation details in order to understand when some things were performed faster or slower. So first we need to find out how the RSA cryptosystem is implemented, and then go back and find out how you can attack all these structures that exist in RSA.
Let's start by looking at RSA at a high level. RSA is a fairly widely used public-key cryptosystem. We mentioned these guys a couple of weeks ago when we talked about certificates. Now we are going to see how it actually works.
As a rule, there are 3 things that you have to worry about - key generation, encryption and decryption. In RSA, the way to create a key is to select 2 large prime integers; we denote them by p and q. In their article, these guys write about p and q each 512 bits in size. This is usually referred to as 1024 bit RSA, because the product of these primes is a 1000-bit integer. These days, this is probably not a good choice for RSA key size, because hacking is a relatively easy task for intruders. So if 10 years ago, 1000 bits seemed like a reasonable value, today when building a system, you should choose 2000, 3000 or even 4000 bit RSA key. So the RSA key size means the size of these primes.
Then, for convenience, we will talk about the number n, which is simply the product of these prime numbers n = pq, and this number is called a module. So now that we know how to create a key, or at least part of a key, we will have to figure out how to encrypt and decrypt messages.
And the way we will encrypt and decrypt messages is based on the exponentiation of numbers modulo n. It seems a bit strange, but we'll figure it out in a second. So if you want to encrypt message m, you must convert it to m e mod n. The value of e is a degree, in a second we will find out how to choose it. This is how we are going to encrypt the message.
That is, we simply take this message as an integer and raise it to a power. In a second we will see why this works, but for now let’s label all this encrypted message with C.
Then, in order to decrypt it, we have to find another interesting exponent d, which allows us to take the encrypted message C, raise it to the power d by module n and as a result magically get the decrypted message m.
So, the general principle is to use one exponent to encrypt a message and another exponent to decrypt it.
In general, it seems a little difficult to understand how we are going to come up with these two magic numbers, which somehow return the original message to us. But if you look at how exponentiation or multiplication modulo this number n works, then you will understand everything.
If we take X and raise it to a power equal to the function φ of n, then it will be equal to 1 modulo n:
X φ (n) = 1 mod n
This phi function for our particular choice of n is fairly simple, it is φ = (p-1) (q-1). Then the product of the open exponent e, which is greater than 1, but less than φ (n), and the secret exponent d, will be equal to:
ed = ɑφ (n) +1, where ɑ is some constant.
Thus, it is possible to correctly decipher the message, since there is a rather simple algorithm, if you know the value of φ (n) in order to calculate d with e or e taking into account d.
Audience: Is 1 modulo n not equal to 1?
Professor: yes, I made an inaccuracy here, equality should look like this:
X φ (n)mod n = 1 mod n, that is, the value of X in the degree of the function “phi” from n modulo n is equal to 1 modulo n.
Basically, this means that for RSA we must choose some value of e, which will be our encryption value. Then from e we are going to generate d based on the formula de = 1 mod φ (n), hence d = 1 / e mod φ (n).
There are some Euclidean algorithms that you can effectively use for this calculation. But in order to do this, you must know this φ (n), which requires factorization, that is, the expansion of our number n into factors p and q.
So, RSA is a system where the public key is a pair: the encryption exponent e and the number n, and the pair d and n is the private key, which is kept secret. So anyone can raise a message to a power to encrypt it for you. But only you know this value of d and therefore can decrypt the message. And until you know the factorization of these factors P and Q, or N, equal to the product of P and Q, you do not know what is φ = (p-1) (q-1).
As a result, calculating this d value is a rather difficult task. This is what the high level RSA algorithm is all about.
Now that we are familiar with the principles of RSA, I want to focus on 2 things. There are tricks, how to use it correctly and there are pitfalls that arise when using it. There are many ways to implement the code for this exponentiation and do it efficiently. This is quite an extraordinary task, because we are dealing with 1000-bit integers, for which it is impossible to simply execute the multiplication instruction. It will probably take a long time to do these operations.
Therefore, the first thing I want to mention is the various pitfalls of the RSA. One of them is multiplicative. Suppose we have 2 messages: m 0 and m 1 . I will encrypt them, turning m 0 into m 0e mod n, and m1 into m 1e mod n. A possible problem, or rather, not necessarily a problem, but an unpleasant surprise for someone who uses RSA is that it is very easy to encrypt the product m 0 by m 1 , because you simply multiply these 2 digits: m 0e mod n * m 1e mod n.
If you multiply them, then the result is the product (m 0 m 1 ) e modulo n. This is the correct encryption with simplified use of RSA for the value of m 0 m 1 . At the moment this is not a big problem, because you can simply create this encrypted message, but not able to decrypt it. However, there is a possibility that the general system will allow you to decrypt certain messages. That is, if it allows you to decrypt the message you have created, then perhaps, following the opposite way, you can find out what these messages are m 0 and m 1 .
This fact should not be ignored, as it affects a number of protocols using RSA. There is one property that we use as a defense mechanism at the end of our lecture.
The second pitfall, or RSA property, which is worth paying attention to - is determinism, or determinability. In the elementary implementation described earlier, if you take the message m and encrypt it, turning it into m e mod n, then this is a deterministic message function. Therefore, if you encrypt it again, you will receive exactly the same encryption.
This is not a desirable property, because if I see that you are sending some message encrypted with RSA, and want to know what it is, it can be difficult for me to decrypt it. But I can try different things, as a result of which I see that you are sending this message.
Then I will encrypt it and see if you get the same encrypted text. And if so, then I will find out that you are encrypted. Because all I need to encrypt a message is a publicly available public key, which is a pair of numbers (n, e).
So it's not so great, and you may have to be careful with this property when using RSA. Thus, primitives of this kind are difficult to use directly.
In practice, to avoid such problems with RSA, people encrypt the message in a certain way before encrypting it. Instead of directly raising to the power of the message, they take some function of the message and raise it to the power modulo n:
(f (m)) e mod n.
This f function commonly used these days is called OAEP - Optimal Asymmetric Encryption with Addition. This is something encoded with two interesting properties.
First, it introduces an accident. You can think of f (m) as generating a 1000-bit message that you are going to encrypt. A part of this message will be your message m here in the middle of this rectangle. Of course, you can get it back when decrypted. So, there are 2 interesting things you want to do: on the right you place some randomness, the value of r. It gives the fact that if you encrypt a message several times, you will get different results each time, so that encryption will no longer be determined.
In order to overcome this multiplicative property and other types of problems, on the left you have some pad, which is a sequence of the type 1 1 1 0 1 0. You can choose a sequence better, but roughly speaking, this is some kind of predictable sequence, which you put in here and whenever you decrypt a message, you make sure that this sequence is still there. The multiplicity will destroy these bits of the pad, and then it will be clear to you that someone messed up your message and dropped it. If these bits remain in place, it proves that no one has forged your message and you are able to accept it, since it was correctly encrypted by someone.
Audience: if the attacker knows how big the pad value is, can he set 1 to the bottom of the sequence and then multiply that value?
Professor: yes, it is possible. This is a little difficult, because this randomness of r will spread further. So the specific design of this OAEP is a bit more complicated than what I have depicted. But imagine that this is an integer, rather than a bit multiplication, randomness extends further, so you can create an OAEP scheme in which this will not happen.
It turns out that you should not use this RSA math directly, in practice you should use a certain library that implements all these things for you in the right way, and use it as an encryption and decryption parameter.
However, the details discussed above will be meaningful to us, since we are actually trying to figure out how to break or attack an existing RSA implementation.
In particular, the attack described in the lecture article will use the fact that the server is going to test this pading addition when it receives the message. So this is how we are going to consider the time it takes for the server to decrypt. We are going to send a carefully constructed message, but it is not created from the real message m and its encryption. We are going to create a thoroughly encrypted integer value.
The server will try to decrypt it and will receive some absurdity with which the pad pad will most likely not mate, and the server will immediately reject this message. This is exactly what we need, because in this way the server will tell us exactly how long it took him to get to this point: decrypt the RSA, receive this message, check the addition and reject it. This is what we will measure in this attack, described in the lecture material. There is a certain integral component in this message that will allow us to determine the decryption time.
So now let's talk about how we implement RSA. The essence of the implementation lies in the exponentiation, which is rather nontrivial, as I mentioned earlier, because all these are very large integers. The message itself, at least in this article, is a 1000-bit integer. And the exhibitor e itself will also be quite significant.
At least the encryption exponent is well known. But it is better that the decoding exponent be a large number, also about 1000 bits. So you have a 1000-bit integer that you want to build into a 1000-bit integer modulo one more 1000-bit integer n, which will be quite time-consuming. Therefore, optimization is provided in almost every RSA implementation to make this process a bit faster. There are 4 main optimizations that are relevant to this attack. In fact, there are more, but the most important are the following. The first is called the Chinese remainder theorem, or CRT. . . , , , , [ = a 1 ] mod p [ = a 2 ] mod q, q – , .
, [ = 1 ] mod pq. 1 , . 1 , mod pq a 1 a 2 mod p q.
, ? , , n, p q.
, mod pq, mod p mod q. , , mod pq. , ? , , ?
: , ?
: , , , . , p q, n 1000 , p q 500 , . , , — , , . — . , , . 1000 512 , 2 . , 4 . [ = a 1 ] mod p [ = a 2 ] mod q , 4 . 2 RSA , .
, .
, Sliding windows, « ». . , .
, - , 500 , mod p mod q, d. c d? , c d . d , 500, . , , . , « ». .