
MIT course "Computer Systems Security". Lecture 7: "Native Client Sandbox", part 2

Massachusetts Institute of Technology. Lecture course #6.858: "Computer Systems Security". Nickolai Zeldovich, James Mickens. 2014


Computer Systems Security is a course on the development and implementation of secure computer systems. The lectures cover threat models, attacks that compromise security, and defense techniques based on recent research. Topics include operating system (OS) security, capabilities, information flow control, language-based security, network protocols, hardware security, and web application security.

Lecture 1: "Introduction: Threat Models" Part 1 / Part 2 / Part 3
Lecture 2: "Control Hijacking Attacks" Part 1 / Part 2 / Part 3
Lecture 3: "Buffer Overflows: Exploits and Defenses" Part 1 / Part 2 / Part 3
Lecture 4: "Privilege Separation" Part 1 / Part 2 / Part 3
Lecture 5: "Where Do Security Bugs Come From?" Part 1 / Part 2
Lecture 6: "Capabilities" Part 1 / Part 2 / Part 3
Lecture 7: "Native Client Sandbox" Part 1 / Part 2 / Part 3

Audience: why should the valid address range of the module's address space start from zero?
Professor: because, from a performance point of view, it is more efficient to check a jump target if you know that the valid addresses form a contiguous range starting at zero. Then you can do it with a single AND mask, where all the high bits are ones and only a few of the low bits are zeros.

Audience: I thought the AND mask was supposed to provide alignment.

Professor: right, the mask provides alignment, but why does the range start at zero? I think they rely on the segmentation hardware. In principle, they could use it to shift the region upward in linear address space. Or maybe it is simply a matter of how the application "sees" this range: in fact, you can place it at different offsets within your virtual address space. That allows certain tricks with the segmentation hardware, such as running several modules in a single address space.



Audience: is it possibly because they want to catch null-pointer dereferences?

Professor: yes, they want to catch all null-pointer dereferences, and there is a way to do that. A null pointer refers to the beginning of the segment being accessed, so if you move a segment, you can map an unused, inaccessible zero page at the start of each segment. That also helps with running several modules.

I think one of the reasons for this decision, to start the range at 0, is related to their desire to port the system to the x86-64 platform, which has a somewhat different design, although the article does not mention this. In the 64-bit design the hardware dropped some of the segmentation support they relied on for efficiency, so they had to consider a software-based approach. For 32-bit x86, though, that is still not a particularly weighty reason for the range to start at zero.

So let us return to the main question: what do we want to guarantee from a security point of view? Let's approach it a little "naively", see how things can go wrong, and then try to fix them.

I think the naive plan is to look for prohibited instructions simply by scanning the executable from beginning to end. How can you find these instructions? You can take the program code, treat it as one giant string of bytes running from zero up to 256 megabytes, depending on how big your code is, and then start searching.



This string might first contain a NOP instruction, then an ADD, a NOT, a JMP, and so on. You simply scan it, and if you find bad instructions, you declare the module bad and discard it. If you do not see any system-call instruction, you can allow the module to run and do whatever it wants within the 0-256 MB range. Do you think this will work or not? What are they worried about? Why is it hard?

Audience: Are they concerned about the size of the instructions?

Professor: yes, the fact is that the x86 platform has variable-length instructions. The exact size of an instruction depends on its first few bytes: you look at the first byte to learn roughly how long the instruction is, and then you may have to look at a couple more bytes before you can decide its full size. Some architectures, such as SPARC, ARM, and MIPS, have fixed-length instructions; ARM has two instruction lengths, either 2 or 4 bytes. But on the x86 platform an instruction can be 1, 5, or 10 bytes long, and if you try, you can even construct a rather long 15-byte instruction. Those are complex instructions, however.

As a result, there may be a problem. If you scan this string of code linearly, everything looks fine. But at run time, execution might land in the middle of some instruction, for example a NOT.



It may be a multi-byte instruction, and if you interpret it starting from its second byte, it will look completely different.

Here is another example where we will "play" with the assembler. Suppose we have the instruction bytes 25 CD 80 00 00. Looking at the first byte, 25, you interpret this as a five-byte instruction: you look 5 bytes forward and see and $0x000080cd, %eax, an AND on the EAX register with some constant, 0x000080CD. This is one of the safe instructions that Native Client should simply allow under its first rule for checking binary instructions. But if during execution the CPU starts executing from the CD byte, which I will mark with an arrow, then the bytes CD 80 form a 2-byte instruction, int $0x80, which is the way to make a system call on Linux.
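To make the overlap concrete, here is a small Python sketch (an illustration only, not NaCl code) with a toy two-opcode decoder: 0x25 is AND EAX, imm32 (five bytes) and 0xCD is INT imm8 (two bytes). The same byte string decodes completely differently depending on where you start:

```python
# Toy decoder for just two x86 opcodes, enough to show the overlap:
#   0x25 = AND EAX, imm32  (1 opcode byte + 4 immediate bytes)
#   0xCD = INT imm8        (1 opcode byte + 1 immediate byte)
CODE = bytes([0x25, 0xCD, 0x80, 0x00, 0x00])

def decode_one(code, off):
    """Decode the single instruction starting at byte offset `off`."""
    op = code[off]
    if op == 0x25:
        imm = int.from_bytes(code[off + 1:off + 5], "little")
        return f"and $0x{imm:08x}, %eax"
    if op == 0xCD:
        return f"int $0x{code[off + 1]:02x}"
    raise ValueError(f"unknown opcode 0x{op:02x}")

insn0 = decode_one(CODE, 0)   # scanning from the start: a harmless AND
insn1 = decode_one(CODE, 1)   # same bytes, offset 1: a system call!
print(insn0)                  # and $0x000080cd, %eax
print(insn1)                  # int $0x80
```

Starting at offset 0 yields the harmless AND; starting at offset 1 yields int $0x80, the Linux system-call instruction.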



So if you miss this fact, you let the untrusted module "jump" into the kernel and make system calls, which is exactly what you wanted to prevent. How can we avoid this?

Perhaps we should try looking at every byte offset, since x86 can only begin interpreting an instruction on a byte boundary, not a bit boundary. So you could check every byte offset to see whether a dangerous instruction starts there. Do you think this is a sensible plan?

Audience: I think that if someone legitimately uses that AND, the processor will never actually jump into the middle of it, but this check would still reject the program.

Professor: yes, basically this approach is prone to false positives. Now, if you really wanted, you could change the generated code a bit to avoid them. If you know exactly what the checker is looking for, you could potentially rewrite those instructions, maybe applying part of the mask with one AND and the rest with another. That way you avoid the suspicious byte patterns, although it seems rather inconvenient.

It is possible, because the architecture already includes a compiler change: they have a component that has to compile the code in the right form anyway. You cannot simply take GCC off the shelf and compile code for Native Client. So in principle it is doable. But probably they just think it causes too much trouble and would not be a reliable or high-performance solution. Plus, there are several x86 instructions that are prohibited, or should be deemed unsafe and therefore prohibited, and for the most part they are one byte in size, so scanning every offset for them would flag code all over the place.

So, since they cannot simply scan for unsafe instructions and hope for the best, they need another plan: disassemble the code reliably. What does Native Client do to make sure it does not "stumble" over this variable-length encoding?

In a sense, if we scan the executable forward, from left to right, looking for all possible bad opcodes, and if that is also the way the code is executed, then we are in good shape. Even if some strange instruction bytes are present at an odd offset, the processor is never going to "jump" there; it will execute the program in the same order the instructions were scanned, from left to right.



Thus, the problem with reliable disassembly arises from the fact that "jumps" can occur anywhere in the application. Things fall apart if execution "jumps" into the middle of some instruction the scanner did not see when going from left to right. So this is the problem of reliable disassembly. The basic plan is to check where all the "jumps" go. At some level it is quite simple; there are a number of rules we will look at in a second, but the rough plan is: if you see a jump instruction, you must make sure its target was seen earlier. For that, it is in fact sufficient to perform a left-to-right scan, the same procedure we described in our naive approach.

In this case, if you see any jump instruction and the address it points to, you must make sure it is an address you already saw during the left-to-right disassembly.

If we find a jump instruction targeting that CD byte, we must mark the jump as invalid, because we never saw an instruction starting at the CD byte; we saw an instruction starting at the byte 25. But if every jump instruction targets the beginning of an instruction, in this case the byte 25, then we are fine. Is that clear?
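The left-to-right scan plus the direct-jump check can be sketched in a few lines of Python. This uses a toy instruction set invented purely for illustration (opcode 1 is a one-byte no-op, 2 a five-byte instruction, 3 a two-byte jump with a one-byte absolute target); it is not real x86 and not the actual NaCl validator:

```python
# Toy opcode -> total instruction length, for a made-up instruction set.
LENGTHS = {1: 1, 2: 5, 3: 2}
JUMP = 3   # opcode 3: jump, second byte is the absolute target offset

def validate(code):
    """One left-to-right pass; then check every direct jump target."""
    starts = set()
    jump_targets = []
    off = 0
    while off < len(code):
        starts.add(off)                    # remember each instruction start
        n = LENGTHS.get(code[off])
        if n is None or off + n > len(code):
            return False                   # forbidden or truncated instruction
        if code[off] == JUMP:
            jump_targets.append(code[off + 1])
        off += n
    # Rule: a direct jump may only target an instruction start we saw.
    return all(t in starts for t in jump_targets)

assert validate(bytes([3, 2, 1, 2, 0, 0, 0, 0]))      # jump to offset 2, a real start
assert not validate(bytes([3, 3, 2, 0, 0, 0, 0]))     # offset 3 is mid-instruction
```

A jump into the middle of the five-byte instruction is rejected, exactly the CD-byte situation above.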

The only problem is that you cannot statically check the target of every jump in the program, because there may be indirect jumps. For example, on x86 you might have something like jmp *%eax, a jump to the value of the EAX register. This is great for implementing function pointers.



That is, the function pointer lives somewhere in memory, you load it into some register, and then jump to whatever address the register holds.

So how do these guys deal with indirect jumps? Because statically I have no idea whether this will be a "jump" to the CD byte or to the 25 byte. What do they do in this case?

Audience: instrumentation?

Professor: yes, instrumentation is their main trick. Whenever the compiler is about to generate an indirect jump, it must produce proof that this jump cannot cause trouble. For that, they make sure all indirect jumps go to addresses that are multiples of 32 bytes. How do they do it? They replace every indirect jump with what they call a "pseudo-instruction": the same jump, but prefixed with an instruction that clears the 5 low bits of the EAX register. Clearing the 5 low bits forces the value to be a multiple of 32, that is, of 2 to the 5th power, and only then is the "jump" to that value performed.



When the verifier sees this instruction "pair", it can be sure the jump will only go to a multiple of 32 bytes. Then, to make sure there is no way to "jump" into some strange instruction, you apply an additional rule: during disassembly, as you view the instructions from left to right, you check that every address that is a multiple of 32 bytes is the beginning of a valid instruction, i.e. that no instruction straddles a 32-byte boundary.

Thus, in addition to the instrumentation, you verify that every address that is a multiple of 32 is the start of a correct instruction. By a correct, valid instruction, I mean one reached by the left-to-right disassembly.
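The arithmetic behind the pseudo-instruction is easy to check for yourself: the mask 0xffffffe0 (all high bits ones, five low bits zeros) forces any 32-bit value down to the nearest multiple of 32. A quick Python sanity check:

```python
# The mask from the NaCl pseudo-instruction: all high bits ones,
# five low bits zeros, so AND-ing forces a multiple of 32 (2**5).
MASK = 0xFFFFFFE0

def sandbox_target(addr):
    """What `and $0xffffffe0, %eax` leaves in the register."""
    return addr & MASK

assert sandbox_target(1237) == 1216   # rounded down to a 32-byte boundary
assert sandbox_target(1216) == 1216   # aligned targets pass through unchanged
assert all(sandbox_target(a) % 32 == 0 for a in range(4096))
```

Whatever the register held, after the mask the jump can only land on a 32-byte boundary, which the validator has already checked is an instruction start.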

Audience: why is the number 32 chosen?

Professor: yes, why did they choose 32, not 1000 or 5? Why is 5 a bad thing?

Audience: because the number must be a power of 2.

Professor: yes, that's one reason. Ensuring that something is a multiple of 5 would require additional instructions and overhead, while a power of two needs only a single AND mask. What about eight? Is eight a good enough number?

Audience: you may have instructions longer than eight bytes.

Professor: yes, the longest instructions allowed on the x86 platform are longer than that. If we had a 10-byte instruction and everything had to fit between multiples of 8, we could not place it anywhere. So the slot must be big enough for all cases; the biggest instruction I have seen was 15 bytes long, so 32 bytes is quite enough.

Also, the code sequences for entering and exiting the sandboxed environment have to fit usefully into these slots, so a slot should hold a nontrivial amount of code. Should it be much bigger, say 1024 bytes? If you have a lot of function pointers or a lot of indirect jumps, then every jump target must be padded out to the next boundary, whatever that boundary is. With 32 bytes that is a perfectly normal cost: in the worst case you lose only 31 bytes getting to the next boundary. But with a multiple of 1024 bytes you could waste almost a whole kilobyte per indirect jump target. With short functions or many function pointers, such a large alignment would cause significant memory overhead.
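The professor's cost argument can be made concrete. For an alignment of k bytes, padding a jump target to the next boundary wastes at most k - 1 bytes; the function offsets below are made-up numbers, purely for illustration:

```python
def pad_to(alignment, addr):
    """No-op padding bytes needed to reach the next multiple of `alignment`."""
    return (-addr) % alignment

# Worst case: a target one byte past a boundary.
assert pad_to(32, 33) == 31          # lose at most 31 bytes per jump target
assert pad_to(1024, 1025) == 1023    # lose up to a kilobyte per jump target

# A hypothetical module with several short functions ending at the
# (made-up) byte offsets below; each next function must start on a boundary.
func_ends = [40, 130, 200, 950]
waste_32 = sum(pad_to(32, e) for e in func_ends)      # 88 bytes total
waste_1k = sum(pad_to(1024, e) for e in func_ends)    # 2776 bytes total
assert waste_32 < waste_1k
```

With many short functions, the 1024-byte bundle wastes well over an order of magnitude more memory than the 32-byte one.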

I do not think the number 32 is set in stone for Native Client. The scheme could work with an alignment of 16, or 64, or 128 bytes; 32 bytes simply seemed to them the most acceptable, optimal value.

So this gives us a plan for reliable disassembly. As a result, the compiler has to be a bit careful when compiling C or C++ code into a Native Client binary and follow these rules.



Therefore, whenever it has an indirect jump, as shown on the top line, it must add the extra instruction shown on the bottom lines: regardless of which function it is about to "jump" to, it masks the target with and $0xffffffe0, %eax before jumping. And it cannot simply pad the gaps with zero bytes, because everything must decode as valid instructions. The padding must be valid code too, and fortunately x86 has no-op instructions of several sizes, including a single byte, so you can always pad out to the next boundary with no-ops.

So what does this guarantee us? It ensures that we always know, in terms of the instructions that will actually be executed, what is going to happen. That is what this rule gives us: confidence that a system call will not be made by accident. That covers jumps, but what about returns? Can we execute a ret in Native Client? What happens when a ret instruction runs?

Audience: it pops an address off the stack.

Professor: right, ret pops whatever is at the top of the stack and jumps there. And the stack that Native Client modules use is just data the module itself writes, so with a plain ret you cannot trust what is on the stack at all.

Audience: wait, but you can put anything on the stack, and then the ret acts like an indirect jump.

Professor: that's true. A return looks almost like an indirect jump through the memory location at the top of the stack. So one thing they could do for returns is add a prefix check, the same way as in the previous scheme: check the value at the top of the stack, verify it is valid, and only then return. But that seems a bit unreliable, because the data can change between the check and the use. For example, you look at the top of the stack and make sure everything is fine there, and then some other write in the same module modifies the top of the stack, after which the ret goes to the wrong address.

Audience: Isn’t this also true of jumping?

Professor: yes, so what happens with the jump? Can a race condition somehow invalidate that check?

Audience: but the code is not writable, right?

Professor: yes, the code cannot be written, that's true, so you cannot modify the AND itself. But couldn't some other thread change the target of the jump between those two instructions?

Audience: this is in the register, so ...

Professor: yes, this is the cool thing. If another thread modifies something in memory, in the location EAX is loaded from, it either does so before the load, in which case EAX momentarily holds a bad value but the AND then clears the bad bits, or it changes memory after the pointer is already in EAX, in which case it does not matter that the memory location EAX was loaded from has changed.

In fact, threads do not share register sets. If another thread changes its own EAX register, that does not affect this thread's EAX. Thus, other threads cannot invalidate this two-instruction sequence.

There is another interesting question: can we get around this AND somehow? For instance, what if a jump lands directly on the jmp itself, skipping the AND that precedes it?



In that case the register could hold an arbitrary, unmasked address, and the jmp would take us anywhere, including into the middle of an instruction.



The answer is that this cannot happen. Indirect jumps can only reach addresses that are multiples of 32, and Native Client guarantees that the AND-plus-jmp pair never straddles a 32-byte boundary, so the jmp half of the pair never sits at a 32-byte boundary itself and can never be a jump target on its own. Now suppose the register contains some arbitrary value, say 1237, which is not a multiple of 32. What happens to 1237?



The AND clears the five low bits of EAX, so 1237 becomes 1216, the start of its 32-byte block, and that is where the jump actually goes. You may end up at the beginning of a different instruction than you wanted, but it is always the beginning of some validated instruction. So can this ever take us into the middle of an instruction?

Audience: no, NaCl only lets you land on instruction boundaries it has checked.

Professor: exactly. The validator disassembles the x86 code from left to right, and NaCl forces every control transfer onto the instruction starts it saw, with indirect targets aligned to 32 bytes. So if someone says "aha, here is my chance!" and tries to reach the CD byte inside 25 CD 80 00 00, they simply cannot get there: those bytes can only ever be executed from the first byte, as the harmless AND. That is how they tame the x86 encoding.

So that is how Native Client handles indirect jumps. Together with the left-to-right disassembly and the checks on direct jumps, this gives the NaCl validator a reliable view of everything the module can execute. Any module that violates these rules is simply rejected.

Audience: what if the code masks one register but then jumps through a different one?



Professor: good question. The two halves of the pseudo-instruction must use the same register. Suppose the code clears the low bits of EAX but then jumps through EBX: the validator does not recognize that as a valid pseudo-instruction, because masking EAX tells you nothing about EBX, and it rejects the module. What it accepts is the exact pair, the AND on a register immediately followed by the jump through that same register, for example jmp *%eax right after the AND on EAX. On 64-bit the scheme is somewhat different.



As for the cost of all this, the overhead is small. On Intel processors the AND on a register is a very cheap operation, so masking EAX before each indirect "jump" adds little to the running time.

So those are the mechanisms. To summarize, the validator enforces a short list of rules; the paper states them as constraints C1 through C7, and it is worth walking through a few of them.

C1 says that once the module is loaded into memory, its code is not writable. Otherwise the module could rewrite already-validated instructions after the check, and validation would be meaningless.

C2 says that the binary is statically linked, at a start address of zero, with the code itself beginning at 64K. Keeping the first 64-kilobyte page unmapped means that null-pointer dereferences trap instead of silently touching something.

C3 says that all indirect control transfers use the two-instruction pseudo-instruction we just discussed, so every computed jump is masked before it happens.

C4 says that the binary is padded up to the end of its last page with at least one hlt instruction. Why halt? Think about what should happen if execution simply runs off the end of the code without jumping anywhere.
They just want to make sure there is no ambiguity about what happens if you do not jump anywhere but simply fall through to the end of the instruction stream. The meaning of this rule is that if execution continues that far, it hits the hlt, traps, and the module is terminated. So this rule is about simplicity and safety.




Continued:

MIT course "Computer Systems Security". Lecture 7: "Native Client Sandbox", part 3




Source: https://habr.com/ru/post/418225/

