ld -z separate-code

This article is about a small security feature that was added to GNU ld in release 2.30 (January 2018). In Russian, this improvement was mentioned on opennet with the following note:

the "-z separate-code" mode, which increases the security of executable files at the cost of a slight increase in size and memory consumption

Let's figure out what this means. To explain which security problem is involved and how it is solved, let's start with what exploits of binary vulnerabilities have in common.

Control-flow hijacking in exploits

By feeding data to a program, an attacker can manipulate it through various vulnerabilities: writes to an index beyond the bounds of an array, unsafe string copying, use of objects after they have been freed. Such errors are typical of C and C++ code and, given certain input data, lead to memory corruption.

Memory corruption vulnerabilities

CWE-20: Improper Input Validation
CWE-118: Incorrect Access of Indexable Resource ('Range Error')
CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer
CWE-120: Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')
CWE-121: Stack-based Buffer Overflow
CWE-122: Heap-based Buffer Overflow
CWE-123: Write-what-where Condition
CWE-124: Buffer Underwrite ('Buffer Underflow')
CWE-125: Out-of-bounds Read
CWE-126: Buffer Over-read
CWE-127: Buffer Under-read
CWE-128: Wrap-around Error
CWE-129: Improper Validation of Array Index
CWE-130: Improper Handling of Length Parameter Inconsistency
CWE-131: Incorrect Calculation of Buffer Size
CWE-134: Use of Externally-Controlled Format String
CWE-135: Incorrect Calculation of Multi-Byte String Length
CWE-170: Improper Null Termination
CWE-190: Integer Overflow or Wraparound
CWE-415: Double Free
CWE-416: Use After Free
CWE-476: NULL Pointer Dereference
CWE-787: Out-of-bounds Write
CWE-824: Access of Uninitialized Pointer
...

A classic element of exploits for such memory corruption vulnerabilities is overwriting a pointer in memory. The program later uses that pointer to transfer control elsewhere: to call a class method or a function from another module, or to return from a function. Because the pointer has been overwritten, the attacker hijacks control, that is, code the attacker prepared gets executed. If you are interested in the variations and details of these techniques, we recommend reading the document.

This common step of such exploits is well known, and obstacles have long been placed in the attacker's way here:

Checking pointer integrity before transferring control: stack cookies, Control Flow Guard, pointer authentication
Randomizing the addresses of code and data segments: address space layout randomization (ASLR)
Preventing code execution outside code segments: executable space protection
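The pointer overwrite described above, and the stack cookie from the first item, can be illustrated with a toy model. This is a sketch under stated assumptions, not real process memory: a Python bytearray stands in for a stack frame laid out as [16-byte buffer][optional 8-byte canary][8-byte saved return address], and the names make_frame, unchecked_copy, canary_intact and function_return are hypothetical.

```python
import os
from typing import Optional

BUF_SIZE = 16  # hypothetical size of a local buffer, in bytes
PTR_SIZE = 8   # size of a saved return address on a 64-bit target


def make_frame(ret_addr: int, canary: Optional[bytes] = None) -> bytearray:
    """Toy frame: [buffer][canary, if any][saved return address], in address order."""
    frame = bytearray(BUF_SIZE)
    if canary is not None:
        frame += canary
    frame += ret_addr.to_bytes(PTR_SIZE, "little")
    return frame


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the buffer with no bounds check (like a misused strcpy).

    In this toy the data must not be longer than the whole frame.
    """
    frame[0:len(data)] = data


def canary_intact(frame: bytearray, canary: bytes) -> bool:
    """Stack-cookie check that would run before the function returns."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + len(canary)]) == canary


def function_return(frame: bytearray) -> int:
    """Address a 'ret' would jump to: the last PTR_SIZE bytes of the frame."""
    return int.from_bytes(frame[-PTR_SIZE:], "little")


if __name__ == "__main__":
    evil = (0xDEADBEEF).to_bytes(PTR_SIZE, "little")

    # Without a canary, 16 filler bytes plus 8 address bytes replace the return address.
    frame = make_frame(0x401136)
    unchecked_copy(frame, b"A" * BUF_SIZE + evil)
    print(hex(function_return(frame)))  # control would go to the attacker's address

    # With a random canary, the same overflow has to trample the canary first.
    canary = os.urandom(PTR_SIZE)
    frame = make_frame(0x401136, canary)
    unchecked_copy(frame, b"A" * (BUF_SIZE + PTR_SIZE) + evil)
    print("canary intact:", canary_intact(frame, canary))  # a real program would abort here
```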

Below we focus on the last of these protections, executable space protection.

Executable space protection

A program's memory is not uniform: it is divided into segments with different access rights for reading, writing and execution. This is made possible by the processor's ability to mark memory pages with access-right flags in the page tables. The protection rests on a strict separation of code and data: data received from an attacker during processing should be placed in non-executable segments (stack, heap), and the program's own code in separate, unmodifiable segments. This should make it impossible for an attacker to place extraneous code in memory and execute it.

To get around the ban on executing code in data segments, attackers use code reuse techniques: control is transferred to code fragments (hereinafter, gadgets) that already reside on executable pages. These techniques come in several levels of complexity, in ascending order:

passing control to a single function that already does everything the attacker needs, such as system() with a controlled argument to run arbitrary shell commands (ret2libc)
transferring control to a function or a chain of gadgets that disables the protection or makes part of memory executable (for example, by calling mprotect()), and then executing arbitrary code
performing all desired actions using a long chain of gadgets

Thus, the attacker faces the task of reusing a certain amount of existing code. If anything more complex than a return into a single function is required, a chain of gadgets has to be built. There are tools for searching executable segments for gadgets, such as ropper and ROPgadget.
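The core idea of a gadget search fits in a few lines. This is a toy under stated assumptions, not how ropper or ROPgadget are implemented: it only looks for the single-byte x86 ret opcode 0xC3 and returns every short byte window ending in it, without disassembling the candidates as the real tools do. The function name find_ret_gadgets is hypothetical.

```python
from typing import List, Tuple

RET = 0xC3  # x86 / x86-64 near return ("ret") opcode


def find_ret_gadgets(blob: bytes, base: int = 0, max_len: int = 4) -> List[Tuple[int, bytes]]:
    """Return (address, bytes) for each window of 1..max_len bytes that ends with a ret.

    Real gadget finders also decode each window and keep only valid instruction
    sequences; this sketch skips that step.
    """
    gadgets = []
    for i, byte in enumerate(blob):
        if byte != RET:
            continue
        for start in range(max(0, i - max_len + 1), i + 1):
            gadgets.append((base + start, blob[start:i + 1]))
    return gadgets


if __name__ == "__main__":
    # 0x90 = nop, 0x58 = pop eax (pop rax on x86-64), 0xC3 = ret
    for addr, raw in find_ret_gadgets(b"\x90\x58\xc3", base=0x1000, max_len=2):
        print(hex(addr), raw.hex())
```

Data bytes qualify just as well as code: the UTF-8 encoding of "café" is 63 61 66 c3 a9, so any message string containing that word already holds a ret at offset 3.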

The READ_IMPLIES_EXEC hole

However, data memory is sometimes executable, which plainly violates the separation of code and data described above. In such cases the attacker does not need to search for gadgets or functions to reuse at all. One interesting find of this kind was an executable stack and executable data segments on a certain "industrial firewall".

Listing of /proc/$pid/maps:

    00008000-00009000 r-xp 00000000 08:01 10   /var/flash/dmt/nx_test/a.out
    00010000-00011000 rwxp 00000000 08:01 10   /var/flash/dmt/nx_test/a.out
    00011000-00032000 rwxp 00000000 00:00 0    [heap]
    40000000-4001f000 r-xp 00000000 1f:02 429  /lib/ld-linux.so.2
    4001f000-40022000 rwxp 00000000 00:00 0
    40027000-40028000 r-xp 0001f000 1f:02 429  /lib/ld-linux.so.2
    40028000-40029000 rwxp 00020000 1f:02 429  /lib/ld-linux.so.2
    4002c000-40172000 r-xp 00000000 1f:02 430  /lib/libc.so.6
    40172000-40179000 ---p 00146000 1f:02 430  /lib/libc.so.6
    40179000-4017b000 r-xp 00145000 1f:02 430  /lib/libc.so.6
    4017b000-4017c000 rwxp 00147000 1f:02 430  /lib/libc.so.6
    4017c000-40b80000 rwxp 00000000 00:00 0
    be8c2000-be8d7000 rwxp 00000000 00:00 0    [stack]

This is the memory map of a test utility's process. Each row of the table is a memory area. First, look at the rightmost column: it describes the contents of the area (code or data segments of the program itself or of its libraries) or its type (heap, stack). On the left come the address range the area occupies and then the access flags: r (read), w (write), x (execute). These flags determine how the system responds to attempts to read, write or execute memory at those addresses: an access that violates them raises an exception.

Note that almost all of the process's memory is executable: the stack, the heap, and all data segments. This is a problem. rwx pages obviously make an attacker's life easier: whatever data (packets, files) the attacker sends such a program for processing lands in executable memory, so injected code can be run wherever it ends up.
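The check we just did by eye can be automated. The sketch below, an illustration rather than a hardened tool, parses lines in the /proc/$pid/maps format shown above and reports the regions that are both writable and executable; parse_maps and wx_regions are hypothetical names.

```python
import os
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str  # e.g. "rwxp": read, write, execute, then p(rivate) or s(hared)
    name: str   # path, [heap], [stack], or "" for anonymous memory


def parse_maps(text: str) -> List[Region]:
    """Parse /proc/<pid>/maps text into Region records."""
    regions = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) == 6 else ""
        regions.append(Region(start, end, fields[1], name))
    return regions


def wx_regions(regions: List[Region]) -> List[Region]:
    """Regions an attacker could both write code into and execute it from."""
    return [r for r in regions if "w" in r.perms and "x" in r.perms]


if __name__ == "__main__":
    if os.path.exists("/proc/self/maps"):  # Linux only
        with open("/proc/self/maps") as f:
            for r in wx_regions(parse_maps(f.read())):
                print(f"{r.start:08x}-{r.end:08x} {r.perms} {r.name}")
```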

How could this happen on a modern device whose hardware supports non-executable data pages, a device on which the security of corporate and industrial networks depends, when both the problem and its solution have been known for a long time?

This picture results from the kernel's behavior during process initialization (allocating the stack and heap, loading the main ELF, etc.) and while handling the system calls the process makes. The key attribute here is the READ_IMPLIES_EXEC personality flag: when it is set, any readable memory also becomes executable. The flag can end up set on your process for several reasons:

It can be requested explicitly by legacy software through the executable-stack marking in the ELF file (the PT_GNU_STACK program header), in order to implement a rather interesting mechanism: trampolines on the stack (1, 2, 3)!
It is inherited by child processes from their parent.
It can be set by the kernel itself for all processes! First, if the architecture does not support non-executable memory. Second, just in case, to support some even older workarounds. This code is present in kernel 2.6.32 (ARM), which had a very long lifespan. This was our case.
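The effect of the flag can be modelled in a few lines. This is a simplified sketch: READ_IMPLIES_EXEC = 0x0400000 is the value from the kernel header linux/personality.h, /proc/<pid>/personality exposes a process's personality as a hex number, and effective_perms mimics the kernel adding execute permission to readable mappings while ignoring the special cases the real kernel handles. The helper names are hypothetical.

```python
import os

READ_IMPLIES_EXEC = 0x0400000  # value from <linux/personality.h>


def read_personality(pid: str = "self") -> int:
    """Read a process's personality word (Linux only; the kernel prints it in hex)."""
    with open(f"/proc/{pid}/personality") as f:
        return int(f.read().strip(), 16)


def has_read_implies_exec(persona: int) -> bool:
    return bool(persona & READ_IMPLIES_EXEC)


def effective_perms(requested: str, persona: int) -> str:
    """Simplified model: with READ_IMPLIES_EXEC, a readable mapping also becomes executable.

    `requested` is a 3-character string such as "rw-".
    """
    r, w, x = requested
    if r == "r" and has_read_implies_exec(persona):
        x = "x"
    return r + w + x


if __name__ == "__main__":
    if os.path.exists("/proc/self/personality"):  # Linux only
        persona = read_personality()
        print(f"personality=0x{persona:08x}, READ_IMPLIES_EXEC: {has_read_implies_exec(persona)}")
        print("a heap page requested as rw- ends up", effective_perms("rw-", persona))
```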

The gadget search space in an ELF image

Shared libraries and executable files are stored in the ELF format. The gcc compiler translates language constructs into machine code and places it in one section, and the data this code operates on in other sections. There are many sections, and the ld linker groups them into segments. An ELF file thus holds the program image in two representations: a section table and a segment table.

    $ readelf -l /bin/ls

    Elf file type is EXEC (Executable file)
    Entry point 0x804bee9
    There are 9 program headers, starting at offset 52

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      PHDR           0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4
      INTERP         0x000154 0x08048154 0x08048154 0x00013 0x00013 R   0x1
          [Requesting program interpreter: /lib/ld-linux.so.2]
      LOAD           0x000000 0x08048000 0x08048000 0x1e40c 0x1e40c R E 0x1000
      LOAD           0x01ef00 0x08067f00 0x08067f00 0x00444 0x01078 RW  0x1000
      DYNAMIC        0x01ef0c 0x08067f0c 0x08067f0c 0x000f0 0x000f0 RW  0x4
      NOTE           0x000168 0x08048168 0x08048168 0x00044 0x00044 R   0x4
      GNU_EH_FRAME   0x018b74 0x08060b74 0x08060b74 0x00814 0x00814 R   0x4
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
      GNU_RELRO      0x01ef00 0x08067f00 0x08067f00 0x00100 0x00100 R   0x1

     Section to Segment mapping:
      Segment Sections...
       00
       01     .interp
       02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
       03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
       04     .dynamic
       05     .note.ABI-tag .note.gnu.build-id
       06     .eh_frame_hdr
       07
       08     .init_array .fini_array .jcr .dynamic .got

Here you can see how the sections of the ELF image are mapped onto segments.

The section table is used by tools that analyze programs and libraries, but loaders do not use it to map the ELF into process memory. The section table describes the structure of the ELF in more detail than the segment table; several sections can reside inside one segment.
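The many-to-one relationship between sections and segments can be shown with a simplified containment test, similar in spirit to what readelf does for its "Section to Segment mapping" (readelf also compares file offsets, which this sketch does not). The two segments are the PT_LOAD entries (VirtAddr, MemSiz) of the /bin/ls listing above; the section addresses and sizes are hypothetical values made up for illustration, and map_sections is a hypothetical name.

```python
from typing import Dict, List, Tuple


def map_sections(sections: Dict[str, Tuple[int, int]],
                 segments: List[Tuple[int, int]]) -> List[List[str]]:
    """For each segment (vaddr, memsz), list the sections (addr, size) lying entirely inside it."""
    mapping = []
    for seg_addr, seg_size in segments:
        seg_end = seg_addr + seg_size
        mapping.append([name for name, (addr, size) in sections.items()
                        if size > 0 and seg_addr <= addr and addr + size <= seg_end])
    return mapping


# PT_LOAD entries from the readelf listing above: (VirtAddr, MemSiz)
LOAD_SEGMENTS = [(0x08048000, 0x1E40C), (0x08067F00, 0x01078)]

# Hypothetical section addresses and sizes, for illustration only
SECTIONS = {
    ".text": (0x08049000, 0x10000),
    ".rodata": (0x0805A000, 0x4000),
    ".data": (0x08068100, 0x200),
    ".bss": (0x08068400, 0x800),
}

if __name__ == "__main__":
    for index, names in enumerate(map_sections(SECTIONS, LOAD_SEGMENTS)):
        print(f"LOAD #{index}: {' '.join(names)}")
```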

ELF loaders build the in-memory image of an ELF solely from the contents of the segment table; the section table plays no part in loading.

But there are exceptions to this rule.

For example, there is a patch in the wild by Debian developers to the ELF loader ld.so for the ARM architecture that looks for a special section ".ARM.attributes" of type SHT_ARM_ATTRIBUTES; on such a system, binaries whose section table has been stripped do not load...

Each ELF segment has flags that determine the access rights the segment will have in memory. Traditionally, most GNU/Linux software was built with two PT_LOAD (loadable) segments declared in the segment table, as in the listing above:

1. Segment with RE flags

1.1. Executable code: sections .init, .plt, .text, .fini

1.2. Read-only data: sections .dynsym, .rodata

2. Segment with RW flags

2.1. Mutable data: sections .got, .got.plt, .data, .bss
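The segment table that loaders rely on is simple enough to read by hand. Below is a minimal parser sketch using only the standard library; it follows the ELF32/ELF64 header and program header layouts from the ELF specification but skips the validation a real tool would do. The names parse_phdrs, flags_str and Phdr are hypothetical.

```python
import struct
from typing import List, NamedTuple

PT_LOAD = 1
PT_GNU_STACK = 0x6474E551
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


class Phdr(NamedTuple):
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_filesz: int
    p_memsz: int


def parse_phdrs(data: bytes) -> List[Phdr]:
    """Return the program headers (segment table) of an ELF image given as bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                   # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    order = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    if is64:
        (phoff,) = struct.unpack_from(order + "Q", data, 0x20)
        phentsize, phnum = struct.unpack_from(order + "HH", data, 0x36)
    else:
        (phoff,) = struct.unpack_from(order + "I", data, 0x1C)
        phentsize, phnum = struct.unpack_from(order + "HH", data, 0x2A)

    phdrs = []
    for i in range(phnum):
        off = phoff + i * phentsize
        if is64:  # Elf64_Phdr: p_flags comes right after p_type
            (p_type, p_flags, p_offset, p_vaddr, _paddr,
             p_filesz, p_memsz, _align) = struct.unpack_from(order + "IIQQQQQQ", data, off)
        else:     # Elf32_Phdr: p_flags comes after p_memsz
            (p_type, p_offset, p_vaddr, _paddr,
             p_filesz, p_memsz, p_flags, _align) = struct.unpack_from(order + "8I", data, off)
        phdrs.append(Phdr(p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz))
    return phdrs


def flags_str(flags: int) -> str:
    """Render p_flags the way readelf does, e.g. "R E" or "RW "."""
    return (("R" if flags & PF_R else " ")
            + ("W" if flags & PF_W else " ")
            + ("E" if flags & PF_X else " "))
```

On Linux, calling parse_phdrs on the bytes of /bin/ls and printing flags_str(p.p_flags) for every PT_LOAD entry gives the same Flg column that readelf -l shows.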

Looking at the contents of the first segment and its access flags, it becomes clear that this layout enlarges the gadget search space for code reuse techniques. In large ELF files such as libcrypto, service tables and other read-only data can take up to 40% of the executable segment. That this data contains byte sequences resembling code is confirmed by disassembling such binaries, with a lot of data in the executable segment, when no section tables and symbols are available. Any byte sequence in this single executable segment can be treated as a code fragment or trampoline useful to an attacker, whether it is a piece of the program's debug message, part of a function name in the symbol table, or a cryptographic constant...
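To get a feel for that share, here is a back-of-the-envelope calculation. It uses ls rather than libcrypto, and it assumes that without -z separate-code the first and third PT_LOAD segments of the ls binary built with separate code (its readelf listing appears later in this article: ELF headers and tables, then .rodata and .eh_frame) would all have shared the executable segment; page padding is ignored. non_code_fraction is a hypothetical name.

```python
from typing import Iterable


def non_code_fraction(code_size: int, other_sizes: Iterable[int]) -> float:
    """Share of a merged R+X segment that would consist of non-code bytes."""
    non_code = sum(other_sizes)
    return non_code / (non_code + code_size)


# FileSiz values of the PT_LOAD segments in the `readelf -l ls` listing
# of the binary built with -z separate-code (shown later in this article)
HEADERS_AND_TABLES = 0x01E6C  # R   : .interp ... .rel.plt
CODE = 0x14BD8                # R E : .init .plt .plt.got .text .fini
RODATA = 0x0BF80              # R   : .rodata .eh_frame_hdr .eh_frame

if __name__ == "__main__":
    share = non_code_fraction(CODE, [HEADERS_AND_TABLES, RODATA])
    print(f"non-code share of a merged executable segment: {share:.1%}")  # about 40%
```

For this ls build, about 56 KB of headers, tables and read-only data out of roughly 142 KB would have been executable, close to the figure quoted for libcrypto; the exact share differs from binary to binary.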

Executable PE headers

Executable headers and tables at the start of the first segment of an ELF image resemble the situation on Windows some 15 years ago. A number of viruses infected files by writing their code into the PE header, which was executable there as well. I managed to dig up such a sample in the archive:

Virus.Win32.Haless.1127

As you can see, the virus body is squeezed in right after the section table, in the PE header area. When the file is mapped into virtual memory, this area usually has about 3 KB of free space. After the virus body comes empty space, and then the first section with the program code begins.

For Linux, however, the VX scene produced much more interesting work: Retaliation.

Solution

The problem described above has been known for a long time.
Fixed on January 12, 2018: the -z separate-code option was added to ld. It makes ld create a separate code PT_LOAD segment in the output object, and -z noseparate-code turns this off. The feature made it into release 2.30.
The feature was then enabled by default in the next release, 2.31, for Linux/x86 targets.
It is present in recent binutils packages, for example in the Ubuntu 18.10 repositories. Many packages have already been built with this feature, which ElfMaster ran into and documented.

With the changed layout algorithm, the ELF image looks like this:

    $ readelf -l ls

    Elf file type is DYN (Shared object file)
    Entry point 0x41aa
    There are 11 program headers, starting at offset 52

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      PHDR           0x000034 0x00000034 0x00000034 0x00160 0x00160 R   0x4
      INTERP         0x000194 0x00000194 0x00000194 0x00013 0x00013 R   0x1
          [Requesting program interpreter: /lib/ld-linux.so.2]
      LOAD           0x000000 0x00000000 0x00000000 0x01e6c 0x01e6c R   0x1000
      LOAD           0x002000 0x00002000 0x00002000 0x14bd8 0x14bd8 R E 0x1000
      LOAD           0x017000 0x00017000 0x00017000 0x0bf80 0x0bf80 R   0x1000
      LOAD           0x0237f8 0x000247f8 0x000247f8 0x0096c 0x01afc RW  0x1000
      DYNAMIC        0x023cec 0x00024cec 0x00024cec 0x00100 0x00100 RW  0x4
      NOTE           0x0001a8 0x000001a8 0x000001a8 0x00044 0x00044 R   0x4
      GNU_EH_FRAME   0x01c3f8 0x0001c3f8 0x0001c3f8 0x0092c 0x0092c R   0x4
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
      GNU_RELRO      0x0237f8 0x000247f8 0x000247f8 0x00808 0x00808 R   0x1

     Section to Segment mapping:
      Segment Sections...
       00
       01     .interp
       02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt
       03     .init .plt .plt.got .text .fini
       04     .rodata .eh_frame_hdr .eh_frame
       05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
       06     .dynamic
       07     .note.ABI-tag .note.gnu.build-id
       08     .eh_frame_hdr
       09
       10     .init_array .fini_array .data.rel.ro .dynamic .got

The boundary between code and data is now much sharper. The only executable segment contains nothing but code sections: .init, .plt, .plt.got, .text, .fini.
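The property described here can be stated as a check over the PT_LOAD segments: every executable segment must contain only code sections. A minimal sketch follows. As a simplifying assumption it classifies sections by name, using the code sections from the listing above, whereas a real check would look at the SHF_EXECINSTR flag in the section table; the function name is hypothetical.

```python
from typing import Iterable, Tuple

# Assumption: the code sections seen in the listings of this article.
CODE_SECTIONS = {".init", ".plt", ".plt.got", ".text", ".fini"}


def executable_segments_hold_only_code(load_segments: Iterable[Tuple[str, str]]) -> bool:
    """load_segments: (readelf Flg column, space-separated section names) for each PT_LOAD."""
    for flags, sections in load_segments:
        if "E" in flags and not set(sections.split()) <= CODE_SECTIONS:
            return False
    return True


# PT_LOAD segments from the two `readelf -l` listings in this article
OLD_LAYOUT = [
    ("R E", ".interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr "
            ".gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .plt.got "
            ".text .fini .rodata .eh_frame_hdr .eh_frame"),
    ("RW ", ".init_array .fini_array .jcr .dynamic .got .got.plt .data .bss"),
]
NEW_LAYOUT = [
    ("R  ", ".interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr "
            ".gnu.version .gnu.version_r .rel.dyn .rel.plt"),
    ("R E", ".init .plt .plt.got .text .fini"),
    ("R  ", ".rodata .eh_frame_hdr .eh_frame"),
    ("RW ", ".init_array .fini_array .data.rel.ro .dynamic .got .data .bss"),
]

if __name__ == "__main__":
    print("old layout separated:", executable_segments_hold_only_code(OLD_LAYOUT))
    print("new layout separated:", executable_segments_hold_only_code(NEW_LAYOUT))
```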

What exactly changed inside ld?

The layout of the output ELF file is described by a linker script. You can view the default script like this:

    $ ld --verbose
    GNU ld (GNU Binutils for Ubuntu) 2.26.1
      * * *
    using internal linker script:
    ==================================================
    /* Script for -z combreloc: combine and sort reloc sections */
    /* Copyright (C) 2014-2015 Free Software Foundation, Inc.
      * * *
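If you want the default script as a file, for example to diff it against another one, you can cut it out of the ld --verbose output programmatically. A small sketch, assuming that ld prints the script between two separator lines made of = characters, as in the output above; the helper names are hypothetical.

```python
import shutil
import subprocess


def extract_internal_script(verbose_output: str) -> str:
    """Return the text between the first two separator lines of `ld --verbose` output."""
    lines = verbose_output.splitlines()
    marks = [i for i, line in enumerate(lines) if line.startswith("=====")]
    if len(marks) < 2:
        raise ValueError("no internal linker script found")
    return "\n".join(lines[marks[0] + 1:marks[1]])


def default_linker_script(*ld_options: str, ld: str = "ld") -> str:
    """Run the linker with the given options plus --verbose and return its built-in script."""
    result = subprocess.run([ld, *ld_options, "--verbose"],
                            capture_output=True, text=True, check=True)
    return extract_internal_script(result.stdout)


if __name__ == "__main__":
    if shutil.which("ld") is None:
        print("GNU ld not found on PATH")
    else:
        print(default_linker_script()[:200])
```

Passing options, e.g. default_linker_script("-z", "separate-code"), should make ld print the script it selects for them; treat that as an assumption about the installed binutils version.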

Many other scripts for different platforms and combinations of options are located in the ldscripts directory. New scripts have been created for the separate-code option.

    $ diff elf_x86_64.x elf_x86_64.xe
    1c1
    < /* Default linker script, for normal executables */
    ---
    > /* Script for -z separate-code: generate normal executables with separate code segment */
    46a47
    >   . = ALIGN(CONSTANT (MAXPAGESIZE));
    70a72,75
    >   . = ALIGN(CONSTANT (MAXPAGESIZE));
    >   /* Adjust the address for the rodata segment. We want to adjust up to
    >      the same address within the page on the next page up. */
    >   . = SEGMENT_START("rodata-segment", ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)));

Here you can see that directives were added that align the location counter to MAXPAGESIZE and start a new segment for the read-only sections that follow the code segment.
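The address arithmetic of these statements can be checked against the readelf listing of ls shown earlier. A sketch, assuming MAXPAGESIZE is 0x1000 (the Align value of the PT_LOAD segments there; the real value depends on the target) and that no -Trodata-segment option is given, so SEGMENT_START simply returns its second argument. align_up mirrors ld's ALIGN() for power-of-two alignments; all function names are hypothetical.

```python
def align_up(addr: int, alignment: int) -> int:
    """ld's ALIGN(alignment) for a power-of-two alignment."""
    return (addr + alignment - 1) & ~(alignment - 1)


def next_page_same_offset(dot: int, maxpagesize: int) -> int:
    """ALIGN(MAXPAGESIZE) + (. & (MAXPAGESIZE - 1)): the same offset within the next page up."""
    return align_up(dot, maxpagesize) + (dot & (maxpagesize - 1))


def rodata_segment_start(dot: int, maxpagesize: int) -> int:
    """The statements added after the code: align '.', then apply the SEGMENT_START expression."""
    dot = align_up(dot, maxpagesize)  # . = ALIGN(CONSTANT (MAXPAGESIZE));
    return next_page_same_offset(dot, maxpagesize)


MAXPAGESIZE = 0x1000  # assumption, see above

if __name__ == "__main__":
    end_of_headers = 0x01E6C          # end of the first (R) PT_LOAD segment
    end_of_code = 0x2000 + 0x14BD8    # VirtAddr + MemSiz of the R E segment
    print(hex(align_up(end_of_headers, MAXPAGESIZE)))           # 0x2000: start of the R E segment
    print(hex(rodata_segment_start(end_of_code, MAXPAGESIZE)))  # 0x17000: start of the .rodata segment
```

Both results match the VirtAddr values in the listing. Without the preceding ALIGN, the SEGMENT_START expression alone would keep the offset within the page (0x16bd8 becomes 0x17bd8), which is what the comment in the script describes.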

Besides the scripts, the linker's source code was changed too, namely the _bfd_elf_map_sections_to_segments function (see the commit). Now, when sections are assigned to segments, a new segment is started whenever a section's SEC_CODE flag differs from that of the previous section.
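The new rule can be modelled as a toy. This is not the real _bfd_elf_map_sections_to_segments, which weighs many more conditions (writability, page boundaries, TLS and others); the sketch only shows how splitting on a change of the SEC_CODE flag turns one read-only run of sections into the three segments seen in the readelf listing of ls above. The section list is abridged from that listing, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def group_into_segments(sections: List[Tuple[str, bool]], separate_code: bool) -> List[List[str]]:
    """Group consecutive read-only sections (name, is_code) into segments.

    Without separate_code everything shares one segment; with it, a new
    segment starts whenever is_code differs from the previous section's.
    """
    segments: List[List[str]] = []
    prev_is_code: Optional[bool] = None
    for name, is_code in sections:
        if not segments or (separate_code and is_code != prev_is_code):
            segments.append([])
        segments[-1].append(name)
        prev_is_code = is_code
    return segments


READ_ONLY_SECTIONS = [  # abridged from the `readelf -l ls` listing above
    (".interp", False), (".dynsym", False), (".rel.plt", False),
    (".init", True), (".plt", True), (".text", True), (".fini", True),
    (".rodata", False), (".eh_frame", False),
]

if __name__ == "__main__":
    print(group_into_segments(READ_ONLY_SECTIONS, separate_code=False))  # one segment
    print(group_into_segments(READ_ONLY_SECTIONS, separate_code=True))   # three segments
```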

Conclusion

As before, we recommend that developers remember to use the security flags built into the compiler and linker. Even a change as small as this one makes an attacker's life harder and yours much calmer.

Source: https://habr.com/ru/post/433108/
