The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine. Some of the details map precisely to the machine, but some do not. This is because the compiler suite needs no assembler pass in the usual pipeline. Instead, the compiler operates on a kind of semi-abstract instruction set, and instruction selection occurs partly after code generation. The assembler works on the semi-abstract form, so when you see an instruction like MOV, what the toolchain actually generates for that operation might not be a move instruction at all; it might be a clear or a load. Or it might correspond exactly to the machine instruction with that name. In general, machine-specific operations tend to appear as themselves, while more general concepts like memory move and subroutine call and return are more abstract. The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
Consider the following Go code (direct_topfunc_call.go):

    package main

    //go:noinline
    func add(a, b int32) (int32, bool) { return a + b, true }

    func main() { add(10, 32) }

(Note the compiler directive //go:noinline ... Be careful.)

Compiling it down to assembly:

    $ GOOS=linux GOARCH=amd64 go tool compile -S direct_topfunc_call.go

    0x0000 TEXT      "".add(SB), NOSPLIT, $0-16
      0x0000 FUNCDATA  $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
      0x0000 FUNCDATA  $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
      0x0000 MOVL      "".b+12(SP), AX
      0x0004 MOVL      "".a+8(SP), CX
      0x0008 ADDL      CX, AX
      0x000a MOVL      AX, "".~r2+16(SP)
      0x000e MOVB      $1, "".~r3+20(SP)
      0x0013 RET

    0x0000 TEXT      "".main(SB), $24-0
      ;; ...omitted stack-split prologue...
      0x000f SUBQ      $24, SP
      0x0013 MOVQ      BP, 16(SP)
      0x0018 LEAQ      16(SP), BP
      0x001d FUNCDATA  $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
      0x001d FUNCDATA  $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
      0x001d MOVQ      $137438953482, AX
      0x0027 MOVQ      AX, (SP)
      0x002b PCDATA    $0, $0
      0x002b CALL      "".add(SB)
      0x0030 MOVQ      16(SP), BP
      0x0035 ADDQ      $24, SP
      0x0039 RET
      ;; ...omitted stack-split epilogue...
The add function

    0x0000 TEXT "".add(SB), NOSPLIT, $0-16

0x0000: Offset of the current instruction, relative to the start of the function.

TEXT "".add: The TEXT directive declares the "".add symbol as part of the .text section (that is, executable code) and indicates that the instructions that follow are the body of the function. The empty string "" is replaced with the name of the current package at link time: for example, "".add becomes main.add once linked into the final binary.

(SB): SB is a virtual register holding the "static-base" pointer, that is, the address of the beginning of the program's address space. "".add(SB) declares that our symbol is located at a constant offset from the beginning of the address space. In other words, it has an absolute, direct address: it is a global function symbol. objdump confirms this:

    $ objdump -j .text -t direct_topfunc_call | grep 'main.add'
    000000000044d980 g     F .text  000000000000000f main.add

All user-defined symbols are written as offsets to the pseudo-registers FP (arguments and locals) and SB (globals). The SB pseudo-register can be thought of as the origin of memory, so the symbol foo(SB) is the name foo as an address in memory.
NOSPLIT: Tells the compiler that it should NOT insert the stack-split preamble, which checks whether the current stack needs to be grown. For our add function, the compiler set this flag itself: it is smart enough to realize that, since add has no local variables and no stack frame of its own, it simply cannot outgrow the current stack. Running these checks on every call would therefore be a waste of CPU cycles.

"NOSPLIT": Don't insert the preamble to check if the stack must be split. The frame for the routine, plus anything it calls, must fit in the spare space at the top of the stack segment. Used to protect routines such as the stack splitting code itself.

At the end of the article we will say a few words about goroutines and stack splits.
$0-16: $0 is the size, in bytes, of the stack frame that will be allocated, while $16 is the size of the arguments passed in by the caller.

In the general case, the frame size is followed by an argument size, separated by a minus sign. (It's not a subtraction, just idiosyncratic syntax.) The frame size $24-8 states that the function has a 24-byte frame and is called with 8 bytes of argument, which live on the caller's frame. If NOSPLIT is not specified for the TEXT, the argument size must be provided. For assembly functions with Go prototypes, go vet will check that the argument size is correct.
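The 16 in $0-16 can be cross-checked by modelling the argument area of add as a Go struct. The sketch below is only an illustration: the argFrame type, its field names and the two helper functions are made up for this example, and it assumes the stack-based layout shown in the listing, where arguments and results sit one after another like struct fields.

```go
package main

import (
	"fmt"
	"unsafe"
)

// argFrame is a hypothetical model of the argument/result area that the
// caller reserves for add(a, b int32) (int32, bool). It is not a runtime type.
type argFrame struct {
	a  int32 // first argument
	b  int32 // second argument
	r2 int32 // first result  (shown as "".~r2 in the listing)
	r3 bool  // second result (shown as "".~r3 in the listing)
}

// argSize returns the size of the modelled area, trailing padding included.
func argSize() uintptr {
	return unsafe.Sizeof(argFrame{})
}

// offsets returns the byte offsets of a, b, r2 and r3 within the area.
func offsets() [4]uintptr {
	var f argFrame
	return [4]uintptr{
		unsafe.Offsetof(f.a),
		unsafe.Offsetof(f.b),
		unsafe.Offsetof(f.r2),
		unsafe.Offsetof(f.r3),
	}
}

func main() {
	// 4 + 4 + 4 + 1 bytes of data, padded up to 16.
	fmt.Println("argument area size:", argSize())
	fmt.Println("field offsets:", offsets())
}
```

Adding the 8-byte return address that sits at 0(SP) inside add to these offsets gives 8, 12, 16 and 20, which are the SP-relative offsets that appear in the compiled listing of add.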
    0x0000 FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
    0x0000 FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)

The FUNCDATA and PCDATA directives contain information for use by the garbage collector; they are introduced by the compiler.

    0x0000 MOVL "".b+12(SP), AX
    0x0004 MOVL "".a+8(SP), CX
The SP pseudo-register is a virtual stack pointer used to refer to frame-local variables and the arguments being prepared for function calls. It points to the top of the local stack frame, so references should use negative offsets in the range [−framesize, 0): x-8(SP), y-4(SP), and so on.

"".b+12(SP) and "".a+8(SP) refer to the addresses located 12 and 8 bytes from the top of the stack (remember: the stack grows downwards!). .a and .b are arbitrary aliases for the locations being referred to. Although they have absolutely no semantic meaning, they are mandatory when using relative addressing on virtual registers. Here is what the documentation says about the virtual frame pointer:

The FP pseudo-register is a virtual frame pointer used to refer to function arguments. The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register. Thus 0(FP) is the first argument to the function, 8(FP) is the second (on a 64-bit machine), and so on. However, when referring to a function argument this way, it is necessary to place a name at the beginning, as in first_arg+0(FP) and second_arg+8(FP). (The meaning of the offset, an offset from the frame pointer, is distinct from its use with SB, where it is an offset from the symbol.) The assembler enforces this convention, rejecting plain 0(FP) and 8(FP). The actual name is semantically irrelevant but should be used to document the argument's name.
The first argument, a, is not at 0(SP) but at 8(SP), because the caller stores its return address at 0(SP) by means of the CALL pseudo-instruction.

    0x0008 ADDL CX, AX
    0x000a MOVL AX, "".~r2+16(SP)
    0x000e MOVB $1, "".~r3+20(SP)
ADDL adds the two Long-words (that is, 4-byte values) stored in AX and CX, and writes the result to AX. This result is then moved to "".~r2+16(SP), where the caller has previously reserved stack space and expects to find the return values. Once again, "".~r2 has no semantic meaning here.

The second return value is the constant true. The mechanics are exactly the same as for the first return value; only the offset relative to SP changes.

    0x0013 RET
The final RET pseudo-instruction tells the Go assembler to insert whatever instructions are required by the calling convention of the target platform in order to properly return from a subroutine call. Most likely, this will cause the code to pop off the return address stored at 0(SP) and then jump back to it.

The last instruction in a TEXT block must be some sort of jump, usually a RET (pseudo-)instruction. (If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in TEXTs.)
    ;; Declare global function symbol "".add (actually main.add once linked)
    ;; Do not insert stack-split preamble
    ;; 0 bytes of stack-frame, 16 bytes of arguments passed in
    ;; func add(a, b int32) (int32, bool)
    0x0000 TEXT "".add(SB), NOSPLIT, $0-16
      ;; ...omitted FUNCDATA stuff...
      0x0000 MOVL "".b+12(SP), AX     ;; move second Long-word (4B) argument from caller's stack-frame into AX
      0x0004 MOVL "".a+8(SP), CX      ;; move first Long-word (4B) argument from caller's stack-frame into CX
      0x0008 ADDL CX, AX              ;; compute AX=CX+AX
      0x000a MOVL AX, "".~r2+16(SP)   ;; move addition result (AX) into caller's stack-frame
      0x000e MOVB $1, "".~r3+20(SP)   ;; move `true` boolean (constant) into caller's stack-frame
      0x0013 RET                      ;; jump to return address stored at 0(SP)
Here is what the stack looks like once main.add has finished executing:

       |    +-------------------------+ <-- 32(SP)
       |    |                         |
     G |    |                         |
     R |    |                         |
     O |    | main.main's saved       |
     W |    |     frame-pointer (BP)  |
     S |    |-------------------------| <-- 24(SP)
       |    |      [alignment]        |
     D |    | "".~r3 (bool) = 1/true  | <-- 21(SP)
     O |    |-------------------------| <-- 20(SP)
     W |    |                         |
     N |    | "".~r2 (int32) = 42     |
     W |    |-------------------------| <-- 16(SP)
     A |    |                         |
     R |    | "".b (int32) = 32       |
     D |    |-------------------------| <-- 12(SP)
     S |    |                         |
       |    | "".a (int32) = 10       |
       |    |-------------------------| <-- 8(SP)
       |    |                         |
       |    |                         |
       |    |                         |
     \ | /  | return address to       |
      \|/   |     main.main + 0x30    |
       -    +-------------------------+ <-- 0(SP) (TOP OF STACK)

    (diagram made with https://textik.com)
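To make the diagram concrete, the sketch below replays the same reads and writes on a plain byte slice standing in for the stack. It is a simulation under stated assumptions, not what the runtime does: the slice index plays the role of the offset from SP as seen inside add, values are stored little-endian as on amd64, and newStack, simulateAdd and readResults are names invented for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// newStack builds the 32 bytes shown in the diagram, with a and b already
// written by the caller. Bytes 0..7 would hold the return address and are
// left as zeros here.
func newStack(a, b int32) []byte {
	stack := make([]byte, 32)
	binary.LittleEndian.PutUint32(stack[8:], uint32(a))
	binary.LittleEndian.PutUint32(stack[12:], uint32(b))
	return stack
}

// simulateAdd mimics the body of add. Index i of the slice stands for i(SP).
func simulateAdd(stack []byte) {
	b := binary.LittleEndian.Uint32(stack[12:]) // MOVL "".b+12(SP), AX
	a := binary.LittleEndian.Uint32(stack[8:])  // MOVL "".a+8(SP), CX
	sum := a + b                                // ADDL CX, AX
	binary.LittleEndian.PutUint32(stack[16:], sum) // MOVL AX, "".~r2+16(SP)
	stack[20] = 1                                  // MOVB $1, "".~r3+20(SP)
}

// readResults is what the caller would do after the call returns: pick up
// the two results from the slots it reserved.
func readResults(stack []byte) (int32, bool) {
	sum := int32(binary.LittleEndian.Uint32(stack[16:]))
	ok := stack[20] == 1
	return sum, ok
}

func main() {
	stack := newStack(10, 32)
	simulateAdd(stack)
	sum, ok := readResults(stack)
	fmt.Println(sum, ok) // 42 true
}
```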
The main function

As a reminder, here is what our main function looks like:

    0x0000 TEXT "".main(SB), $24-0
      ;; ...omitted stack-split prologue...
      0x000f SUBQ $24, SP
      0x0013 MOVQ BP, 16(SP)
      0x0018 LEAQ 16(SP), BP
      ;; ...omitted FUNCDATA stuff...
      0x001d MOVQ $137438953482, AX
      0x0027 MOVQ AX, (SP)
      ;; ...omitted PCDATA stuff...
      0x002b CALL "".add(SB)
      0x0030 MOVQ 16(SP), BP
      0x0035 ADDQ $24, SP
      0x0039 RET
      ;; ...omitted stack-split epilogue...

    0x0000 TEXT "".main(SB), $24-0

"".main (main.main once linked) is a global function symbol in the .text section, whose address is a constant offset from the beginning of our address space.

    0x000f SUBQ $24, SP
    0x0013 MOVQ BP, 16(SP)
    0x0018 LEAQ 16(SP), BP
Our caller, main, grows its stack frame by 24 bytes (remember that the stack grows downwards, so SUBQ here actually makes the stack frame bigger) by decrementing the virtual stack pointer. Those 24 bytes are made up of:

8 bytes (16(SP)-24(SP)) used to store the current value of the frame pointer BP (the real one!) to allow stack unwinding and simplify debugging.
1+3 bytes (12(SP)-16(SP)) reserved for the second return value (bool) plus 3 bytes of alignment required on amd64.
4 bytes (8(SP)-12(SP)) reserved for the first return value (int32).
4 bytes (4(SP)-8(SP)) reserved for the value of argument b (int32).
4 bytes (0(SP)-4(SP)) reserved for the value of argument a (int32).

LEAQ then computes the new frame pointer address and stores it in BP.

    0x001d MOVQ $137438953482, AX
    0x0027 MOVQ AX, (SP)
137438953482 corresponds to the 4-byte values 10 and 32 combined into one 8-byte value, with 32 in the upper four bytes and 10 in the lower four:

    $ echo 'obase=2;137438953482' | bc
    10000000000000000000000000000000001010
    \____/\______________________________/
      32                                10

    0x002b CALL "".add(SB)
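The packing of 10 and 32 into 137438953482 described above can be reproduced in Go. This is only a sketch: packArgs is a hypothetical helper, not something the compiler exposes, and it assumes a little-endian target such as amd64, where the lower four bytes of the quad word end up at the lower address (a at 0(SP), b at 4(SP)).

```go
package main

import "fmt"

// packArgs combines two int32 arguments into the single 8-byte constant
// that one MOVQ can store: a in the low half, b in the high half.
func packArgs(a, b int32) uint64 {
	return uint64(uint32(b))<<32 | uint64(uint32(a))
}

func main() {
	packed := packArgs(10, 32)
	fmt.Println(packed)        // 137438953482
	fmt.Printf("%b\n", packed) // same bit pattern as the bc output
}
```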
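The CALL to the add function is expressed as an offset from the static-base pointer: that is, it is a straightforward jump to a direct address.

Note that CALL also pushes the return address (an 8-byte value) onto the top of the stack, so every reference to SP made from within the add function ends up offset by 8 bytes! For example, "".a is no longer at 0(SP), but at 8(SP).

    0x0030 MOVQ 16(SP), BP
    0x0035 ADDQ $24, SP
    0x0039 RET

These three instructions restore the saved frame pointer from 16(SP), shrink the stack by 24 bytes to release the space allocated earlier, and return from the subroutine.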
Functions that cannot outgrow their stack are marked NOSPLIT, which tells the compiler not to insert these checks. main carries no such flag, so here is its listing once more, this time with the stack-split prologue and epilogue included:

    0x0000 TEXT "".main(SB), $24-0
      ;; stack-split prologue
      0x0000 MOVQ (TLS), CX
      0x0009 CMPQ SP, 16(CX)
      0x000d JLS  58

      0x000f SUBQ $24, SP
      0x0013 MOVQ BP, 16(SP)
      0x0018 LEAQ 16(SP), BP
      ;; ...omitted FUNCDATA stuff...
      0x001d MOVQ $137438953482, AX
      0x0027 MOVQ AX, (SP)
      ;; ...omitted PCDATA stuff...
      0x002b CALL "".add(SB)
      0x0030 MOVQ 16(SP), BP
      0x0035 ADDQ $24, SP
      0x0039 RET

      ;; stack-split epilogue
      0x003a NOP
      ;; ...omitted PCDATA stuff...
      0x003a CALL runtime.morestack_noctxt(SB)
      0x003f JMP  0
The prologue:

    0x0000 MOVQ (TLS), CX   ;; store current *g in CX
    0x0009 CMPQ SP, 16(CX)  ;; compare SP and g.stackguard0
    0x000d JLS  58          ;; jumps to 0x3a if SP <= g.stackguard0
TLS is a virtual register maintained by the runtime that holds a pointer to the current g, that is, the data structure that keeps track of all the state of a goroutine. Here is the definition of g in the runtime source code:

    type g struct {
        stack       stack   // 16 bytes
        // stackguard0 is the stack pointer compared in the Go stack growth prologue.
        // It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
        stackguard0 uintptr
        stackguard1 uintptr

        // ...omitted dozens of fields...
    }
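The offset used in CMPQ SP, 16(CX) and the comparison performed by the prologue can be modelled in plain Go. The types below are simplified stand-ins written for this example (gModel, stackModel, guardOffset and needsMoreStack are hypothetical names, not the runtime's own definitions), the addresses in main are made up, and the offset of 16 assumes a 64-bit platform where uintptr is 8 bytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stackModel stands in for the two-word stack bounds at the start of g.
type stackModel struct {
	lo uintptr
	hi uintptr
}

// gModel keeps only the leading fields of g shown in the excerpt above.
type gModel struct {
	stack       stackModel
	stackguard0 uintptr
	stackguard1 uintptr
}

// guardOffset returns the byte offset of stackguard0 within gModel.
func guardOffset() uintptr {
	var g gModel
	return unsafe.Offsetof(g.stackguard0)
}

// needsMoreStack models CMPQ SP, 16(CX) followed by JLS: the jump to the
// epilogue is taken when sp is lower than or equal to stackguard0.
func needsMoreStack(sp uintptr, g *gModel) bool {
	return sp <= g.stackguard0
}

func main() {
	g := &gModel{stack: stackModel{lo: 0x1000, hi: 0x3000}}
	g.stackguard0 = g.stack.lo + 0x100 // made-up guard distance

	fmt.Println(guardOffset())             // 16 on 64-bit platforms
	fmt.Println(needsMoreStack(0x2000, g)) // false: SP is still above the guard
	fmt.Println(needsMoreStack(0x1100, g)) // true: SP has reached the guard
}
```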
16(CX) corresponds to g.stackguard0, the threshold value maintained by the runtime. Comparing it with the stack pointer shows whether the goroutine is about to run out of stack space. The prologue thus checks whether the current SP value is less than or equal to stackguard0 (since the stack grows downwards, a lower SP means a bigger stack) and, if so, jumps to the epilogue.

The epilogue:

    0x003a NOP
    0x003a CALL runtime.morestack_noctxt(SB)
    0x003f JMP  0
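The epilogue calls into the runtime, which does the actual work of growing the stack, and then jumps back to the first instruction of the function, that is, to the prologue.

The NOP instruction sits just before the CALL so that the prologue does not jump directly onto a CALL instruction. On some platforms, doing so can have bad consequences, so it is common practice to place a no-op instruction right before the actual call and land on this NOP instead (see also the discussion in issue #4: Clarify "nop before call" paragraph).

Source: https://habr.com/ru/post/358088/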