Go has good support for calling assembly functions, and much of the very fast cryptographic code in the standard library is in fact carefully optimized assembly, which yields speedups of more than 20 times.
But assembly is still hard to write and even harder to review, and cryptography does not forgive mistakes. Wouldn't it be great to be able to write these functions in some higher-level language?
This post is the story of a slightly insane experiment: calling Rust code from Go fast enough to be comparable to an assembly call. You do not need to know Rust or compiler internals; it is enough to understand what a linker is.
I admit it right away: I do not know Rust, and the idea of writing in it day to day does not attract me much. Still, I know that Rust is a highly tunable and optimizable language, and more readable than assembly. (After all, everything is more readable than assembly!)
In Go it is customary to pick defaults that fit the core use cases and to enable by default only features that are guaranteed to be fast, which helps in the constant and successful fight against the need for a pile of knobs. I love it for that. But for what we are going to do today, we need a language that will not blink when asked to generate stack-only functions with safety checks disabled.
And if there is a language that we can constrain enough to behave like assembly, and optimize enough to be as efficient as assembly, it is most likely Rust.
Finally, Rust is safe, actively developed and, importantly, it already has a good ecosystem of fast cryptographic code that we can use.
Go ships with a Foreign Function Interface out of the box: cgo. cgo lets Go programs call C functions in the most natural way possible, which unfortunately is never very natural at all. (I know more about cgo than I would like, and believe me, it is no fun.)
Using the C ABI as the lingua franca of FFI, we can call anything from anywhere: Rust can be compiled into a library that exposes the C ABI, and cgo can use it. It is clumsy, but it works.
We can even go the other way, compiling Go into a C library and calling it from other languages, as I did with Python as a stunt. (Folks, it was just a stunt, do not take it seriously.)
But cgo does a lot under the hood to provide that bit of Go naturalness: it sets up a stack for the C code, it makes defer calls work correctly in case of a panic in a Go callback... that could fill a separate post.
As a result, the cost of each cgo call is far too high for the use case we are discussing today: small, hot functions.
So the idea is this: if we have Rust code that is as self-contained as assembly, we should in theory be able to use it like assembly and call it directly. Perhaps with a thin layer of glue.
We do not have to work at the level of an intermediate representation (IR): since Go 1.3, the Go compiler converts both Go code and assembly into machine code before linking.
This is confirmed by the existence of "external linking", where the system linker is used to link a Go program. This is exactly how cgo works: C is compiled with the C compiler, Go with the Go compiler, and everything is linked together with clang or gcc. We can even pass flags directly to the linker via CGO_LDFLAGS.
Underneath all of cgo's safety measures we will, of course, find the cross-language call itself.
It would be great to find a way to do this without patching the compiler, though. First, let's figure out how to link a Go program with a Rust archive.
I could not find a decent way to link against a foreign binary with go build (and why should there be one?) other than using #cgo directives. But invoking cgo sends the .s files to the C compiler rather than the Go one, and, my friends, we will need Go assembly.
Fortunately, go/build is just a frontend! Go offers a set of low-level tools for compiling and linking programs, and go build just collects the files and invokes those tools. We can follow what it does with the -x flag.
I wrote a small Makefile by following the output of a cgo build run with -x -ldflags "-v -linkmode=external '-extldflags=-v'":
    rustgo: rustgo.a
            go tool link -o rustgo -extld clang -buildmode exe -buildid b01dca11ab1e -linkmode external -v rustgo.a

    rustgo.a: hello.go hello.o
            go tool compile -o rustgo.a -p main -buildid b01dca11ab1e -pack hello.go
            go tool pack r rustgo.a hello.o

    hello.o: hello.s
            go tool asm -I "$(shell go env GOROOT)/pkg/include" -D GOOS_darwin -D GOARCH_amd64 -o hello.o hello.s
This builds a simple main package made of a single Go file (hello.go) and a Go assembly file (hello.s).
Now, if we want to link in a Rust object, we first need to build it as a static library...
    libhello.a: hello.rs
            rustc -g -O --crate-type staticlib hello.rs
... and then just tell the external linker to link them together:
    rustgo: rustgo.a libhello.a
            go tool link -o rustgo -extld clang -buildmode exe -buildid b01dca11ab1e -linkmode external -v -extldflags='-lhello -L"$(CURDIR)"' rustgo.a

    $ make
    go tool asm -I "/usr/local/Cellar/go/1.8.1_1/libexec/pkg/include" -D GOOS_darwin -D GOARCH_amd64 -o hello.o hello.s
    go tool compile -o rustgo.a -p main -buildid b01dca11ab1e -pack hello.go
    go tool pack r rustgo.a hello.o
    rustc --crate-type staticlib hello.rs
    note: link against the following native artifacts when linking against this static library
    note: the order and any duplication can be significant on some platforms, and so may need to be preserved
    note: library: System
    note: library: c
    note: library: m
    go tool link -o rustgo -extld clang -buildmode exe -buildid b01dca11ab1e -linkmode external -v -extldflags="-lhello -L/Users/filippo/code/misc/rustgo" rustgo.a
    HEADER = -H1 -T0x1001000 -D0x0 -R0x1000
    searching for runtime.a in /usr/local/Cellar/go/1.8.1_1/libexec/pkg/darwin_amd64/runtime.a
    searching for runtime/cgo.a in /usr/local/Cellar/go/1.8.1_1/libexec/pkg/darwin_amd64/runtime/cgo.a
     0.00 deadcode
     0.00 pclntab=166785 bytes, funcdata total 17079 bytes
     0.01 dodata
     0.01 symsize = 0
     0.01 symsize = 0
     0.01 reloc
     0.01 dwarf
     0.02 symsize = 0
     0.02 reloc
     0.02 asmb
     0.02 codeblk
     0.03 datblk
     0.03 sym
     0.03 headr
     0.06 host link: "clang" "-m64" "-gdwarf-2" "-Wl,-headerpad,1144" "-Wl,-no_pie" "-Wl,-pagezero_size,4000000" "-o" "rustgo" "-Qunused-arguments" "/var/folders/ry/v14gg02d0y9cb2w9809hf6ch0000gn/T/go-link-412633279/go.o" "/var/folders/ry/v14gg02d0y9cb2w9809hf6ch0000gn/T/go-link-412633279/000000.o" "-g" "-O2" "-lpthread" "-lhello" "-L/Users/filippo/code/misc/rustgo"
     0.34 cpu time
    12641 symbols
    5764 liveness data
Well, we are linked, but the symbols cannot do anything just by sitting next to each other in a binary. We need to somehow call the Rust function from our Go code.
We already know how to call a Go function from Go. In assembly, that call looks like CALL hello(SB), where SB is a virtual register that all global symbols are relative to.
If we want to call an assembly function from Go, we need to let the compiler know about it, a bit like a C header, by writing func hello() without a function body.
I tried all combinations of the above to call the external Rust function, but they all complained that either the symbol or the function body could not be found.
But cgo, which in the end is just a big code generator, somehow manages to call that foreign function! How?
I stumbled upon the answer a couple of days later.
    //go:cgo_import_static _cgoPREFIX_Cfunc__Cmalloc
    //go:linkname __cgofn__cgoPREFIX_Cfunc__Cmalloc _cgoPREFIX_Cfunc__Cmalloc
    var __cgofn__cgoPREFIX_Cfunc__Cmalloc byte
    var _cgoPREFIX_Cfunc__Cmalloc = unsafe.Pointer(&__cgofn__cgoPREFIX_Cfunc__Cmalloc)
That looks like an interesting pragma! //go:linkname just creates an alias for a symbol in the local scope (which can be used to call private functions!), and I am fairly sure the byte trick is only there for some address manipulation, but //go:cgo_import_static... imports an external symbol!
Armed with this new knowledge and the Makefile above, we have a chance of calling this Rust function (hello.rs):
    #[no_mangle]
    pub extern fn hello() {
        println!("Hello, Rust!");
    }
(The no_mangle / pub / extern incantation comes from this tutorial.)
We call it from this Go program (hello.go):
    package main

    //go:cgo_import_static hello

    func trampoline()

    func main() {
        println("Hello, Go!")
        trampoline()
    }
With the help of this assembly snippet (hello.s):
    TEXT ·trampoline(SB), 0, $2048
        JMP hello(SB)
        RET
CALL was a bit too clever to work, but with a simple JMP...
    Hello, Go!
    Hello, Rust!
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
Well, it crashes when it tries to return. Also, that $2048 value is the entire stack Rust is allowed (if it even puts the stack in the right place), and do not ask me what happens if Rust tries to touch the heap... but hell, I am surprised it works at all!
Now, if we want to return cleanly and pass some arguments, we need to take a closer look at the Go and Rust calling conventions. A calling convention defines where arguments and return values live across function calls.
The Go calling convention is described here and here. For Rust, we will look at the default for FFI, which is simply the standard C calling convention.
To go on, we need a debugger. (LLDB supports Go, but breakpoints are buggy on macOS, so I had to work inside a privileged Docker container.)
The Go calling convention is mostly undocumented, but we need to understand it to go on, so here is what we can learn from a disassembly (amd64). Let's look at a very simple function:
    // func foo(x, y uint64) uint64
    TEXT ·foo(SB), 0, $256-24
        MOVQ x+0(FP), DX
        MOVQ DX, ret+16(FP)
        RET
foo has 256 (0x100) bytes of local frame, 16 bytes of arguments and 8 bytes of return value, and it simply returns its first argument:
    func main() {
        foo(0xf0f0f0f0f0f0f0f0, 0x5555555555555555)

    rustgo[0x49d785]: movabsq $-0xf0f0f0f0f0f0f10, %rax
    rustgo[0x49d78f]: movq %rax, (%rsp)
    rustgo[0x49d793]: movabsq $0x5555555555555555, %rax
    rustgo[0x49d79d]: movq %rax, 0x8(%rsp)
    rustgo[0x49d7a2]: callq 0x49d8a0 ; main.foo at hello.s:14
The caller, shown above, does very little: it places the arguments on the stack in reverse order, at the bottom of its own frame (from rsp to 16(rsp); remember that the stack grows down), and executes CALL. CALL pushes the return address onto the stack and jumps. There is no caller cleanup, just a plain RET at the end.
Note that rsp is fixed, and that we see movq instead of push.
    rustgo`main.foo at hello.s:14:
    rustgo[0x49d8a0]: movq %fs:-0x8, %rcx
    rustgo[0x49d8a9]: leaq -0x88(%rsp), %rax
    rustgo[0x49d8b1]: cmpq 0x10(%rcx), %rax
    rustgo[0x49d8b5]: jbe 0x49d8ee ; main.foo + 78 at hello.s:14
    [...]
    rustgo[0x49d8ee]: callq 0x495d10 ; runtime.morestack_noctxt at asm_amd64.s:405
    rustgo[0x49d8f3]: jmp 0x49d8a0 ; main.foo at hello.s:14
The first 4 and last 2 instructions of the function check whether there is enough stack space and, if not, call runtime.morestack. They are most likely skipped for functions marked NOSPLIT.
    rustgo[0x49d8b7]: subq $0x108, %rsp
    [...]
    rustgo[0x49d8e6]: addq $0x108, %rsp
    rustgo[0x49d8ed]: retq
Then there is the rsp management, which subtracts 0x108, making room for the 0x100 bytes of frame and the 8 bytes of frame pointer in one go. So rsp points to the bottom (the end) of the function frame and is managed by the callee. Before returning, rsp is put back where it was (just past the return address).
    rustgo[0x49d8be]: movq %rbp, 0x100(%rsp)
    rustgo[0x49d8c6]: leaq 0x100(%rsp), %rbp
    [...]
    rustgo[0x49d8de]: movq 0x100(%rsp), %rbp
Finally, the frame pointer: rbp is effectively pushed onto the stack just after the return address and then updated. So rbp is also callee-saved, and it must be updated to point at where the callee stored the caller's rbp, so that the stack can be unwound.
    rustgo[0x49d8ce]: movq 0x110(%rsp), %rdx
    rustgo[0x49d8d6]: movq %rdx, 0x120(%rsp)
Lastly, from the body of the function we can see that return values go just above the arguments.
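The offsets seen so far follow a simple rule that can be sketched in Go. This is only an illustration of the stack-based layout described above, under the assumption that every argument and result is a single 8-byte word; slot and layout are hypothetical names, not part of any Go API.

```go
package main

import "fmt"

// slot describes where one value lives relative to the virtual FP register.
type slot struct {
	Name   string
	Offset int
}

// layout assigns FP-relative offsets to arguments followed by results,
// assuming (for this sketch only) that every value is one 8-byte word.
// It returns the slots and their total size, which is the number after
// the dash in a TEXT directive such as $256-24.
func layout(args, results []string) ([]slot, int) {
	const word = 8
	names := append(append([]string{}, args...), results...)
	slots := make([]slot, 0, len(names))
	off := 0
	for _, name := range names {
		slots = append(slots, slot{Name: name, Offset: off})
		off += word
	}
	return slots, off
}

func main() {
	// func foo(x, y uint64) uint64
	slots, total := layout([]string{"x", "y"}, []string{"ret"})
	for _, s := range slots {
		fmt.Printf("%s+%d(FP)\n", s.Name, s.Offset)
	}
	fmt.Printf("argument and result bytes: %d\n", total)
}
```

For foo this reproduces the x+0(FP) and ret+16(FP) operands used in the assembly, with the result slot sitting right above the two arguments.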
The Go documentation says that SP and FP are virtual registers, not just aliases for rsp and rbp.
Indeed, when we refer to SP from Go assembly, all offsets are rewritten relative to the real rsp register so that SP points to the top, not the bottom, of the frame. This is convenient, because it means we do not have to change every offset when the frame is resized, but it is really just syntactic sugar. A bare access to the register (like MOVQ SP, DX) accesses rsp directly.
The virtual FP register is also simply rewritten as an offset from rsp. It points to the bottom of the caller's frame, where the arguments are, and there is no direct access to it.
Note: Go maintains rbp and frame pointers to help debugging, but then uses a fixed rsp and omit-stack-pointer-style rsp offsets for the virtual FP. You can read more about frame pointers, and about not using them, in this post by Adam Langley.
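The standard x86-64 C calling convention, sysv64, is rather different:
- Arguments are passed in registers: RDI, RSI, RDX, RCX, R8 and R9.
- The return value goes in RAX.
- Some registers are callee-saved: RBP, RBX and R12 to R15.
- The stack must be aligned to 16 bytes. (I think this is why JMP worked but CALL did not: we had not aligned the stack!)
Frame pointers work in a similar way (and are generated by rustc with -g).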
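As a small illustration of the register side of sysv64, here is a sketch in Go that names the registers used for the first few integer or pointer arguments. The register order is the one defined by the System V AMD64 ABI, written with Go assembler names; argRegisters is a hypothetical helper, and stack-passed arguments and floating-point registers are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// sysvIntArgRegs lists, in order, the registers the System V AMD64 ABI
// uses for integer and pointer arguments, using Go assembler names
// (DI is RDI, SI is RSI, and so on).
var sysvIntArgRegs = []string{"DI", "SI", "DX", "CX", "R8", "R9"}

// argRegisters returns the registers holding the first n integer
// arguments. Further arguments would be passed on the stack, which this
// sketch does not model.
func argRegisters(n int) ([]string, error) {
	if n < 0 || n > len(sysvIntArgRegs) {
		return nil, errors.New("sketch only handles 0 to 6 register arguments")
	}
	return sysvIntArgRegs[:n], nil
}

func main() {
	regs, err := argRegisters(2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("arguments in:", regs, "result in: AX")
}
```

A glue function between the two conventions therefore has to copy each argument from its stack slot into the matching register before the call, and copy AX back to the result slot afterwards.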
Building a simple trampoline between the two conventions should not be hard. We can look at asmcgocall for inspiration, since it does just that, only for cgo.
We need to remember that we want the Rust function to use the stack of our assembly function, since Go guarantees for us that it is there. To do that, we roll rsp back from the end (the bottom) of the frame to its top.
    package main

    //go:cgo_import_static increment

    func trampoline(arg uint64) uint64

    func main() {
        println(trampoline(41))
    }

    TEXT ·trampoline(SB), 0, $2048-16
        MOVQ arg+0(FP), DI // Load the argument before messing with SP
        MOVQ SP, BX        // Save SP in a callee-saved register
        ADDQ $2048, SP     // Rollback SP to reuse this function's frame
        ANDQ $~15, SP      // Align the stack to 16 bytes
        CALL increment(SB)
        MOVQ BX, SP        // Restore SP
        MOVQ AX, ret+8(FP) // Place the return value on the stack
        RET

    #[no_mangle]
    pub extern fn increment(a: u64) -> u64 {
        return a + 1;
    }
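The two stack-pointer instructions in the trampoline are plain integer arithmetic, which can be sketched in Go. This is only an illustration: alignDown16 and rustStackTop are hypothetical names, and the stack pointer value is made up.

```go
package main

import "fmt"

// alignDown16 clears the low four bits of an address, which is what
// ANDQ $~15, SP does: the result is the nearest 16-byte boundary at or
// below the input.
func alignDown16(addr uint64) uint64 {
	return addr &^ 15
}

// rustStackTop mirrors the ADDQ/ANDQ pair: move from the bottom of a
// frame of frameSize bytes to its top, then align down so the callee
// sees a 16-byte aligned stack pointer.
func rustStackTop(sp, frameSize uint64) uint64 {
	return alignDown16(sp + frameSize)
}

func main() {
	sp := uint64(0xc000045f38) // made-up stack pointer value
	fmt.Printf("before: %#x\n", sp)
	fmt.Printf("after:  %#x\n", rustStackTop(sp, 2048))
}
```

Aligning down rather than up matters here: the stack grows down, so rounding down keeps the pointer inside the frame that Go reserved.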
CALL, as it turns out, does not get along with macOS. For some reason the function call was replaced there with a call through the intermediate cgo_thread_start, which is not that surprising considering that we are using something called cgo_import_static and that CALL is also a virtual instruction in Go assembly.
    callq 0x40a27cd ; x_cgo_thread_start + 29
We can get around this "help" with the //go:linkname sorcery we found in the standard library, taking a pointer to the function and then calling through the function pointer, like this:
    import _ "unsafe"

    //go:cgo_import_static increment
    //go:linkname increment increment
    var increment uintptr
    var _increment = &increment

    MOVQ ·_increment(SB), AX
    CALL AX
The whole point of this experiment was to call Rust instead of assembly for cryptographic operations (and to have fun). So a rustgo call has to be almost as fast as an assembly call to be useful.
Time for benchmarks!
We compare incrementing a uint64 variable inline, with a //go:noinline function, with the rustgo call, and with a cgo call to the very same Rust function.
Rust was compiled with the -g -O flags, and the benchmarks were run on macOS on a 2.9GHz Intel Core i5 processor.
    name                 time/op
    CallOverhead/Inline  1.72ns ± 3%
    CallOverhead/Go      4.60ns ± 2%
    CallOverhead/rustgo  5.11ns ± 4%
    CallOverhead/cgo     73.6ns ± 0%
rustgo is 11% slower than calling the usual Go function and almost 15 times faster than cgo!
The results are even better on Linux, which does not need the function pointer workaround: rustgo is only 2% slower there.
    name                 time/op
    CallOverhead/Inline  1.67ns ± 2%
    CallOverhead/Go      4.49ns ± 3%
    CallOverhead/rustgo  4.58ns ± 3%
    CallOverhead/cgo     69.4ns ± 0%
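The first two rows of these tables can be reproduced with nothing but the standard library. The following is a minimal sketch, not the benchmark code used for the numbers above: incInline, incNoInline and sink are hypothetical names, and the rustgo and cgo variants are left out because they need the build setup described in this post.

```go
package main

import (
	"fmt"
	"testing"
)

// incInline is small enough that the compiler will normally inline it.
func incInline(x uint64) uint64 { return x + 1 }

// incNoInline is the same function with inlining disabled, so every
// iteration pays for a real function call.
//
//go:noinline
func incNoInline(x uint64) uint64 { return x + 1 }

// sink keeps the results alive so the loops are not optimized away.
var sink uint64

func main() {
	inline := testing.Benchmark(func(b *testing.B) {
		var v uint64
		for i := 0; i < b.N; i++ {
			v = incInline(v)
		}
		sink = v
	})
	call := testing.Benchmark(func(b *testing.B) {
		var v uint64
		for i := 0; i < b.N; i++ {
			v = incNoInline(v)
		}
		sink = v
	})
	fmt.Println("CallOverhead/Inline", inline)
	fmt.Println("CallOverhead/Go    ", call)
}
```

testing.Benchmark picks b.N on its own, so the program can be run with go run; absolute numbers will differ from machine to machine.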
For a real-world example I picked the excellent curve25519-dalek library, and specifically the task of multiplying the curve basepoint by a scalar and returning its Edwards representation.
Cargo benchmarks vary a lot between runs because of CPU frequency scaling, but they suggest the operation takes roughly 22.9µs ± 17% (the sum of the two benchmarks below).
    test curve::bench::basepoint_mult    ... bench:      17,276 ns/iter (+/- 3,057)
    test curve::bench::edwards_compress  ... bench:       5,633 ns/iter (+/- 858)
On the Go side, we add a simple API.
func ScalarBaseMult(dst, in *[32]byte)
On the Rust side, this is not much different from building an interface for normal FFI.
I will be honest, it took me forever to make this work in Rust:
    #![no_std]

    extern crate curve25519_dalek;
    use curve25519_dalek::scalar::Scalar;
    use curve25519_dalek::constants;

    #[no_mangle]
    pub extern fn scalar_base_mult(dst: &mut [u8; 32], k: &[u8; 32]) {
        let res = &constants::ED25519_BASEPOINT_TABLE * &Scalar(*k);
        dst.clone_from(res.compress_edwards().as_bytes());
    }
To build the .a we run cargo build --release with a Cargo.toml that declares the dependencies, enables frame pointers, and configures curve25519-dalek to use its most efficient math without the standard library.
    [package]
    name = "ed25519-dalek-rustgo"
    version = "0.0.0"

    [lib]
    crate-type = ["staticlib"]

    [dependencies.curve25519-dalek]
    version = "^0.9"
    default-features = false
    features = ["nightly"]

    [profile.release]
    debug = true
Finally, we need to adapt our trampoline to take two arguments and return nothing:
    TEXT ·ScalarBaseMult(SB), 0, $16384-16
        MOVQ dst+0(FP), DI
        MOVQ in+8(FP), SI
        MOVQ SP, BX
        ADDQ $16384, SP
        ANDQ $~15, SP
        MOVQ ·_scalar_base_mult(SB), AX
        CALL AX
        MOVQ BX, SP
        RET
The result is a transparent Go call with performance close to the pure Rust benchmark, and almost 6% faster than cgo!
    name                old time/op  new time/op  delta
    RustScalarBaseMult  23.7µs ± 1%  22.3µs ± 4%  -5.88%  (p=0.003 n=5+7)
For comparison, the equivalent functionality from the github.com/agl/ed25519/edwards25519 package, a pure Go implementation, takes almost 3 times as long:
    h := &edwards25519.ExtendedGroupElement{}
    edwards25519.GeScalarMultBase(h, &k)
    h.ToBytes(&dst)

    name              time/op
    GoScalarBaseMult  66.1µs ± 2%
Now we know that it really works, great! But to be actually usable, it has to come as an importable package, not be force-fed into package main by a weird build process.
This is where //go:binary-only-package comes in. That annotation tells Go to ignore the source code and use only the pre-built .a library file from $GOPATH/pkg.
If we manage to build a .a file that works with Go's native linker (cmd/link, also called the internal linker), we can redistribute it, and users will be able to import our package as if it were native code, including when cross-compiling (provided we built a .a for that platform)!
The Go side is simple, and pairs with the assembly and Rust we already have. We can even add documentation so that it shows up in go doc:
    //go:binary-only-package

    // Package edwards25519 implements operations on an Edwards curve that is
    // isomorphic to curve25519.
    //
    // Crypto operations are implemented by calling directly into the Rust
    // library curve25519-dalek, without cgo.
    //
    // You should not actually be using this.
    package edwards25519

    import _ "unsafe"

    //go:cgo_import_static scalar_base_mult
    //go:linkname scalar_base_mult scalar_base_mult
    var scalar_base_mult uintptr
    var _scalar_base_mult = &scalar_base_mult

    // ScalarBaseMult multiplies the scalar in by the curve basepoint, and writes
    // the compressed Edwards representation of the resulting point to dst.
    func ScalarBaseMult(dst, in *[32]byte)
The Makefile has to change, because we are no longer building a binary and so no longer use go tool link.
An .a archive is just a collection of .o object files in an ancient format, along with a symbol table. If we could get the symbols from the Rust libed25519_dalek_rustgo.a into the edwards25519.a archive produced by go tool compile, we would reach the goal.
The .a archives are manipulated with the UNIX ar utility or with Go's internal counterpart, cmd/pack (via go tool pack). The two formats are, of course, ever so slightly different. We will have to use ar for libed25519_dalek_rustgo.a and cmd/pack for edwards25519.a.
(For example, ar on my macOS uses the BSD convention of naming files #1/LEN and then embedding the file name, of length LEN, at the beginning of the file contents, to get around the 16-byte limit on file name length. It is confusing.)
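To make the #1/LEN convention concrete, here is a minimal sketch of an ar reader in Go that understands it. It assumes the common ar layout (an 8-byte magic followed by 60-byte member headers with the name in bytes 0 to 16 and the decimal size in bytes 48 to 58); readArchive, bsdMember and demo are hypothetical names, and this is neither cmd/pack nor a complete ar implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strconv"
	"strings"
)

const arMagic = "!<arch>\n"

// member is one file stored in the archive.
type member struct {
	Name string
	Data []byte
}

// readArchive parses an ar archive, handling BSD "#1/LEN" long names.
func readArchive(r io.Reader) ([]member, error) {
	magic := make([]byte, len(arMagic))
	if _, err := io.ReadFull(r, magic); err != nil || string(magic) != arMagic {
		return nil, fmt.Errorf("not an ar archive")
	}
	var out []member
	for {
		hdr := make([]byte, 60)
		if _, err := io.ReadFull(r, hdr); err == io.EOF {
			return out, nil // clean end of archive
		} else if err != nil {
			return nil, err
		}
		name := strings.TrimRight(string(hdr[0:16]), " ")
		size, err := strconv.Atoi(strings.TrimSpace(string(hdr[48:58])))
		if err != nil {
			return nil, fmt.Errorf("bad size field: %v", err)
		}
		data := make([]byte, size)
		if _, err := io.ReadFull(r, data); err != nil {
			return nil, err
		}
		if size%2 == 1 { // members start at even offsets
			if _, err := io.CopyN(io.Discard, r, 1); err != nil && err != io.EOF {
				return nil, err
			}
		}
		if strings.HasPrefix(name, "#1/") {
			// BSD long name: the real name is the first LEN bytes of the data.
			n, err := strconv.Atoi(name[3:])
			if err != nil || n > len(data) {
				return nil, fmt.Errorf("bad BSD name length %q", name)
			}
			name = strings.TrimRight(string(data[:n]), "\x00")
			data = data[n:]
		}
		out = append(out, member{Name: name, Data: data})
	}
}

// bsdMember encodes one member using the BSD long-name convention.
func bsdMember(name string, body []byte) []byte {
	size := len(name) + len(body)
	hdr := fmt.Sprintf("%-16s%-12d%-6d%-6d%-8s%-10d`\n",
		fmt.Sprintf("#1/%d", len(name)), 0, 0, 0, "100644", size)
	buf := append([]byte(hdr), name...)
	buf = append(buf, body...)
	if size%2 == 1 {
		buf = append(buf, '\n')
	}
	return buf
}

// demo builds a one-member archive in memory and parses it back.
func demo() ([]member, error) {
	archive := append([]byte(arMagic),
		bsdMember("ed25519_dalek_rustgo-742a1d9f1c101d86.0.o", []byte("object code"))...)
	return readArchive(bytes.NewReader(archive))
}

func main() {
	members, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range members {
		fmt.Printf("%s (%d bytes)\n", m.Name, len(m.Data))
	}
}
```

The object name in the demo is longer than 16 bytes, so it can only be stored through the long-name mechanism, which is exactly the case that makes the two tools disagree.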
To link these two libraries together, I tried the simplest (read: hackish) way: extract libed25519_dalek_rustgo.a into a separate folder and then pack its objects back into edwards25519.a.
    edwards25519/edwards25519.a: edwards25519/rustgo.go edwards25519/rustgo.o target/release/libed25519_dalek_rustgo.a
            go tool compile -N -l -o $@ -p main -pack edwards25519/rustgo.go
            go tool pack r $@ edwards25519/rustgo.o # from edwards25519/rustgo.s
            mkdir -p target/release/libed25519_dalek_rustgo && cd target/release/libed25519_dalek_rustgo && \
                rm -f *.o && ar xv "$(CURDIR)/target/release/libed25519_dalek_rustgo.a"
            go tool pack r $@ target/release/libed25519_dalek_rustgo/*.o

    .PHONY: install
    install: edwards25519/edwards25519.a
            mkdir -p "$(shell go env GOPATH)/pkg/darwin_amd64/$(IMPORT_PATH)/"
            cp edwards25519/edwards25519.a "$(shell go env GOPATH)/pkg/darwin_amd64/$(IMPORT_PATH)/"
Imagine my surprise when it worked!
With the .a file in the right place, all that is left is to write a simple program that uses the package:
    package main

    import (
        "bytes"
        "encoding/hex"
        "fmt"
        "testing"

        "github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519"
    )

    func main() {
        input, _ := hex.DecodeString("39129b3f7bbd7e17a39679b940018a737fc3bf430fcbc827029e67360aab3707")
        expected, _ := hex.DecodeString("1cc4789ed5ea69f84ad460941ba0491ff532c1af1fa126733d6c7b62f7ebcbcf")

        var dst, k [32]byte
        copy(k[:], input)

        edwards25519.ScalarBaseMult(&dst, &k)
        if !bytes.Equal(dst[:], expected) {
            fmt.Println("rustgo produces a wrong result!")
        }

        fmt.Printf("BenchmarkScalarBaseMult\t%v\n", testing.Benchmark(func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                edwards25519.ScalarBaseMult(&dst, &k)
            }
        }))
    }
And run go build!
    $ go build -ldflags '-linkmode external -extldflags -lresolv'
    $ ./ed25519-dalek-rustgo
    BenchmarkScalarBaseMult   100000             19914 ns/op
Well, it almost worked. I had to cheat a little. The binary would not build until we linked against libresolv. To be fair, the Rust compiler tried to tell us. (But who listens to everything the Rust compiler says?)
    note: link against the following native artifacts when linking against this static library
    note: the order and any duplication can be significant on some platforms, and so may need to be preserved
    note: library: System
    note: library: resolv
    note: library: c
    note: library: m
Now, linking against system libraries is going to be a problem, because it will never work with the internal linker and cross-compilation...
But wait, libresolv? Why does our no_std, "should be like assembly", stack-only Rust library want to use the system DNS resolver library?
The problem is that the library is not really no_std. Look at everything that is in there! We want nothing to do with allocators:
    $ ar t target/release/libed25519_dalek_rustgo.a
    __.SYMDEF
    ed25519_dalek_rustgo-742a1d9f1c101d86.0.o
    ed25519_dalek_rustgo-742a1d9f1c101d86.crate.allocator.o
    curve25519_dalek-03e3ca0f6d904d88.0.o
    subtle-cd04b61500f6e56a.0.o
    std-72653eb2361f5909.0.o
    panic_unwind-d0b88496572d35a9.0.o
    unwind-da13b913698118f9.0.o
    arrayref-2be0c0ff08ae2c7d.0.o
    digest-f1373d68da35ca45.0.o
    generic_array-95ca86a62dc11ddc.0.o
    nodrop-7df18ca19bb4fc21.0.o
    odds-3bc0ea0bdf8209aa.0.o
    typenum-a61a9024d805e64e.0.o
    rand-e0d585156faee9eb.0.o
    alloc_system-c942637a1f049140.0.o
    libc-e038d130d15e5dae.0.o
    alloc-0e789b712308019f.0.o
    std_unicode-9735142be30abc63.0.o
    compiler_builtins-8a5da980a34153c7.0.o
    absvdi2.o
    absvsi2.o
    absvti2.o
    [... snip ...]
    truncsfhf2.o
    ucmpdi2.o
    ucmpti2.o
    core-9077840c2cc91cbf.0.o
So, how do we actually make it no_std? That turned out to be an adventure of its own, but I will only write down the conclusions:
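- If any dependency is not no_std, your no_std flag is nullified. One of the dependencies of curve25519-dalek had this problem, and cargo update fixed it.
- Making a no_std static library (that is, a library for external use, not just for use inside Rust) is really like making a no_std executable, which is much harder because it has to be self-contained.
- The documentation on making a no_std executable is very sparse. I mainly used an old version of the Rust book and eventually found this section on lang_items. This post also helped.
- To begin with, you need to define the lang_items functions that handle what the standard library normally provides, such as panic_fmt.
- Then you are left without the Rust equivalents of compiler-rt, so you must import the compiler_builtins crate (rust-lang/rust#43264).
- Then there is a problem with rust_begin_unwind not being exported, which is solved by marking panic_fmt as no_mangle, something the linter is not happy about (rust-lang/rust#38281).
- Then you are left without memcpy, but fortunately there is a native Rust implementation in the rlibc crate. It was very useful to learn that nm -u shows which symbols are missing from an object.
All of this boils down to a few arcane lines at the top of lib.rs:
    #![no_std]
    #![feature(lang_items, compiler_builtins_lib, core_intrinsics)]

    use core::intrinsics;

    #[allow(private_no_mangle_fns)]
    #[no_mangle] // rust-lang/rust#38281
    #[lang = "panic_fmt"]
    fn panic_fmt() -> ! {
        unsafe { intrinsics::abort() }
    }

    #[lang = "eh_personality"]
    extern fn eh_personality() {}

    extern crate compiler_builtins; // rust-lang/rust#43264
    extern crate rlibc;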
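With all that, go build works (!!!) on macOS.
On Linux, nothing works.
External linking complains about fmax and other missing symbols, and it seems to be right:
    $ ld -r -o linux.o target/release/libed25519_dalek_rustgo/*.o
    $ nm -u linux.o
                     U _GLOBAL_OFFSET_TABLE_
                     U abort
                     U fmax
                     U fmaxf
                     U fmaxl
                     U logb
                     U logbf
                     U logbl
                     U scalbn
                     U scalbnf
                     U scalbnl
The fix is to make sure that --gc-sections is used, so that dead code, which may reference things we do not actually need, is stripped. And indeed, it works (note the three layers of flag passing):
    $ go build -ldflags '-extld clang -linkmode external -extldflags -Wl,--gc-sections'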
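But how do we tell the internal linker to do that, if we cannot pass linker flags such as --gc-sections from the Makefile? The answer is to stop hacking at the .a and read the linker man page.
We can build a .o containing a given symbol and everything it references with ld -r --gc-sections -u $SYMBOL. -r makes the output object reusable in a later link, and -u marks a symbol as "needed", otherwise everything would be garbage collected. $SYMBOL is scalar_base_mult in our case.
Why was this not a problem on macOS? It would have been if we had linked manually, but the macOS toolchain apparently strips dead symbols by default.
    $ ld -e _scalar_base_mult target/release/libed25519_dalek_rustgo/*.o
    Undefined symbols for architecture x86_64:
      "___assert_rtn", referenced from:
          _compilerrt_abort_impl in int_util.o
      "_copysign", referenced from:
          ___divdc3 in divdc3.o
          ___muldc3 in muldc3.o
      "_copysignf", referenced from:
          ___divsc3 in divsc3.o
          ___mulsc3 in mulsc3.o
      "_copysignl", referenced from:
          ___divxc3 in divxc3.o
          ___mulxc3 in mulxc3.o
      "_fmax", referenced from:
          ___divdc3 in divdc3.o
      "_fmaxf", referenced from:
          ___divsc3 in divsc3.o
      "_fmaxl", referenced from:
          ___divxc3 in divxc3.o
      "_logb", referenced from:
          ___divdc3 in divdc3.o
      "_logbf", referenced from:
          ___divsc3 in divsc3.o
      "_logbl", referenced from:
          ___divxc3 in divxc3.o
      "_scalbn", referenced from:
          ___divdc3 in divdc3.o
      "_scalbnf", referenced from:
          ___divsc3 in divsc3.o
      "_scalbnl", referenced from:
          ___divxc3 in divxc3.o
    ld: symbol(s) not found for inferred architecture x86_64
    $ ld -e _scalar_base_mult -dead_strip target/release/libed25519_dalek_rustgo/*.o
This is also how I learned, painfully, that macOS prefixes symbol names with _, for reasons.
So here is the part of the Makefile that works with external linking out of the box: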
    edwards25519/edwards25519.a: edwards25519/rustgo.go edwards25519/rustgo.o edwards25519/libed25519_dalek_rustgo.o
            go tool compile -N -l -o $@ -p main -pack edwards25519/rustgo.go
            go tool pack r $@ edwards25519/rustgo.o edwards25519/libed25519_dalek_rustgo.o

    edwards25519/libed25519_dalek_rustgo.o: target/$(TARGET)/release/libed25519_dalek_rustgo.a
    ifeq ($(shell go env GOOS),darwin)
            $(LD) -r -o $@ -arch x86_64 -u "_$(SYMBOL)" $^
    else
            $(LD) -r -o $@ --gc-sections -u "$(SYMBOL)" $^
    endif
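The platform switch in that Makefile rule can be restated in Go, which makes the two differences (the extra flags and the leading underscore) easier to see. This is a sketch only: ldArgs is a hypothetical helper that mirrors the flags in the rule above and nothing more.

```go
package main

import (
	"fmt"
	"strings"
)

// ldArgs mirrors the two branches of the Makefile rule: on darwin the
// symbol gets a leading underscore and the architecture is named
// explicitly, elsewhere --gc-sections is requested explicitly.
func ldArgs(goos, out, symbol, archive string) []string {
	if goos == "darwin" {
		return []string{"-r", "-o", out, "-arch", "x86_64", "-u", "_" + symbol, archive}
	}
	return []string{"-r", "-o", out, "--gc-sections", "-u", symbol, archive}
}

func main() {
	for _, goos := range []string{"darwin", "linux"} {
		args := ldArgs(goos, "libed25519_dalek_rustgo.o", "scalar_base_mult",
			"libed25519_dalek_rustgo.a")
		fmt.Printf("%s: ld %s\n", goos, strings.Join(args, " "))
	}
}
```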
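The last missing piece was internal linking on Linux, the final boss of implementing rustgo. While I was being guided through debugging cmd/link (which was fascinating!), someone who knows Go far better than I do pointed out that //go:cgo_import_static is not enough for internal linking, and that //go:cgo_import_dynamic is needed as well.
    //go:cgo_import_static scalar_base_mult
    //go:cgo_import_dynamic scalar_base_mult
I still do not know why leaving it out causes such symptoms, but adding it finally made the rustgo package build with both external and internal linking, on macOS and on Linux, out of the box.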
Now that we can build the .a, we can follow the suggestion in the //go:binary-only-package specification and build a tarball with the .a files for linux_amd64/darwin_amd64 and the package source, to be unpacked into GOPATH for installation:
    $ tar tf ed25519-dalek-rustgo_go1.8.3.tar.gz
    src/github.com/FiloSottile/ed25519-dalek-rustgo/
    src/github.com/FiloSottile/ed25519-dalek-rustgo/.gitignore
    src/github.com/FiloSottile/ed25519-dalek-rustgo/Cargo.lock
    src/github.com/FiloSottile/ed25519-dalek-rustgo/Cargo.toml
    src/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519/
    src/github.com/FiloSottile/ed25519-dalek-rustgo/main.go
    src/github.com/FiloSottile/ed25519-dalek-rustgo/Makefile
    src/github.com/FiloSottile/ed25519-dalek-rustgo/release.sh
    src/github.com/FiloSottile/ed25519-dalek-rustgo/src/
    src/github.com/FiloSottile/ed25519-dalek-rustgo/target.go
    src/github.com/FiloSottile/ed25519-dalek-rustgo/src/lib.rs
    src/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519/rustgo.go
    src/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519/rustgo.s
    pkg/linux_amd64/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519.a
    pkg/darwin_amd64/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519.a
Once installed this way, the package can be used like a native one, cross-compilation included (as long as a .a for the target is included).
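The only thing to worry about is that if we build Rust with -Ctarget-cpu=native, the result may not run on older CPUs. Fortunately, benchmarks (and the curve25519-dalek authors) tell us that the only real difference is between processors before Haswell and processors from Haswell on, so we only need a generic build and a Haswell one:
    $ benchstat bench-none.txt bench-haswell.txt
    name                   old time/op  new time/op  delta
    ScalarBaseMult/rustgo  22.0µs ± 3%  20.2µs ± 2%  -8.41%  (p=0.001 n=7+6)
    $ benchstat bench-haswell.txt bench-native.txt
    name                   old time/op  new time/op  delta
    ScalarBaseMult/rustgo  20.2µs ± 2%  20.1µs ± 2%   ~     (p=0.945 n=6+7)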
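As the cherry on top, I made the Makefile honor GOOS/GOARCH, converting them into Rust target triples, so if Rust is set up for cross-compilation you can even cross-compile the .a itself.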
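The GOOS/GOARCH handling presumably comes down to a lookup from Go platform pairs to Rust target triples. A minimal sketch of such a lookup follows; rustTargets and rustTarget are hypothetical names, the table covers only the two platforms discussed in this post, and the real Makefile may well do this differently.

```go
package main

import "fmt"

// rustTargets maps "GOOS/GOARCH" pairs to Rust target triples. Only the
// two platforms discussed in this post are listed; this table is an
// assumption made for illustration, not taken from the project.
var rustTargets = map[string]string{
	"linux/amd64":  "x86_64-unknown-linux-gnu",
	"darwin/amd64": "x86_64-apple-darwin",
}

// rustTarget returns the Rust target triple for a Go platform, or an
// error when the sketch does not know the platform.
func rustTarget(goos, goarch string) (string, error) {
	triple, ok := rustTargets[goos+"/"+goarch]
	if !ok {
		return "", fmt.Errorf("no Rust target known for %s/%s", goos, goarch)
	}
	return triple, nil
}

func main() {
	for _, p := range [][2]string{{"linux", "amd64"}, {"darwin", "amd64"}, {"plan9", "386"}} {
		triple, err := rustTarget(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s/%s -> %s\n", p[0], p[1], triple)
	}
}
```

The resulting triple is the kind of value that would fill $(TARGET) in the target/$(TARGET)/release path used by the Makefile.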
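Here is the result: github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519. It is even on godoc.
Well, this was fun.
But to be clear, rustgo is not something you should use in production. For example, I suspect g should be saved before the jump, the stack size is completely arbitrary, and shrinking the trampoline frame like that will probably confuse debuggers. A panic in Rust could also get weird.
To make it a real thing, I would start by calling morestack manually from a NOSPLIT assembly function to make sure there is enough goroutine stack space (instead of rolling back rsp), with a size obtained, perhaps, from static analysis of the Rust function (instead of made up).
All of this could be analyzed, generated and built by some "rustgo" tool instead of being hardcoded in Makefiles and assembly files. cgo itself is little more than a code generator, after all. It might make sense as a go:generate tool, though I know someone who would like to make it a cargo command (finally, some Go vs. Rust fighting!). A Rust-side collection of FFI types, such as GoSlice, would also be nice.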
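    #[repr(C)]
    struct GoSlice {
        array: *mut u8,
        // Go's int is pointer-sized (64 bits on amd64), so len and cap
        // are isize here rather than i32.
        len: isize,
        cap: isize,
    }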
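The Go side of such a type can be inspected directly. A minimal sketch, assuming the usual three-word slice header (data pointer, length, capacity) and using the hypothetical names goSlice and headerOf; reinterpreting a slice through unsafe like this relies on runtime layout details rather than on a guaranteed API.

```go
package main

import (
	"fmt"
	"unsafe"
)

// goSlice mirrors the layout the Go runtime uses for a slice value:
// a data pointer followed by the length and the capacity, both of type
// int (which is pointer-sized).
type goSlice struct {
	array unsafe.Pointer
	len   int
	cap   int
}

// headerOf reinterprets a byte slice as its three-word header. This is
// for inspection only and depends on the runtime's slice layout.
func headerOf(b []byte) goSlice {
	return *(*goSlice)(unsafe.Pointer(&b))
}

func main() {
	buf := make([]byte, 3, 8)
	h := headerOf(buf)
	fmt.Println("len:", h.len, "cap:", h.cap)
	fmt.Println("header size in bytes:", unsafe.Sizeof(h))
	fmt.Println("points at first element:", h.array == unsafe.Pointer(&buf[0]))
}
```

On a 64-bit platform each of the three fields is 8 bytes wide, so a Rust mirror of this struct has to use pointer-sized integers for len and cap.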
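Or maybe a Go or Rust adult will come along and tell us to stop before we get hurt.
P.S. Before anyone compares this to cgo (which has many more safety features) or to pure Go: rustgo is not meant to replace either of them. It is meant to replace hand-written assembly with something safer and more readable at comparable performance. Or, better yet, it was meant to be a fun experiment.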
Source: https://habr.com/ru/post/337348/