
Start using Rust



Hello!

Recently I began learning Rust, a beautiful language. For me, its practical application is embedding code in performance-critical places (at least until the language matures and accumulates libraries and frameworks).
To consolidate the theory, I decided to make a small project: a dynamic library written in Rust that implements a simplified version of the shingle algorithm and can be connected from other languages through FFI. If you are interested, read on.

First, a little theory: what is FFI? Here is my free translation of the material from the English-language Wikipedia:
FFI (foreign function interface) is a mechanism by which code written in one language can call code written in another. The term comes from the Lisp programming language (more precisely, its Common Lisp dialect); it is also used in Haskell and Python. In Java, the equivalent is called JNI (Java Native Interface) or JNA (Java Native Access).

In most cases, an FFI is provided by a high-level language (Python, Ruby, JS) so that it can use services defined and implemented in a low-level language (C, C++). This lets a program call OS APIs that are not defined in its own language (but are defined in another), or improve performance.

The main job of an FFI is to match the semantics and calling conventions of one language (the host) with those of another (the guest). This process must take into account the runtime environment and/or the binary interface (ABI) of both.

So enough theory, let's get down to practice.

Rust
First, create a new project with the command below (Rust must already be installed on your system):
cargo new [project name]

After this, a folder with the name you specified appears with the following contents.


In the configuration file Cargo.toml we write the following code:
Cargo.toml
    [package]
    name = "libshingles"
    version = "0.1.0"
    authors = ["Andrey Ivanov <lekkion@gmail.com>"]

    [dependencies]
    regex = "0.1.41"
    rust-crypto = "^0.2"
    libc = "0.1"

    [lib]
    name = "libshingles"
    crate_type = ["dylib"]


The [package] block holds information about the package, [dependencies] lists the crates it depends on, and [lib] tells Cargo to build the library as a standard dynamic library (by default it is compiled into Rust's own rlib format).

Now directly the lib.rs code:
lib.rs
    extern crate libc;
    extern crate regex;
    extern crate crypto;

    use std::ffi::CStr;
    use std::thread;
    use regex::Regex;
    use std::cmp::min;
    use crypto::md5::Md5;
    use crypto::digest::Digest;

    #[no_mangle]
    pub extern "C" fn count_same(text1: *const libc::c_char, text2: *const libc::c_char, length: libc::uint16_t) -> f32 {
        // Copy both C strings into owned Rust strings (empty on invalid UTF-8).
        let buf1 = unsafe { CStr::from_ptr(text1).to_bytes() };
        let text1_ = match String::from_utf8(buf1.to_vec()) {
            Ok(val) => val,
            Err(_) => String::new(),
        };
        let buf2 = unsafe { CStr::from_ptr(text2).to_bytes() };
        let text2_ = match String::from_utf8(buf2.to_vec()) {
            Ok(val) => val,
            Err(_) => String::new(),
        };

        // Canonization: replace HTML tags, punctuation and stop words with spaces.
        fn canonize(text: String) -> String {
            let html = Regex::new(r"<[^>]*>|[:punct:]").unwrap();
            let stop_words = Regex::new(r"(?i)\b[-]{1,2}\b").unwrap();
            let mut temp = html.replace_all(&text, " ");
            temp = stop_words.replace_all(&temp, " ");
            temp
        }

        // Build every window of `len` consecutive words and hash it with MD5.
        fn get_shingles(text: String, len: u16) -> Vec<String> {
            let text = canonize(text);
            let split: Vec<&str> = text.split_whitespace().collect();
            let length = len as usize;
            if split.len() < length {
                return Vec::new();
            }
            let mut str: Vec<String> = Vec::new();
            for i in 0..(split.len() - length + 1) {
                let mut buf = String::new();
                for y in i..i + length {
                    buf = buf + " " + split[y];
                }
                let el = String::from(buf.trim()).to_lowercase();
                str.push(el);
            }
            // One thread per shingle computes the hash.
            let mut handles: Vec<_> = Vec::with_capacity(str.len());
            for item in str {
                handles.push(
                    thread::spawn(move || {
                        let bytes: &[u8] = item.as_bytes();
                        let mut hash = Md5::new();
                        hash.input(bytes);
                        hash.result_str()
                    })
                )
            }
            let mut res: Vec<String> = Vec::new();
            for h in handles {
                match h.join() {
                    Ok(r) => res.push(r),
                    Err(err) => println!("error {:?}", err),
                };
            }
            res
        }

        let shingles1 = get_shingles(text1_, length);
        let shingles2 = get_shingles(text2_, length);
        if shingles1.len() == 0 || shingles2.len() == 0 {
            return 0 as f32;
        }
        // Count matching pairs of shingle hashes.
        let mut same = 0;
        for item in &shingles1 {
            for el in &shingles2 {
                if *item == *el {
                    same += 1;
                }
            }
        }
        // Percentage relative to the shorter shingle list.
        same = same * 100;
        let length_text = min(shingles1.len(), shingles2.len());
        let length_text_f = length_text as f32;
        let same_f = same as f32;
        let result: f32 = same_f / length_text_f;
        result
    }


As I have already mentioned, the shingle algorithm is heavily simplified here: the canonization stage is reduced to stripping punctuation and removing words no longer than two characters (such words usually carry no semantic meaning). Strictly speaking, the text should also be cleared of adjectives, and word endings should be dropped so that only the roots remain. But for my purposes this implementation is more than sufficient.
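The simplified algorithm described above can be sketched using only the standard library. This is an illustration under stated assumptions, not the library's code: `DefaultHasher` stands in for MD5, canonization only lowercases and splits on non-alphanumeric characters (it does not strip HTML tags), and the names `canonize`, `shingles` and `similarity` are chosen for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Canonize: lowercase, split on non-alphanumeric characters,
/// and drop words of two characters or fewer.
fn canonize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| w.chars().count() > 2)
        .map(String::from)
        .collect()
}

/// Hash every window of `len` consecutive words
/// (DefaultHasher is an assumption; the article uses MD5).
fn shingles(text: &str, len: usize) -> Vec<u64> {
    let words = canonize(text);
    if len == 0 || words.len() < len {
        return Vec::new();
    }
    words
        .windows(len)
        .map(|w| {
            let mut hasher = DefaultHasher::new();
            w.join(" ").hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}

/// Percentage of matching shingle pairs relative to the shorter list,
/// mirroring the nested-loop counting in `count_same`.
fn similarity(a: &str, b: &str, len: usize) -> f32 {
    let (s1, s2) = (shingles(a, len), shingles(b, len));
    if s1.is_empty() || s2.is_empty() {
        return 0.0;
    }
    let same: usize = s1
        .iter()
        .map(|x| s2.iter().filter(|y| *y == x).count())
        .sum();
    same as f32 * 100.0 / s1.len().min(s2.len()) as f32
}
```

For instance, `similarity("red green blue", "red green yellow", 2)` compares the shingles "red green" and "green blue" with "red green" and "green yellow"; one pair matches out of two, giving 50.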

In the library code I will focus only on points related to the implementation of the FFI.
 extern crate libc; 

The purpose of this crate is to provide all the definitions needed to interact with C code on every platform that Rust supports: type definitions, constants, and function headers.

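The attribute
 #[no_mangle] 
disables name mangling, the default behavior in which the compiler changes function names, so the function is exported under exactly the name written in the source.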

 pub extern "C" 
indicates that the function is visible outside this module (pub) and follows the C calling convention, that is, the C binary interface (ABI).
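A minimal sketch of what the C ABI marker means in practice. The function `percent`, the type alias `PercentFn` and the helper `call_via_pointer` are hypothetical names for this illustration and are not part of the library. The attribute is left out of the code so that the snippet stays independent of the compiler edition; in a real dynamic library it would be applied as shown in the article.

```rust
/// Hypothetical helper: a function that uses the C calling convention
/// and only C-compatible primitive types in its signature.
/// In a real dynamic library the no_mangle attribute would also be
/// applied so that the exported symbol keeps this exact name.
pub extern "C" fn percent(same: u32, total: u32) -> f32 {
    if total == 0 {
        return 0.0;
    }
    same as f32 * 100.0 / total as f32
}

/// The ABI is part of the function's type: this pointer type only
/// accepts functions declared with `extern "C"`.
pub type PercentFn = extern "C" fn(u32, u32) -> f32;

/// Call through the C-ABI function pointer, the way a foreign caller
/// would after looking the symbol up in the library.
pub fn call_via_pointer(f: PercentFn, same: u32, total: u32) -> f32 {
    f(same, total)
}
```

Passing a plain Rust `fn` (without `extern "C"`) to `call_via_pointer` would be rejected by the type checker, which is one way to see that the calling convention is a compile-time property of the function.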

 text1: *const libc::c_char,
 text2: *const libc::c_char
These are the parameter types used to receive C strings: raw pointers to NUL-terminated character arrays (const char * on the C side).

    let buf1 = unsafe { CStr::from_ptr(text1).to_bytes() };
    let text1_ = match String::from_utf8(buf1.to_vec()) {
        Ok(val) => val,
        Err(_) => String::new(),
    };
    let buf2 = unsafe { CStr::from_ptr(text2).to_bytes() };
    let text2_ = match String::from_utf8(buf2.to_vec()) {
        Ok(val) => val,
        Err(_) => String::new(),
    };
Here reading the bytes behind a raw pointer to a C string is marked as unsafe code, because the compiler cannot verify that the pointer is valid. The bytes are then converted to the String type; if they are not valid UTF-8, an empty string is used instead.
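The conversion above assumes that the caller never passes a null pointer. A hedged variant that also handles that case could look like the sketch below; the helper name `c_text_to_string` is hypothetical, and replacing invalid UTF-8 with U+FFFD (instead of returning an empty string, as the library does) is a design choice made for this example.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Convert a C string to an owned `String`.
/// Returns an empty string for a null pointer; invalid UTF-8 sequences
/// are replaced with U+FFFD rather than discarding the whole text.
///
/// Safety: `ptr` must be null or point to a NUL-terminated string that
/// stays valid for the duration of the call.
pub unsafe fn c_text_to_string(ptr: *const c_char) -> String {
    if ptr.is_null() {
        return String::new();
    }
    // The pointer is non-null; validity is the caller's responsibility.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_string_lossy().into_owned()
}
```

On the Rust side the helper can be exercised with `std::ffi::CString`, which produces a NUL-terminated buffer and a pointer to it via `as_ptr()`.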

That's all. An important note: do not panic when writing a dynamic library that is used through FFI! A panic! that unwinds across the FFI boundary leads to undefined behavior. If your code can panic, move it to another thread so that the panic does not propagate into C.
Example
    use std::thread;

    #[no_mangle]
    pub extern fn oh_no() -> i32 {
        let h = thread::spawn(|| {
            panic!("Oops!");
        });

        match h.join() {
            Ok(_) => 1,
            Err(_) => 0,
        }
    }
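As an alternative to spawning a thread, the standard library function `std::panic::catch_unwind` can stop a panic at the boundary. The sketch below is an illustration only: `window_count` and `window_count_ffi` are hypothetical names, and it assumes the crate is built with the default unwinding panic strategy (with `panic = "abort"` nothing can be caught).

```rust
use std::panic;

/// Hypothetical internal function that may panic:
/// the number of shingles of `len` words in a text of `words` words.
fn window_count(words: u32, len: u32) -> u32 {
    if len == 0 || len > words {
        panic!("invalid shingle length");
    }
    words - len + 1
}

/// C-ABI wrapper that never lets a panic unwind into the caller.
/// Returns the count, or -1 if the computation panicked.
pub extern "C" fn window_count_ffi(words: u32, len: u32) -> i64 {
    match panic::catch_unwind(|| window_count(words, len)) {
        Ok(count) => i64::from(count),
        Err(_) => -1,
    }
}
```

The return type is widened to `i64` so that -1 can never collide with a valid count. The default panic hook still prints the panic message to stderr before the wrapper returns.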


Compile the library with the command
 cargo build 

The compiled library will be located in the directory
 /target/debug/ 
and will have the .so extension. Now let's connect the library from other languages.

Node.js
To load our dynamic library on this platform, we use the npm package named “ffi”.
Here is an example of connecting the library:
index.js
    var FFI = require('ffi');

    var lib = FFI.Library('./target/debug/liblibshingles.so', {
      'count_same': [ 'float', [ 'string', 'string', 'int' ] ]
    });

    module.exports = lib.count_same;

    var text1 = "   ,      , -              ,            ,  -    . «   ,        2010-,  2011-   ,     .     , —   . —     - ,       ,    .    —  ».    , , -           ,      .   , ,   ,          ,     . «  ,      ,     .      , —   . —   .    .     ,      ,        .             —   ».";
    var text2 = "          « »,      ,   ,      ,  -    .              . «      .    25 ,    .          50  ,         », —   . « »     , , , , , , , ,    .";

    lib.count_same.async(text1, text2, 2, function(err, res) {
      if(err) {
        return console.log('err', err);
      }
      console.log(res)
    });


The ffi package can execute the library call in another thread using libuv, which preserves the asynchronous style of Node.js code. More information here .

The materials used in writing the article:
RUST FFI Embedding projects for safe, concurrent, and fast code anywhere
Node FFI Tutorial
Foreign function interface
Foreign Function Interface Rust book

Also thanks to comrade mkpankov

Code on GitHub

Source: https://habr.com/ru/post/270137/

