
Go for big data


In this post, we will talk about using the Intel Data Analytics Acceleration Library (Intel DAAL) with the Go programming language for batch, online, and distributed processing.

Many of the most modern infrastructure projects are built with Go, including Kubernetes, Docker, Consul, etcd, and many others. Go is becoming the preferred language for DevOps, web servers, and microservices. It is easy to learn, easy to deploy, and very fast, and it comes with an excellent set of development tools.

Data processing and analysis are used in business more and more often, so resource-intensive computational algorithms have to be implemented at every level of a company's infrastructure, including the levels where Go is used. A natural question arises: how do we integrate solutions such as machine learning, distributed data transformation, and online data analysis into Go-based systems?

One way to get reliable, fast, and scalable data processing in Go is to use the Intel Data Analytics Acceleration Library (Intel DAAL) from Go programs. This library provides batch, online, and distributed processing algorithms for a variety of useful tasks.


Since Go interacts well with C/C++, this functionality can be brought into Go programs without much difficulty. We also gain significantly in speed, because these libraries are already optimized for Intel architecture. As shown here, for certain operations, such as principal component analysis (PCA), Intel DAAL can run seven times faster than Spark with MLlib. That's great! It would be very useful to have that kind of power in Go applications.

Installing Intel DAAL


The Intel DAAL library is available as open source; follow these instructions to install it. On my Linux computer, this was incredibly easy.

  1. Download the source code.
  2. Run the install script.
  3. Set the necessary environment variables (you can also use the shell script provided for this).

Before integrating Intel DAAL into a Go program, it makes sense to make sure everything works correctly. To do this, you can use the getting started guides in the Intel DAAL documentation. In particular, these guides provide an example Intel DAAL application for the Cholesky decomposition algorithm. Below we will recreate it in Go. The original C++ example of the Cholesky decomposition looks like this.

    /*******************************************************************************
    !  Copyright(C) 2014-2017 Intel Corporation. All Rights Reserved.
    !
    !  The source code, information and material ("Material") contained herein is
    !  owned by Intel Corporation or its suppliers or licensors, and title to such
    !  Material remains with Intel Corporation or its suppliers or licensors. The
    !  Material contains proprietary information of Intel or its suppliers and
    !  licensors. The Material is protected by worldwide copyright laws and treaty
    !  provisions. No part of the Material may be used, copied, reproduced,
    !  modified, published, uploaded, posted, transmitted, distributed or disclosed
    !  in any way without Intel's prior express written permission. No license
    !  under any patent, copyright or other intellectual property rights in the
    !  Material is granted to or conferred upon you, either expressly, by
    !  implication, inducement, estoppel or otherwise. Any license under such
    !  intellectual property rights must be express and approved by Intel in
    !  writing.
    !
    !  *Third Party trademarks are the property of their respective owners.
    !
    !  Unless otherwise agreed by Intel in writing, you may not remove or alter
    !  this notice or any other notice embedded in Materials by Intel or Intel's
    !  suppliers or licensors in any way.
    !
    !*******************************************************************************
    !  Content:
    !    Cholesky decomposition sample program.
    !******************************************************************************/

    #include "daal.h"
    #include <iostream>

    using namespace daal;
    using namespace daal::algorithms;
    using namespace daal::data_management;
    using namespace daal::services;

    const size_t dimension = 3;
    double inputArray[dimension * dimension] =
    {
        1.0, 2.0, 4.0,
        2.0, 13.0, 23.0,
        4.0, 23.0, 77.0
    };

    int main(int argc, char *argv[])
    {
        /* Create input numeric table from array */
        SharedPtr<NumericTable> inputData = SharedPtr<NumericTable>(new Matrix<double>(dimension, dimension, inputArray));

        /* Create the algorithm object for computation of the Cholesky decomposition using the default method */
        cholesky::Batch<> algorithm;

        /* Set input for the algorithm */
        algorithm.input.set(cholesky::data, inputData);

        /* Compute Cholesky decomposition */
        algorithm.compute();

        /* Get pointer to Cholesky factor */
        SharedPtr<Matrix<double> > factor =
            staticPointerCast<Matrix<double>, NumericTable>(algorithm.getResult()->get(cholesky::choleskyFactor));

        /* Print the first element of the Cholesky factor */
        std::cout << "The first element of the Cholesky factor: " << (*factor)[0][0];

        return 0;
    }

Try compiling and running this code to make sure the Intel DAAL installation was successful. It will also give you an idea of what we are going to do in Go. Any questions or problems with installing Intel DAAL can be discussed on the Intel DAAL forum (I personally found this forum extremely useful when I started working with Intel DAAL).
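For a quick sanity check that does not depend on the library, the same decomposition can be computed in plain Go. The following is only a minimal sketch, not Intel DAAL's implementation: choleskyLower is a hypothetical helper that applies the textbook Cholesky-Banachiewicz recurrence to the 3 x 3 matrix from the sample above, assuming the values are stored row by row.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// choleskyLower returns the lower-triangular factor L (row-major, n*n values)
// of a symmetric positive-definite matrix A, so that A equals L times its
// transpose. This is a reference sketch, not the Intel DAAL algorithm.
func choleskyLower(n int, a []float64) ([]float64, error) {
	if n <= 0 || len(a) != n*n {
		return nil, errors.New("input must hold exactly n*n values")
	}
	l := make([]float64, n*n)
	for i := 0; i < n; i++ {
		for j := 0; j <= i; j++ {
			// Subtract the contribution of the columns already computed.
			sum := a[i*n+j]
			for k := 0; k < j; k++ {
				sum -= l[i*n+k] * l[j*n+k]
			}
			if i == j {
				if sum <= 0 {
					return nil, errors.New("matrix is not positive definite")
				}
				l[i*n+j] = math.Sqrt(sum)
			} else {
				l[i*n+j] = sum / l[j*n+j]
			}
		}
	}
	return l, nil
}

func main() {
	// The same input matrix as in the C++ sample.
	a := []float64{
		1, 2, 4,
		2, 13, 23,
		4, 23, 77,
	}
	l, err := choleskyLower(3, a)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i := 0; i < 3; i++ {
		fmt.Println(l[i*3 : i*3+3])
	}
}
```

For this input the factor has the rows [1 0 0], [2 3 0], and [4 5 6], so its first element is 1, which is the value the sample program prints.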

Using Intel DAAL in Go programs


There are several options for using the Intel DAAL library from Go programs.

  1. Call the Intel DAAL library directly from Go through a wrapper function.
  2. Create a reusable library that wraps specific Intel DAAL functionality.

Below I demonstrate both approaches. All source code is available here. This is just one example, and it would be great if over time other Go programs using Intel DAAL were added to this repository. If you experiment, please submit pull requests. I would be very interested to see what you create.

If you have not used Go before, I recommend getting better acquainted with the language before continuing with this article. Note that Go does not even need to be installed on your local computer to start learning it. You can take the online tour of Go and use the Go Playground, and only when you are ready install Go on your local computer.

Calling the Intel DAAL library directly from Go


Go provides a tool called cgo, which lets you create Go packages that call C code. Here we will use cgo to let our Go program interact with the Intel DAAL library.

By the way, using cgo in Go programs comes with certain limitations, which are discussed in some detail online (in particular, see the discussion by Dave Cheney or this article from Cockroach Labs). When deciding to use cgo, always take these limitations into account, or at least keep them in mind. In this case, we are willing to accept the cgo limitations in order to take advantage of the optimized, distributed Intel DAAL library: for certain computationally intensive or data-heavy workloads, the improved performance more than pays for them.

To integrate the Cholesky decomposition algorithm from Intel DAAL into Go, you need to create the following folder structure (in the $GOPATH directory).

cholesky
├── cholesky.go
├── cholesky.hxx
└── cholesky.cxx


The file cholesky.go is our Go program, which will use the Cholesky decomposition algorithm from the Intel DAAL library. The cholesky.cxx and cholesky.hxx files are the C++ definitions and declarations that include Intel DAAL and tell cgo which Intel DAAL functionality we will use. Let's look at each of them.

First, the *.cxx file.

    #include "cholesky.hxx"
    #include "daal.h"
    #include <iostream>

    using namespace daal;
    using namespace daal::algorithms;
    using namespace daal::data_management;
    using namespace daal::services;

    int choleskyDecompose(int dimension, double inputArray[]) {

        /* Create input numeric table from array */
        SharedPtr<NumericTable> inputData = SharedPtr<NumericTable>(new Matrix<double>(dimension, dimension, inputArray));

        /* Create the algorithm object for computation of the Cholesky decomposition using the default method */
        cholesky::Batch<> algorithm;

        /* Set input for the algorithm */
        algorithm.input.set(cholesky::data, inputData);

        /* Compute Cholesky decomposition */
        algorithm.compute();

        /* Get pointer to Cholesky factor */
        SharedPtr<Matrix<double> > factor =
            staticPointerCast<Matrix<double>, NumericTable>(algorithm.getResult()->get(cholesky::choleskyFactor));

        /* Return the first element of the Cholesky factor (truncated to int by the return type) */
        return (*factor)[0][0];
    }

Now the *.hxx file.

    #ifndef CHOLESKY_H
    #define CHOLESKY_H

    // __cplusplus gets defined when a C++ compiler processes the file.
    // extern "C" is needed so the C++ compiler exports the symbols w/out name issues.
    #ifdef __cplusplus
    extern "C" {
    #endif

    int choleskyDecompose(int dimension, double inputArray[]);

    #ifdef __cplusplus
    }
    #endif

    #endif

These files define a C++ wrapper function, choleskyDecompose, which uses the Intel DAAL Cholesky decomposition algorithm to decompose the input matrix and return the first element of the Cholesky factor (as in the example from the Intel DAAL getting started guide). Note that the input here is a flat array whose length is the square of the matrix dimension (that is, a 3 x 3 matrix corresponds to an input array of length 9). The extern "C" block in the *.hxx file is required: it tells the C++ compiler to export the function under an unmangled C name, so that cgo can find it.
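The flat-array convention is easy to get wrong when the matrix starts out as rows. As an illustration only, here is a small sketch of a hypothetical helper, flattenSquare, that checks a matrix is square and copies it into one slice of length dimension x dimension. Row-major order is an assumption here; for the symmetric matrices that Cholesky decomposition expects, row-major and column-major layouts hold the same values.

```go
package main

import (
	"errors"
	"fmt"
)

// flattenSquare copies a square matrix, given as a slice of rows, into a
// single row-major slice and returns the dimension together with the data.
func flattenSquare(rows [][]float64) (int, []float64, error) {
	n := len(rows)
	if n == 0 {
		return 0, nil, errors.New("matrix is empty")
	}
	flat := make([]float64, 0, n*n)
	for i, row := range rows {
		if len(row) != n {
			return 0, nil, fmt.Errorf("row %d has %d values, want %d", i, len(row), n)
		}
		flat = append(flat, row...)
	}
	return n, flat, nil
}

func main() {
	n, flat, err := flattenSquare([][]float64{
		{1, 2, 4},
		{2, 13, 23},
		{4, 23, 77},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Element (row, col) of the matrix lives at index row*n + col.
	fmt.Println(n, len(flat), flat[1*n+2])
}
```

With the matrix from this article, the program reports a dimension of 3, a length of 9, and the value 23 at row 1, column 2.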

After defining the Cholesky decomposition wrapper function in the *.cxx and *.hxx files, you can call it directly from Go. cholesky.go looks like this.

    package main

    // #cgo CXXFLAGS: -I$DAALINCLUDE
    // #cgo LDFLAGS: -L$DAALLIB -ldaal_core -ldaal_sequential -lpthread -lm
    // #include "cholesky.hxx"
    import "C"

    import (
    	"fmt"
    	"unsafe"
    )

    func main() {

    	// Define the input matrix as an array.
    	inputArray := [9]float64{
    		1.0, 2.0, 4.0,
    		2.0, 13.0, 23.0,
    		4.0, 23.0, 77.0,
    	}

    	// Get the first Cholesky decomposition factor.
    	data := (*C.double)(unsafe.Pointer(&inputArray[0]))
    	factor := C.choleskyDecompose(3, data)

    	// Output the first Cholesky decomposition factor to stdout.
    	fmt.Printf("The first Cholesky decomp. factor is: %d\n", factor)
    }

Let's go through this step by step to understand what is going on. First, you need to tell Go to use cgo when compiling the program, and to compile with certain flags.

    // #cgo CXXFLAGS: -I$DAALINCLUDE
    // #cgo LDFLAGS: -L$DAALLIB -ldaal_core -ldaal_sequential -lpthread -lm
    // #include "cholesky.hxx"
    import "C"

The import "C" statement is required: "C" is a pseudo-package that signals the use of cgo. If a comment immediately precedes the import "C" statement, that comment (called the preamble) is used as a header when compiling the C++ components of the package.

With CXXFLAGS and LDFLAGS you specify the compiler and linker flags that cgo should use, and // #include "cholesky.hxx" brings in our C++ function. To compile this example I used Linux and gcc, as reflected in the flags above. For other setups, you can follow this guide to determine how to build an application with Intel DAAL.

After that, you can write Go code as in any other program and call our wrapper function as C.choleskyDecompose().

    // Define the input matrix as an array.
    inputArray := [9]float64{
    	1.0, 2.0, 4.0,
    	2.0, 13.0, 23.0,
    	4.0, 23.0, 77.0,
    }

    // Get the first Cholesky decomposition factor.
    data := (*C.double)(unsafe.Pointer(&inputArray[0]))
    factor := C.choleskyDecompose(3, data)

    // Output the first Cholesky decomposition factor to stdout.
    fmt.Printf("The first Cholesky decomp. factor is: %d\n", factor)

One thing specific to this case (because of cgo) is that the pointer to the first element of our float64 array has to be converted to an unsafe pointer, which can then be explicitly converted to a *C.double pointer (compatible with C++) and passed to our choleskyDecompose function. Going through an unsafe pointer lets us bypass Go's type safety restrictions.
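To make the pointer conversion less mysterious, here is a small sketch that runs without cgo or Intel DAAL. It is an illustration under stated assumptions: *float64 stands in for *C.double (both are 64-bit doubles), and readAt is a hypothetical helper that mimics how the C++ side walks the contiguous memory behind the pointer it receives.

```go
package main

import (
	"fmt"
	"unsafe"
)

// readAt treats first as the start of a contiguous block of doubles and reads
// the element at offset i, the way C code indexes a double[] argument.
// There is no bounds checking, which is exactly why the pointer is "unsafe".
func readAt(first *float64, i int) float64 {
	p := unsafe.Add(unsafe.Pointer(first), uintptr(i)*unsafe.Sizeof(*first))
	return *(*float64)(p)
}

func main() {
	inputArray := [9]float64{
		1.0, 2.0, 4.0,
		2.0, 13.0, 23.0,
		4.0, 23.0, 77.0,
	}

	// The same two-step conversion as in the cgo call, with *float64 standing
	// in for *C.double so that the example builds without a C toolchain.
	data := (*float64)(unsafe.Pointer(&inputArray[0]))

	// Read the diagonal of the 3 x 3 matrix through the raw pointer.
	fmt.Println(readAt(data, 0), readAt(data, 4), readAt(data, 8))
}
```

Because a Go array is laid out contiguously, a pointer to its first element is all the callee needs, provided it is also told how many elements it may read. The sketch uses unsafe.Add, which requires Go 1.17 or later.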
Great! We now have a Go program that calls the Cholesky decomposition algorithm from the Intel DAAL library. Now it is time to build and run it. This can be done in the usual way with go build.

    $ ls
    cholesky.cxx  cholesky.go  cholesky.hxx
    $ go build
    $ ls
    cholesky  cholesky.cxx  cholesky.go  cholesky.hxx
    $ ./cholesky
    The first Cholesky decomp. factor is: 1
    $

And there is the result! As expected, the first element of the Cholesky factor is 1. We have successfully used the Intel DAAL library directly from Go. However, our Go program looks rather odd with its unsafe pointers and C code fragments. It is also a one-off solution. Next we will implement the same functionality as a reusable Go package that can be imported like any other Go package.

Creating a reusable Go package with Intel DAAL


To create a Go package containing Intel DAAL functionality, we will use SWIG. In addition to cgo, Go can invoke SWIG at build time to compile Go packages that wrap C/C++ functionality. For this build you will need the following folder structure.

choleskylib
├── cholesky.go
├── cholesky.hxx
├── cholesky.cxx
└── cholesky.swigcxx


The *.cxx and *.hxx wrapper files can stay the same. What is new is the *.swigcxx file, which looks like this.

    %{
    #include "cholesky.hxx"
    %}

    %include "cholesky.hxx"

From this file, SWIG generates wrapper code for the Cholesky decomposition function, which allows the code to be used as a Go package.

In addition, since we are creating a reusable Go package (rather than a standalone application), the *.go file does not need package main or a main function. It only has to define the name of our package. In this case, let's call it cholesky. Now cholesky.go looks like this.

    package cholesky

    // #cgo CXXFLAGS: -I$DAALINCLUDE
    // #cgo LDFLAGS: -L$DAALLIB -ldaal_core -ldaal_sequential -lpthread -lm
    import "C"

(Again, the build flags are supplied in the preamble.)

Now you can build the package and install it locally.

    $ ls
    cholesky.cxx  cholesky.go  cholesky.hxx  cholesky.swigcxx
    $ go install
    $

This command compiles all the binaries and libraries that Go programs need in order to use this package. Go sees that there is a *.swigcxx file in our folder and automatically uses SWIG to build the package.

Excellent! We now have a Go package that uses Intel DAAL. Let's see how importing and using the package works.

    package main

    import (
    	"fmt"

    	"github.com/dwhitena/daal-go/choleskylib"
    )

    func main() {

    	// Define the input matrix as an array.
    	inputArray := [9]float64{
    		1.0, 2.0, 4.0,
    		2.0, 13.0, 23.0,
    		4.0, 23.0, 77.0,
    	}

    	// Get the first Cholesky decomposition factor.
    	factor := cholesky.CholeskyDecompose(3, &inputArray[0])

    	// Output the first Cholesky decomposition factor to stdout.
    	fmt.Printf("The first Cholesky decomp. factor is: %d\n", factor)
    }

Great! This code is much cleaner than calling Intel DAAL directly. You can import the Cholesky package like any other Go package and call the wrapped function as cholesky.CholeskyDecompose(...). In addition, all the unsafe parts are handled automatically by SWIG. Now you can simply pass the address of the first element of our float64 array to cholesky.CholeskyDecompose(...).

Like any other Go program, this one can be compiled with go build and then run:

    $ ls
    main.go
    $ go build
    $ ls
    example  main.go
    $ ./example
    The first Cholesky decomp. factor is: 1
    $

Hooray! Everything works. Now you can use this package in any other Go program that needs the Cholesky decomposition algorithm.
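One thing to keep in mind when reusing the package: the C++ side reads dimension x dimension values starting at the pointer it is given, and nothing checks that the Go side really holds that many. A thin Go-facing function can validate this first. The sketch below is an assumption-laden illustration, not part of the package: firstFactor and decomposeFunc are hypothetical names, and the backend is a stand-in with the same shape as the call used in this article, so the example runs without Intel DAAL or SWIG.

```go
package main

import (
	"errors"
	"fmt"
)

// decomposeFunc has the shape of the wrapped call used in this article:
// a dimension plus a pointer to the first element. It is a stand-in type.
type decomposeFunc func(dimension int, first *float64) int

// firstFactor checks that data holds exactly dimension*dimension values
// before a raw pointer to it is handed to the backend.
func firstFactor(backend decomposeFunc, dimension int, data []float64) (int, error) {
	if dimension <= 0 {
		return 0, errors.New("dimension must be positive")
	}
	if len(data) != dimension*dimension {
		return 0, fmt.Errorf("got %d values, want %d", len(data), dimension*dimension)
	}
	return backend(dimension, &data[0]), nil
}

func main() {
	// Fake backend for illustration only: it returns the truncated first
	// input value instead of calling into Intel DAAL.
	fake := func(dimension int, first *float64) int { return int(*first) }

	// Too few values for a 3 x 3 matrix: rejected before any pointer is passed.
	_, err := firstFactor(fake, 3, []float64{1, 2, 4})
	fmt.Println(err)

	// A complete 3 x 3 matrix goes through to the backend.
	v, err := firstFactor(fake, 3, []float64{1, 2, 4, 2, 13, 23, 4, 23, 77})
	fmt.Println(v, err)
}
```

In a real package the backend would be the SWIG-generated function, and callers would pass an ordinary slice instead of taking the address of an element themselves.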

Conclusions and resources


With the help of Intel DAAL, cgo, and SWIG, we were able to embed an optimized Cholesky decomposition algorithm into Go programs. Of course, the possibilities are not limited to this algorithm. In the same way, you can create Go programs and packages that use any of the algorithms implemented in Intel DAAL. You can build batch, online, and distributed neural networks, clustering, boosting, collaborative filtering, and more directly in Go applications.

All the code used above is available here.

Go Programming Resources


DAAL Resources


About the author


Daniel (@dwhitena) is a Ph.D. and an experienced data scientist who works at Pachyderm (@pachydermIO). He develops modern distributed data pipelines that include predictive models, data visualization, statistical analysis, and more. He has spoken at conferences around the world (ODSC, Spark Summit, Datapalooza, DevFest Siberia, GopherCon, and others), teaches data science and analysis at Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and actively contributes to various open source data science projects.

Source: https://habr.com/ru/post/338002/

