Features of the Apache Ignite ML Grid Machine Learning Library
The Apache Ignite 2.0 release includes a beta version of the Apache Ignite Machine Learning Grid (ML Grid), a machine learning library built on the highly optimized and scalable Apache Ignite Memory-Centric Platform API.
Source: xkcd
Read on to find out what the new library can do and how to work with it.
In release 2.0, the library mainly includes basic functionality: local and distributed vector and matrix algebra operations on both ordinary (dense) and sparse data structures. The data itself can be stored in regular JVM heap memory, in off-heap memory, or in the distributed Ignite cache.
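To make the difference between ordinary (dense) and sparse storage concrete, here is a minimal plain-Java sketch. It does not use the ML Grid API: the class SparseVsDenseSketch and its methods are hypothetical names chosen only for illustration. A dense vector keeps every element in an array, while a sparse one keeps only the non-zero elements, here in a map from index to value.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java illustration only; these are NOT Ignite ML Grid classes. */
final class SparseVsDenseSketch {
    /** Dot product of two dense vectors stored as full arrays. */
    static double dotDense(double[] a, double[] b) {
        if (a.length != b.length)
            throw new IllegalArgumentException("Vector sizes differ");

        double sum = 0.0;
        for (int i = 0; i < a.length; i++)
            sum += a[i] * b[i];

        return sum;
    }

    /** Dot product of a sparse vector (index -> non-zero value) with a dense one. */
    static double dotSparse(Map<Integer, Double> sparse, double[] dense) {
        double sum = 0.0;
        // Only the stored non-zero entries are visited; zeros are never touched.
        for (Map.Entry<Integer, Double> e : sparse.entrySet())
            sum += e.getValue() * dense[e.getKey()];

        return sum;
    }

    public static void main(String[] args) {
        double[] dense = {1.0, 0.0, 0.0, 2.0};
        double[] other = {3.0, 4.0, 5.0, 6.0};

        // The same vector as 'dense', but storing only its two non-zero elements.
        Map<Integer, Double> sparse = new HashMap<>();
        sparse.put(0, 1.0);
        sparse.put(3, 2.0);

        System.out.println(dotDense(dense, other));   // 1*3 + 2*6
        System.out.println(dotSparse(sparse, other)); // same result, fewer multiplications
    }
}
```

Both calls compute the same value; the sparse variant simply does work proportional to the number of non-zero elements, which is why sparse structures pay off on mostly empty data.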
Those who have already used other popular machine learning libraries, such as Apache Mahout or Colt, will find many things familiar. This is no accident: one of the goals in designing the Apache Ignite ML Grid API was ease of use for people who are already accustomed to typical machine learning libraries.
Many users were already using Ignite to accelerate matrix and vector operations even before ML Grid appeared. Ignite allows the Compute and Data grids to be used together for highly efficient processing of sparse data sets. With the arrival of ML Grid, this becomes much easier.
Future releases are planned to expand the library further, in particular by adding distributed versions of popular algorithms used in machine learning.
Building on the distributed algebra in release 2.0, the Apache Ignite community plans to add classification, regression analysis, k-means clustering, decision trees, and more. Much of this is slated for the upcoming release 2.1 (in particular, linear regression and k-means).
Longer-term plans include developing Python and R libraries as part of the Ignite ML stack.
For the time being, the new module offers rather modest capabilities: as already mentioned, this is a beta version with mostly basic functionality. If you want to learn how to use the new API right now, without waiting for the next releases, the second part of this post is for you.
How to work with ML Grid in Apache Ignite 2.0
Probably the fastest way to get to know ML Grid is to build and run the examples included in the release, then study their output and code. The ML examples can be found in the examples directory of the Apache Ignite distribution. You can also get the sample code from GitHub.
Step-by-step instructions for getting started with examples:
- Install Java 8 or later.
- Download Apache Ignite 2.0 or later.
- Open the examples project in an IDE, for example IntelliJ IDEA or Eclipse.
- Activate the Maven ml profile in the project settings (in Ignite 2.0 it is disabled by default).
- Open the src/main/ml directory in the IDE and run the ML Grid examples, as explained below.
- Find and run an example that interests you (for instance, SparseDistributedMatrixExample).
- Observe the execution of the example and its console output.
- When running TracerExample, note what is displayed in the browser.
- Optionally, edit the sample code and run it again to see the effect of the changes.
For example, if the TracerExample code is changed to build a larger matrix (described below), running it produces much more striking output in the browser.

This gives you a more concrete idea of how the matrix algebra methods are implemented in Ignite ML.
In this example, a 100 by 100 matrix is created first. Its elements are then filled with values proportional to the product of their row and column indices. Finally, the matrix is rendered as HTML using the Tracer API.
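The three steps just described (create a 100 by 100 matrix, fill it with values proportional to the product of the row and column indices, render it as HTML) can be sketched in plain Java. This is only a stand-in for illustration: it does not call the Ignite Tracer API, and the class MatrixHtmlSketch, its methods, and the grey-scale colour mapping are assumptions rather than the code of the modified TracerExample.

```java
import java.util.Locale;

/** Plain-Java stand-in; it does NOT use the Ignite Tracer API. */
final class MatrixHtmlSketch {
    /** Builds an n-by-n matrix whose (i, j) element is i * j scaled into [0, 1). */
    static double[][] productMatrix(int n) {
        double[][] m = new double[n][n];

        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = (double)i * j / ((double)n * n);

        return m;
    }

    /** Renders the matrix as an HTML table; larger values get darker cells. */
    static String toHtml(double[][] m) {
        StringBuilder sb = new StringBuilder("<table>\n");

        for (double[] row : m) {
            sb.append("<tr>");

            for (double v : row) {
                // Assumed colour mapping: 0.0 -> white, values near 1.0 -> almost black.
                int grey = (int)Math.round(255 * (1.0 - v));

                sb.append(String.format(Locale.ROOT,
                    "<td style=\"background:rgb(%d,%d,%d)\" title=\"%.4f\"></td>",
                    grey, grey, grey, v));
            }

            sb.append("</tr>\n");
        }

        return sb.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        double[][] m = productMatrix(100);
        String html = toHtml(m);

        System.out.println("Generated " + html.length() + " characters of HTML");
    }
}
```

Writing the returned string to a file and opening it in a browser shows a gradient that darkens toward the bottom-right corner, where the product of the indices is largest.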
ML Grid examples do not require any special configuration. Each of them can simply be started; it runs and stops on its own, and the result is printed to the console without any user intervention. In addition, when the Tracer API is used, a browser is launched to display the resulting HTML.
The Javadoc also documents the ML Grid classes and methods.
Build from source code
The latest Apache Ignite ML Grid jar build is available in the Maven repository. You can also build the library from source yourself:
- Download the latest release version of Apache Ignite with source code.
- If necessary, clear the local Maven repository to eliminate the impact of previous builds.
- Make sure your Java version is 8 or later.
- Build and install the Apache Ignite Memory-Centric Platform from the root directory of the project:
mvn clean install -DskipTests -Dmaven.javadoc.skip=true -P java8
- Build and install ML Grid from the project root directory:
mvn install -Pml -DskipTests -U -pl modules/ml -am
- Locate the ML Grid jar build in the local Maven repository:
{user_dir}/.m2/repository/org/apache/ignite/ignite-ml/{ignite-version}/ignite-ml-{ignite-version}.jar
- If you need to build the ML Grid examples from source code, run the following commands from the project root directory:
cd examples
mvn clean package -DskipTests -Pml
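As a convenience, the jar location from the step above can be computed with a small helper. This is a hypothetical sketch: the class MlJarLocator is not part of Ignite, it assumes the default local Maven repository under the user's home directory, and the version 2.0.0 is only a placeholder default to be replaced with the version you actually built.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Ignite. */
final class MlJarLocator {
    /** Expected ignite-ml jar location under the given Maven repository root. */
    static Path mlJar(Path repoRoot, String igniteVersion) {
        return repoRoot
            .resolve(Paths.get("org", "apache", "ignite", "ignite-ml", igniteVersion))
            .resolve("ignite-ml-" + igniteVersion + ".jar");
    }

    public static void main(String[] args) {
        // Assumption: the default local repository location, ~/.m2/repository.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");

        // Assumption: placeholder version; pass the real one as the first argument.
        String ver = args.length > 0 ? args[0] : "2.0.0";

        Path jar = mlJar(repo, ver);

        System.out.println(jar + (Files.exists(jar) ? " (found)" : " (not found)"));
    }
}
```

If your Maven settings point the local repository somewhere else, pass that directory to mlJar instead of the default.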
If necessary, refer to the additional documentation in the DEVNOTES.txt file in the project root directory and the README in the ignite-ml ML component directory.