
Riak and Riak Search Yokozuna: First acquaintance





This article describes how to build a simple store of plain text documents on Riak 2.1.1 and how to organize search over them using Riak Search (Yokozuna). The official Erlang client is used as the client library.



To begin with, imagine that we have a huge number of simple text documents, each with a title, a body, a few tags and a creation date, and a huge number of users who want to change them. If that sounds interesting, let's get started.



I assume that readers already have some idea of Riak. If not, it is better to start with an introductory article or, of course, the official documentation.


Installation and Initial Setup



Java must be installed on the system to run Riak Search. Riak itself can be installed on OS X via Homebrew, and Erlang, if required, will be installed automatically:

brew install riak 


For our learning purposes it makes no sense to deploy a whole cluster; a single node is enough, so only minimal configuration is required before starting. In the config you need to enable Riak Search:

## To enable Search set this 'on'.
##
## Default: off
##
## Acceptable values:
##   - on or off
search = on


If Riak was installed via Homebrew, the config is located at /usr/local/Cellar/riak/2.1.1/libexec/etc/riak.conf



You will also need to increase the limit on open files; if this is not done, a warning will be displayed at startup. On OS X Yosemite, I did this:

echo kern.maxfiles=65536 >> /etc/sysctl.conf
echo kern.maxfilesperproc=65536 >> /etc/sysctl.conf
sudo sysctl -w kern.maxfiles=65536
sudo sysctl -w kern.maxfilesperproc=65536
echo ulimit -n 65536 65536 >> ~/.bash_profile
ulimit -n 65536 65536


Now you can run Riak:

 riak console 


And you can test from another shell session:

 riak-admin test 


The response should look something like this: Successfully completed 1 read/write cycle to 'riak@127.0.0.1'



More about installation and settings:





Riak Data Types



Eventually consistent stores, to which Riak belongs, allow so-called data inconsistency: situations where the contents of the same key differ between replicas. Depending on the settings, Riak may try to resolve such conflicts itself using vector clocks or timestamps, or shift the responsibility for determining the correct value onto the application by handing it all available versions (siblings). In a real setup, where your documents are edited by many users and replicated to several nodes, merging conflicting data can become a very difficult task. In that case, perhaps the best solution is to use Riak Data Types (also known as CRDTs). This technology lets you describe data with special types that take on the problem of data convergence across the cluster and free the application from the obligation to resolve conflicts.
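To make the sibling case concrete, here is a minimal sketch of an application resolving a conflict by hand, using the Erlang client that is set up later in the article. The bucket plain-bucket, the key and the "longest value wins" rule are purely illustrative assumptions, and the bucket is presumed to have allow_mult=true:

 {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
 {ok, Obj} = riakc_pb_socket:get(Pid, <<"plain-bucket">>, <<"some-key">>).
 Resolved = case riakc_obj:get_contents(Obj) of
                [{_MD, Single}] ->
                    % Only one version - no conflict
                    Single;
                Siblings ->
                    % Several versions (siblings) - the application has to pick or merge one;
                    % here we naively keep the longest value
                    [{_, Winner} | _] = lists:sort(fun({_, A}, {_, B}) -> byte_size(A) >= byte_size(B) end, Siblings),
                    Winner
            end.
 % Write the chosen value back so the siblings collapse into one
 ok = riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj, Resolved)).

With Riak Data Types this whole branch disappears: the map described below converges on its own.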



Riak Data Types currently implement five CRDTs: counters, sets, maps, registers and flags (registers and flags can only be used inside maps).



In accordance with these types, we will represent our documents as a map with the following structure:
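Each document is a map with four fields; the names and types below match the Erlang examples later in the article:

 title      - register, the document title (a string)
 body       - register, the document text
 tags       - set, a set of tag strings
 created_at - register, the creation date as an ISO8601 string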



Create and activate a bucket-type called documents-type to store our documents:

riak-admin bucket-type create documents-type '{"props":{"datatype":"map"}}'
riak-admin bucket-type activate documents-type


More information on eventual consistency, Riak Data Types and CRDTs:





Riak Search



We want to implement search over our documents by tags, by title, by creation date and by content. For this we will use Riak Search, codenamed Yokozuna, which is essentially an intermediary between the Riak store and the Apache Solr search engine. Yokozuna launches and monitors a separate JVM process with Solr on each cluster node, and passes search queries and data changes on to it.



In order for Solr to know how to index our documents, we need to create a search schema. Riak Search ships with a default schema for all occasions, _yz_default, which is convenient to use during development, but for production it is better to create your own.
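For example, during experiments a throwaway index can be created on top of the built-in schema straight from the Erlang client (RiakPid here is the client connection obtained in the "Access to Riak from Erlang" section below, and the index name scratch-index is only a placeholder):

 % Quick experiment: an index backed by the default _yz_default schema
 ok = riakc_pb_socket:create_search_index(RiakPid, <<"scratch-index">>, <<"_yz_default">>, []).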



Since we have already defined the data structure, we can create the schema right away. In the schema you list the document fields and, for each of them, specify its type, whether an index should be built on it, and whether its value should be stored so that it can be returned in search results. The schema must also include the Riak Search service fields. Note that when Riak Data Types are used, a suffix corresponding to the type is appended to the field names. Thus, we get the following description:

<field name="title_register" type="string" indexed="true" stored="false" multiValued="false" />
<field name="body_register" type="text_ru" indexed="true" stored="false" multiValued="true" />
<field name="tags_set" type="string" indexed="true" stored="false" multiValued="true" />
<field name="created_at_register" type="tdate" indexed="true" stored="false" multiValued="false" omitNorms="true" />


Below is the full contents of the schema file:

docs_search_schema.xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="schedule" version="1.5">
<fields>
   <field name="title_register" type="string" indexed="true" stored="false" multiValued="false" />
   <field name="body_register" type="text_ru" indexed="true" stored="false" multiValued="true" />
   <field name="tags_set" type="string" indexed="true" stored="false" multiValued="true" />
   <field name="created_at_register" type="tdate" indexed="true" stored="false" multiValued="false" omitNorms="true" />

   <!-- Ignore all other fields -->
   <dynamicField name="*" type="ignored" />

   <!-- Riak Search service fields -->
   <field name="_yz_id" type="_yz_str" indexed="true" stored="true" multiValued="false" required="true"/>
   <field name="_yz_ed" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
   <field name="_yz_pn" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
   <field name="_yz_fpn" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
   <field name="_yz_vtag" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
   <field name="_yz_rk" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
   <field name="_yz_rt" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
   <field name="_yz_rb" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
   <field name="_yz_err" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
</fields>

<uniqueKey>_yz_id</uniqueKey>

<types>
   <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
   <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0" sortMissingLast="true" />
   <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" />
       <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
       <!-- less aggressive: <filter class="solr.RussianLightStemFilterFactory"/> -->
     </analyzer>
   </fieldType>
   <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
   <!-- YZ String: Used for non-analyzed fields -->
   <fieldType name="_yz_str" class="solr.StrField" sortMissingLast="true" />
</types>
</schema>




Details on Riak Search:





Access to Riak from Erlang



It's time to finally connect to our node using the Erlang client:

# Fetch and build the client:
git clone https://github.com/basho/riak-erlang-client.git
cd riak-erlang-client/
make
cd ..
# Start the Erlang shell:
erl -pa riak-erlang-client/ebin riak-erlang-client/deps/*/ebin


Connect

{ok, RiakPid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
% Check the connection
pong = riakc_pb_socket:ping(RiakPid).


Create a search schema and index

{ok, Schema} = file:read_file("docs_search_schema.xml").
ok = riakc_pb_socket:create_search_schema(RiakPid, <<"documents-schema">>, Schema).
ok = riakc_pb_socket:create_search_index(RiakPid, <<"documents-index">>, <<"documents-schema">>, []).


Create a bucket and assign a search index

 ok = riakc_pb_socket:set_search_index(RiakPid, {<<"documents-type">>, <<"documents-bucket">>}, <<"documents-index">>). 
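If you want to make sure the index exists and is wired up, you can read it back from the client; a small optional check (the exact shape of the returned proplist may vary between client versions):

 % Should return the index together with its schema and n_val
 {ok, IndexInfo} = riakc_pb_socket:get_search_index(RiakPid, <<"documents-index">>).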


Create a new document

Map = riakc_map:new().
% Document title - a string (register)
Map1 = riakc_map:update({<<"title">>, register},
                        fun(Reg) -> riakc_register:set(<<"DocumentTitle">>, Reg) end, Map).
% Document body - also a string (register)
Map2 = riakc_map:update({<<"body">>, register},
                        fun(Reg) -> riakc_register:set(<<"Some Document Body">>, Reg) end, Map1).
% Tags - a set
Map3 = riakc_map:update({<<"tags">>, set},
                        fun(Set) ->
                            Set1 = riakc_set:add_element(<<"Tag One">>, Set),
                            Set2 = riakc_set:add_element(<<"Tag Two">>, Set1),
                            Set2
                        end, Map2).
% Creation date - a string (register)
Map4 = riakc_map:update({<<"created_at">>, register},
                        fun(Reg) ->
                            % Solr expects dates in ISO8601 format:
                            % https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
                            ISODateFmtStr = "~4.10.0B-~2.10.0B-~2.10.0BT~2.10.0B:~2.10.0B:~2.10.0BZ",
                            {{Year, Month, Day}, {Hour, Min, Sec}} = calendar:universal_time(),
                            ISODate = list_to_binary(io_lib:format(ISODateFmtStr, [Year, Month, Day, Hour, Min, Sec])),
                            riakc_register:set(ISODate, Reg)
                        end, Map3).
% Convert the map into a list of operations and send them to the server
MapOperations = riakc_map:to_op(Map4).
ok = riakc_pb_socket:update_type(RiakPid, {<<"documents-type">>, <<"documents-bucket">>},
                                 <<"DocumentKey">>, MapOperations).


Find and get the documents

% Load the record definitions from the Riak client
rr("riak-erlang-client/include/riakc.hrl").
% Search by title, tag, body and creation date
{ok, Results} = riakc_pb_socket:search(RiakPid, <<"documents-index">>,
    <<"title_register:DocumentTitle AND tags_set:\"Tag One\" AND body_register:Some AND created_at_register:[1972-05-20T17:33:18Z TO NOW]">>).
% Fetch the found documents
Docs = Results#search_results.docs.
lists:foldr(fun({_Index, Doc}, Acc) ->
                {_, DocumentId} = lists:keyfind(<<"_yz_rk">>, 1, Doc),
                {ok, {map, Image, _, _, _}} = riakc_pb_socket:fetch_type(RiakPid,
                    {<<"documents-type">>, <<"documents-bucket">>}, DocumentId),
                [Image | Acc]
            end, [], Docs).


Change the document

riakc_pb_socket:modify_type(RiakPid,
    fun(Map) ->
        % Change the body (the text here is just an example value)
        UpdatedMap1 = riakc_map:update({<<"body">>, register},
                          fun(Register) -> riakc_register:set(<<"Updated Document Body">>, Register) end, Map),
        % Remove a tag
        UpdatedMap2 = riakc_map:update({<<"tags">>, set},
                          fun(Set) -> riakc_set:del_element(<<"Tag One">>, Set) end, UpdatedMap1),
        % Add 10 to the "smiles" counter
        UpdatedMap3 = riakc_map:update({<<"smiles">>, counter},
                          fun(Counter) -> riakc_counter:increment(10, Counter) end, UpdatedMap2),
        UpdatedMap3
    end,
    {<<"documents-type">>, <<"documents-bucket">>}, <<"DocumentKey">>, []).
% Check the result
riakc_pb_socket:search(RiakPid, <<"documents-index">>, <<"body_register:\"Updated Document Body\"">>).


Delete the document

riakc_pb_socket:delete(RiakPid, {<<"documents-type">>, <<"documents-bucket">>}, <<"DocumentKey">>).
% Check: the document should no longer be found
riakc_pb_socket:search(RiakPid, <<"documents-index">>, <<"body_register:\"Updated Document Body\"">>).


More information on working with the client library:





That's all for now. If this article turns out to be useful, next time I will write about how to build a web interface for all of this based on cowboy.

Source: https://habr.com/ru/post/265399/


