
Riak and Riak Search Yokozuna: First acquaintance





This article describes how to build a simple store of plain text documents on Riak 2.1.1 and how to organize search over them using Riak Search (Yokozuna). The official Erlang client is used as the client library.



To begin with, imagine that we have a huge number of simple text documents, each with a title, a body, a few tags and a creation date, and a huge number of users who want to change them. If that sounds interesting, let's get started.



I assume that readers already have some idea of Riak. If not, it is better to start with an introductory article or, of course, the official documentation.


Installation and Initial Setup



Java must be installed on the system to run Riak Search. Riak itself can be installed on OS X via Homebrew, and Erlang, if required, will be installed automatically:

brew install riak 


For our learning purposes it makes no sense to deploy a whole cluster; a single node is enough, so only minimal configuration is required before starting. In the config you need to enable Riak Search:

## To enable Search set this 'on'.
##
## Default: off
##
## Acceptable values:
##   - on or off
search = on


If Riak was installed via Homebrew, the config is located at /usr/local/Cellar/riak/2.1.1/libexec/etc/riak.conf



You will also need to increase the limit on open files; if this is not done, a warning will be displayed at startup. On OS X Yosemite, I did this:

echo kern.maxfiles=65536 >> /etc/sysctl.conf
echo kern.maxfilesperproc=65536 >> /etc/sysctl.conf
sudo sysctl -w kern.maxfiles=65536
sudo sysctl -w kern.maxfilesperproc=65536
echo ulimit -n 65536 65536 >> ~/.bash_profile
ulimit -n 65536 65536


Now you can run Riak:

 riak console 


And you can test from another shell session:

 riak-admin test 


The response should look something like this: Successfully completed 1 read/write cycle to 'riak@127.0.0.1'



More about installation and settings:





Riak Data Types



Eventually consistent stores, to which Riak belongs, allow so-called data inconsistency: situations where the contents of the same key differ between replicas. Depending on the settings, Riak may try to resolve such conflicts itself using vector clocks or timestamps, or shift the responsibility for determining the correct value onto the application by handing it all available versions (siblings). In a real setup, where your documents are edited by many users and replicated to several nodes, merging conflicting data can become a very difficult task. In that case, perhaps the best solution is to use Riak Data Types (also known as CRDTs). This technology lets you describe data with special types that take on the problem of data convergence across the cluster and free the application from the obligation to resolve conflicts.
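To make the sibling case concrete, here is a minimal sketch of an application resolving a conflict by hand, using the Erlang client that is set up later in the article. The bucket plain-bucket, the key and the "longest value wins" rule are purely illustrative assumptions, and the bucket is presumed to have allow_mult=true:

 {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
 {ok, Obj} = riakc_pb_socket:get(Pid, <<"plain-bucket">>, <<"some-key">>).
 Resolved = case riakc_obj:get_contents(Obj) of
                [{_MD, Single}] ->
                    % Only one version - no conflict
                    Single;
                Siblings ->
                    % Several versions (siblings) - the application has to pick or merge one;
                    % here we naively keep the longest value
                    [{_, Winner} | _] = lists:sort(fun({_, A}, {_, B}) -> byte_size(A) >= byte_size(B) end, Siblings),
                    Winner
            end.
 % Write the chosen value back so the siblings collapse into one
 ok = riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj, Resolved)).

With Riak Data Types this whole branch disappears: the map described below converges on its own.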



Riak Data Types currently implement five CRDTs: counters, sets, maps, registers and flags (registers and flags can only be used inside maps).



In accordance with these types, we will represent our documents as a map with the following structure:
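Each document is a map with four fields; the names and types below match the Erlang examples later in the article:

 title      - register, the document title (a string)
 body       - register, the document text
 tags       - set, a set of tag strings
 created_at - register, the creation date as an ISO8601 string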



Create and activate a bucket-type called documents-type to store our documents:

riak-admin bucket-type create documents-type '{"props":{"datatype":"map"}}'
riak-admin bucket-type activate documents-type


More information on eventual consistency, Riak Data Types and CRDTs:





Riak Search



We want to implement search over our documents by tags, by title, by creation date and by content. For this we will use Riak Search, codenamed Yokozuna, which is essentially an intermediary between the Riak store and the Apache Solr search engine. Yokozuna launches and monitors a separate JVM process with Solr on each cluster node, and passes search queries and data changes on to it.



In order for Solr to know how to index our documents, we need to create a search schema. Riak Search ships with a default schema for all occasions, _yz_default, which is convenient to use during development, but for production it is better to create your own.
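For example, during experiments a throwaway index can be created on top of the built-in schema straight from the Erlang client (RiakPid here is the client connection obtained in the "Access to Riak from Erlang" section below, and the index name scratch-index is only a placeholder):

 % Quick experiment: an index backed by the default _yz_default schema
 ok = riakc_pb_socket:create_search_index(RiakPid, <<"scratch-index">>, <<"_yz_default">>, []).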



Since we have already defined the data structure, we can create the schema right away. In the schema you list the document fields and, for each of them, specify its type, whether an index should be built on it, and whether its value should be stored so that it can be returned in search results. The schema must also include the Riak Search service fields. Note that when Riak Data Types are used, a suffix corresponding to the type is appended to the field names. Thus, we get the following description:

<field name="title_register" type="string" indexed="true" stored="false" multiValued="false" />
<field name="body_register" type="text_ru" indexed="true" stored="false" multiValued="true" />
<field name="tags_set" type="string" indexed="true" stored="false" multiValued="true" />
<field name="created_at_register" type="tdate" indexed="true" stored="false" multiValued="false" omitNorms="true" />


Below is the full contents of the schema file:

docs_search_schema.xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="schedule" version="1.5">
<fields>
   <field name="title_register" type="string" indexed="true" stored="false" multiValued="false" />
   <field name="body_register" type="text_ru" indexed="true" stored="false" multiValued="true" />
   <field name="tags_set" type="string" indexed="true" stored="false" multiValued="true" />
   <field name="created_at_register" type="tdate" indexed="true" stored="false" multiValued="false" omitNorms="true" />

   <!-- Ignore all other fields -->
   <dynamicField name="*" type="ignored" />

   <!-- Riak Search service fields -->
   <field name="_yz_id" type="_yz_str" indexed="true" stored="true" multiValued="false" required="true"/>
   <field name="_yz_ed" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
   <field name="_yz_pn" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
   <field name="_yz_fpn" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
   <field name="_yz_vtag" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
   <field name="_yz_rk" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
   <field name="_yz_rt" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
   <field name="_yz_rb" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
   <field name="_yz_err" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
</fields>

<uniqueKey>_yz_id</uniqueKey>

<types>
   <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
   <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0" sortMissingLast="true" />
   <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" />
       <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
       <!-- less aggressive: <filter class="solr.RussianLightStemFilterFactory"/> -->
     </analyzer>
   </fieldType>
   <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
   <!-- YZ String: Used for non-analyzed fields -->
   <fieldType name="_yz_str" class="solr.StrField" sortMissingLast="true" />
</types>
</schema>




Details on Riak Search:





Access to Riak from Erlang



It's time to finally connect to our node using the Erlang client:

# Fetch and build the client:
git clone https://github.com/basho/riak-erlang-client.git
cd riak-erlang-client/
make
cd ..
# Start the Erlang shell:
erl -pa riak-erlang-client/ebin riak-erlang-client/deps/*/ebin


Connect

{ok, RiakPid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
% Check the connection
pong = riakc_pb_socket:ping(RiakPid).


Create a search schema and index

{ok, Schema} = file:read_file("docs_search_schema.xml").
ok = riakc_pb_socket:create_search_schema(RiakPid, <<"documents-schema">>, Schema).
ok = riakc_pb_socket:create_search_index(RiakPid, <<"documents-index">>, <<"documents-schema">>, []).


Create a bucket and assign a search index

 ok = riakc_pb_socket:set_search_index(RiakPid, {<<"documents-type">>, <<"documents-bucket">>}, <<"documents-index">>). 
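If you want to make sure the index exists and is wired up, you can read it back from the client; a small optional check (the exact shape of the returned proplist may vary between client versions):

 % Should return the index together with its schema and n_val
 {ok, IndexInfo} = riakc_pb_socket:get_search_index(RiakPid, <<"documents-index">>).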


Create a new document

Map = riakc_map:new().
% Document title - a string (register)
Map1 = riakc_map:update({<<"title">>, register},
                        fun(Reg) -> riakc_register:set(<<"DocumentTitle">>, Reg) end, Map).
% Document body - also a string (register)
Map2 = riakc_map:update({<<"body">>, register},
                        fun(Reg) -> riakc_register:set(<<"Some Document Body">>, Reg) end, Map1).
% Tags - a set
Map3 = riakc_map:update({<<"tags">>, set},
                        fun(Set) ->
                            Set1 = riakc_set:add_element(<<"Tag One">>, Set),
                            Set2 = riakc_set:add_element(<<"Tag Two">>, Set1),
                            Set2
                        end, Map2).
% Creation date - a string (register)
Map4 = riakc_map:update({<<"created_at">>, register},
                        fun(Reg) ->
                            % Solr expects dates in ISO8601 format:
                            % https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
                            ISODateFmtStr = "~4.10.0B-~2.10.0B-~2.10.0BT~2.10.0B:~2.10.0B:~2.10.0BZ",
                            {{Year, Month, Day}, {Hour, Min, Sec}} = calendar:universal_time(),
                            ISODate = list_to_binary(io_lib:format(ISODateFmtStr, [Year, Month, Day, Hour, Min, Sec])),
                            riakc_register:set(ISODate, Reg)
                        end, Map3).
% Convert the map into a list of operations and send them to the server
MapOperations = riakc_map:to_op(Map4).
ok = riakc_pb_socket:update_type(RiakPid, {<<"documents-type">>, <<"documents-bucket">>},
                                 <<"DocumentKey">>, MapOperations).


Find and get the documents

% Load the record definitions from the Riak client
rr("riak-erlang-client/include/riakc.hrl").
% Search by title, tag, body and creation date
{ok, Results} = riakc_pb_socket:search(RiakPid, <<"documents-index">>,
    <<"title_register:DocumentTitle AND tags_set:\"Tag One\" AND body_register:Some AND created_at_register:[1972-05-20T17:33:18Z TO NOW]">>).
% Fetch the found documents
Docs = Results#search_results.docs.
lists:foldr(fun({_Index, Doc}, Acc) ->
                {_, DocumentId} = lists:keyfind(<<"_yz_rk">>, 1, Doc),
                {ok, {map, Image, _, _, _}} = riakc_pb_socket:fetch_type(RiakPid,
                    {<<"documents-type">>, <<"documents-bucket">>}, DocumentId),
                [Image | Acc]
            end, [], Docs).


Change the document

riakc_pb_socket:modify_type(RiakPid,
    fun(Map) ->
        % Change the body (the text here is just an example value)
        UpdatedMap1 = riakc_map:update({<<"body">>, register},
                          fun(Register) -> riakc_register:set(<<"Updated Document Body">>, Register) end, Map),
        % Remove a tag
        UpdatedMap2 = riakc_map:update({<<"tags">>, set},
                          fun(Set) -> riakc_set:del_element(<<"Tag One">>, Set) end, UpdatedMap1),
        % Add 10 to the "smiles" counter
        UpdatedMap3 = riakc_map:update({<<"smiles">>, counter},
                          fun(Counter) -> riakc_counter:increment(10, Counter) end, UpdatedMap2),
        UpdatedMap3
    end,
    {<<"documents-type">>, <<"documents-bucket">>}, <<"DocumentKey">>, []).
% Check the result
riakc_pb_socket:search(RiakPid, <<"documents-index">>, <<"body_register:\"Updated Document Body\"">>).


Delete the document

riakc_pb_socket:delete(RiakPid, {<<"documents-type">>, <<"documents-bucket">>}, <<"DocumentKey">>).
% Check: the document should no longer be found
riakc_pb_socket:search(RiakPid, <<"documents-index">>, <<"body_register:\"Updated Document Body\"">>).


More information on working with the client library:





That's all for now. If this article turns out to be useful, next time I will write about how to build a web interface for all of this based on cowboy.

Source: https://habr.com/ru/post/265399/


