
How to find love or adventure with crate.io and kibana

You can argue about the performance, quality and efficiency of dating sites, and you can find 101 reasons to look for acquaintances in a club / bar / _fill_in_your_own_ / park instead. What drew laughter ten or fifteen years ago is now mainstream. So why not try one more way to search and chat on the Internet, and then move on to meeting in real life ...



A geeky take on search technology, with a screencast of the application under the cut. At the end of the article there is a link to an archive with a ready-to-run application under the Apache License v2.0 and a small sample data set.

Sounds encouraging, doesn't it!? The reality is somewhat more complicated: armies of bots and fake accounts, workers of the oldest profession, dating services trying to squeeze out the most money for minimal results, and even thieves in search of prey. More interesting now? It is not all that bleak, and with the right approach the game is worth the candle!

The promised screencast of the application:


Let's look at the software side of the search. We split the task into two parts, as in the joke about how to draw an owl:



crate.io comes to the rescue: it is a set of plug-ins for storing binary data in the file system and running distributed SQL queries on top of the capabilities already present in the Elasticsearch search server. In a nutshell, it is a shared-nothing NoSQL database that uses the SQL parser from Facebook's Presto with its own planner on top. It is a distributed solution from the world of big data, which for now we will use as a single process on a single computer.

Why crate.io? We need somewhere to store photos, we need Elasticsearch at the same time, and SQL may come in handy later for statistics and reports. Relax, this time we will do without enterprise, Hibernate and JPA :). As you will see, working with crate is no harder than working with a relational database.

Kibana is an HTML5 application that lets you visualize data from Elasticsearch, work with time series, filter data and save search parameters as dashboards.

How can this help in the search!? Minimum programming, maximum results.
You can work with crate.io from Python, Ruby, PHP and Java (a JDBC type 4 driver). But I found it more convenient to enable the Elasticsearch REST API, which for some reason is hidden in crate, and work through it.

In the file config/crate.yml add the parameters:
es.api.enabled: true
udc.enabled: false

The second parameter disables the usage reports that crate.io sends to the project server (the usage data collector). I also removed the binaries of the sigar monitoring library right away, so as not to alarm your antivirus.

In this form the "box" becomes friendly to work with through the Elasticsearch REST API and Spring Data Elasticsearch.

To start the server you need a Java JRE, version 7 or higher.
I start the project with bin/crate (on Windows, use bin\crate.bat).

Using the crash command-line utility or the web console
  http://localhost:4200/_plugin/crate-admin/#/console

I create a storage for photos named images.

 bin/crash -c "create blob table images clustered into 7 shards with (number_of_replicas = 0)"
 +-----------------------+-----------+---------+-----------+---------+
 | server_url            | node_name | version | connected | message |
 +-----------------------+-----------+---------+-----------+---------+
 | http://127.0.0.1:4200 | Brigade   | 0.45.3  | TRUE      | OK      |
 +-----------------------+-----------+---------+-----------+---------+
 CONNECT OK
 CREATE OK (1.104 sec)


Elasticsearch does not require us to define a data schema up front. The devil is in the details of that choice, but that is rather a topic for discussion in the comments. I still specify the data types explicitly using the Mapping API, so that there are no problems with search and display in kibana.
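A mapping like this does not have to be typed by hand. Here is a rough sketch of generating one and sending it, assuming Node.js 18+ (global fetch), the Elasticsearch 1.x put-mapping endpoint and an index that already exists. The helper names buildMapping and putMapping are hypothetical and are not part of the original project.

```javascript
// Hypothetical helper (not from the original project): builds an
// Elasticsearch 1.x mapping body for the "default" type.
function buildMapping(exactFields, dateFields, longFields) {
  const properties = {};
  for (const name of exactFields) {
    // not_analyzed keeps the whole value as a single term
    properties[name] = { type: "string", index: "not_analyzed" };
  }
  for (const name of dateFields) {
    properties[name] = { type: "date", format: "basic_date_time" };
  }
  for (const name of longFields) {
    properties[name] = { type: "long" };
  }
  return { default: { properties: properties } };
}

// Assumption: es.api.enabled is true and the index already exists.
async function putMapping(baseUrl, index, mapping) {
  const res = await fetch(baseUrl + "/" + index + "/_mapping/default", {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(mapping),
  });
  return res.json();
}
```

For example, buildMapping(["smoker", "kids"], ["first", "last"], ["age", "height"]) yields the exact-match, date and numeric fields in one object that can be passed to putMapping("http://127.0.0.1:4200", "info", ...).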

Data types (the keys inside "info" and "self" were non-ASCII profile field names that did not survive in this copy; every one of them has type string)
  {
    "info": {
      "mappings": {
        "default": {
          "properties": {
            "accommodation": { "type": "string", "index": "not_analyzed" },
            "age": { "type": "long" },
            "build": { "type": "string", "index": "not_analyzed" },
            "drinkingHabits": { "type": "string", "index": "not_analyzed" },
            "education": { "type": "string", "index": "not_analyzed" },
            "ethnicity": { "type": "string", "index": "not_analyzed" },
            "first": { "type": "date", "format": "basic_date_time" },
            "height": { "type": "long" },
            "images": { "type": "string" },
            "info": { "properties": { … } },
            "kids": { "type": "string", "index": "not_analyzed" },
            "last": { "type": "date", "format": "basic_date_time" },
            "login": { "type": "string" },
            "mainImage": { "type": "string", "index": "not_analyzed" },
            "message": { "type": "string" },
            "readableLogin": { "type": "boolean" },
            "realName": { "type": "string" },
            "relationship": { "type": "string", "index": "not_analyzed" },
            "replyRate": { "type": "long" },
            "searchingFor": { "type": "string" },
            "self": { "properties": { … } },
            "smoker": { "type": "string", "index": "not_analyzed" },
            "updated": { "type": "date", "format": "basic_date_time" },
            "viewed": { "type": "long" },
            "weight": { "type": "long" }
          }
        }
      }
    }
  }



Next I run a script that downloads HTML pages from the sites, parses them, extracts the data we need and saves it using the REST API / Elasticsearch Java client.
Be sure to load the JSON with index type "default" so that you can run SQL queries against it.
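The saving step might look roughly like this. It is only a sketch, assuming Node.js 18+ (global fetch) and the standard Elasticsearch index API (PUT /index/type/id). The helper names buildIndexRequest and saveProfile and the sample profile fields are illustrative assumptions, not the author's script.

```javascript
// Hypothetical helper: describes the HTTP request that stores one profile
// as a JSON document of type "default" (required for SQL access).
function buildIndexRequest(baseUrl, index, id, profile) {
  return {
    method: "PUT",
    url: baseUrl + "/" + index + "/default/" + encodeURIComponent(id),
    body: JSON.stringify(profile),
  };
}

// Sends the request; assumes es.api.enabled: true in config/crate.yml.
async function saveProfile(baseUrl, index, id, profile) {
  const req = buildIndexRequest(baseUrl, index, id, profile);
  const res = await fetch(req.url, {
    method: req.method,
    headers: { "Content-Type": "application/json" },
    body: req.body,
  });
  if (!res.ok) {
    throw new Error("indexing failed with HTTP " + res.status);
  }
  return res.json();
}
```

A call such as saveProfile("http://127.0.0.1:4200", "info", "some-id", { login: "example", age: 25 }) would then store one document. The id and field values here are made up.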



An example of one of the JSON documents.



 cr> select count(*) from info;
 +----------+
 | count(*) |
 +----------+
 |      291 |
 +----------+
 SELECT 1 row in set (0.030 sec)


What is the average age in the sample data?

 cr> select avg(age) from info;
 +---------------+
 | avg(age)      |
 +---------------+
 | 24.7275862069 |
 +---------------+
 SELECT 1 row in set (0.038 sec)
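The same queries can also be sent from code rather than from the crash shell. Below is a minimal sketch, assuming Node.js 18+ (global fetch), crate's HTTP endpoint /_sql that accepts a JSON body with a "stmt" field, and a response containing "cols" and "rows". The helper names are hypothetical.

```javascript
// Hypothetical helper: describes a request to crate's SQL-over-HTTP endpoint.
function buildSqlRequest(baseUrl, stmt) {
  return {
    method: "POST",
    url: baseUrl + "/_sql",
    body: JSON.stringify({ stmt: stmt }),
  };
}

// Turns a response of the assumed shape { cols: [...], rows: [[...], ...] }
// into an array of plain objects keyed by column name.
function rowsToObjects(response) {
  return response.rows.map(function (row) {
    const obj = {};
    response.cols.forEach(function (col, i) {
      obj[col] = row[i];
    });
    return obj;
  });
}

// Runs a statement and returns the rows as objects.
async function runSql(baseUrl, stmt) {
  const req = buildSqlRequest(baseUrl, stmt);
  const res = await fetch(req.url, {
    method: req.method,
    headers: { "Content-Type": "application/json" },
    body: req.body,
  });
  return rowsToObjects(await res.json());
}
```

With this, runSql("http://127.0.0.1:4200", "select avg(age) from info") would resolve to a one-element array whose only key is the column name.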


The same script downloads the images, computes the SHA-1 digest of each one and does an HTTP PUT of each photo to crate.io:
  "http://127.0.0.1:4200/_blobs/images/" + fileDigest


We can verify that the entries in blob.images appeared:

 cr> select count(*) from blob.images;
 +----------+
 | count(*) |
 +----------+
 |     2813 |
 +----------+
 SELECT 1 row in set (0.029 sec)
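The upload step described above (a SHA-1 digest plus an HTTP PUT to /_blobs/images/ followed by the digest) could be sketched like this. The sketch assumes Node.js 18+ (global fetch). The helper names sha1Hex, blobUrl and uploadBlob are hypothetical and not taken from the author's script.

```javascript
const nodeCrypto = require("crypto");

// Hex-encoded SHA-1 digest of the file contents; crate uses it as the blob key.
function sha1Hex(buffer) {
  return nodeCrypto.createHash("sha1").update(buffer).digest("hex");
}

// URL of a blob inside a blob table, e.g. the "images" table created earlier.
function blobUrl(baseUrl, table, digest) {
  return baseUrl + "/_blobs/" + table + "/" + digest;
}

// Uploads one photo and returns its digest together with the HTTP status,
// leaving it to the caller to decide how to treat non-2xx answers.
async function uploadBlob(baseUrl, table, buffer) {
  const digest = sha1Hex(buffer);
  const res = await fetch(blobUrl(baseUrl, table, digest), {
    method: "PUT",
    body: buffer,
  });
  return { digest: digest, status: res.status };
}
```

The returned digest is the value to store in the mainImage and images fields of the JSON document, since the kibana templates build the picture URL from it.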


Excellent, the data is in the database!

I download the kibana archive and unpack it into the plugins/kibana/_site directory. On restart, the server will pick up the frontend as a site plugin.

In plugins/kibana/_site/config.js specify the address of the Elasticsearch REST API:

  elasticsearch: "http://" + window.location.host,


All my changes to kibana are minor and are really hacks. The proper way would be to write your own configurable component.

This fragment of the AngularJS template renders a rating selector for the _id field in the main table, and a photo when the mainImage field is shown.

plugins/kibana/_site/app/panels/table/module.html

Code displaying photos in the table and the rating vote
  <tr ng-click="toggle_details(event)" class="pointer">
    <td ng-if="panel.fields.length<1" bo-text="event._source|stringify|tableTruncate:panel.trimFactor:1"></td>
    <td ng-show="panel.fields.length>0" ng-repeat="field in panel.fields">
      <!-- regular fields: everything except the photo and the _id column -->
      <span ng-if="(!panel.localTime || panel.timeField != field) && field!='mainImage' && field!='_id'"
            bo-html="(event.kibana.highlight[field]||event.kibana._source[field]) |tableHighlight | tableTruncate:panel.trimFactor:panel.fields.length"
            class="table-field-value"></span>
      <!-- _id column: radio buttons that send the rating to the server -->
      <span ng-if="field=='_id'">
        <span ng-repeat="t in [0,2,3,4,5]">
          <input type="radio" name="item_{{event.kibana._source[field]}}" value="{{t}}"
                 onclick="postESUpdate('{{event.kibana._source['_index']}}','{{event.kibana._source['_type']}}','{{event.kibana._source[field]}}',{{t}})"
                 ng-if="event.kibana._source['rate']!=t">
          <input type="radio" name="item_{{event.kibana._source[field]}}" value="{{t}}"
                 onclick="postESUpdate('{{event.kibana._source['_index']}}','{{event.kibana._source['_type']}}','{{event.kibana._source[field]}}',{{t}})"
                 ng-if="event.kibana._source['rate']==t" checked>{{t}}
        </span>
      </span>
      <!-- mainImage column: the photo is served straight from the blob table -->
      <span ng-if="field=='mainImage'"><img src="/_blobs/images/{{event.kibana._source[field]}}"/></span>
      <span ng-if="panel.localTime && panel.timeField == field && field!='mainImage'" bo-html="event.sort[1]|tableLocalTime:event" class="table-field-value"></span>
    </td>
  </tr>



To display all the images of a single record when viewing its details:

Code displaying all photos
  <tr ng-repeat="(key,value) in event.kibana._source track by $index" ng-class-odd="'odd'">
    <td style="word-wrap:break-word" bo-text="key"></td>
    <td style="white-space:nowrap">
      <i class="icon-search pointer" ng-click="build_search(key,value)" bs-tooltip="'Add filter to match this value'"></i>
      <i class="icon-ban-circle pointer" ng-click="build_search(key,value,true)" bs-tooltip="'Add filter to NOT match this value'"></i>
      <i class="pointer icon-th" ng-click="toggle_field(key)" bs-tooltip="'Toggle table column'"></i>
    </td>
    <td style="white-space:pre-wrap;word-wrap:break-word"> <span ng-if=" key != 'images' " bo-html="value|noXml|urlLink|stringify"></span> <span ng-if=" key == 'images' "><div ng-repeat="img in value"><img src="/_blobs/images/{{img}}"/></div></span></td>
  </tr>



For the voting script I use jQuery, which already ships with kibana.

plugins/kibana/_site/index.html

Updating the rating in the JSON document, request to the server
  function postESUpdate(index, type, id, rate) {
    $.ajax({
      type: "POST",
      // Elasticsearch Update API: /{index}/{type}/{id}/_update
      url: "http://" + window.location.host + "/" + index + "/" + type + "/" + id + "/_update",
      // partial document: only the rate field is changed
      data: '{"doc":{"rate":' + rate + '}}'
    }).done(function () {
      // alert("success");
    }).fail(function () {
      alert("error");
    });
  }


This is a call to the Elasticsearch Update API that updates the rate field of the document.
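If you want to be a bit more careful than string concatenation, the request can be assembled separately and checked before it is sent. This is a sketch only: buildRateUpdate is a hypothetical helper, and the 0 to 5 range is an assumption based on the values offered by the radio buttons.

```javascript
// Hypothetical helper: builds the URL and body for a partial update of the
// "rate" field through the Elasticsearch Update API.
function buildRateUpdate(host, index, type, id, rate) {
  const value = Number(rate);
  // Assumption: ratings are whole numbers from 0 to 5.
  if (!Number.isInteger(value) || value < 0 || value > 5) {
    throw new RangeError("rate must be an integer from 0 to 5");
  }
  // Encode each path segment so unusual ids cannot break the URL.
  const path = [index, type, id].map(encodeURIComponent).join("/");
  return {
    url: "http://" + host + "/" + path + "/_update",
    data: JSON.stringify({ doc: { rate: value } }),
  };
}
```

The returned url and data can be passed to $.ajax as they are, in place of the concatenated strings.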

That is the end of the programming. From here on it is the web interface only!



You have already seen briefly how filters are created in the screencast at the beginning of the article.
It also shows how to select a time range on the histogram or with the timepicker. All your filters and settings can be saved as a dashboard in kibana and loaded by name whenever you need them.
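The kibana query box accepts Lucene query-string syntax, so filters can also be typed in or generated. Here is a small sketch with hypothetical helpers. The field names age and height come from the mapping, while the numeric ranges are made-up examples.

```javascript
// Hypothetical helper: an inclusive range clause in Lucene query-string syntax.
function rangeClause(field, from, to) {
  return field + ":[" + from + " TO " + to + "]";
}

// Hypothetical helper: requires every clause to match.
function allOf(clauses) {
  return clauses
    .map(function (clause) {
      return "(" + clause + ")";
    })
    .join(" AND ");
}
```

For instance, allOf([rangeClause("age", 20, 30), rangeClause("height", 160, 175)]) produces a string that can be pasted into the query field of the dashboard.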

Beyond the scope of this article: regular expression searches, service security, monitoring and administration of crate.io, and SQL queries through JDBC or the clients for your programming language.

I repeat: to run the project you need JVM 7 or higher.

The application with sample data can be downloaded from Dropbox (234 MB tar.gz). Unpack it and run on *nix:
bin/crate
or on Windows:
bin\crate.bat

Open the ready-made dashboard in the browser:
  http://localhost:4200/_plugin/kibana/#/dashboard/elasticsearch/When%20first%20photo%20was%20uploaded
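Links to other saved dashboards follow the same pattern, with the dashboard name percent-encoded. A tiny sketch with a hypothetical helper dashboardUrl:

```javascript
// Hypothetical helper: link to a dashboard saved in Elasticsearch under `name`,
// served by the kibana site plugin inside crate.
function dashboardUrl(baseUrl, name) {
  return (
    baseUrl +
    "/_plugin/kibana/#/dashboard/elasticsearch/" +
    encodeURIComponent(name)
  );
}
```

Calling dashboardUrl("http://localhost:4200", "When first photo was uploaded") gives the address of the dashboard bundled with the archive.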

Good luck with crate.io/kibana and with real-life dating!!!

P.S. Dropbox decided not to serve the archive today (11/27/2014). Please tell me in the comments which public file hosting will let me upload a 234 MB file without limits on the number of downloads.


Following the results of your vote, I wrote the article "What should we parse a site with? WebDriver API basics".

Source: https://habr.com/ru/post/244193/

