How to find love or adventure with crate.io and kibana
You can argue about the performance, quality and efficiency of dating sites, and you can find 101 reasons to look for acquaintances in a club / bar / (fill in your own) / park instead. What drew laughter ten or fifteen years ago is now mainstream. So why not use one more opportunity: search and talk online, then move on to meeting in real life ...
A geeky take on search technology and a screencast of the application are below the cut. At the end of the article there is a link to an archive with a runnable application under the Apache License v2.0 and a small sample data set.
Sounds encouraging, doesn't it!? The reality is somewhat more complicated: armies of bots and fake accounts, workers of the oldest profession, dating services trying to squeeze out the most money for the least result, and even thieves looking for prey. More interesting now? Not everything is so sad, and with the right approach the game is worth the candle!
The promised screencast of the application:
Let's look at the software side of the search. We split the task into two parts, as in the joke about drawing an owl:
The first part is to draw an oval. For us that means finding, collecting and structuring the data for later search. Any programming language with an HTTP client library plus regular expressions or DOM / XPath support will do. This part was not a problem for me, as a developer with solid experience in integrating IT systems and the developer of a distributed crawler for the search startup Visuvi. If you find this topic interesting, vote for it as the subject of a follow-up article.
The second part is to draw the rest of the owl: save the data to a store, index it and write a frontend to search and view it.
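The first part, turning a downloaded page into a structured record, could look roughly like the sketch below. It uses only Python's standard html.parser; the markup, the CSS class names and the field names name and image_urls are hypothetical, since every real site needs its own selectors.

```python
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collects text of elements with known classes and all <img> sources."""

    # Hypothetical CSS classes mapped to record fields.
    FIELDS = {"profile-name": "name", "profile-age": "age"}

    def __init__(self):
        super().__init__()
        self.record = {"image_urls": []}
        self._field = None  # field whose text we expect next

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.record["image_urls"].append(attrs["src"])
        self._field = self.FIELDS.get(attrs.get("class", ""))

    def handle_data(self, data):
        if self._field and data.strip():
            self.record[self._field] = data.strip()
            self._field = None


def parse_profile(html):
    parser = ProfileParser()
    parser.feed(html)
    record = parser.record
    if "age" in record:
        record["age"] = int(record["age"])  # keep age numeric for later statistics
    return record


SAMPLE = """
<div class="profile">
  <h1 class="profile-name">Alice</h1>
  <span class="profile-age">25</span>
  <img src="/photos/1.jpg"><img src="/photos/2.jpg">
</div>
"""

print(parse_profile(SAMPLE))
```

A real crawler would also need politeness delays, error handling and deduplication, which are left out here.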
crate.io comes to the rescue: it is a set of plug-ins for storing binary data in the file system and running distributed SQL queries on top of capabilities that already exist in the Elasticsearch search server. In a nutshell, it is a shared-nothing NoSQL database built around the SQL parser from Facebook's Presto, with a query planner layered on top of it. It is a distributed solution from the world of big data, which for now we will use as a single process on one computer.
Why crate.io? We need to store photos somewhere, we need Elasticsearch at the same time, and SQL may come in handy for statistics and reports later. Relax: this time we will manage without enterprise frameworks, Hibernate and JPA :). As you will see, working with crate is no harder than working with a relational database.
Kibana is an HTML5 application that lets you visualize data from Elasticsearch, work with time series, filter data and save search parameters as dashboards.
How can this help with the search!? Minimum programming, maximum results. You can work with crate.io from Python, Ruby, PHP or Java (a type 4 JDBC driver). I found it more convenient to enable the Elasticsearch REST API, which for some reason is hidden in crate, and to work through it.
In the file config/crate.yml add the parameters:
    es.api.enabled: true
    udc.enabled: false
The second parameter disables the crate.io usage reports sent via UDP to the project's server. I also deleted the binaries of the sigar monitoring library right away, so as not to upset your antivirus.
In this form, the “box” becomes friendly to work with through the Elasticsearch REST API and Spring Data Elasticsearch.
To start the server you need a Java JRE, version 7 or higher. I start the project with bin/crate (on Windows, use bin\crate.bat).
Using the crash command-line utility or the web console
Elasticsearch does not require us to define a data schema up front. With that choice the devil is in the details, and it is more a topic for discussion in the comments to the article. I still specify the data types explicitly using the Mapping API, so that there are no problems with search and display in kibana.
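As an illustration of an explicit mapping, here is a minimal sketch that builds a put-mapping request in the Elasticsearch 1.x style. It is not the mapping actually used in the project: the index name info is inferred from the SQL queries in this article, the type name default comes from the text, the field types are my guess for the fields age, rate, mainImage and images, and it is an assumption that the Elasticsearch REST API answers on the same port 4200 as the blob URL.

```python
import json
import urllib.request

# Assumption: the Elasticsearch REST API is served on the crate HTTP port.
CRATE_URL = "http://127.0.0.1:4200"


def build_mapping_request(index="info", doc_type="default"):
    """Build (but do not send) a put-mapping request with explicit field types."""
    mapping = {
        doc_type: {
            "properties": {
                "age": {"type": "integer"},
                "rate": {"type": "integer"},
                # not_analyzed keeps digests as single exact terms
                "mainImage": {"type": "string", "index": "not_analyzed"},
                "images": {"type": "string", "index": "not_analyzed"},
            }
        }
    }
    return urllib.request.Request(
        url=f"{CRATE_URL}/{index}/{doc_type}/_mapping",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_mapping_request()
print(request.get_method(), request.full_url)
# To actually send it against a running server: urllib.request.urlopen(request)
```

Marking the digest fields as not_analyzed is a design choice: analyzed strings would be tokenized, which makes exact-match filters in kibana unreliable.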
Run the script that downloads the HTML pages from the sites, parses them, extracts the data we need and saves it using the REST API / Elasticsearch Java client. Be sure to load the JSON with the index type “default” so that you can run SQL queries against it.
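The save step over the REST API can be sketched as follows. This is a Python approximation, not the script from the project: the document id, the placeholder digests and the index name info are assumptions, and only the field names age, rate, mainImage and images appear in this article.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the Elasticsearch REST API is served on the crate HTTP port.
CRATE_URL = "http://127.0.0.1:4200"


def build_index_request(doc_id, document, index="info", doc_type="default"):
    """Build (but do not send) an Elasticsearch index request for one record."""
    url = f"{CRATE_URL}/{index}/{doc_type}/{quote(doc_id, safe='')}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical record; real documents carry whatever the parser extracted.
profile = {
    "age": 25,
    "rate": 0,
    "mainImage": "<sha1 digest of the main photo>",
    "images": ["<sha1 digest of the main photo>"],
}

request = build_index_request("site-a/12345", profile)
print(request.get_method(), request.full_url)
# To actually send it against a running server: urllib.request.urlopen(request)
```

The type segment of the URL is default, which is what makes the documents visible to SQL queries as described above.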
An example of one of the JSON documents.
cr> select count(*) from info;
+----------+
| count(*) |
+----------+
|      291 |
+----------+
SELECT 1 row in set (0.030 sec)
What is the average age in the sample data?
cr> select avg(age) from info;
+---------------+
| avg(age)      |
+---------------+
| 24.7275862069 |
+---------------+
SELECT 1 row in set (0.038 sec)
The same script downloads the images, computes the SHA-1 digest of each one and does an HTTP PUT for each photo to crate.io:
"http://127.0.0.1:4200/_blobs/images/"+fileDigest
We can verify that the entries in blob.images appeared:
cr> select count(*) from blob.images;
+----------+
| count(*) |
+----------+
|     2813 |
+----------+
SELECT 1 row in set (0.029 sec)
Excellent, the data is in the database!
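The photo upload described above (SHA-1 digest first, then an HTTP PUT to the blob URL) can be sketched in Python like this. The URL prefix is the one from the article; the function name build_blob_put and the sample bytes are made up for illustration, and the request is only built, not sent.

```python
import hashlib
import urllib.request

BLOB_URL = "http://127.0.0.1:4200/_blobs/images/"  # prefix from the article


def build_blob_put(image_bytes):
    """Return the SHA-1 digest and an unsent PUT request for one image."""
    file_digest = hashlib.sha1(image_bytes).hexdigest()
    request = urllib.request.Request(
        url=BLOB_URL + file_digest,  # the digest is the blob's key
        data=image_bytes,
        method="PUT",
    )
    return file_digest, request


# Stand-in bytes; a real script would pass the downloaded image content.
digest, request = build_blob_put(b"not really a jpeg")
print(request.get_method(), request.full_url)
# To actually upload against a running server: urllib.request.urlopen(request)
```

The returned digest is the value to store in the mainImage and images fields of the JSON document, because the kibana templates build image URLs as /_blobs/images/ plus that value.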
I download the kibana archive and unpack it into the plugins/kibana/_site directory. On restart, the server picks up the frontend as a site plugin.
In plugins/kibana/_site/config.js specify the address of the Elasticsearch REST API
All the changes to kibana are minor and are really hacks. The proper way would be to write your own configurable component.
This fragment of the AngularJS template displays a rating selector for the _id field in the main table, and a photo when the mainImage field is shown.
Code displaying photos in the table, voting for rating
<tr ng-click="toggle_details(event)" class="pointer">
  <td ng-if="panel.fields.length<1" bo-text="event._source|stringify|tableTruncate:panel.trimFactor:1"></td>
  <td ng-show="panel.fields.length>0" ng-repeat="field in panel.fields">
    <span ng-if="(!panel.localTime || panel.timeField != field) && field!='mainImage' && field!='_id'" bo-html="(event.kibana.highlight[field]||event.kibana._source[field]) |tableHighlight | tableTruncate:panel.trimFactor:panel.fields.length" class="table-field-value"></span>
    <span ng-if="field=='_id' ">
      <span ng-repeat="t in [0,2,3,4,5]">
        <input type="radio" name="item_{{event.kibana._source[field]}}" value="{{t}}" onclick="postESUpdate('{{event.kibana._source['_index']}}','{{event.kibana._source['_type']}}','{{event.kibana._source[field]}}',{{t}})" ng-if="event.kibana._source['rate']!=t">
        <input type="radio" name="item_{{event.kibana._source[field]}}" value="{{t}}" onclick="postESUpdate('{{event.kibana._source['_index']}}','{{event.kibana._source['_type']}}','{{event.kibana._source[field]}}',{{t}})" ng-if="event.kibana._source['rate']==t" checked>{{t}}
      </span>
    </span>
    <span ng-if="field=='mainImage' "><img src="/_blobs/images/{{event.kibana._source[field]}}"/></span>
    <span ng-if="panel.localTime && panel.timeField == field && field!='mainImage'" bo-html="event.sort[1]|tableLocalTime:event" class="table-field-value"></span>
  </td>
</tr>
To display all the images of a single record while viewing that record:
Display code for all photos
<tr ng-repeat="(key,value) in event.kibana._source track by $index" ng-class-odd="'odd'">
  <td style="word-wrap:break-word" bo-text="key"></td>
  <td style="white-space:nowrap">
    <i class="icon-search pointer" ng-click="build_search(key,value)" bs-tooltip="'Add filter to match this value'"></i>
    <i class="icon-ban-circle pointer" ng-click="build_search(key,value,true)" bs-tooltip="'Add filter to NOT match this value'"></i>
    <i class="pointer icon-th" ng-click="toggle_field(key)" bs-tooltip="'Toggle table column'"></i>
  </td>
  <td style="white-space:pre-wrap;word-wrap:break-word"><span ng-if=" key != 'images' " bo-html="value|noXml|urlLink|stringify"></span><span ng-if=" key == 'images' "><div ng-repeat="img in value"><img src="/_blobs/images/{{img}}"/></div></span></td>
</tr>
For the voting script I use jQuery, which already ships with kibana
plugins/kibana/_site/index.html
Updating the rating in the JSON document: the request to the server
This is a call to the Elasticsearch Update API that updates the rate field of the document.
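The HTTP request behind such a rating update can be sketched as follows. The script in the project is jQuery inside index.html; this is only a Python equivalent of the call, built in the Elasticsearch 1.x partial-update style. The function name build_rate_update and the sample arguments are hypothetical, and it is an assumption that the REST API answers on port 4200.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the Elasticsearch REST API is served on the crate HTTP port.
CRATE_URL = "http://127.0.0.1:4200"


def build_rate_update(index, doc_type, doc_id, rate):
    """Build (but do not send) a partial update that sets only the rate field."""
    body = {"doc": {"rate": rate}}  # other fields of the document stay untouched
    url = f"{CRATE_URL}/{index}/{doc_type}/{quote(str(doc_id), safe='')}/_update"
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rate_update("info", "default", "12345", 4)
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
# To actually send it against a running server: urllib.request.urlopen(request)
```

The arguments mirror the ones the template passes to postESUpdate: index, type, document id and the chosen rating.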
That is the end of the programming. From here on it is all web interface!
You have already seen briefly how filters are created in the screencast at the beginning of the article. It also shows how to select a time range on the histogram or with the timepicker. All your filters and settings can be saved as a dashboard in kibana and loaded by name when you need them.
Beyond the scope of this article are regular expression searches, service security, monitoring and administration of crate.io, and SQL queries through JDBC or the clients for your programming language.
I repeat: to run the project you need a JVM, version 7 or higher.
You can download the application with sample data from Dropbox (234MB tar.gz), unpack it and run it with bin/crate on *nix or bin\crate.bat on Windows.
Good luck with crate.io/kibana and with real-life dating!!!
PS Dropbox decided not to serve the archive today (11/27/2014). Please tell me in the comments which public file hosting lets you upload a 234MB file without limits on the number of downloads.