
Rails + Sphinx =? Part I

Shall we talk about search in Ruby on Rails?

I decided to split the story into two parts. The first one is the boring part: setting up the project and a simple search on one field of a single model. In the second we will go into the finer points, and I will try to cover everything the plugin can do. By the way, in the source code (link in the text) the project has already been modified slightly for the second part, but that will not cause any problems.

Installation


Install Rails 2.0.2 or later.
Download Sphinx 0.9.8 from www.sphinxsearch.com/downloads.html and build it yourself, or use ports / portage / <insert your package manager>:
$ sudo port install sphinx

Sphinx supports two DBMSs, MySQL and PostgreSQL, but it is fairly easy to add support for any other database.
Post-installation check:
Macintosh:sphinx-0.9.8 kronos$ searchd -h
Sphinx 0.9.8-release (r1371)
Copyright (c) 2001-2008, Andrew Aksyonoff
...

The directory containing searchd and indexer must be in the PATH environment variable.
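
If you would rather check this from Ruby than from the shell, something like the sketch below should do. It is only an illustration: the helper name which is made up for this example (it is not part of Sphinx or of any plugin), and it assumes a Unix-like system where the binaries have no file extension.

```ruby
# Hypothetical helper: return the full path of the first executable called
# `name` in the directories listed in `path_env`, or nil if there is none.
def which(name, path_env = ENV["PATH"].to_s)
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report where (or whether) the two Sphinx binaries were found.
%w[searchd indexer].each do |tool|
  location = which(tool)
  puts "#{tool}: #{location || 'not found in PATH'}"
end
```

If either tool is reported as not found, add the Sphinx bin directory to PATH before going further.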

Sphinx consists of several utilities, including:
searchd - the search daemon
search - a console counterpart of searchd for debugging and test searches
indexer - the indexer
Create a project:
$ rails sphinxtest -d mysql
$ cd sphinxtest/

Do not forget to edit config/database.yml.
For convenience, we will use a plugin to work with Sphinx.
I think there are two decent Sphinx plugins for Rails: ultrasphinx and Thinking Sphinx (by the way, a RailsCast about the latter came out while I was writing this article). Since Thinking Sphinx conflicts with another plugin, “redhill on rails”, because of internal naming, I use the first one. But perhaps the second is better; choose for yourself. :)
Installing the plugin:
$ script/plugin install git://github.com/fauna/ultrasphinx.git


Customization


$ mkdir config/ultrasphinx
$ cp vendor/plugins/ultrasphinx/examples/default.base config/ultrasphinx/

default.base is the template for the Sphinx configuration file. For the first part it is enough to set the paths to the logs, the pid file and the indexes:
# ...
searchd
{
# ...
log = /opt/local/var/db/sphinx/log/searchd.log
query_log = /opt/local/var/db/sphinx/log/query.log
pid_file = /opt/local/var/db/sphinx/log/searchd.pid
# ...
}
# ...
index
{
#
path = /opt/local/var/db/sphinx/
# ...
}
# ...

Write the code


For simplicity, let's make a single controller with a form that uses Ajax to search for, say, artists by name. The artist model will have a single field, title:
$ script/generate controller home index search
$ script/generate model artist

Migration code, where the artist has only the title field (db/migrate/..._create_artists.rb):
class CreateArtists < ActiveRecord::Migration
  def self.up
    create_table :artists do |t|
      t.string :title, :null => false
      t.timestamps
    end
  end

  def self.down
    drop_table :artists
  end
end

Now let's tell Sphinx that we will search on a single field (app/models/artist.rb):
class Artist < ActiveRecord::Base
  is_indexed :fields => ['title']
end

The line “is_indexed :fields => ['title']” means that only the title field will be indexed.

Well, create the database and perform the migration:
$ rake db:create
$ rake db:migrate

It is also worth setting up the routes in config/routes.rb:
map.root :controller => 'home'
map.search 'search', :conditions => {:method => :get}, :controller => 'home', :action => 'search'


Controller code (app/controllers/home_controller.rb):
class HomeController < ApplicationController
  def index
  end

  def search
    query = params[:query].split(/'([^']+)'|"([^"]+)"|\s+|\+/).reject { |x| x.empty? }.map { |x| x.inspect } * '&&'
    @artists = Ultrasphinx::Search.new(:query => query,
                                       :sort_mode => 'relevance',
                                       :class_names => ["Artist"])
    @artists.run
    respond_to do |format|
      format.js # search.js.erb
    end
  end
end

The regular expression in the first line of the action parses the search query: the query is split into words on whitespace and “+” (phrases in single or double quotes stay together), empty strings are discarded (they appear, for example, when two separators follow each other), and every word is wrapped in quotes. The && operator here only enumerates a set of words. For example, the query “Bleed it out” => 'Bleed' && 'it' && 'out' will also match the entry “Sell it out” (two words out of three matched). In other words, && does not define a list of mandatory words, it only lists them. If you need all the words to be present, use AND, but that is for the second part.
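
To see what string actually goes to the search engine, the same chain of calls can be tried outside Rails. This is a minimal sketch: the constant TOKEN_SEPARATOR and the helper build_query are names invented for the example, and join("&&") is used in place of the equivalent Array#* from the controller.

```ruby
# The splitting pattern from the controller: quoted phrases are captured
# whole, while whitespace and "+" act as plain separators.
TOKEN_SEPARATOR = /'([^']+)'|"([^"]+)"|\s+|\+/

# Hypothetical helper wrapping the same steps as the controller line.
def build_query(raw)
  raw.split(TOKEN_SEPARATOR)
     .reject { |token| token.empty? }   # drop the empty strings split leaves behind
     .map { |token| token.inspect }     # wrap each token in double quotes
     .join("&&")                        # same result as `array * '&&'`
end

puts build_query("Bleed it out")                  # "Bleed"&&"it"&&"out"
puts build_query(%q{"linkin park"  bleed+out})    # "linkin park"&&"bleed"&&"out"
```

Note that a quoted phrase survives as one token, while the extra space and the “+” produce no empty tokens in the result.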
Let's briefly go over the parameters:
:query - the search query
:sort_mode - how the results are sorted
:class_names - an array of the model class names whose instances will be created from the search results. Sphinx internally stores each document as a set of fields and their values. That representation is inconvenient to work with in Rails; a ready-made model object is much handier. Ultrasphinx determines which model a found document belongs to and creates an instance of it, so working with the search results is no different from working with Artist.find(...) or Artist.paginate (yes, the search results are compatible with will_paginate).
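
The idea behind :class_names can be shown with a toy example in plain Ruby. Everything in it is made up for illustration: FakeArtist, the shape of raw_hits and the instantiate helper are assumptions, and this is not how ultrasphinx is actually implemented. It only shows the step from "class name plus id" to a model object.

```ruby
# Stand-in for an ActiveRecord model (hypothetical, no database involved).
class FakeArtist
  RECORDS = { 1 => "Tiesto", 2 => "Armin", 3 => "ATB" }

  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  # Mimics a find by primary key.
  def self.find(id)
    new(id, RECORDS.fetch(id))
  end
end

# What a search engine could hand back: a class name and an id, no objects.
raw_hits = [
  { "class_name" => "FakeArtist", "id" => 3 },
  { "class_name" => "FakeArtist", "id" => 1 }
]

# Turn every raw hit into a model instance, keeping the result order.
def instantiate(hits)
  hits.map { |hit| Object.const_get(hit["class_name"]).find(hit["id"]) }
end

artists = instantiate(raw_hits)
puts artists.map(&:title).inspect   # ["ATB", "Tiesto"]
```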
The @artists.run call executes the query. Queries run very quickly: on a database of seven million records they take thousandths of a second.
The views (templates) can be found in the finished project.

Now you can add something to the database:
$ script/console
>> Artist.create(:title => 'Tiesto')
>> Artist.create(:title => 'Armin')
>> Artist.create(:title => 'ATB')
>> exit

Now do the preparation the plugin needs (repeat this every time you change an is_indexed declaration in a model):
$ rake ultrasphinx:configure
$ rake ultrasphinx:index
$ rake ultrasphinx:daemon:start   # or ultrasphinx:daemon:restart if searchd is already running

Run and test:
$ mongrel_rails start -p 3001 -d

One small catch


The indexes are stored separately from the data in the database. When we delete, change or add records, the indexes do not change. For changes in the database to show up in the indexes, the database has to be fully re-indexed:
$ rake ultrasphinx:index

Of course, this is not great. But I can assure you that a solution exists, and it is called delta indexing. More on that in the next part.
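
Why a search misses fresh records can be reproduced with a toy inverted index in plain Ruby. This is just a sketch of the principle, not of Sphinx itself: build_index and search are hypothetical helpers, and the fourth artist is made-up sample data.

```ruby
# Build a tiny inverted index: lowercase word => ids of the records containing it.
def build_index(records)
  index = Hash.new { |hash, word| hash[word] = [] }
  records.each do |id, title|
    title.downcase.split.each { |word| index[word] << id }
  end
  index
end

# Look a word up in the index; unknown words give an empty result.
def search(index, word)
  index.fetch(word.downcase, [])
end

records = { 1 => "Tiesto", 2 => "Armin", 3 => "ATB" }
index = build_index(records)        # plays the role of a full indexing run

records[4] = "Ferry Corsten"        # a new row appears in the "database"
stale = search(index, "ferry")      # the old index knows nothing about it

index = build_index(records)        # full re-index
fresh = search(index, "ferry")

puts "before re-index: #{stale.inspect}, after: #{fresh.inspect}"
# before re-index: [], after: [4]
```

The index is a snapshot taken at indexing time, so anything written to the database afterwards stays invisible to the search until the next indexing run.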

Summary


Sphinx is a very cool thing :). It is open source and free, and it searches and indexes fast. A must-have!

Alternatives


As for alternatives, I can point to acts_as_ferret. It is a perfect fit for small projects (we used it at the Rambler Hackfest, for example), but on large amounts of data it behaves, to put it mildly, not very well: indexing takes a very long time.
For PostgreSQL's tsearch2 there is the Acts as tsearch plugin; I have not used it in production, so I cannot judge it. There is also acts_as_solr.

Source: https://habr.com/ru/post/29538/

