Let's talk about searching in Ruby on Rails.
I decided to break the story into two parts: the first is the boring project setup plus a simple search over a single field of a single model. In the second we will look at the finer points in more detail, and I will try to cover everything the plugin can do. By the way, in the source code (link in the text) the project has already been slightly modified for the second part, but this will not cause any problems.
Installation
Install Rails 2.0.2 or later.
Download sphinx 0.9.8:
www.sphinxsearch.com/downloads.html and build it yourself, or use ports / portage / <insert the necessary>
$ sudo port install sphinx
Sphinx supports two DBMSs out of the box - MySQL and PostgreSQL - but support for any other database can be added fairly easily.
Post-installation check:
Macintosh:sphinx-0.9.8 kronos$ searchd -h
Sphinx 0.9.8-release (r1371)
Copyright (c) 2001-2008, Andrew Aksyonoff
...
The searchd and indexer binaries must be on your PATH.
Sphinx consists of several utilities, among them:
searchd - the search daemon
search - a console analogue of searchd for debugging and test searches
indexer - the indexer
Create a project:
$ rails sphinxtest -d mysql
$ cd sphinxtest/
Do not forget to edit config/database.yml
For convenience, we will use a plugin to work with Sphinx. I believe there are two adequate plugins for Rails - ultrasphinx and Thinking Sphinx (by the way, while this article was being written, a RailsCast about the latter was released). Since Thinking Sphinx conflicts with another plugin, "redhill on rails", because of internal naming, I use the former. But perhaps the latter is better - choose for yourself. :)
Installing the plugin:
$ script/plugin install git://github.com/fauna/ultrasphinx.git
Customization
$ mkdir config/ultrasphinx
$ cp vendor/plugins/ultrasphinx/examples/default.base config/ultrasphinx/
default.base is a template for the Sphinx configuration file. For the first part, just configure the paths to the logs / pids / indexes:
# ...
searchd
{
# ...
log = /opt/local/var/db/sphinx/log/searchd.log
query_log = /opt/local/var/db/sphinx/log/query.log
pid_file = /opt/local/var/db/sphinx/log/searchd.pid
# ...
}
# ...
index
{
#
path = /opt/local/var/db/sphinx/
# ...
}
# ...
Write the code
For simplicity, let's make one controller with a form that uses Ajax to search for... let's say, artists by name. The artist model will consist of a single field, title:
$ script/generate controller home index search
$ script/generate model artist
Migration code; let the artist have only one field, title (db/migrate/..._create_artists.rb):
class CreateArtists < ActiveRecord::Migration
  def self.up
    create_table :artists do |t|
      t.string :title, :null => false
      t.timestamps
    end
  end

  def self.down
    drop_table :artists
  end
end
Now let's tell Sphinx that we will search over one field (app/models/artist.rb):
class Artist < ActiveRecord::Base
is_indexed :fields => ['title']
end
The declaration "is_indexed :fields => ['title']" means that indexing will be done on a single field.
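If the model later grows more columns, the same declaration accepts a list of fields. A hypothetical sketch - the genre column does not exist in this project's schema and is invented purely for illustration:

```ruby
# Hypothetical: index two columns instead of one.
# 'genre' is NOT part of this tutorial's migration; it is an
# invented column to show that :fields takes an array.
class Artist < ActiveRecord::Base
  is_indexed :fields => ['title', 'genre']
end
```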
Well, create the database and perform the migration:
$ rake db:create
$ rake db:migrate
It is also worth setting up the routes in config/routes.rb:
map.root :controller => 'home'
map.search 'search', :conditions => {:method => :get}, :controller => 'home', :action => 'search'
Controller code (app/controllers/home_controller.rb):
class HomeController < ApplicationController
  def index
  end

  def search
    query = params[:query].split(/'([^']+)'|"([^"]+)"|\s+|\+/).reject { |x| x.empty? }.map { |x| x.inspect } * '&&'
    @artists = Ultrasphinx::Search.new(:query => query,
                                       :sort_mode => 'relevance',
                                       :class_names => ["Artist"])
    @artists.run
    respond_to do |format|
      format.js # search.js.erb
    end
  end
end
With the regular expression we parse the search query: it is split on whitespace, on +, and on quoted phrases, empty tokens (which appear between adjacent delimiters) are discarded, and every remaining token is wrapped in quotes. The && operation here merely lists the words; for example, the query "Bleed it out" becomes "Bleed" && "it" && "out" and will also match the entry "Sell it out" (two words out of three matched). That is, && does not make every word mandatory, it only enumerates them (if you need all words to be present, use AND, but that is for the second part).
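This pre-processing can be tried outside Rails. A minimal sketch in plain Ruby; the sample query string is made up for illustration:

```ruby
# Reproduce the controller's query pre-processing in plain Ruby.
# Split on quoted phrases, whitespace and '+'. String#split keeps
# text captured by the regex groups, so quoted phrases survive as
# single tokens. Then drop empty tokens, quote each token with
# #inspect and join with '&&'.
raw = %q{Bleed "it out" +now}

tokens = raw.split(/'([^']+)'|"([^"]+)"|\s+|\+/)
            .reject { |x| x.empty? }
query  = tokens.map { |x| x.inspect } * '&&'

puts query  # => "Bleed"&&"it out"&&"now"
```

Note that the quoted phrase "it out" is kept as one token, while the lone + is treated as a separator and disappears.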
Let's briefly go over the parameters:
:query - the search query
:sort_mode - how the results are sorted
:class_names - an array of model class names to instantiate from the search results. Internally, Sphinx stores each document as a set of fields and their values. Working with such a representation in Rails is inconvenient; a ready-made model object is much more convenient. Ultrasphinx determines which model a found document belongs to and creates an instance of it, so working with the results is no different from Artist.find(...) or Artist.paginate (yes, the search results are compatible with will_paginate).
The @artists.run call executes the query. Queries run very fast: on a seven-million-row database, thousandths of a second.
Views (templates) can be seen in the finished project. Now you can add something to the database:
$ script/console
>> Artist.create(:title => 'Tiesto')
>> Artist.create(:title => 'Armin')
>> Artist.create(:title => 'ATB')
>> exit
Perform the preparations needed for the plugin to work (this must be repeated every time you change something in a model's is_indexed description):
$ rake ultrasphinx:configure
$ rake ultrasphinx:index
$ rake ultrasphinx:daemon:start (or restart)
Run it and test:
$ mongrel_rails -p 3001 -d
One small "but"
The indexes are stored separately from the database data. When we delete, change, or add records, the indexes do not change. For database changes to be reflected in the indexes, a full re-indexing is required:
$ rake ultrasphinx:index
Of course, this is not ideal. But I can assure you that a solution to this problem exists; it is called delta indexing. More about it in the next part.
Summary
Sphinx is a very cool thing :). Open source, free, and it searches and indexes smartly. A must-have!
Analogs
Among the analogues I can mention acts_as_ferret; for small projects it fits perfectly (for example, we used it at the Rambler Hackfest), but on large amounts of data it behaves, to put it mildly, poorly - indexing takes a very long time.
For Postgres tsearch2 there is, it seems, a decent plugin, Acts as tsearch; I have not used it in production, so I cannot say. There is also acts_as_solr.