
Using Xapian Full-Text Search Library in Python

Today, in the era of Web 2.0, as sites accumulate more and more content, developers face the challenge of implementing full-text search.

There are a few options:

The third option seems to be the best, because it combines the advantages of the other two. It is not without flaws either: a search library has to be installed, and sometimes a daemon has to be running as well (as with Sphinx), which may be unacceptable.

There are many solutions, each with its own advantages and disadvantages. I would like to look more closely at the relatively little-known Xapian library.

Overview


This open-source (GPL), cross-platform library is written in C++, with bindings for Python, PHP, Ruby, Perl, Java, Tcl, and C#.

Library features:

In a sense, Xapian's main disadvantage is its bindings for programming languages other than C++. The binding code is generated with SWIG, so the API exactly mirrors the C++ version rather than following the conventions of the target language.

Fortunately, for Python there is a simple and effective wrapper, Xappy, that takes care of all the dirty work.

Installation


The first step is to install Xapian itself, its Python bindings, and Xappy. Most GNU/Linux distributions already have all the necessary packages in their repositories. On Ubuntu 10.10, for example, install these packages:
 sudo apt-get install libxapian15 python-xapian python-xappy

Xappy is also available via easy_install or pip:
 sudo pip install xappy 
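
Before going further, it can be worth confirming that both packages are visible to the interpreter. The following is a minimal sketch, assuming the import names are `xapian` and `xappy`; the helper `missing_modules` is a hypothetical name used only for illustration and is not part of either library:

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == '__main__':
    missing = missing_modules(['xapian', 'xappy'])
    if missing:
        print('Not installed: ' + ', '.join(missing))
    else:
        print('Xapian bindings and Xappy are available')
```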

Indexing


Let's try to index something:
 import xappy
 # Open a connection for indexing;
 # the database is created if it does not exist yet
 connection = xappy.IndexerConnection('/path/to/base')
 # Describe the fields to index
 connection.add_field_action(
     'title', xappy.FieldActions.INDEX_FREETEXT, weight=5, language='ru')
 connection.add_field_action(
     'description', xappy.FieldActions.INDEX_FREETEXT, language='ru')

When you open a connection for indexing, the search index database is created if it does not already exist; otherwise the existing one is opened. The database is a folder with a set of files, and its format does not depend on the operating system.

After opening the connection, you must describe the index fields: name, type, and other attributes.
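
When there are many fields, the descriptions can be kept in one table and applied in a loop. This is only a sketch: `FIELD_SPEC` and `configure_fields` are hypothetical names, and the only library call assumed is `add_field_action`, used the same way as in the example above.

```python
# Each entry: (field name, name of the action constant, extra keyword arguments).
# The values mirror the two fields configured in the article's example.
FIELD_SPEC = [
    ('title', 'INDEX_FREETEXT', {'weight': 5, 'language': 'ru'}),
    ('description', 'INDEX_FREETEXT', {'language': 'ru'}),
]


def configure_fields(connection, actions, spec=FIELD_SPEC):
    """Call connection.add_field_action once per entry in `spec`.

    `actions` is the object that holds the action constants
    (with Xappy this would be xappy.FieldActions).
    """
    for name, action_name, kwargs in spec:
        action = getattr(actions, action_name)
        connection.add_field_action(name, action, **kwargs)
```

With the real library the call would presumably be `configure_fields(connection, xappy.FieldActions)`.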

Field type can be:

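To add a document, use code like this:
 # Create a document
 doc = xappy.UnprocessedDocument()
 # Fill in its fields
 doc.fields.append(xappy.Field('title', 'Document title'))
 doc.fields.append(xappy.Field('description', 'Document description'))
 # Add the document to the index
 connection.add(doc)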

Each document must have a unique identifier. In the example above it is assigned automatically, but you can specify your own:
 # Index a collection of posts
 for posts_item in posts:
     doc = xappy.UnprocessedDocument()
     # Use the post's own ID as the document ID;
     # it must be unique!
     doc.id = posts_item.id
     doc.fields.append(xappy.Field('title', posts_item.title))
     doc.fields.append(xappy.Field('description', posts_item.description))
     connection.add(doc)
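
Since the identifiers you supply have to be unique, it may help to check the source data before indexing it. The sketch below is independent of Xappy; `Post` and `duplicate_ids` are hypothetical names, and `Post` only stands in for whatever objects `posts` actually contains.

```python
from collections import Counter, namedtuple

# Hypothetical record type with the attributes used in the indexing loop.
Post = namedtuple('Post', 'id title description')


def duplicate_ids(posts):
    """Return a sorted list of IDs that occur more than once in `posts`."""
    counts = Counter(post.id for post in posts)
    return sorted(post_id for post_id, count in counts.items() if count > 1)
```

If `duplicate_ids(posts)` returns a non-empty list, the data should be cleaned up before the indexing loop runs.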

After adding documents, you must write all changes to disk and close the connection:
 connection.flush()
 connection.close()
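
To make sure the connection is always closed, the two calls can be wrapped in a context manager. This is a sketch, not part of Xappy: `indexing` is a hypothetical helper, and it assumes only the `flush` and `close` methods shown above.

```python
from contextlib import contextmanager


@contextmanager
def indexing(connection):
    """Yield `connection`, flush it if the block succeeds, always close it."""
    try:
        yield connection
        # Reached only when the body of the `with` block raised no exception.
        connection.flush()
    finally:
        connection.close()
```

It could then be used as `with indexing(xappy.IndexerConnection('/path/to/base')) as connection:` followed by the `connection.add(doc)` calls. If the block raises, the helper skips the explicit flush and only closes the connection; whether that is the desired behaviour depends on the application.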

That's it, the index has been created!

Search


To search an existing index, open a search connection to the index database:
 import xappy
 # Open a connection for searching;
 # the path is the same one that was used for indexing
 connection = xappy.SearchConnection('/path/to/base')

New documents may have been indexed since the search connection was opened. In that case, you need to reopen the connection to see the current state of the database:
 connection.reopen() 

The SearchConnection class offers several ways to build a search query; the simplest is query_parse:
 # Build a query from the search string
 query = connection.query_parse('search text')
 # Fetch the first 10 results;
 # for the next page pass 10, 20 and so on
 results = connection.search(query, 0, 10)
 # Print the results
 if results.matches_estimated > 0:
     for results_item in results:
         print(results_item.rank, results_item.id)
 else:
     print('Nothing found')
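
The two numbers passed to search in the example are the start and end ranks of the requested slice, so paging comes down to simple arithmetic. A small sketch; `page_bounds` is a hypothetical helper, and pages are assumed to be numbered from zero:

```python
def page_bounds(page, per_page=10):
    """Return the (start, end) ranks for a zero-based page number."""
    if page < 0 or per_page <= 0:
        raise ValueError('page must be >= 0 and per_page must be > 0')
    start = page * per_page
    return start, start + per_page
```

For the second page this would be used as `start, end = page_bounds(1)` followed by `connection.search(query, start, end)`.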

For fields with the STORE_CONTENT or INDEX_EXACT type, you can also retrieve their contents. This makes it possible, for example, to avoid fetching the matched records from the main database by ID and to get by with the search index alone:
 for results_item in results:
     print(results_item.data['title'])
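
If the stored values are needed as plain data, for example to pass to a template, they can be collected into a list of dictionaries. This is a sketch under one assumption: that `data` on a result behaves like a dictionary, as the `data['title']` lookup above suggests. `stored_rows` is a hypothetical helper name.

```python
def stored_rows(results, fields):
    """Build one dict per result containing its id and the requested fields.

    A field that is missing from a result's data is reported as None.
    """
    rows = []
    for item in results:
        row = {'id': item.id}
        for field in fields:
            row[field] = item.data.get(field)
        rows.append(row)
    return rows
```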


Related Links


Of course, this is not all that Xapian can do. These and other features are covered in more detail in the Xappy 0.5 documentation. You can also refer to the official Xapian documentation, and some further material is available in an English-language Xapian blog.

Source: https://habr.com/ru/post/113657/

