
Flask Mega-Tutorial, Part 10: Full-Text Search

This is the tenth article in the series where I describe my experience of writing a Python web application using the Flask microframework.

The purpose of this guide is to develop a fairly functional microblog application, which I decided to call microblog for lack of originality.



Recap


In the previous article, we improved our queries so that they return posts one page at a time.

Today we will continue to work with our database, but with a different purpose. All applications that store content should provide the ability to search.

For many types of websites, you can simply let Google, Bing, and other search engines index everything and provide the search results. This works well for sites that consist mostly of static pages, such as a forum. In our small application, the basic unit of content is a short user post, not a whole page, so we want a more dynamic kind of search result. For example, if we search for the word “dog”, we want to see all user posts that include this word. Obviously, the page with those results does not exist until someone performs the search, so search engines cannot index it.

Introduction to Full-Text Search Systems


Unfortunately, support for full-text search in relational databases is not standardized. Each database implements full-text search in its own way, and SQLAlchemy does not have a suitable abstraction for this case.

We are now using SQLite for our database, so we could just create a full-text index using the capabilities provided by SQLite, bypassing SQLAlchemy. But this is a bad idea, because if one day we decide to switch to another database, we will have to rewrite our full-text search for another database.

Instead, we are going to leave our database for working with ordinary data, and create a specialized database for search.
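
To make the idea of a separate search database more concrete, here is a minimal sketch of an inverted index, the kind of structure that full-text engines are typically built around. It is a toy illustration in plain Python and not how Whoosh is implemented; the names tokenize, build_index and search are made up for this example, and the posts dictionary simply stands in for our posts table.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts):
    """Map each word to the set of post ids whose body contains it.

    `posts` is a dict of {post_id: body}; it stands in for the
    posts table of the application (an assumption of this sketch).
    """
    index = defaultdict(set)
    for post_id, body in posts.items():
        for word in tokenize(body):
            index[word].add(post_id)
    return index


def search(index, word):
    """Return the sorted ids of the posts that contain `word`."""
    return sorted(index.get(word.lower(), set()))


posts = {
    1: "my first post",
    2: "my second post",
    3: "my third and last post",
}
index = build_index(posts)
print(search(index, "post"))    # [1, 2, 3]
print(search(index, "second"))  # [2]
print(search(index, "dog"))     # []
```

The point of the sketch is that a lookup touches only the entry for the searched word instead of scanning every post, which is why the index is kept as a separate structure next to the regular database.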

There are several open source full-text search engines. As far as I know, only one of them, Whoosh, has a Flask extension, and the engine itself is also written in Python. The advantage of a pure-Python engine is that it can be installed and run wherever Python is available. The disadvantage is search performance, which cannot match that of engines written in C or C++. In my opinion, the ideal solution would be a Flask extension that can connect to different search engines and hide the details from us, just as Flask-SQLAlchemy frees us from the nuances of the various databases, but nothing like that exists yet for full-text search. Django developers have a very good extension of this kind, django-haystack, which supports several full-text search engines. Maybe one day someone will create a similar extension for Flask.

For now, we will implement our search using Whoosh. The extension we are going to use is Flask-WhooshAlchemy, which combines a Whoosh index with Flask-SQLAlchemy models.

If you do not yet have Flask-WhooshAlchemy in your virtual environment, it's time to install it. Windows users should do this:

 flask\Scripts\pip install Flask-WhooshAlchemy 


All others can do this:

 flask/bin/pip install Flask-WhooshAlchemy 


Configuration


The configuration of Flask-WhooshAlchemy is very simple. We just have to tell the extension the name of our full-text search database (file config.py):

 WHOOSH_BASE = os.path.join(basedir, 'search.db') 


Model changes


Since Flask-WhooshAlchemy integrates Flask-SQLAlchemy, we need to specify which data should be indexed in which models (file app/models.py ):

    from app import app
    import flask.ext.whooshalchemy as whooshalchemy

    class Post(db.Model):
        __searchable__ = ['body']

        id = db.Column(db.Integer, primary_key = True)
        body = db.Column(db.String(140))
        timestamp = db.Column(db.DateTime)
        user_id = db.Column(db.Integer, db.ForeignKey('user.id'))

        def __repr__(self):
            return '<Post %r>' % (self.body)

    whooshalchemy.whoosh_index(app, Post)


The model now has a new field, __searchable__, which is a list of all the database fields that should be included in the searchable index. In our case, we only want to index the body field of our posts.

We also initialize the full-text index for this model by calling the whoosh_index function.

Since we did not change the format of our database, we do not need to do a new migration.

Unfortunately, all the posts that were in the database before adding the full-text search engine will not be indexed. To make sure that the database and the search engine are synchronized, we must remove all posts from the database and start over. First, run the Python interpreter. For Windows users:

 flask\Scripts\python 

For everyone else:

 flask/bin/python 

At the prompt, the following commands delete all posts:

    >>> from app.models import Post
    >>> from app import db
    >>> for post in Post.query.all():
    ...     db.session.delete(post)
    ...
    >>> db.session.commit()


Search


Now we are ready to search. Let's first add some posts to the database. We have two ways to do this. We can start the application and add posts via a web browser as a regular user, or we can do it through an interpreter.

Through the interpreter, we can do this as follows:

    >>> from app.models import User, Post
    >>> from app import db
    >>> import datetime
    >>> u = User.query.get(1)
    >>> p = Post(body='my first post', timestamp=datetime.datetime.utcnow(), author=u)
    >>> db.session.add(p)
    >>> p = Post(body='my second post', timestamp=datetime.datetime.utcnow(), author=u)
    >>> db.session.add(p)
    >>> p = Post(body='my third and last post', timestamp=datetime.datetime.utcnow(), author=u)
    >>> db.session.add(p)
    >>> db.session.commit()

The Flask-WhooshAlchemy extension is very cool because it hooks into Flask-SQLAlchemy automatically. We do not need to maintain the full-text search index ourselves; everything is done transparently for us.
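
One way to picture this kind of transparent indexing is a commit hook: the index registers a callback once, and every later commit passes the new objects to it. The sketch below is only a conceptual illustration in plain Python. ToySession and ToyIndex are hypothetical stand-ins, not classes from Flask-SQLAlchemy or Flask-WhooshAlchemy, and the real extension may work differently.

```python
class ToySession:
    """A stand-in for a database session that notifies listeners on commit."""

    def __init__(self):
        self._pending = []    # objects added but not yet committed
        self._listeners = []  # callbacks invoked after each commit

    def on_commit(self, callback):
        """Register a function to call with the list of committed objects."""
        self._listeners.append(callback)

    def add(self, obj):
        self._pending.append(obj)

    def commit(self):
        # Hand the committed objects to every listener, then start afresh.
        committed, self._pending = self._pending, []
        for callback in self._listeners:
            callback(committed)


class ToyIndex:
    """A stand-in for the search index: word -> set of post ids."""

    def __init__(self):
        self.words = {}

    def index_posts(self, posts):
        for post in posts:
            for word in post["body"].lower().split():
                self.words.setdefault(word, set()).add(post["id"])


session = ToySession()
search_index = ToyIndex()
# Registering the hook once is all the application code has to do.
session.on_commit(search_index.index_posts)

session.add({"id": 1, "body": "my first post"})
print(search_index.words)                  # {} because nothing is committed yet
session.commit()
print(sorted(search_index.words["post"]))  # [1]
```

In this picture the code that adds posts never mentions the index, which matches what we saw in the interpreter session above: we only called db.session.add() and db.session.commit().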

Now we have several posts indexed for full-text search and we can try to search:

    >>> Post.query.whoosh_search('post').all()
    [<Post u'my second post'>, <Post u'my first post'>, <Post u'my third and last post'>]
    >>> Post.query.whoosh_search('second').all()
    [<Post u'my second post'>]
    >>> Post.query.whoosh_search('second OR last').all()
    [<Post u'my second post'>, <Post u'my third and last post'>]

As you can see in the examples, queries do not have to be limited to single words. In fact, Whoosh supports a rich search query language.
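
The effect of OR in the last query can be pictured with a tiny matcher. The sketch below is a toy in plain Python, not Whoosh's query parser. It assumes that terms are combined with AND unless OR is written between them, and the helper name matches is hypothetical.

```python
def matches(query, body):
    """Return True if `body` satisfies a tiny query language.

    Terms separated by the keyword OR form alternatives; the terms
    inside one alternative must all be present (an implicit AND).
    """
    words = set(body.lower().split())
    for alternative in query.split(" OR "):
        terms = alternative.lower().split()
        if terms and all(term in words for term in terms):
            return True
    return False


bodies = ["my first post", "my second post", "my third and last post"]

print([b for b in bodies if matches("second OR last", b)])
# ['my second post', 'my third and last post']
print([b for b in bodies if matches("second last", b)])
# []
print([b for b in bodies if matches("last post", b)])
# ['my third and last post']
```

Under these assumptions, 'second OR last' finds posts containing either word, while 'second last' finds nothing because no post contains both.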

Full-text search integration in our application


To make the search available to users of our application, we need to make a few small changes.

Configuration


In the configuration, we must specify how many search results should be returned ( config.py ):

 MAX_SEARCH_RESULTS = 50 


Search form


We are going to add a search form to the navigation bar at the top of the page. The location at the top is very good, since the search will be available from all pages.

First we need to add a search form class ( app/forms.py ):

    class SearchForm(Form):
        search = TextField('search', validators = [Required()])


Then we need to create a search form object and make it available to all templates. We put it in the navigation bar, which is common to all pages. A simple way to achieve this is to create a form in the before_request handler, and insert it into the global variable g (file app/views.py ):

    from forms import SearchForm

    @app.before_request
    def before_request():
        g.user = current_user
        if g.user.is_authenticated():
            g.user.last_seen = datetime.utcnow()
            db.session.add(g.user)
            db.session.commit()
            g.search_form = SearchForm()


Then we will add the form to our template ( app/templates/base.html ):

    <div>Microblog:
        <a href="{{ url_for('index') }}">Home</a>
        {% if g.user.is_authenticated() %}
        | <a href="{{ url_for('user', nickname = g.user.nickname) }}">Your Profile</a>
        | <form style="display: inline;" action="{{url_for('search')}}" method="post" name="search">{{g.search_form.hidden_tag()}}{{g.search_form.search(size=20)}}<input type="submit" value="Search"></form>
        | <a href="{{ url_for('logout') }}">Logout</a>
        {% endif %}
    </div>


Please note we display the search form only when the user is logged in. In the same way, the before_request handler will create the form only when the user is logged in, since our application does not show any content to unauthorized guests.

Search view function


The action attribute of our form was set above to send all search requests to the search view function. This is where we will handle our full-text queries ( app/views.py ):

    @app.route('/search', methods = ['POST'])
    @login_required
    def search():
        if not g.search_form.validate_on_submit():
            return redirect(url_for('index'))
        return redirect(url_for('search_results', query = g.search_form.search.data))


This function does not do much: it simply collects the search query from the form and redirects to another page, passing the query as an argument. We do not run the search directly in this function so that the user's browser does not warn about re-submitting the form if the user refreshes the page. This situation is avoided by answering the POST request with a redirect: when the user refreshes, the browser reloads the page it was redirected to rather than repeating the POST request.
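
The redirect-after-POST flow described above can be sketched without Flask as a bare WSGI application, using only the standard library. This is an illustration of the pattern and not the code of our application: the app function and the call helper are hypothetical, and call only imitates what a browser and a web server would do.

```python
import io
from urllib.parse import parse_qs, quote, unquote, urlencode


def app(environ, start_response):
    """A tiny WSGI app that answers a POST with a redirect to a GET page."""
    method = environ["REQUEST_METHOD"]
    path = environ["PATH_INFO"]

    if method == "POST" and path == "/search":
        # Read the submitted form and pull out the search field.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        query = form.get("search", [""])[0]
        # Answer with a redirect instead of rendering the results here.
        location = "/search_results/" + quote(query, safe="")
        start_response("302 Found", [("Location", location)])
        return [b""]

    if method == "GET" and path.startswith("/search_results/"):
        # Refreshing this page repeats only a harmless GET request.
        query = path[len("/search_results/"):]
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [("Results for: " + query).encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def call(method, url_path, body=b""):
    """Invoke the app directly with a hand-built WSGI environ."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": unquote(url_path),  # servers pass the path percent-decoded
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    content = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], content


form_body = urlencode({"search": "second OR last"}).encode("utf-8")
status, headers, _ = call("POST", "/search", form_body)
print(status, headers["Location"])
# 302 Found /search_results/second%20OR%20last
status, _, content = call("GET", headers["Location"])
print(status, content.decode("utf-8"))
# 200 OK Results for: second OR last
```

The page the user ends up on was fetched with GET, so reloading it never re-sends the form.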

Results page


After the query string is submitted by the form, the POST handler passes it through redirection to the search_results handler ( app/views.py ):

    from config import MAX_SEARCH_RESULTS

    @app.route('/search_results/<query>')
    @login_required
    def search_results(query):
        results = Post.query.whoosh_search(query, MAX_SEARCH_RESULTS).all()
        return render_template('search_results.html',
            query = query,
            results = results)


The search_results function sends the query to Whoosh, passing along a limit on the number of results in order to protect against a potentially large result set.

The results are then rendered by the search_results.html template ( app/templates/search_results.html ):

    <!-- extend base layout -->
    {% extends "base.html" %}

    {% block content %}
    <h1>Search results for "{{query}}":</h1>
    {% for post in results %}
        {% include 'post.html' %}
    {% endfor %}
    {% endblock %}


Here, once again, we can reuse our post.html sub-template.

Final words


We have now completed another very important, albeit often overlooked feature, which a decent web application should have.

Below I post the updated version of the microblog application with all the changes made in this article.

Download microblog-0.10.zip .

As always, there is no database, you have to create it yourself. If you follow this series of articles, you know how to do it. If not, go back to the database article to find out.

I hope you enjoyed this tutorial.

Miguel

Source: https://habr.com/ru/post/234613/

