
Modern Tornado Part 2: Blocking Operations

We continue improving our distributed image hosting service. In this part we will talk about configuring the application and enabling CSRF protection. Then, using thumbnail generation as an example, we will learn how to work with blocking tasks, run coroutines in parallel, and handle the exceptions they raise.

Configuring the application


The Application constructor takes its settings as keyword arguments. We have already encountered this by passing debug=True as the second argument to the Application constructor. However, there is no need to hardcode such settings; otherwise, how would you run the script in production, where this parameter should obviously be False? The standard approach in Django and other Python frameworks is to store the general configuration in a settings.py file and, at its very end, import settings_local.py, which overrides the settings specific to the current environment. You could easily use that trick here too, but Tornado can change individual settings via command-line parameters. Let's see how this is implemented:

 import motor
 from tornado.options import define, options

 define('port', default=8000, help='run on the given port', type=int)
 define('db_uri', default='localhost', help='mongodb uri')
 define('db_name', default='habr_tornado', help='name of database')
 define('debug', default=True, help='debug mode', type=bool)

 options.parse_command_line()

 db = motor.MotorClient(options.db_uri)[options.db_name]

Using define we declare parameters with an optparse-like syntax, and later we read them through options. Calling options.parse_command_line() overrides the default values with whatever was passed on the command line. So in production it is now enough to start the application with the --debug=False parameter. And starting it with --help shows all the available parameters:
 $python3 app.py --help
 Usage: app.py [OPTIONS]

 Options:

   --db_name                  name of database (default habr_tornado)
   --db_uri                   mongodb uri (default localhost)
   --debug                    debug mode (default True)
   --help                     show this help information
   --port                     run on the given port (default 8000)

 /home/imbolc/.pyenv/versions/3.4.0/lib/python3.4/site-packages/tornado/log.py options:

   --log_file_max_size        max size of log files before rollover (default 100000000)
   --log_file_num_backups     number of log files to keep (default 10)
   --log_file_prefix=PATH     Path prefix for log files. Note that if you are running
                              multiple tornado processes, log_file_prefix must be
                              different for each of them (eg include the port number)
   --log_to_stderr            Send log output to stderr (colorized if possible).
                              By default use stderr if --log_file_prefix is not set and
                              no other logging is configured.
   --logging=debug|info|warning|error|none
                              Set the Python log level. If 'none', tornado won't touch
                              the logging configuration. (default info)

As you can see, Tornado automatically added its own logging parameters.
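To complete the picture, here is a minimal sketch of how the parsed options might be wired into the application startup. The urls list and the main() function are assumptions about the surrounding code from the previous part, not something shown in this article:

 from tornado import ioloop, web
 from tornado.options import options

 def main():
     options.parse_command_line()
     app = web.Application(
         urls,                   # the url patterns defined elsewhere in the app
         debug=options.debug,    # overridable with --debug=False in production
     )
     app.listen(options.port)
     ioloop.IOLoop.instance().start()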

CSRF


Now add xsrf_cookies=True to the application settings. After trying to upload a new image, we will see an error: HTTP 403: Forbidden ('_xsrf' argument missing from POST). That is the CSRF protection at work. To make the application work again, it is enough to add {% module xsrf_form_html() %} to the upload form; in the page's HTML it will turn into something like: <input type="hidden" name="_xsrf" value="2|a52d8046|a83cbd25c8b7c06e2c3ac476338982d8|1406302123"/>.
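In code, the setting goes into the same keyword-argument bag as debug. A sketch, with urls again standing in for the project's handler list:

 app = web.Application(
     urls,
     debug=options.debug,
     xsrf_cookies=True,  # POST requests without a valid _xsrf token now get a 403
 )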

Thumbnail images


When displaying thumbnails in the list of recent images, we simply used the full-size images. It is time to fix that. We will need Pillow (a modern fork of PIL, the well-known imaging library):

 pip3 install pillow 
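Pillow itself makes the resizing part straightforward. A plain blocking version might look roughly like this (a sketch using Image.ANTIALIAS, the resampling constant of Pillow versions from that era):

 import io

 from PIL import Image

 def make_thumbnail_sync(content):
     """Blocking version: image bytes in, 128x128 PNG bytes out."""
     im = Image.open(io.BytesIO(content))
     im.thumbnail((128, 128), Image.ANTIALIAS)   # resizes in place, keeping aspect ratio
     with io.BytesIO() as output:
         im.save(output, 'PNG')
         return output.getvalue()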

However, Tornado is single-threaded, and a resource-intensive operation like image processing would negate all our efforts around asynchrony. The simplest solution is to move this task to a separate thread:

 import os
 import io
 from concurrent.futures import ThreadPoolExecutor

 from PIL import Image
 from tornado import gen, web
 from tornado.concurrent import run_on_executor


 class UploadHandler(web.RequestHandler):
     executor = ThreadPoolExecutor(max_workers=os.cpu_count())

     @gen.coroutine
     def post(self):
         file = self.request.files['file'][0]
         try:
             thumbnail = yield self.make_thumbnail(file.body)
         except OSError:
             raise web.HTTPError(400, 'Cannot identify image file')
         orig_id, thumb_id = yield [
             gridfs.put(file.body, content_type=file.content_type),
             gridfs.put(thumbnail, content_type='image/png')]
         yield db.imgs.save({'orig': orig_id, 'thumb': thumb_id})
         self.redirect('')

     @run_on_executor
     def make_thumbnail(self, content):
         im = Image.open(io.BytesIO(content))
         im = im.convert('RGB')  # convert() returns a new image, it does not modify in place
         im.thumbnail((128, 128), Image.ANTIALIAS)
         with io.BytesIO() as output:
             im.save(output, 'PNG')
             return output.getvalue()

First, we create a pool of worker threads limited to the number of CPU cores, which is a reasonable limit for CPU-bound work like image processing. If more images are uploaded at the same time, the rest will wait their turn. Then we asynchronously create a thumbnail by calling our make_thumbnail method, wrapped in the run_on_executor decorator, which runs the method in one of the executor's threads (by default the decorator looks for the handler's executor attribute).

Notice how neatly we catch the OSError that Pillow raises when it cannot recognize the image format. We do not need to explicitly pass an error object through the response chain, as is done with callback-style asynchrony (for example, in node.js); we simply handle exceptions in the usual synchronous style.
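A toy illustration of the point, separate from the project code: an exception raised inside a coroutine surfaces at the yield that awaits it, so an ordinary try/except around the yield is enough.

 from tornado import gen

 @gen.coroutine
 def may_fail():
     raise OSError('cannot identify image file')

 @gen.coroutine
 def caller():
     try:
         yield may_fail()
     except OSError as e:
         print('caught at the yield point:', e)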

Next, we save the original image and the thumbnail to GridFS. Note that instead of a sequential call:

 orig_id = yield gridfs.put(file.body, content_type=file.content_type)
 thumb_id = yield gridfs.put(thumbnail, content_type='image/png')

We use the parallel form orig_id, thumb_id = yield [ ... ], so the two files are saved at the same time. Such a parallel call of coroutines makes sense for any operations that do not depend on each other. For example, we could have combined creating the thumbnail with saving the original, but we could not combine creating the thumbnail with saving it, because the second operation depends on the result of the first.
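Schematically, with op_a and op_b as hypothetical coroutines that do not depend on each other:

 @gen.coroutine
 def sequential():
     a = yield op_a()                # op_b does not even start until op_a has finished
     b = yield op_b()

 @gen.coroutine
 def parallel():
     a, b = yield [op_a(), op_b()]   # both run concurrently; results come back in order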

Finally, we save the image metadata to the imgs collection. This collection is needed to link the thumbnail with the original image, and in the future it can also hold any other information about the image: the author, access rights, and so on. With the appearance of this collection, the handlers that display the list and a single image change accordingly; a sketch of such a document and the updated handlers follow below.
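A single document in imgs ends up looking roughly like this (the ObjectId values are made up for illustration):

 {
     '_id': ObjectId('53d1f5e2a7f9c31c58e4b0a1'),    # used in the page urls
     'orig': ObjectId('53d1f5e0a7f9c31c58e4b09f'),   # GridFS id of the uploaded original
     'thumb': ObjectId('53d1f5e1a7f9c31c58e4b0a0'),  # GridFS id of the 128x128 PNG thumbnail
 }

And the updated handlers: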

 class UploadHandler(web.RequestHandler):
     ...

     @gen.coroutine
     def get(self):
         imgs = yield db.imgs.find().sort('_id', -1).to_list(20)
         self.render('upload.html', imgs=imgs)


 class ShowImageHandler(web.RequestHandler):
     @gen.coroutine
     def get(self, img_id, size):
         try:
             img_id = bson.objectid.ObjectId(img_id)
         except bson.errors.InvalidId:
             raise web.HTTPError(404, 'Bad ObjectId')
         img = yield db.imgs.find_one(img_id)
         if not img:
             raise web.HTTPError(404, 'Image not found')
         gridout = yield gridfs.get(img[size])
         self.set_header('Content-Type', gridout.content_type)
         self.set_header('Content-Length', gridout.length)
         yield gridout.stream_to_handler(self)

As you can see, ShowImageHandler.get now receives an additional size parameter, specifying whether we want the thumbnail or the original image. The route's regular expression has changed accordingly:

 web.url(r'/imgs/([\w\d]+)/(orig|thumb)', ShowImageHandler, name='show_image'), 

And so has the way these URLs are built in the template:

 {% for img in imgs %}
     <a href="{{ reverse_url('show_image', img['_id'], 'orig') }}">
         <img src="{{ reverse_url('show_image', img['_id'], 'thumb') }}">
     </a>
 {% end %}

Conclusion


That's all for today. The code for this and the previous part is available on GitHub.

Source: https://habr.com/ru/post/231201/

