Modern Tornado: distributed image hosting in 30 lines of code

Hearing about Tornado for the first time? Heard of it but were put off by asynchrony? Last looked at it more than six months ago? Then this article is for you.

Preparation

We will write in Python 3. If you don't have it installed, I recommend pyenv. Besides Tornado, we need Motor, an asynchronous driver for MongoDB:

    pip3 install tornado motor

Import the necessary modules

    import bson
    import motor
    from tornado import web, gen, ioloop

Connect to gridfs

We will use GridFS as the distributed storage:

    db = motor.MotorClient().habr_tornado
    gridfs = motor.MotorGridFS(db)

In the first line we connect to MongoDB and select the 'habr_tornado' database. In the second we connect to GridFS (by default it uses the collections with the fs prefix).
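To make "distributed storage" a little more concrete: GridFS does not keep an uploaded image as one document. It writes a metadata document to fs.files and splits the bytes into numbered pieces in fs.chunks (255 KiB each by default in PyMongo-based drivers, which Motor builds on). The sketch below imitates that layout in memory only; store_in_memory and read_back are hypothetical helpers, and a uuid stands in for the ObjectId MongoDB would assign.

```python
from datetime import datetime, timezone
import uuid

# PyMongo's gridfs package uses 255 KiB chunks by default.
CHUNK_SIZE = 255 * 1024


def store_in_memory(data: bytes, content_type: str, chunk_size: int = CHUNK_SIZE):
    """Hypothetical helper: split `data` the way GridFS does, into one
    metadata document plus numbered chunk documents that point back to it
    through files_id. No database is involved."""
    file_id = uuid.uuid4().hex  # stand-in for a MongoDB ObjectId
    files_doc = {  # what would go into fs.files
        "_id": file_id,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
        "contentType": content_type,
    }
    chunks = [  # what would go into fs.chunks
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def read_back(files_doc, chunks):
    """Reassemble the file by concatenating its chunks in order of n."""
    own = [c for c in chunks if c["files_id"] == files_doc["_id"]]
    return b"".join(c["data"] for c in sorted(own, key=lambda c: c["n"]))


doc, parts = store_in_memory(bytes(600 * 1024), "image/png")
print(doc["length"], [len(p["data"]) for p in parts])
```

A 600 KiB image thus becomes three chunk documents (255 + 255 + 90 KiB). Splitting files this way is what lets GridFS hold files larger than MongoDB's 16 MB document size limit.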

Upload handler

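    class UploadHandler(web.RequestHandler):
        @gen.coroutine
        def get(self):
            # Metadata of the 20 most recently uploaded files
            files = yield gridfs.find({}).sort("uploadDate", -1).to_list(20)
            self.render('upload.html', files=files)

        @gen.coroutine
        def post(self):
            # The file sent through the form field named "file"
            file = self.request.files['file'][0]
            # Create a GridFS file, write the image into it, then close it
            gridin = yield gridfs.new_file(content_type=file.content_type)
            yield gridin.write(file.body)
            yield gridin.close()
            # Back to the same page to show the updated list
            self.redirect('')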

We subclass tornado.web.RequestHandler and, by overriding its get and post methods, write handlers for the corresponding HTTP requests.
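The idea behind overriding get and post is verb-based dispatch: Tornado looks up a method named after the HTTP verb on the handler instance, and any verb you don't implement answers 405 Method Not Allowed. The sketch below is a hypothetical, heavily simplified stand-in (MiniRequestHandler is not part of Tornado, and the real class supports more verbs and raises HTTPError(405) instead of setting a field).

```python
class MiniRequestHandler:
    """Hypothetical, simplified imitation of tornado.web.RequestHandler's
    dispatch: the HTTP verb selects the method to call."""

    SUPPORTED_METHODS = ("GET", "POST")  # Tornado's real list is longer

    def __init__(self):
        self.status = 200
        self.body = ""

    def write(self, text):
        self.body += text

    def _unimplemented(self, *args):
        self.status = 405  # Method Not Allowed

    # Subclasses override only the verbs they actually handle.
    get = post = _unimplemented

    def dispatch(self, method, *path_args):
        if method not in self.SUPPORTED_METHODS:
            self.status = 405
            return self
        getattr(self, method.lower())(*path_args)  # "POST" -> self.post(...)
        return self


class HelloHandler(MiniRequestHandler):
    def get(self, name):  # path_args come from regex groups in the URL
        self.write(f"Hello, {name}")


print(HelloHandler().dispatch("GET", "habr").body)
print(HelloHandler().dispatch("POST").status)
```

This is why UploadHandler accepts both GET and POST, while ShowImageHandler, which only overrides get, rejects a POST.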

The tornado.gen.coroutine decorator lets us use generators instead of asynchronous callbacks. The line files = yield gridfs... looks almost the same as a synchronous files = gridfs..., but the functional difference is huge. With yield, the database request is sent asynchronously and the handler is suspended until it completes. In other words, while the database is "thinking", the site can handle other requests.
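The effect of yield can be reproduced without Tornado or MongoDB. The sketch below uses plain asyncio and native async/await (on Python 3, Tornado 5 and later run on the asyncio event loop and also accept async def handlers). fake_db_query is a hypothetical stand-in for a Motor call such as gridfs.find(...).to_list(20), with asyncio.sleep playing the database's "thinking" time.

```python
import asyncio

events = []


async def fake_db_query(name, delay=0.05):
    """Hypothetical stand-in for an asynchronous Motor query."""
    await asyncio.sleep(delay)  # the event loop is free to run other tasks
    return f"result for {name}"


async def handle_request(name):
    events.append(f"{name}: start")
    result = await fake_db_query(name)  # plays the role of `yield gridfs...`
    events.append(f"{name}: done")
    return result


async def main():
    # Two requests arrive together and are served by a single thread.
    return await asyncio.gather(handle_request("A"), handle_request("B"))


results = asyncio.run(main())
print(events)
```

Both requests start before either one finishes: while A waits on its "query", the loop has already begun serving B. With a blocking call in place of the await, B would not start until A was done.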

So in the get method we asynchronously fetch from GridFS the metadata of the 20 most recently uploaded files and pass it to the template.
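The query chain find({}).sort("uploadDate", -1).to_list(20) means "all files, newest first, at most 20 of them". MongoDB does this on the server; the plain-Python function below (latest_uploads, a hypothetical name) only spells out the same semantics over a list of metadata dictionaries.

```python
from datetime import datetime, timedelta


def latest_uploads(files, limit=20):
    """Hypothetical pure-Python equivalent of
    gridfs.find({}).sort("uploadDate", -1).to_list(20):
    sort by upload time descending (-1), then keep the first `limit`."""
    return sorted(files, key=lambda f: f["uploadDate"], reverse=True)[:limit]


start = datetime(2014, 7, 1)
uploads = [{"_id": i, "uploadDate": start + timedelta(minutes=i)} for i in range(25)]
recent = latest_uploads(uploads)
print(len(recent), recent[0]["_id"], recent[-1]["_id"])
```

With 25 files uploaded a minute apart, the five oldest are dropped and the newest one comes first, which is the order the template displays.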

In the post method we take the uploaded image file (sent through the form rendered in the template). Then we asynchronously create a new GridFS file, write the image into it and close it. After that, we redirect to the same page to show the updated list of files.

ShowImageHandler

Now we need to fetch the image from GridFS and send it to the browser:

    class ShowImageHandler(web.RequestHandler):
        @gen.coroutine
        def get(self, img_id):
            try:
                gridout = yield gridfs.get(bson.objectid.ObjectId(img_id))
            except (bson.errors.InvalidId, motor.gridfs.NoFile):
                # Malformed id or no file with this id
                raise web.HTTPError(404)
            # Let the browser know it is receiving an image
            self.set_header('Content-Type', gridout.content_type)
            self.set_header('Content-Length', gridout.length)
            # Send the file body to the client
            yield gridout.stream_to_handler(self)

Here we only handle GET requests. First, we asynchronously get the file from GridFS by its id. This id is unique and was generated automatically when the image was saved in UploadHandler. If an exception occurs along the way (an invalid id or a missing file), we return a 404 page. Next, we set the appropriate headers so that the browser recognizes the response as an image, and asynchronously stream the image body to it.
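The two 404 cases differ: a string ObjectId must be exactly 24 hexadecimal characters, so anything else fails before the database is even asked (InvalidId), while a well-formed id can still point to nothing (NoFile). The framework-free sketch below separates the two paths; serve_image and the dictionary store are hypothetical stand-ins for the handler and GridFS.

```python
import re

# bson.ObjectId accepts a string only if it is 24 hex characters.
OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def serve_image(img_id, store):
    """Hypothetical, framework-free version of ShowImageHandler.get.
    `store` maps ids to (content_type, body) and stands in for GridFS.
    Returns (status, headers, body)."""
    if not OBJECT_ID_RE.fullmatch(img_id):
        return 404, {}, b""  # malformed id: InvalidId in the real handler
    if img_id not in store:
        return 404, {}, b""  # well-formed but unknown: NoFile
    content_type, body = store[img_id]
    headers = {"Content-Type": content_type, "Content-Length": str(len(body))}
    return 200, headers, body


images = {"53b1c2d3e4f5a6b7c8d9e0f1": ("image/png", b"\x89PNG...")}
print(serve_image("53b1c2d3e4f5a6b7c8d9e0f1", images)[:2])
print(serve_image("hello", images)[0])
```

The route pattern [\w\d]+ happily accepts ids like "hello", so this validation step is what keeps such requests from turning into server errors.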

Routing

To bind our handlers (UploadHandler and ShowImageHandler) to URLs, we create an instance of tornado.web.Application:

    app = web.Application([
        web.url(r'/', UploadHandler),
        web.url(r'/imgs/([\w\d]+)', ShowImageHandler, name='show_image'),
    ])

As the argument, we pass a list that maps URL regular expressions to their handlers. The regex group ([\w\d]+) is passed to ShowImageHandler.get as img_id. The name='show_image' parameter is what we will use in the template to generate URLs.
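How the pattern turns a path into the img_id argument can be checked with the re module alone. Tornado anchors route patterns at the end and matches from the start of the path, so the behaviour is close to re.fullmatch. route and reverse_show_image below are hypothetical helpers; the second only mimics what reverse_url('show_image', ...) produces for this particular pattern.

```python
import re

# Same pattern as in web.url(r'/imgs/([\w\d]+)', ...).
SHOW_IMAGE = r"/imgs/([\w\d]+)"


def route(path):
    """Return the positional arguments the handler would receive, or None."""
    match = re.fullmatch(SHOW_IMAGE, path)
    return match.groups() if match else None


def reverse_show_image(img_id):
    """Hypothetical stand-in for reverse_url('show_image', img_id):
    put the argument back where the regex group was."""
    return "/imgs/%s" % img_id


print(route("/imgs/53b1c2d3e4f5a6b7c8d9e0f1"))
print(route("/imgs/a/b"))
print(reverse_show_image("53b1c2d3e4f5a6b7c8d9e0f1"))
```

Because \w does not match "/", nested paths are not routed to ShowImageHandler. Note also that \w already includes digits, so [\w\d] is equivalent to \w.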

Start the server

    app.listen(8000)
    ioloop.IOLoop.instance().start()

Now the result can be seen in the browser at http://localhost:8000/

Template

    <!DOCTYPE html>
    <html>
      <h1>Upload an image</h1>
      <form action="" method="post" enctype="multipart/form-data">
        <input type="file" name="file" accept="image/*" onchange="javascript:this.form.submit()">
      </form>
      <h2>Recent uploads</h2>
      {% for file in files %}
        {% set url = reverse_url('show_image', file['_id']) %}
        <a href="{{ url }}"><img src="{{ url }}" style="max-width: 50px;"></a>
      {% end %}
    </html>

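Save it as upload.html in the same directory as the script. The syntax should look familiar if you know Django or Jinja templates. The only difference is that blocks are closed with {% end %} instead of {% endfor %}.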
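For readers new to template loops, here is roughly what the {% for %} ... {% end %} block produces, written as plain Python. render_recent and fake_reverse are hypothetical; html.escape stands in for Tornado's default autoescaping of {{ ... }} expressions.

```python
from html import escape


def render_recent(file_ids, reverse_url):
    """Hypothetical plain-Python equivalent of the template loop:
    one clickable thumbnail per uploaded file."""
    parts = []
    for file_id in file_ids:
        # {% set url = reverse_url('show_image', file['_id']) %}
        url = escape(reverse_url("show_image", file_id))  # {{ url }} is escaped
        parts.append(f'<a href="{url}"><img src="{url}" style="max-width: 50px;"></a>')
    return "\n".join(parts)


def fake_reverse(name, arg):
    """Stand-in for the handler's reverse_url for the show_image route."""
    return "/imgs/%s" % arg


print(render_recent(["53b1c2d3e4f5a6b7c8d9e0f1"], fake_reverse))
```

The link and the thumbnail share one URL, so clicking a preview opens the full-size image served by ShowImageHandler.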

Result

So we have built a fast, scalable image hosting service that is asynchronous at its core yet reads like synchronous code. Most importantly, you now know how routing, request handlers and templates work in Tornado, and how to work asynchronously with MongoDB, and with GridFS in particular.

But ...

You have probably noticed one bottleneck: file = self.request.files['file'][0]. Indeed, the entire image file is loaded into memory before it is written to the database. You might be thinking of something like NginxHttpUploadModule. However, Tornado can now handle this itself with tornado.web.stream_request_body. Perhaps we will cover it in one of the next lessons.
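The idea behind stream_request_body, in miniature: with that decorator, Tornado calls the handler's data_received(chunk) method for each piece of the request body as it arrives, so each piece can be passed on immediately instead of the whole upload being held in memory. The sketch below only simulates that delivery; StreamingSink and feed are hypothetical, and a BytesIO sink stands in for a GridFS file. One caveat: a browser form upload arrives as multipart data, so a real streaming handler would also need to strip the multipart boundaries incrementally.

```python
import hashlib
import io


class StreamingSink:
    """Hypothetical handler-like object: each chunk is forwarded as soon
    as it arrives, so memory use is bounded by the chunk size, not by the
    size of the upload."""

    def __init__(self, sink):
        self.sink = sink
        self.size = 0
        self.digest = hashlib.sha256()

    def data_received(self, chunk):
        self.sink.write(chunk)  # a real handler might write to GridFS here
        self.digest.update(chunk)
        self.size += len(chunk)


def feed(handler, body, chunk_size=64 * 1024):
    """Simulate the server delivering `body` piece by piece."""
    for offset in range(0, len(body), chunk_size):
        handler.data_received(body[offset:offset + chunk_size])


buffer = io.BytesIO()
handler = StreamingSink(buffer)
feed(handler, b"x" * 200_000)
print(handler.size)
```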

Your opinion

Did you like it? Should I continue? Any corrections or suggestions?

Source: https://habr.com/ru/post/230607/
