Flask Mega-Tutorial, Part 14: I18n and L10n

This is the fourteenth article in the series in which I describe my experience of writing a Python web application with the Flask microframework.

The goal of this tutorial is to develop a reasonably functional microblog application which, for lack of originality, I decided to call microblog.



The topics of today's article are Internationalization and Localization, abbreviated I18n and L10n. We would like to make our microblog available to as many people as possible, so we should not forget that many people in the world do not speak English, or perhaps do speak it but prefer their native language.
To make our application accessible to foreign visitors, we will use the Flask-Babel extension, which is an easy-to-use framework for translating the application into various languages.

If you have not installed Flask-Babel, then it is time to do it. For Linux and Mac users:

flask/bin/pip install flask-babel 

And for Windows users:

 flask\Scripts\pip install flask-babel 


Configuration


Flask-Babel is initialized by simply creating an instance of the Babel class and passing our main Flask application to it (file app/__init__.py):

    from flask.ext.babel import Babel
    babel = Babel(app)

We also need to decide which languages we will support in our application. Let's start with Spanish, since we have a translator for that language at hand (yours truly), but do not worry: adding support for other languages later will be easy. We will put the list of supported languages in our configuration file (file config.py):

    # -*- coding: utf-8 -*-
    # ...
    # available languages
    LANGUAGES = {
        'en': 'English',
        'es': 'Español'
    }

The LANGUAGES dictionary has keys that are the codes of the supported languages and values that are the language names. Here we use the short codes, but full codes that specify both language and region can be used when necessary. For example, to support American and British English separately, we could add 'en-US' and 'en-GB' to the dictionary.

Note that because the word Español contains a character outside the basic ASCII character set, we have to add a coding comment at the top of the file to tell the Python interpreter that we are using UTF-8 rather than ASCII (which, naturally, has no ñ character).

The next configuration step is to create a function that Babel will use to decide which language to use (file app/views.py):

    from app import babel
    from config import LANGUAGES

    @babel.localeselector
    def get_locale():
        return request.accept_languages.best_match(LANGUAGES.keys())

This function, registered with the localeselector decorator, is called before each request and gives us a chance to choose the language for the response. To begin, we will use a very simple approach: we read the Accept-Language header that the browser sends with the HTTP request and pick the best match from our list of supported languages. This is even easier than it sounds, because the best_match method does all the work for us.

The Accept-Language header in most browsers defaults to the language configured in the operating system, but all browsers let the user select other languages. The user can even specify a list of languages, each with a priority (weight). As an example, consider this more complex Accept-Language header:

 Accept-Language: da, en-gb;q=0.8, en;q=0.7 

This header tells us that the user's preferred language is Danish (default weight = 1.0), followed by British English (weight = 0.8) and, as a last option, generic English with no region specified (weight = 0.7).
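To make the weighting concrete, here is a minimal sketch in plain Python of how such a header could be matched against a list of supported languages. It is a simplified stand-in for illustration only, not the implementation behind best_match, and the function names parse_accept_language and pick_language are hypothetical:

```python
def parse_accept_language(header):
    """Return (language, weight) pairs sorted by descending weight."""
    entries = []
    for part in header.split(','):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(';')
        weight = 1.0  # a language without an explicit q value weighs 1.0
        params = params.strip()
        if params.startswith('q='):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0
        entries.append((lang.strip().lower(), weight))
    # sorted() is stable, so equal weights keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def pick_language(header, supported, default='en'):
    """Pick the highest-weighted language that we support."""
    by_lowercase = {code.lower(): code for code in supported}
    for lang, weight in parse_accept_language(header):
        if weight > 0 and lang in by_lowercase:
            return by_lowercase[lang]
    return default


header = 'da, en-gb;q=0.8, en;q=0.7'
print(parse_accept_language(header))
print(pick_language(header, ['en', 'es']))
```

With only 'en' and 'es' supported, neither Danish nor British English matches, so the sketch settles on generic English.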

The final piece of configuration is the Babel configuration file, which tells Babel where to look for translatable text in our code and templates (file babel.cfg):

    [python: **.py]
    [jinja2: **/templates/**.html]
    extensions=jinja2.ext.autoescape,jinja2.ext.with_

The first two lines give Babel the filename patterns for our Python code files and templates, respectively. The third line lists the Jinja2 extensions that must be enabled so that Babel can find translatable text in the templates.

Marking text for translation


Now comes the most tedious part of this task. We need to go through all our code and templates and mark every English text that needs translating, so that Babel can find it. For example, take a look at this snippet from the after_login function:

    if resp.email is None or resp.email == "":
        flash('Invalid login. Please try again.')
        return redirect(url_for('login'))

Here we have a flash message that we want to translate. To mark this text for Babel, we simply wrap the string in a call to the gettext() function:

    from flask.ext.babel import gettext
    # ...
    if resp.email is None or resp.email == "":
        flash(gettext('Invalid login. Please try again.'))
        return redirect(url_for('login'))

In templates we do something similar, but here we have the option of using the _() function, which is a shorter alias for gettext(). For example, the word Home in a link from our base template:

  <li><a href="{{ url_for('index') }}">Home</a></li> 

can be marked for translation as follows:

  <li><a href="{{ url_for('index') }}">{{ _('Home') }}</a></li> 

Unfortunately, not all the text we would like to translate is as simple as the one presented above. As a more complex example, consider the following code snippet from our post.html template:

 <p><a href="{{url_for('user', nickname = post.author.nickname)}}">{{post.author.nickname}}</a> said {{momentjs(post.timestamp).fromNow()}}:</p> 

Here the sentence we would like to translate has the structure "<nickname> said <when>". It is tempting to mark only the word "said" for translation, but we cannot be sure that the order of the name and the time will be the same in every language. The correct solution is to mark the entire sentence for translation, using placeholders for the name and the time, so that the translator can change the order if necessary. To complicate matters further, the name component is a link!

There is no simple and elegant solution to this problem. The gettext function supports placeholders using the %(name)s syntax, and that is the best we can do. Here is an example of placeholders in a much simpler situation:

 gettext('Hello, %(name)s', name = user.nickname) 

The translator needs to be aware that placeholders exist and must be left untouched. The name of a placeholder (the part between "%(" and ")s") must not be translated, otherwise the value of the variable would be lost.
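The reason named placeholders allow reordering can be seen with ordinary Python string formatting, which uses the same %(name)s syntax. The snippet below is only an illustration: the sample values and the Spanish wording are assumptions, not text taken from the application's catalog.

```python
# Values that the application fills in; the translator never sees these.
values = {'nickname': 'susan', 'when': '2 hours ago'}

# The English source text and a hypothetical translation that swaps the
# order of the two placeholders (the Spanish wording is illustrative only).
english = '%(nickname)s said %(when)s:'
spanish = '%(when)s, %(nickname)s dijo:'

print(english % values)
print(spanish % values)
```

Because each value is looked up by name, the translated string still receives the right data even though the placeholders appear in a different order.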
But back to our post template. Here is how we mark the text for translation:

    {% autoescape false %}
    <p>{{ _('%(nickname)s said %(when)s:', nickname = '<a href="%s">%s</a>' % (url_for('user', nickname = post.author.nickname), post.author.nickname), when = momentjs(post.timestamp).fromNow()) }}</p>
    {% endautoescape %}

Text that the translator sees for this example:

 %(nickname)s said %(when)s: 

That is quite reasonable. The values of the nickname and when placeholders are what make this sentence complicated, but they are passed as additional arguments to the _() function and are not visible to the translator.
The nickname and when placeholders carry a lot of content. In particular, for nickname we have to build an entire hyperlink, since we want the name to link to the user's profile.

Since the nickname placeholder contains HTML, we have to turn off autoescaping when rendering; otherwise Jinja2 would render our HTML elements as escaped text. However, asking to render a string without escaping is rightly considered a security risk: it is very unsafe to render user-supplied text unescaped.

The text assigned to the when placeholder is safe, because it is generated entirely by our momentjs() function. The value that replaces the nickname placeholder, however, comes from the nickname field of our User model, which is read from the database, which in turn was filled from a web form completed by the user. If someone registers in our application with a nickname that contains HTML markup or JavaScript, and we then render that nickname unescaped, we are effectively inviting an attack. Of course we want to avoid this, so we will remove any potential risk.
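To see why an unescaped nickname is dangerous, consider this small sketch. It uses html.escape from the Python 3 standard library as a stand-in for what autoescaping does; the profile_link helper and the /user/ URL are hypothetical and not part of microblog.

```python
from html import escape


def profile_link(nickname, escape_nickname):
    """Build a nickname link, optionally escaping the user-supplied text."""
    text = escape(nickname) if escape_nickname else nickname
    return '<a href="/user/%s">%s</a>' % (text, text)


hostile = '<script>alert(1)</script>'
unsafe = profile_link(hostile, escape_nickname=False)
safe = profile_link(hostile, escape_nickname=True)

print(unsafe)  # the script tag reaches the page intact
print(safe)    # the angle brackets are turned into harmless entities
```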

The most reasonable solution is to reduce the attack surface by restricting the set of characters allowed in a nickname. We will start by creating a function that converts invalid nicknames into valid ones (file app/models.py):

    import re

    class User(db.Model):
        #...
        @staticmethod
        def make_valid_nickname(nickname):
            return re.sub(r'[^a-zA-Z0-9_\.]', '', nickname)

Here we simply remove from the nickname every character that is not a letter, digit, period or underscore.
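As a quick check of what this filter does, here is the same regular expression as a standalone function (outside the User model, purely for illustration) applied to a few sample nicknames:

```python
import re


def make_valid_nickname(nickname):
    """Drop every character that is not a letter, digit, underscore or period."""
    return re.sub(r'[^a-zA-Z0-9_\.]', '', nickname)


for sample in ['j.doe_99', 'john<script>', '<b>bob</b>']:
    print(repr(sample), '->', repr(make_valid_nickname(sample)))
```

Note that the markup characters are removed but the letters inside the tags survive, so the result is harmless but not necessarily what the user intended.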
When a user registers on the site, we get his or her nickname from the OpenID provider and, if necessary, convert it to a valid form (file app/views.py):

    @oid.after_login
    def after_login(resp):
        #...
        nickname = User.make_valid_nickname(nickname)
        nickname = User.make_unique_nickname(nickname)
        user = User(nickname = nickname, email = resp.email, role = ROLE_USER)
        #...

In addition, in the profile editing form, where the user can change the nickname, we must extend the validation to check the new nickname for invalid characters (file app/forms.py):

    class EditForm(Form):
        #...
        def validate(self):
            if not Form.validate(self):
                return False
            if self.nickname.data == self.original_nickname:
                return True
            if self.nickname.data != User.make_valid_nickname(self.nickname.data):
                self.nickname.errors.append(gettext('This nickname has invalid characters. Please use letters, numbers, dots and underscores only.'))
                return False
            user = User.query.filter_by(nickname = self.nickname.data).first()
            if user != None:
                self.nickname.errors.append(gettext('This nickname is already in use. Please choose another one.'))
                return False
            return True

With these fairly simple measures we have eliminated the possibility of an attack when a nickname is rendered on a page without escaping.

Extracting text for translation


I will not list here every change needed to mark all the text in the code and templates. Interested readers can review the changes on GitHub.
Let's assume that we have found all the text that needs translating and wrapped it in gettext() or _() calls. What's next?
Now we run pybabel to extract all the text into a separate file:

 flask/bin/pybabel extract -F babel.cfg -o messages.pot app 

Windows users, use this command:

 flask\Scripts\pybabel extract -F babel.cfg -o messages.pot app 

The pybabel extract command reads the given configuration file, then scans all the code and template files in the directories named on the command line (in our case, only app) and, whenever it finds text marked for translation, copies it to the messages.pot file.
The messages.pot file is a template that contains all the text that needs translating. It serves as the model from which language files are created.
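To give a feel for what extraction involves, here is a toy extractor built on Python's standard ast module (Python 3.8 or later). It is only a sketch of the idea and not how pybabel is implemented; the MARKERS set and the extract_messages function are names made up for this example.

```python
import ast

# Function names that mark a string for translation.
MARKERS = {'gettext', '_'}


def extract_messages(source):
    """Return (line number, message) pairs for each marked string literal."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, 'attr', None)
        if name in MARKERS and node.args:
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                found.append((node.lineno, first.value))
    return sorted(found)


sample = (
    "flash(gettext('Invalid login. Please try again.'))\n"
    "title = 'this string is not marked'\n"
    "label = _('Home')\n"
)
for lineno, message in extract_messages(sample):
    print(lineno, message)
```

The unmarked string on the second line is skipped, which is exactly why every translatable text has to be wrapped before running the extraction.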

Generating a language catalog


The next step is to create a translation for a new language. As planned, we will add support for Spanish (language code es). Here is the command that adds Spanish to the languages supported by our application:

 flask/bin/pybabel init -i messages.pot -d app/translations -l es 

Running pybabel with the init command takes the .pot file as input and creates a catalog for the language given by the -l option inside the directory given by the -d option. By default, Babel expects to find translations in a translations directory at the same level as the templates directory, so that is where we create them.

After running the command above, the app/translations/es directory will be created. Inside it there is another directory, LC_MESSAGES, and inside that is the messages.po file. The command can be run several times with different language codes to add support for more languages.

The messages.po file created in each language directory uses a format that is the de facto standard for translations, the same format used by gettext. There are many applications for working with .po files. For our translation we will use poedit, since it is one of the most popular and is also cross-platform.

If you want to follow along and do the translation yourself, download poedit at this link. Using the application is quite simple. Below is a screenshot of the program window after all the text has been translated into Spanish:

[Image: poedit window showing the Spanish translation]

At the top of the window is the text in the original and in the target language. At the bottom left is a window in which the translator makes a translation.

After finishing the translation and saving it to the messages.po file, it remains to take the last step:

 flask/bin/pybabel compile -d app/translations 

Running pybabel with the compile command simply reads the contents of each .po file and saves a compiled version as a .mo file in the same directory. This file contains the translated text in an optimized form that our application can use.

The translation is now ready to use. To test it, you can set Spanish as the preferred language in your browser settings or, if you do not want to touch the browser settings, you can temporarily always return "es" (the Spanish language code) from the localeselector function (file app/views.py):

    @babel.localeselector
    def get_locale():
        return "es" #request.accept_languages.best_match(LANGUAGES.keys())

Now, after the server is restarted, every call to gettext() or _() returns the translation in the language chosen by the localeselector function instead of the English text.

Updating the translations


What if the messages.po file we create is incomplete, that is, some of the text to be translated has no translation in it? Nothing bad happens: the untranslated text is simply displayed in English.
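The same fall-back-to-the-source-text behavior can be observed with Python's standard gettext module, used here on its own rather than through Flask-Babel. In this sketch the directory no/such/dir is deliberately made up so that no compiled catalog is found:

```python
import gettext

# No compiled catalog exists at this (hypothetical) location, so with
# fallback=True the standard library returns a NullTranslations object
# instead of raising an error.
translations = gettext.translation(
    'messages', localedir='no/such/dir', languages=['es'], fallback=True)

# With nothing to look up, the original English text comes back unchanged.
print(translations.gettext('Home'))
```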

What happens if we miss some English text in our code or templates? Any string that is not wrapped in a call to gettext() or _() will simply be absent from the translation files, so Babel will know nothing about it and it will remain in English. As soon as we spot the missing text, we can wrap it in a gettext() call and then run the following commands to update the translation files:

    flask/bin/pybabel extract -F babel.cfg -o messages.pot app
    flask/bin/pybabel update -i messages.pot -d app/translations

The extract command is identical to the one we used earlier; it simply generates an updated messages.pot file that includes the new text. The update command takes the new messages.pot file and merges the new text into all the translation files found in the directory given by the -d option.

When the messages.po files in all directories are updated, we can run poedit again to translate the new texts, and then repeat the pybabel compile command to make the translation of new texts available to our application.

Translating moment.js


Now that we have added a Spanish translation for all the text found in our code and in the templates, we can launch the application to check how it looks in reality.

And then we notice that all the time stamps remained in English. The moment.js library, which we used to display dates and times, does not know anything about our desire to support some other language.

After reading the moment.js documentation, we find that there is a long list of supported languages and that we just need to load a second JavaScript file for the language we want. So we simply download the Spanish file from the moment.js website and place it in the static/js directory under the name moment-es.min.js. From here on we follow the naming pattern moment-<language code>.min.js for moment.js language files, so that we can select the right file dynamically.

To be able to choose which JavaScript file to load, we must pass the language code to the template. The easiest way is to add the language code to the Flask global g, in the same way that the user information is added (file app/views.py):

    @app.before_request
    def before_request():
        g.user = current_user
        if g.user.is_authenticated():
            g.user.last_seen = datetime.utcnow()
            db.session.add(g.user)
            db.session.commit()
            g.search_form = SearchForm()
        g.locale = get_locale()

Now that the language code is available in templates, we can load the required moment.js language file in our base template (file app/templates/base.html):

    {% if g.locale != 'en' %}
    <script src="/static/js/moment-{{g.locale}}.min.js"></script>
    {% endif %}

Note the condition: if we are showing the English version of the site, the main moment.js file already gives us all the text in the right form.

Lazy evaluation


If we keep using the Spanish version of the site for a while, we will notice another problem. When we log out and then try to log back in, we see a flash message that reads "Please log in to access this page." in English. Where does this message come from? Unfortunately, we are not the ones displaying it; it is issued by the third-party Flask-Login extension.

Flask-Login lets us customize this message, and we are going to take advantage of that, not to change the message but to translate it. So, a first attempt (file app/__init__.py):

    from flask.ext.babel import gettext
    lm.login_message = gettext('Please log in to access this page.')

But this does not work. The gettext function must be used in the context of a request to return a translated message. If we call it outside a request, it simply gives us the default text, which is the English version.

For cases like this, Flask-Babel provides another function, lazy_gettext, which does not look up the translation immediately the way gettext() and _() do, but instead delays the lookup until the string is actually used. Here is how to set this message properly (file app/__init__.py):

    from flask.ext.babel import lazy_gettext
    lm.login_message = lazy_gettext('Please log in to access this page.')
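The difference between eager and lazy lookup can be sketched in plain Python. This is not how Flask-Babel implements lazy_gettext; the CATALOGS dictionary, the current_locale variable and the LazyText class are stand-ins invented for this example, and the Spanish text is only illustrative.

```python
# Stand-in for the locale that would normally be chosen per request.
current_locale = 'en'

# Stand-in catalog; the Spanish wording is illustrative only.
CATALOGS = {
    'es': {
        'Please log in to access this page.':
            'Por favor inicie sesión para acceder a esta página.',
    },
}


def eager_gettext(message):
    """Look the message up right now, using the locale active at call time."""
    return CATALOGS.get(current_locale, {}).get(message, message)


class LazyText:
    """Remember the message and translate it only when it is rendered."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return eager_gettext(self.message)


# Both are created at "startup", while the locale is still English.
eager = eager_gettext('Please log in to access this page.')
lazy = LazyText('Please log in to access this page.')

# Later, during a "request", the locale is Spanish.
current_locale = 'es'
print(eager)      # still the English text, it was resolved too early
print(str(lazy))  # the Spanish text, resolved at the moment of use
```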

Finally, when using lazy_gettext, we must inform the pybabel extract command that the lazy_gettext function is also used to wrap the text to be translated. This can be done with the -k option:

 flask/bin/pybabel extract -F babel.cfg -k lazy_gettext -o messages.pot app 

So after creating the next messages.pot, we update the language directories (pybabel update), translate the added text (poedit) and re-compile the translations (pybabel compile).

And now we can say that our application is fully internationalized!

Shortcuts


Since the pybabel commands are rather long and hard to remember, we will end this article with a few small scripts that simplify the most complex tasks we have seen.
Script to add a language to the translation catalog (file tr_init.py):

    #!flask/bin/python
    import os
    import sys
    if sys.platform == 'win32':
        pybabel = 'flask\\Scripts\\pybabel'
    else:
        pybabel = 'flask/bin/pybabel'
    if len(sys.argv) != 2:
        print("usage: tr_init <language-code>")
        sys.exit(1)
    os.system(pybabel + ' extract -F babel.cfg -k lazy_gettext -o messages.pot app')
    os.system(pybabel + ' init -i messages.pot -d app/translations -l ' + sys.argv[1])
    os.unlink('messages.pot')

Script to update the catalog with new text from the source code and templates (file tr_update.py):

    #!flask/bin/python
    import os
    import sys
    if sys.platform == 'win32':
        pybabel = 'flask\\Scripts\\pybabel'
    else:
        pybabel = 'flask/bin/pybabel'
    os.system(pybabel + ' extract -F babel.cfg -k lazy_gettext -o messages.pot app')
    os.system(pybabel + ' update -i messages.pot -d app/translations')
    os.unlink('messages.pot')

Script to compile the catalog (file tr_compile.py):

    #!flask/bin/python
    import os
    import sys
    if sys.platform == 'win32':
        pybabel = 'flask\\Scripts\\pybabel'
    else:
        pybabel = 'flask/bin/pybabel'
    os.system(pybabel + ' compile -d app/translations')

These scripts should make working with translations an easy task.
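As a possible variation, assuming Python 3, the same commands can be built as argument lists and run with subprocess instead of concatenating strings for os.system, which avoids passing the language code through a shell. The helper names pybabel_path, init_commands and run_all are hypothetical; the pybabel options are the ones used in the scripts above.

```python
import subprocess
import sys


def pybabel_path(platform=sys.platform):
    """Return the virtualenv's pybabel location for the given platform."""
    if platform == 'win32':
        return 'flask\\Scripts\\pybabel'
    return 'flask/bin/pybabel'


def init_commands(language, platform=sys.platform):
    """Build the extract and init commands as argument lists."""
    pybabel = pybabel_path(platform)
    return [
        [pybabel, 'extract', '-F', 'babel.cfg', '-k', 'lazy_gettext',
         '-o', 'messages.pot', 'app'],
        [pybabel, 'init', '-i', 'messages.pot', '-d', 'app/translations',
         '-l', language],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Show the commands for Spanish without executing them.
for command in init_commands('es'):
    print(' '.join(command))
```

Calling run_all(init_commands('es')) from the project directory would then execute both steps.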

Conclusion


Today we implemented an aspect of our application that developers often overlook. Users prefer to work in their native language, so being able to offer our application in as many languages as we can find translators for is a huge accomplishment.

In the next article we will look at what is perhaps the most difficult task in the area of I18n and L10n: the real-time automatic translation of user-generated content. And we will use this as an excuse to add some Ajax magic to our application.

Here is a link to the latest microblog, including a full translation into Spanish:

Download microblog-0.14.zip.

Or, if you prefer, you can find the source code on GitHub.

Miguel

Source: https://habr.com/ru/post/236861/

