
Customize NGINX for multilingual sites


It has long been considered good practice to serve site content in the user's preferred language. Some servers infer the language from the user's location using geolocation modules; others rely on the browser settings. The user's language preference is often stored in a cookie and reused on subsequent visits.

Which method of determining the user's language is best is debatable. My personal ranking of the sources of language information, in descending order of importance: cookie, browser settings, region.

Search engines, social networks, and other aggregators need to know in which language a page should be indexed or loaded, for example as a preview in the Facebook timeline. This means that the link itself must clearly indicate the language.
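The common ways of encoding a resource's language in its address are as follows:

1. A separate subdomain per language, for example ru.example.com
2. A language prefix in the URI path, for example example.com/ru/
3. A GET parameter, for example example.com/?locale=ru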

The first option is the most radical: each language version of the site is treated as a separate resource. The SSL certificate may cause difficulties: you must either list every hostname in advance in the certificate's Subject Alternative Name (SAN) DNS entries or order a wildcard certificate, for example *.example.com.

The second option is the most practical: the language is part of the URI, so there are no problems with indexing or with copying the link.

The third option looks less familiar, requires extra logic when appending other GET parameters, and can confuse users when they copy the link. It is not the best choice for public links.
I will describe an implementation of the second option on an NGINX server. With minimal changes, the same settings can be applied to the first option.

The setup consists of several stages.

In outline, NGINX checks the browser's language setting. If the user has a language cookie, its value overrides the browser setting. The resulting value is stored in the $lang variable.

At the first stage, configure the back-end to accept a GET parameter carrying the language. In other words, we implement the third option from the list, but only inside our own system. As an example, we will use the GET parameter locale=<language code>, where the value is a two-letter ISO 639-1 code.

Make sure that a request to a link such as http://<_back-end_>?locale=ru returns a response in Russian.
After that, you can configure NGINX on the front end.

The second stage is getting the user's language settings. It is assumed that on a visit the server sets a cookie with the preferred language in the client's browser. The cookie is named lang, so NGINX exposes its value as $cookie_lang.

In the site configuration (map blocks belong in the http context), we write:

    map $http_accept_language $browser_lang {
        default en;
        ~ru     ru;
    }

    map $cookie_lang $lang {
        default $browser_lang;
        ~en     en;
        ~ru     ru;
    }
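One caveat about the browser map: the pattern ~ru matches "ru" anywhere in the Accept-Language header, so a header such as "en-US,en;q=0.9,ru;q=0.8" is still treated as Russian. If that is not what you want, a stricter variant can look only at the first language tag. This is a minimal sketch that assumes the first tag in the header is the preferred one; it does not parse q-values.

```nginx
# Goes in the http {} context, in place of the first map.
map $http_accept_language $browser_lang {
    default   en;
    # Match only when the header starts with the "ru" tag,
    # e.g. "ru" or "ru-RU,ru;q=0.9,en;q=0.8" (case-insensitive).
    ~*^ru\b   ru;
}
```

The looser ~ru pattern from the article favors Russian whenever the user lists it at all, which may well be the intended behavior for a primarily Russian-speaking audience.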

Next, match requests of the form /NN/* in a separate location. We use a regular expression with named captures.

    location ~ '^/(?<lang_code>[\w-]{2})/(?<rest_uri>.*)'

The two-character code is saved in the $lang_code variable and everything else in the $rest_uri variable.
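To see what the two captures contain before wiring up the rest, a throwaway server block can echo them back. This is only a debugging sketch: the port 8080 is a hypothetical test port, and the pattern is the [\w-]{2} form used in the final configuration.

```nginx
server {
    listen 8080;  # hypothetical test port, not part of the real setup

    location ~ '^/(?<lang_code>[\w-]{2})/(?<rest_uri>.*)' {
        default_type text/plain;
        # Echo the captured values instead of proxying anywhere.
        return 200 "lang_code=$lang_code rest_uri=$rest_uri\n";
    }
}
```

With this in place, a request for /ru/product/details/4 should respond with lang_code=ru rest_uri=product/details/4.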

You can also redirect related languages to an existing one. For example, for a Ukrainian or Belarusian locale it is better to show the site in Russian than in English.

    if ($lang_code ~* (uk|be)) {
        return 301 http://$host/ru/$rest_uri$is_args$args;
    }

If the code is unknown, the English version of the site is used.

    if ($lang_code !~* (en|ru)) {
        return 301 http://$host/en/$rest_uri$is_args$args;
    }

The order of the if blocks matters: put the checks that match specific codes first, and the check for non-matching codes last.
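If the list of languages grows, the ordering problem can be sidestepped by moving the decision into a map and keeping a single if in the location. The following is an untested sketch of that alternative, not the configuration used in this article; the variable name $site_lang is hypothetical.

```nginx
# http {} context: map the code from the URL to a supported language.
map $lang_code $site_lang {
    default         en;   # unknown codes fall back to English
    ~*^(ru|uk|be)$  ru;   # Russian, plus related languages served in Russian
}

# Inside the location that captures $lang_code and $rest_uri:
#
#     if ($lang_code != $site_lang) {
#         return 301 http://$host/$site_lang/$rest_uri$is_args$args;
#     }
```

Because the map picks exactly one target per code, there is no sequence of checks to get wrong. One side effect is that upper-case codes such as /RU/ are redirected to their lower-case form.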

Next, strip any locale parameter the user may have put in the GET request. It is unknown how the back-end will behave if it receives duplicate arguments such as ?locale=en&locale=ru. So if a user arrives via a link like example.com/en/?locale=ru, it is better not to pass locale=ru to the back-end.

    if ($args ~ (.*)locale=[^&]*(.*)) {
        set $args $1$2;
    }

Collapse repeated ampersands:

    if ($args ~ (.*)&&+(.*)) {
        set $args $1&$2;
    }

Remove a leading ampersand:

    if ($args ~ ^&(.*)) {
        set $args $1;
    }

Remove a trailing ampersand:

    if ($args ~ (.*)&$) {
        set $args $1;
    }
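Note that the locale pattern is not anchored to a parameter boundary, so it also strips the tail of any parameter whose name merely ends in "locale" (a hypothetical mylocale=1 would become "my"). If the site has such parameters, a stricter pattern can require that locale= starts the query string or follows an ampersand. This is a sketch under that assumption. Like the original, it removes only one occurrence per pass.

```nginx
# Remove "locale=..." only when it is a whole parameter:
# either at the very start of $args or right after an "&".
if ($args ~ ^(.*&)?locale=[^&]*(.*)$) {
    set $args $1$2;
}
```

For a=1&locale=ru&b=2 this leaves a=1&&b=2, which the ampersand cleanup steps then reduce to a=1&b=2.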

That is all; it only remains to pass the required parameters to the back-end. In my example, everything goes to a group of servers registered as back-end in an upstream block of the configuration.

    proxy_pass http://back-end/$rest_uri?locale=$lang_code&$args;
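The cookie map from the second stage assumes that something sets the lang cookie. If the back-end does not do it, NGINX can set it from the language in the URL. This is a minimal sketch, not part of the original configuration: the one-year Max-Age and the Path value are assumptions, and the rest of the location body is omitted for brevity.

```nginx
location ~ '^/(?<lang_code>[\w-]{2})/(?<rest_uri>.*)' {
    # Remember the language from the URL for one year (Max-Age is in seconds).
    # add_header applies to 2xx and 3xx responses, including the 301 redirects.
    add_header Set-Cookie "lang=$lang_code; Path=/; Max-Age=31536000";

    proxy_pass http://back-end/$rest_uri?locale=$lang_code&$args;
}
```

An unsupported value in the cookie is harmless here, because the map on $cookie_lang falls back to $browser_lang for anything that does not match en or ru.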

The final configuration looks like this:

    ## get locale
    map $http_accept_language $browser_lang {
        default en;
        ~ru     ru;
    }

    map $cookie_lang $lang {
        default $browser_lang;
        ~en     en;
        ~ru     ru;
    }

    upstream back-end {
        ip_hash;
        server 172.21.71.15:8080; # vm-deb-osl-scala-1
        server 172.21.71.16:8080; # vm-deb-osl-scala-2
        server 172.21.71.17:8080; # vm-deb-osl-scala-3
        server 172.21.71.18:8080; # vm-deb-osl-scala-4
        keepalive 32;
    }

    server {
        listen 109.233.59.100:80;
        server_name ruvpn.net;

        location / {
            # Redirect to locale
            return 301 http://$host/$lang$uri$is_args$args;
        }

        # Handle URL with locale
        location ~ '^/(?<lang_code>[\w-]{2})/(?<rest_uri>.*)' {
            # Redirect to Russian for some CIS countries
            if ($lang_code ~* (uk|be|kk)) {
                return 301 http://$host/ru/$rest_uri$is_args$args;
            }
            # Redirect to English for unknown languages
            if ($lang_code !~* (en|ru)) {
                return 301 http://$host/en/$rest_uri$is_args$args;
            }
            # Remove any user-supplied locale parameter
            if ($args ~ (.*)locale=[^&]*(.*)) {
                set $args $1$2;
            }
            # Cleanup any repeated & introduced
            if ($args ~ (.*)&&+(.*)) {
                set $args $1&$2;
            }
            # Cleanup leading &
            if ($args ~ ^&(.*)) {
                set $args $1;
            }
            # Cleanup ending &
            if ($args ~ (.*)&$) {
                set $args $1;
            }
            proxy_pass http://back-end/$rest_uri?locale=$lang_code&$args;
            include /etc/nginx/proxy.conf;
        }
    }

You can check how it works on a real site. As you may have noticed from the sample configuration, the resource http://ruvpn.net is configured using this scheme. Requests such as ruvpn.net/ru/product/details/4 display the page in Russian, while a request for ruvpn.net/sv/product/details/4 is redirected to ruvpn.net/en/product/details/4, because there is no Swedish version of the site. A request for the root link ruvpn.net is automatically redirected to ruvpn.net/ru or ruvpn.net/en, depending on your language settings.
The only drawback of this method is that a two-character first path segment cannot be used for anything other than choosing a language. But that is a matter of site architecture and is easily handled at the design stage.

Source: https://habr.com/ru/post/183060/

