
Alternative Website Localization: A Content-Mutating CDN

Introduction


Most web developers have faced the task of translating a website into several languages. The task itself is quite simple, and the solution usually amounts to routine work. I am sure that many will agree with the statement that localization is a boring, non-creative part of a project.

In this article, I would like to discuss an alternative website translation model. Described in one sentence, the principle is: a CDN that translates content on the fly between the user and the origin server.

The need for translations


I doubt that the usefulness of a multilingual resource needs proving, but I will nevertheless devote one small paragraph to it.

Any website is accessible by default to the three billion Internet users on the planet, simply because it is online. If you sell something on the site, adding a language can effectively open a new market for you. At a minimum, you want to have a version in the local language (the language of the territory where you do business) and an English version, because English accounts for roughly half of all Internet content according to W3Techs.

Existing methods


Translation files


The options vary, from special formats like GNU gettext to simple text files that your current framework can use. The result is roughly the same: at the moment the text is output, a function is called that looks up a translation in the dictionary.

PHP example:

// gettext:
echo _("Hello, world!");

// Laravel 5:
echo trans('common.hello_world');

Advantages of the method:


Cons of the way:


Translations in the database


Unlike the first method, translations are stored in the project's database, which allows you to make adjustments on the fly. As a rule, project developers build a translation control panel for administrators, which takes additional time.
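As a sketch of this approach, a minimal database-backed translator might look like this. The `translations` table, its columns, and the fallback-to-key behavior are assumptions for illustration, not a prescription:

```php
<?php
// Sketch: database-backed translation lookup with an in-memory cache.
// The table and columns (translations: locale, key, value) are hypothetical;
// adapt them to your own schema.
class DbTranslator
{
    private PDO $pdo;
    private array $cache = [];

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function trans(string $locale, string $key): string
    {
        $cacheKey = $locale . ':' . $key;
        if (!isset($this->cache[$cacheKey])) {
            $stmt = $this->pdo->prepare(
                'SELECT value FROM translations WHERE locale = ? AND `key` = ?'
            );
            $stmt->execute([$locale, $key]);
            $value = $stmt->fetchColumn();
            // Fall back to the key itself when no translation exists.
            $this->cache[$cacheKey] = ($value === false) ? $key : $value;
        }
        return $this->cache[$cacheKey];
    }
}
```

The in-memory cache keeps repeated lookups on the same page from hitting the database; a control panel would simply edit the same table.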

Advantages of the method:


Cons of the way:


Translation by the user through JavaScript


A relatively new approach offered by several Western startups. All you need to do is add a link to an external JavaScript file, which then replaces texts in the DOM based on a provided (or approved) pre-translation.

Advantages of the method:


Cons of the way:


CDN Translator


This is, in fact, what this article submits for discussion. What if we insert a "layer" between the user and the site: an edge server capable of translating web content? Services like CloudFlare already mutate client pages in minimal ways, for example by injecting Google Analytics code. What if we go a step further and let the user replace texts and links?

Traditional CDN Behavior:

  1. The client requests address X;
  2a. If X is in the cache, it is returned from the cache immediately;
  2b. If X is not in the cache, the edge server makes a request to the origin site and then returns the response to the client. Depending on the headers of the origin's response and the rules configured for the site, resource X may now be placed in the cache.



CDN Translator Behavior:

  1. The client requests address X;
  2a. If X is in the cache, it is returned from the cache immediately, as is;
  2b. If X is not in the cache, the edge server makes a request to the origin site and then applies the mutation rules: it replaces links and substitutes translated texts. Depending on the headers of the origin's response and the rules configured for the site, resource X may be placed in the cache.

Step 2b in detail


Having received a response from the origin site, the edge server faces the task of translating it. The suggested tactics:

  1. Check the Content-Type header. If the value is not in the list of supported types, do not try to transform the content;
  2. Check the size of the response. If the size exceeds the established limit, do not try to transform the content;
  3. Start parsing and editing the content. For an HTML page: walk all DOM nodes that have a child text node, and request the translated text from the repository, passing the source text and the context as parameters;
  4. Having replaced the necessary pieces of content, return the result to the user. If the headers and rules allow, cache the result.
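The four steps above can be sketched in PHP roughly as follows. The supported-type list, the 1 MB size limit, and the `$translate` callback standing in for the repository request are illustrative assumptions:

```php
<?php
// Sketch of step 2b: decide whether a response is transformable and, if so,
// translate every text node in the HTML. Whitespace around text nodes is
// normalized for brevity; a production version would preserve it.
function mutateResponse(string $contentType, string $body, callable $translate): string
{
    $supported = ['text/html'];
    $maxSize = 1024 * 1024; // 1 MB, an arbitrary limit for the sketch

    // Steps 1-2: skip unsupported content types and oversized bodies.
    $type = strtolower(trim(explode(';', $contentType)[0]));
    if (!in_array($type, $supported, true) || strlen($body) > $maxSize) {
        return $body;
    }

    // Step 3: walk every text node in the DOM and ask for a translation.
    $doc = new DOMDocument();
    @$doc->loadHTML($body); // suppress warnings on real-world markup
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $node->nodeValue = $translate($text);
        }
    }

    // Step 4: return the mutated markup (caching is left to the caller).
    return $doc->saveHTML();
}
```

In a real edge server the `$translate` callback would query the translation repository with both the source text and the context of the node.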

It would be logical to implement the repository as a stand-alone RESTful API, and the context is conveniently expressed as URL:selector. For example, if we want to translate the phrase "Main page" as "Home" in any matching block of any page whose URL starts with /news, we get the context "/news*:head". The world is so used to CSS/jQuery-style selectors that virtually any developer can start working with this syntax right away.
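A minimal sketch of matching the URL half of such a context might look like this. The trailing `*` wildcard handling is an assumption; real selector matching on the right-hand side would require a CSS selector engine on the edge server:

```php
<?php
// Sketch: match the URL part of a "URL:selector" context like "/news*:head".
// The string is split on the last colon so selectors containing ":" survive;
// the selector part is ignored in this sketch.
function contextMatchesUrl(string $context, string $path): bool
{
    $urlPattern = substr($context, 0, strrpos($context, ':'));
    if (substr($urlPattern, -1) === '*') {
        // Trailing * acts as a prefix wildcard.
        return strncmp($path, rtrim($urlPattern, '*'), strlen($urlPattern) - 1) === 0;
    }
    return $path === $urlPattern;
}
```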

Since the edge server requests translations from the repository API, providing an SDK and packages for popular languages and frameworks becomes quite logical. Website owners are given a choice: translate content through the CDN, or through our class in your existing code.

Suppose we have a PHP application using the Laravel framework. Implementing legacy support is trivial: we re-declare the trans() helper function with our own implementation, where the lookup goes not to local text files but to the remote API. To avoid delays on each request, we use a cache or a separate proxy process.
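A sketch of such a re-declared helper is below. The repository endpoint, its JSON response shape ({"value": "..."}), and the injectable `$fetch` callable are all hypothetical:

```php
<?php
// Sketch: a drop-in replacement for Laravel's trans() helper that looks up
// translations in a remote repository API instead of local files.
// $fetch is injectable so the HTTP layer can be swapped or stubbed.
function remoteTrans(string $key, callable $fetch = null): string
{
    static $cache = [];
    if (isset($cache[$key])) {
        return $cache[$key];
    }
    $fetch = $fetch ?: function (string $k) {
        // One HTTP round-trip per uncached key; a shared cache (APCu, Redis)
        // or a local proxy process would amortize this in production.
        return @file_get_contents(
            'https://api.translation-repo.example/v1/trans?key=' . urlencode($k)
        );
    };
    $json = $fetch($key);
    $data = ($json !== false && $json !== null) ? json_decode($json, true) : null;
    return $cache[$key] = $data['value'] ?? $key; // fall back to the key itself
}
```

A real implementation would also pass the target locale and the context, and would share the cache across requests rather than keeping it per-process.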

Similarly, we can change the contents of JavaScript objects, graphics, and so on.

Advantages of the method:


Cons of the way:


YouTube video


To explain the concept clearly, I shot a very short video clip that shows my prototype of such a translation system. The narration is in English, but I added Russian subtitles.



Implementation


I have already checked the feasibility and practicality of the proposed method: I wrote a primitive version of the edge application in PHP with Lumen.

My method that receives a request from the user and returns the translated response:

 /**
  * @param Request $request
  * @param WebClientInterface $crawler
  * @param MutatorInterface $mutator
  * @param TranslatorInterface $translator
  * @return Response
  */
 public function show(Request $request, WebClientInterface $crawler, MutatorInterface $mutator, TranslatorInterface $translator)
 {
     $url = $request->client['origin'] . parse_url($request->url(), PHP_URL_PATH);
     $response = $crawler->makeRequest($request->getMethod(), $url);

     if ($response === false) abort(502);

     $mutator->initWithWebRequest($response);

     if ($response->isTranslatable()) $mutator->translateText($translator);
     if ($response->isCacheable()) $mutator->cache(60);

     $mutator->replaceLinks($request->client['origin'], $request->getSchemeAndHttpHost());

     return (new Response($mutator->getBody(), $mutator->getStatusCode()))
         ->withHeaders($mutator->getHeaders());
 }

I am sure that many will begin to doubt the paradigm because of the CPU load: nginx, for example, deliberately refuses to mutate response bodies, since doing so would hurt performance badly. In general, translating like this, post factum, is certainly more expensive in terms of resources.

My arguments here are as follows. First, we have seen a constant decline in the cost of IT resources over the past 5-10 years; the era of the $5 server has arrived, and for many sites a slight increase in load is not that scary. Secondly, if I do build this project, performance optimization will be one of the priorities. Surely there is plenty of room for improvement!

Conclusion


The industry always moves in the direction of optimization, increased comfort, and cost savings. I believe that the proposed method of localizing web applications is likely to become the dominant one within 5-10 years.

Moreover, a CDN as a structure keeps finding new applications: CloudFlare brought DDoS protection to the world, and Imgix generates responsive images on the fly.

Source: https://habr.com/ru/post/279201/
