Alternative Website Localization: A Content-Mutating CDN
Introduction
Most web developers have faced the task of translating a website into several languages. The task itself is fairly simple, and the solution usually comes down to routine work. I am sure many will agree that localization is a boring, non-creative part of a project.
In this article I would like to discuss an alternative website translation model. Described in one sentence, the principle is: a CDN that translates content on its way between the user and the original source.
The need for translations
I doubt the usefulness of a multilingual resource needs proving, but I will nevertheless devote one small paragraph to it.
Any Internet site is accessible to three billion users on the planet by default, simply because it is online. If you sell something on the site, adding a language effectively takes you into a new market. At a minimum, you want a version in the local language (the language of the territory where you do business) and an English version, because English accounts for roughly half of all web content according to W3Techs.
Existing methods
Translation files
The options vary, from dedicated formats such as GNU gettext to plain text files that your framework of choice can read. The result is roughly the same: at the moment the text is output, a function is called that looks up the translation in a dictionary.
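As an illustration, here is a minimal sketch of this approach using GNU gettext in PHP (assuming the gettext extension is installed; the domain name and directory layout are example values, not part of any particular setup):

```php
<?php
// Minimal gettext-style lookup: at output time, gettext() checks the
// compiled dictionary (.mo file) for a translation of the source string.
$locale = 'de_DE.UTF-8';
putenv('LC_ALL=' . $locale);
setlocale(LC_ALL, $locale);

// Expects ./locale/de_DE/LC_MESSAGES/messages.mo to exist.
bindtextdomain('messages', __DIR__ . '/locale');
textdomain('messages');

echo gettext('Main page'); // prints the translation if found, otherwise the source string
```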
Advantages of the method:
Individual files can simply be handed over to third-party translators;
Small impact on the code.
Drawbacks of the method:
As a rule, there is no way to make changes instantly;
gettext dictionaries need to be compiled and the final files committed to the repository;
Text management is relatively inconvenient and slow: the larger the project, the larger the files and the more complicated the hierarchy;
There are no standard mechanisms for a team of translators to work together;
As a rule, two schemes are used in parallel: translations for the backend and translations for the frontend (JavaScript);
HTML tags occasionally end up in the translation strings because it was inconvenient for the programmers to extract them.
Translations in the database
Unlike the first method, translations are stored in the project's database, which allows adjustments to be made on the fly. As a rule, the project developers build a translation control panel for administrators, which takes additional time.
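As a rough sketch of this approach, a lookup against a hypothetical translations(locale, source, target) table could look like this (the schema and function name are illustrative, not from the article):

```php
<?php
// Look up a translation in the database, falling back to the source text.
function dbTrans(PDO $pdo, string $source, string $locale): string
{
    $stmt = $pdo->prepare(
        'SELECT target FROM translations WHERE locale = :locale AND source = :source LIMIT 1'
    );
    $stmt->execute(['locale' => $locale, 'source' => $source]);
    $target = $stmt->fetchColumn();

    return $target === false ? $source : $target;
}

// Usage:
// $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
// echo dbTrans($pdo, 'Main page', 'de');
```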
Advantages of the method:
It is easier to organize teamwork;
Changes can be made immediately.
Drawbacks of the method:
It is harder to hand individual sections off to third-party translators;
Frontend texts are still translated separately from backend texts;
HTML tags occasionally end up in the translation strings because it was inconvenient for the programmers to extract them.
Translation in the user's browser via JavaScript
A relatively new approach offered by several Western startups. All you need to do is add a link to an external JavaScript file, which then starts replacing texts in the DOM based on a supplied (or approved) pre-translation.
Advantages of the method:
Easy installation with almost no programming required;
Frontend and backend are translated simultaneously from the same translation repository;
No HTML tags end up in the text repository, because all texts are extracted post factum from the DOM.
Drawbacks of the method:
Search engines will not see the additional languages;
Sharing a link to a translated page on social networks is likewise impossible;
Additional network load (read: a risk of delays) when the site is opened.
CDN Translator
This is what is actually being submitted for discussion in this article. What if we insert a "layer" between the user and the site: an edge server capable of translating web content? Services like CloudFlare already know how to apply small mutations to client pages, injecting Google Analytics code, for example. What if we go a step further and allow the customer to replace texts and links as well?
Traditional CDN Behavior:
1. The client requests address X;
2a. If X is in the cache, it is returned from the cache immediately;
2b. If X is not in the cache, the edge server makes a request to the origin site and then returns the response to the client. Depending on the headers in the origin's response and the rules configured for the site, resource X may now be placed in the cache.
CDN Translator Behavior:
1. The client requests address X;
2a. If X is in the cache, it is returned from the cache immediately, as is;
2b. If X is not in the cache, the edge server makes a request to the origin site and then applies the mutation rules: it replaces links and substitutes the translated texts. Depending on the headers in the origin's response and the rules configured for the site, resource X may be placed in the cache.
Step 2b in detail
Having received a response from the origin site, the edge server has to decide how to translate it. The suggested tactics:
Look at the Content-Type header. If its value is not on the list of supported types, do not attempt to transform the content;
Look at the size of the response. If it exceeds the configured limit, do not attempt to transform the content;
Start parsing and editing the content. For an HTML page, for example: walk through every DOM node that has a child text node and request the translated text from the repository, passing the source text and the context as parameters (see the sketch after this list);
After replacing the necessary pieces of content, return the result to the user. If the headers and rules allow, cache the result.
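Here is a minimal sketch of such a pass over an HTML response using PHP's DOMDocument; the TranslationRepository interface and its lookup() method are assumptions standing in for the call to the translation storage:

```php
<?php
// Hypothetical client for the translation storage.
interface TranslationRepository
{
    // Returns the translated string, or null if no translation exists.
    public function lookup(string $source, string $context): ?string;
}

// Walk the text nodes of an HTML document and substitute translations.
function translateHtml(string $html, string $context, TranslationRepository $repo): string
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // suppress warnings about tags unknown to libxml

    $xpath = new DOMXPath($dom);
    // Every non-empty text node outside <script> and <style>.
    $nodes = $xpath->query(
        '//text()[normalize-space()][not(ancestor::script)][not(ancestor::style)]'
    );

    foreach ($nodes as $node) {
        $source     = trim($node->nodeValue);
        $translated = $repo->lookup($source, $context); // source text + context
        if ($translated !== null) {
            $node->nodeValue = str_replace($source, $translated, $node->nodeValue);
        }
    }

    return $dom->saveHTML();
}
```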
It would be logical to implement the repository as a stand-alone RESTful API, and it is convenient to express the context as URL:selector. For example, if we want the phrase "Main page" to be translated as "Home" in any block of any page whose path starts with /news, we get the context "/news*:head". The world is so used to CSS/jQuery-style selectors that virtually any developer can start working with this syntax right away.
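A request to such a repository API might look like this; the endpoint, parameter names, and response shape are assumptions, only the context format follows the example above:

```php
<?php
// Ask the repository for the translation of "Main page" in the given context.
$query = http_build_query([
    'source'  => 'Main page',
    'context' => '/news*:head', // URL pattern + CSS/jQuery-style selector
    'locale'  => 'en',
]);

$raw  = file_get_contents('https://repository.example.com/v1/translations?' . $query);
$data = json_decode($raw, true);

echo $data['target'] ?? 'Main page'; // fall back to the source text
```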
Since the edge server requests translations from the repository API, it becomes quite logical to offer SDKs and packages for popular languages and frameworks. Website owners are given a choice: translate content through the CDN, or through our class inside the existing code.
Suppose we have a PHP application built on the Laravel framework. Implementing legacy support is trivial: we re-declare the trans() helper function and replace it with our own implementation, in which the lookup goes not to local text files but to the remote API. To avoid a delay on every request, we use a cache or a separate proxy process.
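A minimal sketch of such an override, assuming a hypothetical repository endpoint and a one-hour cache; it mimics the signature of Laravel's trans() helper so that existing calls keep working:

```php
<?php
// Must be loaded before the framework's own helpers so that this trans()
// wins the function_exists() check and shadows the built-in one.
if (!function_exists('trans')) {
    function trans($key = null, array $replace = [], $locale = null)
    {
        $locale   = $locale ?: config('app.locale');
        $cacheKey = "i18n:{$locale}:{$key}";

        // Cache remote lookups to avoid a round trip on every call.
        $text = app('cache')->remember($cacheKey, 3600, function () use ($key, $locale) {
            $url = 'https://repository.example.com/v1/translations?' . http_build_query([
                'source' => $key,
                'locale' => $locale,
            ]);
            $data = json_decode(@file_get_contents($url), true);

            return $data['target'] ?? $key; // fall back to the key itself
        });

        // Reproduce Laravel's :placeholder replacement behaviour.
        foreach ($replace as $search => $value) {
            $text = str_replace(':' . $search, $value, $text);
        }

        return $text;
    }
}
```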
Similarly, we can change the contents of JavaScript objects, graphics, and so on.
Advantages of the method:
Full abstraction between the application and the translations: the application knows nothing about the existence of other language versions, and the programmers keep working quietly on the main product;
Backend and frontend content is translated simultaneously from a single translation repository;
Graphic images can be replaced just as simply;
It is very easy to run translated versions of the site on other (separate) domains;
Compatible with any existing CDN service; they can be chained together;
Compatible with search engines and social networks;
No HTML tags end up in the text repository, because all texts are processed post factum from the DOM;
It is easy to organize teamwork.
Drawbacks of the method:
I could not find any, but I will be very glad of your help with this!
YouTube video
To explain the concept clearly, I shot a very short video clip that shows my prototype of such a translation system. The narration is in English, but I added Russian subtitles.
Implementation
I have already checked the feasibility and practicality of the proposed method: I wrote a primitive version of the edge application in PHP and Lumen.
Its core is a method that receives the request from the user and returns the translated response.
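A minimal sketch of what such a route handler can look like in Lumen; the origin URL and the Translator service are assumptions standing in for the real prototype:

```php
<?php
// routes/web.php: a catch-all proxy route for the edge application.
$router->get('/{path:.*}', function ($path = '') {
    $origin = 'https://origin.example.com/' . ltrim($path, '/');

    // Fetch the original resource (keep error responses so they can be passed through).
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $body    = file_get_contents($origin, false, $context);

    // Recover the Content-Type from the origin's response headers.
    $contentType = 'text/html';
    foreach ($http_response_header ?? [] as $header) {
        if (stripos($header, 'Content-Type:') === 0) {
            $contentType = trim(substr($header, strlen('Content-Type:')));
        }
    }

    // Step 2b: mutate only supported content types.
    if (stripos($contentType, 'text/html') !== false) {
        $body = app(\App\Services\Translator::class)->translate($body, '/' . $path);
    }

    return response($body, 200, ['Content-Type' => $contentType]);
});
```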
I am sure many will question this paradigm because of the load on the processor: nginx, for example, deliberately avoids mutating response bodies, since doing so would have a very negative impact on performance. And indeed, translating like this, post factum, is certainly more expensive in terms of resources.
My arguments here are as follows. First, we have seen a steady reduction in the cost of IT resources over the past 5-10 years; the era of the $5 server has arrived, and for many sites a slightly higher load is not that scary. Second, if I do build this project, performance optimization will be one of the priorities. Surely there is plenty of room for improvement!
Conclusion
The industry always moves in the direction of optimization, greater convenience, and cost savings. I believe the proposed method of localizing web applications is likely to become the dominant one in 5-10 years.
Moreover, a CDN as a structure keeps finding new applications: CloudFlare brought DDoS protection to the world, and Imgix generates responsive images on the fly.