
Why the motorcycle could not replace the tank, or migrating the REG.RU website from Template::Toolkit to Text::Xslate

Behind any major Internet project there is an automated information system and a website that sells goods or services. The larger the project, the more complex the site's logic and the heavier the load it has to bear, so there is constant pressure to increase the site's capacity and reduce page response times. Like everyone who builds such systems, we periodically hold speed-tuning sessions for our website and optimize everything we can reach. At a certain point we ran into the speed of HTML template rendering, and it was not at all clear how to “overclock” it. We managed to squeeze something out by caching subtemplates, but despite the positive results, template engine time remained the main bottleneck in page generation. More radical measures were needed, perhaps even a different template engine...

Below is the story of one of our attempts in the difficult quest for the Holy Grail of the fastest template engine, told in a detailed report by Dmitry Karasik, who worked on this task:

“These days, I think, everyone uses template engines for web development. Everyone uses them and quietly grumbles about the shortcomings of the tool they picked. After all, migrating a sprawling project to another template engine is very hard, so people far more often prefer to patch up an existing package than to rewrite a lot of code with an uncertain outcome.

The registrar REG.RU needed to speed up its website, written in Template::Toolkit, and asked me to solve this problem. I was pleasantly surprised that people inside the company, after analyzing the possible solutions, had decided to move the project to another template engine, Text::Xslate, while keeping the full syntax of Template::Toolkit (hereinafter TT).
I had heard of Xslate but had never worked with it. According to the benchmarks on its website, it outperforms TT by a factor of 158, and if “even half of what he said is true” (c), that looks like an excellent option. There was little reason to doubt it: most of the project is written in C, which by itself is faster than Perl. Moreover, Xslate partially supports TT syntax. Looking at the code, everything seemed fine to me: support was incomplete, probably because the author had concentrated on the engine itself and left the basic outline for whoever might want to extend Xslate in that direction (a kind of invitation to join the project). It all looked quite promising, and I decided to extend Xslate until it supported TT fully, or at least fully enough for the customer, which REG.RU and I happily agreed on. Everyone would win, or so it seemed: the client would get code that runs 158 times faster, Xslate would get polished TT syntax support and new users, the community would get an excellent alternative to TT2, and I would get a profit and a feather in my cap.

The problems started almost immediately. The author of Xslate, Goro Fuji, alas, did not respond to any emails or on IRC. My attempt to agree with him on how to extend the module in the way that best matched his intentions effectively failed. Perhaps my letter did not follow the rules of Japanese etiquette. I had met programmers from Japan at conferences before, and if I could not get beyond polite smiles and greetings in person, it is probably even harder to do online.

“Well, never mind,” I thought, “the RTFS mantra has never failed yet.” If the code is good, you can always publish a patch or a fork, even if the author does not want to cooperate. After all, I do not claim to know how Xslate should be developed as a project; I only intend to use its highly optimized engine as a base.

So I started studying the module. In fairness, both the code and the architecture of Xslate are excellent. Students should study them, and experienced programmers would do well to take a look too. Xslate is built around the observation that template engines mostly shuffle text back and forth, and its optimization starts already at that level: the usual Perl concatenation

  $a .= $b

is replaced by an equivalent written in C, but as lightweight as possible. All other processing is in C as well: not only are the atomic operations of the template engine written in C (for example, the concatenation above, or the operations TT expresses as [% IF %], [% CALL %], etc.), but so is the loop that runs the template. This saves the overhead of calling procedures from Perl (even procedures written in C), which is a fairly expensive operation. The trick is possible because a fairly ordinary-looking template, for example this one written in TT,

 [% IF a %]
    text a
 [% ELSE %]
    text b
 [% END %]

is translated by Xslate into pseudo-assembler commands like

 .if a
 .literal "text a"
 .print # 1
 .goto 3
 .literal "text b"
 .print # 2
 .end

and then into a compiled binary file. TT, by contrast, also compiles templates, but into Perl code that looks something like this:

 Template::Document->new(
     BLOCK => sub {
         ...
         eval {
             ...
             if ($a) {
                $output .= "text a";
             } else {
                $output .= "text b";
             }
             ...
         }
     }
 )
 ...

which is then loaded with an ordinary eval.
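To make the contrast concrete, here is a minimal sketch in C of the compile-once, run-in-C idea, applied to the IF/ELSE template above: the template becomes an array of opcodes, and a C loop executes them, appending literals to an output buffer (the C counterpart of $output .= ...). The opcode names, the struct insn layout and the absolute jump targets are our own assumptions for illustration, not Xslate's actual internals.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical opcode set, loosely modelled on the pseudo-assembler above. */
enum op { OP_IF, OP_LITERAL, OP_PRINT, OP_GOTO, OP_END };

struct insn {
    enum op op;
    const char *arg;   /* literal text for OP_LITERAL */
    int target;        /* absolute jump target for OP_IF / OP_GOTO */
};

/* Output buffer: the C analog of Perl's $output .= $text. */
struct buf { char data[256]; size_t len; };

static void buf_append(struct buf *b, const char *s) {
    size_t n = strlen(s);
    assert(b->len + n < sizeof b->data);   /* keep room for the '\0' */
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Run a compiled template; `a` is the value of template variable a. */
static void run(const struct insn *code, int a, struct buf *out) {
    const char *sb = NULL;   /* "string register" holding the last literal */
    int pc = 0;
    out->len = 0;
    out->data[0] = '\0';
    for (;;) {
        const struct insn *i = &code[pc];
        switch (i->op) {
        case OP_IF:      pc = a ? pc + 1 : i->target; break; /* false: jump */
        case OP_LITERAL: sb = i->arg; pc++; break;
        case OP_PRINT:   buf_append(out, sb); pc++; break;
        case OP_GOTO:    pc = i->target; break;
        case OP_END:     return;
        }
    }
}

/* [% IF a %]text a[% ELSE %]text b[% END %], compiled by hand. */
static const struct insn if_else[] = {
    { OP_IF,      NULL,     4 },  /* 0: if a is false, go to the ELSE branch */
    { OP_LITERAL, "text a", 0 },  /* 1 */
    { OP_PRINT,   NULL,     0 },  /* 2 */
    { OP_GOTO,    NULL,     6 },  /* 3: skip the ELSE branch */
    { OP_LITERAL, "text b", 0 },  /* 4 */
    { OP_PRINT,   NULL,     0 },  /* 5 */
    { OP_END,     NULL,     0 },  /* 6 */
};
```

Calling run(if_else, 1, &out) leaves "text a" in out.data, and run(if_else, 0, &out) leaves "text b". Dispatch here is an ordinary switch inside a loop, which is the "obvious" way to do it.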
But compiling to opcodes was not enough for the author of Xslate! The code that executes the individual "assembler" commands is not written the way one might expect, for example (schematically):

 for (i = 0; i < opcodes.length; i++) {
     opcodes.list[i].callback();
 }

but as so-called threaded code:

     LABEL(noop): TXCODE_noop(aTHX_ st); goto *(st->pc->exec_code);
     LABEL(move_to_sb): TXCODE_move_to_sb(aTHX_ st); goto *(st->pc->exec_code);
     LABEL(move_from_sb): TXCODE_move_from_sb(aTHX_ st); goto *(st->pc->exec_code);
     LABEL(save_to_lvar): TXCODE_save_to_lvar(aTHX_ st); goto *(st->pc->exec_code);
     ...
     LABEL(end): TXCODE_end(aTHX_ st);

Here execution "jumps" from goto to goto until it reaches the "end" opcode. No extra function calls happen at all if, for example, TXCODE_move_to_sb boils down to something like register.a = register.b, which in theory lets the entire interpreter loop fit in the processor cache. (In practice this is unrealistic, since templates do not consist only of simple actions: calls to Perl functions are very often required.)
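The difference between the two dispatch styles can be sketched in C. Below, the same toy program runs once through a table of callbacks (one indirect call per opcode, like the schematic loop above) and once through threaded code built with "labels as values" (&&label and goto *ptr), a GCC extension that Clang also supports and that is not part of standard C. The opcodes and function names are hypothetical; this illustrates the technique and is not Xslate's source.

```c
#include <assert.h>

/* Hypothetical opcodes for a tiny one-register machine. */
enum { OP_INC, OP_DOUBLE, OP_END };

/* 1) Callback-table dispatch: one indirect function call per opcode. */
struct machine { long reg; };
typedef void (*handler)(struct machine *);
static void h_inc(struct machine *m)    { m->reg += 1; }
static void h_double(struct machine *m) { m->reg *= 2; }

static long run_callbacks(const int *code, int len) {
    static const handler table[] = { h_inc, h_double };
    struct machine m = { 0 };
    for (int i = 0; i < len && code[i] != OP_END; i++)
        table[code[i]](&m);
    return m.reg;
}

/* 2) Threaded dispatch (GNU "labels as values"): each handler ends by
   jumping straight to the label of the next opcode, so there is no
   call/return and no central loop. */
static long run_threaded(const int *code) {
    static void *const labels[] = { &&do_inc, &&do_double, &&do_end };
    long reg = 0;
    const int *pc = code;
#define NEXT goto *labels[*pc++]
    NEXT;
do_inc:    reg += 1; NEXT;
do_double: reg *= 2; NEXT;
do_end:
#undef NEXT
    return reg;
}
```

For the program {OP_INC, OP_DOUBLE, OP_INC, OP_DOUBLE, OP_END} both functions return 6. Compile with GCC or Clang in their default GNU modes; a strict -std=c11 -pedantic-errors build rejects the computed goto.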

The parser side looked interesting too. Personally, I have never been particularly interested in the mechanics of parsing, all those LL/LR/LALR parsers, but I can appreciate the idea. Both TT and Xslate are small self-contained languages rather than just sets of directives; for example, in Xslate you can "golf" one-liners like this:

  Hello, <: $lang // "Perl" :> world!

TT, however, uses the old, proven lex/yacc approach implemented with Parse::Yapp, which appeals to me most personally. Xslate uses its own parser, built as a top-down parser, a kind I had never dealt with before. I will not go into details here; I will only say that extending the syntax with this type of parser was far from the most pleasant task in my experience, mostly because many parts of the code were shared by all the syntaxes Xslate supports, and I had to pull each of them apart.

The biggest problems, however, were still ahead, at runtime. As it turned out, Xslate's runtime simply has no mechanism similar to TT's [% CALL %] that can call a function defined in another template. I found a way to deal with this, after spending a couple of weeks of free time digging through its pseudo-assembler and trying to understand its principles so as not to break anything. It also turned out that TT allows asymmetric tag delimiters, for example [%- ... %], which the Xslate parser cannot handle. But the biggest sticking point was that TT and Xslate simply have very different variable scoping: if in TT I write [% foo = 'bar' %], the variable foo is then visible in all subtemplates, whereas in Xslate it is a local variable.
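The scoping difference can be illustrated with a deliberately simplified model in C: in the TT-like case the caller and the subtemplate share one variable table (the stash), while in the Xslate-like case foo lives in the caller's local frame and a subtemplate sees only what is passed to it explicitly. All type and function names here are hypothetical, and real TT has more nuance (for example, INCLUDE localizes changes a subtemplate makes to the stash, while PROCESS does not).

```c
#include <assert.h>
#include <string.h>

/* A deliberately tiny variable table: up to 8 name/value pairs. */
struct vars { const char *name[8]; const char *value[8]; int n; };

static void set_var(struct vars *v, const char *name, const char *value) {
    for (int i = 0; i < v->n; i++)
        if (strcmp(v->name[i], name) == 0) { v->value[i] = value; return; }
    assert(v->n < 8);
    v->name[v->n] = name;
    v->value[v->n] = value;
    v->n++;
}

static const char *get_var(const struct vars *v, const char *name) {
    for (int i = 0; i < v->n; i++)
        if (strcmp(v->name[i], name) == 0) return v->value[i];
    return NULL;   /* undefined */
}

/* Hypothetical subtemplate that just reads foo from what it can see. */
static const char *subtemplate(const struct vars *visible) {
    return get_var(visible, "foo");
}

/* TT-like: caller and subtemplate share one stash. */
static const char *tt_style(void) {
    struct vars stash = { .n = 0 };
    set_var(&stash, "foo", "bar");     /* [% foo = 'bar' %] */
    return subtemplate(&stash);        /* the subtemplate sees foo */
}

/* Xslate-like: foo is local to the caller; the subtemplate gets only
   the (here empty) set of variables passed to it. */
static const char *xslate_style(void) {
    struct vars locals = { .n = 0 }, passed = { .n = 0 };
    set_var(&locals, "foo", "bar");
    return subtemplate(&passed);       /* foo is not visible here */
}

/* Xslate-like, with foo passed on explicitly by the caller. */
static const char *xslate_style_passed(void) {
    struct vars locals = { .n = 0 }, passed = { .n = 0 };
    set_var(&locals, "foo", "bar");
    set_var(&passed, "foo", get_var(&locals, "foo"));
    return subtemplate(&passed);
}
```

Emulating TT semantics on top of Xslate therefore means either passing variables around explicitly or changing how local variables are stored, which is why this difference was the hardest to paper over.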

All these problems were, of course, solvable, but once again the question of effort versus result came up: the initial estimate for the project did not anticipate changes on this scale. After a meeting with the customer, it turned out that the company wanted at least a twofold performance increase. So I decided to estimate how much performance could realistically be gained and, roughly speaking, whether it was worth it at all.
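The question "is it worth it?" comes down to Amdahl's law. If a fraction p of the request time (0 < p < 1) is spent in the template engine, and replacing the engine makes that part s times faster, the overall speedup S of the request is bounded:

```latex
S(s) = \frac{1}{(1 - p) + p/s} < \frac{1}{1 - p}
\qquad\Longrightarrow\qquad
S \ge 2 \ \text{requires}\ p > \tfrac{1}{2}.
```

In other words, no template engine, however fast, can double page speed unless template rendering takes more than half of the request time. The text does not state p for the REG.RU pages; this is only the general bound that the profiling has to be checked against.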

This turned out to be simple. Devel::NYTProf, an excellent profiler (especially after many years with Devel::DProf), produced the following interesting picture when profiling template rendering:

[Image: Devel::NYTProf profile of template rendering]

The actual table was about five times longer, and I removed some rows for clarity, but even a cursory analysis shows that most of the load comes not from TT itself but from code that runs independently of it. So even if replacing TT with Xslate reduced template engine time to zero, that would still not deliver the desired twofold increase. Still, so as not to walk away completely empty-handed, I decided to dig a little deeper, and here is what I found:

• In main_menu.inc and index.html I found some plainly slow code like

 [% FOR x %]
 [% FOR y %]
 ... do something ...
 [% END %]
 [% END %]

which can easily be rewritten to run faster. I think this alone could already deliver the required twofold speedup.

• Next, Template::Stash, part of TT. This prime suspect, it turns out, was long ago optimized about as far as it can go in Template::Stash::XS; apparently, at some point its speed got in someone's way too. The Perl part, however, turned out to be the problem, and by trimming it a little I brought it down from 356 ms to 238 ms. The only catch is that my “trimming” works specifically for this customer and this particular profile, so as a general-purpose patch it makes no sense. In other words, you can use it as a local hack, but then the system administrator will have to reapply it every time. All in all, a dubious benefit. For anyone curious, here it is:

 --- Template-Toolkit-2.24/xs/Stash.xs.0 2012-11-22 23:26:54.670467300 +0100
 +++ Template-Toolkit-2.24/xs/Stash.xs 2012-11-22 23:26:39.406594300 +0100
 @@ -1198,21 +1198,7 @@
      }
 
      if (!SvOK(RETVAL)) {
 -        dSP;
 -        ENTER;
 -        SAVETMPS;
 -        PUSHMARK(SP);
 -        XPUSHs(root);
 -        XPUSHs(ident);
 -        PUTBACK;
 -        n = call_method("undefined", G_SCALAR);
 -        SPAGAIN;
 -        if (n != 1)
 -            croak("undefined() did not return a single value\n");
 -        RETVAL = SvREFCNT_inc(POPs);
 -        PUTBACK;
 -        FREETMPS;
 -        LEAVE;
 +        RETVAL = newSVpv("", 0);
      }
      else

Roughly speaking, calling undefined() is too expensive, and it is enough to simply return an empty string. A single call does not cost much, but with 60 thousand calls the time adds up accordingly.

• IO::Uncompress::* and IO::Compress::*, as it turned out, come from memcached: the data is compressed when its size exceeds the default threshold of one megabyte. I experimented with this setting, but the improvement was microscopic.

• Finally, Template::Iterator, with 180,660 calls. I partially rewrote it in XS and brought it down from 263 ms to 7 ms, which amounts to 5-7% of the whole request. The result is published on CPAN as Template::Iterator::XS. This is the only outcome that can benefit the community. If you have an existing TT project, you can easily plug this module in like this

 use Template;
 use Template::Iterator::XS;
 $Template::Config::ITERATOR = 'Template::Iterator::XS';

and see what comes of it.
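For a sense of what a loop iterator has to track on every iteration, here is a minimal sketch in C of the bookkeeping behind TT's loop.index, loop.count, loop.first, loop.last, loop.size and loop.max, kept in a plain struct. It is a hypothetical illustration of moving per-iteration work out of Perl, not the code of Template::Iterator::XS, and the function names are our own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Loop state kept in a plain C struct (hypothetical layout). */
struct loop_iter {
    const int *data;
    size_t size;    /* loop.size */
    size_t index;   /* loop.index, 0-based */
    bool started;
};

static void iter_init(struct loop_iter *it, const int *data, size_t size) {
    it->data = data;
    it->size = size;
    it->index = 0;
    it->started = false;
}

/* Advance and fetch the current element; returns false when exhausted. */
static bool iter_next(struct loop_iter *it, int *value) {
    if (it->started) it->index++;
    else it->started = true;
    if (it->index >= it->size) return false;
    *value = it->data[it->index];
    return true;
}

static size_t iter_count(const struct loop_iter *it) { return it->index + 1; }     /* 1-based */
static bool   iter_first(const struct loop_iter *it) { return it->index == 0; }
static bool   iter_last(const struct loop_iter *it)  { return it->index + 1 == it->size; }

static size_t iter_max(const struct loop_iter *it) {  /* size - 1 */
    assert(it->size > 0);   /* only meaningful for a non-empty list */
    return it->size - 1;
}
```

Iterating over {10, 20, 30} visits each value once, with iter_first() true only on the first step and iter_last() true only on the last, where iter_count() is 3. Every field is a machine integer updated in place, so no per-iteration Perl method call or hash update is involved.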
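The nested [% FOR x %] / [% FOR y %] loops from the first item of the list above are not shown in detail, so the following is only an assumption about a common case of that shape: the inner loop scans a whole list to pick out the entries that belong to the current outer element. This minimal sketch in C compares that O(P·N) scan with building an index once and walking only the matches, O(P + N). The struct and function names, and the assumption that ids are small non-negative integers, are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_ID 16   /* assumption: ids are small non-negative integers */

struct item { int id; int parent; };

/* Naive: for each parent, scan every item (the nested FOR shape).
   Writes child ids in visiting order into out; returns how many. O(P*N). */
static int children_nested(const int *parents, int np,
                           const struct item *items, int n, int *out) {
    int k = 0;
    for (int p = 0; p < np; p++)
        for (int i = 0; i < n; i++)
            if (items[i].parent == parents[p]) out[k++] = items[i].id;
    return k;
}

/* Indexed: build per-parent child lists once, then walk only the matches.
   O(P + N), same output order as the nested version. */
static int children_indexed(const int *parents, int np,
                            const struct item *items, int n, int *out) {
    int head[MAX_ID], tail[MAX_ID], next[64];
    assert(n <= 64);
    for (int p = 0; p < MAX_ID; p++) head[p] = tail[p] = -1;
    for (int i = 0; i < n; i++) {          /* append item i to its parent's list */
        int p = items[i].parent;
        assert(p >= 0 && p < MAX_ID);
        next[i] = -1;
        if (head[p] < 0) head[p] = i; else next[tail[p]] = i;
        tail[p] = i;
    }
    int k = 0;
    for (int p = 0; p < np; p++) {
        assert(parents[p] >= 0 && parents[p] < MAX_ID);
        for (int i = head[parents[p]]; i >= 0; i = next[i])
            out[k++] = items[i].id;
    }
    return k;
}
```

Both functions produce the same child ids in the same order. In a TT project the equivalent change would typically be preparing such a grouping (for example, a hash of lists) in Perl before rendering, so that the template only loops over what it actually prints.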
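A rough per-call view of the Template::Stash and Template::Iterator measurements above helps explain why such small costs matter. Assuming, for the sake of the estimate, that the whole saving from 356 ms to 238 ms comes from the roughly 60 thousand undefined() calls, and that the 263 ms and 7 ms figures are spread evenly over the 180,660 Template::Iterator calls:

```latex
\frac{(356 - 238)\ \text{ms}}{60\,000} \approx 1.97\ \mu\text{s per call},
\qquad
\frac{263\ \text{ms}}{180\,660} \approx 1.46\ \mu\text{s}
\;\text{vs.}\;
\frac{7\ \text{ms}}{180\,660} \approx 39\ \text{ns per call}.
```

Each individual call is cheap; it is the number of calls that makes them show up in the profile.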

The results were nothing to boast about. For me, the question of how best to move from TT to something else, and whether it is worth doing at all, remains open. Moreover, one cannot conclude unequivocally that TT is very bad, at least in terms of execution speed, but that is more a matter for general discussion.

Finally, I want to thank REG.RU for the opportunity to dig into interesting code and for persisting in getting the Template::Iterator::XS module published, which in turn led to this article; on my own initiative it would hardly have seen the light of day. Companies are usually frankly uninterested in having code written to their order published anywhere. In this respect REG.RU was a pleasant exception, for which I, on behalf of the Perl community, express my gratitude.”

Source: https://habr.com/ru/post/161351/

