
Embedded Perl shall be!

Once, in the cold Siberian winter, management decided to warm our team with some tough love and handed us yet another project. A first look turned up several serious problems.

In short, like real Russian guys, we decided to tear everything down and build it anew, properly this time. After some head-scratching we decided that apache + mod_perl is for old-timers and not interesting, and FastCGI is the way of the weak in spirit, so we would do everything on nginx with its built-in embedded Perl.

Later I felt as if I were standing in a field densely planted with rakes...


Jedi way

The problems began literally with the first line of code. To begin with, many of our favorite Perl modules turned out to be tailored to Apache and flatly refused to work under nginx. On top of that, the forums were full of headlines like "nginx + perl + mason is impossible", which was alarming. Needlessly, as it turned out! Mason started up with half a kick and worked fine, but more on that later.

We didn't want to adapt the modules to nginx (see the remark about Russian guys above), so we decided to write our own. Having thrown out everything unnecessary, we realized that the only thing missing was Apache::Session, which, as the name suggests, works with sessions only under enemy servers. A session store with the same interface as Apache::Session was written in an evening, and happiness seemed so close.
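The article does not show that session store, so the following is only a minimal sketch of what "the same interface as Apache::Session" could mean: a tied hash that takes a session id (or undef for a new session) and exposes the id under the `_session_id` key. The package name `Sketch::Session`, the `Directory` option and the file-per-session storage via Storable are all assumptions made for illustration, and the id generation here is not cryptographically strong.

```perl
{
    package Sketch::Session;    # hypothetical name, not the author's module
    use strict;
    use warnings;
    use Storable qw(nstore retrieve);
    use File::Spec;
    use Digest::MD5 qw(md5_hex);

    # tie %session, 'Sketch::Session', $id, { Directory => $dir };
    # An undefined $id starts a new session; a known $id loads it from disk.
    sub TIEHASH {
        my ($class, $id, $args) = @_;
        my $dir = $args->{Directory};
        my $data;
        if (defined $id) {
            die "bad session id\n" unless $id =~ /\A[0-9a-f]{32}\z/;
            my $file = File::Spec->catfile($dir, $id);
            die "no such session: $id\n" unless -f $file;
            $data = retrieve($file);
        }
        else {
            # Illustration only: a real store needs unpredictable ids.
            $id   = md5_hex(time() . $$ . rand());
            $data = { _session_id => $id };
        }
        return bless { dir => $dir, id => $id, data => $data }, $class;
    }

    sub FETCH    { $_[0]{data}{ $_[1] } }
    sub STORE    { $_[0]{data}{ $_[1] } = $_[2] }
    sub EXISTS   { exists $_[0]{data}{ $_[1] } }
    sub DELETE   { delete $_[0]{data}{ $_[1] } }
    sub CLEAR    { my $self = shift; %{ $self->{data} } = (_session_id => $self->{id}) }
    sub FIRSTKEY { my $reset = keys %{ $_[0]{data} }; each %{ $_[0]{data} } }
    sub NEXTKEY  { each %{ $_[0]{data} } }

    # Save the session when the hash is untied or goes out of scope.
    sub DESTROY {
        my $self = shift;
        nstore($self->{data}, File::Spec->catfile($self->{dir}, $self->{id}));
    }
}

# Usage (the directory must exist and be writable by the worker):
#   tie my %session, 'Sketch::Session', undef, { Directory => '/path/to/sessions' };
#   my $id = $session{_session_id};
#   $session{user} = 'habr';
#   untie %session;    # writes the session to disk
```

Keeping the tied-hash shape means code written against Apache::Session would need only the class name changed, which is presumably why the author preserved the interface.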

The next morning I decided to get started on what this post is actually about: setting up the Perl embedded in nginx. And then the fun began.

The nginx manual says it's pretty simple. You need to add this to the config:
         http {
                 perl_modules habrahabr/lib;
                 perl_require handler.pl;
                 server {
                         location / {
                                 perl habrahabr::webhandler;
                         }
                 }
         }


And write the handler itself:

package habrahabr;
use nginx;

sub webhandler {
    my $r = shift;
    $r->send_http_header("text/html");
    return OK if $r->header_only;    # HEAD request: headers only, no body
    $r->print("hello habrahabr!\n<br/>");
    return OK;
}
1;


After a little fiddling with the paths in the config, this example ran. Next we hook up Mason. Here we stepped on the first rake: Mason is usually run through HTML::Mason::ApacheHandler, which makes the programmer's life easy. We didn't have that, so we had to dig into Mason's guts and use HTML::Mason::Interp directly. But by default it writes all output to stdout, and we need it to go through $r->print(). After reading the docs, we found the out_method parameter in the HTML::Mason::Interp constructor. The rake looked like this: Mason is a fairly heavy thing, and creating it on every page request is not comme il faut, so we create one Mason object during initialization (there is a rake here too, but more on that later) and reuse it for all requests. But then $r has to be global as well!
It turned out something like this:

our $r;
our $mason;

$mason = HTML::Mason::Interp->new(
...
    out_method => sub { $r->print(@_) },    # send Mason output through nginx
...
);

sub webhandler {
    $r = shift;
...
    $mason->exec($r->uri);
}


Right after that it turned out that nginx does not unpack URL parameters into variables for you; it only hands over the URL itself, and you do with it whatever you like. Well, I thought, no great loss. The URLs were parsed, everything was neatly put into variables, and we moved on.
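The article does not show the parsing code, so here is a minimal sketch of the idea, assuming the query string is available as a plain string (nginx's Perl API offers `$r->args` for the request arguments). The function name `parse_query` is hypothetical. It handles only the simple `name=value&name=value` form, and for repeated names the last value wins.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "a=1&b=two+words" into a hash reference.
sub parse_query {
    my ($query) = @_;
    my %param;
    return \%param unless defined $query && length $query;

    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;       # "flag" without "=value"
        for ($name, $value) {
            tr/+/ /;                                   # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;       # %XX percent-decoding
        }
        $param{$name} = $value;
    }
    return \%param;
}

my $params = parse_query('id=42&title=hello+habr%21');
print "$_ = $params->{$_}\n" for sort keys %$params;
```

Returning a hash reference keeps the result easy to hand to a template as a single global.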

Soon, of course, we needed POST requests in addition to GET, and nginx handles those in a rather clever way. According to the manual, when a POST request arrives you can process it by installing a handler via $r->has_request_body(handler_ref).
The first handler was:
sub HandlePOST {
    my $body = $r->request_body;
...
    DoRequest();
    return OK;    # like webhandler, the body handler returns OK
}


But after a bit of reading for self-education, it was quickly replaced by:
sub HandlePOST {
    my $body = $r->request_body;
    unless ($body) {
        # the body may have been written to a temporary file instead
        my $file = $r->request_body_file;
        if ($file && -f $file) {
            my ($fh, $block);
            open $fh, '<', $file;
            binmode $fh;    # read raw bytes
            while (read $fh, $block, 4096) { $body .= $block }
            close $fh;
        }
    }
...
    DoRequest();
}


It worked with a bang... on an empty page with no parameters passed. With anything more than that, the log immediately showed a lovely "worker process exited on signal 11". Reading the man pages did not help. Debugging showed that we crashed in a different place every time: sometimes when calling $r->request_body, sometimes on $r->request_body_file, sometimes inside DoRequest, and sometimes after DoRequest. Two days later (!!!), with tears in my eyes, I opened the manual for the thousandth time and my eye caught one single line. My heart began to pound, and my lips silently repeated every unprintable spell I know.

Everything turned out to be very simple, as always: once webhandler has done $r = shift;, that object must never be used anywhere else! HandlePOST is passed its own request object, which may not be the same one as our global. The result is a classic segfault.

After adding the line $r = shift; at the very beginning of HandlePOST, everything worked without problems.
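One way to make this mistake harder to repeat is to pass the request object explicitly to every helper instead of reaching for a global. The sketch below is not the author's code: `read_request_body` is a hypothetical helper built on the `request_body` and `request_body_file` methods used above, and `MockRequest` is a stand-in that exists only so the example runs outside nginx.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the request body, whether it was kept in
# memory or written to a temporary file. $r is always passed in explicitly.
sub read_request_body {
    my ($r) = @_;
    my $body = $r->request_body;
    return $body if defined $body && length $body;

    my $file = $r->request_body_file;
    return '' unless defined $file && -f $file;

    open my $fh, '<', $file or die "cannot open $file: $!";
    binmode $fh;
    local $/;                 # slurp mode: read the whole file at once
    my $content = <$fh>;
    close $fh;
    return defined $content ? $content : '';
}

# Stand-in for the nginx request object, only for trying the helper out.
{
    package MockRequest;
    sub new               { my ($class, %f) = @_; return bless {%f}, $class }
    sub request_body      { $_[0]{body} }
    sub request_body_file { $_[0]{file} }
}

my $in_memory = MockRequest->new(body => 'a=1&b=2');
print read_request_body($in_memory), "\n";
```

Inside a body handler the call would then be `read_request_body($r)` with the `$r` that the handler itself received via `shift`.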

Almost at the very end I had to tinker a bit with file uploads. Since we weren't using CGI, it took a fair amount of googling to work out, right in the handler, that a file had arrived and to do something with it. For work, RFC822, RFC2822 and RFC2046-2049 had long since been read and memorized, so it was immediately clear to me that files arrive as an ordinary MIME part. I didn't risk parsing it by hand, so I tried MIME::Parser. And it parsed it, the rascal!
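The trick that makes a generic MIME parser usable here is that the request body alone is not a complete MIME message: the boundary lives in the HTTP Content-Type header, so a matching header has to be put back in front of the body. A minimal sketch of that step, with the hypothetical name `wrap_multipart_body` and a simplified boundary pattern (no whitespace inside the boundary):

```perl
use strict;
use warnings;

# Hypothetical helper: given the HTTP Content-Type value and the raw body,
# returns a string that starts with a MIME Content-Type header, or nothing
# if the request is not multipart/form-data.
sub wrap_multipart_body {
    my ($content_type, $body) = @_;
    return unless defined $content_type
        && $content_type =~ m{\Amultipart/form-data\b}i;

    # Simplification: assumes the boundary contains no whitespace.
    my ($boundary) = $content_type =~ /\bboundary="?([^\s";]+)"?/i
        or return;

    return qq{Content-Type: multipart/form-data; boundary="$boundary"\n\n} . $body;
}

my $message = wrap_multipart_body(
    'multipart/form-data; boundary=----abc123',
    "------abc123\r\n...",
);
print "looks like MIME\n" if defined $message;
```

The resulting string is what can then be handed to a MIME parser such as MIME::Parser's `parse_data`.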

One more feature I promised to write about but couldn't find a suitable place for:
When nginx starts, it launches Perl and loads our file with the handler. The file is honestly executed by Perl (that is, all use and require statements run, top-level code is executed, and so on). That seems fine, except that all this happiness runs as root =). The main problem here arose with Mason: it keeps its own cache so that it doesn't recompile the templates every time. The path to the cache is given in the constructor, but it is only a base path, and Mason creates a handful of service directories and files inside it. Later, when a request comes from a client, webhandler runs with the rights the administrator gave it (that is, nginx:nginx) and cannot write the compiled templates into the cache, because the directories were created by root.

There were two ways out: either change the ownership of the directories to the regular user when creating the Mason object, or create the Mason object inside webhandler (which I initially wanted to avoid).
Of the two evils I chose the second; how it was implemented becomes clear just below.
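For comparison, the first option (the one the author did not choose) could look roughly like the sketch below: after Mason has created its cache directories as root, hand the whole tree over to the worker's user. The function name `chown_tree` is hypothetical, and the user name `nginx` in the usage comment is an assumption taken from the nginx:nginx remark above.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: chown a directory and everything under it.
# Returns the list of paths that could not be changed.
sub chown_tree {
    my ($dir, $uid, $gid) = @_;
    my @failed;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                chown($uid, $gid, $File::Find::name)
                    or push @failed, $File::Find::name;
            },
        },
        $dir,
    );
    return @failed;
}

# Usage at load time, while still running as root (assumed user name):
#   my ($uid, $gid) = (getpwnam('nginx'))[2, 3];
#   my @failed = chown_tree('/opt/habrahabr/var/mason_cache', $uid, $gid);
#   warn "could not chown: @failed\n" if @failed;
```

The drawback is that any directory Mason creates later as root would need the same treatment, which is one reason lazy creation inside the handler is simpler.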

A few weeks later we launched and everything was fine; the first million successful nginx responses (without a single failure) felt like a complete victory to me. Good has won!

In closing, I would like to say that reading the manuals is useful (especially when they are in your native language), but you don't always spot what you need in them right away. And here is the skeleton of an nginx handler that uses Mason and avoids all the rakes I stepped on:

package Habrahabr;
use strict;
use warnings;
use nginx;
use HTML::Mason;
use MIME::Parser;

# package globals shared between the handlers below
our $request;
our $mason;
our $init = 0;

sub WebHandler {
    $request = shift;
    # create Mason on the first request, not at load time, so that its
    # cache directories are created by the worker's user and not by root
    if (!$init || !$mason) {
        $mason = HTML::Mason::Interp->new(
            comp_root     => "/opt/habrahabr/html",
            data_dir      => "/opt/habrahabr/var/mason_cache/",
            use_strict    => 1,
            out_method    => sub { $request->print(@_) },
            error_format  => 'html',
            error_mode    => 'output',
            allow_globals => [ qw($request %session $uri $post_param) ],
        );
        $init = 1;
    }
    # reset the per-request globals
    $mason->set_global(uri        => '');
    $mason->set_global(post_param => '');
    if ($request->request_method eq 'POST') {
        # the body is processed in ProcessPOST; do not call DoRequest() here
        $request->has_request_body(\&Habrahabr::ProcessPOST);
    }
    elsif ($request->request_method eq 'GET') {
        DoRequest();
    }
    else {
        return DECLINED;
    }
    return HTTP_OK;
}

sub ProcessPOST {
    $request = shift;    # take the request object passed to this handler
    # read the POST body, from memory or from the temporary file
    my $body = $request->request_body;
    unless ($body) {
        my $file = $request->request_body_file;
        if ($file && -f $file) {
            my ($fh, $block);
            open $fh, '<', $file;
            binmode $fh;
            while (read $fh, $block, 4096) { $body .= $block }
            close $fh;
        }
    }
    # file uploads arrive as multipart/form-data
    my $ContentType = $request->header_in('Content-type');
    if ($ContentType =~ m{multipart/form-data}) {
        my ($Boundary) = ($ContentType =~ /boundary="?([^\s";]+)"?/);
        # prepend a Content-Type header so the body parses as a MIME message
        $body = "Content-Type: multipart/form-data;\n boundary=\"$Boundary\"\n\n" . $body;
        my $Parser = MIME::Parser->new;
        $Parser->output_to_core(1);
        $Parser->decode_bodies(0);
        # parse_data throws on failure, so keep it inside eval
        eval {
            $body = $Parser->parse_data($body);
        };
    }

    $mason->set_global(post_param => \$body);
    DoRequest();
}

sub DoRequest {
    my $uri = $request->uri;
    # emulate DirectoryIndex ourselves instead of relying on nginx
    if (!$mason->comp_exists($uri)) {
        if ($mason->comp_exists($uri . "/index.html")) {
            $uri = $uri . "/index.html";
        }
        else {
            $uri = "/404.html";    # 404 page; the status stays 200
        }
    }
    $mason->set_global(request => $request);
    $mason->set_global(uri     => $uri);
    $mason->exec($uri);
}


I hope my work was not in vain and this post helps at least someone. Good luck =)

_________
Text prepared in VIM. The code is colored by GNU Source Highlight.

UPD: I have nothing against PHP; I used to write a lot of it myself. I just find Perl more interesting. And that first point is there not because we consider PHP a lesser language, but because we write in Perl, and once it was decided to redo everything, doing it in Perl was the obvious choice for us.

Source: https://habr.com/ru/post/45590/

