
A multithreaded proxy in Perl, or how to shop on books.ru conveniently

Picture from blogs.perl.org

The user icoz and I once got talking about shopping at books.ru and how to avoid accidentally buying the same book twice. The conversation itself did not go very far, but a convenient solution came out of it: a script that shows which books you have already bought and which you have not. The script needs no parameters, because it picks up the username and password itself when you log in to the site. If you buy something new, it is enough to log out of books.ru and log back in, and the script will pick up the new purchases.



What do we need?

A machine with Perl installed and Ubuntu as the operating system (any Linux will do). Under Windows there are problems because of the fork that is used, but they can be overcome by force-installing the module via CPAN. It will not pass its tests anyway, but the part we need will work.
Step 1: Install the required libraries. If they are already installed, don't worry: they will not be installed a second time. ActivePerl users have ppm.bat.
sudo apt-get install liburi-encode-perl libwww-perl libhtml-tokeparser-simple-perl libwww-mechanize-perl libdatetime-perl libhttp-proxy-perl 


Step 2: Create the proxy server, not forgetting to install the necessary filters:
    my $proxy = HTTP::Proxy->new(
        engine                  => 'Threaded',
        port                    => 8888,
        max_keep_alive_requests => 0,
        host                    => '127.0.0.1',
        timeout                 => 120,
    );

    # rewrite every HTML response once its body has been received in full
    my $filter = HTTP::Proxy::BodyFilter::simple->new(\&alter_page);
    $proxy->push_filter(
        mime     => 'text/html',
        response => HTTP::Proxy::BodyFilter::complete->new(),
        response => $filter,
    );

    # on login to books.ru, capture the credentials and rebuild the book list
    $proxy->push_filter(
        method  => 'POST',
        path    => '/member/login.php',
        request => HTTP::Proxy::HeaderFilter::simple->new(sub {
            my $body = $_[2]->decoded_content;
            my ($booklog, $bookpsw);
            $booklog = uri_decode($1) if $body =~ /login\=(.*?)(?:\&|$)/;
            $bookpsw = uri_decode($1) if $body =~ /password\=(.*?)(?:\&|$)/;
            my @new_books = init_proxy($booklog, $bookpsw, @OWN_BOOKS);
            { lock(@OWN_BOOKS); @OWN_BOOKS = @new_books; }
        }),
    );

And run it:
 $proxy->start; 
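The login filter above pulls the credentials out of the form-encoded POST body with two regular expressions. Below is a minimal stand-alone sketch of just that step. The helper names `decode_value` and `extract_credentials` and the sample body are made up for illustration, and the percent-decoding is done with core Perl instead of `uri_decode` from URI::Encode, so the snippet runs without extra modules.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-decode one form value with core Perl only
# (the script in this article uses uri_decode from URI::Encode instead).
sub decode_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

# Hypothetical helper: extract login and password from a form-encoded body.
# Returns an empty list when either field is missing.
sub extract_credentials {
    my ($body) = @_;
    my ($login)    = $body =~ /(?:^|&)login=(.*?)(?:&|$)/;
    my ($password) = $body =~ /(?:^|&)password=(.*?)(?:&|$)/;
    return unless defined $login && defined $password;
    return (decode_value($login), decode_value($password));
}

# Sample body; the field names follow the login form used in the article.
my ($login, $password) =
    extract_credentials('login=reader%40example.com&password=s3cret&go=login');
print "login=$login password=$password\n";
```

Capturing into a fresh list assignment avoids relying on `$1` from an earlier match, which is one way to keep the two lookups independent of each other.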

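Two filters are used: a response body filter for text/html pages, which hands the complete page to alter_page, and a request filter for POST requests to /member/login.php, which reads the login and password from the form and rebuilds the list of purchased books.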

Important points:


Step 3: Write the modification filter:
    sub alter_page {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        return unless ${$dataref};
        return unless $message->headers->content_type;
        foreach my $haveid (@OWN_BOOKS) {
            my $str  = $haveid.'/?show=1"';
            my $spat = quotemeta $str;
            my $repl = $str." style=\"text-decoration: line-through;\"";
            ${$dataref} =~ s/$spat/$repl/sg;
        }
    }

It is simple: for each book we own, we build the pattern to replace and mark the book as purchased by striking it through.
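To see what the substitution does to a page, here is a small sketch that applies the same replacement to a made-up HTML fragment. The helper name `mark_owned`, the book ids and the `/books/...` URL layout are assumptions for illustration; only the `ID/?show=1"` suffix comes from the filter itself.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the substitution in alter_page:
# add a line-through style to every link that ends in ID/?show=1".
sub mark_owned {
    my ($html, @owned_ids) = @_;
    foreach my $id (@owned_ids) {
        my $str  = $id . '/?show=1"';
        my $spat = quotemeta $str;    # escape ?, / and . for use in a regex
        my $repl = $str . ' style="text-decoration: line-through;"';
        $html =~ s/$spat/$repl/g;
    }
    return $html;
}

# Made-up page fragment with one owned and one not-owned book.
my $page = '<a href="/books/111/?show=1">Owned</a> '
         . '<a href="/books/222/?show=1">Not owned</a>';
print mark_owned($page, 111), "\n";
```

Because the pattern is a plain suffix match, an owned id of 11 would also strike through book 111. Including the preceding slash in the pattern would be one way to avoid that.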

Step 4: Write the initialization: go through all the orders and collect every book you have already bought.
    foreach my $order_id (@order_list) {
        $resp = $mech->get('http://www.books.ru/order.php?order='.$order_id);
        parse_hrefs($resp->decoded_content, sub {
            push @OWN_BOOKS, $1 if ($_[0] =~ /(\d+)\/download\/\?file_type\=\w{3}/);
        });
    }
    my %seen = ();
    my @ubooks = grep { ! $seen{$_}++ } @OWN_BOOKS;

Finally, we remove any duplicates from the list.
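The collect-and-deduplicate step can be tried without touching the site. The sketch below runs the same regular expression and the same `%seen` idiom over a hand-written list of links. The helper name `owned_ids` and the sample hrefs are made up; the real link layout on books.ru may differ.

```perl
use strict;
use warnings;

# Made-up hrefs of the kind an order page might contain.
my @hrefs = (
    '/books/111/download/?file_type=pdf',
    '/books/111/download/?file_type=epub',
    '/books/222/download/?file_type=pdf',
    '/member/orders/',
);

# Hypothetical helper: pull book ids out of download links and
# keep only the first occurrence of each id, preserving order.
sub owned_ids {
    my @ids;
    foreach my $href (@_) {
        push @ids, $1 if $href =~ m{(\d+)/download/\?file_type=\w{3}};
    }
    my %seen;
    return grep { !$seen{$_}++ } @ids;
}

print join(',', owned_ids(@hrefs)), "\n";    # prints 111,222
```

A book offered in several file formats produces several download links with the same id, which is why the duplicates have to be filtered out.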

Step 5: It would seem that everything should work now, but it does not, because you also need to write:
 my @OWN_BOOKS :shared; 

Otherwise each thread gets its own copy of the global variable @OWN_BOOKS.
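The difference is easy to demonstrate in isolation. The sketch below assumes a Perl built with thread support (the default on Ubuntu) and uses two illustrative arrays, `@plain_books` and `@shared_books`, that are not part of the script.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @plain_books;             # every thread works on its own private copy
my @shared_books :shared;    # one array visible to all threads

my $worker = threads->create(sub {
    push @plain_books, 111;  # changes only this thread's copy
    {
        lock(@shared_books); # hold the lock while modifying shared data
        push @shared_books, 111;
    }
});
$worker->join;

print "plain: ",  scalar(@plain_books),  "\n";    # prints plain: 0
print "shared: ", scalar(@shared_books), "\n";    # prints shared: 1
```

The write to the plain array is lost when the worker thread ends, while the shared array keeps the new element. The same thing happens when the login filter updates the book list in one proxy thread and the page filter reads it in another.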
Step 6: Install FoxyProxy or any other extension that lets you set a proxy per site, and enjoy shopping for books conveniently.

As usual, I am attaching the full script.
    #!/usr/bin/perl
    use WWW::Mechanize;
    use HTTP::Request::Common;
    use LWP;
    use LWP::UserAgent;
    use HTML::TokeParser;
    use DateTime;
    use Encode qw(decode encode);
    use HTTP::Proxy;
    use HTTP::Proxy::BodyFilter::simple;
    use HTTP::Proxy::Engine::Threaded;
    use HTTP::Proxy::BodyFilter::complete;
    use HTTP::Proxy::HeaderFilter::simple;
    use URI::Encode qw(uri_encode uri_decode);
    use threads;
    use threads::shared;
    use warnings;

    # initialisation
    binmode STDOUT, ":utf8";
    my @OWN_BOOKS;
    share(@OWN_BOOKS);
    @OWN_BOOKS = ();

    my $proxy = HTTP::Proxy->new(
        engine                  => 'Threaded',
        port                    => 8888,
        max_keep_alive_requests => 0,
        host                    => '127.0.0.1',
        timeout                 => 120,
    );
    $proxy->engine()->max_clients(100);

    # rewrite every HTML response once its body has been received in full
    my $filter = HTTP::Proxy::BodyFilter::simple->new(\&alter_page);
    $proxy->push_filter(
        mime     => 'text/html',
        response => HTTP::Proxy::BodyFilter::complete->new(),
        response => $filter,
    );

    # on login to books.ru, capture the credentials and rebuild the book list
    $proxy->push_filter(
        method  => 'POST',
        path    => '/member/login.php',
        request => HTTP::Proxy::HeaderFilter::simple->new(sub {
            my $body = $_[2]->decoded_content;
            my ($booklog, $bookpsw);
            $booklog = uri_decode($1) if $body =~ /login\=(.*?)(?:\&|$)/;
            $bookpsw = uri_decode($1) if $body =~ /password\=(.*?)(?:\&|$)/;
            my @new_books = init_proxy($booklog, $bookpsw, @OWN_BOOKS);
            { lock(@OWN_BOOKS); @OWN_BOOKS = @new_books; }
            print "You already have ".scalar(@OWN_BOOKS)." books.\n";
        }),
    );

    # this is a MainLoop-like method
    $proxy->start;

    # log in to books.ru and return the ids of all purchased books
    sub init_proxy {
        my $mail      = shift;
        my $password  = shift;
        my @OWN_BOOKS = @_;

        my $mech = WWW::Mechanize->new();
        $mech->agent_alias("Linux Mozilla");
        my $resp = $mech->get('http://www.books.ru/member/login.php');
        $mech->cookie_jar->set_cookie(0, 'cookie_first_timestamp', DateTime->now->epoch, '/', 'www.books.ru');
        $mech->cookie_jar->set_cookie(0, 'cookie_pages', '1', '/', 'www.books.ru');
        $resp = $mech->post('http://www.books.ru/member/login.php', [
            'login'    => $mail,
            'password' => $password,
            'go'       => 'login',
            'x'        => rand_from_to(20, 55),
            'y'        => rand_from_to(10, 19),
            'token'    => '',
        ]);

        $resp = $mech->get('http://www.books.ru/member/orders/');
        my @order_list = $resp->decoded_content =~ /\<a\shref=\"http:\/\/www\.books\.ru\/order.php\?order\=(\d+)\"\>/gi;
        foreach my $order_id (@order_list) {
            $resp = $mech->get('http://www.books.ru/order.php?order='.$order_id);
            parse_hrefs($resp->decoded_content, sub {
                push @OWN_BOOKS, $1 if ($_[0] =~ /(\d+)\/download\/\?file_type\=\w{3}/);
            });
        }

        # drop duplicate ids
        my %seen = ();
        my @ubooks = grep { ! $seen{$_}++ } @OWN_BOOKS;
        return @ubooks;
    }

    # call $functor with the href of every <a> tag in $data
    sub parse_hrefs {
        my ($data, $functor) = @_;
        my $stream = HTML::TokeParser->new(\$data);
        $stream->empty_element_tags(1);
        while (my $token = $stream->get_token) {
            if ($token->[0] eq 'S' && $token->[1] eq 'a') {
                my $href = $token->[2]{'href'};
                $functor->($href);
            }
        }
    }

    # strike through the links to books that are already owned
    sub alter_page {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        return unless ${$dataref};
        return unless $message->headers->content_type;
        #print scalar @OWN_BOOKS."!!!!!\n";
        foreach my $haveid (@OWN_BOOKS) {
            my $str  = $haveid.'/?show=1"';
            my $spat = quotemeta $str;
            my $repl = $str." style=\"text-decoration: line-through;\"";
            ${$dataref} =~ s/$spat/$repl/sg;
        }
    }

    sub rand_from_to {
        my ($from, $to) = @_;
        return int(rand($to - $from)) + $from;
    }



PS: If there is demand for it, I can host the modified version on my server, although I myself would not use a proxy outside my control for the sake of any convenience.

Source: https://habr.com/ru/post/237259/

