Multithreading in Perl, or How I Watched a Video About the Filming of Warehouse 13

It all started when I came across a video about the filming of one of my favorite TV shows, Warehouse 13:
www.aoltv.com/2009/07/10/behind-the-scenes-of-warehouse-13

Cool steampunk-style gadgets, a story of mutual help and friendship, and a light magical atmosphere that takes you away from gray everyday life: that is what I like about this series. And in general, watching how a film is made is often no less interesting to me than the film itself.

But, as is customary on the AOL TV website, the video was available only to US residents. Why they impose such restrictions I cannot understand; what exactly they are being so stingy about is unclear.
But such a trifle could not stop me.

Obviously, I needed to find a proxy server that would make me look like a real American to the aoltv website. For that, I first had to find a list of proxies, then pick out the working ones, and then the American ones among those.
By the way, I eventually found a list of suitable proxies here:
www.proxz.com/proxy_list_anonymous_us_0.html

But before that, I went through several sites with lists that had to be checked.
Since I do not do this often, I had no suitable program at hand, and for a real programmer it is easier to write his own program than to look for someone else's.

Clearly, checking a proxy puts almost no load on the computer or the network, but it can take quite a long time, since a proxy may be dead or too far away. It was therefore logical to use multithreading. At the same time, this let me try out multithreading in Perl under Windows, with ActiveState Perl 5.10.1.

The script works as follows:
- get the list of proxies
- start several threads that work through the proxies in the list
- each thread checks its proxy for anonymity

Threads in Perl are enabled by loading the threads module:

use threads;

Details: perldoc.perl.org/threads.html

If loading it fails with an error like this:

This Perl hasn't been configured and built properly for the threads module to work. (The 'useithreads' configuration option hasn't been used.)

then it is time to rebuild perl with thread support.
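
Before rebuilding anything, it may be worth checking how the installed interpreter was configured. Below is a minimal sketch that uses the core Config module; the helper name has_ithreads is made up for this illustration and is not part of the original script.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: returns 1 if this perl was built with ithreads, else 0.
sub has_ithreads {
    return $Config{useithreads} ? 1 : 0;
}

print has_ithreads()
    ? "This perl supports threads\n"
    : "This perl was built without threads\n";
```

The same information is available from the command line with perl -V:useithreads.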

After that, we can create a new thread with the command

$thread = threads->create(\&job_todo, $param);

job_todo is the name of the function that does the thread's work.
After the function reference you can pass further parameters, for example the thread's sequence number. Admittedly, a thread object has a tid() method that returns its ID, but we keep this parameter for the sake of the example.
The call returns the thread object $thread.

However, if you create all the threads this way and then do nothing else in the program, the main script will exit and all the threads will die with it, without finishing their work.

To make the main program wait for a thread, call

$thread->join();
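
Creating and joining threads can be sketched together as follows. This is only an illustration: the body of job_todo and the number of threads are assumptions, and the example also relies on join() handing back the value the thread function returned.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: receives its sequence number and returns a string.
sub job_todo {
    my ($param) = @_;
    my $tid = threads->tid();    # ID that perl assigned to this thread
    return "param=$param tid=$tid";
}

# Start three threads, passing each one its own number.
my @workers = map { threads->create(\&job_todo, $_) } 1 .. 3;

# join() blocks until the thread has finished and returns its result.
my @results = map { $_->join() } @workers;

print "$_\n" for @results;
```

Without the join() calls, the main program could reach its end while the workers are still running.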

Another question with threads is how they can exchange information with each other, for example to find out which proxies from the list have already been processed and which have not. By default, each thread receives its own independent copy of all variables declared before the thread was created, and the copies remain cut off from one another, like Cinderella from the prince.

There is a module for this:

use threads::shared;

It lets you declare and use shared variables. If a variable is declared like this:

my $useme :shared = 0;

then $useme becomes common to all threads: when one thread changes its value, the other threads see the new value.
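
The difference between an ordinary and a shared variable can be seen in a small experiment. This is a sketch only; the variable $private and the value 42 are invented for the illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;          # every thread gets its own copy
my $useme :shared = 0;    # a single value visible to all threads

threads->create(sub {
    $private = 42;        # changes only this thread's copy
    $useme   = 42;        # the parent thread sees this change
})->join();

print "private=$private shared=$useme\n";    # prints "private=0 shared=42"
```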

So we can keep an array of proxies together with a shared index that points to the position of the next unprocessed proxy.

Each thread takes the next proxy from the list and moves the index one step along the array.
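
The shared index can be sketched as below. Note one addition that the description above does not mention: the increment is wrapped in lock(), because reading and then incrementing a shared variable is not a single atomic step, and two threads could otherwise receive the same index. The names @items, $next, @taken and worker are assumptions made for this illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @items = map { "item$_" } 1 .. 20;    # stand-in for the proxy list
my $next :shared = 0;                    # index of the next unprocessed item
my @taken :shared;                       # records which thread took which item

sub worker {
    my ($num) = @_;
    while (1) {
        my $seq;
        {
            lock($next);                 # only one thread advances the index at a time
            $seq = $next++;
        }
        return if $seq >= @items;        # list exhausted
        lock(@taken);                    # held until the end of this iteration
        push @taken, "$num:$items[$seq]";
    }
}

$_->join() for map { threads->create(\&worker, $_) } 1 .. 4;
printf "%d of %d items processed\n", scalar(@taken), scalar(@items);
```

A lock taken with lock() is released automatically when the enclosing block ends, which is why the increment sits in its own small block.
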
Checking whether a proxy works is simple. As you know, proxies come in three kinds: anonymous, open, and dead.
Anonymous proxies are further divided into strongly anonymous ones (which never reveal who is behind them) and weakly anonymous ones (which still pass the IP address of the secretive user in one of the variables).
We only need to know whether the proxy works at all and which IP address we appear to have when using it. Any site that reports the visitor's external IP address will do for this.

I chose www.myip.ru for the experiment. However, if you check proxies regularly, it is better to write your own simple script and put it on a server of yours: it is faster, and you do not load someone else's server.
The script can be as simple as this:

#!/usr/bin/perl

print "Content-type: text/html\n\n$ENV{REMOTE_ADDR}";

So a thread takes a proxy, requests the site through it, and looks at which IP address the site returns.
If the request fails, the proxy does not work; if the site shows our own IP address, the proxy is open. But if the site shows a different IP address, we have reached our goal: the proxy is both working and anonymous.
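
The three outcomes can be expressed as a small pure function, separate from any network code. This is a sketch under assumptions: the name classify_proxy and the convention of passing undef for a failed request are invented here, and the IP addresses are documentation examples.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a proxy from the checker's response body.
# $body is undef when the request through the proxy failed.
sub classify_proxy {
    my ($my_ip, $body) = @_;
    return 'dead'      unless defined $body;
    # \Q...\E keeps the dots in the address from matching any character;
    # the lookarounds stop 1.2.3.4 from matching inside 1.2.3.45.
    return 'open'      if $body =~ /(?<!\d)\Q$my_ip\E(?!\d)/;
    return 'anonymous' if $body =~ /\d+\.\d+\.\d+\.\d+/;
    return 'dead';
}

print classify_proxy('203.0.113.7', 'Your IP: 203.0.113.7'),  "\n";   # open
print classify_proxy('203.0.113.7', 'Your IP: 198.51.100.9'), "\n";   # anonymous
print classify_proxy('203.0.113.7', undef),                   "\n";   # dead
```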

The script reads the proxy list from a file whose format does not matter, as long as it contains proxy IP addresses with ports in the form 12.34.56.78:8080.
Of course, you can go a step further and automatically parse any of the sites that publish proxy lists.
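
Because only the ip:port pattern matters, the same extraction works on plain text and on HTML alike. A minimal sketch follows; the helper name extract_proxies and the sample input are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull every ip:port pair out of arbitrary text.
sub extract_proxies {
    my ($text) = @_;
    # In list context a /g match returns all captured substrings.
    return $text =~ /(\d+\.\d+\.\d+\.\d+:\d+)/g;
}

my $sample = "fast 12.34.56.78:8080 (US)\n<td>98.76.54.32:3128</td>\n";
my @found  = extract_proxies($sample);
print "$_\n" for @found;
```

The pattern is deliberately loose: it does not verify that each octet is at most 255, which is usually good enough for scraping a list that will be tested anyway.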

I also did not need to check which country a proxy belongs to, since the very first one turned out to be in the US. Adding that check is left to you as an easy homework exercise (use Geo::IP;).

The text of the script, with comments, is given below. Use it as you wish. If you make a profit from it, I will send you the number of my donation wallet.
The main thing is not to harm anyone, and may perl be with you.

PS: I watched the video. It is short but interesting.

#!/usr/bin/perl -w

# Load the modules
use strict;
use LWP::UserAgent;
use threads;
use threads::shared;

# Set up the variables
my @threads;
my @proxy;
my $threads = 100;
my $last_p :shared = 0;
my $ip_checker = 'http://www.myip.ru/get_ip.php?loc=';

# Log of working anonymous proxies
open LOG, '>>', 'proxy.log' or die "Cannot open proxy.log: $!";

# Find out our current external IP
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0");
my $res = $ua->get($ip_checker);
die "Cannot reach $ip_checker\n" unless $res->is_success;
my $s = $res->decoded_content;
# Stop if no IP is found, otherwise every proxy would later look "open"
$s =~ m/(\d+\.\d+\.\d+\.\d+)/s or die "No IP address in the checker response\n";
my $myip = $1;

# Read the proxy list
open FIL, '<', 'proxy.lst' or die "Cannot open proxy.lst: $!";
while ($s = <FIL>)
{
    # Cut every ip:port pair out of the line
    while ($s =~ s/(\d+\.\d+\.\d+\.\d+:\d+)//)
    {
        push(@proxy, $1);
    }
}
close FIL;
print "Proxies found: " . @proxy . "\n";

# Create the required number of threads
for my $t (1 .. $threads) {
    push @threads, threads->create(\&check_proxy, $t);
}
# Wait for all threads to finish
foreach my $t (@threads) {
    $t->join();
}

sub check_proxy
# Check proxies
{
    # Number of the current thread
    my $num = shift;
    print "+ Thread $num started.\n";

    # Endless loop
    while (1)
    {
        # Take the next index in the list; the lock keeps two threads
        # from getting the same index
        my $seq;
        {
            lock($last_p);
            $seq = $last_p++;
        }
        # If the list is exhausted, finish
        if ($seq >= @proxy)
        {
            print "- Thread $num done.\n";
            return;
        }
        # Get the next proxy from the list
        my $proxy = $proxy[$seq];

        # Start downloading
        my $ua = LWP::UserAgent->new;
        $ua->agent("Mozilla/5.0");
        $ua->proxy(['http'], "http://$proxy/");
        my $res = $ua->get($ip_checker);

        # Report
        printf("Thread %02d; Seq. %03d; Proxy %20s; Status: ", $num, $seq, $proxy);
        # Does not work
        if (!$res->is_success) {
            print "Unable to connect.\n";
            next;
        }

        # Page content
        my $s = $res->decoded_content;

        # Our own IP is on the page, so the proxy is open
        # (\Q...\E keeps the dots in the address literal)
        if ($s =~ m/\Q$myip\E/s)
        {
            print "Open.\n";
        }
        # The proxy does not necessarily leak your IP.
        # If there is some IP on the page, but not ours, the proxy is anonymous
        elsif ($s =~ m/(\d+\.\d+\.\d+\.\d+)/s)
        {
            print "Anonymous!\n";
            print LOG "$proxy\n";
        }
        # Something went wrong
        else
        {
            print "Not working...\n";
        }
    }
}

Source: https://habr.com/ru/post/128477/
