Some time ago a note appeared on Habr about a site-scraping client written in Python. The author of that example worked through the problems of multi-threaded network applications.
But it seemed to me that the same task (or rather its main part: parallel connections to an HTTP server) can be solved effectively without threads.
After glancing, for a start, at my own article about Twisted and Tornado, scratching my head, and digging through the documentation on non-blocking sockets, I sketched an asynchronous HTTP client.
Below is the source of the key part of the application, with explanations:
import socket
import select
import sys
import errno
import time

from config import *


def ioloop(ip_source, request_source):
    """Asynchronous loop.

    ip_source - an infinite iterable yielding IP addresses to connect to;
    request_source - an iterable that generates request bodies.
    """
    starttime = time.time()
    # Open the socket pool and the dictionaries that map each
    # connection to its request and response body.
    epoll = select.epoll()
    connections = {}
    responses = {}
    requests = {}
    bytessent = 0.0
    bytesread = 0.0
    timeout = 0.3
    # Pick the first request.
    request = request_source.next()
    try:
        while True:
            # Check the number of connections: if it is below the
            # allowed maximum and there are requests left, add one more.
            connection_num = len(connections)
            if connection_num < CLIENT_NUM and request:
                ip = ip_source.next()
                print "Opening a connection to %s." % ip
                clientsocket = socket.socket(socket.AF_INET,
                                             socket.SOCK_STREAM)
                # A subtle point: a non-blocking socket raises the
                # EINPROGRESS error if it cannot connect immediately.
                # Ignore that error and start waiting for events on
                # the socket instead.
                clientsocket.setblocking(0)
                try:
                    clientsocket.connect((ip, 80))
                except socket.error, err:
                    if err.errno != errno.EINPROGRESS:
                        raise
                # Register the socket in the pool and in the
                # connection dictionaries.
                epoll.register(clientsocket.fileno(), select.EPOLLOUT)
                connections[clientsocket.fileno()] = clientsocket
                requests[clientsocket.fileno()] = request
                responses[clientsocket.fileno()] = ""

            # "Polling" - that is, collecting the events.
            events = epoll.poll(timeout)
            for fileno, event in events:
                if event & select.EPOLLOUT:
                    # Send a part of the request...
                    try:
                        byteswritten = connections[fileno].send(requests[fileno])
                        requests[fileno] = requests[fileno][byteswritten:]
                        print byteswritten, "bytes sent."
                        bytessent += byteswritten
                        if len(requests[fileno]) == 0:
                            epoll.modify(fileno, select.EPOLLIN)
                            print "Switched to reading."
                    except socket.error, err:
                        print "Socket write error:", err
                    except Exception, err:
                        print "Unknown socket error:", err
                elif event & select.EPOLLIN:
                    # Read a part of the response...
                    try:
                        bytes = connections[fileno].recv(1024)
                    except socket.error, err:
                        # Catch the "connection reset by peer" error -
                        # it happens with a large number of connections.
                        if err.errno == errno.ECONNRESET:
                            epoll.unregister(fileno)
                            connections[fileno].close()
                            del connections[fileno]
                            print "Connection reset by peer."
                            continue
                        else:
                            raise err
                    print len(bytes), "bytes read."
                    bytesread += len(bytes)
                    responses[fileno] += bytes
                    if not bytes:
                        epoll.unregister(fileno)
                        connections[fileno].close()
                        del connections[fileno]
                        print "Done reading... Closed."
                        # Pick the next request (None once the
                        # source is exhausted).
                        if request:
                            try:
                                request = request_source.next()
                            except StopIteration:
                                request = None

            print "Connections left:", len(connections)
            if not len(connections):
                break
    except KeyboardInterrupt:
        print "Looping interrupted by a signal."

    for fd, sock in connections.items():
        sock.close()
    epoll.close()
    endtime = time.time()
    timespent = endtime - starttime
    return responses, timespent, bytesread, bytessent
The moral here is simple: there is no need to shove threads in everywhere. Moreover, there are situations where multithreading only reduces the reliability of a program, creates well-known problems in testing, and becomes a source of elusive bugs. If performance is not critical but you really want to parallelize something, even ordinary processes with primitive IPC often justify themselves.
Besides, the damned GIL still lives on in Python to this day, so threads cannot execute Python code truly in parallel. Accordingly, no performance advantage can be gained from them on multi-core processors.
This script is, of course, crude and hastily written: it does not handle failed connections to the server or errors in socket read/write operations, and it does not parse server responses. Still, it repeatedly pulls down the root page of cnn.com at the limit of my channel's capacity - 800-1000 KB/s :)
The entire script source can be found here.
P.S. Maybe someone has ideas about what productive asynchronous clients could be used for? :)