
Asynchronous HTTP client, or why multithreading is superfluous

Some time ago, a note slipped by on Habr about a small site-scraping client written in Python. Its author used that example to work through the problems of multi-threaded network applications.

But it seemed to me that the same task (or rather, its main part: parallel connections to an HTTP server) can be solved effectively without threads.


Glancing first at my own article about Twisted and Tornado, scratching my head, and digging through the documentation on non-blocking sockets, I sketched out an asynchronous HTTP client.
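The key trick with non-blocking sockets is that `connect()` usually cannot finish immediately, so the kernel reports EINPROGRESS and lets you wait for the socket to become writable. A minimal Python 3 sketch of that handshake (the loopback listener here is just a stand-in endpoint, and `connect_ex` is used because it returns the errno instead of raising):

```python
import errno
import socket

# A throwaway listener on the loopback interface, so the connect
# below has a real endpoint (port 0 means "pick any free port").
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
addr = server.getsockname()

# A non-blocking connect typically cannot complete at once, so the
# kernel answers EINPROGRESS; on loopback it may also succeed
# immediately, hence both outcomes are acceptable here.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.setblocking(False)
result = client.connect_ex(addr)
print(result in (0, errno.EINPROGRESS))  # True

client.close()
server.close()
```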

Below is the source of the key part of the application, with explanatory comments:

import socket
import select
import sys
import errno
import time

from config import *

def ioloop(ip_source, request_source):
    """The asynchronous loop in person.

    ip_source - an infinite iterable yielding IP addresses to connect to;
    request_source - an iterable that generates request bodies.
    """
    starttime = time.time()

    # Open the socket pool and the dictionaries that map each
    # connection to its request and response body.
    epoll = select.epoll()
    connections = {}
    responses = {}
    requests = {}
    bytessent = 0.0
    bytesread = 0.0
    timeout = 0.3

    # Pick the first request.
    request = request_source.next()
    try:
        while True:
            # Check the number of connections; while it is below the
            # allowed maximum and requests remain, add one more.
            connection_num = len(connections)

            if connection_num < CLIENT_NUM and request:
                ip = ip_source.next()
                print "Opening a connection to %s." % ip
                clientsocket = socket.socket(socket.AF_INET,
                                             socket.SOCK_STREAM)
                # A subtlety: a non-blocking socket raises EINPROGRESS
                # if it cannot connect immediately. Ignore that error
                # and start waiting for events on the socket.
                clientsocket.setblocking(0)
                try:
                    clientsocket.connect((ip, 80))
                except socket.error, err:
                    if err.errno != errno.EINPROGRESS:
                        raise
                # Register the socket with the pool and the
                # connection dictionaries.
                epoll.register(clientsocket.fileno(), select.EPOLLOUT)
                connections[clientsocket.fileno()] = clientsocket
                requests[clientsocket.fileno()] = request
                responses[clientsocket.fileno()] = ""

                # Pick the next request; once the source is exhausted,
                # stop opening new connections.
                try:
                    request = request_source.next()
                except StopIteration:
                    request = None

            # "Polling" - that is, collecting the pending events.
            events = epoll.poll(timeout)
            for fileno, event in events:
                if event & select.EPOLLOUT:
                    # Send a chunk of the request...
                    try:
                        byteswritten = connections[fileno].send(requests[fileno])
                        requests[fileno] = requests[fileno][byteswritten:]
                        print byteswritten, "bytes sent."
                        bytessent += byteswritten
                        if len(requests[fileno]) == 0:
                            epoll.modify(fileno, select.EPOLLIN)
                            print "Switched to reading."
                    except socket.error, err:
                        print "Socket write error:", err
                    except Exception, err:
                        print "Unknown socket error:", err
                elif event & select.EPOLLIN:
                    # Read a chunk of the response...
                    try:
                        bytes = connections[fileno].recv(1024)
                    except socket.error, err:
                        # Catch "connection reset by peer" - it happens
                        # under a large number of connections.
                        if err.errno == errno.ECONNRESET:
                            epoll.unregister(fileno)
                            connections[fileno].close()
                            del connections[fileno]
                            print "Connection reset by peer."
                            continue
                        else:
                            raise

                    print len(bytes), "bytes read."
                    bytesread += len(bytes)
                    responses[fileno] += bytes
                    if not bytes:
                        epoll.unregister(fileno)
                        connections[fileno].close()
                        del connections[fileno]
                        print "Done reading... Closed."

            print "Connections left:", len(connections)
            if not len(connections):
                break
    except KeyboardInterrupt:
        print "Looping interrupted by a signal."

    for fd, sock in connections.items():
        sock.close()
    epoll.close()

    endtime = time.time()
    timespent = endtime - starttime
    return responses, timespent, bytesread, bytessent
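The listing above is Python 2 and talks to epoll directly; in modern Python 3 the same write-then-read state machine is usually expressed through the selectors module, which wraps epoll/kqueue behind one API. A self-contained sketch of the same pattern follows; the `tiny_server` thread is just a throwaway stand-in for a real HTTP host, so the whole thing runs on loopback:

```python
import selectors
import socket
import threading

def tiny_server(sock):
    """Accept one connection, read the request, send a canned reply."""
    conn, _ = sock.accept()
    conn.recv(1024)
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
    conn.close()

# Throwaway local listener standing in for a real HTTP host.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=tiny_server, args=(listener,), daemon=True).start()

sel = selectors.DefaultSelector()
request = b"GET / HTTP/1.0\r\n\r\n"
response = b""

client = socket.socket()
client.setblocking(False)
client.connect_ex(listener.getsockname())  # EINPROGRESS is expected
sel.register(client, selectors.EVENT_WRITE)

done = False
while not done:
    for key, events in sel.select(timeout=1.0):
        if events & selectors.EVENT_WRITE:
            # Writable means the connect has finished: send the request,
            # then switch the socket over to reading, exactly like the
            # EPOLLOUT -> EPOLLIN transition in the loop above.
            key.fileobj.sendall(request)
            sel.modify(key.fileobj, selectors.EVENT_READ)
        elif events & selectors.EVENT_READ:
            chunk = key.fileobj.recv(1024)
            if chunk:
                response += chunk
            else:  # EOF: the server closed the connection
                sel.unregister(key.fileobj)
                key.fileobj.close()
                done = True

sel.close()
print(response.endswith(b"hello"))  # True
```

Scaling this to many sockets is a matter of registering more of them with the same selector, which is precisely what the loop above does with its dictionaries.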

The moral here is simple: there is no need to shove threads in everywhere. Moreover, there are situations where multithreading only reduces a program's reliability, creates well-known testing problems, and becomes a source of elusive bugs. If performance is not critical but you really want to parallelize something, even ordinary processes and primitive IPC often justify themselves.
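For instance, a process pool with the simplest possible IPC (arguments and return values pickled back and forth) is often all the parallelism one needs. A hypothetical sketch, where `count_vowels` stands in for some real per-item parsing work:

```python
from multiprocessing import Pool

def count_vowels(text):
    """A stand-in for some per-item parsing work."""
    return sum(ch in "aeiou" for ch in text)

if __name__ == "__main__":
    pages = ["hello world", "python", "epoll and friends"]
    # Each item is pickled, shipped to a worker process, and the
    # result is shipped back: primitive IPC, but easy to reason about.
    with Pool(processes=2) as pool:
        counts = pool.map(count_vowels, pages)
    print(counts)  # [3, 1, 5]
```

Because the workers are separate processes, a crash in one of them cannot corrupt the parent's state, which is much harder to guarantee with shared-memory threads.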

Besides, CPython's damned GIL lives on to this day: its threads are real kernel-level threads, but only one of them can execute Python bytecode at any moment. Accordingly, no performance advantage on multi-core processors can be had from them.
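The effect is easy to observe with a micro-benchmark: splitting pure-Python CPU-bound work across threads yields the same answers and, on CPython, usually no better wall time, since the threads never actually run bytecode in parallel. A sketch (the helper `busy_sum` is illustrative; timings are printed rather than asserted because they vary by machine):

```python
import threading
import time

def busy_sum(n, out, i):
    """Pure-Python CPU-bound work; the GIL serializes it."""
    out[i] = sum(range(n))

N = 200_000
results = [None] * 4

start = time.time()
threads = [threading.Thread(target=busy_sum, args=(N, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_time = time.time() - start

start = time.time()
sequential = [sum(range(N)) for _ in range(4)]
sequential_time = time.time() - start

# Same answers either way; on CPython the two timings are typically
# close, because the threads take turns holding the GIL.
print(results == sequential)  # True
print("threaded: %.3fs, sequential: %.3fs" % (threaded_time, sequential_time))
```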

This script is, of course, crude and hastily written: it does not handle connection failures or read/write errors on the socket, and it does not parse server responses. Yet it repeatedly pulled down the root page of cnn.com at the limit of my channel's capacity, 800-1000 Kb/s :)

The entire script source can be found somewhere here.

P.S. Perhaps someone has ideas about what high-performance asynchronous clients could be put to good use for? :)

Source: https://habr.com/ru/post/81716/

