
Glitches in Python libraries or not?

The other day I wrote a web spider in Python. The task is simple in general, but the load is heavy, so I actually have to run five spiders (in five threads); on top of that, several initial conditions complicate matters... Overall the solution turned out to be interesting, and I got the chance to dig carefully through the guts of the standard socket, httplib and urllib2 modules (if you are interested in urllib2, I can describe that experience).



What I want to talk about now is what the habit of not keeping track of created objects, instilled by garbage-collected languages, can lead to. While monitoring my spider, I noticed a lot of sockets hanging in the CLOSE_WAIT state on the system. The reason is that the server has already closed these connections, but on my side the sockets are still in memory. That is, roughly speaking, the close method was never called on the socket, and the object itself is still somewhere in memory.
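A minimal sketch of how a socket ends up in CLOSE_WAIT. It is written for Python 3 rather than the 2.5 the spider ran on, and it uses a loopback connection instead of a real HTTP server; the function name demo_close_wait is made up for the illustration.

```python
import socket


def demo_close_wait():
    # Listening socket on an ephemeral loopback port (stands in for the server).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    # The "spider" side of the connection.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()

    # The server closes its end first, as it would after sending a reply.
    server_side.close()
    listener.close()

    # The client sees end-of-stream, but its descriptor is still open.
    # From here until close() the OS reports the connection as CLOSE_WAIT.
    eof = client.recv(1024)
    still_open = client.fileno() != -1

    # Only an explicit close() releases the descriptor.
    client.close()
    closed = client.fileno() == -1
    return eof, still_open, closed
```

While the function sits between recv and close, a tool such as netstat would show the client end in CLOSE_WAIT; with five spiders that never reach the close call, such entries pile up.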
After rummaging through urllib2, httplib and socket, I pieced together the following picture of how they work:

  1. To load a page, urllib2.OpenerDirector.open is called.
  2. It calls the urllib2.HTTPHandler.http_open method, which in turn calls urllib2.AbstractHTTPHandler.do_open.
  3. In do_open, an object h of type httplib.HTTPConnection is created to perform the actual communication. An important point: this object disappears when do_open exits!
  4. h creates and opens a socket, storing it in its self.sock attribute.
  5. h sends the request to the server.
  6. do_open requests the server's response from h and gets an object r of type httplib.HTTPResponse.
  7. When it is created from the socket h.sock, this object builds the file object self.fp with the h.sock.makefile method; the application will read data through it. Again, an important point: the socket object passed to the constructor is not saved anywhere.
  8. do_open wraps the resulting HTTPResponse in a service object and returns it to the application.
  9. The application reads the data and closes HTTPResponse.
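The key point of steps 4 to 7 is that a file object produced by makefile can keep the connection alive after the socket wrapper is gone. The sketch below shows an analogous effect with Python 3's socket module, not with the 2.5 internals described above, so treat it as an illustration of the idea only; the function name and the use of socket.socketpair are assumptions.

```python
import socket


def read_after_wrapper_close():
    # A connected pair of sockets stands in for the client and the server.
    ours, peer = socket.socketpair()

    # Like HTTPResponse, keep only the file object made from the socket.
    fp = ours.makefile("rb")

    # "Lose" the wrapper: close it while the file object is still around.
    ours.close()

    # The connection is still usable through the file object.
    peer.sendall(b"still alive\n")
    line = fp.readline()
    open_while_fp_lives = ours.fileno() != -1

    # The descriptor is released only once the file object is closed too.
    fp.close()
    open_after_fp_close = ours.fileno() != -1

    peer.close()
    return line, open_while_fp_lives, open_after_fp_close
```

In Python 3 the file object counts as a reference to the socket, so closing it is enough to release the descriptor; the complaint here is that in 2.5 closing the response did not have that effect.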


Thus, the socket object itself (the wrapper around the real socket) may no longer exist. At least, there is no reference to it anywhere. But the real socket is still alive! Nobody called close on it! In short, in the heat of the moment only one option came to mind: once reading is finished, close the socket manually through a chain of internal references, with “self-explanatory” code like this:
 tf.fp._sock.fp._sock.close()

Here tf is the object returned by urllib2.urlopen. That's how it goes :) This is how things stand in 2.5; 2.4 has a couple of even worse bugs. I would be glad of any tips on how to beat this behavior properly.
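If the workaround has to stay, it can at least be made less fragile. Below is a rough sketch of a helper that walks the same attribute chain but gives up quietly when a link is missing, for example on another Python version. The name close_underlying_socket and its path parameter are hypothetical, and the default chain is simply the one from the line above; it relies on private attributes of 2.5 and is not a supported API.

```python
def close_underlying_socket(response, path="fp._sock.fp._sock"):
    """Follow a dotted attribute path from `response` and close the object
    at the end of it.

    Returns True if close() was called, False if any link in the chain
    was missing (in which case nothing is closed).
    """
    obj = response
    for name in path.split("."):
        # Private attributes may be absent or None on other versions.
        obj = getattr(obj, name, None)
        if obj is None:
            return False
    obj.close()
    return True
```

A call such as close_underlying_socket(tf) would go in a finally block right after the data has been read, so the socket is closed even if reading raises an exception.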

Source: https://habr.com/ru/post/28887/

