Some readers have probably run into this problem already. In GAE it has long been sitting in the tracker as the still-open Issue 3379. Initially the problem seemed to concern only Java, but it is now observed in Python as well (at least in 2.7). A description of the error and a solution for Java can be found elsewhere; this post covers Python.
Briefly, the gist. Sites often try to set more than one cookie at a time by sending several Set-Cookie headers in a single response. On GAE, urlfetch (and urllib / urllib2, which are built on it) handles this oddly: all these headers are glued together into one, separated by commas. Needless to say, commas also occur in the expires fields, and sometimes in the cookie values themselves, which makes it hard to parse such a string back into separate cookies. The standard HTTPCookieProcessor from urllib2 and mechanize simply cannot cope with this situation.
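To see why the glued header is hard to take apart, here is a minimal sketch with a made-up header value (the cookie names and dates are illustrative assumptions, not taken from a real site). Splitting on every comma cuts the expires dates in half:

```python
# Hypothetical value, as it might look after two Set-Cookie headers are merged.
merged = ("sid=abc123; expires=Wed, 01-Jan-2025 00:00:00 GMT; path=/, "
          "lang=en; expires=Thu, 02-Jan-2025 00:00:00 GMT; path=/")

# Naive approach: split on every comma.
naive_parts = [p.strip() for p in merged.split(",")]

for part in naive_parts:
    print(part)
# Two cookies went in, but four fragments come out,
# because each expires date contains a comma of its own.
```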
So if your project relies on the out-of-the-box cookie support in urllib2 or mechanize, the simple solution below should come in handy.
Create a handler to separate problem headers before they go to the HTTPCookieProcessor:
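import urllib2, re

class SplitCookiesHandler(urllib2.BaseHandler):
    # Run before HTTPCookieProcessor (the default handler_order is 500).
    handler_order = 0

    def http_response(self, req, response):
        headers = response.info().headers
        # Iterate over a copy, because the list is modified inside the loop.
        for h in headers[:]:
            matched = re.match(r"(?i)set-cookie:(.*)\n$", h)
            if matched is not None:
                headers.remove(h)
                # Split only on commas that are followed by " name=".
                for cookie in re.split(r",(?= \w+[\w\d]*=)", matched.group(1)):
                    headers.append("set-cookie:" + cookie + "\n")
        return response

    # Apply the same processing to HTTPS responses.
    https_response = http_response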
The regular expression used as a separator can give false positives in hard cases, but I have never encountered such malformed cookies in practice.
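The behaviour of the separator can be checked offline. The sketch below reuses the same pattern in a small helper; the name split_merged and both sample values are assumptions made up for illustration. Unlike the handler, the helper also strips whitespace so the output is easier to read.

```python
import re

# Same separator pattern as in the handler above.
SEPARATOR = re.compile(r",(?= \w+[\w\d]*=)")

def split_merged(value):
    """Split a merged Set-Cookie value into individual cookie strings."""
    return [part.strip() for part in SEPARATOR.split(value)]

# Hypothetical merged value: the commas inside the expires dates are
# left alone, because no "name=" follows them.
merged = ("sid=abc123; expires=Wed, 01-Jan-2025 00:00:00 GMT; path=/, "
          "lang=en; expires=Thu, 02-Jan-2025 00:00:00 GMT; path=/")
good = split_merged(merged)
print(good)

# Hypothetical false positive: a single cookie whose value contains ", b=".
tricky = split_merged("note=a, b=c; path=/")
print(tricky)
```

The first call yields two complete cookies, while the second wrongly cuts one cookie into two, which is the kind of hard case mentioned above.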
All that remains is to add our handler to a new OpenerDirector (or to an existing one using add_handler()):
import cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(SplitCookiesHandler(), urllib2.HTTPCookieProcessor(cj))
That's all; now we can use it:
r = opener.open("http://www.yandex.ru/")
If you (like me) use mechanize, simply replace urllib2 with mechanize everywhere; instead of mechanize.build_opener(), it is more natural to use the following construction:
br = mechanize.Browser()
br.add_handler(SplitCookiesHandler())