Documentation for Grab, a library for parsing websites
I previously wrote on Habr about Grab, a library for parsing websites, and about Spider, an asynchronous parsing module. I am glad to announce that I have finally written the Grab documentation. I decided to write it all in Russian because I find it harder to express my thoughts in English. It turned out to be far more writing than I expected at the start, but I have described almost all of the library's features. I decided to simply paste the table of contents here; click on any section that interests you to read about what Grab can do.
By the way, I generated the HTML for the table of contents with a script that uses Grab:
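    # coding: utf-8
    from grab import Grab
    from lxml.html import tostring

    g = Grab()
    g.go('http://grablib.org/docs/')
    # Rewrite relative links in the parsed page as absolute URLs
    g.tree.make_links_absolute('http://grablib.org/docs')
    # Select the first <ul> that follows the matching <h3> heading
    elem = g.xpath(u'//h3[text()=" "]/following-sibling::ul[1]')
    # Serialize the selected element back to HTML
    toc = tostring(elem, encoding='utf-8')
    print toc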
The official website of the Grab library: grablib.org
Please send questions about using Grab to the mailing list, groups.google.com/group/python-grab, rather than to me on Skype or Jabber.
I would also like to remind you that we (GrabLab) do custom website parsing. If you need data collected and processed, please contact us.
Next, I plan to document the asynchronous Spider module.