Suppose I love pizza and want the addresses of all the pizzerias in my city written down in my notebook. The 2GIS service helps me find them at home, or whenever I have internet access. But I want a list of all the pizzerias that works offline, with no electronic devices at hand!
How can I get it? Copy-paste would work, but it is tedious.
Let's pull the data from 2GIS automatically:
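from urllib.request import urlopen

url = 'http://maps.2gis.ru/spb/#center/30.332635,59.921282/zoom/4/query/firmbyrub/id/5348144816638211/'
pizza_page = urlopen(url)
# Print the raw HTML the server returns, line by line
for source_line in pizza_page:
    print(source_line.decode('utf-8', errors='replace'))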
And... disappointment. Instead of a neat list of names, addresses, and phone numbers of pizzerias in the item div blocks, we get sparse markup with no hint of the data we need. What is the problem? It is simple: the browser fetches most of the data later, using JavaScript.
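A quick way to confirm this diagnosis is to check whether the raw HTML contains the markup you expect at all. Here is a minimal sketch. The helper `count_marker`, the marker string `class="item"`, and the sample page are all made up for illustration and are not taken from the real 2GIS markup:

```python
def count_marker(html, marker):
    """Return how many times `marker` occurs in the raw HTML text."""
    return html.count(marker)


# A made-up static page of the kind a plain HTTP client receives:
# an empty container and a script tag, but no rendered result blocks.
static_html = """
<html><body>
<div id="results"></div>
<script src="app.js"></script>
</body></html>
"""

if count_marker(static_html, 'class="item"') == 0:
    print("No item blocks in the static HTML; the data is probably loaded by JavaScript.")
```

If the count is zero for the page you downloaded but the blocks are visible in the browser, the content is most likely being added by scripts after the page loads.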
What can we do? Let's replace urllib with a real browser.
First, download the latest version of Selenium and install the Python bindings:
pip install -U selenium
Next, find the XPath of the element we need, for example with the developer tools in Firefox. We will need this "path" in the script a little further down.
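To get a feel for how such a path picks out one block among many, here is a small sketch that uses only the standard library. The sample markup and the helper `item_xpath` are invented for illustration. Note that `xml.etree.ElementTree` supports only a subset of XPath and resolves paths relative to the root element, so the leading `/html` is dropped here:

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed fragment standing in for a results page.
SAMPLE = """
<html><body>
  <div>
    <div>header</div>
    <div>
      <div>Pizzeria A</div>
      <div>Pizzeria B</div>
      <div>Pizzeria C</div>
    </div>
  </div>
</body></html>
"""


def item_xpath(index):
    """Build the path to the index-th result block (XPath counts from 1)."""
    return "body/div/div[2]/div[" + str(index) + "]"


root = ET.fromstring(SAMPLE.strip())  # root is the <html> element
names = [root.find(item_xpath(i)).text for i in range(1, 4)]
print(names)
```

The same idea, a fixed prefix plus a changing index in square brackets, is what lets a script walk through the result blocks one by one.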

Start the Selenium server:
java -jar selenium-server-standalone-2.11.0.jar
Now we write a small, quick-and-dirty script that gets us the whole list of names and addresses:
# -*- coding: utf-8 -*-
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

i = 3  # index of the last results page; pages are visited from 3 down to 0
while i >= 0:
    browser = webdriver.Firefox()  # start Firefox
    url = "http://maps.2gis.ru/spb/#center/30.330559,59.937064/zoom/4/query/firmbyrub/id/5348144816638211/page/" + str(i) + "/"
    browser.get(url)  # open the page
    time.sleep(5)  # give the scripts on the page time to load the data
    j = 20  # number of result blocks expected on a page
    while j > 0:
        try:
            # The XPath comes from Firebug in Firefox or from the dev tools in Chrome/Safari
            place_block = browser.find_element(By.XPATH, "/html/body/table/tbody/tr[2]/td/div/div/div[2]/div[" + str(j) + "]")
            place_info = place_block.text.split('\n')
            place_name = place_info[1].split(',')[0]  # keep only the text before the first comma
            place_address = place_info[2]
            string_for_file = "Name: " + place_name + "\tAddress: " + place_address + "\n"
            # Append to the output file; "with" closes it after each write
            with open("/home/user_name/my_pizza_list.txt", "a", encoding="utf-8") as f:
                f.write(string_for_file)
        except NoSuchElementException:
            assert 0, "can't open url"
        j -= 1
    browser.close()
    i -= 1
    time.sleep(5)  # pause before opening the next page
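The text-splitting step in the script is easier to test when it sits in its own function. Below is a minimal sketch of that idea. The helper name `parse_place` and the sample block are made up, and the line layout (name on the second line, address on the third) is simply what the script above assumes:

```python
def parse_place(block_text):
    """Split the text of one result block into (name, address).

    Assumes the layout the script relies on: line index 1 holds
    "name, category..." and line index 2 holds the address.
    Returns None if the block has fewer than three lines.
    """
    lines = block_text.split("\n")
    if len(lines) < 3:
        return None
    name = lines[1].split(",")[0].strip()
    address = lines[2].strip()
    return name, address


# A made-up block of text, not real 2GIS output.
sample = "1\nPizza Place, pizzeria\nExample St., 1\n+7 000 000-00-00"
print(parse_place(sample))
```

Returning None for a short block lets the caller skip it instead of failing with an IndexError.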
Now you can go and have some tea, or watch your computer open and close Firefox 4 times while the list of addresses appears in the coveted file.
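The fixed `time.sleep(5)` pauses in the script are a guess: too short on a slow connection, wasted time on a fast one. One alternative is to poll until the data shows up. The helper below, `wait_until`, is a hypothetical sketch in plain Python and not part of Selenium. Selenium ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Possible use with a browser object (not executed here):
#   blocks = wait_until(lambda: browser.find_elements(By.XPATH, some_xpath))
# find_elements returns an empty list until matching elements exist.
```

With this approach the script moves on as soon as the blocks appear and gives up cleanly if they never do.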
PS 2GIS is simply the first site that came to hand where the data is loaded by JavaScript.
PPS The code quality suffered because the script was cut down as far as possible for clarity.