
Python + Selenium = not just tests

Suppose I am a big pizza lover and I want to keep the addresses of all the pizzerias in my city in my notebook. The 2GIS service helps me with this at home, or whenever I have internet access. But I want a list of all the pizzerias offline, with no electronic devices at hand!
How can I do it? I could copy and paste by hand, but that is tedious.

Let's copy all the data from 2GIS automatically:

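from urllib.request import urlopen

url = 'http://maps.2gis.ru/spb/#center/30.332635,59.921282/zoom/4/query/firmbyrub/id/5348144816638211/'
pizza_page = urlopen(url)
for source_line in pizza_page:
    # urlopen yields bytes, so decode each line before printing
    print(source_line.decode('utf-8', errors='replace'))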


And... disappointment. Instead of a neat list of pizzeria names, addresses, and phone numbers in the item div blocks:
[Image: item element in 2GIS]

we see very sparse markup with no hint of what we need. What is the problem? It is quite simple: most of the data is fetched by the browser later, by JavaScript.
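To make the difference concrete, here is a minimal sketch that counts item blocks in an HTML string using only the standard library. Both HTML snippets and the class name `item` are made up for illustration; they are not the real 2GIS markup.

```python
from html.parser import HTMLParser


class ItemCounter(HTMLParser):
    """Counts <div> tags whose class attribute contains 'item'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "item" in classes:
            self.count += 1


def count_items(html):
    parser = ItemCounter()
    parser.feed(html)
    return parser.count


# Hypothetical markup: what a plain HTTP client receives...
static_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
# ...and what the browser holds after the scripts have run.
rendered_html = (
    '<html><body><div id="app">'
    '<div class="item">Pizza A</div><div class="item">Pizza B</div>'
    '</div></body></html>'
)

print(count_items(static_html))    # 0
print(count_items(rendered_html))  # 2
```

A plain HTTP client only ever sees the first kind of document, which is why the item blocks never show up in its output.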

What to do? Let's replace urllib with a full browser.

First, download the latest version of the Selenium server and install the Python bindings for Selenium:
pip install -U selenium


Next, we find the XPath of the desired element, for example using Firefox. We will need this path in the script just below.
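If positional XPath steps such as `div[2]` are unfamiliar, the sketch below shows how they select the n-th matching child, counting from 1. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and relative paths, so the path here is shorter than a real one; the tiny document and the helper `item_xpath` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one container div holding three item divs.
page = ET.fromstring(
    "<html><body><div>"
    "<div>Pizza A</div>"
    "<div>Pizza B</div>"
    "<div>Pizza C</div>"
    "</div></body></html>"
)


def item_xpath(j):
    """Build a path to the j-th item block (XPath positions start at 1)."""
    return "body/div/div[" + str(j) + "]"


names = [page.find(item_xpath(j)).text for j in (1, 2, 3)]
print(names)                      # ['Pizza A', 'Pizza B', 'Pizza C']
print(page.find(item_xpath(4)))   # None: there is no fourth block
```

Building the path with a changing index is what lets a script walk through every block on a page.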

Run selenium:
java -jar selenium-server-standalone-2.11.0.jar


We write a small, quick-and-dirty script that gets us the whole list of names and addresses:

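# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
import time

i = 3  # index of the last results page; pages 3..0 are visited
while i >= 0:
    browser = webdriver.Firefox()  # start a local Firefox session
    url = "http://maps.2gis.ru/spb/#center/30.330559,59.937064/zoom/4/query/firmbyrub/id/5348144816638211/page/" + str(i) + "/"
    browser.get(url)  # load the page
    time.sleep(5)  # give the page time to load; the connection may be slow
    j = 20  # number of item blocks (pizzerias) per page; a missing block raises NoSuchElementException below
    while j > 0:
        try:
            # The XPath of the element is easy to find with Firebug in Firefox or DevTools in Chrome/Safari.
            place_block = browser.find_element(By.XPATH, "/html/body/table/tbody/tr[2]/td/div/div/div[2]/div[" + str(j) + "]")
            place_info = place_block.text.split('\n')
            place_name = place_info[1].split(',')[0]  # drop the extra data, keep only the name
            place_address = place_info[2]
            string_for_file = "Name: " + place_name + "\tAddress: " + place_address + "\n"
            # Append the entry to a text file; "with" closes the file after each write.
            with open("/home/user_name/my_pizza_list.txt", "a", encoding="utf-8") as f:
                f.write(string_for_file)
        except NoSuchElementException:
            assert 0, "can't open url"
        j -= 1
    browser.close()
    i -= 1
    time.sleep(5)  # pause before opening the next browser session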
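The text-splitting step is the easiest part of the script to get wrong, so it may help to pull it out into a function that can be tried without a browser. This is only a sketch: the function name `parse_place_block` and the sample text are invented, and the layout (name on the second line, address on the third) simply mirrors the indices used in the script.

```python
def parse_place_block(block_text):
    """Return (name, address) from the text of one item block.

    Returns None when the block has fewer than three lines,
    instead of failing with an IndexError.
    """
    lines = block_text.split("\n")
    if len(lines) < 3:
        return None
    name = lines[1].split(",")[0].strip()  # keep only the part before the first comma
    address = lines[2].strip()
    return name, address


# Hypothetical block text; a real block may be laid out differently.
sample = "1\nPizza Roma, pizzeria\nNevsky pr., 1\n+7 000 000-00-00"

print(parse_place_block(sample))           # ('Pizza Roma', 'Nevsky pr., 1')
print(parse_place_block("only one line"))  # None
```

In the script, the value of `place_block.text` would be passed in, and blocks that come back as `None` could simply be skipped.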


Now you can safely go and have some tea, or watch your computer open and close Firefox 4 times while the list of addresses appears in the coveted file.

P.S. 2GIS is just the first site that came to hand where the data is loaded by JavaScript.
P.P.S. The code quality suffered from being cut down as far as possible for the sake of clarity.

Source: https://habr.com/ru/post/131966/

