Downloading tracks from Autotravel.ru

Like many travel enthusiasts, I look up the coordinates of city sights on the site autotravel.ru (hereinafter referred to as the site). To suit my needs I wrote a small utility that downloads files with the sights of a city for later uploading to a navigator. The program is extremely simple, but it does exactly what I need. In addition, it implements the simplest caching to save loading time and traffic.

The program, which I called AtTrackDownloader, is written in Python 3 using Beautiful Soup, a library for parsing HTML. The GUI is built with PyQt, simply because I am familiar with Qt.

The core of the program is the autotravel class. There is no point in describing all of its methods (especially since I will give a link to the git repository at the end), so I will focus on the main logic.

# Note: urllib2 does not exist in Python 3; the code assumes aliases such as
#   import urllib.request as urllib2
#   import urllib.parse
# AutoTravelHttp is the site's base URL, defined elsewhere in the module.
def __load_towns_page(self, letter):
    url = AutoTravelHttp + '/towns.php'
    params = {'l': letter.encode('cp1251')}
    req = urllib2.Request(url, urllib.parse.urlencode(params).encode('cp1251'))
    response = urllib2.urlopen(req)
    return response.read().decode('utf-8')

The __load_towns_page method loads the contents of the site page for cities starting with a given letter. More precisely, that is how it worked a few months ago; since then the site has changed its page addresses, and instead of the letter itself you now pass 'a' plus the ordinal number of the letter. For example: A - 'a01', B - 'a02', and so on up to 'a30' for the last letter.
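For illustration, the letter-to-page-code conversion could look like the sketch below. This is not code from the program; the letter set and the helper name are my assumptions.

# Illustrative sketch, not part of AtTrackDownloader: the letter set and
# helper name are assumptions.
LETTERS = 'АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЫЭЮЯ'  # assumed set of letters that have city pages

def letter_to_page_code(letter):
    # 'А' -> 'a01', 'Б' -> 'a02', ...
    return 'a{:02d}'.format(LETTERS.index(letter) + 1)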
Accordingly, to load the cities for all letters, this method is called in a loop from the __load_all_towns method (a sketch of that loop is given after the parser below). The resulting page text is passed to the __load_towns_from_url method:

def __load_towns_from_url(self, html_src):
    for line in html_src.splitlines():
        soup = BeautifulSoup(line, 'html.parser')
        area = soup.find('font', {'class': 'travell0'})
        town = soup.find('a', {'class': 'travell5'})
        if town is None:
            town = soup.find('a', {'class': 'travell5c'})
        if town is None:
            continue
        town_name = town.get_text().strip()
        area_name = area.get_text().strip()[1:-1]
        town_href = town.get('href')
        yield {'area': area_name, 'town': town_name, 'href': town_href}

This method does all the main work of parsing the cities. The parsed cities are collected into a list of dictionaries with the keys 'area' (region or country), 'town' (the city name), and 'href' (a short link to the city page, relative to the site).
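The __load_all_towns method itself is not shown here; under the assumptions above it might look roughly like this (a sketch only, the actual implementation in the repository may differ):

def __load_all_towns(self):
    # Sketch: walk over every letter page and yield the towns parsed from it.
    # LETTERS is the assumed letter list from the previous sketch.
    for letter in LETTERS:
        html_src = self.__load_towns_page(letter)
        yield from self.__load_towns_from_url(html_src)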

Thus, all the cities are loaded from the site into memory. This takes about 10 seconds on my computer. Naturally, that is too slow, which led to the idea of caching the loaded cities. Unlike the other languages I know, Python lets you serialize the list in one line, which is great. The method that saves the cities to a cache file looks like this:

def __save_to_cache(self, data):
    with open('attd.cache', 'wb') as f:
        pickle.dump(data, f)

Loading from the cache is performed in the class constructor:

def __init__(self):
    try:
        with open('attd.cache', 'rb') as f:
            self.__all_towns = pickle.load(f)
    except FileNotFoundError:
        self.__all_towns = list(self.__load_all_towns())
        self.__save_to_cache(self.__all_towns)

As a result, we get a ten-second delay only at the first start.

Getting links to the track files is implemented in the get_towns_track_links method, which takes the address of the city page:

def get_towns_track_links(self, href):
    req = urllib2.Request(href)
    response = urllib2.urlopen(req)
    soup = BeautifulSoup(response.read().decode('utf-8'), 'html.parser')
    r = {}
    for link in soup.findAll('a', {'class': 'travell5m'}):
        if link.get_text() == 'GPX':
            r['gpx'] = AutoTravelHttp + link.get('href')
        elif link.get_text() == 'KML':
            r['kml'] = AutoTravelHttp + link.get('href')
        elif link.get_text() == 'WPT':
            r['wpt'] = AutoTravelHttp + link.get('href')
    return r

Since, in theory (and such cases have occurred before), some track formats may be missing from a city's page, the available track types and the links to them are recorded in a dictionary. This is later used to filter the file types in the save dialog.
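For illustration, a save-dialog filter could be assembled from that dictionary roughly as follows. This is a sketch, not code from the program: PyQt5 is assumed, and the values in links are placeholders standing in for what get_towns_track_links returns.

# Sketch only (PyQt5 assumed); 'links' stands for the dictionary returned
# by get_towns_track_links() and is filled with placeholder URLs here.
import sys
from PyQt5.QtWidgets import QApplication, QFileDialog

app = QApplication(sys.argv)
links = {'gpx': 'http://autotravel.ru/.../track.gpx',
         'kml': 'http://autotravel.ru/.../track.kml'}  # placeholders

filters = {'gpx': 'GPX tracks (*.gpx)',
           'kml': 'KML tracks (*.kml)',
           'wpt': 'WPT waypoints (*.wpt)'}
filter_string = ';;'.join(filters[fmt] for fmt in links)
file_name, _ = QFileDialog.getSaveFileName(None, 'Save track', '', filter_string)

The URL of the chosen format can then be downloaded with the same urllib calls shown above.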

I see no point in describing how the interface is created: anyone interested can simply go to the repository. For those who are not, here is a screenshot:

[Screenshot: the AtTrackDownloader main window]

The full source code is on GitHub.

Source: https://habr.com/ru/post/271231/
