Script for downloading podcasts: Python + Google Reader

Introduction

There was a useful post, "Automating podcast downloads to an mp3 player". It was useful to me because I don't use iTunes or similar software (I'd rather not discuss that :). I just need to download a batch of podcasts that periodically accumulate in my reader's feed. And I prefer PHP to Python.

I would welcome tips, since I'm just learning Python. I also like writing posts with examples for beginners. Criticism in the comments is welcome... But to the point.

Organization of the process

I keep my list of podcast feeds in Google Reader. The feeds are labeled with their own tag and sit neatly in their own folder.

To download new podcasts from the Podcasts folder, I wrote a small Python script. I based it on the pyrfeed library, which implements the useful GoogleReader class.

The library's source code is available and includes a small example of how to use it. There is documentation too. Admittedly, the only documentation I found covers the Google Reader API, not the library itself. There is also an example utility with a GUI for reading RSS feeds.

Source

A link to the archive with the source is at the end.
Here is the source of the main script:

 import sys
 import os
 import time
 import urlparse
 import urllib

 import progressBar
 import GoogleReader

 # Settings (placeholders: replace with your own values)
 downloadDir = "myDownloadDir"
 logFile = os.path.join(downloadDir, "PodcastsDownloadTool.log")
 tag = "Podcasts"
 login = "myGoogleReaderLogin"
 password = "myGoogleReaderPassword"

 def GetLocalFileNameFromURL(fullpath):
     # Use the last component of the URL path as the local file name
     (filepath, filename) = os.path.split(urlparse.urlparse(fullpath).path)
     return os.path.join(downloadDir, filename)

 def LogMessage(message):
     # Append a message to the log file
     f = open(logFile, "a")
     print >> f, message
     f.close()

 def DownloadFile(url, filename):
     # Download the file, showing progress in the console
     progressBar.ResetProgressBar()
     urllib.urlretrieve(url, filename, reporthook=progressBar.ProgressBarReportHook)

 def ProcessPodcastDownloading():
     # Check and create dir
     if not os.path.exists(downloadDir):
         os.mkdir(downloadDir)

     # Login to Google Reader
     gr = GoogleReader.GoogleReader()
     gr.identify(login, password)
     if gr.login():
         print "Login OK"
     else:
         print "Login KO"
         return

     # Get unread entries for the tag (entries in the "read" state are excluded)
     xmlfeed = gr.get_feed(feed="user/-/label/%s" % tag, n=17, xt="user/-/state/com.google/read")
     for entry in xmlfeed.get_entries():
         try:
             googleID = entry['google_id']
             if 'enclosure' in entry:
                 # Prepare vars and print info
                 URLToDownload = entry['enclosure']
                 localFilePath = GetLocalFileNameFromURL(URLToDownload)
                 print "Title: %s" % entry['title']
                 print "Download from URL: %s ..." % URLToDownload
                 print "Local file: %s" % localFilePath

                 # Download file
                 DownloadFile(url=URLToDownload, filename=localFilePath)

                 # Log message
                 LogMessage("%s %s %s %s\n" % (time.strftime('%x %X'), URLToDownload, googleID, entry['published']))

                 print "Downloaded."
                 # Mark as read
                 gr.set_read(googleID)
                 print "Marked."
         except Exception:
             # Print and log error; the entry stays unread
             print "Error:", sys.exc_info()
             LogMessage("%s\nError: %s\nEntry: %s\nException info: %s\n%s\n" % ("=" * 80, time.strftime('%x %X'), entry, sys.exc_info(), "=" * 80))

 if __name__ == '__main__':
     ProcessPodcastDownloading()

Explanation of the code

The main parameters are set at the beginning of the script:

downloadDir - directory where podcasts will be downloaded
logFile - log file
tag - the name of the tag / folder in Google Reader whose feeds will be checked
login and password - username and password in Google Reader

And then nothing complicated:

authenticating with the Google Reader service
fetching the parsed RSS feed
looping over the entries, printing information and logging
actually downloading the files
marking each entry as read
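
The loop in the steps above can be sketched roughly as follows. This is a minimal illustration in Python 3 (the script itself is Python 2), not the script's actual code: `process_entries`, `local_path_for`, and the injected `download` and `mark_read` callables are hypothetical names standing in for `urlretrieve` and `GoogleReader.set_read`, so the flow can be tried without a network or an account.

```python
import os
from urllib.parse import urlparse


def local_path_for(url, download_dir):
    """Build the local file path from the last component of the URL path."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(download_dir, filename)


def process_entries(entries, download, mark_read, download_dir="podcasts"):
    """Download every entry that has an enclosure and mark it as read.

    `download(url, path)` and `mark_read(google_id)` are injected callables
    (hypothetical stand-ins for the real download and set_read calls).
    Returns the list of ids that were downloaded and marked.
    """
    done = []
    for entry in entries:
        url = entry.get("enclosure")
        if url is None:
            continue  # no file attached; the entry stays unread
        try:
            download(url, local_path_for(url, download_dir))
            mark_read(entry["google_id"])
            done.append(entry["google_id"])
        except Exception as exc:
            # A failed entry is not marked, so the next run retries it.
            print("Error:", entry["google_id"], exc)
    return done
```

Passing the download and mark functions in as arguments is only a convenience for the sketch; it keeps the order of operations (download first, mark as read only on success) easy to see.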

The pyrfeed library itself is not included in the archive. It is enough to download it, change a couple of lines (more on that later), and put it somewhere on the import path. For example, put it in the Lib directory of the Python installation, and the library will be available to all scripts.
My GoogleReader and web directories are located in the same directory as my script.

Interface

This is a console utility, so draw your own conclusions.
The download progress is displayed simply, as a progress bar in the console.

The progress bar is taken from some example; I don't remember exactly where. There are many such examples on the Internet and most of them are alike. Its source is in the archive.
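
For readers who want to see what such a progress bar boils down to, here is a minimal sketch in Python 3. It is not the progressBar module from the archive; `format_progress` and `report_hook` are hypothetical names. The only thing taken as given is the reporthook convention of `urlretrieve`, which calls the hook with the block number, the block size, and the total size (which may be -1 when the server does not report it).

```python
import sys


def format_progress(block_num, block_size, total_size, width=40):
    """Return a one-line text progress bar for a urlretrieve-style hook."""
    if total_size <= 0:
        # The server did not report a size; show the byte count only.
        return "%d bytes" % (block_num * block_size)
    done = min(block_num * block_size, total_size)
    filled = width * done // total_size
    percent = 100 * done // total_size
    return "[%s%s] %3d%%" % ("#" * filled, "-" * (width - filled), percent)


def report_hook(block_num, block_size, total_size):
    """Redraw the bar in place; pass as reporthook= to urlretrieve."""
    sys.stdout.write("\r" + format_progress(block_num, block_size, total_size))
    sys.stdout.flush()
```

In Python 3 the hook would be passed as `urllib.request.urlretrieve(url, filename, reporthook=report_hook)`.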

Patch for feed.py

Unfortunately, the GoogleFeed class does not extract the link to the file from the XML it receives.
I solved this problem by extending the XML parsing. After this fragment:

 elif dom_entry_element.localName == 'link':
     if dom_entry_element.getAttribute('rel') == 'alternate':
         entry['link'] = dom_entry_element.getAttribute('href')

I added this piece:

 if dom_entry_element.getAttribute('rel') == 'enclosure':
     entry['enclosure'] = dom_entry_element.getAttribute('href')

It turned out like this:

 elif dom_entry_element.localName == 'link':
     if dom_entry_element.getAttribute('rel') == 'alternate':
         entry['link'] = dom_entry_element.getAttribute('href')
     if dom_entry_element.getAttribute('rel') == 'enclosure':
         entry['enclosure'] = dom_entry_element.getAttribute('href')
 elif dom_entry_element.localName == 'category':
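
To try the idea of the patch outside the library, here is a small self-contained sketch in Python 3 using the standard xml.dom.minidom module. It is not pyrfeed's code: `extract_links` is a hypothetical helper, and it assumes the entry is an Atom `<entry>` element whose `<link>` children carry `rel` and `href` attributes, as in the fragment above.

```python
from xml.dom import minidom

ATOM_NS = "http://www.w3.org/2005/Atom"


def extract_links(entry_xml):
    """Return a dict with 'link' and/or 'enclosure' hrefs from one Atom entry."""
    entry = {}
    dom = minidom.parseString(entry_xml)
    for node in dom.getElementsByTagNameNS(ATOM_NS, "link"):
        rel = node.getAttribute("rel")
        if rel == "alternate":
            # Link to the web page of the post
            entry["link"] = node.getAttribute("href")
        elif rel == "enclosure":
            # Link to the attached media file (the podcast itself)
            entry["enclosure"] = node.getAttribute("href")
    return entry
```

An entry without a `rel="enclosure"` link simply produces a dict with no `enclosure` key, which is the condition the main script checks before downloading.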

Shortcomings I can live with

Resuming downloads is not supported. If a download fails, the RSS entry is not marked as read, so the next time the script runs the file is downloaded again.
This leads to the next point: on the next run, faulty entries (for example, an entry with no link to a downloadable file, which does happen) are processed again. They could be marked with a special tag and skipped later, but I am happy to go through the feed's unread entries manually afterwards.
Configuration storage: the login and password are kept in plain text. The login is not such a big deal, but the password... You could use the getpass() function or store it somewhere else.
You can automate launching the script when a USB flash drive or player is connected, for example with the USB Detect & Launch utility (it has already been mentioned on Habr).
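
As for the plain-text password mentioned above, one possible way around it is sketched below in Python 3. This is only an illustration, not part of the script: `get_password` is a hypothetical helper and `GREADER_PASSWORD` is an environment variable name made up for this example. The standard `getpass.getpass` function prompts on the terminal without echoing what is typed.

```python
import getpass
import os


def get_password(env=os.environ, var="GREADER_PASSWORD", prompt=getpass.getpass):
    """Return the password from an environment variable, else ask for it.

    GREADER_PASSWORD is a hypothetical variable name chosen for this sketch.
    """
    value = env.get(var)
    if value:
        return value
    # getpass reads from the terminal without echoing the typed characters.
    return prompt("Google Reader password: ")
```

The environment variable keeps unattended runs possible (for example, when the script is launched automatically on USB connection), while the prompt covers interactive use.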

Conclusion

The sources are in the archive.
This note is copy-pasted from my own page.

Source: https://habr.com/ru/post/20876/
