
In my previous post about automating downloads of new episodes from LostFilm's RSS feed, habrauser AmoN rightly pointed out that the method I described cannot download distributions whose torrent-file links are not contained directly in the RSS feed. The kinozal.tv tracker was given as an example. This post is dedicated to solving exactly that problem ;)
In lieu of an introduction
Let me briefly recap the previous post. Many popular torrent clients let you configure watch folders in their settings: the client monitors them for new files and automatically starts downloading whatever appears there. The shell script written earlier periodically scans the tracker's RSS feed, selects the distributions we are interested in, and saves their torrent files into the watch folder.
What's in a name?
In the previous approach, selection and filtering of the RSS feed was based on analyzing the torrent-file link with a regular expression. For example, a single glance at a link of the form
http://www.lostfilm.tv/download.php/2035/Lost.s06e07.rus.PROPER.LostFilm.TV.torrent
immediately tells you which series, season and episode it is. However, as AmoN correctly noted, not all RSS trackers contain direct links to torrent files, which complicates our download-automation task somewhat. That observation is what prompted this post :)
Well then, let's get started
To begin with, I took a careful look at the format of the feed in question. Here is what I saw:
<item>
<title>The 3 Great Tenors - VA / Classic / 2002 / MP3 / 320 kbps</title>
<description>: - </description>
<link>http://kinozal.tv/details.php?id=546381</link>
</item>
Namely: not only does the link not contain the distribution name, it is not even a direct link to the torrent file. That means that to get the torrent file itself we have to follow the link, and the downloaded page already contains a direct link to the file.
Devising a plan
After a little thought, I came up with the following algorithm:
- read the RSS feed http://kinozal.tv/rss.xml and select the distributions we are interested in with grep, matching on the description:
curl -s http://kinozal.tv/rss.xml | grep -iA 2 'MP3'
where "-s" tells curl to "be quiet", "-i" makes the search case-insensitive, and "-A 2" tells grep to output, along with each matching line, the two lines that follow it (that is where the link we need lives)
- from the selected lines, keep only the links, again with grep:
grep -ioe 'http.*[0-9]'
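Applied to the output of the first grep, this second filter strips the XML wrapping and leaves a bare URL; a small self-contained check (sample line, not live data):

```shell
# A line as it would come out of the first grep (sample data)
line='<link>http://kinozal.tv/details.php?id=546381</link>'

# 'http.*[0-9]' grabs from "http" up to the last digit, dropping the closing tag
printf '%s\n' "$line" | grep -ioe 'http.*[0-9]'
# → http://kinozal.tv/details.php?id=546381
```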
- loop over all the links found:
for i in ... ; do ... ; done
where, in place of the list, the "magic" backquotes `...` substitute the combined result of the two previous steps:
for i in `curl -s http://kinozal.tv/rss.xml | grep -iA 2 'MP3' | grep -ioe 'http.*[0-9]'`; do ... ; done
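The backquote substitution feeds the for loop one whitespace-separated word per link. A sketch with a stub function in place of the live curl pipeline (get_links and the second ID are made up for illustration):

```shell
# Hypothetical stand-in for: curl -s http://kinozal.tv/rss.xml | grep ... | grep ...
get_links() {
  printf '%s\n' \
    'http://kinozal.tv/details.php?id=546381' \
    'http://kinozal.tv/details.php?id=546382'
}

# Word splitting on the substitution result gives one iteration per link
for i in `get_links`; do
  echo "would fetch: $i"
done
```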
- inside the loop, fetch the page for each link and, once again with grep, pull the link to the torrent file out of it:
curl -sb "uid=***; pass=***; countrys=ua" $i | grep -m 1 -ioe 'download.*\.torrent'
where "-b "uid=***; pass=***; countrys=ua"" is the option that sends cookies carrying the authorization information, and "-m 1" keeps only the first of the two direct links to the torrent file (yes, the link to the same file appears twice on kinozal's distribution pages)
Note that neither the password nor the uid is transmitted in the clear! You can see their values by opening your browser's cookie viewer or, for example, by using a plugin for Firefox.
- download the torrent files with wget:
wget -nc -qi - -B "http://kinozal.tv/" -P ~/.config/watch_dir --header "Cookie: uid=***; pass=***; countrys=ua"
Of the options, I will note "-B "http://kinozal.tv/"", which sets the prefix/domain for resolving relative links (and relative links are exactly what the distribution-description pages contain), and "--header "Cookie: uid=***; pass=***; countrys=ua"", which sets a header on the GET request (this time I felt like passing the cookies this way rather than via a file :))
- return to the start of the loop
And what do we get
As a result, we have this "simple" command:
for i in `curl -s http://kinozal.tv/rss.xml | grep -iA 2 'mp3' | grep -ioe 'http.*[0-9]'`; do curl -sb "uid=***; pass=***; countrys=ua" $i | grep -m 1 -ioe 'download.*\.torrent' | wget -nc -qi - -B "http://kinozal.tv/" -P ~/.config/watch_dir --header "Cookie: uid=***; pass=***; countrys=ua"; done
And for complete happiness, this command should be added to cron:
*/15 * * * * > /dev/null 2>&1
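The crontab entry above is missing the command itself between the schedule and the redirection. A hedged sketch, assuming the one-liner is saved as an executable script at a hypothetical path such as ~/bin/kinozal-watch.sh:

```
# m   h  dom mon dow  command — every 15 minutes, output discarded
*/15  *   *   *   *   $HOME/bin/kinozal-watch.sh > /dev/null 2>&1
```

Cron runs jobs with a minimal environment, so using a full path to the script avoids PATH surprises.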
And with that, allow me to take my leave :)
UPD. In the comments to my previous post in this series, several interesting suggestions were made for reducing server load:
habrahabr.ru/blogs/p2p/87042/#comment_2609116 (check whether the file already exists)
habrahabr.ru/blogs/p2p/87042/#comment_2609714 (use Last-Modified and ETag)
UPD2. On apatrushev's advice, I replaced "head -1" with grep's "-m 1" option.