
An RSS parser in bash for Lostfilm: a gentle mode for the RSS server and a check for already-downloaded files

Good day!
Some time ago, thanks to this post, a good parser for Lostfilm was written. Now I would like to share my own refinement of the script: it handles Lostfilm's changed link format, checks whether the feed has been updated, and checks which torrents have already been downloaded.

First of all, to reduce the load on the RSS server, requests for the feed made with wget should carry the "If-Modified-Since:" header. Then, if the feed has not been updated, we do not download and process the entire feed again. This reduces the load on the server (and, a little, on our own computer).
The header is built like this:

# Take the Last-Modified date of the previous response for If-Modified-Since
lastmod="$( grep -m 1 Last-Modified response.out )"

header="If-Modified-Since:${lastmod#*:}"


The file response.out holds wget's log from the previous run, including the RSS server's last response.
The expansion ${lastmod#*:} takes the Last-Modified line found there and cuts off everything up to and including the first ":", leaving only the date.
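As a minimal sketch, assuming bash and a log in the format the script already parses, the header construction can be wrapped in a function that also copes with the first run. The name build_ims_header and the sample date are made up for illustration. The function returns failure when no Last-Modified line is found, so a caller could omit --header entirely instead of sending an empty If-Modified-Since.

```bash
#!/usr/bin/env bash
# Sketch only: build_ims_header is a hypothetical helper, not part of the
# original script. It reads a saved wget -S log and prints the header line.
build_ims_header() {
    local log_file=$1 lastmod
    # First Last-Modified line only; hide the error if the log does not exist yet.
    lastmod=$(grep -m 1 'Last-Modified' "$log_file" 2>/dev/null)
    # No date known (first run): print nothing and report failure.
    [ -n "$lastmod" ] || return 1
    lastmod=${lastmod#*:}       # drop "Last-Modified:" (everything up to the first colon)
    lastmod=${lastmod%$'\r'}    # drop a trailing carriage return, if present
    printf 'If-Modified-Since:%s\n' "$lastmod"
}

# Demo on a made-up log line (the date is an example value).
log=$(mktemp)
printf '%s\n' '  Last-Modified: Tue, 23 Aug 2011 08:15:00 GMT' > "$log"
build_ims_header "$log"    # If-Modified-Since: Tue, 23 Aug 2011 08:15:00 GMT
rm -f "$log"
```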
The header is ready; now let's create filters to select the series we need:

movies='House.MD|IT.Crowd|Persons.Unknown|Legend.of.the.Seeker|Leverage|Warehouse.13|Futurama'

quality='\.720p\.|\.HD\.|Persons.Unknown|Legend.of.the.Seeker|IT.Crowd'

movies holds the list of TV shows to download. Lostfilm usually publishes them in two quality versions, and I personally try to download everything in high quality, so we need an additional filter: quality. Not every series is released in 720p or HD, though, so we append those series to the end of this filter so that they also pass the quality check.
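To see how the two filters interact, here is a small sketch that runs made-up links (hypothetical example.com URLs shaped like id=N&name.torrent) through the same two grep stages, using grep -E in place of the deprecated egrep. The filter_links function name is also hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: how the two filters from the article combine.
# The sample links below are invented; real feed links may look different.
movies='House.MD|IT.Crowd|Persons.Unknown|Legend.of.the.Seeker|Leverage|Warehouse.13|Futurama'
quality='\.720p\.|\.HD\.|Persons.Unknown|Legend.of.the.Seeker|IT.Crowd'

filter_links() {
    # Keep wanted shows (case-insensitive), then keep only HD releases
    # or shows listed in $quality because they have no HD version.
    grep -iE "$movies" | grep -E "$quality"
}

printf '%s\n' \
    'http://example.com/dl?id=1&House.MD.S07E01.rus.LostFilm.TV.avi.torrent' \
    'http://example.com/dl?id=2&House.MD.S07E01.720p.rus.LostFilm.TV.mkv.torrent' \
    'http://example.com/dl?id=3&IT.Crowd.S04E01.rus.LostFilm.TV.avi.torrent' \
    'http://example.com/dl?id=4&Dexter.S05E01.720p.rus.LostFilm.TV.mkv.torrent' |
    filter_links
# prints the id=2 and id=3 links
```

Only the 720p House.MD link and the IT.Crowd link survive: the first House.MD link fails the quality filter, and Dexter is not in movies at all.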

With the filters done, we move on to processing the feed. The post mentioned at the beginning covers building the filter regular expressions and the wget options in great detail, so I will skip that here.
The command sends the prepared header, and wget writes its log, including the server's response, to the same response.out. By the way, if response.out is missing (first run or a forced refresh), nothing bad happens: the header simply carries no date, and the whole feed is downloaded.
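Whether the conditional request actually saved a download can be checked afterwards in response.out. This is a sketch, assuming wget -S writes the server's status lines indented, as in "  HTTP/1.1 304 Not Modified"; feed_status is a hypothetical helper and the log lines in the demo are invented.

```bash
#!/usr/bin/env bash
# Sketch only: feed_status is a hypothetical helper. It assumes status lines
# in the wget -S log look like "  HTTP/1.1 304 Not Modified".
feed_status() {
    local log_file=$1 line code
    # Take the last status line: after redirects the final answer comes last.
    line=$(grep -E '^[[:space:]]*HTTP/[0-9.]+ [0-9]{3}' "$log_file" 2>/dev/null | tail -n 1)
    [ -n "$line" ] || return 1
    read -r _ code _ <<< "$line"   # second field is the numeric status code
    printf '%s\n' "$code"
}

# Demo on made-up log lines.
log=$(mktemp)
printf '%s\n' '  HTTP/1.1 304 Not Modified' '  Date: Tue, 23 Aug 2011 10:00:00 GMT' > "$log"
if [ "$(feed_status "$log")" = 304 ]; then
    echo "feed unchanged: nothing to process"
fi
rm -f "$log"
```

A 304 response carries no body, so in the script's pipeline the grep stages simply receive no links; a check like this is only useful for logging or for skipping other work.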

wget -vS -O - --header="$header" http://www.lostfilm.tv/rssdd.xml -o response.out | grep -ioe "http.*torrent" | grep -iE "$movies" | grep -E "$quality" | while read -r link
do
# Replace "&amp;" with "&" (\& keeps the "&" literal in bash 5.2+)
link=${link//&amp;/\&}

For convenience, we save the torrent's name separately (cutting off everything up to the first "&"). We need it to check for already-downloaded files and to save the torrent file.
name="${link#*&}"

Now we use this name to check whether the file is already in the Downloaded directory. If it is not, we download it from the link and save it under this name. Then (optionally) we copy it to our torrent client's autoload directory.
# Download only if the file is not yet in ./Downloaded
if [ ! -e "Downloaded/$name" ]
then
wget -q --header "Cookie: uid=123456; pass=xxxxxxxxxxxx; usess=xxxxxxxxxxx" "$link" -O "Downloaded/$name"
cp "Downloaded/$name" "$path_to_your_autoload_dir/$name"
fi
done

The Lostfilm cookies can be copied from any browser, and the usess value can be found in your profile at www.lostfilm.tv.
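The per-link steps of the loop (unescaping, extracting the name, checking Downloaded) can be sketched as small functions that are easy to try without network access. The function names and the example.com link are hypothetical, and the "name after the first &" layout is assumed from the script's ${link#*&}.

```bash
#!/usr/bin/env bash
# Sketch: the per-link steps as small testable functions (hypothetical names).

# RSS stores "&" as "&amp;"; // replaces every occurrence, \& keeps "&" literal.
unescape_link() {
    local link=$1
    link=${link//&amp;/\&}
    printf '%s\n' "$link"
}

# Keep only what follows the first "&": assumed to be the torrent file name.
torrent_name() {
    local link=$1
    printf '%s\n' "${link#*&}"
}

# Succeed if the torrent has not been downloaded into directory $2 yet.
is_new() {
    local name=$1 dir=$2
    [ ! -e "$dir/$name" ]
}

link=$(unescape_link 'http://example.com/download.php?id=42&amp;House.MD.S07E23.720p.torrent')
name=$(torrent_name "$link")
echo "$link"   # http://example.com/download.php?id=42&House.MD.S07E23.720p.torrent
echo "$name"   # House.MD.S07E23.720p.torrent

dir=$(mktemp -d)
is_new "$name" "$dir" && echo "new: would download"
touch "$dir/$name"
is_new "$name" "$dir" || echo "already downloaded: skip"
rm -rf "$dir"
```

Writing \& rather than a bare & matters on newer shells: bash 5.2 enables the patsub_replacement option by default, and under it an unquoted & in the replacement stands for the matched text, which would leave "&amp;" unchanged.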

Here is the full version of my script:

#!/bin/bash
cd "$your/rssdownloader/dir" || exit 1

# Take the Last-Modified date of the previous response for If-Modified-Since
lastmod="$( grep -m 1 Last-Modified response.out )"
# Build the header
header="If-Modified-Since:${lastmod#*:}"

# Filters: series to download and the required quality
movies='House.MD|IT.Crowd|Persons.Unknown|Legend.of.the.Seeker|Leverage|Warehouse.13|Futurama'
quality='\.720p\.|\.HD\.|Persons.Unknown|Legend.of.the.Seeker|IT.Crowd'

wget -vS -O - --header="$header" http://www.lostfilm.tv/rssdd.xml -o response.out | grep -ioe "http.*torrent" | grep -iE "$movies" | grep -E "$quality" | while read -r link
do
    # Replace "&amp;" with "&" (\& keeps the "&" literal in bash 5.2+)
    link=${link//&amp;/\&}
    name="${link#*&}"
    # Download only if the file is not yet in ./Downloaded, then copy it to the autoload directory
    if [ ! -e "Downloaded/$name" ]
    then
        wget -q --header "Cookie: uid=123456; pass=xxxxxxxxxxxx; usess=xxxxxxxxxxx" "$link" -O "Downloaded/$name"
        cp "Downloaded/$name" "$path_to_your_autoload_dir/$name"
    fi
done

Thanks for your attention!
PS: I realize the code is far from ideal, and I will be glad to hear tips for improvement!

Update: I think that now, with all the amendments, this script can be called RTM. Thanks to everyone who helped polish it, especially GreyCat for his detailed explanations!

Update 2: If there are problems downloading the feed, the wget log can contain the line we need several times. Therefore we take only the first match, using grep's "-m 1" option (at worst, the old feed gets downloaded once more).
lastmod="$( grep -m 1 Last-Modified response.out )"
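A short sketch of what Update 2 guards against, on two invented log lines: without -m 1, lastmod would hold both lines, and ${lastmod#*:} would strip only the first "Last-Modified:" label, leaving a multi-line header value.

```bash
#!/usr/bin/env bash
# Sketch: what -m 1 changes when Last-Modified appears twice in response.out.
# The two log lines are made up for the demo.
log=$(mktemp)
printf '%s\n' '  Last-Modified: Mon, 22 Aug 2011 09:00:00 GMT' \
              '  Last-Modified: Mon, 22 Aug 2011 09:00:00 GMT' > "$log"

all=$(grep Last-Modified "$log")          # both matching lines
first=$(grep -m 1 Last-Modified "$log")   # grep stops after the first match

bad="If-Modified-Since:${all#*:}"     # still contains a newline and a second label
good="If-Modified-Since:${first#*:}"  # a single, well-formed header line

printf 'matches without -m 1: %s\n' "$(printf '%s\n' "$all" | grep -c .)"
printf 'header with -m 1: %s\n' "$good"
[[ $bad == *$'\n'* ]] && echo "header without -m 1 spans several lines"
rm -f "$log"
```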

Source: https://habr.com/ru/post/127588/

