Almost a year ago, an article about downloading your favorite stash of almost-legal saved-up information from Rapidshare.com met with public approval on Habr. Lately Rapidshare has removed its captcha and shortened the wait between downloads; in general, its whole appearance says that it is a pleasure to work with. And if even more can be squeezed out of it for free... why not?!
In light of recent events, it is recommended to approach the process of obtaining information and sharing it with other network users very sensibly. The author of this post is not going to bear any responsibility for your violation, dear Habr readers, of license agreements, copyright, and so on.
A small lyrical digression to start. An interest in automating certain processes (for example, automatically saving the audio pronunciation files pulled from translate.google.com while learning new words, or getting a direct link out of a video hosting site so the video can be watched later with mplayer on a Nokia N810 tablet) forced me to look for ways to talk to web servers from the console, preferably without any user intervention, i.e. full automation. Perhaps the most popular tool for this is wget, but it is most often used for nothing more than trivial direct download links. This article will try to fix that. Besides wget there are also slightly less known programs, such as curl. The latter, for example, is missing from the default Linux installation from Canonical Ltd., a company widely known in narrow circles.
It is with curl that we will start. Just a few words to show what it is capable of.
Examples of download automation can be found all over the English-speaking Internet. Let's use them as a basis to describe the process itself.
Let's start with the first one:
curl -s
This option simply silences curl so that it doesn't print anything extra. It is really useful when you don't want to wade through a pile of status information. The same option in its long form:
curl --silent
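For example, a quiet fetch of a page into a file might look like this (example.com here is just a placeholder):
# a minimal sketch: fetch a page without curl's progress meter and status messages
curl -s http://example.com/ -o page.html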
You can send a POST request to the HTTP server using:
curl -d DATA_TO_SEND
curl --data DATA_TO_SEND
POST is what the browser uses to send form values from an HTML page when the user clicks the submit button. With this parameter we can now tell the server which button we pressed, what we typed into a field on the page, and so on.
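For instance, if a page contains a form with a text field named login and a submit button named action, pressing that button could be reproduced roughly like this (the URL and the field names are made up for illustration):
# a sketch: send the same form fields the browser would submit (URL and names are placeholders)
curl -s -d "login=habrauser&action=Submit" "http://example.com/form-handler"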
Right away, as an example, here is a way to use curl to get a direct link from Rapidshare:
#!/bin/bash
while read urlline; do
pageurl=$(curl -s "$urlline" | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar')
fileurl=$(curl -s -d "dl.start=Free" "$pageurl" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1)
sleep 60
wget "$fileurl"
done < URLS.txt
A few words about this bash script. URLs are read line by line from the URLS.txt file; the pageurl variable gets the link to the page with the premium/free user choice, and the direct link to the file ends up in the fileurl variable. We obtain it by telling the server that we want everything in life for free, filtering the response with grep, and, since there may be several matches, keeping only the first line with head. Then we wait 60 seconds and download the file. That's the whole script.
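If we save this, say, as rapid-curl.sh (the name is arbitrary), usage could look roughly like this:
# hypothetical usage: one Rapidshare page link per line in URLS.txt
echo "http://rapidshare.com/files/000000/example.rar" > URLS.txt
bash rapid-curl.sh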
And now the fun part. Let's try to do all of this with wget. Bring on the script:
#!/bin/bash
################################################
#Purpose: Automate the downloading of files from rapidshare using the free account
#using simple unix tools.
#Date: 14-7-2008
#Authors: Slith, Tune, Itay
#Improvements, Feedback, comments: Please go to emkay.unpointless.com/Blog/?p=63
#Notes: To use curl instead of wget use 'curl -s' and 'curl -s -d'
#Version: 1.?
################################################
###
echo "test"
in=input.txt
timer()
{
TIME=${1:-960}
/bin/echo -ne "${2:-""}\033[s"
for i in `seq $TIME -1 1`; do
/bin/echo -ne "\033[u $(printf "%02d" `expr $i / 60`)m$(printf "%02d" `expr $i % 60`)s ${3:-""}"
sleep 1
done
/bin/echo -ne "\033[u 00m00s"
echo
}
while [ `wc -l $in | cut -d " " -f 1` != 0 ]; do
read line < $in
URL=$(wget -q -O - $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*');
output=$(wget -q -O - --post-data "dl.start=Free" "$URL");
# is the server busy?
serverbusy=$(echo "$output" | egrep "Currently a lot of users are downloading files. Please try again in.*minutes" | grep -o "[0-9]\{1,3\}")
if [ "$serverbusy" != "" ]; then
timer `expr $serverbusy '*' 60` "The server is busy. Waiting: " "before the next attempt..."
continue; # try again
fi
# free download limit reached (a long wait)
longtime=$(echo "$output" | egrep "Or try again in about.*minutes" | egrep -o "[0-9]*")
if [ "$longtime" != "" ]; then
timer `expr '(' $longtime + 1 ')' '*' 60` "Download limit reached. Waiting: " "before requesting the link again..."
URL=$(wget -q -O - $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*');
output=$(wget -q -O - --post-data "dl.start=Free" "$URL");
fi
# the normal countdown before the download starts (in seconds)
time=$(echo "$output" | grep "var c=[0-9]*;" | grep -o "[0-9]\{1,3\}");
time=$(echo "$time" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//') # trim ws
if [ "$time" = "" ]; then
echo " \"`basename "$line"`\" ".
echo $line >> fail.txt
sed -i '1 d' $in; # input
continue
fi
ourfile=$(echo "$output" | grep "document.dlf.action=" | grep checked | grep -o "http://[^\\]*");
timer $time "" " `basename "$ourfile"`";
if ! wget -c $ourfile; then
echo 'Download failed. The link stays at the top of the list and will be retried.'
else
sed -i '1 d' $in; # remove the processed line from the input file
fi
done
if [ -e fail.txt ]; then
mv fail.txt $in # put the failed links back into the input file
fi
A very nice script. I found a couple of interesting techniques in it for myself, so I am leaving it as is, with the authors credited; I only did a full translation of the comments.
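One of those techniques: the timer function redraws its countdown in place by saving and restoring the cursor position with the ANSI escape sequences \033[s and \033[u. A stripped-down sketch of the same idea:
# a minimal sketch of the in-place countdown trick used by timer()
countdown() {
/bin/echo -ne "Waiting \033[s"   # print the prefix and save the cursor position
for i in `seq ${1:-10} -1 1`; do
/bin/echo -ne "\033[u${i}s "     # jump back to the saved position and overwrite
sleep 1
done
echo
}
countdown 5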
We put the list of links into input.txt and run the script: it will tell us what it is doing. If a file could not be downloaded, its link goes into fail.txt. When all of input.txt has been processed, fail.txt is written back to the input file, and the links that were downloaded successfully are removed.
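If the wget version is saved, say, as rapid-wget.sh (again, the name is arbitrary), the workflow could look like this:
# hypothetical usage: fill input.txt with links, one per line, then run the script
echo "http://rapidshare.com/files/000000/example1.rar" > input.txt
echo "http://rapidshare.com/files/000000/example2.rar" >> input.txt
bash rapid-wget.sh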
Good luck downloading your backups.
P.S.:
- The messages in the script have been translated;
- I noticed a bug: if a link is added to the file without a trailing newline, the script refuses to read that line. Workaround: add an empty line at the end of the file.
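One way to automate that workaround (a sketch, assuming bash and a tail that understands -c):
# append a newline to input.txt only if its last byte is not already a newline
[ -n "$(tail -c1 input.txt)" ] && echo >> input.txt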
- For especially lazy copy-pasters, I am posting the script separately. Yesterday (27 Oct 09) it was unavailable because the django framework was being brought up on the server. At the new link the file will be available for a long time.