Hello.
I recently became a full member of Habr, and I want to thank the person who gave me an invite for this article.
Not long ago I started learning Linux (Ubuntu 8.10 in particular), and I needed to download files automatically from a list. "wget -i" is certainly a good thing, but I wanted more, namely:
- Downloading a list of links from a file
- Downloading multiple files at once
- Moving failed downloads to a separate list for later retries.
So I needed a slightly more advanced tool for downloading files than wget offers out of the box, and I decided to implement it in bash. True, my lack of experience writing bash scripts could have gotten in the way, but the weekend had just arrived, and the hours spent reading up on the topic were not in vain.
The result of my work was the following script:
Update: following zencd's advice, I used the wait command to wait for the downloads to complete.
Update 2: shulc pointed out an error: I replaced #!/binbash with #!/bin/sh. darkk suggested mktemp for creating temporary files.
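The two fixes mentioned in the updates can be illustrated in isolation; this is a standalone sketch, not part of the script below:

```shell
#!/bin/sh
# Standalone sketch of the wait and mktemp idioms from the updates.

# Start two background jobs (stand-ins for download threads)
# and collect their PIDs, as the script does with $!.
(sleep 1; echo "job 1 done") &
pids="$!"
(sleep 1; echo "job 2 done") &
pids="${pids} $!"

# wait blocks until the listed PIDs finish, so the script does not
# exit while downloads are still running in the background.
wait $pids

# mktemp creates a unique temporary file, avoiding collisions when
# several threads rewrite the shared list files at the same time.
tmp_file=`mktemp -t downloader.XXXXXX`
echo "temporary file: $tmp_file"
rm -f "$tmp_file"
echo "all jobs finished"
```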
#!/bin/sh
log_dir="${PWD}/log"
list_dir="${PWD}/list"
output_dir=${PWD}
# download_list - list of links to download
download_list="${list_dir}/download.lst"
# active_list - list of active downloads
active_list="${list_dir}/active.lst"
# done_list - list of completed downloads
done_list="${list_dir}/done.lst"
# error_list - list of failed downloads
error_list="${list_dir}/error.lst"
# $timeout - delay in seconds before retrying a failed download
timeout=5
# Moves a line between list files.
# Usage: move_line line source_file dest_file
move_line()
{
tmp_file=`mktemp -t downloader.XXXXXX`
echo "$1" >> "$3"
grep -v "$1" "$2" > "$tmp_file"
mv "$tmp_file" "$2"
}
# Background download loop; $1 is the thread number
download_thread()
{
thread=$1
# Work while download.lst or error.lst is not empty
while [ -s $download_list ] || [ -s $error_list ]
do
# If download.lst is empty, take a link from error.lst for a retry
if [ ! -s $download_list ]
then
read url < $error_list
move_line $url $error_list $download_list
sleep $timeout
fi
read url < $download_list
move_line $url $download_list $active_list
echo "[Thread ${thread}]Starting download: $url"
# Download the file
wget -c -o "${log_dir}/wget_thread${thread}.log" -O "${output_dir}/$(basename "$url")" "$url"
# Check wget's exit code (0 means success)
if [ $? -eq 0 ]
then
# Success: move the link to done.lst
move_line $url $active_list $done_list
echo "[Thread ${thread}]Download successful: $url"
else
# Failure: move the link to error.lst
move_line $url $active_list $error_list
echo "[Thread ${thread}]Error download: $url"
fi
done
return 0
}
# Stops the script and returns links from active.lst to download.lst
stop_script()
{
# Kill previously launched copies of this script
kill -9 `ps ax | grep $0 | grep -v "grep" | awk '{print $1}' | grep -v $$`
# Return links from active.lst back to download.lst and stop their wget processes
while [ -s $active_list ]
do
read url < $active_list
move_line $url $active_list $download_list
kill -9 `ps ax | grep $url | grep -v "grep" | awk '{print $1}' `
done
}
case "$1" in
"stop" )
echo "Stopping downloader..."
stop_script
echo "Done..."
;;
"start" )
# Check that the download list exists
if [ ! -e $download_list ];
then
echo "[Error] There is no ${list_dir}/download.lst file"
exit
fi
echo "Starting downloader..."
# Stop previously launched copies and downloads
stop_script
# Number of threads is given by $2; default is 1
if [ -z "$2" ]
then
threads=1
else
threads=$2
fi
# Start the download threads
i=1
while [ $i -le $threads ]
do
download_thread $i &
downloader_pid="${downloader_pid} $!"
sleep 1
i=`expr $i + 1`
done
if [ ! -e $error_list ]; then touch $error_list; fi
# Wait for all downloads to complete
wait $downloader_pid
echo "All completed"
;;
* )
echo "Usage:"
echo "\t$0 start [number of threads]"
echo "\t$0 stop"
;;
esac
exit 0
For the script to work, it is necessary to make it executable and create a file "./list/download.lst" with a list of links to download.
Run:
sh downloader start [number of threads]
or, as Mezomish correctly noted:
./downloader start [number of threads]
The parameter after 'start' is optional; if you do not specify it, "1" is used. That is, `sh downloader start 2` launches the script downloading 2 files simultaneously.
Stop:
sh downloader stop
or
./downloader stop
Pressing Ctrl+C does not stop the downloads, because they run in the background, so to stop downloading you need to run the command above.
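Stopping can be sketched in isolation. Below is a minimal example (not part of the script) of killing a background job by PID; it is what stop_script does via ps/grep/awk when the PIDs are not known in advance:

```shell
#!/bin/sh
# A background job outlives the foreground flow of the script,
# so it has to be stopped explicitly.
sleep 60 &
bg_pid=$!

# Kill the job directly by PID; the article's stop_script instead
# discovers PIDs with ps/grep/awk because it does not store them.
kill -9 "$bg_pid"

# wait reaps the killed job so it does not linger as a zombie.
wait "$bg_pid" 2>/dev/null
echo "background job stopped"
```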
I decided not to clutter up the script, though in principle it would not be hard to add list management commands (show - display the lists, add - add a download, wipe - clean them). As it stands, the script works, albeit with minimal functionality.
Since this is my first bash script, any comments, suggestions, or recommendations are very welcome.
Below I briefly describe the principles of the script, so that those who wish can more easily modify it to fit their needs.
The constants are:
log_dir - folder with wget logs (default "./log")
list_dir - folder with the download_list, active_list, done_list, error_list files (default "./list")
output_dir - folder where downloaded files are saved (default ".")
download_list - list of links to download
active_list - list of active downloads
done_list - list of completed downloads
error_list - list of failed downloads
timeout - delay before retrying a failed download
When started, the script first stops any previously running copies of itself and returns any downloads from active_list (if there are any) back to download_list. This handles the case where the script is re-run before a previously launched process has finished downloading. Then the required number of background downloads is created in a loop. Each such background thread is implemented by the download_thread() function. Its job is to download files from the list until download_list and error_list are both empty; by checking these files, the main part of the script can tell whether the work is finished. Before wget runs, the link is moved from download_list to active_list. After wget completes, the link is moved either to done_list (if the return code was '0') or to error_list (if it was not).
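The lifecycle just described - a link moving from download_list to active_list and then to done_list or error_list - can be traced with a toy run of the same move_line helper; the directory and URLs below are illustrative only:

```shell
#!/bin/sh
# The same helper as in the script: append a line to dest_file,
# then remove it from source_file via a temporary file.
move_line()
{
tmp_file=`mktemp -t downloader.XXXXXX`
echo "$1" >> "$3"
grep -v "$1" "$2" > "$tmp_file"
mv "$tmp_file" "$2"
}

# Toy lists in a temporary directory.
dir=`mktemp -d`
echo "http://example.com/a.iso" >  "$dir/download.lst"
echo "http://example.com/b.iso" >> "$dir/download.lst"
: > "$dir/active.lst"
: > "$dir/done.lst"

# Pick up a link, mark it active, then mark it done -
# the same transitions download_thread performs around wget.
read url < "$dir/download.lst"
move_line "$url" "$dir/download.lst" "$dir/active.lst"
move_line "$url" "$dir/active.lst" "$dir/done.lst"

cat "$dir/done.lst"       # the first link ends up in done.lst
cat "$dir/download.lst"   # the second link is still queued
rm -rf "$dir"
```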
After everything has been downloaded (download_list and error_list are empty), the script completes its work.
That's all. Anyone even a little familiar with scripting can add the functions they need.