Hello.
I recently became a full member of Habr, and I want to thank the person who gave me an invite for this article.
Not long ago I started learning Linux (Ubuntu 8.10 in particular), and I needed to download files automatically from a list. "wget -i" is certainly a good thing, but I wanted more, namely:
- Downloading a list of links from a file
- Downloading multiple files at once
- Moving failed downloads to a separate list for later retries.
So I needed a slightly more advanced tool for downloading files than wget offers out of the box, and I decided to implement it in bash. True, my lack of experience writing bash scripts could have gotten in the way, but the weekend had just arrived, and the hours spent reading up on the topic were not in vain.
The result of my work was the following script:
Update: following zencd's advice, I used the wait command to wait for the downloads to complete.
Update 2: shulc pointed out an error: I replaced #!/binbash with #!/bin/sh. darkk suggested mktemp for creating temporary files.
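The two fixes mentioned in the updates can be illustrated in isolation; this is a standalone sketch, not part of the script below:

```shell
#!/bin/sh
# Standalone sketch of the wait and mktemp idioms from the updates.

# Start two background jobs (stand-ins for download threads)
# and collect their PIDs, as the script does with $!.
(sleep 1; echo "job 1 done") &
pids="$!"
(sleep 1; echo "job 2 done") &
pids="${pids} $!"

# wait blocks until the listed PIDs finish, so the script does not
# exit while downloads are still running in the background.
wait $pids

# mktemp creates a unique temporary file, avoiding collisions when
# several threads rewrite the shared list files at the same time.
tmp_file=`mktemp -t downloader.XXXXXX`
echo "temporary file: $tmp_file"
rm -f "$tmp_file"
echo "all jobs finished"
```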
#!/bin/sh
log_dir="${PWD}/log"
list_dir="${PWD}/list"
output_dir=${PWD}
# download_list - list of links to download
download_list="${list_dir}/download.lst"
# active_list - list of active downloads
active_list="${list_dir}/active.lst"
# done_list - list of completed downloads
done_list="${list_dir}/done.lst"
# error_list - list of failed downloads
error_list="${list_dir}/error.lst"
# $timeout - delay in seconds before retrying a failed download
timeout=5
# Moves a line between list files.
# Usage: move_line line source_file dest_file
move_line()
{
tmp_file=`mktemp -t downloader.XXXXXX`
echo "$1" >> "$3"
grep -v "$1" "$2" > "$tmp_file"
mv "$tmp_file" "$2"
}
# Background download loop; $1 is the thread number
download_thread()
{
thread=$1
# Work while download.lst or error.lst is not empty
while [ -s $download_list ] || [ -s $error_list ]
do
# If download.lst is empty, take a link from error.lst for a retry
if [ ! -s $download_list ]
then
read url < $error_list
move_line $url $error_list $download_list
sleep $timeout
fi
read url < $download_list
move_line $url $download_list $active_list
echo "[Thread ${thread}]Starting download: $url"
# Download the file
wget -c -o "${log_dir}/wget_thread${thread}.log" -O "${output_dir}/$(basename "$url")" "$url"
# Check wget's exit code (0 means success)
if [ $? -eq 0 ]
then
# Success: move the link to done.lst
move_line $url $active_list $done_list
echo "[Thread ${thread}]Download successful: $url"
else
# Failure: move the link to error.lst
move_line $url $active_list $error_list
echo "[Thread ${thread}]Error download: $url"
fi
done
return 0
}
# Stops the script and returns links from active.lst to download.lst
stop_script()
{
# Kill previously launched copies of this script
kill -9 `ps ax | grep $0 | grep -v "grep" | awk '{print $1}' | grep -v $$`
# Return links from active.lst back to download.lst and stop their wget processes
while [ -s $active_list ]
do
read url < $active_list
move_line $url $active_list $download_list
kill -9 `ps ax | grep $url | grep -v "grep" | awk '{print $1}' `
done
}
case "$1" in
"stop" )
echo "Stopping downloader..."
stop_script
echo "Done..."
;;
"start" )
# Check that the download list exists
if [ ! -e $download_list ];
then
echo "[Error] There is no ${list_dir}/download.lst file"
exit
fi
echo "Starting downloader..."
# Stop previously launched copies and downloads
stop_script
# Number of threads is given by $2; default is 1
if [ -z "$2" ]
then
threads=1
else
threads=$2
fi
# Start the download threads
i=1
while [ $i -le $threads ]
do
download_thread $i &
downloader_pid="${downloader_pid} $!"
sleep 1
i=`expr $i + 1`
done
if [ ! -e $error_list ]; then touch $error_list; fi
# Wait for all downloads to complete
wait $downloader_pid
echo "All completed"
;;
* )
echo "Usage:"
echo "\t$0 start [number of threads]"
echo "\t$0 stop"
;;
esac
exit 0
For the script to work, it is necessary to make it executable and create a file "./list/download.lst" with a list of links to download.
Run:
sh downloader start [number of threads]
or, as Mezomish correctly noted:
./downloader start [number of threads]
The parameter after 'start' is optional; if you do not specify it, "1" is used. That is, `sh downloader start 2` launches the script downloading 2 files simultaneously.
Stop:
sh downloader stop
or
./downloader stop
Pressing Ctrl+C does not stop the downloads, because they run in the background, so to stop downloading you need to run the command above.
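Stopping can be sketched in isolation. Below is a minimal example (not part of the script) of killing a background job by PID; it is what stop_script does via ps/grep/awk when the PIDs are not known in advance:

```shell
#!/bin/sh
# A background job outlives the foreground flow of the script,
# so it has to be stopped explicitly.
sleep 60 &
bg_pid=$!

# Kill the job directly by PID; the article's stop_script instead
# discovers PIDs with ps/grep/awk because it does not store them.
kill -9 "$bg_pid"

# wait reaps the killed job so it does not linger as a zombie.
wait "$bg_pid" 2>/dev/null
echo "background job stopped"
```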
I decided not to clutter up the script, though in principle it would not be hard to add list management commands (show - display the lists, add - add a download, wipe - clean them). As it stands, the script works, albeit with minimal functionality.
Since this is my first bash script, any comments, suggestions, or recommendations are very welcome.
Below I briefly describe the principles of the script, so that those who wish can more easily modify it to fit their needs.
The constants are:
log_dir - folder with wget logs (default "./log")
list_dir - folder with the download_list, active_list, done_list, error_list files (default "./list")
output_dir - folder where downloaded files are saved (default ".")
download_list - list of links to download
active_list - list of active downloads
done_list - list of completed downloads
error_list - list of failed downloads
timeout - delay before retrying a failed download
When started, the script first stops any previously running copies of itself and returns any downloads from active_list (if there are any) back to download_list. This handles the case where the script is re-run before a previously launched process has finished downloading. Then the required number of background downloads is created in a loop. Each such background thread is implemented by the download_thread() function. Its job is to download files from the list until download_list and error_list are both empty; by checking these files, the main part of the script can tell whether the work is finished. Before wget runs, the link is moved from download_list to active_list. After wget completes, the link is moved either to done_list (if the return code was '0') or to error_list (if it was not).
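The lifecycle just described - a link moving from download_list to active_list and then to done_list or error_list - can be traced with a toy run of the same move_line helper; the directory and URLs below are illustrative only:

```shell
#!/bin/sh
# The same helper as in the script: append a line to dest_file,
# then remove it from source_file via a temporary file.
move_line()
{
tmp_file=`mktemp -t downloader.XXXXXX`
echo "$1" >> "$3"
grep -v "$1" "$2" > "$tmp_file"
mv "$tmp_file" "$2"
}

# Toy lists in a temporary directory.
dir=`mktemp -d`
echo "http://example.com/a.iso" >  "$dir/download.lst"
echo "http://example.com/b.iso" >> "$dir/download.lst"
: > "$dir/active.lst"
: > "$dir/done.lst"

# Pick up a link, mark it active, then mark it done -
# the same transitions download_thread performs around wget.
read url < "$dir/download.lst"
move_line "$url" "$dir/download.lst" "$dir/active.lst"
move_line "$url" "$dir/active.lst" "$dir/done.lst"

cat "$dir/done.lst"       # the first link ends up in done.lst
cat "$dir/download.lst"   # the second link is still queued
rm -rf "$dir"
```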
After everything has been downloaded (download_list and error_list are empty), the script completes its work.
That's all. Anyone even a little familiar with scripting can add the functions they need.