It is built on find and the no less great mv, all served in a bash sauce.
It could have been written in python without bothering with the “subtleties” of bash, but I didn’t have such a choice. Besides, bash is more portable between environments: there is just one executable file, with no libraries, virtual environments, or the related problems of launching on a hodgepodge of platforms. Anyone who disagrees may throw ... something at me, as long as it hits :). So I had to tinker, including adapting the script for old CentOS 5 environments. In return, the dependencies turned out to be minimal: for new systems (such as Ubuntu 16.04) they are find, mv and wc; for old ones, date and touch as well. The rest is all built into bash.
The find process writes the found files to its own log.
If APPROVED_SUFFIX is specified in the config (in order to avoid moving files from a list that has not been reviewed), add this suffix to the log names. By default, APPROVED_SUFFIX is empty, i.e. no renaming is required.
./findNclean
./findNclean help -c
The available options can be found in the output of ./findNclean help -c
# 100501
<Rule - >
    result_list = /tmp/.csv
    search_path = /
    type = file
    move_path = /.Trash
    move_log = /var/log/findNclean - Highway to hell (remix).csv
</Rule>
2017-03-13T02:04:02 10853 Start action: find
2017-03-13T02:04:02 10853 Config: ./findNclean.conf.example
2017-03-13T02:04:02 10853 Debug mode enabled
2017-03-13T02:04:02 10853 Config rules count: 2
2017-03-13T02:04:02 10853 Relative paths allowed. Current dir: /home/user/path/to/findNclean
2017-03-13T02:04:02 10853 Effective config content:
<Global>
    DEBUG = true
    REL_PATH_ALLOW = true
    APPROVED_SUFFIX = .approved
</Global>
<Rule1 find/move old unused *.tmp files in /tmp, /var and current user home dir>
    result_list = ./ancient-tmp.find-example.csv
    search_path = /var/
    search_path = /tmp/
    search_path = ~/
    type = f
    name = .*\.tmp
    accessed < 2017
    move_path = ./
    move_log = ./ancient-tmp.move-example.csv
    move_log_err = ./ancient-tmp.move-example.err.csv
</Rule1>
<Rule2 The same as above but for small non-zero logs>
    result_list = ./ancient-logs.find-example.csv
    search_path = ~/
    search_path = /var/
    search_path = /tmp/
    type = f
    name = .*\.log(\.gz)?
    size > 0
    size <= 2M
    accessed < 2017
    move_path = ./
    move_log = ./ancient-logs.move-example.csv
    move_log_err = ./ancient-logs.move-example.err.csv
</Rule2>
2017-03-13T02:04:02 10853 All non-zero size existing result_list files will be preliminary backuped:
2017-03-13T02:04:02 10853 The following commands will be evaluated (you can copy-paste them to bash terminal as is):
find -P -O3 -- /var/ /tmp/ /home/user/ -regextype posix-extended \( \! -name \*$'\n'\* -type f -regex .\*/.\*\\.tmp \! -newerat 2017-01-01T00:00:00 -fprintf ./ancient-tmp.find-example.csv %y\;%s\;%TY-%Tm-%TdT%TH:%TM:%TS\;%AY-%Am-%AdT%AH:%AM:%AS\;%u\;%g\;%p\\n \) , \( \! -name \*$'\n'\* -type f -regex .\*/.\*\\.log\(\\.gz\)\? -size +0c \( -size -2097152c -o -size 2097152c \) \! -newerat 2017-01-01T00:00:00 -fprintf ./ancient-logs.find-example.csv %y\;%s\;%TY-%Tm-%TdT%TH:%TM:%TS\;%AY-%Am-%AdT%AH:%AM:%AS\;%u\;%g\;%p\\n \)
2017-03-13T02:04:02 10853 Wait for all background processes completion...
2017-03-13T02:04:06 10853 Completed in 4 sec
2017-03-13T02:04:06 10853 Found items count (filenames with newline was excluded from the search):
0 ./ancient-tmp.find-example.csv
7 ./ancient-logs.find-example.csv
7 total
2017-03-13T02:04:06 10853 WARNING: Some 'find' commands returned non-zero exit code, results could be incomplete or incorrect!
2017-03-13T02:04:06 10853 Exit codes list: 1
2017-03-13T02:04:06 10853 Finished action: find
grep, by the way, is not entirely correct in this context: several child processes write to the same terminal, and sometimes one process starts or continues writing before another has finished its line, so some useful information may disappear and some useless information may become visible.
vim ./ancient-logs.find-example.csv
mv ./ancient-logs.find-example.csv{,.approved}
mv ./ancient-tmp.find-example.csv{,.approved}
2017-03-13T02:05:32 11141 Start action: mv
2017-03-13T02:05:32 11141 Config: ./findNclean.conf.example
2017-03-13T02:05:32 11141 Debug mode enabled
2017-03-13T02:05:32 11141 Config rules count: 2
2017-03-13T02:05:32 11141 Relative paths allowed. Current dir: /home/user/path/to/findNclean
2017-03-13T02:05:32 11141 Effective config content:
<Global>
    DEBUG = true
    REL_PATH_ALLOW = true
    APPROVED_SUFFIX = .approved
</Global>
<Rule1 find/move old unused *.tmp files in /tmp, /var and current user home dir>
    result_list = ./ancient-tmp.find-example.csv
    search_path = /var/
    search_path = /tmp/
    search_path = ~/
    type = f
    name = .*\.tmp
    accessed < 2017
    move_path = ./
    move_log = ./ancient-tmp.move-example.csv
    move_log_err = ./ancient-tmp.move-example.err.csv
</Rule1>
<Rule2 The same as above but for small non-zero logs>
    result_list = ./ancient-logs.find-example.csv
    search_path = ~/
    search_path = /var/
    search_path = /tmp/
    type = f
    name = .*\.log(\.gz)?
    size > 0
    size <= 2M
    accessed < 2017
    move_path = ./
    move_log = ./ancient-logs.move-example.csv
    move_log_err = ./ancient-logs.move-example.err.csv
</Rule2>
2017-03-13T02:05:32 11141 All non-zero size existing move_log and move_log_err files will be preliminary backuped:
2017-03-13T02:05:32 11141 Start processing the following move lists in background:
0 ./ancient-tmp.find-example.csv.approved
2 ./ancient-logs.find-example.csv.approved
2 total
2017-03-13T02:05:32 11141 Wait for all background processes completion...
mv: cannot move '/var/log/openvpn/00004-test.log' to './00004-test.log': Permission denied
mv: cannot move '/var/log/openvpn/00009-tmp.log' to './00009-tmp.log': Permission denied
2017-03-13T02:05:32 11141 Completed in 0 sec
2017-03-13T02:05:32 11141 Moved items count (successful canceled failed move_log [move_log_err]):
0 0 0 ./ancient-tmp.move-example.csv ./ancient-tmp.move-example.err.csv
0 0 2 ./ancient-logs.move-example.csv ./ancient-logs.move-example.err.csv
0 0 2 total
2017-03-13T02:05:32 11141 Finished action: mv
removed '/dev/shm/findNclean.exch26957.tmp'
removed '/dev/shm/findNclean.exch24396.tmp'
2017-03-13T02:06:48 11394 Start action: mv
2017-03-13T02:06:48 11394 Config: ./findNclean.conf.example
2017-03-13T02:06:48 11394 Debug mode enabled
2017-03-13T02:06:48 11394 Config rules count: 2
2017-03-13T02:06:48 11394 Relative paths allowed. Current dir: /home/user/path/to/findNclean
2017-03-13T02:06:48 11394 Effective config content:
<Global>
    DEBUG = true
    REL_PATH_ALLOW = true
    APPROVED_SUFFIX = .approved
</Global>
<Rule1 find/move old unused *.tmp files in /tmp, /var and current user home dir>
    result_list = ./ancient-tmp.find-example.csv
    search_path = /var/
    search_path = /tmp/
    search_path = ~/
    type = f
    name = .*\.tmp
    accessed < 2017
    move_path = ./
    move_log = ./ancient-tmp.move-example.csv
    move_log_err = ./ancient-tmp.move-example.err.csv
</Rule1>
<Rule2 The same as above but for small non-zero logs>
    result_list = ./ancient-logs.find-example.csv
    search_path = ~/
    search_path = /var/
    search_path = /tmp/
    type = f
    name = .*\.log(\.gz)?
    size > 0
    size <= 2M
    accessed < 2017
    move_path = ./
    move_log = ./ancient-logs.move-example.csv
    move_log_err = ./ancient-logs.move-example.err.csv
</Rule2>
2017-03-13T02:06:48 11394 All non-zero size existing move_log and move_log_err files will be preliminary backuped:
2017-03-13T02:06:48 11394 Start processing the following move lists in background:
0 ./ancient-tmp.find-example.csv.approved
2 ./ancient-logs.find-example.csv.approved
2 total
2017-03-13T02:06:48 11394 Wait for all background processes completion...
2017-03-13T02:06:48 11394 Completed in 0 sec
2017-03-13T02:06:48 11394 Moved items count (successful canceled failed move_log [move_log_err]):
0 0 0 ./ancient-tmp.move-example.csv ./ancient-tmp.move-example.err.csv
2 0 0 ./ancient-logs.move-example.csv ./ancient-logs.move-example.err.csv
2 0 0 total
2017-03-13T02:06:48 11394 Finished action: mv
removed '/dev/shm/findNclean.exch24681.tmp'
removed '/dev/shm/findNclean.exch11002.tmp'
If you set the NOTIFICATION_SCRIPT variable to the path of a script, parameters will be passed to that script so that it can do something with them, for example send alerts.
IGNORE_FILES_WITH_NEWLINES = false
read -r -d '' VARIABLE <<'EOF'
bla-bla line
EOF
This is better than a construct you come across from time to time:
VARIABLE=$(cat <<'EOF'
bla-bla line
EOF
)
because it uses a builtin, without creating a subshell and calling an external program.
unset DESCRIPTION USAGE CONFIG_USAGE TRICKS # No need help variables yet
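The here-document trick above can be tried in isolation. The sketch below is only an illustration: the variable name HELP_TEXT and the text are made up. One detail worth knowing is that read returns a non-zero status here, because the empty delimiter (a NUL byte) is never found before the end of input, so the `|| true` matters if the script runs under `set -e`.

```shell
#!/bin/bash
# HELP_TEXT is a hypothetical variable name used only for this illustration.
# read hits end of input before it sees the NUL delimiter, so it returns
# non-zero; "|| true" keeps a script running under "set -e".
read -r -d '' HELP_TEXT <<'EOF' || true
Usage: demo [options]
  $HOME is not expanded here because the EOF marker is quoted.
EOF

# The trailing newline is stripped by read; inner newlines are preserved.
printf '%s\n' "${HELP_TEXT}"
```

No external cat and no command substitution are involved, which is the point the text makes.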
Checking the bash version looks something like this:
printf -v BV '%d%03d%03d' "${BASH_VERSINFO[0]}" "${BASH_VERSINFO[1]}" "${BASH_VERSINFO[2]}" # Bash version in numbers like 4003046, where 4 is major version, 003 is minor, 046 is subminor.
((BV < 3002025)) && echo "WARNING: bash version ${BASH_VERSION} is too old (below 3.2.25)! This app was not tested with too ancient versions!" >&2
printf in this case is better than BV=$(bla-bla), because once again a builtin is executed without launching a subshell. In addition, it lets you format the source data.
The version is converted to an int and then compared as a decimal number; if you have something neater lying around in your bins, share it!
((BV < 3002025)) is also better than [[ "${BV}" -lt 3002025 ]], since it is faster, shorter and more readable.
Then there are printf %q and, accordingly, eval: they are a kind of price to pay for supporting almost all characters, although not every place requires these constructs; some of them were used out of inertia.
RULES_JOIN_ALLOW=1 # Allow optimize performance by join Rules with identical search_paths collection
if ((RULES_JOIN_ALLOW)); then
    : # Do something here
fi
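To make the printf %q and eval pairing more concrete, here is a minimal sketch of a round trip through a quoted string. The file name is invented for the example and deliberately contains a space, a single quote and a newline; this is not code from findNclean itself.

```shell
#!/bin/bash
# Hypothetical file name with a space, a single quote and a newline.
name=$'odd name\'s\nfile.tmp'

# %q produces a shell-quoted form that eval can safely re-read.
printf -v quoted '%q' "${name}"

# Build a command as text, as a script might before running it later.
cmd="restored=${quoted}"
eval "${cmd}"

[[ "${restored}" == "${name}" ]] && echo "round trip OK"
```

Without %q, the same eval would split the value on the space and choke on the quote, which is why the two constructs travel together.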
As for characters such as $'\r' and $'\n', it is better to pass them to the echo builtin directly instead of the often recommended escape sequences with echo -e, whose handling differs between shells and echo implementations. In quoted expressions, however, this is somewhat inconvenient: you often have to close and reopen the quotes. Fortunately, the same line break can be assigned to a variable, and then no closing and reopening is needed:
readonly EOL=$'\n' # Newline
...
approve|confirm)
    quit 255 "This action is not implemented yet.${EOL}You have to manually overview and approve list of files by appending${EOL}corresponding APPROVED_SUFFIX to the result_list value.${EOL}By default APPROVED_SUFFIX is none, so all result_list will be treated as approved."
With tab indentation, a here-document can be used instead:
	read -r -d '' tmp <<-'EOF'
	This action is not implemented yet.
	You have to manually overview and approve list of files by appending
	corresponding APPROVED_SUFFIX to the result_list value.
	By default APPROVED_SUFFIX is none, so all result_list will be treated as approved.
	EOF
	quit 255 "${tmp}"
	unset tmp
With space indentation you can also get by:
    tmp='This action is not implemented yet.'$'\n'
    tmp+='You have to manually overview and approve list of files by appending'$'\n'
    tmp+='corresponding APPROVED_SUFFIX to the result_list value.'$'\n'
    tmp+='By default APPROVED_SUFFIX is none, so all result_list will be treated as approved.'
    quit 255 "${tmp}"
    unset tmp
But I like my solution more, and it doesn’t matter whether tabs or spaces are used, although the line becomes indecently long.
Timestamps are obtained with date, although newer versions of bash can use printf %(datefmt)T, but without support for fractions of a second:
readonly TIMESTAMP_FORMAT='%Y-%m-%dT%H:%M:%S'
...
if ((BV > 4002000)); then # Modern bash versions
    log() { ## Fast (builtin) but sec is min sample for most implementations
        printf "%(${TIMESTAMP_FORMAT})T %5d %s\n" '-1' $$ "$*" # %b convert escapes, %s print as is
    }
else # Legacy bash versions
    log() { ## Slow (subshell, date) but support nanoseconds
        echo "$(exec -c date +"${TIMESTAMP_FORMAT}") $$ $*"
    }
fi
$(exec -c some command) is used to run an external utility in a subshell. This gets rid of the extra intermediate process between the current shell and the utility, and it is simply faster (by 0-2% depending on the situation, but every little bit helps). This is what a fragment of the process tree looks like for $(some command):
├── /bin/bash              # current shell
    └── /bin/bash          # intermediate subshell
        └── some command
With exec there is no extra layer, and with the -c option the command is run with an empty environment, which also speeds things up slightly.
if ((BV > 4002000)); then # Modern bash versions
    ## Set global variable with the name $1 and time format $2
    set_timestamp() {
        printf -v "$1" "%($2)T" '-1'
    }
else # Legacy bash versions
    set_timestamp() {
        printf -v "$1" '%s' "$(exec -c date +"$2")"
    }
fi
...
set_timestamp ts '%s'
# ...
set_timestamp TS '%s'
log "Completed in $((TS-ts)) sec"
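The effect of the -c option is easy to observe with env, which simply prints the environment it was started with. The sketch below is only an illustration; DEMO_VAR is a made-up variable.

```shell
#!/bin/bash
export DEMO_VAR=visible      # hypothetical variable for the demonstration

with_env=$(exec env)         # env replaces the subshell, environment inherited
without_env=$(exec -c env)   # same, but env starts with an empty environment

[[ "${with_env}" == *DEMO_VAR=visible* ]] && echo "plain exec: DEMO_VAR is inherited"
[[ -z "${without_env}" ]] && echo "exec -c: environment is empty"
```

The flip side is that the utility also loses variables such as PATH, LANG and TZ, so the trick suits commands whose behavior does not depend on them.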
Alternatively, the SECONDS variable could be used. It equals 0 when the shell starts and increases by one every second. If you reset it to zero and read it later, you get the execution time, in seconds, of the code between those two points. To avoid losing the running time of the script since its very start, you can use intermediate variables instead:
ts=${SECONDS}
# ...
TS=${SECONDS}
log "Completed in $((TS-ts)) sec"
This, I think, is even preferable to the set_timestamp function.
while IFS= read -r line || [[ -n ${line} ]]; do
    ...
done </path/to/some/text/file
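A small sketch of this read loop in action, assuming a throwaway temporary file (created here with mktemp) whose first line has leading spaces and a literal backslash, and whose last line has no trailing newline:

```shell
#!/bin/bash
tmpfile=$(mktemp)
# %s arguments are written as is: the \t below stays a backslash and a "t".
printf '%s\n%s' '  indented \t line' 'last line, no newline' >"${tmpfile}"

lines=()
while IFS= read -r line || [[ -n ${line} ]]; do
    lines+=("${line}")
done <"${tmpfile}"
rm -f -- "${tmpfile}"

printf '%d lines read\n' "${#lines[@]}"
printf '[%s]\n' "${lines[@]}"
```

Dropping `IFS=` would strip the leading spaces, dropping `-r` would eat the backslash, and dropping the `|| [[ -n ${line} ]]` part would lose the last line.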
Clearing IFS for read is necessary so that leading and trailing tabs or spaces are not stripped; -r, so that the line is read as is, with backslashes (as in \t or \r) kept literally instead of being treated as escape characters; || [[ -n ${line} ]], so that the body of the loop also runs when the last line of the file does not end with a line break: without it, read returns a non-zero code and the loop ends immediately.
The creation time (the born option): even if the file system (for example ext4) supports it and diligently records it every time a new file is created, stat and, accordingly, find cannot read it:
user@host:~$ stat /
  File: '/'
  Size: 4096    Blocks: 8    IO Block: 4096    directory
Device: fc00h/64512d    Inode: 2    Links: 25
Access: (0755/drwxr-xr-x)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2017-03-16 15:28:06.121675004 +0700
Modify: 2017-02-27 17:09:09.937545795 +0700
Change: 2017-02-27 17:09:09.937545795 +0700
 Birth: -
but it is there!
user@host:~$ sudo debugfs -R "stat <$(stat -c %i /)>" /dev/ROOTFSDRIVE
Inode: 2 Type: directory Mode: 0755 Flags: 0x80000
Generation: 0 Version: 0x00000000:00000086
User: 0 Group: 0 Size: 4096
File ACL: 0 Directory ACL: 0
Links: 25 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x58b3fac5:df87410c -- Mon Feb 27 17:09:09 2017
atime: 0x58ca4c96:1d0273f0 -- Thu Mar 16 15:28:06 2017
mtime: 0x58b3fac5:df87410c -- Mon Feb 27 17:09:09 2017
crtime: 0x5801149b:00000000 -- Sat Oct 15 00:23:39 2016
Size of extra inode fields: 32
EXTENTS:
(0):9249
Such is the flaw.
mv is used in a slightly unusual way (for an interactive terminal):
# Rename to the exact name dest_name (-T never treats it as a directory), keeping numbered backups
mv --backup=t -vT -- "${name}" "${dest_name}"
# Move into the directory dest (-t); everything after '--' is taken as a file name, keeping numbered backups
mv --backup=t -vt "${dest}" -- "${name}"
# The same, but without backups and with forced overwrite (-f):
# an existing file with the same name in dest is lost!
mv --backup=off -vft "${dest}" -- "${name}"
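What --backup=t does on a name collision can be checked in a scratch directory. This sketch assumes GNU mv (coreutils); the directory and file names are invented for the example.

```shell
#!/bin/bash
# Assumes GNU mv; workdir, trash and report.log are hypothetical names.
workdir=$(mktemp -d)
echo 'new' >"${workdir}/report.log"
mkdir "${workdir}/trash"
echo 'old' >"${workdir}/trash/report.log"

# -t DIR moves into DIR; --backup=t keeps the existing file as report.log.~1~
mv --backup=t -t "${workdir}/trash" -- "${workdir}/report.log"

moved=$(<"${workdir}/trash/report.log")
backup=$(<"${workdir}/trash/report.log.~1~")
echo "moved: ${moved}, backup: ${backup}"
rm -rf -- "${workdir}"
```

The previous occupant of the destination survives under a numbered suffix, which is what makes rolling back possible.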
--backup[=CONTROL] is a useful thing: in case of a malfunction or an unforeseen situation, the chance of rolling everything back greatly increases.
-- is also useful in scripts (this applies not only to mv but to other utilities as well), because to some extent it removes the restriction on the characters that variable values may contain (for example, a value that starts with a hyphen). Here it is not strictly necessary, because every variable passed to mv starts with either / or ./, but it is better to use this feature anyway.
To check that the required utilities are present, it is better to use the builtin hash instead of which:
for util in ${DEPENDENCIES}; do
    hash "${util}" &>/dev/null || quit 1 "ERROR: '${util}' not found on this system"
done; unset util
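The same check can be wrapped into a standalone function for experimenting. In this sketch check_deps is a hypothetical helper, not part of findNclean, and the second utility name is deliberately nonexistent.

```shell
#!/bin/bash
# check_deps is a hypothetical helper: it reports every missing utility
# on stderr and returns 1 if at least one was not found.
check_deps() {
    local util missing=0
    for util in "$@"; do
        if ! hash "${util}" &>/dev/null; then
            echo "'${util}' not found" >&2
            missing=1
        fi
    done
    return "${missing}"
}

check_deps ls && echo "ls is available"
check_deps no-such-utility-323828 2>/dev/null || echo "missing utility detected"
```

Unlike which, hash is a builtin, so no extra process is started and the result does not depend on which implementation of which happens to be installed.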
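Options are parsed with the getopts builtin command:
OPTIND=2 # Ignore first argument ACTION even if it has leading hyphen
while getopts ':dvc:' OPT; do
    [[ "${OPTARG:0:1}" = '-' ]] && quit 1 "ERROR: Option argument cannot start with hyphen, got: ${OPTARG}"
    case "${OPT}" in
        d|v) DEBUG=1; v=v; ;;
        c) CONFIG=${OPTARG} ;;
        :) quit 1 "ERROR: Option -${OPTARG} requires an argument"; ;;
        *) quit 1 "ERROR: Unrecognized option: -${OPTARG}"
    esac
done
True, its scope is somewhat limited by the fact that it does not support long options (starting with --).
${var#word}            # remove the shortest matching prefix
${var##word}           # remove the longest matching prefix
${var%word}            # remove the shortest matching suffix
${var%%word}           # remove the longest matching suffix
${var/pattern/string}  # replace the first match of pattern with string
${var^pattern}         # uppercase the first character if it matches pattern
${var^^pattern}        # uppercase every character that matches pattern
${var,pattern}         # lowercase the first character if it matches pattern
${var,,pattern}        # lowercase every character that matches pattern
There is even an undocumented possibility:
${var~}                # toggle the case of the first character
${var~~}               # toggle the case of all characters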
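A few of the documented expansions applied to sample values; the values are invented for the illustration, and the case-modification forms need bash 4 or newer. The undocumented toggles are left out of the sketch on purpose.

```shell
#!/bin/bash
path='/var/log/app.tar.gz'   # hypothetical sample value
word='bash'
shout='BASH'

no_lead=${path#*/}           # shortest prefix removed: var/log/app.tar.gz
base=${path##*/}             # longest prefix removed:  app.tar.gz
no_ext=${path%.*}            # shortest suffix removed: /var/log/app.tar
stem=${path%%.*}             # longest suffix removed:  /var/log/app
swapped=${path/log/tmp}      # first match replaced:    /var/tmp/app.tar.gz
cap=${word^}                 # first character upper:   Bash
upper=${word^^}              # all characters upper:    BASH
uncap=${shout,}              # first character lower:   bASH
lower=${shout,,}             # all characters lower:    bash

printf '%s\n' "${no_lead}" "${base}" "${no_ext}" "${stem}" "${swapped}" \
    "${cap}" "${upper}" "${uncap}" "${lower}"
```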
There are many places in the code where you can meet them. Here is one of the most entertaining, for an array of commands to be executed in parallel:
parallel_run "${Commands[@]/#/exec -c }" # Prepending of 'exec -c ' need to avoid additional subshell creation
It inserts 'exec -c ' at the beginning of each element of the array.
$((${MOVE_OK_COUNT[@]/#/+}))
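Both array tricks can be tried on toy data. In the sketch below the arrays counts and commands are hypothetical stand-ins for MOVE_OK_COUNT and Commands; nothing is executed, the strings are only transformed.

```shell
#!/bin/bash
counts=(3 0 4)                       # hypothetical per-rule counters
commands=('date +%s' 'wc -l file')   # hypothetical command strings

# An empty pattern anchored with # matches the start of every element,
# so the replacement text is prepended to each of them.
prefixed=("${commands[@]/#/exec -c }")
printf '%s\n' "${prefixed[@]}"

# The expansion yields "+3 +0 +4", which arithmetic evaluates as a sum.
total=$((${counts[@]/#/+}))
echo "total=${total}"
```

The leading unary plus is harmless in arithmetic, which is why no special handling of the first element is needed.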
bash still has a lot of pitfalls, and you can dive after them until you drown, even if you have mastered man bash and even random passersby regularly take off their hats to you. As man bash itself puts it:
BUGS
It's too big and too slow.
Therefore, if possible, choose the #!/bin/bash
#!/bin/bash
## Find files by shebang
shebang='#!/bin/bash'$'\n'
search_path=./
while IFS= read -rd $'\0' filename; do
    IFS= read -rd '' -N ${#shebang} firstbytes <"${filename}"
    [[ "${firstbytes}" == "${shebang}" ]] && echo "${filename}"
done < <(exec -c find "${search_path}" -type f -size +${#shebang}c -readable -print0)
Do you already understand the purpose of some of the “excesses” in this jumble of letters ;)?
Source: https://habr.com/ru/post/323828/