# A bash script integrating file and search functions
## Superseded: version 0.9 is here:
https://hackmd.io/@kenwoo777/H1fXjcuk3
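## Note: the last version, 0.7r-2, will not be revised again, but this bug in it still needs fixing: comment out the first line below and replace it with the second, third and fourth lines. The severely slow 'ct' function will then work properly!
(Running 'ct', I tested 5000 files without problems; it took 8 minutes. I also confirmed that 23000 files is the upper bound: any higher and bash crashes. Please test the actual limit on your own system. If there are too many files, split them into several batches and run the script several times.)
Replace the following lines yourself: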
```sh=
#if [[ "$b_pathfile" = "${a_map}.*" ]]; then # should be [[ $b_pathfile =~ ${a_map}.* ]],
# however that is still wrong, since special characters in the path/filename cannot be escaped.
tmpx=$( expr match "${b_pathfile}" "\(${a_map}\)" )
if [ $? -eq 0 ]; then
```
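As a possible alternative to the `expr match` lines above, a plain prefix test can also be written with bash's `[[ … ]]` pattern matching, where a quoted variable is taken literally and a trailing unquoted `*` matches the rest. This is only a sketch and not the author's fix: `has_prefix` and the sample paths are hypothetical, and it has not been tried inside the script.

```sh
#!/bin/bash
# Hypothetical helper (not part of the original script):
# succeeds when $1 begins with the literal string $2.
has_prefix() {
    # The quoted "$2" is matched character by character;
    # the unquoted * matches whatever follows it.
    [[ "$1" == "$2"* ]]
}

# Sample values are assumptions chosen to contain regex/glob special characters.
b_pathfile='/data/a[1].zip/docs/readme.txt'
a_map='/data/a[1].zip'

if has_prefix "$b_pathfile" "$a_map"; then
    echo "same archive"
else
    echo "different archive"
fi
```

Because nothing in `$a_map` is interpreted as a regex here, brackets, dots and similar characters in a path need no escaping.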
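### Status: the features are now fully integrated, and testing has not turned up any problems so far. There are some restrictions, though; see the "known limitation" section in the script. The current version is 0.7r-2, the final release, unless bugs turn up ^^"
### Corrections, and ideas for better or additional features, higher efficiency, or greater fault tolerance, are all welcome. Thank you.
The features of version 0.7 are listed below. Version 0.7r-2 adds the 'ct' function, briefly described as follows:
First use the 'cs' function to generate a map file of, say, every file on a backup drive. Later, when certain files need to be retrieved, search only this map file, collect the resulting file paths in another file, and run the 'ct' function to copy all of those files out automatically.
This function performs poorly; suggestions for improving it are welcome.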
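The map-then-copy workflow can be sketched roughly as follows. The sample `native.txt` only imitates the 'cs' output format described in the script's comments; the paths and checksum values are placeholders, `wanted.txt` is a hypothetical file name, and the two script invocations are shown as comments because they depend on where the script is installed.

```sh
#!/bin/bash
# Sketch only: native.txt stands in for the output of a 'cs' run, e.g.
#   ./this_script ".*" cs > native.txt
workdir=$(mktemp -d)
cat > "$workdir/native.txt" <<'EOF'
==========>>>>>
[[[FoundHere]]]
da39a3ee5e6b4b0d3255bfef95601890afd80709
/mnt/backup/docs/report.pdf
<<<<<==========

==========>>>>>
[[[FoundHere]]]
da39a3ee5e6b4b0d3255bfef95601890afd80709
/mnt/backup/old.zip/src/main.c
<<<<<==========
EOF

# Keep only the path lines (they start with "/"), then pick the ones of interest.
grep '^/' "$workdir/native.txt" | grep -i '\.pdf$' > "$workdir/wanted.txt"
cat "$workdir/wanted.txt"

# The list would then be handed to 'ct'; the target must be an existing absolute directory:
#   ./this_script "$workdir/wanted.txt" ct /home/user/Desktop/target/
```

With the sample data above, `wanted.txt` ends up holding the single line `/mnt/backup/docs/report.pdf`.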
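## Features:
### 1. Searches the working directory and its subdirectories and lists every file whose name matches the grep-style pattern given in $1. Files inside archives, including archives nested within archives, are listed as well.
### 2. If a pattern is also given in $2, the contents of the matched files are searched too. $2 may include grep options, as in "tool -i -n -v -A5", but the options must trail the pattern like this.
### 3. If $3 is also given, the files that satisfy $2 are copied to the absolute path given in $3; files with duplicate names are renamed automatically.
### 4. If $2 is the keyword "cp", files are copied directly. What if you want to search for "cp" itself? A regex can still sidestep the keyword and achieve the same result (see limitation 11a in the script).
### 5. Copying a file also generates a path file. This is especially useful when a file is buried in an archive inside an archive inside… The path file records the file's path, for tracing it manually or by other programs.
### 6. With $2 = 'cq', only the path file is written; the source file is not copied.
### 7. Version 0.7 fixes two bugs: the logic of the 'cs' argument was incorrect, and changes made in roughly the previous one or two versions caused one of the options to skip copying the target file when it should have copied it.
### 8. Version 0.7 adds two options: 'cr' embeds the target file's checksum in the path file, and 'cs' includes the target file's checksum in standard output. If 'cs' is too slow, comment out the line that computes the checksum.
### 9. Version 0.7 no longer depends on path files: whenever $2 is non-empty, the complete, correctly indexed path of each target file is written to standard output, marked by the keyword [[[FoundHere]]]. The four existing functions cp/cq/cr/cs are still kept.
### 10. File names are no longer restricted; see the limitation notes.
### 11. The script is meant to have its standard output reused, so nothing important, such as errors, should be missing from it, and everything can be collected again with grep. Keywords and examples (see the examples in the script) were added deliberately to make this easier.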
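Feature 2 relies on the script passing `$2` to grep unquoted, so that the shell splits it into a pattern followed by options. A minimal sketch of that mechanism, assuming GNU grep (which accepts options after the pattern); the variable `arg2`, the sample file and the `set -f` guard are illustrative additions, not taken from the script:

```sh
#!/bin/bash
# Sketch: how one "$2"-style string becomes a pattern plus grep options.
workdir=$(mktemp -d)
printf 'int Main(void)\nreturn 0;\n' > "$workdir/sample.c"

arg2='main[[:blank:]]*( -i -n'   # hypothetical value: pattern first, options trailing

# Left unquoted on purpose, mirroring the script's [grep $2 -- "$ctx"]:
# the shell splits it into the words  main[[:blank:]]*(   -i   -n
set -f    # added here so the pattern is not glob-expanded against file names
result=$(grep $arg2 -- "$workdir/sample.c")
set +f
echo "$result"
```

Since the split happens at whitespace, a literal space inside the pattern would presumably be split as well; a bracket class such as `[[:blank:]]`, as used in the script's example 5, looks like the safer spelling.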
## The 0.7r-2 script follows
```sh=
#!/bin/bash
# version 0.7r-2. 20230226. ken woo. copyleft.
# from 0.2:
# 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p'].
# 2. found the $IFS bug but not yet resolved.
# from 0.2a:
# 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"].
# from 0.2b:
# 1. added time elapsed.
# 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match.
# 3. added to generate a path file besides of a target file, hence this target file could be easily found manually.
# from 0.3: only add some comments.
# from 0.3r:
# 1. fixed [sed -n '/Date/,/\lfile/p'] even it affects nothing(false negative).
# 2a. changed the sed command into correct one since it sometimes failed match, [tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )].
# 2b. it only affects the path in "path file" which would lose some intermediate path-string. the correct one uses bash internal cmd.
# 3. added "$2"=="cq" for only copying path files without target files. (since this way is simpler than outputing via std-out).
# from 0.4:
# 1. fixed [ while [ -e "${3}${k}" ] ] not concerning about path file.
# 2. added "$2"=="cr" for adding sha1sum checksum file *.cksm along with path file. to separate is easy access, e.g., "cat *.cksm".
# from 0.5:
# 1. in [ tmp_1=$( sha1sum "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) ],
# mind changed. use path file also to have checksum when 'cr' is specified. and for less disk-w and for the filename weakness.
# from 0.5r:
# 1. sha1sum -b might be faster.
# 2. might have fixed the limitation-12 issue by using -- and prevented from using sed',,'.
# 3. it is more comfortable to output path string rather than path file. added $2=='cs' for this purpose. keyword is [[[Path Here]]].
# from 0.6:
# 1. the prior versions have a spawned bug that having $2 search-pattern with $3 path but no copy events.
# 2. the 'cs' option of the previous version has wrong logic.
# 3. so, for fixing 1 & 2, it is better to have had made moderate changes of code for either keeping c[pqrs] options still effect or
# making output to be able to show the exact path by prefixing a [[[FoundHere]]] line for if $2 is just set. then which means,
# the original 'cs' function is no longer needed, thereby,
# 'cs' would do new behavior of adding a line checksum between [[[FoundHere]]]-line and path-line.
# so, these new functions are updated at the next, "purposes".
# 4. so, this version is 0.7 would be the final release.
# from 0.7:
# 1. added 'ct' option. please refer to example 15.
# 2. the arguments are exclusively changed for 'ct' option from #7 to #9.
# 3. really final release.
# 1. 0.7r-1 from 0.7r: forget to remove tmp folder $arcpath after done XD
# 1. 0.7r-2 from 0.7r-1: add quote [[ ! -e "$a_rep" ]]. note that after by my test, 'ct' performance is worse.
# purposes:
#
# only tested on Ubuntu.
#
# 1a. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and
# including which packed within archive files even within archive of archive of archive of...
# 1b. all results are via standard output by any one of these functions.
# 1c. by this one of functions, the "path" is not the exact location if the file is within archive;
# for demanding the exact path, using the others following functions all of which could do.
#
# 2a. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and
# including options, e.g., $2==".*void\ \+main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must).
# 2b. if found, matches are output in the following format:
# the-matched-line(s)
# ==========>>>>>
# [[[FoundHere]]]
# (this line is the file checksum only exists by using the 'cs' function; see 5c.)
# the-exact-path/filename
# <<<<<==========
#
# 3. the same as 2., and copy the matched files with an accompanying path file to the assigned absolute-directory by "$3".
#
# 4a. the same as 3., but directly copies without specifying context pattern in $2, instead, by setting $2 as 'cp' for this purpose.
# note, you can try ".*" in $2 in comparison with "cp".
# 4b. another is 'cq' for copying path file only.
# 4c. yet another 'cr', 'cs'(see 5.).
#
# 5a. another path file(*.path) is unconditionally coming up with the copied target file.
# 5b. use $2=='cr' to not only 'cq' but also add checksum of the target file in the path file.
# 5c. use $2=='cs' to std-output file checksum; see 2b.-format; and no any copy event, only std-output.
#
# 6. note that generating the path file is the old-school function, kept for compt., except 1c., all done well via std-output.
#
# 7. please refer to example 15. it does not realize about deflating upon these specific target files in an archive, instead,
# it extracts whole archive. however, each concerning archive only extracted once since the list is sorted before processing.
# required:
# 1. apt install p7zip-full p7zip-rar
# 2. space in the /tmp/ dir, where files are extracted for temporary use and auto-deleted when done.
# known limitation:
# 1. the search pattern is in grep format.
# 2. search twice if the soft-link and its real-path are both in the searching-paths.
# 3. a multi-volume archive will cause 7z to exit, so existing ones need to be renamed before this script can run to completion.
# 4. it is used by "7z -px" to skip password protected archives.
# 5. it is used by "7z -aou" to auto rename repetitive extracted files.
# 6. token 'null' is used internally, so it cannot be used as a search pattern; neither can 'cp', 'cq', 'cr', 'cs', and 'ct' in $2.
# 7. file descriptor 3 is used for identification purpose, so do not use it before running this script.
# 8. except parameters $1, $2, $3 are for user, the other params are only for internal use:
# "$4" passing a specific archive file for extracting and searching.
# "$5" script path.
# "$6" script name.
# "$7" progressive path for inheriting.
# 9. $2, the content pattern, must have the options in trailing. that is like as "the_pattern -opt1 -opt2".
# 10. the target path for copying can not be in the searching path or it might be searched and cause loop.
# 11a. $2 has 6 reserved identifiers, 'null' and 'c[pqrst]', which can not be used as-is; each can still be searched as a pattern by e.g., "nul[l]\{1\}".
# 11b. note, weird: [grep nul[l]\{1\} -n --color a.txt] is failed and [this_script ".*a\.txt.*" "nul[l]\{1\} -n --color"] is success.
# 11c. $1 needs full qualified name, that is, if "a.txt" is exactly the file to search, it still needs to be like as ".*a\.txt.*"
# 11d. the 11c might be the reason of prefixed path.
# 12a. known fault: the filenames to inspect if containing 1) comma "," 2) leading hyphen "-" etc., either made sed fault or bash fault.
# 12b. this issue might be fixed beginning from ver.0.6.
# example1: [this_script ".*\.pdf$"]
# find out all the "pdf" files where locate including sub-dirs and within archive files.
#
# example2: detach the task from shell for free run.
# [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown]
#
# example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"]
# find out all the "*.c" files which have pattern '.*printf.*' in it. and
# also dump out these line-number and the before 3 lines and the after 5 lines.
#
# example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target]
# the same as example3 and copy matched files(with path files) to folder /home/user/Desktop/target/.
#
# example5: [this_script .*\.txt$ "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/]
#
# example6: [this_script .*\.txt$ cp /home/user/Desktop/temp/]
# all the found *.txt files with .path files will directly copy to /home/user/Desktop/temp/.
#
# example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required.
#
# example8: [cat outputfile | sed '/^\(-\|=\)\+>\+\|^<\+=\+\|^[\(Open \)\(ERROR\)\(WARNINGS\)\(\[\[\[Found\)]\|^Is not archive/d;/^$/d;s/\(\/.*\/\)\([^\/]\+$\)/\2/']
# extract only filenames from result output. note, 7z-error/-open-error files may just not supported, so need to treat by other ways.
#
# example9: [this_script ".*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|bz2\|lzh\|txt\|ex[e_]\{1\}\|bat\|msi\|sys\|dll\|ocx\|bin\|pdf\|djvu\|html\?\|mht\|chm\|css\|js\|ps\|iso\|nrg\|gho\|cab\|avi\|rm\(vb\)\?\|jpe\?g\|bmp\|ico\|gif\|mp[34]\{1\}\|mpe\?g\|png\|doc\|ini\|hlp\|inf\|ttf\|pptx\?\|pps\|xlsx\?\|reg\|dat\|db\|bak\|log\|asm\|inc\|c\(pp\|xx\)\?\|h\(pp\)\?\|lib\)$" &> output.txt]
#
# example10: [find . -iname "*.path" -exec cat {} >> output.txt \;] if huge amount of files cause [cat * >> output.txt] failed.
#
# example11: [./this_script ".*\.\(c\|cpp\)$" "(?s)(int|void)[[:space:]]*main[[:space:]]*\([^\)]*\)[[:space:]]*{.* -izaPo --color" | sed 's/\x0//' > Dump.txt]
# dig out all the main function definitions.
# note the [sed 's/\x0//'], not only 1 occurrence for using this script however, only 1 occurrence when using single grep, that is,
# [sed '$s/\x0//'] is enough.
#
# example12: using 'cs', if do not want to have checksum, comment out this sha1sum generating line in if-cs code block.
#
# example13: [sed '/^\(-\|=\)\+>\+\|^<\+=\+\|^\[\[\[Found\|^[0-9a-fA-F]\+$\|^\/.*/d;/^$/d;=' native.txt | sed 'N;s/\n/ /' > output.txt]
# suppose a native.txt is generated by 'cs' option, then we want to check out(extract) all of the errors. line number is prefixed for grouping-able.
#
# example14: [ grep "^\/" native.txt > output.txt]
# suppose a native.txt is generated by 'cs' option, extract all the path/filename is simple.
#
# example15: [ ./this_script full_paths_in_this.file "ct" target_path_for_copy_to ]
# since to get a file each time by a thorough search upon some location is funny. better method is to generate a map upon it.
# that is, the 'cs' option with $1="anything_interested" done so.
# suppose a native.txt is generated by 'cs' option; which is a text file storing all interested files of full-path/filenames.
# then we get some full-paths of present of interests out of this map and collect to the file "full_paths_in_this.file",
# each full-path in one-another line. then this "ct" option would extract/copy these files to the "target_path_for_copy_to".
SERF_VER="0.7r-2"
function MYGUBED() {
return;
echo -e "\n<debug>"
for var in "$@"; do
[[ $var =~ ^-+ ]] && echo ${var#-}
done
for myvars in "$@"; do
[[ $myvars =~ ^-+ ]] && continue
echo "<debug> \$$myvars: ${!myvars}"
done
echo
}
function mycp1() { # $1 is the target path; std-input is the src file path
while read -r in; do
if [ -e "$in" ]; then
i=0;
j=$( basename -- "$in" )
k=$j
while [ -e "${1}${k}" ] || [ -e "${1}${k}.path" ]; do
(( i=$i+1 ));
k="$j($i)"
done
cp -- "$in" "${1}${k}"
echo "$in" > "${1}${k}.path"
echo " "; # pass forward. it is in order to keep the source ordered
else
echo "$in"; # pass forward. it must be invalid or in archive
fi
done
}
function mycp2() { # $1 is the target path; $2 is formal path; std-input is the src file path
while read -r in; do
if [ -e "$in" ]; then
i=0;
j=$( basename -- "$in" )
k=$j
while [ -e "${1}${k}" ] || [ -e "${1}${k}.path" ]; do
(( i=$i+1 ));
k="$j($i)"
done
cp -- "$in" "${1}${k}"
echo "$2" > "${1}${k}.path"
return 0;
else
return 1;
fi
done
}
# target files which could direct copy are copied beforehand and other than files in the $lines which need more treatments.
# the $lines is a sorted lines in number, e.g., [7,9,17,5,2,], each line contains at least 1 intermediate archive path/name,
# while processing, these real archive files locate at the course of real path(call it formal path) while is in root task,
# or at the course beginning from later generated tmp dir while is in sub tasks;
# $a_rep for handling these 2 conditions to locate either real files.
# $a_map is a part of the formal path corresponding to $a_rep in order to one-shot replacement.
# parameters are using pass-by-var-name for global vars could be used both into this function and for return. bash 4.3 later.
# at each round, $lines fans out the line(s) having a first common archive or is standalone. so it has shrunk after the call;
# whose are moved to $a_num(having common archive). $a_rep is "" when is in root task for handling formal path;
# when in a child task it is an extracted archive root path via tmp path, e.g., /tmp/DataFile.zip.XXXXXXXX/,
# and it corresponds to the $a_map which is a part of formal path from the formal-paths source-file which is $1.
# and $a_map finally becomes the first-intermediate archive path for return. $a_num is the fan out line(s) for sub-task call.
# note the 3 vars $3/$4/$5 should be passed by var-names.
# $1 is the collect of formal paths file, $2==$a_rep the extracted archive tmp path, $3==$a_map, $4==$a_num, $5==$lines.
# after called, $3==part of formal path advanced to the next intermediate archive, $5==the $lines subtracted by $4==$a_num.
# keep in mind at this entrance moment, files are ready for looking up.
function myFanOut() {
a_rep=$2
local -n a_map=$3
local -n a_num=$4
local -n lines=$5
a_num=$( sed -n 's/\(^[0-9]\+\),.*/\1/p' <<< $lines ) # get the leading number "i"
a_pathfile=$( sed "${a_num}q;d" "$1" ) # exactly the i-th line in formal path file
lines=${lines#?*,} # discard it from $lines
a_pathfile="${a_rep}${a_pathfile#$a_map}" # the real existing location of this file
# after this loop, $a_rep would be empty or an intermediate archive
while true; do
tgt_1="$a_rep"
a_rep=$( expr match "$a_pathfile" "\(${a_rep}/[^/]\+\).*" ) # level by level cd into.
if [ "$tgt_1" == "$a_rep" ]; then # tail; true should copied. false wrong formal.
if [ -e "$a_rep" ]; then
echo "found but error; nothing done(1 logic error): $a_num $a_pathfile" >&2
else echo "file not found(2 formal path error): $a_num $a_pathfile" >&2
fi
a_rep=""
break
fi # so why thus not see
if [[ ! -e "$a_rep" ]]; then # where to stop
if [ -f "$tgt_1" ]; then # where we want
a_rep="$tgt_1"
break;
fi
a_rep="" # ! possibly 2 cases non-/existing directory,
echo "file not found(3): $a_num $a_pathfile" >&2 # ! which might caused by extraction error,
break # ! and 3rd case is wrong formal path.
fi
done # so why thus not see
[[ "${a_rep}" == "" ]] && return 1;
a_map="${a_map}${a_rep#$2}" # cast back to advanced incremented formal path.
a_num="${a_num}," # into correct format
while [[ $lines != "" ]]; do
b_num=$( sed -n 's/\(^[0-9]\+\),.*/\1/p' <<< $lines ) # collect all the same intermediate formal paths.
b_pathfile=$( sed "${b_num}q;d" "$1" )
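# BUG (0.7r-2): the quoted right-hand side is compared literally, so this test almost never succeeds and
# lines sharing an archive are not grouped; see the replacement lines at the top of this note.
if [[ "$b_pathfile" = "${a_map}.*" ]]; then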
a_num="${a_num}${b_num},"
lines=${lines#?*,}
else break
fi
done
return 0;
}
lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root
if [ $? -eq 0 ]; then # if not the root task
if [ $# -ge 7 ]; then # if intends for fork-task
inipath=$( pwd )
arcname="$( basename -- "$4" ).XXXXXXXX"
arcpath=$( mktemp -t -d -- "$arcname" )
SCRIPT_DIR="$5"
SCRIPT_NAME="$6"
INI_DIR="$7"
TMP_DIR="$arcpath"
# AAAAA for handling the 'ct' option
if [ "$2" = ct ]; then
7z -px -aou -o"$arcpath" x -- "$4" 1>/dev/null # no masking error since what had handled was on the map which succeeded.
d_num=""
all_num=$9
while [[ $all_num != "" ]]; do
c_num=$( sed -n 's/\(^[0-9]\+\),.*/\1/p' <<< $all_num )
c_pathfile=$( sed "${c_num}q;d" "$1" )
c_formalpath="$8${4#$INI_DIR}"
za=$4; MYGUBED "-==========================" "-$( echo ${arcpath}${c_pathfile#$c_formalpath} )" \
c_num c_pathfile c_formalpath za INI_DIR all_num
d_pathfile="${arcpath}${c_pathfile#$c_formalpath}"
all_num=${all_num#?*,}
mycp2 "${3}" "${c_pathfile}" -- <<< "$d_pathfile" || d_num="${d_num}${c_num},"
done
while [[ $d_num != "" ]]; do
arg_3=$c_formalpath
arg_4=""
myFanOut "$1" "$arcpath" arg_3 arg_4 d_num
if [ $? -eq 0 ]; then
MYGUBED "-------------------------" "-$( echo $arcpath${arg_3#$c_formalpath} )" arcpath arg_3 arg_4
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$arcpath${arg_3#$c_formalpath}" "$SCRIPT_DIR" "$SCRIPT_NAME" \
"$arcpath" "$c_formalpath" "$arg_4"
fi
done
rm -rf "$arcpath"
exit 0
fi
# VVVVV for handling the 'ct' option
7z -px -aou -o"$arcpath" x -- "$4" 2>/dev/null 1>/dev/null
cd -- "$arcpath"
else # something wrong
exit 1
fi
else # the root task, generates fd3
# trap prevents from aborted by user and $IFS not yet recovered
IFS_OLD="$IFS"
function for_trap_exit() {
echo -e '\n\nuser abort\n';
[[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up;
# be careful not to clean-up /tmp/ here or get into catastrophe since I met.
exec 3>&-;
exit 1;
}
trap for_trap_exit SIGINT SIGKILL SIGSEGV
# wrong arguments
if [ $# -lt 1 ] || [ $# -gt 3 ]; then
echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]";
exit 1
fi
# no 7z bin file
7z | grep -i "copyright"
if [ $? -ne 0 ]; then
echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first.";
exit 1
fi
# set global vars
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" )
TMP_DIR="presently working directory"
INI_DIR=""
inipath=$( pwd )
arcpath=$( pwd )
# rearrange arguments
z1="$1"
if [ $# -eq 1 ]; then
z2='null'
z3='null'
elif [ $# -eq 2 ]; then
if [ "$2" = 'null' ]; then
echo '"null" not applicable.'
exit 1
elif [[ "$2" = c[pqrt] ]]; then
echo "the absolute path is needed."
exit 1
fi
z2="$2"
z3='null'
else
if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3="$3"
echo "$z3" | grep -q "^/.*"
if [ $? -eq 0 ] && [ -d "$z3" ]; then
# try to padding "/"
echo "$z3" | grep -q ".*/$"
[[ $? -ne 0 ]] && z3="${z3}/"
else
echo; echo "the absolute path is needed."; echo;
exit 1
fi
fi
set --
set "$z1" "$z2" "$z3"
exec 3>&1
echo -e "\nscript version: $SERF_VER\n"
echo "cmd: $SCRIPT_DIR/$SCRIPT_NAME '$1' '$2' '$3'"
echo "pwd: $inipath"
echo; echo $( date ); echo;
elapse_time_b=$SECONDS
# AAAAA for handling the 'ct' option
if [ "$2" = ct ]; then
# sorting into sorted lines by e.g., the result of [7,9,17,5,2,].
# then use [sed -n 's/\(^[0-9]\+\),.*/\1/p' <<< $var] to get a line; [sed "${i-th}q;d"] to map a line;
# $var=${var#?*,} to del a line; inc_p=$( expr match "$full_p" "\(${inc_p}/[^/]\+\).*" ) to get the incremental path;
# especially note that by my test, 23000 path lines is about to the upper bound could be passed as the argument $9.
sorted_lines=$( cat "$1" | mycp1 "${3}" -- | sed '=;s/\(.*\/\)[^\/]\+$/\1/' | sed 'N;s/\n/ /' |\
sed '/^[0-9]\+[[:blank:]]\+$/d' | sort -k2 | sed 's/\(^[0-9]\+\).*/\1,/' | sed ':a;N;$!ba;s/\n//g' );
while [[ $sorted_lines != "" ]]; do
arg_3=""
arg_4=""
arg_5=$arg_3
tmparc=""
myFanOut "$1" "" arg_3 arg_4 sorted_lines
if [ $? -eq 0 ]; then
# next step is to recursively handle intermediate archive either the one next to the other(while-loop) or
# the one advanced to the other(recursive) that is what called fanout.
# as for the following call, arg_3 for part-formal-path & arg_4($9) for same-archive candidates are not enough,
# arg_5 is as the $8 to be the old-arg_3 inevitably needed too; so does a tmp-dir $7; to be more specific,
# the below $arg_3 should be the ${root/tmparc}${arg_3#$arg_5}; $7==$tmparc.
# so, commence from this point, the tasks grow number of vars from 7 to 9.
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$arg_3" "$SCRIPT_DIR" "$SCRIPT_NAME" "$tmparc" "$arg_5" "$arg_4"
fi
done
exec 3<&- # done the search and release fd3
echo; echo $( date ); echo;
(( time_elapsed=$SECONDS-$elapse_time_b ));
echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n";
exit 0;
fi
# VVVVV for handling the 'ct' option
fi
if [ "$2" != $'null' ]; then # needs for context search or direct-copy
find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx;
do
if [ "$2" = 'cs' ]; then # only standard output the checksum and the path
# generate the path of the copied file
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name"
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${ctx##"$TMP_DIR"}
echo -e "==========>>>>>\n[[[FoundHere]]]";
echo -e "$( sha1sum -b -- "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1/' )";
echo -e "${INI_DIR}${tmp_str}\n<<<<<==========\n"; # wrap by considering if too long path.
# note, by the grep, the $2 not quoted for options injection.
elif [ "$2" = 'cp' ] || [ "$2" = 'cq' ] || [ "$2" = 'cr' ] || ! ! grep $2 -- "$ctx"; then
tmp_str=${ctx##"$TMP_DIR"}
echo -e "==========>>>>>\n[[[FoundHere]]]";
echo -e "${INI_DIR}${tmp_str}\n<<<<<==========\n";
if [ "$3" != $'null' ]; then
i=0;
j=$( basename -- "$ctx" )
k=$j
while [ -e "${3}${k}" ] || [ -e "${3}${k}.path" ]; do
(( i=$i+1 ));
k="$j($i)"
done
# 1) unconditionally path file; 2) match && nonempty $3 or 'cp' will copy target file;
# 3) 'cr' for adding checksum in path file.
tmp_1=""
if [ "$2" = 'cr' ]; then
tmp_1=$( sha1sum -b -- "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' )
elif [ "$2" != 'cq' ]; then
cp -- "$ctx" "${3}${k}" # copy the matched file to the target folder
fi
echo "${tmp_1}${INI_DIR}${tmp_str}" > "${3}${k}.path"
fi
else echo "=====>>>>> $ctx";
fi
done
else # only for file name search
find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all existing files whose names match $1
fi
t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 )
IFS_OLD="$IFS"
IFS=$'\n'
for tt in $t; do
echo "----->>>>> $tt"
# the following line is incorrect but is ok since it through to the end(where meant to be) and "\lfile" meant to be not found.
# the new replacement command however weird, when use 1 instead of 0 would go wrong.
#ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' )
ttt=$( 7z -px l -- "$tt" | tac | sed '0,/fi/d' | tac | sed '0,/Dat/d' )
v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" )
if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside
IFS="$IFS_OLD" # branch out then need recovery
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${tt##"$TMP_DIR"}
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n' # branch in then need recovery
elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search
echo "$ttt" | grep -m1 -q -i "$1"
if [ $? -eq 0 ]; then
IFS="$IFS_OLD"
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${tt##"$TMP_DIR"}
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n'
fi
else
echo "$ttt" | grep -i "$1"
fi
done
IFS="$IFS_OLD"
if [ "$inipath" != "$arcpath" ]; then # if not the root
cd -- "$inipath"
rm -rf -- "$arcpath"
else
exec 3>&- # done the search and release fd3
echo; echo $( date ); echo;
(( time_elapsed=$SECONDS-$elapse_time_b ));
echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n";
fi
# end of sh
```
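The automatic renaming of duplicates done by `mycp1` and `mycp2` in the script above can be looked at in isolation. The following is a simplified sketch of that idea, not a drop-in replacement: `unique_copy` is a hypothetical name, and unlike the original it uses local variables and prints the name it chose.

```sh
#!/bin/bash
# Sketch of the collision-avoiding copy; unique_copy is a hypothetical name.
unique_copy() {   # $1: source file, $2: target directory ending in "/"
    local i=0 base name
    base=$(basename -- "$1")
    name=$base
    # A name counts as taken if the file or its companion .path file exists.
    while [ -e "${2}${name}" ] || [ -e "${2}${name}.path" ]; do
        i=$(( i + 1 ))
        name="${base}(${i})"
    done
    cp -- "$1" "${2}${name}"
    printf '%s\n' "$1" > "${2}${name}.path"   # record where the copy came from
    printf '%s\n' "$name"                     # report the name actually used
}

# Two different files with the same base name, copied into one directory.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir "$src/a" "$src/b"
echo one > "$src/a/note.txt"
echo two > "$src/b/note.txt"
first=$(unique_copy "$src/a/note.txt" "$dst/")
second=$(unique_copy "$src/b/note.txt" "$dst/")
echo "$first $second"
```

The second copy is stored as `note.txt(1)`, each with its own `.path` file, which matches the `name(i)` suffix scheme used in the script.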
## The 0.7 script follows
```sh=
#!/bin/bash
# version 0.7. 20230219. ken woo. copyleft.
# from 0.2:
# 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p'].
# 2. found the $IFS bug but not yet resolved.
# from 0.2a:
# 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"].
# from 0.2b:
# 1. added time elapsed.
# 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match.
# 3. added to generate a path file besides of a target file, hence this target file could be easily found manually.
# from 0.3: only add some comments.
# from 0.3r:
# 1. fixed [sed -n '/Date/,/\lfile/p'] even it affects nothing(false negative).
# 2a. changed the sed command into correct one since it sometimes failed match, [tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )].
# 2b. it only affects the path in "path file" which would lose some intermediate path-string. the correct one uses bash internal cmd.
# 3. added "$2"=="cq" for only copying path files without target files. (since this way is simpler than outputing via std-out).
# from 0.4:
# 1. fixed [ while [ -e "${3}${k}" ] ] not concerning about path file.
# 2. added "$2"=="cr" for adding sha1sum checksum file *.cksm along with path file. to separate is easy access, e.g., "cat *.cksm".
# from 0.5:
# 1. in [ tmp_1=$( sha1sum "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) ],
# mind changed. use path file also to have checksum when 'cr' is specified. and for less disk-w and for the filename weakness.
# from 0.5r:
# 1. sha1sum -b might be faster.
# 2. might have fixed the limitation-12 issue by using -- and prevented from using sed',,'.
# 3. it is more comfortable to output path string rather than path file. added $2=='cs' for this purpose. keyword is [[[Path Here]]].
# from 0.6:
# 1. the prior versions have a spawned bug that having $2 search-pattern with $3 path but no copy events.
# 2. the 'cs' option of the previous version has wrong logic.
# 3. so, for fixing 1 & 2, it is better to have had made moderate changes of code for either keeping c[pqrs] options still effect or
# making output to be able to show the exact path by prefixing a [[[FoundHere]]] line for if $2 is just set. then which means,
# the original 'cs' function is no longer needed, thereby,
# 'cs' would do new behavior of adding a line checksum between [[[FoundHere]]]-line and path-line.
# so, these new functions are updated at the next, "purposes".
# 4. so, this version is 0.7 would be the final release.
# purposes:
# 1a. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and
# including which packed within archive files even within archive of archive of archive of...
# 1b. all results are via standard output by any one of these functions.
# 1c. by this one of functions, the "path" is not the exact location if the file is within archive;
# for demanding the exact path, using the others following functions all of which could do.
#
# 2a. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and
# including options, e.g., $2==".*void\ \+main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must).
# 2b. if found, matches are output in the following format:
# the-matched-line(s)
# ==========>>>>>
# [[[FoundHere]]]
# (this line is the file checksum only exists by using the 'cs' function; see 5c.)
# the-exact-path/filename
# <<<<<==========
#
# 3. the same as 2., and copy the matched files with an accompanying path file to the assigned absolute-directory by "$3".
#
# 4a. the same as 3., but directly copies without specifying context pattern in $2, instead, by setting $2 as 'cp' for this purpose.
# note, you can try ".*" in $2 in comparison with "cp".
# 4b. another is 'cq' for copying path file only.
# 4c. yet another 'cr', 'cs'(see 5.).
#
# 5a. another path file(*.path) is unconditionally coming up with the copied target file.
# 5b. use $2=='cr' to not only 'cq' but also add checksum of the target file in the path file.
# 5c. use $2=='cs' to std-output file checksum; see 2b.-format; and no any copy event, only std-output.
#
# 6. note that generating the path file is the old-school function, kept for compt., except 1c., all done well via std-output.
# required:
# 1. apt install p7zip-full p7zip-rar
# 2. in the /tmp/ dir for extracting files for temporary use, auto deleted after done.
# known limitation:
# 1. the search pattern is in grep format.
# 2. search twice if the soft-link and its real-path are both in the searching-paths.
# 3. a multi-volume archive will cause 7z to exit, so existing ones need to be renamed before this script can run to completion.
# 4. it is used by "7z -px" to skip password protected archives.
# 5. it is used by "7z -aou" to auto rename repetitive extracted files.
# 6. token 'null' is used, so it is not for pattern search. so is the 'cp', 'cq', 'cs', and 'cr' in $2.
# 7. file descriptor 3 is used for identification purpose, so do not use it before running this script.
# 8. except parameters $1, $2, $3 are for user, the other params are only for internal use:
# "$4" passing a specific archive file for extracting and searching.
# "$5" script path.
# "$6" script name.
# "$7" progressive path for inheriting.
# 9. $2, the content pattern, must have the options in trailing. that is like as "the_pattern -opt1 -opt2".
# 10. the target path for copying can not be in the searching path or it might be searched and cause loop.
# 11a. $2 has 5 identifiers 'null' and 'c[pqrs]' can not be used, however, still could be the pattern by e.g., "nul[l]\{1\}".
# 11b. note, weird: [grep nul[l]\{1\} -n --color a.txt] is failed and [this_script ".*a\.txt.*" "nul[l]\{1\} -n --color"] is success.
# 11c. $1 needs full qualified name, that is, if "a.txt" is exactly the file to search, it still needs to be like as ".*a\.txt.*"
# 11d. the 11c might be the reason of prefixed path.
# 12a. known fault: the filenames to inspect if containing 1) comma "," 2) leading hyphen "-" etc., either made sed fault or bash fault.
# 12b. this issue might be fixed beginning from ver.0.6.
# example1: [this_script ".*\.pdf$"]
# find out all the "pdf" files where locate including sub-dirs and within archive files.
#
# example2: detach the task from shell for free run.
# [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown]
#
# example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"]
# find all the "*.c" files that contain the pattern '.*printf.*', and
# also print the line numbers plus the 3 lines before and the 5 lines after each match.
#
# example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target]
# the same as example3, and copy the matched files (with path files) to the folder /home/user/Desktop/target/.
#
# example5: [this_script ".*\.txt$" "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/]
#
# example6: [this_script ".*\.txt$" cp /home/user/Desktop/temp/]
# all the found *.txt files, together with their .path files, are copied directly to /home/user/Desktop/temp/.
#
# example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required.
#
# example8: [cat outputfile | sed '/^\(-\|=\)\+>\+\|^<\+=\+\|^[\(Open \)\(ERROR\)\(WARNINGS\)\(\[\[\[Found\)]\|^Is not archive/d;/^$/d;s/\(\/.*\/\)\([^\/]\+$\)/\2/']
# extract only the filenames from the result output. note, files with 7z errors/open errors may simply be unsupported, so they need to be handled in other ways.
#
# example9: [this_script ".*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|bz2\|lzh\|txt\|ex[e_]\{1\}\|bat\|msi\|sys\|dll\|ocx\|bin\|pdf\|djvu\|html\?\|mht\|chm\|css\|js\|ps\|iso\|nrg\|gho\|cab\|avi\|rm\(vb\)\?\|jpe\?g\|bmp\|ico\|gif\|mp[34]\{1\}\|mpe\?g\|png\|doc\|ini\|hlp\|inf\|ttf\|pptx\?\|pps\|xlsx\?\|reg\|dat\|db\|bak\|log\|asm\|inc\|c\(pp\|xx\)\?\|h\(pp\)\?\|lib\)$" &> output.txt]
#
# example10: [find . -iname "*.path" -exec cat {} >> output.txt \;] for when a huge number of files makes [cat * >> output.txt] fail.
#
# example11: [./this_script ".*\.\(c\|cpp\)$" "(?s)(int|void)[[:space:]]*main[[:space:]]*\([^\)]*\)[[:space:]]*{.* -izaPo --color" | sed 's/\x0//' > Dump.txt]
# dig out all the main function definitions.
# note the [sed 's/\x0//']: with this script the NUL byte occurs more than once, whereas with a single grep it occurs only once, in which case
# [sed '$s/\x0//'] is enough.
SERF_VER="0.7"
lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root
if [ $? -eq 0 ]; then # if not the root task
if [ $# -eq 7 ]; then # if intends for fork-task
inipath=$( pwd )
arcname="$( basename -- "$4" ).XXXXXXXX"
arcpath=$( mktemp -t -d -- "$arcname" )
SCRIPT_DIR="$5"
SCRIPT_NAME="$6"
INI_DIR="$7"
TMP_DIR="$arcpath"
7z -px -aou -o"$arcpath" x -- "$4" 2>/dev/null 1>/dev/null
cd -- "$arcpath"
else # something wrong
exit 1
fi
else # the root task, generates fd3
# the trap handles a user abort that happens while $IFS has not been restored yet
IFS_OLD="$IFS"
function for_trap_exit() {
echo -e '\n\nuser abort\n';
[[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up;
# be careful not to clean up /tmp/ here; I once did and it ended in catastrophe.
exec 3>&-;
exit 1;
}
# SIGKILL cannot be trapped, so SIGTERM is handled instead
trap for_trap_exit SIGINT SIGTERM SIGSEGV
# wrong arguments
if [ $# -lt 1 ] || [ $# -gt 3 ]; then
echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]";
exit 1
fi
# no 7z bin file
7z | grep -i "copyright"
if [ $? -ne 0 ]; then
echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first.";
exit 1
fi
# set global vars
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" )
TMP_DIR="presently working directory"
INI_DIR=""
inipath=$( pwd )
arcpath=$( pwd )
# rearrange arguments
z1="$1"
if [ $# -eq 1 ]; then
z2='null'
z3='null'
elif [ $# -eq 2 ]; then
if [ "$2" = 'null' ]; then
echo '"null" not applicable.'
exit 1
elif [ "$2" = cp ] || [ "$2" = cq ] || [ "$2" = cr ]; then
echo "the absolute path is needed."
exit 1
fi
z2="$2"
z3='null'
else
if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3="$3"
echo "$z3" | grep -q "^/.*"
if [ $? -eq 0 ] && [ -d "$z3" ]; then
# append a trailing "/" if it is missing
echo "$z3" | grep -q ".*/$"
[[ $? -ne 0 ]] && z3="${z3}/"
else
echo; echo "the absolute path is needed."; echo;
exit 1
fi
fi
set --
set "$z1" "$z2" "$z3"
exec 3>&1
echo -e "\nscript version: $SERF_VER\n"
echo "cmd: $SCRIPT_DIR/$SCRIPT_NAME '$1' '$2' '$3'"
echo "pwd: $inipath"
echo; echo $( date ); echo;
elapse_time_b=$SECONDS
fi
if [ "$2" != $'null' ]; then # needs for context search or direct-copy
find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx;
do
if [ "$2" = 'cs' ]; then # only standard output the checksum and the path
# generate the path of the found file
# note the current file may be in /tmp/arc-name/path-in-arc, in which case we need the path-in-arc string;
# however it may be under $PWD instead, so the prefix removal has to cover both conditions
#tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name"
# the previous sed command sometimes failed to match because some special chars were not escaped,
# so it was changed to the following line
tmp_str=${ctx##"$TMP_DIR"}
echo -e "==========>>>>>\n[[[FoundHere]]]";
echo -e "$( sha1sum -b -- "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1/' )";
echo -e "${INI_DIR}${tmp_str}\n<<<<<==========\n"; # wrap by considering if too long path.
# note, in the grep below $2 is deliberately left unquoted so that its trailing options are split into separate arguments.
elif [ "$2" = 'cp' ] || [ "$2" = 'cq' ] || [ "$2" = 'cr' ] || grep $2 -- "$ctx"; then
tmp_str=${ctx##"$TMP_DIR"}
echo -e "==========>>>>>\n[[[FoundHere]]]";
echo -e "${INI_DIR}${tmp_str}\n<<<<<==========\n";
if [ "$3" != $'null' ]; then
i=0;
j=$( basename -- "$ctx" )
k=$j
while [ -e "${3}${k}" ] || [ -e "${3}${k}.path" ]; do
(( i=$i+1 ));
k="$j($i)"
done
# 1) a path file is always written; 2) a match with a non-empty $3, or 'cp', also copies the target file;
# 3) 'cr' adds the checksum to the path file.
tmp_1=""
if [ "$2" = 'cr' ]; then
tmp_1=$( sha1sum -b -- "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' )
elif [ "$2" != 'cq' ]; then
cp -- "$ctx" "${3}${k}" # copy the matched file to the target folder
fi
echo "${tmp_1}${INI_DIR}${tmp_str}" > "${3}${k}.path"
fi
else echo "=====>>>>> $ctx";
fi
done
else # only for file name search
find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the files whose path matches $1
fi
t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 )
IFS_OLD="$IFS"
IFS=$'\n'
for tt in $t; do
echo "----->>>>> $tt"
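# the following commented-out line is incorrect but harmless, since it runs through to the end (where it is meant to stop) and "\lfile" is never found.
# in the replacement command the GNU sed address must be 0, not 1: with 0,/re/ the regex may already match on the first line,
# whereas 1,/re/ only starts checking at the second line and so deletes too much.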
#ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' )
ttt=$( 7z -px l -- "$tt" | tac | sed '0,/fi/d' | tac | sed '0,/Dat/d' )
v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" )
if [ $? -eq 0 ]; then # another archive is inside, so extract one level further to search it
IFS="$IFS_OLD" # restore IFS before calling the sub-task
# note the current file may be in /tmp/arc-name/path-in-arc, in which case we need the path-in-arc string;
# however it may be under $PWD instead, so the prefix removal has to cover both conditions
#tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
# the previous sed command sometimes failed to match because some special chars were not escaped,
# so it was changed to the following line
tmp_str=${tt##"$TMP_DIR"}
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n' # back from the sub-task, switch IFS to newline again
elif [ "$2" != $'null' ]; then # content search: extract further only if at least one name in the listing matches $1
echo "$ttt" | grep -m1 -q -i "$1"
if [ $? -eq 0 ]; then
IFS="$IFS_OLD"
# note the current file may be in /tmp/arc-name/path-in-arc, in which case we need the path-in-arc string;
# however it may be under $PWD instead, so the prefix removal has to cover both conditions
#tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
# the previous sed command sometimes failed to match because some special chars were not escaped,
# so it was changed to the following line
tmp_str=${tt##"$TMP_DIR"}
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n'
fi
else
echo "$ttt" | grep -i "$1"
fi
done
IFS="$IFS_OLD"
if [ "$inipath" != "$arcpath" ]; then # if not the root
cd -- "$inipath"
rm -rf -- "$arcpath"
else
exec 3>&- # done the search and release fd3
echo; echo $( date ); echo;
(( time_elapsed=$SECONDS-$elapse_time_b ));
echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n";
fi
# end of sh
```
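
The script comments note that removing the temporary-directory prefix with sed sometimes failed because special characters in the path were not escaped, and that the parameter expansion `${ctx##"$TMP_DIR"}` replaced it. A minimal sketch of the difference follows. The helper names `strip_prefix` and `strip_prefix_sed` and the sample paths are made up for illustration and are not part of the script.

```sh
#!/bin/bash
# Hypothetical helpers for illustration only; they are not part of the script.

# Literal prefix removal: the inner quotes make bash treat the prefix as plain text.
strip_prefix() {
    local path="$1" prefix="$2"
    printf '%s\n' "${path##"$prefix"}"
}

# The older approach: the prefix is spliced into a sed regex, so characters
# such as [ ] . * are interpreted as regex syntax instead of literally.
strip_prefix_sed() {
    local path="$1" prefix="$2"
    printf '%s\n' "$path" | sed 's,'"$prefix"',,'
}

sample_dir='/tmp/a[1].zip.Xy12abCD'          # assumed mktemp-style directory name
sample_file="${sample_dir}/docs/readme.txt"

strip_prefix     "$sample_file" "$sample_dir"   # prints /docs/readme.txt
strip_prefix_sed "$sample_file" "$sample_dir"   # prefix stays: [1] is read as a bracket expression
```

In the root task `TMP_DIR` holds the placeholder text "presently working directory", which never matches a real path, so the expansion leaves paths under `$PWD` unchanged.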
# version 0.5r
#### The script content follows
```sh=
#!/bin/bash
# version 0.5r. 20230213. ken woo. copyleft.
# from 0.2:
# 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p'].
# 2. found the $IFS bug but not yet resolved.
# from 0.2a:
# 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"].
# from 0.2b:
# 1. added time elapsed.
# 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match.
# 3. added to generate a path file besides of a target file, hence this target file could be easily found manually.
# from 0.3: only add some comments.
# from 0.3r:
# 1. fixed [sed -n '/Date/,/\lfile/p'] even it affects nothing(false negative).
# 2a. changed the sed command into correct one since it sometimes failed match, [tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )].
# 2b. it only affects the path in "path file" which would lose some intermediate path-string. the correct one uses bash internal cmd.
# 3. added "$2"=="cq" for only copying path files without target files. (since this way is simpler than outputing via std-out).
# from 0.4:
# 1. fixed [ while [ -e "${3}${k}" ] ] not concerning about path file.
# 2. added "$2"=="cr" for adding sha1sum checksum file *.cksm along with path file. to separate is easy access, e.g., "cat *.cksm".
# from 0.5:
# 1. in [ tmp_1=$( sha1sum "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) ],
# changed my mind: when 'cr' is specified the checksum now goes into the path file itself, for fewer disk writes and because of the filename weakness.
# purposes:
# 1. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and
# including which packed within archive files even within archive of archive of archive of...
#
# 2. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and
# including options, e.g., $2==".*void +main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must).
#
# 3. the same as 2., and copy the matched files to the assigned absolute-directory by "$3".
#
# 4. the same as 3., but directly copies without specifying context pattern in $2, instead, set $2 as 'cp' for this purpose.
# note, you can try ".*" in $2 in comparison with "cp". another is 'cq' for copying path file only. yet another 'cr'.
#
# 5a. another path file(*.path) is coming up with the copied target file which can be easily found manually via this path file.
# 5b. use $2=='cr' to not only 'cq' but also add checksum in path file.
# required:
# 1. apt install p7zip-full p7zip-rar
# 2. in the /tmp/ dir for extracting files for temporary use, auto deleted after done.
# known limitation:
# 1. the search pattern is in grep format.
# 2. search twice if the soft-link and its real-path are both in the searching-paths.
# 3. a multi-volume archive makes 7z exit, so existing ones need to be renamed before running this script or it will not run to completion.
# 4. "7z -px" is used to supply a dummy password so that password-protected archives are skipped.
# 5. "7z -aou" is used to auto-rename extracted files whose names repeat.
# 6. the token 'null' is reserved, so it cannot be used as a search pattern. the same goes for 'cp', 'cq', and 'cr' in $2.
# 7. file descriptor 3 is used for identification purpose, so do not use it before running this script.
# 8. except parameters $1, $2, $3 are for user, the other params are only for internal use:
# "$4" passing a specific archive file for extracting and searching.
# "$5" script path.
# "$6" script name.
# "$7" progressive path for inheriting.
# 9. in $2 the content pattern must come first and the options must trail it, e.g. "the_pattern -opt1 -opt2".
# 10. the target path for copying must not be inside the search path, or the copies might be searched again and cause a loop.
# 11a. the 4 identifiers 'null' and 'c[pqr]' cannot be used as $2 directly; however, they can still be searched for with a pattern such as "nul[l]\{1\}".
# 11b. note: [grep nul[l]\{1\} -n --color a.txt] fails while [this_script ".*a\.txt.*" "nul[l]\{1\} -n --color"] succeeds, because the
# unquoted form lets the shell strip the backslashes before grep sees the pattern; quoting the pattern makes the plain grep work too.
# 11c. $1 has to match the whole path, that is, even if "a.txt" is exactly the file to search for, it still needs to look like ".*a\.txt.*"
# 11d. the reason for 11c: find -regex matches against the entire path, and find starts at ~+ (the absolute $PWD), so every path is prefixed.
# 12. known fault: inspected filenames containing 1) a comma "," 2) a leading hyphen "-" etc. cause either sed or grep to fail.
# example1: [this_script ".*\.pdf$"]
# find out all the "pdf" files where locate including sub-dirs and within archive files.
#
# example2: detach the task from shell for free run.
# [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown]
#
# example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"]
# find out all the "*.c" files which have pattern '.*printf.*' in it. and
# also dump out these line-number and the before 3 lines and the after 5 lines.
#
# example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target]
# the same as example3 and copy matched files to folder /home/user/Desktop/target/.
#
# example5: [this_script .*\.txt$ "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/]
#
# example6: [this_script .*\.txt$ cp /home/user/Desktop/temp/]
# all the found *.txt files with .path files will directly copy to /home/user/Desktop/temp/.
#
# example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required.
#
# example8: [cat outputfile | sed '/^-\+>\+\|^[\(Open \)\(ERROR\)\(WARNINGS\)]\|^Is not archive/d;/^$/d;s/\(\/.*\/\)\([^\/]\+$\)/\2/']
# extract only filenames from result output. note, 7z-error/-open-error files may just not supported, so need to treat by other ways.
#
# example9: [this_script ".*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|bz2\|lzh\|txt\|ex[e_]\{1\}\|bat\|msi\|sys\|dll\|ocx\|bin\|pdf\|djvu\|html\?\|mht\|chm\|css\|js\|ps\|iso\|nrg\|gho\|cab\|avi\|rm\(vb\)\?\|jpe\?g\|bmp\|ico\|gif\|mp[34]\{1\}\|mpe\?g\|png\|doc\|ini\|hlp\|inf\|ttf\|pptx\?\|pps\|xlsx\?\|reg\|dat\|db\|bak\|log\|asm\|inc\|c\(pp\|xx\)\?\|h\(pp\)\?\|lib\)$" &> output.txt]
SERF_VER="0.5r"
lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root
if [ $? -eq 0 ]; then # if not the root task
if [ $# -eq 7 ]; then # if intends for fork-task
inipath=$( pwd )
arcname="$( basename "$4" ).XXXXXXXX"
arcpath=$( mktemp -t -d "$arcname" )
SCRIPT_DIR="$5"
SCRIPT_NAME="$6"
INI_DIR="$7"
TMP_DIR="$arcpath"
7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null
cd "$arcpath"
else # something wrong
exit 1
fi
else # the root task, generates fd3
# trap prevents from aborted by user and $IFS not yet recovered
IFS_OLD="$IFS"
function for_trap_exit() {
echo -e '\n\nuser abort\n';
[[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up;
# be careful not to clean-up /tmp/ here or get into catastrophe since I met.
exec 3>&-;
exit 1;
}
trap for_trap_exit SIGINT SIGTERM SIGSEGV # SIGKILL cannot be trapped, so SIGTERM is handled instead
# wrong arguments
if [ $# -lt 1 ] || [ $# -gt 3 ]; then
echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]";
exit 1
fi
# no 7z bin file
7z | grep -i "copyright"
if [ $? -ne 0 ]; then
echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first.";
exit 1
fi
# set global vars
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" )
TMP_DIR="presently working directory"
INI_DIR=""
inipath=$( pwd )
arcpath=$( pwd )
# rearrange arguments
z1="$1"
if [ $# -eq 1 ]; then
z2='null'
z3='null'
elif [ $# -eq 2 ]; then
if [ "$2" = 'null' ]; then
echo '"null" not applicable.'
exit 1
elif [ "$2" = cp ] || [ "$2" = cq ] || [ "$2" = cr ]; then
echo "the absolute path is needed."
exit 1
fi
z2="$2"
z3='null'
else
if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3="$3"
echo "$z3" | grep -q "^/.*"
if [ $? -eq 0 ] && [ -d "$z3" ]; then
# try to padding "/"
echo "$z3" | grep -q ".*/$"
[[ $? -ne 0 ]] && z3="${z3}/"
else
echo; echo "the absolute path is needed."; echo;
exit 1
fi
fi
set --
set "$z1" "$z2" "$z3"
exec 3>&1
echo -e "\nscript version: $SERF_VER\n"
echo "cmd: $SCRIPT_DIR/$SCRIPT_NAME '$1' '$2' '$3'"
echo "pwd: $inipath"
echo; echo $( date ); echo;
elapse_time_b=$SECONDS
fi
if [ "$2" != $'null' ]; then # needs for context search or direct-copy
find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx;
do
# note, by the grep, the $2 not quoted for options injection
[[ "$2" = 'cp' ]] || [[ "$2" = 'cq' ]] || [[ "$2" = 'cr' ]] || grep $2 -- "$ctx"
if [ $? -eq 0 ]; then
echo "$ctx [[[Found Here]]]"; echo -e "----------------\n";
if [ "$3" != $'null' ]; then
i=0;
j=$( basename -- "$ctx" )
k=$j
while [ -e "${3}${k}" ] || [ -e "${3}${k}.path" ]; do
(( i=$i+1 ));
k="$j($i)"
done
# copy the matched file to the target folder; only 'cq' and 'cr' skip the copy (a content-pattern match must copy too)
[[ "$2" != 'cq' ]] && [[ "$2" != 'cr' ]] && cp "$ctx" "${3}${k}"
# generate the path of the copied file
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name"
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${ctx##"$TMP_DIR"}
tmp_1=""
[[ "$2" = 'cr' ]] && tmp_1=$( sha1sum "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' )
echo "${tmp_1}${INI_DIR}${tmp_str}" > "${3}${k}.path"
fi
fi
done
else # only for file name search
find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the files whose path matches $1
fi
t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 )
IFS_OLD="$IFS"
IFS=$'\n'
for tt in $t; do
echo -e "----->>>>> $tt\n"
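# the following commented-out line is incorrect but harmless, since it runs through to the end (where it is meant to stop) and "\lfile" is never found.
# in the replacement command the GNU sed address must be 0, not 1: with 0,/re/ the regex may already match on the first line,
# whereas 1,/re/ only starts checking at the second line and so deletes too much.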
#ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' )
ttt=$( 7z -px l "$tt" | tac | sed '0,/fi/d' | tac | sed '0,/Dat/d' )
v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" )
if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside
IFS="$IFS_OLD" # branch out then need recovery
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${tt##"$TMP_DIR"}
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n' # branch in then need recovery
elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search
echo "$ttt" | grep -m1 -q -i "$1"
if [ $? -eq 0 ]; then
IFS="$IFS_OLD"
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${tt##"$TMP_DIR"}
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n'
fi
else
echo "$ttt" | grep -i "$1"
fi
done
IFS="$IFS_OLD"
if [ "$inipath" != "$arcpath" ]; then # if not the root
cd "$inipath"
rm -rf "$arcpath"
else
exec 3>&- # done the search and release fd3
echo; echo $( date ); echo;
(( time_elapsed=$SECONDS-$elapse_time_b ));
echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n";
fi
# end of sh
```
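
Version 0.5r picks a free destination name by appending `(1)`, `(2)`, ... until neither the target file nor its `.path` companion exists. The loop can be sketched in isolation as below. The function name `unique_name` and the demo files are assumptions for illustration, not part of the script.

```sh
#!/bin/bash
# Hypothetical helper mirroring the renaming loop; not part of the script.
# $1: destination directory with a trailing "/", $2: wanted file name.
unique_name() {
    local dir="$1" base="$2" candidate="$2" i=0
    # A name counts as taken if either the file or its ".path" companion exists.
    while [ -e "${dir}${candidate}" ] || [ -e "${dir}${candidate}.path" ]; do
        i=$(( i + 1 ))
        candidate="${base}(${i})"
    done
    printf '%s\n' "$candidate"
}

demo_dir="$( mktemp -d )/"                     # throwaway directory for the demo
touch -- "${demo_dir}a.txt" "${demo_dir}a.txt(1).path"
unique_name "$demo_dir" "a.txt"                # prints a.txt(2)
unique_name "$demo_dir" "b.txt"                # prints b.txt
rm -rf -- "$demo_dir"
```

Checking the `.path` companion matters for 'cq' and 'cr', which write only path files: with the target name alone, those runs would never see a name as taken and would overwrite earlier path files.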
# version 0.4
```sh=
#!/bin/bash
# version 0.4. 20230210. ken woo. copyleft.
# from 0.2:
# 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p'].
# 2. found the $IFS bug but not yet resolved.
# from 0.2a:
# 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"].
# from 0.2b:
# 1. added time elapsed.
# 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match.
# 3. added to generate a path file besides of a target file, hence this target file could be easily found manually.
# from 0.3: only add some comments.
# from 0.3r:
# 1. fixed [sed -n '/Date/,/\lfile/p'] even it affects nothing(false negative).
# 2a. changed the sed command into correct one since it sometimes failed match, [tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )].
# 2b. it only affects the path in "path file" which would lose some intermediate path-string. the correct one uses bash internal cmd.
# 3. added "$2"=="cq" for only copying path files without target files. (since this way is simpler than outputing via std-out).
# purposes:
# 1. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and
# including which packed within archive files even within archive of archive of archive of...
#
# 2. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and
# including options, e.g., $2==".*void +main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must).
#
# 3. the same as 2., and copy the matched files to the assigned absolute-directory by "$3".
#
# 4. the same as 3., but directly copies without specifying context pattern in $2, instead, set $2 as 'cp' for this purpose.
# note, you can try ".*" in $2 in comparison with "cp". another is 'cq' for copying path file only.
#
# 5. another path file(*.path) is coming up with the copied target file which can be easily found manually via this path file.
# required:
# 1. apt install p7zip-full p7zip-rar
# 2. in the /tmp/ dir for extracting files for temporary use, auto deleted after done.
# known limitation:
# 1. the search pattern is in grep format.
# 2. search twice if the soft-link and its real-path are both in the searching-paths.
# 3. a multi-volume archive makes 7z exit, so existing ones need to be renamed before running this script or it will not run to completion.
# 4. "7z -px" is used to supply a dummy password so that password-protected archives are skipped.
# 5. "7z -aou" is used to auto-rename extracted files whose names repeat.
# 6. the token 'null' is reserved, so it cannot be used as a search pattern. the same goes for 'cp' and 'cq' in $2.
# 7. file descriptor 3 is used for identification purpose, so do not use it before running this script.
# 8. except parameters $1, $2, $3 are for user, the other params are only for internal use:
# "$4" passing a specific archive file for extracting and searching.
# "$5" script path.
# "$6" script name.
# "$7" progressive path for inheriting.
# 9. in $2 the content pattern must come first and the options must trail it, e.g. "the_pattern -opt1 -opt2".
# 10. the target path for copying must not be inside the search path, or the copies might be searched again and cause a loop.
# 11a. the 3 identifiers 'null', 'cp' and 'cq' cannot be used as $2 directly; however, they can still be searched for with a pattern such as "nul[l]\{1\}".
# 11b. note: [grep nul[l]\{1\} -n --color a.txt] fails while [this_script ".*a\.txt.*" "nul[l]\{1\} -n --color"] succeeds, because the
# unquoted form lets the shell strip the backslashes before grep sees the pattern; quoting the pattern makes the plain grep work too.
# 11c. $1 has to match the whole path, that is, even if "a.txt" is exactly the file to search for, it still needs to look like ".*a\.txt.*"
# 11d. the reason for 11c: find -regex matches against the entire path, and find starts at ~+ (the absolute $PWD), so every path is prefixed.
# 12. known fault: inspected filenames containing 1) a comma "," 2) a leading hyphen "-" etc. cause either sed or grep to fail.
# example1: [this_script ".*\.pdf$"]
# find out all the "pdf" files where locate including sub-dirs and within archive files.
#
# example2: detach the task from shell for free run.
# [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown]
#
# example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"]
# find out all the "*.c" files which have pattern '.*printf.*' in it. and
# also dump out these line-number and the before 3 lines and the after 5 lines.
#
# example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target]
# the same as example3 and copy matched files to folder /home/user/Desktop/target/.
#
# example5: [this_script .*\.txt$ "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/]
#
# example6: [this_script .*\.txt$ cp /home/user/Desktop/temp/]
# all the found *.txt files with .path files will directly copy to /home/user/Desktop/temp/.
#
# example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required.
lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root
if [ $? -eq 0 ]; then # if not the root task
if [ $# -eq 7 ]; then # if intends for fork-task
inipath=$( pwd )
arcname="$( basename "$4" ).XXXXXXXX"
arcpath=$( mktemp -t -d "$arcname" )
SCRIPT_DIR="$5"
SCRIPT_NAME="$6"
INI_DIR="$7"
TMP_DIR="$arcpath"
7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null
cd "$arcpath"
else # something wrong
exit 1
fi
else # the root task, generates fd3
# trap prevents from aborted by user and $IFS not yet recovered
IFS_OLD="$IFS"
function for_trap_exit() {
echo -e '\n\nuser abort\n';
[[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up;
# be careful not to clean-up /tmp/ here or get into catastrophe since I met.
exec 3>&-;
exit 1;
}
trap for_trap_exit SIGINT SIGTERM SIGSEGV # SIGKILL cannot be trapped, so SIGTERM is handled instead
# wrong arguments
if [ $# -lt 1 ] || [ $# -gt 3 ]; then
echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]";
exit 1
fi
# no 7z bin file
7z | grep -i "copyright"
if [ $? -ne 0 ]; then
echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first.";
exit 1
fi
# set global vars
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" )
TMP_DIR="presently working directory"
INI_DIR=""
inipath=$( pwd )
arcpath=$( pwd )
# rearrange arguments
z1="$1"
if [ $# -eq 1 ]; then
z2='null'
z3='null'
elif [ $# -eq 2 ]; then
if [ "$2" = 'null' ]; then
echo '"null" not applicable.'
exit 1
elif [ "$2" = cp ] || [ "$2" = cq ]; then
echo "the absolute path is needed."
exit 1
fi
z2="$2"
z3='null'
else
if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3="$3"
echo "$z3" | grep -q "^/.*"
if [ $? -eq 0 ] && [ -d "$z3" ]; then
# try to padding "/"
echo "$z3" | grep -q ".*/$"
[[ $? -ne 0 ]] && z3="${z3}/"
else
echo; echo "the absolute path is needed."; echo;
exit 1
fi
fi
set --
set "$z1" "$z2" "$z3"
exec 3>&1
echo; echo $( date ); echo;
elapse_time_b=$SECONDS
fi
if [ "$2" != $'null' ]; then # needs for context search or direct-copy
find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx;
do
# note, by the grep, the $2 not quoted for options injection
[[ "$2" = 'cp' ]] || [[ "$2" = 'cq' ]] || grep $2 -- "$ctx"
if [ $? -eq 0 ]; then
echo "$ctx [[[Found Here]]]"; echo -e "----------------\n";
if [ "$3" != $'null' ]; then
i=0;
j=$( basename -- "$ctx" )
k=$j
while [ -e "${3}${k}" ] || [ -e "${3}${k}.path" ]; do # also check the path file, since 'cq' writes only path files
(( i=$i+1 ));
k="$j($i)"
done
[[ "$2" != 'cq' ]] && cp "$ctx" "${3}${k}" # copy the matched file to the target folder
# generate the path of the copied file
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name"
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${ctx##"$TMP_DIR"}
echo "${INI_DIR}${tmp_str}" > "${3}${k}.path"
fi
fi
done
else # only for file name search
find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the files whose path matches $1
fi
t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 )
IFS_OLD="$IFS"
IFS=$'\n'
for tt in $t; do
echo -e "----->>>>> $tt\n"
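# the following commented-out line is incorrect but harmless, since it runs through to the end (where it is meant to stop) and "\lfile" is never found.
# in the replacement command the GNU sed address must be 0, not 1: with 0,/re/ the regex may already match on the first line,
# whereas 1,/re/ only starts checking at the second line and so deletes too much.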
#ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' )
ttt=$( 7z -px l "$tt" | tac | sed '0,/fi/d' | tac | sed '0,/Dat/d' )
v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" )
if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside
IFS="$IFS_OLD" # branch out then need recovery
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${tt##"$TMP_DIR"}
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n' # branch in then need recovery
elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search
echo "$ttt" | grep -m1 -q -i "$1"
if [ $? -eq 0 ]; then
IFS="$IFS_OLD"
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
#tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
# the previous sed command sometimes causes failed match since some special chars still not escape,
# so changed to the following line
tmp_str=${tt##"$TMP_DIR"}
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n'
fi
else
echo "$ttt" | grep -i "$1"
fi
done
IFS="$IFS_OLD"
if [ "$inipath" != "$arcpath" ]; then # if not the root
cd "$inipath"
rm -rf "$arcpath"
else
exec 3>&- # done the search and release fd3
echo; echo $( date ); echo;
(( time_elapsed=$SECONDS-$elapse_time_b ));
echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n";
fi
# end of sh
```
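
The script trims the `7z l` output down to the file rows with `tac | sed '0,/fi/d' | tac | sed '0,/Dat/d'`. The sketch below runs the same pipeline on a hand-written sample listing. It assumes GNU sed (for the `0,/re/` address form) and GNU `tac`; the function names `trim_listing` and `sample_listing` and the sample rows are made up for illustration.

```sh
#!/bin/bash
# Hypothetical wrapper around the script's pipeline; the sample listing is hand-written.
trim_listing() {
    # 1) reverse the input and drop everything up to and including the first
    #    line containing "fi" (the trailing "N files" summary),
    # 2) reverse back,
    # 3) drop everything up to and including the first line containing "Dat"
    #    (the column header).
    tac | sed '0,/fi/d' | tac | sed '0,/Dat/d'
}

sample_listing() {
    cat <<'EOF'
Listing archive: sample.zip

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------
2023-02-01 10:00:00 ....A          120           80  docs/profile.txt
2023-02-01 10:00:00 ....A         2048         1024  inner.zip
------------------- ----- ------------ ------------  ------------
2023-02-01 10:00:00               2168         1104  2 files
EOF
}

# Prints the two separator lines with the two file rows between them.
sample_listing | trim_listing
```

With `1,/fi/d` instead, GNU sed only starts looking for `fi` on the second line, so the deletion would run on to the next row containing "fi" (here `docs/profile.txt`) and swallow real entries.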
# version 0.3r
```sh=
#!/bin/bash
# version 0.3r. 20230207. ken woo. copyleft.
# from 0.2:
# 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p'].
# 2. found the $IFS bug but not yet resolved.
# from 0.2a:
# 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"].
# from 0.2b:
# 1. added time elapsed.
# 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match.
# 3. added generation of a path file beside each copied target file, so the target's origin can easily be traced manually.
# from 0.3: only added some comments.
# purposes:
# 1. find all files matching "$1" (only the file path/name is shown) under the current working dir and its sub-dirs,
# including files packed inside archives, even inside an archive of an archive of an archive of...
#
# 2. the same as 1., and also search the contents of the matched files for pattern "$2", which is a grep-style pattern
# followed by options, e.g., $2==".*void +main\(.*\) -i -n -B5 -A5". (note: the options must trail the pattern).
#
# 3. the same as 2., and copy the matched files to the absolute directory given in "$3".
#
# 4. the same as 3., but copy directly without a content pattern: set $2 to 'cp' for this purpose.
# note, you can try ".*" in $2 for comparison with "cp".
#
# 5. a path file (*.path) accompanies each copied target file, so the file's origin can easily be traced manually via this path file.
# required:
# 1. apt install p7zip-full p7zip-rar
# 2. the /tmp/ dir, used for temporarily extracting files; extracted files are deleted automatically when done.
# known limitation:
# 1. the search pattern is in grep format.
# 2. files are searched twice if a soft-link and its real path are both within the searched paths.
# 3. a multi-volume archive causes 7z to exit, so existing ones must be renamed before this script can run to completion.
# 4. "7z -px" is used to skip password-protected archives.
# 5. "7z -aou" is used to auto-rename duplicate extracted files.
# 6. the token 'null' is reserved, so it cannot be used as a search pattern. the same goes for 'cp' in $2.
# 7. file descriptor 3 is used for identification purposes, so do not use it before running this script.
# 8. only parameters $1, $2, $3 are for the user; the other params are for internal use only:
# "$4" passes a specific archive file for extracting and searching.
# "$5" script path.
# "$6" script name.
# "$7" progressive path for inheriting.
# 9. $2, the content pattern, must have its options trailing, like "the_pattern -opt1 -opt2".
# 10. the target path for copying must not be inside the searched path, or it might be searched too and cause a loop.
# 11. the 2 identifiers 'null' and 'cp' cannot be used as $2; however, they can still be matched by a pattern such as "nul[l]\{1\}".
# 12. known fault: if a filename to inspect contains 1) a comma "," or 2) a leading hyphen "-" etc., either sed or grep fails.
# example1: [this_script ".*\.pdf$"]
# find all the "pdf" files under the working dir, including sub-dirs and archive files.
#
# example2: detach the task from the shell so it runs freely.
# [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown]
#
# example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"]
# find all the "*.c" files that contain the pattern '.*printf.*', and
# also print the line numbers plus the 3 lines before and the 5 lines after each match.
#
# example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target]
# the same as example3, and copy the matched files to folder /home/user/Desktop/target/.
#
# example5: [this_script ".*\.txt$" "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/]
#
# example6: [this_script ".*\.txt$" cp /home/user/Desktop/temp/]
# all the found *.txt files are copied directly to /home/user/Desktop/temp/.
#
# example7: "the_copied_target.file" always has a companion file "the_copied_target.file.path". so, rm *.path if not required.
lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root
if [ $? -eq 0 ]; then # if not the root task
if [ $# -eq 7 ]; then # if intends for fork-task
inipath=$( pwd )
arcname="$( basename "$4" ).XXXXXXXX"
arcpath=$( mktemp -t -d "$arcname" )
SCRIPT_DIR="$5"
SCRIPT_NAME="$6"
INI_DIR="$7"
TMP_DIR="$arcpath"
7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null
cd "$arcpath"
else # something wrong
exit 1
fi
else # the root task, generates fd3
# the trap handles a user abort that happens before $IFS has been restored
IFS_OLD="$IFS"
function for_trap_exit() {
echo -e '\n\nuser abort\n';
[[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up;
# be careful not to clean up /tmp/ here; I once did and it was a catastrophe.
exec 3>&-;
exit 1;
}
trap for_trap_exit SIGINT SIGKILL SIGSEGV # note: SIGKILL cannot be trapped, so listing it here has no effect
# wrong arguments
if [ $# -lt 1 ] || [ $# -gt 3 ]; then
echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]";
exit 1
fi
# no 7z bin file
7z | grep -i "copyright"
if [ $? -ne 0 ]; then
echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first.";
exit 1
fi
# set global vars
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" )
TMP_DIR="presently working directory"
INI_DIR=""
inipath=$( pwd )
arcpath=$( pwd )
# rearrange arguments
z1="$1"
if [ $# -eq 1 ]; then
z2='null'
z3='null'
elif [ $# -eq 2 ]; then
if [ "$2" = 'null' ]; then
echo '"null" not applicable.'
exit 1
elif [ "$2" = 'cp' ]; then
echo "the absolute path is needed."
exit 1
fi
z2="$2"
z3='null'
else
if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3="$3"
echo "$z3" | grep -q "^/.*"
if [ $? -eq 0 ] && [ -d "$z3" ]; then
# append a trailing "/" if it is missing
echo "$z3" | grep -q ".*/$"
[[ $? -ne 0 ]] && z3="${z3}/"
else
echo; echo "the absolute path is needed."; echo;
exit 1
fi
fi
set --
set "$z1" "$z2" "$z3"
exec 3>&1
echo; echo $( date ); echo;
elapse_time_b=$SECONDS
fi
if [ "$2" != $'null' ]; then # needs for context search or direct-copy
find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx;
do
# note: $2 is deliberately left unquoted so that the trailing grep options split into separate words
[[ "$2" = 'cp' ]] || grep $2 -- "$ctx"
if [ $? -eq 0 ]; then
echo "$ctx [[[Found Here]]]"; echo -e "----------------\n";
if [ "$3" != $'null' ]; then
i=0;
j=$( basename -- "$ctx" )
k=$j
while [ -e "${3}${k}" ]; do
(( i=$i+1 ));
k="$j($i)"
done
cp "$ctx" "${3}${k}" # copy the matched file to the target folder
# generate the path of the copied file
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name"
echo "${INI_DIR}${tmp_str}" > "${3}${k}.path"
fi
fi
done
else # only for file name search
find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all existing files whose names match $1
fi
t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 )
IFS_OLD="$IFS"
IFS=$'\n'
for tt in $t; do
echo -e "----->>>>> $tt\n"
ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' )
v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" )
if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside
IFS="$IFS_OLD" # restore IFS before calling the child script
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n' # set IFS back to newline after the child script returns
elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search
echo "$ttt" | grep -m1 -q -i "$1"
if [ $? -eq 0 ]; then
IFS="$IFS_OLD"
# note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string
# however it might be in the $PWD instead, so use sed to cover both conditions
tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}"
IFS=$'\n'
fi
else
echo "$ttt" | grep -i "$1"
fi
done
IFS="$IFS_OLD"
if [ "$inipath" != "$arcpath" ]; then # if not the root
cd "$inipath"
rm -rf "$arcpath"
else
exec 3>&- # done the search and release fd3
echo; echo $( date ); echo;
(( time_elapsed=$SECONDS-$elapse_time_b ));
echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n";
fi
# end of sh
```
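In the 0.3r script above, a matched file is copied into `$3`, and `(1)`, `(2)`, ... is appended while a file of the same name already exists there. The loop can be sketched in isolation as follows; `unique_name` and the demo directory are hypothetical names introduced only for this illustration and are not part of the original script.

```shell
#!/bin/bash
# unique_name is a hypothetical helper that mirrors the rename loop in the script:
# $1 is the target dir with a trailing "/", $2 is the file name to place there.
unique_name() {
    local dir=$1 base=$2 candidate=$2 i=0
    while [ -e "${dir}${candidate}" ]; do
        i=$(( i + 1 ))
        candidate="${base}(${i})"
    done
    printf '%s\n' "$candidate"
}

# Demo in a throwaway directory that already holds report.txt.
demo_dir=$( mktemp -d )
: > "${demo_dir}/report.txt"
first_free=$( unique_name "${demo_dir}/" "report.txt" )
echo "$first_free"    # prints report.txt(1)
rm -rf "$demo_dir"
```

The suffix goes after the extension, exactly as in the script, so the renamed copies keep the original name as a prefix and remain easy to grep for.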
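## Development history
# Request for help with a grep problem
The archive test01.zip is for testing. The working directory should preferably be a leaf directory, and it should contain this archive.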
```
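Features:
Searches the working directory and its subdirectories and lists every file matching the grep-style pattern in $1. Files inside archives, and inside archives nested within archives, are listed as well.
If a $2 pattern is also given, the file contents are searched too. $2 may carry grep options, as in "'tool' -i -n -v -A5", but the options must trail the pattern like this.
If $3 is also given, the files that satisfy $2 are copied to the absolute path given in $3; a file whose name already exists there is renamed automatically.
The bug is at line 143.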
grep $2 "$ctx" ############### the bug is here ###############################################################
#grep $2 "$ctx" # note the $2 not quoted for options injection
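My own debugging:
Expanding grep $2 "$ctx" and dumping each variable showed nothing unusual, yet grep still fails and returns 1.
Writing the expanded command at that spot does work:
For example, grep 'tool' /home/ken/Desktop/1.txt succeeds.
For example, grep 'tool' -i -n -A5 /home/ken/Desktop/1.txt succeeds.
You can try serf.sh.debug from the archive directly; its content is the same as serf.sh, except that I added some debugging lines.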
```
```
ex1:
./serf.sh ".*\.txt$"
ex2:
./serf.sh ".*\.txt$\|.*\.debug$"
ex3:
./serf.sh ".*\.txt$\|.*\.debug$" "tool"
ex4:
./serf.sh ".*\.txt$\|.*\.debug$" "'tool'" (bug)
ex5:
./serf.sh ".*\.txt$\|.*\.debug$" "'tool' -i -n" (bug)
```
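One part of why ex4 and ex5 fail can be reproduced outside the script: quotes stored inside a variable are not removed when the variable is expanded, so grep receives them as literal pattern characters. A minimal sketch, assuming a hypothetical one-line sample file; `arg`, `typed_status`, and `expanded_status` are illustrative names.

```shell
#!/bin/bash
# Hypothetical sample file for illustration.
sample=$( mktemp )
printf 'a tool here\n' > "$sample"

arg="'tool'"    # what ex4 passes as $2: the single quotes are part of the value

# Typed directly, the shell strips the quotes before grep sees the pattern.
grep -q 'tool' "$sample"
typed_status=$?

# Expanded from a variable, the quotes stay, so grep looks for the 6 characters 'tool'.
grep -q $arg "$sample"
expanded_status=$?

echo "typed: $typed_status, expanded: $expanded_status"    # prints typed: 0, expanded: 1
rm -f "$sample"
```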
The script follows.
```sh=
#!/bin/bash
# version 0.2. 20230204. copyleft.
# purposes:
# 1. find all files matching "$1" (only the file path/name is shown) under the current working dir and its sub-dirs,
# including files packed inside archives, even inside an archive of an archive of an archive of...
#
# 2. the same as 1., and also search the contents of the matched files for pattern "$2", which is a grep-style pattern
# followed by options, e.g., $2=="'.*void +main\(.*\)' -i -n -B5 -A5". (note: the options must trail the pattern).
#
# 3. the same as 2., and copy the matched files to the absolute directory given in "$3".
# required:
# 1. apt install p7zip-full p7zip-rar
# 2. the /tmp/ dir, used for temporarily extracting files; extracted files are removed automatically when done.
# known limitation:
# 1. the search pattern is in grep format.
# 2. files are searched twice if a soft-link and its real path are both within the searched paths.
# 3. a multi-volume archive causes 7z to exit, so existing ones must be renamed before this script can run to completion.
# 4. "7z -px" is used to skip password-protected archives.
# 5. "7z -aou" is used to auto-rename duplicate extracted files.
# 6. the token 'null' is reserved, so it cannot be used as a search pattern.
# 7. file descriptor 3 is used for identification purposes, so do not use it before running this script.
# 8. only parameters $1, $2, $3 are for the user; the other params are for internal use only:
# "$4" passes a specific archive file for extracting and searching.
# "$5" script path.
# "$6" script name.
# 9. $2, the content pattern, must have its options trailing, like "'the_pattern' -opt1 -opt2".
# example1: [this_script ".*\.pdf$"]
# find all the "pdf" files under the working dir, including sub-dirs and archive files.
#
# example2: detach the task from the shell so it runs freely.
# [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown]
#
# example3: [this_script ".*\.c$" "'.*printf.*' -n -B3 -A5"]
# find all the "*.c" files that contain the pattern '.*printf.*', and
# also print the line numbers plus the 3 lines before and the 5 lines after each match.
#
# example4: [this_script ".*\.c$" "'.*printf.*' -n -B3 -A5" /home/user/Desktop/target]
# the same as example3, and copy the matched files to folder /home/user/Desktop/target/.
lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root
if [ $? -eq 0 ]; then # if not the root task
if [ $# -eq 6 ]; then # if intends for fork-task
inipath=$( pwd )
arcname="$( basename "$4" ).XXXXXXXX"
arcpath=$( mktemp -t -d "$arcname" )
SCRIPT_DIR="$5"
SCRIPT_NAME="$6"
7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null
cd "$arcpath"
else # something wrong
exit 1
fi
else # the root task, generates fd3
# the trap handles a user abort that happens before $IFS has been restored
IFS_OLD="$IFS"
function for_trap_exit() {
echo -e '\n\nuser abort\n';
[[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up;
exec 3>&-;
exit 1;
}
trap for_trap_exit SIGINT SIGKILL # note: SIGKILL cannot be trapped, so listing it here has no effect
# wrong arguments
if [ $# -lt 1 ] || [ $# -gt 3 ]; then
echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]";
exit 1
fi
# no 7z bin file
7z | grep -i "copyright"
if [ $? -ne 0 ]; then
echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first.";
exit 1
fi
# set global vars
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" )
INI_DIR=$( pwd )
inipath=$( pwd )
arcpath=$( pwd )
# rearrange arguments
z1="$1"
if [ $# -eq 1 ]; then
z2='null'
z3='null'
elif [ $# -eq 2 ]; then
if [ "$2" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3='null'
else
if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3="$3"
echo "$z3" | grep -q "^/.*"
if [ $? -eq 0 ] && [ -d "$z3" ]; then
# append a trailing "/" if it is missing
echo "$z3" | grep -q ".*/$"
[[ $? -ne 0 ]] && z3="${z3}/"
else
echo; echo "the absolute path is needed."; echo;
exit 1
fi
fi
set --
set "$z1" "$z2" "$z3"
exec 3>&1
echo; echo $( date ); echo;
fi
if [ "$2" != $'null' ]; then # needs for context search
indivCS=$( find -L ~+ -type f -regextype grep -iregex "$1" )
IFS_OLD="$IFS"
IFS=$'\n'
for ctx in $indivCS; do
grep $2 "$ctx" ############### the bug is here ###############################################################
#grep $2 "$ctx" # note the $2 not quoted for options injection
if [ $? -eq 0 ]; then
echo "$ctx [[[Found Here]]]"; echo -e "----------------\n";
if [ "$3" != $'null' ]; then
i=0;
j=$( basename -- "$ctx" )
k=$j
while [ -e "${3}${k}" ]; do
(( i=$i+1 ));
k="$j($i)"
done
cp "$ctx" "${3}${k}"
fi
fi
done
IFS="$IFS_OLD"
else # only for file name search
find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all existing files whose names match $1
fi
t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 )
IFS_OLD="$IFS"
IFS=$'\n'
for tt in $t; do
echo -e "----->>>>> $tt\n"
ttt=$( 7z -px l "$tt" | sed -n '/Date/,/files/p' )
v=$( echo "$ttt" | grep -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" )
if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside
IFS="$IFS_OLD" # restore IFS before calling the child script
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME"
IFS=$'\n' # set IFS back to newline after the child script returns
elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search
echo "$ttt" | grep -m1 -q -i "$1"
if [ $? -eq 0 ]; then
IFS="$IFS_OLD"
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME"
IFS=$'\n'
fi
else
echo "$ttt" | grep -i "$1"
fi
done
IFS="$IFS_OLD"
if [ "$inipath" != "$arcpath" ]; then # if not the root
cd "$inipath"
rm -rf "$arcpath"
else
exec 3>&- # done the search and release fd3
echo; echo $( date ); echo;
fi
# end of sh
```
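## The version below fixes one bug.
## A first finding: the abnormal grep behavior is caused by $IFS.
With IFS set to a newline, the unquoted $2 is no longer split on spaces, so grep receives the pattern and its options as one word.
The affected lines are specially marked in the script below.
This is not a real fix yet; that passage probably still needs to be rethought and rewritten.
Another option is the approach suggested by a more experienced user: pass the pattern and the options as two separate arguments.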
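The effect of `$IFS` on the unquoted `$2` can be observed in isolation. The sketch below counts the words produced by the expansion under both IFS settings, then shows one possible shape of the separate-arguments idea. `count_words` and `search_file` are hypothetical helpers written for this illustration, not part of the script and not necessarily what was originally proposed.

```shell
#!/bin/bash
# count_words and search_file are hypothetical helpers for illustration only.
count_words() { echo $#; }

opts='tool -i -n'    # pattern plus trailing options, as passed in $2

IFS_OLD="$IFS"
IFS=$'\n'
with_newline_ifs=$( count_words $opts )    # no newline inside, so $opts stays one word
IFS="$IFS_OLD"
with_default_ifs=$( count_words $opts )    # split on spaces: tool, -i, -n

echo "IFS=newline: $with_newline_ifs word(s), default IFS: $with_default_ifs word(s)"

# Sketch of the separate-arguments idea: file, pattern, then any grep options.
# Nothing relies on word splitting, so the current IFS no longer matters.
search_file() {
    local file=$1 pattern=$2
    shift 2
    grep "$@" -e "$pattern" -- "$file"
}

sample=$( mktemp )
printf 'A Tool here\n' > "$sample"
found=$( search_file "$sample" 'tool' -i -n )
echo "$found"    # prints 1:A Tool here
rm -f "$sample"
```

Passing options as separate arguments also lets a pattern contain spaces without any `[[:blank:]]` workaround, at the cost of changing the script's calling convention.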
```shell
#!/bin/bash
# version 0.2a. 20230205.
lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root
if [ $? -eq 0 ]; then # if not the root task
if [ $# -eq 6 ]; then # if intends for fork-task
inipath=$( pwd )
arcname="$( basename "$4" ).XXXXXXXX"
arcpath=$( mktemp -t -d "$arcname" )
SCRIPT_DIR="$5"
SCRIPT_NAME="$6"
7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null
cd "$arcpath"
else # something wrong
exit 1
fi
else # the root task, generates fd3
# the trap handles a user abort that happens before $IFS has been restored
IFS_OLD="$IFS"
function for_trap_exit() {
echo -e '\n\nuser abort\n';
[[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up;
# be careful not to clean up /tmp/ here; I once did and it was a catastrophe.
exec 3>&-;
exit 1;
}
trap for_trap_exit SIGINT SIGKILL # note: SIGKILL cannot be trapped, so listing it here has no effect
# wrong arguments
if [ $# -lt 1 ] || [ $# -gt 3 ]; then
echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]";
exit 1
fi
# no 7z bin file
7z | grep -i "copyright"
if [ $? -ne 0 ]; then
echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first.";
exit 1
fi
# set global vars
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" )
INI_DIR=$( pwd )
inipath=$( pwd )
arcpath=$( pwd )
# rearrange arguments
z1="$1"
if [ $# -eq 1 ]; then
z2='null'
z3='null'
elif [ $# -eq 2 ]; then
if [ "$2" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3='null'
else
if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then
echo '"null" not applicable.'
exit 1
fi
z2="$2"
z3="$3"
echo "$z3" | grep -q "^/.*"
if [ $? -eq 0 ] && [ -d "$z3" ]; then
# append a trailing "/" if it is missing
echo "$z3" | grep -q ".*/$"
[[ $? -ne 0 ]] && z3="${z3}/"
else
echo; echo "the absolute path is needed."; echo;
exit 1
fi
fi
set --
set "$z1" "$z2" "$z3"
exec 3>&1
echo; echo $( date ); echo;
fi
if [ "$2" != $'null' ]; then # needs for context search
indivCS=$( find -L ~+ -type f -regextype grep -iregex "$1" )
IFS_OLD="$IFS"
IFS=$'\n'
for ctx in $indivCS; do
IFS=$IFS_OLD ########################## temporarily added ################################################
grep $2 -- "$ctx" ############### the bug is here ###############################################################
#exec grep $2 -- "$ctx"
#grep $2 "$ctx" # note the $2 not quoted for options injection
if [ $? -eq 0 ]; then
echo "$ctx [[[Found Here]]]"; echo -e "----------------\n";
if [ "$3" != $'null' ]; then
i=0;
j=$( basename -- "$ctx" )
k=$j
while [ -e "${3}${k}" ]; do
(( i=$i+1 ));
k="$j($i)"
done
cp "$ctx" "${3}${k}"
fi
fi
IFS=$'\n' ################################## temporarily added ########################################
done
IFS="$IFS_OLD"
else # only for file name search
find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all existing files whose names match $1
fi
t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 )
IFS_OLD="$IFS"
IFS=$'\n'
for tt in $t; do
echo -e "----->>>>> $tt\n"
ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' )
v=$( echo "$ttt" | grep -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" )
if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside
IFS="$IFS_OLD" # restore IFS before calling the child script
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME"
IFS=$'\n' # set IFS back to newline after the child script returns
elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search
echo "$ttt" | grep -m1 -q -i "$1"
if [ $? -eq 0 ]; then
IFS="$IFS_OLD"
"$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME"
IFS=$'\n'
fi
else
echo "$ttt" | grep -i "$1"
fi
done
IFS="$IFS_OLD"
if [ "$inipath" != "$arcpath" ]; then # if not the root
cd "$inipath"
rm -rf "$arcpath"
else
exec 3>&- # done the search and release fd3
echo; echo $( date ); echo;
fi
# end of sh
```
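# The script below removes the effect of $IFS on grep
## and reads the file list with read instead
All tests now give correct results.
So this should be the final version.
Thanks to 慢慢來 and 口烏 for their suggestions and their help in fixing this script.
I had also mixed up the regex conventions of grep and sed: the grep pattern in $2 does not need the inner quotes, because a blank inside a grep pattern can be written as [[:blank:]]. Quotes stored inside $2 are not removed by the shell; they reach grep as literal characters.
So please remove the quotes from the earlier examples.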
```
#!/bin/bash
# deleted
```
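The `read`-based approach described above can be sketched on its own. Later versions of the script pipe `find` straight into `while read -r ctx`; this sketch uses a temporary list file instead, only so that the counters survive the loop for the final `echo`. The sandbox directory, file names, and the variables `matched_count` and `matched_last` are assumptions made for the illustration.

```shell
#!/bin/bash
# Hypothetical sandbox; one file name contains a space on purpose.
work=$( mktemp -d )
printf 'a Tool here\n' > "$work/my notes.txt"
printf 'nothing\n'     > "$work/other.txt"

opts='tool -i'    # pattern followed by grep options, as in $2
matched_count=0
matched_last=''

find "$work" -type f -name '*.txt' > "$work/list"

# IFS= applies only to the read command, so the global IFS keeps its default
# and the unquoted $opts below is still split on spaces.
while IFS= read -r ctx; do
    if grep -q $opts -- "$ctx"; then
        matched_count=$(( matched_count + 1 ))
        matched_last=$ctx
    fi
done < "$work/list"

echo "matched $matched_count file(s): $matched_last"
rm -rf "$work"
```

Because `read -r` takes one whole line per iteration, file names with spaces stay intact without touching the global `$IFS`, which is what lets the unquoted options split normally again.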