# 整合的檔案及搜尋功能之 bash 腳本 ## 已改版,0.9 版在這: https://hackmd.io/@kenwoo777/H1fXjcuk3 ## ## 請注意最後版本 0.7r-2 不改版了,但針對此版的 bug 還是要修:請將第一行註解掉換成二三四行,那麼嚴重過慢的 'ct' 功能便能正常了! (跑 'ct',我測試過 5000 支檔案沒問題,8分鐘。也確認過 23000 支是上界,再高 bash 就崩潰了,而實際上限多少請自行測試。有過多的檔案就是分成多份多次執行) 以下幾行,請自行替換 ```sh= #if [[ "$b_pathfile" = "${a_map}.*" ]]; then # should be [[ $b_pathfile =~ ${a_map}.* ]], # however it is still wrong since some special characters in path/filename unable escape. tmpx=$( expr match "${b_pathfile}" "\(${a_map}\)" ) if [ $? -eq 0 ]; then ``` ### 狀態: 目前已將功能整合完善了,測試過也尚未遇到什麼問題。但有一些限制請見腳本內容之 limitation。當前版本 0.7r-2,終版,除非有 bug ^^" ### 有任何糾錯或更好/更多功能擴展/更有效率/更高容錯性等的看法想法做法歡迎批評指教建議,謝謝。 以下為 0.7 版功能,0.7r-2 版新增 'ct' 功能簡述如下: 我們可以先用 'cs' 功能產生例如某備份硬碟的所有檔案的 map 檔。以後若需要抓取出某些檔案就可以單從此 map 檔中搜尋。把結果的那些檔案路徑集中在另一份檔案,再執行 'ct' 功能自動地把檔案都複製出來。 此功能效能很低,歡迎提出改進的方法。 ## 功能: ### 1. 可搜尋工作目錄及子目錄下的,列出所有有 $1 之 grep 型式的 pattern 的(檔名的)檔案。且位於壓縮檔內,的再壓縮檔內的檔案都可被列出來。 ### 2. 其次若再指定 $2 pattern,則還會去撈檔案內容。且此 $2 可使用像 “tool -i -n -v -A5” 包含了 grep 選項;不過選項必須像這樣後綴。 ### 3. 再次之若再指定 $3,則會將滿足 $2 的那些檔案複製到 $3 指定的絕對路徑,若檔案重覆會自動改名。 ### 4. 若指定 $2 為關鍵字 “cp”,則直接複製。那要搜 “cp” 怎麼辦?仍是可使用 regex 避開且達同樣目的的。 ### 5. 檔案複製會伴隨產生路徑檔。當某檔藏在壓縮檔內的再壓縮檔內的再…特別有用。路徑檔內記錄該檔路徑,可手動或供其他程式進一步追索用。 ### 6. $2 = ‘cq’ 引數,只複製路徑檔而不複製來源檔。 ### 7. 0.7版,已修正了兩處 bugs。引數 'cs' 的邏輯不正確。及大概前一兩版的增刪修導致其中一選項該複製目的檔而未複製。 ### 8. 0.7 版已追加 'cr' 於路徑檔內嵌入目標檔的 checksum, 'cs' 標準輸出含 目標檔 checksum,兩選項功能。若嫌 cs 太慢,可註解掉一行 checksum 的計算。 ### 9. 0.7 版,不再依賴路徑檔,只要 $2 不為空,便會標準輸出完整且正確索引的目標檔路徑。關鍵字 [[[FoundHere]]],但仍保留了 cp/cq/cr/cs 四項原有功能。 ### 10. 檔名不再有限制,請見 limitation 說明。 ### 11. 程式目的希望將標準輸出的結果作再次的運用,故其應沒遺漏什麼重要資訊,例如 error 之類的,可再次地 grep 收集出來。並有刻意加關鍵字或範例(請參考內文範例)以期更便利運用。 ## 以下為 0.7r-2 腳本內容 ```sh= #!/bin/bash # version 0.7r-2. 20230226. ken woo. copyleft. # from 0.2: # 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p']. # 2. found the $IFS bug but not yet resolved. # from 0.2a: # 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"]. # from 0.2b: # 1. added time elapsed. # 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match. # 3. added to generate a path file besides of a target file, hence this target file could be easily found manually. # from 0.3: only add some comments. # from 0.3r: # 1. fixed [sed -n '/Date/,/\lfile/p'] even it affects nothing(false negative). # 2a. changed the sed command into correct one since it sometimes failed match, [tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )]. # 2b. it only affects the path in "path file" which would lose some intermediate path-string. the correct one uses bash internal cmd. # 3. added "$2"=="cq" for only copying path files without target files. (since this way is simpler than outputing via std-out). # from 0.4: # 1. fixed [ while [ -e "${3}${k}" ] ] not concerning about path file. # 2. added "$2"=="cr" for adding sha1sum checksum file *.cksm along with path file. to separate is easy access, e.g., "cat *.cksm". # from 0.5: # 1. in [ tmp_1=$( sha1sum "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) ], # mind changed. use path file also to have checksum when 'cr' is specified. and for less disk-w and for the filename weakness. # from 0.5r: # 1. sha1sum -b might be faster. # 2. might have fixed the limitation-12 issue by using -- and prevented from using sed',,'. # 3. it is more confortable to output path string rather than path file. added $2=='cs' for this purpose. keyword is [[[Path Here]]]. # from 0.6: # 1. the prior versions have a spawned bug that having $2 search-pattern with $3 path but no copy events. # 2. the 'cs' option of the previous version has wrong logic. # 3. so, for fixing 1 & 2, it is better to have had made moderate changes of code for either keeping c[pqrs] options still effect or # making output to be able to show the exact path by prefixing a [[[FoundHere]]] line for if $2 is just set. then which means, # the original 'cs' function is no longer needed, thereby, # 'cs' would do new behavior of adding a line checksum between [[[FoundHere]]]-line and path-line. # so, these new functions are updated at the next, "purposes". # 4. so, this version is 0.7 would be the final release. # from 0.7: # 1. added 'ct' option. please refer to example 15. # 2. the arguments are exclusively changed for 'ct' option from #7 to #9. # 3. really final release. # 1. 0.7r-1 from 0.7r: forget to remove tmp folder $arcpath after done XD # 1. 0.7r-2 from 0.7r-1: add quote [[ ! -e "$a_rep" ]]. note that after by my test, 'ct' performance is worse. # purposes: # # only tested on Ubuntu. # # 1a. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and # including which packed within archive files even within archive of archive of archive of... # 1b. all results are via standard output by any one of these functions. # 1c. by this one of functions, the "path" is not the exact location if the file is within archive; # for demanding the exact path, using the others following functions all of which could do. # # 2a. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and # including options, e.g., $2==".*void\ \+main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must). # 2b. if found, would be outputed in the following format: # the-matched-line(s) # ==========>>>>> # [[[FoundHere]]] # (this line is the file checksum only exists by using the 'cs' function; see 5c.) # the-exact-path/filename # <<<<<========== # # 3. the same as 2., and copy the matched files with an accompanying path file to the assigned absolute-directory by "$3". # # 4a. the same as 3., but directly copies without specifying context pattern in $2, instead, by setting $2 as 'cp' for this purpose. # note, you can try ".*" in $2 in comparison with "cp". # 4b. another is 'cq' for copying path file only. # 4c. yet another 'cr', 'cs'(see 5.). # # 5a. another path file(*.path) is unconditionally coming up with the copied target file. # 5b. use $2=='cr' to not only 'cq' but also add checksum of the target file in the path file. # 5c. use $2=='cs' to std-output file checksum; see 2b.-format; and no any copy event, only std-output. # # 6. note that generating the path file is the old-school function, kept for compt., except 1c., all done well via std-output. # # 7. please refer to example 15. it does not realize about deflating upon these specific target files in an archive, instead, # it extracts whole archive. however, each concerning archive only extracted once since the list is sorted before processing. # required: # 1. apt install p7zip-full p7zip-rar # 2. in the /tmp/ dir for extracting files for temporary use, auto deleted after done. # known limitation: # 1. the search pattern is in grep format. # 2. search twice if the soft-link and its real-path are both in the searching-paths. # 3. multi-volume archive will cause 7z to exit. so rename existing ones are needed before running this script into completely. # 4. it is used by "7z -px" to skip password protected archives. # 5. it is used by "7z -aou" to auto rename repetitive extracted files. # 6. token 'null' is used, so it is not for pattern search. so is the 'cp', 'cq', 'cs', and 'cr' in $2. # 7. file descriptor 3 is used for identification purpose, so do not use it before running this script. # 8. except parameters $1, $2, $3 are for user, the other params are only for internal use: # "$4" passing a specific archive file for extracting and searching. # "$5" script path. # "$6" script name. # "$7" progressive path for inheriting. # 9. $2, the content pattern, must have the options in trailing. that is like as "the_pattern -opt1 -opt2". # 10. the target path for copying can not be in the searching path or it might be searched and cause loop. # 11a. $2 has 5 identifiers 'null' and 'c[pqrs]' can not be used, however, still could be the pattern by e.g., "nul[l]\{1\}". # 11b. note, weird: [grep nul[l]\{1\} -n --color a.txt] is failed and [this_script ".*a\.txt.*" "nul[l]\{1\} -n --color"] is success. # 11c. $1 needs full qualified name, that is, if "a.txt" is exactly the file to search, it still needs to be like as ".*a\.txt.*" # 11d. the 11c might be the reason of prefixed path. # 12a. known fault: the filenames to inspect if containing 1) comma "," 2) leading hyphen "-" etc., either made sed fault or bash fault. # 12b. this issue might be fixed beginning from ver.0.6. # example1: [this_script ".*\.pdf$"] # find out all the "pdf" files where locate including sub-dirs and within archive files. # # example2: detach the task from shell for free run. # [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown] # # example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"] # find out all the "*.c" files which have pattern '.*printf.*' in it. and # also dump out these line-number and the before 3 lines and the after 5 lines. # # example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target] # the same as example3 and copy matched files(with path files) to folder /home/user/Desktop/target/. # # example5: [this_script .*\.txt$ "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/] # # example6: [this_script .*\.txt$ cp /home/user/Desktop/temp/] # all the found *.txt files with .path files will directly copy to /home/user/Desktop/temp/. # # example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required. # # example8: [cat outputfile | sed '/^\(-\|=\)\+>\+\|^<\+=\+\|^[\(Open \)\(ERROR\)\(WARNINGS\)\(\[\[\[Found\)]\|^Is not archive/d;/^$/d;s/\(\/.*\/\)\([^\/]\+$\)/\2/'] # extract only filenames from result output. note, 7z-error/-open-error files may just not supported, so need to treat by other ways. # # example9: [this_script ".*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|bz2\|lzh\|txt\|ex[e_]\{1\}\|bat\|msi\|sys\|dll\|ocx\|bin\|pdf\|djvu\|html\?\|mht\|chm\|css\|js\|ps\|iso\|nrg\|gho\|cab\|avi\|rm\(vb\)\?\|jpe\?g\|bmp\|ico\|gif\|mp[34]\{1\}\|mpe\?g\|png\|doc\|ini\|hlp\|inf\|ttf\|pptx\?\|pps\|xlsx\?\|reg\|dat\|db\|bak\|log\|asm\|inc\|c\(pp\|xx\)\?\|h\(pp\)\?\|lib\)$" &> output.txt] # # example10: [find . -iname "*.path" -exec cat {} >> output.txt \;] if huge amount of files cause [cat * >> output.txt] failed. # # example11: [./this_script ".*\.\(c\|cpp\)$" "(?s)(int|void)[[:space:]]*main[[:space:]]*\([^\)]*\)[[:space:]]*{.* -izaPo --color" | sed 's/\x0//' > Dump.txt] # dig out all the main function definitions. # note the [sed 's/\x0//'], not only 1 occurrence for using this script however, only 1 occurrence when using single grep, that is, # [sed '$s/\x0//'] is enough. # # example12: using 'cs', if do not want to have checksum, comment out this sha1sum generating line in if-cs code block. # # example13: [sed '/^\(-\|=\)\+>\+\|^<\+=\+\|^\[\[\[Found\|^[0-9a-fA-F]\+$\|^\/.*/d;/^$/d;=' native.txt | sed 'N;s/\n/ /' > output.txt] # suppose a native.txt is generated by 'cs' option, then we want to check out(extract) all of the errors. line number is prefixed for grouping-able. # # example14: [ grep "^\/" native.txt > output.txt] # suppose a native.txt is generated by 'cs' option, extract all the path/filename is simple. # # example15: [ ./this_script full_paths_in_this.file "ct" target_path_for_copy_to ] # since to get a file each time by a thorough search upon some location is funny. better method is to generate a map upon it. # that is, the 'cs' option with $1="anything_interested" done so. # suppose a native.txt is generated by 'cs' option; which is a text file storing all interested files of full-path/filenames. # then we get some full-paths of present of interests out of this map and collect to the file "full_paths_in_this.file", # each full-path in one-another line. then this "ct" option would extract/copy these files to the "target_path_for_copy_to". SERF_VER="0.7r-2" function MYGUBED() { return; echo -e "\n<debug>" for var in "$@"; do [[ $var =~ ^-+ ]] && echo ${var#-} done for myvars in "$@"; do [[ $myvars =~ ^-+ ]] && continue echo "<debug> \$$myvars: ${!myvars}" done echo } function mycp1() { # $1 is the target path; std-input is the src file path while read -r in; do if [ -e "$in" ]; then i=0; j=$( basename -- "$in" ) k=$j while [ -e "${1}${k}" ] || [ -e "${1}${k}.path" ]; do (( i=$i+1 )); k="$j($i)" done cp -- "$in" "${1}${k}" echo "$in" > "${1}${k}.path" echo " "; # pass forward. it is in order to keep the source ordered else echo "$in"; # pass forward. it must be invalid or in archive fi done } function mycp2() { # $1 is the target path; $2 is formal path; std-input is the src file path while read -r in; do if [ -e "$in" ]; then i=0; j=$( basename -- "$in" ) k=$j while [ -e "${1}${k}" ] || [ -e "${1}${k}.path" ]; do (( i=$i+1 )); k="$j($i)" done cp -- "$in" "${1}${k}" echo "$2" > "${1}${k}.path" return 0; else return 1; fi done } # target files which could direct copy are copied beforehand and other than files in the $lines which need more treatments. # the $lines is a sorted lines in number, e.g., [7,9,17,5,2,], each line contains at least 1 intermediate archive path/name, # while processing, these real archive files locate at the course of real path(call it formal path) while is in root task, # or at the course beginning from later generated tmp dir while is in sub tasks; # $a_rep for handling these 2 conditions to locate either real files. # $a_map is a part of the formal path corresponding to $a_rep in order to one-shot replacement. # parameters are using pass-by-var-name for global vars could be used both into this function and for return. bash 4.3 later. # at each round, $lines fans out the line(s) having a first common archive or is standalone. so it is shrinked after called; # whose are moved to $a_num(having common archive). $a_rep is "" when is in root task for handling formal path; # when in a child task it is an extracted archive root path via tmp path, e.g., /tmp/DataFile.zip.XXXXXXXX/, # and it corresponds to the $a_map which is a part of formal path from the formal-paths source-file which is $1. # and $a_map finally becomes the first-intermediate archive path for return. $a_num is the fan out line(s) for sub-task call. # note the 3 vars $3/$4/$5 should be passed by var-names. # $1 is the collect of formal paths file, $2==$a_rep the extracted archive tmp path, $3==$a_map, $4==$a_num, $5==$lines. # after called, $3==part of formal path advanced to the next intermediate archive, $5==the $lines subtracted by $4==$a_num. # keep in mind at this entrance moment, files are ready for looking up. function myFanOut() { a_rep=$2 local -n a_map=$3 local -n a_num=$4 local -n lines=$5 a_num=$( sed -n 's/\(^[0-9]\+\),.*/\1/p' <<< $lines ) # get the leading number "i" a_pathfile=$( sed "${a_num}q;d" "$1" ) # exactly the i-th line in formal path file lines=${lines#?*,} # discard it from $lines a_pathfile="${a_rep}${a_pathfile#$a_map}" # the real existing location of this file # after this loop, $a_rep would be empty or an imtermediate archive while true; do tgt_1="$a_rep" a_rep=$( expr match "$a_pathfile" "\(${a_rep}/[^/]\+\).*" ) # level by level cd into. if [ "$tgt_1" == "$a_rep" ]; then # tail; true should copied. false wrong formal. if [ -e "$a_rep" ]; then echo "found but error\; nothing done(1 logic error): $a_num $a_pathfile" >&2 else echo "file not found(2 formal path error): $a_num $a_pathfile" >&2 fi a_rep="" break fi # so why thus not see if [[ ! -e "$a_rep" ]]; then # where to stop if [ -f "$tgt_1" ]; then # where we want a_rep="$tgt_1" break; fi a_rep="" # ! possibly 2 cases non-/existing directory, echo "file not found(3): $a_num $a_pathfile" >&2 # ! which might caused by extraction error, break # ! and 3rd case is wrong formal path. fi done # so why thus not see [[ "${a_rep}" == "" ]] && return 1; a_map="${a_map}${a_rep#$2}" # cast back to advanced incremented formal path. a_num="${a_num}," # into correct format while [[ $lines != "" ]]; do b_num=$( sed -n 's/\(^[0-9]\+\),.*/\1/p' <<< $lines ) # collect all the same intermediate formal paths. b_pathfile=$( sed "${b_num}q;d" "$1" ) if [[ "$b_pathfile" = "${a_map}.*" ]]; then a_num="${a_num}${b_num}," lines=${lines#?*,} else break fi done return 0; } lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root if [ $? -eq 0 ]; then # if not the root task if [ $# -ge 7 ]; then # if intends for fork-task inipath=$( pwd ) arcname="$( basename -- "$4" ).XXXXXXXX" arcpath=$( mktemp -t -d -- "$arcname" ) SCRIPT_DIR="$5" SCRIPT_NAME="$6" INI_DIR="$7" TMP_DIR="$arcpath" # AAAAA for handling the 'ct' option if [ "$2" = ct ]; then 7z -px -aou -o"$arcpath" x -- "$4" 1>/dev/null # no masking error since what had handled was on the map which succeeded. d_num="" all_num=$9 while [[ $all_num != "" ]]; do c_num=$( sed -n 's/\(^[0-9]\+\),.*/\1/p' <<< $all_num ) c_pathfile=$( sed "${c_num}q;d" "$1" ) c_formalpath="$8${4#$INI_DIR}" za=$4; MYGUBED "-==========================" "-$( echo ${arcpath}${c_pathfile#$c_formalpath} )" \ c_num c_pathfile c_formalpath za INI_DIR all_num d_pathfile="${arcpath}${c_pathfile#$c_formalpath}" all_num=${all_num#?*,} mycp2 "${3}" "${c_pathfile}" -- <<< "$d_pathfile" || d_num="${d_num}${c_num}," done while [[ $d_num != "" ]]; do arg_3=$c_formalpath arg_4="" myFanOut "$1" "$arcpath" arg_3 arg_4 d_num if [ $? -eq 0 ]; then MYGUBED "-------------------------" "-$( echo $arcpath${arg_3#$c_formalpath} )" arcpath arg_3 arg_4 "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$arcpath${arg_3#$c_formalpath}" "$SCRIPT_DIR" "$SCRIPT_NAME" \ "$arcpath" "$c_formalpath" "$arg_4" fi done rm -rf "$arcpath" exit 0 fi # VVVVV for handling the 'ct' option 7z -px -aou -o"$arcpath" x -- "$4" 2>/dev/null 1>/dev/null cd -- "$arcpath" else # something wrong exit 1 fi else # the root task, generates fd3 # trap prevents from aborted by user and $IFS not yet recovered IFS_OLD="$IFS" function for_trap_exit() { echo -e '\n\nuser abort\n'; [[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up; # be careful not to clean-up /tmp/ here or get into catastrophe since I met. exec 3>&-; exit 1; } trap for_trap_exit SIGINT SIGKILL SIGSEGV # wrong arguments if [ $# -lt 1 ] || [ $# -gt 3 ]; then echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]"; exit 1 fi # no 7z bin file 7z | grep -i "copyright" if [ $? -ne 0 ]; then echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first."; exit 1 fi # set global vars SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" ) TMP_DIR="presently working directory" INI_DIR="" inipath=$( pwd ) arcpath=$( pwd ) # rearrange arguments z1="$1" if [ $# -eq 1 ]; then z2='null' z3='null' elif [ $# -eq 2 ]; then if [ "$2" = 'null' ]; then echo '"null" not applicable.' exit 1 elif [[ "$2" = c[pqrt] ]]; then echo "the absolute path is needed." exit 1 fi z2="$2" z3='null' else if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3="$3" echo "$z3" | grep -q "^/.*" if [ $? -eq 0 ] && [ -d "$z3" ]; then # try to padding "/" echo "$z3" | grep -q ".*/$" [[ $? -ne 0 ]] && z3="${z3}/" else echo; echo "the absolute path is needed."; echo; exit 1 fi fi set -- set "$z1" "$z2" "$z3" exec 3>&1 echo -e "\nscript version: $SERF_VER\n" echo "cmd: $SCRIPT_DIR/$SCRIPT_NAME '$1' '$2' '$3'" echo "pwd: $inipath" echo; echo $( date ); echo; elapse_time_b=$SECONDS # AAAAA for handling the 'ct' option if [ "$2" = ct ]; then # sorting into sorted lines by e.g., the result of [7,9,17,5,2,]. # then use [sed -n 's/\(^[0-9]\+\),.*/\1/p' <<< $var] to get a line; [sed "${i-th}q;d"] to map a line; # $var=${var#?*,} to del a line; inc_p=$( expr match "$full_p" "\(${inc_p}/[^/]\+\).*" ) to get the incremental path; # especially note that by my test, 23000 path lines is about to the upper bound could be passed as the argument $9. sorted_lines=$( cat "$1" | mycp1 "${3}" -- | sed '=;s/\(.*\/\)[^\/]\+$/\1/' | sed 'N;s/\n/ /' |\ sed '/^[0-9]\+[[:blank:]]\+$/d' | sort -k2 | sed 's/\(^[0-9]\+\).*/\1,/' | sed ':a;N;$!ba;s/\n//g' ); while [[ $sorted_lines != "" ]]; do arg_3="" arg_4="" arg_5=$arg_3 tmparc="" myFanOut "$1" "" arg_3 arg_4 sorted_lines if [ $? -eq 0 ]; then # next step is to recursively handle intermediate archive either the one next to the other(while-loop) or # the one advanced to the other(recursive) that is what called fanout. # as for the following call, arg_3 for part-formal-path & arg_4($9) for same-archive candidates are not enough, # arg_5 is as the $8 to be the old-arg_3 inevitably needed too; so does a tmp-dir $7; to be more specific, # the below $arg_3 should be the ${root/tmparc}${arg_3#$arg_5}; $7==$tmparc. # so, commence from this point, the tasks grow number of vars from 7 to 9. "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$arg_3" "$SCRIPT_DIR" "$SCRIPT_NAME" "$tmparc" "$arg_5" "$arg_4" fi done exec 3<&- # done the search and release fd3 echo; echo $( date ); echo; (( time_elapsed=$SECONDS-$elapse_time_b )); echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n"; exit 0; fi # VVVVV for handling the 'ct' option fi if [ "$2" != $'null' ]; then # needs for context search or direct-copy find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx; do if [ "$2" = 'cs' ]; then # only standard output the checksum and the path # generate the path of the copied file # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name" # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${ctx##"$TMP_DIR"} echo -e "==========>>>>>\n[[[FoundHere]]]"; echo -e "$( sha1sum -b -- "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1/' )"; echo -e "${INI_DIR}${tmp_str}\n<<<<<==========\n"; # wrap by considering if too long path. # note, by the grep, the $2 not quoted for options injection. elif [ "$2" = 'cp' ] || [ "$2" = 'cq' ] || [ "$2" = 'cr' ] || ! ! grep $2 -- "$ctx"; then tmp_str=${ctx##"$TMP_DIR"} echo -e "==========>>>>>\n[[[FoundHere]]]"; echo -e "${INI_DIR}${tmp_str}\n<<<<<==========\n"; if [ "$3" != $'null' ]; then i=0; j=$( basename -- "$ctx" ) k=$j while [ -e "${3}${k}" ] || [ -e "${3}${k}.path" ]; do (( i=$i+1 )); k="$j($i)" done # 1) unconditionally path file; 2) match && nonempty $3 or 'cp' will copy target file; # 3) 'cr' for adding checksum in path file. tmp_1="" if [ "$2" = 'cr' ]; then tmp_1=$( sha1sum -b -- "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) elif [ "$2" != 'cq' ]; then cp -- "$ctx" "${3}${k}" # copy the matched file to the target folder fi echo "${tmp_1}${INI_DIR}${tmp_str}" > "${3}${k}.path" fi else echo "=====>>>>> $ctx"; fi done else # only for file name search find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the existing pdfs fi t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 ) IFS_OLD="$IFS" IFS=$'\n' for tt in $t; do echo "----->>>>> $tt" # the following line is incorrect but is ok since it through to the end(where meant to be) and "\lfile" meant to be not found. # the new replacement command however weird, when use 1 instead of 0 would go wrong. #ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' ) ttt=$( 7z -px l -- "$tt" | tac | sed '0,/fi/d' | tac | sed '0,/Dat/d' ) v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" ) if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside IFS="$IFS_OLD" # branch out then need recovery # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${tt##"$TMP_DIR"} "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' # branch in then need recovery elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search echo "$ttt" | grep -m1 -q -i "$1" if [ $? -eq 0 ]; then IFS="$IFS_OLD" # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${tt##"$TMP_DIR"} "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' fi else echo "$ttt" | grep -i "$1" fi done IFS="$IFS_OLD" if [ "$inipath" != "$arcpath" ]; then # if not the root cd -- "$inipath" rm -rf -- "$arcpath" else exec 3>&- # done the search and release fd3 echo; echo $( date ); echo; (( time_elapsed=$SECONDS-$elapse_time_b )); echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n"; fi # end of sh ``` ## 以下為 0.7 版腳本內容 ```sh= #!/bin/bash # version 0.7. 20230219. ken woo. copyleft. # from 0.2: # 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p']. # 2. found the $IFS bug but not yet resolved. # from 0.2a: # 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"]. # from 0.2b: # 1. added time elapsed. # 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match. # 3. added to generate a path file besides of a target file, hence this target file could be easily found manually. # from 0.3: only add some comments. # from 0.3r: # 1. fixed [sed -n '/Date/,/\lfile/p'] even it affects nothing(false negative). # 2a. changed the sed command into correct one since it sometimes failed match, [tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )]. # 2b. it only affects the path in "path file" which would lose some intermediate path-string. the correct one uses bash internal cmd. # 3. added "$2"=="cq" for only copying path files without target files. (since this way is simpler than outputing via std-out). # from 0.4: # 1. fixed [ while [ -e "${3}${k}" ] ] not concerning about path file. # 2. added "$2"=="cr" for adding sha1sum checksum file *.cksm along with path file. to separate is easy access, e.g., "cat *.cksm". # from 0.5: # 1. in [ tmp_1=$( sha1sum "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) ], # mind changed. use path file also to have checksum when 'cr' is specified. and for less disk-w and for the filename weakness. # from 0.5r: # 1. sha1sum -b might be faster. # 2. might have fixed the limitation-12 issue by using -- and prevented from using sed',,'. # 3. it is more confortable to output path string rather than path file. added $2=='cs' for this purpose. keyword is [[[Path Here]]]. # from 0.6: # 1. the prior versions have a spawned bug that having $2 search-pattern with $3 path but no copy events. # 2. the 'cs' option of the previous version has wrong logic. # 3. so, for fixing 1 & 2, it is better to have had made moderate changes of code for either keeping c[pqrs] options still effect or # making output to be able to show the exact path by prefixing a [[[FoundHere]]] line for if $2 is just set. then which means, # the original 'cs' function is no longer needed, thereby, # 'cs' would do new behavior of adding a line checksum between [[[FoundHere]]]-line and path-line. # so, these new functions are updated at the next, "purposes". # 4. so, this version is 0.7 would be the final release. # purposes: # 1a. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and # including which packed within archive files even within archive of archive of archive of... # 1b. all results are via standard output by any one of these functions. # 1c. by this one of functions, the "path" is not the exact location if the file is within archive; # for demanding the exact path, using the others following functions all of which could do. # # 2a. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and # including options, e.g., $2==".*void\ \+main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must). # 2b. if found, would be outputed in the following format: # the-matched-line(s) # ==========>>>>> # [[[FoundHere]]] # (this line is the file checksum only exists by using the 'cs' function; see 5c.) # the-exact-path/filename # <<<<<========== # # 3. the same as 2., and copy the matched files with an accompanying path file to the assigned absolute-directory by "$3". # # 4a. the same as 3., but directly copies without specifying context pattern in $2, instead, by setting $2 as 'cp' for this purpose. # note, you can try ".*" in $2 in comparison with "cp". # 4b. another is 'cq' for copying path file only. # 4c. yet another 'cr', 'cs'(see 5.). # # 5a. another path file(*.path) is unconditionally coming up with the copied target file. # 5b. use $2=='cr' to not only 'cq' but also add checksum of the target file in the path file. # 5c. use $2=='cs' to std-output file checksum; see 2b.-format; and no any copy event, only std-output. # # 6. note that generating the path file is the old-school function, kept for compt., except 1c., all done well via std-output. # required: # 1. apt install p7zip-full p7zip-rar # 2. in the /tmp/ dir for extracting files for temporary use, auto deleted after done. # known limitation: # 1. the search pattern is in grep format. # 2. search twice if the soft-link and its real-path are both in the searching-paths. # 3. multi-volume archive will cause 7z to exit. so rename existing ones are needed before running this script into completely. # 4. it is used by "7z -px" to skip password protected archives. # 5. it is used by "7z -aou" to auto rename repetitive extracted files. # 6. token 'null' is used, so it is not for pattern search. so is the 'cp', 'cq', 'cs', and 'cr' in $2. # 7. file descriptor 3 is used for identification purpose, so do not use it before running this script. # 8. except parameters $1, $2, $3 are for user, the other params are only for internal use: # "$4" passing a specific archive file for extracting and searching. # "$5" script path. # "$6" script name. # "$7" progressive path for inheriting. # 9. $2, the content pattern, must have the options in trailing. that is like as "the_pattern -opt1 -opt2". # 10. the target path for copying can not be in the searching path or it might be searched and cause loop. # 11a. $2 has 5 identifiers 'null' and 'c[pqrs]' can not be used, however, still could be the pattern by e.g., "nul[l]\{1\}". # 11b. note, weird: [grep nul[l]\{1\} -n --color a.txt] is failed and [this_script ".*a\.txt.*" "nul[l]\{1\} -n --color"] is success. # 11c. $1 needs full qualified name, that is, if "a.txt" is exactly the file to search, it still needs to be like as ".*a\.txt.*" # 11d. the 11c might be the reason of prefixed path. # 12a. known fault: the filenames to inspect if containing 1) comma "," 2) leading hyphen "-" etc., either made sed fault or bash fault. # 12b. this issue might be fixed beginning from ver.0.6. # example1: [this_script ".*\.pdf$"] # find out all the "pdf" files where locate including sub-dirs and within archive files. # # example2: detach the task from shell for free run. # [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown] # # example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"] # find out all the "*.c" files which have pattern '.*printf.*' in it. and # also dump out these line-number and the before 3 lines and the after 5 lines. # # example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target] # the same as example3 and copy matched files(with path files) to folder /home/user/Desktop/target/. # # example5: [this_script .*\.txt$ "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/] # # example6: [this_script .*\.txt$ cp /home/user/Desktop/temp/] # all the found *.txt files with .path files will directly copy to /home/user/Desktop/temp/. # # example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required. # # example8: [cat outputfile | sed '/^\(-\|=\)\+>\+\|^<\+=\+\|^[\(Open \)\(ERROR\)\(WARNINGS\)\(\[\[\[Found\)]\|^Is not archive/d;/^$/d;s/\(\/.*\/\)\([^\/]\+$\)/\2/'] # extract only filenames from result output. note, 7z-error/-open-error files may just not supported, so need to treat by other ways. # # example9: [this_script ".*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|bz2\|lzh\|txt\|ex[e_]\{1\}\|bat\|msi\|sys\|dll\|ocx\|bin\|pdf\|djvu\|html\?\|mht\|chm\|css\|js\|ps\|iso\|nrg\|gho\|cab\|avi\|rm\(vb\)\?\|jpe\?g\|bmp\|ico\|gif\|mp[34]\{1\}\|mpe\?g\|png\|doc\|ini\|hlp\|inf\|ttf\|pptx\?\|pps\|xlsx\?\|reg\|dat\|db\|bak\|log\|asm\|inc\|c\(pp\|xx\)\?\|h\(pp\)\?\|lib\)$" &> output.txt] # # example10: [find . -iname "*.path" -exec cat {} >> output.txt \;] if huge amount of files cause [cat * >> output.txt] failed. # # example11. [./this_script ".*\.\(c\|cpp\)$" "(?s)(int|void)[[:space:]]*main[[:space:]]*\([^\)]*\)[[:space:]]*{.* -izaPo --color" | sed 's/\x0//' > Dump.txt] # dig out all the main function definitions. # note the [sed 's/\x0//'], not only 1 occurrence for using this script however, only 1 occurrence when using single grep, that is, # [sed '$s/\x0//'] is enough. SERF_VER="0.7" lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root if [ $? -eq 0 ]; then # if not the root task if [ $# -eq 7 ]; then # if intends for fork-task inipath=$( pwd ) arcname="$( basename -- "$4" ).XXXXXXXX" arcpath=$( mktemp -t -d -- "$arcname" ) SCRIPT_DIR="$5" SCRIPT_NAME="$6" INI_DIR="$7" TMP_DIR="$arcpath" 7z -px -aou -o"$arcpath" x -- "$4" 2>/dev/null 1>/dev/null cd -- "$arcpath" else # something wrong exit 1 fi else # the root task, generates fd3 # trap prevents from aborted by user and $IFS not yet recovered IFS_OLD="$IFS" function for_trap_exit() { echo -e '\n\nuser abort\n'; [[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up; # be careful not to clean-up /tmp/ here or get into catastrophe since I met. exec 3>&-; exit 1; } trap for_trap_exit SIGINT SIGKILL SIGSEGV # wrong arguments if [ $# -lt 1 ] || [ $# -gt 3 ]; then echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]"; exit 1 fi # no 7z bin file 7z | grep -i "copyright" if [ $? -ne 0 ]; then echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first."; exit 1 fi # set global vars SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" ) TMP_DIR="presently working directory" INI_DIR="" inipath=$( pwd ) arcpath=$( pwd ) # rearrange arguments z1="$1" if [ $# -eq 1 ]; then z2='null' z3='null' elif [ $# -eq 2 ]; then if [ "$2" = 'null' ]; then echo '"null" not applicable.' exit 1 elif [ "$2" = cp ] || [ "$2" = cq ] || [ "$2" = cr ]; then echo "the absolute path is needed." exit 1 fi z2="$2" z3='null' else if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3="$3" echo "$z3" | grep -q "^/.*" if [ $? -eq 0 ] && [ -d "$z3" ]; then # try to padding "/" echo "$z3" | grep -q ".*/$" [[ $? -ne 0 ]] && z3="${z3}/" else echo; echo "the absolute path is needed."; echo; exit 1 fi fi set -- set "$z1" "$z2" "$z3" exec 3>&1 echo -e "\nscript version: $SERF_VER\n" echo "cmd: $SCRIPT_DIR/$SCRIPT_NAME '$1' '$2' '$3'" echo "pwd: $inipath" echo; echo $( date ); echo; elapse_time_b=$SECONDS fi if [ "$2" != $'null' ]; then # needs for context search or direct-copy find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx; do if [ "$2" = 'cs' ]; then # only standard output the checksum and the path # generate the path of the copied file # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name" # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${ctx##"$TMP_DIR"} echo -e "==========>>>>>\n[[[FoundHere]]]"; echo -e "$( sha1sum -b -- "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1/' )"; echo -e "${INI_DIR}${tmp_str}\n<<<<<==========\n"; # wrap by considering if too long path. # note, by the grep, the $2 not quoted for options injection. elif [ "$2" = 'cp' ] || [ "$2" = 'cq' ] || [ "$2" = 'cr' ] || ! ! grep $2 -- "$ctx"; then tmp_str=${ctx##"$TMP_DIR"} echo -e "==========>>>>>\n[[[FoundHere]]]"; echo -e "${INI_DIR}${tmp_str}\n<<<<<==========\n"; if [ "$3" != $'null' ]; then i=0; j=$( basename -- "$ctx" ) k=$j while [ -e "${3}${k}" ] || [ -e "${3}${k}.path" ]; do (( i=$i+1 )); k="$j($i)" done # 1) unconditionally path file; 2) match && nonempty $3 or 'cp' will copy target file; # 3) 'cr' for adding checksum in path file. tmp_1="" if [ "$2" = 'cr' ]; then tmp_1=$( sha1sum -b -- "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) elif [ "$2" != 'cq' ]; then cp -- "$ctx" "${3}${k}" # copy the matched file to the target folder fi echo "${tmp_1}${INI_DIR}${tmp_str}" > "${3}${k}.path" fi else echo "=====>>>>> $ctx"; fi done else # only for file name search find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the existing pdfs fi t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 ) IFS_OLD="$IFS" IFS=$'\n' for tt in $t; do echo "----->>>>> $tt" # the following line is incorrect but is ok since it through to the end(where meant to be) and "\lfile" meant to be not found. # the new replacement command however weird, when use 1 instead of 0 would go wrong. #ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' ) ttt=$( 7z -px l -- "$tt" | tac | sed '0,/fi/d' | tac | sed '0,/Dat/d' ) v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" ) if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside IFS="$IFS_OLD" # branch out then need recovery # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${tt##"$TMP_DIR"} "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' # branch in then need recovery elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search echo "$ttt" | grep -m1 -q -i "$1" if [ $? -eq 0 ]; then IFS="$IFS_OLD" # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${tt##"$TMP_DIR"} "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' fi else echo "$ttt" | grep -i "$1" fi done IFS="$IFS_OLD" if [ "$inipath" != "$arcpath" ]; then # if not the root cd -- "$inipath" rm -rf -- "$arcpath" else exec 3>&- # done the search and release fd3 echo; echo $( date ); echo; (( time_elapsed=$SECONDS-$elapse_time_b )); echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n"; fi # end of sh ``` # version 0.5r #### 以下為腳本內容 ```sh= #!/bin/bash # version 0.5r. 20230213. ken woo. copyleft. # from 0.2: # 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p']. # 2. found the $IFS bug but not yet resolved. # from 0.2a: # 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"]. # from 0.2b: # 1. added time elapsed. # 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match. # 3. added to generate a path file besides of a target file, hence this target file could be easily found manually. # from 0.3: only add some comments. # from 0.3r: # 1. fixed [sed -n '/Date/,/\lfile/p'] even it affects nothing(false negative). # 2a. changed the sed command into correct one since it sometimes failed match, [tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )]. # 2b. it only affects the path in "path file" which would lose some intermediate path-string. the correct one uses bash internal cmd. # 3. added "$2"=="cq" for only copying path files without target files. (since this way is simpler than outputing via std-out). # from 0.4: # 1. fixed [ while [ -e "${3}${k}" ] ] not concerning about path file. # 2. added "$2"=="cr" for adding sha1sum checksum file *.cksm along with path file. to separate is easy access, e.g., "cat *.cksm". # from 0.5: # 1. in [ tmp_1=$( sha1sum "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) ], # mind changed. use path file also to have checksum when 'cr' is specified. and for less disk-w and for the filename weakness. # purposes: # 1. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and # including which packed within archive files even within archive of archive of archive of... # # 2. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and # including options, e.g., $2==".*void +main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must). # # 3. the same as 2., and copy the matched files to the assigned absolute-directory by "$3". # # 4. the same as 3., but directly copies without specifying context pattern in $2, instead, set $2 as 'cp' for this purpose. # note, you can try ".*" in $2 in comparison with "cp". another is 'cq' for copying path file only. yet another 'cr'. # # 5a. another path file(*.path) is coming up with the copied target file which can be easily found manually via this path file. # 5b. use $2=='cr' to not only 'cq' but also add checksum in path file. # required: # 1. apt install p7zip-full p7zip-rar # 2. in the /tmp/ dir for extracting files for temporary use, auto deleted after done. # known limitation: # 1. the search pattern is in grep format. # 2. search twice if the soft-link and its real-path are both in the searching-paths. # 3. multi-volume archive will cause 7z to exit. so rename existing ones are needed before running this script into completely. # 4. it is used by "7z -px" to skip password protected archives. # 5. it is used by "7z -aou" to auto rename repetitive extracted files. # 6. token 'null' is used, so it is not for pattern search. so is the 'cp', 'cq', and 'cr' in $2. # 7. file descriptor 3 is used for identification purpose, so do not use it before running this script. # 8. except parameters $1, $2, $3 are for user, the other params are only for internal use: # "$4" passing a specific archive file for extracting and searching. # "$5" script path. # "$6" script name. # "$7" progressive path for inheriting. # 9. $2, the content pattern, must have the options in trailing. that is like as "the_pattern -opt1 -opt2". # 10. the target path for copying can not be in the searching path or it might be searched and cause loop. # 11a. $2 has 4 identifiers 'null' and 'c[pqr]' can not be used, however, still could be the pattern by e.g., "nul[l]\{1\}". # 11b. note, weird: [grep nul[l]\{1\} -n --color a.txt] is failed and [this_script ".*a\.txt.*" "nul[l]\{1\} -n --color"] is success. # 11c. $1 needs full qualified name, that is, if "a.txt" is exactly the file to search, it still needs to be like as ".*a\.txt.*" # 11d. the 11c might be the reason of prefixed path. # 12. known fault: the filenames to inspect if containing 1) comma "," 2) leading hyphen "-" etc., either made sed fault or grep fault. # example1: [this_script ".*\.pdf$"] # find out all the "pdf" files where locate including sub-dirs and within archive files. # # example2: detach the task from shell for free run. # [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown] # # example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"] # find out all the "*.c" files which have pattern '.*printf.*' in it. and # also dump out these line-number and the before 3 lines and the after 5 lines. # # example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target] # the same as example3 and copy matched files to folder /home/user/Desktop/target/. # # example5: [this_script .*\.txt$ "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/] # # example6: [this_script .*\.txt$ cp /home/user/Desktop/temp/] # all the found *.txt files with .path files will directly copy to /home/user/Desktop/temp/. # # example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required. # # example8: [cat outputfile | sed '/^-\+>\+\|^[\(Open \)\(ERROR\)\(WARNINGS\)]\|^Is not archive/d;/^$/d;s/\(\/.*\/\)\([^\/]\+$\)/\2/'] # extract only filenames from result output. note, 7z-error/-open-error files may just not supported, so need to treat by other ways. # # example9: [this_script ".*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|bz2\|lzh\|txt\|ex[e_]\{1\}\|bat\|msi\|sys\|dll\|ocx\|bin\|pdf\|djvu\|html\?\|mht\|chm\|css\|js\|ps\|iso\|nrg\|gho\|cab\|avi\|rm\(vb\)\?\|jpe\?g\|bmp\|ico\|gif\|mp[34]\{1\}\|mpe\?g\|png\|doc\|ini\|hlp\|inf\|ttf\|pptx\?\|pps\|xlsx\?\|reg\|dat\|db\|bak\|log\|asm\|inc\|c\(pp\|xx\)\?\|h\(pp\)\?\|lib\)$" &> output.txt] SERF_VER="0.5r" lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root if [ $? -eq 0 ]; then # if not the root task if [ $# -eq 7 ]; then # if intends for fork-task inipath=$( pwd ) arcname="$( basename "$4" ).XXXXXXXX" arcpath=$( mktemp -t -d "$arcname" ) SCRIPT_DIR="$5" SCRIPT_NAME="$6" INI_DIR="$7" TMP_DIR="$arcpath" 7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null cd "$arcpath" else # something wrong exit 1 fi else # the root task, generates fd3 # trap prevents from aborted by user and $IFS not yet recovered IFS_OLD="$IFS" function for_trap_exit() { echo -e '\n\nuser abort\n'; [[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up; # be careful not to clean-up /tmp/ here or get into catastrophe since I met. exec 3>&-; exit 1; } trap for_trap_exit SIGINT SIGKILL SIGSEGV # wrong arguments if [ $# -lt 1 ] || [ $# -gt 3 ]; then echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]"; exit 1 fi # no 7z bin file 7z | grep -i "copyright" if [ $? -ne 0 ]; then echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first."; exit 1 fi # set global vars SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" ) TMP_DIR="presently working directory" INI_DIR="" inipath=$( pwd ) arcpath=$( pwd ) # rearrange arguments z1="$1" if [ $# -eq 1 ]; then z2='null' z3='null' elif [ $# -eq 2 ]; then if [ "$2" = 'null' ]; then echo '"null" not applicable.' exit 1 elif [ "$2" = cp ] || [ "$2" = cq ] || [ "$2" = cr ]; then echo "the absolute path is needed." exit 1 fi z2="$2" z3='null' else if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3="$3" echo "$z3" | grep -q "^/.*" if [ $? -eq 0 ] && [ -d "$z3" ]; then # try to padding "/" echo "$z3" | grep -q ".*/$" [[ $? -ne 0 ]] && z3="${z3}/" else echo; echo "the absolute path is needed."; echo; exit 1 fi fi set -- set "$z1" "$z2" "$z3" exec 3>&1 echo -e "\nscript version: $SERF_VER\n" echo "cmd: $SCRIPT_DIR/$SCRIPT_NAME '$1' '$2' '$3'" echo "pwd: $inipath" echo; echo $( date ); echo; elapse_time_b=$SECONDS fi if [ "$2" != $'null' ]; then # needs for context search or direct-copy find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx; do # note, by the grep, the $2 not quoted for options injection [[ "$2" = 'cp' ]] || [[ "$2" = 'cq' ]] || [[ "$2" = 'cr' ]] || grep $2 -- "$ctx" if [ $? -eq 0 ]; then echo "$ctx [[[Found Here]]]"; echo -e "----------------\n"; if [ "$3" != $'null' ]; then i=0; j=$( basename -- "$ctx" ) k=$j while [ -e "${3}${k}" ] || [ -e "${3}${k}.path" ]; do (( i=$i+1 )); k="$j($i)" done [[ "$2" = 'cp' ]] && cp "$ctx" "${3}${k}" # copy the matched file to the target folder # generate the path of the copied file # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name" # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${ctx##"$TMP_DIR"} tmp_1="" [[ "$2" = 'cr' ]] && tmp_1=$( sha1sum "$ctx" | sed 's/^\([0-9a-fA-F]\+\).*/\1\ /' ) echo "${tmp_1}${INI_DIR}${tmp_str}" > "${3}${k}.path" fi fi done else # only for file name search find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the existing pdfs fi t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 ) IFS_OLD="$IFS" IFS=$'\n' for tt in $t; do echo -e "----->>>>> $tt\n" # the following line is incorrect but is ok since it through to the end(where meant to be) and "\lfile" meant to be not found. # the new replacement command however weird, when use 1 instead of 0 would go wrong. #ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' ) ttt=$( 7z -px l "$tt" | tac | sed '0,/fi/d' | tac | sed '0,/Dat/d' ) v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" ) if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside IFS="$IFS_OLD" # branch out then need recovery # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${tt##"$TMP_DIR"} "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' # branch in then need recovery elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search echo "$ttt" | grep -m1 -q -i "$1" if [ $? -eq 0 ]; then IFS="$IFS_OLD" # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${tt##"$TMP_DIR"} "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' fi else echo "$ttt" | grep -i "$1" fi done IFS="$IFS_OLD" if [ "$inipath" != "$arcpath" ]; then # if not the root cd "$inipath" rm -rf "$arcpath" else exec 3>&- # done the search and release fd3 echo; echo $( date ); echo; (( time_elapsed=$SECONDS-$elapse_time_b )); echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n"; fi # end of sh ``` # version 0.4 ```sh= #!/bin/bash # version 0.4. 20230210. ken woo. copyleft. # from 0.2: # 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p']. # 2. found the $IFS bug but not yet resolved. # from 0.2a: # 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"]. # from 0.2b: # 1. added time elapsed. # 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match. # 3. added to generate a path file besides of a target file, hence this target file could be easily found manually. # from 0.3: only add some comments. # from 0.3r: # 1. fixed [sed -n '/Date/,/\lfile/p'] even it affects nothing(false negative). # 2a. changed the sed command into correct one since it sometimes failed match, [tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' )]. # 2b. it only affects the path in "path file" which would lose some intermediate path-string. the correct one uses bash internal cmd. # 3. added "$2"=="cq" for only copying path files without target files. (since this way is simpler than outputing via std-out). # purposes: # 1. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and # including which packed within archive files even within archive of archive of archive of... # # 2. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and # including options, e.g., $2==".*void +main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must). # # 3. the same as 2., and copy the matched files to the assigned absolute-directory by "$3". # # 4. the same as 3., but directly copies without specifying context pattern in $2, instead, set $2 as 'cp' for this purpose. # note, you can try ".*" in $2 in comparison with "cp". another is 'cq' for copying path file only. # # 5. another path file(*.path) is coming up with the copied target file which can be easily found manually via this path file. # required: # 1. apt install p7zip-full p7zip-rar # 2. in the /tmp/ dir for extracting files for temporary use, auto deleted after done. # known limitation: # 1. the search pattern is in grep format. # 2. search twice if the soft-link and its real-path are both in the searching-paths. # 3. multi-volume archive will cause 7z to exit. so rename existing ones are needed before running this script into completely. # 4. it is used by "7z -px" to skip password protected archives. # 5. it is used by "7z -aou" to auto rename repetitive extracted files. # 6. token 'null' is used, so it is not for pattern search. so is the 'cp', and 'cq' in $2. # 7. file descriptor 3 is used for identification purpose, so do not use it before running this script. # 8. except parameters $1, $2, $3 are for user, the other params are only for internal use: # "$4" passing a specific archive file for extracting and searching. # "$5" script path. # "$6" script name. # "$7" progressive path for inheriting. # 9. $2, the content pattern, must have the options in trailing. that is like as "the_pattern -opt1 -opt2". # 10. the target path for copying can not be in the searching path or it might be searched and cause loop. # 11a. $2 has 3 identifiers 'null' and 'cp' and 'cq' can not be used, however, still could be the pattern by e.g., "nul[l]\{1\}". # 11b. note, weird: [grep nul[l]\{1\} -n --color a.txt] is failed and [this_script ".*a\.txt.*" "nul[l]\{1\} -n --color"] is success. # 11c. $1 needs full qualified name, that is, if "a.txt" is exactly the file to search, it still needs to be like as ".*a\.txt.*" # 11d. the 11c might be the reason of prefixed path. # 12. known fault: the filenames to inspect if containing 1) comma "," 2) leading hyphen "-" etc., either made sed fault or grep fault. # example1: [this_script ".*\.pdf$"] # find out all the "pdf" files where locate including sub-dirs and within archive files. # # example2: detach the task from shell for free run. # [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown] # # example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"] # find out all the "*.c" files which have pattern '.*printf.*' in it. and # also dump out these line-number and the before 3 lines and the after 5 lines. # # example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target] # the same as example3 and copy matched files to folder /home/user/Desktop/target/. # # example5: [this_script .*\.txt$ "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/] # # example6: [this_script .*\.txt$ cp /home/user/Desktop/temp/] # all the found *.txt files with .path files will directly copy to /home/user/Desktop/temp/. # # example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required. lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root if [ $? -eq 0 ]; then # if not the root task if [ $# -eq 7 ]; then # if intends for fork-task inipath=$( pwd ) arcname="$( basename "$4" ).XXXXXXXX" arcpath=$( mktemp -t -d "$arcname" ) SCRIPT_DIR="$5" SCRIPT_NAME="$6" INI_DIR="$7" TMP_DIR="$arcpath" 7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null cd "$arcpath" else # something wrong exit 1 fi else # the root task, generates fd3 # trap prevents from aborted by user and $IFS not yet recovered IFS_OLD="$IFS" function for_trap_exit() { echo -e '\n\nuser abort\n'; [[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up; # be careful not to clean-up /tmp/ here or get into catastrophe since I met. exec 3>&-; exit 1; } trap for_trap_exit SIGINT SIGKILL SIGSEGV # wrong arguments if [ $# -lt 1 ] || [ $# -gt 3 ]; then echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]"; exit 1 fi # no 7z bin file 7z | grep -i "copyright" if [ $? -ne 0 ]; then echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first."; exit 1 fi # set global vars SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" ) TMP_DIR="presently working directory" INI_DIR="" inipath=$( pwd ) arcpath=$( pwd ) # rearrange arguments z1="$1" if [ $# -eq 1 ]; then z2='null' z3='null' elif [ $# -eq 2 ]; then if [ "$2" = 'null' ]; then echo '"null" not applicable.' exit 1 elif [ "$2" = cp ] || [ "$2" = cq ]; then echo "the absolute path is needed." exit 1 fi z2="$2" z3='null' else if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3="$3" echo "$z3" | grep -q "^/.*" if [ $? -eq 0 ] && [ -d "$z3" ]; then # try to padding "/" echo "$z3" | grep -q ".*/$" [[ $? -ne 0 ]] && z3="${z3}/" else echo; echo "the absolute path is needed."; echo; exit 1 fi fi set -- set "$z1" "$z2" "$z3" exec 3>&1 echo; echo $( date ); echo; elapse_time_b=$SECONDS fi if [ "$2" != $'null' ]; then # needs for context search or direct-copy find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx; do # note, by the grep, the $2 not quoted for options injection [[ "$2" = 'cp' ]] || [[ "$2" = 'cq' ]] || grep $2 -- "$ctx" if [ $? -eq 0 ]; then echo "$ctx [[[Found Here]]]"; echo -e "----------------\n"; if [ "$3" != $'null' ]; then i=0; j=$( basename -- "$ctx" ) k=$j while [ -e "${3}${k}" ]; do (( i=$i+1 )); k="$j($i)" done [[ "$2" != 'cq' ]] && cp "$ctx" "${3}${k}" # copy the matched file to the target folder # generate the path of the copied file # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name" # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${ctx##"$TMP_DIR"} echo "${INI_DIR}${tmp_str}" > "${3}${k}.path" fi fi done else # only for file name search find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the existing pdfs fi t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 ) IFS_OLD="$IFS" IFS=$'\n' for tt in $t; do echo -e "----->>>>> $tt\n" # the following line is incorrect but is ok since it through to the end(where meant to be) and "\lfile" meant to be not found. # the new replacement command however weird, when use 1 instead of 0 would go wrong. #ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' ) ttt=$( 7z -px l "$tt" | tac | sed '0,/fi/d' | tac | sed '0,/Dat/d' ) v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" ) if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside IFS="$IFS_OLD" # branch out then need recovery # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${tt##"$TMP_DIR"} "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' # branch in then need recovery elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search echo "$ttt" | grep -m1 -q -i "$1" if [ $? -eq 0 ]; then IFS="$IFS_OLD" # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions #tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) # the previous sed command sometimes causes failed match since some special chars still not escape, # so changed to the following line tmp_str=${tt##"$TMP_DIR"} "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' fi else echo "$ttt" | grep -i "$1" fi done IFS="$IFS_OLD" if [ "$inipath" != "$arcpath" ]; then # if not the root cd "$inipath" rm -rf "$arcpath" else exec 3>&- # done the search and release fd3 echo; echo $( date ); echo; (( time_elapsed=$SECONDS-$elapse_time_b )); echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n"; fi # end of sh ``` # version 0.3r ```sh= #!/bin/bash # version 0.3r. 20230207. ken woo. copyleft. # from 0.2: # 1. bug fixed [sed -n '/Date/,/files/p'] to [sed -n '/Date/,/\lfile/p']. # 2. found the $IFS bug but not yet resolved. # from 0.2a: # 1. the altering $IFS was changed into read -r for grep processing [grep $2 -- "$ctx"]. # from 0.2b: # 1. added time elapsed. # 2. added a 'cp' identifier for use in $2 if want to copy to $3 directly without context match just only filename match. # 3. added to generate a path file besides of a target file, hence this target file could be easily found manually. # from 0.3: only add some comments. # purposes: # 1. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and # including which packed within archive files even within archive of archive of archive of... # # 2. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and # including options, e.g., $2==".*void +main\(.*\) -i -n -B5 -A5". (note, the options in trailing is must). # # 3. the same as 2., and copy the matched files to the assigned absolute-directory by "$3". # # 4. the same as 3., but directly copies without specifying context pattern in $2, instead, set $2 as 'cp' for this purpose. # note, you can try ".*" in $2 in comparison with "cp". # # 5. another path file(*.path) is coming up with the copied target file which can be easily found manually via this path file. # required: # 1. apt install p7zip-full p7zip-rar # 2. in the /tmp/ dir for extracting files for temporary use, auto deleted after done. # known limitation: # 1. the search pattern is in grep format. # 2. search twice if the soft-link and its real-path are both in the searching-paths. # 3. multi-volume archive will cause 7z to exit. so rename existing ones are needed before running this script into completely. # 4. it is used by "7z -px" to skip password protected archives. # 5. it is used by "7z -aou" to auto rename repetitive extracted files. # 6. token 'null' is used, so it is not for pattern search. so is the 'cp' in $2. # 7. file descriptor 3 is used for identification purpose, so do not use it before running this script. # 8. except parameters $1, $2, $3 are for user, the other params are only for internal use: # "$4" passing a specific archive file for extracting and searching. # "$5" script path. # "$6" script name. # "$7" progressive path for inheriting. # 9. $2, the content pattern, must have the options in trailing. that is like as "the_pattern -opt1 -opt2". # 10. the target path for copying can not be in the searching path or it might be searched and cause loop. # 11. $2 has 2 identifiers 'null' and 'cp' can not be used, however, still could be the pattern by e.g., "nul[l]\{1\}". # 12. known fault: the filenames to inspect if containing 1) comma "," 2) leading hyphen "-" etc., either made sed fault or grep fault. # example1: [this_script ".*\.pdf$"] # find out all the "pdf" files where locate including sub-dirs and within archive files. # # example2: detach the task from shell for free run. # [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown] # # example3: [this_script ".*\.c$" ".*printf.* -n -B3 -A5"] # find out all the "*.c" files which have pattern '.*printf.*' in it. and # also dump out these line-number and the before 3 lines and the after 5 lines. # # example4: [this_script ".*\.c$" ".*printf.* -n -B3 -A5" /home/user/Desktop/target] # the same as example3 and copy matched files to folder /home/user/Desktop/target/. # # example5: [this_script .*\.txt$ "[^[:blank:]]\+Group[[:blank:]]\{1\} -n -i -A5 -B4 --color" /home/user/Desktop/temp/] # # example6: [this_script .*\.txt$ cp /home/user/Desktop/temp/] # all the found *.txt files will directly copy to /home/user/Desktop/temp/. # # example7: "the_copied_target.file" has a companion file "the_copied_target.file.path" unconditionally. so, rm *.path if not required. lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root if [ $? -eq 0 ]; then # if not the root task if [ $# -eq 7 ]; then # if intends for fork-task inipath=$( pwd ) arcname="$( basename "$4" ).XXXXXXXX" arcpath=$( mktemp -t -d "$arcname" ) SCRIPT_DIR="$5" SCRIPT_NAME="$6" INI_DIR="$7" TMP_DIR="$arcpath" 7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null cd "$arcpath" else # something wrong exit 1 fi else # the root task, generates fd3 # trap prevents from aborted by user and $IFS not yet recovered IFS_OLD="$IFS" function for_trap_exit() { echo -e '\n\nuser abort\n'; [[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up; # be careful not to clean-up /tmp/ here or get into catastrophe since I met. exec 3>&-; exit 1; } trap for_trap_exit SIGINT SIGKILL SIGSEGV # wrong arguments if [ $# -lt 1 ] || [ $# -gt 3 ]; then echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]"; exit 1 fi # no 7z bin file 7z | grep -i "copyright" if [ $? -ne 0 ]; then echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first."; exit 1 fi # set global vars SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" ) TMP_DIR="presently working directory" INI_DIR="" inipath=$( pwd ) arcpath=$( pwd ) # rearrange arguments z1="$1" if [ $# -eq 1 ]; then z2='null' z3='null' elif [ $# -eq 2 ]; then if [ "$2" = 'null' ]; then echo '"null" not applicable.' exit 1 elif [ "$2" = 'cp' ]; then echo "the absolute path is needed." exit 1 fi z2="$2" z3='null' else if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3="$3" echo "$z3" | grep -q "^/.*" if [ $? -eq 0 ] && [ -d "$z3" ]; then # try to padding "/" echo "$z3" | grep -q ".*/$" [[ $? -ne 0 ]] && z3="${z3}/" else echo; echo "the absolute path is needed."; echo; exit 1 fi fi set -- set "$z1" "$z2" "$z3" exec 3>&1 echo; echo $( date ); echo; elapse_time_b=$SECONDS fi if [ "$2" != $'null' ]; then # needs for context search or direct-copy find -L ~+ -type f -regextype grep -iregex "$1" | while read -r ctx; do # note, by the grep, the $2 not quoted for options injection [[ "$2" = 'cp' ]] || grep $2 -- "$ctx" if [ $? -eq 0 ]; then echo "$ctx [[[Found Here]]]"; echo -e "----------------\n"; if [ "$3" != $'null' ]; then i=0; j=$( basename -- "$ctx" ) k=$j while [ -e "${3}${k}" ]; do (( i=$i+1 )); k="$j($i)" done cp "$ctx" "${3}${k}" # copy the matched file to the target folder # generate the path of the copied file # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions tmp_str=$( echo "$ctx" | sed 's,'"$TMP_DIR"',,' ) # remove the "/tmp/arc-name" echo "${INI_DIR}${tmp_str}" > "${3}${k}.path" fi fi done else # only for file name search find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the existing pdfs fi t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 ) IFS_OLD="$IFS" IFS=$'\n' for tt in $t; do echo -e "----->>>>> $tt\n" ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' ) v=$( echo "$ttt" | grep -m1 -q -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" ) if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside IFS="$IFS_OLD" # branch out then need recovery # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' # branch in then need recovery elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search echo "$ttt" | grep -m1 -q -i "$1" if [ $? -eq 0 ]; then IFS="$IFS_OLD" # note the current file is in /tmp/arc-name/path-in-arc, so we need path-in-arc string # however it might be in the $PWD instead, so use sed to cover both conditions tmp_str=$( echo "$tt" | sed 's,'"$TMP_DIR"',,' ) "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" "${INI_DIR}${tmp_str}" IFS=$'\n' fi else echo "$ttt" | grep -i "$1" fi done IFS="$IFS_OLD" if [ "$inipath" != "$arcpath" ]; then # if not the root cd "$inipath" rm -rf "$arcpath" else exec 3>&- # done the search and release fd3 echo; echo $( date ); echo; (( time_elapsed=$SECONDS-$elapse_time_b )); echo -e "\nit took $(( $time_elapsed / 60 )) minute(s) $(( $time_elapsed % 60 )) seconds\n"; fi # end of sh ``` ## 以下是開發歷程 # grep 問題求助 壓縮檔 test01.zip 是測試用的。工作目錄最好在葉節點,並且有存放此壓縮檔。 ``` 功能: 可搜尋工作目錄及子目錄下的,列出所有有 $1 之 grep 型式的 pattern 的檔案。且位於壓縮檔內,的再壓縮檔內的檔案都可被列出來。 其次若再指定 $2 pattern,則還會去撈檔案內容。且此 $2 可使用像 "'tool' -i -n -v -A5" 包含了 grep 選項; 不過選項必須像這樣後綴。 再次之若再指定 $3,則會將滿足 $2 的那些檔案複製到 $3 指定的絕對路徑,若檔案重覆會自動改名。 bug 出現在 143 行. grep $2 "$ctx" ############### the bug is here ############################################################### #grep $2 "$ctx" # note the $2 not quoted for options injection 自己的除錯: 經將 grep $2 "$ctx" 展開來/dump 各變數內容,並沒有發現有什麼異樣,不過 grep 的結果就是失敗的/回傳 1. 而試過在該處寫展開的, 例如 grep 'tool' /home/ken/Desktop/1.txt 是可以成功的。 例如 grep 'tool' -i -n -A5 /home/ken/Desktop/1.txt 是可以成功的。 可直接拿壓縮檔內的 serf.sh.debug 來試,其與 serf.sh 內容相同,只是我追加了一些除錯的行。 ``` ``` ex1: ./serf.sh ".*\.txt$" ex2: ./serf.sh ".*\.txt$\|.*\.debug$" ex3: ./serf.sh ".*\.txt$\|.*\.debug$" "tool" ex4: ./serf.sh ".*\.txt$\|.*\.debug$" "'tool'" (bug) ex5: ./serf.sh ".*\.txt$\|.*\.debug$" "'tool' -i -n" (bug) ``` 以下是腳本內容 ```sh= #!/bin/bash # version 0.2. 20230204. copyleft. # purposes: # 1. dig out all the "$1" files(only show results of file path/name) under current working dir/subdir and # including which packed within archive files even within archive of archive of archive of... # # 2. the same as 1., and also search contents of the matched files by pattern "$2", which is also in grep style pattern and # including options, e.g., $2=="'.*void +main\(.*\)' -i -n -B5 -A5". (note, the options in trailing is must). # # 3. the same as 2., and copy the matched files to the assigned absolute-directory by "$3". # required: # 1. apt install p7zip-full p7zip-rar # 2. in the /tmp/ dir for extracting files for temporary use, auto removed after done. # known limitation: # 1. the search pattern is in grep format. # 2. search twice if the soft-link and its real-path are both in the searching-paths. # 3. multi-volume archive will cause 7z to exit. so rename existing ones are needed before running this script for completely. # 4. it is used by "7z -px" to skip password protected archives. # 5. it is used by "7z -aou" to auto rename repetitive copied files. # 6. token 'null' is used, so it is not for pattern search. # 7. file descriptor 3 is used for identification purpose, so do not use it before running this script. # 8. except parameters $1, $2, $3 are for user, the other params are only for internal use: # "$4" passing a specific archive file for extracting and searching. # "$5" script path. # "$6" script name. # 9. $2, the content pattern, must have the options in trailing. that is like "'the_pattern' -opt1 -opt2". # example1: [this_script ".*\.pdf$"] # find out all the "pdf" files where locate including sub-dirs and within archive files. # # example2: detach the task from shell for free run. # [sudo nohup this_sh ".*\.pdf$\|.*\.doc$\|.*\.chm$\|.*\.djvu$\|.*\.pptx\?$\|.*\.pps$\|.*\.xlsx\?$\|.*\.mht$" &> ~/Output.txt & disown] # # example3: [this_script ".*\.c$" "'.*printf.*' -n -B3 -A5"] # find out all the "*.c" files which have pattern '.*printf.*' in it. and # also dump out these line-number and the before 3 lines and the after 5 lines. # # example4: [this_script ".*\.c$" "'.*printf.*' -n -B3 -A5" /home/user/Desktop/target] # the same as example3 and copy matched files to folder /home/user/Desktop/target/. lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root if [ $? -eq 0 ]; then # if not the root task if [ $# -eq 6 ]; then # if intends for fork-task inipath=$( pwd ) arcname="$( basename "$4" ).XXXXXXXX" arcpath=$( mktemp -t -d "$arcname" ) SCRIPT_DIR="$5" SCRIPT_NAME="$6" 7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null cd "$arcpath" else # something wrong exit 1 fi else # the root task, generates fd3 # trap prevents from aborted by user and $IFS not yet recovered IFS_OLD="$IFS" function for_trap_exit() { echo -e '\n\nuser abort\n'; [[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up; exec 3>&-; exit 1; } trap for_trap_exit SIGINT SIGKILL # wrong arguments if [ $# -lt 1 ] || [ $# -gt 3 ]; then echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]"; exit 1 fi # no 7z bin file 7z | grep -i "copyright" if [ $? -ne 0 ]; then echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first."; exit 1 fi # set global vars SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" ) INI_DIR=$( pwd ) inipath=$( pwd ) arcpath=$( pwd ) # rearrange arguments z1="$1" if [ $# -eq 1 ]; then z2='null' z3='null' elif [ $# -eq 2 ]; then if [ "$2" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3='null' else if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3="$3" echo "$z3" | grep -q "^/.*" if [ $? -eq 0 ] && [ -d "$z3" ]; then # try to padding "/" echo "$z3" | grep -q ".*/$" [[ $? -ne 0 ]] && z3="${z3}/" else echo; echo "the absolute path is needed."; echo; exit 1 fi fi set -- set "$z1" "$z2" "$z3" exec 3>&1 echo; echo $( date ); echo; fi if [ "$2" != $'null' ]; then # needs for context search indivCS=$( find -L ~+ -type f -regextype grep -iregex "$1" ) IFS_OLD="$IFS" IFS=$'\n' for ctx in $indivCS; do grep $2 "$ctx" ############### the bug is here ############################################################### #grep $2 "$ctx" # note the $2 not quoted for options injection if [ $? -eq 0 ]; then echo "$ctx [[[Found Here]]]"; echo -e "----------------\n"; if [ "$3" != $'null' ]; then i=0; j=$( basename -- "$ctx" ) k=$j while [ -e "${3}${k}" ]; do (( i=$i+1 )); k="$j($i)" done cp "$ctx" "${3}${k}" fi fi done IFS="$IFS_OLD" else # only for file name search find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the existing pdfs fi t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 ) IFS_OLD="$IFS" IFS=$'\n' for tt in $t; do echo -e "----->>>>> $tt\n" ttt=$( 7z -px l "$tt" | sed -n '/Date/,/files/p' ) v=$( echo "$ttt" | grep -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" ) if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside IFS="$IFS_OLD" # branch out then need recovery "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" IFS=$'\n' # branch in then need recovery elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search echo "$ttt" | grep -m1 -q -i "$1" if [ $? -eq 0 ]; then IFS="$IFS_OLD" "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" IFS=$'\n' fi else echo "$ttt" | grep -i "$1" fi done IFS="$IFS_OLD" if [ "$inipath" != "$arcpath" ]; then # if not the root cd "$inipath" rm -rf "$arcpath" else exec 3>&- # done the search and release fd3 echo; echo $( date ); echo; fi # end of sh ``` ## 以下修正了一處bug。 ## 另外也初步發現 grep 的異常行為是$IFS在作崇。 以下腳本中有特別標明出來。 但還不是解法,可能還需再想想把該段落怎麼改寫吧。 或是採用有前輩提出的將 pattern & options 分開成兩個引數的做法。 ```shell #!/bin/bash # version 0.2a. 20230205. lsof -a -p $$ -d 3 2>/dev/null | grep -i -q "\ 3w\ \|\ 3u\ " # use file descriptor 3 for recognizing root if [ $? -eq 0 ]; then # if not the root task if [ $# -eq 6 ]; then # if intends for fork-task inipath=$( pwd ) arcname="$( basename "$4" ).XXXXXXXX" arcpath=$( mktemp -t -d "$arcname" ) SCRIPT_DIR="$5" SCRIPT_NAME="$6" 7z -px -aou -o"$arcpath" x "$4" 2>/dev/null 1>/dev/null cd "$arcpath" else # something wrong exit 1 fi else # the root task, generates fd3 # trap prevents from aborted by user and $IFS not yet recovered IFS_OLD="$IFS" function for_trap_exit() { echo -e '\n\nuser abort\n'; [[ "$IFS_OLD" != "$IFS" ]] && IFS="$IFS_OLD" && echo clean up; # be careful not to clean-up /tmp/ here or get into catastrophe since I met. exec 3>&-; exit 1; } trap for_trap_exit SIGINT SIGKILL # wrong arguments if [ $# -lt 1 ] || [ $# -gt 3 ]; then echo; echo "usage: ${0} \"filename pattern for search\" [\"context pattern in file\" [\"path to extract if match\"] ]"; exit 1 fi # no 7z bin file 7z | grep -i "copyright" if [ $? -ne 0 ]; then echo; echo "utility 7z is needed. please try \"apt install p7zip-full p7zip-rar\" first."; exit 1 fi # set global vars SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) SCRIPT_NAME=$( basename -- "${BASH_SOURCE[0]}" ) INI_DIR=$( pwd ) inipath=$( pwd ) arcpath=$( pwd ) # rearrange arguments z1="$1" if [ $# -eq 1 ]; then z2='null' z3='null' elif [ $# -eq 2 ]; then if [ "$2" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3='null' else if [ "$2" = 'null' ] || [ "$3" = 'null' ]; then echo '"null" not applicable.' exit 1 fi z2="$2" z3="$3" echo "$z3" | grep -q "^/.*" if [ $? -eq 0 ] && [ -d "$z3" ]; then # try to padding "/" echo "$z3" | grep -q ".*/$" [[ $? -ne 0 ]] && z3="${z3}/" else echo; echo "the absolute path is needed."; echo; exit 1 fi fi set -- set "$z1" "$z2" "$z3" exec 3>&1 echo; echo $( date ); echo; fi if [ "$2" != $'null' ]; then # needs for context search indivCS=$( find -L ~+ -type f -regextype grep -iregex "$1" ) IFS_OLD="$IFS" IFS=$'\n' for ctx in $indivCS; do IFS=$IFS_OLD ########################## temporarily added ################################################ grep $2 -- "$ctx" ############### the bug is here ############################################################### #exec grep $2 -- "$ctx" #grep $2 "$ctx" # note the $2 not quoted for options injection if [ $? -eq 0 ]; then echo "$ctx [[[Found Here]]]"; echo -e "----------------\n"; if [ "$3" != $'null' ]; then i=0; j=$( basename -- "$ctx" ) k=$j while [ -e "${3}${k}" ]; do (( i=$i+1 )); k="$j($i)" done cp "$ctx" "${3}${k}" fi fi IFS=$'\n' ################################## temporarily added ######################################## done IFS="$IFS_OLD" else # only for file name search find -L ~+ -type f -regextype grep -iregex "$1" | sed 's/\(.*\)/\"\1\"/' | xargs -n1 # list all the existing pdfs fi t=$( find -L ~+ -type f -regextype sed -iregex '.*\.\(zip\|rar\|ace\|arj\|t\?gz\|tar\|z\|7z\|xz\|bz2\|lzh\|ex[e_]\{1\}\|iso\)' | sed 's/\(.*\)/\"\1\"/' | xargs -n1 ) IFS_OLD="$IFS" IFS=$'\n' for tt in $t; do echo -e "----->>>>> $tt\n" ttt=$( 7z -px l "$tt" | sed -n '/Date/,/\lfile/p' ) v=$( echo "$ttt" | grep -i "\.zip$\|\.rar$\|\.ace$\|\.arj$\|\.t\?gz$\|\.tar$\|\.z$\|\.7z$\|\.xz$\|\.bz2$\|\.lzh$\|\.ex[e_]\{1\}$\|\.iso$" ) if [ $? -eq 0 ]; then # need to extract further for advanced search since another package inside IFS="$IFS_OLD" # branch out then need recovery "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" IFS=$'\n' # branch in then need recovery elif [ "$2" != $'null' ]; then # need to extract further if at least one match for context search echo "$ttt" | grep -m1 -q -i "$1" if [ $? -eq 0 ]; then IFS="$IFS_OLD" "$SCRIPT_DIR"/"$SCRIPT_NAME" "$1" "$2" "$3" "$tt" "$SCRIPT_DIR" "$SCRIPT_NAME" IFS=$'\n' fi else echo "$ttt" | grep -i "$1" fi done IFS="$IFS_OLD" if [ "$inipath" != "$arcpath" ]; then # if not the root cd "$inipath" rm -rf "$arcpath" else exec 3>&- # done the search and release fd3 echo; echo $( date ); echo; fi # end of sh ``` # 以下腳本移除了$IFS對grep所造成的影響 ## 並採用了 read 的方式做讀取 目前的測試結果都正常了。 所以這應該是終版了。 感謝慢慢來大大及口烏大大對此腳本的建議與幫助修正 另外我也把 grep sed 二者的 regex 搞混了,grep 的 pattern 是不需要加引號的,因為 grep 的空格是用 [:blank:] 來表示的。 所以前面的示例請把引號拿掉。 ``` #!/bin/bash # deleted ```