First, I will describe my task:

  • I have several disks
  • I have a list of phrases (words) in a patterns.txt file that I use as search patterns
  • I must search the disks for files containing these patterns
  • Copy the files found to a separate location on another disk
  • Erase the source disk

I came up with this solution:

  1. Find folders and files using the patterns file and save the list to a file:

    grep -ril -f /home/user/patterns.txt /media/user/source-disk | tee /home/user/listfiles.txt

  2. Find PDF files (because grep skips PDFs) using the patterns file and append them to the list file:

    pdfgrep -ril -f /home/user/patterns.txt /media/user/source-disk | tee -a /home/user/listfiles.txt

  3. Copy the files found to the external disk2:

    cp -i --preserve=all -v `cat /home/user/listfiles.txt` /home/user/disk2

  4. Use ShredOS to wipe source-disk

Now I encountered a problem:

In the listfiles.txt results some folders and files contain spaces = that means that cp reports a mistake (which seems correct)

I found help here and added one more step, so the procedure now looks like this:

  1. Find folders and files whose names contain spaces and replace the spaces with underscores:

    find /media/user/source-disk -iname "* *" | while read file; do mv "$file" ${file// /_}; done

  2. Find folders and files using the patterns file and save the list to a file:

    grep -ril -f /home/user/patterns.txt /media/user/source-disk | tee /home/user/listfiles.txt

  3. Find PDF files (because grep skips PDFs) using the patterns file and append them to the list file:

    pdfgrep -ril -f /home/user/patterns.txt /media/user/source-disk | tee -a /home/user/listfiles.txt

  4. Copy the files found to the external disk2:

    cp -i --preserve=all -v `cat /home/user/listfiles.txt` /home/user/disk2

  5. Use ShredOS to wipe source-disk

My questions:

  1. Does it all look correct?
  2. Did I forget anything? (After erasing, I won't be able to go back and fix my mistakes)
  3. Is it better to use the script from here instead of find -iname "* *" | while...?

Script:

    #!/bin/bash
    # set -o xtrace # uncomment for debugging

    declare weirdchars=" &\'"

    function normalise_and_rename() {
      declare -a list=("${!1}")
      for fileordir in "${list[@]}";
      do
        newname="${fileordir//[${weirdchars}]/_}"
        [[ ! -a "$newname" ]] && \
          mv "$fileordir" "$newname" || \
          echo "Skipping existing file, $newname."
      done
    }

    declare -a dirs files

    while IFS= read -r -d '' dir; do
      dirs+=("$dir")
    done < <(find -type d -print0 | sort -z)

    normalise_and_rename dirs[@]

    while IFS= read -r -d '' file; do
      files+=("$file")
    done < <(find -type f -print0 | sort -z)

    normalise_and_rename files[@]

  • Advantage: it also handles strange characters other than spaces
  • Disadvantage: I can't make the script work on a disk of my choice (for example, a USB drive). It works only on my main boot disk
  1. Can anyone help adapt this script so that it works on the disk I specify?
  2. Can I do all the work in one command, without problems, using &&?

    find /media/user/source-disk -iname "* *" | while read file; do mv "$file" ${file// /_}; done && grep -ril -f /home/user/patterns.txt /media/user/source-disk | tee /home/user/listfiles.txt && pdfgrep -ril -f /home/user/patterns.txt /media/user/source-disk | tee -a /home/user/listfiles.txt && cp -i --preserve=all -v `cat /home/user/listfiles.txt` /home/user/disk2

  • 3
    "some folders and files contain spaces = that means that cp reports a mistake" - this is indicative of broken code. Why not just fix that?
    Commented 20 hours ago
  • 1
    "Can I do all the work in one command" - why would you want to sacrifice readability just so you can avoid hitting «Enter»?
    Commented 18 hours ago
  • Please reduce this to a single problem (e.g. how to find files with spaces in their names OR how to search for "patterns" within files) with a minimal, reproducible example of the problem and your attempt to solve that problem so we can then help you. If after getting an answer you then need help with another problem then ask a new, separate question about THAT problem. See How to Ask.
    – Ed Morton
    Commented 17 hours ago
  • Actually, given what you say under "My questions:", it sounds like you might be better posting this at codereview.stackexchange.com/questions/tagged/bash.
    – Ed Morton
    Commented 17 hours ago

1 Answer

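    find . -type f -exec grep -iqf ../patterns.txt {} \; -exec cp --parents -t ../target {} +

The grep test ends with \; so that it runs once per file and its exit status filters the results; an -exec terminated with + always returns true and would let every file through to cp.

The --parents switch makes cp recreate each file's parent directories under the target if needed.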
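
A minimal sketch of what --parents does with a path that contains blanks, assuming GNU cp. The directory and file names are made up for illustration, and everything happens in a throwaway temporary directory:

```shell
# Throwaway playground; all names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/source/my docs/sub dir" "$demo/target"
echo "secret phrase" > "$demo/source/my docs/sub dir/note one.txt"

cd "$demo/source" || exit 1

# --parents (GNU cp) appends the whole source path to the target directory,
# creating "my docs/sub dir" under ../target on the way.
cp --parents -t ../target "./my docs/sub dir/note one.txt"

# List what ended up in the target.
find ../target -type f
```

Because the relative path is kept, two files with the same name in different source folders do not overwrite each other in the target.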

For testing, I just used ../target and ../patterns.txt - you'll have to adjust that for your needs.

Searching with -iname "* *" instead of -name "* *" is generally unnecessary, since the pattern "* *" contains no letters whose upper or lower case could be ignored, but it probably doesn't cost you anything either.

find already iterates over the files itself, so there is no need for a while loop (assuming you use GNU find, which I suspect, as it is the default find on Linux). More importantly, find knows whether a space is part of a filename or a separator between filenames.
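
If the list-file approach from the question is kept, renaming is still not required. A sketch, assuming GNU grep and GNU cp and that no filename contains a newline: read listfiles.txt line by line instead of expanding it with backticks. The demo paths are hypothetical stand-ins for the real ones.

```shell
# Throwaway playground; all names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/source/dir with space" "$demo/disk2"
echo "alpha" > "$demo/source/dir with space/file one.txt"
echo "beta"  > "$demo/source/other.txt"
printf 'alpha\n' > "$demo/patterns.txt"

# Same kind of listing as in the question: one matching path per line.
grep -ril -f "$demo/patterns.txt" "$demo/source" > "$demo/listfiles.txt"

# IFS= and -r keep blanks and backslashes in each line intact; quoting "$f"
# stops the shell from splitting the name at spaces. "--" guards against
# names that start with a dash.
while IFS= read -r f; do
    cp --preserve=all --parents -t "$demo/disk2" -- "$f"
done < "$demo/listfiles.txt"
```

With absolute paths in the list, --parents recreates the full path below the target directory, so the copy lands in disk2 followed by the original absolute path.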

I would suggest testing the command first, and wiping or shredding only after carefully checking that it worked as intended.
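
One possible check before wiping, sketched under the assumption of GNU find (which substitutes {} inside a longer argument) and the same ../target and ../patterns.txt layout as above: list every matching source file that has no byte-identical copy in the target. The demo files are made up, and one copy is deliberately left out.

```shell
# Throwaway playground; all names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/source/a b" "$demo/target/a b"
echo "alpha" > "$demo/source/a b/one.txt"
cp "$demo/source/a b/one.txt" "$demo/target/a b/one.txt"
echo "alpha" > "$demo/source/missing.txt"   # deliberately not copied
printf 'alpha\n' > "$demo/patterns.txt"

cd "$demo/source" || exit 1

# For each file that matches a pattern, cmp -s succeeds only if an identical
# file exists at the same relative path under ../target. The '!' inverts
# that test, so -print lists only files that are missing or differ.
problems=$(find . -type f -exec grep -iqf ../patterns.txt {} \; \
    ! -exec cmp -s {} ../target/{} \; -print)

printf '%s\n' "${problems:-all matching files verified}"
```

An empty result means every matching file has an identical copy; anything printed should be looked at before the source disk is erased.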

I didn't include a call to pdfgrep because it isn't installed on my system. To keep it simple, you can run two variants of the command: one with -iname "*.pdf" ... pdfgrep ... and one with -not -iname "*.pdf" ... grep ....
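
A sketch of the two variants, assuming GNU find. The pdfgrep pass is untested here: its -i and -f options are taken from the question, and -q is assumed to behave like grep's quiet mode, so check the pdfgrep manual before relying on it. The pass is skipped when pdfgrep is not installed. Demo names are hypothetical.

```shell
# Throwaway playground; all names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/source/my docs" "$demo/target"
echo "alpha" > "$demo/source/my docs/note.txt"
printf 'alpha\n' > "$demo/patterns.txt"

cd "$demo/source" || exit 1

# Pass 1: everything that is not a PDF is tested with plain grep.
# grep runs once per file (\;) so its exit status filters what reaches cp.
find . -type f -not -iname "*.pdf" \
    -exec grep -iqf ../patterns.txt {} \; \
    -exec cp --parents -t ../target {} +

# Pass 2: PDFs are tested with pdfgrep, if it is available.
# ASSUMPTION: pdfgrep -q suppresses output and reports matches via exit status.
if command -v pdfgrep >/dev/null 2>&1; then
    find . -type f -iname "*.pdf" \
        -exec pdfgrep -iq -f ../patterns.txt {} \; \
        -exec cp --parents -t ../target {} +
fi
```

Splitting by extension keeps each file going through exactly one search tool, so nothing is copied twice.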

I tested with standard Linux commands, using subdirectories and files with and without blanks in their names.
