4

I am trying to use the results of "file", which returns info about the file(s) I specify.

E.g.

file *.doc 'all .doc extensions 

This then returns information about the file, including "Name of Creating Application: Microsoft Word". Now I am trying to search the results for the string "Word". I'm stuck here. How do I actually do that?

*This is what I tried after hours of searching. I just don't know what word I'm looking for. Please advise.

find . -type f -print0 | xargs -0 grep -lh "Microsoft Word" | xargs -I % mv % ../NewDirectory/ 

I think this searches for the string "Word" inside the files themselves and moves them to the new directory.

2
  • so then are you saying you're trying to run the file command on any files that have a .doc extension?
    – Centimane
    Commented Sep 29, 2015 at 11:44
  • @Dave Yes, I am trying to run the "file" command on .doc extensions only and then move the files that will return "Microsoft Word" string. I do this because some files have wrong extensions (xls, ppt).
    – A. Mist
    Commented Oct 1, 2015 at 1:51

2 Answers 2

2

If I understand correctly, you want to move files from the current directory and its subdirectories recursively to another directory, but only if the file command reports them as “Microsoft Word” files. That is, you're interested in the files for which file "$filename" | grep 'Microsoft Word' produces some output.

An easy way is to take things calmly and do it file by file. If you only want the files in the current directory, you can use a for loop and a wildcard pattern:

for f in *.doc; do
  if …
done

What's the condition? We want to test if Microsoft Word appears in the output of file "$f". I use file -- to protect against files whose name begins with -.

for f in *.doc; do
  if file -- "$f" | grep -q 'Microsoft Word'; then
    …
  fi
done

All we need to do is add the command to move the files.

for f in *.doc; do
  # grep -q prints nothing; only its exit status is used
  if file -- "$f" | grep -q 'Microsoft Word'; then
    mv -- "$f" ../NewDirectory/
  fi
done
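
Before moving anything, it can help to list what would be moved. Here is a minimal sketch of such a dry run. The function name list_word_docs is made up for this illustration, and it assumes your file prints "Microsoft Word" for these documents as in the question. It uses file -b (brief mode), which leaves the file name out of the output, so a name that itself contains "Microsoft Word" cannot cause a false match.

```shell
# list_word_docs is a hypothetical helper name, not part of any tool.
# It prints the .doc files in the current directory that `file`
# describes as Microsoft Word, without moving anything.
list_word_docs() {
    for f in *.doc; do
        # With no matching files the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        if file -b -- "$f" | grep -q 'Microsoft Word'; then
            printf '%s\n' "$f"
        fi
    done
}
```

Run list_word_docs in the directory that holds the files and check the list before running the loop that calls mv.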

If you want to look for files in subdirectories as well, use the ** wildcard pattern for recursive globbing. In bash, it needs to be activated with shopt -s globstar (in ksh93, you need set -o globstar, and in zsh it works out of the box; other shells lack this feature). Beware that bash ≤4.2 follows symbolic links to directories.

for f in **/*.doc; do
  if file -- "$f" | grep -q 'Microsoft Word'; then
    mv -- "$f" ../NewDirectory/
  fi
done

Note that all moved files end up in ../NewDirectory/; no subdirectories are created. If you want to reproduce the directory tree, you can use string manipulation constructs to extract the directory part of the file name and mkdir -p to create the target directory if necessary.

for f in ./**/*.doc; do
  if file "$f" | grep -q 'Microsoft Word'; then
    d="${f%/*}"
    mkdir -p ../NewDirectory/"$d"
    mv "$f" ../NewDirectory/"$d"
  fi
done
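
To see what the ${f%/*} expansion in that loop produces, here is a small standalone illustration. The path is made up for the example; ${f##*/} is shown only for comparison and is not used in the loop.

```shell
f=./reports/2015/summary.doc   # example path, made up for illustration
d="${f%/*}"                    # remove the shortest trailing "/..." part
b="${f##*/}"                   # remove the longest leading ".../" part
printf 'dir=%s\nbase=%s\n' "$d" "$b"
# prints:
# dir=./reports/2015
# base=summary.doc
```

This is also why the loop iterates over ./**/*.doc rather than **/*.doc: with the leading ./ there is always a slash to cut at, so a file directly in the current directory gives d=. instead of the whole file name.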

Rather than parse the output of file, which is somewhat fragile, you might prefer to parse file -i, which prints standardized MIME type strings.
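
A minimal sketch of that idea follows. The helper name is_msword is made up, and two things are assumptions: that your file supports -b and --mime-type (which prints only the type, without the charset part), and that it reports legacy .doc files as application/msword. Check what your version prints for a known Word file before relying on it.

```shell
# is_msword is a hypothetical helper name used only for this sketch.
# Assumption: this version of `file` supports -b and --mime-type and
# reports legacy Word documents as application/msword.
is_msword() {
    [ "$(file -b --mime-type -- "$1")" = application/msword ]
}

# Possible use in the loop from above (commented out, since it moves files):
# for f in *.doc; do
#     if is_msword "$f"; then mv -- "$f" ../NewDirectory/; fi
# done
```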

2
  • Thanks for explaining in detail! You just saved my ass. Also, about the subdirectories I didn't need it but it wouldn't hurt to learn! :) Great Community! I hope I can help next time.
    – A. Mist
    Commented Oct 1, 2015 at 2:09
  • @A.Mist if this answers your question, you should accept it as the answer.
    – Centimane
    Commented Oct 1, 2015 at 11:01
2

Your first file example will not work because of the unmatched single quote, but I think you already found that because of your second example.

If you do:

find . -type f 

you can look at the output. Those are filenames. If you want to select something from that output, use grep directly:

find . -type f | grep "Microsoft Word" 

That searches through the filenames, not through the contents of the files listed. It is not completely accurate, as a filename could have a newline in it, making the output incomplete if a filename with "Microsoft Word" in it has a newline as part of the name.

If you do:

find . -type f -print0 | xargs -0 grep -lh "Microsoft Word" 

the xargs part actually hands the filenames to grep (the -print0 for find and the -0 for xargs are there to handle filenames with newlines). This searches for the whole string "Microsoft Word", not just "Word", in the files.
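
The difference between the two pipelines can be seen in a scratch directory. This is only an illustration: the file names and contents are made up, and it assumes an xargs that supports -0.

```shell
# Scratch directory with made-up file names, removed at the end.
tmp=$(mktemp -d)
printf 'Created with Microsoft Word\n' > "$tmp/report.txt"
printf 'nothing relevant\n' > "$tmp/Microsoft Word notes.txt"

# Matches on the path only: finds "./Microsoft Word notes.txt".
by_name=$(cd "$tmp" && find . -type f | grep 'Microsoft Word')

# Matches on the contents: finds "./report.txt".
by_content=$(cd "$tmp" && find . -type f -print0 | xargs -0 grep -l 'Microsoft Word')

printf 'by name:    %s\nby content: %s\n' "$by_name" "$by_content"
rm -rf "$tmp"
```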

The -lh specified for grep lists the filenames. There might be a problem with that, as newlines in filenames are printed normally, so you should continue to use NUL-terminated filenames by specifying -Z as well. If you don't specify -l, you get the contents of the matching lines as well, which would make further processing (your mv) impossible.

If you want to move all the files into one directory it is often easier to use mv -t instead of mucking with xargs' -I option (which allows you to put parameters xargs reads from its input at a different place than the default end-of-line but which is slower as mv is called once for each file):

find . -type f -print0 | xargs -0 grep -lhZ "Microsoft Word" | xargs -0 mv -t ../NewDirectory/ 

This moves all files somewhere under the current directory that have "Microsoft Word" in their content to the NewDirectory that is next to the current directory. Please note that ../NewDirectory has to exist.
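
If you want to try the pipeline safely first, it can be wrapped in a small function and run in a scratch directory. This is a sketch under assumptions, not a definitive version: the name move_matching is made up, and it relies on GNU grep (-Z), GNU xargs (-r, which skips the command when there is no input) and GNU mv (-t). The destination should be outside the directory being searched, as ../NewDirectory is, so that find does not pick up files that have already been moved.

```shell
# move_matching is a hypothetical helper name used only for this sketch.
# Usage: move_matching PATTERN DESTDIR
# Assumes GNU grep (-Z), GNU xargs (-r) and GNU mv (-t).
move_matching() {
    # Create the destination first, since mv -t needs it to exist.
    mkdir -p -- "$2" || return
    find . -type f -print0 |
        xargs -0 -r grep -lZ -e "$1" -- |
        xargs -0 -r mv -t "$2" --
}
```

For example, move_matching 'Microsoft Word' ../NewDirectory run from the directory to search does the same as the one-liner, except that it creates the target directory and does nothing when no file matches.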

2
  • Thank you. Does this use the results of "file" command or will it produce a similar output? I am curious.
    – A. Mist
    Commented Oct 1, 2015 at 2:02
  • @A.Mist I am not 100% sure what you mean by "this". If you are referring to the grep -Z part, then yes: it produces a list of NUL-terminated filenames, just like find -print0 does. The complete sequence produces nothing as output, it just moves the files. If you are new at this, it often helps to make a few files with known simple content ('aa', 'ab', or 'bb') in a small directory hierarchy under a new directory and try the commands out. Make it simple enough that you know what should happen if you run such a sequence and can check the results.
    – Anthon
    Commented Oct 1, 2015 at 4:41
