If I have a number of files like the following:

file1:

123
456
789
012

file2:

line1 922
line2 392
line3 456
line5 291
line6 201
...

file3:

line1 111
line2 123
line3 19
line5 542
line6 456
...

What's the best way to get all of the lines in file1 that are contained in a line of both file2 and file3?

In this example, it would be just:

456 
  • are lines just these numbers or is there anything more in them?
    – FelixJN
    Aug 19, 2015 at 8:56
  • @Fiximan it's in this format, but with longer numbers and different text instead of line1, etc.
    – galois
    Aug 19, 2015 at 21:43

4 Answers

grep -of file1 file2|xargs -I {} grep -o "{}" file3 

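The first grep treats each line of file1 as a pattern (-f file1) and searches file2 for it; because of -o it prints only the text that matched, not the whole line. xargs then runs grep -o once per match against file3, again printing only the matched text, so the output is the set of file1 strings that occur in both file2 and file3.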
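To make the two stages visible, here is a minimal sketch that recreates the sample data from the question in a scratch directory and runs each stage separately. The directory and the variable names `stage1` and `result` are illustrative assumptions, not part of the original answer.

```shell
# Recreate the question's sample data in a scratch directory (illustrative).
dir=$(mktemp -d)
printf '%s\n' 123 456 789 012 > "$dir/file1"
printf '%s\n' 'line1 922' 'line2 392' 'line3 456' 'line5 291' 'line6 201' > "$dir/file2"
printf '%s\n' 'line1 111' 'line2 123' 'line3 19' 'line5 542' 'line6 456' > "$dir/file3"

# Stage 1: every line of file1 is a pattern; -o prints only the matched text.
stage1=$(grep -of "$dir/file1" "$dir/file2")

# Stage 2: search file3 for each stage-1 match, again printing only the match.
result=$(printf '%s\n' "$stage1" | xargs -I {} grep -o "{}" "$dir/file3")

printf 'stage 1: %s\nresult:  %s\n' "$stage1" "$result"
rm -r "$dir"
```

Note that 123 appears in file3 but is dropped, because it never made it through stage 1.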

  • Follow up - is there a way that this would still work if, for example, by excluding substrings in the first part of a line of file2 that matches the line in file1 being searched for? For example - if file2 had 1000 entries, and we wanted to find all occurrences of 400, it would show up at least once in file2 - in the first column (line400). Would there be a way to exclude that? maybe with regex, to make sure the text being matched comes after the tab/space?
    – galois
    Aug 19, 2015 at 9:23
  • Yes, by pre-processing the contents of file1 with sed to put a space in front of each word: sed 's/^/ /' file1|grep -of - file2|xargs -I {} grep -o "{}" file3 or, slightly more elegantly: grep -o "$(sed 's/^/ /' file1|grep -of - file2)" file3. This will now match ' 400' but not a bare '400' (as in 'line400').
    – gogoud
    Aug 19, 2015 at 13:31
  • That's a nice idea - but it unfortunately returns nothing in my terminal
    – galois
    Aug 21, 2015 at 4:08
  • It works perfectly for me with sed (GNU sed) 4.2.2 and grep (GNU grep) 2.20. Check for typos, especially that the sed expression is 's/^/ /'. The way stackexchange shows the code (breaking the line at the space) may have misled you.
    – gogoud
    Aug 22, 2015 at 5:29
  • I think the problem was that in file2 the columns are separated by a tab instead of a space. sed 's/^/\t/' seems to return output that is at least roughly correct. Thanks for the idea.
    – galois
    Aug 22, 2015 at 20:31
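Another way to sidestep the substring problem raised in these comments is to compare the second column exactly instead of pattern matching. This is a swapped-in technique (awk rather than grep), sketched under the assumption that every line of file2 and file3 has two whitespace-separated fields (spaces or tabs both work). The sample data below is hypothetical.

```shell
# Hypothetical data: 400 occurs only inside "line400", 456 is a real value.
dir=$(mktemp -d)
printf '%s\n' 400 456 > "$dir/file1"
printf 'line400\t922\nline3\t456\n' > "$dir/file2"
printf 'line400\t111\nline6\t456\n' > "$dir/file3"

# awk reads the three files in order; FILENAME says which one a line came from.
result=$(awk '
    FILENAME == ARGV[1] { want[$1]; next }                  # remember each file1 value
    FILENAME == ARGV[2] { if ($2 in want) in2[$2]; next }   # file2: exact match on field 2
    ($2 in in2) && !seen[$2]++ { print $2 }                 # file3: print each hit once
' "$dir/file1" "$dir/file2" "$dir/file3")

printf '%s\n' "$result"
rm -r "$dir"
```

Because only field 2 is compared, 400 is not reported even though "line400" contains it. Results come out in file3 order rather than file1 order.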

You could use join twice in a row:

join -1 1 -2 2 -o 1.1 <(join -1 1 -2 2 <(sort file1) <(sort -k2 file2)) <(sort -k2 file3) 

Prints only:

456 

First look at the inner join. It joins file1 and file2 by using the field 1 in file1 and the field 2 in file2.

The result of that is then joined with file3 in the same way. Note that join requires each input to be sorted on its join field, hence the sort and sort -k2 calls.
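To see what each join produces, the same steps can be run with intermediate files. This is only a sketch: it uses temporary files instead of process substitution so that it also runs in a plain POSIX sh, and the scratch directory and the names `inner` and `result` are illustrative assumptions.

```shell
export LC_ALL=C    # keep sort and join agreeing on collation order
dir=$(mktemp -d)
printf '%s\n' 123 456 789 012 > "$dir/file1"
printf '%s\n' 'line1 922' 'line2 392' 'line3 456' 'line5 291' 'line6 201' > "$dir/file2"
printf '%s\n' 'line1 111' 'line2 123' 'line3 19' 'line5 542' 'line6 456' > "$dir/file3"

# join needs each input sorted on the field it joins on.
sort     "$dir/file1" > "$dir/f1.sorted"
sort -k2 "$dir/file2" > "$dir/f2.sorted"
sort -k2 "$dir/file3" > "$dir/f3.sorted"

# Inner join: field 1 of file1 against field 2 of file2.
inner=$(join -1 1 -2 2 "$dir/f1.sorted" "$dir/f2.sorted")
printf '%s\n' "$inner" > "$dir/inner"

# Outer join: field 1 of that result against field 2 of file3; -o 1.1 keeps only the number.
result=$(join -1 1 -2 2 -o 1.1 "$dir/inner" "$dir/f3.sorted")

printf 'inner:  %s\nresult: %s\n' "$inner" "$result"
rm -r "$dir"
```

The inner join prints the join field first, which is why the second join can again use field 1 of its left-hand input.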

    also(){ sed 'h;                 #save a copy of the line before edits
            s/[]$\./*^[]/\\&/g;     #literally quote any metachars
            s|.*|/&/c\\|p;          #print first half of sed command
            g;                      #get original copy out of hold space
            s/\\/&&/g;' |           #double-up backslashes
            sed -nf - -- "$@"       #read stdin script -file
    }

    That function takes a pattern file on stdin and one or more search files as arguments. It writes to its output every line from the pattern file that can be matched in the search files. It is careful to reproduce the original line exactly each time, and because it does, its output can be fed straight back in as the pattern file of another call:

    also <file1 file2 | also file3 

    456 

      grep should be enough for this:

       grep -o "`grep -of file1 file2`" file3 

      The inner command, grep -of file1 file2, prints every string from file1 that occurs in file2; the outer grep then searches file3 for those strings.

      • this doesn't work as written because it needs the -o grep option thus: grep -o "$(grep -of file1 file2)" file3
        – gogoud
        Aug 19, 2015 at 6:19
      • Got it. It will print the whole line.
        Aug 19, 2015 at 6:22
