
Two files:

data1

Name |formula |no. |dose|days|cost |msg|em|notes
Fname-Lname|BXXXT+GG |8262|4 |14 |57.78 | | |sq
Fname-Lname|SJXXT+GG |8263|4¾ |14 |105.15| | |IB
Fname-Lname|FJDHT+BH,LG,CQC,ZX|8264|5¾ |14 |46.20 | | |IB

data2

10/12/2020|more-data-3456|105.15
10/12/2020|more-data-3456|95.10
11/12/2020|more.data-3456|30.30
14/12/2020|more-data-3456|45.55

I am using this code snippet:

awk 'BEGIN {FS = "|" } NR==FNR{a[$6];next} $3 in a {print $0}' data1 data2 

The goal is to find values in $6 of file data1 that also occur in $3 of file data2 and, where there is a match, print the whole matching record ($0) from data2. I am expecting:

10/12/2020|more-data-3456|105.15 

But I am only getting a blank line as output. When I replaced the "|" field separators with " ", the command worked exactly as expected; however, I really want to keep | as the field separator if at all possible. I would like to understand why adding the BEGIN block has caused this. Has it caused awk to load an empty array instead of taking data from $6? My awk level is just above beginner.

Edit: I have also used the -F parameter with the same result, a blank line of output. I am using gawk.

  • What about if you do awk 'BEGIN {FS = "|" } NR==FNR{a[$6+0];next} $3+0 in a {print $0}' data1 data2? Please also post the output of the command file data[12].
    Commented Mar 22, 2022 at 11:01
  • Could you edit the question to show the result of printing the files with cat -vet? This visually shows any non-graphic characters.
    Commented Mar 22, 2022 at 15:19

2 Answers

  1. You probably have DOS line endings, see why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it, so remove that if present (I'm removing that and any other trailing spaces with the sub() in my script below).
  2. If you're getting a blank line of output then you have blank lines in each of your input files. I'd bet, though, that you aren't REALLY getting a blank line and are instead getting the 1 line of output you should get for 105.15, with the CR at the end moving the cursor back to the start of the line so the output overwrites itself. Pipe the output to | cat -v to see if that's true.
  3. Your input has blanks before and after the |s in some places, so you should set FS to match: FS=" *[|] *"
  4. You don't need to write {print $0} as that's the default behavior
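One way to check the points above is to make the invisible characters visible. This is only a minimal sketch on a hypothetical one-line sample: the file name sample1, the blank after 105.15, and the CR line ending are all assumptions for illustration, not the asker's real data.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)

# Hypothetical sample line: the cost field is padded with one blank and the
# line ends in CR LF (a DOS line ending). Both are assumptions.
printf 'Fname-Lname|SJXXT+GG |8263|4 |14 |105.15 | | |IB\r\n' > "$tmp/sample1"

# Wrap $6 in brackets so any padding inside the field becomes visible.
bracketed=$(awk -F'|' '{print "[" $6 "]"}' "$tmp/sample1")
echo "$bracketed"    # prints [105.15 ]

# cat -v renders a carriage return as ^M at the end of the line.
visible=$(cat -v "$tmp/sample1")
echo "$visible"      # prints Fname-Lname|SJXXT+GG |8263|4 |14 |105.15 | | |IB^M

rm -rf "$tmp"
```

If the bracketed field shows a blank before the closing bracket, the separator needs to absorb blanks; if cat -v shows ^M, the file has DOS line endings.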

Try this:

awk 'BEGIN{FS=" *[|] *"} {sub(/[[:space:]]+$/,"")} NR==FNR{a[$6];next} $3 in a' data1 data2 
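To compare the two separators side by side, here is a hedged sketch using made-up, padded sample files. The names d1 and d2 are hypothetical stand-ins for data1 and data2, and the padding after the cost values is an assumption, since the exact whitespace in the real files isn't visible in the question.

```shell
tmp=$(mktemp -d)

# Hypothetical stand-in for data1: the cost column is padded with blanks.
cat > "$tmp/d1" <<'EOF'
Name|formula|no.|dose|days|cost   |msg|em|notes
Fname-Lname|BXXXT+GG|8262|4|14|57.78  | | |sq
Fname-Lname|SJXXT+GG|8263|4|14|105.15 | | |IB
EOF

# Hypothetical stand-in for data2: no padding around the values.
cat > "$tmp/d2" <<'EOF'
10/12/2020|more-data-3456|105.15
10/12/2020|more-data-3456|95.10
EOF

# Literal "|" separator: $6 keeps its padding ("105.15 "), so nothing matches.
plain=$(awk 'BEGIN{FS="|"} NR==FNR{a[$6];next} $3 in a' "$tmp/d1" "$tmp/d2")

# Separator that absorbs surrounding blanks: $6 is "105.15" and matches.
fixed=$(awk 'BEGIN{FS=" *[|] *"} NR==FNR{a[$6];next} $3 in a' "$tmp/d1" "$tmp/d2")

echo "plain: [$plain]"   # prints plain: []
echo "fixed: [$fixed]"   # prints fixed: [10/12/2020|more-data-3456|105.15]

rm -rf "$tmp"
```

The printed record still contains its original |s because $0 is only rebuilt when a field is assigned to, which this script never does.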
  • Your solution worked, thanks! I removed your sub() call and it still worked, so no DOS line endings. What I don't understand is why awk needs to be told about anything between the field separators in this instance and not in others. For example, awk 'BEGIN {FS="|"} 1 ' ~/awk_tests/data1 prints the entire file with no problems. Piping the original command to cat -v produced one blank line. Is this expected? I'm just trying to get to the bottom of the problem to improve my understanding of awk.
    – ajr_chm
    Commented Mar 22, 2022 at 23:14
  • Your values in data1 are not just |-separated. Consider |57.78 |: the value you want is 57.78, but that's not what's between the |s, it's 57.78<blank>. To get only 57.78 you need to tell awk to include blanks as part of the separator, hence FS=" *[|] *". Yes, awk 'BEGIN {FS="|"} 1 ' prints the whole file, as you're not accessing any fields and so not using the FS that you're setting. A blank line of output just means you had blank lines in the input.
    – Ed Morton
    Commented Mar 23, 2022 at 14:28

Your code works as-is for me, both with GNU awk 5.1.0 and with macOS awk 20200816.

Which version of awk are you using?

Note that you can also set the field separator with the -F command-line parameter; if you do that, then the BEGIN block is unnecessary.
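As a small illustration of that equivalence, the sketch below splits one made-up line both ways. The sample line and the variable names are hypothetical, and the separator shown is one that also absorbs blanks around the |.

```shell
# Hypothetical input line with blanks around the first separator.
line='a | b|c'

# Field separator set in a BEGIN block.
with_begin=$(printf '%s\n' "$line" | awk 'BEGIN{FS=" *[|] *"} {print $2}')

# Same separator given with -F on the command line; no BEGIN block needed.
with_flag=$(printf '%s\n' "$line" | awk -F' *[|] *' '{print $2}')

echo "$with_begin"   # prints b
echo "$with_flag"    # prints b
```

Either form sets FS before the first record is read, so the choice between them is a matter of style.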

  • I should have put that in the question. I used the -F command-line parameter. Same result: an empty line.
    – ajr_chm
    Commented Mar 22, 2022 at 10:22
  • Which version of awk?
    Commented Mar 22, 2022 at 10:23
  • I am using gawk.
    – ajr_chm
    Commented Mar 22, 2022 at 10:25
  • That's still not a version number, but never mind. Is the data2 file perhaps in DOS/Windows mode? gawk doesn't seem to like that (I get no output if I convert it to CRLF line endings, though it doesn't seem to be affected by such changes in data1).
    Commented Mar 22, 2022 at 10:30
  • GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0). The file is not in DOS/Windows format. It was created in Gvim on Ubuntu 20.04.
    – ajr_chm
    Commented Mar 22, 2022 at 10:31
