awk matching fields in 2 separate files using command containing {BEGIN FS = "|"} returns output of blank lines

Question

Two files: data1

Name |formula |no. |dose|days|cost |msg|em|notes
Fname-Lname|BXXXT+GG |8262|4 |14 |57.78 | | |sq
Fname-Lname|SJXXT+GG |8263|4¾ |14 |105.15| | |IB
Fname-Lname|FJDHT+BH,LG,CQC,ZX|8264|5¾ |14 |46.20 | | |IB

data2

10/12/2020|more-data-3456|105.15
10/12/2020|more-data-3456|95.10
11/12/2020|more.data-3456|30.30
14/12/2020|more-data-3456|45.55

I am using the code snippet

awk 'BEGIN {FS = "|" } NR==FNR{a[$6];next} $3 in a {print $0}' data1 data2

The aim is to find values in $6 of file data1 that also occur in $3 of file data2 and, where there is a match, print the whole matching record ($0) from data2. I am expecting:

10/12/2020|more-data-3456|105.15

But I only get a blank line of output. When I replaced the "|" field separators in the files with " ", the command worked exactly as expected; however, I really want to keep | as the field separator if at all possible. I would like to understand why adding a BEGIN block has caused this. Has it caused awk to load an empty array instead of taking data from $6? My awk level is just above beginner. Edit: I have also used the -F parameter with the same result, a blank line of output. I am using gawk.

What about if you do awk 'BEGIN {FS = "|" } NR==FNR{a[$6+0];next} $3+0 in a {print $0}' data1 data2? Please also post the output of the command file data[12]. – αғsнιη, Commented Mar 22, 2022 at 11:01
Could you edit the question to show the result of printing the files with cat -vet? This visually shows any non-graphic characters. – Paul_Pedant, Commented Mar 22, 2022 at 15:19

Ed Morton · Accepted Answer · 2022-03-22 13:53:09Z

1. You probably have DOS line endings (see why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it), so remove them if present. The sub() in my script below removes them along with any other trailing white space.
2. If you're getting a blank line of output then you have blank lines in each of your input files, but I'd bet you aren't REALLY getting a blank line of output. Instead you're getting the 1 line of output you should get for 105.15, but the CR at the end moves the cursor back to the start of the line so the text gets overwritten. Pipe the output to | cat -v to see if that's true.
3. Your input has blanks before and after the |s in some places, so you should set FS to match: FS=" *[|] *"
4. You don't need to write {print $0} as that's the default behavior.

Try this:

awk 'BEGIN{FS=" *[|] *"} {sub(/[[:space:]]+$/,"")} NR==FNR{a[$6];next} $3 in a' data1 data2
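
A minimal sketch of the separator point above, assuming two small scratch files. The data1 row is copied from the question; the data2 row is hypothetical and was made up so that it pairs with the padded 57.78 value. The variable names are illustrative only.

```shell
#!/bin/sh
# Scratch copies of the inputs (data2 row is a made-up example, not from the question).
tmp=$(mktemp -d)
printf '%s\n' 'Fname-Lname|BXXXT+GG |8262|4 |14 |57.78 | | |sq' > "$tmp/data1"
printf '%s\n' '10/12/2020|more-data-3456|57.78' > "$tmp/data2"

# Literal "|" separator: $6 of data1 is "57.78 " (with a trailing blank),
# so the key "57.78" from data2 is never found in the array.
literal_out=$(awk 'BEGIN{FS="|"} NR==FNR{a[$6];next} $3 in a' "$tmp/data1" "$tmp/data2")

# Separator that absorbs the blanks around each "|": $6 becomes "57.78".
regex_out=$(awk 'BEGIN{FS=" *[|] *"} NR==FNR{a[$6];next} $3 in a' "$tmp/data1" "$tmp/data2")

printf 'literal FS: [%s]\n' "$literal_out"
printf 'regex FS:   [%s]\n' "$regex_out"
rm -rf "$tmp"
```

With the literal separator the brackets come out empty; with the regex separator they contain the data2 record.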

Your solution worked, thanks! I removed your sub() call and it still worked, so no DOS line endings. What I don't understand is why awk needs to be told about anything between the field separators in this instance and not in others. For example, awk 'BEGIN {FS="|"} 1 ' ~/awk_tests/data1 prints the entire file with no problems. Piping the original command to cat -v produced one blank line. Is this expected? Just trying to get to the bottom of the problem to improve my understanding of awk. – ajr_chm, Commented Mar 22, 2022 at 23:14
Your values in data1 are not just |-separated. Consider |57.78 |: the value you want is 57.78, but that's not what's between the |s, it's 57.78<blank>. To get only 57.78 you need to tell awk to include blanks as part of the separator, hence FS=" *[|] *". Yes, awk 'BEGIN {FS="|"} 1 ' prints the whole file, as you're not accessing any fields and so not using the FS that you're setting. A blank line of output just means you had blank lines in the input. – Ed Morton, Commented Mar 23, 2022 at 14:28
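
One way to see what the comment above describes is to print a field between brackets so that stray blanks become visible. This is only a sketch: the record is taken from the question's data1, and the variable names are illustrative.

```shell
#!/bin/sh
# One record from the question's data1; note the blank after 57.78.
line='Fname-Lname|BXXXT+GG |8262|4 |14 |57.78 | | |sq'

# Literal "|" separator: the trailing blank stays inside field 6.
with_literal=$(printf '%s\n' "$line" | awk 'BEGIN{FS="|"} {print "[" $6 "]"}')

# Blanks folded into the separator: field 6 is the bare number.
with_regex=$(printf '%s\n' "$line" | awk 'BEGIN{FS=" *[|] *"} {print "[" $6 "]"}')

# The pattern 1 prints $0 unchanged, so FS makes no visible difference here.
whole=$(printf '%s\n' "$line" | awk 'BEGIN{FS="|"} 1')

printf '%s\n' "$with_literal"   # [57.78 ]
printf '%s\n' "$with_regex"     # [57.78]
printf '%s\n' "$whole"
```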

Wouter Verhelst · Accepted Answer · 2022-03-22 10:18:54Z

Your code works as-is for me, both with GNU awk 5.1.0 and with macOS awk 20200816.

Which version of awk are you using?

Note that you can also set the field separator with the -F command-line parameter; if you do that, then the BEGIN block is unnecessary.
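
As a rough illustration of that equivalence, the sketch below runs the question's command both ways on scratch copies of a few rows as they appear in the question (where 105.15 happens to have no padding before the next |). The scratch directory and variable names are assumptions made for the example.

```shell
#!/bin/sh
# Scratch copies of a few rows as shown in the question.
tmp=$(mktemp -d)
printf '%s\n' \
  'Name |formula |no. |dose|days|cost |msg|em|notes' \
  'Fname-Lname|SJXXT+GG |8263|4¾ |14 |105.15| | |IB' > "$tmp/data1"
printf '%s\n' \
  '10/12/2020|more-data-3456|105.15' \
  '10/12/2020|more-data-3456|95.10' > "$tmp/data2"

# Separator set in a BEGIN block, as in the question.
begin_out=$(awk 'BEGIN{FS="|"} NR==FNR{a[$6];next} $3 in a' "$tmp/data1" "$tmp/data2")

# Same program with the separator given on the command line instead.
flag_out=$(awk -F'|' 'NR==FNR{a[$6];next} $3 in a' "$tmp/data1" "$tmp/data2")

printf 'BEGIN block: %s\n' "$begin_out"
printf '-F option:   %s\n' "$flag_out"
rm -rf "$tmp"
```

Both forms should print the same record, since -F only changes where FS is assigned, not how fields are split.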

I should have put that in the question. I used the -F command-line parameter. Same result, an empty line.
– ajr_chm
Commented Mar 22, 2022 at 10:22
Which version of awk?
– Wouter Verhelst
Commented Mar 22, 2022 at 10:23
I am using gawk.
– ajr_chm
Commented Mar 22, 2022 at 10:25
That's still not a version number, but never mind. Is the data2 file perhaps in DOS/Windows mode? gawk doesn't seem to like that (I get no output if I convert it to CRLF line endings, though it doesn't seem to be affected by such changes in data1).
– Wouter Verhelst
Commented Mar 22, 2022 at 10:30
GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0). File is not in DOS/Windows format. Created in Gvim on Ubuntu 20.04.
– ajr_chm
Commented Mar 22, 2022 at 10:31
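
For readers who do have CRLF input, the effect mentioned in the comments above can be reproduced roughly as follows. This is a sketch on hypothetical scratch files (data2_crlf is a made-up name), not a diagnosis of the asker's files, which were reported not to be in DOS format.

```shell
#!/bin/sh
# Scratch files: data1 with Unix line endings, data2 written with CRLF.
tmp=$(mktemp -d)
printf 'Fname-Lname|SJXXT+GG |8263|4 |14 |105.15| | |IB\n' > "$tmp/data1"
printf '10/12/2020|more-data-3456|105.15\r\n' > "$tmp/data2_crlf"

# cat -v makes the carriage return visible as ^M at the end of the line.
cat -v "$tmp/data2_crlf"

# With CRLF, $3 of data2 is "105.15" plus a CR, so the lookup fails.
crlf_out=$(awk -F'|' 'NR==FNR{a[$6];next} $3 in a' "$tmp/data1" "$tmp/data2_crlf")

# Stripping a trailing CR from every record first restores the match.
fixed_out=$(awk -F'|' '{sub(/\r$/,"")} NR==FNR{a[$6];next} $3 in a' "$tmp/data1" "$tmp/data2_crlf")

printf 'with CR:     [%s]\n' "$crlf_out"
printf 'CR stripped: [%s]\n' "$fixed_out"
rm -rf "$tmp"
```

A CR in data1 does not matter here because $6 is not the last field of the record, which matches the observation that only changes to data2 affected the result.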
