I just discovered that substr() in awk accepts either 0 or 1 as the initial index of a string. I tested this in Gawk 5.1.0 and macOS awk 20070501.
awk 'BEGIN {print substr("abcd", 0, 1)}'
outputs "a", as does
awk 'BEGIN {print substr("abcd", 1, 1)}'
and
awk 'BEGIN {print substr("abcd", 2, 1)}'
outputs "b" just to prove that nothing's obviously wrong.
I didn't see anything in the man pages or the Gawk info file other than mentions of 1-indexing.
For consistency with the documentation and with the fact that index() returns 1 for the first position and 0 for no match, it would be good policy to always use 1.
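As a minimal sketch of that policy (the sample string and the variable names are my own, chosen only for illustration), the position returned by index() can be passed straight to substr(), with 0 treated as "no match" rather than as a usable start index:

```shell
# Hypothetical sample data for this sketch.
s="abcd"

# index() reports 1-based positions, so its result can be passed directly to substr().
found=$(awk -v s="$s" 'BEGIN { pos = index(s, "c"); print pos ":" substr(s, pos, 1) }')

# index() returns 0 when there is no match; check for that before calling substr().
missing=$(awk -v s="$s" 'BEGIN {
    pos = index(s, "z")
    if (pos == 0) print "no match"
    else print substr(s, pos, 1)
}')

echo "$found"     # prints 3:c
echo "$missing"   # prints no match
```

Checking for 0 explicitly avoids relying on whatever a given awk does with a start index below 1.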
My question is: why does this duality exist? Is it documented somewhere? And are there other awk implementations that do this?
echo 'a b' | awk '{print substr($0,-1,1)}'
will output "a"
as discussed above. But
echo 'a b' | awk '{print $-1}'
will output an error message like
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: attempt to access field -1
while
echo 'a b' | awk '{split($0,a); print a[-1]}'
will output a blank line. I understand why, and each is reasonable, but on the surface it's not particularly consistent across the 3 types of object :-).
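If the inconsistency matters in a script, one option is to check the index explicitly before using it. This is only a sketch: the variable n and the placeholder messages are made up for illustration, and it assumes an out-of-range index should be reported the same way for all three kinds of object.

```shell
# n is a hypothetical index that may be out of range.
result=$(echo 'a b' | awk -v n=-1 '{
    split($0, a)
    # Fields: only reference $n when n is within 0..NF.
    field = (n >= 0 && n <= NF) ? $n : "(no field)"
    # Arrays: "in" tests membership without creating the element.
    elem = (n in a) ? a[n] : "(no element)"
    # Strings: only call substr() with a start between 1 and the string length.
    ch = (n >= 1 && n <= length($0)) ? substr($0, n, 1) : "(no char)"
    print field " / " elem " / " ch
}')

echo "$result"   # prints (no field) / (no element) / (no char)
```

With the checks in place, none of the three lookups depends on how a particular awk treats an index below 1.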