I just discovered that substr() in awk accepts either 0 or 1 as the starting index for the first character of a string. I tested this in gawk 5.1.0 and in macOS awk (version 20070501).

awk 'BEGIN {print substr("abcd", 0, 1)}' 

outputs "a", as does

awk 'BEGIN {print substr("abcd", 1, 1)}' 

and

awk 'BEGIN {print substr("abcd", 2, 1)}' 

outputs "b", just to show that nothing is obviously wrong.

I didn't see anything in the man pages or the gawk info file other than mentions of 1-based indexing.

For consistency with the documentation and with the fact that index() returns 1 for the first position and 0 for no match, it would be good policy to always use 1.

My question is: why does this duality exist? Is it documented anywhere? Do other awk implementations behave this way?

  • In a quick check, BWK (original awk) and the updated mawk do this, but not older mawk. Someone patient enough could bisect mawk to see if there's an explanation.
    Commented Jun 29, 2022 at 0:02
  • I've experienced quite a few differences between GNU awk and BSD awk, so much so that I try to avoid using it, or else rely explicitly on gawk in all cases.
    Commented Jun 29, 2022 at 0:58
  • BusyBox v1.33.1 awk also does this.
    Commented Jun 29, 2022 at 3:04
  • It's unfortunate that negative indices are converted to 1, because it would be useful if they counted from the end of the string.
    Commented Jun 29, 2022 at 14:13
  • Yeah, I could see that. What I find interesting is that strings, arrays created by split(), and fields all start at 1 in awk, but each handles negative indices differently. echo 'a b' | awk '{print substr($0,-1,1)}' outputs "a", as discussed above. But echo 'a b' | awk '{print $-1}' fails with an error message like awk: cmd. line:1: (FILENAME=- FNR=1) fatal: attempt to access field -1, while echo 'a b' | awk '{split($0,a); print a[-1]}' outputs a blank line. I understand why, and each behavior is reasonable, but on the surface it's not particularly consistent across the three types of object :-).
    – Ed Morton
    Commented Jun 29, 2022 at 15:15

1 Answer

From the GNU awk online documentation of the substr() function:

If start is less than one, substr() treats it as if it was one. (POSIX doesn’t specify what to do in this case: BWK awk acts this way, and therefore gawk does too.) If start is greater than the number of characters in the string, substr() returns the null string. Similarly, if length is present but less than or equal to zero, the null string is returned.
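
To make the quoted rules concrete, here is a small model of them written in Python, so it behaves the same regardless of which awk is installed. `awk_substr` and `awk_index` are hypothetical helper names, not part of any awk. The model assumes integer arguments, ignores fractional values, NaN, and multibyte details, and follows the manual's wording rather than any implementation's source, so real awks (older mawk in particular, per the comments) may differ on edge cases.

```python
# Hypothetical model of awk's substr() and index() as described in the
# gawk manual quoted above. Assumes integer arguments; NOT gawk's source.
from typing import Optional


def awk_substr(s: str, start: int, length: Optional[int] = None) -> str:
    """Substring of s using awk's 1-based positions."""
    if start < 1:
        start = 1  # start below one is treated as one
    if start > len(s):
        return ""  # start past the end: null string
    if length is None:
        return s[start - 1:]  # no length given: rest of the string
    if length <= 0:
        return ""  # length present but <= 0: null string
    return s[start - 1:start - 1 + length]


def awk_index(s: str, t: str) -> int:
    """1-based position of t in s, or 0 when t does not occur."""
    return s.find(t) + 1


if __name__ == "__main__":
    print(awk_substr("abcd", 0, 1))  # a
    print(awk_substr("abcd", 1, 1))  # a
    print(awk_substr("abcd", 2, 1))  # b
    # index() reports "no match" as 0, and 0 is treated as 1,
    # so this prints the whole string rather than an empty line.
    print(awk_substr("abcd", awk_index("abcd", "z")))  # abcd
```

The last call illustrates a practical reason to stick with 1: under the quoted rule, substr(s, index(s, t)) yields all of s instead of the null string when t is absent, so the no-match case needs an explicit check of index()'s result.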
