I wanted to use 2.6.2 Parameter Expansion to remove leading characters from a string, but was surprised to find out that "Remove Largest Prefix Pattern" doesn't automatically repeat the pattern.

$ x=aaaaabc
$ printf %s\\n "${x##a}"
aaaabc

As you can see, only the first a has been removed. Expected output was bc for any of x=bc, x=abc, x=aabc, x=aaabc or x=aaaabc.

I'm struggling to figure out how I have to write the pattern if I want to remove as many a as possible from the beginning of $x. I had no luck searching for other threads either, because many answers use bash, but I'm looking for a POSIX shell solution.

3 Answers

For certain patterns you might be able to "reverse" the pattern by matching the part of the variable that you want to keep:

$ for x in "" a aa abc aabc aaabc aaabca aaabcabc bc bcaa
> do
>   printf %s\\n "${x#"${x%%[!a]*}"}"
> done



bc
bc
bc
bca
bcabc
bc
bcaa
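The nested expansion above can be easier to follow when split into two steps. This is only a sketch of the same idea; the function name `strip_leading_a` and the variable `leading_as` are hypothetical names chosen for illustration.

```sh
# Hypothetical helper: print $1 with all leading "a" characters removed.
strip_leading_a() {
    # Step 1: remove the longest suffix that starts with a non-"a" character.
    # What remains is exactly the run of leading "a"s (possibly empty).
    leading_as=${1%%[!a]*}
    # Step 2: remove that run as a literal prefix; the quotes keep it from
    # being interpreted as a pattern.
    printf '%s\n' "${1#"$leading_as"}"
}

strip_leading_a aaabca   # prints: bca
```

Note that POSIX sh has no local variables, so `leading_as` is a global here.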
  • Ah yes, keep everything starting with the first character that's not "a"! Commented Oct 7, 2022 at 14:23
  • For which patterns will this fail? Do you mean repeated multi-character strings? Commented Oct 7, 2022 at 17:47
  • @QuartzCristal this isn't a generalized solution, so it's easier to describe the patterns where it can succeed than those where it can fail. This approach only works if you can write a pattern to match the part of the string you want to keep. In this case, you want to keep the longest possible match that starts with "not a" followed by anything (i.e. *). (For cases where it can work, it's very clever, +1 from me!)
    – Wildcard
    Commented Oct 7, 2022 at 18:28
  • @Wildcard Yes, I understand, it is not easy to describe, and yes, +1 from me as well. But an example of what "certain patterns" means, that is, where it fails, would make this answer a lot clearer. Commented Oct 7, 2022 at 19:13

I don’t think you can do this in a generic fashion (i.e. ignoring specific features of the pattern), using only POSIX shell constructs, without using a loop:

until [ "${x#a}" = "$x" ]; do x="${x#a}"; done 
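As a rough sketch, the same loop can be wrapped in a function that takes the pattern as a parameter, which also covers repeated multi-character prefixes. The function name `strip_leading` and the variables `sl_str` and `sl_pat` are hypothetical; POSIX sh has no local variables, so they are globals with a prefix to reduce clashes.

```sh
# Hypothetical helper: strip_leading STRING PATTERN
# Repeatedly removes the shortest prefix matching PATTERN until nothing changes.
strip_leading() {
    sl_str=$1
    sl_pat=$2
    # $sl_pat is left unquoted inside the expansion so it acts as a pattern.
    until [ "${sl_str#$sl_pat}" = "$sl_str" ]; do
        sl_str=${sl_str#$sl_pat}
    done
    printf '%s\n' "$sl_str"
}

strip_leading aaaaabc a    # prints: bc
strip_leading ababc ab     # prints: c
```

Because the loop stops as soon as an iteration removes nothing, a pattern that can match the empty string simply leaves the input unchanged rather than looping forever.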

    a as a pattern matches a, there's no way it can match aaa.

    While the POSIX sh specification is based on a subset of the Korn shell, and the Korn shell has the *(foo) operator (matches a sequence of zero or more foos) and the +(foo) operator (matches a sequence of one or more foos, same as foo*(foo)), POSIX did not specify them because they are not backward compatible with the patterns of the Bourne shell, which means there are a number of contexts where they could not be used, such as:

    • find . -name '*(x)' which is currently required to match on filenames that end in (x)
    • pattern='*(x)'; case $file in ($pattern) ...; esac or ${file##$pattern}. Same. You'll notice that ksh88 or pdksh do not recognise those operators in those cases.

    Repetition is supported in regular expressions. POSIX specifies a number of utilities that can match regular expressions (expr, grep, sed, awk...). Some shells have or have had some of those builtin. expr was built into (or could be built into) the Almquist shell. ksh93 can be built with expr, grep and sed builtin and can get their output without forking. Some ash-based shells can also get the output of a command substitution without forking when it consists of a single invocation of a builtin command. The busybox shell is another example of a shell where all those utilities can be invoked without forking or executing.

    On the other hand, printf, which you use in your question, is not builtin in ksh88 nor in most pdksh derivatives. Apart from the special builtins and builtins such as export/getopts/read... which can only reasonably be builtin, POSIX gives you no guarantee about whether a command is builtin or not.

    So:

    x=$( expr "x$x" : 'xa*\(.*\)' ) 

    For instance, that could strip the leading a characters internally in the shell, with a few caveats though:

    • it returns with a failure exit status if the result is an empty string or some representation of 0
    • it also strips trailing newline characters (because of the command substitution)
    • you'll have noticed the x prefix, which we need to add to prevent it from failing if $x happens to contain an expr operator (see the Application Usage section of the POSIX expr specification for more details on that).
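A minimal sketch of how the first and third caveats might be handled in a script. The function name `strip_a_expr` is hypothetical, and ignoring the exit status with `|| :` is just one possible choice; it matters mostly when the script runs under `set -e`.

```sh
# Hypothetical helper: print $1 with leading "a" characters removed, using expr.
strip_a_expr() {
    # The "x" prefix keeps values such as "(", "+" or "index" from being
    # parsed as expr operators. "|| :" discards the non-zero exit status
    # that expr returns when the result is empty or zero.
    expr "x$1" : 'xa*\(.*\)' || :
}

x=$(strip_a_expr aaabc)   # x is now "bc"
y=$(strip_a_expr aa0)     # y is now "0", and the function still returns success
```

Trailing newlines in the value are still lost here, since the result goes through a command substitution.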

    Or with awk:

    x=$(awk 'BEGIN {sub(/^a*/, "", ARGV[1]); print ARGV[1]}' "$x") 

    Or sed:

    x=$(printf '%s\n' "$x" | sed '1s/^a*//') 

    (sed being the least appropriate here, as it works line by line and needs to be fed its input via stdin or a file).
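If trailing newlines in $x need to survive the command substitution, a common workaround is to print a sentinel character after the value and strip it afterwards. The following is a sketch of that idea applied to the awk variant; the function name `strip_a_keep_newlines` and the result variable `stripped` are hypothetical.

```sh
# Hypothetical helper: sets the global variable "stripped" to $1 with leading
# "a" characters removed, preserving any trailing newlines.
strip_a_keep_newlines() {
    # awk prints the result followed by "." and no newline, so the command
    # substitution has no trailing newline of its own to strip.
    stripped=$(awk 'BEGIN {sub(/^a*/, "", ARGV[1]); printf "%s.", ARGV[1]}' "$1")
    # Remove the sentinel again.
    stripped=${stripped%.}
}

strip_a_keep_newlines 'aaabc
'
printf '%s' "$stripped"   # prints "bc" followed by the preserved newline
```

The result is returned through a variable rather than standard output, since capturing output with another command substitution would strip the newlines again.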

    • Why is x$x needed? In which conditions could it fail? Commented Oct 7, 2022 at 17:46
    • @QuartzCristal, see edit. You can try with x='(', x=+, x=index with various implementations. Commented Oct 7, 2022 at 18:20
    • I see, thanks @StéphaneChazelas. Commented Oct 7, 2022 at 19:10
