4

If we do:

VAR=100:200:300:400 

We can do:

echo ${VAR%%:*}
100

and

echo ${VAR##*:}
400

Are there any equivalents I could use to get the values of 200 and 300?

For instance, a way to get only what's between the second and third colon specifically in case there are more than just 3 colons?

  • With an array: IFS=":" a=( $VAR ); echo "${a[1]}"
    – Cyrus
  • @Cyrus that's a very neat way to create an array, I didn't know you could do it without using read, thanks!
    – Cestarian
  • @Cyrus Thank you. That should be an answer. I'd note that unset IFS may not be needed afterwards, but it is a precaution that one might want to take.
    – Wastrel

4 Answers

12

The ksh-style (and now specified by POSIX for sh) ${var#pattern} and ${var%pattern} (and greedy variants with ## and %%) only remove text from the beginning and end respectively of the contents of the variable.
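As a quick illustration, here is how those four operators behave on the value from the question (a sketch; the expected output is shown in the comments):

VAR=100:200:300:400
printf '%s\n' "${VAR#*:}"   # 200:300:400  (shortest leading match of *: removed)
printf '%s\n' "${VAR##*:}"  # 400          (longest leading match of *: removed)
printf '%s\n' "${VAR%:*}"   # 100:200:300  (shortest trailing match of :* removed)
printf '%s\n' "${VAR%%:*}"  # 100          (longest trailing match of :* removed)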

To get 200 out of 100:200:300:400 with those operators, you'd need to apply both ${VAR#*:} to remove 100: from the beginning and then ${VAR%%:*} to remove :300:400 from the end.

That would give ${${VAR#*:}%%:*}; while that works in zsh, it doesn't work in bash or ksh, which don't allow chaining parameter expansion operators.

In zsh, you'd rather use $VAR[(ws[:])2] to get the 2nd :-separated word of $VAR¹, or use ${${(s[:])VAR}[2]} to first split the variable into an array and then get the second element.
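A quick sketch of both forms (zsh only; expected output in the comments, assuming the sample value from the question):

VAR=100:200:300:400
print -r -- $VAR[(ws[:])2]       # 200: 2nd :-separated word via subscript flags
print -r -- ${${(s[:])VAR}[3]}   # 300: 3rd element after splitting into an array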

In ksh93, you can do ${VAR/*:@(*):*:*/\1}, but while bash (like zsh²) has copied ksh93's ${param/pattern/replacement} operator, it hasn't copied the capture and reference part.

In bash (which is still widely used despite all its limitations, as it's the GNU shell and so pre-installed on virtually all GNU/Linux systems as well as a few non-GNU ones), you can do it in two steps:

tmp=${VAR#*:}; printf '%s\n' "${tmp%%:*}" 
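The same two-step idea stretches to the third field by stripping the first two fields in one go; a sketch assuming the 100:200:300:400 layout from the question:

tmp=${VAR#*:*:}; printf '%s\n' "${tmp%%:*}"   # 300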

bash's only builtin splitting operator (unless you want to count bash 4.4+'s readarray -d, which is more for reading records from some input stream into an array, or the kind of splitting it does when separating out arguments in its syntax) is the Bourne-style (extended by Korn) IFS-splitting, which is performed upon unquoted expansions (you used it by mistake in echo ${VAR%%:*}, where you forgot the quotes) and by read.

read works on one line, so it can only be used for variables that don't contain newline characters. Using split+glob (glob being the other side effect of leaving an expansion unquoted) is cumbersome, as we need to disable the glob part and change the global $IFS parameter.
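As a minimal sketch of that split+glob side effect on the sample value (the angle brackets only make the word boundaries visible; it assumes no glob characters in the fields, as globbing is left enabled here):

VAR=100:200:300:400
IFS=:
printf '<%s>\n' $VAR     # unquoted: IFS-split into <100> <200> <300> <400>
printf '<%s>\n' "$VAR"   # quoted: one single word <100:200:300:400>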

In bash 4.4+, you can do it like:

nth() {
  local - # local - is to make the changes to option settings local to
          # the function like in the Almquist shell. The idea is that it
          # makes the $- special parameter local to the function. That's
          # equivalent to the set -o localoptions of zsh. In bash, that
          # only works for the set of options managed by set, not the ones
          # managed by shopt.
  local string="$1" n="$2" IFS="${3- }"
  set -o noglob         # disable the glob part
  set -- $string''      # apply split+glob with glob disabled
  printf '%s\n' "${!n}" # dereference the parameter whose name is stored in $n
}
nth "$VAR" 2 :

Without the '', 100::300: would be split into "100", "", "300" only, so for instance $# would expand to 3 even though the variable has 4 :-separated fields. Beware though that it means an empty variable is split into 1 empty element instead of none.
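A quick check of that behaviour with the nth function above (expected output in the comments):

nth 100::300: 2 :   # prints an empty line: the empty 2nd field is preserved
nth 100::300: 3 :   # 300
nth 100::300: 4 :   # prints an empty line: the trailing empty field is kept thanks to the ''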

For read (or readarray), a similar workaround would be to add an extra delimiter at the end of the input. Since bash variables (contrary to zsh's) can't contain NUL characters anyway, with read and arbitrary variable values, you could do:

words=()
IFS=: LC_ALL=C read -rd '' -a words < <(printf '%s:\0' "$VAR") &&
  printf '%s\n' "${words[2 - 1]}"

(- 1 because read starts filling up the array at index 0, not 1; LC_ALL=C works around some bugs for text not encoded in the user's encoding in bash versions 5.0 to 5.2)

With readarray (bash 4.4+):

records=()
readarray -O1 -td : records < <(printf %s: "$VAR") &&
  printf '%s\n' "${records[2]}"

Again, : added at the end to prevent a trailing empty element being discarded (but again meaning an empty input results in one empty element).


¹ Though note that it splits ::a:b::c::: into a, b and c only like IFS-splitting did in the Bourne shell (but not in modern Bourne-like shells except when space, tab or newline are used as separators).

² zsh supports it, but with a different syntax that needs the extendedglob option to be enabled: ${VAR/(#b)*:(*):*:*/$match[1]}

  • Chazelas-man, you're really like a man page when it comes to the shell ^^
6

I don't try to use substitution syntax on the colon-separated string value. Instead I use splitting to put the parts between colons into an array and then perform variable substitution (and even array slicing) to return the desired array elements:

#!/usr/bin/env bash
VAR='100:200:300:400'
IFS=: read -ra VAR2 <<<"$VAR"
echo "${VAR}"
echo "${VAR2[1]}"
echo "${VAR2[@]}"
echo "${VAR2[@]:1:2}"

produces the output:

100:200:300:400
200
100 200 300 400
200 300

This may not have the precise effects you want if whitespace characters also appear between colons in the original value (you should test such things carefully), but your example didn't have any.
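Once the fields are in an array, they also combine naturally with other parameter expansions; for example, a small sketch using bash 4+'s case-conversion operator (the value here is purely illustrative):

VAR='100:ABC:300:400'
IFS=: read -ra VAR2 <<<"$VAR"
echo "${VAR2[1],,}"    # prints "abc"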

  • This seems to be the officially recommended way for working with delimiters in bash; using read -r is perfect, as splitting the output into separate variables is preferable for my actual use case. I liked all 3 answers so far; this is the one I'm going to actually end up using because it can be done conveniently in one line, and I can use parameter expansions in conjunction with this to ensure it's all lowercase, which I happen to also need. But I will be marking Stéphane Chazelas's answer as the solution since he answers the question (when taken most literally) best.
    – Cestarian
  • IFS=: read -ra array <<< "$string" only works for strings that don't contain newline characters (and that, in some versions of bash, are valid text in the user's locale) and discards an empty trailing element. For arbitrary strings, you need something more like IFS=: LC_ALL=C read -rd '' -a array < <(printf '%s:\0' "$string"), as detailed in my answer.
6

You could use IFS and set to separate and access all four:

$ VAR=100:200:300:400
$ IFS=:
$ set -- $VAR
$ echo $1
100
$ echo $2
200
$ echo $3
300
$ echo $4
400
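If you do this in a script rather than interactively, a slightly more careful sketch of the same idea disables globbing and restores $IFS afterwards (the old_IFS name is just for illustration; note that if IFS was unset beforehand, this restores it as empty rather than unset):

VAR=100:200:300:400
old_IFS=$IFS
set -f                     # disable globbing for the unquoted expansion below
IFS=:
set -- $VAR
IFS=$old_IFS
set +f
printf '%s\n' "$2" "$3"    # 200 then 300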
5

    For instance a way to get only what's between the second and third colon specifically in case there are more than just 3 colons?

That would require a lot of manual parameter expansion. You could loop through the string using the delimiter, something like:

#!/bin/sh
counter=0
var='100:200:300:400'
while [ -n "$var" ]; do            ##: Test if $var is not empty.
  first="${var%%:*}"               ##: Extract the first string sep by a `:'
  rest=${var#*"$first"}            ##: Remove the first string sep by a `:'
  var="${rest#:}"                  ##: Remove the first occurrence of `:'
  counter=$((counter+1))           ##: Increment the counter by one
  case "$counter" in
    (2|3) printf '%s\n' "$first";; ##: Counter reached the 2nd and 3rd field, print it
    (4)   break ;;                 ##: Break the loop from the 4th field (if there are more)
  esac                             ##: Repeat the while loop until the condition is met,
done                               ##: which is until $var is empty.

Output:

200
300
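The same loop can be folded into a small POSIX sh function that prints an arbitrary field. This is only a sketch (the field name is made up here, and the variables are not local in plain sh), with the usual caveat that a trailing empty field is dropped:

field() {  # usage: field STRING N  ->  prints the N-th :-separated field
    str=$1 n=$2 i=0
    while [ -n "$str" ]; do
        i=$((i + 1))
        f=${str%%:*}                 # current first field
        case $str in
            (*:*) str=${str#*:} ;;   # drop it and its separator
            (*)   str='' ;;          # last field consumed
        esac
        if [ "$i" -eq "$n" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1                         # fewer than N fields
}
field 100:200:300:400 2    # 200
field 100:200:300:400 3    # 300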

With a Bash version that has the loadable builtin called dsv, which can parse a string much like cut does, you can try something like:

#!/usr/bin/env bash
enable dsv || exit
var='1 00:20 0:300 : 4 0 0 '
dsv -a array -d':' "$var"
declare -p array

Output:

declare -a array=([0]="1 00" [1]="20 0" [2]="300 " [3]=" 4 0 0 ")

If some fields are empty like below:

var='1 00::200:300 : 4 0 0 '

The -g option can be used to skip/remove them:

dsv -a array -gd':' "$var"

In some cases the input can have double-quoted fields like below; the parsing understands and skips over double-quoted strings.

var='1 00:"20:0":300 : 4 0 0 '

Output:

declare -a array=([0]="1 00" [1]="20:0" [2]="300 " [3]=" 4 0 0 ")

• The loadable builtins are a separate package from bash; they are not included by default, afaik (a sketch of loading dsv explicitly follows the help text below).

According to help dsv:

dsv: dsv [-a ARRAYNAME] [-d DELIMS] [-Sgp] string
    Read delimiter-separated fields from STRING.

    Parse STRING, a line of delimiter-separated values, into individual
    fields, and store them into the indexed array ARRAYNAME starting at
    index 0. The parsing understands and skips over double-quoted strings.
    If ARRAYNAME is not supplied, "DSV" is the default array name.

    If the delimiter is a comma, the default, this parses comma-separated
    values as specified in RFC 4180.

    The -d option specifies the delimiter. The delimiter is the first
    character of the DELIMS argument. Specifying a DELIMS argument that
    contains more than one character is not supported and will produce
    unexpected results. The -S option enables shell-like quoting:
    double-quoted strings can contain backslashes preceding special
    characters, and the backslash will be removed; and single-quoted
    strings are processed as the shell would process them. The -g option
    enables a greedy split: sequences of the delimiter are skipped at the
    beginning and end of STRING, and consecutive instances of the delimiter
    in STRING do not generate empty fields. If the -p option is supplied,
    dsv leaves quote characters as part of the generated field; otherwise
    they are removed.

    The return value is 0 unless an invalid option is supplied or the
    ARRAYNAME argument is invalid or readonly.
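If the loadables package is installed, here is a hedged sketch of loading dsv explicitly with enable -f (the /usr/lib/bash path is only an example and varies by distribution):

#!/usr/bin/env bash
# load the dsv builtin from the loadables directory (path is an example)
enable -f /usr/lib/bash/dsv dsv || exit
dsv -a array -d':' '100:200:300:400'
declare -p array    # expected: declare -a array=([0]="100" [1]="200" [2]="300" [3]="400")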
