
I am trying to split a string into an array on any character that is not alphanumeric. Can I assign a regex pattern to the IFS variable to accomplish this?

I have tried it like so:

input="$1"
IFS="[^a-zA-Z]" read -ra name_parts <<< "$input"

But this splits the string on any "a" or "A" and does not even recognize the "^". A related question looks similar by its title, but it does not appear to be about what I'm asking.

  • I should add that I am actually only concerned about alphabetic characters, so "alphanumeric" was inaccurate. I don't need to catch [0-9]. Commented Jun 29, 2020 at 5:38
  • What is your expected output and your input string? Clearly this is an XY problem
    – Inian
    Commented Jun 29, 2020 at 5:48
  • This is an exercise for generating acronyms from names that may contain spaces, dashes, underscores, or some shell globbing characters. I am able to pass my particular set of tests with IFS=" |-|_|*". I understand the XY-problem concern, but I wanted to understand the limits of IFS and how I might handle an unlimited variety of possible delimiters. I read about IFS but could not find specifics about this limitation. Thanks for your answer. Commented Jun 29, 2020 at 5:55
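For reference, the commenter's working IFS=" |-|_|*" only works because IFS is read as a plain set of characters: space, |, -, _ and *. The | characters are not regex alternation; they simply add | as one more delimiter. A minimal sketch of the acronym exercise built on that IFS, assuming Bash 4 or later (for the ${var^^} uppercase expansion); the function name acronym is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: build an acronym by splitting on IFS=' |-|_|*'.
# IFS is a set of single characters, so this is {space, '|', '-', '_', '*'};
# the '|' is just one more literal delimiter, not regex alternation.
# Assumes Bash 4+ for the ${var^^} uppercase expansion.
acronym() {
  local -a words
  local word result=''
  IFS=' |-|_|*' read -ra words <<< "$1"
  for word in "${words[@]}"; do
    result+="${word:0:1}"   # first character of each word; empty words add nothing
  done
  printf '%s\n' "${result^^}"
}

acronym 'Portable Network Graphics'                # prints PNG
acronym 'Complementary metal-oxide semiconductor'  # prints CMOS
acronym 'Ruby_on*Rails'                            # prints ROR
```

This only handles the delimiters listed in IFS; any other punctuation would need to be added to the set by hand, which is exactly the limitation the question asks about.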

3 Answers

Answer 1 (score 2)

IFS cannot be used that way; it does not take a regular expression. The shell treats every character in IFS literally, as a single-character field delimiter, when it splits words after expansion and when read splits a line of input. E.g.

IFS=: read -r v1 v2 <<<"foo:bar" 

What you defined with IFS="[^a-zA-Z]" is taken literally, i.e. each of [, ^, a, -, z, A, Z and ] is used as a separator to split your input string, which is clearly not what you want.
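A quick way to confirm this is to split a couple of strings with that same IFS and print each resulting field on its own line. A minimal sketch (the function name split_with_literal_ifs is hypothetical):

```shell
#!/usr/bin/env bash
# Demonstrates that IFS is a set of literal characters, not a regex.
# IFS='[^a-zA-Z]' makes each distinct character a delimiter: [ ^ a - z A Z ]
# (the '-' appears twice in the string; duplicates make no difference).
split_with_literal_ifs() {
  local -a parts
  IFS='[^a-zA-Z]' read -ra parts <<< "$1"
  printf '%s\n' "${parts[@]}"
}

split_with_literal_ifs 'John-Smith^Jr'  # '-' and '^' split: John, Smith, Jr
split_with_literal_ifs 'Mark'           # the letter 'a' splits too: M, rk
```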

Answer 2 (score 1)

IFS is just a bunch of characters (or bytes), not a regex. But you could use, e.g., awk or sed to split the string on a regex, print it out with a simpler separator, and then read it with the shell's read.

read -ra name_parts < <(awk -v FS='[^a-zA-Z]' -v OFS=' ' '{$1=$1; print}' <<< "$input")

or

read -ra name_parts < <(sed -e 's/[^a-zA-Z]/ /g' <<< "$input")
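Wrapped as functions so they can be tried side by side, the two pipelines might look like the following sketch. The function names are hypothetical, and the LC_ALL=C prefix is an added assumption (not part of the answer) so that a-z and A-Z are plain ASCII ranges:

```shell
#!/usr/bin/env bash
# Sketch: let sed or awk do the regex work, then let read split on the default IFS.
# LC_ALL=C is an added assumption so that a-z / A-Z mean plain ASCII ranges.
split_with_sed() {
  local -a name_parts
  # Replace every non-letter with a space; read then collapses runs of spaces.
  read -ra name_parts < <(LC_ALL=C sed -e 's/[^a-zA-Z]/ /g' <<< "$1")
  printf '%s\n' "${name_parts[@]}"
}

split_with_awk() {
  local -a name_parts
  # FS is a regex here; assigning $1=$1 makes awk rebuild the record with OFS.
  read -ra name_parts < <(LC_ALL=C awk -v FS='[^a-zA-Z]' -v OFS=' ' '{$1=$1; print}' <<< "$1")
  printf '%s\n' "${name_parts[@]}"
}

split_with_sed 'Mary-Jane_Watson*'  # Mary, Jane, Watson
split_with_awk 'Mary-Jane_Watson*'  # same three words
```

Either way the trailing '*' only yields trailing whitespace (or an empty awk field joined by a space), which read discards under the default IFS.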
Answer 3 (score 0)

Instead of tinkering with IFS, you're better off mapping the input string (turning every run of non-alphabetic characters into a single space) and then splitting it using the default IFS:

read -ra name_parts <<<"$(printf '%s\n' "$input" | LC_ALL=C tr -cs 'a-zA-Z\n' '[ *]')"

Now the array name_parts holds the string split at the non-alphabetic positions.
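The same "map, then split on the default IFS" idea can also be done without an external process, using Bash's own pattern substitution. This is a swapped-in variant, not taken from the answer; the function name split_alpha is hypothetical. Note that [:alpha:] follows the current locale, so unlike the LC_ALL=C tr version it may also keep accented letters:

```shell
#!/usr/bin/env bash
# Sketch (pure Bash variant, not from the answer): map, then split on default IFS.
# ${1//[![:alpha:]]/ } replaces every non-letter with a space. [![:alpha:]] is a
# glob bracket expression; read performs no pathname expansion, so characters
# such as '*' in the input are harmless.
split_alpha() {
  local -a name_parts
  read -ra name_parts <<< "${1//[![:alpha:]]/ }"
  printf '%s\n' "${name_parts[@]}"
}

split_alpha 'Portable_Network-Graphics*'  # Portable, Network, Graphics
```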
