3
\$\begingroup\$

I've got a simple spec for defining character sets, ranges, or wildcards. I'd like to take the string and appropriately create an array based on these rules. I have a solution but I'm sure there is a better way!

  • Single characters may be used
  • Sets and ranges are enclosed in brackets []
  • Sets are comma separated, [1,3,S]
  • Ranges are hyphenated, [1-3]
  • Sets and ranges can be combined, [1-3,7,9-10,S]
  • Wildcard character is - and not enclosed in brackets
  • Strings can contain only the following characters (besides the brackets and commas used for sets and ranges): A-Z, 0-9, - and .

Example inputs and expected outputs

  • 'ABC' outputs ['A', 'B', 'C']
  • 'S[1-2]-' outputs ['S', '1-2', '-']
  • 'S[1-2]2.0[0-9][1,3,7]S' outputs ['S', '1-2', '2', '.', '0', '0-9', '1,3,7', 'S']
  • 'S[1-2,7]' outputs ['S', '1-2,7']

My implementation:

    function stringToArray(code) {
        var exploded = code.split(''),
            charGroups = [],
            tempGroup = '',
            inGroup = false;

        while (exploded.length > 0) {
            var cur = exploded.shift();

            if (cur === '[') {
                inGroup = true;
                continue;
            }

            if (inGroup) {
                if (cur === ']') {
                    inGroup = false;
                    charGroups.push(tempGroup);
                    tempGroup = '';
                } else {
                    tempGroup += cur;
                }
            }

            if (!inGroup) {
                if (cur !== ',' && cur !== '[' && cur !== ']') {
                    charGroups.push(cur);
                }
            }
        }

        return charGroups;
    }

Demo fiddle here: http://jsfiddle.net/3q25eps3/

\$\endgroup\$

    2 Answers 2

    3
    \$\begingroup\$

    You could perhaps use regular expressions, provided the spec stays simple. For instance

        function stringToArray(string) {
            var segments = [];
            string.replace(/(\[([^\]]+)\]|[^,])/g, function (m0, m1, m2) {
                // push bracket content (m2) or single char (m1)
                segments.push(m2 || m1);
            });
            return segments;
        }

    The code uses replace to map/scan the string (it doesn't actually bother with replacing anything), since it accepts a callback function.

    The pattern used will match either:

    • The content between [ and ] as a contiguous string: \[([^\]]+)\]
    • Single, non-comma character outside brackets: [^,]

    And due to the g (global) flag in the pattern, it'll repeat matching until the end of the string.
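The same scan can also be written without borrowing replace for its callback. The following is only a sketch, assuming an environment that supports String.prototype.matchAll (ES2020); the name tokenize is hypothetical and the pattern is a slightly simplified form of the one above, with a single capture group for the bracket content.

```javascript
// Sketch only: `tokenize` is a hypothetical name, not part of the answer above.
function tokenize(code) {
    // Either bracket content (captured in group 1) or any single non-comma character.
    var pattern = /\[([^\]]+)\]|[^,]/g;

    return Array.from(code.matchAll(pattern), function (match) {
        // match[1] is defined only when the bracket alternative matched;
        // otherwise use the whole match, which is a single character.
        return match[1] !== undefined ? match[1] : match[0];
    });
}

console.log(tokenize('S[1-2]2.0[0-9][1,3,7]S'));
```

This reads more directly as "collect every match", at the cost of requiring a newer runtime than the replace-based version.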

    This depends on the brackets not being nested or unbalanced (it can't handle that). It also assumes anything that's not a comma should be captured, i.e. even if the character @ isn't part of the spec, it'll still be treated as a "valid" character and included in the output. Similarly, it'll skip over multiple commas in a row, even if such a string is invalid.
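If those cases matter, one option is to reject bad input before parsing. This is a rough sketch under assumptions the spec doesn't spell out: set elements are runs of A-Z or 0-9, a range joins two such runs with a hyphen, and commas, dots and empty brackets are not allowed where the spec doesn't mention them. The name isValidCode is hypothetical.

```javascript
// Sketch only: `isValidCode` is a hypothetical helper, and the grammar below
// is an assumed reading of the spec, not something the question confirms.
function isValidCode(code) {
    var element = '[A-Z0-9]+(?:-[A-Z0-9]+)?';                      // "7" or "9-10"
    var group = '\\[' + element + '(?:,' + element + ')*\\]';      // "[1-3,7,S]"
    var single = '[A-Z0-9.-]';                                     // plain char, dot or wildcard
    var whole = new RegExp('^(?:' + single + '|' + group + ')+$');

    return whole.test(code);
}

console.log(isValidCode('S[1-2]2.0[0-9][1,3,7]S')); // well-formed under the assumed grammar
console.log(isValidCode('S[1,,2]'));                // repeated comma
console.log(isValidCode('S[1-2'));                  // unbalanced bracket
```

Running the check first keeps the scanning pattern itself simple, since it can then assume well-formed input.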

    So it's kinda rough, and I can't guarantee it'll cover all cases, but it gives the correct output for your test cases.

    \$\endgroup\$
      1
      \$\begingroup\$

      I do recommend changing the name of your function to something more specific to its process, making it clear that it's looking for a specific type of string. Something like getArrayFromStringCode might be your best bet.

      There are several different approaches for parsing strings, but there's no right way. The best way will likely be the one that is easiest to use, maintain, and debug.

      My first thought was that a recursive method would be the best way to parse a string for placeholders, so that when you hit an opening bracket you can return the substring up to the closure, then pass the rest of the string back into the recursive function to be processed.

          function parseRecursive(code) {
              if (code.length > 0) {
                  var nextClosure = code.indexOf("]");
                  if (code[0] !== "[" || nextClosure < 0) {
                      // if this isn't the start of a set or there are no more sets...
                      // return an array with this character and keep processing remainder of string
                      return code.length > 1
                          ? [code[0]].concat(parseRecursive(code.substring(1)))
                          : [code[0]];
                  } else {
                      // if this is the start of a set...
                      // return an array with the set as a string and keep processing the remainder of the string
                      return code.length > nextClosure + 1
                          ? [code.substring(1, nextClosure)].concat(parseRecursive(code.substring(nextClosure + 1)))
                          : [code.substring(1, nextClosure)];
                  }
              }
              // empty input: return an empty array rather than undefined
              return [];
          }

      This method saw an insignificant performance improvement in Internet Explorer 11, but was actually somewhat slower than your original code in Firefox and Chrome.

      My next thought was to use RegEx, but Flambino beat me to the punch! Flambino's method performs significantly faster in Internet Explorer 11, but (alas!) significantly slower in Chrome and Firefox.

      Here's a fork of your fiddle with the different approaches.

      Here's the jsperf test to see how they compare.

      Since recursive programming requires a bit of "special" thinking and regular expressions require a bit of research, the linear approach in your original code may actually be the easiest to maintain in the long run. On the other hand, the regular expression approach has fewer lines and logical tests.

      \$\endgroup\$
