11
\$\begingroup\$

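C#'s String.Split method has been part of .NET since its earliest versions (the overloads that take string separators arrived with .NET 2.0), and it is eager: it always builds the complete array before returning. The task is to split a string on a (single) separator. With String.Split, that looks like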

string[] split = myString.Split(new string[] { separator }, StringSplitOptions.None);

That is not too bad, but if you want to apply further operations to that string[] (and you probably do), you have to loop over the whole array, so the data is effectively traversed twice. With an iterator method built on the yield keyword, which behaves much like a coroutine, you can (maybe) chain several operations while making only one pass over the string.

    public static IEnumerable<string> LazySplit(this string stringToSplit, string separator)
    {
        if (stringToSplit == null) throw new ArgumentNullException("stringToSplit");
        if (separator == null) throw new ArgumentNullException("separator");

        var lastIndex = 0;
        var index = -1;
        do
        {
            index = stringToSplit.IndexOf(separator, lastIndex);
            if (index < 0 && lastIndex != stringToSplit.Length)
            {
                yield return stringToSplit.Substring(lastIndex);
                yield break;
            }
            else if (index >= lastIndex)
            {
                yield return stringToSplit.Substring(lastIndex, index - lastIndex);
            }
            lastIndex = index + separator.Length;
        } while (index > 0);
    }

While this does not have the "remove empty entries" option, using myString.LazySplit(separator).Where(str => !String.IsNullOrWhiteSpace(str)) should do the job with an O(n) operation, or am I wrong here?

I'm not sure about the time complexity when using coroutines, but to check the functionality I've written some unit tests:

    [TestMethod]
    public void LazyStringSplit()
    {
        var str = "ab;cd;;";
        var resp = str.LazySplit(";");
        var expected = new[] { "ab", "cd", "" };
        var result = resp.ToArray();
        CollectionAssert.AreEqual(expected, result);
    }

    [TestMethod]
    public void LazyStringSplitEmptyString()
    {
        var str = "";
        var resp = str.LazySplit(";");
        var expected = new string[0];
        var result = resp.ToArray();
        CollectionAssert.AreEqual(expected, result);
    }

    [TestMethod]
    public void LazyStringSplitWithoutEmpty()
    {
        var str = "ab;cd;;";
        var resp = str.LazySplit(";").Where(s => !string.IsNullOrWhiteSpace(s));
        var expected = new[] { "ab", "cd" };
        var result = resp.ToArray();
        CollectionAssert.AreEqual(expected, result);
    }

    [TestMethod]
    public void LazyStringSplitNoSplit()
    {
        var str = "ab;cd;;";
        var resp = str.LazySplit(" ");
        var expected = new[] { "ab;cd;;" };
        var result = resp.ToArray();
        CollectionAssert.AreEqual(expected, result);
    }
\$\endgroup\$
4
  • 1
    \$\begingroup\$I don't think this works the way you want it to, ";abc".LazySplit(";") returns an empty sequence.\$\endgroup\$
    – mjolka
    Commented Mar 15, 2015 at 23:12
  • \$\begingroup\$@mjolka yeah, you're right, should've written better unit tests.\$\endgroup\$
    – Mephy
    Commented Mar 15, 2015 at 23:15
  • \$\begingroup\$In which case, it isn't working yet and is off-topic till it is, sadly. You need to get it to function as intended first. Fixing that may even lead you to enlightenment, ofc.\$\endgroup\$
    – itsbruce
    Commented Mar 15, 2015 at 23:28
  • 1
    \$\begingroup\$@itsbruce As it was an edge case, I'd say this question is still on-topic. Feel free to review it. AFAIK, the fix is a very simple one to apply: change > 0 to >= 0, right?\$\endgroup\$
    Commented Mar 15, 2015 at 23:40

2 Answers

9
\$\begingroup\$

Edge cases:

  • ";abc".LazySplit(";") will return an empty sequence. To match the behaviour of ";abc".Split(new char[] { ';' }) it should return the sequence { "", "abc" }.

  • ";abc".LazySplit("") will return a sequence with a single item, the empty string. To match the behaviour of ";abc".Split(new char[] { }) it should return the sequence { ";abc" }.

Here's how I would suggest writing it.

First, deal with the empty separator:

    if (separator.Length == 0)
    {
        yield return value;
        yield break;
    }

Then have two variables, start and end, that refer to the start and end of the substring we want to extract.

    var start = 0;
    for (var end = value.IndexOf(separator); end != -1; end = value.IndexOf(separator, start))
    {
        yield return value.Substring(start, end - start);
        start = end + separator.Length;
    }
    yield return value.Substring(start);
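Put together, the two fragments above might look like the following. This is only a minimal sketch: the class name `StringExtensions` is made up, the null checks are carried over from the question, and the use of `StringComparison.Ordinal` is my assumption (the fragments call `IndexOf` without a comparison argument, which is culture-sensitive).

```csharp
using System;
using System.Collections.Generic;

public static class StringExtensions
{
    // Lazily splits `value` on a single string separator.
    public static IEnumerable<string> LazySplit(this string value, string separator)
    {
        // Because this is an iterator method, these checks run on first
        // enumeration, not at the moment LazySplit is called.
        if (value == null) throw new ArgumentNullException("value");
        if (separator == null) throw new ArgumentNullException("separator");

        // An empty separator would "match" at every position and never advance,
        // so treat it as "no separator" and return the input unchanged.
        if (separator.Length == 0)
        {
            yield return value;
            yield break;
        }

        var start = 0;
        // Assumption: ordinal comparison, so matching does not depend on culture.
        for (var end = value.IndexOf(separator, StringComparison.Ordinal);
             end != -1;
             end = value.IndexOf(separator, start, StringComparison.Ordinal))
        {
            yield return value.Substring(start, end - start);
            start = end + separator.Length;
        }

        // Whatever follows the last separator (possibly the empty string).
        yield return value.Substring(start);
    }
}
```

With this version, `";abc".LazySplit(";")` yields `""` followed by `"abc"`, and `"ab;cd;;".LazySplit(";")` yields `"ab"`, `"cd"`, `""`, `""`.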

To make your unit tests match the behaviour of string.Split, you also want to change LazyStringSplit to have

var expected = new[] { "ab", "cd", "", "" }; 

and LazyStringSplitEmptyString to have

var expected = new string[] { "" }; 

If you want to test that your implementation matches the behaviour of string.Split, I would suggest introducing a helper method for the tests. Something like

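    var expected = value.Split(new string[] { separator }, StringSplitOptions.None);
    // CollectionAssert.AreEqual takes ICollection arguments, so materialize the lazy sequence first.
    var actual = value.LazySplit(separator).ToArray();
    CollectionAssert.AreEqual(expected, actual);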
\$\endgroup\$
1
  • \$\begingroup\$I'll keep in mind to check more aggressively for edge cases. Thanks for the insight.\$\endgroup\$
    – Mephy
    Commented Mar 16, 2015 at 1:39
2
\$\begingroup\$

you'll need to loop over the whole array, basically iterating the string twice

Iterating twice doesn't have to be slower than iterating once, if iterating once is more complicated. When it comes to time complexity, both options are \$O(n)\$. When it comes to actual performance, you need to measure. (And that's assuming that performance of this code actually matters.)

Specifically, arrays are very efficient in .NET, whereas iterating an IEnumerable<T> requires two interface calls for every item (MoveNext and Current).
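To see where the two calls come from, here is roughly what a `foreach` over an `IEnumerable<string>` expands to, next to an indexed loop over an array. It is a simplified sketch: the class and method names are made up for illustration, and the real cost depends on what the JIT does with each loop, so it is no substitute for measuring.

```csharp
using System.Collections.Generic;

public static class IterationSketch
{
    // Roughly what `foreach (var s in items) total += s.Length;` compiles to
    // when `items` is typed as IEnumerable<string>.
    public static int TotalLengthViaEnumerator(IEnumerable<string> items)
    {
        var total = 0;
        using (IEnumerator<string> e = items.GetEnumerator())
        {
            while (e.MoveNext())           // first interface call per item
            {
                total += e.Current.Length; // second interface call per item
            }
        }
        return total;
    }

    // Over an array, the loop is plain indexed access with no interface calls.
    public static int TotalLengthViaArray(string[] items)
    {
        var total = 0;
        for (var i = 0; i < items.Length; i++)
        {
            total += items[i].Length;
        }
        return total;
    }
}
```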

\$\endgroup\$
