(A previous version of this answer was based on a total misunderstanding of the problem. I've deleted it.)
I've been unable to discover how to properly memoize my results at each recursive step and check if I've already solve the sub-problem.
The problem given could be stated as: we have tokens "1", "2", "3", ... "26", and we are given a text which consists of arbitrarily many tokens concatenated together. How many possible lexes are there?
For example, "1234" could lex as "1 2 3 4", or "1 23 4", or "12 3 4" but not "123 4" or "12 34".
Let's start by implementing the logic much more cleanly:
static int Lexes(string s) { // Assumption: the string is non-null, possibly empty, // and contains only digits 0-9. If those assumptions // are not valid, add error logic. if (s.Length == 0) return 0; else if (s[0] == '0') return 0; else if (s.Length == 1) return 1; else if (s[0] == '1') return Lexes(s.Substring(1)) + Lexes(s.Substring(2)); else if (s[0] == '2' && '0' <= s[1] && s[1] <= '6') return Lexes(s.Substring(1)) + Lexes(s.Substring(2)); else return Lexes(s.Substring(1)); }
Plainly this allocates n-squared memory for all the substrings, but we'll solve that problem later. My point is that we now have something very easy to understand that we can work from, and that doesn't mess around with integer parsers or any such thing.
At this point we could do a straightforward memoization:
static Func<A, R> Memoize<A, R>(Func<A, R> f) { var d = new Dictionary<A, R>(); return a=> { R r; if (!d.TryGetValue(a, out r)) { r = f(a); d.Add(a, r); } return r; }; }
Now we have a device that can memoize a function, so let's rename:
static int LexesUnmemoized(string s) { // Assumption: the string is non-null, possibly empty, // and contains only digits 0-9. If those assumptions // are not valid, add error logic. if (s.Length == 0) return 0; else if (s[0] == '0') return 0; else if (s.Length == 1) return 1; else if (s[0] == '1') return Lexes(s.Substring(1)) + Lexes(s.Substring(2)); else if (s[0] == '2' && '0' <= s[1] && s[1] <= '6') return Lexes(s.Substring(1)) + Lexes(s.Substring(2)); else return Lexes(s.Substring(1)); } static Func<string, int> Lexes = Memoize(LexesUnmemoized);
And now we can call Lexes
and it will be memoized. But we can do better than this. Full on memoization is actually more work than we need to do, and we're still creating too many strings.
What we next observe is that we always ask a question about the end of the string. This means that we can answer the question by working backwards! If we know the answer for the last two characters, we can work out the answer for the third last, which lets us work out the answer for the fourth last, and so on:
static int LexesWithDynamicProgramming(string s) { if (s.Length == 0) return 0; if (s.Length == 1) return Lexes(s); // We have at least two digits. int[] results = new int[s.Length]; // All zeros. // How many lexes are there of the last character? results[s.Length - 1] = Lexes(s.Substring(s.Length - 1)); // How many lexes are there of the last two characters? results[s.Length - 2] = Lexes(s.Substring(s.Length - 2)); // Now we're set up to fill in results fully: for (int i = s.Length - 3; i >= 0; i -= 1) { if (s[i] == '0') results[i] = 0; else if (s[i] == '1') results[i] = results[i + 1] + results[i + 2]; else if (s[i] == '2' && '0' <= s[i + 1] && s[i + 1] <= '6') results[i] = results[i + 1] + results[i + 2]; else results[i] = results[i + 1]; } return results[0]; }
This technique of filling in a table that represents the recursive computations that you'll need in the future is very common in dynamic programming.
EXERCISES
Eliminate the unnecessary Lexes
used here; we only call it for cases where the string has 1 or 2 characters, which are the easy cases.
I made an array of size O(n) in the size of the string, but that's not necessary. Can you see how to rewrite this logic so that we consume only O(1) extra memory?
Can you make this program generally more elegant and easy to follow?
Solve the more general problem: given a list of words with no spaces, and a block of text with no spaces, how many ways are there to insert spaces such that you end up with only words that were in the list? This is surprisingly hard to solve in a manner that is efficient in both time and space; it is character-building to try! (Hint: it is even more character-building to trie.)