Leetcode #91 - Number of ways to decode a string

Question

I'm working on problem 91 on Leetcode.com called Decode Ways and I've successfully managed to get a working recursive solution but it results in Time Limited Exceeded (TLE). I'm new to utilizing memoization and I've been unable to discover how to properly memoize my results at each recursive step and check if I've already solve the sub-problem.

Prompt:

A message containing letters from A-Z is being encoded to numbers using the following mapping:
'A' -> 1 'B' -> 2 ... 'Z' -> 26
Given a non-empty string containing only digits, determine the total number of ways to decode it.

Working Code (TLE):

public int NumDecodings(string s) { if (s == null || s.Length == 0) return 0; // int[] dp = new int[s.Length + 1]; // for (int i = 0; i < dp.Length; ++i) dp[i] = -1; // dp[0] = 1; // dp[1] = Decode(s, 0, 1, dp) + Decode(s, 0, 2, dp); int jump1 = Decode(s, 0, 1); int jump2 = Decode(s, 0, 2); return jump1 + jump2; // return dp[1]; } public int Decode(string s, int i, int len) { if (i + len - 1 >= s.Length) return 0; // if (dp[i] > -1) // return dp[i]; int sub = Convert.ToInt32(s.Substring(i, len)); if (sub > 26 || sub < 1 || s.Substring(i, len)[0] == '0') return 0; else if (i + len - 1 == s.Length - 1 && sub < 27 && sub > 0) return 1; int jump1 = i + len - 1 + 1 < s.Length ? Decode(s, i + len, 1) : 0; int jump2 = i + len - 1 + 2 < s.Length ? Decode(s, i + len, 2) : 0; // dp[i] = jump1 + jump2; return jump1 + jump2; // return dp[i]; }

Eric Lippert · Accepted Answer · 2018-12-04 17:44:59Z

(A previous version of this answer was based on a total misunderstanding of the problem. I've deleted it.)

I've been unable to discover how to properly memoize my results at each recursive step and check if I've already solve the sub-problem.

The problem given could be stated as: we have tokens "1", "2", "3", ... "26", and we are given a text which consists of arbitrarily many tokens concatenated together. How many possible lexes are there?

For example, "1234" could lex as "1 2 3 4", or "1 23 4", or "12 3 4" but not "123 4" or "12 34".

Let's start by implementing the logic much more cleanly:

static int Lexes(string s) { // Assumption: the string is non-null, possibly empty, // and contains only digits 0-9. If those assumptions // are not valid, add error logic. if (s.Length == 0) return 0; else if (s[0] == '0') return 0; else if (s.Length == 1) return 1; else if (s[0] == '1') return Lexes(s.Substring(1)) + Lexes(s.Substring(2)); else if (s[0] == '2' && '0' <= s[1] && s[1] <= '6') return Lexes(s.Substring(1)) + Lexes(s.Substring(2)); else return Lexes(s.Substring(1)); }

Plainly this allocates n-squared memory for all the substrings, but we'll solve that problem later. My point is that we now have something very easy to understand that we can work from, and that doesn't mess around with integer parsers or any such thing.

At this point we could do a straightforward memoization:

static Func<A, R> Memoize<A, R>(Func<A, R> f) { var d = new Dictionary<A, R>(); return a=> { R r; if (!d.TryGetValue(a, out r)) { r = f(a); d.Add(a, r); } return r; }; }

Now we have a device that can memoize a function, so let's rename:

static int LexesUnmemoized(string s) { // Assumption: the string is non-null, possibly empty, // and contains only digits 0-9. If those assumptions // are not valid, add error logic. if (s.Length == 0) return 0; else if (s[0] == '0') return 0; else if (s.Length == 1) return 1; else if (s[0] == '1') return Lexes(s.Substring(1)) + Lexes(s.Substring(2)); else if (s[0] == '2' && '0' <= s[1] && s[1] <= '6') return Lexes(s.Substring(1)) + Lexes(s.Substring(2)); else return Lexes(s.Substring(1)); } static Func<string, int> Lexes = Memoize(LexesUnmemoized);

And now we can call Lexes and it will be memoized. But we can do better than this. Full on memoization is actually more work than we need to do, and we're still creating too many strings.

What we next observe is that we always ask a question about the end of the string. This means that we can answer the question by working backwards! If we know the answer for the last two characters, we can work out the answer for the third last, which lets us work out the answer for the fourth last, and so on:

static int LexesWithDynamicProgramming(string s) { if (s.Length == 0) return 0; if (s.Length == 1) return Lexes(s); // We have at least two digits. int[] results = new int[s.Length]; // All zeros. // How many lexes are there of the last character? results[s.Length - 1] = Lexes(s.Substring(s.Length - 1)); // How many lexes are there of the last two characters? results[s.Length - 2] = Lexes(s.Substring(s.Length - 2)); // Now we're set up to fill in results fully: for (int i = s.Length - 3; i >= 0; i -= 1) { if (s[i] == '0') results[i] = 0; else if (s[i] == '1') results[i] = results[i + 1] + results[i + 2]; else if (s[i] == '2' && '0' <= s[i + 1] && s[i + 1] <= '6') results[i] = results[i + 1] + results[i + 2]; else results[i] = results[i + 1]; } return results[0]; }

This technique of filling in a table that represents the recursive computations that you'll need in the future is very common in dynamic programming.

EXERCISES

Eliminate the unnecessary Lexes used here; we only call it for cases where the string has 1 or 2 characters, which are the easy cases.
I made an array of size O(n) in the size of the string, but that's not necessary. Can you see how to rewrite this logic so that we consume only O(1) extra memory?
Can you make this program generally more elegant and easy to follow?
Solve the more general problem: given a list of words with no spaces, and a block of text with no spaces, how many ways are there to insert spaces such that you end up with only words that were in the list? This is surprisingly hard to solve in a manner that is efficient in both time and space; it is character-building to try! (Hint: it is even more character-building to trie.)

Although you didn't exactly implement memoization into my code, your example helped me create a DP solution and its not accepted :) — springathing, CommentedDec 7, 2018 at 19:25

Pieter Witvoet · Accepted Answer · 2018-12-06 09:51:56Z

Memoization

Looking into memoization is a good idea - in this case it allows you to achieve \$O(n)\$ runtime performance instead of \$O(2^n)\$. One way to add memoization is to create a wrapper method that takes the same arguments as Decode. This method first looks up the given arguments in a dictionary, and if they're present, it can return the memoized result. If not, it will call Decode and store its result in the dictionary before returning it.

However, I would recommend removing NumDecodings and rewriting Decode so it only takes a single string parameter. That will simplify memoization and make it a little more efficient. For example, "412123" and "512123" both have the same suffix ("12123"), but because the 'sub-calls' ("412123", 1, 1) and ("512123", 1, 1) use different arguments (and thus different memoization keys) you won't benefit from memoization across those different inputs.

Other notes

Instead of passing the original input string and an offset and length, consider passing 'the rest of the string' instead. Decode can then call Decode(s.Substring(1)) if the leading digit is a valid unit, and Decode(s.Substring(2)) if the leading 2 digits are a valid unit.
There are various things in the current code that can be simplified:
- if (i + len - 1 >= s.Length) to if (i + len > s.Length)
- s.Substring(i, len)[0] == '0' to s[i] == '0'
- else if (i + len - 1 == s.Length - 1 to else if (i + len == s.Length
- jump1 = i + len - 1 + 1 to jump1 = i + len
- jump2 = i + len - 1 + 2 to jump2 = i + len + 1
Moving s.Substring(i, len)[0] == '0' to an else if branch allows you to remove the sub < 27 && sub > 0 checks below.
Disabling code during development is normal, but it's a good idea to clean things up from time to time.
I would rename Decode to something more descriptive like ValidDecodingsCount - Decode implies that it actually decodes the given input, which is not the case. jump1 and jump2 aren't very descriptive names either, but it's a bit difficult to come up with better names. Maybe rest1Combinations and rest2Combinations?

I was hoping more for a concrete implementation of memoization with my code, but your optimizations were great — springathing, CommentedDec 7, 2018 at 19:23

score 2 · Accepted Answer · 2018-12-06 11:05:49Z

I'm a little late with this answer, and from your question and the other answers I can't figure out if the problem has changed from TLE to memoization.

For small code strings (length < 100) it will help a lot avoiding the creation of the substrings, but for larger code strings this is a minor problem compared to the bad overall time complexity as Pieter Witvoet has explained.

Below I've refactored your algorithm so it no longer creates strings but instead creates the values to check by looking chars up in the code string itself. I've changed the names of your variables - which doesn't improve the execution time but only my understanding of the code :-):

public int NumDecodings(string code) { if (code == null || code.Length == 0) return 0; int count1 = Decode(code, 0, 1); int count2 = Decode(code, 0, 2); return count1 + count2; } public int Decode(string code, int offset, int numLength) { if (offset + numLength - 1 >= code.Length || code[offset] == '0') return 0; int value = 0; if (numLength == 1) value = code[offset] - '0'; else { value = (code[offset] - '0') * 10 + (code[offset + 1] - '0'); } if (value > 26 || value < 1) return 0; else if (offset + numLength - 1 == code.Length - 1 && value < 27 && value > 0) return 1; int count1 = offset + numLength - 1 + 1 < code.Length ? Decode(code, offset + numLength, 1) : 0; int count2 = offset + numLength - 1 + 2 < code.Length ? Decode(code, offset + numLength, 2) : 0; return count1 + count2; }

When testing against 5000 random valid strings with a max length of 60 the above refactoring reduces the total execution time from about 1100 to 60 ms in release mode.

The below is an iterative version, that to the best of my knowledge is O(n) in time complexity:

public int CountCodeCombinations(string code) { if (string.IsNullOrWhiteSpace(code) || code[0] == '0') return 0; if (code.Any(ch => !char.IsDigit(ch))) throw new ArgumentException("Only digit characters are allowed in the input string", nameof(code)); int length = code.Length; int combinations = 1; char current; char next; int lastCodeIndex = code.Length - 2; int lastDifference = 0; for (int i = length - 2; i >= 0; i--) { current = code[i]; next = code[i + 1]; if (next == '0') { if (current > '2' || current == '0') return 0; } else if (current == '1' || current == '2' && next < '7') { if (i < length - 2 && code[i + 2] == '0') continue; int oldCombinations = combinations; if (combinations == 1) combinations += 1; else if (lastCodeIndex - i > 1) combinations = combinations * 2; else combinations = combinations * 2 - lastDifference; lastDifference = combinations - oldCombinations; lastCodeIndex = i; } } return combinations; }

While that helps, it's still O(2^n), so runtime will still blow up past certain input sizes. — Pieter Witvoet, CommentedDec 6, 2018 at 9:56

Stack Exchange Network

Leetcode #91 - Number of ways to decode a string

3 Answers 3

Memoization

Other notes

Hot Network Questions

Leetcode #91 - Number of ways to decode a string

3 Answers 3

Memoization

Other notes

Related

Hot Network Questions