5
\$\begingroup\$

How can the following recursion + memoization code be converted into a tabulation format dynamic programming solution?

The code is working, but I want to improve it.

The challenge I am facing is handling negative indices (count < 0) and shifting them to the positive side.

Problem link: https://leetcode.com/problems/valid-parenthesis-string/description/

Given a string s containing only three types of characters: '(', ')' and '*', return trueifsis valid.

The following rules define a valid string:

  • Any left parenthesis '(' must have a corresponding right parenthesis ')'.
  • Any right parenthesis ')' must have a corresponding left parenthesis '('.
  • Left parenthesis '(' must go before the corresponding right parenthesis ')'.
  • '*' could be treated as a single right parenthesis ')' or a single left parenthesis '(' or an empty string "".

Example 1:

Input: s = "()"
Output: true

Example 2:

Input: s = "(*)"
Output: true

Example 3:

Input: s = "(*))"
Output: true

Constraints:

  • 1 <= s.length <= 100
  • s[i] is '(', ')' or '*'.
public static boolean checkValidStringMem(String s) { int n = s.length(); Boolean[][] mem = new Boolean[n][n + 1]; return checkValidStringMem(s, 0, 0, mem); } public static boolean checkValidStringMem(String s, int i, int count, Boolean[][] mem) { if (count < 0) return false; if (i == s.length()) return count == 0; if (mem[i][count] != null) return mem[i][count]; if (s.charAt(i) == '(') return mem[i][count] = checkValidStringMem(s, i + 1, count + 1, mem); else if (s.charAt(i) == ')') return mem[i][count] = checkValidStringMem(s, i + 1, count - 1, mem); else // '*' can be ')' or '(' or empty character { return mem[i][count] = checkValidStringMem(s, i + 1, count + 1, mem) || checkValidStringMem(s, i + 1, count - 1, mem) || checkValidStringMem(s, i + 1, count, mem); } } 
\$\endgroup\$

    3 Answers 3

    6
    \$\begingroup\$

    When you use dynamic programming / memoization, document the meaning of the table entries or the function parameters/result. I usually write that down before I start coding. Here for example:

     // dp[i][count] = Whether after the first i characters // we might have count open parentheses boolean[][] dp = new boolean[n + 1][n + 1]; 

    Then:

    1. Start with dp[0][0] = true.
    2. Spread the trues.
    3. Return dp[n][0].

    The code:

    public static boolean checkValidString(String s) { int n = s.length(); // dp[i][count] = Whether after the first i characters // we might have count open parentheses boolean[][] dp = new boolean[n + 1][n + 1]; dp[0][0] = true; for (int i = 0; i < n; i++) { char c = s.charAt(i); for (int count = 0; count <= i; count++) { if (dp[i][count]) { if (c == '(') { dp[i + 1][count + 1] = true; } else if (c == ')') { if (count > 0) dp[i + 1][count - 1] = true; } else { dp[i + 1][count + 1] = true; if (count > 0) dp[i + 1][count - 1] = true; dp[i + 1][count] = true; } } } } return dp[n][0]; } 

    You can save memory by only holding two table rows, for the current and the next row (i and i + 1).

    You can save even more memory, and a lot of time, and code, by realizing that the trues in each row are contiguous. For example after seeing the characters "((*(" we might have 2, 3 or 4 open parentheses. It's not possible to have something like 2 or 4 but not 3.

    So you really only need to remember the minimum and maximum number of open parentheses. This takes O(n) time and O(1) space:

    public static boolean checkValidString(String s) { int n = s.length(); int minOpen = 0, maxOpen = 0; for (int i = 0; i < n; i++) { char c = s.charAt(i); if (c == '(') { minOpen++; maxOpen++; } else if (c == ')') { if (minOpen > 0) minOpen--; if (maxOpen == 0) return false; maxOpen--; } else { if (minOpen > 0) minOpen--; maxOpen++; } } return minOpen == 0; } 

    Or we could use an integer's bits to reflect a dp row (requires import java.math.BigInteger;). This makes it O(n^2) again, but n is limited to 100, so that's fine. And it's short and simple code, as the integers do looping and clipping for us:

    public static boolean checkValidString(String s) { BigInteger open = BigInteger.ONE; for (char c : s.toCharArray()) if (c == '(') open = open.shiftLeft(1); else if (c == ')') open = open.shiftRight(1); else open = open.or(open.shiftLeft(1)) .or(open.shiftRight(1)); return open.testBit(0); } 

    And actually the n=100 limit means tracking up to 50 open parentheses suffices and we don't need BigInteger (just saw that pointed out on LeetCode when I looked whether someone had done this already).

    public static boolean checkValidString(String s) { long open = 1; for (char c : s.toCharArray()) if (c == '(') open <<= 1; else if (c == ')') open >>= 1; else open |= (open << 1) | (open >> 1); return (open & 1) == 1; } 
    \$\endgroup\$
    5
    • \$\begingroup\$Instead of documenting what the table means in comments, better to use meaningful names for variables.\$\endgroup\$CommentedFeb 27 at 9:57
    • \$\begingroup\$@kiner_shah No, definitely not. Maybe additionally, but not instead. And... I checked: all of my variable names already are meaningful (and most are common).\$\endgroup\$CommentedFeb 27 at 10:12
    • \$\begingroup\$I disagree with you. It should not be additionally, but instead. You can better rename the variables in your very first function. For instance, dp can be renamed to lookupTable or cache or precomputed. Your loop variable i can be renamed to afterFirstKCharacters or simply afterFirstK or a better name. count can be renamed to noOfOpenParentheses. Some effort needs to be put, but in the end, the code is easily understandable.\$\endgroup\$CommentedFeb 27 at 10:55
    • \$\begingroup\$@kiner_shah No. Not even all of your longest names together adequately explain the meaning of the table. So removing the explanation would be bad. Why do you want to do that? And dp is the standard name, see for example LeetCode's own editorial for this problem. Using standards is good. Helps people understand. And a ridiculously long nonstandard name like afterFirstKCharacters instead of the standard and easy to read i? That would just make the code hard to read, you'd barely be able to see the algorithm behind your masses of characters.\$\endgroup\$CommentedFeb 27 at 11:28
    • \$\begingroup\$Well, it seems we both disagree with each other. I will leave it here and take a snack break.\$\endgroup\$CommentedFeb 27 at 11:31
    5
    \$\begingroup\$

    What I like

    • The code has a nice and consistent layout that inspires confidence.

    What I don't like

    • Both functions have the same name, although one is the 'interface' while the other is the 'worker bee'.
    • Not fond of generic count representing the dearth of comments available to the reader.
    • How many times will the string's length be determined during execution?
    • if/else if/else combined with 3 returns is suggestive of a necessity that is not part of the solution. Two if conditionals would suffice.
    • Brute force will find a resolution ("good" or "bad"), but is costly. Imagine the string's maximum length was 100,000 or more...

    Suggestions

    Is "memoization" a requirement, or simply an approach taken?

    1. Try downcountingi so the base case will be when i == 0.
    2. Re-write without recursion (that has the potential to burst any stack).
    3. Use "pre-filtering" of the string to potentially reduce the search space.
      "...()...", "^(....)$", "...)(..." are samples of substrings that could be (iteratively) eliminated before searching the many, many permutations (with backtracking) of what remains.

    Late addition:

    Exhaustive searching of large permutation possibilities is to be avoided.

     if (count < 0) // have run-out of open's and possible open's 

    With full credit to @BenVoigt for his comment, another limit that may prune the search space should be used:

     if (count < 0 || count > (s.length - i)) 

    If more than 1/2 of the characters were 'opens', there's no possibility to close them all with what remains.

    This would also shrink the 'memoization' array down to 1/2 its current size.

    Leaving it to the OP to account for any off-by-one problem.


    Speculating and heuristics

     else // '*' can be ')' or '(' or empty character { return mem[i][count] = checkValidStringMem(s, i + 1, count + 1, mem) || checkValidStringMem(s, i + 1, count - 1, mem) || checkValidStringMem(s, i + 1, count, mem); } 

    It would be better if the comment reflected the sequence of speculative interpretations of * that the code performs. In this case:

     else // '*' can be '(' or ')' or empty character '' 

    Further, the brute force search is blind to the characters that are yet to be examined.

    My gut feeling is that, without resorting to more complexity, it may be better to initially presume any given * is an 'empty character', thereby placing fewer expectations on the values of those remaining characters. What remains is unknown. As written, the code initially compels a deeper nesting by what remains, and must explore those permutations. Consider the operations to determine that the string "***(*******)****" IS valid.

    \$\endgroup\$
    3
    • 1
      \$\begingroup\$"Newhart" fans will appreciate the humour of, "This is my brother Daryl... and this is my other brother Daryl..." :-)\$\endgroup\$
      – Fe2O3
      CommentedFeb 13 at 6:25
    • 2
      \$\begingroup\$A more general approach to prefiltering might count the number of ( and ) and guarantee that the balance of * is equal to the mismatch between ( and ). This reduces the number of branches that need to be explored from pow(3, N) to O(nCr(N, k)). For example with 49 (s, 40 )s and 11 *s there can be at most 1 * representing a (.\$\endgroup\$
      – Ben Voigt
      CommentedFeb 14 at 0:06
    • \$\begingroup\$@BenVoigt Thank you for that. I think another answer, here, really does 'cut to the chase', using "tabulation" to quickly get to the desired result. The OP tackled a challenge in one way, and I hoped only to suggest some 'tweaks' to their approach. Again, thank you. :-)\$\endgroup\$
      – Fe2O3
      CommentedFeb 14 at 0:15
    4
    \$\begingroup\$

    Hello and welcome to code review!

    When you go through your repo two years from now and stumble upon this code, will you understand what it does? For your own sake, you must explain the logic of the algorithm in the class-level code comment. Since we have a rule of not changing the code after posting, you should add that explanation to the question text. I've recently come to the conclusion that the most important quality of a programmer is empathy. The ability to put yourself into someone else's position and think do they understand the code I wrote? This takes practice. The best way is to look at other people's code and think if you understand it, what would have to change for you to understand it and apply that to your own code. Guess what? You are in the correct place for that. :) Code review is almost as much learning to the reviewer as it is to the reviewee.

    Another important thing you should build into a habit is thinking about your variable names. The old joke that "there are only two hard things in computer science, concurrency and variable naming" is bollocks. Variable naming is easy. You just have to look at the name and think "does this name describe why the variable exists?" if it does, ask "does this name apply to more than one thing?" If it does, the name is too generic and you must improve the name until it passes the test.

    None of the parameter names s, i, count or mem pass this test. "S" should be input, "i" looks like it is the startIndex. Is count supposed to keep count of the the balance between left and right parenthesis? Then consider a name that has words "balance" and "parenthesis" in it.

    Regarding the algorithm, I don't understand it straight away and I have very little desire to reverse-engineer it from the code only. As leetcode problems go, this is pretty much as badly spcified as they come. There's only a few trivial examples of correct input, no description of invalid input. Is "()()" even supposed to be valid? I have to guess from the restrictions placed (the very limited input length) and the examples that this is intended to be a simple recursion problem of "nested parenthesis with an extra step" where you only need to consider the leftmost and rightmost characters at a given time.

    \$\endgroup\$
    0

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.