3
\$\begingroup\$

Hello programming gurus/reviewers, I attempted to solve the Minimum Window Substring (Leetcode 76) problem using Python 3. My solution involved using Counters and backtracking. On submission, the solution worked for all cases, other than the last test case where the strings were almost 10000 characters long. This last case timed out, and I am wondering if someone could point out the bottleneck(s) in my code that's causing this. Any input would be greatly appreciated. Thanks in advance!

from collections import Counter def get_indices(s: str, t: str, left_edge: int): """ Helper method to return the indices of test string 't' in main string 's' :param s: Main string to determine window in :param t: Test string :param left_edge: Starting index where search for t in s should take place :return: A List of indices corresponding to positions of characters of t in s. If character is not found, the index is set to -1 """ indices = [0] * len(t) index_list = dict() for index, char in enumerate(t): if char not in s[left_edge:]: index_list[char] = -1 else: if index_list.get(char, -1) >= 0: index_list[char] = s.find(char, index_list.get(char, 0) + 1) else: index_list[char] = s.find(char, left_edge) indices[index] = index_list[char] return indices def is_valid_window(main_string, search_string): """ Helper method to determine if the search_string is in the main_string. This is done using the character count in both strings :param main_string: String where the window is to be validated :param search_string: Substring to search for in the main string :return: True, if the search string is 'contained' in the main string False, otherwise """ main_counter = Counter(main_string) search_counter = Counter(search_string) for key, value in search_counter.items(): if main_counter[key] < value: return False return True def get_min_window(s, t): """ Main method to determine the smallest window of substring t in s. The substring may or may not be contained in the same order/sequence in s. If a substring is not found, return an empty string :param s: Main string to determine a window in :param t: Test string to search for, in the main string :return: If found, the smallest window in s, that contains t. Empty string otherwise """ min_window = '' # Set the min length to be larger than the main string's length min_length = len(s) + 1 # If the test string is larger than the main string, or if either string is empty, or the test string is a single # character which is not in the main string, return an empty string if not s or not t: return min_window if len(s) < len(t): return min_window if len(t) == 1: if t in s: return t else: return min_window # Markers to track the left and right edges of the window left = right = 0 while right < len(s): indices = get_indices(s, t, left) left = min(indices) right = max(indices) if left < 0: return min_window # At this point, we have a window in place. But now, we need to determine if this is the smallest window between # characters at 'left' and 'right' indices. For that, we move from 'left' to 'right' and see if t can still be # contained in s. This will give us a new left index. Then we determine if this is the smallest window we have # encountered thus far, and update if necessary while left < right: if is_valid_window(s[left + 1:right + 1], t): left += 1 break if right - left + 1 < min_length: min_window = s[left:right + 1] min_length = len(min_window) # We have now found the smallest window at index left. So we move on to the next index left += 1 return min_window 
\$\endgroup\$

    1 Answer 1

    1
    \$\begingroup\$

    Style, naming, comments, etc.

    Great job! Yes, you could add some more type annotations, but never mind. You're good.

    Optimization

    Python is a great language in terms of readability... and not so great in terms of performance traps. Every slice, in and find on str are \$O(n)\$ time complexity, and used in a loop over characters of that string, you get \$O(n^2)\$, which is not good. You got the idea right, but because of this hidden work you're losing it.

    Don't save the string - save its coordinates

    I think code speaks for itself:

    if right - left + 1 < min_length: min_right, min_left = right, left min_length = right - left + 1 ... return s[min_right:min_left+1] 

    One slice after the loop is enough.

    Keep track on counters

    The whole algorithm is something like this:

    • We start at left=0, right=0, counter of t (tcounter) filled (and remains so), counter of s[left:right] (scounter) empty.
    • While scounter doesn't have all needed characters (compare it to tcounter), increase right, adding s[right] into scounter (with Counter.update())
    • While scounter has all needed characters, increase left, removing from scounter (with Counter.subract())
    • Save (left-1,right) pair (not the substring!), if it is minimal (i.e. right-left is minimal), and repeat the process until right reaches the end of the string.
    • Now, use saved left and right values to get the slice you need.

    This algorithm is something near \$O(len(s) len(t) log(len(t)))\$, with everything after \$len(s)\$ is for accesing counter. I guess it should fit.

    Can we do better?

    I think yes. We can use lists instead of Counters (like scounter[s[right]-'a'] += 1 instead of scounter.update(s[right])), removing \$log(len(t))\$ part, and even more:

    • let's count the number of needed chars, say needed.
    • Whenever a char is added to scounter, if it brings us closer to the value in tcounter, needed is reduced.
    • Whenever a char is removed, if it makes not enough chars, needed is increased. -
    • If needed is 0 - we have the window between left and right. This way we can get \$O(len(s))\$ time complexity.
    \$\endgroup\$
    2
    • \$\begingroup\$Ahh thank you sir! My initial guess was that Counters were causing this to slow down. But I forgot about slicing and other operations operating in O(n) complexity! I am going to try your suggestions to see if I can get over the hump. Will probably update this thread in a few days.\$\endgroup\$CommentedJul 22, 2021 at 14:51
    • \$\begingroup\$Just incorporated your suggestions in a new method. Worked like a charm! Thank you!\$\endgroup\$CommentedJul 28, 2021 at 17:42

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.