5
\$\begingroup\$

I was attempting to solve LeetCode 139 - Word Break

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

Example 1:

Input: s = "leetcode", wordDict = ["leet","code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:

Input: s = "applepenapple", wordDict = ["apple","pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple". Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output: false

Constraints:

  • 1 <= s.length <= 300
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 20
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

I watched some videos saying implementing a trie would be an efficient approach to solve this so I implemented a Trie as follows:

fileprivate class TrieNode { var data: Character? var isWordEnd = false var next = [Character: TrieNode]() } fileprivate struct Trie { var root = TrieNode() func insert(_ word: String) { var rootNode = root for char in word { rootNode = insertInternal(char, at: rootNode) } rootNode.isWordEnd = true } func insertInternal(_ char: Character, at node: TrieNode) -> TrieNode { var nextNode = TrieNode() if node.next[char] != nil { nextNode = node.next[char]! } nextNode.data = char node.next[char] = nextNode return nextNode } func doesWordExist(_ word: String) -> Bool { var rootNode = root for char in word { let nextNode = rootNode.next if nextNode[char] == nil { return false } rootNode = nextNode[char]! } return rootNode.isWordEnd } } 

Then in order to solve the problem, I took the recursion / backtracking approach with the help of the Trie created above in the following way:

func wordBreak(_ s: String, _ wordDict: [String]) -> Bool { let trie = Trie() for word in wordDict { trie.insert(word) } return wordBreakInternal(s, wordDict: trie) } fileprivate func wordBreakInternal(_ s: String, wordDict: Trie) -> Bool { if s.isEmpty { return true } let leftIndex = s.startIndex var rightIndex = s.startIndex while rightIndex < s.endIndex { let word = String(s[leftIndex ... rightIndex]) if wordDict.doesWordExist(word) { var nextStartIndex = rightIndex s.formIndex(after: &nextStartIndex) let nextWord = String(s[nextStartIndex ..< s.endIndex]) if wordBreakInternal(nextWord, wordDict: wordDict) { return true } } s.formIndex(after: &rightIndex) } return false } 

I feel the solution works in terms of giving the desired output for these test cases:

print(wordBreak("leetcode", ["leet","code"])) // true print(wordBreak("applepenapple", ["apple","pen"])) // true print(wordBreak("catsandog", ["cats","dog","sand","and","cat"])) // false print(wordBreak("aaaaaaa", ["aaaa","aaa"])) // true 

However, when a large string is the input:

s = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"

wordDict = ["a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa","aaaaaaaaa","aaaaaaaaaa"]

I get Time Limit Exceeded in my LeetCode submission.

Could someone advice if the optimization needs to be done in my trie implementation or in my backtracking / recursion with some ideas on how to make things more efficient ?

\$\endgroup\$

    2 Answers 2

    5
    \$\begingroup\$

    Summary: Your code is written clearly, and works correctly as far as I can see. It can be improved at a few points, and string slices can be used for a small performance improvement. In order to make it work with really large strings within the given time constraint, additional techniques like memoization must be used.

    Simplify and improve the code

    The var data: Character? of class TrieNode is assigned, but never used. It can be removed.

    In func insertInternal() a

    var nextNode = TrieNode() 

    is created but not used if node.next[char] already exists. Also the forced unwrapping can be avoided:

    func insertInternal(_ char: Character, at node: TrieNode) -> TrieNode { if let nextNode = node.next[char] { return nextNode } else { let nextNode = TrieNode() node.next[char] = nextNode return nextNode } } 

    In func doesWordExist the variable name of

    let nextNode = rootNode.next 

    is misleading: it is not a node but a dictionary. Again the forced unwrapping (and the comparison against nil) can be avoided:

    func doesWordExist(_ word: Substring) -> Bool { var rootNode = root for char in word { if let nextNode = rootNode.next[char] { rootNode = nextNode } else { return false } } return rootNode.isWordEnd } 

    Use Substring

    It is more efficient to pass Substrings to the recursively called wordBreakInternal() functions instead of Strings. Substring is a “string slice” and shares the storage with the original string. The helper function is then declared as

    func wordBreakInternal(_ s: Substring, wordDict: Trie) -> Bool 

    and called as

    return wordBreakInternal(s[...], wordDict: trie) 

    This makes the function a bit faster, but not fast enough to solve the programming challenge within the time limit.

    Memoization

    The function is too slow for long strings because of the heavy recursion. One approach to reduce the amount of recursive calls is memoization: For each substring which has been determined to be word-breakable or not, remember the result:

    func wordBreak(_ s: String, _ wordDict: [String]) -> Bool { let trie = Trie() var memo = [Substring: Bool]() for word in wordDict { trie.insert(word) } func wordBreakInternal(_ s: Substring) -> Bool { if let memoized = memo[s] { return memoized } if s.isEmpty { return true } let leftIndex = s.startIndex var rightIndex = s.startIndex while rightIndex < s.endIndex { let word = (s[leftIndex ... rightIndex]) if trie.doesWordExist(word) { var nextStartIndex = rightIndex s.formIndex(after: &nextStartIndex) let nextWord = (s[nextStartIndex ..< s.endIndex]) if wordBreakInternal(nextWord) { memo[s] = true return true } } s.formIndex(after: &rightIndex) } memo[s] = false return false } return wordBreakInternal(s[...]) } 

    I have also made the helper function a nested function in the main function, so that the memoization dictionary (and the trie) need not be passed around as function parameters.

    This function computes

    let s = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab" let wordDict = ["a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa","aaaaaaaaa","aaaaaaaaaa"] let result = wordBreak(s, wordDict) 

    in about 7 milliseconds (compiled in Release mode, running on a Macbook Air M2).

    To trie or not to trie

    You have used a Trie in order to determine quickly if the strings starts with one of the given words. As it turns out, using the built-in hasPrefix() method does this even better. Using string slices and memoization as before we get

    func wordBreak(_ s: String, _ wordDict: [String]) -> Bool { var memo = [Substring: Bool]() func wordBreakInternal(_ s: Substring) -> Bool { if let memoized = memo[s] { return memoized } if s.isEmpty { return true } for word in wordDict { if s.hasPrefix(word) { if wordBreakInternal(s.dropFirst(word.count)) { memo[s] = true return true } } } memo[s] = false return false } return wordBreakInternal(s[...]) } 

    which computes the above test case in less than a millisecond.

    \$\endgroup\$
    1
    • 1
      \$\begingroup\$Thank you very much Martin. This answer was so helpful. It helped me optimize my code, improved code readability but it introduced me to memoization which was a key concept in optimizing the solution for submission. Your second solution was also amazing and was something I would never has thought of myself as I just learnt about the swift prefix functions after reading your answer - thank you !\$\endgroup\$CommentedSep 3, 2022 at 17:16
    3
    \$\begingroup\$

    Avoid repeated computations

    For every prefix that will not lead to a solution, the computation is repeated for all permutations of the words that could produce that prefix. For example given the words ["hot", "dog", "hotdog"], and the target string is "hotdogbreadandamillionmorechars", there are two ways to make the prefix hotdog, and the program will compute twice that it's not possible to break the input to words. In this example that's ok, but for the example where you get time limit exceeded it's easy to see that there are many prefixes for which there is a large number of permutations.

    To avoid repeated computations, you could use a set of indexes for which the result has already been computed, and skip.

    Another way is to use memoization as the other answer suggested.

    Take advantage of the trie

    The code doesn't take advantage of the trie to its full potential. The main loop in wordBreakInternal moves the rightIndex to the right one by one, and in every iteration checking if the substring defined by leftIndex and rightIndex is in the trie or not. That is, each iteration starts searching in the trie from the top.

    For example if the tree contains "apple" and you run wordBreakInternal for "apple", the loop will search for "a", then "ap", then "app", and so on, and in each iteration the trie also explores the same sequence of characters.

    A more effective use of the trie would be to add a findPrefixes method, which would take target string to match, and returned all the valid prefixes that exist in the trie. For example if the tree contains ["ap", "apple", "pie"] and the input is "applepie", it would return "ap" and "apple". The caller would then trie to see if it's possible to make the remaining of the input from "plepie" and "pie", respectively.

    Use simpler and more precise names

    In insert, the variable rootNode is repeatedly replaced with an inserted child node. Then it's not the root anymore, so simply node would be a better name, less confusing.

    doesWordExist is a bit lengthy, and "word" is already implied by the trie concept. How about simply exists, or more precisely hasPrefix.

    \$\endgroup\$
    1
    • 2
      \$\begingroup\$Thank you very much Janos. Your tips helped me improve my code readability and avoid redundant computations. The most useful part for me was your tip on using the trie to find prefixes - I would never have thought of this on my own. Thank you very much for taking the time to answer me ... +1\$\endgroup\$CommentedSep 3, 2022 at 17:18

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.