You can do this more efficiently by using a trie.
Let's start from the beginning. Imagine the solution is ['we', 'these'].
You can call your check method with all of your initial characters, and it will tell you that only the characters: 'weths' are in the solution.
Then you can check for two character sequences, at the moment you're currently going back to the full set of initial characters, but you don't need to do this, you know that only the characters 'weths' are possible (+ the empty case where you've found a suffix. So you can just search for 'ww, we, wt, wh, ws, ew, ...' and so on.
After this you'll have the following combinations:
- 'we', 'e$', 'th', 'he', 'es', 'se' (where $ means no extension matched - i.e. a suffix)
Now we can try and extend again. But here we have a bit more of an advantage. When we look at the word 'we' we know that it must end in a valid two-letter sequence starting with 'e', i.e. 'es' or 'e$', so we can just try 'wes', and then when that doesn't work we know we have 'we$', part of the solution. We can repeat this logic with all the remaining sequences:
- 'we' -> either 'wes' or 'we$'
- 'th' -> either 'the' or 'th$
- 'he' -> either 'hes' or 'he$'
- 'es' -> either 'ese' or 'es$'
- 'se' -> either 'ses' or 'se$'
Once finished we're left with the following:
'we$', 'e$', 'the', 'hes', 'ese', 'se$'
We can drop the 'e$' as it's a substring of 'we$' and 'se$' and you're only looking for a minimal set of strings (I assume).
This brings us onto our third round, and at this point we can go through again, skipping complete words like 'we$'. But this time we can look at the last two letters of each word to find our next options:
- 'we$' Complete
- 'the' -> 'thes' or 'the$'
- 'hes' -> 'hese' or 'hes$'
- 'ese' -> 'ese$' only
This gives us: 'we$', 'e$', 'thes', 'hese', 'ese$', 'se$'
Now our fourth round:
- 'thes' -> 'these' or 'thes$'
- 'hese' -> 'hese$' only
And finally:
This gives us a list of valid suffices:
['we$', 'e$', 'these$', 'hese$', 'ese$', 'se$']
Which we can deduplicate as you did above.
In terms of implementation - I'd create a trie as a nested dictionary structure, with methods to add words and identify which possible continuations words have:
from collections import defaultdict def recursive_factory(): return defaultdict(recursive_factory) class Trie: def __init__(self): self._lookup = defaultdict(recursive_factory) def _add_string(self, string): lookup = self._lookup for char in string: lookup = lookup[char] def add_word(self, word): for offset in range(len(word)): self._add_string(word[offset:]) def add_words(self, words): for word in words: self.add_word(word) def get_valid_next_chars(self, prefix=''): lookup = self._lookup for char in prefix: lookup = lookup[char] return set(lookup.keys())
Then we just need to go through the steps above (I created a helper class to store the final result).
class WordList: def __init__(self): self._suffices = set() def add_suffix(self, suffix): to_drop = set() for old_suffix in self._suffices: if old_suffix in suffix: to_drop.add(old_suffix) self._suffices -= to_drop self._suffices.add(suffix) def add_suffices(self, suffices): for suffix in suffices: self.add_suffix(suffix) def words(self): return tuple(self._suffices) def next_guesses(valid_guesses, trie): for guess in valid_guesses: for char in trie.get_valid_next_chars(prefix=guess[1:]): yield guess + char def solve(initial_guesses, check, trie=None, word_list=None): if trie is None: trie = Trie() if word_list is None: word_list = WordList() valid_guesses = [guess for guess in initial_guesses if check(guess)] if valid_guesses: trie.add_words(valid_guesses) word_list.add_suffices(valid_guesses) solve(next_guesses(valid_guesses, trie), check, trie=trie, word_list=word_list) return word_list.words()
All in all this algorithm is about 200x faster on my machine. It can be made faster still by increasing the speed of the check function (which can also be very efficiently solved with a trie). But at this point we're only calling check ~360 times compared to over 4000.
check
function supposed to be a "black box", as in, it's an external function you have no control over? And you're trying to figure out which words satisfy it?\$\endgroup\$