I've written a short Python script that attempts to solve the problem of finding the longest common substring using the dynamic programming technique. It is meant to be general, so I can plug in any number of strings and it will find their longest common substring.
    from collections import defaultdict
    from itertools import product
    import operator

    def longest_common_substring(*strings):
        table = defaultdict(int)
        # one table entry per tuple of positions, one position in each string
        for pos in product(*(range(len(s)) for s in strings)):
            # True if every string has the same character at its position
            same = len(set(s[i] for s, i in zip(strings, pos))) == 1
            # extend the common suffix ending one position earlier, or reset
            table[pos] = table[tuple(n - 1 for n in pos)] + 1 if same else 0
        # returns (tuple of end positions, length of the longest common substring)
        return max(table.items(), key=operator.itemgetter(1))
This works fine for a small number of short strings, but the space and time costs absolutely blow up with longer strings.
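To make the blow-up concrete: the table holds one entry per position tuple, so its size (and the number of loop iterations) is the product of the string lengths. The lengths below are just illustrative:

```python
from math import prod

# hypothetical input: three strings of 1,000 characters each
lengths = [1000, 1000, 1000]
print(prod(lengths))  # 1000000000 table entries, and as many iterations
```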
I got this from Wikipedia, and since this approach is clearly terrible for multiple longer strings (or maybe my implementation is just bad!?), I am wondering what I could do to improve it? Wikipedia also mentions generalised suffix trees... I am not familiar with them at all, so would that be a better approach?
Also, if it's my implementation, I'd love to know what's wrong and what I could do better in terms of space complexity.
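For reference, my understanding is that the two-string version can get by with O(min(m, n)) extra space by keeping only the previous DP row instead of the whole table (at the cost of only reporting one best match). Here's my own sketch of that idea (`lcs_two` is my name for it); I'm not sure how it generalises to more than two strings:

```python
def lcs_two(a, b):
    """Longest common substring of two strings, keeping only the previous
    DP row, so extra space is proportional to the shorter string."""
    if len(b) > len(a):
        a, b = b, a  # make b the shorter string
    prev = [0] * (len(b) + 1)
    best_len, best_end = 0, 0
    for i, ca in enumerate(a, 1):
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                # extend the common suffix ending one position earlier
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len : best_end]


print(lcs_two("xabcay", "zabca"))  # abca
```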