Longest common substring using dynamic programming

Question

I've written a short python script that attempts to solve the problem of finding the longest common substring using the dynamic programming technique. It is meant to be generalised so I could plug in any number of strings and it would find the longest common substring.

def longest_common_substring(*strings): table = defaultdict(int) for pos in product(*(range(len(s)) for s in strings)): same = len(set(s[i] for s, i in zip(strings, pos))) is 1 table[pos] = table[tuple(map(lambda n: n - 1, pos))] + 1 if same else 0 return max(table.items(), key=operator.itemgetter(1))

This works fine for a small number of short strings, but the space and time complexity absolutely blows up with longer strings.

I got this from wikipedia, and since this approach is clearly terrible for multiple longer strings (or maybe my implementation is just bad!?), I am wondering what I could do to improve it? Wikipedia also metions a generalised suffix trees... I am not familiar with them at all so would that be a better approach?

Also, if its my implementation, I'd love to know what's wrong and what I could do better in terms of space complexity.

AChampion · Accepted Answer · 2016-10-29 15:08:39Z

Sorry I've not examined too closely your code to make a comment, but just considering the problem as stated and the fact you are using Py3, I would probably solve it with itertools.accumulate, e.g.:

>>> import itertools as it >>> import operator as op >>> ss = ["thisishello", "dfdsishdllo", "ashsisdsdsf"] >>> i, l = max(enumerate(it.accumulate(it.chain([0], zip(*ss)), ... lambda x, y: (x+1)*(len(set(y)) == 1))), key=op.itemgetter(1)) >>> i, l, ss[0][i-l:i] (6, 3, 'sis')

Should work well for an arbitrary number of strings as it uses generators and doesn't create any intermediate data structures.
It does use the fact that a False equates to 0 with len(set(y)) == 1 but if that is uncomfortable you could simply replace with 1 if len(set(y)) == 1 else 0.

Note: I still wish that itertools.accumulate had an initial value argument much like functools.reduce has, would avoid the need of chaining the initial value to the iterable.

This is a very interesting approach, I've never really used accumulate much. Very interesting use case :) I'll be sure to test it out! — Pavlin, CommentedOct 29, 2016 at 18:06

Stack Exchange Network

Longest common substring using dynamic programming

1 Answer 1

Hot Network Questions

Longest common substring using dynamic programming

1 Answer 1

Related

Hot Network Questions