Memory optimization when finding the Cartesian product in Python

Question

My function gets the Cartesian product of a list of lists of tuples. It correctly produces the list of lists, but it's a memory hog: for large data sets it raises a MemoryError, and the mapping/casting line is probably the biggest culprit. I have very little experience optimizing code with memory in mind. The function is given here:

    import itertools

    def make_ratio_groups(self):
        list_of_lists_of_tuples = []
        for part in whole:
            list_of_lists_of_tuples.append(part.list_of_tuples)
        all_groups = itertools.product(*list_of_lists_of_tuples)
        # Mapping and casting to list is necessary to put in the correct format for a subsequent function
        all_groups_list = list(map(list, all_groups))
        return all_groups_list

Is there a more memory-efficient way of doing this? Thanks in advance for your time and patience.

Edit: It's definitely the mapping and casting that's causing the problem. I created a generator that handled the Cartesian product calculations and didn't see a big improvement in performance, so I tried profiling and found that the mapping/casting is definitely the issue.
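
A note on why the last step dominates: `itertools.product` is lazy and yields one tuple at a time, whereas `list(map(list, ...))` builds every group at once. The number of groups is the product of the input list lengths, so it can be estimated before anything is materialized. The following is a minimal sketch, assuming Python 3.8+ for `math.prod`; `count_groups` and the sample data are hypothetical and not part of the original code.

```python
import itertools
import math

def count_groups(list_of_lists_of_tuples):
    """Return how many groups the Cartesian product would contain."""
    return math.prod(len(tuples) for tuples in list_of_lists_of_tuples)

# Hypothetical sample data standing in for the part.list_of_tuples values.
sample = [[(1, 2), (3, 4)], [(5, 6), (7, 8), (9, 10)], [(11, 12)]]

expected = count_groups(sample)    # 2 * 3 * 1 groups
lazy = itertools.product(*sample)  # nothing is built yet
actual = sum(1 for _ in lazy)      # consumes one group at a time
```

Because the count grows multiplicatively with each added list, checking it first gives a rough idea of whether a full list of lists can fit in memory at all.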

Can you use a generator? I.e., what does the downstream processing look like? – Stephen Rauch, Commented Mar 25, 2017 at 1:46
I'm not familiar with generators (yet; am reading up now). The returned object is subsequently mutated by two other functions; each time, some of the groups are removed from the object. The object needs to be iterable. – jda, Commented Mar 25, 2017 at 1:48
If you post more of the code, you'll get better answers (and comments). For example, what is the definition of whole? Also, definitely look into generators (i.e. yield instead of return), so that you don't have to keep everything in memory at once in the first place. – Quuxplusone, Commented Mar 25, 2017 at 5:42
Generators look fantastic; thank you guys for pointing them out! As for the rest, part and whole are complicated objects, but they have nothing to do with the code given here aside from demonstrating that the tuples in question are grabbed from a series of part objects. That's not a memory-intensive chunk of code. The only two pertinent sections IMO are the Cartesian product calculations and the subsequent mapping and casting to list. The informal profiling I've done supports that. I've been able to make a generator to replace product, but am not sure how to deal with map/cast. – jda, Commented Mar 25, 2017 at 5:50
I'm afraid your requirements as-is do not allow much memory optimization. You say that in the end you need the result to be a list of lists, and that is what's really large. If you can't change that requirement (perhaps a good compromise would be an iterable of lists), then there's not much you can do. – Rafael Lerm, Commented Mar 25, 2017 at 21:56
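
The "iterable of lists" compromise suggested in the comments could look something like the sketch below. It assumes the downstream functions only need to iterate over the groups once and that their "remove some groups" step can be expressed as a filter. `iter_ratio_groups`, `keep_group`, and the sample data are hypothetical names introduced for illustration, not the asker's actual code.

```python
import itertools

def iter_ratio_groups(list_of_lists_of_tuples):
    """Yield each group as a list, one at a time, instead of building a list of lists."""
    for group in itertools.product(*list_of_lists_of_tuples):
        yield list(group)

def keep_group(group):
    """Hypothetical filter: keep groups whose first tuple elements sum to an even number."""
    return sum(t[0] for t in group) % 2 == 0

# Hypothetical sample data standing in for the part.list_of_tuples values.
sample = [[(1, 2), (2, 3)], [(3, 4), (4, 5)]]

groups = iter_ratio_groups(sample)               # lazy: no groups built yet
filtered = (g for g in groups if keep_group(g))  # still lazy
result = list(filtered)                          # only the surviving groups are stored
```

Only the groups that pass the filter are ever held in memory together. The trade-off is that a generator can be consumed only once, so a downstream function that needs several passes or in-place mutation would still require a list.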
