3
\$\begingroup\$

Problem:

Given a list L consisting of not necessarily unique non-negative integers, find the minimum number of subset with maximum sum k (given) in which it can be divided such that:

each subset shouldn't contain consecutive elements from the list*.

(*): I can see an obvious ambiguity:
Q: If there is 3 4, is the restriction is on all 3 or just the adjacent three?

A: Just on adjacent 3.

Examples:

 1. L = [2, 1, 1, 2], k = 2 => 4 Explanation: First set can take 0th index element second set can take 1st index element and similarly third and fourth can take 2nd and 3rd index element respectively Here, you can see that k is working as a bottleneck More examples: L = [2, 4, 1, 0] k = 8 => 2 L = [3, 2, 3, 0, 0] k = 5 => 3 

This seems to have some features of dynamic programming problems, so I went for dynamic programming and wrote the following recursive code:

from functools import lru_cache def min_count(L, k): ''' index - stores the current state's index sets - this stores information of already stored sets: It is tuple of tuple, In each tuple: First element - index of last stored element Second element - stores the sum of current set k - It is limit|capacity of each subset(same as the one given in arugment) ''' @lru_cache(maxsize=None) def recurse(index, sets, k): #If control reaches to the end of the list if index == len(L): return len(sets) #In case the current indexes is zero if L[index] == 0: #If this is the last index, then if index + 1 == len(L): return len(sets) return recurse(index + 1, sets, k) ''' The job of this for loop is iterate all of the stored sets: And if that's capable of storing further, then add current index to it, and recurse ''' m = 1e10 for i, j in sets: if i - index < -1 and j + L[index] <= k: if index + 1 == len(sets): return len(sets) m = min(m, recurse(index + 1,tuple(_ if _ != (i, j) else (index, j + L[index]) for _ in sets) , k)) m = min(m, recurse(index + 1, sets + ((index, L[index]), ), k)) return m return recurse(0, (), k) 

This code works, but the problem is that it is very slow; the expected time complexity must be lower than \$O(n^2)\$.

How can I make it fast?

\$\endgroup\$
2
  • \$\begingroup\$when would more than two subsets be required? Taking every other element should be non consecutive\$\endgroup\$CommentedJan 7, 2021 at 16:55
  • \$\begingroup\$Example: L = [2, 2, 2, 2, 2] with k = 2\$\endgroup\$
    – xrfxlp
    CommentedJan 7, 2021 at 17:04

2 Answers 2

1
\$\begingroup\$

Here are some minor coding style suggestions. They are not likely to improve performance.

Documentation

Add a docstring for the min_count function to describe its purpose, its input types and return type.

Layout

Add a blank line between the import statement and the function.

The placement of the first docstring in your code is misleading because docstrings typically come after the function def line. Here is better placement:

from functools import lru_cache def min_count(L, k): ''' Given a list L consisting of not necessarily unique non-negative integers, find the minimum number of subset with maximum sum "k" in which it can be divided such that: each subset shouldn't contain consecutive elements from the list. ''' @lru_cache(maxsize=None) def recurse(index, sets, k): ''' index - stores the current state's index sets - this stores information of already stored sets: It is tuple of tuple, In each tuple: First element - index of last stored element Second element - stores the sum of current set k - It is limit|capacity of each subset(same as the one given in arugment) ''' 

There is a typo in the docstring: arugment

This long line is hard to read and understand:

m = min(m, recurse(index + 1,tuple(_ if _ != (i, j) else (index, j + L[index]) for _ in sets) , k)) 

The black program can be used to automatically reformat it across several lines to more clearly show its structure.

DRY

This expression is used repeatedly:

len(L) 

You should set it to a variable:

list_len = len(L) 

The same is true for len(sets).

Naming

The code would be easier to understand with more meaningful names for some of the variables. For example, j is typically used as an index, but it looks like you are using it for a sum.

m does not convey much meaning. If it is used for a maximum of some sort, then having "max" in the name would be helpful.

\$\endgroup\$
    1
    \$\begingroup\$

    Your implementation has extremely severe superlinear time blowup. Comparing your implementation to one I'll describe momentarily, here are some rudimentary benchmark comparisons:

     n, old_ms, new_ms 2, 0.0, 4.4 3, 0.0, 4.3 4, 0.0, 4.7 5, 0.1, 5.2 6, 0.1, 5.8 7, 0.4, 6.9 8, 2.0, 8.2 9, 9.7, 10.0 10, 57.1, 13.5 11, 336.5, 17.4 12, 2081.5, 20.2 13,19222.3, 24.8 

    If we were to believe these figures blindly and assume polynomial time, then the equivalent exponent across the last two times would be

    $$\frac {\log (19/2.2)} {\log (13/12)} \approx 27$$

    but of course the formal time complexity is not \$O(n^{27})\$. For a semi-educated guess, it may be \$O(n^{n/2})\$. I'm not going to put more effort into figuring out what it is in reality.

    The alternative method I propose has a little more setup time but gentler time complexity. (How much exactly is "complicated" due to being LP.)

    About the alternative implementation:

    • Frame it as an integer linear program (my frequent hammer for this kind of nail)
    • The problem size is \$ (3n + n(n - 1) ) \times ( n + n^2) \$
    • The problem is fairly sparse; all four of the constraints require Kronecker products:
      • Exclusive element-group assignment
      • Group sum limit
      • Correlation of group assignment and element assignment
      • Consecutive element prevention

    This demonstration shows how to construct the sparse constraint matrices by hand; other high-level libraries will be able to make it a little more intuitive.

    import functools import time import typing import numpy as np import scipy.sparse as sp from scipy.optimize import milp, Bounds, LinearConstraint def op_min_count(L, k): ... def min_subsets( inputs: typing.Sequence[int], # i.e. 'L' sum_limit: int, # i.e. 'k' ) -> int: n = len(inputs) # Variables: # n binary group enables # n*n binary element-group assignments, slow axis is group, fast axis is element minimize_group_count = np.zeros(n + n*n) minimize_group_count[:n] = 1 # Elements must be assigned to exactly one group group_zeros = sp.csc_array((n, n)) eye_n = sp.eye_array(n, n) h_ones = np.ones((1, n)) exclusive_assignment = LinearConstraint( A=sp.hstack( (group_zeros, sp.kron(h_ones, eye_n)), format='csc', ), lb=1, ub=1, ) # Limit group sums input_array = np.asarray(inputs)[np.newaxis, :] group_sums = LinearConstraint( A=sp.hstack( (group_zeros, sp.kron(eye_n, input_array)), format='csc', ), ub=sum_limit, ) # Correlate element assignment and group assignment # element <= group: 0 <= group - element assignment_correlation = LinearConstraint( A=sp.hstack( ( n*eye_n, sp.kron(eye_n, -h_ones), ), format='csc', ), lb=0, ) # Consecutive inputs elements may not be assigned to the same output group # x_i + x_{i+1} <= 1 no_consecutives = LinearConstraint( A=sp.hstack( ( sp.csc_array((n*(n - 1), n)), sp.kron( eye_n, sp.eye_array(n - 1, n) + # first element in consecutive run sp.eye_array(n - 1, n, k=1), # second element in consecutive run ), ), format='csc', ), ub=1, ) result = milp( c=minimize_group_count, integrality=1, # all variables are integral bounds=Bounds(lb=0, ub=1), # all variables are binary constraints=( assignment_correlation, exclusive_assignment, group_sums, no_consecutives, ), ) if not result.success: raise ValueError(result.message) group_enables = np.empty(shape=n, dtype=np.int8) element_assignments = np.empty(shape=(n, n), dtype=np.int8) np.rint(result.x[:n], out=group_enables, casting='unsafe') np.rint(result.x[n:].reshape((n, n)), out=element_assignments, casting='unsafe') return group_enables.sum() def test() -> None: # OP tests assert min_subsets((2, 1, 1, 2), 2) == 4, 'k working as a bottleneck' assert min_subsets((2, 4, 1, 0), 8) == 2, 'two simple groups' assert min_subsets((3, 2, 3, 0, 0), 5) == 3, 'limited by consecutive constraint' def benchmark() -> None: rand = np.random.default_rng(seed=0) print(f'{"n":>2},{"old_ms":>7},{"new_ms":>7}') for n in range(2, 14): inputs = rand.integers(low=1, high=10, size=n) k = n * 10//2 t0 = time.perf_counter() size_a = op_min_count(inputs, k) t1 = time.perf_counter() size_b = min_subsets(inputs, k) t2 = time.perf_counter() assert size_a == size_b print(f'{n:2},{(t1 - t0)*1e3:7.1f},{(t2 - t1)*1e3:7.1f}') if __name__ == '__main__': test() benchmark() 

    This has a constraint sparsity pattern that - for an example n=6 - looks like

    sparsity pattern

    \$\endgroup\$

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.