1. Bugs
maxset
returns the wrong answer for some inputs. For example:
>>> maxset([2, 1, -1, 1, 3]) [2, 1]
The correct answer is [1, 3]
. The cause of this bug is the line:
if max_array < array[start:stop]:
which causes the sub-array with the higher sum (here, [1, 3]
) to be ignored because it lexicographically precedes max_array
(here, [2, 1]
).
maxset
goes wrong when all the elements of the input are negative:
>>> maxset([-1, -2]) -1
The number -1
is not a valid result: it's not the maximum sub-array, it's just a number.
It also goes wrong when the input is empty:
>>> maxset([]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "cr193902.py", line 7, in maxset max_array = array[0] IndexError: list index out of range
To fix these two problems, instead of this initialization:
max_array = array[0]
start with an empty list:
max_array = []
2. Review
The code in the post uses the print
statement and so only runs on Python 2.7. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. If you really can't use Python 3 then use:
from __future__ import print_function
so that your code is portable to Python 3.
The name maxset
is misleading since it does not return a set.
When you take a slice of a list, for example array[start:stop]
, Python has to copy out all the elements of the slice in order to construct a new list. This is a waste unless you are actually going to need the new list. So instead of:
len(array[start:stop])
(which creates a list and then throws it away again) you could write:
stop - start
This line copies out a slice of the list each time you find a new maximum sub-array:
max_array = array[start:stop]
but in many cases you are never going to need this list, because later on you will find an even bigger sub-array. So instead of copying out a slice, you could remember the location of the maximum sub-array found so far:
max_start = start max_stop = stop
and at the end of the function
return array[max_start:max_stop]
The code has a loop over the indexes into array
:
for i in range(len(array)):
and then within the loop it looks up the element at that index using array[i]
. Whenever you see this pattern, you can simplify it using the built-in function enumerate
. In this case the loop becomes:
for i, element in enumerate(array):
and wherever you had array[i]
you now have element
.
The two branches of the if: ... elif: ...
both have the line:
max_array = array[start:stop]
This duplication could be avoided by combining the two conditions into one:
if (max_sum < current_sum or (max_sum == current_sum and max_stop - max_start < stop - start)):
and this can be further simplified by comparing tuples:
if (max_sum, max_stop - max_start) < (current_sum, stop - start):
There's no need to maintain a running variable stop
since it always has the value i + 1
.
The test cases don't check the results. When I run it, I get the output:
>>> main() [1, 2, 5] [1] [1, 2, 5, 7, 2, 5] [222] [1, 1] [1, 1, 1] [5] [1, 1, 1] [1, 1, 1]
Is this right? The usual way to write a test case is to raise an exception if the result is incorrect. There are various ways to do this. You could use the assert
statement and write:
assert maxset([ 1, 2, 5, -7, 2, 5 ]) == [1, 2, 5]
or you could use the unittest
module from the Python standard library.
3. Revised code
def max_sub_array(array): """Return maximum sub-array of non-negative numbers.""" start = 0 # Start of current sub-array current_sum = 0 # Sum of current sub-array max_start = 0 # Start of maximum sub-array max_stop = 0 # Stop of maximum sub-array max_sum = 0 # Sum of maxium sub-array for stop, element in enumerate(array, start=1): if element >= 0: current_sum += element if (max_sum, max_stop - max_start) < (current_sum, stop - start): max_sum = current_sum max_start = start max_stop = stop else: start = stop current_sum = 0 return array[max_start:max_stop] import unittest class TestMaxSubArray(unittest.TestCase): CASES = [ # array, expected result ([], []), ([-1, -2], []), ([2, 1, -1, 1, 3], [1, 3]), ([1, 2, 5, -7, 2, 5], [1, 2, 5]), ([1,-1, 1, -1, 1, -1], [1]), ([1, 2, 5, 7, 2, 5] , [1, 2, 5, 7, 2, 5]), ([222], [222]), ([1, 1], [1, 1]), ([3, -1, 1, 1, 1, -1, 3], [1, 1, 1]), ([5, -1, 1, 1, 1, 1, -1, 5], [5]), ([3, -1, 1, 1, 1, -1, 2], [1, 1, 1]), ([3, -3, -1, 1, 1, 1, -1, 2], [1, 1, 1]), ] def test_max_sub_array(self): for array, expected in self.CASES: self.assertEqual(expected, max_sub_array(array))
4. A different approach
The single responsibility principle say that each piece of code should do just one thing. By splitting code into pieces with single responsibilities, the individual pieces become easier to understand, check and test.
So for this problem we could split the code into two parts:
- find all the sub-arrays of non-negative numbers in
array
; - find the maximum sub-array in the results of step 1.
For part 1 the Python standard library has a useful function itertools.groupby
for splitting an iterable into groups based on a key function. In this case the key function needs to be "is the element non-negative". Then for part 2 we can use the built-in max
function, again using a key function.
from itertools import groupby def max_sub_array(array): """Return maximum sub-array of non-negative numbers.""" sub_arrays = [list(group) for non_negative, group in groupby(array, lambda e: e >= 0) if non_negative] if sub_arrays: return max(sub_arrays, key=lambda a: (sum(a), len(a))) else: return []
In Python 3.4 or later, max
has a default
keyword argument to handle the case where there are no items, and so we can combine this into a single expression:
from itertools import groupby def max_sub_array(array): """Return maximum sub-array of non-negative numbers.""" return max((list(group) for non_negative, group in groupby(array, lambda e: e >= 0) if non_negative), key=lambda a: (sum(a), len(a)), default=[])