Optimize function that returns length of a subarray

Question

I did an exercise today on where the task is to write a function that returns the length of the longest subarray where the difference between the smallest and biggest value in the subarray is no more than 1. I had a solution that works but it's too slow to pass the tests that have big arrays. What is it about my solution that makes it very slow with huge arrays?

def longestSubarray(arr): longest = 0 arr_len = len(arr) for i in range(arr_len): for j in range(i, arr_len): sub = arr[i: j+1] if max(sub) - min(sub) > 1: continue length = len(sub) if length > longest: longest = length return longest

RootTwo · Accepted Answer · 2020-07-03 01:14:47Z

The code is slow on larger arrays because of the nested for loops. The number of times the inner loop runs depends on the length of the array. The outer loop runs arr_len times and the inner loop runs arr_len/2 times on average. As a result the code in the loop runs arr_len**2 / 2. If the size of the array doubles, the amount of work the code does quadruples. If the size goes up by a thousand, the work goes up by a million. You may see this described as O(n^2) time complexity.

The trick is to find an algorithm than scans the array once (or maybe a few times). Here are some observations that might help:

If the 1st element of a subarray is x, then a valid subarray is a sequence of x's, a sequence of x's and x+1's or a sequence of x's and x-1's. For example, [2,2,2], [2,2,3,3,2,3], and [2,1,1,2,2,2,2,2] could be valid subarrays.

Depending on the form of the subarray, min, max is either (x,x), (x, x+1), or (x-1, x). And all the values in the valid subarray are min or max.

Depending on the value that ends the current subarray, a new subarray can start where it changed between min to max, or the reverse. Or it could start with the new value. For example, [1,1,2,2,3,3] has overlapping subarrays [1,1,2,2] and [2,2,3,3]. But [1,1,2,2,0,0] doesn't: [1,1,2,2] and [0,0].

l0b0 · Accepted Answer · 2020-07-02 20:43:58Z

Specific suggestions:

In many situations it would probably be useful to return sub rather than len(sub). This makes the code somewhat simpler and gives the caller more information. Depending on the context that may or may not be useful, but it simplifies the code, which makes it easier to focus on the core issue.
The code iterates over all sub-lists, calculating min and max of each of them. This is a lot of duplicate work. You can instead iterate only once over the list (with some backtracking):
1. Set the longest sub-list to the empty list.
2. Set the working sub-list to a list containing just the first item in the input list.
3. Set working sub-list min and max to the first item in the working sub-list.
4. Move along the input list, updating min and max as long as the difference between them ends up being less than one and adding the current item to the working sub-list.
5. Once you encounter a number which makes the difference between min and max more than one (or the end of the list, at which point you'd terminate after this step), check if the working sub-list is longer than the longest sub-list and update the latter if so.
6. Once you encounter a number which makes the difference between min and max more than one you can't simply start the new working sub-list from that list item, because valid sub-lists may overlap (for example, [1, 1, 2, 3] has valid sub-lists [1, 1, 2] and [2, 3]). At this point you need to set the new working sub-list to include all the items at the tail end of the previous working sub-list which make a valid sub-list with the new item (if any).
7. Go to 3 until you reach the end of the list.

Tool support

black can automatically format your code to be more idiomatic, for example adding spaces around infix mathematical operators like +.
flake8 can give you more hints to write idiomatic Python beyond black:
```
 [flake8] ignore = W503,E203 
```

I would then recommend validating your type hints using a strict mypy configuration:

 [mypy] check_untyped_defs = true disallow_any_generics = true disallow_untyped_defs = true ignore_missing_imports = true no_implicit_optional = true warn_redundant_casts = true warn_return_any = true warn_unused_ignores = true

In your case it looks like the type of arr should be Iterable[int] and the output is int.

Stack Exchange Network

Optimize function that returns length of a subarray

2 Answers 2

Tool support

Hot Network Questions

Optimize function that returns length of a subarray

2 Answers 2

Tool support

Related

Hot Network Questions