PEP-8
The Style Guide for Python Code enumerates a set of rules that all Python programs should follow. Of particular note is the Naming Conventions, which indicates that function names and variable names should all be snake_case
, not CapitalizedWords
. So MergeSort
should be renamed merge_sort
, and MyList
should be renamed my_list
.
Private functions
merge
is an internal helper function for MergeSort
. PEP-8 also recommends naming internal helpers with a leading underscore, to indicate it is private, and should not be used externally, so merge
should be named _merge
.
Repeated computations and lookups
while len(left) > i and len(right) > j: if left[i] > right[j]: sorted_list.append(right[j]) j += 1 else: sorted_list.append(left[i]) i += 1
This code has some inefficiencies.
- Every iteration is recomputing
len(left)
and len(right)
. - Every iteration is performing 3 indexing operations, first looking up
left[i]
and right[j]
, and then looking up either left[i]
or right[j]
depending on which branch of the if
/else
is taken.
You only need to test len(left) > i
when i
changes, and len(right) > j
when j
changes:
while True: if left[i] > right[j]: sorted_list.append(right[j]) j += 1 if len(right) > j: break else: sorted_list.append(left[i]) i += 1: if len(left) > i: break
Now, the len()
function should be called half as often, and only the part of the expression that is changing is being evaluated, and so this code, while longer, should actually run faster.
Similarly, we can remove the repeated indexing operations, by only performing the indexing operation when the corresponding index changes:
left_value = left[i] right_value = right[j] while True: if left_value > right_value: sorted_list.append(right_value) j += 1 if len(right) > j: break right_value = right[j] else: sorted_list.append(left_value) i += 1: if len(left) > i: break left_value = left[i]
Again, more code, but it should run faster.
Iterators
The above code is complicate because we are responsible for testing if the indices get to the end of the list, looking up the values at the given indices, and incrementing the indices.
Python has iterators, which do all of these operations for us! So the above code could simply be written as:
left_iter = iter(left) right_iter = iter(right) left_value = next(left_iter) right_value = next(right_iter) while True: if left_value > right_value: sorted_list.append(right_value) right_value = next(right_iter) else: sorted_list.append(left_value) left_value = next(left_iter)
This takes the left
and right
lists, creates iterators for them, extracts the first values of each, and then adds the smaller of the two values to the sorted_list
, and then fetches the next value from the corresponding iterator.
The only problem is it will raise a StopIteration
exception when one of the two iterators runs out. We just need to catch this, and add the remaining entries from the non-expired iterator. But which one? Actually this is easy, since the values for left_value
and right_value
haven't changed, left_value > right_value
will reveal which branch of the if
/else
the exception was raised in.
The only "tricky" part to remember is that if the next(left_iter)
raised an exception, then right_value
should be added to the sorted_list
before the rest of the values from right_iter
, and vis-versa. sorted_list.extend()
is an easy way of adding the remaining items from the non-expired iterator.
def _merge(left, right): left_iter = iter(left) right_iter = iter(right) sorted_list = [] # These should never fail because both left & right have one or more elements. left_value = next(left_iter) right_value = next(right_iter) try: while True: if left_value > right_value: sorted_list.append(right_value) right_value = next(right_iter) else: sorted_list.append(left_value) left_value = next(left_iter) except StopIteration: if left_value > right_value: sorted_list.append(left_value) sorted_list.extend(left_iter) else: sorted_list.append(right_value) sorted_list.extend(right_iter) return sorted_list
We've approximately double the number of lines of code in _merge(left, right)
, but the code should be faster, since all of the indexing, is now being handled by Python itself.