2
\$\begingroup\$

I wasn't able to find a standardized source for a working example of Merge Sort.

Each learning site had a slightly different take on the algorithm implementation, disregarding the bottom-top implementation.

I wrote my own code for this but unsure if it's the most efficient or if efficiency can be improved?

Any other syntax/logic improvements are welcomed to be pointed out.

def merge(left, right): i = 0 #left index j = 0 #right index sorted_list = [] while len(left) > i and len(right) > j: if left[i] > right[j]: sorted_list.append(right[j]) j += 1 else: sorted_list.append(left[i]) i += 1 sorted_list += left[i:] + right[j:] return sorted_list def MergeSort(MyList): if len(MyList) > 1: mid = len(MyList) // 2 left = MyList[mid:] right = MyList[:mid] left = MergeSort(left) right = MergeSort(right) sorted_list = merge(left, right) return sorted_list return MyList 

Also, a source where I can find standardized code for most algorithms in Python would be nice? By standardized; I mean code that is deemed efficient and the most up-to-date by the Python community.

\$\endgroup\$
0

    1 Answer 1

    2
    \$\begingroup\$

    PEP-8

    The Style Guide for Python Code enumerates a set of rules that all Python programs should follow. Of particular note is the Naming Conventions, which indicates that function names and variable names should all be snake_case, not CapitalizedWords. So MergeSort should be renamed merge_sort, and MyList should be renamed my_list.

    Private functions

    merge is an internal helper function for MergeSort. PEP-8 also recommends naming internal helpers with a leading underscore, to indicate it is private, and should not be used externally, so merge should be named _merge.

    Repeated computations and lookups

     while len(left) > i and len(right) > j: if left[i] > right[j]: sorted_list.append(right[j]) j += 1 else: sorted_list.append(left[i]) i += 1 

    This code has some inefficiencies.

    1. Every iteration is recomputing len(left) and len(right).
    2. Every iteration is performing 3 indexing operations, first looking up left[i] and right[j], and then looking up either left[i] or right[j] depending on which branch of the if/else is taken.

    You only need to test len(left) > i when i changes, and len(right) > j when j changes:

     while True: if left[i] > right[j]: sorted_list.append(right[j]) j += 1 if len(right) > j: break else: sorted_list.append(left[i]) i += 1: if len(left) > i: break 

    Now, the len() function should be called half as often, and only the part of the expression that is changing is being evaluated, and so this code, while longer, should actually run faster.

    Similarly, we can remove the repeated indexing operations, by only performing the indexing operation when the corresponding index changes:

     left_value = left[i] right_value = right[j] while True: if left_value > right_value: sorted_list.append(right_value) j += 1 if len(right) > j: break right_value = right[j] else: sorted_list.append(left_value) i += 1: if len(left) > i: break left_value = left[i] 

    Again, more code, but it should run faster.

    Iterators

    The above code is complicate because we are responsible for testing if the indices get to the end of the list, looking up the values at the given indices, and incrementing the indices.

    Python has iterators, which do all of these operations for us! So the above code could simply be written as:

     left_iter = iter(left) right_iter = iter(right) left_value = next(left_iter) right_value = next(right_iter) while True: if left_value > right_value: sorted_list.append(right_value) right_value = next(right_iter) else: sorted_list.append(left_value) left_value = next(left_iter) 

    This takes the left and right lists, creates iterators for them, extracts the first values of each, and then adds the smaller of the two values to the sorted_list, and then fetches the next value from the corresponding iterator.

    The only problem is it will raise a StopIteration exception when one of the two iterators runs out. We just need to catch this, and add the remaining entries from the non-expired iterator. But which one? Actually this is easy, since the values for left_value and right_value haven't changed, left_value > right_value will reveal which branch of the if/else the exception was raised in.

    The only "tricky" part to remember is that if the next(left_iter) raised an exception, then right_value should be added to the sorted_list before the rest of the values from right_iter, and vis-versa. sorted_list.extend() is an easy way of adding the remaining items from the non-expired iterator.

    def _merge(left, right): left_iter = iter(left) right_iter = iter(right) sorted_list = [] # These should never fail because both left & right have one or more elements. left_value = next(left_iter) right_value = next(right_iter) try: while True: if left_value > right_value: sorted_list.append(right_value) right_value = next(right_iter) else: sorted_list.append(left_value) left_value = next(left_iter) except StopIteration: if left_value > right_value: sorted_list.append(left_value) sorted_list.extend(left_iter) else: sorted_list.append(right_value) sorted_list.extend(right_iter) return sorted_list 

    We've approximately double the number of lines of code in _merge(left, right), but the code should be faster, since all of the indexing, is now being handled by Python itself.

    \$\endgroup\$

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.