Merge sort algorithm written in Python

Question

def merge_sort(mslist): if len(mslist) < 2: return mslist left = mslist[:int(len(mslist)/2)] right = mslist[int(len(mslist)/2):] left = merge_sort(left) right = merge_sort(right) return merge(left, right) def merge(left,right): result = [] while len(left) > 0 or len(right) > 0: if len(left) > 0 and len(right) > 0: if left[0] <= right[0]: result.append(left[0]) left.pop(0) else: result.append(right[0]) right.pop(0) elif len(left) > 0: result.append(left[0]) left.pop(0) elif len(right) > 0: result.append(right[0]) right.pop(0) return result def main(): mslist = [-9,1,1,2,3,2,7,3,6,4,5,4,3,5,6,7,8,9,0,0] print(merge_sort(mslist)) if __name__ == "__main__": main()

Please give me some comments about the actuall algorithm, i used the pseudo code from the wikipedia page as a reference.
Please give me some Pythonic advice or any other general coding paradigm you feel should be followed.

I already made changes to my code regarding the amount of times len(direction) is done.

Community · Accepted Answer · 2017-04-13 12:40:57Z

You took into account remarks from your other question and I couldn't be happier about this but I still have a few comments to make.

I usually find it quite hard to read code when the same variable gets assigned multiple times and this is not required. It makes me think "ok, at this particular point, this corresponds to this type of value while afterward, it corresponds to this other type of value". Typically, in your merge_sort function, left and right stands for "unsorted list" and then for "sorted list" making things a bit awkward. You could go for a solution with more variables or you could go for a solution with ... no variables :

def merge_sort(l): if len(l) <= 1: return l return merge( merge_sort(l[:int(len(l)/2)]), merge_sort(l[int(len(l)/2):]))

Also, you could make things clearer by adding a variable to denote the middle index.

def merge_sort(l): if len(l) <= 1: return l mid = len(l)//2 return merge( merge_sort(l[:mid]), merge_sort(l[mid:]))

In your merging procedure, you spend a lot of times checking the same conditions again and again. This is purely personal but I'd rather go for a solution where conditions are checked a minimal number of times :

while True: if len(left) > 0: if len(right) > 0: if left[0] <= right[0]: result.append(left[0]) left.pop(0) else: result.append(right[0]) right.pop(0) else: result.append(left[0]) left.pop(0) elif len(right) > 0: result.append(right[0]) right.pop(0) else: break return result

The main drawback is that it removes the symetry which is a bit of an issue from an aesthethic point of view.

Now, it is time to do things in a more efficient way. The traditional merge algorithm keeps track of the 2 indices going through the 2 lists. If we implement it that way, we have a huge boost of performance as we don't have to call pop anymore :

def merge(left, right): result = [] left_ind, right_ind = 0, 0 left_len, right_len = len(left), len(right) while True: if left_ind < left_len: left_val = left[left_ind] if right_ind < right_len: right_val = right[right_ind] if left_val <= right_val: result.append(left_val) left_ind+=1 else: result.append(right_val) right_ind+=1 else: result.append(left_val) left_ind+=1 elif right_ind < right_len: result.append(right[right_ind]) right_ind+=1 else: break return result

Just doing this, timing goes from :

(1, 5.2928924560546875e-05) (10, 0.0007579326629638672) (100, 0.011490106582641602) (1000, 0.15107107162475586) (10000, 4.067746877670288)

to :

(1, 3.790855407714844e-05) (10, 0.0004601478576660156) (100, 0.005733966827392578) (1000, 0.06619787216186523) (10000, 0.7241289615631104)

(Number on the left hand side is the number I multiply your list by before sorting it so it is proportionnal to the number of items, number on the right is the time it took on my desktop computer).

Finally, a last optimisation is to use the fact that there is no need to loop when we've reached the end of one of the list :

def merge(left, right): result = [] left_ind, right_ind = 0, 0 left_len, right_len = len(left), len(right) while True: if left_ind < left_len: if right_ind < right_len: left_val = left[left_ind] right_val = right[right_ind] if left_val <= right_val: result.append(left_val) left_ind+=1 else: result.append(right_val) right_ind+=1 else: # right is empty return result + left[left_ind:] elif right_ind < right_len: # left is empty return result + right[right_ind:] else: return result

Timing is now :

(1, 3.504753112792969e-05) (10, 0.00040793418884277344) (100, 0.004902839660644531) (1000, 0.06235790252685547) (10000, 0.6899728775024414)

Last comment to be nitpicky, your merge_sort has a quite unusual behavior as it returns a new list object sometimes (when the original list was longer than 1) and the original list sometimes. It could be cool to be consistent and always return a new list : return list(l).

Emanuele Paolini · Accepted Answer · 2014-08-07 12:17:04Z

Here:

if len(mslist) < 2:

I would write len(mslist) <= 1 because 1 is the interesting number here.

Here:

int(len(mslist)/2)

just write len(mslist)//2 which is explicit integer division.

Here:

 result.append(left[0]) left.pop(0)

you can write result.append(left.pop()) since pop returns the element removed.

The elifs in the merge function are not properly aligned.

Notice that the use of the pop method could alter algorithm complexity (because I think that pop is \$O(n)\$). It could be better to use two indices to keep track of the position in the two lists.

Thanks! Nice advices! I did a quickfix on the elif alignment. Thanks :) — macbug, CommentedAug 7, 2014 at 12:24

vnp · Accepted Answer · 2014-08-07 19:19:28Z

Testing for a validity of an index is very anti-pythonic. Follow Grace Hopper principle:

try: while True: while left[left_idx] < right[right_idx]: result.append(left[left_idx]) left_idx += 1 while left[left_idx] >= right[right_idx]: result.append(right[right_idx]) right_idx += 1 except IndexError: # append the rest of the lists; one of them is guaranteed to be empty

Stack Exchange Network

Merge sort algorithm written in Python

3 Answers 3

Linked

Hot Network Questions

Merge sort algorithm written in Python

3 Answers 3

Linked

Related

Hot Network Questions