You took into account remarks from your other question and I couldn't be happier about this but I still have a few comments to make.
I usually find it quite hard to read code when the same variable gets assigned multiple times and this is not required. It makes me think "ok, at this particular point, this corresponds to this type of value while afterward, it corresponds to this other type of value". Typically, in your merge_sort
function, left
and right
stands for "unsorted list" and then for "sorted list" making things a bit awkward. You could go for a solution with more variables or you could go for a solution with ... no variables :
def merge_sort(l): if len(l) <= 1: return l return merge( merge_sort(l[:int(len(l)/2)]), merge_sort(l[int(len(l)/2):]))
Also, you could make things clearer by adding a variable to denote the middle index.
def merge_sort(l): if len(l) <= 1: return l mid = len(l)//2 return merge( merge_sort(l[:mid]), merge_sort(l[mid:]))
In your merging procedure, you spend a lot of times checking the same conditions again and again. This is purely personal but I'd rather go for a solution where conditions are checked a minimal number of times :
while True: if len(left) > 0: if len(right) > 0: if left[0] <= right[0]: result.append(left[0]) left.pop(0) else: result.append(right[0]) right.pop(0) else: result.append(left[0]) left.pop(0) elif len(right) > 0: result.append(right[0]) right.pop(0) else: break return result
The main drawback is that it removes the symetry which is a bit of an issue from an aesthethic point of view.
Now, it is time to do things in a more efficient way. The traditional merge algorithm keeps track of the 2 indices going through the 2 lists. If we implement it that way, we have a huge boost of performance as we don't have to call pop
anymore :
def merge(left, right): result = [] left_ind, right_ind = 0, 0 left_len, right_len = len(left), len(right) while True: if left_ind < left_len: left_val = left[left_ind] if right_ind < right_len: right_val = right[right_ind] if left_val <= right_val: result.append(left_val) left_ind+=1 else: result.append(right_val) right_ind+=1 else: result.append(left_val) left_ind+=1 elif right_ind < right_len: result.append(right[right_ind]) right_ind+=1 else: break return result
Just doing this, timing goes from :
(1, 5.2928924560546875e-05) (10, 0.0007579326629638672) (100, 0.011490106582641602) (1000, 0.15107107162475586) (10000, 4.067746877670288)
to :
(1, 3.790855407714844e-05) (10, 0.0004601478576660156) (100, 0.005733966827392578) (1000, 0.06619787216186523) (10000, 0.7241289615631104)
(Number on the left hand side is the number I multiply your list by before sorting it so it is proportionnal to the number of items, number on the right is the time it took on my desktop computer).
Finally, a last optimisation is to use the fact that there is no need to loop when we've reached the end of one of the list :
def merge(left, right): result = [] left_ind, right_ind = 0, 0 left_len, right_len = len(left), len(right) while True: if left_ind < left_len: if right_ind < right_len: left_val = left[left_ind] right_val = right[right_ind] if left_val <= right_val: result.append(left_val) left_ind+=1 else: result.append(right_val) right_ind+=1 else: # right is empty return result + left[left_ind:] elif right_ind < right_len: # left is empty return result + right[right_ind:] else: return result
Timing is now :
(1, 3.504753112792969e-05) (10, 0.00040793418884277344) (100, 0.004902839660644531) (1000, 0.06235790252685547) (10000, 0.6899728775024414)
Last comment to be nitpicky, your merge_sort
has a quite unusual behavior as it returns a new list object sometimes (when the original list was longer than 1) and the original list sometimes. It could be cool to be consistent and always return a new list : return list(l)
.