Python Pandas - Sorting a MultiIndex



Sorting a MultiIndex in Pandas organizes hierarchical datasets efficiently. A MultiIndex, also known as a hierarchical index, provides multiple levels of indexing on Pandas data structures such as DataFrame or Series objects. Each level of a MultiIndexed object can be sorted independently, which enables efficient slicing, indexing, filtering, and retrieval operations on your data.

Below are the key methods to sort MultiIndexed objects in Pandas −

  • sort_index(): Sort object by labels.

  • sortlevel(): Used for sorting the MultiIndexed object at a specific level.

  • sort_values(): Used to get a sorted copy of the index.

In this tutorial, we will learn how to sort MultiIndexed objects in Pandas using these methods with different approaches.

Sorting MultiIndex Using sort_index()

The Pandas DataFrame.sort_index() method is used to sort a MultiIndex by all levels. Sorting a MultiIndex object can be useful for efficient indexing and slicing of the data.

Example

Here is a basic example of using the df.sort_index() method to sort a MultiIndex by all levels. This sorts the data according to both levels of the MultiIndex.

    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples(
        [('A', 'one'), ('A', 'two'), ('A', 'three'),
         ('B', 'one'), ('B', 'two'), ('B', 'three')],
        names=["level0", "level1"])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [1, 1], [5, 6], [7, 8], [2, 2]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the input DataFrame
    print('Original MultiIndexed DataFrame:\n', df)

    # Sort MultiIndex with default levels
    sorted_df = df.sort_index()
    print("Resultant DataFrame:")
    print(sorted_df)

Following is the output of the above code −

    Original MultiIndexed DataFrame:
                   X  Y
    level0 level1
    A      one     1  2
           two     3  4
           three   1  1
    B      one     5  6
           two     7  8
           three   2  2
    Resultant DataFrame:
                   X  Y
    level0 level1
    A      one     1  2
           three   1  1
           two     3  4
    B      one     5  6
           three   2  2
           two     7  8
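
The link between sorting and efficient slicing can be checked directly. The following is a minimal sketch, assuming a small unsorted index built only for this illustration: it inspects the index's is_monotonic_increasing property before and after sort_index(), and shows that a slice between two tuples is expected to work only once the index is sorted. The variable names (unsorted_flag, slice_failed, sliced) are hypothetical.

```python
import pandas as pd

# Hypothetical unsorted MultiIndex, built only for this illustration
index = pd.MultiIndex.from_tuples(
    [('B', 'two'), ('A', 'one'), ('B', 'one'), ('A', 'two')],
    names=["level0", "level1"],
)
df = pd.DataFrame({'X': [1, 2, 3, 4]}, index=index)

# Check whether the index is already sorted
unsorted_flag = df.index.is_monotonic_increasing

# Slicing between two tuples on the unsorted index is expected to fail
try:
    df.loc[('A', 'one'):('B', 'one')]
    slice_failed = False
except KeyError:  # pandas' UnsortedIndexError is a KeyError subclass
    slice_failed = True

# After sorting, the same slice can be evaluated
sorted_df = df.sort_index()
sorted_flag = sorted_df.index.is_monotonic_increasing
sliced = sorted_df.loc[('A', 'one'):('B', 'one')]

print("Sorted before sort_index():", unsorted_flag)
print("Slice failed on unsorted index:", slice_failed)
print("Sorted after sort_index():", sorted_flag)
print(sliced)
```

Checking is_monotonic_increasing first is a cheap way to decide whether a call to sort_index() is needed at all.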

Sorting MultiIndex by Specific Level

If you want to sort by a specific level of the MultiIndex, you can use the level parameter of the df.sort_index() method.

Example

The following example sorts a MultiIndex by its first level (i.e., level=0).

    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples(
        [('C', 'one'), ('C', 'two'), ('B', 'one'), ('B', 'two')])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the input DataFrame
    print('Original MultiIndexed DataFrame:\n', df)

    # Sort MultiIndex by the first level
    sorted_df = df.sort_index(level=0)
    print("Resultant DataFrame:")
    print(sorted_df)

Following is the output of the above code −

    Original MultiIndexed DataFrame:
           X  Y
    C one  1  2
      two  3  4
    B one  5  6
      two  7  8
    Resultant DataFrame:
           X  Y
    B one  5  6
      two  7  8
    C one  1  2
      two  3  4
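
The level parameter can also take a list of levels, and the ascending parameter of sort_index() can then be given as a list with one flag per level. The following is a minimal sketch of that combination, reusing the index from the example above; the variable name mixed_df is only illustrative.

```python
import pandas as pd

# Same MultiIndex and data as in the example above
index = pd.MultiIndex.from_tuples(
    [('C', 'one'), ('C', 'two'), ('B', 'one'), ('B', 'two')])
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                  index=index, columns=['X', 'Y'])

# Sort the first level ascending and the second level descending
mixed_df = df.sort_index(level=[0, 1], ascending=[True, False])
print(mixed_df)
```

Passing one flag per level is useful when the outer groups should stay in alphabetical order while the entries inside each group are listed in reverse.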

Sorting MultiIndex by Level Names

Similar to the above approach, you can also sort the MultiIndex by level name instead of numerical position, using the level parameter of the df.sort_index() method.

Example

This example sorts the MultiIndex by passing a level name to the level parameter of the sort_index() method.

    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples(
        [('D', 'z'), ('D', 'x'), ('D', 'y'),
         ('B', 't'), ('B', 's'), ('B', 'v')],
        names=["level0", "level1"])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [1, 1], [5, 6], [7, 8], [2, 2]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the input DataFrame
    print('Original MultiIndexed DataFrame:\n', df)

    # Sort by the level name
    sorted_df = df.sort_index(level='level1')
    print("Resultant DataFrame:")
    print(sorted_df)

Following is the output of the above code −

    Original MultiIndexed DataFrame:
                   X  Y
    level0 level1
    D      z       1  2
           x       3  4
           y       1  1
    B      t       5  6
           s       7  8
           v       2  2
    Resultant DataFrame:
                   X  Y
    level0 level1
    B      s       7  8
           t       5  6
           v       2  2
    D      x       3  4
           y       1  1
           z       1  2
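
In the output above, the rows happen to stay grouped by the first level because every label under 'B' sorts before every label under 'D'. The following is a minimal sketch, assuming a hypothetical index whose second-level labels overlap across groups, which shows that the named level acts as the primary sort key and the remaining level only breaks ties.

```python
import pandas as pd

# Hypothetical index where the second-level labels overlap across groups
index = pd.MultiIndex.from_tuples(
    [('D', 'x'), ('B', 'y'), ('D', 'y'), ('B', 'x')],
    names=["level0", "level1"],
)
df = pd.DataFrame({'X': [1, 2, 3, 4]}, index=index)

# 'level1' becomes the primary sort key; 'level0' breaks ties
by_name = df.sort_index(level='level1')
print(by_name)
```

Here the rows are ordered by 'level1' first, so entries from the 'B' and 'D' groups are interleaved in the result.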

Sorting MultiIndex at Specific Levels with sortlevel()

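By using the MultiIndex.sortlevel() method you can also sort a MultiIndex at a specific level. The method returns a tuple containing the sorted MultiIndex and an array of the original positions of the sorted entries.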

Example

The following example sorts a MultiIndex object by using the MultiIndex.sortlevel() method.

    import pandas as pd

    # Create arrays
    arrays = [[2, 4, 3, 1], ['Peter', 'Chris', 'Andy', 'Jacob']]

    # The from_arrays() is used to create a MultiIndex
    multiIndex = pd.MultiIndex.from_arrays(arrays, names=('ranks', 'student'))

    # display the MultiIndex
    print("The Multi-index...\n", multiIndex)

    # get the levels in MultiIndex
    print("\nThe levels in Multi-index...\n", multiIndex.levels)

    # Sort MultiIndex
    # The specific level to sort is set as a parameter i.e. level 1 here
    print("\nSort MultiIndex at the requested level...\n", multiIndex.sortlevel(1))

Following is the output of the above code −

    The Multi-index...
    MultiIndex([(2, 'Peter'), (4, 'Chris'), (3, 'Andy'), (1, 'Jacob')],
               names=['ranks', 'student'])

    The levels in Multi-index...
    [[1, 2, 3, 4], ['Andy', 'Chris', 'Jacob', 'Peter']]

    Sort MultiIndex at the requested level...
    (MultiIndex([(3, 'Andy'), (4, 'Chris'), (1, 'Jacob'), (2, 'Peter')],
                names=['ranks', 'student']), array([2, 1, 3, 0]))
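
Because sortlevel() returns both the sorted index and the positions of the sorted entries, the result can be unpacked and the positions reused to reorder data that was aligned with the original index. The following is a minimal sketch of that pattern; the scores Series and the variable names sorted_index, indexer, and reordered are hypothetical and introduced only for illustration. It also passes the level name 'student' instead of the position 1.

```python
import pandas as pd

arrays = [[2, 4, 3, 1], ['Peter', 'Chris', 'Andy', 'Jacob']]
multiIndex = pd.MultiIndex.from_arrays(arrays, names=('ranks', 'student'))

# sortlevel() returns a (sorted index, indexer) pair
sorted_index, indexer = multiIndex.sortlevel('student')

# Hypothetical scores, aligned with the original (unsorted) index
scores = pd.Series([10, 20, 30, 40])

# Reorder the scores so they follow the sorted index
reordered = scores.iloc[indexer].tolist()

print(sorted_index)
print(indexer)
print(reordered)
```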

Sorting MultiIndex Using sort_values()

The MultiIndex.sort_values() method returns a sorted copy of the index and leaves the original index unchanged.

Example

The following example demonstrates how to sort the MultiIndex object using the sort_values() method.

    import pandas as pd

    # Create arrays
    arrays = [[2, 4, 3, 1], ['Peter', 'Chris', 'Andy', 'Jacob']]

    # The from_arrays() is used to create a MultiIndex
    multiIndex = pd.MultiIndex.from_arrays(arrays, names=('ranks', 'student'))

    # display the MultiIndex
    print("The Multi-index...\n", multiIndex)

    # Sort MultiIndex using the sort_values() method
    print("\nSort MultiIndex...\n", multiIndex.sort_values())

Following is the output of the above code −

    The Multi-index...
    MultiIndex([(2, 'Peter'), (4, 'Chris'), (3, 'Andy'), (1, 'Jacob')],
               names=['ranks', 'student'])

    Sort MultiIndex...
    MultiIndex([(1, 'Jacob'), (2, 'Peter'), (3, 'Andy'), (4, 'Chris')],
               names=['ranks', 'student'])
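
The sort_values() method also accepts an ascending parameter. The following is a minimal sketch, reusing the same arrays as above, that sorts the MultiIndex in descending order; the variable name descending is only illustrative.

```python
import pandas as pd

arrays = [[2, 4, 3, 1], ['Peter', 'Chris', 'Andy', 'Jacob']]
multiIndex = pd.MultiIndex.from_arrays(arrays, names=('ranks', 'student'))

# Sort the index tuples in descending order
descending = multiIndex.sort_values(ascending=False)
print(descending)

# The original index keeps its order
print(multiIndex)
```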