Python Pandas - Indexing with MultiIndex



Indexing with MultiIndex refers to accessing and selecting data in a Pandas DataFrame whose index has multiple levels. Unlike a standard DataFrame, which has a single-level index, a MultiIndexed DataFrame uses hierarchical indexing, in which rows, columns, or both are labeled with multiple keys.

This type of indexing is useful for handling structured datasets, making it easier to perform operations like grouping, slicing, and advanced selections. Instead of using a single label or position-based indexing, you can use tuples of labels to access data at different levels.

In this tutorial, you will learn how to use MultiIndex for advanced indexing and selection, including slicing and Boolean indexing.

Basic Indexing with MultiIndex

Indexing a MultiIndexed DataFrame works much as it does for a single-index DataFrame, except that you can also pass a tuple of labels to select by several levels at once.

Example

Here is a basic example of selecting a subset of data by passing a first-level label to the .loc[] indexer.

    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the input DataFrame
    print('Original MultiIndexed DataFrame:')
    print(df)

    # Select all rows whose first-level label is 'A'
    print('Selected Subset:')
    print(df.loc['A'])

Following is the output of the above code −

    Original MultiIndexed DataFrame:
           X  Y
    A one  1  2
      two  3  4
    B one  5  6
      two  7  8
    Selected Subset:
         X  Y
    one  1  2
    two  3  4
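
As a small aside on the selection above, the following sketch contrasts a scalar label with a one-element list of labels. It assumes the same four-row DataFrame as the example; the variable names dropped and kept are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')]
)
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=index, columns=['X', 'Y'])

# A scalar label selects the 'A' group and drops the first index level
dropped = df.loc['A']
print(dropped.index.nlevels)

# A list of labels selects the same rows but keeps both index levels
kept = df.loc[['A']]
print(kept.index.nlevels)
```

The list form can be convenient when later steps still need the full hierarchical index.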

Example

The next example selects a single row from a MultiIndexed DataFrame by passing a tuple with one label per level to the .loc[] indexer.

    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the input DataFrame
    print('Original MultiIndexed DataFrame:')
    print(df)

    # Index the data based on the tuple of level labels
    print('Selected Subset:')
    print(df.loc[('B', 'one')])

Following is the output of the above code −

    Original MultiIndexed DataFrame:
           X  Y
    A one  1  2
      two  3  4
    B one  5  6
      two  7  8
    Selected Subset:
    X    5
    Y    6
    Name: (B, one), dtype: int64
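
The tuple selection above returns a single row as a Series. The sketch below, which assumes the same DataFrame, shows that a list of such tuples returns the matching rows as a DataFrame instead. The names row and rows are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')]
)
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=index, columns=['X', 'Y'])

# A single tuple of labels returns one row as a Series
row = df.loc[('B', 'one')]
print(type(row).__name__)

# A list of tuples returns every listed row as a DataFrame
rows = df.loc[[('A', 'two'), ('B', 'one')]]
print(rows)
```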

Advanced Indexing with MultiIndexed Data

Advanced indexing with a MultiIndexed DataFrame is also done with the .loc[] indexer. By combining a row key (a label or a tuple of labels) with a column key, you can express more specific selections, down to a single element.

Example

The following example selects a single value from a MultiIndexed DataFrame by giving the .loc[] indexer both a row tuple and a column label.

    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the input DataFrame
    print('Original MultiIndexed DataFrame:')
    print(df)

    # Select a specific element by row tuple and column label
    print('Selected data:')
    print(df.loc[('A', 'two'), 'Y'])

Following is the output of the above code −

    Original MultiIndexed DataFrame:
           X  Y
    A one  1  2
      two  3  4
    B one  5  6
      two  7  8
    Selected data:
    4
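
The same row-key and column-key pattern extends beyond a single element. The sketch below, again assuming the four-row DataFrame from the example, pairs a row tuple with a list of columns, and then pairs a row key that mixes a list with a scalar with a single column. The names pair and col are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')]
)
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=index, columns=['X', 'Y'])

# One row (given as a tuple) and several columns: the result is a Series
pair = df.loc[('A', 'two'), ['X', 'Y']]
print(pair)

# Rows whose first level is 'A' or 'B' and whose second level is 'two',
# restricted to column 'Y'
col = df.loc[(['A', 'B'], 'two'), 'Y']
print(col)
```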

Boolean Indexing with MultiIndex

Pandas MultiIndexed objects also support Boolean indexing for filtering data based on conditions. You build a Boolean mask, typically a Series that shares the DataFrame's index, and apply it to keep only the rows where the mask is True.

Example

The following example applies Boolean indexing to a MultiIndexed DataFrame to select the rows where 'X' is greater than 2.

    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the input DataFrame
    print('Original MultiIndexed DataFrame:')
    print(df)

    # Build a Boolean mask and use it to filter the rows
    print('Selected data:')
    mask = df['X'] > 2
    print(df[mask])

Following is the output of the above code −

    Original MultiIndexed DataFrame:
           X  Y
    A one  1  2
      two  3  4
    B one  5  6
      two  7  8
    Selected data:
           X  Y
    A two  3  4
    B one  5  6
      two  7  8
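
A mask does not have to come from a column. As a hedged sketch using the same DataFrame, the block below builds a mask from the index labels with get_level_values() and combines it with a column condition. The names in_b and combined are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')]
)
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=index, columns=['X', 'Y'])

# Mask built from the first index level: True where the label is 'B'
in_b = df.index.get_level_values(0) == 'B'
print(df[in_b])

# Combine a column condition with the index-level mask using &
combined = df[(df['X'] > 5) & in_b]
print(combined)
```

Each condition is wrapped in parentheses because & binds more tightly than comparison operators in Python.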

Slicing with MultiIndex

Slicing a MultiIndex works much like slicing a single-index DataFrame, but to select across several levels you pass a tuple that contains one selector (a label, a list of labels, or a slice) per level.

Example

This example selects rows from a MultiIndexed DataFrame by passing one list of labels per level, wrapped in a tuple, to the .loc[] indexer.

    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('A', 'three'),
                                       ('B', 'one'), ('B', 'two'), ('B', 'three')])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [1, 1], [5, 6], [7, 8], [2, 2]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the input DataFrame
    print('Original MultiIndexed DataFrame:')
    print(df)

    # Select rows whose first level is 'A' or 'B' and whose second level is 'one' or 'three'
    print('Sliced data:')
    print(df.loc[(['A', 'B'], ['one', 'three']), :])

Following is the output of the above code −

    Original MultiIndexed DataFrame:
             X  Y
    A one    1  2
      two    3  4
      three  1  1
    B one    5  6
      two    7  8
      three  2  2
    Sliced data:
             X  Y
    A one    1  2
      three  1  1
    B one    5  6
      three  2  2
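
For label ranges rather than explicit lists, pandas provides the pd.IndexSlice helper. The sketch below is a minimal illustration, assuming the same six-row DataFrame. It sorts the index first, because range slices on a MultiIndex generally require a lexsorted index. The names df_sorted, ones and ranged are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'one'), ('A', 'two'), ('A', 'three'),
     ('B', 'one'), ('B', 'two'), ('B', 'three')]
)
data = [[1, 2], [3, 4], [1, 1], [5, 6], [7, 8], [2, 2]]
df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

# Sort the index so that label-range slices are allowed;
# the second level is then ordered one, three, two
df_sorted = df.sort_index()

idx = pd.IndexSlice

# Every first-level label, but only 'one' on the second level
ones = df_sorted.loc[idx[:, 'one'], :]
print(ones)

# Label ranges on both levels (both endpoints are included)
ranged = df_sorted.loc[idx['A':'B', 'one':'three'], :]
print(ranged)
```

Because the sorted second level runs one, three, two, the range 'one':'three' covers only 'one' and 'three'.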