Python Pandas - Comparing Categorical Data



Comparing categorical data helps reveal the relationships between the categories in a dataset. Pandas supports the comparison operators (==, !=, >, >=, <, and <=) on categorical data in three main scenarios −

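  • Equality comparison (== and !=) with a list-like object (list, Series, or array) of the same length as the categorical data.

  • All comparisons (==, !=, >, >=, <, and <=) with another categorical Series, when both are ordered and share the same categories.

  • All comparisons of categorical data with a scalar value.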

Note that any non-equality comparison (>, >=, <, or <=) between categorical data with different categories or orderings, or between a categorical Series and a list-like object that is not categorical, raises a TypeError. This is because a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not.

In this tutorial, we will learn how to compare categorical data in Pandas using the comparison operators ==, !=, >, >=, <, and <=.

Equality comparisons of Categorical Data

In Pandas, categorical data can be compared for equality with list-like objects such as lists, arrays, or Series of the same length as the categorical data.

Example

The following example demonstrates equality and inequality comparisons between a categorical Series and list-like objects.

    import pandas as pd
    from pandas.api.types import CategoricalDtype
    import numpy as np

    # Creating a categorical Series
    s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

    # Creating another categorical Series for comparison
    s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

    # Equality comparison
    print("Equality comparison (s == s2):")
    print(s == s2)

    print("\nInequality comparison (s != s2):")
    print(s != s2)

    # Equality comparison with a NumPy array
    print("\nEquality comparison with NumPy array:")
    print(s == np.array([1, 2, 3, 1, 2, 3, 2, 1]))

Following is the output of the above code −

    Equality comparison (s == s2):
    0    False
    1     True
    2    False
    3     True
    4    False
    5     True
    6    False
    7     True
    dtype: bool

    Inequality comparison (s != s2):
    0     True
    1    False
    2     True
    3    False
    4     True
    5    False
    6     True
    7    False
    dtype: bool

    Equality comparison with NumPy array:
    0     True
    1     True
    2    False
    3     True
    4     True
    5     True
    6    False
    7    False
    dtype: bool
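Equality operators accept non-categorical list-likes, but the ordering operators do not. The following is a minimal sketch of that difference. The names cat and base are illustrative and are not part of the example above. It also shows one explicit workaround: converting the categorical back to its raw values with np.asarray, which then compares by numeric value and ignores the category order.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical names, not taken from the example above)
cat = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))
base = np.array([2, 2, 2])

# Equality against a non-categorical array is allowed (element-wise)
equal_mask = cat == base
print(equal_mask)

# Ordering operators against a non-categorical list-like raise TypeError
try:
    cat > base
    raised = False
except TypeError as e:
    raised = True
    print("TypeError:", e)

# Being explicit: compare the underlying values, ignoring the category order
value_mask = np.asarray(cat) > base
print(value_mask)
```

Because np.asarray discards the categorical dtype, the last comparison uses ordinary numeric ordering (1 < 2 < 3), not the custom order 3 < 2 < 1.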

All Comparisons of Categorical Data

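Pandas supports the ordering comparisons (>, >=, <, and <=) between ordered categorical data that share the same categories. The result follows the category order, not the numeric value: with categories [3, 2, 1], the order is 3 < 2 < 1.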

Example

This example demonstrates the ordering comparisons (>, >=, <, and <=) on ordered categorical data.

    import pandas as pd
    from pandas.api.types import CategoricalDtype

    # Creating a categorical Series
    s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

    # Creating another categorical Series for comparison
    s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

    # Greater than comparison
    print("Greater than comparison:")
    print(s > s2)

    # Less than comparison
    print("\nLess than comparison:")
    print(s < s2)

    # Greater than or equal to comparison
    print("\nGreater than or equal to comparison:")
    print(s >= s2)

    # Less than or equal to comparison
    print("\nLess than or equal to comparison:")
    print(s <= s2)

Following is the output of the above code −

    Greater than comparison:
    0     True
    1    False
    2     True
    3    False
    4    False
    5    False
    6     True
    7    False
    dtype: bool

    Less than comparison:
    0    False
    1    False
    2    False
    3    False
    4     True
    5    False
    6    False
    7    False
    dtype: bool

    Greater than or equal to comparison:
    0     True
    1     True
    2     True
    3     True
    4    False
    5     True
    6     True
    7     True
    dtype: bool

    Less than or equal to comparison:
    0    False
    1     True
    2    False
    3     True
    4     True
    5     True
    6    False
    7     True
    dtype: bool
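The ordering operators require ordered categoricals. The sketch below uses illustrative data (the names levels, a, and b are hypothetical) to show that an unordered categorical raises a TypeError on >, and that marking both sides as ordered with cat.as_ordered() makes the comparison valid.

```python
import pandas as pd

# Illustrative categories, listed from lowest to highest
levels = ["low", "medium", "high"]
unordered = pd.CategoricalDtype(levels, ordered=False)

a = pd.Series(["low", "high", "medium"]).astype(unordered)
b = pd.Series(["medium", "medium", "medium"]).astype(unordered)

# Unordered categoricals support only == and !=
try:
    a > b
    raised = False
except TypeError as e:
    raised = True
    print("TypeError:", e)

# as_ordered() keeps the existing category order: low < medium < high
a_ord = a.cat.as_ordered()
b_ord = b.cat.as_ordered()
result = a_ord > b_ord
print(result)
```

Only "high" ranks above "medium" in the declared order, so only the second element of the result is True.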

Comparing Categorical Data to Scalars

Categorical data can also be compared to scalar values using all comparison operators (==, !=, >, >=, <, and <=). The categorical values are compared to the scalar based on the order of their categories.

Example

The following example demonstrates how categorical data can be compared to a scalar value.

    import pandas as pd

    # Creating a categorical Series
    s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

    # Compare to a scalar
    print("Comparing categorical data to a scalar:")
    print(s > 2)

Following is the output of the above code −

    Comparing categorical data to a scalar:
    0     True
    1    False
    2    False
    dtype: bool
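The scalar in an ordering comparison needs to be one of the categories. As a short sketch, using the scalar 4 as an illustrative value that is not among the categories [3, 2, 1]: equality checks simply report no match, while an ordering comparison raises a TypeError.

```python
import pandas as pd

s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

# 4 is not a category: == is False everywhere, != is True everywhere
eq_result = s == 4
ne_result = s != 4
print(eq_result)
print(ne_result)

# An ordering comparison has no position for 4 in the category order
try:
    s > 4
    raised = False
except TypeError as e:
    raised = True
    print("TypeError:", e)
```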

Comparing Categorical Data with Different Categories

Comparing two categorical Series that have different categories or orderings raises a TypeError.

Example

The following example demonstrates handling the TypeError raised when comparing two categorical Series whose categories are ordered differently.

    import pandas as pd
    from pandas.api.types import CategoricalDtype

    # Creating a categorical Series with the explicit order 3 < 2 < 1
    s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

    # Creating another categorical Series; its categories are inferred as 1 < 2 < 3
    s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))

    try:
        print("Attempting to compare differently ordered two Series objects:")
        print(s > s3)
    except TypeError as e:
        print("TypeError:", str(e))

Following is the output of the above code −

    Attempting to compare differently ordered two Series objects:
    TypeError: Categoricals can only be compared if 'categories' are the same.
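One possible way to make such a comparison valid is to recast one Series to the other's dtype so that both share the same categories and order. The sketch below assumes that the order 3 < 2 < 1 from s is the one intended for both Series. The name s3_aligned is illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))

# Recast s3 to the exact dtype of s (same categories, same order).
# Assumption: the order 3 < 2 < 1 is the one that should apply to both.
s3_aligned = s3.astype(s.dtype)

result = s > s3_aligned
print(result)
```

Recasting keeps the values of s3 but changes how they rank against each other, so this only makes sense when both Series are meant to follow the same ordering.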