
Python Pandas - Comparing Categorical Data
Comparing categorical data is an essential task for getting insights and understanding the relationships between different categories of the data. In Python, Pandas provides various ways to perform comparisons using comparison operators (==, !=, >, >=, <, and <=) on categorical data. These comparisons can be made in three main scenarios −
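- Equality comparison (== and !=) with a list-like object (list, array, or Series) of the same length as the categorical data.
- All comparisons (==, !=, >, >=, <, and <=) with another categorical Series, when both are ordered and have the same categories.
- All comparisons of categorical data with a scalar value.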
Note that any non-equality comparison (>, >=, <, <=) between categorical data with different categories or ordering, or between a categorical Series and a list-like object that is not categorical, raises a TypeError. This is because a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not.
In this tutorial, we will learn how to compare categorical data with the Pandas library using the comparison operators ==, !=, >, >=, <, and <=.
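As a minimal sketch of the rule above, the snippet below compares an ordered categorical Series with a plain list. The variable names (`base`, `eq_result`, `raised`, `plain_result`) are illustrative only. The `np.asarray` conversion is one explicit way to compare the underlying values when the category order should be ignored.

```python
import numpy as np
import pandas as pd

# Ordered categorical Series; the category order is 3 < 2 < 1
s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))
base = [2, 2, 2]

# Equality against a plain list of the same length is allowed
eq_result = (s == base).tolist()

# An ordering comparison against a plain list raises TypeError
try:
    s > base
    raised = False
except TypeError:
    raised = True

# Converting to a NumPy array compares the raw values and ignores the category order
plain_result = (np.asarray(s) > np.array(base)).tolist()

print(eq_result)
print(raised)
print(plain_result)
```

Note that the last comparison uses ordinary integer ordering (1 < 2 < 3), not the custom order defined by the categories.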
Equality comparisons of Categorical Data
In Pandas, comparing categorical data for equality is possible with a variety of objects such as lists, arrays, or Series objects of the same length as the categorical data.
Example
The following example demonstrates how to perform equality and inequality comparisons between a categorical Series and list-like objects.
import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np

# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Creating another categorical Series for comparison
s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Equality comparison
print("Equality comparison (s == s2):")
print(s == s2)

print("\nInequality comparison (s != s2):")
print(s != s2)

# Equality comparison with a NumPy array
print("\nEquality comparison with NumPy array:")
print(s == np.array([1, 2, 3, 1, 2, 3, 2, 1]))
Following is the output of the above code −
Equality comparison (s == s2):
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
dtype: bool

Inequality comparison (s != s2):
0     True
1    False
2     True
3    False
4     True
5    False
6     True
7    False
dtype: bool

Equality comparison with NumPy array:
0     True
1     True
2    False
3     True
4     True
5     True
6    False
7    False
dtype: bool
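Equality also works between two unordered categoricals that list the same categories in a different order, because the listing order is not considered in that case. The following is a small sketch of that behavior; the names `c1`, `c2`, and `unordered_equal` are illustrative.

```python
import pandas as pd

# Two unordered categoricals with the same categories listed in a different order
c1 = pd.Categorical(["a", "b"], categories=["a", "b"], ordered=False)
c2 = pd.Categorical(["a", "b"], categories=["b", "a"], ordered=False)

# The comparison is made on the values, not on the internal category positions
unordered_equal = (c1 == c2).tolist()
print(unordered_equal)
```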
All Comparisons of Categorical Data
Pandas allows you to perform ordering comparisons (>, >=, <, <=) between ordered categorical data that share the same categories.
Example
This example demonstrates how to perform non-equality comparisons (>, >=, <, <=) on ordered categorical data. Because the categories are declared as [3, 2, 1], the order used for comparison is 3 < 2 < 1.
import pandas as pd
from pandas.api.types import CategoricalDtype

# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Creating another categorical Series for comparison
s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Greater than comparison
print("Greater than comparison:\n", s > s2)

# Less than comparison
print("\nLess than comparison:\n", s < s2)

# Greater than or equal to comparison
print("\nGreater than or equal to comparison:\n", s >= s2)

# Less than or equal to comparison
print("\nLess than or equal to comparison:\n", s <= s2)
Following is the output of the above code −
Greater than comparison:
 0     True
1    False
2     True
3    False
4    False
5    False
6     True
7    False
dtype: bool

Less than comparison:
 0    False
1    False
2    False
3    False
4     True
5    False
6    False
7    False
dtype: bool

Greater than or equal to comparison:
 0     True
1     True
2     True
3     True
4    False
5     True
6     True
7     True
dtype: bool

Less than or equal to comparison:
 0    False
1     True
2    False
3     True
4     True
5     True
6    False
7     True
dtype: bool
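Ordering comparisons require the categorical data to be ordered. The sketch below, with made-up labels "low" and "high", shows an unordered categorical rejecting `>` and then working once an explicit order is declared with `cat.set_categories`.

```python
import pandas as pd

# Unordered categorical Series (ordered=False is the default)
u = pd.Series(["low", "high", "low"], dtype="category")

# Ordering operators are not defined for unordered categoricals
try:
    u > "low"
    unordered_raised = False
except TypeError:
    unordered_raised = True

# Declare an explicit order: low < high
o = u.cat.set_categories(["low", "high"], ordered=True)
ordered_result = (o > "low").tolist()

print(unordered_raised)
print(ordered_result)
```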
Comparing Categorical Data to Scalars
Categorical data can also be compared to scalar values using all comparison operators (==, !=, >, >=, <, and <=). The categorical values are compared to the scalar based on the order of their categories.
Example
The following example demonstrates how categorical data can be compared to a scalar value.
import pandas as pd

# Creating a categorical Series
s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

# Compare to a scalar
print("Comparing categorical data to a scalar:")
print(s > 2)
Following is the output of the above code −
Comparing categorical data to a scalar:
0     True
1    False
2    False
dtype: bool
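The scalar is expected to be one of the categories. As a hedged illustration (the value 5 is an arbitrary scalar that is not a category), equality against such a value simply matches nothing, while an ordering comparison raises a TypeError.

```python
import pandas as pd

s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

# Equality with a scalar outside the categories matches no element
eq_missing = (s == 5).tolist()
ne_missing = (s != 5).tolist()

# An ordering comparison with a scalar outside the categories is rejected
try:
    s > 5
    missing_raised = False
except TypeError:
    missing_raised = True

print(eq_missing, ne_missing, missing_raised)
```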
Comparing Categorical Data with Different Categories
Comparing two categorical Series that have different categories, or the same categories in a different order when ordered, raises a TypeError.
Example
The following example catches the TypeError raised when comparing two ordered categorical Series whose category orders differ. Here s uses the order 3 < 2 < 1, while s3 infers its categories from the data and therefore uses 1 < 2 < 3.
import pandas as pd
from pandas.api.types import CategoricalDtype

# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Creating another categorical Series for comparison
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))

try:
    print("Attempting to compare differently ordered two Series objects:")
    print(s > s3)
except TypeError as e:
    print("TypeError:", str(e))
Following is the output of the above code −
Attempting to compare differently ordered two Series objects:
TypeError: Categoricals can only be compared if 'categories' are the same.
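One possible way to avoid this error, sketched below, is to cast the second Series to the dtype of the first so that both share the same categories and order. Using `astype(s.dtype)` here is a choice made for this illustration and not the only option. Values missing from the target categories would become NaN, which does not happen with this data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))

# Re-express s3 with the same categories and order as s (3 < 2 < 1)
s3_aligned = s3.astype(s.dtype)

# The comparison now uses the shared category order
aligned_result = (s > s3_aligned).tolist()
print(aligned_result)
```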