Python Pandas - Histograms



A histogram is a graphical representation of the distribution of a dataset. It helps you visualize how often values fall within defined intervals, called bins. A histogram looks similar to a bar plot, but the two differ: a histogram represents the distribution of numerical data grouped into ranges (bins), whereas a bar plot represents categorical data, with each bar corresponding to a specific category.
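To make the idea of bins concrete, the following minimal sketch counts values per interval by hand, without drawing anything. The scores and the bin edges are made-up illustration data, and pd.cut() is used here only to show the counting step that a histogram performs.

```python
import pandas as pd

# Hypothetical sample data: eight exam scores (not from the tutorial)
scores = pd.Series([52, 58, 61, 67, 73, 75, 88, 94], name="score")

# Three explicit bins: (50, 65], (65, 80], (80, 95]
edges = [50, 65, 80, 95]
binned = pd.cut(scores, bins=edges)

# The number of values in each bin is the height of the matching bar
counts = binned.value_counts(sort=False)
print(counts)
```

Each count corresponds to one bar of the histogram; changing the edges changes the shape of the plot even though the data stay the same.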

In this tutorial, we will learn how to create and customize histograms using the Pandas library with different examples.

Creating Histograms in Pandas

In Pandas, histograms can be created using the plot.hist() method for both Series and DataFrame objects. This method returns a Matplotlib Axes object containing the histogram plot.

  • DataFrame.plot.hist(): Creates a histogram for one or more columns in a DataFrame.

  • Series.plot.hist(): Creates a histogram for a specific column or Series.
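The difference between the two entry points can be sketched side by side. This example assumes seeded random data and a non-interactive Matplotlib backend so that it runs without a display; the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 2)), columns=["a", "b"])

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(9, 4))

# Series.plot.hist(): histogram of a single column
ax_single = df["a"].plot.hist(ax=ax_left, bins=10, title="Column a only")

# DataFrame.plot.hist(): every numeric column drawn on the same axes
ax_all = df.plot.hist(ax=ax_right, bins=10, alpha=0.5, title="Columns a and b")

plt.close(fig)
```

With 10 bins, the Series plot holds 10 bars, while the DataFrame plot holds 10 bars per column on shared axes.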

Syntax

Following is the syntax of the plot.hist() method −

 DataFrame.plot.hist(by=None, bins=10, **kwargs) 

Where,

  • by: The column in the DataFrame to group by. A separate histogram is drawn for each group.

  • bins: The number of bins to use for the histogram. The default value is 10.

  • **kwargs: Additional arguments to customize the plot.
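As a rough illustration of these parameters, the sketch below draws the same data twice: once with the default bins, and once with more bins plus styling keywords. The alpha and edgecolor keywords are assumed to be forwarded to Matplotlib through **kwargs, and the data are seeded random values chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 2)), columns=["a", "b"])

fig, (ax_default, ax_custom) = plt.subplots(1, 2, figsize=(9, 4))

# Default settings: 10 bins per column
df.plot.hist(ax=ax_default, title="bins=10 (default)")

# More bins, with styling keywords passed through **kwargs
df.plot.hist(ax=ax_custom, bins=30, alpha=0.5, edgecolor="black",
             title="bins=30, alpha=0.5")

# Count the drawn bars: bins multiplied by the number of columns
bars_default = len(ax_default.patches)
bars_custom = len(ax_custom.patches)
plt.close(fig)
```

More bins reveal finer detail but make each bar noisier, so the bin count is usually worth tuning for the dataset at hand.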

Example

Here is a basic example of creating a histogram for a DataFrame using the plot.hist() method.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    plt.rcParams["figure.figsize"] = [7, 4]

    # Create a DataFrame with random data
    df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])

    # Plot histogram
    ax = df.plot.hist()
    plt.title("Simple Histogram")
    plt.show()

Following is the output of the above code −

Simple Histogram

Plotting a Stacked Histogram

A stacked histogram displays multiple numerical columns stacked on top of each other. This can be done by using the stacked=True parameter.

Example

This example creates a stacked histogram for a DataFrame using the stacked=True parameter.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    plt.rcParams["figure.figsize"] = [7, 4]

    # Create a DataFrame with random data
    df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])

    # Plot the stacked histogram
    df.plot.hist(stacked=True, bins=20, alpha=0.7, title="Stacked Histogram")
    plt.show()

On executing the above code we will get the following output −

Stacked Histogram

Creating Horizontal Histograms

To create a horizontal histogram, pass the orientation='horizontal' parameter to the plot.hist() method.

Example

This example creates a horizontal histogram for a DataFrame using the orientation='horizontal' parameter.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    plt.rcParams["figure.figsize"] = [7, 4]

    # Create a DataFrame with random data
    df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])

    # Plot the horizontal histogram
    df.plot.hist(orientation='horizontal', bins=20, alpha=0.7, title="Horizontal Histogram")
    plt.show()

Following is the output of the above code −

Horizontal Histogram

Plotting the Cumulative Histogram

A cumulative histogram shows the cumulative frequency distribution: each bar counts all values up to and including its bin. To plot one, set the cumulative parameter to True.

Example

This example demonstrates plotting a cumulative histogram for a DataFrame using the cumulative=True parameter of the plot.hist() method.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    plt.rcParams["figure.figsize"] = [7, 4]

    # Create a DataFrame with random data
    df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])

    # Plot the cumulative histogram
    df.plot.hist(cumulative=True, bins=20, alpha=0.7, title="Cumulative Histogram")
    plt.show()

On executing the above code we will get the following output −

Cumulative Histogram
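One way to sanity-check a cumulative histogram is to read the bar heights back from the returned axes. The sketch below assumes a single Series of 50 seeded random values and a headless backend; it is an illustration, not part of the Pandas API itself.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
values = pd.Series(rng.random(50), name="a")

fig, ax = plt.subplots()
values.plot.hist(ax=ax, bins=5, cumulative=True, title="Cumulative check")

# Each bar is a Matplotlib Rectangle; its height is the running count
heights = [patch.get_height() for patch in ax.patches]
print(heights)
plt.close(fig)
```

The heights never decrease from left to right, and the last bar equals the total number of observations.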

Subplots for Histograms

You can draw the histogram of each DataFrame column in its own subplot by calling the DataFrame.hist() method directly.

Example

This example creates a separate histogram subplot for each DataFrame column using the DataFrame.hist() method.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    plt.rcParams["figure.figsize"] = [7, 4]

    # Create a DataFrame with random data
    df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])

    # Subplots for each column
    df.hist(color='lightgreen', bins=20)
    plt.suptitle("Histograms into Subplots")
    plt.show()

Following is the output of the above code −

Histograms into Subplots
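DataFrame.hist() returns an array of Matplotlib Axes, so each subplot can be adjusted individually afterwards. The following sketch assumes seeded random data and a headless backend, and the axis label it sets is purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.random((100, 2)), columns=["a", "b"])

# One subplot per column; the return value is an array of Axes
axes = df.hist(bins=20, figsize=(7, 4))

# Each subplot is titled with its column name
titles = [ax.get_title() for ax in axes.ravel()]

# Customize a single subplot through its Axes object
axes.ravel()[0].set_xlabel("value of a")
plt.close("all")
```

Working with the returned array is convenient when only some subplots need extra labels or limits.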

Grouped Histograms

Grouped histograms allow you to visualize data distribution by specific categories. We can use the by parameter to create histograms grouped by a column.

Example

This example creates a grouped histogram for DataFrame columns using the by parameter.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    plt.rcParams["figure.figsize"] = [7, 4]

    # Create a DataFrame with random data
    x = ['A']*30 + ['B']*70
    y = np.random.randn(100)
    df = pd.DataFrame({'Letter': x, 'Numbers': y})

    # Plot the grouped histogram
    df.plot.hist(by='Letter', bins=20, alpha=0.7, title="Grouped Histogram")
    plt.show()

Following is the output of the above code −

Grouped Histograms
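Grouping with the by parameter of plot.hist() depends on the installed Pandas version (the Pandas documentation notes a behavior change in 1.4.0). As an alternative sketch, DataFrame.hist() also accepts column and by arguments. The data below mirror the example above but use a seeded generator, and the group sizes are computed separately to show what each subplot contains.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
x = ["A"] * 30 + ["B"] * 70
df = pd.DataFrame({"Letter": x, "Numbers": rng.normal(size=100)})

# Number of observations that feed each group's histogram
group_sizes = df.groupby("Letter")["Numbers"].count()
print(group_sizes)

# One subplot per value of 'Letter'
axes = df.hist(column="Numbers", by="Letter", bins=20, figsize=(7, 4))
plt.close("all")
```

Because group A has 30 values and group B has 70, the two subplots are not directly comparable in height unless the counts are normalized.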