In statistics, a histogram is a representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in Plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...), which can be used to visualize data on categorical and date axes as well as linear axes.
Alternatives to histogram plots for visualizing distributions include violin plots, box plots, ECDF plots and strip charts.
If you're looking instead for bar charts, i.e. representing raw, unaggregated data with rectangular bars, go to the Bar Chart tutorial.
Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures.
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()

import plotly.express as px
df = px.data.tips()
# Here we use a column with categorical data
fig = px.histogram(df, x="day")
fig.show()
By default, the number of bins is chosen so that this number is comparable to the typical number of samples in a bin. This number can be customized, as well as the range of values.
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", nbins=20)
fig.show()
Plotly histograms will automatically bin date data in addition to numerical data:
import plotly.express as px
df = px.data.stocks()
fig = px.histogram(df, x="date")
fig.update_layout(bargap=0.2)
fig.show()
Plotly histograms will automatically bin numerical or date data but can also be used on raw categorical data, as in the following example, where the X-axis value is the categorical "day" variable:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="day", category_orders=dict(day=["Thur", "Fri", "Sat", "Sun"]))
fig.show()
Dash is the best way to build analytical apps in Python using Plotly figures. To run the app below, run pip install dash, click "Download" to get the code, and run python app.py.
Get started with the official Dash docs and learn how to effortlessly style & deploy apps like this with Dash Enterprise.
from IPython.display import IFrame
snippet_url = 'https://python-docs-dash-snippets.herokuapp.com/python-docs-dash-snippets/'
IFrame(snippet_url + 'histograms', width='100%', height=1200)
JavaScript calculates the y-axis (count) values on the fly in the browser, so they are not accessible in the fig object. You can calculate them manually using np.histogram.
import plotly.express as px
import numpy as np

df = px.data.tips()
# create the bins
counts, bins = np.histogram(df.total_bill, bins=range(0, 60, 5))
bins = 0.5 * (bins[:-1] + bins[1:])

fig = px.bar(x=bins, y=counts, labels={'x':'total_bill', 'y':'count'})
fig.show()
The default mode is to represent the count of samples in each bin. With the histnorm argument, it is also possible to represent the percentage or fraction of samples in each bin (histnorm='percent' or probability), a density histogram (the sum of all bar areas equals the total number of sample points, density), or a probability density histogram (the sum of all bar areas equals 1, probability density).
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", histnorm='probability density')
fig.show()
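To make the four normalization modes concrete, the following is a minimal sketch that reproduces their definitions with NumPy rather than with Plotly itself. The sample values and the bin width of 10 are hypothetical, chosen only so the arithmetic is easy to follow; Plotly's own binning may choose different edges.

```python
import numpy as np

# Hypothetical sample values, standing in for a column such as total_bill
values = np.array([3.0, 7.5, 8.0, 12.0, 13.5, 14.0, 14.5, 22.0, 27.0, 41.0])

# Explicit bin edges of width 10 (an assumption made for this sketch)
edges = np.arange(0, 60, 10)
counts, edges = np.histogram(values, bins=edges)
widths = np.diff(edges)
n = counts.sum()

# Bar heights under each normalization described above
normalized = {
    "count (default)": counts,
    "percent": 100 * counts / n,
    "probability": counts / n,
    "density": counts / widths,
    "probability density": counts / (n * widths),
}

for name, heights in normalized.items():
    print(f"{name:>20}: {np.round(heights, 3)}")
```

For the two density variants it is the bar areas (height times bin width), not the bar heights, that sum to the number of samples or to 1.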
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill",
                   title='Histogram of bills',
                   labels={'total_bill':'total bill'}, # can specify one label per df column
                   opacity=0.8,
                   log_y=True, # represent bars with log scale
                   color_discrete_sequence=['indianred'] # color of histogram bars
                   )
fig.show()

import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex")
fig.show()
For each bin of x, one can compute a function of the data using histfunc. The argument of histfunc is the dataframe column given as the y argument. The plot below shows that the average tip increases with the total bill.
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", histfunc='avg')
fig.show()
The default histfunc is sum if y is given, and works with categorical as well as binned numeric data on the x axis:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="day", y="total_bill", category_orders=dict(day=["Thur", "Fri", "Sat", "Sun"]))
fig.show()
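As a rough illustration of what histfunc computes, the sketch below bins a numeric x column and then takes the sum and the average of a y column within each bin, using NumPy only. The data and the bin edges are hypothetical, and this is not how Plotly performs the aggregation internally (that happens in the browser); it only mirrors the arithmetic.

```python
import numpy as np

# Hypothetical (total_bill, tip) pairs, not the real tips dataset
total_bill = np.array([8.0, 9.5, 12.0, 18.0, 24.0, 26.0, 31.0, 38.0])
tip = np.array([1.0, 2.0, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0])

# Assumed bin edges: [0, 10), [10, 20), [20, 30), [30, 40)
edges = np.array([0, 10, 20, 30, 40])
n_bins = len(edges) - 1

# Index of the bin each row falls into (0-based)
bin_index = np.digitize(total_bill, edges) - 1

counts = np.bincount(bin_index, minlength=n_bins)
tip_sum = np.bincount(bin_index, weights=tip, minlength=n_bins)  # analogous to histfunc="sum"
tip_avg = tip_sum / counts                                       # analogous to histfunc="avg"

print("sum per bin:", tip_sum)
print("avg per bin:", tip_avg)
```

Every bin in this sketch contains at least one row; with real data, empty bins would need handling before the division.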
New in v5.0
Histograms afford the use of patterns (also known as hatching or texture) in addition to color:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="sex", y="total_bill", color="sex", pattern_shape="smoker")
fig.show()
With the marginal keyword, a marginal is drawn alongside the histogram, visualizing the distribution. See the distplot page for more examples of combined statistical representations.
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex", marginal="rug", # can be `box`, `violin`
                   hover_data=df.columns)
fig.show()
New in v5.5
You can add text to histogram bars using the text_auto argument. Setting it to True will display the values on the bars, and setting it to a d3-format formatting string will control the output format.
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", histfunc="avg", nbins=8, text_auto=True)
fig.show()
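For readers unfamiliar with d3-format strings: the d3-format documentation describes its specifiers as modeled on Python's format specification mini-language, so Python's built-in format() can give a feel for what a specifier does. The sketch below is only an analogy under that assumption. It does not call Plotly, and not every d3 specifier has a Python counterpart (the SI-prefix type s, for instance, does not).

```python
# Python's built-in format() accepts specifiers of the same general shape
# as many d3-format strings; the values here are arbitrary illustrations.
examples = {
    ".2f": format(2.7854, ".2f"),        # fixed-point, two decimals
    ".1%": format(0.1234, ".1%"),        # percentage, one decimal
    ",.0f": format(12345.678, ",.0f"),   # thousands separator, no decimals
}

for spec, text in examples.items():
    print(f"{spec!r:>8} -> {text}")
```

A string such as ".2f" could then be passed as text_auto in place of True to round the displayed bar values to two decimals.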
If Plotly Express does not provide a good starting point, it is also possible to use the more generic go.Histogram class from plotly.graph_objects. All of the available histogram options are described in the histogram section of the reference page: https://plotly.com/python/reference#histogram.
import plotly.graph_objects as go
import numpy as np

np.random.seed(1)
x = np.random.randn(500)

fig = go.Figure(data=[go.Histogram(x=x)])
fig.show()

import plotly.graph_objects as go
import numpy as np

x = np.random.randn(500)

fig = go.Figure(data=[go.Histogram(x=x, histnorm='probability')])
fig.show()

import plotly.graph_objects as go
import numpy as np

y = np.random.randn(500)
# Use `y` argument instead of `x` for horizontal histogram
fig = go.Figure(data=[go.Histogram(y=y)])
fig.show()
import plotly.graph_objects as go
import numpy as np

x0 = np.random.randn(500)
# Add 1 to shift the mean of the Gaussian distribution
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# Overlay both histograms
fig.update_layout(barmode='overlay')
# Reduce opacity to see both histograms
fig.update_traces(opacity=0.75)
fig.show()

import plotly.graph_objects as go
import numpy as np

x0 = np.random.randn(2000)
x1 = np.random.randn(2000) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# The two histograms are drawn on top of one another
fig.update_layout(barmode='stack')
fig.show()
import plotly.graph_objects as go
import numpy as np

x0 = np.random.randn(500)
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(
    x=x0,
    histnorm='percent',
    name='control', # name used in legend and hover labels
    xbins=dict( # bins used for histogram
        start=-4.0,
        end=3.0,
        size=0.5
    ),
    marker_color='#EB89B5',
    opacity=0.75
))
fig.add_trace(go.Histogram(
    x=x1,
    histnorm='percent',
    name='experimental',
    xbins=dict(
        start=-3.0,
        end=4,
        size=0.5
    ),
    marker_color='#330C73',
    opacity=0.75
))

fig.update_layout(
    title_text='Sampled Results', # title of plot
    xaxis_title_text='Value', # xaxis label
    yaxis_title_text='Count', # yaxis label
    bargap=0.2, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)

fig.show()
You can add text to histogram bars using the texttemplate argument. In this example we add the x-axis values as text following the format %{variable}. We also adjust the size of the text using textfont_size.
import plotly.graph_objects as go

numbers = ["5", "10", "3", "10", "5", "8", "5", "5"]

fig = go.Figure()
fig.add_trace(go.Histogram(x=numbers, name="count", texttemplate="%{x}", textfont_size=20))
fig.show()

import plotly.graph_objects as go
import numpy as np

x = np.random.randn(500)
fig = go.Figure(data=[go.Histogram(x=x, cumulative_enabled=True)])
fig.show()

import plotly.graph_objects as go

x = ["Apples","Apples","Apples","Oranges", "Bananas"]
y = ["5","10","3","10","5"]

fig = go.Figure()
fig.add_trace(go.Histogram(histfunc="count", y=y, x=x, name="count"))
fig.add_trace(go.Histogram(histfunc="sum", y=y, x=x, name="sum"))
fig.show()
For custom binning along the x-axis, use the attribute nbinsx. Please note that the autobin algorithm will choose a 'nice' round bin size that may result in somewhat fewer than nbinsx total bins. Alternatively, you can set the exact values for xbins along with autobinx = False.
import plotly.graph_objects as go
from plotly.subplots import make_subplots

x = ['1970-01-01', '1970-01-01', '1970-02-01', '1970-04-01', '1970-01-02',
     '1972-01-31', '1970-02-13', '1971-04-19']

fig = make_subplots(rows=3, cols=2)

trace0 = go.Histogram(x=x, nbinsx=4)
trace1 = go.Histogram(x=x, nbinsx=8)
trace2 = go.Histogram(x=x, nbinsx=10)
trace3 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M18'), # M18 stands for 18 months
                      autobinx=False
                      )
trace4 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M4'), # 4 months bin size
                      autobinx=False
                      )
trace5 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M2'), # 2 months
                      autobinx=False
                      )

fig.add_trace(trace0, 1, 1)
fig.add_trace(trace1, 1, 2)
fig.add_trace(trace2, 2, 1)
fig.add_trace(trace3, 2, 2)
fig.add_trace(trace4, 3, 1)
fig.add_trace(trace5, 3, 2)

fig.show()
If you want to display information about the individual items within each histogram bar, create a stacked bar chart with hover information. Note that this is not technically the histogram chart type, but it has a similar effect, as shown below by comparing the output of px.histogram and px.bar. For more information, see the tutorial on bar charts.
import plotly.express as px
df = px.data.tips()
fig1 = px.bar(df, x='day', y='tip', height=300,
              title='Stacked Bar Chart - Hover on individual items')
fig2 = px.histogram(df, x='day', y='tip', histfunc='sum', height=300,
                    title='Histogram Chart')
fig1.show()
fig2.show()
In this example both histograms have compatible bin settings using the bingroup attribute. Note that traces on the same subplot, and with the same barmode ("stack", "relative", "group"), are forced into the same bingroup; however, traces with barmode = "overlay" and on different axes (of the same axis type) can have compatible bin settings. Histogram and histogram2d traces can share the same bingroup.
import plotly.graph_objects as go
import numpy as np

fig = go.Figure(go.Histogram(
    x=np.random.randint(7, size=100),
    bingroup=1))

fig.add_trace(go.Histogram(
    x=np.random.randint(7, size=20),
    bingroup=1))

fig.update_layout(
    barmode="overlay",
    bargap=0.1)

fig.show()
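The idea of "compatible bin settings" can be sketched outside Plotly: two samples are histogrammed against one shared set of bin edges, so their bars line up bin for bin. The snippet below is a NumPy analogy under that assumption, not a description of how bingroup is implemented; the seed, sample sizes, and bin count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
a = rng.integers(0, 7, size=100)
b = rng.integers(0, 7, size=20)

# One set of edges computed from both samples together
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=7)

# Both samples are counted against the same edges
counts_a, edges_a = np.histogram(a, bins=edges)
counts_b, edges_b = np.histogram(b, bins=edges)

print("shared edges:", edges)
print("counts a:", counts_a)
print("counts b:", counts_b)
```

Without shared edges, each sample would get bins derived from its own range, and overlaid bars could be offset or differ in width.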
Histogram bars can also be sorted based on the ordering logic of the categorical values using the categoryorder attribute of the x-axis. Sorting of histogram bars using categoryorder also works with multiple traces on the same x-axis. In the following examples, the histogram bars are sorted based on the total numerical values.
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="day").update_xaxes(categoryorder='total ascending')
fig.show()

import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="day", color="smoker").update_xaxes(categoryorder='total descending')
fig.show()
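The ordering produced by 'total ascending' and 'total descending' can be previewed with a few lines of plain Python. This sketch uses a hypothetical list of day labels and counts occurrences per category; it only mirrors the sorting rule and does not touch the figure. With several traces, the total would presumably be the sum across traces for each category.

```python
from collections import Counter

# Hypothetical categorical column (not the real tips dataset)
days = ["Sat", "Sun", "Thur", "Sat", "Fri", "Sun", "Sat", "Thur", "Sat", "Sun"]

# Total count per category, as a count histogram would show
totals = Counter(days)

# Category order by total, smallest first and largest first
ascending = sorted(totals, key=totals.get)
descending = sorted(totals, key=totals.get, reverse=True)

print("total ascending: ", ascending)
print("total descending:", descending)
```

The counts in this sample are all distinct; how ties are ordered is not covered by this sketch.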
See the function reference for px.histogram() or https://plotly.com/python/reference/histogram/ for more information and chart attribute options!