statsmodels.graphics.boxplots.beanplot

statsmodels.graphics.boxplots.beanplot(data, ax=None, labels=None, positions=None, side='both', jitter=False, plot_opts={})[source]

Make a bean plot of each dataset in the data sequence.

A bean plot is a combination of a violinplot (kernel density estimate of the probability density function per point) with a line-scatter plot of all individual data points.

Parameters:

data : sequence of ndarrays

Data arrays, one array per value in positions.

ax : Matplotlib AxesSubplot instance, optional

If given, this subplot is used to plot in instead of a new figure being created.

labels : list of str, optional

Tick labels for the horizontal axis. If not given, integers 1..len(data) are used.

positions : array_like, optional

Position array, used as the horizontal axis of the plot. If not given, spacing of the violins will be equidistant.

side : {‘both’, ‘left’, ‘right’}, optional

How to plot the violin. Default is ‘both’. The ‘left’, ‘right’ options can be used to create asymmetric violin plots.

jitter : bool, optional

If True, jitter markers within violin instead of plotting regular lines around the center. This can be useful if the data is very dense.

plot_opts : dict, optional

A dictionary with plotting options. All the options for violinplot can be specified, they will simply be passed to violinplot. Options specific to beanplot are:

  • ‘violin_width’ : float. Relative width of violins. Max available
    space is 1, default is 0.8.
  • ‘bean_color’, MPL color. Color of bean plot lines. Default is ‘k’.
    Also used for jitter marker edge color if jitter is True.
  • ‘bean_size’, scalar. Line length as a fraction of maximum length.
    Default is 0.5.
  • ‘bean_lw’, scalar. Linewidth, default is 0.5.
  • ‘bean_show_mean’, bool. If True (default), show mean as a line.
  • ‘bean_show_median’, bool. If True (default), show median as a
    marker.
  • ‘bean_mean_color’, MPL color. Color of mean line. Default is ‘b’.
  • ‘bean_mean_lw’, scalar. Linewidth of mean line, default is 2.
  • ‘bean_mean_size’, scalar. Line length as a fraction of maximum length.
    Default is 0.5.
  • ‘bean_median_color’, MPL color. Color of median marker. Default
    is ‘r’.
  • ‘bean_median_marker’, MPL marker. Marker type, default is ‘+’.
  • ‘jitter_marker’, MPL marker. Marker type for jitter=True.
    Default is ‘o’.
  • ‘jitter_marker_size’, int. Marker size. Default is 4.
  • ‘jitter_fc’, MPL color. Jitter marker face color. Default is None.
  • ‘bean_legend_text’, str. If given, add a legend with given text.
Returns:

fig : Matplotlib figure instance

If ax is None, the created figure. Otherwise the figure to which ax is connected.

See also

violinplot
Violin plot, also used internally in beanplot.
matplotlib.pyplot.boxplot
Standard boxplot.

References

P. Kampstra, “Beanplot: A Boxplot Alternative for Visual Comparison of Distributions”, J. Stat. Soft., Vol. 28, pp. 1-9, 2008.

Examples

We use the American National Election Survey 1996 dataset, which has Party Identification of respondents as independent variable and (among other data) age as dependent variable.

>>> data = sm.datasets.anes96.load_pandas()
>>> party_ID = np.arange(7)
>>> labels = ["Strong Democrat", "Weak Democrat", "Independent-Democrat",
...           "Independent-Indpendent", "Independent-Republican",
...           "Weak Republican", "Strong Republican"]

Group age by party ID, and create a violin plot with it:

>>> plt.rcParams['figure.subplot.bottom'] = 0.23  # keep labels visible
>>> age = [data.exog['age'][data.endog == id] for id in party_ID]
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> sm.graphics.beanplot(age, ax=ax, labels=labels,
...                      plot_opts={'cutoff_val':5, 'cutoff_type':'abs',
...                                 'label_fontsize':'small',
...                                 'label_rotation':30})
>>> ax.set_xlabel("Party identification of respondent.")
>>> ax.set_ylabel("Age")
>>> plt.show()

(Source code, png, hires.png, pdf)

../_images/graphics_boxplot_beanplot.png