This function computes and plots the generalized boxplot proposed by Bruffaerts et al. (2014), a robust variant of the boxplot designed to represent skewed and heavy-tailed distributions.

Usage

generalized_boxplot(
  x,
  alpha = 0.05,
  p = 0.9,
  plot = TRUE,
  xlabels.angle = 90,
  xlabels.vjust = 1,
  xlabels.hjust = 1,
  box.width = 0.5,
  notch = FALSE,
  notchwidth = 0.5,
  staplewidth = 0.5
)

Arguments

x

A data frame or tibble with numeric columns; each column is summarized as a separate variable.

alpha

A scalar between 0 and 1 specifying the desired detection rate of atypical values (default is 0.05).
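To make the role of `alpha` concrete: assuming the fences sit at the alpha/2 and 1 - alpha/2 quantiles of the fitted distribution (an assumption about this implementation, not taken from its source), a clean sample drawn from that distribution should have roughly a fraction `alpha` of its points flagged. For the standard normal, the g = h = 0 special case of Tukey's g-and-h family, this can be checked in base R:

```r
# Sketch: empirical detection rate when fences sit at the alpha/2 and
# 1 - alpha/2 quantiles (ASSUMED fence rule; g = h = 0, i.e. standard normal).
set.seed(1)
alpha <- 0.05
fences <- qnorm(c(alpha / 2, 1 - alpha / 2))  # about -1.96 and 1.96
x <- rnorm(1e5)

# Fraction of clean observations falling outside the fences
detection_rate <- mean(x < fences[1] | x > fences[2])
detection_rate  # close to alpha
```

Smaller values of `alpha` push the fences further out, so fewer observations are flagged as atypical.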

p

A scalar between 0.5 and 1 specifying the quantile order used to estimate the g (skewness) and h (tail heaviness) parameters of Tukey's g-and-h distribution (default is 0.9).
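One way to see how a single quantile order `p` can pin down both g and h: for a g-and-h variable centred at 0 with unit scale, the quantiles of order p and 1 - p satisfy closed-form ratios that can be inverted. The base-R sketch below derives g and h from those two quantiles. The helpers `tukey_gh()` and `estimate_gh()` are hypothetical, and whether the package applies these formulas in exactly this way, and after which standardization, is an assumption.

```r
# Hypothetical helpers (not part of this package's API).
# Tukey g-and-h transform of a standard normal quantile z (g = 0 is the limit).
tukey_gh <- function(z, g = 0, h = 0) {
  skew <- if (g == 0) z else (exp(g * z) - 1) / g
  skew * exp(h * z^2 / 2)
}

# ASSUMPTION: q_upper and q_lower are the order-p and order-(1 - p) quantiles
# of a sample already centred at 0 with unit scale.
estimate_gh <- function(q_upper, q_lower, p = 0.9) {
  zp <- qnorm(p)
  # -Q(p) / Q(1 - p) = exp(g * zp), so g follows from the log ratio
  g <- log(-q_upper / q_lower) / zp
  h <- if (abs(g) < 1e-6) {
    # Symmetric limit: Q(p) = zp * exp(h * zp^2 / 2)
    2 * log(q_upper / zp) / zp^2
  } else {
    # -g * Q(p) * Q(1 - p) / (Q(p) + Q(1 - p)) = exp(h * zp^2 / 2)
    2 * log(-g * q_upper * q_lower / (q_upper + q_lower)) / zp^2
  }
  c(g = g, h = h)
}

# Recover known parameters from exact g-and-h quantiles at p = 0.9
zp <- qnorm(0.9)
est <- estimate_gh(tukey_gh(zp, 0.3, 0.1), tukey_gh(-zp, 0.3, 0.1), p = 0.9)
est
```

With real data, `q_upper` and `q_lower` would be sample quantiles such as `quantile(z, c(p, 1 - p))`; a larger `p` draws the estimates from further into the tails.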

plot

A logical value indicating whether to plot the generalized boxplot (TRUE, the default) or return its statistics (FALSE).

xlabels.angle

A numeric value specifying the angle (in degrees) for x-axis labels (default is 90).

xlabels.vjust

A numeric value specifying the vertical justification of x-axis labels (default is 1).

xlabels.hjust

A numeric value specifying the horizontal justification of x-axis labels (default is 1).

box.width

A numeric value specifying the width of the boxplot (default is 0.5).

notch

A logical value indicating whether to display a notched boxplot (default is FALSE).

notchwidth

A numeric value specifying the width of the notch relative to the body of the boxplot (default is 0.5).

staplewidth

A numeric value specifying the width of the staples at the ends of the whiskers (default is 0.5).

Value

  • If plot = TRUE, returns a ggplot2 object containing the generalized boxplot.

  • If plot = FALSE, returns a list of two tibbles: stats, with the generalized boxplot statistics for each variable, and outliers, with the potential outliers.
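With plot = FALSE, the stats tibble holds per-variable lower and upper fences. Assuming, as the column names suggest, that the observations reported in outliers are those falling outside these fences, the flagging step can be reproduced with a base-R helper such as the one below. `flag_outside_fences()`, the meaning given to its `out` column, and the toy inputs are all hypothetical and for illustration only.

```r
# Hypothetical helper (not part of the package): flag observations that fall
# outside per-variable fences given in a table with columns
# `variable`, `lower`, and `upper`.
flag_outside_fences <- function(data, stats) {
  rows <- lapply(seq_len(nrow(stats)), function(i) {
    v <- as.character(stats$variable[i])
    x <- data[[v]]
    # ASSUMPTION: `out` records which fence was crossed
    side <- ifelse(x < stats$lower[i], "lower",
                   ifelse(x > stats$upper[i], "upper", NA_character_))
    keep <- !is.na(side)
    data.frame(variable = rep(v, sum(keep)), out = side[keep],
               value = x[keep], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy data and fences, chosen only for illustration
toy <- data.frame(a = c(1, 2, 3, 50), b = c(-10, 0, 1, 2))
toy_stats <- data.frame(variable = c("a", "b"), lower = c(0, -5),
                        upper = c(10, 5), stringsAsFactors = FALSE)
flagged <- flag_outside_fences(toy, toy_stats)
flagged
```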

Details

This method extends the adjusted boxplot by modelling the data with Tukey's g-and-h distribution, a flexible parametric family in which g governs skewness and h governs tail heaviness. Because the fences are derived from this fitted distribution, they can adapt to asymmetric or long-tailed data, giving a more informative summary of the data's central tendency, spread, and potential outliers.
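For reference, Tukey's g-and-h family is obtained by transforming a standard normal variable Z as T(Z) = (exp(gZ) - 1) / g * exp(hZ^2 / 2), with g = 0 taken as the limit Z * exp(hZ^2 / 2). The base-R sketch below, which is illustrative and not the package's internal code, simulates the family and compares the spread of the lower and upper tails to show how g and h reshape the distribution.

```r
# Tukey's g-and-h transform of a standard normal variable z.
# g controls skewness (g > 0 stretches the right tail); h >= 0 controls tail heaviness.
tukey_gh <- function(z, g = 0, h = 0) {
  skew <- if (g == 0) z else (exp(g * z) - 1) / g
  skew * exp(h * z^2 / 2)
}

set.seed(42)
z <- rnorm(1e4)
samples <- list(
  normal       = tukey_gh(z),                 # g = 0, h = 0: standard normal
  right_skewed = tukey_gh(z, g = 0.5),        # skewed, normal-like tails
  heavy_tailed = tukey_gh(z, h = 0.2),        # symmetric, heavy tails
  both         = tukey_gh(z, g = 0.5, h = 0.2)
)

# Distance from the median to the 1% and 99% quantiles for each sample
spreads <- sapply(samples, function(x) {
  q <- quantile(x, c(0.01, 0.5, 0.99))
  c(lower_spread = unname(q[2] - q[1]), upper_spread = unname(q[3] - q[2]))
})
spreads
```

A symmetric fence rule such as Tukey's 1.5 IQR whiskers treats both tails alike, whereas quantiles of the fitted g-and-h distribution follow this asymmetry.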

References

  • Bruffaerts, C., Verardi, V., and Vermandele, C. (2014). A generalized boxplot for skewed and heavy-tailed distributions. Statistics & Probability Letters, 95, 110–117.

Author

Christian L. Goueguel

Examples

set.seed(123)
data <- data.frame(
  normal = rnorm(100),
  skewed = rexp(100, rate = 0.5),
  heavy_tailed = rt(100, df = 3)
)

# Plot the generalized boxplot
generalized_boxplot(data)


# Retrieve the generalized boxplot statistics
generalized_boxplot(data, plot = FALSE)
#> $stats
#> # A tibble: 3 × 6
#>   variable      lower     q1 median    q3 upper
#>   <fct>         <dbl>  <dbl>  <dbl> <dbl> <dbl>
#> 1 normal       -2.42  -0.494 0.0618 0.692  2.31
#> 2 skewed       -0.217  0.685 1.43   2.99   8.96
#> 3 heavy_tailed -8.74  -0.569 0.146  0.835  6.56
#> 
#> $outliers
#> # A tibble: 0 × 3
#> # ℹ 3 variables: variable <fct>, out <chr>, value <dbl>
#>