Compute the coordinate points of confidence ellipses at a specified confidence level.

Usage

confidence_ellipse(
  .data,
  x,
  y,
  .group_by = NULL,
  conf_level = 0.95,
  robust = FALSE,
  distribution = "normal"
)

Arguments

.data

data frame or tibble.

x

column name for the x-axis variable.

y

column name for the y-axis variable.

.group_by

column name for the grouping variable (NULL by default). Note that this grouping variable must be a factor.

conf_level

confidence level for the ellipse (0.95 by default).

robust

optional (FALSE by default). When set to TRUE, robust estimation is used to calculate the coordinates of the ellipse: the location is estimated using a 1-step M-estimator with the biweight psi function, and the scale is estimated using the Minimum Covariance Determinant (MCD) estimator. This approach is more resistant to outliers and gives more reliable ellipse boundaries when the data contain extreme values or depart from normality.

distribution

optional ("normal" by default). The distribution used to calculate the quantile for the ellipse. It can be either "normal" or "hotelling".

Value

Data frame containing the coordinate points of the ellipse.
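
As a rough illustration of what these coordinate points represent, the classical (non-robust, chi-square) case can be sketched in base R. This is not the package's implementation: the helper name `ellipse_points`, the `n_points` argument, and the output column names `x` and `y` are assumptions made for this example only.

```r
# Hypothetical helper, not part of ConfidenceEllipse: boundary points of a
# classical confidence ellipse from the sample mean and covariance.
ellipse_points <- function(x, y, conf_level = 0.95, n_points = 361) {
  X <- cbind(x, y)
  center <- colMeans(X)                 # classical location estimate
  S <- cov(X)                           # classical covariance estimate
  q <- qchisq(conf_level, df = 2)       # chi-square quantile, 2 variables
  eig <- eigen(S, symmetric = TRUE)     # principal axes of the ellipse
  theta <- seq(0, 2 * pi, length.out = n_points)
  unit_circle <- rbind(cos(theta), sin(theta))
  # Stretch the unit circle along each axis by sqrt(q * eigenvalue),
  # rotate with the eigenvectors, then shift to the center.
  pts <- eig$vectors %*% (sqrt(q * eig$values) * unit_circle)
  data.frame(x = pts[1, ] + center[[1]], y = pts[2, ] + center[[2]])
}

# Simulated data, used only to exercise the sketch
set.seed(1)
x <- rnorm(200)
y <- 0.6 * x + rnorm(200, sd = 0.5)
coords <- ellipse_points(x, y)
head(coords)
```

Every row of `coords` lies at the same squared Mahalanobis distance from the sample mean, equal to the chosen quantile. A robust variant would keep the same geometry and only swap in robust location and scatter estimates.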

Details

The function computes the coordinates of the confidence ellipse based on the specified confidence level and the provided data. It can handle both classical and robust estimation methods, and it supports grouping by a factor variable. The distribution parameter controls the statistical approach used for ellipse calculation. The "normal" option uses the chi-square distribution quantile, which is appropriate for very large samples. The "hotelling" option instead uses the Hotelling's T² distribution quantile, which accounts for the uncertainty in estimating both the mean and the covariance from sample data. It produces larger ellipses that better reflect sampling uncertainty, and it is statistically more rigorous for smaller samples, where parameter estimation uncertainty is higher.
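
To make the difference between the two quantiles concrete, the following base R sketch compares them for a few sample sizes. It assumes one common form of the Hotelling's T² quantile for p = 2 variables, p(n - 1)/(n - p) times the F quantile with (p, n - p) degrees of freedom; whether the package uses exactly this scaling is an assumption here, and the variable names are illustrative only.

```r
p <- 2                          # number of variables (x and y)
conf_level <- 0.95
n <- c(10, 30, 100, 1000)       # illustrative sample sizes

# "normal": chi-square quantile, independent of the sample size
q_normal <- qchisq(conf_level, df = p)

# "hotelling": assumed scaling of the F quantile (see the note above)
q_hotelling <- p * (n - 1) / (n - p) * qf(conf_level, df1 = p, df2 = n - p)

comparison <- data.frame(
  n = n,
  normal = q_normal,
  hotelling = q_hotelling,
  # ratio of ellipse axis lengths, since axes scale with sqrt(quantile)
  axis_ratio = sqrt(q_hotelling / q_normal)
)
comparison
```

Under this assumption the Hotelling quantile is always the larger of the two and shrinks toward the chi-square quantile as n grows, which is why the two options give nearly identical ellipses for very large samples.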

The combination of distribution = "hotelling" and robust = TRUE offers the most conservative and statistically rigorous approach, particularly recommended for exploratory data analysis and when dealing with datasets that may not meet ideal statistical assumptions. For very large samples, the default settings (distribution = "normal", robust = FALSE) may be sufficient, as the differences between methods diminish with increasing sample size.

References

  • Raymaekers, J., Rousseeuw, P. J. (2019). Fast robust correlation for high dimensional data. Technometrics, 63(2), 184–198.

  • Brereton, R. G. (2016). Hotelling’s T-squared distribution, its relationship to the F distribution and its use in multivariate space. Journal of Chemometrics, 30(1), 18–21.

Author

Christian L. Goueguel

Examples

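# Load the package and the example data
library(ConfidenceEllipse)
data("glass", package = "ConfidenceEllipse")
# Confidence ellipse
ellipse <- confidence_ellipse(.data = glass, x = SiO2, y = Na2O)
# Confidence ellipses by group (the grouping variable must be a factor)
ellipse_grp <- confidence_ellipse(
  .data = glass,
  x = SiO2,
  y = Na2O,
  .group_by = glassType
)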