This function calculates Hotelling's T-squared statistic and, when two components are used (k = 2), the lengths of the semi-axes of Hotelling's ellipse. It can work with a fixed number of components or select the number of components from a cumulative variance threshold.

ellipseParam(
  x,
  k = 2,
  pcx = 1,
  pcy = 2,
  threshold = NULL,
  rel.tol = 0.001,
  abs.tol = .Machine$double.eps
)

Arguments

x

A matrix, data frame or tibble containing scores from PCA, PLS, ICA, or other similar methods. Each column should represent a component, and each row an observation.

k

An integer specifying the number of components to use (default is 2). This parameter is ignored if threshold is provided.

pcx

An integer specifying which component to use for the x-axis when k = 2 (default is 1).

pcy

An integer specifying which component to use for the y-axis when k = 2 (default is 2).

threshold

A numeric value between 0 and 1 specifying the desired cumulative explained variance threshold (default is NULL). If provided, the function determines the minimum number of components needed to explain at least this proportion of total variance. When NULL, the function uses the fixed number of components specified by k.

rel.tol

A numeric value specifying the minimum proportion of total variance a component should explain to be considered non-negligible (default is 0.001, i.e., 0.1%).

abs.tol

A numeric value specifying the minimum absolute variance a component should have to be considered non-negligible (default is .Machine$double.eps).

Value

A list containing the following elements:

  • Tsquare: A data frame containing the T-squared statistic for each observation.

  • Ellipse: A data frame containing the lengths of the semi-minor and semi-major axes of the ellipse (returned only when k = 2).

  • cutoff.99pct: The T-squared cutoff value at the 99% confidence level.

  • cutoff.95pct: The T-squared cutoff value at the 95% confidence level.

  • nb.comp: The number of components used in the calculation.
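
As a rough illustration of the quantities listed above, the sketch below computes per-observation T-squared values, F-based cutoffs, and ellipse semi-axes in base R. It uses a common textbook formulation and simulated scores; the names scores, t2, cutoff and semi_axes_95 are hypothetical, and the scaling used inside ellipseParam() may differ.

```r
# Illustrative sketch only: a textbook formulation, not the package's internals.
set.seed(1)
n <- 50   # number of observations
k <- 2    # number of components

# Simulated PCA scores (uncorrelated, centered columns)
scores <- prcomp(matrix(rnorm(n * 4), nrow = n))$x[, 1:k]

# Squared Mahalanobis distance of each observation from the score centroid
t2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

# F-distribution cutoff at confidence level p (assumed form)
cutoff <- function(p) k * (n - 1) / (n - k) * qf(p, df1 = k, df2 = n - k)
cutoff_95 <- cutoff(0.95)
cutoff_99 <- cutoff(0.99)

# For two uncorrelated components, the 95% ellipse has semi-axes
# sqrt(cutoff * variance) along each component
semi_axes_95 <- sqrt(cutoff_95 * apply(scores, 2, var))

# Observations lying outside the 95% region
outliers_95 <- which(t2 > cutoff_95)
```

Because PCA scores are uncorrelated, the ellipse axes align with the component axes, which is why the semi-axis lengths depend only on the cutoff and each component's variance.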

Details

When threshold is provided, the function selects the smallest number of components whose cumulative explained variance reaches at least the specified proportion, and k is ignored. This allows dynamic component selection based on explained variance rather than a fixed number of components. The threshold must be greater than rel.tol. Typical values range from 0.8 to 0.95.
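
The selection rule described above can be sketched in a few lines of base R. The variances below are made up for illustration, and comp_var, cum_prop and k_selected are hypothetical names rather than objects exposed by the package.

```r
# Hypothetical per-component variances, in decreasing order as PCA returns them
comp_var <- c(5.0, 3.0, 1.5, 0.4, 0.1)
threshold <- 0.9

# Cumulative proportion of total variance: 0.50, 0.80, 0.95, 0.99, 1.00
cum_prop <- cumsum(comp_var) / sum(comp_var)

# Smallest number of components reaching the threshold
k_selected <- which(cum_prop >= threshold)[1]
k_selected  # 3
```

With a threshold of 0.9, two components explain only 80% of the variance, so the third is needed to cross the threshold.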

The rel.tol parameter sets the minimum proportion of total variance that an individual component must explain. Components below this proportion are considered negligible and are removed from the analysis. Setting rel.tol too high may remove potentially important components, while setting it too low may retain noise or cause computational issues. Adjust it based on your data characteristics and analysis goals.

Note that components are considered to have near-zero variance and are removed if their relative variance is below rel.tol or their absolute variance is below abs.tol. This dual-threshold approach helps ensure numerical stability while also accounting for the relative importance of components. The default value for abs.tol is .Machine$double.eps, which provides a lower bound for detecting near-zero variance that may cause numerical instability.
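
A minimal sketch of the dual-threshold filter, assuming it is applied to per-component variances as described above. The variances are invented for illustration, and comp_var, rel_var and keep are hypothetical names.

```r
# Hypothetical per-component variances
comp_var <- c(4.2, 1.1, 0.003, 1e-20)
rel.tol <- 0.001
abs.tol <- .Machine$double.eps

# Proportion of total variance explained by each component
rel_var <- comp_var / sum(comp_var)

# Keep a component only if it passes both the relative and the absolute test
keep <- rel_var >= rel.tol & comp_var >= abs.tol
comp_var[keep]
```

Here the third component is dropped because it explains less than 0.1% of the total variance, and the fourth is dropped because its variance falls below machine epsilon.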

Author

Christian L. Goueguel christian.goueguel@gmail.com

Examples

if (FALSE) {
# Load required libraries
library(HotellingEllipse)
library(dplyr)

data("specData", package = "HotellingEllipse")

# Perform PCA
set.seed(123)
pca_mod <- specData %>%
  select(where(is.numeric)) %>%
  FactoMineR::PCA(scale.unit = FALSE, graph = FALSE)

# Extract PCA scores
pca_scores <- pca_mod$ind$coord %>% as.data.frame()

# Example 1: Calculate Hotelling's T-squared and ellipse parameters using
# the 2nd and 4th components
T2_fixed <- ellipseParam(x = pca_scores, pcx = 2, pcy = 4)

# Example 2: Calculate using the first 4 components
T2_comp <- ellipseParam(x = pca_scores, k = 4)

# Example 3: Calculate using a cumulative variance threshold
T2_threshold <- ellipseParam(x = pca_scores, threshold = 0.95)
}