This function calculates Hotelling's T-squared statistic and, when applicable, the lengths of the semi-axes of Hotelling's ellipse. The number of components can either be fixed by the user or selected from a cumulative variance threshold.
ellipseParam(
x,
k = 2,
pcx = 1,
pcy = 2,
threshold = NULL,
rel.tol = 0.001,
abs.tol = .Machine$double.eps
)
x: A matrix, data frame or tibble containing scores from PCA, PLS, ICA, or other similar methods. Each column should represent a component, and each row an observation.
k: An integer specifying the number of components to use (default is 2). This parameter is ignored if threshold is provided.
pcx: An integer specifying which component to use for the x-axis when k = 2 (default is 1).
pcy: An integer specifying which component to use for the y-axis when k = 2 (default is 2).
threshold: A numeric value between 0 and 1 specifying the desired cumulative explained variance threshold (default is NULL). If provided, the function determines the minimum number of components needed to explain at least this proportion of total variance. When NULL, the function uses the fixed number of components specified by k.
rel.tol: A numeric value specifying the minimum proportion of total variance a component should explain to be considered non-negligible (default is 0.001, i.e., 0.1%).
abs.tol: A numeric value specifying the minimum absolute variance a component should have to be considered non-negligible (default is .Machine$double.eps).
A list containing the following elements:
Tsquare: A data frame containing the T-squared statistic for each observation.
Ellipse: A data frame containing the lengths of the semi-minor and semi-major axes (only when k = 2).
cutoff.99pct: The T-squared cutoff value at the 99% confidence level.
cutoff.95pct: The T-squared cutoff value at the 95% confidence level.
nb.comp: The number of components used in the calculation.
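To make the returned quantities concrete, the following base R sketch computes a T-squared value per observation, two F-based cutoffs, and the corresponding semi-axis lengths for two components. It is an illustration under stated assumptions, not the package's source: the simulated scores, the helper t2_cutoff(), and the particular F-based scaling are all assumptions, and ellipseParam() may scale its statistic differently.

```r
# Simulated scores standing in for PCA output (hypothetical data)
set.seed(42)
n <- 100
scores <- data.frame(
  Dim.1 = rnorm(n, sd = 3),
  Dim.2 = rnorm(n, sd = 2)
)
k <- ncol(scores)

# T-squared taken here as the squared Mahalanobis distance of each
# observation from the centroid of the scores
t2 <- stats::mahalanobis(
  scores,
  center = colMeans(scores),
  cov = stats::cov(scores)
)

# One commonly used F-based cutoff (assumed form, not taken from the package)
t2_cutoff <- function(p, k, n) {
  (k * (n - 1) / (n - k)) * stats::qf(p, df1 = k, df2 = n - k)
}
cutoff_95 <- t2_cutoff(0.95, k, n)
cutoff_99 <- t2_cutoff(0.99, k, n)

# Semi-axis length along each component: sqrt(cutoff * component variance)
comp_var <- sapply(scores, stats::var)
semi_axes_95 <- sqrt(cutoff_95 * comp_var)
semi_axes_99 <- sqrt(cutoff_99 * comp_var)

# Observations whose T-squared exceeds the 99% cutoff
flagged <- which(t2 > cutoff_99)
```

The component with the larger variance gives the semi-major axis and the other the semi-minor axis, which is why the ellipse is only defined for a pair of components.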
When threshold is used, the function selects the smallest number of components, k, that cumulatively explain at least the specified proportion of variance. This allows the number of components to be chosen dynamically from the explained variance rather than fixed in advance. The value of threshold must be greater than rel.tol. Typical values range from 0.8 to 0.95.
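The selection rule described above can be sketched in base R as follows. This is a minimal illustration, assuming that "total variance" means the sum of the variances of the supplied score columns and that the columns are ordered by decreasing variance; the constructed scores and the variable names are hypothetical.

```r
# Build hypothetical scores whose column variances are exactly
# 25, 9, 4, 1 and 0.25 (scale() standardises each column first)
set.seed(1)
n <- 200
sds <- c(5, 3, 2, 1, 0.5)
raw <- matrix(rnorm(n * length(sds)), nrow = n)
scores <- sweep(scale(raw), 2, sds, "*")

# Proportion of variance explained by each component, then cumulated
comp_var <- apply(scores, 2, stats::var)
cum_prop <- cumsum(comp_var) / sum(comp_var)

# Smallest number of components reaching the threshold
threshold <- 0.95
k_selected <- which(cum_prop >= threshold)[1]
```

With these variances the first two components explain about 87% of the variance and the first three about 97%, so three components are needed to reach a threshold of 0.95.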
The rel.tol parameter sets the minimum proportion of total variance that an individual component must explain. Components below this proportion are considered negligible and are removed from the analysis. Setting rel.tol too high may remove potentially important components, while setting it too low may retain noise or cause computational issues. Adjust it based on your data characteristics and analysis goals.
Note that components are considered to have near-zero variance and are removed if their relative variance is below rel.tol or their absolute variance is below abs.tol. This dual-threshold approach helps ensure numerical stability while also accounting for the relative importance of components. The default value for abs.tol is .Machine$double.eps, which provides a lower bound for detecting near-zero variance that may cause numerical instability.
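The dual-threshold rule can be illustrated with a few lines of base R. This is a sketch of the rule as described, not the package's internal code; the component variances and the variable names rel_var, keep, and kept are made up for the example.

```r
# Hypothetical component variances; the last one is effectively zero
comp_var <- c(Dim.1 = 12.5, Dim.2 = 4.0, Dim.3 = 0.01, Dim.4 = 1e-20)

rel.tol <- 0.001
abs.tol <- .Machine$double.eps

# Share of total variance carried by each component
rel_var <- comp_var / sum(comp_var)

# A component is kept only if it passes both the relative and absolute checks
keep <- rel_var >= rel.tol & comp_var >= abs.tol
kept <- names(comp_var)[keep]
```

Here Dim.3 explains roughly 0.06% of the total variance and is dropped by the relative check alone, while Dim.4 fails both checks.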
if (FALSE) {
  # Load required libraries
  library(HotellingEllipse)
  library(dplyr)
  data("specData", package = "HotellingEllipse")

  # Perform PCA on the numeric columns
  set.seed(123)
  pca_mod <- specData %>%
    select(where(is.numeric)) %>%
    FactoMineR::PCA(scale.unit = FALSE, graph = FALSE)

  # Extract PCA scores (one row per observation, one column per component)
  pca_scores <- pca_mod$ind$coord %>% as.data.frame()

  # Example 1: Calculate Hotelling's T-squared and ellipse parameters using
  # the 2nd and 4th components
  T2_fixed <- ellipseParam(x = pca_scores, pcx = 2, pcy = 4)

  # Example 2: Calculate using the first 4 components
  T2_comp <- ellipseParam(x = pca_scores, k = 4)

  # Example 3: Calculate using a cumulative variance threshold
  T2_threshold <- ellipseParam(x = pca_scores, threshold = 0.95)
}