Transforms each variable in a dataset toward central normality using re-weighted maximum likelihood to robustly fit the Box-Cox or Yeo-Johnson transformation.
Arguments
- x
A data frame or tibble containing the variables to be transformed.
- var
A character or numeric vector identifying the variables to be transformed. If NULL (default), all columns are selected.
- type
A character string specifying the transformation method(s) to use. Allowed values are "BC", "YJ", or "bestObj" (default).
- quantile
A numeric value between 0 and 1 specifying the quantile to use for determining the weights in the re-weighting step. Default is 0.99.
- nbsteps
An integer specifying the number of re-weighting steps to perform. Default is 2.
Value
A list containing two data frames:
- summary: a data frame with the columns
  - variable: the variable name(s)
  - lambda: the estimated lambda parameter
  - method: the method used ('BC' for Box-Cox or 'YJ' for Yeo-Johnson)
  - objective: the objective function value
- transformation: a data frame with the transformed variable(s)
Details
The Box-Cox and Yeo-Johnson transformations are power transformations
aimed at making the data distribution more normal-like. The Box-Cox
transformation is suitable for strictly positive values, while the
Yeo-Johnson transformation can handle both positive and negative values.
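For orientation, the standard textbook definitions of the two transformations are sketched below. The symbols g and h are notation chosen here, not taken from the package, and the formulas do not show any standardization the underlying implementation may apply to its output.

```latex
% Box-Cox, defined for x > 0
g_\lambda(x) =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\[6pt]
  \log(x)                          & \text{if } \lambda = 0.
\end{cases}

% Yeo-Johnson, defined for all real x
h_\lambda(x) =
\begin{cases}
  \dfrac{(1 + x)^{\lambda} - 1}{\lambda}          & \text{if } \lambda \neq 0,\ x \geq 0, \\[6pt]
  \log(1 + x)                                     & \text{if } \lambda = 0,\ x \geq 0, \\[6pt]
  -\dfrac{(1 - x)^{2 - \lambda} - 1}{2 - \lambda} & \text{if } \lambda \neq 2,\ x < 0, \\[6pt]
  -\log(1 - x)                                    & \text{if } \lambda = 2,\ x < 0.
\end{cases}
```

In both cases, lambda = 1 leaves the shape of the data unchanged (Box-Cox shifts it by one; Yeo-Johnson is the identity).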
The function is a wrapper around the transfo function from the cellWise
package, which applies a robust version of these transformations by using
re-weighted maximum likelihood estimation. This approach downweights
outlying observations to make the transformation more robust to their
influence.
The type parameter controls which transformation method(s) to use:
- "BC": Only applies the Box-Cox transformation to strictly positive variables.
- "YJ": Only applies the Yeo-Johnson transformation to all variables.
- "bestObj" (default): For strictly positive variables, both BC and YJ are applied, and the solution with the lowest objective function value is kept. For variables with negative values, only YJ is applied.
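As a rough illustration of the options above, the sketch below calls the underlying cellWise function directly rather than the wrapper documented here. The toy data frame and its column names are hypothetical, and the argument names (robust, quant, nbsteps) reflect recent cellWise versions as far as I know; check ?transfo for the installed version before relying on them.

```r
# Sketch only: calls cellWise::transfo() directly, not the wrapper documented here.
library(cellWise)

set.seed(1)
# Hypothetical toy data: one strictly positive column, one with negative values
X <- data.frame(
  pos   = rexp(200) + 0.1,
  mixed = rnorm(200)^3
)

# Assumed argument names; quant and nbsteps mirror the defaults described above
fit <- transfo(X, type = "bestObj", robust = TRUE, quant = 0.99, nbsteps = 2)

# Inspect the returned list instead of assuming its component names
str(fit, max.level = 1)
```

With type = "bestObj", the strictly positive column is a candidate for both BC and YJ, while the column containing negative values can only receive YJ.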