This function calculates the classical or robust z-score (standardization) of a numeric vector.

Usage

zscore(x, robust = FALSE, drop.na = FALSE)

Arguments

x

A numeric vector.

robust

A logical value indicating whether to compute the classical or the robust z-score. If FALSE (the default), the classical z-score is computed. If TRUE, the robust z-score is computed, i.e. the so-called Stahel-Donoho outlyingness.

drop.na

A logical value indicating whether to remove missing values (NA) before the calculations. If TRUE, missing values are removed. If FALSE (the default), missing values are kept in the calculations.

Value

A tibble with two columns:

  • data: The original numeric values.

  • score: The calculated z-scores.

Details

Z-scores are dimensionless, which makes them useful for comparing data points drawn from different distributions. A positive z-score indicates that the data point is above the mean (or above the median in the robust approach), while a negative z-score indicates that it is below. A common rule for detecting outliers is the "three-sigma rule": data points with an absolute z-score greater than 3 (|z| > 3) are considered potential outliers because they fall outside the range that contains about 99.7% of the values of a normal distribution. A cutoff of |z| > 2.5 is also often used.
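For reference, the two scores can be written out as follows. This is a sketch under stated assumptions rather than a statement of the function's internals: the classical score is the textbook definition, and the robust score is assumed to use the median and the median absolute deviation (MAD) with the usual consistency constant 1.4826, a choice that reproduces the values shown in the Examples section.

```latex
% Classical z-score: centre on the sample mean, scale by the sample standard deviation
z_i = \frac{x_i - \bar{x}}{s},
\qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2}

% Robust z-score (assumed form): centre on the median, scale by the MAD
z_i^{\mathrm{rob}} = \frac{x_i - \operatorname{med}(x)}{\operatorname{MAD}(x)},
\qquad
\operatorname{MAD}(x) = 1.4826 \cdot \operatorname{med}_j \left| x_j - \operatorname{med}(x) \right|
```

The absolute value of the robust score is the univariate form of the Stahel-Donoho outlyingness.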

References

  • Rousseeuw, P. J., and Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424), 1273-1283.

  • Rousseeuw, P. J., and Hubert, M. (2011). Robust statistics for outlier detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1), 73-79.

  • Donoho, D. (1982). Breakdown properties of multivariate location estimators. Ph.D. qualifying paper, Dept. Statistics, Harvard University, Boston.

  • Stahel, W. (1981). Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. PhD thesis, ETH Zürich.

Author

Christian L. Goueguel

Examples

x <- c(1:5, 100)
# Non-robust approach
zscore(x)
#> # A tibble: 6 × 2
#>    data  score
#>   <dbl>  <dbl>
#> 1   100  2.04 
#> 2     5 -0.358
#> 3     4 -0.383
#> 4     3 -0.408
#> 5     2 -0.433
#> 6     1 -0.458

# Robust approach
zscore(x, robust = TRUE)
#> # A tibble: 6 × 2
#>    data  score
#>   <dbl>  <dbl>
#> 1   100 43.4  
#> 2     5  0.674
#> 3     4  0.225
#> 4     3 -0.225
#> 5     2 -0.674
#> 6     1 -1.12
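
The scores above can be cross-checked with base R alone. The sketch below is not the package's implementation; it assumes that the robust variant centres on the median and scales by stats::mad() with its default constant of 1.4826, which agrees with the printed output. The names z_classical and z_robust are illustrative only.

```r
x <- c(1:5, 100)

# Classical z-score: centre on the mean, scale by the sample standard deviation.
z_classical <- (x - mean(x)) / sd(x)

# Robust z-score (assumed form): centre on the median, scale by the MAD.
# stats::mad() applies the consistency constant 1.4826 by default.
z_robust <- (x - median(x)) / mad(x)

round(z_classical, 3)
round(z_robust, 3)

# Apply the |z| > 3 rule from the Details section to both scores.
flagged_classical <- x[abs(z_classical) > 3]
flagged_robust <- x[abs(z_robust) > 3]

flagged_classical
flagged_robust
```

With this data the classical score of 100 stays below 3, because the extreme value inflates both the mean and the standard deviation, so nothing is flagged. The robust score flags 100.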