Computes the correlation coefficient between a specified response variable and each of the remaining variables in a data frame or tibble. The available methods are Pearson's product-moment correlation (parametric), Spearman's rank correlation and Kendall's tau (both non-parametric), Chatterjee's correlation coefficient, and the biweight midcorrelation (a robust measure).

Usage

correlation(
  x,
  var,
  method = "pearson",
  plot = FALSE,
  color = "#111D71",
  interactive = FALSE
)

Arguments

x

A data frame or tibble containing the variables of interest.

var

A character string specifying the name of the response variable.

method

A character string indicating the correlation method to use. Allowed values are "pearson", "spearman", "kendall", "chatterjee", or "bicor" (for biweight midcorrelation). The default is "pearson".

plot

A logical value indicating whether to produce a visualization of the correlations. Default is FALSE (no plot).

color

A character string specifying the color to use for the plot. Default is "#111D71".

interactive

A logical value indicating whether to create an interactive plot using plotly. Default is FALSE (static ggplot2 plot).

Value

A list containing:

  • correlation: A tibble with columns for the variable name, correlation value, and method used.

  • plot: If plot = TRUE, a ggplot2 object (or a plotly object if interactive = TRUE).
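To make the shape of the correlation element more concrete, the following is a minimal sketch of the response-versus-rest computation using only base R. The helper name response_cor and the column names variable, correlation, and method are assumptions made for illustration; they are not taken from the package, and the sketch covers only the three methods that stats::cor() supports.

```r
# Hypothetical helper (not the package's implementation): correlate one
# response column with every other column using stats::cor().
# Assumes all columns of `x` are numeric.
response_cor <- function(x, var, method = "pearson") {
  stopifnot(is.data.frame(x), var %in% names(x))
  others <- setdiff(names(x), var)
  values <- vapply(
    others,
    function(nm) cor(x[[nm]], x[[var]], method = method),
    numeric(1)
  )
  # One row per non-response variable; column names are assumed.
  data.frame(
    variable = others,
    correlation = unname(values),
    method = method,
    stringsAsFactors = FALSE
  )
}

# mtcars ships with base R; "mpg" plays the role of the response variable.
res <- response_cor(mtcars, var = "mpg", method = "spearman")
head(res)
```

The result has one row for each variable other than the response, which is the layout the Value section describes.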

Details

The Pearson correlation coefficient measures the strength of the linear relationship between two continuous variables and is best suited to data that are approximately bivariate normal. The Spearman and Kendall correlations are rank-based, non-parametric measures of monotonic association, which makes them suitable for monotonic but non-linear relationships and for data that deviate from normality. The Chatterjee correlation coefficient is a recently proposed rank-based measure of how well one variable can be expressed as a function of the other, so it can detect non-monotonic relationships that the other coefficients miss; unlike them, it is not symmetric in its two arguments, and because it depends only on ranks it is largely insensitive to heavy tails and outliers. The biweight midcorrelation is a robust measure that downweights the influence of outliers and is recommended when the data contain extreme values or deviate markedly from normality.
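The differences between these measures can be seen on simulated data. The sketch below uses base R only; xi_sketch and bicor_sketch are hypothetical helpers written for this illustration (the no-ties formula for Chatterjee's coefficient and the usual median/MAD-based definition of the biweight midcorrelation), not the code the package runs, and the simulated data are arbitrary.

```r
# Hypothetical helpers for illustration only; not the package's code.

# Chatterjee's xi, assuming no ties: order the pairs by x, rank y,
# and sum the absolute jumps between consecutive ranks.
xi_sketch <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

# Biweight midcorrelation: deviations from the median are weighted by
# (1 - u^2)^2, and the weight is 0 once |u| >= 1, where u is the
# deviation divided by 9 times the raw MAD. Assumes the MAD is non-zero.
bicor_sketch <- function(x, y) {
  weighted_dev <- function(v) {
    d <- v - median(v)
    u <- d / (9 * mad(v, constant = 1))
    w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)
    a <- d * w
    a / sqrt(sum(a^2))
  }
  sum(weighted_dev(x) * weighted_dev(y))
}

set.seed(1)
x <- runif(200, -3, 3)
y_mono <- exp(x)  # monotonic but non-linear in x
y_para <- x^2     # a function of x, but not monotonic

all_methods <- function(x, y) {
  c(
    pearson  = cor(x, y),
    spearman = cor(x, y, method = "spearman"),
    kendall  = cor(x, y, method = "kendall"),
    xi       = xi_sketch(x, y)
  )
}
mono <- all_methods(x, y_mono)
para <- all_methods(x, y_para)

# A linear relationship contaminated by a single extreme point.
set.seed(42)
x_out <- rnorm(100)
y_out <- x_out + rnorm(100, sd = 0.3)
x_out[1] <- 0
y_out[1] <- 50
outlier <- c(
  pearson = cor(x_out, y_out),
  bicor   = bicor_sketch(x_out, y_out)
)

print(round(rbind(mono, para), 3))
print(round(outlier, 3))
```

On the monotonic example the rank-based coefficients reach their maximum while Pearson's does not; on the parabola only xi stays large; and in the contaminated example the single extreme point pulls Pearson's coefficient down while the biweight midcorrelation gives that point zero weight.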

References

  • Chatterjee, S. (2021). A new coefficient of correlation. Journal of the American Statistical Association, 116(536):2009-2022.

  • Wilcox, R. (2012). Introduction to robust estimation and hypothesis testing (3rd ed.). Academic Press. (ISBN 978-0123869838).

Author

Christian L. Goueguel