Correlation Coefficients: Pearson, Spearman, Kendall, Chatterjee, and Biweight Midcorrelation
Source: R/correlation.R
Computes correlation coefficients between a specified response variable and each of the remaining variables in a data frame or tibble. The available methods are Pearson's product-moment correlation (parametric), Spearman's rank correlation and Kendall's tau (both non-parametric), Chatterjee's new correlation coefficient, and the biweight midcorrelation (a robust measure).
Usage
correlation(
  x,
  var,
  method = "pearson",
  plot = FALSE,
  color = "#111D71",
  interactive = FALSE
)
Arguments
- x
A data frame or tibble containing the variables of interest.
- var
A character string specifying the name of the response variable.
- method
A character string indicating the correlation method to use. Allowed values are "pearson", "spearman", "kendall", "chatterjee", or "bicor" (for biweight midcorrelation). The default is "pearson".
- plot
A logical value indicating whether to produce a visualization of the correlations. Default is FALSE (no plot).
- color
A character string specifying the color to use for the plot. Default is "#111D71".
- interactive
A logical value indicating whether to create an interactive plot using plotly. Default is FALSE (static ggplot2 plot).
Value
A list containing:
- correlation
A tibble with columns for the variable name, correlation value, and method used.
- plot
If plot = TRUE, a ggplot2 object (or a plotly object if interactive = TRUE).
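To make the shape of the returned table concrete, here is a minimal base R sketch that correlates one response column with every other column. It is not the package's implementation: the helper name cor_with_response and the column names variable, correlation, and method are assumptions chosen for illustration, it returns a plain data frame rather than a tibble, and it covers only the three methods that stats::cor() supports.

```r
# Hypothetical helper (not the package's code): correlate one response
# column with every other column of a numeric data frame via stats::cor().
cor_with_response <- function(x, var, method = "pearson") {
  predictors <- setdiff(names(x), var)   # every column except the response
  values <- vapply(
    predictors,
    function(nm) cor(x[[nm]], x[[var]], method = method),
    numeric(1)
  )
  # Column names here are assumed, not taken from the package.
  data.frame(
    variable = predictors,
    correlation = unname(values),
    method = method,
    stringsAsFactors = FALSE
  )
}

# mtcars ships with base R; "mpg" plays the role of the response variable.
res <- cor_with_response(mtcars, "mpg", method = "spearman")
head(res[order(-abs(res$correlation)), ], 3)
```

Sorting by absolute value, as in the last line, is one convenient way to see which variables are most strongly associated with the response regardless of sign.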
Details
The Pearson correlation coefficient measures the strength of the linear relationship between two continuous variables; inference based on it is most reliable when the data are approximately bivariate normal. The Spearman and Kendall correlations are non-parametric, rank-based measures of monotonic association, which makes them suitable for relationships that are monotonic but not linear and for data that deviate from normality. The Chatterjee correlation coefficient is a more recently proposed rank-based measure of how closely one variable is a function of the other, so it can detect non-monotonic relationships that the other coefficients miss; because it uses only ranks, it is also insensitive to heavy tails and outliers. Unlike the other coefficients, it is not symmetric in its two arguments and does not indicate the direction of the association. The biweight midcorrelation is a robust measure that downweights observations far from the median and is recommended when the data contain extreme values or deviate substantially from normality.
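The behaviour of the two less familiar measures can be illustrated with a short base R sketch. The functions chatterjee_xi and bicor_sketch below are hypothetical names written for this example, following the commonly cited definitions (the tie-aware form of Chatterjee's coefficient, and the biweight midcorrelation with an unscaled median absolute deviation); the package itself may rely on different implementations. The sketch assumes no ties in the predictor and a non-zero MAD.

```r
# Chatterjee's xi for predictor x and response y (sketch).
# Assumes no ties in x; ties in y are handled by the general formula.
chatterjee_xi <- function(x, y) {
  n <- length(x)
  y_sorted <- y[order(x)]                     # arrange y by increasing x
  r <- rank(y_sorted, ties.method = "max")    # r_i = #{j : y_j <= y_i}
  l <- rank(-y_sorted, ties.method = "max")   # l_i = #{j : y_j >= y_i}
  1 - n * sum(abs(diff(r))) / (2 * sum(l * (n - l)))
}

# Biweight midcorrelation (sketch), assuming a non-zero MAD for x and y.
bicor_sketch <- function(x, y) {
  weighted_dev <- function(v) {
    d <- v - median(v)
    u <- d / (9 * mad(v, constant = 1))       # scale by the raw MAD
    w <- (1 - u^2)^2 * (abs(u) < 1)           # zero weight beyond |u| >= 1
    d * w
  }
  a <- weighted_dev(x)
  b <- weighted_dev(y)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# A symmetric quadratic: no linear trend, but y is a function of x.
x_quad <- seq(-1, 1, length.out = 201)
y_quad <- x_quad^2
pearson_quad <- cor(x_quad, y_quad)           # close to zero
xi_quad <- chatterjee_xi(x_quad, y_quad)      # close to one

# A linear trend with a single gross outlier in the response.
x_out <- 1:20
y_out <- c(1:19, -100)
pearson_out <- cor(x_out, y_out)              # pulled negative by the outlier
bicor_out <- bicor_sketch(x_out, y_out)       # outlier receives zero weight
```

In the first data set Pearson's coefficient is essentially zero while xi is high, and in the second a single extreme value flips the sign of Pearson's coefficient while the biweight midcorrelation stays strongly positive.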