Benchmark · Fairness

fairevaluate(classifiers, X, y; measures=nothing, measure=nothing, grp=:class, priv_grps=nothing, random_seed=12345, n_grps=6)

Performs a paired t-test for each pair of classifiers in classifiers and returns the p-values and t-statistics.
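As a rough illustration of what a paired t-test over per-fold scores involves, the sketch below computes the t statistic for two classifiers using only Julia's Statistics standard library. The score vectors and the helper name paired_tstat are hypothetical, and this is not necessarily how fairevaluate computes its statistics internally.

```julia
using Statistics

# Hypothetical per-fold scores for two classifiers (illustrative numbers only).
scores_a = [0.80, 0.82, 0.79, 0.85, 0.81, 0.83]
scores_b = [0.76, 0.80, 0.78, 0.80, 0.79, 0.80]

# Paired t statistic: mean of the per-fold differences divided by its standard error.
# `paired_tstat` is a hypothetical helper, not part of any package.
function paired_tstat(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    length(a) == length(b) || throw(ArgumentError("score vectors must have equal length"))
    d = a .- b                      # difference on each fold
    n = length(d)
    return mean(d) / (std(d) / sqrt(n))  # std uses the n-1 (sample) correction by default
end

t = paired_tstat(scores_a, scores_b)
println("t statistic: ", t)
```

Turning the t statistic into a p-value additionally requires the Student t distribution with n - 1 degrees of freedom, which is typically taken from a statistics package rather than written by hand.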

Arguments

classifiers: Array of classifiers to compare
X: DataFrame with features and protected attribute
y: Binary Target Variable
measures=nothing: The measures to be evaluated and used for the hypothesis tests. If this is not specified, the measure argument is used.
measure=nothing: The performance/fairness measure used to perform the hypothesis tests. If no value is passed for measure, Disparate Impact is used by default.
grp=:class: Protected Attribute Name
priv_grps=nothing: If the default measure (Disparate Impact) is used, pass an array of the groups that are privileged in the dataset.
random_seed=12345: Random seed to ensure reproducibility
n_grps=6: Number of folds for cross validation

Returns

A dictionary with the following keys and values is returned:

measures: names of the measures
classifier_names: names of the classifiers. If a pipeline is used, it will show pipeline and associated number.
results: 3-dimensional array with the evaluation results. Its size is measures x classifiers x fold_number.
pvalues: 3-dimensional array with the p-values for each pair of classifiers. Its size is measures x classifiers x classifiers.
tstats: 3-dimensional array with the t-statistics for each pair of classifiers. Its size is measures x classifiers x classifiers.
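To make the array shapes concrete, here is a small sketch that builds a mock dictionary with the layout described above and indexes into it. The use of string keys, the placeholder names, and the zero/one fill values are assumptions made for illustration; a real call to fairevaluate is not shown here.

```julia
# Mock of the returned dictionary; sizes follow the description above.
# String keys and all values below are assumptions for illustration only.
n_measures, n_classifiers, n_folds = 2, 3, 6

result = Dict(
    "measures"         => ["measure_1", "measure_2"],
    "classifier_names" => ["clf_1", "clf_2", "clf_3"],
    "results"          => zeros(n_measures, n_classifiers, n_folds),
    "pvalues"          => ones(n_measures, n_classifiers, n_classifiers),
    "tstats"           => zeros(n_measures, n_classifiers, n_classifiers),
)

# Per-fold scores of classifier 2 on measure 1.
fold_scores = result["results"][1, 2, :]

# p-value and t statistic for measure 1, comparing classifier 1 with classifier 3.
p = result["pvalues"][1, 1, 3]
t = result["tstats"][1, 1, 3]

println(result["measures"][1], ": ",
        result["classifier_names"][1], " vs ", result["classifier_names"][3],
        " (p = ", p, ", t = ", t, ")")
```

The first index always selects the measure; for pvalues and tstats the remaining two indices select the pair of classifiers being compared.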