Log fairness classification metrics to Neptune

Train your model and run predictions

Let’s train a model on a synthetic problem and run predictions on the test data.

[2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression

TARGET_COLS = 'two_year_recid'
NUMERICAL_FEATURE_COLS = ['age',
                          'juv_fel_count','juv_misd_count','juv_other_count',
                          'priors_count','jail_time']
CATEGORICAL_FEATURE_COLS = ['sex','race',
                            'c_charge_degree']
FEATURE_NAMES = NUMERICAL_FEATURE_COLS+CATEGORICAL_FEATURE_COLS

data = pd.read_csv('../data/processed/compas-scores-two-years-processed.csv')

train, test = train_test_split(data, test_size=0.2, random_state=1234)

X_train, y_train = train[FEATURE_NAMES], train[TARGET_COLS]
X_test, y_test = test[FEATURE_NAMES], test[TARGET_COLS]

clf = LogisticRegression(random_state=1234)
clf.fit(X_train, y_train)

# predict probabilities for the positive class and threshold at 0.5
y_test_pred = clf.predict_proba(X_test)

test = test.copy()  # avoid SettingWithCopyWarning when adding columns to a slice
test['recid_prediction_score'] = y_test_pred[:, 1]
test['recid_prediction_class'] = (test['recid_prediction_score'] > 0.5).astype(int)

roc_auc = roc_auc_score(y_test, y_test_pred[:,1])

Instantiate Neptune

[ ]:
import neptune

neptune.init(project_qualified_name='USER_NAME/PROJECT_NAME')

Send all fairness classification metrics to Neptune

With just one function call you can log a lot of information.

Metrics:

  • true_positive_rate_difference
  • false_positive_rate_difference
  • false_omission_rate_difference
  • false_discovery_rate_difference
  • error_rate_difference
  • false_positive_rate_ratio
  • false_negative_rate_ratio
  • false_omission_rate_ratio
  • false_discovery_rate_ratio
  • error_rate_ratio
  • average_odds_difference
  • disparate_impact
  • statistical_parity_difference
  • equal_opportunity_difference
  • theil_index
  • between_group_theil_index
  • between_all_groups_theil_index
  • coefficient_of_variation
  • between_group_coefficient_of_variation
  • between_all_groups_coefficient_of_variation
  • generalized_entropy_index
  • between_group_generalized_entropy_index
  • between_all_groups_generalized_entropy_index
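To make a couple of these metrics concrete, here is a minimal hand computation on toy data. The group labels and predictions below are made up for illustration (they are not the COMPAS data), and prediction == 1 is treated as the favorable outcome here, unlike the notebook’s favorable_label=0:

```python
import pandas as pd

# Hypothetical predictions for a privileged and an unprivileged group.
df = pd.DataFrame({
    'group':      ['priv'] * 4 + ['unpriv'] * 4,
    'prediction': [1, 1, 0, 0,   1, 0, 0, 0],
})

# Selection rate = P(prediction == favorable outcome) per group.
rates = df.groupby('group')['prediction'].mean()

# statistical_parity_difference = rate(unprivileged) - rate(privileged)
spd = rates['unpriv'] - rates['priv']

# disparate_impact = rate(unprivileged) / rate(privileged)
di = rates['unpriv'] / rates['priv']

print(spd, di)  # -0.25  0.5
```

A statistical parity difference of 0 (or a disparate impact of 1) would mean both groups are selected at the same rate; log_fairness_classification_metrics computes these and the rest of the list in one call.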

Performance by group charts:

  • confusion matrix
  • TPR, TNR, FPR, FNR, PPV, NPV, FDR, FOR
  • ACC, error_rate, selection_rate, power
  • precision, recall
  • sensitivity, specificity
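The per-group rates behind those charts come from a confusion matrix computed separately for each group. A minimal sketch, using made-up labels and predictions rather than the COMPAS data:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions for two groups.
df = pd.DataFrame({
    'group':  ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'y_true': [1, 1, 0, 0,  1, 1, 0, 0],
    'y_pred': [1, 0, 0, 0,  1, 1, 1, 0],
})

rates = {}
for g, sub in df.groupby('group'):
    # labels=[0, 1] fixes the matrix layout so ravel() gives tn, fp, fn, tp
    tn, fp, fn, tp = confusion_matrix(
        sub['y_true'], sub['y_pred'], labels=[0, 1]
    ).ravel()
    rates[g] = {'TPR': tp / (tp + fn), 'FPR': fp / (fp + tn)}

print(rates)  # {'a': {'TPR': 0.5, 'FPR': 0.0}, 'b': {'TPR': 1.0, 'FPR': 0.5}}
```

Comparing these per-group rates side by side is exactly what the performance-by-group charts visualize.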
[2]:
from neptunecontrib.monitoring.fairness import log_fairness_classification_metrics

with neptune.create_experiment():
    neptune.log_metric('roc_auc',roc_auc)
    log_fairness_classification_metrics(test['two_year_recid'], test['recid_prediction_class'],
                                        test['recid_prediction_score'], test[['race']],
                                        favorable_label=0, unfavorable_label=1,
                                        privileged_groups={'race':[3]}, unprivileged_groups={'race':[1,2,4,5,6]})

Everything is now safely logged in Neptune. Check out this experiment.

fairness classification metrics
