Scikit-learn: sklearn.metrics.classification_report incorrect?

Created on 1 Apr 2020 · 3 comments · Source: scikit-learn/scikit-learn

Describe the bug

sklearn.metrics.classification_report may be reporting flipped values for precision and recall.

Steps/Code to Reproduce

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

def calc_precision_recall(conf_matrix, class_labels):

    # for each class
    for i in range(len(class_labels)):

        # calculate true positives
        true_positives =(conf_matrix[i, i])

        # false positives
        false_positives = (conf_matrix[i, :].sum() - true_positives)

        # false negatives
        false_negatives = 0
        for j in range(len(class_labels)):
            false_negatives += conf_matrix[j, i]
        false_negatives -= true_positives

        # and finally true negatives
        true_negatives= (conf_matrix.sum() - false_positives - false_negatives - true_positives)

        # print calculated values
        print(
            "Class label", class_labels[i],
            "T_positive", true_positives,
            "F_positive", false_positives,
            "T_negative", true_negatives,
            "F_negative", false_negatives,
            "\nSensitivity/recall", true_positives / (true_positives + false_negatives),
            "Specificity", true_negatives / (true_negatives + false_positives),
            "Precision", true_positives/(true_positives+false_positives), "\n"
        )

    return

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, 0:3]  # we only take the first three features.
y = iris.target

# Random_state parameter is just a random seed that can be used to reproduce these specific results.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=27)

# Instantiate a K-Nearest Neighbors Classifier:
KNN_model = KNeighborsClassifier(n_neighbors=2)

# Fit the classifier:
KNN_model.fit(X_train, y_train)

# Predict and store the prediction:
KNN_prediction = KNN_model.predict(X_test)

# Generate the confusion matrix
conf_matrix = confusion_matrix(KNN_prediction, y_test)

# Print the classification report
print(classification_report(KNN_prediction, y_test))

# Dummy class labels for the three iris classes
class_labels = [0,1,2]

# Own function to calculate precision and recall from the confusion matrix
calc_precision_recall(conf_matrix, class_labels)

Expected Results

My function returns the following for each class:

Class label 0 T_positive 7 F_positive 0 T_negative 23 F_negative 0
Sensitivity/recall 1.0 Specificity 1.0 Precision 1.0

Class label 1 T_positive 11 F_positive 1 T_negative 18 F_negative 0
Sensitivity/recall 1.0 Specificity 0.9473684210526315 Precision 0.9166666666666666

Class label 2 T_positive 11 F_positive 0 T_negative 18 F_negative 1
Sensitivity/recall 0.9166666666666666 Specificity 1.0 Precision 1.0

           precision    recall

       0       1.00      1.00
       1       0.92      1.00
       2       1.00      0.92

My function assumes the confusion matrix has actual classes along the columns (the top axis) and predicted classes along the rows (the left axis). This is the same layout used in Wikipedia, which is the reference given in the documentation for the confusion_matrix function.
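The two layouts are transposes of each other. A minimal sketch, using the small y_true/y_pred lists from the confusion_matrix documentation example rather than the iris split, showing that swapping the arguments to confusion_matrix produces the transposed matrix, i.e. predicted classes on the rows and actual classes on the columns:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Small illustrative labels (the confusion_matrix docstring example, not the iris data)
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# sklearn's layout: rows = true class, columns = predicted class
true_on_rows = confusion_matrix(y_true, y_pred)

# Passing the predictions first puts them on the rows instead:
# rows = predicted class, columns = true class
pred_on_rows = confusion_matrix(y_pred, y_true)

# Swapping the arguments simply transposes the matrix
assert np.array_equal(pred_on_rows, true_on_rows.T)
print(pred_on_rows)
```

This only holds because both calls use the same label set (by default, the union of the labels seen in both arguments).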

Actual Results

In contrast, these are the results reported by sklearn.metrics.classification_report:

           precision    recall  f1-score   support

       0       1.00      1.00      1.00         7
       1       1.00      0.92      0.96        12
       2       0.92      1.00      0.96        11

Versions

System:
python: 3.8.1 (default, Jan 8 2020, 22:29:32) [GCC 7.3.0]
executable: /home/will/anaconda3/envs/ElStatLearn/bin/python
machine: Linux-4.15.0-91-generic-x86_64-with-glibc2.10

Python dependencies:
pip: 20.0.2
setuptools: 38.2.5
sklearn: 0.22.1
numpy: 1.18.1
scipy: 1.4.1
Cython: None
pandas: 1.0.1
matplotlib: 3.1.3
joblib: 0.14.1

Built with OpenMP: True

AntiDoctor

I think that y_test should come first in print(classification_report(KNN_prediction, y_test)).

So: print(classification_report(y_test, KNN_prediction)).

The function sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False, zero_division='warn') takes y_true as its first argument, so passing the predictions first swaps precision and recall.

See the classification_report documentation.
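A minimal sketch of why the swap happens: per-class precision computed with the arguments reversed equals per-class recall computed in the correct order, and vice versa. It uses precision_score/recall_score with average=None (one score per class) on the small docstring example labels, not the iris split; classification_report derives its per-class columns from the same precision/recall computation. zero_division=0 is an assumption here, chosen only to silence the warning for class 1, which is never predicted.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# Correct order: ground truth first, predictions second
precision = precision_score(y_true, y_pred, average=None, zero_division=0)
recall = recall_score(y_true, y_pred, average=None, zero_division=0)

# Reversed order, as in classification_report(KNN_prediction, y_test)
precision_swapped = precision_score(y_pred, y_true, average=None, zero_division=0)
recall_swapped = recall_score(y_pred, y_true, average=None, zero_division=0)

# Reversing the arguments exchanges precision and recall, class by class
assert np.allclose(precision_swapped, recall)
assert np.allclose(recall_swapped, precision)
print("precision:", precision)
print("recall:   ", recall)
```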

Edit: your confusion matrix call is backwards too, but it works out: sklearn's confusion matrix is backwards from Wikipedia's, so passing the predictions first puts them on the rows, which is what your function assumes.

>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

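You can see that row 1 contains 1 observation while column 1 contains 0. Since y_true has one sample of class 1 and y_pred has none, the rows are ground truth and the columns are predictions. So you can use the C[i, j] notation shown in the confusion_matrix documentation, where C[i, j] is the number of observations known to be in group i and predicted to be in group j.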
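With rows as ground truth, per-class recall is the diagonal divided by the row sums and precision is the diagonal divided by the column sums. A minimal sketch on the same docstring example; multilabel_confusion_matrix is used here only as one way to cross-check the per-class TN/FP/FN/TP counts (it returns one [[tn, fp], [fn, tp]] block per class). Mapping a zero denominator to 0 is an assumption that mirrors zero_division=0.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix
from sklearn.metrics import precision_score, recall_score

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# C[i, j] = number of samples whose true class is i and predicted class is j
C = confusion_matrix(y_true, y_pred)
tp = np.diag(C)
fn = C.sum(axis=1) - tp   # rest of row i: true class i, predicted as another class
fp = C.sum(axis=0) - tp   # rest of column i: predicted as i, true class is another
tn = C.sum() - tp - fn - fp

# Cross-check against multilabel_confusion_matrix
mcm = multilabel_confusion_matrix(y_true, y_pred)
assert np.array_equal(mcm[:, 1, 1], tp)
assert np.array_equal(mcm[:, 0, 1], fp)
assert np.array_equal(mcm[:, 1, 0], fn)
assert np.array_equal(mcm[:, 0, 0], tn)

# Per-class metrics; where= leaves 0.0 when a denominator is 0 (class 1 is never predicted)
recall = np.divide(tp, tp + fn, out=np.zeros(len(tp)), where=(tp + fn) > 0)
precision = np.divide(tp, tp + fp, out=np.zeros(len(tp)), where=(tp + fp) > 0)
specificity = np.divide(tn, tn + fp, out=np.zeros(len(tn)), where=(tn + fp) > 0)

# These agree with sklearn when y_true is passed first
assert np.allclose(recall, recall_score(y_true, y_pred, average=None, zero_division=0))
assert np.allclose(precision, precision_score(y_true, y_pred, average=None, zero_division=0))
print("specificity:", specificity)
```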

ericbassett on 9 Apr 2020

Thank you so much for clarifying that; the Wikipedia reference had me confused!

AntiDoctor on 14 Apr 2020

No problem. Someone should probably get Wikipedia to switch its example to the sklearn orientation.

ericbassett on 14 Apr 2020
