Scikit-learn: Is sklearn.metrics.classification_report wrong?

Created on 1 Apr 2020  ·  3 comments  ·  Source: scikit-learn/scikit-learn

Describe the bug

sklearn.metrics.classification์€ ์ •๋ฐ€๋„ ๋ฐ ์žฌํ˜„์œจ์„ ์œ„ํ•ด ๋’ค์ง‘ํžŒ ๊ฐ’์„๋ณด๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

Steps/Code to Reproduce

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

def calc_precision_recall(conf_matrix, class_labels):

    # for each class
    for i in range(len(class_labels)):

        # calculate true positives
        true_positives = conf_matrix[i, i]

        # false positives
        false_positives = (conf_matrix[i, :].sum() - true_positives)

        # false negatives
        false_negatives = 0
        for j in range(len(class_labels)):
            false_negatives += conf_matrix[j, i]
        false_negatives -= true_positives

        # and finally true negatives
        true_negatives = (conf_matrix.sum() - false_positives - false_negatives - true_positives)

        # print calculated values
        print(
            "Class label", class_labels[i],
            "T_positive", true_positives,
            "F_positive", false_positives,
            "T_negative", true_negatives,
            "F_negative", false_negatives,
            "\nSensitivity/recall", true_positives / (true_positives + false_negatives),
            "Specificity", true_negatives / (true_negatives + false_positives),
            "Precision", true_positives/(true_positives+false_positives), "\n"
        )

    return

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, 0:3]  # we only take the first three features.
y = iris.target

# Random_state parameter is just a random seed that can be used to reproduce these specific results.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=27)

# Instantiate a K-Nearest Neighbors Classifier:
KNN_model = KNeighborsClassifier(n_neighbors=2)

# Fit the classifiers:
KNN_model.fit(X_train, y_train)

# Predict and store the prediction:
KNN_prediction = KNN_model.predict(X_test)

# Generate the confusion matrix
conf_matrix = confusion_matrix(KNN_prediction, y_test)

# Print the classification report
print(classification_report(KNN_prediction, y_test))

# Dummy class labels for the three iris classes
class_labels = [0,1,2]

# Own function to calculate precision and recall from the confusion matrix
calc_precision_recall(conf_matrix, class_labels)

์˜ˆ์ƒ ๊ฒฐ๊ณผ

For each class, my function returns:

ํด๋ž˜์Šค ๋ ˆ์ด๋ธ” 0 T_positive 7 F_positive 0 T_negative 23 F_negative 0
๋ฏผ๊ฐ๋„ / ํšŒ์ƒ ๋ ฅ 1.0 ํŠน์ด์„ฑ 1.0 ์ •๋ฐ€๋„ 1.0

ํด๋ž˜์Šค ๋ ˆ์ด๋ธ” 1 T_positive 11 F_positive 1 T_negative 18 F_negative 0
๋ฏผ๊ฐ๋„ / ํšŒ์ƒ ์œจ 1.0 ํŠน์ด๋„ 0.9473684210526315 ์ •๋ฐ€๋„ 0.9166666666666666

ํด๋ž˜์Šค ๋ ˆ์ด๋ธ” 2 T_positive 11 F_positive 0 T_negative 18 F_negative 1
๋ฏผ๊ฐ๋„ / ์žฌํ˜„์œจ 0.9166666666666666 ํŠน์ด์„ฑ 1.0 ์ •๋ฐ€๋„ 1.0

          precision    recall  

       0       1.00      1.00      
       1       0.92      1.00    
       2       1.00      0.92

๋‚ด ํ•จ์ˆ˜๋Š” ํ˜ผ๋™ ํ–‰๋ ฌ์ด ์ƒ๋‹จ x ์ถ•์˜ ์‹ค์ œ ๊ฐ’๊ณผ ์™ผ์ชฝ y ์ถ•์˜ ์˜ˆ์ธก ๊ฐ’์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ Wikipedia์—์„œ ์‚ฌ์šฉ ๋œ ๊ตฌ์กฐ์™€ ํ˜ผ๋™ ํ–‰๋ ฌ ํ•จ์ˆ˜์— ๋Œ€ํ•œ ๋ฌธ์„œ์—์„œ ์ฐธ์กฐ ๋œ ๊ตฌ์กฐ์™€ ๋™์ผํ•ฉ๋‹ˆ๋‹ค.

Actual Results

๋Œ€์กฐ์ ์œผ๋กœ ์ด๊ฒƒ์€ sklearn.metrics import classification_report์— ์˜ํ•ด๋ณด๊ณ  ๋œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.

           precision    recall  f1-score   support

       0       1.00      1.00      1.00         7
       1       1.00      0.92      0.96        12
       2       0.92      1.00      0.96        11

Versions

System:
    python: 3.8.1 (default, Jan 8 2020, 22:29:32) [GCC 7.3.0]
executable: /home/will/anaconda3/envs/ElStatLearn/bin/python
   machine: Linux-4.15.0-91-generic-x86_64-with-glibc2.10

Python dependencies:
       pip: 20.0.2
setuptools: 38.2.5
   sklearn: 0.22.1
     numpy: 1.18.1
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.1
matplotlib: 3.1.3
    joblib: 0.14.1

Built with OpenMP: True

Labels: triage, metrics

Most helpful comment

I believe y_test should come first in print(classification_report(KNN_prediction, y_test)).

So: print(classification_report(y_test, KNN_prediction)).

The first argument of sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False, zero_division='warn') is y_true. Passing the arguments in the reverse order swaps precision and recall.

See classification_report.

Edit: your confusion matrix call is also reversed, but it happens to work because sklearn's confusion matrix is itself transposed relative to Wikipedia's, so the two reversals cancel.

>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

1 ํ–‰์— 1 ๊ฐœ์˜ ๊ด€์ธก์น˜๊ฐ€ ์žˆ๊ณ  1 ์—ด์— 0์ด ์žˆ์œผ๋ฏ€๋กœ ํ–‰์€ ์‹ค์ธก ๊ฐ’์ด๊ณ  ์—ด์€ ์˜ˆ์ธก์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ confusion_matrix์— ํ‘œ์‹œ๋œ C[i, j] ํ‘œ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

All 3 comments


Thanks for the explanation. The Wikipedia reference was confusing!

No problem. You just have to translate the Wikipedia example into sklearn's orientation.
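Translating between the two conventions is just a transpose; a minimal sketch reusing the toy labels from the comment above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# toy labels from the comment above
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# sklearn convention: C[i, j] counts samples with true class i predicted as class j
C_sklearn = confusion_matrix(y_true, y_pred)

# The Wikipedia-style layout (predictions on rows, actual classes on columns)
# is simply the transpose, which is also what swapping the arguments yields.
C_wiki = C_sklearn.T
assert np.array_equal(C_wiki, confusion_matrix(y_pred, y_true))
print(C_sklearn)
```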

์ด ํŽ˜์ด์ง€๊ฐ€ ๋„์›€์ด ๋˜์—ˆ๋‚˜์š”?
0 / 5 - 0 ๋“ฑ๊ธ‰