Scikit-learn: Is sklearn.metrics.classification_report wrong?

Created on 1 Apr 2020  ·  3 comments  ·  Source: scikit-learn/scikit-learn

Describe the bug

sklearn.metrics.classification์€ ์ •๋ฐ€๋„ ๋ฐ ์žฌํ˜„์œจ์„ ์œ„ํ•ด ๋’ค์ง‘ํžŒ ๊ฐ’์„๋ณด๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

Steps/Code to Reproduce

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

def calc_precision_recall(conf_matrix, class_labels):

    # for each class
    for i in range(len(class_labels)):

        # calculate true positives
        true_positives = conf_matrix[i, i]

        # false positives
        false_positives = (conf_matrix[i, :].sum() - true_positives)

        # false negatives
        false_negatives = 0
        for j in range(len(class_labels)):
            false_negatives += conf_matrix[j, i]
        false_negatives -= true_positives

        # and finally true negatives
        true_negatives = (conf_matrix.sum() - false_positives - false_negatives - true_positives)

        # print calculated values
        print(
            "Class label", class_labels[i],
            "T_positive", true_positives,
            "F_positive", false_positives,
            "T_negative", true_negatives,
            "F_negative", false_negatives,
            "\nSensitivity/recall", true_positives / (true_positives + false_negatives),
            "Specificity", true_negatives / (true_negatives + false_positives),
            "Precision", true_positives/(true_positives+false_positives), "\n"
        )

    return

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, 0:3]  # we only take the first three features.
y = iris.target

# Random_state parameter is just a random seed that can be used to reproduce these specific results.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=27)

# Instantiate a K-Nearest Neighbors Classifier:
KNN_model = KNeighborsClassifier(n_neighbors=2)

# Fit the classifiers:
KNN_model.fit(X_train, y_train)

# Predict and store the prediction:
KNN_prediction = KNN_model.predict(X_test)

# Generate the confusion matrix
conf_matrix = confusion_matrix(KNN_prediction, y_test)

# Print the classification report
print(classification_report(KNN_prediction, y_test))

# Dummy class labels for the three iris classes
class_labels = [0,1,2]

# Own function to calculate precision and recall from the confusion matrix
calc_precision_recall(conf_matrix, class_labels)

์˜ˆ์ƒ ๊ฒฐ๊ณผ

For each class, my function returns:

ํด๋ž˜์Šค ๋ ˆ์ด๋ธ” 0 T_positive 7 F_positive 0 T_negative 23 F_negative 0
๋ฏผ๊ฐ๋„ / ํšŒ์ƒ ๋ ฅ 1.0 ํŠน์ด์„ฑ 1.0 ์ •๋ฐ€๋„ 1.0

ํด๋ž˜์Šค ๋ ˆ์ด๋ธ” 1 T_positive 11 F_positive 1 T_negative 18 F_negative 0
๋ฏผ๊ฐ๋„ / ํšŒ์ƒ ์œจ 1.0 ํŠน์ด๋„ 0.9473684210526315 ์ •๋ฐ€๋„ 0.9166666666666666

ํด๋ž˜์Šค ๋ ˆ์ด๋ธ” 2 T_positive 11 F_positive 0 T_negative 18 F_negative 1
๋ฏผ๊ฐ๋„ / ์žฌํ˜„์œจ 0.9166666666666666 ํŠน์ด์„ฑ 1.0 ์ •๋ฐ€๋„ 1.0

          precision    recall  

       0       1.00      1.00      
       1       0.92      1.00    
       2       1.00      0.92

๋‚ด ํ•จ์ˆ˜๋Š” ํ˜ผ๋™ ํ–‰๋ ฌ์ด ์ƒ๋‹จ x ์ถ•์˜ ์‹ค์ œ ๊ฐ’๊ณผ ์™ผ์ชฝ y ์ถ•์˜ ์˜ˆ์ธก ๊ฐ’์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ Wikipedia์—์„œ ์‚ฌ์šฉ ๋œ ๊ตฌ์กฐ์™€ ํ˜ผ๋™ ํ–‰๋ ฌ ํ•จ์ˆ˜์— ๋Œ€ํ•œ ๋ฌธ์„œ์—์„œ ์ฐธ์กฐ ๋œ ๊ตฌ์กฐ์™€ ๋™์ผํ•ฉ๋‹ˆ๋‹ค.

Actual Results

๋Œ€์กฐ์ ์œผ๋กœ ์ด๊ฒƒ์€ sklearn.metrics import classification_report์— ์˜ํ•ด๋ณด๊ณ  ๋œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.

           precision    recall  f1-score   support

       0       1.00      1.00      1.00         7
       1       1.00      0.92      0.96        12
       2       0.92      1.00      0.96        11

Versions

System:
    python: 3.8.1 (default, Jan 8 2020, 22:29:32) [GCC 7.3.0]
executable: /home/will/anaconda3/envs/ElStatLearn/bin/python
   machine: Linux-4.15.0-91-generic-x86_64-with-glibc2.10

Python dependencies:
       pip: 20.0.2
setuptools: 38.2.5
   sklearn: 0.22.1
     numpy: 1.18.1
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.1
matplotlib: 3.1.3
    joblib: 0.14.1

Built with OpenMP: True

Labels: triage, metrics

Most helpful comment

I believe y_test should come first in print(classification_report(KNN_prediction, y_test)).

So: print(classification_report(y_test, KNN_prediction)).

The first argument of sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False, zero_division='warn') is y_true. Passing the arguments in the reverse order swaps precision and recall.

See classification_report.

Edit: your confusion matrix call is also reversed, but it happens to work because sklearn's confusion matrix is itself transposed relative to Wikipedia's, so the two reversals cancel.

>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

1 ํ–‰์— 1 ๊ฐœ์˜ ๊ด€์ธก์น˜๊ฐ€ ์žˆ๊ณ  1 ์—ด์— 0์ด ์žˆ์œผ๋ฏ€๋กœ ํ–‰์€ ์‹ค์ธก ๊ฐ’์ด๊ณ  ์—ด์€ ์˜ˆ์ธก์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ confusion_matrix์— ํ‘œ์‹œ๋œ C[i, j] ํ‘œ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

All 3 comments


Thanks for the explanation. The Wikipedia reference was confusing!

No problem. You just have to translate the Wikipedia example into sklearn's orientation.
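Translating between the two conventions is just a transpose; a minimal sketch reusing the toy labels from the comment above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# toy labels from the comment above
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# sklearn convention: C[i, j] counts samples with true class i predicted as class j
C_sklearn = confusion_matrix(y_true, y_pred)

# The Wikipedia-style layout (predictions on rows, actual classes on columns)
# is simply the transpose, which is also what swapping the arguments yields.
C_wiki = C_sklearn.T
assert np.array_equal(C_wiki, confusion_matrix(y_pred, y_true))
print(C_sklearn)
```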

์ด ํŽ˜์ด์ง€๊ฐ€ ๋„์›€์ด ๋˜์—ˆ๋‚˜์š”?
0 / 5 - 0 ๋“ฑ๊ธ‰