Scikit-learn: Is sklearn.metrics.classification_report incorrect?

Created on 1 Apr 2020 · 3 comments · Source: scikit-learn/scikit-learn

Describe the bug

Could sklearn.metrics.classification_report be reporting flipped values for precision and recall?

Steps/Code to Reproduce

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

def calc_precision_recall(conf_matrix, class_labels):

    # for each class
    for i in range(len(class_labels)):

        # calculate true positives
        true_positives =(conf_matrix[i, i])

        # false positives
        false_positives = (conf_matrix[i, :].sum() - true_positives)

        # false negatives
        false_negatives = 0
        for j in range(len(class_labels)):
            false_negatives += conf_matrix[j, i]
        false_negatives -= true_positives

        # and finally true negatives
        true_negatives= (conf_matrix.sum() - false_positives - false_negatives - true_positives)

        # print calculated values
        print(
            "Class label", class_labels[i],
            "T_positive", true_positives,
            "F_positive", false_positives,
            "T_negative", true_negatives,
            "F_negative", false_negatives,
            "\nSensitivity/recall", true_positives / (true_positives + false_negatives),
            "Specificity", true_negatives / (true_negatives + false_positives),
            "Precision", true_positives/(true_positives+false_positives), "\n"
        )

    return

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, 0:3]  # we only take the first three features.
y = iris.target

# Random_state parameter is just a random seed that can be used to reproduce these specific results.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=27)

# Instantiate a K-Nearest Neighbors Classifier:
KNN_model = KNeighborsClassifier(n_neighbors=2)

# Fit the classifiers:
KNN_model.fit(X_train, y_train)

# Predict and store the prediction:
KNN_prediction = KNN_model.predict(X_test)

# Generate the confusion matrix
conf_matrix = confusion_matrix(KNN_prediction, y_test)

# Print the classification report
print(classification_report(KNN_prediction, y_test))

# Dummy class labels for the three iris classes
class_labels = [0,1,2]

# Own function to calculate precision and recall from the confusion matrix
calc_precision_recall(conf_matrix, class_labels)

Expected Results

My function returns the following for each class:

Class label 0 T_positive 7 F_positive 0 T_negative 23 F_negative 0
Sensitivity/recall 1.0 Specificity 1.0 Precision 1.0

Class label 1 T_positive 11 F_positive 1 T_negative 18 F_negative 0
Sensitivity/recall 1.0 Specificity 0.9473684210526315 Precision 0.9166666666666666

Class label 2 T_positive 11 F_positive 0 T_negative 18 F_negative 1
Sensitivity/recall 0.9166666666666666 Specificity 1.0 Precision 1.0

          precision    recall  

       0       1.00      1.00      
       1       0.92      1.00    
       2       1.00      0.92

My function assumes the confusion matrix is laid out with the actual values along the top (x-axis, one column per class) and the predicted values down the left side (y-axis, one row per class). This is the layout used on Wikipedia, which is referenced in the documentation for the confusion matrix function.
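As a compact restatement of that assumption, here is a minimal sketch in plain Python. The matrix literal is not printed anywhere in this report; it is reconstructed from the per-class counts listed above, so treat it as an assumption. With rows as predictions and columns as actual values, a row sum gives TP + FP and a column sum gives TP + FN.

```python
# Confusion matrix reconstructed from the counts reported above (an assumption,
# the matrix itself was not printed). Layout assumed by calc_precision_recall:
# rows = predicted class, columns = actual class.
matrix = [
    [7, 0, 0],
    [0, 11, 1],
    [0, 0, 11],
]

n_classes = len(matrix)

# Row sums: how many samples were predicted as each class (TP + FP).
predicted_totals = [sum(row) for row in matrix]

# Column sums: how many samples actually belong to each class (TP + FN).
actual_totals = [sum(matrix[r][c] for r in range(n_classes)) for c in range(n_classes)]

precision = [matrix[i][i] / predicted_totals[i] for i in range(n_classes)]
recall = [matrix[i][i] / actual_totals[i] for i in range(n_classes)]

for label in range(n_classes):
    print(label, "precision", precision[label], "recall", recall[label])
```

Under this layout the sketch gives the same per-class precision and recall as the function output listed above.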

Actual Results

In contrast, these are the results reported by sklearn.metrics.classification_report:

           precision    recall  f1-score   support

       0       1.00      1.00      1.00         7
       1       1.00      0.92      0.96        12
       2       0.92      1.00      0.96        11

Versions

System:
python: 3.8.1 (default, Jan 8 2020, 22:29:32) [GCC 7.3.0]
executable: /home/will/anaconda3/envs/ElStatLearn/bin/python
machine: Linux-4.15.0-91-generic-x86_64-with-glibc2.10

Python dependencies:
pip: 20.0.2
setuptools: 38.2.5
sklearn: 0.22.1
numpy: 1.18.1
scipy: 1.4.1
Cython: None
pandas: 1.0.1
matplotlib: 3.1.3
joblib: 0.14.1

Built with OpenMP: True

triage metrics

Source

AntiDoctor

Most helpful comment

I think y_test should come first in print(classification_report(KNN_prediction, y_test)).

So: print(classification_report(y_test, KNN_prediction)).

The function sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False, zero_division='warn') takes y_true as its first argument. Passing the predictions first would flip precision and recall.
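To make the flip concrete, here is a minimal sketch. The label lists are made up for illustration and are not the iris split from the report. It uses precision_score and recall_score with average=None to get per-class values and checks that swapping the two arguments exchanges precision and recall.

```python
from sklearn.metrics import classification_report, precision_score, recall_score

# Hypothetical labels chosen for illustration (not the iris data from the issue).
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 2]

# Documented order: ground truth first, predictions second.
correct_precision = precision_score(y_true, y_pred, average=None)
correct_recall = recall_score(y_true, y_pred, average=None)

# Swapped order, as in the report: predictions are treated as ground truth.
swapped_precision = precision_score(y_pred, y_true, average=None)
swapped_recall = recall_score(y_pred, y_true, average=None)

print("correct precision:", correct_precision, "correct recall:", correct_recall)
print("swapped precision:", swapped_precision, "swapped recall:", swapped_recall)

# The full report with the documented argument order.
print(classification_report(y_true, y_pred))
```

Swapping the arguments turns every false positive into a false negative and vice versa, which is why the two columns trade places while the f1-score stays the same.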

See classification_report.

Edit: Your confusion matrix is backwards too, but it works out because sklearn's confusion matrix is transposed relative to Wikipedia's.

>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

You can see that there is 1 observation in row 1 and 0 in column 1, so the rows are the ground truth and the columns are the predictions. So you can use the C[i, j] notation shown in the confusion_matrix documentation.
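A minimal sketch of using that C[i, j] orientation to get per-class precision and recall follows. The labels are made up for illustration; the example above is not reused because class 1 is never predicted there, which would make its precision a division by zero.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels chosen for illustration (not the data from the issue).
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 2]

# C[i, j] counts samples whose true class is i and predicted class is j.
C = confusion_matrix(y_true, y_pred)
true_positives = np.diag(C)

# Row sums count the samples that truly belong to each class (TP + FN).
recall = true_positives / C.sum(axis=1)

# Column sums count the samples predicted as each class (TP + FP).
precision = true_positives / C.sum(axis=0)

print(C)
print("precision:", precision)
print("recall:", recall)

# Cross-check against scikit-learn's own per-class scores.
print(np.allclose(precision, precision_score(y_true, y_pred, average=None)))
print(np.allclose(recall, recall_score(y_true, y_pred, average=None)))

# Swapping the arguments yields the transposed matrix, which is the layout
# (rows = predictions) that the function in the report assumes.
print(np.array_equal(confusion_matrix(y_pred, y_true), C.T))
```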

ericbassett on 9 Apr 2020

👍2

Thank you very much for the help. The Wikipedia reference had confused me!

AntiDoctor on 14 Apr 2020

No problem. Maybe we should get Wikipedia to switch its example to the sklearn orientation.

ericbassett on 14 Apr 2020
