Evalml: Suppress sklearn UndefinedMetric warnings in stdout (F1 score)

Created on March 2, 2020  ·  6 comments  ·  Source: alteryx/evalml

While training models, I get warnings like this:

[Screenshot of the warning output, 2020-03-02 4:54:36 PM]

It can be triggered more specifically with a call like this:

from evalml.objectives import F1

f1 = F1()
f1.score(y_predicted=[0, 0],
         y_true=[0, 1])

In terms of how to handle this, here is what comes to mind:

  1. Silence it completely during the automl search process. Change the score to nan or to the worst possible score for the metric. We can then store the error message somewhere in the results dictionary.
  2. Make a cleaner warning message that does not take up multiple lines.
enhancement

All 6 comments

Yep, we should fix this. In general, we shouldn't allow anything to be printed to stdout unless our own code does so.

I wonder if this is related to #311.

I think we should do both of those suggestions: suppress this sklearn stdout output and, if possible, write our own warning message.
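A rough sketch of how capturing the sklearn warning and keeping its text could look, assuming the message reaches the user through Python's standard `warnings` machinery. The helper name `quiet_precision` is hypothetical and not part of evalml; it uses `precision_score` because that call reliably produces the warning for this kind of input. Note that `catch_warnings(record=True)` holds back every warning raised inside the block, not only sklearn's.

```python
import warnings

from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_score


def quiet_precision(y_true, y_pred):
    """Return (score, messages) without letting sklearn print anything.

    Hypothetical helper for illustration only; not an evalml API.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Make sure the warning is recorded even if it was raised earlier.
        warnings.simplefilter("always", category=UndefinedMetricWarning)
        score = precision_score(y_true, y_pred)
    # Keep only the text of the undefined-metric warnings.
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, UndefinedMetricWarning)
    ]
    return score, messages


score, messages = quiet_precision([0, 0, 0, 0, 1], [0, 0, 0, 0, 0])
# One short line instead of sklearn's multi-line output.
summary = f"precision={score} ({len(messages)} sklearn warning(s) captured)"
print(summary)
```

The captured `messages` list is the kind of thing that could be stored in a results dictionary, while `summary` stands in for a cleaner single-line message.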

@christopherbunn RE: mentioned this in the meeting.

@christopherbunn @jeremyliweishih I read up on this some more, and I'd like to fix it here by setting zero_division=0.0 for all precision and f1 objectives (binary and multiclass).

Explanation
Precision is n_true_pos / (n_true_pos + n_false_pos). So if a model never predicts a particular label on the data split in question, there are no true or false positives for that label, and a divide-by-zero results. The same applies to f1, which is computed from precision and recall.
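To make the zero denominator concrete, here is a small hand-rolled version of the formula in plain Python. It is only an illustration of the arithmetic, not how sklearn or evalml implement it; the function names and the `zero_division` fallback argument are assumptions chosen to mirror the sklearn parameter.

```python
def positive_counts(y_true, y_pred, label=1):
    """Count true positives and false positives for one label."""
    n_true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    n_false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    return n_true_pos, n_false_pos


def precision(y_true, y_pred, label=1, zero_division=0.0):
    """Precision for one label, falling back to zero_division when undefined."""
    n_true_pos, n_false_pos = positive_counts(y_true, y_pred, label)
    denominator = n_true_pos + n_false_pos
    if denominator == 0:
        # The label was never predicted, so the ratio is undefined.
        return zero_division
    return n_true_pos / denominator


# The model never predicts label 1, so both counts are zero.
print(positive_counts([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))
print(precision([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))
```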

Argument
Let's assume we're doing a good job of keeping the classes balanced (we currently aren't, but that's a separate topic: #194 #457). Then it's unlikely that the training or validation split will contain few to no instances of a particular class. If we can assume that, and the model still makes no predictions for a particular label, I'd argue it's a bad model. Therefore we should assign the lowest possible score, which is 0 for both precision and f1.

Sound good?

Example

In [38]: import numpy as np

In [39]: import sklearn.metrics

In [40]: y_true = np.array([0, 0, 0, 0, 1])

In [41]: y_pred = np.array([0, 0, 0, 0, 0])

In [42]: sklearn.metrics.precision_score(y_true, y_pred)
/Users/dylan.sherry/.pyenv/versions/3.8.2/envs/evalml/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[42]: 0.0

In [43]: sklearn.metrics.precision_score(y_true, y_pred, zero_division=0.0)
Out[43]: 0.0

@dsherry makes sense to me. I think that's what we're doing now, but with a warning (the default is "warn", which sets the score to 0 and also emits a warning).
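A quick way to check that claim, sketched under the assumption of a scikit-learn version that has the `zero_division` parameter: score the same input with the default and with `zero_division=0.0`, and count the undefined-metric warnings each call raises. The helper `score_and_warning_count` is a hypothetical name used only for this comparison.

```python
import warnings

from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_score

y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]


def score_and_warning_count(**kwargs):
    """Return the precision score and how many undefined-metric warnings fired."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        score = precision_score(y_true, y_pred, **kwargs)
    undefined = [w for w in caught if issubclass(w.category, UndefinedMetricWarning)]
    return score, len(undefined)


default_score, default_warnings = score_and_warning_count()
explicit_score, explicit_warnings = score_and_warning_count(zero_division=0.0)

# Same score either way; only the warning differs.
print(default_score, default_warnings)
print(explicit_score, explicit_warnings)
```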

@jeremyliweishih yep!

@christopherbunn
