Scikit-learn: MSE is negative when returned by cross_val_score

Created on 12 Sep 2013 · 58 comments · Source: scikit-learn/scikit-learn

The mean squared error returned by sklearn.cross_validation.cross_val_score is always negative. While this is a deliberate design decision, made so that the output of this function can be used for maximization given some hyperparameters, it is extremely confusing when using cross_val_score directly. At least I asked myself how the mean of a square could possibly be negative, and thought that cross_val_score was not working correctly or did not use the supplied metric. Only after digging into the sklearn source code did I realize that the sign was flipped.

This behavior is mentioned in make_scorer in scorer.py, but it is not mentioned in cross_val_score, and I think it should be.

API Bug Documentation

All 58 comments

You're referring to

greater_is_better : boolean, default=True

Whether score_func is a score function (default), meaning high is good, 
or a loss function, meaning low is good. In the latter case, the scorer 
object will sign-flip the outcome of the score_func.

in http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html? (just for reference)

I agree that this could be made clearer in the cross_val_score docs.

Thanks for reporting.
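To make the quoted greater_is_better behavior concrete, here is a minimal sketch. It assumes a scikit-learn installation that provides make_scorer, mean_squared_error, make_regression and Ridge; the synthetic data and variable names are illustrative only and are not part of the original report.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error

# Synthetic regression data, used only for illustration.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = Ridge().fit(X, y)

# The plain metric: a mean of squares, so it can never be negative.
raw_mse = mean_squared_error(y, model.predict(X))

# The same metric wrapped as a scorer and declared a loss.
loss_scorer = make_scorer(mean_squared_error, greater_is_better=False)
scored = loss_scorer(model, X, y)  # scorers are called as scorer(estimator, X, y)

print(raw_mse, scored)  # same magnitude, opposite sign
```

The scorer reports the negated metric, which is the value that cross_val_score passes on to the user.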

Indeed, we overlooked this issue when doing the Scorer refactoring. The following is very counter-intuitive:

>>> import numpy as np
>>> from sklearn.datasets import load_boston
>>> from sklearn.linear_model import RidgeCV
>>> from sklearn.cross_validation import cross_val_score

>>> boston = load_boston()
>>> np.mean(cross_val_score(RidgeCV(), boston.data, boston.target, scoring='mean_squared_error'))
-154.53681864311497

/cc @larsmans

BTW, I don't agree that it's a documentation issue. cross_val_score should return the value with the sign that matches the scoring name. Ideally, GridSearchCV(*params).fit(X, y).best_score_ should be consistent too. Otherwise the API is very confusing.

I also agree that changing it to return the actual MSE, without the sign switched, would be the better option.

The scorer object could just store the greater_is_better flag, and whenever the scorer is used the sign could be flipped where needed (e.g. in GridSearchCV).

I agree that we have a usability issue here, but I don't fully agree with @ogrisel's solution that we should

return the value with the sign that matches the scoring name

because that's an unreliable hack in the long run. What if someone defines a custom scorer with a name such as mse? What if they do follow the naming pattern but wrap the scorer in a decorator that changes the name?

The scorer object could just store the greater_is_better flag, and whenever the scorer is used the sign could be flipped where needed (e.g. in GridSearchCV).

This is what scorers originally did during development between the 0.13 and 0.14 releases, and it made their definition a lot harder. It also made the code hard to follow, because the greater_is_better attribute seemed to disappear in the scorer code, only to reappear in the middle of the grid search code. A special Scorer class was needed to do something that, ideally, a simple function would do.

I believe that if we want to optimize scores, then they should be _maximized_. For the sake of user-friendliness, I think we might introduce a parameter score_is_loss ∈ ["auto", True, False] that only changes the _display_ of scores and can use a heuristic based on the built-in names.

That was a hurried response because I had to get off the train. What I meant by "display" is really the return value of cross_val_score. I think scorers should be simple and uniform, and the algorithms should always maximize.

This does introduce an asymmetry between built-in and custom scorers.

Ping @GaelVaroquaux.

I like the score_is_loss solution, or something to that effect. Changing the sign to match the scoring name seems hard to maintain and could cause problems, as @larsmans mentioned.

What's the conclusion? Which solution should we go for? :)

@tdomhan @jaquesgrobler @larsmans Do you know if this applies to r2 as well? I am noticing that the r2 scores returned by GridSearchCV are also mostly negative for ElasticNet, Lasso and Ridge.

R² can be either positive or negative; negative simply means your model is performing very poorly.

IIRC, @GaelVaroquaux was a proponent of returning a negative number when greater_is_better=False.

r2 is a score function (greater is better), so it should be positive if your model is any good, but it is one of the few performance metrics that can actually be negative, meaning worse than 0.
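As a small worked example of why a negative r2 is not a sign flip, the sketch below computes R² from its usual definition, 1 - SS_res / SS_tot, in plain Python. The helper name r_squared and the toy numbers are made up for illustration; this is not scikit-learn's implementation.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the standard coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]

perfect = r_squared(y_true, [1.0, 2.0, 3.0])   # exact predictions
constant = r_squared(y_true, [2.0, 2.0, 2.0])  # always predicts the mean
reversed_ = r_squared(y_true, [3.0, 2.0, 1.0])  # worse than predicting the mean

print(perfect, constant, reversed_)  # 1.0 0.0 -3.0
```

A model that does worse than always predicting the mean gets a score below zero without any negation being applied.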

What is the consensus on this issue? In my opinion, cross_val_score is an evaluation tool, not a model selection tool. It should therefore return the original values.

I can fix it in PR #2759, since the changes there make it easy to fix. The trick is to not flip the sign up front but, instead, to access the greater_is_better attribute on the scorer when doing grid search.
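The idea of keeping raw scores and consulting greater_is_better only at selection time could look roughly like the sketch below. FlaggedScorer and best_index are hypothetical names invented here; this is not the code from PR #2759 or from scikit-learn.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class FlaggedScorer:
    """Hypothetical scorer: returns the raw metric and remembers its direction."""
    metric: Callable[[Sequence[float], Sequence[float]], float]
    greater_is_better: bool = True

    def __call__(self, y_true, y_pred):
        return self.metric(y_true, y_pred)  # no sign flip here


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def best_index(scorer, scores):
    """Pick the best candidate; the flag is consulted only at this point."""
    sign = 1.0 if scorer.greater_is_better else -1.0
    return max(range(len(scores)), key=lambda i: sign * scores[i])


mse_scorer = FlaggedScorer(mean_squared_error, greater_is_better=False)
y_true = [1.0, 2.0, 3.0]
candidates = [[1.0, 2.0, 5.0], [1.5, 2.0, 3.0], [0.0, 0.0, 0.0]]

scores = [mse_scorer(y_true, pred) for pred in candidates]  # all non-negative
best = best_index(mse_scorer, scores)  # index of the smallest MSE
print(scores, best)
```

The reported scores stay in their natural units, at the cost of every selection routine having to know about the flag, which is the maintenance concern raised earlier in the thread.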

What is the consensus on this issue? In my opinion, cross_val_score is an evaluation tool, not a model selection tool. It should therefore return the original values.

Special cases and varying behaviors are a source of problems in software.

I simply think that we should rename "mse" to "negated_mse" in the list of acceptable scoring strings.

What if someone defines a custom scorer with a name such as mse? What if they do follow the naming pattern but wrap the scorer in a decorator that changes the name?

I don't think @ogrisel was suggesting to use name matching, just to be consistent with the original metric. Correct me if I'm wrong, @ogrisel.

I simply think that we should rename "mse" to "negated_mse" in the list of acceptable scoring strings.

That's completely unintuitive if you don't know the internals of scikit-learn. If you have to bend the system like that, I think it's a sign that there is a design problem.

That's completely unintuitive if you don't know the internals of scikit-learn. If you have to bend the system like that, I think it's a sign that there is a design problem.

I disagree. Humans understand things with a lot of prior knowledge and context. They are anything but systematic. Trying to embed this in software gives a shopping-list-like set of special cases. Not only does it make the software hard to maintain, but those exceptions also cause surprising behavior and lead people to write buggy code using the library.

What special case do you have in mind?

To be clear, I think that the cross-validation scores stored in the GridSearchCV object should _also_ be the original values (not sign-flipped).

AFAIK, flipping the sign was introduced to make the grid search implementation a little simpler, but it was not supposed to affect usability.

What special case do you have in mind?

Well, the fact that for some metrics bigger is better, whereas for others it is the opposite.

AFAIK, flipping the sign was introduced to make the grid search implementation a little simpler, but it was not supposed to affect usability.

It's not about grid search, it's about separation of concerns: scores need to be usable without knowing anything about them, or else code to deal with their specificities will spread across the whole codebase. There is already a lot of scoring code.

But that's somewhat postponing the problem to user code. Nobody wants to plot "negated MSE", so users will have to flip signs back in their code. This is inconvenient, especially for multiple-metric cross-validation reports (PR #2759), as you need to handle each metric individually. I wonder if we can have the best of both worlds: generic code and intuitive results.
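The per-metric handling described here might look like the following in user code. This is only a sketch: the LOWER_IS_BETTER set, the unflip helper and the score values are invented for illustration and do not come from scikit-learn or from PR #2759.

```python
# Metrics for which lower is better, so their cross-validation scores
# come back negated (a hand-maintained, hypothetical list).
LOWER_IS_BETTER = {
    "mean_squared_error",
    "mean_absolute_error",
    "median_absolute_error",
    "log_loss",
}


def unflip(metric_name, cv_scores):
    """Return scores in the metric's natural units, ready for plotting."""
    if metric_name in LOWER_IS_BETTER:
        return [-s for s in cv_scores]
    return list(cv_scores)


# Made-up multi-metric results, shaped like per-fold score lists.
cv_results = {
    "r2": [0.62, 0.58, 0.71],
    "mean_squared_error": [-14.0, -16.5, -11.25],
}

plot_ready = {name: unflip(name, s) for name, s in cv_results.items()}
print(plot_ready)
```

Every reporting script needs its own copy of such a list, which is the inconvenience being pointed out.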

But that's somewhat postponing the problem to user code. Nobody wants to plot "negated MSE", so users will have to flip signs back in their code.

Certainly not the end of the world. Note that when reading papers or looking at presentations I have the same problem: when the graph is not well done, I lose a little time and mental bandwidth trying to figure out whether bigger is better or not.

This is inconvenient, especially for multiple-metric cross-validation reports (PR #2759), as you need to handle each metric individually.

Why? If you just accept that bigger is always better, everything gets easier, including the interpretation of results.

I wonder if we can have the best of both worlds: generic code and intuitive results.

The risk is ending up with very complex code that slows down maintenance and development. Scikit-learn is picking up weight.

If you just accept that bigger is always better

That's what she said :)

More seriously, I think one reason this is confusing people is that the output of cross_val_score is not consistent with the metrics. If we follow your logic, all metrics in sklearn.metrics should follow "bigger is better".

That's what she said :)

Nice one!

More seriously, I think one reason this is confusing people is that the output of cross_val_score is not consistent with the metrics. If we follow your logic, all metrics in sklearn.metrics should follow "bigger is better".

Agreed. That's why I like the idea of changing the name: it would jump out at people.

More seriously, I think one reason this is confusing people is that the output of cross_val_score is not consistent with the metrics.

And this in turn makes scoring seem more mysterious than it is.

Got bitten by this today in 0.16.1 when trying to do linear regression. While the sign of the score is apparently no longer flipped for classifiers, it is still flipped for linear regression. To add to the confusion, LinearRegression.score() returns a non-flipped version of the score.

I'd suggest making it all consistent and returning the non-sign-flipped score for linear models as well.

Example:

from sklearn import linear_model
from sklearn.naive_bayes import GaussianNB
from sklearn import cross_validation
from sklearn import datasets
iris = datasets.load_iris()
nb = GaussianNB()
scores = cross_validation.cross_val_score(nb, iris.data, iris.target)
print("NB score:\t  %0.3f" % scores.mean() )

iris_reg_data = iris.data[:,:3]
iris_reg_target = iris.data[:,3]
lr = linear_model.LinearRegression()
scores = cross_validation.cross_val_score(lr, iris_reg_data, iris_reg_target)
print("LR score:\t %0.3f" % scores.mean() )

lrf = lr.fit(iris_reg_data, iris_reg_target)
score = lrf.score(iris_reg_data, iris_reg_target)
print("LR.score():\t  %0.3f" % score )

This gives:

NB score:     0.934    # sign is not flipped
LR score:    -0.755    # sign is flipped
LR.score():   0.938    # sign is not flipped

Cross-validation flips the sign for all models where greater is better. I still disagree with this decision. I think the main proponents of it were @GaelVaroquaux and maybe @mblondel.

Oh, never mind, all the discussion is above.
I felt that flipping the sign by default for mse and r2 is even less intuitive :-/

@Huitzilo GaussianNB is a classifier and uses accuracy as its default scorer. LinearRegression is a regressor and uses the r2 score as its default scorer. The second score is negative, but remember that the r2 score can be negative. Also, iris is a multiclass dataset, so the targets are categorical. You can't use a regressor.

Right, I was a bit confused about what was going on. r2 is not flipped... only mse would be.

I think the solution to the whole issue would be to rename the thing negmse.

@mblondel Of course you're right, sorry. I was quickly banging together an example for regression, and in my overconfidence in the iris data I thought that predicting feature #4 from the others would work (with positive R2). But it didn't, hence the negative R2. No sign flipping here. OK. My bad.

Still, the sign is flipped in the MSE obtained from cross_val_score.

Maybe it's just me, but I find this inconsistency immensely confusing. Why should MSE be sign-flipped and not R2?

Maybe it's just me, but I find this inconsistency immensely confusing. Why should MSE be sign-flipped and not R2?

Because the semantics of a score is that higher is better. High MSE is bad.

Maybe negmse would solve the problem.

@amueller I agree, making the sign flip explicit in the name of the scoring parameter would definitely help avoid confusion.

Maybe the documentation at [1] could also be much clearer about how signs get flipped for which scores. In my case, I needed the information quickly and only looked at the table under 3.1.1.1, but did not read the text (which explains the "bigger is better" principle). IMHO, adding a note for mse, median and mean absolute error in the table under 3.1.1.1, indicating their negation, would already help a lot, without any changes to the actual code.

[1] http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

I came across a very interesting case:

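from sklearn.cross_validation import cross_val_score
from sklearn.linear_model import LinearRegression  # import missing from the original snippet

# X and target are the poster's own data (not shown in the thread)
model = LinearRegression()
scores = cross_val_score(model, X, target, cv=2, scoring='r2')
scores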

Results in

array([-0.17026282, -2.21315179])

For the same dataset, the following code

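from sklearn.metrics import r2_score  # import missing from the original snippet

model = LinearRegression()
model.fit(X, target)
prediction = model.predict(X)  # predictions on the training data itself
print(r2_score(target, prediction))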

gives a reasonable value:

0.353035789318

AFAIK, for a linear regression model (with intercept) one cannot obtain R^2 > 1 or R^2 < 0.

Thus the CV result doesn't look like R^2 with a flipped sign. Am I wrong at some point?

r2 can be negative (for bad models). It cannot be larger than 1.

You are probably overfitting. Try:

from sklearn.cross_validation import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X_train, X_test, y_train, y_test = train_test_split(X, target, test_size=0.2, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
pred_train = model.predict(X_train)
print("train r2: %f" % r2_score(y_train, pred_train))

pred_test = model.predict(X_test)
print("test r2: %f" % r2_score(y_test, pred_test))

Try using different values for the random_state integer seed that controls the random split.

Maybe negmse would solve the problem.

+1 for 'neg_mse' (I think the underscore makes it more readable).

Does that solve all problems? Are there other scores where greater is not better?

There are:

  • log_loss
  • mean_absolute_error
  • median_absolute_error

According to doc/modules/model_evaluation.rst, that should be all of them.

And hinge_loss, I guess?

Adding the neg_ prefix to all those losses feels awkward.

An idea would be to return the original scores (without sign flip), but instead of returning an ndarray, return a class that extends ndarray with methods like best(), arg_best(), best_sorted().

There is no scorer for hinge loss (and I've never seen it being used for evaluation).

The scorer doesn't return a numpy array, it returns a float.
We could return a score object that has a custom ">" but looks like a float.
That feels more contrived to me than the previous solution, which was tagging the scorer with a bool "lower_is_better" that was then used in GridSearchCV.

cross_val_score returns an array.

Actually, the scores returned by cross_val_score usually don't need to be sorted, just averaged.

Another idea is to add a sorted method to _BaseScorer.

my_scorer = make_scorer(my_metric, greater_is_better=False)
scores = my_scorer.sorted(scores)  # takes into account my_scorer._sign
best = scores[0]

cross_val_score returns an array, but the scorers return a float. I feel it would be strange to have specific logic in cross_val_score, because you'd want the same behavior in GridSearchCV and in all other CV objects.

You'd also need an argsort method, because in GridSearchCV you want the best score and the best index.

How can "estimate the means and variances of the workers' errors from the control questions, then compute the weighted average after removing the estimated bias for the predictions" be implemented with scikit-learn?

IIRC we discussed this at the sprint (last summer?!) and decided to go with neg_mse (or was it neg-mse) and deprecate all scorers/strings that currently carry a negative sign.
Is this still the consensus? We should do that before 0.18 then.
Ping @GaelVaroquaux @agramfort @jnothman @ogrisel @raghavrv

Yes, we agreed on neg_mse AFAIK.

neg_mse

We also need:

  • neg_log_loss
  • neg_mean_absolute_error
  • neg_median_absolute_error
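For readers on a release where this renaming has landed, usage might look like the sketch below. It assumes a scikit-learn version that ships sklearn.model_selection and accepts the scoring string 'neg_mean_squared_error' (the spelling used further down in this thread); the synthetic data is illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data, used only for illustration.
X, y = make_regression(n_samples=120, n_features=4, noise=5.0, random_state=0)

# The "neg_" prefix states the sign convention in the name: higher is better.
neg_mse = cross_val_score(Ridge(), X, y, scoring="neg_mean_squared_error", cv=3)

# Negate once to get back the ordinary, non-negative MSE for reporting.
mse = -neg_mse

print(neg_mse)
print(mse.mean())
```

The values are still negative, but the scoring name now says so explicitly.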

model = Sequential()
keras.layers.Flatten()
model.add(Dense(11, input_dim=3, kernel_initializer=keras.initializers.he_normal(seed=2),
                kernel_regularizer=regularizers.l2(2)))
keras.layers.LeakyReLU(alpha=0.1)
model.add(Dense(8, kernel_initializer=keras.initializers.he_normal(seed=2)))
keras.layers.LeakyReLU(alpha=0.1)
model.add(Dense(4, kernel_initializer=keras.initializers.he_normal(seed=2)))
keras.layers.LeakyReLU(alpha=0.1)
model.add(Dense(1, kernel_initializer=keras.initializers.he_normal(seed=2)))
keras.layers.LeakyReLU(alpha=0.2)
adag = RMSprop(lr=0.0002)
model.compile(loss=losses.mean_squared_error,
              optimizer=adag)
history = model.fit(X_train, Y_train, epochs=2000,
                    batch_size=20, shuffle=True)

How can I cross-validate the above code? I would like to use the leave-one-out cross-validation method for this.

@shreyassks This is not the right place for your question, but I would check out https://keras.io/scikit-learn-api. Wrap your network in a scikit-learn estimator, then use it with model_selection.cross_val_score.

Yes, I totally agree! This also happened with Brier_score_loss: using Brier_score_loss directly works perfectly fine, but it gets confusing when it comes from GridSearchCV, where a negative Brier_score_loss is returned. At least, since Brier_score_loss is a loss (the lower the better), the scoring function here flips the sign to make it negative.

The idea is that cross_val_score should focus entirely on the absolute value of the result. To my knowledge, the significance of the negative sign (-) obtained for MSE (mean squared error) in cross_val_score is not predefined. Let's wait for an updated version of sklearn where this issue is addressed.

For a regression use case:
model_score = cross_val_score(model, df_input, df_target, scoring='neg_mean_squared_error', cv=3)
I am getting the following values:

SVR :
[-6.20938025 -1.397376 -1.94519]
-3.183982080147279

Linear Regression:
[-5.94898085 -9.30931808 -1.15760676]
-5.4719685646934275

Lasso:
[-7.22363814 -10.47734135 -2.20807684]
-6.6363521107522345

Ridge:
[-5.95990385 -4.17946756 -1.36885809]
-3.8360764993832004

So which one is best?
SVR?
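Under the "higher is better" convention discussed above, the numbers in this post can be compared directly. The sketch below copies the per-fold values as printed in the post and picks the model with the largest mean negated MSE, which is the one with the smallest mean MSE. It says nothing about whether the differences between models are statistically meaningful.

```python
from statistics import mean

# Per-fold neg_mean_squared_error values as printed in the post above.
cv_scores = {
    "SVR": [-6.20938025, -1.397376, -1.94519],
    "LinearRegression": [-5.94898085, -9.30931808, -1.15760676],
    "Lasso": [-7.22363814, -10.47734135, -2.20807684],
    "Ridge": [-5.95990385, -4.17946756, -1.36885809],
}

mean_scores = {name: mean(folds) for name, folds in cv_scores.items()}

# Scores follow "higher is better", so the best model has the largest value,
# i.e. the negated MSE closest to zero.
best_model = max(mean_scores, key=mean_scores.get)

# Flip the sign once to report ordinary mean squared errors.
mean_mse = {name: -score for name, score in mean_scores.items()}

print(best_model)
print(mean_mse)
```

With these values the largest mean score belongs to SVR, followed by Ridge.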

For a regression use case:
I get different results when I use
(1) "cross_val_score" with scoring 'neg_mean_squared_error'
and
(2) "GridSearchCV" and check 'best_score_', for the same inputs.

For regression models, which one is better?

  • "cross_val_score" with scoring = 'neg_mean_squared_error'
    (or)
  • "GridSearchCV" and checking 'best_score_'

You are asking a usage question. The issue tracker is mainly for bugs and new features. For usage questions, it is recommended to try Stack Overflow or the mailing list.
