The mean squared error returned by sklearn.cross_validation.cross_val_score is always negative. This is a deliberate design decision, so that the output of this function can be used for maximization over some hyperparameters, but it is extremely confusing when using cross_val_score directly. At the very least, I asked myself how the mean of a square could possibly be negative, and concluded that cross_val_score was not working correctly or was not using the supplied metric. Only after digging into the sklearn source code did I realize that the sign was flipped.
This behavior is documented for make_scorer in scorer.py, but it is not mentioned for cross_val_score, and I think it should be; otherwise people will think that cross_val_score is not working correctly.
You are referring to
greater_is_better : boolean, default=True
Whether score_func is a score function (default), meaning high is good,
or a loss function, meaning low is good. In the latter case, the scorer
object will sign-flip the outcome of the score_func.
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html
? (just for reference)
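A minimal sketch of the sign flip described in the make_scorer docstring quoted above, in plain Python. The helper make_sign_flipping_scorer is a hypothetical stand-in written for illustration; it is not scikit-learn's implementation.

```python
def mean_squared_error(y_true, y_pred):
    """Plain MSE: never negative, lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def make_sign_flipping_scorer(score_func, greater_is_better=True):
    """Hypothetical helper: negate losses so that larger always means better."""
    sign = 1 if greater_is_better else -1

    def scorer(y_true, y_pred):
        return sign * score_func(y_true, y_pred)

    return scorer


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3
mse_scorer = make_sign_flipping_scorer(mean_squared_error, greater_is_better=False)
flipped = mse_scorer(y_true, y_pred)  # same magnitude, negative sign

print(mse, flipped)
```

The metric itself is unchanged; only the wrapped scorer reports the negated value, which is what surprises people who read the result as a plain MSE.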
I agree that this could be made clearer in the cross_val_score docs.
Thanks for reporting.
Indeed, we overlooked this issue when doing the Scorer refactoring. The following is very counter-intuitive:
>>> import numpy as np
>>> from sklearn.datasets import load_boston
>>> from sklearn.linear_model import RidgeCV
>>> from sklearn.cross_validation import cross_val_score
>>> boston = load_boston()
>>> np.mean(cross_val_score(RidgeCV(), boston.data, boston.target, scoring='mean_squared_error'))
-154.53681864311497
/cc @larsmans
BTW, I don't agree that this is a documentation issue. cross_val_score should return the value with the sign that matches the scoring name. Ideally, GridSearchCV(*params).fit(X, y).best_score_ should be consistent too. Otherwise the API is very confusing.
I also agree that changing it to return the actual MSE, without switching the sign, would be the much better option.
The scorer object could just store the greater_is_better flag, and whenever the scorer is used the sign could be flipped if needed, e.g. in GridSearchCV.
I agree that we have a usability issue here, but I don't fully agree with @ogrisel's solution that we should
> return the value with the sign that matches the scoring name
because that is an unreliable hack in the long run. What if someone defines a custom scorer with a name such as mse? What if they do follow the naming pattern but wrap the scorer in a decorator that changes the name?
> The scorer object could just store the greater_is_better flag, and whenever the scorer is used the sign could be flipped if needed, e.g. in GridSearchCV.
This is what scorers originally did, during development between the 0.13 and 0.14 releases, and it made their definition a lot harder. It also made the code hard to follow, because the greater_is_better attribute seemed to disappear in the scorer code, only to reappear in the middle of the grid search code. A special Scorer class was needed to do something that, ideally, a simple function would do.
I believe that if we want to optimize scores, then they should be maximized. For the sake of user-friendliness, I think we might introduce a parameter score_is_loss ∈ ["auto", True, False] that only changes the _display_ of scores and can use a heuristic based on the built-in names.
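A rough sketch of how such a score_is_loss parameter might behave, assuming scores are always maximized internally (so losses are stored negated). The function display_score and the set KNOWN_LOSS_NAMES are hypothetical names invented for this illustration; nothing like them is confirmed to exist in scikit-learn.

```python
# Hypothetical list of built-in loss names used by the "auto" heuristic.
KNOWN_LOSS_NAMES = {"mean_squared_error", "mean_absolute_error", "log_loss"}


def display_score(internal_score, scoring_name, score_is_loss="auto"):
    """Convert an internally maximized score into the value shown to the user."""
    if score_is_loss == "auto":
        # Heuristic: only names we recognise are treated as losses.
        score_is_loss = scoring_name in KNOWN_LOSS_NAMES
    return -internal_score if score_is_loss else internal_score


builtin_loss = display_score(-154.5, "mean_squared_error")       # recognised loss
builtin_score = display_score(0.9, "accuracy")                    # left untouched
custom_auto = display_score(-2.0, "mse")                          # unknown name, not flipped
custom_explicit = display_score(-2.0, "mse", score_is_loss=True)  # user says it is a loss

print(builtin_loss, builtin_score, custom_auto, custom_explicit)
```

The custom "mse" case shows the weakness of a name-based heuristic: an unrecognised name is displayed unflipped unless the user sets the flag explicitly.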
That was a hurried response because I had to get off the train. What I meant by "display" is really the return value from cross_val_score. I think scorers should be simple and uniform, and the algorithms should always maximize.
This does introduce an asymmetry between built-in and custom scorers.
Ping @GaelVaroquaux.
I like the score_is_loss solution, or something to that effect. Changing the sign to match the scoring name seems hard to maintain and could cause problems, as @larsmans mentioned.
What's the conclusion? Which solution should we go for? :)
@tdomhan @jaquesgrobler @larsmans Do you know whether this applies to r2 as well? I'm noticing that the r2 scores returned by GridSearchCV are also mostly negative for ElasticNet, Lasso and Ridge.
R² can be either positive or negative; negative simply means that your model is performing very poorly.
IIRC, @GaelVaroquaux was a proponent of returning a negative number when greater_is_better=False.
r2 is a score function (greater is better), so it should be positive if your model is any good, but it is one of the few performance metrics that can actually be negative, meaning worse than 0.
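To make the point about negative R² concrete, here is a small worked example in plain Python using the usual definition R² = 1 - SS_res / SS_tot. The helper name and the toy numbers are assumptions chosen for illustration, not values from this thread.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]

r2_good = r2(y_true, [1.1, 1.9, 3.2])  # close predictions: near 1
r2_mean = r2(y_true, [2.0, 2.0, 2.0])  # always predicting the mean: exactly 0
r2_bad = r2(y_true, [3.0, 2.0, 1.0])   # worse than the mean: negative

print(r2_good, r2_mean, r2_bad)
```

A negative R² therefore needs no sign flip to explain it: the predictions simply have a larger squared error than a constant prediction of the mean would.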
What is the consensus on this issue? In my opinion, cross_val_score is an evaluation tool, not a model selection one. It should therefore return the original values.
I can fix this in my PR #2759, since the changes I made there make it really easy to fix. The trick is not to flip the sign up front but, instead, to access the greater_is_better attribute on the scorer when doing grid search.
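A minimal sketch of that idea, assuming the scorer returns the raw metric and only the selection step consults a greater_is_better attribute. The Scorer class and best_index function are hypothetical illustrations and are not the code from the PR.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Scorer:
    """Hypothetical scorer: returns the raw metric and remembers its direction."""
    score_func: Callable[[Sequence[float], Sequence[float]], float]
    greater_is_better: bool = True

    def __call__(self, y_true, y_pred):
        return self.score_func(y_true, y_pred)  # no sign flip here


def best_index(scores, scorer):
    """Selection step: the only place where the direction matters."""
    pick = max if scorer.greater_is_better else min
    return pick(range(len(scores)), key=lambda i: scores[i])


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


mse_scorer = Scorer(mse, greater_is_better=False)
raw_mse = mse_scorer([1.0, 2.0], [1.0, 4.0])  # (0 + 4) / 2, reported as is

# Pretend these are mean CV scores of three parameter settings.
best_by_mse = best_index([4.0, 1.5, 2.5], mse_scorer)            # lowest MSE wins
best_by_acc = best_index([0.70, 0.80, 0.95], Scorer(lambda a, b: 0.0))  # highest wins

print(raw_mse, best_by_mse, best_by_acc)
```

With this split, users always see positive MSE values, at the cost of every model selection routine having to check the flag.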
> What is the consensus on this issue? In my opinion, cross_val_score is an evaluation tool, not a model selection one. It should therefore return the original values.
Special cases and varying behaviors are a source of problems in software.
I simply think that we should rename "mse" to "negated_mse" in the list of acceptable scoring strings.
> What if someone defines a custom scorer with a name such as mse? What if they do follow the naming pattern but wrap the scorer in a decorator that changes the name?
What @ogrisel is asking for is consistency with the original metric.
> I simply think that we should rename "mse" to "negated_mse" in the list of acceptable scoring strings.
That is completely unintuitive if you don't know the internals of scikit-learn. If you have to bend the system like that, I think it is a sign that there is a design problem.
> That is completely unintuitive if you don't know the internals of scikit-learn. If you have to bend the system like that, I think it is a sign that there is a design problem.
I disagree. Humans understand things with a lot of prior knowledge and context. They are anything but systematic. Trying to embed this in software gives a shopping-list-like set of special cases. Not only does that make the software hard to maintain, it also means that people who do not have those exceptions in mind run into surprising behaviors and write buggy code using the library.
What special case do you have in mind?
To be clear, I think that the cross-validation scores stored in the GridSearchCV object should also be the original values (not with the sign flipped).
AFAIK, flipping the sign was introduced to make the grid search implementation a little simpler, but it was not supposed to affect usability.
> What special case do you have in mind?
Well, the fact that for some metrics bigger is better, whereas for others it is the opposite.
> AFAIK, flipping the sign was introduced to make the grid search implementation a little simpler, but it was not supposed to affect usability.
It's not about grid search, it's about separation of concerns: scores need to be usable without knowing anything about them, or else the code to deal with their specificities will spread across the whole codebase. We already have a lot of scoring code.
But that is somewhat postponing the problem to user code. Nobody wants to plot "negated MSE", so users will have to flip the signs back in their code. This is inconvenient, especially for multiple-metric cross-validation reports (PR #2759), as you need to handle each metric individually. I wonder if we could have the best of both worlds: generic code and intuitive results.
> But that is somewhat postponing the problem to user code. Nobody wants to plot "negated MSE", so users will have to flip the signs back in their code.
Certainly not the end of the world. Note that when reading papers or looking at presentations I have the same problem: when the graph is not well done, I lose a little bit of time and mental bandwidth trying to figure out whether bigger is better or not.
> This is inconvenient, especially for multiple-metric cross-validation reports (PR #2759), as you need to handle each metric individually.
Why? If you just accept that it is always bigger is better, it makes everything easier, including the interpretation of results.
> I wonder if we could have the best of both worlds: generic code and intuitive results.
The risk is ending up with very complex code that slows us down in maintenance and development. Scikit-learn is picking up weight.
> If you just accept that it is always bigger is better
That's what she said :)
More seriously, I think one reason this is confusing people is that the output of cross_val_score is not consistent with the metrics. If we follow your logic, all metrics in sklearn.metrics should follow "bigger is better".
> That's what she said :)
Nice one!
> More seriously, I think one reason this is confusing people is that the output of cross_val_score is not consistent with the metrics. If we follow your logic, all metrics in sklearn.metrics should follow "bigger is better".
Agreed. That's why I like the idea of changing the name: it would jump out at people.
> More seriously, I think one reason this is confusing people is that the output of cross_val_score is not consistent with the metrics.
And this in turn makes scoring seem more mysterious than it is.
I got bitten by this today in 0.16.1 when trying to do linear regression. While the sign of the score is apparently not flipped for classifiers, it is flipped for linear regression. To add to the confusion, LinearRegression.score() returns a non-flipped version of the score.
I'd suggest making it all consistent and returning the non-sign-flipped score for linear models as well.
Example:
from sklearn import linear_model
from sklearn.naive_bayes import GaussianNB
from sklearn import cross_validation
from sklearn import datasets
iris = datasets.load_iris()
nb = GaussianNB()
scores = cross_validation.cross_val_score(nb, iris.data, iris.target)
print("NB score:\t %0.3f" % scores.mean() )
iris_reg_data = iris.data[:,:3]
iris_reg_target = iris.data[:,3]
lr = linear_model.LinearRegression()
scores = cross_validation.cross_val_score(lr, iris_reg_data, iris_reg_target)
print("LR score:\t %0.3f" % scores.mean() )
lrf = lr.fit(iris_reg_data, iris_reg_target)
score = lrf.score(iris_reg_data, iris_reg_target)
print("LR.score():\t %0.3f" % score )
This gives:
NB score: 0.934 # sign is not flipped
LR score: -0.755 # sign is flipped
LR.score(): 0.938 # sign is not flipped
Cross-validation flips all signs of models where greater is better. I still disagree with this decision. I think the main proponents of it were @GaelVaroquaux and maybe @mblondel [I remember you refactoring the scorer code].
Oh, never mind, all the discussion is above.
I feel that flipping the sign by default for mse and r2 is even less intuitive :-/
@Huitzilo GaussianNB is a classifier and uses accuracy as its default scorer. LinearRegression is a regressor and uses the r2 score as its default scorer. The second score is negative, but recall that the r2 score can be negative. Furthermore, iris is a multiclass dataset, so the targets are categorical. You can't use a regressor.
Right, I was a bit confused about what happens. r2 is not flipped... only mse would be.
Maybe a solution to the whole problem would be to rename the thing negmse?
@mblondel of course you're right, sorry. I was hastily putting together an example for regression, and in my over-confidence in the iris data I thought that predicting feature #4 from the others would work out (with a positive R2). But it didn't, hence the negative R2. No sign flip here. OK. My bad.
Still, the sign is flipped in the MSEs I get from cross_val_score.
Maybe it's just me, but I find this inconsistency awfully confusing (which is what got me into this issue). Why should MSE be sign-flipped, but not R2?
> Maybe it's just me, but I find this inconsistency awfully confusing (which is what got me into this issue). Why should MSE be sign-flipped, but not R2?
Because the semantics of a score are "the higher the better". A high MSE is bad.
maybe negmse would solve the issue
@amueller I agree, making the sign flip explicit in the name of the scoring parameter would definitely help avoid confusion.
Maybe the documentation at [1] could also be a bit clearer about how the sign is flipped for some scores. In my case, I needed information quickly and only looked at the table under 3.1.1.1, but never read the text (which explains the "greater is better" principle). IMHO, adding a comment for mse, median and mean absolute error in the table under 3.1.1.1, indicating their negation, would already help a lot, without any changes to actual code.
[1] http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
I have come across a very interesting case:
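from sklearn.cross_validation import cross_val_score
from sklearn.linear_model import LinearRegression
# X and target are the poster's own data (not shown in the thread)
model = LinearRegression()
scores = cross_val_score(model, X, target, cv=2, scoring='r2')
scores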
The result is
array([-0.17026282, -2.21315179])
For the same dataset, the following code
from sklearn.metrics import r2_score
model = LinearRegression()
model.fit(X, target)
prediction = model.predict(X)
print(r2_score(target, prediction))
results in a reasonable value
0.353035789318
AFAIK, for a linear regression model (with intercept) one cannot obtain R^2 > 1 or R^2 < 0.
Thus, the cv result does not look like R^2 with a flipped sign. Am I wrong at some point?
r2 can be negative (for bad models). It cannot be larger than 1.
You are probably overfitting. Try:
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, target, test_size=0.2, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
pred_train = model.predict(X_train)
print("train r2: %f" % r2_score(y_train, pred_train))
pred_test = model.predict(X_test)
print("test r2: %f" % r2_score(y_test, pred_test))
Try different values for the random_state integer seed that controls the random split.
> maybe negmse would solve the issue
+1 for 'neg_mse' (I think the underscore makes it more readable).
Does that solve all the problems? Are there other scores for which bigger isn't better?
There are:
log_loss
mean_absolute_error
median_absolute_error
According to doc/modules/model_evaluation.rst, that should be all of them.
And hinge_loss, I guess?
Adding the neg_ prefix to all of those losses feels awkward.
An idea would be to return the original scores (without the sign flip), but instead of returning an ndarray, return a class that extends ndarray with methods like best(), arg_best(), best_sorted().
There is no scorer for hinge loss (and I have never seen it used for evaluation).
The scorer doesn't return a numpy array, it returns a float, right?
We could return a score object that has a custom ">" but looks like a float.
That feels more contrived to me than the previous solution, which was tagging the scorer with a bool "lower_is_better" that was then used in GridSearchCV.
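For readers wondering what a score object with a custom ">" could look like, here is a hedged sketch using a float subclass. The class name Score and its greater_is_better attribute are hypothetical; this is only an illustration of the proposal, not something scikit-learn provides.

```python
class Score(float):
    """Hypothetical float that also remembers whether larger values are better."""

    def __new__(cls, value, greater_is_better=True):
        obj = super().__new__(cls, value)
        obj.greater_is_better = greater_is_better
        return obj

    def __gt__(self, other):
        # Here "a > b" is read as "a is the better score".
        if self.greater_is_better:
            return float(self) > float(other)
        return float(self) < float(other)


low_mse = Score(1.5, greater_is_better=False)
high_mse = Score(4.0, greater_is_better=False)

low_is_better = low_mse > high_mse     # True: the smaller loss "wins"
acc_compare = Score(0.9) > Score(0.7)  # True: ordinary ordering for scores
still_a_float = low_mse + 1.0          # arithmetic returns a plain float

print(low_is_better, acc_compare, still_a_float)
```

Only ">" is overridden in this sketch, so "<", sorted() and arithmetic still follow the numeric value, which hints at why the approach was called contrived.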
cross_val_score returns an array.
Actually, the scores returned by cross_val_score usually don't need to be sorted, just averaged.
Another idea is to add a sorted method to _BaseScorer.
my_scorer = make_scorer(my_metric, greater_is_better=False)
scores = my_scorer.sorted(scores) # takes into account my_scorer._sign
best = scores[0]
cross_val_score returns an array, but the scorers return a float. I feel it would be odd to have specific logic in cross_val_score, because you would want the same behavior in GridSearchCV and in all other CV objects.
You would also need an argsort method, because in GridSearchCV you want the best score and the best index.
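A small sketch of what sign-aware sorted and argsort helpers might look like. SignedScorer is a hypothetical class made up for this example; only the _sign attribute name is borrowed from the snippet earlier in the thread.

```python
class SignedScorer:
    """Hypothetical scorer that knows its direction (+1: greater is better)."""

    def __init__(self, greater_is_better=True):
        self._sign = 1 if greater_is_better else -1

    def argsort(self, scores):
        """Indices of `scores`, ordered from best to worst."""
        return sorted(
            range(len(scores)),
            key=lambda i: self._sign * scores[i],
            reverse=True,
        )

    def sorted(self, scores):
        """Raw (unflipped) scores, ordered from best to worst."""
        return [scores[i] for i in self.argsort(scores)]


mse_scorer = SignedScorer(greater_is_better=False)
raw_mse = [4.0, 1.5, 2.5]  # unflipped, positive MSE values

order = mse_scorer.argsort(raw_mse)   # best index first
ranked = mse_scorer.sorted(raw_mse)   # best score first
best_index, best_score = order[0], ranked[0]

print(order, ranked)
```

The sign is applied only while ordering, so the values handed back to the user stay positive.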
How can I implement "estimate the means and variances of the workers' errors from the control questions, then compute the weighted average after removing the estimated bias from the predictions" with scikit-learn?
IIRC we discussed this at the sprint (last summer?!) and decided to go with neg_mse (or was it neg-mse) and deprecate all scorers/strings where we currently have a negative sign.
Is this still the consensus? We should do that before 0.18 then.
Ping @GaelVaroquaux @agramfort @jnothman @ogrisel @raghavrv
Yes, we agreed on neg_mse AFAIK.
neg_mse
We also need:
neg_log_loss
neg_mean_absolute_error
neg_median_absolute_error
model = Sequential()
keras.layers.Flatten()
model.add(Dense(11, input_dim=3, kernel_initializer=keras.initializers.he_normal(seed=2),
                kernel_regularizer=regularizers.l2(2)))
keras.layers.LeakyReLU(alpha=0.1)
model.add(Dense(8, kernel_initializer=keras.initializers.he_normal(seed=2)))
keras.layers.LeakyReLU(alpha=0.1)
model.add(Dense(4, kernel_initializer=keras.initializers.he_normal(seed=2)))
keras.layers.LeakyReLU(alpha=0.1)
model.add(Dense(1, kernel_initializer=keras.initializers.he_normal(seed=2)))
keras.layers.LeakyReLU(alpha=0.2)
adag = RMSprop(lr=0.0002)
model.compile(loss=losses.mean_squared_error,
              optimizer=adag
              )
history = model.fit(X_train, Y_train, epochs=2000,
                    batch_size=20, shuffle=True)
How can I cross-validate the code above? I would like to use leave-one-out cross-validation here.
@shreyassks this isn't the right place for your question, but I would check this out: https://keras.io/scikit-learn-api . Wrap your network in a scikit-learn estimator, then use it with model_selection.cross_val_score.
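Independently of Keras, the leave-one-out procedure itself is short enough to sketch in plain Python. The example below uses a trivial "predict the training mean" model as a stand-in for the network; leave_one_out_splits and fit_mean are hypothetical names, and in practice scikit-learn's LeaveOneOut splitter would handle the index bookkeeping.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) with exactly one held-out sample."""
    for test_idx in range(n_samples):
        train_idx = [i for i in range(n_samples) if i != test_idx]
        yield train_idx, [test_idx]


def fit_mean(y_train):
    """Stand-in 'model': always predicts the mean of the training targets."""
    return sum(y_train) / len(y_train)


y = [1.0, 2.0, 3.0, 6.0]

squared_errors = []
for train_idx, test_idx in leave_one_out_splits(len(y)):
    prediction = fit_mean([y[i] for i in train_idx])
    squared_errors.append((y[test_idx[0]] - prediction) ** 2)

loo_mse = sum(squared_errors) / len(squared_errors)  # positive, lower is better
print(loo_mse)
```

Each sample is held out exactly once, so a model that is slow to train (such as the 2000-epoch network above) is refit once per sample.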
Yes, I totally agree! This also happens with Brier_score_loss: it works perfectly fine when using Brier_score_loss directly, but it gets confusing when it comes from GridSearchCV, which returns a negative Brier_score_loss. At the very least, it would be better to output something like: because Brier_score_loss is a loss (the lower the better), the scoring function here flips the sign to make it negative.
The idea is that cross_val_score should focus entirely on the absolute value of the result. As far as I know, the significance of the negative sign (-) obtained for MSE (mean squared error) in cross_val_score is not predefined. Let's wait for an updated version of sklearn in which this issue is taken care of.
For a regression use case:
model_score = cross_val_score(model, df_input, df_target, scoring='neg_mean_squared_error', cv=3)
I am getting the following values:
SVR:
[-6.20938025 -1.397376 -1.94519]
-3.183982080147279
Linear Regression:
[-5.94898085 -9.30931808 -1.15760676]
-5.4719685646934275
Lasso:
[-7.22363814 -10.47734135 -2.20807684]
-6.6363521107522345
Ridge:
[-5.95990385 -4.17946756 -1.36885809]
-3.8360764993832004
So which one is best?
SVR?
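One way to read these numbers, sketched in plain Python with the fold scores copied from the comment above: since 'neg_mean_squared_error' is the negated MSE, the score closest to zero corresponds to the smallest error. The variable names are illustrative assumptions.

```python
from statistics import mean

# Fold scores as posted above (negated MSE, so higher means better).
neg_mse_by_model = {
    "SVR": [-6.20938025, -1.397376, -1.94519],
    "Linear Regression": [-5.94898085, -9.30931808, -1.15760676],
    "Lasso": [-7.22363814, -10.47734135, -2.20807684],
    "Ridge": [-5.95990385, -4.17946756, -1.36885809],
}

# Flip the sign back to obtain an ordinary, positive mean MSE per model.
mean_mse = {name: -mean(scores) for name, scores in neg_mse_by_model.items()}

# The lowest mean MSE is the same model as the highest mean negated score.
best_model = min(mean_mse, key=mean_mse.get)

for name, value in sorted(mean_mse.items(), key=lambda item: item[1]):
    print(f"{name}: mean MSE = {value:.4f}")
print("lowest mean MSE:", best_model)
```

With only three folds and this much spread between folds, the ranking should be treated with caution rather than as a firm conclusion.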
For a regression use case:
I get different results when I use
(1) "cross_val_score" with scoring='neg_mean_squared_error'
and
(2) "GridSearchCV", checking 'best_score_', for the same inputs.
For regression models, which one is better?
@pritishban
You are asking a usage question. The issue tracker is mainly for bugs and new features. For usage questions, it is recommended to try Stack Overflow or the mailing list.