Xgboost: predict after cross-validation using xgboost [question]

Created on 1 Nov 2014  ·  3 comments  ·  Source: dmlc/xgboost

This is my first trial with xgboost (very fast!), but I'm a little bit confused.
In fact, I trained a model using xgb.cv as follows:

xgbmodel = xgb.cv(params = param, data = trainingdata, nrounds = 100, nfold = 5, showsd = TRUE, metrics = 'logloss')

Now I want to predict with my test set, but xgbmodel seems to be a logical value (TRUE in this case).
How could I predict after cv? Should I use xgb.train then?
HR ์ธ์‚ฌ

en

Most helpful comment

Yes, xgb.cv does not return the model, but the cv history of the process, since in cv we are training n models to evaluate the result.

A normal use case of cv is to select parameters: usually you use cv to find a good parameter setting, and then use xgb.train to train the model on the entire dataset.
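That cv-then-train workflow might look like the following in R (a sketch; `param`, `trainingdata`, and `testdata` stand in for the asker's own objects and are not defined here):

```r
# Sketch of the cv-then-train workflow (assumes the xgboost R package;
# param, trainingdata and testdata are placeholders for your own objects).
library(xgboost)

# 1. Use cross-validation to evaluate a parameter setting / choose nrounds.
cv <- xgb.cv(params = param, data = trainingdata, nrounds = 100,
             nfold = 5, showsd = TRUE, metrics = 'logloss')

# 2. Retrain on the full training set with the chosen settings;
#    xgb.train (unlike xgb.cv) returns a model object.
model <- xgb.train(params = param, data = trainingdata, nrounds = 100)

# 3. Predict on the held-out test set with the trained model.
preds <- predict(model, testdata)
```

The key point is that step 1 only evaluates the settings; the model you actually predict with comes from step 2.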


All 3 comments


์ข‹์•„, ์ด์ œ ๋” ๋ช…ํ™•ํ•ด


Hi,

There is a parameter prediction=TRUE in xgb.cv, which returns the predictions of the cv folds. But it is not clear from the documentation for which nround the predictions are returned.
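For context, requesting those out-of-fold predictions looks roughly like this (a sketch; `param` and `trainingdata` are again placeholders for the asker's own objects, and the exact return structure may vary across xgboost versions):

```r
# Sketch: request out-of-fold predictions from xgb.cv
# (assumes the xgboost R package; param and trainingdata are placeholders).
library(xgboost)

cv <- xgb.cv(params = param, data = trainingdata, nrounds = 100,
             nfold = 5, metrics = 'logloss', prediction = TRUE)

# cv$pred holds one prediction per training row, each produced by the
# model of the fold in which that row was held out.
head(cv$pred)
```

The question of which boosting round those predictions correspond to is exactly what this comment is asking the maintainers to document.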
